A Data-Driven Approach to Dynamically Adjust Resource Allocation for Compute Clusters

Abstract

Nowadays, data-centers are largely under-utilized because resource allocation is based on reservation mechanisms which ignore actual resource utilization. Indeed, it is common to reserve resources for peak demand, which may occur only for a small portion of the application life time. As a consequence, cluster resources often go under-utilized. In this work, we propose a mechanism that improves cluster utilization, thus decreasing the average turnaround time, while preventing application failures due to contention in accessing finite resources such as RAM. Our approach monitors resource utilization and employs a data-driven approach to resource demand forecasting, featuring quantification of uncertainty in the predictions. Using demand forecast and its confidence, our mechanism modulates cluster resources assigned to running applications, and reduces the turnaround time by more than one order of magnitude while keeping application failures under control. Thus, tenants enjoy a responsive system and providers benefit from an efficient cluster utilization.



Francesco Pace, Dimitrios Milios, Damiano Carra, Daniele Venzano and Pietro Michiardi
Data Science Department, Eurecom, Biot Sophia-Antipolis, France
Computer Science Department, University of Verona, Verona, Italy
{pace,milios,venzano,michiard}@eurecom.fr [email protected]

Data-center efficiency is a subject that has attracted a vast amount of research [6, 65, 49, 62, 54, 11, 2]. Recently, the cloud computing paradigm, both in its public and private forms, fueled the proliferation of a wide array of resource management tools [62, 54, 17, 31] aiming at an efficient operating point, where cluster resources are fully utilized. Despite such efforts, data-center resources often go under-utilized, as shown in recent traces from large-scale production clusters [53, 63]: in most cases (∼ [...] applications the use of distributed frameworks such as Apache Spark [4] and Google TensorFlow [27] that include different components to produce work.

Reservation centric resource allocation.

In most private or public cloud systems, users gain access to computing resources by specifying the amount of resources required to run their application, in the form of a reservation request. Upon receiving a request, the cluster scheduler decides which application to serve based on the scheduling policy the provider implements (e.g., First-In-First-Out (FIFO)). Cluster schedulers operate according to several variants of objective functions, including fairness across users, service-level objectives, and various measures of performance. In this work, we focus on two common optimization objectives: (i) average turnaround time (also called completion time) and (ii) cluster utilization [54, 37, 3]. The first metric accounts for the average time requests spend in the system (queuing and execution times). The second metric considers the utilization of the available resources. Optimizing for such objectives translates into high system responsiveness, which is desirable for both tenants and providers.

Cluster schedulers use mechanisms to provision and manage resources: given a resource request, the resource manager determines its admission in the cluster based on reservation information. An admitted request triggers a resource allocation procedure, which concludes with reserved resources being allocated to the request [54]. In most system implementations, the concepts of reservation and allocation coincide, although neither is representative of the true resource utilization a request might induce on the system. In fact, resource utilization is generally not constant throughout a request lifetime, and fluctuates according to application behavior [64].

The main consequence for current cloud environments is that reservation requests are engineered to cope with peak resource demands of an application, which is one key factor that induces poor system utilization, and ultimately, negatively impacts system efficiency. This is exacerbated by coarse-grained reservation specifications: instance flavors exhibit discrete gaps in terms of resource units. In fact, picking the right configuration for cloud applications (and in particular for the “big data” applications we consider in this paper) is a daunting task [1], which requires sophisticated optimization mechanisms going beyond human tuning abilities.

Thus, mechanisms to reduce resource slack, which is defined as the difference between resource allocation and utilization, are truly needed, for they can prevent clusters from denying admission to new requests which would queue up, while spare capacity goes unused.
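To make the definition of resource slack concrete, the following is a minimal sketch in Python. The `ComponentSample` type, its field names and the sample values are hypothetical and chosen for illustration only; they are not taken from the system described in this paper.

```python
from dataclasses import dataclass


@dataclass
class ComponentSample:
    """One monitoring sample for an application component (hypothetical type)."""
    allocated_mb: float  # memory currently allocated (reserved) for the component
    used_mb: float       # memory the component actually uses


def resource_slack(samples):
    """Return total slack: allocation minus utilization, summed over samples."""
    return sum(s.allocated_mb - s.used_mb for s in samples)


def utilization_ratio(samples):
    """Fraction of allocated memory actually in use (0.0 if nothing is allocated)."""
    allocated = sum(s.allocated_mb for s in samples)
    used = sum(s.used_mb for s in samples)
    return used / allocated if allocated else 0.0


# Illustrative values only: two components reserved for peak demand.
samples = [ComponentSample(8192, 2048), ComponentSample(4096, 3072)]
print(resource_slack(samples))     # 7168
print(utilization_ratio(samples))  # fraction of the allocation in use
```

In this toy case more than half of the allocated memory is slack, which is capacity that a reservation-centric scheduler cannot hand to queued requests.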

Problem Statement.

We study the problem of cluster efficiency by reducing the resource slack induced by reservation-centric application schedulers, which match allocation to reservation. To do so, we introduce a new mechanism that predicts the resource utilization and adjusts the resource allocation accordingly. The main challenge is that prediction errors may have problematic consequences, since sudden spikes could wreak havoc on the system [62]. When dealing with finite resources such as RAM, in fact, not providing the correct amount of resources leads to application failures. Careful engineering suggests introducing a buffer that acts as a “safe-guard” against prediction errors. This results in a trade-off, since on the one hand the safe-guard buffer should be small to minimize slack, while on the other hand it should be sufficiently large to prevent application failures.

Previous works (a detailed description is provided in Section 2) usually consider shareable resources, such as CPU, where the effect of wrong resource dimensioning does not translate into application failures. Other approaches consider resource over-provisioning, where the slack is not continuously optimized, and where the application failures can be unpredictable and are taken care of by the Operating System (OS).

(Footnote: In our prose, we neglect several important technical details that are however irrelevant to our point, such as quota management, security aspects, and concurrency control, to name a few.)

In our approach, we leverage three key ideas: prediction confidence, application elasticity and controlled failures. In the prediction process, most of the tools provide additional information about the confidence of the prediction. We use such information to dynamically adapt the safe-guard buffer that should prevent application failures. In addition, the frameworks on which the applications are based are composed of several elements that are characterized by either a core or elastic nature [42]. Core components are compulsory for a framework to produce useful work (e.g., Apache Spark requires a controller, a master, and one worker); elastic components, instead, optionally contribute to a job, e.g. by decreasing its runtime. An application that features only core components is called rigid, whereas applications with a mix of core and elastic components are called elastic. If the resource demand is higher than the available resources, we intervene (when possible) on elastic components to avoid application failures. As a last step, should the previous two mechanisms not be sufficient to provide enough resources, we explicitly decide which application should fail so as to minimize the amount of wasted work.

Contributions.

In this paper we present our design of a data-driven resource shaping mechanism that improves cluster utilization, thus decreasing the average turnaround time, while preventing application failures due to resource contention. Our approach monitors resource utilization and relies on online forecasting of resource demand to modulate allocated resources so that they approximate utilization patterns well. Our experiments, which we conduct on a system simulator as well as a full-fledged implementation using real-life data-center traces, indicate substantial gains over existing alternatives. In summary, the contributions we present in this work are as follows:

• We present the design of a mechanism that dynamically adjusts resources allocated to applications by an existing scheduler. In this work, we target a specific family of application schedulers, and materialize our ideas for such systems.
• We compare parametric and non-parametric machine learning methodologies for the forecasting of resource utilization. In particular, we focus on accurate quantification of uncertainty, which is used to steer system parameters to safeguard against unexpected resource demand peaks.
• We perform an extensive simulation campaign using publicly available production traces from Google data-centers, and discuss the trade-off that an optimistic vs. a pessimistic approach to application preemption entails. We also present a full-fledged implementation of our mechanism, which we use in an academic compute cluster serving hundreds of students and researchers. Our results indicate substantial improvements in terms of efficiency, which translate into a system capable of ingesting a heavier workload with the same number of machines.

The remainder of the paper is organized as follows. In Section 2 we review the related literature. In Section 3 we present our system design, and we validate our ideas using a simulation campaign in Section 4. We present our prototype implementation in Section 5 and its evaluation in Section 5.1. Finally we conclude in Section 6.

Resource allocation has been approached in many different ways in the literature [62, 35, 34, 11, 2, 14, 13, 23, 28, 49, 38, 56, 30, 6, 65, 43, 16, 44, 15].

The authors in [35, 34] use a feedback control loop which requires every framework to periodically send application-specific information to the scheduler, which is used to steer resource allocation. In contrast, our approach does not require such instrumentation, as it is application agnostic: we use general metrics to dynamically adjust resources allocated to running applications.

The authors in [11] introduce a reservation-based scheduler and propose a Reservation Definition Language (RDL) that allows users to declaratively reserve access to cluster resources. They formalize the planning of current and future cluster resources as a mixed-integer linear programming problem and they integrate their work in YARN [61]. In our work, we avoid delegating this task to users by asking them to specify such information; generally, users have no knowledge of how their applications will behave.

The authors in [43] develop a feedback control loop for virtual machines, using a simple regression model to forecast future allocation. They show that it is possible to reduce the CPU resource slack, but they do not address memory and the consequences that under-provisioning such a resource has on applications, as we do in our work.

The authors in [7] adopt a distributed scheduling architecture, whereby each scheduler aims at minimizing task completion time by careful placement strategies that use estimates of task runtime and their resource utilization. Contrary to our work, they use over-provisioning of resources and they tackle conflicts in an optimistic manner. Our approach cooperates with an existing scheduler, instead of replacing it, and does not use task runtime to adjust cluster resources allocated to applications.

Some other works [6, 56] propose to address the problem with economic principles. In particular, in [56] the authors build a pricing model that enables infrastructure providers to incentivize their tenants to use graceful degradation, a self-adaptation technique originally designed for constructing robust services that survive resource shortages. The authors in [6] present a framework for scheduling and pricing cloud resources, aimed at increasing the efficiency of cloud resource usage by allocating resources according to economic principles. However, they achieve that by allocating more capacity than what is physically available, i.e., over-provisioning, which is a solution prone to uncontrolled failures when utilization exceeds available resources.

Finally, works such as [33, 40, 49, 38, 2, 14, 13, 23, 28] focus either on resource placement or on meeting Service Level Objectives (SLOs). In the first case they relate to a packing problem and try to optimize it; Karanasos et al. [33] suggest dynamically re-balancing the load across hosts if the packing performed at a certain time leads to unevenly loaded hosts. In the second case they leverage the elasticity of some frameworks and they increase resources for applications that are falling behind on their SLO. Our work is orthogonal to such methods and can leverage them to improve the system performance.

The authors in [66] propose task scheduling and data placement techniques that rely on historical resource utilization. Specifically, they process the history of CPU utilizations using the Fast Fourier Transform (FFT). Leveraging the k-Means algorithm, they cluster patterns in three categories: periodic, constant and unpredictable.
They exploit the patterns of periodic and constant categories to improve the quality of task scheduling.

Albeit all these works are valid and propose their own vision of the problem, they share one element: although some of them address a multi-dimensional packing problem for provisioning resources to applications, when it comes to reclaiming resources granted to applications they mostly focus on “time sharable” resources, like the CPU, rather than “finite” resources like memory. As a consequence, such methods are limited to improving system efficiency from the perspective of CPU utilization.

(Footnote: On the one hand, a resource is considered “time sharable” when the OS is able to use time sharing for scheduling it, and thus it does not impose limits on its availability. On the other hand, “finite” resources are those that cannot be sliced in time and thus cannot be effectively shared by multiple processes.)

An example of prior work that modulates “finite” resources is Borg [62]. Borg features a resource reclamation system that seizes unused resources and offers them to other applications. The authors study the impact of wrong memory reallocation on running tasks, which causes resource contention: the OS enters a special state to kill processes that are OOM. The authors present different levels of “rigidity” for their reclamation system (baseline, medium and aggressive) and show both the benefit and the number of OOM events for each of them. They conclude by accepting the trade-off obtained by the medium setting. Instead, we present a dynamic allocation system that relies on online resource forecasting, with accurate quantification of uncertainty. In addition, we seek to gain control over the OS and minimize application failure events while maximizing the resource utilization.

(Footnote: The OS kills processes due to Out Of Memory (OOM) following its own algorithm.)

Figure 1: System overview: shaded boxes represent existing components, white boxes indicate new components presented in this work.

What sets our approach apart from previous work is as follows. We use online forecasting with quantification of uncertainty to steer system behavior. This is necessary because, contrary to previous works, we explicitly take into account finite resources which, if handled improperly, can lead to failures. Additionally, we operate on low-level UNIX processes, and take control over the OS for shaping the resources allocated to applications.

Figure 1 illustrates the architecture we assume in our work. The backend module is an instance of a cluster management system, such as Docker [18] or Kubernetes [26]. Additionally, we assume the presence of an application scheduler such as [42], which reads the compute cluster state from a dedicated database component. Finally, the monitoring component populates the cluster state database with measurements taken from the backend. In this Section, we focus on the two additional components we present in this paper: the utilization forecasting module, and the resource shaper module.

A bird's-eye view of the operation of our system is as follows. Application execution requests take the form of resource reservations, which are submitted to the application scheduler. The application scheduler admits the request based on reservation information alone, and instructs the backend to provision the necessary resources. The resource monitor collects information about both allocated and used resources, which are fed to the system state and the forecasting component respectively. The resource shaper module adjusts resource allocation to match predicted utilization patterns, and is responsible for the preemption of running applications in case of sudden peaks in resource demand. The modified resource allocation is reflected in the system state, which in turn triggers new scheduling decisions. Next, we describe in detail the components that materialize our ideas.

Resource monitor.

This module collects information about resource allocation and utilization from every component of every running application. This happens at regular time intervals: higher frequencies provide more accurate views, but generate more data. Our goal is to minimize intrusiveness by being application agnostic: for this reason we do not instrument applications (as done for example in [35]), but take standard metrics (CPU, memory, etc.) as they are seen by the OS.
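As an illustration of the kind of per-component time series such a monitor produces, here is a minimal sketch. The `ResourceMonitor` class, its method names and the bounded-history design are assumptions made for this example; they do not describe the actual monitoring component, which takes its measurements from the backend.

```python
from collections import defaultdict, deque


class ResourceMonitor:
    """Keeps a bounded history of OS-level samples per application component."""

    def __init__(self, max_samples=1000):
        # One bounded series per component id; the oldest samples are dropped first.
        self._series = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, component_id, timestamp, cpu_percent, memory_mb):
        """Store one sample as seen by the OS; the application is not instrumented."""
        self._series[component_id].append((timestamp, cpu_percent, memory_mb))

    def memory_series(self, component_id):
        """Return the memory time series [(timestamp, memory_mb), ...]."""
        return [(t, mem) for t, _cpu, mem in self._series[component_id]]


# Hypothetical component name and values, sampled at a regular interval.
monitor = ResourceMonitor(max_samples=3)
for t in range(4):
    monitor.record("spark-worker-0", t, 10.0 * t, 100.0 + t)
print(monitor.memory_series("spark-worker-0"))
```

Bounding the history reflects the trade-off stated above: more frequent sampling gives a more accurate view but produces more data to store and process.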

Utilization forecasting.

The goal of this module is to anticipate the resource utilization of every application component. We study both parametric and non-parametric modeling approaches to predict resource utilization, with emphasis on the quantification of the uncertainty associated with these predictions. A more detailed exposition of the methodology we employ can be found in Section 3.1.

Resource shaper.

This module uses utilization forecasts to adjust the resources allocated to every component of running applications. We anticipate prediction errors, thus we compensate using a “safe-guard” buffer of size β to artificially increase (that is, to force overestimation of) predicted peak resource utilization. A more detailed exposition of β can be found in Section 3.2.

Additionally, the resource shaper is in charge of application preemption. Preemption policies can either be optimistic [54, 62] or strict (pessimistic). We advocate for a strict policy, to avoid delegating application preemption to the OS, which manages resource shortage (such as OOM) in an application agnostic and “unpredictable” way. A detailed exposition of the preemption policy can be found in Section 3.2.

The forecasting module is responsible for making predictions about future resource utilization, for each application component. For a given application, we forecast both

CPU and memory utilization using monitoring data, which is available in the form of a time series that reflects resource usage across time. We seek to discover patterns of resource usage that allow reasoning about our expectations on the future state of the system utilization.

(Footnote: Other types of resource can be considered as well.)

We advocate for the need to quantify the level of uncertainty associated with each prediction: predictive errors may have serious impact on “finite” resources (i.e. [...] parametric Autoregressive Integrated Moving Average (ARIMA) model to an alternative non-parametric model that offers a principled quantification of uncertainty. On the one hand, we use state-of-the-art ARIMA implementations that automatically tune hyper-parameters and that provide a method to compute confidence levels associated with predicted values [8]. On the other hand, we model resource utilization using Gaussian Process (GP) regression [50], which is a Bayesian non-parametric regression method with many attractive features. Bayesian approaches control model complexity and thus avoid problems such as over-fitting [39]. Moreover, GPs offer a sensible framework for tuning their hyper-parameters, through evidence maximization, that does not require cross-validation approaches, which are typically more expensive and impractical in the context of our work. Finally, the output of a GP regression model is a predictive distribution, rather than a single prediction, which allows reasoning about uncertainty in a principled way.

ARIMA is often considered the “go-to method” for time series forecasting: it is a generalization of the Autoregressive Moving Average (ARMA) model to cope with non-stationary time series data, which appear frequently in real-life applications such as the one we consider in this paper. Considering observation $y_t$ at time $t$, the ARMA($p$, $q$) model is described as follows:

$$y_t - \alpha_1 y_{t-1} - \dots - \alpha_p y_{t-p} = \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} \tag{1}$$

where the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part and the $\epsilon_t$ are error terms. In particular, $p$ and $q$ are integers greater than or equal to zero and refer to the order of the autoregressive and moving average parts of the model respectively.

The underlying idea of ARIMA is that current values of a time series can be obtained by a linear combination of its past values, using finite differencing to produce stationary data. Formally, the ARIMA($p$, $d$, $q$) model using lag polynomials is given below:

$$\left(1 - \sum_{i=1}^{p'} \phi_i L^i\right)(1 - L)^d\, y_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\epsilon_t \tag{2}$$

where $p' = p - d$, $\delta$ is a constant and $L$ is defined as the lag or back-shift operator. $d$ is an integer greater than or equal to zero; it refers to the order of the integrated part of the model and controls the level of differencing. Generally $d = 1$ is enough in most cases. An in-depth discussion about ARIMA can be found in [9].

In this work, model selection, that is, searching through combinations of order parameters to pick the set that optimizes model fit criteria, is carried out using the Akaike information criterion, a method that is widely available in most ARIMA implementations.
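The differencing factor $(1 - L)^d$ in the ARIMA model can be illustrated with a few lines of plain Python. This is only a sketch of the differencing step, not of a full ARIMA fit; the function names and the toy series are assumptions made for this example.

```python
def difference(series, d=1):
    """Apply the operator (1 - L)^d: difference the series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undo_difference(last_value, diffs):
    """Invert one level of differencing, starting from the last observed value."""
    values, current = [], last_value
    for delta in diffs:
        current += delta
        values.append(current)
    return values


memory_mb = [100, 104, 109, 115, 122]          # trending (non-stationary) toy series
print(difference(memory_mb, d=1))              # [4, 5, 6, 7]
print(difference(memory_mb, d=2))              # [1, 1, 1]
print(undo_difference(memory_mb[-1], [8, 9]))  # [130, 139]
```

Forecasts made on the differenced series are mapped back to the original scale by cumulative summation, which is what `undo_difference` does for $d = 1$.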
Note that parameter optimization is an operation that needs to be performed multiple times during a forecasting period, to adapt to variations in the time series characteristics.

Finally, most ARIMA implementations output confidence intervals associated with the selected model parameters [9]. We note that confidence intervals should not be confused with prediction intervals: the former are associated with the probability of the true model parameters being within the confidence interval, whereas the latter are associated with the likely range of future values output by the model. As discussed in the literature [9], confidence intervals for the mean are generally much narrower than prediction intervals. This has a direct consequence in the context of our work, which revolves around the idea of using predictive confidence to steer system behavior: for this reason, in the next section, we develop a Bayesian approach to time series modeling that features a principled approach to compute predictive confidence.

In the GP literature, time series are treated as state space models, which are generalizations of auto-regressive models [41, 22]. Considering state $x_t$ and observation $y_t$ at time $t$, a state space model is described as follows:

$$x_{t+1} = f(x_t) + \epsilon_t, \qquad y_t = g(x_t) + v_t \tag{3}$$

where $f(x_t)$ is the state transition function and $\epsilon_t$ is the process noise, which follows a normal distribution. The state $x_t$ may not be observed directly; an observation $y_t$ is given as a function of the state $g(x_t)$, which is additionally corrupted by observation noise $v_t$.

According to Equation (3), a time series is modeled as a non-linear Markovian dynamical system. The Markov property implies that the current state $x_t$ is conditionally independent from past states $\{x_\tau : \tau < t - 1\}$, given the previous state $x_{t-1}$. The same is not true for the observations however.
Thus, given a collection of noisy observations $\{y_\tau : \tau \le t\}$, the goal for time series prediction is to infer the future state $x_{t+1}$. This requires learning the functions $f$ and $g$, which involves placing a GP prior over $f$ and $g$. However, the posterior over a non-linear dynamical system is not Gaussian, thus several approximation methods have been proposed in the literature [60, 21, 59].

In the context of recording resource utilization, we can make some simplifying assumptions. It is reasonable to assume that an observation $y_t$ matches the state $x_t$. Of course, we have to acknowledge that resource utilization constantly fluctuates; these fluctuations however can be sufficiently explained by the noise term $\epsilon_t$, which now accounts for both the process and the observation noise. We shall additionally make the dependency on past states explicit; for a history window of size $h$, we consider the following state-space model:

$$y_t = f(y_{t-1}, \dots, y_{t-h}) + \epsilon_t \tag{4}$$

To make predictions, we shall learn the transition function $f$ by means of standard GP regression. From Equation (4), the transition function depends on the history explicitly. In this way, we avoid the additional costs of approximating the true posterior of a non-linear dynamical system.

A GP model transfers information across points that are considered similar, as this is reflected in the choice of kernel $k(x, x')$, which determines the prior covariance between inputs $x$ and $x'$. If we assume that the inputs $X$ solely consist of the recorded times, then similarity is only a matter of temporal locality, which is not optimal practice if the aim is to predict sudden changes of behavior throughout the course of a time series.

Hence, we resort to the definition of a kernel that relies on the observation history.
It is implicitly assumed that if two sequences of observations are similar, then they must have been caused by the same “hidden” background processes; it is reasonable then to extrapolate and predict that the future observations will be similar as well. Such a history-dependent kernel can be easily constructed by transforming the data in an appropriate way. Considering a history window of size $h$, the training instances will be utilization patterns expressed as vectors of the form:

$$\tilde{x}_t = [x_t, y_{t-h}, \dots, y_{t-1}]^\top \tag{5}$$

where $x_t$ is the $t$-th recorded time. Therefore, the history-dependent kernel is implemented by applying a typical exponential kernel on the transformed inputs:

$$k_h(x, x') = k(\tilde{x}, \tilde{x}') \tag{6}$$

Two different inputs $x$ and $x'$ will be similar if they have a similar history pattern, or equivalently, if the $h$ preceding inputs have similar outputs. Note that we have kept the recorded times $x_t$ along with the history, thus we do not completely ignore locality in the original input space.

From a practical perspective, the forecast component operates in an online manner. As long as new data is available, the predictive model will be trained and subsequently queried about the future workload. Depending on the modeling methodology, our approach is as follows.

Using the ARIMA model.

The online training and prediction process that uses ARIMA operates by appending the new resource utilization data to the collection of observations gathered so far. ARIMA hyper-parameters are optimized using well-known methods [46, 51], which are known to be computationally expensive. Alternatively, works like [32] propose a stepwise algorithm (instead of using grid-search) that improves performance.

The $k$-step ahead forecast error is a linear combination of the future errors entering the system after time $t$:

$$e_t(k) = y_{t+k} - \hat{y}_t(k)$$

where $\hat{y}_t(k)$ is the estimated value. Since $E[e_t(k) \mid y_t] = 0$, the forecast $\hat{y}_t(k)$ is unbiased with Mean Squared Error (MSE):

$$\mathrm{MSE}[\hat{y}_t(k)] = \mathrm{Var}[e_t(k)]$$

Given these results, if the process is normal, the $(1 - \alpha)$ forecast interval is:

$$\left[\hat{y}_t(k) \pm N_{\alpha/2} \sqrt{\mathrm{Var}[e_t(k)]}\right]$$

where $N_{\alpha/2}$ is the multiplicative factor to obtain the percentile.

Using the GP model.

The online training and prediction process that uses GP regression operates as follows:

1. New resource utilization data is appended to the collection of observations $\mathbf{X}$, $\mathbf{y}$. The rows of $\mathbf{X}$ are patterns as defined in Equation (5).
2. Using a history-dependent kernel $k_h(x, x')$, Equations (7) and (8) are used to make predictions based on observations $\mathbf{X}$, $\mathbf{y}$.

Under the assumption of a zero-mean prior and a Gaussian likelihood, that is, for any input-output pair we have $y \sim \mathcal{N}(f(x), \sigma^2)$, the posterior is also a GP whose mean and covariance can be calculated analytically as follows:

$$E[f(x) \mid \mathbf{X}] = k_h(x, \mathbf{X})\,(k_h(\mathbf{X}, \mathbf{X}) + \sigma^2 I)^{-1}\,\mathbf{y} \tag{7}$$

$$\mathrm{Var}[f(x) \mid \mathbf{X}] = k_h(x, x) - k_h(x, \mathbf{X})\,(k_h(\mathbf{X}, \mathbf{X}) + \sigma^2 I)^{-1}\,k_h(\mathbf{X}, x) \tag{8}$$

The predicted value at a new point will be the expectation under the posterior distribution, and the posterior variance quantifies the uncertainty about the prediction.
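The two steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the kernel hyper-parameters (`length_scale`, `variance`) and the noise variance are fixed by hand rather than tuned by evidence maximization, the linear systems are solved with a naive Gaussian elimination, and all function names and the toy series are hypothetical.

```python
import math


def build_patterns(times, values, h):
    """Build inputs [x_t, y_{t-h}, ..., y_{t-1}] and targets y_t (cf. Equation (5))."""
    inputs, targets = [], []
    for t in range(h, len(values)):
        inputs.append([times[t]] + list(values[t - h:t]))
        targets.append(values[t])
    return inputs, targets


def exp_kernel(a, b, length_scale=2.0, variance=1.0):
    """Exponential kernel on transformed inputs; hyper-parameters are fixed here."""
    dist = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return variance * math.exp(-dist / length_scale)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_predict(inputs, targets, x_new, noise_var=1e-6):
    """Posterior mean and variance at x_new (cf. Equations (7) and (8))."""
    n = len(inputs)
    gram = [[exp_kernel(inputs[i], inputs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [exp_kernel(x_new, xi) for xi in inputs]
    alpha = solve(gram, targets)  # (K + sigma^2 I)^{-1} y
    v = solve(gram, k_star)       # (K + sigma^2 I)^{-1} k(X, x)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = exp_kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var


values = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]  # toy utilization series
times = list(range(len(values)))
h = 2
X, y = build_patterns(times, values, h)
x_next = [len(values)] + values[-h:]                # pattern for the next time step
mean, var = gp_predict(X, y, x_next)
print(f"forecast={mean:.3f} +/- {2 * math.sqrt(max(var, 0.0)):.3f}")
```

The posterior variance returned by `gp_predict` is the quantity a resource shaper could use to size its safe-guard buffer. In an online setting one would pass only the most recent patterns (for example `X[-N:]`, `y[-N:]`) to keep the cubic cost of the solve bounded.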

Figure 2: Boxplot showing the error distribution of predicted utilization for a collection of time series in our academic cluster with different history points and, in case of GP, different kernels. The red triangle is the mean.

The regression step can be computationally expensive. Equations (7) and (8) involve a matrix inversion (for $k(\mathbf{X}, \mathbf{X}) + \sigma^2 I$), which is an operation of cubic complexity. Moreover, the set of observations $\mathbf{X}$, $\mathbf{y}$ will grow indefinitely during the lifetime of the system. While there is a plethora of methodologies on sparse GPs in the literature [58, 47, 48, 10] that can be used to reduce the complexity of regression, in this work we adopt the simple solution of restricting the dataset $\mathbf{X}$, $\mathbf{y}$ to the $N$ latest observations, thus keeping the model tractable. Note that $N$ is the number of patterns used; it should not be confused with $h$, which is the size of each pattern.

Numerical results.

We have applied our modeling approaches on a dataset consisting of approximately 6000 time series that monitor the memory usage of applications in our academic cluster. Figure 2 summarizes the empirical distribution function for the predictive errors observed across the entire dataset, using ARIMA and GP.

In case of GP we forecast the future value using different numbers of past observations $h = [10, 20, 40]$, with $N = h$. As seen in Figure 2, increasing the value of $h$ results in smaller prediction errors. Also, for the implementation of the history-dependent kernel as described in Equation (6), we have experimented both with the exponential and the squared-exponential (also known as RBF in the literature) functions. Figure 2 implies that the exponential implementation (GP-Exp) outperforms the RBF (GP-RBF) choice in terms of prediction error. Results for the GP are in line with our expectations, as the time series in question are typically not smooth. For the experiments of Section 4 and Section 5.1, we consider the exponential implementation of the history-dependent kernel only.

With ARIMA we observe that setting $p = h$ (so the autoregressive order equal to the history size) is overridden by hyper-parameter optimization, which yields $p \le$ [...]. Hence, the results for ARIMA do not depend on $h$. From Figure 2, it appears that ARIMA performs slightly better compared to GP for the median test error. Also, the variance of the predictive error is smaller than with the GP model, an indication of a possible “over-confidence” in the model predictions. Our experimental results discussed in Section 4 corroborate this intuition: over-confidence leads to higher application failure rates, and an overall lower system efficiency, when compared to the GP model we present in this work.

We now delve into the details of the resource shaper module, which we use to adjust the resources allocated to an application and its components as a function of predicted utilization.
When resources are underutilized, the resource shaper "redeems" the excess capacity such that the application scheduler can dequeue idle applications. On the contrary, upon a utilization spike, the resource shaper needs to redeem resources from running applications and dedicate them to those experiencing a peak demand, for otherwise such applications are doomed to fail. Thus, the goal of the preemption policy we associate to the resource shaper is to decide how to redistribute resources, by operating on running applications and their components. Such a policy can optionally account for application priorities, as dictated by the application scheduler. Note that, irrespective of the chosen preemption policy, a failed application is resubmitted to the application scheduler, making sure it enters the scheduling queue in a position commensurate to its original priority.

Recent works (for example [62]) advocate for an optimistic preemption policy, which is reminiscent of optimistic concurrency control [54]: resources are redeemed without taking explicit actions to manage the consequences of resource redistribution. Either explicit (and often manually set) priorities determine the fate of running applications, or the task is left to the OS.

Here, we present an alternative preemption policy, which we call pessimistic. Our goal is to control which application should be partially or fully preempted, while minimizing the amount of work that is wasted.

Footnote: We consider preemption primitives such as a kill operation, which inevitably waste work. Component or application suspension [45] and migration are outside the scope of this work. Alternatively, it would be interesting to consider techniques such as [29], which would allow a graceful management of memory pressure.

Algorithm 1 presents the details of our pessimistic preemption policy implemented by the resource shaper, which is triggered at regular time intervals, as determined by the output produced by the forecasting module. Given the current cluster state and the resource utilization forecasts, the algorithm computes a new resource allocation for each running application, which is then imposed on the cluster by operating directly on application components through low-level preemption primitives.

Algorithm 1: Overview of the pessimistic preemption policy implemented by the resource shaper module.

    Data: H ← Hosts, A ← Running Applications
     1  cpusFree ← Array(H)
     2  memFree ← Array(H)
     3  foreach host ∈ H do
     4      cpusFree[host] ← host.totalCpus
     5      memFree[host] ← host.totalMem
     6  J ← SORT(schedulingPolicy, A)
     7  foreach req ∈ J do
     8      cpus ← cpusFree
     9      mem ← memFree
    10      remove ← False
    11      foreach c ∈ req.CoreCpts do
    12          cpus[c.host] ← cpus[c.host] − c.futureCpus − β
    13          if cpus[c.host] < 0 then
    14              remove ← True
    15              break
    16          mem[c.host] ← mem[c.host] − c.futureMem − β
    17          if mem[c.host] < 0 then
    18              remove ← True
    19              break
    20      if remove then
    21          INSERT(req, K)
    22      else
    23          cpusFree ← cpus
    24          memFree ← mem
    25          E ← SORT(timeAlive, req.ElasticCpts)
    26          foreach e ∈ E do
    27              cpus ← cpusFree[e.host] − e.futureCpus − β
    28              mem ← memFree[e.host] − e.futureMem − β
    29              if cpus ≤ 0 or mem ≤ 0 then
    30                  INSERT(e, K_E)
    31              else
    32                  cpusFree[e.host] ← cpus
    33                  memFree[e.host] ← mem
    34  foreach req ∈ K do
    35      foreach c ∈ (req.CoreCpts ∪ req.ElasticCpts) do
    36          PREEMPTCOMPONENT(c)
    37  foreach e ∈ K_E do
    38      PREEMPTCOMPONENT(e)
    39  foreach req ∈ J \ K do
    40      foreach c ∈ (req.CoreCpts ∪ req.ElasticCpts) do
    41          RESIZECOMPONENT(c)

The algorithm starts by initializing (lines 1-5) the variables that track the free resources on each host. Then it sorts (line 6) running applications according to the application scheduler policy (e.g., FIFO, that is, by arrival time), and it computes (lines 7-33) an allocation by trying to maximize the resource allocation while minimizing the number of running applications. In particular, it first allocates the core components (lines 8-19) and then all elastic components that fit in the host (lines 23-33). The algorithm continues until all running applications are processed.

Resource allocation is determined, and we can turn our attention to preemption.
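The allocation pass of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Component` and `Application` classes, the function name, and the use of separate `beta_cpu` and `beta_mem` buffers (the pseudocode uses a single β) are all hypothetical, and applications are assumed to arrive already sorted by the scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Component:            # hypothetical container for one application component
    host: str
    future_cpus: float      # forecast CPU demand
    future_mem: float       # forecast memory demand
    time_alive: float = 0.0


@dataclass
class Application:          # hypothetical container for one running application
    name: str
    core: list
    elastic: list = field(default_factory=list)


def pessimistic_allocation(hosts, apps, beta_cpu=0.0, beta_mem=0.0):
    """hosts maps host -> (total_cpus, total_mem); apps is sorted by policy.

    Returns (apps to preempt fully, elastic components to preempt, apps kept).
    """
    cpus_free = {h: c for h, (c, _) in hosts.items()}
    mem_free = {h: m for h, (_, m) in hosts.items()}
    killed_apps, killed_elastic, kept = [], [], []
    for app in apps:
        # Work on tentative copies so a failed fit leaves the state untouched.
        cpus, mem = dict(cpus_free), dict(mem_free)
        fits = True
        for c in app.core:
            cpus[c.host] -= c.future_cpus + beta_cpu
            mem[c.host] -= c.future_mem + beta_mem
            if cpus[c.host] < 0 or mem[c.host] < 0:
                fits = False
                break
        if not fits:
            killed_apps.append(app)      # a core component does not fit
            continue
        cpus_free, mem_free = cpus, mem  # commit the core allocation
        kept.append(app)
        # Longest-lived elastic components are placed first.
        for e in sorted(app.elastic, key=lambda x: x.time_alive, reverse=True):
            c_left = cpus_free[e.host] - e.future_cpus - beta_cpu
            m_left = mem_free[e.host] - e.future_mem - beta_mem
            if c_left <= 0 or m_left <= 0:
                killed_elastic.append(e)
            else:
                cpus_free[e.host], mem_free[e.host] = c_left, m_left
    return killed_apps, killed_elastic, kept
```

The returned lists correspond to the sets K, K_E and J \ K of the pseudocode; issuing the actual kill and resize commands is left out of the sketch.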
Core components that no longer fit a host entail full application preemption (lines 34-36). Elastic components can also be preempted (lines 37-38), inducing only a partial application preemption. In addition, in the case of elastic components, we can experience partial or entire loss of the work done by the preempted component. For this reason, our algorithm allocates the core components of an application, then moves to the elastic components by giving priority to the ones that have been living in the cluster for a longer time (line 25). Components recently scheduled are the best candidates for preemption, because they have likely produced less useful work. Finally, the algorithm resizes (lines 39-41) the components according to the computed allocations. Our algorithm currently supports CPU and memory, but it can be extended to other types of resources as well.

Safe-guard buffer.

We are now ready to define the "safe-guard" buffer. The buffer size β is a function of the uncertainty quantified by the forecasting module:

$$\beta = K_1 R_{A_i} + K_2 V_{A_i} \qquad (9)$$

where $R_{A_i}$ is the initial resource request for application $A_i$, and $V_{A_i}$ is the estimated variance of the prediction, as given by the forecasting module (ARIMA or GP). Equation (9) involves a constant term $K_1 R_{A_i}$ and a dynamic term $K_2 V_{A_i}$. The constant term can be thought of as a minimum resource allocation that is granted to application $A_i$. The dynamic term uses the confidence (expressed as variance $V_{A_i}$) given by the predictor to adjust β accordingly: it thus changes during an application lifetime. In Section 4, we study how different values of $K_1$ and $K_2$ affect the performance of our method.

Footnote: In case the application scheduler does not support the distinction between core and elastic, all components are treated as core.

We evaluate our mechanism using an event-based, trace-driven discrete simulator which was developed to study [...] to support the concepts of this work. We use publicly available traces [63, 52, 53, 25], and generate a workload by sampling from the empirical distributions computed from such traces. Our workload is composed of 150,000 batch applications, both rigid (e.g. TensorFlow) and elastic (e.g. Apache Spark) variants. Applications are assigned a number of components ranging from a few to tens of thousands. The resource requirements of application components follow those of the input traces, ranging from a few MB of memory to a few dozen GB, and up to 6 CPU cores. Application runtime is generated according to the input traces, and ranges from a few dozen seconds to several weeks (of simulated time). Inter-arrival times are drawn from the empirical distributions of the input traces, and exhibit a bi-modal distribution with fast-paced bursts and longer intervals between application submissions.

We simulate a cluster consisting of 250 homogeneous machines, each with 32 cores and 128GB of memory. All results shown here include 10 simulation runs, for a total of roughly 3 months of simulation time for each run.

The metrics we use to analyze the results include: (i) application turnaround, which allows reasoning about the scheduling objective function, (ii) resource slack, measured as the difference between the percentage of CPU and memory the scheduler allocates to each application and the percentage actually used by the application, and (iii) application failures, which give us information about the aggressiveness of our approach.

Next, we present experimental results that demonstrate the advantage of our resource shaping mechanism, compared to a baseline approach which matches allocation to reservation. Two alternatives for time series prediction are examined. We first consider an ideal setup with an oracle having perfect information about future workload: this allows us to determine an upper bound of the performance gains achieved by our approach. Then, we compare the two models developed in Section 3.1 (ARIMA and GP), to investigate the impact of prediction errors on system performance.
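The three metrics defined above can be made concrete with a few helper functions. This is only an illustrative sketch: the function names are hypothetical, and normalizing slack by a `capacity` argument is an assumption, since the text does not state what the percentages are taken of.

```python
def resource_slack(allocated, used, capacity):
    """Slack in percentage points: allocated share minus used share.

    `capacity` is an assumed normalizer (for instance the host capacity).
    """
    return 100.0 * (allocated - used) / capacity


def turnaround(submitted_at, completed_at):
    """Time elapsed between submission and completion, queuing included."""
    return completed_at - submitted_at


def failure_rate(n_failed, n_total):
    """Percentage of applications that experienced a failure."""
    return 100.0 * n_failed / n_total
```

For example, an application holding 32GB while using 8GB on a 128GB machine would show a memory slack of 18.75 percentage points under this definition.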

Baseline.

It constitutes a reservation-centric approach (similar to Mesos and Yarn, as originally implemented in the Omega simulator [54]) that achieves the performance reported in Figure 3. This approach relies entirely on the resources requested by the application (when submitted) in order to allocate resources in the cluster and does not modify them at runtime.

Footnote: https://github.com/DistributedSystemsGroup/cluster-scheduler-simulator

[Figure 3 panels: Resource Slack (percentage, for CPU and Memory) and Turnaround (time, s). Legend: Baseline, Dynamic Optimistic, Dynamic Pessimistic.]

Figure 3: Boxplots comparing baseline vs optimistic vs pessimistic approaches over different metrics, using an oracle in place of the prediction module. The red triangle is the mean.

Oracle-based resource shaping.

We gloss over prediction errors induced by a real statistical model and consider an ideal scenario from the forecasting point of view. Ultimately, our goal is to discern virtues and drawbacks of different preemption policies. Results are summarized in Figure 3: the plots correspond to resource slack and application turnaround, whereas each box corresponds to the baseline and our resource shaping approach, with an optimistic (as originally implemented in the Omega simulator [54]) and our pessimistic preemption policy. Note that our simulator implements the concept of work lost when an application component crashes or gets killed.

Overall, our results indicate that resource shaping brings substantial benefits in terms of all metrics we consider, in the absence of prediction errors. Cluster efficiency improves because resource slack, computed as the difference between allocated and used resources, drastically shrinks compared to the baseline, as shown in Figure 3 (left). Similarly, turnaround times are notably smaller than with the baseline, as shown in Figure 3 (right). Indeed, the system can ingest new applications more quickly, because resources are better used.

Figure 3 can now be used to compare optimistic versus pessimistic eviction policies, in the absence of prediction errors. While both approaches improve over the baseline, the pessimistic policy we introduce in this work is consistently superior to the optimistic policy in all respects. As shown in Figure 3 (left), the pessimistic policy induces our resource shaping mechanism to follow application resource utilization very closely: in this case, resource slack becomes negligibly small. This result explains why turnaround times, Figure 3 (right), are almost two orders of magnitude smaller with the pessimistic policy: by freeing up resources, the application scheduler is able to trigger new executions, thus queuing times shrink.
Furthermore, we compute the number of application failures: in the case of the optimistic policy we record 37.67% application failures, whereas with the pessimistic policy no application fails. Indeed, with the optimistic policy, when two applications compete for resources and there are none left, the system will let one of the two fail. Instead, the pessimistic policy avoids failures through partial preemption, by freeing elastic resources first.

[Figure 4 panels, for (a) ARIMA and (b) GP: Turnaround Ratio, Memory Slack % and Crashes %, each as a function of $K_1$ and $K_2$. Cell values recovered for one of the panels (4 rows, 8 columns):
71 30 31 35 39 54 67 71
71 26 30 35 40 55 67 71
69 24 29 35 40 55 67 71
68 22 29 35 41 55 67 71]

Figure 4: Heat maps showing the effect of $K_1$ and $K_2$, which compose β, on different metrics when using ARIMA and GP. Bright cells are better.

ARIMA-based resource shaping.

Next, we study the system behavior when using ARIMA to predict future resource utilization. As anticipated in Section 3, statistical models are prone to prediction errors, which we address using the buffer β. A key feature of our approach is that β is a function of the uncertainty produced by the model. In practice, when the predictor outputs a future (peak) resource utilization, we adjust the value by adding the buffer β. In Figure 4a we demonstrate the effect of the buffer parameters ($\beta = f(K_1, K_2)$) on the turnaround ratio over the baseline, the memory slack and application failures (we show average results). In all cases, bright cells are better.

On the x-axis, $K_1$ controls the static component of Equation (9), which gauges the minimum amount of resources systematically granted to applications. The value of $K_1$ is expressed as a percentage of the requested resources; when $K_1 = 100\%$ our approach degenerates to the baseline. On the y-axis, $K_2$ controls the dynamic component of Equation (9), which integrates prediction uncertainty. We let $K_2$ vary in the range [0, ...], which defines different bands around the mean of the predictive Gaussian distribution, according to the three-sigma rule.

Let us first slice Figure 4a by row, and focus on the $K_2 = 0$ case: here we omit uncertainty information and only consider the effects of a static, minimum resource allocation. Even with just $K_1 = 5\%$, our approach achieves a 7.5x average improvement in terms of application turnaround, while resource slack is only 30% on average. However, the number of crashed applications is high: roughly 26% of applications experience a failure on average, and the situation improves only for large values of $K_1$. In the limit, when $K_1 = 100\%$, our method degenerates to the baseline: here no application fails, but turnaround times and slack exhibit no improvements. In our system, when an application crashes it is resubmitted and, after a certain number of failures, the system no longer shapes its allocation. Also, even applications that crash can still benefit from starting sooner than in a baseline system, because other applications were able to complete their work sooner.

We note that the absence of a static term (i.e. $K_1 = 0\%$) results in a turnaround that is very close to the baseline regardless of $K_2$, due to the high number of application failures, which also leads to a high memory slack. This is a consequence of the occasional high confidence of the predictor in cases where a sudden change in the usage behavior occurs. It is necessary to maintain a static component to accommodate unexpected variations, which are very difficult to capture with statistical methods.

Finally, we focus on $K_1 = 5\%$: the minimum resource allocation is small, and we absorb prediction errors and fluctuations using uncertainty information. However, as $K_2$ increases, all metrics remain similar: the uncertainty produced by the ARIMA model is not sufficiently accurate to compensate for forecasting errors.

GP-based resource shaping.

Next, we study the system behavior when using GP regression to predict future resource utilization. Similarly to the ARIMA-based resource shaping, in Figure 4b we demonstrate the effect of the buffer parameters. While GP gives slightly worse results than ARIMA when the uncertainty of the forecast values is not considered ($K_2 = 0$), as $K_2$ increases all metrics improve: the average turnaround ratio increases up to a 10.6x improvement, average slack is reduced to 22%, and application failures quickly decrease.

In our setup, the best performance is achieved when the system is most flexible regarding the size of the buffer, i.e., with a high value for its dynamic component and a small value for its static component.

In summary, the results show that, for the best configuration of parameters with a real predictor and not an oracle, turnaround time and resource slack are more than halved in the median case, both in terms of CPU and memory resources. By using the uncertainty provided by the forecasting model based on the GP, we are able to improve these metrics further, achieving a 10.6x improvement compared to the baseline for the turnaround time.
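To make the role of the two buffer parameters concrete, the following sketch evaluates Equation (9) and adds the buffer to a predicted peak. The function names are hypothetical, and capping the shaped allocation at the original request is an assumption, suggested by the remark that a 100% static term degenerates to the baseline.

```python
def safeguard_buffer(requested, variance, k1=0.05, k2=3.0):
    """Equation (9): static share of the request plus scaled uncertainty."""
    return k1 * requested + k2 * variance


def shaped_allocation(predicted_peak, requested, variance, k1=0.05, k2=3.0):
    """Predicted peak plus buffer, never above the original reservation.

    The cap at `requested` is an assumption of this sketch.
    """
    beta = safeguard_buffer(requested, variance, k1, k2)
    return min(requested, predicted_peak + beta)
```

With a 32GB request, a predicted peak of 10GB and a predictive variance of 0.5, the default parameters give a buffer of 3.1GB and an allocation of 13.1GB; with `k1=1.0` the allocation falls back to the full 32GB reservation.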

We materialize the ideas presented in this paper with a full-fledged, Python-based implementation of our mechanism, following the system design presented in Section 3 and depicted in Figure 1. For this work, we build the resource shaper to interact with the application scheduler presented in [42], which we recently adopted to manage our workloads. In our implementation, the resource shaper modulates both CPU and memory resources.

In our cluster, we use Docker [17] as the back-end and we have investigated how to resize its containers (corresponding to application components). There are two values that Docker uses to check for memory limits: a hard and a soft limit. When the hard limit is surpassed, the container is killed by the OS. Instead, when the soft limit is reached, the OS tries to release some resources first. In our work we use the soft limit value, since the application scheduler we use takes decisions based on such value. In particular, we rely on the OS low-level mechanisms to notify the processes running in the container to free some of their resources. This practice is compatible with runtime mechanisms such as the Java Garbage Collector (GC), which attempts to release allocated but unused memory space. Note that our technique is compatible with approaches such as [30], which trade performance for a smaller memory footprint.

The monitoring component feeds the utilization forecast module with data at regular time intervals. Frequent updates ultimately result in better system efficiency, as the predictor operates on a high-fidelity view of resource utilization in the cluster. However, this might impose a high toll in terms of monitoring scalability. On the other hand, infrequent updates improve scalability at the expense of lower system efficiency and responsiveness. In our implementation, we collect resource utilization information every minute, which is in line with what is done in [62].

Next, we provide additional details of our prototype.
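As an illustration of the soft-limit resizing discussed above, the sketch below builds a `docker update` command line that sets the memory soft limit (`--memory-reservation`) and, optionally, the CPU allowance (`--cpus`). It is not the prototype's code: the helper name and the container name are hypothetical, and whether a given Docker release accepts `--cpus` on `docker update` should be checked against its documentation.

```python
def docker_update_cmd(container, soft_mem_mb, cpus=None):
    """Return the argv list for resizing one container; nothing is executed."""
    cmd = ["docker", "update", "--memory-reservation", f"{int(soft_mem_mb)}m"]
    if cpus is not None:
        cmd += ["--cpus", f"{cpus:g}"]
    cmd.append(container)
    return cmd


# Example: shrink a (hypothetical) Spark worker to a 4096MB soft limit.
print(" ".join(docker_update_cmd("spark-worker-1", 4096, cpus=2.5)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a host where the Docker CLI is available.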

Forecasting module.

It implements the two models we discuss in Section 3.1. For the ARIMA model we use the well-known Statsmodels [55] library, which features an efficient implementation of the ARIMA model, and its automatic parameter tuning through the Pyramid wrapper [24]. For the GP model we use the well-known library GPy [57]. Both models consider a small history of the ten past observations for training, to keep computational complexity under control.
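The idea of forecasting from a short window of past observations can be sketched with plain NumPy. This is a generic GP regression with an exponential kernel over patterns of size h, keeping only the N latest patterns; it is not the prototype's GPy code, and the kernel form, the constant-mean handling, and the default noise and length-scale values are assumptions of the sketch.

```python
import numpy as np


def exp_kernel(A, B, lengthscale=1.0):
    """Exponential kernel on the Euclidean distance between patterns."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / lengthscale)


def gp_forecast(series, h=10, n_patterns=10, noise=1e-2, lengthscale=1.0):
    """One-step-ahead predictive mean and variance from the latest patterns."""
    series = np.asarray(series, dtype=float)
    if len(series) <= h:
        raise ValueError("need more than h observations")
    # Each pattern holds h consecutive values; its target is the next value.
    X = np.array([series[i:i + h] for i in range(len(series) - h)])
    y = series[h:]
    X, y = X[-n_patterns:], y[-n_patterns:]      # keep the N latest patterns
    x_star = series[-h:][None, :]                # most recent window
    y_mean = y.mean()                            # assumed constant mean
    K = exp_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                    # cubic in the window size
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    k_star = exp_kernel(X, x_star, lengthscale)
    mean = y_mean + (k_star.T @ alpha).item()
    v = np.linalg.solve(L, k_star)
    var = (exp_kernel(x_star, x_star, lengthscale) - v.T @ v).item() + noise
    return mean, var
```

Because only `n_patterns` rows enter the Cholesky factorization, the cost per forecast stays bounded no matter how long the monitored series grows.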

Resource shaping module.

It materializes the ideas presented in Section 3.2. The ultimate goal of the resource shaper is that of issuing commands to preempt (kill, in our implementation) an entire application, or individual components thereof, and to resize the resource allocation, as computed by Algorithm 1. It is important to point out that the resource shaper adapts resource allocations only after enough historical data points are available for the forecasting module: we call this a grace period, and set it to 10 minutes in our experiments.

The resource shaper uses the mechanisms exposed by Docker (as discussed above) to adjust application resources and, if needed, to preempt components or entire applications. This module computes a new resource allocation for all running applications in the system, based on the predicted value and variance obtained from the forecasting module. The buffer β is set to compensate for prediction uncertainty, using the parameters that we obtain through simulations, that is $K_1 = 5\%$ and $K_2 = 3$.

We have deployed the mechanism presented in this paper in our cluster, which we operate using [20]. Our goal is to perform a comparative analysis between dynamic resource shaping and a baseline, as done in Section 4. The baseline system supports the concept of distributed applications [42], but follows a reservation-centric approach, in which allocation matches reservation for the entire application lifetime. In our experiments, we consider exactly the same workload trace on both systems, which takes approximately 24 hours from the first submission to the completion of the last application.

[Figure 5 panels: Memory Slack (percentage) and Turnaround (time, s). Legend: Baseline, Dynamic Pessimistic.]

Figure 5: Boxplots comparing baseline vs pessimistic dynamic approach over memory slack and turnaround time distributions using GP-based resource shaping. The red triangle is the mean.

Workload.

We use two representative application templates: 1) an elastic application using the Apache Spark framework; 2) a rigid application using the TensorFlow framework. Similarly to the traces used in Section 4, we set our workload to include 60% of elastic and 40% of rigid applications, for a total of 100 applications. Application inter-arrival times follow a Gaussian distribution with parameters µ = 120 sec and σ = 40 sec, which is compatible with what we observe in our cluster.

Regarding the elastic application template, we consider three use cases. First, we consider an application that builds a random-forest regression model to predict flight delays, using publicly available data from the US DoT [19]. Second, we consider a music recommender system based on the alternating least squares algorithm, using publicly available data from Last.fm [36]. Third, we consider an Extract, Transform and Load (ETL) application. All applications have 3 different flavors: while they all have 3 core components, the number of elastic components varies depending on the flavor. In terms of RAM, all flavors have different reservation values that span from 8GB to 32GB. Instead, using the rigid application template, we train a deep GP model [12], and use a single TensorFlow instance, with 1 worker and 8, 16 or 32GB of RAM depending on the flavor.

Experimental setup.

We run our experiment on an isolated platform (which we use as a testbed for non-production systems) with ten servers, each with an 8-core CPU running at 2.40GHz, 64GB of memory, a 1Gbps Ethernet network fabric and two 1TB hard drives. The servers use Ubuntu 14.04 and Docker 17.09.0. Docker images for the applications are preloaded on each machine to prevent startup delays and network congestion.
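The workload described above (100 applications, 60% elastic and 40% rigid, Gaussian inter-arrival times with µ = 120 sec and σ = 40 sec) could be generated along the following lines. The function name, the fixed seed, and the clipping of negative inter-arrival samples to zero are assumptions of this sketch rather than details taken from the experiments.

```python
import random


def generate_workload(n_apps=100, elastic_share=0.6, mu=120.0, sigma=40.0, seed=0):
    """Return a list of (arrival_time_in_seconds, kind) tuples."""
    rng = random.Random(seed)
    n_elastic = round(n_apps * elastic_share)
    kinds = ["elastic"] * n_elastic + ["rigid"] * (n_apps - n_elastic)
    rng.shuffle(kinds)                       # interleave the two templates
    t, workload = 0.0, []
    for kind in kinds:
        # Gaussian inter-arrival time; negative draws are clipped (assumption).
        t += max(0.0, rng.gauss(mu, sigma))
        workload.append((t, kind))
    return workload
```

With these parameters the last submission lands around 100 × 120 sec, that is a little over three hours after the first one; the remainder of the roughly 24-hour experiment is spent running and draining the queue.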

Summary of results.

Using the FIFO scheduling policy and the GP-based utilization forecasting module, we compare the two systems, baseline and dynamic. Overall, the dynamic system is considerably more efficient and responsive. We measure substantial improvements in terms of resource allocation: indeed, our system can afford to ingest more applications that would otherwise wait to be served. Figure 5 (left) illustrates resource slack, which is roughly 40% lower with our resource shaping mechanism. As a consequence, applications spend less time in the scheduler queue and have shorter turnaround times, as shown in Figure 5 (right). The median turnaround times are ∼[...] shorter. Note also that the tails of the distributions are in favor of our approach. Finally, we report that no application or component failed when using our resource shaping mechanism, configured with the pessimistic preemption policy.

The emergence of "the data-center as a computer" paradigm has led to unprecedented advances in cluster management frameworks that aim at exposing distributed cluster resources to a variety of business-critical and scientific applications. However, the current resource reservation model hinders an efficient use of cluster resources. Resource utilization dynamics induce over-provisioning, which is one of the main culprits of poor efficiency. The problem of underutilization has been addressed by several approaches. For example, the design of economic incentives to steer system operation has led to the development of complex resource markets, e.g. AWS Spot instances, which call for the design of failure-tolerant applications, due to the ephemeral nature of the resources they are offered.

In this work, we presented a mechanism that cooperates with a scheduler to dynamically adjust the resources allocated to an application, so that they closely match those it actually uses throughout its lifecycle.
Our design featured a method to build a statistical model to forecast resource utilization, and a preemption policy that reallocates system resources while minimizing failures.

We have validated our mechanism numerically and with a real experimental campaign. Our simulations shed light on the key role played by our ability to model and use prediction uncertainty, and by the use of strict preemption vs. optimistic concurrency control. We implemented a system prototype of our dynamic allocation mechanism and deployed it in a test environment, where we executed a real workload. Results indicate notably improved system efficiency, which translates into better responsiveness.

References

[1] O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and M. Zhang. Cherrypick: Adaptively unearthing the best cloud configurations for big data analytics. In NSDI, pages 469–482, 2017.
[2] G. Ananthanarayanan, C. Douglas, R. Ramakrishnan, S. Rao, and I. Stoica. True elasticity in multi-tenant data-intensive compute clusters. In Proceedings of the Third ACM Symposium on Cloud Computing, page 24. ACM, 2012.
[3] Apache. Aurora. http://aurora.apache.org/.
[4] Apache. Spark. http://spark.apache.org/.
[5] A. W. S. (AWS). Elastic map reduce (emr). https://aws.amazon.com/emr/.
[6] M. Babaioff, Y. Mansour, N. Nisan, G. Noti, C. Curino, N. Ganapathy, I. Menache, O. Reingold, M. Tennenholtz, and E. Timnat. Era: A framework for economic resource allocation for the cloud. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 635–642. International World Wide Web Conferences Steering Committee, 2017.
[7] E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou. Apollo: Scalable and coordinated scheduling for cloud-scale computing. In OSDI, volume 14, pages 285–300, 2014.
[8] G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis, Forecasting and Control. Wiley Series in Probability and Statistics. Wiley, 2008.
[9] P. J. Brockwell and R. A. Davis. Introduction to Time Series and Forecasting, Second Edition. Springer, 2002.
[10] K. Chalupka, C. K. I. Williams, and I. Murray. A framework for evaluating approximation methods for gaussian process regression. J. Mach. Learn. Res., 14(1):333–350, Feb. 2013.
[11] C. Curino, D. E. Difallah, C. Douglas, S. Krishnan, R. Ramakrishnan, and S. Rao. Reservation-based scheduling: If you're late don't blame us! In Proceedings of the ACM Symposium on Cloud Computing, pages 1–14. ACM, 2014.
[12] K. Cutajar, E. Bonilla, P. Michiardi, and M. Filippone. Random feature expansions for deep Gaussian processes. In ICML 2017, 34th International Conference on Machine Learning, 6-11 August 2017, Sydney, Australia, Sydney, AUSTRALIA, 08 2017.
[13] P. Delgado, F. Dinu, D. Didona, and W. Zwaenepoel. Eagle: A better hybrid data center scheduler. Technical report, Tech. Rep, 2016.
[14] P. Delgado, F. Dinu, A.-M. Kermarrec, and W. Zwaenepoel. Hawk: Hybrid datacenter scheduling. In USENIX Annual Technical Conference (USENIX ATC'15), pages 499–510, 2015.
[15] M. Dell'Amico, D. Carra, and P. Michiardi. Psbs: Practical size-based scheduling. IEEE Transactions on Computers, 65(7):2199–2212, 2016.
[16] M. Dell'Amico, D. Carra, M. Pastorelli, and P. Michiardi. Revisiting size-based scheduling with estimated job sizes. In Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2014 IEEE 22nd International Symposium on
Advances in Neural Information Processing Systems. MIT Press, 2007.
[22] R. Frigola-Alcalde. Bayesian Time Series Learning with Gaussian Processes. PhD thesis, University of Cambridge, 2015.
[23] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI
ACM SIGCOMM Computer Communication Review, volume 44, pages 455–466. ACM, 2014.
[29] J. Gu, Y. Lee, Y. Zhang, M. Chowdhury, and K. G. Shin. Efficient memory disaggregation with infiniswap. In NSDI, pages 649–667, 2017.
[30] W. U. Hassan and W. Zwaenepoel. Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling. In USENIX Annual Technical Conference (USENIX ATC 17), 2017.
[31] B. Hindman et al. Mesos: A platform for fine-grained resource sharing in the data center. In Proc. of the USENIX NSDI 2011, NSDI'11, pages 295–308, Berkeley, CA, USA, 2011. USENIX Association.
[32] R. J. Hyndman, Y. Khandakar, et al. Automatic time series for forecasting: the forecast package for R. Number 6/07. Monash University, Department of Econometrics and Business Statistics, 2007.
[33] K. Karanasos, S. Rao, C. Curino, C. Douglas, K. Chaliparambil, G. M. Fumarola, S. Heddaya, R. Ramakrishnan, and S. Sakalanaga. Mercury: Hybrid centralized and distributed scheduling in large shared clusters. In USENIX Annual Technical Conference, pages 485–497, 2015.
[34] A. Kuzmanovska, R. H. Mak, and D. Epema. Dynamically scheduling a component-based framework in clusters. In Workshop on Job Scheduling Strategies for Parallel Processing, pages 129–146. Springer, 2014.
[35] A. Kuzmanovska, R. H. Mak, and D. Epema. Koala-f: A resource manager for scheduling frameworks in clusters. In Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on
Handbook of scheduling: algorithms, models, and performance analysis. CRC Press, 2004.
[38] D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis. Heracles: improving resource efficiency at scale. In ACM SIGARCH Computer Architecture News, volume 43, pages 450–462. ACM, 2015.
[39] D. J. C. MacKay. Information Theory, Inference & Learning Algorithms. Cambridge University Press, 2003.
[40] H. Mao, M. Alizadeh, I. Menache, and S. Kandula. Resource management with deep reinforcement learning. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks, pages 50–56. ACM, 2016.
[41] A. McHutchon. Nonlinear Modelling and Control using Gaussian Processes. PhD thesis, University of Cambridge, 2015.
[42] F. Pace, D. Venzano, D. Carra, and P. Michiardi. Flexible scheduling of distributed analytic applications. In CCGRID 2017, 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 14-17, 2017, Madrid, Spain, Madrid, SPAIN, 05 2017.
[43] P. Padala, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, and K. Salem. Adaptive control of virtualized resources in utility computing environments. In ACM SIGOPS Operating Systems Review, volume 41, pages 289–302. ACM, 2007.
[44] M. Pastorelli, D. Carra, M. Dell'Amico, and P. Michiardi. Hfsp: bringing size-based scheduling to hadoop. IEEE Transactions on Cloud Computing, 5(1):43–56, 2017.
[45] M. Pastorelli, M. Dell'Amico, and P. Michiardi. Os-assisted task preemption for hadoop. In Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on, pages 94–99. IEEE, 2014.
[46] Pyramid. Auto-arima.
[47] J. Quiñonero Candela and C. E. Rasmussen. A unifying view of sparse approximate gaussian process regression. J. Mach. Learn. Res., 6:1939–1959, Dec. 2005.
[48] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In NIPS, 2007.
[49] J. Rasley, K. Karanasos, S. Kandula, R. Fonseca, M. Vojnovic, and S. Rao. Efficient queue management for cluster scheduling. In Proceedings of the Eleventh European Conference on Computer Systems, page 36. ACM, 2016.
[50] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[51] RDocumentation. Auto-arima.
[52] C. Reiss et al. Google cluster-usage traces: format + schema. Technical report, Google Inc., Nov. 2011.
[53] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the Third ACM Symposium on Cloud Computing, page 7. ACM, 2012.
[54] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 351–364. ACM, 2013.
[55] S. Seabold and J. Perktold. Statsmodels: Econometric and statistical modeling with python. In , 2010.
[56] M. Shahrad, C. Klein, L. Zheng, M. Chiang, E. Elmroth, and D. Wentzlaf. Incentivizing self-capping to increase cloud utilization. In ACM Symposium on Cloud Computing 2017 (SoCC'17). Association for Computing Machinery (ACM), 2017.
[57] Sheffield. Gpy. https://sheffieldml.github.io/GPy/.
[58] E. Snelson and Z. Ghahramani. Sparse gaussian processes using pseudo-inputs. In Proceedings of the 18th International Conference on Neural Information Processing Systems, NIPS, pages 1257–1264, Cambridge, MA, USA, 2005. MIT Press.
[59] A. Svensson, A. Solin, S. Särkkä, and T. Schön. Computationally Efficient Bayesian Learning of Gaussian Process State Space Models. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 213–221. PMLR, 2016.
[60] R. Turner, M. Deisenroth, and C. Rasmussen. State-Space Inference and Learning with Gaussian Processes. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 868–875. PMLR, 2010.
[61] V. K. Vavilapalli et al. Apache hadoop yarn: Yet another resource negotiator. In Proc. of the ACM SoCC 2013, page 5. ACM, 2013.
[62] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-scale cluster management at google with borg. In Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM, 2015.
[63] J. Wilkes. More Google cluster data. Google research blog, Nov. 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.
[64] Y. Yan, Y. Gao, Y. Chen, Z. Guo, B. Chen, and T. Moscibroda. Tr-spark: Transient computing for big data analytics. In Proceedings of the Seventh ACM Symposium on Cloud Computing, pages 484–496. ACM, 2016.
[65] Y. Yang, G.-W. Kim, W. W. Song, Y. Lee, A. Chung, Z. Qian, B. Cho, and B.-G. Chun. Pado: A data processing engine for harnessing transient resources in datacenters. In Proceedings of the Twelfth European Conference on Computer Systems, pages 575–588. ACM, 2017.
[66] Y. Zhang, G. Prekas, G. M. Fumarola, M. Fontoura, Í. Goiri, and R. Bianchini. History-based harvesting of spare cycles and storage in large-scale datacenters. In