Optimal Resource Allocation for Elastic and Inelastic Jobs
Benjamin Berg, Mor Harchol-Balter, Benjamin Moseley, Weina Wang, Justin Whitehouse
Benjamin Berg∗, Carnegie Mellon University, Pittsburgh, PA, [email protected]
Mor Harchol-Balter†, Carnegie Mellon University, Pittsburgh, PA, [email protected]
Benjamin Moseley‡, Carnegie Mellon University, Pittsburgh, PA, [email protected]
Weina Wang, Carnegie Mellon University, Pittsburgh, PA, [email protected]
Justin Whitehouse§, Carnegie Mellon University, Pittsburgh, PA, [email protected]
ABSTRACT
Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to parallelizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although job classes can differ drastically, they are typically forced to share a single cluster. When sharing a cluster among heterogeneous jobs, one must decide how to allocate servers to each job at every moment in time. In this paper, we design and analyze allocation policies which aim to minimize the mean response time across jobs, where a job's response time is the time from when it arrives until it completes.

We model this problem in a stochastic setting where each job may be elastic or inelastic. Job sizes are drawn from exponential distributions, but are unknown to the system. We show that, in the common case where elastic jobs are larger on average than inelastic jobs, the optimal allocation policy is Inelastic-First, giving inelastic jobs preemptive priority over elastic jobs. We obtain this result by introducing a novel sample path argument. We also show that there exist cases where Elastic-First (giving priority to elastic jobs) performs better than Inelastic-First. We then provide the first analysis of mean response time under both Elastic-First and Inelastic-First by leveraging recent techniques for solving high-dimensional Markov chains.
ACM Reference Format:
Benjamin Berg, Mor Harchol-Balter, Benjamin Moseley, Weina Wang, and Justin Whitehouse. 2020. Optimal Resource Allocation for Elastic and Inelastic Jobs.

∗ This author is supported in part by the Facebook Graduate Fellowship.
† This author is supported in part by NSF-CMMI-1938909, NSF-XPS-1629444, and NSF-CSR-1763701.
‡ This author is supported in part by a Google Research Award, an Infor Research Award, a Carnegie Bosch Junior Faculty Chair and NSF grants CCF-1824303, CCF-1845146, CCF-1733873 and CMMI-1938909.
§ This author is supported in part by the National Science Foundation Graduate Research Fellowship Program under grant DGE 1745016.
1 INTRODUCTION

Modern data centers are tasked with processing astonishingly diverse workloads on a common set of shared servers [50]. These jobs differ not only in their resource requirements on a single server, but also in how effectively they scale across multiple servers [12]. For instance, a simple client query may not be parallelizable, but it may complete in just milliseconds on a single server. Conversely, a data intensive job may run for hours even when parallelized across dozens of servers [41]. The challenge facing system architects is to build data centers which, in light of this heterogeneity, achieve low response time – the time from when a job enters the system until it is completed.

The state-of-the-art in many data centers is to allow users to specify their own server requirements, and then over-provision the system. By always ensuring that idle servers are available, system designers avoid having to make tough resource allocation decisions, and users always receive the resources they request. Unfortunately, these over-provisioned systems are expensive to build and waste resources [17]. Most large-scale data centers, for example, run at an average utilization of less than 20% [50].

To reduce this waste, many cluster scheduling systems have been proposed in the literature [12, 30, 38, 39, 41, 47]. These scheduling systems aim to maintain low response times without having to over-provision the system. One way to achieve this goal [12, 47] is to have the system scheduler determine resource allocations rather than allowing users to reserve resources. While these schedulers often work well in practice, none of them offer theoretical response time guarantees.
We propose a simple model of heterogeneous traffic running in a multiserver data center. Our goal is to design a resource allocation policy which dynamically allocates servers to jobs in order to minimize the mean response time across jobs. We assume jobs are preemptible, and that an allocation policy can change a job's server allocation over time. In particular, we will consider a system of k servers which processes jobs that arrive over time from a workload consisting of two distinct job classes. The first class of jobs, which we call elastic, consists of jobs which can run on any set of servers at any moment in time. We assume that elastic jobs experience a speedup factor proportional to the number of servers they run on. That is, an elastic job which completes in 2 seconds on a single server would complete in 1 second on 2 servers, or .5 seconds on 4 servers. The second class of jobs, which we refer to as inelastic, consists of jobs which are not parallelizable. While an inelastic job can run on any server, it can only run on a single server at any moment in time. A resource allocation policy must determine, at every moment in time, how to allocate servers to each job in the system, both elastic and inelastic.

In practice, each job also has some amount of inherent work associated with it. This inherent work, which we call a job's size, determines how long it would take to complete the job on a single server. We assume that job sizes in our model are unknown to the system, but are drawn independently for each job from an exponential distribution. To further model the heterogeneity of a workload, we allow elastic and inelastic job sizes to be drawn from two different exponential distributions, with rates µ_E and µ_I respectively.

Even given the simplicity of the model above, devising an optimal scheduling policy is non-trivial. For instance, consider the problem of dividing k servers between one elastic job and one inelastic job which are both of size 1.
On the one hand, we know that completing jobs quickly benefits mean response time, so one might think to run the elastic job on all k servers before running the inelastic job. On the other hand, this schedule leaves k − 1 servers idle while the inelastic job runs. One can construct a more efficient schedule by running the elastic and inelastic jobs simultaneously, giving k − 1 servers to the elastic job and one server to the inelastic job.

It is common to find systems which use a shared set of servers to process both elastic and inelastic jobs. Typically in such settings the elastic jobs have more inherent work than the inelastic jobs. For example, consider a cluster which must process a stream of many MapReduce jobs [11]. From the cluster's point of view, this workload produces a stream of map stages and reduce stages. Map stages (elastic) are designed to be parallelized across any number of servers and do a large amount of processing. Reduce stages (inelastic) are inherently sequential and do much less total work than a map stage. As another example, modern machine learning frameworks [41] advocate the use of a single platform for both the training and serving of models. Training jobs (elastic) are large, requiring large data sets and many training epochs. Distributed training methods such as distributed stochastic gradient descent are also designed to scale out across an arbitrary number of nodes [36]. Once a model has been trained, serving the model (inelastic), which consists of feeding a computed model a single data point in order to retrieve a single prediction, is done sequentially and requires comparatively little processing power.

It is less common in practice for elastic jobs to be smaller than inelastic jobs, given the overhead involved in writing code that can be parallelized. If the amount of inherent work required for a job is small to begin with, system developers may not choose to add the additional data structures and synchronization mechanisms that would be required to make the job elastic. One exception is HPC workloads.
In this setting, there are often both malleable jobs (elastic) [19] and jobs with hard requirements (inelastic). While malleable jobs are designed to run on any number of cores, jobs with hard requirements demand a fixed number of cores. In this case, it is unclear which class of jobs we would expect to involve more inherent work.

The model presented in this paper is flexible enough to capture all of the above examples.
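Returning to the two-job example above, the trade-off can be made concrete with a quick calculation. The sketch below (our own illustration, not from the paper; the function names are ours) compares the mean response time of running the elastic job on all k servers first against the concurrent schedule that reserves one server for the inelastic job, for two jobs of size 1.

```python
def serial_schedule(k):
    """Elastic job on all k servers first, then the inelastic job alone."""
    t_elastic = 1 / k                     # elastic job parallelizes linearly
    t_inelastic = t_elastic + 1           # inelastic job waits, then runs alone
    return (t_elastic + t_inelastic) / 2  # mean response time of the two jobs

def concurrent_schedule(k):
    """Inelastic job on 1 server, elastic job on the remaining k - 1 servers."""
    t_inelastic = 1
    t_elastic = 1 / (k - 1)
    return (t_elastic + t_inelastic) / 2

for k in (2, 4, 16):
    print(k, serial_schedule(k), concurrent_schedule(k))
```

For k = 2 the two schedules tie, but for k ≥ 3 the concurrent schedule wins for these sizes: it avoids leaving k − 1 servers idle while the inelastic job runs. Which choice is better in general depends on the relative job sizes and on future arrivals, which is exactly the trade-off the allocation policies in this paper navigate.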
There has been a sizable amount of work that considers the problem of scheduling jobs onto k parallel servers. The vast majority of this work has considered only inelastic jobs of known sizes, and has focused on worst-case analysis. Given the optimality of the Shortest-Remaining-Processing-Time (SRPT) policy in the degenerate case where k = 1, it is natural to consider generalizations of SRPT to the case where k ≥ 2. Specifically, one might consider a policy called SRPT-k [18] which runs the k jobs with the shortest remaining processing times at every moment in time. Unfortunately, [35] shows that SRPT-k can be arbitrarily far from optimal. In fact, SRPT-k has a competitive ratio of Θ(log min(p, n/k)), where n is the number of jobs and p is the ratio of the maximum job size to the minimum job size. Additionally, [35] shows that this competitive ratio is a tight lower bound – no online algorithm can do better in the worst case. Using speed augmentation, SRPT-k is known to be constant competitive with 1 + ϵ speed for any constant ϵ > 0.

Other work has considered scheduling parallelizable jobs of known sizes onto k parallel servers. This work assumes that each job has an arbitrary speedup curve which dictates its running time as a function of the number of servers on which it runs. Again using worst-case analysis, [15] shows how to achieve a constant (depending on ϵ) competitive ratio using (1 + ϵ)-speed servers. Without using resource augmentation, [31] provides an algorithm with a competitive ratio of O(log p), where again p is the ratio of the largest job size to the smallest job size. This competitive ratio essentially matches the known worst-case lower bound for the problem.

The above results suggest that, without resource augmentation, there is little room to improve the worst-case performance of scheduling policies for parallelizable jobs. This is because the aforementioned lower bounds for worst-case scheduling directly apply to the case where jobs are given speedup curves. However, from the point of view of system designers, this problem remains unsolved! In particular, a competitive ratio of log p [31] can be arbitrarily high when job sizes span a wide range, which is common in practice. Thus, a log p-competitive algorithm could be impractical. Additionally, the results in [15] use an elegant algorithm that is interesting theoretically, but the algorithm is difficult to implement due to frequent context switches.
The problem is that results like [15, 31] and others (see Section 3) perform badly on adversarial cases which are uncommon in practice. We therefore propose shifting to stochastic analysis, which discounts the impact of these adversarial cases. By considering a stochastic analysis, there is the potential to reveal new algorithmic insights into the problem. It could even be possible to find online algorithms that are optimal in expectation.

There has been recent work aimed at allocating servers to parallelizable jobs in a stochastic setting in order to minimize mean response time [7]. However, this line of work is in an early stage. Specifically, [7] only considers the case where all jobs are homogeneous with respect to job size and job speedup. While [7] is able to derive the optimal policy in this simpler case, they explicitly note the complexity of handling even just two different classes of jobs. In particular, the problem of allocating servers to both elastic and inelastic jobs in a stochastic setting remains completely open. Although [7] presents some approximate numerical analysis of the case where jobs are heterogeneous, the techniques used are computationally intensive and offer no guarantees of accuracy.
This paper addresses the problem of allocating servers to both elastic and inelastic jobs. Section 2 introduces our stochastic model of elastic and inelastic jobs of unknown sizes which arrive over time to a system composed of k servers. Using this model, we then present the following results:

• We propose two natural server allocation policies which aim to minimize the mean response time across jobs. First, the Elastic-First policy gives strict preemptive priority to elastic jobs and aims to minimize mean response time by maximizing the rate at which jobs depart the system. Second, the Inelastic-First policy gives strict preemptive priority to inelastic jobs. By deferring elastic work for as long as possible, Inelastic-First maximizes system efficiency. It is not immediately obvious if either of these policies is optimal, or which policy is better.

• We show in Section 4.1 that if elastic and inelastic jobs follow the same exponential size distribution, Inelastic-First is optimal with respect to mean response time. This argument uses precedence relations to show that deferring elastic work increases the long run efficiency of the system.

• Next, in Section 4.2, we show that in the case where elastic jobs are larger on average than inelastic jobs, Inelastic-First is optimal with respect to mean response time. This requires the introduction of a novel sample path argument. Our key insight is that Inelastic-First minimizes the expected amount of inelastic work in the system as well as the expected total work in the system. As long as elastic jobs are larger than inelastic jobs on average, this suffices for minimizing mean response time.

• In the case where elastic jobs are smaller on average than inelastic jobs, Inelastic-First is no longer optimal. We illustrate this via a counterexample in Section 4.3 which shows that Elastic-First can outperform Inelastic-First. In order to determine when Elastic-First outperforms Inelastic-First, we perform the first analysis of both the Elastic-First and Inelastic-First allocation policies in Section 5. This analysis leverages recent techniques for solving high-dimensional Markov chains. Our analytical results match simulation.

• For the sake of completeness, we also consider the case where job sizes are known and jobs arrive at time 0. Here we use worst-case analysis. Using standard dual-fitting techniques (e.g. [4, 5]), we show SRPT-k is a 4-approximation for the objective of minimizing mean response time. This demonstrates the need for stochastic modeling and analysis. Indeed, the stochastic setting yields optimality results without resorting to approximations. Due to lack of space, this final contribution is saved for Appendix A.

¹The algorithm of [15] is a generalization of equipartition, splitting the system evenly among a fraction of the jobs in the system.
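The two policies compared above are easy to experiment with. The sketch below (our own illustration, not the paper's analysis; all parameter values are illustrative) simulates the Markov-chain view of the system, formalized in Section 2, tracking the number of inelastic and elastic jobs under Inelastic-First and Elastic-First with k = 2, λ_I = λ_E = 0.3, µ_I = 2, and µ_E = 1, so elastic jobs are larger on average. By Little's Law, the mean response time is the time-average number in system divided by λ_I + λ_E.

```python
import random

K, LAM_I, LAM_E, MU_I, MU_E = 2, 0.3, 0.3, 2.0, 1.0  # illustrative parameters

def inelastic_first(i, j):
    """IF: one server per inelastic job (up to K); leftovers go to an elastic job."""
    pi_i = min(i, K)
    return pi_i, (K - pi_i) if j > 0 else 0

def elastic_first(i, j):
    """EF: all K servers go to the elastic job at the head of the queue."""
    return (0, K) if j > 0 else (min(i, K), 0)

def mean_jobs(policy, t_end=100_000.0, seed=7):
    """Simulate the CTMC (N_I(t), N_E(t)); return the time-average number in system."""
    rng = random.Random(seed)
    i = j = 0
    t = area = 0.0
    while t < t_end:
        pi_i, pi_e = policy(i, j)
        rates = (LAM_I, LAM_E, pi_i * MU_I, pi_e * MU_E)
        total = sum(rates)
        dt = rng.expovariate(total)       # time until the next transition
        area += (i + j) * dt
        t += dt
        u = rng.random() * total          # pick which transition occurred
        if u < rates[0]:
            i += 1
        elif u < rates[0] + rates[1]:
            j += 1
        elif u < rates[0] + rates[1] + rates[2]:
            i -= 1
        else:
            j -= 1
    return area / t

n_if = mean_jobs(inelastic_first)
n_ef = mean_jobs(elastic_first)
lam = LAM_I + LAM_E
print(f"mean response time:  IF {n_if / lam:.3f}   EF {n_ef / lam:.3f}")
```

The printed numbers are Monte Carlo estimates; at these parameters (µ_I > µ_E), IF comes out ahead, consistent with the optimality result of Section 4.2. Section 5 analyzes the response times of both policies exactly.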
2 MODEL

We consider a model where jobs arrive over time to a system of k identical servers. Each job has an associated amount of inherent work which we refer to as the job size. We assume that each of the k servers processes jobs with a rate of 1 unit of work per second. Hence, a job's size is equal to its running time on a single server. We assume that job sizes are unknown to the system, and are drawn from exponential distributions.

Each job may be either elastic or inelastic. We assume that elastic jobs arrive according to a Poisson process with rate λ_E, and that elastic job sizes are drawn independently from an exponential distribution with rate µ_E. Similarly, inelastic jobs arrive independently according to a Poisson process with rate λ_I, and inelastic job sizes are drawn independently from an exponential distribution with rate µ_I. We let S_E and S_I be random variables representing the initial sizes of an elastic job or an inelastic job respectively.

Every elastic job can run on any number of servers at any moment in time. Because each server processes work at rate 1, n servers process work at a rate of n units of work per second. Hence, an elastic job of size x completes in x seconds on a single server but completes in x/n seconds on n servers. By contrast, inelastic jobs can run on at most one server at any moment in time.

We note that all of the results presented in this paper hold equally if inelastic jobs can run on up to some fixed number of servers, C. If C ≥ k, there is effectively no difference between elastic and inelastic jobs, since we can never allocate more than k servers in total. If C < k, we can simply renormalize our allocation policies to consider allocating in units of C servers, of which there are k/C in total.
After renormalizing, inelastic jobs can once again receive up to one unit of allocation while elastic jobs can receive any number of units of allocation. While our results do not depend on the value of C, we consider the case where C = 1 throughout the paper.

An allocation policy, π, must determine how many servers to allocate to each job at any moment in time t. Specifically, π can increase or decrease the allocation to a particular job as it runs. We assume that servers are capable of time sharing, and thus an allocation policy may allocate a fractional number of servers to any job. For any n ∈ R_≥0, we assume that an allocation of n servers processes work at a rate of n units of work per second. At any moment in time, t, an allocation policy can allocate at most 1 server to each inelastic job, and at most k servers in total.

We can model this system under any policy π as a continuous time Markov chain where each state denotes the number of elastic and inelastic jobs currently in the system. That is, we define a continuous time Markov process {(N_I^π(t), N_E^π(t)) : t ≥ 0} where (N_I^π(t), N_E^π(t)) ∈ Z²_≥0 for all t ≥ 0. Here, we define N_I^π(t) to be the number of inelastic jobs in the system at time t, and we define N_E^π(t) to be the number of elastic jobs in the system at time t. We therefore let the state (N_I^π(t), N_E^π(t)) = (i, j) denote that there are i inelastic jobs and j elastic jobs currently in the system.

Because job sizes are exponential and arrivals occur according to a Poisson process, at any moment in time t, the distributions of remaining job sizes and the distributions of times until the next arrival for each job class can be fully specified by the numbers of inelastic jobs and elastic jobs in the system. Hence, we will only consider policies which are stationary and deterministic, meaning the policy π makes the same allocation decision at every time t, given that the system is in state (i, j).
Specifically, we define π_I(i, j) to be the number of servers allocated to inelastic jobs in state (i, j) under policy π, and we define π_E(i, j) to be the number of servers allocated to elastic jobs in state (i, j) under policy π. Note that

π_I(i, j) ≤ i for all (i, j) ∈ Z²_≥0,
π_E(i, j) ≤ k · 1{j > 0} for all (i, j) ∈ Z²_≥0, and
π_I(i, j) + π_E(i, j) ≤ k for all (i, j) ∈ Z²_≥0.

In general, π_I(i, j) + π_E(i, j) could be less than k if there are not a sufficient number of jobs to use all k servers, or if π chooses to idle servers instead of allocating them to an eligible job.

We refer to a policy π as work conserving if and only if, in any state (i, j),

π_I(i, j) + π_E(i, j) ≥ min(i, k), and
π_I(i, j) + π_E(i, j) = k whenever j > 0.

That is, π never leaves servers idle if there is an eligible job in the system. In Appendix B we show that there exists an optimal policy which is also work conserving. It therefore suffices to only consider work conserving policies throughout our analysis.

We define the system load, ρ, to be

ρ ≡ λ_I/(kµ_I) + λ_E/(kµ_E).    (1)

In Appendix C we show that for any work conserving policy, π, (N_I^π(t), N_E^π(t)) is an ergodic Markov chain if ρ <
1. Because there exists an optimal work conserving policy, ρ < 1 is necessary for stability under any policy π′. We therefore only consider the regime where ρ < 1.

We define the total number of jobs in the system at time t under policy π, N^π(t), as N^π(t) = N_I^π(t) + N_E^π(t). We also define W^π(t) to be the total work in the system under policy π at time t, where total work is the sum of the remaining sizes of all jobs in the system. Similarly, we let W_E^π(t) and W_I^π(t) be the total elastic work and the total inelastic work in the system under policy π at time t. These quantities are the sums of the remaining sizes of all elastic or inelastic jobs respectively. When referring to the corresponding steady-state quantities, we omit the argument t.

We define the random variable T^π to be the response time of a job which arrives to the system in steady-state under policy π. Here, the response time of a job is the time from when the job arrives until it is completed (i.e. its remaining size is 0). Our goal is to find the policy which minimizes the mean response time.

We will investigate the performance of two allocation policies, Elastic-First (EF) and Inelastic-First (IF). EF gives strict preemptive priority to elastic jobs, and processes jobs in first-come-first-serve (FCFS) order within each job class. That is, in any state (i, j) where j > 0, EF allocates all k servers to the elastic job with the earliest arrival time. In any state (i, j) where j = 0, EF allocates one server to each inelastic job, in FCFS order, until either all jobs have received a server or all k servers have been allocated. By contrast, IF gives strict preemptive priority to inelastic jobs while processing jobs in FCFS order within each job class. Under IF, in any state (i, j) where i < k, one server is allocated to each inelastic job and the remaining k − i servers are allocated to the elastic job with the earliest arrival time, if there is one.
In any state (i, j) where i ≥ k, all k servers are allocated to the inelastic jobs with the k earliest arrival times.

3 PRIOR WORK

Although many real-world systems are tasked with allocating servers to heterogeneous workloads, these systems do not allocate servers optimally in order to minimize the mean response time across jobs. Most large-scale cluster schedulers allow users to explicitly reserve the number of servers they want [30, 38, 39, 41, 50], only allowing the system to choose the placement of each job onto its requested number of servers. Some systems have proposed allowing the system to determine the number of servers allocated to each job [12, 37, 47] in order to reduce response times. However, these systems rely on heuristics and do not make theoretical guarantees.

In the theoretical literature, the closest work to the results presented in this paper comes from the stochastic performance modeling community. In particular, [7] develops a model of jobs whose sizes are drawn from an exponential distribution and which receive a sublinear speedup from being allocated additional servers. However, [7] only provides optimality results when jobs are homogeneous, following a single speedup function and a single exponential size distribution. We emphasize that our paper is the first ever to consider more than one speed-up curve in the setting with stochastic arrivals over time and stochastic job sizes. Essentially all other work in the stochastic community has considered non-parallelizable inelastic jobs. Much of the prior work has been limited to scheduling jobs on a single server [10]. While there has certainly been work on scheduling in stochastic multiserver systems (e.g. [1, 6, 18, 22, 24, 29]), this literature assumes that a job occupies at most one server at a time (that is, all jobs are inelastic). One notable model that considers jobs that run on multiple servers is the queueing model motivated by MapReduce [32, 42, 51].
This work assumes that each job consists of a set of pieces that can be processed on different machines at the same time. These pieces can be processed in any order and, critically, a job only completes when all of its pieces have completed. This model can only be analyzed exactly when the number of servers is k = 2.

In the worst-case setting with inelastic jobs, the best possible competitive ratio for mean response time is Θ(log min(p, n/k)), where n is the number of jobs and p is the ratio of the maximum job size to the minimum job size. The policy which achieves the best competitive ratio is SRPT-k, which at every moment schedules the k jobs with the smallest remaining processing times.

Several prior works have also considered scheduling parallelizable jobs in the worst-case setting. The speed-up curve model was first addressed by [13]. The best result for mean response time is [15], which gave a constant competitive algorithm with minimal speed augmentation. This paper introduced the influential LAPS scheduling algorithm that has been used in a variety of settings [14, 20]. The work of [31] considers the problem without speed augmentation and gives an O(log p)-competitive algorithm with mild assumptions on the speed-up curves. Recently, there has been a line of work on the Directed-Acyclic-Graph (DAG) model for parallelism. Here a constant competitive algorithm with 1 + ϵ speed augmentation is known [3]. The work of [2] gave an O(1)-speed, O(1)-competitive algorithm for mean response time that is practical, using minimal preemptions. Note, however, that the best possible competitive ratio in any model with release times is still lower bounded by Θ(log min(p, n/k)), since all jobs could be inelastic in the worst case.

The following sections establish two results. First, we show that if µ_I ≥ µ_E, then IF is optimal for minimizing mean response time. Second, we show that if µ_I < µ_E, then IF is not necessarily optimal.

In Section 4.1, we consider the special case where µ_I = µ_E. In this case, where we have homogeneous sizes, the analysis is particularly easy.
Unfortunately, the technique used to demonstrate optimality, which is based on the notion of precedence relations in continuous time Markov chains, does not extend to the case where µ_I ≠ µ_E.

In Section 4.2, we consider the case where µ_I ≥ µ_E. Here, we introduce a novel sample path argument which allows us to demonstrate the optimality of IF.

Lastly, in Section 4.3, we consider the case where µ_I < µ_E. Here, we construct a very simple example demonstrating that IF is not optimal in this environment. Furthermore, in this example, we show that the policy EF actually outperforms IF. We do not know what policy is optimal in this regime.

4.1 µ_I = µ_E

We first consider the case where µ_I = µ_E. In this case, IF is optimal with respect to minimizing mean response time. As stated in Section 1.2, the optimal policy should balance the trade-off between completing jobs quickly and preserving system efficiency. When µ_I = µ_E, IF maximizes system efficiency without reducing the overall completion rate of jobs. We argue this formally in Theorem 1 by leveraging a result from [7].

Theorem 1. IF is optimal with respect to minimizing mean response time when µ_I = µ_E.

Proof.
Consider the server allocations made by a policy π in any state (i, j). We define the total rate of departures under π in the state (i, j) to be

d^π(i, j) = π_E(i, j) · µ_E + π_I(i, j) · µ_I.

Following the terminology of [7], we say that π is in the class of GREEDY policies if

d^π(i, j) = max_{π′} d^{π′}(i, j) for all (i, j) ∈ Z²_≥0.

That is, a policy is in GREEDY if it achieves the maximal rate of departures in every state.

Furthermore, [7] defines a class of policies called GREEDY*. A policy is said to be in GREEDY* if, in every state (i, j), it minimizes the number of servers allocated to elastic jobs while still maximizing the total rate of departures. That is, a policy π is in GREEDY* iff

π_E(i, j) = min_{π′ ∈ GREEDY} π′_E(i, j) for all (i, j) ∈ Z²_≥0.

It is shown in [7], using precedence relations, that for any policy π ∈ GREEDY*,

E[T^π] = min_{π′ ∈ GREEDY} E[T^{π′}].    (2)

To leverage this result, we note that when µ_I = µ_E in our model, a policy is in GREEDY if and only if it does not idle servers unnecessarily.

We now argue that IF, which is non-idling, must be in GREEDY*. In states where IF allocates zero servers to elastic jobs, IF_E(i, j) is clearly minimal. In any state (i, j) where IF_E(i, j) > 0, servers cannot be reallocated from elastic jobs to inelastic jobs, since all i inelastic jobs must already be in service. Hence, reducing IF_E(i, j) in this case results in a policy which is not in GREEDY. IF_E(i, j) is therefore minimal amongst GREEDY policies in any state (i, j), and IF is in GREEDY*.

We show in Appendix B that there exists an optimal policy which is non-idling. Hence, when µ_I = µ_E, there is an optimal policy in GREEDY. This implies that there must be an optimal policy in GREEDY* as well. Because any policy in GREEDY* has the same rate of departures of elastic and inelastic jobs in every state (i, j), every policy in GREEDY* has the same mean response time. Thus, IF, which is in GREEDY*, is optimal with respect to mean response time. □
Why the prior argument does not generalize
Unfortunately, the results of [7] do not extend to the case where µ_I ≠ µ_E. In particular, the proof of (2) uses a precedence relation between any two states (i, j − 1) and (i − 1, j). This claim essentially states that a policy π in state (i, j) would perform better by transitioning to state (i − 1, j) than it would by transitioning to state (i, j − 1). In the case where µ_I = µ_E, this makes perfect intuitive sense. In this case, both states (i − 1, j) and (i, j − 1) contain the same amount of expected total work. Hence, it is better to be in state (i − 1, j), which benefits from having an additional elastic job. Consider how this intuition changes when µ_I > µ_E. In this case, state (i, j − 1) has less expected total work, but state (i − 1, j) has more expected elastic work. It turns out that the precedence relation shown in [7] no longer holds when µ_I ≠ µ_E. Moreover, even if the precedence relations were to hold when µ_I > µ_E, [7] would yield that GREEDY* is optimal amongst GREEDY policies, not optimal amongst all policies. We must therefore devise a new argument to reason about the optimal allocation policy when elastic and inelastic jobs follow different size distributions.

4.2 µ_I ≥ µ_E

We will show IF is optimal in the more general case of µ_I ≥ µ_E. While our goal is to minimize mean response time, we note that via Little's Law [25], it suffices to minimize the mean total number of jobs in the system.

First, we start by defining a class of policies P which serve inelastic jobs on a first-come-first-serve (FCFS) basis; elastic jobs can be served in any order. In more detail, a policy π is said to be in class P if the following hold true:

(1) π is work-conserving.
(2) π serves inelastic jobs in FCFS order.
In particular, if π allocates N servers to inelastic jobs at time t (N may be fractional, and there may be more than N inelastic jobs in the system), the allocation must give ⌊N⌋ servers to the ⌊N⌋ inelastic jobs with the earliest arrival times. If there is a remaining fraction of a server, it may then be allocated to the inelastic job with the next earliest arrival time.

Clearly, IF ∈ P.

Road map:
Theorem 2 argues that we only need to compare IF to policies in P. Specifically, P contains some optimal policy that minimizes the mean number of jobs in system and mean response time.

Next, in Theorem 3 we present a novel sample path argument which shows that IF has stochastically less work in the system than any policy in P. We will directly leverage this fact to show that, out of all policies π ∈ P, IF has the least expected inelastic work in system and also the least expected total work in system.

Finally, in Theorem 5 we show that, of all policies in P, IF minimizes the expected number of jobs in system. Thus, by Little's Law, IF is optimal with respect to mean response time.

Analysis.
We now present Theorem 2.
Theorem 2.
The class P contains a policy π which minimizes both mean response time and mean number of jobs in system. Specifically,

E[N^π] = min_{π′} { E[N^{π′}] }

and

E[T^π] = min_{π′} { E[T^{π′}] },

where N^π is the total number of jobs in the system in steady state under policy π, and T^π is the response time of a job in the system under π in steady state.

[Footnote: Little's Law states that for any ergodic system with average total arrival rate λ, the mean response time E[T] is related to the mean total number of jobs in system E[N] via the formula E[T] = E[N]/λ.]

[Figure 1: The Markov chain (N_I^π(t), N_E^π(t)) for a stationary, deterministic, work-conserving allocation policy π. From state (i, j), arrivals occur at rates λ_I and λ_E, and departures occur at rates π_I(i, j)·µ_I and π_E(i, j)·µ_E.]

Proof.
Recall that we consider only stationary, deterministic, work-conserving policies which make allocation decisions based on the state (i, j). Let π be a stationary, deterministic, work-conserving policy with the minimal mean number of jobs in system. Figure 1 shows the transition rates out of state (i, j) under π. We see that the transition rates out of the current state (i, j) under policy π depend solely on the number of servers allocated to each type of job. Thus, neither the order in which we serve the jobs nor how many jobs of each type are running matters. In particular, we can construct a policy π′ such that, for any state (i, j),

π_I(i, j) = π′_I(i, j) and π_E(i, j) = π′_E(i, j),

and π′ serves inelastic jobs in FCFS order. The policy π′ has the same Markov chain as π, so the expected numbers of jobs in system under π and π′ are identical. Because π is work-conserving, π′ is also work-conserving. Hence, π′ is in P and achieves the minimal mean number of jobs in system. □

The power of Theorem 2 is that, to show IF is optimal with respect to mean response time, it now suffices to show:

E[N^IF] ≤ E[N^π] ∀ π ∈ P. (3)

However, it is hard to directly compare the numbers of jobs under different policies. We get around this roadblock by instead analyzing how the remaining work in the system under IF relates to other policies π ∈ P. In particular, we obtain the following strong result.

Theorem 3.
For all policies π ∈ P, if we assume that

(N_I^π(0), N_E^π(0)) = (N_I^IF(0), N_E^IF(0)),

then:

W^IF(t) ≤_ST W^π(t) and W_I^IF(t) ≤_ST W_I^π(t) ∀ t ≥ 0,

where W^π(t) is the total remaining work under policy π at time t, W_I^π(t) is the remaining inelastic work under policy π at time t, and ≤_ST denotes stochastic dominance.

[Figure 2: Intervals of time during which all k servers are busy under IF.]

Proof.
Fix an arbitrary policy π ∈ P, and let us consider a fixed arrival sequence, that is, a fixed sequence of arrival times and job sizes. We couple π and IF under this sequence. Here, it suffices to consider arrival sequences where the total number of job arrivals up to any time t is finite, as this occurs with probability 1.

Recall that W_I^π(t) and W_E^π(t) are respectively the remaining inelastic and elastic work in the system at time t under scheduling policy π. Furthermore, recall that W^π(t), the total work at time t, is given by

W^π(t) = W_I^π(t) + W_E^π(t).

In order to show the desired stochastic dominance relations, it suffices to show that on any such arrival sequence,

W_I^IF(t) ≤ W_I^π(t) and W^IF(t) ≤ W^π(t) ∀ t ≥ 0.

First, we see it is immediate that, under our arrival sequence, W_I^IF(t) ≤ W_I^π(t) for all t ≥ 0. Since IF and π process inelastic jobs in FCFS order, each inelastic job enters service at least as early under IF as it does under π. Furthermore, IF never preempts inelastic jobs. Hence, at each time t, the remaining size of each inelastic job that has arrived by time t is no larger under IF than it is under π. Since the inelastic work in system is just the sum of the remaining sizes of inelastic jobs, the total inelastic work at time t under IF is at most the total inelastic work at time t under π.

It remains to show that

W^IF(t) ≤ W^π(t) ∀ t ≥ 0. (4)

We prove our claim by induction. For a base case, it is clear that W^IF(0) ≤ W^π(0), as the policies have the same set of jobs at time zero, and no work has been completed yet. For any time t, we partition the interval [0, t] into subintervals [t_i, t_{i+1}] (see Figure 2) such that either

(1) IF allocates all k servers on [t_i, t_{i+1}], or
(2) IF allocates strictly fewer than k servers on [t_i, t_{i+1}].

We now induct on i, showing that W^IF(t_i) ≤ W^π(t_i) implies W^IF(t_{i+1}) ≤ W^π(t_{i+1}).

If the interval [t_i, t_{i+1}] falls into case (1), IF is completing work at the maximal rate of any policy. In particular, IF completes exactly (t_{i+1} − t_i)·k work on [t_i, t_{i+1}]. Let ω denote the work completed by π on [t_i, t_{i+1}]. Then, we must have ω ≤ (t_{i+1} − t_i)·k. Since IF and π experience the same set of arrivals on this interval, we have:

W^π(t_{i+1}) − W^IF(t_{i+1}) = (W^π(t_i) − ω) − (W^IF(t_i) − (t_{i+1} − t_i)·k)
                            = (W^π(t_i) − W^IF(t_i)) + ((t_{i+1} − t_i)·k − ω)
                            ≥ 0.

Thus, we have W^IF(t_{i+1}) ≤ W^π(t_{i+1}), as desired.

If the interval [t_i, t_{i+1}] falls into case (2), IF allocates strictly fewer than k servers on [t_i, t_{i+1}]. We aim to show that W^IF(t_{i+1}) ≤ W^π(t_{i+1}). Observe that IF can have no elastic jobs in its system on [t_i, t_{i+1}).
This is because we have defined IF to be work-conserving: if there were an elastic job, IF would run it on all available servers. Observe that, assuming no elastic job arrives at time t_{i+1},

W^IF(t_{i+1}) = W_I^IF(t_{i+1}).

Likewise, we know

W^π(t_{i+1}) = W_I^π(t_{i+1}) + W_E^π(t_{i+1}) ≥ W_I^π(t_{i+1}).

We get the inequality above because π cannot have negative elastic work at time t_{i+1}. Finally, we have

W^π(t_{i+1}) − W^IF(t_{i+1}) = (W_I^π(t_{i+1}) + W_E^π(t_{i+1})) − W_I^IF(t_{i+1})
                            = (W_I^π(t_{i+1}) − W_I^IF(t_{i+1})) + W_E^π(t_{i+1})
                            ≥ W_I^π(t_{i+1}) − W_I^IF(t_{i+1})
                            ≥ 0,

where the last inequality follows from the fact that W_I^IF(t′) ≤ W_I^π(t′) for all t′ ≥ 0. Thus, we have W^IF(t_{i+1}) ≤ W^π(t_{i+1}).

As a side note, some elastic work could arrive at exactly time t_{i+1}. However, this increases the total work in both systems by the same amount and thus has no effect on the ordering of these quantities.

Thus, for any interval [t_i, t_{i+1}], if W^IF(t_i) ≤ W^π(t_i), then we have W^IF(t_{i+1}) ≤ W^π(t_{i+1}). Since W^IF(0) ≤ W^π(0), it follows that this inequality holds at the end of the last subinterval. The end of this final subinterval is exactly time t. Thus, for any t ≥ 0, we have W^IF(t) ≤ W^π(t), as desired.

We have thus found a coupling of π and IF such that the amount of total work and the amount of inelastic work in each system are ordered at every moment in time. This implies that

W_I^IF(t) ≤_ST W_I^π(t) ∀ t ≥ 0 and W^IF(t) ≤_ST W^π(t) ∀ t ≥ 0. □

In other words, IF is the best policy in P for minimizing remaining inelastic and total work in the system. One possible explanation for this is that, by deferring parallelizable work, IF ensures that all k servers are saturated with work for as long as possible. We now understand that, out of all policies in P, IF is optimal with respect to minimizing both expected remaining inelastic work and expected remaining total work at any time t. We now establish a relationship between expected remaining work and expected number of jobs in system.

Lemma 4.
For any policy π, we have:

E[W_I^π] = (1/µ_I)·E[N_I^π] and E[W_E^π] = (1/µ_E)·E[N_E^π],

where W_I^π and N_I^π are respectively the inelastic work and the number of inelastic jobs in the system in steady state under policy π. Furthermore, S_I is the size of an inelastic job, distributed as S_I ∼ Exp(µ_I). W_E^π, N_E^π, and S_E are the analogous quantities for elastic jobs.

Proof.
We give the proof for the inelastic relationship; the proof for the elastic relationship is identical. Let the random variable N_I^π(t) denote the number of inelastic jobs in the system under policy π at time t. Let ℓ ∈ {1, …, N_I^π(t)} index the inelastic jobs which are in the system at time t, and define R_{ℓ,I}^π(t) to be the remaining size of inelastic job ℓ under policy π at time t. Recall that W_I^π(t) is the remaining inelastic work in system at time t under policy π. We have the following equivalence:

W_I^π(t) = Σ_{ℓ=1}^{N_I^π(t)} R_{ℓ,I}^π(t).

By the memoryless property of the exponential distribution, the remaining sizes of jobs ℓ ∈ {1, …, N_I^π(t)} also follow an exponential distribution. Specifically, R_{ℓ,I}^π(t) ∼ Exp(µ_I), regardless of the policy π or the time t. Thus, N_I^π(t) and R_{ℓ,I}^π(t) are independent and we have

E[W_I^π(t)] = E[R_{ℓ,I}^π(t)]·E[N_I^π(t)] = E[S_I]·E[N_I^π(t)].

As shown in Appendix C, E[N_I^π(t)] converges to E[N_I^π] as t → ∞. This implies the convergence of E[W_I^π(t)]. Thus, taking the limit as t → ∞ yields

E[W_I^π] = E[S_I]·E[N_I^π],

where E[S_I] = 1/µ_I, as desired. □

We can now show that IF has the lowest expected number of jobs in system when µ_I ≥ µ_E.

Theorem 5.
For any policy π, if µ_I ≥ µ_E, we have:

E[N^IF] ≤ E[N^π],

and via Little's Law, we have:

E[T^IF] ≤ E[T^π].

[Footnote: Technically, we have only proven that E[W_I^π(t)] converges to some value, but not that it converges to E[W_I^π]. This would be sufficient for our subsequent results. It turns out that E[W_I^π(t)] converges to E[W_I^π] as t → ∞, but we omit this proof for brevity.]

Proof.
Because there exists an optimal work-conserving policy in P, it suffices to consider any policy π ∈ P. We write the total work under π as W^π = W_I^π + W_E^π. Likewise, we have the equality N^π = N_I^π + N_E^π. First, from Lemma 4, we have the following equalities:

E[W_I^π] = (1/µ_I)·E[N_I^π] and E[W_E^π] = (1/µ_E)·E[N_E^π].

Furthermore, by the stochastic dominance results of Theorem 3,

E[W_I^IF] ≤ E[W_I^π] and E[W^IF] ≤ E[W^π].

Thus, we have:

E[N^IF] = E[N_I^IF + N_E^IF]
        = µ_I·E[W_I^IF] + µ_E·E[W_E^IF]
        = (µ_I − µ_E)·E[W_I^IF] + µ_E·E[W_I^IF + W_E^IF]
        ≤ (µ_I − µ_E)·E[W_I^π] + µ_E·E[W_I^π + W_E^π]   (5)
        = µ_I·E[W_I^π] + µ_E·E[W_E^π]
        = E[N_I^π] + E[N_E^π]
        = E[N^π].

Note that we leverage the fact µ_I ≥ µ_E in (5). If µ_E > µ_I, then µ_I − µ_E would be negative, so we would not be able to establish a relationship like (5). This completes the proof. □

We have therefore established that IF is optimal with respect to mean response time when µ_I ≥ µ_E.

The case µ_I < µ_E

Now, we consider the case where µ_I < µ_E. Here, we demonstrate that IF is not optimal in minimizing mean response time. In fact, IF is not even optimal in the simplified environment where there are only two servers and no arrivals. We construct our counterexample in Theorem 6 below.

Theorem 6.
In general, IF is not optimal for minimizing mean response time when µ_I < µ_E.

Proof.
Assume we have k = 2, µ_E = 2µ_I, and there are no arrivals. We show that, if the system starts with two inelastic jobs and one elastic job, the policy EF outperforms IF.

We directly compute the expected total response time (summed over the three jobs) for both policies, starting with IF; since both policies serve the same three jobs, comparing totals is equivalent to comparing means. We let T^IF denote total response time under IF, and T^EF denote total response time under EF. Under IF, all three jobs are present until the first inelastic completion (rate 2µ_I); then one inelastic job runs on one server (rate µ_I) while the elastic job runs on the other (rate µ_E); finally the surviving job completes alone, on both servers if it is elastic. We have:

E[T^IF] = 3/(2µ_I) + 2/(µ_I + µ_E) + (µ_I/(µ_I + µ_E))·(1/(2µ_E)) + (µ_E/(µ_I + µ_E))·(1/µ_I)
        = 3/(2µ_I) + 2/(3µ_I) + (1/3)·(1/(4µ_I)) + (2/3)·(1/µ_I)
        = 3/(2µ_I) + 2/(3µ_I) + 1/(12µ_I) + 2/(3µ_I)
        = 35/(12µ_I).

On the other hand, under EF the elastic job first runs on both servers (rate 2µ_E), after which the two inelastic jobs run in parallel, and finally the last inelastic job runs alone. We see:

E[T^EF] = 3/(2µ_E) + 1/µ_I + 1/µ_I = 3/(4µ_I) + 1/µ_I + 1/µ_I = 11/(4µ_I).

In particular, E[T^EF] = 33/(12µ_I) < 35/(12µ_I) = E[T^IF]. Thus, in general, IF is not optimal when µ_I < µ_E. In fact, in this environment, we see EF outperforms IF. □

[Figure 3: The transformation of the 2D-infinite EF chain to a 1D-infinite chain via the busy period transformation. (a) Full EF chain; (b) EF chain with special states representing an M/M/1 busy period; (c) Final 1D EF chain, in which the busy periods are approximated by a Coxian distribution.]
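The arithmetic in this counterexample is easy to verify numerically. The sketch below assumes the reconstructed parameters k = 2 and µ_E = 2µ_I (with µ_I = 1 for concreteness); the quantities computed are summed response times over the three jobs, so dividing by 3 gives means without changing the ordering.

```python
# Numeric check of the Theorem 6 counterexample (assumed parameters:
# k = 2 servers, two inelastic jobs + one elastic job at time 0,
# no arrivals, and mu_E = 2 * mu_I).
mu_i = 1.0
mu_e = 2.0 * mu_i

# IF, phase by phase: 3 jobs until the first inelastic completion
# (rate 2*mu_I); then 2 jobs until the next completion (rate mu_I + mu_E);
# then the survivor (elastic on both servers, or inelastic alone).
p_inelastic_first = mu_i / (mu_i + mu_e)
t_if = (3 / (2 * mu_i)
        + 2 / (mu_i + mu_e)
        + p_inelastic_first * (1 / (2 * mu_e))
        + (1 - p_inelastic_first) * (1 / mu_i))

# EF: the elastic job runs on both servers first (rate 2*mu_E), then the
# two inelastic jobs run in parallel, then the last one alone.
t_ef = 3 / (2 * mu_e) + 2 / (2 * mu_i) + 1 / mu_i

print(t_if, t_ef)   # 2.916... (= 35/12) and 2.75 (= 11/4)
```

As the proof claims, the total under EF (11/4 per unit µ_I) is strictly smaller than the total under IF (35/12).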
From the results of Section 4, we know that IF is optimal with respect to mean response time when µ_I ≥ µ_E. However, Section 4 also shows that EF can outperform IF when µ_I < µ_E. This raises the question of which allocation policy, IF or EF, performs better for given values of µ_I and µ_E.

In this section we derive the mean response time of EF under a range of values of µ_I, µ_E, λ_I, λ_E, and k. The analysis for the IF policy is similar, and we thus defer it to Appendix D. We outline our approach here:

(1) In Section 5.1 we present the Markov chain for EF. This Markov chain is 2D-infinite.
(2) In Section 5.2 we present a technique from the stochastic literature called Busy Period Transitions [45, 46], which reduces the 2D-infinite chain to a 1D-infinite chain. Although the Busy Period Transitions approach produces an approximation, it is known to be highly accurate, with errors of less than 1% [26–28, 45, 46].
(3) In Section 5.3 we apply standard Matrix-Analytic methods to solve the 1D-infinite Markov chain, obtaining the stationary distribution and, finally, the mean response time of EF.

The results of our analysis for IF and EF are shown in Figures 4, 5, and 6. We compared our analysis with simulation, and all numbers agree within 1%. We note that [7] used MDP-based techniques to analyze allocation policies in a similar model. These previous results required truncating the state space and were computationally intensive. The techniques presented in this section do not require truncating the state space, can be tuned to arbitrary precision, and are comparatively efficient.

Figure 4 presents a high-level view of our results, showing only the relative performance of IF and EF as the system load, ρ, is moved from (a) low load to (b) medium load to (c) high load. In every case, IF outperforms EF when µ_I ≥ µ_E, as expected from the optimality of IF in this region.
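The qualitative comparison above can also be reproduced by brute force: truncate the 2D chain at a finite level and solve the balance equations directly. The sketch below is purely illustrative — the parameters, the truncation level N, and the simplified allocation rules are assumptions for this example, not the setup used for our figures (which avoids truncation via matrix-analytic methods).

```python
# Brute-force sketch: mean response time of IF and EF on a truncated
# state space (i, j) with i, j <= N. Illustrative parameters only.
from itertools import product

K = 2                      # number of servers (assumption)
LAM_I, LAM_E = 0.5, 0.5    # arrival rates (assumption)
MU_I, MU_E = 2.0, 1.0      # service rates; mu_I >= mu_E, so IF should win
N = 10                     # truncation level; tail mass is negligible here

def allocation(i, j, policy):
    """Servers given to (inelastic, elastic) jobs in state (i, j)."""
    if policy == "IF":
        n_i = min(i, K)
        n_e = K - n_i if j > 0 else 0   # leftover servers go to one elastic job
    else:                               # "EF": an elastic job takes all K servers
        n_e = K if j > 0 else 0
        n_i = min(i, K) if j == 0 else 0
    return n_i, n_e

def mean_response_time(policy):
    states = list(product(range(N + 1), repeat=2))
    idx = {s: n for n, s in enumerate(states)}
    m = len(states)
    # Build A = Q^T (columns index "from" states), with one balance
    # equation replaced by the normalization sum(pi) = 1.
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for (i, j) in states:
        s = idx[(i, j)]
        n_i, n_e = allocation(i, j, policy)
        moves = []
        if i < N: moves.append(((i + 1, j), LAM_I))
        if j < N: moves.append(((i, j + 1), LAM_E))
        if i > 0: moves.append(((i - 1, j), n_i * MU_I))
        if j > 0: moves.append(((i, j - 1), n_e * MU_E))
        for (t, r) in moves:
            A[idx[t]][s] += r    # inflow into state t
            A[s][s] -= r         # outflow from state s
    A[m - 1] = [1.0] * m
    b[m - 1] = 1.0
    # Plain Gaussian elimination with partial pivoting.
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            if f != 0.0:
                for cc in range(c, m):
                    A[r][cc] -= f * A[c][cc]
                b[r] -= f * b[c]
    pi = [0.0] * m
    for r in range(m - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][cc] * pi[cc] for cc in range(r + 1, m))) / A[r][r]
    mean_jobs = sum(pi[idx[(i, j)]] * (i + j) for (i, j) in states)
    return mean_jobs / (LAM_I + LAM_E)   # Little's Law

print(mean_response_time("IF"), mean_response_time("EF"))
```

With µ_I ≥ µ_E, the computed mean response time under IF comes out below that of EF, matching the optimality result of Section 4.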
When µ_I < µ_E, Figure 4 shows us that EF can outperform IF, and that the region where EF is better grows as ρ increases.

Figure 5 shows the absolute mean response times under IF and EF as a function of µ_I. We again examine the system under various fixed values of ρ. The dotted lines mark the point where µ_I = µ_E. We therefore know that IF is optimal to the right of this line in every graph, while EF may dominate IF to the left of this line. We see that our choice of allocation policy has a major impact on mean response time.

While Figures 4 and 5 assume that k = 4, our analysis works equally well with any number of servers, k. Figure 6 shows how the mean response time under IF and EF changes as k increases while the system load, ρ, remains constant.

Markov Chains for IF and EF

Figure 3a shows the Markov chain which exactly describes EF. The corresponding IF chain is given in Appendix D. Recall that the state (i, j) denotes having i inelastic jobs and j elastic jobs in the system. This chain is infinite in 2 dimensions: the number of inelastic jobs and the number of elastic jobs. Because there is no general method for solving 2D-infinite Markov chains, we provide a technique for converting this chain to a 1D-infinite Markov chain in Section 5.2. We start by describing how to reduce the dimensionality of the Markov chain for EF. To do this, we make three key observations about its structure.

[Figure 4: Heat maps showing the relative performance of IF and EF as a function of µ_I and µ_E when k = 4, at (a) low, (b) medium, and (c) high load. We fix the load ρ and vary µ_I and µ_E; to offset the changes to µ_I and µ_E, we change λ_I and λ_E to keep ρ constant. In every graph, λ_I = λ_E. The red circles represent settings where IF dominates EF; the blue +'s represent cases where EF dominates IF. As ρ increases, the region where EF dominates IF grows. However, as expected, when µ_I ≥ µ_E, IF dominates EF for all loads.]

[Figure 5: The absolute mean response times under IF and EF as a function of µ_I when k = 4, at (a) low, (b) medium, and (c) high load. In each graph, we fix the system load ρ and the value of µ_E, and then vary µ_I. To offset the changes in µ_I, we change λ_I and λ_E to keep ρ constant. In every graph, λ_I = λ_E. The dotted lines denote the case where µ_I = µ_E; IF is optimal to the right of this line, while EF may dominate IF to the left of this line.]

[Figure 6: The mean response time under IF and EF as a function of the number of servers, k, under high load. The values of µ_I and µ_E are chosen to represent the extreme ends of Figure 5c (where the performance gap between the policies is largest). Even as k grows, the difference between IF and EF remains large.]

Observation 1: Response time of elastic jobs is trivial. Under EF, elastic jobs have preemptive priority over inelastic jobs. Thus, their behavior is independent of the state of inelastic jobs in the system. We can therefore model the response time of elastic jobs as an M/M/1 queue with arrival rate λ_E and service rate kµ_E, which is well understood in the queueing literature [33]. What remains is to understand the response time of the inelastic jobs.

Observation 2: The busy period transformation.
Looking at Figure 3a, we notice that the chain has a repeating structure whenever there is at least 1 elastic job in the system (j ≥ 1). We exploit this structure to reduce the chain for EF to a 1D-infinite chain. Specifically, while there are elastic jobs in the system, EF does not process any inelastic jobs. The length of time during which EF is not processing any inelastic jobs can be viewed as an M/M/1 busy period, since (by Observation 1) the elastic jobs behave as an M/M/1 queue with arrival rate λ_E and service rate kµ_E. We can therefore replace the portion of the chain with j ≥ 1 by special states representing an M/M/1 busy period, as shown in Figure 3b.

Observation 3: Creating a 1D chain for inelastic jobs.
Looking at Figure 3b, we note the bolded transition arrows (labeled "B") emanating from the busy period states. Because the duration of an M/M/1 busy period is not exponentially distributed, these transitions cannot be represented by a single exponential rate. We instead approximate the busy period duration with a Coxian distribution, yielding the 1D-infinite chain shown in Figure 3c. An analogous transformation applies to IF (see Appendix D). Given these 1D-infinite chains, we now apply standard matrix analytic techniques to solve for mean response time.

We now explain how to analyze IF and EF using the 1D-infinite Markov chains developed in the previous section. We do this by applying matrix analytic methods [34, 43, 44]. Matrix analytic methods are iterative procedures which compute the stationary distribution of a repeating, 1D-infinite Markov chain. Consider, for example, Figure 3c, which shows the 1D-infinite chain for EF. Observe that each column of this chain, after the first column, has identical transitions. The idea of matrix analytic methods is to represent the stationary distribution of column j + 1 as the stationary distribution of column j multiplied by some unknown matrix R. The matrix R is determined iteratively through a numeric procedure [34, 43, 44]. This procedure yields the stationary distribution of the chain. Using the stationary distribution we can easily determine the mean number of inelastic jobs, and hence the mean response time for inelastic jobs (recall that the response time for elastic jobs under EF is trivial). An analogous argument can be applied to solve the 1D-infinite chain for IF.

CONCLUSION

In this paper, we establish optimality results and provide the first analysis of policies for scheduling jobs which are heterogeneous with respect to their parallelizability. Specifically, we study a model where jobs are either inelastic or elastic: inelastic jobs can only run on a single server and elastic jobs parallelize linearly across many servers. We prove that the policy
Inelastic-First (IF), which gives inelastic jobs preemptive priority over elastic jobs, is optimal for minimizing the mean response time across jobs in the common case where elastic jobs are larger on average than inelastic jobs. We then provide an analysis of mean response time under the Elastic-First (EF) and Inelastic-First (IF) policies. Our techniques include a novel sample path argument for proving stochastic dominance, and a method for solving 2D-infinite Markov chains.

There are many open questions in scheduling jobs which are heterogeneous with respect to their parallelizability. One immediate follow-up of our work is to find optimal policies when elastic jobs are smaller on average than inelastic jobs. We show in this paper that in this setting EF can outperform IF; however, it is not clear that EF is the optimal allocation policy. Furthermore, the model studied in this paper can be generalized in many ways to capture a broad range of application scenarios. For example, one can consider a model where the elastic jobs are not fully elastic as in this paper, but are elastic up to a certain number of servers. More generally, we can have more than two classes of jobs with different levels of parallelizability and different job size distributions. The problem of finding optimal policies and providing analysis in these models is wide open.

A APPROXIMATION WHEN JOBS ARRIVE AT THE SAME TIME
In this section we show that a generalization of SRPT-k is a 4-approximation algorithm for mean response time if all jobs arrive at the same time. This case is entirely deterministic. This result generalizes beyond elastic and inelastic jobs; in particular, it holds even in more general parallelizability settings where every job j is parallelizable up to k_j processors. That is, if job j is given k′ ≤ k processors, the rate at which it is processed is min{k_j, k′}.

To prove the theorem, we will use a dual fitting analysis. Consider the following LP relaxation of the problem. In the following, we use x_j to denote the inherent size of job j. The variable y_{jt} is how much job j is processed at time t.

min_{y_{jt}} Σ_j Σ_{t≥0} (t/x_j + 1/k_j)·y_{jt}          (LP primal)
s.t.  Σ_{t≥0} y_{jt} ≥ x_j    ∀ j
      Σ_j y_{jt} ≤ k          ∀ t ≥ 0
      y_{jt} ≥ 0              ∀ j, t ≥ 0

This is the relaxation for a speed-k single machine, plus the standard corrective term in the objective. See [9] for similar relaxations. The dual of LP primal is as follows.

max_{α_j, β_t} Σ_j α_j − Σ_t β_t                         (LP dual)
s.t.  α_j/x_j − β_t/k ≤ t/x_j + 1/k_j    ∀ j, t ≥ 0
      α_j ≥ 0    ∀ j
      β_t ≥ 0    ∀ t

The algorithm that will be used is a natural generalization of SRPT-k to the case of parallelizable jobs. The algorithm sorts the jobs by inherent size in increasing order. For the rest of the analysis we assume that the jobs are indexed in this order, so that x_1 ≤ x_2 ≤ … ≤ x_n, where n is the total number of jobs. At any point in time, the algorithm gives the cores to the jobs in this priority order: each job j is assigned up to k_j processors, and then the algorithm considers the next job in the list with the remaining processors. We let U_j = Σ_{i=1}^{j−1} x_i be the total amount of work strictly ahead of job j.

To analyze the algorithm, we will assume the processors the algorithm has are of speed s ≥ 1; later we will set s = 2. That is, each processor completes s units of work on a job per unit of time it works on the job. We compare to an optimal solution with speed-1 processors. The following lemma allows us to compare to this slower optimal solution with minimal loss in the approximation ratio.

Lemma 7 ([21]).
Let OPT_s denote the total response time of the optimal algorithm when it has processors of speed s. Then for any s ≥ 1, OPT_1 ≤ s·OPT_s.

We now define the dual variables. Let Q(t) denote the set of jobs released and unsatisfied at time t in the algorithm's schedule. Let

α_j = U_j/(ks) + x_j/(s·k_j)  and  β_t = |Q(t)|/s.

Our main claim is the following.

Lemma 8.
Let C denote the algorithm's total response time. It is the case that

Σ_j α_j − Σ_t β_t ≥ (1 − 1/s)·C.

Moreover, α, β correspond to a feasible dual solution when s = 2.

The majority of this section will be devoted to proving this lemma. We first observe that it is sufficient to prove our theorem.
Theorem 9.
The SRPT-k algorithm is a 4-approximation for mean response time when all jobs arrive at time 0.

Proof.
Set s = 2. Lemma 8 ensures that C is at most a factor of 2 larger than the optimal solution using speed-1 processors. Lemma 7 ensures that the speed-1 optimal is within a factor of 2 of the speed-2 optimal. Together, this shows the algorithm is a 4-approximation. □

We now return to proving Lemma 8. We begin by establishing the value of the objective function.
Lemma 10. Σ_j α_j − Σ_t β_t ≥ (1 − 1/s)·C.

Proof.
First notice that Σ_t β_t = Σ_t |Q(t)|/s. This is precisely C/s, since summing the number of unfinished jobs over time gives exactly the total response time C. Thus, it is sufficient to prove that Σ_j (U_j/(ks) + x_j/(s·k_j)) ≥ C. To do so, we show that α_j = U_j/(ks) + x_j/(s·k_j) is an upper bound on job j's response time. Indeed, at any time before job j completes, either all k processors are working at speed s on the work in U_j + x_j (if j is not receiving its full allocation), or job j is being worked on by k_j processors at speed s. □

Next we will show that this setting of the dual variables corresponds to a feasible dual solution.
Lemma 11.
The dual solution α, β is feasible when s = 2.

Proof.
We need to show the following for all jobs j and times t ≥ 0:

α_j/x_j − β_t/k ≤ t/x_j + 1/k_j.

Consider the left-hand side for a fixed job j and time t. Let x_{j′}^r(t) be the remaining work of job j′ at time t, and let x_{j′}^p(t) = x_{j′} − x_{j′}^r(t) be the amount of job j′ that has been processed up to time t. Given the definitions of α and β, the left-hand side is equivalent to:

(1/x_j)·( (1/(ks))·Σ_{j′ ∈ [n], x_{j′} < x_j} (x_{j′}^p(t) + x_{j′}^r(t)) + x_j/(s·k_j) ) − |Q(t)|/(sk).

Now consider any job that is incomplete at time t, that is, those in Q(t). We can remove these jobs from the first term by combining terms with the −|Q(t)|/(sk) term. The prior expression is thus at most:

(1/x_j)·( (1/(ks))·Σ_{j′ ∈ [n]\Q(t), x_{j′} < x_j} (x_{j′}^p(t) + x_{j′}^r(t)) + x_j/(s·k_j) )
≤ (1/x_j)·( (1/(ks))·Σ_{j′ ∈ [n]\Q(t), x_{j′} < x_j} x_{j′}^p(t) + x_j/(s·k_j) ),

since x_{j′}^r(t) = 0 for jobs completed by time t. Notice that Σ_{j′ ∈ [n]\Q(t), x_{j′} < x_j} x_{j′}^p(t) is at most kst. This is because the summation counts work that the algorithm has processed by time t, and the algorithm has k processors of speed s. Thus, the prior term is at most:

t/x_j + 1/(s·k_j).

Given that s = 2, we get that the dual solution is feasible. □
Together Lemmas 10 and 11 complete the proof.
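The key step of Lemma 10 — that α_j upper-bounds job j's response time under the generalized SRPT-k schedule — can be checked on a concrete instance. The job sizes, parallelizability caps, k, and s below are hypothetical values chosen for illustration.

```python
# Sketch of the generalized SRPT-k schedule from this appendix, used to
# check that alpha_j = U_j/(k*s) + x_j/(s*k_j) upper-bounds job j's
# response time. All parameters are hypothetical.
k = 4        # processors (assumption)
s = 2.0      # speed augmentation
jobs = [(4.0, 1), (8.0, 2), (12.0, 4)]   # (size x_j, cap k_j), sorted by size

remaining = [x for x, _ in jobs]
finish = [None] * len(jobs)
t = 0.0
while any(f is None for f in finish):
    # Allocate processors to unfinished jobs in priority (size) order.
    free = k
    alloc = [0] * len(jobs)
    for j, (_, kj) in enumerate(jobs):
        if finish[j] is None:
            alloc[j] = min(kj, free)
            free -= alloc[j]
    # Advance to the next completion under these (constant) rates.
    dt = min(remaining[j] / (alloc[j] * s)
             for j in range(len(jobs)) if finish[j] is None and alloc[j] > 0)
    for j in range(len(jobs)):
        if finish[j] is None:
            remaining[j] -= alloc[j] * s * dt
            if remaining[j] <= 1e-12:
                finish[j] = t + dt
    t += dt

# Verify alpha_j >= T_j (response time) for every job.
U = 0.0
for j, (xj, kj) in enumerate(jobs):
    alpha = U / (k * s) + xj / (s * kj)
    assert alpha >= finish[j] - 1e-9
    U += xj
print(finish)   # completion (= response) times; [2.0, 2.0, 3.0] here
```

Since all jobs arrive at time 0, completion times equal response times, and the per-job bound α_j ≥ T_j gives Σ_j α_j ≥ C exactly as used in the proof of Lemma 10.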
B IDLING POLICIES
We define a policy to be idling if it chooses to leave one or more servers idle rather than allocating them to some eligible jobs.
Theorem 12.
For any policy π which unnecessarily idles servers, there exists a non-idling policy π′ such that E[T^{π′}] ≤ E[T^π]. Hence, there exists an optimal policy which is non-idling.
Proof.
Consider any policy π which idles servers unnecessarily in one or more states. We will construct a new policy, π′, which is identical to π in every state where π does not idle servers unnecessarily. In each state (i, j) where π does idle servers unnecessarily, if j > 0, π′ will allocate all of π's idle servers to the elastic job with the earliest arrival time. If j = 0, π′ will instead allocate π's idle servers to each unserved (or underserved) inelastic job in FCFS order.

We now compare the performance of π to π′ on any fixed arrival sequence of elastic and inelastic jobs. Suppose π first unnecessarily idles servers at time t, and suppose π gives jobs constant allocations on the time interval (t, t + δ]. We reallocate the idle servers during this time interval in order to match the allocations π′ would make. No job received fewer servers as a result of this transformation, and at least one job received additional servers during (t, t + δ]. Each job which received additional servers during (t, t + δ] had its response time decreased, and no jobs had their response time increased. Furthermore, after this interchange, the schedule now reflects the allocation decisions that π′ would make. We now proceed to the next time, t′, in the schedule where there are unnecessarily idle servers. Note that this idle space may exist because it is part of the policy π, or because an earlier interchange caused a job to complete earlier, creating some idle servers at time t′. In either case, we simply perform the same interchange as before, decreasing the response time of some jobs without increasing the response time of any jobs. Note that each interchange causes the earliest occurrence of unnecessary idle servers in the schedule to occur at a later time.
We therefore iterate this argument until all idle time either vanishes or occurs after the completion of the last job in the arrival sequence. At this point, the schedule reflects the actions taken by π′, and hence the mean response time under π′ is no larger than the mean response time under π. Hence, given any optimal idling policy π, we can construct a non-idling policy π′ which is also optimal. □

C LYAPUNOV STABILITY OF WORK-CONSERVING POLICIES
Theorem 13.
For any work-conserving policy π, the associated Markov chain {(N_I^π(t), N_E^π(t)) : t ≥ 0} has a stationary distribution. If we define (N_I^π, N_E^π) to be a random element that follows this stationary distribution, then

lim_{t→∞} (N_I^π(t), N_E^π(t)) =_d (N_I^π, N_E^π). (6)

Furthermore,

lim_{t→∞} E[N_I^π(t)] = E[N_I^π], (7)

and

lim_{t→∞} E[N_E^π(t)] = E[N_E^π]. (8)

Proof.
To prove this claim, it suffices to show the drift results below, which allow us to apply the Foster-Lyapunov theorem [49] to show the convergence in distribution in (6), and to apply the bounds in [23] to show the convergence of expectations in (7) and (8). Consider the following Lyapunov function V : Z²_{≥0} → R_{≥0} for the Markov chain {(N_I^π(t), N_E^π(t)) : t ≥ 0}:

V(i, j) = i/(kµ_I) + j/(kµ_E).

Then its drift ∆V(i, j) can be written as

∆V(i, j) = Σ_{(i′,j′)} r_{(i,j)→(i′,j′)}·(V(i′, j′) − V(i, j)),

where r_{(i,j)→(i′,j′)} is the rate of transition from state (i, j) to state (i′, j′). Note that for any (i, j) and (i′, j′),

|V(i′, j′) − V(i, j)| < 1/(k·min{µ_I, µ_E}).

We now show that for the finite set F = {(i, j) : i + j ≤ k}, we have

∆V(i, j) ≤ −ϵ  ∀ (i, j) ∉ F

for some ϵ > 0. Let (i, j) be any state not in F, i.e., i + j > k. By definition,

∆V(i, j) = λ_I/(kµ_I) + λ_E/(kµ_E) − ( π_I(i, j)·µ_I/(kµ_I) + π_E(i, j)·µ_E/(kµ_E) ).

Because π is assumed to be a work-conserving policy, and there are at least k jobs in system, we know that π_I(i, j) + π_E(i, j) = k. Furthermore, we have assumed that

ρ = λ_I/(kµ_I) + λ_E/(kµ_E) = 1 − ϵ < 1

for some ϵ >
0. Hence, we have that

∆V(i, j) = ρ − 1 = −ϵ,

as desired. We can therefore conclude that the Markov chain induced by π is positive recurrent, and the convergence in distribution in (6) follows. Note that for any (i, j) ∉ F, V(i, j) ≥ 1/max{µ_I, µ_E}. Then extending Theorem 2.3 of [23] to continuous-time Markov chains using uniformization implies that

sup_{t≥0} E[(V(N^π(t)))²] < ∞.

Therefore, {N_I^π(t), t ≥ 0} and {N_E^π(t), t ≥ 0} are uniformly integrable, which implies the convergence of expectations in (7) and (8). □

D MARKOV CHAINS FOR IF

We present the Markov chain for IF in Figure 7a. To analyze this chain we apply a busy period transformation analogous to the method used in Section 5.2. First, we note that the inelastic jobs under IF see an M/M/k queueing system, and hence their mean response time is known. We therefore only need to consider the mean response time of elastic jobs under IF. When there are at least k inelastic jobs in the system under IF, elastic jobs receive no service. The amount of time from when there are first k inelastic jobs in the system until there are again fewer than k inelastic jobs can be viewed as an M/M/1 busy period. This results in a 1D-infinite Markov chain which we can analyze using matrix analytic methods. We depict the busy period transformation for IF in Figure 7b. We then show the busy period states replaced with Coxian distributions in Figure 7c.
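As noted above, the inelastic jobs under IF see an M/M/k queue, so their mean response time follows from the standard Erlang-C formula. A small sketch, with illustrative parameters:

```python
# Mean response time of an M/M/k queue (the system seen by inelastic
# jobs under IF), via the standard Erlang-C formula. Parameters below
# are illustrative only.
from math import factorial

def mmk_mean_response_time(lam, mu, k):
    """E[T] for an M/M/k queue: arrival rate lam, per-server rate mu."""
    a = lam / mu                      # offered load
    assert a < k, "queue must be stable"
    p_wait_denominator = (sum(a**n / factorial(n) for n in range(k))
                          + (a**k / factorial(k)) * k / (k - a))
    erlang_c = (a**k / factorial(k)) * (k / (k - a)) / p_wait_denominator
    # E[T] = mean service time + mean queueing delay
    return 1 / mu + erlang_c / (k * mu - lam)

print(mmk_mean_response_time(0.5, 2.0, 1))   # k=1 reduces to M/M/1: 1/(mu-lam)
```

As a sanity check, with k = 1 the formula collapses to the M/M/1 mean response time 1/(µ − λ), and adding servers at fixed λ and µ can only decrease E[T].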
[Figure 7 here: (a) Full IF chain; (b) IF chain with special states; (c) Final 1D IF chain.]

Figure 7: The transformation of the 2D-infinite IF chain to a 1D-infinite chain via the busy period transformation. Special states representing an M/M/1 busy period are shown in (b), and these busy periods are approximated by a Coxian distribution in (c).

REFERENCES

[1] I.J.B.F. Adan, G.J. van Houtum, and J. van der Wal. Upper and lower bounds for the waiting time in the symmetric shortest queue system.
Annals of Operations Research, 48:197–217, 1994.
[2] Kunal Agrawal, I-Ting Angelina Lee, Jing Li, Kefu Lu, and Benjamin Moseley. Practically efficient scheduler for minimizing average flow time of parallel jobs, pages 134–144. IEEE, 2019.
[3] Kunal Agrawal, Jing Li, Kefu Lu, and Benjamin Moseley. Scheduling parallel DAG jobs online to minimize average flow time. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 176–189. SIAM, 2016.
[4] S. Anand, Naveen Garg, and Amit Kumar. Resource augmentation for weighted flow-time explained by dual fitting. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1228–1241, 2012.
[5] Spyros Angelopoulos, Giorgio Lucarelli, and Nguyen Kim Thang. Primal-dual and dual-fitting analysis of online scheduling algorithms for generalized flow-time problems. Algorithmica, 81(9):3391–3421, 2019.
[6] Eitan Bachmat and Hagit Sarfati. Analysis of size interval task assignment policies. Performance Evaluation Review, 36(2):107–109, 2008.
[7] Benjamin Berg, Jan-Pieter Dorsman, and Mor Harchol-Balter. Towards optimality in parallel scheduling. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2):1–30, 2018.
[8] Carl Bussema and Eric Torng. Greedy multiprocessor server scheduling. Operations Research Letters, 34(4):451–458, 2006.
[9] Jivitej S. Chadha, Naveen Garg, Amit Kumar, and V. N. Muralidhara. A competitive algorithm for minimizing weighted flow time on unrelated machines with speed augmentation. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 679–684. ACM, 2009.
[10] Richard W. Conway, Louis W. Miller, and William L. Maxwell. Theory of Scheduling. Dover, 2003.
[11] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[12] Christina Delimitrou and Christos Kozyrakis. Quasar: resource-efficient and QoS-aware cluster management. ACM SIGPLAN Notices, 49(4):127–144, 2014.
[13] Jeff Edmonds. Scheduling in the dark. Theoretical Computer Science, 235(1):109–141, 2000.
[14] Jeff Edmonds, Sungjin Im, and Benjamin Moseley. Online scalable scheduling for the k-norms of flow time without conservation of work. In Dana Randall, editor, Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011, pages 109–119. SIAM, 2011.
[15] Jeff Edmonds and Kirk Pruhs. Scalably scheduling processes with arbitrary speedup curves. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 685–692. SIAM, 2009.
[16] Kyle Fox and Benjamin Moseley. Online scheduling on identical machines using SRPT. In Dana Randall, editor, Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011, pages 120–128. SIAM, 2011.
[17] Anshul Gandhi and Mor Harchol-Balter. How data center size impacts the effectiveness of dynamic power management, pages 1164–1169. IEEE, 2011.
[18] Isaac Grosof, Ziv Scully, and Mor Harchol-Balter. SRPT for multiserver systems. Performance Evaluation, 127:154–175, 2018.
[19] Abhishek Gupta, Bilge Acun, Osman Sarood, and Laxmikant V. Kalé. Towards realizing the potential of malleable jobs, pages 1–10. IEEE, 2014.
[20] Anupam Gupta, Sungjin Im, Ravishankar Krishnaswamy, Benjamin Moseley, and Kirk Pruhs. Scheduling heterogeneous processors isn't as easy as you think. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1242–1253. SIAM, 2012.
[21] Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie. Stochastic online scheduling on unrelated machines. In Integer Programming and Combinatorial Optimization - 19th International Conference, IPCO 2017, Waterloo, ON, Canada, June 26-28, 2017, Proceedings, pages 228–240, 2017.
[22] Varun Gupta, Karl Sigman, Mor Harchol-Balter, and Ward Whitt. Insensitivity for PS server farms with JSQ routing. ACM SIGMETRICS Performance Evaluation Review, 35(2):24–26, 2007.
[23] Bruce Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525, 1982.
[24] Mor Harchol-Balter. Task assignment with unknown duration. Journal of the ACM, 49(2):260–288, March 2002.
[25] Mor Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.
[26] Mor Harchol-Balter, Cuihong Li, Takayuki Osogami, Alan Scheller-Wolf, and Mark Squillante. Cycle stealing under immediate dispatch task assignment. In Proceedings of the 15th ACM Symposium on Parallel Algorithms and Architectures.
[27] Mor Harchol-Balter, Cuihong Li, Takayuki Osogami, Alan Scheller-Wolf, and Mark Squillante. Task assignment with cycle stealing under central queue. In Proceedings of the 23rd International Conference on Distributed Computing Systems, pages 628–637, Providence, RI, May 2003.
[28] Mor Harchol-Balter, Takayuki Osogami, Alan Scheller-Wolf, and Adam Wierman. Multi-server queueing systems with multiple priority classes. Queueing Systems: Theory and Applications, 51(3–4):331–360, 2005.
[29] Mor Harchol-Balter, Alan Scheller-Wolf, and Andrew Young. Surprising results on task assignment in server farms with high-variability workloads. In ACM Sigmetrics 2009 Conference on Measurement and Modeling of Computer Systems, pages 287–298, 2009.
[30] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, volume 11, pages 22–22, 2011.
[31] Sungjin Im, Benjamin Moseley, Kirk Pruhs, and Eric Torng. Competitively scheduling tasks with intermediate parallelizability. ACM Transactions on Parallel Computing (TOPC), 3(1):1–19, 2016.
[32] Cheeha Kim and Ashok K. Agrawala. Analysis of the fork-join queue. IEEE Transactions on Computers, 38(2):250–255, 1989.
[33] Leonard Kleinrock. Queueing Systems, Volume 2: Computer Applications, volume 66. Wiley, New York, 1976.
[34] Guy Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM, Philadelphia, 1999.
[35] Stefano Leonardi and Danny Raz. Approximating total flow time on parallel machines. Journal of Computer and System Sciences, 73(6):875–891, 2007.
[36] Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 5330–5340, 2017.
[37] Richard Liaw, Romil Bhardwaj, Lisa Dunlap, Yitian Zou, Joseph E. Gonzalez, Ion Stoica, and Alexey Tumanov. HyperSched: Dynamic resource reallocation for model development on a deadline. In Proceedings of the ACM Symposium on Cloud Computing, pages 61–73, 2019.
[38] David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. Heracles: Improving resource efficiency at scale. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, pages 450–462, 2015.
[39] Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa. Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pages 248–259, 2011.
[40] Robert McNaughton. Scheduling with deadlines and loss functions. Management Science, 6(1):1–12, 1959.
[41] Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, et al. Ray: A distributed framework for emerging AI applications. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 561–577, 2018.
[42] Randolph Nelson and Asser N. Tantawi. Approximate analysis of fork/join synchronization in parallel queues. IEEE Transactions on Computers, 37(6):739–743, 1988.
[43] Marcel F. Neuts. Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press, 1981.
[44] Marcel F. Neuts. Structured Stochastic Matrices of M/G/1 Type and Their Applications. Marcel Dekker, 1989.
[45] Takayuki Osogami and Mor Harchol-Balter. Closed form solutions for mapping general distributions to quasi-minimal PH distributions. Performance Evaluation, 63(6):524–552, 2006.
[46] Takayuki Osogami, Mor Harchol-Balter, and Alan Scheller-Wolf. Analysis of cycle stealing with switching times and thresholds. In Proceedings of ACM Sigmetrics, pages 184–195, San Diego, CA, June 2003.
[47] Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, and Chuanxiong Guo. Optimus: An efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference, pages 1–14, 2018.
[48] Donald R. Smith. A new proof of the optimality of the shortest remaining processing time discipline. Operations Research, 26(1):197–199, 1978.
[49] R. Srikant and Lei Ying. Communication Networks: An Optimization, Control and Stochastic Networks Perspective. Cambridge University Press, New York, 2014.
[50] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems, pages 1–17, 2015.
[51] Weina Wang, Mor Harchol-Balter, Haotian Jiang, Alan Scheller-Wolf, and Rayadurgam Srikant. Delay asymptotics and bounds for multitask parallel jobs.