[PDF] SEH: Size Estimate Hedging for Single-Server Queues

Abstract

For a single server system, Shortest Remaining Processing Time (SRPT) is an optimal size-based policy. In this paper, we discuss scheduling a single-server system when exact information about the jobs' processing times is not available. When the SRPT policy uses estimated processing times, the underestimation of large jobs can significantly degrade performance. We propose a simple heuristic, Size Estimate Hedging (SEH), that only uses jobs' estimated processing times for scheduling decisions. A job's priority is increased dynamically according to an SRPT rule until it is determined that it is underestimated, at which time the priority is frozen. Numerical results suggest that SEH has desirable performance when estimation errors are not unreasonably large.

Full PDF

AA NEAR - OPTIMAL POLICY FOR SINGLE - SERVER SCHEDULINGWITH ESTIMATED JOB SIZES

Maryam Akbari-Moghaddam

Department of Computing and SoftwareMcMaster UniversityHamilton, Ontario [email protected]

Douglas G. Down

Department of Computing and SoftwareMcMaster UniversityHamilton, Ontario [email protected]

January 5, 2021 A BSTRACT

For a single server system, Shortest Remaining Processing Time (SRPT) is a size-based policy thatis optimal in the sense that, regardless of the job size distribution, it minimizes the number of jobsin the system at each point in time. However, one reason that size-based policies such as SRPT arerarely deployed in practice is that the exact processing times of jobs are usually not known to thescheduler. In this paper, we will discuss scheduling a single-server system when accurate informationabout the jobs’ processing times is not available. When the SRPT policy uses estimated processingtimes, the underestimation of large jobs can signiﬁcantly degrade performance. When the estimationerror distribution is known, the Gittins’ Index policy is known to be optimal in minimizing the meansojourn time in an M/G/1 queue. For a multiplicative error model, we ﬁrst characterize the Gittins’Index policy for any estimation error distribution. We then use insights from the Gittins’ Indexpolicy to construct a simple heuristic, Size Estimate Hedging (SEH), that only uses jobs’ estimatedprocessing times for scheduling while exhibiting near-optimal performance.

Over the past decades, there has been signiﬁcant study on the scheduling of jobs in single-server queues. Whenpreemption is allowed and processing times are known to the scheduler, the Shortest Remaining Processing Time(SRPT) policy is optimal in the sense that, regardless of the processing time distribution, it minimizes the number of jobsin the system at each point in time and hence, minimizes the mean sojourn time [1], [2]. However, scheduling policiessuch as SRPT are rarely deployed in practical settings. A key disadvantage is that the assumption of knowing the exactjob processing times prior to scheduling is not always practical to make. However, it is often possible to estimate the jobprocessing times and use this approximate information for scheduling. The Shortest Estimated Remaining ProcessingTime (SERPT) policy is a version of SRPT that employs the job processing time estimates as if they were error-free andthus, schedules the jobs based on their estimated remaining times. Motivated by the fact that estimates can often beobtained through machine learning techniques, Mitzenmacher [3] studies the potential beneﬁts of using such estimatesfor simple scheduling policies. For this purpose, a price for misprediction, the ratio between a job’s expected sojourntime using its estimated processing time and the job’s expected sojourn time when the job processing time is knownis introduced, and a bound on this price is given. The results in [3] suggest that naïve policies work well, and even aweak predictor can yield signiﬁcant improvements under policies such as SERPT. However, this insight is only madewhen the job processing times have relatively low variance. As discussed below, when job processing times have highvariance, underestimating even a single very large job can severely affect the smaller jobs’ sojourn times.The work in [3] has the optimistic viewpoint that it is possible to obtain improved performance by utilizing processingtime estimates in a simple manner. The more pessimistic view is that when job processing times are estimated,estimation errors naturally arise, and they can degrade a scheduling policy’s performance, if the policy was designedto exploit exact knowledge of job processing times [4]. The SERPT policy may have poor performance when thejob processing times have high variance and large jobs are underestimated. Consider a situation where a job with a a r X i v : . [ c s . OH ] D ec CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021processing time of 1000 enters the system and is underestimated by 10%. The moment the job has been processed for900 units (its estimated processing time), the server assumes that this job’s estimated remaining processing time iszero, and until it completes, the job will block the jobs already in the queue as well as any new arrivals. This situationbecomes more severe when both the actual job processing time and the level of underestimation increase. However,when the job processing times are generated from lower variance distributions, the underestimation of large jobs willnot cause severe performance degradation [5].Shortest Estimated Processing Time (SEPT) policy is a version of the Shortest Processing Time (SPT) policy that skipsupdating the estimated remaining processing times and prioritizes jobs based only on their estimated processing times.Experimental results show that SEPT has impressive performance in the presence of approximate job processing times,besides being much faster and simpler than SERPT [6].In this paper, we will discuss the problem of single-server scheduling when accurate information about the jobprocessing times is not available. In Section 2, we discuss the existing literature for scheduling policies that handleinexact job processing time information. Most of the existing literature analyzes and introduces size-based policieswhen the estimation error is relatively small, restricting applicability of the results. Furthermore, many simulation-basedexaminations only consider certain workload classes and are not validated over a range of job processing times andestimation error distributions. We propose a scheduling policy that exhibits near-optimal performance over a wide rangeof job processing time distributions, estimation error distributions, and workloads.When the job processing times are random, the Gittins’ Index policy [7], a dynamic priority-based policy, is optimal inminimizing the mean sojourn time in an M/G/1 queue [8]. When there are job processing time estimates, the Gittins’Index policy utilizes information about job estimated processing times and estimation error distributions to decidewhich job should be processed next. The assumption of knowing the estimation error distribution before schedulingmay be problematic in real environments. Furthermore, scheduling jobs using the Gittins’ Index policy introducescomputational overhead that may be prohibitive. Our proposed policy is motivated by the form of the Gittins’ indexpolicy.We make the following contributions: We ﬁrst specify the Gittins’ Index policy given a multiplicative estimation errordistribution. Secondly, while the SEPT policy performs well in the presence of estimated job processing times [6], wecombine the merits of SERPT and SEPT and introduce a new heuristic that outperforms SEPT when the job processingtimes exhibit high variance. We show that our proposed policy, which we call the Size Estimate Hedging (SEH) policy,has performance close to the Gittins’ Index policy. Similar to SERPT and SEPT, the SEH policy does not need anyinformation about the estimation error distribution and only uses the job processing time estimates to prioritize thejobs. Finally, we provide numerical results obtained by running a wide range of simulations for both synthetic andreal workloads. The key observations suggest that our proposed SEH policy performs in a near-optimal manner acrossall parameters and outperforms SERPT except in scenarios where the job processing time variance is extremely low.With the presence of more underestimated jobs in the system, the Gittins’ Index and SEH policies outperform the SEPTpolicy and achieve a mean sojourn time (MST) close to the MST of the optimal policy (SRPT) if the estimation errorsare removed. We also notice that the system load does not signiﬁcantly affect the relative performance of the policiesunder evaluation. The SEH policy treats underestimated and overestimated jobs fairly in contrast with other policies thattend to favor only one class of jobs. When the job processing time variance is high, the SEH and SEPT policies obtain anear-optimal mean slowdown value of , which indicates that underestimated large jobs do not delay small jobs.The rest of the paper is organized as follows. Section 2 presents the existing literature in scheduling single-serverqueues with estimated job processing times. Section 3 discusses the Gittins’ Index policy in depth and we deﬁne ourproposed SEH policy in Section 4. Our simulation experiments are described in detail in Section 5. We provide theresults of our simulations in Section 6 and conclude this work and discuss future directions in Section 7. Scheduling policies and their performance evaluation in a preemptive M/G/1 queue have been a subject of interestfor some time. Size-based policies are known to perform better than size-oblivious policies with respect to sojourntimes. In fact, the SRPT policy is optimal in minimizing the mean sojourn time [2]. However, size-based policies havea considerable disadvantage: When the exact processing times are not known to the system before scheduling, whichis often the case in practical settings, their performance may signiﬁcantly degrade. Dell’Amico et al. [9] study theperformance issues of SRPT with estimated job processing times and demonstrate the consequences of job processingtime underestimations under different settings. Studies in Harchol-Balter et al. [10] and Chang et al. [11] discussthe effect of inexact processing time information in size-based policies for web servers and MapReduce systems,respectively. This paper assumes that the processing time is not available to the scheduler until the job is fully processed,2

CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021but that processing time estimations are available. The related literature for this setting is reviewed in the followingparagraph.Lu et al. [4] were the ﬁrst to study this setting. They show that size-based policies only beneﬁt the performance whenthe correlation between a job’s real and estimated processing time is high. The results in Wierman and Nuyens [12],Bender et al. [13], and Becchetti et al. [14] are obtained by making assumptions that may be problematic in practice. Abound on the estimation error is assumed in [12], which results in disregarding scenarios where the jobs are highlyunderestimated/overestimated. On the other hand, [13] and [14] deﬁne speciﬁc job processing time classes and schedulethe jobs based on their processing time class, which is also problematic for very small or very large jobs, especiallywhen the processing time estimations are very inaccurate. This setting is also known as semi-clairvoyant scheduling. Inthis work, we do not assume any bounds on the estimation error or assign jobs to particular processing time classes.When the job processing time distribution is available, the Gittins’ Index policy [7] assigns a score to each job based onthe processing time it has received so far, and the scheduler chooses the job with the highest score to process at eachpoint in time. This policy is proven to be optimal for minimizing the mean sojourn time in a single-server queue whenthe job processing time distribution is known [8]. This policy is speciﬁed in the next section.

We ﬁnd that the Gittins’ Index Policy is an appropriate technique for determining scheduling policies when the jobprocessing time estimation error distribution is known. For a waiting job i , an index G ( a i ) is calculated, where a i is theelapsed processing time. At each time epoch, the Gittins’ Index policy processes the job with the highest index G ( a ) among all of the present waiting jobs [7]. The Gittins’ Index policy requires knowing the estimation error distributionbefore scheduling, which is not always possible and makes this policy difﬁcult to implement in practice. However,examining the form of optimal policies motivates the construction of a simple heuristic that can be used when the errordistribution is not known. As proposed in Section 4, our SEH policy is a dynamic-priority scheduling policy that hasperformance that is very close to the Gittins’ Index policy and only requires the job processing time estimates and thejobs’ elapsed processing times to make scheduling decisions. Consider an M/G/1 queue where preemption is allowed. We assume that a job’s processing time is not known uponarrival; however, an estimated processing time is provided to the scheduler and the distribution of estimation errors isknown. We concentrate on a multiplicative error model where the error distribution is independent of the job processingtime distribution. The estimated processing time ˆ S of a job is deﬁned as ˆ S = SX where S is the job processing timeand X is the job processing time estimation error. We assume that the value of ˆ S is known upon each job’s arrival andis given by ˆ s . The choice of a multiplicative error model results in having an absolute error proportional to the jobprocessing time S , thus avoiding situations where the estimation errors tend to be worse for small jobs than for largejobs. Furthermore, Dell’Amico et al. [9] and Pastorelli et al. [15] suggest that a multiplicative error model is a betterreﬂection of reality.To deﬁne our scheduling policies, we require the notion of a quantum of service. The job with the highest priorityis processed for a quantum of service ∆ until either it completes or a new job arrives. At that point, priorities arerecomputed. The Gittins’ rule takes the job’s elapsed processing time into account and calculates the optimal quantumof service ∆ ∗ ( a ) that it should receive.The associated efﬁciency function J ( a, ∆) , a, ∆ ≥ of a job with processing time S , elapsed processing time a andquantum of service ∆ is deﬁned as J ( a, ∆) = P ( S − a ≤ ∆ | S > a ) E [min { S − a, ∆ }| S > a ] . (1)The numerator is the probability that the job will be completed within a quantum of service ∆ , and the denominator isthe expected remaining processing time a job with elapsed processing time a and quantum of service ∆ will require tobe completed. With our estimation model in mind, (1) can be rewritten as J ( a, ∆ , ˆ s ) = P ( ˆ sX − a ≤ ∆ | ˆ sX > a ) E [min { ˆ sX − a, ∆ }| ˆ sX > a ] . CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021The Gittins’ index G ( a, ˆ s ) , a ≥ , is deﬁned by G ( a, ˆ s ) = sup ∆ ≥ J ( a, ∆ , ˆ s ) . The optimal quantum of service is denoted as ∆ ∗ ( a, ˆ s ) = sup { ∆ ≥ | G ( a, ˆ s ) = J ( a, ∆ , ˆ s ) } . Suppose that the lower and upper limits on the estimation error distribution are l and u , respectively ( l may be zero and u may be ∞ ). The Gittins’ index can then be written as G ( a, ˆ s ) = (cid:40) s − aE [ X | X ≤ ˆ sa ] , ˆ sa < u, s − aE [ X ] , otherwise, (2)where ∆ ∗ = ˆ sl − a . For instance, the Gittins’ index for a Log − N ( µ, σ ) error distribution is G ( a, ˆ s ) = 1ˆ s − ae µ + g ( a, ˆ s ) , (3)where g ( a, ˆ s ) = σ φ [ ln( ˆ sa − µ − σ ) σ ]2 φ [ ln( ˆ sa − µ ) σ ] , and φ is the cumulative distribution function of the Log − N (0 , σ ) distribution. Note that for the Log-Normaldistribution as the job processing time error distribution, the second case in (2) cannot happen.The server (preemptively) processes the job with the highest index at each decision epoch. Decisions are made when(i) a new job arrives to the queue, (ii) the current job under processing completes, or (iii) the current job receives itsoptimal quantum of service and does not complete. If there are multiple jobs that have the same highest index andall have zero optimal quanta of service, the processor will be shared among them as long as this situation does notchange. If there is only one job with the highest index and zero optimal quantum of service, its index should be updatedthroughout its processing [8].In the next section, we will examine the Gittins’ Index policy and use its form to motivate a simple heuristic that usesonly the job processing time estimates and elapsed processing times while performing desirably both with high varianceand low variance job processing times. Although the Gittins’ Index policy is optimal in terms of minimizing the mean sojourn time in an

M/G/ queue[8], the assumption of knowing the estimation error distribution might not always be practical to make. Furthermore,forming the Gittins’ Index policy’s efﬁciency function has signiﬁcant computational overhead. As a result, this policymay be a problematic choice for real environments where the scheduling speed is important. Taking the score in(3) into account, for any job with an estimated processing time ˆ s , the score calculated with the Gittins’ Index policycontinuously increases until the job completes. Fig. 1 shows this score for a job with an estimated processing time of20 and an estimation error generated from a Log − N (0 , σ ) distribution as a function of its elapsed processing time.We observe that for larger values of elapsed processing time, the slope of the score is decreasing. This issue motivatesus to make a very simple approximation to deﬁne our policy. We show that this policy, which we call the Size EstimateHedging (SEH) policy, not only avoids the calculation overhead in (2) but we will see that it can perform very closeto the Gittins’ Index policy while being independent from the choice of estimation error distribution. Figure 2 showsthe score calculated with the SEH policy for a job with an estimated processing time of 20 as a function of its elapsedprocessing time.To deﬁne the policy in detail, we ﬁrst note that the elapsed processing time of each job is zero upon arrival, and thesystem does not have any information about the jobs other than their estimated processing time. The Gittins’ index forthe SEH policy is such that it assigns scores like the SERPT policy up to the moment the elapsed processing time ofthe job becomes equal to its estimated processing time. When the job’s elapsed processing time becomes equal to itsestimated processing time, we consider the job as an underestimated job and stop increasing its score. This is consistentwith the score function having decreasing slope at some point beyond the point at which the elapsed processing timereaches the estimated processing time, as in Fig. 1. In other words, we would like to transition between SERPT when4 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021Figure 1: Job score calculated with the Gittins’ Index policy as a function of the elapsed processing timeFigure 2: Job score calculated with the SEH policy as a function of the elapsed processing timewe cannot determine if a job processing time is underestimated to a ﬁxed priority like SEPT when it is determinedthat underestimation has occurred. One consequence of using this policy is that any underestimated small job can stillreceive a “high" score and be processed, while underestimated large jobs will have a much lower score and do notinterfere, even with underestimated small jobs. Furthermore, not needing to know the job processing time estimationerror distribution, the SEH Policy does not have much overhead. Thus, it can schedule the jobs at a speed comparableto fast policies such as the SEPT policy.We introduce the Gittins’ index of our SEH policy as G ( a, ˆ s ) = (cid:40) s − a (1 − a s ) , ≤ a < ˆ s, s , a ≥ ˆ s, (4)where the scheduling decisions are only made at arrivals and departures.With the Gittins’ index in (4), a job’s score will increase up to the point that it receives processing equal to its estimatedprocessing time and then receives a constant score of s until it completes. We note that we have written (4) under theassumption that we have no information about the estimation error distribution. However, if for example we know E [ X ] , we can modify (4) by incorporating this information.5 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021

Parameter Deﬁnition Default , k shape for Weibull job processing time distribution . σ sigma in the Log-Normal error distribution . ρ system load . Table 1: Parameter Settings

In this section, we introduce the size-based scheduling policies considered for evaluation. We consider the SRPT policywhen the exact job processing times, given by s , are known before scheduling as our baseline policy. The SRPT policyis an “ideal" policy since it assumes that there are no errors in estimating the processing time.• SERPT policy — The SERPT policy is a version of SRPT that uses the estimates of job processing times as ifthey were the true processing times.•

SEPT policy — The SPT policy skips the SRPT policy’s updating of remaining processing times and onlyschedules jobs based on their estimated processing time.•

Gittins’ Index policy and SEH policy — The Gittins’ Index Policy and our proposed simpliﬁed priority-basedSEH policy are explained in detail in Section 3 and Section 4, respectively.All these policies ﬁt into the “scoring" framework, and they assign scores to each job and process the jobs in the queuein the descending order of their scores. Moreover, pre-emption is allowed, and a newly-arrived job can pre-empt thecurrent job if it has a higher score. The score functions in (2) and (4) show how we calculate the scores for the Gittins’Index Policy and SEH policy, respectively. The score functions for SRPT, SERPT, and SEPT are provided in (5), (6),and (7), respectively. G ( a, s ) = 1 s − a , (5) G ( a, ˆ s ) = (cid:26) s − a , ˆ s > a, ∞ , ˆ s ≤ a, (6) G ( a, ˆ s ) = 1ˆ s . (7)As for SEH, the scheduling decisions are only made at arrival and departure epochs. We evaluate the policies deﬁned in Section 5.1 with respect to two performance metrics: Mean Sojourn Time and MeanSlowdown. When the job processing times have large variance, the sojourn times for small jobs and large jobs differsigniﬁcantly. Thus, we use the per job slowdown, the ratio between a job’s sojourn time and its processing time [16].

We would like to evaluate the policies over a wide range of job processing time and estimation error distributions. Toaccomplish this, we use the same settings that Dell’Amico et al. [9] use in their work. Table 1 provides the defaultparameter values that we use in our simulation study. We now provide details of our simulation model.

Job Processing Time Distribution — We consider an

M/G/ queue where the job processing time is generatedaccording to a Weibull distribution. This allows us to model high variance job processing time distributions, whichbetter reﬂect the reality of computer systems (see [17], [18] for example). In general, the choice of a Weibull distributiongives us the ﬂexibility to model a range of scenarios. The shape parameter k in the Weibull distribution allows us toevaluate both high variance (smaller k ) and low variance (larger k ) job processing time distributions.6 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021Considering that the job processing time distribution plays a signiﬁcant role in the scheduling policies’ performance andsize-based policies show different behaviors with high variance job processing time distributions, we choose k = 0 . as our default shape for the Weibull job processing time distribution. With considering this choice for k , the schedulingpolicies’ performance is highly inﬂuenced by a few very large jobs that take a substantial percentage of the system’stotal processing time. We vary k between . and , considering speciﬁc values of . , . , . , . , , and .We show that the Gittins’ Index policy and SEH policy perform best in the presence of high variance job processingtime distributions. Job Processing Time Error Distribution — We have chosen the Log-Normal distribution as our error distribution sothat a job has an equal probability of being overestimated or underestimated. The Gittins’ index for this estimation errordistribution is shown in (3). The σ parameter controls the correlation between the actual and estimated processing time,as well as the estimation error variance. By increasing the σ value, the correlation coefﬁcient becomes smaller, and wewould be biased towards overestimating the job processing times. Moreover, the estimation error variance increaseswith σ , resulting in the occurrence of more large overestimations. Therefore, we choose σ = 0 . as the defaultvalue to generate underestimated and overestimated jobs with equal likelihood. We vary σ between . and withspeciﬁc values of . , . , . , . , , and to better illustrate the effect of σ on the evaluated policies’ performance. System Load — Following Lu et al. [4], we consider ρ = 0 . as the default load value and vary ρ between . (lightlyloaded) and . (heavily loaded) with increments of 0.05 and an additional system load of . . Number of Jobs — The number of jobs in each simulation run is , . We ﬁx the conﬁdence level at , and foreach simulation setting, we continue to perform simulation runs until the width of the conﬁdence interval is within of the estimated value. For low variance processing time distributions (larger k ), simulation runs sufﬁce; however,more simulation runs are required for high variance processing time distributions (smaller k ). In this section, we evaluate the performance of the policies in Section 5.1 by running experiments on both synthetic andreal workloads. We run different simulations by generating synthetic workloads based on different job processing timeand error parameters and we analyze these parameters’ effect on the performance of each of the policies.For evaluating our results in practical environments, we consider a real trace from a Facebook Hadoop cluster in 2010[19] and show that the policies’ performance is consistent with the results we obtained with synthetic workloads. Thekey observations, validated both on synthetic and real workloads, are highlighted as follows:• The Gittins’ Index policy and our proposed SEH policy outperform SERPT, except for the values of k and σ that correspond to the job processing time distributions where the variance is very low.• The Gittins’ Index and SEH policies outperform SEPT with lower values of σ (more job processing timeunderestimations) and have an MST near the optimal MST obtained without any estimation errors.• The Gittins’ Index policy performs best in reducing the MST of underestimated jobs, and SEH performs wellin reducing both the MST of overestimated jobs and underestimated jobs.• The load parameter does not have a signiﬁcant effect on the relative values of the MST obtained with theevaluated policies.• The SEH and SEPT policies have a near-optimal mean slowdown of 1 when the job processing time estimateshave a high variance.• The SEH policy is near-optimal across all job processing time and estimation error distributions considered.In what follows, we discuss the numerical results and how they support these key observations. Synthetic Workloads — We ﬁrst note that the job processing time k parameter and the estimation error σ parameterhave the most impact on the policies’ performance. Thus, we focus on varying these parameters. We show that theGittins’ Index policy and our SEH policy outperform SERPT, except for the values of k and σ that correspond todistributions with extremely low variance. For the scenarios where we do not state the parameter values explicitly, theparameters in Table 1 (see Section 5.3) are considered.Figure 3 captures the impact of job processing time variance and displays the mean sojourn time (MST) of the Gittins’Index, SEH, SERPT, and SEPT policies normalized against the mean sojourn time obtained with SRPT with σ having7 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021the default value of . . We observe that for a high variance job processing time distribution ( k = 0 . ), SERPTperforms very poorly compared to the other policies due to the presence of large, underestimated jobs. We note that theSERPT policy performs well if the variance of the processing times is sufﬁciently low. Based on Figure 3, we noticethat the gap between SEPT and the Gittins’ Index policy grows slightly when the job processing time variance is lower.The gap between SEH and the Gittins’ Index policy also grows but not to the same degree as SEPT. For k > . , theperformance of the Gittins’ Index policy, SEH, and SERPT are quite close. In fact, we observe that our SEH policyperforms very close to the Gittins’ Index policy. Furthermore, we notice that as the variance in processing times getssmaller, the gap between what is achievable by the policy under evaluation and what is achievable if there were noerrors is larger than for the high variance scenarios.Figure 3: Impact of k on the mean sojourn timeIn Figure 3, we observe that the Gittins’ Index policy’s MST is a little worse than the MST of the SEH policy for k = 0 . . The reason is that lower values of k cause more large jobs in the system, and the Gittins’ Index policy allowssome of these jobs to have higher scores. Thus, it spends more time on some larger jobs that remain in the system at theend of the simulation run. We will see this effect in a number of other policies, where the Gittins’ Index policy appearsto be outperformed, which is clearly not possible.The shape parameter k affects the job processing time variance and the scheduling policies’ performance the most,especially when the job processing time distribution has high variance. We can be optimistic about using estimates if thevariance is low, but we have to be careful in choosing the scheduling policy if the job processing time variance is high.The literature focuses on high variance workloads, and we will continue evaluating the policies on such workloads. InFigure 4, we display the normalized MST of the policies against the MST of the SRPT policy under varying σ , ρ = 0 . ,and the default k = 0 . . We notice that the Gittins’ Index, SEH, and SEPT policies are relatively insensitive to the σ value, while the gap between these three policies and SERPT increases with increasing σ . In fact, the Gittins’ Index andSEH policies outperform SEPT with σ ≤ . and have an MST near the optimal MST obtained without any estimationerrors. The reason is that the impact of the Gittins’ Index policy and SEH becomes more prominent with the presenceof more underestimations ( σ < . ).Figure 5a, Figure 5b, and Figure 5c show the result of simulations with considering the default values in Table 1 andvarying the system load between . and . for all jobs, only the overestimated jobs, and only the underestimatedjobs, respectively. If we concentrate only on one class of jobs (overestimated or underestimated), the Gittins’ Indexpolicy is no longer optimal. We observe that the Gittins’ Index and SEH policies perform best in minimizing the overallMST given different system loads. The Gittins’ Index policy performs best in reducing the MST of underestimatedjobs and the SEH policy has desirable performance in reducing the MST of all jobs, the overestimated jobs, and theunderestimated jobs. Figure 5a shows that the load parameter does not have a signiﬁcant effect on the MST since theratio between the MST of each policy and the MST of SRPT remains almost unchanged.The mean slowdown is the other metric we consider to evaluate the performance of the policies. High values of meanslowdown indicate that some jobs spend a disproportionate amount of time waiting. In Figure 6, we show the mean8 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021Figure 4: Impact of σ on the mean sojourn time (a) All jobs (b) Overestimated jobs (c) Underestimated jobs Figure 5: Impact of ρ on the mean sojourn timeslowdown for different values of k with ρ = 0 . and a σ value of . . We note that the mean slowdown of SERPT isnot included since it is several orders of magnitude higher for k ≤ . . We see that the Gittins’ Index, SEH, and SEPTpolicies have similar performance. The SEH and SEPT policies have a near-optimal mean slowdown of 1 under highvariance job processing time distributions (smaller k ). The reason is that under these policies, the very small jobs thatmake up the majority of the jobs are processed the moment they enter the system, and no large job blocks them.We conclude our experiments on synthetic workload by indicating that the Gittins’ Index and SEH policies performbetter than SERPT under different parameter settings. The only exception is extreme situations like the low variancejob processing time distributions (larger k ) where SERPT outperforms SEH and works analogously to the Gittins’Index policy. Real Workloads — We consider a Facebook Hadoop cluster trace from [19] and show that the results with thisworkload look very similar to those with synthetic workloads generated with k = 0 . . The trace consists of , jobs. We assume each job’s processing time is the sum of its input, intermediate output, and ﬁnal output bytes. Thejob processing times of this workload follow a high variance distribution, and thus, we run hundreds of simulationsto reach the desired conﬁdence interval (as described in Section 5.3). We vary the error estimation distribution’s σ parameter to evaluate different scenarios of job overestimation and underestimation. To maintain the default settings inTable 1, we deﬁne the processing speed in bytes per second. The arrival rate λ is chosen to yield the desired ρ = 0 . .Figure 7 shows the MST normalized against the optimal MST obtained with SRPT with varying σ between . (moreprocessing time underestimation) and (more processing time overestimation). We observe that this result is consistent9 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021Figure 6: Impact of k on the mean slowdownwith what is shown in Figure 4. The Gittins’ Index and SEH policies perform best when more job processing timeunderestimations are present in the system, and they perform better than SERPT across all values of σ .Figure 7: MST of the Facebook Hadoop workloadIn Figure 8, we display the mean slowdown obtained with the policies under evaluation. Similar to Figure 6, we havenot included the mean slowdown of SERPT since it is several orders of magnitude higher. We observe that for σ < . ,where we have more job processing time underestimations, and for σ = 0 . , where processing time underestimationsand overestimations are equally likely, the Gittins’ Index, SEH, and SEPT policies have a near-optimal mean slowdownof 1. The SEPT policy’s performance is best when more job processing time overestimations are present.10 CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021Figure 8: Mean slowdown of the Facebook Hadoop workload

The SRPT policy, which is optimal for scheduling in single-server systems, may have problematic performance whenjob processing times are estimated. This work has considered the problem of scheduling with the presence of inaccuratejob processing time information. A multiplicative error model is used to produce estimation errors proportional to thejob processing times. Considering that the Gittins’ Index policy is optimal for scheduling jobs in an M/G/1 queue whenthe job processing time distribution is known, we have speciﬁed the Gittins’ Index for any estimation error distribution.Inspired by the Gittins’ Index policy, we have introduced a novel heuristic that combines the merits of SERPT and SEPTand requires minimal calculation overhead and no information about the estimation error distribution. Our numericalresults demonstrate that the SEH policy has near-optimal performance in minimizing both the MST and mean slowdownof the system under different parameters and outperforms SERPT except in scenarios where the job processing timevariance is extremely low. Examining the SEH policy under other error models as well as analytic bounds as to how farit is from optimal could be investigated in future work.Not much work has been done in the area of multi-server scheduling with the presence of estimation errors. Onemajor reason is that determining optimal policies for multi-server queues is much more challenging compared to thesingle-server case. Mailach and Down [20] suggest that when SRPT is used in a multi-server system, the estimationerror affects the system’s performance to a lesser degree than in a single-server system. Grosof et al. [21] prove thatmulti-server SRPT is asymptotically optimal when an M/G/k system is heavily loaded. Our work only evaluates theperformance of SEH in a single-server framework so we leave the extension and evaluation of this policy in multi-serverqueues for future investigation.

References [1] L. E. Schrage and L. W. Miller, “The queue M/G/1 with the shortest remaining processing time discipline,”

Operations Research , vol. 14, no. 4, pp. 670–684, 1966.[2] L. Schrage, “Letter to the editor—a proof of the optimality of the shortest remaining processing time discipline,”

Operations Research , vol. 16, no. 3, pp. 687–690, 1968.[3] M. Mitzenmacher, “Scheduling with predictions and the price of misprediction,” arXiv preprint arXiv:1902.00732 ,2019.[4] D. Lu, H. Sheng, and P. Dinda, “Size-based scheduling policies with inaccurate scheduling information,” in

The IEEE Computer Society’s 12th Annual International Symposium on Modeling, Analysis, and Simulation ofComputer and Telecommunications Systems, 2004.(MASCOTS 2004). Proceedings.

IEEE, 2004, pp. 31–38.11

CHEDULING WITH ESTIMATED JOB SIZES - J

ANUARY

5, 2021[5] R. Mailach, “Robustness to estimation errors for size-aware scheduling,” Ph.D. dissertation, McMaster University,Department of Computing and Software, Canada, 2017.[6] M. Dell’Amico, “Scheduling with inexact job sizes: The merits of shortest processing time ﬁrst,” arXiv preprintarXiv:1907.04824 , 2019.[7] J. C. Gittins, “Bandit processes and dynamic allocation indices,”

Journal of the Royal Statistical Society: Series B(Methodological) , vol. 41, no. 2, pp. 148–164, 1979.[8] S. Aalto, U. Ayesta, and R. Righter, “On the Gittins index in the M/G/1 queue,”

Queueing Systems , vol. 63, no.1-4, p. 437, 2009.[9] M. Dell’Amico, D. Carra, and P. Michiardi, “PSBS: Practical size-based scheduling,”

IEEE Transactions onComputers , vol. 65, no. 7, pp. 2199–2212, 2015.[10] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal, “Size-based scheduling to improve web performance,”

ACM Transactions on Computer Systems (TOCS) , vol. 21, no. 2, pp. 207–233, 2003.[11] H. Chang, M. Kodialam, R. R. Kompella, T. Lakshman, M. Lee, and S. Mukherjee, “Scheduling in mapreduce-likesystems for fast completion time,” in . IEEE, 2011, pp. 3074–3082.[12] A. Wierman and M. Nuyens, “Scheduling despite inexact job-size information,” in

Proceedings of the 2008 ACMSIGMETRICS international conference on Measurement and modeling of computer systems , 2008, pp. 25–36.[13] M. A. Bender, S. Muthukrishnan, and R. Rajaraman, “Improved algorithms for stretch scheduling,” in

Proceedingsof the thirteenth annual ACM-SIAM symposium on Discrete algorithms , 2002, pp. 762–771.[14] L. Becchetti, S. Leonardi, A. Marchetti-Spaccamela, and K. Pruhs, “Semi-clairvoyant scheduling,”

Theoreticalcomputer science , vol. 324, no. 2-3, pp. 325–335, 2004.[15] M. Pastorelli, A. Barbuzzi, D. Carra, M. Dell’Amico, and P. Michiardi, “HFSP: size-based scheduling for Hadoop,”in . IEEE, 2013, pp. 51–59.[16] A. Wierman, “Fairness and scheduling in single server queues,”

Surveys in Operations Research and ManagementScience , vol. 16, no. 1, pp. 39–48, 2011.[17] M. Harchol-Balter, “The effect of heavy-tailed job size distributions on computer system design.” in

Proc. ofASA-IMS Conf. on Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics , 1999.[18] M. E. Crovella, M. S. Taqqu, and A. Bestavros, “Heavy-tailed probability distributions in the World Wide Web,”

Apractical guide to heavy tails , vol. 1, pp. 3–26, 1998.[19] Y. Chen, S. Alspaugh, and R. Katz, “Interactive analytical processing in big data systems: A cross-industry studyof mapreduce workloads,” arXiv preprint arXiv:1208.4174 , 2012.[20] R. Mailach and D. G. Down, “Scheduling jobs with estimation errors for multi-server systems,” in , vol. 1. IEEE, 2017, pp. 10–18.[21] I. Grosof, Z. Scully, and M. Harchol-Balter, “SRPT for multiserver systems,”