Performance Analysis of Modified SRPT in Multiple-Processor Multitask Scheduling

Abstract

In this paper we study the multiple-processor multitask scheduling problem in both deterministic and stochastic models. We consider and analyze the Modified Shortest Remaining Processing Time (M-SRPT) scheduling algorithm, a simple modification of SRPT, which always schedules jobs according to SRPT whenever possible, while processing tasks in an arbitrary order. The M-SRPT algorithm is proved to achieve a competitive ratio of $\Theta(\log\alpha + \beta)$ for minimizing response time, where $\alpha$ denotes the ratio between maximum job workload and minimum job workload, and $\beta$ represents the ratio between maximum non-preemptive task workload and minimum job workload. In addition, the competitive ratio achieved is shown to be optimal (up to a constant factor) when there is a constant number of machines. We further consider the problem under Poisson arrivals and a general workload distribution (i.e., the M/GI/N system), and show that M-SRPT achieves asymptotically optimal mean response time when the traffic intensity $\rho$ approaches 1, if the job size distribution has finite support. Beyond finite job workload, the asymptotic optimality of M-SRPT also holds for infinite job size distributions under certain probabilistic assumptions, for example, the M/M/N system with finite task workload.


A Note on Multiple-Processor Multitask Scheduling

Wenxin Li, Department of ECE, The Ohio State University

Ness Shroff, Department of ECE and CSE, The Ohio State University

June 12, 2020

Abstract

In this paper we study the multiple-processor multitask scheduling problem in the deterministic and stochastic models. We consider and analyze M-SRPT, a simple modification of the shortest remaining processing time algorithm, which always schedules jobs according to SRPT whenever possible, while processing tasks in an arbitrary order. The modified SRPT algorithm is shown to achieve a competitive ratio of $\Theta(\log\alpha + \beta)$ for minimizing flow time, where $\alpha$ denotes the ratio of maximum job workload to minimum job workload, and $\beta$ represents the ratio between maximum non-preemptive task workload and minimum job workload. The algorithm is shown to be optimal (up to a constant factor) when there is a constant number of machines. We further consider the problem under Poisson arrivals and a general workload distribution, where M-SRPT is proved to be asymptotically optimal when the traffic intensity $\rho$ approaches 1, provided that the task size is upper bounded by the derived bound $\eta$.

Introduction

With widespread applications in various manufacturing industries, scheduling jobs to minimize the total flow time (also known as response time, sojourn time and delay) is one of the most classic and fundamental problems in operations research and has been extensively studied. As an important metric measuring the quality of a scheduler, flow time is formally defined as the difference between job completion time and release date, and characterizes the amount of time that the job spends in the system.

Optimizing the objective of flow time has been considered in both offline and online scenarios. If preemption is allowed, the shortest remaining processing time (SRPT) discipline is known to be optimal in the single-machine environment.
Many generalizations of this basic formulation become NP-hard, for example, the non-preemptive single-machine model and the preemptive model with two machines [5]. When jobs arrive online, no information about jobs is known to the algorithm in advance, and several algorithms with logarithmic competitive ratio have been proposed in various settings [5, 1]. On the other hand, while SRPT minimizes the mean response time sample-path wise, it requires knowledge of the remaining job service time. Gittins proved that the Gittins index policy minimizes the mean delay in an M/G/1 queue, which only requires access to information about the job size distribution.

Though much progress has been made in single-task job scheduling, there is a lack of theoretical understanding of multiple-processor multitask scheduling (MPMS). Jobs with multiple tasks are common and relevant in practice, as jobs and tasks can take many different forms in modern computing environments. For example, to compute a matrix-vector product, we can divide matrix elements and vector elements into groups of columns and rows respectively, so that the tasks correspond to the block-wise multiplication operations. Moreover, tasks can also be the map, shuffle and reduce procedures in the MapReduce framework. To this end, in this paper we investigate how to minimize the total flow time of multitask jobs in a multiserver system, where a job is considered to be completed only when all the tasks within the job are finished.

With the tremendous increase in data size and job complexity, the importance of multiple-processor multitask scheduling in the modern era can hardly be overstated. More specifically, distributed computing has become a useful tool for tackling many large-scale computational challenges, since parallel algorithms can be advantageous over their sequential counterparts: they divide computationally expensive jobs over machines as multiple tasks in order to utilize the combined computational power of the processors.
For example, there are two basic perspectives for designing distributed scalable machine learning methods [4]: data-parallel and model-parallel. In the first perspective, the data set is partitioned and dispersed across different machines, and each machine has a local copy of the whole model, while the model-parallel framework partitions and distributes the model parameters over different workers and updates a subset of parameters on each worker. A natural question is how to design efficient scheduling algorithms that minimize the total amount of time that the multitask jobs spend in the system.

On the other hand, the ability to preempt jobs is important for desirable performance in flow time minimization [3, p.506]. When preemption is not available, the approach of checkpoint-based preemption is suggested [3, p.506]. Checkpointing is a fault-tolerance technique that prevents applications with large processing time from being forced to restart from the very beginning. Similarly, we can take extra time to checkpoint jobs and restart them from the last checkpoint, which provides more flexibility for scheduling jobs. Since the number of checkpoints can be varied, it is important to understand the effect of checkpoints on system performance. If there are only a few checkpoints, the performance is close to that under non-preemptive disciplines; if the number of checkpoints is large, we need to pay a large amount of extra time for saving job states and restarting jobs. It is natural to ask how to choose the number of checkpoints to ensure good performance.

Related Work.

For the MapReduce framework, Wang et al. [12] studied the problem of scheduling map tasks with data locality, and proposed a map task scheduling algorithm consisting of the Join the Shortest Queue policy and the MaxWeight policy. The algorithm asymptotically minimizes the number of backlogged tasks (which is directly related to the delay performance by Little's law) when the arrival rate vector approaches the capacity region boundary. Zheng et al. [14] proposed an online scheduler called available shortest remaining processing time (ASRPT), which is shown to achieve an efficiency ratio of no more than two.

However, little is known about multitask scheduling. Scully et al. [9] presented the first theoretical analysis of the single-processor multitask scheduling problem, and gave an optimal policy that is easy to compute for batch arrivals, under the assumption that the processing times of tasks follow aged Pareto distributions. To model the scenario in which the scheduler has incomplete information about job size, Scully et al. [10] introduced the multistage job model and proposed an optimal scheduling algorithm for multistage job scheduling in the M/G/1 queue. In addition, a closed-form expression of the mean response time is given for the optimal scheduler. Sun et al. [11] studied the multitask scheduling problem when all the tasks are of unit size, and proved that among causal and non-preemptive policies, the fewest unassigned tasks first (FUT) policy, the earliest due date first (EDD) policy, and first come first serve (FCFS) are near delay-optimal in distribution (stochastic ordering) for minimizing the metrics of average delay, maximum lateness and maximum delay respectively.

Contributions.

In this paper we answer the aforementioned questions, and our contributions are summarized as follows.

The analysis in this paper follows from and is closely related to that in [2, 6].

• We present Algorithm 1 in Section 3, which is a simple modification of SRPT and achieves a competitive ratio of $O(\log\alpha + \beta)$, where $\alpha$ is the maximum-to-minimum job workload ratio and $\beta$ represents the ratio between maximum non-preemptive task workload and minimum job workload. In addition, it can be shown that no $o(\log\alpha + \beta)$-competitive algorithm exists when the number of machines is constant. For the class of work-conserving algorithms, $O(\log\alpha + \beta^{1-\varepsilon})$ is the best possible competitive ratio.

• Under certain probabilistic structure on the problem instances, we further establish the following conclusion about the algorithm in Section 4, by utilizing our aforementioned result in the adversarial setting. Assuming that jobs arrive according to a Poisson process, we prove that Algorithm 1 is optimal when the load $\rho \to 1$, as long as the workload of non-preemptive tasks is upper bounded by the threshold $\eta$ specified in equation (10).

Deterministic Model.

We are given a set $\mathcal{J} = \{J_1, J_2, \ldots, J_n\}$ of $n$ jobs arriving online over time, together with a set of $N$ identical machines. Job $i$ consists of $n_i$ tasks and its workload $p_i$ is equal to the sum of the processing times of its tasks, i.e., $p_i = \sum_{\ell \in [n_i]} p_i^{(\ell)}$, where $p_i^{(\ell)}$ represents the processing time of task $\ell$. Tasks can be either preemptive or non-preemptive. A task is non-preemptive if it cannot be interrupted once it starts service, i.e., the task is run to completion. All the information about job $i$ is unknown to the algorithm until its release date $r_i$.

Under any given scheduling algorithm, the completion time of job $j$, denoted by $C_j$, is equal to the maximum completion time of the individual tasks within the job. Formally, let $C_j^{(\ell)}$ be the completion time of task $\ell$ in job $j$; then $C_j = \max_{\ell \in [n_j]} C_j^{(\ell)}$. The flow time of job $j$ is defined as $F_j = C_j - r_j$, and our objective is to minimize the total flow time $\sum_{j \in [n]} F_j$. Note that different tasks within the same job may or may not be allowed to be processed in parallel; our analysis holds for both scenarios.

Throughout the paper we use $\alpha = \max_{i \in [n]} p_i / \min_{i \in [n]} p_i$ to denote the ratio of maximum to minimum job workload. Let $\eta = \max\{p_i^{(\ell)} \mid \text{task } \ell \text{ of job } i \text{ is non-preemptive}\}$ be the maximum processing time of a non-preemptive task, and let $\beta = \eta / \min_{i \in [n]} p_i$ be the ratio between $\eta$ and the minimum job workload. In some sense, the parameters $\beta$ and $\eta$ represent the degree of non-preemptivity and exhibit a trade-off between the preemptive and non-preemptive settings, since the problem degenerates to the preemptive case when $\eta = 1$, and approaches the non-preemptive case when $\eta$ increases to $\max_{i \in [n]} p_i$.

The definitions of work-conserving algorithms and competitive ratio are formally given as follows, and the notation of this paper is summarized in Table 1.

Definition 1 (Work-conserving scheduling algorithm [3]).
A scheduling algorithm $\pi$ is called work-conserving if it never idles machines when there exists at least one feasible job or task awaiting execution in the system. Here a job or task is called feasible if it satisfies all the given constraints of the system (e.g., precedence constraints, preemptive and non-preemptive constraints, etc.).

Definition 2 (Competitive ratio). The competitive ratio of an online algorithm $A$ refers to the worst-case ratio of the cost incurred by $A$ to that of the optimal offline algorithm $A^*$ over all input instances $\omega$ in $\Omega$, i.e.,

$$\mathrm{CR}_A = \max_{\omega \in \Omega} \frac{\mathrm{Cost}_A(\omega)}{\mathrm{Cost}_{A^*}(\omega)}.$$

In the multiple-processor multitask scheduling problem, the cost is the total flow time under instance $\omega = \{(r_i, p_i^{(\ell)})\}_{\ell \in [n_i],\, i \in [n]}$.

Stochastic Model.

In the stochastic setting, we assume that jobs arrive into the system according to a Poisson process with rate $\lambda$. Job processing times are i.i.d. with probability density function $f(\cdot)$. The analysis relies on the concept of a busy period, which is defined as follows.

Definition 3 (Busy Period [3]). A busy period is defined to be the longest time interval in which no machines are idle.

We use $B(w)$ to denote the length of a busy period started by a workload of $w$. It can be seen that $B(\cdot)$ is an additive function [3, p.460], i.e., for all $w_1, w_2$,

$$B(w_1 + w_2) = B(w_1) + B(w_2),$$

as a busy period with initial workload $w_1 + w_2$ can be regarded as a busy period started by initial workload $w_1$, followed by a busy period started by initial workload $w_2$. Moreover, the length of a busy period with initial workload $w$ and load $\rho$ is shown to be equal to

$$B(w) = \frac{\mathbb{E}[w]}{1-\rho}. \tag{1}$$

Table 1: Notation Table

Symbol           Meaning
$N$              number of machines
$n$              number of jobs
$r_i$            arrival time of job $i$
$p_i$            total workload of job $i$
$\rho_{\le y}$   load composed of jobs with size 0 to $y$: $\rho_{\le y} = \lambda \int_0^y t f(t)\,dt$
$\alpha$         job size ratio: $\alpha = \max_{i\in[n]} p_i / \min_{i\in[n]} p_i$
$\eta$           maximum processing time of a single task
$\beta$          $\eta / \min_{i\in[n]} p_i$
$C_i$            completion time of job $i$
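The two stochastic quantities above, the partial load $\rho_{\le y}$ and the busy-period relation (1), can be evaluated numerically. The following is a minimal sketch and not part of the original analysis: the function names `load_up_to` and `mean_busy_period`, the midpoint-rule integration, and the exponential density used in the usage lines are all assumptions made for illustration.

```python
import math


def load_up_to(y, lam, pdf, steps=20000):
    """Approximate rho_{<=y} = lam * integral_0^y t f(t) dt with the midpoint rule.

    `pdf` is the job-size density f; `steps` is a hypothetical accuracy knob.
    """
    h = y / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h          # midpoint of the k-th sub-interval
        total += t * pdf(t) * h
    return lam * total


def mean_busy_period(mean_initial_work, rho):
    """Right-hand side of equation (1): E[w] / (1 - rho), valid for 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load rho must lie in [0, 1)")
    return mean_initial_work / (1.0 - rho)


# Usage with an assumed Exp(1) job-size density and arrival rate 0.8.
exp_pdf = lambda t: math.exp(-t)
partial_load = load_up_to(2.0, 0.8, exp_pdf)
busy = mean_busy_period(2.0, 0.5)
```

For the exponential density the integral has the closed form $1 - e^{-y}(1+y)$, which gives a convenient sanity check on the numerical routine.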

Competitive Ratio Analysis

The main idea of Algorithm 1 is similar to that of SRPT, i.e., we devote as many resources as possible to the job with the smallest remaining workload, in order to reduce the number of alive jobs in a greedy manner, while satisfying all the given constraints.
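To make the greedy rule concrete, here is a minimal slotted-time sketch of the idea, formalized as Algorithm 1. It is an illustration under stated assumptions rather than the authors' implementation: time is slotted, task sizes are positive integers, the tasks of a job are processed sequentially in list order (so at most one machine serves a job), and the names `Job` and `m_srpt_total_flow_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    release: int                    # release date r_i
    tasks: List[Tuple[int, bool]]   # (size in slots, is_preemptive) per task
    idx: int = 0                    # index of the task currently being processed
    done: int = 0                   # slots already spent on the current task
    completion: int = -1            # completion time C_i, -1 while unfinished

    def remaining(self) -> int:
        """Remaining workload W_i(t) of the job."""
        return sum(size for size, _ in self.tasks[self.idx:]) - self.done

    def locked(self) -> bool:
        """True if a non-preemptive task has started and must run to completion."""
        return (self.idx < len(self.tasks) and self.done > 0
                and not self.tasks[self.idx][1])


def m_srpt_total_flow_time(jobs: List[Job], n_machines: int) -> int:
    """Simulate the greedy rule slot by slot and return the total flow time.

    Jobs are mutated in place. Machines held by started non-preemptive tasks
    stay with them; the rest go to alive jobs in SRPT order.
    """
    t = 0
    while any(j.completion < 0 for j in jobs):
        alive = [j for j in jobs if j.release <= t and j.completion < 0]
        locked = [j for j in alive if j.locked()]
        free = sorted((j for j in alive if not j.locked()),
                      key=lambda j: j.remaining())
        running = locked + free[:max(0, n_machines - len(locked))]
        for j in running:
            j.done += 1
            if j.done == j.tasks[j.idx][0]:     # current task finished
                j.idx, j.done = j.idx + 1, 0
                if j.idx == len(j.tasks):       # whole job finished
                    j.completion = t + 1
        t += 1
    return sum(j.completion - j.release for j in jobs)


# One machine: a size-4 job released at 0 and a size-1 job released at 1.
non_preemptive = m_srpt_total_flow_time(
    [Job(0, [(4, False)]), Job(1, [(1, True)])], 1)
preemptive = m_srpt_total_flow_time(
    [Job(0, [(4, True)]), Job(1, [(1, True)])], 1)
```

In this small instance the short job waits behind the non-preemptive task in the first run, while in the second run it preempts the long job, which illustrates how the parameter $\eta$ affects flow time.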

Algorithm 1: Modified SRPT (M-SRPT)

At each time slot $t$, maintain the following quantities:
• For each job $i \in [n]$, maintain
  – $W_i(t)$ // remaining workload
  – $w_i(t)$ // remaining workload of the shortest single task being processed (if one exists) or alive
• $\mathcal{J}(t) \leftarrow \{i \in [n] \mid w_i(t) = 0\}$ // jobs with tasks that are finished at time $t$
• $d(t) \leftarrow |\mathcal{J}(t)|$ // number of machines to be reallocated
and assign the alive jobs to the $d(t)$ machines, where jobs with lower remaining workload have higher priority.

When parallelism is not allowed, at most one machine is allocated to a single job. Our main result is stated in the following theorem.

Theorem 4.

Algorithm 1 achieves a competitive ratio that is no more than

$$\mathrm{CR}_{\text{M-SRPT}} \le \log\frac{p_{\max}}{p_{\min}} + \frac{2\eta}{p_{\min}} + 8.$$

To show the competitive ratio above, we divide the jobs into different classes and compare the number of remaining jobs under Algorithm 1 with that under the optimal algorithm $\pi^*$. For any algorithm $\pi$, at time slot $t$, we divide the unfinished jobs into $\Theta(\log\alpha)$ classes $\{\mathcal{C}_k(\pi, t)\}_{k \in [\log\alpha + 1]}$, based on their remaining workload. Jobs with remaining workload that is no more than $2^k$ and larger than $2^{k-1}$ are assigned to the $k$-th class. Formally,

$$\mathcal{C}_k(\pi, t) = \left\{ i \in [n] \,\middle|\, W_i(\pi, t) \in (2^{k-1}, 2^k] \right\},$$

where $W_i(\pi, t)$ represents the unfinished workload of job $i$ at time $t$. In the following analysis, we use $\mathcal{C}_{[k]}(\pi, t) = \cup_{i=1}^{k} \mathcal{C}_i(\pi, t)$ to denote the collection of jobs in the first $k$ classes, and let $W^{[k]}_{\pi}(t) = \sum_{i=1}^{k} W^{(i)}_{\pi}(t)$ represent the total remaining workload of jobs in the first $k$ classes, where $W^{(k)}_{\pi}(t)$ denotes the amount of remaining workload of jobs in class $\mathcal{C}_k(\pi, t)$. $W^{(k)}_{\pi^*}(t)$ and $W^{[k]}_{\pi^*}(t)$ are defined in a similar way for the optimal scheduling algorithm $\pi^*$.

We first prove the following lemma, which relates the remaining workload under M-SRPT to that under the optimal algorithm $\pi^*$.

Lemma 5.

For all $t \ge 0$, the unfinished workload under Algorithm 1 can be upper bounded as

$$W^{[k]}_{\text{M-SRPT}}(t) \le W^{[k]}_{\pi^*}(t) + N \cdot (2^{k+1} + \eta + 1). \tag{2}$$

Proof: In the proof we always divide jobs into classes according to the remaining workload under M-SRPT, so we suppress the reference to M-SRPT in the notation $\mathcal{C}_k$. Without loss of generality we can assume that $W^{[k]}_{\text{M-SRPT}}(t) > W^{[k]}_{\pi^*}(t)$, since otherwise Lemma 5 already holds. Since the remaining workload under M-SRPT is strictly larger than that under the optimal algorithm, we claim that there must exist time slots in $(0, t]$ at which either
• idle machines exist under M-SRPT; or
• jobs with remaining workload (under M-SRPT) larger than $2^k$ are processed.

Otherwise, all the machines will be processing jobs belonging to the set $\mathcal{C}_{[k]}(t)$ before time $t$, while no jobs in higher classes, i.e., $\cup_{i>k} \mathcal{C}_i(t)$, will be switched into class $\mathcal{C}_{[k]}(t)$. Combining this with the fact that the initial workloads under Algorithm 1 and the optimal algorithm are identical, i.e., $W^{[k]}_{\text{M-SRPT}}(0) = W^{[k]}_{\pi^*}(0)$, we see that $W^{[k]}_{\text{M-SRPT}}(t)$ should be no more than $W^{[k]}_{\pi^*}(t)$, which is a contradiction.

Now consider the following two collections of time slots before $t$:

$$\mathcal{T}^{(1)}_k = \left\{ \bar t \in [0, t] \,\middle|\, \text{at time } \bar t \text{, at least one machine is idle under Algorithm 1} \right\},$$

$$\mathcal{T}^{(2)}_k = \left\{ \bar t \in [0, t] \,\middle|\, \text{at time } \bar t \text{, there exists } i > k \text{ such that at least one machine is processing jobs in } \mathcal{C}_i \text{ under Algorithm 1} \right\}.$$

Let $\bar t^{(i)}_k = \max\{t \mid t \in \mathcal{T}^{(i)}_k\}$ ($i \in \{1, 2\}$) be the last time slot in $\mathcal{T}^{(i)}_k$, based on which we divide our proof into the following two cases.

Case 1: $\bar t^{(1)}_k \ge \bar t^{(2)}_k$. From the definition of $\bar t^{(1)}_k$, it can be seen that during $(\bar t^{(1)}_k, t]$, no machines are idle or process jobs with remaining workload larger than $2^k$ under Algorithm 1, while the increment in remaining workload incurred by newly arriving jobs is identical for Algorithm 1 and $\pi^*$.
In addition, it is important to point out that $([n] \setminus \mathcal{C}_{[k]}(\bar t^{(1)}_k)) \cap \mathcal{C}_{[k]}(\tilde t) = \emptyset$ for all $\tilde t \in (\bar t^{(1)}_k, t]$, i.e., no job switches from a higher class to $\mathcal{C}_{[k]}$ during $(\bar t^{(1)}_k, t]$. Hence

$$W^{[k]}_{\text{M-SRPT}}(t) - W^{[k]}_{\pi^*}(t) \le W^{[k]}_{\text{M-SRPT}}(\bar t^{(1)}_k) - W^{[k]}_{\pi^*}(\bar t^{(1)}_k).$$

It suffices to prove the workload difference inequality (2) for $t = \bar t^{(1)}_k$, i.e.,

$$W^{[k]}_{\text{M-SRPT}}(\bar t^{(1)}_k) \le W^{[k]}_{\pi^*}(\bar t^{(1)}_k) + N \cdot (2^{k+1} + \eta + 1). \tag{3}$$

Note that there exist idle machines at time $t = \bar t^{(1)}_k$, which implies that under Algorithm 1 the number of jobs alive must be less than $N$. Hence $W^{[k]}_{\text{M-SRPT}}(\bar t^{(1)}_k) \le (N - 1) \cdot 2^k$ and (3) holds.

Case 2: $\bar t^{(1)}_k < \bar t^{(2)}_k$. According to the definition of $\bar t^{(2)}_k$, there exist jobs with remaining workload larger than $2^k$ being processed at $\bar t^{(2)}_k$; we use $\hat{\mathcal{J}}(\bar t^{(2)}_k) \subseteq [n] \setminus \mathcal{C}_{[k]}(\bar t^{(2)}_k)$ to denote the collection of such jobs.

When all the tasks are processed preemptively, we can obtain (2) easily, as we are able to conclude that there are at most $N - 1$ jobs in $\mathcal{C}_{[k]}(\bar t^{(2)}_k)$. This is because tasks are allowed to be preempted, while Algorithm 1 selects a job with remaining workload larger than $2^k$ at the beginning of time $\bar t^{(2)}_k$. Consequently $W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) \le n^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) \cdot 2^k$, and for all $t > \bar t^{(2)}_k$,

$$W^{[k]}_{\text{M-SRPT}}(t) - W^{[k]}_{\pi^*}(t) \le W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k) + \left[N - n^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k)\right] \cdot 2^k \le N \cdot 2^k,$$

where the first inequality follows from the fact that no more than $N - n^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k)$ jobs switch from higher classes to $\mathcal{C}_{[k]}(t)$, as at most $N - n^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k)$ jobs with remaining workload larger than $2^k$ are being processed at time $\bar t^{(2)}_k$.
Hence Lemma 5 holds.

Now for the case when there exist non-preemptive tasks, the arguments above do not work, since machines may be processing tasks with remaining workload larger than $2^k$ and $n^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k)$ may be larger than $N$. Let $r \in [N]$ be the number of tasks that are being processed at time $\bar t^{(2)}_k$ and belong to jobs in $[n] \setminus \mathcal{C}_{[k]}(\bar t^{(2)}_k)$, and let $t_s \le \bar t^{(2)}_k$ be the latest starting time of these tasks. We divide our analysis into the following two subcases:

• Case 2.1: No jobs switch from the set $[n] \setminus \mathcal{C}_{[k]}(t_s)$ to $\mathcal{C}_{[k]}(\bar t^{(2)}_k)$ under Algorithm 1. We use $\Delta_k$ to represent the increment of $W^{[k]}$ incurred by the newly arriving jobs during the time period $[t_s, \bar t^{(2)}_k]$. Then we have

$$W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) - W^{[k]}_{\text{M-SRPT}}(t_s) = -(N - r)(\bar t^{(2)}_k - t_s) + \Delta_k. \tag{4}$$

On the other hand, $W^{[k]}_{\pi^*}$, the remaining workload of jobs in class $\mathcal{C}_{[k]}$ under the optimal algorithm, decreases at a speed that is no more than $N$ units of workload per time slot, hence

$$W^{[k]}_{\pi^*}(\bar t^{(2)}_k) - W^{[k]}_{\pi^*}(t_s) \ge -N \cdot (\bar t^{(2)}_k - t_s) + \Delta_k. \tag{5}$$

According to the definition of $\bar t^{(2)}_k$, no jobs with remaining workload larger than $2^k$ are processed in $(\bar t^{(2)}_k, t]$. Compared with time $\bar t^{(2)}_k$, at most $r$ jobs switch from $[n] \setminus \mathcal{C}_{[k]}(\bar t^{(2)}_k)$ to the set $\mathcal{C}_{[k]}(\bar t^{(2)}_k + 1)$. Therefore

$$W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k + 1) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k + 1) \le W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k) + r \cdot 2^k. \tag{6}$$

Combining inequalities (4) through (6), we obtain

$$\begin{aligned}
W^{[k]}_{\text{M-SRPT}}(t) - W^{[k]}_{\pi^*}(t) &\le W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k + 1) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k + 1) \\
&\le W^{[k]}_{\text{M-SRPT}}(t_s) - W^{[k]}_{\pi^*}(t_s) + r \cdot \left[2^k + (\bar t^{(2)}_k - t_s)\right] \\
&\le (N - 1) \cdot 2^k + r \cdot (\bar t^{(2)}_k - t_s) \\
&\le N \cdot (2^k + \eta).
\end{aligned}$$

The third inequality above holds since at time $t_s$, Algorithm 1 is required to do job selection and a job with remaining workload larger than $2^k$ is selected.
The last inequality follows from the fact that $\bar t^{(2)}_k - t_s \le \eta$, as $t_s$ is the starting time of a non-preemptive task that is still alive at time $\bar t^{(2)}_k$.

• Case 2.2: There exist jobs switching from the set $[n] \setminus \mathcal{C}_{[k]}(t_s)$ to $\mathcal{C}_{[k]}(\bar t^{(2)}_k)$ under Algorithm 1. We use $\mathcal{J}_s$ to denote the collection of such switching jobs. It is essential to bound the number of switching jobs, which incur an increment of $|\mathcal{J}_s| \cdot 2^k$ in the remaining workload of class $\mathcal{C}_{[k]}$. A straightforward bound is $|\mathcal{J}_s| \le N \cdot (\bar t^{(2)}_k - t_s) \le N \cdot \eta$, since at most $N$ jobs receive service at each time slot, and hence the number of switching jobs per slot is no more than $N$. However, this bound is loose and we argue that

$$|\mathcal{J}_s| \le N - r. \tag{7}$$

Notice that after a job switches to class $\mathcal{C}_{[k]}$ during $[t_s, \bar t^{(2)}_k]$, it can only be preempted by jobs that are also in class $\mathcal{C}_{[k]}$, which is due to the SRPT rule. According to the precondition of this case, there are $r$ jobs in the set $[n] \setminus \mathcal{C}_{[k]}$ that are continuously being processed during $[t_s, \bar t^{(2)}_k]$, hence at most $N - r$ units of resources per time slot are available for the remaining jobs. Note that resources allocated to jobs in $\mathcal{C}_{[k]}$ will not be utilized for switching a job from a higher class to $\mathcal{C}_{[k]}$. In addition, finished jobs make no contribution to the total remaining workload $W^{[k]}_{\text{M-SRPT}}(t)$. Hence $|\mathcal{J}_s|$ is no more than $N - r$.

Furthermore, we can derive the following conclusion:

$$\begin{aligned}
W^{[k]}_{\text{M-SRPT}}(t) - W^{[k]}_{\pi^*}(t) &\le W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k + 1) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k + 1) \\
&\le W^{[k]}_{\text{M-SRPT}}(\bar t^{(2)}_k) - W^{[k]}_{\pi^*}(\bar t^{(2)}_k) + r \cdot 2^k + N && \text{(job switching at } \bar t^{(2)}_k\text{)} \\
&\le \left[W^{[k]}_{\text{M-SRPT}}(t_s) - W^{[k]}_{\pi^*}(t_s) + (N - r) \cdot 2^k + N \cdot (\bar t^{(2)}_k - t_s)\right] + r \cdot 2^k + N && \text{(job switching during } [t_s, \bar t^{(2)}_k]\text{)} \\
&\le N \cdot (2^{k+1} + \eta + 1). && (\bar t^{(2)}_k - t_s \le \eta)
\end{aligned}$$

The proof is complete. □

We are ready to prove the competitive ratio of Algorithm 1.

Proof of Theorem 4:

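Let $n_{\text{M-SRPT}}(t)$ and $n_{\pi^*}(t)$ represent the number of jobs alive at time $t$ under Algorithm 1 and the optimal scheduler respectively. For all $t \ge 0$,

$$\begin{aligned}
n_{\pi^*}(t) &\ge \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{(k)}_{\pi^*}(t)}{2^k} \\
&= \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\pi^*}(t) - W^{[k-1]}_{\pi^*}(t)}{2^k} && \text{(definition of } W^{[k]}_{\pi^*}(t)\text{)} \\
&= \frac{W^{[\log p_{\max}+1]}_{\pi^*}(t)}{2^{\log p_{\max}+1}} + \sum_{k=\log p_{\min}}^{\log p_{\max}} \frac{W^{[k]}_{\pi^*}(t)}{2^{k+1}} \\
&\ge \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\pi^*}(t)}{2^{k+1}}.
\end{aligned} \tag{8}$$

On the other hand, the number of jobs alive under Algorithm 1 can be upper bounded in a similar fashion,

$$\begin{aligned}
n_{\text{M-SRPT}}(t) &\le \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{(k)}_{\text{M-SRPT}}(t)}{2^{k-1}} \\
&= \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\text{M-SRPT}}(t) - W^{[k-1]}_{\text{M-SRPT}}(t)}{2^{k-1}} && \text{(definition of } W^{[k]}_{\text{M-SRPT}}(t)\text{)} \\
&= \sum_{k=\log p_{\min}}^{\log p_{\max}} \frac{W^{[k]}_{\text{M-SRPT}}(t)}{2^{k}} + \frac{W^{[\log p_{\max}+1]}_{\text{M-SRPT}}(t)}{2^{\log p_{\max}}} \\
&\le \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\text{M-SRPT}}(t)}{2^{k-1}}.
\end{aligned}$$

Using Lemma 5, we are able to relate the number of unfinished jobs under the two algorithms,

$$\begin{aligned}
n_{\text{M-SRPT}}(t) &\le \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\text{M-SRPT}}(t)}{2^{k-1}} \\
&\le \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{W^{[k]}_{\pi^*}(t)}{2^{k-1}} + \sum_{k=\log p_{\min}}^{\log p_{\max}+1} \frac{N \cdot (2^k + \eta)}{2^{k-1}} \\
&\le 4\, n_{\pi^*}(t) + N \cdot \left(2\log\alpha + \frac{4\eta}{p_{\min}} + 4\right),
\end{aligned}$$

where the last inequality follows from inequality (8), which gives $\sum_k W^{[k]}_{\pi^*}(t)/2^{k-1} \le 4\, n_{\pi^*}(t)$, together with $\sum_{k \ge \log p_{\min}} 2^{-(k-1)} \le 4/p_{\min}$. To summarize, the competitive ratio of Algorithm 1 satisfies $\mathrm{CR}_{\text{M-SRPT}} = \sum_{t:\, n_{\text{M-SRPT}}(t)}$ …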

Fact 6.

For the multiple-processor multitask scheduling problem with a constant number of machines, there exists no algorithm achieving a competitive ratio of $o(\log\alpha + \beta)$.

Proof: When $p_{\min} = \eta = 1$, the problem degenerates to the preemptive setting and no algorithm can achieve a competitive ratio of $o(\log\alpha)$. When $\eta = p_{\max}$, the problem degenerates to the non-preemptive setting and $O(\beta)$ is the best possible competitive ratio if the number of machines is constant. The proof is complete. □

Fact 7.

For the multiple-processor multitask scheduling problem, any work-conserving algorithm has a competitive ratio of $\Omega(\log\alpha + \beta^{1-\varepsilon})$ for all $\varepsilon > 0$.

Proof: The reasoning is similar to that of Fact 6, since work-conserving algorithms cannot achieve a competitive ratio of $o(\beta^{1-\varepsilon})$ in the non-preemptive setting. □

Optimality with Poisson Arrival

In this section we show that under mild probabilistic assumptions, Algorithm 1 is asymptotically optimal for minimizing the total flow time in the heavy traffic regime. The main result is stated as follows.

Theorem 8.

Let $F^{\text{M-SRPT}}_{\rho}$ and $F^{*}_{\rho}$ be the mean flow time incurred by Algorithm 1 and the optimal algorithm respectively, when the traffic intensity is equal to $\rho$. In an M/G/N system with job size distribution that is either (1) bounded or (2) unbounded with a tail function of upper Matuszewska index less than $-2$, Algorithm 1 is heavy traffic optimal, i.e.,

$$\lim_{\rho \to 1} \frac{\mathbb{E}[F^{\text{M-SRPT}}_{\rho}]}{\mathbb{E}[F^{*}_{\rho}]} = 1, \tag{9}$$

as long as the size of a single task is no more than

$$\eta = \begin{cases}
o\!\left(\dfrac{1}{(1-\rho) \cdot \int_0^{\infty} \frac{f(x)}{1-\rho_{\le x}}\,dx}\right) & \text{Case (1)} \\[3ex]
o\!\left(\dfrac{1}{(1-\rho) \cdot G^{-1}(\rho) \cdot \int_0^{\infty} \frac{f(x)}{1-\rho_{\le x}}\,dx}\right) & \text{Case (2)}
\end{cases} \tag{10}$$
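To get a feel for the scale of the threshold in Case (1) of (10), the integral $\int_0^\infty f(x)/(1-\rho_{\le x})\,dx$ can be evaluated numerically for a concrete bounded distribution. The sketch below is an illustration only: it assumes job sizes uniform on $[0, 2]$ (mean 1, so $\lambda = \rho$ and $\rho_{\le x} = \rho x^2/4$), and the function names are hypothetical. The theorem requires $\eta$ to be of smaller order than the returned reference scale; the code does not compute $\eta$ itself.

```python
import math


def tail_weighted_integral(rho, steps=200_000):
    """Midpoint-rule estimate of int_0^2 f(x) / (1 - rho_{<=x}) dx.

    Assumes Uniform(0, 2) job sizes: f(x) = 1/2 and rho_{<=x} = rho * x^2 / 4.
    """
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        rho_le_x = rho * x * x / 4.0
        total += 0.5 / (1.0 - rho_le_x) * h
    return total


def eta_reference_scale_case1(rho):
    """Reference scale 1 / ((1 - rho) * integral) appearing inside o(.) in Case (1)."""
    return 1.0 / ((1.0 - rho) * tail_weighted_integral(rho))


# For this assumed distribution the integral equals atanh(sqrt(rho)) / sqrt(rho).
numeric = tail_weighted_integral(0.9)
closed_form = math.atanh(math.sqrt(0.9)) / math.sqrt(0.9)
scale_090 = eta_reference_scale_case1(0.9)
scale_099 = eta_reference_scale_case1(0.99)
```

Under this assumed distribution the integral grows only logarithmically as $\rho \to 1$, so the reference scale is dominated by the $1/(1-\rho)$ factor and grows as the load increases.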

Remark.

The probabilistic assumptions (1) and (2) here are all with respect to the distribution of job size, i.e., the total workload of tasks. For the processing time of a single task, the only assumption we have is the upper bound $\eta$. It can be seen that the optimality result in [2] corresponds to the special case when $\eta = 1$, while the bound derived in (10) could be extremely large when $\rho$ approaches 1. On the other hand, for the integral above, we have the following rough estimate,

$$\int_0^{\infty} \frac{f(x)}{1-\rho_{\le x}}\,dx \le \int_1^{\infty} \frac{x f(x)}{1-\rho_{\le x}}\,dx + \int_0^{1} \frac{f(x)}{1-\rho_{\le x}}\,dx \le \log\left(\frac{1}{1-\rho}\right) + \frac{1}{1-\rho} \le \frac{2}{1-\rho}.$$

Lower bound on minimum flow time $\mathbb{E}[F^{*}_{\rho}]$. To start with, we consider the benchmark system consisting of a single machine with speed $N$, where all the tasks can be served in a preemptive fashion, i.e., the concept of a task is unnecessary in this setting. The performance of SRPT for this single-server system is summarized by the following fact.

Fact 9 ([7]). In an M/G/1 queue with service distribution that is either (1) bounded or (2) unbounded with a tail function of upper Matuszewska index less than $-2$,

$$\mathbb{E}[F^{\text{SRPT-1}}_{\rho}] = \begin{cases}
\Theta\!\left(\dfrac{1}{1-\rho}\right) & \text{Case (1)} \\[2ex]
\Theta\!\left(\dfrac{1}{(1-\rho) \cdot G^{-1}(\rho)}\right) & \text{Case (2)}
\end{cases}$$

where $G^{-1}(\cdot)$ denotes the inverse of $G(x) = \rho_{\le x}/\rho$.

Clearly, the mean flow time under SRPT for this system serves as a valid lower bound for the multitask problem, i.e.,

$$\mathbb{E}[F^{*}_{\rho}] \ge \mathbb{E}[F^{\text{SRPT-1}}_{\rho}]. \tag{11}$$

Proof of Theorem 8: Our main goal is to derive an analytical upper bound on the quantity $\mathbb{E}[F^{\text{M-SRPT}}_{\rho}]$. The proof mainly follows from techniques in [2, 8], which relate the flow time of the tagged job to an appropriate busy period.

Consider a tagged job with remaining workload $x$, arrival time $r_x$ and completion time $C_x$. The computing resources of the $N$ servers must be spent on the following types of jobs during the time interval $[r_x, C_x]$:

1. The system may be dealing with jobs with remaining workload larger than $x$, or some machines may be idle, while the tagged job is in service, because the number of jobs alive is smaller than $N$. We use $W_{\text{waste}}(r_x)$ to represent the amount of such resources; then

$$W_{\text{waste}}(r_x) \le (N - 1) \cdot x, \tag{12}$$

which is the same as the corresponding lemma in [2]. The reason is straightforward: the tagged job must be in service according to Algorithm 1, hence the number of such time slots cannot exceed $x$ and thus (12) holds.

2. The system may be dealing with jobs with remaining workload no more than $x$ at time $r_x$. The amount of resources spent on this class is no more than $W^{\text{M-SRPT}}_{\le x}(r_x)$. Here we use $W^{\text{M-SRPT}}_{\le x}(t)$ to denote the total workload of jobs with remaining workload no more than $x$ at time $t$.

3. The system may be dealing with jobs which have a remaining workload larger than $x$ at time $t = r_x$, while the tagged job is not in service.
This is possible and happens only if the systemmay be processing tasks which belong to a job with total remaining workload larger than x , the tasks are in service before time r x and the non-preemptive rule allows the task to beserved from time r x onwards. Let W non − pm ( r x ) denote the total units of computing resourcesspent on this class of jobs during [ r x , C x ] . Our main argument for this class of jobs is W non − pm ( r x ) ≤ ( N + N ) · η + N · x, (13)To see the correctness of inequality (13), we consider time intervals [ r x , r x + η ] and ( r x + η, C x ] separately. • Note that there are N · η computing resources during time [ r x , r x + η ] in total, henceit is obvious to see that the amount of resources spent on this collection of jobs during [ r x , r x + η ] cannot exceed N · η . • We next show that in time interval ( r x + η, C x ] , the total amount of computing resourcesspent on such jobs is no more than N · η + N · x . Consider the following two types ofjobs: – Note that jobs of this class that have a remaining workload larger than x at time t = r x + η will be processed after time t = r x + η only if the tagged job is in service,hence the amount of resources spending on such jobs are already taken into accountin the ﬁrst class above, i.e. , the quantity W waste ( r x ) , and we can ignore this subclass. – For the collection of jobs with remaining workload no more than x at time t = r x + η ,we ﬁrst consider the setting when diﬀerent tasks within the same job can be processedin parallel. It is clear to see that the remaining workload of such jobs at time t = r x x + N · η . Since there are at most N such jobs in total, we canconclude that the remaining workload of jobs in this subclass must be no more than N · ( x + N · η ) = N · x + N · η , which implies that W non − pm ( r x ) ≤ N · x + N · η + N η and (13) holds.4.

Tagged job itself. The amount of resources is equal to $x$, the size of the tagged job.

5. Newly arriving jobs during $[r_x, C_x]$ with size no more than $x$.

Hence $f_x^{\text{M-SRPT}}$, the flow time of the tagged job, is no more than the length of a busy period with arrival rate $\rho_{\le x}$ and initial workload $W^{\text{waste}}(r_x) + W^{\text{non-pm}}(r_x) + W^{\text{M-SRPT}}_{\le x}(r_x) + x$. Therefore

$$
\begin{aligned}
f_x^{\text{M-SRPT}} &\le_{st} B^{(\rho_{\le x})}\Big(W^{\text{waste}}(r_x) + W^{\text{non-pm}}(r_x) + W^{\text{M-SRPT}}_{\le x}(r_x) + x\Big) \\
&\overset{(a)}{=} B^{(\rho_{\le x})}\Big(W^{\text{waste}}(r_x) + W^{\text{non-pm}}(r_x) + x\Big) + B^{(\rho_{\le x})}\Big(W^{\text{M-SRPT}}_{\le x}(r_x)\Big) \\
&\overset{(b)}{\le} B^{(\rho_{\le x})}\Big(N^2\cdot\eta + N\cdot(2x+\eta)\Big) + B^{(\rho_{\le x})}\Big(W^{\text{M-SRPT}}_{\le x}(r_x)\Big) \\
&\overset{(c)}{\le} \underbrace{B^{(\rho_{\le x})}\Big(N\cdot(\eta+x)\Big)}_{\Sigma_1} + \underbrace{B^{(\rho_{\le x})}\Big(W^{\text{SRPT-1}}_{\le x}(r_x)\Big)}_{\Sigma_2},
\end{aligned}
$$

where (a) follows from the additivity of busy periods, (b) uses the upper bounds established in (12) and (13), and (c) follows from Lemma 10.

Note that the average flow time under SRPT in a single-server system is lower bounded as

$$\mathbb{E}\big[F^{\text{SRPT-1}}_{\rho}\big] \ge \mathbb{E}\Big[B^{(\rho_{\le x})}\big(W^{\text{SRPT-1}}_{\le x}(t)\big)\Big] = \mathbb{E}_{x,r_x}\Big[B^{(\rho_{\le x})}\big(W^{\text{SRPT-1}}_{\le x}(r_x)\big)\Big] = \mathbb{E}_{x,r_x}[\Sigma_2], \tag{14}$$

where the first equality holds due to the Poisson Arrivals See Time Averages (PASTA) property [13]. Note also that

$$\mathbb{E}[\Sigma_1] = O\Big(\mathbb{E}\big[B^{(\rho_{\le x})}(\eta + x)\big]\Big) = O\Big(\mathbb{E}\Big[\frac{\eta + x}{1-\rho_{\le x}}\Big]\Big) = O\Big(\log\frac{1}{1-\rho}\Big) + \eta\cdot O\Big(\int_0^{\infty}\frac{f(x)}{1-\rho_{\le x}}\,dx\Big).$$

To achieve heavy-traffic optimality, it suffices to show that the difference between the average flow time under Algorithm 1 and that under the optimal algorithm is a lower-order term, i.e.,

$$\lim_{\rho\to 1}\frac{\mathbb{E}\big[F^{\text{M-SRPT}}_{\rho}\big] - \mathbb{E}\big[F^{\text{SRPT-1}}_{\rho}\big]}{\mathbb{E}\big[F^{\text{SRPT-1}}_{\rho}\big]} = 0. \tag{15}$$

Note that

$$\mathbb{E}\big[F^{\text{M-SRPT}}_{\rho}\big] = \mathbb{E}_{x,r_x}\big[f_x^{\text{M-SRPT}}\big] \le \mathbb{E}_{x,r_x}[\Sigma_1] + \mathbb{E}_{x,r_x}[\Sigma_2],$$

so, combined with (14), condition (15) holds provided that

$$\lim_{\rho\to 1}\frac{\eta\cdot O\Big(\int_0^{\infty}\frac{f(x)}{1-\rho_{\le x}}\,dx\Big)}{\mathbb{E}\big[F^{\text{SRPT-1}}_{\rho}\big]} = 0,$$

since $\log(1/(1-\rho))$ is always a lower-order term compared with the optimal flow time $\mathbb{E}\big[F^{\text{SRPT-1}}_{\rho}\big]$. Hence

$$
\eta =
\begin{cases}
o\left(\dfrac{1}{(1-\rho)\cdot\int_0^{\infty}\frac{f(x)}{1-\rho_{\le x}}\,dx}\right) & \text{Case (1)} \\[2ex]
o\left(\dfrac{1}{(1-\rho)\cdot G^{-1}(\rho)\cdot\int_0^{\infty}\frac{f(x)}{1-\rho_{\le x}}\,dx}\right) & \text{Case (2)}
\end{cases}
$$

□
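The bound on $\mathbb{E}[\Sigma_1]$ relies on the standard fact that a busy period started with initial work $w$ under load $\rho < 1$ has mean length $w/(1-\rho)$. The following Python sketch checks this identity by simulation. It is only an illustration under simplifying assumptions that are not part of the paper: a single unit-speed server, exponentially distributed job sizes, and hypothetical function and parameter names (`busy_period_length`, `mean_busy_period`, `initial_work`, `arrival_rate`, `mean_size`).

```python
import random


def busy_period_length(initial_work, arrival_rate, mean_size, rng):
    """Simulate one busy period of a unit-speed single server.

    The period starts with `initial_work` units of work. Jobs arrive as a
    Poisson process with rate `arrival_rate`; sizes are exponential with
    mean `mean_size` (an assumption made for this sketch only).
    """
    elapsed = 0.0
    work = initial_work
    while True:
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        if gap >= work:
            # The server drains all remaining work before the next arrival.
            return elapsed + work
        elapsed += gap
        # Work drains at unit speed for `gap`, then the new job adds its size.
        work += rng.expovariate(1.0 / mean_size) - gap


def mean_busy_period(initial_work, arrival_rate, mean_size, samples=20000, seed=0):
    """Monte Carlo estimate of the mean busy period length."""
    rng = random.Random(seed)
    total = sum(
        busy_period_length(initial_work, arrival_rate, mean_size, rng)
        for _ in range(samples)
    )
    return total / samples


if __name__ == "__main__":
    w, lam, mean_size = 2.0, 0.5, 1.0
    rho = lam * mean_size  # load, must stay below 1
    print("simulated mean:", mean_busy_period(w, lam, mean_size))
    print("w / (1 - rho): ", w / (1.0 - rho))
```

With the parameters in the usage block the load is $\rho = 0.5$, so the simulated mean should be close to $2/(1-0.5) = 4$. As $\rho$ approaches 1 the factor $1/(1-\rho)$ grows without bound, which is why the initial workload entering $\Sigma_1$, and hence $\eta$, must be controlled.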

Lemma 10.

The difference between the quantity $W_{\le y}(t)$ under Algorithm 1 and under the optimal algorithm is upper bounded by

$$W^{\text{M-SRPT}}_{\le y}(t) - W^{\text{SRPT-1}}_{\le y}(t) \le N\cdot(y + \eta + 1), \quad \forall\, y, t \ge 0.$$

Proof: The proof is similar to that of Lemma 5. □

References

[1] Yossi Azar and Noam Touitou. Improved online algorithm for weighted flow time. In FOCS, pages 427–437, 2018.

[2] Isaac Grosof, Ziv Scully, and Mor Harchol-Balter. SRPT for multiserver systems. Performance Evaluation, 127:154–175, 2018.

[3] Mor Harchol-Balter. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013.

[4] Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, and Eric P. Xing. STRADS: a distributed framework for scheduled model parallel machine learning. In EuroSys, pages 5:1–5:16, 2016.

[5] Stefano Leonardi and Danny Raz. Approximating total flow time on parallel machines. In STOC, pages 110–119, 1997.

[6] Stefano Leonardi and Danny Raz. Approximating total flow time on parallel machines. Journal of Computer and System Sciences, 73(6):875–891, 2007.

[7] Minghong Lin, Adam Wierman, and Bert Zwart. Heavy-traffic analysis of mean response time under shortest remaining processing time. Performance Evaluation, 68(10):955–966, 2011.

[8] Linus E. Schrage and Louis W. Miller. The queue M/G/1 with the shortest remaining processing time discipline. Operations Research, 14(4):670–684, 1966.

[9] Ziv Scully, Guy Blelloch, Mor Harchol-Balter, and Alan Scheller-Wolf. Optimally scheduling jobs with multiple tasks. ACM SIGMETRICS Performance Evaluation Review, 45(2):36–38, 2017.

[10] Ziv Scully, Mor Harchol-Balter, and Alan Scheller-Wolf. Optimal scheduling and exact response time analysis for multistage jobs. arXiv preprint arXiv:1805.06865, 2018.

[11] Yin Sun, C. Emre Koksal, and Ness B. Shroff. Near delay-optimal scheduling of batch jobs in multi-server systems. Ohio State Univ., Tech. Rep., 2017.

[12] Weina Wang, Kai Zhu, Lei Ying, Jian Tan, and Li Zhang. Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality. IEEE/ACM Transactions on Networking (TON), 24(1):190–203, 2016.

[13] Ronald W. Wolff. Poisson arrivals see time averages. Operations Research, 30(2):223–231, 1982.

[14] Yousi Zheng, Ness B. Shroff, and Prasun Sinha. A new analytical technique for designing provably efficient MapReduce schedulers. In