Optimal Multiserver Scheduling with Unknown Job Sizes in Heavy Traffic

Abstract

We consider scheduling to minimize mean response time of the M/G/k queue with unknown job sizes. In the single-server case, the optimal policy is the Gittins policy, but it is not known whether Gittins or any other policy is optimal in the multiserver case. Exactly analyzing the M/G/k under any scheduling policy is intractable, and Gittins is a particularly complicated policy that is hard to analyze even in the single-server case. In this work we introduce monotonic Gittins (M-Gittins), a new variation of the Gittins policy, and show that it minimizes mean response time in the heavy-traffic M/G/k for a wide class of finite-variance job size distributions. We also show that the monotonic shortest expected remaining processing time (M-SERPT) policy, which is simpler than M-Gittins, is a 2-approximation for mean response time in the heavy traffic M/G/k under similar conditions. These results constitute the most general optimality results to date for the M/G/k with unknown job sizes. Our techniques build upon work by Grosof et al., who study simple policies, such as SRPT, in the M/G/k; Bansal et al., Kamphorst and Zwart, and Lin et al., who analyze mean response time scaling of simple policies in the heavy-traffic M/G/1; and Aalto et al. and Scully et al., who characterize and analyze the Gittins policy in the M/G/1.


arXiv preprint [cs.PF]

ZIV SCULLY,

Carnegie Mellon University, USA

ISAAC GROSOF,

Carnegie Mellon University, USA

MOR HARCHOL-BALTER,

Carnegie Mellon University, USA

Scheduling to minimize mean response time of the M/G/k queue is an important problem in queueing theory. In the single-server ($k = 1$) case with known job sizes, the shortest remaining processing time (SRPT) policy is easily shown to be optimal [29]. If the scheduler does not know job sizes, which is very often the case in practical systems, then a more complex policy called the Gittins policy is known to be optimal [3, 4, 12]. The Gittins policy tailors its priority scheme to the job size distribution, and it takes a simple form in certain special cases. For example, for distributions with decreasing hazard rate (DHR), Gittins becomes the foreground-background (FB) policy, so FB is optimal in the M/G/1 for DHR job size distributions [3, 4, 11].

In contrast to the M/G/1, much less is known about optimal scheduling in the M/G/k with $k \ge 2$, with the only nontrivial results holding under heavy traffic. For known job sizes, recent work by Grosof et al. [15] shows that a multiserver analogue of SRPT is optimal in the heavy-traffic M/G/k. For unknown job sizes, Grosof et al. [15] address only the case of DHR job size distributions, showing that a multiserver analogue of FB is optimal in the heavy-traffic M/G/k. But in general, optimal scheduling is an open problem for unknown job sizes, even in heavy traffic. We therefore ask:

What scheduling policy minimizes mean response time in the heavy-traffic M/G/k with unknown job sizes and general job size distribution?

A job's response time, also called sojourn time or latency, is the amount of time between its arrival and its completion.

FB is the policy that prioritizes the job of least age, meaning the job that has been served the least so far. It is also known as least attained service (LAS).

Here "heavy traffic" refers to the limit as the system load approaches capacity for a fixed number of servers.

Both the SRPT and FB optimality results of Grosof et al. [15] hold under technical conditions similar to finite variance.

Ziv Scully, Isaac Grosof, and Mor Harchol-Balter

This is a very difficult question. In order to answer it, we draw upon several recent lines of work in scheduling theory.
• As part of their heavy-traffic optimality proofs, Grosof et al. [15] use a tagged job method to stochastically bound M/G/k response time under each of SRPT and FB relative to M/G/1 response time (Fig. 2.1) under the same policy.
• Lin et al. [23] and Kamphorst and Zwart [19] characterize the heavy-traffic scaling of M/G/1 mean response time under SRPT and FB, respectively.
• Scully et al. [33] show that a policy called monotonic shortest expected remaining processing time (M-SERPT), which is considerably simpler than Gittins, has M/G/1 mean response time within a constant factor of that of Gittins.

While these prior results do not answer the question on their own, together they suggest a plan of attack for proving optimality in the heavy-traffic M/G/k.

When searching for a policy to minimize mean response time, a natural candidate is a multiserver analogue of Gittins. As a first step, one might hope to use the tagged job method of Grosof et al. [15] to stochastically bound M/G/k response time under Gittins relative to M/G/1 response time. Unfortunately, the tagged job method does not apply to multiserver Gittins, because it relies on both stochastic and worst-case properties of the scheduling policy, whereas Gittins has poor worst-case properties.

One of our key ideas is to introduce a new variant of Gittins, called monotonic Gittins (M-Gittins), that has better worst-case properties than Gittins while maintaining similar stochastic properties. This allows us to generalize the tagged job method [15] to M-Gittins, thus bounding its M/G/k response time relative to its M/G/1 response time.

Our M/G/k analysis of M-Gittins reduces the question of whether M-Gittins is optimal in the heavy-traffic M/G/k to analyzing the heavy-traffic scaling of M-Gittins's M/G/1 mean response time. However, there are no heavy-traffic scaling results for the M/G/1 under policies other than SRPT [23], FB [19], first-come, first-served (FCFS) [21, 22], and a small number of other simple policies [6, 9]. To remedy this, we derive heavy-traffic scaling results for M-Gittins in the M/G/1.

It turns out that analyzing M-Gittins directly is very difficult. Fortunately, M-Gittins has a simpler cousin, M-SERPT, which Scully et al. [33] introduce and analyze. We analyze M-SERPT in heavy traffic as a key stepping stone in our heavy-traffic analysis of M-Gittins.

This paper makes the following contributions:
• We introduce the M-Gittins policy and prove that it minimizes mean response time in the heavy-traffic M/G/k for a large class of finite-variance job size distributions (Theorem 3.1).
• We also prove that the simple and practical M-SERPT policy is a 2-approximation for mean response time in the heavy-traffic M/G/k for a large class of finite-variance job size distributions (Theorem 3.2).
• We characterize the heavy-traffic scaling of mean response time in the M/G/1 under Gittins, M-Gittins, and M-SERPT (Theorem 3.3).

Section 3 formally states these results and compares them to prior work. Their proofs rely on a large collection of intermediate results, which we outline in detail in Section 4 and prove in Sections 5–7.

We consider an M/G/k queue with arrival rate $\lambda$ and job size distribution $X$. Each of the $k$ servers has speed $1/k$, so regardless of the number of servers, the total service rate is 1 and the system load is $\rho = \lambda \mathbf{E}[X]$. This allows us to easily compare the M/G/k system to a single-server M/G/1 system, as illustrated in Fig. 2.1. We assume a preempt-resume model with no preemption overhead. This means that a single-server M/G/1 system can simulate any M/G/k policy by time-sharing between $k$ jobs.

[Fig. 2.1. Single-Server and $k$-Server Systems. Single-server system: arrival rate $\lambda$, one server of speed 1. $k$-server system: arrival rate $\lambda$, $k$ servers each of speed $1/k$.]

Throughout this paper we consider the $\rho \to 1$ heavy-traffic limit. This is the $\lambda \to 1/\mathbf{E}[X]$ limit with the job size distribution $X$ and number of servers $k$ held constant.

We write $F$ for the cumulative distribution function of $X$ and $\bar{F}(x) = 1 - F(x)$ for its tail. We assume that $X$ has a continuous, piecewise-monotonic hazard rate
$$h(x) = \frac{\frac{d}{dx} F(x)}{\bar{F}(x)}.$$
We also frequently work with the expected remaining size of a job at age $a$, which is $\mathbf{E}[X - a \mid X > a]$. We assume it, too, is continuous and piecewise-monotonic as a function of $a$.

The above assumptions on hazard rate and expected remaining size are not restrictive and serve primarily to simplify presentation. It is very likely that our proofs can be generalized to relax them.

All of the scheduling policies considered in this work are in the class of SOAP policies [32], generalized to a multiserver setting. In a single-server setting, a SOAP policy $\pi$ is specified by a rank function
$$r_\pi : \mathbb{R}_+ \to \mathbb{R}$$
which maps a job's age, namely the amount of service it has received so far, to its rank, or priority level. Single-server SOAP policies work by always serving the job of minimal rank, breaking ties in FCFS fashion.

As an example, FB is a SOAP policy with $r_{\text{FB}}(a) = a$. Because lower age corresponds to lower rank, FB prioritizes the job of least age.

A multiserver SOAP policy uses the same rank function as its single-server analogue. The only difference is that the system can serve up to $k$ jobs, so a multiserver SOAP policy works as follows:
• If there are at most $k$ jobs in the system, serve all of them.
• If there are more than $k$ jobs in the system, serve the $k$ jobs of minimal rank, breaking ties in FCFS fashion.

A function is piecewise-monotonic if, roughly speaking, it switches between increasing and decreasing finitely many times in any compact interval.

The full SOAP class allows a job's rank to depend on both its age and its "static" characteristics, such as its size or class, but we do not use this generality in this paper.

When multiple jobs are tied for least age, FB shares the server among all such jobs because the rank function is increasing. See Scully et al. [32, Appendix B] for details.
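The multiserver SOAP rule described above can be illustrated with a small snapshot computation. The following is a minimal sketch, not the authors' implementation: the names `Job`, `jobs_to_serve`, and `rank_fb` are hypothetical, and the function only answers "which jobs are in service at this instant", ignoring how ages evolve over time (including the server sharing that arises under FB when ages tie).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    arrival_order: int  # smaller value means earlier arrival (used for FCFS tie-breaking)
    age: float          # amount of service the job has received so far


def jobs_to_serve(jobs: List[Job], rank: Callable[[float], float], k: int) -> List[Job]:
    """Return the jobs a k-server SOAP policy serves at this instant."""
    if len(jobs) <= k:
        # At most k jobs in the system: serve all of them.
        return list(jobs)
    # More than k jobs: take the k jobs of minimal rank, FCFS among equal ranks.
    ordered = sorted(jobs, key=lambda j: (rank(j.age), j.arrival_order))
    return ordered[:k]


def rank_fb(age: float) -> float:
    """FB rank function r_FB(a) = a: the job of least age has the best priority."""
    return age
```

For example, with ages 3.0, 0.5, 0.5, and 2.0 (in arrival order) and $k = 2$, `jobs_to_serve` under `rank_fb` selects the two jobs of age 0.5. Any other rank function of age can be passed in place of `rank_fb`.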

[Fig. 2.2. Rank Function Examples. One panel plots $r_{\text{SERPT}}(a)$ and $r_{\text{M-SERPT}}(a)$ against age $a$; the other plots $r_{\text{Gittins}}(a)$ and $r_{\text{M-Gittins}}(a)$ against age $a$.]

We often compare the $k$-server variant of a policy $\pi$ to its single-server analogue. When it is necessary to distinguish between them, we write $\pi$-$k$ for the $k$-server version of a policy, so $\pi$-1 is the single-server version. We write $T^{\pi\text{-}k}_x$ for the size-conditional response time distribution of jobs of size $x$ under $\pi$-$k$, and we write $T^{\pi\text{-}k}$ for the overall response time distribution.

There are four main policies we consider in this work: SERPT, M-SERPT, Gittins, and M-Gittins. None of the policies need job size information, but each uses the job size distribution to tune its rank function. As an example, Fig. 2.2 shows the four rank functions for a bounded distribution with nonmonotonic hazard rate.

Definition 2.1. The shortest expected remaining processing time (SERPT) policy is the SOAP policy with rank function
$$r_{\text{SERPT}}(a) = \mathbf{E}[X - a \mid X > a] = \frac{\int_a^\infty \bar{F}(t)\,dt}{\bar{F}(a)}.$$
As a reminder, lower rank means better priority, so, as hinted by its name, SERPT prioritizes the job of least expected remaining size.

Definition 2.2. The monotonic SERPT (M-SERPT) policy is the SOAP policy with monotonic rank function
$$r_{\text{M-SERPT}}(a) = \max_{b \in [0, a]} r_{\text{SERPT}}(b).$$

Definition 2.3. The Gittins policy is the SOAP policy with rank function
$$r_{\text{Gittins}}(a) = \inf_{b > a} \frac{\mathbf{E}[\min\{X, b\} - a \mid X > a]}{\mathbf{P}\{X \le b \mid X > a\}} = \inf_{b > a} \frac{\int_a^b \bar{F}(t)\,dt}{\bar{F}(a) - \bar{F}(b)}.$$

Definition 2.4. The monotonic Gittins (M-Gittins) policy is the SOAP policy with monotonic rank function
$$r_{\text{M-Gittins}}(a) = \max_{b \in [0, a]} r_{\text{Gittins}}(b).$$

The M-Gittins and M-SERPT policies, which both have monotonic rank functions, are the primary focus of this paper. Some of our intermediate results apply more broadly to any policy with a monotonic rank function.
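To make Definitions 2.1–2.4 concrete, the four rank functions can be approximated numerically on an age grid. The sketch below is an illustration only, under stated assumptions: the job size distribution is bounded by a known `x_max` with tail equal to zero there, integrals use the trapezoid rule, and the infimum over $b > a$ in the Gittins rank is replaced by a minimum over grid points. The name `rank_functions` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def running_max(values: List[float]) -> List[float]:
    """Monotonic envelope: out[i] = max(values[0..i])."""
    out, cur = [], float("-inf")
    for v in values:
        cur = max(cur, v)
        out.append(cur)
    return out


def rank_functions(
    tail: Callable[[float], float], x_max: float, n: int = 200
) -> Tuple[List[float], List[float], List[float], List[float], List[float]]:
    """Approximate (ages, r_SERPT, r_M-SERPT, r_Gittins, r_M-Gittins) on a uniform grid.

    tail(t) is P{X > t}; X is assumed bounded by x_max, so tail(x_max) = 0.
    """
    grid = [x_max * i / n for i in range(n + 1)]
    fbar = [tail(t) for t in grid]
    # upper[i] approximates the integral of the tail from grid[i] to x_max.
    upper = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        upper[i] = upper[i + 1] + 0.5 * (fbar[i] + fbar[i + 1]) * (grid[i + 1] - grid[i])

    serpt: List[float] = []
    gittins: List[float] = []
    for i in range(n):
        if fbar[i] <= 0.0:
            break  # no job reaches this age
        serpt.append(upper[i] / fbar[i])
        best = float("inf")
        for j in range(i + 1, n + 1):
            done = fbar[i] - fbar[j]  # unnormalized P{a < X <= b}
            if done > 0.0:
                best = min(best, (upper[i] - upper[j]) / done)
        gittins.append(best)

    ages = grid[: len(serpt)]
    return ages, serpt, running_max(serpt), gittins, running_max(gittins)
```

For a uniform distribution on $[0, 1]$, that is, `tail = lambda t: 1.0 - t`, the SERPT rank is $(1 - a)/2$, which decreases in $a$, so both monotonic envelopes are constant at $1/2$ and the monotonic policies behave like FCFS.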

Definition 2.5. A SOAP policy $\pi$ is monotonic if its rank function is nondecreasing, meaning $r_\pi(a) \le r_\pi(b)$ for all ages $a < b$.

Figure 2.2 shows the SERPT, M-SERPT, Gittins, and M-Gittins rank functions for a bounded distribution with nonmonotonic hazard rate. Notice that SERPT and Gittins are not monotonic. This makes it hard to analyze their M/G/k response time (Appendix A). In contrast, M-SERPT and M-Gittins are monotonic: their rank functions alternate between constant regions and strictly increasing regions.

While the rank functions of Gittins and SERPT may not be monotonic, they are still well behaved under our assumptions on the job size distribution.

Lemma 2.6. Under the assumption that the job size distribution $X$ has continuous and piecewise-monotonic hazard rate and expected remaining size functions, each of $r_{\text{SERPT}}$, $r_{\text{M-SERPT}}$, $r_{\text{Gittins}}$, and $r_{\text{M-Gittins}}$ is continuous and piecewise-monotonic.

Proof. It suffices to prove the claims for $r_{\text{SERPT}}$ and $r_{\text{Gittins}}$. The claim for $r_{\text{SERPT}}$ is exactly our assumption on expected remaining size, and the claim for $r_{\text{Gittins}}$ is a known result [4, Theorem 1]. □

We consider several classes of job size distributions in this paper. We briefly describe each class before giving the formal definitions.
• The $\mathrm{OR}(-\infty, -1)$ class (Definition 2.7) contains, roughly speaking, distributions with Pareto-like tails.
  – We focus especially on the $\mathrm{OR}(-\infty, -2)$ subclass, all members of which have finite variance.
• The $\mathrm{MDA}(\Lambda)$ class (Definition 2.12) contains, roughly speaking, distributions with smooth tails that are lighter than Pareto tails. It includes, among others, exponential, normal, log-normal, Weibull, and Gamma distributions.
• The QDHR and QIMRL classes (Definitions 2.8 and 2.9) are relaxations of the well-known decreasing hazard rate (DHR) and increasing mean residual lifetime (IMRL) classes [1–4, 11, 27, 28, 34]. QDHR contains distributions whose hazard rate is roughly decreasing with age, even if it is not perfectly monotonic, and QIMRL contains distributions with roughly increasing expected remaining size.
  – We focus especially on the subclasses $\mathrm{MDA}(\Lambda) \cap \mathrm{QDHR}$ and $\mathrm{MDA}(\Lambda) \cap \mathrm{QIMRL}$.
• The ENBUE class (Definition 2.10) is a relaxation of the well-known new better than used in expectation (NBUE) class [3, 4, 34]. It contains distributions whose expected remaining size reaches a global maximum at some age.
  – We focus especially on the Bounded subclass, which contains all bounded distributions.

These classes play two different roles in our analysis.
• Some of the classes broadly characterize the asymptotic behavior of the tail $\bar{F}$. These include $\mathrm{OR}(-\infty, -1)$, $\mathrm{MDA}(\Lambda)$, and ENBUE. Virtually all job size distributions of interest are in one of these classes, so requiring membership in one of them, as in Theorem 3.3, should not be viewed as a major restriction.
• Some of the classes impose additional conditions on the job size distribution that help us bound the M-Gittins and M-SERPT rank functions (Section 6). These include QDHR, QIMRL, and Bounded. While these classes are much broader than those previously studied (Section 3.1), they do not cover all distributions of interest. Requiring membership in one of them, as in Theorems 3.1 and 3.2, represents a genuine restriction.

The nonincreasing case is less interesting, because all nonincreasing rank functions encode FCFS.

Because the NBUE terminology originates in reliability analysis, the word "better" here means "longer".

Definition 2.7. A function $f$ is $O$-regularly varying if there exist exponents $\beta_0 \ge \alpha_0 > 0$ and constants $C, x_0 > 0$ such that for all $y \ge x \ge x_0$,
$$\frac{1}{C}\left(\frac{y}{x}\right)^{-\beta_0} \le \frac{f(y)}{f(x)} \le C\left(\frac{y}{x}\right)^{-\alpha_0}.$$
We write $\mathrm{OR}(-\beta, -\alpha)$ for the set of $O$-regularly varying functions where the exponents $\alpha_0$ and $\beta_0$ above may be chosen such that $\alpha < \alpha_0 \le \beta_0 < \beta$. We use the same $\mathrm{OR}(-\beta, -\alpha)$ notation to represent the class of distributions whose tails are in $\mathrm{OR}(-\beta, -\alpha)$.

Definition 2.8. A job size distribution is in the quasi-decreasing hazard rate class, denoted QDHR, if there exist a strictly increasing function $m : \mathbb{R}_+ \to \mathbb{R}_+$, an exponent $\gamma \ge 1$, and constants $C, x_0 > 0$ such that for all $x \ge x_0$,
$$m(x) \le \frac{1}{h(x)} \le m(C x^\gamma).$$

Definition 2.9. A job size distribution is in the quasi-increasing mean residual lifetime class, denoted QIMRL, if there exist a strictly increasing function $m : \mathbb{R}_+ \to \mathbb{R}_+$, an exponent $\gamma \ge 1$, and constants $C, x_0 > 0$ such that for all $x \ge x_0$,
$$m(x) \le \mathbf{E}[X - x \mid X > x] \le m(C x^\gamma).$$

Definition 2.10. A job size distribution is in the eventually new better than used in expectation class, denoted ENBUE, if there exists an age $a^* \ge 0$ such that for all $x \ne a^*$,
$$\mathbf{E}[X - a^* \mid X > a^*] \ge \mathbf{E}[X - x \mid X > x].$$

Definition 2.11. A job size distribution is in the bounded class, denoted Bounded, if there exists $x_{\max} < \infty$ such that $\bar{F}(x_{\max}) = 0$.

Definition 2.12. A job size distribution is said to be in the Gumbel domain of attraction, denoted $\mathrm{MDA}(\Lambda)$, under certain conditions specified in extreme value theory [26].

The exact characterization of $\mathrm{MDA}(\Lambda)$ is outside the scope of this paper. The most important property is that distributions in $\mathrm{MDA}(\Lambda)$ are lighter-tailed than all Pareto distributions.

Lemma 2.13. If $X \in \mathrm{MDA}(\Lambda)$, then $\bar{F}(x) = o(x^{-\alpha})$ for all $\alpha > 0$.

Proof. The result follows from a known characterization of $\mathrm{MDA}(\Lambda)$ [26, Proposition 1.4]. □

This is not the standard definition of $O$-regular variation, but it is equivalent to it [8, Section 2.2.1]. Specifically, our $\mathrm{OR}(-\beta, -\alpha)$ contains the $O$-regularly varying functions whose Matuszewska indices are in the interval $(-\beta, -\alpha)$.

We now present our main results, explaining how they relate to prior work in Section 3.1. We begin with our heavy-traffic M/G/k optimality result.

Theorem 3.1.

In an M/G/k, if $X \in \mathrm{OR}(-\infty, -2) \cup (\mathrm{MDA}(\Lambda) \cap \mathrm{QDHR}) \cup \mathrm{Bounded}$, then
$$\lim_{\rho \to 1} \frac{\mathbf{E}[T^{\text{M-Gittins-}k}]}{\mathbf{E}[T^{\text{Gittins-1}}]} = 1.$$
In such cases, M-Gittins-k is optimal for mean response time in heavy traffic.

The M-Gittins policy is based on the Gittins policy, which is somewhat complex to describe and compute. Fortunately, the M-SERPT policy, which can be much simpler to compute [33], also performs well in the heavy-traffic M/G/k.

Theorem 3.2. In an M/G/k, if $X \in \mathrm{OR}(-\infty, -2) \cup (\mathrm{MDA}(\Lambda) \cap (\mathrm{QDHR} \cup \mathrm{QIMRL})) \cup \mathrm{Bounded}$, then
$$\lim_{\rho \to 1} \frac{\mathbf{E}[T^{\text{M-SERPT-}k}]}{\mathbf{E}[T^{\text{Gittins-1}}]} \le 2.$$
In such cases, M-SERPT-k is a 2-approximation for mean response time in heavy traffic.

Theorems 3.1 and 3.2 apply to a broad class of finite-variance job size distributions. Roughly speaking, $\mathrm{OR}(-\infty, -2)$ covers heavy-tailed distributions, and $\mathrm{MDA}(\Lambda)$ covers non-heavy-tailed distributions that are unbounded (Section 2.2). Assuming membership in these sets is standard for heavy-traffic analysis [19]. The main restriction the results impose is on $\mathrm{MDA}(\Lambda)$ distributions, for which we additionally require membership in QDHR or QIMRL. While slightly relaxing this restriction is possible, removing it entirely appears to be very difficult (Section 8).

A key step in the proofs of Theorems 3.1 and 3.2 is analyzing M-Gittins and M-SERPT in the heavy-traffic M/G/1. This analysis is itself a new result of independent interest. Notably, it extends to ordinary Gittins in addition to M-Gittins, thus characterizing the optimal heavy-traffic scaling attainable by any scheduling policy in the setting of unknown job sizes.

Theorem 3.3. Let $\pi$-1 be one of Gittins-1, M-Gittins-1, or M-SERPT-1. If $X \in \mathrm{OR}(-2, -1)$, then in the $\rho \to 1$ limit,
$$\mathbf{E}[T^{\pi\text{-}1}] = \Theta\!\left(\log \frac{1}{1 - \rho}\right),$$
and if $X \in \mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda) \cup \mathrm{ENBUE}$, then
$$\mathbf{E}[T^{\pi\text{-}1}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\bar{F}_e^{-1}(1 - \rho)\big)}\right),$$
where $\bar{F}_e^{-1}$ is the inverse of the tail of the excess of $X$, namely
$$\bar{F}_e(x) = \frac{1}{\mathbf{E}[X]} \int_x^\infty \bar{F}(t)\,dt.$$

For example, we only need the QDHR and QIMRL assumptions to prove Theorems 6.3 and 6.5, so we could instead assume the results of those theorems.
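As a sanity check on the second expression in Theorem 3.3, it can be evaluated by hand in two standard cases. The computation below is an illustration under stated assumptions, not part of the paper's proofs: it assumes a pure Pareto tail $\bar{F}(x) = x^{-\alpha}$ for $x \ge 1$ with $\alpha > 2$ (and $\bar{F}(x) = 1$ for $x < 1$), and an exponential distribution with rate $\mu$.

```latex
% Case 1: Pareto tail with \alpha > 2 (finite variance).
\mathbf{E}[X] = \frac{\alpha}{\alpha - 1}, \qquad
r_{\text{SERPT}}(a) = \frac{\int_a^\infty t^{-\alpha}\,dt}{a^{-\alpha}} = \frac{a}{\alpha - 1}
  \quad (a \ge 1), \qquad
r_{\text{M-SERPT}}(a) = \frac{a}{\alpha - 1} \quad (a \ge \alpha).

\bar{F}_e(x) = \frac{1}{\mathbf{E}[X]} \int_x^\infty t^{-\alpha}\,dt = \frac{x^{-(\alpha - 1)}}{\alpha}
  \quad (x \ge 1), \qquad
\bar{F}_e^{-1}(1 - \rho) = \big(\alpha (1 - \rho)\big)^{-1/(\alpha - 1)}.

\mathbf{E}[T^{\pi\text{-}1}]
  = \Theta\!\left(\frac{1}{(1 - \rho) \cdot (1 - \rho)^{-1/(\alpha - 1)}}\right)
  = \Theta\!\left((1 - \rho)^{-\frac{\alpha - 2}{\alpha - 1}}\right).

% Case 2: exponential with rate \mu (memoryless, so the rank is constant).
r_{\text{SERPT}}(a) = r_{\text{M-SERPT}}(a) = \frac{1}{\mu}, \qquad
\mathbf{E}[T^{\pi\text{-}1}] = \Theta\!\left(\frac{1}{1 - \rho}\right).
```

In the Pareto case the exponent $(\alpha - 2)/(\alpha - 1)$ is strictly less than 1, so mean response time grows more slowly than $1/(1 - \rho)$. In the exponential case the result has the same order as the classical M/M/1 mean response time $(1/\mu)/(1 - \rho)$.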


Theorem 3.1 is the first result proving optimality of a scheduling policy in the heavy-traffic M/G/k with unknown job sizes and general job size distribution. As mentioned in Section 1, the only prior results of this type were shown by Grosof et al. [15], who prove similar results for SRPT and FB, the latter for decreasing hazard rate (DHR) job size distributions.
• SRPT was shown to be optimal in the heavy-traffic M/G/k for job size distributions whose tail has upper Matuszewska index less than $-2$, which implies $\mathbf{E}[X^\alpha] < \infty$ for some $\alpha > 2$. This is somewhat broader than the precondition of Theorem 3.1, though it is still limited to finite-variance distributions.
  – Given that SRPT is designed for known job sizes while M-Gittins is designed for unknown job sizes, Theorem 3.1 complements the prior SRPT results.
• FB was shown to be optimal in the heavy-traffic M/G/k for job size distributions in the class $\mathrm{DHR} \cap (\mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda))$ [15, Theorem 7.13]. The DHR class is much more restrictive than QDHR, so this is much narrower than the precondition of Theorem 3.1.
  – Given that FB is equivalent to M-Gittins in the DHR case [3, 4], Theorem 3.1 subsumes the prior FB results.

There is another result that follows from two prior works and complements Theorem 3.1, although to the best of our knowledge it has never been explicitly stated. Köllerström [21, 22] shows that under FCFS, the mean response times in the M/G/1 and M/G/k converge. This means that if Gittins and M-Gittins happen to be equivalent to FCFS for a given job size distribution, then FCFS minimizes mean response time in the heavy-traffic M/G/k. Aalto et al. [3, 4] show this occurs exactly for job size distributions in the new better than used in expectation (NBUE) class, which includes some distributions that Theorem 3.1 does not cover.

Finally, versions of the Gittins policy have been shown to be heavy-traffic optimal for two discrete-state versions of the M/G/k queue [13, 14]. These models support some features our model does not, such as multiple job classes, but discretizing the state space imposes some limitations. Specifically, Glazebrook and Niño-Mora [14] require each job to be composed of phases where each phase has exponentially distributed size; and Glazebrook [13] allows nonexponential job size distributions but discretizes time and additionally requires ENBUE job size distributions (Definition 2.10). In contrast, Theorem 3.1 applies to heavy-tailed and other non-ENBUE job size distributions that are of practical importance in computer systems [5, 10, 17, 25].

Theorem 3.2 shows that a simple scheduling policy, namely M-SERPT, has mean response time within a constant factor of optimal in the heavy-traffic M/G/k with unknown job sizes and general job size distribution. Specifically, we show M-SERPT is a 2-approximation. This complements the result of Scully et al. [33], who show that M-SERPT is a 5-approximation for M/G/1 mean response time at all loads. Our result is tighter and applies to multiserver systems, not just single-server systems, but it applies only in heavy traffic. The techniques we introduce could be useful for tightening the upper bound on M-SERPT's M/G/1 approximation ratio, which is conjectured to be 2 [33].

Theorem 3.3 characterizes the heavy-traffic scaling of M/G/1 mean response time under Gittins, M-Gittins, and M-SERPT. There are three other policies whose heavy-traffic scaling has been characterized: FB, SRPT, and a policy called randomized multilevel feedback (RMLF) [7, 18]. We now compare Theorem 3.3 to each of these prior results.

While Grosof et al. [15, Theorem 7.13] claim that this result applies to all distributions in DHR with upper Matuszewska index less than $-2$, their proof incorrectly cites the preconditions of results of Kamphorst and Zwart [19]. Correcting the precondition narrows the result to what we state here.

Kamphorst and Zwart [19] study FB in heavy traffic. They show that if $X \in \mathrm{OR}(-2, -1)$, then
$$\mathbf{E}[T^{\text{FB-1}}] = \Theta\!\left(\log \frac{1}{1 - \rho}\right),$$
matching the first expression in Theorem 3.3. They also show that if $X \in \mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda)$, then
$$\mathbf{E}[T^{\text{FB-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{SERPT}}\big(\bar{F}_e^{-1}(1 - \rho)\big)}\right).$$
This is similar to the second expression in Theorem 3.3, except it replaces the monotonic $r_{\text{M-SERPT}}$ with the nonmonotonic $r_{\text{SERPT}}$, which pinpoints the suboptimality of FB's heavy-traffic scaling.

Lin et al. [23] study SRPT in heavy traffic. They show that if $X \in \mathrm{OR}(-2, -1)$, then
$$\mathbf{E}[T^{\text{SRPT-1}}] = \Theta\!\left(\log \frac{1}{1 - \rho}\right),$$
and if $\bar{F}$ has upper Matuszewska index less than $-2$, which covers $X \in \mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda)$, then
$$\mathbf{E}[T^{\text{SRPT-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot G^{-1}(1 - \rho)}\right),$$
where
$$G(x) = 1 - \frac{\mathbf{E}[X \mathbb{1}(X \le x)]}{\mathbf{E}[X]} = \bar{F}_e(x) + \frac{x \bar{F}(x)}{\mathbf{E}[X]}.$$
Recall that SRPT minimizes mean response time in the presence of job size information, whereas Gittins does not use job size information, so the heavy-traffic scaling of SRPT is a lower bound on that of Gittins. By comparing the above result for SRPT with our result for Gittins (Theorem 3.3), we learn when knowledge of job sizes yields an asymptotic improvement in mean response time.
• For $X \in \mathrm{OR}(-2, -1)$, meaning $X$ is heavy-tailed with infinite variance, the heavy-traffic scaling of Gittins matches that of SRPT.
• For $X \in \mathrm{OR}(-\infty, -2)$, meaning $X$ is heavy-tailed with finite variance, the heavy-traffic scaling of Gittins still matches that of SRPT. Specifically, we later show $r_{\text{M-SERPT}}(a) = \Theta(a)$ (Theorem 6.2), and one can also show $G^{-1}(1 - \rho) = \Theta(\bar{F}_e^{-1}(1 - \rho))$.
• For $X \in \mathrm{MDA}(\Lambda)$, meaning $X$ is not heavy-tailed, one can show $r_{\text{M-SERPT}}(a) = o(a)$ [26], implying Gittins has worse heavy-traffic scaling than SRPT in those cases.

We see that, roughly speaking, Gittins matches the heavy-traffic scaling of SRPT if and only if the job size distribution is heavy-tailed. We conclude that knowledge of job sizes yields an asymptotic improvement in mean response time for non-heavy-tailed job size distributions.

Bansal et al. [6] study RMLF in heavy traffic. They show that if $\mathbf{E}[X^\alpha] < \infty$ for some $\alpha > 2$, then
$$\mathbf{E}[T^{\text{RMLF-1}}] = O\!\left(\mathbf{E}[T^{\text{SRPT-1}}] \cdot \log \frac{1}{1 - \rho}\right). \tag{3.1}$$
Because Gittins minimizes M/G/1 mean response time, this serves as an upper bound on the heavy-traffic scaling of Gittins. However, as previously discussed when comparing Theorem 3.3 to prior results on SRPT, there are cases where Gittins matches the heavy-traffic scaling of SRPT, so our result is a tighter bound. With that said, requiring $\mathbf{E}[X^\alpha] < \infty$ for some $\alpha > 2$ is a different precondition from the distribution classes required by Theorem 3.3.

Key Definitions
• (Section 2.2) Job size distribution classes: QDHR, $\mathrm{OR}(-\infty, -2)$, $\mathrm{MDA}(\Lambda)$, etc.
• (Sections 4 and 5) Single-server quantities: $\mathbf{E}[Q^{\pi\text{-}1}]$, $\mathbf{E}[R^{\pi\text{-}1}]$, and $\mathbf{E}[S^{\pi\text{-}1}]$.
• (Section 4.1) Age cutoffs: $y^\pi_x$ and $z^\pi_x$.

Proof Steps
• (Section 5) Compare M/G/k to M/G/1: $\mathbf{E}[T^{\pi\text{-}k}] \le \mathbf{E}[Q^{\pi\text{-}1}] + k\,\mathbf{E}[R^{\pi\text{-}1}] + (k - 1)\,\mathbf{E}[S^{\pi\text{-}1}]$, whereas $\mathbf{E}[T^{\pi\text{-}1}] = \mathbf{E}[Q^{\pi\text{-}1}] + \mathbf{E}[R^{\pi\text{-}1}]$.
• Show $\mathbf{E}[Q^{\pi\text{-}1}]$ dominates $\mathbf{E}[R^{\pi\text{-}1}]$ and $\mathbf{E}[S^{\pi\text{-}1}]$ in the $\rho \to 1$ limit.
  – (Section 6) Job size distribution classes imply bounds on age cutoffs: for example, if $X \in \mathrm{QDHR}$, then $z^\pi_x = O(x^\gamma)$ for some $\gamma \ge 1$.
  – (Section 7) Job size distribution classes and bounds on age cutoffs imply $\mathbf{E}[Q^{\pi\text{-}1}]$ dominates: for example, if $X \in \mathrm{MDA}(\Lambda)$ and $z^\pi_x = O(x^\gamma)$ for some $\gamma \ge 1$, then $\mathbf{E}[S^{\pi\text{-}1}] = o(\mathbf{E}[Q^{\pi\text{-}1}])$.
• (Section 4.4) Compare M-Gittins-k and M-SERPT-k to Gittins-1.
  – M-Gittins-k vs. Gittins-1: prior work shows $\mathbf{E}[Q^{\text{M-Gittins-1}}] \le \mathbf{E}[T^{\text{Gittins-1}}]$, implying $\lim_{\rho \to 1} \mathbf{E}[T^{\text{M-Gittins-}k}] / \mathbf{E}[T^{\text{Gittins-1}}] = 1$.
  – M-SERPT-k vs. Gittins-1: prior work shows $\mathbf{E}[Q^{\text{M-SERPT-1}}] \le 2\,\mathbf{E}[T^{\text{Gittins-1}}]$, implying $\lim_{\rho \to 1} \mathbf{E}[T^{\text{M-SERPT-}k}] / \mathbf{E}[T^{\text{Gittins-1}}] \le 2$.
Throughout, $\pi$ stands for either M-Gittins or M-SERPT.

Fig. 4.1. Proof Overview
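The last proof step in the overview can be written out as a short chain of inequalities. This is a sketch of how the listed facts combine for M-Gittins, assuming the intermediate results named in the overview hold. The lower bound additionally uses the observation from Section 2 that a single server can simulate any $k$-server policy, together with the optimality of Gittins in the M/G/1.

```latex
% Lower bound: Gittins-1 is optimal in the M/G/1, which can simulate any k-server policy.
\mathbf{E}[T^{\text{Gittins-1}}] \le \mathbf{E}[T^{\text{M-Gittins-}k}].

% Upper bound: the M/G/k-to-M/G/1 comparison, then E[Q^{M-Gittins-1}] <= E[T^{Gittins-1}].
\mathbf{E}[T^{\text{M-Gittins-}k}]
  \le \mathbf{E}[Q^{\text{M-Gittins-1}}] + k\,\mathbf{E}[R^{\text{M-Gittins-1}}]
      + (k - 1)\,\mathbf{E}[S^{\text{M-Gittins-1}}]
  \le \mathbf{E}[T^{\text{Gittins-1}}] + k\,\mathbf{E}[R^{\text{M-Gittins-1}}]
      + (k - 1)\,\mathbf{E}[S^{\text{M-Gittins-1}}].

% Dominance: E[R], E[S] = o(E[Q^{M-Gittins-1}]) = o(E[T^{Gittins-1}]) as \rho \to 1, so
1 \le \frac{\mathbf{E}[T^{\text{M-Gittins-}k}]}{\mathbf{E}[T^{\text{Gittins-1}}]}
  \le 1 + \frac{k\,\mathbf{E}[R^{\text{M-Gittins-1}}] + (k - 1)\,\mathbf{E}[S^{\text{M-Gittins-1}}]}
               {\mathbf{E}[T^{\text{Gittins-1}}]}
  \longrightarrow 1.
```

The same chain with $\mathbf{E}[Q^{\text{M-SERPT-1}}] \le 2\,\mathbf{E}[T^{\text{Gittins-1}}]$ in place of the M-Gittins inequality yields the limit bound of 2 for M-SERPT-k.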

Our main goal is to show that M-Gittins minimizes M/G/k mean response time in the $\rho \to 1$ limit, that is,
$$\mathbf{E}[T^{\text{M-Gittins-}k}] \le \mathbf{E}[T^{\text{Gittins-1}}] + o(\mathbf{E}[T^{\text{Gittins-1}}]). \tag{4.1}$$
The only existing technique for proving a bound like (4.1) is the M/G/k tagged job method of Grosof et al. [15]. In general, tagged job methods work as follows [15, 16, 20, 24, 30–32, 35]: one focuses on a "tagged" job $J$ throughout its time in the system, tracking how much each other job delays $J$. The amount of time for which another job can delay $J$ is called the relevant work due to that other job. The specific M/G/k tagged job method [15] relates the amount of relevant work in an M/G/k under $\pi$-$k$ to the amount of relevant work in an M/G/1 under $\pi$-1.

As a first approach, we might try to prove a result like (4.1) for Gittins-k using the M/G/k tagged job method. Unfortunately, the method turns out not to work for Gittins, because Gittins can have a nonmonotonic rank function. It turns out that under nonmonotonic rank functions, jobs can contribute more relevant work in an M/G/k than in an M/G/1 (Appendix A), resulting in a much looser response time bound.

Our key insight is that we can generalize the M/G/k tagged job method of Grosof et al. [15] to any SOAP policy, provided it has a monotonic rank function. In Theorem 5.1 we show that for any monotonic SOAP policy $\pi$,
$$\mathbf{E}[T^{\pi\text{-}k}] \le \mathbf{E}[Q^{\pi\text{-}1}] + k\,\mathbf{E}[R^{\pi\text{-}1}] + (k - 1)\,\mathbf{E}[S^{\pi\text{-}1}], \tag{4.2}$$
where the quantities on the right hand side, defined formally in Section 5, can be thought of as follows:
• $Q^{\pi\text{-}1}$ and $R^{\pi\text{-}1}$ are distributions called waiting time and residence time, respectively [32]. Response time in the M/G/1 is the sum of waiting time and residence time.
• $S^{\pi\text{-}1}$ is a new distribution we call inflated residence time, which is similar to residence time but longer.

Proving (4.2) is the first stepping stone to proving Theorem 3.1 because it reduces an M/G/k analysis to an M/G/1 analysis. Only the $\mathbf{E}[R^{\pi\text{-}1}]$ and $\mathbf{E}[S^{\pi\text{-}1}]$ coefficients depend on $k$, so to prove Theorem 3.1, we show the $\mathbf{E}[Q^{\pi\text{-}1}]$ term dominates in the $\rho \to 1$ limit when $\pi$ is M-Gittins. Figure 4.1 gives an overview of the main proof steps.

In the remainder of this section, our goal is to bound $\mathbf{E}[Q^{\pi\text{-}1}]$, $\mathbf{E}[R^{\pi\text{-}1}]$, and $\mathbf{E}[S^{\pi\text{-}1}]$, where $\pi$ is either M-Gittins or M-SERPT. We begin in Section 4.1 by explaining in more detail the concepts of relevant work and of waiting, residence, and inflated residence time. In doing so, we introduce age cutoffs, quantities which characterize the relevant work due to each job. It turns out that to bound $\mathbf{E}[Q^{\pi\text{-}1}]$, $\mathbf{E}[R^{\pi\text{-}1}]$, and $\mathbf{E}[S^{\pi\text{-}1}]$, we first need to bound the age cutoffs. Section 4.2 presents our age cutoff bounds, deferring proofs to Section 6, and Section 4.3 presents our bounds on $\mathbf{E}[Q^{\pi\text{-}1}]$, $\mathbf{E}[R^{\pi\text{-}1}]$, and $\mathbf{E}[S^{\pi\text{-}1}]$, deferring proofs to Section 7. Finally, in Section 4.4, we formally prove Theorems 3.1–3.3 by combining the intermediate results discussed throughout this section.

[Fig. 4.2. New Job and Old Job Age Cutoffs. A monotonic rank function $r_\pi$ with a size $x$ in a constant region, where $y^\pi_x < x < z^\pi_x$ at rank $r_\pi(x)$, and a size $x'$ in a strictly increasing region, where $y^\pi_{x'} = x' = z^\pi_{x'}$ at rank $r_\pi(x')$.]

In this section we give intuition for the tagged job method, deferring some formalities to Section 5. Recall that the tagged job method works by focusing on the journey of a "tagged" job $J$ through the system. Roughly speaking, the relevant work due to any other job is the amount of time by which that job delays $J$'s departure. A key insight from the M/G/1 SOAP analysis [32] is that to figure out how much another job delays $J$, we need to look not at $J$'s current rank but at its worst future rank. This is because even if $J$ has priority over another job at first, if $J$'s rank later increases, the other job can get priority.

Suppose that $J$ has size $x$.
Under a monotonic SOAP policy $\pi$, such as M-Gittins or M-SERPT, the worst future rank $J$ will have is always the rank it will have just before completion, namely $r^{\pi}(x)$. The amount of relevant work due to another job $J'$ is the amount of time $J'$ is served while $J$ is in the system until $J'$ either completes or reaches rank $r^{\pi}(x)$. Due to the FCFS tiebreaking rule (Section 2.1), exactly what "reaches" means depends on when $J'$ arrives.
• New jobs, those that arrive after $J$, contribute relevant work until they first have rank greater than or equal to $r^{\pi}(x)$. This occurs at a specific age called the new job age cutoff, denoted $y_x^{\pi}$.
• Old jobs, those that arrive before $J$, contribute relevant work until they first have rank strictly greater than $r^{\pi}(x)$. This occurs at a specific age called the old job age cutoff, denoted $z_x^{\pi}$.

Figure 4.2 illustrates the new job and old job age cutoffs $y_x^{\pi}$ and $z_x^{\pi}$, which are formally defined below. Roughly speaking,
• if $r^{\pi}$ is increasing at $x$, then $y_x^{\pi} = x = z_x^{\pi}$; and
• if $r^{\pi}$ is constant at $x$, then $y_x^{\pi}$ and $z_x^{\pi}$ are the endpoints of the constant region containing $x$.

As Fig. 4.2 illustrates, we always have

\[ y_x^{\pi} \le x \le z_x^{\pi}. \tag{4.3} \]

Definition 4.1.

Let $\pi$ be a monotonic SOAP policy. The new job age cutoff and old job age cutoff of size $x$ are, respectively,

\[ y_x^{\pi} = \sup\{a \ge 0 \mid r^{\pi}(a) < r^{\pi}(x)\}, \qquad z_x^{\pi} = \sup\{a \ge 0 \mid r^{\pi}(a) \le r^{\pi}(x)\}. \]

When the policy in question is clear, we drop the superscript $\pi$.

One can use new job and old job age cutoffs to write M/G/1 mean response time under a monotonic SOAP policy [33]. As a first step, we write M/G/1 response time $T^{\pi\text{-1}}$ as a sum of two parts, called waiting time $Q^{\pi\text{-1}}$ and residence time $R^{\pi\text{-1}}$ [32]:

\[ \mathbf{E}[T^{\pi\text{-1}}] = \mathbf{E}[Q^{\pi\text{-1}}] + \mathbf{E}[R^{\pi\text{-1}}]. \]

We define waiting and residence times formally in Section 5. For now, we just need to know that their means can be written in terms of $y_x^{\pi}$ and $z_x^{\pi}$. Specifically, Scully et al. [33, Propositions 4.7 and 4.8] show

\[ \mathbf{E}[Q^{\pi\text{-1}}] = \int_0^\infty \frac{\tau(z_x^{\pi})}{\overline{\rho}(y_x^{\pi})\,\overline{\rho}(z_x^{\pi})}\,\mathrm{d}F(x), \qquad \mathbf{E}[R^{\pi\text{-1}}] = \int_0^\infty \frac{x}{\overline{\rho}(y_x^{\pi})}\,\mathrm{d}F(x), \tag{4.4} \]

where $\overline{\rho}$ and $\tau$ are defined as

\[ \overline{\rho}(a) = 1 - \lambda\,\mathbf{E}[\min\{X, a\}] = 1 - \int_0^a \lambda \overline{F}(t)\,\mathrm{d}t, \qquad \tau(a) = \frac{\lambda}{2}\,\mathbf{E}[\min\{X, a\}^2] = \int_0^a \lambda t\,\overline{F}(t)\,\mathrm{d}t. \tag{4.5} \]

The proof of Theorem 5.1 explains the intuition behind (4.4).

The significance of (4.2) is that it expresses M/G/k response time in terms of waiting and residence times, which are M/G/1 quantities. It also features a third quantity called inflated residence time $S^{\pi\text{-1}}$. We define inflated residence time formally in Section 5. For now, we just need to know that its mean,

\[ \mathbf{E}[S^{\pi\text{-1}}] = \int_0^\infty \frac{z_x^{\pi}}{\overline{\rho}(y_x^{\pi})}\,\mathrm{d}F(x), \tag{4.6} \]

can be written in terms of $y_x^{\pi}$ and $z_x^{\pi}$. Note that $\mathbf{E}[R^{\pi\text{-1}}] \le \mathbf{E}[S^{\pi\text{-1}}]$ by (4.3).

Recall that proving our main results rests on characterizing the heavy-traffic scaling of $\mathbf{E}[Q^{\pi}]$, $\mathbf{E}[R^{\pi}]$, and $\mathbf{E}[S^{\pi}]$, where $\pi$ is either M-Gittins or M-SERPT. As we see in (4.4) and (4.6), both $y_x^{\pi}$ and $z_x^{\pi}$ feature prominently in the formulas of $\mathbf{E}[Q^{\pi}]$, $\mathbf{E}[R^{\pi}]$, and $\mathbf{E}[S^{\pi}]$.
This means the first step of characterizing the heavy-traffic scaling of $\mathbf{E}[Q^{\pi}]$, $\mathbf{E}[R^{\pi}]$, and $\mathbf{E}[S^{\pi}]$ is understanding $y_x^{\pi}$ and $z_x^{\pi}$. (The new job and old job age cutoffs of $x$ are equivalent to what Scully et al. [33] call the previous and next hill ages of $x$.) This is the subject of Section 6, in which we prove bounds on $y_x^{\pi}$ and $z_x^{\pi}$ for a wide class of job size distributions. Table 4.1 summarizes these results. The main takeaway is that $y_x^{\pi}$ and $z_x^{\pi}$ are always polynomially bounded relative to $x$.

Table 4.1. New Job and Old Job Age Cutoff Bounds

| Size Distribution | Quantity | Bound | Reference |
|---|---|---|---|
| $\mathrm{OR}(-\infty, -1)$ | $y_x^{\text{M-Gittins-1}}$ | $\Theta(x)$ | Theorem 6.4 |
| | $z_x^{\text{M-Gittins-1}}$ | $\Theta(x)$ | Theorem 6.4 |
| | $y_x^{\text{M-SERPT-1}}$ | $\Theta(x)$ | Theorem 6.2 |
| | $z_x^{\text{M-SERPT-1}}$ | $\Theta(x)$ | Theorem 6.2 |
| QDHR | $y_x^{\text{M-Gittins-1}}$ | $\Omega(x^{1/\gamma})$ for some $\gamma \ge 1$ | Theorem 6.5 |
| | $z_x^{\text{M-Gittins-1}}$ | $O(x^{\gamma})$ for some $\gamma \ge 1$ | Theorem 6.5 |
| QDHR ∪ QIMRL | $y_x^{\text{M-SERPT-1}}$ | $\Omega(x^{1/\gamma})$ for some $\gamma \ge 1$ | Theorem 6.3 |
| | $z_x^{\text{M-SERPT-1}}$ | $O(x^{\gamma})$ for some $\gamma \ge 1$ | Theorem 6.3 |

These bounds on $y_x^{\pi}$ and $z_x^{\pi}$ are critical for characterizing heavy-traffic scaling of $\mathbf{E}[Q^{\pi\text{-1}}]$, $\mathbf{E}[R^{\pi\text{-1}}]$, and $\mathbf{E}[S^{\pi\text{-1}}]$.

Armed with bounds on age cutoffs, we are ready to characterize heavy-traffic scaling of mean waiting, residence, and inflated residence times. This is the subject of Section 7, in which
• Theorems 7.4, 7.5, 7.9 and 7.10 characterize M-SERPT's heavy-traffic scaling; and
• Theorems 7.11 and 7.12 characterize M-Gittins's heavy-traffic scaling in terms of M-SERPT's.

Table 4.2 summarizes these results. The main takeaway of the table is that for all of the finite-variance job size distribution classes considered, if $\pi$ is either M-Gittins or M-SERPT, $\mathbf{E}[Q^{\pi\text{-1}}]$ dominates $\mathbf{E}[R^{\pi\text{-1}}]$ and $\mathbf{E}[S^{\pi\text{-1}}]$, with the latter sometimes requiring an additional condition. Specifically,
• $\mathbf{E}[Q^{\pi\text{-1}}]$ grows polynomially in $1/(1-\rho)$, whereas
• $\mathbf{E}[R^{\pi\text{-1}}]$ and $\mathbf{E}[S^{\pi\text{-1}}]$ grow subpolynomially in $1/(1-\rho)$.

We now prove our main results. The proofs of Theorems 3.1 and 3.2 both follow the same three main steps, where $\pi$ is M-Gittins or M-SERPT, respectively:
• Theorem 5.1 bounds $\mathbf{E}[T^{\pi\text{-}k}]$ in terms of M/G/1 quantities.
• The results in Table 4.2 show $\lim_{\rho \to 1} \mathbf{E}[T^{\pi\text{-}k}] / \mathbf{E}[Q^{\pi\text{-1}}] = 1$.
• Prior work relates $\mathbf{E}[Q^{\pi\text{-1}}]$ to $\mathbf{E}[T^{\text{Gittins-1}}]$.

Proof of Theorem 3.1.

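An M/G/1 can simulate any M/G/k policy by sharing the server, so the fact that Gittins minimizes M/G/1 mean response time means $\mathbf{E}[T^{\text{M-Gittins-}k}] / \mathbf{E}[T^{\text{Gittins-1}}] \ge 1$. It therefore suffices to show $\lim_{\rho \to 1} \mathbf{E}[T^{\text{M-Gittins-}k}] / \mathbf{E}[T^{\text{Gittins-1}}] \le 1$. Theorem 5.1 implies

\[ \frac{\mathbf{E}[T^{\text{M-Gittins-}k}]}{\mathbf{E}[Q^{\text{M-Gittins-1}}]} \le 1 + \frac{k\,\mathbf{E}[R^{\text{M-Gittins-1}}] + (k-1)\,\mathbf{E}[S^{\text{M-Gittins-1}}]}{\mathbf{E}[Q^{\text{M-Gittins-1}}]}. \]

Theorems 7.5 and 7.9–7.12 imply that the second term vanishes in the $\rho \to 1$ limit. A result of Scully et al. [33, Lemma 5.6] gives

\[ \mathbf{E}[Q^{\text{M-Gittins-1}}] \le \mathbf{E}[Q^{\text{Gittins-1}}] \le \mathbf{E}[T^{\text{Gittins-1}}], \tag{4.7} \]

implying the desired result. □

Table 4.2. Heavy-Traffic Scaling of Waiting, Residence, and Inflated Residence Times

| Size Distribution | Quantity | Heavy-Traffic Scaling | Reference |
|---|---|---|---|
| $\mathrm{OR}(-2, -1)$ | $\mathbf{E}[Q^{\pi\text{-1}}]$ | $O(-\log(1-\rho))$ | Theorems 7.4 and 7.11 |
| | $\mathbf{E}[R^{\pi\text{-1}}]$ | $O(-\log(1-\rho))$ | Theorems 7.4 and 7.11 |
| $\mathrm{OR}(-\infty, -2)$ | $\mathbf{E}[Q^{\pi\text{-1}}]$ | $\Omega((1-\rho)^{-\delta})$ for some $\delta > 0$ | |
| | $\mathbf{E}[R^{\pi\text{-1}}]$ | $O(-\log(1-\rho))$ | |
| | $\mathbf{E}[S^{\pi\text{-1}}]$ | $O(-\log(1-\rho))$ | Theorems 7.9 and 7.12 |
| $\mathrm{MDA}(\Lambda)$ | $\mathbf{E}[Q^{\pi\text{-1}}]$ | $\Omega((1-\rho)^{-(1-\varepsilon)})$ for all $\varepsilon > 0$ | |
| | $\mathbf{E}[R^{\pi\text{-1}}]$ | $O((1-\rho)^{-\varepsilon})$ for all $\varepsilon > 0$ | |
| $\mathrm{MDA}(\Lambda) \cap \mathrm{QDHR}$ | $\mathbf{E}[S^{\pi\text{-1}}]$ | $O((1-\rho)^{-\varepsilon})$ for all $\varepsilon > 0$ | |
| $\mathrm{MDA}(\Lambda) \cap \mathrm{QIMRL}$ | $\mathbf{E}[S^{\text{M-SERPT-1}}]$ | $O((1-\rho)^{-\varepsilon})$ for all $\varepsilon > 0$ | |
| ENBUE | $\mathbf{E}[Q^{\pi\text{-1}}]$ | $\Theta((1-\rho)^{-1})$ | Theorems 7.5 and 7.11 |
| | $\mathbf{E}[R^{\pi\text{-1}}]$ | $\Theta(1)$ | Theorems 7.5 and 7.11 |
| Bounded | $\mathbf{E}[S^{\pi\text{-1}}]$ | $\Theta(1)$ | Theorems 7.5 and 7.12 |

These bounds hold when $\pi$ is either M-Gittins or M-SERPT, except for the $\mathrm{MDA}(\Lambda) \cap \mathrm{QIMRL}$ case, in which the bound holds only for M-SERPT. The finite-variance classes are all the classes in Table 4.2 except $\mathrm{OR}(-2, -1)$.

Proof of Theorem 3.2.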

Theorem 5.1 implies

\[ \frac{\mathbf{E}[T^{\text{M-SERPT-}k}]}{\mathbf{E}[Q^{\text{M-SERPT-1}}]} \le 1 + \frac{k\,\mathbf{E}[R^{\text{M-SERPT-1}}] + (k-1)\,\mathbf{E}[S^{\text{M-SERPT-1}}]}{\mathbf{E}[Q^{\text{M-SERPT-1}}]}. \]

Theorems 7.5, 7.9 and 7.10 imply that the second term vanishes in the $\rho \to 1$ limit. Finally, we use the bound $\mathbf{E}[Q^{\text{M-SERPT-1}}] \le 2\,\mathbf{E}[Q^{\text{M-Gittins-1}}]$, which combines with (4.7) to imply the desired result. □

To prove Theorem 3.3, we simply combine the results in Table 4.2.
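The "second term vanishes" step in the two proofs above can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes job sizes are Uniform(0, 1), for which the M-SERPT rank function is constant, so we take $y_x = 0$ and $z_x = x_{\max} = 1$ for every size $x$. The function names (`integrate`, `rho_bar`, `tau`, `bound_terms`, `bound_ratio`) are hypothetical helpers, and the integrals from (4.4)–(4.6) are evaluated with a simple midpoint rule.

```python
def integrate(f, lo, hi, n=1000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def tail(t):
    """Tail P{X > t} of the assumed Uniform(0, 1) job size distribution."""
    return min(1.0, max(0.0, 1.0 - t))

def rho_bar(a, lam):
    """rho-bar(a) = 1 - integral_0^a lam * tail(t) dt, as in (4.5)."""
    return 1.0 - lam * integrate(tail, 0.0, a) if a > 0 else 1.0

def tau(a, lam):
    """tau(a) = integral_0^a lam * t * tail(t) dt, as in (4.5)."""
    return lam * integrate(lambda t: t * tail(t), 0.0, a) if a > 0 else 0.0

def bound_terms(rho, y=lambda x: 0.0, z=lambda x: 1.0, n=200):
    """Return (E[Q], E[R], E[S]) from (4.4) and (4.6).

    Assumption: dF(x) = dx on [0, 1] (uniform sizes), and the default age
    cutoffs y_x = 0, z_x = 1 correspond to a constant rank function.
    """
    lam = rho / 0.5  # arrival rate, since E[X] = 1/2
    e_q = integrate(
        lambda x: tau(z(x), lam) / (rho_bar(y(x), lam) * rho_bar(z(x), lam)),
        0.0, 1.0, n)
    e_r = integrate(lambda x: x / rho_bar(y(x), lam), 0.0, 1.0, n)
    e_s = integrate(lambda x: z(x) / rho_bar(y(x), lam), 0.0, 1.0, n)
    return e_q, e_r, e_s

def bound_ratio(rho, k):
    """Right-hand side 1 + (k E[R] + (k-1) E[S]) / E[Q] of the ratio bound."""
    e_q, e_r, e_s = bound_terms(rho)
    return 1.0 + (k * e_r + (k - 1) * e_s) / e_q
```

Under these assumptions $\mathbf{E}[Q]$ reduces to $\rho / (3(1-\rho))$ while $\mathbf{E}[R]$ and $\mathbf{E}[S]$ stay constant, so calling `bound_ratio(rho, k)` for loads approaching 1 shows the ratio bound shrinking toward 1 for any fixed number of servers `k`.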

Proof of Theorem 3.3.

We examine each case in turn.
• For $X \in \mathrm{OR}(-2, -1)$, we use Theorems 7.4 and 7.11.
• For $X \in \mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda)$, we use Theorems 7.9–7.11.
• For $X \in \mathrm{ENBUE}$, we have $r^{\text{M-SERPT}}(a) = \Theta(1)$ by Definition 2.10, so we use Theorems 7.5 and 7.11. □

5 M/G/k RESPONSE TIME BOUND

This section bounds M/G/k mean response time under any monotonic SOAP policy $\pi$. The notation used in Theorem 5.1 below is summarized in Table 5.1.

(Footnote to (4.7): While Scully et al. [33, Lemma 5.6] mention Gittins instead of M-Gittins, they prove the desired statement for M-Gittins as an intermediate step of their proof.)

Table 5.1. Summary of Notation

| Notation | Description | Reference |
|---|---|---|
| $\pi$-k | $k$-server version of SOAP policy $\pi$ | Section 2.1 |
| $\overline{\rho}(a)$, $\tau(a)$ | functions of moments of $\min\{X, a\}$ | (4.5) |
| $y_x^{\pi}$, $z_x^{\pi}$ | new job and old job age cutoffs | Definition 4.1 |
| $T^{\pi\text{-}k}$ | response time under $\pi$-k | Section 2.1 |
| $Q^{\pi\text{-1}}$ | waiting time under $\pi$-1 | (4.4) |
| $R^{\pi\text{-1}}$ | residence time under $\pi$-1 | (4.4) |
| $S^{\pi\text{-1}}$ | inflated residence time under $\pi$-1 | (4.6) |

Additionally, $T_x^{\pi\text{-}k}$ is size-conditional response time for size $x$, and similarly for $Q_x^{\pi\text{-1}}$, $R_x^{\pi\text{-1}}$, and $S_x^{\pi\text{-1}}$.

Theorem 5.1.

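For any monotonic SOAP policy $\pi$,

\[ \mathbf{E}[T_x^{\pi\text{-}k}] \le \frac{1}{\overline{\rho}(y_x^{\pi})} \left( \frac{\tau(z_x^{\pi})}{\overline{\rho}(z_x^{\pi})} + kx + (k-1)\,z_x^{\pi} \right), \tag{5.1} \]

and therefore

\[ \mathbf{E}[T^{\pi\text{-}k}] \le \mathbf{E}[Q^{\pi\text{-1}}] + k\,\mathbf{E}[R^{\pi\text{-1}}] + (k-1)\,\mathbf{E}[S^{\pi\text{-1}}]. \]

Proof.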

In order to bound M/G/k mean response time, we use a tagged job method in the style of Grosof et al. [15], but we generalize it to allow an arbitrary monotonic SOAP policy $\pi$. We consider an arbitrary "tagged" job $J$ of size $x$ arriving to a steady-state system. Our goal is to analyze the distribution of $J$'s response time.

The first step is a shift in perspective: instead of thinking about time passing, we reason in terms of work completed. Since each of the $k$ servers works at rate $1/k$, the system can complete work at rate 1. While $J$ is in the system, servers sometimes complete work and are sometimes left idle. This means $J$'s response time is the sum of
• the amount of work completed while $J$ is in the system and
• the amount of work "wasted", meaning service capacity left idle, while $J$ is in the system.

We bound $J$'s response time by bounding the total amount of work above. We do so by dividing it into several pieces:
• Tagged work: the work of $J$ itself.
• Virtual work: work on jobs prioritized behind $J$, plus wasted work due to servers left idle.
• Relevant work: work on jobs prioritized ahead of $J$. We divide this into two subcategories:
  – Old relevant work: relevant work on old jobs, namely those present when $J$ arrives.
  – New relevant work: relevant work on new jobs, namely those that arrive after $J$.

For the first two categories, we have the same simple bound as Grosof et al. [15]: tagged work and virtual work add up to at most $kx$. This is because tagged work is $J$'s size $x$, and the scheduling policy ensures that a server only completes virtual work while $J$ is in service at another server. However, bounding the two relevant work categories is more complicated than in Grosof et al. [15].

We begin by asking: what rank must a job have to contribute to relevant work? Note that the job $J$ will never have rank greater than its rank upon completion, $r^{\pi}(x)$, since $\pi$ is a monotonic policy. As a result, all new relevant work is from jobs with rank strictly less than $r^{\pi}(x)$, and all old relevant work is from jobs with rank less than or equal to $r^{\pi}(x)$. We can put this in terms of the age cutoffs defined in Definition 4.1:
• jobs contribute new relevant work up to at most age $y_x^{\pi}$, and
• jobs contribute old relevant work up to at most age $z_x^{\pi}$.

In the rest of this proof, $y_x$ and $z_x$ refer to $y_x^{\pi}$ and $z_x^{\pi}$, respectively.

To help us bound the amount of old relevant work completed while $J$ is in the system, we define a new concept: the amount of relevant work in the M/G/k system under $\pi$.

Definition 5.2.

Let $\mathrm{RelWork}_x^{\pi\text{-}k}(t)$ denote the amount of work in the M/G/k at time $t$ which is relevant to a job $J$ of size $x$:

\[ \mathrm{RelWork}_x^{\pi\text{-}k}(t) = \sum_{\text{jobs } J'} \bigl( \min\{z_x, x_{J'}\} - a_{J'}(t) \bigr)^{+}, \]

where $x_{J'}$ is the size of job $J'$ and $a_{J'}(t)$ is its age at time $t$. We write $\mathrm{RelWork}_x^{\pi\text{-}k}$ for the steady-state distribution of the amount of relevant work in the M/G/k system.

Since $J$ is a Poisson arrival, $\mathrm{RelWork}_x^{\pi\text{-}k}$ is the distribution of the amount of relevant work in the system when $J$ arrives. That amount is an upper bound on the amount of old relevant work that will be completed while $J$ is in the system.

To bound new relevant work, note that if a job $J'$ of size $x'$ arrives while $J$ is in the system, then $J'$ contributes at most $\min\{x', y_x\}$ new relevant work. As a result, new relevant work can be upper bounded by considering a transformed M/G/1 system in which the job size distribution is

\[ X_{y_x} =_{\mathrm{st}} \min\{X, y_x\}. \]

The amount of new relevant work that arrives to our real system is upper bounded by the total amount of work that arrives to the transformed system. Let $B_{y_x}(w)$ be the length of a busy period in the transformed M/G/1 system started by an initial amount of work $w$. If $w$ is the total amount of tagged, virtual, and old relevant work, then the amount of new relevant work is at most $B_{y_x}(w) - w$. Combining our bounds, we obtain

\[ T_x^{\pi\text{-}k} \le_{\mathrm{st}} B_{y_x}\bigl( kx + \mathrm{RelWork}_x^{\pi\text{-}k} \bigr). \]

Applying Lemma 5.3, stated and proven later in this section, yields

\[ T_x^{\pi\text{-}k} \le_{\mathrm{st}} B_{y_x}\bigl( kx + \mathrm{RelWork}_x^{\pi\text{-1}} + (k-1)\,z_x \bigr). \tag{5.2} \]

Taking expectations gives us

\[ \mathbf{E}[T_x^{\pi\text{-}k}] \le \frac{\mathbf{E}[\mathrm{RelWork}_x^{\pi\text{-1}}] + kx + (k-1)\,z_x}{\overline{\rho}(y_x)}. \]

Because $\pi$-1 is work conserving with respect to relevant work, the Pollaczek-Khinchine formula tells us

\[ \mathbf{E}[\mathrm{RelWork}_x^{\pi\text{-1}}] = \frac{\tau(z_x)}{\overline{\rho}(z_x)}, \]

which completes the proof of (5.1).

To connect (5.1) to the quantities $\mathbf{E}[Q^{\pi}]$, $\mathbf{E}[R^{\pi}]$, and $\mathbf{E}[S^{\pi}]$, we rewrite (5.2) as

\[ T_x^{\pi\text{-}k} \le_{\mathrm{st}} B_{y_x}\bigl( \mathrm{RelWork}_x^{\pi\text{-1}} \bigr) + \sum_{i=1}^{k} B_{y_x}(x) + \sum_{i=1}^{k-1} B_{y_x}(z_x), \tag{5.3} \]

where all of the relevant busy periods are independent. Prior work on SOAP policies [32, 33] gives names to some of the distributions on the right-hand side. (We define waiting, residence, and inflated residence times in terms of relevant busy periods. Waiting and residence times also have natural definitions as components of M/G/1 response time [32, 33], but we do not need them in this paper.)
• The size-conditional waiting time for size $x$ is the random variable $Q_x^{\pi\text{-1}} =_{\mathrm{st}} B_{y_x}(\mathrm{RelWork}_x^{\pi\text{-1}})$, and waiting time is $Q^{\pi\text{-1}} =_{\mathrm{st}} Q_X^{\pi\text{-1}}$.
• The size-conditional residence time for size $x$ is the random variable $R_x^{\pi\text{-1}} =_{\mathrm{st}} B_{y_x}(x)$, and residence time is $R^{\pi\text{-1}} =_{\mathrm{st}} R_X^{\pi\text{-1}}$.
• As there is no concise name for $B_{y_x}(z_x)$ in prior work, we define size-conditional inflated residence time for size $x$ to be the random variable $S_x^{\pi\text{-1}} =_{\mathrm{st}} B_{y_x}(z_x)$, and we define inflated residence time to be $S^{\pi\text{-1}} =_{\mathrm{st}} S_X^{\pi\text{-1}}$.

With these definitions in place, (5.3) gives us

\[ T_x^{\pi\text{-}k} \le_{\mathrm{st}} Q_x^{\pi\text{-1}} + \sum_{i=1}^{k} R_x^{\pi\text{-1}} + \sum_{i=1}^{k-1} S_x^{\pi\text{-1}}, \]

so the result follows by taking the expectation of $T^{\pi\text{-}k} =_{\mathrm{st}} T_X^{\pi\text{-}k}$. □

Theorem 5.1 applies only to monotonic SOAP policies. It is tempting to try to apply the same technique to SOAP policies with nonmonotonic rank functions, but as we discuss in Appendix A, the argument does not readily generalize.

The proof of Theorem 5.1 assumes a bound on $\mathrm{RelWork}_x^{\pi\text{-}k}$. We prove the bound in the following lemma, which generalizes a similar lemma of Grosof et al. [15, Lemma 7.10].

Lemma 5.3.

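Let $\Delta_x(t) = \mathrm{RelWork}_x^{\pi\text{-}k}(t) - \mathrm{RelWork}_x^{\pi\text{-1}}(t)$. Then $\Delta_x(t) \le (k-1)\,z_x^{\pi}$ for all times $t$, and therefore

\[ \mathrm{RelWork}_x^{\pi\text{-}k} \le_{\mathrm{st}} \mathrm{RelWork}_x^{\pi\text{-1}} + (k-1)\,z_x^{\pi}. \]

Proof.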

Throughout this proof, $z_x$ refers to $z_x^{\pi}$. We consider a pair of coupled systems with the same arrival sequence:
• System 1, an M/G/1 using $\pi$-1; and
• System $k$, an M/G/k using $\pi$-k.

Our approach is to bound the difference in relevant work between Systems 1 and $k$ at any time $t$.

Call a job relevant if it has age less than $z_x$. These are the only jobs that contribute relevant work. To bound $\Delta_x(t)$, we divide times $t$ into two types of intervals:
• few-jobs intervals, during which there are fewer than $k$ relevant jobs in System $k$; and
• many-jobs intervals, during which there are at least $k$ relevant jobs in System $k$.

Note that both types of intervals are defined based on System $k$ alone, so System 1 may or may not have relevant jobs during either type of interval.

Any time $t$ is in either a few-jobs interval or a many-jobs interval. If $t$ is in a few-jobs interval, the argument is simple: there are at most $k - 1$ relevant jobs in System $k$ at time $t$, so

\[ \Delta_x(t) \le \mathrm{RelWork}_x^{\pi\text{-}k}(t) \le (k-1)\,z_x. \]

Suppose instead that $t$ is in a many-jobs interval. Let $s \le t$ be the start of the many-jobs interval containing $t$. We will show

\[ \Delta_x(t) \le \Delta_x(s) \le (k-1)\,z_x. \]

We begin by showing $\Delta_x(t) \le \Delta_x(s)$. Note that arrivals do not affect $\Delta_x$, because the two systems experience the same arrivals and have the same definition of relevant work. Next, note that service to irrelevant jobs does not affect $\Delta_x$, because irrelevant jobs never become relevant under $\pi$, since $\pi$ is a monotonic policy. In fact, the only way that $\Delta_x$ changes over a many-jobs period is due to service to relevant jobs. System $k$ serves relevant jobs on all $k$ servers throughout a many-jobs period, completing relevant work at rate 1. System 1 may or may not serve relevant jobs during a many-jobs period, so it completes relevant work at rate at most 1. This means $\Delta_x(t) \le \Delta_x(s)$, as desired.

All that remains is to show that $\Delta_x(s) \le (k-1)\,z_x$. Recall that $s$ is the start of a many-jobs interval. Many-jobs intervals cannot start due to irrelevant jobs becoming relevant, because $\pi$ is a monotonic policy. This means each many-jobs interval starts due to a relevant job arriving while System $k$ has $k - 1$ relevant jobs. But arrivals do not affect $\Delta_x$, as discussed above. This means $\Delta_x(s) = \Delta_x(s^-)$, where $s^-$ is the instant before the arrival that starts the many-jobs interval. But $s^-$ is in a few-jobs interval, so

\[ \Delta_x(s) = \Delta_x(s^-) \le (k-1)\,z_x. \] □

We now have a bound on M/G/k mean response time under monotonic SOAP policies $\pi$, including M-Gittins and M-SERPT. The bound (Theorem 5.1) is expressed in terms of $\mathbf{E}[Q^{\pi\text{-1}}]$, $\mathbf{E}[R^{\pi\text{-1}}]$, and $\mathbf{E}[S^{\pi\text{-1}}]$, quantities which in turn are expressed in terms of the new job and old job age cutoffs $y_x^{\pi}$ and $z_x^{\pi}$. In order to prove optimality of M-Gittins in the heavy-traffic M/G/k, we need to understand the heavy-traffic behavior of $\mathbf{E}[Q^{\pi\text{-1}}]$, $\mathbf{E}[R^{\pi\text{-1}}]$, and $\mathbf{E}[S^{\pi\text{-1}}]$, which, as we will see in Section 7, boils down to understanding the behavior of $y_x^{\pi}$ and $z_x^{\pi}$ in the $x \to \infty$ limit. This section is thus devoted to asymptotically bounding the new job and old job age cutoffs, and more generally the rank functions, of M-Gittins and M-SERPT.

Recall from Definition 2.2 that SERPT's rank function is used to define M-SERPT's. The following lemma shows that the two rank functions are equal at the new job and old job age cutoffs, and similarly for Gittins and M-Gittins. Figure 6.1 gives an intuitive picture of the result.

[Fig. 6.1. Relationship Between SERPT and M-SERPT Rank Functions. Here $y_x$ stands for $y_x^{\text{M-SERPT}}$, and similarly for $z_x$, $y_{x'}$, and $z_{x'}$.]

Lemma 6.1.

The SERPT and M-SERPT rank functions are related by

\[ r^{\text{SERPT}}(y_x^{\text{M-SERPT}}) = r^{\text{M-SERPT}}(y_x^{\text{M-SERPT}}) = r^{\text{M-SERPT}}(x) = r^{\text{M-SERPT}}(z_x^{\text{M-SERPT}}) = r^{\text{SERPT}}(z_x^{\text{M-SERPT}}), \]

and analogously for Gittins and M-Gittins.

Proof.

We prove the statement for SERPT and M-SERPT, as the proof for Gittins and M-Gittins is analogous. Throughout this proof, $y_x$ and $z_x$ refer to $y_x^{\text{M-SERPT}}$ and $z_x^{\text{M-SERPT}}$, respectively. The illustration in Fig. 6.1 may provide helpful intuition for the following argument.

We first show the outer equalities. Definition 4.1 implies that $r^{\text{M-SERPT}}$ is increasing in the intervals $(y_x - \delta, y_x)$ and $(z_x, z_x + \delta)$ for some $\delta > 0$. By Definition 2.2, for $r^{\text{M-SERPT}}$ to be increasing at age $a$, we must have $r^{\text{M-SERPT}}(a) = r^{\text{SERPT}}(a)$, so continuity of $r^{\text{M-SERPT}}$ (Lemma 2.6) implies the outer equalities.

By (4.3) and the monotonicity of $r^{\text{M-SERPT}}$, it remains only to show $r^{\text{M-SERPT}}(y_x) = r^{\text{M-SERPT}}(z_x)$. This is immediate if $y_x = z_x$, and if $y_x < z_x$, then $r^{\text{M-SERPT}}$ is constant over the interval $[y_x, z_x)$, so the result follows by the continuity of $r^{\text{M-SERPT}}$ (Lemma 2.6). □
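The relationship between a rank function, its monotone envelope, and the age cutoffs of Definition 4.1 can be sketched on a discrete grid of ages. This is an illustrative sketch under stated assumptions, not the paper's construction: the rank values are made up, the grid replaces the continuous age axis, and `monotone_envelope` and `age_cutoffs` are hypothetical helper names. On a grid, $y_x$ is approximated by the first grid age whose envelope rank reaches $r(x)$, and $z_x$ by the last grid age whose envelope rank does not exceed $r(x)$.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
import math

def monotone_envelope(ranks):
    """Running maximum of a rank sequence (grid analogue of SERPT -> M-SERPT)."""
    return list(accumulate(ranks, max))

def age_cutoffs(ages, mono_ranks, i):
    """Grid approximations of (y_x, z_x) for the size x = ages[i].

    mono_ranks must be nondecreasing, e.g. produced by monotone_envelope.
    Returns math.inf for z_x if the rank never exceeds r(x) on the grid.
    """
    r_x = mono_ranks[i]
    # First grid age whose rank is >= r(x): left end of the flat region.
    y = ages[bisect_left(mono_ranks, r_x)]
    # Index of the first grid age whose rank is > r(x), if any.
    j = bisect_right(mono_ranks, r_x)
    z = ages[j - 1] if j < len(ages) else math.inf
    return y, z

# Made-up nonmonotonic ranks on ages 0..7 (an assumption for illustration).
ages = [0, 1, 2, 3, 4, 5, 6, 7]
raw_ranks = [1, 2, 3, 2, 1, 3, 4, 5]
mono_ranks = monotone_envelope(raw_ranks)
```

With these made-up values, size 4 lies in a flat region of the envelope, so its cutoffs are the endpoints of that region, while size 6 lies where the envelope is increasing, so both cutoffs coincide with the size itself, matching the ordering in (4.3).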

In this section we show two bounds on $y_x^{\text{M-SERPT}}$ and $z_x^{\text{M-SERPT}}$, each subject to a different assumption on the job size distribution.

Theorem 6.2. If $X \in \mathrm{OR}(-\infty, -1)$, then

\[ r^{\text{SERPT}}(a) = \Theta(a), \quad r^{\text{M-SERPT}}(a) = \Theta(a), \quad y_x^{\text{M-SERPT}} = \Theta(x), \quad z_x^{\text{M-SERPT}} = \Theta(x). \]

Proof. By Definition 2.7, there exists $\alpha > 1$ such that

\[ r^{\text{SERPT}}(a) = \int_a^\infty \frac{\overline{F}(t)}{\overline{F}(a)}\,\mathrm{d}t \le O(1) \int_a^\infty \Bigl(\frac{t}{a}\Bigr)^{-\alpha} \mathrm{d}t = O(a), \]

and $r^{\text{SERPT}}(a) = \Omega(a)$ follows similarly. This implies

\[ r^{\text{M-SERPT}}(a) = \max_{b \in [0, a]} r^{\text{SERPT}}(b) = \max_{b \in [0, a]} \Theta(b) = \Theta(a), \]

so the result follows from Lemma 6.1. □

Theorem 6.3. If $X \in \mathrm{QDHR} \cup \mathrm{QIMRL}$ with exponent $\gamma$, then

\[ y_x^{\text{M-SERPT}} = \Omega(x^{1/\gamma}), \qquad z_x^{\text{M-SERPT}} = O(x^{\gamma}). \]

Proof. The QDHR case follows from Theorem 6.5 (Section 6.2) and a result of Scully et al. [33, Eq. (3.8)] stating

\[ y_x^{\text{M-Gittins}} \le y_x^{\text{M-SERPT}} \le z_x^{\text{M-SERPT}} \le z_x^{\text{M-Gittins}}, \]

so only the QIMRL case remains. In the rest of this proof, $y_x$ and $z_x$ refer to $y_x^{\text{M-SERPT}}$ and $z_x^{\text{M-SERPT}}$, respectively. By (4.3), it suffices to show $z_x = O(y_x^{\gamma})$. Because $X \in \mathrm{QIMRL}$ with exponent $\gamma$, there exists a strictly increasing function $m : \mathbb{R}_+ \to \mathbb{R}_+$ such that for all ages $a$,

\[ a \le m^{-1}\bigl( r^{\text{SERPT}}(a) \bigr) \le O(a^{\gamma}). \]

The result follows by plugging in $a = y_x$ and $a = z_x$ and applying Lemma 6.1. □

In this section we show two bounds on $y_x^{\text{M-Gittins}}$ and $z_x^{\text{M-Gittins}}$, each subject to a different assumption on the job size distribution.

Theorem 6.4. If $X \in \mathrm{OR}(-\infty, -1)$, then

\[ y_x^{\text{M-Gittins}} = \Theta(x), \qquad z_x^{\text{M-Gittins}} = \Theta(x). \]

Theorem 6.5. If $X \in \mathrm{QDHR}$ with exponent $\gamma$, then

\[ y_x^{\text{M-Gittins}} = \Omega(x^{1/\gamma}), \qquad z_x^{\text{M-Gittins}} = O(x^{\gamma}). \]

These bounds are harder to prove than their M-SERPT counterparts from Section 6.1. The most important component is the following definition, which helps us better understand the M-Gittins rank function and relate it to the simpler M-SERPT rank function.

Deﬁnition 6.6.

The time per completion over an age interval $(a, b]$ is

\[ \eta(a, b) = \frac{\mathbf{E}[\min\{X, b\} - a \mid X > a]}{\mathbf{P}\{X < b \mid X > a\}} = \frac{\int_a^b \overline{F}(t)\,\mathrm{d}t}{\overline{F}(a) - \overline{F}(b)}. \]

We extend this definition to the $b \to a$ and $b \to \infty$ limits:

\[ \eta(a, a) = \frac{1}{h(a)}, \qquad \eta(a, \infty) = \mathbf{E}[X - a \mid X > a]. \]

We can write the rank functions of SERPT, M-SERPT, Gittins, and M-Gittins in terms of $\eta$ as

\[ r^{\text{SERPT}}(a) = \eta(a, \infty), \quad r^{\text{M-SERPT}}(a) = \max_{b \in [0, a]} \eta(b, \infty), \quad r^{\text{Gittins}}(a) = \min_{b \in [a, \infty]} \eta(a, b), \quad r^{\text{M-Gittins}}(a) = \max_{b \in [0, a]} \min_{c \in [b, \infty]} \eta(b, c). \tag{6.1} \]

Armed with Definition 6.6 and (6.1), we are ready to prove Theorems 6.4 and 6.5. The former proof relies on some technical lemmas that we defer to Section 6.3.

Proof of Theorem 6.4.

Throughout this proof, $y_x$ and $z_x$ refer to $y_x^{\text{M-Gittins}}$ and $z_x^{\text{M-Gittins}}$, respectively. By (4.3), it suffices to show there exist $C, x_0 > 0$ such that for all $x \ge x_0$,

\[ z_x \le C\,y_x. \]

We will set $C \ge 2$, which covers the $z_x \le 2 y_x$ case. The rest of the proof is thus devoted to the $z_x > 2 y_x$ case. Our approach is to show there exist $C_1, C_2$ such that for all $x \ge x_0$,

\[ C_1\,y_x \ge r^{\text{Gittins}}(y_x) \ge C_2\,z_x. \tag{6.2} \]

We begin with the upper bound on $r^{\text{Gittins}}(y_x)$. By Lemma 6.1, we have $r^{\text{Gittins}}(y_x) = r^{\text{M-Gittins}}(y_x)$ for all sizes $x$, and by (6.1), we have $r^{\text{M-Gittins}}(a) \le r^{\text{M-SERPT}}(a)$ for all ages $a$. Combining these observations with Theorem 6.2 implies $r^{\text{Gittins}}(y_x) = O(y_x)$ and thereby implies the desired upper bound from (6.2). (This would be more subtle if $\lim_{x \to \infty} y_x$ were finite, but Theorem 6.2 and a result of Aalto et al. [4, Proposition 9] imply $\lim_{x \to \infty} y_x = \infty$.)

We now turn to the lower bound on $r^{\text{Gittins}}(y_x)$. This requires Lemmas 6.7 and 6.8, which are facts about $\eta$ that we prove in Section 6.3. Combining Lemma 6.7 with (6.1) and the fact that we are in the $z_x > 2 y_x$ case gives us

\[ r^{\text{Gittins}}(y_x) = \eta(y_x, z_x) \ge \eta\Bigl(\frac{z_x}{2}, z_x\Bigr). \]

By Lemma 6.8, there exist $C_3, x_1$ such that for all $x$ with $z_x / 2 > x_1$,

\[ \eta\Bigl(\frac{z_x}{2}, z_x\Bigr) \ge \frac{C_3}{4}\,z_x, \]

implying the desired lower bound from (6.2). □

(Footnote to Definition 6.6: Our time per completion function is the reciprocal of what Aalto et al. [3, 4] call the efficiency function.)

Proof of Theorem 6.5.

Throughout this proof, $y_x$ and $z_x$ refer to $y_x^{\text{M-Gittins}}$ and $z_x^{\text{M-Gittins}}$, respectively. By (4.3), it suffices to show $z_x = O(y_x^{\gamma})$. Because $X \in \mathrm{QDHR}$ with exponent $\gamma$, there exists a strictly increasing function $m : \mathbb{R}_+ \to \mathbb{R}_+$ such that for all sizes $x$,

\[ m(x) \le \frac{1}{h(x)} \le m(O(x^{\gamma})). \]

We have $r^{\text{Gittins}}(y_x) \le 1 / h(y_x)$ by (6.1), and Lemma 6.1 implies $r^{\text{Gittins}}(z_x) = r^{\text{Gittins}}(y_x)$, so

\[ r^{\text{Gittins}}(z_x) \le m(O(y_x^{\gamma})). \]

It remains only to lower bound $r^{\text{Gittins}}(z_x)$. We do so using the observation that for any age $a$,

\[ r^{\text{Gittins}}(a) = \min_{b \in [a, \infty]} \eta(a, b) = \left( \max_{b \in [a, \infty]} \frac{\int_a^b \overline{F}(t)\,h(t)\,\mathrm{d}t}{\int_a^b \overline{F}(t)\,\mathrm{d}t} \right)^{-1} \ge \Bigl( \sup_{b > a} h(b) \Bigr)^{-1} = \inf_{b > a} \frac{1}{h(b)} \ge m(a), \]

where the first inequality follows from viewing the ratio of integrals as a weighted average. Plugging in $a = z_x$ implies $m(z_x) \le m(O(y_x^{\gamma}))$, so the result follows because $m$ is strictly increasing. □

Lemma 6.7.

For all sizes $x$ and ages $a$, if $y_x < a < z_x$, then

\[ r^{\text{Gittins}}(y_x) = \eta(y_x, z_x) \ge \eta(a, z_x). \]

Proof.

A property of the Gittins index [12, Lemma 2.2] implies $r^{\text{Gittins}}(y_x) = \eta(y_x, z_x)$. (The proof given by Gittins et al. [12] is in a discrete setting, but essentially the same proof carries over to our continuous setting.) In particular, for any $a \ne z_x$,

\[ \eta(y_x, a) \ge \eta(y_x, z_x). \tag{6.3} \]

A basic property of the $\eta$ function [33, Eq. (D.3)] is that for any $d < e < f$,

\[ \eta(d, e) \ge \eta(d, f) \iff \eta(d, f) \ge \eta(e, f). \]

Plugging in $d = y_x$, $e = a$, and $f = z_x$ and applying (6.3) yields $\eta(y_x, z_x) \ge \eta(a, z_x)$, as desired. □

Lemma 6.8. If $X \in \mathrm{OR}(-\infty, -1)$, then there exist constants $C_1, x_1 > 0$ such that for all $b > a > x_1$,

\[ \eta(a, b) \ge C_1\,a \Bigl( 1 - \frac{a}{b} \Bigr). \]

Proof.

We can write $\eta(a, b)$ as

\[ \eta(a, b) = \frac{\int_a^b \overline{F}(t) / \overline{F}(a)\,\mathrm{d}t}{1 - \overline{F}(b) / \overline{F}(a)} \ge \int_a^b \frac{\overline{F}(t)}{\overline{F}(a)}\,\mathrm{d}t. \]

Because $X \in \mathrm{OR}(-\infty, -1)$, there exist $\beta > 1$ and $C_0, x_0 > 0$ such that for all $t > a > x_0$,

\[ \frac{\overline{F}(t)}{\overline{F}(a)} \ge C_0 \Bigl(\frac{t}{a}\Bigr)^{-\beta}. \]

For all $b > a > x_0$, we have

\[ \eta(a, b) \ge C_0 \int_a^b \Bigl(\frac{t}{a}\Bigr)^{-\beta} \mathrm{d}t = \frac{C_0\,a}{\beta - 1} \Bigl( 1 - \Bigl(\frac{b}{a}\Bigr)^{-(\beta - 1)} \Bigr). \]

We now consider two cases: $\beta \ge 2$ and $1 < \beta < 2$. If $\beta \ge 2$, then $(b/a)^{-(\beta - 1)} \le a/b$ and therefore

\[ \eta(a, b) \ge \frac{C_0\,a}{\beta - 1} \Bigl( 1 - \frac{a}{b} \Bigr), \tag{6.4} \]

so setting $C_1 = C_0 / (\beta - 1)$ and $x_1 = x_0$ suffices. If $1 < \beta < 2$, we use the fact that for all $u > 0$,

\[ u^{\beta - 1} \le 1 + (\beta - 1)(u - 1). \]

Substituting $u = a/b$ and combining this with the lower bound on $\eta(a, b)$ that precedes the case split yields

\[ \eta(a, b) \ge C_0\,a \Bigl( 1 - \frac{a}{b} \Bigr), \]

so setting $C_1 = C_0$ and $x_1 = x_0$ suffices. □

In this section we characterize the heavy-traffic scaling of mean waiting, residence, and inflated residence times, which are the M/G/1 quantities that appear in Theorem 5.1. Because M-SERPT is a simpler policy than M-Gittins, our approach is to first study M-SERPT's heavy-traffic scaling (Sections 7.2 and 7.3) and then show that the results extend to M-Gittins (Section 7.4).

Before starting the heavy-traffic analyses of M-Gittins and M-SERPT, we introduce some new notation. Let

\[ H_{\rho}(x) = \frac{\overline{F}(x)}{\overline{\rho}(x)}. \]

Definition 7.1.

The key M/G/1 response time quantities , or simply “key quantities”, of a monotonicSOAP policy 𝜋 are the following:I 𝜋𝑄 = ∫ ∞ (cid:0) 𝐻 𝜌 ( 𝑦 𝜋𝑥 ) + 𝐻 𝜌 ( 𝑧 𝜋𝑥 ) (cid:1) 𝜆𝜏 ( 𝑧 𝜋𝑥 ) 𝐹 ( 𝑥 ) 𝜌 ( 𝑥 ) d 𝑥, II 𝜋𝑄 = ∫ ∞ 𝜆𝑥𝐻 𝜌 ( 𝑦 𝜋𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝜋𝑥 ) d 𝑥, II 𝜋𝑅 = ∫ ∞ 𝜆𝑧 𝜋𝑥 𝐻 𝜌 ( 𝑦 𝜋𝑥 ) 𝐻 𝜌 ( 𝑧 𝜋𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝜋𝑥 ) d 𝑥, III 𝜋𝑅 = ∫ ∞ 𝐻 𝜌 ( 𝑦 𝜋𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝜋𝑥 ) d 𝑥, ptimal Multiserver Scheduling with Unknown Job Sizes in Heavy Traﬀic 23 II 𝜋𝑆 = II 𝜋𝑅 , III 𝜋𝑆 = ∫ ∞ 𝐻 𝜌 ( 𝑦 𝜋𝑥 ) d 𝑥 . When the policy in question is clear, we drop the superscript 𝜋 .In Theorems B.1–B.3 (Appendix B) we show that for any monotonic SOAP policy 𝜋 , E [ 𝑄 𝜋 ] = I 𝜋𝑄 + II 𝜋𝑄 , E [ 𝑅 𝜋 ] = II 𝜋𝑅 + III 𝜋𝑅 , E [ 𝑆 𝜋 ] = II 𝜋𝑆 + III 𝜋𝑆 . Bounding mean waiting, residence, and inﬂated residence times thus amounts to bounding the keyquantities.For the most of the rest of this section we focus on the case where 𝜋 is M-SERPT, deferring theM-Gittins case to Section 7.4. Until then, 𝑦 𝑥 , 𝑧 𝑥 , and the key quantities are understood to have animplicit superscript M-SERPT.The most important step of bounding the key quantities is bounding 𝐻 𝜌 ( 𝑦 𝑥 ) and 𝐻 𝜌 ( 𝑧 𝑥 ) . As aﬁrst step, we bound 𝐻 𝜌 ( 𝑥 ) . Let 𝐹 𝑒 ( 𝑥 ) = E [ 𝑋 ] ∫ ∞ 𝑥 𝐹 ( 𝑡 ) d 𝑡 (7.1)be the tail of the excess of 𝑋 . We can write 𝜌 ( 𝑥 ) as 𝜌 ( 𝑥 ) = ( − 𝜌 ) + 𝜌𝐹 𝑒 ( 𝑥 ) . (7.2)This means that for all 𝜀 ∈ [ , ] , we have 𝐻 𝜌 ( 𝑥 ) ≤ 𝐹 ( 𝑥 ) max { − 𝜌, 𝜌𝐹 𝑒 ( 𝑥 )} ≤ 𝐹 ( 𝑥 )( − 𝜌 ) 𝜀 ( 𝜌𝐹 𝑒 ( 𝑥 )) − 𝜀 = 𝐹 ( 𝑥 ) 𝜀 𝐻 ( 𝑥 ) − 𝜀 ( − 𝜌 ) 𝜀 𝜌 − 𝜀 , (7.3)where 𝐻 ( 𝑥 ) = 𝐹 ( 𝑥 )/ 𝐹 𝑒 ( 𝑥 ) = lim 𝜌 → 𝐻 𝜌 ( 𝑥 ) . This bound is useful because it separates 𝐻 𝜌 ( 𝑥 ) ’sdependence on 𝑥 and 𝜌 : the numerator depends only on 𝑥 , and the denominator depends onlyon 𝜌 . We will typically choose 𝜀 to be either 0 or arbitrarily small.Having bounded 𝐻 𝜌 ( 𝑥 ) in (7.3), we now turn to bounding 𝐻 𝜌 ( 𝑦 𝑥 ) and 𝐻 𝜌 ( 𝑧 𝑥 ) . 
Recalling the definition of $r^{\text{SERPT}}$ (Definition 2.1),

\[ H(x) = \frac{\overline{F}(x)}{\overline{F}_e(x)} = \frac{\mathbf{E}[X]}{r^{\text{SERPT}}(x)}, \]

so Lemma 6.1 and the monotonicity of $r^{\text{M-SERPT}}$ imply

\[ H(y_x) = H(z_x) = \frac{\mathbf{E}[X]}{r^{\text{M-SERPT}}(x)} = O(1). \tag{7.4} \]

Combining this with (7.3) yields bounds on $H_{\rho}(y_x)$ and $H_{\rho}(z_x)$, though the bounds still have $\overline{F}(y_x)$ and $\overline{F}(z_x)$ terms. To better understand $H_{\rho}(y_x)$ and $H_{\rho}(z_x)$, we need to use our results from Section 6 in arguments that depend on what class of distributions contains $X$. We do this over the course of Sections 7.2 and 7.3.

In this section we study the heavy-traffic scaling of M-SERPT's waiting, residence, and inflated residence times for infinite-variance job size distributions, specifically those in $\mathrm{OR}(-2, -1)$. With that said, many of the intermediate results we prove will also be useful for the finite-variance $\mathrm{OR}(-\infty, -2)$ case (Section 7.3).

Suppose that $X \in \mathrm{OR}(-\infty, -1)$. Combining Theorem 6.2 and (7.4) gives us

\[ y_x, z_x = \Theta(x), \qquad H(y_x), H(z_x) = \Theta\Bigl(\frac{1}{x}\Bigr). \tag{7.5} \]

This alone is enough to bound all of the key quantities except $\mathrm{I}_Q$.

Lemma 7.2.

Under M-SERPT, if 𝑋 ∈ OR (−∞ , − ) , then II 𝑄 , II 𝑅 , III 𝑅 , II 𝑆 , III 𝑆 = 𝑂 (cid:18) log 11 − 𝜌 (cid:19) . Proof.

Our approach is to use the fact that, by (4.5), ∫ ∞ 𝐻 𝜌 ( 𝑥 ) d 𝑥 = ∫ ∞ 𝐹 ( 𝑥 ) 𝜌 ( 𝑥 ) d 𝑥 = E [ 𝑋 ] 𝜌 log 11 − 𝜌 . (7.6)Because II 𝑅 = II 𝑆 and III 𝑅 ≤ III 𝑆 , it suﬃces to show that the integrands of II 𝑄 , II 𝑆 , and III 𝑆 are all 𝑂 ( 𝐻 𝜌 ( 𝑥 )) .We begin by showing that III 𝑆 ’s integrand is 𝑂 ( 𝐻 𝜌 ( 𝑥 )) . By (7.5) and the fact that 𝑋 ∈ OR (−∞ , − ) ,we have 𝐹 ( 𝑦 𝑥 ) = 𝐹 ( Θ ( 𝑥 )) = Θ ( 𝐹 ( 𝑥 )) , which yields 𝐻 𝜌 ( 𝑦 𝑥 ) = 𝐹 ( 𝑦 𝑥 ) 𝜌 ( 𝑦 𝑥 ) ≤ 𝐹 ( 𝑦 𝑥 ) 𝜌 ( 𝑥 ) = 𝑂 ( 𝐹 ( 𝑥 )) 𝜌 ( 𝑥 ) = 𝑂 ( 𝐻 𝜌 ( 𝑥 )) . (7.7)This implies the desired bound for III 𝑆 and III 𝑅 .We show II 𝑆 ’s integrand is 𝑂 ( 𝐻 𝜌 ( 𝑥 )) by applying (7.3) with 𝜀 =

0, (7.5), and (7.7): 𝜆𝑧 𝑥 𝐻 𝜌 ( 𝑦 𝑥 ) 𝐻 𝜌 ( 𝑧 𝑥 ) ≤ 𝜆𝑧 𝑥 𝐻 𝜌 ( 𝑦 𝑥 ) 𝐻 ( 𝑧 𝑥 ) = 𝑂 ( 𝐻 𝜌 ( 𝑥 )) . This implies the desired bound for II 𝑆 and II 𝑅 . Similarly, 𝜆𝑥𝐻 𝜌 ( 𝑦 𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝑥 ) ≤ 𝜆𝑥𝐻 𝜌 ( 𝑦 𝑥 ) 𝐻 ( 𝑦 𝑥 ) = 𝑂 ( 𝐻 𝜌 ( 𝑥 )) , implying the bound for II 𝑄 . (cid:3) It remains only to characterize the heavy-traﬃc scaling of I 𝑄 . Treating the OR (−∞ , − ) caserequires some additional care, so we defer it to Section 7.3, focusing on the OR (− , − ) case fornow. The ﬁrst step is to bound 𝜏 ( 𝑥 ) . Lemma 7.3. If 𝑋 ∈ OR (− , − ) , then 𝜏 ( 𝑥 ) = Θ ( 𝑥 𝐹 ( 𝑥 )) . Proof.

By Definition 2.7, there exists $\beta \in (1, 2)$ such that

\[ \frac{\tau(x)}{\overline{F}(x)} = \int_0^x \frac{\lambda t\,\overline{F}(t)}{\overline{F}(x)}\,\mathrm{d}t \le O(1) \int_0^x t \Bigl(\frac{t}{x}\Bigr)^{-\beta} \mathrm{d}t = O(x^2), \]

and similarly for the lower bound. □

We now have bounds on every term in $\mathrm{I}_Q$'s integrand, allowing us to bound $\mathrm{I}_Q$ and thereby mean response time.

Theorem 7.4. If $X \in \mathrm{OR}(-2, -1)$, then in the $\rho \to 1$ limit,

\[ \mathbf{E}[Q^{\text{M-SERPT-1}}] = O\Bigl(\log \frac{1}{1-\rho}\Bigr), \qquad \mathbf{E}[R^{\text{M-SERPT-1}}] = O\Bigl(\log \frac{1}{1-\rho}\Bigr), \]

and therefore

\[ \mathbf{E}[T^{\text{M-SERPT-1}}] = O\Bigl(\log \frac{1}{1-\rho}\Bigr). \]

Proof.

By Lemma 7.2, it suﬃces to upper bound I 𝑄 . We compute (cid:0) 𝐻 𝜌 ( 𝑦 𝑥 ) + 𝐻 𝜌 ( 𝑧 𝑥 ) (cid:1) 𝜆𝜏 ( 𝑧 𝑥 ) 𝐹 ( 𝑥 ) 𝜌 ( 𝑥 ) ≤ (cid:0) 𝐻 ( 𝑦 𝑥 ) + 𝐻 ( 𝑧 𝑥 ) (cid:1) 𝜆𝜏 ( 𝑧 𝑥 ) 𝐻 ( 𝑥 ) 𝜌 ( 𝑥 ) [by (7.3)] = (cid:0) 𝐻 ( 𝑦 𝑥 ) + 𝐻 ( 𝑧 𝑥 ) (cid:1) 𝑂 ( 𝑧 𝑥 𝐹 ( 𝑧 𝑥 )) · 𝐻 ( 𝑥 ) 𝜌 ( 𝑥 ) [by Lem. 7.3] = 𝑂 ( 𝐹 ( 𝑥 )) 𝜌 ( 𝑥 ) [by (7.5)] = 𝑂 ( 𝐻 𝜌 ( 𝑥 )) , so (7.6) implies the desired bound. (cid:3) We now turn to ﬁnite-variance job size distributions, speciﬁcally those in OR (−∞ , − ) , MDA ( Λ ) ,and ENBUE . We begin with the simplest case, which is

ENBUE . Theorem 7.5. If 𝑋 ∈ ENBUE , then in the 𝜌 → limit, E [ 𝑄 M-SERPT-1 ] = Θ (cid:18) − 𝜌 (cid:19) , E [ 𝑅 M-SERPT-1 ] = Θ ( ) , and therefore E [ 𝑇 M-SERPT-1 ] = Θ (cid:18) − 𝜌 (cid:19) . If additionally 𝑋 ∈ Bounded , then in the 𝜌 → limit, E [ 𝑆 M-SERPT-1 ] = Θ ( ) . Proof.

Let 𝑥 max be the supremum of 𝑋 ’s support, so we may have 𝑥 max = ∞ . Because 𝑋 ∈ ENBUE ,there exists age 𝑎 ∗ < 𝑥 max such that • 𝑟 M-SERPT ( 𝑎 ) < 𝑟 M-SERPT ( 𝑎 ∗ ) for all 𝑎 < 𝑎 ∗ , and • 𝑟 M-SERPT ( 𝑎 ) = 𝑟 M-SERPT ( 𝑎 ∗ ) for all 𝑎 ≥ 𝑎 ∗ .This means • 𝑦 𝑥 ≤ 𝑎 ∗ for all sizes 𝑥 , • 𝑧 𝑥 ≤ 𝑎 ∗ for all sizes 𝑥 ≤ 𝑎 ∗ , and • 𝑧 𝑥 = 𝑥 max for all sizes 𝑥 > 𝑎 ∗ . Because 𝜌 ( 𝑎 ∗ ) < 𝜌 ( 𝑥 max ) = − 𝜌, applying (4.4) yields E [ 𝑄 M-SERPT-1 ] = Θ ( ) + ∫ ∞ 𝑎 ∗ 𝜏 ( 𝑥 max ) 𝜌 ( 𝑎 ∗ ) · ( − 𝜌 ) d 𝐹 ( 𝑥 ) = Θ (cid:18) − 𝜌 (cid:19) , E [ 𝑅 M-SERPT-1 ] = Θ ( ) + ∫ ∞ 𝑎 ∗ 𝑥𝜌 ( 𝑎 ∗ ) d 𝐹 ( 𝑥 ) = Θ ( ) . If additionally 𝑋 ∈ Bounded , then 𝑥 max < ∞ , so E [ 𝑆 M-SERPT-1 ] = Θ ( ) + ∫ ∞ 𝑎 ∗ 𝑥 max 𝜌 ( 𝑎 ∗ ) d 𝐹 ( 𝑥 ) = Θ ( ) . (cid:3) We now turn to the OR (−∞ , − ) and MDA ( Λ ) cases, which require the following technicallemma. Lemma 7.6.

Let 𝐿 𝜋 ( 𝑢 ) = 𝑟 𝜋 (cid:0) 𝐹 − 𝑒 ( / 𝑢 ) (cid:1) , where 𝜋 is SERPT or M-SERPT. If 𝑋 ∈ OR (−∞ , − ) , then 𝐿 SERPT , 𝐿

M-SERPT ∈ OR (− , ) , and if 𝑋 ∈ MDA ( Λ ) , then 𝐿 SERPT , 𝐿

M-SERPT ∈ OR (− 𝜀, 𝜀 ) for all 𝜀 > . Proof.

Because 𝐿 M-SERPT is the nonincreasing envelope of 𝐿 SERPT , it suﬃces to prove the re-sult for 𝐿 SERPT . The OR (−∞ , − ) case follows from closure properties of Matuszewska indices [19,Lemmas 4.5 and 4.6]. The MDA ( Λ ) case follows from a result of Kamphorst and Zwart [19, Sec-tion 4.2.2] which states that if 𝑋 ∈ MDA ( Λ ) , then 𝐿 SERPT is slowly varying , a property implying 𝐿 SERPT ∈ OR (− 𝜀, 𝜀 ) for all 𝜀 > (cid:3) One implication of Lemma 7.6 is that if 𝑋 ∈ MDA ( Λ ) , then 𝐻 ( 𝑥 ) = 𝑂 ( 𝐹 ( 𝑥 ) − 𝜀 ) for all 𝜀 > . (7.8)We are now ready to tackle the OR (−∞ , − ) and MDA ( Λ ) cases. As in Section 7.2, we beginby bounding the ﬁve key quantities other than I 𝑄 . Lemma 7.2 does so for OR (−∞ , − ) , and thefollowing lemma does so for MDA ( Λ ) . Lemma 7.7.

Under M-SERPT, if 𝑋 ∈ MDA ( Λ ) , then II 𝑄 , II 𝑅 , III 𝑅 , II 𝑆 = 𝑂 (cid:18) ( − 𝜌 ) 𝜀 (cid:19) for all 𝜀 > . If additionally 𝑋 ∈ MDA ( Λ ) ∩ ( QDHR ∪ QIMRL ) , then III 𝑆 = 𝑂 (cid:18) ( − 𝜌 ) 𝜀 (cid:19) for all 𝜀 > . Proof.

Our overall approach is to use (7.3) on each key quantity to bound it by an expressionof the form ( − 𝜌 ) − 𝜀 · ∫ ∞ Φ ( 𝜀, 𝑥 ) d 𝑥 , where Φ ( 𝜀, 𝑥 ) does not depend on 𝜌 . The challenge is thento show that the integral converges for arbitrarily small 𝜀 > ptimal Multiserver Scheduling with Unknown Job Sizes in Heavy Traﬀic 27 We begin with two bounds on 𝐻 𝜌 ( 𝑦 𝑥 ) · 𝐹 ( 𝑥 )/ 𝐹 ( 𝑦 𝑥 ) , a term which appears in the integrands ofseveral key quantities. By (4.3), 𝐻 𝜌 ( 𝑦 𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝑥 ) ≤ 𝐻 𝜌 ( 𝑦 𝑥 ) , (7.9) 𝐻 𝜌 ( 𝑦 𝑥 ) · 𝐹 ( 𝑥 ) 𝐹 ( 𝑦 𝑥 ) = 𝐹 ( 𝑥 ) 𝜌 ( 𝑦 𝑥 ) ≤ 𝐹 ( 𝑥 ) 𝜌 ( 𝑥 ) = 𝐻 𝜌 ( 𝑥 ) . (7.10)Combining (7.10) with (7.6) implies the desired bound for III 𝑅 .We now bound II 𝑄 . To do so, we apply (7.3) twice, choosing 𝜀 = 𝐻 𝜌 ( 𝑦 𝑥 ) and arbitrarilysmall 𝜀 > 𝐻 𝜌 ( 𝑥 ) :II 𝑄 ≤ ∫ ∞ 𝜆𝑥𝐻 𝜌 ( 𝑦 𝑥 ) 𝐻 𝜌 ( 𝑥 ) d 𝑥 [by (7.10)] ≤ ( − 𝜌 ) 𝜀 ∫ ∞ 𝜆𝑥𝐹 ( 𝑥 ) 𝜀 𝐻 ( 𝑦 𝑥 ) 𝐻 ( 𝑥 ) − 𝜀 d 𝑥 [by (7.3)] ≤ 𝑂 ( )( − 𝜌 ) 𝜀 ∫ ∞ 𝑥𝐹 ( 𝑥 ) 𝜀 𝐹 ( 𝑥 ) − 𝜀 ( − 𝜀 ) d 𝑥 [by (7.4), (7.8)] ≤ 𝑂 ( )( − 𝜌 ) 𝜀 ∫ ∞ 𝑥 − 𝛼𝜀 d 𝑥, [by Lem. 2.13] where we may choose 𝛼 > 𝛼 > / 𝜀 makes the integral converge, soII 𝑄 = 𝑂 (( − 𝜌 ) − 𝜀 ) . The computation for II 𝑆 is similar:II 𝑆 ≤ ( − 𝜌 ) 𝜀 ∫ ∞ 𝜆𝑧 𝑥 𝐹 ( 𝑧 𝑥 ) 𝜀 𝐻 ( 𝑦 𝑥 ) 𝐻 ( 𝑧 𝑥 ) − 𝜀 d 𝑥 [by (7.3), (7.9)] ≤ 𝑂 ( )( − 𝜌 ) 𝜀 ∫ ∞ 𝑧 − 𝛼𝜀𝑥 d 𝑥 . [by (7.4), Lem. 2.13] Because 𝑧 𝑥 ≥ 𝑥 , the integral converges if we choose 𝛼 > / 𝜀 , so II 𝑆 = 𝑂 (( − 𝜌 ) − 𝜀 ) . This alsocovers II 𝑅 because II 𝑅 = II 𝑆 .If additionally 𝑋 ∈ MDA ( Λ ) ∩ ( QDHR ∪ QIMRL ) with exponent 𝛾 , then we can similarlybound III 𝑆 : III 𝑆 ≤ ( − 𝜌 ) 𝜀 ∫ ∞ 𝐹 ( 𝑦 𝑥 ) 𝜀 𝐻 ( 𝑦 𝑥 ) − 𝜀 d 𝑥 [by (7.3)] ≤ 𝑂 ( )( − 𝜌 ) 𝜀 ∫ ∞ 𝑦 − 𝛼𝜀𝑥 d 𝑥 [by (7.4), Lem. 2.13] ≤ 𝑂 ( )( − 𝜌 ) 𝜀 ∫ ∞ 𝑥 − 𝛼𝜀 / 𝛾 d 𝑥, [by Thm. 6.3] so choosing 𝛼 > 𝛾 / 𝜀 shows that III 𝑆 = 𝑂 (( − 𝜌 ) − 𝜀 ) . (cid:3) It remains only to characterize the heavy-traﬃc scaling of I 𝑄 . Lemma 7.8.

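Under M-SERPT, if $X \in \mathrm{OR}(-\infty, -2) \cup \mathrm{MDA}(\Lambda)$, then
\[ \mathrm{I}_Q = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right). \]

Proof.

Because $\mathbf{E}[X^2] < \infty$, we have $\tau(x) = \Theta(1)$, so by (7.3) and (7.4),
\[ \mathrm{I}_Q = \int_0^\infty \frac{\Theta(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx. \]
For the lower bound, we integrate up to $\overline{F}_e^{-1}(1 - \rho)$ instead of $\infty$. For $x \le \overline{F}_e^{-1}(1 - \rho)$, we have $\overline{F}_e(x) \ge 1 - \rho$, so (7.2) implies
\[ \rho \overline{F}_e(x) \le \overline{\rho}(x) \le (1 + \rho) \overline{F}_e(x). \]
Using this fact along with the monotonicity of $r_{\text{M-SERPT}}$ yields
\[ \begin{aligned}
\mathrm{I}_Q &\ge \frac{\Omega(1)}{r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)} \int_0^{\overline{F}_e^{-1}(1 - \rho)} \frac{\overline{F}(x)}{\overline{F}_e(x)^2}\, dx \\
&= \frac{\Omega(1)}{r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)} \left(\frac{1}{\overline{F}_e\big(\overline{F}_e^{-1}(1 - \rho)\big)} - 1\right) && \text{[by (7.1)]} \\
&= \Omega\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right).
\end{aligned} \]
For the upper bound, we split the integration region at $\overline{F}_e^{-1}(1 - \rho)$:
\[ \mathrm{I}_Q = \int_0^{\overline{F}_e^{-1}(1 - \rho)} \frac{O(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx + \int_{\overline{F}_e^{-1}(1 - \rho)}^\infty \frac{O(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx. \tag{7.11} \]
The second integral in (7.11) is simple to bound using the monotonicity of $r_{\text{M-SERPT}}$:
\[ \begin{aligned}
\int_{\overline{F}_e^{-1}(1 - \rho)}^\infty \frac{O(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx
&\le \frac{O(1)}{r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)} \int_{\overline{F}_e^{-1}(1 - \rho)}^\infty \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx \\
&\le \frac{O(1)}{r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)} \left(\frac{1}{1 - \rho} - \frac{1}{1 - \rho + \rho \overline{F}_e\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right) && \text{[by (4.5), (7.2)]} \\
&= O\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right).
\end{aligned} \]
To bound the first integral in (7.11), we change variables to $u = 1/\overline{F}_e(x)$:
\[ \begin{aligned}
\int_0^{\overline{F}_e^{-1}(1 - \rho)} \frac{O(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{\rho}(x)^2}\, dx
&\le \int_0^{\overline{F}_e^{-1}(1 - \rho)} \frac{O(1)}{r_{\text{M-SERPT}}(x)} \cdot \frac{\overline{F}(x)}{\overline{F}_e(x)^2}\, dx && \text{[by (7.2)]} \\
&= \int_1^{1/(1 - \rho)} \frac{O(1)}{r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1/u)\big)}\, du \\
&= O(1) \int_1^{1/(1 - \rho)} L_{\text{M-SERPT}}(u)\, du,
\end{aligned} \]
where $L_{\text{M-SERPT}}$ is as in Lemma 7.6. By Lemma 7.6, we have $L_{\text{M-SERPT}} \in \mathrm{OR}(-1, \infty)$, so a result in Karamata theory [8, Theorem 2.6.1] implies
\[ \int_1^v L_{\text{M-SERPT}}(u)\, du = O\big(v L_{\text{M-SERPT}}(v)\big) \]
in the $v \to \infty$ limit. Letting $v = 1/(1 - \rho)$ yields the desired bound. □

Having characterized the heavy-traffic scaling of all the key quantities, the main heavy-traffic results for $\mathrm{OR}(-\infty, -2)$ and $\mathrm{MDA}(\Lambda)$ follow easily.

Theorem 7.9. If $X \in \mathrm{OR}(-\infty, -2)$, then in the $\rho \to 1$ limit,
\[ \mathbf{E}[Q_{\text{M-SERPT-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right) = \Omega\!\left(\frac{1}{(1 - \rho)^\delta}\right) \text{ for some } \delta > 0, \]
\[ \mathbf{E}[R_{\text{M-SERPT-1}}] \le \mathbf{E}[S_{\text{M-SERPT-1}}] = \Theta\!\left(\log \frac{1}{1 - \rho}\right), \]
and therefore
\[ \mathbf{E}[T_{\text{M-SERPT-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right). \]

Proof.

After applying Lemmas 7.2 and 7.8, it remains only to show $\mathrm{I}_Q = \Omega((1 - \rho)^{-\delta})$. Using $L_{\text{M-SERPT}}$ from Lemma 7.6, we can rewrite Lemma 7.8 as
\[ \mathrm{I}_Q = \Theta\!\left(\frac{1}{1 - \rho}\, L_{\text{M-SERPT}}\!\left(\frac{1}{1 - \rho}\right)\right). \tag{7.12} \]
By Lemma 7.6, we have $L_{\text{M-SERPT}} \in \mathrm{OR}(-1, 0)$, which means there exists $\beta \in (0, 1)$ such that $L_{\text{M-SERPT}}(u) = \Omega(u^{-\beta})$ in the $u \to \infty$ limit. Letting $\delta = 1 - \beta$ and $u = 1/(1 - \rho)$ yields the desired bound. □

Theorem 7.10. If $X \in \mathrm{MDA}(\Lambda)$, then in the $\rho \to 1$ limit,
\[ \mathbf{E}[Q_{\text{M-SERPT-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right) = \Omega\!\left(\frac{1}{(1 - \rho)^{1 - \varepsilon}}\right) \text{ for all } \varepsilon > 0, \]
\[ \mathbf{E}[R_{\text{M-SERPT-1}}] = O\!\left(\frac{1}{(1 - \rho)^\varepsilon}\right) \text{ for all } \varepsilon > 0, \]
and therefore
\[ \mathbf{E}[T_{\text{M-SERPT-1}}] = \Theta\!\left(\frac{1}{(1 - \rho) \cdot r_{\text{M-SERPT}}\big(\overline{F}_e^{-1}(1 - \rho)\big)}\right). \]
If additionally $X \in \mathrm{MDA}(\Lambda) \cap (\mathrm{QDHR} \cup \mathrm{QIMRL})$, then
\[ \mathbf{E}[S_{\text{M-SERPT-1}}] = O\!\left(\frac{1}{(1 - \rho)^\varepsilon}\right) \text{ for all } \varepsilon > 0. \]

Proof.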

After applying Lemmas 7.7 and 7.8, it remains only to show $\mathrm{I}_Q = \Omega((1 - \rho)^{-(1 - \varepsilon)})$. This follows from (7.12) and Lemma 7.6, similarly to the proof of Theorem 7.9. □

Having characterized heavy-traffic scaling under M-SERPT, we now do the same for Gittins and M-Gittins. Our first result shows that the mean waiting and residence times of Gittins and M-Gittins have the same heavy-traffic scaling as that of M-SERPT. Note that the precondition holds for all of the job size distributions we consider in Section 7.3.

Theorem 7.11. In the $\rho \to 1$ limit,
\[ \mathbf{E}[R_{\text{Gittins-1}}],\ \mathbf{E}[R_{\text{M-Gittins-1}}] = O(\mathbf{E}[R_{\text{M-SERPT-1}}]), \]
and if $\mathbf{E}[R_{\text{M-SERPT-1}}] = O(\mathbf{E}[Q_{\text{M-SERPT-1}}])$, then
\[ \mathbf{E}[Q_{\text{Gittins-1}}],\ \mathbf{E}[Q_{\text{M-Gittins-1}}] = \Theta(\mathbf{E}[Q_{\text{M-SERPT-1}}]). \]

Proof. The residence time result follows immediately from results of Scully et al. [33, Eq. (3.8) and Proposition 4.8], which imply
\[ \mathbf{E}[R_{\text{Gittins-1}}] \le \mathbf{E}[R_{\text{M-Gittins-1}}] \le \mathbf{E}[R_{\text{M-SERPT-1}}]. \]
For waiting time, we first invoke further results of Scully et al. [33, Proposition 4.7 and Lemma 5.6], which imply
\[ \mathbf{E}[Q_{\text{Gittins-1}}] \ge \mathbf{E}[Q_{\text{M-Gittins-1}}] \ge \mathbf{E}[Q_{\text{M-SERPT-1}}]. \]
It thus suffices to show $\mathbf{E}[Q_{\text{Gittins-1}}] = O(\mathbf{E}[Q_{\text{M-SERPT-1}}])$. Because Gittins minimizes mean response time [3, 4, 12], we have
\[ \mathbf{E}[Q_{\text{Gittins-1}}] \le \mathbf{E}[T_{\text{Gittins-1}}] \le \mathbf{E}[T_{\text{M-SERPT-1}}] = \mathbf{E}[Q_{\text{M-SERPT-1}}] + \mathbf{E}[R_{\text{M-SERPT-1}}], \]
so the result follows from the $\mathbf{E}[R_{\text{M-SERPT-1}}] = O(\mathbf{E}[Q_{\text{M-SERPT-1}}])$ precondition. □

Our final heavy-traffic result shows that for certain job size distributions, under M-Gittins, mean waiting time dominates mean inflated residence time. The conditions are the same as those shown for M-SERPT over the course of Section 7.3, except QDHR ∪ QIMRL is replaced by QDHR.

Theorem 7.12. If $X \in \mathrm{OR}(-\infty, -2) \cup (\mathrm{MDA}(\Lambda) \cap \mathrm{QDHR}) \cup \mathrm{Bounded}$, then in the $\rho \to 1$ limit,
\[ \mathbf{E}[S_{\text{M-Gittins-1}}] = o(\mathbf{E}[Q_{\text{M-Gittins-1}}]). \]
More specifically, $\mathbf{E}[S_{\text{M-Gittins-1}}]$ obeys the same scaling bounds as shown for $\mathbf{E}[S_{\text{M-SERPT-1}}]$ in Theorems 7.5, 7.9 and 7.10.

Proof. The proof is very similar to the proofs of analogous results for M-SERPT (Theorems 7.5, 7.9 and 7.10), so we just describe the differences.
• If $X \in \mathrm{OR}(-\infty, -2)$, we follow the same proof as Theorem 7.9 and the lemmas it requires, except we use Theorem 6.4 to bound $y^{\text{M-Gittins}}_x$ and $z^{\text{M-Gittins}}_x$.
• If $X \in \mathrm{MDA}(\Lambda) \cap \mathrm{QDHR}$, we follow the same proof as Theorem 7.10 and the lemmas it requires, except we use Theorem 6.5 to bound $y^{\text{M-Gittins}}_x$ and $z^{\text{M-Gittins}}_x$.
• If $X \in \mathrm{Bounded}$, we follow the same proof as Theorem 7.5, except we use a result of Aalto et al. [4, Proposition 9] to justify the existence of the critical age $a^*$. □

Footnote: With some extra effort, one can show it also holds for $X \in \mathrm{OR}(-2, -1)$.

We study optimal scheduling in the M/G/k to minimize mean response time. This problem is solved by the Gittins policy in the single-server $k = 1$ case. We introduce M-Gittins (Definition 2.4) and show that it minimizes mean response time in the heavy-traffic M/G/k for a large class of finite-variance job size distributions (Theorem 3.1). We also show that the simple and practical M-SERPT policy is a 2-approximation for mean response time in the heavy-traffic M/G/k under similar conditions (Theorem 3.2). As a byproduct of our M/G/k study, we obtain results characterizing the heavy-traffic scaling of M/G/1 mean response time under Gittins, M-Gittins, and M-SERPT (Theorem 3.3).

A natural question to ask is whether the conditions under which we prove M-Gittins's optimality can be relaxed, particularly the QDHR and Bounded assumptions. The difficulty lies in the fact that for some job size distributions, the bound in Theorem 5.1 is not strong enough because inflated residence time is infinite. It is possible that the techniques used by Köllerström [21, 22] to analyze the heavy-traffic M/G/k under FCFS could be helpful, seeing as FCFS has infinite inflated residence time.

Another major open question is analyzing the performance of M-Gittins outside of the heavy-traffic limit. In the single-server case, one can generalize the techniques of Scully et al. [33] to show that M-Gittins is a 3-approximation for M/G/1 mean response time at all loads. However, the multiserver case remains open.

ACKNOWLEDGMENTS

This work was supported by NSF grants CMMI-1938909, XPS-1629444, and CSR-1763701; and a Google 2020 Faculty Research Award.

REFERENCES

[1] Samuli Aalto and Urtzi Ayesta. 2006. Mean delay analysis of multi level processor sharing disciplines. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings. IEEE, 1–11.
[2] S Aalto and U Ayesta. 2006. On the nonoptimality of the foreground-background discipline for IMRL service times. Journal of Applied Probability 43, 2 (2006), 523–534.
[3] Samuli Aalto, Urtzi Ayesta, and Rhonda Righter. 2009. On the Gittins index in the M/G/1 queue. Queueing Systems 63, 1 (2009), 437–458.
[4] Samuli Aalto, Urtzi Ayesta, and Rhonda Righter. 2011. Properties of the Gittins index with application to optimal scheduling. Probability in the Engineering and Informational Sciences 25, 03 (2011), 269–288.
[5] Martin F. Arlitt and Carey L. Williamson. 1996. Web server workload characterization: The search for invariants. ACM SIGMETRICS Performance Evaluation Review 24, 1 (1996), 126–137.
[6] Nikhil Bansal, Bart Kamphorst, and Bert Zwart. 2018. Achievable performance of blind policies in heavy traffic. Mathematics of Operations Research 43, 3 (2018), 949–964.
[7] Luca Becchetti and Stefano Leonardi. 2004. Nonclairvoyant scheduling to minimize the total flow time on single and parallel machines. Journal of the ACM (JACM) 51, 4 (2004), 517–539.
[8] N. Bingham, C. Goldie, and J. Teugels. 1987. Regular Variation. Cambridge University Press.
[9] Yan Chen and Jing Dong. 2020. Scheduling with service-time information: The power of two priority classes. (2020). Preprint.
[10] M. E. Crovella and A. Bestavros. 1997. Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking 5, 6 (1997), 835–846.
[11] Hanhua Feng and Vishal Misra. 2003. Mixed scheduling disciplines for network flows. In ACM SIGMETRICS Performance Evaluation Review, Vol. 31. ACM, 36–39.
[12] John C. Gittins, Kevin D. Glazebrook, and Richard Weber. 2011. Multi-armed Bandit Allocation Indices. John Wiley & Sons.
[13] Kevin D Glazebrook. 2003. An analysis of Klimov's problem with parallel servers. Mathematical Methods of Operations Research 58, 1 (2003), 1–28.
[14] Kevin D Glazebrook and José Niño-Mora. 2001. Parallel scheduling of multiclass M/M/m queues: Approximate and heavy-traffic optimization of achievable performance. Operations Research 49, 4 (2001), 609–623.
[15] Isaac Grosof, Ziv Scully, and Mor Harchol-Balter. 2018. SRPT for multiserver systems. Performance Evaluation v a.2018.10.001
[16] Mor Harchol-Balter. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action (1st ed.). Cambridge University Press, New York, NY, USA.
[17] Mor Harchol-Balter and Allen B. Downey. 1997. Exploiting Process Lifetime Distributions for Dynamic Load Balancing. ACM Trans. Comput. Syst. 15, 3 (Aug. 1997), 253–285. https://doi.org/10.1145/263326.263344
[18] Bala Kalyanasundaram and Kirk R Pruhs. 1997. Minimizing flow time nonclairvoyantly. In Proceedings 38th Annual Symposium on Foundations of Computer Science. IEEE, 345–352.
[19] Bart Kamphorst and Bert Zwart. 2020. Heavy-Traffic Analysis of Sojourn Time Under the Foreground–Background Scheduling Policy. Stochastic Systems 10, 1 (2020), 1–28. https://doi.org/10.1287/stsy.2019.0036
[20] Leonard Kleinrock. 1976. Queueing Systems, Volume 2: Computer Applications. Vol. 66. Wiley New York.
[21] Julian Köllerström. 1974. Heavy Traffic Theory for Queues with Several Servers. I. Journal of Applied Probability
ACM SIGMETRICS Performance Evaluation Review, Vol. 38. ACM, 12–14.
[24] Natalia Osipova, Urtzi Ayesta, and Konstantin Avrachenkov. 2009. Optimal policy for multi-class scheduling in a single server queue. In Teletraffic Congress, 2009. ITC 21 2009. 21st International. IEEE, 1–8.
[25] Kihong Park and Walter Willinger. 2000. Self-Similar Network Traffic: An Overview. Self-Similar Network Traffic and Performance Evaluation (2000), 1–38.
[26] Sidney I Resnick. 2013. Extreme values, regular variation and point processes. Springer.
[27] Rhonda Righter and J George Shanthikumar. 1989. Scheduling multiclass single server queueing systems to stochastically maximize the number of successful departures. Probability in the Engineering and Informational Sciences 3, 3 (1989), 323–333.
[28] Rhonda Righter, J George Shanthikumar, and Genji Yamazaki. 1990. On extremal service disciplines in single-stage queueing systems. Journal of Applied Probability 27, 2 (1990), 409–416.
[29] Linus Schrage. 1968. A proof of the optimality of the shortest remaining processing time discipline. Operations Research 16, 3 (1968), 687–690.
[30] Linus E Schrage. 1967. The queue M/G/1 with feedback to lower priority queues. Management Science 13, 7 (1967), 466–474.
[31] Linus E Schrage and Louis W Miller. 1966. The queue M/G/1 with the shortest remaining processing time discipline. Operations Research 14, 4 (1966), 670–684.
[32] Ziv Scully, Mor Harchol-Balter, and Alan Scheller-Wolf. 2018. SOAP: One Clean Analysis of All Age-Based Scheduling Policies. Proc. ACM Meas. Anal. Comput. Syst. 2, 1, Article 16 (April 2018), 30 pages. https://doi.org/10.1145/3179419
[33] Ziv Scully, Mor Harchol-Balter, and Alan Scheller-Wolf. 2020. Simple Near-Optimal Scheduling for the M/G/1. Proc. ACM Meas. Anal. Comput. Syst. 4, 1, Article 11 (March 2020), 29 pages. https://doi.org/10.1145/3379477
[34] Moshe Shaked and J George Shanthikumar. 2007. Stochastic orders. Springer Science & Business Media.
[35] Adam Wierman, Mor Harchol-Balter, and Takayuki Osogami. 2005. Nearly insensitive bounds on SMART scheduling. In ACM SIGMETRICS Performance Evaluation Review, Vol. 33. ACM, 205–216.

A DIFFICULTY OF M/G/k ANALYSIS FOR NONMONOTONIC RANK FUNCTIONS

In this appendix we explain why Theorem 5.1 does not readily generalize to SOAP policies with nonmonotonic rank functions.

Recall that the proof of Theorem 5.1 considers a tagged job $J$ of size $x$ and considers several categories of work completed while $J$ is in the system. Our focus here is on relevant work, which is work on jobs that are prioritized ahead of $J$. Let $s^{\pi\text{-}k}_x$ be the maximum age at which a new job, namely one that arrives after $J$, can contribute relevant work under $\pi$-$k$. When $\pi$ is monotonic, $s^{\pi\text{-}k}_x$ does not depend on the number of servers $k$. Specifically, we have $s^{\pi\text{-}k}_x = y^\pi_x$. The problem for nonmonotonic SOAP policies $\pi$ is that, as we show below, we can have $s^{\pi\text{-}k}_x > s^{\pi\text{-}1}_x$ when $k \ge 2$.

The discussion below uses $y^\pi_x$ and $z^\pi_x$ generalized to all SOAP policies $\pi$.
• If $\pi$ is monotonic, then $y^\pi_x$ and $z^\pi_x$ are given by Definition 4.1.
• If $\pi$ is nonmonotonic, we can define $y^\pi_x$ and $z^\pi_x$ in terms of a monotonic SOAP policy related to $\pi$ [33]. Specifically, letting M-$\pi$ be the monotonic SOAP policy with rank function
\[ r_{\text{M-}\pi}(a) = \max_{b \in [0, a]} r_\pi(b), \]
we define $y^\pi_x = y^{\text{M-}\pi}_x$ and $z^\pi_x = z^{\text{M-}\pi}_x$.

Fig. A.1. Age Cutoffs for Nonmonotonic Rank Functions. (Plot of $r_\pi(a)$ and $r_{\text{M-}\pi}(a)$ against age $a$, with marked ages $y^\pi_x = s^{\pi\text{-}1}_x$, $b$, $c = s^{\pi\text{-}2}_x$, $x$, and $z^\pi_x$.)

Consider the example SOAP policy $\pi$ and tagged job size $x$ shown in Fig. A.1. In the single-server $k = 1$ case, we have $s^{\pi\text{-}1}_x = y^\pi_x$. To see why, consider the moment a new job $J'$ reaches age $y^\pi_x$ while the tagged job $J$ is still in the system. For this to occur, it must be that $J$ is also at age $y^\pi_x$, because otherwise $J$ would have priority over $J'$. With both $J$ and $J'$ at the same rank, the FCFS tiebreaker prioritizes $J$. Thereafter, $J$ never has rank worse than $r_\pi(y^\pi_x)$, so $J'$ remains stuck at age $y^\pi_x$ and is never prioritized over $J$.

We now reconsider the same example from Fig. A.1 but with $k \ge 2$ servers. Now $J'$ can receive service even while $J$ has better rank because $J$ and $J'$ can occupy different servers simultaneously. This means $J'$ no longer gets stuck at age $y^\pi_x$. In particular, if $J$ reaches age $c$ and $J'$ passes age $b$, then $J'$ contributes relevant work between ages $b$ and $c$. Therefore, $s^{\pi\text{-}k}_x = c > s^{\pi\text{-}1}_x$ for $k \ge 2$.

In the proof of Theorem 5.1, we assume that a new job $J'$ will contribute relevant work until it completes or reaches age $s^{\pi\text{-}k}_x$. This is a worst-case estimate, because the tagged job $J$ might complete before $J'$ completes or reaches age $s^{\pi\text{-}k}_x$. When $\pi$ is monotonic, we have $s^{\pi\text{-}k}_x = s^{\pi\text{-}1}_x$, so this overestimate is tight enough to compare the mean response times under $\pi$-$k$ and $\pi$-1. However, when $\pi$ is nonmonotonic, it may be that $s^{\pi\text{-}k}_x > s^{\pi\text{-}1}_x$, as explained above, so we do not obtain a tight comparison between the $\pi$-$k$ and $\pi$-1 systems. This suggests that generalizing Theorem 5.1 to nonmonotonic SOAP policies requires not relying as heavily on worst-case quantities like $s^{\pi\text{-}k}_x$.

B NEW FORMULAS FOR MEAN WAITING AND RESIDENCE TIMES

In this appendix we prove the following new formulas for mean waiting, residence, and inflated residence times.

Theorem B.1. Under any monotonic SOAP policy $\pi$,
\[ \mathbf{E}[Q_{\pi\text{-}1}] = \int_0^\infty \left( \left( \frac{\overline{F}(y^\pi_x)}{\overline{\rho}(y^\pi_x)} + \frac{\overline{F}(z^\pi_x)}{\overline{\rho}(z^\pi_x)} \right) \frac{\lambda \tau(z^\pi_x) \overline{F}(x)}{\overline{\rho}(x)^2} + \frac{\lambda x \overline{F}(y^\pi_x) \overline{F}(x)}{\overline{\rho}(y^\pi_x)^2} \right) dx. \]

Theorem B.2. Under any monotonic SOAP policy $\pi$,
\[ \mathbf{E}[R_{\pi\text{-}1}] = \int_0^\infty \left( \frac{\lambda z^\pi_x \overline{F}(x) \overline{F}(z^\pi_x)}{\overline{\rho}(y^\pi_x)\, \overline{\rho}(z^\pi_x)} + \frac{\overline{F}(x)}{\overline{\rho}(y^\pi_x)} \right) dx. \]

Theorem B.3. Under any monotonic SOAP policy $\pi$,
\[ \mathbf{E}[S_{\pi\text{-}1}] = \int_0^\infty \left( \frac{\lambda z^\pi_x \overline{F}(x) \overline{F}(z^\pi_x)}{\overline{\rho}(y^\pi_x)\, \overline{\rho}(z^\pi_x)} + \frac{\overline{F}(y^\pi_x)}{\overline{\rho}(y^\pi_x)} \right) dx. \]

Proving these results requires new technical machinery for, roughly speaking, performing integration by parts on expressions involving $y^\pi_x$ and $z^\pi_x$, such as those in (4.4). Appendix B.1 introduces the general technical machinery, which Appendix B.2 then applies to prove the above results. Throughout this appendix, $\partial$ denotes the derivative operator, and $[t_1, \ldots, t_n \mapsto \mathrm{RHS}]$ denotes the function that maps variables $t_1, \ldots, t_n$ to expression RHS.

B.1 Integration by Parts with Hills and Valleys

Definition B.4. A hill-valley partition of $\mathbb{R}_+$ is a sequence
\[ 0 = u_0 \le v_0 < u_1 < v_1 < u_2 < v_2 < \cdots. \]
Intervals of the form $(u_i, v_i]$ are called valleys, and intervals of the form $(v_i, u_{i+1}]$ are called hills.

Definition B.5. Functions $y, z : \mathbb{R}_+ \to \mathbb{R}_+$ are a hill-valley pair for a given hill-valley partition if for each valley $(u_i, v_i]$,
\[ y(x) = u_i, \quad z(x) = v_i, \quad \text{for all } x \in (u_i, v_i], \]
and for each hill $(v_i, u_{i+1}]$,
\[ y(x) = x, \quad z(x) = x, \quad \text{for all } x \in (v_i, u_{i+1}]. \]
For compactness, we write $y_x = y(x)$ and $z_x = z(x)$.

It is simple to check that for any monotonic SOAP policy $\pi$, the pair $y^\pi, z^\pi$ (Definition 4.1) is a hill-valley pair.

Definition B.6.

For functions $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$, we define the difference ratio operator $\Delta$ as follows:
\[ \Delta\Phi(\langle u, v \rangle) = \begin{cases} \dfrac{\Phi(v) - \Phi(u)}{v - u} & \text{if } u \ne v \\[1ex] \partial\Phi(u) & \text{if } u = v, \end{cases} \]
where $\partial$ is the derivative operator. Similarly, for functions with multiple arguments, $\Delta_i$ is a version of $\Delta$ that works on the $i$th argument:
\[ \Delta_i \Phi(\ldots, \langle u, v \rangle, \ldots) = \Delta[t \mapsto \Phi(\ldots, t, \ldots)](\langle u, v \rangle). \]
Like $\partial$, it is easily seen that $\Delta$ is a linear operator. When applied to polynomials, $\Delta$ elegantly generalizes $\partial$. For example,
\[ \Delta\!\left[t \mapsto \frac{1}{t}\right](\langle u, v \rangle) = -\frac{1}{uv}. \tag{B.1} \]
The $\Delta$ operator also obeys various chain-rule-like identities. We highlight the two we use below.

Lemma B.7. Let $\Phi, \Psi : \mathbb{R} \to \mathbb{R}$ be differentiable. For all $u, v \in \mathbb{R}$,
\[ \Delta[t \mapsto \Phi(\Psi(t))](\langle u, v \rangle) = \Delta\Phi(\langle \Psi(u), \Psi(v) \rangle)\, \Delta\Psi(\langle u, v \rangle). \]

Proof. If $u = v$, this is the chain rule. If $u \ne v$ but $\Psi(u) = \Psi(v)$, then both sides are 0. If $\Psi(u) \ne \Psi(v)$, then the result follows by a simple computation. □

(Footnotes to Definition B.4: We borrow the terms "hill" and "valley" from Scully et al. [33], who use a similar concept to analyze SOAP policies, but this definition is abstracted away from the details of SOAP. As a corner case, we consider the first hill or valley to also include 0.)
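The difference ratio operator can be sanity-checked numerically. The sketch below is illustrative only: the helper names (`diff_ratio`, `chain_rule_sides`) and the test functions $\Phi(s) = 1/s$, $\Psi(t) = t^2$ are assumptions chosen for this example, not part of the paper. It evaluates identity (B.1) and both sides of Lemma B.7, including the $u = v$ and $\Psi(u) = \Psi(v)$ corner cases.

```python
def diff_ratio(phi, dphi, u, v):
    """Difference ratio of phi at <u, v>: secant slope if u != v, derivative if u == v."""
    if u == v:
        return dphi(u)
    return (phi(v) - phi(u)) / (v - u)

# Phi(s) = 1/s, the function appearing in identity (B.1).
inv = lambda s: 1.0 / s
dinv = lambda s: -1.0 / s ** 2

# Psi(t) = t^2, a hypothetical inner function used to exercise Lemma B.7.
psi = lambda t: t * t
dpsi = lambda t: 2.0 * t

# Composition Phi(Psi(t)) and its derivative via the ordinary chain rule.
comp = lambda t: inv(psi(t))
dcomp = lambda t: dinv(psi(t)) * dpsi(t)

def chain_rule_sides(u, v):
    """Return (left side, right side) of Lemma B.7 for the functions above."""
    lhs = diff_ratio(comp, dcomp, u, v)
    rhs = diff_ratio(inv, dinv, psi(u), psi(v)) * diff_ratio(psi, dpsi, u, v)
    return lhs, rhs

if __name__ == "__main__":
    print(diff_ratio(inv, dinv, 2.0, 5.0))  # compare with -1/(2*5)
    print(chain_rule_sides(2.0, 5.0))       # generic case
    print(chain_rule_sides(3.0, 3.0))       # u == v: ordinary chain rule
    print(chain_rule_sides(-2.0, 2.0))      # u != v but Psi(u) == Psi(v)
```

Passing the derivative explicitly keeps the $u = v$ branch exact instead of relying on a finite-difference approximation.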

Lemma B.8. Let $\Phi : \mathbb{R}^2 \to \mathbb{R}$ be differentiable. For all $u, v \in \mathbb{R}$,
\[ \Delta[t \mapsto \Phi(t, t)](\langle u, v \rangle) = \Delta_2 \Phi(u, \langle u, v \rangle) + \Delta_1 \Phi(\langle u, v \rangle, v). \]

Proof. If $u = v$, this is the multivariable chain rule. If $u \ne v$,
\[ \begin{aligned}
(v - u)\, \Delta[t \mapsto \Phi(t, t)](\langle u, v \rangle) &= \Phi(v, v) - \Phi(u, u) \\
&= \Phi(v, v) - \Phi(u, v) + \Phi(u, v) - \Phi(u, u) \\
&= (v - u)\big(\Delta_1 \Phi(\langle u, v \rangle, v) + \Delta_2 \Phi(u, \langle u, v \rangle)\big).
\end{aligned} \]
□

The most important result of this appendix is the following lemma, which formulates a version of integration by parts that works for hill-valley pairs despite their discontinuity.

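Lemma B.9. Let $y, z$ be a hill-valley pair, $\Phi : \mathbb{R}_+^3 \to \mathbb{R}$ be differentiable, $P : \mathbb{R}_+ \to \mathbb{R}$ be differentiable, and $\overline{P}(x) = c - P(x)$ for some $c \in \mathbb{R}$. If
\[ \overline{P}(0)\, \Phi(0, 0, z_0) = 0, \qquad \lim_{x \to \infty} \overline{P}(x)\, \Phi(y_x, x, z_x) = 0, \]
then
\[ \int_0^\infty \Phi(y_x, x, z_x)\, \partial P(x)\, dx = \int_0^\infty \Big( \overline{P}(y_x)\, \Delta_3 \Phi(y_x, y_x, \langle y_x, z_x \rangle) + \overline{P}(x)\, \partial_2 \Phi(y_x, x, z_x) + \overline{P}(z_x)\, \Delta_1 \Phi(\langle y_x, z_x \rangle, z_x, z_x) \Big)\, dx. \]

Proof. For each valley $(u, v]$, we have $y_x = u$ and $z_x = v$, so integrating by parts and using $\partial P = -\partial \overline{P}$ gives
\[ \begin{aligned}
\int_u^v \Phi(y_x, x, z_x)\, \partial P(x)\, dx
&= \int_u^v \overline{P}(x)\, \partial_2 \Phi(u, x, v)\, dx + \overline{P}(u)\, \Phi(u, u, v) - \overline{P}(v)\, \Phi(u, v, v) \\
&= \int_u^v \overline{P}(x)\, \partial_2 \Phi(u, x, v)\, dx + \overline{P}(u)\, \Phi(u, u, u) - \overline{P}(v)\, \Phi(v, v, v) \\
&\quad + (v - u)\, \overline{P}(u)\, \Delta_3 \Phi(u, u, \langle u, v \rangle) + (v - u)\, \overline{P}(v)\, \Delta_1 \Phi(\langle u, v \rangle, v, v) \\
&= \int_u^v \Big( \overline{P}(u)\, \Delta_3 \Phi(u, u, \langle u, v \rangle) + \overline{P}(x)\, \partial_2 \Phi(u, x, v) + \overline{P}(v)\, \Delta_1 \Phi(\langle u, v \rangle, v, v) \Big)\, dx \\
&\quad + \overline{P}(u)\, \Phi(u, u, u) - \overline{P}(v)\, \Phi(v, v, v) \\
&= \int_u^v \Big( \overline{P}(y_x)\, \Delta_3 \Phi(y_x, y_x, \langle y_x, z_x \rangle) + \overline{P}(x)\, \partial_2 \Phi(y_x, x, z_x) + \overline{P}(z_x)\, \Delta_1 \Phi(\langle y_x, z_x \rangle, z_x, z_x) \Big)\, dx \\
&\quad + \overline{P}(u)\, \Phi(u, u, u) - \overline{P}(v)\, \Phi(v, v, v).
\end{aligned} \]
For each hill $(v, u]$, we have $y_x = z_x = x$, so
\[ \begin{aligned}
\int_v^u \Phi(y_x, x, z_x)\, \partial P(x)\, dx
&= \int_v^u \overline{P}(x)\, \partial[t \mapsto \Phi(t, t, t)](x)\, dx + \overline{P}(v)\, \Phi(v, v, v) - \overline{P}(u)\, \Phi(u, u, u) \\
&= \int_v^u \Big( \overline{P}(x)\, \partial_3 \Phi(x, x, x) + \overline{P}(x)\, \partial_2 \Phi(x, x, x) + \overline{P}(x)\, \partial_1 \Phi(x, x, x) \Big)\, dx \\
&\quad + \overline{P}(v)\, \Phi(v, v, v) - \overline{P}(u)\, \Phi(u, u, u) \\
&= \int_v^u \Big( \overline{P}(y_x)\, \Delta_3 \Phi(y_x, y_x, \langle y_x, z_x \rangle) + \overline{P}(x)\, \partial_2 \Phi(y_x, x, z_x) + \overline{P}(z_x)\, \Delta_1 \Phi(\langle y_x, z_x \rangle, z_x, z_x) \Big)\, dx \\
&\quad + \overline{P}(v)\, \Phi(v, v, v) - \overline{P}(u)\, \Phi(u, u, u).
\end{aligned} \]
Summing the hill and valley expressions over all hills and valleys, most of the non-integral terms cancel out, and the two that remain are 0 by assumption:
\[ \begin{aligned}
\int_0^\infty \Phi(y_x, x, z_x)\, \partial P(x)\, dx
&= \int_0^\infty \Big( \overline{P}(y_x)\, \Delta_3 \Phi(y_x, y_x, \langle y_x, z_x \rangle) + \overline{P}(x)\, \partial_2 \Phi(y_x, x, z_x) + \overline{P}(z_x)\, \Delta_1 \Phi(\langle y_x, z_x \rangle, z_x, z_x) \Big)\, dx \\
&\quad + \overline{P}(0)\, \Phi(0, 0, z_0) - \lim_{x \to \infty} \overline{P}(x)\, \Phi(y_x, x, z_x).
\end{aligned} \]
□

Our final two lemmas show that integrals using $\Delta$ can sometimes be turned into integrals using $\partial$.

Lemma B.10.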

Let $y, z$ be a hill-valley pair and $\Phi : \mathbb{R}_+^3 \to \mathbb{R}_+$ be differentiable with respect to its second argument. Then
\[ \int_0^\infty \Delta_2 \Phi(y_x, \langle y_x, z_x \rangle, z_x)\, dx = \int_0^\infty \partial_2 \Phi(y_x, x, z_x)\, dx. \]

Proof. For each valley $(u, v]$,
\[ \begin{aligned}
\int_u^v \Delta_2 \Phi(y_x, \langle y_x, z_x \rangle, z_x)\, dx &= \int_u^v \Delta_2 \Phi(u, \langle u, v \rangle, v)\, dx \\
&= (v - u)\, \Delta_2 \Phi(u, \langle u, v \rangle, v) \\
&= \Phi(u, v, v) - \Phi(u, u, v) \\
&= \int_u^v \partial_2 \Phi(u, x, v)\, dx \\
&= \int_u^v \partial_2 \Phi(y_x, x, z_x)\, dx.
\end{aligned} \]
For each hill $(v, u]$,
\[ \begin{aligned}
\int_v^u \Delta_2 \Phi(y_x, \langle y_x, z_x \rangle, z_x)\, dx &= \int_v^u \Delta_2 \Phi(x, \langle x, x \rangle, x)\, dx \\
&= \int_v^u \partial_2 \Phi(x, x, x)\, dx \\
&= \int_v^u \partial_2 \Phi(y_x, x, z_x)\, dx.
\end{aligned} \]
Summing the hill and valley expressions over all hills and valleys yields the desired result. □
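A small numerical sketch can make hill-valley pairs and Lemma B.10 concrete. Everything here is an illustrative assumption rather than the paper's construction: the finite partition (valleys $(0, 1]$ and $(2, 3]$, hills elsewhere), the test function $\Phi(a, x, b) = (1 + a)x^2 + b \sin x$, and the helper names. Because the proof of Lemma B.10 works interval by interval, the identity should already hold on $[0, 4]$, which ends at a hill boundary.

```python
import math

def make_hill_valley_pair(us, vs):
    """Build (y, z) from valleys (us[i], vs[i]]; every other point lies on a hill."""
    def locate(x):
        for u, v in zip(us, vs):
            if u < x <= v:
                return u, v  # inside a valley: y = left end, z = right end
        return x, x          # on a hill: y = z = x
    return (lambda x: locate(x)[0]), (lambda x: locate(x)[1])

# Hypothetical test function and its partial derivative in the second argument.
def phi(a, x, b):
    return (1.0 + a) * x * x + b * math.sin(x)

def dphi2(a, x, b):
    return 2.0 * (1.0 + a) * x + b * math.cos(x)

def delta2(a_lo, a_hi, y, z):
    """Difference ratio of phi(y, ., z) in its second argument at <a_lo, a_hi>."""
    if a_lo == a_hi:
        return dphi2(y, a_lo, z)
    return (phi(y, a_hi, z) - phi(y, a_lo, z)) / (a_hi - a_lo)

def lemma_b10_sides(y, z, upper, n):
    """Midpoint-rule approximations of both sides of Lemma B.10 on [0, upper]."""
    h = upper / n
    lhs = rhs = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        yx, zx = y(x), z(x)
        lhs += h * delta2(yx, zx, yx, zx)
        rhs += h * dphi2(yx, x, zx)
    return lhs, rhs

if __name__ == "__main__":
    y, z = make_hill_valley_pair(us=[0.0, 2.0], vs=[1.0, 3.0])
    print(y(0.5), z(0.5), y(1.5), z(1.5))
    print(lemma_b10_sides(y, z, upper=4.0, n=4000))
```

The grid is chosen so that midpoints never coincide with partition boundaries, which keeps the discontinuities of $y$ and $z$ from affecting the quadrature.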

Lemma B.11.

Let $y, z$ be a hill-valley pair and both $\Phi : \mathbb{R}_+^3 \to \mathbb{R}$ and $\Psi : \mathbb{R}_+ \to \mathbb{R}$ be differentiable. Then
\[ \int_0^\infty \Delta[t \mapsto \Phi(y_x, \Psi(t), z_x)](\langle y_x, z_x \rangle)\, dx = \int_0^\infty \Delta_2 \Phi(y_x, \langle \Psi(y_x), \Psi(z_x) \rangle, z_x)\, \partial\Psi(x)\, dx. \]

Proof.

We compute
\[ \begin{aligned}
\int_0^\infty \Delta[t \mapsto \Phi(y_x, \Psi(t), z_x)](\langle y_x, z_x \rangle)\, dx
&= \int_0^\infty \Delta_2 \Phi(y_x, \langle \Psi(y_x), \Psi(z_x) \rangle, z_x)\, \Delta\Psi(\langle y_x, z_x \rangle)\, dx && \text{[by Lem. B.7]} \\
&= \int_0^\infty \Delta_2 \big[u, t, v \mapsto \Delta_2 \Phi(u, \langle \Psi(u), \Psi(v) \rangle, v) \cdot \Psi(t)\big](y_x, \langle y_x, z_x \rangle, z_x)\, dx \\
&= \int_0^\infty \Delta_2 \Phi(y_x, \langle \Psi(y_x), \Psi(z_x) \rangle, z_x)\, \partial\Psi(x)\, dx. && \text{[by Lem. B.10]}
\end{aligned} \]
□

B.2 Proofs of New Formulas

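We now apply the theory developed in Appendix B.1 to prove Theorems B.1–B.3. Throughout the proofs, $y_x$ and $z_x$ refer to $y^\pi_x$ and $z^\pi_x$, respectively. Recall that $y, z$ form a hill-valley pair (Definition B.5) under any monotonic SOAP policy $\pi$.

Proof of Theorem B.1. We compute
\[ \begin{aligned}
\mathbf{E}[Q_{\pi\text{-}1}]
&= \int_0^\infty \frac{\tau(z_x)}{\overline{\rho}(y_x)\, \overline{\rho}(z_x)}\, dF(x) && \text{[by (4.4)]} \\
&= \int_0^\infty \left( \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)}\, \Delta\!\left[t \mapsto \frac{\tau(t)}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) + \frac{\overline{F}(z_x)\, \tau(z_x)}{\overline{\rho}(z_x)}\, \Delta\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) \right) dx && \text{[by Lem. B.9]} \\
&= \int_0^\infty \left( \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)^2}\, \Delta\tau(\langle y_x, z_x \rangle) + \frac{\overline{F}(y_x)\, \tau(z_x)}{\overline{\rho}(y_x)}\, \Delta\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) \right. \\
&\qquad\quad \left. {} + \frac{\overline{F}(z_x)\, \tau(z_x)}{\overline{\rho}(z_x)}\, \Delta\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) \right) dx && \text{[by Lem. B.8]} \\
&= \int_0^\infty \left( \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)^2}\, \partial\tau(x) + \tau(z_x) \left( \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)} + \frac{\overline{F}(z_x)}{\overline{\rho}(z_x)} \right) \partial\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](x) \right) dx, && \text{[by Lem. B.10]}
\end{aligned} \]
which equals the desired result by (4.5). □

Proof of Theorem B.2. We compute
\[ \begin{aligned}
\mathbf{E}[R_{\pi\text{-}1}]
&= \int_0^\infty \frac{x}{\overline{\rho}(y_x)}\, dF(x) && \text{[by (4.4)]} \\
&= \int_0^\infty \left( z_x \overline{F}(z_x)\, \Delta\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) + \frac{\overline{F}(x)}{\overline{\rho}(y_x)} \right) dx && \text{[by Lem. B.9]} \\
&= \int_0^\infty \left( -\frac{z_x \overline{F}(z_x)}{\overline{\rho}(y_x)\, \overline{\rho}(z_x)}\, \partial\overline{\rho}(x) + \frac{\overline{F}(x)}{\overline{\rho}(y_x)} \right) dx, && \text{[by (B.1), Lem. B.11]}
\end{aligned} \]
which equals the desired result by (4.5). □

Proof of Theorem B.3. Very similarly to the proof of Theorem B.2, we compute
\[ \begin{aligned}
\mathbf{E}[S_{\pi\text{-}1}]
&= \int_0^\infty \frac{z_x}{\overline{\rho}(y_x)}\, dF(x) && \text{[by (4.6)]} \\
&= \int_0^\infty \left( z_x \overline{F}(z_x)\, \Delta\!\left[t \mapsto \frac{1}{\overline{\rho}(t)}\right](\langle y_x, z_x \rangle) + \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)} \right) dx && \text{[by Lem. B.9]} \\
&= \int_0^\infty \left( -\frac{z_x \overline{F}(z_x)}{\overline{\rho}(y_x)\, \overline{\rho}(z_x)}\, \partial\overline{\rho}(x) + \frac{\overline{F}(y_x)}{\overline{\rho}(y_x)} \right) dx, && \text{[by (B.1), Lem. B.11]}
\end{aligned} \]
which equals the desired result by (4.5).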
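As a rough sanity check on Theorem B.2, one can compare its right-hand side with the starting expression $\int_0^\infty x / \overline{\rho}(y_x)\, dF(x)$ in a case where both are easy to evaluate numerically. The sketch below makes several assumptions that are not taken from the paper: job sizes are exponential with mean 1, the arrival rate is $\lambda = 0.8$, the policy has only hills (so $y_x = z_x = x$, as for a strictly increasing rank function), and $\overline{\rho}(x) = 1 - \lambda \mathbf{E}[\min(X, x)]$, which is the reading consistent with (7.6). It also checks (7.6) itself.

```python
import math

LAM = 0.8  # assumed arrival rate; with E[X] = 1 the load is rho = 0.8

def tail(x):
    """Tail of Exp(1): P{X > x}. Also equals the density for this distribution."""
    return math.exp(-x)

def rho_bar(x):
    """Assumed reading of rho-bar: 1 - lambda * E[min(X, x)] for X ~ Exp(1)."""
    return 1.0 - LAM * (1.0 - math.exp(-x))

def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

UPPER, N = 60.0, 6000  # truncation point and grid size; the tail beyond 60 is negligible

def residence_direct():
    """Integral of x / rho_bar(y_x) dF(x) with y_x = x (hills only)."""
    return simpson(lambda x: x * tail(x) / rho_bar(x), 0.0, UPPER, N)

def residence_theorem_b2():
    """Right-hand side of Theorem B.2 with y_x = z_x = x."""
    return simpson(
        lambda x: LAM * x * tail(x) ** 2 / rho_bar(x) ** 2 + tail(x) / rho_bar(x),
        0.0, UPPER, N,
    )

def log_integral():
    """Left side of (7.6): integral of tail(x) / rho_bar(x)."""
    return simpson(lambda x: tail(x) / rho_bar(x), 0.0, UPPER, N)

if __name__ == "__main__":
    print(residence_direct(), residence_theorem_b2())
    print(log_integral(), math.log(1.0 / (1.0 - LAM)) / LAM)
```

A hills-only policy exercises only the ordinary integration-by-parts part of Lemma B.9; checking a policy with valleys would additionally require supplying its $y_x$ and $z_x$.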