Threshold-based rerouting and replication for resolving job-server affinity relations

Abstract

We consider a system with several job types and two parallel server pools. Within the pools the servers are homogeneous, but across pools possibly not in the sense that the service speed of a job may depend on its type as well as the server pool. Immediately upon arrival, jobs are assigned to a server pool. This could be based on (partial) knowledge of their type, but such knowledge might not be available. Information about the job type can however be obtained while the job is in service; as the service progresses, the likelihood that the service speed of this job type is low increases, creating an incentive to execute the job on different, possibly faster, server(s). Two policies are considered: reroute the job to the other server pool, or replicate it there. We determine the effective load per server under both the rerouting and replication policy for completely unknown as well as partly known job types. We also examine the impact of these policies on the stability bound, and find that the uncertainty in job types may significantly degrade the performance. For (highly) unbalanced service speeds full replication achieves the largest stability bound while for (nearly) balanced service speeds no replication maximizes the stability bound. Finally, we discuss how the use of threshold-based policies can help improve the expected latency for completely or partly unknown job types.


Youri Raaijmakers a,∗, Sem Borst a, Onno Boxma a
a Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands

Keywords:

Parallel-processing systems, stability, rerouting, replication, compatibility constraints, server heterogeneity, straggler mitigation

1. Introduction

This paper considers parallel-processing systems with two heterogeneous server pools, in which a dispatcher assigns jobs to one of the server pools immediately upon arrival. However, a job is allowed to be rerouted to the other pool when its service has not yet been completed after a certain amount of processing time. We also consider an alternative option in which such a job is replicated at the other pool. The jobs are assumed to be of different types; one job type might, e.g., be fast on a server from pool 1 and slow on a server from pool 2, whereas this might be reversed for another job type. We examine the ability of threshold-based policies to deal with such heterogeneity in service speeds: if the job types are unknown, or only partly known, can we use a threshold for the amount of processing time after which the job should be rerouted (or replicated) in order to improve stability of the system and reduce latency?

∗ Corresponding author. Email address: [email protected] (Youri Raaijmakers)
Preprint submitted to International Teletraffic Congress ITC 32, May 28, 2020 (arXiv, cs.PF)

Replication schemes as described above were introduced to mitigate the adverse effect of so-called stragglers on the system performance, see for example [3, 7, 17]. Closely related to the present paper is [1] which studies the replication policy under the assumption of i.i.d. replicas and homogeneous servers. An approximation for the expected latency is derived for both exponential and shifted-exponential job size distributions. Moreover, the trade-off between the expected latency and the cost, defined as the sum of the processing times of each replica involved in the job execution, is analyzed via simulation. Replication schemes are shown to reduce both cost and expected latency in case of heavy-tailed job size distributions. Also closely related is [11] which focuses on the throughput-optimal replication policy under the assumption of identical replicas and a continuous speed distribution at each server. The Markov Decision Process formulation for the optimal replication policy is in general intractable, and therefore an upper bound is derived. In addition, a myopic MaxRate policy is introduced, which depends on the number of unfinished jobs after a certain action and the expected remaining processing time.

In all the above papers the speed variations experienced by the various replicas are essentially assumed to be purely random in nature. The multi-type set-up that we consider in the present paper allows for intrinsic differences in speed across servers depending on the specific characteristics of each individual job.
Such heterogeneity in service speeds captures systematic job-server affinity relations which may arise from data locality issues but may also reflect soft compatibility constraints that are increasingly prevalent in data center environments. We investigate the effectiveness of threshold-based replication and rerouting in the presence of such underlying job-server affinity relations.

The performance of the threshold-based replication policy is strongly linked to the stability of redundancy scheduling, where several replicas are created for each job immediately upon arrival, see for example [10, 13, 14, 15]. Specifically, it is demonstrated in [15] that in case of known job types and a probabilistic type-dependent job assignment strategy, no replication maximizes the stability bound for New-Better-than-Used (NBU) job size distributions. In contrast, in case of unknown job types, full replication achieves a larger stability bound than no replication for New-Worse-than-Used (NWU) job size distributions.

Likewise, the threshold-based rerouting policy is closely connected to the problem of learning the type of a job in an online manner as considered by Bimpikis and Markakis [4]. In their model there is a server pool that is compatible with jobs of all types and a server pool that is either compatible or incompatible with a job depending on its type. Here (in)compatible means that the service of a job can(not) be fulfilled by a server in this pool. The job types can be learned via observing the processing time. Indeed, as time elapses and the service has not been completed, the likelihood that the job is incompatible with the specific server it is on increases, and in [4] the job is therefore rerouted to the other server pool when the likelihood that the server is incompatible exceeds some value, also called threshold.
For related literature on this learning problem we refer to [4] and the references therein.

In this paper we consider similar threshold-based policies as in [4], but defined in terms of the received processing time, and extend the model of [4] to allow for completely general type-dependent service speed variations. Furthermore, we include replication as an additional option besides rerouting, resulting in concurrent execution of replicas at possibly different speeds as in [15]. Throughout, we use the term stability bound to refer to the maximum arrival rate of jobs for which the effective load per server is smaller than one. The effective load per server is defined as a measure of the amount of time required to serve all offered jobs.

The key contributions and insights can be summarized as follows.

1. We determine the effective load per server under both the rerouting and replication policy. We show that for unknown job types and (highly) unbalanced service speeds the largest stability bound is achieved by full replication. In contrast, for (nearly) balanced service speeds the stability bound is maximized by not replicating at all. Surprisingly, rerouting or replication does not significantly increase the stability bound in case of unknown job types, in part because of the variability in the job sizes.
2. We also determine the effective load per server in a scenario with known job sizes. We find that typically there still is a significant performance loss in terms of the stability bound compared to the case of known job types. This implies that the uncertainty in the job types plays a more pertinent role than the variability in the job sizes.
3. We extend our modeling framework to allow for partly known job types. We observe that decreasing the uncertainty in the job types increases the stability bound in a convex manner.
4. We discuss how the use of threshold-based policies can also help improve the expected latency for completely unknown or partly known job types.
While an exact latency analysis is beyond the scope of this paper, we provide approximations that show a reduction of the expected latency.

The remainder of the paper is organized as follows. In Section 2 we provide a detailed model description. Analytical expressions for the effective load per server under threshold-based rerouting or replication in the case of completely unknown and partly known job types are derived in Section 3. In Section 4 we characterize the effective load per server in a scenario where the thresholds depend on the job sizes when these are known in advance. Section 5 presents extensive numerical results which quantify the performance implications due to the uncertainty in job types. In Section 6 we examine how the use of threshold-based policies can also help reduce expected latency. Section 7 contains conclusions and some suggestions for further research.

2. Model description

Consider a system with $N$ servers and a dispatcher where jobs arrive at rate $\lambda$. The servers are divided into two server pools, where $n_i$ denotes the number of servers in pool $i$, with $n_1 + n_2 = N$. The dispatcher assigns jobs immediately upon arrival to one of the two server pools according to a static policy as will be further specified later. Each pool has a central queue that follows a non-idling service discipline, in the sense that the servers always serve jobs when the queue is non-empty. We allow the sizes $X_1, X_2$ of a generic job on the server pools to be governed by some joint distribution $F_X(x_1, x_2)$, where $X_i$, $i = 1, 2$, are each distributed as a generic random variable $X$, but not necessarily independent. This covers the extreme scenarios of perfect dependence (identical replicas) and no dependence at all (i.i.d. replicas), as previously considered in the literature, as special cases. The service speeds $R_1, R_2$ for a given job in the two pools may differ, but within the pools servers have the same speed. For a particular job on a server in pool $i$, $i = 1, 2$, with size $x_i$, $x_i / R_i$ represents the execution time. We allow the service speeds $R_1, R_2$ of a generic job to be governed by some joint distribution $F_R(r_1, r_2)$, reflecting possible server heterogeneity and job-server affinity relations, thus covering a broad range of common workload models as special cases. Examples are the S&X model [8], a scenario with heterogeneous server speeds and the 'output-queued' flexible server model [16].

For convenience, we consider the case where the joint distribution $F_R(r_1, r_2)$ is discrete, and has mass in a finite number of, say, $J$ points $(r_{1j}, r_{2j})$ with corresponding probabilities $p_j$, $j = 1, \dots, J$. This system may equivalently be thought of as having $J$ job types, where $r_{ij}$ is the service speed of type-$j$ jobs at servers in pool $i$.

Figure 1: Representation of the model that illustrates the difference between the rerouting and the replication policy.

As an important feature of our model we consider the following two resource allocation policies: rerouting and replication, see also Figure 1. In both policies, a job exits the system once its service requirement has been fully fulfilled. However, if the job has received a given amount of processing time (we focus on the case of a fixed service time threshold $\tau = (\tau_1, \tau_2)$), then the dispatcher, under the rerouting policy, reroutes the job to the other server pool. Under the replication policy, the job is replicated in the other server pool while also staying in service at its original pool. In both policies the service does not carry over, i.e., the entire service requirement should be fulfilled by a server in the 'new' pool.
Thus, the main difference between the policies is that in the case of replication, when the job is in service for an amount of processing time larger than the threshold, there is still a replica in service at the original pool.

We make the following assumption: replicas of a job after replication have preemptive-resume priority over jobs that are not yet replicated, if they can get simultaneous service at both server pools. If both replicas cannot immediately get simultaneous service, then they wait until one server in each pool is available. This ensures that after replication the replicas will always receive simultaneous service. This assumption additionally helps eschew stability issues that can arise due to local priorities in scenarios with simultaneous resource possession, see for example [12, Section 8.4].

For equally large server pools, i.e., $n_1 = n_2$, the preemptive-resume priority implies that jobs never have to wait after replication. Indeed, the numbers of servers serving replicated jobs are equal at all times and therefore it can never occur that a job is replicated, while all the servers in the other server pool are serving jobs that are already replicated. For unequal sizes of the server pools, i.e., $n_1 \neq n_2$, it might occur that after replication both replicas have to wait before getting service.

Replicas are abandoned as soon as one finishes service. The replication policy with threshold $\tau = 0$ is referred to as the full redundancy policy. For threshold $\tau = \infty$ the rerouting and replication policy are equivalent, and will be called the zero redundancy policy.

In this paper we restrict ourselves to the scenario with two server pools. Extension to an arbitrary number of server pools is left as a topic for further research. We focus on increasing the achievable stability bound by optimizing threshold values in terms of the received processing times.
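The difference between the two policies can be made concrete with a minimal Python sketch that computes how much processing time a single job consumes in each pool. The function name `pool_usage` and its argument layout are illustrative assumptions and not notation from the paper. Queueing delays are ignored, and replicas are assumed to be served simultaneously from the moment of replication, in line with the priority assumption above.

```python
import math


def pool_usage(policy, sizes, speeds, tau, start):
    """Processing time consumed in each pool by a single job (hypothetical helper).

    policy: "rerouting" or "replication"
    sizes:  (x1, x2), size of the job at pool 1 and pool 2
    speeds: (r1, r2), service speed of this job at pool 1 and pool 2
    tau:    (tau1, tau2), thresholds; math.inf means never reroute or replicate
    start:  0 or 1, index of the pool the job is initially assigned to
    """
    if policy not in ("rerouting", "replication"):
        raise ValueError("policy must be 'rerouting' or 'replication'")
    i, l = start, 1 - start
    t_i = sizes[i] / speeds[i]  # execution time if run to completion at pool i
    t_l = sizes[l] / speeds[l]  # execution time if run to completion at pool l
    usage = [0.0, 0.0]
    if t_i <= tau[i]:
        # The job completes before the threshold is reached.
        usage[i] = t_i
    elif policy == "rerouting":
        # Service does not carry over: the job restarts from scratch at pool l.
        usage[i] = tau[i]
        usage[l] = t_l
    else:
        # Both replicas run until the first one finishes; the other is abandoned.
        extra = min(t_i - tau[i], t_l)
        usage[i] = tau[i] + extra
        usage[l] = extra
    return usage


# A job that is fast at pool 1 and slow at pool 2, started at pool 1 with threshold 1.
rerouted = pool_usage("rerouting", (2.0, 2.0), (1.0, 0.5), (1.0, 1.0), start=0)
replicated = pool_usage("replication", (2.0, 2.0), (1.0, 0.5), (1.0, 1.0), start=0)
```

Setting both thresholds to `0.0` under replication reproduces the full redundancy policy, and `math.inf` reproduces the zero redundancy policy.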

3. Stability for unknown job sizes

In Section 3.1 we derive analytical expressions for the effective load per server for both the rerouting and replication policy under a given assignment and threshold values. In Section 3.2 we specify these allocation fractions for three cases: completely unknown job types, partly unknown job types and completely known job types. Throughout we assume that the job sizes are unknown.

We attach superscripts Rer and Rep to metrics that correspond to the rerouting and replication policy, respectively.

Let $A$ denote the $2 \times J$ stochastic assignment matrix with elements $\alpha_{ij}$. Under the $S(A, \tau)$ policy we assign a fraction $\alpha_{ij}$ of type-$j$ jobs to server pool $i$ and reroute (or replicate) at this pool as soon as the processing time equals $\tau_i$ for $i = 1, 2$. We define $l := 3 - i$ as the other server pool than $i$, so that $\alpha_{ij} + \alpha_{lj} = 1$ for $j = 1, \dots, J$.

Proposition 1.

The effective load of server pool $i$, for $i = 1, 2$, in the system with rerouting under the $S(A, \tau)$ policy is
\[
\rho^{\mathrm{Rer}}_{i, S(A, \tau)} = \frac{\lambda \, E\left[B^{\mathrm{Rer}}_i(A, \tau)\right]}{n_i}, \tag{1}
\]
where the expected service time requirement of an arbitrary job, assigned either to pool $i$ or pool $l$, at pool $i$ is
\[
E\left[B^{\mathrm{Rer}}_i(A, \tau)\right] = \sum_{j=1}^{J} p_j \alpha_{ij} E\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right] + \sum_{j=1}^{J} p_j \alpha_{lj} E\left[\frac{X_i}{r_{ij}} \mathbf{1}\left\{\frac{X_l}{r_{lj}} > \tau_l\right\}\right]. \tag{2}
\]

Proof:

When allocating a job to server pool $i$, there are two possibilities: i) the job finishes before rerouting; ii) the job is rerouted, in which case the job receives $\tau_i$ units of service at this server pool. However, jobs that started in server pool $l$ can be rerouted to pool $i$ as well. The proof then follows from noting that $E\left[\min\left\{X_i / r_{ij}, \tau_i\right\}\right]$ represents the expected amount of processing of a job in server pool $i$ before completing or being rerouted to server pool $l$, and $E\left[(X_i / r_{ij}) \mathbf{1}\left\{X_l / r_{lj} > \tau_l\right\}\right]$ represents the expected received amount of processing of a job in server pool $i$, if any, after having received $\tau_l$ units in pool $l$ first. □

To simplify the expressions for the replication policy, we introduce some notation:
\[
k^{m}_{il,j}(y) = E\left[\left(\min\left\{\frac{X_i}{r_{ij}} - y, \frac{X_l}{r_{lj}}\right\}\right)^{m} \mathbf{1}\left\{\frac{X_i}{r_{ij}} > y\right\}\right].
\]
This is the $m$-th moment of the amount of time that a job will be processed in both server pools, if any, after being processed up to $y$ time units in server pool $i$ first. In case of $m = 1$ we omit the superscript and write $k_{il,j}(y)$.

Proposition 2.

The effective load per server in pool $i$, for $i = 1, 2$, in the system with replication under the $S(A, \tau)$ policy is
\[
\rho^{\mathrm{Rep}}_{i, S(A, \tau)} = \frac{\lambda \, E\left[B^{\mathrm{Rep}}_i(A, \tau)\right]}{n_i}, \tag{3}
\]
where the expected service time requirement of an arbitrary job, assigned either to pool $i$ or pool $l$, at pool $i$ is
\[
E\left[B^{\mathrm{Rep}}_i(A, \tau)\right] = \sum_{j=1}^{J} p_j \alpha_{ij} \left( E\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right] + k_{il,j}(\tau_i) \right) + \sum_{j=1}^{J} p_j \alpha_{lj} k_{li,j}(\tau_l). \tag{4}
\]

Proof:

When allocating a job to server pool $i$, there are two possibilities: i) the job finishes before replication; ii) the job is replicated, in which case the job spends at least $\tau_i$ time units in service at this server pool. The (remaining) expected service time requirement of jobs starting at server pool $i$ after replication, if any, is $k_{il,j}(\tau_i)$. However, jobs that started in server pool $l$ can be replicated to pool $i$ as well. The (remaining) expected service time requirement of these jobs, if any, is $k_{li,j}(\tau_l)$. □

Evidently, $\rho^{\mathrm{Rer}}_{i, S(A, \tau)} < 1$ for the rerouting policy and $\rho^{\mathrm{Rep}}_{i, S(A, \tau)} < 1$ for the replication policy, $i = 1, 2$, is a necessary condition for stability. It is quite plausible that this condition is in fact also sufficient for the system to be stable (under the preemptive-resume priority policy for replicated jobs).

Indeed, for multiserver queues it is well known that the system is stable if and only if the load per server is smaller than one, where the arrival process can be quite general (see for example [6, Chapter 1] or [18, Chapter 7]). Moreover, in [18, Proposition 7.4.12] it is proved that, under the assumption that the sequence of inter-arrival and service times of the jobs at the queues is ergodic and stationary, the stability of two $G/G/$ [...] and rerouted (or replicated) jobs arrive at a server pool. In addition, replicated jobs must receive service simultaneously at both server pools, implying that servers may be idling even when there are jobs waiting. As a result, it is hard to rigorously establish that a load per server smaller than one is sufficient for the system to be stable.

Remark 1.

The expected service time requirement at server pool $i$ for the zero redundancy policy is equal to
\[
E\left[B^{\mathrm{Rer}}_i(A, \infty)\right] = E\left[B^{\mathrm{Rep}}_i(A, \infty)\right] = \sum_{j=1}^{J} p_j \alpha_{ij} E\left[\frac{X_i}{r_{ij}}\right], \tag{5}
\]
and the expected service time requirement at server pool $i$ for the full redundancy policy is equal to
\[
E\left[B^{\mathrm{Rep}}_i(A, 0)\right] = \sum_{j=1}^{J} p_j E\left[\min\left\{\frac{X_i}{r_{ij}}, \frac{X_l}{r_{lj}}\right\}\right]. \tag{6}
\]

Observe that the expected service time requirement for the full redundancy policy is independent of the assigned fractions. The achievable stability bound for this policy coincides with that derived in [15]. Furthermore, note that in case of identical replicas the effective load per server for both the zero redundancy and full redundancy policy is insensitive to the job size distribution, given its mean.

In this subsection, we specify the assignment fractions in Equations (2) and (4) for the three cases of completely unknown, partly unknown and completely known job types. Let $q_{ij}$ denote the fraction of type-$j$ jobs that are assigned to server pool $i$ in these three cases. The proofs are straightforward, and hence omitted.

Corollary 1.

For the case of unknown job types, Propositions 1 and 2 hold with
\[
\alpha_{ij} = q_i, \quad \text{for } i = 1, 2 \text{ and } j = 1, \dots, J. \tag{7}
\]

We proceed with the case of partly known job types by which we mean that every arriving job is believed to be of a specific type. This can be thought of as a label indicating the likely type of a job. Let $p_{j \to j^*}$ denote the probability that a job of type $j$ is believed to be of type $j^*$, with $\sum_{j^* = 1}^{J} p_{j \to j^*} = 1$. The scenario $p_{j \to j} = p_{j \to j^*} = p_{j^*}$, for all $j = 1, \dots, J$, corresponds to the case of unknown job types.

Corollary 2.

For the case of partly known job types, Propositions 1 and 2 hold with
\[
\alpha_{ij} = \sum_{j^* = 1}^{J} p_{j \to j^*} \, q_{ij^*}, \quad \text{for } i = 1, 2 \text{ and } j = 1, \dots, J. \tag{8}
\]

Observe that in the case of completely unknown job types Equation (8) becomes $\alpha_{ij} = \sum_{j^* = 1}^{J} p_{j^*} q_{ij^*}$ for $i = 1, 2$, which is equivalent to Equation (7) since it is independent of the job types.
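Equation (8) is a small matrix product, which the following Python sketch spells out. The function name `assignment_fractions` and the numerical values are hypothetical and chosen only for illustration; `label_probs[j][js]` plays the role of $p_{j \to j^*}$ and `q[i][js]` the role of $q_{ij^*}$.

```python
def assignment_fractions(label_probs, q):
    """Equation (8): alpha[i][j] = sum over j* of p_{j -> j*} * q[i][j*].

    label_probs[j][js]: probability that a type-j job is believed to be of type js
    q[i][js]:           fraction of jobs believed to be of type js sent to pool i
    """
    n_types = len(label_probs)
    return [
        [sum(label_probs[j][js] * q[i][js] for js in range(n_types))
         for j in range(n_types)]
        for i in range(2)
    ]


# Hypothetical example with J = 2: jobs believed to be of type 1 go to pool 1,
# jobs believed to be of type 2 go to pool 2.
q = [[1.0, 0.0], [0.0, 1.0]]
p = [0.3, 0.7]  # assumed type probabilities

known = assignment_fractions([[1.0, 0.0], [0.0, 1.0]], q)   # labels always correct
unknown = assignment_fractions([p, p], q)                   # labels carry no information
partly = assignment_fractions([[0.9, 0.1], [0.2, 0.8]], q)  # labels mostly correct
```

With perfectly informative labels the result equals `q`, and with uninformative labels every column of the result is identical, so the assignment no longer depends on the true job type.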

Corollary 3.

For the case of known job types, Propositions 1 and 2 hold with
\[
\alpha_{ij} = q_{ij}, \quad \text{for } i = 1, 2 \text{ and } j = 1, \dots, J. \tag{9}
\]

Substituting these assignment fractions in Equation (5) gives
\[
E\left[B^{\mathrm{Rer}}_i(A, \infty)\right] = E\left[B^{\mathrm{Rep}}_i(A, \infty)\right] = \sum_{j=1}^{J} p_j q_{ij} E\left[\frac{X_i}{r_{ij}}\right].
\]
This coincides with the stability condition derived in [15] for no replication and known job types. Moreover, in [15] it is proved that no replication gives a strictly larger stability bound than replication in the case of NBU job size distributions, independent of the server speeds. For NWU job size distributions examples show that both no replication and full replication can give a larger stability bound depending on the server speeds. Examples in which neither no replication nor full replication gives a larger stability bound have not been found, see [15].

Definition 1.

A rerouting policy corresponds to a stochastic assignment matrix $A$ with elements $\alpha_{ij}$, where we assign a fraction $\alpha_{ij}$ of type-$j$ jobs to server pool $i$, and vectors $(\tau_{i1}, \tau_{i2}, \dots, \tau_{iK_i}) \in \mathbb{R}_+^{K_i}$ with possibly $K_i = 0$ or $K_i = \infty$, $i = 1, 2$. Here a job of size $x$ that is assigned to server pool $i$ is processed for up to $\tau_{i1}$ time units, and then successively restarted and processed in server pool $l$ for up to $\tau_{i2}$ time units, restarted and processed in server pool $i$ for up to $\tau_{i3}$ time units, etc. Eventually this job is restarted and processed for up to $\tau_{iK_i}$ time units in server pool $i$ or $l$ if $K_i$ is odd or even, respectively, and ultimately processed for any amount of time in server pool $i$ or $l$ if $K_i$ is even or odd, respectively, until the job is completed, whichever occurs first (or, as long as the job has not been completed).

In this section we focused on single threshold policies. The effective load per server for the rerouting policy can also be derived for the case of an arbitrary $n$-threshold policy as defined in Definition 1, see Appendix B. However, this would make the optimization over all the parameters, as done numerically in Section 5, much more complex.
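For the single-threshold policies of this section, Equations (2) and (4) can be estimated numerically for any job size distribution. The following Python sketch uses plain Monte Carlo sampling; it is not the method used in the paper, and the function names, the sampler interface and the parameter values (in particular the slow speed 0.2) are assumptions made for illustration. Pools are indexed 0 and 1 instead of 1 and 2.

```python
import math
import random


def expected_requirement(policy, i, p, alpha, r, tau, sample_sizes,
                         n_samples=20000, seed=0):
    """Monte Carlo estimate of E[B_i] in Eq. (2) ("rerouting") or Eq. (4) ("replication").

    p[j]:        probability of job type j
    alpha[i][j]: fraction of type-j jobs assigned to pool i
    r[i][j]:     service speed of type-j jobs at pool i
    tau:         (tau_1, tau_2), thresholds per pool
    sample_sizes(rng): returns one draw (x_1, x_2) of the job sizes
    """
    if policy not in ("rerouting", "replication"):
        raise ValueError("policy must be 'rerouting' or 'replication'")
    rng = random.Random(seed)
    l = 1 - i
    total = 0.0
    for _ in range(n_samples):
        x = sample_sizes(rng)
        for j, p_j in enumerate(p):
            t_i, t_l = x[i] / r[i][j], x[l] / r[l][j]
            # Work at pool i caused by jobs that start at pool i.
            own = min(t_i, tau[i])
            if policy == "replication" and t_i > tau[i]:
                own += min(t_i - tau[i], t_l)
            # Work at pool i caused by jobs that start at pool l and exceed tau_l there.
            other = 0.0
            if t_l > tau[l]:
                other = t_i if policy == "rerouting" else min(t_l - tau[l], t_i)
            total += p_j * (alpha[i][j] * own + alpha[l][j] * other)
    return total / n_samples


def stability_bound(policy, p, alpha, r, tau, n, sample_sizes, **kwargs):
    """Supremum of arrival rates with effective load below one in both pools."""
    return min(
        n[i] / expected_requirement(policy, i, p, alpha, r, tau, sample_sizes, **kwargs)
        for i in range(2)
    )


def identical_exponential(rng, mean=10.0):
    """Identical replicas with exponentially distributed size (assumed mean 10)."""
    x = rng.expovariate(1.0 / mean)
    return (x, x)


# Hypothetical symmetric scenario with unknown job types and unbalanced speeds.
p = [0.5, 0.5]
r = [[1.0, 0.2], [0.2, 1.0]]
alpha = [[0.5, 0.5], [0.5, 0.5]]
n = (5, 5)
zero_red = stability_bound("rerouting", p, alpha, r, (math.inf, math.inf), n,
                           identical_exponential)
full_red = stability_bound("replication", p, alpha, r, (0.0, 0.0), n,
                           identical_exponential)
```

In this assumed scenario the full redundancy bound `full_red` comes out larger than the zero redundancy bound `zero_red`. Intermediate thresholds can be explored by passing finite values of `tau`.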

4. Stability for known job sizes

In the previous section we derived the effective load per server for unknown job sizes. In this section, we consider the case of known job sizes, which is indicated by an additional superscript KS. We allow the assignment fractions and the threshold values to depend on the size of the job at the server pool, which is indicated by a superscript $x$. Thus, the threshold at a server pool may differ for every arriving job, whereas in the previous section the threshold at the pool was equal for all jobs. The joint density of the job sizes at the server pools is denoted by $f_X(\cdot, \cdot)$. The effective load per server for the rerouting and replication policy are derived in Propositions 3 and 4, respectively.

Proposition 3.

The effective load per server in pool $i$, for $i = 1, 2$, in the system with rerouting under the $S(A^x, \tau^x)$ policy, in case of known job sizes, is
\[
\rho^{\mathrm{Rer,KS}}_{i, S(A^x, \tau^x)} = \frac{\lambda \, E\left[B^{\mathrm{Rer,KS}}_i(A^x, \tau^x)\right]}{n_i}, \tag{10}
\]
where
\[
E\left[B^{\mathrm{Rer,KS}}_i(A^x, \tau^x)\right] = \iint_{\mathbb{R}^2} \left( \sum_{j=1}^{J} p_j \alpha^x_{ij} \min\left\{\frac{x_i}{r_{ij}}, \tau^x_i\right\} + \sum_{j=1}^{J} p_j \alpha^x_{lj} \frac{x_i}{r_{ij}} \mathbf{1}\left\{\frac{x_l}{r_{lj}} > \tau^x_l\right\} \right) f_X(x_1, x_2) \, dx_1 \, dx_2. \tag{11}
\]

Proof:

The proof follows along the same lines as the proof of Proposition 1. □

Proposition 4.

The effective load per server in pool $i$, for $i = 1, 2$, in the system with replication under the $S(A^x, \tau^x)$ policy, in case of known job sizes, is
\[
\rho^{\mathrm{Rep,KS}}_{i, S(A^x, \tau^x)} = \frac{\lambda \, E\left[B^{\mathrm{Rep,KS}}_i(A^x, \tau^x)\right]}{n_i}, \tag{12}
\]
where
\[
E\left[B^{\mathrm{Rep,KS}}_i(A^x, \tau^x)\right] = \iint_{\mathbb{R}^2} \left( \sum_{j=1}^{J} p_j \alpha^x_{ij} \left( \min\left\{\frac{x_i}{r_{ij}}, \tau^x_i\right\} + k^{\mathrm{KS}}_{il,j}(x, \tau^x_i) \right) + \sum_{j=1}^{J} p_j \alpha^x_{lj} k^{\mathrm{KS}}_{li,j}(x, \tau^x_l) \right) f_X(x_1, x_2) \, dx_1 \, dx_2, \tag{13}
\]
where
\[
k^{\mathrm{KS}}_{il,j}(x, y) = \min\left\{\frac{x_i}{r_{ij}} - y, \frac{x_l}{r_{lj}}\right\} \mathbf{1}\left\{\frac{x_i}{r_{ij}} > y\right\}.
\]

Proof:

The proof follows along the same lines as the proof of Proposition 2. □

Again, the effective load per server can also be derived for the case of an arbitrary $n$-threshold policy, see Appendix B.

Definition 2.

Let $(n^x_{i1}, n^x_{i2}, \dots, n^x_{iK_i}) \in \mathbb{N}^{K_i}$ be vectors, with $\sum_{k=1}^{K_i} n^x_{ik} \le J - 1$, $n^x_{ik} \ge 1$ for all $k = 1, \dots, K_i$ and $i = 1, 2$. Given the set $\mathcal{J}_{i,k-1}$, with $\mathcal{J}_{i0} = \{1, 2, \dots, J\}$, let $r_{ik}$ be the $n^x_{ik}$-th highest service speed on server pool $i(k)$ among the job types in $\mathcal{J}_{i,k-1}$ and let $\mathcal{J}_{ik}$ be the job types in $\mathcal{J}_{i,k-1}$ that do not have the $n^x_{ik}$ highest service speeds on server pool $i(k)$, where $i(k) = i$ if $k$ is odd and $i(k) = l$ if $k$ is even and $i = 1, 2$. The set $\mathcal{J}_{ik}$ corresponds to the remaining possible job types of a job that is initially assigned to server pool $i$ and rerouted $k$ times (according to a specific policy). Then the rerouting policy that corresponds to some stochastic assignment matrix $A^x$ and vectors $(\tau^x_{i1}, \tau^x_{i2}, \dots, \tau^x_{iK_i}) \in \mathbb{R}_+^{K_i}$, with
\[
\tau^x_{ik} = \min\nolimits^{(n_{ik})}_{j \in \mathcal{J}_{i,k-1}} \frac{x_{i(k)}}{r_{i(k)j}}
\]
for $k = 1, \dots, K_i$ and $i = 1, 2$, where $\min^{(n)}$ denotes the $n$-th order statistic, is said to be associated with the vectors $(n^x_{i1}, n^x_{i2}, \dots, n^x_{iK_i}) \in \mathbb{N}^{K_i}$.

Any rerouting policy that is associated with two particular vectors as described in Definition 2 is said to be a rational policy. Note that a rational policy involves at most $J - 1$ reroutings.

Observation 1.

In the case of known job sizes and $J$ job types, for each job there are at most $J$ time points, referred to as rational time points, at which the job could possibly fulfill its service requirement, which depend on the job size, server speeds and the threshold values. Moreover, we (only) gain information about the job type at these (rational) time points.

The next lemma establishes that in order to achieve maximum stability for known job sizes, we may restrict within the class of all rerouting policies as specified in Definition 1, to the subclass of rational policies as defined in Definition 2.

Lemma 1.

For any arbitrary rerouting policy there exists a rational rerouting policy that is better in the sense that for a job of any size, given the initial assignment to one of the server pools, it uses at most the same cumulative amount of processing time in each of the server pools.

Proof by induction in the number of remaining possible job types:

Base case: Suppose that for a job initially assigned to server pool $i$, the rerouting policy is rational up to (measured in processing time) $t_{J-1} = \min^{(1)}_{j \in \mathcal{J}_{i,K_i-1}} x_{i(K_i-1)} / r_{i(K_i-1)j}$ time units, at which we know that the job is of the one remaining job type and without rerouting the job would finish at $t_J$. Then rerouting exactly at $t_{J-1}$ uses at most the same cumulative amount of processing time in each of the server pools as rerouting between $t_{J-1}$ and $t_J$.

Inductive step: Show that for any $m \ge 1$, if the lemma holds for $m$ remaining possible job types, then it also holds for $m + 1$ remaining possible job types. Suppose that for a job initially assigned to server pool $i$, the rerouting policy is rational up to (measured in processing time) $t_{J-m-1} = \min^{(n_{ik})}_{j \in \mathcal{J}_{ik}} x_{i(k)} / r_{i(k)j}$ time units, with $k$ (the number of reroutings after which the job can possibly be of the $m + 1$ [...] $|\mathcal{J}_{ik}| = m + 1 - n_{ik}$, and without rerouting the first time the job could possibly finish is $t_{J-m}$ (if it is of a specific type). Note that at $t_{J-m-1}$ we know that the job is of one of the $m + 1$ remaining job types. Rerouting exactly at $t_{J-m-1}$ uses at most the same cumulative amount of processing time in each of the server pools as rerouting between $t_{J-m-1}$ and $t_{J-m}$. In case of no rerouting, if the job does not finish at $t_{J-m}$ we know that the job is of one of the $m$ remaining job types, and we can apply the induction hypothesis. In case of rerouting at $t_{J-m-1}$ the first time the job could possibly finish, say $t^*_{J-m}$, may differ from $t_{J-m}$, however we will never reroute again before $t^*_{J-m}$. At this time we can apply the induction hypothesis. □

Observe that Lemma 1 proves that there exists a rational rerouting policy that is better than any arbitrary policy given the initial assignment to one of the server pools. To achieve the maximum achievable stability bound we also need to find the appropriate initial assignment matrix $A^x$.

Remark 2.

For the single-threshold rerouting policy, analyzed in Proposition 3, Lemma 1 holds as well. For the optimization we hence have $J - 1$ candidate values for the threshold.
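The candidate thresholds mentioned in Remark 2 can be enumerated directly from the job size and the type-dependent speeds. The Python sketch below assumes the reading of Definition 2 in which the single threshold at pool $i$ is one of the first $J - 1$ order statistics of $x_i / r_{ij}$; the function names are hypothetical.

```python
def rational_time_points(x_i, speeds_i):
    """Sorted times x_i / r_ij at which a job of known size x_i could complete
    at pool i, one per job type j (Observation 1)."""
    return sorted(x_i / r for r in speeds_i)


def candidate_thresholds(x_i, speeds_i):
    """The J - 1 smallest rational time points, used as candidate thresholds.

    The largest time point is left out: a job still in service at that time
    completes there under every remaining job type, so rerouting gains nothing.
    """
    return rational_time_points(x_i, speeds_i)[:-1]


# Hypothetical job of size 2 with three possible types at pool i.
points = rational_time_points(2.0, [1.0, 0.5, 2.0])
candidates = candidate_thresholds(2.0, [1.0, 0.5, 2.0])
```

A brute-force optimization over single-threshold rerouting policies then only needs to evaluate these candidates per job size instead of a continuous range of thresholds.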

5. Numerical results

In Sections 3 and 4 we derived the effective load per server for various cases. We now present numerical results to get further insight in the performance implications due to the uncertainty in job types. Throughout this section we denote the policies as follows: Rerouting (Rer), Replication (Rep), Zero redundancy (ZRed) and Full redundancy (FRed). The expected service time requirement for these four policies is respectively given by (2), (4), (5) and (6). For known job sizes (KS) we only show the maximum of the rerouting and replication policy, where the expected service requirement is respectively given by (11) and (13). Surprisingly, in all scenarios that we considered the maximum was achieved by the rerouting policy with known job sizes.

In Section 5.1 we examine the scenario of completely unknown job types and in Section 5.2 that of partly known job types. In both subsections we distinguish between identical and i.i.d. replicas with exponentially distributed job sizes with mean $N$. We refer to Appendix A for results where the job sizes are Pareto Type I distributed with minimum possible value 1 and index $N/(N-1)$, thus again mean $N$, to ensure that the stability condition for known job types is $\lambda < 1$. We consider the scenario $I = J = 2$, $N = 10$, $n_1 = n_2 = 5$, $(r_{11}, r_{21}) = (1, r_{\mathrm{slow}})$ and $(r_{12}, r_{22}) = (r_{\mathrm{slow}}, 1)$, and show $\lambda_{\max}$ as function of some relevant system parameter. For the maximization of the achievable stability bound, we used a brute-force search with a certain fine multidimensional grid of starting points.

5.1. Completely unknown job types

5.1.1. Identical replicas

In Figure 2 the maximum achievable stability bound for the various policies is depicted when varying the parameters $p$ and $r_{\mathrm{slow}}$. Observe that the latter scenario is completely symmetric and therefore $q_1 = q_2 = 0.5$.

81 KTKSRep + FRedRerZRed p λ m a x . . . . . . . .

81 KTKSRep FRedRerZRed r slow λ m a x Policy:Known typesKnown sizesReplicationFull redundancyReroutingZero redundancy

Figure 2: Achievable stability bound for identical replicas with exponentially distributed job sizes in the scenario I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 and p_2 = 1 − p_1, with r_slow = . p = p = .

The left sub-figure in Figure 2 shows that the replication policy outperforms the rerouting policy. This generally holds for scenarios where r_slow is relatively small (unbalanced server speeds). The reason is that the rerouting policy becomes unstable, i.e., λ_max ↓ 0, as r_slow ↓ 0. In particular, in this scenario, the probability of rerouting the job to an (almost) incompatible server pool, after which we cannot reroute the job again, is strictly positive.

The right sub-figure in Figure 2 reveals that the rerouting and replication policies attain the same achievable stability bound for r_slow > 0.4, which is the same as for zero redundancy. Therefore, the achievable stability bound can be attained with the threshold τ = ∞, see (5). So here the threshold does not increase the achievable stability bound. This generally holds in scenarios where r_slow is relatively large (balanced server speeds). For unbalanced server speeds, the replication policy is equivalent to the full redundancy policy and both outperform the rerouting policy.

The achievable stability bound in Figure 2 fails to give information about the processing time per job that is needed to distinguish the job types. In particular, consider the example where one job type has sizes ε << K [...] >> K and ε on these server pools. In the rerouting policy, after only ε time units we know the job type and we reroute when the service requirement has not been fully fulfilled. Thus, we can distinguish the job types really fast. However, the service time requirement of a rerouted job is in total 2ε, while in the case of known types the service requirement of all jobs is ε. Therefore, for unbalanced server speeds, the rerouting policy always has a performance loss of at least 33%, despite the fact that it can distinguish the job type fast.

In Figure 3 we consider the same scenario as in Figure 2, but now for i.i.d. replicas.

[Figure 3: λ_max versus r_slow. Legend: Known types (KT), Known sizes (KS), Replication (Rep), Full redundancy (FRed), Rerouting (Rer), Zero redundancy (ZRed).]

Figure 3: Achievable stability bound for i.i.d. replicas with exponentially distributed job sizes in the scenario I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 = p_2 = 0.5. The replication and full redundancy policy are overlapping.

Figure 3 shows that the replication policy outperforms the rerouting policy, especially in scenarios with unbalanced server speeds. In this figure, the replication policy is equivalent to the full redundancy policy. This means that learning the job types for the replication policy does not improve the achievable stability bound. Moreover, it can be seen that observing the job sizes can significantly increase the achievable stability bound. For balanced server speeds, rerouting in the case of known sizes even outperforms the policy with known job types.

In this section we present numerical results for the achievable stability bound in the case where job types are partly known, i.e., for an arriving job there is a belief that it is of a specific type.

We consider a scenario with unbalanced server speeds and one with balanced server speeds (Figure 4), when varying the probability p → = p → .

[Figure 4, two panels: λ_max versus p →. Left panel: the Rep and FRed curves coincide. Right panel: the KS, Rep, Rer and ZRed curves coincide. Legend: Known types (KT), Known sizes (KS), Replication (Rep), Rerouting (Rer), Zero redundancy (ZRed), Full redundancy (FRed).]

Figure 4: Achievable stability bound for identical replicas with exponentially distributed job sizes in the scenario I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 = p_2 = 0.5, with r_slow = . r_slow = . p → = p → .

Figure 4 indicates that decreasing the uncertainty in the job types increases the achievable stability bound in a convex manner. Moreover, for unbalanced server speeds, decreasing the uncertainty about the job types at first does not have any effect on the achievable stability bound for the replication policy. However, the replication policy still achieves a larger stability bound than the rerouting policy, especially in scenarios with high uncertainty about the job types. For balanced server speeds, it can be seen that the achievable stability bounds for the known sizes, replication, rerouting and zero redundancy policies are all equal. Hence, for balanced server speeds, thresholds do not increase the achievable stability bound even if the uncertainty in the job types is decreased.

[Figure 5, two panels: λ_max versus p →. Left panel: the Rep and FRed curves coincide. Right panel: the Rep, Rer and ZRed curves coincide. Legend: Known types (KT), Known sizes (KS), Replication (Rep), Full redundancy (FRed), Rerouting (Rer), Zero redundancy (ZRed).]

Figure 5: Achievable stability bound for i.i.d. replicas with exponentially distributed job sizes in the scenario I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p = p = . r_slow = . r_slow = . p → = p → .

Figure 5 reveals that the use of thresholds does not improve the achievable stability bound for the replication policy. Interestingly, for the replication policy in a scenario with (highly) unbalanced server speeds, decreasing the uncertainty has no effect on the achievable stability bound at first. Namely, in the figure it can be seen that the achievable stability bound is constant for p → between 0 . . 9, i.e., where the replication and full redundancy policy coincide.

6. Reducing the expected latency

In the previous sections we focused on the achievable stability bound as the key performance metric of interest. The present section aims to take a first step towards investigating how the use of thresholds can help reduce the expected latency. Throughout this section we assume that the service discipline of the two server pools is FCFS, and that there is only one server per pool, i.e., $n_1 = n_2 = 1$. [...] d with Accomplishment Sampling) policy uses the idea of server accomplishment to decide which servers to poll, which can be viewed as a learning scheme. In [2] a system is introduced that improves the expected latency. The improvement is achieved by replication of small jobs, which ensures that these jobs do not have to wait for large jobs, called stragglers. Learning the optimal threshold for replication is of vital importance for the performance, i.e., the expected latency.

Consider the same model as for the analysis of the effective load per server. Again jobs are assigned to a certain server pool and can be rerouted (or replicated) to the other pool after a certain processing time. In this section we assume that the jobs arrive at the system according to a Poisson process with parameter $\lambda$. We even know what fraction of jobs is rerouted (or replicated) at server pool $i$, namely
\[
p_{\tau_i} := \sum_{j=1}^{J} p_j \alpha_{ij} \, \mathbb{P}\left( \frac{X_i}{r_{ij}} > \tau_i \right).
\]
By the Poisson thinning property these arrivals follow a Poisson process with parameter $\lambda p_{\tau_i}$. However, the departure process, and thus the arrival process at the other server pool after rerouting (or replication), is a complicated process. It is not Poisson, not even a renewal process, even if the job sizes are exponentially distributed. Still, it turns out that we can approximate the mean latency of jobs quite accurately by assuming that jobs are rerouted (or replicated) according to a Poisson process.
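As a rough illustration of the thinning step, the sketch below evaluates the rerouted fraction $p_{\tau_i}$ and the resulting thinned rate $\lambda p_{\tau_i}$. It assumes exponentially distributed job sizes with a given mean, so that $\mathbb{P}(X_i / r_{ij} > \tau_i) = e^{-r_{ij} \tau_i / \text{mean}}$; the function names and the parameter values in the test are hypothetical and are not taken from the paper.

```python
import math


def rerouted_fraction(p, alpha_i, r_i, tau_i, mean_size):
    """Fraction of all arrivals that start in pool i and exceed threshold tau_i.

    p         -- type probabilities p_j
    alpha_i   -- assignment probabilities alpha_ij of each type to pool i
    r_i       -- service speeds r_ij of each type in pool i (assumed positive)
    tau_i     -- threshold (processing time) in pool i
    mean_size -- mean of the exponential job size (an assumption of this sketch)
    """
    return sum(
        p_j * a_ij * math.exp(-r_ij * tau_i / mean_size)
        for p_j, a_ij, r_ij in zip(p, alpha_i, r_i)
    )


def thinned_rate(lam, fraction):
    """Rate of the thinned Poisson stream of rerouted (or replicated) jobs."""
    return lam * fraction
```

With $\tau_i = 0$ the fraction reduces to $\sum_j p_j \alpha_{ij}$ (every job assigned to pool $i$ is rerouted), and with $\tau_i = \infty$ it is zero, matching the two extreme cases in which the Poisson assumption is exact.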
The Poisson assumption is exact in the extreme cases of $\tau = 0$, in which case all jobs are immediately rerouted (or replicated), and of $\tau = \infty$, in which case there is no rerouting (or replication). For large $\tau$, rerouted traffic should also be reasonably close to Poisson traffic, while its contribution to the mean latency is quite small. The Poisson assumption allows us to use the well-known Pollaczek-Khinchin formula for the mean delay in an M/G/1 queue. The expected latency of a job that is initially assigned to server pool $i$ under the $S(A, \tau)$ rerouting policy is given by
\[
\mathbb{E}\left[T_i^{\mathrm{Rer}}(A,\tau)\right] \approx \frac{\lambda \, \mathbb{E}\left[\left(B_i^{\mathrm{Rer}}(A,\tau)\right)^2\right]}{2\left(1 - \lambda \, \mathbb{E}\left[B_i^{\mathrm{Rer}}(A,\tau)\right]\right)} + \mathbb{E}\left[X_i^{\mathrm{Rer}}(\tau)\right] + \frac{\lambda \, \mathbb{E}\left[\left(B_l^{\mathrm{Rer}}(A,\tau)\right)^2\right]}{2\left(1 - \lambda \, \mathbb{E}\left[B_l^{\mathrm{Rer}}(A,\tau)\right]\right)} \sum_{j=1}^{J} p_j \, \mathbb{P}\left(\frac{X_i}{r_{ij}} > \tau_i\right),
\]
where, for $i = 1, 2$, the expected service time requirement is given by Equation (2) and
\[
\mathbb{E}\left[\left(B_i^{\mathrm{Rer}}(A,\tau)\right)^2\right] = \sum_{j=1}^{J} p_j \alpha_{ij} \, \mathbb{E}\left[\left(\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right)^2\right] + \sum_{j=1}^{J} p_j \alpha_{lj} \, \mathbb{E}\left[\left(\frac{X_i}{r_{ij}}\right)^2 \mathbf{1}\left\{\frac{X_l}{r_{lj}} > \tau_l\right\}\right],
\]
with
\[
\mathbb{E}\left[X_i^{\mathrm{Rer}}(\tau)\right] = \sum_{j=1}^{J} p_j \left( \mathbb{E}\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right] + \mathbb{E}\left[\frac{X_l}{r_{lj}} \mathbf{1}\left\{\frac{X_i}{r_{ij}} > \tau_i\right\}\right] \right).
\]
Indeed, an arriving job at server pool $i$ has to wait for all the jobs present, and this mean delay is given by the Pollaczek-Khinchin formula for an M/G/1 queue. The expected service time requirement at pool $i$ is $\mathbb{E}\left[\min\left\{X_i / r_{ij}, \tau_i\right\}\right]$. With probability $p_{\tau_i}$ the job is rerouted to server pool $l$. Again the job has to wait for all the jobs present at this pool. Moreover, in this case the service time requirement of the job was $\tau_i$ on server pool $i$ and the expected service time requirement is equal to $\mathbb{E}\left[\frac{X_l}{r_{lj}} \mathbf{1}\left\{\frac{X_i}{r_{ij}} > \tau_i\right\}\right]$ on pool $l$.

The expected latency of a job that is initially assigned to server pool $i$ under the $S(A, \tau)$ replication policy is given by
\[
\mathbb{E}\left[T_i^{\mathrm{Rep}}(A,\tau)\right] \approx \frac{\lambda \, \mathbb{E}\left[\left(B_i^{\mathrm{Rep}}(A,\tau)\right)^2\right]}{2\left(1 - \lambda \, \mathbb{E}\left[B_i^{\mathrm{Rep}}(A,\tau)\right]\right)} + \mathbb{E}\left[X_i^{\mathrm{Rep}}(\tau)\right].
\]
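The Pollaczek-Khinchin waiting-time term that appears in both approximations can be evaluated directly once the first two moments of the per-pool service requirement are available. The following is a minimal sketch, not the authors' code: `pk_mean_wait` is the standard M/G/1 mean waiting time, and `approx_latency_rerouting` is a hypothetical helper that combines the three terms of the rerouting approximation, taking the moments and the rerouting probability as given inputs.

```python
def pk_mean_wait(lam, mean_b, second_moment_b):
    """Pollaczek-Khinchin mean waiting time in an M/G/1 queue.

    lam             -- Poisson arrival rate
    mean_b          -- E[B], mean service requirement
    second_moment_b -- E[B^2], second moment of the service requirement
    """
    rho = lam * mean_b
    if not 0.0 <= rho < 1.0:
        raise ValueError("unstable queue: lam * E[B] must be below 1")
    return lam * second_moment_b / (2.0 * (1.0 - rho))


def approx_latency_rerouting(lam, moments_i, moments_l, mean_x_i, p_reroute):
    """Approximate latency of a job that starts in pool i under rerouting.

    moments_i, moments_l -- (E[B], E[B^2]) for pool i and for the other pool l
    mean_x_i             -- the job's own expected service time, E[X_i^Rer]
    p_reroute            -- probability that the job exceeds its threshold
    """
    wait_i = pk_mean_wait(lam, *moments_i)  # delay in the initial pool
    wait_l = pk_mean_wait(lam, *moments_l)  # delay in the other pool, if rerouted
    return wait_i + mean_x_i + wait_l * p_reroute
```

For exponential service with mean 1 (second moment 2) and arrival rate 0.5, `pk_mean_wait` returns the familiar M/M/1 waiting time of 1; halving the second moment (deterministic service) halves the waiting time.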

[Figure 6, two panels: expected latency versus λ. Legend: Known types, Replication, Full redundancy, Rerouting, Zero redundancy.]

Figure 6: Expected latency for identical replicas in the scenario I = J = N = n = n = 1, (r, r) = (1, 0.1) and (r, r) = (0.1, 1) with corresponding probabilities p = p = . λ and degenerate (left) and exponential (right) service times. The approximations are depicted by the dashed lines. For degenerate job sizes, the replication policy is depicted with fixed τ = , while τ = gives a lower expected latency. For exponential job sizes, the replication and full redundancy policy are overlapping.

Here, for $i = 1, 2$, the expected service time requirement is given by Equation (4) and
\[
\mathbb{E}\left[\left(B_i^{\mathrm{Rep}}(A,\tau)\right)^2\right] = \sum_{j=1}^{J} p_j \alpha_{ij} \left( \mathbb{E}\left[\left(\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right)^2\right] + k_{il,j}(\tau_i) \right) + \sum_{j=1}^{J} p_j \alpha_{lj} \, k_{li,j}(\tau_l),
\]
with
\[
\mathbb{E}\left[X_i^{\mathrm{Rep}}(\tau)\right] = \sum_{j=1}^{J} p_j \left( \mathbb{E}\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_i\right\}\right] + k_{il,j}(\tau_i) \right),
\]
for $i = 1, 2$. Indeed, an arriving job at server pool $i$ has to wait for all the jobs present, and this mean delay is again given by the Pollaczek-Khinchin formula for an M/G/1 queue. The expected service time requirement at pool $i$ is $\mathbb{E}\left[\min\left\{X_i / r_{ij}, \tau_i\right\}\right]$. After replication the (remaining) expected service time requirement is $k_{il,j}(\tau_i)$.

To obtain the expected latency of an arbitrary job we sum, over $i$, the expected latency of a job that is initially assigned to server pool $i$ multiplied by the probability of assigning the job to server pool $i$. For example, in the rerouting policy we obtain
\[
\mathbb{E}\left[T^{\mathrm{Rer}}(A,\tau)\right] = \sum_{i=1}^{2} \sum_{j=1}^{J} p_j \alpha_{ij} \, \mathbb{E}\left[T_i^{\mathrm{Rer}}(A,\tau)\right].
\]

Numerical results

We provide numerical results to give further insight into the expressions derived for the expected latency.

In Figure 6 the expected latency is depicted as a function of λ, for degenerate and exponentially distributed job sizes. The approximations of the expected latency (dashed lines) appear to be reasonably good. Depending on the variability of the job size distribution either the rerouting or replication policy achieves the lowest expected latency.

7. Conclusion and suggestions for further research

We have quantified the effective load per server for a system with two server pools when job types are completely unknown or partly known, but where jobs can either be rerouted or replicated to the other server pool after receiving service for some amount of time. From the numerical results we observed that in most of the scenarios neither rerouting nor replication increases the achievable stability bound. Moreover, we observed that for balanced server speeds the zero redundancy policy achieves the maximum achievable stability bound. We also observed that decreasing the uncertainty in job types increases the achievable stability bound in a convex manner.

Topics for further research include:

(i) Extension to multiple server pools. This is more involved than having two server pools, since after rerouting or replication there is the extra choice to which server pool(s).

(ii) Extensions of the analysis of the expected latency of Section 6. One could, e.g., use ideas from [5, 19] to approximate the departure process of a single-server queue. Moreover, note that analytic expressions for the expected latency are lacking in the case of redundancy and thus it is not known what the optimal mean latency is for generally distributed job sizes. The question how uncertainty of job types affects the expected latency remains interesting.

(iii) Optimization of the expressions for the achievable stability bound under the $S(A, \tau)$ policy.

(iv) Extension where the service does carry over. In this case the expected service time requirement for the rerouting policy becomes
\[
\mathbb{E}\left[B_i^{\mathrm{Rer}}(A,\tau)\right] = \sum_{j=1}^{J} p_j \alpha_{ij} \, \mathbb{E}\left[\min\left\{\frac{X}{r_{ij}}, \tau_i\right\}\right] + \sum_{j=1}^{J} p_j \alpha_{lj} \, \mathbb{E}\left[\frac{X - r_{lj}\tau_l}{r_{ij}} \mathbf{1}\left\{\frac{X}{r_{lj}} > \tau_l\right\}\right],
\]
and for the replication policy
\[
\mathbb{E}\left[B_i^{\mathrm{Rep}}(A,\tau)\right] = \sum_{j=1}^{J} p_j \alpha_{ij} \left( \mathbb{E}\left[\min\left\{\frac{X}{r_{ij}}, \tau_i\right\}\right] + k_{il,j}(\tau_i) \right) + \sum_{j=1}^{J} p_j \alpha_{lj} \, k_{li,j}(\tau_l),
\]
where
\[
k_{il,j}(y) = \mathbb{E}\left[\min\left\{\frac{X}{r_{ij}} - y, \frac{X - r_{ij}\tau_i}{r_{lj}}\right\} \mathbf{1}\left\{\frac{X}{r_{ij}} > y\right\}\right].
\]

Acknowledgments

The work in this paper is supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation grant NETWORKS 024.002.003.

References

[1] M.F. Aktas, P. Peng, and E. Soljanin. Effective straggler mitigation: Which clones should attack and when? ACM SIGMETRICS Performance Evaluation Review, 45(2):12–14, 2017.
[2] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective straggler mitigation: Attack of the clones. NSDI'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation, 11:185–198, 2013.
[3] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the outliers in map-reduce clusters using Mantri. OSDI'10 Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, pages 265–278, 2010.
[4] K. Bimpikis and M.G. Markakis. Learning and hierarchies in service systems. Management Science, 65(3):1268–1285, 2018.
[5] G.R. Bitran and D. Tirupati. Multiproduct queueing networks with deterministic routing: Decomposition approach and the notion of interference. Management Science, 34(1):75–100, 1988.
[6] A.A. Borovkov. Stochastic Processes in Queueing Theory. Springer, 1976.
[7] J. Dean and L.A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, 2013.
[8] K. Gardner, M. Harchol-Balter, A. Scheller-Wolf, and B. Van Houdt. A better model for job redundancy: Decoupling server slowdown and job size. IEEE/ACM Transactions on Networking, 25(6):3353–3367, 2017.
[9] K. Gardner and C. Stephens. Smart dispatching in heterogeneous systems. ACM SIGMETRICS Performance Evaluation Review, 47(2):12–14, 2019.
[10] G. Joshi. Efficient Redundancy Techniques to Reduce Delay in Cloud Systems. PhD thesis, Massachusetts Institute of Technology, 2016.
[11] G. Joshi. Boosting service capacity via adaptive task replication. ACM SIGMETRICS Performance Evaluation Review, 45(2):9–11, 2017.
[12] F.P. Kelly and E. Yudovina. Stochastic Networks. Cambridge University Press, 2014.
[13] Y. Kim, R. Righter, and R. Wolff. Job replication on multiserver systems. Advances in Applied Probability, 41(2):546–575, 2009.
[14] G. Koole and R. Righter. Resource allocation in grid computing. Journal of Scheduling, 11:163–173, 2008.
[15] Y. Raaijmakers, S.C. Borst, and O.J. Boxma. Achievable stability in redundancy scheduling. Paper in preparation, 2020.
[16] A.L. Stolyar. Optimal routing in output-queued flexible server systems. Probability in the Engineering and Informational Sciences, 19(2):141–189, 2005.
[17] A. Vulimiri, P.B. Godfrey, R. Mittal, J. Sherry, S. Ratnasamy, and S. Shenker. Low latency via redundancy. CoNEXT'13 Proceedings of the 9th ACM conference on Emerging Networking Experiments and Technologies, pages 283–294, 2013.
[18] J. Walrand. An Introduction to Queueing Networks. Prentice Hall, 1988.
[19] W. Whitt. Approximations for departure processes and queues in series. Naval Research Logistics Quarterly, 31(4):499–521, 1984.

Appendix A. Additional numerical results

In this appendix we present additional numerical results for the achievable stability bound in case of Pareto Type I distributed job sizes. For Figures A.7-A.10 the same scenarios as in Figures 2-5 are considered, but now for Pareto distributed job sizes, with minimum possible value 1 and index N/(N − 1), instead of exponentially distributed job sizes with mean N.

Figure A.7 reveals that, when comparing the policies for Pareto and exponentially distributed job sizes, only the rerouting policy performs worse for Pareto distributed job sizes, and only for (highly) unbalanced server speeds. The other policies achieve the same achievable stability bound.

[Figure A.7, left panel: λ_max versus p. Right panel: λ_max versus r_slow. In both panels the Rep and FRed curves coincide. Legend: Known types (KT), Known sizes (KS), Replication (Rep), Full redundancy (FRed), Rerouting (Rer), Zero redundancy (ZRed).]

Figure A.7: Achievable stability bound for identical replicas with Pareto distributed job sizes in the same scenario as Figure 2, i.e., I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 and p_2 = 1 − p_1, with r_slow = . p = p = .

Figure A.8 shows that, when comparing the policies for Pareto and exponentially distributed job sizes, the known sizes, replication, full redundancy and rerouting policy all perform better for Pareto distributed job sizes. Similar to Figure 3, the replication and full redundancy policy both outperform the rerouting policy.

[Figure A.8: λ_max versus r_slow. Legend: Known types, Known sizes, Replication, Full redundancy, Rerouting, Zero redundancy.]

Figure A.8: Achievable stability bound for i.i.d. replicas with Pareto distributed job sizes in the same scenario as Figure 3, i.e., I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 = p_2 = 0.5.

Figure A.9 indicates that, when comparing the policies for Pareto and exponentially distributed job sizes, for unbalanced server speeds, both the replication and rerouting policy perform worse for Pareto distributed job sizes. For balanced server speeds Figure A.9 shows that the known sizes, replication, rerouting and zero redundancy policy all achieve the same achievable stability bound. The numerical results for the exponential and Pareto job sizes suggest that these policies are insensitive to the job size distribution, given its mean.

[Figure A.9, two panels: λ_max versus p →. Left panel: the Rep and FRed curves coincide, as do the Rer and ZRed curves. Right panel: the KS, Rep, Rer and ZRed curves coincide. Legend: Known types (KT), Known sizes (KS), Replication (Rep), Rerouting (Rer), Zero redundancy (ZRed), Full redundancy (FRed).]

Figure A.9: Achievable stability bound for identical replicas with Pareto distributed job sizes in the same scenario as Figure 4, i.e., I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 = p_2 = 0.5, with r_slow = . r_slow = . p → = p → .

Figure A.10 reveals that, when comparing the policies for Pareto and exponentially distributed job sizes, the known sizes, replication, full redundancy and rerouting policy all perform better for Pareto distributed job sizes. This is due to the assumption of i.i.d. replicas and the heavy tail of the Pareto distribution.

[Figure A.10, two panels: λ_max versus p →. Legend: Known types, Known sizes, Replication, Full redundancy, Rerouting, Zero redundancy.]

Figure A.10: Achievable stability bound for i.i.d. replicas with Pareto distributed job sizes in the same scenario as Figure 5, i.e., I = J = N = n = n = 5, (r, r) = (1, r_slow) and (r, r) = (r_slow, 1) with corresponding probabilities p_1 = p_2 = 0.5, with r_slow = . r_slow = . p → = p → .

Appendix B. Load of the system: n-threshold policy

Unknown job sizes

In this appendix we extend the expressions for the effective load per server, in the case of unknown job sizes, of the rerouting policy, given by Proposition 1, by allowing for multiple reroutings (see Definition 1). For the rerouting policy this means that after assigning a job to server pool $i$, we can reroute the job to the other pool. In contrast to the threshold policy we can reroute the job back to the pool it was initially assigned to. In the $n$-threshold policy we can in total reroute $n$ times from one pool to the other.

As before, let $A$ denote the $2 \times J$ stochastic assignment matrix with elements $\alpha_{ij}$. Let $T$ be the $2 \times n$ rerouting matrix, with rows $(\tau_{i1}, \tau_{i2}, \ldots, \tau_{in})$ denoting the rerouting vector for the job initially assigned to server pool $i$, for $i = 1, 2$. We define $\tau_{i0} = 0$ for $i = 1, 2$.

Proposition 5. The effective load per server in pool $i$, for $i = 1, 2$, in the system with rerouting under the $S(A, T)$ policy is
\[
\rho^{\mathrm{Rer}}_{i, S(A,T)} = \frac{\lambda \, \mathbb{E}\left[B_i^{\mathrm{Rer}}(A,T)\right]}{n_i},
\]
where
\[
\begin{aligned}
\mathbb{E}\left[B_i^{\mathrm{Rer}}(A,T)\right] ={}& \sum_{j=1}^{J} p_j \alpha_{ij} \, \mathbb{E}\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_{i1}\right\}\right] \\
&+ \sum_{m=1}^{n-1} \sum_{j=1}^{J} p_j \alpha_{I(m)j} \, \mathbb{E}\left[\min\left\{\frac{X_i}{r_{ij}}, \tau_{I(m)\,m+1}\right\} \cdot \mathbf{1}\left\{\tau_{I(m)\,m-1} < \frac{X_i}{r_{ij}}, \; \tau_{I(m)\,m} < \frac{X_l}{r_{lj}}\right\}\right] \\
&+ \sum_{j=1}^{J} p_j \alpha_{I(n)j} \, \mathbb{E}\left[\frac{X_i}{r_{ij}} \mathbf{1}\left\{\tau_{I(n)\,n-1} < \frac{X_i}{r_{ij}}, \; \tau_{I(n)\,n} < \frac{X_l}{r_{lj}}\right\}\right],
\end{aligned}
\]
where $I(m) = l$ if $m$ is odd, and $I(m) = i$ if $m$ is even.

Known job sizes

Proposition 6. The effective load per server in pool $i$, for $i = 1, 2$, in the system with rerouting under the $S(A^x, T^x)$ policy, in case of known job sizes, is
\[
\rho^{\mathrm{Rer,KS}}_{i, S(A^x,T^x)} = \frac{\lambda \, \mathbb{E}\left[B_i^{\mathrm{Rer,KS}}(A^x,T^x)\right]}{n_i},
\]
where
\[
\begin{aligned}
\mathbb{E}\left[B_i^{\mathrm{Rer,KS}}(A^x,T^x)\right] = \iint_{\mathbb{R}^2} \Bigg( & \sum_{j=1}^{J} p_j \alpha^x_{ij} \min\left\{\frac{x_i}{r_{ij}}, \tau^x_{i1}\right\} \\
&+ \sum_{m=1}^{n-1} \sum_{j=1}^{J} p_j \alpha^x_{I(m)j} \min\left\{\frac{x_i}{r_{ij}}, \tau^x_{I(m)\,m+1}\right\} \cdot \mathbf{1}\left\{\tau^x_{I(m)\,m-1} < \frac{x_i}{r_{ij}}, \; \tau^x_{I(m)\,m} < \frac{x_l}{r_{lj}}\right\} \\
&+ \sum_{j=1}^{J} p_j \alpha^x_{I(n)j} \frac{x_i}{r_{ij}} \mathbf{1}\left\{\tau^x_{I(n)\,n-1} < \frac{x_i}{r_{ij}}, \; \tau^x_{I(n)\,n} < \frac{x_l}{r_{lj}}\right\} \Bigg) f_X(x_1, x_2) \, dx_1 \, dx_2.
\end{aligned}
\]
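The bookkeeping behind the $n$-threshold rerouting policy can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes identical replicas (the same job size in both pools), service that does not carry over between pools, nondecreasing thresholds, positive service speeds, and exponentially distributed job sizes. The names `pool_times` and `estimate_loads` and all parameter values are hypothetical.

```python
import random


def pool_times(x, rates, start_pool, thresholds):
    """Processing time consumed in each of the two pools by a single job.

    x          -- job size (identical replicas: same size in both pools)
    rates      -- (r_0, r_1), service speed of this job's type in pool 0 and 1
    start_pool -- pool (0 or 1) of the initial assignment
    thresholds -- [tau_1, ..., tau_n]; on its k-th visit the job leaves the
                  current pool after tau_k time units if still unfinished,
                  and after n reroutings it runs to completion
    """
    used = [0.0, 0.0]
    pool = start_pool
    for tau in thresholds:
        need = x / rates[pool]  # service restarts from scratch in each pool
        if need <= tau:
            used[pool] += need
            return used
        used[pool] += tau
        pool = 1 - pool  # reroute to the other pool
    used[pool] += x / rates[pool]
    return used


def estimate_loads(lam, n_servers, p, alpha, rates, T, mean_size,
                   samples=100_000, seed=0):
    """Monte Carlo estimate of the effective load per server in both pools.

    alpha[i][j] -- probability that a type-j job is initially sent to pool i
    rates[i][j] -- service speed of type j in pool i
    T[i]        -- threshold vector for jobs initially assigned to pool i
    """
    rng = random.Random(seed)
    types = list(range(len(p)))
    total = [0.0, 0.0]
    for _ in range(samples):
        j = rng.choices(types, weights=p)[0]
        start = 0 if rng.random() < alpha[0][j] else 1
        x = rng.expovariate(1.0 / mean_size)  # exponential size with given mean
        used = pool_times(x, (rates[0][j], rates[1][j]), start, T[start])
        total[0] += used[0]
        total[1] += used[1]
    return [lam * total[i] / samples / n_servers[i] for i in range(2)]
```

For instance, a job of size 10 with speeds (1, 0.5) that starts in the slow pool with a single threshold of 4 consumes 4 time units there and then 10 time units in the fast pool, so `pool_times(10.0, (1.0, 0.5), 1, [4.0])` gives `[10.0, 4.0]`.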