Performance Analysis of Load Balancing Policies with Memory
Tim Hellemans^{a,∗}, Benny Van Houdt^{a}
^{a} University of Antwerp, Middelheimlaan 1, Antwerp
Abstract
Joining the shortest or least loaded queue among d randomly selected queues are two fundamental load balancing policies. Under both policies the dispatcher does not maintain any information on the queue length or load of the servers. In this paper we analyze the performance of these policies when the dispatcher has some memory available to store the ids of some of the idle servers. We consider methods where the dispatcher discovers idle servers as well as methods where idle servers inform the dispatcher about their state.

We focus on large-scale systems and our analysis uses the cavity method. The main insight provided is that the performance measures obtained via the cavity method for a load balancing policy with memory reduce to the performance measures for the same policy without memory, provided that the arrival rate is properly scaled. Thus, we can study the performance of load balancers with memory in the same manner as load balancers without memory. In particular this entails closed form solutions for joining the shortest or least loaded queue among d randomly selected queues with memory in case of exponential job sizes. Moreover, we obtain a simple closed form expression for the (scaled) expected waiting time as the system tends towards instability.

We present simulation results that support our belief that the approximation obtained by the cavity method becomes exact as the number of servers tends to infinity.
1. Introduction
Load balancing is often used in large-scale clusters to reduce latency. A simple algorithm, denoted by SQ(d), consists of assigning an incoming job to the server that currently holds the least number of jobs out of d randomly selected queues. This is referred to as the power-of-d-choices algorithm [1, 14, 23]. Another popular algorithm, which has received quite some attention recently, consists of assigning an incoming job to the server which is the least loaded amongst d

∗ Corresponding author
Email address: [email protected] (Tim Hellemans)
Preprint submitted to Elsevier. January 25, 2021

randomly selected queues, i.e. the server which is able to start working on the job first receives the job. This policy is referred to as LL(d) and has been studied in e.g. [9, 5, 17, 15].

The main objective of this paper is to generalize the analysis of the SQ(d) and LL(d) policies to the case where the dispatcher has some (finite or infinite) memory available to store the ids of idle servers. These ids may be discovered either by probing servers to check whether they are idle, or servers may inform the dispatcher that they became idle. We focus on the performance of large-scale systems and as such make use of the cavity method introduced in [4]. The cavity method relies on the assumption that the queue lengths (or loads) of any finite set of queues become independent as the number of servers tends to infinity, called the ansatz.

The ansatz was proven in some particular cases: in [5] it was shown for LL(d) with general job sizes and for SQ(d) with decreasing hazard rate job sizes. Recently, the ansatz was also proven for a variety of load balancing policies which are similar to LL(d) (see [19]). Our objective is not to prove the ansatz for load balancers with memory, but to study these policies using the cavity method. To demonstrate the usefulness of our analysis we present simulation results which suggest that the cavity method captures the system behavior as the number of servers tends to infinity.

A few papers have previously studied the use of some (bounded) memory at the dispatcher in combination with a power-of-d policy. In [16] the authors study a policy with a memory of size m, where at every job arrival d servers are probed and the job is assigned to the server with the smallest number of pending jobs amongst the d probed servers and the m servers in memory. The ids of the m servers with the shortest queues amongst the remaining d + m − 1 servers are kept in memory.

For the LL(d) policy, we do not impose any restrictions on the job size distribution.
For the SQ(d) policy, we initially restrict our attention to exponential job sizes and then generalize our main result to phase type and general job size distributions.

As a by-product, our results allow us to study the Join-Idle-Queue policy (denoted by JIQ) with finite memory. JIQ consists of keeping track of the ids of the idle queues, assigning an incoming job to an idle queue whenever there is an idle server in memory and simply assigning it to a random server otherwise. This policy has vanishing delays when the number of servers tends to infinity in case of infinite memory [13, 20, 7, 6].

Apart from the cavity method analysis, we additionally present explicit results for the heavy traffic limit by relying on the framework in [10], which allows one to compute the limit lim_{λ→1⁻} −E[R_λ]/log(1 − λ) for load balancing policies with exponential job sizes. We show that (unsurprisingly) for most memory schemes, the heavy traffic limit remains unchanged when we add memory at the dispatcher. However, when the dispatcher has a memory of size A and servers inform the dispatcher when they become idle, the heavy traffic limit is divided by A + 1 for both the LL(d) and SQ(d) policies. In particular, with a memory size of 1, the response time under heavy traffic is halved compared to having no memory at the dispatcher.

Finally, we analyze the low traffic limit in case of exponential job sizes. In particular, we take a closer look at the ratio of the mean waiting times of two different load balancing policies as the load tends to zero, for which we find a simple closed form solution.

The paper is structured as follows. In Section 2, the model is introduced and we briefly review previously obtained results for SQ(d) and LL(d). In Section 3 we present four approaches which make use of memory at the dispatcher and show how to obtain the probability that the memory is empty for each of these methods.
Next, we present our major analytical tool, the queue at the cavity, in Section 4 and describe how it is defined for the memory dependent LL(d) and SQ(d) policies. We carry out the analysis of the queue at the cavity in Section 5. Our analysis is verified by means of simulation in Section 6. In Section 7 we show how our results may be used for numerical experimentation by studying one specific setting. In Section 8 we study the heavy traffic limit, while in Section 9 the low traffic limit is considered. We conclude the paper in Section 10 and discuss possible future work.

All code used to generate Table 1 and Figure 1 can be found at https://github.com/THellemans/memoryDependentLB.git .
2. Model Description
We consider a system consisting of N identical servers (with N large). There is some central dispatcher to which jobs arrive according to a Poisson(λN) process. The dispatcher has some (finite or infinite) memory available to store ids of idle servers. When a job arrives and the dispatcher has the id(s) of some idle server(s) in its memory, the job is dispatched to a random server whose id is in memory. If the dispatcher's memory is empty, d servers are chosen at random and the job is either sent to the server with the shortest queue (SQ(d), see Section 2.1) or to the server with the least amount of work (LL(d), see Section 2.2). Setting d = 1 yields the JIQ policy, where the job is simply routed arbitrarily whenever there are no idle servers known by the dispatcher. Before we proceed we provide some further details on the classic SQ(d) and LL(d) policies.

2.1. SQ(d)

The SQ(d) policy was first introduced in [14, 23] for a system with Poisson(λN) arrivals and exponential job sizes (with mean 1/µ). Whenever a job arrives, d servers are selected at random and the job joins the shortest of the d selected queues. As N → ∞ the system behavior converges to the solution of the following set of ODEs:

d/dt u_k(t) = λ(u_{k−1}(t)^d − u_k(t)^d) − µ(u_k(t) − u_{k+1}(t)),

where we denote by u_k(t) the probability that, at time t, an arbitrary server has at least k jobs in its queue and u_0(t) = 1. This set of ODEs also corresponds to applying the cavity method to the SQ(d) policy. The fixed point of this set of ODEs obeys a simple recursive formula:

µ u_{k+1} = λ u_k^d,   (1)

which has the closed form solution u_k = ρ^{(d^k − 1)/(d − 1)} with ρ = λ/µ. In particular, one obtains from Little's Law the closed form solution of the expected response time:

E[R] = (1/λ) Σ_{k=1}^∞ ρ^{(d^k − 1)/(d − 1)}.   (2)

2.2. LL(d)

The LL(d) policy was analyzed in [9] for a system with arbitrary job sizes with mean E[G] using the cavity method. Whenever a job arrives, d queues are probed and the job is sent to the queue which has the least amount of work left.
This means that the job joins the queue at which its service can start the soonest. In practice this can be implemented through late binding, see also [17]. Let ¯F(w) denote the equilibrium probability that an arbitrary queue has at least w work left using the cavity method. It is shown in [9] that ¯F(w) satisfies the fixed point equation:

¯F(w) = ρ − λ ∫_0^w (1 − ¯F(u)^d) ¯G(w − u) du,   (3)

with ρ = λ E[G] and ¯G(w − u) the probability that a job has a size greater than w − u. This fixed point equation can alternatively be written as the following Integro Differential Equation (IDE):

¯F′(w) = −λ [ ¯G(w) − ¯F(w)^d + ∫_0^w ¯F(u)^d g(w − u) du ],

with g the density function of the job size distribution. Both have the boundary condition ¯F(0) = ρ. Moreover, this equation has a closed form solution in case of exponential job sizes (with mean 1/µ):

¯F(w) = (ρ + (ρ^{1−d} − ρ) e^{(d−1)µw})^{−1/(d−1)}.   (4)

In particular, one obtains a closed form solution for the expected response time:

E[R] = (1/λ) Σ_{n=0}^∞ ρ^{dn+1} / (n(d − 1) + 1).   (5)

3. Discovering idle servers

We now discuss a number of approaches for the dispatcher to discover ids of idle servers. In the first few approaches the dispatcher discovers idle servers by probing, while in the last approach the idle servers identify themselves to the dispatcher. Note that as the amount of incoming work per server per unit of time is equal to ρ <
1, no work is replicated, and all servers are identical, it follows that the steady state probability that a server is busy is given by ρ.¹

¹For SQ(d) with exponential job sizes this is shown explicitly in the proof of Theorem 1, while for SQ(d) with general job sizes the proof is carried out in Proposition 3. For LL(d) this easily follows from integrating both sides of (32).

3.1. Interrupted probing (IP)

In the first approach, called interrupted probing (IP), the dispatcher probes d servers upon a job arrival when its memory is empty. If there are k ≥ 1 idle servers amongst the d probed servers, it sends the incoming job to one of the idle servers and stores the ids of the remaining k − 1 idle servers in memory. When a job arrives while the memory is non-empty, the job is assigned to a server whose id is in memory and that id is removed; no probes are sent, hence the name interrupted probing. As ρ is the steady state probability that a server is busy, we can find the probability π_0 of having no ids in memory when a new job arrives by looking at the Markov chain with state space {0, ..., d − 1} and transition matrix M(ρ):

M(ρ)_{0,0} = ρ^d + (d choose 1) ρ^{d−1}(1 − ρ),
M(ρ)_{0,ℓ} = (d choose ℓ+1) ρ^{d−1−ℓ}(1 − ρ)^{ℓ+1},
M(ρ)_{k,k−1} = 1,

and M(ρ)_{k,ℓ} = 0 otherwise. As only the first row is non-trivial, it is not hard to check that π = (π_0, ..., π_{d−1}) given by

π_k = π_0 (1 − Σ_{j=0}^{k} (d choose j) ρ^{d−j}(1 − ρ)^j),

for k ≥ 1, is the stationary distribution of M(ρ). From the requirement Σ_{k=0}^{d−1} π_k = 1 it then follows that

π_0 = 1 / (ρ^d + d(1 − ρ)).   (6)

The number of probes used per arrival is clearly given by π_0 d, which equals 1/(1 − ρ + ρ^d/d). Hence, the IP approach uses fewer than d probes per arrival when ρ is not too large (see also Figure 1b).

3.2. Continuous probing (CP)

This approach is similar to the IP approach, except that whenever we use a server id from memory for a job arrival, the dispatcher still probes d random servers. The ids of the probed servers that are idle are subsequently added to memory. We assume that the available memory is unlimited.

In order to determine the probability π_0 of having no server id in memory, we need to study a Markov chain on an infinite state space.
Its transition probability matrix M(ρ) has the following form:

M(ρ)_{0,0} = ρ^d + (d choose 1) ρ^{d−1}(1 − ρ),
M(ρ)_{0,ℓ} = (d choose ℓ+1) ρ^{d−1−ℓ}(1 − ρ)^{ℓ+1},

for 1 ≤ ℓ ≤ d − 1. For k ≥ 1, we have

M(ρ)_{k,k−1+ℓ} = (d choose ℓ) ρ^{d−ℓ}(1 − ρ)^ℓ,

for 0 ≤ ℓ ≤ d, and M(ρ)_{k,ℓ} = 0 otherwise. First note that if d > 1/(1 − ρ), this Markov chain is transient, as the drift d(1 − ρ) − 1 in any state k > 0 is positive. If d < 1/(1 − ρ), the chain is positive recurrent and we need to determine π_0 < 1. A similar observation was made in [21, 2].

The average time the memory remains empty is equal to:

1/(1 − M(ρ)_{0,0}) = 1/(1 − ρ^d − dρ^{d−1}(1 − ρ)).

Furthermore, when the memory becomes non-empty, the length of time that it remains non-empty depends on the number of server ids that are placed into memory. More specifically, let E[T_{k,0}] denote the expected first return time to 0 given that the chain starts in state k > 0, then:

E[T_{k,0}] = k E[T_{1,0}],

and the mean time that the Markov chain stays away from state 0, given that it just made a jump from state 0 to some state k > 0, equals E[X] E[T_{1,0}], where E[X] is one less than the mean number of idle servers among d servers given that at least 2 are idle. It is not hard to see that

E[X] = (d(1 − ρ) − (1 − ρ^d)) / (1 − ρ^d − dρ^{d−1}(1 − ρ)).

Moreover, E[T_{1,0}] = 1/(1 − d(1 − ρ)), as E[T_{1,0}] = 1 + d(1 − ρ) E[T_{1,0}]. Putting this together we obtain

π_0 = (1 − d(1 − ρ)) / ρ^d,   (7)

when d < 1/(1 − ρ). Note that the CP approach uses d probes per arrival.

3.3. Bounded continuous probing (BCP)

This approach is identical to the CP approach, but with a finite memory of size A. Hence the transition matrix M(ρ) is of size A + 1 and its transitions are the same as in Section 3.2, except that any transition from a state k ≤ A to a state ℓ > A becomes a transition to state A. In particular, for k > A − d + 1 we have:

M(ρ)_{k,A} = Σ_{j=A−k+1}^{d} (d choose j) ρ^{d−j}(1 − ρ)^j,

and for all other values, M(ρ) coincides with the expressions given in Section 3.2. This Markov chain does not appear to have a simple closed form solution for arbitrary values of d; however, for d = 2 one finds:

π_0 = (1 − ((1 − ρ)/ρ)^2) / (1 − ((1 − ρ)/ρ)^{2(A+1)}).

For d > 2, π_0 can be computed numerically. Note that this approach uses d probes per arrival unless the dispatcher sends the probes one at a time and stops probing when the memory is full.

3.4. The number of probes per arrival

In this section we present a result that applies to any scheme where the dispatcher discovers idle servers by probing and any idle server that is discovered is stored in memory. Thus the result only applies to BCP if the probes are sent one at a time.
Proposition 1.
Assume all discovered idle servers are stored in memory. Then, for any LL(d)/SQ(d) memory based policy, the average number of probes used per arrival is given by:

(1 − π_0 ρ^d) / (1 − ρ).   (8)

Proof.
If we think of the probes as being transmitted one at a time, with the job assigned as soon as an idle server is discovered, the dispatcher uses on average Σ_{k=0}^{d−1} ρ^k probes for any job arrival that occurs when the memory is empty. Further, for any arrival that uses an id in memory, an average of 1/(1 − ρ) probes was used to discover that id. Hence, the average number of probes transmitted can be written as:

π_0 (1 − ρ^d)/(1 − ρ) + (1 − π_0) · 1/(1 − ρ),

which equals (8).

The above result indicates that, for any such policy for which we either know the average number of probed queues (as for CP) or can express it using π_0 (as for IP), we immediately obtain π_0. As the CP policy sends d probes per arrival and the IP policy π_0 d, Proposition 1 yields (6) and (7) without the need to analyze a Markov chain.

3.5. Idle server messages (ISM)

In this scheme the dispatcher does not probe to discover idle servers; instead, a server notifies the dispatcher whenever it becomes idle. In case of infinite memory, the dispatcher knows all idle server ids at all times and the system reduces to the JIQ policy when d = 1. Our interest lies mostly in knowing what happens when the memory size is finite and the job is assigned to the shortest of d queues whenever the memory is empty when a job arrives.

If we denote by A the number of ids that can be stored in memory, we show that π_0 is given by

π_0 = (1 − (1 − ρ^d)^{1/(A+1)}) / ρ^d.   (9)

For SQ(d) this is shown in Proposition 4, while for LL(d) this is presented in Proposition 6.
In particular, this result entails that π_0 is insensitive to the job size distribution for both SQ(d) and LL(d).

If we assume that the d probes are transmitted one at a time when the memory is empty and the dispatcher stops probing as soon as an idle server is discovered, the number of probes and messages transmitted by the dispatcher and servers per job arrival can be expressed as:

π_0 (1 − ρ^d)/(1 − ρ) + (1 − π_0 ρ^d),

where the first term corresponds to the number of probes sent per arrival by the dispatcher and the second corresponds to the number of server messages per arrival (which is equal to the probability that a job is assigned to an idle server).
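To close this section, the consistency claims above are easy to check numerically. The short sketch below (our own illustration, not taken from the paper's repository) verifies that plugging the probe budgets of IP (π_0 d probes per arrival) and CP (d probes per arrival) into (8) indeed recovers (6) and (7), and performs a basic sanity check on (9).

```python
def pi0_ip(rho, d):
    # Equation (6): interrupted probing
    return 1.0 / (rho ** d + d * (1 - rho))

def pi0_cp(rho, d):
    # Equation (7): continuous probing, valid when d < 1/(1 - rho)
    return (1 - d * (1 - rho)) / rho ** d

def pi0_ism(rho, d, A):
    # Equation (9): idle server messages with memory size A
    return (1 - (1 - rho ** d) ** (1.0 / (A + 1))) / rho ** d

def probes_prop1(pi0, rho, d):
    # Equation (8): average number of probes per arrival (Proposition 1)
    return (1 - pi0 * rho ** d) / (1 - rho)

d = 3
for rho in (0.7, 0.8, 0.9):
    # IP uses pi0*d probes per arrival, CP uses d probes per arrival
    assert abs(probes_prop1(pi0_ip(rho, d), rho, d) - pi0_ip(rho, d) * d) < 1e-12
    if d < 1 / (1 - rho):
        assert abs(probes_prop1(pi0_cp(rho, d), rho, d) - d) < 1e-12
    # sanity check for (9): with A = 0 there is no memory, so it is always empty
    assert abs(pi0_ism(rho, d, 0) - 1.0) < 1e-12

print("identities between (6), (7), (9) and (8) are consistent")
```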
4. Description of the queue at the cavity
Our analysis is based on the queue at the cavity method, which was introduced in [4] to analyze load balancing systems. The key idea is to focus on the evolution of a single tagged queue, referred to as the queue at the cavity, and to assume that all other queues have the same queue length (or workload) distribution at any time t. Moreover, the queue lengths (or workloads) of any finite set of queues are assumed to be independent at any time t. We first explain the approach in a system without memory and then indicate how to adapt it to incorporate memory.

In a system without memory, the queue at the cavity experiences potential arrivals at rate λd, as this is the rate at which a tagged queue is selected as one of the d randomly selected queues. If a potential arrival occurs at time t, d − 1 queue length (or workload) values are sampled from the distribution at time t. The potential arrival becomes an actual arrival if the queue at the cavity has the shortest queue (or smallest workload) amongst these d values (where ties are broken at random). For SQ(d) with exponential job sizes with mean 1/µ, the queue length of the queue at the cavity decreases at a constant rate equal to µ in between potential arrivals, while for LL(d) the workload decreases linearly at rate 1 when larger than zero. For Phase Type distributed job sizes, one needs to include the phase of the job at the head of the queue, while for general job sizes we need to include the work left for the job at the head of the queue.

To incorporate memory into the cavity method, we note that the state of the memory (that is, the number of ids that it contains) evolves at a faster time scale than the fraction of queues with a certain queue length (or workload). As such, the state of the memory at time t is given by the steady state π(t) of the discrete time Markov chain with transition matrix M(ρ(t)) that captures the evolution of the memory, where ρ(t) is the fraction of busy servers at time t (see Section 3 for some examples with ρ(t) = ρ).
For more details on the concept of the time-scale separation we employ, we refer the reader to [3].

Let π_0(t), the first entry of π(t), represent the probability that the memory is empty at time t. We modify the queue at the cavity by decreasing the potential arrival rate to the queue at the cavity to λdπ_0(t), i.e. potential arrivals only occur when there is no empty queue to join in memory. These potential arrivals are then dealt with in the exact same manner as in the setting without memory. When the queue at the cavity is empty, we assume that, on top of the potential arrival rate of λdπ_0(t), we have an effective arrival rate of λ(1 − π_0(t))/(1 − ρ(t)). The latter arrival rate can be interpreted as follows: jobs arrive at rate λN, with probability (1 − π_0(t)) such a job is assigned to a queue in memory and with probability 1/((1 − ρ(t))N) the queue at the cavity gets the job, as it is one of the (1 − ρ(t))N idle servers at time t.

In the next section we study the cavity process of SQ(d) and LL(d) with memory in detail. We assume job sizes have some general distribution with probability density function (pdf) g, cumulative distribution function (cdf) G and complementary cdf (ccdf) ¯G. For a random variable with cdf H we let E[H] denote its mean. Let µ = 1/E[G] denote the mean service rate and note that we have for the system load: ρ = λ · E[G]. Furthermore, we let G denote a generic random variable with distribution G. We will sometimes assume that G is an exponential random variable. Furthermore, for LL(d) we denote by f, F and ¯F the pdf, cdf and ccdf of the workload distribution of the queue at the cavity in equilibrium (note that we have ¯F(0) = ρ). For SQ(d) with exponential job sizes we denote by u_k the equilibrium probability that the queue at the cavity has k or more jobs (with u_0 = 1 and u_1 = ρ).
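The modified cavity dynamics just described can be sketched numerically. The snippet below (our own illustration; the forward-Euler scheme, the truncation level K and the parameter values are assumptions, and π_0(t) is held constant at an example value π_0) integrates the queue length dynamics for SQ(d) with exponential job sizes: potential arrivals at rate λπ_0 that are accepted when the cavity queue is shortest, plus the extra arrival rate when the cavity queue is empty. The stationary point recovers u_1 = ρ, as noted above.

```python
# Forward-Euler integration of the memory-modified cavity dynamics for SQ(d)
# with exponential job sizes and a constant empty-memory probability pi0.
lam, mu, d, pi0 = 0.7, 1.0, 2, 0.5   # example values (assumed)
K, dt, steps = 60, 0.01, 60000       # truncation level and step size (assumed)

u = [1.0] + [0.0] * K                # u[k] ~ P(queue at the cavity has >= k jobs)
for _ in range(steps):
    new = u[:]
    # k = 1: SQ(d) arrivals when the memory is empty, plus arrivals via the memory
    new[1] += dt * (lam * pi0 * (u[0] ** d - u[1] ** d)
                    + lam * (1 - pi0) - mu * (u[1] - u[2]))
    for k in range(2, K):
        new[k] += dt * (lam * pi0 * (u[k - 1] ** d - u[k] ** d)
                        - mu * (u[k] - u[k + 1]))
    u = new

rho = lam / mu
print(round(u[1], 4))  # -> 0.7, i.e. u_1 = rho
print(round(u[2], 4))  # -> 0.1715, i.e. rho * pi0 * rho**d
```

Note that the converged values satisfy the recursion µ u_{k+1} = λπ_0 u_k^d, which is analyzed formally in Section 5.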
5. Analysis of the queue at the cavity
We now analyze the queue at the cavity described in the previous section for SQ(d) and LL(d). Note that the results presented in this section apply to any of the memory schemes discussed in Section 3. To obtain results for a specific memory scheme one simply replaces π_0 by the appropriate expression. We show that the equilibrium queue length and workload distributions of SQ(d) and LL(d) with memory, respectively, have exactly the same form as in the same setting without memory if we replace λ by λπ_0^{1/d} and divide by π_0^{1/d}. With respect to the response time distribution, we show that the system with memory and arrival rate λ has the same response time distribution as the system without memory and arrival rate λπ_0^{1/d}.

5.1. SQ(d)

In this section we develop the analysis of the queue at the cavity for the SQ(d) policy. We start by assuming job sizes are exponential and subsequently consider Phase Type and general job sizes.

5.1.1. Exponential job sizes

We start by describing the transient behaviour of the queue at the cavity for SQ(d):

Proposition 2.
Consider the SQ(d) policy with memory, exponential job sizes with mean 1/µ and arrival rate λ < µ. Let u_k(t) be the probability that the queue at the cavity has k or more jobs at time t, then

d/dt u_k(t) = λπ_0(t)(u_{k−1}(t)^d − u_k(t)^d) − µ(u_k(t) − u_{k+1}(t)),   (10)

d/dt u_1(t) = λπ_0(t)(u_0(t)^d − u_1(t)^d) + λ(1 − π_0(t)) − µ(u_1(t) − u_2(t)),   (11)

for k ≥ 2 and u_0(t) = 1.

Proof. Let Δ > 0 be small and k ≥ 2. There are three ways for the queue at the cavity to have k or more jobs at time t + Δ. First, it may have exactly k jobs at time t and no departures occur in [t, t + Δ]; this occurs with probability:

Q_{1,k} = (1 − µΔ)(u_k(t) − u_{k+1}(t)) + o(Δ).   (12)

It may also have k + 1 or more jobs at time t, in which case it still has k or more jobs after at most 1 departure in [t, t + Δ]:

Q_{2,k} = u_{k+1}(t) + o(Δ).   (13)

A third possibility is that it had exactly k − 1 jobs at time t and exactly one arrival occurs in [t, t + Δ] which joined the queue at the cavity; this occurs with probability:

Q_{3,k} = λd (∫_0^Δ π_0(t + δ) dδ) (u_{k−1}(t) − u_k(t)) Σ_{j=0}^{d−1} (1/(j+1)) (d−1 choose j) (u_{k−1}(t) − u_k(t))^j · u_k(t)^{d−1−j} + o(Δ)   (14)
= λ (∫_0^Δ π_0(t + δ) dδ) (u_{k−1}(t)^d − u_k(t)^d) + o(Δ).   (15)

We now obtain u_k(t + Δ) = Q_{1,k} + Q_{2,k} + Q_{3,k}; subtracting u_k(t) on both sides, dividing by Δ and taking the limit Δ → 0 yields (10). For k = 1, we define Q_{1,1}, Q_{2,1} and Q_{3,1} as for k ≥ 2. A fourth possibility is that the queue at the cavity is empty at time t and it experiences an arrival due to the memory induced arrival rate; this yields:

Q_{4,1} = λ (∫_0^Δ ((1 − π_0(t + δ))/(u_0(t + δ) − u_1(t + δ))) dδ) (u_0(t) − u_1(t)) + o(Δ).

One then obtains u_1(t + Δ) = Q_{1,1} + Q_{2,1} + Q_{3,1} + Q_{4,1}; subtracting u_1(t), dividing both sides by Δ and taking the limit Δ → 0 yields (11). Finally, u_0(t) = 1 is trivial by the definition of u_0(t).

From the transient regime, we are able to deduce the equilibrium queue length distribution:

Theorem 1.
Consider the SQ(d) policy with memory, exponential job sizes with mean 1/µ and arrival rate λ < µ. Let u_k be the equilibrium probability that the queue at the cavity has k or more jobs, then

u_k = ρ^{(d^k − 1)/(d − 1)} · π_0^{(d^{k−1} − 1)/(d − 1)} = (ρπ_0^{1/d})^{(d^k − 1)/(d − 1)} / π_0^{1/d},   (16)

for k ≥ 1 and ρ = λ/µ.

Proof. Taking the limit of t → ∞ in (10)-(11) we find that the following holds:

0 = λπ_0 (u_0^d − u_1^d) + λ(1 − π_0)/(1 − ρ) · (u_0 − u_1) − µ(u_1 − u_2),
0 = λπ_0 (u_{k−1}^d − u_k^d) − µ(u_k − u_{k+1}), for k ≥
2. Summing all of these equations yields u_1 = ρ, while taking the sum over k ≥ j implies that µ u_j = λπ_0 u_{j−1}^d for j ≥
2. This simple recurrence relation has (16) as its unique solution.

Comparing (16) with the solution of (1), we see that u_k is identical to the setting without memory if we replace ρ by ρπ_0^{1/d} and divide by π_0^{1/d} (even for k = 1).

Theorem 2.
Let 0 < λ < µ be arbitrary and R the response time of the SQ(d) policy with memory, exponential job sizes with mean 1/µ and arrival rate λ. Further, let ˜R denote the response time for the same system without memory, but with arrival rate λπ_0^{1/d}; then ˜R and R have the same distribution.

Proof. Let us denote by u_k and v_k the probability that the queue at the cavity has at least k jobs for the system with and without memory, respectively. We have u_k = π_0^{−1/d} · v_k for k ≥ 1 and u_0 = v_0 = 1. Let ¯F_X be the ccdf of X, then

¯F_R(w) = (1 − π_0) e^{−µw} + π_0 Σ_{k=0}^∞ (u_k^d − u_{k+1}^d) Σ_{n=0}^{k} ((µw)^n / n!) e^{−µw},

as with probability (1 − π_0) the job joins an idle queue from memory (meaning the response time is simply exponential) and with probability π_0 (u_k^d − u_{k+1}^d) the job joins a queue with length k (yielding an Erlang k + 1 response time). Exchanging the order of the sums and using π_0 u_k^d = v_k^d, for k ≥
1, implies that

¯F_R(w) = (1 − π_0) e^{−µw} + Σ_{n=1}^∞ ((µw)^n / n!) e^{−µw} v_n^d + π_0 e^{−µw} = Σ_{n=0}^∞ ((µw)^n / n!) e^{−µw} v_n^d.

Similarly,

¯F_˜R(w) = Σ_{k=0}^∞ (v_k^d − v_{k+1}^d) Σ_{n=0}^{k} ((µw)^n / n!) e^{−µw} = Σ_{n=0}^∞ ((µw)^n / n!) e^{−µw} v_n^d.

5.1.2. Phase Type job sizes

Phase Type (PH) distributions consist of all distributions which have a modulating finite background Markov chain (see also [12]). They form a broad spectrum of distributions, as any positive valued distribution can be approximated arbitrarily closely by a PH distribution. Moreover, various fitting tools are available online for PH distributions (e.g. [11, 18]). A PH distribution with ¯G(0) = 1 is fully characterized by a stochastic vector α = (α_i)_{i=1}^n and a subgenerator matrix A = (a_{i,j})_{i,j=1}^n such that ¯G(w) = α e^{Aw} 𝟙, where 𝟙 is a column vector of ones.

We find that the result of Theorem 2 generalizes to the case of PH distributed job sizes.

Theorem 3.
Let 0 < λ < µ (with 1/µ the mean of the job size distribution) be arbitrary and R the response time for a memory dependent version of the SQ(d) policy with PH distributed job sizes with parameters (α, A). Further, let ˜R denote the response time for the classic SQ(d) policy with the same job size distribution and arrival rate λπ_0^{1/d}; then R and ˜R have the same distribution.

Proof. Let us denote by u_{k,j}(t) resp. v_{k,j}(t) the probability that, at time t, the queue at the cavity has at least k jobs and the job at the head of the queue is in phase j, for the memory dependent scheme resp. the memory independent scheme. Furthermore, let u_{k,j} and v_{k,j} denote the limits of these values as t → ∞. We first show that u_{k,j} = π_0^{−1/d} · v_{k,j}. Throughout we let ν = −A𝟙 (with 𝟙 a vector consisting of only ones). For v_{k,j} we find, by an analogous reasoning as in [22], that for k ≥ 2:

d/dt v_{k,j}(t) = λπ_0^{1/d} · ((v_{k−1,j}(t) − v_{k,j}(t))/(v_{k−1}(t) − v_k(t))) · (v_{k−1}(t)^d − v_k(t)^d) + Σ_{j′} (v_{k,j′}(t) A_{j′,j} + v_{k+1,j′}(t) ν_{j′} α_j),   (17)

where v_k(t) denotes Σ_j v_{k,j}(t) (further on, we also use this notation for v_k, u_k(t) and u_k). For k = 1 we find:

d/dt v_{1,j}(t) = α_j λπ_0^{1/d} (1 − v_1(t)^d) + Σ_{j′} (v_{1,j′}(t) A_{j′,j} + v_{2,j′}(t) ν_{j′} α_j).   (18)

Taking the limit of t to infinity and multiplying by π_0^{−1/d}, we find that (17) yields for the equilibrium distribution (with k ≥ 2):

0 = π_0 λ ((π_0^{−1/d} v_{k−1,j} − π_0^{−1/d} v_{k,j})/((π_0^{−1/d} v_{k−1}) − (π_0^{−1/d} v_k))) · ((π_0^{−1/d} v_{k−1})^d − (π_0^{−1/d} v_k)^d) + Σ_{j′} ((π_0^{−1/d} v_{k,j′}) A_{j′,j} + (π_0^{−1/d} v_{k+1,j′}) ν_{j′} α_j).
(19)

while for k = 1 one may compute from (18):

0 = α_j λ (1 − π_0 (π_0^{−1/d} v_1)^d) + Σ_{j′} ((π_0^{−1/d} v_{1,j′}) A_{j′,j} + (π_0^{−1/d} v_{2,j′}) ν_{j′} α_j).   (20)

For (u_{k,j}(t)) with k ≥
2, we find the same ODE as (17) but with λπ_0(t) rather than λπ_0^{1/d}. Taking the limit t → ∞, it is not hard to see that u_{k,j} satisfies (19) with π_0^{−1/d} v_{k,j} replaced by u_{k,j}. Furthermore, for u_{1,j}(t) we find (similar to Proposition 2):

d/dt u_{1,j}(t) = λα_j π_0(t)(1 − u_1(t)^d) + λα_j (1 − π_0(t)) + Σ_{j′} (u_{1,j′}(t) A_{j′,j} + u_{2,j′}(t) ν_{j′} α_j).

Taking t → ∞, it is not hard to see how this equation for u_{1,j} reduces to (20) with π_0^{−1/d} v_{1,j} replaced by u_{1,j}. This shows that we indeed have u_{k,j} = π_0^{−1/d} v_{k,j} for all k and j.

For the response time distribution, we denote by X_{k,j} the response time of a job that joins a queue with length k whose head-of-line job is in phase j. We find for the memory dependent policy:

¯F_R(w) = (1 − π_0) ¯G(w) + π_0 ((1 − u_1^d) ¯G(w) + Σ_{k=1}^∞ Σ_j ((u_{k,j} − u_{k+1,j})/(u_k − u_{k+1})) · (u_k^d − u_{k+1}^d) P{X_{k,j} > w})
= (1 − (π_0^{1/d} u_1)^d) ¯G(w) + Σ_{k=1}^∞ Σ_j ((π_0^{1/d} u_{k,j} − π_0^{1/d} u_{k+1,j})/(π_0^{1/d} u_k − π_0^{1/d} u_{k+1})) · ((π_0^{1/d} u_k)^d − (π_0^{1/d} u_{k+1})^d) P{X_{k,j} > w}.

One can now easily check that R and ˜R indeed coincide.

5.1.3. General job sizes

We further generalize the results of Section 5.1.2 to the case of general job sizes. In particular, we show the following result:
Theorem 4.
Let 0 < λ < µ (with 1/µ the mean of the job size distribution) be arbitrary and R the response time for a memory dependent version of the SQ(d) policy with an arbitrary job size distribution. Further, let ˜R denote the response time for the classic SQ(d) policy with the same job size distribution and arrival rate λπ_0^{1/d}; then R and ˜R have the same distribution.

Proof. Let us denote by x_k(t, w) resp. y_k(t, w) the density at which, at time t, the queue at the cavity has exactly k jobs and the job at the head of the queue has a remaining size exactly equal to w, for the memory dependent scheme resp. the memory independent scheme. Associated to these values, we denote u_k(t) = ∫_0^∞ Σ_{ℓ≥k} x_ℓ(t, w) dw and v_k(t) = ∫_0^∞ Σ_{ℓ≥k} y_ℓ(t, w) dw. Furthermore, let x_k(w), y_k(w) and u_k, v_k denote the limits of these values as t → ∞. We first show that x_k(w) = π_0^{−1/d} · y_k(w) (and consequently also u_k = π_0^{−1/d} · v_k).

Let us first consider x_k(t, w) for k ≥
2. Analogously to the proof for exponential and Phase Type job sizes, we obtain:

x_k(t + Δ, w) = x_k(t, w + Δ)
− λd x_k(t, w + Δ) ∫_0^Δ π_0(t + δ) Σ_{j=0}^{d−1} (1/(j+1)) (d−1 choose j) (u_k(t + δ) − u_{k+1}(t + δ))^j u_{k+1}(t + δ)^{d−1−j} dδ
+ λd x_{k−1}(t, w + Δ) ∫_0^Δ π_0(t + δ) Σ_{j=0}^{d−1} (1/(j+1)) (d−1 choose j) (u_{k−1}(t + δ) − u_k(t + δ))^j u_k(t + δ)^{d−1−j} dδ
+ ∫_0^Δ x_{k+1}(t, δ) g(w + Δ − δ) dδ + o(Δ).

Subtracting x_k(t, w) on both sides, dividing both sides by Δ and taking the limit Δ → 0+, we obtain the following system of IDEs:

∂x_k(t, w)/∂t − ∂x_k(t, w)/∂w = −λπ_0(t) (x_k(t, w)/x_k(t)) (u_k(t)^d − u_{k+1}(t)^d) + λπ_0(t) (x_{k−1}(t, w)/x_{k−1}(t)) (u_{k−1}(t)^d − u_k(t)^d) + x_{k+1}(t, 0+) g(w).

Taking the limit of t → ∞ we obtain:

x_k′(w) = λπ_0 (x_k(w)/x_k) (u_k^d − u_{k+1}^d) − λπ_0 (x_{k−1}(w)/x_{k−1}) (u_{k−1}^d − u_k^d) − x_{k+1}(0+) g(w).   (21)

A differential equation for the system without memory can be inferred from (21) by setting π_0 = 1 and replacing λ by λπ_0^{1/d}:

y_k′(w) = λπ_0^{1/d} (y_k(w)/y_k) (v_k^d − v_{k+1}^d) − λπ_0^{1/d} (y_{k−1}(w)/y_{k−1}) (v_{k−1}^d − v_k^d) − y_{k+1}(0+) g(w).

Multiplying both sides by π_0^{−1/d}, we find that y_k satisfies the following (for k ≥ 2):

(π_0^{−1/d} y_k(w))′ = λπ_0 ((π_0^{−1/d} y_k(w))/(π_0^{−1/d} y_k)) ((π_0^{−1/d} v_k)^d − (π_0^{−1/d} v_{k+1})^d) − λπ_0 ((π_0^{−1/d} y_{k−1}(w))/(π_0^{−1/d} y_{k−1})) ((π_0^{−1/d} v_{k−1})^d − (π_0^{−1/d} v_k)^d) − (π_0^{−1/d} y_{k+1}(0+)) g(w),

which is identical to (21) if we replace x_k with π_0^{−1/d} y_k.

It remains to look at the case k = 1. For this case, the arrivals we need to consider are those which occur when the queue at the cavity is empty. Therefore, we need to consider two types of arrivals: those which occur because the queue at the cavity is in the memory and those which occur because the queue at the cavity is selected by the SQ(d) policy.
For the arrivals incurred by the memory we find:

$$\lim_{t\to\infty} \lim_{\Delta\to 0^+} \frac{\lambda \int_0^\Delta \left(1-\pi_0(t+\delta)\right) g(w+\Delta-\delta)\,d\delta + o(\Delta)}{\Delta} = \lambda(1-\pi_0)\, g(w).$$

The arrivals incurred from the SQ($d$) policy are similar to the case $k \geq 2$; we obtain that $x_1'(w)$ satisfies:

$$x_1'(w) = \lambda\pi_0\,\frac{x_1(w)}{x_1}\left(u_1^d - u_2^d\right) - \lambda\pi_0\left(1-u_1^d\right) g(w) - x_2(0^+)\, g(w) - \lambda(1-\pi_0)\, g(w). \quad (22)$$

For the system without memory we replace $\pi_0$ by 1 and $\lambda$ by $\lambda\pi_0^{1/d}$. If we then multiply both sides by $\pi_0^{-1/d}$, we obtain:

$$\left(\pi_0^{-1/d} y_1(w)\right)' = \lambda\pi_0\,\frac{\pi_0^{-1/d} y_1(w)}{\pi_0^{-1/d} y_1}\left((\pi_0^{-1/d} v_1)^d - (\pi_0^{-1/d} v_2)^d\right) - \lambda g(w) + \lambda\pi_0\, g(w)\,(\pi_0^{-1/d} v_1)^d - \left(\pi_0^{-1/d} y_2(0^+)\right) g(w). \quad (23)$$

It is not hard to see that (22) and (23) are equivalent (with $x_k$ replaced by $\pi_0^{-1/d} y_k$). This shows that we indeed have $x_k(w) = \pi_0^{-1/d} y_k(w)$ for all $k \geq 1$ and $w \geq 0$. For the response time distribution we find:

$$\bar F_R(w) = (1-\pi_0)\bar G(w) + \pi_0\left(1-u_1^d\right)\bar G(w) + \pi_0 \sum_{k=1}^\infty \int_0^w \frac{x_k(s)}{x_k}\left(u_k^d - u_{k+1}^d\right) P\{G^*_k > w-s\}\,ds$$
$$= \left(1 - (\pi_0^{1/d} u_1)^d\right)\bar G(w) + \sum_{k=1}^\infty \int_0^w \frac{x_k(s)}{x_k}\left((\pi_0^{1/d} u_k)^d - (\pi_0^{1/d} u_{k+1})^d\right) P\{G^*_k > w-s\}\,ds.$$

Analogously one can compute $\bar F_{\tilde R}$ to complete the proof.

Remark 1.
When $d = 1$ in Theorem 4, the system without memory reduces to an ordinary M/G/1 queue with arrival rate $\lambda\pi_0^{1/d}$ for which many results exist. In particular, we find from the Pollaczek-Khinchin formula that the following holds:

$$R^*(w) = \frac{\left(1 - \pi_0^{1/d}\rho\right) w\, G^*(w)}{\pi_0^{1/d}\lambda\, G^*(w) + w - \pi_0^{1/d}\lambda} \quad (24)$$

with $R^*$ and $G^*$ the Laplace transforms of $R$ and $G$, respectively. Using the ISM scheme presented in Section 3.5, this allows one to analyze the JIQ policy with finite memory by plugging $\pi_0 = (1 - (1-\rho)^{1/(A+1)})/\rho$ into (24) (see also Proposition 4).

Using the ideas in Theorem 4 we are able to show that $u_1 = \rho$ holds:

Proposition 3.
For the memory dependent SQ($d$) policy with general job sizes we have $u_1 = \rho$.

Proof. We use the same notation as in the proof of Theorem 4. Furthermore, we denote $\tilde x_k(w) = \int_w^\infty x_k(u)\,du$. We now wish to show that $u_1 = \sum_{k=1}^\infty \int_0^\infty x_k(w)\,dw = \rho$.

Integrating (22) from $w$ to $\infty$, we find:

$$x_1(w) = -\lambda\pi_0\,\frac{\tilde x_1(w)}{x_1}\left(u_1^d - u_2^d\right) + \lambda\pi_0\left(1-u_1^d\right)\bar G(w) + x_2(0^+)\bar G(w) + \lambda(1-\pi_0)\bar G(w). \quad (25)$$

For $k \geq 2$ we analogously find (integrating (21) from $w$ to infinity):

$$x_k(w) = -\lambda\pi_0\,\frac{\tilde x_k(w)}{x_k}\left(u_k^d - u_{k+1}^d\right) + \lambda\pi_0\,\frac{\tilde x_{k-1}(w)}{x_{k-1}}\left(u_{k-1}^d - u_k^d\right) + x_{k+1}(0^+)\bar G(w). \quad (26)$$

It is now easy to see from taking the sum of (25) and (26) (for all $k \geq 2$) that for any $w$:

$$u_1(w) = \lambda\left(1 - \pi_0 u_1^d\right)\bar G(w) + u_2(0^+)\bar G(w). \quad (27)$$

Integrating this expression from 0 to infinity, we obtain:

$$u_1 = \left(u_2(0^+) + \lambda\left(1 - \pi_0 u_1^d\right)\right) E[G]. \quad (28)$$

Furthermore, it is not hard to see that we have for any $k \geq 2$:

$$u_k(t+\Delta) = u_k(t) - \int_0^\Delta x_k(t+\delta, \Delta-\delta)\,d\delta + \lambda \int_0^\Delta \pi_0(t+\delta)\left(u_{k-1}(t+\delta)^d - u_k(t+\delta)^d\right) d\delta + o(\Delta),$$

which yields

$$u_k'(t) = -x_k(t, 0^+) + \lambda\pi_0(t)\left(u_{k-1}(t)^d - u_k(t)^d\right);$$

letting $t \to \infty$ this leads to:

$$0 = -x_k(0^+) + \lambda\pi_0\left(u_{k-1}^d - u_k^d\right).$$

Taking the sum of these equations for $k \geq 2$ yields $u_2(0^+) = \lambda\pi_0 u_1^d$. Using this allows us to conclude that $u_1 = \lambda E[G] = \rho$ from (28).

In the following Proposition, we obtain $\pi_0$ for the ISM memory scheme presented in Section 3.5.

Proposition 4.
For the SQ($d$) policy with general job sizes and the ISM memory scheme presented in Section 3.5 we have

$$\pi_0 = \frac{1 - \left(1 - \rho^d\right)^{1/(A+1)}}{\rho^d}.$$

Proof.
We use the same notation as in the proof of Proposition 3. The rate at which servers send probes is equal to $x_1(0^+)$ (which is equal to the rate at which servers become idle). Therefore, the memory state evolves as a birth-death process with birth rate $x_1(0^+)$ and death rate $\lambda$. From taking the limit $w \to 0^+$ in (27), we find that $x_1(0^+) = \lambda(1 - \pi_0 u_1^d) = \lambda(1 - \pi_0\rho^d)$. We consequently find that, due to the birth-death structure:

$$\pi_0 = \frac{1}{\sum_{i=0}^{A} \left(1 - \pi_0\rho^d\right)^i} = \frac{\pi_0\rho^d}{1 - \left(1 - \pi_0\rho^d\right)^{A+1}}.$$

From this one easily completes the proof.

In particular, Proposition 4 holds for $d = 1$, which provides a closed form of $\pi_0$ for JIQ with finite memory size.

LL($d$)

For LL($d$), we again start by describing the transient regime (the proof is similar to the one presented in [9]).

Proposition 5.
The density of the cavity process associated to the memory dependent LL($d$) policy satisfies the following Partial Integro Differential Equations (PIDEs):

$$\frac{\partial f(t,w)}{\partial t} - \frac{\partial f(t,w)}{\partial w} = \lambda d\,\pi_0(t) \int_0^w f(t,u)\,\bar F(t,u)^{d-1} g(w-u)\,du + \lambda\pi_0(t)\left(1 - \bar F(t,0)^d\right) g(w) - \lambda d\,\pi_0(t)\, f(t,w)\,\bar F(t,w)^{d-1} + \lambda\left(1-\pi_0(t)\right) g(w) \quad (29)$$

$$\frac{\partial \bar F(t,0)}{\partial t} = -f(t,0^+) + \lambda\pi_0(t)\left(1 - \bar F(t,0)^d\right) + \lambda\left(1 - \pi_0(t)\right), \quad (30)$$

for $w > 0$, where $f(x, z^+) = \lim_{y \downarrow z} f(x,y)$.

Proof. Assume $w > 0$ and $\Delta > 0$ small. For LL($d$), we write:

$$f(t+\Delta, w) = Q_{1,w} + Q_{2,w} + Q_{3,w}. \quad (31)$$

For $Q_{1,w}$ we consider the case where no arrivals occur in the interval $[t, t+\Delta]$: if the cavity queue at time $t$ has a workload exactly equal to $w + \Delta$ and receives no arrivals in $[t, t+\Delta]$, it has a workload equal to $w$ at time $t+\Delta$. Therefore we find:

$$Q_{1,w} = f(t, w+\Delta) - \lambda d \int_0^\Delta \pi_0(t+\delta)\, f(t+\delta, w+\Delta-\delta)\,\bar F(t+\delta, w+\Delta-\delta)^{d-1}\,d\delta + o(\Delta).$$

For $Q_{2,w}$ we consider the case where a single arrival occurs when the queue at the cavity is busy: in this case at some time $t+\delta$, $\delta \in [0,\Delta]$, an arrival of size $w+\Delta-u$ occurs, while the queue at the cavity has workload $u-\delta$ for some $u \in (\delta, w+\Delta]$. This arrival only joins the queue at the cavity if the other $d-1$ selected queues have a workload exceeding $u-\delta$, hence we find:

$$Q_{2,w} = \lambda d \int_0^\Delta \pi_0(t+\delta) \int_{u=\delta}^{w+\Delta} f(t+\delta, u-\delta)\,\bar F(t+\delta, u-\delta)^{d-1}\, g(w+\Delta-u)\,du\,d\delta + o(\Delta).$$

For $Q_{3,w}$ we consider the case where the queue at the cavity is idle and a job of size $w+\Delta-\delta$ arrives at time $t+\delta$ for some $\delta \in [0,\Delta]$. Hence,

$$Q_{3,w} = \lambda \int_0^\Delta \pi_0(t+\delta)\left(1 - \bar F(t+\delta, 0)^d\right) g(w+\Delta-\delta)\,d\delta + \lambda \int_0^\Delta \left(1 - \pi_0(t+\delta)\right) g(w+\Delta-\delta)\,d\delta + o(\Delta).$$

By subtracting $f(t, w+\Delta)$, dividing by $\Delta$ and letting $\Delta$ decrease to zero, we find (29) from (31).

We still require an equation for $\bar F(t,0)$. The queue at the cavity has workload zero at time $t+\Delta$ by remaining idle in $[t, t+\Delta]$ or by having a workload equal to $\Delta-\delta$, $\delta < \Delta$, at time $t+\delta$. We therefore find:

$$F(t+\Delta, 0) = F(t,0) - \lambda \int_0^\Delta \pi_0(t+\delta)\left(1 - \bar F(t+\delta, 0)^d\right) d\delta - \lambda \int_0^\Delta \left(1 - \pi_0(t+\delta)\right) d\delta + \int_0^\Delta f(t+\delta, \Delta-\delta)\,d\delta + o(\Delta);$$

subtracting $F(t,0)$, dividing by $\Delta$ and letting $\Delta$ decrease to zero yields (30).

We are now in a position to determine the equilibrium workload distribution of the LL($d$) policy with memory:

Theorem 5.
The ccdf of the equilibrium workload distribution for the cavity process associated to an LL($d$) policy with memory satisfies the following IDE:

$$\bar F'(w) = -\lambda \left[ \bar G(w) + \pi_0 \cdot \left( -\bar F(w)^d + \int_0^w \bar F(u)^d\, g(w-u)\,du \right) \right], \quad (32)$$

with boundary condition $\bar F(0) = \rho$. Equivalently we have:

$$\bar F(w) = \rho - \lambda \int_0^w \left(1 - \pi_0 \bar F(u)^d\right) \bar G(w-u)\,du, \quad (33)$$

with $\pi_0$ the probability that the memory is empty.

Proof. To show this result, one first lets $t \to \infty$ in (29)-(30); this way we remove the terms $\partial f(t,w)/\partial t$ and $\partial \bar F(t,0)/\partial t$. One then integrates (29) once and uses (30) as a boundary condition. Using Fubini's theorem, simple integration techniques and the fact that $f(w) = -\bar F'(w)$ we obtain (32). The last equality (33) can be shown by integrating once more and applying Fubini's theorem.

We can rewrite (33) as

$$\pi_0^{1/d} \bar F(w) = E[G]\left(\lambda\pi_0^{1/d}\right) - \left(\lambda\pi_0^{1/d}\right) \int_0^w \left(1 - (\pi_0^{1/d}\bar F(u))^d\right) \bar G(w-u)\,du.$$

This shows that $\bar F(w)$ in a system with memory is equal to the same probability in a system without memory with arrival rate $\lambda\pi_0^{1/d}$, divided by $\pi_0^{1/d}$. Due to (4) we therefore have the following corollary:

Corollary 1.
The equilibrium workload of the queue at the cavity of an LL($d$) system with memory and exponential job sizes is given by

$$\bar F(w) = \left( \rho\pi_0 + \left(\rho^{1-d} - \rho\pi_0\right) e^{(d-1)w} \right)^{-1/(d-1)}. \quad (34)$$

We are now able to show our main result for a memory dependent LL($d$) policy:

Theorem 6.
Let $0 < \rho = \lambda E[G] < 1$ be arbitrary and $R$ the response time of the memory dependent LL($d$) policy with mean job size $E[G]$ and arrival rate $\lambda$. Further, let $\tilde R$ denote the response time for the same system without memory, but with arrival rate $\lambda\pi_0^{1/d}$; then $R$ and $\tilde R$ have the same distribution.

Proof. Let $\bar F(w)$ and $\bar H(w)$ be the ccdf of the workload for the system with and without memory, respectively. We have $\bar F(w)\,\pi_0^{1/d} = \bar H(w)$, which yields:

$$\bar F_R(w) = (1-\pi_0)\bar G(w) + \pi_0 \left[ \int_0^w \bar F(w-u)^d\, g(u)\,du + \bar G(w) \right] = \bar G(w) + \int_0^w \bar H(w-u)^d\, g(u)\,du,$$

which can easily be seen to be equal to $\bar F_{\tilde R}(w)$.

By using the results in this section, one can easily generalise many of the results presented in [9], including an analytical proof that LL($d$) outperforms SQ($d$) and closed form solutions for the response time distribution, mean response time and mean workload.

Proposition 6.
For the LL($d$) policy with the ISM memory scheme presented in Section 3.5 we have

$$\pi_0 = \frac{1 - \left(1 - \rho^d\right)^{1/(A+1)}}{\rho^d}$$

for any job size distribution.

Proof. The rate at which servers send probes is equal to $f(0) = -\bar F'(0)$ and it follows from (32) that $f(0) = \lambda(1 - \pi_0\rho^d)$. The memory state therefore evolves as a birth-death process with birth rate $\lambda(1 - \pi_0\rho^d)$ and death rate $\lambda$. The remainder of the proof is therefore identical to the proof of Proposition 4.

Setup   N=10     N=20     N=50     N=100    N=200
1       1.8839   1.5363   1.3556   1.3059   1.2832
2       1.4533   1.3119   1.2313   1.2045   1.1926
3       1.5906   1.3860   1.2787   1.2399   1.2215
4       1.9086   1.3981   1.1643   1.1158   1.0999
5       2.3918   2.0132   1.8200   1.7733   1.7407
6       1.7583   1.5920   1.4943   1.4578   1.4404
7       2.0504   1.8040   1.6643   1.6161   1.5901
8       2.2790   1.5924   1.2950   1.2352   1.2186

Setup   N=500    N=1000   N=3000   Cavity Method
1       1.2683   1.2638   1.2574   1.2583
2       1.1836   1.1810   1.1794   1.1787
3       1.2110   1.2097   1.2068   1.2058
4       1.0928   1.0921   1.0896   1.0888
5       1.7252   1.7178   1.7146   1.7138
6       1.4314   1.4304   1.4257   1.4256
7       1.5753   1.5736   1.5667   1.5660
8       1.2097   1.2096   1.2070   1.2056

Table 1: Comparison of mean response time for the finite system and the cavity method.
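The closed form above is easy to sanity-check numerically: it solves the birth-death fixed-point equation used in the proofs of Propositions 4 and 6, and for $d = 1$ it can be combined with the Pollaczek-Khinchin mean-value formula of Remark 1 to evaluate JIQ with finite memory. A minimal sketch, with illustrative parameter values:

```python
# Sketch: verify that pi0 = (1 - (1 - rho^d)^(1/(A+1)))/rho^d solves the
# fixed point pi0 = pi0*rho^d / (1 - (1 - pi0*rho^d)^(A+1)) of the
# birth-death memory process, and plug it (d = 1) into the P-K mean-value
# formula for JIQ with finite memory. All parameter values are illustrative.

def pi0_ism(rho, d, A):
    # closed form of Propositions 4 and 6
    return (1 - (1 - rho**d) ** (1 / (A + 1))) / rho**d

def fixed_point_residual(rho, d, A):
    p = pi0_ism(rho, d, A)
    q = p * rho**d                      # birth rate / death rate ratio
    return p - q / (1 - (1 - q) ** (A + 1))

def mean_response_jiq(lam, EG, EG2, A):
    # M/G/1 with reduced arrival rate lam*pi0 (Remark 1, d = 1):
    # E[R] = E[G] + lam_eff*E[G^2] / (2*(1 - lam_eff*E[G]))
    lam_eff = lam * pi0_ism(lam * EG, 1, A)
    return EG + lam_eff * EG2 / (2 * (1 - lam_eff * EG))

print(abs(fixed_point_residual(0.9, 5, 10)) < 1e-12)
print(mean_response_jiq(0.8, 1.0, 2.0, 5))  # exponential jobs, lam = 0.8
```

The residual is zero up to rounding for any valid parameter choice, since the closed form was obtained by solving this fixed point exactly.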
6. Finite System Accuracy
The results presented in Section 5 all focused on the cavity process of SQ($d$) and LL($d$) with memory. In Table 1 we present simulation results which illustrate that the stationary mean response time in a finite stochastic system consisting of $N$ servers converges to the mean response time obtained using the cavity method. We simulated systems with $N$ = 10, 20, 50, 100, 200, 500, 1000 and 3000 servers and overall arrival rate $\lambda N$; the runtime was set inversely proportional to $N$ and we used a warm-up period equal to a third of the runtime. Job sizes have mean one and are either exponential or hyperexponential with balanced means and a Squared Coefficient of Variation (SCV) equal to 2 or 3.

The following 8 arbitrarily chosen settings have been considered:

Setup 1: $\lambda = 0.9$, exponential job sizes and the IP memory scheme.

Setup 2: $\lambda = 0.8$, exponential job sizes and the CP memory scheme (meaning memory is of infinite size).

Setup 3: $\lambda = 0.8$, hyperexponential job sizes with SCV equal to 2 and the BCP memory scheme with $A = 5$.

Setup 4: $\lambda = 0.85$, hyperexponential job sizes with SCV equal to 3 and the ISM memory scheme with $A = 10$.

Setups 5 through 8 are the same as 1 through 4, but using SQ($d$) rather than LL($d$). In all cases the mean response time appears to converge towards the response time of the cavity method. Note that in the last two setups we are considering SQ($d$) with memory and hyperexponential job sizes. In this case the response time of the cavity method is simply computed as the response time in the same system without memory, but with arrival rate $\lambda\pi_0^{1/d}$.

Figure 1: Performance of the different memory schemes for SQ(5) with exponential job sizes with mean one. (a) Mean response time. (b) Number of probes used per arrival. (c) Prob. of having empty memory.
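A toy version of the finite-system experiments above can be sketched as follows. It implements a simplified unbounded-memory scheme in which every server that turns idle registers itself with the dispatcher (this is not exactly any of the schemes of Section 3), falls back to SQ($d$) probing when the memory is empty, and estimates the mean response time via Little's law; all parameter values are illustrative and no warm-up period is used.

```python
import heapq, random

def simulate(N=100, lam=0.8, d=5, horizon=1000.0, seed=1):
    """Toy event-driven simulation: Poisson(lam*N) arrivals, exp(1) jobs.
    Idle servers register in an unbounded dispatcher memory; arrivals use
    a registered idle server when available and SQ(d) probing otherwise."""
    rng = random.Random(seed)
    q = [0] * N                  # queue length per server
    idle = set(range(N))         # dispatcher memory: ids of idle servers
    dep = []                     # (departure time, server) heap
    t = prev = area = 0.0        # area = integral of the total job count
    jobs = 0
    next_arr = rng.expovariate(lam * N)
    while t < horizon:
        if dep and dep[0][0] < next_arr:          # next event: departure
            t, s = heapq.heappop(dep)
            area += jobs * (t - prev); prev = t
            q[s] -= 1; jobs -= 1
            if q[s] == 0:
                idle.add(s)                        # server registers as idle
            else:
                heapq.heappush(dep, (t + rng.expovariate(1.0), s))
        else:                                      # next event: arrival
            t, next_arr = next_arr, next_arr + rng.expovariate(lam * N)
            area += jobs * (t - prev); prev = t
            s = idle.pop() if idle else min(rng.sample(range(N), d),
                                            key=q.__getitem__)
            q[s] += 1; jobs += 1
            if q[s] == 1:                          # server was idle
                heapq.heappush(dep, (t + rng.expovariate(1.0), s))
    return area / (horizon * lam * N)              # Little: E[R] = E[L]/lambda

print(simulate())
```

Since this variant always finds an idle server when one exists, its mean response time stays close to the mean job size at moderate loads; it is intended only to illustrate the simulation setup, not to reproduce Table 1.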
7. Numerical Example
In this section we briefly demonstrate the type of numerical results that can be obtained using our findings. This section is not intended as a detailed comparison of the different memory schemes presented in Section 3.

Figure 1 focuses on the SQ(5) policy with exponential job sizes with mean one and a memory size $A$ of 4 (except for CP). For the BCP and ISM memory schemes the dispatcher is assumed to send its $d$ probes one at a time (if memory is empty upon a job arrival) and stops probing as soon as an idle server is found. This is also the case for the setting without memory (labeled No memory). For the CP memory scheme we assume that the dispatcher has infinite memory. We plot the mean response times, the probability of having empty memory $\pi_0$ and the average number of probes/messages used per job arrival.

In Figure 1a we see that the mean response time is nearly optimal for all schemes when the load is low; the IP scheme sends its $d$ probes at once (which is faster). Looking at both the mean response time and number of probes/messages used, the ISM scheme is clearly best in this case.

In Figure 1c we look at the probability of having an empty memory when a job arrives. For the IP scheme and a load $\lambda \approx 0$, the dispatcher almost always discovers 5 idle servers and therefore $\pi_0$ is close to $1/5$. For (B)CP we note that as long as the load is sufficiently low (that is, $5(1-\lambda) > 1$, or equivalently $\lambda < 4/5$), we have $\pi_0 \approx 0$, but for larger $\lambda$ values it sharply increases to one, the increase setting in when $\lambda \approx 4/5$. For the ISM scheme, $\pi_0$ remains nearly constant provided that $\lambda$ is sufficiently small:

$$\pi_0 \approx \frac{1}{5} = \frac{1}{A+1} = \lim_{\rho\to 0^+} \frac{1 - \left(1-\rho^d\right)^{1/(A+1)}}{\rho^d},$$

which is independent of $d$. Only when $\lambda$ is close to one does $\pi_0$ start a very steep climb to one.
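The ISM curve of Figure 1c can be reproduced directly from the closed form of Proposition 4; a minimal sketch with the parameters of this section ($d = 5$, $A = 4$):

```python
# Sketch: pi0 for the ISM scheme (Proposition 4) with d = 5 and A = 4, as in
# Figure 1c: pi0 stays near 1/(A+1) = 0.2 for small and moderate loads and
# climbs steeply towards one only when lam approaches 1.

def pi0_ism(lam, d=5, A=4):
    return (1 - (1 - lam**d) ** (1 / (A + 1))) / lam**d

for lam in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(lam, round(pi0_ism(lam), 4))
```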
8. Mean Field Limit Under Heavy Traffic
Throughout this section and Section 9, we assume job sizes are exponential with mean equal to one. The assumption that the mean equals one is merely a technicality to ease notation. Our goal is to compute the limit:

$$\lim_{\lambda\to 1^-} \frac{-E[R_\lambda]}{\log(1-\lambda)}, \quad (35)$$

where $R_\lambda$ denotes the response time for some SQ($d$)/LL($d$) memory based load balancing policy. To this end, we employ the framework developed in [10]. Note that this limit gives an indication of the performance of the load balancing policy under a high load. Moreover, it is easy to see that this limit remains unchanged if we swap the mean response time by either the mean waiting time or the mean queue length/workload. To emphasize that $\pi_0$ depends on $\lambda$, we denote $\pi_0$ as $\pi_0(\lambda)$ in this section. Define

$$T_\lambda(x) = \lambda\pi_0(\lambda)\, x^d, \quad (36)$$

and note that $(u_k)_k$ for SQ($d$) resp. $\bar F(w)$ for LL($d$) satisfy the relations $u_{k+1} = T_\lambda(u_k)$ resp. $\bar F'(w) = T_\lambda(\bar F(w)) - \bar F(w)$.

Theorem 7. For the memory dependent SQ($d$) policy, provided that $\lim_{\lambda\to 1^-} \pi_0'(\lambda) < \infty$ and $\lim_{\lambda\to 1^-} \pi_0(\lambda) = 1$, we obtain the heavy traffic limit:

$$\lim_{\lambda\to 1^-} \frac{-E[R^{(SQ)}_\lambda]}{\log(1-\lambda)} = \frac{1}{\log(d)}, \quad (37)$$

while for the LL($d$) variant we have:

$$\lim_{\lambda\to 1^-} \frac{-E[R^{(LL)}_\lambda]}{\log(1-\lambda)} = \frac{1}{d-1}. \quad (38)$$

Proof.
We validate the requirements (a)-(g) of [10], from which the result directly follows.

(a) In this step we should show there exists some continuous function $u_\cdot : \lambda \mapsto u_\lambda$ such that $u_\lambda \in (1,\infty)$, $T_\lambda(u_\lambda) = u_\lambda$ and $\lim_{\lambda\to 1^-} u_\lambda = 1$. For our model, it is not hard to find an explicit formula for $u_\lambda$, namely:

$$u_\lambda = \left(\lambda\pi_0(\lambda)\right)^{1/(1-d)}.$$

(b) One trivially verifies that $T_\lambda(0) = 0$, and for any $u \in (0,1)$ we have $T_\lambda(u) < u$.

(c) For $d \geq 2$, we define $h_\lambda(x) = (u_\lambda - T_\lambda(u_\lambda - x))/x$ and show that it is decreasing on $[u_\lambda - 1, u_\lambda]$. To this end we consider $(x^2 h_\lambda'(x))'$:

$$\left(x^2 h_\lambda'(x)\right)' = -\lambda\pi_0(\lambda)\, d(d-1)\, x\, (u_\lambda - x)^{d-2},$$

which is negative. As one easily verifies that $x^2 h_\lambda'(x)$ equals zero in $x = 0$, this indeed shows $h_\lambda$ is decreasing on $[u_\lambda - 1, u_\lambda]$.

(d) This is a technicality which is automatically satisfied because $h_\lambda$ is decreasing on $[u_\lambda - 1, u_\lambda]$, which we showed in the previous step.

(e) For this step we need to compute the value of:

$$A = \lim_{\lambda\to 1^-} h_\lambda(u_\lambda - 1) = \lim_{\lambda\to 1^-} \frac{u_\lambda - \lambda\pi_0(\lambda)}{u_\lambda - 1} = \lim_{\lambda\to 1^-} \frac{u_\lambda' - \pi_0(\lambda) - \lambda\pi_0'(\lambda)}{u_\lambda'}. \quad (39)$$

It is not hard to see that:

$$u_\lambda' = \frac{1}{1-d}\left(\lambda\pi_0(\lambda)\right)^{-d/(d-1)} \cdot \left(\pi_0(\lambda) + \lambda\pi_0'(\lambda)\right).$$

Using $\lim_{\lambda\to 1^-} \pi_0(\lambda) = 1$, we obtain (continuing from (39)):

$$A = \lim_{\lambda\to 1^-} \frac{\frac{1+\pi_0'(\lambda)}{1-d} - 1 - \pi_0'(\lambda)}{\frac{1+\pi_0'(\lambda)}{1-d}} = d.$$

(f) For this step, we need to compute the value

$$B = \lim_{\lambda\to 1^-} \frac{\log(u_\lambda - 1)}{\log(1-\lambda)} = \lim_{\lambda\to 1^-} \frac{-(1-\lambda)\, u_\lambda'}{u_\lambda - 1};$$

at this point, we use the assumption that $\lim_{\lambda\to 1^-} \pi_0'(\lambda) < \infty$, as this implies that $\lim_{\lambda\to 1^-} u_\lambda' < \infty$, allowing us to use l'Hopital only on $\frac{1-\lambda}{u_\lambda - 1}$, by which it trivially follows that $B = 1$.

(g) For the last step, we should verify that $\lim_{\varepsilon\to 0^+} \lim_{\lambda\to 1^-} h_\lambda(\varepsilon) = A$. Indeed,

$$\lim_{\varepsilon\to 0^+} \lim_{\lambda\to 1^-} h_\lambda(\varepsilon) = \lim_{\varepsilon\to 0^+} \frac{1 - (1-\varepsilon)^d}{\varepsilon} = d = A.$$

It is not hard to see that Theorem 7 applies to all policies described in Section 3, except the ISM policy which we discussed in Section 3.5. Indeed, ISM is the only policy for which $\lim_{\lambda\to 1^-} \pi_0'(\lambda) = \infty$; see also Figure 1c. We therefore find the heavy traffic limit to be slightly different in case of ISM.

Theorem 8.
For the memory dependent SQ($d$) policy with ISM and a memory size equal to $A$, we obtain the heavy traffic limit:

$$\lim_{\lambda\to 1^-} \frac{-E[R^{(SQ)}_\lambda]}{\log(1-\lambda)} = \frac{1}{A+1}\,\frac{1}{\log(d)}, \quad (40)$$

while for the LL($d$) variant we have:

$$\lim_{\lambda\to 1^-} \frac{-E[R^{(LL)}_\lambda]}{\log(1-\lambda)} = \frac{1}{A+1}\,\frac{1}{d-1}. \quad (41)$$

Proof.
One can copy the proof of Theorem 7, except for step (f), as it was used that $\lim_{\lambda\to 1^-} \pi_0'(\lambda) < \infty$, while it is easy to verify that this limit is indeed infinite for ISM. Let us first compute $u_\lambda'$; using (9) we find:

$$u_\lambda' = \left[ \left(\lambda\pi_0(\lambda)\right)^{1/(1-d)} \right]' = \left[ \lambda \left(1 - (1-\lambda^d)^{1/(A+1)}\right)^{-1/(d-1)} \right]'$$
$$= \frac{(d-1)(A+1)\left(1 - (1-\lambda^d)^{1/(A+1)}\right) - d\lambda^d\, (1-\lambda^d)^{-A/(A+1)}}{(d-1)(A+1)\left(1 - (1-\lambda^d)^{1/(A+1)}\right)^{d/(d-1)}}.$$

Using $\lim_{\lambda\to 1^-} \frac{1-\lambda}{1-u_\lambda} = 0$, we obtain:

$$B = \lim_{\lambda\to 1^-} \frac{(1-\lambda)\, u_\lambda'}{1-u_\lambda} = -\lim_{\lambda\to 1^-} \left[ \frac{1-\lambda}{1-u_\lambda}\,\frac{d\lambda^d}{(d-1)(A+1)}\,\frac{(1-\lambda^d)^{-A/(A+1)}}{\left(1 - (1-\lambda^d)^{1/(A+1)}\right)^{d/(d-1)}} \right]$$
$$= -\frac{d}{(d-1)(A+1)} \lim_{\lambda\to 1^-} \left[ \frac{1-\lambda}{(1-\lambda^d)^{A/(A+1)}\,(1-u_\lambda)} \right] = -\frac{d}{(d-1)(A+1)} \lim_{\lambda\to 1^-} \frac{1-\lambda}{1-\lambda^d} \cdot \lim_{\lambda\to 1^-} \frac{(1-\lambda^d)^{1/(A+1)}}{1-u_\lambda}$$
$$= -\frac{1}{(d-1)(A+1)} \cdot \lim_{\lambda\to 1^-} \frac{(1-\lambda^d)^{1/(A+1)}}{1 - \lambda\left(1 - (1-\lambda^d)^{1/(A+1)}\right)^{-1/(d-1)}}.$$

For ease of notation, let us define $\xi = (1-\lambda^d)^{1/(A+1)}$. We find that the above simplifies to:

$$B = -\frac{1}{(d-1)(A+1)} \lim_{\xi\to 0^+} \frac{\xi\,(1-\xi)^{1/(d-1)}}{(1-\xi)^{1/(d-1)} - (1-\xi^{A+1})^{1/d}} = \frac{1}{A+1},$$

where the last equality follows from a final use of l'Hopital's rule. Combining this with the proof of Theorem 7, we may conclude the proof.
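The quantities appearing in the two heavy-traffic proofs can be sanity-checked numerically; a minimal sketch (with arbitrary parameter values and $\pi_0$ taken from the ISM closed form) verifying the fixed point of step (a), the constant $A = d$ of step (g), and the limit $B = 1/(A+1)$ above:

```python
# Sketch: numeric checks for Theorems 7 and 8 (illustrative parameters).
# We verify that u = (lam*pi0)^(1/(1-d)) is a fixed point of T_lam (36),
# that h(eps) = (1-(1-eps)^d)/eps -> d at lam = 1 (step (g)), and that the
# final limit in the proof of Theorem 8 equals 1/(A+1).

def checks(lam, d, A):
    pi0 = (1 - (1 - lam**d) ** (1 / (A + 1))) / lam**d   # ISM, Proposition 4
    T = lambda x: lam * pi0 * x**d                        # equation (36)
    u = (lam * pi0) ** (1 / (1 - d))                      # step (a)
    fixed_point_err = abs(T(u) - u)

    eps = 1e-8                                            # step (g): h -> d
    h = (1 - (1 - eps) ** d) / eps

    xi = 1e-6                                             # Theorem 8: B -> 1/(A+1)
    num = xi * (1 - xi) ** (1 / (d - 1))
    den = (1 - xi) ** (1 / (d - 1)) - (1 - xi ** (A + 1)) ** (1 / d)
    B = -num / den / ((d - 1) * (A + 1))
    return fixed_point_err, h, B

err, h, B = checks(lam=0.9, d=2, A=4)
print(err, h, B)   # err ~ 0, h ~ 2, B ~ 1/5
```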
9. Mean Field Limit Under Low Traffic
In this section, we investigate the system in the low traffic limit rather than the heavy traffic limit. In particular, we investigate the behaviour of the expected waiting time in the low traffic limit, i.e.

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1}.$$

Here $R^{(1)}_\lambda$, $R^{(2)}_\lambda$ denote the response time distributions of two different load balancing policies with arrival rate $\lambda$. We take the quotient of the expected waiting times rather than response times as the quotient for the expected response times is trivially one for any 2 load balancing policies. This quantity signifies the quality of a policy under a low arrival rate. In particular we have the following result:

Proposition 7.
Let $R^{(i)}_\lambda$ ($i = 1, 2$) denote the response time for a memory dependent load balancing policy with probability $\pi^{(i)}_0(\lambda)$ of having an empty memory, using either SQ($d_i$) or LL($d_i$). Furthermore, we assume job sizes are exponentially distributed with mean 1. We find:

1. If $d_1 < d_2$, we have

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1} = \infty.$$

2. If $d_1 = d_2 = d$ and both policies use the same strategy (either SQ($d$) or LL($d$)), we have:

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1} = \lim_{\lambda\to 0^+} \frac{\pi^{(1)}_0(\lambda)}{\pi^{(2)}_0(\lambda)}.$$

3. If $d_1 = d_2 = d$ and $R^{(1)}_\lambda$ employs the SQ($d$) policy while $R^{(2)}_\lambda$ uses the LL($d$) policy, we find:

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1} = \lim_{\lambda\to 0^+} \frac{d\,\pi^{(1)}_0(\lambda)}{\pi^{(2)}_0(\lambda)}.$$

Proof.
From (2) with arrival rate $\lambda\pi_0^{1/d}$ and mean job size equal to 1, one finds that the expected response time for SQ($d$) is given by:

$$E[R^{(SQ(d))}_\lambda] - 1 = \sum_{n=2}^\infty \left(\lambda\pi_0^{1/d}\right)^{\frac{d^n - d}{d-1}}.$$

By Theorem 2, we find that this expression corresponds to the mean response time of a memory based SQ($d$) policy. Analogously, for LL($d$) and using Theorem 6, we obtain that the expected response time for a memory based LL($d$) policy with exponential job sizes of mean one is given by:

$$E[R_\lambda] - 1 = \sum_{n=1}^\infty \frac{\left(\lambda\pi_0^{1/d}\right)^{dn}}{n(d-1)+1}. \quad (42)$$

In order to compute the sought limits, one only retains the terms with the lowest power of $\lambda$. For example, assume $d_1 < d_2$ and we wish to compare LL($d_1$) with LL($d_2$); it follows from (42) that:

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1} = \lim_{\lambda\to 0^+} \frac{\lambda^{d_1}\,\pi^{(1)}_0(\lambda)}{\lambda^{d_2}\,\pi^{(2)}_0(\lambda)} \cdot \frac{d_2}{d_1} = \infty.$$

As a second example let us consider case (3); we find:

$$\lim_{\lambda\to 0^+} \frac{E[R^{(1)}_\lambda] - 1}{E[R^{(2)}_\lambda] - 1} = \lim_{\lambda\to 0^+} \frac{\left(\lambda\,\pi^{(1)}_0(\lambda)^{1/d}\right)^d}{\left(\lambda\,\pi^{(2)}_0(\lambda)^{1/d}\right)^d / d} = \lim_{\lambda\to 0^+} \frac{d\,\pi^{(1)}_0(\lambda)}{\pi^{(2)}_0(\lambda)}.$$

Remark 2. For the methods discussed in Section 3 we can easily compute the limit $\lim_{\rho\to 0^+} \pi_0(\rho)$. Indeed, by elementary calculus we find:

• For IP we have $\lim_{\rho\to 0^+} \pi_0(\rho) = 1/d$.

• For CP and BCP we have $\lim_{\rho\to 0^+} \pi_0(\rho) = 0$.

• For ISM with memory size $A$ we have $\lim_{\rho\to 0^+} \pi_0(\rho) = \frac{1}{A+1}$.

In particular, we see that, while ISM is the dominant policy in heavy traffic, it does not perform as well in low traffic. See also Figure 1c for an example.
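Case (3) of Proposition 7 can be checked numerically from the two series used in the proof; a minimal sketch in which both policies are given the same memory state probability (taken equal to one here for simplicity), so the waiting-time ratio should approach $d$:

```python
# Sketch: low-traffic comparison of SQ(d) and LL(d) from the series in the
# proof of Proposition 7, with identical pi0 for both policies (set to 1),
# so the waiting-time ratio SQ(d)/LL(d) tends to d as lam -> 0+.

def wait_sq(lam_eff, d, terms=10):
    # E[R]-1 for SQ(d): sum over n >= 2 of lam_eff^((d^n - d)/(d - 1))
    return sum(lam_eff ** ((d**n - d) / (d - 1)) for n in range(2, terms))

def wait_ll(lam_eff, d, terms=10):
    # E[R]-1 for LL(d), equation (42): sum of lam_eff^(d*n)/(n*(d-1)+1)
    return sum(lam_eff ** (d * n) / (n * (d - 1) + 1) for n in range(1, terms))

d, lam = 3, 1e-2
print(wait_sq(lam, d) / wait_ll(lam, d))   # close to d = 3
```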
10. Conclusions and Future Work
In this paper we studied the cavity process of the SQ($d$) and LL($d$) load balancing policies with memory. The main insight provided was that the response time distribution of the cavity process with memory is identical to the response time distribution of the cavity process of the system without memory if the arrival rate is properly set. This result holds for a large variety of memory schemes, including the ones presented in Section 3. This insight allowed us to analyse the heavy and low traffic limits. Simulation results were presented which suggest that the cavity process corresponds to the exact limit process as the number of servers tends to infinity.

As future work, it may be possible to prove that the cavity process is the proper limit process. For SQ($d$) with exponential job sizes, one can build upon the framework of [3], whilst for LL($d$) it might be possible to extend the framework in [19] to prove the ansatz for general job sizes.

References

[1] Reza Aghajani, Xingjie Li, and Kavita Ramanan. 2017. The PDE Method for the Analysis of Randomized Load Balancing Networks. Proceedings of the ACM on Measurement and Analysis of Computing Systems 1, 2 (2017), 38.
[2] Jonatha Anselmi and Francois Dufour. 2020. Power-of-d-choices with memory: Fluid limit and optimality. Mathematics of Operations Research (2020).
[3] Michel Benaim and Jean-Yves Le Boudec. 2008. A class of mean field interaction models for computer and communication systems. Performance Evaluation 65, 11-12 (2008), 823–838.
[4] M. Bramson, Y. Lu, and B. Prabhakar. 2010. Randomized load balancing with general service time distributions. In ACM SIGMETRICS 2010. 275–286. https://doi.org/10.1145/1811039.1811071
[5] M. Bramson, Y. Lu, and B. Prabhakar. 2012. Asymptotic independence of queues under randomized load balancing. Queueing Syst. 71, 3 (2012), 247–292. https://doi.org/10.1007/s11134-012-9311-0
[6] Anton Braverman. 2018. Steady-state analysis of the Join the Shortest Queue model in the Halfin-Whitt regime. arXiv preprint arXiv:1801.05121 (2018).
[7] Sergey Foss and Alexander L Stolyar. 2017. Large-scale join-idle-queue system with general service times. Journal of Applied Probability 54, 4 (2017), 995–1007.
[8] David Gamarnik, John N Tsitsiklis, Martin Zubeldia, et al. 2020. A lower bound on the queueing delay in resource constrained load balancing. Annals of Applied Probability 30, 2 (2020), 870–901.
[9] T. Hellemans and B. Van Houdt. 2018. On the Power-of-d-choices with Least Loaded Server Selection. Proceedings of the ACM on Measurement and Analysis of Computing Systems 2, 2 (2018), 27.
[10] Tim Hellemans and Benny Van Houdt. 2020. Heavy Traffic Analysis of the Mean Response Time for Load Balancing Policies in the Mean Field Regime. arXiv preprint arXiv:2004.00876 (2020).
[11] Jan Kriege and Peter Buchholz. 2014. PH and MAP fitting with aggregated traffic traces. In Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance. Springer, 1–15.
[12] Guy Latouche and Vaidyanathan Ramaswami. 1999. Introduction to matrix analytic methods in stochastic modeling. Vol. 5. SIAM.
[13] Yi Lu, Qiaomin Xie, Gabriel Kliot, Alan Geller, James R Larus, and Albert Greenberg. 2011. Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation 68, 11 (2011), 1056–1071.
[14] M. Mitzenmacher. 2001. The Power of Two Choices in Randomized Load Balancing. IEEE Trans. Parallel Distrib. Syst. 12, 10 (October 2001), 1094–1104.
[15] Michael Mitzenmacher. 2019. The Supermarket Model with Known and Predicted Service Times. arXiv preprint arXiv:1905.12155 (2019).
[16] Michael Mitzenmacher, Balaji Prabhakar, and Devavrat Shah. 2002. Load balancing with memory. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings. IEEE, 799–808.
[17] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. 2013. Sparrow: Distributed, Low Latency Scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 69–84. https://doi.org/10.1145/2517349.2522716
[18] Perform. Eval. 64, 7-8 (Aug. 2007), 629–645. https://doi.org/10.1016/j.peva.2006.09.002
[19] Seva Shneer and Alexander Stolyar. 2020. Large-scale parallel server system with multi-component jobs. arXiv preprint arXiv:2006.11256 (2020).
[20] A. L. Stolyar. 2015. Pull-based load distribution in large-scale heterogeneous service systems. Queueing Systems 80, 4 (2015), 341–361. https://doi.org/10.1007/s11134-015-9448-8
[21] Mark van der Boor, Sem Borst, and Johan van Leeuwaarden. 2019. Hyper-scalable JSQ with sparse feedback. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3, 1 (2019), 1–37.
[22] Ignace Van Spilbeeck and Benny Van Houdt. 2015. Performance of rate-based pull and push strategies in heterogeneous networks. Performance Evaluation 91 (2015), 2–15.
[23] N.D. Vvedenskaya, R.L. Dobrushin, and F.I. Karpelevich. 1996. Queueing System with Selection of the Shortest of Two Queues: an Asymptotic Approach.