Stability, memory, and messaging tradeoffs in heterogeneous service systems
arXiv [cs.PF]
By David Gamarnik, John N. Tsitsiklis, and Martin Zubeldia. July, 2020.
We consider a heterogeneous distributed service system, consisting of n servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate λn, with 0 < λ < 1, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the n servers. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers.

We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be maximally stable, i.e., stable whenever the processing rates are such that the arrival rate is less than the total available processing rate. First, for the case of Poisson arrivals and exponential service times, we present a policy that is maximally stable while using a positive (but arbitrarily small) message rate, and log(n) bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges o(n^2) messages per unit of time, and with o(log(n)) bits of memory, cannot be maximally stable. Thus, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.

CONTENTS
1 Introduction
  1.1 Previous work
  1.2 Our contribution
2 Model and main results
  2.1 Modeling assumptions
  2.2 A maximally stable policy
  2.3 A general class of dispatching policies
  2.4 Instability of resource constrained policies
  2.5 Stability versus resources tradeoff
3 Conclusions and future work
A Proof of Theorem 2.1
B Proof of Theorem 2.2
  B.1 Local limitations of symmetry and finite memory
  B.2 High arrival rate to slow servers
References
Author's addresses
1. Introduction.
Distributed service systems are pervasive, from the checkout lines at the supermarket, to server farms for cloud computing. At a high level, many of these systems involve a stream of incoming jobs that are dispatched to a distinct queue associated with one of the servers (see Figure 1 for a stylized model). Naturally, the behavior and performance of such systems depends on the dispatching policy.
Fig 1. Parallel server queueing system with a central dispatcher.
While delay performance and stability are important factors when choosing how to operate these systems, the huge number of servers in applications such as multi-core processors and data centers has led to a desire for low communication and memory requirements. On the other hand, communication between the dispatcher and the servers, as well as memory at the dispatcher, allow the dispatcher to obtain and store information about the current state of the queues and about the characteristics of the servers, leading to better dispatching decisions. This points to a tradeoff between the resources utilized (in terms of communication overhead and memory), and the attainable delay performance and stability of the system.

In this paper, we consider a heterogeneous distributed service system, where servers can have different and unknown processing rates, and explore the tradeoff between the stability region of the system and the amount of communication overhead and memory used to gather and store relevant information. This complements the work in [5, 6], where the authors explore the tradeoff between the delay performance and the amount of communication overhead and memory in a system with identical servers. In particular, in the setting of [5, 6] stability was easy to achieve (even with a static, randomized policy), and the focus was on the queueing delay going to zero (as the arrival rate and the number of servers jointly increase). In the present context, stability becomes an issue: the dispatcher must either "learn" the rates of the different servers (and store this information in its memory), or must use some dynamic queue-size information to stabilize the system.

1.1. Previous work.
There is a wide range of policies for operating the system described above, which result in different delay performances, stability regions, and resource utilizations. For example, the simplest policy is to dispatch jobs uniformly at random. This policy requires no message exchanges and no memory, but it is unstable if some server is slow enough. At the opposite extreme, the dispatcher can use dynamically available information and send incoming jobs to a shortest queue. This policy results in small delay and is maximally stable [3], but requires substantial communication overhead and an unbounded memory.

Many intermediate policies have been proposed and analyzed in the past, with a focus on low resource usage. Most notably, the Power-of-d-Choices (also known as SQ(d)) was introduced and analyzed in [8, 13], and results in relatively low average delays for the jobs, while requiring a message rate proportional to the arrival rate, and no memory. However, the blind randomization used by the policy renders it unstable if there is at least one sufficiently slow server. Another popular policy is Join-Idle-Queue [7, 11], which leverages the power of memory (one bit per server) to obtain vanishing queueing delays (as the arrival rate and the number of servers jointly increase) while using roughly the same amount of communication overhead as the Power-of-d-Choices. However, this policy also utilizes blind randomization that renders it unstable if there is at least one sufficiently slow server [2].

Recently, there has been a focus on policies that attain a vanishing queueing delay while minimizing their resource usage. In particular, in [9] a variation of the Power-of-d-Choices was shown to yield a vanishing queueing delay while using no memory, and a message rate that is superlinear in the arrival rate.
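As a point of reference, the Power-of-d-Choices rule mentioned above is easy to state in code. The sketch below is our own illustration (the function name and encoding of queue lengths are not from the paper): sample d servers uniformly at random and join the shortest of the sampled queues.

```python
import random

def power_of_d(queues, d, rng=random):
    """SQ(d): sample d distinct servers uniformly at random and return
    the index of a shortest queue among the sampled ones."""
    sampled = rng.sample(range(len(queues)), d)
    return min(sampled, key=lambda i: queues[i])
```

Note that the sampling is blind: it never learns which servers are slow, which is exactly why this rule fails to be maximally stable in the heterogeneous setting.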
Moreover, variations of Join-Idle-Queue were shown to have vanishing queueing delays with either a memory of size (in bits) superlogarithmic in the number of servers and a message rate equal to the arrival rate [5], or a memory size (in bits) equal to the number of servers and a message rate strictly smaller than (but still proportional to) the arrival rate [12]. Last but not least, a novel combination of size-based load balancing and Round-Robin was shown to have vanishing queueing delay using unbounded memory and no communication overhead [1].

On the other hand, there are few policies in the literature that focus on maximizing the stability region. In [10] the authors present and analyze a variation of Power-of-d-Choices that utilizes memory (of size logarithmic in the number of servers) to guarantee maximal stability. Furthermore, in [2] the authors propose yet another variation of Join-Idle-Queue, dubbed Persistent-Idle, that achieves maximal stability, without any randomization. This policy requires a message rate proportional to the arrival rate, and a memory of size (in bits) at least proportional to the number of servers.

1.2. Our contribution.
Instead of focusing on yet another policy or decision making architecture, we step back and address a more fundamental question: What are the message rate (from dispatcher to servers and from servers to dispatcher combined) and/or memory size requirements that are necessary and sufficient in order for a policy to be maximally stable? We are able to provide a fairly complete answer to this question.

a) For the case of Poisson arrivals and exponential service times: If the message rate is positive and the memory size (in bits) is logarithmic in the number of servers, we provide a fairly simple and natural policy that is maximally stable.

b) If the message rate is sublinear in the square of the arrival rate and the number of memory bits is sublogarithmic in the number of servers, we show that no decision making architecture and policy, within a certain broad class of policies, is maximally stable. The main constraint that we impose on the policies that we consider is that they are "weakly symmetric", in a sense to be defined later.

In a nutshell, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.

Remark. Our proposed policy is more economical than the most efficient maximally stable policy analyzed in earlier literature, the Power-of-d-Choices with memory policy [10], which requires a memory of size (in bits) at least logarithmic in the number of servers, and a message rate proportional to the arrival rate. In contrast, our proposed policy requires a memory size (in bits) logarithmic in the number of servers, and an arbitrarily small message rate.
2. Model and main results.
In this section, we present our modeling assumptions and main results. We present a unified framework for a broad set of dispatching policies, which includes most of the policies studied in the previous literature, and then present our negative result on the failure of maximal stability to hold for resource-constrained policies within this framework.

2.1. Modeling assumptions.
We consider a distributed service system consisting of n parallel servers, where each server is associated with an infinite capacity FIFO queue. For each i ∈ {1, . . . , n}, the i-th server has a constant (but unknown) service rate µ_i > 0. Despite the heterogeneity in the service rates, we assume that the total processing power of all servers is equal to n. Thus, the set of possible service rate vectors is

(2.1)   Σ_n ≜ { µ ∈ (0, ∞)^n : Σ_{i=1}^n µ_i = n }.

Jobs arrive to the system as a single renewal process of rate λn (for some fixed λ ∈ (0, 1)). Messages sent by the i-th server can only contain information about the state of its own queue (number of remaining jobs and the remaining workload of each one) and about its processing rate µ_i. Within this context, a system designer has the freedom to choose a messaging policy, as well as the rules for updating the memory and for selecting the destination of an incoming job.

Regarding the performance metric, our focus is on the stability region of a policy under the arrival rate λ, i.e., the largest subset of server rates Γ_n(λ) ⊂ Σ_n such that the policy is stable for all µ ∈ Γ_n(λ). We will formalize this definition in Subsection 2.4.

2.2. A maximally stable policy.
In this subsection we propose a simple dispatching policy with the largest possible stability region (i.e., with stability region equal to Σ_n, for all λ ∈ (0, 1)).

2.2.1. Policy description.
For any fixed value of n, we consider the following policy. At any time, the dispatcher stores the ID of a single server in its memory. This ID is initialized in an arbitrary way, and it is updated based on spontaneous messages from the servers. In particular, each server sends messages to the dispatcher as an independent Poisson process of rate α_n > 0, informing the dispatcher of its queue length (i.e., of the number of jobs in its queue or in service). When a message from a server arrives to the dispatcher, the dispatcher stores the ID of this server only if the sender's queue is shorter than the queue of the server that is currently stored in memory. In order to make this comparison, the queue length of the currently stored server is obtained by sending a query to it. Finally, whenever a new job arrives to the system, it is sent to the server whose ID is stored in the dispatcher's memory (the server ID in memory does not change at this point).
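For concreteness, the policy just described can be simulated directly in the Markovian case. The following sketch is our own illustration (the function name and the parameter values are not part of the model); it counts three messages per spontaneous-message event: the server's report, the query to the stored server, and its reply.

```python
import random

def simulate(n, lam, mu, alpha, num_events, seed=1):
    """Gillespie-style simulation of the memory-of-one-ID policy with
    Poisson arrivals of rate lam*n and exponential services of rates mu."""
    rng = random.Random(seed)
    q = [0] * n            # queue lengths (jobs waiting or in service)
    stored = 0             # the single server ID in the dispatcher's memory
    messages = 0           # total messages exchanged
    for _ in range(num_events):
        # competing exponential clocks: one arrival stream, the busy
        # servers, and each server's spontaneous-message clock (rate alpha)
        w = [lam * n] + [mu[i] * (q[i] > 0) for i in range(n)] + [alpha] * n
        r = rng.random() * sum(w)
        k = 0
        while r > w[k]:
            r -= w[k]
            k += 1
        if k == 0:                    # arrival: join the stored server
            q[stored] += 1
        elif k <= n:                  # service completion at server k-1
            q[k - 1] -= 1
        else:                         # spontaneous message from server k-1-n
            i = k - 1 - n
            messages += 3             # report, query to stored server, reply
            if q[i] < q[stored]:
                stored = i            # remember the shorter queue
    return q, stored, messages

# illustrative run: n = 4 heterogeneous servers, with rates summing to n
q, stored, messages = simulate(4, 0.9, [0.4, 0.6, 1.0, 2.0], 1.0, 5000)
```

The three-messages-per-event accounting is what produces the average message rate of 3α_n n noted in the remark that follows.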
Remark. This policy requires only ⌈log_2(n)⌉ bits of memory, and an arbitrarily small (but positive) average message rate of 3α_n n.

2.2.2. Main result.
When the arrival process is Poisson and the service times are exponentially distributed, the behavior of the system under this policy can be modeled as a continuous-time Markov chain (Q(·), I(·)), where Q(·) = (Q_1(·), . . . , Q_n(·)) is the vector of queue lengths and I(·) is the ID of the server stored in memory. In this setting, the stability of the policy is established in the following result.

Theorem 2.1. Suppose that the arrival process is Poisson, and that the job sizes are exponentially distributed. For any n, if α_n > 0, then the stability region of the policy described above is Σ_n, for all λ ∈ (0, 1).

This stability result is established by constructing an appropriate Lyapunov function. The proof is given in Appendix A.

Theorem 2.1 states that, at least in the Markovian case, the stability region of our proposed policy is the whole set of admissible rates Σ_n. Moreover, it implies that ⌈log_2(n)⌉ bits of memory and an average message rate of 3α_n n (which can be arbitrarily small) are sufficient for a policy to be always stable. We conjecture that our policy is always stable, even with renewal arrivals and generally distributed service times.

While Theorem 2.1 ensures stability even when the message rate is arbitrarily small, a small message rate will result in poor steady-state delay performance of the policy. Indeed, since the policy sends all incoming jobs to the same queue between consecutive messages, a small message rate leads to large build-ups of jobs in the queues. In particular, we conjecture that the steady-state queueing delay of our policy is of order Θ(1/α_n).

Given this apparent tradeoff between the average message rate and the delay performance of the policy, the proposed policy is most useful for applications where a large stability region and a small communication overhead are preferred, and where there is tolerance for large delays.
Furthermore, since the policy does not depend explicitly on the rates of the servers (or estimates thereof), it would continue to work even if the service rates changed slowly over time, which makes it robust.

2.3. A general class of dispatching policies.
In this subsection we present a unified framework that describes memory-based dispatching policies in systems with heterogeneous servers, which is slightly more general than the one introduced in [6].

Let c_n be the number of memory bits available to the dispatcher. We define the corresponding set of memory states to be M_n ≜ {1, . . . , 2^{c_n}}. Furthermore, we define the set of possible states at a server as the set of nonnegative sequences Q ≜ R_+^{Z_+}, where a sequence specifies the remaining workload of each job in that queue, including the one that is being served. (In particular, an idle server is represented by the zero sequence.) As long as a queue has a finite number of jobs, the queue state is a sequence that has only a finite number of non-zero entries. The reason that we include the workload of the jobs in the state is that we wish to allow for a broad class of policies that can take into account the remaining workload in the queues. In particular, we allow for information-rich messages that describe the full workload sequence at the server that sends the message. We are interested in the process

Q(·) = (Q_1(·), . . . , Q_n(·)) = ((Q_{1,j}(·))_{j=1}^∞, . . . , (Q_{n,j}(·))_{j=1}^∞),

which takes values in the set Q^n, and describes the evolution of the workload of each job in each queue. We are also interested in the process M(·) that describes the evolution of the memory state, and in a process Z(·) that describes the elapsed time since the arrival of the previous job.

2.3.1. Fundamental processes and initial conditions.
All processes of interest are driven by the following common fundamental processes:

1. Arrival process: A delayed renewal counting process A_n(·) with rate λn, and event times {T_k}_{k=1}^∞, defined on a probability space (Ω_A, A_A, P_A).
2. Spontaneous messages process: A Poisson counting process R_n(·) with rate β_n, and event times {T_k^s}_{k=1}^∞, defined on a probability space (Ω_R, A_R, P_R).
3. Job sizes: A sequence of i.i.d. random variables {W_k}_{k=1}^∞ with mean one, defined on a probability space (Ω_W, A_W, P_W).
4. Randomization variables: Eight independent and individually i.i.d. sequences of random variables {U_{1,k}}_{k=1}^∞, . . . , {U_{8,k}}_{k=1}^∞, uniform on [0, 1], defined on a probability space (Ω_U, A_U, P_U).
5. Initial conditions: Random variables Q(0), M(0), and Z(0), defined on a common probability space (Ω_0, A_0, P_0).

The whole system will be defined on the associated product probability space

(Ω_A × Ω_R × Ω_W × Ω_U × Ω_0, A_A × A_R × A_W × A_U × A_0, P_A × P_R × P_W × P_U × P_0),

to be denoted by (Ω, A, P). All of the randomness in the system originates from these fundamental processes, and everything else is a deterministic function of them.

2.3.2. A construction of sample paths.
We provide a construction of a Markov process (Q(·), M(·), Z(·)), taking values in the set Q^n × M_n × R_+. The memory process M(·) is piecewise constant, and can only jump at the time of an event. All processes considered will have the càdlàg property (right-continuous with left limits) either by assumption (e.g., the underlying fundamental processes) or by construction.

There are three types of events: job arrivals, spontaneous messages, and service completions. We now describe the sources of these events, and what happens when they occur.

Job arrivals:
At the time of the k-th event of the arrival process A_n, which occurs at time T_k and involves a job with size W_k, the following transitions happen sequentially but instantaneously:

1. First, the dispatcher chooses a set S_k of distinct servers, from which it solicits information about their state, according to

S_k = f_1( M(T_k^-), W_k, U_{1,k} ),

where f_1 : M_n × R_+ × [0, 1] → P({1, . . . , n}) is a measurable function defined by the policy. Here, and in the sequel, P(A) stands for the power set of a set A.
2. Second, messages are sent to the servers in the set S_k, and the servers respond with messages containing their queue states and their service rates. This results in a total of 2|S_k| messages exchanged. Using this information, the destination of the incoming job is chosen to be

D_k = f_2( M(T_k^-), W_k, {( Q_i(T_k^-), µ_i, i ) : i ∈ S_k}, U_{2,k} ),

where f_2 : M_n × R_+ × B_n × [0, 1] → {1, . . . , n} is a measurable function defined by the policy, with B_n ⊂ P(Q × R_+ × {1, . . . , n}) comprised of those sets of triples such that the triples in a set have different third coordinates. Note that the destination of a job can depend not only on the current memory state, the job size, the set of queried servers, and the state of their queues, but also on the rates of the queried servers.

3. Third, the memory state is updated according to

M(T_k) = f_3( M(T_k^-), W_k, {( Q_i(T_k^-), µ_i, i ) : i ∈ S_k}, D_k, U_{3,k} ),

where f_3 : M_n × R_+ × B_n × {1, . . . , n} × [0, 1] → M_n is a measurable function defined by the policy. Note that the new memory state is obtained using the same information as for selecting the destination, including the rates of the queried servers, plus the destination of the job.

Spontaneous messages:
At the time of the k-th event of the spontaneous message process R_n, which occurs at time T_k^s, the i-th server sends a spontaneous message to the dispatcher if and only if

g_1( Q(T_k^s), µ, U_{4,k} ) = i,

where g_1 : Q^n × R_+^n × [0, 1] → {0, 1, . . . , n} is a measurable function defined by the policy. On the other hand, no message is sent when g_1( Q(T_k^s), µ, U_{4,k} ) = 0. Note that the dependence of g_1 on Q and µ allows the message rate at each server to depend on all servers' current workloads, and on their rates. This allows for policies that let servers with higher service rates send messages at a higher rate than servers with slower service rates.

When a spontaneous message from server i arrives to the dispatcher, the following transitions happen sequentially but instantaneously:

1. First, the dispatcher chooses a set of distinct servers S_k^s, from which it solicits information about their state, according to

S_k^s = g_2( M(T_k^{s-}), i, Q_i(T_k^s), µ_i, U_{5,k} ),

where g_2 : M_n × {1, . . . , n} × Q × R_+ × [0, 1] → P({1, . . . , n}) is a measurable function defined by the policy. Note that the set of servers that are sampled not only depends on the current memory state but also on the index, queue state, and rate of the server that sent the message.

2. Second, messages are sent to the servers in the set S_k^s, and the servers respond with messages containing their queue states and their service rates. This results in a total of 2|S_k^s| messages exchanged. Using this information, the memory is updated to the new memory state

M(T_k^s) = g_3( M(T_k^{s-}), i, Q_i(T_k^s), µ_i, {( Q_j(T_k^s), µ_j, j ) : j ∈ S_k^s}, U_{6,k} ),

where g_3 : M_n × {1, . . . , n} × Q × R_+ × B_n × [0, 1] → M_n is a measurable function defined by the policy.

Service completions:
Let {T_k^d(i)}_{k=1}^∞ be the sequence of departure times at the i-th server. At those times, the i-th server sends a message to the dispatcher if and only if

h_1( Q_i(T_k^d(i)), µ_i, U_{7,k} ) = 1,

where h_1 : Q × R_+ × [0, 1] → {0, 1} is a measurable function defined by the policy. In that case, the memory is updated to the new memory state

M(T_k^d(i)) = h_2( M(T_k^d(i)^-), i, Q_i(T_k^d(i)), µ_i, U_{8,k} ),

where h_2 : M_n × {1, . . . , n} × Q × R_+ × [0, 1] → M_n is a measurable function defined by the policy. Finally, no message is sent when h_1( Q_i(T_k^d(i)), µ_i, U_{7,k} ) = 0.

Remark. Note that this framework allows for policies that are more general than those considered in [6]. In particular, (i) some decisions can depend on the rates of the different servers, (ii) the dispatcher can sample servers whenever a spontaneous message arrives, and (iii) memory updates may involve randomization.

We now introduce a symmetry assumption on the policies.
Assumption 2.1. (Weakly symmetric policies.) We assume that the dispatching policy is weakly symmetric, in the following sense. For any given permutation of the servers σ, there exists a corresponding (not necessarily unique) permutation σ_M of the memory states M_n that satisfies both of the following properties:
1. For every m ∈ M_n and w ∈ R_+, and if U is a uniform random variable on [0, 1], then

σ( f_1(m, w, U) ) =_d f_1( σ_M(m), w, U ),

where =_d stands for equality in distribution. Note that this equality in distribution is only with respect to U.

2. For every m ∈ M_n, w ∈ R_+, S ∈ P({1, . . . , n}), q ∈ Q^n, and µ ∈ R_+^n, and if V is a uniform random variable on [0, 1], then

σ( f_2( m, w, {(q_i, µ_i, i) : i ∈ S}, V ) ) =_d f_2( σ_M(m), w, {(q_i, µ_i, σ(i)) : i ∈ S}, V ).

Remark. This assumption prevents any bias for or against a server, unless it is encoded in the memory in a sufficiently detailed way so that the assumption is satisfied. For example, in order to implement (in a weakly symmetric way) the randomized dispatching policy where incoming jobs are sent to a server with a probability proportional to its processing rate, the second condition in Assumption 2.1 requires the dispatching probabilities to be encoded in memory, in a sufficiently detailed way.
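To see what the first condition asks for in a simple case, the sketch below (our own illustration, with hypothetical names) checks it for a memoryless uniform d-subset sampler, for which σ_M can be taken to be the identity: σ(f_1(m, w, U)) and f_1(σ_M(m), w, U) have the same distribution, which we verify by discretizing U.

```python
import itertools
from collections import Counter

N, D, GRID = 5, 2, 200

def f1(m, w, u):
    """Uniform d-subset sampler: u in [0, 1) indexes one of the C(N, D)
    subsets; independent of the memory state m and the job size w."""
    subsets = list(itertools.combinations(range(N), D))
    return frozenset(subsets[int(u * len(subsets)) % len(subsets)])

def dist(fn):
    """Distribution of fn(U) for U discretized on a uniform grid."""
    return Counter(fn((k + 0.5) / GRID) for k in range(GRID))

sigma = {0: 3, 1: 0, 2: 4, 3: 1, 4: 2}   # a permutation of the servers
lhs = dist(lambda u: frozenset(sigma[i] for i in f1(0, 1.0, u)))
rhs = dist(lambda u: f1(0, 1.0, u))      # sigma_M(m) = m for this sampler
assert lhs == rhs                        # equality in distribution
```

Any policy whose sampling distribution depends on server identities (rather than on identities stored in memory) would fail this check.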
Remark. Note that the universally stable policy introduced in Subsection 2.2.1 falls within the class of policies defined by this general framework, and it satisfies Assumption 2.1.

2.4. Instability of resource constrained policies.
In this subsection we state the main result about the instability of general weakly symmetric dispatching policies. Before stating this main result, we first define the average message rate between the dispatcher and the servers as

lim sup_{t→∞} (1/t) [ Σ_{k=1}^{A_n(t)} 2|S_k| + Σ_{k=1}^{R_n(t)} (1 + 2|S_k^s|) 1_{{1,...,n}}( g_1(Q(T_k^s), µ, U_{4,k}) ) + Σ_{i=1}^n Σ_{k : T_k^d(i) ≤ t} h_1( Q_i(T_k^d(i)), µ_i, U_{7,k} ) ].

Theorem 2.2. Suppose that the dispatching policy is weakly symmetric (i.e., satisfies Assumption 2.1), uses o(log(n)) bits of memory, and has an average message rate of order o(n^2). Then, for all n large enough, the policy is not maximally stable, i.e., its stability region is a strict subset of Σ_n.

2.5. Stability versus resources tradeoff. In this subsection, we provide a visual summary of our results on the tradeoff between the stability region, and the memory and communication overhead of weakly symmetric dispatching policies.

First, according to Theorem 2.1, with a memory size of at least ⌈log_2(n)⌉ bits and with an arbitrarily small message rate, we can obtain a weakly symmetric policy that is always stable (for any service rate vector in Σ_n). Second, Theorem 2.2 states that weakly symmetric policies with o(log(n)) bits of memory and a message rate of order o(n^2) cannot be always stable. Finally, note that both the Join-Shortest-Queue policy, and a policy which sends incoming jobs to each server with a probability proportional to the server's rate, can be implemented by querying all servers at the time of each arrival. These policies require a message rate of order Θ(n^2), and no memory, and they are always stable. The three regimes are depicted in Figure 2.

Fig 2. Resource requirements for stable policies.

The only remaining question in this setting is whether stability can be guaranteed with zero communication overhead and Ω(log(n)) bits of memory. In this case, no messages are exchanged, and the dispatcher can never obtain information about the rates of the servers. As a result, it can only dispatch jobs blindly, and stability fails for some server rates.
3. Conclusions and future work. In this paper, we proposed a simple but efficient dispatching policy that requires a memory of size (in bits) logarithmic in the number of servers, and an arbitrarily small message rate, and showed that it has the largest possible stability region. The key to the stability properties of this policy is the fact that it never chooses the destination of a job by random sampling of the servers (like the Power-of-d-Choices) or by random dispatching of the job (like Join-Idle-Queue). On the other hand, we showed that when we have a memory size (in bits) sublogarithmic in the number of servers, and a message rate sublinear in the square of the arrival rate, all weakly symmetric dispatching policies have a sub-optimal stability region.

There are several interesting directions for future research. For example:

(i) While policies can have the largest possible stability region using an arbitrarily small message rate and logarithmic memory, their delay performance could be arbitrarily bad. We conjecture that the average delay of a policy is at least inversely proportional to its average message rate per server.

(ii) In light of the symmetry assumption in Theorem 2.2, a natural question is whether the result still holds without it. In that case, the sampling of servers and dispatching of jobs need not be uniform (as established in Propositions B.1 and B.3 using the symmetry assumption), and it becomes unclear whether maximal stability is still impossible in the same regime.

APPENDIX A: PROOF OF THEOREM 2.1

Let us fix some n and some arbitrary vector of processing rates in Σ_n. Let µ_min and µ_max be the smallest and largest processing rates in the chosen vector, respectively. In particular, note that they are positive.

We will use the Foster-Lyapunov criterion to show that the continuous-time Markov chain (Q(·), I(·)) is positive recurrent. First, note that this process has state space Z_+^n × {1, . . . , n}.
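The Foster-Lyapunov argument that follows can be sanity-checked numerically. The sketch below is our own reconstruction (the parameters, and the constants in the Lyapunov function Ξ(q, i) = (2µ_max/α_n) q_i + Σ_j q_j^2 and in the threshold, are illustrative): it encodes the transition rates of (Q(·), I(·)) and verifies that the exact drift of Ξ is at most -1 at sample states far from the origin.

```python
def drift(q, i, lam, mu, alpha):
    """Exact mean drift of Xi(q, i) = (2*mu_max/alpha)*q_i + sum_j q_j^2:
    arrivals (rate lam*n) join the stored server i, each busy server j
    completes at rate mu_j, and a spontaneous message from a strictly
    shorter queue j (rate alpha each) moves the stored ID to j."""
    n = len(mu)
    c = 2 * max(mu) / alpha
    def xi(qv, m):
        return c * qv[m] + sum(x * x for x in qv)
    up = list(q); up[i] += 1
    d = lam * n * (xi(up, i) - xi(q, i))            # arrival
    for j in range(n):
        if q[j] >= 1:                               # service completion at j
            down = list(q); down[j] -= 1
            d += mu[j] * (xi(down, i) - xi(q, i))
        if q[j] <= q[i] - 1:                        # memory switches to j
            d += alpha * (xi(q, j) - xi(q, i))
    return d

lam, mu, alpha = 0.9, [0.2, 0.8, 2.0], 0.5          # rates sum to n = 3
threshold = (2 * lam * 3 * max(mu) / alpha + lam * 3 + 3 + 1) / (
    2 * min(1 - lam, min(mu)))
for q, i in [((142, 0, 0), 0), ((0, 0, 142), 1), ((50, 50, 50), 2)]:
    assert sum(q) >= threshold                      # state lies outside F_n
    assert drift(q, i, lam, mu, alpha) <= -1        # uniformly negative drift
```

Such a check does not prove positive recurrence, but it catches sign and coefficient errors in the drift computation.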
Its transition rates, denoted by r_{·→·}, are as follows, where we use e_j to denote the j-th unit vector in Z_+^n:

1. Since incoming jobs are sent to the queue whose ID is stored in memory, each queue sees arrivals with rate

r_{(q,i)→(q+e_i,i)} = λn.

2. Transitions due to service completions occur according to the processing rate of each server, and they do not affect the ID stored in memory:

r_{(q,i)→(q−e_j,i)} = µ_j 1_{[1,∞)}(q_j).

3. Spontaneous messages are sent from each server to the dispatcher at a rate equal to α_n, but the ID stored in memory only changes if the sender of the message has a shorter queue:

r_{(q,i)→(q,j)} = α_n 1_{[0, q_i−1]}(q_j).

4. Any transitions that do not appear in the above have zero rate.

Note that the Markov process (Q(·), I(·)) on the state space Z_+^n × {1, . . . , n} is irreducible, with all states reachable from each other. To show positive recurrence, we define the Lyapunov functions

Ξ_1(q, i) ≜ (2µ_max/α_n) q_i,   Ξ_2(q, i) ≜ Σ_{j=1}^n q_j^2,

and

(A.1)   Ξ(q, i) ≜ Ξ_1(q, i) + Ξ_2(q, i),

and note that

Σ_{(q′,i′) ≠ (q,i)} Ξ(q′, i′) r_{(q,i)→(q′,i′)} < ∞,   for all (q, i) ∈ Z_+^n × {1, . . . , n}.

We also define the finite set

(A.2)   F_n ≜ { (q, i) ∈ Z_+^n × {1, . . . , n} : Σ_{j=1}^n q_j < (2λn(µ_max/α_n) + λn + n + 1) / (2 min{1 − λ, µ_min}) }.

For any state (q, i), we have

Σ_{(q′,i′)} [Ξ_1(q′, i′) − Ξ_1(q, i)] r_{(q,i)→(q′,i′)}
= 2λn(µ_max/α_n) − (2µ_max/α_n) µ_i 1_{[1,∞)}(q_i) − 2µ_max Σ_{j=1}^n (q_i − q_j)^+
(A.3) ≤ 2λn(µ_max/α_n) − 2µ_max Σ_{j=1}^n (q_i − q_j)^+,

and

Σ_{(q′,i′)} [Ξ_2(q′, i′) − Ξ_2(q, i)] r_{(q,i)→(q′,i′)}
= λn(2q_i + 1) − Σ_{j=1}^n µ_j (2q_j − 1) 1_{[1,∞)}(q_j)
= λn(2q_i + 1) + Σ_{j=1}^n µ_j 1_{[1,∞)}(q_j) − 2 Σ_{j=1}^n µ_j q_j
(A.4) ≤ λn(2q_i + 1) + n − 2 Σ_{j=1}^n µ_j q_j,

where in the last inequality we used that the vector of server rates µ is in Σ_n, so that

(A.5)   Σ_{j=1}^n µ_j = n.

Combining equations (A.1), (A.3), and (A.4), for any state (q, i) ∉ F_n, we have

Σ_{(q′,i′)} [Ξ(q′, i′) − Ξ(q, i)] r_{(q,i)→(q′,i′)}
≤ 2λn(µ_max/α_n) + λn + n + 2λn q_i − Σ_{j=1}^n [2µ_j q_j + 2µ_max (q_i − q_j)^+]
≤ 2λn(µ_max/α_n) + λn + n + 2λn q_i − 2 Σ_{j=1}^n µ_j [q_j + (q_i − q_j)^+]
= 2λn(µ_max/α_n) + λn + n + 2λn q_i − 2 Σ_{j=1}^n µ_j max{q_i, q_j}
= 2λn(µ_max/α_n) + λn + n + 2λn q_i − 2 Σ_{j=1}^n µ_j [q_i + (q_j − q_i)^+]
= 2λn(µ_max/α_n) + λn + n + 2λn q_i − 2q_i Σ_{j=1}^n µ_j − 2 Σ_{j=1}^n µ_j (q_j − q_i)^+
(∗) = 2λn(µ_max/α_n) + λn + n − 2(1 − λ)n q_i − 2 Σ_{j=1}^n µ_j (q_j − q_i)^+
≤ 2λn(µ_max/α_n) + λn + n − 2(1 − λ)n q_i − 2µ_min Σ_{j=1}^n (q_j − q_i)^+
≤ 2λn(µ_max/α_n) + λn + n − 2 min{1 − λ, µ_min} Σ_{j=1}^n [q_i + (q_j − q_i)^+]
= 2λn(µ_max/α_n) + λn + n − 2 min{1 − λ, µ_min} Σ_{j=1}^n max{q_i, q_j}
≤ 2λn(µ_max/α_n) + λn + n − 2 min{1 − λ, µ_min} Σ_{j=1}^n q_j
≤ −1,

where in equality (∗) we used Equation (A.5), and in the last inequality we used the fact that (q, i) ∉ F_n and the definition of the finite set F_n (Equation (A.2)).
Then, the Foster-Lyapunov criterion [4] implies the positive recurrence of the Markov chain (Q(·), I(·)). Finally, since this is true for all server rates in Σ_n, we conclude that Σ_n is the stability region of the policy.

APPENDIX B: PROOF OF THEOREM 2.2

Fix λ, and consider a vector of server rates in Σ_n where ⌊n/2⌋ servers have rate ǫ_n > 0. We will show that, for any given λ, and for all ǫ_n small enough, every resource constrained dispatching policy that is weakly symmetric (i.e., satisfies Assumption 2.1) overloads the slow servers.

The high-level outline of the proof is as follows. In Subsection B.1 we show that under our weak symmetry assumption, the constraint on the number of bits available implies that the dispatcher treats all servers in a symmetric way, in some appropriate sense. Then, in Subsection B.2 we combine the results obtained in Subsection B.1 with the bound on the average message rate to show that jobs are sent to slow servers (i.e., to servers with service rate ǫ_n) with a positive rate that is bounded away from zero. This implies that the total workload of the servers diverges when ǫ_n is small enough, thus completing the proof.

In the proof that follows, we assume that the sequences c_n (memory size) and α_n (message rate) have been fixed, and are of order o(log(n)) and o(n^2), respectively.

B.1. Local limitations of symmetry and finite memory. In this subsection we will show how the constraint of having only o(log(n)) bits of memory affects the distribution of the sampled servers, and the distribution of the dispatched jobs. The results that we provide are corollaries or special cases of results in [6].

We first note that if the dispatcher has o(log(n)) bits of memory, and if n is large enough, then the distribution of the sampled servers is uniform among all sets of the same size.

Proposition B.1. Let U be a uniform random variable over [0, 1].
For all n large enough, for every memory state \(m \in \mathcal{M}_n\) and every possible job size \(w \in \mathbb{R}_+\), the following holds. Consider any set of servers \(S \in \mathcal{P}(\{1, \dots, n\})\) with \(|S| \in o(n)\). Consider the event
\[
B(m, w; S) = \big\{ f(m, w, U) = S \cup \{i\}, \text{ for some } i \notin S \big\},
\]
and assume that the conditional probability measure \(\mathbb{P}(\,\cdot \mid B(m, w; S))\) is well-defined. Then, \(\mathbb{P}(j \in f(m, w, U) \mid B(m, w; S))\) is the same for all \(j \notin S\).

Proof. This is a special case of Proposition 5.1 in [6], noting that while the statement of that proposition requires \(|S| \leq \sqrt{n}\), its proof goes through under the weaker assumption \(|S| \in o(n)\).

Corollary B.2. Let U be a uniform random variable over [0, 1]. For all n large enough, for every memory state \(m \in \mathcal{M}_n\), for every possible job size \(w \in \mathbb{R}_+\), and for any set of servers \(S \in \mathcal{P}(\{1, \dots, n\})\) with \(|S| \in o(n)\), we have
\[
\mathbb{P}\big( f(m, w, U) = S \big) = \mathbb{P}\big( f(m, w, U) = \sigma(S) \big),
\]
for every permutation \(\sigma\).

Proof. In order to simplify notation, we omit the dependence of f on m and w. Let us fix a set S and a transposition \(\tau\). If \(\tau(S) = S\), then it is trivially true that \(\mathbb{P}(f(U) = S) = \mathbb{P}(f(U) = \tau(S))\). On the other hand, if \(\tau(S) \neq S\), then there exists some \(i \in S\) such that \(\tau(i) \notin S\).
In that case, we have
\[
\mathbb{P}\big( f(U) = S \big)
= \mathbb{P}\big( f(U) = S \,\big|\, |f(U)| = |S| \big) \, \mathbb{P}\big( |f(U)| = |S| \big)
= \mathbb{P}\Big( i \in f(U) \,\Big|\, \big\{ |f(U)| = |S| \big\} \cap \big\{ S \backslash \{i\} \subset f(U) \big\} \Big) \cdot \mathbb{P}\Big( S \backslash \{i\} \subset f(U) \,\Big|\, |f(U)| = |S| \Big) \, \mathbb{P}\big( |f(U)| = |S| \big)
= \mathbb{P}\Big( \tau(i) \in f(U) \,\Big|\, \big\{ |f(U)| = |S| \big\} \cap \big\{ S \backslash \{i\} \subset f(U) \big\} \Big) \cdot \mathbb{P}\Big( S \backslash \{i\} \subset f(U) \,\Big|\, |f(U)| = |S| \Big) \, \mathbb{P}\big( |f(U)| = |S| \big)
= \mathbb{P}\big( f(U) = \tau(S) \big),
\]
where in the second-to-last equality we used Proposition B.1. Finally, since any permutation \(\sigma\) can be obtained as a sequence of transpositions, applying the previous argument iteratively yields
\[
\mathbb{P}\big( f(U) = S \big) = \mathbb{P}\big( f(U) = \sigma(S) \big),
\]
for every permutation \(\sigma\).

Remark B.1. Although all sets of servers of the same size have the same probability of being sampled, the memory state and the incoming job size can still influence the number of sampled servers.

Similarly, if the dispatcher has \(o(\log n)\) bits of memory, then the distribution of the destination of the incoming job is uniform (possibly zero) outside the set of sampled servers.

Proposition B.3. Let V be a uniform random variable over [0, 1]. For all n large enough, for every memory state \(m \in \mathcal{M}_n\), every set of indices \(S \in \mathcal{P}(\{1, \dots, n\})\) with \(|S| \in o(n)\), every queue vector state \(q \in \mathcal{Q}_n\), every rate vector \(\mu \in \mathbb{R}_+^n\), and every job size \(w \in \mathbb{R}_+\), we have that
\[
\mathbb{P}\Big( f\big( m, w, \{ (q_i, \mu_i, i) : i \in S \}, V \big) = j \Big)
\]
is the same for all \(j \notin S\).

Proof. This is a special case of Proposition 5.2 in [6].

B.2. High arrival rate to slow servers.
In this subsection we leverage the results of the previous subsection to show that the total workload in the system diverges over time. For every \(t \geq 0\), let \(W^n(t)\) be the total remaining workload in the system at time t.

Lemma B.4. Fix some \(\lambda > 0\), and suppose that the service rate of \(\lfloor n/2 \rfloor\) servers is equal to some \(\epsilon_n > 0\). Then, there exists a positive sequence \(\{b_n(\lambda)\}_{n \geq 1}\), which is completely determined by \(\lambda\) (i.e., independent of \(\epsilon_n\)), such that \(b_n \in \Theta\big( e^{-\alpha_n/n} \big)\), and
\[
\liminf_{t \to \infty} \frac{W^n(t)}{t} \geq \big[ b_n(\lambda) - \epsilon_n \big] n, \quad \text{a.s.},
\]
for all n large enough.

Proof. Let \(A^n(t)\) be the counting process of arrivals with a job size of at least 1/2, and let us define
\[
p_{1/2} \triangleq \mathbb{P}\left( W_1 \geq \frac{1}{2} \right).
\]
Since the arrivals are modeled as a renewal process of rate \(\lambda n\), and the job sizes \(\{W_k\}_{k=1}^{\infty}\) are i.i.d. with unit mean, it follows that \(A^n(t)\) is a renewal counting process with rate \(\lambda n p_{1/2} > 0\). On the other hand, since the average message rate (cf. Equation (2.2)) is upper bounded by \(\alpha_n\) almost surely, we have
\[
\limsup_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} |S_k| \leq \alpha_n, \quad \text{a.s.}
\]
Combining this with the fact that
\[
\limsup_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \left( \frac{2\alpha_n}{\lambda n p_{1/2}} \right) \mathbb{1}_{\left\{ |S_k| > \frac{2\alpha_n}{\lambda n p_{1/2}} \right\}} \leq \limsup_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} |S_k|,
\]
we obtain
\[
\limsup_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ |S_k| > \frac{2\alpha_n}{\lambda n p_{1/2}} \right\}} \leq \frac{\lambda n p_{1/2}}{2}.
\]
This in turn implies that
\[
\liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ |S_k| \leq \frac{2\alpha_n}{\lambda n p_{1/2}} \right\}}
= \liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \left( 1 - \mathbb{1}_{\left\{ |S_k| > \frac{2\alpha_n}{\lambda n p_{1/2}} \right\}} \right)
\geq \liminf_{t \to \infty} \frac{A^n(t)}{t} - \limsup_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ |S_k| > \frac{2\alpha_n}{\lambda n p_{1/2}} \right\}}
\geq \frac{\lambda n p_{1/2}}{2}, \quad \text{a.s.} \qquad (B.1)
\]
Let \(N_{\epsilon_n} \subset \{1, \dots, n\}\) be the set of servers with service rate \(\epsilon_n\), which was assumed to have cardinality \(\lfloor n/2 \rfloor\). Let s be a nonnegative integer upper bounded by
\[
s^*_n \triangleq \frac{2\alpha_n}{\lambda n p_{1/2}}.
\]
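The counting step behind (B.1) is just a Markov-type inequality: if the long-run average number of sampled servers per unit of time is at most \(\alpha_n\), then sampling sets larger than twice the per-arrival average can occur for at most half of the arrivals. The following toy sanity check illustrates this; all numerical values are illustrative assumptions, not quantities from the paper.

```python
import random

# Toy sanity check of the Markov-type counting step behind (B.1):
# if the average of |S_k| per arrival is at most alpha_n / (lambda*n*p_{1/2}),
# then arrivals with |S_k| > s*_n = 2*alpha_n/(lambda*n*p_{1/2}) make up
# at most half of all arrivals, so at least half use few messages.
random.seed(0)

lam_n_p = 50.0                    # lambda * n * p_{1/2}: rate of "large" jobs (toy value)
alpha_n = 200.0                   # message-rate budget (toy value)
s_star = 2 * alpha_n / lam_n_p    # the threshold s*_n

# Draw sampling-set sizes |S_k| with per-arrival mean alpha_n / lam_n_p,
# which keeps the average message rate within the budget alpha_n.
mean_size = alpha_n / lam_n_p
sizes = [random.expovariate(1 / mean_size) for _ in range(100_000)]

frac_large = sum(s > s_star for s in sizes) / len(sizes)
# Markov's inequality: P(|S_k| > 2 E|S_k|) <= 1/2.
assert frac_large <= 0.5
print(f"fraction of arrivals with |S_k| > s*_n: {frac_large:.3f}")
```

The same bound holds for any distribution of \(|S_k|\) with the stated mean, which is why the argument in the proof needs no assumption on how the dispatcher allocates its message budget across arrivals.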
Since \(s^*_n \in o(n)\), Corollary B.2 applies, and we obtain
\[
\mathbb{P}\big( S_k \subset N_{\epsilon_n} \,\big|\, |S_k| = s \big)
= \frac{\binom{\lfloor n/2 \rfloor}{s}}{\binom{n}{s}}
= \frac{\lfloor n/2 \rfloor \big( \lfloor n/2 \rfloor - 1 \big) \cdots \big( \lfloor n/2 \rfloor - s + 1 \big)}{n (n - 1) \cdots (n - s + 1)}
\geq \left( \frac{1}{3} \right)^s,
\]
for all n large enough, where in the last inequality we used that \(s^*_n \in o(n)\). Since this is true for all s in the given range, we obtain
\[
\mathbb{P}\big( S_k \subset N_{\epsilon_n} \,\big|\, |S_k| \leq s^*_n \big) \geq \left( \frac{1}{3} \right)^{s^*_n},
\]
for all \(k \geq 1\), and for all n large enough. Combining this with Equation (B.1), we obtain
\[
\liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ |S_k| \leq s^*_n, \, S_k \subset N_{\epsilon_n} \right\}} \geq \frac{\lambda n p_{1/2}}{2} \left( \frac{1}{3} \right)^{s^*_n}, \qquad (B.2)
\]
almost surely, for all n large enough.

Let us fix a particular set S that satisfies \(|S| \leq s^*_n\) and \(S \subset N_{\epsilon_n}\). For any such set, Proposition B.3 implies
\[
\mathbb{P}\big( D_k \in N_{\epsilon_n} \,\big|\, S_k = S \big)
= \mathbb{P}\big( D_k \in N_{\epsilon_n} \,\big|\, D_k \in S, S_k = S \big) \, \mathbb{P}\big( D_k \in S \,\big|\, S_k = S \big) + \mathbb{P}\big( D_k \in N_{\epsilon_n} \,\big|\, D_k \in S^c, S_k = S \big) \, \mathbb{P}\big( D_k \in S^c \,\big|\, S_k = S \big)
= \mathbb{P}\big( D_k \in S \,\big|\, S_k = S \big) + \frac{|N_{\epsilon_n} \cap S^c|}{|S^c|} \, \mathbb{P}\big( D_k \in S^c \,\big|\, S_k = S \big)
\geq \frac{|N_{\epsilon_n} \cap S^c|}{|S^c|}
= \frac{\lfloor n/2 \rfloor - |S|}{n - |S|}
\geq \frac{\lfloor n/2 \rfloor - s^*_n}{n}
\geq \frac{1}{3},
\]
for all n large enough, where in the last inequality we used that \(s^*_n \in o(n)\). Since this is true for every set S with the given properties, we conclude that
\[
\mathbb{P}\big( D_k \in N_{\epsilon_n} \,\big|\, S_k \subset N_{\epsilon_n}, \, |S_k| \leq s^*_n \big) \geq \frac{1}{3},
\]
for all \(k \geq 1\), and for all n large enough. Combining this with Equation (B.2), we obtain
\[
\liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ D_k \in N_{\epsilon_n} \right\}}
\geq \liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ D_k \in N_{\epsilon_n}, \, |S_k| \leq s^*_n, \, S_k \subset N_{\epsilon_n} \right\}}
\geq \frac{\lambda n p_{1/2}}{6} \left( \frac{1}{3} \right)^{s^*_n},
\]
a.s., for all n large enough.
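The two elementary estimates used above, the sampling bound \(\binom{\lfloor n/2 \rfloor}{s} / \binom{n}{s} \geq (1/3)^s\) and the destination bound \((\lfloor n/2 \rfloor - s)/n \geq 1/3\), hold whenever s is small relative to n. A quick numerical check, for illustrative values of n and for \(s \leq \sqrt{n}\) (a concrete stand-in for the \(s^*_n \in o(n)\) regime):

```python
from math import comb, isqrt

# Check the two combinatorial bounds used in the proof of Lemma B.4:
#   C(floor(n/2), s) / C(n, s) >= (1/3)**s     (sampling bound)
#   (floor(n/2) - s) / n       >= 1/3          (destination bound)
# for sampling-set sizes s that are small relative to n (here s <= sqrt(n)).
for n in (100, 1_000, 10_000):
    half = n // 2
    for s in range(1, isqrt(n) + 1):
        ratio = comb(half, s) / comb(n, s)
        assert ratio >= (1 / 3) ** s, (n, s)
        assert (half - s) / n >= 1 / 3, (n, s)
print("bounds hold for all tested (n, s)")
```

Each factor \((\lfloor n/2 \rfloor - k)/(n - k)\) of the ratio tends to 1/2 as n grows with \(s \in o(n)\), so the constant 1/3 leaves ample slack; it could be replaced by any constant below 1/2 at the cost of a different \(b_n(\lambda)\).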
Note that this is a lower bound on the average rate of arrival of jobs of size at least 1/2 to the servers with service rate \(\epsilon_n\), and each such job brings at least 1/2 unit of work. On the other hand, those servers have a total processing rate of \(\epsilon_n \lfloor n/2 \rfloor\) units of workload per unit of time. Then, since the total workload of the system is at least as large as the workload of the servers with rate \(\epsilon_n\), we have
\[
\liminf_{t \to \infty} \frac{W^n(t)}{t}
\geq \frac{1}{2} \liminf_{t \to \infty} \frac{1}{t} \sum_{k=1}^{A^n(t)} \mathbb{1}_{\left\{ D_k \in N_{\epsilon_n} \right\}} - \epsilon_n \left\lfloor \frac{n}{2} \right\rfloor
\geq \left[ \frac{\lambda p_{1/2}}{12} \left( \frac{1}{3} \right)^{s^*_n} - \epsilon_n \right] n,
\]
for all n large enough. This establishes the desired result, with \(b_n(\lambda)\) equal to the first term in the bracketed expression above.

Lemma B.4 implies that, for all n large enough, the total workload in the system increases at least linearly with time, as long as \(\lfloor n/2 \rfloor\) of the servers have rate \(\epsilon_n < b_n(\lambda)\). In particular, this happens if \(\epsilon_n \in O\big( e^{-\alpha_n/n} \big)\). Since the above is true for every weakly symmetric policy with \(o(\log n)\) bits of memory, and with an average message rate upper bounded by \(\alpha_n \in o(n)\) almost surely, it follows that, for all n large enough, the stability region of any such policy is contained in a proper subset \(\Gamma_n(\lambda, \alpha_n)\) of \(\Sigma_n\), which excludes service rate vectors for which \(\lfloor n/2 \rfloor\) of the servers have rate \(\epsilon_n < b_n(\lambda)\).

REFERENCES

[1] Anselmi, J. (2019). Combining size-based load balancing with round-robin for scalable low latency. IEEE Transactions on Parallel and Distributed Systems.
[2] Atar, R., Keslassy, I., Mendelson, G., Orda, A. and Vargaftik, S. (2020). Persistent-Idle load-distribution. To appear in Stochastic Systems.
[3] Bramson, M. (2011). Stability of join the shortest queue networks. The Annals of Applied Probability.
[4] Foster, F. G. (1953). On the stochastic matrices associated with certain queueing processes. The Annals of Mathematical Statistics.
[5] Gamarnik, D., Tsitsiklis, J. N. and Zubeldia, M. (2018). Delay, memory, and messaging tradeoffs in distributed service systems. Stochastic Systems.
[6] Gamarnik, D.
, Tsitsiklis, J. N. and Zubeldia, M. (2020). A lower bound on the queueing delay in resource constrained load balancing. The Annals of Applied Probability.
[7] Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J. R. and Greenberg, A. (2011). Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation.
[8] Mitzenmacher, M. D. (1996). The power of two choices in randomized load balancing. PhD thesis, U.C. Berkeley.
[9] Mukherjee, D., Borst, S., van Leeuwaarden, J. and Whiting, P. (2016). Universality of power-of-d load balancing schemes. In Workshop on Mathematical Performance Modeling and Analysis (MAMA).
[10] Shah, D. and Prabhakar, B. (2002). The use of memory in randomized load balancing. In Proceedings of ISIT 2002.
[11] Stolyar, A. (2015). Pull-based load distribution in large-scale heterogeneous service systems. Queueing Systems.
[12] van der Boor, M., Zubeldia, M. and Borst, S. (2020). Zero-wait load balancing with sparse messaging. Operations Research Letters.
[13] Vvedenskaya, N. D., Dobrushin, R. L. and Karpelevich, F. I. (1996). Queueing system with selection of the shortest of two queues: an asymptotic approach. Problems of Information Transmission.

David Gamarnik
John N. Tsitsiklis
Massachusetts Institute of Technology
E-mail: [email protected]
[email protected]