A Restless Bandit Model for Resource Allocation, Competition and Reservation
Jing Fu, Bill Moran and Peter G. Taylor
Abstract
We study a resource allocation problem with varying requests and with resources of limited capacity shared by multiple requests. It is modeled as a set of heterogeneous Restless Multi-Armed Bandit Problems (RMABPs) connected by constraints imposed by resource capacity. Following Whittle's relaxation idea and Weber and Weiss' asymptotic optimality proof, we propose a simple policy and prove it to be asymptotically optimal in a regime where both arrival rates and capacities increase. We provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. The effectiveness of these results is demonstrated by numerical experiments. To the best of our knowledge, this is the first work providing asymptotic optimality results for such a resource allocation problem and such a combination of multiple RMABPs.
Index Terms: restless bandits; resource sharing; Markov decision process
I. INTRODUCTION
A. Overview and Motivation
Modern technologies enable Internet resources such as routers, computing servers and cables to be abstracted from the physical layer to a virtual layer, facilitating a quick response to demands for setting up communication networks or processing computing jobs. Virtual servers comprising different sets of physical resources are assigned to arriving customers who use these resources for a period of time and then return them to a pool when they depart.

Jing Fu was with School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia, and she is now with School of Engineering, RMIT University, VIC 3001, Australia. Peter G. Taylor is with School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia. Bill Moran is with Department of Electrical and Electronic Engineering, The University of Melbourne, VIC 3010, Australia.

Such networks are just particular examples of more general systems where users of different types arrive with a desire to be allocated resources of various kinds, to use these resources and then return them. Users are often indifferent to the precise set of resources that they are allocated; they just require allocation of some resources that will enable them to accomplish the task at hand. In such circumstances a network manager has the task of deciding whether an arriving customer should be admitted into the system and, if so, which set of resources should be assigned to satisfying their requirements.

In this paper we describe and analyze a very general model for such systems. Specifically, we study a system in which J resource pools, each made up of a finite number of resource units (RUs), await allocation to incoming requests of L different types. We refer to the number of RUs in a resource pool as its capacity. Each resource pool is potentially shared and competed for by many requests, but reservation of RUs for still-to-arrive requests is also allowed. When a request has been accommodated by a resource pool, an appropriate number of RUs of this type are occupied by the request until it leaves the system. The released RUs can be reused by other requests.
A request is permitted to occupy RUs from more than one resource pool simultaneously. In this context, the number of requests of the same type that are accommodated by a group of resource pools varies according to a stochastic process, where the transition rates are affected by the resource allocation policy employed. Several such processes associated with the same resource pool are coupled by its capacity limitations.

By strategically assigning requests to appropriate combinations of RUs, we aim to maximize the long-run average revenue, defined as the difference between the long-run average reward earned by serving the requests and the long-run average cost incurred by using the resource pools. Such a resource allocation problem can be easily applied to a rich collection of classical models, such as loss networks in telecommunications, resource allocation for logistic systems, and job assignment in parallel computing.

[Kelly, 1991] published a comprehensive analysis of loss network models with and without alternative routing. With alternative routing, network traffic can be re-routed onto alternative paths when the original path fails or is full. In [Kelly, 1991], a list of alternative paths as choices of resource pools is given for each call/request. The alternative paths are selected in turn if the preceding paths are unavailable. In contrast, the manager of a typical resource allocation problem described above is potentially able to change the priorities of paths dynamically. How this should be done is a key focus of this paper.

Fig. 1. A simple loss network.

Fig. 2. A simple parallel queueing model.

To illustrate the kind of problem of interest here, consider the simple loss network model shown in Figure 1. Links a, b and c are abstracted as resource pools with capacities equal to 1, 3 and 3, respectively: link a consists of one channel as an RU, and links b and c each have 3 channels.
Requests asking for a connection from A to B occupying one channel can be served by either path {a} or {b, c}, but requests requiring two channels for each connection from A to B can be accommodated only by path {b, c}. We refer to the former and the latter as type-I and type-II requests, respectively. An arrival of a type-I request results in one of the paths {a} and {b, c} being chosen by the optimizer depending on current traffic loads on the three links, where links b and c might be shared with existing type-II requests. Occupied channels or RUs are released immediately and simultaneously when relevant requests are completed.

Resource allocation problems with small values of L and J, such as the example above, can be modeled by a Markov Decision Process (MDP) and solved through dynamic programming. However, in real-world applications, where L and J are large, resulting in high dimensionality of the state and action spaces, such an approach is often intractable.

In this paper we use an analysis inspired by techniques applied to Restless Multi-Armed Bandit Problems (RMABPs). The standard RMABP consists of parallel MDPs with binary actions (they can either be "pulled", that is activated, or not), which are competing for a limited possibility of being selected at each decision epoch. Each of the MDPs, referred to as a bandit process, has its own individual state-dependent reward rates and transition probabilities when it is activated and when it is not.

Attempts to solve the problem are faced with exponential growth in the size of the state space as the number of parallel bandit processes increases. This class of problems was described by [Whittle, 1988], who proposed a heuristic management policy that was shown to be asymptotically optimal under non-trivial extra conditions by [Weber and Weiss, 1990]; this policy approaches optimality as the number of bandit processes tends to infinity.
The policy, subsequently referred to as the Whittle index policy, always prioritizes bandit processes with higher state-dependent indices that intuitively represent marginal rewards earned by processes if they are selected. The Whittle indices can be computed independently for each bandit process, which significantly reduces computational complexity. The Whittle index policy is scalable to an RMABP with a large number of bandit processes. Also, the asymptotic optimality property, if it is satisfied, guarantees a bounded performance degradation in a large-scale system and is appropriate for large problems where optimal solutions are intractable. The non-trivial extra conditions required by the asymptotic optimality proof in [Weber and Weiss, 1990] are related to proving the existence of a global attractor of a stochastic process.

RMABPs have been widely used in scheduling problems, such as channel detecting (see [Liu et al., 2012], [Wang et al., 2019]), job assignments in data centers (see [Fu et al., 2016]), web crawling (see [Avrachenkov and Borkar, 2016]), target tracking (see [Krishnamurthy and Djonin, 2007], [Le Ny et al., 2010]) and job admission control (see [Niño-Mora, 2012], [Niño-Mora, 2019]). Here we treat the resource allocation problem described above as a set of RMABPs coupled by linear inequalities involving random state and action variables.
B. Main Contributions
We propose a modified index policy that takes into account the capacity constraints of the problem. The index policy prioritizes combinations of RUs with the highest indices, each of which is a real number representing the marginal revenue of using its associated RUs. The policy is simple, scalable and appropriate for a large-scale resource allocation problem.

Our analysis of asymptotic optimality of the index policy proceeds through a relaxed version of the problem and the study of a global attractor of a stochastic process defined in (35) below. We prove that the process (35) will almost surely converge to a global attractor in the asymptotic regime regardless of its initial point, and hence the index policy is asymptotically optimal if and only if this global attractor coincides with an optimal solution of the resource allocation problem. Following ideas similar to those of [Weber and Weiss, 1990], optimality of the global attractor for the resource allocation problem can be deduced from its optimality for the relaxed problem, which can be analyzed with remarkably reduced computational complexity.

A sufficient condition for the global attractor and optimal solution to coincide is that the offered traffic for the entire system is heavy and the resource pools in our system are weakly coupled. We rigorously define these concepts in Section III-C. These results are enunciated in Theorems 1, 2 and Corollary 2 in Section V-C.

When the above-mentioned sufficient conditions are not satisfied, an asymptotically optimal index policy can still exist. In this case, we propose a method that can derive the parameters required by the asymptotically optimal policy. Although asymptotic optimality is not guaranteed, Theorem 2 provides a verifiable sufficient condition, less stringent than the one mentioned above, to check asymptotic optimality of the index policy with adapted parameters. We numerically demonstrate the effectiveness of this method in Section VI.

The index policy exhibits remarkably reduced computational complexity, compared to conventional optimizers, and its potential asymptotic optimality is appropriate for large-scale systems where computational power is a scarce commodity. Furthermore, simulation studies indicate that an index policy can still be good in the pre-limit regime. As mentioned earlier, our problem can be seen as a set of RMABPs coupled by the capacity constraints. When the capacities of all resource pools tend to infinity, the index policy reduces to the Whittle index policy because the links between RMABPs no longer exist.

To the best of our knowledge, no existing work has proved asymptotic optimality in resource allocation problems where resource competition and reservation are potentially permitted, nor has there been a previous analysis of such a combination of multiple, different RMABPs, resulting in a much higher dimensionality of the state space.

The remainder of the paper is organized as follows. In Section II, we describe the resource allocation problem. In Section III, we apply the Whittle relaxation technique. In Section IV, we propose an algorithm to implement an index policy. In Section V, we define the asymptotic regime and we prove the asymptotic optimality of the index policy under some conditions. To demonstrate the effectiveness of the proposed policies, numerical results are provided in Section VI. In Section VII, we present conclusions.
C. Relation to the Literature
The classical Multi-Armed Bandit Problem (MABP) is an optimization problem in which only one bandit process (BP) among K BPs can be activated at any one time, while all the other K − 1 BPs are frozen: an active BP randomly changes its state, while state transitions will not happen to the frozen BPs. In 1974, Gittins and Jones published the well-known index theorem for the MABP [Gittins and Jones, 1974], and in 1979, [Gittins, 1979] proved the optimality of a simple index policy, subsequently referred to as the Gittins index policy. Under the Gittins index policy, an index value, referred to as the Gittins index, is associated with each state of each BP, and the BP with the largest index value is activated, while all the other BPs are frozen. More details about Gittins indices can be found in [Gittins et al., 2011, Chapter 2.12] (and the references therein).

The optimality of the Gittins index policy for the conventional MABP fails for the general case where the K − 1 BPs that are not selected can also change their states randomly; the resulting problem is known as the Restless Multi-Armed Bandit Problem (RMABP). The RMABP was proposed by [Whittle, 1988]. The RMABP allows M = 1, 2, . . . , K BPs to be active simultaneously. In a similar vein to the Gittins index policy, Whittle assigned a state-dependent index value, referred to as the Whittle index, to each BP and always activated the M BPs with the highest indices. The Whittle indices are calculated from a relaxed version of the original RMABP obtained by randomizing the action variables. [Whittle, 1988] defined a property of an RMABP, referred to as indexability, under which the Whittle index policy exists. Whittle conjectured in [Whittle, 1988] that the Whittle index policy, if it exists, is asymptotically optimal. [Papadimitriou and Tsitsiklis, 1999] proved that the optimization of RMABPs is PSPACE-hard in general; nonetheless, [Weber and Weiss, 1990] were able to establish asymptotic optimality of the Whittle index policy under mild conditions.

[Niño-Mora, 2001] proposed a Partial Conservation Law (PCL) for the optimality of the RMABP; this is an extension of the General Conservation Law (GCL) published in [Bertsimas and Niño-Mora, 1996]. Later, [Niño-Mora, 2002] defined a group of problems that satisfies PCL-indexability and proposed a new index policy that improved the Whittle index. The new index policy was proved to be optimal for problems with PCL-indexability. PCL-indexability implies (and is stronger than) Whittle indexability. A detailed survey about the optimality of bandit problems can be found in [Niño-Mora, 2007].

[Verloop, 2016] proved the asymptotic optimality of the Whittle index policy in an extended version of an RMABP, where BPs randomly arrive and depart the system. She proposed an index policy that was not restricted to Whittle indexable models and numerically demonstrated its near-optimality. [Larrañaga et al., 2015] applied this extended RMABP to a queueing problem assuming convex, non-decreasing functions for both holding costs and measured values of people's impatience. More results on asymptotic optimality of index-like policies can be found in [Fu, 2016, Chapter IV].
Asymptotically optimal policies for cost-minimization problems in network systems using a fluid approximation technique have been considered in [Bäuerle et al., 2000], [Bäuerle, 2002], [Stolyar, 2004], [Nazarathy and Weiss, 2009] and [Bertsimas et al., 2015]. The fluid approximation to the stochastic optimization problem can be much simpler than the original. A key problem here is to establish an appropriate fluid problem and translate its optimal solution to a policy amenable to the stochastic problem. Asymptotic optimality of the translated stochastic policy can be established if the fluid solution provides an upper/lower bound on the stochastic problem and the policy coincides with this bound asymptotically. The reader is referred to [Meyn, 2008] for a detailed description of fluid approximation across various models.

Although the fluid approximation technique helps with asymptotic analysis in a wide range of (cost-minimization) network problems, existing results cannot be directly applied to our problem, where the arrival and departure rates of request queues are state-dependent and capacity violation over resource pools is strictly forbidden. Our system is always stable for any offered traffic because of the strict capacity constraints. In our case, the form of the corresponding fluid model remains unclear for generic policies. Even given the optimal solution of a well-established fluid model, the synthesis of an explicit policy in the stochastic model remains a challenge.

We adopt another approach, following the ideas of [Whittle, 1988] and [Weber and Weiss, 1990]. Our asymptotic optimality is derived from an optimal solution of a relaxed version of the stochastic optimization problem. The relaxed problem is still a stochastic optimization problem with a discrete state space. We propose a policy based on intuition captured by the relaxed problem, of which the optimal solution provides a performance upper bound on the original problem.
Then, we prove, under certain conditions, that this policy coincides with the upper bound asymptotically. The detailed analysis comprises the main content of the paper.

II. A RESOURCE ALLOCATION PROBLEM
We use $\mathbb{N}_+$ and $\mathbb{N}_0$ to denote the sets of positive and non-negative integers, respectively, and for any $N \in \mathbb{N}_+$, let $[N]$ represent the set $\{1, 2, \ldots, N\}$, with $[0] = \emptyset$. Let $\mathbb{R}$, $\mathbb{R}_+$ and $\mathbb{R}_0$ be the sets of all, positive and non-negative reals, respectively.

A. System Model
Recall that there are $L$ types of requests and $J$ pools of RUs, all potentially different, with resource pool $j \in [J]$ having capacity $C_j$ RUs that can be dynamically allocated to and released by the $L$ types of requests.

Each request comes with an associated list of candidate resource combinations. Specifically, requests of request type (RT) $\ell \in [L]$ can be accommodated by one of a set $P_\ell$ of candidate patterns. One of these candidate patterns will be selected by a policy. Patterns are indexed by $i \in \mathbb{N}_+$. If a request is accommodated by pattern $i$, $w_{j,i}$ RUs of pool $j \in [J]$ are occupied until the request is completed and departs. We can thus identify pattern $i$ with the weight vector $\mathbf{w}_i = (w_{j,i})$ that defines its requirement. Preemption or re-allocation of requests is not allowed.

A request is blocked if there is not enough capacity on any of its corresponding patterns. We might also want to block a request in other circumstances, if accepting it would be detrimental to future performance. In either case, we model the situation where a request is blocked by assigning it to the dummy pattern $d(\ell)$, whose weight vector is set to $\mathbf{0}$.

It is possible for different RTs to be satisfied by the same pattern (this occurs, in particular, with the dummy pattern). In such cases, we consider there to be multiple copies of each pattern, one for each RT that it can satisfy. This enables us to consider the sets $P_\ell$ to be mutually exclusive; that is, $P_{\ell} \cap P_{\ell'} = \emptyset$ for any $\ell \neq \ell'$. Given $|P_\ell|$ patterns for each RT $\ell$, we have in total $I = \sum_{\ell \in [L]} |P_\ell|$ patterns associated with weight vectors $\mathbf{w}_i \in \mathbb{N}_0^J$, $i \in [I]$. For any pattern $i$, let $\ell(i)$ be the unique RT that is satisfied by that pattern.

Let $W$ be a $J \times I$ matrix with entries $w_{j,i}$. We assume that $W$ has no all-zero row and exactly $L$ all-zero columns.
Each of these zero columns corresponds to one of the dummy patterns $d(\ell)$ where requests of type $\ell \in [L]$ are blocked.

Requests of RT $\ell$ arrive at the system sequentially, following a Poisson process with rate $\lambda_\ell$, and the occupation times of the requests accommodated by pattern $i \in P_\ell$ are exponentially distributed with parameter $\mu_i$. Although there might be situations when it is reasonable to assume that the occupation time depends only on the request type $\ell$, there might also be cases where the lifetime of a request depends on the resources accommodating it, which is why we allow the occupation time distribution to depend on $i$. The RUs used to accommodate a request are occupied and released at the same time. Neither the request nor the system knows the lifespan of a request until it is accomplished and departs the system.

Since there are similarities between our problem and a parallel queueing model, we present a second example to clarify the similarities and differences. Consider two resource pools corresponding to two queues as illustrated in Figure 2, where both capacities are set to three; that is, $J = 2$ and $C_1 = C_2 = 3$. There are two types of requests: if a type-one request is accommodated in the system, it will simultaneously occupy one RU of both pools; and a type-two request can be accommodated by two RUs of either pool. In other words, $L = 2$, $P_1 = \{1, 2\}$, $P_2 = \{3, 4, 5\}$, patterns 1 and 3 are dummy patterns with $\mathbf{w}_1 = \mathbf{w}_3 = \mathbf{0}$, $\mathbf{w}_2 = (1, 1)$, $\mathbf{w}_4 = (2, 0)$, $\mathbf{w}_5 = (0, 2)$ and $I = 5$.

In this case, for an arrival or departure event, the numbers of occupied RUs in the two resource pools may change by one simultaneously, or the number in a single pool may change by two. The transition rates are affected by the system controller: if the capacity constraints are not violated, there are two choices, resource pool one or two, for accommodating a type-two request. The task of a system manager is to find a policy for deciding which of these choices to take in order to maximize some long-term objective.
Each choice will result in a parallel queueing model with dependencies between the sizes of the queues, and between the policy employed and the queue transition rates. As mentioned in Section I, conventional optimization methods cannot be applied directly when $L$ and $J$ are large.

B. A Stochastic Optimization Problem
We focus here on an explanation of the stochastic mechanism of the resource allocation problem. An instantiation is generated in the memory of the system controller when a request of RT $\ell \in [L]$ is accommodated by a pattern $i \in P_\ell$. Once the request departs the system, the associated instantiation will be removed from the controller's memory. As requests are accommodated and completed, the number of instantiations associated with each pattern forms a birth-and-death process, indicating the number of requests being served by this pattern. As mentioned in the second example, the birth-and-death processes for all patterns $i \in [I]$ are coupled by capacity constraints and affected by control decisions.

Let $N_i(t)$, $t \geq 0$, represent the number of instantiations for pattern $i$ at time $t$. The process $N_i(t)$ has state space $\mathcal{N}_i$, a discrete, finite set of possible values. The finiteness of $\mathcal{N}_i$ derives from the finite capacities $C_j$. If $N_i(t)$ is known for all $i \in [I]$, the number of occupied RUs in pool $j \in [J]$ at time $t$ is given by $S_j(t) = \sum_{i \in [I]} w_{j,i} N_i(t)$, which must not exceed $C_j$. The vector $\mathbf{N}(t) = (N_i(t) : i \in [I])$ is the state variable of the entire system, taking values in $\mathcal{N} := \prod_{i \in [I]} \mathcal{N}_i$, where $\prod$ represents the Cartesian product. Since the state variables are further subject to capacity constraints to be discussed in Section II-B2, $\mathcal{N}$ is larger than necessary. With slightly abused notation, we still refer to $\mathcal{N}$ as the state space of the system.
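As a small illustration of the occupancy formula $S_j(t) = \sum_{i} w_{j,i} N_i(t)$, the following sketch evaluates it for the two-pool example of Section II-A. The column ordering of `W` (dummy, (1,1), dummy, (2,0), (0,2)), the 0-based indexing and the function names are assumptions made for this illustration only; they are not taken from the paper.

```python
# Weight matrix W (J x I) for the two-pool example; rows are pools, columns are patterns.
# Assumed column order for this sketch: dummy, (1,1), dummy, (2,0), (0,2).
W = [
    [0, 1, 0, 2, 0],  # RUs of pool 1 used by one instantiation of each pattern
    [0, 1, 0, 0, 2],  # RUs of pool 2 used by one instantiation of each pattern
]
C = [3, 3]  # capacities C_1 = C_2 = 3


def occupied_units(W, n):
    """Return [S_1, ..., S_J] with S_j = sum_i w_{j,i} * n_i."""
    return [sum(w_ji * n_i for w_ji, n_i in zip(row, n)) for row in W]


def within_capacity(W, n, C):
    """True when no pool holds more occupied RUs than its capacity."""
    return all(s <= c for s, c in zip(occupied_units(W, n), C))


# One type-one request in service and one type-two request served by pool 1.
n = [0, 1, 0, 1, 0]
S = occupied_units(W, n)
```

Here pool 1 holds three occupied RUs (one from the type-one request and two from the type-two request) and pool 2 holds one, so the state respects both capacities.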
1) Action Constraints:
We associate an action variable $a_i(\mathbf{n}) \in \{0, 1\}$ with process $i \in [I]$ when the system is in state $\mathbf{n} \in \mathcal{N}$, and write $\mathbf{a}(\mathbf{n}) = (a_i(\mathbf{n}) : i \in [I])$. The action variable $a_i(\mathbf{n})$ tells us what to do with a potential new request of type $\ell(i)$. If $a_i(\mathbf{n}) = 1$, then such a request will be allocated to pattern $i$. The action constraint,

$$\sum_{i \in P_\ell} a_i(\mathbf{n}) = 1, \quad \forall \ell \in [L],\ \forall \mathbf{n} \in \mathcal{N}, \tag{1}$$

ensures that exactly one pattern (which may be the dummy pattern $d(\ell)$) is selected for each RT $\ell$ and current state $\mathbf{n}$.

At any time $t$, we say that the arrival process for pattern $i$ is active or passive according to whether $a_i(\mathbf{N}(t))$ is 1 or 0, respectively. The birth rate of process $i \in P_\ell$, $\ell \in [L]$, is $\lambda_\ell$ if $a_i(\mathbf{N}(t)) = 1$, and zero otherwise. The death rate of process $i$ is $\mu_i N_i(t)$. The proportion of time for which $a_{d(\ell)}(\mathbf{N}(t)) = 1$ is the blocking probability for requests of type $\ell$.
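The action constraint (1) and the resulting birth and death rates can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 0-based pattern indices, the helper names and the numerical rates are all hypothetical, and the pattern layout assumes the two-pool example with patterns ordered as dummy, (1,1), dummy, (2,0), (0,2).

```python
def satisfies_action_constraint(a, patterns_by_type):
    """Constraint (1): exactly one pattern is selected for every request type."""
    return all(sum(a[i] for i in P) == 1 for P in patterns_by_type)


def transition_rates(a, n, lam, mu, type_of):
    """Birth rate lam[l(i)] * a_i and death rate mu_i * n_i for every pattern i."""
    births = [lam[type_of[i]] * a[i] for i in range(len(a))]
    deaths = [mu[i] * n[i] for i in range(len(n))]
    return births, deaths


# Hypothetical data (0-based indices): P_1 = {0, 1}, P_2 = {2, 3, 4}.
patterns_by_type = [[0, 1], [2, 3, 4]]
type_of = [0, 0, 1, 1, 1]            # l(i) for each pattern i
lam = [2.0, 1.0]                     # arrival rates per request type (made up)
mu = [1.0, 0.5, 1.0, 0.25, 0.25]     # departure rates per pattern (made up)

n = [0, 1, 0, 1, 0]                  # current numbers of instantiations
a = [0, 1, 0, 0, 1]                  # accept type one on (1,1), type two on (0,2)

ok = satisfies_action_constraint(a, patterns_by_type)
births, deaths = transition_rates(a, n, lam, mu, type_of)
```

Setting the dummy entry of a request type to one (for instance `a = [1, 0, 0, 0, 1]`) corresponds to blocking that type in the current state, so its non-dummy patterns all have birth rate zero.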
2) Capacity Constraints:
To ensure feasibility of an allocation of a request of type $\ell(i)$ to pattern $i$ when the state is $\mathbf{n}$, we need

$$W(\mathbf{n} + \mathbf{e}_i) \leq \mathbf{C}, \tag{2}$$

where $\mathbf{e}_i$ is a vector with a one in the $i$th position and zeros everywhere else, and $\mathbf{C} \in \mathbb{N}_+^J$ is a vector with entries $C_j$. In view of the action constraint (1), a neat way to collect together the constraints (2) for all $i \in P_\ell$ is to write them in the form

$$W(\mathbf{n} + E_\ell\, \mathbf{a}(\mathbf{n})) \leq \mathbf{C}, \quad \forall \mathbf{n} \in \mathcal{N}, \tag{3}$$

where $E_\ell$ is a diagonal matrix of size $I$ with entries $e_{\ell,i,i} = 1$ if $i \in P_\ell$ and $e_{\ell,i,i} = 0$ if $i \in [I] \setminus P_\ell$.

For two different request types $\ell_1$ and $\ell_2$, a constraint of the form

$$W(\mathbf{n} + E_{\ell_1} \mathbf{a}(\mathbf{n}) + E_{\ell_2} \mathbf{a}(\mathbf{n})) \leq \mathbf{C}, \quad \forall \mathbf{n} \in \mathcal{N}, \tag{4}$$

captures the idea that the action vector $\mathbf{a}(\mathbf{n})$ must be such that the allocation decisions for $\ell_1$ and $\ell_2$ ensure enough capacity to implement both of them when both requests arrive simultaneously while the state is $\mathbf{n}$. Another way to think about this is that, if a request of type $\ell_1$ is allocated to a non-dummy pattern $i$ when the state is $\mathbf{n}$, the decision for a request of type $\ell_2$ when the state is $\mathbf{n}$ must satisfy constraint (3) when the state is $\mathbf{n} + \mathbf{e}_i$. In particular, if there is not enough capacity to accommodate a request of type $\ell_2$ when the state is $\mathbf{n} + \mathbf{e}_i$, then a request of type $\ell_2$ must be allocated to the dummy pattern $d(\ell_2)$ when the state is $\mathbf{n}$. This can be viewed as giving priority to reserving resources for a type $\ell_1$ request over a type $\ell_2$ request when the state is $\mathbf{n}$. As we shall see below, the decision to do this will be made in order to optimize a long-term reward function.

Observing that $\sum_{\ell \in [L]} E_\ell$ is the identity matrix, we see that the constraint

$$W(\mathbf{n} + \mathbf{a}(\mathbf{n})) = W\Big(\mathbf{n} + \Big(\sum_{\ell \in [L]} E_\ell\Big) \mathbf{a}(\mathbf{n})\Big) \leq \mathbf{C}, \quad \forall \mathbf{n} \in \mathcal{N}, \tag{5}$$

can be thought of as an extended version of (4). In (5), requests of all types are taken into account when the state is $\mathbf{n}$ and allocation decisions for some types are made in order to reserve resources for other types that turn out to be more profitable in the long run.
In particular, resources are reserved for those request types $\ell$ which are allocated to non-dummy patterns $i$ at the expense of those types that are allocated to less profitable patterns or the corresponding dummy patterns. In this paper, all the results presented are based on capacity constraint (5).

From (5), there is an upper bound, $\min_{j \in [J]} \lceil C_j / w_{j,i} \rceil$, on the number of instantiations of pattern $i$, and this serves as a bounding state at which no further instantiation of this pattern can be added; that is, $\mathcal{N}_i = \{0, 1, \ldots, \min_{j \in [J]} \lceil C_j / w_{j,i} \rceil\}$ and $|\mathcal{N}_i| = \min_{j \in [J]} \lceil C_j / w_{j,i} \rceil + 1 < +\infty$. In this context, Equation (5) implies the condition

$$a_i(\mathbf{n}) = 0, \quad \text{if } i \notin \{d(\ell) : \ell \in [L]\} \text{ and } n_i = |\mathcal{N}_i| - 1. \tag{6}$$
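To make the reservation effect of constraint (5) concrete, the sketch below checks $W(\mathbf{n} + \mathbf{a}(\mathbf{n})) \leq \mathbf{C}$ for the two-pool example and evaluates the bounding state $\min_j \lceil C_j / w_{j,i} \rceil$. The column ordering of `W`, the 0-based indices and the function names are assumptions of this illustration; in addition, the minimum is taken only over pools with $w_{j,i} > 0$, an assumption made here so that the ratio is well defined.

```python
# Two-pool example; assumed column order: dummy, (1,1), dummy, (2,0), (0,2).
W = [
    [0, 1, 0, 2, 0],
    [0, 1, 0, 0, 2],
]
C = [3, 3]


def feasible_action(W, n, a, C):
    """Constraint (5): W (n + a) <= C in every pool."""
    return all(
        sum(w * (n_i + a_i) for w, n_i, a_i in zip(row, n, a)) <= c
        for row, c in zip(W, C)
    )


def bounding_state(W, C, i):
    """min_j ceil(C_j / w_{j,i}) over the pools that pattern i actually uses.

    Intended for non-dummy patterns only (a dummy pattern uses no pool).
    """
    used = [(C[j], W[j][i]) for j in range(len(C)) if W[j][i] > 0]
    return min(-(-c // w) for c, w in used)  # integer ceiling of c / w


n = [0, 1, 0, 0, 0]                    # one type-one request in service
accept_both = [0, 1, 0, 1, 0]          # type one on (1,1), type two on (2,0)
reserve_for_type_one = [0, 1, 1, 0, 0] # type one accepted, type two blocked
reserve_for_type_two = [1, 0, 0, 1, 0] # type one blocked, type two on (2,0)

results = [
    feasible_action(W, n, accept_both, C),
    feasible_action(W, n, reserve_for_type_one, C),
    feasible_action(W, n, reserve_for_type_two, C),
]
```

In this state both request types cannot be admitted at once, so any feasible action must block one of them; this is the sense in which (5) reserves capacity for one type at the expense of the other.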
3) Objective: A policy $\phi$ is defined as a mapping $\mathcal{N} \to \mathcal{A}$, where $\mathcal{A} := \prod_{\ell \in [L]} \{0, 1\}^{|P_\ell|}$, determined by the action variables $\mathbf{a}(\mathbf{n})$ defined above. When we are discussing a system operating with a given policy $\phi$, we rewrite the action and state variables as $\mathbf{a}^\phi(\cdot)$ and $\mathbf{N}^\phi(t)$, respectively.

Serving a request of type $\ell \in [L]$ earns expected reward $r_\ell$, and occupying one RU of pool $j$ for one unit of time incurs expected cost $\varepsilon_j$. The expected reward for a whole service is gained at the moment the service is completed. It corresponds to the situation where a request allocated to pattern $i$ earns reward at rate $r_{\ell(i)} \mu_i$ for as long as it is in the system (so that the expected revenue per customer is $(r_{\ell(i)} \mu_i) \cdot (1/\mu_i) = r_{\ell(i)}$). The value of $\varepsilon_j$ is the cost per unit time of using a unit of capacity from resource pool $j$, in which case the expected cost of accommodating the request in pool $j$ as part of pattern $i$ is $\varepsilon_j / \mu_i$ per occupied RU. We seek a policy that maximizes the revenue, that is, the difference between expected reward and cost, by efficiently utilizing the limited amount of resources.

The objective is to maximize the long-run average rate of earning revenue, which exists because, for any policy $\phi$, the process can be modeled by a finite-state Markov chain. Let $\mathbf{r} = (r_\ell : \ell \in [L])$ and $\boldsymbol{\varepsilon} = (\varepsilon_j : j \in [J])$. Define an $L \times I$ matrix $U$ with entries $u_{\ell,i} := \mu_i \mathbb{1}\{i \in P_\ell\}$ for all $\ell \in [L]$ and $i \in [I]$.
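The reward and cost rates above combine into a net revenue rate per instantiation of pattern $i$, namely $r_{\ell(i)} \mu_i - \sum_{j} w_{j,i} \varepsilon_j$, which is the $i$th entry of $\mathbf{r} U - \boldsymbol{\varepsilon} W$. The sketch below computes these entries; the numerical values of `r`, `eps` and `mu`, the 0-based indexing and the function names are hypothetical and serve only as an illustration.

```python
def net_revenue_rates(r, eps, mu, W, type_of):
    """Entry i is r_{l(i)} * mu_i - sum_j w_{j,i} * eps_j, i.e. (rU - eps W)_i."""
    rates = []
    for i in range(len(mu)):
        cost_rate = sum(W[j][i] * eps[j] for j in range(len(eps)))
        rates.append(r[type_of[i]] * mu[i] - cost_rate)
    return rates


def revenue_rate_in_state(rates, n):
    """Instantaneous revenue rate when n_i requests are served by pattern i."""
    return sum(rate * n_i for rate, n_i in zip(rates, n))


# Two-pool example; assumed column order: dummy, (1,1), dummy, (2,0), (0,2).
W = [
    [0, 1, 0, 2, 0],
    [0, 1, 0, 0, 2],
]
type_of = [0, 0, 1, 1, 1]
r = [4.0, 6.0]                      # reward per served request, by type (made up)
eps = [1.0, 0.5]                    # cost per RU per unit time, by pool (made up)
mu = [1.0, 0.5, 1.0, 0.25, 0.25]    # departure rates, by pattern (made up)

rates = net_revenue_rates(r, eps, mu, W, type_of)
current = revenue_rate_in_state(rates, [0, 1, 0, 1, 0])
```

With these made-up numbers, serving a type-two request in the cheaper pool 2 yields a positive rate while serving it in pool 1 yields a negative one. The entries for dummy patterns play no role, since a dummy pattern never holds any instantiation.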
By the Strong Law of Large Numbers for Continuous Time Markov Chains (see, for example, [Serfozo, 2009], Theorem 45 in Chapter 4, noting the subsequent discussion of the case where rewards are earned at jump times), the long-run average rate of earning revenue when the policy is $\phi$ is given by

$$R^\phi := \mathbb{E}_{\pi^\phi}\big[(\mathbf{r} U - \boldsymbol{\varepsilon} W)\, \mathbf{N}^\phi\big] = \sum_{i \in [I]} \sum_{n_i \in \mathcal{N}_i} \pi_i^\phi(n_i) \Big( r_{\ell(i)} \mu_i - \sum_{j \in [J]} w_{j,i} \varepsilon_j \Big) n_i, \tag{7}$$

where $\pi_i^\phi(n_i)$ is the stationary probability that the state of process $i$ is $n_i$ when the policy is $\phi$. We wish to find the policy $\phi$ that maximizes $R^\phi$; that is, we wish to find

$$R := \max_\phi R^\phi. \tag{8}$$

Define $\Phi$ to be the set of all policies with the constraints in (1) and (5) satisfied. Each policy in $\Phi$ is then a feasible policy for our resource allocation problem.

III. WHITTLE RELAXATION
Our resource allocation problem with objective function defined by (8) and constraints given by (1) and (5) can be modeled as a set of RMABPs coupled by capacity constraints. We leave the specification of the RMABPs to Appendix A.

In this section, we provide a theoretical analysis of the resource allocation problem, following the idea of Whittle relaxation [Whittle, 1988]. In the vein of an RMABP, we randomize the action variable $\mathbf{a}^\phi(\mathbf{n})$ so that its elements take values from $\{0, 1\}$ with probabilities determined by the policy $\phi$, and relax constraint (1) to require that

$$\lim_{t \to +\infty} \mathbb{E}\Big[\sum_{i \in P_\ell} a_i^\phi(\mathbf{N}^\phi(t))\Big] = 1, \quad \forall \ell \in [L]. \tag{9}$$

Following similar ideas, we relax (5) into two constraints:

$$\lim_{t \to +\infty} \mathbb{E}\Big[W\big(\mathbf{N}^\phi(t) + \mathbf{a}^\phi(\mathbf{N}^\phi(t))\big)\Big] \leq \mathbf{C}, \tag{10}$$

and

$$\lim_{t \to +\infty} \mathbb{E}\Big[a_i^\phi(\mathbf{N}^\phi(t))\, \mathbb{1}\{N_i^\phi(t) = |\mathcal{N}_i| - 1\}\Big] = 0, \quad \forall i \in [I] \setminus \{d(\ell) : \ell \in [L]\}. \tag{11}$$

Remark
Equation (10) is derived by taking expectations of both sides of Equation (5), and (11) is a consequence of (6), so the constraints described by (10) and (11) are relaxed versions of the constraints described by (5). The justification for Equation (11) will be discussed in Section V-A, in conjunction with the physical meanings of all variables, when we increase the scale of the entire system. We refer to the problem with objective (8), constraints (9), (10) and (11) and randomized control variables $\mathbf{a}^\phi(\mathbf{n})$, for all $\mathbf{n} \in \mathcal{N}$, as the relaxed problem.

A value $a$ in $(0, 1)$ can be interpreted as a randomization between taking $a_i^\phi(\mathbf{n}) = 0$ and $a_i^\phi(\mathbf{n}) = 1$. Specifically, we take $a_i^\phi(\mathbf{n}) = 1$ with probability $a$. We represent the set of policies that correspond to assigning such values $a \in (0, 1)$ as $\tilde{\Phi}$. For $n_i \in \mathcal{N}_i$, $\phi \in \tilde{\Phi}$, $i \in [I]$, define
• $\alpha_i^\phi(n_i) := \lim_{t \to +\infty} \mathbb{E}\big[a_i^\phi(\mathbf{N}^\phi(t)) \mid N_i^\phi(t) = n_i\big]$, which is the expectation with respect to the stationary distribution when policy $\phi$ is used, and the vector $\boldsymbol{\alpha}_i^\phi := (\alpha_i^\phi(n_i) : n_i \in \mathcal{N}_i)$;
• the stationary probability that $N_i^\phi(t) = n_i$ under policy $\phi$ to be $\pi_{i,n_i}^\phi$, and the vector $\boldsymbol{\pi}_i^\phi := (\pi_{i,n_i}^\phi : n_i \in \mathcal{N}_i)$.

Let $\Pi_n^\phi := \big(\boldsymbol{\pi}_i^\phi \cdot (\mathcal{N}_i) : i \in [I]\big)^T$ and $\Pi_a^\phi := \big(\boldsymbol{\pi}_i^\phi \cdot \boldsymbol{\alpha}_i^\phi : i \in [I]\big)^T$, where $(\mathcal{N}_i)$ represents the vector $(0, 1, \ldots, |\mathcal{N}_i| - 1)$. The Lagrangian function for the optimization problem with objective function (8) and constraints (9), (10) and (11) is

$$g(\boldsymbol{\gamma}, \boldsymbol{\nu}, \boldsymbol{\eta}) := \max_{\phi \in \tilde{\Phi}} \; (\mathbf{r} U - \boldsymbol{\varepsilon} W)\, \Pi_n^\phi - \sum_{\ell=1}^{L} \nu_\ell \Big(\sum_{i \in P_\ell} \boldsymbol{\pi}_i^\phi \cdot \boldsymbol{\alpha}_i^\phi - 1\Big) - \boldsymbol{\gamma} \cdot \big(W(\Pi_n^\phi + \Pi_a^\phi) - \mathbf{C}\big) - \sum_{i \in [I] \setminus \{d(\ell) : \ell \in [L]\}} \eta_i\, \pi_{i,|\mathcal{N}_i|-1}^\phi\, \alpha_i^\phi(|\mathcal{N}_i| - 1), \tag{12}$$

where $\boldsymbol{\nu} \in \mathbb{R}^L$, $\boldsymbol{\gamma} \in \mathbb{R}_0^J$ and $\boldsymbol{\eta} \in \mathbb{R}^{I-L}$ are Lagrange multiplier vectors for constraints (9), (10) and (11), respectively.
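In (12), the constraints no longer apply to the variables $\boldsymbol{\alpha}_i^{\phi}$ ($i\in[I]$) but appear in the maximization as cost items weighted by their Lagrange multipliers. For $i\in[I]\setminus\{d(\ell):\ell\in[L]\}$, define functions
\[
\Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i) := \bigl(r_{\ell(i)}\mu_i - \boldsymbol{\varepsilon}\cdot\boldsymbol{w}_i\bigr)\,\boldsymbol{\pi}_i^{\phi}\cdot(\mathcal{N}_i) - \nu_{\ell(i)}\,\boldsymbol{\pi}_i^{\phi}\cdot\boldsymbol{\alpha}_i^{\phi} - \boldsymbol{\gamma}\cdot\Bigl(\boldsymbol{w}_i\bigl(\boldsymbol{\pi}_i^{\phi}\cdot(\mathcal{N}_i) + \boldsymbol{\pi}_i^{\phi}\cdot\boldsymbol{\alpha}_i^{\phi}\bigr)\Bigr) - \eta_i\,\pi^{\phi}_{i,|\mathcal{N}_i|-1}\,\alpha_i^{\phi}(|\mathcal{N}_i|-1), \tag{13}
\]
where we recall that $\boldsymbol{w}_i$ is the weight vector of pattern $i$, given by the $i$th column vector of $W$; similarly, for $\ell\in[L]$, $\boldsymbol{\gamma}\in\mathbb{R}^{J}$ and $\eta\in\mathbb{R}$, define $\Lambda_{d(\ell)}(\phi,\boldsymbol{\gamma},\nu_\ell,\eta) := -\nu_\ell\,\alpha^{\phi}_{d(\ell)}(n)$, where $n$ is the only state in $\mathcal{N}_{d(\ell)}$. From Equation (12), for $\boldsymbol{\gamma}\in\mathbb{R}^{J}$, $\boldsymbol{\nu}\in\mathbb{R}^{L}$ and $\boldsymbol{\eta}\in\mathbb{R}^{I-L}$,
\[
g(\boldsymbol{\gamma},\boldsymbol{\nu},\boldsymbol{\eta}) = \max_{\phi\in\tilde{\Phi}} \sum_{i\in[I]} \Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i) + \sum_{\ell\in[L]}\nu_\ell + \boldsymbol{\gamma}\cdot\boldsymbol{C}, \tag{14}
\]
where $\eta_{d(\ell)}$ ($\ell\in[L]$) are unconstrained real numbers that are used for notational convenience.

In the maximization problem on the right-hand side of (14), there is no constraint that restricts the value of one $\Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i)$ once the others are known. As a result, we can maximize the sum in (14) by maximizing each of the summands independently. We can thus write (14) as
\[
g(\boldsymbol{\gamma},\boldsymbol{\nu},\boldsymbol{\eta}) = \sum_{i\in[I]} \max_{\phi\in\tilde{\Phi}} \Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i) + \sum_{\ell\in[L]}\nu_\ell + \boldsymbol{\gamma}\cdot\boldsymbol{C}, \tag{15}
\]
where each maximum is still taken over $\phi\in\tilde{\Phi}$. Observe now that maximizing $\Lambda_i$ over $\phi$ is equivalent to choosing the vector $\boldsymbol{\alpha}_i^{\phi}$ from $[0,1]^{|\mathcal{N}_i|}$, by interpreting $\alpha_i^{\phi}(n)\in[0,1]$ as the probability that process $i$ is activated under policy $\phi$ when it is in state $n$. Thus,
\[
g(\boldsymbol{\gamma},\boldsymbol{\nu},\boldsymbol{\eta}) = \sum_{i\in[I]} \max_{\boldsymbol{\alpha}_i^{\phi}\in[0,1]^{|\mathcal{N}_i|}} \Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i) + \sum_{\ell\in[L]}\nu_\ell + \boldsymbol{\gamma}\cdot\boldsymbol{C}. \tag{16}
\]
By slightly abusing notation, we refer to the policy $\phi$ determined by an action vector $\boldsymbol{\alpha}_i^{\phi}$ as the policy for pattern $i$, and define $\Phi_i$ as the set of all policies for pattern $i$.

Definition 1.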
The maximization of $\Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i)$ over $\boldsymbol{\alpha}_i^{\phi}\in[0,1]^{|\mathcal{N}_i|}$ is the sub-problem for pattern $i\in[I]$.

For given $\boldsymbol{\gamma}$, $\boldsymbol{\nu}$ and $\boldsymbol{\eta}$, the sub-problem for any pattern is an MDP, so that it can be numerically solved by dynamic programming. By solving the sub-problems for all patterns $i\in[I]$, we obtain $g(\boldsymbol{\gamma},\boldsymbol{\nu},\boldsymbol{\eta})$. For any $\boldsymbol{\gamma}$, $\boldsymbol{\nu}$ and $\boldsymbol{\eta}$, the Lagrangian function $g(\boldsymbol{\gamma},\boldsymbol{\nu},\boldsymbol{\eta})$ is a performance upper bound for the primal problem described in (8), (9), (10) and (11), which is a relaxed version of the original resource allocation problem. Thus there will be a non-negative gap between this upper bound and the maximized performance of the original problem.

A. Analytical Solutions
Proposition 1.
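For given $\boldsymbol{\nu}$ and $\boldsymbol{\gamma}$, there exists $\boldsymbol{E}\in\mathbb{R}^{I-L}$ such that, for any $\boldsymbol{\eta} > \boldsymbol{E}$, a policy of the sub-problem for pattern $i$, referred to as $\bar{\varphi}\in\Phi_i$ and determined by action vector $\boldsymbol{\alpha}_i^{\bar{\varphi}}\in[0,1]^{|\mathcal{N}_i|}$, is optimal for this sub-problem if, for $n\in\mathcal{N}_i$,
\begin{align}
\alpha_i^{\bar{\varphi}}(n) &= 1, && \text{if } 0 < \lambda_\ell\Bigl(r_\ell - \frac{1}{\mu_i}\sum_{j\in\mathcal{J}_i}\varepsilon_j w_{j,i}\Bigr) - \Bigl(1+\frac{\lambda_\ell}{\mu_i}\Bigr)\sum_{j\in\mathcal{J}_i} w_{j,i}\gamma_j - \nu_\ell \text{ and } n < |\mathcal{N}_i|-1, \tag{17}\\
\alpha_i^{\bar{\varphi}}(n) &\in [0,1], && \text{if } 0 = \lambda_\ell\Bigl(r_\ell - \frac{1}{\mu_i}\sum_{j\in\mathcal{J}_i}\varepsilon_j w_{j,i}\Bigr) - \Bigl(1+\frac{\lambda_\ell}{\mu_i}\Bigr)\sum_{j\in\mathcal{J}_i} w_{j,i}\gamma_j - \nu_\ell \text{ and } n < |\mathcal{N}_i|-1, \tag{18}\\
\alpha_i^{\bar{\varphi}}(n) &= 0, && \text{otherwise}, \tag{19}
\end{align}
where $\ell = \ell(i)$.

The proof will be given in Appendix B in the e-companion to this paper. In the maximization of $\Lambda_i(\phi,\boldsymbol{\gamma},\nu_{\ell(i)},\eta_i)$, the only term of $\Lambda_i$ dependent on $\boldsymbol{\eta}$ is $-\eta_i\,\pi^{\phi}_{i,|\mathcal{N}_i|-1}\,\alpha_i^{\phi}(|\mathcal{N}_i|-1)$. The choice of a sufficiently large $\eta_i$ guarantees that $\alpha_i^{\phi}(|\mathcal{N}_i|-1)$ is $0$ for an optimal policy of the sub-problem, so that constraints (11) of the relaxed problem are satisfied. For convenience, in what follows we fix $\boldsymbol{\eta}$ to be one of these large values so that $\alpha_i^{\phi}(|\mathcal{N}_i|-1)$ is also fixed to be $0$ for any optimal policy $\phi$ of the sub-problem for pattern $i$. By slightly abusing notation, in all subsequent equations and discussions, we directly require $\alpha_i^{\phi}(|\mathcal{N}_i|-1) = 0$ ($i\in[I]\setminus\{d(\ell):\ell\in[L]\}$) unless specified otherwise.

Remark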
Recall that the action variables $\boldsymbol{\alpha}_i^{\phi}$ for any pattern $i\in[I]$ and policy $\phi\in\Phi_i$ are potentially state-dependent. However, the right-hand sides of equations (17)-(19) are independent of the state variable $n$ which appears on their left-hand side, provided that this is less than $|\mathcal{N}_i|-1$. This state-independence phenomenon is a consequence of the linearity of the reward and cost rates in the state variable, $N_i^{\phi}(t)$, for pattern $i\in[I]\setminus\{d(\ell):\ell\in[L]\}$: from our definition in Section II, the reward and cost rates of process $i$ in state $N_i^{\phi}(t)$ are $r_{\ell(i)}\mu_i N_i^{\phi}(t)$ and $\sum_{j\in\mathcal{J}_i}\varepsilon_j w_{j,i} N_i^{\phi}(t)$, respectively. A detailed analysis is provided in the proof of Proposition 1.

Using an argument similar to that in [Whittle, 1988], we can derive from (17)-(19) an abstracted priority for each pattern-state pair (PS pair) $(i,n)$ with $n\in\mathcal{N}_i\setminus\{|\mathcal{N}_i|-1\}$ and $i\in[I]$; unlike in [Whittle, 1988], here this priority is $(\boldsymbol{\gamma},\boldsymbol{\nu})$-dependent. The priority of a PS pair $(i,n)$ with $n\in\mathcal{N}_i\setminus\{|\mathcal{N}_i|-1\}$ is determined by the index
\[
\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu}) := \lambda_{\ell(i)}\Bigl(r_{\ell(i)} - \frac{1}{\mu_i}\sum_{j\in\mathcal{J}_i}\varepsilon_j w_{j,i}\Bigr) - \Bigl(1+\frac{\lambda_{\ell(i)}}{\mu_i}\Bigr)\sum_{j\in\mathcal{J}_i} w_{j,i}\gamma_j - \nu_{\ell(i)}, \tag{20}
\]
and (17)-(19) can be characterized as comparing $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$ with $0$. When there is strict inequality in the comparison (that is, the cases described in (17) and (19)), the value of $\alpha_i^{\phi}(n)$ is specified, since PS pairs $(i,n)$ for all $n\in\mathcal{N}_i\setminus\{|\mathcal{N}_i|-1\}$ correspond to the same $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$ value. However, there is still freedom to decide different values of $\alpha_i^{\phi}(n)$ when $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu}) = 0$ (the case described in (18)). A detailed discussion about priorities of PS pairs corresponding to the same $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$ will be provided in Section III-B.
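The index in (20) and the comparison with zero in (17)-(19) can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values are hypothetical, and the sequences `eps`, `w` and `gamma` are assumed to be restricted to the pools $j\in\mathcal{J}_i$ of the pattern in question.

```python
def xi_index(lam, r, mu, eps, w, gamma, nu):
    """Index (20) of a non-dummy pattern i with request type l = l(i).
    lam: arrival rate of type l; r: reward r_l; mu: service rate mu_i;
    eps, w, gamma: cost rates, weights and multipliers over the pools in J_i;
    nu: multiplier nu_l of the action constraint."""
    unit_cost = sum(e * wj for e, wj in zip(eps, w))
    shadow_price = sum(wj * g for wj, g in zip(w, gamma))
    return lam * (r - unit_cost / mu) - (1.0 + lam / mu) * shadow_price - nu


def relaxed_action(xi, n, n_states):
    """Activation probability suggested by (17)-(19) for state n.
    Returns None when any value in [0, 1] is optimal (the tie case (18))."""
    if n >= n_states - 1:      # boundary state |N_i| - 1 is kept passive
        return 0.0
    if xi > 0:
        return 1.0
    if xi < 0:
        return 0.0
    return None


xi = xi_index(lam=2.0, r=5.0, mu=1.0, eps=[0.5], w=[2.0], gamma=[0.25], nu=1.0)
print(xi)  # 5.5
```

Because the index does not depend on $n$, all non-boundary states of a pattern receive the same answer from `relaxed_action`, which mirrors the state-independence noted in the remark.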
By solving the sub-problem of dummy pattern $d(\ell)$ ($\ell\in[L]$), which involves only one state $n\in\mathcal{N}_{d(\ell)}$, we obtain an optimal policy $\varphi$ determined by
\[
\alpha^{\varphi}_{d(\ell)}(n)
\begin{cases}
= 1, & \text{if } 0 < -\nu_\ell,\\
\in [0,1], & \text{if } 0 = -\nu_\ell,\\
= 0, & \text{otherwise}.
\end{cases} \tag{21}
\]
The priority of the state of a dummy pattern is then $\Xi_{d(\ell)}(\boldsymbol{\gamma},\boldsymbol{\nu}) \equiv -\nu_\ell$ for any $\boldsymbol{\gamma}$.

In addition, from Equation (19) in Proposition 1, for any given $\boldsymbol{\nu}\in\mathbb{R}^{L}$ and $\boldsymbol{\gamma}\in\mathbb{R}^{J}$, there exists $\boldsymbol{\eta}\in\mathbb{R}^{I-L}$ such that it is optimal to make states $|\mathcal{N}_i|-1$ passive (that is, $\alpha_i^{\bar{\varphi}}(|\mathcal{N}_i|-1) = 0$) for all $i\in[I]\setminus\{d(\ell):\ell\in[L]\}$. Among all PS pairs $(i,n)$ ($n\in\mathcal{N}_i$, $i\in[I]$), we assign, without loss of generality, the least priority to those PS pairs $(i,|\mathcal{N}_i|-1)$ for which $i\in[I]\setminus\{d(\ell):\ell\in[L]\}$.

The policy $\bar{\varphi}$ determined by (17)-(19) and (21) is optimal for the relaxed problem described by (8), (9), (10) and (11) if the given multipliers $\boldsymbol{\nu}$ and $\boldsymbol{\gamma}$ that appear in (17)-(19) and (21) satisfy the complementary slackness conditions of this relaxed problem, defined by

Complementary Slackness:
\[
\nu_\ell\Bigl(\sum_{i\in\mathcal{P}_\ell}\boldsymbol{\pi}_i^{\phi}\cdot\boldsymbol{\alpha}_i^{\phi} - 1\Bigr) = 0, \quad \forall \ell\in[L], \tag{22}
\]
and
\[
\gamma_j\Bigl(\boldsymbol{\omega}_j\cdot\bigl(\Pi_n^{\phi}+\Pi_a^{\phi}\bigr) - C_j\Bigr) = 0, \quad \forall j\in[J], \tag{23}
\]
where $\boldsymbol{\omega}_j = (w_{j,i} : i\in[I])$ is the $j$th row of matrix $W$.

In this context, if resource pool $j\in[J]$ is very popular so that the capacity constraint corresponding to the $j$th row in (10) achieves equality, then $\gamma_j$ is allowed to be positive, leading to a lower value of $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$ than for $\gamma_j = 0$. On the other hand, if resource pool $j\in[J]$ cannot be fully occupied and the $j$th capacity constraint in (10) is satisfied with a strict inequality, then the complementary slackness condition described in (23) forces $\gamma_j$ to be zero. Following this mechanism, when resource pool $j\in[J]$ is overloaded and its priority is reduced by increasing $\gamma_j$, the offered traffic to this resource pool will be reduced in line with its priority.

If there exist multipliers $\boldsymbol{\nu}$, $\boldsymbol{\gamma}$ and a policy $\bar{\varphi}$ determined by (17)-(19) such that the complementary slackness conditions (22) and (23) are satisfied by taking $\phi = \bar{\varphi}$, then, by the strong duality theorem, this policy $\bar{\varphi}$ is optimal for the relaxed problem; this observation, together with Theorem 1 in Section V-C, leads to the existence of an asymptotically optimal policy feasible for the original problem, derived with priorities of patterns induced by the descending order of $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$.
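The complementary slackness conditions (22) and (23) are easy to check numerically once the stationary quantities of a candidate policy are known. The sketch below assumes those quantities have already been computed; the function name, the tolerance and the sample numbers are illustrative assumptions, not part of the model.

```python
def complementary_slackness(nu, action_sums, gamma, pool_loads, capacity, tol=1e-9):
    """Check (22) and (23) up to a numerical tolerance.
    nu[l], action_sums[l]: multiplier and value of sum_{i in P_l} pi_i . alpha_i;
    gamma[j], pool_loads[j], capacity[j]: multiplier, value of
    omega_j . (Pi_n + Pi_a), and capacity C_j of pool j."""
    action_ok = all(abs(v * (s - 1.0)) <= tol
                    for v, s in zip(nu, action_sums))
    capacity_ok = all(abs(g * (load - c)) <= tol
                      for g, load, c in zip(gamma, pool_loads, capacity))
    return action_ok and capacity_ok


# Pool 0 is full, so gamma_0 may be positive; pool 1 has slack, so gamma_1 must be 0.
print(complementary_slackness([0.0], [1.0], [0.3, 0.0], [10.0, 4.0], [10.0, 6.0]))
print(complementary_slackness([0.0], [1.0], [0.3, 0.1], [10.0, 4.0], [10.0, 6.0]))
```

The second call fails the check because a positive multiplier is attached to a pool whose capacity constraint is slack.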
More details about the analysis in the asymptotic regime will be provided in Section V. Here we focus on the non-asymptotic regime, and specifically on the choice and computation of $\boldsymbol{\gamma}$ and $\boldsymbol{\nu}$.

B. Decomposable Capacity Constraints
In the general case, it is not clear whether the complementary slackness conditions (22) and (23) can be satisfied and, even if they are, what the values of $\boldsymbol{\gamma}$ and the corresponding $\boldsymbol{\nu}$ are. More important is the question of how the multipliers help with proposing the asymptotically optimal policy applicable to the original problem.

In Sections III-B and III-C, we concentrate on the complementary slackness conditions and the existence of an optimal policy (for the relaxed problem) satisfying (17)-(19). Recall that (17)-(19) intuitively suggest priorities of patterns induced by $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$. Later, in Section IV, a policy feasible for the original problem will be proposed based on given priorities of patterns, and its asymptotic optimality will be discussed in Section V-C.
1) Priorities of PS Pairs:
As described in Section III-A, the priorities of PS pairs are determined by the descending order of $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$, with higher priorities given by higher values of $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$. It may happen that, because of different tie-breaking rules, the same $\boldsymbol{\gamma}$ and $\boldsymbol{\nu}$ lead to different priorities. For given $\boldsymbol{\gamma}\in\mathbb{R}^{J}$ and $\boldsymbol{\nu}\in\mathbb{R}^{L}$, let $\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{\nu})$ represent the set of all rankings of PS pairs according to the descending order of $\Xi_i(\boldsymbol{\gamma},\boldsymbol{\nu})$ ($i\in[I]$). Also, for notational convenience, let $\mathcal{O}$ represent the set of all PS pair rankings.

To emphasize the priorities of these PS pairs, according to a given ranking $o\in\mathcal{O}$, we label all these pairs by their order $\iota^{o}\in[N]$ with $N := \sum_{i\in[I]}|\mathcal{N}_i|$, and $(i_{\iota^{o}}, n_{\iota^{o}})$ giving the pattern and the state of the $\iota^{o}$th PS pair. We will omit the superscript $o$ and use $\iota$ for notational simplicity unless it is necessary to specify the underlying ranking. There exists one and only one $\ell\in[L]$ satisfying $i_\iota\in\mathcal{P}_\ell$ for any PS pair labeled by $\iota$. Such an $\ell$ is denoted by $\ell_\iota$.

For any given ranking of PS pairs $o\in\mathcal{O}$, we can generate a policy $\bar{\varphi}(o)$ with priorities of PS pairs defined by $o$, such that (9), (10) and (11) are satisfied: the policy $\bar{\varphi}(o)$ is feasible for the relaxed problem but not necessarily feasible for the original problem. The pseudo-code for generating $\bar{\varphi}(o)$ is presented in Algorithm 1. The key idea for generating such a $\bar{\varphi}(o)$ is to initialize $\boldsymbol{\alpha}_i^{\bar{\varphi}(o)}$ to $\boldsymbol{0}$ for all $i\in[I]$, and sequentially activate the PS pairs according to their priorities defined by $o$ until either a relaxed action or capacity constraint described in (9) and (10), respectively, achieves equality. In particular,
(I) if a relaxed action constraint described in (9) achieves equality by activating PS pairs less than or equal to $\iota$, then the multiplier $\nu_{\ell_\iota}$ is set to $\Xi_{i_\iota}(\boldsymbol{\gamma},\boldsymbol{0})$, and all later PS pairs $\iota' > \iota$ with $\ell_{\iota'} = \ell_\iota$ are disabled from being activated and are removed from the list of candidate pairs awaiting later activation;
(II) similarly, if a relaxed capacity constraint described in (10) associated with resource pool $j\in[J]$ achieves equality by activating PS pairs less than or equal to $\iota$, then all later PS pairs $\iota' > \iota$ with $w_{j,i_{\iota'}} > 0$ are disabled and removed from the list of candidate pairs.

Maintaining an iteratively updated list of candidate pairs in this way continues until all action constraints in (9) achieve equality: the policy $\bar{\varphi}(o)$ is determined by the resulting $\boldsymbol{\alpha}_i^{\bar{\varphi}(o)}$ ($i\in[I]$), and the multipliers $\boldsymbol{\nu}$ are updated in (I). The vector of these multipliers is denoted by $\boldsymbol{\nu}(o,\boldsymbol{\gamma})$. The PS pair labeled by $\iota$ satisfying the condition described in (II) is called the critical pair, with the corresponding resource pool $j$ referred to as the critical pool of PS pair $\iota$, denoted by $j_\iota(o)$. Note that, from the description in (II), there might be more than one resource pool for which the capacity constraints achieve equality simultaneously while activating PS pair $\iota$; we choose one of them to be $j_\iota(o)$ and refer to this resource pool as the critical pool of $\iota$. Let $\mathcal{I}(o)$ represent the set of all critical pairs with respect to the policy $\bar{\varphi}(o)$.

Algorithm 1: Priority-style policy for the relaxed problem
Input: a vector of non-negative reals $\boldsymbol{\gamma}\in\mathbb{R}_0^{J}$ and a ranking of PS pairs $o\in\mathcal{O}$.
Output: a policy $\bar{\varphi}(o)\in\tilde{\Phi}$ determined by action variables $\boldsymbol{\alpha}_i^{\bar{\varphi}(o)}\in[0,1]^{|\mathcal{N}_i|}$ for all $i\in[I]$, and a vector of reals $\boldsymbol{\nu}(o,\boldsymbol{\gamma})$.
Function PriorityPolicy($o$, $\boldsymbol{\gamma}$):
    $\boldsymbol{\alpha}_i^{\bar{\varphi}} \leftarrow \boldsymbol{0}$ for all $i\in[I]$    /* Variables $\boldsymbol{\alpha}_i^{\bar{\varphi}}$ determine a policy $\bar{\varphi}$ */
    Initialize the list of candidate PS pairs as the list of all PS pairs
    $\iota \leftarrow 0$    /* Iteration variable */
    while $\iota < N$ and the list of candidate PS pairs is not empty do
        $\iota \leftarrow \iota + 1$
        if PS pair $\iota$ is not in the list of candidate PS pairs then
            continue
        end
        $a_1 \leftarrow \inf\bigl\{\bigl\{\alpha_{i_\iota}^{\bar{\varphi}}(n_\iota)\in[0,1] \bigm| \sum_{i\in\mathcal{P}_{\ell_\iota}}\boldsymbol{\pi}_i^{\bar{\varphi}}\cdot\boldsymbol{\alpha}_i^{\bar{\varphi}} = 1\bigr\}\cup\{1\}\bigr\}$
            /* The maximal probability of activating PS pair $\iota$ such that the action constraint for RT $\ell_\iota$ is not violated. */
        $a_2 \leftarrow \inf\bigl\{\bigl\{\alpha_{i_\iota}^{\bar{\varphi}}(n_\iota)\in[0,1] \bigm| \exists j\in[J],\ \boldsymbol{\omega}_j\cdot(\Pi_n^{\bar{\varphi}}+\Pi_a^{\bar{\varphi}}) = C_j\bigr\}\cup\{1\}\bigr\}$
            /* The maximal probability of activating PS pair $\iota$ such that the capacity constraints are not violated. */
        $\alpha_{i_\iota}^{\bar{\varphi}}(n_\iota) \leftarrow \min\{a_1, a_2\}$
            /* Update $\alpha_{i_\iota}^{\bar{\varphi}}(n_\iota)$ with the maximal activating probability without violating any constraint. */
        if $\sum_{i\in\mathcal{P}_{\ell_\iota}}\boldsymbol{\pi}_i^{\bar{\varphi}}\cdot\boldsymbol{\alpha}_i^{\bar{\varphi}} = 1$ then
            /* If the action constraint achieves equality under policy $\bar{\varphi}$ determined by the updated $\boldsymbol{\alpha}_i^{\bar{\varphi}}$, $i\in[I]$. */
            $\nu_{\ell_\iota}(o,\boldsymbol{\gamma}) \leftarrow \Xi_{i_\iota}(\boldsymbol{\gamma},\boldsymbol{0})$
            remove all PS pairs $\iota' > \iota$ with $\ell_{\iota'} = \ell_\iota$ from the list of candidate PS pairs
        else if $\exists j\in[J]$, $\boldsymbol{\omega}_j\cdot(\Pi_n^{\bar{\varphi}}+\Pi_a^{\bar{\varphi}}) = C_j$ then
            /* If a capacity constraint achieves equality under policy $\bar{\varphi}$ determined by the updated $\boldsymbol{\alpha}_i^{\bar{\varphi}}$, $i\in[I]$. */
            remove all PS pairs $\iota' > \iota$ with $w_{j,i_{\iota'}} > 0$ from the list of candidate PS pairs
        end
    end
    $\boldsymbol{\alpha}_i^{\bar{\varphi}(o)} \leftarrow \boldsymbol{\alpha}_i^{\bar{\varphi}}$ for all $i\in[I]$
    return

Lemma 1.
For any $o\in\mathcal{O}$ and $\iota,\iota'\in\mathcal{I}(o)$, if $\iota\neq\iota'$ then $i_\iota\neq i_{\iota'}$.

Proof. Consider critical pairs $\iota,\iota'\in\mathcal{I}(o)$ with $\iota\neq\iota'$, and assume $\iota<\iota'$ without loss of generality. Since $\iota$ is a critical pair, there is a critical resource pool $j_\iota$ which is fully occupied. In this case, if $i_\iota = i_{\iota'}$, then pair $\iota'$ must require some resource units from pool $j_\iota$ and so $\alpha^{\bar{\varphi}(o)}_{i_{\iota'}}(n_{\iota'}) = 0$. PS pair $\iota'$ then cannot be critical, which violates the condition $\iota'\in\mathcal{I}(o)$. Hence, $i_\iota\neq i_{\iota'}$. This proves the lemma. $\square$

Recall that, for any ranking $o$, the policy $\bar{\varphi}(o)$ must satisfy the action and capacity constraints (9), (10) and (11). Also, since (9) holds, the complementary slackness conditions corresponding to the action constraints (22) are satisfied by taking $\phi = \bar{\varphi}(o)$. However, the complementary slackness conditions corresponding to the capacity constraints (23) and equations (17)-(19) are not necessarily satisfied if we plug in $\phi = \bar{\varphi}(o)$ and $\boldsymbol{\gamma}$: the policy $\bar{\varphi}(o)$ is a heuristic policy applicable to the relaxed problem defined by (8), (9), (10) and (11), derived by intuitively prioritizing PS pairs according to their ranking $o\in\mathcal{O}$.

In Section III-C we shall define a particular class of resource allocation models, for which we can show the complementary slackness conditions are indeed satisfied.

Definition 2.
The system is said to be decomposable if there exist multipliers $\boldsymbol{\gamma}\in\mathbb{R}^{J}$, $\boldsymbol{\nu}\in\mathbb{R}^{L}$ and a ranking $o\in\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{\nu})$ such that $\boldsymbol{\nu} = \boldsymbol{\nu}(o,\boldsymbol{\gamma})$ and the complementary slackness conditions (22) and (23) are achieved by taking $\phi = \bar{\varphi}(o)$. In this case the optimal values of the dual variables are called decomposable values.

Recall that, in the general case, for $\boldsymbol{\gamma}\in\mathbb{R}^{J}$ and $\boldsymbol{\nu}\in\mathbb{R}^{L}$, even if $o\in\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{\nu})$, the policy $\bar{\varphi}(o)$ is not necessarily optimal (because it does not necessarily satisfy (17)-(19)). When the policy $\bar{\varphi}(o)$ is optimal for the relaxed problem, the ranking $o$ can be used to construct an index policy applicable to the original problem (detailed steps are provided in Section IV). Theorem 1 (in Section V-C) then ensures that such an index policy is asymptotically optimal.
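To make the notion of a ranking $o\in\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{\nu})$ concrete, the following Python sketch orders PS pairs by descending index, places the boundary states $|\mathcal{N}_i|-1$ of non-dummy patterns last, and breaks ties by the order in which pairs are listed. The function name, the data layout and the tie-breaking rule are assumptions chosen for illustration; any other tie-breaking rule yields another element of the same set of rankings.

```python
def rank_ps_pairs(xi, n_states, dummy):
    """Return PS pairs (i, n) from highest to lowest priority.
    xi[i]: index value of pattern i (for a dummy pattern d(l), pass -nu_l);
    n_states[i]: number of states |N_i| of pattern i;
    dummy: set of dummy pattern ids."""
    interior, boundary = [], []
    for i, size in enumerate(n_states):
        for n in range(size):
            if i not in dummy and n == size - 1:
                boundary.append((i, n))      # least priority, kept passive
            else:
                interior.append((i, n))
    # list.sort is stable, so pairs with equal index keep their listed order.
    interior.sort(key=lambda pair: -xi[pair[0]])
    return interior + boundary


# Hypothetical system: patterns 0 and 2 are real, pattern 1 is a dummy pattern.
ranking = rank_ps_pairs(xi=[3.0, 0.0, 1.5], n_states=[3, 1, 2], dummy={1})
print(ranking)  # [(0, 0), (0, 1), (2, 0), (1, 0), (0, 2), (2, 1)]
```

The position of a pair in the returned list plays the role of its label $\iota$.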
2) Derivation of the Pair Ranking:
We start with a proposition that shows how the values of the Lagrange multipliers $\boldsymbol{\nu}$ and $\boldsymbol{\gamma}$ can be derived from a knowledge of the critical pair and critical resource pool corresponding to a given order $o\in\mathcal{O}$.

Proposition 2.
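For any given $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ and $o\in\mathcal{O}$, the linear equations
\[
\nu_{\ell_\iota}(o,\boldsymbol{\gamma}_0) = \Xi_{i_\iota}(\boldsymbol{\gamma},\boldsymbol{0}), \quad \forall \iota\in\mathcal{I}(o), \tag{24}
\]
and
\[
\gamma_j = 0, \quad \forall j\notin\{j_\iota(o)\in[J] \mid \iota\in\mathcal{I}(o)\}, \tag{25}
\]
have a unique solution $\boldsymbol{\gamma}\in\mathbb{R}^{J}$.

The proof of Proposition 2 will be given in Appendix C in the e-companion. For a ranking $o\in\mathcal{O}$, define a function $T^{o}$ of $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ with respect to $o\in\mathcal{O}$: $T^{o}(\boldsymbol{\gamma}_0) := \boldsymbol{\gamma}$, where $\boldsymbol{\gamma}$ is the unique solution of (24) and (25). Let $T^{o}_j(\boldsymbol{\gamma}_0)$ represent the $j$th element of $T^{o}(\boldsymbol{\gamma}_0)$.

Proposition 3.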
If there exist $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ and $o\in\mathcal{O}(\boldsymbol{\gamma}_0,\boldsymbol{0})$ such that $T^{o}(\boldsymbol{\gamma}_0) = \boldsymbol{\gamma}_0$, then $\boldsymbol{\gamma}_0$ is a vector of decomposable multipliers and the policy $\bar{\varphi}(o)$ based on ranking $o$ is optimal for the relaxed problem defined by (8), (9), (10) and (11).

The proof of Proposition 3 will be given in Appendix D in the e-companion. Recall that $\mathcal{I}(o)$ is the set of critical pairs with respect to the policy $\bar{\varphi}(o)$, $j_\iota(o)$ is the critical resource pool corresponding to critical pair $\iota\in\mathcal{I}(o)$ according to ranking $o$, and $\nu_{\ell_\iota}(o,\boldsymbol{\gamma}_0)$ is an output of Algorithm 1 when the inputs are $o$ and $\boldsymbol{\gamma} = \boldsymbol{\gamma}_0$.

Remark
Proposition 3 provides a way of checking decomposability of $\boldsymbol{\gamma}_0$ and optimality of $\bar{\varphi}(o)$. By Proposition 3, any fixed point $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ of the function $T^{o}$ with respect to a ranking $o\in\mathcal{O}(\boldsymbol{\gamma}_0,\boldsymbol{0})$ is a decomposable vector. The decomposability of $\boldsymbol{\gamma}_0$ can be checked without requiring knowledge of any $\boldsymbol{\nu}\in\mathbb{R}^{L}$. Also, we present the following corollary of Proposition 3.

Corollary 1.
For $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ and $o\in\mathcal{O}(\boldsymbol{\gamma}_0,\boldsymbol{0})$, if $T^{o}(\boldsymbol{\gamma}_0)\neq\boldsymbol{\gamma}_0$, $T^{o}(\boldsymbol{\gamma}_0)\in\mathbb{R}_0^{J}$ and $o\in\mathcal{O}(T^{o}(\boldsymbol{\gamma}_0),\boldsymbol{0})$, then $T^{o}(T^{o}(\boldsymbol{\gamma}_0)) = T^{o}(\boldsymbol{\gamma}_0)$.

Note that the hypothesis of Corollary 1 requires all components of $T^{o}(\boldsymbol{\gamma}_0)$ to be nonnegative. This is not such an easy condition to satisfy. The proof of Corollary 1 will be given in Appendix E in the e-companion.

In this context, consider a given $\boldsymbol{\gamma}_0\in\mathbb{R}_0^{J}$ and a ranking $o\in\mathcal{O}(\boldsymbol{\gamma}_0,\boldsymbol{0})$. If $\boldsymbol{\gamma}_0$ is a fixed point of $T^{o}$, then it is the vector of decomposable multipliers; if it is not but $T^{o}(\boldsymbol{\gamma}_0)$ is a nonnegative fixed point of $T^{o}$, then $T^{o}(\boldsymbol{\gamma}_0)$ represents the decomposable multipliers. However, in both cases we need to propose a specific $\boldsymbol{\gamma}_0$; this requires prior knowledge of the multipliers, which is, in general, not available. Section III-C will discuss a special case where the decomposability is provable and we have a method of deriving the decomposable multipliers. Here, to make a reasonably good choice of the Lagrangian multipliers in a general system, we embark on a fixed point iteration method.

Since Proposition 3 requires a fixed point $\boldsymbol{\gamma}$ of the function $T^{o}$ with $o\in\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{0})$, we need to iterate not only the value of $\boldsymbol{\gamma}$ but also the corresponding ranking $o$, which affects the function $T^{o}$ and should be an element of $\mathcal{O}(\boldsymbol{\gamma},\boldsymbol{0})$. Following the idea of conventional fixed point iteration methods, for $k\in\mathbb{N}$, let $\boldsymbol{\gamma}_{k+1} = \bigl(T^{o_k}(\boldsymbol{\gamma}_k)\bigr)^{+}$ with initial $\boldsymbol{\gamma}_0$ and $o_0\in\mathcal{O}(\boldsymbol{\gamma}_0,\boldsymbol{0})$, where $(\boldsymbol{v})^{+} := (\max\{0,v_i\} : i\in[N])$ for a vector $\boldsymbol{v}\in\mathbb{R}^{N}$ ($N\in\mathbb{N}_{+}$). Construct a ranking $o_{k+1}\in\mathcal{O}(\boldsymbol{\gamma}_{k+1},\boldsymbol{0})$ according to $o_k$: for any two different PS pairs $(i,n)$ and $(i',n')$ with $\Xi_i(\boldsymbol{\gamma}_{k+1},\boldsymbol{0}) = \Xi_{i'}(\boldsymbol{\gamma}_{k+1},\boldsymbol{0})$, $(i,n)$ precedes $(i',n')$ in the ranking $o_{k+1}$ if and only if $(i,n)$ precedes $(i',n')$ in the ranking $o_k$. Here, the operation $(\cdot)^{+}$ is used to make all the elements of $\boldsymbol{\gamma}_{k+1}$ non-negative, so that $\boldsymbol{\gamma}_{k+1}$ is feasible for the function $T^{o_{k+1}}$.
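The iteration on the pair $(\boldsymbol{\gamma}_k, o_k)$ can be sketched as follows. The sketch treats the map $T^{o}$ as a user-supplied function, since evaluating it requires Algorithm 1 and the solution of (24) and (25); the stand-in `T` used in the example is a toy contraction, not the paper's map. The mixing weight `c`, the helper names and the `score_of` callback (which returns the sort key of every PS pair for a given $\boldsymbol{\gamma}$) are assumptions of this illustration; `c = 1` gives the plain iteration, and smaller values damp it in the way discussed later in this section. Python's stable sort is what carries the tie-breaking rule of $o_k$ over to $o_{k+1}$.

```python
def positive_part(v):
    """(v)^+ : replace negative entries by zero."""
    return [max(0.0, x) for x in v]


def rerank(prev_ranking, score):
    """Sort PS pairs by descending score; ties keep their order in prev_ranking."""
    return sorted(prev_ranking, key=lambda pair: -score[pair])


def fixed_point_iteration(gamma0, ranking0, T, score_of, steps, c=1.0):
    """Iterate gamma_{k+1} = (c*T^{o_k}(gamma_k) + (1-c)*gamma_k)^+ and update o_k."""
    gammas, rankings = [list(gamma0)], [list(ranking0)]
    for _ in range(steps):
        g, o = gammas[-1], rankings[-1]
        target = T(o, g)
        g_next = positive_part([c * t + (1.0 - c) * x for t, x in zip(target, g)])
        gammas.append(g_next)
        rankings.append(rerank(o, score_of(g_next)))
    return gammas, rankings


def best_step(gammas):
    """k* = argmin over k >= 1 of the Euclidean distance between gamma_{k-1} and gamma_k."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(gammas[k - 1], gammas[k])) ** 0.5
    return min(range(1, len(gammas)), key=dist)


# Toy stand-ins (hypothetical): T contracts towards 2; "p" loses priority as gamma grows.
toy_T = lambda o, g: [0.5 * x + 1.0 for x in g]
toy_score = lambda g: {("p", 0): 1.0 - g[0], ("q", 0): 0.0}
gammas, rankings = fixed_point_iteration([0.0], [("p", 0), ("q", 0)], toy_T, toy_score, steps=20)
```

In the toy run the two pairs tie after the first step and keep their previous order, and they swap only once the score of `("p", 0)` drops strictly below that of `("q", 0)`.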
Thus the ranking $o_{k+1}$ inherits the tie-breaking rule used for $o_k$, so that the difference between $o_k$ and $o_{k+1}$, which must satisfy $o_k\in\mathcal{O}(\boldsymbol{\gamma}_k,\boldsymbol{0})$ and $o_{k+1}\in\mathcal{O}(\boldsymbol{\gamma}_{k+1},\boldsymbol{0})$, is minimized. Corollary 1 can be used to check whether $\boldsymbol{\gamma}_{k+1}$ is a fixed point of the function $T^{o_k}$. Also, $\boldsymbol{\gamma}_{k+1}$ and $o_{k+1}$ are uniquely determined by $\boldsymbol{\gamma}_k$ and $o_k$. We can consider $(\boldsymbol{\gamma}_k, o_k)$ as an entity which is an argument delivered to the function $T^{o_k}(\boldsymbol{\gamma}_k)$, and wish to find a fixed point in this sense.

In the general case, the function $T^{o_k}(\boldsymbol{\gamma}_k)$ is discontinuous in $\boldsymbol{\gamma}_k$ and the sequence $\{\boldsymbol{\gamma}_k\}_{k=0}^{\infty}$ is heuristically generated with no proof of convergence to a fixed point. In fact, the choice of $\boldsymbol{\gamma}_{k+1} = \bigl(T^{o_k}(\boldsymbol{\gamma}_k)\bigr)^{+}$ may result in the sequence $\{\boldsymbol{\gamma}_k\}_{k=0}^{\infty}$ being trapped in oscillations. To avoid this, with slight abuse of notation, we modify the iteration as $\boldsymbol{\gamma}_{k+1} = \bigl(c\,T^{o_k}(\boldsymbol{\gamma}_k) + (1-c)\boldsymbol{\gamma}_k\bigr)^{+}$ with a parameter $c\in[0,1]$, which captures the effects of exploring the new point $T^{o_k}(\boldsymbol{\gamma}_k)$. Numerical examples of iterating $\boldsymbol{\gamma}_k$ will be provided in Section VI.

With an upper bound $U\in\mathbb{N}_{+}$, we take $k^{*} := \arg\min_{k=1,2,\ldots,U}\|\boldsymbol{\gamma}_{k-1}-\boldsymbol{\gamma}_k\|$ and consider $o_{k^{*}}$ as a reasonably good ranking of PS pairs. Such an $o_{k^{*}}$ is pre-computable with computational complexity no worse than $O(U(N + J))$, where $N$ and $J$ result from ordering the $N$ pairs and solving the $J$ linear equations, respectively. In Section IV, we show that an index policy feasible for the original problem can always be generated with such an $o_{k^{*}}$, and the implementation complexity is $O(I)$ in terms of computation and storage.

C. Weakly Coupled Constraints
Here, we discuss a sufficient condition under which the sequence $\{\boldsymbol{\gamma}_k\}_{k=0}^{\infty}$ is provably convergent; and, in Section VI, when this condition fails, we show via numerical examples that the sequence might still converge.

Definition 3.
Recall the matrix $W = (w_{j,i})$ defined in Section II-A. We say that row $j\in[J]$ is
1) a type-1 row if there is at most one $i\in[I]$ with $w_{j,i} > 0$;
2) a type-2 row if there is more than one $i\in[I]$ with $w_{j,i} > 0$.

That is, row $j$ is a type-1 row if resource pool $j$ is not shared by patterns of different types, and is a type-2 row otherwise. Denote by $\mathcal{J}_i = \{j\in[J] \mid w_{j,i} > 0\}$ the set of resource pools used by pattern $i$. We then define a condition.

Weak Coupling:
A system is weakly coupled if, for any pattern $i$, there is at most one $j\in\mathcal{J}_i$ with row $j$ of $W$ being a type-2 row.

This condition implies that there is at most one shared resource pool associated with each pattern. In a weakly coupled system, if pattern $i_1$ shares a resource pool $j_1$ with pattern $i_2$ and pattern $i_1$ shares a resource pool $j_2$ with pattern $i_3$, then $j_1 = j_2$. A system where each of the patterns requires only one resource pool is clearly weakly coupled. Note that, in a weakly coupled system, dependencies between state variables of different patterns still exist, because each resource pool can be shared by requests of multiple RTs.

Definition 4.
For a weakly coupled system define, for each $i\in[I]\setminus\{d(\ell):\ell\in[L]\}$, $w_i^{*} = w_{j,i}$, where $j$ is the only resource pool in $\mathcal{J}_i$ shared with other patterns, if there is one, or any member of the set $\arg\min_{j'\in\mathcal{J}_i} C_{j'}/w_{j',i}$ otherwise.

Definition 5.
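For a weakly coupled system define, for $\boldsymbol{\nu}\in\mathbb{R}^{L}$, a set of PS rankings $\mathcal{O}^{*}(\boldsymbol{\nu})\subset\mathcal{O}$ such that, for any $o\in\mathcal{O}^{*}(\boldsymbol{\nu})$, PS pairs $\iota\in[N]$ are ranked according to the descending order of
\[
\Xi^{*}_{\iota} =
\begin{cases}
\dfrac{\Xi_{i_\iota}(\boldsymbol{0},\boldsymbol{0}) - \nu_{\ell_\iota}}{w^{*}_{i_\iota}\bigl(1+\lambda_{\ell_\iota}/\mu_{i_\iota}\bigr)}, & \text{if } \nexists\,\ell\in[L],\ i_\iota = d(\ell),\\[2ex]
0, & \text{otherwise}.
\end{cases} \tag{26}
\]

Proposition 4.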
If the system is weakly coupled and there exists a ranking $o\in\mathcal{O}^{*}(\boldsymbol{0})$ satisfying $\boldsymbol{\nu}(o,\boldsymbol{0}) = \boldsymbol{0}$, then the capacity constraints described in (10) are decomposable and the policy $\bar{\varphi}(o)$ is optimal for the relaxed problem defined by (8) and (9)-(11). In particular, there exist decomposable multipliers $\boldsymbol{\gamma}\in\mathbb{R}^{J}$ satisfying, for $j\in[J]$,
i) if there is a critical PS pair $\iota\in\mathcal{I}(o)$ with critical resource pool $j = j_\iota(o)$, and no $j'\neq j$ with $j'\in\mathcal{J}_{i_\iota}$ is critical for any other PS pair $\iota'\in\mathcal{I}(o)$, then
\[
\gamma_j = \frac{\Xi_{i_\iota}(\boldsymbol{0},\boldsymbol{0}) - \nu_{\ell_\iota}}{w_{j,i_\iota}\bigl(1+\lambda_{\ell_\iota}/\mu_{i_\iota}\bigr)}; \tag{27}
\]
ii) if there are critical PS pairs $\iota$ and $\iota'$ in $\mathcal{I}(o)$ with critical resource pools $j = j_\iota(o)\neq j_{\iota'}(o)$ and $j_{\iota'}(o)\in\mathcal{J}_{i_\iota}$, then
\[
\gamma_j = \frac{w_{j_{\iota'}(o),i_\iota}}{w_{j,i_\iota}}\left(\frac{\Xi_{i_\iota}(\boldsymbol{0},\boldsymbol{0}) - \nu_{\ell_\iota}}{w_{j_{\iota'}(o),i_\iota}\bigl(1+\lambda_{\ell_\iota}/\mu_{i_\iota}\bigr)} - \frac{\Xi_{i_{\iota'}}(\boldsymbol{0},\boldsymbol{0}) - \nu_{\ell_{\iota'}}}{w_{j_{\iota'}(o),i_{\iota'}}\bigl(1+\lambda_{\ell_{\iota'}}/\mu_{i_{\iota'}}\bigr)}\right); \tag{28}
\]
iii) otherwise,
\[
\gamma_j = 0. \tag{29}
\]

The proof is given in Appendix G in the e-companion. Note that, from Lemma 1, for any critical PS pairs $\iota,\iota'\in\mathcal{I}(o)$ with $\iota\neq\iota'$, it follows that $i_\iota\neq i_{\iota'}$. If the system is weakly coupled, then, for any $j\in[J]$, there exist at most two different critical pairs $\iota\in\mathcal{I}(o)$ satisfying $j\in\mathcal{J}_{i_\iota}$. Also, in a weakly coupled system, for the second case stated in Proposition 4, if there are critical PS pairs $\iota$ and $\iota'$ in $\mathcal{I}(o)$ with critical resource pools $j = j_\iota(o)\neq j_{\iota'}(o)$ and $j_{\iota'}(o)\in\mathcal{J}_{i_\iota}$, then $j_\iota(o)\notin\mathcal{J}_{i_{\iota'}}$, because there is at most one resource pool in $\mathcal{J}_{i_\iota}$ shared with other patterns.

In Proposition 4, the assumption that the system is weakly coupled constrains the way in which resource pools are shared by different requests. The case where there is an $o\in\mathcal{O}^{*}(\boldsymbol{0})$ with $\boldsymbol{\nu}(o,\boldsymbol{0}) = \boldsymbol{0}$ will occur when the relaxed action constraint (9) is satisfied with $\alpha^{\bar{\varphi}(o)}_{d(\ell)}(n) > 0$ for the only $n\in\mathcal{N}_{d(\ell)}$ and for all $\ell\in[L]$.
To see this, note that the construction of the policy $\bar{\varphi}(o)$ guarantees that the resulting multipliers $\boldsymbol{\nu}(o,\boldsymbol{0})$ will be non-negative, and so it follows from (21) that $\alpha^{\bar{\varphi}(o)}_{d(\ell)}(n) > 0$ only if $\nu_\ell(o,\boldsymbol{0}) = 0$. That is, having $\nu_\ell(o,\boldsymbol{0}) = 0$ is associated with there being a positive probability that the dummy pattern $d(\ell)$ is selected in the relaxed system. Furthermore, if there is a PS pair $\iota$ (for a non-dummy pattern $i_\iota\in\mathcal{P}_\ell$) which satisfies the condition described in (I), that is, PS pair $\iota$ causes the relaxed action constraint (9) to bite, Algorithm 1 will ensure that $\alpha^{\bar{\varphi}(o)}_{i_{\iota'}}(n_{\iota'}) = 0$ for all PS pairs $\iota'$ ranked lower than $\iota$ according to the order $o$. In particular, this will cause $\alpha^{\bar{\varphi}(o)}_{d(\ell)}(n) = 0$ for the only $n\in\mathcal{N}_{d(\ell)}$.

So if $\alpha^{\bar{\varphi}(o)}_{d(\ell)}(n) > 0$, it is because the relaxed capacity constraints (10) bite before the relaxed action constraints (9). If this is true for all $\ell$, then the capacity constraints are biting for every request type, and so we refer to the case where there is an $o\in\mathcal{O}^{*}(\boldsymbol{0})$ with $\boldsymbol{\nu}(o,\boldsymbol{0}) = \boldsymbol{0}$ as a heavy traffic condition.

Heavy Traffic:
The system is in heavy traffic if there is a ranking $o\in\mathcal{O}^{*}(\boldsymbol{0})$ such that $\boldsymbol{\nu}(o,\boldsymbol{0}) = \boldsymbol{0}$.

Remark
The property of being weakly coupled and in heavy traffic simplifies the analysis of the complementary slackness condition of the relaxed problem. In particular, the index related to a pattern, described in Equation (20), is affected only by the multipliers of resource pools $j\in[J]$ with $w_{j,i} > 0$. Weak coupling helps reduce the number of such multipliers $\gamma_j$, so that the index of a pattern is affected by at most one $\gamma_j$, which in turn affects other pattern indices. When the system is weakly coupled and in heavy traffic, we can analytically solve the $I$ linear equations (24) and (25) and derive the $\phi$ and $\boldsymbol{\gamma}$ that satisfy the complementary slackness condition described in equations (22) and (23). A detailed discussion is provided in the proof of Proposition 4.

Proposition 4 guarantees the decomposability of a system when it is weakly coupled and in heavy traffic. The property of being weakly coupled and in heavy traffic is stronger than necessary for decomposability, but it is simple to check and is satisfied in a number of common resource allocation problems. We consider examples of how to define such a system. As explained above, the heavy traffic property is usually satisfied when the service capacity is not enough (or just enough) to address its high traffic load. On the other hand, weak coupling specifies the structure of the weight matrix $W$. For instance, if each pattern involves only one resource pool (that is, for all $i\in[I]\setminus\{d(\ell) \mid \ell\in[L]\}$, $|\mathcal{J}_i| = 1$), then the system is weakly coupled, although each resource pool may still be shared by requests of different types.

Within the above framework, we can model skill-based resource pooling in call centers (see [Wallace and Whitt, 2004], [Cezik and L'Ecuyer, 2008]) as a weakly coupled resource allocation problem; and when its traffic load is also heavy, the system is decomposable.
In each call center, agents are trained for several skills, such as two or three languages, and are able to handle some but not all of the incoming calls. We classify these agents into multiple call centers according to their trained skills; that is, all agents in the same call center have the same skills and are able to serve the same types of calls. In this context, a call corresponds to a request, an agent corresponds to a resource unit, a call center is a resource pool and a call type is a request type. Since each call is served by an agent with appropriate skills, each pattern consists of only one call center (resource pool) and selecting a pattern means selecting an agent (a resource unit) from the corresponding call center: this problem is weakly coupled. Note that agents of each call center are potentially serving different types of calls simultaneously, and the capacity constraints (5) still restrict the system because of the limited number of agents in each call center.

In particular, the trained skills are used to establish the availability of an agent to serve a call, and do not relate to any concept defined in the resource allocation problem. When an agent of call center $j\in[J]$ is able to serve calls of type $\ell\in[L]$, regardless of the skills needed for this service, there is a pattern $i$ in $\mathcal{P}_\ell$ with $w_{j,i} = 1$ and $w_{j',i} = 0$ for all $j'\in[J]\setminus\{j\}$. For instance, a call center has agents who can speak English and Chinese, and there are two types of calls: one requires English or French-speaking agents and the other Chinese or Japanese speakers. A call of either type can be served by an agent of this call center, and many calls of both types can be served by this call center simultaneously.

Other problems with similar features, such as health-care task scheduling for agents with different qualifications (see [Lieder et al., 2015]) and home health-care scheduling (see [Fikar and Hirsch, 2017]), can also be modeled as weakly coupled systems.
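Whether a given weight matrix satisfies the weak coupling condition can be checked mechanically from Definition 3. The following Python sketch does so for a small, hypothetical call-center instance; the function names and the matrices are assumptions of this example, with `W[j][i]` holding $w_{j,i}$.

```python
def row_types(W):
    """Definition 3: type 1 if at most one pattern uses pool j, type 2 otherwise."""
    return [1 if sum(1 for w in row if w > 0) <= 1 else 2 for row in W]


def is_weakly_coupled(W):
    """True if every pattern uses at most one pool whose row is of type 2."""
    types = row_types(W)
    n_patterns = len(W[0])
    for i in range(n_patterns):
        shared = [j for j, row in enumerate(W) if row[i] > 0 and types[j] == 2]
        if len(shared) > 1:
            return False
    return True


# Hypothetical call-center instance: two call centers (pools), three patterns.
# Patterns 0 and 2 (different call types) both draw agents from call center 0;
# pattern 1 draws agents from call center 1 only.
W_call_center = [[1, 0, 1],
                 [0, 1, 0]]
# A counterexample: both patterns need units from both pools.
W_coupled = [[1, 1],
             [1, 1]]
print(is_weakly_coupled(W_call_center), is_weakly_coupled(W_coupled))  # True False
```

Dummy patterns can be included as all-zero columns without changing the outcome, since they use no resource pool.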
And, of course, when the systems are also in heavy traffic, they are decomposable.

A virtual machine (VM) replacement problem can be modeled as a resource allocation problem (see [Stolyar, 2013], [Stolyar, 2017]). VM replacement is about consolidating multiple VMs onto a set of physical machines/servers, where each physical server can usually accommodate more than one VM simultaneously. To consolidate a VM, certain numbers of physical units, such as CPU cycles, memory, disk, or I/O ports, located on a server will be occupied by this VM until it is completed. The VMs and servers are potentially different, and, because of different server profiles or user preference, a server is not necessarily able to accommodate every VM. Consider a simple version, for which the capacity of a server is determined by the total amount of only one type of physical unit: this server has a plentiful supply of the others or is not aware of other physical units. In this case, regarding a VM as a request, a server as a resource pool and a physical unit of the shortage type as a resource unit, we obtain a resource allocation problem that is weakly coupled. Similar problems in computer networks, such as virtual node embedding (see [Esposito et al., 2016]), server provisioning in distributed cloud environments (see [Wei and Neely, 2017]), and wireless resource scheduling (see [Chen et al., 2017]), can potentially be modeled as weakly coupled resource allocation problems. And as before, when the weakly coupled systems are in heavy traffic, the decomposability property holds.

As in [Stolyar, 2013], [Stolyar, 2017], for general VM replacement problems, each server capacity is not necessarily constrained by physical units of just one type. As above, we model a VM as a request, a physical unit as a resource unit, and the set of all physical units of the same type located on the same server as a resource pool.
In this context, the capacity of each resource pool is determined by the total number of its associated physical units of a given type on a server and the weak coupling property cannot hold in general. It follows that, unlike the preceding examples, the system is not necessarily decomposable. However, as discussed in Section III-B2, a decomposable system that is not weakly coupled or in heavy traffic can be found by finding a fixed point $\boldsymbol{\gamma} \in \mathbb{R}^J$ of the function $\mathcal{T}^o$ ($o \in \mathscr{O}(\boldsymbol{\gamma}, \mathbf{0})$). Numerical examples of such systems will be provided in Section VI.

IV. THE INDEX POLICY: ITS IMPLEMENTATION IN THE NON-ASYMPTOTIC REGIME
In Section III, we considered the relaxed problem with constraints (9)-(11). Here, we return to the original problem with constraints (1) and (5).

For each RT $\ell \in [L]$, we refer to a policy $\phi \in \Phi$ as an index policy according to PS-pair ranking $o \in \mathscr{O}$ if it always prioritizes a candidate process in a PS pair with a ranking equal to or higher than those of all the other candidate processes. This policy $\phi$ is applicable to the original problem, while the policy $\bar{\phi}(o)$ proposed in Section III-B1 is not in general. The method of implementing such a $\phi$ is not unique; for instance, the computation of the ranking of the PS pairs can vary. Here we propose one possible implementation.

For $t > 0$, we maintain a sequence of $I$ ordered PS pairs $(i, N^{\phi}_i(t))$ ($i \in [I]$) that are associated with the $I$ patterns, according to the given ranking $o$ and the state vector $\boldsymbol{N}^{\phi}(t)$: PS pair $(i, N^{\phi}_i(t))$ is placed ahead of $(i', N^{\phi}_{i'}(t))$ if and only if the former precedes the latter in the ranking $o$. Let $i^o_\sigma(\boldsymbol{N}^{\phi}(t))$ ($\sigma \in [I]$) represent the pattern associated with the $\sigma$th PS pair in this ordered sequence.

For a general ranking $o \in \mathscr{O}$, the variables $i^o_\sigma(\boldsymbol{N}^{\phi}(t))$ are potentially updated at each state transition. Nonetheless, for the purpose of this paper, we mainly focus on the rankings $o \in \mathscr{O}(\boldsymbol{\gamma}, \nu)$ (for some $\boldsymbol{\gamma} \in \mathbb{R}^J$ and $\nu \in \mathbb{R}^L$) that follow the descending order of $\Xi_i(\boldsymbol{\gamma}, \nu)$. In this case, the variables $i^o_\sigma(\boldsymbol{N}^{\phi}(t))$ are updated only if a pattern $i \in [I] \setminus \{d(\ell) \mid \ell \in [L]\}$ transitions into or out of its boundary state $|\mathscr{N}_i| - 1$. Consider the capacity constraints

$$\sum_{i' \in [I]} w_{j,i'} N^{\phi}_{i'}(t) + \sum_{i' \in [I],\, i' \neq i} w_{j,i'}\, a^{\phi}_{i'}(\boldsymbol{N}^{\phi}(t)) + a^{\phi}_{i}(\boldsymbol{N}^{\phi}(t)) \le \Big\lceil C_j \big(1 - \bar{\epsilon}_{j,\iota(i, N^{\phi}_i(t))}\big) \Big\rceil, \quad \forall j \in [J],\ i \in [I], \tag{30}$$

where $\iota(i, N^{\phi}_i(t)) \in [N]$ represents the order of PS pair $(i, N^{\phi}_i(t))$ in the ranking $o$ and $\bar{\boldsymbol{\epsilon}} \in [0,1]^{J \times N}$ is a given matrix of parameters.
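As an illustration of how an ordered sequence of patterns and the check (30) might fit together, the following is a minimal Python sketch under stated assumptions; it is not the authors' implementation. All names are hypothetical: `order` is assumed to list the patterns already sorted by the current ranking of their PS pairs, `eps_bar[j][i]` is assumed to hold the parameter $\bar{\epsilon}_{j,\iota}$ of the PS pair currently occupied by pattern $i$, and (30) is only checked for pools that pattern $i$ actually uses ($w_{j,i} > 0$), which is an interpretation rather than something stated in the text. Exact rationals are used so that the ceiling in (30) is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def passes_constraint_30(i, n, a, w, capacity, eps_bar):
    """Return True if setting a[i] = 1 keeps constraint (30) satisfied.

    n[k]          : current number of instantiations of pattern k
    a[k]          : 0/1 actions already fixed for the other patterns
    w[j][k]       : units of pool j used by one request on pattern k
    capacity[j]   : capacity C_j of pool j
    eps_bar[j][i] : parameter of the PS pair currently held by pattern i
    """
    for j, c_j in enumerate(capacity):
        if w[j][i] == 0:
            continue  # assumption: pools not used by pattern i are skipped
        occupied = sum(w[j][k] * n[k] for k in range(len(n)))
        reserved = sum(w[j][k] * a[k] for k in range(len(a)) if k != i)
        limit = ceil(c_j * (1 - eps_bar[j][i]))
        if occupied + reserved + 1 > limit:  # the "+ 1" is a_i = 1 in (30)
            return False
    return True


def greedy_actions(order, n, w, capacity, eps_bar, request_type):
    """Per request type, activate the first pattern in `order` passing (30)."""
    a = [0] * len(n)
    served = set()
    for i in order:
        if request_type[i] in served:
            continue  # this request type already has its pattern
        if passes_constraint_30(i, n, a, w, capacity, eps_bar):
            a[i] = 1
            served.add(request_type[i])
    return a


# One pool of capacity 10; patterns 0 and 1 use 3 and 1 units per request,
# patterns 2 and 3 are dummy patterns (no resources) for types 0 and 1.
w = [[3, 1, 0, 0]]
capacity = [10]
request_type = [0, 1, 0, 1]
# eps_bar * C_j = w - 1 for pattern 0, so (30) coincides with (5) for it.
eps_bar = [[Fraction(2, 10), Fraction(0), Fraction(0), Fraction(0)]]
actions = greedy_actions([0, 1, 2, 3], [2, 1, 0, 0], w, capacity, eps_bar, request_type)
```

In this toy instance seven units are occupied, so pattern 0 (three units) still fits, pattern 1 no longer does once those three units are reserved, and request type 1 falls back to its dummy pattern: `actions` is `[1, 0, 0, 1]`.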
Apart from this matrix of parameters, constraints (30) are the same as constraints (5). As we shall discuss in Section V-B, we choose the $\bar{\epsilon}_{j,\iota}$ such that $\bar{\epsilon}_{j,\iota} C_j \ge w_{j,i_\iota} - 1$ and, for any $j \in [J]$ and PS pairs $\iota < \iota'$ with respect to the given ranking $o$, if $w_{j,i_\iota}, w_{j,i_{\iota'}} > 0$, then $\bar{\epsilon}_{j,\iota} < \bar{\epsilon}_{j,\iota'}$. In this context, if $\bar{\epsilon}_{j,\iota} C_j \in [w_{j,i_\iota} - 1, w_{j,i_\iota})$ for all $\iota \in [N]$ and $j \in [J]$, then constraints (30) reduce to (5); otherwise, they are more stringent than (5). The parameter $\bar{\boldsymbol{\epsilon}}$ is used to specify the trajectory of the underlying process $\boldsymbol{N}^{\phi}(t)$ when the system is scaled to the asymptotic regime. This specification is required for proving the asymptotic optimality of the index policy $\phi$.

In the interests of notational consistency, we shall use the form (30) throughout but, here, since we do not worry about the asymptotic behavior, we consider the case with $\bar{\epsilon}_{j,\iota} C_j \in [w_{j,i_\iota} - 1, w_{j,i_\iota})$ for all $\iota \in [N]$ and $j \in [J]$ so that (30) reduces to (5). A detailed discussion about the scaling procedure and the role of $\bar{\boldsymbol{\epsilon}}$ in the asymptotic case will be provided in Section V.

Under the index policy $\phi$, we select $L$ patterns to accept new arrivals of $L$ types according to their orders in the sequence $i^o_\sigma(\boldsymbol{N}^{\phi}(t))$ ($\sigma \in [I]$). In particular, at a decision-making epoch $t > 0$, we initialize $a^{\phi}_i(\boldsymbol{N}^{\phi}(t)) = 0$ for all $i \in [I]$ and a set of available patterns to be $[I]$. If, for $i = i^o_1(\boldsymbol{N}^{\phi}(t))$, constraints (30) will not be violated by setting $a^{\phi}_i(\boldsymbol{N}^{\phi}(t)) = 1$, then set $a^{\phi}_i(\boldsymbol{N}^{\phi}(t)) = 1$ and remove all patterns associated with request type $\ell(i)$ from the set of available patterns.

The other $L - 1$ patterns are selected similarly and iteratively. That is, we look for the smallest $\sigma \in \{2, 3, \ldots, I\}$ such that
• $i^o_\sigma(\boldsymbol{N}^{\phi}(t))$ remains in the set of available patterns; and
• the capacity constraints (30) will not be violated by setting $a^{\phi}_i(\boldsymbol{N}^{\phi}(t)) = 1$ where $i = i^o_\sigma(\boldsymbol{N}^{\phi}(t))$.
If there is such a $\sigma$, set $a^{\phi}_i(\boldsymbol{N}^{\phi}(t)) = 1$ for $i = i^o_\sigma(\boldsymbol{N}^{\phi}(t))$, remove all patterns associated with request type $\ell(i)$ from the set of available patterns and continue selecting the remaining $L - 2$ patterns in the same manner. When all of the $L$ patterns have been selected we can stop. Detailed steps are provided in Algorithm 2, which has a computational complexity that is linear in $I$.

Input: a ranking of PS pairs $o \in \mathscr{O}$ and a given state $\boldsymbol{n} \in \mathscr{N}$.
Output: the action variables $\boldsymbol{a}^{\phi}(\boldsymbol{n})$ under the index policy $\phi \in \Phi$ with respect to ranking $o$ when the system is in state $\boldsymbol{n}$.
Function IndexPolicy($o$, $\boldsymbol{n}$):
    $\boldsymbol{a}^{\phi}(\boldsymbol{n}) \leftarrow \boldsymbol{0}$    /* Initializing the action variables */
    $\mathscr{P} \leftarrow [I]$    /* Initializing the set of available patterns */
    $\sigma \leftarrow 1$    /* Starting with the pattern with the highest priority */
    while $\mathscr{P} \neq \emptyset$ do
        $i \leftarrow i^o_\sigma(\boldsymbol{n})$
        if $i \in \mathscr{P}$ and Constraints (30) are not violated by setting $a^{\phi}_i(\boldsymbol{n}) = 1$ and $\boldsymbol{N}^{\phi}(t) = \boldsymbol{n}$ then
            $a^{\phi}_i(\boldsymbol{n}) \leftarrow 1$
            Remove all patterns $i' \in \mathscr{P}$ with $\ell(i') = \ell(i)$ from $\mathscr{P}$
        end
        $\sigma \leftarrow \sigma + 1$
    end
    return $\boldsymbol{a}^{\phi}(\boldsymbol{n})$
Algorithm 2: Implementing the index policy $\phi$ with respect to ranking $o$.

The performance of $\phi$ is mainly determined by the given order $o \in \mathscr{O}$. Based on later discussion of the asymptotic regime, if the policy $\bar{\phi}(o)$ is optimal for the relaxed problem in the asymptotic regime, then $\phi$ is asymptotically optimal for the original problem. Even without the proved asymptotic optimality, the ranking $o$ should ensure good performance of $\phi$ as it is always rational to prioritize patterns according to their potential profits. As long as there are reasonably good $\boldsymbol{\gamma}$ and $\nu$ leading to a ranking $o \in \mathscr{O}(\boldsymbol{\gamma}, \nu)$ that correctly reflects the potential profits of patterns, the performance degradation of $\bar{\phi}(o)$ is likely to be limited for the relaxed problem and close to the optimal solution of the original problem; and the index policy $\phi$ derived from $o$ is a promising choice for managing resources.

The selection of $\boldsymbol{\gamma}$, $\nu$ and $o \in \mathscr{O}(\boldsymbol{\gamma}, \nu)$ is discussed in Section III. The key point is to guarantee good performance of $\bar{\phi}(o)$: the policy that is guaranteed to be optimal for the relaxed problem when the system is decomposable.

V. STOCHASTIC OPTIMIZATION IN A SCALED SYSTEM
In this section, we establish asymptotic optimality of $\phi$.

A. Scaling Parameter
With a parameter $h \in \mathbb{N}_+$, let $\boldsymbol{C} := h\boldsymbol{C}^0$, $\boldsymbol{C}^0 \in \mathbb{N}_+^J$, and the arrival rates scale as $\boldsymbol{\lambda} := h\boldsymbol{\lambda}^0$, $\boldsymbol{\lambda}^0 \in \mathbb{R}_+^L$. We refer to the parameter $h$ as the scaling parameter, and the asymptotic regime as the limiting case with $h \to +\infty$.

We split the process associated with pattern $i$ into $h$ identical sub-processes $(i,k)$, $k \in [h]$, and divide $N^{\varphi}_i(t)$, the number of instantiations for pattern $i$ under policy $\varphi$ at time $t$, into $h$ pieces. The number of instantiations of the $k$th piece is $N^{\varphi}_{i,k}(t)$, so that $N^{\varphi}_i(t) = \sum_{k=1}^{h} N^{\varphi}_{i,k}(t)$. We refer to $N^{\varphi}_{i,k}(t)$ as the number of instantiations for sub-pattern $(i,k)$. The counting process given by $N^{\varphi}_{i,k}(t)$ ($k \in [h]$, $i \in [I]$) has state space $\mathscr{N}^0_i := \{0, 1, \ldots, \min_{j \in \mathscr{J}_i} \lceil C^0_j / w_{j,i} \rceil\}$. For any dummy pattern $d(\ell)$, we take $\mathscr{N}^0_{d(\ell)} = \mathscr{N}_{d(\ell)} = \{0\}$.

The objective and constraints defined by (8), (1) and (5) still apply to the sums of variables $\sum_{k=1}^{h} N^{\varphi}_{i,k}(t) = N^{\varphi}_i(t)$, $i \in [I]$. We say the process associated with pattern $i$ is replaced by the $h$ sub-processes associated with sub-patterns $(i,k)$, $k \in [h]$. Each sub-pattern $(i,k)$ earns reward $r_{\ell(i)}$ per served request and the cost rate that a request accommodated by this sub-pattern imposes on resource pool $j \in [J]$ is $\varepsilon_j w_{j,i}$; that is, the sub-process $(i,k)$ maintains the same reward and cost rates in states $n \in \mathscr{N}^0_i$ as process $i$. Let $\boldsymbol{N}^{\varphi}_h(t) = (N^{\varphi}_{i,k}(t) : i \in [I], k \in [h])$ be the state variable after this replacement, and $a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t)) \in \{0, 1\}$ ($i \in [I]$, $k \in [h]$) be the action variables with respect to the process $\boldsymbol{N}^{\varphi}_h(t)$. To clarify, we rewrite the objective described in (8) as

$$\max_{\varphi} \frac{1}{h} \sum_{i \in [I]} \sum_{k \in [h]} \sum_{n_i \in \mathscr{N}^0_i} \pi^{\varphi,h}_{i,k}(n_i) \Big( r_{\ell(i)} \mu_i - \sum_{j \in \mathscr{J}} w_{j,i} \varepsilon_j \Big) n_i, \tag{31}$$

where $\pi^{\varphi,h}_{i,k}(n_i)$ represents the stationary probability that the state of sub-process $(i,k)$ is $n_i$ under policy $\varphi$ with scaling parameter $h$.
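To make the scaled quantities concrete, here is a minimal Python sketch that builds the state space of one sub-process and evaluates the normalized sum in (31) for given stationary distributions. It is an illustration only: the function and argument names are hypothetical, $\mathscr{J}_i$ is assumed to be the set of pools with $w_{j,i} > 0$, and the stationary probabilities are supplied as inputs rather than computed from a policy.

```python
def sub_process_states(i, w, base_capacity):
    """States 0..max of one sub-process of pattern i (dummy patterns: only 0).

    w[j][i]          : units of pool j used by one request on pattern i
    base_capacity[j] : unscaled capacity C^0_j of pool j
    """
    pools = [j for j in range(len(base_capacity)) if w[j][i] > 0]
    if not pools:
        return [0]  # a pattern using no resources is treated as a dummy pattern
    # -(-x // y) is an integer-safe ceiling of x / y
    top = min(-(-base_capacity[j] // w[j][i]) for j in pools)
    return list(range(top + 1))


def normalized_revenue(pi, reward, mu, w, cost, h):
    """Evaluate the sum in (31), divided by h, for given stationary laws.

    pi[i][k][n] : stationary probability that sub-process (i, k) is in state n
    reward[i]   : r_{l(i)};  mu[i] : mu_i;  cost[j] : epsilon_j
    """
    total = 0.0
    for i, per_pattern in enumerate(pi):
        net = reward[i] * mu[i] - sum(w[j][i] * cost[j] for j in range(len(cost)))
        for dist in per_pattern:
            total += net * sum(n * p for n, p in enumerate(dist))
    return total / h


# Two pools with C^0 = (4, 6); pattern 0 uses (2, 3) units, pattern 1 is a dummy.
w = [[2, 0], [3, 0]]
states_0 = sub_process_states(0, w, [4, 6])   # ceil(4/2) = ceil(6/3) = 2
states_1 = sub_process_states(1, w, [4, 6])
h = 2
pi = [[[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]], [[1.0], [1.0]]]
value = normalized_revenue(pi, [3.0, 0.0], [1.0, 1.0], w, [0.25, 0.25], h)
```

Dividing by `h` mirrors the normalization in (31): the number of sub-processes grows with $h$, so the unnormalized sum would grow with it as well.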
We divide the total revenue earned by all sub-patterns by $h \in \mathbb{N}_+$ so that the objective function is always bounded when $h \to +\infty$. The policy $\varphi$ in (31) is determined by the action variables $a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t))$ ($i \in [I]$, $k \in [h]$) subject to

$$\sum_{i \in \mathscr{P}_\ell} \sum_{k \in [h]} a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t)) = 1, \quad \forall \ell \in [L],\ \forall t \ge 0, \tag{32}$$

and

$$\sum_{i \in [I]} \frac{w_{j,i}}{h} \sum_{k \in [h]} \Big( N^{\varphi}_{i,k}(t) + a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t)) \Big) \le C^0_j, \quad \forall j \in [J],\ \forall t \ge 0. \tag{33}$$

The constraints in (33) are obtained by substituting $C^0_j h$ for $C_j$ in the constraints stated in (5), and thus (33) is equivalent to (5). Also, to guarantee that the maximal value of each $N^{\varphi}_{i,k}(t)$ ($k \in [h]$), $\min_{j \in \mathscr{J}_i} \lceil C^0_j / w_{j,i} \rceil$, is not exceeded, define, for $k \in [h]$ and $i \in [I] \setminus \{d(\ell) : \ell \in [L]\}$,

$$a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t)) = 0, \quad \text{if } N^{\varphi}_{i,k}(t) = |\mathscr{N}^0_i| - 1, \tag{34}$$

which corresponds to the redundant constraints described in (6).

Remark
As in Section I, we activate exactly one sub-process $(i,k)$ ($i \in \mathscr{P}_\ell$, $k \in [h]$) for RT $\ell \in [L]$ regardless of the scaling parameter $h \in \mathbb{N}_+$. The birth and death rates of this active sub-process are $h\lambda^0_\ell$ and $N^{\varphi}_{i,k}(t)\mu_i$, respectively, so that if $h\lambda^0_\ell \gg (|\mathscr{N}^0_i| - 1)\mu_i$, the number of instantiations of pattern $i$ will increase rapidly until it is restricted by the capacity constraints.

A model with a single active sub-process at any time has different stochastic properties compared to the case where the number of active sub-processes is proportional to $h$ (which was discussed in [Weber and Weiss, 1990]). To illustrate the difference, we present an example in Appendix H of the e-companion.

The optimization problem consisting of the $hI$ sub-processes associated with $hI$ sub-patterns, coupled through constraints (32)-(34), can be analyzed and relaxed along the same lines as in Section III. Let $\alpha^{\varphi}_{i,k}(n) := \lim_{t \to +\infty} \mathbb{E}\{a^{\varphi}_{i,k}(\boldsymbol{N}^{\varphi}_h(t)) \mid N^{\varphi}_{i,k}(t) = n\}$ ($n \in \mathscr{N}^0_i$, $i \in [I]$, $k \in [h]$) represent the action variables of the $hI$ sub-problems for the relaxed problem scaled by $h$.

All the sub-processes corresponding to a given pattern $i \in [I]$ in the same state $n \in \mathscr{N}^0_i$ are equivalent. The controller then is concerned only with the total number of active sub-processes of a given pattern in a given state. Define the random variable $Z^{\varphi,h}_\iota(t)$ to be the proportion of sub-processes in PS pair $\iota$ at time $t$ under policy $\varphi$ where $h$ is the scaling parameter; that is,

$$Z^{\varphi,h}_\iota(t) = \frac{1}{hI} \Big| \big\{ (i,k) \in [I] \times [h] \;\big|\; N^{\varphi}_{i,k}(t) = n_\iota,\ i_\iota = i \big\} \Big|. \tag{35}$$

Let $\boldsymbol{Z}^{\varphi,h}(t) = (Z^{\varphi,h}_\iota(t) : \iota \in [N])$ and $\mathscr{Z}$ be the probability simplex $\{\boldsymbol{z} \in [0,1]^N \mid \sum_{\iota \in [N]} z_\iota = 1\}$. In this model, the process $\boldsymbol{Z}^{\varphi,h}(t)$ is analogous to the counting process $\boldsymbol{N}^{\varphi}_h(t)$ in the original process.
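The proportions in (35) are simple to compute from the sub-process states. The following Python sketch is an illustration with hypothetical names: it takes a table `states[i][k]` holding the state of sub-process $(i,k)$ and returns the proportion of sub-processes in each occupied PS pair, keyed by the pair (pattern, state). PS pairs that contain no sub-process are simply absent from the result, which corresponds to $Z_\iota = 0$.

```python
from collections import Counter
from fractions import Fraction


def ps_pair_proportions(states):
    """Compute Z of (35); states[i][k] is the state of sub-process (i, k)."""
    num_patterns = len(states)   # I
    h = len(states[0])           # number of sub-processes per pattern
    counts = Counter((i, n) for i, row in enumerate(states) for n in row)
    return {pair: Fraction(c, h * num_patterns) for pair, c in counts.items()}


# I = 2 patterns, h = 2 sub-processes each.
z = ps_pair_proportions([[0, 1], [1, 1]])
```

Because every sub-process sits in exactly one PS pair, the returned values always sum to one, so the result is a point of the simplex $\mathscr{Z}$.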
When the process $\boldsymbol{Z}^{\varphi,h}(t)$ takes value $\boldsymbol{z} \in \mathscr{Z}$, it can transition only to a state of the form $\boldsymbol{z} + \boldsymbol{e}_{\iota,\iota'} \in \mathscr{Z}$ with $i_\iota = i_{\iota'}$, where $\boldsymbol{e}_{\iota,\iota'} \in \mathbb{R}^N$ is a vector with $\iota$th element $+1/hI$, $\iota'$th element $-1/hI$ and all the other elements set to zero. For our birth-and-death process, a transition will happen only with $n_{\iota'} = n_\iota \pm 1$ corresponding to the arrival and departure of a request, respectively. For any given $h \in \mathbb{N}_+$, the state space of the process $\boldsymbol{Z}^{\varphi,h}(t)$ is a subset of $\mathscr{Z}$ and thus the system is always stable. We refer to the system with $h \to +\infty$ as the asymptotic regime.

Note that any resource allocation problem in the non-asymptotic regime coincides with a scaled problem described in (31)-(34) with given $h < +\infty$. The scaling parameter $h$ is introduced to rigorously specify the trajectory of the entire system going from a non-asymptotic regime to an asymptotic regime.

B. Index Policies in a Scaled System
In Section IV, for a ranking $o \in \mathscr{O}$, we proposed an index policy $\phi \in \Phi$ for the resource allocation problem in the non-asymptotic regime; this coincides with the problem described in (31)-(34) with given $h < +\infty$. For clarity, we translate the description of $\phi$ to a policy used for a scaled system with the notation described in Section V-A.

For a ranking $o \in \mathscr{O}$, the index policy $\phi$ activates a sub-process in the first PS pair $\iota \in [N]$ in ranking $o$ with $Z^{\phi,h}_\iota(t) > 0$ and the action and capacity constraints holding; that is, $\phi$ selects a sub-process $(i_\iota, k)$ ($k \in [h]$) satisfying $N^{\phi}_{i_\iota,k}(t) = n_\iota$ and sets $a^{\phi}_{i_\iota,k}(\boldsymbol{N}^{\phi}_h(t))$ to $1$. The condition $Z^{\phi,h}_\iota(t) > 0$ is required because there have to be some sub-processes in PS pair $\iota$ for us to be able to activate one. Once a sub-process in PS pair $\iota$ is selected for activation, the action constraint (32) for RT $\ell_\iota$ achieves equality: there is exactly one active sub-process for a specific RT $\ell \in [L]$. Resource units in associated resource pools are reserved for this activated sub-process in PS pair $\iota$. In this way, $L$ sub-processes in $L$ different PS pairs will be activated iteratively, according to the ranking $o$, for the $L$ RTs.

Under the index policy $\phi$, the transition matrix of process $\boldsymbol{Z}^{\phi,h}(t)$ is determined by the value of $\sum_{k \in [h],\, N^{\phi}_{i_\iota,k}(t) = n_\iota} a^{\phi}_{i_\iota,k}(\boldsymbol{N}^{\phi}_h(t))$ for each PS pair $\iota \in [N]$, which is either $0$ or $1$ and is dependent on $\boldsymbol{N}^{\phi}_h(t)$ through only $\boldsymbol{Z}^{\phi,h}(t)$. Define $\upsilon^{\phi,h}_\iota(\boldsymbol{z})$, $\iota \in [N]$, $\boldsymbol{z} \in \mathscr{Z}$, to be the ratio of the number of active sub-processes in PS pair $\iota$, for which the corresponding sub-patterns are prepared to accept a request, to the total number of sub-processes in this PS pair under $\phi$, when the proportions of sub-processes in all PS pairs are currently specified by $\boldsymbol{z}$.
That is, at time $t$, for $\iota \in [N]$,

$$\upsilon^{\phi,h}_\iota\big(\boldsymbol{Z}^{\phi,h}(t)\big) = \frac{\sum_{k \in [h],\, N^{\phi}_{i_\iota,k}(t) = n_\iota} a^{\phi}_{i_\iota,k}(\boldsymbol{N}^{\phi}_h(t))}{I h Z^{\phi,h}_\iota(t)}, \tag{36}$$

where we recall that the numerator on the right hand side relies on $\boldsymbol{N}^{\phi}_h(t)$ through $\boldsymbol{Z}^{\phi,h}(t) \in \mathscr{Z}$. Note that, for arbitrarily large $h$, the value of $\upsilon^{\phi,h}_\iota(\boldsymbol{z}) I h z_\iota$ ($\boldsymbol{z} \in \mathscr{Z}$), representing the number of active sub-processes in PS pair $\iota$, is never more than $1$ because the policy $\phi$ must satisfy the action constraints (32). Let $\boldsymbol{\upsilon}^{\phi,h}(\boldsymbol{z}) = (\upsilon^{\phi,h}_\iota(\boldsymbol{z}) : \iota \in [N])$. Although different tie-breaking rules lead to the same process $\boldsymbol{Z}^{\phi,h}(t)$, we shall stipulate that, when there is more than one sub-process $(i,k)$ ($i \in [I]$, $k \in [h]$) in the same PS pair available for activation, we prioritize the one with the smaller value of $k$. In this context, the variables $\boldsymbol{\upsilon}^{\phi,h}(\boldsymbol{z})$, $\boldsymbol{z} \in \mathscr{Z}$, provide sufficient information for the index policy $\phi$ to make decisions on the counting process $\boldsymbol{N}^{\phi}_h(t)$.

Let $\zeta^{\phi,h}_\iota(\boldsymbol{z})$ represent the maximal proportion of sub-processes in PS pair $\iota$ that can be active if we consider only the capacity constraints defined by (33) (neglecting the action constraints defined by (32)) with proportions of sub-processes in all PS pairs specified by $\boldsymbol{z}$ under policy $\phi$. We obtain that, for $\iota \in [N] \setminus \{d(\ell) : \ell \in [L]\}$,

$$\zeta^{\phi,h}_\iota(\boldsymbol{z}) = \min\Bigg( z_\iota,\ \max\Bigg\{ 0,\ \min_{j \in \mathscr{J}_i} \frac{1}{w_{j,i} I h} \Big\lceil h C^0_j (1 - \epsilon^h_{j,\iota}) - \sum_{\iota'=1}^{N} w_{j,i_{\iota'}} n_{\iota'} z_{\iota'} I h - \sum_{\iota' \in \mathscr{N}^+_\iota} w_{j,i_{\iota'}} \upsilon^{\phi,h}_{\iota'}(\boldsymbol{z}) z_{\iota'} I h \Big\rceil \Bigg\} \Bigg), \tag{37}$$

where $\mathscr{N}^+_\iota$, $\iota \in [N]$, is the set of PS pairs $\iota' \in [N]$ with $\iota' < \iota$ (with higher priorities than pair $\iota$) with respect to ranking $o$, and $\epsilon^h_{j,\iota} \in [0,1]$ corresponds to $\bar{\epsilon}_{j,\iota}$ in (30).

Here, the parameter $\epsilon^h_{j,\iota}$ is defined so that

$$0 < \lim_{h \to +\infty} \epsilon^h_{j,\iota} < \lim_{h \to +\infty} \epsilon^h_{j,\iota'} \le 1 \tag{38}$$

for any $\iota < \iota'$, $w_{j,i_\iota} > 0$ and $w_{j,i_{\iota'}} > 0$. We need $\epsilon^h_{j,\iota}$ to indicate the priorities of PS pairs in resource pool $j$, because the last term in (37), $\sum_{\iota' \in \mathscr{N}^+_\iota} w_{j,i_{\iota'}} \upsilon^{\phi,h}_{\iota'}(\boldsymbol{z}) z_{\iota'} I h$, is $o(h)$.
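As a way of reading (37) term by term, the following Python sketch evaluates $\zeta^{\phi,h}_\iota(\boldsymbol{z})$ for one PS pair. It is a hedged transcription, not a reference implementation, and rests on several assumptions: the prefactor in (37) is read as $1/(w_{j,i} I h)$ with $i = i_\iota$; the list `pairs` (a hypothetical name) holds the PS pairs already sorted by the ranking $o$, so that the higher-priority set $\mathscr{N}^+_\iota$ is the set of earlier positions; indices are 0-based; $\mathscr{J}_i$ is taken to be the pools with $w_{j,i} > 0$; and the active proportions `upsilon[k]` of higher-priority pairs are supplied by the caller.

```python
from fractions import Fraction
from math import ceil


def zeta(iota, z, upsilon, pairs, w, base_capacity, eps, h, num_patterns):
    """Evaluate (37) for the PS pair at 0-based ranking position `iota`.

    pairs[k]         : (pattern, state) of the k-th ranked PS pair
    z[k]             : proportion of sub-processes in that pair
    upsilon[k]       : active proportion of pair k (only k < iota is used)
    w[j][p]          : units of pool j used by one request on pattern p
    base_capacity[j] : unscaled capacity C^0_j
    eps[j][k]        : parameter eps^h_{j,k}
    """
    scale = num_patterns * h          # the factor I * h
    pattern = pairs[iota][0]
    bounds = []
    for j, c0 in enumerate(base_capacity):
        if w[j][pattern] == 0:
            continue                  # assumption: only pools the pattern uses
        occupied = sum(w[j][p] * n * z[k] * scale
                       for k, (p, n) in enumerate(pairs))
        reserved = sum(w[j][pairs[k][0]] * upsilon[k] * z[k] * scale
                       for k in range(iota))
        room = ceil(h * c0 * (1 - eps[j][iota]) - occupied - reserved)
        bounds.append(Fraction(room, w[j][pattern] * scale))
    if not bounds:
        raise ValueError("(37) is not defined for patterns using no pool")
    return min(z[iota], max(Fraction(0), min(bounds)))


# One pattern, one pool with C^0 = 2, h = 2: two sub-processes in states 0 and 1.
pairs = [(0, 0), (0, 1), (0, 2)]
z = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
zeta_first = zeta(0, z, [0, 0, 0], pairs, [[1]], [2], [[0, 0, 0]], 2, 1)
```

Here the occupied term counts the resource units already in use and the reserved term counts units set aside for active sub-processes in higher-ranked pairs; what remains, divided by $w_{j,i} I h$, is the proportion of sub-processes of this pair that the pool could still accept.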
In particular, in order to follow the strict capacity constraints described in (33), we need to define the $\epsilon^h_{j,\iota}$ so that $\epsilon^h_{j,\iota} C^0_j h \ge w_{j,i_\iota} - 1$ for all $j \in [J]$, $\iota \in [N]$ and $h \in \mathbb{N}_+$ and $\lim_{h \to +\infty} \epsilon^h_{j,\iota}$ exists. Let $\boldsymbol{\epsilon}^h := (\epsilon^h_{j,\iota} : j \in [J], \iota \in [N])$ and $\boldsymbol{\epsilon} := \lim_{h \to +\infty} \boldsymbol{\epsilon}^h$. Define $\mathscr{E}^h$, $h \in \mathbb{N}_+ \cup \{+\infty\}$, and $\Psi$ as the sets of all such vectors $\boldsymbol{\epsilon}^h$ and sequences of such vectors $\psi := (\boldsymbol{\epsilon}^1, \boldsymbol{\epsilon}^2, \ldots)$, respectively.

Equation (38) specifies possible trajectories of $\boldsymbol{\epsilon}^h$ as $h \to +\infty$, and is required for subsequent proofs of asymptotic optimality. Note that, although the asymptotic regime is a limiting situation, using an asymptotically-optimal policy is likely to be appropriate for systems with finite but large $h$.

In (37), the value of $\zeta^{\phi,h}_\iota(\boldsymbol{z})$ is constrained by the remaining capacities of relevant resource pools, the proportion of sub-processes currently in PS pair $\iota$ and the proportions of active sub-processes in PS pairs with higher priorities. Recall that $\upsilon^{\phi,h}_\iota(\boldsymbol{z})$ represents the proportion of active sub-processes in PS pair $\iota$, for which the corresponding sub-patterns are prepared to accept a request, when the proportions of sub-processes in all PS pairs are currently specified by $\boldsymbol{z}$. Together with the action constraints described in (32), under an index policy $\phi$, for $z_\iota > 0$,

$$\upsilon^{\phi,h}_\iota(\boldsymbol{z}) = \frac{1}{z_\iota h I} \min\Bigg\{ \zeta^{\phi,h}_\iota(\boldsymbol{z}) h I,\ \max\Big\{ 0,\ 1 - \sum_{\iota' \in \mathscr{N}^+_\iota,\ \ell_{\iota'} = \ell_\iota} \zeta^{\phi,h}_{\iota'}(\boldsymbol{z}) h I \Big\} \Bigg\}. \tag{39}$$

If $z_\iota = 0$, then there are no sub-processes in PS pair $\iota$ and so $\upsilon^{\phi,h}_\iota(\boldsymbol{z})$ can take any value in $[0,1]$ without making a difference to the evolution of the process. For completeness, define, for $\boldsymbol{z}$ with $z_\iota = 0$ and $\boldsymbol{z}^x_\iota := (z_1, z_2, \ldots, z_{\iota-1}, x, z_{\iota+1}, \ldots, z_N)$, $\upsilon^{\phi,h}_\iota(\boldsymbol{z}) = \lim_{x \downarrow 0} \upsilon^{\phi,h}_\iota(\boldsymbol{z}^x_\iota)$. For any given $\boldsymbol{z} \in \mathscr{Z}$, $\zeta^{\phi,h}_\iota(\boldsymbol{z})$ and $\upsilon^{\phi,h}_\iota(\boldsymbol{z})$ can be obtained iteratively using equations (37) and (39) from $\iota = 1$ to $N$.

Remark
Although capacity constraints were not considered in [Weber and Weiss, 1990], the construction of $\boldsymbol{\upsilon}^{\phi,h}(\boldsymbol{z})$ ($\boldsymbol{z} \in \mathscr{Z}$) follows ideas similar to those used in that paper. Recall that, for a given ranking $o \in \mathscr{O}$, the policy $\bar{\phi}(o)$ is generated by Algorithm 1 and is infeasible for the original problem. This gives rise to the interesting property that values of $\upsilon^{\phi,h}_\iota(\boldsymbol{z})$ and $\alpha^{\bar{\phi}(o)}_{i_\iota,k}(n_\iota)$ ($k \in [h]$), for all $h \in \mathbb{N}_+ \cup \{+\infty\}$, are always independent of those of PS pairs $\iota'$ with $\iota' > \iota$: the PS pairs with lower priorities than $\iota$. The property is important for the proofs of Theorems 1 and 2.

C. Asymptotic Optimality
For given $h \in \mathbb{N}_+$, define the long-run average revenue normalized by $h$ of the resource allocation problem under policy $\varphi$ to be $R^{\varphi,h}$; that is,

$$R^{\varphi,h} := \frac{1}{h} \sum_{i \in [I]} \sum_{k \in [h]} \sum_{n_i \in \mathscr{N}^0_i} \pi^{\varphi,h}_{i,k}(n_i) \Big( r_{\ell(i)} \mu_i - \sum_{j \in \mathscr{J}} w_{j,i} \varepsilon_j \Big) n_i, \tag{40}$$

the objective function described in (31).

Definition 6.
We say that the index policy $\phi$ derived from PS ranking $o$ by iterating (37) and (39) is asymptotically optimal if $\lim_{\|\boldsymbol{\epsilon}\| \to 0} \lim_{h \to +\infty} \big| R^{\phi,h} - \max_{\varphi \in \Phi} R^{\varphi,h} \big| = 0$.

Recall that the index policy $\phi$ described in Section V-B is dependent on the parameter $\boldsymbol{\epsilon}^h$ with $\boldsymbol{\epsilon} := \lim_{h \to +\infty} \boldsymbol{\epsilon}^h$. The $\boldsymbol{\epsilon}$ is used to guarantee strict priorities of the sub-processes in the asymptotic regime as discussed in Section V-B. The policy $\bar{\phi}(o)$, proposed in Section III-B1 for the relaxed problem, is generally not applicable to the original resource allocation problem. Although the policies $\bar{\phi}(o)$ and $\phi$ both rely on the same ranking $o \in \mathscr{O}$, they are different policies.

Theorem 1.
For given $o \in \mathscr{O}$, $\phi$ derived from $o$ by iterating (37) and (39) is asymptotically optimal if and only if

$$\lim_{h \to +\infty} \big| R^{\bar{\phi}(o),h} - \max_{\varphi \in \Phi} R^{\varphi,h} \big| = 0. \tag{41}$$

The proof is given in Appendix I. Theorem 1 indicates that asymptotic optimality of $\phi$ is equivalent to the convergence between $R^{\bar{\phi}(o),h}$ and the maximized long-run average revenue of the original problem as $h \to +\infty$. It is proved by showing the existence of $\lim_{h \to +\infty} \lim_{t \to +\infty} \mathbb{E}[\boldsymbol{Z}^{\bar{\phi}(o),h}(t)]$ and a global attractor of the process $\boldsymbol{Z}^{\phi,h}(t)$ as $t, h \to +\infty$ and $\|\boldsymbol{\epsilon}\| \to 0$, and specifically that they coincide with each other. The long-run average revenue $R^{\phi,h}$ then coincides with $R^{\bar{\phi}(o),h}$ as $h \to +\infty$ and $\|\boldsymbol{\epsilon}\| \to 0$.

A similar condition relevant to the global attractor was required in [Weber and Weiss, 1990] for asymptotic optimality of the Whittle index policy in a general RMABP. It does not necessarily hold. However, in our problem, each sub-process is a queueing process with the departure rate increasing in its queue size. Such a sub-process is a special case of a general bandit process. We prove in general that the underlying process $\boldsymbol{Z}^{\phi,h}(t)$, regardless of its initial point, will converge to any specified neighborhood of a fixed point almost surely as $t, h \to +\infty$ and $\|\boldsymbol{\epsilon}\| \to 0$. Detailed proofs are provided in Appendix I.

Theorem 1, in itself, does not provide a verifiable condition. This is given in our next theorem. If there exists $H \in \mathbb{R}$ such that, for all $h > H$, the system is decomposable, we say the system is decomposable in the asymptotic regime.

Theorem 2.
If the capacity constraints described in (33) (or equivalently (5)) are decomposable with decomposable multipliers $\boldsymbol{\gamma} \in \mathbb{R}^J$ in the asymptotic regime, then there exist $\nu \in \mathbb{R}^L$ and a PS pair ranking $o \in \mathscr{O}(\boldsymbol{\gamma}, \nu)$ such that the index policy $\phi$ derived from $o$ is asymptotically optimal.

The proof, based on Theorem 1, is given in Appendix L. Recall that we discussed decomposability of multipliers in Section III and provided examples of provably decomposable systems.
Remark
Theorem 2 binds asymptotic optimality of $\phi$ to the decomposability of the relaxed problem. For a decomposable system (see Definition 2), there always exists a ranking $o \in \mathscr{O}$ such that $\bar{\phi}(o)$ is optimal for the relaxed problem. If a system is decomposable in the asymptotic regime then (41) is satisfied. This follows because, for any $h \in \mathbb{N}_+ \cup \{+\infty\}$, $R^{\phi,h} \le \max_{\varphi \in \Phi} R^{\varphi,h} \le \max_{\varphi \in \tilde{\Phi}} R^{\varphi,h}$ where $\Phi$ and $\tilde{\Phi}$ are the sets of feasible policies for the original and relaxed problem, and $R^{\phi,h}$ coincides with $R^{\bar{\phi}(o),h}$ as $h \to +\infty$ and $\|\boldsymbol{\epsilon}\| \to 0$.

Similarly, we say the system is in heavy traffic in the asymptotic regime if there exists $H \in \mathbb{R}$ such that, for all $h > H$, the system is in heavy traffic.

Corollary 2.
If the system is in heavy traffic in the asymptotic regime and is weakly coupled, then there exist decomposable multipliers $\boldsymbol{\gamma} \in \mathbb{R}^J$, satisfying (27)-(29), and a PS pair ranking $o \in \mathscr{O}^*(\mathbf{0})$, so that the index policy $\phi$ derived from $o$ is asymptotically optimal.

The proof, invoking Theorem 2 and Proposition 4, is given in Appendix M. In particular, the PS pair ranking $o$, described in Corollary 2, exists in closed form: its PS pairs are ranked according to the descending order of $\Xi^*_\iota$ with $\nu = \mathbf{0}$.

[Figure 3: relative difference (%) against the scaling parameter; curves for INDEX(0), INDEX(0.01), Max-Reward, Min-Cost and Random.]
Fig. 3. Relative difference of a specific policy to $R(o_{k^*})$ against the scaling parameter of the system: (a) diverse performance and non-zero decomposable multipliers; (b) similar performance and non-zero decomposable multipliers; and (c) zero decomposable multipliers.

VI. NUMERICAL RESULTS
We demonstrate by simulation the performance of the index policy $\phi$, defined in Section V-B (or equivalently, defined in Section IV for a given $h < +\infty$), in systems that are not weakly coupled or in heavy traffic in comparison with baseline policies.

In this section, the confidence intervals of all the simulated average revenues at the level based on the Student's t-distribution are maintained within ± of the observed mean. We recall that the capacities $\boldsymbol{C}$ and arrival rates $\boldsymbol{\lambda}$ are scaled by the scaling parameter $h$.

Along with the fixed point iteration method proposed in Section III-B2, we have been able to find systems which are not weakly coupled or in heavy traffic but are decomposable. Here, we provide two examples, where $L$ and $J$ are sampled uniformly from the sets { , , , } and { , , . . . , }, respectively. Let $\epsilon_M = \max_{j \in [J], \iota \in [N]} \epsilon_{j,\iota}$. We refer to an index policy $\phi$ with specific $\epsilon_M \in [0,1]$ as INDEX($\epsilon_M$).

We consider three baseline policies: two greedy policies that prioritize patterns with maximal reward rates and minimal cost rates, and one policy that selects an available pattern uniformly at random for each request type. We refer to the three policies as Max-Reward, Min-Cost and Random. The Max-Reward and Min-Cost policies are in fact index policies with PS pairs ranked according to the descending order of their reward rates and the ascending order of their cost rates, respectively. The Random policy was proposed by [Stolyar, 2017] for a VM replacement problem, aiming to minimize the system blocking probabilities in the case with finite capacities. It is not a feasible policy of the original problem with capacity constraints (5) because it does not reserve resource units for a specific pattern that is more profitable than the others. When there are not enough resource units in a pool to accommodate multiple request types that have chosen their patterns involving this pool, the Random policy will always assign the resource units to the request that arrives first.

In Figure 3, we compare the performance of INDEX(0), INDEX(0.01), the baseline policies and $\bar{\phi}(o_{k^*})$, where $o_{k^*}$ is the ranking of the multipliers $\boldsymbol{\gamma}_{k^*}$ resulting from the fixed point iteration method (described in Section III-B2) with parameter c = 0 . and initial point γ = . The system parameters are listed in Appendix N and are generated by pseudo-random functions. The discovered multipliers $\boldsymbol{\gamma}_{k^*}$ for simulations in Figures 3(a) and 3(b) are (269 . , , , , , . , , . , , , , . × − , . × − , and , respectively, satisfying $\mathcal{T}^{o_{k^*}}(\boldsymbol{\gamma}_{k^*}) = \boldsymbol{\gamma}_{k^*}$ in the asymptotic regime. By Proposition 3, these $\boldsymbol{\gamma}_{k^*}$ are decomposable multipliers and, by Theorem 2, the index policies derived from the rankings $o_{k^*}$ are asymptotically optimal.
Let $R(o) := \lim_{h \to +\infty} R^{\bar{\phi}(o),h}$ ($o \in \mathscr{O}$), the existence of which is guaranteed in the proof of Theorem 1. For the decomposable systems with $h < +\infty$ and $\bar{\phi}(o_{k^*})$ optimal for the relaxed problem in the asymptotic regime, the asymptotic long-run average revenue, $R(o_{k^*})$, is no less than the optimum of the original problem: $R(o_{k^*})$ is an upper bound of $R^{\varphi,h}$ for any $\varphi \in \Phi$.

Figure 3 illustrates the relative difference of average revenues, $\big(R(o_{k^*}) - R^{\varphi,h}\big)/R(o_{k^*})$ for $\varphi =$ INDEX(0), INDEX(0.01), Max-Reward, Min-Cost and Random, against the scaling parameter $h$. In this context, there are two aspects of performance evaluation presented in Figure 3. First, we see the performance of the index policies in the non-asymptotic regime by comparing their long-run average revenues with an upper bound on the optimum. In particular, Figures 3(a) and 3(b) show that INDEX(0.01) significantly outperforms INDEX(0) for large $h$: the small but positive $\boldsymbol{\epsilon}$ does affect the performance of $\phi$. The performance of INDEX(0.01) is close to the upper bound of the optimal solution with relative difference less than for h greater than in all three examples: its performance degradation against the optimal solution is limited in the non-asymptotic regime.

On the other hand, by comparing to $R(o_{k^*})$, a trend of coincidence between $R^{\text{INDEX}(0.01),h}$ and $R(o_{k^*})$ is observed in Figure 3 as h increases from to , consistent with the proved asymptotic optimality of $\phi$. Recall that the examples presented in Figure 3 are not for systems with weak coupling or heavy traffic but the index policy $\phi$ is still proved to be asymptotically optimal here. Also, the performance of $\phi$ is close to the optimum without requiring extremely large $h$.

[Figure 4: (a) relative difference (%) against the scaling parameter; (b) relative difference (%) against the number of iterations; curves for INDEX(0), INDEX(0.01), Max-Reward, Min-Cost and Random.]
Fig. 4. (a) Relative difference of a specific policy to $R(o_{k^*})$ against scaling parameter of the system; (b) Relative difference of a specific policy to $R(o_k)$ against $k$.

In Figure 4, we consider another example with multipliers that are not decomposable (that is, $\mathcal{T}^{o_{k^*}}(\boldsymbol{\gamma}_{k^*}) \neq \boldsymbol{\gamma}_{k^*}$). Similar to Figure 3, in Figure 4(a), we plot the relative difference of revenue of INDEX(0), INDEX(0.01) and the baseline policies to $R(o_{k^*})$ against the scaling parameter; while, in Figure 4(b), fixing the scaling parameter $h = 50$, we illustrate curves of the relative differences, $\big(R(o_k) - R^{\varphi,h}\big)/R(o_k)$ ($\varphi =$ INDEX(0), INDEX(0.01), Max-Reward, Min-Cost, Random), against the number of iterations $k$ for the fixed point iteration method. Note that the rankings $o_k$ are potentially different as $k$ varies, and so is $R(o_k)$. In Figure 4(a), INDEX(0) and INDEX(0.01) are based on the ranking $o_{k^*}$, while, with slightly abused notation, in Figure 4(b), INDEX(0) and INDEX(0.01) represent the index policies $\phi$, which are derived from the rankings $o_k$ associated with the varying $k$, with $\epsilon_M = 0$ and $0.01$, respectively. The system parameters for the simulations in Figure 4 are listed in Appendix N.

Figure 4(a) can be read in a similar way to Figure 3 except that $R(o_{k^*})$ is not a proved upper bound for the average revenue for the original problem. Here, INDEX(0) and INDEX(0.01) perform similarly and numerically converge to $R(o_{k^*})$ as $h$ increases although the system is not necessarily decomposable. The convergence is consistent with Theorem 1, which generally holds without requiring decomposability. On the other hand, for each finite $h$ (which corresponds to the non-asymptotic regime), INDEX(0) and INDEX(0.01) significantly outperform all the other baseline policies, although the system is not proved to be decomposable, and their performance advantages are likely to be maintained as $h$ continues to increase.

Figure 4(b) illustrates the performance trajectory as the iteration number $k$ (the x-axis) for the fixed point iteration method increases for a system with $h = 50$ (in the non-asymptotic regime). Recall that, for the simulations presented here, the average revenues of INDEX(0) and INDEX(0.01) and $R(o_k)$ are varying with $k$ while all of the baseline policies are independent of $k$. We observe a sharp jump on the curves between k = 1 and . This is caused by the initial setting, γ = , which is not a good choice of multipliers. After several steps of the iteration method, the curves in Figure 4(b) become almost flat; that is, the values of $R(o_k)$, $R^{\text{INDEX}(0)}$ and $R^{\text{INDEX}(0.01)}$ become relatively stable for k = 5 to . Also, in Figure 4(b), after the performance becomes stable, INDEX(0) and INDEX(0.01) achieve clearly higher long-run average revenues than those of the baseline policies: given the poor setting at the beginning, the fixed point iteration method can still lead to a reasonably good ranking $o_{k^*}$ and its associated multipliers $\boldsymbol{\gamma}_{k^*}$.

VII. CONCLUSIONS
We have modeled a resource allocation problem, described by (8), (1) and (5), as a combinationof various RMABPs coupled by limited capacities of the shared resource pools, which areshared, competed for, and reserved by requests. This presents us with an optimization problemfor a stochastic system, aimed at maximizing the long-run average revenue by dynamicallyaccommodating requests into appropriate resource pools.Using the ideas of Whittle relaxation [Whittle, 1988] and the asymptotic optimality proof of[Weber and Weiss, 1990], we have proved the asymptotic optimality of an index policy (referredto as ϕ ) if the capacity constraints are decomposable with multipliers γγγ ∈ R J (Theorem 2). Theasymptotic optimality is proved based on the existence of a global attractor z ∈ Z for theunderlying process Z ϕ,h ( t ) as h → + ∞ and ǫ → . We have proved in general that such aglobal attractor exists, and then proposed a necessary and sufficient condition for asymptoticoptimality in Theorem 1: the performance of the attractor z approaches the optimum of theoriginal problem in the asymptotic regime. This condition holds if the system is decomposable.We have proved a sufficient condition, described as the property of being weakly coupled andin heavy traffic, for the existence of such decomposable multipliers as well as the asymptoticoptimality of policy ϕ (Corollary 2). The property is not necessary, but is easy to verify andcovers a significant class of resource allocation problems. We have listed examples of systemswith the property satisfied in Section III-C.In a general system, we have proposed a fixed point method to fine tune the multipliers γγγ ∈ R J and a ranking o ∈ O ( γγγ, ) . We have proved that, if there exists a fixed point γγγ ∈ R J of thefunction T o satisfying o ∈ O ( γγγ, ) , then this γγγ is a vector of decomposable multipliers. 
We have successfully discovered the decomposable multipliers in some situations without assuming weak coupling or heavy traffic by applying the fixed point method. Also, in Section VI, we have compared the index policy $\varphi$ with different values of the parameter $\epsilon$ to baseline policies through simulations for systems that are not weakly coupled or in heavy traffic in the non-asymptotic regime. The index policy achieves clearly higher performance than the baseline policies. To the best of our knowledge, no existing work provides rigorous asymptotic optimality results for a resource allocation problem where dynamic allocation, competition and reservation are permitted.

ACKNOWLEDGMENT
Jing Fu's and Peter Taylor's research is supported by the Australian Research Council (ARC) Centre of Excellence for the Mathematical and Statistical Frontiers (ACEMS) and ARC Laureate Fellowship FL130100039.

REFERENCES

[Avrachenkov and Borkar, 2016] Avrachenkov, K. E. and Borkar, V. S. (2016). Whittle index policy for crawling ephemeral content. IEEE Transactions on Control of Network Systems, 5(1):446–455.
[Bäuerle, 2002] Bäuerle, N. (2002). Optimal control of queueing networks: An approach via fluid models. Advances in Applied Probability, 34(2):313–328.
[Bäuerle et al., 2000] Bäuerle, N. et al. (2000). Asymptotic optimality of tracking policies in stochastic networks. The Annals of Applied Probability, 10(4):1065–1083.
[Bertsimas et al., 2015] Bertsimas, D., Nasrabadi, E., and Paschalidis, I. C. (2015). Robust fluid processing networks. IEEE Trans. Autom. Control, 60(3):715–728.
[Bertsimas and Niño-Mora, 1996] Bertsimas, D. and Niño-Mora, J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Mathematics of Operations Research, 21(2):257–306.
[Cezik and L'Ecuyer, 2008] Cezik, M. T. and L'Ecuyer, P. (2008). Staffing multiskill call centers via linear programming and simulation. Management Science, 54(2):310–323.
[Chen et al., 2017] Chen, X., Han, Z., Zhang, H., Xue, G., Xiao, Y., and Bennis, M. (2017). Wireless resource scheduling in virtualized radio access networks using stochastic learning. IEEE Transactions on Mobile Computing, 17(4):961–974.
[Coddington and Levinson, 1955] Coddington, E. A. and Levinson, N. (1955). Theory of ordinary differential equations. Tata McGraw-Hill Education.
[Esposito et al., 2016] Esposito, F., Di Paola, D., and Matta, I. (2016). On distributed virtual network embedding with guarantees. IEEE/ACM Trans. Netw., 24(1):569–582.
[Fikar and Hirsch, 2017] Fikar, C. and Hirsch, P. (2017). Home health care routing and scheduling: A review. Computers & Operations Research, 77:86–95.
[Freidlin and Wentzell, 2012] Freidlin, M. I. and Wentzell, A. D. (2012). Random perturbations of dynamical systems. Springer Science & Business Media. Translated by J. Szücs.
[Fu, 2016] Fu, J. (2016). Energy-Efficient Heuristics for Job Assignment in Server Farms. PhD thesis, Department of Electronic Engineering, City University of Hong Kong, Hong Kong.
[Fu et al., 2016] Fu, J., Moran, B., Guo, J., Wong, E. W. M., and Zukerman, M. (2016). Asymptotically optimal job assignment for energy-efficient processor-sharing server farms. IEEE J. Sel. Areas Commun., 34(12):4008–4023.
[Gittins, 1979] Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society. Series B (Methodological), pages 148–177.
[Gittins et al., 2011] Gittins, J. C., Glazebrook, K., and Weber, R. R. (2011). Multi-armed bandit allocation indices: 2nd edition. Wiley.
[Gittins and Jones, 1974] Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Gani, J., editor, Progress in Statistics, pages 241–266. North-Holland, Amsterdam, NL.
[Kelly, 1991] Kelly, F. P. (1991). Loss networks. The Annals of Applied Probability, pages 319–378.
[Krishnamurthy and Djonin, 2007] Krishnamurthy, V. and Djonin, D. V. (2007). Structured threshold policies for dynamic sensor scheduling: a partially observed Markov decision process approach. IEEE Transactions on Signal Processing, 55(10):4938–4957.
[Larrañaga et al., 2015] Larrañaga, M., Ayesta, U., and Verloop, I. M. (2015). Asymptotically optimal index policies for an abandonment queue with convex holding cost. Queueing Systems, pages 1–71.
[Le Ny et al., 2010] Le Ny, J., Feron, E., and Dahleh, M. A. (2010). Scheduling continuous-time Kalman filters. IEEE Transactions on Automatic Control, 56(6):1381–1394.
[Lieder et al., 2015] Lieder, A., Moeke, D., Koole, G., and Stolletz, R. (2015). Task scheduling in long-term care facilities: A client-centered approach. Operations Research for Health Care, 6:11–17.
[Liu et al., 2012] Liu, H., Liu, K., and Zhao, Q. (2012). Learning in a changing world: Restless multiarmed bandit with unknown dynamics. IEEE Transactions on Information Theory, 59(3):1902–1916.
[Meyn, 2008] Meyn, S. (2008). Control techniques for complex networks. Cambridge University Press.
[Nazarathy and Weiss, 2009] Nazarathy, Y. and Weiss, G. (2009). Near optimal control of queueing networks over a finite time horizon. Ann. Oper. Res., 170(1):233–249.
[Niño-Mora, 2001] Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Advances in Applied Probability, 33(1):76–98.
[Niño-Mora, 2002] Niño-Mora, J. (2002). Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93(3):361–413.
[Niño-Mora, 2007] Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP, 15(2):161–198.
[Niño-Mora, 2012] Niño-Mora, J. (2012). Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies. Computers & Operations Research, 39(12):3431–3444.
[Niño-Mora, 2019] Niño-Mora, J. (2019). Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization. Computers & Operations Research, 103:221–236.
[Papadimitriou and Tsitsiklis, 1999] Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Mathematics of Operations Research, 24(2):293–305.
[Ross, 1992] Ross, S. M. (1992). Applied probability models with optimization applications. Dover Publications (New York).
[Serfozo, 2009] Serfozo, R. (2009). Basics of applied stochastic processes. Springer Science & Business Media.
[Stolyar, 2004] Stolyar, A. L. (2004). Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic. Ann. Appl. Probab., 14(1):1–53.
[Stolyar, 2013] Stolyar, A. L. (2013). An infinite server system with general packing constraints. Operations Research, 61(5):1200–1217.
[Stolyar, 2017] Stolyar, A. L. (2017). Large-scale heterogeneous service systems with general packing constraints. Advances in Applied Probability, 49(1):61–83.
[Verloop, 2016] Verloop, I. M. (2016). Asymptotically optimal priority policies for indexable and non-indexable restless bandits. Ann. Appl. Probab., 26(4):1947–1995.
[Wallace and Whitt, 2004] Wallace, R. B. and Whitt, W. (2004). Resource pooling and staffing in call centers with skill-based routing. Operations Research, 7(4):276–294.
[Wang et al., 2019] Wang, J., Ren, X., Mo, Y., and Shi, L. (2019). Whittle index policy for dynamic multi-channel allocation in remote state estimation. IEEE Transactions on Automatic Control. To appear.
[Weber and Weiss, 1990] Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Probab., 27(3):637–648.
[Wei and Neely, 2017] Wei, X. and Neely, M. J. (2017). Data center server provision: Distributed asynchronous control for coupled renewal systems. IEEE/ACM Transactions on Networking (TON), 25(4):2180–2194.
[Whittle, 1988] Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. J. Appl. Probab., 25:287–298.

APPENDIX A
MULTIPLE RESTLESS MULTI-ARMED BANDIT PROBLEMS
Consider the special case
i) $L = 1$;
ii) $w_{j,i} \in \{0, 1\}$ for all $i \in [I] \setminus \{d(\ell) : \ell \in [L]\}$ and $j \in [J]$; and
iii) $\sum_{j \in [J]} w_{j,i} w_{j,i'} = 0$ for any $i \neq i'$, $i, i' \in [I]$ (so that the intersection of the sets of resource types used by different patterns is empty).

Here, the stochastic optimization problem defined by (8), (1) and (5) becomes

$$\max_{\phi} \lim_{t \to +\infty} \mathbb{E}\Bigl[\sum_{i \in \mathscr{P}_1} \Bigl(r_{\ell(i)} \mu_i - \sum_{j \in [J]} \varepsilon_j w_{j,i}\Bigr) N^{\phi}_i(t)\Bigr] \tag{42}$$

subject to

$$\sum_{i \in \mathscr{P}_1} a^{\phi}_i\bigl(N^{\phi}(t)\bigr) = 1, \quad \forall N^{\phi}(t) \in \mathscr{N}. \tag{43}$$

This is a typical RMABP as defined in [Whittle, 1988], and the process for each pattern $i \in [I]$ is a restless Bandit Process (BP).

Similarly, if Premises ii) and iii) hold but Premise i) does not, then the resource allocation problem can be modeled by $L$ independent RMABPs. If Premises ii) and iii) hold then the capacity constraints defined in (5) are no longer necessary.

We can also obtain $L$ independent RMABPs by simply dropping the capacity constraints in (5) without assuming any of the three premises, but the physical meaning of the entire problem is changed: the capacity constraints are then reflected by just the definition of the state space $\mathscr{N}_i = \{0, 1, \ldots, \min_{j \in \mathscr{J}_i} \lceil C_j / w_{j,i} \rceil\}$, for each pattern $i \in [I]$, rather than being limited by the sum of RUs simultaneously occupied by different patterns. In the general case where capacity constraints are in place, the $L$ RMABPs, each of which corresponds to the $|\mathscr{P}_\ell|$ BPs coupled by the action constraint described in (1), are further linked by these capacity constraints.

APPENDIX B
PROOF OF PROPOSITION 1
Lemma 2. For all $n = 0, 1, \ldots, |\mathscr{N}_i| - 1$, there exists a policy $\varphi$ determined by $\alpha^{\varphi}_i(n) \in \{0, 1\}$ that maximizes the long-run average revenue of the underlying MDP associated with the sub-problem for pattern $i$.

Proof of Lemma 2. The underlying MDP associated with the sub-problem for pattern $i$ is a birth-and-death process, and its state variable $n$ represents the number of instantiations generated for pattern $i$. Defining state $0$ as the absorbing state, for all $n \in [|\mathscr{N}_i| - 1]$, Bellman's equation for the value function of this MDP is

$$V(n) = \max_{\alpha^{\varphi}_i(n) \in [0,1]} \Bigl\{ \frac{1}{\alpha^{\varphi}_i(n)\lambda_{\ell(i)} + n\mu_i}\Bigl(r_{\ell(i)}\mu_i n - \sum_{j \in \mathscr{J}_i} \varepsilon_j w_{j,i} n - g\Bigr) + \frac{\alpha^{\varphi}_i(n)\lambda_{\ell(i)}}{\alpha^{\varphi}_i(n)\lambda_{\ell(i)} + n\mu_i} V(n+1) + \frac{n\mu_i}{\alpha^{\varphi}_i(n)\lambda_{\ell(i)} + n\mu_i} V(n-1) \Bigr\}, \tag{44}$$

where $g \in \mathbb{R}$ is a given parameter that is equal to the maximized long-run average revenue of this MDP and $V(0) = 0$. The parameter $g$ acts as an attached cost, such that the action variable $\alpha^{\varphi}_i(n)$ under an optimal solution $\varphi$ that maximizes the long-run average revenue of this MDP will also maximize the right hand side of (44) (see [Ross, 1992]). The value function $V(\cdot)$, which is a solution of the Bellman equation (44), can be computed through the value iteration technique. In this context, the expression on the right hand side of (44) is maximized by exploring $\alpha^{\varphi}_i(n) \in [0, 1]$ with all the other parameters given. Now note that the right hand side of (44) is of the form $A + \frac{B}{C\alpha^{\varphi}_i(n) + D}$ ($A, B, C, D \in \mathbb{R}$), which is either increasing or decreasing in $\alpha^{\varphi}_i(n)$ or remains constant. Hence, for all $n \in [|\mathscr{N}_i| - 1]$, the right hand side of (44) can be optimized either by taking $\alpha^{\varphi}_i(n) = 0$ or by taking $\alpha^{\varphi}_i(n) = 1$.

We then consider the action variable for state $0$. For any policy $\varphi$ with $\alpha^{\varphi}_i(0) = 0$, $g = 0$, because the MDP will stay in state $0$ all the time. Thus the average revenue $g$ for the optimal policy $\varphi$ must be non-negative.
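The endpoint argument for states $n \geq 1$ can be checked numerically. The following is a minimal sketch, not part of the original proof: it evaluates the bracketed term of (44) on a grid of activation probabilities and confirms that the grid maximizer is $0$ or $1$. The function names and all numerical values are illustrative assumptions.

```python
def bellman_rhs(alpha, n, lam, mu, reward_rate, g, v_up, v_down):
    """Bracketed term of (44) for a state n >= 1.

    reward_rate stands for r*mu*n - sum_j eps_j*w_j*n, and v_up, v_down
    stand for V(n+1) and V(n-1). All inputs are hypothetical numbers.
    """
    total_rate = alpha * lam + n * mu
    numerator = (reward_rate - g) + alpha * lam * v_up + n * mu * v_down
    return numerator / total_rate


def best_alpha_on_grid(n, lam, mu, reward_rate, g, v_up, v_down, steps=100):
    """Return the grid point in [0, 1] that maximizes the bracketed term."""
    grid = [k / steps for k in range(steps + 1)]
    return max(
        grid,
        key=lambda a: bellman_rhs(a, n, lam, mu, reward_rate, g, v_up, v_down),
    )


# Increasing case: the term equals (3 + 5a) / (1 + a).
increasing_case = best_alpha_on_grid(2, 1.0, 0.5, 3.0, 1.0, 5.0, 1.0)
# Decreasing case: the term equals (3 + a) / (1 + a).
decreasing_case = best_alpha_on_grid(2, 1.0, 0.5, 3.0, 1.0, 1.0, 1.0)
```

Because the term is a ratio of two affine functions of the activation probability, it is monotone on $[0,1]$, so the search over the grid always ends at one of the two endpoints.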
From Bellman's equation, $\alpha^{\varphi}_i(0)$ under this optimal policy $\varphi$ is either $0$ or maximizes

$$\max_{\alpha^{\varphi}_i(0) \in (0,1]} \Bigl\{ \frac{-g}{\alpha^{\varphi}_i(0)\lambda_{\ell(i)}} + V(1) \Bigr\}. \tag{45}$$

Since $g \geq 0$, $\alpha^{\varphi}_i(0) = 1$ will always maximize the bracketed term in (45). It follows that there exists an optimal policy $\varphi$ with $\alpha^{\varphi}_i(0) \in \{0, 1\}$ that maximizes the long-run average revenue of this MDP. This proves the lemma. □

Proof of Proposition 1.
From Lemma 2, for all $n = 0, 1, \ldots, |\mathscr{N}_i| - 1$, there exists a policy $\varphi$ determined by $\alpha^{\varphi}_i(n) \in \{0, 1\}$ that maximizes the long-run average revenue of this MDP.

For any $\varphi \in \Phi_i$ and $n = 0, 1, \ldots, |\mathscr{N}_i| - 1$, if $\alpha^{\varphi}_i(n) = 1$, there is a positive transition rate from state $n$ to state $n + 1$; otherwise, that transition rate is $0$. The transition rate from $n$ to $n - 1$, if $n \geq 1$, is always positive and independent of the policy employed.

For any $\varphi \in \Phi_i$ and $0 \leq n \leq |\mathscr{N}_i| - 1$, if $\alpha^{\varphi}_i(n) = 0$ then there is no difference between making $\alpha^{\varphi}_i(n') = 0$ or $1$ when $n' > n$, because we cannot reach state $n'$ if we start below state $n$. For any $\varphi \in \Phi_i$ and $0 \leq n < n' \leq |\mathscr{N}_i| - 1$, this allows us to assume that if $\alpha^{\varphi}_i(n) = 0$ then $\alpha^{\varphi}_i(n') = 0$, which makes $\varphi$ a threshold policy. For $\varphi \in \Phi_i$, we define

$$m^{\varphi} = \begin{cases} 0, & \text{if } \alpha^{\varphi}_i(0) = 0, \\ m, & \text{if } \alpha^{\varphi}_i(m-1) = 1,\ \alpha^{\varphi}_i(m) = 0,\ m \in [|\mathscr{N}_i| - 1], \end{cases}$$

so that $m^{\varphi} - 1$ is the maximum value of $n$ for which $\alpha^{\varphi}_i(n) = 1$.

Let $\pi^m_n$ represent the steady state probability of state $n \in \mathscr{N}_i$ under policy $\varphi$ with $m^{\varphi} = m$. We maximize the right hand side of (13) with respect to all threshold policies defined by $m$, taking into account the specific form of the stationary distribution. For an optimal solution $\varphi^*$, we obtain

$$m^{\varphi^*} \in \arg\max_{m \in [|\mathscr{N}_i| - 1]} \Bigl\{ \pi^m_0 \sum_{n=1}^{m} \Bigl[ \frac{(\lambda_\ell)^n}{n!(\mu_i)^n} \Bigl( r_i(n, \boldsymbol{\gamma}) - \frac{n\mu_i}{\lambda_\ell}\Bigl(\nu_\ell + \sum_{j \in \mathscr{J}_i} w_{j,i}\gamma_j\Bigr) \Bigr) \Bigr] \Bigr\} \tag{46}$$

where $r_i(n, \boldsymbol{\gamma}) = n\mu_i r_\ell - \sum_{j \in \mathscr{J}_i} \varepsilon_j w_{j,i} n - \sum_{j \in \mathscr{J}_i} \gamma_j w_{j,i} n$. Equation (46) can be rewritten as

$$m^{\varphi^*} \in \arg\max_{m} \Bigl\{ (\lambda_\ell \tilde{r}_i - \nu_\ell) \frac{\sum_{n=0}^{m-1} \frac{(\lambda_\ell)^n}{n!(\mu_i)^n}}{\sum_{n=0}^{m} \frac{(\lambda_\ell)^n}{n!(\mu_i)^n}} \Bigr\} \tag{47}$$

where $\tilde{r}_i = r_\ell - \frac{1}{\mu_i}\sum_{j \in \mathscr{J}_i} \varepsilon_j w_{j,i} - \frac{1}{\mu_i}\sum_{j \in \mathscr{J}_i} \gamma_j w_{j,i} - \frac{1}{\lambda_\ell}\sum_{j \in \mathscr{J}_i} \gamma_j w_{j,i}$. Note that the right hand side of (47) may return a set with more than one element: the optimal value for $m^{\varphi^*}$ is not necessarily unique. If this is the case, we choose any of the possible maxima. From (47), if $\nu_\ell < \lambda_\ell \tilde{r}_i$, then $m^{\varphi^*} = |\mathscr{N}_i| - 1$; if $\nu_\ell = \lambda_\ell \tilde{r}_i$, then $m^{\varphi^*}$ can be any value in $\mathscr{N}_i$; otherwise, $m^{\varphi^*} = 0$. We thus interpret $\varphi^*$ as an index policy where indices for states $n \in \mathscr{N}_i \setminus \{|\mathscr{N}_i| - 1\}$ are given by $\lambda_\ell \tilde{r}_i$. This proves Proposition 1. □

APPENDIX C
PROOF OF PROPOSITION 2

Proof.
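For given $\boldsymbol{\gamma}_0 \in \mathbb{R}^J$, $o \in \mathscr{O}(\boldsymbol{\gamma}_0)$ and $\iota \in \mathscr{I}(o)$, from the definition of $\Xi_i(\cdot)$ in (20), we rewrite (24) in the form

$$\Bigl(1 + \frac{\lambda_{\ell_\iota}}{\mu_{i_\iota}}\Bigr) \sum_{j \in [J]} w_{j,i_\iota} \gamma_{0,j} = \Xi_{i_\iota}(\mathbf{0}) - \nu_{\ell_\iota}(o, \boldsymbol{\gamma}_0). \tag{48}$$

Note that, from (25), $\gamma_{0,j} = 0$ if there is no $\iota \in \mathscr{I}(o)$ with $j = j_\iota$. For the remaining $|\mathscr{I}(o)|$ entries in $\boldsymbol{\gamma}_0$ where the $\gamma_{0,j}$ are allowed to be non-zero, we construct a $|\mathscr{I}(o)| \times |\mathscr{I}(o)|$ matrix $\mathcal{M} = (m_{\iota,j})$ and write (48) as $\mathcal{M}\tilde{\boldsymbol{\gamma}}_0 = \boldsymbol{y}$ with $m_{\iota,j} = (1 + \lambda_{\ell_\iota}/\mu_{i_\iota}) w_{j,i_\iota}$, $\tilde{\gamma}_{0,\iota} = \gamma_{0,j_\iota}$ and $y_\iota = \Xi_{i_\iota}(\mathbf{0}) - \nu_{\ell_\iota}(o, \boldsymbol{\gamma}_0)$ for all $\iota \in \mathscr{I}(o)$.

As described in (II), under policy $\bar{\varphi}(o)$ produced by Algorithm 1, if a resource pool $j \in [J]$ becomes a critical pool of any critical pair $\iota \in \mathscr{I}(o)$, all PS pairs $\iota' \in [N]$ with $j \in \mathscr{J}_{i_{\iota'}}$ and lower priorities than that of $\iota$ are removed from the list of candidate PS pairs; that is, all these lower prioritized pairs $\iota'$ will not become critical pairs. In other words, by appropriately re-ordering columns of $\mathcal{M}$ and entries of $\boldsymbol{y}$, it becomes an upper-triangular matrix with $|\mathcal{M}| = \prod_{\iota \in \mathscr{I}(o)} (1 + \lambda_{\ell_\iota}/\mu_{i_\iota}) w_{j_\iota,i_\iota}$. Because $w_{j_\iota,i_\iota} > 0$ for all $\iota \in \mathscr{I}(o)$, $|\mathcal{M}| > 0$. This proves the proposition. □

APPENDIX D
PROOF OF PROPOSITION 3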
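Lemma 3. For given $\boldsymbol{\gamma}_0 \in \mathbb{R}^J$ and $o \in \mathscr{O}$, if $\mathcal{T}^o(\boldsymbol{\gamma}_0) \in \mathbb{R}^J$, then the complementary slackness conditions corresponding to the capacity constraints (that is, Equation (23)) and Equation (18) are satisfied by taking $\phi = \varphi = \bar{\varphi}(o)$ and $\boldsymbol{\gamma} = \mathcal{T}^o(\boldsymbol{\gamma}_0)$.

The proof of Lemma 3 will be given in Appendix F in the e-companion. Further, if (17) and (19) are satisfied by setting $\boldsymbol{\gamma} = \mathcal{T}^o(\boldsymbol{\gamma}_0)$ and $\varphi = \bar{\varphi}(o)$, then $\mathcal{T}^o(\boldsymbol{\gamma}_0)$ is a vector of decomposable multipliers and, from the strong duality theorem, the policy $\bar{\varphi}(o)$ is an optimal policy for the relaxed problem.

Proof of Proposition 3. For given $\boldsymbol{\gamma}_0 \in \mathbb{R}^J$ and $o \in \mathscr{O}(\boldsymbol{\gamma}_0, 0)$, if $\mathcal{T}^o(\boldsymbol{\gamma}_0) = \boldsymbol{\gamma}_0$, then (17) and (19) are satisfied by setting $\boldsymbol{\gamma} = \boldsymbol{\gamma}_0$ and $\varphi = \bar{\varphi}(o)$, because in this case $\mathscr{O}(\boldsymbol{\gamma}_0, 0) = \mathscr{O}(\mathcal{T}^o(\boldsymbol{\gamma}_0), 0)$. That is, this $\mathcal{T}^o(\boldsymbol{\gamma}_0) = \boldsymbol{\gamma}_0$ is a vector of decomposable multipliers. □

APPENDIX E
PROOF OF COROLLARY

Proof.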
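For all $\iota \in \mathscr{I}(o)$, from the definition of Algorithm 1, $\nu_{\ell_\iota}(o, \mathcal{T}^o(\boldsymbol{\gamma}_0)) = \Xi_{i_\iota}(\mathcal{T}^o(\boldsymbol{\gamma}_0), 0)$. For all $j \notin \{j_\iota(o) \in [J] \mid \iota \in \mathscr{I}(o)\}$, $\mathcal{T}^o_j(\boldsymbol{\gamma}_0) = 0$ because $\mathcal{T}^o(\boldsymbol{\gamma}_0)$ is the solution of (24)-(25). In other words, $\mathcal{T}^o(\mathcal{T}^o(\boldsymbol{\gamma}_0)) = \mathcal{T}^o(\boldsymbol{\gamma}_0)$ is a solution of (24)-(25) by taking $\boldsymbol{\gamma}_0 = \mathcal{T}^o(\boldsymbol{\gamma}_0)$ and, from Proposition 2, this solution is unique. Together with Proposition 3, this yields that $\mathcal{T}^o(\boldsymbol{\gamma}_0)$ is a decomposable vector and $\bar{\varphi}(o)$ is optimal for the relaxed problem. □

APPENDIX F
PROOF OF LEMMA 3

Proof.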
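Under policy $\bar{\varphi}(o)$, a critical resource pool $j \in \{j_\iota(o) \in [J] \mid \iota \in \mathscr{I}(o)\}$ is always fully occupied; that is,

$$\boldsymbol{\omega}_j \cdot \bigl(\boldsymbol{\Pi}^{\bar{\varphi}(o)}_n + \boldsymbol{\Pi}^{\bar{\varphi}(o)}_a\bigr) = C_j, \quad \forall j \in \{j_\iota(o) \in [J] \mid \iota \in \mathscr{I}(o)\}. \tag{49}$$

Equation (23) is achieved for $j \in \{j_\iota(o) \in [J] \mid \iota \in \mathscr{I}(o)\}$. For the other resource pools $j \notin \{j_\iota(o) \in [J] \mid \iota \in \mathscr{I}(o)\}$, (23) is satisfied by taking $\gamma_j = \gamma_{0,j} = 0$, which follows from (25). Also, for policy $\bar{\varphi}(o)$, (18) is guaranteed by (24) and procedure (I) of Algorithm 1. This proves the lemma. □

APPENDIX G
PROOF OF PROPOSITION 4

Algorithm 3: Priority-style policy for the sub-problems

Input: Real vectors $\boldsymbol{\nu} \in \mathbb{R}^L$ and $\boldsymbol{\gamma} \in \mathbb{R}^J$ and a ranking of pairs $o \in \mathscr{O}(\boldsymbol{\nu}, \boldsymbol{\gamma}, \boldsymbol{\beta})$ with $\boldsymbol{\beta} \in \mathbb{R}^I$.
Output: A policy $\varphi(o, \boldsymbol{\nu}, \boldsymbol{\gamma})$ determined by $\boldsymbol{\alpha}^{\varphi(o,\boldsymbol{\nu},\boldsymbol{\gamma})}_i$ for sub-problems $i \in [I]$.

Function PriorityPolicyForSubProblem($\boldsymbol{\nu}$, $\boldsymbol{\gamma}$, $o$):
    $\boldsymbol{\alpha}^{\varphi}_i \leftarrow \mathbf{0}$ for all $i \in [I]$
    Initialize the list of candidate PS pairs as the list of all PS pairs.
    $\iota \leftarrow 0$    /* Iteration variable */
    while $\iota < N$ and the list of candidate PS pairs is not empty do
        $\iota \leftarrow \iota + 1$
        if PS pair $\iota$ is not in the list of candidate PS pairs then
            continue
        end
        $a \leftarrow \inf\bigl\{\{\alpha^{\varphi}_{i_\iota}(n_\iota) \in [0,1] \mid \exists j \in [J],\ \boldsymbol{\omega}_j \cdot (\boldsymbol{\Pi}^{\varphi}_n + \boldsymbol{\Pi}^{\varphi}_a) = C_j\} \cup \{1\}\bigr\}$
        $A \leftarrow \bigl\{\alpha^{\varphi}_{i_\iota}(n_\iota) \in [0,1] \bigm| \sum_{i \in \mathscr{P}_{\ell_\iota}} \boldsymbol{\pi}^{\varphi}_i \cdot \boldsymbol{\alpha}^{\varphi}_i = 1\bigr\}$
        $\alpha^{\varphi}_{i_\iota}(n_\iota) \leftarrow a$    /* Update $\alpha^{\varphi}_{i_\iota}(n_\iota)$ with the maximal activating probability without violating any capacity constraint. */
        if $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0) = \nu_{\ell_\iota}$ and $A \neq \emptyset$ then
            $\alpha^{\varphi}_{i_\iota}(n_\iota) \leftarrow \inf A$    /* When $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0) = \nu_{\ell_\iota}$ and the action constraint can be satisfied by setting $\alpha^{\varphi}_{i_\iota}(n_\iota) \in [0,1]$, update $\alpha^{\varphi}_{i_\iota}(n_\iota)$. */
        end
        if $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0) < \nu_{\ell_\iota}$ then
            $\alpha^{\varphi}_{i_\iota}(n_\iota) \leftarrow 0$
            remove all PS pairs $\iota' > \iota$ with $\ell_{\iota'} = \ell_\iota$ from the list of candidate PS pairs.
        else if $\exists j \in [J]$, $\boldsymbol{\omega}_j \cdot (\boldsymbol{\Pi}^{\varphi}_n + \boldsymbol{\Pi}^{\varphi}_a) = C_j$ then
            remove all PS pairs $\iota' > \iota$ with $w_{j,i_{\iota'}} > 0$ from the list of candidate PS pairs.
        end
    end
    $\boldsymbol{\alpha}^{\varphi(o,\boldsymbol{\nu},\boldsymbol{\gamma})}_i \leftarrow \boldsymbol{\alpha}^{\varphi}_i$ for all $i \in [I]$
    return

From Proposition 3, a sufficient condition for the existence of decomposable multipliers is that there exist $\boldsymbol{\gamma}_0 \in \mathbb{R}^J$ and $o \in \mathscr{O}(\boldsymbol{\gamma}_0, 0)$ such that $\mathcal{T}^o(\boldsymbol{\gamma}_0) = \boldsymbol{\gamma}_0$. Such a $\boldsymbol{\gamma}_0$ is the vector of decomposable multipliers; from the definition of $\Xi_i(\cdot)$ in (20), Equation (24) can be rewritten in the form of (48).

Similar to Section III-B, for given $\boldsymbol{\nu} \in \mathbb{R}^L$, $\boldsymbol{\gamma} \in \mathbb{R}^J$ and a positive vector $\boldsymbol{\beta} \in \mathbb{R}^I$, let states $n \in \mathscr{N}_i$, $i \in \mathscr{P}_\ell$, $\ell \in [L]$, be ranked according to the descending order of $\beta_i(\Xi_i(\boldsymbol{\gamma}, 0) - \nu_\ell)$, and let $\mathscr{O}(\boldsymbol{\nu}, \boldsymbol{\gamma}, \boldsymbol{\beta})$ represent the set of all such rankings.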
Note that, in Algorithm 1 and Section III-B, we ordered the PS pairs $\iota \in [N]$ according to $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0)$, but here we order them according to $\beta_{i_\iota}(\Xi_{i_\iota}(\boldsymbol{\gamma}, 0) - \nu_{\ell_\iota})$ for given $\boldsymbol{\beta} \in \mathbb{R}^I$ and $\boldsymbol{\nu} \in \mathbb{R}^L$.

As in Algorithm 1, Algorithm 3 generates a policy $\varphi(o, \boldsymbol{\nu}, \boldsymbol{\gamma}) \in \tilde{\Phi}$ with PS-pair priorities defined by $o \in \mathscr{O}(\boldsymbol{\nu}, \boldsymbol{\gamma}, \boldsymbol{\beta})$, satisfying the relaxed capacity constraints described in (10). The policy $\varphi(o, \boldsymbol{\nu}, \boldsymbol{\gamma})$ depends on the given $\boldsymbol{\beta}$ only through the ranking $o$. The main idea of Algorithm 3, as in Algorithm 1, is to activate PS pairs consecutively according to their order in the ranking $o$, with $\boldsymbol{\alpha}^{\varphi(o,\boldsymbol{\nu},\boldsymbol{\gamma})}_i$ initialized to be $\mathbf{0}$ for all $i \in [I]$. When a relaxed capacity constraint achieves equality by activating or partially activating the PS pair $\iota \in [N]$ (the $\iota$th pair in ranking $o$), all parameters are maintained in the same way as described in (II).

(III) If the given multiplier $\nu_{\ell_\iota}$ for the $\iota$th pair in ranking $o$ is greater than $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0)$, then all PS pairs $\iota' \geq \iota$ with $\ell_{\iota'} = \ell_\iota$ are removed from the list of candidate PS pairs awaiting activation.

Note that, during the process of generating policy $\varphi(o, \boldsymbol{\nu}, \boldsymbol{\gamma})$, we do not necessarily stop activating PS pairs if the constraints in (9) achieve equality; that is, here, the mechanism described in (I) is replaced by that described in (III). The mechanism described in (III) guarantees that, for each $\ell \in [L]$, PS pairs $\iota \in [N]$ with $\ell_\iota = \ell$ are passive if $\Xi_{i_\iota}(\boldsymbol{\gamma}, 0) - \nu_\ell < 0$, and are partially or fully active otherwise.
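The ranking rule and mechanism (III) can be sketched in a few lines. This is a simplified illustration only: it orders PS pairs by $\beta_i(\Xi_i - \nu_\ell)$ and flags the pairs that (III) forces to be passive, but it does not model capacity constraints or stationary distributions as Algorithm 3 does. The dictionary layout, the function names and the numbers are assumptions made for the example.

```python
def rank_ps_pairs(pairs, nu, beta):
    """Sort PS pairs in descending order of beta_i * (Xi_i - nu_l).

    pairs: list of dicts with hypothetical keys 'pattern', 'request_type',
           'state' and 'index' (the value of Xi for that pair).
    nu:    mapping request type -> multiplier nu_l.
    beta:  mapping pattern -> positive weight beta_i.
    """
    def priority(pair):
        return beta[pair["pattern"]] * (pair["index"] - nu[pair["request_type"]])

    # sorted() is stable, so tied pairs keep their input order.
    return sorted(pairs, key=priority, reverse=True)


def is_forced_passive(pair, nu):
    """Mechanism (III): a pair is passive when its index is below nu_l."""
    return pair["index"] < nu[pair["request_type"]]


pairs = [
    {"pattern": 0, "request_type": 0, "state": 0, "index": 5.0},
    {"pattern": 1, "request_type": 0, "state": 0, "index": 8.0},
    {"pattern": 2, "request_type": 1, "state": 0, "index": 1.0},
]
nu = {0: 4.0, 1: 2.0}
beta = {0: 1.0, 1: 0.5, 2: 1.0}

ranking = [p["pattern"] for p in rank_ps_pairs(pairs, nu, beta)]
passive = [p["pattern"] for p in pairs if is_forced_passive(p, nu)]
```

In this toy instance the weights change the scaled priorities (1.0, 2.0 and -1.0 for patterns 0, 1 and 2) but not which pairs are passive, which matches the remark that $\boldsymbol{\beta}$ influences the policy only through the ranking.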
Also, for the non-passive PS pairs ι satisfying Ξ i ι ( γγγ, ) − ν ℓ ι ≥ , theparameter β will affect their priorities for activation, which together with the specified ranking o ∈ O ( ν , γγγ, β ) determine whether they are fully or partially activated.For any ranking of O ( ν , γγγ, β ) , we stipulate that the PS pairs for the boundary states | N i | − of patterns i ∈ [ I ] \{ d ( ℓ ) : ℓ ∈ [ L ] } are the lowest ranked, so that they are always passive.Under the policy ϕ ( o, ν , γγγ ) , the set of passive PS pairs ι with ℓ ι = ℓ ( ℓ ∈ [ L ] ) increasesmonotonically from { ι ∈ [ N ] | n ι = | N i ι | − , ℓ ι = ℓ } to { ι ∈ [ N ] | ℓ ι = ℓ } as ν ℓ increases from −∞ to + ∞ . So the sum of the expected action variables, P i ∈ P ℓ π ϕ ( o, ν ,γγγ ) i · α ϕ ( o, ν ,γγγ ) i , decreaseswith respect to ν ℓ from 1 to 0.We focus on the first N − I + L PS pairs represented by ι ∈ [ N − I + L ] , which are allowedto be active. Let I ( o, ν , γγγ ) represent the set of critical pairs and j ι ( o, ν , γγγ ) represent the critical resource pool of critical pair ι ∈ I ( o, ν , γγγ ) with respect to policy ϕ ( o, ν , γγγ ) . Define I ( o, ν , γγγ ) := n ι ∈ [ N − I + L ] | α ϕ ( o, ν ,γγγ ) i ι ( n ι ) > and ι / ∈ I ( o, ν , γγγ ) o (50)and I ( o, ν , γγγ ) := n ι ∈ [ N − I + L ] | α ϕ ( o, ν ,γγγ ) i ι ( n ι ) = 0 o . (51)For ν ∈ R L , γγγ ∈ R J , and ι ∈ [ N − I + L ] , define ∆ ∈ R [ N − I + L ] with ∆ ι := Ξ i ι ( , ) − ν ℓ ι − (cid:18) λ ℓ ι µ i ι (cid:19) X j ∈ [ J ] w j,i ι γ j . (52)We introduce a condition. Partial Decomposability:
We say the system is partially decomposable if and only if, for anygiven ν ∈ R L , there exist γγγ ∈ R J , β ∈ R I , o ∈ O ( ν , , β ) and ∆ ∈ R [ N − I + L ] that satisfy ∆ ι ≥ if ι ∈ I ( o, ν , ) , = 0 if ι ∈ I ( o, ν , ) , ≤ if ι ∈ I ( o, ν , ) . (53) and γ j = 0 (54) for all j ∈ [ J ] \{ j ι ( o, ν , ) |∃ ι ∈ I ( o, ν , ) } . If the system is partially decomposable, then, fine-tune the ν ∈ R L such that, for thecorresponding γγγ ∈ R J , β ∈ R I and o ∈ O ( ν , , β ) satisfying (53) and (54), the relaxed actionconstraints described in (9) are satisfied by the policy ϕ ( o, ν , ) . In this context, the policy ϕ ( o, ν , ) generated by Algorithm 3 coincides with the policy ¯ ϕ ( o ) generated by Algorithm 1with the same PS-pair ranking o , and the ν ℓ ( o, γγγ ) ( ℓ ∈ [ L ] ) generated by Algorithm 1 is equalto the ν ℓ . The existence of the γγγ , β and the ranking o ∈ O ( ν , , β ) is guaranteed by partialdecomposability condition. Also, for the ranking o and the multipliers ν ℓ ( o, γγγ ) = ν ℓ ( ℓ ∈ [ L ] ), γγγ is non-negative and a fixed point of the function T o (that is, T o ( γγγ ) = γγγ ). Together withProposition 3, this yields that, if the system is partially decomposable, then it is decomposable. Lemma 4.
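If the system is weakly coupled then it is partially decomposable. With the vector $\boldsymbol{\gamma}$ in the definition of partial decomposability, for $j \in [J]$,

i) if there is a critical PS pair $\iota \in \mathscr{I}(o, \boldsymbol{\nu}, \mathbf{0})$ with critical resource pool $j = j_\iota(o, \boldsymbol{\nu}, \mathbf{0})$, and no $j' \neq j$ with $j' \in \mathscr{J}_{i_\iota}$ is critical for any other PS pair $\iota' \in \mathscr{I}(o, \boldsymbol{\nu}, \mathbf{0})$, then

$$\gamma_j = \frac{\Xi_{i_\iota}(\mathbf{0}, 0) - \nu_{\ell_\iota}}{w_{j,i_\iota}\bigl(1 + \frac{\lambda_{\ell_\iota}}{\mu_{i_\iota}}\bigr)}; \tag{55}$$

ii) if there are critical PS pairs $\iota$ and $\iota'$ in $\mathscr{I}(o, \boldsymbol{\nu}, \mathbf{0})$ with critical resource pools $j = j_\iota(o, \boldsymbol{\nu}, \mathbf{0}) \neq j_{\iota'}(o, \boldsymbol{\nu}, \mathbf{0})$ and $j_{\iota'}(o, \boldsymbol{\nu}, \mathbf{0}) \in \mathscr{J}_{i_\iota}$, then

$$\gamma_j = \frac{w_{j_{\iota'}(o,\boldsymbol{\nu},\mathbf{0}),i_\iota}}{w_{j,i_\iota}} \Bigl( \frac{\Xi_{i_\iota}(\mathbf{0}, 0) - \nu_{\ell_\iota}}{w_{j_{\iota'}(o,\boldsymbol{\nu},\mathbf{0}),i_\iota}\bigl(1 + \frac{\lambda_{\ell_\iota}}{\mu_{i_\iota}}\bigr)} - \frac{\Xi_{i_{\iota'}}(\mathbf{0}, 0) - \nu_{\ell_{\iota'}}}{w_{j_{\iota'}(o,\boldsymbol{\nu},\mathbf{0}),i_{\iota'}}\bigl(1 + \frac{\lambda_{\ell_{\iota'}}}{\mu_{i_{\iota'}}}\bigr)} \Bigr); \tag{56}$$

iii) otherwise,

$$\gamma_j = 0. \tag{57}$$

Proof.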
For PS pairs ι ∈ [ N − I + L ] , if Ξ i ι ( , ) < ν ℓ ι , then ι ∈ I ( o, ν , ) , and for any γγγ ∈ R J ,there always exist ∆ ι ≤ satisfying (52).We focus, then, on equations (52) corresponding to the remaining PS pairs. Let N = { ι ∈ [ N − I + L ] | Ξ i ι ( , ) < ν ℓ ι } , and σ ( ι ) represent the position of PS pair ι ∈ [ N − I + L ] \ N among all pairs in the set [ N − I + L ] \ N according to the ranking o ∈ O ( ν , , β ) . The notation σ ( ι ) clarifies the difference between σ ( ι ) th PS pair among [ N − I + L ] \ N and the PS pair ι defined previously.We construct an ( N − I + L − | N | ) × ( N − I + L − | N | ) matrix M = ( m i,j ) and write(52) as M x = y ( x , y ∈ R N − I + L −| N | ), where y σ ( ι ) := Ξ i ι ( , ) − ν ℓ ι ( ι ∈ [ N − I + L ] \ N ).For given ν ∈ R L , β ∈ R I and o ∈ O ( ν , , β ) , M and x are defined by:a) if PS pair ι ∈ I ( o, ν , ) , then x σ ( ι ) equals ∆ ι , m σ ( ι ) ,σ ( ι ) = 1 and m σ ( ι ′ ) ,σ ( ι ) = 0 for all ι ′ = ι , ι ′ ∈ [ N − I + L ] \ N ;b) if PS pair ι ∈ I ( o, ν , ) , then x σ ( ι ) equals γ j ι ( o, ν , ) and m σ ( ι ′ ) ,σ ( ι ) , ι ′ ∈ [ N − I + L ] \ N ,are equal to w j ι (0 , ν , ) ,i ι ′ (1 + λ ℓι ′ µ iι ′ ) , the coefficients of γ j ι ( o, ν , ) in (52); andc) for all the other PS pairs ι ∈ I ( o, ν , ) , x σ ( ι ) equals − ∆ ι , entry m σ ( ι ) ,σ ( ι ) = − and m σ ( ι ′ ) ,σ ( ι ) = 0 for all ι ′ = ι , ι ′ ∈ [ N − I + L ] \ N .Also, we set γ j = 0 for all j ∈ [ J ] \{ j ι ( o, ν , ) |∃ ι ∈ I ( o, ν , ) } . Then, the lemma holds if and only if, for any weakly coupled system, there exists a non-negative solution to M x = y , forwhich (55)-(57) are satisfied.For a given i ∈ [ I ] , from (52), ∆ ι = ∆ i for all PS pairs ι with n ι ∈ N i \{| N i | − } . In thiscontext, when we consider a solution to M x = y satisfying (52), we only need to consider oneof these PS pairs ι with n ι ∈ N i \{| N i | − } associated with one row of M . 
Thus, we removethe other | N i | − linear functions by removing corresponding rows and columns of M andelements of x and y . In particular, for all critical pairs ι ∈ I ( o, ν , ) , we keep the row andcolumn associated with PS pair ι in M and remove the | N i ι | − others. We represent the PSpair associated with the remaining linear function for pattern i by ι i .Removing these unnecessary rows and columns, we reformulate M x = y as ˜ M ˜ x = ˜ y , where ˜ M ∈ R K × K and ˜ x , ˜ y ∈ R K and K ≤ I is the number of patterns i ∈ [ I ] with Ξ i ( , ) ≥ ν ℓ ( i ) .With m the number of critical pairs/pools, for k ∈ [ m ] , let M k = w j k ,i (1) (1 + λ ℓ ( i (1)) µ i (1) ) . . . ...1 w j k ,i ( m k ) (1 + λ ℓ ( i ( m k )) µ i ( m k ) ) w j k ,i ∗ k (1 + λ ℓ ( i ∗ k ) µ i ∗ k ) w j k ,i ( m k +2) (1 + λ ℓ ( i ( m k +2)) µ i ( m k +2) ) -1... . . . w j k ,i ( m k ) (1 + λ ℓ ( i ( mk )) µ i ( mk ) ) -1 , (58)where j k represents the k th critical pool with i ∗ k representing the pattern associated with theonly critical pair of critical pool j k , and all the rows in M k are associated with patterns i ( n ) ( n ∈ [ m k ] ) requiring resource pool j k (that is, j k ∈ J i ( n ) for all n ∈ [ m k ] ), of which the first m k ones are fully activated; that is, ι i ( n ) ∈ I ( o, ν , ) for n ∈ [ m k ] . 
If | J i | = 1 for all i ∈ [ I ] ,then there exists m ∈ N , m k ∈ N + , k ∈ [ m ] , so that with appropriately re-ordered columns, ˜ M has the block diagonal form diag ( M k : k ∈ [ m + 1]) .The last matrix M m +1 is an identity matrix, of which the rows are related to fully activatedPS pairs ι ∈ I ( o, ν , ) that do not require any of the critical resource pools.We firstly explain the correspondence of rows and columns between M k ( k ∈ [ m ] ) and ˜ M .As just described, the first m k rows of M k are associated with non-critical, activated PS pairs ι ∈ I ( o, ν , ) , the ( m k + 1) th row corresponds to the critical pair of critical pool j k , and the remaining rows correspond to passive pairs. From the definition of the matrix M described ina) and b), in the special case with | J i | = 1 for all i ∈ [ I ] , there are in total two non-zeroentries in the row for a non-critical, activated PS pair ι ∈ I ( o, ν , ) : one equal to one andlocated in the main diagonal of both M and ˜ M ; and the other in the column associated withthe only resource pool required by this PS pair. Also, from the definition described in b) andc), the rows for passive PS pairs ι ∈ I ( o, ν , ) have the same structure except the diagonalentries are equal to − . By re-ordering the rows and columns of ˜ M , we can position the rowsrelated to critical pool j k next to each other, so that the first m k rows are related to non-critical,activated PS pairs, the ( m k + 1) th row is associated with the only critical PS pair of this criticalpool, and the remaining rows stand for the passive ones.Recall that the rows of the last matrix M m +1 are related to fully activated PS pairs ι ∈ I ( o, ν , ) that do not require any of the critical resource pools. Similar to the rows of non-critical, activated PS pairs in the matrix M k for k ∈ [ m ] , the diagonal entries for these rows arealways one. Since these PS pairs require no critical resource pool, all the other entries in theserows are zero. 
Accordingly, the matrix M m +1 can be constructed as an identity matrix of size m k +1 := K − P k ∈ [ m ] m k , which is possibly zero.For the more general case with | J i | > for some i ∈ [ I ] , in a similar way, ˜ M is a blockupper triangular matrix with M k ( k ∈ [ m ] ) as just defined as its diagonal. When the system isweakly coupled, there is at most one critical pool j k with j k = j ι i ( o, ν , ) for pattern i , whichis associated with a row of M k ( k ∈ [ m ] ) corresponding to row ¯ i of matrix ˜ M , and at mostone other critical resource j k ′ , k = k ′ , satisfying j k ′ ∈ J i .If there are such i, k, ¯ i and k ′ , we perform an elementary row operation on [ ˜ M| ˜ y ] : replacingrow ¯ i by the difference between row ¯ i and row ¯ i ′ where row ¯ i ′ of ˜ M corresponds to the onlycritical pair associated with critical pool j k ′ . The corresponding γ j k for row ¯ i (associated withpattern i ) is, then, γ j k = w j k ′ ,i w j k ,i (cid:16) y σ ( ι i ) w j k ′ ,i (1 + λ ℓ ιi /µ i ) − y σ ( ι i ′ ) w j k ′ ,i ′ (1 + λ ℓ ιi ′ /µ i ′ ) (cid:17) , (59)where i ′ is the pattern corresponding to row ¯ i ′ . Note that, if the system is weakly coupled, row j k ′ is the only type-2 row for columns i and i ′ (of matrix W ) with w j k ′ ,i > and w j k ′ ,i ′ > ; thatis, resource pool j k ′ is the only resource pool for patterns i and i ′ that is shared with multiplepatterns. Let M represent the set of subscripts k ∈ [ m ] of all critical pools j k of critical pairs ι i ∗ k , for which there exists another critical pool j k ′ satisfying j k , j k ′ ∈ J i ∗ k .After this operation, we remove row and column ¯ i from ˜ M and the ¯ i th elements from ˜ x and ˜ y correspondingly. With some abuse of notation, we refer to the resulting matrix and vectorsas ˜ M , ˜ x and ˜ y . As a consequence, when all such rows and columns are removed from ˜ M ,we write the remaining matrix as ˜ M = diag ( M k : k ∈ [ m + 1] \ M ) . 
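The row operation behind (59) can be sanity-checked on a toy instance. The sketch below is an illustration under assumed numbers, not part of the proof: pattern $i$ uses both critical pools $j_k$ and $j_{k'}$, pattern $i'$ is the critical pattern of pool $j_{k'}$, and the two corresponding linear equations are solved by substitution. The function name, the argument names and the values are hypothetical; `a_i` stands for $1 + \lambda_{\ell(i)}/\mu_i$.

```python
from fractions import Fraction


def eliminate_two_pools(w_k_i, w_kp_i, w_kp_ip, a_i, a_ip, y_i, y_ip):
    """Solve the 2x2 system

        w_k_i * a_i * g_k + w_kp_i * a_i * g_kp = y_i     (row of pattern i)
        w_kp_ip * a_ip * g_kp                   = y_ip    (row of pattern i')

    for the multipliers g_k (pool j_k) and g_kp (pool j_k').
    """
    w_k_i, w_kp_i, w_kp_ip, a_i, a_ip, y_i, y_ip = map(
        Fraction, (w_k_i, w_kp_i, w_kp_ip, a_i, a_ip, y_i, y_ip)
    )
    # The second row involves only g_kp.
    g_kp = y_ip / (w_kp_ip * a_ip)
    # Substituting into the first row gives the expression of the form (59).
    g_k = (w_kp_i / w_k_i) * (y_i / (w_kp_i * a_i) - g_kp)
    return g_k, g_kp


g_k, g_kp = eliminate_two_pools(2, 1, 3, Fraction(3, 2), 2, 6, 3)
# Residuals of the two original equations; both should be exactly zero.
residual_i = 2 * Fraction(3, 2) * g_k + 1 * Fraction(3, 2) * g_kp - 6
residual_ip = 3 * 2 * g_kp - 3
```

Exact rational arithmetic is used so that the residual check is not affected by floating-point rounding.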
Now, for each M k ( k ∈ [ m ] \ M ) there are m k rows associated with PS pairs ι ∈ I ( o, ν , ) , one row associatedwith the critical pair for critical pool j k and m k := m k − m k − rows for PS pairs ι ∈ I ( o, ν , ) that are passive because of capacity constraints.We obtain the expression | ˜ M| = Y k ∈ [ m ] \ M |M k | = Y k ∈ [ m ] \ M (cid:20) w j k ,i ∗ k (cid:16) λ ℓ ( i ∗ k ) µ i ∗ k + 1 (cid:17) ( − m k (cid:21) , (60)for the determinant of the square matrix ˜ M .Let ˜ M ¯ i , ¯ i ∈ [ P k ∈ [ m ] \ M m k ] , be the matrix after replacing matrix ˜ M ’s column ¯ i by ˜ y . For acolumn ¯ i of matrix ˜ M and pattern i that is associated with row ¯ i of ˜ M , there exists a unique k ∈ [ m ] \ M , such that (cid:12)(cid:12)(cid:12) ˜ M ¯ i (cid:12)(cid:12)(cid:12) = Y k ′ ∈ [ m ] \ M k ′ = k |M k ′ | × (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) y σ ( ι i ) w j k ,i (cid:16) λ ℓ ( i ) µ i + 1 (cid:17) y σ ( ι i ∗ k ) w j k ,i ∗ k (cid:18) λ ℓ ( i ∗ k ) µ i ∗ k + 1 (cid:19) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − m k , if ι i ∈ I ( o, ν , ) , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) w j k ,i ∗ k (cid:18) λ ℓ ( i ∗ k ) µ i ∗ k + 1 (cid:19) y σ ( ι i ∗ k ) w j k ,i (cid:16) λ ℓ ( i ) µ i + 1 (cid:17) y σ ( ι i ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − m k − , if ι i ∈ I ( o, ν , ) ,y σ ( ι i ) ( − m k , otherwise . (61)By Cramer’s Rule, we conclude that, for ι ∈ [ N − I + L ] \ N ,i) if ι ∈ I ( o, ν , ) and ∃ k ∈ [ m ] with j k ∈ J i ι , x σ ( ι ) = w j k ,i ι (cid:0) ℓ ( i ι ) µ i ι (cid:1) y σ ( ι ) w j k ,i ι (cid:0) λ ℓ ( iι ) µ iι (cid:1) − y σ ( ι i ∗ k ) w j k ,i ∗ k (cid:0) λ ℓ ( i ∗ k ) µ i ∗ k (cid:1) ! 
; (62)ii) if ι ∈ I ( o, ν , ) and ∄ k ∈ [ m ] with j k ∈ J i ι , x σ ( ι ) = y σ ( ι ) ; (63) iii) if ι ∈ I ( o, ν , ) and ∃ k, k ′ ∈ [ m ] with k < k ′ and j k , j k ′ ∈ J i ι , x σ ( ι ) = w j k ′ ,i w j k ,i (cid:16) y σ ( ι i ) w j k ′ ,i (cid:0) λ ℓιi µ i (cid:1) − y σ ( ι i ∗ k ′ ) w j k ′ ,i ∗ k ′ (cid:0) λ ℓ ( i ∗ k ′ ) µ i ∗ k ′ (cid:1) (cid:17) ; (64)iv) if ι ∈ I ( o, ν , ) and ∃ k ∈ [ m ] with j k ∈ J i ι , x σ ( ι ) = w j k ,i ι (cid:0) ℓ ( i ι ) µ i ι (cid:1) y σ ( ι i ∗ k ) w j k ,i ∗ k (cid:0) λ ℓ ( i ∗ k ) µ i ∗ k (cid:1) − y σ ( ι ) w j k ,i ι (cid:0) λ ℓ ( iι ) µ iι (cid:1) ! ; (65)v) otherwise, x σ ( ι ) = y σ ( ι ) w j k ,i ι (cid:0) λ ℓ ( iι ) µ iι (cid:1) . (66)Recall that σ ( ι ) represents the position of PS pair ι ∈ [ N − I + L ] \ N among all pairs inthe set [ N − I + L ] \ N according to the ranking o ∈ O ( ν , , β ) , and that the PS-pair ranking o is established in the descending order of β i ι (Ξ i ι ( , ) − ν ℓ ι ) ( ι ∈ [ N ] ). From (62)-(66), x hasa non-negative solution if β i = 1 w j,i (1 + λ ℓ ( i ) µ i ) , (67)where j is the only resource pool in J i that is shared with multiple patterns, if there is one; orany element of the set arg min j ′ ∈ J i C j ′ /w j ′ ,i , otherwise. In this case, o can be any ranking in O ( ν , , β ) .The resulting values of γγγ , as defined in b), are given by (64) and (66). The lemma is thenproved. Proof of Proposition 4.
The proposition can be derived from Lemma 4 by fine-tuning the values of $\nu \in \mathbb{R}^L$ such that the relaxed action constraints described in (9) are satisfied by the policy $\phi(o, \nu, \,)$. □

APPENDIX H
TWO EXAMPLES OF ACTIVATING SUB-PROCESSES
In their asymptotic optimality proof, [Weber and Weiss, 1990] allowed $h$ sub-processes to be active simultaneously in the RMABP with scaling parameter $h$. The specific birth-and-death form of our bandit processes allows us to do something different. We activate exactly one bandit process for each request type and accelerate the birth rate of active bandit processes by a factor of $h$, with the birth rates of the passive ones remaining zero.

Fig. 5. Markov chains for cases with proportional and fixed numbers of active sub-processes: (a) our model; and (b) the model of [Weber and Weiss, 1990].

That is, as described in (32) in Section V-A, we activate exactly one sub-process $(i, k)$ ($i \in \mathscr{P}_\ell$, $k \in [h]$) for RT $\ell \in [L]$ regardless of the scaling parameter $h \in \mathbb{N}_+$. The birth and death rates of this active sub-process are $h\lambda_\ell$ and $N^{\varphi}_{i,k}(t)\mu_i$, respectively.

For instance, consider a simple system with only one type of request and one non-dummy pattern available for it, so that $|\mathscr{P}_1| = 2$. We label this pattern as pattern 1, of which the state space is $\mathscr{N}_1 = \{0, 1\}$ with $w_{j,1} = 1$ for all $j \in [J]$. For a trivial policy that always prioritizes the only non-dummy pattern or its sub-patterns if the capacity constraints are not violated, we illustrate in Figure 5 the underlying Markov chains for our model and that of [Weber and Weiss, 1990] when $h = 2$. For the latter, the two sub-patterns of the non-dummy pattern will be activated simultaneously all the time with the same Markov chain as shown in Figure 5(b). For the former, its Markov chain is illustrated in Figure 5(a), where the first sub-pattern is activated with birth rate $2\lambda_1$, while the second sub-pattern is passive with zero birth rate. This has consequences for our proof of asymptotic optimality, and further discussion about the effects of fixed and proportional numbers of active sub-patterns with $h \to +\infty$ is provided in conjunction with Proposition 6.
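The difference between the two activation schemes can be made concrete with a small numerical sketch. It assumes the single-pattern example above with a two-state sub-process (states 0 and 1), and it assumes that in the scheme of [Weber and Weiss, 1990] each of the $h$ active sub-processes keeps the unscaled birth rate $\lambda$. The function names and the numerical values are illustrative only and are not taken from the paper.

```python
def birth_death_stationary(birth, death):
    """Stationary distribution of a finite birth-and-death chain.

    birth[n] is the rate of the move n -> n+1 and death[n] is the rate
    of the move n+1 -> n, so both lists have one entry per edge.
    """
    weights = [1.0]
    for b, d in zip(birth, death):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]


def mean_state(pi):
    """Expected state under the distribution pi."""
    return sum(n * p for n, p in enumerate(pi))


def occupancy_single_active(h, lam, mu):
    # Assumed reading of scheme (a): one active sub-process with birth
    # rate h*lam; the other h-1 sub-processes have zero birth rate and
    # therefore sit in state 0 in steady state.
    return mean_state(birth_death_stationary([h * lam], [mu]))


def occupancy_all_active(h, lam, mu):
    # Assumed reading of scheme (b): h active sub-processes, each with
    # the unscaled birth rate lam.
    return h * mean_state(birth_death_stationary([lam], [mu]))
```

With $h = 2$ and $\lambda = \mu = 1$, the first function returns $2/3$ and the second returns $1$, so under these assumptions the two schemes give different stationary occupancies even in this smallest example.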
APPENDIX I
PROOF OF THEOREM 1

For given $o \in \mathscr{O}$, the index policy $\phi$ (described by (37) and (39)), based on ranking $o$, achieves the same long-run average revenue as $\bar{\phi}(o)$ (generated by Algorithm 1).

Let $q^{1,h}(\iota, \iota')$ and $q^{0,h}(\iota, \iota')$, $\iota, \iota' \in [N]$, represent the transition rates of a sub-process in PS pair $\iota$ transitioning to $\iota'$ with $i_{\iota'} = i_\iota$ and $n_{\iota'} = n_\iota \pm 1$, if it is active and passive, respectively. For notational convenience, define $q^{1,h}(\iota, \iota') = q^{0,h}(\iota, \iota') = 0$ if $i_{\iota'} \neq i_\iota$ or $n_{\iota'} \neq n_\iota \pm 1$, $\iota, \iota' \in [N]$. Consider a deterministic process $z^{\phi,h}(t) = (z^{\phi,h}_\iota(t) : \iota \in [N])$ with a given initial point $z^{\phi,h}(0) \in \mathscr{Z}$ under the index policy $\phi$ (described in Section V-B), generated by the differential equation

$$
\frac{d z^{\phi,h}_\iota(t)}{dt} = \sum_{\substack{\iota' \in [N] \\ \iota' \neq \iota}} \Big[ z^{\phi,h}_{\iota'}(t) \Big( \upsilon^{\phi,h}_{\iota'}\big(z^{\phi,h}(t)\big)\, q^{1,h}(\iota', \iota) + \big(1 - \upsilon^{\phi,h}_{\iota'}\big(z^{\phi,h}(t)\big)\big)\, q^{0,h}(\iota', \iota) \Big) - z^{\phi,h}_{\iota}(t) \Big( \upsilon^{\phi,h}_{\iota}\big(z^{\phi,h}(t)\big)\, q^{1,h}(\iota, \iota') + \big(1 - \upsilon^{\phi,h}_{\iota}\big(z^{\phi,h}(t)\big)\big)\, q^{0,h}(\iota, \iota') \Big) \Big], \tag{68}
$$

where the right-hand side is the sum of transition rates entering PS pair $\iota$ minus the sum of transition rates leaving PS pair $\iota$.

Proposition 5.
1) For any $h \in \mathbb{N}_+$, $\epsilon^h \in \mathscr{E}^h$ and $z^{\phi,h}(0) \in \mathscr{Z}$, there exists a unique solution $z^{\phi,h}(t)$, $t \geq 0$, of (68).
2) For any $\psi \in \Psi$ and $\delta > 0$, if $Z^{\phi,h}(0) = z^{\phi,h}(0) \in \mathscr{Z}$,

$$
\lim_{h \to +\infty} \lim_{t \to +\infty} \frac{1}{t} \int_0^t P\Big\{ \big\| Z^{\phi,h}(u) - z^{\phi,h}(u) \big\| > \delta \Big\}\, du = 0, \tag{69}
$$

where $\|\cdot\|$ is the Euclidean norm.

Recall that the index policy $\phi$, described by (37) and (39), is $\epsilon^h$-dependent. The proof is given in Appendix J. It follows similar ideas and methods to those of [Weber and Weiss, 1990] by invoking Picard's existence theorem and [Freidlin and Wentzell, 2012, Chapter 7, Theorem 2.1].

We define a special point

$$
z^{\bar{\phi}(o),h} = \lim_{t \to +\infty} E\big[ Z^{\bar{\phi}(o),h}(t) \big], \tag{70}
$$

which exists since, with given $h \in \mathbb{N}_+$, the underlying Markov chain of the process $N^{\bar{\phi}(o)}_h(t)$ is irreducible with finitely many states and the process $Z^{\bar{\phi}(o),h}(t)$ is a function of the process $N^{\bar{\phi}(o)}_h(t)$ as defined in Section V-A.

The key to our proof of Theorem 1 is the following result.

Lemma 5.
The limit $z^{\bar{\phi}(o)} = \lim_{h \to +\infty} z^{\bar{\phi}(o),h}$ exists, and for any $z^{\phi,h}(0) \in \mathscr{Z}$,

$$
\lim_{\|\epsilon\| \to 0} \lim_{h \to +\infty} \lim_{t \to +\infty} z^{\phi,h}(t) = z^{\bar{\phi}(o)}. \tag{71}
$$

The proof is given in Appendix K.

Remark
A condition similar to Lemma 5, trapping the process $z^{\phi,h}(t)$ in the neighborhood of $z^{\bar{\phi}(o)}$, was an important assumption in [Weber and Weiss, 1990]: there exists a unique equilibrium point of the constructed deterministic process with any initial point. In our problem, the underlying Markov chain for each sub-process is a birth-and-death process with state-dependent service rates that are monotonically non-increasing in their priorities. This simplifies the analysis of $z^{\phi,h}(t)$ in the asymptotic regime.

Intuitively, under an index policy, the process related to the pattern with the highest priority will keep occupying or releasing resource units until it reaches a stable point: either a capacity constraint is about to be violated or the arrival rate of the related RT is balanced by the total service rate. Then, the process with the second-highest priority will behave similarly: it cannot occupy any resource units already occupied by the first process but can take resource units from other processes with lower priorities. The final equilibrium point can be obtained by calculating these stable points in turn, moving from the pattern with the highest priority to the lowest one. A detailed analysis is provided in the proof of Lemma 5 in Appendix K.

Proposition 6.
For any $\sigma > 0$,

$$
\lim_{\|\epsilon\| \to 0} \lim_{h \to +\infty} \lim_{t \to +\infty} \frac{1}{t} \int_0^t P\Big\{ \big\| Z^{\phi,h}(u) - z^{\bar{\phi}(o)} \big\| > \sigma \Big\}\, du = 0. \tag{72}
$$

Proof.
Proposition 5 shows that, for any $\epsilon$, the stochastic process $Z^{\phi,h}(t)$ will not leave a neighborhood of $z^{\phi,h}(t)$ for any substantially positive proportion of time when $h \to +\infty$. Lemma 5 shows that the deterministic process $z^{\phi,h}(t)$ will not leave a neighborhood of $z^{\bar{\phi}(o)}$ for any substantially positive proportion of time when $\|\epsilon\| \to 0$ and $h \to +\infty$. This proposition is a consequence of Proposition 5 and Lemma 5. □

Remark
Recall the long-run average revenue normalized by $h$ of the resource allocation problem under policy $\varphi$, $R^{\varphi,h}$, as defined in Section V-C. For given $o \in \mathscr{O}$, if policy $\bar{\phi}(o) \in \tilde{\Phi}$ maximizes the long-run average revenue of the relaxed problem for a given $h$, then $R^{\bar{\phi}(o),h} \geq \max_{\varphi \in \Phi} R^{\varphi,h} \geq R^{\phi,h}$. From Lemma 5 and Proposition 6, since $R^{\phi,h}$ approaches $R^{\bar{\phi}(o),h}$ as $\|\epsilon\| \to 0$ and $h \to +\infty$, the index policy $\phi$ is asymptotically optimal for the original problem.

Proof of Theorem 1.
If (41) holds for $\bar{\phi}(o)$, then, from Proposition 6, $R^{\phi,h}$ approaches $\max_{\varphi \in \Phi} R^{\varphi,h}$ as $\|\epsilon\| \to 0$ and $h \to +\infty$. If $R^{\phi,h}$ asymptotically approaches $\max_{\varphi \in \Phi} R^{\varphi,h}$ as $\|\epsilon\| \to 0$ and $h \to +\infty$, then, from Proposition 6, $R^{\bar{\phi}(o),h}$ also approaches $\max_{\varphi \in \Phi} R^{\varphi,h}$ as $h \to +\infty$; that is, (41) is satisfied. □

APPENDIX J
PROOF OF PROPOSITION 5

Proof of Proposition 5.
This proof follows the ideas and methods of [Weber and Weiss, 1990] and [Freidlin and Wentzell, 2012].

We first construct a stochastic process that matches the hypothesis of [Freidlin and Wentzell, 2012, Chapter 7, Theorem 2.1].

Let $t^h_{\ell,s}$ be the time of the $s$th arrival of a request of type $\ell \in [L]$ in a system with scaling parameter $h$. For convenience, define a one-to-one mapping $\theta : \big([I] \setminus \{d(\ell) : \ell \in [L]\}\big) \times [h] \to [h(I - L)]$, so that each non-dummy sub-pattern or sub-process $(i, k)$, $i \in [I] \setminus \{d(\ell) : \ell \in [L]\}$, $k \in [h]$, can be labeled by an integer $\theta = \theta(i, k) \in [h(I - L)]$. Let $(i_\theta, k_\theta)$ represent the sub-pattern or sub-process labeled by $\theta \in [h(I - L)]$, and let $t^h_{L+\theta,s}(n)$ be the time of the $s$th potential departure of a completed request with sub-process $(i_\theta, k_\theta)$ in state $n \in \mathscr{N}_{i_\theta}$, $s \in \mathbb{N}_+$. In particular, let the $h$ sub-processes for the same pattern be labeled successively: the mapping $\theta$ satisfies, for any $i \in [I] \setminus \{d(\ell) : \ell \in [L]\}$ and $k \in [h - 1]$, $\theta(i, k + 1) = \theta(i, k) + 1$.

The time intervals between successive arrivals and potential departures of requests are independently and exponentially distributed random variables. The word potential is used because, for each pattern/sub-pattern, a potential departure is regarded as a real departure if there is at least one request being served by this pattern; otherwise, it is ignored.

Let $\tau(t)$ represent the occurrence time of the latest event, either arrival or departure, before time $t$.
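The event construction above can be sketched as a small simulation of a single sub-process. The sketch assumes one potential-departure stream per state $n$ with rate $n\mu$, where an event of stream $n$ counts as a real departure only while the sub-process is in state $n$. It also assumes that arrivals are blocked at a hypothetical largest state `n_max`. The function name, the blocking rule and all parameter values are illustrative and are not part of the paper.

```python
import random


def simulate_sub_process(lam, mu, n_max, horizon, seed=0):
    """Simulate arrivals and potential departures for one sub-process.

    Returns the final state and counts of admitted/blocked arrivals and
    of real/ignored potential departures up to time `horizon`.
    """
    rng = random.Random(seed)
    # Stream 0 carries arrivals; stream n >= 1 carries the potential
    # departures attached to state n, with rate n * mu.
    rates = [lam] + [n * mu for n in range(1, n_max + 1)]
    total = sum(rates)
    t, state = 0.0, 0
    counts = {"admitted": 0, "blocked": 0, "real": 0, "ignored": 0}
    while True:
        # Superposition of independent Poisson streams: the time to the
        # next event of any stream is exponential with the total rate.
        t += rng.expovariate(total)
        if t > horizon:
            break
        stream = rng.choices(range(n_max + 1), weights=rates)[0]
        if stream == 0:
            if state < n_max:
                state += 1
                counts["admitted"] += 1
            else:
                counts["blocked"] += 1
        elif stream == state:
            # The potential departure matches the current state: real.
            state -= 1
            counts["real"] += 1
        else:
            # The potential departure belongs to another state: ignored.
            counts["ignored"] += 1
    return state, counts
```

Because the sub-process starts empty, the number of admitted arrivals minus the number of real departures always equals the final state, which gives a quick consistency check on any run.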
Define a random vector ξ ht = ( ξ hℓ,t : ℓ ∈ [ L ] , ξ hL + θ,t ( n ) : θ ∈ [ h ( I − L )] , n ∈ N i θ ) asfollows: for ℓ ∈ [ L ] , ξ hℓ,t = t hℓ,s ∗ − τ ( t hℓ,s ∗ ) , if τ ( t hℓ,s ∗ ) ≤ t < t hℓ,s ∗ , where t hℓ,s ∗ = min k ′′ ∈ N + { t hℓ,k ′′ | t hℓ,k ′′ > t } , , otherwise , (73) for θ ∈ [ h ( I − L )] , n ∈ N i θ , ξ hL + θ,t ( n ) = t hθ,s ∗ ( n ) − τ ( t hθ,s ∗ ( n )) , if τ ( t hθ,s ∗ ( n )) ≤ t < t hθ,s ∗ ( n ) , where t hθ,s ∗ ( n ) = min k ′′ ∈ N + { t hθ,k ′′ ( n ) | t hθ,k ′′ ( n ) > t } , , otherwise . (74)From (73), at any time t > , there is at most one ℓ ∈ [ L ] for which ξ hℓ,t > and all the othersare zero. Also, for ℓ ∈ [ L ] , if the next event is not an arrival of request type ℓ (but a potentialdeparture or an arrival of other request types), ξ hℓ,t = 0 . If the next event to occur after time t is the s th arrival of type ℓ , then ξ hℓ,t is the reciprocal of the time between the last and the nextevent. Otherwise it is zero. Similar results apply to ξ hL + θ ( n ) ( θ ∈ [ h ( I − L )] , n ∈ N i θ ) definedin (74) associated with potential departures of sub-process ( i θ , k θ ) in state n .Then, for ℓ ∈ [ L ] , the number ⌊ R T ξ hℓ,t dt ⌋ represents the number of arrivals of RT ℓ by time T ; and for θ ∈ [ h ( I − L )] and n ∈ N i θ , the number ⌊ R T ξ hL + θ,t ( n ) dt ⌋ is the number of potentialdepartures associated with sub-pattern ( i θ , k θ ) in its state n by time T , when T is large.We define a function, Q h ( ι, ι ′ , x , ξ h ) , on ι, ι ′ ∈ [ N ] , x ∈ R N , ξ h ∈ R L + h ( N − L ) . 
For z ∈ Z , ι ∈ [ N ] , let a ι ( z ) = υ ϕ,hι ( z ) z ι hI , which, from the definition of υ ϕ,hι ( z ) in (39), takes values in { , } and becomes algebraically independent of h with given z , when h is sufficiently large.For given ξ h ∈ R L + h ( N − L ) and x ∈ R N , Q h ( ι, ι ′ , x , ξ h ) is defined by Q h ( ι, ι ′ , x , ξ h ) = a ι ( x /hI ) ξ hℓ ι + f ,hι,a ( x , ξ h ) , if i ι = i ι ′ , n ι ′ = n ι + 1 , θ ( i ι , ⌈ x − ι ⌉ ) P θ = θ ( i ι , ⌈ x − ι − ⌉ +1) ξ hL + θ ( n ι ) + f hι,a ( x , ξ h ) , if i ι = i ι ′ , n ι ′ = n ι − , , otherwise , (75)where x − ι = P ιι ′ =1 ,i ι ′ = i ι x ι ′ with x − = 0 , and f ,hι,a ( z , ξ h ) and f hι,a ( z , ξ h ) are appropriate functionsto make Q h ( ι, ι ′ , x , ξ h ) Lipschitz continuous in x for all ξ h and < a < , see the discussionbelow. For x ∈ R N \ R N , let ( x ) + = (max { x ι , } : ι ∈ [ N ]) , and define Q h ( ι, ι ′ , x , ξ h ) = Q h ( ι, ι ′ , ( x ) + , ξ h ) .If we set f ,hι,a ( z , ξ h ) and f hι,a ( z , ξ h ) to zero all the time, then Q h ( · , · , x , · ) is a discontinuousfunction of x . The idea to make the function smooth in x is to utilize the fact that the Diracdelta function can be considered as a limit of normal distributions of mean . With < a < , define y a ( u ) = R ̺ ( u ) −∞ a √ π e − ( v − a ) /a dv if u ∈ (0 , if u = 01 if u = 1 , (76)where ̺ ( u ) is a function of u ∈ (0 , which is −∞ at , + ∞ at and suitably smooth. It canpossibly be specified as a negative cotangent function, ̺ ( u ) = − cot( uπ ) . The function y a ( u ) iscontinuous at u ∈ (0 , , right-continuous at u = 0 and left continuous at u = 1 . 
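The smoothing function in (76) can be evaluated numerically. The sketch below assumes that the integrand in (76) is the density of a normal distribution with mean $a$ and standard deviation $a$, and it uses the suggested choice $\varrho(u) = -\cot(u\pi)$. The names `rho` and `y_a` are illustrative.

```python
import math


def rho(u):
    """Map the open interval (0, 1) onto the real line: -cot(pi * u)."""
    return -math.cos(math.pi * u) / math.sin(math.pi * u)


def y_a(u, a):
    """Smoothed step on [0, 1].

    Assumes the integrand in (76) is the N(a, a^2) density, so the
    integral up to rho(u) is the normal CDF evaluated at (rho(u) - a) / a.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1)")
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    z = (rho(u) - a) / a
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under these assumptions $y_a$ increases from 0 to 1 on $[0, 1]$, and for small $a$ almost all of the increase is concentrated near $u = 1/2$, where $\varrho(u)$ crosses zero.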
Let ˜ N be the setof PS pairs associated with dummy patterns, and for any ι ∈ ˜ N , define f ,hι, · ( · , · ) = f hι, · ( · , · ) = 0 .For ι ∈ [ N ] , let Γ ι ( z ) := lim h → + ∞ min ( z ι , max (cid:26) , min j ∈ J i w j,i I (cid:16) C j (1 − ǫ hj,ι ) − N X ι ′ =1 w j,i ι ′ n ι ′ z ι ′ I − X ι ′ ∈ N + ι w j,i ι ′ υ ϕ,hι ′ ( z ) z ι ′ I (cid:17)(cid:27)) , (77)where, from the definition of υ ϕ,hι ( z ) in (39), the limit exists. Note that Γ ι ( z ) is used fordefinitions of f ,hι,a ( x , ξ h ) and f hι,a ( x , ξ h ) with ι ∈ [ N ] \ ˜ N , and is similar to lim h → + ∞ ζ ϕ,hι ( z ) defined in (37); Γ ι ( z ) is Lipschitz continuous in z ∈ Z and the latter is not. Also, let χ hι ( x ) := P ι ′′ ∈ N + ι , ℓ ι ′′ = ℓ ι Γ ι ′′ ( x /hI ) hI , so that χ hι ( x ) is continuous in x ∈ R N . For x ∈ R N , ι ∈ [ N ] \ ˜ N and h ∈ N + , let ι ′ = max { ι ′′ ∈ N + ι | ℓ ι ′′ = ℓ ι , Γ ι ( x /hI ) > } . We then define f ,hι,a ( x , ξ h ) = ξ hℓ ι y a (1 − χ hι ′ ( x )) , if Γ ι ( x /hI ) > , and < χ hι ′ ( x ) < , − ξ hℓ ι y a (1 − χ hι ( x )) , if χ hι ′ ( x ) = 0 , and < χ hι ( x ) < , , otherwise . (78)For x ∈ R N and ι ∈ [ N ] \ ˜ N with x − ι − > and ⌈ x − ι ⌉ < h , define f hι,a ( x , ξ h ) = − ξ hL + θ (cid:0) i ι , ⌈ x − ι − ⌉ +1 (cid:1) ( n ι ) y a (1 − ⌈ x − ι − ⌉ + x − ι − ) + ξ hL + θ (cid:0) i ι , ⌈ x − ι − ⌉ (cid:1) ( n ι ) y a ( ⌈ x − ι − ⌉ − x − ι − ) − ξ hL + θ (cid:0) i ι , ⌈ x − ι ⌉ (cid:1) ( n ι ) y a ( ⌈ x − ι ⌉ − x − ι ) + ξ hL + θ (cid:0) i ι , ⌈ x − ι ⌉ +1 (cid:1) ( n ι ) y a (1 − ⌈ x − ι ⌉ + x − ι ) . 
(79) For x ∈ R N and ι ∈ [ N ] \ ˜ N with x − ι − = 0 and ⌈ x − ι ⌉ < h , define f hι,a ( x , ξ h ) = − ξ hL + θ (cid:0) i ι , ⌈ x − ι − ⌉ +1 (cid:1) ( n ι ) y a (1 − ⌈ x − ι − ⌉ + x − ι − ) − ξ hL + θ (cid:0) i ι , ⌈ x − ι ⌉ (cid:1) ( n ι ) y a ( ⌈ x − ι ⌉ − x − ι ) + ξ hL + θ (cid:0) i ι , ⌈ x − ι ⌉ +1 (cid:1) ( n ι ) y a (1 − ⌈ x − ι ⌉ + x − ι ) , (80)For x ∈ R N and ι ∈ [ N ] \ ˜ N with x − ι − > and ⌈ x − ι ⌉ = h , define f hι,a ( x , ξ h ) = − ξ hL + θ (cid:0) i ι , ⌈ x − ι − ⌉ +1 (cid:1) ( n ι ) y a (1 − ⌈ x − ι − ⌉ + x − ι − )+ ξ hL + θ (cid:0) i ι , ⌈ x − l − ⌉ (cid:1) ( n ι ) y a ( ⌈ x − l − ⌉ − x − l − ) − ξ hL + θ (cid:0) i ι , ⌈ x − ι ⌉ (cid:1) ( n ι ) y a ( ⌈ x − ι ⌉ − x − ι ) (81)For x ∈ R N and ι ∈ [ N ] \ ˜ N with x − ι − = 0 and ⌈ x − ι ⌉ = h , define f hι,a ( x , ξ h ) = − ξ hL + ⌈ x − ι − ⌉ +1 ( n ι ) y a (1 − ⌈ x − ι − ⌉ + x − ι − ) − ξ hL + ⌈ x − ι ⌉ ( n ι ) y a ( ⌈ x − ι ⌉ − x − ι ) . (82)With these f ,hι,a ( z , ξ h ) and f hι,a ( z , ξ h ) , the function Q h ( ι, ι ′ , x , ξ h ) is Lipschitz continuous in x ∈ R N for all given ξ h ∈ R L + h ( N − L ) and < a < .For the special case with h = 1 , any given < a < , and σ > , we define X σt such that ˙ X σt := b ( X σt , ξ t/σ ) := X ι ′ ∈ [ N ] Q ( ι ′ , ι, X σt , ξ t/σ ) − Q ( ι, ι ′ , X σt , ξ t/σ ) . (83)It follows that b ( X σt , ξ t/σ ) satisfies a Lipschitz condition over X σt and ξ t/σ .Also, from the definition of the function Q h in (75), for x ∈ R N , t ≥ and h = 1 , thereexists a matrix ˜ Q ( x ) , such that b ( x , ξ t ) = ˜ Q ( x ) ξ t . 
For any x ∈ R N , h = 1 , δ > and anyfunction b ( x ) ∈ R N of x ∈ R N , we obtain P (cid:26)(cid:13)(cid:13)(cid:13) T Z t + Tt b ( x , ξ v ) dv − b ( x ) (cid:13)(cid:13)(cid:13) > δ (cid:27) ≤ P (cid:26)(cid:13)(cid:13)(cid:13) ˜ Q ( x ) 1 T (cid:4)Z t + Tt ξ v dv (cid:5) − b ( x ) (cid:13)(cid:13)(cid:13) + 1 T (cid:13)(cid:13)(cid:13) ˜ Q ( x ) (cid:16)Z t + Tt ξ v dv − (cid:4)Z t + Tt ξ v dv (cid:5)(cid:17)(cid:13)(cid:13)(cid:13) > δ (cid:27) ≤ P (cid:26)(cid:13)(cid:13)(cid:13) ˜ Q ( x ) 1 T (cid:4)Z t + Tt ξ v dv (cid:5) − b ( x ) (cid:13)(cid:13)(cid:13) + o ( T ) T > δ (cid:27) , (84)where ⌊ ξ ⌋ for a vector ξ takes the floor operation for each of its element. Recall that, for ℓ ∈ [ L ] ,the ℓ th element of the vector ⌊ R t + Tt ξ v dv ⌋ is a Poisson distributed random variable with rate λ ℓ ,representing the number of arrivals of requests of type ℓ ; and, for θ ∈ [ I − L ] and n ∈ N i θ , its θ + L th element is a Poisson distributed random variable with rate nµ i θ , representing the numberof potential departures of requests for a sub-process in state n of pattern i θ . Thus, by the Lawof Large Numbers, for any x ∈ R N , h = 1 , δ > , there exists b ( x ) = E [ b ( x , ξ t )] , which isindependent from t , satisfying lim T → + ∞ P (cid:26)(cid:13)(cid:13)(cid:13) T Z t + Tt b ( x , ξ v ) d v − b ( x ) (cid:13)(cid:13)(cid:13) > δ (cid:27) = 0 , (85)uniformly in t > .Let x ( t ) be the solution of ˙ x ( t ) = b ( x ( t )) , (86)with given x = X σ . By Picard’s Existence Theorem ( [Coddington and Levinson, 1955]), thereexists a unique solution x t , t ≥ , satisfying (86) with given x .Now we invoke [Freidlin and Wentzell, 2012, Chapter 7, Theorem 2.1]: if (85) holds true,and E k b ( x , ξ t ) k < + ∞ for all x ∈ R N , then, for any T > , δ > , lim σ → P (cid:26) sup ≤ t ≤ T k X σt − x ( t ) k > δ (cid:27) = 0 . (87)We interpret the scalar σ and the scaling effects in another way. 
For x ∈ R N and ξ h ∈ R L + h ( N − L ) , we define b h ( x , ξ h ) := X ι ′ ∈ [ N ] Q h ( ι ′ , ι, x , ξ h ) − Q h ( ι, ι ′ , x , ξ h ) . (88)Following the technique of [Fu et al., 2016], we set σ = 1 /h , and observe that, for any x ∈ R N , h > and T > , R T b ( x , ξ t/σ ) dt and R T ( b h ( h x , ξ ht ) /h ) dt are identically distributed. With Z σ = Z h = x /I , we define ˙ Z ht := hI b h ( hI Z ht , ξ ht ) and ˙ Z σt := I b ( I Z σt , ξ t/σ ) . From (87), forany T > and δ > , we obtain lim h → + ∞ P (cid:26) sup ≤ t ≤ T (cid:13)(cid:13) Z ht − x ( t ) /I (cid:13)(cid:13) > δ (cid:27) = 0 . (89)Effectively then, scaling time by σ = h is equivalent to scaling the system size by h .Note that ˙ Z ht and ˙ x ( t ) are dependent on the parameter a ∈ (0 , through functions f ,hι,a ( x , ξ h ) and f hι,a ( x , ξ h ) that are used in definition (75). Equation (89) holds for any given < a < .Because of the Lipschitz behavior of ˙ Z ht and ˙ x ( t ) on < a < , lim a ↓ d ˙ Z ht /da = 0 and lim a ↓ d ˙ x ( t ) /da = 0 , equation (89) holds in the limit as a → . Also, if Z h = Z ϕ,h (0) , and x (0) /I = z ϕ,h (0) , then lim a ↓ x ( t ) /I = lim h → + ∞ z ϕ,h ( t ) and lim h → + ∞ lim a ↓ Z ht = lim h → + ∞ Z ϕ,h ( t ) .For any h ∈ N + , the existence of z ϕ,h ( t ) , t ≥ , satisfying (68) with given z ϕ,h (0) can beproved along similar lines as x ( t ) by introducing a function y a ( u ) and then invoking Picard’sExistence Theorem for initial value problems.Recall that Z ϕ,h ( t ) is the vector of proportions of sub-processes under policy ϕ at time t and z ϕ,h ( t ) is given by (68) (as defined in Section V-C). Then, for any T > and δ > , lim h → + ∞ P (cid:26) sup ≤ t ≤ T (cid:13)(cid:13) Z ϕ,h ( t ) − z ϕ,h ( t ) (cid:13)(cid:13) > δ (cid:27) = 0 , (90)which leads to (69). This proves the proposition. (cid:3) A PPENDIX KP ROOF OF L EMMA z ¯ ϕ ( o ) . Lemma 6.
The limit z ¯ ϕ ( o ) = lim h → + ∞ z ¯ ϕ ( o ) ,h existsProof. Recall that the policy ¯ ϕ ( o ) is generated by Algorithm 1 in Section III-B, where o is a given PS pair ranking. Note that, as mentioned in Section V-A, the optimization problemconsisting of the hI sub-processes associated with hI sub-patterns, coupled through constraintsdescribed in (32)-(34) can be analyzed and relaxed along the same lines as in Section III.Algorithm 1 can also be applied directly to the problem scaled by h . To clarify, for the scaledproblem described by (31)-(34), the action and capacity constraints can be relaxed to lim t → + ∞ E X i ∈ P ℓ X k ∈ [ h ] a φi,k ( N φh ( t )) = 1 , ∀ ℓ ∈ [ L ] , (91)and lim t → + ∞ E X i ∈ [ I ] w j,i h X k ∈ [ h ] (cid:18) N φi,k ( t ) + a φi,k ( N φh ( t )) (cid:19) ≤ C j , ∀ j ∈ [ J ] , (92)which correspond to the relaxed constraints in (9) and (10), respectively. Here, we first recalland rewrite the procedures of generating the policy ¯ ϕ ( o ) (in Algorithm 1): initialize all PS pairsto be passive and then sequentially activate the PS pairs according to their priorities defined by o ∈ O . In particular, for each PS pair ι ∈ [ N ] , the action variables for the h sub-patterns, α ¯ ϕ ( o ) i,k ( n ι ) ( k ∈ [ h ] ), are sequentially activated according to any permutation of , , . . . , h , because all ofthem correspond to the same index Ξ i ι ( γ , ) and the same priority in the PS pair ranking o . Weassume without loss of generality that the α ¯ ϕ ( o ) i ι ,k ( n ι ) are activated in the order of k = 1 , , . . . , h .The activating process continues until either a relaxed action or capacity constraint described in(91) and (92), respectively, achieves equality. 
Also, the mechanism described in (I) and (II) isdirectly applicable to this procedure by replacing (9) and (10) with (91) and (92), respectively.In this context, we still use Algorithm 1 (including the mechanisms described in (I) and (II))to indicate the procedure of generating the policy ¯ ϕ ( o ) in the scaled system.For a given PS pair ranking o , the action variables α ¯ ϕ ( o ) i ι ,k ( n ι ) under the policy ¯ ϕ ( o ) are initializedto zero and activated sequentially from ι = 1 to N (from the PS pair with the highest priorityto the lowest), as described above. For clarity, we write κ ∈ { } ∪ [ N ] to indicate the initialcondition and the N iterations in Algorithm 1, and define a sequence of intermediate policies,referred to as ϕ ( o, κ ) , for which α ϕ ( o, i,k ( · ) = 0 ( i ∈ [ I ] , k ∈ [ h ] ) and α ϕ ( o,N ) i,k ( · ) = α ¯ ϕ ( o ) i,k ( · ) . Inparticular, for the κ th iteration and PS pairs ι ∈ [ N ] , α ϕ ( o,κ ) i ι ,k ( n ι ) = α ¯ ϕ ( o ) i ι ,k ( n ι ) ( k ∈ [ h ] ) if ι ≤ κ ;and α ϕ ( o,κ ) i ι ,k ( n ι ) = 0 if ι > κ .In the scaled system, each sub-process is a birth-and-death Markov process with finitely manystates. For any o , the Markov chain for the sub-process has a stationary distribution whichis limiting as t → + ∞ in the sense that the time-dependent distribution for any initial stateconverges to it. 
For a sub-process ( i, k ) ( i ∈ [ I ] \{ d ( ℓ ) : ℓ ∈ [ L ] } , k ∈ [ h ] ) and κ ∈ [ N ] , thestationary distribution of state n ∈ N i \{ } under policy ϕ ( o, κ ) is π o,κ,hi,k ( n ) = π o,κ,hi,k (0) n Y n ′ =1 α ϕ ( o,κ ) i,k ( n ′ − hλ ℓ ( i ) n ′ µ i , (93)with π o,κ,hi,k (0) the stationary distribution for state normalized by P n ∈ N i π o,κ,hi,k ( n ) = 1 .For κ ∈ [ N ] and h ∈ N + , define the expected sum of the action variables with respect to thestationary distribution (93) associated with pattern i ∈ [ I ] S o,hA ( i, κ ) := X n ∈ N i X k ∈ [ h ] π o,κ,hi,k ( n ) α ϕ ( o,κ ) i,k ( n ) (94)and, for j ∈ [ J ] , the expected sum of occupied capacities S o,hC ( i, κ, j ) := X n ∈ N i h X k ∈ [ h ] π o,κ,hi,k ( n ) w j,i n, (95) under policy ϕ ( o, κ ) .For κ = 1 and the action variables for PS pairs ι > initialized to zero, all the actionvariables α ϕ ( o,κ ) i κ ,k ( n κ ) ( k ∈ [ h ] ) remain constant if no constraint in (91) and (92) is violated as h → + ∞ ; or some of them decrease in h for sufficiently large h if a constraint in (91) or (92)achieves equality. Slightly abusing notation, when κ ∈ [ N ] , we write i κ , n κ and ℓ κ to indicate i ι , n ι and ℓ κ with ι = κ , respectively. In the former case, the κ th element of z ¯ ϕ ( o ) , z ¯ ϕ ( o ) κ =lim h → + ∞ lim t → + ∞ E [ Z ϕ ( o,κ ) ,hκ ( t )] = lim h → + ∞ h P k ∈ [ h ] π o,h,κi κ ,k ( n κ ) , lim h → + ∞ S o,hA ( i, κ ) and lim h → + ∞ S o,hC ( i, κ, j ) ( i ∈ [ I ] , j ∈ [ J ] ) exist. 
In the latter case, for any H < + ∞ , there exists h > H such that an equalityis achieved in (91) or (92); that is, S o,hA ( i κ , κ ) = X k ∈ [ h ] π o,κ,hi κ ,k ( n κ ) α ϕ ( o,κ ) ,hi κ ,k ( n κ ) = 1 , (96)or, there is a j ∈ [ J ] such that S o,hC ( i κ , κ, j ) = w j,i κ h X k ∈ [ h ] (cid:16) π o,κ,hi κ ,k ( n κ ) n κ + π o,κ,hi κ ,k ( n κ + 1) (cid:0) n κ + 1 (cid:1)(cid:17) = C j − o ( h ) h , (97)where π o,κ,hi κ ,k ( · ) is the stationary distribution of sub-pattern ( i κ , k ) that is a solution of (93) withgiven α ϕ ( o,κ ) i κ ,k ( · ) . In particular, the o ( h ) on the right hand side of (97) corresponds to the secondterm on the left hand side of (92), lim t → + ∞ E hX i ∈ [ I ] w j,i X k ∈ [ h ] a ϕ ( o,κ ) i,k (cid:0) N ϕ ( o,κ ) h ( t ) (cid:1)i , (98)which is equal to R hκ ( j ) := X i ∈ [ I ] w j,i X k ∈ [ h ] X n ∈ N i π o,κ,hi,k ( n ) α ϕ ( o,κ ) i,k ( n ) (99)and is bounded for any h ∈ N + ∪ { + ∞} because of the relaxed action constraints described in(91). Definition 7.
We say that the κ th iteration of Algorithm 1 is saturated , if, for any H < + ∞ ,there exists h > H such that equality is achieved in (91) or (92) . We now show the existence of z ¯ ϕ ( o ) κ = lim h → + ∞ h P k ∈ [ h ] π o,h,κi κ ,k ( n κ ) , lim h → + ∞ S o,hA ( i, κ ) and lim h → + ∞ S o,hC ( i, κ, j ) ( i ∈ [ I ] , j ∈ [ J ] ) in the saturated case.Because the action variables for the same PS pair are sequentially activated from k = , , . . . , h , there exists a k ∗ κ ( h ) ∈ [ h ] such that, for all k < k ∗ κ ( h ) , α ϕ ( o,κ ) i κ ,k ( n κ ) = 1 ; for k = k ∗ κ ( h ) , α ϕ ( o,κ ) i κ ,k ( n κ ) ∈ (0 , ; and, for k > k ∗ κ ( h ) , α ϕ ( o,κ ) i κ ,k ( n κ ) = 0 . Let π o,κ,hi κ ,k ( · ) = π o,κ,h, + i κ ( · ) for k < k ∗ κ ( h ) − ; and π o,κ,h, − i κ ( · ) for k > k ∗ κ ( h ) . For clarity, for given h and κ , since the actionvariables α ϕ ( o,κ ) ,hi κ ,k ( n κ ) are uniquely determined by the value ρ hκ := (cid:16) k ∗ κ ( h ) + α ϕ ( o,κ ) ,hi κ ,k ∗ κ ( h ) ( n κ ) (cid:17) /h, (100)we refer to ρ hκ ∈ [0 , as the decision in the κ th iteration of Algorithm 1.For ρ ∈ [0 , , let π o,κ,hρ ( n ) be the solution of equations: for n ∈ N i κ , if n ≤ n κ , π o,κ,hρ ( n ) = π o,κ,hρ (0) n Y n ′ =1 hλ ℓ κ n ′ µ i κ ; (101)if n = n κ + 1 , π o,κ,hρ ( n ) = π o,κ,hρ (0) ( ρh − ⌊ ρh ⌋ ) hλ ℓ κ nµ i κ n − Y n ′ =1 hλ ℓ κ n ′ µ i κ ; (102)if n > n κ + 1 , π o,κ,hρ ( n ) = 0; (103)and X n ′ ∈ N iκ π o,κ,hρ ( n ′ ) = 1 . (104)The π o,κ,hρ ( · ) represents the stationary distribution of a sub-process with respect to pattern i κ under a policy determined by ρ ∈ [0 , in the κ th iteration of Algorithm 1. We refer to thispolicy as ϕ ρ ( o, κ ) . 
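The stationary distribution defined by (101)-(104) can be evaluated directly. The following is a minimal sketch, assuming a finite state space $\{0, \ldots, n_{\max}\}$ for the pattern. The function name and the parameter values in the usage note are illustrative and are not taken from the paper.

```python
import math


def stationary_partial_activation(rho, h, lam, mu, n_kappa, n_max):
    """Stationary distribution following (101)-(104).

    States 0..n_kappa are reached with the full birth rate h*lam, state
    n_kappa + 1 is reached with that rate scaled by the fractional part
    of rho*h, and all higher states have zero probability.
    """
    frac = rho * h - math.floor(rho * h)
    weights = [1.0]  # unnormalized weight of state 0
    for n in range(1, n_max + 1):
        if n <= n_kappa:
            step = h * lam / (n * mu)          # factor in (101)
        elif n == n_kappa + 1:
            step = frac * h * lam / (n * mu)   # factor in (102)
        else:
            step = 0.0                         # (103)
        weights.append(weights[-1] * step)
    total = sum(weights)                       # normalization (104)
    return [w / total for w in weights]
```

For example, with $\rho = 0.25$, $h = 2$, $\lambda = \mu = 1$, $n_\kappa = 1$ and $n_{\max} = 3$, the unnormalized weights are $1, 2, 1, 0$, so the distribution is $(0.25, 0.5, 0.25, 0)$.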
In particular, if ρ = ρ hκ , π o,κ,hρ ( · ) = π o,κ,hi κ ,k ∗ κ ( h ) ( · ) .Next we define, for ρ ∈ [0 , , f o,κ,hC ( ρ, j ) := w j,i κ n κ +1 X n =0 (cid:18) ρπ o,κ,h, + i κ ,k ( n ) + 1 h π o,κ,hρ ( n ) (cid:19) n (105)which is the expected number of RUs of resource pool j occupied by sub-patterns with respectto pattern i κ under policy ϕ ρ ( o, κ ) . Also, if ρ = ρ hκ , f o,κ,hC ( ρ, j ) = S o,hC ( i κ , κ, j ) . Since, forgiven ρ ∈ [0 , , the action variables under policy ϕ ρ ( o, κ ) remain constant for any h ∈ N + , bysolving the stationary distribution in (101)-(104), we can show the existence of lim h → + ∞ f o,κ,hC ( ρ, j ) ( j ∈ [ J ] ). Similarly, define f o,κ,hA ( ρ ) := n κ X n =0 (cid:18) ⌊ ρh ⌋ π o,κ,h, + i κ ( n ) + π o,κ,hρ ( n ) n
H < + ∞ , there is h > H such that a constraintin (91) or (92) achieves equality on the κ th iteration; that is, either S o,hA ( i κ , κ ) = X n ∈ N iι X k ∈ [ h ] π o,κ,hi κ ,k ( n ) α ϕ ( o,κ ) ,hi κ ,k ( n ) = 1 − X i ∈ P ℓκ i = i κ X n ∈ N i X k ∈ [ h ] π o,κ,hi,k ( n ) α ϕ ( o,κ ) ,hi,k ( n )= 1 − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − , (111)or, for some j ∈ [ J ] , S o,hC ( i κ , κ, j ) = X n ∈ N iκ w j,i κ h X k ∈ [ h ] π o,κ,hi κ ,k ( n ) n = C j − X i ∈ [ I ] i = i κ X n ∈ N i w j,i h X k ∈ [ h ] π o,κ,hi,k ( n ) n − R hκ h = C j − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) − o ( h ) h . (112)Along similar lines as discussed for Cases < i > and < ii > for κ = 1 , in the saturated casewith respect to the κ th iteration, either lim h → + ∞ f o,κ,hA (1) ≥ − lim h → + ∞ (cid:20) X i ∈ P ℓκ i = i κ S hA ( i, κ − (cid:21) , (113)or, for some j ∈ [ J ] , lim h → + ∞ f o,κ,hC (1 , j ) ≥ C j − lim h → + ∞ (cid:20)X i ∈ [ I ] i = i κ S hC ( i, κ − , j ) (cid:21) . (114)By slightly abusing notation, we also rewrite J κ as the subset of [ J ] , where, for any j ∈ J κ (114) is satisfied; and rewrite J κ , the subset of J κ , such that, for any j ∈ J κ , (114) achievesequality.If (113) holds with strict inequality, or J κ = ∅ and J κ = J κ , then there exists H < + ∞ such that for all h > H , (111) or (112) holds for a j ∈ J κ ; by taking limit for both side of(111) or (112), lim h → + ∞ S o,hA ( i κ , κ ) or lim h → + ∞ S o,hC ( i κ , κ, j ) exists. 
This proves the existence of lim h → + ∞ ρ hκ = lim h → + ∞ ¯ f o,κ,hA (cid:16) S o,hA ( i κ , κ ) (cid:17) = lim h → + ∞ ¯ f o,κ,hC (cid:16) S o,hC ( i κ , κ, j ) (cid:17) = ¯ f o,κ,hA (cid:16) lim h → + ∞ S o,hA ( i κ , κ ) (cid:17) = ¯ f o,κ,hC (cid:16) lim h → + ∞ S o,hC ( i κ , κ, j ) (cid:17) , (115) where the equality holds in the second line of (115) because of the Lipschitz continuity of theinverse functions, ¯ f o,κ,hA and ¯ f o,κ,hC , of f o,κ,hA and f o,κ,hC , respectively.If (113) holds with equality or J κ = J κ = ∅ , then let ∆ κA ( h ) = 1 − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − − f o,κ,hA (1) , (116)and, for j ∈ J κ , ∆ κC ( h, j ) = C j − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) − f o,κ,hC (1 , j ) , (117)and so lim h → + ∞ ∆ κA ( h ) = lim h → + ∞ ∆ κC ( h, j ) = 0 .Similar to the discussion for (109), because of the strict monotonicity of the functions f o,κ,hA ( ρ ) and f o,κ,hC ( ρ, j ) ( j ∈ [ J ] ) ( h ∈ N + ) ρ κh takes the minimum value such that at least one equalityin S o,hA ( i κ , κ ) = f o,κ,hA ( ρ hκ ) ≤ min ( − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − − ∆ κA ( h ) , − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − ) , (118)and, for j ∈ J κ , S o,hC ( i κ , κ, j ) = f o,κ,hC ( ρ hκ , j ) ≤ min ( C j − R hκ ( j ) h − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) − ∆ κC ( h, j ) , C j − R hκ ( j ) h − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) ) (119) holds; that is, ρ hκ = min ((cid:26) ¯ f o,κ,hA (cid:18) − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − − ∆ κA ( h ) (cid:19) , ¯ f o,κ,hA (cid:18) − X i ∈ P ℓκ i = i κ S o,hA ( i, κ − (cid:19)(cid:27) ∪ [ j ∈ J κ (cid:26) ¯ f o,κ,hC (cid:16) C j − R hκ ( j ) h − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) − ∆ κC ( h, j ) , j (cid:17) , ¯ f o,κ,hC (cid:16) C j − R hκ ( j ) h − X i ∈ [ I ] i = i κ S o,hC ( i, κ − , j ) , j (cid:17)(cid:27)) . 
(120)Because of the Lipschitz continuity of the functions ¯ f o,κ,hA and ¯ f o,κ,hC , the limits of all the bracketedarguments on the right hand side of Equation (120) exist and so lim h → + ∞ ρ hκ exists. In summary,for the κ th iteration with κ > , lim h → + ∞ ρ hκ always exists.Similar to the analysis for the case with κ = 1 , it follows that z ¯ ϕ ( o ) κ = lim h → + ∞ (cid:18) X n ∈ N iκ (cid:16) ρ hκ π o,κ,h, + i κ ( n ) + (cid:16) − ρ hκ (cid:17) π o,κ,h, − i κ ( n ) (cid:17) + o ( h ) h (cid:19) (121)exists. Also, the Lipschitz continuity of f o,κ,hA and f o,κ,hC and the existence of lim h → + ∞ ρ hκ lead to theexistence of lim h → + ∞ S o,hA ( i κ , κ ) = lim h → + ∞ f o,κ,hA ( ρ hκ ) and lim h → + ∞ S o,hC ( i κ , κ, j ) = lim h → + ∞ f o,κ,hC ( ρ hκ , j ) ( j ∈ [ J ] ).For i = i κ , lim h → + ∞ S o,hA ( i, κ ) and lim h → + ∞ S o,hC ( i, κ, j ) ( j ∈ [ J ] ) exist because S o,hA ( i, κ ) = S o,hA ( i, κ − and S o,hC ( i, κ, j ) = S o,hC ( i, κ − , j ) ( j ∈ [ J ] ).In summary, z ¯ ϕ ( o ) κ , lim h → + ∞ S o,hA ( i, κ ) and lim h → + ∞ S o,hA ( i, κ, j ) ( i ∈ [ I ] , j ∈ [ J ] ) still exist for κ > . The proof is completed by iteratively showing the existence of z ¯ ϕ ( o ) κ , lim h → + ∞ S o,hA ( i, κ ) and lim h → + ∞ S o,hA ( i, κ, j ) ( i ∈ [ I ] , j ∈ [ J ] ) from κ = 1 to N . (cid:3) Proof of Lemma 5.
We now consider (71). We will show that, when $h$ is sufficiently large, the value of $z^{\varphi,h}_\iota(t)$ as $t\to+\infty$ is independent of those of PS pairs $\iota'$ with $\iota' > \iota$ (PS pairs with lower priorities) at any time.

For $\iota\in N_i$, $\iota'\in N_{i'}$, $i\in P_\ell$, $i'\in P_{\ell'}$, and $\ell,\ell'\in[L]$, the rates at which an active and passive sub-process in PS pair $\iota$ transition to PS pair $\iota'$ are
$$
q^{1,h}(\iota,\iota') =
\begin{cases}
h\lambda_\ell, & \text{if } i = i',\ n_{\iota'} = n_\iota + 1,\\
n_\iota\mu_i, & \text{if } i = i',\ n_{\iota'} = n_\iota - 1,\\
0, & \text{otherwise},
\end{cases}
\tag{122}
$$
and
$$
q^{0,h}(\iota,\iota') =
\begin{cases}
n_\iota\mu_i, & \text{if } i = i',\ n_{\iota'} = n_\iota - 1,\\
0, & \text{otherwise},
\end{cases}
\tag{123}
$$
respectively. By substituting for $q^{1,h}(\cdot,\cdot)$ and $q^{0,h}(\cdot,\cdot)$ in (68), we obtain
$$
\frac{d z^{\varphi,h}_\iota(t)}{dt} =
\begin{cases}
\bigl( h\lambda_\ell\,\upsilon^{\varphi,h}_{\iota^-}(z^{\varphi,h}(t))\, z^{\varphi,h}_{\iota^-}(t) + n_{\iota^+}\mu_i\, z^{\varphi,h}_{\iota^+}(t) - h\lambda_\ell\,\upsilon^{\varphi,h}_\iota(z^{\varphi,h}(t))\, z^{\varphi,h}_\iota(t) - n_\iota\mu_i\, z^{\varphi,h}_\iota(t) \bigr)\, Ih, & \text{if } 0 < n_\iota < |N_i| - 1,\\
-h\lambda_\ell\,\upsilon^{\varphi,h}_\iota(z^{\varphi,h}(t))\, z^{\varphi,h}_\iota(t)\, Ih, & \text{if } n_\iota = 0,\\
-n_\iota\mu_i\, z^{\varphi,h}_\iota(t)\, Ih, & \text{otherwise},
\end{cases}
\tag{124}
$$
where $\ell = \ell_\iota$, $i = i_\iota$, and $\iota^-$ and $\iota^+$ are the PS pairs with $i_{\iota^-} = i_{\iota^+} = i$ and satisfying $n_{\iota^-} = n_\iota - 1$ and $n_{\iota^+} = n_\iota + 1$, respectively.

We define the proportion of RUs of resource pool $j$ occupied by pattern $i$ related to process $z^{\varphi,h}(t)$ to be
$$
\zeta_{j,i}(z^{\varphi,h}(t)) = \frac{w_{j,i}\, I}{C_j} \sum_{\substack{\iota'\in[N]\\ i_{\iota'} = i}} n_{\iota'}\, z^{\varphi,h}_{\iota'}(t).
\tag{125}
$$

Consider the PS pair $\iota$ with the highest priority among all pairs in $[N]$. This must have $n_\iota = 0$, because pair $(i,0)$ has the highest priority among all pairs $(i,n)$, $n\in N_i$, for any $i\in[I]$.
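The occupancy proportion in (125) is a weighted sum over the PS pairs of one pattern, and a small numerical sketch may make the indexing concrete. The code below assumes that (125) reads $\zeta_{j,i}(z) = (w_{j,i} I / C_j)\sum_{\iota': i_{\iota'} = i} n_{\iota'} z_{\iota'}$; the function name, the list-based encoding of PS pairs, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def occupancy_proportion(
    j: int,
    i: int,
    z: Sequence[float],
    pattern_of: Sequence[int],
    count_of: Sequence[int],
    w: Sequence[Sequence[float]],
    capacity: Sequence[float],
) -> float:
    """Proportion of pool j occupied by pattern i, under the assumed reading of (125).

    z[k]          : proportion of sub-processes in PS pair k
    pattern_of[k] : pattern index i of PS pair k
    count_of[k]   : state n of PS pair k
    w[j][i]       : RUs of pool j used by one request served with pattern i
    capacity[j]   : capacity C_j of pool j
    """
    num_patterns = len(w[j])  # I, the number of patterns
    load = sum(
        n * z_val
        for z_val, p, n in zip(z, pattern_of, count_of)
        if p == i
    )
    return w[j][i] * num_patterns * load / capacity[j]


# Hypothetical toy instance: 2 patterns, states n in {0, 1, 2}, one pool.
z = [0.1, 0.2, 0.2, 0.25, 0.25, 0.0]
pattern_of = [0, 0, 0, 1, 1, 1]
count_of = [0, 1, 2, 0, 1, 2]
w = [[1.0, 2.0]]
capacity = [4.0]

zeta_00 = occupancy_proportion(0, 0, z, pattern_of, count_of, w, capacity)
zeta_01 = occupancy_proportion(0, 1, z, pattern_of, count_of, w, capacity)
```

In this toy instance the two patterns together occupy a proportion of roughly 0.55 of the pool, so the capacity condition in (127) would not yet bind.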
From (124),
(i) if $\lambda_{\ell_\iota} > u_\iota(z^{\varphi,h}(t_0))$, where
$$
u_\iota(z) := n_{\iota^+}\mu_i z_{\iota^+}, \quad t_0 \ge 0,
\tag{126}
$$
then $z^{\varphi,h}_{\iota^+}(t)$ ($t\ge t_0$) will increase and $z^{\varphi,h}_\iota(t)$ will decrease, until $z^{\varphi,h}_\iota(t)$ becomes zero or a capacity constraint is about to be violated; that is, there exists a $j\in[J]$,
$$
\sum_{i'\in[I]} \zeta_{j,i'}(z^{\varphi,h}(t)) \big/ (1-\epsilon^h_{j,\iota}) = 1;
\tag{127}
$$
(ii) if $\lambda_{\ell_\iota} = u_\iota(z^{\varphi,h}(t_0))$ and $z^{\varphi,h}_{\iota'}(t_0) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$, then $z^{\varphi,h}_\iota(t) = z^{\varphi,h}_\iota(t_0)$ for any $t\ge t_0$;
(iii) if $u_\iota(z^{\varphi,h}(t_0)) \ge \lambda_{\ell_\iota}$ and $n_\iota = 0$, then $z^{\varphi,h}_{\iota'}(t)$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$ will decrease until $\lambda_{\ell_\iota} = u_\iota(z^{\varphi,h}(t))$ and $z^{\varphi,h}_{\iota'}(t) = 0$ for all such $\iota'$ (invoking Case (ii)).

In all of the cases (i)-(iii), for the PS pair $\iota$ with the highest priority, we consider a process starting from time $t_0$ and finishing at time $t = t_1$, at which either (127) holds or $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$. In other words, this process ends in Case (i) or (ii). We refer to the process as the I-process and time $t_1$ as its stopping time. Note that the I-process may continue indefinitely with $t_1 = +\infty$.

Define a stopping pair of pattern $i$ at time $t$, denoted by $p_i(t)$, as the PS pair $\iota$ satisfying $z^{\varphi,h}_\iota(t) > 0$ and $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota$ with $i_{\iota'} = i_\iota$.

If the I-process ends in Case (i), we then consider another process starting from its stopping time $t_1$.
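The stopping pair defined above is the lowest-priority PS pair of a pattern that still carries positive mass. A minimal sketch of that lookup follows, assuming PS pairs are stored in priority order (index 0 is the highest priority); the function name, the encoding, and the tolerance parameter are hypothetical.

```python
from typing import Optional, Sequence


def stopping_pair(
    i: int,
    z: Sequence[float],
    pattern_of: Sequence[int],
    tol: float = 0.0,
) -> Optional[int]:
    """Return the index of the stopping pair of pattern i, or None.

    The stopping pair is the PS pair iota of pattern i with z[iota] > tol
    such that every lower-priority pair (larger index) of the same pattern
    has z <= tol. Indices are assumed to be sorted by priority.
    """
    positive = [
        idx
        for idx, (p, val) in enumerate(zip(pattern_of, z))
        if p == i and val > tol
    ]
    return max(positive) if positive else None


# Hypothetical example with two interleaved patterns.
z = [0.1, 0.2, 0.0, 0.3, 0.4, 0.0]
pattern_of = [0, 1, 0, 1, 0, 1]
p0 = stopping_pair(0, z, pattern_of)  # pairs 0, 2, 4 belong to pattern 0
p1 = stopping_pair(1, z, pattern_of)  # pairs 1, 3, 5 belong to pattern 1
```

A small positive `tol` may be preferable when `z` comes from a numerical ODE solver, since exact zeros are then unlikely.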
If (127) holds for some $t\ge t_1$ and $j\in[J]$, $\lambda_{\ell_\iota} > u_\iota(z^{\varphi,h}(t))$ and $\zeta_{j,i'}(z^{\varphi,h}(t)) > 0$ for some $i'\in[I]$ with $w_{j,i'} > 0$ and $p_{i'}(t) > \iota$ (that is, resource pool $j$ is partially occupied by another pattern $i'$ of which the stopping pair $p_{i'}(t)$ has less priority than pair $\iota$), then
(iv) $z^{\varphi,h}_{\iota^+}(t)$ and $u_\iota(z^{\varphi,h}(t))$ ($t > t_1$) will increase until $\lambda_{\ell_\iota} = u_\iota(z^{\varphi,h}(t))$ (invoking Case (iii));
(v) $z^{\varphi,h}_{\iota^+}(t)$ and $u_\iota(z^{\varphi,h}(t))$ ($t > t_1$) will increase until $z^{\varphi,h}_\iota(t) = 0$ (stopping pair $p_{i_\iota}(t)$ for pattern $i_\iota$ is no longer $\iota$); or
(vi) $z^{\varphi,h}_{\iota^+}(t)$ and $u_\iota(z^{\varphi,h}(t))$ ($t > t_1$) will increase until $\zeta_{j,i'}(z^{\varphi,h}(t)) = 0$ for all $i'$ with $w_{j,i'} > 0$ and $p_{i'}(t) > \iota$, and $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$.

We refer to such a process starting from time $t_1$ until any of the above three conditions is satisfied as the II-process, and refer to its stopping time as $t_2$. The II-process may continue indefinitely with $t_2 = +\infty$. The II-process describes the case where the RUs occupied by pattern $i'$ can be taken by pattern $i_\iota$ if the stopping pair of $i_\iota$ (that is, PS pair $\iota$) has higher priority than that of $i'$ and $\lambda_{\ell_\iota} > u_\iota(z^{\varphi,h}(t))$.

We now generalize the above discussions to a PS pair $\iota\in[N]$ with $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' < \iota$; that is, PS pair $\iota$ is the one with the highest priority among $\iota'\in[N]$ with $z^{\varphi,h}_{\iota'}(t) > 0$. We can go through a similar argument by generalizing the definition of $u_\iota(z^{\varphi,h}(t))$ so that
$$
u_\iota(z^{\varphi,h}(t)) := \sum_{\substack{\iota'\le\iota\\ \ell_{\iota'} = \ell}} n_{\iota'}\mu_{i_{\iota'}}\, z^{\varphi,h}_{\iota'}(t) + n_{\iota^+}\mu_i\, z^{\varphi,h}_{\iota^+}(t).
\tag{128}
$$
If $n_\iota > 0$, the description of the I-process is completed by adding
(vii) if $u_\iota(z^{\varphi,h}(t_0)) \ge \lambda_{\ell_\iota}$ and $n_\iota > 0$, then $z^{\varphi,h}_{\iota'}(t)$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$ will decrease until
(vii).1 $\lambda_{\ell_\iota} = u_\iota(z^{\varphi,h}(t))$ and $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$ (invoking Case (ii)); or
(vii).2 $\lambda_{\ell_\iota} > u_{\iota^-}(z^{\varphi,h}(t))$ and $z^{\varphi,h}_{\iota^-}(t) > 0$, so the stopping pair $p_{i_\iota}(t)$ of pattern $i_\iota$ becomes $\iota^-$.

If the I-process for PS pair $\iota$ ends in Case (vii).2 at time $t_1$, since $\iota^-$ always has higher priority than $\iota$, $\iota^-$ will be considered as the PS pair with the highest priority and $z^{\varphi,h}_{\iota^-}(t_1) > 0$ at time $t_1$. In this case, because $\lambda_{\ell_\iota} > u_{\iota^-}(z^{\varphi,h}(t_1))$ is guaranteed in Case (vii).2, the value $z^{\varphi,h}_{\iota^-}(t)$ for $t\ge t_1$ will be considered in Cases (iii) or (vii). Iteratively, once $z^{\varphi,h}_\iota(t)$ goes into Case (vii), there will be an $\iota'\le\iota$ with $i_{\iota'} = i_\iota$, such that $z^{\varphi,h}_{\iota'}(t)$ ends in Case (ii); or continues indefinitely in Case (iii) or (vii).

We first consider the situation with $t_1, t_2 < +\infty$. From the above description of the I- and II-processes, the process $z^{\varphi,h}_\iota(t)$ ($t\ge t_0$) will enter one of two regimes:
• $\lambda_{\ell_\iota} = u_\iota(z^{\varphi,h}(t))$ and $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$; or
• $\lambda_{\ell_\iota} > u_\iota(z^{\varphi,h}(t))$, Equation (127) holds, there exists a $j\in[J]$ such that $\zeta_{j,i'}(z^{\varphi,h}(t)) = 0$ for all $i'$ with $w_{j,i'} > 0$ and $p_{i'}(t) > \iota$, and $z^{\varphi,h}_{\iota'}(t) = 0$ for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$.

Intuitively, in the first regime, the arrival rate of RT $\ell_\iota$ is balanced by the total service rate supported by all PS pairs with higher priorities than $\iota^+$ and PS pair $\iota^+$; and, in the second regime, the capacity constraints are about to be violated, which helps balance the arrival rate by forcing the sub-processes to be passive (with zero arrival rate) when a resource pool is fully occupied.

From the differential equation stated in (124), the value of $z^{\varphi,h}_\iota(t)$ will not change once it enters either of the two regimes.
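The generalized service rate in (128) and the balance condition of the first regime can be illustrated numerically. The sketch below assumes that the sum in (128) runs over PS pairs $\iota'\le\iota$ whose request type equals $\ell_\iota$, and that $\iota^+$ has lower priority (a larger index) than $\iota$ so that it is not counted twice; the function names, the list encoding, and the numbers are hypothetical.

```python
from typing import Sequence


def service_rate(
    iota: int,
    z: Sequence[float],
    pattern_of: Sequence[int],
    count_of: Sequence[int],
    type_of_pattern: Sequence[int],
    mu: Sequence[float],
) -> float:
    """u_iota(z) under the assumed reading of (128).

    PS pairs are indexed in priority order (0 is the highest priority).
    """
    i = pattern_of[iota]
    ell = type_of_pattern[i]
    # Service rate contributed by pairs of the same request type with
    # priority at least that of iota.
    total = sum(
        count_of[k] * mu[pattern_of[k]] * z[k]
        for k in range(iota + 1)
        if type_of_pattern[pattern_of[k]] == ell
    )
    # Add the pair iota+ (same pattern, one more request in service), if any.
    for k in range(iota + 1, len(z)):
        if pattern_of[k] == i and count_of[k] == count_of[iota] + 1:
            total += count_of[k] * mu[i] * z[k]
            break
    return total


def is_balanced(arrival_rate: float, u: float, tol: float = 1e-9) -> bool:
    """First stability regime: arrival rate equals the supported service rate."""
    return abs(arrival_rate - u) <= tol


# Hypothetical single-pattern example with states n in {0, 1, 2}.
z = [0.2, 0.5, 0.3]
pattern_of = [0, 0, 0]
count_of = [0, 1, 2]
type_of_pattern = [0]
mu = [0.5]

u0 = service_rate(0, z, pattern_of, count_of, type_of_pattern, mu)
u1 = service_rate(1, z, pattern_of, count_of, type_of_pattern, mu)
```

With these numbers, an arrival rate of 0.55 would be balanced at pair 1 but would exceed the rate supported at pair 0, which is the situation of Case (i).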
We refer to them as the stability regimes, and say the value of $z^{\varphi,h}_\iota(t)$ for the stopping pair $\iota$ of pattern $i_\iota$ at time $t$ becomes stable if $z^{\varphi,h}_\iota(t) = z^{\varphi,h}_\iota(t')$ for any $t'\ge t$.

If the I- or II-process continues indefinitely, then it will continue indefinitely in one of Cases (i), (iii)-(vii). We recall the $\upsilon^{\varphi,h}_{\iota'}(z^{\varphi,h}(t))$ defined in (39), representing the proportion of activated sub-processes in PS pair $\iota'\in[N]$. From its definition, if $z^{\varphi,h}_\iota(t) > 0$, then, for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$, $\upsilon^{\varphi,h}_{\iota'}(z^{\varphi,h}(t)) = 0$. If the process $z^{\varphi,h}_\iota(t)$ ($\iota\in[N]$) continues indefinitely for time period $t\in[T,+\infty)$ in any of Cases (i), (iii)-(vii), then, for all $\iota' > \iota^+$ with $i_{\iota'} = i_\iota$ and $t\in[T,\infty)$, $\upsilon^{\varphi,h}_{\iota'}(z^{\varphi,h}(t)) = 0$. Thus, for these $\iota'$, $\lim_{t\to+\infty} z^{\varphi,h}_{\iota'}(t) = 0$. In other words, $\lim_{t\to+\infty}\bigl(z^{\varphi,h}_\iota(t) + z^{\varphi,h}_{\iota^+}(t)\bigr) = 1$: all the sub-processes of pattern $i_\iota$ are transitioning between two states $n_\iota$ and $n_{\iota^+}$ indefinitely. In this case, from the differential equation stated in (124), $\lim_{t\to+\infty} z^{\varphi,h}_\iota(t)$ exists, and is a point that satisfies one of the stability regimes.

Accordingly, the stable value of $z^{\varphi,h}_\iota(t)$ is independent of the values of $z^{\varphi,h}_{\iota'}(t')$ for any $t'\ge 0$ and $\iota' > \iota$ (PS pairs with lower priorities). Note that if $\iota$ becomes stable without delivering the stopping pair role to $\iota' > \iota$ with $i_{\iota'} = i_\iota$, then $\iota'$ is removed from future consideration, because the value $z^{\varphi,h}_{\iota'}(t)$ remains constant in the future.

From the definition of the stability regimes, for any $\epsilon\in E^{+\infty}$ and $z^{\varphi,h}(0)\in Z$, there exists $H\in\mathbb{N}_+$ such that, for all $h > H$, the stable values of $z^{\varphi,h}(t)$ (that is, $\lim_{t\to+\infty} z^{\varphi,h}(t)$) can be iteratively calculated from PS pair $\iota = 1$ to $N$, which are independent of $h$. Thus, $z^{\varphi} := \lim_{h\to+\infty}\lim_{t\to+\infty} z^{\varphi,h}(t)$ exists with any initial point in $Z$.
Note that the existence of positive elements of $\epsilon$ is used to prioritize PS pairs in the asymptotic regime, which is crucial for the stable value of $z^{\varphi,h}_\iota(t)$ to be independent of those of other stopping pairs with lower priorities than $\iota$.

Recall that $z^{\varphi}$ is continuous in $\epsilon\in E^{+\infty}$ and is a bounded vector in the probability simplex $Z$. For the PS pair $\iota = 1$, $z^{\varphi}_\iota$ is continuous and increasing in $\epsilon_{j,\iota}\in(0,1)$ ($j\in J_{i_\iota}$), so $\lim_{\|\epsilon\|\to 0} z^{\varphi}_\iota$ exists. If $\lim_{\|\epsilon\|\to 0} z^{\varphi}_{\iota'}$ exists for any $\iota' < \iota$, then, since $z^{\varphi}_\iota$ is continuous and increasing in $\epsilon_{j,\iota}\in(0,1)$ ($j\in J_{i_\iota}$), $\lim_{\|\epsilon\|\to 0} z^{\varphi}_\iota$ also exists. Accordingly, $\lim_{\|\epsilon\|\to 0} z^{\varphi}$ exists, for which the calculating procedure is the same as calculating $z^{\bar\varphi(o)}$. The lemma has been proved. $\square$

APPENDIX L
PROOF OF THEOREM

Proof.
If the capacity constraints described in (33) (or equivalently (5)) are decomposable with decomposable values $\boldsymbol{\gamma}\in\mathbb{R}^J$ of the multipliers in the asymptotic regime, then there exist $\nu\in\mathbb{R}^L$ and a PS-pair ranking $o\in O(\boldsymbol{\gamma},\nu)$ such that the complementary slackness is satisfied by the policy $\bar\varphi(o)$ and the multipliers $\boldsymbol{\gamma}$. The policy $\bar\varphi(o)$ is optimal for the relaxed problem. Together with Theorem 1, the index policy $\varphi$ derived from the same ranking $o$ is also optimal in the asymptotic regime. The theorem is proved. $\square$

TABLE I
CAPACITIES AND COST RATES FOR RUS OF DIFFERENT POOLS
Columns: Resource Pool ($j$), Capacity ($C_j$), Cost Rate ($\varepsilon_j$). [Table entries are not recoverable from the extracted text.]

APPENDIX M
PROOF OF COROLLARY

Proof.
If the system is weakly coupled, then, from (62)-(66) in Appendix G, the ranking of PS pairs $o$ following the descending order of $\Xi^*_\iota$ defined in (26) for $\iota\in[N]$ leads to an optimal policy $\bar\varphi(o)$ of the relaxed problem. Together with Theorem 2, the index policy $\varphi$ derived from such a PS-pair ranking $o$ is asymptotically optimal. $\square$

APPENDIX N
SETTINGS OF SIMULATIONS
A. Settings of Simulations in Figure 3(a)
The simulations whose results are exhibited in Figure 3(a) have four request types and fourteen resource pools, with capacities and cost rates per RU given in Table I.
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 4026. ;
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 3871. ;
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 3731. ;
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 3242. ;
where patterns , , , are dummy patterns for blocking requests. The weight vectors $w_i$ of patterns $i\in[41]\setminus\{\ ,\ ,\ ,\ \}$ are given in Table II, where $e^k\in\{0,1\}^J$, $k\in[J]$, is a vector with all zero entries except its $k$th entry $e^k_k = 1$.

TABLE II
WEIGHT VECTORS OF PATTERNS FOR FOUR DIFFERENT RTS
[Each entry is an integer combination of three unit vectors, for example of the form $e + 3e + 4e$; the pattern and pool indices are not recoverable from the extracted text.]

TABLE III
CAPACITIES AND COST RATES FOR RUS OF DIFFERENT POOLS
Columns: Resource Pool ($j$), Capacity ($C_j$), Cost Rate ($\varepsilon_j$). [Table entries are not recoverable from the extracted text.]

B. Settings of Simulations in Figure 3(b)
The simulations whose results are exhibited in Figure 3(b) have two request types and six resource pools, with capacities and cost rates per RU given in Table III.
• P = { , , }, λ = 1. , µ = µ = 0. , R = 3635. ;
• P = { , , , , }, λ = 1. , µ = µ = µ = µ = 0. , R = 3758. ;
where patterns , are dummy patterns for blocking requests. The weight vectors $w_i$ of patterns $i\in[8]\setminus\{\ ,\ \}$ are given in Table IV.

C. Settings of Simulations in Figure 4
The simulations whose results are exhibited in Figure 4 have three request types and fifteen resource pools, with capacities and cost rates per RU given in Table V.
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 3710. ;
• P = { , , , }, λ = 1. , µ = µ = µ = 0. , R = 3712. ;
• P = { , , ..., , }, λ = 1. , µ = µ = ... = µ = 0. , R = 3821. ;
where patterns , , are dummy patterns for blocking requests. The weight vectors $w_i$ of patterns $i\in[42]\setminus\{\ ,\ ,\ \}$ are given in Table VI.

TABLE IV
WEIGHT VECTORS OF PATTERNS FOR TWO DIFFERENT RTS
[Each entry is an integer combination of three unit vectors, for example of the form $e + 2e + 4e$; the pattern and pool indices are not recoverable from the extracted text.]

TABLE V
CAPACITIES AND COST RATES FOR RUS OF DIFFERENT POOLS
Columns: Resource Pool ($j$), Capacity ($C_j$), Cost Rate ($\varepsilon_j$). [Table entries are not recoverable from the extracted text.]

TABLE VI
WEIGHT VECTORS OF PATTERNS FOR FOUR DIFFERENT RTS
[Each entry is an integer combination of three unit vectors, for example of the form $e + 3e + 2e$; the pattern and pool indices are not recoverable from the extracted text.]
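The weight vectors in the tables of this appendix are written as integer combinations of the unit vectors defined in Section A of this appendix. A minimal sketch of how such a combination expands into a length-$J$ vector is given below; the helper names and the pool indices in the example are hypothetical, since the indices in the tables are not available here, and only the coefficient pattern (1, 3, 4) mirrors the form of the table entries.

```python
from typing import List, Sequence, Tuple


def unit_vector(k: int, num_pools: int) -> List[int]:
    """Unit vector with a single 1 in position k (1-based), zeros elsewhere."""
    if not 1 <= k <= num_pools:
        raise ValueError("k must lie in 1..num_pools")
    return [1 if idx == k - 1 else 0 for idx in range(num_pools)]


def weight_vector(terms: Sequence[Tuple[int, int]], num_pools: int) -> List[int]:
    """Expand a sum of (coefficient, pool index) terms into a weight vector."""
    w = [0] * num_pools
    for coeff, k in terms:
        e_k = unit_vector(k, num_pools)
        w = [w_val + coeff * e_val for w_val, e_val in zip(w, e_k)]
    return w


# Hypothetical pattern using 1 RU of pool 2, 3 RUs of pool 5 and 4 RUs of pool 6.
w_example = weight_vector([(1, 2), (3, 5), (4, 6)], num_pools=6)
```

Entry $j$ of the resulting vector plays the role of $w_{j,i}$, the number of RUs of pool $j$ that a request served with pattern $i$ occupies.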