Asymptotically Optimal Delay-aware Scheduling in Queueing Systems
Saad Kriouile, Mohamad Assaad, and Maialen Larranaga
TCL Chair on 5G, Laboratoire des Signaux et Systèmes, CentraleSupélec, 91192 Gif sur Yvette, France
ASML, P.O. Box 324, 5500 AH Veldhoven, The Netherlands
Abstract
In this paper, we investigate a general delay-aware channel allocation problem where the number of channels is less than that of users. Due to the proliferation of delay-sensitive applications, the objective of our problem is chosen to be the minimization of the total average backlog queues of the network in question. First, we show that our problem falls in the framework of Restless Bandit Problems (RBP), for which obtaining the optimal solution is known to be out of reach. To circumvent this difficulty, we tackle the problem by adopting a Whittle index approach. To that end, we employ a Lagrangian relaxation of the original problem and prove it to be decomposable into multiple one-dimensional independent subproblems. Afterwards, we provide structural results on the optimal policy of each of the subproblems. More specifically, we prove that a threshold policy is able to achieve the optimal operating point of the considered subproblem. Armed with that, we show the indexability of the subproblems and characterize the Whittle's indices, which are the basis of our proposed heuristic. We then provide a rigorous mathematical proof that our policy is optimal in the infinitely many users regime. Finally, we provide numerical results that showcase the remarkably good performance of our proposed policy and that corroborate the theoretical findings.
I. INTRODUCTION
This paper deals with user and channel scheduling, which has been widely recognized as a means to improve network performance and to meet the service demands of the users. This problem has been widely studied in the past and several allocation policies have been developed for various contexts (e.g. see [2]–[8] and the references therein). In 5G networks, the problem of channel and user scheduling will receive particular interest due to the increase in the number of devices and users. Furthermore, today's applications do not only need high data rates; they are also more delay-sensitive, which implies that minimizing the delay is a main design metric in future networks.

In this paper, we consider the problem of scheduling and channel allocation in a discrete time system composed of one central scheduler serving multiple users or queues. We consider that the traffic arriving to each queue is time varying, and that the number of users is higher than the number of channels, which is a quite realistic assumption given the growing density of users in today's networks. At each time slot, the central scheduler decides to allocate the channels to users, where a channel can be seen as a server in wired networks or a frequency bandwidth in wireless networks. Throughout this paper, we will use the terms "channel" and "server" interchangeably to designate a resource to allocate to users. Furthermore, we assume that the number of channels is limited and each channel can only be allocated to one user at a time. The objective in this case is to find an allocation policy that minimizes the long-run average queue length of the users, as a means to minimize the average delay in the network. Although it is a quite standard scheduling problem, we provide in this paper a rigorous mathematical analysis, leading to a novel scheduling algorithm of which we prove optimality in the many users regime.

In fact, we show in this paper that the considered scheduling problem can be cast as a Restless Bandit Problem (RBP), which is a particular class of Markov Decision Processes (MDP). However, RBPs are PSPACE-Hard (see Papadimitriou et al. [9]), and hence their optimal solution is out of reach. One should therefore propose sub-optimal policies when dealing with such problems. In this paper, we approach the considered RBP using the Lagrangian relaxation technique, which consists of relaxing the constraint on the available resources. In other words, instead of having the constraint on the number of available channels satisfied in every time slot, we consider that it has to be satisfied on average. This allows us to decompose the large relaxed optimization problem into much simpler one-dimensional problems. Based on the optimal solution of the individual relaxed problems, we develop a heuristic for the original (i.e. non-relaxed) optimization problem. This heuristic is known as the Whittle's index policy (WIP) and we will show that, for our particular model, an explicit expression of the Whittle's index can be found. WIP has been proposed as a suboptimal policy for many problems in the literature, see for instance [10], [11]. It has also been shown to perform near optimally in many scenarios and, in the particular case of multiclass M/M/1 queues, WIP, which simplifies to the $c\mu$-rule, is optimal, see Buyukkoc et al. [12] and Larranaga [13]. In this paper, we will prove that the developed WIP is asymptotically optimal in the many users regime. The key contributions of this paper are summarized in the following:

• We provide an analysis of the relaxed optimization problem, which lets us obtain the structure of the optimal solution of its dual problem. The optimal solution is shown to be a threshold-based policy by (i) proving that the latter problem is decomposable and (ii) proving that the value function of the Bellman equation that resolves each individual dual problem satisfies both the R-convexity and increasing properties. This part of the analysis is far from trivial and constitutes one of the main contributions of this paper.
• We resolve the full balance equations verified by the stationary distribution of the user's states under a general threshold policy $n$. This step is crucial and requires a lot of analysis and computations. In fact, unlike previous works where the full balance equations give an easy general recurrence relation between the stationary distribution at state $i$ and state $i+1$ under a threshold policy (e.g. [14]), in our paper the term of the stationary distribution at any given state is linked to a set of terms of the stationary distribution at different states. Moreover, this relation depends on the value of the threshold $n$, as we will see in Section IV.
• We reformulate the individual dual problem of the relaxed problem using the steady state distribution. Afterwards, we provide a general algorithm that allows us to obtain the Whittle index. To reduce the complexity even further, we provide a rigorous proof of the indexability of the classes, along with several lemmas and definitions that allow us to derive simple expressions of the Whittle index. While in previous works the derivation of the Whittle index policy can be obtained using a standard approach, obtaining Whittle index expressions in our case is much more complex and requires several derivations and lemmas.
• Unlike previous works, in this paper we provide further characterization of the threshold-based optimal solution of the relaxed optimization problem. The structure of this solution helps us to prove the local asymptotic optimality of our proposed policy, as we just need to compare the average cost under the Whittle's index policy with the optimal cost of the relaxed problem. The reason behind that is the fact that the latter is never larger than the optimal cost of the original problem.
• We show that the Whittle's index policy is asymptotically optimal in the infinitely many users regime, that is, when the number of users in the system as well as the number of available channels grow large.
• Finally, we provide numerical performance results of the Whittle's index policy that corroborate our claims.

Part of this work has been presented at the IEEE International Symposium on Information Theory [1].
May 22, 2020 DRAFT
A. Related Work
The problem of resource allocation and scheduling in wireless networks has been widely studied in the literature. In [2]–[6], throughput optimal schedulers have been derived for single channel, multi-channel and multi-user MIMO contexts. The aforementioned set of works focuses on developing strategies that stabilize the queues of the users using the max weight rule. The classical max weight rule is however known not to be delay optimal. To overcome this issue, many works have been developed in the past to take into account the average delay of the traffic of the users (e.g. see [15] and the references therein). Most of the existing works use Markov Decision Process (MDP) frameworks and develop allocation strategies using the Bellman equation (e.g. by using value iteration, policy iteration, etc.). However, MDP frameworks and the Bellman equation suffer from the curse of dimensionality, which leads to complex resource allocation strategies. In [16], [17], the authors try to minimize the average delay of the users' queues using MDP and stochastic learning tools. The complexity of the developed solutions is however much higher than that of the Whittle index policy. Stochastic learning is also used in [18] to deal with the problem of power allocation in an OFDM (Orthogonal Frequency Division Multiplexing) system, with the goal being to minimize the average delay of the users' packets in the queues. The developed solution requires high memory and computational complexity as compared to the Whittle index policy.

Whittle index based policies have also been used and developed in wireless networks to deal with the problem of pilot allocation over Markovian channel models. If a pilot is allocated to a user, its CSI can be estimated correctly and the user can hence transmit at a given rate. In [10], [14], a Gilbert-Elliot channel model is considered and the Whittle index is derived. It has been shown in [14], [19] that a policy based on the Whittle index is asymptotically optimal for their specific problem. The authors in [20] extended the problem of pilot allocation to the case where the channel evolves according to a Markovian process between $K$ states instead of two states as in the Gilbert-Elliot model. In the aforementioned papers, the queues of the users were not considered. In fact, the focus was on the channel allocation such that the long term total throughput (or an equivalent objective function) is maximized without taking into account the dynamic traffic of the users. In this paper, we consider that the traffic arrival is bursty and that the objective of the user/channel allocation is to minimize the long term average queues of the users.
In [11], a derivation of the Whittle index values for a simple multiclass M/M/1 model has been considered (where only one user can be served). However, the optimality of the obtained Whittle index policy has not been proved in [11] and the time was assumed to be continuous in their model. The authors in [21] considered the problem of project/job scheduling in which an effort is allocated to a fixed number of projects. The performance of a Whittle index based policy was analyzed under a continuous time model. In contrast to these two papers, we consider that the time is slotted and that several users can be scheduled at a given time slot and not only one user. We provide an explicit characterization of the Whittle indices, develop a Whittle index channel allocation policy for our problem and prove the asymptotic optimality of the developed policy in the many users regime.

The remainder of the paper is organized as follows: In Section II, we formulate the problem under investigation and we introduce the Lagrangian relaxation. In Section III, we prove the optimality of threshold/monotone policies for the relaxed problem. In Section IV, we compute the steady-state distribution of the system under a general threshold policy. In Section V, we characterize the Whittle indices explicitly and we lay out our proposed Whittle index based policy. Section VI provides further characterization of the optimal solution of the relaxed problem. In Sections VII and VIII, we prove the local and global asymptotic optimality of our proposed scheme respectively. In Section IX, we evaluate the performance of the Whittle index policy numerically. Lastly, the mathematical proofs are provided in the appendices.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A. System model description
We consider a time-slotted system with one central scheduler, $N$ users/queues and $M$ uncorrelated channels (or servers) with $N > M$. The terms "server" and "channel" will be used interchangeably throughout this paper, as well as the terms "user" and "queue". A channel can be allocated to at most one user, hence only $M$ users will be able to transmit (i.e. send packets) at time slot $t$. We consider $K$ different classes of users and we assume that each user in class $k$, if scheduled, transmits at most $R_k$ packets per time slot. We will refer to $R_k$ as the maximum transmission rate for every user in class $k$ and we assume that $\min_k \{R_k\} \geq 2$. We denote by $\gamma_k$ the proportion of class-$k$ users in the system. We further denote by $A_i^k(t) \in \{0, \dots, R_k - 1\}$ the number of packets that arrive to queue $i$ in class $k$ at time slot $t$. We also let $q_i^{k,\phi}(t)$ denote the number of packets in queue $i$ in class $k$. Furthermore, $s_i^{k,\phi}(\mathbf{q}^{\phi}(t))$ will denote the transmission action under a decision policy $\phi$ and $\mathbf{q}^{\phi}(t)$ the vector of all queue lengths $(q_1^{1,\phi}(t), \dots, q_{N\gamma_1}^{1,\phi}(t), \dots, q_1^{K,\phi}(t), \dots, q_{N\gamma_K}^{K,\phi}(t))$. For the sake of clarity, we define $s_i^{k,\phi}(t) := s_i^{k,\phi}(\mathbf{q}^{\phi}(t))$. If policy $\phi$ prescribes to schedule user $i$ in class $k$ at time $t$, then $s_i^{k,\phi}(t) = 1$, and $s_i^{k,\phi}(t) = 0$ otherwise. We denote by $L$ the buffer capacity, which is considered to be the same for all queues and can be very high. The general system model is presented in Figure 1.

Based on our system model, the number of packets in queue $i$ of class $k$ evolves as follows:
$$q_i^{k,\phi}(t+1) = \min\left\{ \left(q_i^{k,\phi}(t) - R_k s_i^{k,\phi}(t)\right)^+ + A_i^k(t),\ L \right\}, \tag{1}$$
where $(x)^+ = \max\{x, 0\}$.
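The queue recursion in Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `step_queue` and `simulate`, the threshold scheduling rule, the uniform arrival law (which the paper only assumes later, in Section IV) and all numerical parameters are assumptions made for the example.

```python
import random

def step_queue(q, s, arrival, R, L):
    """One slot of Eq. (1): remove up to R packets if scheduled (s = 1),
    add the new arrivals, and cap the result at the buffer size L."""
    after_service = max(q - R * s, 0)          # (q - R s)^+
    return min(after_service + arrival, L)

def simulate(R=4, L=20, n=3, slots=10_000, seed=0):
    """Time-average queue length of a single queue under a hypothetical rule
    that schedules the queue whenever its length exceeds n."""
    rng = random.Random(seed)
    q, total = 0, 0
    for _ in range(slots):
        s = 1 if q > n else 0                  # assumed threshold rule
        arrival = rng.randrange(R)             # assumed uniform on {0, ..., R-1}
        q = step_queue(q, s, arrival, R, L)
        total += q
    return total / slots

if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` with other values of `n` gives a quick feel for how the scheduling rule trades off queue length against the fraction of slots in which the queue is served.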
Figure 1: System Model

The objective of the present work is to find a scheduling policy $\phi$ that minimizes the average queue length of the users which results, according to Little's Law, in the minimization of the average delay.

B. Problem formulation
The cost incurred by user $i$ in class $k$ at time $t$ is equal to $a_k q_i^{k,\phi}(t)$ for all $i \in \{1, \dots, \gamma_k N\}$, where $a_k$ is a predefined weight. One can see that the model described in Section II-A belongs to the family of Restless Bandit Problems (RBP) [22]. We consider the broad class $\Phi$ of scheduling policies in which a scheduling decision depends on the history of observed queue states and scheduling actions. Our user and channel allocation problem therefore consists of identifying the policy $\phi \in \Phi$ that minimizes the infinite horizon expected average queues, subject to the constraint on the number of users selected at each time slot. Given the initial state $\mathbf{q}_0 = (q_1^1(0), \dots, q_{N\gamma_1}^1(0), \dots, q_1^K(0), \dots, q_{N\gamma_K}^K(0))$, the problem can be formulated as follows:
$$\min_{\phi \in \Phi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} a_k q_i^{k,\phi}(t) \,\Big|\, \mathbf{q}_0 \right], \tag{2}$$
$$\text{s.t.} \quad \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} s_i^{k,\phi}(t) \leq \alpha N, \quad \text{for all } t, \tag{3}$$
where $\alpha = M/N$ is the fraction of users that can be scheduled.

III. RELAXED PROBLEM AND THRESHOLD-BASED POLICY
As discussed in the introduction of this paper, RBPs are PSPACE-Hard (see Papadimitriou et al. [9]) and therefore one should develop well performing sub-optimal policies to solve these problems. In this paper, the development of our policy is done through several steps. First, we consider a Lagrangian relaxation of our problem and show that it can be decomposed into several one-dimensional problems. We then prove that the optimal solution to each of these relaxed problems is a threshold-based policy. We then compute the stationary distribution of the states of the system under the aforementioned threshold policy. This allows us to obtain a closed form expression of the Whittle index values of the relaxed problem and develop a Whittle index-based scheduling policy for the original RBP.

In this section, we first formulate the relaxed problem and prove that its optimal policy is a threshold-based one.
A. Relaxed Problem and Dual Problem
The Lagrangian relaxation consists of relaxing the constraint on the available resources. Namely, we consider that the constraint in Equation (3) has to be satisfied on average and not in every decision epoch, that is,
$$\limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} s_i^{k,\phi}(t) \right] \leq \alpha N. \tag{4}$$
Note that, contrary to the strict constraint in Equation (3), the relaxed constraint allows the activation of more than a fraction $\alpha$ of users at each time slot. If we denote by $W$ the Lagrange multiplier for the constrained problem, then the Lagrange function is equal to:
$$f(W, \phi) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \left( a_k q_i^{k,\phi}(t) + W s_i^{k,\phi}(t) \right) \,\Big|\, \mathbf{q}_0 \right] - W \alpha N, \tag{5}$$
where $W$ can be seen as a subsidy for not transmitting. Therefore, the dual problem for a given $W$ is
$$\min_{\phi \in \Phi} f(W, \phi). \tag{6}$$

B. Problem Decomposition and Threshold-based Policy
In this section, we show that the relaxed problem can be decomposed into $N$ one-dimensional subproblems, for which the optimal solution is a threshold-based policy. To do that, we first get rid of the constants that do not depend on $\phi$ and reformulate the problem as follows:
$$\min_{\phi \in \Phi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \left( a_k q_i^{k,\phi}(t) + W s_i^{k,\phi}(t) \right) \,\Big|\, \mathbf{q}_0 \right]. \tag{7}$$
One can see that the solution of this problem can be deduced from the well known Bellman equation (see Ross [23]). More specifically:
$$\bar{V}(\mathbf{q}) + \theta = \min_{\mathbf{s}} \left\{ \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} C_k(q_i^k, s_i^k) + \sum_{\mathbf{q}'} \Pr(\mathbf{q}' \mid \mathbf{q}, \mathbf{s}) \bar{V}(\mathbf{q}') \right\}, \tag{8}$$
for all $\mathbf{q} = (q_1^1, \dots, q_{\gamma_1 N}^1, \dots, q_1^K, \dots, q_{\gamma_K N}^K)$, with $q_i^k \in \{0, \dots, L\}$ being the queue length of class-$k$ user $i$, and $\mathbf{s} = (s_1^1, \dots, s_{\gamma_1 N}^1, \dots, s_1^K, \dots, s_{\gamma_K N}^K)$, with $s_i^k \in \{0, 1\}$ being the action taken with respect to user $i$ in class $k$. In Equation (8), $\bar{V}(\cdot)$ represents the value function, $\theta$ is the optimal average cost and $C_k(q_i^k, s_i^k)$ is the holding cost $a_k q_i^k + W s_i^k$. The optimal decision for each state $\mathbf{q}$ can be obtained by minimizing the right hand side of Equation (8). We now show that the problem can be decomposed into $N$ independent subproblems by decomposing $\bar{V}(\cdot)$ into separate value functions for each user $i$ in class $k$, i.e., $V_i^k(\cdot)$. In other words, the optimal decision $\mathbf{s}$ of problem (8) is a vector composed of elements $s_i^k$, where each $s_i^k$ is nothing but the optimal decision that solves the individual Bellman equation
$$V_i^k(q_i^k) + \theta_i^k = \min_{s_i^k} \left\{ C_k(q_i^k, s_i^k) + \sum_{q_i'^k} \Pr(q_i'^k \mid q_i^k, s_i^k) V_i^k(q_i'^k) \right\}. \tag{9}$$

Proposition 1.
Let $V_i^k(\cdot)$ be the optimal value function that solves Equation (9), and let $\bar{V}(\cdot)$ be the optimal value function that solves Equation (8). Then:
$$\bar{V}(\cdot) = \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} V_i^k(\cdot). \tag{10}$$
Proof.
See Appendix A.

We now show that the solution to each individual problem (for each user $i$) follows the structure of a threshold policy. For ease of notation, we drop the indices $k$ and $i$ and consider that $V(\cdot)$ is the value function for a given user. We first provide the definition of threshold policies.

Definition 1.
A threshold policy is a policy $\phi \in \Phi$ for which there exists $n \in \{-1, 0, \dots, L\}$ such that when the queue of user $i$ is in state $q \leq n$, the prescribed action is $s^- \in \{0, 1\}$, and when the queue is in state $q > n$, the prescribed action is $s^+ \in \{0, 1\}$, bearing in mind that $s^- \neq s^+$.

Since we only have two possible actions, a policy is a threshold policy if and only if it is monotone in $q$. The solution $V(\cdot)$ of the Bellman equation (9) can be obtained by the well known value iteration algorithm, which consists of updating $V_t(\cdot)$ using the following equation:
$$V_{t+1}(q) = \min_{s} \left\{ C(q, s) + \sum_{q'} \Pr(q' \mid q, s) V_t(q') \right\} - \theta. \tag{11}$$
We consider that the initial value function $V_0$ is equal to $0$ for any $q$ (i.e. for all $q$, $V_0(q) = 0$). After many iterations, $V_t(\cdot)$ will converge to the unique fixed point of Equation (9), called $V(\cdot)$. However, the value iteration algorithm is known to have high complexity and can take a long time to converge. Therefore, we will give some structural properties of the value function $V_t(\cdot)$ for any $t$ and conclude that the optimal policy is a threshold-based one. For that, we consider the operator $TO$ such that for each $(q, s) \in \{0, \dots, L\} \times \{0, 1\}$:
$$(TO(V))(q, s) = C(q, s) + \sum_{q'} \Pr(q' \mid q, s) V(q') - \theta. \tag{12}$$
We first provide some useful definitions and preliminary results before proving the desired results.

Definition 2.
We say that a function $f$ is R-convex in $X = \{0, \dots, L\}$ if, for any $x$ and $y$ in $X$ such that $x < y$, we have:
$$f(y + R) - f(x + R) \geq f(y) - f(x). \tag{13}$$

Lemma 1.
If, for a given function $f$, there exists $R$ such that for any $x \in \{0, \dots, L-1\}$, $f(x + 1 + R) - f(x + R) \geq f(x + 1) - f(x)$, then $f$ is R-convex.

Proof. Considering $y$ and $x$ in $\{0, \dots, L-1\}$, with $y > x$, we have:
\begin{align}
f(y + R) - f(x + R) &= \sum_{k=x}^{y-1} \left[ f(k + 1 + R) - f(k + R) \right] \tag{14} \\
&\geq \sum_{k=x}^{y-1} \left[ f(k + 1) - f(k) \right] \tag{15} \\
&= f(y) - f(x), \tag{16}
\end{align}
which concludes the proof.
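The value iteration update (11) and the one-step condition of Lemma 1 can be explored numerically for a single queue. The sketch below is not the authors' implementation: it uses relative value iteration (subtracting the value at a reference state instead of the unknown $\theta$), assumes uniform arrivals on $\{0, \dots, R-1\}$ as in Section IV, and all function names and parameter values are hypothetical.

```python
def q_values(V, a, W, R, L):
    """C(q, s) + E[V(q')] for every state q and action s (theta omitted)."""
    table = []
    for q in range(L + 1):
        row = []
        for s in (0, 1):
            base = max(q - R * s, 0)                       # (q - R s)^+
            # assumed arrival law: uniform on {0, ..., R-1}
            nxt = sum(V[min(base + arr, L)] for arr in range(R)) / R
            row.append(a * q + W * s + nxt)
        table.append(row)
    return table

def relative_value_iteration(a=1.0, W=5.0, R=3, L=30, iters=2000):
    """Iterate the Bellman update starting from V_0 = 0 and return (V, policy)."""
    V = [0.0] * (L + 1)
    for _ in range(iters):
        best = [min(row) for row in q_values(V, a, W, R, L)]
        V = [v - best[0] for v in best]    # reference-state subtraction replaces theta
    policy = [0 if row[0] <= row[1] else 1 for row in q_values(V, a, W, R, L)]
    return V, policy

def satisfies_lemma1(f, R):
    """Check f(x+1+R) - f(x+R) >= f(x+1) - f(x) wherever x+1+R is a valid index."""
    return all(f[x + 1 + R] - f[x + R] >= f[x + 1] - f[x]
               for x in range(len(f) - 1 - R))

if __name__ == "__main__":
    V, policy = relative_value_iteration()
    print(policy)                           # 0 = passive, 1 = transmit
    print(satisfies_lemma1(V, 3))
```

Printing `policy` shows, state by state, which action the iteration selects, so the monotone structure discussed in this section can be inspected directly; `satisfies_lemma1` only tests indices that stay inside the buffer, so it says nothing about the behaviour at the boundary $L$.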
Definition 3.
Let $g(q, s)$ be a real valued function defined on $X \times S$, with $S = \{0, 1\}$ and $X = \{0, \dots, L\}$. We say that $g$ is submodular if $g(q + 1, 1) - g(q + 1, 0) \leq g(q, 1) - g(q, 0)$ for all $q$ in $X$.

Theorem 1. $\min_s TO(\cdot)(\cdot, s)$ conserves the R-convexity and increasing properties. In other words, if the input of the operator $TO$, i.e. a given $V(\cdot)$, is an R-convex and increasing function in $q$, then $\min_s TO(V)(\cdot, s)$ is an R-convex and increasing function in $q$.

Proof. Let us consider that the input of $TO(V)(\cdot, s)$, i.e. a given $V(\cdot)$, is R-convex and increasing in $q$. For the increasing property of $\min_s TO(V)(\cdot, s)$, we have by definition that $C(\cdot, s)$ is increasing in $q$. We also have that $V(\cdot)$ is increasing in $q$ and the number of queue states is finite, so $\sum_{q'} \Pr(q' \mid \cdot, s) V(q')$ is an increasing function in $q$ (see Puterman [24]). Since $\theta$ is a constant, $TO(V)(\cdot, s)$ is increasing in $q$ and therefore $\min_s TO(V)(\cdot, s)$ is increasing in $q$.

For R-convexity, we should first prove the following lemma.

Lemma 2. If $V(\cdot)$ is R-convex and increasing in $q$, then $C(q, s)$ and $\sum_{q'} \Pr(q' \mid q, s) V(q')$ are submodular functions.

Proof. The proof is given in Appendix B. ∎
This demonstrates that the function $TO(V)(\cdot, \cdot)$ is submodular, since it is the sum of two submodular functions. Let us now show that $\min_s TO(V)(\cdot, s)$ is R-convex. For that, we consider the function $\Delta TO(V)(q) = TO(V)(q, 1) - TO(V)(q, 0)$, which is decreasing in $q$ since $TO(V)(\cdot, \cdot)$ is submodular. Therefore, there exists $r \in \mathbb{R} \cup \{+\infty\}$ such that $\Delta TO(V)(q) \geq 0$ for $q \leq r$ and $\Delta TO(V)(q) \leq 0$ for $q \geq r$. In the remainder of the proof, we consider all possible cases of $q$ and $r$. Throughout, we use the fact that serving $R$ packets in state $q + R$ leads to the same next-state distribution as staying passive in state $q$, so that $TO(V)(q + R, 1) = TO(V)(q, 0) + aR + W$.

If $q + R + 1,\ q + R,\ q,\ q + 1 \geq r$:
\begin{align}
\min_s TO(V)(q + 1 + R, s) - \min_s TO(V)(q + R, s) &= TO(V)(q + 1 + R, 1) - TO(V)(q + R, 1) \tag{17} \\
&= TO(V)(q + 1, 0) - TO(V)(q, 0) \tag{18} \\
&\geq TO(V)(q + 1, 1) - TO(V)(q, 1) \tag{19} \\
&= \min_s TO(V)(q + 1, s) - \min_s TO(V)(q, s), \tag{20}
\end{align}
where the inequality is due to the submodularity of $TO(V)(\cdot, \cdot)$.

If $q \leq r \leq q + 1,\ q + R,\ q + 1 + R$:
\begin{align}
\min_s TO(V)(q + 1 + R, s) - \min_s TO(V)(q + R, s) &= TO(V)(q + 1 + R, 1) - TO(V)(q + R, 1) \tag{21} \\
&= TO(V)(q + 1, 0) - TO(V)(q, 0) \tag{22} \\
&\geq TO(V)(q + 1, 1) - TO(V)(q, 0) \tag{23} \\
&= \min_s TO(V)(q + 1, s) - \min_s TO(V)(q, s). \tag{24}
\end{align}

If $q,\ q + 1 \leq r \leq q + R,\ q + R + 1$:
\begin{align}
\min_s TO(V)(q + 1 + R, s) - \min_s TO(V)(q + R, s) &= TO(V)(q + 1 + R, 1) - TO(V)(q + R, 1) \tag{25} \\
&= TO(V)(q + 1, 0) - TO(V)(q, 0) \tag{26} \\
&= \min_s TO(V)(q + 1, s) - \min_s TO(V)(q, s). \tag{27}
\end{align}

If $q,\ q + 1,\ q + R \leq r \leq q + R + 1$:
\begin{align}
\min_s TO(V)(q + 1 + R, s) - \min_s TO(V)(q + R, s) &= TO(V)(q + 1 + R, 1) - TO(V)(q + R, 0) \tag{28} \\
&\geq TO(V)(q + 1 + R, 1) - TO(V)(q + R, 1) \tag{29} \\
&= TO(V)(q + 1, 0) - TO(V)(q, 0) \tag{30} \\
&= \min_s TO(V)(q + 1, s) - \min_s TO(V)(q, s). \tag{31}
\end{align}

If $q,\ q + 1,\ q + R,\ q + R + 1 \leq r$:
\begin{align}
\min_s TO(V)(q + 1 + R, s) - \min_s TO(V)(q + R, s) &= TO(V)(q + 1 + R, 0) - TO(V)(q + R, 0) \tag{32} \\
&\geq TO(V)(q + 1 + R, 1) - TO(V)(q + R, 1) \tag{33} \\
&= TO(V)(q + 1, 0) - TO(V)(q, 0) \tag{34} \\
&= \min_s TO(V)(q + 1, s) - \min_s TO(V)(q, s). \tag{35}
\end{align}

Using Lemma 1, $\min_s TO(V)(\cdot, s)$ is R-convex in $q$, i.e., we can conclude that R-convexity is conserved.

Remark 1.
Theorem 1 means that if the value function $V_t$ is increasing and R-convex, then the value function $V_{t+1}$ in Equation (11), which is computed with the operator $TO$, is increasing and R-convex. Thus, as $V_0$ is increasing and R-convex, all $V_t$ are increasing and R-convex, and therefore we can conclude that the value function $V$ will also be R-convex and increasing in $q$.

Corollary 1.
The optimal policy $\phi^*$ of each one-dimensional relaxed subproblem is a threshold-based policy.

Proof. As explained after Definition 1, it is sufficient to prove that the optimal policy $\phi^*$ is monotone in $q$. We consider $q_1 \leq q_2$. According to Remark 1, $V(\cdot)$ is increasing and R-convex; then, using Lemma 2, $TO(V)$ is submodular. Therefore, we have:
$$(TO(V))(q_1, 1) - (TO(V))(q_1, 0) \geq (TO(V))(q_2, 1) - (TO(V))(q_2, 0). \tag{36}$$
If $\phi^*(q_2) = \operatorname{argmin}_s (TO(V))(q_2, s) = 0$, then, starting from
$$(TO(V))(q_1, 1) - (TO(V))(q_1, 0) \geq (TO(V))(q_2, 1) - (TO(V))(q_2, 0) \tag{37}$$
and given that $(TO(V))(q_2, 1) - (TO(V))(q_2, 0) \geq 0$, we obtain:
$$(TO(V))(q_1, 1) - (TO(V))(q_1, 0) \geq 0, \tag{38}$$
which leads to:
$$\operatorname{argmin}_s (TO(V))(q_1, s) = 0, \tag{39}$$
i.e.
$$\phi^*(q_1) \leq \phi^*(q_2). \tag{40}$$
If $\phi^*(q_2) = \operatorname{argmin}_s (TO(V))(q_2, s) = 1$, we obviously have:
$$\phi^*(q_1) \leq \phi^*(q_2). \tag{41}$$
Therefore, we can conclude that the optimal solution is monotone and increasing in $q$, which implies that it is a threshold policy.

IV. STATIONARY DISTRIBUTION
We have seen previously that the optimal solution of problem (7) is a threshold-based policy. Let us define $n_k$ as the threshold for users in class $k$, i.e. if the queue state of user $i$ in class $k$ is $q_i^k$ such that $q_i^k \leq n_k$, then the user will not be scheduled; otherwise, the user will be selected for transmission. The objective of this section is to derive the stationary distribution of the users' states. This will be useful in the subsequent section in the derivation of a closed form expression of the Whittle index values. We assume here that at each queue $i$ in class $k$, packets arrive according to a discrete uniform distribution, that is, $P(A_i^k(t) = x) = \rho_k$ for all $0 \leq x \leq R_k - 1$ and $0$ otherwise, where $\rho_k = 1/R_k$.

For ease of notation, we again drop the indices $k$ and $i$ (e.g. we denote the threshold by $n$ and the queue length by $q$). We denote by $p^n(i, j)$ the transition probability from state $i$ to $j$, by $u$ the stationary distribution under the threshold policy $n$, and by $R$ the maximum rate ($\rho = 1/R$). One can notice that $u$ verifies the full balance equation, i.e.:
$$u(i) = \sum_{j=0}^{L} p^n(j, i) u(j) = \sum_{j=0}^{n} p^n(j, i) u(j) + \sum_{j=n+1}^{L} p^n(j, i) u(j). \tag{42}$$

Definition 4.
We define $\pi_i$ as:
$$\pi_i = \begin{cases} \rho & \text{if } 0 \leq i \leq R - 1 \\ 0 & \text{else} \end{cases} \tag{43}$$

Proposition 2.
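The expressions of $p^n(j, i)$ are given by:

if $0 \leq i < L$ and $j \leq n$:
$$p^n(j, i) = \pi_{i-j} = \begin{cases} \rho & \text{if } 0 \leq i - j \leq R - 1 \\ 0 & \text{else} \end{cases} \tag{44}$$
if $0 \leq i < L$ and $n < j \leq L$:
$$p^n(j, i) = \pi_{i-(j-R)^+} = \begin{cases} \rho & \text{if } 0 \leq i - (j-R)^+ \leq R - 1 \\ 0 & \text{else} \end{cases} \tag{45}$$
if $i = L$ and $j \leq n$:
$$p^n(j, L) = (R - L + j)\, \pi_{L-j} = \begin{cases} (R - L + j)\rho & \text{if } 0 \leq L - j \leq R - 1 \\ 0 & \text{else} \end{cases} \tag{46}$$
if $i = L$ and $n < j \leq L$:
$$p^n(j, L) = (R - L + (j-R)^+)\, \pi_{L-(j-R)^+} = \begin{cases} (R - L + (j-R)^+)\rho & \text{if } 0 \leq L - (j-R)^+ \leq R - 1 \\ 0 & \text{else} \end{cases} \tag{47}$$

Proof.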
See appendix C.
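The transition probabilities of Proposition 2 can also be generated directly from the queue dynamics, which gives a simple numerical cross-check of Equations (44)–(47) and of the balance equation (42). The sketch below is illustrative only: the function names are hypothetical, and the stationary distribution is approximated by plain power iteration rather than by the closed-form derivation used in the paper.

```python
def transition_matrix(n, R, L):
    """P[j][i] = p^n(j, i): state j is passive if j <= n and active otherwise,
    with arrivals uniform on {0, ..., R-1} and the buffer capped at L."""
    P = [[0.0] * (L + 1) for _ in range(L + 1)]
    for j in range(L + 1):
        base = j if j <= n else max(j - R, 0)      # queue left after the action
        for arrival in range(R):
            P[j][min(base + arrival, L)] += 1.0 / R
    return P

def stationary(P, iters=2000):
    """Approximate the solution u of u = u P (Eq. (42)) by power iteration."""
    size = len(P)
    u = [1.0 / size] * size
    for _ in range(iters):
        u = [sum(u[j] * P[j][i] for j in range(size)) for i in range(size)]
    return u

if __name__ == "__main__":
    P = transition_matrix(n=1, R=3, L=4)
    print([round(x, 4) for x in stationary(P)])
```

Because all arrivals that would overflow the buffer are mapped to state $L$, the entry `P[j][L]` accumulates $(R - L + j)\rho$ for a passive state $j$, which is the counting argument behind Equation (46).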
Proposition 3.
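The expression of the stationary distribution is:

• $L < R$:

1) $-1 \leq n \leq L - 1$:
$$u(i) = \begin{cases} \rho (1 - \rho)^{n-i} & \text{if } 0 \leq i \leq n \\ \rho & \text{if } n + 1 \leq i \leq L - 1 \\ (1 - \rho)^{n+1} - (L - n - 1)\rho & \text{if } i = L \end{cases} \tag{48}$$
2) $n = L$:
$$u(i) = \begin{cases} 0 & \text{if } 0 \leq i \leq L - 1 \\ 1 & \text{if } i = L \end{cases} \tag{49}$$

• $R \leq L < 2R$:

1) $-1 \leq n \leq L - R - 1$:
$$u(i) = \begin{cases} \rho - (n - i)\rho^2 & \text{if } 0 \leq i \leq n \\ \rho & \text{if } n + 1 \leq i \leq R - 1 \\ \rho - (i - n)\rho^2 & \text{if } R \leq i \leq n + R \end{cases} \tag{50}$$
2) $L - R \leq n < R$:
$$u(i) = \begin{cases} \rho - \rho^2 (n - i) & \text{if } 0 \leq i \leq L - R - 1 \\ (1 - \rho)^{n-i} \rho & \text{if } L - R \leq i \leq n \\ \rho & \text{if } n + 1 \leq i \leq R - 1 \\ \rho - \rho^2 (i - n) & \text{if } R \leq i \leq L - 1 \\ (1 - \rho)^{n-L+R+1} - \rho (L - 1 - n) & \text{if } i = L \end{cases} \tag{51}$$
3) $R \leq n \leq L - 1$:
$$u(i) = \begin{cases} \rho - \rho^2 (n - i) & \text{if } n - R + 1 \leq i \leq L - R - 1 \\ (1 - \rho)^{n-i} \rho & \text{if } L - R \leq i \leq n \\ \rho - \rho^2 (i - n) & \text{if } n + 1 \leq i \leq L - 1 \\ (1 - \rho)^{n-L+R+1} - \rho (L - 1 - n) & \text{if } i = L \end{cases} \tag{52}$$
4) $n = L$:
$$u(i) = \begin{cases} 0 & \text{if } 0 \leq i \leq L - 1 \\ 1 & \text{if } i = L \end{cases} \tag{53}$$

• $L \geq 2R$:

1) $-1 \leq n < R$:
$$u(i) = \begin{cases} \rho - (n - i)\rho^2 & \text{if } 0 \leq i \leq n \\ \rho & \text{if } n + 1 \leq i \leq R - 1 \\ \rho - (i - n)\rho^2 & \text{if } R \leq i \leq n + R \end{cases} \tag{54}$$
2) $R \leq n < L - R$:
$$u(i) = \begin{cases} \rho - (n - i)\rho^2 & \text{if } n - R + 1 \leq i \leq n \\ \rho - (i - n)\rho^2 & \text{if } n \leq i \leq n + R - 1 \end{cases} \tag{55}$$
3) $L - R \leq n < L$:
$$u(i) = \begin{cases} \rho - \rho^2 (n - i) & \text{if } n - R + 1 \leq i \leq L - R - 1 \\ (1 - \rho)^{n-i} \rho & \text{if } L - R \leq i \leq n \\ \rho - \rho^2 (i - n) & \text{if } n + 1 \leq i \leq L - 1 \\ (1 - \rho)^{n-L+R+1} - \rho (L - 1 - n) & \text{if } i = L \end{cases} \tag{56}$$
4) $n = L$:
$$u(i) = \begin{cases} 0 & \text{if } 0 \leq i \leq L - 1 \\ 1 & \text{if } i = L \end{cases} \tag{57}$$

In each case, $u(i) = 0$ for the states $i$ not listed.

Proof.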
See Appendix D.

V. WHITTLE'S INDEX
In this section, we provide the derivation of the Whittle indices, which are values that depend on the queue state of the user and its maximum rate. Although this derivation is made using the relaxed problem, it allows us to develop a heuristic for the original problem. It is worth mentioning that the Whittle's index at a given state, say $n$, represents the Lagrange multiplier for which the optimal decision of the individual dual relaxed problem at this state is indifferent (the passive and active decisions are both optimal). However, the Whittle index is well defined only if the property of indexability is satisfied. This property requires establishing that, as the Lagrange multiplier (or equivalently the subsidy for passivity $W$) increases, the collection of states in which the optimal action is passive increases. In this section, we work on a given class $k$, and we consider that its maximum transmission rate is $R$ with $\rho = 1/R$. All the results obtained here can be applied to any class.

We start the derivation by first reformulating the dual of the relaxed problem using the stationary distribution derived in the previous section. Since the solution of the dual of the relaxed problem (7) (given a constant $W$) is a threshold-based policy, we can reformulate the problem as follows:
$$\min_{n \in [0, L]} \mathbb{E}\left[ a q^n + W s^n \right] = \min_{n \in [0, L]} \left\{ \sum_{q=0}^{L} a\, u_n(q)\, q - W \sum_{q=0}^{n} u_n(q) \right\}, \tag{58}$$
with $n$ and $u_n$ being the threshold and the stationary distribution under the threshold policy $n$. The new formulation of the problem turns out to be useful to derive the Whittle indices since, for any $W$, we can find the minimizer of the expression in Equation (58).

We first give the expression of the mean cost in Equation (58) given a threshold $n$ (for all possible values of $n$ and $L$).
• $L < R$:

if $-1 \le n \le L-1$:
$$\sum_{q=0}^{L} a u^n(q) q = a\Big[(L+R)(1-\rho)^{n+1} + n - R + 1 + \frac{(L-1-n)(n-L)}{2R}\Big] \quad (59)$$
if $n = L$:
$$\sum_{q=0}^{L} a u^n(q) q = aL \quad (60)$$

• $R \le L < 2R$:

if $-1 \le n \le L-R-1$:
$$\sum_{q=0}^{L} a u^n(q) q = a\Big[\frac{R-1}{2} + \frac{n(n+1)}{2R}\Big] \quad (61)$$
if $L-R \le n \le R-1$:
$$\sum_{q=0}^{L} a u^n(q) q = 2aR(1-\rho)^{n-L+R+1} - a\Big[\frac{n(n+1)}{2R} + \frac{R-1}{2} + \frac{L}{R}(L-2n-1)\Big] \quad (62)$$
if $R \le n \le L-1$:
$$\sum_{q=0}^{L} a u^n(q) q = a\big[n + 1 - R + 2R(1-\rho)^{n-L+R+1} + \rho(L-1-n)(n-L)\big] \quad (63)$$
if $n = L$:
$$\sum_{q=0}^{L} a u^n(q) q = aL \quad (64)$$

• $L \ge 2R$:

if $-1 \le n \le R-1$:
$$\sum_{q=0}^{L} a u^n(q) q = a\Big[\frac{R-1}{2} + \frac{n(n+1)}{2R}\Big] \quad (65)$$
if $R \le n \le L-R$:
$$\sum_{q=0}^{L} a u^n(q) q = an \quad (66)$$
if $L-R+1 \le n \le L-1$:
$$\sum_{q=0}^{L} a u^n(q) q = a\big[n + 1 - R + 2R(1-\rho)^{n-L+R+1} + \rho(L-1-n)(n-L)\big] \quad (67)$$
if $n = L$:
$$\sum_{q=0}^{L} a u^n(q) q = aL \quad (68)$$

Second, we provide the expression of the passive decision's average time in equation (58) given a threshold $n$:

• $L < R$:

if $-1 \le n \le L-1$:
$$\sum_{q=0}^{n} u^n(q) = 1 - (1-\rho)^{n+1} \quad (69)$$
if $n = L$:
$$\sum_{q=0}^{n} u^n(q) = 1 \quad (70)$$

• $R \le L < 2R$:

if $-1 \le n \le L-R-1$:
$$\sum_{q=0}^{n} u^n(q) = \Big(1 - \frac{n}{2R}\Big)\Big(\frac{n+1}{R}\Big) \quad (71)$$
if $L-R \le n \le R-1$:
$$\sum_{q=0}^{n} u^n(q) = \frac{\rho^2}{2}(L-R)(L+R-2n-1) + 1 - (1-\rho)^{n-L+R+1} \quad (72)$$
if $R \le n \le L-1$:
$$\sum_{q=0}^{n} u^n(q) = \frac{\rho^2}{2}(L-1-n)(L-n) + 1 - (1-\rho)^{n-L+R+1} \quad (73)$$
if $n = L$:
$$\sum_{q=0}^{n} u^n(q) = 1 \quad (74)$$

• $L \ge 2R$:

if $-1 \le n \le R-1$:
$$\sum_{q=0}^{n} u^n(q) = \Big(1 - \frac{n}{2R}\Big)\Big(\frac{n+1}{R}\Big) \quad (75)$$
if $R \le n \le L-R$:
$$\sum_{q=0}^{n} u^n(q) = \frac{1}{2} + \frac{1}{2R} \quad (76)$$
if $L-R+1 \le n \le L-1$:
$$\sum_{q=0}^{n} u^n(q) = \frac{\rho^2}{2}(L-1-n)(L-n) + 1 - (1-\rho)^{n-L+R+1} \quad (77)$$
if $n = L$:
$$\sum_{q=0}^{n} u^n(q) = 1 \quad (78)$$
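As a quick sanity check on the closed forms for the case $L \ge 2R$, the following minimal sketch evaluates the mean cost (65)-(68) and the passive time (75)-(78) with exact rational arithmetic. The function names `mean_cost` and `passive_fraction` are hypothetical, and the formulas are coded as they are read in this section, so the sketch is only as reliable as that reading.

```python
from fractions import Fraction


def mean_cost(n, R, L, a=1):
    """Closed-form sum_q a*u^n(q)*q for L >= 2R, as read from (65)-(68)."""
    if L < 2 * R:
        raise ValueError("this sketch only covers the case L >= 2R")
    rho = Fraction(1, R)
    if -1 <= n <= R - 1:
        return a * (Fraction(R - 1, 2) + Fraction(n * (n + 1), 2 * R))
    if R <= n <= L - R:
        return a * Fraction(n)
    if L - R + 1 <= n <= L - 1:
        tail = 2 * R * (1 - rho) ** (n - L + R + 1)
        return a * (n + 1 - R + tail + rho * (L - 1 - n) * (n - L))
    if n == L:
        return a * Fraction(L)
    raise ValueError("threshold n must lie in [-1, L]")


def passive_fraction(n, R, L):
    """Closed-form sum_{q<=n} u^n(q) for L >= 2R, as read from (75)-(78)."""
    if L < 2 * R:
        raise ValueError("this sketch only covers the case L >= 2R")
    rho = Fraction(1, R)
    if -1 <= n <= R - 1:
        return (1 - Fraction(n, 2 * R)) * Fraction(n + 1, R)
    if R <= n <= L - R:
        return Fraction(1, 2) + Fraction(1, 2 * R)
    if L - R + 1 <= n <= L - 1:
        return rho**2 / 2 * (L - 1 - n) * (L - n) + 1 - (1 - rho) ** (n - L + R + 1)
    if n == L:
        return Fraction(1)
    raise ValueError("threshold n must lie in [-1, L]")
```

For instance, with $R = 5$ and $L = 50$, the passive time takes the same value $1/2 + 1/(2R) = 3/5$ at $n = R-1$, on the whole plateau $R \le n \le L-R$, and at $n = L-R+1$.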
A. Computation of the Whittle index values
We first formalize the indexability and the Whittle’s index in the following definitions.
Definition 5.
Considering problem (58) for a given $W$, we define $D(W)$ as the set of states in which the optimal action (with respect to the optimal solution of Problem (58)) is the passive one. In other words, $n \in D(W)$ if and only if the optimal action at state $n$ is the passive one.

$D(W)$ is well defined as the optimal solution of Problem (58) is a stationary policy, more precisely, a threshold-based policy.

Definition 6. A class is indexable if the set of states in which the passive action is the optimal action increases in $W$, that is, $W' < W \Rightarrow D(W') \subseteq D(W)$. When the class is indexable, the Whittle index in state $n$ is defined as:

$$W(n) = \min\{W \mid n \in D(W)\} \quad (79)$$

In the literature, several works have been conducted to find the Whittle index values. For example, an iterative algorithm has been provided in [13]. Even though the context of our work is different from the one considered in [13], we will prove in the sequel that the algorithm proposed in [13] can be adapted to our case up to some modifications (e.g. in our case we have a maximum buffer state $L$). In addition, further analysis will be provided here to derive a closed-form expression of the Whittle index values. We first provide this modified algorithm and then prove that it allows the computation of the Whittle index values for our problem.

Algorithm 1
Whittle Index Computation

Init. Let $j$ be initialized to $0$.
Find $W_0 = \inf_{n \in N} \dfrac{\sum_{q=0}^{L} a u^n(q) q - \sum_{q=0}^{L} a u^{-1}(q) q}{\sum_{q=0}^{n} u^n(q)}$.
Define $n_0$ as the largest minimizer of the above expression.
Let $W(k) = W_0$ for all $k \le n_0$.
while $n_j \ne L$ do
    $j = j + 1$
    Define $M_j$ as the set $\{n : \sum_{q=0}^{n} u^n(q) = \sum_{q=0}^{n_{j-1}} u^{n_{j-1}}(q)\} \cup \{0, \cdots, n_{j-1}\}$.
    Find $W_j = \inf_{n \in N \setminus M_j} \dfrac{\sum_{q=0}^{L} a u^n(q) q - \sum_{q=0}^{L} a u^{n_{j-1}}(q) q}{\sum_{q=0}^{n} u^n(q) - \sum_{q=0}^{n_{j-1}} u^{n_{j-1}}(q)}$.
    Define $n_j$ as the largest minimizer of the above expression.
    Let $W(k) = W_j$ for all $n_{j-1} < k \le n_j$.
Output: the Whittle index of state $k$, which is given by $W(k)$.

Proposition 4.
Assuming that the optimal solution is a threshold policy, and that $\sum_{q=0}^{n} u^n(q)$ is increasing in $n$, then the class is indexable. Moreover, if $\sum_{q=0}^{L} a u^n(q) q$ is increasing in $n$ and, for all $i$ and $j$ such that $i < j$, $\sum_{q=0}^{i} u^i(q) = \sum_{q=0}^{j} u^j(q) \Rightarrow \sum_{q=0}^{L} a u^i(q) q < \sum_{q=0}^{L} a u^j(q) q$, then the Whittle index values are computed by applying Algorithm 1.

Proof.
For the proof, see appendix H.
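The iteration of Algorithm 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `whittle_indices` is hypothetical, the inputs are the sequences $\sum_{q} a u^n(q) q$ and $\sum_{q \le n} u^n(q)$ tabulated for thresholds $n = -1, 0, \dots, L$, and the initialization step is folded into the loop by treating threshold $-1$ (whose passive time is zero) as the starting point.

```python
from fractions import Fraction


def whittle_indices(a_vals, b_vals):
    """Sketch of Algorithm 1.

    a_vals[n + 1] and b_vals[n + 1] hold the mean cost and the passive time
    under threshold n, for n = -1, 0, ..., L. Returns [W(0), ..., W(L)].
    Assumes the passive time under threshold L differs from all others.
    """
    L = len(a_vals) - 2

    def a(n):
        return Fraction(a_vals[n + 1])

    def b(n):
        return Fraction(b_vals[n + 1])

    W = [None] * (L + 1)
    prev = -1  # plays the role of n_{j-1}; threshold -1 starts the recursion
    while prev != L:
        # Candidates: thresholds above prev whose passive time differs (N \ M_j).
        cands = [n for n in range(prev + 1, L + 1) if b(n) != b(prev)]
        ratio = {n: (a(n) - a(prev)) / (b(n) - b(prev)) for n in cands}
        w = min(ratio.values())
        nxt = max(n for n in cands if ratio[n] == w)  # largest minimizer
        for k in range(prev + 1, nxt + 1):
            W[k] = w
        prev = nxt
    return W
```

With $R = 2$, $L = 4$ and $a = 1$, the closed forms of this section give mean costs $(1/2, 1/2, 1, 2, 3, 4)$ and passive times $(0, 1/2, 3/4, 3/4, 3/4, 1)$ for $n = -1, \dots, 4$, and the sketch returns the indices $(0, 2, 12, 12, 12)$.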
Remark 2.
In order to simplify the notation in the sequel, we denote $\sum_{q=0}^{L} a u^n(q) q$ by $a_n$ and $\sum_{q=0}^{n} u^n(q)$ by $b_n$.

In order to apply Algorithm 1, which yields the Whittle index of each state in our case, we need to prove that the conditions given in Proposition 4 are satisfied. We focus only on the third case ($L \ge 2R$), since it is the most realistic: the maximum buffer length $L$ is often much higher than the transmission rate $R_k$. Nevertheless, the analysis in this paper can be easily extended to the case where $L < 2R$. For that case, we limit ourselves to giving the Whittle index expressions, together with a concise proof, at the end of this section.

Theorem 2.
For each $k$, class $k$ is indexable.

Proof. According to Proposition 4, we just need to prove that $\sum_{q=0}^{n} u^n(q)$ is increasing in $n$. The proof is based on the following two lemmas.

Lemma 3. $\sum_{q=0}^{n} u^n(q)$ is strictly increasing in $n \in [-1, R-1]$.

Proof. See Appendix K. ∎

Lemma 4. $\sum_{q=0}^{n} u^n(q)$ is strictly increasing in $n \in [L-R+1, L-1]$.

Proof. See Appendix L. ∎
We have that, for any $n \in [R, L-R]$:

$$\sum_{q=0}^{n} u^n(q) = \sum_{q=0}^{R-1} u^{R-1}(q) = \sum_{q=0}^{L-R+1} u^{L-R+1}(q) = \frac{1}{2} + \frac{1}{2R} \quad (80)$$

Therefore:

$$\sum_{q=0}^{R-1} u^{R-1}(q) \le \sum_{q=0}^{n} u^n(q) \le \sum_{q=0}^{L-R+1} u^{L-R+1}(q) \quad (81)$$

Moreover:

$$\sum_{q=0}^{L} u^L(q) = 1 > 1 - (1-\rho)^R = \sum_{q=0}^{L-1} u^{L-1}(q) \quad (82)$$

Consequently, by combining Lemma 3 and Lemma 4, we can conclude the indexability of the class, as $\sum_{q=0}^{n} u^n(q)$ is shown to be increasing in $[0, L]$.

We now prove the two other conditions of Proposition 4, namely that $\sum_{q=0}^{L} a u^n(q) q$ is increasing in $n$, and that for all $i$ and $j$ such that $i < j$, $\sum_{q=0}^{i} u^i(q) = \sum_{q=0}^{j} u^j(q) \Rightarrow \sum_{q=0}^{L} a u^i(q) q < \sum_{q=0}^{L} a u^j(q) q$. From the expression of $a_n$ when $n \in [-1, R-1]$, $a_n$ is clearly increasing in $n$. For $n \in [R, L-R-1]$, $a_n$ is strictly increasing and $a_{R-1} = a(R-1) < aR = a_R$, which implies that $a_n$ is increasing in $[-1, L-R-1]$. For $n \in [L-R, L-1]$, we provide the following lemma.

Lemma 5. $\sum_{q=0}^{L} a u^n(q) q$ is strictly increasing in $n \in [L-R, L-1]$.

Proof. See Appendix N.

We have that $\sum_{q=0}^{L} a u^{L-R}(q) q = a(L-R) > a(L-R-1) = \sum_{q=0}^{L} a u^{L-R-1}(q) q$, and $\sum_{q=0}^{L} a u^{L-1}(q) q = aL - aR(1 - 2(1-\rho)^R) < aL = \sum_{q=0}^{L} a u^L(q) q$ (because $1 - 2(1-\rho)^R \ge 1 - 2e^{-1} \ge 0$). We can then conclude that $a_n$ is increasing in $[-1, L]$.

For the second condition (for all $i$ and $j$ such that $i < j$, $\sum_{q=0}^{i} u^i(q) = \sum_{q=0}^{j} u^j(q) \Rightarrow \sum_{q=0}^{L} a u^i(q) q < \sum_{q=0}^{L} a u^j(q) q$), the only case in which $\sum_{q=0}^{i} u^i(q)$ is equal to $\sum_{q=0}^{j} u^j(q)$ is when $i$ and $j$ are in the set $[R-1, L-R+1]$. In this set, we have shown that $\sum_{q=0}^{L} a u^n(q) q$ is strictly increasing; hence, for $i < j$ with $(i,j) \in [R-1, L-R+1]$, $\sum_{q=0}^{L} a u^i(q) q < \sum_{q=0}^{L} a u^j(q) q$, and the two conditions are satisfied.

As the indexability is satisfied and the two conditions of Proposition 4 are verified, we can apply Algorithm 1 to get the Whittle index of each state. However, the complexity of this algorithm is $L$, where $L$ is the maximum buffer length, which could be large in practice. In order to overcome this complexity issue, we provide further analysis and derive simple expressions of the Whittle indices.

We first proceed by laying out the following definitions and lemmas.

Definition 7.
For any given increasing threshold policy $n$, we define $y_n$ as a function of the subsidy $W$, such that $y_n(W) = \sum_{q=0}^{L} a u^n(q) q - W \sum_{q=0}^{n} u^n(q) = a_n - W b_n$.

Lemma 6. For any pair of states $(i, j)$ in $[-1, L]$, the abscissa of the intersection point between $y_i(W)$ and $y_j(W)$, denoted by $x_{i,j}$, is:

$$x_{i,j} = \frac{\sum_{q=0}^{L} a u^i(q) q - \sum_{q=0}^{L} a u^j(q) q}{\sum_{q=0}^{i} u^i(q) - \sum_{q=0}^{j} u^j(q)} \quad (83)$$

Proof.
See appendix O.
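For intuition, the abscissa in Lemma 6 follows directly from equating the two affine functions of Definition 7. A short worked derivation, using the shorthand $a_n$ and $b_n$ of Remark 2 and assuming $b_i \neq b_j$ (otherwise the two lines are parallel):

```latex
\begin{align*}
y_i(W) = y_j(W)
  &\iff a_i - W b_i = a_j - W b_j \\
  &\iff W \, (b_i - b_j) = a_i - a_j \\
  &\iff W = x_{i,j} = \frac{a_i - a_j}{b_i - b_j}.
\end{align*}
```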
Definition 8.
We define, for $0 \le n \le R$, $w_n = x_{n,n-1} = \dfrac{\sum_{q=0}^{L} a u^n(q) q - \sum_{q=0}^{L} a u^{n-1}(q) q}{\sum_{q=0}^{n} u^n(q) - \sum_{q=0}^{n-1} u^{n-1}(q)} = \dfrac{a_n - a_{n-1}}{b_n - b_{n-1}} = \dfrac{aRn}{R-n}$ (by replacing $a_n$ and $b_n$ by their expressions when $0 \le n \le R$).

Definition 9. We define a function $f$ such that, for each $n \in [0, R]$, $f(n) = w_n \big[\sum_{q=0}^{L} u^L(q) - \sum_{q=0}^{n} u^n(q)\big] + \sum_{q=0}^{L} a u^n(q) q = w_n \big[1 - (1 - \frac{n}{2R})\frac{n+1}{R}\big] + a\big(\frac{R-1}{2} + \frac{n(n+1)}{2R}\big)$; for $n = R$, $f(R) = +\infty$, and for $n = -1$, $f(-1) = 0$.

In other words, $f(n)/a$ can be interpreted as the value of $L$ such that $w_n = x_{L,n}$.

Lemma 7. $f$ is strictly increasing in $n$, for $n \in [0, R]$.

Proof. See Appendix P.
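Definitions 8 and 9 are easy to evaluate numerically. The sketch below, with hypothetical function names `w` and `f`, codes $w_n = aRn/(R-n)$ and $f(n)$ for $0 \le n \le R-1$ using exact fractions, so that the monotonicity stated in Lemma 7 can be checked on an example; it is an illustration and does not replace the proof in Appendix P.

```python
from fractions import Fraction


def w(n, R, a=1):
    """w_n = a*R*n / (R - n) for 0 <= n <= R-1 (Definition 8)."""
    if not 0 <= n <= R - 1:
        raise ValueError("this sketch covers 0 <= n <= R-1 only")
    return Fraction(a) * R * n / (R - n)


def f(n, R, a=1):
    """f(n) = w_n * (1 - b_n) + a_n for 0 <= n <= R-1 (Definition 9)."""
    b_n = (1 - Fraction(n, 2 * R)) * Fraction(n + 1, R)
    a_n = Fraction(a) * (Fraction(R - 1, 2) + Fraction(n * (n + 1), 2 * R))
    return w(n, R, a) * (1 - b_n) + a_n
```

For $R = 5$ and $a = 1$, this gives $f(0), \dots, f(4) = 2,\ 3,\ 13/3,\ 13/2,\ 12$, which is strictly increasing as Lemma 7 states.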
Lemma 8.
Assuming that $L \ge 2R$, there exists an integer $d \in [0, R-1]$ such that $\frac{f(d)}{a} < L \le \frac{f(d+1)}{a}$.
Proof.
We have $f(0)/a = (R-1)/2$ and $f(R)/a = +\infty$. Hence, as $f(\cdot)$ is strictly increasing in $n$, and $f(0)/a = (R-1)/2 < 2R \le L \le f(R)/a = +\infty$, there exists one and only one $d \in [0, R-1]$ that satisfies $\frac{f(d)}{a} < L \le \frac{f(d+1)}{a}$. That completes the proof.

Therefore, according to the definition of $f$, $d$ satisfies $x_{d,d-1} \le x_{L,d}$ and $x_{L,d+1} \le x_{d+1,d}$.

Theorem 3.
The Whittle index expressions are:

for $0 \le n \le d$: $W(n) = w_n = x_{n,n-1} = \dfrac{aRn}{R-n}$

for $d < n \le L$: $W(n) = x_{L,d} = \dfrac{a\big[L - \big(\frac{R-1}{2} + \frac{d(d+1)}{2R}\big)\big]}{1 - \big(1 - \frac{d}{2R}\big)\big(\frac{d+1}{R}\big)}$

Proof. To prove Theorem 3, according to Proposition 4, we have to prove that, for $0 \le j \le d$, the largest minimizer at step $j$ is $j$, and that at step $d+1$ it is $L$. In other words, for all $0 \le j \le d$, we have that $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} < \frac{a_n - a_{j-1}}{b_n - b_{j-1}}$ for all $n > n_{j-1} + 1 = j$ such that $b_n \ne b_{j-1}$, and $\frac{a_L - a_d}{b_L - b_d} \le \frac{a_n - a_d}{b_n - b_d}$ for all $n \ge n_d + 1 = d + 1$ such that $b_n \ne b_d$, with $n_j$ being the largest minimizer at step $j$.

To that end, it turns out to be relevant to demonstrate that $x_{L,R} \le x_{n,R}$ for $L-R+1 < n \le L-1$. For the detailed proof, see Appendix Q.

B. $L < 2R$
The indexability property can be easily established by observing that $\sum_{i=0}^{n} u^n(i)$ is strictly increasing in $n$. Furthermore, $\sum_{i=0}^{L} a u^n(i) i$ is increasing in $n$. Therefore, we can apply Algorithm 1 to compute the Whittle index expressions. According to [13, Corollary 2.1], if $x_{n,n-1}$ is increasing in $n$, then the Whittle index of state $n$ is $x_{n,n-1}$. Indeed, for $L < R$, $x_{n,n-1}$ is increasing in $n$, and we have the following theorem.

Theorem 4.
Denoting − L ρ/ − ρ/ L + Lρ − /ρ by b .The Whittle index of state n :For n ∈ [0 , L − : W ( n ) = x n,n − = − ρ ( L + R )(1 − ρ ) n +( − n +1) ρ/ Lρ − ρ/ ρ (1 − ρ ) n For n = L : W ( L ) = x L,L − = L − [( L + R )(1 − ρ ) L − ( L − ρ/ L − Lρ − ρ/ b ](1 − ρ ) L R ≤ L < R : Regarding the case where R ≤ L < R , the class is indexable since (cid:80) ni =0 u n ( i ) is strictlyincreasing in n . Similar to the other cases, the algorithm 1 can be applied to obtain the expression of the Whittleindex for different states. Following the same methodology in appendix Q, we obtain the Whittle index expressionas follows: Theorem 5.
There exists $d$ such that $x_{d,d-1} \le x_{L,d} \le x_{d+1,d}$ and $d < L - R$, where the Whittle index expressions are given by:

for $0 \le n \le d$: $W(n) = w_n = x_{n,n-1} = \dfrac{aRn}{R-n}$

for $d < n \le L$: $W(n) = x_{L,d} = \dfrac{a\big[L - \big(\frac{R-1}{2} + \frac{d(d+1)}{2R}\big)\big]}{1 - \big(1 - \frac{d}{2R}\big)\big(\frac{d+1}{R}\big)}$

C. Whittle index policy for the original problem
We now consider the original optimization problem (3) and propose a simple Whittle index policy. This policy consists of simply allocating the channels to the $M$ users that have the highest Whittle index at time $t$; it is denoted by $WI$ and computed using the simple expressions in Theorem 3.

Figure 2: Illustration of the function $a_n - W b_n$ for different values of $n$.

In Figure 2, we consider $L > 2R$. The solid lines are for $n \le R-1$, the dashed ones are for $R \le n \le L-R$, the dotted ones are for $L-R+1 \le n \le L-1$, and the line with circle markers is for $n = L$. As one can see, this latter line is very steep compared with the other curves. This means that all the intersection points between the line with circle markers and the solid lines are smaller than all the intersection points between the dotted lines and the solid ones, which confirms our Whittle index expressions. From now on, we consider that $L > 2R$; furthermore, we make the following assumptions.

Assumption 1.
The buffer length $L$ satisfies:

$$L > \max_{(i,j) \in [1,K]} \Big\{\frac{a_j}{a_i}\Big\} \max_{k \in [1,K]} \Big\{\frac{R_k^2 - 1}{2}\Big\} \quad (84)$$
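One way to read the buffer-length condition is through Lemma 8. Evaluating Definition 9 at $n = R-1$ gives $f(R-1)/a = (R^2-1)/2$, so whenever $L$ exceeds this value the integer $d$ of Lemma 8 equals $R-1$. The sketch below checks this numerically; the helper names `f_over_a` and `lemma8_d` are hypothetical, and the search over $d$ is a direct, unoptimized transcription of the inequality in Lemma 8.

```python
from fractions import Fraction
import math


def f_over_a(n, R):
    """f(n)/a from Definition 9 for 0 <= n <= R-1, and +inf at n = R."""
    if n == R:
        return math.inf
    w_over_a = Fraction(R * n, R - n)
    b_n = (1 - Fraction(n, 2 * R)) * Fraction(n + 1, R)
    a_n_over_a = Fraction(R - 1, 2) + Fraction(n * (n + 1), 2 * R)
    return w_over_a * (1 - b_n) + a_n_over_a


def lemma8_d(R, L):
    """Return the d in [0, R-1] with f(d)/a < L <= f(d+1)/a (Lemma 8)."""
    for d in range(R):
        if f_over_a(d, R) < L <= f_over_a(d + 1, R):
            return d
    raise ValueError("no d found; Lemma 8 assumes L >= 2R")
```

For $R = 5$, the threshold value is $(R^2-1)/2 = 12$: `lemma8_d(5, 12)` returns 3, while `lemma8_d(5, 13)` and `lemma8_d(5, 50)` both return $R - 1 = 4$.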
The proportion of queues scheduled at each time, $\alpha = M/N$, satisfies:

$$\alpha \ge \frac{1}{2} - \sum_{k=1}^{K} \frac{\gamma_k}{2R_k} \quad (85)$$

We justify in the next sections the reasons behind introducing these two assumptions.

VI. FURTHER ANALYSIS OF THE OPTIMAL SOLUTION OF THE RELAXED PROBLEM
In this section, we provide further analysis and give the structure of the optimal solution of the relaxed problem, which will be useful for the proof of optimality of the Whittle's Index policy. As we have seen in Section III, for any given $W$, the optimal solution of the dual relaxed problem (7) is a threshold-based policy for each user. By using the Whittle index expressions defined in Section V, we derive the optimal threshold of each class as a function of the Lagrange parameter $W$. In this section, we denote by $W_i^k$ the Whittle index at state $i$ in class $k$ (the user and class indices cannot be dropped here as in the previous sections). We denote by $l = (l_1, l_2, \cdots, l_K)$ the vector of thresholds, one per class $k$. As $f(R_k - 1)/a = (R_k^2 - 1)/2$, then considering Assumption 1, we have that for all $k$, $L > \max_{(i,j) \in [1,K]} \{\frac{a_j}{a_i}\} \max_k \frac{f(R_k - 1)}{a} \ge \max_k \frac{f(R_k - 1)}{a}$. That means that, for each class $k$, the integer $d_k$ (which depends on the maximum rate $R_k$) defined in Lemma 8 is equal to $R_k - 1$. This allows us to obtain a general expression of the Whittle index for every class $k$. We denote by $u_k^n$ the stationary distribution for class $k$ under threshold policy $n$.

Proposition 5.
For a given $W$, the optimal threshold vector $l = (l_1(W), l_2(W), \cdots, l_K(W))$ for the dual problem satisfies, for each $k$:

$$l_k(W) = \max_i \{\arg\max_i \{W_i^k \mid W_i^k \le W\}\} \quad (86)$$

or

$$l_k(W) = \max_i \{\arg\max_i \{W_i^k \mid W_i^k < W\}\} \quad (87)$$

In other words, $l_k$ is the largest state among those that attain the largest Whittle index less than $W$, or strictly less than $W$. We note that the solution can also be a linear combination of the threshold policies $\max_i \{\arg\max_i \{W_i^k \mid W_i^k \le W\}\}$ and $\max_i \{\arg\max_i \{W_i^k \mid W_i^k < W\}\}$.

Proof. See Appendix R.

Now, we give the structure of the optimal solution of the constrained relaxed problem.
Proposition 6.
The solution of the constrained relaxed problem is a threshold policy $l(W^*)$, with $l$ being the function vector defined in Proposition 5, and $W^*$ satisfies $\alpha = \sum_{k=1}^{K} \gamma_k \sum_{i=l_k(W^*)+1}^{L} u_k^{l_k(W^*)}(i)$.

Proof. See Appendix S.

However, a $W^*$ that satisfies the above constraint may not exist, since $\alpha$ is a real number that can take any value in $[0, 1]$, whereas $\sum_{k=1}^{K} \gamma_k \sum_{i=l_k(W)+1}^{L} u_k^{l_k(W)}(i)$ is discrete, because the vector $l(W)$ can only take discrete values in $[0, L]^K$. To deal with this issue, we use the fact that, for some values of $W$, the optimal solution of the dual problem can be a linear combination, or more precisely a randomized policy, between two threshold policies for a given class, as mentioned in Proposition 5. Our task is then to find, among these values of $W$, the one for which there exists a randomization parameter $\theta$ such that the constraint is satisfied with equality. To that end, we introduce the following proposition.

Proposition 7.
Under Assumptions 1 and 2, there exist a class $m$, a state $p$, and a randomization parameter $\theta$ such that the optimal solution of the dual problem when the Lagrangian parameter is $W = W_p^m$ is characterized by:

• For $k \ne m$, the optimal threshold is $l_k(W_p^m) = \max_i \{\arg\max_i \{W_i^k \mid W_i^k \le W_p^m\}\}$.

• For $k = m$, the optimal solution is a randomized policy between the two threshold policies $l_m(W_p^m) = \max_i \{\arg\max_i \{W_i^m \mid W_i^m \le W_p^m\}\}$ and $l_m(W_p^m) - 1 = \max_i \{\arg\max_i \{W_i^m \mid W_i^m < W_p^m\}\}$, where the randomization factor $\theta$ is the probability of adopting the policy $l_m(W_p^m)$ and $1 - \theta$ the probability of adopting the policy $l_m(W_p^m) - 1$.

• The constraint (4) is satisfied with equality, i.e.

$$\alpha = \sum_{k \ne m} \sum_{i=l_k(W_p^m)+1}^{L} \gamma_k u_k^{l_k(W_p^m)}(i) + \sum_{i=l_m(W_p^m)+1}^{L} \gamma_m u_m^*(i) + (1-\theta) \gamma_m u_m^{l_m(W_p^m)-1}(l_m(W_p^m))$$

where $u_m^* = \theta u_m^{l_m(W_p^m)} + (1-\theta) u_m^{l_m(W_p^m)-1}$.

• For all $k$, $l_k(W_p^m) < R_k$.

Proof. See Appendix T.

The solution of the dual problem described in Proposition 7 satisfies the constraint (4) with equality; therefore, according to Proposition 6, this solution is indeed the optimal solution of the constrained problem. In that regard, the optimal cost $C^{RP,N}$ is expressed as follows:

$$C^{RP,N} = \sum_{k \ne m} \sum_{i=0}^{L} N \gamma_k a_k u_k^{l_k(W_p^m)}(i)\, i + \sum_{i=0}^{L} N \gamma_m a_m u_m^*(i)\, i \quad (88)$$

VII. LOCAL OPTIMALITY
In this section, we show that the Whittle's Index policy is asymptotically locally optimal. Asymptotic optimality means that, for a large number of users $N$ and a large number of channels $M$ ($\alpha = \frac{M}{N}$ is a constant value), the Whittle's Index policy is optimal. For that, we compare the average cost obtained by the Whittle's Index policy WI with the one obtained for the relaxed problem RP. Explicitly, denoting by $C_T^N(x)$ the average cost obtained over the time duration $0 \le t \le T$ under the Whittle's Index policy conditioned on the initial state $x$, we show that $C_T^N(x)$ tends to $C^{RP,N}$ when $N$ scales. The reason behind comparing $C^{RP,N}$ and $C_T^N(x)$ is that $C^{RP,N}$ is a lower bound on the expected average cost obtained by any policy that solves the original Problem (3). This means that it is sufficient to prove that $C_T^N(x)$ converges to $C^{RP,N}$ when $T$ and $N$ scale in order to establish the asymptotic optimality of the Whittle's Index policy. For that, we need the optimal cost expression of the relaxed problem, $C^{RP,N}$, derived in Section VI.

First, we denote by $Z_i^{k,N}$ the proportion of queues at state $i$ in class $k$ over all the queues of the system. In other words, it denotes the number of queues at state $i$ in class $k$ over the number of all users, which is $N$. We have that $Z^N = (Z^{1,N}, \cdots, Z^{K,N})$ with $Z^{k,N} = (Z_0^{k,N}, \cdots, Z_L^{k,N})$ and $\sum_{i=0}^{L} Z_i^{k,N} = \gamma_k$ for each class $k$.

The expression of $C_T^N(x)$ as a function of $Z^N$ is $\frac{1}{T} E\big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{L} a_k Z_i^{k,N}(t)\, i N \mid Z^N(0) = x\big]$, where $Z^N(t)$ evolves under the Whittle's Index policy. Denoting by $z^*$ the optimal proportion vector of the relaxed problem, we say that the Whittle's Index policy is asymptotically locally optimal if there exists $\delta > 0$ such that, when the initial proportion vector $Z^N(0)$ is within $\Omega_\delta(z^*)$ (i.e. $\|Z^N(0) - z^*\| < \delta$), $C_T^N(x)$ converges to $C^{RP,N}$ when $T$ and $N$ scale.

In order to prove that, we use the fluid limit technique, which consists of analyzing the evolution of the expectation of $Z^N(t)$ under the Whittle's Index policy. For that, we define the vector $z(t)$ as follows:

$$z(t+1) - z(t) \mid_{z(t) = z} = E\big[Z^N(t+1) - Z^N(t) \mid Z^N(t) = z\big] \quad (89)$$
If we denote by $w_j^h$ the Whittle index for class $h$ at state $j$ and by $p_i^k(z)$ the probability that a user is selected randomly among $z_i^k$ to transmit, one can easily show that [21]:

$$p_i^k(z) = \min\Big\{z_i^k, \max\Big(0, \alpha - \sum_{w_j^h > w_i^k} z_j^h\Big)\Big\} \Big/ z_i^k \quad (90)$$

We denote by $q_{i,j}^{k,0}$ and $q_{i,j}^{k,1}$ the probability to transition from state $i$ to state $j$ in a class $k$ queue if the queue is not scheduled or is scheduled for transmission, respectively. Then, the probability to transition from state $i$ to state $j$ in class $k$ is:

$$q_{i,j}^k(z) = p_i^k(z)\, q_{i,j}^{k,1} + (1 - p_i^k(z))\, q_{i,j}^{k,0} \quad (91)$$

Let $w^*$ be the Lagrangian parameter that gives the optimal solution of the relaxed problem. Then, according to Proposition 7, there exists a given class $m$ such that $w_{l_m}^m = w^*$, where the corresponding optimal solution of the relaxed problem is a threshold policy for each class $k \ne m$, denoted $l_k$, and a randomized policy between two threshold policies $l_m$ and $l_m - 1$ for class $m$. Moreover, $l_k < R_k$ for all $k$.

We define $w^*$ as the set of states such that, at any system state $z \in w^*$, if we use the Whittle's Index policy, all users with a Whittle index value higher than $w^*$ are scheduled, the users with a Whittle index value smaller than $w^*$ stay idle, and the users with index value $w^*$ are scheduled with a certain randomization. Specifically, $w^* = \{z : \sum_{w_i^k > w^*} z_i^k < \alpha, \ \sum_{w_i^k \ge w^*} z_i^k \ge \alpha\}$.

If we start with $z(0)$ in $w^*$, then:

$$z_i^k(t+1) - z_i^k(t) = \sum_{j \ne i} q_{j,i}^k(z(t))\, z_j^k(t) - \sum_{j \ne i} q_{i,j}^k(z(t))\, z_i^k(t) \quad (92)$$

Moreover, we have the following equality for all $k$ and $t$:

$$\sum_{j=0}^{L} z_j^k(t) = \gamma_k \quad (93)$$

and as $z(t) \in w^*$, we can show the following:

1) $k \ne m$:

$$z_i^k(t+1) = \sum_{j=0}^{l_k - 1} \big(q_{j,i}^{k,0} - q_{l_k,i}^{k,0}\big) z_j^k(t) + \sum_{j=l_k+1}^{L} \big(q_{j,i}^{k,1} - q_{l_k,i}^{k,0}\big) z_j^k(t) + \gamma_k q_{l_k,i}^{k,0} \quad (94)$$

2) $k = m$:

$$\begin{aligned}
z_i^m(t+1) ={} & \sum_{j=0}^{l_m - 1} \big(q_{j,i}^{m,0} - q_{l_m,i}^{m,0}\big) z_j^m(t) + \sum_{j=l_m+1}^{L} \big(q_{j,i}^{m,1} - q_{l_m,i}^{m,1}\big) z_j^m(t) + (1-\alpha)\, q_{l_m,i}^{m,0} + \alpha\, q_{l_m,i}^{m,1} \\
& - \Big(\sum_{\substack{w_j^h > w_{l_m}^m \\ h \ne m,\ j \ne l_h}} z_j^h(t)\Big) q_{l_m,i}^{m,1} - \Big(\sum_{\substack{w_j^h \le w_{l_m}^m \\ h \ne m,\ j \ne l_h}} z_j^h(t)\Big) q_{l_m,i}^{m,0} \\
& + \Big(\sum_{\substack{h=1 \\ h \ne m}}^{K} \sum_{\substack{j=0 \\ j \ne l_h}}^{L} 1_{\{w_{l_h}^h > w_{l_m}^m\}} z_j^h(t)\Big) q_{l_m,i}^{m,1} + \Big(\sum_{\substack{h=1 \\ h \ne m}}^{K} \sum_{\substack{j=0 \\ j \ne l_h}}^{L} 1_{\{w_{l_h}^h \le w_{l_m}^m\}} z_j^h(t)\Big) q_{l_m,i}^{m,0} \\
& - \sum_{\substack{h=1 \\ h \ne m}}^{K} \gamma_h \Big(1_{\{w_{l_h}^h > w_{l_m}^m\}} q_{l_m,i}^{m,1} + 1_{\{w_{l_h}^h \le w_{l_m}^m\}} q_{l_m,i}^{m,0}\Big)
\end{aligned} \quad (95)$$
Let $g_i^m = \sum_{h=1, h \ne m}^{K} \gamma_h \big(1_{\{w_{l_h}^h > w_{l_m}^m\}} q_{l_m,i}^{m,1} + 1_{\{w_{l_h}^h \le w_{l_m}^m\}} q_{l_m,i}^{m,0}\big)$ for all $i \in [0, L]$, and $C = (c_1, \cdots, c_K)$ such that $c_k = (\gamma_k q_{l_k,0}^{k,0}, \cdots, \gamma_k q_{l_k,L}^{k,0})$ for each $k \ne m$ and $c_m = ((1-\alpha) q_{l_m,0}^{m,0} + \alpha q_{l_m,0}^{m,1} - g_0^m, \cdots, (1-\alpha) q_{l_m,L}^{m,0} + \alpha q_{l_m,L}^{m,1} - g_L^m)$.

Then, by replacing in the equations above, for all $k$, $z_{l_k}^k(t)$ with $\gamma_k - \sum_{j=0, j \ne l_k}^{L} z_j^k(t)$, we obtain the following linear relation in $w^*$ between $\tilde{z}(t+1)$ and $\tilde{z}(t)$, where $\tilde{z}$ is the proportion vector in which the elements $z_{l_k}^k$ for the different $k$ are eliminated:

$$\tilde{z}(t+1) = Q \tilde{z}(t) + C \quad (96)$$

The expression of the matrix $Q$ is given in Appendix U. The vector solution of the relaxed problem, denoted by $\tilde{z}^*$, is the fixed point of the aforementioned linear equation. Moreover, as $\tilde{z}^* \in w^*$, if $\tilde{z}(0) = \tilde{z}^* + e$, then we obtain:

$$\tilde{z}(t) - \tilde{z}^* = Q^t e \quad (97)$$

The analysis of the above linear system is therefore important to prove the local optimality. We first provide the following lemma.

Lemma 9.
If for all eigenvalues $\lambda$ of $Q$, $|\lambda| < 1$, then there exists a neighborhood $\Omega_\sigma(\tilde{z}^*) \subseteq w^*$ such that if $\tilde{z}(0) \in \Omega_\sigma(\tilde{z}^*)$, we have the following:

1) For all $t \ge 0$, $\|\tilde{z}(t) - \tilde{z}^*\| < \sigma$ ($\tilde{z}(t) \in w^*$).

2) $\tilde{z}(t)$ converges to $\tilde{z}^*$.

Proof. The proof follows from the convergence of the linear system.
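The mechanism behind Lemma 9 can be illustrated on a toy linear recursion of the form $\tilde{z}(t+1) = Q\tilde{z}(t) + C$. The matrix and vector below are purely hypothetical placeholders (the actual $Q$ is given in Appendix U); they are chosen so that every absolute row sum of $Q$ is below 1, which guarantees that all eigenvalues have modulus below 1 and that the perturbation $Q^t e$ shrinks geometrically.

```python
def iterate(Q, C, z0, steps):
    """Apply z <- Q z + C for a fixed number of steps and return the result."""
    z = list(z0)
    size = len(z)
    for _ in range(steps):
        z = [sum(Q[i][j] * z[j] for j in range(size)) + C[i] for i in range(size)]
    return z


# Hypothetical 2x2 example: absolute row sums are 0.7 and 0.4, both below 1.
Q_toy = [[0.5, 0.2], [0.1, 0.3]]
C_toy = [1.0, 2.0]
# Fixed point of z = Q z + C for this toy system: z* = (10/3, 10/3).
z_star = [10.0 / 3.0, 10.0 / 3.0]
# Start from a perturbed point z* + e and iterate.
z_end = iterate(Q_toy, C_toy, [z_star[0] + 0.5, z_star[1] - 0.5], 200)
```

After 200 iterations the perturbation has been multiplied by at most $0.7^{200}$ in the maximum norm, so `z_end` agrees with `z_star` to within floating-point accuracy.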
Proposition 8.
For every eigenvalue $\lambda$ of $Q$, $|\lambda| < 1$.

Proof.
See the proof in Appendix U.

The aforementioned result, combined with Lemma 9, proves the convergence of the fluid limit system (i.e. $\tilde{z}(t+1) = Q\tilde{z}(t) + C$). Consequently, $z$ converges to $z^*$, the fixed point of Equation (92). However, the above result is not enough to prove the local optimality, as we have to show that the stochastic vector $Z^N(t)$ converges to $z^*$ in probability. For that, we introduce the discrete-time version of Kurtz's theorem applied to our problem (see [25]):

Proposition 9.
There exists a neighborhood $\Omega_\delta(z^*)$ of $z^*$ such that if $Z^N(0) = z(0) = x \in \Omega_\delta(z^*)$, then for any $\mu > 0$ and finite time horizon $T$, there exist positive constants $C_1$ and $C_2$ such that P x ( sup ≤ t

Lemma 10. If $Z^N(0) = x \in \Omega_\delta(z^*)$, then for any $\mu > 0$, there exists a time $T_0$ such that for any $T > T_0$, there exist positive constants $s_1$ and $s_2$ with P x ( sup T ≤ t

See Appendix V.

Now we are ready to prove the asymptotic local optimality of the proposed scheduling policy.

Proposition 10. If the initial state is in the set $\Omega_\delta(z^*)$, then

$$\lim_{T \to \infty} \lim_{N \to \infty} \frac{C_T^N(x)}{N} = \frac{C^{RP,N}}{N} \quad (100)$$

Proof. See Appendix W.

VIII. GLOBAL ASYMPTOTIC OPTIMALITY

In this section, we prove that, from any initial state $x$, the expected time average cost obtained with the Whittle's Index policy is optimal when $N$ is very large. In contrast to the method used to prove the local optimality, we work here with the steady-state distribution of the stochastic process $Z^N(t)$. To ensure that such a stationary distribution exists, we need to show that there is at least one recurrent state. Since the states evolve according to a finite-state Markov chain, we just need to prove that there exists a state reachable from any other state.

Lemma 11. The state $z(0) = (z^1(0), \cdots, z^K(0))$, defined for each class $k$ as $z^k(0) = (1, 0, \cdots, 0)$, is reachable from any initial state using the Whittle's Index policy.

Proof. See Appendix X.

This lemma is stronger than proving the existence of a recurrent state. Indeed, it allows us to deduce that $Z^N(t)$ evolves in one recurrent aperiodic class, and that there exists a stationary distribution for $Z^N(t)$, denoted by $Z^N(\infty)$. We still need to check whether, for a fixed $N$, there exists at least one recurrent state within $\Omega_\epsilon(z^*)$, as otherwise $\Omega_\epsilon(z^*)$ would be a transient class. If such a state exists, $Z^N(t)$ will surely evolve in one recurrent class that contains this recurrent state.
For that, we demonstrate here that $z^*$ is reachable from any state for a fixed $N$. Since $z(0)$ is reachable from any state, we just need to find a path from $z(0)$ to $z^*$. First, we express $\alpha$ as a function of the optimal proportion $z^*$. Rewriting the expression of $\alpha$ given in Proposition 7, we get:

$$\alpha = \sum_{k \ne m} \sum_{i=l_k+1}^{L} \gamma_k u_k^{l_k}(i) + \sum_{i=l_m+1}^{L} \gamma_m u_m^*(i) + (1-\theta) \gamma_m u_m^{l_m - 1}(l_m) \quad (101)$$

The relation between the optimal vector $z^*$ and the stationary distribution under the optimal threshold is as follows:

For $k \ne m$: $z_h^{k,*} = \gamma_k u_k^{l_k}(h)$.

For $k = m$: $z_h^{m,*} = \gamma_m \big((1-\theta) u_m^{l_m - 1}(h) + \theta u_m^{l_m}(h)\big) = \gamma_m u_m^*(h)$.

When $h = l_m \le R_m - 1$, we have that $u_m^{l_m - 1}(l_m) = \rho_m = 1/R_m$, and $u_m^{l_m}(l_m) = \rho_m = 1/R_m = u_m^{l_m - 1}(l_m)$. Then:

$$z_{l_m}^{m,*} = \gamma_m [(1-\theta)\rho_m + \theta \rho_m] = \gamma_m \rho_m \quad (102)$$

Hence:

$$\gamma_m (1-\theta) u_m^{l_m - 1}(l_m) = \gamma_m (1-\theta) \rho_m = (1-\theta) z_{l_m}^{m,*} \quad (103)$$

Therefore:

$$\alpha = \sum_{k \ne m} \sum_{i=l_k+1}^{L} z_i^{k,*} + \sum_{i=l_m+1}^{L} z_i^{m,*} + (1-\theta) z_{l_m}^{m,*} \quad (104)$$

In addition, it will be useful for the subsequent analysis in this section to derive the exact expression of $u_k^{l_k}(h)$, for all states $h$, by applying the results found in Section IV when the threshold $l_k$ is strictly less than $R_k$. For $k \ne m$, we have:

$$u_k^{l_k}(h) = \begin{cases} \rho_k - (l_k - h)\rho_k^2 & 0 \le h \le l_k - 1 \\ \rho_k & l_k \le h \le R_k - 1 \\ (l_k + R_k - h)\rho_k^2 & R_k \le h \le l_k + R_k - 1 \end{cases} \quad (105)$$

If $k = m$:

$$u_m^*(h) = \begin{cases} \rho_m - (l_m - 1 - h + \theta)\rho_m^2 & 0 \le h \le l_m - 1 \\ \rho_m & l_m \le h \le R_m - 1 \\ (l_m + R_m - 1 - h + \theta)\rho_m^2 & R_m \le h \le l_m + R_m - 1 \end{cases} \quad (106)$$

Now, we find a path from state $z(0)$ to $z^*$ under the Whittle's Index policy.

Proposition 11. By applying the Whittle's Index policy, the steady state $z^*$ is reachable from the state $z(0)$.

Proof.
See Appendix Y.

From this proposition, the state $z^*$ is reachable from any state, which means that $z^*$ is a recurrent state. However, as we remark in the proof of Proposition 11, the considered actions schedule a proportion of users (i.e. not an integer value). This is not feasible for some (small) values of $N$, since the queues are not splittable. In fact, for some values of $N$, the state $z^*$ may not exist. On the other hand, we can say that for large enough $N$, and for any $\epsilon > 0$, there exists at least one recurrent state within the neighborhood $\Omega_\epsilon(z^*)$. This ensures that there is a path to enter a neighborhood $\Omega_\epsilon(z^*)$ from any initial state. However, it is important to ensure that the time to enter $\Omega_\epsilon(z^*)$ does not scale up with $N$. For that, we make the following assumption, which will be justified later via numerical studies in Section IX.

Assumption 3. We assume that the expected time to enter a neighborhood of $z^*$ from any initial state $x$ does not depend on the number of queues $N$. In other words, for all $N$, the time to enter a neighborhood $\Omega_\epsilon(z^*)$, denoted by $\Gamma_x^N(\epsilon)$, is bounded by a constant $T_{b_\epsilon}$.

Now we provide a useful lemma that allows us to demonstrate the global asymptotic optimality.

Figure 3: Hitting time of $\Omega_\epsilon(z^*)$ as a function of $N$: (a) $Z^N(0) = x$, (b) $Z^N(0) = y$.

Lemma 12. Under Assumption 3, and for any $\epsilon$, we have that:

$$\lim_{N \to +\infty} P(Z^N(\infty) \in \Omega_\epsilon(z^*)) = 1 \quad (107)$$

Proof. See the lemma in [14].

Since we have found a stationary distribution of $Z^N(t)$ under the Whittle's Index policy, the expected average cost under the Whittle's Index policy for a fixed $N$ can be written as follows:

$$\lim_{T \to \infty} \frac{C_T^N(x)}{N} = \sum_{k=1}^{K} \sum_{i=0}^{L} a_k E\big[Z_i^{k,N}(\infty)\big]\, i \quad (108)$$

Theorem 6. Under Assumption 3, and for any initial state, we have that:

$$\lim_{N \to +\infty} \lim_{T \to \infty} \frac{C_T^N(x)}{N} = \frac{C^{RP,N}}{N} \quad (109)$$

Proof.
See Appendix Z.

IX. NUMERICAL RESULTS

In this section, we provide numerical results that confirm the asymptotic optimality of the developed Whittle index policy. To that end, we consider two classes with respective rates $R_1 = 5$ and $R_2 = 10$. Moreover, we suppose that α = 1/ , $L = 50$, $\gamma_1 = \gamma_2 = 1/2$, and $a = a_1 = a_2 = 1$. We also consider two initial states $x$ and $y$ such that all the queues are equal to $0$ and $L$, respectively.

A. Verification of Assumption 3

We plot in Figure 3 the evolution of the time needed to enter a neighborhood $\Omega_\epsilon(z^*)$ (i.e. the hitting time of $\Omega_\epsilon(z^*)$) with respect to $N$, given that $\epsilon$ is small enough. One can see that, for large values of $N$, the hitting time can be considered constant and does not diverge, and this is true for both initial states $x$ and $y$. This implies that the hitting time is bounded for large values of $N$, which supports Assumption 3.

Figure 4: Performance evaluation of the Whittle's Index policy: (a) $Z^N(0) = x$, (b) $Z^N(0) = y$.

B. Performance of the Whittle's Index policy

In this section, we compare the long-run expected average cost per user under the Whittle's Index policy, i.e. $\lim_{T \to \infty} C_T^N(x)$, with the one obtained by applying the Max-Weight policy $MW$. The latter schedules, at each time $t$, the $M$ weighted longest queues (equivalently the $M$ highest $a_k Q_i^k(t)$). We also compare the performance of these two policies with the optimal cost per user obtained by using the optimal solution of the relaxed problem, i.e. $C^{RP}/N$. The results are plotted in Figures (4.a) and (4.b) for the initial states $x$ and $y$ (defined above), respectively. One can see that, for large $N$, regardless of the initial state, the cost incurred by adopting the Whittle's Index policy tends to the optimal cost of the relaxed problem, which shows that it asymptotically converges to the optimal solution of the original problem.
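The two scheduling rules compared above differ only in the quantity used to rank the users at each time slot. The following minimal sketch illustrates the selection step; it is not the simulation code behind the figures. It assumes that each user is described by a hypothetical tuple `(q, R, a)`, that $R \ge 2$, that Assumption 1 holds so that $d = R - 1$ in Theorem 3, and that ties are broken by user position, which is an arbitrary choice of this sketch.

```python
from fractions import Fraction


def whittle_index(n, R, L, a=1):
    """Theorem 3 with d = R-1 (valid under Assumption 1, R >= 2, L >= 2R)."""
    if n <= R - 1:
        return Fraction(a) * R * n / (R - n)
    # x_{L,R-1} = (a*L - a*(R-1)) / (1 - (R+1)/(2R)) = 2*a*R*(L-R+1)/(R-1)
    return Fraction(a) * 2 * R * (L - R + 1) / (R - 1)


def top_m(users, M, key):
    """Positions of the M users with the largest key (ties: lowest position)."""
    ranked = sorted(range(len(users)), key=lambda u: (-key(users[u]), u))
    return sorted(ranked[:M])


def whittle_key(user, L):
    q, R, a = user
    return whittle_index(q, R, L, a)


def max_weight_key(user):
    q, _, a = user
    return a * q
```

As a small usage example with $L = 50$ and one channel, take three users `[(4, 5, 1), (6, 10, 1), (1, 5, 1)]`. Max-Weight serves the second user (longest queue), whereas the Whittle index rule serves the first one, whose index $5 \cdot 4/(5-4) = 20$ exceeds $10 \cdot 6/(10-6) = 15$.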
One can also remark that the optimal cost of the relaxed problem per user is constant and does not depend on $N$ (see Section VI). Lastly, we remark that the solution given by $MW$ is suboptimal and lags behind our proposed scheduling scheme.

C. Fairness among users

In order to improve the fairness among the users in the network, one can use the Whittle index policy developed in this paper with some modifications. For example, we introduce in this section a new policy $\Theta$, which works as follows: at each time slot $t$, we schedule the users with the highest $W_k(q_i^k(t))\, D_k(q_i^k(t))$, where $q_i^k(t)$ is the queue state of user $i$ in class $k$, $W_k(q_i^k(t))$ is the Whittle index of state $q_i^k(t)$ when the transmission rate is $R_k$, and $D_k(q_i^k(t)) = \frac{1}{t}\sum_{u=1}^{t} a_k q_i^k(u)$. To evaluate numerically the performance of this policy, we consider the case of two classes of users. To that end, we consider the two costs $C_1(N)$ and $C_2(N)$ incurred by the users of class 1 and the users of class 2, respectively, namely $C_1(N) = \lim_{T\to\infty} \frac{1}{T} E\left[\sum_{t=0}^{T-1} \sum_{i=1}^{\gamma_1 N} a_1 q_i^1(t) \mid x\right]$ and $C_2(N) = \lim_{T\to\infty} \frac{1}{T} E\left[\sum_{t=0}^{T-1} \sum_{i=1}^{\gamma_2 N} a_2 q_i^2(t) \mid x\right]$. We plot these quantities with respect to $N$ in Figure 5. In Figure (5.a), the costs are obtained by applying the new policy $\Theta$, while in Figure (5.b) the standard Whittle index policy is applied. We conclude that the new policy performs better in terms of fairness, since it reduces the gap between the costs of the two classes of users.

Figure 5: Evaluation of $C_1$ and $C_2$ as a function of $N$: (a) Policy $\Theta$, (b) Whittle's Index policy

Figure 6: Performance evaluation of Whittle's Index policy: $L < R$

D. Performance of Whittle Index when $L < R$

To get more comprehensive results, we also evaluate the performance of the Whittle index policy when $L < R$ by considering $L = 10$, $R_1 = 20$ and $R_2 = 30$. We let α = 1/ and all cost coefficients $a_k$ equal to 1.
To that end, we compare the long-run expected average cost per user under the Whittle index policy with the one obtained by applying the Max-Weight policy $MW$. We see in Figure 6 that the Whittle index policy is still asymptotically optimal even when $L < R$. Hence, we can presume that the Whittle index policy is asymptotically optimal regardless of the value of $L$. This has been proved analytically in this paper when $L \geq R$.

X. CONCLUSION

In this paper, we have studied the problem of user and channel scheduling under bursty traffic arrivals. At each time slot, only $M$ channels can be allocated to the users, and a user can be allocated at most one channel. We formulated a Lagrangian relaxation of the optimization problem and provided a characterization of the optimal solution of this relaxed problem. We then developed a simple Whittle index policy to allocate the channels to the users and proved its local and global asymptotic optimality when the numbers of users and channels are large enough. This result is of interest because the developed Whittle index policy has low complexity and is near-optimal for a large number of users. We then provided numerical results that corroborate our claims.

REFERENCES
[1] S. Kriouile, M. Larranaga, and M. Assaad, “Whittle index policy for multichannel scheduling in queueing systems,” in IEEE International Symposium on Information Theory (ISIT).
[2] M. Deghel, M. Assaad, M. Debbah, and A. Ephremides, “Queueing stability and CSI probing of a TDD wireless network with interference alignment,” IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 547–576, 2018.
[3] A. Destounis, M. Assaad, M. Debbah, and B. Sayadi, “Traffic-aware training and scheduling for the 2-user MISO broadcast channel,” in Information Theory (ISIT), 2014 IEEE International Symposium on. IEEE, 2014, pp.
1376–1380.
[4] ——, “Traffic-aware training and scheduling for MISO wireless downlink systems,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2574–2599, 2015.
[5] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, 1992.
[6] ——, “Dynamic server allocation to parallel queues with randomly varying connectivity,” IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466–478, 1993.
[7] M. J. Neely, “Optimal energy and delay tradeoffs for multiuser wireless downlinks,” IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3095–3113, 2007.
[8] L. Georgiadis, M. J. Neely, L. Tassiulas et al., “Resource allocation and cross-layer control in wireless networks,” Foundations and Trends® in Networking, vol. 1, no. 1, pp. 1–144, 2006.
[9] C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of optimal queuing network control,” Mathematics of Operations Research, vol. 24, no. 2, pp. 293–305, 1999.
[10] K. Liu and Q. Zhao, “Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5547–5567, 2010.
[11] P. Ansell, K. D. Glazebrook, J. Niño-Mora, and M. O’Keeffe, “Whittle’s index policy for a multi-class queueing system with convex holding costs,” Mathematical Methods of Operations Research, vol. 57, no. 1, pp. 21–39, 2003.
[12] C. Buyukkoc, P. Varaiya, and J. Walrand, “c mu rule revisited,” Adv. Appl. Prob., vol. 17, no. 1, pp. 237–238, 1985.
[13] M. Larrañaga, “Dynamic control of stochastic and fluid resource-sharing systems,” Ph.D. dissertation, 2015.
[14] W. Ouyang, A. Eryilmaz, and N. B. Shroff, “Downlink scheduling over Markovian fading channels,” IEEE/ACM Transactions on Networking, vol. 24, no. 3, pp. 1801–1812, 2016.
[15] Y. Cui, V. K. Lau, R.
Wang, H. Huang, and S. Zhang, “A survey on delay-aware resource control for wireless systems—large deviation theory, stochastic Lyapunov drift, and distributed stochastic learning,” IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1677–1701, 2012.
[16] I. Bettesh and S. Shamai, “Optimal power and rate control for minimal average delay: The single-user case,” IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 4115–4141, 2006.
[17] R. Wang and V. K. Lau, “Delay-aware two-hop cooperative relay communications via approximate MDP and stochastic learning,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7645–7670, 2013.
[18] Y. Cui and V. K. Lau, “Distributive stochastic learning for delay-optimal OFDMA power and subband allocation,” IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4848–4858, 2010.
[19] W. Ouyang, S. Murugesan, A. Eryilmaz, and N. B. Shroff, “Exploiting channel memory for joint estimation and scheduling in downlink networks,” in INFOCOM, 2011 Proceedings IEEE. IEEE, 2011, pp. 3056–3064.
[20] M. Larrañaga, M. Assaad, A. Destounis, and G. S. Paschos, “Asymptotically optimal pilot allocation over Markovian fading channels,” IEEE Transactions on Information Theory, 2017.
[21] R. R. Weber and G. Weiss, “On an index policy for restless bandits,” Journal of Applied Probability, vol. 27, no. 3, pp. 637–648, 1990.
[22] P. Whittle, “Restless bandits: Activity allocation in a changing world,” Journal of Applied Probability, vol. 25, no. A, pp. 287–298, 1988.
[23] S. M. Ross, Introduction to stochastic dynamic programming. Academic Press, 2014.
[24] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
[25] T. G. Kurtz, “Strong approximation theorems for density dependent Markov chains,” Stochastic Processes and their Applications, vol. 6, no. 3, pp. 223–240, 1978.
[26] J. Gittins, K. Glazebrook, and R. Weber, Multi-armed bandit allocation indices.
John Wiley & Sons, 2011.
[27] K. P. Papadaki and W. B. Powell, “Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem,” European Journal of Operational Research, vol. 142, no. 1, pp. 108–127, 2002.
[28] Y. Ruan, W. Wang, Z. Zhang, and V. K. Lau, “Delay-aware massive random access for machine-type communications via hierarchical stochastic learning,” in Communications (ICC), 2017 IEEE International Conference on. IEEE, 2017, pp. 1–6.

APPENDIX A
PROOF OF PROPOSITION

Summing over $k$ and $i$, we obtain:
\[ \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\left[V_i^k(q_i^k)+\theta_i^k\right] = \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\min_{s_i^k}\Big\{C_k(q_i^k,s_i^k)+\sum_{q_i^{\prime k}}\Pr(q_i^{\prime k}\mid q_i^k,s_i^k)\,V_i^k(q_i^{\prime k})\Big\} \tag{110} \]
\[ = \min_{\mathbf{s}}\Big\{\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\Big[C_k(q_i^k,s_i^k)+\sum_{q_i^{\prime k}}\Pr(q_i^{\prime k}\mid q_i^k,s_i^k)\,V_i^k(q_i^{\prime k})\Big]\Big\}, \tag{111} \]
where $\mathbf{s} = (s_1^1, \ldots, s_{\gamma_1 N}^1, \ldots, s_1^K, \ldots, s_{\gamma_K N}^K)$. We also have that:
\[ \Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s}) = \sum_{q_i^{\prime k}}\Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s},q_i^{\prime k})\Pr(q_i^{\prime k}\mid \mathbf{q},\mathbf{s}) = \sum_{q_i^{\prime k}}\Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s},q_i^{\prime k})\Pr(q_i^{\prime k}\mid q_i^k,s_i^k), \tag{112} \]
for all $\mathbf{q} = (q_1^1, \ldots, q_{\gamma_1 N}^1, \ldots, q_1^K, \ldots, q_{\gamma_K N}^K)$ and $\mathbf{q}' = (q_1^{\prime 1}, \ldots, q_{\gamma_1 N}^{\prime 1}, \ldots, q_1^{\prime K}, \ldots, q_{\gamma_K N}^{\prime K})$.
Since $\Pr(q_i^{\prime k}\mid \mathbf{q},\mathbf{s})$ only depends on the decision taken with respect to user $i$ in class $k$, we obtain:
\[ \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\sum_{q_i^{\prime k}}\Pr(q_i^{\prime k}\mid q_i^k,s_i^k)\,V_i^k(q_i^{\prime k}) = \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\sum_{\mathbf{q}'}\sum_{q_i^{\prime k}}\Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s},q_i^{\prime k})\Pr(q_i^{\prime k}\mid q_i^k,s_i^k)\,V_i^k(q_i^{\prime k}) \tag{113} \]
\[ = \sum_{\mathbf{q}'}\Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s})\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}V_i^k(q_i^{\prime k}) \tag{114} \]
From the previous equations we obtain:
\[ \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}V_i^k(q_i^k)+\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\theta_i^k = \min_{\mathbf{s}}\Big[\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}C_k(q_i^k,s_i^k)+\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\sum_{q_i^{\prime k}}\Pr(q_i^{\prime k}\mid q_i^k,s_i^k)\,V_i^k(q_i^{\prime k})\Big] \tag{115} \]
\[ = \min_{\mathbf{s}}\Big[\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}C_k(q_i^k,s_i^k)+\sum_{\mathbf{q}'}\Pr(\mathbf{q}'\mid \mathbf{q},\mathbf{s})\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}V_i^k(q_i^{\prime k})\Big] \tag{116} \]
According to a theorem in [23], there exist a unique function $V$ and a constant $\theta$ that solve equation (8). Since we have found a bounded function $\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}V_i^k(q_i^k)$ and a constant $\sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\theta_i^k$ that also satisfy equation (8), it follows that $V(\mathbf{q}) = \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}V_i^k(q_i^k)$ and $\theta = \sum_{k=1}^{K}\sum_{i=1}^{\gamma_k N}\theta_i^k$. This is equivalent to finding, for each user, the decision that minimizes the right-hand side of each individual Bellman equation. This concludes the proof.

APPENDIX B
PROOF OF LEMMA

The cost function $C(\cdot,\cdot)$ is submodular. That is,
\[ \big(C(q+1,1)-C(q+1,0)\big)-\big(C(q,1)-C(q,0)\big) = a(q+1)+W-a(q+1)-(aq+W-aq) = 0 \leq 0. \]
The latter is obtained by substituting the values of $C(q',s)$ for $s \in \{0,1\}$ and $q' \in \{q, q+1\}$.
In order to prove that (cid:80) q (cid:48) P r ( q (cid:48) | q, s ) V ( q (cid:48) ) is submodular, we distinguish between two cases:Case 1) q < R , then: (cid:88) q (cid:48) P r ( q (cid:48) | q + 1 , V ( q (cid:48) ) − (cid:88) q (cid:48) P r ( q (cid:48) | q + 1 , V ( q (cid:48) ) = (cid:88) q (cid:48) =0 P r ( A = q (cid:48) ) V ( q (cid:48) ) − (cid:88) q (cid:48) = q +1 P r ( A = q (cid:48) − q − V ( q (cid:48) )= (cid:88) q (cid:48) =0 P r ( A = q (cid:48) ) V ( q (cid:48) ) − (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) + 1) ≤ (cid:88) q (cid:48) =0 P r ( A = q (cid:48) ) V ( q (cid:48) ) − (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) )= (cid:88) q (cid:48) P r ( q (cid:48) | q, V ( q (cid:48) ) − (cid:88) q (cid:48) P r ( q (cid:48) | q, V ( q (cid:48) ) (117)The inequality follows from the fact that V ( · ) is increasing. This concludes the proof for q < R .Case 2) q ≥ R , then: (cid:88) q (cid:48) P r ( q (cid:48) | q + 1 , V ( q (cid:48) ) − (cid:88) q (cid:48) P r ( q (cid:48) | q + 1 , V ( q (cid:48) ) = (cid:88) q (cid:48) P r ( A = q (cid:48) − q − R ) V ( q (cid:48) ) − (cid:88) q (cid:48) P r ( A = q (cid:48) − q − V ( q (cid:48) )= (cid:88) q (cid:48) = q +1 − R P r ( A = q (cid:48) − q − R ) V ( q (cid:48) ) − (cid:88) q (cid:48) = q +1 P r ( A = q (cid:48) − q − V ( q (cid:48) )= (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) − R + 1) − (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) + 1) (118)Moreover, we have: (cid:88) q (cid:48) P r ( q (cid:48) | q, V ( q (cid:48) ) − (cid:88) q (cid:48) P r ( q (cid:48) | q, V ( q (cid:48) ) = (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) − R ) − (cid:88) q (cid:48) = q P r ( A = q (cid:48) − q ) V ( q (cid:48) ) . 
(119)

Subtracting (119) from (118), we obtain
\[ \sum_{q'=q}\Pr(A=q'-q)\big[\big(V(q'-R+1)-V(q'-R)\big)-\big(V(q'+1)-V(q')\big)\big] \leq 0, \tag{120} \]
which follows from the R-convexity of $V(\cdot)$. Therefore, $\sum_{q'}\Pr(q'\mid q,s)V(q')$ is submodular.

APPENDIX C
PROOF OF PROPOSITION 2

When $i < L$:

1) $j \leq n$: Since $j \leq n$, the optimal decision is to stay idle. If $A$ denotes the number of arriving packets, the number of packets in the next time slot is $i = j + A$ with $A \leq R-1$, and hence $A = i - j$. Therefore, the probability of transitioning from state $j$ to state $i$ is the probability that $A = i - j$, which is exactly $\pi_{i-j}$.

2) $j > n$: The optimal decision in this case is to transmit. However, at most $\min(R, j)$ packets can be transmitted. Taking into account the $A$ arriving packets, the new state in the next time slot is $i = j - \min(R,j) + A = (j-R)^+ + A$, which implies that $A = i - (j-R)^+$. Hence, the probability of transitioning from state $j$ to state $i$ is the probability that $A$ equals $i - (j-R)^+$, which is $\pi_{i-(j-R)^+}$.

When $i = L$:

1) $j \leq n$: The optimal decision is the passive action, so the $A$ arriving packets are added to the $j$ packets present in the queue, and the number of packets in the next time slot is $j + A$. According to equation (1), since the buffer length $L$ cannot be exceeded, state $L$ is reached if $j + A \geq L$. Since $A \leq R-1$, the probability of this event, or equivalently the probability of transitioning from state $j$ to state $L$, is $\Pr(L-j \leq A \leq R-1) = \sum_{k=L-j}^{R-1}\Pr(A=k) = (R-L+j)\,\pi_{L-j}$.

2) $j > n$: The optimal decision is the active action, so to reach state $L$ the number of arriving packets $A$ must be in the set $[L-(j-R)^+, R-1]$.
Then the probability to transition from j to L is P r ( L − ( j − R ) + ≤ A ≤ R − 1) = (cid:80) R − k = L − ( j − R ) + P r ( A = k ) = ( R − L + ( j − R ) + ) π L − ( j − R ) + . We can therefore conclude the results.A PPENDIX DP ROOF OF P ROPOSITION L ≥ R :1) First case: − ≤ n < R : u ( i ) = n (cid:88) j =0 p n ( j, i ) u ( j ) + R (cid:88) j = n +1 p n ( j, i ) u ( j ) + L (cid:88) j = R +1 p n ( j, i ) u ( j ) (121)We first provide the following lemma that follows from Proposition 2. Lemma 13. when i < L : p n ( j, i ) = π i − j if ≤ j ≤ nπ i if n + 1 ≤ j ≤ R − π i − ( j − R ) if R ≤ j ≤ L (122) when i = L : p n ( j, i ) = if ≤ j ≤ n if n + 1 ≤ j ≤ L (123)Using Lemma 13, we have:if i < L u ( i ) = n (cid:88) j =0 π i − j u ( j ) + R (cid:88) j = n +1 π i u ( j ) + L (cid:88) j = R +1 π i − ( j − R ) u ( j ) (124) May 22, 2020 DRAFT3 By definition of π given in definition 4, then: u ( i ) = min( i,n ) (cid:88) max( i − R +1 , ρu ( j ) + R (cid:88) n +1 π i u ( j ) + min( i + R,L ) (cid:88) max( i +1 ,R +1) ρu ( j ) (125)In order to prove Proposition 3 for this case, according to Lemma 13 we will distinguish between five sub-cases:a) i = L b) n + R + 1 ≤ i ≤ L − c) n + 1 ≤ i ≤ R − d) ≤ i ≤ n e) R ≤ i ≤ n + R a) Proof of u ( i ) = 0 for i = L :if i = L , since ∀ j p n ( j, L ) = 0 , then u ( L ) = 0 (126)b) Proof of u ( i ) = 0 for n + R + 1 ≤ i ≤ L − :For this case, we prove by strong induction in decreasing order that u ( i ) = 0 In fact we have that u ( L ) = 0 , and for n + R < i ≤ L , π i = 0 because i > R − , min( i, n ) = n
1) = (cid:80) min( i − R,L ) i ρu ( j ) = 0 .Hence we conclude the result.c) Proof of u ( i ) = ρ for n + 1 ≤ i ≤ R − :We have max( i − R + 1 , 0) = 0 , min( i, n ) = n , π i = ρ (since ≤ i ≤ R − ), max( i + 1 , R + 1) = R + 1 and min( i + R, L ) = i + R (recall that i + R < R ≤ L ). This implies, u ( i ) = n (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + i + R (cid:88) R +1 ρu ( j ) (128)Now, we prove that u ( i ) = ρ We have that: u ( i ) = n (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + i + R (cid:88) R +1 ρu ( j ) = ρ [ n (cid:88) u ( j ) + R (cid:88) n +1 u ( j ) + i + R (cid:88) R +1 u ( j )] (129) u ( i ) = i + R (cid:88) j =0 ρu ( j ) (130) May 22, 2020 DRAFT4 We have that i + R > n + R , then u ( p ) = 0 for all p ∈ [ n + R + 1 , i + R ] . We can hence simplify theexpression of u ( i ) as follows: u ( i ) = ρ n + R (cid:88) j =0 u ( j ) (131)Since we have proved that when j > n + R , u ( j ) = 0 (sub-case (b)), then (cid:80) n + Rj =0 u ( j ) = 1 ( (cid:80) L u ( j ) = 1 because u is probability distribution), i.e. u ( i ) = ρ .This ends the proof of sub-case (c).We will provide a useful lemma which allows us to prove Proposition 3 for the cases (d) and (e). Beforegiving this lemma, we will give general expressions of u ( i ) for these two cases.If ≤ i ≤ n : i ≤ n < R , which implies that i − R + 1 ≤ , max( i − R + 1 , 0) = 0 , min( i, n ) = i , π i = ρ since ≤ i ≤ n < R , max( i + 1 , R + 1) = R + 1 , and i + R ≤ n + R < R ≤ L . Therefore min( i + R, L ) = i + R ,which implies that: u ( i ) = i (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + i + R (cid:88) R +1 ρu ( j ) (132)If R ≤ i ≤ n + R :We have max( i − R + 1 , 0) = i − R + 1 , min( i, n ) = n (due to i ≥ R > n ), π i = 0 (since i > R − ) and max( i + 1 , R + 1) = i + 1 . Then: u ( i ) = n (cid:88) i − R +1 ρu ( j ) + min( i + R,L ) (cid:88) i +1 ρu ( j ) (133) Lemma 14. for ≤ k ≤ n : u ( n + R − k ) + u ( n − k ) = ρ (134) Proof. 
See appendix E (cid:4) d) Proof of u ( i ) = ρ − ρ ( n − i ) for ≤ i ≤ n :We start by proving by induction that for k ∈ [0 , n ] u ( n − k ) = ρ − ρ k , we have for ≤ k ≤ n , ≤ n − k ≤ n ,then: u ( n − k ) = n − k (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + n − k + R (cid:88) R +1 ρu ( j ) (135)For k = 0 , u ( n − 0) = ρ [ n (cid:88) u ( j ) + R (cid:88) n +1 u ( j ) + n + R (cid:88) R +1 u ( j )] (136) = n + R (cid:88) j =0 ρu ( j ) (137) = ρ (138) May 22, 2020 DRAFT5 We suppose that the expression is true for some k , we prove it for k + 1 u ( n − ( k + 1)) = n − k − (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + ( n − k − R ) (cid:88) R +1 ρu ( j ) (139) = n − k (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + ( n − k + R ) (cid:88) R +1 ρu ( j ) − ρ ( u ( n − k ) + u ( n − k + R )) (140) = u ( n − k ) − ρ ( u ( n − k ) + u ( n + R − k )) (141) = ρ − kρ − ρ ( u ( n − k ) + u ( n + R − k )) (142)Using Lemma 14, u ( n − k ) + u ( n + R − k ) = ρ , then: u ( n − ( k + 1)) = ρ − kρ − ρ ( ρ ) (143) = ρ − kρ − ρ (144) = ρ − ( k + 1) ρ (145)Thus we conclude that for k ∈ [0 , n ] u ( n − k ) = ρ − kρ .For i ∈ [0 , n ] , we replace k ∈ [0 , n ] by n − i ( n − i ∈ [0 , n ] ), we get: u ( i ) = u ( n − ( n − i )) = ρ − ρ ( n − i ) (146)e) Proof of u ( i ) = ρ ( n + R − i ) for R ≤ i ≤ n + R :For that we prove that for k ∈ [0 , n ] u ( n + R − k ) = ρ k .From the above result in the case (d), we get u ( n − k ) = ρ − kρ .So, according to Lemma 14: u ( n + R − k ) = ρ − u ( n − k ) (147) = ρ − ( ρ − ρ k ) (148) u ( n + R − k ) = ρ k (149)For i ∈ [ R, n + R ] , we replace k ∈ [0 , n ] by n + R − i ( n + R − i ∈ [0 , n ] ), we get: u ( i ) = u ( n + R − ( n + R − i )) = ρ ( n + R − i ) (150)2) Second case: R ≤ n < L − R : u ( i ) = n (cid:88) j =0 p n ( j, i ) u ( j ) + L (cid:88) j = n +1 p n ( j, i ) u ( j ) (151) Lemma 15. 
when i < L : p n ( j, i ) = π i − j if ≤ j ≤ nπ i − ( j − R ) if n + 1 ≤ j ≤ L (152) May 22, 2020 DRAFT6 when i = L : p n ( j, i ) = if ≤ j ≤ n if n + 1 ≤ j ≤ L (153)The results of Lemma 15 come from Proposition 2. Using Lemma 15:if i < L u ( i ) = n (cid:88) j =0 π i − j u ( j ) + L (cid:88) j = n +1 π i − ( j − R ) u ( j ) (154)By definition of π given in definition 4, then: u ( i ) = min( n,i ) (cid:88) max( i +1 − R, ρu ( j ) + min( L,i + R ) (cid:88) max( n +1 ,i +1) ρu ( j ) (155)According to Lemma 15, we will distinguish between five sub-cases:a) i = L b) ≤ i ≤ n − R c) n + R + 1 ≤ i ≤ L − d) n + 1 − R ≤ i ≤ n e) n + 1 ≤ i ≤ n + R a) Proof of u ( i ) = 0 for i = L :if i = L , since ∀ j p n ( j, L ) = 0 , then: u ( L ) = 0 (156)b) Proof of u ( i ) = 0 for ≤ i ≤ n − R :We prove by induction that for all ≤ i < n + 1 − R , u ( i ) = 0 .In fact, if ≤ i < n + 1 − R , then i < n − R < n , min( n, i ) = i , and min( i + R, L ) ≤ i + R < n + 1 =max( n + 1 , i + 1) . Then: u ( i ) = i (cid:88) ( i +1 − R ) + ρu ( j ) (157)for i = 0 u (0) = ρu (0) i.e. u (0) = 0 since ρ < .if u ( j ) = 0 for all j ≤ i , then: u ( i + 1) = i +1 (cid:88) ( i +2 − R ) + ρu ( j ) (158) = i (cid:88) ( i +2 − R ) + ρu ( j ) + ρu ( i + 1) (159) = 0 + ρu ( i + 1) (160) u ( i + 1) = ρu ( i + 1) (161)This implies that u ( i + 1) = 0 .c) Proof of u ( i ) = 0 for n + R + 1 ≤ i ≤ L − : May 22, 2020 DRAFT7 If i ≥ n + R + 1 then ( i + 1 − R ) + = i + 1 − R > n = min( n, i ) and max( n + 1 , i + 1) = i + 1 . This impliesthat u ( i ) = min( i + R,L ) (cid:88) i +1 ρu ( j ) (162)and we have u ( L ) = 0 .We now suppose that for all k between i and L : u ( k ) = 0 then u ( i − 1) = min( i − R,L ) (cid:88) i ρu ( j ) = 0 (163)We conclude the result.Next, we will provide a useful lemma which allows us to prove Proposition 3 for the cases (d) and (e). 
Beforeproviding this lemma, we will give general expressions of u ( i ) for these two cases.if n + 1 − R ≤ i ≤ n :We have min( n, i ) = i , max( n + 1 , i + 1) = n + 1 , and min( L, i + R ) = i + R (since i + R ≤ n + R 0) = i + 1 − R , min( n, i ) = n and max( n + 1 , i + 1) = i + 1 . Therefore: u ( i ) = n (cid:88) i +1 − R ρu ( j ) + min( L,i + R ) (cid:88) i +1 ρu ( j ) (167)We have i + R > n + R , and L > n + R because n < L − R , then min( L, i + R ) > n + R . Therefore, giventhat u ( j ) = 0 for all j between n + R + 1 and min( L, i + R ) , we can simplify the expression of u ( i ) asfollows: u ( i ) = n (cid:88) i +1 − R ρu ( j ) + n + R (cid:88) i +1 ρu ( j ) (168) u ( i ) = n + R (cid:88) i +1 ρ [ u ( j − R ) + u ( j )] (169) May 22, 2020 DRAFT8 Lemma 16. for ≤ k ≤ R − , u ( n + R − k ) + u ( n − k ) = ρ (170) Proof. See appendix F (cid:4) Let us now prove the result for cases (d) and (e).d) Proof of u ( i ) = ρ − ( n − i ) ρ for n + 1 − R ≤ i ≤ n :We prove by induction that, for ≤ k ≤ R − , u ( n − k ) = ρ − kρ For k = 0 : u ( n − 0) = n + R (cid:88) n +1 ρ [ u ( j − R ) + u ( j )] (171) = n (cid:88) n +1 − R ρu ( j ) + n + R (cid:88) n +1 ρu ( j ) (172) = ρ n + R (cid:88) n +1 − R u ( j ) (173) u ( n ) = ρ (174)We suppose that the expression is true for some k , we prove it for k + 1 . 
u ( n − ( k + 1)) = n − k − R (cid:88) n +1 ρ ( u ( j − R ) + u ( j )) (175) = n − k + R (cid:88) n +1 ρ [ u ( j − R ) + u ( j )] − ρ [ u ( n − k ) + u ( n − k + R )] (176) = u ( n − k ) − ρ [ u ( n − k ) + u ( n + R − k )] (177) = ρ − kρ − ρ [ u ( n − k ) + u ( n + R − k )] (178)Using Lemma 16, u ( n − k ) + u ( n + R − k ) = ρ , then u ( n − ( k + 1)) = ρ − kρ − ρ ( ρ ) (179) = ρ − kρ − ρ (180) u ( n − ( k + 1)) = ρ − ( k + 1) ρ (181)Thus we conclude that, for k ∈ [0 , R − , u ( n − k ) = ρ − kρ .For i ∈ [ n + 1 − R, n ] , we replace k ∈ [0 , R − by n − i ( n − i ∈ [0 , R − ) and get: u ( i ) = u ( n − ( n − i )) = ρ − ( n − i ) ρ (182)e) Proof of u ( i ) = ρ ( n + R − i ) for n + 1 ≤ i ≤ n + R May 22, 2020 DRAFT9 We prove that, for k ∈ [0 , R − , u ( n + R − k ) = ρ k . From above, we have u ( n − k ) = ρ − kρ , and byusing Lemma 16 we have: u ( n + R − k ) = ρ − u ( n − k ) (183) = ρ − ( ρ − ρ k ) (184) u ( n + R − k ) = ρ k (185)For i ∈ [ n + 1 , n + R ] , by replacing k ∈ [0 , R − by n + R − i ( n + R − i ∈ [0 , n ] ), we get: u ( i ) = u ( n + R − ( n + R − i )) = ρ ( n + R − i ) (186)This ends the proof of the second case.3) Third case: L − R ≤ n < L u ( i ) = n (cid:88) j =0 p n ( j, i ) u ( j ) + L (cid:88) j = n +1 p n ( j, i ) u ( j ) (187) Lemma 17. 
when i < L : p n ( j, i ) = π i − j if ≤ j ≤ nπ i − ( j − R ) if n + 1 ≤ j ≤ L (188) when i = L : p n ( j, L ) = ( R − L + j ) π L − j if ≤ j ≤ n if n + 1 ≤ j ≤ L (189)This Lemma comes from Proposition 2.So using Lemma 17, and by definition of π :if i < L : u ( i ) = min( i,n ) (cid:88) max( i − R +1 , ρu ( j ) + min( L,i + R ) (cid:88) max( n +1 ,i +1) ρu ( j ) (190)if i = L : u ( L ) = n (cid:88) j =0 ( R − L + j ) π L − j u ( j ) (191)According to Lemma 17, we will distinguish between five cases:a) ≤ i ≤ n − R b) n + 1 ≤ i ≤ L − c) n − R + 1 ≤ i ≤ L − R − d) L − R ≤ i ≤ n e) i = L a) Proof of u ( i ) = 0 for ≤ i ≤ n − R : May 22, 2020 DRAFT0 We prove by induction that, for i ≤ n − R , u ( i ) = 0 .Since ≤ i ≤ n − R , then min( i, n ) = i , i + R ≤ n < L and min( L, i + R ) = i + R < n +1 = max( n +1 , i +1) .Therefore: u ( i ) = i (cid:88) max( i − R +1 , ρu ( j ) (192)for i = 0 , u (0) = ρu (0) = 0 .We consider that u ( j ) = 0 for all j between and i , we demonstrate that u ( i + 1) = 0 . u ( i + 1) = i +1 (cid:88) max( i − R +2 , ρu ( j ) (193) = i (cid:88) max( i − R +2 , ρu ( j ) + ρu ( i + 1) (194) = 0 + ρu ( i + 1) (195) u ( i + 1) = ρu ( i + 1) (196)This implies that: u ( i + 1) = ρu ( i + 1) (197)Hence we prove that, for all i ∈ [0 , n − R ] , u ( i ) = 0 .We will provide a useful lemma which allows us to prove Proposition 3 for cases (b) and (c). Before givingthis lemma, we will give general expressions of u ( i ) for these two cases.if n − R + 1 ≤ i ≤ L − R − : i < L − R ≤ n , then min( i, n ) = i , max( n + 1 , i + 1) = n + 1 and min( L, i + R ) = i + R . This implies that, u ( i ) = i (cid:88) ( i − R +1) + ρu ( j ) + i + R (cid:88) n +1 ρu ( j ) (198)We have n − R + 1 > and n − R + 1 > i − R + 1 , which implies that n − R + 1 > ( i + 1 − R ) + and n − R ≥ ( i + 1 − R ) + . 
Since u ( j ) = 0 for all j less or equal to n − R , we can simplify the expression of u ( i ) as follows: u ( i ) = i (cid:88) n − R +1 ρu ( j ) + i + R (cid:88) n +1 ρu ( j ) (199) u ( i ) = i + R (cid:88) n +1 ρ [ u ( j − R ) + u ( j )] (200)if n + 1 ≤ i < L :We have ( i − R + 1) + = i − R + 1 (as i ≥ n + 1 > R ), min( i, n ) = n , max( n + 1 , i + 1) = i + 1 and min( L, i + R ) = L (due to i + R > n + R ≥ L − R + R = L ). Then: u ( i ) = n (cid:88) i − R +1 ρu ( j ) + L (cid:88) i +1 ρu ( j ) (201) May 22, 2020 DRAFT1 Lemma 18. for ≤ k ≤ L − n − , u ( n − R + k ) + u ( n + k ) = ρ (202) Proof. See appendix G (cid:4) b) Proof of u ( i ) = ρ − ( i − n ) ρ for n + 1 ≤ i ≤ L − :We prove first that, for ≤ k ≤ L − n − , u ( n + k ) = ρ − kρ .In fact: u ( n + k ) = n (cid:88) n + k − R +1 ρu ( j ) + L (cid:88) n + k +1 ρu ( j ) (203) = ρ − [ n + k − R (cid:88) n − R +1 ρu ( j ) + n + k (cid:88) n +1 ρu ( j )] (204) = ρ − [ k (cid:88) ρu ( n − R + j ) + k (cid:88) ρu ( n + j )] (205) = ρ − [ k (cid:88) ρ [ u ( n − R + j ) + u ( n + j )]] (206)According to Lemma 18, and given that ≤ k ≤ L − n − , then for all j ∈ [1 , k ] , u ( n − R + j )+ u ( n + j ) = ρ ,then: u ( n + k ) = ρ − [ k (cid:88) ρ ] (207) u ( n + k ) = ρ − kρ (208)Then for ≤ k ≤ L − n − , u ( n + k ) = ρ − kρ .For i ∈ [ n + 1 , L − , we replace k ∈ [1 , L − n − by i − n ( i − n ∈ [1 , L − n − ) and get: u ( i ) = u ( n + ( i − n )) = ρ − ( i − n ) ρ (209)c) Proof of u ( i ) = ρ ( R − n + i ) for n − R + 1 ≤ i ≤ L − R − :We need to prove that, for k ∈ [1 , L − n − , u ( n − R + k ) = ρ k Given that u ( n + k ) = ρ − ρ k which is proved in case (d), and using Lemma 18, then: u ( n − R + k ) = ρ − u ( n + k ) (210) = ρ − ( ρ − ρ k ) (211) u ( n − R + k ) = ρ k (212)For i ∈ [ n − R + 1 , L − R − , we replace k ∈ [1 , L − n − by R − n + i ( R − n + i ∈ [1 , L − n − ) andget: u ( i ) = u ( n − R + ( R − n + i )) = ρ ( R − n + i ) (213) May 22, 2020 DRAFT2 This ends the proof of case (c).d) Proof of u ( i ) = (1 
− ρ ) n − i ρ for L − R ≤ i ≤ n :if L − R ≤ i ≤ n , ( i − R + 1) + = i − R + 1 because i ≥ L − R ≥ R , min( i, n ) = i , max( n + 1 , i + 1) = n + 1 and min( L, i + R ) = L . Then: u ( i ) = i (cid:88) i − R +1 ρu ( j ) + L (cid:88) n +1 ρu ( j ) (214)We have n ≥ i , then n − R + 1 ≥ i − R + 1 . If n − R + 1 = i − R + 1 , we replace i − R + 1 by n − R + 1 inthe expression of u ( i ) . If n − R + 1 > i − R + 1 , we know that, for all j less or equal to n − R , u ( j ) = 0 .We can then simplify the expression of u ( i ) as follows: u ( i ) = i (cid:88) n − R +1 ρu ( j ) + L (cid:88) n +1 ρu ( j ) (215)In order to prove Proposition 3 for this case, we prove by induction that u ( n − k ) = (1 − ρ ) k ρ for ≤ k ≤ n − L + R For k = 0 : u ( n ) = n (cid:88) n − R +1 ρu ( j ) + L (cid:88) n +1 ρu ( j ) (216) = L (cid:88) n − R +1 ρu ( j ) (217) u ( n ) = ρ (218)We suppose it is true for k , we prove it for k + 1 : u ( n − ( k + 1)) = n − k − (cid:88) n − R +1 ρu ( j ) + L (cid:88) n +1 ρu ( j ) (219) = n − k (cid:88) n − R +1 ρu ( j ) + L (cid:88) n +1 ρu ( j ) − ρu ( n − k ) (220) = u ( n − k ) − ρu ( n − k ) (221) = (1 − ρ ) k ρ − ρ (1 − ρ ) k ρ (222) = (1 − ρ ) k ρ (1 − ρ ) (223) u ( n − ( k + 1)) = (1 − ρ ) k +1 ρ (224)(225)Thus we conclude that, for k ∈ [0 , n − L + R ] , u ( n − k ) = (1 − ρ ) k ρ .For i ∈ [ L − R, n ] , we replace for k ∈ [0 , n − L + R ] by n − i ( n − i ∈ [0 , n − L + R ] ) and get: u ( i ) = u ( n − ( n − i )) = (1 − ρ ) n − i ρ (226)This proves the result. 
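As a numerical sanity check of the closed-form expressions of this third case ($L-R \leq n < L$), the following Python sketch builds the transition matrix of the threshold policy described in Proposition 2, computes its stationary distribution by power iteration, and compares it with the closed form, including the value at $i = L$ treated in sub-case (e). It is only a sketch under stated assumptions: arrivals are taken to be uniform on $\{0,\ldots,R-1\}$, so that $\rho = 1/R$, and the linear sub-cases are taken to have slope $\rho^2$, as implied by the induction steps ($\rho - k\rho^2 - \rho\cdot\rho$). All function names are illustrative and do not come from the paper.

```python
def transition_matrix(L, R, n):
    """P[j][i] = probability of moving from state j to state i under threshold n.

    Assumption: arrivals A are uniform on {0, ..., R-1}, i.e. rho = 1/R.
    States j <= n are passive; states j > n transmit min(R, j) packets.
    """
    rho = 1.0 / R
    P = [[0.0] * (L + 1) for _ in range(L + 1)]
    for j in range(L + 1):
        base = j if j <= n else max(j - R, 0)
        for a in range(R):
            # The buffer is capped at L.
            P[j][min(base + a, L)] += rho
    return P


def stationary_distribution(P, iterations=2000):
    """Approximate the stationary distribution by repeated multiplication u <- uP."""
    size = len(P)
    u = [1.0 / size] * size
    for _ in range(iterations):
        u = [sum(u[j] * P[j][i] for j in range(size)) for i in range(size)]
    total = sum(u)
    return [x / total for x in u]


def closed_form_third_case(L, R, n):
    """Closed form for L - R <= n < L (with L >= 2R), read with slope rho**2."""
    rho = 1.0 / R
    u = [0.0] * (L + 1)
    for i in range(L + 1):
        if i == L:
            u[i] = (1 - rho) ** (n - L + R + 1) - rho * (L - 1 - n)
        elif n + 1 <= i <= L - 1:
            u[i] = rho - (i - n) * rho ** 2
        elif L - R <= i <= n:
            u[i] = (1 - rho) ** (n - i) * rho
        elif n - R + 1 <= i <= L - R - 1:
            u[i] = rho ** 2 * (R - n + i)
        # States i <= n - R keep probability 0.
    return u


if __name__ == "__main__":
    L, R, n = 6, 3, 4
    numeric = stationary_distribution(transition_matrix(L, R, n))
    closed = closed_form_third_case(L, R, n)
    print(max(abs(x - y) for x, y in zip(numeric, closed)))
```

Power iteration is used here only to keep the example free of external dependencies; a linear solve of $u = uP$ with the normalization $\sum_j u(j) = 1$ would serve equally well.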
May 22, 2020 DRAFT3 e) Proof of u ( i ) = (1 − ρ ) n − L + R +1 − ρ ( L − − n ) for i = L : u ( L ) = n (cid:88) j =0 ( R − L + j ) π L − j u ( j ) (227) = n (cid:88) j = L − R +1 ( R − L + j ) ρu ( j ) (228)We replace u ( j ) by its expression when j ∈ [ L − R + 1 , n ] (it corresponds to the sub-case (d)) u ( L ) = n (cid:88) j = L − R +1 ( R − L + j )[ ρ (1 − ρ ) n − j ρ ] (229) u ( L ) = ρ n − L + R − (cid:88) k =0 ( R − L − k + n )(1 − ρ ) k (230) u ( L ) = (1 − ρ ) n − L + R +1 − ρ ( L − − n ) (231)4) Fourth case: n = L u ( i ) = L (cid:88) j =0 p L ( j, i ) u ( j ) (232)For i ≤ L − :According to Proposition 2, we have: u ( i ) = L (cid:88) j =0 π i − j u ( j ) (233)By definition of π , we get: u ( i ) = i (cid:88) ( i − R +1) + ρu ( j ) (234)We prove by induction that for ≤ i < L u ( i ) = 0 We have u (0) = ρu (0) = 0 .We suppose that u ( j ) = 0 for all ≤ j ≤ i , then: u ( i + 1) = i +1 (cid:88) ( i − R +2) + ρu ( j ) (235) = i (cid:88) ( i − R +2) + ρu ( j ) + ρu ( i + 1) (236) = 0 + ρu ( i + 1) (237) u ( i + 1) = 0 (238)Then, for all i ∈ [0 , L − , u ( i ) = 0 .Since (cid:80) Lj =0 u ( j ) = 1 , we have u ( L ) = 1 − (cid:80) L − j =0 u ( j ) = 1 − .This ends the proof. May 22, 2020 DRAFT4 A PPENDIX EP ROOF OF L EMMA u ( n − k ) + u ( n + R − k ) = n − k (cid:88) ρu ( j ) + R (cid:88) n +1 ρu ( j ) + n − k + R (cid:88) R +1 ρu ( j ) + n (cid:88) n − k +1 ρu ( j ) + min( n − k +2 R,L ) (cid:88) n + R − k +1 ρu ( j )) (239) u ( n − k ) + u ( n + R − k ) = ρ min(2 R + n − k,L ) (cid:88) u ( j ) (240)We know that R > n and n − k ≥ , which implies that R + n − k > n + R and n + R < R ≤ L . and hence min(2 R + n − k, L ) > n + R . Therefore, we get rid of all elements u ( j ) such that j ∈ [ n + R +1 , min(2 R + n − k, L )] since for all j > n + R , u ( j ) = 0 . 
Moreover (cid:80) n + R u ( j ) = 1 , consequently: u ( k ) + u ( R + k ) = ρ n + R (cid:88) u ( j ) = ρ (241)A PPENDIX FP ROOF OF L EMMA n − R + 1 ≤ n − k ≤ n , and n + 1 ≤ n + R − k ≤ n + R , then: u ( n − k ) + u ( n + R − k ) = n − k (cid:88) n +1 − R ρu ( j ) + n − k + R (cid:88) n +1 ρu ( j ) + n (cid:88) n − k +1 ρu ( j ) + n + R (cid:88) n + R − k +1 ρu ( j ) (242) = ρ n + R (cid:88) n +1 − R u ( j ) (243)Given that u ( j ) = 0 for j ∈ [0 , n − R ] ∪ [ n + R + 1 , L ] , then (cid:80) n + Rn +1 − R u ( j ) = 1 . Consequently: u ( n − k ) + u ( n + R − k ) = ρ (244)A PPENDIX GP ROOF OF L EMMA n − R + 1 ≤ n − R + k ≤ L − R − , and n + 1 ≤ n + k ≤ L − , then: u ( n − R + k ) + u ( n + k ) = n − R + k (cid:88) n − R +1 ρu ( j ) + n + k (cid:88) n +1 ρu ( j ) + n (cid:88) n + k − R +1 ρu ( j ) + L (cid:88) n + k +1 ρu ( j ) (245) u ( n − R + k ) + u ( n + k ) = ρ L (cid:88) n − R +1 u ( j ) (246)As we have demonstrated that u ( i ) = 0 for i ∈ [0 , n − R ] , then (cid:80) Ln − R +1 u ( j ) = 1 . Therefore, u ( n − R + k ) + u ( n + k ) = ρ (247) May 22, 2020 DRAFT5 A PPENDIX HP ROOF OF P ROPOSITION (cid:80) Lq =0 au n ( q ) q by a n and (cid:80) nq =0 u n ( q ) by b n . Before provingthe proposition, we give two useful lemmas. Lemma 19. 
Considering $a_{j-1}, a_j, a_{j+1}$ and $b_{j-1}, b_j, b_{j+1}$ such that $b_{j-1} < b_j < b_{j+1}$:

If $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{248}$$
If $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \ge \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \ge \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \ge \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{249}$$
If $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{250}$$
If $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \ge \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \ge \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \ge \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{251}$$
If $\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{252}$$
If $\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \ge \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$, then:
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \ge \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \ge \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{253}$$
Proof. See Appendix I. $\blacksquare$

Lemma 20. The largest minimizer at step $j$ in Algorithm 1 satisfies $n_j = \min\{k : b_k = b_{n_j}\}$.
Proof. See Appendix J. $\blacksquare$

We start with indexability.
We consider $W_1 < W_2$ and prove that the optimal threshold $n_1$ when $W = W_1$ is at most the optimal threshold $n_2$ when $W = W_2$. Indeed, if $n_1 \le n_2$ and the threshold is $n_1$, all states in $[0, n_1]$, for which the optimal decision is the passive action, are included in $[0, n_2]$. This implies the desired result $D(W_1) \subseteq D(W_2)$.
To prove this, we only need to show that $b_{n_1} \le b_{n_2}$, since $n_1 \le n_2$ is equivalent to $b_{n_1} \le b_{n_2}$ (because $b_n$ is increasing in $n$).
According to equation (7) and by definition of $n_1$ and $n_2$, we have:
$$a_{n_1} - W_1 b_{n_1} \le a_{n_2} - W_1 b_{n_2} \tag{254}$$
$$a_{n_1} - W_2 b_{n_1} \ge a_{n_2} - W_2 b_{n_2} \tag{255}$$
This implies:
$$W_1 (b_{n_2} - b_{n_1}) \le a_{n_2} - a_{n_1} \le W_2 (b_{n_2} - b_{n_1}) \tag{256}$$
Therefore: $(W_2 - W_1)(b_{n_1} - b_{n_2}) \le 0$.
Since $W_2 - W_1 > 0$, we get $b_{n_1} \le b_{n_2}$, and then $n_1 \le n_2$.
This establishes indexability.

For the expressions of the Whittle indices, we need to show that, for $k \in\, ]n_{j-1}, n_j]$, $W_j = \min\{W : k \in D(W)\}$.
To that end, we first prove that $k \notin D(W)$ whenever $W < W_j$.
When $k > n_{j-1}$, $W < W_j$ and $b_k \ne b_{n_{j-1}}$, we have $W < W_j \le \frac{a_k - a_{n_{j-1}}}{b_k - b_{n_{j-1}}}$, and hence $a_k - b_k W > a_{n_{j-1}} - b_{n_{j-1}} W$.
When $k > n_{j-1}$, $W < W_j$ and $b_k = b_{n_{j-1}}$, then, given that $a_k > a_{n_{j-1}}$, we have $a_k - b_k W > a_{n_{j-1}} - b_{n_{j-1}} W$.
Hence we have proved that, for $W < W_j$ and $k > n_{j-1}$, $a_k - b_k W > a_{n_{j-1}} - b_{n_{j-1}} W$. This means that at $W$ the optimal threshold is $n_{j-1}$ or less. Therefore, any $k \in\, ]n_{j-1}, n_j]$ is necessarily strictly higher than the threshold, and the optimal action at $k$ is the active action, i.e. $k \notin D(W)$.

It remains to prove that $k \in D(W_j)$.
For that, we prove that the threshold is at least $n_j$ when $W = W_j$. In other words, for all $k < n_j$, $a_k - b_k W_j \ge a_{n_j} - b_{n_j} W_j$. We prove this result by induction on $j$.

For $j = 0$: for all $n$, $b_n > 0$, so $W_0$ is well defined, and $W_0 \le \frac{a_k - a_{-1}}{b_k}$ for all $k \ge 0$. For $0 \le k < n_0$, according to Lemma 20, $b_k < b_{n_0}$. Thus, using Lemma 19 (fourth case), we deduce that $\frac{a_{n_0} - a_k}{b_{n_0} - b_k} \le W_0$. That means that, for $k \in [-1, n_0[$, $\frac{a_{n_0} - a_k}{b_{n_0} - b_k} \le W_0$, which implies that $a_k - b_k W_0 \ge a_{n_0} - b_{n_0} W_0$.

We suppose that at step $j$, $a_k - b_k W_j \ge a_{n_j} - b_{n_j} W_j$, i.e. $\frac{a_{n_j} - a_k}{b_{n_j} - b_k} \le W_j$ for $k < n_j$ (the equivalence holds since $b_k < b_{n_j}$ according to Lemma 20).

At step $j + 1$:
When $n_j \le k < n_{j+1}$: if $b_k \ne b_{n_j}$, then $\frac{a_k - a_{n_j}}{b_k - b_{n_j}} \ge W_{j+1}$. Thus, using Lemma 19 (fourth case), we get $\frac{a_{n_{j+1}} - a_k}{b_{n_{j+1}} - b_k} \le W_{j+1}$ (since $b_{n_j} < b_k < b_{n_{j+1}}$). If $b_k = b_{n_j}$, then $\frac{a_{n_{j+1}} - a_k}{b_{n_{j+1}} - b_k} = \frac{a_{n_{j+1}} - a_k}{b_{n_{j+1}} - b_{n_j}} \le \frac{a_{n_{j+1}} - a_{n_j}}{b_{n_{j+1}} - b_{n_j}} = W_{j+1}$, since $a_k \ge a_{n_j}$.
When $k < n_j$, we have $\frac{a_{n_j} - a_k}{b_{n_j} - b_k} \le W_j$ (induction assumption).
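As an aside, the construction that this proof relies on can be sketched in a few lines of Python: at each step take the largest minimizer $n_j$ of the slope $(a_k - a_{n_{j-1}})/(b_k - b_{n_{j-1}})$ over $k > n_{j-1}$ with $b_k \ne b_{n_{j-1}}$, then assign the value $W_j$ to every state in $]n_{j-1}, n_j]$. This is only an illustrative reading of the procedure as it is used in the argument, not a transcription of Algorithm 1. The function name `whittle_indices`, the storage convention (entry `t + 1` of each list holds the value for threshold `t`, starting at `t = -1`) and the toy inputs are assumptions made for the example.

```python
from fractions import Fraction


def whittle_indices(a, b):
    """Illustrative sketch (hypothetical helper, not the paper's Algorithm 1).

    a[t + 1] and b[t + 1] hold a_t and b_t for thresholds t = -1, 0, ..., L.
    Returns a list W where W[s] is the index assigned to state s = 0, ..., L.
    """
    size = len(a)
    W = [None] * (size - 1)
    prev = 0  # list position of threshold n_{j-1}; starts at threshold -1
    while prev < size - 1:
        # Candidates k > n_{j-1} whose passive time differs from b_{n_{j-1}}.
        slopes = {
            k: (a[k] - a[prev]) / (b[k] - b[prev])
            for k in range(prev + 1, size)
            if b[k] != b[prev]
        }
        if not slopes:
            break  # no admissible candidate left; remaining states stay None
        w_j = min(slopes.values())
        n_j = max(k for k, s in slopes.items() if s == w_j)  # largest minimizer
        # States in ]n_{j-1}, n_j] receive the index W_j.
        for pos in range(prev + 1, n_j + 1):
            W[pos - 1] = w_j
        prev = n_j
    return W


# Toy data (assumed for illustration): thresholds -1..3, b strictly increasing.
b = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
convex = whittle_indices([Fraction(x) for x in (0, 0, 1, 3, 6)], b)
tied = whittle_indices([Fraction(x) for x in (0, 0, 3, 4, 6)], b)
```

In the first toy input the successive slopes are increasing, so each step advances by one state and the indices are the consecutive slopes 0, 4, 8, 12. In the second, the minimum slope out of threshold 0 is attained at two candidates, and taking the largest minimizer assigns the common value 8 to states 1, 2 and 3 at once.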
Using the definition of $n_j$ in Algorithm 1, we have $W_j < \frac{a_{n_{j+1}} - a_{n_{j-1}}}{b_{n_{j+1}} - b_{n_{j-1}}}$. Then, according to Lemma 19 (third case), $W_j \le W_{j+1}$ (since $b_{n_{j-1}} < b_{n_j} < b_{n_{j+1}}$). Therefore $\frac{a_{n_j} - a_k}{b_{n_j} - b_k} \le W_{j+1}$ and, using Lemma 19 again (first case), $\frac{a_{n_{j+1}} - a_k}{b_{n_{j+1}} - b_k} \le W_{j+1}$. Therefore, for all $k \le n_{j+1}$, $a_k - b_k W_{j+1} \ge a_{n_{j+1}} - b_{n_{j+1}} W_{j+1}$.
Thus, we have proved by induction that at any step $j$, for $k < n_j$, $a_k - b_k W_j \ge a_{n_j} - b_{n_j} W_j$.
Then, when $W = W_j$, the threshold is at least $n_j$. This means that any $k \in\, ]n_{j-1}, n_j]$ is less than or equal to the threshold, which implies that the optimal decision at state $k$ is the passive action, i.e. $k \in D(W_j)$.
As we have shown that, for $k \in\, ]n_{j-1}, n_j]$, $k \notin D(W)$ for $W < W_j$ and $k \in D(W_j)$, we obtain $W_j = \min\{W : k \in D(W)\}$. This concludes the proof.

APPENDIX I
PROOF OF LEMMA 19

We prove that
$$\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \implies \frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j}.$$
For the LHS inequality:
$$\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} = \frac{a_{j+1} - a_j}{b_{j+1} - b_{j-1}} + \frac{a_j - a_{j-1}}{b_{j+1} - b_{j-1}} \tag{257}$$
$$\ge \frac{(a_j - a_{j-1})(b_{j+1} - b_j)}{(b_j - b_{j-1})(b_{j+1} - b_{j-1})} + \frac{a_j - a_{j-1}}{b_{j+1} - b_{j-1}} \tag{258}$$
The inequality above comes from the fact that $b_{j-1} < b_j < b_{j+1}$ and $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$. Then
$$\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \ge \frac{a_j - a_{j-1}}{b_j - b_{j-1}} \left[ \frac{b_{j+1} - b_j + b_j - b_{j-1}}{b_{j+1} - b_{j-1}} \right] \tag{259}$$
$$= \frac{a_j - a_{j-1}}{b_j - b_{j-1}} \tag{260}$$
For the RHS inequality:
$$\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} = \frac{a_{j+1} - a_j}{b_{j+1} - b_{j-1}} + \frac{a_j - a_{j-1}}{b_{j+1} - b_{j-1}} \tag{261}$$
$$\le \frac{a_{j+1} - a_j}{b_{j+1} - b_{j-1}} + \frac{(a_{j+1} - a_j)(b_j - b_{j-1})}{(b_{j+1} - b_j)(b_{j+1} - b_{j-1})} \tag{262}$$
where the above inequality comes from the fact that $b_{j-1} < b_j < b_{j+1}$ and $\frac{a_j - a_{j-1}}{b_j - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j}$. Then
$$\frac{a_{j+1} - a_{j-1}}{b_{j+1} - b_{j-1}} \le \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \left[ \frac{b_{j+1} - b_j + b_j - b_{j-1}}{b_{j+1} - b_{j-1}} \right] \tag{263}$$
$$= \frac{a_{j+1} - a_j}{b_{j+1} - b_j} \tag{264}$$

APPENDIX J
PROOF OF LEMMA 20

We consider $i$ such that $b_i = b_{n_j}$ and we prove that $n_j \le i$.
By construction of $n_j$, $b_{n_{j-1}} \ne b_{n_j}$ and $n_{j-1} < n_j$. Hence, since $b_k$ is increasing in $k$, $b_{n_j} \ge b_{n_{j-1}}$.
Therefore $b_i = b_{n_j} > b_{n_{j-1}}$, and $i > n_{j-1}$. Consequently, according to the definition of $n_j$:
$$\frac{a_{n_j} - a_{n_{j-1}}}{b_{n_j} - b_{n_{j-1}}} \le \frac{a_i - a_{n_{j-1}}}{b_i - b_{n_{j-1}}} \tag{265}$$
$$\frac{a_{n_j} - a_{n_{j-1}}}{b_{n_j} - b_{n_{j-1}}} \le \frac{a_i - a_{n_{j-1}}}{b_{n_j} - b_{n_{j-1}}} \tag{266}$$
This implies that $a_{n_j} \le a_i$.
If $i < n_j$ then, as $b_i = b_{n_j}$, we have $a_i < a_{n_j}$, which contradicts $a_{n_j} \le a_i$.
Therefore $n_j \le i$. This concludes the proof.

APPENDIX K
PROOF OF LEMMA

For $n \in [-1, R-2]$:
$$\sum_{q=0}^{n+1} u^{n+1}(q) - \sum_{q=0}^{n} u^{n}(q) = \left(1 - \frac{n+1}{2R}\right)\frac{n+2}{R} - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \tag{267}$$
$$= \frac{R - 1 - n}{R^2} \tag{268}$$
$$> 0 \tag{269}$$

APPENDIX L
PROOF OF LEMMA

Lemma 21. For all $x \in\, ]0, 1[$:
$$x + \ln(1-x)(1-x) > 0 \tag{270}$$
Proof. See Appendix M. $\blacksquare$

We note that $R \ge 2$, so $\rho \in\, ]0, \tfrac{1}{2}]$.
We denote by $h$ the function $h(n) = \sum_{q=0}^{n} u^{n}(q) = \frac{\rho^2}{2}(L - 1 - n)(L - n) + 1 - (1-\rho)^{n-L+R+1}$. Its first and second derivatives are:
$$h'(n) = \frac{\rho^2}{2}(-2L + 1 + 2n) - \ln(1-\rho)(1-\rho)^{n-L+R+1} \tag{271}$$
$$h''(n) = \rho^2 - (\ln(1-\rho))^2 (1-\rho)^{n-L+R+1} \tag{272}$$
For $n \in [L-R+1, L-1]$, $(1-\rho)^{n-L+R+1}$ is decreasing in $n$, so
$$h''(n) \ge \rho^2 - (\ln(1-\rho))^2 (1-\rho)^2 \tag{273}$$
Using Lemma 21,
$$\rho > -\ln(1-\rho)(1-\rho) \tag{274}$$
then
$$\rho^2 > (\ln(1-\rho))^2 (1-\rho)^2 \tag{275}$$
Therefore
$$h''(n) \ge \rho^2 - (\ln(1-\rho))^2 (1-\rho)^2 > 0 \tag{276}$$
i.e. $h'$ is a strictly increasing function of $n$.
We have $h'(L-R+1) = \frac{3}{2}\rho^2 - \rho - \ln(1-\rho)(1-\rho)^2$.
In order to prove the positivity of $h'$, we introduce the function
$$r(x) = \frac{3}{2}x^2 - x - \ln(1-x)(1-x)^2 \tag{277}$$
We have $r'(x) = 2\left(x + \ln(1-x)(1-x)\right) > 0$ (according to Lemma 21), which means that $r$ is strictly increasing in $[0, 1[$.
Hence, for all $x \in\, ]0, 1[$, $r(x) > r(0) = 0$.
Then:
$$h'(L-R+1) = \frac{3}{2}\rho^2 - \rho - \ln(1-\rho)(1-\rho)^2 > 0 \tag{278}$$
Since $h'$ is an increasing function of $n$:
$$h'(n) \ge h'(L-R+1) > 0 \tag{279}$$
Therefore $h$ is strictly increasing in $n$. This concludes the proof.

APPENDIX M
PROOF OF LEMMA 21

We consider the function $v(x) = x + \ln(1-x)(1-x)$ on $[0, 1[$. Its first derivative is $v'(x) = -\ln(1-x) > 0$ for all $x \in\, ]0, 1[$, and $v(0) = 0$. Then, for all $x \in\, ]0, 1[$, $v(x) > v(0) = 0$, which concludes the proof.

APPENDIX N
PROOF OF LEMMA

For $n \in [L-R, L-1]$, we have:
$$\sum_{q=0}^{L} a u^{n+1}(q) q - \sum_{q=0}^{L} a u^{n}(q) q = 1 - 2(1-\rho)^{n-L+1+R} + 2L\rho - 2n\rho - 2\rho \tag{280}$$
We denote by $p$ the function:
$$p(n) = 1 - 2(1-\rho)^{n-L+1+R} + 2L\rho - 2n\rho - 2\rho \tag{281}$$
$$p''(n) = -2(\ln(1-\rho))^2 (1-\rho)^{n-L+1+R} \tag{282}$$
Hence, as $p''(n) \le 0$, $p$ is concave, and therefore quasi-concave, in $[L-R, L-1]$. Then:
$$p(n) \ge \min(p(L-R), p(L-1)) \tag{283}$$
$$p(L-R) = 1 - 2(1-\rho) + 2 - 2\rho = 1 > 0 \tag{284}$$
$$p(L-1) = 1 - 2(1-\rho)^R \tag{285}$$
As $(1-\rho)^R \le \exp(-1)$ (with $\exp$ the exponential function) for all $R \ge 2$, we get:
$$p(L-1) \ge 1 - 2\exp(-1) > 0 \tag{286}$$
Thus $p(n) > 0$ in $[L-R, L-1]$.
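The three positivity claims used above (Lemma 21 for $v$, the auxiliary function $r$ of Appendix L, and the function $p$ of Appendix N) are easy to sanity-check numerically. The sketch below is a check on a finite grid under stated assumptions, not a proof: it assumes $\rho = 1/R$, and it uses the forms $r(x) = \frac{3}{2}x^2 - x - \ln(1-x)(1-x)^2$ and $p(n) = 1 - 2(1-\rho)^{n-L+1+R} + 2L\rho - 2n\rho - 2\rho$ as read from the text. The grid and the ranges of $R$ and $L$ are arbitrary choices for illustration.

```python
import math


def v(x):
    # Function of Lemma 21: v(x) = x + ln(1 - x) * (1 - x)
    return x + math.log(1 - x) * (1 - x)


def r(x):
    # Auxiliary function of Appendix L (form assumed as stated in the lead-in)
    return 1.5 * x * x - x - math.log(1 - x) * (1 - x) ** 2


def p(n, L, R):
    # Increment function of Appendix N, assuming rho = 1 / R
    rho = 1.0 / R
    return 1 - 2 * (1 - rho) ** (n - L + 1 + R) + 2 * L * rho - 2 * n * rho - 2 * rho


grid = [i / 1000 for i in range(1, 1000)]  # points of ]0, 1[
lemma21_ok = all(v(x) > 0 for x in grid)
r_ok = all(r(x) > 0 for x in grid)
p_ok = all(
    p(n, L, R) > 0
    for R in range(2, 30)
    for L in (2 * R, 3 * R + 5)   # arbitrary buffer sizes with L >= 2R
    for n in range(L - R, L)      # n in [L - R, L - 1]
)
```

The two endpoint values that drive the concavity argument can be checked the same way: `p(L - R, L, R)` evaluates to 1 up to rounding, and `p(L - 1, L, R)` equals `1 - 2 * (1 - 1/R) ** R`.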
Hence, for $n \in [L-R, L-1]$:
$$\sum_{q=0}^{L} a u^{n+1}(q) q - \sum_{q=0}^{L} a u^{n}(q) q > 0 \tag{287}$$

APPENDIX O
PROOF OF LEMMA

For $W = x_{i,j}$, $y_i(W) = y_j(W)$, i.e.:
$$\sum_{q=0}^{L} a u^{i}(q) q - W \sum_{q=0}^{i} u^{i}(q) = \sum_{q=0}^{L} a u^{j}(q) q - W \sum_{q=0}^{j} u^{j}(q) \tag{288}$$
$$\sum_{q=0}^{L} a u^{i}(q) q - \sum_{q=0}^{L} a u^{j}(q) q = W \sum_{q=0}^{i} u^{i}(q) - W \sum_{q=0}^{j} u^{j}(q) \tag{289}$$
$$\sum_{q=0}^{L} a u^{i}(q) q - \sum_{q=0}^{L} a u^{j}(q) q = W \left[ \sum_{q=0}^{i} u^{i}(q) - \sum_{q=0}^{j} u^{j}(q) \right] \tag{290}$$
Hence
$$W = \frac{\sum_{q=0}^{L} a u^{i}(q) q - \sum_{q=0}^{L} a u^{j}(q) q}{\sum_{q=0}^{i} u^{i}(q) - \sum_{q=0}^{j} u^{j}(q)} \tag{291}$$

APPENDIX P
PROOF OF LEMMA

Lemma 22. $w_n$ is strictly increasing in $n \in [0, R-1]$.
Proof: for $n \in [0, R-2]$:
$$w_{n+1} - w_n = \frac{aR^2}{(R-n)(R-n-1)} > 0. \tag{292}$$
Let us first consider the interval $[0, R-1]$.
We have:
$$f'(n) = (w_n)' \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right] + w_n \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right]' + \left[ a\left( \frac{R-1}{2} + \frac{n(n+1)}{2R} \right) \right]' \tag{293}$$
First, we deal with the first term $(w_n)' \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right]$.
According to Lemma 22, $(w_n)'$ is positive since $w_n$ is increasing in $n$, and $1 - \sum_{q=0}^{n} u^{n}(q) = 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R}$ is strictly positive since $\sum_{q=0}^{n} u^{n}(q) < 1$ for $n \le R-1 < L$. Then $(w_n)' \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right] \ge 0$ for $n \in [0, R-1]$.
For the second term, we have:
$$w_n \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right]' = \frac{a\,(2n^2R - 2R^2 n + Rn)}{(R-n)(2R^2)} \tag{294}$$
For the third term, $\left[ a\left( \frac{R-1}{2} + \frac{n(n+1)}{2R} \right) \right]' = a\,\frac{2n+1}{2R}$.
Adding the second term to the third term, we get:
$$w_n \left[ 1 - \left(1 - \frac{n}{2R}\right)\frac{n+1}{R} \right]' + \left[ a\left( \frac{R-1}{2} + \frac{n(n+1)}{2R} \right) \right]' = \frac{a\,(2n^2R - 2R^2 n + Rn)}{(R-n)(2R^2)} + a\,\frac{2n+1}{2R} \tag{295}$$
$$= \frac{a}{2(R-n)} > 0 \tag{296}$$
So $f$ is strictly increasing in $[0, R-1]$. For $n = -1$, $f(-1) = 0 < f(0) = a\,\frac{R-1}{2}$, and $f(R-1) < +\infty$; then $f$ is strictly increasing in $[-1, R]$.

APPENDIX Q
PROOF OF THEOREM

Lemma 23. Consider any sequence $-1 \le i_{-1} < i_0 < i_1 < \dots < i_M \le L$ such that, for any $k \in [0, M-1]$, $b_{i_{k-1}} < b_{i_k} < b_{i_{k+1}}$ and
$$\frac{a_{i_k} - a_{i_{k-1}}}{b_{i_k} - b_{i_{k-1}}} < \frac{a_{i_{k+1}} - a_{i_k}}{b_{i_{k+1}} - b_{i_k}} \tag{297}$$
Then, for any $k \in [0, M-1]$ and each $k < s \le M$:
$$\frac{a_{i_s} - a_{i_{k-1}}}{b_{i_s} - b_{i_{k-1}}} > \frac{a_{i_k} - a_{i_{k-1}}}{b_{i_k} - b_{i_{k-1}}} \tag{298}$$
Proof: We fix a given $k \in [0, M-1]$ and prove the result by induction on $s$.
For $s = k+1$:
$$\frac{a_{i_{k+1}} - a_{i_{k-1}}}{b_{i_{k+1}} - b_{i_{k-1}}} = \frac{a_{i_{k+1}} - a_{i_{k-1}} - a_{i_k} + a_{i_k}}{b_{i_{k+1}} - b_{i_{k-1}}} \tag{299}$$
$$= \frac{a_{i_{k+1}} - a_{i_k}}{b_{i_{k+1}} - b_{i_{k-1}}} + \frac{a_{i_k} - a_{i_{k-1}}}{b_{i_{k+1}} - b_{i_{k-1}}} \tag{300}$$
$$> \frac{(a_{i_k} - a_{i_{k-1}})(b_{i_{k+1}} - b_{i_k})}{(b_{i_k} - b_{i_{k-1}})(b_{i_{k+1}} - b_{i_{k-1}})} + \frac{(a_{i_k} - a_{i_{k-1}})(b_{i_k} - b_{i_{k-1}})}{(b_{i_k} - b_{i_{k-1}})(b_{i_{k+1}} - b_{i_{k-1}})} \tag{301}$$
where the strict inequality comes from the lemma's assumptions. We then have:
$$\frac{a_{i_{k+1}} - a_{i_{k-1}}}{b_{i_{k+1}} - b_{i_{k-1}}} > \frac{a_{i_k} - a_{i_{k-1}}}{b_{i_k} - b_{i_{k-1}}} \left[ \frac{b_{i_{k+1}} - b_{i_k}}{b_{i_{k+1}} - b_{i_{k-1}}} + \frac{b_{i_k} - b_{i_{k-1}}}{b_{i_{k+1}} - b_{i_{k-1}}} \right] \tag{302}$$
$$= \frac{a_{i_k} - a_{i_{k-1}}}{b_{i_k} - b_{i_{k-1}}} \tag{303}$$
For the induction step, we assume that the above inequality holds for a given $s$ strictly higher than $k$.
The inequality belowis then verified for s + 1 : a i s +1 − a i k − b i s +1 − b i k − = a i s +1 − a i k − − a i s + a i s b i s +1 − b i k − (304) = a i s +1 − a i s b i s +1 − b i k − + a i s − a i k − b i s +1 − b i k − (305) > ( a i k − a i k − )( b i s +1 − b i s )( b i k − b i k − )( b i s +1 − b i k − ) + ( a i k − a i k − )( b i s − b i k − )( b i k − b i k − )( b i s +1 − b i k − ) (306) = a i k − a i k − b i k − b i k − [ b i s +1 − b i s b i s +1 − b i k − + b i s − b i k − b i s +1 − b i k − ] (307) = a i k − a i k − b i k − b i k − . (308)So the inequality is also true for s + 1 . This concludes the proof of the lemma. May 22, 2020 DRAFT2 Lemma 24. If L ≤ f ( d +1) a , then a L − a d b L − b d ≤ a d +1 − a d b d +1 − b d Proof: This lemma is an immediate application of Lemma 19.In fact when L ≤ f ( d +1) a it implies that a L − a d +1 b L − b d +1 ≤ a d +1 − a d b d +1 − b d Then according to the second case in Lemma 19, we have directly: a L − a d b L − b d ≤ a d +1 − a d b d +1 − b d (309) Lemma 25. The intersection points x L,R and x n,R satisfy x L,R ≤ x n,R , when n ∈ [ L − R + 2 , L − .Proof: We have: x L,R = 2 R ( L − R − − R (310) x n,R = n − R ρ ( L − − n )( L − n ) + 1 − (1 − ρ ) n − L + R +1 − − R − R (311) x n,R − x L,R = n − R ρ ( L − − n )( L − n ) + 1 − (1 − ρ ) n − L + R +1 − − R − R ( L − R − (312) = ( n − R )( R − − R ( L − ρ ( L − − n )( L − n ) + 1 − (1 − ρ ) n − L + R +1 − − R )( R − ρ ( L − − n )( L − n ) + 1 − (1 − ρ ) n − L + R +1 − − R ) (313)The denominator is greater than since R > , and h ( n ) = ρ ( L − − n )( L − n ) + 1 − (1 − ρ ) n − L + R +1 >h ( L − R + 1) = + R for n ∈ [ L − R + 2 , L − (using Lemma 4).We consider the following function (which is equal to the numerator): p ( x ) = ( x − R )( R − − R ( L − ρ L − − x )( L − x ) + 1 − (1 − ρ ) x − L + R +1 − − R ) (314)The function p is concave in the interval [ L − R + 1 , L − as p (cid:48)(cid:48) is negative. 
Then, p is quasi-concave in thisinterval and we have that p ( x ) ≥ min( p ( L − R + 1) , p ( L − for all x ∈ [ L − R + 1 , L − , where p ( L − R + 1) = ( L − R + 1)( R − ≥ (315)and p ( L − 1) = 2 R ( L − − ρ ) R + R − R ≥ (316)where the last inequality is due to the following analysis. First we use the fact that (1 − ρ ) R ≥ / for all R ≥ then R ( L − − ρ ) R ≥ R ( L − (317) R ( L − − ρ ) R + R − R ≥ R ( L − R − R (318) May 22, 2020 DRAFT3 We have L − ≥ R − , then: R ( L − − ρ ) R + R − R ≥ R (2 R − R − R (319) ≥ R − R R − R (320) ≥ R ≥ (321)From all the analysis above, we conclude that for all n ∈ [ L − R + 1 , L − p ( n ) ≥ . This is also true for n ∈ [ L − R + 2 , L − . Hence, the numerator and denominator of x n,R − x L,R are positive , which concludes theproof. Lemma 26. For any d ∈ [0 , R − , x L,d ≤ x n,d for any n ∈ [ L − R + 2 , L − .Proof: We start by proving that a L − a R − b L − b R − ≤ a n − a R − b n − b R − . We have: a L − a R − b L − b R − = a L − a R b L − b R − + a R − a R − b L − b R − (322)Since b R = b R − (see the expression of average passive time when n ∈ [ R − , L − R + 1] ), then: a L − a R − b L − b R − = a L − a R b L − b R + a R − a R − b L − b R − (323)As we have already proved in Lemma 25 that: a L − a R b L − b R ≤ a n − a R b n − b R . 
Hence: a L − a R − b L − b R − ≤ a n − a R b n − b R − + a R − a R − b L − b R − (324)Since b L > b n , hence: a L − a R − b L − b R − ≤ a n − a R b n − b R − + a R − a R − b n − b R − (325) = a n − a R − b n − b R − (326)Thus: a L − a R − b L − b R − ≤ a n − a R − b n − b R − (327)If d = R − , the proof is direct result from the inequality above.If d < R − :Given that a L − a R − b L − b R − ≤ a n − a R − b n − b R − , then applying lemma 19 fourth case, we deduce: a L − a n b L − b n ≤ a L − a R − b L − b R − ≤ a n − a R − b n − b R − (328)Now we prove that: a L − a R − b L − b R − ≤ a L − a d b L − b d (329)Given that L ≤ f ( d + 1) /a : a L − a d +1 b L − b d +1 ≤ a d +1 − a d b d +1 − b d .Hence applying lemma 24: a L − a d b L − b d ≤ a d +1 − a d b d +1 − b d (330) May 22, 2020 DRAFT4 According to Lemma 23, since w d +1 < · · · < w R − , thus: a d +1 − a d b d +1 − b d ≤ a R − − a d b R − − b d (331)Then: a L − a d b L − b d ≤ a R − − a d b R − − b d (332)Given that a L − a d b L − b d ≤ a R − − a d b R − − b d and applying Lemma 19 (fourth case), then: a L − a R − b L − b R − ≤ a L − a d b L − b d ≤ a R − − a d b R − − b d (333)Combining (328) and (333), we conclude: a L − a n b L − b n ≤ a L − a R − b L − b R − ≤ a L − a d b L − b d (334) a L − a n b L − b n ≤ a L − a d b L − b d (335)Given this result and applying lemma19 sixth case, we get our result: a n − a d b n − b d ≥ a L − a d b L − b d (336)Hence x n,d ≥ x L,d .This concludes the proof.Now, we can prove the proposition.Referring to the algorithm 1 that allows us to obtain the Whittle indices, we denote by j the step j described inthe algorithm.For ≤ j ≤ d ≤ R − We prove that for all n ∈ [ j + 1 , L ] , a n − a j − b n − b j − > a j − a j − b j − b j − We study four cases:1) n ∈ [ j + 1 , R − :Using lemma 22, w j < w j +1 < .... 
< w R − , therefore considering the set of element { j − , j, j + 1 , ..., R − } ,we can apply lemma 23, since a k − a k − b k − b k − < a k +1 − a k b k +1 − b k for all k ∈ [ j, R − .So for all n ∈ [ j + 1 , R − , a n − a j − b n − b j − > a j − a j − b j − b j − n ∈ [ R, L − R + 1] :There are two cases:a) j = R − :We have b n = b R − = b j , then a n > a j . Hence, a n − a j − b n − b j − > a j − a j − b j − b j − b) j < R − : b n = b R − ,and a n > a R − , then a n − a R − b n − b R − > a R − − a R − b R − − b R − . Therefore, by considering the set { j − , j, j + 1 , ...., R − May 22, 2020 DRAFT5 , n } , we have a n − a R − b n − b R − > a R − − a R − b R − − b R − = w R − > · · · > w j .Thus, we can apply Lemma 23 and get a n − a j − b n − b j − > a j − a j − b j − b j − .3) n ∈ [ L − R + 2 , L − :Using Lemma 26, we have a n − a d b n − b d ≥ a L − a d b L − b d .Given that f ( d ) a < L , that means a L − a d b L − b d > a d − a d − b d − b d − So considering the set { j − , j, ...d, n } , we have a n − a d b n − b d > a d − a d − b d − b d − = w d > ... > w j .Then we can apply Lemma 23 and obtain a n − a j − b n − b j − > a j − a j − b j − b j − .4) n = L We have a L − a d b L − b d > a d − a d − b d − b d − = w d > · · · > w j .Then, applying Lemma 23, a L − a j − b L − b j − > a j − a j − b j − b j − .Therefore, the largest minimizer at step j is j , and W ( j ) = w j = a j − a j − b j − b j − At step d + 1 :The largest minimizer at step d was d , then in order to prove that the largest minimizer at this step is L , we shouldprove that for all n > d , we have: a L − a d b L − b d ≤ a n − a d b n − b d . We distinguish again between three cases:1) n ∈ [ d + 1 , R − :We know that w d +1 < ... < w R − . 
Then, considering the set { d, d + 1 , ..., R − } and according to Lemma 23, weget a d +1 − a d b d +1 − b d ≤ a n − a d b n − b d for all n ∈ [ d + 1 , R − .Since a L − a d b L − b d ≤ a d +1 − a d b d +1 − b d (according to Lemma 24), then a L − a d b L − b d ≤ a n − a d b n − b d for all n ∈ [ d + 1 , R − .2) n ∈ [ R, L − R + 1] :a) d = R − :We have b n = b R − = b d . The case where the passive decision average time b n is equal to b n d = b d is not includedin the computation of Whittle indices (recall that n d is the largest minimizer at step d which is d ). This case canbe hence skipped.b) d = R − : b n = b R − ,and a n > a R − , then applying Lemma 24 we have a n − a R − b n − b R − > a R − − a R − b R − − b R − ≥ a L − a d b L − b d , and we concludethe result.c) d < R − :We have a n − a R − b n − b R − > a R − − a R − b R − − b R − . Therefore, by considering the set { d, d + 1 , ...., R − , n } , we have a d +1 − a d b d +1 − b d = May 22, 2020 DRAFT6 w d +1 < · · · ≤ w R − = a R − − a R − b R − − b R − < a n − a R − b n − b R − Combining Lemma 23 and Lemma 24, we get a L − a d b L − b d ≤ a d +1 − a d b d +1 − b d < a n − a d b n − b d for all n ∈ [ R, L − R + 1] n ∈ [ L − R + 2 , L − :Applying Lemma 26, we have a L − a d b L − b d < a n − a d b n − b d .Hence we proved that at step d + 1 , the largest minimizer is L . Therefore the Whittle’s index for all state i from d + 1 until L is W ( i ) = x L,d = a L − a d b L − b d .This concludes the proof of the proposition. A PPENDIX RP ROOF OF P ROPOSITION k in which W is different from all W ki .2) Class k such that there exists a given state j that satisfies W kj = W .First type of classes: For the class k in which W is different from all W ki , we prove that the optimal thresholdverifies l k ( W ) = l k = max i { arg max i { W ki | W ki ≤ W }} = max i { arg max i { W ki ) | W ki < W }} . 
First, we have $\max_i\{\arg\max_i\{W_i^k \mid W_i^k \le W\}\} = \max_i\{\arg\max_i\{W_i^k \mid W_i^k < W\}\}$, since $W_i^k$ is different from $W$ for every state $i$. For a state $i$ less than $l_k$, given that $W_i^k$ is increasing in $i$, we have $W_i^k \le W_{l_k}^k < W$. Hence, due to the indexability of the class, $D(W_i^k) \subseteq D(W)$, which implies that the optimal decision at state $i$ is the passive action. For a state $i$ strictly greater than $l_k$, by definition of $l_k$, $W_i^k$ must be strictly greater than $W$, since $l_k$ is the largest integer among the states that attain the largest Whittle index less than $W$. Then, according to the definition of the Whittle index, $W < W_i^k = \min\{W' : i \in D(W')\}$, which means that $W \notin \{W' : i \in D(W')\}$, and therefore $i \notin D(W)$. Thus, the optimal decision at any state $i > l_k$ is the active decision. Hence $l_k = \max_i\{\arg\max_i\{W_i^k \mid W_i^k \le W\}\} = \max_i\{\arg\max_i\{W_i^k \mid W_i^k < W\}\}$ is indeed the optimal threshold $l_k(W)$.

For the second type of classes, we first describe qualitatively the optimal threshold with respect to $W$. Then we prove the explicit expression.

Second type of classes:
For a class $k$ such that there exists $j$ with $W_j^k = W$, we distinguish between two cases:

1) $j \le R_k - 1$:
According to Theorem 3, $W_j^k = w_j^k = x_{j,j-1}$, which is the point such that, if $W = x_{j,j-1}$, we have $\sum_{q=0}^{L} a u^{j}(q) q - W \sum_{q=0}^{j} u^{j}(q) = \sum_{q=0}^{L} a u^{j-1}(q) q - W \sum_{q=0}^{j-1} u^{j-1}(q)$. That means, according to equation (58), that for $W = x_{j,j-1}$, if $j$ is a minimizer of this equation (i.e. $j$ is the optimal threshold), then $j-1$ is also a minimizer of this equation. Due to indexability, for all states less than or equal to $j$ the optimal decision is to stay passive.
Also, according to the definition of the Whittle index, for all states strictly higher than $j$ the optimal decision is to be active. Then $j$ can be the threshold, and so can $j-1$.
Hence, the optimal threshold can be either $j$ or $j-1$.
In fact, since $W_0^k < \cdots < W_{j-1}^k < W_j^k = W$, we have $j = \max_i\{\arg\max_i\{W_i^k \mid W_i^k \le W\}\}$ and $j - 1 = \max_i\{\arg\max_i\{W_i^k \mid W_i^k < W\}\}$.
This proves the proposition for this case.

2) $j \ge R_k$:
Then $W_j^k = W_L^k = W_{R_k}^k = W$; thus, according to Theorem 3, $W = x_{L,R_k-1}$. That means that, at $W$, the threshold can be either $L$ or $R_k - 1$. $L$ is the largest integer such that $W_j^k = W$, and $R_k - 1$ is the largest integer that satisfies the strict inequality; explicitly, $L = \max_i\{\arg\max_i\{W_i^k \mid W_i^k \le W\}\}$ and $R_k - 1 = \max_i\{\arg\max_i\{W_i^k \mid W_i^k < W\}\}$.

APPENDIX S
PROOF OF PROPOSITION

We have:
$$\max_W \min_{\phi \in \Phi} f(W, \phi) \le \min_{\phi \in \Phi} \limsup_{T \to \infty} \frac{1}{T} E\left[ \sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} a_k q_i^k(t) \,\Big|\, q(0), \phi \right] \tag{337}$$
As the optimal solution for fixed $W$ is a threshold policy, we use the steady-state form, and the LHS of the inequality becomes:
$$\max_W \min_{\phi} f(W, \phi) = \max_W \left\{ \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \left[ \min_{l_k \in [0, L]} \left\{ \sum_{q=0}^{L} a_k u_k^{l_k}(q) q - W \sum_{q=0}^{l_k} u_k^{l_k}(q) \right\} \right] + W(1-\alpha)N \right\} \tag{338}$$
with $\phi$ the threshold policy that corresponds to $l(W)$, computed using Proposition 5 for fixed $W$. For $W^*$ that satisfies the constraint with equality (i.e. $\alpha N = \sum_{k=1}^{K} \gamma_k N \sum_{i=l_k(W^*)+1}^{L} u_k^{l_k(W^*)}(i)$, which in fact holds for all $N$, so that $N$ can be dropped), we get exactly the objective function of the primal problem. Therefore, we get a threshold vector $l(W^*)$ whose cost in the primal problem is no larger than the optimal cost of this problem, according to inequality (337).
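The two threshold candidates discussed in Appendix R, $\max_i\{\arg\max_i\{W_i^k \mid W_i^k \le W\}\}$ and its strict counterpart with $W_i^k < W$, can be illustrated with a minimal Python sketch. The helper name `threshold`, the use of `-1` to encode "no passive state" and the toy index vector are assumptions made for this example; the indices are only assumed to be nondecreasing in the state, with a plateau standing in for the states that share the same index.

```python
def threshold(indices, W, strict=False):
    """Largest state among those attaining the largest index <= W (or < W).

    Hypothetical helper: indices[i] plays the role of the Whittle index of
    state i for one class; returns -1 when no state qualifies (assumption).
    """
    feasible = [w for w in indices if (w < W if strict else w <= W)]
    if not feasible:
        return -1
    best = max(feasible)  # largest index value not exceeding W
    return max(i for i, w in enumerate(indices) if w == best)


# Toy indices (assumed): strictly increasing, then a plateau up to state 5.
indices = [0, 2, 5, 9, 9, 9]

generic = (threshold(indices, 6), threshold(indices, 6, strict=True))
at_index = (threshold(indices, 5), threshold(indices, 5, strict=True))
at_plateau = (threshold(indices, 9), threshold(indices, 9, strict=True))
```

When $W$ differs from every index (here $W = 6$) the two rules agree, which is the first type of class. When $W$ equals the index of a state below the plateau ($W = 5$) they return $j$ and $j - 1$, and when $W$ equals the plateau value ($W = 9$) they return the last state and the last state before the plateau, mirroring the two sub-cases of the second type.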
Then, surely this solution given by l ( W ∗ ) is the optimal one for the constrained relaxedproblem, since it satisfies the constraint and for all policy φ that satisfies the constraint and belong to Φ , we have f ( W ∗ , l ( W ∗ )) = (cid:80) Kk =1 (cid:80) γ k Ni =1 [ (cid:80) Lq =0 a k u l k ( W ∗ ) k ( q ) q ] = lim sup T →∞ T E (cid:104)(cid:80) T − t =0 (cid:80) Kk =1 (cid:80) γ k Ni =1 a k q ki ( t ) | q (0) , l ( W ∗ ) (cid:105) ≤ min φ lim sup T →∞ T E (cid:104)(cid:80) T − t =0 (cid:80) Kk =1 (cid:80) γ k Ni =1 a k q ki ( t ) | q (0) , φ (cid:105) .We deduce that the solution of the relaxed problem is of type threshold-based policy l ( W ∗ ) with W ∗ satisfies α = (cid:80) Kk =1 γ k (cid:80) Li = l k +1 ( W ∗ ) u l k ( W ∗ ) k ( i ) . A PPENDIX TP ROOF OF P ROPOSITION Lemma 27. For each class k , (cid:80) Ln +1 u nk ( i ) is strictly decreasing in n , when n ∈ [ − , R k − ∪ L .Proof. We have (cid:80) n u nk ( i ) is strictly increasing in this set (see Lemma 3 and the fact that (cid:80) L u Lk ( i ) > (cid:80) R k − u R k − k ( i ) ), then (cid:80) Ln +1 u nk ( i ) = 1 − (cid:80) n u nk ( i ) , is strictly decreasing in n . (cid:4) May 22, 2020 DRAFT8 We define the following order relation in R K such that for any two vectors l and l , l ≤ l ⇐⇒ for eachelement of vector of index k , we have l k ≤ l k . Recall that according to Proposition 5, we can directly deduce thatfor W ≤ W l ( W ) ≤ l ( W ) and for all W and class k , l k ( W ) can be either less than R k − or equal to L .Without loss of generality, when W ∈ R + , the corresponding set of threshold vectors l ( W ) is perfectly ordered.Then, by applying Lemma 27, (cid:80) Kk =1 γ k (cid:80) Li = l k ( W )+1 u l k ( W ) k ( i ) is strictly decreasing in l ( W ) , and take discretevalues from to . According to Proposition 5, we have for each class k and state i , if W = W ki then there is twopossible optimal thresholds vectors l ( W ) and l ( W ) with l ( W ) < l ( W ) . 
Hence we can deduce that there exists aclass m and state p such that (cid:80) Kk =1 γ k (cid:80) Li = l k ( W mp )+1 u l k ( W mp ) k ( i ) ≥ α and (cid:80) Kk =1 γ k (cid:80) Li = l k ( W mp )+1 u l k ( W mp ) k ( i ) ≤ α .We find the relation between l ( W mp ) and l ( W mp ) .Before that, we prove that l k ( W mp ) is less than R k − for all class k under assumption 2 and 1. For that we needto check if there exist W such that l ( W ) = ( R , · · · , R K ) . In fact, according to assumption 1, we can deducethat W k (cid:48) L is strictly greater than W ki for i ∈ [0 , R k − for all k and k (cid:48) (we check that by replacing the expressionof W k (cid:48) L and W kR k − given in Theorem 3). Hence there exists a given W such that W ki < W for i ∈ [0 , R k − and W k (cid:48) L > W for all k and k (cid:48) . Then for a such W denoted W the optimal threshold for each class k is l k ( W ) = R k − .According to the expression of the average passive time given in section V, (cid:80) Kk =1 γ k (cid:80) R k − i =0 u R k − k ( i ) = + (cid:80) Kk =1 γ k R k , therefore (cid:80) Kk =1 γ k (cid:80) Li = R k u R k − k ( i ) = − (cid:80) Kk =1 γ k R k . Hence, considering the assumption 2, α ≥ (cid:80) Kk =1 γ k (cid:80) Li = R k u R k − k ( i ) .As (cid:80) Kk =1 γ k (cid:80) Li = l k ( W mp )+1 u l k ( W mp ) k ( i ) ≥ α , then l ( W mp ) ≤ ( R − , · · · , R K − 1) = l ( W ) . Given that thethresholds vector are increasing in W , W mp ≤ W , hence l ( W mp ) ≤ l ( W ) = ( R − , · · · , R K − Therefore, l m ( W mp ) ≤ R k − , then according to Proposition 5, when W = W mp , l m ( W mp ) = l m ( W mp ) and l m ( W mp ) = l m ( W mp ) − l m ( W mp ) − can be both the optimal thresholds for class m . As for the other classes, l k ( W mp ) = l k ( W mp ) = l k ( W mp ) .If we force W ∗ to be equal to W mp , the optimal threshold vector can be either l ( W mp ) or l ( W mp ) , then we canintroduce some randomization between the two policies. 
In other words, we use the threshold policy l ( W mp ) withprobability θ and l ( W mp ) with probability − θ . The new stationary distribution for the class m is then a linearcombination of these two threshold policies l m ( W mp ) and l m ( W mp ) − : u ∗ m = θu l m ( W mp ) m + (1 − θ ) u l m ( W mp ) − m .Hence, in a state strictly less than l m ( W mp ) , the queues will not transmit, whereas in a state strictly greaterthan l m ( W mp ) , they will transmit with probability one. If the queues are in state l m ( W mp ) , they will transmitwith probability (1 − θ ) u lm ( Wmp ) − m ( l m ( W mp )) θu lm ( Wmp ) m ( l m ( W mp ))+(1 − θ ) u lm ( Wmp ) − m ( l m ( W mp )) . Since the probability to be in this state l m ( W mp ) is u ∗ m ( l m ( W mp )) , the proportion of time that the queues will be in active mode is: α = (cid:88) k (cid:54) = m L (cid:88) i = l k ( W mp )+1 γ k u l k ( W mp ) k ( i ) + L (cid:88) i = l m ( W mp )+1 γ m u ∗ m ( i ) + (1 − θ ) γ m u l m ( W mp ) − m ( l m ( W mp )) When θ = 0 , the threshold policy is l m ( W mp ) − and the total average time in active mode is higher than α . When θ = 1 , the threshold policy is l m ( W mp ) and the total average time in active mode is less than α .Given that (cid:80) k (cid:54) = m (cid:80) Li = l k ( W mp )+1 γ k u l k ( W mp ) k ( i ) + (cid:80) Li = l m ( W mp )+1 γ m u ∗ m ( i ) + (1 − θ ) γ m u l m ( W mp ) − m ( l m ( W mp )) is May 22, 2020 DRAFT9 continuous in θ , then there exists at least one θ which verifies the equality. Hence, for W ∗ = W mp , we get athreshold policy for all classes except for class m where the optimal solution is a linear combination of twothreshold policies. Moreover for a given randomized parameter θ , the constraint (4) is satisfied with equality.A PPENDIX UP ROOF OF P ROPOSITION Q .The matrix Q is of the form: Q · · · · · · · · · · · · Q · · · · · · · · · · · · ... . . . A A · · · Q m · · · A K − A K ... . . . ... 
· · · · · · · · · $Q_{K-1}$ 0
0 0 · · · · · · · · · $Q_K$ (339)

The characteristic polynomial of $Q$ is the product of the characteristic polynomials of the matrices $Q_k$:
$$\chi_Q(\lambda) = \prod_{k=1}^{K} \chi_{Q_k}(\lambda) \tag{340}$$

1) The case $k \neq m$: $Q_k$ is the matrix indexed by the queue states (with column labels of the form $l-1$, $l+1$, $l+2$, ..., $R-1$, $R$, $R+1$, ..., $l+R-1$, $l+R$, $l+R+1$, ..., $L$), whose entries belong to $\{0, \rho_k, -\rho_k\}$. [Matrix $Q_k$]

After computations and some algebraic manipulations, we get:
$$\chi_{Q_k}(\lambda) = (-\lambda)^{L}$$

2) The case $k = m$: $Q_m$ is the matrix indexed by the same states, whose entries belong to $\{0, \rho_m, -\rho_m\}$. [Matrix $Q_m$]

After computations and some algebraic manipulations, we get:
$$\chi_{Q_m}(\lambda) = (-\lambda)^{L-1}(l_m\rho_m - \lambda)$$

For $k \neq m$, $Q_k$ has only 0 as an eigenvalue.
For $k = m$, $\chi_{Q_m}(\lambda) = 0 \Leftrightarrow \lambda = 0$ or $\lambda = l_m\rho_m$; hence $Q_m$ has two eigenvalues, which are 0 and $l_m\rho_m$. Given Assumption 2, the optimal threshold $l_k$ is strictly less than $R_k$ for all $k$. Accordingly, $l_m\rho_m < R_m\rho_m = 1$.
Consequently, in both cases, the absolute values of all eigenvalues of the obtained matrix are strictly less than 1.

APPENDIX V
PROOF OF LEMMA [...]

[...] $< \epsilon < \mu$, $Z^N(t)$ converges to $z^*$, i.e., there exists $T$ such that for all $t \geq T$, $\|Z^N(t) - z^*\| \leq \epsilon$.
Hence: $P_x(\sup_{T \leq t}$ [...]
Therefore: $P_x(\sup_{T \leq t}$ [...]

In fact,
$$1 - \alpha = \sum_{k \neq m} \sum_{i=0}^{l_k} \gamma_k u_k^{l_k}(i) + \sum_{i=0}^{l_m-1} \gamma_m u_m^{*}(i) + \theta\gamma_m u_m^{*}(l_m).$$
(355)

For any $k \in [1, K]$ and for any threshold $n_k < R_k$, replacing $u_k^{n_k}$ by its expression given in Section IV, we have:
$$\sum_{i=0}^{n_k} \gamma_k u_k^{n_k}(i) = \gamma_k \sum_{i=0}^{n_k} \left(\rho_k - (n_k - i)\rho_k^2\right) = \gamma_k (n_k+1)\rho_k - \gamma_k\rho_k^2\,\frac{(n_k+1)n_k}{2}$$
and
$$\sum_{i=R_k}^{R_k+n_k-1} \gamma_k u_k^{n_k}(i) = \gamma_k \sum_{i=0}^{n_k} (n_k - i)\rho_k^2 = \gamma_k\rho_k^2\,\frac{(n_k+1)n_k}{2} \tag{356}$$
Since $n_k < R_k$ and $R_k\rho_k = 1$, we have:
$$\gamma_k (n_k+1)\rho_k > \gamma_k\rho_k^2 (n_k+1)n_k \tag{357}$$
Hence:
$$\sum_{i=0}^{n_k} \gamma_k u_k^{n_k}(i) \geq \sum_{i=R_k}^{R_k+n_k-1} \gamma_k u_k^{n_k}(i) \tag{358}$$
That means, for $k \neq m$:
$$\sum_{k \neq m} \sum_{i=0}^{l_k} \gamma_k u_k^{l_k}(i) \geq \sum_{k \neq m} \sum_{i=R_k}^{R_k+l_k-1} \gamma_k u_k^{l_k}(i) \tag{359}$$
For $k = m$:
$$\begin{aligned}
\sum_{i=0}^{l_m-1} \gamma_m u_m^{*}(i) + \theta\gamma_m u_m^{*}(l_m)
&= \gamma_m (1-\theta) \sum_{i=0}^{l_m-1} u_m^{l_m-1}(i) + \theta\gamma_m \sum_{i=0}^{l_m} u_m^{l_m}(i) \\
&\geq \gamma_m (1-\theta) \sum_{i=R_m}^{R_m+l_m-2} u_m^{l_m-1}(i) + \theta\gamma_m \sum_{i=R_m}^{R_m+l_m-1} u_m^{l_m}(i) \\
&= \gamma_m (1-\theta) \sum_{i=R_m}^{R_m+l_m-1} u_m^{l_m-1}(i) + \theta\gamma_m \sum_{i=R_m}^{R_m+l_m-1} u_m^{l_m}(i) \\
&= \sum_{i=R_m}^{R_m+l_m-1} \gamma_m u_m^{*}(i)
\end{aligned} \tag{360}$$
The inequality comes from (358).
Then $\sum_{k \neq m} \sum_{i=R_k}^{R_k+l_k-1} \gamma_k u_k^{l_k}(i) + \sum_{i=R_m}^{R_m+l_m-1} \gamma_m u_m^{*}(i)$ is less than $\sum_{k \neq m} \sum_{i=0}^{l_k} \gamma_k u_k^{l_k}(i) + \sum_{i=0}^{l_m-1} \gamma_m u_m^{*}(i) + \theta\gamma_m u_m^{*}(l_m) = 1 - \alpha$.

In the remainder of the proof, we consider separately the cases $\alpha \leq 1/2$ and $\alpha > 1/2$.

If $\alpha \leq 1/2$, the proof of the desired result consists of 3 steps.

Step 1:
We start from state $z(0)$. For all $k \neq m$, we schedule exactly all the proportions $z^{k,*}_{l_k+1}, \ldots, z^{k,*}_{L}$, and for $k = m$, we schedule all the proportions $z^{m,*}_{l_m+1}, \ldots, z^{m,*}_{L}$ plus the proportion $(1-\theta) z^{m,*}_{l_m}$. The sum of these proportions is $\alpha$. We denote these sets of queues by group A.
We consider that, after scheduling, all these proportions will be at state $R_k - 1$ (depending on each class). Of the remaining proportion, which is equal to $1 - \alpha$, only an $\alpha$ proportion will be at state 1 (we call this group B). The rest, which equals $1 - 2\alpha$ (group C), will be at state 0. The queue state proportions vector for class $k \neq m$ after this step is:
$$z^k = \Big(z^k_0 = \beta_k,\ z^k_1 = \alpha_k,\ 0,\ 0,\ \cdots,\ z^k_{R_k-1} = \sum_{i=l_k+1}^{L} z^{k,*}_i,\ 0,\ \cdots,\ 0\Big) \tag{361}$$
The queue state proportions vector for class $k = m$ is:
$$z^m = \Big(\beta_m,\ \alpha_m,\ 0,\ 0,\ \cdots,\ \sum_{i=l_m+1}^{L} z^{m,*}_i + (1-\theta) z^{m,*}_{l_m},\ 0,\ \cdots,\ 0\Big) \tag{362}$$
with $\sum_k \alpha_k = \alpha$ and $\sum_k \beta_k = 1 - 2\alpha$.

Step 2:
Using the Whittle's Index policy, according to Lemma 28, group A is scheduled again. After scheduling, we consider that group B, which is at state 1, goes to state $R_k$ ($R_k - 1$ packets are the arrivals at each class-$k$ queue). For group C, the queues stay at state 0 (no arrivals).
For the $\alpha$ proportion scheduled (group A), we have for each $k$:
1) When $k \neq m$:
a) For each state $h$ from $l_k + 1$ until $R_k - 1$: exactly $z^{k,*}_h$ goes to state $h$ (this is feasible since a queue at state $R_k - 1$ that is scheduled can go to any other state strictly less than $R_k$).
b) For each state $h$ from $R_k$ until $R_k + l_k - 1$: exactly a $z^{k,*}_h$ proportion of queues goes to state $h - (R_k - 1)$, which is strictly less than $R_k$.
2) When $k = m$:
a) For each state from $l_m + 1$ until $R_m + l_m - 1$, the same analysis done for $k \neq m$ holds.
b) For $h = l_m$, $(1-\theta) z^{m,*}_{l_m}$ will be at state $l_m$.
Hence, after this step, the new queue state proportion vector for class $k \neq m$ is:
$$\big(\beta_k,\ z^k_1 = z^{k,*}_{R_k},\ \cdots,\ z^k_{l_k} = z^{k,*}_{R_k+l_k-1},\ z^k_{l_k+1} = z^{k,*}_{l_k+1},\ \cdots,\ z^k_{R_k-1} = z^{k,*}_{R_k-1},\ \alpha_k,\ 0,\ \cdots,\ 0\big) \tag{363}$$
The queue state proportion vector for class $k = m$ is:
$$\big(\beta_m,\ z^m_1 = z^{m,*}_{R_m},\ \cdots,\ z^m_{l_m} = z^{m,*}_{R_m+l_m-1} + (1-\theta) z^{m,*}_{l_m},\ z^m_{l_m+1} = z^{m,*}_{l_m+1},\ \cdots,\ z^m_{R_m-1} = z^{m,*}_{R_m-1},\ \alpha_m,\ 0,\ \cdots,\ 0\big) \tag{364}$$

Step 3:
Under Assumption 1, we have $w^k_L = w^k_R \geq w^{k'}_n$ for all $k$ and $k'$ and for $0 \leq n \leq R_{k'} - 1$.
That means that we schedule all the $\alpha$ queues at state $R_k$ (i.e., group B), which can therefore go to any state less than $R_k - 1$.
For the remaining $1 - 2\alpha$ queues that are in state 0 (i.e., group C), after applying a passive action (no transmission), their states will change to any state less than or equal to $R_k - 1$.
For group A ($\alpha$ proportion of queues), we have for each $k$:
1) For each state from $l_k + 1$ until $R_k - 1$: the queues stay at the same state (0 arrivals).
2) For $h$ from $R_k$ until $R_k + l_k - 1$: the proportion $z^{k,*}_h$ goes from state $h - (R_k - 1)$ to $h$ after $R_k - 1$ packets arrive.
3) For $k = m$ and $h = l_m$: the $(1-\theta) z^{m,*}_{l_m}$ proportion stays at the same state (0 arrivals).
So, after this step, we reach the optimal $z^*$ of the relaxed problem. The queue state proportion vector for class $k \neq m$ is:
$$z^{k,*} = \big(z^{k,*}_0,\ z^{k,*}_1,\ \ldots,\ z^{k,*}_{l_k+R_k-1},\ 0,\ \ldots,\ 0\big) \tag{365}$$
The queue state proportion vector for class $m$ is:
$$z^{m,*} = \big(z^{m,*}_0,\ z^{m,*}_1,\ \ldots,\ z^{m,*}_{l_m+R_m-1},\ 0,\ \ldots,\ 0\big) \tag{366}$$
This implies that we have reached the optimal proportion $z^*$.

If $\alpha > 1/2$:

Step 1: This is the same step as in the case $\alpha \leq 1/2$; however, all the $1 - \alpha$ queues (group B) that are not scheduled will be at state 1, since $1 - \alpha < \alpha$. Hence the new queue state proportions vector after this step for $k \neq m$ is:
$$z^k = \Big(0,\ z^k_1 = \beta_k,\ 0,\ 0,\ \cdots,\ z^k_{R_k-1} = \sum_{i=l_k+1}^{L} z^{k,*}_i,\ 0,\ \cdots,\ 0\Big) \tag{367}$$
For $k = m$:
$$z^m = \Big(0,\ z^m_1 = \beta_m,\ 0,\ 0,\ \cdots,\ \sum_{i=l_m+1}^{L} z^{m,*}_i + (1-\theta) z^{m,*}_{l_m},\ 0,\ \cdots,\ 0\Big) \tag{368}$$
with $\sum_k \beta_k = 1 - \alpha$.

Step 2: Group A is scheduled again, and the $1 - \alpha$ proportion of queues at state 1 (group B), which are not scheduled, will go to state $R_k$.
For $l_k + 1 \leq h \leq R_k - 1$, $z^{k,*}_h$ will be at state $h$ after scheduling.
For $R_k \leq h \leq R_k + l_k - 1$, $z^{k,*}_h$ will be at state $h - (R_k - 1)$, and $(1-\theta) z^{m,*}_{l_m}$ will be at state $l_m$.
Hence, after this step, the queue state proportion vector for class $k \neq m$ is:
$$\big(0,\ z^k_1 = z^{k,*}_{R_k},\ \cdots,\ z^k_{l_k} = z^{k,*}_{R_k+l_k-1},\ z^k_{l_k+1} = z^{k,*}_{l_k+1},\ \cdots,\ z^k_{R_k-1} = z^{k,*}_{R_k-1},\ \beta_k,\ 0,\ \cdots,\ 0\big) \tag{369}$$
For $k = m$:
$$\big(0,\ z^m_1 = z^{m,*}_{R_m},\ \cdots,\ z^m_{l_m} = z^{m,*}_{R_m+l_m-1} + (1-\theta) z^{m,*}_{l_m},\ z^m_{l_m+1} = z^{m,*}_{l_m+1},\ \cdots,\ z^m_{R_m-1} = z^{m,*}_{R_m-1},\ \beta_m,\ 0,\ \cdots,\ 0\big) \tag{370}$$

Step 3:
Using the Whittle's Index policy, we schedule the $(1 - \alpha)$ proportion of queues at state $R_k$ (group B), plus a proportion among group A. We divide group A into two disjoint proportions $A_1$ and $A_2$, where $A_1$ is defined as the set that contains all proportions $z^k_1$ to $z^k_{l_k}$ for each $k$, minus the part of $z^m_{l_m}$ equal to $(1-\theta) z^{m,*}_{l_m}$. Explicitly, replacing $z^k_i$ by its value at Step 2, we have $A_1 = \sum_{k \neq m} \sum_{i=R_k}^{R_k+l_k-1} z^{k,*}_i + \sum_{i=R_m}^{R_m+l_m-1} z^{m,*}_i$. Since this sum is less than $1 - \alpha$ according to Lemma 29, we can be sure that the whole proportion $A_1$ is not scheduled at Step 3. Since $B = 1 - \alpha < \alpha$ and $B + A_2 = 1 - A_1 > \alpha$, we just need to schedule, in addition to B, a proportion of $A_2$ called $A_3$. In fact, we choose as $A_3$ the queues of $A_2$ with the highest Whittle indices, such that $A_3 + B = \alpha$.
We denote $A_4 = A_2 - A_3$. Hence, in this step, the proportion scheduled is $A_3 + B$ and the proportion for which we take a passive decision is $A_1 + A_4$. However, we still need to prove that the Whittle index of the proportion $A_1$ is less than that of the $\alpha$ proportion of scheduled states (i.e., B plus $A_3$).
For group B at state $R_k$, $w^k_{R_k} \geq w^{k'}_n$ for all $k$ and $k'$ and $0 \leq n \leq R_{k'} - 1$; hence the Whittle indices of all other queue states, belonging to either $A_1$ or $A_2$, are less than those of the queue states belonging to group B.
For $A_2$: the states are surely among the states $l_k + 1, \ldots, R_k - 1$ for all $k$, plus the state $l_m$. Hence, the Whittle index of any of these states is higher than or equal to $w^*$, where $w^*$ is the optimal subsidy for the relaxed problem (following the definition of the optimal threshold vector $l$); this is also true for $A_3$.
For the proportion $A_1$: the whole proportion is at states that have an index less than or equal to $w^*$, which is less than the Whittle indices of proportion $A_2$. Hence, the Whittle indices of proportion $A_1$ are less than the Whittle indices of proportions $A_3$ and B. By definition of $A_4$, the Whittle indices of this proportion are less than the Whittle indices of proportion $A_3$ and, consequently, less than those of proportion B. This confirms that the whole proportion $A_1 + A_4$ is not scheduled.
For the proportion $(1 - \alpha)$ (group B) at $R_k$: this group of queues can go to any state less than $R_k - 1$ after scheduling. In fact, their states will go to all states less than $l_k$ for each $k$ according to the optimal proportion vector $z^*$, except for the state $l_m$ of class $m$, for which only $\theta z^{m,*}_{l_m}$ goes to state $l_m$.
For $A_3$: the queues in this group stay at the same states. In fact, for each class $k$, the states of the queues are all less than $R_k$. Then, by scheduling these queues, the departure will be equal to the queue length. On the other hand, by considering that the number of arriving packets is equal to the previous queue length, one can ensure that the states of the queues in this group remain unchanged.
For $A_4$: not scheduling the queues in this group implies that they stay at the same state, considering that the number of packet arrivals is 0.
For $A_1$: this group is not scheduled.
The state of the queues in class $k$ will change by adding $R_k - 1$ arriving packets to their previous length.
Consequently, after this step, the new queue state proportion vector is, for $k \neq m$:
$$z^{k,*} = \big(z^{k,*}_0,\ z^{k,*}_1,\ \ldots,\ z^{k,*}_{l_k+R_k-1},\ 0,\ \ldots,\ 0\big) \tag{371}$$
and for $k = m$:
$$z^{m,*} = \big(z^{m,*}_0,\ z^{m,*}_1,\ \ldots,\ z^{m,*}_{l_m+R_m-1},\ 0,\ \ldots,\ 0\big) \tag{372}$$
which means that we have reached the optimal proportion vector $z^*$.

APPENDIX Z
PROOF OF THEOREM

$$\lim_{T \to \infty} \frac{C^N_T(x)}{N} - \frac{C^{RP,N}}{N} = \sum_{k=1}^{K} \sum_{i=0}^{L} a_k E\big[Z^{k,N}_i(\infty)\big]\, i - \sum_{k=1}^{K} \sum_{i=0}^{L} a_k z^{k,*}_i\, i \tag{373}$$
The function $f : z \mapsto \sum_{k=1}^{K} \sum_{i=0}^{L} a_k z^k_i\, i$ is Lipschitz continuous. Hence, for an arbitrarily small $\epsilon$, there exists $\mu$ such that if $\|z - z^*\| < \mu$, then $|f(z) - f(z^*)| < \epsilon$.
We denote by $U_N$ the event $\sup \|Z^N(\infty) - z^*\| \geq \mu$. Then:
$$\begin{aligned}
\Big| \sum_{k=1}^{K} \sum_{i=0}^{L} a_k E\big[Z^{k,N}_i(\infty)\big]\, i - \sum_{k=1}^{K} \sum_{i=0}^{L} a_k z^{k,*}_i\, i \Big|
&\leq P(U_N)\, E\Big[ \Big| \sum_{k=1}^{K} \sum_{i=0}^{L} \big(a_k Z^{k,N}_i(\infty)\, i - a_k z^{k,*}_i\, i\big) \Big| \,\Big|\, U_N \Big] \\
&\quad + (1 - P(U_N))\, E\Big[ \Big| \sum_{k=1}^{K} \sum_{i=0}^{L} \big(a_k Z^{k,N}_i(\infty)\, i - a_k z^{k,*}_i\, i\big) \Big| \,\Big|\, \overline{U_N} \Big] \\
&\leq L(L+1) \sum_{k=1}^{K} a_k \gamma_k P(U_N) + (1 - P(U_N))\, \epsilon
\end{aligned} \tag{374}$$
According to Lemma 12, we have $\lim_{N \to \infty} P(U_N) = 0$. Then:
$$\lim_{N \to \infty} \Big| \sum_{k=1}^{K} \sum_{i=0}^{L} a_k E\big[Z^{k,N}_i(\infty)\big]\, i - \sum_{k=1}^{K} \sum_{i=0}^{L} a_k z^{k,*}_i\, i \Big| \leq \epsilon \tag{375}$$
This is true for any $\epsilon$. Finally, we have:
$$\lim_{N \to \infty} \Big| \lim_{T \to \infty} \frac{C^N_T(x)}{N} - \frac{C^{RP,N}}{N} \Big| = 0 \tag{376}$$
That completes the proof.
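As a side check on the sums compared in (356)-(358), the following minimal sketch evaluates both sides numerically in exact rational arithmetic. It rests on assumptions that are not stated in this form in the text: the per-state masses are taken to be $\rho - (n - i)\rho^2$ for $0 \leq i \leq n$ and $(n - j)\rho^2$ for the $j$-th state from $R$ onwards, with $\rho = 1/R$ and $\gamma = 1$ by default. The function names are hypothetical and only serve this illustration.

```python
from fractions import Fraction


def stationary_sums(n, R, gamma=Fraction(1)):
    """Sum the assumed per-state masses below and above R for threshold n.

    Assumption (not taken verbatim from the paper): rho = 1/R, the mass of
    state i in 0..n is rho - (n - i) * rho**2, and the mass of the j-th
    state counted from R is (n - j) * rho**2.
    """
    rho = Fraction(1, R)
    low = sum(gamma * (rho - (n - i) * rho**2) for i in range(n + 1))
    high = sum(gamma * (n - j) * rho**2 for j in range(n + 1))
    return low, high


def closed_forms(n, R, gamma=Fraction(1)):
    """Closed-form expressions for the two sums returned by stationary_sums."""
    rho = Fraction(1, R)
    half = gamma * rho**2 * (n + 1) * n / 2
    return gamma * (n + 1) * rho - half, half


def check_inequality(max_R):
    """Check, for every R <= max_R and every threshold n < R, that the sums
    match their closed forms and that the low-state sum dominates."""
    for R in range(2, max_R + 1):
        for n in range(R):
            low, high = stationary_sums(n, R)
            assert (low, high) == closed_forms(n, R)
            assert low > high
    return True


if __name__ == "__main__":
    print(stationary_sums(2, 4))
    print(check_inequality(12))
```

Under these assumptions the gap between the two sums is $\gamma (n+1)\rho(1 - n\rho)$, which is positive exactly when $n < R$; this is the condition used in the step from (357) to (358).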