On the Global Optimality of Whittle's index policy for minimizing the age of information

Abstract

This paper examines the average age minimization problem where only a fraction of the network users can transmit simultaneously over unreliable channels. Finding the optimal scheduling scheme in this case is known to be challenging. Accordingly, Whittle's index policy was proposed in the literature as a low-complexity heuristic for the problem. Although the policy is simple to implement, characterizing its performance is recognized to be a notoriously tricky task. In the sequel, we provide a new mathematical approach to establish its optimality in the many-users regime for specific network settings. Our novel approach is based on intricate techniques (e.g., the Cauchy criterion), and, unlike previous works in the literature, it is free of any mathematical assumptions. These findings show that Whittle's index policy has analytically provable asymptotic optimality for the AoI minimization problem. Finally, we lay out numerical results that corroborate our theoretical findings and demonstrate the policy's notable performance in the many-users regime.


KRIOUILE Saad*, ASSAAD Mohamad*, and MAATOUK Ali**
*TCL Chair on 5G, Laboratoire des Signaux et Systemes, CentraleSupelec, Gif-sur-Yvette, France


I. INTRODUCTION

Technological advances in wireless communications and the low cost of hardware have led to the emergence of real-time monitoring services. In these systems, an entity is interested in knowing the status of one or multiple processes observed by a remote source. Accordingly, the source sends packets to the monitor to provide information about the process or processes of interest. The main goal in these applications is to keep the monitor up to date. In fact, information has the highest value when it is fresh, since the monitor's tasks produce better outcomes when they are based on new rather than outdated data. To quantify this notion of freshness, the Age of Information (AoI) was introduced in [1]. Since then, the AoI has become an active research topic, and a considerable number of works have been published on the subject [2]–[9].

Among the most fundamental issues that the research community has aimed to address is age-based resource allocation. In most real-time applications, numerous sources share the same transmission channel, where the available resources are scarce. The scarcity can be a consequence of battery considerations for the devices involved or of physical interference that limits the number of simultaneous transmissions. Consequently, a smart resource allocation scheme has to be adopted to minimize the AoI and attain the desired timeliness objective. In [10], the authors proposed age-optimal and near age-optimal scheduling policies for the single- and multi-server cases, respectively. In particular, they showed that a greedy policy is age-optimal under certain assumptions in the single exponential server case. In [11], the authors examined a single-source scenario where the source's update rate cannot exceed a predefined limit due to battery considerations. In this case, they proposed an age-optimal scheduling policy when the channel exhibits possible decoding errors. Age-optimal policies were also proposed in various network settings, such as distributed scheduling and random access environments [12]–[14].

Among the scheduling problems investigated in the literature, we cite the following: consider $N$ users communicating with a central entity over unreliable channels where at most $M < N$ users can transmit simultaneously. What is the age-optimal strategy in this case? The wide range of applications that this problem encompasses makes it a fundamental one that needs to be investigated. Unfortunately, this problem belongs to the family of Restless Multi-Armed Bandit (RMAB) problems, which are generally difficult to solve optimally. To address this difficulty, the authors in [15] examined this problem and proved that a greedy algorithm is optimal when users have identical channel statistics. For the asymmetric case, they proposed a sub-optimal policy known as Whittle's index policy. Whittle's index policy has been embraced by many works in various frameworks [16]–[25], as it is recognized for its low complexity and notable performance. For example, in [17], Whittle's index policy was adopted to minimize the average delay of queues. In another line of work, the authors in [22] employed a Whittle's index-based policy to maximize the average throughput over Markovian channels. Although the policy is simple to implement, the main challenge that arises when adopting it is characterizing its performance, since its analysis is known to be notoriously difficult. To address this difficulty, the authors in [24] provided a sufficient condition, dubbed Weber's condition, for the asymptotic optimality of Whittle's index policy in the many-users regime. However, this condition requires ruling out the existence of both closed orbits and chaotic behavior of a high-dimensional non-linear differential equation, which is extremely difficult to verify, even numerically. To further facilitate the analysis of the policy, the works in [17], [22] provided an approach based on a fluid limit model for the delay minimization and throughput maximization frameworks. By leveraging this model, they proved the asymptotic optimality of Whittle's index policy in these frameworks under a recurrence assumption that is easier to check than Weber's condition but still requires numerical verification. Following in the same footsteps, the present authors adopted the fluid limit model and proved the asymptotic optimality of Whittle's index policy in the AoI framework under similar assumptions [21].
This raises the following important question: can we prove the asymptotic age-optimality of Whittle's index policy in specific network settings without resorting to any assumptions? This question is extremely difficult and has yet to be answered, even for the standard delay and throughput metrics. In this paper, we examine this question in the AoI framework, and we provide rigorous theoretical results that establish the asymptotic optimality of Whittle's index policy in certain network settings without imposing any assumptions. Note that the importance of the asymptotic many-users regime stems from the astronomical growth in the number of interconnected devices. For example, machine-type communications and the IoT in 5G networks require supporting tens of thousands of connected devices in a single cell. To that end, we summarize below the structure of the paper along with its key contributions:
• We start by formulating the problem of minimizing the average age of a network where $M$ out of $N$ users can communicate simultaneously with the central entity. As previously explained, this problem belongs to the class of RMAB problems, which are known to be notoriously difficult to solve. Accordingly, Whittle's index policy has been proposed in previous works as a low-complexity solution, and it is the main focus of our work. To establish Whittle's index policy, the following steps have to be taken:
1) Provide a relaxed version of the original problem and tackle it through a Lagrangian approach.
2) Prove the indexability property of the relaxed problem and derive the Whittle's index expressions.
These steps have been carried out in previous works by the authors in [15], and their main results are reported in our paper for completeness.
• Next, we present a fluid limit model that approximates the behavior of Whittle's index policy.
In the many-users regime, we prove that the fluid limit can be made arbitrarily close to the actual evolution of the network. Therefore, our optimality analysis focuses mainly on the evolution of the fluid limit vector. The method previously used in the literature to establish the asymptotic optimality of Whittle's index policy follows a spectral analysis approach [22]. However, this approach is highly contingent on the initial state of the system. Accordingly, to extend their results to any random initial state, the authors imposed a restrictive assumption that can only be verified numerically. In our paper, we take a different approach to analyze the fluid model. Specifically, we propose a novel method based on intricate techniques (e.g., the Cauchy criterion) to prove the convergence of the fluid model to a fixed point. We stress that the technical details of this step are intricate and constitute the main technical contribution of our paper. Note that, even for the standard delay and throughput metrics, such a proof was not provided in the literature, which further highlights the novelty of our approach. Afterwards, we establish the global optimality of Whittle's index policy by leveraging the fact that the aforementioned fixed point is nothing but the optimal operating point of the system in the many-users regime. Finally, we provide numerical results that corroborate the theoretical results and highlight the notable performance of Whittle's index policy in the many-users regime.

The rest of the paper is organized as follows: Section II is devoted to the system model and the problem formulation. Section III is dedicated to the establishment of Whittle's index policy. In Section IV-B, we provide our main results, where we prove the asymptotic optimality of Whittle's index policy. Numerical results that corroborate our theoretical findings are given in Section V, while Section VI concludes the paper.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

We consider a time-slotted system with one base station, $M$ uncorrelated channels, and $N$ users ($N > M$). Time is normalized to the slot duration (i.e., $t = 1, 2, \ldots$). Each of the $M$ channels can be allocated to at most one user. Hence, at most $M$ users can transmit in each time slot $t$. If a user is scheduled at time $t$, it generates a fresh packet and sends it to the base station. This packet is successfully decoded by the base station at time $t+1$ with a certain success probability. If a decoding error takes place, the packet is discarded (i.e., users are not equipped with buffers). In practice, users may share similar channel conditions. Accordingly, we suppose that the users can be partitioned into $K = 2$ classes such that users within the same class share the same decoding success probability. In other words, each user $i$ belonging to class $k \in \{1, 2\}$ has a decoding success probability $p_k$, which is assumed to be known by the scheduler. We let $\gamma_k$ be the proportion of users belonging to class $k$, so that $\gamma_1 + \gamma_2 = 1$. A scheduling policy $\pi$ is defined as a sequence of actions $\pi = (a^\pi(0), a^\pi(1), \ldots)$, where $a^\pi(t) = (a^{1,\pi}_1(t), \ldots, a^{1,\pi}_{\gamma_1 N}(t), a^{2,\pi}_1(t), \ldots, a^{2,\pi}_{\gamma_2 N}(t))$ is a binary vector such that $a^{k,\pi}_i(t) = 1$ if user $i$ of class $k$ is scheduled at time $t$. We also let the binary random variable $c^k_i(t)$ denote the channel state of user $i$ of class $k$, such that $c^k_i(t) = 1$ if no decoding error takes place. As per our system model, $\Pr(c^k_i(t) = 1) = p_k$ and $\Pr(c^k_i(t) = 0) = 1 - p_k$ for any user $i$ of class $k$. We let $B^{k,\pi}_i(t)$ denote the time-stamp of the freshest packet delivered by user $i$ of class $k$ to the base station at time $t$ under the scheduling policy $\pi$.
The age of information, or simply the age, of user $i$ of class $k$ is defined as [1]:
$$s^k_i(t) = t - B^k_i(t). \tag{1}$$
Taking into account the variables defined above, the age of this user under policy $\pi$ evolves as follows:
$$s^{k,\pi}_i(t+1) = \begin{cases} 1 & \text{if } a^{k,\pi}_i(t) = 1,\ c^k_i(t) = 1,\\ s^{k,\pi}_i(t) + 1 & \text{if } a^{k,\pi}_i(t) = 1,\ c^k_i(t) = 0,\\ s^{k,\pi}_i(t) + 1 & \text{if } a^{k,\pi}_i(t) = 0. \end{cases} \tag{2}$$
We let $s^\pi(t) = (s^{1,\pi}_1(t), \ldots, s^{1,\pi}_{\gamma_1 N}(t), s^{2,\pi}_1(t), \ldots, s^{2,\pi}_{\gamma_2 N}(t))$ denote the vector of all users' ages under policy $\pi$. With these notations in mind, we can formulate the optimization problem that we focus on in this paper.

B. Problem Formulation

In this paper, we are interested in minimizing the total expected average age of information of the network under a constraint on the number of users scheduled in each time slot $t$: this number must not exceed the total number of channels $M = \alpha N$, where $\alpha = M/N$. We let $\Pi$ denote the set of all causal scheduling policies, in which the scheduling decisions are made based on the history and current state of the system. Given an initial system state $s(0) = (s^1_1(0), \ldots, s^1_{\gamma_1 N}(0), \ldots, s^K_1(0), \ldots, s^K_{\gamma_K N}(0))$, our problem can be formulated as follows:
$$\min_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} s^{k,\pi}_i(t) \,\Big|\, s(0)\Big] \quad \text{s.t.} \quad \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} a^{k,\pi}_i(t) \le \alpha N, \quad t = 0, 1, 2, \ldots \tag{3}$$
This problem belongs to the family of RMAB problems, which are generally difficult to solve optimally (see Papadimitriou et al. [26]). For this reason, one should aim to develop a well-performing sub-optimal policy. As mentioned earlier, the low-complexity scheduling policy that we are interested in throughout this paper is Whittle's index policy. To establish this policy and derive the Whittle's index expressions, one has to follow the steps below:
1) Provide a relaxed version of the original problem and tackle it through a Lagrangian approach.
2) Prove the indexability property of the problem and derive the Whittle's index expressions.
As previously mentioned, these steps have been carried out in previous works by the authors in [15]. For completeness, and since we use these steps later in our optimality analysis, we report them along with the main results of [15] in the following section.

III. RELAXED PROBLEM AND WHITTLE'S INDEX POLICY

A. Relaxed Problem

The first step toward establishing Whittle's index policy consists of relaxing the constraint on the number of scheduled users in problem (3). Specifically, instead of requiring the constraint to be satisfied in each time slot, we only require it to be satisfied on average. Therefore, the relaxed problem can be formulated as follows:
$$\min_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} s^{k,\pi}_i(t) \,\Big|\, s(0)\Big] \quad \text{s.t.} \quad \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} a^{k,\pi}_i(t)\Big] \le \alpha N. \tag{4}$$
To study this problem, we adopt a Lagrangian approach that transforms it into an unconstrained one, as detailed in the sequel.

B. Dual Problem

To circumvent the difficulty of studying the constrained problem in (4), we adopt a Lagrangian approach. In particular, let us denote by $\lambda \ge 0$ the Lagrangian parameter. For a fixed $\lambda$, the Lagrangian function of the relaxed problem is:
$$F(\lambda, \pi) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \big(s^{k,\pi}_i(t) + \lambda (a^{k,\pi}_i(t) - \alpha)\big) \,\Big|\, s(0)\Big]. \tag{5}$$
Based on the dual approach, the next step consists of finding the policy $\pi$ that minimizes $F(\lambda, \pi)$. Note that the term $\frac{1}{T}\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \lambda\alpha$, which is equal to $N\lambda\alpha$, does not depend on $\pi$. Therefore, the policy that minimizes $F(\lambda, \pi)$ also minimizes the following function:
$$f(\lambda, \pi) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \sum_{k=1}^{K} \sum_{i=1}^{\gamma_k N} \big(s^{k,\pi}_i(t) + \lambda a^{k,\pi}_i(t)\big) \,\Big|\, s(0)\Big]. \tag{6}$$
Then, we can formulate the dual problem as follows:
$$\min_{\pi \in \Pi} f(\lambda, \pi). \tag{7}$$

C. Structural Results

To solve problem (7), it can be shown that this $N$-dimensional problem can be decomposed into $N$ one-dimensional problems that can be solved independently [15]. Therefore, we can drop the indices $i$ and $k$ from (6) and simply investigate the following one-dimensional problem:
$$\min_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^\pi\Big[\sum_{t=0}^{T-1} \big(s^\pi(t) + \lambda a^\pi(t)\big) \,\Big|\, s(0)\Big]. \tag{8}$$
It turns out that this one-dimensional problem can be cast as an infinite-horizon average-cost Markov Decision Process (MDP) defined as follows:
• States: The state of the MDP at time $t$ is the age of the user $s(t)$, which can take any integer value strictly greater than 0. Therefore, the state space is countably infinite.
• Actions: The action at time $t$, denoted by $a(t)$, indicates whether a transmission is attempted (value 1) or the user remains idle (value 0).
• Transition probabilities: The transition probabilities between the different states follow from the age evolution (2) detailed in Section II.
• Cost: The cost at time $t$ is $C(s(t), a(t)) = s(t) + \lambda a(t)$.
To solve this MDP, the authors in [15] leveraged the Bellman equation and studied the characteristics of the associated value function. Based on the structure of the value function, the following result was obtained:

Proposition 1.

The optimal policy that solves problem (8) is of a threshold nature.
Proof.

See [15, Proposition 14].
The above result tells us that there exists an integer $l_k \in \mathbb{N}^*$ such that, by only letting users of class $k$ with an age greater than or equal to $l_k$ transmit, we attain the optimal operating point of (8). This result is pivotal to establishing Whittle's index policy.

D. Indexability and Whittle's Index Expressions

To proceed toward our goal, one has to analyze the behavior of the MDP when a threshold policy is adopted. To that end, we note that, for any fixed threshold $n$, the MDP can be modeled through a Discrete-Time Markov Chain (DTMC) where:
• The state is the age $s(t)$.
• For any state $s(t) < n$, the user is idle. On the other hand, when $s(t) \ge n$, the user is scheduled.
The DTMC is reported in Fig. 1.

Figure 1: The state transitions when a threshold policy is adopted.

To prove the indexability property and find the Whittle's index expression, one has to compute the average objective function in (8) when a threshold policy is adopted. To that end, we provide the following propositions.

Proposition 2.

For a fixed threshold $n$, the stationary distribution $u^n$ of the DTMC when the decoding success probability is equal to $p$ is:
$$u^n(i) = \begin{cases} \dfrac{p}{np + 1 - p} & \text{if } 1 \le i \le n,\\[2mm] (1-p)^{i-n} \dfrac{p}{np + 1 - p} & \text{if } i \ge n. \end{cases} \tag{9}$$
Proof.

The result can be easily obtained by solving the global balance equations.
The next step consists of calculating the average objective function in (8) when a threshold policy is employed.
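Before moving on, the closed form (9) of Proposition 2 can be cross-checked numerically. The sketch below, written in Python (our choice; the paper provides no code), evaluates (9) and simulates the age recursion (2) for a single user that transmits if and only if its age is at least $n$. The function names, the initial age of 1, and the parameter values are assumptions made for illustration only.

```python
import random


def stationary_distribution(n, p, i_max):
    """Closed form (9): u^n(i) for i = 1..i_max, threshold n, success prob. p."""
    norm = p / (n * p + 1 - p)
    return {i: norm if i <= n else (1 - p) ** (i - n) * norm
            for i in range(1, i_max + 1)}


def simulate_threshold_policy(n, p, steps, seed=0):
    """Simulate recursion (2) for one user that transmits iff its age is >= n.

    Returns the empirical fraction of slots spent in each age.
    """
    rng = random.Random(seed)
    age, visits = 1, {}  # hypothetical initial age s(0) = 1
    for _ in range(steps):
        visits[age] = visits.get(age, 0) + 1
        scheduled = age >= n                      # threshold rule of the DTMC
        delivered = scheduled and rng.random() < p
        age = 1 if delivered else age + 1         # age resets to 1 on success
    return {i: c / steps for i, c in visits.items()}


if __name__ == "__main__":
    n, p = 3, 0.5
    exact = stationary_distribution(n, p, i_max=8)
    empirical = simulate_threshold_policy(n, p, steps=200_000)
    for i in range(1, 9):
        print(i, round(exact[i], 4), round(empirical.get(i, 0.0), 4))
```

The empirical frequencies should approach the closed-form values as the number of simulated slots grows; the flat part for $i \le n$ and the geometric tail for $i \ge n$ are both visible.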

Proposition 3.

For a fixed threshold $n$, the average cost of the threshold policy for problem (8) is:
$$C(n, \lambda) = \frac{[(n-1)^2 + (n-1)]p^2 + 2p(n-1) + 2}{2p((n-1)p + 1)} + \frac{\lambda}{np + 1 - p}. \tag{10}$$
Proof.

The result follows from the stationary distribution expressions and the fact that $C(n, \lambda) = \sum_{i=1}^{+\infty} i\, u^n(i) + \lambda \sum_{i=n}^{+\infty} u^n(i)$.
Using the stationary distribution and the average cost, one can then prove the indexability property of the problem, which ensures the existence of the Whittle's indices. Before providing these results, we first lay out the definition of this property.

Definition 1 (Indexability). For a fixed $\lambda$, consider the vector $l(\lambda) = (l_1(\lambda), \ldots, l_K(\lambda))$, where $l_k(\lambda)$ is the optimal threshold of problem (8) for each user of class $k$. We define $D_k(\lambda) = \{s \in \mathbb{N}^* : s < l_k(\lambda)\}$ as the set of states for which the optimal action is not to schedule the users belonging to class $k$. The one-dimensional problem associated with these users is said to be indexable if $D_k(\lambda)$ is increasing in $\lambda$. More specifically, the following should hold:
$$\lambda' \le \lambda \Rightarrow D_k(\lambda') \subseteq D_k(\lambda). \tag{11}$$
The indexability property for problem (8) was established by the authors in [15]. With the Whittle's indices ensured to exist, one can then leverage the stationary distribution and the average cost reported in Propositions 2 and 3 to derive the Whittle's index expressions, as previously done in [15] and [21].

Proposition 4.

For any given class $k$, the Whittle's index of state $i$ is:
$$W_k(i) = \frac{(i-1)\, p_k\, i}{2} + i. \tag{12}$$
Proof.

See [15, p. 10].
With the Whittle's index expression derived, we can now establish Whittle's index scheduling policy, which is described in Algorithm 1.
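As a sanity check of Propositions 3 and 4, the Whittle's index of state $n$ can be recovered as the multiplier $\lambda$ at which thresholds $n$ and $n+1$ yield the same cost in (10), i.e., the value of $\lambda$ that makes the user indifferent between transmitting and idling in state $n$. The Python sketch below does this with exact rational arithmetic (`fractions.Fraction`) and also checks (10) against the series computed from (9). The helper names are hypothetical, and this indifference-based computation is our own illustration rather than the proof given in [15].

```python
from fractions import Fraction


def avg_cost(n, lam, p):
    """Closed form (10): average cost of threshold n with Lagrange multiplier lam."""
    age_term = (((n - 1) ** 2 + (n - 1)) * p ** 2 + 2 * p * (n - 1) + 2) / (
        2 * p * ((n - 1) * p + 1))
    return age_term + lam / (n * p + 1 - p)


def avg_cost_series(n, lam, p, i_max=2000):
    """Same cost computed from the stationary distribution (9), truncated at i_max."""
    norm = p / (n * p + 1 - p)
    u = [norm if i <= n else (1 - p) ** (i - n) * norm for i in range(1, i_max + 1)]
    mean_age = sum(i * ui for i, ui in enumerate(u, start=1))
    return mean_age + lam * sum(u[n - 1:])  # u[n - 1] corresponds to state i = n


def indifference_lambda(n, p):
    """lam such that avg_cost(n, lam, p) == avg_cost(n + 1, lam, p)."""
    delta_age = avg_cost(n + 1, 0, p) - avg_cost(n, 0, p)
    delta_rate = 1 / (n * p + 1 - p) - 1 / ((n + 1) * p + 1 - p)
    return delta_age / delta_rate


def whittle_index(i, p):
    """Closed form (12): W(i) = (i - 1) p i / 2 + i."""
    return (i - 1) * p * i / 2 + i


if __name__ == "__main__":
    p = Fraction(1, 3)
    for n in range(1, 6):
        print(n, indifference_lambda(n, p), whittle_index(n, p))
```

With exact fractions, the two columns coincide, and the index is strictly increasing in the state, which is consistent with the threshold structure of Proposition 1.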

Algorithm 1

Whittle's index scheduling policy
At each time slot $t$, calculate the Whittle's index of all users in the network using (12).
Schedule the $M$ users having the highest Whittle's index values at time $t$, with ties broken arbitrarily.

Although the above scheduling policy is easy to implement, it remains sub-optimal. Accordingly, characterizing its performance relative to the optimal policy is important. Equipped with the above results and notations, we can now tackle the main issue that we aim to address in our paper: the asymptotic optimality of this policy.

IV. ASYMPTOTIC OPTIMALITY OF WHITTLE'S INDEX POLICY

A. Optimal Solution of the Relaxed Problem

To prove the asymptotic optimality of Whittle's index policy, one has to compare its performance to that of the optimal policy solving (3). However, as previously explained, the optimal policy of (3) is not known. To circumvent this, and to have a benchmark performance to compare to, we note that the following always holds:
$$\frac{C^{RP,N}}{N} \le \frac{C^{OP,N}}{N} \le \frac{C^{WIP,N}}{N}, \tag{13}$$
where $C^{WIP,N}/N$ is the average age per user under Whittle's index policy, $C^{OP,N}/N$ is the optimal expected average age per user of the original problem (3), and $C^{RP,N}/N$ is the optimal average age per user of the relaxed problem (4). The first inequality holds because every policy that is feasible for (3) is also feasible for (4), and the second because Whittle's index policy is feasible for (3). Thus, to show asymptotic optimality, it is sufficient to prove that, for a large number of users $N$, $C^{WIP,N}/N$ converges to $C^{RP,N}/N$. To that end, the next task is to find an expression for $C^{RP} = C^{RP,N}/N$. For this purpose, we provide the following proposition.

Proposition 5.

The optimal solution of the relaxed problem is of threshold type for each class. More precisely, it is a convex combination of two threshold vectors $(l_1, \ldots, l_K)$ and $(l'_1, \ldots, l'_K)$ such that:
• There exist a unique real value $W^* \in \mathbb{R}$, a class $m$, and a state $p$ such that $W^* = W_m(p)$.
• The expressions of $l_k$ and $l'_k$ are as follows:
$$l_k = \operatorname*{argmax}_{i \in \mathbb{N}^*} \{W_k(i) : W_k(i) \le W^*\} + 1, \qquad l'_k = \operatorname*{argmax}_{i \in \mathbb{N}^*} \{W_k(i) : W_k(i) < W^*\} + 1, \qquad \forall k \in \{1, \ldots, K\}. \tag{14}$$
• There exists a unique $0 < \theta \le 1$ that satisfies $\theta \sum_{k=1}^{K} \gamma_k \sum_{i=l_k}^{+\infty} u_k^{l_k}(i) + (1-\theta) \sum_{k=1}^{K} \gamma_k \sum_{i=l'_k}^{+\infty} u_k^{l'_k}(i) = \alpha$, where $u_k^n$ is the stationary distribution of the age given a threshold $n$ for class $k$.
Proof. See [21, Proposition 5].
Thanks to this proposition, we can conclude that the optimal per-user cost of the relaxed problem has the following expression:
$$C^{RP} = \sum_{k=1}^{K} \gamma_k \sum_{i=1}^{+\infty} \big[\theta u_k^{l_k}(i) + (1-\theta) u_k^{l'_k}(i)\big]\, i. \tag{15}$$
By leveraging these results, we can proceed with characterizing the performance of Whittle's index policy.

B. Global Optimality of Whittle's Index Policy

This section constitutes the main contribution of the paper, where we show the asymptotic optimality of Whittle's index policy. The idea is to show that the performance of this policy converges to $C^{RP}$ when $N$ is large and the ratio $\alpha = M/N$ is kept constant.
We let $Z^{k,N}_i(t)$ denote the proportion of users belonging to class $k$ in state $i$ at time $t$. In other words, it is the ratio of the number of users in class $k$ with an age equal to $i$ to the total number of users $N$. We have $Z^N(t) = (Z^{1,N}(t), \ldots, Z^{K,N}(t))$ with $Z^{k,N}(t) = (Z^{k,N}_1(t), \ldots, Z^{k,N}_{m_k(t)}(t))$, where $m_k(t)$ is the highest state at time $t$ in class $k$ and $\sum_{i=1}^{m_k(t)} Z^{k,N}_i(t) = \gamma_k$ for each class $k$. We also denote by $z^*$ the vector of proportions corresponding to the optimal policy of the relaxed problem. Thus, the elements of the vector $z^*$ are exactly the set $\{\gamma_k(\theta u_k^{l_k}(i) + (1-\theta) u_k^{l'_k}(i))\}_{1 \le k \le K,\ 1 \le i}$, where $i$ and $k$ refer to the state $i$ and the class $k$, respectively. This follows directly from the results in (15).
In the sequel, we establish the global optimality for two different classes of users, where $p_1$ and $p_2$ are the successful transmission probabilities of classes 1 and 2, respectively ($p_1 > p_2$). To do so, we show that, when Whittle's index policy is adopted, $Z^N(t)$ converges in probability to $z^*$ when $N$ and $t$ are very large. To that end, we follow the steps below:
• We show that the fluid approximation of $Z^N(t)$, denoted by $z(t)$, converges to $z^*$. Such a convergence has been proven in previous works under restrictive mathematical assumptions that can only be verified numerically [22]. We avoid these assumptions, as detailed in the following.
• Since the relation between $z(t+1)$ and $z(t)$ is not linear, our approach to establishing the convergence of $z(t)$ involves two terms: $\alpha_1(t)$ and $\alpha_2(t)$.
These two proportions are nothing but the scheduled proportions at time $t$ of classes 1 and 2, respectively. Note that we always have $\alpha_1(t) + \alpha_2(t) = \alpha$. Based on Lemma 1, we show that, for a large enough time $t$, there exists $T_t$ such that we can find a partial relation between each element of the vector $z(t + T_t)$ and terms of the sequence $\{\alpha_k(t')\}_{k=1,2,\ t' \le t + T_t}$. More precisely, we prove that, for this $T_t$, each proportion that is not scheduled at time $t + T_t$ can be expressed as a function of one term of $\{\alpha_k(t')\}_{k=1,2,\ t' \le t + T_t}$. This allows us to express $1 - \alpha$ as a linear combination of the terms of $\{\alpha_k(t')\}_{k=1,2,\ t' \le t + T_t}$ at time $t + T_t$.
• Subsequently, we introduce in Definition 3 the quantity $T_{max}$, which satisfies the following two properties, proven in Propositions 7 and 8 using Lemma 2: the Whittle's index alternates between the two classes up to state $T_{max}$ under a given assumption on $\alpha$; and the instantaneous thresholds $l_1(\cdot)$ and $l_2(\cdot)$ are bounded by $T_{max}$ at time $t + T_t$.
• Based on that, we derive the relation between the instantaneous thresholds at time $t + T_t$ in Proposition 9. Taking $T_0 = t + T_t$ as the initial time, we show by induction in Proposition 10 that, for all $T \ge T_0$, the instantaneous thresholds are less than $T_{max}$ and that all non-scheduled proportions can be expressed as functions of terms of the sequence $\{\alpha_k(t')\}_{k=1,2,\ t' \le T}$. Next, we define for each class $k$ a vector $A_k(T)$ composed of $\alpha_k(T)$ (the scheduled proportion at time $T$) together with the finite subset of the sequence $\{\alpha_k(t')\}_{t' \le T}$ such that every non-scheduled proportion of users of class $k$ in a given state at time $T$ can be expressed by one element of this subset.
After that, we provide the relation between the elements of the vectors $A_k(T)$ and $A_k(T+1)$ in Propositions 11 and 12.
• As mentioned in the Introduction, our proof is based on the Cauchy criterion, which states that, in $\mathbb{R}$, a sequence $h(T)$ converges if and only if, for every $\varepsilon > 0$, there exists $T_\varepsilon$ such that $|h(T) - h(T')| < \varepsilon$ for all $T, T' \ge T_\varepsilon$; that is, its terms become arbitrarily close to each other as $T$ increases. Accordingly, we show that the elements of the vector $A_k(\cdot)$, which are nothing but terms of the sequence $\alpha_k(\cdot)$, get closer to each other as $T$ increases. For that purpose, we prove that the largest and the smallest elements of $A_k(T)$ converge to the same limit as $T$ grows. We start by establishing the convergence of the largest and the smallest elements of $A_k(T)$ in Theorem 1. Then, we demonstrate by contradiction in Proposition 13 that they must converge to the same limit. This last result implies that $\alpha_k(t)$ converges as $t$ grows. In light of that fact, we prove that $z(t)$ converges to $z^*$ in Proposition 14. Then, using Kurtz's theorem, we show in Proposition 15 that $Z^N(t)$ converges to $z^*$ in probability. Finally, we establish in Proposition 16 the convergence of $C^{WIP,N}/N$ to $C^{RP}$.
With the steps of our approach clarified, we can proceed with introducing the fluid limit approximation. The fluid limit technique consists of analyzing the evolution of the expectation of $Z^N(t)$ under Whittle's index policy. For that, we define the vector $z(t)$ as follows:
$$z(t+1) - z(t)\big|_{z(t) = z} = \mathbb{E}\big[Z^N(t+1) - Z^N(t) \,\big|\, Z^N(t) = z\big]. \tag{16}$$
The above equation defines a sequence $z(t)$ by recurrence for a fixed initial state $z(0)$, and we must study its behavior when $t$ is very large. Hence, $z(t)$ depends on two variables: $t$ and the initial value $z(0)$. Our aim is to prove that $z(t)$ converges to $z^*$ regardless of the initial state $z(0)$.
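To make the quantities in (16) concrete, the Python sketch below simulates $N$ users under Algorithm 1 and records the empirical proportions $Z^{k,N}_i(t)$ in each slot; averaging such trajectories over many independent runs approximates the conditional expectation that drives the fluid limit. The parameter values, the common initial age of 1, and the random tie-breaking rule are assumptions for illustration (Algorithm 1 only specifies that ties are broken arbitrarily), and the function names are hypothetical.

```python
import random
from collections import Counter


def whittle_index(age, p):
    """Closed form (12): W_k(i) = (i - 1) p_k i / 2 + i."""
    return (age - 1) * p * age / 2 + age


def simulate_wip(N, alpha, gammas, ps, T, seed=0):
    """Simulate Algorithm 1 for T slots.

    gammas, ps: dicts mapping class label k -> gamma_k and p_k.
    Returns a list of dicts {(k, i): Z^{k,N}_i(t)}, one per slot t = 0..T-1.
    """
    rng = random.Random(seed)
    M = int(alpha * N)
    sizes = {k: round(g * N) for k, g in gammas.items()}
    assert sum(sizes.values()) == N, "gamma_k * N must be integers summing to N"
    cls = [k for k, size in sizes.items() for _ in range(size)]
    ages = [1] * N  # hypothetical initial state: every age equal to 1
    history = []
    for _ in range(T):
        counts = Counter(zip(cls, ages))
        history.append({key: c / N for key, c in counts.items()})
        # Rank users by Whittle's index; a random secondary key breaks ties.
        ranking = sorted(
            range(N),
            key=lambda u: (whittle_index(ages[u], ps[cls[u]]), rng.random()),
            reverse=True,
        )
        scheduled = set(ranking[:M])
        for u in range(N):
            success = u in scheduled and rng.random() < ps[cls[u]]
            ages[u] = 1 if success else ages[u] + 1  # recursion (2)
    return history


if __name__ == "__main__":
    hist = simulate_wip(N=1000, alpha=0.3, gammas={1: 0.5, 2: 0.5},
                        ps={1: 0.8, 2: 0.4}, T=200)
    last = hist[-1]
    for k in (1, 2):
        print(k, sorted((i, round(z, 3)) for (kk, i), z in last.items() if kk == k))
```

Each snapshot sums to $\gamma_k$ within each class, mirroring the normalization $\sum_i Z^{k,N}_i(t) = \gamma_k$ stated above.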
We let $z(t) = (z^1(t), \ldots, z^K(t))$ with $z^k(t) = (z^k_1(t), \ldots, z^k_{m_k(t)}(t))$, where $z^k_i(t)$ is the expected proportion of users in state $i$ of class $k$ at time $t$ according to (16). Accordingly, $\sum_{i=1}^{m_k(t)} z^k_i(t) = \gamma_k$ for each class $k$.
One can notice that $z^*$ is a particular vector with respect to (16).

Proposition 6. $z^*$ is the unique fixed point of the fluid approximation equation. In other words, $z(t) = z(t+1)$ if and only if $z(t) = z^*$.
Proof. The proof follows the same methodology as [22, Lemma 9].
According to this proposition, it is sufficient to show that $z(t)$ converges from any initial state $z(0)$, since the only possible finite limit of $z(t)$ as $t$ tends to $+\infty$ is the fixed point of (16), namely $z^*$.

Remark 1.

We emphasize that $\alpha_k(t)$ and $1 - \alpha$ refer, respectively, to the proportion of scheduled users of class $k$ at time $t$ and to the proportion of non-scheduled users from either class 1 or class 2. Meanwhile, any other proportion $A$ refers only to the number of users it contains divided by the total number of users in the system, regardless of the states of these users. That being said, $A = B$ means that $A$ and $B$ are equal in terms of proportion, although they may contain users in different states.

In the following, we prove that the fluid approximation vector $z(t)$ of $Z^N(t)$ under Whittle's index policy converges from any initial state. We prove this result for two different classes of users, where $p_1$ and $p_2$ are the successful transmission probabilities of classes 1 and 2, respectively ($p_1 > p_2$), given a sufficient condition on $\alpha$. Throughout this section, we denote by $w_1(n)$ and $w_2(n)$ the Whittle's indices of state $n$ in class 1 and class 2, respectively, whose expression is given in Proposition 4. We need to prove that $z^k_i(t)$ converges for each state $i$ in class $k$.
Focusing on Whittle's index policy, we can view it as an instantaneous threshold policy for each class, where the thresholds vary over time $t$. Moreover, under Whittle's index policy, the proportion of users scheduled in each time slot $t$ is fixed and equal to $\alpha$, since the number of scheduled users in each time slot is $\alpha N$. This proportion $\alpha$ contains the users with the highest Whittle's index values. In that respect, we define $\alpha_1(t)$ and $\alpha_2(t)$ as the proportions of users in class 1 and class 2, respectively, at time $t$ with the highest Whittle's index values, such that $\alpha_1(t) + \alpha_2(t) = \alpha$. The remaining proportion of users, which are not scheduled at time $t$ and which is equal to $1 - \alpha$, contains the users with the smallest Whittle's index values. We now decompose this proportion into proportions of users at different states in different classes.
Denoting by $l_1(t)$ and $l_2(t)$ the instantaneous threshold integers at time $t$ under the Whittle index policy, there exist two real values $\beta(t)$ and $\gamma(t)$ between 0 and 1, with $\gamma(t) = 1$ and $0 < \beta(t) \le 1$, or $0 < \gamma(t) \le 1$ and $\beta(t) = 1$, such that:
$$\sum_{i=1}^{l_1(t)-1} z_{1,i}(t) + \sum_{i=1}^{l_2(t)-1} z_{2,i}(t) + \beta(t)\, z_{1,l_1(t)}(t) + \gamma(t)\, z_{2,l_2(t)}(t) = 1-\alpha \tag{17}$$
and $\{z_{1,i}\}_{1 \le i \le l_1(t)} \cup \{z_{2,i}\}_{1 \le i \le l_2(t)}$ is exactly the set $\{z_{k,i} : w_k(i) \le \max(w_1(l_1(t)), w_2(l_2(t)))\}$.

In [21], in order to prove the convergence of $z(t)$, the authors assume that $z(0)$ lies within a precise neighborhood of $z^*$ and that the number of states is finite. These assumptions yield a simple linear relation between $z(t)$ and $z(t+1)$ ($z(t+1) = Qz(t) + c$; see [21, Section IV-C]), from which the convergence of $z$ follows by establishing that the spectral radius of $Q$ is strictly less than one. In our case, since we aim to prove the convergence of $z$ from any initial state, the relation between $z(t+1)$ and $z(t)$ is:
$$z(t+1) = Q(z(t))\, z(t) + c(t) \tag{18}$$
This equation is not linear, which makes studying the evolution of $z(\cdot)$ a hard task. Moreover, since the number of states is infinite, the dimension of $z(t)$ varies over time. Therefore, the matrix $Q(z(t))$ is not square, and we cannot apply the method of [21], since eigenvalues are not defined for a non-square matrix. For these reasons, we proceed differently from [21]. Our method consists of expressing each proportion $z_{k,i}(t)$ that belongs to the non-scheduled users' proportion at time $t$ as a function of a term $\alpha_k(\cdot)$ at an earlier time. In this way, we obtain one part of the vector $z(t)$ as a function of $\{\alpha_k(t')\}_{t' \le t,\, k \in \{1,2\}}$, while the other part sums to $\alpha$. Then, we show that $\alpha_k(\cdot)$ converges for $k = 1, 2$.
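To make the nonlinear map (18) concrete, here is a minimal sketch of one step $z(t) \to z(t+1)$ of the fluid dynamics under the Whittle index policy. It again assumes $w_k(n) = p_k n(n-1)/2 + n$ and a finite truncation of the state space; the helper names `idle_parts` and `fluid_step` and the tie rule are ours. The update follows the rules used in Lemma 1 and Appendix B: a scheduled user of class $k$ succeeds with probability $p_k$ and returns to state 1, while idle users and failed transmissions move from state $i$ to state $i+1$. Because the scheduled set depends on $z(t)$ through the Whittle ordering, the map is only piecewise linear, which is the difficulty discussed above.

```python
def whittle(n, p):
    """Assumed Whittle index of state n >= 1: p*n*(n-1)/2 + n."""
    return p * n * (n - 1) / 2 + n


def idle_parts(z, probs, alpha):
    """Return idle[k][i-1], the idle proportion of class k in state i.

    The idle set is the mass 1 - alpha with the lowest Whittle indices
    (on ties, the higher class index is left idle first; our assumption).
    """
    entries = sorted((whittle(i, probs[k]), -k, k, i)
                     for k in range(len(z)) for i in range(1, len(z[k]) + 1))
    idle = [[0.0] * len(zk) for zk in z]
    budget = 1.0 - alpha
    for _, _, k, i in entries:
        if budget <= 0:
            break
        take = min(z[k][i - 1], budget)
        idle[k][i - 1] = take
        budget -= take
    return idle


def fluid_step(z, probs, alpha):
    """One fluid step; returns (z_next, [alpha_1(t), ..., alpha_K(t)])."""
    idle = idle_parts(z, probs, alpha)
    z_next, alphas = [], []
    for zk, uk, p in zip(z, idle, probs):
        scheduled = [m - u for m, u in zip(zk, uk)]
        a_k = sum(scheduled)
        nxt = [0.0] * (len(zk) + 1)  # the age can grow by one state
        nxt[0] = p * a_k             # successful updates reset to state 1
        for i in range(len(zk)):
            nxt[i + 1] += uk[i] + (1 - p) * scheduled[i]  # idle or failed: age + 1
        z_next.append(nxt)
        alphas.append(a_k)
    return z_next, alphas


if __name__ == "__main__":
    z = [[0.5], [0.5]]  # illustrative start: both classes at age 1
    probs, alpha = [0.8, 0.4], 0.45
    for _ in range(200):
        z, alphas = fluid_step(z, probs, alpha)
    print(alphas)  # alpha_1(t), alpha_2(t) at the last step
```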
We will see later that it is sufficient to show that $\alpha_k(\cdot)$ converges in order to conclude that $z(\cdot)$ converges. To find the partial relation between $z(t)$ and $\{\alpha_k(t')\}_{t' \le t,\, k \in \{1,2\}}$, we prove the following lemma.

Lemma 1.

Knowing $z_k(t)$, $\alpha_k(t)$ and $l_k(t)$, we have:
• For $i = 1$: $z_{k,1}(t+1) = p_k\alpha_k(t)$.
• For $1 \le i < l_k(t)$: $z_{k,i+1}(t+1) = z_{k,i}(t)$.

Proof. See Appendix A.

According to Lemma 1, after scheduling under the Whittle's index policy, we get at time $t+1$ a proportion $p_1\alpha_1(t)$ of users at state 1 in class 1 and a proportion $p_2\alpha_2(t)$ of users at state 1 in class 2 (i.e., $z_{1,1}(t+1) = p_1\alpha_1(t)$ and $z_{2,1}(t+1) = p_2\alpha_2(t)$). According to the same lemma, at time $t+2$, the proportions $p_1\alpha_1(t)$ and $p_2\alpha_2(t)$ move to state 2 in class 1 and class 2, respectively, while the proportions $p_1\alpha_1(t+1)$ and $p_2\alpha_2(t+1)$ move to state 1 (i.e., $z_{1,1}(t+2) = p_1\alpha_1(t+1)$, $z_{2,1}(t+2) = p_2\alpha_2(t+1)$, $z_{1,2}(t+2) = p_1\alpha_1(t)$ and $z_{2,2}(t+2) = p_2\alpha_2(t)$). At time $t+3$, the proportions $p_1\alpha_1(t)$ and $p_2\alpha_2(t)$ move to state 3, the proportions $p_1\alpha_1(t+1)$ and $p_2\alpha_2(t+1)$ move to state 2, and the proportions $p_1\alpha_1(t+2)$ and $p_2\alpha_2(t+2)$ move to state 1, in class 1 and class 2, respectively (i.e., $z_{1,1}(t+3) = p_1\alpha_1(t+2)$, $z_{2,1}(t+3) = p_2\alpha_2(t+2)$, $z_{1,2}(t+3) = p_1\alpha_1(t+1)$, $z_{2,2}(t+3) = p_2\alpha_2(t+1)$, $z_{1,3}(t+3) = p_1\alpha_1(t)$ and $z_{2,3}(t+3) = p_2\alpha_2(t)$).

Thereby, at a time $t+t'$ where the instantaneous thresholds satisfy $l_k(t+t') \ge t'$, we get a set of proportions $\{p_1\alpha_1(t), p_2\alpha_2(t), \dots, p_1\alpha_1(t+t'-1), p_2\alpha_2(t+t'-1)\}$ that belong to the proportion $1-\alpha$ of users with the lowest Whittle index values, such that $z_{1,1}(t+t') = p_1\alpha_1(t+t'-1)$, $z_{2,1}(t+t') = p_2\alpha_2(t+t'-1)$, $\dots$, $z_{1,t'}(t+t') = p_1\alpha_1(t)$ and $z_{2,t'}(t+t') = p_2\alpha_2(t)$. Hence, every $z_{k,i}(t+t')$ with $i \in [1, t']$ and $k = 1, 2$ is expressed in terms of $\alpha_k(\cdot)$.

Remark 2.

Within the Whittle index policy framework, the order of the users' proportions with respect to their Whittle index values must be taken into account throughout this analysis. Indeed, as already mentioned, we need to express the non-scheduled users' proportions in terms of $\alpha_k(\cdot)$ for $k = 1, 2$, which can only be done by taking the order of the Whittle index values into account. Since, under the Whittle's index policy, the set of non-scheduled users' proportions is exactly the set of users' proportions with the lowest Whittle index values in the system, this set has, at time $t$, the form $\{z_{k,i}(t) : w_k(i) \le w_m(n)\}$ for some class $m$ and state $n$ that vary with $t$.

Based on this remark, we need to find at time $t+t'$ a set of the form $\{z_{k,i}(t+t') : w_k(i) \le w_m(n)\}$, for a given class $m$ and state $n$, whose elements are all expressed in terms of $\alpha_k(\cdot)$. We show in the sequel that the highest Whittle index of this set can be taken to be $w_2(t')$. Indeed, since the Whittle index is increasing in $n$, where $n$ denotes the age-of-information state, any state in class 2 with Whittle index at most $w_2(t')$ belongs to $[1, t']$. Moreover, for any state $q$ in class 1 such that $w_1(q) \le w_2(t') \le w_1(t')$ (recall that $p_1 > p_2$), we have $w_1(q) \le w_1(t')$, which means that $q \in [1, t']$. Hence, every element of $\{z_{k,i}(t+t') : w_k(i) \le w_2(t')\}$ can be expressed in terms of $\alpha_k(\cdot)$ ($k = 1, 2$).
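The index comparison in this remark can be checked numerically. The sketch below again assumes the form $w_k(n) = p_k n(n-1)/2 + n$ recalled in Appendix D; the function name `class1_cutoff` is hypothetical. It computes the greatest class-1 state $q$ with $w_1(q) \le w_2(t')$, i.e., the class-1 cutoff $l_1(t+t')$ used in the argument, and confirms on a range of $t'$ that it never exceeds $t'$ when $p_1 > p_2$.

```python
def whittle(n, p):
    """Assumed Whittle index of state n >= 1: p*n*(n-1)/2 + n."""
    return p * n * (n - 1) / 2 + n


def class1_cutoff(t_prime, p1, p2):
    """Greatest class-1 state q with w_1(q) <= w_2(t_prime); 0 if none exists."""
    target = whittle(t_prime, p2)
    q = 0
    # w_1 is strictly increasing and unbounded, so this loop terminates.
    while whittle(q + 1, p1) <= target:
        q += 1
    return q


if __name__ == "__main__":
    p1, p2 = 0.8, 0.4  # illustrative values with p1 > p2
    for t_prime in range(1, 11):
        q = class1_cutoff(t_prime, p1, p2)
        assert q <= t_prime  # the class-1 states involved lie in [1, t']
        print(t_prime, q)
```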
Figure 2: Evolution of $z_{k,i}(\cdot)$ for different states $i$ as a function of $\alpha_1(t)$ and $\alpha_2(t)$ under the Whittle index policy (green and yellow refer to class 1 and class 2, respectively).

Accordingly, $\{z_{k,i}(t+t') : w_k(i) \le w_2(t')\}$ equals the set $\{p_2\alpha_2(t), \dots, p_2\alpha_2(t+t'-1), p_1\alpha_1(t+t'-l_1(t+t')), \dots, p_1\alpha_1(t+t'-1)\}$, where $l_1(t+t')$ is the greatest state $q$ in class 1 such that $w_1(q) \le w_2(t')$. Note that $l_1(t+t') \le t'$ because $w_2(l_1(t+t')) \le w_1(l_1(t+t')) \le w_2(t')$.

Therefore, for a fixed $t$, we associate with each $t'$ the corresponding sum
$$\sum_{j=1}^{l_1(t+t')} z_{1,j}(t+t') + \sum_{j=1}^{t'} z_{2,j}(t+t') = \sum_{j=1}^{l_1(t+t')} p_1\alpha_1(t+t'-j) + \sum_{j=1}^{t'} p_2\alpha_2(t+t'-j).$$
We now define the time $t'$ at which this sum exceeds $1-\alpha$.

Definition 2.

Starting at time $t$, we define $T_t$ such that $t + T_t$ is the first time that satisfies:
$$\sum_{j=1}^{l_1(t+T_t)} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{T_t} p_2\alpha_2(t+T_t-j) \ge 1-\alpha \tag{19}$$
In other words, $t + T_t$ is the first time $t+t'$ at which $\sum_{j=1}^{l_1(t+t')} p_1\alpha_1(t+t'-j) + \sum_{j=1}^{t'} p_2\alpha_2(t+t'-j)$ exceeds $1-\alpha$. Then, at time $t+T_t$, there exist $l'_1(t+T_t) \le l_1(t+T_t)$ and $l'_2(t+T_t) \le T_t$ such that the set $\{z_{1,i}(t+T_t)\}_{1 \le i \le l'_1(t+T_t)} \cup \{z_{2,i}(t+T_t)\}_{1 \le i \le l'_2(t+T_t)}$ is exactly the set $\{z_{k,i}(t+T_t) : w_k(i) \le \max(w_1(l'_1(t+T_t)), w_2(l'_2(t+T_t)))\}$, and there exist $\beta(t+T_t)$ and $\gamma(t+T_t)$, with $\gamma(t+T_t) = 1$ and $0 < \beta(t+T_t) \le 1$, or $0 < \gamma(t+T_t) \le 1$ and $\beta(t+T_t) = 1$, such that:
$$\sum_{j=1}^{l'_1(t+T_t)-1} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l'_2(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \beta(t+T_t)\, p_1\alpha_1\big(t+T_t-l'_1(t+T_t)\big) + \gamma(t+T_t)\, p_2\alpha_2\big(t+T_t-l'_2(t+T_t)\big) = 1-\alpha, \tag{20}$$
with $l'_1(t+T_t)$ and $l'_2(t+T_t)$ the instantaneous thresholds in class 1 and class 2, respectively, at time $t+T_t$. The proportions $\alpha_1(t+T_t)$ and $\alpha_2(t+T_t)$ contain the users with the highest Whittle index values, and their sum is equal to $\alpha$. Without loss of generality, we let $l'_k(t+T_t) = l_k(t+T_t)$.

Figure 3: The proportions of users at different states at time $t+T_t$ when $\gamma(t+T_t) = 1$ and $0 < \beta(t+T_t) \le 1$.

According to Remark 2, the form of this set means that it contains the users' proportions with the lowest Whittle index values among all users' proportions of the system.

Figure 4: The proportions of users at different states at time $t+T_t$ when $\beta(t+T_t) = 1$ and $0 < \gamma(t+T_t) \le 1$.

As we can see, at time $t+T_t$, all the users' proportions that belong to the $1-\alpha$ users with the smallest Whittle index values are expressed in terms of $\alpha_1(\cdot)$ or $\alpha_2(\cdot)$ at various times.
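Definition 2 amounts to a forward search over $t'$. The sketch below scans $t' = 1, 2, \dots$ and returns the first $t'$ for which the left-hand side of (19) reaches $1-\alpha$; the arrays `alpha1_hist[s]` and `alpha2_hist[s]`, standing for $\alpha_1(t+s)$ and $\alpha_2(t+s)$, and all function names are hypothetical, and the Whittle index is again assumed to be $w_k(n) = p_k n(n-1)/2 + n$. Fed with the history $\alpha_1 \equiv 0$, $\alpha_2 \equiv \alpha$ of Definition 3 below, the search returns the integer that Lemma 3 places in $[(1-\alpha)/(p_2\alpha), (1-\alpha)/(p_2\alpha)+1)$. The sketch also checks $w_1(n) < w_2(n+1)$ for $n \le T_{\max}$ when $\alpha$ exceeds the lower bound of Assumption 1; that bound is coded as $1/(1+(D-1)p_2)$, with $D$ the positive root of $w_2(n+1) - w_1(n)$, which is our reconstruction of the extracted formula (21).

```python
import math


def whittle(n, p):
    """Assumed Whittle index of state n >= 1: p*n*(n-1)/2 + n."""
    return p * n * (n - 1) / 2 + n


def class1_cutoff(t_prime, p1, p2):
    """Greatest class-1 state q with w_1(q) <= w_2(t_prime)."""
    q = 0
    while whittle(q + 1, p1) <= whittle(t_prime, p2):
        q += 1
    return q


def first_exceedance(alpha1_hist, alpha2_hist, p1, p2, alpha):
    """Smallest t' >= 1 satisfying inequality (19), or None within the history."""
    for tp in range(1, len(alpha1_hist) + 1):
        l1 = class1_cutoff(tp, p1, p2)  # l1 <= tp, so indices stay >= 0
        total = sum(p1 * alpha1_hist[tp - j] for j in range(1, l1 + 1))
        total += sum(p2 * alpha2_hist[tp - j] for j in range(1, tp + 1))
        if total >= 1 - alpha:
            return tp
    return None


def alpha_lower_bound(p1, p2):
    """Our reading of bound (21): 1 / (1 + (D - 1) p2), where D is the positive
    root of f(n) = w_2(n + 1) - w_1(n); requires p1 > p2."""
    s = p1 + p2
    d = (s + math.sqrt(8 * (p1 - p2) + s * s)) / (2 * (p1 - p2))
    return 1.0 / (1.0 + (d - 1.0) * p2)


def alternation_holds(p1, p2, n_max):
    """Check w_1(n) < w_2(n + 1) for n = 1..n_max."""
    return all(whittle(n, p1) < whittle(n + 1, p2) for n in range(1, n_max + 1))


if __name__ == "__main__":
    p1, p2, alpha = 0.8, 0.4, 0.45  # illustrative values
    t_max = first_exceedance([0.0] * 50, [alpha] * 50, p1, p2, alpha)
    print(t_max, (1 - alpha) / (p2 * alpha), alpha_lower_bound(p1, p2))
    print(alternation_holds(p1, p2, t_max))
```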
In fact, at time $t+T_t$, we end up with $z_{1,1}(t+T_t) = p_1\alpha_1(t+T_t-1)$, $z_{2,1}(t+T_t) = p_2\alpha_2(t+T_t-1)$, $\dots$, $z_{1,l_1(t+T_t)}(t+T_t) = p_1\alpha_1(t+T_t-l_1(t+T_t))$ and $z_{2,l_2(t+T_t)}(t+T_t) = p_2\alpha_2(t+T_t-l_2(t+T_t))$, and the remaining proportions belong to $\alpha_1(t+T_t)$ for class 1 and $\alpha_2(t+T_t)$ for class 2. For this reason, we work only with $\alpha_1(\cdot)$ and $\alpha_2(\cdot)$ in order to prove convergence.

As mentioned earlier, the proof of optimality is valid under an assumption on $\alpha$. This assumption relies on the maximum value that the instantaneous thresholds $l_k(t+T_t)$ can take at time $t+T_t$, for $k = 1, 2$. To that end, we start by defining and bounding a constant $T_{\max}$. Then, under an assumption on $\alpha$, we show that the order of the Whittle index alternates between the two classes on the set $[1, T_{\max}+1]$ (this will be detailed later). Based on this, we establish that $T_{\max}$ is an upper bound on $l_k(t+T_t)$.

First, we give a lemma that will be useful to prove Propositions 8, 9 and 10.

Lemma 2.

There exists a time $t_f$ such that $\alpha_1(t) > 0$ for all $t \ge t_f$.

Proof. See Appendix B.

Figure 5: Graphical representation of $T_{\max}$.

In the following definition, we define $T_{\max}$, and we later check that it coincides with the upper bound of $l_k(t+T_t)$ for $k = 1, 2$.

Definition 3.

Starting at time $t$, we define $T_{\max}$ as the $T_t$ of Definition 2 that satisfies the following:
• $\sum_{j=1}^{l_1(t+T_t)} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{T_t} p_2\alpha_2(t+T_t-j) \ge 1-\alpha$;
• $\alpha_1(t+i) = 0$ for all $i \in [0, T_{\max}-1]$.

In the next lemma, we determine lower and upper bounds on $T_{\max}$.

Lemma 3. $T_{\max}$ does not depend on $t$ and satisfies $\frac{1-\alpha}{p_2\alpha} \le T_{\max} < \frac{1-\alpha}{p_2\alpha} + 1$.

Proof. See Appendix C.

We say that the order of the Whittle index strictly alternates between the two classes on $[1, n]$, or from state 1 to state $n$, if
$$w_2(1) < w_1(1) < w_2(2) < w_1(2) < w_2(3) < w_1(3) < \cdots < w_2(n) < w_1(n).$$
The proof of the convergence of $\alpha_k(\cdot)$ is feasible when this alternation condition is satisfied from 1 to $l_k(t+T_t)+1$ for all $t$. This condition will be used in the proof of Proposition 12. To that end, we start by introducing the assumption on $\alpha$, and then we show that, under this assumption, the alternation condition is satisfied from 1 to $l_k(t+T_t)+1$.

Assumption 1.

Denote by $D$ the quantity
$$D = \frac{p_1 + p_2 + \sqrt{8(p_1 - p_2) + (p_1 + p_2)^2}}{2(p_1 - p_2)}.$$
($D$ is the positive root of $w_2(n+1) - w_1(n)$, viewed as a quadratic in $n$; see Appendix D.) Then the proportion $\alpha$ of users scheduled at each time slot satisfies:
$$\alpha > \frac{1}{1 + (D-1)\,p_2} \tag{21}$$
If $T_{\max}$ is the highest value that $l_k(t+T_t)$ can take (this will be shown in Proposition 8), then it is sufficient to prove that the Whittle index alternation holds from state 1 to state $T_{\max}+1$. This is shown in the next proposition.

Proposition 7.

Under Assumption 1, the order of the Whittle index alternates between the two classes from state 1 to state $T_{\max}+1$.

Proof. See Appendix D.

We now prove that the instantaneous thresholds of the two classes cannot exceed $T_{\max}$.

Proposition 8.

Denoting by $l_{\max}$ the highest instantaneous threshold, in the sense that $\max(l_1(t+T_t), l_2(t+T_t)) \le l_{\max}$ for all $t \ge t_f$, we have $T_{\max} = l_{\max}$.

Proof. See Appendix E.

According to this proposition, $T_{\max}$ is indeed the upper bound of $l_k(t+T_t)$ for all $t$ and $k = 1, 2$. As a consequence, the order of the Whittle index alternates between the two classes on the set $[1, l_k(t+T_t)+1]$. The next goal is to find a relation between $l_1(t+T_t)$ and $l_2(t+T_t)$. To do so, recall that at time $t$ we have:
$$\sum_{i=1}^{l_1(t)-1} z_{1,i}(t) + \sum_{i=1}^{l_2(t)-1} z_{2,i}(t) + \beta(t)\, z_{1,l_1(t)}(t) + \gamma(t)\, z_{2,l_2(t)}(t) = 1-\alpha \tag{22}$$
with $l_1(t)$ and $l_2(t)$ the thresholds in class 1 and class 2, respectively, at time $t$, and $\beta(t) = 1$ and $0 < \gamma(t) \le 1$, or $\gamma(t) = 1$ and $0 < \beta(t) \le 1$. Thereby, the first step consists of establishing the relationship between $l_1(t)$ and $l_2(t)$ when $\max(l_1(t), l_2(t)) \le T_{\max}$, distinguishing two cases explained below, in order to obtain a generalized form of equation (22) in which the class index no longer appears in the thresholds.

Remark 3.

It is worth mentioning that $l_{\max}$, as defined in Proposition 8, is the highest value that the thresholds of class 1 or 2 can attain at times $t+T_t$ for $t > t_f$, where $t_f$ is given in Lemma 2. However, at an arbitrary time $t > t_f$, $\max(l_1(t), l_2(t)) \le l_{\max}$ might not hold, since there is not necessarily a $t'$ such that $t' + T_{t'} = t$.

Proposition 9.

At any time $t > t_f$, if $\max(l_1(t), l_2(t)) \le T_{\max} = l_{\max}$, then there exist $l(t) \le l_{\max}$ and $\beta(t)$, $\gamma(t)$, with $\beta(t) = 0$ and $0 < \gamma(t) \le 1$, or $0 < \beta(t) \le 1$ and $\gamma(t) = 1$, such that:
$$\sum_{i=1}^{l(t)-1} z_{1,i}(t) + \sum_{i=1}^{l(t)-1} z_{2,i}(t) + \beta(t)\, z_{1,l(t)}(t) + \gamma(t)\, z_{2,l(t)}(t) = 1-\alpha \tag{23}$$

Proof.

See Appendix G.

Starting at a time $t \ge t_f$, the thresholds $l_1(t+T_t)$ and $l_2(t+T_t)$ at time $t+T_t$ are at most $l_{\max}$. Hence, according to Proposition 9, there exists $l(t+T_t)$ such that:
$$\sum_{j=1}^{l(t+T_t)-1} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \beta(t+T_t)\, p_1\alpha_1\big(t+T_t-l(t+T_t)\big) + \gamma(t+T_t)\, p_2\alpha_2\big(t+T_t-l(t+T_t)\big) = 1-\alpha \tag{24}$$
where $\beta(t+T_t) = 0$ and $0 < \gamma(t+T_t) \le 1$, or $0 < \beta(t+T_t) \le 1$ and $\gamma(t+T_t) = 1$.

Denoting $t+T_t$ by $T_0$, we obtain:
$$\sum_{j=1}^{l(T_0)-1} p_1\alpha_1(T_0-j) + \sum_{j=1}^{l(T_0)-1} p_2\alpha_2(T_0-j) + \beta(T_0)\, p_1\alpha_1(T_0-l(T_0)) + \gamma(T_0)\, p_2\alpha_2(T_0-l(T_0)) = 1-\alpha \tag{25}$$
where $\beta(T_0) = 0$ and $0 < \gamma(T_0) \le 1$, or $0 < \beta(T_0) \le 1$ and $\gamma(T_0) = 1$.

We now prove by induction that this expression holds for all $T \ge T_0$, and that the instantaneous threshold $l(T)$ at time $T$ is at most $l_{\max}$.

Proposition 10.

For all $T \ge T_0$, there exist $l(T) \le l_{\max}$, $\beta(T)$ and $\gamma(T)$ such that:
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)\, p_1\alpha_1(T-l(T)) + \gamma(T)\, p_2\alpha_2(T-l(T)) = 1-\alpha \tag{26}$$
where $\beta(T) = 0$ and $0 < \gamma(T) \le 1$, or $0 < \beta(T) \le 1$ and $\gamma(T) = 1$.

Proof. See Appendix H.

According to this proposition, we can now define, at each time $T \ge T_0$ and for each class $k$, the vector $A_k(T) = (\alpha_k(T), \alpha_k(T-1), \dots, \alpha_k(T-l(T)))$, such that there exist $\beta(T)$ and $\gamma(T)$ satisfying:
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)\, p_1\alpha_1(T-l(T)) + \gamma(T)\, p_2\alpha_2(T-l(T)) = 1-\alpha \tag{27}$$
where $\beta(T) = 0$ and $0 < \gamma(T) \le 1$, or $0 < \beta(T) \le 1$ and $\gamma(T) = 1$. As explained previously, the relation between $A_k(T)$ and $z_k(T)$ is: $p_k\alpha_k(T-1) = z_{k,1}(T)$, $p_k\alpha_k(T-2) = z_{k,2}(T)$, $\dots$, $p_k\alpha_k(T-l(T)) = z_{k,l(T)}(T)$.

Remark 4.

We emphasize that in the following analysis, $T$ is always assumed to be at least $T_0$.

We prove in the sequel that $\max A_k(T)$ is nonincreasing and $\min A_k(T)$ is nondecreasing (where max and min refer to the largest and the smallest element of the vector, respectively). We then conclude that $\max A_k(T)$ and $\min A_k(T)$ converge as $T \to +\infty$, and we prove that they must converge to the same real number. To prove that $\max A_k(T)$ is nonincreasing and $\min A_k(T)$ is nondecreasing, we first demonstrate the following proposition.

Proposition 11.

All the elements of the vector $A_k(T+1)$, except $\alpha_k(T+1)$, are elements of the vector $A_k(T)$.

Proof. See Appendix I.

To prove the monotonicity of $\max A_1(T)$ and $\min A_1(T)$, it remains to prove that $\alpha_1(T+1)$ is at most $\max A_1(T)$ and at least $\min A_1(T)$. For that, we introduce the following proposition. Before doing so, note that since $\alpha_1(t) + \alpha_2(t) = \alpha$ at each time slot $t$, it is sufficient to prove that $\alpha_1(\cdot)$ converges. We therefore study only the vector $A_1(T)$ to prove convergence.

Proposition 12.

Under Assumption 1, for a given vector $A_1(T) = (\alpha_1(T), \alpha_1(T-1), \dots, \alpha_1(T-l(T)))$ with $T \ge T_0$, one of the following four chains of inequalities holds:
• $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T))$;
• $\alpha_1(T-l(T)) \le \alpha_1(T+1) \le \alpha_1(T)$;
• $\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T)$;
• $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)+1)$.
Moreover:
• If $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T))$, then $\alpha_1(T+1) - \alpha_1(T) \le p\,(\alpha_1(T-l(T)) - \alpha_1(T))$.
• If $\alpha_1(T-l(T)) \le \alpha_1(T+1) \le \alpha_1(T)$, then $\alpha_1(T) - \alpha_1(T+1) \le p\,(\alpha_1(T) - \alpha_1(T-l(T)))$.
• If $\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T)$, then $\alpha_1(T) - \alpha_1(T+1) \le p\,(\alpha_1(T) - \alpha_1(T-l(T)+1))$.
• If $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)+1)$, then $\alpha_1(T+1) - \alpha_1(T) \le p\,(\alpha_1(T-l(T)+1) - \alpha_1(T))$.

Proof.

See Appendix J.

Theorem 1. $\min A_1(T)$ and $\max A_1(T)$ converge; we denote their limits by $l_1$ and $l_2$, respectively.

Proof. According to Proposition 11, all the elements of the vector $A_1(T+1)$ except the first one, $\alpha_1(T+1)$, are elements of the vector $A_1(T)$. Hence, these elements are at most $\max A_1(T)$ and at least $\min A_1(T)$. According to the first result of Proposition 12, $\alpha_1(T+1)$ lies between the values of two elements of $A_1(T)$. Combining Propositions 11 and 12, $\max A_1(T+1) \le \max A_1(T)$ and $\min A_1(T+1) \ge \min A_1(T)$. Thus, $\max A_1(T)$ is nonincreasing in $T$ and $\min A_1(T)$ is nondecreasing in $T$. Since $0 \le \alpha_1(T) \le \alpha$ for all $T$, both $\max A_1(T)$ and $\min A_1(T)$ are bounded between 0 and $\alpha$. Therefore, $\min A_1(T)$ and $\max A_1(T)$ converge, and we denote their limits by $l_1$ and $l_2$, respectively. Moreover, $\max A_1(T)$ is lower bounded by $l_2$ and $\min A_1(T)$ is upper bounded by $l_1$.

However, for $\alpha_1(T)$ to converge to a unique point, we need to establish that $\max A_1(T)$ and $\min A_1(T)$ converge to the same limit, i.e., that $l_1 = l_2$. For that, we use the second result of Proposition 12 and proceed by contradiction. Since $l_1 \le l_2$ by definition, the two possible cases are $l_1 < l_2$ or $l_1 = l_2$; to show that $l_1 = l_2$, it is sufficient to derive a contradiction from $l_1 < l_2$. We prove that if $l_1 < l_2$, there exists $T_d$ such that all the elements of $A_1(T_d)$ are strictly less than $l_2$, which contradicts the fact that $\max A_1(T)$ is lower bounded by $l_2$.

Since $\max A_1(T)$ converges to $l_2$, for any $\epsilon > 0$ there exists a time slot $T_\epsilon \ge T_0$ such that $\max A_1(T) < l_2 + \epsilon$ for all $T \ge T_\epsilon$.
Our proof consists of showing that, for a small enough $\epsilon$, there exists $T \ge T_\epsilon$ such that $\max A_1(T)$ is strictly less than $l_2$. We first need an upper bound on the number of elements of the vector $A_1(T)$, whatever $T$. Since we have shown that at each time $T$ the instantaneous threshold $l(T)$ is at most $l_{\max}$, the number of elements of $A_1(T)$ does not exceed $l_{\max}+1$. In the following, we denote $l_{\max}$ by $L$.

Proposition 13. If $l_1 < l_2$, then for $\epsilon > 0$ below an explicit threshold proportional to $(l_2 - l_1)(1-p)^L$, there exists $T_d \ge T_\epsilon$ such that all the elements of $A_1(T_d)$ are strictly less than $l_2$.

Proof. See Appendix K.

Since $l_2$ is a lower bound of $\max A_1(T)$, this contradicts the above proposition. Hence, the supposition $l_1 \ne l_2$ is not valid, and $l_1 = l_2$. Consequently, $\max A_1(T)$ and $\min A_1(T)$ converge to the same limit, denoted $\alpha_1^*$. Given that $\min A_1(T) \le \alpha_1(T) \le \max A_1(T)$ for all $T$, $\alpha_1(T)$ also converges to $\alpha_1^*$. Similarly, $\alpha_2(T)$ converges to $\alpha - \alpha_1^* = \alpha_2^*$. In the following proposition, we prove that $z(t)$ converges.

Proposition 14. If $\alpha_k(t)$ converges to $\alpha_k^*$, then for each state $i$ and class $k$, $z_{k,i}(t)$ converges to $z^{k,*}_i$.

Proof. See Appendix L.

However, we still have to establish that the stochastic vector $Z^N(t)$ converges in probability to $z^*$ as $N$ grows. For that, we introduce the following proposition, inspired by the discrete-time version of Kurtz's theorem in [27]. Since norms on an infinite-dimensional vector space are not equivalent, we work with a specific norm that will be useful to show the optimality of the Whittle index policy. Accordingly, we define $\|\cdot\|$ as follows:
$$\|v\| = \sum_{i=1}^{+\infty} i\,|v_{1,i}| + \sum_{i=1}^{+\infty} i\,|v_{2,i}| \tag{28}$$
where $v_{k,i}$ is the $i$-th component of class $k$ of the vector $v$. The reason for choosing this norm will become clear in the proof of Proposition 16.

Proposition 15.

For any µ > 0 and finite time horizon T, there exists a positive constant C such that P x ( sup ≤ t

For any µ > 0, there exists a time T_0 such that for each T > T_0, there exists a positive constant s with P x ( sup T ≤ t

See Appendix N.

We recall that, starting from an initial state $x$, our objective is to compare the expected average age per user under the Whittle index policy, which can be expressed as
$$\frac{1}{T}\,\mathbb{E}^{wi}\Big[\sum_{t=0}^{T-1}\sum_{k=1}^{K}\sum_{i=1}^{+\infty} Z^{k,N}_i(t)\, i \;\Big|\; Z^N(0) = x\Big],$$
where $Z^N(t)$ evolves under the Whittle index policy, with the optimal age per user of the relaxed problem, whose expression as a function of $z^*$ is
$$C^{RP} = \frac{C^{RP,N}}{N} = \sum_{k=1}^{K}\sum_{i=1}^{+\infty} z^{k,*}_i\, i,$$
as both the number of users $N$ and the time horizon $T$ grow. Using Lemma 4, we are now ready to establish the asymptotic optimality of the Whittle index policy.

Proposition 16.

Starting from a given initial state $Z^N(0) = z(0) = x$, we have:
$$\lim_{T \to +\infty}\lim_{N \to +\infty} \frac{1}{T}\,\mathbb{E}^{wi}\Big[\sum_{t=0}^{T-1}\sum_{k=1}^{K}\sum_{i=1}^{+\infty} Z^{k,N}_i(t)\, i \;\Big|\; Z^N(0) = x\Big] = \sum_{k=1}^{K}\sum_{i=1}^{+\infty} z^{k,*}_i\, i \tag{29}$$

Proof.

See Appendix O.

Table I: $B_\alpha$ for a wide range of channel statistics $(p_1, p_2)$.

Figure 6: Average age per user under the Whittle's index policy.

V. NUMERICAL RESULTS

A. Verification of Assumption 1

In this section, we compute the lower bound on $\alpha$ given in Assumption 1, which we denote by $B_\alpha$. For a wide range of parameters $p_1$ and $p_2$, Table I reports this lower bound as a function of $p_1$ and $p_2$. As can be seen, the lower bound decreases when $p_1$ and $p_2$ are close to each other, and it becomes even smaller when $p_1$ and $p_2$ take relatively high values. According to Table I, in most cases of $(p_1, p_2)$, the lower bound on $\alpha$ does not exceed …. This implies that the range of $\alpha$ over which Assumption 1 is satisfied is wide enough for different values of $p_1$ and $p_2$.

B. Implementation of the Whittle's index policy

In this section, we evaluate the performance of the Whittle's index policy by comparing its per-user average age to the optimal per-user average age of the relaxed problem, $C_{RP}$. To that end, we let the number of users in class 1 and class 2 be equal to N. The probabilities of successful transmission of class 1 and class 2 are set to … and …, respectively. At each time slot $t$, at most $M$ users can be scheduled, i.e., $\alpha = M/N = \ldots$. As seen in Figure 6, the gap between the two policies tightens as the number of users $N$ grows. These numerical results corroborate our theoretical analysis and show that the Whittle's index policy is globally asymptotically optimal.

VI. CONCLUSION

In this paper, we have examined the average age minimization problem where only a fraction of the network users can transmit simultaneously over unreliable channels. We presented a novel method, based on the Cauchy criterion, to prove the Whittle's index policy's optimality in the many-users regime. Compared to state-of-the-art methods, our approach does not require imposing strict mathematical assumptions, which can be challenging to verify. We also provided numerical results that corroborate our theoretical findings and highlight the Whittle's index policy's performance. Moving forward, the next research direction is to extend our proof to various other scheduling problems under different system models and objective functions.

REFERENCES

[1] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” in …, March 2012, pp. 2731–2735.
[2] B. Buyukates and S. Ulukus, “Timely distributed computation with stragglers,” October 2019,

Available on .
[3] S. Farazi, A. G. Klein, J. A. McNeill, and D. R. Brown, “On the age of information in multi-source multi-hop wireless status update networks,” in …, IEEE, 2018, pp. 1–5.
[4] A. Maatouk, M. Assaad, and A. Ephremides, “The age of updates in a simple relay network,” in …, IEEE, 2018, pp. 1–5.
[5] P. Zou, O. Ozel, and S. Subramaniam, “Waiting before serving: A companion to packet management in status update systems,”

IEEE Transactions on Information Theory, 2019.
[6] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,”

IEEE Transactions on Information Theory, vol. 63, no. 11, pp. 7492–7508, 2017.
[7] R. Talak, S. Karaman, and E. Modiano, “Minimizing age-of-information in multi-hop wireless networks,” in …, IEEE, 2017, pp. 486–493.
[8] A. M. Bedewy, Y. Sun, and N. B. Shroff, “Age-optimal information updates in multihop networks,” in …, IEEE, 2017, pp. 576–580.
[9] A. Kosta, N. Pappas, A. Ephremides, and V. Angelakis, “Age of information performance of multiaccess strategies with packet management,”

Journal of Communications and Networks, vol. 21, no. 3, pp. 244–255, 2019.
[10] Y. Sun, E. Uysal-Biyikoglu, and S. Kompella, “Age-optimal updates of multiple information flows,” in

IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2018, pp. 136–141.
[11] E. T. Ceran, D. Gündüz, and A. György, “Average age of information with hybrid ARQ under a resource constraint,”

IEEE Transactions on Wireless Communications, vol. 18, no. 3, pp. 1900–1913, 2019.
[12] A. Maatouk, M. Assaad, and A. Ephremides, “On the age of information in a CSMA environment,”

IEEE/ACM Transactions on Networking, vol. 28, no. 2, pp. 818–831, 2020.
[13] Z. Jiang, B. Krishnamachari, X. Zheng, S. Zhou, and Z. Niu, “Timely status update in massive IoT systems: Decentralized scheduling for wireless uplinks,” arXiv preprint arXiv:1801.03975, 2018.
[14] R. Talak, S. Karaman, and E. Modiano, “Distributed scheduling algorithms for optimizing information freshness in wireless networks,” in …, IEEE, 2018, pp. 1–5.
[15] I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Scheduling policies for minimizing age of information in broadcast wireless networks,”

IEEE/ACM Transactions on Networking, vol. 26, no. 6, pp. 2637–2650, 2018.
[16] P. Ansell, K. D. Glazebrook, J. Niño-Mora, and M. O’Keeffe, “Whittle’s index policy for a multi-class queueing system with convex holding costs,”

Mathematical Methods of Operations Research, vol. 57, no. 1, pp. 21–39, 2003.
[17] S. Kriouile, M. Larranaga, and M. Assaad, “Asymptotically optimal delay-aware scheduling in wireless networks,” arXiv preprint arXiv:1807.00352, 2018.
[18] ——, “Whittle index policy for multichannel scheduling in queueing systems,” in …, IEEE, 2019, pp. 2524–2528.
[19] M. Larrañaga, M. Assaad, A. Destounis, and G. S. Paschos, “Asymptotically optimal pilot allocation over Markovian fading channels,”

IEEE Transactions on Information Theory, 2017.
[20] K. Liu and Q. Zhao, “Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access,”

IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5547–5567, 2010.
[21] A. Maatouk, S. Kriouile, M. Assaad, and A. Ephremides, “On the optimality of the Whittle’s index policy for minimizing the age of information,” arXiv preprint arXiv:2001.03096, 2020.
[22] W. Ouyang, A. Eryilmaz, and N. B. Shroff, “Downlink scheduling over Markovian fading channels,” IEEE/ACM Transactions on Networking, vol. 24, no. 3, pp. 1801–1812, 2015.
[23] K. P. Papadaki and W. B. Powell, “Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem,”

European Journal of Operational Research, vol. 142, no. 1, pp. 108–127, 2002.
[24] R. R. Weber and G. Weiss, “On an index policy for restless bandits,”

Journal of Applied Probability, vol. 27, no. 3, pp. 637–648, 1990.
[25] P. Whittle, “Restless bandits: Activity allocation in a changing world,”

Journal of Applied Probability, vol. 25, no. A, pp. 287–298, 1988.
[26] C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of optimal queuing network control,”

Mathematics of Operations Research, vol. 24, no. 2, pp. 293–305, 1999.
[27] T. G. Kurtz, “Strong approximation theorems for density dependent Markov chains,”

Stochastic Processes and their Applications, vol. 6, no. 3, pp. 223–240, 1978.

APPENDIX A
PROOF OF LEMMA 1

$$z(t+1) = \mathbb{E}\big[Z^N(t+1) \,\big|\, Z^N(t) = z(t)\big]$$
At time $t+1$, applying the Whittle index policy, on average exactly a proportion $p_k\alpha_k(t)$ of users will be at state 1, since $\alpha_k(t)$ is the proportion of scheduled users in class $k$. Accordingly, $z_{k,1}(t+1) = p_k\alpha_k(t)$. For $1 \le i < l_k(t)$, the users' proportion $z_{k,i}(t)$ is not scheduled. Since prescribing the idle action to a user increases its state by 1, the proportion $z_{k,i}(t)$ at state $i$ in class $k$ will be at state $i+1$ at time $t+1$. Thus, $\mathbb{E}\big[Z^{k,N}_{i+1}(t+1) \,\big|\, Z^N(t) = z(t)\big] = z_{k,i+1}(t+1) = z_{k,i}(t)$.

APPENDIX B
PROOF OF LEMMA 2

Lemma 5.

We have, for all integers $i$ and for $k = 1, 2$:
$$w_k(i+1) - w_k(i) = i\,p_k + 1.$$

Proof.

The result follows directly by replacing $w_k(i)$ by its expression.

To prove Lemma 2, we proceed in two steps:
• We first prove by contradiction that there exists a time $t_f$ such that $\alpha_1(t_f) > 0$.
• We then prove that if $\alpha_1(t_f) > 0$, then $\alpha_1(t) > 0$ for all $t \ge t_f$.

1) For the first point, suppose that $\alpha_1(t) = 0$ for all $t$. Consequently, $z_{1,1}(t+T_t) = 0, \dots, z_{1,l_1(t+T_t)}(t+T_t) = 0$, and $\alpha_1(t+T_t) = 0$. This means that the proportion of all users in class 1 is equal to 0. However, the proportion of users in class 1 is $\gamma_1 \ne 0$. Hence, there exists a time $t_f$ such that $\alpha_1(t_f) > 0$.

2) Before addressing the second point, recall that $\alpha_1(t)$ is the proportion of scheduled users in class 1. Thereby, $\alpha_1(t)$ contains the users with the highest Whittle index values among all users in class 1. Hence, at time $t_f$, the Whittle index of the users in $\alpha_1(t_f)$ is greater than the Whittle index of the users in the proportion $1-\alpha$, which we denote by $C$. Let $S_{t_f}(C)$ be the set of (state, class) pairs at time $t_f$ in the users' proportion $C$. Denote by $q$ the smallest state of $\alpha_1(t_f)$, and let $n$ and $m$ be a state and a class such that $z_{m,n}(t_f)$ belongs to $C$ at time $t_f$; then $w_m(n) \le w_1(q)$. Under the Whittle index policy, at time $t_f+1$, the states of a users' proportion equal to $(1-p_1)\alpha_1(t_f)$ within $\alpha_1(t_f)$ increase by one with respect to time $t_f$, as do the states of the users' proportion $C$. Accordingly, the smallest state of the proportion $(1-p_1)\alpha_1(t_f)$ is $q+1$, and $S_{t_f+1}(C)$ is shifted by one with respect to $S_{t_f}(C)$, i.e., $(n, m) \in S_{t_f}(C) \Leftrightarrow (n+1, m) \in S_{t_f+1}(C)$. We compare $w_1(q+1)$ with the Whittle index of state $n$ in class $m$ for $(n, m) \in S_{t_f+1}(C)$. To that end, let $(n, m) \in S_{t_f+1}(C)$; we distinguish two cases:
• $m = 1$: Since $(n-1, m) \in S_{t_f}(C)$, we have $w_1(q) \ge w_1(n-1)$. This implies that $n-1 \le q$, since $w_k(\cdot)$ is increasing. Hence $n \le q+1$ and, consequently, $w_1(n) \le w_1(q+1)$.
• $m = 2$: We again distinguish two cases:
– If $n-1 \le q$, then $w_2(n) < w_1(n) \le w_1(q+1)$, which gives the desired result for this first case.
– If $n-1 > q$, we have:
$$w_1(q+1) - w_2(n) = \big(w_1(q+1) - w_1(q)\big) - \big(w_2(n) - w_2(n-1)\big) + \big(w_1(q) - w_2(n-1)\big).$$
Applying Lemma 5, we obtain:
$$\big(w_1(q+1) - w_1(q)\big) - \big(w_2(n) - w_2(n-1)\big) = q\,p_1 - (n-1)\,p_2.$$
Given that $w_2(n-1) \le w_1(q)$, replacing both indices by their expressions gives
$$\frac{(n-2)(n-1)p_2}{2} + n - 1 \le \frac{(q-1)q\,p_1}{2} + q.$$
As $n-1 > q$, it follows that
$$\frac{(n-2)(n-1)p_2}{2} \le \frac{(q-1)q\,p_1}{2}.$$
Hence, dividing by $n-2 > 0$ and using $q-1 < n-2$, we get $(n-1)p_2 \le q\,p_1$. Therefore, $\big(w_1(q+1) - w_1(q)\big) - \big(w_2(n) - w_2(n-1)\big) \ge 0$. Since $w_1(q) - w_2(n-1) \ge 0$, we obtain the desired result for this case, i.e., $w_1(q+1) - w_2(n) \ge 0$.

Thus, we have proved that at time $t_f+1$, all the users in $C$, whose total proportion is $1-\alpha$, have a Whittle index lower than that of the proportion $(1-p_1)\alpha_1(t_f)$ defined at the beginning of this proof. That means there exists a users' proportion equal to $1-\alpha$ whose Whittle index values are lower than those of the states of $(1-p_1)\alpha_1(t_f)$. Then the proportion $(1-p_1)\alpha_1(t_f)$, which is different from $0$, surely belongs to the proportion $\alpha$ of users with the highest Whittle index values. This implies that at time $t_f+1$ there will surely be at least one queue in class 1 belonging to $\alpha$. Therefore, $\alpha_1(t_f+1) > 0$. This argument can be repeated for all $t \ge t_f$; in other words, $\alpha_1(t) > 0$ for all $t \ge t_f$.

APPENDIX C
PROOF OF LEMMA 3

Since $\alpha_1(j) + \alpha_2(j) = \alpha$ for all integers $j$, if $\alpha_1(t+i) = 0$ then $\alpha_2(t+i) = \alpha$. For $j \in [1, T_{\max}]$, we have $T_{\max} - j \in [0, T_{\max}-1]$. This means that $\alpha_1(t+T_{\max}-j) = 0$, which implies that $\alpha_2(t+T_{\max}-j) = \alpha$. Moreover, since $l(t+T_{\max}) \le T_{\max}$, for all $j \in [1, l(t+T_{\max})]$ we have $T_{\max}-j \in [T_{\max}-l(t+T_{\max}), T_{\max}-1] \subset [0, T_{\max}-1]$.
Hence, $\alpha_1(t+T_{\max}-j) = 0$ for all $j \in [1, l(t+T_{\max})]$. Therefore, according to Definition 2, $T_{\max}$ satisfies
$$T_{\max}\, p_2\alpha \ge 1-\alpha, \tag{30}$$
$$T_{\max} \ge \frac{1-\alpha}{p_2\alpha}. \tag{31}$$
By definition, $T_{\max}$ is the first time at which $\sum_{j=1}^{l(t+T_{\max})} p_1\alpha_1(t+T_{\max}-j) + \sum_{j=1}^{T_{\max}} p_2\alpha_2(t+T_{\max}-j)$ exceeds $1-\alpha$. Hence, at time $t+T_{\max}-1$, we have $\sum_{j=1}^{l(t+T_{\max}-1)} p_1\alpha_1(t+T_{\max}-1-j) + \sum_{j=1}^{T_{\max}-1} p_2\alpha_2(t+T_{\max}-1-j) < 1-\alpha$. This latter sum is equal to $(T_{\max}-1)p_2\alpha$, which is therefore less than $1-\alpha$. As a result, $T_{\max} < \frac{1-\alpha}{p_2\alpha} + 1$. There is exactly one integer in the interval $\big[\frac{1-\alpha}{p_2\alpha}, \frac{1-\alpha}{p_2\alpha}+1\big)$. Therefore, $T_{\max}$ does not depend on $t$ and satisfies
$$\frac{1-\alpha}{p_2\alpha} \le T_{\max} < \frac{1-\alpha}{p_2\alpha} + 1.$$

APPENDIX D
PROOF OF PROPOSITION 8

We have $w_1(n) = \frac{(n-1)n\,p_1}{2} + n$ and $w_2(n) = \frac{(n-1)n\,p_2}{2} + n$. We first find the set of states for which the order of the Whittle index alternates between the two classes. As can be seen from the expression of the Whittle index, for a given state $n$, $w_2(n) < w_1(n)$ since $p_2 < p_1$. For the alternation condition to be strictly satisfied at a given state $n$, we must also have $w_1(n) < w_2(n+1)$. Hence, denote by $f(n)$ the difference $w_2(n+1) - w_1(n)$. We study the sign of $f(n)$ to find the values of $n$ for which $f$ is strictly positive.

Lemma 6.

For all $n \in [1, D[$, $f(n) > 0$.

Proof.

We have that
$$f(n) = \frac{n^2(p_2-p_1)}{2} + \frac{n(p_1+p_2)}{2} + 1. \tag{32}$$
Hence
$$f'(n) = n(p_2-p_1) + \frac{p_1+p_2}{2}. \tag{33}$$
The derivative is equal to zero for $n = \frac{p_1+p_2}{2(p_1-p_2)}$, which is strictly greater than $0$. Since $f'(n) > 0$ in $\big[0, \frac{p_1+p_2}{2(p_1-p_2)}\big[$, $f$ is strictly increasing in $\big[0, \frac{p_1+p_2}{2(p_1-p_2)}\big]$. Since $f(0) = 1$, $f$ is strictly positive in $\big[0, \frac{p_1+p_2}{2(p_1-p_2)}\big]$. Since $\lim_{n\to+\infty} f(n) = -\infty$, the unique positive solution of $f(n) = 0$ must lie in the interval $\big[\frac{p_1+p_2}{2(p_1-p_2)}, +\infty\big[$. This solution is the largest root of the polynomial (32), which is exactly the value $D$ introduced in Assumption 1. As $f$ is decreasing in $\big[\frac{p_1+p_2}{2(p_1-p_2)}, +\infty\big[$, $f$ is strictly positive in $[0, D[$. Therefore, $f(n) > 0$ for $n \in [1, D[$, which concludes the proof. ∎

According to Lemma 6, the order of the Whittle index strictly alternates between the two classes when $n \in [1, D[$. Therefore, to prove that the alternation condition is satisfied from state $1$ to $T_{\max}+1$, we need to prove that $T_{\max}+1$ is strictly less than $D$. Lemma 3 gives the upper bound $T_{\max} < \frac{1-\alpha}{p_2\alpha} + 1$, so it suffices to prove that $\frac{1-\alpha}{p_2\alpha} + 2$ is strictly less than $D$. Under Assumption 1, we have
$$\alpha > \frac{1}{1+(D-2)p_2} \tag{34}$$
$$\alpha\big(1+p_2(D-2)\big) > 1 \tag{35}$$
$$\alpha p_2(D-2) > 1-\alpha \tag{36}$$
$$D-2 > \frac{1-\alpha}{p_2\alpha} \tag{37}$$
$$D > \frac{1-\alpha}{p_2\alpha} + 2 \tag{38}$$
Hence, from state $1$ to $T_{\max}+1$, the order of the Whittle index strictly alternates between the two classes, which concludes the proof.

APPENDIX E
PROOF OF PROPOSITION

Lemma 7.

For any state $q$ and at any time $t$, we have
$$w_1(q) \le w_2(l_2(t)) \Rightarrow w_1(q) \le w_1(l_1(t)) \quad\text{and}\quad w_2(q) \le w_1(l_1(t)) \Rightarrow w_2(q) \le w_2(l_2(t)).$$

Proof.

See Appendix F. ∎
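The quantities of Appendices C and D can be checked numerically. The following Python sketch is a minimal illustration, not code from the paper. It evaluates:
- the index expression $w_k(n) = \frac{(n-1)n\,p_k}{2} + n$;
- the function $f$ of (32) and its largest root $D$;
- $T_{\max}$, as the smallest integer satisfying (30).

The parameter values $p_1 = 0.5$, $p_2 = 0.4$ and $\alpha = 0.5$ are hypothetical.

```python
import math


def whittle_index(n: int, p: float) -> float:
    """Index expression of Appendix D: w_k(n) = (n - 1) * n * p_k / 2 + n."""
    return (n - 1) * n * p / 2 + n


def f(n: float, p1: float, p2: float) -> float:
    """Eq. (32): f(n) = w_2(n + 1) - w_1(n)."""
    return n * n * (p2 - p1) / 2 + n * (p1 + p2) / 2 + 1


def largest_root(p1: float, p2: float) -> float:
    """Largest root D of the quadratic (32); assumes p2 < p1."""
    a, b, c = (p2 - p1) / 2, (p1 + p2) / 2, 1.0
    # a < 0 < c: the roots are real with opposite signs, and this one is positive.
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)


def t_max(alpha: float, p2: float) -> int:
    """Smallest integer T with T * p2 * alpha >= 1 - alpha, cf. (30)-(31)."""
    return math.ceil((1 - alpha) / (p2 * alpha))


p1, p2, alpha = 0.5, 0.4, 0.5  # hypothetical example values with p2 < p1
D = largest_root(p1, p2)
T = t_max(alpha, p2)
print(f"D = {D:.4f}, T_max = {T}")
print("Assumption 1 holds:", alpha > 1 / (1 + (D - 2) * p2))
# Lemma 6 / Proposition 8: f(n) > 0, i.e. w_1(n) < w_2(n + 1), for n = 1, ..., T_max + 1.
print("alternation up to T_max + 1:", all(f(n, p1, p2) > 0 for n in range(1, T + 2)))
```

With these values, $D \approx 10.84$ and $T_{\max} = 3$, so $T_{\max}+1 < D$ holds with a wide margin. `math.ceil` is applied to a floating-point ratio. When $\frac{1-\alpha}{p_2\alpha}$ is an integer, rounding may push $T_{\max}$ one step too high; passing `fractions.Fraction` values avoids this.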

We consider $t \ge t_f$. After time $T_t$, we have
$$\sum_{j=1}^{l(t+T_t)} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{T_t} p_2\alpha_2(t+T_t-j) \ge 1-\alpha. \tag{39}$$
Then, as shown before, at time $t+T_t$ there exist $l_1(t+T_t) \le l(t+T_t)$ and $l_2(t+T_t) \le T_t$, with $\gamma(t+T_t) = 1$ and $0 < \beta(t+T_t) \le 1$, or $0 < \gamma(t+T_t) \le 1$ and $\beta(t+T_t) = 1$, such that
$$\sum_{j=1}^{l_1(t+T_t)-1} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l_2(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \beta(t+T_t)\,p_1\alpha_1\big(t+T_t-l_1(t+T_t)\big) + \gamma(t+T_t)\,p_2\alpha_2\big(t+T_t-l_2(t+T_t)\big) = 1-\alpha, \tag{40}$$
with $l_1(t+T_t)$ and $l_2(t+T_t)$ being the instantaneous thresholds in class 1 and 2, respectively, at time $t+T_t$.
Now we prove by contradiction that $\max(l_1(t+T_t), l_2(t+T_t)) \le T_{\max}$. We first prove that $l_2(t+T_t) \ge l_1(t+T_t)$. As $w_2(l_1(t+T_t)) < w_1(l_1(t+T_t))$, according to Lemma 7, $w_2(l_1(t+T_t)) \le w_2(l_2(t+T_t))$. This implies that $l_2(t+T_t) \ge l_1(t+T_t)$.
Reasoning by contradiction, suppose that $l_2(t+T_t) > T_{\max}$ (i.e., $\max(l_1(t+T_t), l_2(t+T_t)) = l_2(t+T_t) > T_{\max}$). Then $w_1(T_{\max}) < w_2(T_{\max}+1) \le w_2(l_2(t+T_t))$, since the order of the Whittle index alternates between the two classes (Proposition 8). We distinguish between two cases:
1) First case: $\beta(t+T_t) = 1$. We have $w_1(T_{\max}) < w_2(l_2(t+T_t))$.
Then, according to Lemma 7, $w_1(T_{\max}) \le w_1(l_1(t+T_t))$. Hence, we can conclude that $T_{\max} \le l_1(t+T_t)$, as $w_1$ is increasing with the age of information. Moreover, $p_1\alpha_1(t+T_t-j) + p_2\alpha_2(t+T_t-j) > p_2\alpha$; the strict inequality holds because $\alpha_1(t) > 0$ for $t \ge t_f$ (Lemma 2). Hence, according to Lemma 3, we obtain
$$1-\alpha = \sum_{j=1}^{l_1(t+T_t)-1} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l_2(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \beta(t+T_t)\,p_1\alpha_1\big(t+T_t-l_1(t+T_t)\big) + \gamma(t+T_t)\,p_2\alpha_2\big(t+T_t-l_2(t+T_t)\big) \tag{41}$$
$$= \sum_{j=1}^{l_1(t+T_t)} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l_2(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \gamma(t+T_t)\,p_2\alpha_2\big(t+T_t-l_2(t+T_t)\big) \tag{42}$$
$$\ge \sum_{j=1}^{T_{\max}} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{T_{\max}} p_2\alpha_2(t+T_t-j) > T_{\max}\,p_2\alpha \ge 1-\alpha. \tag{43}$$
The last inequality comes from the fact that $T_{\max} \ge \frac{1-\alpha}{p_2\alpha}$. This implies that
$$1-\alpha > 1-\alpha, \tag{44}$$
which is absurd. Consequently, in this case, the assumption $l_2(t+T_t) > T_{\max}$ is not true.
2) Second case: $\beta(t+T_t) < 1$. Then $\gamma(t+T_t)$ must be equal to $1$. Therefore, all users at state $l_2(t+T_t)$ in class 2 are in the users' proportion $1-\alpha$ with the smallest Whittle index values. However, some users at state $l_1(t+T_t)$ in class 1 belong to the users' proportion $\alpha$ with the highest Whittle index values. That is, $w_1(l_1(t+T_t)) \ge w_2(l_2(t+T_t))$. As established before the first case, $w_1(T_{\max}) < w_2(l_2(t+T_t))$; hence $w_1(T_{\max}) < w_1(l_1(t+T_t))$. This means that $l_1(t+T_t) > T_{\max}$.
Therefore,
$$1-\alpha = \sum_{j=1}^{l_1(t+T_t)-1} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{l_2(t+T_t)-1} p_2\alpha_2(t+T_t-j) + \beta(t+T_t)\,p_1\alpha_1\big(t+T_t-l_1(t+T_t)\big) + \gamma(t+T_t)\,p_2\alpha_2\big(t+T_t-l_2(t+T_t)\big) \tag{45}$$
$$\ge \sum_{j=1}^{T_{\max}} p_1\alpha_1(t+T_t-j) + \sum_{j=1}^{T_{\max}} p_2\alpha_2(t+T_t-j) > T_{\max}\,p_2\alpha \ge 1-\alpha. \tag{46}$$
This implies that
$$1-\alpha > 1-\alpha. \tag{47}$$
Consequently, in this case, the assumption $l_2(t+T_t) > T_{\max}$ is not true either. Hence, in both cases, $l_2(t+T_t)$ must be at most $T_{\max}$, i.e., $\max(l_1(t+T_t), l_2(t+T_t)) \le T_{\max}$ for all $t$. Thus, we end up with $T_{\max} = l_{\max}$, which concludes our proof.

APPENDIX F
PROOF OF LEMMA 7

By definition of $l_1(t)$ and $l_2(t)$, the set $\{z^1_i(t)\}_{1\le i\le l_1(t)} \cup \{z^2_i(t)\}_{1\le i\le l_2(t)}$ is exactly the set $\{z^k_i(t) : w_k(i) \le \max(w_1(l_1(t)), w_2(l_2(t)))\}$. Hence, if a given $q$ verifies $w_1(q) \le w_2(l_2(t))$, then $w_1(q) \le \max(w_1(l_1(t)), w_2(l_2(t)))$. This implies that $z^1_q(t) \in \{z^k_i(t) : w_k(i) \le \max(w_1(l_1(t)), w_2(l_2(t)))\} = \{z^1_i(t)\}_{1\le i\le l_1(t)} \cup \{z^2_i(t)\}_{1\le i\le l_2(t)}$. Since the highest state of this set in class 1 is $l_1(t)$, we get $q \le l_1(t)$. Therefore, as $w_1(\cdot)$ is increasing, $w_1(q) \le w_1(l_1(t))$. The second implication is obtained in the same way by exchanging the roles of the two classes.

APPENDIX G
PROOF OF PROPOSITION 9

At time $t$, we have
$$\sum_{i=1}^{l_1(t)-1} z^1_i(t) + \sum_{i=1}^{l_2(t)-1} z^2_i(t) + \beta(t)\, z^1_{l_1(t)}(t) + \gamma(t)\, z^2_{l_2(t)}(t) = 1-\alpha, \tag{48}$$
with $l_1(t)$ and $l_2(t)$ being the thresholds in class 1 and 2, respectively, at time $t$, and $\beta(t) = 1$ and $0 < \gamma(t) \le 1$, or $\gamma(t) = 1$ and $0 < \beta(t) \le 1$.
Our aim in this proof is to show that there is a link between $l_1(t)$ and $l_2(t)$ when they are less than $T_{\max}$. This yields a general form of the above equation. To that end, we first prove that $l_1(t) \le l_2(t)$. Indeed, as $w_2(l_1(t)) < w_1(l_1(t))$, according to Lemma 7, $w_2(l_1(t)) \le w_2(l_2(t))$.
Consequently, $l_1(t) \le l_2(t)$.
Secondly, we prove that $l_2(t) \le l_1(t)+1$. As the order of the Whittle indices alternates between the two classes from state $1$ to state $T_{\max}+1$, $w_1(l_2(t)-1) < w_2(l_2(t))$. Hence, according to Lemma 7, $w_1(l_2(t)-1) \le w_1(l_1(t))$. Consequently, $l_2(t)-1 \le l_1(t)$.
Given that $l_1(t) \le l_2(t) \le l_1(t)+1$, $l_1(t)$ is either $l_2(t)$ or $l_2(t)-1$.
The second step consists of deriving the value of $\beta(t)$ or $\gamma(t)$ depending on the values of $l_1(t)$ and $l_2(t)$.
• If $l_1(t) = l_2(t)$: we prove that $\gamma(t) = 1$ if $z^2_{l_2(t)} > 0$. Indeed, suppose $\gamma(t) \neq 1$ and $z^2_{l_2(t)} > 0$. Then a nonempty set of users in class 2 at state $l_2(t)$ belongs to the users' proportion $\alpha$ with the highest Whittle index values. However, since $\beta(t) > 0$, there always exists a nonempty set of queues in class 1 at state $l_1(t)$ that belongs to the users' proportion $1-\alpha$ with the lowest Whittle index values. Then $w_2(l_2(t)) \ge w_1(l_1(t))$. However, we know that $w_2(l_2(t)) = w_2(l_1(t)) < w_1(l_1(t))$, which contradicts what precedes. Thus, the statement $\gamma(t) \neq 1$ is not true, i.e., $\gamma(t) = 1$. In this case, we denote $l(t) = l_1(t) = l_2(t)$ and we end up with
$$\sum_{i=1}^{l(t)-1} z^1_i(t) + \sum_{i=1}^{l(t)-1} z^2_i(t) + \beta(t)\, z^1_{l(t)}(t) + z^2_{l(t)}(t) = 1-\alpha. \tag{49}$$
If $z^2_{l(t)} = 0$, the last equation is still valid, since the term $\gamma(t) z^2_{l(t)}(t)$ vanishes whatever the value of $\gamma(t)$, in particular when $\gamma(t) = 1$.
• If $l_1(t) + 1 = l_2(t)$: we prove that $\beta(t) = 1$ if $z^1_{l_1(t)} > 0$. Indeed, suppose $\beta(t) \neq 1$ and $z^1_{l_1(t)} > 0$. Then a set of users in class 1 at state $l_1(t)$ belongs to the users' proportion $\alpha$ with the highest Whittle index values. However, since $\gamma(t) > 0$, there is always a set of queues in class 2 at state $l_2(t)$ that belongs to the users' proportion $1-\alpha$ with the lowest Whittle index values.
Then $w_1(l_1(t)) \ge w_2(l_2(t))$. However, $w_2(l_2(t)) = w_2(l_1(t)+1) > w_1(l_1(t))$, since the order of the Whittle index alternates between the two classes from state $1$ to $T_{\max}+1$ (Proposition 8). Thus, $w_2(l_1(t)+1) > w_1(l_1(t)) \ge w_2(l_1(t)+1)$, which is a contradiction. Therefore, $\beta(t) = 1$. In this case, we let $l(t) = l_1(t)+1 = l_2(t)$ and we get
$$\sum_{i=1}^{l(t)-1} z^1_i(t) + \sum_{i=1}^{l(t)-1} z^2_i(t) + \gamma(t)\, z^2_{l(t)}(t) = 1-\alpha. \tag{50}$$
As in the first case, if $z^1_{l_1(t)} = 0$, the last equation is still valid. Indeed, the term $\beta(t) z^1_{l_1(t)}(t)$ vanishes whatever the value of $\beta(t)$, in particular when $\beta(t) = 1$. Combining the two cases, there exists $l(t)$ such that
$$\sum_{i=1}^{l(t)-1} z^1_i(t) + \sum_{i=1}^{l(t)-1} z^2_i(t) + \beta(t)\, z^1_{l(t)}(t) + \gamma(t)\, z^2_{l(t)}(t) = 1-\alpha, \tag{51}$$
where $\beta(t) = 0$ and $0 < \gamma(t) \le 1$, or $0 < \beta(t) \le 1$ and $\gamma(t) = 1$.

APPENDIX H
PROOF OF PROPOSITION 10

• For $T = T_0$, we have already proved our claim.
• Suppose that the statement holds for a given $T$, i.e., there exist $l(T)$, $\beta(T)$ and $\gamma(T)$ such that
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)p_1\alpha_1(T-l(T)) + \gamma(T)p_2\alpha_2(T-l(T)) = 1-\alpha, \tag{52}$$
where $\beta(T) = 0$ and $0 < \gamma(T) \le 1$, or $0 < \beta(T) \le 1$ and $\gamma(T) = 1$. Consider the next time slot. Among the scheduled users' proportion $\alpha$, exactly $p_1\alpha_1(T)$ and $p_2\alpha_2(T)$ go to state 1 in their respective classes. The states of the remaining scheduled users are incremented by one. Likewise, the states of the other users, for which the passive action is taken, are incremented by one.
As a consequence, the decreasing order according to the Whittle index value for these users' proportions at the next slot is $\beta(T)p_1\alpha_1(T-l(T))$, $\gamma(T)p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $p_1\alpha_1(T-l(T)+3)$, $p_2\alpha_2(T-l(T)+3)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$. As mentioned before, the order of the Whittle indices alternates between the two classes because $l(T)+1 \le l_{\max}+1$. Moreover, the users' proportions $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$ are scheduled but do not transit to state 1 of their respective classes; their states are increased by one. Using these results, we give the decreasing order of all users' proportions according to the Whittle index value, distinguishing two cases for $\beta(T)$.
If $\beta(T) = 0$, the smallest state among the users' proportions $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$ at time $T+1$ is $l(T)+1$. Hence, their Whittle index values are higher than $w_2(l(T)+1)$. Consequently, they are higher than that of the users' proportion $\gamma(T)p_2\alpha_2(T-l(T))$ at state $l(T)+1$ in class 2.
If $\beta(T) \neq 0$ (so that $\gamma(T) = 1$), the smallest states among the users' proportions $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$ at time $T+1$ are $l(T)+1$ and $l(T)+2$, respectively. Then, their Whittle index values are higher than $w_1(l(T)+1)$; note that $w_1(l(T)+1) < w_2(l(T)+2)$, as the alternation condition is satisfied from $1$ to $l_{\max}+1$.
Consequently, their Whittle index values are higher than the Whittle index of the users' proportion $\beta(T)p_1\alpha_1(T-l(T))$ at state $l(T)+1$ in class 1.
Thus, whatever the value of $\beta(T)$, the decreasing order of all users' proportions according to the Whittle index value at $T+1$ is $(1-p_1)\alpha_1(T)$, $(1-p_2)\alpha_2(T)$, $\beta(T)p_1\alpha_1(T-l(T))$, $\gamma(T)p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $p_1\alpha_1(T-l(T)+3)$, $p_2\alpha_2(T-l(T)+3)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$.
Since $(1-p_1)\alpha_1(T) + (1-p_2)\alpha_2(T) \le \alpha$, the thresholds at time $T+1$ in class 1 and in class 2 do not exceed the states of $\beta(T)p_1\alpha_1(T-l(T))$ and $\gamma(T)p_2\alpha_2(T-l(T))$, respectively. Therefore, there exist $l_1(T+1)$, $l_2(T+1)$, $\beta(T+1)$ and $\gamma(T+1)$, with $0 < \beta(T+1) \le 1$ and $\gamma(T+1) = 1$, or $\beta(T+1) = 1$ and $0 < \gamma(T+1) \le 1$, such that
$$\sum_{j=1}^{l_1(T+1)-1} p_1\alpha_1(T+1-j) + \sum_{j=1}^{l_2(T+1)-1} p_2\alpha_2(T+1-j) + \beta(T+1)p_1\alpha_1(T+1-l_1(T+1)) + \gamma(T+1)p_2\alpha_2(T+1-l_2(T+1)) = 1-\alpha. \tag{53}$$
Now we prove by contradiction that $\max(l_1(T+1), l_2(T+1)) \le T_{\max}$. We first prove that $l_2(T+1) \ge l_1(T+1)$. Since $w_2(l_1(T+1)) < w_1(l_1(T+1))$, Lemma 7 gives $w_2(l_1(T+1)) \le w_2(l_2(T+1))$, hence $l_1(T+1) \le l_2(T+1)$.
Reasoning by contradiction, suppose that $l_2(T+1) > T_{\max}$; we distinguish between two cases:
– First case: $\beta(T+1) = 1$. We have $w_1(T_{\max}) < w_2(l_2(T+1))$, since $w_1(T_{\max}) < w_2(T_{\max}+1)$ (the alternation condition is satisfied in $[1, T_{\max}+1]$). Hence, according to Lemma 7, $l_{\max} \le l_1(T+1)$.
Hence, according to Lemmas 2 and 3,
$$1-\alpha = \sum_{j=1}^{l_1(T+1)-1} p_1\alpha_1(T+1-j) + \sum_{j=1}^{l_2(T+1)-1} p_2\alpha_2(T+1-j) + \beta(T+1)p_1\alpha_1(T+1-l_1(T+1)) + \gamma(T+1)p_2\alpha_2(T+1-l_2(T+1)) \tag{54}$$
$$\ge \sum_{j=1}^{T_{\max}} p_1\alpha_1(T+1-j) + \sum_{j=1}^{T_{\max}} p_2\alpha_2(T+1-j) > T_{\max}\,p_2\alpha \ge 1-\alpha. \tag{55}$$
Therefore, we end up with
$$1-\alpha > 1-\alpha. \tag{56}$$
Hence, the assumption $l_2(T+1) > T_{\max}$ leads to a contradiction. Consequently, the hypothesis $l_2(T+1) > l_{\max}$ is not valid in the first case.
– Second case: $\beta(T+1) < 1$. Then $\gamma(T+1) = 1$. Therefore, all users at state $l_2(T+1)$ in class 2 are in the proportion $1-\alpha$ with the smallest Whittle index values. However, some users at state $l_1(T+1)$ in class 1 belong to the proportion $\alpha$ with the highest Whittle index values. In other words, $w_1(l_1(T+1)) \ge w_2(l_2(T+1)) > w_1(T_{\max})$. This means that $l_1(T+1) > T_{\max}$. Therefore, according to Lemmas 2 and 3,
$$1-\alpha = \sum_{j=1}^{l_1(T+1)-1} p_1\alpha_1(T+1-j) + \sum_{j=1}^{l_2(T+1)-1} p_2\alpha_2(T+1-j) + \beta(T+1)p_1\alpha_1(T+1-l_1(T+1)) + \gamma(T+1)p_2\alpha_2(T+1-l_2(T+1)) \tag{57}$$
$$\ge \sum_{j=1}^{T_{\max}} p_1\alpha_1(T+1-j) + \sum_{j=1}^{T_{\max}} p_2\alpha_2(T+1-j) > T_{\max}\,p_2\alpha \ge 1-\alpha, \tag{58}$$
which yields
$$1-\alpha > 1-\alpha. \tag{59}$$
Therefore, the hypothesis $l_2(T+1) > T_{\max}$ is not valid in the second case either.
Consequently, $l_2(T+1) \le T_{\max}$, i.e., $\max(l_1(T+1), l_2(T+1)) \le T_{\max}$.
Then, according to Proposition 9, there exist $l(T+1)$, $\gamma(T+1)$ and $\beta(T+1)$ such that
$$\sum_{j=1}^{l(T+1)-1} p_1\alpha_1(T+1-j) + \sum_{j=1}^{l(T+1)-1} p_2\alpha_2(T+1-j) + \beta(T+1)p_1\alpha_1(T+1-l(T+1)) + \gamma(T+1)p_2\alpha_2(T+1-l(T+1)) = 1-\alpha, \tag{60}$$
where $\beta(T+1) = 0$ and $0 < \gamma(T+1) \le 1$, or $0 < \beta(T+1) \le 1$ and $\gamma(T+1) = 1$.
To conclude, we have proved by induction that for all $T \ge T_0$ there exist $l(T)$, $\beta(T)$ and $\gamma(T)$ such that
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)p_1\alpha_1(T-l(T)) + \gamma(T)p_2\alpha_2(T-l(T)) = 1-\alpha, \tag{61}$$
where $\beta(T) = 0$ and $0 < \gamma(T) \le 1$, or $0 < \beta(T) \le 1$ and $\gamma(T) = 1$, which concludes our proof.

APPENDIX I
PROOF OF PROPOSITION

At time $T$, we have
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)p_1\alpha_1(T-l(T)) + \gamma(T)p_2\alpha_2(T-l(T)) = 1-\alpha, \tag{62}$$
where $\beta(T) = 0$ and $0 \le \gamma(T) < 1$, or $0 \le \beta(T) < 1$ and $\gamma(T) = 1$. Among the scheduled users' proportion $\alpha$, exactly $p_1\alpha_1(T)$ and $p_2\alpha_2(T)$ go to state 1 in their respective classes, and $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$ go to the next state.
For the other users, for which the passive action is taken, the states are increased by one. Hence, the decreasing order according to the Whittle index value at the next time slot is $\beta(T)p_1\alpha_1(T-l(T))$, $\gamma(T)p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$. As said before, the order based on the Whittle index values alternates between the two classes from state $1$ to $l(T) \le l_{\max}+1$.
Moreover, the scheduled users' proportions $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$ will be at states whose Whittle index values are higher than those of $\beta(T)p_1\alpha_1(T-l(T))$ and $\gamma(T)p_2\alpha_2(T-l(T))$, as explained in the proof of Proposition 10.
Hence, the global decreasing order according to the Whittle index value is $(1-p_1)\alpha_1(T)$, $(1-p_2)\alpha_2(T)$, $\beta(T)p_1\alpha_1(T-l(T))$, $\gamma(T)p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$.
Given that $(1-p_1)\alpha_1(T) + (1-p_2)\alpha_2(T) \le \alpha$, at time $T+1$:
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)p_1\alpha_1(T-l(T)) + \gamma(T)p_2\alpha_2(T-l(T)) + p_1\alpha_1(T) + p_2\alpha_2(T) \ge 1-\alpha. \tag{63}$$
Then there exist $\beta$ and $\gamma$, with $\beta = 0$ and $0 < \gamma \le 1$, or $0 < \beta \le 1$ and $\gamma = 1$. There also exists a subset $\{\alpha_1(T), \alpha_2(T), \alpha_1(T-1), \alpha_2(T-1), \dots, \alpha_1(T-m), \alpha_2(T-m)\}$ of $\{\alpha_1(T-l(T)), \alpha_2(T-l(T)), \alpha_1(T-l(T)+1), \alpha_2(T-l(T)+1), \dots, \alpha_1(T), \alpha_2(T)\}$. These satisfy
$$\sum_{j=1}^{(m+1)-1} p_1\alpha_1(T+1-j) + \sum_{j=1}^{(m+1)-1} p_2\alpha_2(T+1-j) + \beta p_1\alpha_1(T+1-(m+1)) + \gamma p_2\alpha_2(T+1-(m+1)) = 1-\alpha. \tag{64}$$
Indeed, $m+1$ is effectively $l(T+1)$, $\beta = \beta(T+1)$ and $\gamma = \gamma(T+1)$. Moreover, the elements of the set $\{\alpha_1(T), \alpha_1(T-1), \dots, \alpha_1(T-m)\} \cup \{\alpha_1(T+1)\}$ are exactly the elements of the vector $A_1(T+1)$. Likewise, the elements of $\{\alpha_2(T), \alpha_2(T-1), \dots, \alpha_2(T-m)\} \cup \{\alpha_2(T+1)\}$ are exactly those of $A_2(T+1)$. The sets $\{\alpha_1(T), \alpha_1(T-1), \dots, \alpha_1(T-m)\}$ and $\{\alpha_2(T), \alpha_2(T-1), \dots, \alpha_2(T-m)\}$ are included in the elements of $A_1(T)$ and $A_2(T)$, respectively. Hence, for $k = 1, 2$, all the elements of $A_k(T+1)$ except $\alpha_k(T+1)$ belong to the elements of $A_k(T)$.
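The one-slot transition used in Appendices H and I can be sketched as a deterministic fluid update: scheduled users succeed with probability $p_k$ and return to state 1, and every other user ages by one. The sketch below is a minimal illustration under assumptions of our own, not the paper's:
- states are truncated at a hypothetical cap (the length of the lists), and mass beyond it is merged into the last state;
- ties in the Whittle index, such as $w_1(1) = w_2(1)$, are broken by a stable sort;
- the index expression is the one of Appendix D.

The function `fluid_step` and its arguments are hypothetical names.

```python
from typing import Dict, List, Tuple


def whittle_index(n: int, p: float) -> float:
    """Index expression of Appendix D: w_k(n) = (n - 1) * n * p_k / 2 + n."""
    return (n - 1) * n * p / 2 + n


def fluid_step(
    z: Dict[int, List[float]], p: Dict[int, float], alpha: float
) -> Tuple[Dict[int, List[float]], Dict[int, float]]:
    """One slot of a two-class fluid model under the Whittle index policy.

    z[k][n - 1] is the proportion of class-k users (k = 1, 2) at state n.
    Returns the proportions at the next slot and the scheduled proportions alpha_k.
    """
    n_states = len(z[1])
    # (class, state) pairs from the highest to the lowest Whittle index; the
    # stable sort breaks ties (e.g. w_1(1) = w_2(1)) in favour of class 1.
    order = sorted(
        ((k, n) for k in (1, 2) for n in range(1, n_states + 1)),
        key=lambda kn: whittle_index(kn[1], p[kn[0]]),
        reverse=True,
    )
    budget = alpha
    active = {k: [0.0] * n_states for k in (1, 2)}
    for k, n in order:
        if budget <= 0.0:
            break
        take = min(z[k][n - 1], budget)  # possibly a fraction of the state's mass
        active[k][n - 1] = take
        budget -= take
    nxt = {k: [0.0] * n_states for k in (1, 2)}
    for k in (1, 2):
        for n in range(1, n_states + 1):
            scheduled = active[k][n - 1]
            passive = z[k][n - 1] - scheduled
            older = min(n, n_states - 1)  # list index of state n + 1, capped at the last state
            nxt[k][0] += p[k] * scheduled  # successful updates return to state 1
            nxt[k][older] += (1 - p[k]) * scheduled + passive  # everyone else ages by one
    return nxt, {k: sum(active[k]) for k in (1, 2)}


p = {1: 0.5, 2: 0.4}  # hypothetical success probabilities with p_2 < p_1
z = {k: [0.5] + [0.0] * 29 for k in (1, 2)}  # hypothetical start: half the users per class, all at state 1
for _ in range(50):
    z, scheduled = fluid_step(z, p, alpha=0.5)
print({k: round(v, 4) for k, v in scheduled.items()})
```

The loop only illustrates the mechanics of the update; it does not reproduce the paper's numerical results.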
APPENDIX J
PROOF OF PROPOSITION

The vectors $A_1(T)$ and $A_2(T)$ satisfy
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta(T)p_1\alpha_1(T-l(T)) + \gamma(T)p_2\alpha_2(T-l(T)) = 1-\alpha, \tag{65}$$
where $0 < \beta(T) \le 1$ and $\gamma(T) = 1$, or $\beta(T) = 0$ and $0 < \gamma(T) \le 1$. We distinguish between two cases depending on the values of $\beta$ and $\gamma$; to ease the notation, we drop the index $T$ in $\beta(T)$ and $\gamma(T)$.
• First case: $0 < \beta \le 1$ and $\gamma = 1$. Hence
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \beta p_1\alpha_1(T-l(T)) + p_2\alpha_2(T-l(T)) = 1-\alpha. \tag{66}$$
Our aim is to derive the expression of $\alpha_k(T+1)$ for class 1 and class 2. Among the scheduled users' proportion $\alpha$, exactly $p_1\alpha_1(T)$ and $p_2\alpha_2(T)$ go to state 1 in their respective classes, and the rest go to the next state. Hence
$$\alpha_1(T+1) = (1-p_1)\alpha_1(T) + B_1(T) \tag{67}$$
$$\alpha_2(T+1) = (1-p_2)\alpha_2(T) + B_2(T) \tag{68}$$
with $B_1(T) + B_2(T) = p_1\alpha_1(T) + p_2\alpha_2(T)$.
At time $T+1$, the decreasing order according to the Whittle index value is $(1-p_1)\alpha_1(T)$, $(1-p_2)\alpha_2(T)$, $\beta p_1\alpha_1(T-l(T))$, $p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$.
To obtain $B_1(T)$ and $B_2(T)$, we sum the users' proportions at the different states, starting from $\beta p_1\alpha_1(T-l(T))$ and following the decreasing order of the Whittle index, until the sum equals $p_1\alpha_1(T) + p_2\alpha_2(T)$. We distinguish between six sub-cases. For each of them, we prove that $\alpha_k(T+1)$ surely lies between two elements of the vector $A_k(T)$. It suffices to prove this for one class, since $\alpha_1(T) + \alpha_2(T) = \alpha$ for all $T$.
In the following, we derive the expression of $\alpha_k(T+1)$, for $k = 1, 2$, as a function of the elements of the vectors $A_1(T)$ and $A_2(T)$. We then show that $\alpha_1(T+1)$ surely lies between two elements of the vector $A_1(T)$.
1) If $p_1\alpha_1(T) + p_2\alpha_2(T) \le p_1\beta\alpha_1(T-l(T))$: in this case, $p_1\alpha_1(T) + p_2\alpha_2(T)$ is smaller than $p_1\beta\alpha_1(T-l(T))$. Therefore, we take from $p_1\beta\alpha_1(T-l(T))$ a proportion of users equal to $p_1\alpha_1(T) + p_2\alpha_2(T)$, denoted by $C$. This proportion is exactly $B_1(T) + B_2(T)$, which we add to $(1-p_1)\alpha_1(T)$ and $(1-p_2)\alpha_2(T)$; thus $B_1(T) + B_2(T) = C$. Since all the users of $C$ belong to $p_1\beta\alpha_1(T-l(T))$, $C$ contains only users of class 1. Consequently, $B_1(T) = C$ and $B_2(T) = 0$. Hence
$$\alpha_2(T+1) = (1-p_2)\alpha_2(T). \tag{69}$$
As $\alpha_1(T+1) + \alpha_2(T+1) = \alpha$,
$$\alpha_1(T+1) = \alpha - \alpha_2(T+1). \tag{70}$$
We now find an upper bound of $\alpha_2(T) - \alpha_2(T+1)$:
$$\alpha_2(T) - \alpha_2(T+1) = p_2\alpha_2(T) \tag{71}$$
$$\le \beta\alpha_1(T-l(T))p_1 - \alpha_1(T)p_1 \tag{72}$$
$$\le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big) \tag{73}$$
$$= p_1\big(\alpha_2(T) - \alpha_2(T-l(T))\big). \tag{74}$$
The first inequality comes from $p_1\alpha_1(T) + p_2\alpha_2(T) \le p_1\beta\alpha_1(T-l(T))$, and the second one from $\beta \le 1$.
Given that $\alpha_1(i) - \alpha_1(j) = \alpha_2(j) - \alpha_2(i)$ for all integers $i$ and $j$,
$$\alpha_1(T+1) - \alpha_1(T) \le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{75}$$
Moreover, $\alpha_1(T+1) - \alpha_1(T) \ge 0$ because $\alpha_2(T+1) - \alpha_2(T) \le 0$; therefore, $\alpha_1(T) \le \alpha_1(T+1)$. On the other hand, $p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big) \ge \alpha_1(T+1) - \alpha_1(T) \ge 0$. Since $p_1 \le 1$, we get $\alpha_1(T-l(T)) - \alpha_1(T) \ge \alpha_1(T+1) - \alpha_1(T)$. This means that $\alpha_1(T-l(T)) \ge \alpha_1(T+1)$.
Consequently, we end up with
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)). \tag{76}$$
2) If $\beta\alpha_1(T-l(T))p_1 \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2$, then
$$\alpha_1(T+1) = (1-p_1)\alpha_1(T) + \beta p_1\alpha_1(T-l(T)) \tag{77}$$
$$\alpha_2(T+1) = \alpha - \alpha_1(T+1). \tag{78}$$
Then
$$\alpha_1(T+1) - \alpha_1(T) = \beta p_1\alpha_1(T-l(T)) - p_1\alpha_1(T) \tag{79}$$
$$\le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{80}$$
On the other hand, according to the right inequality of the sub-case assumption,
$$\alpha_1(T+1) - \alpha_1(T) = \beta p_1\alpha_1(T-l(T)) - p_1\alpha_1(T) \tag{81}$$
$$\ge p_2\alpha_2(T) - p_2\alpha_2(T-l(T)) \tag{82}$$
$$= p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{83}$$
Hence
$$p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big) \le \alpha_1(T+1) - \alpha_1(T) \le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{84}$$
Since $p_2 < p_1$, these inequalities imply that $\alpha_1(T-l(T)) - \alpha_1(T) \ge 0$. As a result,
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)) \tag{85}$$
and
$$\alpha_1(T+1) - \alpha_1(T) \le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{86}$$
3) If $\beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1)$, then
$$\alpha_2(T+1) = (1-p_2)\alpha_2(T) + p_2\alpha_2(T-l(T)) \tag{87}$$
$$\alpha_1(T+1) = \alpha - \alpha_2(T+1). \tag{88}$$
Therefore,
$$\alpha_2(T+1) - \alpha_2(T) = p_2\big(\alpha_2(T-l(T)) - \alpha_2(T)\big) \tag{89}$$
and
$$\alpha_1(T) - \alpha_1(T+1) = p_2\big(\alpha_1(T) - \alpha_1(T-l(T))\big). \tag{90}$$
This means that, if $\alpha_1(T) \le \alpha_1(T+1)$,
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)) \tag{91}$$
and
$$\alpha_1(T+1) - \alpha_1(T) \le p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big); \tag{92}$$
if $\alpha_1(T+1) \le \alpha_1(T)$,
$$\alpha_1(T-l(T)) \le \alpha_1(T+1) \le \alpha_1(T) \tag{93}$$
and
$$\alpha_1(T) - \alpha_1(T+1) \le p_2\big(\alpha_1(T) - \alpha_1(T-l(T))\big). \tag{94}$$
4) If $\beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) + p_2\alpha_2(T-l(T)+1)$, then
$$\alpha_1(T+1) = (1-p_1)\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + p_1\alpha_1(T-l(T)+1) \tag{95}$$
$$\alpha_2(T+1) = \alpha - \alpha_1(T+1). \tag{96}$$
Therefore,
$$\alpha_1(T+1) - \alpha_1(T) = -p_1\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + p_1\alpha_1(T-l(T)+1). \tag{97}$$
According to the left
inequality of the assumption of this case, we have
$$\alpha_1(T+1) - \alpha_1(T) \le p_2\alpha_2(T) - p_2\alpha_2(T-l(T)) \tag{99}$$
$$= p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{100}$$
On the other hand,
$$\alpha_1(T+1) - \alpha_1(T) = -p_1\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + p_1\alpha_1(T-l(T)+1) \tag{101}$$
$$\ge p_1\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big). \tag{102}$$
Hence
$$p_1\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big) \le \alpha_1(T+1) - \alpha_1(T) \le p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big). \tag{104}$$
Thus, if $\alpha_1(T) \le \alpha_1(T+1)$,
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)) \tag{105}$$
and
$$\alpha_1(T+1) - \alpha_1(T) \le p_2\big(\alpha_1(T-l(T)) - \alpha_1(T)\big) \tag{106}$$
$$\le p_1\big(\alpha_1(T-l(T)) - \alpha_1(T)\big); \tag{107}$$
if $\alpha_1(T+1) \le \alpha_1(T)$,
$$\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T) \tag{108}$$
and
$$\alpha_1(T) - \alpha_1(T+1) \le p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big). \tag{109}$$
5) If there exists $m \ge 1$ such that $\beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m) + p_2\alpha_2(T-l(T)+m) \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m) + p_2\alpha_2(T-l(T)+m) + p_1\alpha_1(T-l(T)+m+1)$, then
$$\alpha_2(T+1) = (1-p_2)\alpha_2(T) + p_2\alpha_2(T-l(T)) + \dots + p_2\alpha_2(T-l(T)+m) \tag{110}$$
$$\alpha_1(T+1) = \alpha - \alpha_2(T+1). \tag{111}$$
We have
$$\alpha_2(T+1) - \alpha_2(T) = -p_2\alpha_2(T) + p_2\alpha_2(T-l(T)) + p_2\alpha_2(T-l(T)+1) + \dots + p_2\alpha_2(T-l(T)+m) \tag{112}$$
$$\ge p_2\big(\alpha_2(T-l(T)+1) - \alpha_2(T)\big). \tag{113}$$
On the other hand, by the left inequality of the sub-case assumption,
$$\alpha_2(T+1) - \alpha_2(T) = -p_2\alpha_2(T) + p_2\alpha_2(T-l(T)) + p_2\alpha_2(T-l(T)+1) + \dots + p_2\alpha_2(T-l(T)+m) \tag{114}$$
$$\le p_1\alpha_1(T) - \beta p_1\alpha_1(T-l(T)) - \sum_{i=1}^{m} p_1\alpha_1(T-l(T)+i) \tag{115}$$
$$\le p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big) \tag{116}$$
$$= p_1\big(\alpha_2(T-l(T)+1) - \alpha_2(T)\big). \tag{117}$$
Thus, since $\alpha_1(T) - \alpha_1(T+1) = \alpha_2(T+1) - \alpha_2(T)$,
$$p_2\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big) \le \alpha_1(T) - \alpha_1(T+1) \le p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big). \tag{118}$$
Therefore,
$$\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T) \tag{119}$$
and
$$\alpha_1(T) - \alpha_1(T+1) \le p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big). \tag{120}$$
6) If there exists $m \ge 1$ such that $\beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) + p_2\alpha_2(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m) + p_2\alpha_2(T-l(T)+m) + p_1\alpha_1(T-l(T)+m+1) \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \beta\alpha_1(T-l(T))p_1 + \alpha_2(T-l(T))p_2 + p_1\alpha_1(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m) + p_2\alpha_2(T-l(T)+m) + p_1\alpha_1(T-l(T)+m+1) + p_2\alpha_2(T-l(T)+m+1)$, then
$$\alpha_1(T+1) = (1-p_1)\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + \dots + p_1\alpha_1(T-l(T)+m+1) \tag{121}$$
$$\alpha_2(T+1) = \alpha - \alpha_1(T+1). \tag{122}$$
We have
$$\alpha_1(T+1) - \alpha_1(T) = -p_1\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + p_1\alpha_1(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m+1) \tag{123}$$
$$\ge p_1\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big). \tag{124}$$
On the other hand,
$$\alpha_1(T+1) - \alpha_1(T) = -p_1\alpha_1(T) + p_1\beta\alpha_1(T-l(T)) + p_1\alpha_1(T-l(T)+1) + \dots + p_1\alpha_1(T-l(T)+m+1) \tag{125}$$
$$\le p_2\alpha_2(T) - \sum_{i=0}^{m} p_2\alpha_2(T-l(T)+i) \tag{126}$$
$$\le p_2\big(\alpha_2(T) - \alpha_2(T-l(T)+1)\big) \tag{127}$$
$$= p_2\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big). \tag{128}$$
Thus
$$p_1\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big) \le \alpha_1(T+1) - \alpha_1(T) \le p_2\big(\alpha_1(T-l(T)+1) - \alpha_1(T)\big). \tag{129}$$
Therefore,
$$\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T) \tag{130}$$
and
$$\alpha_1(T) - \alpha_1(T+1) \le p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big). \tag{131}$$
• Second case: $\beta = 0$ and $0 < \gamma \le 1$. Hence, we have
$$\sum_{j=1}^{l(T)-1} p_1\alpha_1(T-j) + \sum_{j=1}^{l(T)-1} p_2\alpha_2(T-j) + \gamma p_2\alpha_2(T-l(T)) = 1-\alpha. \tag{132}$$
Then, at time $T+1$, the decreasing order according to the Whittle index value is $(1-p_1)\alpha_1(T)$, $(1-p_2)\alpha_2(T)$, $\gamma p_2\alpha_2(T-l(T))$, $p_1\alpha_1(T-l(T)+1)$, $p_2\alpha_2(T-l(T)+1)$, $p_1\alpha_1(T-l(T)+2)$, $p_2\alpha_2(T-l(T)+2)$, $\dots$, $p_1\alpha_1(T)$, $p_2\alpha_2(T)$. To obtain $B_1(T)$ and $B_2(T)$, we sum the users' proportions at the different states, starting from $\gamma p_2\alpha_2(T-l(T))$ and following the decreasing order of the Whittle index, until the sum equals $p_1\alpha_1(T) + p_2\alpha_2(T)$.
For this case, we distinguish between five sub-cases, and for each sub-case we prove that $\alpha_1(T+1)$ surely lies between two elements of the vector $A_1(T)$.
1) If $p_1\alpha_1(T) + p_2\alpha_2(T) \le \gamma\alpha_2(T-l(T))p_2$, then
$$\alpha_1(T+1) = (1-p_1)\alpha_1(T) \tag{133}$$
$$\alpha_2(T+1) = \alpha - \alpha_1(T+1). \tag{134}$$
We have
$$\alpha_1(T) - \alpha_1(T+1) = p_1\alpha_1(T) \tag{135}$$
$$\le \gamma\alpha_2(T-l(T))p_2 - \alpha_2(T)p_2 \tag{136}$$
$$\le p_2\big(\alpha_2(T-l(T)) - \alpha_2(T)\big) \tag{137}$$
$$= p_2\big(\alpha_1(T) - \alpha_1(T-l(T))\big). \tag{138}$$
Thus
$$\alpha_1(T) - \alpha_1(T+1) \le p_2\big(\alpha_1(T) - \alpha_1(T-l(T))\big) \tag{139}$$
and
$$\alpha_1(T-l(T)) \le \alpha_1(T+1) \le \alpha_1(T). \tag{140}$$
2) If $\gamma\alpha_2(T-l(T))p_2 \le p_1\alpha_1(T) + p_2\alpha_2(T) \le \gamma\alpha_2(T-l(T))p_2 + \alpha_1(T-l(T)+1)p_1$, then
$$\alpha_2(T+1) = (1-p_2)\alpha_2(T) + \gamma p_2\alpha_2(T-l(T)) \tag{141}$$
$$\alpha_1(T+1) = \alpha - \alpha_2(T+1). \tag{142}$$
Hence
$$\alpha_2(T+1) - \alpha_2(T) = -p_2\alpha_2(T) + \gamma p_2\alpha_2(T-l(T)) \tag{143}$$
$$\le p_2\big(\alpha_2(T-l(T)) - \alpha_2(T)\big). \tag{144}$$
On the other hand, according to the right inequality of the assumption of this sub-case,
$$\alpha_2(T+1) - \alpha_2(T) = -p_2\alpha_2(T) + \gamma p_2\alpha_2(T-l(T)) \tag{146}$$
$$\ge p_1\big(\alpha_1(T) - \alpha_1(T-l(T)+1)\big) \tag{147}$$
$$= p_1\big(\alpha_2(T-l(T)+1) - \alpha_2(T)\big). \tag{148}$$
That means
$$p_1\big(\alpha_2(T-l(T)+1) - \alpha_2(T)\big) \le \alpha_2(T+1) - \alpha_2(T) \le p_2\big(\alpha_2(T-l(T)) - \alpha_2(T)\big), \tag{149}$$
i.e.,
p ( α ( T ) − α ( T − l ( T ) + 1)) ≤ α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ))) (150)Therefore:If α ( T ) ≤ α ( T + 1) : α ( T ) ≤ α ( T + 1) ≤ α ( T − l ( T ) + 1) (151) And: α ( T + 1) − α ( T ) ≤ p ( α ( T − l ( T ) + 1) − α ( T )) (152)If α ( T + 1) ≤ α ( T ) : α ( T − l ( T )) ≤ α ( T + 1) ≤ α ( T ) (153)And: α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ))) (154)3) If γα ( T − l ( T )) p + α ( T − l ( T ) + 1) p ≤ p α ( T ) + p α ( T ) ≤ γα ( T − l ( T )) p + α ( T − l ( T ) +1) p + p α ( T − l ( T ) + 1) .Hence: α ( T + 1) =(1 − p ) α ( T ) + p α ( T − l ( T ) + 1) (155) α ( T + 1) = α − α ( T + 1) (156)We have that: α ( T + 1) − α ( T ) = p ( α ( T − l ( T ) + 1) − α ( T ) (157)If α ( T ) ≤ α ( T + 1) : α ( T ) ≤ α ( T + 1) ≤ α ( T − l ( T ) + 1) (158)And: α ( T + 1) − α ( T ) ≤ p ( α ( T − l ( T ) + 1) − α ( T )) (159)If α ( T + 1) ≤ α ( T ) : α ( T − l ( T ) + 1) ≤ α ( T + 1) ≤ α ( T ) (160)And: α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (161)4) If there exists m ≥ such that: γα ( T − l ( T )) p + · · · + α ( T − l ( T ) + m ) p + p α ( T − l ( T ) + m ) ≤ p α ( T ) + p α ( T ) ≤ γα ( T − l ( T )) p + · · · + α ( T − l ( T ) + m ) p + p α ( T − l ( T ) + m ) + p α ( T − l ( T ) + m + 1) :Hence: α ( T + 1) =(1 − p ) α ( T ) + p γα ( T − l ( T )) + · · · + p α ( T − l ( T ) + m ) (162) α ( T + 1) = α − α ( T + 1) (163) α ( T + 1) − α ( T ) = − p α ( T ) + p γα ( T − l ( T )) + p α ( T − l ( T ) + 1) + · · · + p α ( T − l ( T ) + m ) (164) ≥ p ( α ( T − l ( T ) + 1) − α ( T )) (165)On the other hand: α ( T + 1) − α ( T ) = − p α ( T ) + p γα ( T − l ( T )) + p α ( T − l ( T ) + 1) + · · · + p α ( T − l ( T ) + m ) (166) ≤ p α ( T ) − m (cid:88) i =1 p α ( T − l ( T ) + i ) (167) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (168) = p ( α ( T − l ( T ) + 1) − α ( T )) (169)Thus: p ( α ( T ) − α ( T − l ( T ) + 1) ≤ α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (170)Therefore: α ( T − l ( T ) + 1) ≤ α ( T + 1) ≤ α ( T ) 
(171)And: α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (172)5) If there exists m ≥ such that: γα ( T − l ( T )) p + · · · + α ( T − l ( T ) + m ) p + p α ( T − l ( T ) + m ) + p α ( T − l ( T ) + m + 1) ≤ p α ( T ) + p α ( T ) ≤ γα ( T − l ( T )) p + · · · + α ( T − l ( T ) + m ) p + p α ( T − l ( T ) + m ) + p α ( T − l ( T ) + m + 1) + p α ( T − l ( T ) + m + 1) :That implies that: α ( T + 1) =(1 − p ) α ( T ) + · · · + p α ( T − l ( T ) + m ) + p α ( T − l ( T ) + m + 1) (173) α ( T + 1) = α − α ( T + 1) (174) α ( T + 1) − α ( T ) = − p α ( T ) + p α ( T − l ( T ) + 1) + · · · + p α ( T − l ( T ) + m + 1) (175) ≥ p ( α ( T − l ( T ) + 1) − α ( T )) (176)On the other hand: α ( T + 1) − α ( T ) = − p α ( T ) + p α ( T − l ( T ) + 1) + · · · + p α ( T − l ( T ) + m + 1) (177) ≤ p α ( T ) − γp α ( T − l ( T )) − m (cid:88) i =1 p α ( T − l ( T ) + i ) (178) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (179) = p ( α ( T − l ( T ) + 1) − α ( T )) (180) Thus: p ( α ( T − l ( T ) + 1) − α ( T ) ≤ α ( T + 1) − α ( T ) ≤ p ( α ( T − l ( T ) + 1) − α ( T )) (181)Therefore: α ( T − l ( T ) + 1) ≤ α ( T + 1) ≤ α ( T ) (182)And: α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) (183)In conclusion, all these six sub-cases when γ = 1 and < β ≤ , plus the ﬁve sub-cases when β = 0 and < γ ≤ , can be summarized in four cases:1) α ( T ) ≤ α ( T + 1) ≤ α ( T − l ( T )) , and α ( T + 1) − α ( T ) ≤ p ( α ( T − l ( T )) − α ( T )) .2) α ( T − l ( T )) ≤ α ( T + 1) ≤ α ( T ) , and α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ))) .3) α ( T − l ( T ) + 1) ≤ α ( T + 1) ≤ α ( T ) , and α ( T ) − α ( T + 1) ≤ p ( α ( T ) − α ( T − l ( T ) + 1)) .4) α ( T ) ≤ α ( T + 1) ≤ α ( T − l ( T ) + 1) , and α ( T + 1) − α ( T ) ≤ p ( α ( T − l ( T ) + 1) − α ( T )) .Thus, the proof is concluded. 
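Every sub-case above is an instance of one greedy fill: groups of users are visited in decreasing Whittle-index order and scheduled until the capacity $\alpha$ is exhausted, the last group reached being possibly split. The sketch below illustrates this under stated assumptions; the helper name `fill_by_whittle_order`, the `(class, mass)` encoding and all numbers ($p_1 = p_2 = 0.5$, $\alpha = 0.5$, $\gamma = 0.4$, $l(T) = 2$ and the $\alpha_1$ history) are hypothetical, and the masses are not required to satisfy (132), since only the ordering and the capacity matter for the fill. The group order follows the one listed above for the second case.

```python
from typing import Iterable, Tuple


def fill_by_whittle_order(groups: Iterable[Tuple[int, float]],
                          capacity: float) -> Tuple[float, float]:
    """Schedule `capacity` greedily over groups sorted by decreasing Whittle index.

    Each group is a (class_id, mass) pair with class_id in {1, 2}. The last
    group reached may be scheduled only partially (the role of gamma or beta).
    Returns the scheduled mass of class 1 and of class 2.
    """
    scheduled = {1: 0.0, 2: 0.0}
    remaining = capacity
    for cls, mass in groups:
        if remaining <= 0.0:
            break
        take = min(mass, remaining)
        scheduled[cls] += take
        remaining -= take
    return scheduled[1], scheduled[2]


# Hypothetical numbers (second case, beta = 0, l(T) = 2); not required to satisfy (132).
p1, p2, alpha, gamma = 0.5, 0.5, 0.5, 0.4
a1 = {"T": 0.2, "T-1": 0.3, "T-2": 0.25}            # alpha_1 at T, T-1, T-2
a2 = {key: alpha - val for key, val in a1.items()}  # alpha_2 = alpha - alpha_1

groups = [
    (1, (1 - p1) * a1["T"]), (2, (1 - p2) * a2["T"]),  # failed at T: always scheduled
    (2, gamma * p2 * a2["T-2"]),                       # class-2 users left idle at T
    (1, p1 * a1["T-1"]), (2, p2 * a2["T-1"]),          # state l(T) at T+1
    (1, p1 * a1["T"]), (2, p2 * a2["T"]),              # state 1 at T+1
]
alpha1_next, alpha2_next = fill_by_whittle_order(groups, alpha)
print(alpha1_next, alpha2_next)
```

After the first two groups, the remaining budget is $p_1\alpha_1(T) + p_2\alpha_2(T) = 0.25$; it covers the $\gamma$-group and the class-1 group at state $l(T)$ and splits the next class-2 group, which is sub-case 3, so `alpha1_next` should agree with (155) up to floating-point rounding.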
APPENDIX K
PROOF OF PROPOSITION […] $\epsilon \le \frac{(\overline{l}-\underline{l})(1-p_1)^L}{1-(1-p_1)^L}$.
Before tackling the proof, we give a brief insight into the procedure adopted to establish the desired result. We start by finding a time, denoted $T_1 \ge T_\epsilon$, at which $\alpha_1(T_1) \le \underline{l}$. Then, we show that $\alpha_1(T_1), \dots, \alpha_1(T_1+L)$ are strictly less than $\overline{l}$. To that end, we first define a sequence $u_n$, as a function of $\epsilon$, $\underline{l}$, $\overline{l}$ and $p_1$, for $n \in [0, L]$. After that, we prove that $u_n$ is increasing in $n$ and strictly less than $\overline{l}$. Next, we establish that $u_n$ is an upper bound of $\alpha_1(\cdot)$ on $[T_1, T_1+L]$; more precisely, we show that $\alpha_1(T_1+n) \le u_n$ for $n \in [0, L]$. For that purpose, we proceed in two steps. The first consists of deriving an inequality satisfied by two consecutive terms of the sequence $\alpha_1(\cdot)$, namely $\alpha_1(T)$ and $\alpha_1(T+1)$, using Proposition 12 and the fact that $T \ge T_\epsilon$. In the second step, we use this result to show by induction that $u_n$ is indeed an upper bound of $\alpha_1(T_1+n)$. Finally, based on these results, we show that there exists $T_d$ such that $\max A(T_d) < \overline{l}$.
To find a time $T_1 \ge T_\epsilon$ such that $\alpha_1(T_1) \le \underline{l}$, we use the fact that $\min A(t) \le \underline{l}$ for all $t$. At time $T_\epsilon + L$, we have the vector $A(T_\epsilon+L) = (\alpha_1(T_\epsilon+L), \alpha_1(T_\epsilon+L-1), \dots, \alpha_1(T_\epsilon+L-l(T_\epsilon+L)))$. Since $\min A(T_\epsilon+L) \le \underline{l}$, there exists an element of $A(T_\epsilon+L)$, denoted $\alpha_1(T_1)$, such that $\alpha_1(T_1) \le \underline{l}$. According to 10, $l(T) \le l_{\max} = L$ for all $T \ge T_0$; hence $l(T_\epsilon+L) \le L$. It follows that $T_1 \ge T_\epsilon + L - l(T_\epsilon+L) \ge T_\epsilon$. Therefore, we have found an element of the sequence $\alpha_1(\cdot)$ at a time $T_1 \ge T_\epsilon$ such that $\alpha_1(T_1) \le \underline{l}$.
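The comparison sequence of Definition 4 (below) is easy to explore numerically. The following sketch is illustrative only: `l_low` and `l_high` stand for the two levels $\underline{l} < \overline{l}$, and the values of $p_1$, $L$ and the levels are hypothetical. It picks $\epsilon$ just below the threshold in (187), iterates the recursion (184) from $u_0 = \underline{l}$, and checks the closed form (185), the monotonicity of $u_n$ and $u_L < \overline{l}$ (Lemma 8); it also shows that $u_L$ overshoots $\overline{l}$ once $\epsilon$ exceeds the threshold.

```python
from typing import List


def u_sequence(p1: float, l_low: float, l_high: float, eps: float, L: int) -> List[float]:
    """Return u_0, ..., u_L with u_0 = l_low and u_{n+1} = p1*(l_high+eps) + (1-p1)*u_n."""
    u = [l_low]
    for _ in range(L):
        u.append(p1 * (l_high + eps) + (1 - p1) * u[-1])
    return u


def eps_threshold(p1: float, l_low: float, l_high: float, L: int) -> float:
    """Right-hand side of (187): (l_high - l_low)(1-p1)^L / (1 - (1-p1)^L)."""
    q = (1 - p1) ** L
    return (l_high - l_low) * q / (1 - q)


# Hypothetical values, chosen only for illustration.
p1, L, l_low, l_high = 0.3, 5, 0.20, 0.26
eps = 0.99 * eps_threshold(p1, l_low, l_high, L)
u = u_sequence(p1, l_low, l_high, eps, L)

lam = -(eps + l_high - l_low)                         # lambda in (185)
closed = [lam * (1 - p1) ** n + l_high + eps for n in range(L + 1)]
print(max(abs(a - b) for a, b in zip(u, closed)))     # gap between recursion and (185)
print(u[-1] < l_high)                                 # Lemma 8: u_L < l_high

# Just above the threshold, u_L overshoots l_high.
u_bad = u_sequence(p1, l_low, l_high, 1.01 * eps_threshold(p1, l_low, l_high, L), L)
print(u_bad[-1] > l_high)
```

Since $\lambda < 0$, the sequence climbs geometrically towards $\overline{l} + \epsilon$; the threshold on $\epsilon$ is exactly what keeps it below $\overline{l}$ for the first $L$ steps.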
To that extent, we are interested in proving that $\alpha_1(T_1), \dots, \alpha_1(T_1+L)$ are strictly less than $\overline{l}$. To do so, we define a sequence $u_n$ that will constitute an upper bound of $\alpha_1(T)$.

Definition 4. We define the sequence $u_n$ by induction:
$$u_0 = \underline{l}, \qquad u_{n+1} = p_1(\overline{l}+\epsilon) + (1-p_1)u_n \quad \text{for } n \ge 0 \tag{184}$$
Next, we prove that the terms $u_0, \dots, u_L$ of this sequence are strictly less than $\overline{l}$. We detail this in the following.

Lemma 8. For $n \in [0, L]$, $u_n < \overline{l}$.

Proof. In fact, the sequence $u_n$ satisfies, for all $n$:
$$u_n = \lambda(1-p_1)^n + (\overline{l}+\epsilon) \tag{185}$$
where $\lambda = -(\epsilon + \overline{l} - \underline{l})$. Since $\lambda < 0$, $u_n$ is increasing in $n$; hence, for all $n \in [0, L]$:
$$u_n \le u_L = \lambda(1-p_1)^L + (\overline{l}+\epsilon) = \epsilon(1-(1-p_1)^L) + \overline{l} - (\overline{l}-\underline{l})(1-p_1)^L \tag{186}$$
We have that:
$$\epsilon < \frac{(\overline{l}-\underline{l})(1-p_1)^L}{1-(1-p_1)^L} \tag{187}$$
Given that $1-(1-p_1)^L > 0$, then:
$$(1-(1-p_1)^L)\epsilon < (\overline{l}-\underline{l})(1-p_1)^L \tag{188}$$
$$(1-(1-p_1)^L)\epsilon + \overline{l} - (\overline{l}-\underline{l})(1-p_1)^L < \overline{l} \tag{189}$$
Therefore, $u_L < \overline{l}$. ∎

Based on the lemma above, we prove that every element of the set $\{\alpha_1(T_1), \dots, \alpha_1(T_1+L)\}$ is at most $u_L$. For that, we introduce a useful lemma.

Lemma 9. If, for $T \in [T_1, T_1+L-1]$, we have that:
$$\alpha_1(T) \le \alpha_1(T+1) \tag{190}$$
then:
$$\alpha_1(T+1) \le p_1(\overline{l}+\epsilon) + (1-p_1)\alpha_1(T) \tag{191}$$

Proof. Before starting the proof, we recall that, according to the first result of Proposition 12, the four possible inequalities satisfied by $\alpha_1(T)$, $\alpha_1(T+1)$, $\alpha_1(T-l(T))$ and $\alpha_1(T-l(T)+1)$ are:
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)) \tag{192}$$
$$\alpha_1(T-l(T)) \le \alpha_1(T+1) \le \alpha_1(T) \tag{193}$$
$$\alpha_1(T-l(T)+1) \le \alpha_1(T+1) \le \alpha_1(T) \tag{194}$$
$$\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)+1) \tag{195}$$
Therefore, the two cases in which $\alpha_1(T) \le \alpha_1(T+1)$ are:
• $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T))$;
• $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)+1)$.
Hence, according to the results of Proposition 12, the inequalities satisfied by $\alpha_1(T+1) - \alpha_1(T)$ are as follows. If $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T))$, then:
$$\alpha_1(T+1) - \alpha_1(T) \le p_1(\alpha_1(T-l(T)) - \alpha_1(T)) \tag{196}$$
If $\alpha_1(T) \le \alpha_1(T+1) \le \alpha_1(T-l(T)+1)$, then:
$$\alpha_1(T+1) - \alpha_1(T) \le p_1(\alpha_1(T-l(T)+1) - \alpha_1(T)) \tag{197}$$
Since, by assumption of the lemma, $T \ge T_1 \ge T_\epsilon$, we have $\max A(T) \le \overline{l}+\epsilon$. As a consequence, $\alpha_1(T-l(T)+1)$ and $\alpha_1(T-l(T))$, which are elements of the vector $A(T)$, are at most $\overline{l}+\epsilon$. Hence, for $T \in [T_1, T_1+L-1]$:
$$\alpha_1(T+1) - \alpha_1(T) \le p_1(\overline{l}+\epsilon - \alpha_1(T)) \tag{198}$$
Therefore:
$$\alpha_1(T+1) \le p_1(\overline{l}+\epsilon) + (1-p_1)\alpha_1(T) \tag{199}$$
∎
Now we prove that, for every possible trajectory of $\alpha_1$ on $[T_1, T_1+L]$, its values cannot exceed $\lambda(1-p_1)^L + (\overline{l}+\epsilon) = u_L$.

Lemma 10. For every trajectory of $\alpha_1$ and every $T \in [T_1, T_1+L]$, $\alpha_1(T) \le u_{T-T_1}$.

Proof. We prove this result by induction. For $T = T_1$, we have that:
$$\alpha_1(T_1) \le \underline{l} = u_0 \tag{200}$$
Suppose that at time $T$, $\alpha_1(T) \le u_{T-T_1}$. Then, at time $T+1$:
If $\alpha_1(T+1) \le \alpha_1(T)$, then, as $u_n$ is increasing in $n$:
$$\alpha_1(T+1) \le u_{T-T_1} \le u_{T-T_1+1} \tag{201}$$
If $\alpha_1(T+1) \ge \alpha_1(T)$, then, according to Lemma 9:
$$\alpha_1(T+1) \le p_1(\overline{l}+\epsilon) + (1-p_1)\alpha_1(T) \tag{202}$$
$$\le p_1(\overline{l}+\epsilon) + (1-p_1)u_{T-T_1} \tag{203}$$
$$= u_{T-T_1+1} \tag{204}$$
Therefore, $\alpha_1(T+1) \le u_{T-T_1+1}$. Hence, by induction, $\alpha_1(T) \le u_{T-T_1}$ for all $T \in [T_1, T_1+L]$. ∎

As $u_{T-T_1} \le u_L$ for $T \in [T_1, T_1+L]$, Lemma 10 implies that the elements $\alpha_1(T_1+1), \dots, \alpha_1(T_1+L)$ are at most $u_L < \overline{l}$. Thus, we have found $T_1 \ge T_\epsilon$ such that $\alpha_1(T_1), \alpha_1(T_1+1), \dots, \alpha_1(T_1+l_{\max})$ are strictly less than $\overline{l}$. We denote $T_1 + l_{\max}$ by $T_d$ and verify that $\max A(T_d) < \overline{l}$. Indeed, we know that $T_d - l(T_d) \ge T_d - l_{\max} = T_1$, so the elements of the vector $A(T_d)$ are included in the set $\{\alpha_1(T_1), \alpha_1(T_1+1), \dots, \alpha_1(T_1+l_{\max})\}$. That is, $\max A(T_d) < \overline{l}$. Hence, we have found $T_d \ge T_\epsilon$ such that $\max A(T_d) < \overline{l}$.

APPENDIX L
PROOF OF PROPOSITION […] $i$ in class $k$, $z^k_i(t)$ converges. To that end, we first specify the eventual limit of $z^k_i(t)$ for each $i$. To do so, we decompose $1-\alpha$ as follows:
$$l(p_1\alpha_1^* + p_2\alpha_2^*) + \gamma p_2\alpha_2^* + \beta p_1\alpha_1^* = 1-\alpha \tag{205}$$
where $l$ is the largest integer such that $l(p_1\alpha_1^* + p_2\alpha_2^*) < 1-\alpha$, and either $0 < \gamma \le 1$ and $\beta = 0$, or $\gamma = 1$ and $0 < \beta \le 1$. Then, we proceed with the following steps:
• We prove by induction that, for all states $1 \le i \le l+1$, $z^k_i(t)$ converges to $p_k\alpha_k^*$.
• Based on the findings of the first step, we prove that $z^1_{l+2}(t)$ converges to $(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_{l+2}(t)$ converges to $(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$.
• Finally, we show that, for all states $i > l+2$, $z^1_i(t)$ converges to $(1-p_1)^{i-l-2}(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_i(t)$ converges to $(1-p_2)^{i-l-2}(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$.
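The decomposition (205) and the limits listed in the three steps above can be evaluated directly. The sketch below is illustrative: the helper names and the values of $p_k$ and $\alpha_k^*$ are hypothetical (in the paper $\alpha_k^*$ comes from the relaxed problem), chosen with $\alpha_1^* + \alpha_2^* = \alpha$. Following the ordering used in Appendix J, it assumes that the idle mass left at state $l+1$ is assigned first to class 2 ($0 < \gamma \le 1$, $\beta = 0$) and only then to class 1 ($\gamma = 1$, $0 < \beta \le 1$). It returns the limiting proportions for states $1$ to `i_max` and checks that the idle mass equals $1-\alpha$.

```python
from typing import Dict, List, Tuple


def decompose(one_minus_alpha: float, m1: float, m2: float) -> Tuple[int, float, float]:
    """Return (l, gamma, beta) of (205), where m1 = p1*alpha1* and m2 = p2*alpha2*."""
    s = m1 + m2
    l = 0
    while (l + 1) * s < one_minus_alpha:  # l: largest integer with l*s < 1 - alpha
        l += 1
    r = one_minus_alpha - l * s             # idle mass left at state l + 1, 0 < r <= s
    if r <= m2:
        return l, r / m2, 0.0               # 0 < gamma <= 1 and beta = 0
    return l, 1.0, (r - m2) / m1            # gamma = 1 and 0 < beta <= 1


def limit_profile(alpha: float, p: Dict[int, float], a_star: Dict[int, float],
                  i_max: int) -> Dict[int, List[float]]:
    """Limits z^{k,*}_i for states i = 1..i_max, following steps 1) to 3)."""
    l, gamma, beta = decompose(1 - alpha, p[1] * a_star[1], p[2] * a_star[2])
    idle = {1: beta, 2: gamma}              # idle fraction of each class at state l + 1
    z = {}
    for k in (1, 2):
        base = p[k] * a_star[k]
        head = (idle[k] + (1 - p[k]) * (1 - idle[k])) * base   # limit at state l + 2
        z[k] = [base if i <= l + 1 else (1 - p[k]) ** (i - l - 2) * head
                for i in range(1, i_max + 1)]
    return z


# Hypothetical values with alpha1* + alpha2* = alpha.
alpha, p, a_star = 0.3, {1: 0.6, 2: 0.3}, {1: 0.1, 2: 0.2}
l, gamma, beta = decompose(1 - alpha, p[1] * a_star[1], p[2] * a_star[2])
z = limit_profile(alpha, p, a_star, i_max=200)
idle_mass = sum(z[1][:l]) + sum(z[2][:l]) + gamma * z[2][l] + beta * z[1][l]
print(l, gamma, beta, idle_mass, sum(z[1]) + sum(z[2]))
```

In this profile the scheduled mass of class $k$ is $(1-\beta_k)p_k\alpha_k^* + (\beta_k + (1-p_k)(1-\beta_k))\alpha_k^* = \alpha_k^*$ (with $\beta_1 = \beta$, $\beta_2 = \gamma$), which is why the total mass sums to one whenever $\alpha_1^* + \alpha_2^* = \alpha$.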

1) For all states $1 \le i \le l+1$, $z^k_i(t) \to p_k\alpha_k^*$. We prove this result by induction.
• For $i = 1$, we have that $z^k_1(t) = p_k\alpha_k(t-1)$. Therefore, $z^k_1(t)$ converges to $p_k\alpha_k^*$, as $\alpha_k(t)$ converges to $\alpha_k^*$.
• Suppose that, for some $j \le l$, $z^k_i(t)$ converges to $p_k\alpha_k^*$ for each $1 \le i \le j$; we show that $z^k_{j+1}(t)$ also converges to $p_k\alpha_k^*$. Given that $j \le l$, we have $j(p_1\alpha_1^* + p_2\alpha_2^*) < 1-\alpha$. Consider $0 < \epsilon \le 1-\alpha-j(p_1\alpha_1^* + p_2\alpha_2^*)$. Since $z^k_i(t)$ converges to $p_k\alpha_k^*$ for all $1 \le i \le j$, there exists $t_j$ such that, for $t \ge t_j$ and $1 \le i \le j$, $|z^k_i(t) - p_k\alpha_k^*| < \epsilon/(2j)$. Hence:
$$\sum_{i=1}^{j} |z^1_i(t) - p_1\alpha_1^*| + \sum_{i=1}^{j} |z^2_i(t) - p_2\alpha_2^*| < \epsilon$$
That is,
$$\sum_{i=1}^{j} z^1_i(t) + \sum_{i=1}^{j} z^2_i(t) < \epsilon + j(p_1\alpha_1^* + p_2\alpha_2^*)$$
As a consequence, for all $t \ge t_j$:
$$\sum_{i=1}^{j} z^1_i(t) + \sum_{i=1}^{j} z^2_i(t) < 1-\alpha$$
Thus, for all $t \ge t_j$, the action prescribed to the users' proportion $z^k_j(t)$ is the passive action. Then, for all $t \ge t_j$, $z^k_{j+1}(t+1) = z^k_j(t)$. Therefore, $z^k_{j+1}(t)$ converges to $p_k\alpha_k^*$.
Consequently, by induction, $z^k_i(t)$ converges to $p_k\alpha_k^*$ for all $1 \le i \le l+1$.

2) $z^1_{l+2}(t) \to (\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_{l+2}(t) \to (\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$. To avoid redundancy, we restrict ourselves to the case $0 < \gamma \le 1$ and $\beta = 0$, since the steps of the proof are exactly the same for both cases. We have that:
$$l(p_1\alpha_1^* + p_2\alpha_2^*) + \gamma p_2\alpha_2^* = 1-\alpha$$
As $\sum_{i=1}^{l} z^1_i(t) + \sum_{i=1}^{l} z^2_i(t)$ converges to $l(p_1\alpha_1^* + p_2\alpha_2^*)$, which is strictly less than $1-\alpha$, there exists $t_l$ such that, for all $t \ge t_l$:
$$\sum_{i=1}^{l} z^1_i(t) + \sum_{i=1}^{l} z^2_i(t) < 1-\alpha$$
Knowing that the order of the users' proportions according to the Whittle index value alternates between the two classes over the states $[1, l_{\max}+1]$, as established in 7, for every integer $b \in [1, l_{\max}]$ the set $\{z^k_i : k = 1, 2;\ 1 \le i \le b\}$ is the set of users with the lowest Whittle index values. Therefore, $\sum_{i=1}^{b} z^1_i(t) + \sum_{i=1}^{b} z^2_i(t) < 1-\alpha$ implies that the action prescribed to the users in the set $\{z^k_i : k = 1, 2;\ 1 \le i \le b\}$ is the passive action. By definition of $l$, $l < (1-\alpha)/(p_k\alpha_k^*)$ for $k = 1, 2$; hence $l \le l_{\max}$ (see Lemma 3), and the above reasoning applies as well when $b = l$. As $\sum_{i=1}^{l+1} z^1_i(t) + \sum_{i=1}^{l+1} z^2_i(t)$ converges to $(l+1)(p_1\alpha_1^* + p_2\alpha_2^*)$, which is strictly greater than $1-\alpha$, there exists $t_{l+1}$ such that, for all $t \ge t_{l+1}$:
$$\sum_{i=1}^{l+1} z^1_i(t) + \sum_{i=1}^{l+1} z^2_i(t) > 1-\alpha$$
For $t \ge \max\{t_l, t_{l+1}\}$, we have that:
$$\sum_{i=1}^{l} z^1_i(t) + \sum_{i=1}^{l} z^2_i(t) < 1-\alpha < \sum_{i=1}^{l+1} z^1_i(t) + \sum_{i=1}^{l+1} z^2_i(t)$$
Denoting by $\beta(t)$ and $\gamma(t)$ the proportions of $z^1_{l+1}(t)$ and $z^2_{l+1}(t)$, respectively, that are not scheduled, the relations linking $z^1_{l+2}(t+1)$ and $z^2_{l+2}(t+1)$ to $z^1_{l+1}(t)$ and $z^2_{l+1}(t)$ when $t \ge \max\{t_l, t_{l+1}\}$ are:
$$z^1_{l+2}(t+1) = \beta(t)z^1_{l+1}(t) + (1-p_1)(1-\beta(t))z^1_{l+1}(t)$$
$$z^2_{l+2}(t+1) = \gamma(t)z^2_{l+1}(t) + (1-p_2)(1-\gamma(t))z^2_{l+1}(t)$$
with $0 < \gamma(t) \le 1$ and $\beta(t) = 0$, or $\gamma(t) = 1$ and $0 < \beta(t) \le 1$. We now show that $\beta(t)$ tends to $\beta = 0$ and $\gamma(t)$ tends to $\gamma$. For that purpose, note that the following equation always holds when $t \ge \max\{t_l, t_{l+1}\}$:
$$\sum_{i=1}^{l} z^1_i(t) + \sum_{i=1}^{l} z^2_i(t) + \gamma(t)z^2_{l+1}(t) + \beta(t)z^1_{l+1}(t) = 1-\alpha \tag{206}$$
Letting $t \to +\infty$ in (206), we obtain:
$$\lim_{t\to+\infty}[\gamma(t)z^2_{l+1}(t) + \beta(t)z^1_{l+1}(t)] = \gamma p_2\alpha_2^*$$
Consider the set $\{t : \beta(t) \neq 0\}$. If this set is infinite, there exists a strictly increasing function $n(\cdot)$ from $\mathbb{N}$ to $\{t \in \mathbb{N} : \beta(t) \neq 0\}$, so that $\beta(n(t))$ is a subsequence of $\beta(t)$. As $\beta(n(t)) \neq 0$, we have $\gamma(n(t)) = 1$. Therefore:
$$\lim_{t\to+\infty}[z^2_{l+1}(n(t)) + \beta(n(t))z^1_{l+1}(n(t))] = \gamma p_2\alpha_2^*$$
Since $z^2_{l+1}(n(t))$ converges to $p_2\alpha_2^*$:
$$\lim_{t\to+\infty}\beta(n(t))z^1_{l+1}(n(t)) = (\gamma-1)p_2\alpha_2^*$$
Now, $(\gamma-1)p_2\alpha_2^* \le 0$, while $\beta(n(t))z^1_{l+1}(n(t)) \ge 0$ for all $t$. Thus:
$$\lim_{t\to+\infty}\beta(n(t))z^1_{l+1}(n(t)) = (\gamma-1)p_2\alpha_2^* = 0$$
This implies that $\gamma = 1 = \gamma(n(t))$, and that $\lim_{t\to+\infty}\beta(n(t)) = 0$, because $z^1_{l+1}(n(t))$ converges to $p_1\alpha_1^* \neq 0$. Since $\beta(t) = 0$ outside this subsequence, $\lim_{t\to+\infty}\beta(t) = 0 = \beta$ and, consequently, $\lim_{t\to+\infty}\gamma(t) = \gamma = 1$.
If $\{t : \beta(t) \neq 0\}$ is finite, there exists $t_e$ such that $\beta(t) = 0$ for all $t \ge t_e$. Therefore:
$$\lim_{t\to+\infty}\gamma(t)z^2_{l+1}(t) = \gamma p_2\alpha_2^*$$
Since $z^2_{l+1}(t)$ converges to $p_2\alpha_2^* \neq 0$, this means that $\lim_{t\to+\infty}\gamma(t) = \gamma$, while $\lim_{t\to+\infty}\beta(t) = 0$. Hence, in both cases, $\beta(t) \to \beta = 0$ and $\gamma(t) \to \gamma$.
Consequently, combining this result with the one derived in the first step, we conclude that $z^1_{l+2}(t)$ converges to $(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_{l+2}(t)$ converges to $(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$. A similar analysis yields the same result when $\gamma(t) = 1$ and $0 < \beta(t) \le 1$.

3) For $i > l+2$, $z^1_i(t) \to (1-p_1)^{i-l-2}(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_i(t) \to (1-p_2)^{i-l-2}(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$. For $t \ge \max\{t_l, t_{l+1}\}$, the action prescribed to $z^k_i(t)$ is the active action for all $i \ge l+2$. As a consequence, $z^k_{i+1}(t+1)$ satisfies:
$$z^k_{i+1}(t+1) = (1-p_k)z^k_i(t)$$
Therefore, as $z^1_{l+2}(t)$ converges to $(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_{l+2}(t)$ converges to $(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$, one can easily establish by induction that $z^1_i(t)$ converges to $(1-p_1)^{i-l-2}(\beta + (1-p_1)(1-\beta))p_1\alpha_1^*$ and $z^2_i(t)$ converges to $(1-p_2)^{i-l-2}(\gamma + (1-p_2)(1-\gamma))p_2\alpha_2^*$ for all $i > l+2$.
We conclude that $z^k_i(t)$ converges for all states $i$ and $k = 1, 2$. On the other hand, according to Proposition 6, the only possible limit of $z(t)$ is $z^*$. As a consequence, for each $k$ and $i$, $z^k_i(t)$ converges to $z^{k,*}_i$.

APPENDIX M
PROOF OF PROPOSITION […] $z$, let $m_1(z)$ and $m_2(z)$ be the highest states of class 1 and class 2, respectively, and let $l_1(z)$ and $l_2(z)$ be the thresholds of class 1 and class 2, respectively, at time $t$ when $Z^N(t) = z$. Given that, we introduce the following lemma.

Lemma 11.

For any $\mu > 0$, there exists a positive constant $C(z)$ such that:
$$P(\|Z^N(t+1) - z'\| \ge \mu \mid Z^N(t) = z) \le \frac{C(z)}{N} \tag{207}$$
where $C(z)$ is independent of $N$ and $z' = Q(z)z = \mathbb{E}(Z^N(t+1) \mid Z^N(t) = z)$.

Proof. By definition of $m_1(z)$ and $m_2(z)$, we have $z = (z^1_1, \dots, z^1_{m_1(z)}, z^2_1, \dots, z^2_{m_2(z)})$. One can easily show that $m_1(z') = m_1(z)+1$ and $m_2(z') = m_2(z)+1$, since the users' proportions at states $m_1(z)$ and $m_2(z)$ in class 1 and class 2 move to states $m_1(z)+1$ and $m_2(z)+1$, respectively, at the next time slot. To prove this lemma, we use the Chebyshev inequality:
$$P(|X - \mathbb{E}(X)| > \mu) \le \frac{\mathrm{Var}(X)}{\mu^2} \tag{208}$$
for any $\mu > 0$ and any random variable $X$.
As $z' = \mathbb{E}(Z^N(t+1) \mid Z^N(t) = z)$, we can apply the Chebyshev inequality. However, we need the distribution of $Z^N(t+1)$ given $Z^N(t) = z$ in order to derive $\mathrm{Var}(Z^N(t+1) \mid Z^N(t) = z)$. Since one-dimensional random variables are simpler to study than multi-dimensional ones, instead of investigating $Z^N(t+1)$ directly, we look into its components $Z^{N,k}_i$. In this regard, we have that:
$$\{\|Z^N(t+1) - z'\| \ge \mu\} \subset \bigcup_{k,i}\Big\{|Z^{N,k}_i(t+1) - z'^k_i|\, i > \frac{\mu}{m_1(z') + m_2(z')}\Big\} \tag{209}$$
Therefore:
$$P(\|Z^N(t+1) - z'\| \ge \mu \mid Z^N(t) = z) \le P\Big(\bigcup_{k,i}\Big\{|Z^{N,k}_i(t+1) - z'^k_i|\, i > \frac{\mu}{m_1(z')+m_2(z')}\Big\} \,\Big|\, Z^N(t) = z\Big) \tag{210}$$
$$\le \sum_{k,i} P\Big(|Z^{N,k}_i(t+1) - z'^k_i|\, i > \frac{\mu}{m_1(z')+m_2(z')} \,\Big|\, Z^N(t) = z\Big) \tag{211}$$
Now, we look for the distribution of $Z^{N,k}_i(t+1)$ given $Z^N(t) = z$.
For $2 \le i \le l_k(z)$, all the users at state $i-1 < l_k(z)$ move to state $i$ at the next time slot, so $Z^{N,k}_i(t+1) = z^k_{i-1} = z'^k_i$. This implies that:
$$P\Big(|Z^{N,k}_i(t+1) - z'^k_i|\, i > \frac{\mu}{m_1(z')+m_2(z')} \,\Big|\, Z^N(t) = z\Big) = 0 \tag{212}$$
For $i = 1$, defining $\alpha_1(z)$ and $\alpha_2(z)$ as the proportions of scheduled users in class 1 and class 2, respectively, when $Z^N(t) = z$, the variable $NZ^{N,k}_1(t+1) \mid Z^N(t) = z$ follows a binomial distribution with parameters $p_k$ and $\alpha_k(z)N$. Therefore, $\mathrm{Var}(NZ^{N,k}_1(t+1) \mid Z^N(t) = z) = p_k(1-p_k)\alpha_k(z)N$, which means that $\mathrm{Var}(Z^{N,k}_1(t+1) \mid Z^N(t) = z) = p_k(1-p_k)\alpha_k(z)/N$. As a result, according to the Chebyshev inequality:
$$P\Big(|Z^{N,k}_1(t+1) - z'^k_1| > \frac{\mu}{m_1(z')+m_2(z')} \,\Big|\, Z^N(t) = z\Big) \le \frac{p_k(1-p_k)\alpha_k(z)(m_1(z')+m_2(z'))^2}{N\mu^2} \tag{213}$$
For $i \ge l_k(z)+2$, $NZ^{N,k}_i(t+1) \mid Z^N(t) = z$ follows a binomial distribution with parameters $1-p_k$ and $z^k_{i-1}N$. Hence, $\mathrm{Var}(Z^{N,k}_i(t+1) \mid Z^N(t) = z) = p_k(1-p_k)z^k_{i-1}/N$. Thus:
$$P\Big(|Z^{N,k}_i(t+1) - z'^k_i| > \frac{\mu}{i(m_1(z')+m_2(z'))} \,\Big|\, Z^N(t) = z\Big) \le \frac{p_k(1-p_k)z^k_{i-1}(m_1(z')+m_2(z'))^2 i^2}{N\mu^2} \tag{214}$$
Denoting by $\beta_k(z)$ the proportion of $z^k_{l_k(z)}$ that is not scheduled, for $i = l_k(z)+1$ we have $NZ^{N,k}_i(t+1) \mid (Z^N(t) = z) = \beta_k(z)Nz^k_{i-1} + X$, where $X$ follows a binomial distribution with parameters $1-p_k$ and $(1-\beta_k(z))z^k_{i-1}N$. Then:
$$P\Big(|Z^{N,k}_i(t+1) - z'^k_i| > \frac{\mu}{i(m_1(z')+m_2(z'))} \,\Big|\, Z^N(t) = z\Big) \le \frac{p_k(1-p_k)(1-\beta_k(z))z^k_{i-1}(m_1(z')+m_2(z'))^2 i^2}{N\mu^2} \tag{215}$$
We end up with:
$$P(\|Z^N(t+1) - z'\| \ge \mu \mid Z^N(t) = z) \le \frac{(m_1(z')+m_2(z'))^2}{N\mu^2}\Big[\sum_{k=1}^{2} p_k(1-p_k)\alpha_k(z) + \sum_{k=1}^{2}\sum_{i \ge l_k(z)+2} p_k(1-p_k)\, i^2 z^k_{i-1} + \sum_{k=1}^{2} p_k(1-p_k)(l_k(z)+1)^2(1-\beta_k(z))z^k_{l_k(z)}\Big]$$
Knowing that $\alpha_k(z) \le 1$, $\sum_{i \ge l_k(z)} z^k_i \le 1$, $1-\beta_k(z) \le 1$, and $i \le m_1(z') + m_2(z')$ for every state $i$ in the vector $z'$, we get:
$$P(\|Z^N(t+1) - z'\| \ge \mu \mid Z^N(t) = z) \le \frac{(m_1(z')+m_2(z'))^4}{\mu^2 N}[2p_1(1-p_1) + 2p_2(1-p_2)]$$
Hence, denoting $C(z) = \frac{(m_1(z')+m_2(z'))^4}{\mu^2}[2p_1(1-p_1)+2p_2(1-p_2)] = \frac{(m_1(z)+1+m_2(z)+1)^4}{\mu^2}[2p_1(1-p_1)+2p_2(1-p_2)]$, we obtain:
$$P(\|Z^N(t+1) - z'\| \ge \mu \mid Z^N(t) = z) \le \frac{C(z)}{N} \tag{216}$$
∎
Now, we give a lemma that bounds this probability given the initial state $z(0) = x$. One can easily verify by induction that $m_1(z(t)) = m_1(x)+t$ and $m_2(z(t)) = m_2(x)+t$. For simplicity, we write $m_k(t) = m_k(z(t))$ for $k = 1, 2$.

Lemma 12.

For any $\mu > 0$, there exists a positive constant $C(t+1)$ such that:
$$P_x(\|Z^N(t+1) - z(t+1)\| \ge \mu) \le \frac{C(t+1)}{N} \tag{217}$$
where $C(t+1)$ is independent of $N$.

Proof. We recall from Lemma 11 that, for any $\mu > 0$, there exists a constant $C(z)$ independent of $N$ such that:
$$P(\|Z^N(t+1) - Q(z)z\| \ge \mu \mid Z^N(t) = z) \le \frac{C(z)}{N} \tag{218}$$
Before proving the present lemma, we give an important lemma that will help us in the subsequent analysis.

Lemma 13. For any proportion vector $z$, there exists $\sigma > 0$ such that, if $\|Z^N(t) - z\| \le \sigma$, then $Q(Z^N(t)) = Q(z)$.

Proof. One can deduce from the analysis done in [21, Section IV-C] that there exists $\sigma > 0$ such that, if $Z^N(t) \in \Omega_\sigma(z)$, $Q(Z^N(t))$ is constant and does not depend on $Z^N(t)$. Therefore, there exists $\sigma > 0$ such that $Q(Z^N(t)) = Q(z)$. That concludes the proof. ∎

Corollary 1. For any $v > 0$, there exists $\rho > 0$ such that $\|Z^N(t) - z(t)\| \le \rho \Rightarrow \|Q(Z^N(t))Z^N(t) - Q(z(t))z(t)\| \le v$.

Proof. According to Lemma 13, if $\|Z^N(t) - z(t)\| \le \sigma$, then $Q(Z^N(t)) = Q(z(t))$. This implies that $\|Q(Z^N(t))Z^N(t) - Q(z(t))z(t)\| = \|Q(z(t))(Z^N(t) - z(t))\| \le \|Q(z(t))\|\,\|Z^N(t) - z(t)\|$. That is, choosing $\rho = \min\{v/\|Q(z(t))\|, \sigma\}$, we get $\|Q(Z^N(t))Z^N(t) - Q(z(t))z(t)\| \le v$. ∎

With the above corollary laid out, we prove the statement of Lemma 12 by mathematical induction. For the first time slot, applying Lemma 11 with $z(1) = Q(x)x$, the following holds:
$$\Pr_x(\|Z^N(1) - z(1)\| \ge \mu) = P(\|Z^N(1) - Q(x)x\| \ge \mu \mid Z^N(0) = x) \le \frac{C(x)}{N} = \frac{C(1)}{N} \tag{219}$$
and the desired result holds for $t = 1$ by simply choosing $C(1) = \frac{(m_1(x)+1+m_2(x)+1)^4}{\mu^2}[2p_1(1-p_1) + 2p_2(1-p_2)]$.
Suppose now that the statement holds for some $t \ge 1$; we investigate the property for $t+1$. To that end, consider $0 < \nu < \mu$. According to Corollary 1, there exists $\rho$ such that:
$$\|Z^N(t) - z(t)\| \le \rho \Rightarrow \|Q(Z^N(t))Z^N(t) - Q(z(t))z(t)\| \le \nu \tag{220}$$
Bearing that in mind, we have:
$$\begin{aligned}\Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu) &= \Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| \ge \rho)\,\Pr_x(\|Z^N(t) - z(t)\| \ge \rho)\\ &\quad + \Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho)\,\Pr_x(\|Z^N(t) - z(t)\| < \rho)\\ &\overset{(a)}{\le} \frac{C'(t)}{N} + \Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho)\end{aligned} \tag{221}$$
where (a) follows from the fact that the conditional probability in the first term and the probability in the second term are at most 1, and from the induction hypothesis applied at time $t$ with threshold $\rho$, $C'(t)$ being the corresponding constant.
Next, we tackle the second term of the inequality in (221):
$$\begin{aligned}&\Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho)\\ &= \Pr_x(\|Z^N(t+1) - Q(Z^N(t))Z^N(t) + Q(Z^N(t))Z^N(t) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho)\\ &\overset{(a)}{\le} \Pr_x(\|Z^N(t+1) - Q(Z^N(t))Z^N(t)\| + \|Q(Z^N(t))Z^N(t) - Q(z(t))z(t)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho)\\ &\overset{(b)}{\le} \Pr_x(\|Z^N(t+1) - Q(Z^N(t))Z^N(t)\| \ge \mu - \nu \mid \|Z^N(t) - z(t)\| < \rho)\\ &= \sum_{\substack{z \in \Omega_\rho(z(t))\\ m_k(z) \le m_k(z(t)),\ k=1,2}} \Pr_x(Z^N(t) = z \mid Z^N(t) \in \Omega_\rho(z(t)))\,\Pr_x(\|Z^N(t+1) - Q(z)z\| \ge \mu - \nu \mid Z^N(t) = z)\\ &\quad + \sum_{\substack{z \in \Omega_\rho(z(t))\\ m_1(z) > m_1(z(t)) \text{ or } m_2(z) > m_2(z(t))}} \Pr_x(Z^N(t) = z \mid Z^N(t) \in \Omega_\rho(z(t)))\,\Pr_x(\|Z^N(t+1) - Q(z)z\| \ge \mu - \nu \mid Z^N(t) = z)\end{aligned} \tag{222}$$
where (a) and (b) follow from the triangle inequality, the identity $z(t+1) = Q(z(t))z(t)$ and relation (220). One can notice that, at any time slot $t$, $m_k(Z^N(t)) \le m_k(z(t))$. In light of that fact, the second term of (222) is equal to 0. Bearing that in mind, for $z \in \Omega_\rho(z(t))$ such that $m_k(z) \le m_k(z(t))$, Lemma 11 applied with threshold $\mu - \nu$ gives:
$$\Pr_x(\|Z^N(t+1) - Q(z)z\| \ge \mu - \nu \mid Z^N(t) = z) \le \frac{C(t)}{N} \tag{223}$$
where $C(t) = \frac{(m_1(z(t))+m_2(z(t))+2)^4}{(\mu-\nu)^2}[2p_1(1-p_1) + 2p_2(1-p_2)] = \frac{(m_1(t)+m_2(t)+2)^4}{(\mu-\nu)^2}[2p_1(1-p_1) + 2p_2(1-p_2)]$, since the constant of Lemma 11 is nondecreasing in $m_1(z)+m_2(z)$. By substituting the above results into (222), we get:
$$\Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu \mid \|Z^N(t) - z(t)\| < \rho) \le \frac{C(t)}{N} \tag{224}$$
Combining this with (221), we conclude that there exists a constant $C(t+1)$, for instance $C'(t) + C(t)$, such that:
$$\Pr_x(\|Z^N(t+1) - z(t+1)\| \ge \mu) \le \frac{C(t+1)}{N} \tag{225}$$
which concludes our inductive proof. ∎
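The $1/N$ rate in Lemmas 11 and 12 rests entirely on the variance of binomial proportions. The Monte Carlo sketch below (standard library only; the parameter values, seed and number of trials are hypothetical) checks the single-coordinate bound behind (213): with $NZ \sim \mathrm{Binomial}(\alpha_k N, p_k)$, the Chebyshev inequality gives $P(|Z - p_k\alpha_k| > \delta) \le p_k(1-p_k)\alpha_k/(N\delta^2)$, where $\delta$ plays the role of $\mu/(m_1(z')+m_2(z'))$. Chebyshev is loose, so the empirical frequencies sit well below the bound; the point is only that both shrink as $N$ grows.

```python
import random
from typing import List, Tuple


def tail_frequency(n_users: int, p: float, alpha_k: float, delta: float,
                   trials: int, rng: random.Random) -> float:
    """Empirical P(|Z - p*alpha_k| > delta), where N*Z ~ Binomial(alpha_k*N, p)."""
    scheduled = round(alpha_k * n_users)           # scheduled users of class k
    mean = p * scheduled / n_users                 # E[Z] = p * alpha_k
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(scheduled))
        if abs(successes / n_users - mean) > delta:
            hits += 1
    return hits / trials


def chebyshev_bound(n_users: int, p: float, alpha_k: float, delta: float) -> float:
    """Var(Z) / delta^2 = p(1 - p) alpha_k / (N delta^2)."""
    return p * (1 - p) * alpha_k / (n_users * delta ** 2)


rng = random.Random(0)
p, alpha_k, delta, trials = 0.4, 0.5, 0.03, 1000
results: List[Tuple[int, float, float]] = []
for n_users in (100, 400, 1600):
    freq = tail_frequency(n_users, p, alpha_k, delta, trials, rng)
    results.append((n_users, freq, chebyshev_bound(n_users, p, alpha_k, delta)))
for n_users, freq, bound in results:
    print(f"N={n_users:5d}  empirical={freq:.3f}  Chebyshev bound={bound:.3f}")
```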
Knowing that: P_x(sup […] $i$:
$$z^k_i(t) = \begin{cases} 0 & \text{if } t_0 \le t < t_0 + i - i_0,\\ p_k^{\,i-i_0}\, z^k_{i_0}(t-(i-i_0)) & \text{if } t \ge t_0 + i - i_0, \end{cases} \tag{226}$$
Based on the above equation, for each $i > i_0$, $z^k_i(t)$ is at most $p_k^{\,i-i_0}$ for all $t \ge t_0$. To that extent, we investigate the evolution of the series of interest only for $t \ge t_0$ (the interchange of limit and sum is unaffected, since $t_0 < +\infty$). Moreover, for all $t \ge t_0$:
$$\sum_i |z^k_i(t) - z^{k,*}_i|\, i = \sum_{i=1}^{i_0} |z^k_i(t) - z^{k,*}_i|\, i + \sum_{i=i_0+1}^{+\infty} |z^k_i(t) - z^{k,*}_i|\, i \le i_0^2 + \sum_{i=i_0+1}^{+\infty}(p_k^{\,i-i_0}\, i + z^{k,*}_i\, i)$$
This last sum is finite, since $\sum_{i=1}^{+\infty} z^{k,*}_i i$ is the optimal average age of the relaxed problem for class $k$, which is finite, and $\sum_{i=1}^{+\infty} p^i i$ is finite for any $0 \le p < 1$. Hence, uniform convergence follows.
• Existence of the limit of $f_i(t) = |z^k_i(t) - z^{k,*}_i|\, i$: according to Proposition 14, $\lim_{t\to+\infty} |z^k_i(t) - z^{k,*}_i|\, i = 0$, which is finite. Therefore, the second condition is satisfied.
Leveraging these findings, we can interchange the limit and the sum:
$$\lim_{t\to+\infty}\sum_{i=1}^{+\infty} |z^k_i(t) - z^{k,*}_i|\, i = \sum_{i=1}^{+\infty}\lim_{t\to+\infty} |z^k_i(t) - z^{k,*}_i|\, i = 0$$
In other words, for $k = 1, 2$, $\sum_{i=1}^{+\infty} |z^k_i(t) - z^{k,*}_i|\, i$ tends to 0 as $t$ grows.
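For completeness, the tail that dominates the series above can be summed in closed form. Writing $j = i - i_0$ and using $\sum_{j\ge1} p^j = p/(1-p)$ and $\sum_{j\ge1} j\,p^j = p/(1-p)^2$, valid for $0 \le p < 1$:
$$\sum_{i=i_0+1}^{+\infty} p^{\,i-i_0}\, i = \sum_{j=1}^{+\infty} p^{j}(j + i_0) = \frac{p}{(1-p)^2} + \frac{i_0\, p}{1-p} < +\infty,$$
so the bound is finite for every fixed $i_0$ and does not depend on $t$.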
Consequently, $z(t)$ converges to $z^*$ with respect to our defined norm. Therefore, for $0 < \nu < \mu$, there exists $T_0$ such that, for any $t \ge T_0$:
$$\|z(t) - z^*\| \le \nu \tag{227}$$
By leveraging Proposition 15, we have: Pr_x(sup_{T_0 ≤ t} […], then:
$$\lim_{N\to\infty}\Big|\frac{1}{T}\mathbb{E}^{wi}\Big[\sum_{t=0}^{T-1}\sum_{k=1}^{K}\sum_{i=1}^{+\infty} Z^{k,N}_i(t)\, i \,\Big|\, Z^N(0) = x\Big] - \sum_{k=1}^{K}\sum_{i=1}^{+\infty} z^{k,*}_i\, i\Big| \le \frac{T_0(m(T_0) + C_{RP})}{T} \tag{242}$$
Finally, we have:
$$\lim_{T\to\infty}\lim_{N\to\infty}\Big|\frac{1}{T}\mathbb{E}^{wi}\Big[\sum_{t=0}^{T-1}\sum_{k=1}^{K}\sum_{i=1}^{+\infty} Z^{k,N}_i(t)\, i \,\Big|\, Z^N(0) = x\Big] - \sum_{k=1}^{K}\sum_{i=1}^{+\infty} z^{k,*}_i\, i\Big| = 0 \tag{243}$$
As a consequence:
$$\lim_{T\to+\infty}\lim_{N\to\infty}\frac{1}{T}\mathbb{E}^{wi}\Big[\sum_{t=0}^{T-1}\sum_{k=1}^{K}\sum_{i=1}^{+\infty} Z^{k,N}_i(t)\, i \,\Big|\, Z^N(0) = x\Big] = \sum_{k=1}^{K}\sum_{i=1}^{+\infty} z^{k,*}_i\, i$$