Minimizing the Age of Incorrect Information for Real-time Tracking of Markov Remote Sources
Saad Kriouile and Mohamad Assaad
Laboratoire des Signaux et Systèmes, CentraleSupélec, Université Paris-Saclay, 91192 Gif-sur-Yvette, France
Abstract—The Age of Incorrect Information (AoII) has been introduced to address the shortcomings of the standard Age of Information (AoI) metric in real-time monitoring applications. In this paper, we consider the problem of monitoring the states of remote sources that evolve according to a Markovian process. A central scheduler selects at each time slot which sources should send their updates in such a way as to minimize the Mean Age of Incorrect Information (MAoII). The difficulty of the problem lies in the fact that the scheduler cannot know the states of the sources before receiving the updates, and it must therefore optimally balance the exploitation-exploration trade-off. We show that the problem can be modeled within the framework of Partially Observable Markov Decision Processes. We develop a new scheduling scheme based on Whittle's index policy. The scheduling decision is made by updating a belief value on the states of the sources, which, to the best of our knowledge, has not been considered before in the Age of Information area. To that end, we proceed by using the Lagrangian relaxation approach and prove that the dual problem admits an optimal threshold policy. Building on that, we show that the problem is indexable and compute the expressions of the Whittle's indices. Finally, we provide numerical results to highlight the performance of our derived policy compared to the classical AoI metric.
I. INTRODUCTION
The notable advances in wireless technology and the availability of low-cost hardware have led to the emergence of real-time monitoring services. In these systems, the monitor needs to know the status of one or multiple processes observed by remote sources. Specifically, the sources send packets that contain information about the process of interest to the monitor, which uses it to perform a given task. The main goal in these applications is therefore to keep the monitor up to date by receiving fresh information from the different sources. This concept of freshness is captured by the Age of Information (AoI), which was introduced for the first time in [1]. Since then, AoI has become a hot research topic, and a considerable number of research works have been published on the subject [2]–[4]. Although this metric quantifies the information time lag at the monitor, it fails to capture the correctness of the information at the monitor side. Specifically, the evolution of this metric does not take into consideration the state of the information at the monitor side. This has been confirmed in [5], where the authors establish that minimizing AoI gives a sub-optimal policy for minimizing the status error when remotely estimating Markovian sources. To deal with this issue, some works propose to minimize the estimation error or the mean square error [6], [7]. However, the metrics developed in these works are unable to capture the concept of freshness. In other words, no penalty is incurred by the monitor or the central entity for being in an incorrect state for a long time. To meet the timeliness requirement in the process estimation framework, the authors in [8] designed a new metric dubbed the Age of Incorrect Information (AoII), which captures the freshness of the information while taking into account the information content acquired from the transmitter.
This metric is adopted in a context where a given source is represented as a process denoted by X(t) and the transmitter sends status updates to the receiver to inform it about the current state of X(t). Under an energy or transmission rate constraint, the transmitter cannot use the channel at each time slot to transmit a packet. In this case, as long as the transmitter is in idle mode, the monitor keeps the last received information, which may be erroneous compared to the current state of the process X(t). Denoting by X̂(t) the estimated state at the monitor side, being in the incorrect state, or equivalently X̂(t) ≠ X(t), is clearly an undesirable situation for the monitor, and therefore a penalty should be paid. The AoII metric developed in [8] matches this notion of penalty. Specifically, unlike AoI, AoII grows only if the estimated state X̂(t) at the monitor side is different from the real state of the process of interest X(t); if X(t) = X̂(t), the AoII does not grow. To that extent, in [8], [9], the authors consider the problem of minimizing the average AoII in a transmitter-receiver pair scenario where packets are sent over an unreliable channel subject to a transmission rate constraint. They derive the optimal solution, which takes the form of a threshold-based policy. The work in [10] studies the AoII metric in the simple context of monitoring a symmetric binary information source over a delay system with feedback. The authors propose a dynamic programming algorithm to compute the optimal sampling policy. However, [8]–[10] assume that the scheduler has perfect knowledge of the process X(t) at each time slot t and restrict the analysis to a single transmitter-receiver pair communication.
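To make the distinction concrete, here is a minimal sketch (ours, not from [8]) of how the two penalties evolve for a single source over a hypothetical state trace:

```python
def step_ages(aoi, aoii, x, xhat):
    """One time slot: AoI always grows, AoII grows only while xhat != x."""
    aoi = aoi + 1
    aoii = aoii + 1 if x != xhat else 0
    return aoi, aoii

# Hypothetical trace: the monitor received X = 1 once and nothing since.
aoi, aoii = 0, 0
for x, xhat in [(1, 1), (1, 1), (2, 1), (2, 1), (1, 1)]:
    aoi, aoii = step_ages(aoi, aoii, x, xhat)
print(aoi, aoii)  # -> 5 0: AoI kept growing, AoII reset when X returned to 1
```

The AoI trajectory is blind to the source state, while the AoII only counts slots in which the monitor's estimate is actually wrong.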
In contrast, in this paper we tackle a realistic case in which a scheduler tracks the states of multiple remote sources and selects at each time a subset of them to send their updates, in such a way as to minimize the Mean Age of Incorrect Information (MAoII). Furthermore, the scheduler does not know the instantaneous state of the remote sources until it receives their updates. Specifically, our contributions can be summarized as follows:
• Since the scheduler cannot know at each time the current states of the sources before it receives their updates, it cannot know exactly the value of the MAoII and has to track/predict its evolution. To that end, we introduce a belief state at the monitor, which can be interpreted as the probability that the state at the monitor side is correct, i.e. X̂(t) = X(t). We then describe how this belief state can be derived and used in the development of the scheduling policy.
• We then formulate the MAoII-based scheduling problem and show that it belongs to the family of Restless Multi-Armed Bandit (RMAB) problems. The optimal solution of this type of problem is known to be out of reach. To circumvent this difficulty, we develop a low-complexity and efficient policy called Whittle's index policy (WIP) using the Lagrangian relaxation approach.

II. SYSTEM MODEL
A. Network description
We consider N_u users that generate and send status updates about processes of interest to a central entity over unreliable channels. Time is considered to be discrete and normalized to the time slot duration. More specifically, each user i observes an information process of interest X_i(t) and, at the request of the monitor, samples the process X_i(t) and sends the sample to the monitor over an unreliable channel. Based on the last received update, the monitor constructs an estimate of the process, denoted by X̂_i(t). Given that the transmission of a packet takes one time slot, if the monitor allows user i to transmit at time t, it receives the value of X_i(t) at time slot t + 1 in the case where the packet is successfully transmitted. It then updates the estimated process as X̂_i(t + 1) = X_i(t). In any other case, namely when user i is not authorized to transmit or when the packet transmission fails, the monitor keeps the same value at time slot t + 1, specifically X̂_i(t + 1) = X̂_i(t). As for the unreliable channel, we suppose that for user i, at each time slot t, the probability of a successful transmission is ρ_i, and of a failure 1 − ρ_i. The channel realizations, denoted c_i(t), are independent and identically distributed (i.i.d.) over time slots: c_i(t) = 1 if the packet is successfully transmitted and c_i(t) = 0 otherwise.

The next aspect of our model is the nature of the process X_i(t). For each user i, the information process of interest X_i(t) evolves as a Markov chain: the probability of remaining in the same state in the next time slot is p_i, and the probability of transitioning to any given other state is r_i. Denoting by N_i the number of possible states of X_i(t), the following always holds:

p_i + (N_i - 1) r_i = 1 \quad (1)

In this paper, we study the case where p_i ≥ r_i.

Fig. 1: Illustration of process X_i(t)

B. Penalty function dynamics
In this paper, we study the Mean Age of Incorrect Information (MAoII) penalty function and compare it with the Age of Information metric (AoI). We show that considering the MAoII metric instead of the AoI metric is more relevant, accurate, and realistic when the goal is good performance with respect to the empirical age of incorrect information. For that purpose, we start by recalling the Age of Information in the next section to emphasize its shortcomings, and then we propose MAoII as an alternative metric.
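Before contrasting the two metrics, the source dynamics of Section II-A can be sketched as follows (a sketch under our reading of the model; the parameter values are illustrative):

```python
import random

def source_step(x, N, r, rng):
    """One step of the N-state chain: stay with probability p = 1 - (N-1)r,
    move to each of the N-1 other states with probability r (cf. Eq. (1))."""
    p = 1 - (N - 1) * r
    if rng.random() < p:
        return x
    return rng.choice([s for s in range(N) if s != x])

N, r = 4, 0.1
p = 1 - (N - 1) * r
assert p >= r                     # the regime studied in the paper
rng = random.Random(0)
stays = sum(source_step(0, N, r, rng) == 0 for _ in range(20000)) / 20000
print(p, round(stays, 2))         # empirical stay frequency should be close to p
```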
1) Age of information penalty function:
The standard metric (AoI) that captures the freshness of information for user i is:

\delta_{AoI}(t) = t - g_i(t) \quad (2)

where g_i(t) is the time-stamp of the last packet successfully received by the monitor. This metric captures the lifetime of the last update at the monitor without taking into account the correctness of the information, which makes it fall short in some applications. For instance, in some scenarios the age will increase although the information of interest remains in the same state. To further emphasize the shortcomings of this metric, we provide the Whittle index policy for this metric, which was already derived in [2], and we give numerical results that show the weakness of this policy.
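As a quick illustration of Eq. (2), the AoI trajectory can be computed from the delivery outcomes alone (a sketch; we assume, as in Section II-A, that a packet sampled at slot t is delivered at slot t + 1):

```python
def aoi_trace(success, horizon):
    """delta_AoI(t) = t - g(t), where g(t) is the sampling time of the last
    delivered update; success[t] is True if the packet sent at slot t arrives."""
    g, trace = None, []
    for t in range(horizon):
        if t > 0 and success[t - 1]:
            g = t - 1
        trace.append(t - g if g is not None else t)
    return trace

print(aoi_trace([True, False, False, True, False], 5))  # -> [0, 1, 2, 3, 1]
```

Note that the trajectory depends only on the delivery pattern, never on the source state, which is exactly the shortcoming discussed above.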
2) Mean Age of incorrect information penalty function:
The Age of Incorrect Information was first introduced in [8]. This metric captures the freshness of informative updates: once the monitor acquires the information about the process X_i(t), as long as the process X_i(t) remains in the same state in subsequent time slots, the age of incorrect information does not increase, since there is no new information unknown to the monitor. In [8], the authors presume that the scheduler has perfect knowledge of the process at each time slot and restrict their analysis to a transmitter-receiver pair communication. In our case, by contrast, we consider that the monitor, which plays the role of the scheduler, knows only the state contained in the last successfully received packet, and we extend the analysis to a communication involving several users that can transmit at each time slot. Accordingly, the explicit expression of the MAoII metric is:

\delta_{MAoII}(t) = \mathbb{E}_{V_i}[t - V_i(t)] \quad (3)

where V_i(t) is the last time instant such that X_i(V_i(t)) = X̂_i(g_i(t) + 1). (Considering our system model detailed in Section II-A, g_i(t) also refers to the sampling time of the information of interest contained in the last successfully received packet.)

Remark 1.
It is worth mentioning that, as explained in Section II-A, the reception of a successfully transmitted packet takes place at time slot g_i(t) + 1. This means that X̂_i(g_i(t) + 1) = X_i(g_i(t)).

In order to use this metric effectively in a Partially Observable Markov Decision Process framework, we need to take into consideration the Markovian nature of the process X_i(t). To that extent, we introduce in the next section the notion of belief, which represents the probability that X̂_i(t) is in the correct state.

C. Metrics evolution
In this section, we describe mathematically the evolution of each metric depending on the system parameters and the action taken. We denote by d_i(t) the action prescribed to user i at time slot t, and by a_i and b_i the Age of Information and the Mean Age of Incorrect Information penalty functions, respectively.
1) AoI:
Considering our system model, the Age of Information of user i evolves as follows. If user i is scheduled (d_i(t) = 1), the AoI drops to 1 if the packet is successfully transmitted (c_i(t) = 1); otherwise (c_i(t) = 0), the AoI is increased by one. If user i is not scheduled (d_i(t) = 0), the AoI is increased by one. Accordingly, the evolution of the age of user i can be summarized as follows:

a_i(t+1) = \begin{cases} 1 & \text{if } d_i(t) = 1,\ c_i(t) = 1 \\ a_i(t) + 1 & \text{else} \end{cases} \quad (4)

As for the second metric, to capture the notion of correctness, the monitor maintains a belief value π_i(t), defined as the probability that the information state at the monitor, X̂_i(t) = X̂_i(g_i(t) + 1) = X_i(g_i(t)), is correct at time t. Explicitly, π_i(t) = Pr(X̂_i(t) = X_i(t)). (We recall that as long as the monitor has not received any new update from the source by time instant t, it keeps the last update successfully received at time instant g_i(t) + 1; in other words, X̂_i(t) = X̂_i(g_i(t) + 1).) One can show that π_i(t) evolves as follows:

Lemma 1.

\pi_i(t+1) = \begin{cases} p_i & \text{if } d_i(t) = 1,\ c_i(t) = 1 \\ \pi_i(t) p_i + r_i (1 - \pi_i(t)) & \text{else} \end{cases} \quad (5)

Proof:
See appendix A.
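The passive branch of Lemma 1 can also be checked by Monte Carlo against the symmetric chain of Section II-A (a sketch with illustrative parameters):

```python
import random

def belief_next(pi, p, r):
    """Lemma 1, passive branch: pi(t+1) = pi(t) p + r (1 - pi(t))."""
    return pi * p + r * (1 - pi)

N, r = 3, 0.15
p = 1 - (N - 1) * r                        # = 0.7
analytic, pi = [], 1.0                     # before any step the state is known
for _ in range(5):
    pi = belief_next(pi, p, r)
    analytic.append(pi)

rng, trials = random.Random(1), 20000
hits = [0] * 5
for _ in range(trials):
    x = 0                                  # monitor's estimate xhat = 0
    for k in range(5):
        x = x if rng.random() < p else rng.choice([s for s in range(N) if s != x])
        hits[k] += (x == 0)
empirical = [h / trials for h in hits]
print([round(a, 3) for a in analytic])
print([round(e, 3) for e in empirical])    # the two rows should agree closely
```

The first analytic value is p, which matches the fact that π_i(g_i(t)+1) = p_i right after a delivered update.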
2) MAoII:
According to the expression of MAoII given in Section II-B2, (t − V_i(t)) is a random variable, denoted A_i(t), that satisfies:

Lemma 2.

A_i(t) = \begin{cases}
0 & \text{w.p. } \pi_i(t) \\
1 & \text{w.p. } \pi_i(t-1)(1-p_i) \\
2 & \text{w.p. } \pi_i(t-2)(1-p_i)(1-r_i) \\
\vdots & \vdots \\
t-g_i(t)-1 & \text{w.p. } \pi_i(g_i(t)+1)(1-p_i)(1-r_i)^{t-g_i(t)-2} \\
t-g_i(t) & \text{w.p. } (1-p_i)(1-r_i)^{t-g_i(t)-1}
\end{cases} \quad (6)

Proof:
See appendix B.

Therefore, the mean age of incorrect information at slot t equals the mean of A_i(t), i.e.

n_i(t) = \mathbb{E}[A_i(t)] = \sum_{k=0}^{t-g_i(t)-1} k(1-p_i)(1-r_i)^{k-1}\pi_i(t-k) + (t-g_i(t))(1-p_i)(1-r_i)^{t-g_i(t)-1}
= \sum_{k=1}^{t-g_i(t)} (t-g_i(t)-k)(1-p_i)(1-r_i)^{t-g_i(t)-k-1}\pi_i(g_i(t)+k) + (t-g_i(t))(1-p_i)(1-r_i)^{t-g_i(t)-1} \quad (7)

One can establish, using the definition of g_i(t), that π_i(g_i(t)+1) = p_i for all t. Hence, according to the evolution of π_i(·) in Lemma 1, for all k ≤ t, π_i(g_i(t)+k) depends only on k and i. More precisely, for each k ≤ t, π_i(g_i(t)+k) = π_i^k, where π_i^k is the sequence defined by induction as follows:

\pi_i^k = \begin{cases} 1 & \text{if } k = 0 \\ p_i \pi_i^{k-1} + r_i(1-\pi_i^{k-1}) & \text{if } k > 0 \end{cases} \quad (8)

In light of that fact, we have:

n_i(t) = \sum_{k=0}^{t-g_i(t)} (t-g_i(t)-k)(1-p_i)(1-r_i)^{t-g_i(t)-k-1}\pi_i^k \quad (9)

We conclude that n_i(t) depends only on t − g_i(t) and i. Therefore, with a slight abuse of notation, we write n_i(t) = n_i(t − g_i(t)). To that extent, at time slot t, if user i is scheduled and the packet is successfully transmitted, then g_i(t+1) = t; accordingly, at time slot t+1, the MAoII equals n_i(t+1−g_i(t+1)) = n_i(1). If user i is not scheduled or if the packet is not successfully transmitted, then g_i(t+1) = g_i(t), and the MAoII transitions to n_i(t+1−g_i(t+1)) = n_i(t−g_i(t)+1). Denoting by j(t) the index such that n_i(j(t)) is the value of the MAoII at time slot t, the MAoII thus transitions to the value n_i(j(t)+1) at time instant t+1. To sum up, the evolution of the MAoII can be summarized as follows:

b_i(t+1) = \begin{cases} n_i(1) & \text{if } d_i(t) = 1,\ c_i(t) = 1 \\ n_i(j(t)+1) & \text{else} \end{cases} \quad (10)

where b_i(t) = n_i(j(t)).

III. PROBLEM FORMULATION
In this section, we consider a given metric denoted by m, where m can be either AoI or MAoII. We denote by N_u the total number of users in the system, and we let the vector m at time t be m(t) = (m_1(t), …, m_{N_u}(t)), where m_i(t) is the penalty function at the central entity of user i with respect to the metric m at time slot t. Our aim is to find a scheduling policy that allocates, at each time slot, the available channels (M channels) to a subset of M users (M ≤ N_u) in such a way as to minimize the total expected average penalty function of the considered metric. A scheduling policy φ is defined as a sequence of actions φ = (d^φ(0), d^φ(1), …), where d^φ(t) = (d_1^φ(t), …, d_{N_u}^φ(t)) is a binary vector such that d_i^φ(t) = 1 if user i is scheduled at time t. Denoting by Φ the set of all causal scheduling policies, our scheduling problem can be formulated as follows:

\min_{\varphi \in \Phi} \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N_u} m_i^{\varphi}(t) \,\Big|\, m(0)\Big)
\text{subject to } \sum_{i=1}^{N_u} d_i^{\varphi}(t) \le \alpha N_u, \quad t = 1, 2, \ldots \quad (11)

where αN_u = M. Problem (11) falls into the class of Multi-Armed Bandit problems, and specifically into the Restless Bandit framework. RMAB problems are known to be difficult to solve in general, as they are PSPACE-hard [11]. To circumvent this complexity, a well-known heuristic called Whittle's index policy has been proposed for these types of problems [12]. This policy is based on a Lagrangian relaxation and has been shown to have remarkable performance in real-life applications. To that extent, in the next section we detail the Lagrangian relaxation approach applied to our RMAB problem. We then provide the theoretical analysis yielding the low-complexity Whittle index policy, denoted WIP, for the two metrics, namely AoI and MAoII.

IV. LAGRANGIAN RELAXATION AND WHITTLE'S INDEX
A. Relaxed problem
The Lagrangian relaxation technique is the key component in defining the Whittle's index scheduling policy. It consists of relaxing the constraint on the available resources by letting it be satisfied on average rather than at every time slot. More specifically, we define our Relaxed Problem (RP) as follows:

\min_{\varphi \in \Phi} \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N_u} m_i^{\varphi}(t) \,\Big|\, m(0)\Big)
\text{subject to } \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N_u} d_i^{\varphi}(t)\Big) \le \alpha N_u \quad (12)

The Lagrangian function f(W, φ) of problem (12) is defined as:

f(W,\varphi) = \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N_u} m_i^{\varphi}(t) + W d_i^{\varphi}(t) \,\Big|\, m(0)\Big) - W \alpha N_u \quad (13)

where W ≥ 0 can be seen as a penalty for scheduling users. Thus, following the Lagrangian approach, our next objective is to solve the following problem:

\min_{\varphi \in \Phi} f(W, \varphi) \quad (14)

As the term WαN_u is independent of φ, it can be eliminated from the analysis. Bearing that in mind, we present the steps to obtain the Whittle's index policy:
1) We focus on the one-dimensional version of the problem in (14). Indeed, it can be shown that the N_u-dimensional problem can be decomposed into N_u one-dimensional problems that can be solved independently [13]. Accordingly, we drop the user index from all user parameters for ease of notation, and we deal with the one-dimensional problem:

\min_{\varphi \in \Phi} \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1} m^{\varphi}(t) + W d^{\varphi}(t) \,\Big|\, m(0)\Big) \quad (15)

2) We give structural results on the optimal solution of the one-dimensional problem.
3) We establish the indexability property, which ensures the existence of the Whittle's indices.
4) We derive a closed-form expression of the Whittle's index and thereby define the proposed scheduling policy (WIP) for the original problem (11).
The problem in (15) can be viewed as an infinite-horizon average-cost Markov Decision Process defined as follows:
• States: The state of the MDP at time t is the penalty function m(t).
• Actions: The action at time t, denoted by d(t), specifies whether the user is scheduled (value 1) or not (value 0).
• Transition probabilities: The transition probabilities between the different states.
• Cost: The instantaneous cost of the MDP, C(m(t), d(t)), equals m(t) + W d(t).

The optimal policy φ* of the one-dimensional problem (15) can be obtained by solving the following Bellman equation for each state m:

\theta + V(m) = \min_{d \in \{0,1\}} \Big\{ m + Wd + \sum_{m' \in A^m} \Pr(m \to m' \mid d) V(m') \Big\} \quad (16)

where Pr(m → m' | d) is the transition probability from state m to m' under action d, θ is the optimal value of the problem, V(m) is the differential cost-to-go function, and A^m is the set of states of the metric m. Several numerical algorithms have been developed to solve (16), such as the value iteration algorithm. The latter consists of updating, at each iteration, the value function V_t(·) following the recurrence relation, for each state m:

\theta + V_{t+1}(m) = \min_{d \in \{0,1\}} \Big\{ m + Wd + \sum_{m' \in A^m} \Pr(m \to m' \mid d) V_t(m') \Big\} \quad (17)

given that V_0(·) = 0, and then obtaining V(·) by exploiting the fact that \lim_{t \to +\infty} V_t(m) = V(m). The main shortcoming of this algorithm is that it requires high memory and computational complexity. To avoid this complexity, rather than computing the value of V(·) for all states, we limit ourselves to studying the structure of the optimal scheduling policy, again exploiting the fact that \lim_{t \to +\infty} V_t(m) = V(m). In that way, we show that the optimal solution of Problem (16) is a threshold-based policy:

Definition 1.
A threshold policy is a policy φ ∈ Φ for which there exists n such that when the current state m < n, the prescribed action is d⁻ ∈ {0, 1}, and when m ≥ n, the prescribed action is d⁺ ∈ {0, 1}, bearing in mind that d⁻ ≠ d⁺.

To that extent, we show that for both metrics considered in our paper, namely AoI and MAoII, the optimal policy of (16) is a threshold-based policy. To that end, we first specify the state space A^m for each metric, then provide the expression of the corresponding Bellman equation (16), and finally establish the desired result.
1) AoI:
According to Section II-C1, a(t) evolves in the state space:

A^a = \{a_j : j > 0,\ a_j = j\} \quad (19)

The corresponding Bellman equation is:

\theta_a + V(a_j) = \min\big\{ a_j + V(a_{j+1});\ a_j + W + \rho V(a_1) + (1-\rho) V(a_{j+1}) \big\} \quad (20)

The analysis for this metric has already been done in [2]. Indeed, in [2], the authors demonstrate that the optimal policy of Problem (15) has a threshold structure. They further prove that this policy is increasing with the age, i.e.:

Proposition 1.
When m = a, the optimal solution of the problem in (15) is an increasing threshold policy. Explicitly, there exists n such that when the current state a_j < a_n, the prescribed action is the passive action, and when a_j ≥ a_n, the prescribed action is the active action. One can see the detailed proof in [2].
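The threshold structure of Proposition 1 can also be checked numerically: relative value iteration on a truncated state space produces a policy that is passive below some state and active above it. A sketch (the truncation level K, the horizon, and the parameter values are our own choices):

```python
def threshold_policy(cost, rho, W, iters=3000):
    """Relative value iteration for problem (15) with states j = 1..K,
    cost[j] the penalty of state j, and reset to state 1 on a delivery."""
    K = len(cost) - 1
    V = [0.0] * (K + 1)
    for _ in range(iters):
        nV = [0.0] * (K + 1)
        for j in range(1, K + 1):
            nxt = min(j + 1, K)                      # truncation at K
            passive = cost[j] + V[nxt]
            active = cost[j] + W + rho * V[1] + (1 - rho) * V[nxt]
            nV[j] = min(passive, active)
        ref = nV[1]
        V = [v - ref for v in nV]                    # keep values bounded
    pol = []
    for j in range(1, K + 1):
        nxt = min(j + 1, K)
        active = cost[j] + W + rho * V[1] + (1 - rho) * V[nxt]
        pol.append(1 if active < cost[j] + V[nxt] else 0)
    return pol

rho, W, K = 0.6, 20.0, 200
pol = threshold_policy([0.0] + [float(j) for j in range(1, K + 1)], rho, W)
assert pol == sorted(pol) and 1 in pol   # passive ... passive, active ... active
print("first active state:", pol.index(1) + 1)
```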
2) MAoII:
According to Section II-C2, b(t) evolves in the state space:

A^b = \Big\{ b_j : j > 0,\ b_j = \sum_{k=0}^{j} k(1-p)(1-r)^{k-1}\pi^{j-k} \Big\} \quad (21)

Therefore, the Bellman equation at state b_j is:

\theta_b + V(b_j) = \min\big\{ b_j + V(b_{j+1});\ b_j + W + \rho V(b_1) + (1-\rho) V(b_{j+1}) \big\} \quad (22)

Theorem 1.
When m = b, the optimal solution of the problem in (15) is an increasing threshold policy. Explicitly, there exists n such that when the current state b_j < b_n, the prescribed action is the passive action, and when b_j ≥ b_n, the prescribed action is the active action.

Proof: The proof can be found in Appendix C.
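For intuition, the MAoII state values b_j = n(j) of (21) can be generated directly from the recursions (8)-(9); a small sketch with illustrative parameters:

```python
def pi_seq(p, r, kmax):
    """Eq. (8): pi^0 = 1 and pi^k = p pi^{k-1} + r (1 - pi^{k-1})."""
    seq = [1.0]
    for _ in range(kmax):
        seq.append(p * seq[-1] + r * (1 - seq[-1]))
    return seq

def n_value(j, p, r):
    """Eq. (9) with t - g(t) = j."""
    pis = pi_seq(p, r, j)
    return sum((j - k) * (1 - p) * (1 - r) ** (j - k - 1) * pis[k]
               for k in range(j + 1))

p, r = 0.7, 0.15
vals = [n_value(j, p, r) for j in range(1, 6)]
assert abs(vals[0] - (1 - p)) < 1e-12        # n(1) = 1 - p, as expected
assert all(a < b for a, b in zip(vals, vals[1:]))  # b_j increases with j
print([round(v, 4) for v in vals])
```

That n(1) = 1 − p matches the model: one slot after a delivered update, the estimate is wrong with probability 1 − p.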
C. Indexability and Whittle’s index expressions
In order to establish the indexability of the problem and find the Whittle's index expressions, we provide the steady-state form of the problem in (15) under a threshold policy n. Explicitly:

\min_{n \in \mathbb{N}^*} \overline{m}^n + W \overline{d}^n \quad (23)

where \overline{m}^n is the average value of the penalty function with respect to the metric m, and \overline{d}^n is the average active time under threshold policy n. Specifically:

\overline{m}^n = \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^n\Big(\sum_{t=0}^{T-1} m(t) \,\Big|\, m(0),\ tp(n)\Big) \quad (24)

\overline{d}^n = \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^n\Big(\sum_{t=0}^{T-1} d(t) \,\Big|\, m(0),\ tp(n)\Big) \quad (25)

where tp(n) denotes the threshold policy n. With the intention of computing \overline{m}^n and \overline{d}^n, we derive the stationary distribution of the Discrete Time Markov Chain (DTMC) that represents the evolution of the MAoII under threshold policy n. One can show that the steady-state distribution in question is the same for both metrics, AoI and MAoII. Specifically:

Proposition 2.
For m = a, b, for a given threshold n, the DTMC admits u^n(m_j) as its stationary distribution:

u^n(m_j) = \begin{cases} \dfrac{\rho}{n\rho + 1 - \rho} & \text{if } 1 \le j \le n \\ (1-\rho)^{j-n}\dfrac{\rho}{n\rho + 1 - \rho} & \text{if } j \ge n \end{cases} \quad (26)

Proof:
The proof can be found in Appendix D.

By exploiting the above results, we can now derive a closed form for the average cost of any threshold policy.

Proposition 3.
For a given threshold n, the average cost of the threshold policy is \overline{m}^n:
• m = a:

\overline{a}^n = \frac{n(n-1)\rho}{2((n-1)\rho + 1)} + \frac{1}{\rho} \quad (27)

• m = b:

\overline{b}^n = \frac{\rho}{n\rho + 1 - \rho}\Big[ \frac{n(N-1)}{Nr} - \frac{(1-Nr)^{n+2}}{(Nr)^2} + \frac{(1-r)^{n+2}}{r^2} + \frac{(1-\rho)(1-Nr)^{n+2}}{Nr(1-(1-\rho)(1-r))} - \frac{(1-\rho)(1-r)^{n+2}}{r(1-(1-\rho)(1-r))} + C \Big] \quad (28)

where C = \frac{(1-Nr)^2}{(Nr)^2} - \frac{(1-r)^2}{r^2} + \frac{(N-1)(1-\rho)}{Nr\rho}.

Proof:
By leveraging the results of Proposition 2 and using the expression of m_j for j > 0, by the definition of \overline{m}^n given in (24), we obtain the desired results after algebraic manipulations.

Proposition 4.
For any given threshold n, the average active time is \overline{d}^n:

\overline{d}^n = \frac{1}{n\rho + 1 - \rho} \quad (29)

Proof:
Likewise, by exploiting the results in Proposition 2 and the expression (25), we obtain the desired result.

To ensure the existence of the Whittle's indices, we first need to establish the indexability property for all users. To that end, we formalize indexability and the Whittle's index in the following definitions. Note that in the sequel, we make the user indices explicit in order to differentiate between users.
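Propositions 2-4 can be cross-checked numerically for the AoI metric: the closed forms for the average cost and the average active time must match the corresponding sums over the stationary distribution. A sketch (the compact form of \overline{a}^n used here is our algebraic rearrangement of (27); the truncation point of the infinite sums is an implementation choice):

```python
def u(j, n, rho):
    """Stationary distribution of Proposition 2."""
    base = rho / (n * rho + 1 - rho)
    return base if j <= n else (1 - rho) ** (j - n) * base

def a_bar(n, rho):
    """Average AoI under threshold n (a rearrangement of Eq. (27))."""
    return n * (n - 1) * rho / (2 * ((n - 1) * rho + 1)) + 1 / rho

def d_bar(n, rho):
    """Average active time under threshold n, Eq. (29)."""
    return 1 / (n * rho + 1 - rho)

n, rho, J = 6, 0.5, 4000
assert abs(sum(u(j, n, rho) for j in range(1, J)) - 1) < 1e-9       # sums to 1
assert abs(sum(j * u(j, n, rho) for j in range(1, J)) - a_bar(n, rho)) < 1e-9
assert abs(sum(u(j, n, rho) for j in range(n, J)) - d_bar(n, rho)) < 1e-9
print(round(a_bar(n, rho), 6), round(d_bar(n, rho), 6))  # -> 4.142857 0.285714
```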
Definition 2.
Considering Problem (15) for a given W and a given user i, we define D_i^m(W) as the set of states in which the optimal action (with respect to the optimal solution of Problem (15) considering the metric m) is the passive one. In other words, m_i^n ∈ D_i^m(W) if and only if the optimal action at state m_i^n is the passive one.

D_i^m(W) is well defined for both metrics, as the optimal solution of Problem (15) is a stationary policy, more precisely a threshold-based policy.

Definition 3.
A class is indexable if the set of states in which the passive action is optimal increases with W, that is, W' < W ⇒ D_i^m(W') ⊆ D_i^m(W). When the class is indexable, the Whittle's index at state m_i^n is defined as:

W(m_i^n) = \min\{W \mid m_i^n \in D_i^m(W)\} \quad (30)

Proposition 5.
For each user i, the one-dimensional problem is indexable for both metrics.

Proof: The proof rests on the fact that \overline{d}_i^n decreases with n. One can see [2] for a detailed proof.

Since the indexability property has been established in the above proposition, we can now assert the existence of the Whittle's index.

Theorem 2.
For any user i and state m_i^n, the Whittle's index is:
• m = a:

W_i(a_i^n) = \frac{n(n-1)\rho_i}{2} + n \quad (31)

• m = b:

W_i(b_i^n) = \frac{(1-r_i)\rho_i}{r_i} - \frac{(1-N_i r_i)\rho_i}{N_i r_i}
+ (1-N_i r_i)^{n+2}\Big(n\rho_i + 1 + \frac{\rho_i(1-N_i r_i)}{N_i r_i}\Big)\Big[\frac{1-(1-\rho_i)(1+(N_i-1)r_i)}{N_i r_i (1-(1-\rho_i)(1-r_i))}\Big]
- (1-r_i)^{n+2}\Big(n\rho_i + 1 + \frac{\rho_i(1-r_i)}{r_i}\Big)\Big[\frac{\rho_i}{r_i(1-(1-\rho_i)(1-r_i))}\Big] \quad (32)
The proof can be found in Appendix E.

Based on the above theorem, we provide in the following the Whittle's index scheduling policy for the original problem (11).
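For the AoI metric, expression (31) can be sanity-checked: under indexability, the index at threshold n equals the marginal cost per unit of marginal activity, (\overline{a}^{n+1} − \overline{a}^n)/(\overline{d}^n − \overline{d}^{n+1}), and the resulting indices increase with n, consistent with Proposition 5. A sketch (the compact \overline{a}^n below is our algebraic rearrangement of (27); the user parameters are illustrative):

```python
def a_bar(n, rho):
    # average AoI under threshold n (rearrangement of Eq. (27))
    return n * (n - 1) * rho / (2 * ((n - 1) * rho + 1)) + 1 / rho

def d_bar(n, rho):
    # average active time under threshold n, Eq. (29)
    return 1 / (n * rho + 1 - rho)

def whittle_aoi(n, rho):
    """Eq. (31): W(a_n) = n(n-1)rho/2 + n."""
    return n * (n - 1) * rho / 2 + n

rho = 0.5
for n in range(1, 8):
    marginal = (a_bar(n + 1, rho) - a_bar(n, rho)) / (d_bar(n, rho) - d_bar(n + 1, rho))
    assert abs(marginal - whittle_aoi(n, rho)) < 1e-9
idx = [whittle_aoi(n, rho) for n in range(1, 8)]
assert idx == sorted(idx)          # indices grow with the age: indexable
print(idx[:5])  # -> [1.0, 2.5, 4.5, 7.0, 10.0]

# The policy below (Algorithm 1) activates the M users with the largest indices.
states, rhos, M = [3, 7, 2, 5], [0.9, 0.3, 0.8, 0.5], 2
order = sorted(range(4), key=lambda i: whittle_aoi(states[i], rhos[i]), reverse=True)
print(set(order[:M]))  # -> {1, 3}
```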
Algorithm 1
Whittle's index scheduling policy:
1) At each time slot t, compute the Whittle's indices of all users in the system using the expressions given in Theorem 2.
2) Allocate the M channels to the M users having the highest Whittle's index values at time t.

V. NUMERICAL RESULTS
Our goal in this section is to compare the average empirical age of incorrect information under the developed Whittle index policy, WIP-MAoII, with the baseline policy, denoted WIP-AoI, that considers the standard AoI metric. More precisely, we plot

C^{\varphi, N_u} = \frac{1}{N_u} \limsup_{T \to +\infty} \frac{1}{T} \mathbb{E}^{\varphi}\Big(\sum_{t=0}^{T-1}\sum_{i=1}^{N_u} m_i^{emp,\varphi}(t) \,\Big|\, m^{emp}(0)\Big)

for φ equal to WIP-MAoII and WIP-AoI, as a function of N_u, where m_i^{emp,φ}(·) evolves as follows:
• If m_i^{emp,φ}(t) = 0, then X̂_i(t) = X_i(t). Therefore:

m_i^{emp,\varphi}(t+1) = \begin{cases} 0 & \text{w.p. } p_i \\ 1 & \text{w.p. } 1 - p_i \end{cases} \quad (33)

• If m_i^{emp,φ}(t) ≠ 0, then X̂_i(t) ≠ X_i(t). Therefore:
– If φ_i(t) = 1:

m_i^{emp,\varphi}(t+1) = \begin{cases} 0 & \text{w.p. } \rho_i p_i \\ 1 & \text{w.p. } \rho_i(1 - p_i) \\ m_i^{emp,\varphi}(t) + 1 & \text{w.p. } (1-\rho_i)(1 - r_i) \\ 0 & \text{w.p. } (1-\rho_i) r_i \end{cases} \quad (34)

– If φ_i(t) = 0:

m_i^{emp,\varphi}(t+1) = \begin{cases} m_i^{emp,\varphi}(t) + 1 & \text{w.p. } 1 - r_i \\ 0 & \text{w.p. } r_i \end{cases} \quad (35)

We consider two scenarios of network settings:
1) For the first scenario, we consider two classes with the respective parameters:
• Class 1: ρ_1 = 0. , N_1 = 8, r_1 = 0. .
• Class 2: ρ_2 = 0. , N_2 = 2, r_2 = 0. .
2) For the second scenario, to shed light on the importance of taking the source parameters, namely p_i, r_i and N_i, into account in the derivation of the Whittle's indices, we consider that the two classes share the same channel statistics, specifically ρ_1 = ρ_2, while they do not have the same source parameters. To that extent, we consider the following case:
• Class 1: ρ_1 = 0. , N_1 = 10, r_1 = 0.
• Class 2: ρ_2 = 0. , N_2 = 3, r_2 = 0.
(The value of p_i can be directly deduced from Equation (1).)

Fig. 2: Comparison between WIP-MAoII and WIP-AoI in terms of the empirical average age: different channel statistics

Fig. 3: Comparison between WIP-MAoII and WIP-AoI in terms of the empirical average age: same channel statistics

One can observe that WIP-MAoII indeed gives better performance than WIP-AoI in terms of minimizing the average empirical age of incorrect information.
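The empirical-age dynamics (33)-(35) can be simulated directly. The sketch below (our own, with illustrative parameters) contrasts a user that is scheduled in every slot with one that never is; the persistently scheduled user should exhibit a much smaller empirical average:

```python
import random

def emp_step(m, scheduled, rho, p, r, rng):
    """One slot of the empirical age, following Eqs. (33)-(35)."""
    if m == 0:                                     # Eq. (33): estimate correct
        return 0 if rng.random() < p else 1
    if scheduled:                                  # Eq. (34)
        u = rng.random()
        if u < rho * p:
            return 0
        if u < rho:
            return 1
        if u < rho + (1 - rho) * (1 - r):
            return m + 1
        return 0
    return m + 1 if rng.random() < 1 - r else 0    # Eq. (35)

rng = random.Random(2)
rho, p, r = 0.8, 0.7, 0.15
avg = {}
for sched in (True, False):
    m = total = 0
    for _ in range(50000):
        m = emp_step(m, sched, rho, p, r, rng)
        total += m
    avg[sched] = total / 50000
print(round(avg[True], 3), round(avg[False], 3))
assert avg[True] < avg[False]
```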
As a consequence, our derivation of the Whittle's indices in the Markovian source framework turns out to be relevant for tracking the real states of remote sources.

VI. CONCLUSION
In this paper, we considered the problem of remote monitoring of multiple sources, where a central entity selects at each time slot a subset of the sources to send their updates, in such a way as to minimize the MAoII metric. Since the scheduler is unaware of the current states of the sources, we introduced a belief state at the monitor in order to predict the evolution of the states of the sources and to derive an estimate of the MAoII. We then developed an efficient scheduling policy based on the Whittle's index framework. Finally, we provided numerical results that highlight the performance of our policy.

REFERENCES

[1] S. Kaul, R. Yates, and M. Gruteser, "Real-time status: How often should one update?" in Proc. IEEE INFOCOM. IEEE, 2012, pp. 2731-2735.
[2] A. Maatouk, S. Kriouile, M. Assaad, and A. Ephremides, "On the optimality of the Whittle's index policy for minimizing the age of information," arXiv preprint arXiv:2001.03096, 2020.
[3] Y.-P. Hsu, E. Modiano, and L. Duan, "Scheduling algorithms for minimizing age of information in wireless broadcast networks with random arrivals," IEEE Transactions on Mobile Computing, 2019.
[4] I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, "Scheduling policies for minimizing age of information in broadcast wireless networks," IEEE/ACM Transactions on Networking, vol. 26, no. 6, pp. 2637-2650, 2018.
[5] Z. Jiang, S. Zhou, Z. Niu, and C. Yu, "A unified sampling and scheduling approach for status update in multiaccess wireless networks," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2019, pp. 208-216.
[6] Y. Sun, Y. Polyanskiy, and E. Uysal, "Sampling of the Wiener process for remote estimation over a channel with random delay," IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 1118-1135, 2019.
[7] C. Kam, S. Kompella, G. D. Nguyen, J. E. Wieselthier, and A. Ephremides, "Towards an effective age of information: Remote estimation of a Markov source," in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2018, pp. 367-372.
[8] A. Maatouk, S. Kriouile, M. Assaad, and A. Ephremides, "The age of incorrect information: A new performance metric for status updates," arXiv preprint arXiv:1907.06604, 2019.
[9] A. Maatouk, M. Assaad, and A. Ephremides, "The age of incorrect information: An enabler of semantics-empowered communication," arXiv preprint arXiv:2012.13214, 2020.
[10] C. Kam, S. Kompella, and A. Ephremides, "Age of incorrect information for remote estimation of a binary Markov source," in IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020, pp. 1-6.
[11] C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queuing network control," Mathematics of Operations Research, vol. 24, no. 2, pp. 293-305, 1999.
[12] R. R. Weber and G. Weiss, "On an index policy for restless bandits," Journal of Applied Probability, vol. 27, no. 3, pp. 637-648, 1990.
[13] S. Kriouile, M. Larranaga, and M. Assaad, "Asymptotically optimal delay-aware scheduling in wireless networks," arXiv preprint arXiv:1807.00352, 2018.

APPENDIX A
PROOF OF LEMMA 1

If d_i(t) = 1 and c_i(t) = 1, then the central entity acquires the information about the process X_i(t): at time t+1 it knows the effective state of the process, which we denote by S. Specifically, X̂_i(t+1) = X_i(t) = S. Therefore, the probability that X_i(t+1) = X̂_i(t+1) = S, knowing that X_i(t) = S, is exactly the probability of remaining in the same state. Consequently, π_i(t+1) = p_i.

If d_i(t) = 1 and c_i(t) = 0, or d_i(t) = 0, then the information state at the monitor side does not change (X̂_i(t+1) = X̂_i(t)). Thus, the probability of the event X_i(t+1) = X̂_i(t+1) is:

Pr(X_i(t+1) = X̂_i(t+1))
= Pr(X_i(t+1) = X̂_i(t+1) | X_i(t) = X̂_i(t)) × Pr(X_i(t) = X̂_i(t))
+ Pr(X_i(t+1) = X̂_i(t+1) | X_i(t) ≠ X̂_i(t)) × Pr(X_i(t) ≠ X̂_i(t))
= Pr(X_i(t+1) = X̂_i(t) | X_i(t) = X̂_i(t)) × Pr(X_i(t) = X̂_i(t))
+ Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t)) × Pr(X_i(t) ≠ X̂_i(t))
= Pr(X_i(t+1) = X̂_i(t) | X_i(t) = X̂_i(t)) π_i(t)
+ Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t)) (1 − π_i(t))   (36)

Pr(X_i(t+1) = X̂_i(t) | X_i(t) = X̂_i(t)) is the probability of remaining in the same state in the next time slot, which equals p_i.
Decomposing Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t)):

Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t))
= Σ_{S ≠ X̂_i(t)} Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t), X_i(t) = S)
× Pr(X_i(t) = S | X_i(t) ≠ X̂_i(t))   (37)

Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t), X_i(t) = S) is the probability of transitioning from S to X̂_i(t) ≠ S, which equals r_i. Hence:

Pr(X_i(t+1) = X̂_i(t) | X_i(t) ≠ X̂_i(t))
= r_i Σ_{S ≠ X̂_i(t)} Pr(X_i(t) = S | X_i(t) ≠ X̂_i(t)) = r_i   (38)

That is, combining the two results:

Pr(X_i(t+1) = X̂_i(t+1)) = π_i(t) p_i + (1 − π_i(t)) r_i   (39)

Therefore, the proof is complete.

APPENDIX B
PROOF OF LEMMA 2

We have A_i(t) = t − V_i(t). As g_i(t) is known by the monitor, it is a fixed constant, whereas V_i(t), which represents the last time instant such that 1{X_i(V_i(t)) = X̂_i(g_i(t)+1)} = 1, is unknown by the monitor. Accordingly, it is viewed as a random variable by the monitor. By definition of g_i(t), 1{X_i(g_i(t)) = X̂_i(g_i(t)+1)} = 1; hence V_i(t) takes values in [g_i(t), t].
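As a sanity check of the belief recursion (39), one can simulate a symmetric N-state source (stay with probability p, move to each of the other N−1 states with probability r, so that p + (N−1)r = 1) and compare the empirical probability of {X_i(t) = X̂_i(t)} with the recursion. The sketch below is illustrative; the function name and parameter values are ours, not part of the paper.

```python
import random

def belief_recursion_check(N=4, p=0.7, T=6, runs=200_000, seed=1):
    """Monte-Carlo check of eq. (39): pi(t+1) = pi(t)*p + (1 - pi(t))*r
    for a symmetric N-state source (stay w.p. p, move to each of the
    other N-1 states w.p. r = (1-p)/(N-1)), with no updates received."""
    r = (1 - p) / (N - 1)
    rng = random.Random(seed)
    x_hat = 0                          # monitor's last received estimate
    # belief trajectory predicted by the recursion, starting from pi(0) = 1
    pi = [1.0]
    for _ in range(T):
        pi.append(pi[-1] * p + (1 - pi[-1]) * r)
    # empirical frequency of the event {X(t) = x_hat}
    hits = [0] * (T + 1)
    for _ in range(runs):
        x = x_hat                      # source starts in the estimated state
        hits[0] += 1
        for t in range(1, T + 1):
            if rng.random() >= p:      # leave the current state
                x = rng.choice([s for s in range(N) if s != x])
            hits[t] += (x == x_hat)
    empirical = [h / runs for h in hits]
    return pi, empirical
```

With these toy parameters, the recursion gives π(1) = 0.7 and π(2) = 0.52, and the empirical frequencies agree to within Monte-Carlo error.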
To that extent, we distinguish between two cases:

1) The event {V_i(t) = g_i(t)} implies that:
• X_i(g_i(t)) = X̂_i(g_i(t)+1);
• for all j ∈ [g_i(t)+1, t], X_i(j) ≠ X̂_i(g_i(t)+1).
By definition of g_i(t), the probability of {X_i(g_i(t)) = X̂_i(g_i(t)+1)} is 1. The probability of {∀ j ∈ [g_i(t)+1, t], X_i(j) ≠ X̂_i(g_i(t)+1) | X_i(g_i(t)) = X̂_i(g_i(t)+1)} is (1 − r_i)^{t−g_i(t)−1}(1 − p_i). Accordingly, the probability of the event {V_i(t) = g_i(t)} is (1 − p_i)(1 − r_i)^{t−g_i(t)−1}.

2) For k ∈ [g_i(t)+1, t], the event {V_i(t) = k} implies that:
• at time k, X_i(k) = X̂_i(g_i(t)+1);
• for all j ∈ [k+1, t], X_i(j) ≠ X̂_i(g_i(t)+1).
The probability of {X_i(k) = X̂_i(g_i(t)+1) = X̂_i(k)} is π_i(k). The probability of {∀ j ∈ [k+1, t], X_i(j) ≠ X̂_i(g_i(t)+1) | X_i(k) = X̂_i(g_i(t)+1)} is (1 − r_i)^{t−k−1}(1 − p_i). Accordingly, the probability of the event {V_i(t) = k} is π_i(k)(1 − p_i)(1 − r_i)^{t−k−1}.

Given that A_i(t) = t − V_i(t), the event A_i(t) = k implies V_i(t) = t − k. That is, the probability of the event {A_i(t) = k} is:
• if k = t − g_i(t): (1 − r_i)^{t−g_i(t)−1}(1 − p_i)   (40)
• if k ∈ [0, t − g_i(t)[: π_i(t − k)(1 − r_i)^{k−1}(1 − p_i)   (41)

This concludes the proof.

APPENDIX C
PROOF OF THEOREM 1
Lemma 3. b_j is increasing with j.

Proof: The explicit expression of b_j is:

b_j = (N − 1)/(1 + r − p) [1 − (j+1)(1−r)^j + j(1−r)^{j+1}]
+ 1/(1 + r − p) [(p−r)^{j+1} − (j+1)(1−r)^j (p−r) + j(1−r)^{j+1}]   (42)

Therefore, after some computations and mathematical analysis, we obtain:

b_{j+1} − b_j = (Nr/(1 + r − p)) [(1−r)^{j+1} − (p−r)^{j+1}]   (43)

Given that 0 ≤ p − r ≤ 1 − r, we have (p−r)^{j+1} ≤ (1−r)^{j+1}. Therefore, (1−r)^{j+1} − (p−r)^{j+1} ≥ 0. Hence, b_j is increasing with j.

Based on this lemma, we prove the following lemma.

Lemma 4. V(·) is increasing with b_j.

Proof: We prove the present lemma by induction, using the value iteration equation (17). In fact, we show that V_t(·) is increasing for all t and conclude for V(·). As V_0(·) = 0, the property holds for t = 0. If V_t(·) is increasing with b, we show that for b_j ≤ b_i, V⁰_{t+1}(b_j) ≤ V⁰_{t+1}(b_i) and V¹_{t+1}(b_j) ≤ V¹_{t+1}(b_i), where for each k ∈ N*:

V⁰_{t+1}(b_k) = b_k + V_t(b_{k+1})   (44)
V¹_{t+1}(b_k) = b_k + W + ρ V_t(b_1) + (1−ρ) V_t(b_{k+1})   (45)

We have that:

V⁰_{t+1}(b_j) − V⁰_{t+1}(b_i) = b_j − b_i + (V_t(b_{j+1}) − V_t(b_{i+1}))   (46)

According to Lemma 3, given that b_j ≤ b_i, we have j ≤ i, and thus b_{j+1} ≤ b_{i+1}. Therefore, since V_t(·) is increasing with b_j, we obtain V⁰_{t+1}(b_j) − V⁰_{t+1}(b_i) ≤ 0. As a consequence, V⁰_{t+1}(·) is increasing with b_j. In the same way, we have:

V¹_{t+1}(b_j) − V¹_{t+1}(b_i) = b_j − b_i + (1−ρ)(V_t(b_{j+1}) − V_t(b_{i+1}))

Hence:

V¹_{t+1}(b_j) − V¹_{t+1}(b_i) ≤ 0   (47)

As a consequence, V¹_{t+1}(·) is increasing with b_j. Since V_{t+1}(·) = min{V⁰_{t+1}(·), V¹_{t+1}(·)}, V_{t+1}(·) is increasing with b_j. Accordingly, we have demonstrated by induction that V_t(·) is increasing for all t. Knowing that lim_{t→+∞} V_t(b_j) = V(b_j), V(·) must also be increasing with b_j.

We define:

∆V(b_j) = V¹(b_j) − V⁰(b_j)   (48)

where lim_{t→+∞} V⁰_t(b_j) = V⁰(b_j) and lim_{t→+∞} V¹_t(b_j) = V¹(b_j). Subsequently, ∆V(b_j) equals:

∆V(b_j) = ρ [W/ρ + V(b_1) − V(b_{j+1})]   (49)

According to Lemma 4, V(·) is increasing with b_{j+1}. Therefore, ∆V(b_j) is decreasing with b_j. Hence, there exists b_n such that for all b_j ≤ b_n, ∆V(b_j) ≥ 0, and for all b_j > b_n, ∆V(b_j) < 0. Given that the optimal action at state b_j is the one that achieves min{V⁰(b_j), V¹(b_j)}, for all b_j ≤ b_n the optimal decision is to stay idle, since min{V⁰(b_j), V¹(b_j)} = V⁰(b_j), and for all b_j > b_n the optimal decision is to transmit, since min{V⁰(b_j), V¹(b_j)} = V¹(b_j). Specifically, as b_j is increasing with j, there exists n such that for all j ≤ n the optimal action is the passive one, and for all j > n the optimal action is the active one.

APPENDIX D
PROOF OF PROPOSITION 1

We solve the full balance equations of the stationary distribution u^n at each state m_j, for m = a and m = b:

u^n(m_j) = Σ_{i=1}^{+∞} pt^n(i → j) u^n(m_i)   (50)

where pt^n(i → j) denotes the transition probability from state m_i to state m_j under threshold policy n. After some computations, we obtain the desired result, which is valid for both m = a and m = b.

APPENDIX E
PROOF OF THEOREM 2

When m = a, since the analysis is the same as in [2], we skip this case for the sake of space. When m = b, using Definition 3 to find the Whittle's index expressions can be tricky and difficult. To circumvent this, we first define the sequence W_i(b^n_i) as the intersection points between b^n_i + W d^n_i and b^{n+1}_i + W d^{n+1}_i. Explicitly:

W_i(b^n_i) = (b^{n+1}_i − b^n_i) / (d^n_i − d^{n+1}_i)   (51)

According to the results in [32, Corollary 2.1], if W_i(b^n_i) is increasing with b^n_i, then the Whittle's index for any state b^n_i is nothing but W_i(b^n_i). To that extent, we prove that W_i(b^n_i) is increasing with b^n_i.
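The threshold structure established in Appendix C can be illustrated numerically by running a relative value iteration on a truncated version of the single-arm dynamics (44)-(45). The cost sequence, the truncation of the belief chain, and all parameter values below are illustrative assumptions, not the paper's exact belief-state costs b_j.

```python
def threshold_from_value_iteration(b, W, rho, iters=2000):
    """Relative value iteration for the single-arm dynamics of eqs. (44)-(45):
        passive: Q0(k) = b[k] + V(k+1)
        active:  Q1(k) = b[k] + W + rho*V(0) + (1 - rho)*V(k+1)
    with k+1 truncated at the last state. Returns, for every state,
    whether the active action (transmit) is optimal."""
    K = len(b)
    nxt = lambda k: min(k + 1, K - 1)   # truncation of the belief chain
    V = [0.0] * K
    for _ in range(iters):
        Q0 = [b[k] + V[nxt(k)] for k in range(K)]
        Q1 = [b[k] + W + rho * V[0] + (1 - rho) * V[nxt(k)] for k in range(K)]
        V = [min(q0, q1) for q0, q1 in zip(Q0, Q1)]
        ref = V[0]
        V = [v - ref for v in V]        # keep values bounded (average-cost VI)
    # transmit iff W <= rho * (V(k+1) - V(0)), i.e. Q1(k) <= Q0(k)
    return [W + rho * V[0] + (1 - rho) * V[nxt(k)] <= V[nxt(k)] for k in range(K)]
```

On an increasing cost sequence, the returned action vector is passive below some index and active above it, in line with the threshold result: the policy switches exactly once, because V(·) is increasing.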
However, since b^n_i is increasing with n when m = b (Lemma 3), it is sufficient to show that W_i(b^n_i) is increasing with n to establish the desired result. Therefore, we first seek a closed-form expression of the intersection point W_i(b^n_i); we obtain:

W_i(b^n_i) = (1−r_i)ρ_i / r_i − (1−N_i r_i)ρ_i / (N_i r_i)
+ (1−N_i r_i)^{n+2} (nρ_i + 1 + ρ_i(1−N_i r_i)/(N_i r_i))
× [ (1 − (1−ρ_i)(1 + (N_i − 1) r_i)) / (N_i r_i (1 − (1−ρ_i)(1−r_i))) ]
− (1−r_i)^{n+2} (nρ_i + 1 + ρ_i(1−r_i)/r_i)
× [ ρ_i / (r_i (1 − (1−ρ_i)(1−r_i))) ]   (52)

Now, we provide the main result that allows us to affirm that W_i(b^n_i) is effectively the Whittle's index of state b^n_i.

Lemma 5.
The sequence W_i(b^n_i) is increasing with n.

Proof: After some mathematical analysis and algebraic manipulations, we get:

W_i(b^{n+1}_i) − W_i(b^n_i) = (nρ_i + 1) [(1−r_i)^{n+2} − (1−N_i r_i)^{n+2}]
+ ((nρ_i + 1)(1−ρ_i) r_i) / (1 − (1−ρ_i)(1−r_i)) × [N_i (1−N_i r_i)^{n+2} − (1−r_i)^{n+2}]   (53)

We have that:

N_i (1−N_i r_i)^{n+2} − (1−r_i)^{n+2} ≥ (1−N_i r_i)^{n+2} − (1−r_i)^{n+2}   (54)

Thus:

W_i(b^{n+1}_i) − W_i(b^n_i) ≥ (nρ_i + 1) × [1 − ((1−ρ_i) r_i) / (1 − (1−ρ_i)(1−r_i))] × [(1−r_i)^{n+2} − (1−N_i r_i)^{n+2}]   (55)

Given that (1−r_i)^{n+2} − (1−N_i r_i)^{n+2} ≥ 0 and 1 − ((1−ρ_i) r_i) / (1 − (1−ρ_i)(1−r_i)) = ρ_i / (1 − (1−ρ_i)(1−r_i)) ≥ 0, we conclude that:

W_i(b^{n+1}_i) − W_i(b^n_i) ≥ 0

Hence, W_i(b^n_i) is increasing with n.
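The intersection-point construction (51) can also be illustrated numerically: for each threshold n, the average cost and the active-time fraction of the threshold-n policy can be computed in closed form from the expected visit counts over one renewal cycle of a truncated chain, and the adjacent-point ratios then yield increasing index values. The linear cost b_k = k, the truncation, and the parameters below are illustrative assumptions, not the b^n_i of the paper.

```python
def threshold_policy_stats(b, rho, n):
    """Exact average cost and active fraction of the threshold-n policy on a
    truncated chain: states k < n are passive (k -> k+1); states k >= n are
    active (reset to 0 w.p. rho, else k -> k+1, self-loop at the top state).
    Computed via expected visit counts over one renewal cycle from state 0."""
    K = len(b)
    visits = [0.0] * K
    for k in range(n):                        # passive states: visited once
        visits[k] = 1.0
    for m, k in enumerate(range(n, K - 1)):   # active states before truncation
        visits[k] = (1 - rho) ** m
    visits[K - 1] = (1 - rho) ** (K - 1 - n) / rho   # geometric stay at top
    cycle = sum(visits)                       # equals n + 1/rho
    avg_cost = sum(v * c for v, c in zip(visits, b)) / cycle
    active_frac = sum(visits[n:]) / cycle
    return avg_cost, active_frac

def whittle_indices(b, rho):
    """Indices as intersection points of n -> avg_cost(n) + W*active_frac(n),
    mirroring eq. (51): W(n) = (cost(n+1) - cost(n)) / (act(n) - act(n+1))."""
    stats = [threshold_policy_stats(b, rho, n) for n in range(len(b))]
    return [(stats[n + 1][0] - stats[n][0]) / (stats[n][1] - stats[n + 1][1])
            for n in range(len(b) - 1)]
```

For b_k = k and rho = 0.8, the sequence of intersection points comes out strictly increasing (the first two values evaluate to 1.0 and 2.8), which is the indexability pattern that Lemma 5 establishes for the paper's actual b^n_i.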