Detecting State Transitions of a Markov Source: Sampling Frequency and Age Trade-off
Jaya Prakash Champati, Mikael Skoglund, and James Gross
Information Science and Engineering, EECS, KTH Royal Institute of Technology, Stockholm, Sweden
E-mail: {jpra, skoglund, jamesgr}@kth.se

Abstract: We consider a finite-state Discrete-Time Markov Chain (DTMC) source that can be sampled for detecting the events when the DTMC transits to a new state. Our goal is to study the trade-off between sampling frequency and staleness in detecting the events. We argue that, for the problem at hand, using Age of Information (AoI) for quantifying the staleness of a sample is conservative and therefore introduce the age penalty for this purpose. We study two optimization problems: minimize the average age penalty subject to an average sampling-frequency constraint, and minimize the average sampling frequency subject to an average age-penalty constraint; both are Constrained Markov Decision Problems. We solve them using a linear programming approach and compute Markov policies that are optimal among all causal policies. Our numerical results demonstrate that the computed Markov policies not only outperform optimal periodic sampling policies, but also achieve sampling frequencies close to or lower than that of an optimal clairvoyant (non-causal) sampling policy, if a small age penalty is allowed.
I. INTRODUCTION
Detecting the occurrence of an event when monitoring an information source or a process of interest is essential to applications from varied domains that include control and information systems. In a control system, for instance, a sensor samples a process for detecting an event where the state of the process exceeds a certain threshold value. In the World Wide Web, a web crawling application is equipped with the task of downloading remote web pages to a local database (for page ranking/indexing, etc.), and is required to detect the events when a remote web page gets updated.

In practice, it is impossible to know the exact time instant of occurrence of an event unless the source is sampled infinitely often (or in every time slot for discrete-time systems). However, sampling at a higher frequency incurs costs to a system in terms of the energy consumption of a sensor, or the bandwidth usage of the network for transmitting the samples. On the other hand, sampling at a lower frequency results in staleness in detecting an event. Therefore, we are interested in the question: given that the source is sampled in time slot n, how do we choose the next sampling instant n + τ such that the conflicting objectives of average sampling frequency and average staleness in event detection are optimized? In this work, we address this question for an information source modelled as a finite-state DTMC, where the events we want to detect are transitions of the DTMC to new states. Even though this setting seems fundamental and is useful in modelling different applications, to the best of our knowledge, the trade-off problems we study have not been tackled in the literature; see Section V for related works.

The first step in studying the trade-off between sampling frequency and staleness is to choose an appropriate metric for quantifying the staleness of a sample. For this purpose, one may choose Age of Information (AoI), which has emerged as a relevant performance metric for quantifying the staleness of updates at a destination in a communication system. It is defined as the time elapsed since the generation of the freshest update available at the destination [1]. However, we argue that using AoI is conservative for the problem at hand and introduce a staleness metric, the age penalty, defined as the time elapsed since the first transition out of the most recently observed state. We then formulate two problems: minimize the average age penalty subject to an average sampling-frequency constraint, and minimize the average sampling frequency subject to an average age-penalty constraint. Both problems are Constrained Markov Decision Problems (CMDPs). We use a Linear Programming (LP) approach to solve for optimal Markov policies, which are known to be optimal among all causal policies for the problems at hand. In our numerical analysis using a two-state Markov chain, we find that the optimal policy always provides a lower sampling frequency than the optimal periodic sampling policy, and that the gap increases as the transition probabilities decrease. We also present a comparison of the sampling frequency achieved by the optimal policy with that of an optimal clairvoyant (non-causal) sampling policy.

The rest of the paper is organized as follows. In Section II, we present the system model and formulate the CMDPs. The LP solution approach for both problems is described in Section III. Numerical analysis using a two-state Markov chain is presented in Section IV. Related work is presented in Section V, and we conclude in Section VI.

II. SYSTEM MODEL AND PROBLEM STATEMENT
A. Markov Source
We consider an information source/process that is modelled by an N-state DTMC {X_n, n ≥ 0}, where N < ∞. We assume that the DTMC is ergodic, i.e., irreducible and aperiodic. Let S = {1, 2, …, N} denote the set of states. We use p_{ij}, for all i, j ∈ S, to denote the one-step transition probabilities, and the n-step transition probabilities are denoted by

p^{(n)}_{ij} = P(X_n = j | X_0 = i), ∀ i, j ∈ S.

Fig. 1: Sampling an information source/process modelled using a DTMC. Each sample reveals the state of the DTMC.

Given the one-step probabilities, the n-step transition probabilities can be computed using matrix multiplication on the one-step transition probability matrix [2]; a short numerical sketch is given at the end of this subsection. Let ξ_j denote the stationary probability of finding the DTMC in state j.

A time slot in the system represents one unit of time of the DTMC, and state transitions occur at the start of a time slot. The state of the DTMC can only be observed by sampling the source; see Figure 1. Let T_0 = 0, T_1, T_2, … denote the time instants of transitions of the DTMC to new states. We are interested in detecting these transitions at the earliest time possible. Our motivation for studying this problem arises from its relevance to applications from different domains.
• In a control system, the source is a process of interest, and a state transition represents an event where the process exceeds a certain threshold.
• In a web crawling application [3], the source is a remote web page and the state transitions model the updating events of the website.

Clearly, sampling the source at the start of every slot allows us to detect each and every transition of the DTMC. Instead, our aim here is to use a lower sampling frequency. This translates to energy savings for a sensor and/or bandwidth savings from transmitting a lower number of samples to a controller/monitor. In the case of the web crawling application, this translates to a lower frequency of downloads of the remote web page. However, using a lower sampling frequency will result in staleness in detecting a transition and may also miss several transitions. We are thus interested in studying the trade-off between sampling frequency and staleness. Next, we define sampling policies and the age penalty for quantifying staleness.
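As a computational companion to the definitions above, the following minimal sketch obtains the n-step transition probabilities as entries of the matrix power P^n, and the stationary probabilities ξ_j as the normalized eigenvector of the transposed transition matrix for eigenvalue 1. It assumes numpy; the 3-state matrix is an illustrative placeholder, not a source taken from the paper.

```python
import numpy as np

# Illustrative one-step transition matrix of a 3-state DTMC (placeholder
# values; rows index the current state, columns the next state, rows sum to 1).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])

def n_step(P: np.ndarray, n: int) -> np.ndarray:
    """n-step transition probabilities p^(n)_ij as entries of P^n."""
    return np.linalg.matrix_power(P, n)

def stationary(P: np.ndarray) -> np.ndarray:
    """Stationary distribution xi with xi P = xi and sum(xi) = 1."""
    evals, evecs = np.linalg.eig(P.T)
    xi = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvector for eigenvalue 1
    return xi / xi.sum()

print(n_step(P, 4))    # p^(4)_ij for all i, j
print(stationary(P))   # xi_j, j = 1, ..., N
```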
B. Sampling Policy and Age Penalty
Assume that X_0 is given. A sampling policy π specifies the set of sampling instants {G_k, k ≥ 1}, where G_k is the sampling instant of the k-th sample. Define τ_k = G_k − G_{k−1} for all k ≥ 1, with G_0 = 0; the policy π can then be equivalently specified by {τ_k, k ≥ 1}. We assume that τ_k ∈ Q = {1, 2, …, M}, where M < ∞ is the maximum inter-sampling time allowed in the system. Let Π denote the set of all causal policies, where a causal policy considers the current and all past observed states and past actions for choosing the current action. In the sequel, we study the following policies.

1) Markov policies:
A Markov policy maps each state to an action with a fixed probability. To be precise, let j be the observed state in the k-th decision epoch; then under a Markov policy, τ_k is assigned a value τ ∈ Q according to a fixed probability distribution P_π(τ_k = τ | j). Let Π_{MR} denote the set of Markov policies.

2) Periodic sampling policies:
Under these policies, samples are taken at fixed time intervals τ. With a slight abuse of notation, we use π(τ) to denote such a policy. Note that periodic sampling policies are a subclass of Markov policies.

3) Optimal clairvoyant sampling policy:
Under this policy, the next transition to a new state is assumed to be known a priori, and thus the source is sampled exactly at the instants when transitions between states occur. Let π† = {G†_k, k ≥ 1} denote this policy and ν† denote its average sampling frequency. Note that π† is a non-causal policy and we study it for theoretical benchmarking.

As stated before, sampling the source at the start of every slot allows us to identify each and every transition of the DTMC to a new state, and thus the staleness of each sample is zero. However, quantifying the staleness of a sample in general is not entirely obvious. This is because, when the sampler samples the source, it may find the DTMC in the same state or in a different state from the previous sample, and even in the former case multiple transitions might have occurred. One may consider the AoI at the sampler, denoted by ∆(t), as the staleness metric. It increases linearly between two sampling instants and resets to zero at the sampling instants. However, using this staleness metric is conservative in this context. To illustrate this, in Figure 2 we plot the sample path of a 3-state DTMC and the resulting AoI. Note that in the duration between the instants G_1 and G_2, the DTMC stays in the state observed at G_1 for some time slots after it was sampled. Ideally, this duration should not be counted toward the staleness of the sample at G_2, but AoI adds a linear penalty for it.

Using the above insight, we quantify the staleness of a sample k by introducing the age penalty A_k, defined as the time elapsed since the first transition out of the state observed in the (k−1)-th sample. (One may also consider including the number of missed transitions in the age penalty and, with some effort, solve the problem using the same approach as in this paper.) Under policy π, the age penalty for the k-th sample is given by

A_k(π) = max{0, G_k − min_n {T_n : T_n ≥ G_{k−1}}}.

This quantity is illustrated and contrasted with AoI in Figure 2. Under a policy π, the average age penalty E[A(π)] is given by

E[A(π)] = lim sup_{K→∞} E[∑_{k=1}^{K} A_k(π)] / K,

and the average sampling interval is given by

lim sup_{K→∞} E[∑_{k=1}^{K} τ_k] / K,

where the expectation is taken with respect to the probability distribution induced by π on the sequence of observed states and actions.
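On a realized sample path, A_k is straightforward to evaluate directly from the definition. Below is a minimal Python sketch; the function name and the transition instants in the toy check are invented for illustration and do not come from the paper's Figure 2.

```python
from bisect import bisect_left

def age_penalty(G_prev: int, G_k: int, transitions: list) -> int:
    """A_k = max(0, G_k - min{T_n : T_n >= G_{k-1}}): time elapsed at G_k
    since the first state transition at or after the previous sampling
    instant G_{k-1}.  `transitions` is the sorted list of instants T_n."""
    idx = bisect_left(transitions, G_prev)   # index of first T_n >= G_{k-1}
    if idx == len(transitions):              # no transition after G_{k-1}
        return 0
    return max(0, G_k - transitions[idx])

# Toy check with sampling instants G_1 = 2, G_2 = 6 (as in Fig. 2) and
# hypothetical transition instants T_n = 0, 3, 5, 9:
print(age_penalty(2, 6, [0, 3, 5, 9]))   # first T_n >= 2 is 3, so A_2 = 3
```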
Fig. 2: A sample path of a 3-state Markov chain. AoI and age penalties are depicted for the first three sampling instants of a policy with G_1 = 2, G_2 = 6, and G_3 = 7.

C. Optimization problems P1 and P2

We are interested in the following problems. For a given upper bound ν ∈ (0, 1] on the average sampling frequency, in problem P1 we aim to minimize the average age penalty, stated below:

minimize_{π ∈ Π} E[A(π)]  s.t.  lim sup_{K→∞} E[∑_{k=1}^{K} τ_k] / K ≥ 1/ν.   (1)

For a given upper bound d ≥ 0 on the average age penalty, in problem P2 we aim to maximize the average sampling interval, stated below:

maximize_{π ∈ Π} lim sup_{K→∞} E[∑_{k=1}^{K} τ_k] / K  s.t.  E[A(π)] ≤ d.   (2)

Let π*_1 and π*_2 denote optimal policies for P1 and P2, respectively.
Remark: For P1, an optimal periodic sampling policy chooses τ = ⌈1/ν⌉. For P2, an optimal periodic sampling policy chooses τ = d + 1. Finally, we define τ† = ⌈1/ν†⌉.

III. LINEAR PROGRAMMING SOLUTION APPROACH
Both P1 and P2 are Constrained Markov Decision Problems (CMDPs). A CMDP with finite state and action sets has an optimal policy in the set of Markov policies [4], and it can be efficiently solved using the Linear Programming (LP) approach presented in [5]. Therefore, in the following, we only need to consider the set of Markov policies. Under Markov policies, the induced stochastic process {X_{G_k}, k ≥ 1}, i.e., the sequence of observed states, is also a DTMC; in the sequel we refer to it as the induced DTMC.
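Before setting up the LP, it can help to see a randomized Markov policy in action. The following sketch simulates the source under a policy of the form P(τ | j) and estimates the average age penalty and average sampling interval by Monte Carlo. It is an illustration under assumed inputs: the transition matrix and the policy are placeholders, and states are 0-indexed in code.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate(P: np.ndarray, policy: dict, K: int = 100_000):
    """Estimate (average age penalty, average sampling interval) under a
    randomized Markov policy.  policy[j] = {tau: P(tau | j)}."""
    N = P.shape[0]
    state = 0                                   # X_0, sampled at G_0 = 0
    total_age = total_tau = 0
    for _ in range(K):
        j = state                               # observed state this epoch
        taus = list(policy[j])
        tau = int(rng.choice(taus, p=[policy[j][t] for t in taus]))
        first_exit = None                       # first slot the chain leaves j
        for n in range(1, tau + 1):
            state = int(rng.choice(N, p=P[state]))
            if first_exit is None and state != j:
                first_exit = n
        total_age += tau - first_exit if first_exit is not None else 0
        total_tau += tau
    return total_age / K, total_tau / K

# Placeholder two-state chain and a policy that samples slowly in state 0
# and quickly in state 1 (all numbers are illustrative only).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
policy = {0: {6: 0.5, 7: 0.5}, 1: {2: 1.0}}
print(simulate(P, policy))
```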
A. Elements of the CMDP

The decision epochs in P1 and P2 are indexed by k.
• State space: S = {1, 2, …, N}.
• Action space: at decision epoch k, the next inter-sampling time τ_{k+1} is chosen from the set Q = {1, 2, …, M}.
• Transition probabilities: the next state i ∈ S of the induced DTMC depends only on the current observed/sampled state j and the sampling interval τ. To be precise, let j be the state observed in decision epoch k, i.e., in time slot G_k; then the transition probability of the induced DTMC to state i for any sampling interval τ is given by

q_{jτi} = P(X_{G_k + τ} = i | X_{G_k} = j) = P(X_τ = i | X_0 = j) = p^{(τ)}_{ji}, ∀ i, j ∈ S and τ ∈ Q.

Further, given π ∈ Π_{MR}, the steady-state probabilities lim_{k→∞} P_π(X_{G_k} = j) of the induced DTMC can be computed from the following transition probabilities:

P(X_{G_{k+1}} = i | X_{G_k} = j) = E[q_{jτi}] = ∑_{τ=1}^{M} q_{jτi} P_π(τ | j), ∀ i, j ∈ S.   (3)
• Costs: in decision epoch k, if the state is j, then choosing a sampling interval τ ∈ Q results in a cost contributing to the average age penalty, given by

c_{jτ} = ∑_{n=1}^{τ−1} (τ − n)(1 − p_{jj}) p_{jj}^{n−1},

and a cost contributing to the average sampling interval, given by τ. Note that c_{jτ} is the expected number of slots the DTMC has spent after moving out of state j within the sampling interval τ. It is easy to see that

E[A_{k+1} | X_{G_k} = j, τ_{k+1} = τ] = c_{jτ}, ∀ k ≥ 0.   (4)

B. LP formulations for P1 and P2

We define z^π_{jτ} = lim_{k→∞} P(X_{G_k} = j, τ_k = τ), the steady-state probability of observing the state-action pair (j, τ) under a policy π ∈ Π_{MR}. Then, using (4), we obtain

E[A(π)] = ∑_{j=1}^{N} ∑_{τ=1}^{M} c_{jτ} z^π_{jτ},

lim sup_{K→∞} E[∑_{k=1}^{K} τ_k] / K = ∑_{j=1}^{N} ∑_{τ=1}^{M} τ z^π_{jτ}.

In the LP formulations for P1 and P2, we solve for z^π_{jτ} subject to the following constraints:

∑_{j=1}^{N} ∑_{τ=1}^{M} z^π_{jτ} = 1,   (5)

∑_{τ=1}^{M} z^π_{iτ} = ∑_{j=1}^{N} ∑_{τ=1}^{M} q_{jτi} z^π_{jτ}, i ∈ S,   (6)

z^π_{jτ} ≥ 0, j ∈ S and τ ∈ Q.   (7)

The constraint (6) is a consequence of the equilibrium equations for the induced DTMC in the steady state. In the following, we present an equivalent LP formulation for P1:

minimize_{ {z^π_{jτ}} } ∑_{j=1}^{N} ∑_{τ=1}^{M} c_{jτ} z^π_{jτ}  s.t.  ∑_{j=1}^{N} ∑_{τ=1}^{M} τ z^π_{jτ} ≥ 1/ν, (5), (6), (7).   (8)

Let {z*_{jτ}} denote the optimal solution of (8); then the stationary probabilities under π*_1 are computed as follows. For τ ∈ Q,

P_{π*_1}(τ | j) = z*_{jτ} / ∑_{τ'=1}^{M} z*_{jτ'}, j ∈ S.

Similarly, an equivalent LP can be formulated for P2 and π*_2 can be obtained.
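The following sketch implements the LP (8) with scipy.optimize.linprog. The paper reports using MATLAB's linprog; this Python analogue is a reconstruction under stated assumptions, not the authors' code. The variables are the N·M probabilities z_{jτ}, flattened row-wise, and indexing is 0-based in code.

```python
import numpy as np
from scipy.optimize import linprog

def solve_P1(P: np.ndarray, M: int, nu: float):
    """Equivalent LP (8) for P1: minimize sum c_{j,tau} z_{j,tau} subject to
    sum tau * z_{j,tau} >= 1/nu, normalization (5), balance (6), z >= 0 (7)."""
    N = P.shape[0]
    # q[j, t-1, i] = p^(t)_{ji}: t-step transition probabilities.
    q = np.stack([np.linalg.matrix_power(P, t) for t in range(1, M + 1)], axis=1)
    # Age-penalty costs c_{j,tau} = sum_{n=1}^{tau-1} (tau - n)(1 - p_jj) p_jj^{n-1}.
    c = np.zeros((N, M))
    for j in range(N):
        for t in range(1, M + 1):
            n = np.arange(1, t)
            c[j, t - 1] = np.sum((t - n) * (1 - P[j, j]) * P[j, j] ** (n - 1))
    tau_vals = np.tile(np.arange(1, M + 1), N)       # tau of each flattened (j, tau)
    A_ub = -tau_vals[None, :].astype(float)          # -sum tau*z <= -1/nu
    b_ub = [-1.0 / nu]
    A_eq = np.zeros((1 + N, N * M))
    A_eq[0, :] = 1.0                                 # normalization (5)
    for i in range(N):                               # balance equations (6); one row
        A_eq[1 + i, i * M:(i + 1) * M] += 1.0        # is redundant, which HiGHS tolerates
        for j in range(N):
            for t in range(M):
                A_eq[1 + i, j * M + t] -= q[j, t, i]
    b_eq = np.concatenate(([1.0], np.zeros(N)))
    res = linprog(c.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    z = res.x.reshape(N, M)
    # Randomized Markov policy: P(tau | j) = z*_{j,tau} / sum_{tau'} z*_{j,tau'}.
    policy = z / np.maximum(z.sum(axis=1, keepdims=True), 1e-12)
    return res.fun, z, policy                        # optimal avg age penalty, z*, policy

# e.g., age, z, pol = solve_P1(np.array([[0.9, 0.1], [0.4, 0.6]]), M=10, nu=0.2)
```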
C. Computing ν†

Note that in P1 the value of ν in the constraint can be chosen in the interval (0, 1]. We are particularly interested in setting ν = ν†, because this gives us the minimum achievable average age penalty for the same sampling frequency achieved by the optimal clairvoyant sampling policy π†. We note that ν† can be obtained by subtracting, from the total frequency of transitions in the DTMC, the fraction of that total contributed by self transitions, i.e., transitions from a state to itself. Since a transition occurs in every time slot, the total frequency of transitions in the DTMC is 1. The fraction of transitions contributed by self transitions is ∑_{j=1}^{N} ξ_j p_{jj}. The following proposition follows directly from the above analysis.

Proposition 1. Under the optimal clairvoyant sampling policy π†, the average sampling frequency ν† is given by

ν† = 1 − ∑_{j=1}^{N} ξ_j p_{jj}.

For a two-state Markov chain, the steady-state probabilities are given by

ξ_1 = p_{21} / (p_{12} + p_{21}),  ξ_2 = p_{12} / (p_{12} + p_{21}),

and

ν† = ξ_1 p_{12} + ξ_2 p_{21} = 2 p_{12} p_{21} / (p_{12} + p_{21}).

Figure 3 shows ν† versus p_{12} for different values of p_{21}.

Fig. 3: Sampling frequency under π† for a two-state Markov chain.

IV. NUMERICAL RESULTS: TWO-STATE MARKOV CHAIN
In this section, we present a numerical analysis for a two-state DTMC. Even though this is the simplest case, it can potentially be used to model sources where the set of states can be divided into two sets, for example "disturbance" vs. "no disturbance", and the events of interest are transitions between these sets. We have implemented the LPs using linprog in MATLAB. In the following, we first present two numerical examples to examine the structure of the optimal Markov policies for P1 and P2. We then present the sampling frequency and age penalty trade-off, and a performance comparison between the optimal and optimal periodic sampling policies.
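As a sanity check on Proposition 1 before turning to the examples, the snippet below evaluates ν† = 1 − ∑_j ξ_j p_{jj} numerically and compares it against the two-state closed form 2 p_{12} p_{21}/(p_{12} + p_{21}). It is a minimal sketch assuming numpy; the transition probabilities are illustrative placeholders.

```python
import numpy as np

def nu_dagger(P: np.ndarray) -> float:
    """Clairvoyant sampling frequency of Proposition 1: 1 - sum_j xi_j p_jj."""
    evals, evecs = np.linalg.eig(P.T)
    xi = np.real(evecs[:, np.argmax(np.real(evals))])
    xi = xi / xi.sum()                       # stationary distribution
    return 1.0 - float(xi @ np.diag(P))

p12, p21 = 0.1, 0.3                          # illustrative values
P = np.array([[1 - p12, p12],
              [p21, 1 - p21]])
print(nu_dagger(P), 2 * p12 * p21 / (p12 + p21))   # both evaluate to 0.15
```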
Example 1: In this example, we solve P1 for transition probabilities p_{12} = 0. and p_{21} = 0. , with the constraint on the expected sampling interval equal to 1/ν† = 5. . The computation of the optimal policy π*_1 results in the following stationary probabilities: P_{π*_1}(τ = 6 | j = 1) = 0. , P_{π*_1}(τ = 7 | j = 1) = 0. , and P_{π*_1}(τ = 2 | j = 2) = 1. The transition probability out of state 2 is higher, and thus the policy sets τ = 2 when the observed state is 2. The minimum expected age penalty is computed to be 0. . An optimal periodic sampling policy chooses τ = τ† = ⌈1/ν†⌉ = 6.
Example 2: In this example, we solve P2 when p_{12} = 0. , p_{21} = 0. , and the expected age penalty is upper bounded by d = 1. The computation of the optimal policy π*_2 results in the following stationary probabilities: P_{π*_2}(τ = 2 | j) = 0. and P_{π*_2}(τ = 3 | j) = 0. for j = 1, 2. The minimum expected sampling frequency is computed to be 0. . The optimal periodic sampling policy chooses τ = d + 1 = 2, and hence its sampling frequency is 0.5.
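Instances like Example 2 can be reproduced with the same LP scaffolding by exchanging the objective and the age-penalty constraint. The sketch below, which mirrors the P1 solver above and is self-contained, maximizes the average sampling interval subject to E[A(π)] ≤ d; the two-state matrix is again an illustrative placeholder.

```python
import numpy as np
from scipy.optimize import linprog

def solve_P2(P: np.ndarray, M: int, d: float):
    """Equivalent LP for P2: maximize sum tau * z_{j,tau} subject to
    sum c_{j,tau} z_{j,tau} <= d and constraints (5)-(7).  A sketch."""
    N = P.shape[0]
    q = np.stack([np.linalg.matrix_power(P, t) for t in range(1, M + 1)], axis=1)
    c = np.zeros((N, M))
    for j in range(N):
        for t in range(1, M + 1):
            n = np.arange(1, t)
            c[j, t - 1] = np.sum((t - n) * (1 - P[j, j]) * P[j, j] ** (n - 1))
    tau_vals = np.tile(np.arange(1, M + 1), N)
    A_eq = np.zeros((1 + N, N * M))
    A_eq[0, :] = 1.0                                 # normalization (5)
    for i in range(N):                               # balance equations (6)
        A_eq[1 + i, i * M:(i + 1) * M] += 1.0
        for j in range(N):
            for t in range(M):
                A_eq[1 + i, j * M + t] -= q[j, t, i]
    b_eq = np.concatenate(([1.0], np.zeros(N)))
    res = linprog(-tau_vals.astype(float),           # maximize => minimize the negative
                  A_ub=c.ravel()[None, :], b_ub=[d],
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    avg_interval = -res.fun
    return 1.0 / avg_interval, res.x.reshape(N, M)   # avg frequency, z*

# Placeholder two-state chain with the age-penalty bound d = 1 of Example 2.
P = np.array([[0.7, 0.3],
              [0.3, 0.7]])
freq, z = solve_P2(P, M=10, d=1.0)
print(freq)          # optimal average sampling frequency (<= 0.5 here)
```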
A. Performance Comparison

In Figure 4, we compare the average age penalties achieved by the optimal periodic sampler and the optimal policy π*_1 obtained by solving P1 under the constraint ν = ν†. Recall that for this case, the optimal periodic sampler sets the sampling interval equal to τ† = ⌈1/ν†⌉. From the figure, we observe that for lower transition probabilities between the states, i.e., lower p_{12} and p_{21} values, the periodic sampler achieves age penalties only slightly higher than those of the optimal policy, because in this case the optimal policy also chooses sampling intervals close to those of the periodic sampler. The gap between them, however, increases significantly for higher transition probabilities. The zigzag pattern of the periodic sampler can be attributed to the ceiling function used in computing the sampling interval.

In Figures 5 and 6, we compare the average sampling frequencies achieved by the optimal periodic sampler and the optimal policy π*_2 obtained by solving P2. From Figure 5, we observe the trade-off between achievable sampling frequencies and age penalties. As expected, for an age penalty constraint of one time slot, i.e., d = 1, the achievable sampling frequency is at most 0.5 for both policies. However, π*_2 results in much lower sampling frequencies for lower transition probabilities. In Figure 6, we set d = 1, and thus the optimal periodic sampler samples every 2 time slots with sampling frequency 0.5. On the other hand, π*_2 provides much lower sampling frequencies when either of the transition probabilities is small.

Finally, in Figure 7, we present the ratio between the expected sampling frequency achieved by π*_2 and ν†, under the average age penalty constraint d = 1. We note that under π† the age penalty is always zero. This cannot be achieved by any causal policy with a sampling frequency strictly less than one. Nonetheless, an interesting observation from the figure is that by allowing a small age penalty d = 1, the optimal policy π*_2 can achieve a lower sampling frequency than ν† when the transition probabilities are higher, say p_{12} = 0. and p_{21} = 0. . For lower transition probabilities p_{12} = 0. and p_{21} = 0. , the ratio is always greater than 1, i.e., the optimal policy π*_2 could not achieve the sampling frequency ν† and may require more relaxation of the age penalty constraint. In conclusion, for lower transition probabilities, i.e., if the events become rare, the optimal policy performs worse with respect to the optimal clairvoyant sampling policy.

V. RELATED WORKS
In the AoI literature, the works [6]–[9] considered remote monitoring/estimation of the states of a Markov source. In [6], the authors studied remote state estimation of a two-state Markov chain where the communication delay is geometrically distributed. They computed the average AoI and estimation error for two sampling policies: the zero-wait policy, which generates a sample when the channel is idle, and the sample-at-change policy, which generates a sample when the channel is idle and a transition to a state different from the previous sample occurs. The authors in [7] proposed a freshness metric based on the mutual information between the current state of the source and the received states at a remote monitor, and solved an optimal sampling problem for maximizing the mutual information.

Fig. 4: Average age penalties achieved by π*_1 and the optimal periodic sampler for different p_{12} and p_{21} values.

Fig. 5: Sampling frequency vs. age penalty trade-off: expected sampling frequencies achieved by π*_2 and the optimal periodic sampler for varying age penalty constraint values.

In [8], the authors analysed freshness by proposing a closely related metric based on conditional entropy, where the current state and the past states, up to the generation time of the freshest sample at the monitor, are conditioned on this freshest sample. Displaying the state of a continuous-time Markov chain source at a remote monitor was studied in [9]. The authors analysed the probability of error in displaying the correct state of the source. In our system model, we consider staleness only at the sampler. The age penalty metric we study differs from those in the above works and uniquely captures the trade-off between staleness and sampling frequency by considering the dynamics of the Markov chain.

The problem of when to sample next has been studied for many years in control theory; see for example [10]–[13]. In [10] ([11]), the authors considered the off-line (on-line) problem of choosing the time instants to sample sensor measurements to minimize a Linear Quadratic Gaussian (LQG) cost in a Linear Time Invariant (LTI) system.

Fig. 6: Average sampling frequencies achieved by π*_2 and the optimal periodic sampler for varying p_{12}, and d = 1.

Fig. 7: Ratio between the expected sampling frequency achieved by π*_2 and ν†.

In [12], the authors considered minimizing the squared error distortion for state estimation of a Markov source under a constraint on the maximum number of transmitted samples. We note, however, that in this work the sensor is assumed to sample the process continuously but only transmits certain samples based on some criterion (event triggering). In [13], the authors studied the design of sampling intervals such that the stability of a non-linear stochastic dynamical system is ensured.
In all the above works, the objective is either to minimize an estimation error or control cost, or to ensure stability of the system.

Perhaps the most relevant application of the problem we have studied is the web crawling application [3], [14]. The authors in [14] solved a static optimization problem for computing optimal fixed intervals between downloads for different web pages. To the best of our knowledge, dynamic policies that use the state of the system have not been studied in this line of work; see [3] for a survey. In contrast to the above works, we consider the set of causal sampling policies and study the trade-off between sampling frequency and age penalty for detecting state transitions in a finite-state DTMC.

VI. CONCLUSION
We have studied the trade-off between sampling frequency and staleness for detecting transitions of a DTMC to new states. The staleness of the k-th sample is quantified using the age penalty, defined as the time elapsed since the first transition out of the state observed in the (k−1)-th sample. The formulated problems P1 and P2 are CMDPs and were solved by deriving equivalent LPs. We have provided a closed-form expression for ν†, the sampling frequency under the optimal clairvoyant sampling policy. Even though our problem setting looks simple, the numerical examples revealed that the optimal policies have a randomized Markov policy structure, i.e., simple deterministic optimal policies may not exist for this problem. Apart from the superior performance of the computed optimal policy over the optimal periodic sampling policy, we found that by allowing a small age penalty, the optimal policy achieves a sampling frequency lower than ν† in some cases.

We leave comprehensive simulation results considering N > 2 for future work. We would like to explore different age penalties and study the trade-off when there are multiple sources. Finally, we are interested in studying the problem for different models of the information source.

REFERENCES
[1] S. Kaul, M. Gruteser, V. Rai, and J. Kenney, "Minimizing age of information in vehicular networks," in Proc. IEEE SECON, 2011.
[2] J. R. Norris, Markov Chains. Cambridge University Press, 1997.
[3] C. Olston and M. Najork, "Web crawling," Found. Trends Inf. Retr., vol. 4, no. 3, pp. 175–246, Mar. 2010.
[4] E. Altman, Constrained Markov Decision Processes. Chapman and Hall, 1999.
[5] A. S. Manne, "Linear programming and sequential decisions," Management Science, vol. 6, no. 3, pp. 259–267, 1960.
[6] C. Kam, S. Kompella, G. D. Nguyen, J. E. Wieselthier, and A. Ephremides, "Towards an effective age of information: Remote estimation of a Markov source," in IEEE INFOCOM WKSHPS, Apr. 2018, pp. 367–372.
[7] Y. Sun and B. Cyr, "Information aging through queues: A mutual information perspective," CoRR, vol. abs/1806.06243, 2018.
[8] S. Feng and J.-S. Yang, "Information freshness for timely detection of status changes," ArXiv, vol. abs/2002.04648, 2020.
[9] Y. Inoue and T. Takine, "AoI perspective on the accuracy of monitoring systems for continuous-time Markovian sources," in IEEE INFOCOM WKSHPS, Apr. 2019, pp. 183–188.
[10] H. Kushner, "On the optimum timing of observations for linear control systems with unknown initial state," IEEE Transactions on Automatic Control, vol. 9, no. 2, pp. 144–150, Apr. 1964.
[11] E. Skafidas and A. Nerode, "Optimal measurement scheduling in linear quadratic gaussian control problems," in Proceedings of the 1998 IEEE International Conference on Control Applications, vol. 2, Sep. 1998, pp. 1225–1229.
[12] M. Rabi, G. V. Moustakides, and J. S. Baras, "Adaptive sampling for linear state estimation," SIAM Journal on Control and Optimization, vol. 50, no. 2, pp. 672–702, 2012.
[13] R. P. Anderson, D. Milutinović, and D. V. Dimarogonas, "Self-triggered sampling for second-moment stability of state-feedback controlled SDE systems," Automatica, vol. 54, pp. 8–15, 2015.
[14] J. Cho and H. Garcia-Molina, "Effective page refresh policies for Web crawlers," ACM Transactions on Database Systems, vol. 28, no. 4, pp. 390–426, 2003.