AoI-optimal Joint Sampling and Updating for Wireless Powered Communication Systems
Mohamed A. Abd-Elmagid, Harpreet S. Dhillon, Nikolaos Pappas
Abstract—This paper characterizes the structure of the Age of Information (AoI)-optimal policy in wireless powered communication systems while accounting for the time and energy costs of generating status updates at the source nodes. In particular, for a single source-destination pair in which a radio frequency (RF)-powered source sends status updates about some physical process to a destination node, we minimize the long-term average AoI at the destination node. The problem is modeled as an average cost Markov Decision Process (MDP) in which the generation times of status updates at the source, the transmissions of status updates from the source to the destination, and the wireless energy transfer (WET) are jointly optimized. After proving the monotonicity property of the value function associated with the MDP, we analytically demonstrate that the AoI-optimal policy has a threshold-based structure w.r.t. the state variables. Our numerical results verify the analytical findings and reveal the impact of the state variables on the structure of the AoI-optimal policy. Our results also demonstrate the impact of system design parameters on the optimal achievable average AoI, as well as the superiority of our proposed joint sampling and updating policy w.r.t. the generate-at-will policy.
I. INTRODUCTION
AoI provides a rigorous way of quantifying the freshness of information about a physical process at a destination node based on the status updates it receives from a source node [1]. In [2], AoI was first defined as the time elapsed between the generation of a status update at the source and its reception at the destination. Since then, AoI has been extensively used to quantify the performance of various communication networks that deal with time-sensitive information, including multi-hop networks [3], multicast networks [4], broadcast networks [5], [6], and ultra-reliable low-latency vehicular networks [7]. Interested readers are referred to [8], [9] for comprehensive surveys.

Recently, the concept of AoI has been argued to play an important role in designing freshness-aware Internet of Things (IoT) networks (which can enable a broad range of real-time applications) [10]–[14]. A common assumption in most of the literature on AoI is to neglect the costs of generating status updates; however, IoT devices (the source nodes in the AoI setting) are now expected to perform sophisticated tasks while generating status updates [12], [15]. In that sense, it is crucial to incorporate the energy and time costs of generating status updates in the design of future freshness-aware IoT networks. To further enable the sustainable operation of such networks, RF energy harvesting has emerged as a promising solution for charging low-power IoT devices [16].
M. A. Abd-Elmagid and H. S. Dhillon are with Wireless@VT, Department of ECE, Virginia Tech, Blacksburg, VA (Email: {maelaziz, hdhillon}@vt.edu). N. Pappas is with the Department of Science and Technology, Linköping University, SE-60174 Norrköping, Sweden (Email: [email protected]). The support of the U.S. NSF (Grant CPS-1739642) is gratefully acknowledged.

In particular, the ubiquity of RF signals even at hard-to-reach places makes them more suitable for powering IoT devices than other popular sources of energy harvesting, such as solar or wind. In addition, the implementation of RF energy harvesting modules is usually cost efficient, which is another important aspect of the deployment of IoT devices. The main focus of this paper is to investigate the structural properties of the AoI-optimal joint sampling and updating policy for freshness-aware RF-powered IoT networks.

The AoI-optimal policy for an energy harvesting source has already been investigated under various system settings [17]–[24]. The energy harvesting process is commonly modeled as an independent external stochastic process. However, when the source is assumed to be RF-powered, the harvested energy depends on the channel state information (CSI) and its variation over time, which makes the characterization of the AoI-optimal policies very challenging. It is worth noting that [25]–[27] have very recently explored the AoI-optimal policy in wireless powered communication systems. However, none of the proposed policies took into account the time and energy costs of generating status updates at the source. In addition, [25], [26] did not incorporate the evolution of the battery level at the source and the variation of CSI over time in the process of decision-making.
This paper makes the first attempt to analytically characterize the structural properties of the AoI-optimal joint sampling and updating policy while: i) considering the dynamics of the battery level, AoI, and CSI, and ii) accounting for the costs of generating status updates in the process of decision-making.

Contributions. Our main contribution is the analytical characterization of the structure of the AoI-optimal policy for an RF-powered single source-destination pair system setup while incorporating the time and energy costs of generating status updates at the source. In particular, we model the problem as an average cost MDP with finite state and action spaces, whose corresponding value function is shown to be monotonic w.r.t. the state variables. Using this property, the AoI-optimal policy is proven to have a threshold-based structure w.r.t. different state variables. Our numerical results verify our analytical findings and reveal the impact of the state variables, as well as the energy required for generating a status update at the source, on the structure of the AoI-optimal policy. Our results also demonstrate that the optimal achievable average AoI of our proposed joint sampling and updating policy significantly outperforms the achievable average AoI of the generate-at-will policy.

The theory of MDPs is useful for problems in which the objective is to obtain an optimal mapping between the system state and action spaces. It also allows one to account for the temporal variations of the system state variables in the process of decision-making.
Fig. 1. An illustration of the system setup.
II. SYSTEM MODEL AND PROBLEM FORMULATION
A. Network Model
We consider a single source-destination pair model in which the source contains: i) a sensor that keeps sampling the real-time status of a physical process, and ii) a transmitter that sends status update packets about the observed process to the destination, as shown in Fig. 1. Since the single source-destination pair model may actually be sufficient to study a diverse set of applications [2] (e.g., safety of an intelligent transportation system, predicting and controlling forest fires, and efficient energy utilization in future smart homes), our analysis in this paper will be of interest in many applications. The scenario of having multiple source nodes is left as a promising direction of future work.

We assume that the source node may perform sophisticated sampling tasks, e.g., initial feature extraction and pre-classification using machine learning tools [12]. Hence, unlike most of the existing literature, the time and energy costs of generating an update packet at the source node cannot be neglected. While the destination node is assumed to be always connected to the power grid, the source node is powered through WET by the destination node. Particularly, the destination node transmits RF signals in the downlink to charge the source node. The energy harvested by the source node is then stored in its battery, which has a finite capacity of B_max joules. The source and destination nodes share the same channel and each has a single antenna. Hence, the source can either harvest energy or transmit data at a given time instant.

We assume discrete time with time slots of equal size. Let B(n), A(n) and τ(n) denote the amount of available energy in the battery at the source node, the AoI at the destination node, and the time elapsed since the generation instant of the current update packet available at the source node (i.e., the AoI of the status updates at the source node), respectively, at the beginning of time slot n.
Denote by h(n) and g(n) the uplink and downlink channel power gains between the source and destination nodes over slot n, respectively. We assume that the channels are subject to quasi-static flat fading. This, in turn, means that the channels are fixed over a time slot, and vary independently from one slot to another.

B. State and Action Spaces
The state of the system at slot n can be expressed as s(n) ≜ (B(n), A(n), τ(n), h(n), g(n)) ∈ S, where S is the state space containing all combinations of the system state variables. We also assume that the state variables take discrete values, which yields a lower bound on the performance of the continuous system (as will be clear in the sequel). In particular, we have B(n) ∈ {0, 1, ..., b_max}, where b_max denotes the battery capacity in energy quanta, such that each energy quantum in the battery is equivalent to e_q = B_max / b_max joules. Note that both the energy consumed from the battery for an update packet transmission and the harvested energy need to be expressed in terms of energy quanta. In addition, if the channel power gains are originally modeled using continuous random variables, we discretize them into a finite number of intervals whose probabilities are determined from the probability density function (PDF) of the fading gain. Each interval is then represented by a discrete level of channel power gain which carries the same probability as that interval. Without loss of generality, we also assume that A(n) (τ(n)) is upper bounded by a finite value A_max (τ_max), which can be chosen to be arbitrarily large [12], [22]. When A(n) reaches A_max, the available information at the destination node is too stale to be of any use.

Based on s(n), two actions are decided at slot n: i) the first action a_1(n) ∈ A_1 ≜ {S, I} determines whether the source generates a new update packet in slot n or not, and ii) the second action a_2(n) ∈ A_2 ≜ {T, H} determines whether slot n is allocated for an update packet transmission from the source to the destination, or for WET by the destination. Specifically, when a_1(n) = S, a new update packet is generated by the source, which replaces the currently available one, if any, since there is no benefit of sending out-of-date packets to the destination.
We also consider that generating an update packet takes one time slot (as a time cost) and requires an amount of energy E_S (as an energy cost, expressed in energy quanta). When a_2(n) = T, the source sends its currently available packet (that was generated τ(n) time slots earlier) to the destination. The required energy for a packet transmission of size M bits in slot n, according to Shannon's formula, is

E_T(n) = (σ / h(n)) (2^{M/W} − 1),

where σ is the noise power at the destination and W is the channel bandwidth. When a_2(n) = H, slot n is allocated for WET by the destination to charge the battery at the source. We consider a practical non-linear energy harvesting model [29] such that the energy harvested by the source is given by

E_H(n) = P_max (1 − exp[−a P_rec(n)]) / (1 + exp[−a (P_rec(n) − b)]),     (1)

where a and b are constants representing the steepness and the inflection point of the curve that describes the input-output power conversion, P_max is the maximum power that can be harvested through a particular circuit configuration, and P_rec(n) = P_t g(n), where P_t is the average transmit power of the destination. Hence, the system action at slot n can be expressed as a(n) = (a_1(n), a_2(n)) ∈ A = A_1 × A_2, where A is the action space of the system.

Note that constructing a finite state space of an MDP by discretizing the state variables and/or defining upper bounds on their maximum values is very common in the literature, both to obtain the optimal policy numerically and to characterize its structural properties analytically using standard techniques such as the Value Iteration Algorithm (VIA) or the Policy Iteration Algorithm (PIA). See [12], [22], [23], [28] for representative examples.
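For concreteness, the two per-slot energy quantities above can be sketched in Python as follows (a minimal sketch: all default parameter values are illustrative placeholders rather than the setup used in Sec. IV, and a unit-length slot is assumed so that energy and power coincide numerically):

```python
import math

def transmit_energy(h, M=1e6, W=1e6, sigma=1e-13):
    """Energy needed to deliver M bits over bandwidth W in one
    unit-length slot, inverting Shannon's formula:
    E_T = (sigma / h) * (2**(M / W) - 1)."""
    return (sigma / h) * (2.0 ** (M / W) - 1.0)

def harvested_energy(g, P_t=5.0, P_max=0.0158, a=1500.0, b=0.0022):
    """Non-linear RF energy-harvesting model of Eq. (1):
    E_H = P_max * (1 - exp(-a * P_rec)) / (1 + exp(-a * (P_rec - b))),
    with received power P_rec = P_t * g."""
    P_rec = P_t * g
    return P_max * (1.0 - math.exp(-a * P_rec)) / (1.0 + math.exp(-a * (P_rec - b)))
```

Note that harvested_energy(0.0) returns 0 and the output saturates near P_max as the received power grows, matching the steepness/inflection interpretation of the constants a and b in (1).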
Note that the system state is assumed to be available at the destination node at the beginning of each time slot to take decisions. In particular, we assume that the location of the source node is known a priori, and hence the average channel power gains are pre-estimated and known at the destination node. Moreover, at the beginning of an arbitrary time slot, the destination node has perfect knowledge of the channel power gains in that slot, and only statistical knowledge for future slots [28]. Further, given some initial values for the remaining system state parameters (i.e., B(0), τ(0) and A(0)), the destination node updates their values based on the action taken at each time slot. More specifically, B(n+1) can be expressed as a function of the system action a(n) at slot n as

B(n+1) =
  B(n) − ⌈E_T(n)/e_q⌉,                      if a(n) = (I, T),
  B(n) − E_S − ⌈E_T(n)/e_q⌉,                if a(n) = (S, T),
  min{b_max, B(n) + ⌊E_H(n)/e_q⌋},          if a(n) = (I, H),
  min{b_max, B(n) − E_S + ⌊E_H(n)/e_q⌋},    if a(n) = (S, H),     (2)

where we used the ceiling and floor operators with E_T(n) and E_H(n), respectively. Thus, we obtain a lower bound on the performance of the original continuous system. An upper bound can be obtained by swapping the ceiling and floor operators. Let A(s(n)) denote the action space associated with state s(n), i.e., A(s(n)) contains the possible actions that can be taken at s(n). We assume that a(n) ∈ A(s(n)) only if B(n) is no smaller than the energy required for taking action a(n); hence we always have B(n+1) ≥ 0. Furthermore, A(n+1) and τ(n+1) can be expressed, respectively, as

A(n+1) =
  min{A_max, τ(n) + 1},    if a(n) = (a_1(n), T),
  min{A_max, A(n) + 1},    otherwise,
(3)

τ(n+1) =
  1,                         if a(n) = (S, a_2(n)),
  min{τ_max, τ(n) + 1},      otherwise,     (4)

where a(n) = (a_1(n), T) means that a(n) ∈ {(I, T), (S, T)}; the notation (S, a_2(n)) is interpreted analogously w.r.t. a_2(n).

C. Problem Formulation
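The per-slot dynamics (2)-(4), which drive the MDP formulated next, can be sketched in Python as follows (a minimal sketch: b_max, A_max, tau_max and E_S below are illustrative placeholders rather than the paper's configuration, and the transmit/harvested energies are assumed to be pre-converted to quanta):

```python
def next_state(B, A, tau, E_T_q, E_H_q, action,
               b_max=20, A_max=50, tau_max=50, E_S=4):
    """One-step state update per Eqs. (2)-(4).

    action = (a1, a2): a1 in {'S', 'I'} (sample / idle),
    a2 in {'T', 'H'} (transmit / harvest).
    E_T_q = ceil(E_T(n)/e_q) and E_H_q = floor(E_H(n)/e_q)
    are the transmit and harvested energies in quanta.
    """
    a1, a2 = action
    sampling_cost = E_S if a1 == 'S' else 0
    # Battery evolution, Eq. (2)
    if a2 == 'T':
        B_next = B - sampling_cost - E_T_q
    else:
        B_next = min(b_max, B - sampling_cost + E_H_q)
    assert B_next >= 0, "action infeasible in this state"
    # AoI at the destination, Eq. (3): a delivered packet is tau + 1 slots old
    A_next = min(A_max, tau + 1) if a2 == 'T' else min(A_max, A + 1)
    # Age of the packet held at the source, Eq. (4)
    tau_next = 1 if a1 == 'S' else min(tau_max, tau + 1)
    return B_next, A_next, tau_next
```

For instance, next_state(10, 5, 3, 2, 0, ('I', 'T')) yields (8, 4, 4): the battery pays the transmission cost, the destination AoI drops to the delivered packet's age, and the packet held at the source keeps aging.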
A policy is a mapping from the system state space to the system action space. Under a policy π, the long-term average AoI at the destination with initial state s(0) is given by

¯A_π ≜ lim sup_{N→∞} (1/(N+1)) Σ_{n=0}^{N} E[A(n) | s(0)],     (5)

where the expectation is taken w.r.t. the channel conditions and the policy. We then aim at finding the policy π* that achieves the minimum average AoI, i.e.,

π* = arg min_π ¯A_π.     (6)

Owing to the independence of the channel power gains over time and the nature of the dynamics of the remaining state variables, as described by (2)-(4), the problem can be modeled as an MDP. Recall that the system state space is finite (the state variables are discretized), and the system action space is clearly finite as well. In this case, the MDP at hand is a finite-state finite-action MDP, for which there exists an optimal stationary deterministic policy (i.e., a deterministic action is taken at each state that is fixed over time) that can be obtained using the VIA or PIA [30]. Therefore, in the sequel, we omit the time index and explore this stationary deterministic policy. In the next section, we characterize the AoI-optimal policy π* and derive its structural properties.

III. ANALYSIS OF THE AOI-OPTIMAL POLICY
A. Optimal Policy Characterization
Given a stationary deterministic policy π, the probability of moving from state s = (B, A, τ, h, g) to state s′ = (B′, A′, τ′, h′, g′) can be expressed as

P(s′ | s, π(s)) ≜ P(B′, A′, τ′, h′, g′ | B, A, τ, h, g, π(s))
  (a)= P(B′, A′, τ′ | B, A, τ, h, g, π(s)) P(h′) P(g′)
  (b)= C P(B′ | B, h, g, π(s)) P(A′ | A, τ, π(s)) P(τ′ | τ, π(s)),     (7)

where π(s) denotes the action taken at state s according to π, P(h′) and P(g′) denote the probability mass functions of the uplink and downlink channel power gains, and C = P(h′) P(g′). Step (a) follows since the channel power gains are independent over time, of each other, and of the other random variables. Note that for a Markovian fading channel model, the conditional probabilities P(h′ | h) and P(g′ | g) would replace P(h′) and P(g′), respectively; these conditional probabilities are determined by the Markovian fading model considered in the problem. However, all our analytical results regarding the structure of the AoI-optimal policy (derived in the next subsection) would remain the same. Step (b) follows from the fact that, given s and π(s), we can obtain B′, A′ and τ′ deterministically and separately from each other using (2)-(4). The optimal policy π* can be characterized using the following lemma.

Lemma 1.
The policy π* can be obtained by solving the following Bellman's equation for average cost MDPs [30]:

¯A* + V(s) = min_{a ∈ A(s)} Q(s, a),  s ∈ S,     (8)

where V(s) is the value function, ¯A* is the average AoI achieved by π* (which is independent of the initial state s(0)), and Q(s, a) is the expected cost due to taking action a in state s, given by

Q(s, a) = A + Σ_{s′ ∈ S} P(s′ | s, a) V(s′),     (9)

where P(s′ | s, a) can be computed using (7). In addition, the optimal action taken at state s can be evaluated as

π*(s) = arg min_{a ∈ A(s)} Q(s, a).     (10)

The value function V(s) can be obtained iteratively using the VIA [30]. Particularly, according to the VIA, the value function at iteration k, k = 1, 2, ..., is evaluated as

V(s)^(k) = min_{a ∈ A(s)} Q(s, a)^(k−1) = min_{a ∈ A(s)} { A + Σ_{s′ ∈ S} P(s′ | s, a) V(s′)^(k−1) },     (11)

where s ∈ S. Hence, π*(s) at iteration k is given by

π*^(k)(s) = arg min_{a ∈ A(s)} Q(s, a)^(k−1).     (12)

Note that in each iteration of the VIA, the optimal action at each system state needs to be computed using (12) (this is referred to as the policy improvement step). Under any initialization of the value function V(s)^(0), according to the VIA, the sequence {V(s)^(k)} converges to V(s), which satisfies the Bellman's equation in (8), i.e.,

lim_{k→∞} V(s)^(k) = V(s).     (13)

In the next subsection, we will use the VIA to explore the structural properties of π*, which will then be exploited to reduce the computational complexity of the VIA (as demonstrated in Remark 2). Note that the obtained analytical results can be derived using the Relative VIA (RVIA) as well [30].

B. Structural Properties of the Optimal Policy
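As a concrete illustration of the recursion in (11)-(12) that underpins the results below, the following minimal Python sketch runs relative value iteration on a toy three-state average-cost MDP (the chain, costs, and actions are illustrative placeholders, not the paper's state construction):

```python
import numpy as np

def value_iteration(P, cost, n_iters=2000):
    """Relative value iteration for an average-cost MDP.

    P[a] is an (S x S) transition matrix for action a, and cost[s] is the
    per-stage cost of state s (the role played by A in Eq. (9)). Per-state
    action feasibility is ignored in this toy sketch. Returns the value
    function (anchored at state 0), the greedy policy, and the average cost.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Q(s, a) = cost(s) + sum_{s'} P(s'|s,a) V(s')   -- Eq. (9)
        Q = cost[None, :] + P @ V          # shape (n_actions, n_states)
        V_new = Q.min(axis=0)              # Eq. (11)
        avg_cost = V_new[0] - V[0]         # gain estimate at the anchor state
        V = V_new - V_new[0]               # relative VIA: anchor state 0
    policy = Q.argmin(axis=0)              # Eq. (12)
    return V, policy, avg_cost

# Toy 3-state chain: cost grows with the state index; action 0 drifts
# upward, action 1 resets to state 0 at no cost.
P = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0: age
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # action 1: reset
])
cost = np.array([0.0, 1.0, 2.0])
V, policy, avg_cost = value_iteration(P, cost)
```

On this toy chain the greedy policy always resets (resetting is free here), and the computed value function is non-decreasing in the per-stage cost, mirroring the kind of monotonicity established in Lemma 2 below.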
Lemma 2.
The value function V(s) corresponding to π* is: (i) non-decreasing w.r.t. A and τ, and (ii) non-increasing w.r.t. B, g and h.

Proof: We first prove that V(B, A, τ, h, g) is non-decreasing w.r.t. A. Let s\x denote the combination of the variables of state s excluding the variable x. Define s_1 = (B_1, A_1, τ_1, h_1, g_1) and s_2 = (B_2, A_2, τ_2, h_2, g_2) such that A_1 ≤ A_2 and s_1\A_1 = s_2\A_2. Therefore, the goal is to show that V(s_1) ≤ V(s_2). Clearly, it is sufficient to show that this relation holds over all iterations of the VIA, i.e., V(s_1)^(k) ≤ V(s_2)^(k), ∀k. We prove this by mathematical induction as follows. For k = 0, the relation holds since we can choose the initial values {V(s)^(0)}_{s ∈ S} arbitrarily. Now, for an arbitrary value of k, we show that V(s_1)^(k) ≤ V(s_2)^(k) leads to V(s_1)^(k+1) ≤ V(s_2)^(k+1). From (11) and (12), V(s_1)^(k+1) and V(s_2)^(k+1) are given, respectively, by

V(s_1)^(k+1) = A_1 + Σ_{s′ ∈ S} P(s′ | s_1, π*^(k+1)(s_1)) V(s′)^(k)
  (a)≤ A_1 + Σ_{s′ ∈ S} P(s′ | s_1, π*^(k+1)(s_2)) V(s′)^(k)
  (b)= A_1 + C Σ_{g′} Σ_{h′} V(B̄_1, Ā_1, τ̄_1, h′, g′)^(k),     (14)

V(s_2)^(k+1) = A_2 + Σ_{s′ ∈ S} P(s′ | s_2, π*^(k+1)(s_2)) V(s′)^(k)
  (b)= A_2 + C Σ_{g′} Σ_{h′} V(B̄_2, Ā_2, τ̄_2, h′, g′)^(k),     (15)

where step (a) follows since the action π*^(k+1)(s_2) is not necessarily optimal in state s_1; step (b) follows from (2)-(4) and (7), where for a given π*^(k+1)(s_2): 1) B̄_i and τ̄_i are determined using (2) and (4), respectively, and 2) Ā_i is evaluated from (3), i ∈ {1, 2}. Note that B̄_1 = B̄_2 and τ̄_1 = τ̄_2 for any π*^(k+1)(s_2) ∈ A, since we have B_1 = B_2 and τ_1 = τ_2. On the other hand, since A_1 ≤ A_2, we can observe from (3) that Ā_1 ≤ Ā_2 for any π*^(k+1)(s_2) ∈ A, and hence V(B̄_1, Ā_1, τ̄_1, h′, g′)^(k) ≤ V(B̄_2, Ā_2, τ̄_2, h′, g′)^(k). Therefore, V(s_2)^(k+1) is greater than or equal to the expression in (14), which makes V(s_1)^(k+1) ≤ V(s_2)^(k+1) and indicates that the value function is non-decreasing w.r.t. A. Using the same approach, we can show that V(B, A, τ, h, g) is non-decreasing (non-increasing) w.r.t. τ (B). Finally, note that increasing h (g) reduces E_T (increases E_H), which increases the battery level at the next time slot, and hence the value function is reduced. Therefore, V(B, A, τ, h, g) is non-increasing w.r.t. h and g.

Using the monotonicity property of the value function, as demonstrated by Lemma 2, the following theorem characterizes some structural properties of the AoI-optimal policy π*.

Theorem 1.
For any s_1 = (B_1, A_1, τ_1, h_1, g_1) and s_2 = (B_2, A_2, τ_2, h_2, g_2), the AoI-optimal policy π* has the following structural properties (E_H1 denotes the harvested energy under the common channel state of s_1 and s_2):
(i) When B_1 ≥ B_2, s_1\B_1 = s_2\B_2 and B_2 ≥ b_max − ⌊E_H1/e_q⌋: if π*(s_1) = (I, H), then π*(s_2) = (I, H).
(ii) When B_1 ≥ B_2, s_1\B_1 = s_2\B_2 and B_2 ≥ b_max − ⌊E_H1/e_q⌋ + E_S: if π*(s_1) = (a_1, H), then π*(s_2) = (a_1, H).
(iii) When A_1 ≥ A_2 and s_1\A_1 = s_2\A_2: if π*(s_2) = (a_1, T), then π*(s_1) = (a_1, T).
(iv) When τ_1 ≥ τ_2 and s_1\τ_1 = s_2\τ_2: if π*(s_2) = (S, a_2), then π*(s_1) = (S, a_2).

Proof: We first notice from (10) that when π*(s_i) = a, we have Q(s_i, a) − Q(s_i, a′) ≤ 0, ∀a′ ∈ A(s_i). Hence, proving that π*(s_i) = a leads to π*(s_j) = a is equivalent to showing that

Q(s_j, a) − Q(s_j, a′) ≤ Q(s_i, a) − Q(s_i, a′), ∀a′ ≠ a.     (16)

For instance, to prove (i), we need to show that (16) holds with i = 1, j = 2, a = (I, H) and a′ ∈ {(I, T), (S, H), (S, T)}. In the following, we prove part (i); parts (ii), (iii) and (iv) can be proven similarly. According to (2), the next battery level for both states s_1 and s_2 when taking action a = (I, H) is b_max, since we have B_1 ≥ B_2 and B_2 ≥ b_max − ⌊E_H1/e_q⌋. Therefore, we have Q(s_1, a) = Q(s_2, a) since s_1\B_1 = s_2\B_2, and showing that (16) holds for (i) reduces to showing that Q(s_1, a′) ≤ Q(s_2, a′), ∀a′ ≠ a. Now, since B_1 ≥ B_2, we note from (2) that the next battery level of s_1 is greater than or equal to the next battery level associated with s_2 for all possible values of a′ ≠ a. Therefore, based on Lemma 2 (V(s) is non-increasing w.r.t. B), we have Q(s_1, a′) ≤ Q(s_2, a′), ∀a′ ≠ a, from (7) and (9). This completes the proof of (i).

Remark 1.
Theorem 1 demonstrates the threshold-based structure of the AoI-optimal policy π* w.r.t. each of the system state variables. Specifically, from (i) and (ii), we can see that π* has a threshold-based structure w.r.t. B when taking action (I, H) for B ≥ b_max − ⌊E_H1/e_q⌋ (when taking action (a_1, H) for B ≥ b_max − ⌊E_H1/e_q⌋ + E_S). For instance, for a fixed s\B, if B_th is the maximum value of B ≥ b_max − ⌊E_H1/e_q⌋ for which it is optimal to take action a = (I, H), then for all states s with b_max − ⌊E_H1/e_q⌋ ≤ B ≤ B_th, the optimal decision is (I, H) as well. Similarly, from (iii) and (iv), we observe that π* has a threshold-based structure w.r.t. A and τ when taking actions (a_1, T) and (S, a_2), respectively. This essentially means that π* aims to restrict the occurrence of the scenario of having a large AoI value at the destination node. In fact, in such a scenario, π* would allocate a time slot for an update packet transmission as soon as the source node has the energy required for performing that action, so that the average AoI at the destination node (expressed in (5)) is minimized. One can also show that (16) does not necessarily hold when the ordering conditions are reversed, i.e., B_1 < B_2 in parts (i) and (ii), A_1 < A_2 in part (iii), or τ_1 < τ_2 in part (iv). Because of this, structural properties cannot be established in those cases.

Remark 2.
Based on Remark 1, the threshold-based structure of π* w.r.t. the system state variables can be exploited to reduce the computational complexity of the VIA in terms of the number of required evaluations. More specifically, due to the threshold-based structure of π*, the optimal actions at some states can be directly determined from the optimal actions taken at other states without performing any evaluations. This, in turn, reduces the number of evaluations needed for the policy improvement step, and hence the computational complexity of the VIA. We refer the readers to [12], [31] for a detailed treatment of this point.

IV. NUMERICAL RESULTS
We model the uplink and downlink channel power gains between the source and destination as g = h = δθd^{−β}, where δ is the signal power gain at a distance of 1 meter, d^{−β} models power-law path-loss with exponent β, and θ ∼ exp(1) denotes the small-scale fading gain. Each state variable is discretized into a finite number of levels. Considering a simulation setup similar to that of [29], we use W = 1 MHz, d = 25 meters, P_t = 37 dBm, P_max = 12 dBm, σ = − dBm, M = 12 Mbits, B_max = 0. mjoules, a = 1500, b = 0. , δ = 4 × 10^{−}, and β = 2. We also consider that the sensitivity of the power received at the RF energy harvesting circuit is − dBm. Note that we use the red (blue) color to represent a_2 = T (a_2 = H), whereas the circle (square) marker represents a_1 = S (a_1 = I).

First, from Figs. 2a, 2b, 2c and 2d, we can verify the analytical structural properties of π* derived in Theorem 1. For instance, we can observe from Figs. 2a and 2b that π* has a threshold-based structure w.r.t. A (τ) when action (a_1, T) (action (S, a_2)) is taken, as derived in parts (iii) and (iv) of Theorem 1. In addition, parts (i) and (ii) of Theorem 1 can be verified from Fig. 2c. For instance, since ⌊E_H/e_q⌋ = 9 and E_S = 4, we can see that: 1) since the optimal action at the point (3, ·) is (I, H), it is optimal to take action (I, H) at the points (B, ·) with B ≤ 3 satisfying the condition of part (i) of Theorem 1, and 2) since the optimal action at the point (9, ·) is (S, H), it is optimal to take action (S, H) at the points (B, ·) with B ≤ 9 satisfying the condition of part (ii) of Theorem 1. Second, the impact of E_S on π* is revealed in Figs. 2a and 2b, where ⌈E_T/e_q⌉ = 2. In particular, we discuss this impact in two different regimes: 1) the value of E_S is comparable with B (E_S/B = 3/5 in Fig. 2a), and 2) E_S is small w.r.t. B (E_S/B = 3/9 in Fig. 2b).
We observe that when E_S is comparable with B and τ is relatively large, it is optimal to take action (S, H) and save the energy that could be used for an update packet transmission for future packet transmissions when τ is small. Note that this insight can also be obtained for small values of A (e.g., A = 1 in Fig. 2d). Third, we show the impact of M on the optimal achievable average AoI (¯A*) in Fig. 2e. As expected, ¯A* monotonically increases w.r.t. M, since the larger M is, the larger is the energy E_T required for its transmission. Finally, in Fig. 2f, we demonstrate the importance of our proposed joint sampling and updating policy by comparing its achievable average AoI with that of the generate-at-will policy proposed in [27]. The generate-at-will policy only decides whether to allocate each time slot for an update packet transmission or for WET, such that update packets are generated only at the beginning of the time slots allocated for update packet transmissions. This means that the generate-at-will policy does not optimize the timing of update packet generations, and hence Fig. 2f captures the impact of optimally generating update packets on ¯A*. We observe from Fig. 2f that the average AoI achieved by our proposed policy significantly outperforms that of the generate-at-will policy [27], especially when M is large and/or when E_S is large. This happens since it becomes crucial in such cases to wisely decide the timing of update packet generations, so that the energy available in the battery can be efficiently utilized to achieve a small value of average AoI.

V. CONCLUSION
This paper has studied the long-term average AoI minimization problem for wireless powered communication systems while taking into account the costs of generating status updates at the source nodes. The problem was modeled as an average cost MDP whose corresponding value function was shown to be monotonic w.r.t. the state variables. We analytically demonstrated the threshold-based structure of the AoI-optimal policy w.r.t. the state variables. Our numerical results revealed that when the energy required for an update packet generation is comparable with the energy available in the battery, the optimal action mainly depends on the time elapsed since the generation of the current packet available at the source. In particular, it is optimal to generate a new update packet if the current packet available at the source was generated a relatively long time ago. Our results also demonstrated the importance of optimally generating status updates by showing that our proposed joint sampling and updating policy significantly outperforms the generate-at-will policy in terms of the achievable average AoI. A promising avenue of future work is to extend our analysis and results to the scenario with multiple source nodes. Given the prohibitive complexity resulting from the extreme curse of dimensionality in the state space of the associated MDP, it is difficult to tackle that problem with conventional approaches. A feasible option is to use deep reinforcement learning-based algorithms to reduce the complexity of the state space while learning the optimal policy at the same time.
Fig. 2. Structure of the AoI-optimal policy: (a) E_S = 3 and B = g = h = 5, (b) E_S = 3, g = h = 5, and B = 9, (c) E_S = 4, g = h = 6, and A = 5, and (d) E_S = 4, g = h = 6, and A = 1. System design insights: (e) impact of E_S on the optimal achievable average AoI, and (f) comparison between the performance of the proposed joint sampling and updating policy and that of the generate-at-will policy proposed in [27].

REFERENCES
[1] M. A. Abd-Elmagid, N. Pappas, and H. S. Dhillon, "On the role of age of information in the Internet of things," IEEE Commun. Magazine, vol. 57, no. 12, pp. 72–77, 2019.
[2] S. Kaul, R. Yates, and M. Gruteser, "Real-time status: How often should one update?" in Proc., IEEE INFOCOM, 2012.
[3] R. Talak, S. Karaman, and E. Modiano, "Minimizing age-of-information in multi-hop wireless networks," in Proc., Allerton Conf. on Commun., Control, and Computing, 2017.
[4] B. Buyukates, A. Soysal, and S. Ulukus, "Age of information in two-hop multicast networks," in Proc., IEEE Asilomar, 2018.
[5] I. Kadota, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, "Minimizing the age of information in broadcast wireless networks," in Proc., Allerton Conf. on Commun., Control, and Computing, 2016.
[6] M. Bastopcu and S. Ulukus, "Who should Google Scholar update more often?" in Proc., IEEE INFOCOM Workshops, 2020.
[7] M. K. Abdel-Aziz, C.-F. Liu, S. Samarakoon, M. Bennis, and W. Saad, "Ultra-reliable low-latency vehicular networks: Taming the age of information tail," in Proc., IEEE Globecom, 2018.
[8] A. Kosta, N. Pappas, and V. Angelakis, "Age of information: A new concept, metric, and tool," Foundations and Trends in Networking, 2017.
[9] Y. Sun, I. Kadota, R. Talak, and E. Modiano, "Age of information: A new metric for information freshness," Synthesis Lectures on Communication Networks, 2019.
[10] Y. Gu, H. Chen, Y. Zhou, Y. Li, and B. Vucetic, "Timely status update in Internet of things monitoring systems: An age-energy tradeoff," IEEE Internet of Things Journal, vol. 6, no. 3, pp. 5324–5335, 2019.
[11] M. A. Abd-Elmagid and H. S. Dhillon, "Average peak age-of-information minimization in UAV-assisted IoT networks," IEEE Trans. on Veh. Technology, vol. 68, no. 2, pp. 2003–2008, Feb. 2019.
[12] B. Zhou and W. Saad, "Joint status sampling and updating for minimizing age of information in the Internet of Things," IEEE Trans. on Commun., vol. 67, no. 11, pp. 7468–7482, 2019.
[13] M. A. Abd-Elmagid, A. Ferdowsi, H. S. Dhillon, and W. Saad, "Deep reinforcement learning for minimizing age-of-information in UAV-assisted networks," in Proc., IEEE Globecom, 2019.
[14] P. D. Mankar, Z. Chen, M. A. Abd-Elmagid, N. Pappas, and H. S. Dhillon, "Throughput and age of information in a cellular-based IoT network," 2020, available online: arxiv.org/abs/2005.09547.
[15] E. Fountoulakis, N. Pappas, M. Codreanu, and A. Ephremides, "Optimal sampling cost in wireless networks with age of information constraints," in Proc., IEEE INFOCOM Workshops, 2020.
[16] M. A. Abd-Elmagid, M. A. Kishk, and H. S. Dhillon, "Joint energy and SINR coverage in spatially clustered RF-powered IoT network,"
IEEETrans. on Green Commun. and Networking , vol. 3, no. 1, pp. 132–146,March 2019. [17] R. D. Yates, “Lazy is timely: Status updates by an energy harvestingsource,” in
Proc., IEEE ISIT , 2015.[18] B. T. Bacinoglu, E. T. Ceran, and E. Uysal-Biyikoglu, “Age of infor-mation under energy replenishment constraints,” in
Proc., IEEE ITA ,2015.[19] X. Wu, J. Yang, and J. Wu, “Optimal status update for age of informationminimization with an energy harvesting source,”
IEEE Trans. on GreenCommun. and Networking , vol. 2, no. 1, pp. 193–204, 2018.[20] S. Feng and J. Yang, “Age of information minimization for an energyharvesting source with updating erasures: With and without feedback,”2018, available online: arxiv.org/abs/1808.05141.[21] A. Arafa and S. Ulukus, “Timely updates in energy harvesting two-hop networks: Offline and online policies,”
IEEE Trans. on WirelessCommun. , vol. 18, no. 8, pp. 4017–4030, Aug. 2019.[22] E. T. Ceran, D. G¨und¨uz, and A. Gy¨orgy, “Reinforcement learning tominimize age of information with an energy harvesting sensor with harqand sensing cost,” in
Proc., IEEE INFOCOM Workshops , 2019.[23] G. Stamatakis, N. Pappas, and A. Traganitis, “Control of status updatesfor energy harvesting devices that monitor processes with alarms,” in
Proc., IEEE GLOBECOM Workshops , 2019.[24] O. Ozel, “Timely status updating through intermittent sensing andtransmission,” 2020, available online: arxiv.org/abs/2001.01122.[25] Y. Lu, K. Xiong, P. Fan, Z. Zhong, and K. B. Letaief, “Optimal onlinetransmission policy in wireless powered networks with urgency-awareage of information,” in
Proc., IEEE IWCMC , 2019.[26] I. Krikidis, “Average age of information in wireless powered sensornetworks,”
IEEE Wireless Commun. Letters , 2019.[27] M. A. Abd-Elmagid, H. S. Dhillon, and N. Pappas, “A reinforcementlearning framework for optimizing age of information in RF-poweredcommunication systems,”
IEEE Trans. Commun. , to appear.[28] A. Biason and M. Zorzi, “Battery-powered devices in wpcns,”
IEEETransactions on Communications , vol. 65, no. 1, pp. 216–229, 2017.[29] E. Boshkovska, D. W. K. Ng, N. Zlatanov, and R. Schober, “Practicalnon-linear energy harvesting model and resource allocation for SWIPTsystems,”
IEEE Commun. Letters , 2015.[30] D. P. Bertsekas, “Dynamic programming and optimal control 3rd edition,volume ii,”
Belmont, MA: Athena Scientific , 2011.[31] Y.-P. Hsu, E. Modiano, and L. Duan, “Scheduling algorithms forminimizing age of information in wireless broadcast networks withrandom arrivals,”