Channels, Remote Estimation and Queueing Systems With A Utilization-Dependent Component: A Unifying Survey Of Recent Results
Varun Jog, Richard J. La and Nuno C. Martins
Abstract—In this article, we survey the main models, techniques, concepts, and results centered on the design and performance evaluation of engineered systems that rely on a utilization-dependent component (UDC) whose operation may depend on its usage history or assigned workload. More specifically, we report on research themes concentrating on the characterization of the capacity of channels and the design with performance guarantees of learning algorithms, queueing and remote estimation systems. Causes for the dependency of a UDC on past utilization include the use of replenishable energy sources to power the transmission of information among the sub-components of a networked system, the influence of the dynamics of optimization iterates on the convergence of learning mechanisms and the assistance of a human operator for servicing a queue. Our analysis unveils the similarity of the UDC models typically adopted in each of the research themes, and it reveals the differences in the objectives and technical approaches employed. We also identify new challenges and future research directions inspired by the cross-pollination among the central concepts, techniques and problem formulations of the research themes discussed.
Index Terms—Channel capacity, learning algorithms, task scheduling, queueing, remote estimation, energy harvesting, human factors, age of information, security.
I. INTRODUCTION
As new technologies and applications emerge, the algorithms that determine the functionality and regulate the operation of engineered systems have to contend with unexampled constraints and nonstandard problems. This evolution has been evident for communication [1], cyber-physical [2], [3], human-assisted [4], networked estimation [5] and control [6] systems, which are now designed for maximal performance subject to restrictions that are more intricate than the conventional limits on reliability and power usage. In this article, we provide a partial account of such advances by surveying models, concepts and results on the characterization of channel capacity, and the design and performance analysis of learning algorithms, queueing and remote estimation systems, all of which have in common the unconventional attribute of relying on a component whose performance may be constrained by its usage history and possibly also be affected by the workload assigned to it. We refer succinctly to this class of components
as UDC, which stands for utilization-dependent component. We do not aim at a comprehensive survey; instead, we will cite a selection of published work relevant to each key concept, problem formulation or technique on an as-needed basis for illustration.

A primary goal of this article is to foster future research that builds on the cross-pollination among the methods and problem formulations originally developed and employed on each of the research themes broached.

The authors are listed according to the lexicographical order of their last names. Varun Jog is with the Electrical and Computer Engineering Department at UW-Madison. Richard J. La and Nuno Miguel Lara Cintra Martins are with the Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland, College Park, MD, 20742 USA. E-mails: [email protected], {hyongla,nmartins}@umd.edu. Work on this article was supported by AFOSR grant FA95501510367, NSF grant ECCS 1446785 and NIST grant 70NANB16H024.
Paper structure:
After the Introduction, in Section II we define a class of UDC that is general enough to model the performance restrictions imposed by the reliance on the energy harvested from stochastic sources, human-assisted decision-making, or human labor. In Section III, we introduce widely-used models quantifying certain performance-limiting factors, such as mental workload, queueing workload and the state of charge of the battery of an energy harvesting module. Subsequently, in Sections IV-VII, we employ these definitions as a unifying framework to discuss research on methods to design and analyze the performance of systems comprising a UDC in the context of communication, learning algorithms, queueing and remote estimation, respectively. This article ends with the conclusions and future directions proposed in Section VIII.

II. A GENERAL UTILIZATION-DEPENDENT COMPONENT (UDC) MODEL
We start by proposing a model that is general enough to describe all types of UDC considered throughout this article. Without loss of generality, we limit our discussion to discrete-time processes and models. Namely, time takes values in the set of non-negative integers N, and we use N+ to indicate the subset that excludes 0.

Definition 1. (UDC Model)
The following describes the two main sub-components of the UDC model (see Fig. 1).

• The first sub-component is a partially-observed controlled Markov chain (POCMC), whose state is represented as S := {S(k) : k ∈ N}. It has two inputs denoted as Y := {Y(k) : k ∈ N} and U := {U(k) : k ∈ N}, where the latter is an external control process. The outputs are indicated as O := {O(k) : k ∈ N} and W := {W(k) : k ∈ N}. The former is characterized by an output kernel and is available to the policy that generates U and, in some cases, also X, while the latter is a deterministic function of S that we refer to as the performance process. The processes U, O, S and W take values in given alphabets U, O, S and W, respectively, which are subsets of real coordinate spaces. The POCMC is specified by maps S : S × U × Y → [0, 1], O : O × S × U → [0, 1] and W : U × S → W. The first two determine the state transition probability and the output kernel as follows:

S(s+ | s, u, y) := P_{S(k+1) | S(k), U(k), Y(k)}(s+ | s, u, y),  s, s+ ∈ S, u ∈ U, y ∈ Y

O(v | s, u) := P_{O(k) | S(k), U(k)}(v | s, u),  v ∈ O, s ∈ S, u ∈ U

The performance process is determined as W : (U(k), S(k)) ↦ W(k).

• The second sub-component is an action kernel that models the functionality whose performance is affected by the process W. More specifically, the output of the action kernel is Y and the inputs are W and an external source or command signal denoted as X := {X(k) : k ∈ N}. The processes X and Y take values in given alphabets X and Y, respectively, which are subsets of real coordinate spaces. A map A : X × S → [0, 1] specifies probabilistically Y in terms of X and S as follows:

A(y | x, s) := P_{Y(k) | X(k), S(k)}(y | x, s),  y ∈ Y, x ∈ X, s ∈ S

The definition of the UDC model is not complete until we specify the probabilistic dependence among the implicit sources of randomness of the state recursion and the output kernels. These will be particularized throughout the text on an as-needed basis. Typically, S(k + 1) and O(k) are conditionally independent given S(k), U(k) and Y(k); and Y(k) and O(k) are conditionally independent given S(k), U(k) and X(k).

Fig. 1. Basic overall structure of the utilization-dependent component (UDC) model. [The figure shows the UDC POCMC, with state recursion S(s+ | s, u, y) and output kernel O(o | s, u), receiving U and Y and emitting O and W; the UDC action kernel A(y | x, w) maps X and W to Y.]

Although most existing work adopts variations of the models discussed in Section III, we opted to define a UDC model that is general enough to be used as a common framework for future work.

III. COMMONLY USED POCMC MODELS
We proceed with defining a few common POCMC models, which we will invoke later, appropriately altered to suit a specific application. In Sections III-A and III-B, we will specify W as a function of U and we will explicitly describe a state recursion in cases when it is clearer to do so. Because the output kernel is application dependent, we will defer its specification on an as-needed basis to Sections IV-VII. Typical cases include when O equals S or when additive measurement noise is present.

To be concise, when the POCMC is deterministic, we specify it via the functional recursion that governs the state update in terms of the inputs, and we express each output as a function of the current state and inputs. In the stochastic case, the probabilistic state recursion and output kernel can always be specified by the conditional probabilities associated with S and O, respectively. The action kernel is specified in an analogous manner.

In order to appropriately indicate the dependence on certain parameters, or to discern which model is associated with a given internal process, such as S and W, we often annotate them with a self-descriptive superscript.

A. Utilization Ratio and Workload Models
In human-assisted systems, the concept of mental workload [4] refers broadly to the burden imposed on a human operator by the difficulty of and frequency with which tasks are assigned. Hence, considering that it is known to influence the performance of a human operator [7], quantifying workload is important for the design of task assignment policies. In spite of having a rather simple structure, [8, Chapter 11] explains why the utilization ratio defined below is a pertinent mental workload metric. Here, the UDC is a human operator who has to service tasks from a queue. The types of services carried out by an operator include classification, supervision [9], [10] or assembly jobs within a production system [11].
Definition 2. Utilization ratio:
The utilization ratio POCMC for a given positive averaging horizon T is defined as:

W^{R,T}(k) = (1/T) Σ_{i=1}^{min{T,k}} U(k − i),  k ∈ N+   (1)

where we adopt the convention that W^{R,T} takes values in R+, W^{R,T}(0) = 0, and U is a scalar non-negative process that governs the level of utilization.

In its simplest and most prevalent form, the set of utilization levels U would be {0, 1}, and U(k) = 1 and U(k) = 0 would indicate whether the component is being used or not, respectively, at time k. The research on intelligent task management reported in [12] also uses U = {0, 1} for the following alternative mental workload metric quantifying utilization ratio with a forgetting factor.

Definition 3. Utilization ratio with forgetting factor:
Given a forgetting factor α in (0, 1), the associated utilization ratio POCMC is defined as:

S^{F,α}(k + 1) = α S^{F,α}(k) + U(k),  k ∈ N,  S^{F,α}(0) = 0   (2)

W^{F,α}(k) = (1 − α) S^{F,α}(k),  k ∈ N   (3)

where we adopt the convention that W^{F,α} and S^{F,α} take values in R+ and U is a scalar non-negative process that governs the level of utilization. Notice that the performance process can be computed directly for k greater than or equal to 1 as W^{F,α}(k) = (1 − α) Σ_{i=0}^{k−1} α^{k−i−1} U(i).

The authors of [13] also adopt the utilization ratio with forgetting factor to model the dynamics of the temperature of the circuitry of the transmitter that broadcasts information across an additive noise link, subject to a transmission power process U. In this context, U is R+ and the UDC is the resulting communication channel whose performance is adversely affected by the thermal noise that intensifies with increasing temperature. Related work on allocation of energy harvested from a stochastic source for wireless transmission subject to constraints on temperature is reported in [14], while thermal effects were considered in [15] in the context of distributed estimation.

In contrast to the concept of mental workload, in the queueing literature the workload affecting the performance of the server quantifies the effort needed to complete the tasks apportioned to the server, but not yet completed. The following is a discrete-time approximation of the continuous-time model governing the workload process analyzed in [16].

Definition 4. Queueing Workload:
The following defines the queueing workload POCMC for a component acting as a server:

S^W(k + 1) = S^W(k) + U(k) − Y̌(k),  k ∈ N,  S^W(0) = 0   (4)

W^W(k) = S^W(k)   (5)

where S^W, U and Y̌ take values in N. Here, U(k) and Y̌(k) may represent the number of work quanta, or effort, associated with the incoming and completed tasks at time k, respectively. Notice that, given the structure in Fig. 1, Y̌(k) must be determined as a function of Y(k). We assume that Y̌(k) must be zero when S^W(k) and U(k) are zero, which requires a properly defined action kernel.

B. Energy harvesting models

We also consider cases in which the state of the POCMC is governed not only by utilization but, unlike the models covered in Section III-A, also by extrinsic stochastic processes. Prime examples include models of the so-called state of charge (SOC) quantifying the energy stored in a battery that is repeatedly recharged using energy harvested from unsteady sources. We propose the following model that is both a generalization and an adaptation to our discrete-time framework of ubiquitous models, such as those used in [17], [18]. Our model is general enough to capture the effects described in [19, Part IV] for microbatteries that are often used in small devices powered by energy harvesting, including implanted medical devices [20].
Definition 5. Energy harvesting (EH) model
Let S^A := {S^A(k) : k ∈ N} be a given homogeneous Markovian process that quantifies not only the energy harvested over time but possibly also other stochastic phenomena that influence the operation of the battery and its recharging sub-systems. This process takes values in a subset S^A of a real coordinate space. A given map ∆E : S^SOC × S^A × R+ → R+ governs the dynamics of the energy harvesting POCMC according to the following recursion:

S^SOC(k + 1) = S^SOC(k) + ∆E(S^SOC(k), S^A(k), W^E(k)),  k ∈ N

where we assume that S^SOC(k), which quantifies the SOC at time k, is in S^SOC := [0, s̄^SOC] and s̄^SOC denotes the maximum SOC. The initial SOC is quantified by S^SOC(0). Here, W^E takes values in R+ and quantifies the energy effectively extracted from the battery for use by the action kernel. The map ∆E quantifies the net change in the battery charge resulting from the difference between the effect of W^E(k) and the energy harvested. The state of the POCMC can be chosen as S^E := {(S^SOC(k), S^A(k)) : k ∈ N}. The map W^E that determines W^E in terms of S^E and U is described below in Remark 1.

The map ∆E is characteristic of each battery and it must satisfy the following consistency conditions:

∆E(s^SOC, s^A, w^E) ≤ s̄^SOC − s^SOC,   (6)

∆E(s^SOC, s^A, w^E) ≥ −s^SOC,   (7)

for all s^SOC, s^A and w^E in S^SOC, S^A and R+, respectively. We also assume that ∆E(s^SOC, s^A, w^E) is continuous with respect to w^E.

(Footnote to Definition 4: In particular, this will require that Y(k) carries enough information to determine when U(k) is zero.)

Remark 1. (Description of W^E)

As is known since the early work in [21], each battery type has a discharge curve that characterizes the voltage in terms of the SOC.
Invariably, even for modern batteries [22], the voltage decreases as the SOC drops, which leads to the following constraints:

• The maximum energy that can be delivered by the battery at any time k is a nondecreasing function of the SOC, which we represent as D : S^SOC → R+. More specifically, the constraint is given by:

W^E(k) ≤ D(S^SOC(k)),  k ∈ N   (8)

• There is a positive minimum SOC, denoted as s̲^SOC, below which the voltage is too low to power the component. This leads to the following constraint that must be satisfied for every k in N:

(∆E(S^SOC(k), S^A(k), W^E(k)) + S^SOC(k) − s̲^SOC) W^E(k) ≥ 0   (9)

Consequently, U, which represents the energy requested by the control policy, may differ from W^E. More specifically, the energy extracted from the battery is determined in terms of U and the state of the EH model via the map W^E : (s^E, u) ↦ w^E specified as follows:

w^E = max{ w̃^E ≥ 0 | w̃^E ≤ u, w̃^E ≤ D(s^SOC), (∆E(s^SOC, s^A, w̃^E) + s^SOC − s̲^SOC) w̃^E ≥ 0 }

The following is a simplified version of the EH model that is characterized by a ∆E with a linear range in which it quantifies the difference between the energy used and harvested, and it also implements a saturation that restricts the SOC to the interval S^SOC.

Definition 6. (Linear-saturated EH model)
The linear-saturated EH model is specified as follows for every k in N:

S^SOC(k + 1) = min{ S^SOC(k) + S^A(k) − W^E(k), s̄^SOC }   (10a)

W^E(k) = U(k) if U(k) ≤ S^SOC(k) + S^A(k), and W^E(k) = S^SOC(k) + S^A(k) otherwise   (10b)

where S^A(k), in this simplified model, represents the energy harvested at time k.

The linear-saturated EH model does not capture the effects of the discharge curve and the changes that the mechanisms of charge and discharge go through as the SOC varies. Definition 7 below gives another simplified model in which the SOC takes values in a finite set and evolves as a controlled Markov chain.
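Definition 6 lends itself to direct simulation. The sketch below (Python; the function and variable names are ours, not from the survey) implements one step of (10a)-(10b): the request is served in full only when the stored charge plus the fresh harvest covers it, and the SOC saturates at the battery capacity.

```python
def linear_saturated_eh_step(s_soc, s_a, u, s_soc_max):
    """One step of the linear-saturated EH model (Definition 6).

    s_soc: current state of charge S_SOC(k); s_a: energy harvested S_A(k);
    u: energy requested U(k); s_soc_max: maximum SOC (battery capacity).
    Returns (S_SOC(k+1), W_E(k))."""
    # (10b): deliver the request, capped by the energy currently available.
    w_e = u if u <= s_soc + s_a else s_soc + s_a
    # (10a): net update of the SOC, saturated at the maximum SOC.
    s_next = min(s_soc + s_a - w_e, s_soc_max)
    return s_next, w_e

# A short trajectory: harvest 2 but request 3 (partially served); harvest 4,
# request 1; then harvest 4 with no request, saturating the battery at 5.
s = 0.0
trace = []
for s_a, u in [(2.0, 3.0), (4.0, 1.0), (4.0, 0.0)]:
    s, w_e = linear_saturated_eh_step(s, s_a, u, s_soc_max=5.0)
    trace.append((s, w_e))
```

The trajectory ends with the SOC pinned at the capacity s̄^SOC, illustrating the saturation in (10a).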
Definition 7. (Finite-state EH model)
The SOC evolves according to a controlled Markov chain whose state S^E := {S^E(k) : k ∈ N} takes values in S^E := {0, 1, . . . , s̄^E}. The following map determines the probability transition map for S^E in terms of U:

S^E(s+ | s, u, y) = Γ̂_s(u) if s+ = s + 1 and s < s̄^E; Γ̌_s(u) if s+ = s − 1 and s > 0; 1 − Γ̂_s(u) − Γ̌_s(u) if s+ = s;  for s+, s ∈ S^E, u ∈ U, y ∈ Y

where Γ̂_s : U → [0, 1] and Γ̌_s : U → [0, 1] are given maps satisfying Γ̌_0(u) = 0, Γ̂_{s̄^E}(u) = 0 and Γ̂_s(u) + Γ̌_s(u) ≤ 1. Here, we assume that S^E(k + 1) is independent of Y(k) when conditioned on S^E(k) and U(k) taking values s and u, respectively. In this case, a given map W^E : S^E × U → R+ determines the energy used by the action kernel as W^E : (S^E(k), U(k)) ↦ W^E(k).

IV. CAPACITY OF COMMUNICATION CHANNELS
In recent years, there has been a tremendous amount of research focused on energy harvesting wireless communication systems. For a comprehensive overview, we refer the reader to the survey articles [23], [24], [25]. As detailed above, an energy harvesting transmitter may be modeled using the UDC model with a state that indicates the amount of charge available for usage. Therefore, we may use a UDC-based energy harvesting model to study various communication problems with energy harvesting devices. In what follows, we provide a brief survey of two key areas: determining channel capacities and optimal scheduling policies in energy harvesting systems.

From an information-theoretic perspective, a key problem is identifying the capacity of an energy harvesting communication channel. The capacity of an additive white Gaussian noise (AWGN) channel with an energy harvesting transmitter was analyzed in references [26] and [27], for the infinite-battery case and the no-battery case, respectively. Various upper and lower bounds on capacities have been studied in [28], [29], [30], [31], [32]. A general formula for the capacity of a point-to-point energy harvesting channel was established in [33]. Reference [33] also established a novel connection between the channel capacity and the optimal throughput (discussed below) for an energy harvesting transmitter. Beyond point-to-point channels, the capacity of energy harvesting MAC channels has been analyzed in [34], [35], where a general capacity formula is derived, along with lower and upper bounds on capacity.

A significant amount of research has focused on the problem of scheduling for an energy harvesting transmitter. In this case, the transmitter has an energy queue as well as a data queue, and the goal is to transmit data to the recipients in the least amount of time, or equivalently transmit the maximum amount of data until a certain time.
This problem has been studied in the offline setting, where the energy arrivals are non-causally known, as well as the online setting, where the transmitter has causal information about energy and data arrivals [24], [25]. For the offline case, a variety of channel models have been investigated including point-to-point channels, broadcast channels [36], [37], interference channels [38], and MAC channels [39]. The online case has also been studied for the point-to-point channel [40], the broadcast channel [41], and the MAC channel [42]. We refer to [42] for a thorough list of references concerning online and offline scheduling in energy harvesting channels. In all the models described so far, the transmitter utilizes energy for the sole purpose of transmission. Energy harvesting transmitters which expend energy on sensing, computing, and communicating, or which possess imperfect batteries, have been surveyed in [25].
A. Channels with evolving power constraints
In addition to energy harvesting systems, we show that the UDC framework may also be used to analyze more general communication channels with time-evolving power constraints. We describe these constraints below. The AWGN channel is one of the most popular channel models in information theory due to its relevance in practical applications. Evaluating the capacity of the AWGN channel under a variety of power constraints is a problem that has received much attention in the literature. The classical constraint studied by Shannon [43] involved an average power constraint of P_avg. In other words, if (X(1), X(2), . . . , X(n)) is the input to a channel, then it must satisfy

(1/n) Σ_{k=1}^{n} ‖X(k)‖² ≤ P_avg.

Shannon showed that the capacity of this channel is achieved using a random Gaussian codebook. In addition to the average power constraint, another practically relevant power constraint is the peak power constraint. A peak power constraint of P_peak stipulates that every input X to the channel should satisfy ‖X‖² ≤ P_peak. Finding the capacity of this channel in the scalar case was first studied by Smith [44]. Smith showed that, although it is not possible to express the capacity in a closed-form expression, it may be calculated efficiently. The key observation in [44] was that the capacity is achieved by a discrete input distribution that is supported on a finite number of atoms in [−√P_peak, √P_peak].

The flexible UDC framework allows us to model a variety of power-constrained channels. For example, the average power and peak power constraints may be restated jointly as

‖X(k)‖² ≤ min{ kP_avg − Σ_{j=1}^{k−1} ‖X(j)‖², P_peak },  k ∈ N+.

It is evident that the power constraint on the k-th channel use depends not only on P_avg and P_peak but also on the symbols transmitted prior to time k. Therefore, this power constraint is utilization dependent. Definitions 8 and 9 below describe the UDC framework used for modeling a large class of power-constrained communication channels.
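To make the utilization dependence concrete, the sketch below (Python; the helper name is ours) computes the budget that the combined average/peak constraint places on the k-th channel use, given the powers already spent: under-spending early raises later budgets, but never beyond P_peak.

```python
def power_budget(past_powers, k, p_avg, p_peak):
    """Budget on the k-th channel use (k is 1-indexed) under the combined
    constraint ||X(k)||^2 <= min{ k*P_avg - sum_{j<k} ||X(j)||^2, P_peak }."""
    return min(k * p_avg - sum(past_powers), p_peak)

# With P_avg = 1 and P_peak = 2: the first use gets budget 1; spending only
# 0.5 there raises the second budget to 1.5; staying silent for two uses
# lets the third budget reach the peak cap of 2.
b1 = power_budget([], 1, p_avg=1.0, p_peak=2.0)          # 1.0
b2 = power_budget([0.5], 2, p_avg=1.0, p_peak=2.0)       # 1.5
b3 = power_budget([0.0, 0.0], 3, p_avg=1.0, p_peak=2.0)  # 2.0
```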
Definition 8 (Evolving power constraints). An evolving power constraint is defined via a sequence of functions {P_k : k ∈ N}, where P_k : R^k_+ → R_+ is such that P_k(u_1, u_2, . . . , u_k) determines the power constraint on the (k + 1)-th transmission X(k + 1), where u_l is the power of the l-th transmitted symbol, i.e., u_l = ‖X(l)‖², l ≥ 1.

Definition 9 (Evolving power constraint with state). An evolving power constraint with a state is characterized by three sequences of functions: (i) {f_k : k ∈ N} with f_k : R^k → R, (ii) {f̃_k : k ∈ N}, where f̃_k : R² → R, and (iii) {p_k : k ∈ N} such that p_k : R → R. These functions satisfy the property that f_k(u_1, . . . , u_k) = f̃_k(f_{k−1}(u_1, . . . , u_{k−1}), u_k), i.e., the value of the function f_k at time k can be computed from that of the function f_{k−1} at time k − 1 and u_k.

An evolving power constraint is said to have a state f_k(u_1, . . . , u_k) at time k + 1 if the sequence of functions P_k in Definition 8 may be written as P_k(u_1, . . . , u_k) = p_k(f_k(u_1, . . . , u_k)). Thus, the power constraint on the (k + 1)-th symbol depends on the history of transmitted symbols up to time k only through the state at time k.

Evolving power constraints may be used to describe several power constraints studied in the literature. We provide a few examples below:
Example 1.
Consider the standard average power constrained communication channel. Here, the constraint on the k-th symbol X(k) is given by

‖X(k)‖² ≤ kP_avg − Σ_{j=1}^{k−1} ‖X(j)‖²,

for some fixed P_avg > 0. This power constraint may be characterized as an evolving power constraint with state, as follows: For k ∈ N, define f_k(x_1, . . . , x_k) = Σ_{j=1}^{k} ‖x_j‖². This definition satisfies the property that f_{k+1}(x_1, . . . , x_{k+1}) can be calculated using f_k(x_1, . . . , x_k) and x_{k+1}; in particular, f̃_{k+1}(u, v) = u + v. The power constraint functions P_k are given by P_k(x_1, . . . , x_k) = (k + 1)P_avg − f_k(x_1, . . . , x_k) for every k ∈ N.

Example 2.
For an average power constraint of P_avg coupled with a peak power constraint of P_peak, the only change from above is that P_k(x_1, . . . , x_k) = min(P_peak, (k + 1)P_avg − f_k(x_1, . . . , x_k)).

Example 3.
For a windowed-average power constraint over a window T, the state is the total energy expended over the last T − 1 transmitted symbols. By allowing states to be vector valued in R^{T−1}, this constraint is easily accommodated in Definition 9.

Example 4. A (σ, ρ)-power constraint is found to be relevant in energy harvesting applications as well as neuroscience. The (σ, ρ)-power constraint is defined as follows: Let σ, ρ ≥ 0. A codeword (x_1, x_2, . . . , x_n) is said to satisfy a (σ, ρ)-power constraint if

Σ_{j=k+1}^{l} x_j ≤ σ + (l − k)ρ for all 0 ≤ k < l ≤ n.   (11)

The (σ, ρ)-power constraint essentially imposes a restriction on how bursty the transmit power can be, by constraining the total energy consumed over every interval to be approximately linear in the length of the interval. In energy harvesting communication systems, a (σ, ρ)-power constraint may be used to model a transmitter that harvests ρ units of energy per unit time, and is equipped with a battery of capacity σ units that is used to store unused energy for future transmissions. The (σ, ρ)-power constraints can be expressed equivalently by tracking a state parameter σ_k that keeps track of the tightest constraint among the k + 1 inequalities for x_{k+1}. The state function σ_k evolves as follows:

σ_k(x_1, . . . , x_k) = min(σ, σ_{k−1}(x_1, . . . , x_{k−1}) + ρ − x_k).

The power constraint function P_k is defined as P_k(x_1, . . . , x_k) = σ_k(x_1, . . . , x_k) + ρ. Note that ρ need not be constant over time, and such dependence or variability with respect to time is useful in modeling energy harvesting with arbitrary amounts of energy ρ_k harvested at time k.

Definitions 10 and 11 below define a UDC model that imposes evolving power constraints on an action kernel that is a communication channel.
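The equivalence between the window constraints (11) and the single-state recursion for σ_k can be checked mechanically. The sketch below (Python; function names are ours) tests a codeword both ways: directly against every window, and via the running state, under which the symbol x_k is admissible exactly when x_k ≤ σ_{k−1} + ρ.

```python
def satisfies_bruteforce(x, sigma, rho):
    """Check (11) directly: the energy in every window x_{k+1..l}
    must not exceed sigma + (l - k) * rho."""
    n = len(x)
    return all(sum(x[k:l]) <= sigma + (l - k) * rho
               for k in range(n) for l in range(k + 1, n + 1))

def satisfies_via_state(x, sigma, rho):
    """Equivalent check via the state sigma_k = min(sigma, sigma_{k-1} + rho - x_k),
    with sigma_0 = sigma: x_k is admissible iff x_k <= sigma_{k-1} + rho."""
    s = sigma
    for x_k in x:
        if x_k > s + rho:
            return False
        s = min(sigma, s + rho - x_k)
    return True
```

For example, with σ = 1 and ρ = 1, the codeword (2, 0, 2, 0) is feasible while (2, 2) is not, and both checks agree.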
Definition 10 (POCMC component). Let {(P_k, f_k, f̃_k) : k ∈ N} be an evolving power constraint with state. The POCMC component has a state S(k) that tracks the state of the power constraint at time k, i.e., S(k) = f_{k−1}(u_1, . . . , u_{k−1}). An input U(k) indicates the desired power output for time k, i.e., the energy required to send symbol X(k). The performance process W(k) is equal to the power constraint imposed on X(k). In other words, W(k) = P_{k−1}(u_1, . . . , u_{k−1}) = p_{k−1}(S(k)) is a deterministic function of S(k). The state at time k + 1 satisfies S(k + 1) = f̃_k(S(k), u_k).

Definition 11 (Action kernel). The action kernel is a communication channel with input X and output Y. At time k, the k-th symbol X(k) is scheduled to be transmitted. The output of the channel Y(k) depends on the input X(k), the noise in the channel, as well as the performance process W(k). Two natural cases to consider are:

• If U(k) ≤ W(k), then X(k) is transmitted unaltered. Otherwise, X(k) is rescaled to have power W(k), i.e., X′(k) is transmitted across the channel, where X′(k) = X(k) × √(W(k)/U(k)).

• If U(k) ≤ W(k), then X(k) is transmitted unaltered. Otherwise, there is no transmission, i.e., X′(k) is transmitted across the channel, where X′(k) = 0.

Note that since these are deterministic power constraints, the transmitter can calculate the power constraint on the k-th symbol in advance and ensure that X(k) satisfies it. However, this is not possible when the power constraints are random. An example of a random power constraint is the following. Consider an arbitrary i.i.d. stochastic process E(k). We may now define the state as (S(k), E(k)), and the power constraint on the k-th symbol is computed via p_k(S(k), E(k)). The state evolution proceeds as S(k + 1) = f̃_k(S(k), u_k, E(k)).
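The two cases of Definition 11 can be sketched as a single action kernel (Python; the additive Gaussian noise follows the AWGN setting of this section, while the parameter names and the mode switch are our additions):

```python
import random

def action_kernel(x, u, w, noise_std=1.0, mode="rescale"):
    """Transmit symbol x with requested power u under power budget w.

    If u <= w, x passes unaltered.  Otherwise, either rescale the symbol
    to power w (X'(k) = X(k) * sqrt(W(k)/U(k))) or suppress it entirely.
    Returns the channel output Y(k) = X'(k) + AWGN."""
    if u <= w:
        x_eff = x
    elif mode == "rescale":
        x_eff = x * (w / u) ** 0.5
    else:  # mode == "drop": no transmission
        x_eff = 0.0
    return x_eff + random.gauss(0.0, noise_std)
```

Setting noise_std = 0 makes the deterministic behavior visible: a symbol of power 4 facing a budget of 1 is halved in amplitude under rescaling and zeroed under dropping.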
This random-constraint formulation is relevant to energy harvesting communication systems, discussed in Section III-B. In the absence of any output O(k), the transmitter has no way of modifying its k-th symbol to satisfy the power constraints. A variety of feedback settings are worth considering: O(k) = S(k) or O(k) = (S(k), E(k)). Additionally, the transmitter may also receive feedback from the receiver, i.e., O(k) contains Y(k). The capacity of energy harvesting systems with feedback has also been investigated in recent years, and it has been found that feedback increases capacity [45]. In addition to random E(k), yet another setting to consider is when the sequence of E(k) is completely arbitrary, but is known to lie in some set. This is analogous to arbitrarily varying channels (AVCs) [46], [47] and may also be modeled using the UDC framework.

B. Utilization-dependent Markov channels
Point-to-point communication under memoryless channels is widely studied and well-understood. More difficult is modeling channels that change with time, such as fading channels, or channels that are simply unknown, e.g., AVCs [46], [47]. There are other channel models that lie between a memoryless channel and an AVC, such as a finite-state Markov channel (FSMC) [48], and the channel with action-dependent states [49]. It is easy to model these examples under the UDC framework. The key advantage of the UDC framework is, however, that the channel can also depend on its usage, i.e., on past symbols transmitted through the channel.
Inter-symbol interference (ISI):
ISI channels are a natural example of such channels. We demonstrate a UDC model that describes the discrete-time Gaussian ISI channel as found in [50]. For an input X(k) at time k, the output Y(k) of this channel is given by

Y(k) = Σ_{i=0}^{T−1} h(i) X(k − i) + N(k),   (12)

where N(k) is AWGN. The controlled Markov chain has the state S(k) := (X(k − 1), . . . , X(k − T + 1)) ∈ R^{T−1}. The performance process is identical to the state. The action kernel consists of an AWGN channel with input X(k) and output Y(k). Based on the performance process, the output is evaluated as per equation (12).

Fig. 2. A utilization-dependent finite-state Markov channel (FSMC). [The figure shows the state process S^A(k) selecting the channel C_{S^A(k)}(·|·), which maps the input X(k) to the output Y(k); O(k) denotes the observation made available to the transmitter.]

The input to the controlled
Markov process at time k is U(k) := X(k). The state at time k + 1 is given by S(k + 1) = (X(k), . . . , X(k − T + 2)). The main point to note above is that the channel depends on the past usage, and is therefore a utilization-dependent channel. Just as with the FSMC, we provide an example here of a finite-state Markov channel that is also utilization-dependent.

Utilization-dependent FSMC:
For finite discrete sets X and Y, a channel with input X ∈ X and output Y ∈ Y is defined as the collection of probability measures p_{Y|X=x}(·) for each x ∈ X. Consider a finite number of channels {C_1, . . . , C_M}. For each x ∈ X and i, j ∈ {1, . . . , M}, consider the function θ(x, i, j), and define the random process {S^A(k) : k ≥ 0} over the set {1, 2, . . . , M} as follows:

P(S^A(k + 1) = j | S^A(k) = i, X(k) = x) = θ(x, i, j).

Naturally, we have θ(·, ·, ·) ≥ 0 and Σ_{j=1}^{M} θ(x, i, j) = 1 for all i ∈ {1, . . . , M} and all x ∈ X. The random process {S^A(k) : k ≥ 0} dictates which channel is available at each time k in the action kernel. The main point to note is that the index of the channel at time k + 1, which is S^A(k + 1), depends not only on S^A(k), but also on the symbol X(k) transmitted at time k. A utilization-dependent FSMC is depicted in Fig. 2. The main components are described below.

Markov process and action kernel:
The controlled Markov chain has input X(k) and state S_A(k). The performance process is identical to the state. The action kernel is the communication channel, and it also has input X(k). The channel at time k is C_{S_A(k)}, and a random output Y(k) is generated by passing X(k) through this channel. At time k+1, the state S_A(k+1) equals j ∈ {1, ..., M} with probability θ(X(k), S_A(k), j). The simplest FSMC to consider is the Gilbert-Elliott channel [51], where the input and output are binary and the set of channels consists of two binary symmetric channels. A utilization-dependent Gilbert-Elliott channel is therefore completely specified by a function θ : {0,1} × {1,2} × {1,2} → [0,1]. There are a number of open problems concerning such channels: Is it possible to calculate the channel capacity in closed form? If the input {X(k) : k ∈ N} is restricted to be Markov, what is the maximum achievable rate? Is it possible to generalize these results beyond binary channels?

V. LEARNING ALGORITHMS AND ADVERSARIAL MODELS
In recent years, information-theoretic techniques have emerged as effective tools to study optimization procedures in machine learning problems [52], [53], [54]. A common problem setting is as follows: a dataset D is constructed by drawing N i.i.d. samples from a distribution p_U over a set U_D; i.e., D = {U_1, ..., U_N}, where U_i ∼ p_U and, hence, D ∼ p_U^{⊗N}. A loss function ℓ : W × U_D → R is given, where W is a set of parameters that govern a machine learning algorithm. The goal is to identify w* ∈ W such that

w* = argmin_{w ∈ W} E[ℓ(w, U)],

where U has distribution p_U. The above expression cannot be evaluated in general since the data distribution p_U is unknown. A natural idea is to use the empirical distribution of U induced by the dataset D to estimate w*. The optimization problem is formulated as

min_{w ∈ W} (1/N) \sum_{i=1}^{N} ℓ(w, U_i).

In general, such a problem is nonconvex and intractable. However, it has been observed that a local minimum obtained via gradient descent or stochastic gradient descent (SGD) is often a good enough estimate [55]. Note that the input to an algorithm is D and the output is some W ∈ W. We may thus think of the algorithm as a communication channel that maps D to W via p_{W|D}(·|·). Recent results in learning theory have shown that the mutual information I(D; W) provides upper bounds on the generalization error of an algorithm, which measures the degree to which the algorithm overfits the data. Thus, designing an algorithm is equivalent to designing the communication channel p_{W|D}(·|·).

In the next subsection, we show how SGD and closely related optimization procedures may be formulated within the UDC framework. The advantage of this formulation is that it allows us to describe several interesting problems of learning in the presence of adversaries.

A. Iterative algorithms
Iterative optimization algorithms produce a sequence of estimates {W(k) : k ∈ N}, where W(k+1) is generated using W(k), the dataset D, and possibly some independent noise. SGD is one such procedure, which we briefly describe below. At each time k, a point Z(k) ∈ D is chosen according to a predefined strategy, and the next estimate is computed according to

W(k+1) = W(k) − η_k ∇ℓ(W(k), Z(k)).   (13)

Note that Z(k) takes values in the dataset D, and therefore has a distribution that corresponds to the empirical distribution of the data. Instead of choosing a single point from D, one may also choose a subset of points and average the gradients evaluated at those points. For ease of exposition, we focus on the case when a single point is drawn from D. Versions of SGD have been proposed in recent years that modify the update equation to include an additional noise term as follows:

W(k+1) = W(k) − η_k (∇ℓ(W(k), Z(k)) + ξ(k)).   (14)

Adding an independent noise ξ(k) achieves two goals: it improves the generalization error, and it helps the optimization procedure escape shallow local minima. The update equation of SGD is easily described using the UDC framework, as shown in Fig. 3.

Markov process and action kernel:
It is easy to see from the update equation (14) that {W(k) : k ∈ N} is a controlled Markov chain. We define the state of this Markov chain as S(k) = (W(k), Z(k)), i.e., the current estimate and the current sample drawn from D. The performance process is identical to the state. The other input to the action kernel is ξ(k). The action kernel computes the direction to move at time k by evaluating the gradient ∇ℓ(W(k), Z(k)) and adding ξ(k) to it. The output of the action kernel is Y(k), and this is an input to the controlled Markov chain. The state of the Markov chain is then updated as per equation (14). The output process O(k) may be set to equal a noisy version of W(k), or a delayed version of the same.

The noise ξ(k) in the above model may be Gaussian, in which case we obtain the popular stochastic gradient Langevin dynamics (SGLD) algorithm [56], or uniformly distributed over a shell of suitable radius, in which case we obtain the algorithm of Ge et al. [57]. The UDC framework may also be used to model algorithms such as momentum-based methods [58] in a similar fashion.

Fig. 3. Stochastic gradient descent (SGD) via a UDC model.
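To make the formulation concrete, the following sketch runs update (14) on a toy scalar problem with a squared loss and Gaussian noise ξ(k), i.e., the SGLD case. The dataset, the loss, the step-size schedule and the noise level are all illustrative assumptions, not quantities taken from the references.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset D of N = 100 scalar samples, and the squared loss
# l(w, u) = (w - u)^2, whose gradient in w is 2 * (w - u).
D = rng.normal(loc=1.0, scale=0.5, size=100)

def grad(w, z):
    return 2.0 * (w - z)

w = 0.0                      # initial estimate W(0)
sigma = 0.01                 # std of the injected noise xi(k)
for k in range(2000):
    eta = 0.5 / (k + 10)     # decreasing step size eta_k
    z = rng.choice(D)        # Z(k): one point drawn uniformly from D
    xi = sigma * rng.normal()
    # Update (14): W(k+1) = W(k) - eta_k * (grad l(W(k), Z(k)) + xi(k))
    w = w - eta * (grad(w, z) + xi)
```

With the decreasing step sizes η_k, the iterates settle near the empirical minimizer of the averaged loss, which for this quadratic loss is the sample mean of D.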
B. Adversarial models
Incorporating an action kernel with input ξ(k) also makes it possible to describe adversarial actions during the training process. Adversarial attacks on machine learning algorithms have been extensively studied in the past few years [59]. Many studies have focused on the robustness of algorithms to small perturbations of the test sample. More recently, other adversarial strategies, such as data poisoning attacks [60], [61], [62] and Byzantine gradient attacks [63], [64], have also been proposed. In this subsection, we demonstrate how the UDC framework may be used to model such adversarial settings.

Markov chain and action kernel:
As in the case of SGD, the controlled Markov chain has the state (W(k), X(k)) at time k, an output O(k), and an input Z(k). The observation process is assumed to convey information concerning the state, such as a noisy version of W(k) or delayed updates of the W(k) process. The input X(k) is a minibatch sampled from the dataset D, or a newly sampled data point. The performance process is identical to the state. The adversarial action is modeled via the action kernel. The adversary has causal access to the output process {O(k) : k ∈ N}, and its goal is to add a noise ξ(k) to the gradient so as to impede the training process. There are various possible adversary models that we may consider: (a) gradient perturbation, in which the noise ξ(k) is variance (or amplitude) constrained; and (b) data poisoning, in which the adversary corrupts the minibatch X(k) by adding spurious points or by perturbing some small subset of points. The action kernel takes as input the adversary's poisoned dataset, which we call X′(k), and the performance process (W(k), X(k)), and sets

Y(k) = ∇ℓ(W(k), X′(k)) + ζ(k),

where ζ(k) is additive noise. The output Y(k) is then fed back to the Markov chain, and the state evolves according to the update equation

W(k+1) = W(k) − η Y(k).

There are several interesting problems one can consider in this model. These include an information-theoretic analysis of the generalization error in the presence of an adversary, where the output process {O(k) : k ∈ N} contains a bounded amount of information about {W(k) : k ∈ N}. Another problem is identifying the impact of the adversary on the training error; in particular, how strong does the adversary need to be to ensure that training does not converge?
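As a minimal illustration of adversary model (a), the sketch below perturbs each gradient with an amplitude-constrained noise ζ(k) before it is fed back through Y(k). The scalar loss, the bound ζ_max, and the step size are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scalar setting: loss l(w, u) = (w - u)^2 on data centered
# near 1.0; the adversary adds an amplitude-bounded perturbation zeta(k)
# with |zeta(k)| <= zeta_max to every gradient (model (a) above).
D = rng.normal(loc=1.0, scale=0.5, size=100)
zeta_max = 0.5
eta = 0.05

w = 0.0
for k in range(500):
    z = rng.choice(D)                        # minibatch of size one
    g = 2.0 * (w - z)                        # gradient of the loss
    zeta = rng.uniform(-zeta_max, zeta_max)  # adversary's bounded noise
    y = g + zeta                             # Y(k) fed back to the chain
    w = w - eta * y                          # W(k+1) = W(k) - eta * Y(k)
```

Because this ζ(k) is zero-mean and bounded, training still converges to a neighborhood of the empirical minimizer; a stronger perturbation (e.g., biased or state-dependent) would be needed to prevent convergence, which is precisely the kind of question raised above.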
Finally, it would also be interesting to examine UDC models in which the adversary has non-causal information concerning the sequence of minibatches, i.e., the process {X(k) : k ∈ N} that is to be used for training.

VI. QUEUEING SYSTEMS WITH UDC SERVERS
There exists a large volume of literature on queueing systems with time-varying parameters, dating back to the studies by Conway and Maxwell [65], Jackson [66], Yadin and Naor [67], Gupta [68] and Harris [69], most of which focused on state-dependent service rates. We refer a reader interested in a summary of earlier studies on queues with state-dependent parameters to [70] and the references therein.

In recent years, in addition to energy harvesting in wireless networks, discussed in Section IV, much attention has been given to the issue of scheduling in wireless systems in order to improve user experience (e.g., streaming on a cell phone or a tablet) and to maximize system capacity using limited spectrum. In wireless systems, such as cellular systems, channel conditions change over time, thereby affecting the probability of successful transmission even with adaptive transmit power control and modulation and coding schemes [71], [72], [73], [74].

A key difference between the proposed framework and existing studies on wireless scheduling (e.g., [72], [73]) is the following. Studies on wireless scheduling that take time-varying channel conditions into consideration assume that the channel conditions change independently of the actions taken by the scheduler. In other words, the input process to the action kernel, namely the performance process W, does not depend on the past actions of the scheduler. Therefore, W can be viewed as an exogenous, independent input process to the action kernel, and the feedback loop present in Fig. 1 is absent in these studies. As a result, they can be considered a special case of the proposed framework in which the input process W to the action kernel does not depend on the POCMC state.

The performance and management of human operators and servers have been the subject of many studies in the past, e.g., [75], [76], [77], [78]. Recently, with rapid advances in information and sensor technologies, human supervisory control (HSC) has become an active research area [10], [79].
In HSC, human supervisors play a crucial role in systems such as supervisory control and data acquisition (SCADA) systems, and at times are required to process a large amount of information within seconds to make a critical decision (e.g., about a possible failure of a nuclear reactor due to loss of coolant), potentially causing information overload. For this reason, there is a resurging interest in understanding and modeling the performance of humans under widely varying settings. Although this is still an active research area, it is well documented that the performance of humans depends on many factors, including arousal and perceived workload [76], [78], [79], [80], [81]. For example, the well-known Yerkes-Dodson law suggests that moderate levels of arousal are beneficial, leading to the inverted-U model [75].

In the remainder of this section, we first illustrate how the proposed UDC framework can be used for two example scenarios in which the service rates depend on either queue lengths or utilization levels. The latter has recently been gaining interest for modeling and studying human supervisors [12], [82]. We also note that a similar model is applicable to studying and designing scheduling policies for multi-core processors with adaptive control of the clock speed or voltage of each core subject to power and thermal constraints [83]. Since the cases in which service rates depend on workloads (instead of queue lengths) can be handled in an analogous manner, we do not discuss them here. Furthermore, we limit our discussion to the cases with fixed arrival rate(s) for the sake of simplicity.

A. Queue length-dependent service rates
Many studies consider a server whose service rate depends on the queue length or backlog [65], [69], [70]. Oftentimes, the service rate of a server is allowed to change only when a task is completed or the server starts servicing a new task, e.g., [69]. It is well documented that the performance of human servers (e.g., bank tellers, doctors, nurses, toll collectors) is affected by various factors, including the queue length (e.g., [80], [84]). In a more recent study, Chatterjee et al. [85] investigated, from an information-theoretic perspective, the capacity of systems in which the reliability or quality of service provided by a server, modeled as the channel quality, depends on the queue length.
Task arrival processes and task workloads:
Suppose that there are T different types of tasks (T ≥ 1), and denote the set of task types by T := {1, ..., T}. We also define another set T_+ := T ∪ {0} that includes the null type (type 0). The arrival rate vector λ := (λ_1, ..., λ_T) is a T-dimensional vector whose t-th element λ_t indicates the probability with which a new task of type t arrives at each time k ∈ N, independently of the past and of the arrivals of other types. In other words, the arrival process A := {(A_1(k), ..., A_T(k)) : k ∈ N} is a collection of T mutually independent Bernoulli processes with rates λ_t, t ∈ T. Although we assume Bernoulli arrivals to simplify our discussion, more general arrival distributions (e.g., Poisson distributions) can be handled with only minor changes, as will become clear.

Each new task brings a (random) workload. The workloads of tasks of different types, especially those of tasks that arrive together, may be correlated in some cases. Although such cases can be dealt with in the proposed UDC framework, we make the simplifying assumption that the workloads of tasks belonging to different types are mutually independent.

Servers with queue length-dependent service rates:
Consider a queueing system with vacations. There are L servers (L ≥ 1), and define the set of servers to be L := {1, ..., L}. The servers share T unbounded queues, where the t-th queue holds uncompleted type t tasks. The service rates of the L servers are queue length-dependent in a manner to be made precise. We only study non-preemptive servers here: once a server starts working on tasks, it has to complete their service before it can either go on vacation or work on new tasks. In addition, when the ℓ-th server goes on a vacation, it remains idle for m_ℓ units of time (m_ℓ ≥ 1). The case with a random vacation time can be handled with minor changes to the model.

For our discussion, we assume that a server works on at most one type of task at any given time. The scenario in which a server can service a batch of tasks of different types can be handled with appropriate changes. In addition, we assume that the scheduling decision at time k is made prior to the arrivals of new tasks at time k. Hence, new arrivals at time k are not eligible for scheduling until time k+1.

A scheduling policy under consideration is a mapping Θ : S → D(T_+^L × N^L), where D(T_+^L × N^L) is the set of distributions over T_+^L × N^L. A scheduling vector is given by a pair (t, n), where t := (t_1, ..., t_L) ∈ T_+^L and n := (n_1, ..., n_L) ∈ N^L. The ℓ-th elements t_ℓ and n_ℓ specify the type and the number of the tasks, respectively, assigned to the ℓ-th server. When t_ℓ = 0, the ℓ-th server is asked to rest. In a special case where servers can work on at most one task, i.e., they are not allowed to work on a batch of tasks, the scheduling vector is simply given by t. Similarly, if there is only a single type of task, the scheduling vector only stipulates n.

Define R^Q : N^T × T_+^L × N^L → R_+^L to be the service rate function: suppose (i) the queue length vector is q := (q_1, ..., q_T) ∈ N^T, where the t-th element q_t represents the number of uncompleted type t tasks, and (ii) (t, n) is a scheduling vector. For a given triple (q, t, n), the service rates of the L servers are given by R^Q(q, t, n) := (R^Q_1(q, t, n), ..., R^Q_L(q, t, n)) and specify the amount of service that can be performed by each server on the tasks assigned by the scheduling vector (t, n).

Note that this framework allows for heterogeneous servers with varying capabilities and for dependence of the service rates across the L servers. For example, when a processor with multiple cores is subject to a power budget or a thermal constraint, their service rates (i.e., processing speeds) may be dependent. Furthermore, the service rate of a server can depend on the number of tasks that it serves simultaneously.

Some studies assume that a server works on a null task when it is idle. In our study, we can take the view that a null task of type zero is assigned to a server when it is asked to rest, and that this null task completes service after one unit of time.
The aggregate workload of n tasks of type t (n ∈ N and t ∈ T) is modeled using a random variable with distribution F_{t,n}. For the sake of simplicity, we assume that the workloads of different batches of tasks are mutually independent.

Task completion probabilities:
When the ℓ-th server works on n_ℓ tasks of type t_ℓ, the probability that it will complete the tasks within one unit of time depends on several factors, including the queue lengths and the scheduling vector (via the service rates), as well as the total amount of service that the tasks have already received in the past.

Let q and (t, n) be the queue lengths and the scheduling vector, respectively. Suppose r := (r_1, ..., r_L) is the cumulative service vector, where the ℓ-th element r_ℓ is the total amount of service that the uncompleted tasks of type t_ℓ currently being serviced by the ℓ-th server have received before. If either new tasks are assigned to the server or t_ℓ = 0, we have r_ℓ = 0.

We define a task completion probability function C^Q : N^T × T_+^L × N^L × R_+^L → [0,1]^L: for a given quadruple (q, t, n, r), the value of the function C^Q(q, t, n | r) := (C^Q_1(q, t, n | r), ..., C^Q_L(q, t, n | r)) determines the probabilities that the tasks serviced by each server as specified by the scheduling vector (t, n) will be completed during one unit of time. In other words, if t_ℓ > 0, C^Q_ℓ(q, t, n | r) is the probability that the n_ℓ tasks of type t_ℓ on which the ℓ-th server works will complete their service during one unit of time, provided that the queue lengths are q and the tasks have already received r_ℓ amount of service from the server before.

The values of the task completion probability function can be computed from the given workload distributions F_{t,n}, t ∈ T and n ∈ N, and the service rate function R^Q as follows: for each t ∈ T and n ∈ N, denote a generic random variable with distribution F_{t,n} by Z_{t,n} (written Z_{t,n} ∼ F_{t,n}) and define B_{t,n} : R_+ × R_+ → [0,1], where

B_{t,n}(r, μ) = P(Z_{t,n} ≤ r + μ | Z_{t,n} > r).

Then, for all n ∈ N^L, q ∈ N^T, r ∈ R_+^L, and t ∈ T_+^L,

C^Q_ℓ(q, t, n | r) = B_{t_ℓ, n_ℓ}(r_ℓ, R^Q_ℓ(q, t, n)) if t_ℓ > 0, and 0 otherwise.

In the special case of exponentially distributed workloads, the task completion probability function does not depend on the cumulative services r, and C^Q_ℓ(q, t, n | r) = C^Q_ℓ(q, t, n | 0) for all r ∈ R_+^L, where 0 = (0, ..., 0).

POCMC state:
The POCMC state is modeled using the process S^Q = {(S^q(k), S^v(k), S^r(k), S^μ(k), S^d(k)) : k ∈ N}, where the state at time k, S^Q(k), comprises the following:
(i) S^q(k) := (S^q_1(k), ..., S^q_T(k)) is the queue length vector.
(ii) S^v(k) := (S^v_1(k), ..., S^v_L(k)) indicates the remaining vacation time of each server before it becomes available to service tasks.
(iii) S^r(k) := (S^r_1(k), ..., S^r_L(k)) indicates the amounts of service already received by the tasks assigned by the scheduling vector. Recall that S^r_ℓ(k) = 0 if the ℓ-th server either rested or completed the service of its tasks at time k−1. Otherwise, S^r_ℓ(k) is the total amount of service performed by the server on the uncompleted tasks in service prior to time k.
(iv) S^μ(k) := (S^μ_1(k), ..., S^μ_L(k)) keeps track of the service rates of the servers at the previous time k−1, i.e., S^μ_ℓ(k) is equal to the service rate of the ℓ-th server at time k−1.
(v) S^d(k) := (S^d_1(k), ..., S^d_L(k)) retains the scheduling vector at time k−1, i.e., S^d(k) = U(k−1), as explained below.

Although there are different queueing models we could consider, here we focus on the following simple model for discussion. The queue length process S^q := {S^q(k) : k ∈ N} evolves according to

S^q_t(k+1) = S^q_t(k) + A_t(k) − Y̌_t(k),   (15)

for t ∈ T and k ∈ N, where A_t(k) = 1 (resp. A_t(k) = 0) with probability λ_t (resp. 1 − λ_t), and Y̌_t(k) is the number of type t tasks completed at time k.

Performance process:
At each time k, the external control input U(k) := (U_1(k), ..., U_L(k)), which is the scheduling vector for time k, is chosen according to the scheduling policy in place. The ℓ-th element of U(k) is a pair U_ℓ(k) = (U_{ℓ,1}(k), U_{ℓ,2}(k)) consisting of the type (U_{ℓ,1}(k)) and the number (U_{ℓ,2}(k)) of tasks assigned to the ℓ-th server, and takes values in T_+ × N.

Suppose that (S^q(k), S^v(k), S^r(k), S^μ(k)) = (q, v, r, μ) and U(k) = (t, n). Then, the value of the performance process at time k, W^Q(k) := (W^Q_1(k), ..., W^Q_L(k)), is given by

W^Q_ℓ(k) = B_{t_ℓ, n_ℓ}(r_ℓ, μ_ℓ) if v_ℓ = 0 and r_ℓ > 0; C^Q_ℓ(q, t, n | r) if v_ℓ = 0 and r_ℓ = 0; and 0 otherwise,   (16)

for every ℓ ∈ L. Note that S^d_ℓ(k) = U_ℓ(k) = U_ℓ(k−1) in the first case because the servers are assumed non-preemptive.

The performance process given in (16) assumes that the service rate remains constant while a server works on a batch of tasks, which is reflected in the first case of (16). If the service rate can change during the service time of a batch, S^μ(k) can be dropped from the POCMC state and the value of the performance process at time k simplifies to

W^Q_ℓ(k) = C^Q_ℓ(q, t, n | r) if v_ℓ = 0, and 0 otherwise.   (16a)

Output process:
The output of the action kernel at time k, namely Y(k) := (Y_1(k), ..., Y_L(k)), is a vector of mutually independent Bernoulli random variables with parameters W^Q_ℓ(k), ℓ ∈ L, and indicates the completion of the tasks serviced by the servers before time k+1: Y_ℓ(k) = 1 if the ℓ-th server completes the service of the tasks it worked on at time k, and Y_ℓ(k) = 0 otherwise. Therefore, the output process Y := {Y(k) : k ∈ N} takes values in Y := {0,1}^L. The number of type t tasks completed at time k, Y̌_t(k), is determined by the control input U(k) and the output Y(k) of the action kernel according to

Y̌_t(k) = \sum_{ℓ=1}^{L} 1{U_{ℓ,1}(k) = t} U_{ℓ,2}(k) Y_ℓ(k),   t ∈ T.   (17)

Finally, the transition probability of the POCMC can be obtained from the above description and the scheduling policy in place. We describe the model in more detail for the example of a quorum policy.

• Example: (l, K)-quorum system with queue length-dependent service rate

Suppose that there is a single server (L = 1) and all tasks are of the same type (T = 1). Since T = 1, we drop the dependence on the type of task where appropriate. For instance, we write B_n and Z_n in place of B_{1,n} and Z_{1,n}, respectively. Under the (l, K)-quorum policy (1 ≤ l ≤ K), the server services tasks in accordance with the following rule. When the server is available to work on new tasks and finds q backlogged tasks, there are two possibilities to consider:
i. If q is smaller than the threshold l, the server rests.
ii. If q is at least l, it works on a batch of min(q, K) tasks until their service is completed.
A special case is when l = 1, which corresponds to the scenario where the server rests only if the queue is empty. For example, this may describe a shuttle bus or a ferry boat transporting passengers.
The shuttle bus driver may wait until at least l passengers are onboard, and the shuttle bus has a capacity of K passengers.

The control input U(k) is the scheduling vector chosen by the (l, K)-quorum policy and determines the number of tasks that the server services at time k. There are three possibilities to consider based on the above description:

U(k) = S^d(k) if S^r(k) > 0; min(S^q(k), K) if S^r(k) = 0 and S^q(k) ≥ l; and 0 otherwise.

When the server is working on tasks, its service rate may depend on the number of tasks being served as well as the number of backlogged tasks. This is a discrete-time generalization of the model studied by Neuts [86], for the service rate is allowed to depend on the queue length. In the earlier example of a shuttle bus or a ferry, for instance, the driver may feel pressure to shorten the trip times when there are many passengers waiting in line. Also, we assume that the service rate remains constant during the service time of a batch of tasks.

Since the server rests only when there are fewer than l uncompleted tasks, it suffices to model the POCMC state at time k, S^Q(k), using the quadruple (S^q(k), S^r(k), S^μ(k), S^d(k)). For notational simplicity, for every q ∈ N, we define q_K := min(q, K).

Suppose S^Q(k) = (q, r, μ, n) for some k ∈ N. The value of the performance process at time k, W^Q(k) in (16), is given by

W^Q(k) = B_n(r, μ) if r > 0; B_{q_K}(0, R^Q(q, q_K)) if r = 0 and q ≥ l; and 0 otherwise,

where B_n(r, μ) = P(Z_n ≤ r + μ | Z_n > r) and Z_n ∼ F_n. The output of the action kernel, Y(k), is a Bernoulli random variable with parameter W^Q(k).

Transition probability of POCMC:
The following map S^Q describes the transition probabilities of the POCMC. For ease of exposition, we break it into three cases. Let s = (q, r, μ, n) and s_+ = (q_+, r_+, μ_+, n_+).

Case 1. q < l (the server rests):

S^Q(s_+ | s, 0, y) = λ if q_+ = q + 1 and r_+ = μ_+ = n_+ = 0; 1 − λ if q_+ = q and r_+ = μ_+ = n_+ = 0; and 0 otherwise.

Case 2. r > 0 (the server worked on tasks at the previous time but did not complete their service):

S^Q(s_+ | s, n, y) = λ if q_+ = q + 1 − ny, r_+ = (r + μ)(1 − y), μ_+ = μ and n_+ = n; 1 − λ if q_+ = q − ny, r_+ = (r + μ)(1 − y), μ_+ = μ and n_+ = n; and 0 otherwise.

Case 3. q ≥ l and r = 0 (the server becomes available for new tasks and finds at least l backlogged tasks): let μ̃ := R^Q(q, q_K). Then

S^Q(s_+ | s, q_K, y) = λ if q_+ = q − q_K + 1, r_+ = 0, μ_+ = μ̃, n_+ = q_K and y = 1; 1 − λ if q_+ = q − q_K, r_+ = 0, μ_+ = μ̃, n_+ = q_K and y = 1; λ if q_+ = q + 1, r_+ = μ_+ = μ̃, n_+ = q_K and y = 0; 1 − λ if q_+ = q, r_+ = μ_+ = μ̃, n_+ = q_K and y = 0; and 0 otherwise.

B. Utilization-dependent service rates
In many cases of interest, the service rates of servers depend on their (recent) utilization levels. For example, the efficiency of human servers is not constant and varies with several factors, such as arousal and fatigue [75], [81]. Hence, in many applications with human servers making critical decisions (e.g., air traffic control and nuclear plant monitoring), it is important to take into account the efficiency and alertness of human servers in order to improve the performance of the overall system [12].

The case in which the service rate of a server varies as a function of its utilization level can be handled in a similar manner. For our discussion, we assume the same task arrival processes and the same setup with T types of tasks served by L servers, as described in Section VI-A.

Let S^u := {S^u(k) : k ∈ N}, where S^u(k) := (S^u_1(k), ..., S^u_L(k)), be the process that tracks the utilization levels of the L servers. For example, S^u_ℓ(k), ℓ ∈ L, could represent the utilization ratio, or the utilization ratio with forgetting factor α, of the ℓ-th server (provided in Definitions 2 and 3 of Section III-A). In the remainder of this section, without loss of generality, we assume that the utilization levels are non-negative. Furthermore, for the simplicity of discussion, we only consider exponentially distributed workloads. More general workload distributions can be handled as discussed in Section VI-A.

The POCMC is given by S^U = {(S^q(k), S^u(k), S^a(k), S^d(k)) : k ∈ N}, where S^a(k) := (S^a_1(k), ..., S^a_L(k)) indicates the availability of each server to take on new tasks. In other words, S^a_ℓ(k) = 1 if the ℓ-th server is available to service new tasks at time k (either after completing its tasks or resting at time k−1), and S^a_ℓ(k) = 0 otherwise.
Note that, because the workloads are assumed exponentially distributed, S^r(k) can be dropped from the POCMC state, thanks to the memoryless property of exponential distributions.

The control input at time k, U(k) = (U_1(k), ..., U_L(k)), is the scheduling vector and is determined by the employed scheduling policy. Recall that U_ℓ(k) is given by a pair (U_{ℓ,1}(k), U_{ℓ,2}(k)) and that, if U_{ℓ,1}(k) = 0, the ℓ-th server rests at time k.

The performance process at time k, W^U(k) = (W^U_1(k), ..., W^U_L(k)), depends on S^U(k) and U(k), and reflects the efficiency of the L servers as a function of their utilization levels. It is determined with the help of a service rate function and a task completion probability function, given by R^U : R_+^L × T_+^L × N^L → R_+^L and C^U : R_+^L × T_+^L × N^L → [0,1]^L, respectively: R^U_ℓ(η, t, n) denotes the service rate of the ℓ-th server as a function of the server utilization levels η := (η_1, ..., η_L) and the scheduling vector (t, n). Similarly, C^U_ℓ(η, t, n), ℓ ∈ L, is the probability with which the ℓ-th server completes the n_ℓ tasks of type t_ℓ in one unit of time. Because the workloads are exponentially distributed, we have

C^U_ℓ(η, t, n) = P(Z_{t_ℓ, n_ℓ} ≤ R^U_ℓ(η, t, n)) if t_ℓ > 0, and 0 otherwise.

Suppose that S^u(k) = η and U(k) = (t, n). Then, for all ℓ ∈ L, W^U_ℓ(k) = C^U_ℓ(η, t, n).

Analogously to the previous case of queue length-dependent service rates, the output of the action kernel is a vector of mutually independent Bernoulli random variables with parameters in W^U(k) and indicates which servers completed the tasks of the type chosen by the control input.
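As a small numerical sketch of the quantities above, assume unit-mean exponential workloads and a hypothetical inverted-U service rate function R^U (echoing the Yerkes-Dodson law), so that the completion probability is C̄^U(η) = 1 − exp(−R^U(η)). The utilization dynamic below, in which the level rises when the server is busy and falls when it rests, is likewise an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative inverted-U service-rate function: the server is assumed
# most effective at moderate utilization levels.
levels = np.arange(1, 6)  # utilization set {1, ..., s_max}, s_max = 5
rate = {h: 1.5 * np.exp(-0.5 * (h - 3) ** 2) for h in levels}

def completion_prob(eta):
    # Exponential(1) workload Z: P(Z <= R^U(eta)) = 1 - exp(-R^U(eta)).
    return 1.0 - np.exp(-rate[eta])

def step(eta, busy, xi_up=0.6, xi_down=0.4):
    # One transition of the utilization level: up by one w.p. xi_up when
    # busy (below the max), down by one w.p. xi_down when resting.
    if busy:
        if eta < levels[-1] and rng.random() < xi_up:
            eta += 1
    else:
        if eta > levels[0] and rng.random() < xi_down:
            eta -= 1
    return eta

probs = {h: completion_prob(h) for h in levels}
```

Under this choice, the completion probability peaks at the intermediate utilization level and falls off toward both the idle and overloaded ends, which is the qualitative behavior the inverted-U model predicts.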
The queue lengths evolve according to (15), with the number of type t tasks completed at time k given by (17) for all t ∈ T.

In order to describe the POCMC state transition probabilities, we still need to know how the utilization levels evolve as a function of the current POCMC state (in particular, the current utilization levels) and the control input. We capture the transition probabilities of the utilization levels using a map Λ^U : R_+^L × R_+^L × T_+^L × N^L → [0,1], where Λ^U(η_+ | η, t, n) is the probability that the utilization levels will transition from η to η_+ when the control input is (t, n). In a simple setting, the transition probability of the utilization level of the ℓ-th server may depend only on its current utilization level and on whether or not the control input requires it to work on tasks. In order to keep our discussion simple, we assume that the utilization levels take values in a discrete set.

• Example: Throughput-optimal scheduling policy for a single server with a utilization-dependent service rate
In a recent study, Lin et al. [82] studied the influence of server utilization in a system with one server servicing a single type of tasks whose workloads are exponentially distributed. We shall use this study as an example to illustrate how the UDC framework can be used to investigate the problem of designing a simple yet efficient scheduling policy for systems in which service rates depend on utilization. This study also illustrates how the UDC framework allows us to leverage a simpler system to facilitate the analysis. Since there is only one type of tasks, we drop the dependence on the type of task, as explained earlier.

The task arrival rate is denoted by λ > 0. The server is allowed to work on at most one task. Therefore, the control input U(k) at time k can be specified using a binary value: U(k) = 1 if the server works on a task at time k, and U(k) = 0 otherwise. Here, as mentioned earlier, we take the view that U(k) represents the number of tasks that the server works on at time k. Similarly, S^a(k) = 1 if the server is available to work on a new task at time k, and S^a(k) = 0 otherwise (indicating that the server is still working on a task that it did not complete at time k−1).

The utilization level of the server is modeled using a controlled Markov chain S^u = {S^u(k) : k ∈ N} that takes values in a finite set S^u := {1, ..., s_max}. The transition probabilities of the utilization level S^u(k) = η at time k depend on (i) the current value of the utilization, η, and (ii) the control input U(k), and are governed by the following mapping:

Λ^U(η_+ | η, n) = ξ^+_η if n = 1 and η_+ = min(s_max, η + 1); 1 − ξ^+_η if n = 1 and η_+ = η; ξ^−_η if n = 0 and η_+ = max(1, η − 1); and 1 − ξ^−_η if n = 0 and η_+ = η.

It is clear from the given transition probabilities that, if the server works on a task (resp. rests) at time k, the utilization level either remains at η with probability 1 − ξ^+_η (resp. 1 − ξ^−_η) or goes up by one with probability ξ^+_η if η < s_max (resp. goes down by one with probability ξ^−_η if η > 1), with the convention ξ^−_1 = ξ^+_{s_max} = 0.

When the server works on a task, its service rate depends on its current utilization level, η. For notational simplicity, we define C̄^U : S^u → [0,1] such that C̄^U(η) = C^U(η, 1) for all η ∈ S^u. In other words, C̄^U(η) is equal to the probability that the server will complete a task within one unit of time when its utilization level is η.

Transition probability of POCMC:
For this model, it is not necessary to retain the control input at the previous time. Hence, the state of the POCMC at time k reduces to a triple S_U(k) = (S_q(k), S_u(k), S_a(k)). Based on the mapping Λ_U, the transition probabilities of the POCMC are as follows: for all (q, η, a), (q⁺, η⁺, a⁺) ∈ S_U = N × S_u × {0, 1},

Case 1 (n = 0, the server rests):

  S_U((q⁺, η⁺, a⁺) | (q, η, a), 0) =
    λ Λ_U(η⁺ | η, 0)        if q⁺ = q + 1 and a⁺ = 1,
    (1 − λ) Λ_U(η⁺ | η, 0)  if q⁺ = q and a⁺ = 1,
    0                        otherwise.

Case 2 (n = 1, the server works on a task): with y ∈ {0, 1} denoting the number of tasks completed during the period (y = 1 with probability C̄_U(η)),

  S_U((q⁺, η⁺, a⁺) | (q, η, a), 1, y) =
    λ Λ_U(η⁺ | η, 1)        if q⁺ = q + 1 − y and a⁺ = y,
    (1 − λ) Λ_U(η⁺ | η, 1)  if q⁺ = q − y and a⁺ = y,
    0                        otherwise.

Stationary randomized scheduling policies and stability:
The authors of [82] considered the following class of stationary policies that map the current state of the POCMC to the probability of scheduling a task when the queue is not empty.
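To make the dynamics above concrete, the following sketch simulates the single-server POCMC under the simple policy "work whenever the queue is non-empty". All numerical values for ξ⁺_η, ξ⁻_η and C̄_U are made-up illustrative assumptions, not parameters from [82].

```python
import random

def simulate_pocmc(lam, xi_up, xi_dn, c_bar, steps=20000, seed=0):
    """Simulate the queue length and utilization of the single-server POCMC
    under the policy 'work whenever the queue is non-empty' (n = 1 iff q > 0).
    xi_up, xi_dn, c_bar map eta in {1, ..., s_max} to probabilities.
    Returns the time-averaged queue length."""
    rng = random.Random(seed)
    s_max = max(c_bar)
    q, eta = 0, 1                       # S_q(0) = 0, S_u(0) = 1
    q_acc = 0
    for _ in range(steps):
        n = 1 if q > 0 else 0           # control input U(k)
        y = 1 if (n == 1 and rng.random() < c_bar[eta]) else 0
        if rng.random() < lam:          # Bernoulli(lam) task arrival
            q += 1
        q -= y                          # completed task leaves the system
        if n == 1:                      # utilization update via Lambda_U
            if rng.random() < xi_up[eta]:
                eta = min(s_max, eta + 1)
        elif rng.random() < xi_dn[eta]:
            eta = max(1, eta - 1)
        q_acc += q
    return q_acc / steps

# made-up illustrative parameters: completion slows as utilization grows
xi_up = {1: 0.5, 2: 0.5, 3: 0.0}    # xi+_{s_max} = 0 by convention
xi_dn = {1: 0.0, 2: 0.5, 3: 0.5}    # xi-_1 = 0 by convention
c_bar = {1: 0.9, 2: 0.6, 3: 0.3}    # C_bar_U(eta)
avg_q = simulate_pocmc(0.2, xi_up, xi_dn, c_bar)
```

Sweeping lam in such a simulation gives a quick numerical feel for the largest arrival rate a given policy can sustain, which is exactly the question formalized next.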
Definition 12.
An admissible stationary randomized scheduling policy (SRSP) is a mapping
Θ : S_U → [0, 1] such that, for all (q, η, a) ∈ S_U, Θ(q, η, a) is the probability that the server is asked to work on a task when the controlled Markov chain state is (q, η, a).

Recall that, because the server is assumed non-preemptive, once it starts working on a task, it is required to continue to service the task until completion and, hence, Θ(q, η, 0) = 1 for all q > 0 and η ∈ S_u. Also, for a fixed SRSP Θ, the controlled Markov chain S_U is a discrete-time Markov chain with a countable state space.

Definition 13.
For a fixed task arrival rate λ > 0, the controlled Markov chain S_U under a chosen SRSP Θ, denoted by S_U^Θ, is said to be stable if
• there exists at least one recurrent communicating class of S_U^Θ;
• all recurrent communicating classes are positive recurrent; and
• the number of transient states is finite.
In addition, Θ is said to stabilize S_U for λ.

Using this notion of stability, the authors of [82] investigated the problem of designing a throughput-optimal scheduling policy that stabilizes S_U for any arrival rate λ for which there exists a stabilizing SRSP. In order to facilitate their analysis, they made use of a virtual queue that always has a task waiting for service when the server becomes available. Removing the queue size, the state of the virtual queue is given by the process S̃_U := {(S̃_u(k), S̃_a(k)) : k ∈ N}.

Threshold scheduling policies:
In order to identify a throughput-optimal scheduling policy with a simple structure, the authors of [82] focused on a family of threshold policies: fix τ ∈ S_u⁺ := {1, . . . , s_max + 1}. A threshold (scheduling) policy for the virtual queue with threshold τ is a deterministic scheduling policy given by a mapping Φ_τ : S_u × {0, 1} → {0, 1}, where

  Φ_τ(η, a) := 0 if η ≥ τ and a = 1, and 1 otherwise.   (18)

Clearly, when the server is available to service a new task, the threshold policy Φ_τ assigns a new task if and only if the utilization level is less than the threshold τ.

The virtual queue S̃_U under a threshold policy Φ_τ for τ > 1 can be modeled using a finite-state Markov chain with a unique stationary distribution π̃_τ concentrated on the set S̃_τ := {(η, a) | η ∈ {τ − 1, . . . , s_max}, a ∈ {0, 1}}. Define

  λ⋆ := max_{τ ∈ S_u⁺} Σ_{(η,a) ∈ S̃_τ} π̃_τ(η, a) Φ_τ(η, a) C̄_U(η)   (19)

to be the maximum average task completion rate for the virtual queue among all threshold policies of the form in (18). Let τ∗ be a maximizer of the right-hand side of (19).

The authors of [82] showed that if there is a stabilizing SRSP for some λ > 0, then λ cannot be larger than λ⋆. Hence, λ⋆ serves as an upper bound on the average task completion rate that can be achieved by any SRSP, and for any task arrival rate λ > λ⋆, we cannot find an SRSP that stabilizes S_U. In addition, they proved that the following deterministic scheduling policy stabilizes S_U for any task arrival rate λ strictly smaller than λ⋆: define Θ_τ∗ : S_U → {0, 1}, where

  Θ_τ∗(q, η, a) := Φ_τ∗(η, a) if q > 0, and 0 otherwise.   (20)

Obviously, Θ_τ∗ and Φ_τ∗ make the same deterministic scheduling decisions when the queue is non-empty. Thus, the scheduling policy Θ_τ∗ applies a threshold on the utilization level to make scheduling decisions when there is a task to be serviced.

Remark 2. (Throughput-optimal scheduling policy Θ_τ∗)
i. The value of an optimal threshold τ∗ can be easily identified by solving the optimization problem in (19), searching through the finite set S_u⁺ with s_max + 1 elements. This greatly simplifies identifying the throughput-optimal policy provided in (20).
ii. The aforementioned results hold with no assumption on the task completion probability function C̄_U. In particular, C̄_U need not be monotonic and can be an arbitrary function taking values in (0, 1). This is important for understanding and optimizing the performance of servers with state-dependent service rates (e.g., human supervisors), which may not be monotonic with utilization.

In a closely related study, Savla and Frazzoli [12] investigated a similar problem of designing a task release control policy. There are two key differences between these two studies. First, the model employed by Savla and Frazzoli assumes that the service time function is convex, which is analogous to the service rate function employed in [82] being unimodal. Lin et al., however, do not impose any assumption on the service rate function. Second, a threshold policy is proved to be maximally stabilizing by Savla and Frazzoli only for the case with identical task workloads. In the study of Lin et al., the workloads of tasks are modeled using i.i.d. random variables.

VII. REMOTE ESTIMATION ACROSS A PACKET-DROP LINK POWERED BY ENERGY HARVESTING
We begin this section by describing a UDC consisting of a packet-drop link powered by energy harvested and stored according to the models delineated in Section III-B. The apportionment of energy for the transmission of information across the link over time is governed by a control process. We then discuss a few research themes in which the link is used in a remote estimation context.
A. Packet-drop links powered by energy harvesting
At each time k, the link either conveys unerringly a symbol in X or a packet drop occurs. Implementation of the packet-drop link using wireless communication requires, for each k, that a codeword appropriately encoding X(k) be placed for transmission across one or more physical channels. The transmission of a codeword will, in general, require multiple uses of each channel. A decoder at the receiver attempts to recover X(k), and a packet drop occurs when it fails due to an outage caused by fading, interference or other detrimental effects. If X is infinite, such as when it is a real coordinate space, we assume that the codeword length is large enough to encode X(k) with negligible quantization error.

Definition 14. (EH packet-drop link)
A packet-drop link comprises an action kernel whose output alphabet is Y := X ∪ {E}, where E indicates a packet drop. The input-output relationship is specified as follows:

  Y(k) = X(k) if L(k) = 1, and Y(k) = E if L(k) = 0,   (21)

where the link process L indicates that there is a successful transmission when L(k) = 1 and that the packet is dropped otherwise. We assume that a map L : W_E → [0, 1] characterizes L probabilistically as follows:

  P_{L(k) | S_E(k), U(k)}(0 | s_E, u) = L(w_E),  k ∈ N, s_E ∈ S_E, u ∈ U,   (22)

which quantifies the probability of a packet drop. Here, W_E and S_E are obtained from the EH model described in Definition 5 or a simplified version, such as the one specified in Definition 7. In addition, we assume that L(k), S_E(k + 1) and O(k) are conditionally independent given S_E(k) and U(k).

In a wireless communication setting, an outage causing a packet drop occurs when fading, which is stochastic in general, attenuates the transmitted signal to the point that the received power is below a threshold needed for decoding [74], [87]. The threshold depends on the codeword length, noise and interference characteristics [88], and it may also be stochastic. Here, we assume that fading and the transmission power are constant during the transmission of the codeword encoding X(k). Moreover, W_E(k) represents the total energy used in attempting to transmit X(k). Hence, L, which quantifies the probability of outage given the transmission power as in (22), is a non-increasing function that can be determined on a case-by-case basis, such as in [89].
Fig. 4. Basic overall structure of the remote estimation system considered.
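As an illustration of Definition 14, the sketch below implements one step of an EH packet-drop link. The exponential outage law and the linear-saturated battery update are our own assumptions (any non-increasing map L would do); none of the names or parameters come from the cited references.

```python
import math
import random

def outage_prob(w_e, gamma=1.0):
    """Assumed outage law exp(-gamma * w_e): a non-increasing function of
    the energy w_e spent on the transmission, as required of L in (22)."""
    return math.exp(-gamma * w_e)

def eh_link_step(x, u, s_soc, harvest, b_max, rng=random):
    """One step of an EH packet-drop link with a linear-saturated battery.
    x: symbol to convey; u: requested transmission energy;
    s_soc: current state of charge; harvest: energy arriving in the slot.
    Returns (y, next_soc), with y = 'E' on a packet drop."""
    w_e = min(u, s_soc)                           # cannot spend more than stored
    y = 'E' if rng.random() < outage_prob(w_e) else x
    next_soc = min(b_max, s_soc - w_e + harvest)  # saturated battery update
    return y, next_soc
```

With zero requested energy the outage probability is one, so the output is always the erasure symbol; spending more energy monotonically reduces the drop probability, which is the essential utilization dependence of this UDC.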
B. Design of remote estimation systems: problem definitions
Henceforth, we prioritize the discussion of research on the design of remote estimation systems. Our choice is motivated not only by applications, such as the monitoring of physical processes, but also by relevance for the design of control systems.

We consider the configuration depicted in Figure 4, in which an estimator E is a causal map that is possibly time-varying and seeks to reconstruct a process V based on information sent to it via a packet-drop link, according to E : y(1 : k) ↦ v_E(k), for k in N. A transmitter is a causal map T that is possibly time-varying and uses V_T and O to produce X and U according to T : (v_T(1 : k), o(1 : k)) ↦ (x(k), u(k)), for k in N. In most cases of interest, V_T is either V itself or a causal function of V, possibly disrupted by additive or multiplicative noise. We refer to the pair (T, E) in conjunction with the UDC that specifies the EH packet-drop link as a remote estimation system.

Remark 3. (Synchronization between T and E)
Notice that one-step delayed feedback from the output of the link can be made available to T through O by augmenting the state of the POCMC so as to include Y(k − 1). When such feedback is present, a copy of the estimate V_E(k − 1) can be replicated by T at time k. This synchronization often simplifies the joint design of T and E to meet stability or optimality conditions.

We proceed with discussing the chronology of research on the design of remote estimation systems and control, with emphasis on the former.
Problem 1. (Optimal remote estimation system design)
Let an EH packet-drop link and the joint probabilistic description of V_T(1 : k) and V(1 : k) for all k in N be given. For predetermined sets T and E of allowable transmitters and remote estimators, respectively, determine whether a pair (T, E) exists that is optimal with respect to a given figure of merit J : T × E → R_+ that should assess the quality of V_E relative to V and can include additional costs. If such a pair exists, determine one.

Unless stated otherwise, we assume the following widely-used covariance-based cost structure:

  J^(2q,K)(T, E) := (1/K) Σ_{k=1}^{K} E[ ((V_E(k) − V(k))^T (V_E(k) − V(k)))^q ]   (23)

where q is a positive integer and K indicates the length of the optimization horizon.

Stabilizability in the m-th moment sense, as defined below, is another relevant design objective.

Problem 2. (m-th moment stabilizability)
Let an EH packet-drop link and the joint probabilistic description of V_T(1 : k) and V(1 : k) for all k in N be given. Consider that the m-th moment of V is unbounded. Determine whether a pair (T, E) exists for which the m-th moment of V(k) − V_E(k) is bounded for all k in N. If such a pair exists, determine one.

Notice that the existence of a solution that is optimal for J^(2q,K)(T, E) in the limit when K tends to infinity may imply, under certain conditions, q-th moment stabilizability. When either T or E is a singleton in Problem 1 or 2, we say that the associated design problem is of the single-block type, and we qualify it as two-block otherwise.

Remark 4. (Relevance of remote estimation for control systems)
There are at least two scenarios for which Problem 1 or 2 is relevant in the context of control systems. The first is when a packet-drop link connects the sensors that access the output of the plant to the controller. In this case, the transmitter is collocated with the sensors and the remote estimator is typically a component of the controller. The second setting is when the controller includes a transmitter to send its command signals to a remote estimator that is collocated with the actuator. A combination of both cases is also possible.

C. Uncontrolled transmission: optimal policies
As is surveyed in [6], the design of stabilizing and, whenever possible, optimal estimation and control systems whose components communicate via packet-drop links has been an active research topic for at least fifteen years. Early work assumed that the link process L was an uncontrolled time-homogeneous Markov chain. This assumption is realistic when the fading process, as indexed by k, is a real-valued time-homogeneous Harris chain and T does not have the authority to select the transmission power, which may be kept constant thanks to a dependable energy supply.

Henceforth, we limit our discussion to remote estimation systems in which V and V_T are obtained as follows:

  V(k + 1) = A V(k) + N(k),  k ∈ N   (24a)
  V_T(k) = C V(k) + N_T(k),  k ∈ N   (24b)

where A and C are real matrices of appropriate dimensions and the noise processes N and N_T are independent and white with nonsingular covariance. In the context of control systems, an additional input term may be present on the right-hand side of (24a) and (24b).

At first, the effect of uncontrolled packet drops was modeled as multiplicative noise [90], [91], which makes the analysis of stability and second-moment optimal design amenable to techniques inspired by Markovian jump linear system theory [92]. Typically, the noise process would be Bernoulli, taking the value 0 when a drop occurs. In a control systems setting, these multiplicative noises could affect the links carrying sensor measurements to the controller and control signals to the actuator.
Most approaches focused on single-block design in which the designed block, depending on which links suffer packet drops, would be either a component at the sensors that processes measurements prior to transmission, a controller [93] or a remote estimator. As a consequence of the simplicity of the single-block framework, optimal policies and tight stabilizability conditions for state estimation and control can be obtained even when there is no link output feedback [94], [95], which can be viewed as a form of user datagram protocol (UDP).

The two-block remote estimation system formulated in [96] was the first to consider the simultaneous design of T and E. When (24) is detectable [97], [98], the approach in [96], which is specified in continuous time, can be immediately adapted to our discrete-time framework. In such a case, when L is a Bernoulli process, the remote estimation system is m-th moment stabilizable if and only if the following condition holds:

  p_outage ρ(A)^m < 1   (25)

where p_outage := P_{L(k)}(0) is the probability of a drop and ρ(A) is the spectral radius of A. As is shown in [96], a stabilizing solution is obtained by selecting T as a Kalman filter and X as its state, followed by a properly designed estimator E. Subsequent work in [99] showed that the scheme in [96] is optimal with respect to a quadratic cost when N and N_T are independent white Gaussian processes. Stabilizability in a control systems context was characterized in [100] using similar techniques for the case in which measurements are conveyed to the controller using two packet-drop links, each having a distinct transmitter block. The setting in which a packet-drop link conveys command signals from the controller to the actuator was investigated in [101].

Interestingly, (25) can be obtained as the limiting case [102], when r tends to infinity, of the condition in [103], [104] that characterizes stabilizability when an r-ary erasure channel connects the transmitter to the remote estimator.
D. Controlled transmissions without packet drops
We now consider the case in which transmissions may be controlled through U, while V and V_T are modeled by (24). When restricted to the remote estimation framework adopted here, in which T must be designed to appropriately generate both X and U, controlled transmissions were first studied in a stabilizability context in [96].

In [96], T incorporates a Kalman filter that uses V_T to generate a local estimate V̂ of V. In addition, it implements policies that use the magnitude of V̂ − V_E to determine the likelihood that a transmission is requested, which requires synchronization between T and E so that V_E can be reconstructed at the transmitter. Notice that, in the absence of packet drops, T and E can be synchronized without the need for feedback through O, since Y can be causally computed at the transmitter based on U and X. In this context, when V is scalar and the noises N and N_T are Gaussian, a policy that requests a transmission when the magnitude of V̂ − V_E exceeds a threshold was later shown in [105] to be optimal, jointly with a Kalman-like estimator, with respect to a cost that linearly combines the expected squared estimation error and the time-averaged probability of transmission. As reported in [107], threshold-type policies remain optimal when V has dimension two or higher, provided that A is a scaled orthogonal matrix. Although [108] shows that a jointly optimal transmitter and estimator pair exists for the aforementioned setting even when A is any real-valued matrix, the question of whether there is a jointly optimal pair admitting threshold-type policies for transmission remains an open problem. Certainty equivalence properties for these estimators, which are relevant for the design of optimal controllers, are investigated in [109]. Optimal strategies subject to restrictions on the total number of transmissions were determined in [93].
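The event-triggered rule of the kind shown optimal in [105] for the scalar Gaussian case can be sketched as follows; the threshold and all parameters are illustrative choices of ours, not the optimized values from that work.

```python
import random

def event_triggered_cost(a=0.8, thr=1.0, c=0.5, steps=20000, seed=2):
    """Scalar sketch of an event-triggered transmitter: the local estimate
    plays the role of the Kalman output, a transmission is requested only
    when |v_hat - v_e| exceeds thr (no packet drops), and the per-step cost
    is the squared error plus c per transmission."""
    rng = random.Random(seed)
    v_hat = v_e = total = 0.0
    for _ in range(steps):
        v_hat = a * v_hat + rng.gauss(0.0, 1.0)  # local (Kalman-like) estimate
        v_e = a * v_e                            # remote prediction
        if abs(v_hat - v_e) > thr:               # event-triggered request
            v_e = v_hat                          # received without drops
            total += c
        total += (v_hat - v_e) ** 2
    return total / steps
```

Sweeping thr traces the trade-off between estimation error and communication that the cost in [105] linearly combines: thr = 0 recovers "always transmit" with cost c per step, while large thr saves transmissions at the price of larger error.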
Results reported in [110] show that threshold-based schemes can be adapted to guarantee stabilizability of a system formed by a network of plants and controllers connected by multiple packet-drop links.

The framework in [107] was the first, in the context of remote estimation considered here, to allow for transmission policies that account for energy harvesting. Notably, it considers that U is generated based not only on V_T but also on S_SOC, as determined by the linear-saturated EH model (10a), for which the arrival process S_A is assumed i.i.d. and S_SOC is normalized so that each transmission at time k requires W_E(k) = 1. When the noises in (24) are zero-mean white Gaussian and there are no packet drops, it follows from [107, Theorems 3 and 4] that there are transmission and estimation policies with the structure in (26) and (27), respectively, that are jointly optimal for the scalar case:

  V_E(k) = A V_E(k − 1) if Y(k) = E, and V_E(k) = X(k) if Y(k) ≠ E,  k ∈ N   (26)

  U(k + 1) = 1 if |V̂(k + 1) − A V_E(k)| > G(k, S_SOC(k + 1)), and 0 otherwise   (27a)
  X(k) = V̂(k),  k ∈ N   (27b)

Here, G is a threshold that depends on time and the state of charge S_SOC(k). The threshold determines when U(k) is 1, in which case a transmission setting Y(k) to X(k) is requested at time k. Methods to determine G are described in [107]. It is remarkable that a policy pair with the simple structure in (26) and (27) is jointly optimal, which also guarantees that it accomplishes the best trade-off between transmitting at time k and saving energy to transmit later.

It is important to note that, barring the dependence of the thresholds for transmission on S_SOC(k), (26) and (27) are akin to the optimal policies in [105], [106].
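The energy-aware structure of (26) and (27) can be sketched as below. The threshold map G is an assumed illustrative choice (the actual G of [107] is computed numerically), and for simplicity the transmitter observes the state directly, so V̂ = V.

```python
import random

def threshold_g(k, soc):
    """Assumed threshold map G(k, S_SOC): lower when more energy is stored,
    so transmissions are requested more liberally. Purely illustrative."""
    return 2.0 / (1.0 + soc)

def eh_threshold_estimation(a=0.9, b_max=3, steps=5000, seed=1):
    """Scalar sketch of estimator (26) with policy (27): unit-energy
    transmissions, i.i.d. {0, 1} energy arrivals, and no packet drops."""
    rng = random.Random(seed)
    v = v_e = 0.0
    soc = 1                                      # initial state of charge
    err_acc = 0.0
    for k in range(steps):
        v = a * v + rng.gauss(0.0, 1.0)          # source (24a), v_hat = v
        soc = min(b_max, soc + rng.randrange(2)) # saturated battery recharge
        if soc >= 1 and abs(v - a * v_e) > threshold_g(k, soc):
            soc -= 1                             # spend one quantum: W_E(k) = 1
            v_e = v                              # (26): estimate := received X(k)
        else:
            v_e = a * v_e                        # (26): hold when Y(k) = E
        err_acc += (v - v_e) ** 2
    return err_acc / steps, soc
```

Making the threshold decrease with the state of charge captures the qualitative trade-off of [107]: with a full battery it is cheap to transmit now, while with a depleted battery it pays to tolerate larger error and save energy.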
As is explained in [107], one way to obtain these results is to establish that there is a jointly optimal policy pair whose estimator has the structure (26), after which the problem of finding a corresponding optimal transmission policy can be cast as an MDP [111] whose state is finite-dimensional because T and E are synchronized. Subsequently, well-known results can be invoked to prove that restricting transmission policies to memoryless functions of the state of the MDP incurs no loss of optimality. Properties of the probability distributions of the noises, such as symmetry and unimodality, are used to show that there is no optimality loss when these policies are further restricted to be of the form (27).

(The techniques and results in [105] are to a significant extent equivalent to the research reported in [106] for paging and registration policies.)

E. Controlled transmissions with packet drops and perfect feedback
In this subsection, we discuss recent work for the framework that extends that of Section VII-D by allowing packet drops in the link that connects the transmitter to the remote estimator.
Assumption 1.
Unless noted otherwise, here we assume that there is a causal map with which Y(k) can be recovered unerringly from O(1 : k + 1), for all k in N, which also implies that T and E can be synchronized.

Assumption 2.
We also assume that S_SOC(k) and S_A(k) can be recovered from O(1 : k), for all k in N.

We proceed to defining and subsequently discussing advantages and properties of a class of covariance-based transmission policies, which has been adopted in [112], [113], [114], [115], to list a few. A transmission policy is classified as covariance-based when the dependence of U on V_T and O can be recast in terms of a matrix-valued process P_T that is determined from Y as follows, for k in N:

  P_T(k) := E[ (V(k) − V_E(k)) (V(k) − V_E(k))^T | Y(1 : k) ]   (28)

where P_T(0) is predetermined. There is a recursive time-update mechanism [112] for P_T that guarantees that it is an information state [116], which, as we discuss below, may be used to recast the underlying optimization as an MDP, subject to the following set of policies.

Definition 15. (T_C - Memoryless covariance-based transmission policy set)
We use T_C to denote the set of transmitters for which there is a map T_U determining U according to T_U : (P_T(k − 1), S_SOC(k), S_A(k)) ↦ U(k).

Now, consider the formulation in [112], in which for each k, T selects X(k) equal to V̂(k), and U(k) is either zero (no transmission) or a pre-selected energy quantum, as opposed to allowing two or more energy levels. A transmitter that seeks to convey X(k) = V̂(k) to the estimator is often labeled a smart sensor, to distinguish it from the scheme in [114], which attempts to forward the unprocessed measurements by setting X(k) equal to V_T(k). If N and N_T are Gaussian, then there is a tractable method to design remote estimation systems that are optimal subject to the restriction that T is in T_C. Namely, in exchange for the possible loss of optimality that results from this restriction, as pointed out in [112], there is no further loss of optimality by also assuming that the estimator has the structure in (26).
This property allowed the authors of [112] to show that there are coordinate-wise threshold transmission policies that are optimal among those in T_C. In spite of these advantages, there is no known bound on the performance loss incurred by this method.

The authors of [113] investigated methods to determine optimal power selection policies when the probability of outage depends exponentially on the transmission power, which in their framework is allowed to vary among two or more levels. Notably, short of allowing for varying transmission power levels, the formulation of [113] is analogous to the one in [112]. Notwithstanding their similarities, the analysis in the former demonstrates why allowing the transmitter to select among multiple power levels significantly complicates the search for and characterization of optimal policies. In order to contend with the complexity of the problem, the work in [113] includes useful approximations and tractable methods. The analysis and framework in [117], which also examines a control problem, provide suboptimal policies and numerical methods to address the case in which Assumptions 1 and 2 are not satisfied.

Tight necessary and sufficient conditions for the existence of a transmission policy that stabilizes the estimation error in the second-moment sense have recently been determined in [118]. The formulation in [118] considers memoryless policies that use the state of charge S_SOC(k) to decide, at each time k, whether a transmission should be attempted and, if so, at which power level U(k). In order to state the stabilizability conditions, we refer to a map L_d : S_SOC → [0, 1] representing the probability of outage in terms of the state of charge when a transmission is requested.
More specifically, the map L_d, which is represented with d in [118], must quantify the combined effect of the power selection policy, the battery model that yields the effective power W_E(k), and L(W_E(k)), which quantifies the outage probability according to (22). The probability that a transmission is requested at time k is a function of S_SOC(k), specified by a randomized transmission-request policy map, which is represented as L_θ : S_SOC → [0, 1] and is denoted as θ in [118]. Theorem 3.1 in [118] states that, given L_d, there is a stabilizing transmission-request policy if and only if the following inequality holds:

  λ_S ρ(A)^2 < 1   (29)

where the nonnegative real constant λ_S is a function of L_d and S, which specify the probabilities of transition of S_SOC in terms of U. In addition, it is stated in [118, Theorem 3.1] that it suffices to consider deterministic transmission-request policies, and according to [118, Theorem 3.2] the search can be further narrowed to threshold policies when L_θ is non-increasing. Notice that (29) is a generalization of (25), and the two conditions coincide when L_d is constant and equal to p_outage.

VIII. CONCLUSIONS AND FUTURE DIRECTIONS
Our overview of the concepts, formulations and methods utilized in the research themes expounded in Sections IV-VII evinces not only the similarities elicited by the presence of a UDC, but also unveils a clear distinction among the objectives, techniques and assumptions adopted in each theme. This disconnection creates new research opportunities and challenges that would benefit from the fusion of the techniques and approaches that hitherto have been routinely employed by the information theory, wireless communication, operations research, networking and control theory communities. More specifically, we conclude that the research challenges described in Sections VIII-A-VIII-C are currently not fully addressed and constitute significant opportunities for future work that would also lead to methods for tackling problems specified by more realistic models and assumptions. Subsequently, in Sections VIII-D-VIII-G, we suggest additional future research directions that broach aspects of security and secrecy, effective methods to cope with systems comprising multiple UDCs, UDCs in learning, and more realistic battery models, respectively.

A. Noisy channels for remote estimation
Most work discussed in Section VII presumes that, in the absence of an outage, an EH packet-drop link can convey a real vector unerringly from the sensor to the estimator when a transmission is requested. Considering the unidealized case in which a noisy channel links the sensor to the estimator would require the investigation of causal encoding and decoding schemes, possibly inspired by modifications of those discussed in Section IV. Introducing channel encoding and decoding, and possibly lossy source compression, as was done in [119] for an independent Gaussian source, would expand the set of policies to include high- and low-fidelity solutions whose implementation may consume more or less energy [120], respectively, in addition to that required for transmission. Obtaining methods for the design of such policies with stability and performance guarantees is, therefore, an important open challenge.

The case in which the UDC would depend not only on the energy available but also on the state of a physical system, such as the position and velocity vectors of a mobile agent, would be an interesting extension of this framework. In this setting, the UDC could be a communication channel between the agent and a base station whose outage likelihood would increase with distance for each transmission power level. The scenario in which the UDC would be a global positioning module (GPS) whose accuracy would depend on the location and power level, with higher fidelity consuming more power, would be an example relevant to the autonomous navigation [121] of unmanned assets. In these cases, one needs to consider policies that not only allocate power for the UDC but also govern the control action that steers the agent. As is discussed in [122], many active sensing problems could be formulated similarly once energy harvesting constraints are included.
B. Queueing, remote estimation and age of information
According to the optimality principle used in [99], for the framework adopted in Section VII-C, if a sensor has access to the state V or is able to compute the optimal state estimate V̂ - cases we refer to as a full-information sensor or a smart sensor, respectively - then it should always attempt to transmit the latest one to the remote estimator. Hence, given a choice, it is optimal to discard state estimates corresponding to failed transmission attempts in favor of the most recent one - a principle we denote as most-recent-only optimality. In fact, this most-recent-only optimality principle for a full-information/smart sensor remains valid even in the controlled transmission setting described in Section VII-E. Hence, these observations suggest that introducing a packet management layer, such as establishing a queue, prior to transmission is not necessary and may even be counterproductive when the sensor is full-information/smart.

However, when using an existing transmission system, one may be left with no option but to deal with a pre-existing first-in-first-out queue-based non-preemptive management system in which a packet leaves the queue only when it is successfully conveyed to the remote estimator. Notably, as is proved in [123], [124] for the aforementioned scenario, for the case in which the source is a Wiener or Ornstein-Uhlenbeck process and the sensor is full-information, it is never optimal to submit a measurement for transmission when the queue is non-empty, and when a new measurement is inserted in the empty queue for transmission, it must be the current state of the process, which can be viewed as a version of the most-recent-only optimality principle for the case when pre-emption is not allowed. Interestingly, the optimal rule proposed in [124] to determine whether to submit the latest measurement for transmission, subject to the queue being empty, follows an event-based threshold policy that is analogous to the one found to be optimal for the closely related case analyzed in [125].
The factthat the most-recent-only optimality principle may no longerhold when the sensor is neither full-state nor smart [114] raisesthe question of whether, if the sensor in the framework of [124]could transmit only noisy output measurements V T , therewould be optimal policies for which a transmission would bescheduled even when the queue is non-empty. Furthermore, ifthe queue is served by a channel powered by energy harvestedfrom stochastic sources then we are left with the currentlyunsolved problem of designing policies that determine not onlywhen and which estimates or measurements should be placedin the queue for transmission but also allocate the energy usedfor each transmission attempt. A typical approach would be tocharacterize stabilizing policies first, perhaps within an appro-priately parametrized class, followed by the characterizationof structural properties that could facilitate the computation ofoptimal policies using tractable methods. The stability problemmay require the integration of techniques such as the ones usedin [82] and [118], which were discussed in Sections VI and VIIin the context of queueing and remote estimation, respectively.Devising methods to design optimal policies may involvefusing the techniques adopted in Section VII-E and [123],[124], and possibly leveraging the fact that our UDC model isamenable to existing methodologies [116], [111] for partiallyobservable controlled Markov chains. Since the fidelity of theestimate constructed at the remote estimator depends on therecency of the information received by the remote estimator,both the stability and the optimization problems are related to The techniques used in [125] are analogous to the ones adopted for thecase without packet drops in [105]. recent work seeking to analyze and design data-transmissionsystems that effectively regulate the age of information [126],[127]. 
In fact, it has been suggested in [128], [129] that the remote estimation and age of information problems are inextricably tied.

C. Feasible region and trade-off among performance metrics
Most existing studies in which queue length, utilization or workload affects servers, including those mentioned in Section VI, examine the effects on a single aspect of server performance, most often the service rate. As another example, the study by Chatterjee et al. [85] takes into account the service quality (modeled as the channel condition in their study) as a function of queue length and examines the information-theoretic capacity of such systems. In many cases of interest, however, including human supervisors [79], several performance aspects, such as service rate and service quality (e.g., reliability or the frequency of mistakes or poor decisions), can be affected at the same time by work history via the server state. Moreover, the requirements (e.g., service rate vs. reliability) in different applications are likely to vary considerably based on the types of tasks that need to be processed.
From this viewpoint, it is important to develop a comprehensive theory for these systems, including their fundamental limits. Regrettably, to the best of our knowledge, little is known about the feasible region of multiple performance metrics that can be achieved simultaneously, or about how to design policies that realize a desired trade-off among the performance metrics within the feasible region, in particular on the Pareto frontier.
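As a minimal illustration of the last point, given the (service rate, reliability) pairs achieved by a finite set of candidate policies, the Pareto-optimal pairs can be extracted as follows; treating both metrics as quantities to be maximized is an assumption made for the example:

```python
def pareto_frontier(points):
    """Return the Pareto-optimal subset of (service_rate, reliability)
    pairs, both to be maximized. A point is discarded iff some other
    point is at least as good in both coordinates and strictly better
    in at least one."""
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier
```

In an actual UDC setting the achievable pairs would be induced by scheduling policies acting on the server state, rather than enumerated directly.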
D. Secure remote estimation powered by energy-harvesting
Preventing, or at the very least mitigating the effect of, attacks on the channels connecting the sensors to every component relying on remotely constructed state estimates is critical to ensure the safe operation [130] of networked cyber-physical systems. While clever encoding and decoding schemes [131], some of which may be implemented efficiently using event-based algorithms, may thwart or curb the effect [132] of certain types of attacks, a relentless surreptitious man-in-the-middle (MitM) attack injecting false data [133] may significantly degrade the performance of any remote estimation system. The case study in [134] illustrates that employing message authentication codes (MACs), even if infrequently, may afford performance guarantees against MitM attacks. It further demonstrates that although MACs are known to substantially increase communication overhead, which is particularly critical when using bandwidth-limited networks such as the ones found in automobiles, their parsimonious use may suffice for practical purposes. A promising new research avenue is to investigate estimation-oriented encoding and decoding schemes and MAC scheduling policies that would jointly provide stability and performance guarantees, or would even be jointly optimal with respect to a given estimation error metric, in the presence of MitM attacks. Realistic problem formulations, in which information transmission is powered by an energy harvesting module, would have to account for the additional energy required for the transmission of MACs. A new type of EH link that would account not only for packet-drop events, but also for MitM attacks whose likelihood and severity would depend on the power employed in each transmission for the inclusion of MACs, could be a useful abstraction for designing and evaluating the performance of such systems.
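To illustrate the trade-off that a parsimonious MAC scheduling policy must navigate, the toy simulation below authenticates each transmission independently with probability p_auth and measures both the energy spent on MACs and the delay until a persistent false-data injection is first caught by an authenticated packet. The detection model and the unit energy cost per MAC are hypothetical simplifications, not the model of [134]:

```python
import random

def simulate_mac_schedule(p_auth, attack_start, horizon=1000, seed=0):
    """Sketch of parsimonious message authentication: each transmission
    carries a MAC with probability p_auth (costing extra energy). A
    man-in-the-middle attack starting at `attack_start` corrupts every
    packet and is detected at the first authenticated packet thereafter.
    Returns (detection_delay, auth_energy), where auth_energy counts
    MAC-bearing packets at one (hypothetical) energy unit each."""
    rng = random.Random(seed)
    auth_energy = 0
    detect_time = None
    for t in range(horizon):
        authed = rng.random() < p_auth
        auth_energy += authed
        if authed and t >= attack_start and detect_time is None:
            detect_time = t
    delay = (detect_time - attack_start) if detect_time is not None else horizon - attack_start
    return delay, auth_energy
```

Raising p_auth shortens the detection delay but increases the authentication energy, which is the quantity an EH-aware formulation would have to budget.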
The open problems discussed here would also be relevant for distributed function calculation [135] in cases in which information is wirelessly disseminated among the agents via such security-threatened EH links. Finally, it would be important to investigate all of these problems in light of other security threats [136], including denial-of-service attacks [137].

E. Coping with multiple UDCs
In many situations of practical interest, a set of servers works on tasks (e.g., emergency rooms at hospitals). Furthermore, the availability of servers may be affected by exogenous processes (e.g., the schedules of doctors and nurses at hospitals). For example, data centers comprise a large number of server racks that are connected by high-speed networks and are sometimes subject to power constraints. Also, because the reliability of hardware components, such as CPUs, GPUs and memory modules, degrades when the temperature exceeds some threshold, they need to be cooled for stable operation, for instance via direct-to-chip liquid cooling. Moreover, because new server racks are added over time to meet increasing demands and old or failed racks are replaced at different times, the computational capabilities offered by various computational resources, which are designed for different types of tasks (e.g., CPUs vs. GPUs), can vary significantly.
Another class of problems well suited for the UDC framework with multiple UDCs, which is also related to those in Sections VIII-B and VIII-D, is information collection from multiple sources over time. These sources may be distributed sensors in wireless sensor networks (WSNs), which are powered by renewable energy, or “friends” in social networks who prefer not to be bothered constantly for the latest information. One can view the “usefulness” of the information collected from each sensor or friend as the reward. Such usefulness of information from a sensor or a friend will likely be stochastic. However, there are certain factors that would affect the usefulness of the information.
These include (i) the accuracy or quality of the sensors or the importance of the friends in social networks (which are often measured using their “centralities” in social networks [138]) and (ii) the age of information from each sensor or friend introduced in Section VIII-B, as well as the frequency of information requests.
Unfortunately, the quality of sensors and the importance of friends may not be known in advance. In addition, in many practical scenarios, we may be able to poll or collect information from only a limited number of sensors or friends at any given time and only so often. In WSNs, for instance, the number of available channels or timeslots in a frame may constrain the number of measurements we can collect at each time and, when the sensors are powered by renewable energy, they may not be able to report measurements even when they are polled, as their availability for reporting measurements will be governed by a stochastic process (possibly based on a modification of Definition 14).
Despite their prevalence, not much is known about the fundamental performance limits and efficient resource management of such systems. Only recently has research demonstrated the benefits of task-aware scheduling at data centers (e.g., [139]). Consequently, there is a rich set of open problems in related domains. For instance, consider a system in which human servers are employed for processing tasks of different types, which arrive at fixed rates (e.g., assembly lines at manufacturing facilities). One interesting open question is how one should schedule tasks so that the long-term fraction of time the human servers are required to work is minimized, in order to reduce fatigue or operational costs (e.g., wages for employees and costs of keeping the facilities running).
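The polling problem described above, with unknown source usefulness, a limited per-round polling budget, and stochastic availability, can be sketched with an upper-confidence-bound (UCB) index rule. The Gaussian reward model, the availability probability, and the specific index are illustrative assumptions, not a policy taken from the cited literature:

```python
import math
import random

def ucb_polling(means, budget=2, avail_prob=0.8, horizon=5000, seed=0):
    """Sketch of index-based polling of len(means) sources with unknown
    mean usefulness: each round, poll the `budget` sources with the
    highest UCB index; a polled source responds only with probability
    avail_prob, modeling, e.g., an energy-harvesting sensor that may be
    unable to report. Returns the total reward collected."""
    rng = random.Random(seed)
    n = len(means)
    counts, sums = [0] * n, [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")   # force initial exploration
            return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        chosen = sorted(range(n), key=index, reverse=True)[:budget]
        for i in chosen:
            if rng.random() < avail_prob:   # source responds this round
                r = means[i] + rng.gauss(0.0, 0.1)
                counts[i] += 1
                sums[i] += r
                total += r
    return total
```

A restless-bandit formulation would additionally let the unpolled sources' states (e.g., battery charge or age of information) evolve between polls.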
Similarly, when servers are heterogeneous and designed or optimized for different types of tasks, how should we schedule arriving tasks so that the system remains stable and the average service times of the tasks are minimized, subject to utilization-based constraints similar to those discussed in the previous sections? These are some of the questions whose answers can have significant impact on many areas, including crucial applications involving HSC (e.g., air traffic control and nuclear power plant monitoring). A useful approach for studying these problems, especially when some of the parameters are unknown, is the restless multi-armed bandit model, which has previously been applied to stochastic scheduling [140], [141].

F. UDC in learning
Recent years have highlighted several exciting problems at the intersection of machine learning and control theory. The dual-component structure of UDC models is particularly relevant in adversarial machine learning, as touched upon in Section V-B. We mention here a few other areas where UDC-based models are likely to have impact.
Generative adversarial networks (GANs) have been successfully applied to text, image and video generation, drug discovery, and image-to-image synthesis [142], [143], [144]. A GAN has two components: a discriminator and a generator. The generator component produces finer and finer approximations of a data distribution of interest, whereas the discriminator component distinguishes samples from the true data distribution and the generator's output. Training a GAN may be thought of as a two-person zero-sum game between the generator and the discriminator. Since SGD is used for updating the parameters of the generator and discriminator, the training process for each component may be thought of as a Markov process. However, the generator's update relies on the discriminator's current state, and the discriminator's update relies on the generator's current state. Thus, the performance processes for both these components consist of their corresponding states at a certain time k, and these two components drive each other's state. Although such a model is not described using the UDC framework from this paper, a variation in which the action kernel is allowed to have its own internal state can model the training process for a GAN.
There has been a lot of recent work at the intersection of reinforcement learning and robust control [145], [146]. Learning optimal policies for applications such as self-driving vehicles requires a large amount of data, and it is often difficult or expensive to obtain such data in large quantities. For this reason, reinforcement learning algorithms are often trained on simulated data.
Since the algorithm is tested in conditions that may differ significantly from those it was trained on, an important new approach has been to incorporate robust control techniques in the training process. There are several interacting components in this system: the learning algorithm that responds to the environment, a model of the environment with uncertainty quantification, and also adversaries that may change the environment in adversarially optimal ways to derail the learning algorithm [147], [148]. As shown in Section V-B, a UDC-based model may easily describe the algorithm and the adversary. An interesting addition is the component that models the environment and quantifies the corresponding uncertainty. This component, which is also a Markov process, also acts as the controller for the learning algorithm.

G. More realistic battery models for energy harvesting: leakage and nonlinearities
Although, as we discussed in Section III-B, the batteries used in energy harvesting modules have a rather complex behavior, the existing work discussed throughout this article adopts either the linear-saturated or the finite-state approximation. These simplified models do not capture a host of issues that could possibly require new methods and abstractions. This is illustrated by the following two features that could be captured by our general model of Definition 5:
a) Leakage:
The chemistry of every battery and the operation of its auxiliary circuitry will cause charge to leak, even when the battery is not supplying power. Hence, the charge stored in a battery may be partially lost unless it is used quickly or the leakage is offset by harvesting, which raises the issue of the age of energy. This is a relevant problem for low-power remote sensing devices that operate over long periods of time.
b) Nonlinearities:
In Section III-B we mentioned the fact that, due to the discharge curve, there is in general a state-of-charge threshold below which the voltage of the battery does not suffice to power the other components. Consequently, if the voltage is near the required minimum, then leakage effects may drain the state of charge below the aforesaid threshold, after which enough energy must be harvested before the battery can function again. The fact that the state of charge also governs the portion of the harvested energy that is effectively stored constitutes another important nonlinearity. Notably, as the state of charge nears its maximum and minimum, the ability of the battery to store energy varies considerably.
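A toy state-of-charge recursion capturing the features above (leakage proportional to the stored charge, a minimum state of charge below which no power can be drawn, and a storage efficiency that degrades as the battery fills) might look as follows; all parameter values and functional forms are hypothetical:

```python
def soc_step(soc, harvest, demand, capacity=100.0, leak_rate=0.01, v_min_soc=10.0):
    """One time-step of a hypothetical battery model: (i) leakage drains a
    fixed fraction of the state of charge even when idle; (ii) the storage
    efficiency for harvested energy decreases linearly as the battery
    fills; (iii) below v_min_soc the voltage is too low to supply power.
    Returns (new_soc, energy_delivered)."""
    soc -= leak_rate * soc                    # leakage, even when not supplying power
    efficiency = 1.0 - soc / capacity         # harder to store energy when nearly full
    soc = min(capacity, soc + efficiency * harvest)
    delivered = 0.0
    if soc > v_min_soc:                       # below the threshold, no power can be drawn
        delivered = min(demand, soc - v_min_soc)
        soc -= delivered
    return soc, delivered
```

Iterating this recursion under a stochastic harvest process would give a crude instance of the general UDC state model of Definition 5.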
ACKNOWLEDGEMENT
The authors would like to thank Sennur Ulukus (UMD), Yasser Shoukry (UMD) and Vijay Gupta (UND) for helpful discussions and suggestions.

REFERENCES

[1] S. Ulukus, A. Yener, E. Erkip, O. Simeone, M. Zorzi, P. Grover, and K. Huang, “Energy harvesting wireless communications: a review of recent advances,”
IEEE Journal on Selected Areas in Communications, vol. 33, pp. 360–381, March 2015. [2] K.-D. Kim and P. R. Kumar, “Cyber-physical systems: a perspective at the centennial,”
Proceedings of the IEEE , pp. 1287–1308, May 2012.[3] R. Baheti and H. Gill,
The impact of control technology , ch. Cyber-physical systems, pp. 161–166. IEEE Control Systems Society, 2011.[4] M. A. Staal, “Stress, cognition and human performance: A literature re-view and conceptual framework,” Tech. Rep. NASA/TM-2004-212824,NASA, August 2004.[5] A. S. Leong, D. E. Quevedo, and S. Dey,
Optimal control of energy resources for state estimation over wireless channels. Briefs in electrical and computer engineering, Springer, 2018. [6] J. P. Hespanha, P. Naghshtabrizi, and Y. Xu, “A survey of recent results in networked control systems,”
Proceedings of the IEEE , vol. 95, no. 1,pp. 138–162, 2007.[7] K. H. Teigen, “Yerkes-Dodson: a law for all seasons,”
Theory &Psychology , vol. 4, no. 4, pp. 525–547, 1994.[8] C. D. Wickens and J. G. Hollands,
Engineering psychology and humanperformance . Prentice Hall, third edition ed., 2000.[9] M. L. Cummings and C. E. Nehme, “Modeling the impact of workloadin network centric supervisory control settings,” in
Proceedings of the2nd Annual Sustaining Performance Under Stress Symposium , February2009.[10] J. R. Peters, V. Srivastava, G. S. Taylor, A. Surana, M. P. Eckstein, andF. Bullo, “Human supervisory control of robotic teams: Integratingcognitive modeling with engineering design,”
IEEE Control SystemsMagazine , vol. 35, pp. 57–80, December 2015.[11] H. P. G. van Ooijen and J. W. M. Bertrand, “The effects of asimple arrival rate control policy on throughput and work-in-processin production systems with workload dependent processing rates,”
International Journal of Production Economics , vol. 85, pp. 61–68,2003.[12] K. Savla and E. Frazzoli, “A dynamical queue approach to intelligenttask management for human operators,”
Proceedings of the IEEE ,vol. 100, pp. 672–686, March 2012.[13] T. Koch, A. Lapidoth, and P. P. Sotiriadis, “Channels that heat up,”
IEEE Transactions on Information Theory , vol. 55, pp. 3594–3612,August 2009.[14] A. Baknina, O. Ozel, and S. Ulukus, “Energy harvesting commu-nications under explicit and implicit temperature constraints,”
IEEE Transactions on Wireless Communications, vol. 17, pp. 6680–6692, October 2018. [15] D. Forte and A. Srivastava, “Thermal-aware sensor scheduling for distributed estimation,”
ACM Transactions on Sensor Networks, vol. 9, pp. 53:1–53:31, July 2013. [16] F. Baccelli and P. Brémaud,
Elements of queueing theory . Springer,second edition ed., 2003.[17] S. Sudevalayam and P. Kulkarni, “Energy harvesting sensor nodes: Sur-vey and implications,”
IEEE Communications Surveys and Tutorials ,vol. 13, no. 3, pp. 443–461, 2011.[18] A. Kansal, J. Hsu, S. Zahedi, and M. B. Srivastava, “Power manage-ment in energy harvesting sensor networks,”
ACM Transactions on Embedded Computing Systems, vol. 6, September 2007. [19] S. Priya and D. J. Inman, eds.,
Energy Harvesting Technologies .Springer, 2009.[20] B. E. Lewandowski, K. L. Kilgore, and K. J. Gustafson, “Feasibility ofan implantable, stimulated muscle-powered piezoelectric generator asa power source for implanted medical devices,” in
Energy HarvestingTechnologies (S. Priya and D. J. Inman, eds.), ch. 15, pp. 389–404,Springer, 2009.[21] C. M. Shepherd, “Design of primary and secondary cells ii. an equationdescribing battery discharge.,”
J. Electrochem. Soc., vol. 112, no. 7, pp. 657–664, 1965. [22] M. Chen and G. A. Rincón-Mora, “Accurate electrical battery model capable of predicting runtime and I-V performance,”
IEEE Transactionson Energy Conversion , vol. 21, pp. 504–511, June 2006.[23] S. Sudevalayam and P. Kulkarni, “Energy harvesting sensor nodes:Survey and implications,”
IEEE Communications Surveys & Tutorials ,vol. 13, no. 3, pp. 443–461, 2011. [24] D. Gunduz, K. Stamatiou, N. Michelusi, and M. Zorzi, “Designingintelligent energy harvesting communication systems,”
IEEE commu-nications magazine , vol. 52, no. 1, pp. 210–216, 2014.[25] S. Ulukus, A. Yener, E. Erkip, O. Simeone, M. Zorzi, P. Grover, andK. Huang, “Energy harvesting wireless communications: A review ofrecent advances,”
IEEE Journal on Selected Areas in Communications ,vol. 33, no. 3, pp. 360–381, 2015.[26] O. Ozel and S. Ulukus, “Achieving AWGN capacity under stochasticenergy harvesting,”
IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6471–6483, 2012. [27] O. Ozel and S. Ulukus, “AWGN channel under time-varying amplitude constraints with causal information at the transmitter,” in , pp. 373–377, IEEE, 2011. [28] Y. Dong and A. Özgür, “Approximate capacity of energy harvesting communication with finite battery,” in , pp. 801–805, IEEE, 2014. [29] V. Jog and V. Anantharam, “An energy harvesting AWGN channel with a finite battery,” in , pp. 806–810, IEEE, 2014. [30] K. Tutuncuoglu, O. Ozel, A. Yener, and S. Ulukus, “Improved capacity bounds for the binary energy harvesting channel,” in , pp. 976–980, IEEE, 2014. [31] Y. Dong, F. Farnia, and A. Özgür, “Near optimal energy control and approximate capacity of energy harvesting communication,”
IEEEJournal on Selected Areas in Communications , vol. 33, no. 3, pp. 540–557, 2015.[32] O. Ozel, K. Tutuncuoglu, S. Ulukus, and A. Yener, “Fundamentallimits of energy harvesting communications,”
IEEE Communications Magazine, vol. 53, no. 4, pp. 126–132, 2015. [33] D. Shaviv, P. Nguyen, and A. Özgür, “Capacity of the energy-harvesting channel with a finite battery,”
IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6436–6458, 2016. [34] O. Ozel and S. Ulukus, “On the capacity region of the Gaussian MAC with batteryless energy harvesting transmitters,” in , pp. 2385–2390, IEEE, 2012. [35] H. Inan, D. Shaviv, and A. Özgür, “Capacity of the energy harvesting Gaussian MAC,”
IEEE Transactions on Information Theory , vol. 64,no. 4, pp. 2347–2360, 2018.[36] O. Ozel, J. Yang, and S. Ulukus, “Optimal scheduling over fadingbroadcast channels with an energy harvesting transmitter,” in , pp. 193–196, IEEE, 2011.[37] O. Ozel, J. Yang, and S. Ulukus, “Optimal broadcast scheduling foran energy harvesting rechargeable transmitter with a finite capacitybattery,”
IEEE Transactions on Wireless Communications , vol. 11,no. 6, pp. 2193–2203, 2012.[38] K. Tutuncuoglu and A. Yener, “Sum-rate optimal power policies forenergy harvesting transmitters in an interference channel,”
Journal ofCommunications and Networks , vol. 14, no. 2, pp. 151–161, 2012.[39] J. Yang and S. Ulukus, “Optimal packet scheduling in a multiple accesschannel with energy harvesting transmitters,”
Journal of Communications and Networks, vol. 14, no. 2, pp. 140–150, 2012. [40] D. Shaviv and A. Özgür, “Universally near optimal online power control for energy harvesting nodes,”
IEEE Journal on Selected Areasin Communications , vol. 34, no. 12, pp. 3620–3631, 2016.[41] A. Baknina and S. Ulukus, “Optimal and near-optimal online strategiesfor energy harvesting broadcast channels,”
IEEE Journal on SelectedAreas in Communications , vol. 34, no. 12, pp. 3696–3708, 2016.[42] A. Baknina and S. Ulukus, “Energy harvesting multiple access chan-nels: Optimal and near-optimal online policies,”
IEEE Transactions onCommunications , vol. 66, no. 7, pp. 2904–2917, 2018.[43] C. E. Shannon, “A mathematical theory of communication, I and II,”
Bell Syst. Tech. J , vol. 27, pp. 379–423, 1948.[44] J. G. Smith, “The information capacity of amplitude-and variance-constrained scalar Gaussian channels,”
Information and Control, vol. 18, no. 3, pp. 203–219, 1971. [45] D. Shaviv, A. Özgür, and H. Permuter, “Can feedback increase the capacity of the energy harvesting channel?,” in
IEEE InformationTheory Workshop (ITW), 2015 , pp. 1–5, IEEE, 2015.[46] D. Blackwell, L. Breiman, and A. J. Thomasian, “The capacitiesof certain channel classes under random coding,”
The Annals of Mathematical Statistics, vol. 31, no. 3, pp. 558–567, 1960. [47] I. Csiszár and J. Körner,
Information theory: coding theorems fordiscrete memoryless systems . Cambridge University Press, 2011. [48] A. J. Goldsmith and P. P. Varaiya, “Capacity, mutual information,and coding for finite-state Markov channels,” IEEE Transactions onInformation Theory , vol. 42, no. 3, pp. 868–886, 1996.[49] T. Weissman, “Capacity of channels with action-dependent states,”
IEEE Transactions on Information Theory , vol. 56, no. 11, pp. 5396–5411, 2010.[50] W. Hirt and J. L. Massey, “Capacity of the discrete-time Gaussian chan-nel with intersymbol interference,”
IEEE Transactions on Information Theory, vol. 34, no. 3, pp. 380–388, 1988. [51] M. Mushkin and I. Bar-David, “Capacity and coding for the Gilbert-Elliott channels,”
IEEE Transactions on Information Theory , vol. 35,no. 6, pp. 1277–1290, 1989.[52] D. Russo and J. Zou, “Controlling bias in adaptive data analysisusing information theory,” in
Proceedings of the 19th InternationalConference on Artificial Intelligence and Statistics (A. Gretton andC. C. Robert, eds.), vol. 51 of
Proceedings of Machine LearningResearch , pp. 1232–1240, PMLR, 09–11 May 2016.[53] A. Xu and M. Raginsky, “Information-theoretic analysis of gener-alization capability of learning algorithms,” in
Advances in NeuralInformation Processing Systems (I. Guyon, U. V. Luxburg, S. Ben-gio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.),pp. 2521–2530, Curran Associates, Inc., 2017.[54] A. Pensia, V. Jog, and P. Loh, “Generalization error bounds fornoisy, iterative algorithms,” in , pp. 546–550, June 2018.[55] M. Hardt, B. Recht, and Y. Singer, “Train faster, generalize better:Stability of stochastic gradient descent,” in
Proceedings of The 33rdInternational Conference on Machine Learning (M. F. Balcan and K. Q.Weinberger, eds.), vol. 48, (New York, New York, USA), pp. 1225–1234, PMLR, 20–22 Jun 2016.[56] M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradientLangevin dynamics,” in
Proceedings of the 28th International Confer-ence on Machine Learning (ICML-11) , pp. 681–688, 2011.[57] R. Ge, F. Huang, C. Jin, and Y. Yuan, “Escaping from saddle points—Online stochastic gradient for tensor decomposition,” in
Conference on Learning Theory, pp. 797–842, 2015. [58] Y. Nesterov, “A method of solving a convex programming problem with convergence rate O(1/k²),” Soviet Mathematics Doklady, vol. 27, pp. 372–376, 1983. [59] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,”
Pattern Recognition , vol. 84, pp. 317–331, 2018.[60] B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against supportvector machines,” in
Proceedings of the 29th International Conferenceon Machine Learning , pp. 1467–1474, Omnipress, 2012.[61] S. Mei and X. Zhu, “Using machine teaching to identify optimaltraining-set attacks on machine learners.,” in
AAAI , pp. 2871–2877,2015.[62] P. Koh and P. Liang, “Understanding black-box predictions via influ-ence functions,” arXiv preprint arXiv:1703.04730 , 2017.[63] P. Blanchard, R. Guerraoui, J. Stainer, et al. , “Machine learningwith adversaries: Byzantine tolerant gradient descent,” in
Advances inNeural Information Processing Systems , pp. 119–129, 2017.[64] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning inadversarial settings: Byzantine gradient descent,”
ACM SIGMETRICSPerformance Evaluation Review , vol. 46, no. 1, pp. 96–96, 2019.[65] R. W. Conway and W. L. Maxwell, “A queueing model with statedependent service rates,”
Journal of Industrial Engineering , vol. 12,pp. 132–136, 1962.[66] J. R. Jackson, “Jobshop-like queueing systems,”
Management Science ,vol. 10, no. 1, pp. 131–142, 1963.[67] M. Yadin and P. Naor, “Queueing systems with a removable servicestation,”
Operational Research Society , vol. 14, pp. 393–405, December1963.[68] S. Gupta, “On bulk queues with state dependent parameters,”
Journalof the Operations Research Society of Japan , vol. 9, pp. 69–82, April1967.[69] C. M. Harris, “Queues with state-dependent stochastic service rates,”
Operations Research , vol. 15, pp. 117–130, February 1967.[70] J. H. Dshalalow, “Queueing systems with state dependent parameters,”in
Frontiers in Queueing: Models and Applications in Science andEngineering, Probability and Stochastics Series (J. H. Dshalalow, ed.),ch. 4, pp. 132–136, CRC, 1997.[71] R. Agrawal and V. G. Subramanian, “Optimality of certain channelaware scheduling policies,” in
Proceedings of Allerton Conference onCommunication, Control and Computing , October 2002. [72] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar,and P. Whiting, “Scheduling in a queueing system with asynchronouslyvarying service rates,”
Probability in the Engineering and Informa-tional Sciences , vol. 18, pp. 191–217, 2004.[73] S. C. Borst, “User-level performance of channel-aware schedulingalgorithms in wireless data networks,”
IEEE/ACM Transactions onNetworking , vol. 13, pp. 636–647, June 2005.[74] A. Goldsmith,
Wireless communications . Cambridge University Press,1 ed., 2005.[75] R. M. Yerkes and J. D. Dodson, “The relation of strength of stimulusto rapidity of habit-formation,”
Journal of Comparative Neurology andPsychology , vol. 18, pp. 459–482, November 1908.[76] L. C. Edie, “Traffic delays at toll booths,”
Journal of the OperationsResearch Society of America , vol. 2, pp. 107–138, May 1954.[77] G. Borghini, L. Astolfi, G. Vecchiato, D. Mattia, and F. Babiloni,“Measuring neurophysiological signals in aircraft pilots and car driversfor the assessment of mental workload, fatigue and drowsiness,”
Neuroscience & Biobehavioral Reviews , vol. 44, pp. 58–75, July 2014.[78] M. Shunko, J. Niederhoff, and Y. Rosokha, “Humans are not machines:the behavioral impact of queueing design on service time,”
Manage-ment Science , vol. 64, pp. 57–80, December 2017.[79] T. S. Sheridan, “Supervisory control,” in
Handbook of Human Factorsand Ergonomics, second edition (G. Salvendy, ed.), pp. 1295–1327,John Wiley & Sons, 1997.[80] P. V. Asaro, L. M. Lewis, and S. B. Boxerman, “The impact of inputand output factors on emergency department throughput,”
Academic Emergency Medicine, vol. 14, pp. 235–242, April 2007. [81] D. S. Kc and C. Terwiesch, “Impact of workload on service time and patient safety: an economic analysis of hospital operations,”
Management Science , vol. 55, pp. 1486–1498, September 2009.[82] M. Lin, R. J. La, and N. C. Martins, “Stability of a single queuesubject to action-dependent server performance.” preprint available athttp://arxiv.org/abs/1903.00135, 2019.[83] R. Rao and S. Vrudhula, “Efficient online computation of core speedsto maximize the throughput of thermally constrained multi-core pro-cessors,” in
Proceedings of the IEEE/ACM International Conferenceon Computer-Aided Design , November 2008.[84] M. Delasay, A. Ingolfsson, B. Kolfal, and K. Schultz, “Load effect onservice times,”
European Journal of Operational Research , 2018.[85] A. Chatterjee, D. Seo, and L. R. Varshney, “Capacity of systemswith queue-length dependent service quality,”
IEEE Transactions on Information Theory, vol. 63, pp. 3950–3963, June 2017. [86] M. F. Neuts, “A general class of bulk queues with Poisson input,”
TheAnnals of Mathematical Statistics , vol. 38, pp. 759–770, June 1967.[87] D. Tse and P. Viswanath,
Fundamentals of Wireless Communication .Cambridge University Press, 1 ed., 2005.[88] L. H. Ozarow, S. Shamai, and A. D. Wyner, “Information theoretic con-siderations for cellular mobile radio,”
IEEE Transactions on VehicularTechnology , vol. 43, no. 2, pp. 359–378, 1994.[89] N. C. Beaulieu and J. Hu, “A closed-form expression for the outageprobability of decode-and-forward relaying in dissimilar rayleigh fad-ing channels,”
IEEE Communications Letters, vol. 10, pp. 813–815, December 2006. [90] C. N. Hadjicostis and R. Touri, “Feedback control utilizing packet dropping links,” in
Proceedings of the IEEE Conference on Decision and Control, pp. 1205–1210, 2002. [91] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. I. Jordan, and S. S. Sastry, “Kalman filtering with intermittent observations,”
IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1453–1464, 2004. [92] O. L. V. Costa and M. D. Fragoso, “Stability results for discrete-time linear systems with Markovian jumping parameters,”
Journal ofMathematical Analysis and Applications , vol. 179, pp. 154–178, 1993.[93] O. C. Imer, S. Yuksel, and T. Basar, “Optimal control of lti systemsover unreliable communication links,”
Automatica , vol. 42, pp. 1429–1439, 2006.[94] O. C. Imer and T. Basar, “Optimal estimation with limited measure-ments,”
International Journal of Systems, Control and Communications, vol. 2, pp. 5–29, 2010. [95] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, “Foundations of control and estimation over lossy networks,”
Proceedings of the IEEE , vol. 95, pp. 163–187, January 2007.[96] Y. Xu and J. P. Hespanha, “Estimation under uncontrolled and con-trolled communications in networked control systems,” in
Proceedingsof the IEEE Conference on Decision and Control , pp. 842–847,December 2005.[97] W. J. Rugh,
Linear system theory. Prentice Hall, 2 ed., 1996. [98] J. P. Hespanha, Linear systems theory. Princeton University Press, 2 ed., 2018. [99] V. Gupta, B. Hassibi, and R. M. Murray, “Optimal LQG control across packet-dropping links,”
Systems and Control Letters , vol. 56, pp. 439–446, 2007.[100] V. Gupta, N. C. Martins, and J. S. Baras, “Optimal output feedbackcontrol using two remote sensors over erasure channels,”
IEEE Trans-actions on Automatic Control , vol. 54, pp. 1463–1476, July 2009.[101] V. Gupta and N. C. Martins, “On stability in the presence of analogerasure channel between the controller and the actuator,”
IEEE Trans-actions on Automatic Control , vol. 55, no. 1, pp. 175–179, 2010.[102] P. Minero, M. Franceschetti, S. Dey, and G. N. Nair, “Data ratetheorem for stabilization over time-varying feedback channels,”
IEEETransactions on Automatic Control , vol. 54, no. 2, pp. 243–255, 2009.[103] N. C. Martins, M. A. Dahleh, and N. Elia, “Feedback stabilization ofuncertain systems in the presence of a direct link,”
IEEE Transactionson Automatic Control , vol. 51, no. 3, pp. 438–447, 2006.[104] A. Sahai and S. Mitter, “The necessity and sufficiency of anytimecapacity for stabilization of a linear system over a noisy communicationlink - part i: scalar systems,”
IEEE Transactions on Information Theory ,vol. 52, pp. 3369–3395, August 2006.[105] G. M. Lipsa and N. C. Martins, “Remote state estimation withcommunication costs for first-order lti systems,”
IEEE Transactionson Automatic Control , vol. 56, pp. 2013–2025, September 2011.[106] B. Hajek, K. Mitzel, and S. Yang, “Paging and registration in cellularnetworks: jointly optimal policies and an iterative algorithm,”
IEEETransactions on Information Theory , vol. 54, pp. 608–622, February2008.[107] A. Nayyar, T. Basar, D. Teneketzis, and V. V. Veeravalli, “Optimalstrategies for communication and remote estimation with an energyharvesting sensor,”
IEEE Transactions on Automatic Control , vol. 58,pp. 2246–2260, September 2013.[108] S. Park and N. C. Martins, “Individually optimal solutions to a remotestate estimation problem with communication costs,” in
Proceedings ofthe IEEE Conference on Decision and Control , pp. 4014–4019, 2018.[109] A. Molin and S. Hirche, “On the optimality of certainty equivalencefor event-triggered control systems,”
IEEE Transactions on AutomaticControl , vol. 58, no. 2, pp. 470–474, 2013.[110] X. Wang and M. Lemmon, “Event-triggering in distributed networkedcontrol systems,”
IEEE Transactions on Automatic Control , vol. 56,pp. 586 – 601, March 2011.[111] M. L. Puterman,
Markov decision processes: discrete stochastic dy-namic programming . Wiley, 2005.[112] A. S. Leong, S. Dey, and D. E. Quevedo, “Transmission scheduling forremote state estimation and control with an energy harvesting sensor,”
Automatica , vol. 91, pp. 54–60, 2018.[113] Y. Li, F. Zhang, D. E. Quevedo, V. Lau, S. Dey, and L. Shi, “Powercontrol of an energy harvesting sensor for remote state estimation,”
IEEE Transactions on Automatic Control , vol. 62, pp. 277–290, January2017.[114] M. Nourian, A. S. Leong, and S. Dey, “Optimal energy allocation forkalman filtering over packet dropping links with imperfect acknowl-edgements and energy harvesting constraints,”
IEEE Transactions onAutomatic Control , vol. 59, pp. 2128–2143, August 2014.[115] S. Trimpe and R. D’Andrea, “Event-based state estimation withvariance-based triggering,”
IEEE Transactions on Automatic Control ,vol. 59, pp. 3266–3281, December 2014.[116] P. R. Kumar and P. Varayia,
Stochastic Systems: Estimation, Identifi-cation and Adaptive Control . SIAM, 2015.[117] S. Knorn and S. Dey, “Optimal energy allocation for linear control withpacket loss under energy harvesting constraints,”
Automatica , vol. 77,pp. 259–267, 2017.[118] M. Lin, R. J. La, and N. C. Martins, “Remote state estimation accrossan action-dependent packet-drop link,” in
Proceedings of the IEEEConference on Decision and Control , 2018.[119] O. Orhan, D. G¨und¨uz, and E. Erkip, “Source-channel coding underenergy, delay and buffer constraints,”
IEEE Transactions on WirelessCommuncations , vol. 14, no. 7, pp. 3836–3849, 2015.[120] P. Grover, K. Woyach, and A. Sahai, “Towards a communication-theoretic understanding of system-level power consumption,”
IEEEJournal on Selected Areas in Communications , vol. 29, September2011.[121] D. O. Wheeler, D. P. Koch, J. S. Jackson, T. W. McLain, and R. W.Beard, “Relative navigation: a keyframe-based approach for observablegps-degraded navigation,”
IEEE Control Systems Magazine , vol. 38,pp. 30–48, July 2018. [122] C. Kreucher, K. Kastella, and A. O. Hero III, “Sensor managementusing an active sensing approach,”
Signal Processing , vol. 85, pp. 607–624, 2005.[123] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Sampling of the wienerprocess for remote estimation over a channel with random delay,”
ArXiv , 2018.[124] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation ofthe wiener process over a channel with random delay,”
ArXiv , 2018.[125] G. M. Lipsa and N. C. Martins, “Optimal state estimation in thepresence of communication costs and packet drops,” in
Proceedingsof Allerton Conference on Communication, Control and Computing ,pp. 160–169, 2009.[126] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff,“Update or wait: how to keep your data fresh,”
IEEE Transactions onInformation Theory , vol. 63, pp. 7492–7508, November 2017.[127] A. Arafa, J. Yang, S. Ulukus, and H. V. Poor, “Age-minimal transmis-sion for energy harvesting sensors with finite batteries: online policies,”
ArXiv , 2018.[128] C. Kam, S. Kompela, G. D. Nguyen, J. E. Wieselthier, andA. Ephremides, “Towards an effective age of information: remoteestimation of a Markov source,” in
Proceedings of the IEEE Conferenceon Computer Communication Workshops: AoI Workshop , pp. 367–372,2018.[129] T. Z. Ornee and Y. Sun, “Sampling for remote estimation throughqueues: age of information and beyond,”
ArXiv , February 2019.[130] M. Pajic, J. Weimer, N. Bezzo, O. Sokolsky, G. J. Pappas, and I. Lee,“Design and implementation of attack-resilient cyberphysical systems,”
IEEE Control Systems Magazine , pp. 66–81, April 2017.[131] H. Fawzi, P. Tabuada, and S. Diggavi, “Secure estimation and controlfor cyber-physical systems under adversarial attacks,”
IEEE Transac-tions on Automatic Control , vol. 59, no. 6, pp. 1454–1467, 2014.[132] Y. Shoukry and P. Tabuada, “Event-triggered state observers for sparsesensor noise/attacks,”
IEEE Transactions on Automatic Control , vol. 61,pp. 2079–2091, August 2016.[133] Y. Mo, E. Garone, A. Casavola, and B. Sinopoli, “False data injectionattacks against state estimation in wireless sensor networks,” in
Pro-ceedings of the IEEE Conference on Decision and Control , pp. 5967–5972, 2010.[134] I. Jovanov and M. Pajic, “Secure state estimation with cumulativemessage authentication,” in
Proceedings of the IEEE Conference onDecision and Control , pp. 2074–2079, 2018.[135] S. Sundaram and C. N. Hadjiscostis, “Distributed function calculationvia linear iterative strategies in the presence of malicious agents,”
IEEETransactions on Automatic Control , vol. 56, pp. 1495–1508, July 2011.[136] D. I. Urbina, J. Giraldo, A. A. Cardenas, J. Valente, M. Faisal, N. O.Tippenhauer, J. Ruths, R. Candell, and H. Sandberg, “Survey andnew directions for physics-based attack detection in control systems,”Tech. Rep. NIST GCR 16-010, NIST, U.S. Department of Commerce,November 2016.[137] A. Cetinkaya, H. Ishii, and T. Hayakawa, “An overview on denial-of-service attacks in control systems: attack models and security analyses,”
Entropy , vol. 21, pp. 1–29, February 2019.[138] D. Easley and J. Kleinberg,
Networks, Crowds and Markets . CambridgeUniversity Press, 1 ed., 2010.[139] F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, “Decentral-ized task-aware scheduling for data center networks,” in
Proceedingsof ACM SIGCOMM , pp. 431–442, August 2014.[140] A. Mahajan and D. Teneketzis,
Multi-armed bandit problems ,ch. Multi-armed bandit problems, pp. 121–151. Springer, 2008.[141] P. Whittle, “Restless bandits: activity allocation in a changing world,”
Journal of Applied Probability , vol. 25, pp. 287–298, 1988.[142] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”in
Advances in neural information processing systems , pp. 2672–2680,2014.[143] K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang, “Gen-erative adversarial networks: introduction and outlook,”
IEEE/CAAJournal of Automatica Sinica , vol. 4, no. 4, pp. 588–598, 2017.[144] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta,and A. A. Bharath, “Generative adversarial networks: An overview,”
IEEE Signal Processing Magazine , vol. 35, no. 1, pp. 53–65, 2018.[145] D. A. Bristow, M. Tharayil, and A. G. Alleyne, “A survey of iterativelearning control,”
IEEE control systems magazine , vol. 26, no. 3,pp. 96–114, 2006.[146] B. Recht, “A tour of reinforcement learning: The view from contin-uous control,”
Annual Review of Control, Robotics, and AutonomousSystems , 2018. [147] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, “Robust adversarialreinforcement learning,” in Proceedings of the 34th InternationalConference on Machine Learning-Volume 70 , pp. 2817–2826, JMLR.org, 2017.[148] J. Fu, K. Luo, and S. Levine, “Learning robust rewards with adversar-ial inverse reinforcement learning,” arXiv preprint arXiv:1710.11248arXiv preprint arXiv:1710.11248