Optimal Downlink-Uplink Scheduling of Wireless Networked Control for Industrial IoT
Kang Huang, Wanchun Liu, Yonghui Li, Branka Vucetic, Andrey Savkin
Abstract — This paper considers a wireless networked control system (WNCS) consisting of a dynamic system to be controlled (i.e., a plant), a sensor, an actuator and a remote controller for mission-critical Industrial Internet of Things (IIoT) applications. A WNCS has two types of wireless transmissions, i.e., the sensor's measurement transmission to the controller and the controller's command transmission to the actuator. In the literature of WNCSs, the controllers are commonly assumed to work in a full-duplex mode by default, i.e., being able to simultaneously receive the sensor's information and transmit its own command to the actuator. In this work, we consider a practical half-duplex controller, which introduces a novel transmission-scheduling problem for WNCSs. Frequent scheduling of the sensor's transmission results in a better estimation of plant states at the controller and thus a higher-quality control command, but it leads to less frequent and less timely control of the plant. Therefore, considering the overall control performance of the plant in terms of its average cost function, there exists a fundamental tradeoff between the sensor's and the controller's transmissions. We formulate a new problem to optimize the transmission-scheduling policy for minimizing the long-term average cost function. We derive the necessary and sufficient condition for the existence of a stationary and deterministic optimal policy that results in a bounded average cost, in terms of the transmission reliabilities of the sensor-to-controller and controller-to-actuator channels. Also, we derive an easy-to-compute suboptimal policy, which notably reduces the average cost of the plant compared to a naive alternative-scheduling policy.
Index Terms — Wireless communication, wireless control, transmission scheduling, performance analysis, IIoT.
I. INTRODUCTION
Driven by the recent development of mission-critical Industrial Internet of Things (IIoT) applications [2]–[4] and significant advances in wireless communications, networking, computing, sensing and control [5]–[8], wireless networked control systems (WNCSs) have recently emerged as a promising technology to enable reliable and remote control of industrial control systems. They have a wide range of applications in factory automation, process automation, smart grid, tactile Internet and intelligent transportation systems [9]–[13]. Essentially, a WNCS is a spatially distributed control system consisting of a plant with dynamic states, a set of sensors, a remote controller, and a set of actuators.
K. Huang, W. Liu, Y. Li and B. Vucetic are with the School of Electrical and Information Engineering, The University of Sydney, Australia. Emails: {kang.huang, wanchun.liu, yonghui.li, branka.vucetic}@sydney.edu.au. A. Savkin is with the School of Electrical Engineering and Telecommunications, University of New South Wales, Australia. Email: [email protected]. (W. Liu is the corresponding author.) Part of the paper has been accepted by Proc. IEEE Globecom 2019 [1].
A WNCS has two types of wireless transmissions, i.e., the sensor's measurement transmission to the controller and the controller's command transmission to the actuator. The packets carrying plant-state information and control commands can be lost, delayed or corrupted during their transmissions. Most of the existing research on WNCSs adopted a separate design approach, i.e., focusing on either remote plant-state estimation or remote plant-state control through wireless channels. In [14] and [15], the optimal policies of remote plant-state estimation with a single sensor's and multiple sensors' measurements were proposed, respectively. Some advanced remote plant-state control methods were investigated to overcome the effects of transmission delay [16] and detection errors [17], [18]. The fundamental co-design problem of a WNCS in terms of the optimal remote estimation and control was tackled in [19]. Specifically, the controller was ideally assumed to work in a full-duplex (FD) mode that can simultaneously receive the sensor's packet and transmit its control packet by default. The scheduling of the sensor's and the controller's transmissions has rarely been considered in the area of WNCSs, although transmission scheduling is an important issue for practical wireless communication systems [20]–[22]. Moreover, although an FD system can improve the spectrum efficiency, it faces challenges in balancing the performance of self-interference cancellation, device cost and power consumption, and may not be feasible in practical systems [23].

In this paper, we focus on the design of a WNCS using a practical half-duplex (HD) controller, which naturally introduces a fundamental transmission-scheduling problem, i.e., whether to schedule the sensor's measurement transmission to the controller or the controller's command transmission to the actuator. A frequent schedule of the sensor's transmission results in a better estimation of plant states and thus a higher quality of the control command.
On the other side, a frequent schedule of the controller's transmission leads to more timely plant control. Thus, considering the overall control performance of the plant's states, e.g., the average cost function of the plant, there exists a fundamental tradeoff between the sensor's and the controller's transmissions. We propose a tractable framework to model this problem and enable the optimal design of the WNCS. The main contributions of the paper are summarized as follows:

• We propose a WNCS with an HD controller, where the controller schedules the sensor's measurement transmission and its own control-command transmission depending on both the estimation quality of the current plant states and the current cost function of the plant.

• We formulate a problem to optimally design the transmission-scheduling policy so as to optimize the long-term control performance of the WNCS in terms of the average cost function, for both the one-step and $v$-step controllable plants. As the long-term average cost of the plant may not be bounded when high transmission-error probabilities lead to an unstable situation, in the static channel scenario we derive a necessary and sufficient condition, in terms of the transmission reliabilities of the sensor-controller and controller-actuator channels and the plant parameters, to ensure the existence of an optimal policy that stabilizes the plant. In the fading channel scenario, we derive a necessary condition and a sufficient condition, in terms of the uplink and downlink channel qualities, under which the optimal transmission-scheduling policy exists.

• We also derive a suboptimal policy with a low computation complexity.
The numerical results show that the suboptimal policy provides an average cost close to that of the optimal policy, and significantly outperforms the benchmark policy, i.e., scheduling the sensor's and the controller's transmissions alternately.

The remainder of the paper is organized as follows: In Section II, we introduce a WNCS with an HD controller. In Section III, we analyze the estimation-error covariance and the plant-state covariance of the WNCS and formulate an uplink-downlink transmission-scheduling problem. In Sections IV and V, we analyze and solve the transmission-scheduling problem for one-step and multi-step controllable WNCSs, respectively, in the static channel scenario. In Section VI, we extend the design to the fading channel scenario. Section VII numerically evaluates the performance of WNCSs with different transmission-scheduling policies. Finally, Section VIII concludes the paper.

Notations: $\mathbb{1}(\cdot)$ is the indicator function. $\rho(A)$ denotes the spectral radius of the square matrix $A$. $(\cdot)^\top$ is the matrix-transpose operator. $\mathbb{N}$ is the set of positive integers.

II. SYSTEM MODEL
We consider a discrete-time WNCS consisting of a dynamic plant with multiple states, a wireless sensor, an actuator, and a remote controller, as illustrated in Fig. 1. In general, the sensor measures the states of the plant and sends the measurements to the remote controller through a wireless uplink (i.e., sensor-controller) channel. The controller generates control commands based on the sensor's feedback and sends the commands to the actuator through a wireless downlink (i.e., controller-actuator) channel. The actuator controls the plant using the received control commands.
A. Dynamic Plant
The plant is a linear time-invariant (LTI) discrete-time system modeled as [17], [18], [24]

$x_{k+1} = A x_k + B u_k + w_k, \quad \forall k,$  (1)

Fig. 1. The system architecture.

where $x_k \in \mathbb{R}^n$ is the plant-state vector at time $k$, $u_k \in \mathbb{R}^m$ is the control input applied by the actuator, and $w_k \in \mathbb{R}^n$ is the plant disturbance, which is independent of $x_k$ and is a discrete-time zero-mean Gaussian white noise process with covariance matrix $R \in \mathbb{R}^{n \times n}$. $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$ are the system-transition matrix and the control-input matrix, respectively, which are constant. The discrete time step of the system (1) is $T$, i.e., the plant states remain constant during a time slot of length $T$ and change slot by slot.

We assume that the plant is an unstable system [14], [17], i.e., the spectral radius of $A$, $\rho(A)$, is larger than one. In other words, the plant-state vector $x_k$ grows unbounded without the control input, i.e., with $u_k = 0, \forall k$.

We consider the long-term average (quadratic) cost of the dynamic plant defined as (see e.g. [17], [19])

$J = \lim_{K\to\infty} \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{E}\left[x_k^\top Q x_k\right] = \lim_{K\to\infty} \frac{1}{K} \sum_{k=0}^{K-1} \mathrm{Tr}(Q P_k),$  (2)

where $Q$ is a symmetric positive semidefinite matrix, and $P_k$ is the plant-state covariance defined as

$P_k \triangleq \mathbb{E}\left[x_k x_k^\top\right].$  (3)

Definition 1 (Closed-loop Stability [17], [19]). The plant (1) is stabilized by the sequence $\{u_k\}$ if the average cost function (2) is bounded.

B. HD Operation of the Controller
We assume that the controller is an HD device, and thus it can either receive the sensor's measurement or transmit its control command to the actuator at a time. Let $a_k \in \{1, 2\}$ be the controller's transmission-scheduling variable in time slot $k$. The sensor's or the controller's transmission is scheduled in time slot $k$ if $a_k = 1$ or $a_k = 2$, respectively.

The sensor measures the plant states at the beginning of each time slot. The measurement is assumed to be perfect [17], [18], [24]. We use $\delta_k$ to indicate the success of the sensor's transmission in time slot $k$. Thus, $\delta_k = 1$ if the sensor is scheduled to send a packet carrying its measurement to the controller in time slot $k$ (i.e., $a_k = 1$) and the transmission is successful, and $\delta_k = 0$ otherwise.

The controller generates a control-command-carrying packet at the beginning of each time slot. Similarly, we use $\gamma_k$ to indicate the success of the controller's transmission in time slot $k$. Thus, $\gamma_k = 1$ if the controller is scheduled to send the control packet to the actuator in time slot $k$ (i.e., $a_k = 2$) and the transmission is successful, and $\gamma_k = 0$ otherwise. We also assume that the controller has perfect feedback from the actuator indicating the success of the packet detection [19]. Thus, the controller knows whether its control command will be applied or not.
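As a toy, end-to-end illustration of the loop described so far (plant (1), cost (2), the HD scheduling variable $a_k$ and the success indicators $\delta_k, \gamma_k$), the sketch below simulates a naive alternating schedule for a plant with $A + BK = 0$ (the one-step controllable case of Section II-E) using the estimator (6); all matrices, gains and error probabilities are our own examples, not values from the paper.

```python
import numpy as np

# Toy end-to-end WNCS simulation: plant (1), HD schedule a_k, success flags
# delta_k / gamma_k, estimator (6), and u_k = K x_hat_k on downlink success.
# A, B, K_gain, R, Q, p_s, p_c are invented for illustration only.
A = np.array([[1.2, 0.1], [0.0, 1.1]])   # rho(A) = 1.2 > 1: open-loop unstable
B = np.eye(2)
K_gain = -A                               # A + B K = 0: one-step controllable
R = 0.01 * np.eye(2)                      # disturbance covariance
Q = np.eye(2)
p_s, p_c = 0.1, 0.2                       # uplink / downlink packet-error probs
rng = np.random.default_rng(0)

def run(steps=20000):
    """Empirical average cost (2) under a naive alternating schedule."""
    x = np.zeros(2)        # true plant state x_k
    x_hat = np.zeros(2)    # controller's estimate of x_k
    total = 0.0
    for k in range(steps):
        a_k = 1 if k % 2 == 0 else 2                    # alternate uplink/downlink
        delta_k = int(a_k == 1 and rng.random() > p_s)  # sensor packet received?
        gamma_k = int(a_k == 2 and rng.random() > p_c)  # control packet received?
        u = K_gain @ x_hat if gamma_k else np.zeros(2)
        total += x @ Q @ x
        w = rng.multivariate_normal(np.zeros(2), R)
        x_next = A @ x + B @ u + w                      # plant update (1)
        x_hat = A @ (x if delta_k else x_hat) + B @ u   # estimator (6)
        x = x_next
    return total / steps

print(run())   # bounded for these small error probabilities
```

Increasing $p_s$ or $p_c$ toward the stability threshold of Theorem 1 makes the empirical cost grow without bound, which previews the scheduling tradeoff analyzed in Sections III and IV.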
We assume that the packets in both the sensor-to-controller and controller-to-actuator channels have the same packet length, and that the transmission time of a packet is less than $T$ [14], [19].

C. Wireless Channel
We consider both the static channel and the fading channel scenarios of the WNCS. The static channel scenario is for IIoT applications with low mobility, e.g., process control of chemical and oil-refinery plants, while the fading channel scenario is for high-mobility applications, e.g., motion control of automated guided vehicles in warehouses.

For the static channel scenario, we assume that the packet-error probabilities of the uplink (sensor-controller) and downlink (controller-actuator) channels are $p_s$ and $p_c$, respectively, which do not change with time, where $p_s, p_c \in (0, 1)$.

For the fading channel scenario, we adopt a practical finite-state Markov channel model, which captures the inherent property of practical fading channels whose states change with memory [25]. It is assumed that the uplink channel and the downlink channel have $B_s$ and $B_c$ states, respectively, and the packet-loss probabilities of the $i$th channel state of the uplink channel and the $j$th channel state of the downlink channel are $\omega_i$ and $\xi_j$, respectively. The matrices of the channel-state transition probabilities of the uplink and downlink channels are given as

$D_s \triangleq \begin{bmatrix} d^s_{1,1} & \cdots & d^s_{1,B_s} \\ \vdots & \ddots & \vdots \\ d^s_{B_s,1} & \cdots & d^s_{B_s,B_s} \end{bmatrix},$  (4)

and

$D_c \triangleq \begin{bmatrix} d^c_{1,1} & \cdots & d^c_{1,B_c} \\ \vdots & \ddots & \vdots \\ d^c_{B_c,1} & \cdots & d^c_{B_c,B_c} \end{bmatrix},$  (5)

respectively. The packet-error probabilities of the uplink and downlink channels at time $k$ are $p_{s,k}$ and $p_{c,k}$, respectively, where $p_{s,k} \in \{\omega_1, \cdots, \omega_{B_s}\}$ and $p_{c,k} \in \{\xi_1, \cdots, \xi_{B_c}\}$.

D. Optimal Plant-State Estimation
At the beginning of time slot $(k+1)$, before generating a proper control command, the controller needs to estimate the current state of the plant, $x_{k+1}$, using the previously received sensor's measurement and the implemented control input, based on the dynamic plant model (1). The optimal plant-state estimator is given as [14]

$\hat{x}_{k+1} = \begin{cases} A x_k + B u_k, & a_k = 1, \delta_k = 1, \\ A \hat{x}_k + B u_k, & \text{otherwise}. \end{cases}$  (6)

E. $v$-Step Predictive Plant-State Control

As the transmission between the controller and the actuator is unreliable, the actuator may not successfully receive the controller's packet containing the current control command. To provide robustness against packet failures, we consider a predictive control approach [26]. In general, the controller sends both the current command and the predicted future commands to the actuator each time. If the current command-carrying packet is lost, the actuator applies the previously received command that was predicted for the current time slot. The details of the predictive control method are given below.

The controller adopts a conventional linear predictive control law [17], which generates a sequence of $v$ control commands, including one current command and $(v-1)$ predicted future commands, in each time slot $k$ as

$C_k = \big[K\hat{x}_k, \underbrace{K(A+BK)\hat{x}_k, \cdots, K(A+BK)^{v-1}\hat{x}_k}_{(v-1) \text{ predicted control commands}}\big],$  (7)

where the constant $v$ is the length of predictive control, and the constant $K \in \mathbb{R}^{m \times n}$ is the controller gain, which satisfies the condition that

$\rho(A + BK) < 1.$  (8)

If time slot $k$ is scheduled for the controller's transmission, the controller sends a packet containing the $v$ control commands $C_k$ to the actuator.
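The command-sequence construction in (7) is just a loop over powers of $(A + BK)$; a minimal sketch with invented matrices that satisfy (8):

```python
import numpy as np

# Building C_k of (7): v commands K (A+BK)^j x_hat, j = 0..v-1.
# A, B, K_gain are invented; they give A + B K = 0.2 I, so rho(A+BK) < 1, i.e. (8).
A = np.array([[1.2, 0.1], [0.0, 1.1]])
B = np.eye(2)
K_gain = np.array([[-1.0, -0.1], [0.0, -0.9]])   # A + B K = 0.2 * I
v = 4

def command_sequence(x_hat):
    """Return [K x_hat, K(A+BK) x_hat, ..., K(A+BK)^{v-1} x_hat]."""
    M = A + B @ K_gain
    cmds, z = [], x_hat.copy()
    for _ in range(v):
        cmds.append(K_gain @ z)   # command intended for this (predicted) slot
        z = M @ z                 # predicted closed-loop state one slot ahead
    return cmds

print(len(command_sequence(np.ones(2))))   # prints 4
```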
Note that in most communication protocols, the minimum packet length is longer than the time duration required for transmitting a single control command [26], and thus it is wise to send multiple commands in one packet without increasing the packet length.

The actuator maintains a command buffer of length $v$, $U_k \triangleq [u^0_k, u^1_k, \cdots, u^{v-1}_k]$. If the current controller's packet is successfully received, the actuator resets the buffer with the received command sequence; otherwise, the buffer shifts one step forward, i.e.,

$U_k = \begin{cases} C_k, & a_k = 2, \gamma_k = 1, \\ [u^1_{k-1}, u^2_{k-1}, \cdots, u^{v-1}_{k-1}, 0], & \text{otherwise}. \end{cases}$  (9)

The actuator always applies the first command in the buffer to the plant. Thus, the actuator's control input in time slot $k$ is

$u_k \triangleq u^0_k.$  (10)

To indicate the number of passed time slots since the last successfully received control packet, we define the control-quality indicator of the plant in time slot $k$ as

$\eta_k = \begin{cases} 1, & a_k = 2, \gamma_k = 1, \\ \eta_{k-1} + 1, & \text{otherwise}. \end{cases}$  (11)

Specifically, $\eta_k - 1$ is the number of time slots passed from the most recent controller's successful transmission to the current time slot $k$. (If (8) is not satisfied, the plant (1) can never be stabilized even if the uplink and downlink transmissions are always perfect; see e.g., [19], [27].)

From (7), (9), (10) and (11), the control input can be rewritten as

$u_k = \begin{cases} K(A+BK)^{\eta_k - 1}\, \hat{x}_{k+1-\eta_k}, & \text{if } \eta_k \le v, \\ 0, & \text{if } \eta_k > v. \end{cases}$  (12)

To better explain the intuition behind the predictive control method (7), (9) and (10), we give an example below.

Example 1.
Assume that a sequence of the controller's commands is successfully received in time slot $k$ and the actuator will not receive any further commands in the following $v-1$ time slots. Consider an ideal case in which the estimation is accurate in time slot $k$, i.e., $\hat{x}_k = x_k$, and the plant disturbance $w_k = 0, \forall k$. Taking (12) into (1), the plant-state vector at $(k+j), \forall j \le v$, can be derived as

$x_{k+j} = (A+BK)^j x_k.$  (13)

Therefore, if the controller gain $K$ is chosen properly and makes the spectral radius of $(A+BK)$ less than one, each state in $x_k$ can approach zero gradually within the $v$ steps, even without receiving any new control packets.

In this work, we mainly focus on two types of plants applying the predictive control method, as follows.
Case 1:
The controller gain $K$ satisfies the condition that

$A + BK = 0.$  (14)

This case is named the one-step controllable case [28], since once a control packet is received successfully, the plant-state vector $x_k$ can be driven to zero in one step in the above-mentioned ideal setting, i.e., $x_{k+1} = 0$ in (13). By taking (14) into (7), the $(v-1)$ predicted commands are all $0$; thus the controller only needs to send the current control command to the actuator without any prediction, and the length of $U$ and $C$, $v$, is equal to one.

Case 2:
The controller gain $K$ satisfies the condition that [28]

$(A + BK)^v = 0, \quad v > 1.$  (15)

This case is named the $v$-step controllable case [28], since the plant state $x_k$ can be driven to zero in $v$ steps after a successful reception of a control packet in the ideal setting, i.e., $x_{k+v} = 0$ in (13).

The other cases, satisfying neither condition (14) nor (15), will also be discussed in the following section.

III. ANALYSIS OF THE DOWNLINK-UPLINK SCHEDULING
As the controller estimates the current plant states and utilizes the estimation to control the future plant states, we analyze the estimation-error covariance and the plant-state covariance in the sequel. (Note that the ideal setting here is only for the explanation of the term "one-step controllable"; we consider only practical settings in the rest of the paper.)
A. Estimation-Error Covariance
Using (1) and (6), the estimation error in time slot $(k+1)$ is obtained as

$e_{k+1} \triangleq x_{k+1} - \hat{x}_{k+1} = \begin{cases} w_k, & a_k = 1, \delta_k = 1, \\ A e_k + w_k, & \text{otherwise}. \end{cases}$  (16)

Thus, we have the updating rule of the estimation-error covariance, $U_k \triangleq \mathbb{E}[e_k e_k^\top]$, as

$U_{k+1} \triangleq \mathbb{E}[e_{k+1} e_{k+1}^\top] = \begin{cases} R, & a_k = 1, \delta_k = 1, \\ A U_k A^\top + R, & \text{otherwise}. \end{cases}$  (17)

We define the estimation-quality indicator of the plant in time slot $k$, $\tau_k$, as the number of passed time slots since the last successfully received sensor's packet. Then, the state-updating rule of $\tau_k$ is obtained as

$\tau_{k+1} = \begin{cases} 1, & a_k = 1, \delta_k = 1, \\ \tau_k + 1, & \text{otherwise}. \end{cases}$  (18)

Once a successful sensor's transmission has occurred (i.e., there exists $k'$ such that $U_{k'} = R$), from (17) and (18), it can be shown that the estimation-error covariance $U_k, \forall k \ge k'$, is simply a function of the estimation-quality indicator $\tau_k$, i.e.,

$U_k = F(\tau_k),$  (19)

where the function $F(\cdot)$ is defined as

$F(\tau) \triangleq \sum_{i=1}^{\tau} A^{i-1} R (A^\top)^{i-1}, \quad \tau \in \mathbb{N}.$  (20)

As we focus on the long-term performance of the system, without loss of generality, we assume that $U_k \in \{F(1), F(2), F(3), \cdots\}$ for all $k$. From (18) and (19), the updating rule of $U_k$ is obtained as

$U_{k+1} = F(\tau_{k+1}) = \begin{cases} F(1), & a_k = 1, \delta_k = 1, \\ F(\tau_k + 1), & \text{otherwise}. \end{cases}$  (21)

B. Plant-State Covariance of One-Step Controllable Case
Taking (11) and (14) into (12), the control input of the one-step controllable case can be simplified as

$u_k = \begin{cases} K\hat{x}_k, & a_k = 2, \gamma_k = 1, \\ 0, & \text{otherwise}. \end{cases}$  (22)

Substituting (22) into (1) and using (14), the plant-state vector can be rewritten as

$x_{k+1} = \begin{cases} A x_k + BK\hat{x}_k + w_k = A e_k + w_k, & a_k = 2, \gamma_k = 1, \\ A x_k + w_k, & \text{otherwise}. \end{cases}$  (23)

Thus, the plant-state covariance $P_k$ has the updating rule

$P_{k+1} \triangleq \mathbb{E}[x_{k+1} x_{k+1}^\top] = \begin{cases} A U_k A^\top + R, & a_k = 2, \gamma_k = 1, \\ A P_k A^\top + R, & \text{otherwise}. \end{cases}$  (24)

Fig. 2. Illustration of the state parameters, where red vertical bars denote successful controller's transmissions and blue vertical bars denote the most recent successful sensor's transmissions prior to the successful controller's transmissions.
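Since (17) and (24) share the same Lyapunov-type recursion, $F(\tau)$ in (20) is conveniently evaluated iteratively rather than through the explicit sum. A small numerical check, with toy $A$, $R$, $Q$ of our own choosing:

```python
import numpy as np

# F(tau) of (20) via the recursion F(tau+1) = A F(tau) A^T + R that underlies
# (17) and (24); A, R, Q are toy examples, not values from the paper.
A = np.array([[1.2, 0.1], [0.0, 1.1]])
R = 0.01 * np.eye(2)
Q = np.eye(2)

def F(tau):
    """F(tau) = sum_{i=1}^{tau} A^{i-1} R (A^T)^{i-1}, computed iteratively."""
    U = R.copy()
    for _ in range(tau - 1):
        U = A @ U @ A.T + R
    return U

# Sanity check against the explicit sum in (20):
explicit = sum(np.linalg.matrix_power(A, i - 1) @ R @ np.linalg.matrix_power(A.T, i - 1)
               for i in range(1, 4))
print(np.allclose(F(3), explicit))                       # prints True
print([float(np.trace(Q @ F(t))) for t in range(1, 6)])  # strictly increasing traces
```

The strictly increasing traces preview Lemma 1: the one-stage cost $\mathrm{Tr}(Q F(\phi))$ grows with $\phi$, and (for $\rho(A) > 1$) it grows geometrically.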
From (20), (21) and (24), we see that the plant-state covariance $P_k$ only takes values from the countably infinite set $\{F(2), F(3), \cdots\}$ after a successful controller's transmission. Again, as we focus on the long-term performance of the system, we assume that $P_k \in \{F(2), F(3), \cdots\}$ for all $k$, without loss of generality.

By introducing the variable $\phi_k \in \{2, 3, \cdots\}$, the plant-state covariance in time slot $k$ can be written as

$P_k = F(\phi_k),$  (25)

where $\phi_k$ is the state-quality indicator of the plant in time slot $k$. Note that the state covariance depends only on the state parameter $\phi_k$.

From (24) and (19), the updating rules of $P_k$ and $\phi_k$ in (25) are given, respectively, as

$P_{k+1} = F(\phi_{k+1}) = \begin{cases} F(\tau_k + 1), & a_k = 2, \gamma_k = 1, \\ F(\phi_k + 1), & \text{otherwise}, \end{cases}$  (26)

$\phi_{k+1} = \begin{cases} \tau_k + 1, & a_k = 2, \gamma_k = 1, \\ \phi_k + 1, & \text{otherwise}. \end{cases}$  (27)

From (18) and (27), it is easy to prove that $\phi_k \ge \tau_k, \forall k$.

C. Plant-State Covariance of $v$-Step Controllable Case

Taking (12) into (1), the plant-state vector is rewritten as

$x_{k+1} = \begin{cases} A x_k + BK(A+BK)^{\eta_k - 1}\hat{x}_{k+1-\eta_k} + w_k, & \text{if } \eta_k \le v, \\ A x_k + w_k, & \text{if } \eta_k > v. \end{cases}$  (28)

Using the property (15), we have the state-updating rule

$x_k = A x_{k-1} + BK(A+BK)^{\eta_{k-1} - 1}\hat{x}_{k-\eta_{k-1}} + w_{k-1}.$  (29)

Different from the one-step controllable case in (23), where the current state vector relies on the previous-step estimation, in the $v$-step controllable case it depends on the state estimation $\eta_{k-1}$ steps ago.

Inspired by the one-step controllable case (25), we aim to derive the plant-state covariance in terms of a set of state parameters. First, we define a sequence of variables $t^i_k$, $i = 1, \cdots, v$, where $t^i_k$ is the time-slot index of the $i$th latest successful controller's transmission prior to the current time slot $k$, as illustrated in Fig. 2.
Then, we define the following state parameters:

$\tau^i_k \triangleq \begin{cases} \tau_k, & i = 0, \\ \tau_{t^i_k}, & i = 1, 2, \cdots, v, \end{cases}$  (30)

$\eta^i_k \triangleq \begin{cases} \eta_k - 1 = k - t^1_k, & i = 0, \\ t^i_k - t^{i+1}_k, & i = 1, 2, \cdots, v-1. \end{cases}$  (31)

Specifically, $\eta^i_k$ measures the delay between two consecutive successful controller's transmissions, and $\tau^i_k$ is the estimation-quality indicator of time slot $t^i_k$. Last, we define the state parameters $\phi^i_k$ as

$\phi^i_k \triangleq \eta^i_k + \tau^{i+1}_k, \quad i = 0, \cdots, v-1.$  (32)

Using the state-transition rules of $\eta_k$ and $\tau_k$ in (11) and (18), and the definitions (30), (31) and (32), the state-transition rules of $\tau^i_k$, $\eta^i_k$ and $\phi^i_k$ can be obtained, respectively, as

$\tau^i_{k+1} = \begin{cases} 1, & i = 0,\ a_k = 1, \delta_k = 1, \\ \tau_k + 1, & i = 0,\ \text{otherwise}, \\ \tau^{i-1}_k, & i = 1, \cdots, v-1,\ a_k = 2, \gamma_k = 1, \\ \tau^i_k, & i = 1, \cdots, v-1,\ \text{otherwise}, \end{cases}$  (33)

$\eta^i_{k+1} = \begin{cases} 0, & i = 0,\ a_k = 2, \gamma_k = 1, \\ \eta^0_k + 1, & i = 0,\ \text{otherwise}, \\ \eta^{i-1}_k, & i = 1, \cdots, v-1,\ a_k = 2, \gamma_k = 1, \\ \eta^i_k, & i = 1, \cdots, v-1,\ \text{otherwise}, \end{cases}$  (34)

$\phi^i_{k+1} = \begin{cases} \tau_k + 1, & i = 0,\ a_k = 2, \gamma_k = 1, \\ \phi^0_k + 1, & i = 0,\ \text{otherwise}, \\ \phi^{i-1}_k, & i = 1, \cdots, v-1,\ a_k = 2, \gamma_k = 1, \\ \phi^i_k, & i = 1, \cdots, v-1,\ \text{otherwise}. \end{cases}$  (35)

Then, we can derive the plant-state covariance in closed form in terms of the state parameters, as follows.

Proposition 1.
The plant-state covariance $P_k$ in time slot $k$ is

$P_k = F(\phi^0_k) + \sum_{i=0}^{v-2} G\Big(\sum_{j=0}^{i}\phi^j_k - \sum_{j=0}^{i}\tau^{j+1}_k,\ \mathbb{1}\big(\phi^{i+1}_k > \tau^{i+1}_k\big)\big(F(\phi^{i+1}_k) - F(\tau^{i+1}_k)\big)\Big),$  (36)

where the summation operator has the property that $\sum_{i=a}^{b}(\cdot) = 0$ if $a > b$, $F(\cdot)$ is defined in (20), and

$G(x, Y) \triangleq (A+BK)^x\, Y\, \big((A+BK)^x\big)^\top.$  (37)

Proof.
See Appendix A.
Remark 1.
Proposition 1 states that the state covariance $P_k$ of a $v$-step controllable plant is determined by $(2v-1)$ state parameters, i.e., $\tau^i_k, i = 1, \cdots, v-1$, and $\phi^i_k, i = 0, \cdots, v-1$.

Remark 2.
In practice, it is possible that the plant (1) is $\bar{v}$-step controllable, i.e., $(A+BK)^{\bar{v}} = 0$, where $\bar{v} > v$; it is also possible that, when the controller gain $K$ is pre-determined and fixed, one cannot find $\bar{v} \in \mathbb{N}$ such that $(A+BK)^{\bar{v}} = 0$. Moreover, the plant may not be finite-step controllable at all, i.e., one cannot find a pair of $K$ and $\bar{v} \in \mathbb{N}$ such that $(A+BK)^{\bar{v}} = 0$. In these cases, where conditions (14) and (15) are not satisfied, we can show that the covariance $P_k$ takes uncountably infinitely many values and cannot be expressed by a finite number of state parameters as in Proposition 1. Furthermore, the process $\{P_k\}$ is not stationary, making the long-term average cost function (2) difficult to evaluate. However, when $v$ is sufficiently large, $(A+BK)^v$ approaches $0$, since $\rho(A+BK) < 1$. Thus, the plant-state vector in (58) of the proof of Proposition 1, obtained by letting $(A+BK)^v = 0$, is still a good approximation of $x_k$ for these cases, and hence Proposition 1 can be treated as a countable-state-space approximation of the plant-state covariance.

D. Problem Formulation

The uplink-downlink transmission-scheduling policy is defined as the sequence $\{a_0, a_1, \cdots, a_k, \cdots\}$, where $a_k$ is the transmission-scheduling action in time slot $k$. In the following, we optimize the transmission-scheduling policy for both the one-step and multi-step controllable plants such that the average cost of the plant in (2) is minimized, i.e.,

$\min_{a_0, a_1, \cdots, a_k, \cdots} J = \lim_{K\to\infty} \frac{1}{K}\sum_{k=0}^{K-1} \mathrm{Tr}(Q P_k).$  (38)

IV. ONE-STEP CONTROLLABLE CASE: OPTIMAL TRANSMISSION-SCHEDULING POLICY
We first investigate the optimal transmission-scheduling policy for the one-step controllable case, as it also sheds some light on the optimal policy design for general multi-step controllable cases. Note that in this section and the following section, we focus on the static channel scenario; the design method of the optimal scheduling policies can be extended to the fading channel scenario, which will be discussed in Section VI.
A. MDP Formulation
From (26), (18) and (27), the next-stage cost (via $P_{k+1}$) and the next states $\tau_{k+1}$ and $\phi_{k+1}$ depend only on the current transmission-scheduling action $a_k$ and the current states $\tau_k$ and $\phi_k$. Therefore, we can reformulate the problem (38) as a Markov decision process (MDP), as follows.

1) The state space is defined as $\mathcal{S} \triangleq \{(\tau, \phi) : \phi \ge \tau, \phi \ne \tau + 1, \tau \in \mathbb{N}, \phi \in \{2, 3, \cdots\}\}$, as illustrated in Fig. 3. Note that the states with $\phi = \tau + 1$ are transient states (which can be verified using (18) and (27)) and are not included in $\mathcal{S}$, since we only focus on the long-term performance of the system. The state of the MDP at time $k$ is $s_k \triangleq (\tau_k, \phi_k) \in \mathcal{S}$.

2) The action space of the MDP is defined as $\mathcal{A} \triangleq \{1, 2\}$. The action at time $k$, $a_k \triangleq \pi(s_k) \in \mathcal{A}$, indicates the sensor's transmission ($a_k = 1$) or the controller's transmission ($a_k = 2$) in time slot $k$. (In this work, we only focus on the design of the scheduling policy $\{a_k\}$ when the controller gain $K$ and the length of predictive control $v$ are given and fixed. In our future work, the controller gain, the length of predictive control and the scheduling sequence will be jointly optimized.)

Fig. 3. The state space $\mathcal{S}$ (shaded dots) of the MDP.
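A truncated version of the state space $\mathcal{S}$, which is needed anyway for numerical solution (cf. Section IV-C), can be enumerated directly; the truncation bound is our own choice:

```python
# Enumerating a truncated version of the MDP state space S defined above;
# the truncation bound N is ours, for illustration only.
N = 8
S = [(tau, phi) for tau in range(1, N + 1)
                for phi in range(2, N + 1)
                if phi >= tau and phi != tau + 1]
print((2, 2) in S, (1, 2) in S)   # prints True False
```

The excluded diagonal $\phi = \tau + 1$ matches the transient states noted above, e.g., $(1, 2)$ is dropped while $(2, 2)$ and $(1, 3)$ are kept.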
3) The state-transition probability $P(s' \,|\, s, a)$ is the probability that the state $s$ at time $(k-1)$ transits to $s'$ at time $k$ under action $a$ at time $(k-1)$. We drop the time index $k$ here since the transition is time-homogeneous. Let $s = (\tau, \phi)$ and $s' = (\tau', \phi')$ denote the current and next states, respectively. From (18) and (27), the state-transition probability can be obtained as

$P(s' \,|\, s, a) = \begin{cases} p_s, & \text{if } a = 1, s' = (\tau+1, \phi+1), \\ 1 - p_s, & \text{if } a = 1, s' = (1, \phi+1), \\ p_c, & \text{if } a = 2, s' = (\tau+1, \phi+1), \\ 1 - p_c, & \text{if } a = 2, s' = (\tau+1, \tau+1), \\ 0, & \text{otherwise}. \end{cases}$  (39)

4) The one-stage cost of the MDP, i.e., the one-step quadratic cost of the plant in (2), is a function of the current state $\phi$:

$c(s) = c(\phi) \triangleq \mathrm{Tr}(Q P) = \mathrm{Tr}(Q F(\phi)),$  (40)

which is independent of the state $\tau$ and the action $a$. The function $c(\cdot)$ has the following property:

Lemma 1.
The one-stage cost function $c(\phi)$ in (40) is a strictly monotonically increasing function of $\phi$, where $\phi \in \{2, 3, \cdots\}$.

Proof.
Since $R$ is a positive definite matrix, $M R M^\top$ is positive definite for any $n$-by-$n$ non-zero matrix $M$. Also, we have $A^i \ne 0, \forall i \in \mathbb{N}$, as it is assumed that $\rho(A) > 1$ in Section II-A. Due to the fact that the product of positive definite matrices has positive trace and $Q$ is positive definite, $\mathrm{Tr}\big(Q A^i R (A^i)^\top\big)$ is positive, $\forall i \in \mathbb{N}$. From the definition of $F(\cdot)$ in (20), we have

$c(\phi + z) - c(\phi) = \mathrm{Tr}(Q F(\phi+z)) - \mathrm{Tr}(Q F(\phi)) = \sum_{i=\phi+1}^{\phi+z} \mathrm{Tr}\big(Q A^{i-1} R (A^{i-1})^\top\big) > 0, \quad \forall z \in \mathbb{N}.$  (41)

This completes the proof.

Therefore, the problem (38) is equivalent to finding the optimal policy $\pi(s), \forall s \in \mathcal{S}$, by solving the classical average-cost minimization problem of the MDP [29]. If a stationary and deterministic optimal policy of the MDP exists, we can effectively find the optimal policy by using standard methods such as the relative value iteration algorithm; see e.g., [29, Chapter 8].

B. Existence of the Optimal Scheduling Policy

If the uplink and downlink channels have high packet-error probabilities, the average cost in (38) may never be bounded, no matter what policy we choose. Therefore, we need to study the condition, in terms of the transmission reliability of the uplink and downlink channels, under which the dynamic plant can be stabilized, i.e., the average cost can be bounded.
We derive the following result.
Theorem 1.
In the static channel scenario, there exists a stationary and deterministic optimal transmission-scheduling policy that can stabilize the one-step controllable plant (1) if and only if

$\max\{p_s, p_c\} < \frac{1}{\rho^2(A)},$  (42)

where we recall that $\rho(A)$ is the spectral radius of $A$.

Proof.
The necessity of the condition is easily proved, as (42) is the necessary and sufficient condition under which an ideal FD controller with the uplink-downlink packet-error probabilities $\{p_s, p_c\}$ can stabilize the remote plant [19]. Intuitively, if (42) does not hold, an FD controller cannot stabilize the plant, and thus an HD controller cannot either, no matter what transmission-scheduling policy it applies.

The sufficiency part of the proof is conducted by proving the existence of a stationary and deterministic policy $\pi'$ that can stabilize the plant if (42) is satisfied, where

$\pi'(s) = \pi'(\tau, \phi) = \begin{cases} 1, & \tau = \phi, (\tau, \phi) \in \mathcal{S}, \\ 2, & \text{otherwise}. \end{cases}$  (43)

The details of the proof are given in Appendix B.

Remark 3.
Theorem 1 states that the optimal policy, which stabilizes the plant, exists if the channel conditions of both the uplink and downlink channels are good (i.e., small $p_s$ and $p_c$) and the dynamic process does not change rapidly (i.e., a small $\rho(A)$). Also, it is interesting to see that the HD controller has exactly the same stabilization condition as the FD controller [19]. However, since the HD operation naturally introduces longer delays than the FD operation in the transmissions of both the sensor measurement and the control command, the bounded average cost of the HD controller should be higher than that of the FD one, which will be illustrated in Section VII.

Assuming that the condition (42) is satisfied, we have the following property of the optimal policy.
Proposition 2.
The stationary and deterministic optimal policy of the problem (38), π∗(τ, φ), is a switching-type policy in terms of τ and φ, i.e., (i) if π∗(τ, φ) = 1, then π∗(τ + z, φ) = 1, ∀z ∈ N and (τ + z, φ) ∈ S; (ii) if π∗(τ, φ) = 2, then π∗(τ, φ + z) = 2, ∀z ∈ N and (τ, φ + z) ∈ S. Proof.
The proof follows the same procedure as that of [30, Theorem 2] and is omitted due to the space limitation. Therefore, under the optimal policy, the state space is divided into two parts by a curve, and the scheduling actions of the states in each part are the same, which will be illustrated in Section VII. Such a switching structure helps save storage space for online transmission scheduling, as the controller only needs to store the states of the switching boundary instead of the entire state space [30], [31].
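The relative value iteration mentioned in Section IV-A can be sketched generically as follows. This is a minimal illustrative sketch, not the paper's implementation; the toy two-state MDP at the bottom (with a hand-computable optimal average cost of 2/3) is purely an assumption for demonstration.

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100000):
    """Relative value iteration for an average-cost MDP.

    P[a] is the |S| x |S| transition matrix under action a, and
    c[s, a] is the one-stage cost. Returns (average cost g,
    relative value function h, greedy policy)."""
    n_states, n_actions = c.shape
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup for every action, then minimize over actions.
        Q = np.stack([c[:, a] + P[a] @ h for a in range(n_actions)], axis=1)
        Th = Q.min(axis=1)
        g = Th[0]              # state 0 serves as the reference state
        h_new = Th - g         # re-center to keep h bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return g, h, Q.argmin(axis=1)

# Toy 2-state example: in state 0, action 0 is free but may drift to the
# costly state 1; action 1 pays 1 to stay. The optimal average cost is 2/3.
P = [np.array([[0.5, 0.5], [1.0, 0.0]]),
     np.array([[1.0, 0.0], [1.0, 0.0]])]
c = np.array([[0.0, 1.0], [2.0, 2.0]])
g, h, policy = relative_value_iteration(P, c)
```

The switching structure discussed above then appears directly in the returned greedy policy when this routine is applied to the scheduling MDP.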
C. Suboptimal Policy
In practice, to solve the MDP problem in Section IV-A with an infinite number of states, one needs to approximate it by a truncated MDP problem with finite states for offline numerical evaluation. The computational complexity of solving the problem is O(AB²C) [32], where A and B are the cardinalities of the action space and the state space, respectively, and C is the number of convergence steps. To reduce the computational complexity, we propose a myopic policy ψ(s), ∀s ∈ S, which simply makes an online decision to optimize the expected next-stage cost.

From (39) and (40), the expected next-stage cost E[c(φ′)|s, a = ψ(s)], where s = (τ, φ), is derived as

E[c(φ′)|s, ψ(s) = 1] = c(φ + 1),
E[c(φ′)|s, ψ(s) = 2] = p_c c(φ + 1) + (1 − p_c) c(τ + 1). (44)

1) For the states {s | (τ, φ) ∈ S, φ > τ}, from (44), the action ψ(s) = 2 results in a smaller next-stage cost than ψ(s) = 1.

2) For the states {s | (τ, φ) ∈ S, φ = τ}, from (44), since the two actions lead to the same next-stage cost, i.e.,

E[c(φ′)|s, ψ(s) = 1] = E[c(φ′)|s, ψ(s) = 2] = c(φ + 1), (45)

we need to compare the second-stage costs led by the actions. If ψ(s) = 1, s′ ∈ {(1, φ+1), (φ+1, φ+1)}. If s′ = (1, φ+1), since φ + 1 > 1, the next-stage myopic action is ψ(1, φ+1) = 2 as discussed earlier, and the second-stage state s′′ ∈ {(2, φ+2), (2, 2)}. If s′ = (φ+1, φ+1), from (45), the expected second-stage cost is c(φ+2) for both ψ(s′) = 1 and 2. Based on this analysis and (39), we have the expected second-stage cost with ψ(s) = 1 as

E[c(φ′′)|s, ψ(s) = 1] = (1 − p_s)(p_c c(φ+2) + (1 − p_c) c(2)) + p_s c(φ+2). (46)

Similarly, we can obtain the expected second-stage cost with ψ(s) = 2 as

E[c(φ′′)|s, ψ(s) = 2] = c(φ+2). (47)

Since p_c, p_s < 1 and c(2) < c(φ+2) from Lemma 1, ψ(s) = 1 results in a smaller cost than ψ(s) = 2. From the above analysis, the myopic policy ψ(s) is equal to π′(s) in (43), ∀s ∈ S. Proposition 3.
The myopic policy of problem (38) is π′ in (43). Remark 4.
From the myopic policy (43) and the state-updating rules (18) and (27), we see that the policy π′ is actually a persistent scheduling policy, which consecutively schedules the uplink transmission until a transmission is successful, and then consecutively schedules the downlink transmission until a transmission is successful, and so on. From the property of the persistent scheduling policy, we can easily obtain the result below.
Corollary 1.
For the persistent uplink-downlink scheduling policy π′ in Proposition 3, the fractions of time slots scheduled for the sensor's and the controller's transmissions are (1 − p_c)/((1 − p_c) + (1 − p_s)) and (1 − p_s)/((1 − p_c) + (1 − p_s)), respectively.

D. Naive Policy: A Benchmark

We consider a naive uplink-downlink scheduling policy of the HD controller as a benchmark for the proposed optimal scheduling policy. The naive policy simply schedules the sensor's and the controller's transmissions alternately, i.e., {···, sensing, control, sensing, control, ···}, without taking into account either the state-estimation quality of the controller or the state quality of the plant. Such a naive policy is also known as the round-robin scheduling policy. Theorem 2.
In the static channel scenario, the alternating scheduling policy can stabilize the one-step controllable plant (1) iff

max{p_s, p_c} < 1/ρ⁴(A). (48)

Proof.
See Appendix C.
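Both the one-step-lookahead rule behind Proposition 3 and the time-sharing fractions of Corollary 1 are easy to sanity-check numerically. The sketch below is illustrative: the probabilities and the quadratic cost function are assumptions, not the paper's values; the first function implements the decision of (44) with the tie at φ = τ broken by (46)-(47), and the second estimates the uplink fraction of the persistent policy by Monte-Carlo.

```python
import random

def myopic_action(tau, phi, p_s, p_c, cost):
    """One-step-lookahead decision from Eq. (44); the tie at phi == tau is
    broken by the expected second-stage costs, Eqs. (46)-(47)."""
    up = cost(phi + 1)                                      # a = 1 (sensing)
    down = p_c * cost(phi + 1) + (1 - p_c) * cost(tau + 1)  # a = 2 (control)
    if down < up:
        return 2
    if up < down:
        return 1
    # tie (phi == tau): compare the expected second-stage costs
    second_up = (1 - p_s) * (p_c * cost(phi + 2) + (1 - p_c) * cost(2)) \
        + p_s * cost(phi + 2)
    return 1 if second_up <= cost(phi + 2) else 2

def persistent_uplink_fraction(p_s, p_c, n_slots=200000, seed=0):
    """Monte-Carlo fraction of slots spent sensing under the persistent policy."""
    rng = random.Random(seed)
    sensing, uplink_slots = True, 0
    for _ in range(n_slots):
        if sensing:
            uplink_slots += 1
            sensing = rng.random() < p_s     # keep sensing until a success
        else:
            sensing = rng.random() >= p_c    # back to sensing on a success
    return uplink_slots / n_slots
```

With any increasing cost function, the myopic action is 2 whenever φ > τ and 1 at φ = τ, i.e., exactly π′ in (43), and the simulated uplink fraction matches (1 − p_c)/((1 − p_c) + (1 − p_s)) from Corollary 1.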
Remark 5.
Compared with Theorem 1, to stabilize the same plant, the naive policy may require smaller packet-error probabilities of the uplink and downlink channels than the proposed optimal scheduling policy. This also implies that the optimal policy can result in a notably smaller average cost of the plant than the naive policy, which will be illustrated in Section VII.

V. v-STEP CONTROLLABLE CASE: OPTIMAL TRANSMISSION-SCHEDULING POLICY
A. MDP Formulation
Based on Proposition 1, the average cost minimization problem (38) can be formulated as an MDP similar to the one-step controllable case in Section IV as follows.

1) The state space is defined as S ≜ {(τ⁰_k, τ¹_k, ···, τ^{v−1}_k, φ⁰_k, φ¹_k, ···, φ^{v−1}_k) : φ^i_k ≥ τ^i_k, φ^i_k ≠ τ^i_k + 1, τ^i_k ∈ N, φ^i_k ∈ {2, 3, ···}, ∀i = 0, ···, v−1}.

2) The action space of the MDP is exactly the same as that of the one-step controllable plant in Section IV-A.

3) Let P(s′|s, a) denote the state-transition probability, where s = (τ⁰, ···, τ^{v−1}, φ⁰, ···, φ^{v−1}) and s′ = ((τ⁰)′, ···, (τ^{v−1})′, (φ⁰)′, ···, (φ^{v−1})′) are the current and next states, respectively, after dropping the time indexes. From (33) and (35), the state-transition probability is obtained as

P(s′|s, a) =
p_s, if a = 1, s′ = (τ⁰+1, τ¹, ···, τ^{v−1}, φ⁰+1, φ¹, ···, φ^{v−1}),
1 − p_s, if a = 1, s′ = (1, τ¹, ···, τ^{v−1}, φ⁰+1, φ¹, ···, φ^{v−1}),
p_c, if a = 2, s′ = (τ⁰+1, τ¹, ···, τ^{v−1}, φ⁰+1, φ¹, ···, φ^{v−1}),
1 − p_c, if a = 2, s′ = (τ⁰+1, τ⁰, ···, τ^{v−2}, τ⁰+1, φ⁰, ···, φ^{v−2}),
0, otherwise. (49)

4) The one-stage cost of the MDP is a function of the current state s, and is obtained from (2) and Proposition 1 as

c(s) = c(τ⁰, ···, τ^{v−1}, φ⁰, ···, φ^{v−1}) = Tr( Q [ F(φ⁰) + Σ_{i=0}^{v−2} G_{Σ_{j=0}^{i} φ^j − Σ_{j=0}^{i} τ^{j+1}} 1(φ^{i+1} > τ^{i+1}) ( F(φ^{i+1}) − F(τ^{i+1}) ) ] ). (50)

Remark 6.
Different from the one-step controllable case, where the one-stage cost function is a monotonically increasing function of the state parameter φ, the cost function in (50) is more complex and does not have such a property. Thus, the switching structure of the optimal policy does not hold in general for the v-step controllable case.

B. Existence of the Optimal Scheduling Policy

Theorem 3.
In the static channel scenario, there exists a stationary and deterministic optimal transmission-scheduling policy that can stabilize the v-step controllable plant (1) using the predictive control method (7), (9) and (10), iff (42) holds. Proof.
See Appendix D.
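The threshold in condition (42) is cheap to compute from the plant matrix. A small helper follows; the plant matrix below is an illustrative assumption (eigenvalues 1.2 and 0.7), not the paper's.

```python
import numpy as np

def stability_threshold(A):
    """Upper bound 1/rho^2(A) on max{p_s, p_c} from condition (42)."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    return 1.0 / rho ** 2

# Illustrative open-loop-unstable plant with spectral radius 1.2.
A = np.array([[1.1, 0.2], [0.2, 0.8]])
bound = stability_threshold(A)  # packet-error probabilities must stay below this
```

For this example the bound is 1/1.44 ≈ 0.694, so both channels' packet-error probabilities must be below that value for any scheduling policy to stabilize the plant.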
Remark 7.
The stability condition of the v-step controllable plant is exactly the same as that of the one-step controllable plant in Theorem 1. Thus, whether a plant can be stabilized by an HD controller simply depends on the spectral radius of the plant parameter A and the uplink and downlink transmission reliabilities. Remark 8.
Although the stability conditions of the one-step and v-step plants are the same, to find the optimal uplink-downlink scheduling policy, the state space and the computational complexity of the MDP problem grow linearly and exponentially with v [32], respectively. However, in the following section, we will show that the persistent scheduling policy in Proposition 3, which can be treated as a policy that makes decisions relying on only two state parameters, i.e., φ⁰ and τ⁰, instead of all the state parameters, can provide performance remarkably close to the optimal one.

VI. EXTENSION TO FADING CHANNELS
In this section, we investigate the optimal transmission-scheduling policy for the general v-step controllable case in the fading channel scenario, where v ≥ 1.

A. MDP Formulation

Compared with the static channel scenario, the transmission scheduling of the WNCS in the fading channel scenario should take into account the channel states of both the uplink and downlink channels, which expands the dimension of the state space. Also, the state-transition probabilities of the MDP problem should rely on the transition probabilities of the channel states. Therefore, the detailed MDP problem for solving the average cost minimization problem (38) can be formulated as follows.

1) The state space is defined as S ≜ {(τ⁰_k, ···, τ^{v−1}_k, φ⁰_k, ···, φ^{v−1}_k, h_{s,k}, h_{c,k}) : φ^i_k ≥ τ^i_k, φ^i_k ≠ τ^i_k + 1, τ^i_k ∈ N, φ^i_k ∈ {2, 3, ···}, h_{s,k} ∈ {1, ···, B_s}, h_{c,k} ∈ {1, ···, B_c}, ∀i = 0, ···, v−1}, where h_{s,k} and h_{c,k} are the channel-state indexes of the uplink and downlink channels at time k, respectively.

2) The action space is the same as that of the static channel scenario in Section V-A.

3) As the state transition is time-homogeneous, we drop the time index k here. Let h ≜ (h_s, h_c) and h′ ≜ (h′_s, h′_c) denote the current and the next uplink-downlink channel states, respectively. As the uplink and downlink channels are action-invariant and independent of each other, the overall channel-state transition probability can be directly obtained from (4) and (5) as

P(h′|h) = d^s_{h_s, h′_s} d^c_{h_c, h′_c}. (51)

Let s ≜ (τ⁰, ···, τ^{v−1}, φ⁰, ···, φ^{v−1}, h) and s′ ≜ ((τ⁰)′, ···, (τ^{v−1})′, (φ⁰)′, ···, (φ^{v−1})′, h′) denote the current and the next states of the WNCS, respectively.
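Under the independence assumption in (51), the joint channel-state transition matrix is simply a Kronecker product of the per-link matrices. The two-state chains below are illustrative assumptions:

```python
import numpy as np

# D_s[i, j] = P(next uplink channel state j | current state i); same for D_c.
# Eq. (51) makes the joint transition matrix a Kronecker product.
D_s = np.array([[0.9, 0.1],
                [0.3, 0.7]])
D_c = np.array([[0.8, 0.2],
                [0.4, 0.6]])
P_joint = np.kron(D_s, D_c)   # row/col index = h_s * B_c + h_c (0-based)
```

Each row of P_joint sums to one, and entry ((h_s, h_c), (h′_s, h′_c)) equals D_s[h_s, h′_s] · D_c[h_c, h′_c], exactly as in (51).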
The state-transition probability P(s′|s, a) can be obtained as

P(s′|s, a) =
P(h′|h) ω_{h_s}, if a = 1 and s′ = (τ⁰+1, ···, τ^{v−1}, φ⁰+1, ···, φ^{v−1}, h′),
P(h′|h)(1 − ω_{h_s}), if a = 1 and s′ = (1, ···, τ^{v−1}, φ⁰+1, ···, φ^{v−1}, h′),
P(h′|h) ξ_{h_c}, if a = 2 and s′ = (τ⁰+1, ···, τ^{v−1}, φ⁰+1, ···, φ^{v−1}, h′),
P(h′|h)(1 − ξ_{h_c}), if a = 2 and s′ = (τ⁰+1, ···, τ^{v−2}, τ⁰+1, ···, φ^{v−2}, h′),
0, otherwise. (52)

4) The one-stage cost of the MDP is the same as (50).

Such an MDP problem, with (2v + 2) state dimensions and a small action space, can be solved by standard MDP algorithms, similar to the static channel scenario discussed earlier.

B. Existence of the Optimal Scheduling Policy
In the fading channel scenario, since each state of the Markov chain induced by a scheduling policy has (2v + 2) dimensions, it is difficult to analyze the average cost of the Markov chain and determine whether it is bounded or not.

Fig. 4. Temperature and humidity control in grain conservation.
Therefore, it is hard to give a necessary and sufficient condition, in terms of the properties of the Markov channels and the plant, under which the MDP problem has a scheduling-policy solution leading to a bounded minimum average cost. However, inspired by the result for the static channel scenario in Section V-B, we can directly give a necessary condition and a sufficient condition by considering the best and the worst Markov channel conditions of the uplink and downlink channels, as below.
Theorem 4.
In the fading channel scenario, a necessary condition and a sufficient condition on the existence of a stationary and deterministic optimal transmission-scheduling policy that can stabilize the general v-step controllable plant (1) using the predictive control method (7), (9) and (10) are given by

max{p_s^min, p_c^min} < 1/ρ²(A), (53)

and

max{p_s^max, p_c^max} < 1/ρ²(A), (54)

respectively, where p_s^min ≜ min{ω_1, ···, ω_{B_s}}, p_s^max ≜ max{ω_1, ···, ω_{B_s}}, p_c^min ≜ min{ξ_1, ···, ξ_{B_c}}, and p_c^max ≜ max{ξ_1, ···, ξ_{B_c}}.

In general, Theorem 4 says that the plant can be stabilized by a transmission-scheduling policy as long as the worst achievable channel conditions of the uplink and downlink Markov channels are good enough, and it cannot be stabilized by any scheduling policy if even the best achievable channel conditions of the uplink and downlink Markov channels are poor.

In the following section, we will numerically evaluate the performance of the plant using the optimal transmission-scheduling policy, where the sufficient condition (54) for the existence of an optimal policy is satisfied.

VII. NUMERICAL RESULTS
The uplink-downlink scheduling policies that we developed can be applied to a large range of real IIoT applications, including temperature control of the hot rolling process in the iron and steel industry, flight-path control of delivery drones, voltage control in smart grids, and lighting control in smart homes/buildings. Specifically, in this section, we apply the uplink-downlink scheduling policies to a real application of smart farms, as illustrated in Fig. 4. The system contains a grain container, a sensor measuring the temperature (°C) and the humidity (%) of the grain pile, an actuator which has a high-pressure fan and/or an air cooler, and an edge controller, which receives the sensor's measurements and then computes and sends the command to the actuator. Given the preset values of temperature and humidity, the state vector x_k in (1) contains two parameters, i.e., the current temperature and humidity offsets. Note that since grain absorbs water from the air and generates heat naturally, the temperature and the humidity levels of the grain pile will automatically increase without proper control, leading to severe insect and mold development [33]. In general, by using the high-pressure fan for ventilation, both the temperature and the humidity can be controlled within a proper range. If the air cooler is also available, the temperature can be controlled to the preset value faster. Thus, if the actuator has a high-pressure fan only, given a preset fan speed, its control input u_k in (1) has only one parameter, i.e., the relative fan speed (measured by the flow volume [m³/h]). If the actuator has both a high-pressure fan and an air cooler, the control input u_k has two parameters, i.e., the fan speed and the cooler temperature (°C). The former and the latter cases will be studied in Section VII-A, and Sections VII-B and VII-C, respectively.

The discrete time step T in this example is set to one second [27]. Unless otherwise stated, we assume fixed two-by-two system parameters A, Q and R, with spectral radius ρ(A) > 1. Since the controller, the sensor and the actuator have very low mobility in this example, we focus on the static channel scenario, with fixed packet-error probabilities p_s and p_c for the uplink and downlink channels, respectively; we also study the fading channel scenario in Section VII-A.

In the following, we present numerical results for the optimal policies and the optimal average costs of the plant in Sections IV and V for the one-step and v-step controllable cases, respectively. Also, we numerically compare the performance of the optimal scheduling policy with the persistent scheduling policy in Section IV-C, the benchmark (naive) policy in Section IV-D, and the ideal FD policy in [19], i.e., the controller works in the FD mode and has the same packet-error probabilities of the uplink-downlink channels as in the HD mode.

Note that to calculate the optimal policies in Sections IV-A, V-A and VI-A by solving the MDP problems, the infinite state space S is first truncated by limiting the range of the state parameters as 1 ≤ τ^i, φ^i ≤ 20, ∀i = 0, ···, v−1, to enable the evaluation. For example, if we consider the two-step controllable case, i.e., v = 2, there will be 20^{2v} = 160,000 states in the static channel scenario, and there will be many more states in the fading channel scenario.

Fig. 5. The uplink-downlink scheduling policies, where 'o' and '.' denote a = 1 and a = 2, respectively, and 'x' denotes a state that does not belong to S.

For solving finite-state MDP problems, in general, there are two classical methods: the policy iteration and the value iteration. The
policy iteration method converges faster in solving small-scale MDP problems, but is more computationally burdensome than the value iteration method when the state space is large [29]. Since our problems have large state spaces, we adopt the classical value iteration method for solving the MDP problems by using a well-recognized Matlab MDP toolbox [34].

A. One-Step Controllable Case
In this case, we assume that B = −I (the negative two-by-two identity matrix) and K = A, satisfying A + BK = 0. Optimal and suboptimal policies.
Fig. 5 shows the optimal policy and the persistent (suboptimal) policy of Proposition 3 within the truncated state space. We see that although the optimal policy schedules the sensor's transmission in more states than the persistent policy does, the two policies look similar to each other. Also, we see that the optimal policy is a switching-type policy, in line with Proposition 2.
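The computation behind a figure like Fig. 5 can be sketched end to end. The snippet below builds a small truncated state space for the one-step case, uses an assumed covariance form F(n) = Σ_{i=0}^{n−1} A^i R (A^i)^⊤ for the one-stage cost c(φ) = Tr(Q F(φ)), and runs relative value iteration on a "lazy" copy of the MDP (an aperiodicity transform). All numerical parameters are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Illustrative parameters (eigenvalues of A are 1.2 and 0.7, so the
# stability condition max{p_s, p_c} < 1/1.44 holds for the chosen p's).
A = np.array([[1.1, 0.2], [0.2, 0.8]])
Q = np.eye(2)
R = np.eye(2)
p_s, p_c = 0.2, 0.3
N = 12                                   # truncation: tau <= N, phi <= N + 1

def F(n):
    """Assumed covariance sum F(n) = sum_{i=0}^{n-1} A^i R (A^i)^T."""
    out, Ai = np.zeros((2, 2)), np.eye(2)
    for _ in range(n):
        out += Ai @ R @ Ai.T
        Ai = Ai @ A
    return out

def cost(phi):
    return float(np.trace(Q @ F(phi)))

# Truncated state space: phi >= tau and phi != tau + 1 (cf. Section IV-A).
states = [(t, f) for t in range(1, N + 1) for f in range(2, N + 2)
          if f >= t and f != t + 1]
idx = {s: i for i, s in enumerate(states)}

def clip(t, f):
    """Project a next state back into the truncated space."""
    t, f = min(t, N), min(f, N + 1)
    return (t, t) if f == t + 1 else (t, f)

n = len(states)
P = np.zeros((2, n, n))
c = np.zeros((n, 2))
for i, (t, f) in enumerate(states):
    c[i, :] = cost(f)
    P[0, i, idx[clip(t + 1, f + 1)]] += p_s          # uplink fails
    P[0, i, idx[clip(1, f + 1)]] += 1 - p_s          # uplink succeeds
    P[1, i, idx[clip(t + 1, f + 1)]] += p_c          # downlink fails
    P[1, i, idx[clip(t + 1, t + 1)]] += 1 - p_c      # downlink succeeds

# Relative value iteration on the lazy MDP (keeps the iteration aperiodic).
h, alpha = np.zeros(n), 0.5
for _ in range(5000):
    Qv = np.stack([alpha * (c[:, a] + P[a] @ h) + (1 - alpha) * h
                   for a in range(2)], axis=1)
    Th = Qv.min(axis=1)
    h_new = Th - Th[0]
    if np.max(np.abs(h_new - h)) < 1e-10:
        h = h_new
        break
    h = h_new
g = Th[0] / alpha                        # undo the cost scaling of the lazy MDP
policy = Qv.argmin(axis=1) + 1           # 1 = schedule sensor, 2 = controller
```

As expected, the computed policy schedules the sensor whenever the estimate is as stale as the plant state (φ = τ) and schedules the controller when the estimate is fresh but the plant state is stale.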
Performance comparison.
We further evaluate the performance of the optimal scheduling policy, the persistent policy, the naive policy and also the FD policy in terms of the K-step average cost of the plant, (1/K) Σ_{k=0}^{K−1} x_k^⊤ Q x_k. We run long simulations with a fixed initial plant-state vector x_0 and, for the optimal and persistent policies, a fixed initial state (τ_0, φ_0). The initial scheduling of the naive policy is the sensor's transmission.

Fig. 6 shows the average cost versus the simulation time using the different policies. We see that the average costs induced by the different policies converge to their steady-state values as K grows. Given the baseline of the FD (non-scheduling) policy, the optimal scheduling policy gives a significant average-cost reduction relative to the naive policy. Also, we see that the persistent policy provides performance close to the optimal one. We note that there is a noticeable performance gap between the optimal scheduling policy of the HD controller and the FD policy of the FD controller, since the HD operation introduces extra delays in the uplink-downlink transmissions and deteriorates the performance of the control system.
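The K-step average-cost comparison can be reproduced in miniature with a simplified scalar stand-in for the paper's vector model. In the sketch below, everything is an assumption for illustration: the scalar plant x_{k+1} = a x_k + u_k + w_k, the deadbeat command u = −a·(estimate), and the premise that the controller knows which of its commands were delivered.

```python
import random

def average_cost(policy, a=1.2, p_s=0.3, p_c=0.3, n=300000, seed=7):
    """K-step average cost (1/K) sum x_k^2 for a scalar plant
    x_{k+1} = a x_k + u_k + w_k under half-duplex scheduling.
    The controller tracks a model-based estimate `est` of x (assumed to
    know which commands were delivered) and applies u = -a * est."""
    rng = random.Random(seed)
    x, est = 0.0, 0.0
    sensing = True                    # phase flag used by the persistent policy
    total = 0.0
    for k in range(n):
        u = 0.0
        if policy(sensing, k) == 1:               # uplink slot
            if rng.random() >= p_s:               # sensor packet received
                est = x
                sensing = False
        else:                                     # downlink slot
            if rng.random() >= p_c:               # command packet received
                u = -a * est
                sensing = True
        w = rng.gauss(0.0, 1.0)
        x = a * x + u + w
        est = a * est + u                         # controller's model update
        total += x * x
    return total / n

persistent = average_cost(lambda sensing, k: 1 if sensing else 2)
naive = average_cost(lambda sensing, k: 1 if k % 2 == 0 else 2)
```

Even in this stripped-down setting, the persistent policy delivers control commands based on noticeably fresher estimates than round-robin alternation, which is the mechanism behind the cost gap in Fig. 6.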
Performance versus transmission reliabilities.
In Fig. 7, we show a contour plot of the average cost of the plant with different uplink-downlink packet-error probabilities (p_s, p_c) within the rectangular region that can stabilize the plant, i.e., p_s, p_c < 1/ρ²(A), based on Theorem 1. The average cost is calculated by running a long simulation and then
Fig. 6. One-step controllable case: average cost versus time.
Fig. 7. One-step controllable case: average cost versus packet-error probabilities, i.e., p_s and p_c.

taking the average, and the average cost does not have a steady-state value if (p_s, p_c) lies outside the rectangular region. We see that the average cost increases quickly when p_s or p_c approaches the boundary 1/ρ²(A). Also, it is interesting to see that in order to guarantee a certain average cost, e.g., J = 8, the required p_s is smaller than p_c in general, which implies that the transmission reliability of the sensor-controller channel is more important than that of the controller-actuator channel. Fading channel scenario.
Assume that both the uplink and downlink channels have two Markov channel states, with a low packet-error probability in the good channel state and a high one in the bad channel state, i.e., ω_1 = ξ_1 and ω_2 = ξ_2, respectively. Figs. 8 and 9 show the average cost versus the simulation time with different channel-state transition probabilities. In Fig. 8, we set the channel-state transition matrices of the uplink and downlink channels as D_s = D_c with identical rows. Taking the uplink channel as an example, the transition probabilities from the bad channel state and from the good channel state are then the same, and thus the Markov channel does not have any memory [26]. Since the uplink and downlink channels have the same Markovian property, both the uplink and downlink channels
Fig. 8. Markov channel scenario without memory: average cost versus time.
Fig. 9. Markov channel scenario with memory: average cost versus time.

are memoryless. In Fig. 9, we set D_s = D_c such that the probability of remaining in any given channel state is higher than that of jumping to the other state. In this case, both the uplink and downlink channels have persistent memory. In Figs. 8 and 9, we see that the persistent policy always provides a low average cost, which is close to that of the optimal policy and much smaller than that of the naive policy. It is interesting to see that the average cost achieved by the optimal policy under the memoryless Markov channels in Fig. 8 is smaller than that under the Markov channels with memory in Fig. 9. This is because, in a Markov channel with memory, when the current channel state is bad, it is more likely to remain bad in the following time slot, which can lead to consecutive packet losses and deteriorate the control performance of the WNCS.

B. Two-Step Controllable Case
In this case, we assume that B is a two-by-one vector and K is a one-by-two gain satisfying (A + BK)² = 0. For fair comparison, all the policies considered in this subsection adopt the same predictive control method in (7), (9) and (10) with v = 2.
Fig. 10. Two-step controllable case: average cost versus packet-error probability p_c.
Fig. 11. Non-finite-step-controllable case: average cost versus packet-error probability p_c.

In Fig. 10, we plot the average cost versus the packet-error probability of the downlink channel under the different uplink-downlink transmission-scheduling policies, with a fixed uplink packet-error probability p_s. We see that the persistent policy can still provide performance close to that of the optimal policy. Given the FD policy as a benchmark, it is clear that the optimal scheduling policy provides a substantial reduction of the average cost relative to the naive policy when p_c is large.

C. Non-Finite-Step-Controllable Case
We now look at the non-finite-step-controllable case discussed in Remark 2, where B is a two-by-one vector and the gain K is such that ρ(A + BK) < 1 while (A + BK)^v ≠ 0 for any practical value of v. We consider two predictive control protocols in (7) with v = 2 and v = 3, respectively, i.e., the controller sends two or three commands to the actuator each time. Computing the matrix powers (A + BK)², (A + BK)³ and (A + BK)⁴ directly, it is clear that (A + BK)^v approaches 0 as v increases. By letting (A + BK)^v = 0 in the analysis of the plant-state vector in (58), where v = 2 or 3, the plant-state covariance matrix P_k is approximated by a function of 2v − 1 state parameters as in Proposition 1. Based on such an approximation, we can formulate and solve the MDP problem in Section V-A, resulting in an approximated optimal scheduling policy.

In Fig. 11, we plot the average cost versus the packet-error probability of the downlink channel under the different transmission-scheduling policies. We see that for both v = 2 and v = 3, the performance of the approximated optimal and the persistent uplink-downlink scheduling policies is quite close to that of the benchmark FD policy when p_c is small, while the performance gap between the naive scheduling policy and the FD policy is large. This also implies that the approximated optimal policy is near optimal in this practical range of downlink transmission reliability, and that the persistent scheduling policy is an effective yet low-complexity alternative in this case.

VIII. CONCLUSIONS
In this work, we have proposed an important uplink-downlink transmission-scheduling problem of a WNCS with an HD controller for practical IIoT applications, which has not been considered in the open literature. We have given a comprehensive analysis of the estimation-error covariance and the plant-state covariance of the HD-controller-based WNCS for both one-step and v-step controllable plants. Based on our analytical results, in both the static and fading channel scenarios, we have formulated a novel problem to optimize the transmission-scheduling policy, depending on both the current estimation quality of the controller and the current cost function of the plant, so as to minimize the long-term average cost function. Moreover, for the static channel scenario, we have derived the necessary and sufficient condition for the existence of a stationary and deterministic optimal policy that results in a bounded average cost, in terms of the transmission reliabilities of the uplink and downlink channels. For the fading channel scenario, we have derived a necessary condition and a sufficient condition, in terms of the uplink and downlink channel qualities, under which the optimal transmission-scheduling policy exists. Our problem can be solved effectively by standard MDP algorithms if the optimal scheduling policy exists. Also, we have derived an easy-to-compute suboptimal policy, which provides control performance close to that of the optimal policy and notably reduces the average cost of the plant compared to a naive alternating-scheduling policy.

In future work, we will consider a scenario in which an HD controller controls multiple plants for IIoT applications with a large number of devices. It is important to investigate the scheduling of different sensors' transmissions to the controller and the controller's transmissions to different actuators, and to consider the different quality-of-service (QoS) requirements of the devices in the scheduling and how they affect the control.
Moreover, for the scheduling-policy design, it is more practical to take into account the transmission power constraints of the sensors and the controller.

APPENDIX A: PROOF OF PROPOSITION 1

For notational simplicity, let η_k ≜ η⁰_k. From the definition of η_k in (11), we have

η_{j+1} = 1, if j = k − η_k; η_j + 1, if j = k − η_k + 1, ···, k − 1. (55)

By using the state-updating rule (29) for x_j, j = (k − η_k + 1), ···, k, we have

x_{k−η_k+1} = A x_{k−η_k} + BK (A + BK)⁰ x̂_{k−η_k} + w_{k−η_k},
x_{k−η_k+2} = A x_{k−η_k+1} + BK (A + BK)¹ x̂_{k−η_k} + w_{k−η_k+1},
...
x_k = A x_{k−1} + BK (A + BK)^{η_k−1} x̂_{k−η_k} + w_{k−1}. (56)

Substituting x_{k−η_k+1} into x_{k−η_k+2}, and so on, it can be shown that

x_k = (A + BK)^{η_k} x_{k−η_k} + (A^{η_k} − (A + BK)^{η_k}) e_{k−η_k} + Σ_{i=1}^{η_k} A^{i−1} w_{k−i}. (57)

Using the new state-updating rule (57), x_k can be further rewritten as

x_k = (A + BK)^{η⁰_k} x_{t¹_k} + (A^{η⁰_k} − (A + BK)^{η⁰_k}) e_{t¹_k} + Σ_{i=1}^{η⁰_k} A^{i−1} w_{k−i}
= (A + BK)^{η⁰_k} [ (A + BK)^{η¹_k} x_{t²_k} + (A^{η¹_k} − (A + BK)^{η¹_k}) e_{t²_k} + Σ_{i=1}^{η¹_k} A^{i−1} w_{t¹_k−i} ] + (A^{η⁰_k} − (A + BK)^{η⁰_k}) e_{t¹_k} + Σ_{i=1}^{η⁰_k} A^{i−1} w_{k−i}
= ··· = (A + BK)^{η⁰_k + η¹_k + ··· + η^{v−1}_k} x_{t^v_k} + w′ + e′ = w′ + e′, (58)

where the last step is due to the fact that η⁰_k + η¹_k + ··· + η^{v−1}_k ≥ v, as η^i_k ≥ 1, ∀i ≥ 0, and (A + BK)^v = 0, and

w′ = Σ_{i=1}^{η⁰_k} A^{i−1} w_{k−i} + (A + BK)^{η⁰_k} Σ_{i=1}^{η¹_k} A^{i−1} w_{t¹_k−i} + ··· + (A + BK)^{η⁰_k + ··· + η^{v−2}_k} Σ_{i=1}^{η^{v−1}_k} A^{i−1} w_{t^{v−1}_k−i}, (59)

e_{t^j_k} = Σ_{i=1}^{τ^j_k} A^{i−1} w_{t^j_k−i}, j = 1, ···, v. (60)

Fig. 12.
Illustration of three different cases for analyzing the plant-state covariance, where red vertical bars denote successful controller's transmissions and blue vertical bars denote the most recent successful sensor's transmissions prior to the successful controller's transmissions.

e′ = (A^{η⁰_k} − (A + BK)^{η⁰_k}) e_{t¹_k} + (A + BK)^{η⁰_k} (A^{η¹_k} − (A + BK)^{η¹_k}) e_{t²_k} + ··· + (A + BK)^{η⁰_k + ··· + η^{v−2}_k} (A^{η^{v−1}_k} − (A + BK)^{η^{v−1}_k}) e_{t^v_k}. (61)

We see that x_k only depends on the noise terms in the time range

S ≜ [ k − (η⁰_k + ··· + η^{v−1}_k) − τ^v_k , k − 1 ]. (62)

To further simplify (58), we consider three complementary cases: 1) τ^i_k < η^i_k, ∀i = 1, ···, v−1, i.e., a sensor's successful transmission occurred between every two consecutive controller's successful transmissions, as illustrated in Fig. 12(a); 2) there exists i such that τ^i_k ≥ η^i_k and there also exists j such that τ^j_k < η^j_k, where i, j ∈ {1, ···, v−1}, i.e., a sensor's successful transmission did not always occur between two consecutive controller's successful transmissions, as illustrated in Fig. 12(b). Note that, from the definitions of τ^j_k and η^j_k, τ^i_k = η^i_k + τ^{i+1}_k if τ^i_k > η^i_k; 3) τ^i_k = η^i_k + τ^{i+1}_k ≥ η^i_k for all i ∈ {1, ···, v−1}, i.e., a sensor's successful transmission never occurred between the first and the v-th controller's successful transmissions prior to the current time slot k, as illustrated in Fig. 12(c).

For case 1), e_{t^i_k} contains the noise terms within time slots t^i_k − τ^i_k to t^i_k − 1. Since τ^i_k < η^i_k = t^i_k − t^{i+1}_k, e_{t^i_k} and e_{t^j_k} do not contain common noise terms when i ≠ j.
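The closed form (57) is a pure linear-algebra identity, so it can be checked numerically by iterating the recursion (56). The random matrices below are arbitrary test data, not a model of the plant:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eta = 3, 5
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, 1))
K = rng.normal(size=(1, n))
Acl = A + B @ K                      # closed-loop matrix (A + BK)
x0 = rng.normal(size=n)              # x_{k - eta}
e0 = rng.normal(size=n)              # estimation error e_{k - eta}
xhat = x0 - e0                       # controller's estimate
W = rng.normal(size=(eta, n))        # w_{k - eta}, ..., w_{k - 1}

# Iterate the recursion (56): x_{j+1} = A x_j + BK (A + BK)^m xhat + w_j.
x = x0.copy()
M = np.eye(n)                        # (A + BK)^m, m = 0, 1, ...
for j in range(eta):
    x = A @ x + B @ K @ (M @ xhat) + W[j]
    M = Acl @ M

# Closed form (57).
Apow = np.linalg.matrix_power
closed = Apow(Acl, eta) @ x0 + (Apow(A, eta) - Apow(Acl, eta)) @ e0 \
    + sum(Apow(A, i - 1) @ W[eta - i] for i in range(1, eta + 1))
```

The two expressions agree to machine precision, which is exactly the substitution argument used between (56) and (57).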
Taking (60) into (58), after some simple manipulations, x_k can be simplified as below, with v segment summations:

x_k = Σ_{i=1}^{η⁰_k+τ¹_k} A^{i−1} w_{k−i} + (A + BK)^{η⁰_k} Σ_{i=τ¹_k+1}^{η¹_k+τ²_k} A^{i−1} w_{t¹_k−i} + ··· + (A + BK)^{η⁰_k + ··· + η^{v−2}_k} Σ_{i=τ^{v−1}_k+1}^{η^{v−1}_k+τ^v_k} A^{i−1} w_{t^{v−1}_k−i}, (63)

where η^j_k > τ^j_k, ∀j = 1, ···, v−1.

For case 2), the estimation-error terms e_{t^i_k} and e_{t^j_k} in (58) may contain common noise terms when i ≠ j, and η^j_k > τ^j_k may not hold for all j = 1, ···, v−1. Inspired by the result (63) in the first case, to calculate x_k, we divide the time range S by the time slots t^j_k − τ^j_k, j = 1, ···, v−1. Since t^{j′}_k − τ^{j′}_k may equal t^j_k − τ^j_k when j′ ≠ j, S is divided into v′ segments from left to right, where 1 ≤ v′ ≤ v.

To investigate the noise terms within the first v′ − 1 segments of S, we assume that sensor's successful transmissions occurred in the time ranges [t^{j′+1}_k + 1, t^{j′}_k] and [t^{j+1}_k + 1, t^j_k], and that there is no sensor's successful transmission in the gap between them, where v ≥ j′ > j ≥ 1. Thus, η^{j′}_k > τ^{j′}_k and η^j_k > τ^j_k. When j′ = j + 1, we have t^{j′}_k − τ^{j′}_k = t^{j+1}_k − τ^{j+1}_k = t^j_k − η^j_k − τ^{j+1}_k, and only w′ and the estimation-error term e_{t^{j′}_k} contain the noise terms within the time segment [t^{j′}_k − τ^{j′}_k, t^j_k − τ^j_k − 1] = [t^j_k − (η^j_k + τ^{j+1}_k), t^j_k − τ^j_k − 1]; therefore, the noise terms in this segment have exactly the same expression as in (63) of case 1), i.e.,

(A + BK)^{η⁰_k + ··· + η^{j−1}_k} Σ_{i=τ^j_k+1}^{η^j_k+τ^{j+1}_k} A^{i−1} w_{t^j_k−i}.
(64)When j (cid:48) > j + 1 , w (cid:48) and the estimation-errorterms e t j +1 k , e t j +2 k , · · · , e t j (cid:48) k contains the noise termswithin the time segment (cid:104) t j (cid:48) k − τ j (cid:48) k , t jk − τ jk − (cid:105) = (cid:104) t jk − ( η jk + τ j +1 k ) , t jk − τ jk − (cid:105) . After combining thenoise terms in this range, we also have the expression (64).To investigate the noise terms of the v (cid:48) th (last) segmentof S , we assume that the most recently successful sensor’stransmission before t k is within the range of (cid:104) t j +1 k + 1 , t jk (cid:105) ,where j ∈ { , · · · , v } . We see that w (cid:48) and the estimation-error terms e t k , · · · , e t jk contains the noise terms withinthe time range (cid:104) t jk − τ jk , k − (cid:105) = (cid:2) k − ( η k + τ k ) , k − (cid:3) .After combining the noise terms in this range contributed by e t k , · · · , e t jk and w (cid:48) , we have exactly the same expressionsas in (63) of case 1), i.e., η k + τ k (cid:88) i =1 A i − w k − i . (65) To sum up, different from (63) of case 1), x k of case 2)has v (cid:48) segment summations, i.e., x k = η k + τ k (cid:88) i =1 A i − w k − i + ( η k > τ k )( A + BK ) η k η k + τ k (cid:88) i = τ k +1 A i − w t k − i + · · · + ( η v − k > τ v − k )( A + BK ) η k + ··· + η v − k η v − k + τ vk (cid:88) i = τ v − k +1 A i − w t v − k − i , (66)where ( · ) is the indicator function and (cid:80) v − j =1 ( η jk > τ jk ) = v (cid:48) − .For case 3), the range S has only one segment, which isa special case of case 2) discussed above (65), where j = v .Therefore, x k has the expression of (65).Therefore, the general expression of x k is given in (66),and thus the state covariance P k = E [ x k x (cid:62) k ] is obtained as(36). A PPENDIX
B: PROOF OF THEOREM 1

We first prove that the policy $\pi'$ in (43) stabilizes the plant. It is easy to verify that the state-transition process induced by $\pi'$ is an ergodic Markov process, i.e., every state in $\mathcal{S}$ is aperiodic and positive recurrent. In the following, we prove that the average cost of the plant induced by $\pi'$ is bounded.

From (43), (18) and (27), we see that the policy $\pi'$ is in fact a persistent scheduling policy, which consecutively schedules the uplink transmission until a transmission is successful and then consecutively schedules the downlink transmission until a transmission is successful, and so on. The transmission process of the (s)ensor's measurement and the (c)ontroller's command is illustrated as

$$\{\cdots,\ \overbrace{\underbrace{s\cdots s}_{m'},\ \underbrace{c\cdots c}_{n'}}^{\text{control cycle }(t-1)},\ \overbrace{\underbrace{s\cdots s}_{m},\ \underbrace{c\cdots c}_{n}}^{\text{control cycle }t},\ \cdots\} \tag{67}$$

where $m$ and $n$ are the numbers of consecutively scheduled uplink and downlink transmissions, respectively.

For ease of analysis, we define the concept of a control cycle, which consists of $M$ consecutive uplink transmissions and the following $N$ consecutive downlink transmissions. It is clear that $M$ and $N$ follow geometric distributions with success probabilities $(1-p_s)$ and $(1-p_c)$, respectively. The values of $M$ and $N$ change independently across control cycles, as illustrated in (67). Thus, the uplink-downlink scheduling process (67) can be treated as a sequence of control cycles.

Let $S$ and $L\triangleq M+N$ denote the sum cost of the plant and the number of transmissions in a control cycle, respectively. It can be proved that the $S$ and $L$ of the sequence of control cycles can be treated as ergodic Markov chains, i.e., $\{\cdots,S_t,S_{t+1},\cdots\}$ and $\{\cdots,L_t,L_{t+1},\cdots\}$, where $t$ is the control-cycle index.
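The control-cycle structure described above is easy to check numerically. The sketch below is not from the paper; the packet-error probabilities $p_s=0.3$ and $p_c=0.2$ and the slot-level Bernoulli channel model are illustrative assumptions. It simulates the persistent schedule slot by slot and compares the empirical means of $M$ and $N$ against the geometric-distribution means $1/(1-p_s)$ and $1/(1-p_c)$.

```python
import random

def simulate_cycles(p_s, p_c, num_cycles, rng):
    """Generate (M, N) pairs by simulating persistent scheduling slot by slot.

    The uplink is scheduled until one success (M slots), then the downlink
    until one success (N slots); a slot fails with probability p_s or p_c.
    """
    cycles = []
    for _ in range(num_cycles):
        m = 1
        while rng.random() < p_s:   # uplink failure -> reschedule the sensor
            m += 1
        n = 1
        while rng.random() < p_c:   # downlink failure -> reschedule the controller
            n += 1
        cycles.append((m, n))
    return cycles

rng = random.Random(0)
cycles = simulate_cycles(p_s=0.3, p_c=0.2, num_cycles=200_000, rng=rng)
mean_m = sum(m for m, _ in cycles) / len(cycles)
mean_n = sum(n for _, n in cycles) / len(cycles)
# Geometric means: E[M] = 1/(1-p_s) ~ 1.4286, E[N] = 1/(1-p_c) = 1.25
print(mean_m, 1 / 0.7)
print(mean_n, 1 / 0.8)
```

Because each slot outcome is independent, successive $(M,N)$ pairs are i.i.d., which is exactly why the schedule can be treated as a sequence of control cycles.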
We use $N'$ to denote the number of consecutive downlink transmissions before the current control cycle, which follows the same distribution as $N$. Due to the ergodicity of $\{S_t\}$ and $\{L_t\}$, the average cost in (2) can be rewritten as

$$J=\lim_{t\to\infty}\frac{S_1+S_2+\cdots+S_t}{L_1+L_2+\cdots+L_t}=\frac{\mathbb{E}[S]}{\mathbb{E}[L]}, \tag{68}$$

where

$$\mathbb{E}[S]=\sum_{n'=1}^{\infty}\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}\mathbb{E}[S\mid N'=n',M=m,N=n]\,\mathbb{P}[N'=n',M=m,N=n], \tag{69}$$

$$\mathbb{E}[L]=\sum_{n'=1}^{\infty}\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}(m+n)\,\mathbb{P}[N'=n',M=m,N=n]. \tag{70}$$

Thus, the average cost $J$ is bounded if $\mathbb{E}[S]$ is. From the policy (43) and the state-transition rules in (18) and (27), we see that $\phi$ is equal to $N'+1$ at the beginning of the control cycle and increases one by one within the control cycle, and we have

$$\mathbb{E}[S\mid N'=n',M=m,N=n]=\sum_{i=1}^{m+n}c(n'+i), \tag{71}$$

and

$$\mathbb{P}[N'=n',M=m,N=n]=\mathbb{P}[N'=n']\,\mathbb{P}[M=m]\,\mathbb{P}[N=n]=(1-p_c)p_c^{n'-1}(1-p_s)p_s^{m-1}(1-p_c)p_c^{n-1}, \tag{72}$$

as $N'$, $M$ and $N$ are mutually independent. Let $p\triangleq\max\{p_s,p_c\}$. We have

$$\mathbb{E}[S]\leq\kappa\sum_{n'=1}^{\infty}\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}\sum_{i=1}^{m+n}c(n'+i)\,p^{n'+m+n} \tag{73}$$
$$<\kappa\sum_{n'=1}^{\infty}\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}(n'+m+n)\,c(n'+m+n)\,p^{n'+m+n} \tag{74}$$
$$<\kappa\sum_{i=1}^{\infty}i^3\,c(i)\,p^i, \tag{75}$$

where $\kappa=(1-p_c)p_c^{-1}(1-p_s)p_s^{-1}(1-p_c)p_c^{-1}$, and (75) is due to the fact that the number of possible partitions of $(n'+m+n)$ into three parts is less than $(n'+m+n)^2$. Since there always exist $p'>p$ and $\bar{n}$ such that $i^3p^i<(p')^i$, $\forall i>\bar{n}$, we have $\sum_{i=1}^{\infty}i^3c(i)p^i<\infty$ if $\sum_{i=1}^{\infty}c(i)(p')^i<\infty$.
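As a sanity check on the renewal-reward identity (68) and the conditional cost (71), the following sketch compares a Monte-Carlo estimate of the long-run average cost under persistent scheduling with a closed form. The geometrically growing per-stage cost $c(i)=\rho_2^i$ is an illustrative assumption (with $\rho_2$ standing in for the growth rate $\rho^2(A)$ of the true cost function), and the chosen $p_s$, $p_c$, $\rho_2$ satisfy $\max\{p_s,p_c\}\,\rho_2<1$ so that the series converge.

```python
import random

# Illustrative parameters (assumptions, not from the paper).
p_s, p_c, rho2 = 0.3, 0.2, 1.21

def c(i):
    return rho2 ** i

def geo_mgf(q, r):
    # E[r^X] for X ~ Geometric(1-q) on {1, 2, ...}; valid when q * r < 1.
    return (1 - q) * r / (1 - q * r)

# Closed form from (69)-(72): with c(i) = r^i and r = rho2,
# S = sum_{i=1}^{M+N} r^{N'+i} = r^{N'} (r^{M+N+1} - r) / (r - 1).
gM, gN = geo_mgf(p_s, rho2), geo_mgf(p_c, rho2)
ES = geo_mgf(p_c, rho2) * (rho2 * gM * gN - rho2) / (rho2 - 1)
EL = 1 / (1 - p_s) + 1 / (1 - p_c)
J_exact = ES / EL

# Monte-Carlo estimate over control cycles.
rng = random.Random(1)
def geometric(q):
    k = 1
    while rng.random() < q:
        k += 1
    return k

total_cost, total_len = 0.0, 0
n_prev = geometric(p_c)                      # n_prev plays the role of N'
for _ in range(100_000):
    m, n = geometric(p_s), geometric(p_c)
    total_cost += sum(c(n_prev + i) for i in range(1, m + n + 1))
    total_len += m + n
    n_prev = n
J_sim = total_cost / total_len
print(J_exact, J_sim)                        # the two should agree closely
```

Since consecutive cycles share only the value of $N'$, and $N'$ is itself an independent geometric draw, the cycle-based average converges to $\mathbb{E}[S]/\mathbb{E}[L]$ exactly as (68) states.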
Using the result that $\sum_{j=1}^{\infty}(p')^jc(j)<\infty$ if and only if $p'\rho^2(A)<1$ in [19] and [14], we have $\sum_{i=1}^{\infty}i^3c(i)p^i<\infty$ if $p\rho^2(A)<1$, completing the proof.

APPENDIX
C: PROOF OF THEOREM 2
A. Sufficiency
Similar to the proof of Theorem 1, we need to define the control cycle of the naive policy and then calculate the average cost.

Different from the proof of Theorem 1, a control cycle is defined as the sequence of time slots after one effective control cycle and until the end of the following effective control cycle. Here, an effective control cycle is the sequence of time slots starting from a sensor's successful transmission and ending at a controller's successful transmission, with no successful transmission in between. In other words, in an effective control cycle, the sensor's measurement at the beginning of the cycle is utilized for generating a control command, which is implemented on the plant by the end of the cycle. The control cycle and the effective control cycle are illustrated as

$$\{\cdots,\ \overbrace{\underbrace{s\check{c}\check{s}c\cdots sc}_{m'},\ \underbrace{\check{s}csc\cdots s\check{c}}_{n'}}^{\text{control cycle }(t-1)},\ \overbrace{\underbrace{scs\check{c}\cdots sc}_{m},\ \underbrace{\check{s}csc\cdots s\check{c}}_{n}}^{\text{control cycle }t},\ \cdots\} \tag{76}$$

where the $n'$-slot and $n$-slot runs are the effective control cycles $(t-1)$ and $t$, $n$ and $l=m+n$ are the numbers of time slots of an effective control cycle and of a control cycle, respectively, and $\check{s}$ and $\check{c}$ denote a successful sensor's transmission and a successful controller's transmission, respectively. Note that $m$ and $n$ are even numbers.

Similar to the proof of Theorem 1, $S$ and $L\triangleq M+N$ denote the sum cost of the plant and the number of transmissions in a control cycle, respectively. Also, the $S$ and $L$ of the sequence of control cycles can be treated as ergodic Markov chains, i.e., $\{\cdots,S_t,S_{t+1},\cdots\}$ and $\{\cdots,L_t,L_{t+1},\cdots\}$, where $t$ is the control-cycle index.
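The effective-control-cycle construction can be checked by simulation. Under the alternating schedule, the marginal distribution in (80) implies $\mathbb{E}[N]=2/(1-p_sp_c)$ and $\mathbb{P}[N=2]=1-p_sp_c$. The sketch below (the values of $p_s$ and $p_c$ are illustrative assumptions, not from the paper) detects effective control cycles directly from their definition: the span from the most recent successful sensor transmission to the controller success that uses it.

```python
import random

# Illustrative error probabilities for the naive alternating schedule
# (sensor in one slot, controller in the next, and so on).
p_s, p_c = 0.4, 0.3
rng = random.Random(2)

lengths = []
last_s = None            # slot index of the most recent sensor success
have_new_data = False    # a measurement arrived since the last cycle ended
for t in range(0, 1_000_000, 2):       # slot t: sensor, slot t + 1: controller
    if rng.random() >= p_s:            # sensor transmission succeeds
        last_s, have_new_data = t, True
    if rng.random() >= p_c and have_new_data:   # controller success ends a cycle
        lengths.append((t + 1) - last_s + 1)    # slots from the s-success to here
        have_new_data = False

mean_n = sum(lengths) / len(lengths)
q = p_s * p_c
print(mean_n, 2 / (1 - q))             # ~2.27 for these parameters
frac2 = sum(1 for n in lengths if n == 2) / len(lengths)
print(frac2, 1 - q)                    # ~0.88 for these parameters
```

Every recorded length is even, as the proof asserts, because a sensor success always sits an odd number of slots before the controller slot that ends the cycle.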
Due to the ergodicity of $\{S_t\}$ and $\{L_t\}$, the average cost in (2) can be rewritten as (68), where

$$\mathbb{E}[S]=\sum_{n'=1}^{\infty}\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}\mathbb{E}[S\mid N'=n',M=m,N=n]\,\mathbb{P}[N'=n',M=m,N=n], \tag{77}$$

$$\mathbb{E}[L]=\sum_{n'=1}^{\infty}\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}l\,\mathbb{P}[N'=n',M=m,N=n], \tag{78}$$

where $M+N$ and $N$ are the lengths of the current control cycle and the current effective control cycle, respectively, and $N'$ is the length of the previous effective control cycle. It is clear that $N'$ is independent of $M$ and $N$. Thus, the average cost $J$ is bounded if $\mathbb{E}[S]$ is. From the naive policy and the definitions of the control cycle and the effective control cycle, we can derive the probability mass function

$$\mathbb{P}[M=m,N=n]=\begin{cases}(1-p_s)(1-p_c)\,p_c^{n/2-1}p_s^{n/2-1},& m=0,\ n=2,4,6,\cdots,\\[4pt](1-p_s)\Big(p_s^{m/2}+(1-p_s)\sum_{i=1}^{m/2}p_s^{i-1}p_c^{m/2-i+1}\Big)(1-p_c)\,p_c^{n/2-1}p_s^{n/2-1},& m,n=2,4,6,\cdots,\end{cases} \tag{79}$$

and thus

$$\mathbb{P}[N'=n']=\mathbb{P}[N=n']=\sum_{m=0}^{\infty}\mathbb{P}[M=m,N=n']=(1-p_sp_c)\,p_c^{n'/2-1}p_s^{n'/2-1},\quad n'=2,4,6,\cdots. \tag{80}$$

Then, it can be proved that

$$\mathbb{P}[N'=n']\leq\kappa_1\,p^{n'},\ \forall n'=2,4,6,\cdots,\qquad \mathbb{P}[M=m,N=n]\leq\kappa_2\,(1+m/2)\,p^{m/2}p^{n},\ \forall m=0,2,4,\cdots,\ n=2,4,6,\cdots, \tag{81}$$

where $p=\max\{p_s,p_c\}$, $\kappa_1=(1-p_sp_c)p_c^{-1}p_s^{-1}$, and $\kappa_2=(1-p_s)(1-p_c)p_c^{-1}p_s^{-1}$.

Since $\mathbb{E}[S\mid N'=n',M=m,N=n]=\sum_{i=1}^{m+n}c(n'+i)$, we have

$$\mathbb{E}[S]=\sum_{n'=1}^{\infty}\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}\mathbb{E}[S\mid N'=n',M=m,N=n]\,\mathbb{P}[N'=n']\,\mathbb{P}[M=m,N=n] \tag{82}$$
$$\leq\kappa_1\kappa_2\sum_{n'=1}^{\infty}\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}\sum_{i=1}^{m+n}c(n'+i)\,(1+m/2)\,p^{n'+m/2+n} \tag{83}$$
$$<\kappa_1\kappa_2\sum_{n'=1}^{\infty}\sum_{m=0}^{\infty}\sum_{n=1}^{\infty}(n'+m+n)^2\,c(n'+m+n)\,\big(\sqrt{p}\big)^{n'+m+n} \tag{84}$$
$$<16\,\kappa_1\kappa_2\sum_{i=2}^{\infty}i^4\,c(2i)\,p^i, \tag{85}$$

where (84) uses $p^{n'+m/2+n}\leq(\sqrt{p})^{n'+m+n}$, and (85) is due to the facts that $n'+m+n$ is even, say $n'+m+n=2i$ with $(\sqrt{p})^{2i}=p^i$, and that the number of possible partitions of $2i$ into three parts is less than $(2i)^2$. Since there always exist $p'>p$ and $\bar{n}$ such that $i^4p^i<(p')^i$, $\forall i>\bar{n}$, we have $\sum_{i=2}^{\infty}i^4c(2i)p^i<\infty$ if $\sum_{i=2}^{\infty}c(2i)(p')^i<\infty$. Also, we have $\sum_{i=2}^{\infty}c(2i)(p')^i<\sum_{i=1}^{\infty}c(i)\big(\sqrt{p'}\big)^i$. Using the result that $\sum_{j=1}^{\infty}\big(\sqrt{p'}\big)^jc(j)<\infty$ if and only if $\sqrt{p'}\rho^2(A)<1$ in [19] and [14], $\sum_{i=2}^{\infty}i^4c(2i)p^i<\infty$ if $\sqrt{p}\rho^2(A)<1$, completing the proof of sufficiency.

B. Necessity
To prove the necessity, we consider two ideal cases: the sensor's transmission is perfect, i.e., $p_s=0$, and the controller's transmission is perfect, i.e., $p_c=0$. The stability conditions in these cases are necessary conditions for the plant to be stabilizable by the naive policy. The proof requires the analysis of the average cost of control cycles, which follows steps similar to those in the proof of sufficiency. Since the average cost is $J=\mathbb{E}[S]/\mathbb{E}[L]$ and $\mathbb{E}[L]$ is straightforwardly bounded, we only need to derive the necessary condition for the average sum cost of a control cycle, $\mathbb{E}[S]$, to be bounded.

In the ideal cases, we have

$$\mathbb{E}[S]=\begin{cases}(1-p_c)\sum_{j=1}^{\infty}\sum_{i=2}^{j+1}c(i)\,p_c^{j-1},& p_s=0,\\[4pt](1-p_s)\sum_{j=1}^{\infty}\sum_{i=2}^{j+1}c(i)\,p_s^{j-1},& p_c=0.\end{cases} \tag{86}$$

Therefore, if $\mathbb{E}[S]$ is bounded, we have

$$\sum_{i=1}^{\infty}c(2i)\,p_c^i<\infty,\quad \sum_{i=1}^{\infty}c(2i+1)\,p_c^i<\infty,\quad \sum_{i=1}^{\infty}c(2i)\,p_s^i<\infty,\quad \sum_{i=1}^{\infty}c(2i+1)\,p_s^i<\infty, \tag{87}$$

and hence

$$\sum_{i=1}^{\infty}c(i)\,\sqrt{p_c}^{\,i}<\infty,\qquad \sum_{i=1}^{\infty}c(i)\,\sqrt{p_s}^{\,i}<\infty. \tag{88}$$

Using the result that $\sum_{j=1}^{\infty}\sqrt{p}^{\,j}c(j)<\infty$ if and only if $\sqrt{p}\rho^2(A)<1$ in [19] and [14], the necessary condition for the average cost induced by the naive policy to be bounded is $\sqrt{p_s}\rho^2(A)<1$ and $\sqrt{p_c}\rho^2(A)<1$, completing the proof of necessity.

APPENDIX
D: PROOF OF THEOREM 3
A. Sufficiency
We construct a persistent-scheduling-like policy consisting of three phases: 1) the sensor's transmission is consecutively scheduled until it is successful; then 2) the controller's transmission is consecutively scheduled until a successful transmission; then 3) neither the sensor nor the controller is scheduled for transmission in the following $v-1$ time slots, i.e., all the commands contained in the successfully received control packet are implemented by the actuator; and then phase 1) begins again, and so on.

Then, following steps similar to those in the proof of Theorem 1, it can be proved that this persistent-scheduling-like policy stabilizes the plant if (42) holds.

B. Necessity
The proof is conducted by considering two virtual cases: 1) the sensor's transmission is continuously scheduled, while a virtual control input $u_k$ is applied at each time slot that ideally resets $x_k$ to $0$ if the sensor's transmission is successful at $k$, and is $0$ otherwise; 2) the controller's transmission is continuously scheduled, while the controller applies a virtual estimator that has perfect estimation of the plant states in each time slot.

It can be readily proved that the two virtual cases result in lower average costs than any feasible uplink-downlink scheduling policy. Then, following steps similar to those in the proofs of Theorems 1 and 2, it can be shown that if the average cost of case 1) is bounded, $p_s<1/\rho^2(A)$ must be satisfied, and if the average cost of case 2) is bounded, $p_c<1/\rho^2(A)$ must be satisfied, completing the proof of necessity.

REFERENCES

[1] K. Huang, W. Liu, Y. Li, and B. Vucetic, "To sense or to control: Wireless networked control using a half-duplex controller for IIoT," accepted by Proc. IEEE Globecom 2019.
[2] P. Park, S. C. Ergen, C. Fischione, C. Lu, and K. H. Johansson, "Wireless network design for control systems: A survey,"
IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 978–1013, Second Quarter 2018.
[3] C. Perera, C. H. Liu, and S. Jayawardena, "The emerging Internet of Things marketplace from an industrial perspective: A survey," IEEE Trans. Emerg. Topics Comput., vol. 3, no. 4, pp. 585–598, Jan. 2015.
[4] M. Wollschlaeger, T. Sauter, and J. Jasperneite, "The future of industrial communication: Automation networks in the era of the Internet of Things and Industry 4.0," IEEE Ind. Electron. Mag., vol. 11, no. 1, pp. 17–27, Mar. 2017.
[5] P. Schulz, M. Matthe, H. Klessig, M. Simsek, G. Fettweis, J. Ansari, S. A. Ashraf, B. Almeroth, J. Voigt, I. Riedel, A. Puschmann, A. Mitschele-Thiel, M. Muller, T. Elste, and M. Windisch, "Latency critical IoT applications in 5G: Perspective on the design of radio interface and network architecture," IEEE Commun. Mag., vol. 55, no. 2, pp. 70–78, Feb. 2017.
[6] O. Bello and S. Zeadally, "Intelligent device-to-device communication in the Internet of Things," IEEE Syst. J., vol. 10, no. 3, pp. 1172–1182, Jan. 2014.
[7] D. Zhang, P. Shi, Q.-G. Wang, and L. Yu, "Analysis and synthesis of networked control systems: A survey of recent advances and challenges," ISA Trans., vol. 66, pp. 376–392, Jan. 2017.
[8] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, "Edge computing: Vision and challenges," IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Jun. 2016.
[9] L. Lyu, C. Chen, S. Zhu, N. Cheng, B. Yang, and X. Guan, "Control performance aware cooperative transmission in multiloop wireless control systems for industrial IoT applications," IEEE Internet Things J., vol. 5, no. 5, pp. 3954–3966, Sep. 2018.
[10] P. Gil, A. Santos, and A. Cardoso, "Dealing with outliers in wireless sensor networks: An oil refinery application," IEEE Trans. Control Syst. Technol., vol. 22, no. 4, pp. 1589–1596, Nov. 2013.
[11] Y. Wang, S. X. Ding, D. Xu, and B. Shen, "An H-infinity fault estimation scheme of wireless networked control systems for industrial real-time applications," IEEE Trans. Control Syst. Technol., vol. 22, no. 6, pp. 2073–2086, Apr. 2014.
[12] V. Liberatore and A. Al-Hammouri, "Smart grid communication and co-simulation," in Proc. IEEE EnergyTech, May 2011, pp. 1–5.
[13] R. Hult, G. R. Campos, E. Steinmetz, L. Hammarstrand, P. Falcone, and H. Wymeersch, "Coordination of cooperative autonomous vehicles: Toward safer and more efficient road transportation," IEEE Signal Process. Mag., vol. 33, no. 6, pp. 74–84, Nov. 2016.
[14] L. Schenato, "Optimal estimation in networked control systems subject to random delay and packet drop," IEEE Trans. Autom. Control, vol. 53, no. 5, pp. 1311–1317, Jun. 2008.
[15] C. Yang, J. Wu, X. Ren, W. Yang, H. Shi, and L. Shi, "Deterministic sensor selection for centralized state estimation under limited communication resource," IEEE Trans. Signal Process., vol. 63, no. 9, pp. 2336–2348, May 2015.
[16] G.-P. Liu, Y. Xia, J. Chen, D. Rees, and W. Hu, "Networked predictive control of systems with random network delays in both forward and feedback channels," IEEE Trans. Ind. Electron., vol. 54, no. 3, pp. 1282–1297, Jun. 2007.
[17] B. Demirel, V. Gupta, D. E. Quevedo, and M. Johansson, "On the trade-off between communication and control cost in event-triggered dead-beat control," IEEE Trans. Autom. Control, vol. 62, no. 6, pp. 2973–2980, Jun. 2017.
[18] P. K. Mishra, D. Chatterjee, and D. E. Quevedo, "Stabilizing stochastic predictive control under Bernoulli dropouts," IEEE Trans. Autom. Control, vol. 63, no. 6, pp. 1579–1590, Jun. 2018.
[19] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, "Foundations of control and estimation over lossy networks," Proc. IEEE, vol. 95, no. 1, pp. 163–187, Jan. 2007.
[20] Z. Shen, A. Khoryaev, E. Eriksson, and X. Pan, "Dynamic uplink-downlink configuration and interference management in TD-LTE," IEEE Commun. Mag., vol. 50, no. 11, pp. 51–59, Nov. 2012.
[21] F. Boccardi, J. Andrews, H. Elshaer, M. Dohler, S. Parkvall, P. Popovski, and S. Singh, "Why to decouple the uplink and downlink in cellular networks and how to do it," IEEE Commun. Mag., vol. 54, no. 3, pp. 110–117, Mar. 2016.
[22] J. Yang and S. Ulukus, "Optimal packet scheduling in an energy harvesting communication system," IEEE Trans. Commun., vol. 60, no. 1, pp. 220–230, Jan. 2012.
[23] A. Sabharwal, P. Schniter, D. Guo, D. W. Bliss, S. Rangarajan, and R. Wichman, "In-band full-duplex wireless: Challenges and opportunities," IEEE J. Sel. Areas Commun., vol. 32, no. 9, pp. 1637–1652, Sep. 2014.
[24] K. Gatsis, A. Ribeiro, and G. J. Pappas, "Optimal power management in wireless control systems," IEEE Trans. Autom. Control, vol. 59, no. 6, pp. 1495–1510, Jun. 2014.
[25] P. Sadeghi, R. A. Kennedy, P. B. Rapajic, and R. Shams, "Finite-state Markov modeling of fading channels - a survey of principles and applications," IEEE Signal Process. Mag., vol. 25, no. 5, pp. 57–80, Sep. 2008.
[26] V. Gupta, B. Sinopoli, S. Adlakha, A. Goldsmith, and R. Murray, "Receding horizon networked control," in Proc. Allerton Conf. Commun., Control, Comput., 2006.
[27] K. Gatsis, M. Pajic, A. Ribeiro, and G. J. Pappas, "Opportunistic control over shared wireless channels," IEEE Trans. Autom. Control, vol. 60, no. 12, pp. 3140–3155, Dec. 2015.
[28] J. O'Reilly, "The discrete linear time invariant time-optimal control problem - an overview," Automatica, vol. 17, no. 2, pp. 363–370, 1981.
[29] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
[30] K. Huang, W. Liu, Y. Li, and B. Vucetic, "To retransmit or not: Real-time remote estimation in wireless networked control," in Proc. IEEE ICC, 2019.
[31] K. Huang, W. Liu, M. Shirvanimoghaddam, Y. Li, and B. Vucetic, "Real-time remote estimation with hybrid ARQ in wireless networked control," submitted to IEEE Trans. Wireless Commun., 2019. [Online]. Available: https://arxiv.org/pdf/1903.12472.pdf
[32] M. L. Littman, T. L. Dean, and L. P. Kaelbling, "On the complexity of solving Markov decision problems," in Proc. Association for Uncertainty in AI (AUAI), 1995, pp. 394–402.
[33] Grain Storage, Grain Research and Development Corporation, 2017. [Online]. Available: https://grdc.com.au/
[34]