Closed-Form Delay-Optimal Power Control for Energy Harvesting Wireless System with Finite Energy Storage
11 Closed-Form Delay-Optimal Power Controlfor Energy Harvesting Wireless Systemwith Finite Energy Storage
Fan Zhang,
Student Member, IEEE , Vincent K. N. Lau,
Fellow, IEEE
Department of ECE, Hong Kong University of Science and Technology, Hong KongEmail: [email protected], [email protected]
Abstract —In this paper, we consider delay-optimal powercontrol for an energy harvesting wireless system with finite energystorage. The wireless system is powered solely by a renewableenergy source with bursty data arrivals, and is characterizedby a data queue and an energy queue . We consider a delay-optimal power control problem and formulate an infinite horizonaverage cost Markov Decision Process (MDP). To deal with thecurse of dimensionality, we introduce a virtual continuous timesystem and derive closed-form approximate priority functions forthe discrete time MDP at various operating regimes. Based onthe approximation, we obtain an online power control solutionwhich is adaptive to the channel state information as well asthe data and energy queue state information. The derived powercontrol solution has a multi-level water-filling structure, wherethe water level is determined jointly by the data and energyqueue lengths. We show through simulations that the proposedscheme has significant performance gain compared with variousbaselines.
I. I
NTRODUCTION
Recently, green communication has received considerableattention since it will play an important role in enhancingenergy efficiency and reducing carbon emissions in futurewireless networks [1], [2]. To support green communication, energy harvesting techniques such as solar panels, wind tur-bines and thermoelectric generators [3] have become popularfor enabling the transmission nodes to harvest energy fromthe environment. While the renewable energy sources mayappear to be virtually free and they are random in nature,energy storage is needed to buffer the unstable supply ofthe renewable energy [4]. In [5] and [6], the authors proposetransmission policies that minimize the transmission time fora given amount of data in point-to-point and broadcast energyharvesting networks with an infinite capacity battery. However,the infinity capacity battery assumption is not realistic inpractice. In [7] and [8], offline power allocation policiesare proposed by solving short-term throughput maximizationproblems under finite energy storage capacity in a finitetime horizon. However, the above works [5]–[8] assume thatthe realizations of the energy arrival processes are knownin advance (i.e., non-causal knowledge of future arrivals).Furthermore, the above proposed policies [5]–[8] are basedon the assumption that there are infinite data backlogs at thetransmitters so that the applications are delay-insensitive. Inpractice, it is very important to consider bursty data arrivals,bursty energy arrivals and delay requirements in designing thepower control policy for delay-sensitive applications. In this paper, we are interested in the online power controlsolution in a wireless system powered by a renewable energysource to support real-time delay-sensitive applications. Thewireless transmitter is powered solely by an energy harvestingstorage with limited energy storage capacity. Unlike the previ-ous proposed schemes [5]–[8], we consider an online controlpolicy, in the sense that we only have causal knowledge of thesystem states. Specifically, to support real-time applicationswith bursty data arrivals and bursty renewable energy arrivals,it is very important to dynamically control the transmit powerthat is adaptive to the channel state information (CSI), the dataqueue length (DQSI) and the energy queue length (EQSI). TheCSI reveals the transmission opportunities of the time-varyingphysical channels. The DQSI reveals the urgency of the dataflows and the EQSI reveals the availability of the renewableenergy . It is highly non-trivial to strike a good balance betweenthese factors.Online power control adaptive to the CSI, the DQSI andthe EQSI is quite challenging because the associated opti-mization problem belongs to an infinite-dimensional stochasticoptimization problem. There is intense research interest inexploiting renewable energy in communication network de-signs. In [9], the authors use large deviations theory to findthe closed-form expression for the buffer overflow probabilityand design an energy-efficient scheme for maximizing thedecay exponent of this probability. In [10], the authors proposethroughput-optimal control policies (in the stability sense) thatare adaptive to the CSI, the DQSI and the EQSI for a point-to-point energy harvesting network. In [11] and [12], the authorsextend the Lyapunov optimization framework to derive energymanagement algorithms, which can stabilize the data queue forenergy harvesting networks with finite energy storage capac-ity. Note that the buffer overflow probability and the queuestability are weak forms of delay performance, and it is ofgreat importance to study the control policies that minimize theaverage delay of the queueing network. A systematic approachin dealing with the delay-optimal control is to formulate theproblem into an Markov Decision Process (MDP) [13], [14].In [15], the authors propose several heuristic event-basedadaptive transmission policies on the basis of a finite horizonMDP formulation. These solutions are suboptimal and withno performance guarantee. In [4], the authors consider onlinepower control for the interference network with a renewableenergy supply by solving an infinite horizon average cost a r X i v : . [ c s . I T ] A ug MDP. The authors in [10] also propose an online delay-optimalpower control policy by solving an infinite horizon MDP forenergy harvesting networks. However, the MDP problems in[4] and [10] are solved using numerical iteration algorithms,such as value iteration or policy iteration algorithms (
Chap.4 in Vol. 2 of [14]), which suffer from slow convergence anda lack of insight. There are some existing works that adoptMDP/POMDP approaches to solve the stochastic resourceallocation problems for energy harvesting wireless sensornetworks [16]–[24]. In [16], the authors consider a simplebirth-death model for the energy queue dynamics and obtaina threshold-like data transmission scheme by maximizing anaverage data rate reward using the MDP approach. However,the energy queue model considered in [16] is a simplifiedmodel and the approach therein cannot be applied in ourscenario with general energy queue dynamics. In [17]–[19],the authors propose an efficient power control scheme tominimize the average power consumption and the packet errorrate. In [20] and [21], the authors consider on-off control of thesensor to maximize the event detection efficiency or maximizethe discounted weighted sum transmitted data. However, thepower control actions in [17]–[21] are chosen from discreteand finite action spaces. Hence, the approaches in [17]–[21]cannot be applied to our scenario where the power controlaction is chosen from a continuous action space. In [22]and [23], the authors propose a power allocation schemefor an energy harvesting sensor network with finite energybuffer capacity by solving a POMDP problem. However, theyconsider non-causal control, which means that the realizationsof the energy arrival processes are known in advance. In[24], the authors consider general energy queue model withonline causal power control schemes. However, the stochasticMDP/POMDP problems in [16]–[24] are solved using nu-merical value iteration or policy iteration algorithms [14]. Inthis paper, we focus on deriving a closed-form delay-optimalonline power control solution that is adaptive to the CSI, theDQSI and the EQSI. There are several first order technicalchallenges associated with the stochastic optimization. • Challenges due to the Queue-Dependent Control:
Inorder to maintain low average delay performance andefficiently use the renewable energy in a finite capac-ity storage, it is important to dynamically control thetransmit power based on the CSI, the DQSI and theEQSI. As a result, the underlying problem embracesboth information theory (to model the physical layerdynamics) and the queueing theory (to model the dataand energy queue dynamics) and is an infinite horizonstochastic optimization [13], [14]. Such problems arewell-known to be very challenging due to the infinite-dimensional optimization (w.r.t. control policy) and lackof closed-form characterization of the value function inthe optimality equation (i.e., the Bellman equation). • Complex Coupling between the Data Queue and theEnergy Queue:
The service rate of the transmitter inthe energy harvesting network depends on the currentavailable energy stored in the energy queue buffer. Assuch, the dynamics of the data queue and the energyqueue are coupled together. The associated stochastic
Receiver noise n(t)CSI Transmitter (t)(t) data queueenergy queue R(t) cross-layer controller EQSI DQSI power control actionSolar panelReal-time application data power, rate, modulation scheme
MAC layer PHY layer fading channel h(t)
Fig. 1: System model of the point-to-point energy harvesting system. optimization problem is a multi-dimensional MDP [25].To solve the associated Bellman equation, numericalbrute-force approaches (e.g., value iteration and policyiteration [14]) can be adopted, but they are not practicaland provide no design insights. Therefore, it is desirableto obtain a low complexity and insightful solution for thedynamic power control in the energy harvesting system. • Challenges due to the Finite Energy Storage and Non-i.i.d. Energy Arrivals:
In practice, the energy storage (orbattery) at the transmitter has finite capacity only. Thefinite renewable energy storage limit induces a difficult energy availability constraint (the energy consumptionper time slot cannot exceed the available energy in thestorage) in the stochastic optimization problem. Further-more, in the previous literature (e.g., [4]–[9]), the burstyenergy arrivals are modeled as an i.i.d. process for analyt-ical tractability. In [10], the authors also consider periodicstationary energy arrivals for designing the power controlpolicies. In practice, most of the renewable energy arrivalsare not i.i.d.. Such non-iid nature will have a huge impacton the dimensioning of the battery capacity.In this paper, we model the delay-optimal power controlproblem as an infinite horizon average cost MDP. Specifically,the stochastic MDP problem is to minimize the average delayof the transmitter subject to the energy availability constraint.By exploiting the special structure in our problem, we derivean equivalent Bellman equation to solve the MDP. We thenintroduce a virtual continuous time system (VCTS) where theevolutions of the data and energy queues are characterizedby two coupled differential equations with reflections. Weshow that the priority function of the associated total costproblem in the VCTS is asymptotically optimal to that ofthe discrete time MDP problem when the slot duration issufficiently small. Using the priority function in the VCTS asan approximation to the optimal priority function, we deriveonline power control solutions and obtain design insights fromthe structural properties of the priority function under differentasymptotic regimes. The power control solution has a multi-level water-filling structure, where the DQSI and the EQSIdetermine the water level via the priority function. Finally,we compare the proposed solution with various baselines andshow that significant performance gain can be achieved.II. S
YSTEM M ODEL
We consider a point-to-point energy harvesting system withfinite energy storage. Fig. 1 illustrates the top-level systemmodel, where the transmitter is powered solely by the energyharvesting storage with limited energy storage capacity. The transmitter acts as a cross-layer controller , which takes theCSI, the DQSI and the EQSI as input and generates powercontrol action as output. In this paper, the time dimension ispartitioned into decision slot indexed by n ( n = 0 , , , . . . )with duration τ . In the following subsections, we elaborate onthe physical layer model and the bursty data arrival model, aswell as the renewable energy arrival model. A. Physical Layer Model
We consider a point-to-point system as shown in Fig. 1.The transmitter sends information to the receiver. Let s bethe transmitted information symbol and the received signal isgiven by y = h √ ps + z (1)where h ∈ C is the complex channel fading coefficientbetween the transmitter and the receiver, p is the transmitpower, and z ∼ CN (0 , is the i.i.d. complex Gaussianadditive channel noise. We have the following assumption onthe channel model. Assumption 1 (Channel Model): h ( n ) remains constantwithin each decision slot and is i.i.d. over the slots. Specif-ically, we assume that h ( n ) follows a complex Gaussiandistribution with zero mean and unit variance, i.e., h ( n ) ∼CN (0 , .For given CSI realization h and power control action p , theachievable data rate (bit/s/Hz) for the transmitter-receiver pairis given by R ( h, p ) = log (cid:16) ζp | h | (cid:17) (2)where ζ ∈ (0 , is a constant that is determined by themodulation and coding scheme (MCS) used in the system.For example, ζ = 0 . for QAM constellation at BER= 1%[26] and ζ = 1 for capacity-achieving coding (in which (2)corresponds to the instantaneous mutual information). In thispaper, our derived results are based on ζ = 1 for simplicity,which can be easily extended to other MCS cases. B. Bursty Data Source Model and Data Queue Dynamics
As illustrated in Fig. 1, the transmitter maintains a dataqueue for the bursty traffic flow towards the receiver. Let λ ( n ) τ be the random new data arrival (bits) at the end of the n -th decision slot at the transmitter. We have the followingassumption on the data arrival process. Assumption 2 (Bursty Data Source Model):
The data ar-rival process λ ( n ) is i.i.d. over the slots according to a generaldistribution Pr[ λ ] with finite average arrival rate E [ λ ] = λ .Let Q ( n ) ∈ Q denote the DQSI (bits) at the data queue ofthe transmitter at the beginning of the n -th slot, where Q =[0 , ∞ ) is the DQSI state space. We assume that the transmitteris causal in the sense that new data arrivals are observed afterthe control actions are performed at each decision slot. Hence,the data queue dynamics is given by Q ( n + 1) = [ Q ( n ) − R ( h ( n ) , p ( n )) τ ] + + λ ( n ) τ (3)where x + (cid:44) max { x, } . Note that p ( n ) is transmit power ofthe transmitter at time slot n and the power solely comes fromthe renewable energy source. We shall define the renewableenergy source model in the next subsection. C. Renewable Energy Source Model and Energy Queue Dy-namics
The power of the transmitter solely comes from the re-newable energy source. Specifically, the transmitter is capableof harvesting energy from the environment, e.g., using solarpanels, wind turbines and thermoelectric generators [3]. Weassume that the energy arrival process is block i.i.d. withblock size N . The block i.i.d. energy arrival model is usedto take into account that the energy arrival process evolves ata different timescale w.r.t. that of the data arrival process. Let α ( n ) τ be the random renewable energy arrival (Joules) at theend of the n -th decision slot at the transmitter. We have thefollowing assumption on the energy arrival process. Assumption 3 (Block i.i.d. Renewable Energy Source Model):
The energy arrival process α ( n ) is block i.i.d. in the sensethat α ( n ) is constant for a block of N slots and is i.i.d.between blocks according to a general distribution Pr[ α ] withfinite average energy arrival rate E [ α ] = α .Due to the random nature of the renewable energy, thereis limited energy storage capacity at the transmitter to bufferthe renewable energy arrivals. Let E ( n ) ∈ E denote the EQSI(Joules) at the beginning of the n -th slot, where E = [0 , N E ] isthe EQSI state space and N E denotes the energy queue buffersize (i.e., energy storage capacity in Joules). Remark 1 (Discussions on the Finite Energy Queue Capacity):
High-capacity renewable energy storage is very expensive[27] and energy storage is one key cost component inrenewable energy systems. As such, it is very important toconsider the impact on how the finite renewable energy bufferaffects the system performance. The analysis also serves asthe first order dimensioning on how large an energy buffer isneeded.Note that when the energy buffer is full, i.e., E ( n ) = N E ,additional energy cannot be harvested. Similarly, we assumethat the transmitter is causal so that the renewable energyarrival E ( n ) is observed only after the power actions. Hence,the energy queue dynamics at the transmitter is given by E ( n + 1) = min (cid:8) E ( n ) − p ( n ) τ + α ( n ) τ, N E (cid:9) (4)where the renewable power consumption p ( n ) must satisfythe following energy availability constraint : p ( n ) τ ≤ E ( n ) (5)The energy availability constraint means that the energy con-sumption at each time slot cannot exceed the current availableenergy in the energy storage. Due to this constraint, the energyqueue E ( n ) in (4) will not go below zero (i.e., E ( n ) ≥ forall n ). Remark 2: (Coupling Property of Data Queue and EnergyQueue)
The data queue dynamics in (3) and the energy queuedynamics in (4) are coupled together. Specifically, the servicerate R ( n ) in the data queue depends on the power controlaction p ( n ) , which solely comes from the energy queuebuffer. Specifically, α ( n ) is constant when kN ≤ t < ( k + 1) N for any given t , where k is a positive integer. III. D
ELAY -O PTIMAL P ROBLEM F ORMULATION
In this section, we formally define the power control policyand formulate the delay-optimal control problem for the point-to-point energy harvesting system.
A. Power Control Policy
For notation convenience, denote χ ( n ) =( h ( n ) , Q ( n ) , E ( n )) . Let F ( n ) = σ (cid:0)(cid:8) χ ( i ) : 0 ≤ i ≤ n (cid:9)(cid:1) bethe minimal σ -algebra containing the set (cid:8) χ ( i ) : 0 ≤ i ≤ n (cid:9) ,and (cid:8) F ( n ) (cid:9) be the associated filtration [28]. At the beginningof the n -th slot, the transmitter determines the power controlaction based on the following control policy: Definition 1 (Power Control Policy):
A power control pol-icy for the transmitter Ω is F ( n ) -adapted at each time slot n ,meaning that the power control action p ( n ) is adaptive to allthe information χ ( i ) up to tome n (i.e., (cid:8) χ ( i ) : 0 ≤ i ≤ n (cid:9) ).Furthermore, the power control policy Ω satisfies the energyavailability constraint in (5), i.e., p ( n ) τ ≤ E ( n ) ( ∀ n ).Given a control policy Ω , the random process { χ ( n ) } is a controlled Markov chain with the following transitionprobability: Pr (cid:2) χ ( n + 1) (cid:12)(cid:12) χ ( n ) , Ω (cid:0) χ ( n ) (cid:1)(cid:3) = Pr (cid:2) h ( n + 1) (cid:3) Pr (cid:2) Q ( n + 1) (cid:12)(cid:12) Q ( n ) , h ( n ) , Ω (cid:0) χ ( n ) (cid:1)(cid:3) · Pr (cid:2) E ( n + 1) (cid:12)(cid:12) E ( n ) , Ω (cid:0) χ ( n ) (cid:1)(cid:3) (6)where Pr (cid:2) Q ( n + 1) (cid:12)(cid:12) Q ( n ) , h ( n ) , Ω (cid:0) χ ( n ) (cid:1)(cid:3) is the dataqueue transition probability which is given by Pr (cid:2) Q ( n + 1) = Q (cid:48) (cid:12)(cid:12) Q ( n ) = Q, h ( n ) = h, Ω (cid:0) χ ( n ) (cid:1) = p (cid:3) = (cid:40) Pr [ λ ( n )] , if Q (cid:48) = [ Q − R ( h, p ) τ ] + + λ ( n ) τ , otherwise (7)and Pr (cid:2) E ( n + 1) (cid:12)(cid:12) E ( n ) , Ω (cid:0) χ ( n ) (cid:1)(cid:3) is the energy queuetransition probability which is given by Pr (cid:2) E ( n + 1) = E (cid:48) (cid:12)(cid:12) E ( n ) = E, Ω (cid:0) χ ( n ) (cid:1) = p (cid:3) = (cid:40) Pr [ α ( n )] , if E (cid:48) = min (cid:8) E − pτ + α ( n ) τ, N E (cid:9) , otherwise (8)Furthermore, we have the following definition on the admis-sible control policy: Definition 2 (Admissible Control Policy):
A policy Ω is ad-missible if the following requirements are satisfied: • Ω is a unichain policy, i.e., the controlled Markov chain { χ ( n ) } under Ω has a single recurrent class (and possiblysome transient states) [14]. • The queueing system under Ω is stable in the sensethat lim n →∞ E Ω (cid:2) Q ( n ) (cid:3) < ∞ , where E Ω means takingexpectation w.r.t. the probability measure induced by thecontrol policy Ω . B. Problem Formulation
As a result, under an admissible control policy Ω , theaverage delay cost of the energy harvesting system startingfrom a given initial state χ (0) is given by D (Ω) = lim sup N →∞ N N − (cid:88) n =0 E Ω (cid:20) Q ( n ) λ (cid:21) (9)We consider the following delay-optimal power controloptimization for the energy harvesting system: Problem 1 (Delay-Optimal Power Control Optimization): min Ω D (Ω) (10)where Ω satisfies the energy availability constraint accordingto Definition 1. C. Optimality Conditions
While the MDP in Problem 1 is difficult in general, weutilize the i.i.d. assumption of the CSI to derive an equivalentoptimality equation as summarized below.
Theorem 1 (Sufficient Conditions for Optimality):
Assumethere exists a ( θ ∗ , { V ∗ ( Q, E ) } ) that solves the following equivalent optimality equation : θ ∗ + V ∗ ( Q, E ) ∀ Q, E (11) = E (cid:20) min p Furthermore,align all admissible control policy Ω and initialqueue state ( Q (0) , E (0)) , V ∗ satisfies the following transver-sality condition : lim N →∞ N E Ω [ V ∗ ( Q ( N ) , E ( N )) | Q (0) , E (0)] = 0 (12)Then, we have the following results: • θ ∗ = min Ω D (Ω) is the optimal average cost for any initialstate χ (0) and V ∗ ( Q, E ) is called the priority function . • Suppose there exists an admissible stationary controlpolicy Ω ∗ with Ω ∗ ( χ ) = p ∗ for any χ , where p ∗ attainsthe minimum of the R.H.S. of (11) for given χ . Then, theoptimal control policy of Problem 1 is given by Ω ∗ . Proof: please refer to Appendix A.Based on the unichain assumption of the control policyin Definition 2, there is a unique solution for the Bellmanequation in (11) and the transversality condition in (12). Thesolution V ∗ ( Q, E ) captures the dynamic priority of the dataflow for different ( Q, E ) . However, obtaining the priorityfunction V ∗ ( Q, E ) is highly non-trivial as it involves solvinga large system of nonlinear fixed point equations. Brute-forceapproaches (such as value iteration and policy iteration [14]))have huge complexity. Challenge 1: Huge complexity in obtaining the priorityfunction V ∗ ( Q, E ) . IV. V IRTUAL C ONTINUOUS T IME S YSTEM AND A PPROXIMATE P RIORITY F UNCTION In this section, we adopt a continuous time approach so thatwe can exploit calculus techniques and theories of differentialequations to obtain a closed-form approximate priority func-tion. Specifically, we first reverse-engineer a virtual continuoustime system (VCTS) and an associated total cost problem inthe VCTS. We show that the optimality conditions of theVCTS is equivalent to that of the original MDP (up to o ( τ ) order optimal). Based on that, we exploit calculus techniquesand theories of differential equations to obtain a closed-formcharacterization of the priority function V ∗ ( Q, E ) . A. Virtual Continuous Time System We first define the VCTS, which is a fictitious system witha continuous virtual queue state ( q ( t ) , e ( t )) , where q ( t ) ∈ [0 , ∞ ) and e ( t ) ∈ [0 , N E ) are the virtual data queue state andvirtual energy queue state at time t ( t ∈ [0 , ∞ ) ).Let Ω v be the virtual power control policy of theVCTS. Similarly, Ω v is F vt -adapted, where F vt = σ (cid:0)(cid:8) h ( s ) , q ( s ) , e ( s ) : 0 < s < t (cid:9)(cid:1) and (cid:8) F vt (cid:9) is the filtrationof the VCTS. Furthermore, the virtual power control policy Ω v satisfies the virtual energy availability constraint, i.e., p ( t ) τ ≤ e ( t ) ( ∀ t ). Given an initial virtual system state ( q , e ) and a virtual policy Ω v , the trajectory of the virtual queueingsystem is described by the following coupled differentialequations with reflections: d q ( t ) = (cid:0) − E (cid:2) R ( h ( t ) , p ( t )) (cid:12)(cid:12) q ( t ) , e ( t ) (cid:3) + λ (cid:1) τ d t + d L ( t ) (13) d e ( t ) = (cid:0) − E (cid:2) p ( t ) (cid:12)(cid:12) q ( t ) , e ( t ) (cid:3) + α (cid:1) τ d t − d U ( t ) (14)where L ( t ) and U ( t ) are the reflection processes associatedwith the lower data queue boundary q ( t ) = 0 and upper energyqueue boundary e ( t ) = N E , which are uniquely determinedby the following equations ( Chap. 2.4 of [30]): L ( t ) = max (cid:26) , − min t (cid:48) ≤ t (cid:20) q (15) + ˆ t (cid:48) (cid:0) − E (cid:2) R ( h ( s ) , p ( s )) (cid:12)(cid:12) q ( s ) , e ( s ) (cid:3) + λ (cid:1) τ d s (cid:21)(cid:27) U ( t ) = max (cid:26) , max t (cid:48) ≤ t (cid:20) e (16) + ˆ t (cid:48) (cid:0) − E (cid:2) p ( s ) (cid:12)(cid:12) q ( s ) , e ( s ) (cid:3) + α (cid:1) τ d s (cid:21) − N E (cid:27) with L (0) = U (0) = 0 .Note that the process L ( t ) ensures that the virtual dataqueue length q ( t ) will not go below zero. The process U ( t ) together with the virtual energy availability constraint ensuresthat the virtual energy queue length lies in the domain [0 , N E ] .Fig. 2 illustrates an example of the trajectories of { q ( t ) , L ( t ) } and { e ( t ) , U ( t ) } for a virtual policy Ω v . L ( t ) and U ( t ) are non-decreasing and minimal subject to the constraintthat q ( t ) ≥ and e ( t ) ≤ N E , respectively [30]. According to [29], commercial solar panels usually provide 1 ∼ energy harvesting performance. We assume that the wireless trans-mitter (e.g., base station) is equipped with a 20cm × Elapsed time (sec) V i r t ua l da t a queue and t he a ss o c i a t ed r e f l e c t i on p r o c e ss ( p ck ) L(t)q(t) (a) Trajectories of { q ( t ) , L ( t ) } . Elapsed time (sec) E ne r g y queue and t he a ss o c i a t ed r e f l e c t i on p r o c e ss ( J ou l e ) e(t) U(t) (b) Trajectories of { e ( t ) , U ( t ) } . Fig. 2: The system parameters are configured as follows: τ = 0 . s, λ = 1 . pcks/s, α = 10 W, N E = 600 J. The virtual control policy Ω v is p = 0 W when e < . J, and p = 8 W if e > J. Furthermore, we have the following definition on the ad-missible virtual control policy for the VCTS. Definition 3 (Admissible Virtual Control Policy for VCTS): A virtual policy Ω v for the VCTS is admissible if the followingrequirements are satisfied: • For any initial virtual queue state ( q , e ) , the virtualqueue trajectory (cid:0) q ( t ) , e ( t ) (cid:1) in (13) and (14) under Ω v is unique. • For any initial virtual queue state ( q , e ) , the total cost ´ ∞ q ( t ) d t under Ω v is bounded. B. Total Cost Problem under the VCTS Given an admissible virtual control policy Ω v , we definethe total cost of the VCTS starting from a given initial virtualqueue state ( q , e ) as V ( q , e ; Ω v ) = ˆ ∞ q ( t ) d t (17)We consider the following infinite horizon total cost prob-lem for the VCTS: Problem 2 (Infinite Horizon Total Cost Problem for VCTS): For any initial virtual queue state ( q , e ) , the infinite horizontotal cost problem for the VCTS is formulated as min Ω v V ( q , e ; Ω v ) (18)where V ( q , e ; Ω v ) is given in (17).Note that the two technical conditions in Definition 3 on theadmissible virtual policy are for the existence of an optimalpolicy for the total cost problem in Problem 2. The above totalcost problem has been well-studied in the continuous timeoptimal control theory ( Chap. 2.6 of [31]). The solution canbe obtained by solving the Hamilton-Jacobi-Bellman (HJB)equation as below. Theorem 2 (Sufficient Conditions for Optimality under VCTS): Assume there exists a function V ( q, e ) that is of class C ( R ) , and V ( q, e ) satisfies the following HJB equation: min p ≤ e/τ E (cid:20) qλτ + ∂V ( q, e ) ∂q (cid:0) − R ( h, p ) + λ (cid:1) (19) + ∂V ( q, e ) ∂e (cid:0) − p + α (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) q, e (cid:21) = 0 ∀ q, e Furthermore, for all admissible virtual control policy Ω v andinitial virtual queue state ( q , e ) , the following conditions aresatisfied: lim sup T →∞ ˆ T ∂V (0 , e ( t )) ∂q L ( t ) d t = 0lim sup T →∞ ˆ T ∂V ( q ( t ) , N E ) ∂e U ( t ) d t = 0lim sup T →∞ V ( q ( T ) , e ( T )) = 0 (20)Then, we have the following results: • V ( q, e ) = min Ω v V ( q , e ; Ω v ) is the optimal total costwhen ( q , e ) = ( q, e ) and V ( q, e ) is called the virtualpriority function . • Suppose there exists an admissible virtual stationary con-trol policy Ω v ∗ with Ω v ∗ ( h, q, e ) = p ∗ for any ( h, q, e ) ,where p ∗ attains the minimum of the L.H.S. of (19)for given ( h, q, e ) . Then, the optimal control policy ofProblem 2 is given by Ω v ∗ . Proof: Please refer to Appendix B.In the following theorem, we establish the relationshipbetween the virtual priority function V ( Q, E ) in Theorem 2and the optimal priority function V ∗ ( Q, E ) in Theorem 1. Theorem 3 (Relationship between V ( Q, E ) and V ∗ ( Q, E ) ): If V ( Q, E ) = O (cid:0) Q (cid:1) and Ω v ∗ is admissible in the discretetime system, then V ∗ ( Q, E ) = V ( Q, E ) + o ( τ ) . Proof: please refer to Appendix C.Theorem 3 means that V ( Q, E ) can serve as an approximatepriority function to the optimal priority function V ∗ ( Q, E ) with approximation error o ( τ ) . As a result, solving theoptimality equation in (11) is transformed into a calculusproblem of solving the HJB equation in (19). In the nextsubsection, we shall focus on solving the HJB equation in(19) by leveraging the well-established theories of calculusand differential equations. f ( x ) ( x is a K -dim vector) is of class C ( R K + ) , if the first order partialderivatives w.r.t. each element of x ∈ R K + are continuous. V. C LOSED -F ORM D ELAY -O PTIMAL P OWER C ONTROL The HJB equation in Theorem 2 is a coupled two-dimensional partial differential equation (PDE) and hence, onekey obstacle is to obtain the closed-form solution to the PDE. Challenge 2: Solution of the coupled two-dimensional PDEin Theorem 2.In this section, using asymptotic analysis, we obtain closed-form solutions to the multi-dimensional PDE in differentoperating regimes. We also discuss the control insights fromthe structural properties of the closed-form priority functionsfor different asymptotic regimes. A. General Solution We first have the following corollary on the optimal powercontrol based on the HJB equation in Theorem 2 for given V ( q, e ) : Corollary 1 (Optimal Power Control based on Theorem 2): For given priority function V ( q, e ) , the optimal power controlaction from the HJB equation in Theorem 2 is given by p ∗ = min (cid:40)(cid:18) − ∂V ( q, e ) ∂q (cid:30) ∂V ( q, e ) ∂e − | h | (cid:19) + , eτ (cid:41) (21) Remark 3 (Structure of the Optimal Power Control Policy): The optimal power control policy in (21) depends on theinstantaneous CSI, DQSI and EQSI. Furthermore, the powercontrol action has a multi-level water-filling structure asillustrated in Fig. 4–Fig. 5, where the water level is adaptiveto the DQSI and the EQSI indirectly via the priority function V ( q, e ) . Therefore, the function V ( q, e ) captures how theDQSI and the EQSI affect the overall priority of the dataflow.We then establish the following theorem on the sufficientconditions to ensure the existence of solution to the PDE inTheorem 2: Theorem 4 (Sufficient Conditions for the Existence of Solution): There exists a V ( q, e ) that satisfies (19) and (20) in Theorem2 if the following conditions are satisfied: λ < exp (cid:18) x (cid:19) E (cid:18) x (cid:19) (22) N E ≥ e ∗ (23)where x satisfies x exp (cid:0) − x (cid:1) − E (cid:0) x (cid:1) = α and E ( x ) (cid:44) ´ ∞ e − tx t d t is the exponential integral function. e ∗ is thesolution of the fixed point equation in (46) in Appendix Cif λ > E (cid:0) α (cid:1) , and e ∗ = ατ if λ ≤ E (cid:0) α (cid:1) . Proof: Please refer to Appendix D.The challenge is to find a priority function V ( q, e ) thatsatisfies (19) and (20). Note that the PDE in (19) is a two-dimensional PDE, which has no closed-form solution for thepriority function V ( q, e ) . In the next subsection V-B, weconsider different asymptotic regimes and obtain closed-formsolutions of V ( q, e ) for these operating regimes. Average energy arrival rate α A v e r ag d a t aa rr i v a l r a t e λ E ! α " exp ! x " E ! x " small−data−arrival−energy−sufficient regime large−data−arrival− energy−sufficientregime large−data−arrival−energy−limited regime(unstable data queue) small−data−arrival− energy−limited regime Fig. 3: Asymptotic regimes of the energy harvesting system. B. Asymptotic Closed-Form Priority Functions and ControlInsights In this subsection, we obtain the closed-form priority func-tions V ( q, e ) in different asymptotic regimes as illustrated inFig. 3 and discuss the control insights for each regime. 1) Large-Data-Arrival-Energy-Sufficient Regime: In thisregime, we consider the operating region with large λ andlarge α , and E (cid:0) α (cid:1) < λ < exp (cid:0) x (cid:1) E (cid:0) x (cid:1) (where x satisfies x exp (cid:0) − x (cid:1) − E (cid:0) x (cid:1) = α ). This regime corresponds tothe scenario that we have a large data arrival rate for thedata queue and sufficient renewable energy supply for theenergy queue to maintain the data queue stable. The closed-form priority function V ( q, e ) for this regime is given by thefollowing theorem: Theorem 5: (Closed-Form V ( q, e ) for the Large-Data-Arrival-Energy-Sufficient Regime) Under the large-data-arrival-energy-sufficient regime, the closed-form V ( q, e ) of thePDE in Theorem 2 is given by • When < e < e th ( e th is the solution of E (cid:0) τe th (cid:1) = λ ),we have V ( q, e ) = e λα τ (cid:16) γ eu + 2 λ − eτ (cid:17) − eqλατ + C (24)where γ eu is the Euler’s constant and C = τ λ (cid:0) γ eu + 2 λ − α (cid:1) . • When e ≥ e th , V ( q, e ) is a function of q only. Proof: Please refer to Appendix E.Based on Theorem 5, when e ≥ e th , since V ( q, e ) is afunction of q only, we have ∂V ( q,e ) ∂e = 0 for given q and e .Therefore, the water level in (21) is infinite and hence, wehave p ∗ = eτ . Furthermore, based on the closed-form priorityfunction in (24), we can calculate the closed-form expressionof the water level − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e in (21). We summarizethe optimal power control structure for this regime in thefollowing corollary: Under the condition in (22) in Theorem 4, we have that α grows at leastat the order of exp( λ ) . Therefore, large λ induces large α . The regime withlarge λ and small α will cause the system to be unstable and is not includedin our discussions. From (24), we have − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e = αee ( γ eu + λ − log( eτ ) ) − αq . Fig. 4: Water level versus the data queue length and the energy queuelength for the large-data-arrival-energy-sufficient regime, where τ =0 . s, λ = 1 . pcks/s, α = 10 W, bandwidth is MHz, and averagepacket length is Mbits. Corollary 2: (Optimal Power Control Structure for the Large-Data-Arrival-Energy-Sufficient Regime) The optimal powercontrol for the large-data-arrival-energy-sufficient regime isgiven by • When < e < e th and q > eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , p ∗ =0 . • When < e < αq and q < eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , thewater level − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e is an increasing functionof q for a given e , and is a decreasing function of e fora given q . • When αq < e < e th and q < eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , thewater level − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e is an increasing functionof q for a given e , and is an increasing function of e fora given q . • When e ≥ e th , p ∗ = eτ . Proof: Please refer to Appendix F.Fig. 4 illustrates the water level versus the data queue lengthand the energy queue length when e < e th . Specifically,Corollary 2 means that when < e < e th and for alarge data queue length q > eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , we donot use any renewable energy to transmit data. The reasonis that we do not have enough energy to support the largedata arrival rate, and it is appropriate to wait for futuregood transmission opportunities. For a small queue length q < eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , we can use the available energyfor transmission and the water level is increasing w.r.t. q ,which is in accordance with the high urgency of the data flow.Furthermore, when < e < αq , the water level decreases as e increases, which is reasonable because it is better to save someenergy for the future transmissions. When e > αq , the waterlevel increases as e increases, which is reasonable because wehave relatively sufficient available energy and it is appropriateto use more power to decrease the data queue. When e ≥ e th , In order for q < eα (cid:16) γ eu + λ − log( eτ ) (cid:17) to hold, we require αq ≤ τ exp( γ eu + λ − . For large λ , e th ≈ τ exp( γ eu + λ ) . Therefore, we have αq ∈ [0 , e th ] . we have sufficient renewable energy, and it is appropriate touse all the available energy and make room for the futureenergy arrivals. 2) Small-Data-Arrival-Energy-Limited Regime: In thisregime, we consider the operating region with small λ andsmall α , and E (cid:0) α (cid:1) < λ < exp (cid:0) x (cid:1) E (cid:0) x (cid:1) . This regimecorresponds to the scenario that we have a small data arrivalrate for the data queue and insufficient energy supply for theenergy queue. The closed-form priority function V ( q, e ) forthis regime is given by the following theorem: Theorem 6: (Closed-Form V ( q, e ) for the Small-Data-Arrival-Energy-Limited Regime) Under the small-data-arrival-energy-limited regime, the closed-form V ( q, e ) of the PDE inTheorem 2 is given by • When < e < e th ( e th is the solution of E (cid:0) τe th (cid:1) = λ ),we have V ( q, e ) = − e λα τ + e α τ − qeλατ + C (25) • When e ≥ e th , V ( q, e ) is a function of q only and C = τ − ατ λ . Proof: Please refer to Appendix G.Based on the closed-form V ( q, e ) in Theorem 6, we havethe following corollary summarizing the optimal power controlstructure for this regime : Corollary 3: (Optimal Power Control Structure for the Small-Data-Arrival-Energy-Limited Regime) The optimal power con-trol for the small-data-arrival-energy-limited regime is givenby • When < e < e th and q > − e + λτeατ , p ∗ = 0 . • When < e < √ ατ q and q < − e + λτeατ , the water level − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e is an increasing function of q for agiven e , and is a decreasing function of e for a given q . • When √ ατ q < e < e th and q < − e + λτeατ , the waterlevel − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e is an increasing function of q fora given e , and is an increasing function of e for a given q . • When e ≥ e th , we have p ∗ = eτ . Proof: Please refer to Appendix H.Fig. 5 illustrates the water level versus the data queue lengthand the energy queue length when e < e th . Specifically,Corollary 3 means that when < e < e th and for a largedata queue length q > − e + λτeατ , we do not use any renewableenergy to transmit data. The reason is that even though we canuse the limited energy for data transmission, the data queuelength will not decrease significantly, which contributes verylittle to the delay performance. Instead, if we do not use theenergy at the current slot, we can save it and wait for thefuture good transmissions opportunities. On the other hand, fora small queue length q < − e + λτeατ , we can use the availableenergy for transmission and the water level is increasing w.r.t. q , which is in accordance with the high urgency of the data From (25), we have − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e = ατe − e + λτe − ατq . Fig. 5: Water level versus the data queue length and the energy queuelength for the small-data-arrival-energy-limited regime, where τ =0 . s, λ = 0 . pcks/s, α = 1 W, bandwidth is MHz, and averagepacket length is Mbits. flow. Furthermore, when < e < √ ατ q , large e leads to alower water level. This is reasonable because it is appropriatethat for small e , we can save some energy in the current slotfor better transmission opportunities in the future slots. When √ ατ q < e < e th , large e leads to a higher water level becausewe have sufficient available energy and it is appropriate to usemore power to decrease the data queue. When e ≥ e th , wehave plenty of renewable energy, and it is sufficient to use allthe available energy to support the small data arrival rate. 3) Small-Data-Arrival-Energy-Sufficient Regime: In thisregime, we consider the operating region with λ ≤ E (cid:0) α (cid:1) .This regime corresponds to the scenario that we have a smalldata arrival rate for the data queue and sufficient renewableenergy supply in the energy queue to maintain the data queuestable. The closed-form priority function V ( q, e ) for thisregime is given by the following theorem: Theorem 7: (Closed-Form V ( q, e ) for the Small-Data-Arrival-Energy-Sufficient Regime) Under the small-data-arrival-energy-sufficient regime, the closed-form V ( q, e ) of thePDE in Theorem 2 is given by • When < e < e th ( e th is the solution of E (cid:0) τe th (cid:1) = λ ),we have V ( q, e ) = 12 α τ e − eqλατ − λ τ (cid:18) q − λα e (cid:19) (26) • When e ≥ e th , V ( q, e ) is a function of q only. Proof: Please refer to Appendix I.Based on the closed-form V ( q, e ) in Theorem 7, we havethe following corollary summarizing the optimal power controlstructure in this regime : In order for q < − e + λτeατ to hold, we require √ ατq ≤ λτ . For small λ , e th τ exp (cid:16) − τe th (cid:17) ≈ λ < e th τ ⇒ e th > λτ . Therefore, we have √ ατq ∈ [0 , e th ] . Based on (26), we have ∂V ( q,e ) ∂e = 0 for all q, e , which induces aninfinite water level in (21). Hence, we have p ∗ = eτ when < e < e th . Corollary 4: (Optimal Power Control Structure for the Small-Data-Arrival-Energy-Sufficient Regime) The optimal powercontrol for the small-data-arrival-energy-sufficient regime isgiven by p ∗ = eτ (27)Corollary 4 means that the optimal control policy for thesmall-data-arrival-energy-sufficient regime is to use all theavailable energy in the energy buffer. This is reasonablebecause in this regime we have λ ≤ E (cid:0) α (cid:1) , which meansthat there is plenty of renewable energy and it is sufficient touse all the available energy to support the data traffic.Based on the closed-form solutions for the asymptoticoperating regions in Theorem 5–7, we propose the followingsolution for the PDE in Theorem 2 that covers all regimesw.r.t. (cid:0) λ, α (cid:1) : V ( q, e ) ≈ sol. in Thm 5 ,α ≥ α th , E (cid:18) α (cid:19) < λ < exp (cid:18) x (cid:19) E (cid:18) x (cid:19) sol. in Thm 6 ,α < α th , E (cid:18) α (cid:19) < λ < exp (cid:18) x (cid:19) E (cid:18) x (cid:19) sol. in Thm 7 , λ ≤ E (cid:18) α (cid:19) (28)where α th > is a solution parameter. C. Stability Conditions of using the Closed-Form Solution inthe Discrete-Time System In the previous subsection, we obtain the closed-form opti-mal power control solutions for different asymptotic regimesas in Theorem 5–7. We then establish the following theoremon the stability conditions when using the control policy inCorollary 1 in the original discrete time system in (3) and (4): Theorem 8: (Stability Conditions of using the Closed-FormSolutions in the Discrete-Time System): Using (28) and theclosed-form control policy in Corollary 1, if the followingconditions are satisfied: λ < E (cid:20) exp (cid:18) α (cid:19) E (cid:18) α (cid:19)(cid:21) (29) N E ≥ N e ∗ (30)where e ∗ is defined in (23), then the data queue in theoriginal discrete time system in (3) is stable, in the sense that lim n →∞ E (cid:2) Q ( n ) (cid:3) < ∞ . Proof: Please refer to Appendix J.Theorem 8 means that using (28), the closed-form controlpolicy in Corollary 1 is admissible according to Definition 2. Remark 4 (Interpretation of the Conditions in Theorem 4): • Interpretation of the Condition on λ and α in (29): The condition in (29) implies that α grows at least atthe order of exp( λ ) . It indicates that for given λ , if α istoo small, even if we use all the available energy in theenergy buffer at each time slot, the average data arrivalrate will be larger than the average data departure rate forthe data queue buffer. Therefore, the data queue cannotbe stabilized. • Interpretation of the Condition on N E in (30): Thecondition in (30) gives a first order design guideline onthe dimensioning of the energy storage capacity requiredat the transmitter. For example, N E should be at least ata similar order of N ατ . This condition on N E ensuresthat the energy storage at the transmitter has sufficientenergy to support data transmission for N slots when α ( t ) is small. VI. S IMULATIONS In this section, we compare the performance of the proposedclosed-form delay-optimal power control scheme in (21) withthe following three baselines using numerical simulations: • Baseline 1, Greedy Strategy (GS) [10]: At each timeslot, the transmitter sends data to the receiver using thepower p ( t ) = min (cid:110) α − (cid:15), E ( t ) τ (cid:111) for a given small positiveconstant (cid:15) . The GS is a throughput-optimal policy in thestability sense, i.e., it ensures the stability of the queueingnetwork. • Baseline 2, CSI-Only Water-Filling Strategy(COWFS) [10]: At each time slot, the transmittersends data to the receiver using the power p ( t ) = (cid:110)(cid:0) γ − | h ( t ) | (cid:1) + , E ( t ) τ (cid:111) . Specifically, thewater-filling solution in the COWFS is obtained bymaximizing the ergodic capacity E (cid:2) log(1 + p | h | ) (cid:3) withthe average power constraint E [ p ] = α − (cid:15) for a givensmall positive constant (cid:15) . • Baseline 3, Queue-Weighted Water-Filling Strategy(QWWFS) [11]: At each time slot, the transmitter sendsdata to the receiver using the power p = (cid:110)(cid:0) Q ( t ) γ − | h ( t ) | (cid:1) + , E ( t ) τ (cid:111) . The QWWFS is also a throughput-optimal policy. γ is the Lagrangian multiplier associatedwith the average power constraint E [ p ] = α − (cid:15) for agiven small positive constant (cid:15) .In the simulation, we consider a point-to-point energy har-vesting system, where a base station (BS) communicates witha mobile station. The BS is equipped with a 40cm × (29) ( a ) ⇒ λ < exp (cid:0) α (cid:1) E (cid:0) α (cid:1) = O (log α ) , where ( a ) isdue to E (cid:2) exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:3) < exp (cid:0) α (cid:1) E (cid:0) α (cid:1) using the concavity of exp (cid:0) x (cid:1) E (cid:0) x (cid:1) and the Jensen’s Inequality . Therefore, α grows at leastat the order of exp( λ ) . From (45) in Appendix D, we have e ∗ > ατ . Therefore, from (30), wehave N E > Nατ which means that N E grows at least at the order of Nατ . Baseline 1 (Baseline 2) refers to the greedy policy (CSI dependent policy)in Section III (Section V) of [10]. The Lagrangian multiplier γ for Baseline 2 and Baseline 3can be obtained by the following iterative equation: γ ( t + 1) =[ γ ( t ) + a t ( p − α + (cid:15) )] + , where a t is the step size satisfying (cid:80) t a t = ∞ , (cid:80) t a t < ∞ . As t → ∞ , the convergent γ ( ∞ ) can be shown to satisfy theaverage power constraint E [ p ] = α − (cid:15) [25]. Average energy arrival rate (W) P e r f o r m an c e l o ss r a t i o ( % ) Perf. loss under sol. in Thm. 6Boundary of using sol. in Thm. 5/6:Perf. loss under sol. in Thm. 5 α th ≈ . α Fig. 6: Performance loss ratio versus average energy arrival rate. panel with energy harvesting performance ∼ 10 mW/cm .We assume Poisson packet arrival with average packet arrivalrate λ (pck/s) and an exponentially distributed random packetsize with mean Mbits. The decision slot duration τ is ms,and the total bandwidth is MHz. Furthermore, we considerPoisson energy arrival [10] with average energy arrival rate α = ∼ 10 W. We assume that the block length of the energyarrival process is N = 6000 , i.e., the energy arrival rate α ( t ) at the BS changes every 5 min and the renewable energy isstored in a 1.2V 2000 mAh lithium-ion battery. We comparethe delay performance of the proposed scheme with the abovethree baselines. A. Choice of the Solution Parameter α th in (28) Fig. 6 illustrates the performance loss ratio versus theaverage energy arrival rate with the average data arrival rate λ = (cid:2) E (cid:0) α (cid:1) + exp (cid:0) x (cid:1) E (cid:0) x (cid:1)(cid:3) . It can be observed thatusing the solution in Theorem 5, the performance loss is smallfor large α and it increases as α decreases. In addition, usingthe solution in Theorem 6, the performance loss is small forsmall α and it increases as α increases. It can be observedthat choosing α th ≈ . can keep the performance loss downto over the entire operating regime w.r.t. ( λ, α ) . B. Delay Performance for the Large-Data-Arrival-Energy-Sufficient Regime Fig. 7 illustrates the average delay versus the average dataarrival rate for the large-data-arrival-energy-sufficient regime.The average data arrival rate is λ = 1 . ∼ . pcks/sand the average energy arrival rate is α = 10 W. Theaverage delay of all the schemes increases as the averagedata arrival rate increases, and the proposed scheme achievessignificant performance gain over all the baselines. The gain iscontributed by the DQSI and the EQSI aware dynamic waterlevel structure. It can be also observed that the performanceof the proposed closed-form solution is very close to that ofthe optimal value iteration algorithm (VIA) [14]. If the surrounding environment of the BS has sufficient sunlight, theenergy harvesting performance is high. Otherwise, the energy harvestingperformance is low [29]. The performance loss ratio is defined as Perf. of the proposed scheme − Perf. of the VIAPerf. of the VIA . Average data arrival rate (pck/s) A v e r age de l a y ( s ) Baseline 1, GSBaseline 2, COWFSBaseline 3, QWWFS Opt. VIA Proposed scheme with complete V in (28) and α th = 3 . λ Fig. 7: Average delay versus average data arrival rate for the large-data-arrival-energy-sufficient regime. The average energy arrival is α = 10 W. Average data arrival rate (pck/s) A v e r age de l a y ( s ) Baseline 1, GSBaseline 2, COWFSBaseline 3, QWWFS Proposed scheme with complete V in (28)andOpt. VIA α th = 3 . λ Fig. 8: Average delay versus average data arrival rate for the small-data-arrival-energy-sufficient regime. The average energy arrival is α = 1 W. C. Delay Performance for the Small-Data-Arrival-Energy-Limited Regime Fig. 8 illustrates the average delay versus the average dataarrival rate for the small-data-arrival-energy-limited regime.The average data arrival rate is λ = 0 . ∼ . pcks/s andthe average energy arrival rate is α = 1 W. The proposedscheme achieves significant performance gain over all thebaselines due to the DQSI and the EQSI aware dynamic waterlevel structure. Furthermore, the performance of the proposedclosed-form solution is very close to that of the VIA. D. Delay Performance for the Small-Data-Arrival-Energy-Sufficient Regime Fig. 9 illustrates the average delay versus the average dataarrival rate for the small-data-arrival-energy-sufficient regime.The average data arrival rate is λ = 0 . ∼ . pcks/s and theaverage energy arrival rate is α = 6 W. The delay performanceof the proposed scheme is very close to that of Baseline 3and also better than those of Baselines 1 and 2. However,our proposed scheme has lower complexity compared withBaseline 3, which involves the gradient update to obtain theLagrangian multiplier. Therefore, it is better to adopt ourproposed scheme for the small-data-arrival-energy-sufficient Baseline 1 Baseline 2 Baseline 3 Proposed Scheme VIAComputational time ( N E = 2000 ) 759sComputational time ( N E = 4000 ) 0.2374ms 1.729s 15.437s 0.2491ms > sComputational time ( N E = 6000 ) > s TABLE I: Comparison of the MATLAB computational time of the proposed scheme, the baselines and the value iteration algorithm (VIA).The system parameters are configured as in Fig. 9. Average data arrival rate (pck/s) A v e r age de l a y ( s ) Baseline 2, COWFS Baseline 3, QWWFSOpt. VIA α th = 3 . Proposed scheme with complete V in (28) and Baseline 1, GS λ Fig. 9: Average delay versus average data arrival rate for the small-data-arrival-energy-limited regime. The average energy arrival is α =6 W. regime. Furthermore, the performance of the proposed closed-form solution is very close to that of the VIA. E. Comparison of Complexity in Computational Time Table I illustrates the comparison of the MATLAB com-putational time of the proposed solution, the baselines andthe brute-force VIA [14]. Note that the proposed schemehas similar complexity to Baseline 1 due to the closed-formpriority function. Therefore, our proposed scheme achievessignificant performance gain with negligible computationalcost. VII. S UMMARY In this paper, we propose a closed-form delay-optimal powercontrol solution for an energy harvesting wireless network withfinite energy storage. We formulate the associated stochasticoptimization problem as an infinite horizon average cost MDP.Using a continuous time approach, we derive closed-form ap-proximate priority functions for different asymptotic regimes.Based on the closed-form approximations, we propose aclosed-form optimal control policy, which has a multi-levelwater filling structure and the water level is adaptive to theDQSI and the EQSI. Numerical results show that the proposedpower control scheme has much better performance than thebaselines. A PPENDIX A: P ROOF OF T HEOREM Proposition 4.6.1 of [14], the sufficient condi-tions for the optimality of Problem 1 is that there exists a( θ ∗ , { V ∗ ( χ ) } ) that satisfies the following Bellman equationand V ∗ satisfies the transversality condition in (12) for alladmissible control policy Ω and initial state χ (0) : θ ∗ + V ∗ ( χ ) = min p PPENDIX B: P ROOF OF T HEOREM V ( q, e ) is of class C ( R ) , we have d V ( q, e ) = ∂V ( q,e ) ∂q d q + ∂V ( q,e ) ∂e d e . Substituting the dynamics in (13) and(14), we obtain d V ( q ( t ) , e ( t )) = D Ω v ( V ( q ( t ) , e ( t ))) d t (32) + ∂V ( q ( t ) , e ( t )) ∂q d L ( t ) − ∂V ( q ( t ) , e ( t )) ∂e d U ( t ) where D Ω v ( V ( q, e )) (cid:44) ∂V ( q,e ) ∂q (cid:0) − E (cid:2) R ( h, p ) (cid:12)(cid:12) q, e (cid:3) + λ (cid:1) τ + ∂V ( q,e ) ∂e (cid:0) − E (cid:2) p (cid:12)(cid:12) q, e (cid:3) + α (cid:1) τ . Integrating on both sizes w.r.t. t from 0 to T , we have V ( q ( T ) , e ( T )) − V ( q , e ) (33) = ˆ T D Ω v ( V ( q ( t ) , e ( t ))) d t + ˆ T ∂V ( q ( t ) , e ( t )) ∂q d L ( t ) − ˆ T ∂V ( q ( t ) , e ( t )) ∂e d U ( t ) ( a ) = ˆ T D Ω v ( V ( q ( t ) , e ( t ))) d t + ˆ T ∂V (0 , e ( t )) ∂q d L ( t ) − ˆ T ∂V ( q ( t ) , N E ) ∂e d U ( t )= ˆ T (cid:18) q ( t ) λ + D Ω v ( V ( q ( t ) , e ( t ))) (cid:19) d t + ˆ T ∂V (0 , e ( t )) ∂q d L ( t ) − ˆ T ∂V ( q ( t ) , N E ) ∂e d U ( t ) − ˆ T q ( t ) λ d t (34)where ( a ) is because L ( t ) and U ( t ) increase only when q = 0 and e = N E according to Chapter 2.4 of [30]. If V ( q, e ) satisfies (19), from (34), we have for any admissible virtualpolicy Ω v , V ( q , e ) ≤ V ( q ( T ) , e ( T )) − ˆ T ∂V (0 , e ( t )) ∂q d L ( t )+ ˆ T ∂V ( q ( t ) , N E ) ∂e d U ( t ) + ˆ T q ( t ) λ d t (35)From the boundary conditions in (20), we have lim sup T →∞ ´ T ∂V (0 ,e ( t )) ∂q d L ( t ) = 0 , lim sup T →∞ ´ T ∂V ( q ( t ) ,N E ) ∂e d U ( t ) = 0 and lim sup T →∞ V ( q ( T ) , e ( T )) = . Hence, taking the limit superior as T → ∞ in (35), wehave V ( q , e ) ≤ lim sup T →∞ ˆ T q ( t ) λ d t (36)where the above equality is achieved if the admissible virtualstationary policy Ω v ( q, e, h ) attains the minimum in the HJBequation in (19) for all ( q, e, h ) . Hence, such Ω v is the optimalcontrol policy of the total cost problem in VCTS in Problem2. A PPENDIX C: P ROOF OF T HEOREM A. Relationship between the Discrete Time and VCTS Opti-mality Equations We first prove the following corollary based on Theorem 1. Corollary 5 (Approximate Optimality Equation): Supposethere exist J ( Q, E ) of class C ( R K + ) that solve the following approximate optimality equation : min p ≤ E/τ E (cid:20) Qλτ + ∂J ( Q, E ) ∂Q (cid:0) − R ( h, p ) + λ (cid:1) + ∂J ( Q, E ) ∂E (cid:0) − p + α (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) Q, E (cid:21) = 0 , ∀ Q, E (37)Furthermore, for all admissible control policy Ω and initialqueue state Q (0) , E (0) , the transversality condition in (12) issatisfied for J . Then, we have V ∗ ( Q, E ) = J ( Q, E )+ o ( τ ) . Proof of Corollary 5: We will establish the followingLemmas 1–3 to prove Corollary 5. For convenience, denote T χ ( θ, J, p ) = Qλ + (cid:88) Q (cid:48) ,E (cid:48) Pr (cid:2) Q (cid:48) , E (cid:48) (cid:12)(cid:12) χ , p (cid:3) J ( Q (cid:48) , E (cid:48) ) − J ( Q, E ) − θ (38) T † χ ( θ, J, p ) = Qλ + ∂J ( Q, E ) ∂Q (cid:0) − R ( h, p ) τ + λ (cid:1) τ + ∂J ( Q, E ) ∂E (cid:0) − p + α (cid:1) τ − θ (39) Step 1, Relationship between T χ ( θ, J, p ) and T † χ ( θ, J, p ) :Lemma 1: For any χ , T χ ( θ, J, p ) = T † χ ( θ, J, p ) + νG χ ( J, p ) for some smooth function G χ and ν = o ( τ ) . Proof of Lemma 1: Let ( Q ( n + 1) , E ( n + 1)) =( Q (cid:48) , E (cid:48) ) and ( Q ( n ) , E ( n )) = ( Q, E ) . For sufficiently small τ , according to the dynamics in (3) and (4), we have thefollowing Taylor expansion on J ( Q (cid:48) , E (cid:48) ) in (11): E (cid:2) J ( Q (cid:48) , E (cid:48) ) (cid:12)(cid:12) Q, E (cid:3) = J ( Q, E ) + E (cid:20) ∂J ( Q, E ) ∂Q ( − R ( h, p )+ λ (cid:1) + ∂J ( Q, E ) ∂E (cid:0) − p + α (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) Q, E (cid:21) τ + o ( τ ) (40)Substituting (40) into T χ ( θ, J, p ) , we obtain T χ ( θ, J, p ) = T † χ ( θ, J, p ) + νG χ ( J, p ) for some smooth function G χ and ν = o ( τ ) . Step 2, Growth Rate of E (cid:2) T χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) : Denote T χ ( θ, J ) = min p T χ ( θ, J, p ) , T † χ ( θ, J ) = min p T † χ ( θ, J, p ) (41)Suppose ( θ ∗ , V ∗ ) satisfies the Bellman equation in (11) and (0 , J ) satisfies (37), we have for any χ , E (cid:2) T χ ( θ ∗ , V ∗ ) (cid:12)(cid:12) Q, E (cid:3) = 0 , E (cid:2) T † χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) = 0 (42)Then, we establish the following lemma. Lemma 2: E (cid:2) T χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) = o ( τ ) , ∀ Q, E . Proof of Lemma 2: For any χ , we have T χ (0 , J ) =min p (cid:2) T † χ (0 , J, p ) + νG χ ( J, p ) (cid:3) ≥ min p T † χ (0 , J, p )+ ν min p G χ ( J, p ) . On the other hand, T χ (0 , J ) ≤ min p T † χ (0 , J, p ) + νG χ ( J, p † ) , where p † =arg min p T † χ (0 , J, p ) .From (42), E (cid:2) min p T † χ (0 , J, p ) (cid:12)(cid:12) Q, E (cid:3) = E (cid:2) T † χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) = 0 . Since T † χ (0 , J, p ) and G χ ( J, p † ) are all smooth and bounded functions, we have E (cid:2) T χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) = O ( ν ) = o ( τ ) for any Q, E . Step 3, Difference between V ∗ ( Q, E ) and J ( Q, E ) :Lemma 3: Suppose E [ T χ ( θ ∗ , V ∗ ) | Q, E ] = 0 for all Q, E together with the transversality condition in (12) has a uniquesolution ( θ ∗ , V ∗ ) . If J satisfies (37) and the transversalitycondition in (12), then V ∗ ( Q, E ) − J ( Q, E ) = o ( τ ) . Proof of Lemma 3: Suppose for some ( Q (cid:48) , E (cid:48) ) , we have J ( Q (cid:48) , E (cid:48) ) = V ∗ ( Q (cid:48) , E (cid:48) ) + α for some α (cid:54) = 0 as τ → . Nowlet τ → . From Lemma 2, we have E (cid:2) T χ (0 , J ) (cid:12)(cid:12) Q, E (cid:3) = 0 for all Q, E and also J satisfies the transversality condi-tion in (12). However, J ( Q (cid:48) , E (cid:48) ) (cid:54) = V ∗ ( Q (cid:48) , E (cid:48) ) becauseof the assumption that J ( Q (cid:48) , E (cid:48) ) = V ∗ ( Q (cid:48) , E (cid:48) ) + α . Thiscontradicts the condition that ( θ ∗ , V ∗ ) is a unique solutionof E [ T χ ( θ ∗ , V ∗ ) | Q, E ] = 0 for all Q, E and the transver-sality condition in (12). Hence, we must have V ∗ ( Q, E ) − J ( Q, E ) = o ( τ ) for all Q, E . B. Relationship between the Discrete Time Optimality Equa-tion and the HJB Equation First, if V ( Q, E ) that is of class C ( R ) satisfies theoptimality conditions of the total cost problem in VCTS(as shown in Theorem 2), then it also satisfies (37) inCorollary 5. Second, since V ( Q, E ) = O ( Q ) , we have lim n →∞ E Ω [ V ( Q ( n ) , E ( n ))] < ∞ for any admissible policy Ω of the discrete time system according to Definition 2. Hence, V ( Q, E ) satisfies the transversality condition in (12). UsingCorollary 5, we have V ∗ ( Q, E ) = V ( Q, E ) + o ( τ ) .A PPENDIX D: P ROOF OF T HEOREM p ∗ = min (cid:8)(cid:0) − ∂V ( q,e ) ∂q (cid:14) ∂V ( q,e ) ∂e − | h | (cid:1) + , eτ (cid:9) . Substituting it to the PDE in(19), we have E (cid:20) qλτ + ∂V ( q, e ) ∂q (cid:0) − R ( h, p ∗ ) + λ (cid:1) + ∂V ( q, e ) ∂e (cid:0) − p ∗ + α (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) q, e (cid:21) = 0 (43) For convenience, denote V q (cid:44) ∂V (cid:0) q,e (cid:1) ∂q and V e (cid:44) ∂V (cid:0) q,e (cid:1) ∂e . We then calculate the expectations in(43): E (cid:2) p ∗ (cid:3) = (cid:104) ´ − τVeVee + Vqτ − VeVq (cid:0) V q − V e − x (cid:1) exp (cid:0) − x (cid:1) d x + ´ ∞ − τVeVee + Vqτ eτ exp( x )d x (cid:105) (cid:0) V q − V e > eτ (cid:1) + (cid:104) ´ ∞ − VeVq (cid:0) V q − V e − x (cid:1) exp (cid:0) − x (cid:1) d x (cid:105) (cid:0) V q − V e < eτ (cid:1) = (cid:2) V q − V e exp (cid:0) V e V q (cid:1) − E (cid:0) − V e V q (cid:1) + V e e + V q ττV e exp (cid:0) τV e V e e + V q τ (cid:1) + E (cid:0) − τV e V e e + V q τ (cid:1)(cid:3) (cid:0) V q − V e > eτ (cid:1) + (cid:2) V q − V e exp (cid:0) V e V q (cid:1) − E (cid:0) − V e V q (cid:1)(cid:3) (cid:0) V q − V e < eτ (cid:1) (cid:44) G (cid:0) V q − V e , eτ (cid:1) .Similarly, using the integration by parts, we have E (cid:2) R ( h, p ∗ ) (cid:3) = (cid:2) exp (cid:0) τe (cid:1) E (cid:0) τ V q e V e + eτV q (cid:1) + E (cid:0) − V e V q (cid:1) − E (cid:0) − τV e V e e + V q τ (cid:1)(cid:3) (cid:0) V q − V e > eτ (cid:1) + E (cid:0) − V e V q (cid:1) (cid:0) V q − V e < eτ (cid:1) (cid:44) F (cid:0) V q − V e , eτ (cid:1) . Therefore, the PDE in (43) becomes: qλτ + V q (cid:20) λ − F (cid:18) V q − V e , eτ (cid:19)(cid:21) + V e (cid:20) α − G (cid:18) V q − V e , eτ (cid:19)(cid:21) = 0 (44)We then discuss the properties of F and G in (44) asfollows: • If V q − V e ≤ eτ , F is increasing w.r.t. V q − V e and F ∈ [0 , E (cid:0) τe (cid:1) ] . If V q − V e > eτ , F is a function of V q − V e and eτ , and is increasing w.r.t. V q − V e and F ∈ ( E (cid:0) τe (cid:1) , exp (cid:0) τe (cid:1) E (cid:0) τe (cid:1) ) . • If V q − V e ≤ eτ , G is increasing w.r.t. V q − V e and G ∈ [0 , eτ exp (cid:0) − τe (cid:1) − E (cid:0) τe (cid:1) ] . If V q − V e > eτ , G is a func-tion of V q − V e and eτ , and is increasing w.r.t. V q − V e and G ∈ ( eτ exp (cid:0) − τe (cid:1) − E (cid:0) τe (cid:1) , eτ ) .For the continuous time queueing system in (13) and (14),there exists a steady data queue states q s = 0 and e s ∈ [0 , N E ] ,i.e., lim t →∞ q ( t ) = q s and lim t →∞ e ( t ) = e s . At steady state,we require λ ≤ F (cid:18) V q − V e , e s τ (cid:19) , α ≥ G (cid:18) V q − V e , e s τ (cid:19) (45)The existence of solution for the HJB equation in Theorem2 is equivalent to the existence of solution of (45). We shalldiscuss the solution of (45) in the following two cases: Case 1: if the equalities are achieved in (45), i.e., λ = F (cid:18) V q − V e , e s τ (cid:19) , α = G (cid:18) V q − V e , e s τ (cid:19) (46)there exists a ˜ e ∈ [0 , N E ] such that ˜ eτ exp (cid:16) − τ ˜ e (cid:17) − E (cid:16) τ ˜ e (cid:17) < α < ˜ eτE (cid:16) τ ˜ e (cid:17) < λ < exp (cid:16) τ ˜ e (cid:17) E (cid:16) τ ˜ e (cid:17) (47)From the first equation above, we have ατ < ˜ e < xτ where x satisfies x exp (cid:0) − x (cid:1) − E (cid:0) x (cid:1) = α . For given ˜ e , we canobtain the range for λ according to the second equation above: E (cid:0) α (cid:1) < λ < exp (cid:0) x (cid:1) E (cid:0) x (cid:1) . Furthermore, we denote thesolution of (46) w.r.t. e to be e ∗ . Then, it is sufficient that N E ≥ e ∗ so that the solution of (46) is meaningful. Case 2: if λ ≤ E (cid:0) α (cid:1) , we will show in Appendix I thatthe optimal control for this case achieves λ < F (cid:18) V q − V e , e s τ (cid:19) , α = G (cid:18) V q − V e , e s τ (cid:19) (48)where the steady states are q s = 0 , e s = ατ . In this case, werequire that N E ≥ e s = ατ . Combining both cases, we obtainthe conditions in (22) and (23). This completes the proof.A PPENDIX E: P ROOF OF T HEOREM V q − V e < eτ and V q − V e > eτ . Specifically, when V q − V e > eτ , qλτ + V q (cid:20) λ − exp (cid:16) τe (cid:17) E (cid:18) τ V q e V e + eτ V q (cid:19) − E (cid:18) − V e V q (cid:19) + E (cid:18) − τ V e V e e + V q τ (cid:19) (cid:21) + V e (cid:20) α + V q V e exp (cid:18) V e V q (cid:19) + E (cid:18) − V e V q (cid:19) − V e e + V q ττ V e exp (cid:18) τ V e V e e + V q τ (cid:19) − E (cid:18) − τ V e V e e + V q τ (cid:19)(cid:21) = 0 (49)when V q − V e < eτ , qλτ + V q (cid:20) λ − E (cid:18) − V e V q (cid:19)(cid:21) + V e (cid:20) α + V q V e exp (cid:18) V e V q (cid:19) + E (cid:18) − V e V q (cid:19)(cid:21) = 0 (50) A. Relationship among V q − V e , eτ , λ and α Dividing − V e on both sizes of (44), we have J (cid:18) V q − V e (cid:19) (cid:44) (51) V q − V e (cid:20) λ − F (cid:18) V q − V e , eτ (cid:19) (cid:21) − (cid:20) α − G (cid:18) V q − V e , eτ (cid:19)(cid:21) = − q − V e λτ We first have the following lemma: Lemma 4: From (51), we have − V e = Θ (cid:16) λ exp( λ ) (cid:17) . Proof of Lemma 4: We assume that V ( q, e ) = o ( g ( λ )) and V ( q, e ) = O ( f ( λ )) for some functions f and g . Therefore, V q = o ( g ( λ )) = O ( f ( λ )) , and V e = o ( g ( λ )) = O ( f ( λ )) .According to (47), we have α = o (exp( λ )) . Combining (44),we have o ( g ( λ ))Θ (cid:0) λ (cid:1) + o ( g ( λ ))Θ(exp( λ )) = − Θ (cid:18) λ (cid:19) (52) O ( f ( λ ))Θ (cid:0) λ (cid:1) + O ( f ( λ ))Θ(exp( λ )) = − Θ (cid:18) λ (cid:19) (53)where (52) implies g ( λ ) = − o (cid:16) λ exp( λ ) (cid:17) , and (53) implies f ( λ ) = −O (cid:16) λ exp( λ ) (cid:17) . Hence, V ( q, e ) = − Θ (cid:16) λ exp( λ ) (cid:17) ,which induces − V e = Θ (cid:16) λ exp( λ ) (cid:17) .Based on Lemma 4, (51) implies J (cid:16) V q − V e (cid:17) = − Θ(exp( λ )) .Let e th satisfy E (cid:0) τe th (cid:1) = λ . We have the following discus-sions on the property of J (cid:16) V q − V e (cid:17) : • e < e th : if exp (cid:0) τe th (cid:1) E (cid:0) τe th (cid:1) < λ , J (cid:16) V q − V e (cid:17) is anincreasing function w.r.t. V q − V e . Specifically, when < V q − V e < x ( e ) , where F (cid:0) x ( e ) , eτ (cid:1) = 0 , J (cid:16) V q − V e (cid:17) is neg-ative. When V q − V e > x ( e ) , J (cid:16) V q − V e (cid:17) is positive. On theother hand, if exp (cid:0) τe th (cid:1) E (cid:0) τe th (cid:1) > λ , let x ( e ) satisfy F (cid:0) x ( e ) , eτ (cid:1) = λ . When < V q − V e < x ( e ) , J (cid:16) V q − V e (cid:17) isincreasing w.r.t. V q − V e , and when V q − V e > x ( e ) , J (cid:16) V q − V e (cid:17) is decreasing w.r.t. V q − V e . • e ≥ e th : let x ( e ) satisfy F (cid:0) x ( e ) , eτ (cid:1) = λ . When < V q − V e < x ( e ) , J (cid:16) V q − V e (cid:17) is negative and increasing.When V q − V e > x ( e ) , J (cid:16) V q − V e (cid:17) is negative and decreasing.Furthermore, we have x ( e ) < eτ for given e .Therefore, we have the following results on the relationshipamong V q − V e , eτ , λ and α : Classification 1 (Relationship among V q − V e , eτ , λ and α ): e < e th : • small λ , small α and E (cid:0) α (cid:1) < λ < exp (cid:0) x (cid:1) E (cid:0) x (cid:1) :in this case, we have V q − V e = Θ (cid:16) exp( λ ) λ (cid:17) which is largefor sufficiently small λ . Furthermore, since e < e th , wehave V q − V e > eτ . Therefore, the PDE in (51) becomes(49) with large V q − V e and small e . • large λ , large α and E (cid:0) α (cid:1) < λ < exp (cid:0) x (cid:1) E (cid:0) x (cid:1) :similar to the previous case, we have V q − V e > eτ . Since e < e th and we consider large λ , e is relativelylarge compared with V q − V e . Therefore, the PDE in (51)becomes (49) with large V q − V e and large e .2) e ≥ e th : in this case, since < V q − V e < x ( e ) and x ( e ) < eτ . Therefore, V q − V e < eτ , which means that the PDE in (51)becomes (50). B. Solving the HJB Equation under the Large-Data Arrival-Energy-Sufficient Regime According to Classification 1, when e < e th , we have thePDE in (49) with large V q − V e and large e , and when e ≥ e th ,we have the PDE in (50). We first solve the PDE in (49)with large V q − V e and large e . We have the following approxi-mations for V q − V e E [ R ( h, p ∗ )] in (51): V q − V e E (cid:16) τ V q e V e + eτV q (cid:17) = V q − V e E (cid:0) τe (cid:1) + o (1) , E (cid:16) − V e V q (cid:17) = − γ eu V q − V e + V q − V e log (cid:16) V q − V e (cid:17) +1 + o (1) , E (cid:16) − τV e V e e + V q τ (cid:17) = − γ eu V q − V e + V q − V e log (cid:16) V q − V e − eτ (cid:17) +1 + o (1) . Hence, we have V q − V e E [ R ( h, p ∗ )] = V q − V e exp (cid:16) τe (cid:17) E (cid:16) τe (cid:17) + eτ + o (1) (54)Similarly, for E [ p ∗ ] , we have V q V e exp (cid:16) V e V q (cid:17) = V q V e + 1 + o (1) , V e e + V q ττV e exp (cid:16) τV e V e e + V q τ (cid:17) = eτ + V q V e + 1 + o (1) , Hence, we have E [ p ∗ ] = eτ (55) Substituting (54) and (55) into (44) and for large λ and α , weobtain the following simplified PDE: qλτ + V q (cid:16) λ − exp (cid:16) τe (cid:17) E (cid:16) τe (cid:17)(cid:17) + V e α = 0 (56)For large e , we approximate exp (cid:0) τe (cid:1) E (cid:0) τe (cid:1) as exp (cid:0) τe (cid:1) E (cid:0) τe (cid:1) = − γ eu + log eτ + o (1) . Substitutingit into (56) and using of [32], we obtain V ( q, e ) = e λα τ (cid:0) γ eu + 2 λ − eτ (cid:1) − eqλατ + C .We then determine the addend constant C .For the steady state requirement in (45), using (54) and (55),we have − γ eu +log eτ + eτ − V e V q = λ , eτ = α . We then obtain that q s = 0 e s = ατ , V q − V e (cid:12)(cid:12) q = q s ,e = e s = αλ + γ eu − log α . Under (24),the steady state requirement is satisfied, and therefore the firstcondition in (20) is satisfied. In addition, for any admissible Ω v , we have lim t →∞ q ( t ) = 0 . Choosing C = C as in (24),the third condition in (20) is satisfied if N E satisfies (23).We then solve the PDE in (50) when e ≥ e th . To satisfy thesecond condition in (20), it requires ∂V ( q,N E ) ∂e = 0 , ∀ q . Using of [32], we obtain the solution in the following form: V ( q, e ) = c e + φ ( q, c ) + c (57)where φ = O ( q ) . Then, ∂V ( q,N E ) ∂e = 0 induces that c = 0 ,which means that V ( q, e ) is a function of q only. Hence, V q − V e is infinite which means that p ∗ = eτ according to Corollary 1.Combing (24) ( e < e th ) and (57) ( e ≥ e th ), we obtain the fullsolution in this regime. C. Verification of the Admissibility of Ω v ∗ under the Large-Data-Arrival-Energy-Sufficient Regime We first calculate the water level when e < e th under (24): V q − V e = αee (cid:0) γ eu + λ − log( eτ ) (cid:1) − αq (58)Therefore, for sufficiently large q , we have V q − V e < , whichmeans that there is no data transmission, and the energy bufferwill harvest energy until e ≥ e th when the policy is p ∗ = eτ (we refer to it as the greedy policy ). Specifically, we can calcu-late the trajectory of e ( t ) as: e ( t ) = ( e (¯ t ) − ατ ) exp( − t )+ ατ ,where ¯ t is the time stamp when e ≥ e th is first satisfied.Note that ¯ t = 0 if e (0) ≥ e th . This trajectory implies that lim t →∞ e ( t ) = ατ . For any (cid:15) > , there exists t > . When t ≥ t , we have | e ( t ) − ατ | ≤ (cid:15), t ≥ t > ¯ t (59)We then can calculate the trajectory of q ( t ) under p ∗ = eτ : q ( t ) = q (0) + q (¯ t ) − q (0) − ´ t ¯ t (cid:2) exp (cid:0) τe ( t (cid:48) ) (cid:1) E (cid:0) τe ( t (cid:48) ) (cid:1) − λ (cid:3) τ t (cid:48) + ´ t ¯ t L ( t )d t = q (0) + q (¯ t ) − q (0) − ´ t ¯ t (cid:104) exp (cid:0) τe ( t (cid:48) ) (cid:1) E (cid:0) τe ( t (cid:48) ) (cid:1) − λ (cid:105) τ t (cid:48) + ´ t ¯ t L ( t )d t − ´ tt (cid:104) exp (cid:0) τe ( t (cid:48) ) (cid:1) E (cid:0) τe ( t (cid:48) ) (cid:1) − λ (cid:3) τ t (cid:48) + ´ tt L ( t )d t . Let q ( t ) (cid:44) q (¯ t ) − q (0) − ´ t ¯ t (cid:104) exp (cid:16) τe ( t (cid:48) ) (cid:17) E (cid:16) τe ( t (cid:48) ) (cid:17) − λ (cid:105) τ t (cid:48) + ´ t ¯ t L ( t )d t . Therefore, q ( t ) = q (0) + q ( t ) − ˆ tt (cid:20) exp (cid:18) τe ( t (cid:48) ) (cid:19) E (cid:18) τe ( t (cid:48) ) (cid:19) − λ (cid:21) τ t (cid:48) + ˆ tt L ( t (cid:48) )d t (cid:48) ( a ) ≤ q (0) + q ( t ) − ˆ tt (cid:20) exp (cid:18) α + (cid:15)/τ (cid:19) E (cid:18) α + (cid:15)/τ (cid:19) − λ (cid:21) τ t (cid:48) + ˆ tt L ( t (cid:48) )d t (cid:48) (60)where ( a ) is due to e ( t ) < ατ + (cid:15) when t ≥ t accordingto (59). Since E (cid:0) α (cid:1) < λ , there exists a δ > such that λ < exp (cid:16) α + δ (cid:17) E (cid:16) α + δ (cid:17) . Choosing (cid:15) = δτ in (60), weobtain q ( t ) − q (0) < , if t ≥ q ( t )exp (cid:16) α + δ (cid:17) E (cid:16) α + δ (cid:17) − λ (61)Therefore, we obtain the negative queue drift, which meansthat the greedy policy is a stabilizing policy [33], [34].A PPENDIX F: P ROOF OF C OROLLARY V ( q, e ) is a function of q only when e ≥ e th and hence, V q − V e is infinite, which means that p ∗ = eτ according to Corollary 1. For e < e th , the water level(WL) is given in (58). When q > eα (cid:0) γ eu + λ − log( eτ ) (cid:1) ,the WL is negative, which results in p ∗ = 0 . On the otherhand, when q < eα (cid:0) γ eu + λ − log( eτ ) (cid:1) , the WL is positiveand increasing w.r.t. q . Moreover, derivative of (58) w.r.t. e = α ( e − αq )( − e ( λ + γ eu )+ αq + e log( eτ )) . When e < αq , the WL is decreasingw.r.t. e , and when e < αq , the WL is increasing w.r.t. e .A PPENDIX G: P ROOF OF T HEOREM A. Solving the HJB Equation under the Small-Data-Arrival-Energy-Limited Regime According to Classification 1, when e < e th , we have thePDE in (49) with large V q − V e and small e and when e ≥ e th ,we have the PDE in (50). Following part B in Appendix E,we can obtain the simplified PDE as in (56). For small e , weapproximate exp (cid:0) τe (cid:1) E (cid:0) τe (cid:1) as exp (cid:0) τe (cid:1) E (cid:0) τe (cid:1) = eτ + o (1) .Substituting it to (56) and using of [32], we obtain thesolution for this case as in (25). Furthermore, the solution for e ≥ e th is the same as (57). Following the same procedurein Appendix E, it can be verified that the three conditions in(20) are satisfied. B. Verification of the Admissibility of Ω v ∗ under the Small-Data-Arrival-Energy-Limited Regime We first calculate the WL when e < e th under (25): V q − V e = ατ e − e + λτ e − ατ q (62)Therefore, for sufficiently large q , we have V q − V e < , whichmeans that there is no data transmission, and the energy bufferwill harvest energy until e ≥ e th when the data queue willadopt the policy p ∗ = eτ . Following the same proof as in partC in Appendix E, we can prove the negative data queue drift. A PPENDIX H: P ROOF OF C OROLLARY V ( q, e ) is a function of q only when e ≥ e th , wehave p ∗ = eτ . For e < e th , the WL is given in (62). When q > − e + λτeατ , the WL is negative, which results in p ∗ = 0 . Onthe other hand, when q < λe + αeα , the WL is positive, whichis increasing w.r.t. q . Moreover, derivative of (62) w.r.t. e = ατ ( e − ατq )( − e + λτe − ατq ) . When e < √ ατ q , the WL is decreasing w.r.t. e , and when e > √ ατ q , the WL is increasing w.r.t. e .A PPENDIX I: P ROOF OF T HEOREM A. Solving the HJB Equation under the Small-Data-Arrival-Energy-Sufficient Regime Following the same analysis as in part A in Appendix E,when e < e th , we have the PDE in (49), and when e ≥ e th ,we have the PDE in (50). For the PDE in (49), we require ∂V (0 , e ) ∂q = 0 (63)because the equalities in (45) cannot be achieved and L ( t ) (cid:54) = 0 after the virtual queueing system enters thesteady state. Under this regime, the system operates at theregion with small V q . We have the following approximationsfor E [ R ( h, p ∗ )] in (49): exp (cid:0) τe (cid:1) E (cid:16) τ V q e V e + eτV q (cid:17) = eτ (cid:16) − eτ − V e V q (cid:17) exp (cid:18) − Vq − Ve − eτ (cid:19) + o (1) , E (cid:16) − V e V q (cid:17) = V q − V e exp (cid:16) V e V q (cid:17) + o (1) , E (cid:16) − τV e V e e + V q τ (cid:17) = (cid:16) V q − V e − eτ (cid:17) exp (cid:18) − Vq − Ve − eτ (cid:19) + o (1) . Hence, we have E [ R ( h, p ∗ )] = O (cid:18) V q − V e exp (cid:18) V e V q (cid:19)(cid:19) + o (1) = o (1) (64)Similarly, for E [ p ∗ ] , we have E [ p ∗ ] = o (1) (65)Substituting (64) and (65) into (44), we obtain the simplifiedPDE: qλτ + V q λ + V e α = 0 . Using of [32], we obtainthe following solution for this case: V ( q, e ) = 12 α τ e − eqλατ + φ (cid:18) q − λα e (cid:19) (66)From (63), we require that φ (cid:48) (cid:16) − λα e (cid:17) = (cid:16) − λα e (cid:17) (cid:0) − λ τ (cid:1) , ∀ e .We choose φ ( x ) = x (cid:0) − λ τ (cid:1) . Therefore, the final solutionis given in (26). Furthermore, the solution for e ≥ e th is thesame as (57). Following the same procedure in Appendix E, itcan be verified that the three conditions in (20) are satisfied. B. Verification of the Admissibility of Ω v ∗ under the Small-Data-Arrival-Energy-Sufficient Regime Note that when e < e th , under the solution in (26), wehave that ∂V ( q,e ) ∂e = 0 for all q, e , which results in p ∗ = eτ .Following the same proof as in part C in Appendix E, we canprove the negative data queue drift. A PPENDIX J: P ROOF OF T HEOREM Q (0) , for thefollowing case 1 ( E (0) > e th ) and case 2 ( E (0) < e th ), wehave negative data queue drift. Case 1, E (0) > e th : In this case, the greedy policy p ∗ ( n ) = E ( n ) τ is adopted for all different asymptotic scenar-ios. Based on the energy queue dynamics in (4), we have p ∗ ( n ) = min { α ( n − , N E τ } for n ≥ . We then calculate theone step queue drift as follows: for sufficiently large Q , E (cid:2) Q ( n + 1) − Q ( n ) (cid:12)(cid:12) Q ( n ) = Q, E ( n ) = E (cid:3) = E (cid:104)(cid:2) Q − log (cid:0) | h | E/τ (cid:1) τ (cid:3) + + λτ − Q (cid:105) ( a ) = E (cid:2) − log (cid:0) | h | E/τ (cid:1) τ + λτ (cid:3) (67)where (a) is due to the fact that for given E and sufficientlylarge Q , we have Pr (cid:2) Q > log (cid:0) | h | E (cid:1) τ (cid:3) = Pr (cid:2) | h | < exp( Q ) − e/τ (cid:3) > − δ ( ∀ δ > ). In (67), if α ( n − < N E τ ,we have E (cid:2) − log (cid:0) | h | E/τ (cid:1) τ + λτ (cid:3) = E (cid:2) − log (cid:0) | h | α (cid:1) τ + λτ (cid:3) = (cid:0) λ − E (cid:2) exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:3)(cid:1) τ ( b ) < , where (b)is due to (29). If α ( n − > N E , we have E (cid:2) − log (cid:0) | h | E/τ (cid:1) τ + λτ (cid:3) = E (cid:2) − log (cid:0) | h | N E /τ (cid:1) τ + λτ (cid:3) ( c ) ≤ E (cid:2) − log (cid:0) | h | α (cid:1) τ + λτ (cid:3) = (cid:0) λ − exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:1) τ ( d ) < ,where (c) is due to N E ≥ N ατ > ατ and (d) is due to α ≤ E (cid:2) exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:3) < exp (cid:0) α (cid:1) E (cid:0) α (cid:1) . Hence, we havenegative drift. Case 2, E (0) < e th : In this case, we show that there existssome positive integer n < N such that the n -step queuedrift in the discrete time queueing system is negative. Since E (0) < e th and Q (0) is sufficiently large, the data queue willnot transmit in the beginning. For given α , after (cid:100) e th − E (0) α (cid:101) number of time slots where (cid:100) x (cid:101) is the ceiling function, thedata queue will adopt the greedy policy to transmit. To provethe existence of n , it is sufficient to prove that E (cid:20)(cid:18) N − (cid:100) e th − E (0) α (cid:101) (cid:19) (cid:18) exp (cid:18) α (cid:19) E (cid:18) α (cid:19)(cid:19)(cid:21) > E [ λN ] (68)where the L.H.S. (R.H.S.) means the departure bits(arrival bits) before the end of the next changeevent of the energy arrival rate. From (68), we have (68) ⇐ E (cid:104)(cid:16) − N (cid:100) e th − E (0) α (cid:101) (cid:17) (cid:0) exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:1)(cid:105) > λ ( e ) ⇐ E (cid:0) exp (cid:0) α (cid:1) E (cid:0) α (cid:1)(cid:1) ( f ) > λ , where ( e ) holds for large N and ( f ) holds due to (29). Therefore, we have negative drift forthis case. Based on the Lyapunov theory [33], [34], negativestate drift for both cases leads to the stability of Q ( n ) , i.e., lim n →∞ E (cid:2) Q ( n ) (cid:3) < ∞ .R EFERENCES[1] C. K. Ho and R. Zhang, “Optimal energy allocation for wireless com-munications with energy harvesting constraints,” IEEE Trans. SignalProcess. , vol, 60, no. 9, pp. 4808–4818, Sept. 2012.[2] G. Yang, V. Y. F. Tan, C. K. Ho, S. H. Ting, and Y. L. Guan, “Wirelesscompressive sensing for energy harvesting sensor nodes,” IEEE Trans.Signal Process. , vol, 61, no. 18, pp. 4491–4505, Sept. 2013. [3] B. K. Chalise, W.-K. Ma, Y. D. Zhang, H. A. Suraweera, and M. G. Amin,“Optimum performance boundaries of OSTBC based AF-MIMO relaysystem with energy harvesting receiver,” IEEE Trans. Signal Process. ,vol, 61, no. 17, pp. 4199–4213, Sept. 2013.[4] H. Huang and V. K. N. Lau, “Decentralized delay optimal controlfor interference networks with limited renewable energy storage,” IEEETrans. Signal Process. , vol. 60, no. 5, pp. 2552–2561, May 2012.[5] J. Yang and S. Ulukus, “Optimal packet scheduling in an energy harvest-ing communication system,” IEEE Trans. Commun. , vol. 60, pp. 220–230,Jan. 2012.[6] J. Yang, O. Ozel, and S. Ulukus, “Broadcasting with an energy harvestingrechargeable transmitter,” IEEE Trans. Wireless Commun. , vol. 11, pp.571–583, Feb. 2012.[7] K. Tutuncuoglu and A. Yener, “Sum-rate optimal power policies forenergy harvesting transmitters in an interference channel,” J. Commun.Netw. , vol. 14, no. 2, pp. 151–161, Apr. 2012.[8] K. Tutuncuoglu and A. Yener, “Optimum transmission policies for batterylimited energy harvesting nodes,” IEEE Trans. Wireless Commun. , vol. 11,no. 3, pp. 1180–1189, Mar. 2012.[9] R. Srivastava and C. E. Koksal, “Basic performance limits and tradeoffsin energy-harvesting sensor nodes with finite data and energy storage,” IEEE/ACM Trans. Netw , Oct. 2012.[10] V. Sharma, U. Mukherji, V. Joseph, and S. Gupta, “Optimal energymanagement policies for energy harvesting sensor nodes,” IEEE Trans.Wireless Commun. , vol. 9, no. 4, pp. 1326–1336, Apr. 2010.[11] L. Huang and M. J. Neely, “Utility optimal scheduling in energyharvesting networks,” in Proc. Mobihoc , 2011.[12] M. Gatzianas, L. Georgiadis, and L. Tassiulas, “Control of wirelessnetworks with rechargeable batteries,” IEEE Trans. Wireless Commun. ,vol. 9, no. 2, pp. 581–593, Feb. 2010.[13] A.S. Leong, S. Dey, G. N. Nair, and P. Sharma, “Power allocationfor outage minimization in state estimation over fading channels,” IEEETrans. Signal Process. , vol. 59, no. 7, pp. 3382–3397, Jul. 2011.[14] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Boston, MA: Athena Scientific, 2005.[15] O. Ozel, K. Tutuncuoglu, J. Yang, S. Ulukus, and A. Yener, “Adaptivetransmission policies for energy harvesting wireless nodes in fadingchannels,” in Proc. CISS , Baltimore, Mar. 2011.[16] J. Lei, R. Yates, and L. Greenstein, “A generic model for optimizingsingle-hop transmission policy of replenishable sensors,” IEEE Trans.Wireless Commun. , vol. 8, no. 2, pp. 547–551, Feb. 2009.[17] A. Seyedi and B. Sikdar, “Energy efficient transmission strategies forbody sensor networks with energy harvesting,” IEEE Trans. Commun. ,vol. 58, no. 7, 2116–2126, Jul. 2010.[18] H. Li, N. Jaggi, and B. Sikdar, “Relay scheduling for cooperativecommunications in sensor networks with energy harvesting,” IEEE Trans.Commun. , vol. 10, no. 9, pp. 2918–2928, Sept. 2011.[19] A. Aprem, C. R. Murthy, and N. B. Mehta, “Transmit power controlpolicies for energy harvesting sensors with retransmissions,” IEEE J. Sel.Topics Signal Process. , vol. 7, no. 5, pp. 895–906, Oct. 2013.[20] N. Jaggi, K. Kar, and A. Krishnamurthy, “Rechargeable sensor activationunder temporally correlated events,” Wireless Networks , vol. 15, no. 5,pp. 619–635, Jul. 2009.[21] P. Blasco, D. G ¨ und ¨ uz, and M. Dohler, “A learning theoretic approachto energy harvesting communication system optimization,” IEEE Trans.Wireless Commun. , vol. 12, no. 4, pp. 1872–1882, Apr. 2013.[22] M. Gorlatova, A. Bernstein, and G. Zussman, “Performance evaluationof resource allocation policies for energy harvesting devices,” in Proc.IEEE Int’l Symp. Modeling and Optimization in Mobile, Ad Hoc andWireless Networks , May 2011.[23] M. Gorlatova, A. Wallwater, and G. Zussman, “Networking low-powerenergy harvesting devices: Measurements and algorithms,” IEEE Trans.Mobile Comput. , vol. 12, no. 9, pp. 1853–1865, Sept. 2013.[24] A. Sinha, P. Chaporkar, “Optimal power allocation for a renewableenergy source,” in Proc. 2012 National Conf. Commun. , pp. 1–5, Feb.2012.[25] Y. Cui, V. K. N. Lau, R. Wang, H. Huang, S. Zhang, “A surveyon delay-aware resource control for wireless systemsLarge derivationtheory, stochastic Lyapunov drift and distributed stochastic learning,” IEEE Trans. Info. Theory , vol. 58, no. 3, pp. 1677–1700, Mar. 2012.[26] Z. Han, Z. Ji, and K. J. R. Liu, “Non-cooperative resource competitiongame by virtual referee in multi-cell OFDMA networks,” IEEE J. Sel.Areas Commun. , vol. 25, no. 6, pp. 1079–1090, Aug. 2007.[27] S. Teleke, et al. “Rule-based control of battery energy storage fordispatching intermittent renewable sources,” IEEE Trans. Sustain. Energy ,vol. 1, no. 3, pp. 117-124, Oct. 2010. [28] R. Durrett, Probability: Theory and Examples. Vol. 3. Cambridgeuniversity press, 2010.[29] C. Park and P. H. Chou, “Ambimax: Autonomous energy harvestingplatform for multi-supply wireless sensor nodes,” in Proc. IEEE SECON ,pp. 168–177, Sept. 2006.[30] J. M. Harrison, Brownian Motion and Stochastic Flow Systems , WileyNew York, 1985.[31] K. Ross, “Stochastic control in continuous time,” Lecture Notes onContinuous Time Stochastic Control , Spring 2008.[32] A. D. Polyanin, V. F. Zaitsev, and A. Moussiaux, Handbook of FirstOrder Partial Differential Equations, 2nd ed. Taylor & Francis, 2002.[33] M. J. Neely, “Energy optimal control for time-varying wireless net-works,” IEEE Trans. Inf. Theory , vol. 52, no. 7, pp. 2915–2934, Jul.2006.[34] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic power allocationand routing for time-varying wireless networks,”