Energy Management for Energy Harvesting Wireless Sensors with Adaptive Retransmission
Animesh Yadav, Mathew Goonewardena, Wessam Ajib, Octavia A. Dobre, Halima Elbiaze
Animesh Yadav, Member, IEEE, Mathew Goonewardena, Student Member, IEEE, Wessam Ajib, Senior Member, IEEE, Octavia A. Dobre, Senior Member, IEEE, and Halima Elbiaze, Member, IEEE
Abstract
This paper analyzes the communication between two energy harvesting wireless sensor nodes. The nodes use automatic repeat request and forward error correction mechanisms for error control. The random nature of the available energy and of the harvested energy arrivals may interrupt the signal sampling and decoding operations. We propose a selective sampling scheme in which the length of the transmitted packet to be sampled depends on the energy available at the receiver. The receiver performs the decoding when complete samples of the packet are available. The selective sampling information bits are piggybacked on the automatic repeat request messages for use by the transmitter. In this way, the receiver node manages its energy use more efficiently. In addition, we present a partially observable Markov decision process formulation, which minimizes the long-term average pairwise error probability and optimizes the transmit power. Optimal and suboptimal power assignment strategies are introduced for retransmissions, which are adapted to the selective sampling and channel state information. With a finite battery size and a fixed power assignment policy, an analytical expression for the average packet drop probability is derived. Numerical simulations show the performance gain of the proposed scheme with the power assignment strategy over the conventional scheme.
Part of this paper was published in the Proceedings of the IEEE International Conference on Communications (IEEE ICC 2015), London, UK, 8-12 June 2015. A. Yadav and O. A. Dobre are with the Faculty of Engineering and Applied Science, Memorial University, St. John's, NL, Canada (e-mail: {animeshy, odobre}@mun.ca). W. Ajib and H. Elbiaze are with the Department of Computer Science, Université du Québec à Montréal (UQAM), Montreal, QC, Canada (e-mail: {ajib.wessam, elbiaze.halima}@uqam.ca). M. Goonewardena is with École de Technologie Supérieure (ÉTS), Montréal, QC, Canada (e-mail: [email protected]).
Index Terms
Wireless sensor networks, energy harvesting, packet drop probability, partially observable Markov decision processes.
I. INTRODUCTION
The use of energy harvesting (EH) sources to power wireless communication systems has recently received considerable attention [1]–[7]. EH devices offer green communication and can operate autonomously over long periods of time. Because of these benefits, EH devices are also increasingly considered in wireless sensor networks (WSNs) to power the sensor nodes [8]–[15].

Sensor nodes are low-cost distributed devices which operate on minimal energy. They are prevalent in applications related to monitoring and controlling environments, especially remote and dangerous ones [16]. Usually, sensor nodes are powered by small-capacity non-renewable batteries and thus have a finite operational lifespan. Sensor nodes with EH capabilities are an alternative that increases the lifespan and lowers the maintenance cost. Energy can be harvested from the environment using, for instance, solar, vibration, or thermoelectric effects. Apart from EH, another practical way to increase the lifespan of the nodes is to use massive antenna arrays at the receiver nodes to mitigate the severe energy constraints imposed by the inexpensive transmitter nodes [17], [18].

Typically, the amount of energy arriving at the EH devices is random. Thus, for such nodes, the challenging objective is to manage the collected energy adequately so as to enable reliable and continuous operation. Recently, a considerable number of works on wireless networks powered solely by harvested energy have appeared [4]–[11] to address this objective. Although both transmitter and receiver nodes can harvest energy, the research has primarily focused on either the transmitter [8]–[11] or the receiver [5]–[7], [19]. There are many practical scenarios where both the transmitter and receiver nodes can harvest energy to increase their lifespan, such as a transmitter and multiple intermediate nodes in a multi-hop WSN, or multiple transmitter nodes communicating with a single sink node.
These scenarios are more challenging due to the presence of many random sources of energy.
Fewer works have considered the EH capability at both transmitter and receiver nodes simultaneously [4], [12]–[15]. In [4], the authors considered a static additive white Gaussian noise (AWGN) channel and used a rate-based utility as a function of both transmitter and receiver powers. They proposed a directional water-filling based power allocation policy in an offline setting. The problem of online power control for a wireless link with an automatic repeat request (ARQ) scheme is studied in [12]. The authors investigated three fixed policies under various assumptions, such as knowledge of the receiver battery availability at the transmitter node, and finite and infinite battery storage. In [13], the authors analyzed a wireless link which employs a type-II hybrid automatic repeat request (HARQ) scheme. They derived the packet drop probability (PDP) for predetermined transmit energy levels. In [14], the authors obtained a lower bound on the maximum achievable throughput and proposed a common threshold policy. In [15], we introduced an energy-aware adaptive retransmission scheme, where the receiver node performs the selective sampling (SS) and decoding operations based on the energy availability.

If forward error correction (FEC) coding is employed, the receiver node spends its energy predominantly on the sampling and decoding operations [20]. Moreover, for small distances, the transmit energy is often smaller than the energy needed for the decoding operation [21]. Nonetheless, because of the randomness in the amount of energy arriving, the receiver operations may be suspended, which leads to energy wastage. Thus, the receiver might prefer to sample a fraction of the full packet depending on the available energy [5], which we refer to as SS.
On the other hand, the transmitter with exact SS information (SSI) can retransmit only the portion of the packet that has not been sampled by the receiver.

Furthermore, the time-varying characteristics of the wireless channel and of the harvested energy might contribute to a higher packet error probability (PEP). Hence, the transmitter must adequately adapt the transmit power level to the channel state information (CSI), while meeting the energy causality constraint, to ensure a lower PEP. The causality constraint states that the cumulative energy used by a node cannot exceed its cumulative harvested energy at any given time. Furthermore, based on the SSI knowledge, the transmitter adapts the packet size to ensure an efficient utilization of the receiver energy. To provide the SSI to the transmitter, we resort to the acknowledgement (ACK)/negative acknowledgement (NAK) feedback messages of the ARQ protocol. Consequently, the retransmission scheme, which we denote by ACK/NAKx, needs some additional feedback messages.

In this paper, we consider a generic communication between two EH wireless nodes with the aforementioned retransmission protocol. A decision-theoretic approach is used to find the optimal transmit power strategy. First, the problem is formulated as a partially observable Markov decision process (POMDP), which is a suitable approach for formulating problems that require sequential decision making in a stochastic setting when some of the system states are unknown [22]. We solve the POMDP problem using the value iteration method by computing the value function for the belief of the unknown state. Since sensor nodes have limited memory and computational capabilities, we also propose a suboptimal greedy power assignment method with lower computational complexity.

The outline of this paper is as follows. The system model is presented in Section II. The adaptive retransmission scheme is detailed in Section III.
Section IV introduces the optimal and suboptimal methods for allocating the power over the time slots. An analytical upper bound on the PDP is derived in Section V. Simulated numerical results and discussions are presented in Section VI, followed by conclusions in Section VII. A list of the symbols used in this paper, with their descriptions, is given in Table I.

Fig. 1: Time-slotted packet transmission time line at the transmitter node, showing slots of duration $T_s$ grouped into frames of duration $T_f$ with retransmission indices $k = 1, 2, 3$. Markers at the slot boundaries denote the ARQ messages, and arrows denote EH arrival events. Shaded areas in a slot denote the amount of energy used for transmission.

II. SYSTEM MODEL
A. Transmission Model
We consider a point-to-point communication between two EH wireless sensor nodes. Sensor nodes have limited-capacity rechargeable batteries, which are charged by renewable energy sources. In the considered model, when a transmitted packet is erroneously decoded, the receiver requests its retransmission. A maximum of $K \in \mathbb{Z}$ retransmission requests are permitted. A packet consists of $c$ information bits, taken from the data buffer, encoded with an $(m, c)$ FEC code (e.g., a convolutional code), and then modulated through an $M$-ary quadrature amplitude modulation, where $M$ denotes the cardinality of the constellation. This forms a packet of length $\lceil m/\log_2 M \rceil$ symbols, where $\lceil \cdot \rceil$ is the ceiling operator.

The CSI and SSI are known at the transmitter through the ARQ feedback messages. Each sensor node is aware of its own battery state information (BSI), but not of the BSI of the other node. However, the transmitter estimates the one-slot-delayed BSI of the receiver via the SSI.

A discrete time-slotted model is considered, as depicted in Fig. 1. Each time slot has a duration of $T_s$ seconds and is indexed as $t \in \{1, 2, \ldots\}$. The packet transmission and the corresponding ARQ message reception are completed within a slot, i.e., the round-trip time is $T_s$. Several slots constitute a frame of duration $T_f$. A frame has a variable duration that depends on the number of slots used in the packet transmission, including retransmissions. Thus, the minimum and maximum values of the frame duration are $T_s$ and $K T_s$, respectively. After a maximum of $K$ unsuccessful attempts, the transmitter drops the packet and chooses a new one to transmit.

Without loss of generality, the system model considered here can be extended to generic short-range communication systems involving different modulation formats, sophisticated channel coding methods, and transmission strategies relying on multiple antennas and sub-carrier techniques.

B. Energy Consumption Model
The transmitter and receiver sensor nodes spend energy to transmit and retrieve the information bits, respectively. For short-range communication, the energy consumption in a wireless link can be broken down into two dominant factors [20]: the power consumed by the power amplifier, $P_{\mathrm{PA}}$, at the transmitter, and the power consumed by the circuit blocks at both the transmitter and the receiver. The circuit blocks of the transmitter consist of a digital-to-analog converter, mixers, active filters, and frequency synthesizers, while those of the receiver consist mainly of a low noise amplifier, an intermediate frequency amplifier, active filters, an analog-to-digital converter, and a frequency synthesizer. Further, for coded systems, the energy expended in the decoding operation at the receiver needs to be included [20], [21]. Thus, for a coded system, the total power expenditure at time slot $t$ at the transmitter and receiver nodes is, respectively, given as

$$P_{\mathrm{Tx}} = \underbrace{(1 + \alpha) P_{\mathrm{out}}}_{P_{\mathrm{PA}}} + P_{\mathrm{C,Tx}}, \tag{1}$$

$$P_{\mathrm{Rx}} = P_{\mathrm{dec}} + P_{\mathrm{C,Rx}} + P_{\mathrm{fb}}, \tag{2}$$

where $P_{\mathrm{out}}$ is the transmit power and $\alpha = (\xi/\eta) - 1$, with $\eta$ as the drain efficiency and $\xi$ as the peak-to-average power ratio. $P_{\mathrm{C,Tx}}$ and $P_{\mathrm{C,Rx}}$ are the total power spent in the circuit blocks of the transmitter and receiver, respectively. The power consumed in transmitting the ARQ messages is denoted by $P_{\mathrm{fb}}$. $P_{\mathrm{dec}}$ denotes the power used in the decoding operation and is ignored for an uncoded system. Typical values of $P_{\mathrm{dec}}$ are around 70-80% of the power dissipated in the circuit blocks [21]. The index $t$ is dropped in (1) and (2) to simplify the presentation.

C. Energy Harvesting Model
The transmitter and receiver nodes are connected to two separate but similar renewable EH sources. In particular, two independent and identically distributed (i.i.d.) Bernoulli random processes are considered to model the energy arrivals, similar to [23]. The Bernoulli model is tractable and captures the intermittent and irregular behavior of the energy arrivals. It is worth mentioning that this work is, in essence, independent of the energy arrival process; this will be shown later in the simulation results, where the compound Poisson arrival model [24] is used as well. At the start of every slot, $E^{h}_{\mathrm{Tx}}$ Joule (J) is harvested at the transmitter with probability (w.p.) $\rho_{\mathrm{Tx}}$, and zero J w.p. $1 - \rho_{\mathrm{Tx}}$. The receiver node follows a similar energy arrival process with probability $\rho_{\mathrm{Rx}}$ and amount $E^{h}_{\mathrm{Rx}}$.

When the two nodes are in close vicinity and have the same type of harvesting source, the two EH processes are spatially correlated. In this case, the harvested energy pairs at time slot $t$ are given as [12]

$$(E^{t}_{\mathrm{Tx}}, E^{t}_{\mathrm{Rx}}) = \begin{cases} (0, 0) & \text{w.p. } p_{00}, \\ (0, E^{h}_{\mathrm{Rx}}) & \text{w.p. } p_{01}, \\ (E^{h}_{\mathrm{Tx}}, 0) & \text{w.p. } p_{10}, \\ (E^{h}_{\mathrm{Tx}}, E^{h}_{\mathrm{Rx}}) & \text{w.p. } p_{11}, \end{cases} \tag{3}$$

with the condition that $p_{00} + p_{01} + p_{10} + p_{11} = 1$. For example, when $p_{00} = p_{01} = p_{10} = p_{11} = p = 0.25$, both nodes harvest energy independently of each other. When $p_{01} = p_{10} = 0$ and $p_{00} = p_{11} = 0.5$, the harvested energies are highly correlated.

Let $B^{t}_{\mathrm{Tx}}$ and $B^{t}_{\mathrm{Rx}}$ denote the energy levels of the batteries at the start of the $t$th time slot, and let $E^{t}_{\mathrm{Tx}} = P_{\mathrm{Tx}} T_s$ and $E^{t}_{\mathrm{Rx}} = P_{\mathrm{Rx}} T_s$ denote the energy consumed in transmitting a packet at the transmitter, and in the sampling and decoding operations at the receiver, respectively. The battery level at the transmitter follows the Markovian evolution

$$B^{t+1}_{\mathrm{Tx}} = \begin{cases} \min\{B^{t}_{\mathrm{Tx}} + E^{h}_{\mathrm{Tx}} - E^{t}_{\mathrm{Tx}},\, B^{\max}_{\mathrm{Tx}}\}, & \text{w.p. } \rho_{\mathrm{Tx}}, \\ B^{t}_{\mathrm{Tx}} - E^{t}_{\mathrm{Tx}}, & \text{w.p. } 1 - \rho_{\mathrm{Tx}}, \end{cases} \tag{4}$$

where $B^{\max}_{\mathrm{Tx}}$ denotes the battery capacity of the transmitter node. Replacing the subscript Tx in (4) with Rx gives the receiver-side battery evolution.
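To make (1), (2), and (4) concrete, the following minimal Python sketch evaluates the two power budgets and simulates the transmitter battery under Bernoulli arrivals. The function names, the fixed "spend one unit whenever possible" rule in `simulate`, and all numerical values are illustrative assumptions, not part of the paper; energies are treated as non-negative integers in normalized units.

```python
import random

def tx_power(p_out, xi, eta, p_c_tx):
    """Transmitter power (1): P_Tx = (1 + alpha) * P_out + P_C,Tx."""
    alpha = xi / eta - 1.0
    return (1.0 + alpha) * p_out + p_c_tx

def rx_power(p_dec, p_c_rx, p_fb):
    """Receiver power (2): P_Rx = P_dec + P_C,Rx + P_fb."""
    return p_dec + p_c_rx + p_fb

def battery_step(b, e_used, e_harvest, b_max, rho, rng):
    """One slot of the battery evolution (4)."""
    if not 0 <= e_used <= b:
        raise ValueError("cannot spend more than the stored energy")
    if rng.random() < rho:                 # energy arrival w.p. rho
        return min(b + e_harvest - e_used, b_max)
    return b - e_used                      # no arrival w.p. 1 - rho

def simulate(slots, b0, e_harvest, b_max, rho, seed=0):
    """Hypothetical policy: spend one unit per slot whenever available."""
    rng = random.Random(seed)
    levels = [b0]
    for _ in range(slots):
        b = levels[-1]
        levels.append(battery_step(b, min(b, 1), e_harvest, b_max, rho, rng))
    return levels

# Illustrative values only (watts); not taken from the paper.
p_tx = tx_power(p_out=0.01, xi=1.0, eta=0.35, p_c_tx=0.05)
p_rx = rx_power(p_dec=0.04, p_c_rx=0.05, p_fb=0.001)
trajectory = simulate(slots=20, b0=0, e_harvest=2, b_max=4, rho=0.5)
```

The clipping by `min` in `battery_step` mirrors the finite battery capacity in (4); the check on `e_used` is one simple way to enforce energy causality in a simulation.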
For simplicity of presentation, the energies are normalized by the minimum possible energies, i.e., $E^{\min}_{\mathrm{Tx}}$ and $E^{\min}_{\mathrm{Rx}}$, that are spent in transmitting and receiving a packet of the smallest size, respectively. Consequently, the transmitter energy level is an integer multiple of $E^{\min}_{\mathrm{Tx}}$, and the change in the battery state whenever harvesting takes place is $L_{\mathrm{Tx}} \triangleq E^{h}_{\mathrm{Tx}}/E^{\min}_{\mathrm{Tx}}$. Similarly, the battery energy level at the receiver side is an integer multiple of $E^{\min}_{\mathrm{Rx}}$, and the EH amount is $L_{\mathrm{Rx}} \triangleq E^{h}_{\mathrm{Rx}}/E^{\min}_{\mathrm{Rx}}$.

D. Channel Model
The wireless channel from the transmitter to the receiver is assumed to be Rayleigh faded and is modeled as a finite state Markov chain (FSMC) [25], [26]. This model captures the main features of fading channels and approximates the fading as a discrete-time Markov process. Essentially, all possible fading gains are modeled as a finite set of discrete channel states. The FSMC channel is described as follows: discrete states of the channel $\mathcal{G} = \{g_1, g_2, \ldots, g_{|\mathcal{G}|}\}$, state transition probabilities Ω = { p(g_j | g_i) : g < g_i, g_j

At low harvesting rates, the receiver might not perform the sampling and decoding operations together, or might perform only the sampling operation, in one time slot. In such time slots, the receiver performs SS and stores the samples, which can later be combined with the remaining parts of the packet to build a full packet. In addition, the receiver sends back the SSI via the ARQ messages. The transmitter then adapts the packet length for the next transmission. In the adaptive retransmission scheme, ARQ messages carry $\beta$ bits, where $\beta$ is an integer with $1 \le \beta \le \lceil m/\log_2 M \rceil$, and there is a total of $\mathrm{x} + 2$ messages, where

$$\mathrm{x} \in \begin{cases} \{0, 1, \ldots, \beta\}, & \beta > 1, \\ \emptyset, & \text{otherwise}. \end{cases} \tag{5}$$

For $\beta = 1$, the scheme becomes the conventional one. The additional x messages are essentially the SSI, which is carried back to the transmitter in the ARQ messages; henceforth, we refer to the adaptive retransmission scheme as ACK/NAKx. The details of each message are as follows:

ACK: The packet decoding at the receiver is successful. In reply, the transmitter chooses a new packet for transmission in the subsequent time slot.

NAK: The packet decoding is erroneous. In reply, the transmitter chooses the same packet for transmission in the subsequent time slot.
NAKx: $\lceil \mathrm{x}\, m/(\beta \log_2 M) \rceil$ symbols of the transmitted packet are sampled and the rest is discarded due to the lack of energy. In reply, in the subsequent time slot, the transmitter sends a packet with the remaining $\lceil m(\beta - \mathrm{x})/(\beta \log_2 M) \rceil$ symbols.

TABLE I: List of symbols

| Symbol | Description |
| $K$ and $k$ | Maximum number of retransmissions and its index |
| $c$ and $m$ | Number of information and coded bits |
| $M$ and $R_c$ | Modulation order and code rate |
| $T_s$ and $T_f$ | Slot duration and frame duration |
| $E^{t}_{\mathrm{Tx}}$ and $E^{t}_{\mathrm{Rx}}$ | Transmitter and receiver energy expenditure at time slot $t$ |
| $P_{\mathrm{dec}}$ and $P_{\mathrm{fb}}$ | Decoding and ARQ message transmit power expenditure at the receiver |
| $P_{\mathrm{out}}$ and $P_{\mathrm{PA}}$ | Transmit power and power amplifier output power |
| $P_{\mathrm{C,Tx}}$ and $P_{\mathrm{C,Rx}}$ | Transmitter and receiver circuit block power |
| $\eta$ and $\xi$ | Amplifier drain efficiency and peak-to-average power ratio (PAPR) |
| $B^{\max}_{\mathrm{Tx}}$ and $B^{\max}_{\mathrm{Rx}}$ | Maximum battery size of the transmitter and receiver nodes |
| $E^{h}_{\mathrm{Tx}}$ and $E^{h}_{\mathrm{Rx}}$ | Transmitter and receiver EH amounts |
| $E^{\min}_{\mathrm{Tx}}$ and $E^{\min}_{\mathrm{Rx}}$ | Minimum energy to transmit and receive a packet of minimum size |
| $L_{\mathrm{Tx}}$ and $L_{\mathrm{Rx}}$ | $E^{h}_{\mathrm{Tx}}/E^{\min}_{\mathrm{Tx}}$ and $E^{h}_{\mathrm{Rx}}/E^{\min}_{\mathrm{Rx}}$ |
| $\mathcal{G}$, $|\mathcal{G}|$, $\gamma_i$, and $G_t$ | Set of discrete channel states, its cardinality, $i$th interval fading gain, and channel state at time slot $t$ |
| $g_i$ and $\omega_o(g_i)$ | Channel state of interval $[\gamma_{i-1}, \gamma_i)$ and its steady state probability |
| $\beta$ and x | Number of divisions of a packet and number of additional ARQ messages |
| $\mathcal{S}$, $|\mathcal{S}|$, and $S_t$ | System state space, its cardinality, and state at time slot $t$ |
| $\mathcal{B}_{\mathrm{Tx}}$, $\mathcal{B}_{\mathrm{Rx}}$, $|\mathcal{B}_{\mathrm{Tx}}|$, and $|\mathcal{B}_{\mathrm{Rx}}|$ | Transmitter and receiver battery state spaces and their cardinalities |
| $\mathcal{U}_{S_t}$, $|\mathcal{U}_{S_t}|$, $a_t$, and $B_{S_t}$ | Action space, its cardinality, action, and maximum value of an action at time slot $t$ |
| $\mathcal{Z}$, $|\mathcal{Z}|$, and $Z_t$ | Observation state space, its cardinality, and observation state at time slot $t$ |
| ACK, NAK, NAKx | ARQ feedback messages |
| $P_e(g, a_t)$ | Packet error probability with energy $a E^{\min}_{\mathrm{Tx}}$ via channel state $g$ |
| $P(d, g, a_t)$ | Modulation-dependent bit error probability |
| $P_{\mathrm{err}}$ | Approximate packet error probability after NAK or decoding failure |
| $A_d$ and $d_{\mathrm{free}}$ | Weight spectral coefficient and free distance of the convolutional code |
| $r(s, a_t)$ | Cost function of state $S_t = s$ after taking action $a$ |
| $\rho_{\mathrm{Tx}}$ and $\rho_{\mathrm{Rx}}$ | EH probabilities of the transmitter and receiver nodes |
| $\pi$ and $J_\pi(S)$ | Transmitter policy and total expected cost with given start state $S$ |
| $\varpi(G_t)$ | Belief of channel state $G_t$ at time slot $t$ |
| $P_{\mathrm{drop}}$ and $\bar{P}_{\mathrm{drop}}$ | PDP and average PDP |
| $\psi(i, j)$ | Stationary probability distribution with transmitter and receiver energies $(i E^{\min}_{\mathrm{Tx}}, j E^{\min}_{\mathrm{Rx}})$ |
| $\Xi^{q,r,w,y}_{i,j,z,k}$ | Transition probability of going from state $(i, j, z, k)$ to state $(q, r, w, y)$ |

Note that the message NAK is different from NAK0, which corresponds to the case when the receiver does not have enough energy to sample even the smallest fraction of the transmitted packet. However, in both cases, the transmitter retransmits the full packet. As in the conventional scheme, the ACK/NAKx messages help the transmitter node estimate the CSI. Furthermore, it is now evident that the SS is a function of the energy available at the receiver. Based on the chosen value of $\beta$, the receiver selects $\lceil \mathrm{x}\, m/(\beta \log_2 M) \rceil$ symbols for sampling in any given time slot, where x is related to the energy available at the receiver. The transmitter sends the remaining $\lceil m(\beta - \mathrm{x})/(\beta \log_2 M) \rceil$ symbols after receiving the NAKx message from the receiver. The length of the transmitted packet depends on the energy available at the transmitter. For example, if $\beta = 4$, the variable x takes values from $\{0, 1, 2, 3, 4\}$. For $\mathrm{x} = 3$ and if $B_{\mathrm{Rx}} \ge \mathrm{x} E_{\mathrm{C,Rx}}/\beta = 3 E_{\mathrm{C,Rx}}/4$, the receiver samples $\lceil \mathrm{x}\, m/(\beta \log_2 M) \rceil = \lceil 0.75\, m/\log_2 M \rceil$ symbols and selects NAK3 to feed back to the transmitter. At the transmitter, if $B_{\mathrm{Tx}} \ge E^{\min}_{\mathrm{Tx}}$, then a part of the packet of size $\lceil m(\beta - \mathrm{x})/(\beta \log_2 M) \rceil = \lceil 0.25\, m/\log_2 M \rceil$ symbols is sent out; otherwise, there is no transmission.
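The symbol counts in the NAKx example above can be reproduced with a short Python sketch. The helper names are hypothetical, $M$ is assumed to be a power of two, and the rule in `largest_fraction` (pick the largest x the receiver battery can afford) is only one plausible reading of "x is related to the available energy"; energies are integers in normalized units.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)

def bits_per_symbol(M):
    """log2(M) for a power-of-two constellation size M (assumption)."""
    if M < 2 or M & (M - 1):
        raise ValueError("M must be a power of two")
    return M.bit_length() - 1

def split_packet(m, M, beta, x):
    """Symbols sampled by the receiver and symbols left to retransmit for NAKx."""
    if not 0 <= x <= beta:
        raise ValueError("x must lie in {0, ..., beta}")
    denom = beta * bits_per_symbol(M)
    sampled = ceil_div(x * m, denom)
    remaining = ceil_div(m * (beta - x), denom)
    return sampled, remaining

def largest_fraction(b_rx, e_c_rx, beta):
    """Assumed rule: largest x with b_rx >= x * e_c_rx / beta, capped at beta."""
    return min(beta, (b_rx * beta) // e_c_rx)

# Example: m = 200 coded bits, 4-QAM, beta = 4, receiver can afford x = 3.
x = largest_fraction(b_rx=3, e_c_rx=4, beta=4)
sampled, remaining = split_packet(m=200, M=4, beta=4, x=x)
```

With these illustrative numbers the full packet has 100 symbols, of which 75 are sampled and 25 remain for the next slot, matching the 0.75 and 0.25 fractions in the text.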
In another example, if $B_{\mathrm{Rx}} \ge (E_{\mathrm{C,Rx}} + E_{\mathrm{dec}})$, the receiver samples and decodes the packet and selects the NAK or ACK message depending on the decoding outcome. In the conventional retransmission scheme, if the receiver lacks the energy to sample the full packet, it samples the packet until the energy runs out, does not store the samples, and requests a retransmission of the packet, which requires the full amount of energy. In conclusion, the conventional retransmission scheme wastes energy when compared with the proposed adaptive one.

IV. POWER ASSIGNMENT STRATEGY

Considering the adaptive retransmission scheme, we formulate in this section the power assignment as a sequential decision problem and then discuss two methods for efficiently managing the harvested energy at the transmitter. At each time slot, the transmitter chooses the energy level that minimizes the average PEP. The decision is based on the retransmission index, the BSI, and the sequences of past observations and power assignments at the transmitter. After each transmission, the transmitter receives a feedback, referred to as the observation $\{\mathrm{ACK}, \mathrm{NAK}, \mathrm{NAKx}\}$, from the receiver. Based on the observation, the transmitter could also adapt the modulation and coding scheme. However, for the sake of tractability, we only consider the adaptation of the transmit packet size and power. The problem is considered over an infinite horizon.

A. Problem Formulation

We formulate the problem by defining the following components: a set of time slots $\mathcal{T} = \{1, 2, \ldots\}$ over which decisions are made, a set of system states $\mathcal{S}$, a set of transmitter BSI $\mathcal{B}_{\mathrm{Tx}}$, a set of FSMC channel states $\mathcal{G}$, a set of retransmission indices $\mathcal{K} = \{1, 2, \ldots, K\}$, a set of actions $\mathcal{U}$, a set of transition probabilities $\mathcal{P}$, a set of observations $\mathcal{Z}$, and a cost corresponding to every decision. Let $\mathcal{S} = \mathcal{B}_{\mathrm{Tx}} \times \mathcal{G} \times \mathcal{K} = \{(b_1, g_1, k_1), \ldots, (b_{|\mathcal{B}_{\mathrm{Tx}}|}, g_{|\mathcal{G}|}, k_{|\mathcal{K}|})\}$ denote the complete discrete state space of the system with a total of $|\mathcal{B}_{\mathrm{Tx}}| \times |\mathcal{G}| \times |\mathcal{K}|$ states, where $b$, $g$, and $k$ represent the transmitter battery state, channel state, and retransmission index state, respectively. The state of the system, the channel, and the observation at time slot $t$ are represented as $S_t \in \mathcal{S}$, $G_t \in \mathcal{G}$, and $Z_t \in \mathcal{Z}$, respectively. The retransmission index $k$ tracks the system state within each frame and is reset to one when its maximum value $K$ is reached or when an ACK is received, whichever comes first. Due to EH, the cardinality of the action set varies in every time slot. Hence, the set of actions at time slot $t$ is denoted by $\mathcal{U}_{s_t} \triangleq \{0, 1, 2, \ldots, B_{s_t}\}$, where $B_{s_t} \in \mathcal{B}_{\mathrm{Tx}}$ represents the battery level in the current state. This set consists of the feasible choices of energy levels for the transmission of a full packet. An action $a_t \in \mathcal{U}_{s_t}$ represents the energy level $a_t E^{\min}_{\mathrm{Tx}} = P_{\mathrm{out}} T_s$ in time slot $t$. For each action taken, the system receives an observation belonging to the set $\mathcal{Z}$. Note that a set of receiver BSI could be included in the system state space. Since the exact receiver BSI state is unknown to the transmitter, its value could be estimated using the ARQ messages, in the same way as the channel state. However, adding more unknown states to the system state space increases the complexity of solving the problem.

We first define the PEP used for the proposed adaptive retransmission scheme. Note that the packet can be transmitted in parts, as presented in Section III, and the receiver can decode the packet only when all the samples are available. Since the different parts of the packet have passed through different channel states, the part which has passed through the worst channel state dominates the decoding failure. To simplify, we assume that the full packet is transmitted through the worst channel state and that the receiver decodes it. Thus, the approximate PEP is that of the worst part.
Consequently, we use the following approximate PEP expression in the rest of the paper.

Definition 1: For $\beta > 1$, the PEP after the transmitter receives $Z_t = \mathrm{NAK}$ in the current time slot $t$ with retransmission index $K_t = k$ is approximated as

$$P_{\mathrm{err}}(G_t, a_t) \approx \max_{l} \{P_e(G_{t-l}, a_{t-k})\}, \quad 0 \le l \le k, \tag{6}$$

where $P_e(G_t, a_t)$ is the probability that a full packet transmitted in time slot $t$, with energy $a_t E^{\min}_{\mathrm{Tx}}$ via channel state $G_t = g$, is received in error, and $a_{t-k}$ corresponds to the energy level used by the transmitter when sending a new packet at the retransmission index $k = 1$. Through numerical simulations, we have verified that the approximated PEP approaches the simulated PEP; hence, the approximation (6) is reliable. Furthermore, for $\beta = 1$ the above expression becomes the same as $P_e(G_t, a_t)$.

The PEP is a function of the modulation type and the FEC code used. With a convolutional code, for example, the PEP is bounded as [27]

$$P_e(g, a) \le 1 - \Big(1 - \sum_{d = d_{\mathrm{free}}}^{m} A_d P(d, g, a)\Big)^{m}, \tag{7}$$

where $d_{\mathrm{free}}$ is the free distance and $A_d$ are the weight spectral coefficients of the convolutional code. $P(d, g, a)$ is the modulation-dependent bit error probability. For example, the bit error probability of binary phase-shift keying can be approximated by $P(d, g, a) \approx 0.5\, \mathrm{erfc}\big(\sqrt{d\, \tilde{\gamma}(g) P_{\mathrm{out}}/\sigma_n^2}\big)$, where $\sigma_n^2$ is the noise power, $\mathrm{erfc}(\cdot)$ is the complementary error function, and $\tilde{\gamma}(g)$ is the average power gain in channel state $g$, which can be found as $\tilde{\gamma}(g) = \big(\int_{\gamma_{i-1}}^{\gamma_i} \gamma\, p(\gamma)\, d\gamma\big) / \int_{\gamma_{i-1}}^{\gamma_i} p(\gamma)\, d\gamma$, where $p(\gamma)$ is the probability density function of $\gamma$, which is exponentially distributed. Here, we assume that the error detection code is able to find all remaining errors.

Now, we consider a system state at time $t$ as $S_t = (B^{t}_{\mathrm{Tx}} = b, G_t = g, K_t = k)$. Let the ARQ message at time $t$ be denoted by $Z_t$, where $Z_t = \mathrm{NAK}$ for a decoding failure, $Z_t = \mathrm{NAKx}$ for an incomplete transmission, and $Z_t = \mathrm{ACK}$ for a decoding success.
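As a numerical illustration of the bound (7) with the binary phase-shift keying approximation, the following Python sketch may help. It makes several assumptions beyond the text: the sum in (7) is truncated to the weights supplied, the summed bit error term is clipped at one so that the bound stays a probability, and the weight values in the usage lines are illustrative placeholders. The closed form in `mean_gain` follows from integrating the exponential density over one FSMC interval.

```python
import math

def bpsk_term(d, gain, p_out, noise_power):
    """P(d, g, a) ~ 0.5 * erfc(sqrt(d * gain * P_out / sigma_n^2))."""
    return 0.5 * math.erfc(math.sqrt(d * gain * p_out / noise_power))

def pep_upper_bound(weights, d_free, m, gain, p_out, noise_power):
    """Bound (7); weights[i] plays the role of A_d for d = d_free + i."""
    ber = sum(a_d * bpsk_term(d_free + i, gain, p_out, noise_power)
              for i, a_d in enumerate(weights))
    ber = min(ber, 1.0)          # clip so the result remains in [0, 1]
    return 1.0 - (1.0 - ber) ** m

def mean_gain(lo, hi, avg):
    """Average gain on [lo, hi) for an exponential density with mean avg."""
    num = (lo + avg) * math.exp(-lo / avg) - (hi + avg) * math.exp(-hi / avg)
    den = math.exp(-lo / avg) - math.exp(-hi / avg)
    return num / den

# Illustrative values only: placeholder weights, m = 100 coded bits.
weights = [1, 4, 12, 32, 80]
g_tilde = mean_gain(0.5, 1.5, 1.0)
low_power = pep_upper_bound(weights, 5, 100, g_tilde, 1.0, 1.0)
high_power = pep_upper_bound(weights, 5, 100, g_tilde, 2.0, 1.0)
```

Raising `p_out` lowers the bound, which is the monotonic behaviour that the power assignment in Section IV exploits.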
After an action $a_t$ is taken at time slot $t$, the current system state goes to a new state with the transition probability $p(S_{t+1} = s' \mid S_t = s, a_t)$ and is associated with a cost. Let $s = (b, g, k)$ be the current system state; then $r(S_t = s, a_t)$ is the cost, defined as

$$r(s, a_t) = P_{\mathrm{err}}(g, a_t), \quad a_t \le b,\ Z_t = \mathrm{NAK}. \tag{8}$$

The cost function is independent of the energy available at the receiver, as it is unknown at the transmitter.

At time slot $t$, the probability of transition from state $s = (b, g, k)$ to state $s' = (b', g', k')$ after taking an action $a_t$, similar to [11], is given as

$$p(s' \mid s, a_t) = \delta(k', k^{+})\, p(g' \mid g)\, \zeta(b', a_t, b, k, g), \tag{9}$$

where $k^{+} \triangleq (k \bmod K)\, \mathbb{1}_{Z_{t+1} \ne \mathrm{ACK}} + 1$, with the indicator function $\mathbb{1}_{A}$ equal to one if the event $A$ is true and to zero otherwise, and $\delta(\cdot, \cdot)$ is the Kronecker delta function. The term $\delta(k', k^{+})$ ensures that the transmission index increases by one at each state transition and is reset to one when the maximum number of retransmissions is reached or when $Z_{t+1} = \mathrm{ACK}$ is received, whichever occurs first. $\zeta(b', a_t, b, k, g)$ is the probability that the transmitter, with current channel state and retransmission state $(g, k)$, moves from battery state $b$ to another state $b'$ after taking an action $a_t$. For $k \ge 1$,

$$\zeta(b', a_t, b, k, g) = \eta(b', a_t, b) \times \begin{cases} \rho_{\mathrm{Rx}} P_{\mathrm{err}}(g, a_t), & Z_{t+1} = \mathrm{NAK},\ Z_t = \mathrm{NAK}, \mathrm{NAKx}, \\ \rho_{\mathrm{Rx}} (1 - P_{\mathrm{err}}(g, a_t)), & Z_{t+1} = \mathrm{ACK},\ Z_t = \mathrm{NAK}, \mathrm{NAKx}, \\ 1 - \rho_{\mathrm{Rx}}, & Z_{t+1} = \mathrm{NAKx},\ Z_t = \mathrm{NAK}, \mathrm{NAKx}, \end{cases} \tag{10}$$

where $\eta(b', a_t, b) \triangleq \rho_{\mathrm{Tx}}\, \delta(b', b + L_{\mathrm{Tx}} - a_t) + (1 - \rho_{\mathrm{Tx}})\, \delta(b', b - a_t)$.

At time slot $t$, the transmitter uses the history of both the observation sequence $\mathbf{z}^t$, i.e., the observations up to and including $Z_t$, whose initial element is an ACK, and the previously selected transmit powers $\mathbf{a}^{t-1}$, i.e., the actions up to and including $a_{t-1}$, to choose the transmit power $a_t$ from the set $\mathcal{U}_{S_t}$ of admissible power levels.
The transmit power is selected to minimize the total expected cost for the current and remaining packets:

$$a^{\star}_t \triangleq \arg\min_{a_t \in \mathcal{U}_{S_t}} E\Big\{ r(S_t, a_t) + \sum_{k = t+1}^{\infty} r(S_k, a^{\star}_k) \,\Big|\, \mathbf{z}^t, \mathbf{a}^{t-1} \Big\} \quad \text{for } t = 1, 2, \ldots, \tag{11}$$

where $a^{\star}_t$ and $a^{\star}_k$ denote the optimal transmit power assignments for the current time slot $t$ and for the future time slots $k \ge t + 1$, respectively, and $E\{\cdot\}$ is the expectation operator.

Let the policy $\pi : \mathcal{S} \to \mathcal{U}$ specify the rule for the selection of an action by the transmitter in a given time slot. Hence, a policy is basically a mapping between what happened in the past and what has to be done in the current state. To find a policy $\pi$ that minimizes the total expected cost, we cast (11) as an infinite-horizon Markov decision process (MDP) as

$$J_{\pi}(S) = \lim_{T \to \infty} \frac{1}{T} E\Big\{ \sum_{t=1}^{T} r(S_t, a_t) \,\Big|\, S, \mathbf{z}^t, \mathbf{a}^{t-1} \Big\}, \tag{12}$$

where $S$ is a known start state. The optimal policy $\pi^{\star}$ minimizes the expected long-term average cost given by (12). The optimal policies obtained in infinite-horizon MDP problems are often stationary and, hence, simpler to implement than those obtained in a finite-horizon MDP problem, which vary in each time slot. Furthermore, since the total system state space is countable and discrete, and $\mathcal{U}_{S_t}$ is finite for each $S_t \in \mathcal{S}$, there exists an optimal stationary deterministic policy $\pi^{\star}$ that minimizes the total expected cost.

B. Solution Methods

This section discusses the methods for solving the MDP considered in this work. The following Bellman equation [28] is used to solve (12):

$$\lambda^{\star} + h^{\star}(s) = \min_{a_t \in \mathcal{U}_s,\, a_t \le B_s} \Big[ r(s, a_t) + \sum_{s' \in \mathcal{S}} p(s' \mid s, a_t)\, h^{\star}(s') \Big], \tag{13}$$

where $\lambda^{\star}$ is the optimal cost and $h^{\star}(s)$ is an optimal differential cost or relative value function for each state $s \in \mathcal{S}$. The Bellman equation is a well-established and commonly used tool for solving sequential decision-making problems.
Interested readers may refer to [22], [28] for further insights. Let $\pi^{\star}(s)$ denote the solution of the MDP obtained via the value iteration algorithm.

In the formulation described above, the exact CSI is unknown at the transmitter when the decisions are made. Since one of the system state variables is only partially known, the problem at hand is commonly referred to as a POMDP [22]. Consequently, based on the observation history, a belief channel state space of the system is formed. It represents a sufficient statistic for the history of the previous actions and observations, and adequate actions can be chosen depending on the belief state. The belief channel state $\varpi(G_t) = p(G_t \mid \mathbf{z}^t, \mathbf{a}^{t-1})$ is defined as a probability distribution over all possible states conditioned on the history of previous actions and observations. The belief state at time slot $t$ can be obtained by expanding the inferred CSI distribution via the Bayes rule:

$$\varpi(G_t) = \sum_{j=1}^{|\mathcal{G}|} p(G_t \mid G_{t-1} = g_j, \mathbf{z}^t, \mathbf{a}^{t-1})\, p(G_{t-1} = g_j \mid \mathbf{z}^t, \mathbf{a}^{t-1}) = \sum_{j=1}^{|\mathcal{G}|} p(G_t \mid G_{t-1} = g_j)\, p(G_{t-1} = g_j \mid \mathbf{z}^t, \mathbf{a}^{t-1}), \tag{14}$$

where the second equality uses the Markov assumption on the CSI variation. Further, with some simple mathematical manipulations, the second factor in (14) can be written as

$$p(G_{t-1} \mid \mathbf{z}^t, \mathbf{a}^{t-1}) = \frac{p(Z_t \mid a_{t-1}, G_{t-1})\, p(G_{t-1} \mid \mathbf{z}^{t-1}, \mathbf{a}^{t-2})}{\sum_{l=1}^{|\mathcal{G}|} p(Z_t \mid a_{t-1}, G_{t-1} = g_l)\, p(G_{t-1} = g_l \mid \mathbf{z}^{t-1}, \mathbf{a}^{t-2})}, \tag{15}$$

where, for $G_t = g$,

$$p(Z_t \mid a_t, g) = \begin{cases} \rho_{\mathrm{Rx}} P_{\mathrm{err}}(g, a_t), & Z_t = \mathrm{NAK},\ a_t > 0, \\ \rho_{\mathrm{Rx}} (1 - P_{\mathrm{err}}(g, a_t)), & Z_t = \mathrm{ACK},\ a_t > 0, \\ 1 - \rho_{\mathrm{Rx}}, & Z_t = \mathrm{NAKx},\ a_t > 0, \\ p(Z_{t-1} \mid a_t, g), & a_t = 0. \end{cases} \tag{16}$$

Note that when $a_t = 0$, the transmitter is in energy outage; thus, no transmission takes place. Consequently, the acknowledged state remains the same as the previously received one, and so does the probability.

The POMDP can be solved using dynamic programming to find the optimal policy.
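For the fully observed MDP, a standard way to solve an average-cost Bellman equation such as (13) is relative value iteration. The sketch below is a deliberately reduced, hypothetical instance: the state is the battery level only (channel state and retransmission index are omitted), the cost `exp(-a)` is a stand-in for $P_{\mathrm{err}}$, and the parameter values are illustrative. It is not the authors' implementation.

```python
import math

def transitions(b, a, L, b_max, rho):
    """Battery-only next-state distribution after spending a units."""
    nxt = {}
    for b2, p in ((min(b + L - a, b_max), rho), (b - a, 1.0 - rho)):
        nxt[b2] = nxt.get(b2, 0.0) + p
    return nxt

def relative_value_iteration(b_max, L, rho, cost, iters=2000, tol=1e-10):
    """Return (average cost estimate, relative values h, greedy policy)."""
    states = range(b_max + 1)
    h = [0.0] * (b_max + 1)
    lam = 0.0

    def q(b, a):
        # Immediate cost plus expected relative value of the next state.
        return cost(b, a) + sum(p * h[b2]
                                for b2, p in transitions(b, a, L, b_max, rho).items())

    for _ in range(iters):
        t = [min(q(b, a) for a in range(b + 1)) for b in states]
        lam = t[0]                          # reference state: empty battery
        h_new = [v - lam for v in t]
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break
    policy = [min(range(b + 1), key=lambda a, b=b: q(b, a)) for b in states]
    return lam, h, policy

# Hypothetical cost: error probability proxy that decays with spent energy.
lam, h, policy = relative_value_iteration(
    b_max=4, L=2, rho=0.5, cost=lambda b, a: math.exp(-a))
```

Here `lam` plays the role of $\lambda^{\star}$ and `h` of $h^{\star}(s)$ in (13), up to the numerical tolerance and the iteration cap; the resulting `policy` maps each battery level to an energy level no larger than that level.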
However, solving the POMDP optimally is computationally infeasible for systems with more than 15 states in total [29]. In our case, the total number of system states is large, and hence, the optimal solution is not presented. POMDPs are PSPACE-complete, i.e., they have high computational complexity and require large memory that grows exponentially with the horizon [30]. Furthermore, PSPACE-complete problems are at least as hard as NP-complete problems and are widely believed to be strictly harder. However, many heuristics exist to find suboptimal policies, e.g., the maximum-likelihood policy heuristic (MLPH) [31].

C. Proposed Solutions

1) MLPH Power Assignment: We solve the problem in (13) using the MLPH method. In this approach, we first determine the state that the channel is most likely in, i.e.,

\[ g_{\mathrm{ML}} = \arg\max_{G_t \in \mathcal{G}} \varpi(G_t). \tag{17} \]

With $g_{\mathrm{ML}}$ as the most likely channel state at the $t$th time slot, the corresponding ML state is denoted as $s_{\mathrm{ML}} = (b, g_{\mathrm{ML}}, k)$. Then, the transmit power policy is set as

\[ a_t \triangleq \pi^\star(s_{\mathrm{ML}}). \tag{18} \]

In other words, MLPH finds the most probable state of the system from the belief state and applies the MDP policy for that state. When two or more states are equally likely, MLPH chooses one arbitrarily.

2) Greedy Power Assignment: For low-power wireless sensors, the optimal solution should be avoided due to the computational complexity constraint. Thus, we turn to a suboptimal greedy power assignment scheme by modifying (11) as

\[ \bar{a}_t \triangleq \arg\min_{a_t \in \mathcal{U}_{S_t}} \mathbb{E}\big\{ r(s, a_t) \mid z^{t}, a^{t-1} \big\} \quad \text{for } t = 1, 2, \ldots. \tag{19} \]

The main idea of the greedy power assignment is to avoid computing the future-dependent expected cost values. This incurs a performance loss in exchange for lower computational and storage requirements. The greedy power assignment scheme can be rewritten as

\[ \bar{a}_t = \arg\min_{a_t \in \mathcal{U}_{S_t}} \sum_{i=1}^{G} r(s, a_t)\, p(G_t = g_i \mid z^{t}, a^{t-1}). \tag{20} \]

In order to evaluate the greedy power assignment, (20) has to be implemented recursively.
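To make the two selection rules concrete, the ML channel state in (17) and the greedy rule in (20) can be sketched as follows. This is a minimal illustration under assumptions: the immediate cost is supplied as a hypothetical table `cost[i, a]` (the value of $r(s, a)$ when the channel is in state $g_i$), the energy constraint is represented by a boolean `feasible` mask, and ties in (17) are broken by taking the first maximizer, which is one arbitrary choice among several.

```python
import numpy as np


def ml_channel_state(belief):
    """Index of the most likely channel state, as in (17).

    np.argmax returns the first maximizer, so ties are broken by index order.
    """
    return int(np.argmax(belief))


def greedy_power(belief, cost, feasible):
    """Greedy action of (20): minimize the expected immediate cost under the belief.

    belief   : array (G,), p(G_t = g_i | history)
    cost     : array (G, A), cost[i, a] = r(s, a) when the channel state is g_i
    feasible : boolean array (A,), False for power levels the battery cannot support
    """
    expected = belief @ cost                       # expected immediate cost per action
    expected = np.where(feasible, expected, np.inf)
    return int(np.argmin(expected))


# Hypothetical numbers: 2 channel states, 3 power levels.
belief = np.array([0.74, 0.26])
cost = np.array([[0.90, 0.50, 0.20],
                 [0.30, 0.10, 0.05]])
feasible = np.array([True, True, False])           # highest power level not affordable
g_ml = ml_channel_state(belief)
a_bar = greedy_power(belief, cost, feasible)
print(g_ml, a_bar)
```

In this example the highest power level would minimize the expected cost, but it is excluded by the energy mask, so the greedy rule falls back to the middle level. MLPH would instead feed the ML state into the precomputed MDP policy $\pi^\star$.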
Using (14) and (15), we can write the following recursive implementation for the greedy power assignment.

1) Measure $Z_t$, compute $p(Z_t \mid a_{t-1}, G_{t-1})$ as a function of $G_{t-1}$, and calculate $p(G_{t-1} \mid z^{t}, a^{t-1})$ using (15).
2) Calculate $p(G_t \mid z^{t}, a^{t-1})$ using the Markov prediction step (14).
3) Calculate $\bar{a}_t$ via (19).

For the initial packet indices $t \in \{\ ,\ \}$, we use the initial steady-state distribution of states $\omega_o$ instead of $p(G_{t-1} \mid z^{t-1}, a^{t-2})$.

3) Implementation Issues: Here, we compare the implementation complexity of the MLPH and greedy heuristics. The computation required to solve MLPH is too high for low-power wireless nodes to handle. Thus, similar to [32], we trade computation for memory at the sensor nodes. A look-up table $T$, which is pre-computed and stored in the node's memory, is used to find the adequate transmit power. It contains the actions for different probabilities of EH and for the transmitter-side battery, channel, acknowledgement, and retransmission index states. At every time slot, the node updates the channel belief state $\varpi(G_t)$ and looks up the transmit power $a_t$ corresponding to this value.

The memory requirement for storing the look-up table $T$ depends on the total number of system states $|\mathcal{S}|$ and the number of actions $|\mathcal{U}|$. The look-up table is stored for different values of the EH probabilities. If each EH probability value is divided into $\kappa$ levels, then the total memory requirement is $\kappa \times |\mathcal{U}| \times |\mathcal{S}|$ bits. Additionally, $|\mathcal{S}|$ bits of memory are required to store the belief vector of size $|\mathcal{S}|$, and each element is quantized into 10 levels. On the other hand, the greedy algorithm requires neither the pre-computation nor the storage of a look-up table. It only computes the immediate cost as a function of the current state of the system, including the belief state of the channel. This computation has very low complexity compared to computing the expected future costs.

V.
PACKET DROP PROBABILITY ANALYSIS

In this section, the queuing process induced by the adaptive retransmission scheme is analyzed for the link between two sensor nodes. In particular, the PDP is derived by leveraging tools from queuing theory. The PDP is the probability that the transmitted packet is dropped, either because of repeated decoding failures or because it is not decoded due to the lack of energy at the receiver, over $K$ retransmission attempts. In this section, we consider that the channel state remains constant for the duration of one frame transmission and changes to a new state with some transition probability at the start of the new frame. The EH and consumption models are defined in Section II.

In order to make the PDP analysis tractable, we consider an equal and fixed power policy, where the energy required to transmit a full packet is fixed to $\beta E^{\min}_{\mathrm{Tx}}$. For example, when $\beta = 4$, if the transmitter battery has energy sufficient to transmit only a $1/4$ portion of the packet, then the transmit energy is $E^{\min}_{\mathrm{Tx}}$. On the other hand, if the transmitter battery has less than $E^{\min}_{\mathrm{Tx}}$, the transmit energy is zero. Hence, we approximate the system by a discrete-time FSMC with the state space $\mathcal{S} = \mathcal{B}_{\mathrm{Tx}} \times \mathcal{B}_{\mathrm{Rx}} \times \mathcal{G} \times \mathcal{Z} \times \mathcal{K} = \{s_1, s_2, \ldots, s_{|\mathcal{S}|}\}$. The state at time $t$ is denoted by $S_t = (B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j, G_t = g, Z_t = z, K_t = k)$, where $i$, $j$, $g$, $z$, and $k$ are the state values of the battery at the transmitter and receiver nodes, the channel, the acknowledgement, and the retransmission index, respectively.

Let $\bar{P}_{\mathrm{drop}}$ denote the average PDP. The packet drop event in a finite battery system is due to either decoding failure or unavailability of energy at the transmitter or at the receiver during $K$ retransmissions. As depicted in Fig. 1, a frame consists of a minimum of one and a maximum of $K$ slots. The acknowledgement state is $z = \mathrm{ACK}$ and the retransmission index state is $k = 1$ at the start of the frame.
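At the end of each slot, $k$ is incremented by 1 if NAKx is received; otherwise, $k = 1$ if ACK is received. Moreover, after $K$ retransmission attempts, the value of $k$ is reset to 1. If a NAK is received in the $K$th attempt, then the acknowledgement state is reset to ACK to indicate the start of the next packet transmission.

The PDP as a function of $K \ge 1$ can be written as

\[ \bar{P}_{\mathrm{drop}}(K) = \sum_{i,j} \psi(i, j)\, \mathbb{E}_g\big[ P_{\mathrm{drop}}(K \mid i, j, g, z = \mathrm{ACK}, k = 1) \big], \tag{21} \]

where $\psi(i, j)$ is the stationary probability that the transmitter and receiver nodes have energy $i E^{\min}_{\mathrm{Tx}}$ and $j E^{\min}_{\mathrm{Rx}}$, respectively, at the start of the frame. $P_{\mathrm{drop}}(K \mid i, j, g, z, k)$ is the PDP conditioned on the channel gain being in state $G_t = g$, the transmitter BSI $i E^{\min}_{\mathrm{Tx}}$, the receiver BSI $j E^{\min}_{\mathrm{Rx}}$, the acknowledgement state $z$, and the retransmission index $k$ at the beginning of the frame. It is given by

\[ P_{\mathrm{drop}}(K \mid i, j, g, \mathrm{ACK}, 1) = 1 - P_{\mathrm{suc}}, \tag{22} \]

where $P_{\mathrm{suc}}$ is the probability that the packet is successfully decoded within $K$ attempts. Thus, $P_{\mathrm{suc}}$ is the sum over all events contributing to a successful packet transmission, i.e., $P_{\mathrm{suc}} = \sum_{k=1}^{K} P_{\mathrm{suc},k}$, where $P_{\mathrm{suc},k}$ is the probability of success at the $k$th retransmission index. Accounting for the EH events at the transmitter and receiver EHNs, $P_{\mathrm{suc},k}$ can be upper bounded as

\[ P_{\mathrm{suc},k} \le \Bigg( 1 - \bigg[ \rho_{\mathrm{Tx}} \rho_{\mathrm{Rx}} P_{\mathrm{err}}(a_k) + (1 - \rho_{\mathrm{Tx}}) \rho_{\mathrm{Rx}} \varphi_{\mathrm{Tx}} P_{\mathrm{err}}(a_k) + \rho_{\mathrm{Tx}} (1 - \rho_{\mathrm{Rx}}) \Big( P_{\mathrm{err}}(a_k)(\varphi_{\mathrm{Rx}} + \varphi_{\mathrm{dec}}) + \sum_{x=0}^{\beta} \varphi_{\mathrm{Rx},x} \Big) + (1 - \rho_{\mathrm{Tx}})(1 - \rho_{\mathrm{Rx}}) \Big( \varphi_{\mathrm{Tx}} P_{\mathrm{err}}(a_k)(\varphi_{\mathrm{Rx}} + \varphi_{\mathrm{dec}}) + \sum_{x=0}^{\beta} \varphi_{\mathrm{Rx},x} \Big) \bigg] \Bigg) \times (1 - P_{\mathrm{suc},k-1}), \tag{23} \]

where $\mathbb{1}(\cdot)$ denotes the indicator function, $\varphi_{\mathrm{Tx}} = \mathbb{1}(a_k E^{\min}_{\mathrm{Tx}} \ge E_{\mathrm{Tx}})$, $\varphi_{\mathrm{Rx}} = \mathbb{1}(j E^{\min}_{\mathrm{Rx}} \ge E_{\mathrm{Rx}})$, $\varphi_{\mathrm{dec}} = \mathbb{1}(j E^{\min}_{\mathrm{Rx}} \ge E_{\mathrm{dec}})$, and $\varphi_{\mathrm{Rx},x} = \mathbb{1}\big( \tfrac{x+1}{\beta} E_{C,\mathrm{Rx}} > j E^{\min}_{\mathrm{Rx}} \ge \tfrac{x}{\beta} E_{C,\mathrm{Rx}} \big)$.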
Here, $P_{\mathrm{err}}(a_k) = P_{\mathrm{err}}(g, a_k)$, where $a_k$ denotes the value of the action taken at retransmission time index $k$. Hereafter, the dependency on $g$ is dropped from $P_{\mathrm{err}}(g, a_k)$ since the channel state is assumed fixed during the packet transmission.

The stationary probability distribution $\psi = [\psi(0, 0), \cdots, \psi(i, j), \cdots, \psi(B^{\max}_{\mathrm{Tx}}, B^{\max}_{\mathrm{Rx}})]$ can be computed by solving $\psi = \psi \Psi_g$, where $\Psi_g$ is the transition probability matrix whose elements are given as $\mathbb{E}_g\big[\Pr(B^{t+1}_{\mathrm{Tx}} = q, B^{t+1}_{\mathrm{Rx}} = r \mid B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j, g)\big]$, under the constraint $\sum_{(i,j)} \psi_g(i, j) = 1$. Moreover,

\[ \mathbb{E}_g\big[\Pr(B^{t+1}_{\mathrm{Tx}} = q, B^{t+1}_{\mathrm{Rx}} = r \mid B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j, g)\big] = \sum_{l=1}^{G} \omega_o(g_l) \Pr(B^{t+1}_{\mathrm{Tx}} = q, B^{t+1}_{\mathrm{Rx}} = r \mid B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j), \tag{24} \]

where the left-hand side is the expected probability that the BSI of the transmitter and receiver is $q$ and $r$, conditioned on the previous BSI of $i$ and $j$, respectively. Furthermore, the right-hand side term of (24) can be written as

\[ \Pr(B^{t+1}_{\mathrm{Tx}} = q, B^{t+1}_{\mathrm{Rx}} = r \mid B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j) = \sum_{w}^{|\mathcal{Z}|} \sum_{y=1}^{K} \Pr(q, r, w, y \mid i, j, z = \mathrm{ACK}, k = 1). \tag{25} \]

We use the transition probability matrix $\Xi$ to evaluate $\Pr(B^{t+1}_{\mathrm{Tx}} = q, B^{t+1}_{\mathrm{Rx}} = r \mid B^{t}_{\mathrm{Tx}} = i, B^{t}_{\mathrm{Rx}} = j, g)$. The elements of $\Xi$ represent the probability of going from state $(i, j, z, k)$ to another state $(q, r, w, y)$, denoted by $\Xi^{q,r,w,y}_{i,j,z,k}$, with fixed $g$. We have identified the following four cases to calculate these elements:

Case i) For $z \in \{\mathrm{ACK}/\mathrm{NAKx}\}$ and $k = 1, \ldots, K$, if both the transmitter and the receiver are harvesting energy, then $\Xi^{q,r,w,y}_{i,j,z,k} = \rho_{\mathrm{Tx}} \rho_{\mathrm{Rx}} w_{11}$, where $w_{11}$ is described in Table IIa. In this case, the receiver does not feed back NAKx messages and the transmitter resends the full packet since both nodes are harvesting.

Case ii) For $z \in \{\mathrm{ACK}/\mathrm{NAKx}\}$ and $k = 1, \ldots, K$, if the transmitter is harvesting energy while the receiver is not, then $\Xi^{q,r,w,y}_{i,j,z,k} = \rho_{\mathrm{Tx}} (1 - \rho_{\mathrm{Rx}}) w_{10}$, where $w_{10}$ is described in Table IIb. In this case, the receiver can feed back NAKx messages whenever it performs SS. In response, the transmitter can send the appropriate fraction of the packet.

Case iii) For $z \in \{\mathrm{ACK}/\mathrm{NAKx}\}$ and $k = 1, \ldots, K$, if the transmitter is not harvesting energy while the receiver is, then $\Xi^{q,r,w,y}_{i,j,z,k} = (1 - \rho_{\mathrm{Tx}}) \rho_{\mathrm{Rx}} w_{01}$, where $w_{01}$ is described in Table IIc. In this case, the receiver never transmits NAKx messages as it is harvesting during the entire time slot. The transmitter node can decide whether or not to transmit depending on the availability of the minimum energy. However, if the current acknowledgement state is NAKx and the transmitter decides not to transmit, then the next acknowledgement state remains NAKx.

Case iv) For $k = 1, \ldots, K$, if neither the transmitter nor the receiver is harvesting energy, then $\Xi^{q,r,w,y}_{i,j,z,k} = (1 - \rho_{\mathrm{Tx}})(1 - \rho_{\mathrm{Rx}}) w_{00}$, where $w_{00}$ is described in Table IId. In this case, assuming $\beta = 4$, if the current system state is $S_t = (i, j, \mathrm{NAK2}, k)$ such that $\varphi_{\mathrm{Tx}} = 1$ and $\varphi_{\mathrm{Rx},x} = 1$ for the relevant level $x$, then the system moves to state $S_{t+1} = (q, r, \mathrm{NAK1}, k + 1)$ with probability $(1 - \rho_{\mathrm{Tx}})(1 - \rho_{\mathrm{Rx}})$. In another example, if the current system state is $S_t = (i, j, \mathrm{NAK}, k)$ such that $\varphi_{\mathrm{Tx}} = 1$ and $\varphi_{\mathrm{Rx},x} = 1$ for the relevant level $x$, the system moves to the new state $S_{t+1} = (q, r, \mathrm{NAK1}, k + 1)$ with probability $(1 - \rho_{\mathrm{Tx}})(1 - \rho_{\mathrm{Rx}})$. The energy levels at the receiver node are defined as $a_{\mathrm{Rx}} = E_{\mathrm{Rx}} / E^{\min}_{\mathrm{Rx}}$, $a_{\mathrm{dec}} = E_{\mathrm{dec}} / E^{\min}_{\mathrm{Rx}}$, and $a_{\mathrm{Rx},x} = x E_{C,\mathrm{Rx}} / (\beta E^{\min}_{\mathrm{Rx}})$.

VI. NUMERICAL RESULTS

In this section, we evaluate the performance of the adaptive ACK/NAKx scheme and power assignment strategy by numerical simulations. Results are compared with the conventional scheme in order to demonstrate the benefits.
The conventional retransmission scheme is denoted by ACK/NAK.

Three metrics are used to evaluate the performance: the average packet transmission time $T_p(t)$, which is the average time taken per packet to be successfully delivered; the PDP, i.e., $P_{\mathrm{drop}}(t)$, the probability of dropping a packet after $K$ retransmission attempts; and the spectral efficiency, which is the ratio of the number of successfully transmitted packets to the total number of packets selected from the data buffer for transmission within a fixed transmission time.

The parameters summarized in Table III are used in the numerical simulations unless otherwise mentioned. The probabilities of EH for both nodes are assumed to be the same, i.e., $\rho_{\mathrm{Tx}} = \rho_{\mathrm{Rx}} = \rho$. Note that the slot duration is $T_s = 1$ second, and thus, the power and energy values can be used interchangeably. The equal power assignment in each time slot is denoted by $P^{e}_{\mathrm{out}}$. Furthermore, for low values of $\beta$ such as 4, $P_{\mathrm{fb}}$ is assumed negligible in the simulations.

TABLE III: Simulation Parameters
Parameters Value G c bits M and R c and / K β T s and T s and s ξ , η and α √ M − / ( √ M + 1) , . and σ n mW P C , Tx and P C , Rx . W ρ Tx and ρ Rx [0 , P out and P PA { , } mW and { , } mW E minTx and E minRx E t Tx /β and E t Rx /β B maxTx and B maxRx P Tx and P Tx P dec P C , Rx E hTx and E hRx P Tx T s and . P Rx T s

Fig. 2: Average packet drop probability (PDP) $P_{\mathrm{drop}}(t)$ versus the probability of EH ($\rho$) for $K \in \{2, 3\}$. Curves: greedy ACK/NAKx and MLPH ACK/NAKx, each for $K = 2$ and $K = 3$.

We first compare the performance of the MLPH with that of the greedy transmit power assignment. Fig. 2 plots the average PDP versus the probability of EH. In order to reduce the overall number of system states, we set the values of $K$ to 2 and 3, and $E^{h}_{\mathrm{Rx}} = 1\,.\;P_{\mathrm{Rx}} T_s$. The MLPH and greedy algorithms have similar performance in the lower EH rate regime, whereas the MLPH algorithm shows higher gains in the higher EH rate regime.
As expected, the performance improves with a higher number of retransmission attempts, i.e., $K = 3$. However, when the state size increases, the MLPH becomes impractical and the greedy power assignment strategy becomes a natural choice. Thus, the following numerical examples only consider the greedy approach.

Fig. 3: Average packet transmission time (s) versus the probability of EH ($\rho$) for $K = 4$. Curves: ACK/NAK and ACK/NAKx with $P^{e}_{\mathrm{out}} = 5$ mW and $P^{e}_{\mathrm{out}} = 15$ mW, greedy ACK/NAK, and greedy ACK/NAKx.

In Fig. 3, the average packet transmission time $T_p(t)$ is shown versus the probability of EH, and the performance of the ACK/NAKx and ACK/NAK schemes is compared in both the greedy and the equal power assignment settings. We set $P^{e}_{\mathrm{out}} = 5$ and 15 mW for the equal power assignments. We can observe that the ACK/NAKx scheme has a lower average packet transmission time than ACK/NAK in both the greedy and equal power assignment settings. Furthermore, in the low EH regime, the ACK/NAKx scheme with equal power assignment performs better than the conventional scheme with greedy power assignment. As expected, for low harvesting rates, all schemes have longer transmission times. Additionally, the ACK/NAK scheme exhibits equal average transmission times for the greedy assignment and for the equal power assignment with $P^{e}_{\mathrm{out}} = 15$ mW. This result means that a higher transmit power helps the equal power assignment algorithm overcome channel states that are in deep fade. However, it does not help in using the receiver energy efficiently. Moreover, the performance gain of ACK/NAKx over the ACK/NAK scheme is significant in the lower EH rate regime. This is because the receiver node, in the latter scheme, processes the received packet by sampling followed by decoding it.
Due to the lack of energy, the signal processing operation stops, which results in a packet drop and a loss of energy. Therefore, the receiver has to wait longer to accumulate enough energy to sample and decode the packet in a single time slot.

Fig. 4: Average packet drop probability (PDP) $P_{\mathrm{drop}}(t)$ versus the probability of EH ($\rho$) for $K = 4$. Curves: ACK/NAK and ACK/NAKx with $P^{e}_{\mathrm{out}} = 5$ mW and $P^{e}_{\mathrm{out}} = 15$ mW, greedy ACK/NAK, and greedy ACK/NAKx.

In Fig. 4, the PDP of the ACK/NAKx scheme is compared to that of the ACK/NAK scheme under both the greedy and the equal power assignment strategies. The simulation parameters are the same as those used for Fig. 3. One can see that all retransmission schemes experience a high PDP in the low harvesting rate regime. However, the proposed scheme exhibits a performance gain, particularly in the low EH regime. The greedy ACK/NAKx shows even better gains in all the harvesting rate regimes.

Fig. 5: Spectral efficiency (bits/channel use) versus the probability of EH ($\rho$) for a fixed transmission time and $K = 4$. Curves: ACK/NAK and ACK/NAKx with $P^{e}_{\mathrm{out}} = 5$ mW and $P^{e}_{\mathrm{out}} = 15$ mW, greedy ACK/NAK, and greedy ACK/NAKx.

To get further insight into the performance gain, Fig. 5 compares the spectral efficiency of the ACK/NAKx and ACK/NAK schemes for the fixed transmission time $T = 150$ s. The ACK/NAKx scheme performs better than the ACK/NAK scheme under equal power assignment. Moreover, the greedy ACK/NAKx performs better than both the equal power ACK/NAKx and ACK/NAK. Again, as expected, the gains are significant in the low EH regime.

In Fig. 6, the simulated and analytical PDP calculated using (21) are compared.
In order to reduce the number of states of the discrete-time FSMC, we set new values $E^{h}_{\mathrm{Tx}} = 2 P_{\mathrm{Tx}} T_s$, $B^{\max}_{\mathrm{Tx}} = 3 P_{\mathrm{Tx}} T_s$, $E^{h}_{\mathrm{Rx}} = 1\,.\;P_{\mathrm{Rx}} T_s$, $B^{\max}_{\mathrm{Rx}} = 2 P_{\mathrm{Rx}} T_s$, and $P_{\mathrm{dec}} = 5 P_{C,\mathrm{Rx}}$. It can be seen that the analytical and simulated curves for $P^{e}_{\mathrm{out}} \in \{5, 15\}$ mW have a similar behavior; however, the analytical bound is loose. This is because an exact tight bound for the PDP of any modulation with convolutional coding is not known for low signal-to-noise ratio (SNR). Consequently, in deriving the PDP expression (21), we use the upper bound on the PEP (7), which is loose in the low SNR regime and tight in the high SNR regime.

Fig. 6: Comparison of analytical (21) and simulated packet drop probability (PDP) versus the probability of EH ($\rho$). Curves: analytical and simulated ACK/NAKx for $P^{e}_{\mathrm{out}} = 5$ mW and $P^{e}_{\mathrm{out}} = 15$ mW.

For further exposition, Fig. 7 compares the performance of ACK/NAKx and ACK/NAK under a stochastic EH setup modeled by the compound Poisson process [1], [24]. The compound Poisson process closely models the EH due to solar power [33], [34]. In this model, the energy arrivals follow a Poisson process with intensity $\lambda$, i.e., the inter-arrival time is exponentially distributed with mean $1/\lambda$. The energy amount in each arrival, i.e., $E^{h}_{\{\mathrm{Tx},\mathrm{Rx}\}}$, is i.i.d. with mean $\bar{E}_{\{\mathrm{Tx},\mathrm{Rx}\}}$. The number of arrivals in one time slot follows a Poisson distribution with mean $\lambda T_s$. The simulated PDP of ACK/NAKx and ACK/NAK are compared for the greedy and the equal power ($P^{e}_{\mathrm{out}} = 5$ mW) assignment strategies. The simulation parameters are the same as the ones in Fig. 3, except for the harvesting model, with $\bar{E}_{\mathrm{Tx}} = 3 P_{\mathrm{Tx}} T_s /$ and $\bar{E}_{\mathrm{Rx}} = 1\,.\;P_{\mathrm{Rx}} T_s /$. The behavior of the schemes is the same as in the case with the Bernoulli EH arrival process.
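For readers who wish to reproduce a compound Poisson harvesting trace of the kind described above, a minimal sketch follows. The text only states that the per-arrival energy amounts are i.i.d. with a given mean; the exponential distribution used here for those amounts is an assumption for illustration, as are the function name, the parameter names, and the numerical values.

```python
import numpy as np


def compound_poisson_energy(lam, slot_len, mean_energy, num_slots, rng):
    """Sample the energy harvested per time slot under a compound Poisson model.

    lam         : arrival intensity (arrivals per second)
    slot_len    : slot duration T_s in seconds
    mean_energy : mean energy per arrival
    num_slots   : number of slots to simulate
    rng         : numpy random Generator
    Returns the number of arrivals and the total harvested energy in each slot.
    """
    # Number of arrivals per slot is Poisson with mean lam * slot_len.
    counts = rng.poisson(lam * slot_len, size=num_slots)
    # ASSUMPTION: exponential energy amounts (the paper only fixes their mean).
    energy = np.array([rng.exponential(mean_energy, size=n).sum() for n in counts])
    return counts, energy


rng = np.random.default_rng(0)
counts, energy = compound_poisson_energy(lam=2.0, slot_len=1.0, mean_energy=0.5,
                                         num_slots=50_000, rng=rng)
print(counts.mean(), energy.mean())
```

The average energy harvested per slot should be close to $\lambda T_s$ times the mean energy per arrival, which is 1.0 for the hypothetical values used here. Slots with no arrivals contribute exactly zero energy.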
Fig. 7: Average packet drop probability (PDP) $P_{\mathrm{drop}}(t)$ versus the EH intensity ($\lambda$) for $K = 4$. Curves: ACK/NAK and ACK/NAKx with $P^{e}_{\mathrm{out}} = 5$ mW, greedy ACK/NAK, and greedy ACK/NAKx.

It is worth mentioning that the overall gain of ACK/NAKx with the low-complexity greedy power assignment method is approximately − when compared to ACK/NAK. While this gain may appear low, it is significant in WSNs where multiple sensors are in operation.

VII. CONCLUSIONS AND FUTURE DIRECTIONS

An ARQ-based adaptive retransmission scheme between a pair of EH wireless sensor nodes was investigated. In a conventional scheme, the receiver may suspend the sampling and decoding operations due to insufficient energy and hence suffer a loss of both data and harvested energy. To overcome this problem, a selective sampling scheme was introduced, where the receiver selectively samples the received data and stores it. The selection depends on the amount of energy available. The receiver performs the decoding operation when both the complete samples of the packet and enough energy are available. Selective sampling information is fed back to the transmitter by resorting to the conventional ARQ scheme. The transmitter uses this information to resize the packet length. A POMDP formulation was set up to further optimize the transmit power. A suboptimal greedy power assignment method was developed, which is well suited for low-power wireless nodes from the implementation perspective. An analytical upper bound on the PDP was derived for the proposed adaptive retransmission scheme. Simulation results agree with the analytical solution when the fixed power assignment policy is used. Numerical results demonstrated that the adaptive retransmission scheme and power assignment strategy provide better performance than the conventional scheme.

The proposed adaptive retransmission framework paves the way to several other interesting research avenues.
The proposed scheme can be investigated with a more sophisticated retransmission scheme, such as type-II HARQ. To assess the performance of the proposed scheme in a practical scenario, one can consider a system setup with multiple transmitters and one receiver, all with EH. Problems to be studied under this setting are more challenging due to the increased number of random variables associated with each node.

REFERENCES

[1] O. Ozel, K. Tutuncuoglu, J. Yang, S. Ulukus, and A. Yener, “Transmission with energy harvesting nodes in fading wireless channels: Optimal policies,” IEEE J. Select. Areas Commun., vol. 29, no. 8, pp. 1732–1743, Sep. 2011.
[2] M. Gorlatova, A. Wallwater, and G. Zussman, “Networking low-power energy harvesting devices: Measurements and algorithms,” in Proc. IEEE Int. Conf. Comput. Commun., Shanghai, China, Apr. 10–15 2011, pp. 1602–1610.
[3] J. Yang and S. Ulukus, “Optimal packet scheduling in an energy harvesting communication system,” IEEE Trans. Commun., vol. 60, no. 1, pp. 220–230, Jan. 2012.
[4] K. Tutuncuoglu and A. Yener, “Communicating with energy harvesting transmitter and receivers,” in Proc. Inform. Theory and Applications Workshop, San Diego, CA, USA, Feb. 05–10 2012, pp. 240–245.
[5] H. Mahdavi-Doost and R. D. Yates, “Energy-harvesting receivers: Finite battery capacity,” in Proc. IEEE Int. Symp. Inform. Theory, Istanbul, Turkey, Mar. 19–21 2013, pp. 1799–1803.
[6] R. D. Yates and H. Mahdavi-Doost, “Energy-harvesting receivers: Optimal sampling and decoding policies,” in Proc. IEEE Global Signal and Inform. Processing, Austin, TX, USA, Dec. 03–05 2013, pp. 367–370.
[7] H. Mahdavi-Doost and R. D. Yates, “Fading channels in energy-harvesting receivers,” in Proc. Conf. Inform. Sciences Syst. (CISS), Princeton, USA, Mar. 19–21 2014, pp. 1–6.
[8] A. Kansal, J. Hsu, S. Zahedi, and M. B. Srivastava, “Power management in energy harvesting sensor networks,” ACM Trans. Embedded Comput. Syst., vol. 6, no. 4, pp. 1–38, Sep. 2007.
[9] C. R. Murthy, “Power management and data rate maximization in wireless energy harvesting sensors,” Int. J. Wireless Inf. Netw., vol. 16, no. 3, pp. 102–117, Jul. 2009.
[10] Z. Shenqiu, A. Seyedi, and B. Sikdar, “An analytical approach to the design of energy harvesting wireless sensor nodes,” IEEE Trans. Wireless Commun., vol. 12, no. 8, pp. 4010–4024, Aug. 2013.
[11] A. Aprem, C. R. Murthy, and N. B. Mehta, “Transmit power control policies for energy harvesting sensors with retransmissions,” IEEE J. Select. Topics Signal Processing, vol. 7, no. 5, pp. 895–906, Oct. 2013.
[12] S. Zhou, T. Chen, W. Chen, and Z. Niu, “Outage minimization for a fading wireless link with energy harvesting transmitter and receiver,” IEEE J. Select. Areas Commun., vol. 33, no. 3, pp. 496–511, Mar. 2015.
[13] M. K. Sharma and C. R. Murthy, “Packet drop probability analysis of ARQ and HARQ-CC with energy harvesting transmitters and receivers,” in Proc. IEEE Global Signal and Inform. Processing, Atlanta, Georgia, USA, Dec. 3–5 2014, pp. 148–152.
[14] J. Doshi and R. Vaze, “Long term throughput and approximate capacity of transmitter-receiver energy harvesting channel with fading,” in Proc. IEEE Int. Conf. Commun. Syst., Macau, Nov. 19–21 2014, pp. 46–50.
[15] A. Yadav, M. Goonewardena, W. Ajib, and H. Elbiaze, “Novel retransmission scheme for energy harvesting transmitter and receiver,” in Proc. IEEE Int. Conf. Commun., London, UK, Jun. 8–12 2015, pp. 4810–4815.
[16] J. A. Stankovic, T. F. Abdelzaher, C. Lu, L. Sha, and J. C. Hou, “Real-time communication and coordination in embedded sensor networks,” Proc. IEEE, vol. 91, no. 7, pp. 1002–1022, Jul. 2003.
[17] D. Ciuonzo, P. Salvo Rossi, and S. Dey, “Massive MIMO channel-aware decision fusion,” IEEE Trans. Signal Processing, vol. 63, no. 3, pp. 604–619, Feb. 2015.
[18] A. Shirazinia, S. Dey, D. Ciuonzo, and P. Salvo Rossi, “Massive MIMO for decentralized estimation of a correlated source,” IEEE Trans. Signal Processing, vol. 64, no. 10, pp. 2499–2512, May 2016.
[19] Q. Bai, A. Mezghani, and J. A. Nossek, “Throughput maximization for energy harvesting receivers,” in Proc. Int. Works. on Smart Antennas, Stuttgart, Germany, Mar. 13–14 2013, pp. 1–8.
[20] S. Cui, A. Goldsmith, and A. Bahai, “Energy-efficiency of MIMO and cooperative MIMO techniques in sensor networks,” IEEE J. Select. Areas Commun., vol. 22, no. 6, pp. 1089–1098, Aug. 2004.
[21] P. Grover, K. Woyach, and A. Sahai, “Towards a communication-theoretic understanding of system-level power consumption,” IEEE J. Select. Areas Commun., vol. 29, no. 8, pp. 1744–1755, Sep. 2011.
[22] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, USA: Wiley-Interscience, 1994.
[23] J. A. Paradiso and M. Feldmeier, A Compact, Wireless, Self-Powered Pushbutton Controller. Ubicomp 2001, Springer Berlin Heidelberg, 2001, pp. 299–304.
[24] J. Xu and R. Zhang, “Throughput optimal policies for energy harvesting wireless transmitters with non-ideal circuit power,” IEEE J. Select. Areas Commun., vol. 32, no. 2, pp. 322–332, Feb. 2014.
[25] H. S. Wang and N. Moayeri, “Finite-state Markov channel-A useful model for radio communication channels,” IEEE Trans. Veh. Technol., vol. 44, no. 1, pp. 163–171, Feb. 1995.
[26] Q. Zhang and S. A. Kassam, “Finite-state Markov model for Rayleigh fading channels,” IEEE Trans. Commun., vol. 47, no. 11, pp. 1688–1692, Nov. 1999.
[27] M. B. Pursley and D. J. Taipale, “Error probabilities for spread-spectrum packet radio with convolutional codes and Viterbi decoding,” IEEE Trans. Commun., vol. 35, no. 1, pp. 1–12, Jan. 1987.
[28] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Athena Scientific, 2000.
[29] M. L. Littman, A. R. Cassandra, and L. P. Kaelbling, “Learning policies for partially observable environments: Scaling up,” in Proc. Int. Conf. Mach. Learning, Tahoe City, CA, USA, Jul. 9–15 1995, pp. 362–370.
[30] C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of Markov decision processes,” Mathematics of Operations Research, vol. 12, no. 3, pp. 441–450, Aug. 1987.
[31] I. Nourbakhsh, R. Powers, and S. Birchfield, “DERVISH: An office-navigating robot,” Artificial Intell. Magazine, vol. 16, no. 2, pp. 53–60, Summer 1995.
[32] R. Srivastava and C. E. Koksal, “Energy optimal transmission scheduling in wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 9, no. 5, pp. 1550–1560, May 2010.
[33] Q. Bai and J. A. Nossek, “Modulation optimization for energy harvesting transmitters with compound Poisson energy arrivals,” in Proc. IEEE Works. on Sign. Proc. Adv. in Wirel. Comms., Darmstadt, Germany, Jun. 16–19 2013, pp. 764–768.
[34] P. Lee, Z. A. Eu, M. Han, and H. Tan, “Empirical modeling of a solar-powered energy harvesting wireless sensor node for time-slotted operation,” in Proc. IEEE Wireless Commun. and Netw. Conf., Quintana, Mexico, Mar. 28–31 2011, pp. 179–184.

(a) Case-I
$w_{11}$ | Conditions
$P_{\mathrm{err}}(a_k)$ | $w = \mathrm{NAK}$, $y = \mathrm{mod}(k, K) + 1$; $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = \min\{j + L_{\mathrm{Rx}} - a_{\mathrm{Rx}}, B^{\max}_{\mathrm{Rx}}\}$
$1 - P_{\mathrm{err}}(a_k)$ | $w = \mathrm{ACK}$, $y = 1$; $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = \min\{j + L_{\mathrm{Rx}} - a_{\mathrm{Rx}}, B^{\max}_{\mathrm{Rx}}\}$

(b) Case-II
$w_{10}$ | Conditions
$P_{\mathrm{err}}(a_k)$ | $w = \mathrm{NAK}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Rx}} = 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{dec}} - a_{\mathrm{Rx},x}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{dec}} = 1$, $\varphi_{\mathrm{Rx},x} = 1$
$1 - P_{\mathrm{err}}(a_k)$ | $w = \mathrm{ACK}$, $y = 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Rx}} = 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{dec}} - a_{\mathrm{Rx},x}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{dec}} = 1$, $\varphi_{\mathrm{Rx},x} = 1$
$1$ | $w = \mathrm{NAKx}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{Rx},x}$ for $z = \mathrm{NAK}, \mathrm{NAKx}$, $\varphi_{\mathrm{Rx},x} = 1$

(c) Case-III
$w_{01}$ | Conditions
$P_{\mathrm{err}}(a_k)$ | $w = \mathrm{NAK}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = i - a_k$, $r = \min\{j + L_{\mathrm{Rx}} - a_{\mathrm{Rx}}, B^{\max}_{\mathrm{Rx}}\}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Tx}} = 1$
 | $q = i - a_k$, $r = \min\{j + L_{\mathrm{Rx}} - a_{\mathrm{Rx}}, B^{\max}_{\mathrm{Rx}}\}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = 1$
$1 - P_{\mathrm{err}}(a_k)$ | $w = \mathrm{ACK}$, $y = 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Tx}} = 1$
 | $q = \min\{i + L_{\mathrm{Tx}} - a_k, B^{\max}_{\mathrm{Tx}}\}$, $r = j - a_{\mathrm{dec}} - a_{\mathrm{Rx},x}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = 1$
$1$ | $w = \mathrm{NAK}, \mathrm{NAKx}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = i$, $r = \min\{j + L_{\mathrm{Rx}}, B^{\max}_{\mathrm{Rx}}\}$, $\varphi_{\mathrm{Tx}} = 0$

(d) Case-IV
$w_{00}$ | Conditions
$P_{\mathrm{err}}(a_k)$ | $w = \mathrm{NAK}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx}} = 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx}} = \varphi_{\mathrm{dec}} = 1$
$1 - P_{\mathrm{err}}(a_k)$ | $w = \mathrm{ACK}$, $y = 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{Rx}}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx}} = 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{dec}}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx}} = \varphi_{\mathrm{dec}} = 1$
$1$ | $w = \mathrm{NAKx}$, $y = \mathrm{mod}(k, K) + 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{Rx},x} \varphi_{\mathrm{Rx},x}$ for $z = \mathrm{NAK}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx},x} = 1$
 | $q = i - a_k$, $r = j - a_{\mathrm{Rx},x} \varphi_{\mathrm{Rx},x}$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = \varphi_{\mathrm{Rx},x} = 1$
 | $q = i$, $r = j$ for $z = \mathrm{NAKx}$, $\varphi_{\mathrm{Tx}} = 0$

TABLE II: Values of $w_{11}$, $w_{10}$, $w_{01}$, and $w_{00}$