Queue-aware Energy Efficient Control for Dense Wireless Networks
Maialen Larrañaga, Mohamad Assaad and Koen De Turck
Laboratoire des Signaux et Systèmes (L2S, CNRS), CentraleSupélec, Gif-sur-Yvette, France
Abstract—We consider the problem of long-term power allocation in dense wireless networks. The framework considered in this paper is of interest for machine-type communications (MTC). In order to guarantee an optimal operation of the system while being as power efficient as possible, the allocation policy must take into account both the channel and queue states of the devices. This is a complex stochastic optimization problem that can be cast as a Markov Decision Process (MDP) over a huge state space. In order to tackle this state space explosion, we perform a mean-field approximation on the MDP. Letting the number of devices grow to infinity, the MDP converges to a deterministic control problem. By solving the Hamilton-Jacobi-Bellman equation, we obtain a well-performing power allocation policy for the original stochastic problem, which turns out to be a threshold-based policy and can therefore be easily implemented in practice.
I. INTRODUCTION
The steep increase in the number of mobile devices in use has brought a lot of attention to the design of large wireless networks. The proliferation of Internet of Things (IoT) applications will lead to a drastic increase of the density of devices in future wireless networks. Machine Type Communications (MTC) is the cellular technology for IoT. In 5G (and beyond) networks, it is foreseen that the density of MTC devices may surpass 1 million devices per km² [1]. In such dense networks, the network designer has to deal with severe interference issues in order to guarantee a certain level of quality of service (QoS). This can be handled by advanced physical layer solutions (e.g., Interference Alignment [3]), which may however suffer in some cases from high complexity or high signaling overhead. Furthermore, opportunistic resource allocation, such as power control, can also help manage the impact of interference among users and hence improve their QoS. The focus of this paper is on resource allocation in such dense networks.

The problem of power control in wireless networks has been widely studied in the past, e.g. in [4] and the references therein. The problem of power control in large scale networks has also been investigated using game theory and mean-field games, e.g. [7], [9], [10]. The problem in these references is first formulated as a stochastic differential game, and then sufficient conditions for the existence and uniqueness of the mean-field equilibrium are provided. It is also shown that this equilibrium power can be obtained by solving a coupled system of Fokker-Planck-Kolmogorov (FPK) equations (which take the form of forward equations) and Hamilton-Jacobi-Bellman (HJB) equations (which take the form of backward equations), which together form a system of so-called forward-backward equations.
In the aforementioned work on mean-field games, two issues are not addressed: i) solving the resulting forward-backward equations numerically has a high complexity, and ii) the focus of the proposed frameworks is on the wireless links, i.e. channel state information (CSI), without taking into account the traffic patterns and/or the queues of the users. In fact, since the CSI reveals the instantaneous transmission opportunities at the physical layer and the queue state information (QSI) reveals the urgency of the data flows, a good control policy must take into account both the CSI and the QSI, and the goal of the present paper is to find such a control policy. Queue-aware control problems have been widely studied in the literature and several approaches have been used. For example, in [6], [13] the allocation policies are based on the MaxWeight rule, which allows one to stabilize the queues of the users. However, the MaxWeight rule may suffer from high delay, and therefore delay-aware control policies for wireless networks have been developed in [8], [14], where it is established that Markov Decision Processes (MDPs) constitute the systematic approach for the development of delay-aware policies. A survey on delay-aware control policies can be found in [5]. MDP problems are difficult to solve. Many techniques have been proposed, for instance brute-force value iteration or policy iteration [5], [12], which find the optimal control policy by solving the Bellman equation. However, these techniques have a huge complexity (due to the curse of dimensionality), because solving the Bellman equation involves solving a large system of non-linear equations whose size increases exponentially in the number of users. Effort has been made to deal with the curse of dimensionality in [14] by utilizing the interference filtering property of a CSMA-like MAC protocol; a closed-form approximate solution and the associated error bound have been derived using perturbation analysis.
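To make the complexity discussion concrete, relative value iteration is the textbook way to solve an average-cost MDP numerically [12]. The sketch below is illustrative only (the transition kernel, costs and function names are our own assumptions, not from the paper); the point is that the state vector it iterates over has one entry per joint system state, which grows exponentially in the number of users.

```python
import numpy as np

def relative_value_iteration(P, c, iters=500):
    """Relative value iteration for an average-cost MDP.

    P[a] is the transition matrix under action a, c[a] the per-state cost
    vector; returns the optimal average cost g and a greedy policy.
    """
    n = c[0].shape[0]
    v = np.zeros(n)
    g = 0.0
    for _ in range(iters):
        # Bellman backup for every action, then minimize over actions.
        q = np.array([c[a] + P[a] @ v for a in range(len(P))])
        t = q.min(axis=0)
        g = t[0]        # average-cost estimate at the reference state 0
        v = t - g       # re-center so the iterates stay bounded
    return g, q.argmin(axis=0)
```

For the joint queue-and-channel state considered in this paper, the number of states is of order $(K(Q_{\max}+1))^N$, which is why this direct approach is intractable for large $N$.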
However, this weak-interference assumption seems constraining and not adapted to dense wireless networks, where the interference level cannot be assumed small. In this work, to overcome the dimensionality problem we use the mean-field approach. It consists of neglecting the behavior of individual users and tracking only the proportion of users in each state. This allows us to move from a stochastic optimization problem to a continuous-time deterministic one. We formulate the bias-optimal control problem based on this deterministic approach and we solve it by characterizing a solution of the Hamilton-Jacobi-Bellman equation. One of the main challenges we face is that the equations are fully coupled, meaning that the solution of one dynamically depends on the solution of the other. In order to handle these challenges we adopt a three-step method to finally obtain the optimal power control. We first characterize the optimal equilibrium point of the dynamic system with respect to the control variable. Then we prove convexity of the cost function (in all possible equilibrium points). Finally, we propose a threshold type of policy that satisfies the HJB equations and is hence bias-optimal. The obtained policy, being a simple threshold policy, can easily be applied in the original stochastic system and provides nearly-optimal performance.

Summarizing, these are the main differences between the present paper and the existing body of literature. While the existing work on mean-field games in wireless networks focuses on the CSI and formulates the power control problems using game theory (e.g. differential games) [7], [9], [10], we consider the impact of the QSI in addition to the CSI in this work. Furthermore, our problem is a multi-dimensional stochastic optimization problem that is formulated as an infinite-horizon average-cost MDP. Last but not least, while most of the existing work on mean field (e.g.
[7], [9], [10]) does not provide a simple solution of the forward-backward equations (resulting from the mean-field game), which are known to be complex, we analyze in this paper the forward-backward equations resulting from our MDP problem and provide a full characterization of the mean-field solution under a specific channel model. This is the main contribution of this paper. Moreover, it is worth mentioning that our obtained policy is a threshold-based policy and hence is easy to implement in practice.

II. SYSTEM MODEL
In this section, we introduce the system model of our wireless network consisting of $N$ transmitters communicating with a Base Station (BS). The transmitters correspond for example to users or to Machine Type Communication (MTC) devices. We will use the terms transmitter and user interchangeably throughout the paper. We assume time to be slotted and users to be synchronized to these time slots. At the beginning of each time slot, users that have been allotted enough transmission power will be able to transmit their packets. The latter depends not only on the allocated power but also on the channel quality of each user. We consider the channel state of transmitter $n$, i.e., $h_n(t)$, to take values in the set $\{c_1, \ldots, c_K\}$. We will assume that $h_n(t) = c_K$ is the best quality channel and $c_1$ the worst. The channel is further assumed to evolve as an i.i.d. process from one time slot to another, although our modeling framework holds for the more general case of Markovian channel dynamics as well.

The users transmit on the same bandwidth and interfere with each other. For a given channel state $h_n(t)$ of user $n$ and transmit power $p_n(t)$, the SINR of user $n$ is given by
$$\mathrm{SINR}_n(h(t), p(t)) = \frac{h_n(t)\, p_n(t)}{\sum_{k \neq n} \alpha_k h_k(t)\, p_k(t) + N_0},$$
where $N_0$ is the Gaussian noise power, $\alpha_k$ is a weight that comes for example from the processing gain at the receiver (this is widely used in the literature, e.g. in [4], [10] and in [11] for a CDMA system and a Matched Filter receiver), $h(t) = (h_1(t), \ldots, h_N(t))$ and $p(t) = (p_1(t), \ldots, p_N(t))$. We will assume that the transmit power of each transmitter in each time slot is bounded by $p_{\max}$. Namely, $0 \leq p_n(t) \leq p_{\max}$ for all $n \in \{1, \ldots, N\}$. For ease of notation we define $\mathrm{SINR}_n(t) := \mathrm{SINR}_n(h(t), p(t))$. In order to receive the information correctly at the receiver, it is required that
$$\mathrm{SINR}_n(t) \geq \theta, \qquad (1)$$
where $\theta$ is a given threshold. For convenience, we also assume that each transmitter can transmit at most one packet per time slot if the SINR constraint in (1) is satisfied. The extension to the case of higher rates is straightforward. Therefore, the achievable data rate of user $n$ is given by
$$R_n(h(t), p(t)) = \mathbb{1}_{\{\mathrm{SINR}_n(t) \geq \theta\}}. \qquad (2)$$
Let us now present the bursty data source and the queue dynamics for each user $n$. Let $A_n(t)$ be the (random) number of packet arrivals at transmitter $n$ at the end of time slot $t$, and let $A_n(t)$ be i.i.d. over time slots. We assume that in each time slot there is at most one packet arrival, i.e., $\mathbb{P}(A_n(t) = 1) = \rho$ and $\mathbb{P}(A_n(t) = 0) = 1 - \rho$, with $\rho > 0$. Each transmitter has a data queue for the bursty traffic flow towards its associated receiver. Let $Q^{\phi}_n(t)$ be the queue length at transmitter $n$ at the beginning of time slot $t$ under a power allocation policy $\phi$. The queue dynamics are then given by
$$Q^{\phi}_n(t+1) = \max\{Q^{\phi}_n(t) - R_n(h(t), p(t)),\, 0\} + A_n(t). \qquad (3)$$
For mathematical tractability, we assume that the queue length cannot exceed $Q_{\max}$ and that packets that arrive during buffer overflow are dropped. Namely, $Q^{\phi}_n(t) \in \{0, \ldots, Q_{\max}\}$.

Remark 1.
The dynamics of all $N$ queues are coupled together due to the interference term in the expression of the SINR: the departure of the queue at each transmitter depends on the power actions of all the other transmitters.
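As Remark 1 notes, the coupling enters only through the SINR test; given the realized SINR, the queue update (3) of each user is the simple recursion sketched below (illustrative only; the parameter names are ours).

```python
import random

def step_queue(q, sinr, theta, rho, q_max):
    """One slot of Q(t+1) = min(max(Q(t) - R(t), 0) + A(t), Q_max)."""
    served = 1 if sinr >= theta else 0            # rate R_n(t) of Eq. (2)
    arrival = 1 if random.random() < rho else 0   # Bernoulli(rho) arrival A_n(t)
    return min(max(q - served, 0) + arrival, q_max)
```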
The objective of the present work is to find an optimal power allocation policy $\phi$ taking into account the interference between users in the system.

III. CONTROL PROBLEM FORMULATION
Let $X^{\phi}_n(t) = (h_n(t), Q^{\phi}_n(t))$ be the state of transmitter $n$, namely, the channel condition and the queue length. The transmit power is dynamically adapted to the global system to handle interference mitigation. In this work we focus on the set of all stationary policies $\Phi$. Given a control policy $\phi \in \Phi$, the stochastic process $X^{\phi}_n(t)$ is a controlled Markov chain with the following transition probabilities:
$$\nu^{\phi}_n\big(X^{\phi}_n(t+1) = (c, q) \mid Q^{\phi}_n(t) = q', h(t), p(t)\big) = \mathbb{P}\big(h_n(t+1) = c \mid h_n(t) = c'\big) \cdot \mathbb{P}\big(Q^{\phi}_n(t+1) = q \mid Q^{\phi}_n(t) = q', h(t), p(t)\big),$$
for all $n$. Observe that, for the i.i.d. channel model, the probability $\mathbb{P}(h_n(t+1) = c \mid h_n(t) = c')$ reduces to $\mathbb{P}(h_n(t+1) = c)$. According to the system model in Section II, we note that in $\mathbb{P}(Q^{\phi}_n(t+1) = q \mid Q^{\phi}_n(t) = q', h(t), p(t))$ the value $q$ can only take three values, $q \in \{q'-1, q', q'+1\}$. We give explicit expressions of all transition probabilities in Appendix VI-A.

The objective is to minimize the average power cost together with the queue length. In order to reduce the delay and queue overflow, users with a higher queue length should be prioritized over users with a small number of packets to transmit. The objective of the present work is then to minimize
$$L^{\phi} = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{E}\left[p_n(t) + \lambda\, Q^{\phi}_n(t)\right],$$
where $\lambda \geq 0$ is a weight parameter that can be adjusted in order to find a tradeoff between power consumption and queue-length minimization. This problem, due to the complex interrelations between users, is a very complex MDP. Well-known simple heuristics for such MDPs (such as Whittle's index policy) fail on this problem, due to the interference between users. In the next section we therefore develop a mean-field approximation.

A. Mean-Field Approach
We consider each user in the network as an object evolving in a finite state space; the state of user $n$ at time $t$ is denoted by $X^{\phi}_n(t)$ and equals $(c, q)$, where $c \in \{c_1, \ldots, c_K\}$ and $q \in \{0, \ldots, Q_{\max}\}$. We assume that the users are distinguishable only through their state. This means that the behavior of the system only depends on the proportion of users in every state. Let $M^N(t)$ be the empirical measure of the collection of users; it is an $S$-dimensional vector $M^N(t) = (M^N_1(t), \ldots, M^N_K(t))$, where $M^N_i(t) = (M^N_{i,0}(t), \ldots, M^N_{i,Q_{\max}}(t))$ and
$$M^N_{i,j}(t) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}_{\{X^{\phi}_n(t) = (c_i, j)\}},$$
for all $i \in \{1, \ldots, K\}$ and all $j \in \{0, \ldots, Q_{\max}\}$. The value of $M^N_{i,j}(t)$ is to be interpreted as the proportion of transmitters/users in channel state $c_i$ with queue length $j$. The set of possible values for $M^N$ is then the set of probability measures on $S = \{(c, q) : c \in \{c_1, \ldots, c_K\},\, q \in \{0, \ldots, Q_{\max}\}\}$.

The mean-field approach allows us to move from a stochastic optimal control problem to a deterministic one. The advantage is that we are no longer in an uncertain environment and we can now overcome the curse of dimensionality due to the large number of users in the network. The limiting deterministic optimization problem is formulated as follows. Let us denote by $DM^N(t)$ the expected drift of $M^N(t)$, that is,
$$DM^{\phi} := \mathbb{E}\big(M^N(t+1) - M^N(t) \mid M^N(t)\big).$$
We now aim at obtaining the explicit expression of the expected drift under the policy $\phi$. In order to do so, let us first define $s_i$ to be the state that corresponds to the $i$th entry in $M^N$.
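The empirical measure $M^N$ above is straightforward to compute from a list of user states; a minimal sketch (the function name and state encoding are our own):

```python
import numpy as np

def empirical_measure(states, K, q_max):
    """Fraction of users in each (channel index, queue length) state,
    returned as the flattened S-dimensional vector M^N."""
    m = np.zeros((K, q_max + 1))
    for c_idx, q in states:
        m[c_idx, q] += 1.0 / len(states)
    return m.ravel()
```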
Then we define $\nu^{\phi}_{i,j}(m)$ to be the probability that a user in state $s_i \in S$ at time slot $t$ transitions to state $s_j \in S$ at time slot $t+1$, given that $M^N(t) = m$, that is,
$$\nu^{\phi}_{i,j}(m) := g_i(m)\,\gamma^1_{i,j} + (1 - g_i(m))\,\gamma^0_{i,j},$$
where $g_i(m)$ is the fraction of users in state $s_i \in S$ whose $\mathrm{SINR}_n(t) \geq \theta$, with $n$ a user in state $s_i$. The values $\gamma^a_{i,j}$, $a = 0, 1$, represent the transition probabilities from state $s_i$ to state $s_j$ when $\mathrm{SINR}_n(t) \geq \theta$ for all users $n$ in state $s_i$ if $a = 1$, and when $\mathrm{SINR}_n(t) < \theta$ for all users $n$ in state $s_i$ if $a = 0$. These values depend on whether we assume an i.i.d. channel evolution model or a Markovian one. For both cases the expressions of $\gamma^a_{i,j}$, $a = 0, 1$, can be found in Appendix VI-B. We then have
$$DM^N(t)\Big|_{M^N(t) = m} = \sum_i \sum_j m_i\, \nu^{\phi}_{i,j}(m)\, \vec{e}_{ij} = U^{\phi}(m)\, m,$$
where $\vec{e}_{ij}$ is the $K \cdot (Q_{\max}+1)$-dimensional vector with a $-1$ entry in the $i$th position, a $+1$ entry in the $j$th position and zeros elsewhere, and $\vec{e}_{ii} = \vec{0}$. Note that
$$U^{\phi}_{i,j}(m) = \begin{cases} -\sum_{r \neq i} \nu^{\phi}_{i,r}(m) & \text{if } i = j, \\ \nu^{\phi}_{j,i}(m) & \text{if } i \neq j. \end{cases}$$
We can now define $m(t+1) - m(t) = U^{\phi}(m(t))\, m(t)$. The latter can be seen as a fluid system which is defined for any $m(t)$ and not only for probability densities. In the original stochastic problem we aim at minimizing the long-run expected average power and queue length. Note that in the fluid setting several power and queue trajectories reach the same equilibrium point, and hence we aim at minimizing the biased cost.
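With $\nu^{\phi}(m)$ assembled as a row-stochastic matrix, the drift $U^{\phi}(m)m$ is simply "inflow minus outflow", so one step of the fluid system can be sketched as follows (illustrative; the two-state kernel in the usage example is a toy, not the paper's $\gamma$'s):

```python
import numpy as np

def fluid_step(m, nu):
    """m(t+1) = m(t) + U(m) m(t), where nu[i, j] = nu_{i,j}(m) is the
    per-user transition probability; U m = nu.T @ m - m (inflow - outflow)."""
    return m + nu.T @ m - m   # equals nu.T @ m for a row-stochastic nu
```

Note that the step conserves total mass, consistent with $U^{\phi}(m)m$ being a drift of a probability vector.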
Assuming that $\phi$ is such that all users in the same state $s \in S$ are allocated the same power, we can equivalently write
$$L^{\phi,N} \approx \limsup_{T \to \infty} \frac{N}{T} \sum_{t=0}^{T-1} \sum_{i=1}^{K \cdot (Q_{\max}+1)} \mathbb{E}\big(p_i(t) + \lambda\, m_i(t)\, \sigma(i)\big), \qquad (4)$$
where $\sigma(\cdot)$ maps the state index $i \in \{1, \ldots, K \cdot (Q_{\max}+1)\}$ to the corresponding queue length.

For the mean-field approach we note that $\alpha_n$ should scale as $1/N$ in order to have finite interference in the network. This can be the case in dense networks where the number of interfering users scales as $1/N$, or for example when an advanced receiver is used to cancel part of the interference. This normalization is widely used in the mean-field approach, e.g. [7], [9], [10] and the references therein. It has also been used and justified for instance in [11] for a CDMA system and a Matched Filter receiver. In this case, the SINR of user $n$ depends on the interference coming from the other users, namely,
$$I_n(t) = \frac{1}{N} \sum_{i \neq n} p_i(t)\, h_i(t). \qquad (5)$$

Proposition 1. Let $I_n(t)$, given by (5), be the interference perceived by transmitter $n$. Then $I_n(t) \to I(t)$ as $N \to \infty$. Consequently, a user in state $i$ achieves
$$\mathrm{SINR}_i(t) = \frac{p_i(t)\, h_i}{\sum_{j=1}^{K \cdot (Q_{\max}+1)} p_j(t)\, h_j\, m_j(t) + N_0}.$$

Proof.
The result can be obtained by the interchangeability property assumed in the mean field.

The problem is therefore to find the power allocation policy $\phi$ that minimizes
$$\int_0^{\infty} \left( \sum_{i=1}^{K \cdot (Q_{\max}+1)} p_i(t) + \lambda\, m_i(t)\, \sigma(i) - E^* \right) \mathrm{d}t,$$
where $E^*$ is the optimal equilibrium cost, subject to $\mathrm{d}m(t) = U^{\phi}(m(t))\, m(t)\, \mathrm{d}t$ and $p_i(t) \leq p_{\max}$. That is, we aim at characterizing a bias-optimal policy.

Next we reformulate the problem in order to be able to characterize an optimal policy $\phi$ for the problem introduced above. We note that to minimize $\sum_{i=1}^{K(Q_{\max}+1)} p_i(t)$ it suffices to solve
$$\mathrm{SINR}_i(t) = \frac{p_i(t)\, h_i}{N_0 + \sum_{j=1}^{K(Q_{\max}+1)} p_j(t)\, m_j(t)\, h_j} = \theta\, s_i(t),$$
with $s_i(t) \in [0, 1]$. The latter has a unique solution $\vec{p}^{\,*}(\vec{s}(t)) = \big(p^*_1(\vec{s}(t)), \ldots, p^*_{K(Q_{\max}+1)}(\vec{s}(t))\big)$ given by
$$p^*_i(t) = p^*_i(\vec{s}(t)) = \frac{\theta N_0\, s_i(t)}{h_i \Big(1 - \sum_{j=1}^{K(Q_{\max}+1)} s_j(t)\, m_j(t)\, \theta\Big)}$$
for all $h_i > 0$, and $p^*_i(t) = s_i(t) = 0$ if $h_i = 0$. For the latter solution $\vec{p}^{\,*}(t)$ to be feasible we impose the following assumptions.

Assumption 1.
We assume $\theta < 1$. The latter implies $1 - \sum_{i=1}^{K(Q_{\max}+1)} s_i(t)\, m_i(t)\, \theta > 0$ and $p^*_i(t) \geq 0$ for all $s_i(t) \in [0, 1]$ and all $m_i(t) \in [0, 1]$.

Assumption 2.
We assume $\theta \leq p_{\max} h_i / (N_0 + p_{\max} h_i)$ for all $h_i > 0$. The latter implies $p^*_i(t) \leq p_{\max}$.

Therefore, we aim at finding the control vector $\vec{s}(t)$ such that
$$\int_0^{\infty} \left( \sum_{i=1}^{K(Q_{\max}+1)} \frac{\theta N_0\, s_i(t)}{h_i \Big(1 - \sum_{j=1}^{K(Q_{\max}+1)} \theta\, s_j(t)\, m_j(t)\Big)} + \lambda\, \sigma(i)\, m_i(t) - E^* \right) \mathrm{d}t \qquad (6)$$
is minimized, subject to $s_i(t) \in [0, 1]$ for all $i \in \{1, \ldots, K(Q_{\max}+1)\}$ and
$$\frac{\mathrm{d}\vec{m}(t)}{\mathrm{d}t} = U(\vec{s}(t))\, \vec{m}(t),$$
where $U(\vec{s}(t))$ is defined below.

Proposition 2.
Let $s_i(t)$ be the action with respect to users in state $(h_i, \sigma(i))$. If $h_i = 0$ or $\sigma(i) = 0$, then $s_i(t) = 0$.
The proof is straightforward.

The first step to obtain a bias-optimal solution is to characterize the optimal equilibrium cost $E^*$. In order to do so, we first compute the conditions under which the objective function (6) is convex. Let us define $(\bar{m}_i, \bar{s}_i)$ for all $i$ such that
$$U(\bar{s}_1, \ldots, \bar{s}_{K(Q_{\max}+1)}) \cdot (\bar{m}_1, \ldots, \bar{m}_{K(Q_{\max}+1)})' = 0. \qquad (7)$$

Assumption 3.
For mathematical tractability, we will assume in the remainder of this section that $H = \{0, 1\}$ (i.e., GOOD/BAD channel state) and $Q_{\max} = 1$. The assumption $Q_{\max} = 1$ is meaningful in the context where the transmitters are MTC (Machine Type Communications) devices or IoT objects (e.g., sensors) that transmit some updated estimations/parameters. Once a new estimation arrives, the old one in the buffer becomes useless and is dropped. In this case, we have $Q_{\max} = 1$.

We order the four states as $s_1 = (0,0)$, $s_2 = (0,1)$, $s_3 = (1,0)$, $s_4 = (1,1)$. In this case $\bar{s}_i = 0$ for all $i = 1, 2, 3$. Therefore the cost at equilibrium, $E(\vec{\bar{s}})$, equals
$$E(\vec{\bar{s}}) = \frac{\theta N_0\, \bar{s}_4}{1 - \theta \bar{m}_4} + \lambda(\bar{m}_2 + \bar{m}_4). \qquad (8)$$
Namely, $E(\vec{\bar{s}}) = C(\vec{\bar{m}}, 0)(1 - \bar{s}_4) + C(\vec{\bar{m}}, 1)\, \bar{s}_4$, with $C(\bar{m}, 0) = \lambda(\bar{m}_2 + \bar{m}_4)$ and $C(\bar{m}, 1) = \frac{\theta N_0}{1 - \theta \bar{m}_4} + \lambda(\bar{m}_2 + \bar{m}_4)$. This equilibrium cost can be interpreted as the cost of being passive times the fraction of time the system is passive, plus the cost of being active multiplied by the fraction of time the system is active.

Throughout the paper we will denote the optimal equilibrium point by $\vec{m}^*$ and the optimal control by $\vec{s}^*$. The optimal average cost $E^*$ is therefore
$$E^* = \frac{\theta N_0\, s^*_4}{1 - \theta m^*_4} + \lambda(m^*_2 + m^*_4).$$

We have assumed that the channel is either in a GOOD state, $h = 1$, or in a BAD state, $h = 0$. In this case we aim at determining $\vec{s}(t) = (s_1(t), \ldots, s_4(t))$ for all $t$. By Proposition 2 we have that $s_1(t) = s_2(t) = s_3(t) = 0$ for all $t$; the objective is therefore to determine $s_4(t)$ for all $t$. We denote by $\rho$ the arrival probability, by $\beta_0$ the probability of being in the BAD state and by $\beta_1$ the probability of being in the GOOD state. Writing $\bar{s} = \bar{s}_4$, we have that $U(\vec{\bar{s}})$ equals
$$U(\vec{\bar{s}}) = \begin{pmatrix} -\beta_1 - \beta_0\rho & 0 & \beta_0(1-\rho) & \bar{s}\,\beta_0(1-\rho) \\ \beta_0\rho & -\beta_1 & \beta_0\rho & \bar{s}\,\beta_0\rho + (1-\bar{s})\,\beta_0 \\ \beta_1(1-\rho) & 0 & -\beta_1\rho - \beta_0 & \bar{s}\,\beta_1(1-\rho) \\ \beta_1\rho & \beta_1 & \beta_1\rho & -\bar{s}\,\beta_1(1-\rho) - \beta_0 \end{pmatrix}.$$
By solving $U(\vec{\bar{s}}) \cdot (\bar{m}_1, \bar{m}_2, \bar{m}_3, \bar{m}_4)' = 0$ we obtain
$$\bar{m}_1 = \frac{\beta_0\beta_1(1-\rho)\bar{s}}{\rho + \beta_1\bar{s}(1-\rho)}; \quad \bar{m}_2 = \frac{\beta_0\rho}{\rho + \beta_1\bar{s}(1-\rho)}; \quad \bar{m}_3 = \frac{\beta_1^2(1-\rho)\bar{s}}{\rho + \beta_1\bar{s}(1-\rho)}; \quad \bar{m}_4 = \frac{\beta_1\rho}{\rho + \beta_1\bar{s}(1-\rho)}. \qquad (9)$$

Proposition 3.
Let $\vec{\bar{s}} = (0, 0, 0, \bar{s}_4)$ and let $\vec{\bar{m}}$ be given by Equation (9). Assume $N_0 \leq \lambda(1-\rho)(1-\beta_1\theta)^2 / (\rho\theta^2)$. Then $E(\vec{\bar{s}})$ as given by Equation (8) is convex. See Appendix VI-C for the proof.
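Under Assumption 3 the equilibrium point (9) is explicit, which makes the equilibrium cost easy to evaluate numerically. A sketch, assuming the state order $(h,q) = (0,0),(0,1),(1,0),(1,1)$ and our reading of (9):

```python
def equilibrium(rho, beta1, s4):
    """Equilibrium (9) of the fluid dynamics for H = {0,1}, Q_max = 1."""
    beta0 = 1.0 - beta1
    d = rho + beta1 * s4 * (1.0 - rho)       # common denominator of (9)
    m1 = beta0 * beta1 * (1.0 - rho) * s4 / d
    m2 = beta0 * rho / d
    m3 = beta1 ** 2 * (1.0 - rho) * s4 / d
    m4 = beta1 * rho / d
    return m1, m2, m3, m4
```

For $\bar{s}_4 = 0$ this returns $(0, \beta_0, 0, \beta_1)$, matching the passive equilibrium of Proposition 4 below.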
1) An average optimal control:
In the next proposition we characterize the optimal equilibrium point. The result is characterized by the following constants:
$$N_1 = \frac{\lambda\beta_1(1-\rho)(1-\theta\beta_1)}{\rho\theta}, \qquad (10)$$
$$N_2 = \frac{\lambda\beta_1(1-\rho)\rho\,\big(\rho + \beta_1 - \beta_1\rho(1+\theta)\big)^2}{(\beta_1 + \rho - \beta_1\rho)^2\,\theta\,\big(2\beta_1(1-\rho)\rho(1-\theta\beta_1) + \rho^2(1-\theta\beta_1) + \beta_1^2(1-\rho)^2\big)}. \qquad (11)$$

Proposition 4. Let $\vec{s}^* = (0, 0, 0, s^*_4)$, and let $N_1$ and $N_2$ be as in (10) and (11). Then:
• $s^*_4 = 0$ and $\vec{m}^* = (0, \beta_0, 0, \beta_1)$ if $N_0 \geq N_1$;
• $s^*_4 = 1$ and
$$m^*_1 = \frac{\beta_0\beta_1(1-\rho)}{\rho + \beta_1(1-\rho)}; \quad m^*_2 = \frac{\beta_0\rho}{\rho + \beta_1(1-\rho)}; \quad m^*_3 = \frac{\beta_1^2(1-\rho)}{\rho + \beta_1(1-\rho)}; \quad m^*_4 = \frac{\beta_1\rho}{\rho + \beta_1(1-\rho)}$$
if $N_0 \leq N_2$;
• $s^*_4 \in (0, 1)$ if $N_2 < N_0 < N_1$, where $s^*_4$ is such that
$$N_0 = \frac{\lambda(1-\rho)\,\bar{m}_4^2\,(1-\theta\bar{m}_4)^2}{\theta\rho\,\big(\theta\bar{m}_4(\bar{m}_4 - 2\beta_1) + \beta_1\big)}.$$
See Appendix VI-D for the proof.

Proposition 4 suggests that a threshold policy in $m_4$ is optimal for the problem presented in Equation (6). This is shown in the next section.
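Proposition 4 reduces the average-cost problem to comparing the noise power $N_0$ with the two constants (10) and (11). A sketch of this classification (the exponents in our expression for $N_2$ follow our reconstruction of the garbled source and should be checked against the original paper):

```python
def regime_thresholds(lam, rho, beta1, theta):
    """Constants N_1 (Eq. (10)) and N_2 (Eq. (11))."""
    n1 = lam * beta1 * (1 - rho) * (1 - theta * beta1) / (rho * theta)
    u1 = rho + beta1 - beta1 * rho * (1 + theta)
    d1 = rho + beta1 * (1 - rho)
    a = (2 * beta1 * (1 - rho) * rho * (1 - theta * beta1)
         + rho ** 2 * (1 - theta * beta1)
         + beta1 ** 2 * (1 - rho) ** 2)
    n2 = lam * beta1 * (1 - rho) * rho * u1 ** 2 / (d1 ** 2 * theta * a)
    return n1, n2

def optimal_equilibrium_control(n0, n1, n2):
    """Proposition 4: s*_4 = 0, s*_4 = 1, or an interior optimum."""
    if n0 >= n1:
        return 0.0
    if n0 <= n2:
        return 1.0
    return None   # interior regime: solve dE(s)/ds = 0 numerically
```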
2) Bias-optimal solution:
In this section we derive an optimal solution for the deterministic control problem in Equation (6). In the previous section we characterized the optimal equilibrium point based on the value of $N_0$. In Proposition 5 (see the proof in Appendix VI-E) we determine a bias-optimal control policy. Recall that we are interested in this solution since the average optimal cost obtained in Proposition 4 can be achieved by any control policy with equilibrium point $\vec{m}^*$.

Proposition 5.
Let $N_1$ and $N_2$ be given by Equations (10) and (11). An optimal solution for problem (6) is:
• If $N_0 \geq N_1$, then $s_4(t) = 0$ if $m_4(t) \leq m^{(1)}$ and $s_4(t) = 1$ otherwise.
• If $N_0 \leq N_2$, then $s_4(t) = 0$ if $m_4(t) \leq m^{(2)}$ and $s_4(t) = 1$ otherwise.
• If $N_0 \in (N_2, N_1)$, then $s_4(t) = 0$ if $m_4(t) \leq m^*_4$ and $s_4(t) = 1$ otherwise.
We note that $m^{(1)} = \beta_1$, $m^{(2)} = \beta_1\rho/(\rho + \beta_1(1-\rho))$, and $m^*_4$ is the solution of $\mathrm{d}E(\bar{s})/\mathrm{d}\bar{s} = 0$.

Proposition 5 tells us that the optimal solution of the mean-field approximation is of threshold type: it suffices to compare the fraction of users that have one packet to transmit and are in a good channel state against a threshold determined by $N_0$. This yields a very simple heuristic for the original stochastic problem which, as we will see in the next section, is nearly optimal.

IV. NUMERICAL RESULTS
In this section we numerically evaluate our mean-field solution as proposed in Proposition 5. We compare its performance with the numerically optimal solution (obtained through Value Iteration (VI)) of the original stochastic problem as presented at the beginning of Section III. We consider the following example. There are 10 transmitters in the system, two possible channel qualities GOOD/BAD, and each user has at most 1 packet to transmit. The latter is motivated by MTC, where each machine (e.g., a sensor) has few packets (e.g., a temperature reading) to transmit. We assume $\theta = 0.$, $\beta_1 = 0.$, $\lambda = 1$ and $N_0 = 1$. The packet arrival probability $\rho$ varies between 0.05 and 0.3. These values satisfy Assumptions 1 and 2. We observe in the table below that our proposed solution is nearly optimal across all values of $\rho$. We compute the relative error $|g_{MF} - g_{VI}| / g_{MF}$ and the absolute error $|g_{MF} - g_{VI}|$, where $g_{MF}$ is the average cost incurred by our policy and $g_{VI}$ is the optimal average cost computed using VI.

[Table: relative and absolute errors versus $\rho$; the entries did not survive extraction.]

V. CONCLUSION
We have studied the problem of power allocation in large wireless networks taking into account both channel state information and queue state information. We identified an MDP formulation of this problem and, in view of the state space explosion, we performed a mean-field approximation, letting the number of devices grow to infinity so as to obtain a deterministic control problem. By solving the HJB equation, we derived a well-performing power allocation policy for the original stochastic problem, which turns out to be a threshold-based policy and can therefore be efficiently implemented in real-life wireless networks.

REFERENCES
[1] 3GPP TR 45.820, Cellular System Support for Ultra-Low Complexity and Low Throughput Internet of Things (CIoT). 3GPP, 2015.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2005.
[3] V. R. Cadambe and S. A. Jafar, "Interference alignment and degrees of freedom of the K-user interference channel," IEEE Trans. Inform. Theory, 54(8):3425–3441, Aug. 2008.
[4] M. Chiang, P. Hande, T. Lan, and C. W. Tan, Power Control in Wireless Cellular Networks. Foundations and Trends in Networking, 2008.
[5] Y. Cui, V. K. N. Lau, R. Wang, H. Huang, and S. Zhang, "A survey on delay-aware resource control for wireless systems – large deviation theory, stochastic Lyapunov drift, and distributed stochastic learning," IEEE Trans. Inform. Theory, 58(3):1677–1701, Mar. 2012.
[6] A. Destounis, M. Assaad, M. Debbah, and B. Sayadi, "Traffic-aware training and scheduling for MISO wireless downlink systems," IEEE Trans. Inform. Theory, 61(5):2574–2599, May 2015.
[7] H. Tembine, S. Lasaulce, and M. Jungers, "Joint power control-allocation for green cognitive wireless networks using mean field theory," in IEEE CROWNCOM, 2010.
[8] V. K. N. Lau, F. Zhang, and Y. Cui, "Low complexity delay-constrained beamforming for multi-user MIMO systems with imperfect CSIT," IEEE Trans. Signal Processing, 61(16):4090–4099, 2013.
[9] F. Meriaux and S. Lasaulce, "Mean-field games and green power control," in IEEE International Conference on Network Games, Control and Optimization (NetGCooP), 2011.
[10] F. Meriaux, S. Lasaulce, and H. Tembine, "Stochastic differential games and energy-efficient power control," Dynamic Games and Applications, 3:3–23, 2013.
[11] F. Meshkati, M. Chiang, H. V. Poor, and S. C. Schwartz, "A game-theoretic approach to energy-efficient power control in multicarrier CDMA systems," IEEE Journal on Selected Areas in Communications, 24(6):1115–1129, 2006.
[12] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2005.
[13] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Transactions on Automatic Control, 37(12):1936–1948, Dec. 1992.
[14] W. Wang, F. Zhang, and V. K. N. Lau, "Dynamic power control for delay-aware device-to-device communications," IEEE Journal on Selected Areas in Communications, 33(1):14–27, 2015.
VI. APPENDIX
A. Expressions of transition probabilities of the MDP
Here we provide expressions for the transition probabilities of the MDP defined in Section III. Starting from queue length $q'$, the only reachable queue lengths are $q'$, $q'+1$ and $[q'-1]^+ := \max\{q'-1, 0\}$, so that
$$\mathbb{P}\big(Q^{\phi}_n(t+1) = q \mid Q^{\phi}_n(t) = q', h(t), p(t)\big) = 0 \quad \text{for } q \notin \{[q'-1]^+, q', q'+1\}.$$
Recall that $\mathbb{P}(A_n(t) = 1) = \rho$; we therefore have
$$\mathbb{P}\big(Q^{\phi}_n(t+1) = q' \mid Q^{\phi}_n(t) = q', h(t), p(t)\big) = \rho\,\mathbb{1}_{\{\mathrm{SINR}_n(t) \geq \theta\}} + (1-\rho)\,\mathbb{1}_{\{\mathrm{SINR}_n(t) < \theta\}},$$
$$\mathbb{P}\big(Q^{\phi}_n(t+1) = q'+1 \mid Q^{\phi}_n(t) = q', h(t), p(t)\big) = \rho\,\mathbb{1}_{\{\mathrm{SINR}_n(t) < \theta\}},$$
$$\mathbb{P}\big(Q^{\phi}_n(t+1) = [q'-1]^+ \mid Q^{\phi}_n(t) = q', h(t), p(t)\big) = (1-\rho)\,\mathbb{1}_{\{\mathrm{SINR}_n(t) \geq \theta\}}.$$

B. Transition probabilities
The transition probabilities can be found in Table I below.
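The expressions in Appendix VI-A (and hence the $\gamma$'s in Table I) can be sanity-checked by enumerating the one-slot queue update directly; a small illustrative sketch (the function name is ours):

```python
def queue_transition(q, q_next, served, rho, q_max):
    """P(Q(t+1) = q_next | Q(t) = q), conditioning on the SINR test."""
    dep = 1 if served else 0
    p = 0.0
    for a, pa in ((1, rho), (0, 1.0 - rho)):      # arrival / no arrival
        if min(max(q - dep, 0) + a, q_max) == q_next:
            p += pa
    return p
```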
C. Proof of Proposition 3
We write $E(\bar{s}) = E(\vec{\bar{s}})$, since the cost only depends on the equilibrium control $\bar{s} = \bar{s}_4$. To prove convexity of $E(\bar{s})$ it suffices to show that $\mathrm{d}^2 E(\bar{s})/\mathrm{d}\bar{s}^2 \geq 0$ for all $\bar{s} \in [0, 1]$. Let us first compute $\partial \bar{m}_1/\partial \bar{s}$, namely,
$$\frac{\partial \bar{m}_1}{\partial \bar{s}} = \frac{\beta_0\beta_1(\rho - \rho^2)}{(\rho + \beta_1\bar{s}(1-\rho))^2} \geq 0;$$
to see that the latter is always greater than or equal to $0$ it suffices to recall that $0 < \rho < 1$, $0 < \beta_1 < 1$ and $\bar{s} \in [0, 1]$. Similarly,
$$\frac{\partial \bar{m}_3}{\partial \bar{s}} = \frac{\beta_1^2(\rho - \rho^2)}{(\rho + \beta_1\bar{s}(1-\rho))^2} \geq 0.$$
We now compute $\partial E(\bar{s})/\partial \bar{s}$. Substituting $\bar{m}_4 = \beta_1\rho/(\rho + \beta_1\bar{s}(1-\rho))$ and $\bar{m}_2 + \bar{m}_4 = \rho/(\rho + \beta_1\bar{s}(1-\rho))$ in (8), we obtain
$$E(\bar{s}) = \frac{\theta N_0\,\bar{s}\,(\rho + \beta_1\bar{s}(1-\rho))}{\rho + \beta_1\bar{s}(1-\rho) - \theta\beta_1\rho} + \frac{\lambda\rho}{\rho + \beta_1\bar{s}(1-\rho)}.$$
After some algebra, $\partial E(\bar{s})/\partial \bar{s}$ equals
$$\frac{\theta N_0\big(2\beta_1\rho\bar{s}(1-\rho)(1-\theta\beta_1) + \rho^2(1-\theta\beta_1) + \bar{s}^2\beta_1^2(1-\rho)^2\big)}{(\rho + \beta_1\bar{s}(1-\rho) - \theta\beta_1\rho)^2} - \frac{\lambda\rho\beta_1(1-\rho)}{(\rho + \beta_1\bar{s}(1-\rho))^2}.$$

TABLE I
TRANSITION PROBABILITIES FROM STATE $s_i$ TO $s_j$

Write $j = \ell(Q_{\max}+1) + r$ with $1 \leq r \leq Q_{\max}+1$ and $\ell \in \{0, \ldots, K-1\}$, and $i = \ell'(Q_{\max}+1) + r'$ with $1 \leq r' \leq Q_{\max}+1$ and $\ell' \in \{0, \ldots, K-1\}$.

Independent and identically distributed channel model:
$$\gamma^1_{i,j}(m) = \begin{cases} \beta_\ell\,\rho, & \text{if } r = r', \\ \beta_\ell\,(1-\rho), & \text{if } r = \max\{r'-1,\, 1\}, \\ 0, & \text{otherwise}, \end{cases} \qquad \gamma^0_{i,j}(m) = \begin{cases} \beta_\ell\,(1-\rho), & \text{if } r = r', \\ \beta_\ell\,\rho, & \text{if } r = \min\{r'+1,\, Q_{\max}+1\}, \\ 0, & \text{otherwise}. \end{cases}$$

Markov channel model: the same expressions with $\beta_\ell$ replaced by $b_{\ell'\ell}$, i.e.,
$$\gamma^1_{i,j}(m) = \begin{cases} b_{\ell'\ell}\,\rho, & \text{if } r = r', \\ b_{\ell'\ell}\,(1-\rho), & \text{if } r = \max\{r'-1,\, 1\}, \\ 0, & \text{otherwise}, \end{cases} \qquad \gamma^0_{i,j}(m) = \begin{cases} b_{\ell'\ell}\,(1-\rho), & \text{if } r = r', \\ b_{\ell'\ell}\,\rho, & \text{if } r = \min\{r'+1,\, Q_{\max}+1\}, \\ 0, & \text{otherwise}, \end{cases}$$
where $\beta_\ell = \mathbb{P}(h_n(t) = c_\ell)$ (the probability that user $n$ is in channel state $c_\ell$ in the i.i.d. model) and $b_{\ell'\ell}$ is the channel transition probability in the Markov model.

$\partial^2 E(\bar{s})/\partial \bar{s}^2$ can now easily be computed; we obtain
$$\frac{\partial^2 E(\bar{s})}{\partial \bar{s}^2} = -\frac{2\theta^2 N_0\,\rho^2\beta_1^2(1-\rho)(1-\beta_1\theta)}{(\rho + \beta_1\bar{s}(1-\rho) - \theta\beta_1\rho)^3} + \frac{2\lambda\rho\beta_1^2(1-\rho)^2}{(\rho + \beta_1\bar{s}(1-\rho))^3} = 2\beta_1^2\rho(1-\rho)\left(\frac{\lambda(1-\rho)}{(\rho + \beta_1\bar{s}(1-\rho))^3} - \frac{\theta^2 N_0\,\rho(1-\beta_1\theta)}{(\rho + \beta_1\bar{s}(1-\rho) - \theta\beta_1\rho)^3}\right).$$
We want to show the latter to be $\geq 0$, so it suffices to show
$$\frac{\theta^2 N_0\,\rho(1-\beta_1\theta)}{(\rho + \beta_1\bar{s}(1-\rho) - \theta\beta_1\rho)^3} - \frac{\lambda(1-\rho)}{(\rho + \beta_1\bar{s}(1-\rho))^3} \leq 0,$$
which holds if and only if
$$N_0 \leq f(\bar{s}), \qquad (12)$$
with
$$f(\bar{s}) = \frac{\lambda(1-\rho)\,(\rho + \beta_1\bar{s}(1-\rho) - \rho\beta_1\theta)^3}{\theta^2\rho\,(1-\beta_1\theta)\,(\rho + \beta_1\bar{s}(1-\rho))^3}.$$
We will now show that Inequality (12) is implied by the condition on $N_0$ in the statement of the proposition. To do so we show that $f(\bar{s})$ is increasing in $\bar{s} \in [0, 1]$, that is, $f'(\bar{s}) \geq 0$ for all $\bar{s} \in [0, 1]$. We have
$$f'(\bar{s}) = \frac{3\lambda(1-\rho)}{\theta^2\rho(1-\beta_1\theta)}\left(1 - \frac{\theta\beta_1\rho}{\rho + \beta_1\bar{s}(1-\rho)}\right)^2 \cdot \frac{\theta\beta_1^2\rho(1-\rho)}{(\rho + \beta_1\bar{s}(1-\rho))^2} \geq 0.$$
Inequality (12) is therefore satisfied, since the condition in the statement ensures
$$N_0 \leq \frac{\lambda(1-\rho)(1-\beta_1\theta)^2}{\rho\theta^2} = f(0) \leq f(\bar{s}).$$
Hence $\partial^2 E(\bar{s})/\partial \bar{s}^2 \geq 0$, and $E(\bar{s})$ is convex in $\bar{s}$ for all $N_0 \leq f(0)$.

D. Proof of Proposition 4

Proof.
We want to characterize the optimal equilibrium point. In Proposition 3 we have proven that the equilibrium cost $E^*(\vec{\bar s})$ is convex in $\bar s$; we therefore distinguish between three possible cases, see Figure 1.

Fig. 1. Case 1: the cost at equilibrium is increasing in $\bar s$. Case 2: there exists $\bar s \in (0,1)$ such that $\mathrm{d}E(\bar s)/\mathrm{d}\bar s = 0$. Case 3: the cost at equilibrium is decreasing in $\bar s$.

• Case 1: $\mathrm{d}E^*/\mathrm{d}\bar s \ge 0$ for all $\bar s \in [0,1]$. In this case $s^* = 0$.
• Case 2: $\mathrm{d}E(\bar s)/\mathrm{d}\bar s = 0$ for some $\bar s \in (0,1)$. In this case $s^*$ is such that $\mathrm{d}E(s^*)/\mathrm{d}s^* = 0$.
• Case 3: $\mathrm{d}E(\bar s)/\mathrm{d}\bar s \le 0$ for all $\bar s \in [0,1]$. In this case $s^* = 1$.

We compute the first derivative of $E(\vec{\bar s})$ w.r.t. $\bar s$, which after some algebra reduces to
\[
\frac{\theta N\big(2\beta_2\rho\bar s(1-\rho)(1-\theta\beta_2)+\rho^2(1-\theta\beta_2)+\bar s^2\beta_2^2(1-\rho)^2\big)}{(\rho+\beta_2\bar s(1-\rho)-\theta\beta_2\rho)^2} - \frac{\lambda\rho\beta_2(1-\rho)}{(\rho+\beta_2\bar s(1-\rho))^2}. \quad (13)
\]
We know the latter is an increasing function of $\bar s$, since its derivative w.r.t. $\bar s$ is $\ge 0$ (see Proposition 3). We then know that if $\mathrm{d}E(\vec{\bar s})/\mathrm{d}\bar s \ge 0$ for $\bar s = 0$ then it is also $\ge 0$ for $\bar s \in (0,1]$. Similarly, if $\mathrm{d}E(\vec{\bar s})/\mathrm{d}\bar s \le 0$ for $\bar s = 1$ then it is also $\le 0$ for $\bar s \in [0,1)$. Next we find the condition so that $\mathrm{d}E(\vec{\bar s})/\mathrm{d}\bar s \ge 0$ for $\bar s = 0$. Substituting $\bar s = 0$ in Equation (13) we obtain
\[
\left.\frac{\mathrm{d}E^*(\vec{\bar s})}{\mathrm{d}\bar s}\right|_{\bar s=0} \ge 0 \iff N \ge \frac{\lambda\beta_2(1-\rho)(1-\beta_2\theta)}{\rho\theta}.
\]
Equivalently, if we substitute $\bar s = 1$ in Equation (13) we obtain
\[
\left.\frac{\mathrm{d}E^*(\vec{\bar s})}{\mathrm{d}\bar s}\right|_{\bar s=1} \le 0 \iff N \le \frac{\lambda\beta_2\rho(1-\rho)\big(\rho+\beta_2-\beta_2\rho(1+\theta)\big)^2/(\beta_2+\rho-\beta_2\rho)^2}{\theta\big(2\beta_2(1-\rho)\rho(1-\theta\beta_2)+\rho^2(1-\theta\beta_2)+\beta_2^2(1-\rho)^2\big)}.
\]
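The two boundary conditions above define thresholds on $N$; between them, the optimum is interior and can be located numerically since the derivative in Equation (13) is increasing. A minimal sketch (Python; illustrative parameter values, not from the paper):

```python
# First derivative of the equilibrium cost, Equation (13). Here beta denotes
# the probability beta2 of the transmitting channel state; values are
# illustrative only.
theta, beta, rho, lam = 0.5, 0.6, 0.3, 2.0

def dE(s, N):
    A = rho + beta * s * (1 - rho)
    num = (2 * beta * rho * s * (1 - rho) * (1 - theta * beta)
           + rho**2 * (1 - theta * beta)
           + s**2 * beta**2 * (1 - rho)**2)
    return (theta * N * num / (A - theta * beta * rho)**2
            - lam * rho * beta * (1 - rho) / A**2)

# N0: derivative nonnegative at s = 0; N1: derivative nonpositive at s = 1.
N0 = lam * beta * (1 - rho) * (1 - beta * theta) / (rho * theta)
A1 = rho + beta * (1 - rho)
N1 = (lam * beta * rho * (1 - rho) * (A1 - theta * beta * rho)**2
      / (A1**2 * theta * (2 * beta * (1 - rho) * rho * (1 - theta * beta)
                          + rho**2 * (1 - theta * beta)
                          + beta**2 * (1 - rho)**2)))

def interior_s(N, lo=0.0, hi=1.0):
    # dE(., N) is increasing in s (Proposition 3), so bisection applies.
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if dE(mid, N) < 0 else (lo, mid)
    return (lo + hi) / 2

N = (N0 + N1) / 2                 # a population size between the thresholds
s_star = interior_s(N)            # interior equilibrium control
mbar4 = beta * rho / (rho + beta * s_star * (1 - rho))
# Same N, expressed through the equilibrium mean-field point (cf. the reduced
# expression used later in the proof).
N_check = (lam * (1 - rho) * mbar4**2 * (1 - theta * mbar4)**2
           / (theta * rho * (theta * mbar4 * (mbar4 - 2 * beta) + beta)))
```

With these values `N1 < N0`, the bisection returns an interior `s_star`, and `N_check` recovers `N`, confirming that the two forms of the stationarity condition are equivalent.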
We have therefore proven that if $N \ge N_0$ then $s^* = 0$, if $N \le N_1$ then $s^* = 1$, and if $N \in (N_1, N_0)$ then $s^* \in (0,1)$, where $N_0$ and $N_1$ denote the two bounds obtained above at $\bar s = 0$ and $\bar s = 1$, respectively. In the last case $s^*$ is the solution obtained by equating Equation (13) with 0, that is,
\[
N = \frac{\lambda\rho\beta_2(1-\rho)\big(\rho+\beta_2\bar s(1-\rho)-\theta\beta_2\rho\big)^2/\big(\rho+\beta_2\bar s(1-\rho)\big)^2}{\theta\big(2\beta_2\bar s(1-\rho)(1-\theta\beta_2)\rho+\rho^2(1-\theta\beta_2)+\bar s^2\beta_2^2(1-\rho)^2\big)}. \quad (14)
\]
The latter after substitution of $\bar s = \rho(\beta_2-\bar m_4)/(\beta_2(1-\rho)\bar m_4)$ yields
\[
N = \frac{\lambda\beta_2\rho(1-\rho)\big(\rho+\rho(\beta_2-\bar m_4)/\bar m_4-\theta\beta_2\rho\big)^2/\big(\rho+\rho(\beta_2-\bar m_4)/\bar m_4\big)^2}{\theta\big(2\rho^2(\beta_2-\bar m_4)(1-\theta\beta_2)/\bar m_4+\rho^2(1-\theta\beta_2)+\rho^2(\beta_2-\bar m_4)^2/\bar m_4^2\big)}, \quad (15)
\]
which after some algebra reduces to
\[
N = \frac{\lambda\beta_2(1-\rho)\bar m_4^2(1-\theta\bar m_4)^2}{\theta\rho\big((1-\beta_2\theta)\bar m_4(2\beta_2-\bar m_4)+(\beta_2-\bar m_4)^2\big)}
= \frac{\lambda(1-\rho)\bar m_4^2(1-\theta\bar m_4)^2}{\theta\rho\big(\theta\bar m_4(\bar m_4-2\beta_2)+\beta_2\big)}. \quad (16)
\]

E. Proof of Proposition 5

Proof.
In order to prove that the control in the statement of Proposition 5 is optimal it suffices to show that the Hamilton-Jacobi-Bellman (HJB) equation is satisfied. The HJB equation is a partial differential equation that serves as a sufficient condition for optimality in optimal control problems, see [2]. The HJB equation in our particular problem reduces to the following condition,
\[
\min\{V^0(\vec m),\, V^1(\vec m)\} = 0, \quad (17)
\]
for all $\vec m \in [0,1]^4$, where
\[
V^0(\vec m) = \lambda(m_2+m_4) - E^* + \frac{\partial V(\vec m)}{\partial \vec m}\cdot\varphi^0(\vec m), \quad (18)
\]
\[
V^1(\vec m) = \frac{\theta N}{1-\theta m_4} + \lambda(m_2+m_4) - E^* + \frac{\partial V(\vec m)}{\partial \vec m}\cdot\varphi^1(\vec m), \quad (19)
\]
and $V(\cdot)$ the Bellman value function. In the latter equations, $\varphi^a(\vec m)$ for $a \in \{0,1\}$ represents the vector of the evolutions of the states $m_i$, $i = 1,\dots,4$, under action $a$. We will denote by $a=1$ the active action, and by $a=0$ the passive action. Then $\varphi^a(\vec m) = (\varphi_1^a(\vec m),\dots,\varphi_4^a(\vec m))'$, with
\[
\varphi_1^a(\vec m) = -(\beta_2+\beta_1\rho)m_1 + \beta_1(1-\rho)m_3 + a\beta_1(1-\rho)m_4,
\]
\[
\varphi_2^a(\vec m) = \beta_1\rho m_1 - \beta_2 m_2 + \beta_1\rho m_3 + (\rho a+(1-a))\beta_1 m_4,
\]
\[
\varphi_3^a(\vec m) = \beta_2(1-\rho)m_1 - (\beta_2\rho+\beta_1)m_3 + a\beta_2(1-\rho)m_4,
\]
\[
\varphi_4^a(\vec m) = \beta_2\rho m_1 + \beta_2 m_2 + \beta_2\rho m_3 - (\beta_2 a(1-\rho)+\beta_1)m_4. \quad (20)
\]
Condition (17) is written for the case $s(t) \in \{0,1\}$, as these are all the possible controls we are interested in. We first note that if an optimal solution satisfies the HJB equation then the following conditions must hold at all switching points:
\[
V^0(\vec m) = V^1(\vec m), \quad (21)
\]
\[
V^0(\vec m) = 0, \quad (22)
\]
\[
\frac{\partial}{\partial m_1}\left(\frac{\partial V(\vec m)}{\partial m_2}\right) = \frac{\partial}{\partial m_2}\left(\frac{\partial V(\vec m)}{\partial m_1}\right). \quad (23)
\]
Condition (21) must hold at all points $\vec m$ for which being active or passive is equally attractive, namely, at all switching points.
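As a sanity check, the drift field written in Equation (20) should vanish at the mean-field equilibrium point used in Propositions 3 and 4. The sketch below (Python; illustrative parameters; the state ordering and the randomized control, entering as the mixture $(1-\bar s)\varphi^0 + \bar s\varphi^1$, are as reconstructed above) verifies this and the conservation of total mass:

```python
# States: 1 = (c1, empty queue), 2 = (c1, nonempty), 3 = (c2, empty),
# 4 = (c2, nonempty), with c2 the channel state in which devices transmit.
# beta1, beta2: i.i.d. channel probabilities; rho: arrival probability.
# Illustrative values only.
beta1, beta2, rho, sbar = 0.4, 0.6, 0.3, 0.7

def phi(m, a):
    # Drift vector of Equation (20) under action a in {0, 1}.
    m1, m2, m3, m4 = m
    return (
        -(beta2 + beta1 * rho) * m1 + beta1 * (1 - rho) * m3 + a * beta1 * (1 - rho) * m4,
        beta1 * rho * m1 - beta2 * m2 + beta1 * rho * m3 + (rho * a + (1 - a)) * beta1 * m4,
        beta2 * (1 - rho) * m1 - (beta2 * rho + beta1) * m3 + a * beta2 * (1 - rho) * m4,
        beta2 * rho * m1 + beta2 * m2 + beta2 * rho * m3 - (beta2 * a * (1 - rho) + beta1) * m4,
    )

# Equilibrium: a fraction u of queues is nonempty, split across the two
# channel states according to (beta1, beta2).
u = rho / (rho + beta2 * sbar * (1 - rho))
m_eq = (beta1 * (1 - u), beta1 * u, beta2 * (1 - u), beta2 * u)

# Drift under the randomized stationary control sbar.
drift = tuple((1 - sbar) * p0 + sbar * p1
              for p0, p1 in zip(phi(m_eq, 0), phi(m_eq, 1)))
```

Every component of `drift` is zero (up to floating-point error), and the four drift components sum to zero at any point, so the simplex is invariant.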
Condition (22) must be satisfied by all $\vec m$ at which the passive action is optimal, in particular for all switching points. Finally, Condition (23), symmetry of the second derivatives of the value function, must hold at all points: the partial derivative $\partial V(\vec m)/\partial m_i$ must be continuous across the decision boundary.

Before proving that all three conditions (21)-(23) are satisfied we are going to show that $\partial V(\vec m)/\partial m_1 = \partial V(\vec m)/\partial m_3$. By definition
\[
V(\vec m) = \int_0^\infty \left(\frac{\theta N s^\pi(t)}{1-\theta m_4^\pi(t)s^\pi(t)} + \lambda\big(1-m_1^\pi(t)-m_3^\pi(t)\big) - E^*\right)\mathrm{d}t,
\]
where $\pi$ is considered to be an optimal policy. Therefore we have
\[
\frac{\partial V(\vec m)}{\partial m_i} = \int_0^\infty \frac{\partial}{\partial m_i}\left(\frac{\theta N s^\pi(t)}{1-\theta m_4^\pi(t)s^\pi(t)} + \lambda\big(1-m_1^\pi(t)-m_3^\pi(t)\big)\right)\mathrm{d}t,
\]
for all $i$. We are going to show that
\[
\frac{\partial\big(m_1^\pi(t)+m_3^\pi(t)\big)}{\partial m_1} = \frac{\partial\big(m_1^\pi(t)+m_3^\pi(t)\big)}{\partial m_3}, \qquad \frac{\partial}{\partial m_1}\left(\frac{1}{1-\theta m_4^\pi(t)}\right) = \frac{\partial}{\partial m_3}\left(\frac{1}{1-\theta m_4^\pi(t)}\right). \quad (24)
\]
The policy $\pi$ is a combination of passive and active intervals; we will therefore compute $m_i^{\pi,a}(t)$ in a passive time interval (when $a=0$) and in an active time interval (when $a=1$), for all $i = 1,2,3$. We will later prove that Equations (24) are satisfied. Note that
\[
\frac{\mathrm{d}m_i^{\pi,a}(t)}{\mathrm{d}t} = \varphi_i^a\big(\vec m^{\pi,a}(t)\big), \quad \text{for all } i = 1,2,3. \quad (25)
\]
We do not consider $m_4^{\pi,a}(t)$, since $m_4^{\pi,a}(t) = 1-m_1^{\pi,a}(t)-m_2^{\pi,a}(t)-m_3^{\pi,a}(t)$. If we solve the ordinary differential equation system (25) for $a=0$ we obtain
\[
m_1^{\pi,0}(t) = (\beta_2 m_1-\beta_1 m_3)\mathrm{e}^{-t} + \beta_1(m_1+m_3)\mathrm{e}^{-\rho t},
\]
\[
m_3^{\pi,0}(t) = (\beta_1 m_3-\beta_2 m_1)\mathrm{e}^{-t} + \beta_2(m_1+m_3)\mathrm{e}^{-\rho t},
\]
\[
m_2^{\pi,0}(t) = \beta_1\big(1+\mathrm{e}^{-t}(m_1+m_3-1)\big) + \mathrm{e}^{-t}m_2 - (m_1+m_3)\beta_1\mathrm{e}^{-\rho t},
\]
for all initial points $\vec m$.
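These passive-phase closed forms can be cross-checked against a direct numerical integration of (25) with the drift of Equation (20) as reconstructed above. A minimal sketch (Python; illustrative parameters):

```python
import math

# Illustrative parameters; beta1 + beta2 = 1, rho is the arrival probability.
beta1, beta2, rho = 0.4, 0.6, 0.3
m1_0, m2_0, m3_0 = 0.2, 0.3, 0.1          # initial point (m4 = 1 - sum)

def passive_closed_form(t):
    p0 = m1_0 + m3_0
    m1 = (beta2 * m1_0 - beta1 * m3_0) * math.exp(-t) + beta1 * p0 * math.exp(-rho * t)
    m3 = (beta1 * m3_0 - beta2 * m1_0) * math.exp(-t) + beta2 * p0 * math.exp(-rho * t)
    m2 = (beta1 * (1 + math.exp(-t) * (p0 - 1)) + math.exp(-t) * m2_0
          - beta1 * p0 * math.exp(-rho * t))
    return m1, m2, m3

def passive_euler(t, steps=100000):
    # Explicit Euler integration of (25) with a = 0 and m4 eliminated.
    dt = t / steps
    m1, m2, m3 = m1_0, m2_0, m3_0
    for _ in range(steps):
        m4 = 1 - m1 - m2 - m3
        d1 = -(beta2 + beta1 * rho) * m1 + beta1 * (1 - rho) * m3
        d2 = beta1 * rho * m1 - beta2 * m2 + beta1 * rho * m3 + beta1 * m4
        d3 = beta2 * (1 - rho) * m1 - (beta2 * rho + beta1) * m3
        m1, m2, m3 = m1 + dt * d1, m2 + dt * d2, m3 + dt * d3
    return m1, m2, m3

cf = passive_closed_form(1.0)
eu = passive_euler(1.0)
```

The two trajectories coincide up to the Euler discretization error.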
For an active time interval ($a=1$), solving (25) and writing $\kappa = 1+\beta_1(\rho-1)$, we obtain
\[
m_2^{\pi,1}(t) = \frac{\beta_1\rho}{\kappa} + \left(m_2-\frac{\beta_1\rho}{\kappa}\right)\mathrm{e}^{-\kappa t},
\]
\[
m_1^{\pi,1}(t) = m_1\mathrm{e}^{-t} + \frac{\beta_1\beta_2(1-\rho)}{\kappa}\big(1-\mathrm{e}^{-t}\big) - \left(m_2-\frac{\beta_1\rho}{\kappa}\right)\big(\mathrm{e}^{-\kappa t}-\mathrm{e}^{-t}\big),
\]
\[
m_3^{\pi,1}(t) = m_3\mathrm{e}^{-t} + \frac{\beta_2^2(1-\rho)}{\kappa}\big(1-\mathrm{e}^{-t}\big) - \frac{\beta_2}{\beta_1}\left(m_2-\frac{\beta_1\rho}{\kappa}\right)\big(\mathrm{e}^{-\kappa t}-\mathrm{e}^{-t}\big).
\]
It is now easy to show that $\partial(m_1^{\pi,a}(t)+m_3^{\pi,a}(t))/\partial m_1 = \partial(m_1^{\pi,a}(t)+m_3^{\pi,a}(t))/\partial m_3$, since
\[
\frac{\partial\big(m_1^{\pi,1}(t)+m_3^{\pi,1}(t)\big)}{\partial m_i} = \mathrm{e}^{-t}, \quad \text{for } i = 1,3, \qquad \frac{\partial\big(m_1^{\pi,0}(t)+m_3^{\pi,0}(t)\big)}{\partial m_i} = \mathrm{e}^{-\rho t}, \quad \text{for } i = 1,3.
\]
Besides, $\partial\big(1/(1-\theta m_4^{\pi,a}(t))\big)/\partial m_1 = \partial\big(1/(1-\theta m_4^{\pi,a}(t))\big)/\partial m_3$ for $a = 0,1$, since $\partial m_4^{\pi,a}(t)/\partial m_1 = \partial m_4^{\pi,a}(t)/\partial m_3$ for $a = 0,1$; indeed, $\partial m_4^{\pi,1}(t)/\partial m_i = -\mathrm{e}^{-t}$ and $\partial m_4^{\pi,0}(t)/\partial m_i = -\beta_1\mathrm{e}^{-t}-\beta_2\mathrm{e}^{-\rho t}$ for $i = 1,3$. Hence, $\partial V(\vec m)/\partial m_1 = \partial V(\vec m)/\partial m_3$.

Let us now show under which conditions Equations (21)-(23) are satisfied. We start from Equation (21), namely,
\[
V^0(\vec m) = \lambda(1-m_1-m_3) - E^* + \frac{\partial V(\vec m)}{\partial m_1}\big(\varphi_1^0(\vec m)+\varphi_3^0(\vec m)\big) + \frac{\partial V(\vec m)}{\partial m_2}\varphi_2^0(\vec m)
\]
\[
= \frac{\theta N}{1-\theta m_4} + \lambda(1-m_1-m_3) - E^* + \frac{\partial V(\vec m)}{\partial m_1}\big(\varphi_1^1(\vec m)+\varphi_3^1(\vec m)\big) + \frac{\partial V(\vec m)}{\partial m_2}\varphi_2^1(\vec m) = V^1(\vec m).
\]
From the latter we obtain the condition
\[
\frac{\partial V(\vec m)}{\partial m_1}\big(\varphi_1^0(\vec m)+\varphi_3^0(\vec m)-\varphi_1^1(\vec m)-\varphi_3^1(\vec m)\big) = \frac{\theta N}{1-\theta m_4} + \frac{\partial V(\vec m)}{\partial m_2}\big(\varphi_2^1(\vec m)-\varphi_2^0(\vec m)\big),
\]
which after substitution of the values of $\varphi_i^a(\vec m)$, given by Equation (20), gives
\[
\frac{\partial V(\vec m)}{\partial m_2} = \frac{1}{\beta_1}\frac{\partial V(\vec m)}{\partial m_1} + \frac{\theta N}{(1-\theta m_4)(1-\rho)\beta_1 m_4}. \quad (26)
\]
We solve for Equation (22) next. Namely,
\[
V^0(\vec m) = 0 \iff \lambda(1-m_1-m_3) - E^* + \frac{\partial V(\vec m)}{\partial m_1}\big(\varphi_1^0(\vec m)+\varphi_3^0(\vec m)\big) + \frac{\partial V(\vec m)}{\partial m_2}\varphi_2^0(\vec m) = 0,
\]
which after substitution of Equation (26) and of $\varphi_i^0(\vec m)$ for all $i = 1,2,3$, given by Equation (20), yields
\[
\frac{\partial V(\vec m)}{\partial m_1} = \frac{\big(E^*-\lambda(1-m_1-m_3)\big)\beta_1}{\beta_1-\beta_1(m_1+m_3)-m_2} - \frac{\theta N\big(\beta_1(\rho-1)(m_1+m_3)+\beta_1-m_2\big)}{(1-\theta m_4)(1-\rho)\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)m_4}.
\]
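The active-phase closed forms above can be validated in the same way as the passive ones, by integrating the active drift of Equation (20) numerically (Python sketch; illustrative parameters, dynamics as reconstructed above):

```python
import math

beta1, beta2, rho = 0.4, 0.6, 0.3       # illustrative; beta1 + beta2 = 1
m1_0, m2_0, m3_0 = 0.2, 0.3, 0.1
kappa = 1 + beta1 * (rho - 1)           # = beta2 + beta1*rho

def active_closed_form(t):
    c = m2_0 - beta1 * rho / kappa
    m2 = beta1 * rho / kappa + c * math.exp(-kappa * t)
    mix = math.exp(-kappa * t) - math.exp(-t)
    m1 = (m1_0 * math.exp(-t)
          + (beta1 * beta2 * (1 - rho) / kappa) * (1 - math.exp(-t)) - c * mix)
    m3 = (m3_0 * math.exp(-t)
          + (beta2**2 * (1 - rho) / kappa) * (1 - math.exp(-t))
          - (beta2 / beta1) * c * mix)
    return m1, m2, m3

def active_euler(t, steps=100000):
    # Explicit Euler integration of (25) with a = 1 and m4 eliminated.
    dt = t / steps
    m1, m2, m3 = m1_0, m2_0, m3_0
    for _ in range(steps):
        m4 = 1 - m1 - m2 - m3
        d1 = -(beta2 + beta1 * rho) * m1 + beta1 * (1 - rho) * m3 + beta1 * (1 - rho) * m4
        d2 = beta1 * rho * m1 - beta2 * m2 + beta1 * rho * m3 + rho * beta1 * m4
        d3 = beta2 * (1 - rho) * m1 - (beta2 * rho + beta1) * m3 + beta2 * (1 - rho) * m4
        m1, m2, m3 = m1 + dt * d1, m2 + dt * d2, m3 + dt * d3
    return m1, m2, m3

cf = active_closed_form(1.0)
eu = active_euler(1.0)
```

Again the closed forms and the numerical trajectory agree up to the discretization error.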
Therefore, substituting the value of $\partial V(\vec m)/\partial m_1$ obtained above in Equation (26) we get
\[
\frac{\partial V(\vec m)}{\partial m_2} = \frac{\theta N}{(1-\theta m_4)(1-\rho)\beta_1 m_4} + \frac{E^*-\lambda(1-m_1-m_3)}{\beta_1-\beta_1(m_1+m_3)-m_2} - \frac{\theta N\big(\beta_1(\rho-1)(m_1+m_3)+\beta_1-m_2\big)}{\beta_1(1-\theta m_4)(1-\rho)\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)m_4}
\]
\[
= \frac{E^*-\lambda(1-m_1-m_3)}{\beta_1-\beta_1(m_1+m_3)-m_2} - \frac{\theta N\beta_1\rho(m_1+m_3)}{\beta_1 m_4(1-\theta m_4)(1-\rho)\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)}.
\]
We are now left with Equation (23), that is,
\[
\frac{\partial}{\partial m_2}\left(\frac{\partial V(\vec m)}{\partial m_1}\right) = \frac{\partial}{\partial m_1}\left(\frac{\partial V(\vec m)}{\partial m_2}\right).
\]
Let us first compute $\partial/\partial m_2\big(\partial V(\vec m)/\partial m_1\big)$, namely,
\[
\frac{\partial}{\partial m_2}\left(\frac{\partial V(\vec m)}{\partial m_1}\right) = \frac{\beta_1\big(E^*-\lambda(1-m_1-m_3)\big)}{\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)^2} - \frac{\theta N\beta_1\rho(m_1+m_3)}{(1-\rho)(1-\theta m_4)m_4\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)^2} - \frac{\theta N\big(\beta_1(\rho-1)(m_1+m_3)+\beta_1-m_2\big)(1-2\theta m_4)}{(1-\rho)(1-\theta m_4)^2 m_4^2\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)}.
\]
We now compute $\partial/\partial m_1\big(\partial V(\vec m)/\partial m_2\big)$, that is,
\[
\frac{\partial}{\partial m_1}\left(\frac{\partial V(\vec m)}{\partial m_2}\right) = \frac{E^*\beta_1-\lambda m_2}{\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)^2} - \frac{\theta N\rho(\beta_1-m_2)}{(1-\theta m_4)(1-\rho)m_4\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)^2} - \frac{\theta N\rho(m_1+m_3)(1-2\theta m_4)}{(1-\rho)(1-\theta m_4)^2 m_4^2\big(\beta_1-\beta_1(m_1+m_3)-m_2\big)}.
\]
By equating $\partial/\partial m_2\big(\partial V(\vec m)/\partial m_1\big)$ and $\partial/\partial m_1\big(\partial V(\vec m)/\partial m_2\big)$ we obtain
\[
N = \frac{\lambda m_4^2(\rho-1)(\theta m_4-1)^2}{\theta\Big(\beta_2\big(1+(m_1+m_3)(\rho-1)\big)(2\theta m_4-1) + m_4\big(1-2\theta m_4+\rho(\theta m_4-1)\big)\Big)}. \quad (27)
\]
Note that in equilibrium
\[
\bar\varphi_4(\vec{\bar m}) = \beta_2(\rho-1)(\bar m_1+\bar m_3) + \beta_2 - \bar m_4 - \beta_2\bar s(1-\rho)\bar m_4 = 0 \quad\text{and}\quad \bar s = \frac{\rho(\beta_2-\bar m_4)}{\beta_2(1-\rho)\bar m_4},
\]
therefore
\[
\beta_2(\rho-1)(\bar m_1+\bar m_3) + \beta_2 = \bar m_4 + \rho(\beta_2-\bar m_4).
\]
Substituting the latter in the denominator of Equation (27) we obtain
\[
N = \frac{\lambda \bar m_4^2(\rho-1)(\theta\bar m_4-1)^2}{\theta\Big(\big(\bar m_4+\rho(\beta_2-\bar m_4)\big)(2\theta\bar m_4-1)+\bar m_4\big(1-2\theta\bar m_4+\rho(\theta\bar m_4-1)\big)\Big)}
= \frac{\lambda \bar m_4^2(\rho-1)(\theta\bar m_4-1)^2}{\theta\rho\big(\theta\bar m_4(2\beta_2-\bar m_4)-\beta_2\big)}
= \frac{\lambda \bar m_4^2(1-\rho)(1-\theta\bar m_4)^2}{\theta\rho\big(\theta\bar m_4(\bar m_4-2\beta_2)+\beta_2\big)}.
\]
The latter coincides with the expression for $N$ obtained in Equation (16), which concludes the proof.