Dynamic Service Rate Control for a Single Server Queue with Markov Modulated Arrivals
Ravi Kumar, Mark E. Lewis and Huseyin Topaloglu
School of Operations Research and Information Engineering
Cornell University, Ithaca, NY 14853
September 19, 2018

[email protected], [email protected], [email protected]

Abstract

We consider the problem of service rate control of a single server queueing system with a finite-state Markov-modulated Poisson arrival process. We show that the optimal service rate is non-decreasing in the number of customers in the system; higher congestion rates warrant higher service rates. On the contrary, however, we show that the optimal service rate is not necessarily monotone in the current arrival rate. If the modulating process satisfies a stochastic monotonicity property, the monotonicity is recovered. We examine several heuristics and show where heuristics are reasonable substitutes for the optimal control. None of the heuristics perform well in all the regimes. Second, we discuss when the Markov-modulated Poisson process with service rate control can act as a heuristic itself to approximate the control of a system with a periodic non-homogeneous Poisson arrival process. Not only is the current model of interest in the control of Internet or mobile networks with bursty traffic, but it is also useful in providing a tractable alternative for the control of service centers with non-stationary arrival rates.
Introduction
In this paper, we study a fundamental queueing control problem: managing the service rate of a server in the face of non-stationary arrival rates. We have a queue with an infinite buffer and a single server. Arrivals occur according to a Markov-modulated Poisson process (MMPP), which is to say that the rate of the Poisson process driving the arrivals into the system changes according to an exogenous Markov process. This exogenous Markov process is commonly referred to as the phase modulating process. The service times are assumed to be exponential. We incur a holding cost for each job in the system and there is a cost for running the server at different service rates. Given that the state of the phase modulating process and the state of the queue are known, we want to find a policy to adjust the service rate so as to minimize either the expected discounted cost or the long-run average cost rate over an infinite horizon.

Our model is motivated by the power-aware transmission policies that are becoming increasingly important over the Internet and in mobile networks. The goal of such policies is to control the power consumption of wireless devices by adjusting the transmission rate in response to the number of packets waiting to be transmitted in the buffer. Due to changes in incoming and outgoing traffic through the device, it is almost always the case that the packet arrivals display non-stationarities, creating periods of bursts followed by near-complete silence. Our use of an MMPP to model arrivals is intended to capture such periods of bursts and silence. An alternative model to capture the non-stationary nature of arrivals is the non-homogeneous Poisson process (NHPP), but for the wireless applications we have in mind, MMPP appears to be a more suitable model of non-stationarity since these applications involve arrival rate changes occurring at random points in time, whereas an NHPP models arrival rates as a fixed function of time.
Furthermore, when computing the optimal policy under an NHPP, one needs to keep track of time together with the queue length, resulting in an uncountable state space. This issue is not present when dealing with an MMPP.

Controlling queues when arrivals have varying rates poses interesting challenges. When controlling such queues, the policy in use not only needs to consider the current arrival rate, but it also needs to anticipate the arrival rate in the near future and adjust decisions accordingly. For example, if the current arrival rate is relatively low, but arrivals are expected to be more frequent in the near future, then the control policy may choose to proactively speed up the service rate to empty the system (as much as possible) before the higher arrival rate strikes. Similarly, if the current arrival rate is high and the current system load is high, the control policy may slow down the service rate in anticipation of lower arrival rates in the near future. The extent to which changes in arrival rates can be foreseen or anticipated depends on how the non-stationarity is modeled, but policies that explicitly address the non-stationarity in arrival rate are naturally expected to make better decisions than those that do not.

In Kendall's notation, our queueing system is classified as MMPP/M/1. The phase modulating process has state space {1, 2, . . . , L} and the arrival rates of the MMPP are ordered such that λ_1 ≤ λ_2 ≤ · · · ≤ λ_L. In this context, the first interesting question is whether the optimal service rate is monotone in the queue length, for a fixed state of the phase modulating process. We answer this question, not too surprisingly, in the affirmative, indicating that the optimal service rate is higher as we have more jobs in the buffer, all else being equal. The second interesting question is whether the optimal service rate is monotone in the state of the phase modulating process, for a fixed queue length.
Perhaps surprisingly, the answer to this question is not necessarily affirmative, indicating that the optimal service rate is not necessarily higher as the arrival rate becomes larger, all else being equal. This observation builds the intuition that the optimal policy should anticipate the arrival rate in the near future. For example, even if the current arrival rate is higher, the optimal policy may choose not to serve the jobs faster because the arrival rate is expected to slow down soon after the higher arrival rate strikes. Thus, although it may be surprising to see that the optimal service rate is not necessarily monotone in the state of the phase modulating process, this non-monotonicity embodies the intuitive expectation that the optimal policy may start using faster service rates even before higher arrival rates strike or may start using slower service rates even before arrival rates slow down. Motivated by this observation, a natural question is when we can expect the optimal policy to actually be monotone in the state of the phase modulating process so that the behavior that we just mentioned is not prevalent. We give sufficient conditions under which the optimal service rate is indeed monotone in the state of the phase modulating process. These conditions are simple to check and they only depend on the structure of the CTMC driving the phase process. These structural results are not only important in providing insights but are also useful in deriving efficient approximation methods. When the phase process for the MMPP has a large number of states, computing an optimal policy using value or policy iteration may still be a difficult task. In these situations, the structural properties of the value function can be used to develop approximate dynamic programming methods and obtain approximate results efficiently (see for example Powell [21]).

We include a numerical study with two goals in mind.
First, we examine when it is important to explicitly capture the non-stationary behavior of an arrival process via MMPP as opposed to using some natural heuristic like assuming the system has stationary arrivals. To implement the optimal policy, a decision maker needs to look at both the state of the queue and the state of the phase process of the MMPP, while a heuristic control mechanism based only on the queue length or some fixed service rate may be easier to implement. Thus, a comparison between the two helps in determining the value of a more complex control mechanism. Second, since we mentioned the alternative of using an NHPP to capture the non-stationarity in the arrivals, we explore the possibility of using a "suitable" MMPP to approximate the control policy for a system with an NHPP with a periodic rate function. We find that our preliminary results are encouraging. This is a significant diversion from previous studies since the focus here is on computing an optimal control and not solely on evaluating performance measures.

Most of the research related to Markov-modulated queueing systems deals with performance characteristics for systems without control. Excellent overviews of this line of work can be found in the survey paper by Prabhu [22] or, more recently, in Gupta et al. [13]. A hierarchical scheme based on MMPP was proposed by Muscariello et al. [20] to model the data generated by Internet users. Heffes et al. [14] used MMPP to approximate a statistical multiplexer whose inputs consist of a superposition of packetized voice sources and data. A two-state MMPP model was proposed by Shah-Heydari [24] to model the so-called aggregate asynchronous transfer mode traffic. For more general scenarios, Frost [10] proposed a scheme to approximate a simple NHPP using an MMPP by suitably quantizing the rate function of the NHPP into a finite number of rates.
Each rate corresponds to a state in the Markov modulating process and the parameters of the MMPP model can be estimated using empirical data.

There is also a rich body of literature on the subject of monotone optimal policies for the control of a single server queue in a setting similar to the one considered here but with stationary arrivals. See, for example, the classical work of Crabill [6], Lippman [18] and Stidham and Weber [25]. In the context of telecommunications systems, the existing literature addresses a more closely related problem of service rate control of queues when the job service requirements are influenced by an exogenous stochastic process. Such models arise frequently in point-to-point wireless data transmission where the induced transmission rates are affected by the time-varying properties of the transmission medium. Berry [3] considers a very general model for this problem under a discrete-time Markovian setting. In this work, packet arrivals follow a batch Markov process and the state of the transmission channel varies according to a secondary discrete-time Markov chain. The data buffer and transmitter are modeled using a single server queue with finite capacity. The goal of the transmitter is to minimize the average cost rate or power consumption over an infinite time horizon subject to a constraint on packet delay. Another case with a constraint on the probability of buffer overflow is also discussed in this work. The author proves several results related to the monotonicity of the optimal policy. Motivated by mobile networks, Ata and Zachariadis [2] address the problem of finding optimal service rates for multiple users that are being served by a central controller. Data gets transmitted through a time-varying channel that is being modulated by a two-state continuous time Markov chain. Packet data for each user arrives based on a Poisson process and gets stored in a finite capacity queue before getting transmitted.
The objective is to maximize some measure of overall quality of service subject to a constraint on the long-run average power consumption. (Similar to the present work, the policy for this type of model depends on both the queue length and the state of the exogenous process.)
We consider a single server queue with infinite buffer capacity and job arrivals that follow an MMPP. Each arriving job has an exponentially distributed service requirement with mean 1. The phase transition process for arrivals is an ergodic, finite state continuous time Markov chain with generator matrix Q. Let the state space for this process be denoted by S := {1, 2, . . . , L}. When the phase transition process is in phase s ∈ S, jobs arrive to the queue according to a Poisson process with rate λ_s. Without loss of generality we assume that the states are ordered such that λ_1 ≤ λ_2 ≤ · · · ≤ λ_L. Let the number of jobs in the system (buffer state) be denoted by n ∈ Z_+, where Z_+ is the set of non-negative integers. The service rate can be changed at the times of arrivals, departures or phase transitions. Together, the union of these event times and (in a moment) the added dummy transitions due to uniformization comprise the set of decision epochs. Based on the queue length, n, and the state of the arrival process, s, the controller selects a service rate µ_{n,s} from the compact set A = [0, ū], ū < ∞. When a service rate µ ∈ A is chosen, the system incurs a cost at the rate of c(µ) per unit time. The cost rate function, c(·), is defined on A and is assumed to be strictly convex, continuously differentiable, strictly increasing and (without loss of generality) such that c(0) = 0. Furthermore, a holding or congestion cost is incurred at rate h(n) per unit time when the buffer state is n. The holding cost function h(n) is assumed to be convex, non-decreasing in n and such that h(0) = 0 and lim_{n→∞} h(n) = ∞. In the average-cost case we assume h(·) to be non-decreasing and convex with polynomial rate of growth (h(n) ≤ Cn^p for some C ≥ 0, p ∈ Z_+) and again such that h(0) = 0 and lim_{n→∞} h(n) = ∞.
The assumption about the polynomial growth rate of the holding cost function in the average cost case is required for proving the existence of a policy that incurs costs at a finite rate.

Let Π be the set of non-anticipating policies. A stationary control policy, π ∈ Π, is defined as π = {µ(n, s) | n ∈ Z_+, s ∈ S}, where µ(n, s) is the service rate to be selected when the state of the system is (n, s). The controller remains idle when the queue is empty, i.e., for any policy, µ(0, s) ≡ 0. Thus, given a policy π, the overall process, X(t), evolves as a two dimensional continuous time Markov chain on the state space X = {(n, s) | n ∈ Z_+, s ∈ S}. Our objective is to find a control policy that minimizes the discounted expected cost or average expected cost per unit time over an infinite time horizon. For x = (n, s) and service rate µ ∈ A, let

f(x, µ) := c(µ) + h(n).

Let {(X^π(t), D^π(t)), t ≥ 0} be the stochastic process representing the evolution of states and decisions under an admissible policy π. Given the initial state x and discount factor α > 0, the α-discounted expected cost until time t under policy π is given by

v^π_{t,α}(x) := E^π_x [ ∫_0^t e^{−αu} f(X(u), D(u)) du ],    (2.1)

where E_x[·] := E[· | X(0) = x]. The total discounted expected cost of a policy π, given that the initial state of the system is x, is

v^π_α(x) := lim_{t→∞} v^π_{t,α}(x).

The optimal total discounted expected cost is

v^*_α(x) := inf_{π∈Π} v^π_α(x).

A policy, π^*, is total discounted expected cost optimal if v^{π^*}_α(x) = v^*_α(x) for all x ∈ X.

We apply uniformization in the spirit of Lippman [18] and consider the discrete time equivalent of the continuous time Markov chain described above. The uniformization rate is chosen to be ν := λ_L + η̄ + ū, where η̄ ≥ max{−Q_ss | s ∈ S} is any finite rate larger than the maximum of the holding time parameters for the phase transition process.

Let v_{α,k}(n, s) be the minimum total α-discounted expected cost that can be obtained during the last k transitions when starting from state (n, s). Using standard arguments of Markov decision theory [4], the discrete-time finite horizon optimality equations (FHOE) for the system can be written (for each s ∈ S):

v_{α,k+1}(0, s) = (1/(α + ν)) [ h(0) + λ_s v_{α,k}(1, s) + Σ_{s'=1}^L Q_{ss'} v_{α,k}(0, s') + (ν − λ_s) v_{α,k}(0, s) ],    (2.2a)

v_{α,k+1}(n, s) = (1/(α + ν)) min_{µ∈A} { c(µ) + h(n) + µ v_{α,k}(n − 1, s) + λ_s v_{α,k}(n + 1, s) + Σ_{s'=1}^L Q_{ss'} v_{α,k}(n, s') + (ν − λ_s − µ) v_{α,k}(n, s) }  for n ≥ 1,    (2.2b)

where v_{α,0} is assumed to be zero for each state. Note that the cost function has compact level sets. That is, {((n, s), µ) | f((n, s), µ) ≤ β} is compact for all β ∈ R. Since the state space is discrete, we may apply Proposition 3.1 of [9] to get v_{α,k} ↑ v_α. Moreover, v_α satisfies (2.2) with v_α replacing v_{α,k} on the right hand side and v_{α,k+1} on the left hand side.
The resulting set of equations is called the discounted cost optimality equations (DCOE), stated next for later reference (for each s ∈ S):

v_α(0, s) = (1/(α + ν)) [ h(0) + λ_s v_α(1, s) + Σ_{s'=1}^L Q_{ss'} v_α(0, s') + (ν − λ_s) v_α(0, s) ],    (2.3a)

v_α(n, s) = (1/(α + ν)) min_{µ∈A} { c(µ) + h(n) + µ v_α(n − 1, s) + λ_s v_α(n + 1, s) + Σ_{s'=1}^L Q_{ss'} v_α(n, s') + (ν − λ_s − µ) v_α(n, s) }  for n ≥ 1.    (2.3b)

In this section we provide conditions under which an average cost optimal policy exists and may be computed as a limit of discounted cost optimal policies. The long-run average cost or gain of a policy π, given that the initial state of the system is x, is

g^π(x) := lim sup_{t→∞} v^π_{t,0}(x)/t,

where v_{t,0} is as defined in (2.1). The optimal expected average cost g^*(x) is

g^*(x) := inf_{π∈Π} g^π(x),

and π^* is an average cost optimal policy if g^{π^*}(x) = g^*(x) for all x ∈ X. After uniformization the average cost optimality inequalities (ACOI) (cf. [23]) are

w(n, s) ≥ (1/ν) min_{µ∈A} [ −g + c(µ) + h(n) + λ_s w(n + 1, s) + µ w((n − 1)^+, s) + Σ_{s'=1}^L Q_{ss'} w(n, s') + (ν − λ_s − µ) w(n, s) ]  for n ≥ 0, s ∈ S.    (2.4)

When a solution (w, g) to the ACOI exists, w is called a relative value function and g^*(x) = g is the optimal long-run expected average cost for any initial state x.

A solution to the ACOI (2.4) exists under a necessary and sufficient stability condition which is provided in (2.5) below. This condition requires that the maximum available service rate is higher than the long-run average arrival rate and coincides with the one derived by Yechiali [26] for the stability of a queue with Markov-modulated arrivals. However, since Yechiali used the balance equations to show the existence of a steady state distribution, there is no guarantee of finite long-run average cost, which is required for the MDP formulation provided here.
Since the phase transition process is assumed to be ergodic, it has unique stationary probabilities denoted by {p_1, p_2, . . . , p_L}.

Proposition 2.1.
There exists a stationary policy, π, under which the system is stable (a steady state distribution exists) if and only if the maximum available service rate satisfies the following condition:

ū > Σ_{s=1}^L p_s λ_s.    (2.5)

Furthermore, the long-run average cost under this policy, g^π(x), is finite and independent of the initial state x.

Proof.
See Appendix.

In the next proposition, we present results related to the existence of an optimal average-cost policy.
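As an illustration of how condition (2.5) would be checked in practice, the sketch below computes the stationary distribution of a phase generator and compares the long-run average arrival rate with ū. The three-state generator, arrival rates, and ū here are our own illustrative stand-ins, not parameters from the paper.

```python
import numpy as np

# Illustrative 3-state phase generator Q (rows sum to zero) and arrival rates.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])
lam = np.array([0.5, 1.0, 1.5])
u_bar = 1.2  # maximum available service rate

# Stationary distribution solves p Q = 0 with the entries of p summing to one;
# stack the normalization row onto Q^T and solve by least squares.
A = np.vstack([Q.T, np.ones(len(lam))])
b = np.concatenate([np.zeros(len(lam)), [1.0]])
p, *_ = np.linalg.lstsq(A, b, rcond=None)

long_run_rate = p @ lam        # long-run average arrival rate
stable = u_bar > long_run_rate  # condition (2.5)
print(p, long_run_rate, stable)
```

For this generator the long-run average arrival rate works out to 38/33 ≈ 1.15, so the system is (barely) stabilizable with ū = 1.2.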
Proposition 2.2.
The following hold.

1. For α > 0, v_α(x) satisfies the DCOE (2.3). Moreover, any stationary policy π_α that minimizes the right side of the DCOE (2.3) is α-discounted expected cost optimal.

2. If the stability condition (2.5) holds, we have:

(a) There exists a stationary long-run average expected cost optimal policy π^* = {µ^*(n, s) | n ≥ 0, s ∈ {1, 2, . . . , L}} that is a limit of a sequence of discounted expected cost optimal policies {π_{α_k}, k ≥ 1}. That is, µ^*(n, s) = lim_{k→∞} µ_{α_k}(n, s), where α_k ↓ 0.

(b) The long-run average expected cost of policy π^* is g^* = lim_{α↓0} α v_α(x) for every x ∈ X. Moreover, there exists a subsequence α_k ↓ 0 such that lim_{k→∞} w_{α_k}(x) := v_{α_k}(x) − v_{α_k}(x_0) = w(x) for a distinguished state x_0, such that (w, g^*) satisfy the ACOI (2.4).

Proof.
See Appendix.
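To make the recursion concrete, the FHOE (2.2) can be iterated directly on a computer. The sketch below does this for an illustrative two-phase example with c(µ) = e^µ − 1 and h(n) = n; the buffer is truncated at a level N and the action set A = [0, ū] is replaced by a finite grid, so this is only an approximation of the model above, and every parameter value is our own choice rather than the paper's.

```python
import numpy as np

# Illustrative two-phase example (all parameters are ours, not from the paper).
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])            # phase generator, rows sum to zero
lam = np.array([0.8, 1.6])             # arrival rates with lam[0] <= lam[1]
u_bar, alpha, N = 2.0, 0.05, 40        # max service rate, discount, buffer cutoff
eta_bar = max(-np.diag(Q))             # bound on phase holding-time parameters
nu = lam.max() + eta_bar + u_bar       # uniformization rate nu = lam_L + eta + u_bar
grid = np.linspace(0.0, u_bar, 101)    # finite grid standing in for A = [0, u_bar]
c = np.exp(grid) - 1.0                 # service cost c(mu) = exp(mu) - 1 on the grid

v = np.zeros((N + 1, 2))
for _ in range(3000):                  # iterate the FHOE (2.2) until v_k converges
    v_new = np.empty_like(v)
    for s in range(2):
        for n in range(N + 1):
            up = v[min(n + 1, N), s]   # arrival term (reflected at the cutoff N)
            phase = Q[s] @ v[n]        # sum over s' of Q[s, s'] * v(n, s')
            if n == 0:                 # empty system: server idles, as in (2.2a)
                v_new[n, s] = (lam[s] * up + phase
                               + (nu - lam[s]) * v[0, s]) / (alpha + nu)
            else:                      # minimize over the rate grid, as in (2.2b)
                vals = (c + n + grid * v[n - 1, s] + lam[s] * up + phase
                        + (nu - lam[s] - grid) * v[n, s])
                v_new[n, s] = vals.min() / (alpha + nu)
    if np.abs(v_new - v).max() < 1e-7:
        v = v_new
        break
    v = v_new

def mu_opt(n, s):
    """Greedy service rate recovered from the converged value function."""
    if n == 0:
        return 0.0
    vals = (c + n + grid * v[n - 1, s] + lam[s] * v[min(n + 1, N), s]
            + Q[s] @ v[n] + (nu - lam[s] - grid) * v[n, s])
    return grid[vals.argmin()]
```

In this example the resulting policy µ_opt(n, s) is non-decreasing in the queue length n, in line with the structural results of the next section.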
Structural Properties of Optimal Policies
In this section we derive structural results for optimal policies under both the discounted cost and the average cost criterion. In a manner similar to [11], we use the following definitions to simplify the optimality equations:

y_α(0, s) = 0 for s = 1, 2, . . . , L,    (3.1)
y_α(n, s) = v_α(n, s) − v_α(n − 1, s) for n ∈ N, s ∈ S,
φ(y) = max_{µ∈A} {µy − c(µ)}, and ψ(y) = arg max_{µ∈A} {µy − c(µ)},

where the argmax is a singleton by the assumptions on c (cf. Section 4.3 of [1]). The definitions above yield the following simplified form of the DCOE (2.3):

v_α(n, s) = (1/(α + ν)) [ h(n) − φ(y_α(n, s)) + λ_s v_α(n + 1, s) + Σ_{s'=1}^L Q_{ss'} v_α(n, s') + (ν − λ_s) v_α(n, s) ]  for n ∈ Z_+, s ∈ S.    (3.2)

In order to derive structural results for an optimal discounted expected cost policy, we make use of several important properties of the function φ(y) = max_{x∈A} {yx − c(x)} and its associated maximizers ψ(y) = arg max_{x∈A} {yx − c(x)} introduced above. Recall that φ(·), the conjugate of c(·), is convex (cf. [5]). Moreover, ψ(y) is continuous, non-decreasing and equals φ′(y) wherever the derivative exists. As described in [1], since (c′)^{−1}(·) is well-defined, continuous and strictly increasing, we have the following characterization of the function ψ(·):

ψ(y) = 0 if y ≤ c′(0);  (c′)^{−1}(y) if c′(0) < y < c′(ū);  ū if y ≥ c′(ū).    (3.3)

It may also be established (see [11]) that φ(·) is continuous and non-decreasing with the following characterization:

φ(y) = 0 if y < 0;  ∫_0^y ψ(x) dx if y ≥ 0.    (3.4)

We show the intuitive result that there exists an optimal policy that is monotone in n. We note that the structural part of the result could also be proven via the event-based dynamic programming framework of Koole [17]. We provide what we believe is an equally simple proof here for completeness.

Proposition 3.1.
The following hold.

1. For each s ∈ S, the optimal discounted expected cost value function, v_α(n, s), satisfies the DCOE (2.3) and is a non-decreasing, convex function of n.

2. There exists a discounted expected cost optimal policy {µ_α(n, s), n ≥ 0, s ∈ S} that is non-decreasing in n for each s ∈ S.

3. Under the assumptions that the holding cost is non-decreasing and convex with polynomial rate of growth and that (2.5) holds, there exists a long-run average optimal policy, {µ(n, s), n ≥ 0, s ∈ S}, that is non-decreasing in n for each s ∈ S.

Proof.
We use induction and the FHOE (2.2) to prove the first result. The result holds trivially for k = 0. For the inductive step, suppose v_{α,k}(·, s) is non-decreasing and convex on Z_+ for each s ∈ S. Let u_n = µ_{α,k}(n, s) be the optimal service rate for the (k + 1)-stage problem when the state is (n, s). Suppose we use the potentially sub-optimal decision u_n when the state is (n − 1, s). The FHOE (2.2) yield

v_{α,k+1}(n − 1, s) ≤ (1/(α + ν)) [ c(u_n) + h(n − 1) + u_n v_{α,k}((n − 2)^+, s) + λ_s v_{α,k}(n, s) + Σ_{s'=1}^L Q_{ss'} v_{α,k}(n − 1, s') + (ν − λ_s − u_n) v_{α,k}(n − 1, s) ]
 ≤ (1/(α + ν)) [ c(u_n) + h(n) + u_n v_{α,k}(n − 1, s) + λ_s v_{α,k}(n + 1, s) + Σ_{s'≠s} Q_{ss'} v_{α,k}(n, s') + (ν + Q_ss − λ_s − u_n) v_{α,k}(n, s) ]
 = v_{α,k+1}(n, s),

where the second inequality follows from the induction hypothesis. Thus, v_{α,k} is non-decreasing for all k.

To show convexity note that, by the inductive hypothesis, y_{α,k}(n + 1, s) = v_{α,k}(n + 1, s) − v_{α,k}(n, s) is a non-decreasing function of n for each s ∈ S. Let u_{n+1} = µ_{α,k}(n + 1, s) be the optimal rate for the (k + 1)-stage problem when the state is (n + 1, s) and u_{n−1} = µ_{α,k}(n − 1, s) be the optimal rate when the state is (n − 1, s). The FHOE (2.2) imply (for n ≥ 1)

(α + ν) y_{α,k+1}(n + 1, s) ≥ h(n + 1) − h(n) + (ν − λ_s) y_{α,k}(n + 1, s) − u_{n+1} (y_{α,k}(n + 1, s) − y_{α,k}(n, s)) + λ_s y_{α,k}(n + 2, s) + Σ_{s'=1}^L Q_{s,s'} y_{α,k}(n + 1, s').

Similarly, using y_{α,k}(n, s) = v_{α,k}(n, s) − v_{α,k}(n − 1, s),

(α + ν) y_{α,k+1}(n, s) ≤ h(n) − h(n − 1) + (ν − λ_s) y_{α,k}(n, s) − u_{n−1} (y_{α,k}(n, s) − y_{α,k}(n − 1, s)) + λ_s y_{α,k}(n + 1, s) + Σ_{s'=1}^L Q_{s,s'} y_{α,k}(n, s').

Using the definitions ν = λ_L + η̄ + ū and Q̄ = η̄ I + Q we have

(α + ν)(y_{α,k+1}(n + 1, s) − y_{α,k+1}(n, s)) ≥ h(n + 1) − 2h(n) + h(n − 1)
 + (λ_L + ū − λ_s − u_{n+1}) (y_{α,k}(n + 1, s) − y_{α,k}(n, s))
 + u_{n−1} (y_{α,k}(n, s) − y_{α,k}(n − 1, s))
 + λ_s (y_{α,k}(n + 2, s) − y_{α,k}(n + 1, s))
 + Σ_{s'=1}^L Q̄_{s,s'} (y_{α,k}(n + 1, s') − y_{α,k}(n, s'))
 ≥ 0.

The second inequality follows from the convexity of h(n), the non-negativity of the coefficients of the y_{α,k} terms, and the inductive hypothesis. So y_{α,k}(·, s) is non-decreasing on Z_+ for all s ∈ S as required. Taking limits as k → ∞ yields that v_α is non-decreasing and convex; the first result is proven.

Since the function ψ(·) is non-decreasing and µ_α(n, s) = ψ(y_α(n, s)), we conclude that there exists an optimal policy for the discounted cost problem that is monotonically non-decreasing in the queue length for each s ∈ S. This is the second result.

For the third result, consider a subsequence of discount factors {α_i, i ≥ 1} such that α_i → 0 and the corresponding discounted cost optimal policies µ_{α_i}(·, ·) converge to an average cost optimal policy µ(·, ·) (see Proposition 2.2). The previous result implies that, for each fixed s ∈ S, µ_{α_i}(n, s) ≤ µ_{α_i}(n + 1, s). Thus, the same inequality holds for µ(·, ·).

We remark that we have explicitly used the fact that the argmax in ψ is a singleton (which follows from the strict convexity assumption on c(·)). When the convexity is not assumed to be strict, the results still hold, but we need to take care to define ψ as the minimal element of the argmax and consider a subsequence of discount factors such that w_{α_i}(x) = v_{α_i}(x) − v_{α_i}(x_0) → w(x).
Since v_{α_i}(n, s) is non-decreasing in n, so is w, and the proof in the average cost case follows in the same way as the discounted cost case except that we use the ACOI instead of the DCOE.

Since the states of the phase transition process are ordered such that λ_1 ≤ λ_2 ≤ · · · ≤ λ_L, one might conjecture that the optimal policy is non-decreasing in the phase state, s, for each congestion level, n. However, we present two examples to show that, depending on the transition structure of the phase process, this property may not hold. In both examples, we use value iteration with α = 0.05 to compute the optimal policy numerically. We consider an exponential cost rate function, c(µ) = e^µ − 1, and a linear holding cost function, h(n) = n. The set of permissible service rates is A = [0, ū].

Figure 1: Transition structure of the phase process for Examples 3.1 and 3.2: (a) birth and death, (b) cyclic. All transition rates in the figure equal η = 1.
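For the exponential cost rate used in these examples, the maximizer ψ in (3.3) is available in closed form, since c′(µ) = e^µ is strictly increasing with inverse ln(·). A small sketch (the value of ū is left as a parameter):

```python
import math

def psi(y, u_bar):
    """Optimal service rate psi(y) from (3.3) for c(mu) = exp(mu) - 1.

    Here c'(mu) = exp(mu), so (c')^{-1}(y) = ln(y), with c'(0) = 1
    and c'(u_bar) = exp(u_bar).
    """
    if y <= 1.0:                 # y <= c'(0): serve at rate zero
        return 0.0
    if y < math.exp(u_bar):      # interior solution (c')^{-1}(y)
        return math.log(y)
    return u_bar                 # rate capped at the maximum u_bar
```

For instance, with ū = 2 the rate rises from 0 (when the marginal value y of reducing the queue is below c′(0) = 1) through ln(y), and saturates at 2 once y ≥ e².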
Example 3.1.
In this example, the phase process is a birth and death process on the states {1, 2, 3}; see Figure 1(a). With all transition rates equal to 1, the infinitesimal generator for the phase process is given by

Q = [ −1  1  0 ;  1  −2  1 ;  0  1  −1 ],

and the arrival rates are λ_1 = 0. , λ_2 = 1 and λ_3 = 1. . Figure 2 shows that the optimal service rates are non-decreasing in n for each s. It should also be clear that the optimal service rates are non-decreasing in s for each n.
Consider a phase process with a cyclic transition structure on the set of states {1, 2, 3}; see Figure 1(b). With all transition rates equal to 1, the infinitesimal generator matrix for the phase process is

Q = [ −1  1  0 ;  0  −1  1 ;  1  0  −1 ],

and the arrival rates are λ_1 = 0. , λ_2 = 1 and λ_3 = 1.25. Figure 3 shows that the optimal policy is non-decreasing in n for each s. However, it is clear from this figure that the service rates are not monotone in s when the queue length is 4.

In Example 3.2, the phase process transitions from the highest arrival intensity state, 3, to the lowest arrival intensity state, 1. This causes the optimal service rate to be higher in state 2 than in state 3 for some congestion levels, and thereby renders an optimal policy that is not monotone in s. These examples beg the question: is there a reasonable assumption under which the optimal policy is monotone in s? Stochastic monotonicity of the phase transition process is one such assumption.
Figure 2: Structure of the optimal policy for Example 3.1 (optimal service rate versus queue length n, for s = 1, 2, 3).
Figure 3: Structure of the optimal policy for Example 3.2 (optimal service rate versus queue length n, for s = 1, 2, 3).

Stochastic Monotonicity for Continuous Time Markov Chains
Intuitively, stochastic monotonicity means that, given the arrival process is in a high arrival intensity state, the future states it will encounter are in some sense worse (in terms of arrival intensity) than if the process is in a low arrival intensity state. This leads to the following definitions (see for example Keilson and Kester [16]).
Definition 3.2.
Given two probability vectors p and q, a stochastic matrix M, and a homogeneous Markov chain {X(t), t ≥ 0} with probability transition function P(t) = [P_ij(t)]:

1. p stochastically dominates q (p ≥_st q) iff Σ_{i=n}^N p_i ≥ Σ_{i=n}^N q_i for n = 1, 2, . . . , N.

2. Letting M_i denote the i-th row of the matrix, M is called stochastically monotone if M_k ≥_st M_l whenever k > l.

3. {X(t), t ≥ 0} is said to be stochastically monotone if P(t) is monotone for each t ≥ 0.

Note that the transition structure shown in Example 3.1 is stochastically monotone while that for Example 3.2 is not (this is trivial to see once the underlying Markov chain is uniformized). Some useful stochastic processes that have the stochastic monotonicity property include the birth-death process, the simple random walk, and the age process of a renewal process with decreasing failure rate [16]. In particular, the simple 2-state MMPP fluctuating between a high arrival rate and a low arrival rate, considered by Gupta et al. [13] and Shah-Heydari [24], is also stochastically monotone. The following provides alternative methods for specifying when a transition matrix is stochastically monotone (again refer to Keilson and Kester [16]).
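The claim about Examples 3.1 and 3.2 can be checked numerically straight from Definition 3.2: uniformize each phase generator and test whether the tail sums of successive rows of the resulting transition matrix are ordered. In the sketch below, the two generators are reconstructed from Figure 1 under the assumption that every transition rate equals 1:

```python
import numpy as np

def is_monotone(P):
    """Definition 3.2: every row of P stochastically dominates the row above it."""
    tails = np.cumsum(P[:, ::-1], axis=1)[:, ::-1]  # tails[i, n] = sum_{j >= n} P[i, j]
    return all(np.all(tails[k] >= tails[k - 1] - 1e-12) for k in range(1, len(P)))

eta = 2.0  # uniformizing constant, at least max_s(-Q_ss)
# Example 3.1: birth-and-death phase process on {1, 2, 3} (Figure 1(a)).
Q_bd = np.array([[-1.0, 1.0, 0.0],
                 [1.0, -2.0, 1.0],
                 [0.0, 1.0, -1.0]])
# Example 3.2: cyclic phase process 1 -> 2 -> 3 -> 1 (Figure 1(b)).
Q_cy = np.array([[-1.0, 1.0, 0.0],
                 [0.0, -1.0, 1.0],
                 [1.0, 0.0, -1.0]])

P_bd = np.eye(3) + Q_bd / eta   # uniformized transition matrices
P_cy = np.eye(3) + Q_cy / eta
print(is_monotone(P_bd), is_monotone(P_cy))  # birth-death: monotone; cyclic: not
```

The cyclic chain fails because its third row (which jumps back to state 1) does not stochastically dominate its second row, mirroring the non-monotone policy seen in Example 3.2.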
Proposition 3.3.
For a stochastic matrix M, the following are equivalent.

1. M is monotone.

2. (T^{−1} M T)_{ij} ≥ 0, where T is the square matrix with 1's on or below the diagonal.

3. pM ≥_st qM for all probability vectors p, q with p ≥_st q.

4. Mv is non-decreasing for all non-decreasing vectors v.

For a continuous-time Markov chain with generator matrix Q, stochastic monotonicity implies that the generator satisfies (T^{−1} Q T)_{ij} ≥ 0 for i ≠ j, where T is the square matrix with 1's on or below the diagonal [16]. Choosing the uniformizing constant η̄ large enough then yields (T^{−1} (η̄ I + Q) T)_{ij} ≥ 0; that is, Q̄ := η̄ I + Q satisfies the property that Q̄v is non-decreasing for all non-decreasing vectors v. This leads to the next result; the main result of this section.

Theorem 3.4.
Suppose that the phase transition process is stochastically monotone. Then the following hold.

1. For each n ∈ Z_+, y_{α,k}(n, s) is a non-decreasing function of s.

2. There exists a discounted cost optimal policy, µ_α(n, s), that is non-decreasing in s for each n.

3. Under the assumptions that the holding cost is non-decreasing and convex with polynomial rate of growth and that (2.5) holds, there exists an average cost optimal policy, µ(n, s), that is non-decreasing in s for each n.

Proof.
We show the first result by induction. The second and third results follow in a manner analogous to Proposition 3.1. The statement holds trivially for k = 0. Assume it holds for k. Using the definitions ν = λ_L + η̄ + ū and Q̄ = η̄ I + Q, we have for s > 1

(α + ν) (y_{α,k+1}(n + 1, s) − y_{α,k+1}(n + 1, s − 1))
 = [φ(y_{α,k}(n, s)) − φ(y_{α,k}(n, s − 1))]
 + λ_s (y_{α,k}(n + 2, s) − y_{α,k}(n + 2, s − 1))
 + (λ_s − λ_{s−1}) (y_{α,k}(n + 2, s − 1) − y_{α,k}(n + 1, s − 1))
 + (λ_L − λ_s) (y_{α,k}(n + 1, s) − y_{α,k}(n + 1, s − 1))
 + ū (y_{α,k}(n + 1, s) − y_{α,k}(n + 1, s − 1)) − (φ(y_{α,k}(n + 1, s)) − φ(y_{α,k}(n + 1, s − 1)))
 + [ Σ_{s'=1}^L Q̄_{s,s'} y_{α,k}(n + 1, s') − Σ_{s'=1}^L Q̄_{s−1,s'} y_{α,k}(n + 1, s') ].    (3.5)

Note that the inductive hypothesis implies that the first four terms on the RHS of (3.5) are non-negative. Now, as φ(y) = ∫_0^y ψ(x) dx and ψ(y) ≤ ū,

φ(y_{α,k}(n + 1, s)) − φ(y_{α,k}(n + 1, s − 1)) = ∫_{y_{α,k}(n+1,s−1)}^{y_{α,k}(n+1,s)} ψ(x) dx ≤ ū (y_{α,k}(n + 1, s) − y_{α,k}(n + 1, s − 1)),

so the next to last term on the RHS of (3.5) is non-negative. Furthermore, the inductive hypothesis and the assumption on Q̄ imply that the last term on the RHS of (3.5) is non-negative. Thus, we conclude from the induction hypothesis that

(α + ν)(y_{α,k+1}(n + 1, s) − y_{α,k+1}(n + 1, s − 1)) ≥ 0,

as desired.

This section provides two insights using numerical examples. First, we compare the optimal control policy with two natural heuristics. When the environment is changing, it seems a decision-maker that is not armed with the current research might take one of two courses. (S)he might choose to ignore the state change of the environment altogether, or she might treat each state change as permanent and react accordingly. In either case, the resulting control policies are heuristics when compared to the optimal control that takes into account both the phase and queue length processes.
The first goal is then to compare the optimal policy withthese heuristics. Second, as alluded to in Section 1, the current model can act as a heuristicitself when compared to a model with NHPP arrivals. We analyze when this is a reasonableapproximation.
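Before turning to the comparisons, the arrival model itself can be made concrete with a short simulation. The sketch below samples an MMPP path by drawing competing exponentials for the next arrival versus the next phase change; the two-phase generator and rates are illustrative only, not parameters from the examples that follow.

```python
import random

def simulate_mmpp(Q, lam, T, seed=0):
    """Sample the arrival epochs of a Markov-modulated Poisson process on
    [0, T]: in phase s, arrivals occur at rate lam[s]; the phase itself
    moves according to the generator Q."""
    rng = random.Random(seed)
    L, s, t, arrivals = len(lam), 0, 0.0, []
    while True:
        rate_out = -Q[s][s]                         # total phase-change rate
        t += rng.expovariate(rate_out + lam[s])     # next event (competing exps)
        if t >= T:
            return arrivals
        if rng.random() < lam[s] / (rate_out + lam[s]):
            arrivals.append(t)                      # the event is an arrival
        else:                                       # the event is a phase change
            u, cum = rng.random() * rate_out, 0.0
            for j in range(L):
                if j != s:
                    cum += Q[s][j]
                    if u < cum:
                        s = j
                        break

# Two illustrative phases: quiet (rate 0.5) and bursty (rate 5.0),
# with mean sojourn time 10 in each phase.
arrivals = simulate_mmpp(Q=[[-0.1, 0.1], [0.1, -0.1]], lam=[0.5, 5.0], T=1000.0)
# long-run mean rate is (0.5 + 5.0)/2 = 2.75, so roughly 2750 arrivals expected
```

A sample path drawn this way exhibits exactly the bursts and near-silences that motivate the model: long stretches at rate 0.5 punctuated by dense clusters at rate 5.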
In this section we present a comparison of the performance of the optimal policy with several heuristics. Furthermore, we compare the optimal cost achievable with state-dependent service rates against the optimal cost achievable when the service rate is fixed for all states. As mentioned in George and Harrison [11], the difference between these costs represents the economic value of a responsive mechanism. The first heuristic that we consider uses the optimal control for an average cost problem in which the arrival process is Poisson with the long-run mean arrival rate of the MMPP. When applied to the original model, this policy is a function of the queue length only. We call this heuristic the
Average Rate Method (ARM).

Since the state of the arrival process is known, the decision-maker may instead solve the stationary model for each potential arrival rate and change the service rate according to the current state of the arrival process. That is to say, a second heuristic is derived in the following way:

1. Compute the service rate control average cost optimal policy, π^h_s, for a system with Poisson arrivals of rate λ_s, for each intensity level s ∈ S.

2. The heuristic for the Markov-modulated queue is obtained by using π^h_s when the state of the process is (n, s).

This heuristic is referred to as the Phase Rate based Method (PRM). Note that the long-run average arrival rate used in computing the ARM policy is influenced by both the infinitesimal generator matrix and the arrival rates of the phase process, while the PRM policy relies only on the arrival rates.

When the service rate is fixed for all states (an open loop policy), the queue operates as an MMPP/M/1 queue. For a given service rate, the transition matrix and corresponding steady state distribution can be computed numerically. Based on the steady state distribution, one can determine the long-run average cost corresponding to that service rate. We then use a one-dimensional search procedure to find the service rate that minimizes the long-run average cost. This is called the
Fixed Rate policy.

A numerical study comparing the performance of these heuristics for various test cases is provided in Examples 4.1 and 4.2. In all cases, the policies and average cost are computed using value iteration where the queue length is truncated at 50. We use the cost rate function c(µ) = e^µ − 1 and holding cost h(n) = n. Service rates are allowed to be chosen from a finite set A of equally spaced rates, with the arrival rates as shown in Table 1.

                 Arrival Rate in Phase State
Case      1      2      3      4      5      6      7      8
I       0.1   0.35   0.6   0.85   1.1   1.35   1.6   1.85
II      0.1   0.6    1.1   1.6    2.1   2.6    3.1   3.6
III     0.1   0.85   1.6   2.35   3.1   3.85   4.6   5.35

Table 1: Arrival Rate Parameters for the Phase Transition Process in Examples 4.1 and 4.2

Example 4.1.
Suppose that the phase process is a birth and death process on the states {1, 2, ..., 8} (recall Figure 1(a)). Fix c > 0. The transition rates for the phase process are η_{i,i+1} = η_{i,i−1} = c for 2 ≤ i ≤ 7, η_{1,2} = c and η_{8,7} = c. A higher value of c means that the phase process transitions faster between the arrival phases. We refer to c as the fluctuation rate scaling parameter. For the numerical analysis, we consider the three sets of arrival rates for the phase process shown in Table 1. For each set, the parameter c takes the values 0.25, 0.50, 0.75 and 1.00, resulting in a total of 12 different scenarios for the arrival process. Table 2 shows the results for all heuristics.

                        Gain (% Sub-Optimal)
Arrival Rates    c     Optimal   ARM                PRM                Fixed Rate
Case I         0.25    4.3651    4.4650 (2.29 %)    4.3676 (0.06 %)    7.6841 (76.03 %)
               0.50    4.3196    4.3974 (1.80 %)    4.3254 (0.13 %)    7.3185 (69.43 %)
               0.75    4.2818    4.3455 (1.49 %)    4.2909 (0.21 %)    7.0223 (64.01 %)
               1.00    4.2494    4.3031 (1.27 %)    4.2618 (0.29 %)    6.8399 (60.96 %)
Case II        0.25   15.5713   16.9349 (8.76 %)   15.7936 (1.43 %)   24.8200 (59.41 %)
               0.50   14.8674   15.6939 (5.56 %)   15.2599 (2.64 %)   22.5509 (51.68 %)
               0.75   14.3638   14.9444 (4.04 %)   14.8821 (3.61 %)   21.1000 (46.90 %)
               1.00   13.9776   14.4189 (3.16 %)   14.5924 (4.40 %)   20.1360 (44.06 %)
Case III       0.25   47.6797   51.9918 (9.04 %)   49.6854 (4.21 %)   61.1588 (28.27 %)
               0.50   42.3561   44.4741 (5.00 %)   45.7978 (8.13 %)   55.8678 (31.90 %)
               0.75   39.2816   40.6579 (3.51 %)   43.7541 (11.39 %)  51.8600 (32.02 %)
               1.00   37.2150   38.2310 (2.73 %)   42.3809 (13.88 %)  48.9160 (31.44 %)

Table 2: Average Cost Rates and Percentage Difference between Optimal and Heuristic Policies for Example 4.1.

A few observations are in order. Ignoring the dynamic state information of the phase process (and using the ARM policy) is more costly when the fluctuation parameter is lower. This stands to reason since the phase process can remain in a state for a long period of time, while the ARM policy assumes the arrival rate is the mean arrival rate. In Case III, when the change in arrival rate between phase states is largest and the rate of changing states is slowest, the percent sub-optimality of ARM is above 9%. If we instead approximate the state changes with stationary processes (using PRM), we see that the percent sub-optimality is again high (above 13%), but this time when the fluctuation parameter is highest.
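The optimal gains in Table 2 come from value iteration on the uniformized chain. The following is a minimal sketch of that computation using average-cost relative value iteration; the two-phase parameters, truncation level and action grid are illustrative stand-ins, not the eight-phase cases of Table 1.

```python
import math

def solve_mmpp_mdp(lam, Q, actions, c, h, N=25, iters=1500):
    """Average-cost relative value iteration for service rate control of an
    MMPP/M/1 queue, on the uniformized chain with queue length truncated at N."""
    L = len(lam)
    eta = [-Q[s][s] for s in range(L)]          # phase-exit rates
    nu = max(lam) + max(actions) + max(eta)     # uniformization rate
    v = [[0.0] * L for _ in range(N + 1)]
    g = 0.0
    for _ in range(iters):
        w = [[0.0] * L for _ in range(N + 1)]
        for n in range(N + 1):
            for s in range(L):
                acts = actions if n > 0 else [0.0]   # server idles when empty
                best = math.inf
                for mu in acts:
                    up = v[min(n + 1, N)][s]         # arrival (truncated at N)
                    down = v[n - 1][s] if n > 0 else v[n][s]
                    mix = sum(Q[s][j] * v[n][j] for j in range(L) if j != s)
                    stay = (nu - lam[s] - mu - eta[s]) * v[n][s]
                    best = min(best, (h(n) + c(mu) + lam[s] * up
                                      + mu * down + mix + stay) / nu)
                w[n][s] = best
        g = w[0][0] * nu                        # average cost *rate* estimate
        v = [[w[n][s] - w[0][0] for s in range(L)] for n in range(N + 1)]
    # recover a stationary policy from the converged relative values
    policy = [[0.0 if n == 0 else
               min(actions, key=lambda mu: c(mu) + mu * (v[n - 1][s] - v[n][s]))
               for s in range(L)] for n in range(N + 1)]
    return g, policy

g, policy = solve_mmpp_mdp(
    lam=[0.5, 2.0], Q=[[-0.25, 0.25], [0.25, -0.25]],
    actions=[0.5 * k for k in range(11)],       # service rates 0, 0.5, ..., 5
    c=lambda mu: math.exp(mu) - 1, h=lambda n: n)
```

Consistent with the monotonicity result, the computed policy is non-decreasing in the queue length n for each fixed phase s.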
[Figure 4: Comparison of gain values for heuristic and optimal policies for Example 4.1. Panels (a)-(c) plot the average cost rate (gain) against the fluctuation rate scaling parameter c for the optimal, ARM and PRM policies: (a) small difference in phase intensities (Case I); (b) medium difference in phase intensities (Case II); (c) large difference in phase intensities (Case III). Panel (d) plots the service rate against the state of the phase process for the heuristic and optimal policies (Case III, n = 2).]

Figures 4(a)-4(c) show the change in the average cost rate under the heuristic and optimal policies as a function of the parameter c for the three arrival cases. Figure 4(d) shows a comparison of the heuristic and optimal policies for the arrival rates in Case III and n = 2 for various values of the fluctuation rate parameter (the behavior is similar for other values of n). It should be clear that the PRM policy outperforms the ARM policy in most cases, except for high values of c in Case III. Moreover, one should note that the performance of the ARM policy improves while that of the PRM policy degrades in comparison to the optimal policy as the fluctuation rate parameter increases. Intuitively this seems reasonable since for low values of c the phase process spends more time in each phase. Therefore the PRM policy, which in each phase applies the optimal policy for a stationary M/M/1 queue with the arrival rate of that particular phase, performs better than the ARM policy. In fact, if c = 0, the PRM policy is optimal since the phase process is stationary with the arrival rate of the initial phase. At high values of c (since the system sees more and more transitions), the arrival process behaves like a Poisson process with the average arrival rate of the MMPP. Therefore the PRM policy is penalized more in comparison to the ARM policy in this case. In Figure 4(d) we also see that the change in the optimal service rates as a function of the phase state, for each congestion level, is smaller for higher values of c.

Example 4.2.
In this example we study a cyclic phase process (cf. Figure 1(b)) on the states {1, 2, ..., 8}. The transition rates for the phase process are η_{i,i+1} = c for 1 ≤ i ≤ 7 and η_{8,1} = c. Similar to the previous example, we perform the numerical analysis for 12 scenarios for the phase process: the three sets of arrival rates given in Table 1 and, for each set of arrival rates, c taking the values 0.25, 0.50, 0.75 and 1.00.

                        Gain (% Sub-Optimal)
Arrival Rates    c     Optimal   ARM                PRM                 Fixed Rate
Case I         0.25    4.1872    4.2295 (1.01 %)    4.2267 (0.94 %)     6.3440 (51.51 %)
               0.50    4.0603    4.0850 (0.61 %)    4.1204 (1.48 %)     5.9620 (46.84 %)
               0.75    3.9880    4.0051 (0.43 %)    4.0574 (1.73 %)     5.7700 (44.68 %)
               1.00    3.9423    3.9549 (0.32 %)    4.0166 (1.89 %)     5.6647 (43.69 %)
Case II        0.25   12.8940   13.2042 (2.41 %)   13.9767 (8.39 %)    17.2439 (33.74 %)
               0.50   11.9656   12.1319 (1.39 %)   13.2268 (10.54 %)   15.6149 (30.50 %)
               0.75   11.5435   11.6531 (0.95 %)   12.8573 (11.38 %)   14.9350 (29.38 %)
               1.00   11.2996   11.3786 (0.70 %)   12.6374 (11.84 %)   14.5100 (28.43 %)
Case III       0.25   31.2724   32.1887 (2.93 %)   39.4752 (26.23 %)   35.5711 (13.75 %)
               0.50   28.3046   28.7893 (1.71 %)   37.1449 (31.23 %)   33.8800 (19.70 %)
               0.75   27.0506   27.3664 (1.16 %)   36.0660 (33.33 %)   32.7185 (20.95 %)
               1.00   26.3445   26.5702 (0.86 %)   35.4401 (34.53 %)   31.9436 (21.25 %)

Table 3: Average Cost Rates and Percentage Difference between Optimal and Heuristic Policies for Example 4.2

[Figure 5: Comparison of gain values for heuristic and optimal policies for Example 4.2. Panels (a)-(c) plot the average cost rate (gain) against the fluctuation rate scaling parameter c for the optimal, ARM and PRM policies: (a) small difference in phase intensities (Case I); (b) medium difference in phase intensities (Case II); (c) large difference in phase intensities (Case III). Panel (d) plots the service rate against the state of the phase process for the heuristic and optimal policies (Case III, n = 2).]

Figures 5(a)-5(c) show the change in the average cost computed under the heuristic as well as the optimal policies as a function of the parameter c for the three arrival cases. It is interesting to observe that, unlike Example 4.1, the ARM policy outperforms the PRM policy in almost all cases. As illustrated in Example 3.2, when the phase process has cyclic transitions, the optimal service rates may not be monotone in the phase process (for each fixed n). In fact, while the service rates for the PRM policy are monotone in the phase of the transition process for each congestion level, the service rates in the optimal policy may begin to decrease as the phase state increases. This can be observed more clearly in Figure 5(d), which shows a comparison of the heuristic and optimal policies as a function of the phase state when n = 2, for various values of the parameter c and the arrival rates of Case III. Thus the ARM policy approximates the optimal policy better than the PRM policy, which explains the observed performance difference.

Similar to the previous example, we observe that the performance of the ARM policy gets worse, and that of the PRM policy improves, as the fluctuation rate parameter c decreases. Furthermore, the average cost percent differences provided in Table 3 show that the ARM policy performs extremely well for all three arrival rate cases (less than 5% from optimal in all cases). The PRM policy performs well for Case I, but the degradation in its performance is quite significant (almost 35% from optimal for Case III with c = 1) when the difference in arrival rates is medium or high (Cases II and III).

Tables 2 and 3 show the optimal costs achievable under the fixed rate mechanism under the column heading "Fixed Rate". We find that costs are between 13% and 76% suboptimal when using a fixed rate mechanism relative to the variable rate mechanism. This shows that there is substantial benefit in investing in a responsive mechanism.
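The Fixed Rate column can be reproduced in the same spirit as described earlier: for each candidate constant rate, compute the steady state of the (truncated) MMPP/M/1 chain, evaluate the long-run average cost under it, and search over the rate. A sketch under illustrative two-phase parameters (all values hypothetical):

```python
import math

def fixed_rate_cost(mu, lam, Q, c, h, N=30, steps=2500):
    """Average cost rate of an open-loop policy serving at constant rate mu:
    power iteration for the stationary distribution of the truncated,
    uniformized MMPP/M/1 chain, then the expected cost rate under it."""
    L = len(lam)
    eta = [-Q[s][s] for s in range(L)]
    nu = max(lam) + mu + max(eta)               # uniformization rate
    pi = [[1.0 / ((N + 1) * L)] * L for _ in range(N + 1)]
    for _ in range(steps):
        nxt = [[0.0] * L for _ in range(N + 1)]
        for n in range(N + 1):
            for s in range(L):
                p = pi[n][s]
                nxt[min(n + 1, N)][s] += p * lam[s] / nu            # arrival
                if n > 0:
                    nxt[n - 1][s] += p * mu / nu                    # departure
                for j in range(L):
                    if j != s:
                        nxt[n][j] += p * Q[s][j] / nu               # phase change
                nxt[n][s] += p * (nu - lam[s] - (mu if n > 0 else 0.0) - eta[s]) / nu
        pi = nxt
    return sum(pi[n][s] * (h(n) + (c(mu) if n > 0 else 0.0))
               for n in range(N + 1) for s in range(L))

lam, Q = [0.5, 2.0], [[-0.25, 0.25], [0.25, -0.25]]
c, h = lambda m: math.exp(m) - 1, lambda n: n
# one-dimensional search over a grid of stable rates (mean arrival rate is 1.25)
grid = [1.5, 2.0, 2.5, 3.0, 3.5]
best = min(grid, key=lambda m: fixed_rate_cost(m, lam, Q, c, h))
```

Both too-low rates (high congestion cost) and too-high rates (high effort cost) are penalized, so the grid search locates an interior compromise; the gap between this cost and the dynamic optimum is exactly the economic value of responsiveness discussed above.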
Furthermore, these examples show that one cannot rely on a single heuristic method to perform well in all scenarios. In particular, we find that within the gamut of simple heuristic methods considered here, the transition structure and the transition rates of the phase process play an important role in the selection of an appropriate approximation method.

When the arrival process follows known rate changes, a non-homogeneous Poisson process is a reasonable modeling tool. The classical work of Green and Kolesar [12] and Massey and Whitt [19] considers the analysis of queues with non-stationary arrivals. From the standpoint of control, Yoon and Lewis [27] consider the case of admission and pricing control. One thing is certain from Yoon and Lewis's work: control of non-stationary processes can be computationally intensive. This is due to the fact that, to solve each instance, the numerical approach requires time to be discretized.

In this section we explore the possibility of computing an approximate average cost optimal policy for a single server queue with non-homogeneous Poisson arrivals using the optimal policies for a system with a "suitable" Markov-modulated Poisson arrival process. Apart from the arrival process, all other details are the same as the setting described in Section 2. Let the arrival process be an NHPP with rate λ(t). Assume that λ(t) is a periodic function with period T. Since the optimization criterion considered in this study is over an infinite time horizon, and the rate function of the NHPP is a periodic function of time, the principle of optimality implies that only the time elapsed in the current period and the number of jobs in the system need to be included in the state space [27].

To compute the optimal policy for an NHPP arrival process numerically, the time period is divided into m equally spaced segments of length ∆t = T/m. Denote the state space for this discretized process as X = {(n, z) | n ∈ Z₊, z ∈ {0, ∆t, ..., T − ∆t}}.
Under this setting, the decision epochs are the time points 0, ∆t, 2∆t, ..., T − ∆t. Let ν be a uniformizing rate of the process. An event (arrival, departure or dummy transition) occurs between consecutive decision epochs with probability 1 − e^{−ν∆t}, and with probability e^{−ν∆t} no event occurs. The standard theory of Markov decision processes yields the following average cost optimality inequality (ACOI):

w(n, z) ≥ min_{x∈A} { ( −g + h(n) + 1_{n>0} c(x) ) ∆t
    + (1 − e^{−ν∆t}) [ (λ(z)/ν) w(n+1, z+∆t) + 1_{n>0} (x/ν) w(n−1, z+∆t)
        + ( 1 − λ(z)/ν − 1_{n>0} x/ν ) w(n, z+∆t) ]
    + e^{−ν∆t} w(n, z+∆t) }

for n ∈ Z₊ and z = 0, ∆t, ..., T − 2∆t, and

w(n, T−∆t) ≥ min_{x∈A} { ( −g + h(n) + 1_{n>0} c(x) ) ∆t
    + (1 − e^{−ν∆t}) [ (λ(T−∆t)/ν) w(n+1, 0) + 1_{n>0} (x/ν) w(n−1, 0)
        + ( 1 − λ(T−∆t)/ν − 1_{n>0} x/ν ) w(n, 0) ]
    + e^{−ν∆t} w(n, 0) }

for n ∈ Z₊, where 1_E is the indicator function of the event E. When a solution (w, g) to the ACOI exists, w is called the relative value function and g*(x) = g is the optimal long-run expected average cost for any initial state x.

We now present a method for constructing an approximate policy for NHPP arrivals. The main idea is to approximate the NHPP by an appropriately constructed MMPP. This is done by dividing the time period T into l subintervals and constructing an MMPP with the same number of phases as the number of subintervals, i.e., l. We choose a cyclic transition structure for the phase process, with the arrival rate in each phase equal to the average rate over the corresponding subinterval. Transition rates for the phase process are selected such that the mean sojourn time in phase s is the width of the corresponding interval. The optimal policy corresponding to this MMPP can then be applied to the original NHPP arrivals. The detailed procedure to evaluate the approximate policy is described below:

1. Partition the interval [0, T] into l subintervals [t_{s−1}, t_s], s = 1, ..., l, with t_0 = 0 and t_l = T.

2. Compute the average rate over each subinterval,

   λ_s = ( ∫_{t_{s−1}}^{t_s} λ(t) dt ) / (t_s − t_{s−1}).
3. Compute the transition rates η_{i,j} for the phase process of the MMPP:

   η_{i,j} = 1/(t_i − t_{i−1})   if 1 ≤ i ≤ l − 1 and j = i + 1;
             1/(t_l − t_{l−1})   if i = l and j = 1;
             0                   otherwise.

4. Compute the optimal policy corresponding to the MMPP arrival process constructed in the previous steps. Denote this policy by µ(n, s), n ∈ Z₊, s ∈ {1, 2, ..., l}.

5. Construct the approximate policy, µ̂, for the NHPP process as

   µ̂(n, t) = µ(n, s) for t_{s−1} ≤ t ≤ t_s, n = 0, 1, 2, ....

A numerical study comparing the performance of the approximation procedure stated above for various test cases is provided in Examples 4.3 and 4.4. In all cases, the policies and average cost are computed using value iteration where the queue length is truncated at 50 to keep the size of the state space manageable.
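Steps 1-3 of the procedure are straightforward to implement. The sketch below builds the per-phase average rates and the cyclic generator for an arbitrary periodic rate function, borrowing the sinusoidal rate of Example 4.4 for concreteness (the choices T = 7 and l = 6 are merely illustrative):

```python
import math

def nhpp_to_mmpp(rate_fn, T, l, quad_pts=200):
    """Steps 1-3 of the approximation: partition [0, T] into l equal
    subintervals, average the NHPP rate over each (midpoint quadrature),
    and build a cyclic phase generator whose mean sojourn time in phase s
    equals the width of the corresponding subinterval."""
    width = T / l
    lam = []
    for s in range(l):
        a = s * width
        # step 2: average rate over [a, a + width]
        lam.append(sum(rate_fn(a + (k + 0.5) * width / quad_pts)
                       for k in range(quad_pts)) / quad_pts)
    # step 3: cyclic generator; exit rate 1/width gives mean sojourn = width
    Q = [[0.0] * l for _ in range(l)]
    for i in range(l):
        Q[i][i] = -1.0 / width
        Q[i][(i + 1) % l] = 1.0 / width
    return lam, Q

T = 7.0
lam, Q = nhpp_to_mmpp(lambda t: 5 * math.sin(2 * math.pi * t / T) + 6, T, l=6)
```

By construction the generator rows sum to zero, the mean time to traverse one full cycle of phases equals the period T, and the phase-averaged arrival rate matches the time-averaged NHPP rate.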
Example 4.3.
In this example, we consider an NHPP whose (periodic) rate function λ(t) is piecewise constant on the five subintervals [0, T/5), [T/5, 2T/5), [2T/5, 3T/5), [3T/5, 4T/5) and [4T/5, T), where T is the time period of the rate function. The rate rises from its lowest level to a peak on the middle subinterval and then falls symmetrically, so one can think of this rate function as a quantized version of a triangular waveform with time period T. The discretization interval ∆t for solving the problem with NHPP arrivals is selected as 0.05 units. The cost rate function is c(µ) = e^µ − 1 with holding cost h(n) = n, and service rates are selected from a set A. We partition [0, T] into the same 5 subintervals of equal length, so the arrival rate of the corresponding MMPP in phase s is the (constant) rate of the NHPP on the s-th subinterval, with λ_1 = λ_5 and λ_2 = λ_4 by symmetry, and the phase generator is

Q = (5/T) [ −1   1   0   0   0
             0  −1   1   0   0
             0   0  −1   1   0
             0   0   0  −1   1
             1   0   0   0  −1 ].

Figure 6 shows a comparison of the optimal policy with the approximate MMPP policy. Table 4 gives the average cost percent difference between the performance of the approximate and optimal policies for various test scenarios. The data show that the approximate policies perform extremely well in all cases (less than 1% sub-optimal).
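A quick sanity check on the construction is that the approximating MMPP reproduces the NHPP's long-run mean arrival rate. The simulation sketch below compares arrival counts for a piecewise-constant NHPP (sampled by thinning) and its cyclic-MMPP approximation; the five rate levels are hypothetical stand-ins for the quantized triangular rates above.

```python
import random

def sample_counts(T, horizon, seed=2):
    """Arrival counts on [0, horizon] for a piecewise-constant NHPP of
    period T and for the cyclic MMPP built from it by the procedure above."""
    levels = [0.5, 2.5, 4.5, 2.5, 0.5]   # hypothetical quantized triangular rates
    rng = random.Random(seed)
    width = T / 5
    # NHPP sampled by thinning against the peak rate
    lam_max, t, nhpp = max(levels), 0.0, 0
    while True:
        t += rng.expovariate(lam_max)
        if t >= horizon:
            break
        idx = min(4, int((t % T) / width))           # current subinterval
        if rng.random() < levels[idx] / lam_max:
            nhpp += 1
    # MMPP: phase advances cyclically at rate 1/width (mean sojourn = width)
    t, s, mmpp = 0.0, 0, 0
    while True:
        total = levels[s] + 1.0 / width
        t += rng.expovariate(total)
        if t >= horizon:
            break
        if rng.random() < levels[s] / total:
            mmpp += 1
        else:
            s = (s + 1) % 5
    return nhpp, mmpp

nhpp, mmpp = sample_counts(T=7.0, horizon=20000.0)
# both long-run rates should be near the period average (0.5+2.5+4.5+2.5+0.5)/5 = 2.1
```

The two processes share the mean; the approximation error the examples quantify comes from the MMPP's random (rather than deterministic) phase-change epochs.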
[Figure 6: NHPP optimal policy vs. approximate MMPP policy for various congestion levels (T = 7). Each panel plots the service rate over one period for a fixed queue length n.]

Time Period (T)    Optimal    App. MMPP (% Sub-Optimal)
4                  8.5667     8.5932 (0.31 %)
5                  8.7750     8.7262 (0.41 %)
6                  8.7467     8.7925 (0.52 %)
7                  8.8225     8.8785 (0.64 %)

Table 4: Average Cost Rates and Percentage Difference between the Optimal and Approximate NHPP Policies

Example 4.4.

In this example, we consider an NHPP with the (periodic) rate function

λ(t) = 5 sin(ωt) + 6,

where T is the time period of the rate function and ω = 2π/T the frequency. We consider the cases in which T is set to nπ for n = 1, 2, 3 and 4. The discretization interval ∆t for solving the problem with NHPP arrivals is chosen proportionally to T. The cost rate and holding cost functions are again convex, service rates are selected from a set A, and [0, T] is partitioned into 6 subintervals of equal length.

[Figure 7: NHPP optimal policy vs. approximate MMPP policy for various congestion levels (T = π). Each panel plots the service rate over one period for a fixed queue length n.]

Table 5 reports the average cost rates and the percent sub-optimality of the approximate MMPP policy for T = π, 2π, 3π and 4π.

Conclusion

In this paper we investigate the problem of service rate control for a single server queue with non-stationary arrivals. We propose a framework based on Markov-modulated Poisson processes, a popular model amongst practitioners that is also relatively easy to analyze. Assuming that the goal is to minimize a combination of effort cost and holding cost incurred per unit time, we study this problem under both the discounted and average cost optimality criteria. In either case, we characterize the structure of an optimal service rate as being monotone in the queue length for each arrival rate, but not necessarily monotone in the arrival rates for each queue length. In particular, we show that the manner in which the process switches between the arrival rates plays an important role in determining the structure of the optimal policy. We further prove that monotonicity in the arrival rates is recovered when the transition matrix governing the MMPP is stochastically monotone.

There are several ways to extend our work. Our numerical study confirms that in some cases simple heuristics may perform well in the face of changing arrival rates. However, it also points out that careful selection based on the parameters of the system is required, and in many cases applying the proposed model is essential (as opposed to the heuristics). The second part of our numerical work points to the fact that our model can be used as a heuristic itself. We show that we can potentially provide a policy for a system with non-homogeneous Poisson arrivals using the optimal policy for an MMPP/M/1 queue. Our results indicate that this may be a promising direction for future research.

Another problem of interest is the control of an MMPP/M/1 queue with a finite buffer but with an explicit constraint on the job loss rate.
Under this setting, while the technical conditions required for stability are not needed, handling the explicit constraint poses a significant challenge. We note that the results provided by Ata [1] for the stationary arrival case may be useful.

The model under study assumes that complete information about the arrival statistics is available. This may be unreasonable in situations where arrival statistics cannot be associated with the observable features of the system. A promising direction of work may be to tackle such situations using the partially observable Markov decision process framework.

Finally, we would like to point out that although, for ease of exposition, we assume that the cost of effort function c(µ) is strictly convex, continuously differentiable and non-decreasing, our proofs (with minor modification) and results hold for more general cost of effort functions. Using the analysis presented by George and Harrison [11], it can easily be shown that the structural results for an optimal policy continue to hold when c(µ) is assumed to be non-decreasing and continuous.

References

[1] B. Ata. Dynamic power control in a wireless static channel subject to a quality-of-service constraint. Operations Research, 53(5):842–851, 2005.

[2] B. Ata and K. E. Zachariadis. Dynamic power control in a fading downlink channel subject to an energy constraint. Queueing Systems, 55(1):41–69, 2006.

[3] R. A. Berry. Power and delay trade-offs in fading channels. PhD thesis, 2000.

[4] D. P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, 1995.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[6] T. B. Crabill. Optimal control of a maintenance system with variable service rates. Operations Research, 22(4):736–745, 1974.

[7] J. G. Dai. On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. The Annals of Applied Probability, 5(1):49–77, 1995.

[8] J. G. Dai and S. P. Meyn. Stability and convergence of moments for multiclass queueing networks via fluid limit models. IEEE Transactions on Automatic Control, 40(11):1889–1904, 1995.

[9] E. A. Feinberg and M. E. Lewis. Optimality inequalities for average cost Markov decision processes and the stochastic cash balance problem. Mathematics of Operations Research, 32(4):769–783, 2007.

[10] V. S. Frost and B. Melamed. Traffic modeling for telecommunications networks. IEEE Communications Magazine, pages 70–81, March 1994.

[11] J. M. George and J. M. Harrison. Dynamic control of a queue with adjustable service rate. Operations Research, 49(5):720–731, 2001.

[12] L. V. Green and P. J. Kolesar. On the accuracy of the simple peak hour approximation for Markovian queues. Management Science, 41(8):1353–1370, 1995.

[13] V. Gupta, M. Harchol-Balter, A. Scheller-Wolf, and U. Yechiali. Fundamental characteristics of queues with fluctuating load. ACM SIGMETRICS Performance Evaluation Review, 34(1):203, 2006.

[14] H. Heffes and D. Lucantoni. A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance. IEEE Journal on Selected Areas in Communications, 4(6):856–868, 1986.

[15] D. L. Kaufman and M. E. Lewis. Machine maintenance with workload considerations. Naval Research Logistics, 54(7):750–766, 2007.

[16] J. Keilson and A. Kester. Monotone matrices and monotone Markov processes. Stochastic Processes and their Applications, 5(3):231–241, 1977.

[17] G. Koole. Structural results for the control of queueing systems using event-based dynamic programming. Queueing Systems, 1998.

[18] S. A. Lippman. Applying a new device in the optimization of exponential queuing systems. Operations Research, 23(4):687–710, 1975.

[19] W. A. Massey and W. Whitt. Peak congestion in multi-server service systems with slowly varying arrival rates. Queueing Systems, 25(1):157–172, 1997.

[20] L. Muscariello, M. Mellia, M. Meo, M. A. Marsan, and R. L. Cigno. An MMPP-based hierarchical model of Internet traffic. In Proceedings of the IEEE International Conference on Communications (ICC), pages 2143–2147, 2004.

[21] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, 2nd edition, 2011.

[22] N. U. Prabhu and Y. Zhu. Markov-modulated queueing systems. Queueing Systems, 5(1-3):215–245, 1989.

[23] L. I. Sennott. Average cost semi-Markov decision processes and the control of queueing systems. Probability in the Engineering and Informational Sciences, 3(2):247–272, 1989.

[24] S. Shah-Heydari. MMPP modeling of aggregated ATM traffic. In Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering, volume 1, pages 129–132, 1998.

[25] S. Stidham, Jr. and R. R. Weber. Monotonic and insensitive optimal policies for control of queues with undiscounted costs. Operations Research, 37(4):611–625, 1989.

[26] U. Yechiali. A queuing-type birth-and-death process defined on a continuous-time Markov chain. Operations Research, 21(2):604–609, 1973.

[27] S. Yoon and M. E. Lewis. Optimal pricing and admission control in a queueing system with periodically varying parameters. Queueing Systems, 47(3):177–199, 2004.
Appendix
This appendix is dedicated to providing the proofs of Propositions 2.1 and 2.2. The results of Dai [7] and Dai and Meyn [8] are utilized, which show the stability of a stochastic model by establishing the stability of its fluid limit approximation. For the purpose of this analysis, consider the continuous time Markov process X^π(t) = {(Q^π(t), S^π(t)), t ≥ 0} induced by an admissible stationary policy π ∈ Π, where {Q^π(t), t ≥ 0} and {S^π(t), t ≥ 0} represent the queue length and the phase transition process for arrivals, respectively.

The proof approach for Proposition 2.1 follows closely that of Kaufman and Lewis [15] (Proposition 3.1) and is done in several steps. Let π̂ be a policy that selects a constant rate µ̂ ∈ (Σ_{s=1}^L p_s λ_s, ū] whenever the queue is non-empty. Let X^π̂(t) = {(Q^π̂(t), S^π̂(t)), t ≥ 0} be the Markov process induced by the policy π̂ on the state space X = {(n, s) | n ∈ Z₊, s ∈ S}. Since the policy is fixed for the remainder of this section, in the interest of brevity we suppress dependence on π̂. The norm of a state x = (n, s) ∈ X is defined to be |x| := n + s. For an initial state X(0) = x, we define the scaled queue length process

Q̄^x(t) := (1/|x|) Q^x(|x| t).

We will use similar notation to denote scaled versions of other stochastic processes. For each s ∈ S, let {ξ_s(k), k ≥ 1} be a sequence of i.i.d. exponential random variables with mean 1/λ_s; the sequence {ξ_s(k), k ≥ 1} represents the job inter-arrival times when the arrival process is in phase s. Also, let {η(k), k ≥ 1} be a sequence of i.i.d. exponential random variables with mean 1 representing the job completion times. Based on these sequences, we define the following cumulative processes:

E_s(t) = max{ k ≥ 0 | ξ_s(1) + ξ_s(2) + ... + ξ_s(k) ≤ t } for s ∈ S,
D(t) = max{ k ≥ 0 | η(1) + η(2) + ... + η(k) ≤ t }.
Let Y^x_s(t) be the cumulative amount of time the arrival process spends in phase s up to time t when the initial state is x. Similarly, let T^x(t) be the cumulative amount of time during which there is at least one customer in the queue, I^x(t) the cumulative amount of time during which the queue is empty, and W^x(t) the total work done by the server up to time t. We can now write the following system of equations for the stochastic process induced by policy π̂ when starting from initial state x:

Q^x(t) = Q^x(0) + Σ_{s=1}^L E_s(Y^x_s(t)) − D(W^x(t)),   (6.1)
Q^x(t) ≥ 0,   (6.2)
Σ_{s=1}^L Y^x_s(t) = t,   (6.3)
W^x(t) = µ̂ T^x(t),   (6.4)
T^x(t) + I^x(t) = t,   (6.5)
∫₀^∞ Q^x(t) dI^x(t) = 0,   (6.6)
Y^x_s(t), T^x(t), I^x(t), W^x(t) start from zero and are non-decreasing in t.   (6.7)

A few comments are in order. First, note that (6.6) imposes the constraint that the server is idle only when the system is empty. For a subsequence {x_n, n ≥ 1} such that |x_n| → ∞, any limit point Q̄(t) of the sequence {Q̄^{x_n}, n ≥ 1} is called a fluid limit. It will be shown that every fluid limit satisfies a set of equations known as the fluid model. A fluid model is called stable if there exists a t₀ > 0 such that Q̄(t) = 0 for all t ≥ t₀ and for all fluid limits. We next present the fluid model and convergence results for the scaled processes.

Proposition 6.1.
Let {x_j | x_j ∈ X, j ≥ 1} be a sequence of initial states with |x_j| → ∞. Then, with probability 1, there exists a subsequence {x_{j_k}, k ≥ 1} such that

(Q̄^{x_{j_k}}(0), S̄^{x_{j_k}}(0)) → (Q̄(0), 0),   (6.8)
(Q̄^{x_{j_k}}(t), T̄^{x_{j_k}}(t)) → (Q̄(t), T̄(t)) uniformly on compact sets (u.o.c.),   (6.9)

where (Q̄(t), T̄(t)) satisfy the following equations:

Q̄(t) = Q̄(0) + Σ_{s=1}^L p_s λ_s t − W̄(t),   (6.10)
Q̄(t) ≥ 0,   (6.11)
W̄(t) = µ̂ T̄(t),   (6.12)
T̄(t) + Ī(t) = t,   (6.13)
∫₀^∞ Q̄(t) dĪ(t) = 0,   (6.14)
T̄(t), Ī(t), W̄(t) start from zero and are non-decreasing in t.   (6.15)

Proof.
Since Q̄^{x_j}(0) ≤ 1 and S̄^{x_j}(0) ≤ 1 (as 0 ≤ S^{x_j}(0) ≤ L) for all j ∈ N, there exists a subsequence {x_{j_k}, k ≥ 1} such that (Q̄^{x_{j_k}}(0), S̄^{x_{j_k}}(0)) → (Q̄(0), 0). For each sample path ω and 0 ≤ s ≤ t, we have 0 ≤ T̄^{x_j}(t) − T̄^{x_j}(s) ≤ t − s. Thus, the function T̄^{x_j}(t) is uniformly Lipschitz of order 1. Since 0 ≤ T̄^{x_j}(t) ≤ t, it is also uniformly bounded for each j ≥ 1. Therefore, the sequence {T̄^{x_j}(t), j ≥ 1} is equicontinuous and, by the Arzela-Ascoli theorem, any subsequence of {T̄^{x_j}(t), j ≥ 1} has a u.o.c. convergent subsequence.

Since the phase transition process is ergodic, for each s ∈ {1, ..., L} we have, with probability 1, lim_{t→∞} Y^x_s(t)/t = p_s. Furthermore, from the strong law of large numbers for renewal processes, the following hold almost surely:

lim_{t→∞} E_s(t)/t = λ_s for s ∈ S,    lim_{t→∞} D(t)/t = 1.

The above results can be used, in a manner similar to Lemma 4.2 of [7], to yield (with probability 1)

Ȳ_s(t) = lim_{k→∞} (1/|x_{j_k}|) Y^{x_{j_k}}_s(|x_{j_k}| t) = p_s t u.o.c., for s ∈ S,   (6.16)
Ē_s(t) = lim_{k→∞} (1/|x_{j_k}|) E_s(|x_{j_k}| t) = λ_s t u.o.c., for s ∈ S,   (6.17)
D̄(t) = lim_{k→∞} (1/|x_{j_k}|) D(|x_{j_k}| t) = t u.o.c.   (6.18)

The equality in (6.10) follows from (6.1) and (6.16)-(6.18). Similarly, (6.12)-(6.15) follow directly from (6.4)-(6.7), respectively.

Proof of Proposition 2.1:
We start by showing that the fluid model provided in (6.10)-(6.15) is stable. First note that T̄(t), Ī(t) and Ȳ_s(t) are Lipschitz continuous and therefore absolutely continuous and differentiable almost everywhere. Taking the derivative with respect to t in (6.10) and (6.12) yields

dQ̄(t)/dt = Σ_{s=1}^L p_s λ_s − µ̂ dT̄(t)/dt.

Further, due to the non-idling constraint (6.14), dĪ(t)/dt = 0 whenever Q̄(t) > 0. Thus, from (6.13), dT̄(t)/dt = 1 whenever Q̄(t) > 0. So for Q̄(t) > 0 we have

dQ̄(t)/dt = Σ_{s=1}^L p_s λ_s − µ̂.

Our choice of the stationary policy, enabled by the stability condition (2.5), implies that dQ̄(t)/dt < 0 whenever Q̄(t) > 0. Thus, from Lemma 5.2 of [7], the fluid limit process for the queue length is non-increasing and there exists a t₀ ≥ 0 such that Q̄(t) = 0 for all t ≥ t₀. That is, the fluid model is stable. The results of Theorem 4.2 of [7] then imply that the Markov process induced by the stationary policy π̂ is positive recurrent and a stationary distribution exists. Furthermore, since the embedded discrete time Markov chain for the process is irreducible, this process is ergodic. This implies that the long-run average cost under π̂, say g^π̂, is independent of the initial state.

To show that g^π̂ is finite, we use the results of Theorem 4.1(i) of [8]. Since the fluid model is stable and conditions A1) and A2) of that theorem hold, it follows that for any integer p ≥ 1, lim_{t→∞} (1/t) ∫₀^t E_x[|Q(u)|^p] du < ∞ for each initial condition x. Since the holding cost has a polynomial rate of growth, the long-run average holding cost rate is finite. Moreover, the direct contribution to the long-run cost rate due to serving at µ̂ whenever the queue is not empty is at most c(µ̂) < ∞. It therefore follows that g^π̂ is finite.

It remains to consider the necessity of (2.5). Consider the Markov process induced by a policy that uses the highest available service rate whenever the queue is not empty. As shown by Yechiali [26], using the detailed balance equations for the steady state distribution, a non-trivial invariant measure exists only if ū > Σ_{s=1}^L p_s λ_s. The result follows and the proof is complete.

The remainder of this section is dedicated to proving that Proposition 2.2 holds. We show that the optimal value and relative value functions satisfying the ACOI, (2.4), exist and can be obtained via limits from the discounted expected cost value functions. Thus, the structural results proved for the discounted cost case continue to hold for the average cost case. In proving these results, we verify that the following set of assumptions (SEN), provided by Sennott [23] (included for completeness), holds.

• SEN1:
There exist $\delta > 0$ and $\epsilon > 0$ such that, for every state $i$ and control $\mu$, the probability is at least $\epsilon$ that the transition time will be greater than $\delta$. • SEN2:
There exists $B$ such that $\tau(i, \mu) \leq B$ for every state $i$ and control $\mu$, where $\tau(i, \mu)$ is the expected transition time out of state $i$ when control $\mu$ is chosen. • SEN3: $v_\alpha(i) < \infty$ for all states $i$ and $\alpha > 0$. • SEN4:
There exist $\alpha_0 > 0$ and nonnegative constants $M_i$ such that $w_\alpha(i) \leq M_i$ for every state $i$ and $0 < \alpha < \alpha_0$, where $w_\alpha(i) = v_\alpha(i) - v_\alpha(\mathbf{0})$ for a distinguished state $\mathbf{0}$. For every state $i$, there exists an action $\mu_i$ such that $\sum_j P_{ij}(\mu_i) M_j < \infty$. • SEN5:
There exist $\alpha_0 > 0$ and a nonnegative constant $N$ such that $-N \leq w_\alpha(i)$ for every state $i$ and $0 < \alpha \leq \alpha_0$. • SEN6:
For each state $i$, the expected single-stage discounted cost $f_\alpha(i, \mu)$ is a lower semi-continuous (lsc) function on the product space $[0, \infty) \times A$. Note that $f_0(i, \mu)$ is the un-discounted single-stage cost. • SEN7:
For all states $i$ and $j$, the function $L_{ij}(\alpha, \mu) = P_{ij}(\mu) \int_{t=0}^{\infty} e^{-\alpha t} \nu e^{-\nu t}\, dt$ is a lsc function on the product space $[0, \infty) \times A$. • SEN8:
Assume that $\alpha_n$ is a sequence of discount factors converging to 0 with the property that $\pi_{\alpha_n}$, the associated sequence of $\alpha_n$-discount optimal policies, converges to a stationary policy $\pi$. Then for each state $i$, $\liminf_n \tau(i, \pi_{\alpha_n}) \leq \tau(i, \pi)$. Proof of Proposition 2.2:
We begin by verifying the SEN assumptions. Since the uniformizing rate is strictly positive and finite, assumptions SEN1 and SEN2 hold. It was shown in Proposition 2.1 that under the stability condition (2.5) there exists a stationary policy that induces an ergodic Markov process with finite long-run average expected cost. Thus, the hypotheses of Lemma 2 of [23] hold, validating assumptions SEN3 and SEN4. Let $\bar{s} = \arg\min_{s \in \mathcal{S}} \{v_\alpha(0, s)\}$ and define the distinguished state as $\mathbf{0} = (0, \bar{s})$. It follows from Proposition 3.1 that for any $\alpha > 0$, $v_\alpha(\mathbf{0}) \leq v_\alpha(n, s)$ for all $(n, s) \in \mathcal{X}$. Therefore, $w_\alpha(n, s) \geq 0$ and SEN5 is satisfied. SEN6 holds since for each $(n, s) \in \mathcal{X}$, $f_\alpha((n, s), \mu) = (c(\mu) + h(n))/(\alpha + \nu)$ is a continuous function on $[0, \infty) \times A$.

There is no decision to be made when $n = 0$. Fix $n \geq 1$ and $s \in \{1, 2, \ldots, L\}$ and note

$$L_{(n,s),(n',s')}(\alpha, \mu) = \begin{cases} \dfrac{\lambda_s}{\alpha + \nu} & \text{if } n' = n + 1,\ s' = s, \\[4pt] \dfrac{\mu}{\alpha + \nu} & \text{if } n' = n - 1,\ s' = s, \\[4pt] \dfrac{Q_{ss'}}{\alpha + \nu} & \text{if } n' = n,\ s' \in \mathcal{S},\ s' \neq s. \end{cases}$$

Since $L_{x,x'}(\alpha, \mu)$ is jointly continuous in $\alpha$ and $\mu$ for each $x, x' \in \mathcal{X}$, SEN7 holds. Finally, since for any policy $\pi$, $\tau(i, \pi) = 1/\nu$, SEN8 holds.
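Both the stability condition $\bar{\mu} > \sum_{s=1}^L p_s \lambda_s$ and the uniformized kernel $L_{x,x'}(\alpha,\mu)$ are straightforward to check numerically. The following sketch uses a hypothetical two-phase modulating process (all parameter values are illustrative, not from the paper): it computes the stationary distribution $p$ of the phase process, verifies the stability condition, and confirms that a kernel row sums to $\nu/(\alpha+\nu)$ once the fictitious self-transition introduced by uniformization is included (that self-transition term is an assumption of this sketch).

```python
import numpy as np

# Hypothetical two-phase modulating process: generator Q, per-phase
# arrival rates lambda_s, and the highest available service rate mu-bar.
Q = np.array([[-1.0, 1.0],
              [ 2.0, -2.0]])
lam = np.array([3.0, 0.5])          # lambda_s for each phase s
mu_max = 4.0                        # mu-bar, the highest service rate

# Stationary distribution p of the modulating chain: solve p Q = 0
# together with the normalization sum(p) = 1 (least squares).
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Stability condition (2.5): mu-bar exceeds the average arrival rate.
avg_arrival = p @ lam               # sum_s p_s * lambda_s
assert mu_max > avg_arrival         # here 4.0 > 2/3*3.0 + 1/3*0.5

# One row of the uniformized, discounted kernel for a state (n, s),
# n >= 1: each genuine transition contributes rate/(alpha + nu), and
# the row sums to nu/(alpha + nu) after adding the fictitious
# self-transition of uniformization (assumed here).
nu = mu_max + lam.max() + max(-Q[s, s] for s in range(2))  # uniformizing rate
alpha = 0.1
s = 0                               # current phase
rates = [lam[s], mu_max, Q[s, 1]]   # arrival, departure, phase change
self_loop = nu - sum(rates)         # fictitious self-transition rate
row_sum = (sum(rates) + self_loop) / (alpha + nu)
assert abs(row_sum - nu / (alpha + nu)) < 1e-12
```

Under these illustrative numbers the phase process spends two thirds of its time in the high-arrival phase, yet the average arrival rate (about 2.17) stays below $\bar{\mu} = 4$, so the condition of Proposition 2.1 is met.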
SEN1 , SEN3 , SEN6 and