Optimal Control of Markov Processes with Age-Dependent Transition Rates
Mrinal K. Ghosh †, Subhamay Saha ‡

Abstract
We study optimal control of Markov processes with age-dependent transition rates. The control policy is chosen continuously over time based on the state of the process and its age. We study infinite horizon discounted cost and infinite horizon average cost problems. Our approach is via the construction of an equivalent semi-Markov decision process. We characterise the value function and optimal controls for both discounted and average cost cases.
Key Words: Age-dependent transition rates, semi-Markov decision process, infinite horizon discounted cost, infinite horizon average cost.
Mathematics Subject Classification : 93E20, 60J75.
1 Introduction

We address optimal control of Markov processes in continuous time taking values in a countable state space. The simplest example of such a process is a controlled Markov chain, also known as a continuous time Markov decision process (CTMDP). The study of controlled Markov chains is quite well developed [3], [8], [9], [14]; in particular see [7] and the references therein. For a continuous time controlled Markov chain, for each control input the holding time or sojourn time in each state is exponentially distributed. Thus for a fixed input the sojourn times are memoryless. If the sojourn time in each state is given by a general distribution (other than exponential) then the process is referred to as a semi-Markov process. A controlled semi-Markov process, also known as a semi-Markov decision process (SMDP), is determined by a controlled transition kernel and controlled holding time distributions. This class of processes is usually studied via the embedded controlled Markov chain [4], [5], [16]. Since in an SMDP the holding time distributions have a memory, the age of the process in a particular state influences the residual time in that state. It may, however,

∗ This work is supported in part by SPM fellowship of CSIR and in part by UGC Centre for Advanced Study.
† Department of Mathematics, Indian Institute of Science, Bangalore-12, India, email: [email protected]
‡ Department of Mathematics, Indian Institute of Science, Bangalore-12, India, email: [email protected]
be noted that the age has no influence in determining the next state; nor does it play any role in the decision making. There are several situations in which the age of the process is crucial in the overall decision making process. To illustrate this point we consider two examples.

Consider a queueing system with controllable arrival and service rates. Suppose the queue capacity is infinite. The decision maker can dynamically select the service rate between the bounds 0 < µ_1 ≤ µ ≤ µ_2 < ∞ depending on the number of persons in the queue and for how long that many persons have been in the queue. Moreover, the arrival rate can also be adjusted between 0 < γ_1 ≤ γ ≤ γ_2 < ∞. The cost structure consists of three parts: a holding cost rate function b_0(i, y), where i is the number of customers and y is the amount of time for which there have been i customers, an income rate b_1(γ) when an arrival rate γ is maintained, and a service cost rate b_2(µ) when the service rate is µ. Mathematically the model can be described as below:

S = {0, 1, 2, ...} : state space.
U = [γ_1, γ_2] × [µ_1, µ_2] : control set.
λ_ij(y, γ, µ) = γ for j = i + 1; µ for j = i − 1; 0 otherwise : transition rates.
c(i, y, γ, µ) = b_0(i, y) − b_1(γ) + b_2(µ) : cost function.

Next consider a device which is subject to shocks that occur randomly in time according to a Poisson process with controllable rate. Every shock causes damage to the machine. The damage caused depends on the state of the machine and the amount of time it has been in that state. The machine can be in the states 0, 1, 2, ..., N. The state 0 represents a new machine, and once the machine goes to state N, a further shock means that a new machine has to be installed. Suppose the rate of arrival of shocks can be adjusted between 0 < µ_1 ≤ µ ≤ µ_2 < ∞. The cost structure consists of two parts: an operational cost rate b_0(i, y) is incurred if the machine is in state i and the age in that state is y, and a maintenance cost rate b_1(µ) when the shock arrival rate is µ. Mathematically, the model can be described as below:

S = {0, 1, ..., N}.
U = [µ_1, µ_2].
λ_ij(y, µ) = µ/(1 + y) for j = i + 1, i ≤ N − 2; µy/(1 + y) for j = i + 2, i ≤ N − 2; µ for i = N − 1, j = N and for i = N, j = 0; 0 otherwise.
c(i, y, µ) = b_0(i, y) + b_1(µ).

Motivated by the above two examples, we study optimal control of Markov processes where the transition rates are age dependent. Informally, this means that if the process is in state i and its age in the state is y, then the probability that in an infinitesimal time dt the process will jump to state j is λ_ij(y) dt plus a small error term. The probability that after an infinitesimal time dt it will still be in state i is 1 − Σ_{j≠i} λ_ij(y) dt plus some error term, where the λ_ij are measurable functions referred to as transition rates. In the controlled case the transition rates also depend on the control parameter, chosen dynamically based on the state and the age. In a continuous time Markov chain the transition rates are constant with respect to the age.
In the semi-Markov case the transition rates are given by λ_ij(y) = p_ij f(y|i) / (1 − F(y|i)), where the p_ij are the transition probabilities and F(·|i) is the holding time distribution with density f(·|i). In a CTMDP or an SMDP, when the controller is using a stationary control, he or she takes decisions only on the basis of the state, independently of the age. But in our case the decision maker takes actions based on both the state and the age. Thus the decision maker, unlike in a CTMDP or an SMDP, has the liberty to take actions between jumps even when he or she is using a stationary control. This liberty can be of great advantage in practical situations. Hence our model may be more effective in many practical situations.

We now present a formal description of the controlled process. A rigorous construction of the process is given in the next section. Let S = {0, 1, 2, ...} be the state space and U a compact metric space, which is the control set. For i, j ∈ S with i ≠ j suppose λ_ij : [0, ∞) × U → [0, ∞) are given measurable functions. Consider a controlled process {(X_t, Y_t)} which satisfies

P(X_{t+h} = j, Y_{t+h} = 0 | X_t = i, Y_t = y, U_t = u) = λ_ij(y, u) h + o(h)
P(X_{t+h} = i, Y_{t+h} = y + h | X_t = i, Y_t = y, U_t = u) = 1 − Σ_{j≠i} λ_ij(y, u) h + o(h).   (1.1)

We call {X_t} the state process, {Y_t} the associated age process and {U_t} the control process, which is a U-valued process satisfying certain technical conditions. The control process is chosen based on both the state and its age. Thus the control action is taken continuously over time. Equation (1.1) implies that at time t, if the state is i, its age in the state is y and the control chosen is u, then λ_ij(y, u) is the infinitesimal jump rate to state j.

The main aim in a stochastic optimal control problem is to find a control policy which minimises a given cost functional. Let c : S × R_+ × U → R_+ be the running cost function.
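The dynamics (1.1) can be sampled by thinning a Poisson clock whose rate dominates all jump rates (the uniform bound assumed in (A1) below). Here is a minimal sketch on a hypothetical two-state model; the rates, the bound M = 5, and the stationary control are illustrative assumptions, not data from the paper:

```python
import random

# Hypothetical two-state model: the rates, the bound M, and the stationary
# control below are illustrative assumptions, not data from the paper.
M = 5.0  # uniform bound on the total jump rate, in the spirit of (A1)

def rates(i, y, u):
    """Age-dependent rates lambda_ij(y, u) out of state i (toy example)."""
    if i == 0:
        return {1: u * y / (1.0 + y)}   # leaving state 0 speeds up with age
    return {0: u / (1.0 + y)}           # leaving state 1 slows down with age

def policy(i, y):
    """A stationary deterministic control u(i, y), with values in U = [1, 2]."""
    return 1.0 if i == 0 else 2.0

def simulate(T, i0=0, seed=0):
    """Sample (X_t, Y_t) on [0, T] per (1.1) by thinning a rate-M Poisson clock."""
    rng = random.Random(seed)
    t, i, y = 0.0, i0, 0.0
    path = [(t, i, y)]                   # record (time, state, age after event)
    while True:
        dt = rng.expovariate(M)          # next candidate event time
        if t + dt > T:
            break
        t, y = t + dt, y + dt            # the age grows between jumps
        lam = rates(i, y, policy(i, y))
        z, acc = rng.uniform(0.0, M), 0.0
        for j, r in lam.items():         # accept j with probability lambda_ij/M
            acc += r
            if z < acc:
                i, y = j, 0.0            # jump; the age resets to zero
                path.append((t, i, y))
                break
    return path
```

Between candidate times the age grows deterministically; an accepted candidate resets the age to zero, mirroring the interval construction Λ_ij used in the next section.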
Suppose the planning horizon is infinite and consider the discounted cost problem. We seek to minimise

E ∫_0^∞ e^{−αt} c(X_t, Y_t, U_t) dt

over the set of all admissible controls (to be defined in the next section), where α > 0 is the discount factor. In Section 2 we give a rigorous construction of {(X_t, Y_t)}, which is based on a representation of {(X_t, Y_t)} as stochastic integrals with respect to an appropriate Poisson random measure. In Section 3 we study the infinite horizon discounted cost problem. For that we construct an equivalent semi-Markov decision process. Section 4 deals with the infinite horizon average cost case.

2 Construction of the Process

Let (Ω, F, P) be the underlying probability space. For i, j ∈ S, i ≠ j, let λ_ij : [0, ∞) × U → [0, ∞) be given measurable functions. Set

λ_ii(y, u) = − Σ_{j≠i} λ_ij(y, u).

We make the following assumption which is in force throughout this paper:

(A1)
There exists a constant M such that sup_{i∈S, y≥0, u∈U} {−λ_ii(y, u)} < M.

(A2) inf_{i∈S, y≥0, u∈U} {−λ_ii(y, u)} > m for some m > 0.

Let P(U) denote the set of probability measures on U. For i ≠ j, let λ̃_ij : [0, ∞) × P(U) → R_+ be defined by

λ̃_ij(y, ν) = ∫_U λ_ij(y, u) ν(du).

For i ≠ j, y ∈ R_+ and ν ∈ P(U), let Λ_ij(y, ν) be consecutive right open, left closed intervals of the real line of length λ̃_ij(y, ν). We define a function h : S × R_+ × P(U) × R → R by

h(i, y, ν, z) = j − i if z ∈ Λ_ij(y, ν); 0 otherwise.   (2.1)

We also define a function g : S × R_+ × P(U) × R → R by

g(i, y, ν, z) = y if z ∈ ∪_j Λ_ij(y, ν); 0 otherwise.   (2.2)

Let ℘(ds, dz) be a Poisson random measure on R_+ × R with intensity measure ds × dz, the product Lebesgue measure on R_+ × R. Consider the following stochastic differential equation

X_t = X_0 + ∫_0^t ∫_R h(X_{s−}, Y_{s−}, U_s, z) ℘(ds, dz)
Y_t = Y_0 + t − ∫_0^t ∫_R g(X_{s−}, Y_{s−}, U_s, z) ℘(ds, dz)   (2.3)

where {U_t} is a P(U)-valued process with measurable sample paths which is predictable with respect to the filtration given by σ(℘(A × B) : A ∈ B([0, s]), B ∈ B(R), s ≤ t), and X_0, Y_0 are random variables with prescribed laws independent of the Poisson random measure. The integrals in (2.3) are over (0, t]. From the results in [13, Chap IV, p. 231] it follows that for each {U_t} as above, equation (2.3) has an a.s. unique strong solution {(X_t, Y_t)}. If U_t = u(t, X_{t−}, Y_{t−}) for some measurable function u : [0, ∞) × S × [0, ∞) → P(U) then U is called a Markov control. Moreover, if U_t = u(X_{t−}, Y_{t−}) for some measurable function u : S × [0, ∞) → P(U) then U is referred to as a stationary Markov control. It is customary in the optimal control literature to refer to the function u as the control. We denote by U the set of all measurable functions u : S × [0, ∞) → P(U).
In this paper we restrict our set of controls to the set U and we refer to U as the set of admissible controls. For each u ∈ U, {(X_t, Y_t)} is a strong Markov process. Let f : S × R_+ → R be continuously differentiable in the second variable. Then applying Itô's formula to f we can show that the generator A^u of the process {(X_t, Y_t)} is given by

A^u f(i, y) = ∂f/∂y (i, y) + Σ_{j≠i} λ̃_ij(y, u(i, y)) [f(j, 0) − f(i, y)].   (2.4)

3 The Discounted Cost Problem

Let c : S × R_+ × U → R_+ be the running cost function. Define c̃ : S × R_+ × P(U) → R_+ by

c̃(i, y, ν) = ∫_U c(i, y, u) ν(du).

Let α > 0. For u ∈ U the infinite horizon discounted cost is given by

J^u_α(i) = E^u_{i,0} ∫_0^∞ e^{−αt} c̃(X_t, Y_t, u(X_t, Y_t)) dt   (3.1)

where E^u_{i,0} denotes the expectation when the control u is used and X_0 = i, Y_0 = 0. The objective is to minimise J^u_α(i) over all admissible controls. So we define

V_α(i) = inf_{u∈U} J^u_α(i).   (3.2)

The function V_α is called the (α-discounted) value function. An admissible control u* ∈ U is called (α-discounted) optimal if J^{u*}_α(i) = V_α(i). We carry out our study under the following assumptions:

(A3) The λ_ij (j ≠ i) are jointly continuous in y and u, and the sum Σ_{j≠i} λ_ij(y, u) converges uniformly for each i.

(A4) The cost function c is continuous in the second and third variables and there exists a finite constant C̃ such that sup_{i,y,u} c(i, y, u) ≤ C̃.
The boundedness of c implies that V_α(i) is well defined for each i and sup_i V_α(i) ≤ C̃/α.
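The bound sup_i V_α(i) ≤ C̃/α can be observed numerically. The sketch below estimates the discounted cost (3.1) of one fixed stationary control on a hypothetical two-state model by thinning; all rates, costs and the policy are illustrative assumptions. Since V_α is an infimum, any such estimate should respect the bound up to discretisation and truncation error.

```python
import math, random

# Toy two-state model; rates, costs and the policy are illustrative assumptions.
ALPHA, M = 0.5, 5.0          # discount rate; uniform bound on the jump rates
C_TILDE = 2.1                # sup of the toy running cost below

def lam(i, y, u):
    """Total rate of leaving state i (each state jumps only to the other)."""
    return 1.0 + u * y / (1.0 + y) if i == 0 else 0.5 + u

def cost(i, y, u):
    return 1.0 + i + 0.1 * u

def policy(i, y):
    return 1.0 if i == 0 else 0.0

def discounted_cost(i0, T=40.0, seed=0):
    """One sample of int_0^T e^{-alpha t} c dt under the policy, by thinning."""
    rng = random.Random(seed)
    t, i, y, total = 0.0, i0, 0.0, 0.0
    while t < T:
        dt = min(rng.expovariate(M), T - t)
        # accumulate cost along the deterministic piece (left-endpoint rule)
        total += math.exp(-ALPHA * t) * cost(i, y, policy(i, y)) * dt
        t, y = t + dt, y + dt
        if rng.uniform(0.0, M) < lam(i, y, policy(i, y)):
            i, y = 1 - i, 0.0            # jump to the other state, age resets
    return total

est = sum(discounted_cost(0, seed=s) for s in range(200)) / 200.0
```

Averaging more paths, or refining the left-endpoint rule, sharpens the estimate; here it only serves as a crude consistency check on the bound.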
In order to characterise the value function and the optimal control we construct an equivalent semi-Markov decision process. The key observation here is that between jumps the trajectory of the process {(X_t, Y_t)} is deterministic. Thus {(X_t, Y_t)} is a piecewise deterministic process [1]. Therefore using a stationary relaxed control is equivalent to choosing a function r : [0, ∞) → P(U) at each jump time. More explicitly, if the process jumps to a state i, then under a stationary control u the action over the ensuing sojourn is the function r_i given by r_i(y) = u(i, y). Let

R = {r | r : [0, ∞) → P(U), measurable}.

This set R will be the action space for an equivalent semi-Markov decision process that we are going to construct. First we give a topology on R. Let V = L^1([0, ∞); C(U)), where C(U) is the space of continuous functions on U endowed with the supremum norm. Thus V is the space of integrable (with respect to Lebesgue measure) C(U)-valued functions on [0, ∞). Then the dual of V is V* = L^∞([0, ∞); M(U)), where M(U) is the space of complex regular Borel measures on U with the total variation norm. Now by the Banach-Alaoglu theorem the unit ball of V* is weak* compact. Hence R, being a closed subset of the unit ball of V*, is a compact metric space (for more details see [1, Chap 4, p. 149]). In this topology, r^n → r if and only if

∫_{[0,∞)} ∫_U f(y, u) r^n_y(du) dy → ∫_{[0,∞)} ∫_U f(y, u) r_y(du) dy for all f ∈ V.

Now define f : S × R → R_+ by

f(i, r) = ∫_0^∞ e^{−αy} exp{−∫_0^y ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} ∫_U c(i, y, u) r_y(du) dy.   (3.3)

For r ∈ R define a transition matrix by

p̂_ij(r) = ∫_0^∞ exp{−∫_0^y ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} ∫_U λ_ij(y, u) r_y(du) dy.   (3.4)

Finally for r ∈ R and t ∈ R_+ define a family of distribution functions by

F^r_ij(t) = [∫_0^t exp{−∫_0^y ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} ∫_U λ_ij(y, u) r_y(du) dy] / p̂_ij(r).   (3.5)
Now consider a semi-Markov decision process with state space S, action space R, expected one stage cost f given by (3.3), transition probabilities (p̂_ij(r)) given by (3.4) and sojourn time distributions F^r_ij given by (3.5). In short the dynamics of the process is as follows. Suppose the initial state is i ∈ S and the decision maker chooses an action r from the set R. The action depends on the state. Because of this action the decision maker has to pay a cost up to the next jump time at a rate dependent on the state and the action chosen. The next state is j with probability p̂_ij(r) and, conditioned on the event that the next state is j, the distribution of the sojourn time in the state i is given by F^r_ij. The aim of the decision maker is to minimize the cost over the set of stationary policies π : S → R. Define

J̃^π_α(i) = E^π_i [Σ_{n=0}^∞ e^{−α(τ_1 + τ_2 + ··· + τ_n)} ∫_0^{τ_{n+1}} e^{−αy} (∫_U c(X_{T_n}, y, u) π_{X_{T_n}}(y)(du)) dy]   (3.6)

where T_n is the n-th jump time and τ_n = T_n − T_{n−1}. Let

Ṽ_α(i) = inf_π J̃^π_α(i).

Thus Ṽ_α is the value function for the SMDP. Now corresponding to a control u of the original optimal control problem, define the policy π^u for the semi-Markov decision process by π^u_i(y) = u(i, y). Then it follows from the definition of the semi-Markov decision process that

J^u_α(i) = E^u_{i,0} [Σ_{n=0}^∞ ∫_{T_n}^{T_{n+1}} e^{−αt} c̃(X_t, Y_t, u(X_t, Y_t)) dt]
        = Σ_{n=0}^∞ E^u_{i,0} [E^u_{i,0} [∫_{T_n}^{T_{n+1}} e^{−αt} c̃(X_t, Y_t, u(X_t, Y_t)) dt | H_n]]
        = E^{π^u}_i [Σ_{n=0}^∞ e^{−α(τ_1 + ··· + τ_n)} ∫_0^{τ_{n+1}} e^{−αy} (∫_U c(X_{T_n}, y, u) π^u_{X_{T_n}}(y)(du)) dy]
        = J̃^{π^u}_α(i),

where H_n is the history up to the n-th jump time. On the other hand, corresponding to a policy π of the SMDP define the control u^π for the original optimal control problem by u^π(i, y) = π_i(y). Again J^{u^π}_α(i) = J̃^π_α(i).
Hence it follows that

V_α(i) = Ṽ_α(i).   (3.7)

Equation (3.7) establishes the equivalence between the original control problem and the constructed semi-Markov decision process. Thus in order to evaluate V_α(i), we analyse the equivalent semi-Markov decision process. As a first step we state the following useful lemma.

Lemma 3.1.
Under (A1)-(A4), the functions f(i, ·), p̂_ij(·) and F^{(·)}_ij(t) are continuous on R.

Proof. Suppose r^n converges to r in R. Then

|f(i, r^n) − f(i, r)| ≤ C̃ ∫_0^∞ e^{−αt} |exp{−∫_0^t ∫_U Σ_{k≠i} λ_ik(s, u) r^n_s(du) ds} − exp{−∫_0^t ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds}| dt
+ |∫_0^∞ ∫_U e^{−αt} exp{−∫_0^t ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} c(i, t, u) r^n_t(du) dt − ∫_0^∞ ∫_U e^{−αt} exp{−∫_0^t ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} c(i, t, u) r_t(du) dt|.

By the definition of convergence in R, both terms on the right hand side of the above expression go to 0 as n → ∞. Similar arguments hold for the other two functions as well.

Thus, using the equivalence of the semi-Markov decision process described above and the original control problem, we obtain the following result from the standard theory of SMDPs [15].

Theorem 3.1.
Assume (A1)-(A4). Then the value function V_α is the unique bounded solution of

φ(i) = min_{r∈R} [f(i, r) + Σ_{j≠i} p̂_ij(r) ∫_0^∞ e^{−αt} φ(j) dF^r_ij(t)].   (3.8)

Furthermore, if r*_i is the minimizer of the right hand side of (3.8) (which exists by the previous lemma and the compactness of R), then the control given by u*(i, y) = r*_i(y) is an optimal control for the original control problem.

Remark 3.1.
The reason for restricting to stationary controls is evident from our approach. To set up a bijection between the set of controls of the original control problem and that of the equivalent SMDP, we need this restriction on the set of admissible controls. For a Markov control it is not clear that such a bijection can be established. Since in CTMDPs as well as in SMDPs the optimal control is finally given by a stationary control, this restriction is not unnatural.
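On a small finite model the fixed point characterisation (3.8) can be exercised numerically: the sketch below evaluates f(i, r) and the discounted transition kernel by quadrature for a small grid of threshold actions, then runs value iteration. The two-state rates, costs and candidate actions are illustrative assumptions, not data from the paper.

```python
import math

# Illustrative two-state model and a finite grid of threshold actions; none of
# these numbers come from the paper.
ALPHA, DY, YMAX = 0.5, 1e-3, 40.0
STATES = (0, 1)

def lam(i, j, y, u):
    if i == 0 and j == 1:
        return 1.0 + u * y / (1.0 + y)
    if i == 1 and j == 0:
        return 0.5 + u
    return 0.0

def cost(i, y, u):
    return 1.0 + i + 0.1 * u

# Candidate actions r: measurable maps age -> control, here threshold rules
# "use u = 1 while the age is below a, then switch to u = 0".
ACTIONS = [lambda y, a=a: (1.0 if y < a else 0.0) for a in (0.0, 1.0, 2.0, 1e9)]

def smdp_data(i, r):
    """Riemann sums for f(i, r) of (3.3) and for the discounted kernel
    beta_ij(r) = p_hat_ij(r) * int_0^inf e^{-alpha t} dF^r_ij(t)."""
    f_val = 0.0
    beta = {j: 0.0 for j in STATES if j != i}
    S, y = 0.0, 0.0                      # S = int_0^y sum_k lambda_ik
    while y < YMAX:
        u = r(y)
        surv, disc = math.exp(-S), math.exp(-ALPHA * y)
        f_val += disc * surv * cost(i, y, u) * DY
        for j in beta:
            beta[j] += disc * surv * lam(i, j, y, u) * DY
        S += sum(lam(i, j, y, u) for j in beta) * DY
        y += DY
    return f_val, beta

DATA = {i: [smdp_data(i, r) for r in ACTIONS] for i in STATES}

def value_iteration(iters=200):
    """Iterate the right-hand side of (3.8); a contraction, since each beta
    row sums to strictly less than one."""
    phi = {i: 0.0 for i in STATES}
    for _ in range(iters):
        phi = {i: min(f + sum(b[j] * phi[j] for j in b) for f, b in DATA[i])
               for i in STATES}
    return phi
```

Restricting to a finite action grid only approximates the minimum over all of R, of course; the point is to see the contraction in (3.8) at work on concrete numbers.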
4 The Average Cost Problem

Now we investigate the infinite horizon average cost problem via the equivalent semi-Markov decision process approach. First we describe the infinite horizon average cost criterion for the original control problem. For u ∈ U define

J^u(i) = limsup_{n→∞} [E^u_{i,0} ∫_0^{T_n} ∫_U c(X_t, Y_t, u) u(X_t, Y_t)(du) dt] / [E^u_{i,0} T_n],

where T_n is the n-th jump time. The aim of the controller is to minimise J^u over all u. Now consider the semi-Markov decision process defined in the previous section with the expected one-stage (jump to jump) cost in state i given by

ϕ(i, r) = ∫_0^∞ exp{−∫_0^y ∫_U Σ_{k≠i} λ_ik(s, u) r_s(du) ds} ∫_U c(i, y, u) r_y(du) dy,

where r ∈ R is the action chosen in state i. Now define

J̃^π(i) = limsup_{n→∞} [E^π_i Z(T_n)] / [E^π_i T_n],

where

Z(T_n) = Σ_{k=0}^{n−1} ∫_0^{τ_{k+1}} ∫_U c(X_{T_k}, y, u) π_{X_{T_k}}(y)(du) dy

is the cost incurred up to the n-th jump time. By arguments analogous to the discounted case we have

inf_u J^u(i) = inf_π J̃^π(i).

Let τ̄(i, r) be the expected sojourn time of the equivalent semi-Markov decision process in state i when the action chosen is r. Thus

τ̄(i, r) = ∫_0^∞ exp{−∫_0^t ∫_U Σ_{j≠i} λ_ij(y, u) r_y(du) dy} dt.

Consider the equation

ψ(i) = inf_{r∈R} [ϕ(i, r) + Σ_{j≠i} p̂_ij(r) ψ(j) − ρ τ̄(i, r)]   (4.1)

where ψ : S → R and ρ is a scalar. Using the equivalence and the theory of SMDPs [15], we obtain the following result:

Theorem 4.1. If (4.1) has a solution (h, g), where h is a bounded function, then g is the optimal average cost for the original control problem and an optimal policy is given by u*(i, y) = r*_i(y), where r*_i is given by

ϕ(i, r*_i) + Σ_{j≠i} p̂_ij(r*_i) h(j) − g τ̄(i, r*_i) = inf_{r∈R} [ϕ(i, r) + Σ_{j≠i} p̂_ij(r) h(j) − g τ̄(i, r)].

We now impose conditions on the rates λ_ij which will ensure the existence of a bounded solution of (4.1). We make two additional assumptions:

(A5) S is a finite set.
(A6) There exists δ > 0 such that λ_{i0}(y, u) > δ for all i (≠ 0), y, u; and for j ≠ 0, if sup_{y,u} λ_ij(y, u) > 0 then inf_{y,u} λ_ij(y, u) > 0.

Remark 4.1.
Note that even though S is finite, the effective state space is S × R_+, which is uncountable. Now we give an example where our assumptions hold.
Example 4.1.
We modify the second example in the introduction. Let the λ_ij be modified as follows. For i ≤ N − 3,

λ_{iN}(y, µ) = µ/(N − i),
λ_{i,i+1}(y, µ) = µ − µ/(N − i) for y ≤ 1; µ/(N − i) for y ≥ 2; linear in between,
λ_{i,i+2}(y, µ) = µ − µ/(N − i) − λ_{i,i+1}(y, µ).

Moreover,

λ_{N−2,N−1}(y, µ) = µ − µ/2 for y ≤ 1; µ/2 for y ≥ 2; linear in between,
λ_{N−2,N}(y, µ) = µ − λ_{N−2,N−1}(y, µ),
λ_{N−1,N}(y, µ) = µ,
λ_{N,0}(y, µ) = µ.

Clearly this example satisfies (A5) and (A6), with N playing the role of 0.

For u ∈ U it follows from (A6) that the transition probabilities of the embedded Markov chain {X_{T_n}}, where the T_n are the successive jump times, satisfy

p̂^u_{i0} = ∫_0^∞ λ̃_{i0}(y, u(i, y)) exp(−∫_0^y Σ_{j≠i} λ̃_ij(s, u(i, s)) ds) dy ≥ δ ∫_0^∞ e^{−My} dy = δ/M.

Hence the expected number of steps in which the embedded chain reaches 0 from any state i is finite, i.e., if N = min{n ≥ 1 | X_{T_n} = 0} then

sup_{u∈U} E^u_i N < ∞.   (4.2)

Also by (A6) it follows that if p̂^u_ij ≠ 0 then inf_{u∈U} p̂^u_ij > 0. Define

τ = inf{t > 0 | (X_t, Y_t) = (0, 0)}.   (4.3)

Lemma 4.1.
Under (A1)-(A3), (A5)-(A6) we have

sup_u E^u_{i,0} τ < ∞,   (4.4)

where τ is as in (4.3).

Proof. Let δ_n denote the set of sequences of states (i_0, i_1, ..., i_n) such that i_0 = i, i_j ≠ 0 for j = 1, 2, ..., n − 1, and i_n = 0. Then

E^u_{i,0} τ = Σ_{n=1}^∞ Σ_{(i_0, i_1, ..., i_n) ∈ δ_n} Π_k p̂^u_{i_k, i_{k+1}} (η^u_{i_0 i_1} + ··· + η^u_{i_{n−1} i_n}),

where η^u_{ij} is the expected amount of time spent in state i given that the next transition will be into state j. Therefore

E^u_{i,0} τ ≤ (max_{j,k∈S} η^u_{jk}) E^u_i N.
Using (A6) and the fact that the expected sojourn times in each state are finite, it follows that

sup_{u∈U} (max_{j,k∈S} η^u_{jk}) < ∞.

Note that for the above the finiteness of the state space is crucial. Hence the desired result follows by (4.2).

Lemma 4.2.
For α > 0, let h_α(i) = V_α(i) − V_α(0). Then the family {h_α}_{α>0} is uniformly bounded.

Proof. Let K be a constant such that max_i sup_u E^u_{i,0} τ < K. If u*_α denotes the optimal policy for the α-discounted case then we have

V_α(i) = E^{u*_α}_{i,0} [∫_0^τ e^{−αt} c̃(X_t, Y_t, u*_α(X_t, Y_t)) dt + ∫_τ^∞ e^{−αt} c̃(X_t, Y_t, u*_α(X_t, Y_t)) dt]
       ≤ C̃K + E^{u*_α}_{i,0} [e^{−ατ}] V_α(0)
       ≤ C̃K + V_α(0).

Again,

E^{u*_α}_{i,0} [e^{−ατ}] V_α(0) ≤ V_α(i).

Thus,

V_α(0) ≤ V_α(i) + (1 − E^{u*_α}_{i,0} e^{−ατ}) V_α(0) ≤ V_α(i) + (1 − e^{−αK}) C̃/α ≤ V_α(i) + K C̃.
The second inequality follows from Jensen's inequality, and the third from 1 − e^{−αK} ≤ αK. Thus we have |h_α(i)| ≤ K C̃.
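For a small instance, a solution of (4.1) can also be computed directly. On a hypothetical two-state model the embedded chain is a deterministic cycle (each state jumps only to the other, so p̂_01 = p̂_10 = 1), and with the normalisation ψ(0) = 0 equation (4.1) reduces to a scalar equation in ρ, solved below by bisection. The rates, costs and candidate actions are illustrative assumptions, not data from the paper.

```python
import math

# Illustrative two-state model; rates, costs and candidate actions are
# assumptions made for this sketch, not data from the paper.
DY, YMAX = 1e-3, 40.0

def lam(i, y, u):
    """Rate of leaving state i; each state can only jump to the other one,
    so the embedded kernel is deterministic: p_hat_01 = p_hat_10 = 1."""
    return 1.0 + u * y / (1.0 + y) if i == 0 else 0.5 + u

def cost(i, y, u):
    return 1.0 + i + 0.1 * u

# Threshold actions r: "use u = 1 while the age is below a, then u = 0".
ACTIONS = [lambda y, a=a: (1.0 if y < a else 0.0) for a in (0.0, 1.0, 2.0, 1e9)]

def phi_tau(i, r):
    """Quadrature for the one-stage cost phi(i, r) and mean sojourn tau_bar(i, r)."""
    phi, tau, S, y = 0.0, 0.0, 0.0, 0.0
    while y < YMAX:
        u, surv = r(y), math.exp(-S)     # surv = exp(-int_0^y lambda)
        phi += surv * cost(i, y, u) * DY
        tau += surv * DY
        S += lam(i, y, u) * DY
        y += DY
    return phi, tau

DATA = {i: [phi_tau(i, r) for r in ACTIONS] for i in (0, 1)}

def excess(rho):
    """With psi(0) = 0 and a deterministic embedded cycle, (4.1) holds at rho
    iff min_r[phi(0,r) - rho*tau(0,r)] + min_r[phi(1,r) - rho*tau(1,r)] = 0."""
    return sum(min(p - rho * t for p, t in DATA[i]) for i in (0, 1))

def solve_rho(lo=0.0, hi=10.0, tol=1e-10):
    """Bisection; excess is strictly decreasing in rho since tau_bar > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

The root is the optimal cycle ratio min over action pairs of (ϕ(0,·) + ϕ(1,·)) / (τ̄(0,·) + τ̄(1,·)), i.e. the average cost g of Theorem 4.1 for this toy instance restricted to the finite action grid.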
Theorem 4.2.
Under (A1)-(A6), equation (4.1) has a solution (h, g), where h is a bounded function and g is a scalar.

Proof. Let h̃_α(i) = Ṽ_α(i) − Ṽ_α(0). Then by Lemma 4.2, {h̃_α(i)} is uniformly bounded. Therefore there exists a sequence α_n → 0 such that

g = lim_{α_n→0} α_n Ṽ_{α_n}(0),   h(i) = lim_{α_n→0} h̃_{α_n}(i),

where h is a bounded function. Now using standard arguments [15], it can be shown that the pair (g, h) satisfies (4.1).

Remark 4.2. If {X^u_t} is irreducible for each u ∈ U, i.e., if the embedded Markov chain is irreducible, then

limsup_{n→∞} [E^u_{i,0} ∫_0^{T_n} ∫_U c(X_t, Y_t, u) u(X_t, Y_t)(du) dt] / [E^u_{i,0} T_n] = limsup_{T→∞} (1/T) E^u_{i,0} ∫_0^T ∫_U c(X_t, Y_t, u) u(X_t, Y_t)(du) dt.

Thus if the irreducibility assumption holds, then g of the above theorem satisfies

g = limsup_{T→∞} (1/T) E^u_{i,0} ∫_0^T ∫_U c(X_t, Y_t, u) u(X_t, Y_t)(du) dt.

5 Conclusion

We have studied optimal control problems for a class of Markov processes with age-dependent transition rates which subsumes semi-Markov decision processes with holding time distributions having densities. We have allowed control actions between jumps based on the age of the process. We have constructed an equivalent SMDP which yields the relevant results for the original problem. A standard approach towards solving an optimal control problem is via the HJB equation. In our problem the HJB equation for the discounted cost case is given by

dϕ(i, y)/dy + inf_u [c(i, y, u) + Σ_{j≠i} λ_ij(y, u) {ϕ(j, 0) − ϕ(i, y)}] = αϕ(i, y)   (5.1)

on S × [0, ∞). One important difficulty in handling this differential equation is that it is non-local. It can be shown via a contraction principle argument that when α > M, the value function V_α is the unique bounded, smooth solution of (5.1).
In this case the infimum in (5.1) is realised at a stationary deterministic (non-relaxed) control which is optimal for the α-discounted cost criterion. But we have not been able to establish the existence of a solution to (5.1) when α ≤ M. Because we have not been able to solve the discounted case HJB equation for smaller values of α, we could not pursue the vanishing discount approach to finding a solution of the HJB equation for the average optimal case. In our problem the HJB equation for the average optimal case is given by

ρ = dh(i, y)/dy + inf_u [c(i, y, u) + Σ_{j≠i} λ_ij(y, u) {h(j, 0) − h(i, y)}].   (5.2)

It would be interesting to investigate an appropriate solution of (5.2) to study the average optimal case. Finally, in this paper we have assumed that the jump rates and the cost function are bounded. If the jump rates are unbounded but satisfy a certain growth condition, then following the arguments in Chapter 8, Section 3 in [2], one can show that the controlled martingale problem for the operator

A^u f(i, y) = ∂f/∂y (i, y) + Σ_{j≠i} λ̃_ij(y, u(i, y)) [f(j, 0) − f(i, y)]   (5.3)

is well-posed. For an unbounded cost with an appropriate growth rate, it may be possible to work in the space of continuous functions with weighted norms as in [7], [9] to derive analogous results.

References

[1] M. H. A. Davis,
Markov Models and Optimization, Chapman and Hall, 1993.

[2] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.

[3] E. A. Feinberg and A. A. Yushkevich, Homogeneous controllable Markov models with continuous time and with a finite or countable state space, Teor. Veroyatnost. i Primenen. 24 (1979), 155-160.

[4] A. Federgruen, A. Hordijk and H. C. Tijms, Denumerable state semi-Markov decision processes with unbounded costs, average cost criteria, Stoch. Proc. and Appl. 9 (1979), 223-235.

[5] A. Federgruen, A. Hordijk and H. C. Tijms, Denumerable undiscounted semi-Markov decision processes with unbounded rewards, Math. of Oper. Research 8 (1983), 298-313.

[6] M. K. Ghosh and A. Goswami, Risk minimising option pricing in a semi-Markov modulated market, SIAM J. Control Optim. 48 (2009), 1519-1541.

[7] X. Guo and O. Hernández-Lerma, Continuous-Time Markov Decision Processes. Theory and Applications, Springer-Verlag, 2009.

[8] X. Guo and O. Hernández-Lerma, Continuous-time controlled Markov chains, Annals of Applied Probability 13 (2003), 363-388.

[9] X. Guo, O. Hernández-Lerma and T. Prieto-Rumeau, A survey of recent results on continuous-time Markov decision processes, TOP 14 (2006), 177-261.

[10] A. Hordijk and F. A. Van Der Duyn Schouten, Average optimal policies in Markov decision drift processes with applications to a queueing and a replacement model, Advances in Applied Probability 15 (1983), 274-303.

[11] A. Hordijk and F. A. Van Der Duyn Schouten, Discretization and weak convergence in Markov decision drift processes, Mathematics of Operations Research 9 (1984), 112-141.

[12] A. Hordijk and F. A. Van Der Duyn Schouten, Markov decision drift processes; conditions for optimality obtained by discretization, Mathematics of Operations Research 10 (1985), 160-173.

[13] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North Holland, 1989.

[14] S. R. Pliska, Controlled jump processes, Stochastic Processes and their Applications 3 (1975), 259-282.

[15] S. M. Ross, Applied Probability Models with Optimization Applications, Dover, 1992.

[16] K. Wakuta,