Risk-sensitive control of continuous time Markov chains
MRINAL K. GHOSH AND SUBHAMAY SAHA
Abstract.
We study risk-sensitive control of continuous time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterise the value function via the HJB equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

Mathematics Subject Classification. Primary 93E20; Secondary 49L20, 60J27.

Key words and phrases. Risk sensitive control, finite horizon problem, infinite horizon discounted cost, infinite horizon average cost, multiplicative ergodic theorem, HJB equation, Poisson equation, policy improvement algorithm.

This work is supported in part by an SPM fellowship of CSIR and in part by the UGC Centre for Advanced Study.

1. Introduction and Preliminaries
In the last two decades considerable attention has been given to the investigation of risk-sensitive problems in the literature of stochastic dynamic optimization. An important reason for the popularity of this kind of problem is its connection with $H_\infty$ or robust control problems and stochastic dynamic games. A justification for the term risk-sensitive control comes from utility theory in economics. Generally in stochastic dynamic optimization, the decision maker (controller) seeks to minimise a cost functional which is a random quantity, say $X$, which depends on the time horizon and the control adopted by the controller. Since $X$ is random, the controller tries to minimise the expected value of $X$. This is the risk-neutral case. But this approach has some limitations: if the variance is large, the risk-neutral optimal control may still expose the controller to substantial risk. Generally variance is a measure of risk in the economics literature. So ideally one would like to minimise both mean and variance simultaneously, but this may not be feasible. Therefore a convex combination of mean and variance is optimised, or the mean is optimised for a given variance. This approach of mean-variance optimization was taken by Markowitz in his work on portfolio selection [20]. It was later extended by Sharpe in his capital asset pricing model [26]. But if the random variable is not normally distributed, then its distribution is not completely determined by the first two moments. Thus it is reasonable to consider a cost criterion which deals with higher moments as well. A powerful approach in this direction is risk-sensitive control, wherein the controller seeks to minimise an exponential criterion. Roughly speaking, the cost functional of interest is of the form $E\exp(\theta X)$, where $X$ is the random variable which denotes the cost payable by the controller and $\theta > 0$ is a parameter. Let $w$ be the amount the controller is willing to pay instead of the random quantity $X$. Thus $w$ satisfies
\[ \exp(\theta w) = E\exp(\theta X). \]
The deterministic quantity $w$ is referred to as the certainty equivalent of $X$. The risk premium $\pi$ is defined by the equation $w = EX + \pi$. Now by Jensen's inequality,
\[ \exp(\theta EX) \le E\exp(\theta X) = \exp(\theta w). \]
Thus, by the monotonicity of the exponential function, $w \ge EX$, which implies $\pi \ge 0$. Thus in this case the controller is risk averse. Now, to measure the degree of risk aversion, let $x = EX$. Formally, by Taylor's expansion,
\[ \exp(\theta w) = \exp(\theta x) + \pi\theta\exp(\theta x) + o(\pi). \]
Again,
\[ E\exp(\theta X) = \exp(\theta x) + \tfrac{1}{2}\operatorname{var}(X)\,\theta^{2}\exp(\theta x) + E\big(o((X-x)^{2})\big). \]
Thus we have $\pi = \tfrac{1}{2}\operatorname{var}(X)\,\theta$ plus terms of smaller order. Hence the risk premium is proportional to $\theta$ up to first order. That is why $\theta$ is referred to as the absolute risk aversion parameter. Similar arguments can also be made for $\theta < 0$; $\theta = 0$ is the risk-neutral case.

There is a vast literature on the risk-neutral case; for example, see [1] for controlled diffusions, [11] for continuous time MDPs, [13] for discrete time MDPs, and the references therein. See also [25] for variance minimization and overtaking optimality of continuous-time MDPs. For earlier works on risk-sensitive control we refer to [15] and [16]. Since then there has been a lot of research on risk-sensitive control of discrete time Markov chains [6], [7], [14], [17], [21], and also a lot of work on risk-sensitive control of diffusions [3], [9], [22], [23], [27]. As is evident from the discussion above, risk-sensitive control has wide applications in economics and in particular in finance [10], [4], [5], [24].

Although risk-sensitive control of continuous time diffusions and discrete time Markov chains has been studied, the problem for continuous time MDPs does not seem to have been studied in the literature. In this paper we study risk-sensitive control of continuous time Markov chains. We take the state space $S$ to be countable. For notational simplicity we take $S = \{0, 1, 2, \ldots\}$. Let $U_i$, $i = 0, 1, \ldots$, be compact metric spaces; $U_i$ is the control set when the state is $i$. We denote the state process by $\{X_t\}$ and the control process by $\{U_t\}$.
Formally, the dynamics of the process is as follows:
\[
\begin{cases}
P(X_{t+h} = j \mid X_t = i,\ U_t = u) = \lambda_{ij}(u)\,h + o(h), & j \ne i,\\
P(X_{t+h} = i \mid X_t = i,\ U_t = u) = 1 - \big(\sum_{j \ne i}\lambda_{ij}(u)\big)h + o(h),
\end{cases}
\tag{1.1}
\]
where $\lambda_{ij} : U_i \to \mathbb{R}_+$ are given functions. That is, if the process is at $i$ at time $t$ and the action chosen at that moment is $u$, then after a little while $h$ the process will be at state $j$ with probability $\lambda_{ij}(u)h$ plus some error term, and the process will remain at $i$ with probability $1 - \big(\sum_{j\ne i}\lambda_{ij}(u)\big)h$ plus some error term. Thus the $\lambda_{ij}$'s are the instantaneous transition rates. Set
\[ \lambda_{ii}(u) = -\sum_{j \ne i}\lambda_{ij}(u). \tag{1.2} \]
The following assumptions will be in force throughout the paper:

(A1) The functions $\lambda_{ij}$ are continuous.

(A2) $\sup_i \sup_{u \in U_i}\{-\lambda_{ii}(u)\} \le M < \infty$.

(A3) The sum in (1.2) converges uniformly. Thus $\lambda_{ii}$ is continuous for each $i$.

We now describe a rigorous construction of the process $\{X_t\}$ via the martingale problem. Let $D = D([0,\infty), S)$ be the space of $S$-valued right-continuous functions with left limits, endowed with the Skorokhod topology. Let $\mathcal{S}$ be the Borel $\sigma$-algebra on $D$. Define $U = \cup_i U_i$. Let $u : [0,\infty)\times S \to U$ be such that $u(\cdot, i) \in U_i$ and is measurable for each $i$. Let $B(S)$ denote the space of bounded real valued functions on $S$. For $f \in B(S)$, $\|f\|$ denotes the supremum norm. For each $t \in [0,\infty)$ define the operator $\Lambda^u_t : B(S) \to B(S)$ by
\[ \Lambda^u_t f(i) = \sum_j \lambda_{ij}(u(t,i))\,f(j). \tag{1.3} \]
On the measurable space $(D, \mathcal{S})$, let $\{X_t,\ t \ge 0\}$ denote the canonical process, i.e., for $\omega \in D$, $X_t(\omega) = \omega(t)$. Let $\mu$ be any probability measure on $S$. The martingale problem corresponding to $(\Lambda^u, \mu)$ is the following: a measure $P^u_{s,\mu}$ on $(D, \mathcal{S})$ is said to be a solution of the martingale problem corresponding to $(\Lambda^u, \mu)$ if

i) $P^u_{s,\mu}(X_s \in A) = \mu(A)$ for any Borel subset $A$ of $S$;

ii) $f(X_t) - \int_s^t \Lambda^u_r f(X_r)\,dr$ is a $P^u_{s,\mu}$ martingale with respect to the filtration $\mathcal{F}_t = \sigma(X_r;\ r \le t)$, for each $f$ in $B(S)$.

Under (A2) it can be shown, following the arguments in Chapter 6 of [8], that the above martingale problem has a unique solution and $\{X_t\}$ is a Markov process with the generator given by (1.3). In fact we can relax the boundedness condition in (A2): if the $\lambda_{ii}$'s satisfy an appropriate growth condition, then the martingale problem is again well posed; see Chapter 6 of [8]. Also see [12] and the references therein for related works. From now on we will work in the canonical space $(D, \mathcal{S})$. If $s = 0$ and $\mu = \delta_i$ for some $i \in S$, then we will write $P^u_{s,\mu}$ as $P^u_i$. The corresponding expectation operator is denoted by $E^u_i$. In our paper the set of admissible controls is the set of Markov controls, i.e., controls of the form $U_t = u(t, X_{t-})$, for some $u : [0,\infty)\times S \to U$ such that $u(\cdot, i) \in U_i$ and is measurable for each $i$. With an abuse of terminology the map $u$ itself is referred to as a Markov control. Let $\mathcal{U}$ denote the set of all Markov controls. A Markov control is said to be stationary if the function $u$ has no explicit dependence on $t$, i.e., $u : S \to U$ such that $u(i) \in U_i$ for each $i$. The set of stationary Markov controls is denoted by $\mathcal{U}_s$.

Now we briefly describe the problems we consider in this paper. In stochastic dynamic optimization, based on the time horizon, there can be two kinds of problems, namely finite horizon and infinite horizon problems. In this paper we address both.
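Before turning to the problem formulations, the dynamics (1.1) can be made concrete by simulation: under action $u$ the holding time in state $i$ is exponential with rate $-\lambda_{ii}(u)$, and the next state is $j$ with probability $\lambda_{ij}(u)/(-\lambda_{ii}(u))$. The following is a minimal sketch; the two-state model and constant control at the bottom are hypothetical, chosen purely for illustration.

```python
import numpy as np

def simulate_ctmc(rates, policy, i0, T, rng):
    """Simulate a controlled CTMC on {0,...,n-1} up to time T.

    rates(i, u) returns the row (lambda_{i0}(u), ..., lambda_{i,n-1}(u)),
    with the diagonal convention (1.2): lambda_ii(u) = -sum_{j!=i} lambda_ij(u).
    policy(t, i) is a Markov control u(t, i). Returns jump times and states.
    """
    t, i = 0.0, i0
    times, states = [0.0], [i0]
    while True:
        u = policy(t, i)
        row = np.asarray(rates(i, u), dtype=float)
        out_rate = -row[i]                    # total jump rate -lambda_ii(u)
        if out_rate <= 0:                     # absorbing state
            break
        t += rng.exponential(1.0 / out_rate)  # exponential holding time
        if t >= T:
            break
        probs = np.maximum(row, 0.0)
        probs[i] = 0.0
        i = int(rng.choice(len(row), p=probs / probs.sum()))
        times.append(t)
        states.append(i)
    return np.array(times), np.array(states)

# Hypothetical example: the control u scales the escape rate from state 0.
rng = np.random.default_rng(0)
rates = lambda i, u: [-u, u] if i == 0 else [1.0, -1.0]
policy = lambda t, i: 2.0                     # a constant stationary control
times, states = simulate_ctmc(rates, policy, i0=0, T=10.0, rng=rng)
print(times[:5], states[:5])
```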
Finite Horizon Problem:
Define $K = \{(i,u) : i \in S,\ u \in U_i\}$. Let $c : [0,\infty)\times K \to [0,\infty)$ be a bounded function such that $c(\cdot, i, \cdot)$ is continuous for each $i$, and let $g : S \to [0,\infty)$ be a bounded function. Let $0 < T < \infty$ be the length of the time horizon. Then for any Markov control $u$ consider the cost functional
\[ J^u_T(i) = \frac{1}{\theta}\log E^u_i\Big[\exp\Big(\theta\Big[\int_0^T c(s, X_s, U_s)\,ds + g(X_T)\Big]\Big)\Big] \tag{1.4} \]
for some $\theta \in (0,1)$, where $U_t = u(t, X_{t-})$. In the literature $c$ is referred to as the running cost function and $g$ as the terminal cost function. The aim of the controller is to minimise $J^u_T$ over all Markov controls $u$. A control $\hat u$ is said to be optimal if
\[ J^{\hat u}_T(i) = \inf_{\mathcal U} J^u_T(i). \]

Infinite Horizon Discounted Cost Problem:
For the infinite horizon problems the running cost function has no explicit time dependence. For each Markov control $u$ define
\[ I_\alpha(\theta, i, u) = \frac{1}{\theta}\log\Big(E^u_i\Big(\exp\Big[\theta\int_0^\infty e^{-\alpha t}c(X_t, U_t)\,dt\Big]\Big)\Big), \tag{1.5} \]
where $\theta$ is as in the finite horizon problem and $\alpha > 0$ is the discount rate. The aim of the controller is to minimise $I_\alpha(\theta, i, u)$ over all Markov controls $u$. A control $\hat u$ is said to be optimal if it satisfies
\[ I_\alpha(\theta, i, \hat u) = \inf_{\mathcal U} I_\alpha(\theta, i, u). \]

Infinite Horizon Average Cost Problem:
For the average cost problem the set of admissible controls is the set of stationary Markov controls. For a stationary control $u$, define
\[ J^u(i) = \limsup_{T\to\infty}\frac{1}{T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t, u(X_t))\,dt\Big)\Big]. \tag{1.6} \]
The controller wants to minimise $J^u(i)$ over all stationary controls $u$. An optimal control is defined analogously.

The rest of the paper is organised as follows. In Section 2 we study the finite horizon problem. The analysis of this problem is fairly straightforward. Using the dynamic programming heuristic we derive the Hamilton-Jacobi-Bellman (HJB) equation for this criterion. Then, using a fixed point theorem and some standard arguments involving Dynkin's formula, we show that the value function is the unique solution of the HJB equation in an appropriate class of functions. This in turn yields the existence of an optimal Markov control. Section 3 deals with the infinite horizon discounted cost case. The analysis of this problem is, from a technical viewpoint, surprisingly more involved. As usual, using the dynamic programming heuristic we derive the HJB equation and establish the corresponding verification theorem. However, establishing the existence of a smooth solution of the HJB equation for this criterion turns out to be quite tricky. We work around this problem by an appropriate limiting procedure which yields a solution of the HJB equation in the sense of distributions. Then, under certain assumptions, we establish the desired regularity of the solution. In Section 4 we investigate the average cost problem. Again this problem turns out to be technically involved. The traditional vanishing discount approach does not seem to work. Instead we use the multiplicative Poisson equation to obtain the desired result. Using a limiting argument involving the multiplicative Poisson equation we establish the existence of an optimal stationary control. In Section 5 we give a policy improvement algorithm for the average cost case. Finally, in Section 6 we conclude the paper with some remarks.
2. Finite Horizon Case
In this section we study the finite horizon case. For this we first study the exponential cost criterion. For $t \in [0,T]$ and $u \in \mathcal U$, define
\[ \hat J^u_T(t,i) = E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s, X_s, U_s)\,ds + g(X_T)\Big]\Big)\Big]. \tag{2.1} \]
Define the value function $V_T$ by
\[ V_T(t,i) = \inf_{\mathcal U}\hat J^u_T(t,i), \]
where the infimum is over all Markov controls. Our aim is to characterise the value function and to obtain an optimal control. To this end we first describe a heuristic derivation of the Hamilton-Jacobi-Bellman (HJB) equation. Formally,
\begin{align*}
V_T(t,i) &= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds + \theta\int_{t+h}^T c(s,X_s,U_s)\,ds + \theta g(X_T)\Big]\Big\}\\
&= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds\Big]\,E^u_{t+h,X_{t+h}}\Big(\exp\Big[\theta\int_{t+h}^T c(s,X_s,U_s)\,ds + \theta g(X_T)\Big]\Big)\Big\}\\
&= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds\Big]\,V_T(t+h, X_{t+h})\Big\}.
\end{align*}
If the function $V_T(\cdot,i)$ is continuously differentiable, then standard dynamic programming arguments involving Dynkin's formula lead to the following HJB equation for the finite horizon problem:
\[
\begin{cases}
\dfrac{d\varphi}{dt}(t,i) + \inf_{u\in U_i}\Big[\theta c(t,i,u)\varphi(t,i) + \displaystyle\sum_j \lambda_{ij}(u)\varphi(t,j)\Big] = 0 & \text{on } [0,T)\times S,\\
\varphi(T,i) = e^{\theta g(i)}.
\end{cases}
\tag{2.2}
\]
The importance of this equation is highlighted by the following verification theorem:
Theorem 2.1.
Assume (A1)-(A3). Suppose there exists a smooth (continuously differentiable with respect to the first variable), bounded solution $\Psi$ to (2.2). Then $\Psi(t,i) = V_T(t,i)$ for all $(t,i) \in [0,T]\times S$.
Furthermore, an optimal Markov control for the cost criterion (2.1) exists and is given by $U^*_t = u^*(t, X_{t-})$, where $u^*$ satisfies
\[ \inf_{u\in U_i}\Big[\theta c(t,i,u)\Psi(t,i) + \sum_j \lambda_{ij}(u)\Psi(t,j)\Big] = \theta c(t,i,u^*(t,i))\Psi(t,i) + \sum_j \lambda_{ij}(u^*(t,i))\Psi(t,j). \tag{2.3} \]

Proof.
Let $u$ be any arbitrary Markov control. By the Feynman-Kac formula,
\begin{align*}
E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big] = \Psi(t,i) + E^u_{t,i}\int_t^T &\exp\Big(\theta\int_t^r c(s,X_s,U_s)\,ds\Big)\\
&\times\Big[\frac{d\Psi}{dr}(r,X_r) + \theta c(r,X_r,U_r)\Psi(r,X_r) + \sum_j \lambda_{X_r j}(U_r)\Psi(r,j)\Big]\,dr.
\end{align*}
Since $\Psi$ satisfies (2.2), we have
\[ \Psi(t,i) \le E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big]. \]
If we use the control $u^*$ as in (2.3), then we get equality in place of the inequality above. The existence of a $u^*$ satisfying (2.3) is ensured by a standard measurable selection theorem [2]. Hence the theorem follows. $\Box$

Next we prove that there exists a smooth, bounded solution to (2.2).
Theorem 2.2.
Assume (A1)-(A3). Then there exists a unique solution to (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$.

Proof. Let $\varphi(t,i) = e^{-\gamma t}\psi(t,i)$. Substituting in (2.2) we get
\[
\begin{cases}
\dfrac{d\psi}{dt} - \gamma\psi + \inf_{u\in U_i}\Big[\theta c(t,i,u)\psi(t,i) + \displaystyle\sum_j \lambda_{ij}(u)\psi(t,j)\Big] = 0 & \text{on } [0,T)\times S,\\
\psi(T,i) = e^{\gamma T}e^{\theta g(i)}.
\end{cases}
\tag{2.4}
\]
Consider the following integral equation:
\[ \psi(t,i) = e^{\gamma t}e^{\theta g(i)} + e^{\gamma t}\int_t^T e^{-\gamma s}\inf_{u\in U_i}\Big[\theta c(s,i,u)\psi(s,i) + \sum_j \lambda_{ij}(u)\psi(s,j)\Big]\,ds. \tag{2.5} \]
Define $\mathcal T : C_b([0,T]\times S) \to C_b([0,T]\times S)$ by
\[ \mathcal T\psi(t,i) = e^{\gamma t}e^{\theta g(i)} + e^{\gamma t}\int_t^T e^{-\gamma s}\inf_{u\in U_i}\Big[\theta c(s,i,u)\psi(s,i) + \sum_j \lambda_{ij}(u)\psi(s,j)\Big]\,ds. \]
Then
\begin{align*}
|\mathcal T\psi_1(t,i) - \mathcal T\psi_2(t,i)| &\le e^{\gamma t}\int_t^T e^{-\gamma s}\big\{\theta\|c\|\,\|\psi_1 - \psi_2\| + 2M\|\psi_1 - \psi_2\|\big\}\,ds\\
&= \frac{2M + \theta\|c\|}{\gamma}\,\|\psi_1 - \psi_2\|\,e^{\gamma t}\big(e^{-\gamma t} - e^{-\gamma T}\big)\\
&\le \frac{2M + \theta\|c\|}{\gamma}\,\|\psi_1 - \psi_2\|,
\end{align*}
where $M$ is as in (A2). Hence for $\gamma = 2M + \theta\|c\| + 1$, $\mathcal T$ is a contraction, and thus by Banach's fixed point theorem there exists a unique solution $\psi$ to (2.5) in $C_b([0,T]\times S)$. Using (A1)-(A3) and the boundedness and continuity of the cost function $c$, it follows that $\psi$ is in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. Then it follows that $\varphi(t,i) = e^{-\gamma t}\psi(t,i)$ is a solution of (2.2). The uniqueness follows from the previous theorem. $\Box$
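The contraction argument above is constructive: iterating $\mathcal T$ from any initial guess converges geometrically, at rate $(2M + \theta\|c\|)/\gamma < 1$, to the solution of (2.5). Below is a minimal numerical sketch of this iteration, assuming a hypothetical finite state space, a common finite action set, and a trapezoidal discretization of the time integral; all array shapes and parameter choices are illustrative, not part of the paper.

```python
import numpy as np

def solve_finite_horizon(lam, c, g, theta, T, n_t=2000, n_iter=80):
    """Picard iteration for the integral equation (2.5) on a time grid.

    lam[a, i, j] are the rates under action a (rows sum to zero, cf. (1.2)),
    c[k, i, a] is the running cost at grid time t_k, g[i] the terminal cost.
    Returns the grid and psi; phi = exp(-gamma*t) * psi then solves (2.2).
    """
    n_a, n_s, _ = lam.shape
    M = np.max(-np.einsum('aii->ai', lam))              # the bound in (A2)
    gamma = 2.0 * M + theta * np.max(c) + 1.0           # makes T a contraction
    t = np.linspace(0.0, T, n_t)
    dt = t[1] - t[0]
    psi = np.exp(gamma * t)[:, None] * np.exp(theta * g)[None, :]
    for _ in range(n_iter):
        # H(k, i) = inf_a [theta*c(t_k,i,a)*psi(t_k,i) + sum_j lam_ij(a)*psi(t_k,j)]
        H = np.min(theta * c * psi[:, :, None]
                   + np.einsum('aij,kj->kia', lam, psi), axis=2)
        F = np.exp(-gamma * t)[:, None] * H
        # tail(k, i) approximates int_{t_k}^T F(s, i) ds (trapezoid rule)
        seg = 0.5 * dt * (F[1:] + F[:-1])
        tail = np.vstack([np.flip(np.cumsum(np.flip(seg, 0), 0), 0),
                          np.zeros(n_s)])
        psi = np.exp(gamma * t)[:, None] * (np.exp(theta * g)[None, :] + tail)
    return t, psi
```

Each sweep is one application of $\mathcal T$, so after $n$ sweeps the error is of order $((2M+\theta\|c\|)/\gamma)^n$ plus the quadrature error of the time discretization.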
Thus combining the above two theorems we get the following theorem:

Theorem 2.3.
Under (A1)-(A3), the value function $V_T$ is the unique solution to (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. An optimal control is given by the Markov control $U^*_t = u^*(t, X_{t-})$, where $u^*$ satisfies
\[ \inf_{u\in U_i}\Big[\theta c(t,i,u)V_T(t,i) + \sum_j \lambda_{ij}(u)V_T(t,j)\Big] = \theta c(t,i,u^*(t,i))V_T(t,i) + \sum_j \lambda_{ij}(u^*(t,i))V_T(t,j). \tag{2.6} \]
Now, since the logarithm is an increasing function, the following theorem is evident.

Theorem 2.4.
Let $\varphi$ be the unique solution of (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. Define $\psi = \theta^{-1}\log\varphi$. Then
\[ \psi(t,i) = \inf_{\mathcal U}\frac{1}{\theta}\log E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big]. \]
Moreover, the Markov control given by (2.6) is again an optimal control in this case.

3. Discounted Cost Case
In this section we turn our attention to the infinite horizon discounted cost problem. Define
\[ V_\alpha(\theta,i) = \inf_{\mathcal U} I_\alpha(\theta,i,u), \tag{3.1} \]
where $I_\alpha(\theta,i,u)$ is as in (1.5). The function $V_\alpha$ is called the $\alpha$-discounted value function. Our aim is to characterise the value function and to obtain an optimal control. Instead of working with $V_\alpha$, we first start with
\[ W_\alpha(\theta,i) = \inf_{\mathcal U}\exp\big[\theta I_\alpha(\theta,i,u)\big]. \tag{3.2} \]
Formally, for any $T > 0$,
\begin{align*}
W_\alpha(\theta,i) &= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt + \theta\int_T^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big\}\\
&= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt\Big]\,E^u_{X_T}\Big(\exp\Big[\theta e^{-\alpha T}\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big)\Big\}\\
&= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt\Big]\,W_\alpha(\theta e^{-\alpha T}, X_T)\Big\}.
\end{align*}
If $W_\alpha(\cdot,i)$ is smooth then, using Dynkin's formula and some heuristic arguments, we obtain that $W_\alpha$ should satisfy
\[ \alpha\theta\,\frac{dW_\alpha}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)W_\alpha(\theta,i) + \sum_j \lambda_{ij}(u)W_\alpha(\theta,j)\Big], \quad \lim_{\theta\to 0} W_\alpha(\theta,i) = 1. \tag{3.3} \]
Equation (3.3) is known as the HJB equation for the cost criterion (3.2). Now, starting from (3.3), the following verification theorem can be obtained.

Theorem 3.1.
Assume that there exists a bounded, smooth (continuously differentiable in the first variable) function $w(\theta,i)$ such that
\[ \alpha\theta\,\frac{dw}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)w(\theta,i) + \sum_j \lambda_{ij}(u)w(\theta,j)\Big] \quad\text{on } (0,1]\times S \tag{3.4} \]
and $w(\theta,i) \to 1$ as $\theta \to 0$, uniformly in $i$. Then $w(\theta,i) = W_\alpha(\theta,i)$. Furthermore, an optimal control for the cost criterion (3.2) is given by
\[ U^*_t = u^*(\theta e^{-\alpha t}, X_{t-}), \tag{3.5} \]
where $u^*$ is given by
\[ \inf_{u\in U_i}\Big[\theta c(i,u)w(\theta,i) + \sum_j \lambda_{ij}(u)w(\theta,j)\Big] = \theta c(i,u^*(\theta,i))w(\theta,i) + \sum_j \lambda_{ij}(u^*(\theta,i))w(\theta,j). \tag{3.6} \]

Proof.
Define $\theta_t = \theta e^{-\alpha t}$ and
\[ \Psi_t = \exp\Big\{\int_0^t \theta_s\,c(X_s,U_s)\,ds\Big\} \]
for any arbitrary Markov control $U_t = u(t, X_{t-})$. Then by the Feynman-Kac formula we get
\[ E^u_i\big\{\Psi_T\,w(\theta_T, X_T)\big\} - w(\theta,i) = E^u_i\Big\{\int_0^T \Psi_t\Big[-\alpha\theta_t\frac{dw}{d\theta}(\theta_t,X_t) + \theta_t\,c(X_t,U_t)\,w(\theta_t,X_t) + \sum_j \lambda_{X_t j}(U_t)\,w(\theta_t,j)\Big]\,dt\Big\}. \]
Since $w$ satisfies (3.4), the term on the right hand side above is non-negative. Therefore we get
\[ w(\theta,i) \le E^u_i\big\{\Psi_T\,w(\theta_T, X_T)\big\}. \]
Now $\theta_T \to 0$ as $T \to \infty$, and hence $w(\theta_T, X_T) \to 1$. Thus we get
\[ w(\theta,i) \le E^u_i\Big\{\exp\Big[\theta\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big\}. \tag{3.7} \]
Similarly, if we take the Markov control $U^*$ given by (3.5) and (3.6), then we get equality in (3.7) in place of the inequality. Hence the theorem follows. $\Box$
Corollary 3.2.
For w as in Theorem 3.1, define v ( θ, i ) = θ − log w ( θ, i ) . Then v = V α ,where V α is as in (3.1) . Now we prove that the HJB equation (3.4) indeed has a smooth solution. To this endwe first prove the following result.
Proposition 3.3.
Let ǫ > be arbitrary but fixed. There exists a function W ǫ in C b ([ ǫ, × S ) T C (( ǫ, × S ) such that W ǫ satisfies αθ dW ǫ dθ ( θ, i ) = inf u ∈ U i (cid:20) θc ( i, u ) W ǫ ( θ, i ) + X j λ ij ( u ) W ǫ ( θ, j ) (cid:21) on ( ǫ, × SW ǫ ( ǫ, i ) = e ǫα || c || := h ǫ ( i ) . (3.8) Proof.
Let δ > T : C b ([ ǫ, ǫ + δ ] × S ) → C b ([ ǫ, ǫ + δ ] × S )by T f ( η, i ) = e ǫα || c || + 1 α Z ηǫ inf u ∈ U i (cid:2) c ( i, u ) f ( θ, i ) + 1 θ X j λ ij ( u ) f ( θ, j ) (cid:3) dθ . Then | T f ( η, i ) − T f ( η, i ) | ≤ α (cid:20) || c || δ || f − f || + 2 Mǫ δ || f − f || (cid:21) , where M is as in (A2) . Choose δ such that β := 1 α (cid:20) || c || δ + 2 Mǫ δ (cid:21) is strictly less than 1. Then T is a contraction. Hence by Banach’s fixed point theoremthere exists a W in C b ([ ǫ, ǫ + δ ] × S ) which is the unique fixed point of T . Now assumptions (A1)-(A3) and the continuity of c imply that W is in C b ([ ǫ, ǫ + δ ] × S ) T C (( ǫ, ǫ + δ ] × S ).Thus it follows that W satisfies (3.8) on [ ǫ, ǫ + δ ] × S . Proceeding in this way we will get afunction W ǫ ∈ C b ([ ǫ, × S ) T C (( ǫ, × S ) which satisfies (3.8). (cid:3) Next we take limit ǫ → W ǫ and show that the limit satisfies (3.4). In particular weprove the following theorem: Theorem 3.4.
Assume (A1)-(A3) and further assume that $S$ is finite. Then there exists a unique solution $W$ in the class $C_b((0,1]\times S)\cap C^1((0,1]\times S)$ to the equation
\[ \alpha\theta\,\frac{dW}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big] \quad\text{on } (0,1]\times S, \]
with $\lim_{\theta\to 0} W(\theta,i) = 1$.

Proof.
Using Dynkin's formula it can be shown that $W_\epsilon$ has the following stochastic representation:
\[ W_\epsilon(\theta,i) = \inf_{\mathcal U} E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big], \]
where $h_\epsilon$ is as in (3.8) and $T_\epsilon = \inf\{t \ge 0 : \theta_t = \epsilon\}$, i.e., $T_\epsilon = \frac{1}{\alpha}\log(\theta/\epsilon)$. From this representation of $W_\epsilon$ we can deduce that for every $\epsilon > 0$,
\[ 1 \le W_\epsilon(\theta,i) \le e^{\frac{\theta}{\alpha}\|c\|} \le e^{\frac{\|c\|}{\alpha}}. \]
Now we show that $\frac{dW_\epsilon}{d\theta}$ is also uniformly (in $\epsilon > 0$) bounded.
For any arbitrary Markov control $u$,
\[ \Big|E^u_i\Big[h_\epsilon\exp\Big((\theta+\delta)\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big| \le I_1 + I_2, \]
where $T^\delta_\epsilon$ is defined by $(\theta+\delta)e^{-\alpha T^\delta_\epsilon} = \epsilon$ and
\begin{align*}
I_1 &= \Big|E^u_i\Big[h_\epsilon\exp\Big((\theta+\delta)\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big|,\\
I_2 &= \Big|E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big|.
\end{align*}
Now
\[ I_1 \le e^{\frac{\|c\|}{\alpha}}\,E^u_i\Big[\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big|\exp\Big(\delta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big) - 1\Big|\Big] \le C_1\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C_1 > 0$ and $\delta > 0$ small enough. For $I_2$ we have
\[ I_2 \le e^{\frac{\|c\|}{\alpha}}\,E^u_i\Big[\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big|\exp\Big(\theta\int_{T_\epsilon}^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big) - 1\Big|\Big] \le e^{\frac{\|c\|}{\alpha}}\Big[e^{\frac{\|c\|}{\alpha}\,\theta\left(e^{-\alpha T_\epsilon} - e^{-\alpha T^\delta_\epsilon}\right)} - 1\Big]. \]
But
\[ \theta e^{-\alpha T_\epsilon} - \theta e^{-\alpha T^\delta_\epsilon} = \delta e^{-\alpha T^\delta_\epsilon} = \frac{\epsilon\delta}{\theta+\delta}. \]
Hence we have
\[ I_2 \le e^{\frac{\|c\|}{\alpha}}\Big[e^{\frac{\|c\|}{\alpha}\,\frac{\epsilon\delta}{\theta+\delta}} - 1\Big] \le C_2\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C_2 > 0$ and $\delta > 0$ small enough. Hence for $\delta > 0$ small enough,
\[ \big|W_\epsilon(\theta+\delta,i) - W_\epsilon(\theta,i)\big| \le C\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C > 0$. Similarly, for $\delta < 0$ small enough, we can get an estimate of the type
\[ \big|W_\epsilon(\theta+\delta,i) - W_\epsilon(\theta,i)\big| \le C\,e^{\frac{\|c\|}{\alpha}}\,|\delta|\,\frac{\|c\|}{\alpha}. \]
Thus we get that $\frac{dW_\epsilon}{d\theta}$ is uniformly bounded in $\epsilon > 0$. Define
\[ \widetilde W_\epsilon(\theta,i) = \begin{cases} W_\epsilon(\theta,i) & \text{for } \theta > \epsilon,\\ h_\epsilon(i) & \text{for } \theta \le \epsilon. \end{cases} \]
Then $\widetilde W_\epsilon$ satisfies the same bounds. Now since $\widetilde W_\epsilon$ is uniformly bounded and $\frac{d\widetilde W_\epsilon}{d\theta}$ is also uniformly bounded, by the Ascoli-Arzelà theorem there exist a function $W$ in $C_b((0,1]\times S)$ and a sequence $\epsilon_n \to 0$ such that $\widetilde W_{\epsilon_n} \to W$ uniformly over compact subsets of $(0,1]\times S$. Also, by the definition of $\widetilde W_\epsilon$, $W(\theta,i) \to 1$ as $\theta \to 0$.
Now taking $\varphi \in C^\infty_c(0,1)$, we get
\begin{align*}
-\int_0^1 \alpha\,\widetilde W_{\epsilon_n}\,\frac{d(\theta\varphi)}{d\theta}\,d\theta &= \int_0^1 \alpha\theta\,\frac{d\widetilde W_{\epsilon_n}}{d\theta}\,\varphi(\theta)\,d\theta\\
&= \int_0^1 \inf_{u\in U_i}\Big[\theta c(i,u)\widetilde W_{\epsilon_n}(\theta,i) + \sum_j \lambda_{ij}(u)\widetilde W_{\epsilon_n}(\theta,j)\Big]\varphi(\theta)\,d\theta\\
&\quad - \int_0^{\epsilon_n} \inf_{u\in U_i}\Big[\theta c(i,u)\widetilde W_{\epsilon_n}(\theta,i) + \sum_j \lambda_{ij}(u)\widetilde W_{\epsilon_n}(\theta,j)\Big]\varphi(\theta)\,d\theta.
\end{align*}
Now letting $n \to \infty$ we get
\[ -\int_0^1 \alpha W\,\frac{d(\theta\varphi)}{d\theta}\,d\theta = \int_0^1 \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big]\varphi(\theta)\,d\theta. \]
Thus
\[ \alpha\theta\,\frac{dW}{d\theta} = \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big] \]
in the sense of distributions. But by our assumptions the right hand side is a continuous function. Therefore $\frac{dW}{d\theta}$ is in $C((0,1]\times S)$. Thus $W$ is a smooth solution to the HJB equation (3.4). The uniqueness follows from Theorem 3.1. $\Box$
Theorem 3.5.
Assume (A1)-(A3) and that $S$ is finite. Then the value function $V_\alpha$ as in (3.1) is the unique solution in $C_b((0,1]\times S)\cap C^1((0,1]\times S)$ to
\[ \alpha\theta\Big[v + \theta\frac{dv}{d\theta}\Big]e^{\theta v} = \inf_{u\in U_i}\Big[\theta c(i,u)e^{\theta v} + \sum_j \lambda_{ij}(u)e^{\theta v(\theta,j)}\Big] \quad\text{on } (0,1]\times S, \]
with
\[ \lim_{\theta\to 0} v(\theta,i) = \inf_{\mathcal U} E^u_i\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt. \]
An optimal control is given by the Markov control $U_t = u^*(\theta e^{-\alpha t}, X_{t-})$, where $u^*$ is given by
\[ \inf_{u\in U_i}\Big[\theta c(i,u)e^{\theta V_\alpha} + \sum_j \lambda_{ij}(u)e^{\theta V_\alpha(\theta,j)}\Big] = \theta c(i,u^*(\theta,i))e^{\theta V_\alpha} + \sum_j \lambda_{ij}(u^*(\theta,i))e^{\theta V_\alpha(\theta,j)}. \]
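For finite $S$, the equation in Theorem 3.4 is a system of ODEs in $\theta$, so $W$ (and hence $V_\alpha$) can be approximated by integrating from a small $\epsilon > 0$ with the boundary data $W_\epsilon(\epsilon,i) = e^{\epsilon\|c\|/\alpha}$ of Proposition 3.3, mirroring the limiting construction above. Below is a minimal explicit-Euler sketch; the rates, costs, and step counts are hypothetical, for illustration only.

```python
import numpy as np

def solve_discounted_hjb(lam, c, alpha, theta_max=1.0, eps=1e-4, n=20000):
    """Integrate alpha*theta*dW/dtheta = min_a[theta*c*W + (Lam W)] on [eps, theta_max].

    lam[a, i, j] are rates under action a, c[i, a] the running cost.
    Initial data W(eps, i) = exp(eps*||c||/alpha), as in Proposition 3.3.
    Returns the theta grid, W on the grid, and v = log(W)/theta (i.e. V_alpha).
    """
    n_a, n_s, _ = lam.shape
    thetas = np.linspace(eps, theta_max, n)
    W = np.full(n_s, np.exp(eps * np.max(np.abs(c)) / alpha))
    Ws = [W.copy()]
    for k in range(n - 1):
        th, dth = thetas[k], thetas[k + 1] - thetas[k]
        # right hand side of the HJB equation, minimised over actions
        H = np.min(th * c * W[:, None] + np.einsum('aij,j->ia', lam, W), axis=1)
        W = W + dth * H / (alpha * th)        # explicit Euler step in theta
        Ws.append(W.copy())
    Ws = np.array(Ws)
    return thetas, Ws, np.log(Ws) / thetas[:, None]

# Hypothetical two-state, two-action model.
lam = np.array([[[-1.0, 1.0], [2.0, -2.0]],
                [[-3.0, 3.0], [0.5, -0.5]]])   # lam[a, i, j]
c = np.array([[1.0, 2.0], [0.5, 1.5]])          # c[i, a]
thetas, Ws, v = solve_discounted_hjb(lam, c, alpha=1.0)
print(v[-1])                                     # approximates V_alpha(1, .)
```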
The finiteness of the state space in Theorem 3.4 is forced upon us by the uniformity of the boundary condition in Theorem 3.1. Note that the limiting procedure that we have employed only yields that $\lim_{\theta\to 0} W(\theta,i) = 1$ for each $i$; hence the finiteness assumption on $S$. Note that a similar situation arises in the risk-sensitive control of diffusion processes [22]. In [22] the authors treat periodic diffusions, for which the state space is a torus, which is compact.

4. Infinite Horizon Average Cost
In this section we study the infinite horizon average cost case. In order to study the average cost case we make some further assumptions on our model.
(A4) For every stationary control $u$, the corresponding Markov chain is irreducible.

(A5) There exist a Lyapunov function $V : S \to \mathbb{R}_+$, an unbounded function $W : S \to [1,\infty)$ and constants $\delta > 0$, $b < \infty$ such that
\[ e^{-V(i)}\sum_j \lambda_{ij}(u)\,e^{V(j)} \le -\delta W(i) + b\,\mathbf{1}_{\{0\}}(i) \quad\text{for all } i, u. \tag{4.1} \]

An important consequence of (A5) is the following lemma:

Lemma 4.1.
Let $0 < \eta < \delta$ and
\[ \tau = \inf\{t > 0 : X_t = 0\}. \tag{4.2} \]
Then $\sup_u E^u_i\,e^{\eta\tau} \le e^{V(i)}$.

Proof.
Let $\tau_n = \inf\{t \ge 0 : X_t \ge n\}$. If $X_0 \ge n$, then $\tau_n = 0$. Assumption (A5) implies that there exists a $\tilde b$ such that, with $\widetilde V = e^V$,
\[ \sum_j \lambda_{ij}(u)\,\widetilde V(j) \le -\delta\,\widetilde V(i) + \tilde b\,\mathbf{1}_{\{0\}}(i). \]
By Dynkin's formula we get
\begin{align*}
E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\,\widetilde V(X_{\tau\wedge\tau_n\wedge n})\Big] &= \widetilde V(i) + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\Lambda^u + \eta)\widetilde V(X_s)\,ds\\
&\le \widetilde V(i) + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\eta - \delta)\widetilde V(X_s)\,ds.
\end{align*}
Thus we have
\begin{align*}
\widetilde V(i) &\ge E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\,\widetilde V(X_{\tau\wedge\tau_n\wedge n})\Big] + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\delta - \eta)\widetilde V(X_s)\,ds\\
&\ge E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\Big].
\end{align*}
Now letting $n \to \infty$ we get the desired result. $\Box$
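Condition (4.1) can be checked numerically for a concrete model. Here is a small sketch, assuming a hypothetical controlled birth-death chain (birth rate $u$, death rate $\mu i$, truncated at $N$ states) with $V(i) = \kappa i$ and $W(i) = 1 + i$; all constants are illustrative, and the growing death rates would in practice call for the relaxed growth condition on $\lambda_{ii}$ mentioned in Section 1 rather than (A2).

```python
import numpy as np

# Hypothetical controlled birth-death chain on {0,...,N-1}:
# birth rate u (the control), death rate mu*i; V(i) = kappa*i, W(i) = 1+i.
N, mu, kappa = 200, 3.0, 0.5
W = lambda i: 1.0 + i

def drift(i, u):
    """Left hand side of (4.1): e^{-V(i)} * sum_j lam_ij(u) * e^{V(j)}."""
    out = 0.0
    if i + 1 < N:
        out += u * (np.exp(kappa) - 1.0)        # birth rate times e^{V(i+1)-V(i)} - 1
    if i >= 1:
        out += mu * i * (np.exp(-kappa) - 1.0)  # death rate times e^{V(i-1)-V(i)} - 1
    return out

for u in (0.5, 1.0):
    # largest delta with lhs <= -delta*W(i) off state 0, then the b needed at 0
    delta = min(-drift(i, u) / W(i) for i in range(1, N))
    b = drift(0, u) + delta * W(0)
    print(f"u={u}: (4.1) holds with delta={delta:.3f}, b={b:.3f}")
```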
Finally we make the following assumption:
(A6) For $\tau$ as defined in (4.2), $\sup_{i,u} E^u_i\,\tau < \infty$.

Remark 4.2.
If the state space is finite, then it can be easily seen that (A5) implies (A6).
Theorem 4.3.
Under (A1)-(A6), an optimal control for the risk-sensitive average cost criterion exists for $\theta$ and $c$ satisfying $\theta\|c\| < \delta$, where $\delta$ is as in (4.1).

Proof. Let
\[ \theta\rho_u = \lim_{T\to\infty}\frac{1}{T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t, u(X_t))\,dt\Big)\Big]. \tag{4.3} \]
The existence of the above limit follows from the multiplicative ergodic theorems proved in [18] and [19]. Moreover, it also follows from the results in [18] and [19] that the limit in (4.3) is the principal eigenvalue of the operator $\Lambda^u + \theta c$ and has a positive eigenfunction belonging to the class $L^\infty_{e^V}$, i.e., if we denote an eigenfunction by $h_u$, then $\sup_i \frac{|h_u(i)|}{e^{V(i)}} < \infty$. Thus the following equation holds:
\[ \sum_j \lambda_{ij}(u(i))\,h_u(j) + \theta c(i,u(i))\,h_u(i) = \theta\rho_u\,h_u(i). \tag{4.4} \]
Equation (4.4) is referred to as the Poisson equation. Now it is clear that if $h_u$ satisfies (4.4), then so does any scalar multiple of $h_u$. Therefore, without any loss of generality, we may assume that $h_u(0) = 1$. With this restriction, using Dynkin's formula and the fact that $h_u$ satisfies (4.4), we get the following stochastic representation for $h_u$:
\[ h_u(i) = E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s, u(X_s)) - \rho_u\big)\,ds\Big)\Big]. \tag{4.5} \]
Now using the stochastic representation of $h_u$ we derive bounds on $h_u$. First we derive an upper bound. We have
\[ h_u(i) = E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big)\Big] \le E^u_i\,e^{\theta\|c\|\tau} \le e^{V(i)} \]
by Lemma 4.1, provided $\theta\|c\| < \delta$. This shows that the bound on $h_u$ is uniform in $u$. Next we obtain a lower bound. We have
\begin{align*}
h_u(i) &= E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big)\Big]\\
&\ge \exp\Big\{E^u_i\,\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big\}\\
&\ge \exp\big(-\theta\rho_u\,E^u_i\tau\big) \ge \exp\big(-\theta\|c\|\,E^u_i\tau\big) > \epsilon
\end{align*}
for some $\epsilon > 0$.
In the above sequence of inequalities, the second one follows from Jensen's inequality and the last one follows from (A6). Let
\[ \rho^* = \inf_u \rho_u. \tag{4.6} \]
Next we show that there exists a control $u^*$ which attains the infimum in (4.6). From (4.6) it follows that there exists a sequence $\{u_n\}$ such that $\rho_{u_n} \to \rho^*$. Since each $U_i$ is compact, there exist a subsequence, again denoted by $\{u_n\}$, and a $u^*$ such that $u_n \to u^*$ pointwise. Again, since $h_{u_n}$ is pointwise bounded, there exists a subsequence, which we again call $\{h_{u_n}\}$, such that $h_{u_n}(i) \to h^*(i)$ for each $i$, for some $h^*$ with $\inf_i h^*(i) \ge \epsilon$. Therefore, by using Fatou's lemma, we have
\begin{align*}
\sum_{j\ne i}\lambda_{ij}(u^*(i))\,h^*(j) &\le \liminf_{n\to\infty}\sum_{j\ne i}\lambda_{ij}(u_n(i))\,h_{u_n}(j)\\
&= \liminf_{n\to\infty}\big[-\lambda_{ii}(u_n(i))h_{u_n}(i) - \theta c(i,u_n(i))h_{u_n}(i) + \theta\rho_{u_n}h_{u_n}(i)\big]\\
&= -\lambda_{ii}(u^*(i))h^*(i) - \theta c(i,u^*(i))h^*(i) + \theta\rho^* h^*(i).
\end{align*}
Thus we get
\[ \sum_j \lambda_{ij}(u^*(i))\,h^*(j) + \theta c(i,u^*(i))\,h^*(i) \le \theta\rho^*\,h^*(i). \]
Now we claim that $\rho^* = \rho_{u^*}$.
Indeed, with $\tau_n$ as in the proof of Lemma 4.1, we get from Dynkin's formula
\begin{align*}
E^{u^*}_i\Big[\exp\Big(\theta\int_0^{T\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{T\wedge\tau_n})\Big] &= h^*(i) + E^{u^*}_i\Big[\int_0^{T\wedge\tau_n}\exp\Big(\theta\int_0^t c(X_s,u^*(X_s))\,ds\Big)(\Lambda^{u^*} + \theta c)h^*(X_t)\,dt\Big]\\
&\le h^*(i) + \theta\rho^*\,E^{u^*}_i\Big[\int_0^{T\wedge\tau_n}\exp\Big(\theta\int_0^t c(X_s,u^*(X_s))\,ds\Big)h^*(X_t)\,dt\Big]\\
&\le h^*(i) + \theta\rho^*\int_0^T E^{u^*}_i\Big[\exp\Big(\theta\int_0^{t\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{t\wedge\tau_n})\Big]\,dt.
\end{align*}
Hence by Gronwall's inequality we have
\[ E^{u^*}_i\Big[\exp\Big(\theta\int_0^{T\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{T\wedge\tau_n})\Big] \le h^*(i)\,e^{\theta\rho^* T}. \]
Therefore, letting $n \to \infty$, we have
\[ h^*(i)\,e^{\theta\rho^* T} \ge \epsilon\,E^{u^*}_i\Big[\exp\Big(\theta\int_0^T c(X_s,u^*(X_s))\,ds\Big)\Big], \]
which implies
\[ \theta\rho^* \ge \lim_{T\to\infty}\frac{1}{T}\log E^{u^*}_i\Big[\exp\Big(\theta\int_0^T c(X_s,u^*(X_s))\,ds\Big)\Big] = \theta\rho_{u^*}. \]
Hence $\rho^* = \rho_{u^*}$. $\Box$
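For a finite state space, the objects in this proof are directly computable: $\theta\rho_u$ is the principal eigenvalue of $\Lambda^u + \theta\,\mathrm{diag}(c(\cdot,u(\cdot)))$, a matrix with nonnegative off-diagonal entries, so Perron-Frobenius theory yields a real principal eigenvalue with a positive eigenfunction $h_u$. A minimal sketch, with a hypothetical irreducible three-state model:

```python
import numpy as np

def multiplicative_poisson(lam_u, c_u, theta):
    """Solve the Poisson equation (4.4) for a fixed stationary control.

    lam_u is the rate matrix under the control, c_u the cost vector.
    theta*rho_u is the principal eigenvalue of lam_u + theta*diag(c_u),
    and h_u is the positive eigenfunction normalised by h_u(0) = 1.
    """
    vals, vecs = np.linalg.eig(lam_u + theta * np.diag(c_u))
    k = np.argmax(vals.real)             # principal (largest real) eigenvalue
    h = vecs[:, k].real
    h = h / h[0]                         # normalisation h_u(0) = 1
    assert np.all(h > 0), "principal eigenfunction should be positive"
    return vals[k].real / theta, h

# Hypothetical three-state example under one stationary control.
lam_u = np.array([[-2.0, 1.5, 0.5],
                  [1.0, -1.0, 0.0],
                  [0.2, 0.8, -1.0]])
c_u = np.array([1.0, 0.3, 2.0])
rho_u, h_u = multiplicative_poisson(lam_u, c_u, theta=0.2)
print(rho_u, h_u)
```

The returned pair satisfies (4.4) row by row up to floating-point error, which can be checked directly.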
5. Policy Improvement Algorithm

In the previous section we proved the existence of an optimal control. But our theorem is purely existential and does not give an algorithm to find an optimal control. In this section we focus on a computational approach for finding an optimal stationary control. Since we are concerned with algorithms in this section, we assume that both the state and action spaces are finite. Now we describe the policy improvement algorithm.
Algorithm
Step 1: Start with any initial stationary control $u_0$. For this $u_0$, set
\[ \rho_{u_0} = \lim_{T\to\infty}\frac{1}{\theta T}\log E^{u_0}_i\Big[\exp\Big(\theta\int_0^T c(X_t, u_0(X_t))\,dt\Big)\Big] \]
and
\[ h_{u_0}(i) = E^{u_0}_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_t, u_0(X_t)) - \rho_{u_0}\big)\,dt\Big)\Big]. \]
We know from the previous section that $h_{u_0}$ satisfies the Poisson equation
\[ \sum_j \lambda_{ij}(u_0(i))\,h_{u_0}(j) + \theta c(i,u_0(i))\,h_{u_0}(i) = \theta\rho_{u_0}\,h_{u_0}(i) \]
with the constraint $h_{u_0}(0) = 1$.
Step 2: Define $u_1$ to be a stationary control which attains the minimum in
\[ \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u_0}(i) + \sum_j \lambda_{ij}(u)\,h_{u_0}(j)\Big]. \]
With $\rho_{u_1}$ and $h_{u_1}$ defined as above, continue the procedure.
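The two steps admit a compact implementation when the state and action spaces are finite: Step 1 is the principal-eigenvalue computation of the previous section, and Step 2 is a pointwise minimisation. The sketch below stops when the minimising control repeats, in which case the stopping condition (5.2) in the proof of Theorem 5.1 below holds trivially; all model data are hypothetical.

```python
import numpy as np

def poisson_data(lam_u, c_u, theta):
    """Step 1: theta*rho is the principal eigenvalue of lam_u + theta*diag(c_u)."""
    vals, vecs = np.linalg.eig(lam_u + theta * np.diag(c_u))
    k = np.argmax(vals.real)
    h = vecs[:, k].real
    return vals[k].real / theta, h / h[0]    # normalised so that h(0) = 1

def policy_improvement(lam, c, theta, u0, max_iter=100):
    """Policy improvement for the risk-sensitive average cost criterion.

    lam[a, i, j] are transition rates under action a, c[i, a] the running cost.
    """
    n_a, n_s, _ = lam.shape
    u = np.asarray(u0)
    for _ in range(max_iter):
        lam_u = lam[u, np.arange(n_s), :]    # rate matrix under the control u
        c_u = c[np.arange(n_s), u]
        rho, h = poisson_data(lam_u, c_u, theta)
        # Step 2: minimise theta*c(i,a)*h(i) + sum_j lam_ij(a)*h(j) over a
        Q = theta * c * h[:, None] + np.einsum('aij,j->ia', lam, h)
        u_next = np.argmin(Q, axis=1)
        if np.array_equal(u_next, u):        # control repeats: (5.2) holds
            return u, rho, h
        u = u_next
    return u, rho, h

# Hypothetical model: 3 states, 2 actions, random rates and costs.
rng = np.random.default_rng(1)
lam = rng.uniform(0.1, 1.0, size=(2, 3, 3))
for a in range(2):
    np.fill_diagonal(lam[a], 0.0)
    lam[a] -= np.diag(lam[a].sum(axis=1))    # diagonal convention (1.2)
c = rng.uniform(0.0, 1.0, size=(3, 2))
u_opt, rho_opt, h_opt = policy_improvement(lam, c, theta=0.3, u0=[0, 0, 0])
print(u_opt, rho_opt)
```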
Theorem 5.1.
The above algorithm leads to an optimal control in a finite number of steps.

Proof.
In order to prove that this algorithm produces an optimal control in a finite number of steps, we first claim that
\[ \rho_{u_{n+1}} \le \rho_{u_n}. \tag{5.1} \]
Indeed, from the definition of $u_{n+1}$ we have
\begin{align*}
\sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i) &\le \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i)\\
&= \theta\rho_{u_n}\,h_{u_n}(i).
\end{align*}
Now, using arguments involving Dynkin's formula as in the previous section, it can be proved that $\rho_{u_{n+1}} \le \rho_{u_n}$.

Our second claim is that if for some $n$ and for all $i$
\[ \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i) = \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i), \tag{5.2} \]
then $u_n$ is optimal. To prove this, observe that if $u_n$ is as in (5.2), then
\begin{align*}
\theta\rho_{u_n}\,h_{u_n}(i) &= \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i)\\
&= \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i)\\
&= \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u_n}(i) + \sum_j \lambda_{ij}(u)\,h_{u_n}(j)\Big]. \tag{5.3}
\end{align*}
Now for any stationary control $u$ we have, by Dynkin's formula,
\begin{align*}
E^u_i\Big[\exp\Big(\theta\int_0^T\big(c(X_t,u(X_t)) - \rho_{u_n}\big)\,dt\Big)h_{u_n}(X_T)\Big] &= h_{u_n}(i) + E^u_i\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u(X_s)) - \rho_{u_n}\big)\,ds\Big)\big[\Lambda^u + \theta c - \theta\rho_{u_n}\big]h_{u_n}(X_t)\,dt\\
&\ge h_{u_n}(i).
\end{align*}
The last inequality follows from (5.3). Thus we have
\[ h_{u_n}(i) \le \max_i h_{u_n}(i)\;E^u_i\Big[\exp\Big(\theta\int_0^T\big(c(X_t,u(X_t)) - \rho_{u_n}\big)\,dt\Big)\Big]. \]
This implies that
\[ \rho_{u_n} \le \lim_{T\to\infty}\frac{1}{\theta T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t,u(X_t))\,dt\Big)\Big] = \rho_u. \]
Hence the claim.

Our final claim is that if $u_n$ is not an optimal control, then the inequality in (5.1) is in fact strict. Since $u_n$ is not optimal, we have
\[ \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i) - \theta\rho_{u_n}\,h_{u_n}(i) = -g(i), \]
where $g$ is a non-negative function and there exists at least one $i$ such that $g(i) > 0$. For $T > 0$, by Dynkin's formula,
\begin{align*}
E^{u_{n+1}}_i\Big[\exp\Big(\theta&\int_0^T\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)h_{u_n}(X_T)\Big]\\
&= h_{u_n}(i) + E^{u_{n+1}}_i\Big[\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)\big(\theta c_{n+1} + \Lambda^{u_{n+1}} - \theta\rho_{u_n}\big)h_{u_n}(X_t)\,dt\Big]\\
&= h_{u_n}(i) - E^{u_{n+1}}_i\Big[\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)g(X_t)\,dt\Big]\\
&= h_{u_n}(i) - \int_0^T e^{-\theta(\rho_{u_n} - \rho_{u_{n+1}})t}\,E^{u_{n+1}}_i\Big[\exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_{n+1}}\big)\,ds\Big)g(X_t)\Big]\,dt\\
&= h_{u_n}(i) - h_{u_{n+1}}(i)\,T\,\frac{1}{T}\,\widetilde E^{u_{n+1}}_i\int_0^T \frac{g(X_t)}{h_{u_{n+1}}(X_t)}\,dt \tag{5.4}
\end{align*}
if $\rho_{u_{n+1}} = \rho_{u_n}$. In (5.4) the expectation operator $\widetilde E^{u_{n+1}}_i$ is given by
\[ \widetilde E^{u_{n+1}}_i f(X_t) = E^{u_{n+1}}_i\Big[\exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_{n+1}}\big)\,ds\Big)\frac{h_{u_{n+1}}(X_t)}{h_{u_{n+1}}(i)}\,f(X_t)\Big] \tag{5.5} \]
for any real valued bounded function $f$. It is easy to see that (5.5) uniquely determines a transition probability kernel $\widetilde P^{u_{n+1}}_i$, and under $\widetilde P^{u_{n+1}}_i$ the corresponding Markov chain is still irreducible. Since the state space is finite, the Markov chain under $\widetilde P^{u_{n+1}}_i$ is positive recurrent. Thus it has a unique invariant measure, say $\widetilde\pi$. Now observe that the right hand side of (5.4) is negative for $T$ sufficiently large, because
\[ \frac{1}{T}\,\widetilde E^{u_{n+1}}_i\int_0^T \frac{g(X_t)}{h_{u_{n+1}}(X_t)}\,dt \to \sum_i \widetilde\pi(i)\,\frac{g(i)}{h_{u_{n+1}}(i)} > 0. \]
But the left hand side is always non-negative. Thus we get a contradiction, and hence $\rho_{u_{n+1}} < \rho_{u_n}$.
From the above claims it follows that the algorithm arrives at an optimal control within a finite number of steps, because the number of stationary controls is finite. $\Box$
Some comments are now in order.
Remark 5.2.
Suppose the state and action spaces are finite. Let $u^*$ be an optimal control, let $\rho^*$ be the optimal average cost, and let
\[ h_{u^*}(i) = E^{u^*}_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_t,u^*(X_t)) - \rho^*\big)\,dt\Big)\Big]. \]
Then it follows from the arguments used in the proof of Theorem 5.1 that $(\rho^*, h_{u^*})$ satisfies the equation
\[ \theta\rho^*\,h_{u^*}(i) = \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u^*}(i) + \sum_j \lambda_{ij}(u)\,h_{u^*}(j)\Big]. \tag{5.6} \]
Equation (5.6) is the HJB equation for the average cost criterion. If $(\lambda, h)$ is a solution of (5.6), where $h$ is a positive function, then using Dynkin's formula it can be shown that $\lambda$ is the optimal cost and the minimiser in (5.6) is an optimal control.
If the state space is countably infinite and equation (5.6) has a solution $(\lambda, h)$ such that $h$ is a bounded, positive function which is uniformly bounded away from $0$, then again it can be shown that $\lambda$ is the optimal cost and the minimiser in (5.6) is an optimal control. However, we have not been able to show that (5.6) has such a solution. If one assumes that (5.6) has such a solution, then one can develop value and policy iteration algorithms along the lines of [6]. In [6] the authors deal with discrete time Markov chains; there they have developed value and policy iteration algorithms under the assumption that the analogous dynamic programming equation has a solution.

6. Conclusion
In this paper we have studied the risk-sensitive optimal control problem for continuous time Markov chains. We have analysed the finite horizon case under fairly general conditions. For the infinite horizon discounted cost case we have assumed that the state space is finite, so it will be interesting to investigate the problem for the case of a countably infinite state space. The average cost case has been studied under an additional Lyapunov type stability condition. We have established the existence of an optimal control. We have also developed a policy iteration algorithm for the case of finite state and action spaces. For countable state spaces, an algorithmic approach to determining an optimal control needs further investigation.
Acknowledgement:
The authors wish to thank V. S. Borkar for helpful discussions.
References

[1] A. Arapostathis, V. S. Borkar and M. K. Ghosh, Ergodic Control of Diffusion Processes, Encyclopedia of Mathematics and its Applications 143, Cambridge University Press, Cambridge, 2012.
[2] V. E. Benes, Existence of optimal strategies based on specified information for a class of stochastic decision problems, SIAM J. Control 8 (1970), 179-188.
[3] A. Biswas, V. S. Borkar and K. S. Kumar, Risk-sensitive control with near monotone cost, Appl. Math. Optim. 62 (2010), 145-163.
[4] T. R. Bielecki and S. R. Pliska, Risk-sensitive dynamic asset management, Appl. Math. Optim. 39 (1999), 337-360.
[5] T. R. Bielecki, S. R. Pliska and S. J. Sheu, Risk sensitive portfolio management with Cox-Ingersoll-Ross interest rates: the HJB equation, SIAM J. Control Optim. 44 (2005), 1811-1843.
[6] V. S. Borkar and S. P. Meyn, Risk-sensitive optimal control for Markov decision processes with monotone cost, Math. Oper. Res. 27 (2002), 192-209.
[7] K. J. Chung and M. J. Sobel, Discounted MDPs: distribution functions and exponential utility maximization, SIAM J. Control Optim. 25 (1987), 49-62.
[8] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.
[9] W. H. Fleming and W. M. McEneaney, Risk sensitive control on an infinite horizon, SIAM J. Control Optim. 33 (1995), 1881-1915.
[10] M. K. Ghosh, A. Goswami and S. K. Kumar, Portfolio optimization in a semi-Markov modulated market, Appl. Math. Optim. 60 (2009), 275-296.
[11] X. Guo and O. Hernández-Lerma, Continuous-Time Markov Decision Processes. Theory and Applications, Springer-Verlag, 2009.
[12] X. Guo, O. Hernández-Lerma and T. Prieto-Rumeau, A survey of recent results on continuous-time Markov decision processes, TOP 14 (2006), 177-261.
[13] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
[14] D. Hernández-Hernández and S. I. Marcus, Risk sensitive control of Markov processes in countable state space, Systems Control Lett. 29 (1996), 147-155.
[15] R. A. Howard and J. E. Matheson, Risk-sensitive Markov decision processes, Management Sci. 18 (1972), 356-369.
[16] D. H. Jacobson, Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games, IEEE Trans. Automat. Control 18 (1973), 124-131.
[17] A. Jaśkiewicz, Average optimality for risk-sensitive control with general state space, Ann. Appl. Probab. 17 (2007), 654-675.
[18] I. Kontoyiannis and S. P. Meyn, Spectral theory and limit theorems for geometrically ergodic Markov processes, Ann. Appl. Probab. 13 (2003), 304-362.
[19] I. Kontoyiannis and S. P. Meyn, Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes, Electron. J. Probab. 10 (2005), 61-123.
[20] H. Markowitz, Portfolio selection, J. of Finance 7 (1952), 77-91.
[21] G. B. Di Masi and L. Stettner, Risk-sensitive control of discrete time Markov processes with infinite horizon, SIAM J. Control Optim. 38 (1999), 61-78.
[22] J. L. Menaldi and M. Robin, Remarks on risk sensitive control problems, Appl. Math. Optim. 52 (2005), 297-310.
[23] H. Nagai, Bellman equations of risk-sensitive control, SIAM J. Control Optim. 34 (1996), 74-101.
[24] G. E. Monahan and M. J. Sobel, Risk-sensitive dynamic market share attraction games, Games Econom. Behav. 20 (1997), 149-160.
[25] T. Prieto-Rumeau and O. Hernández-Lerma, Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, Math. Meth. Oper. Res. 70 (2009), 527-540.
[26] W. F. Sharpe, Capital asset prices: a theory of market equilibrium under conditions of risk, J. of Finance 19 (1964), 425-442.
[27] P. Whittle, Risk-Sensitive Optimal Control, Wiley, New York, 1990.
Department of Mathematics, Indian Institute of Science, Bangalore 560 012, India.
E-mail address: [email protected]

Department of Mathematics, Indian Institute of Science, Bangalore 560 012, India.
E-mail address: