Risk-sensitive control of continuous time Markov chains
MRINAL K. GHOSH AND SUBHAMAY SAHA
Abstract.
We study risk-sensitive control of continuous time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterise the value function via the HJB equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

Mathematics Subject Classification. Primary 93E20; Secondary 49L20, 60J27.

Key words and phrases. Risk sensitive control, finite horizon problem, infinite horizon discounted cost, infinite horizon average cost, multiplicative ergodic theorem, HJB equation, Poisson equation, policy improvement algorithm.

This work is supported in part by an SPM fellowship of CSIR and in part by the UGC Centre for Advanced Study.

1. Introduction and Preliminaries
In the last two decades considerable attention has been given to the investigation of risk-sensitive problems in the literature of stochastic dynamic optimization. An important reason for the popularity of this kind of problem is its connection with $H_\infty$ or robust control problems and stochastic dynamic games. A justification for the term risk-sensitive control comes from utility theory in economics. Generally in stochastic dynamic optimization, the decision maker (controller) seeks to minimise a cost functional which is a random quantity, say $X$, which depends on the time horizon and the control adopted by the controller. Since $X$ is random, the controller tries to minimise the expected value of $X$. This is the risk-neutral case. But this approach has some limitations: if the variance is large, the risk-neutral optimal control may still expose the controller to substantial risk. Generally variance is a measure of risk in the economics literature. So ideally one would like to minimise both mean and variance simultaneously, but this may not be feasible. Therefore a convex combination of mean and variance is optimised, or the mean is optimised for a given variance. This approach of mean-variance optimization was taken by Markowitz in his work on portfolio selection [20]. It was later extended by Sharpe in his capital asset pricing model [26]. But if the random variable is not normally distributed, then its distribution is not completely determined by the first two moments. Thus it is reasonable to consider a cost criterion which deals with higher moments as well. A powerful approach in this direction is risk-sensitive control, wherein the controller seeks to minimise an exponential criterion. Roughly speaking, the cost functional of interest is of the form $E\exp(\theta X)$, where $X$ is the random variable which denotes the cost payable by the controller and $\theta > 0$ is a parameter. Let $w$ be the amount the controller is willing to pay instead of the random quantity $X$. Thus $w$ satisfies
\[ \exp(\theta w) = E\exp(\theta X). \]
The deterministic quantity $w$ is referred to as the certainty equivalent of $X$. The risk premium $\pi$ is defined by the equation $w = EX + \pi$. Now by Jensen's inequality,
\[ \exp(\theta EX) \le E\exp(\theta X) = \exp(\theta w). \]
Thus, by the monotonicity of the exponential function, $w \ge EX$, which implies $\pi \ge 0$. Thus in this case the controller is risk averse. Now, to measure the degree of risk aversion, let $x = EX$. Formally, by Taylor's expansion,
\[ \exp(\theta w) = \exp(\theta x) + \pi\theta\exp(\theta x) + o(\pi). \]
Again,
\[ E\exp(\theta X) = \exp(\theta x) + \tfrac{1}{2}\operatorname{var}(X)\,\theta^{2}\exp(\theta x) + E\big(o((X-x)^{2})\big). \]
Thus we have $\pi = \tfrac{1}{2}\operatorname{var}(X)\,\theta$ plus terms of smaller order. Hence the risk premium is proportional to $\theta$ up to first order. That is why $\theta$ is referred to as the absolute risk aversion parameter. Similar arguments can also be made for $\theta < 0$; $\theta = 0$ is the risk-neutral case.

There is a vast literature on the risk-neutral case; for example, see [1] for controlled diffusions, [11] for continuous time MDPs, [13] for discrete time MDPs, and the references therein. See also [25] for variance minimization and overtaking optimality of continuous-time MDPs. For earlier works on risk-sensitive control we refer to [15] and [16]. Since then there has been a lot of research on risk-sensitive control of discrete time Markov chains [6], [7], [14], [17], [21], and also a lot of work on risk-sensitive control of diffusions [3], [9], [22], [23], [27]. As is evident from the discussion above, risk-sensitive control has wide applications in economics and in particular in finance [10], [4], [5], [24].

Although risk-sensitive control of continuous time diffusions and discrete time Markov chains has been studied, the problem for continuous time MDPs does not seem to have been studied in the literature. In this paper we study risk-sensitive control of continuous time Markov chains. We take the state space $S$ to be countable. For notational simplicity we take $S = \{0, 1, 2, \ldots\}$. Let $U_i$, $i = 0, 1, \ldots$, be compact metric spaces; $U_i$ is the control set when the state is $i$. We denote the state process by $\{X_t\}$ and the control process by $\{U_t\}$.
Formally, the dynamics of the process is as follows:
\[
\begin{cases}
P(X_{t+h} = j \mid X_t = i,\ U_t = u) = \lambda_{ij}(u)\,h + o(h), & j \ne i,\\
P(X_{t+h} = i \mid X_t = i,\ U_t = u) = 1 - \big(\sum_{j \ne i}\lambda_{ij}(u)\big)h + o(h),
\end{cases}
\tag{1.1}
\]
where $\lambda_{ij} : U_i \to \mathbb{R}_+$ are given functions. That is, if the process is at $i$ at time $t$ and the action chosen at that moment is $u$, then after a little while $h$ the process will be at state $j$ with probability $\lambda_{ij}(u)h$ plus some error term, and the process will remain at $i$ with probability $1 - \big(\sum_{j\ne i}\lambda_{ij}(u)\big)h$ plus some error term. Thus the $\lambda_{ij}$'s are the instantaneous transition rates. Set
\[ \lambda_{ii}(u) = -\sum_{j \ne i}\lambda_{ij}(u). \tag{1.2} \]
The following assumptions will be in force throughout the paper:

(A1) The functions $\lambda_{ij}$ are continuous.

(A2) $\sup_i \sup_{u \in U_i}\{-\lambda_{ii}(u)\} \le M < \infty$.

(A3) The sum in (1.2) converges uniformly. Thus $\lambda_{ii}$ is continuous for each $i$.

We now describe a rigorous construction of the process $\{X_t\}$ via the martingale problem. Let $D = D([0,\infty), S)$ be the space of $S$-valued right-continuous functions with left limits, endowed with the Skorokhod topology. Let $\mathcal{S}$ be the Borel $\sigma$-algebra on $D$. Define $U = \cup_i U_i$. Let $u : [0,\infty)\times S \to U$ be such that $u(\cdot, i) \in U_i$ and is measurable for each $i$. Let $B(S)$ denote the space of bounded real valued functions on $S$. For $f \in B(S)$, $\|f\|$ denotes the supremum norm. For each $t \in [0,\infty)$ define the operator $\Lambda^u_t : B(S) \to B(S)$ by
\[ \Lambda^u_t f(i) = \sum_j \lambda_{ij}(u(t,i))\,f(j). \tag{1.3} \]
On the measurable space $(D, \mathcal{S})$, let $\{X_t,\ t \ge 0\}$ denote the canonical process, i.e., for $\omega \in D$, $X_t(\omega) = \omega(t)$. Let $\mu$ be any probability measure on $S$. The martingale problem corresponding to $(\Lambda^u, \mu)$ is the following: a measure $P^u_{s,\mu}$ on $(D, \mathcal{S})$ is said to be a solution of the martingale problem corresponding to $(\Lambda^u, \mu)$ if

i) $P^u_{s,\mu}(X_s \in A) = \mu(A)$ for any Borel subset $A$ of $S$;

ii) $f(X_t) - \int_s^t \Lambda^u_r f(X_r)\,dr$ is a $P^u_{s,\mu}$ martingale with respect to the filtration $\mathcal{F}_t = \sigma(X_r;\ r \le t)$, for each $f$ in $B(S)$.

Under (A2) it can be shown, following the arguments in Chapter 6 of [8], that the above martingale problem has a unique solution and $\{X_t\}$ is a Markov process with the generator given by (1.3). In fact we can relax the boundedness condition in (A2): if the $\lambda_{ii}$'s satisfy an appropriate growth condition, then the martingale problem is again well posed; see Chapter 6 of [8]. Also see [12] and the references therein for related works. From now on we will work in the canonical space $(D, \mathcal{S})$. If $s = 0$ and $\mu = \delta_i$ for some $i \in S$, then we will write $P^u_{s,\mu}$ as $P^u_i$. The corresponding expectation operator is denoted by $E^u_i$. In our paper the set of admissible controls is the set of Markov controls, i.e., controls of the form $U_t = u(t, X_{t-})$, for some $u : [0,\infty)\times S \to U$ such that $u(\cdot, i) \in U_i$ and is measurable for each $i$. With an abuse of terminology the map $u$ itself is referred to as a Markov control. Let $\mathcal{U}$ denote the set of all Markov controls. A Markov control is said to be stationary if the function $u$ has no explicit dependence on $t$, i.e., $u : S \to U$ such that $u(i) \in U_i$ for each $i$. The set of stationary Markov controls is denoted by $\mathcal{U}_s$.

Now we briefly describe the problems we consider in this paper. In stochastic dynamic optimization, based on the time horizon, there can be two kinds of problems, namely finite horizon and infinite horizon problems. In this paper we address both.
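Before turning to the problem formulations, the dynamics (1.1) can be made concrete by simulation: under action $u$ the holding time in state $i$ is exponential with rate $-\lambda_{ii}(u)$, and the next state is $j$ with probability $\lambda_{ij}(u)/(-\lambda_{ii}(u))$. The following is a minimal sketch; the two-state model and constant control at the bottom are hypothetical, chosen purely for illustration.

```python
import numpy as np

def simulate_ctmc(rates, policy, i0, T, rng):
    """Simulate a controlled CTMC on {0,...,n-1} up to time T.

    rates(i, u) returns the row (lambda_{i0}(u), ..., lambda_{i,n-1}(u)),
    with the diagonal convention (1.2): lambda_ii(u) = -sum_{j!=i} lambda_ij(u).
    policy(t, i) is a Markov control u(t, i). Returns jump times and states.
    """
    t, i = 0.0, i0
    times, states = [0.0], [i0]
    while True:
        u = policy(t, i)
        row = np.asarray(rates(i, u), dtype=float)
        out_rate = -row[i]                    # total jump rate -lambda_ii(u)
        if out_rate <= 0:                     # absorbing state
            break
        t += rng.exponential(1.0 / out_rate)  # exponential holding time
        if t >= T:
            break
        probs = np.maximum(row, 0.0)
        probs[i] = 0.0
        i = int(rng.choice(len(row), p=probs / probs.sum()))
        times.append(t)
        states.append(i)
    return np.array(times), np.array(states)

# Hypothetical example: the control u scales the escape rate from state 0.
rng = np.random.default_rng(0)
rates = lambda i, u: [-u, u] if i == 0 else [1.0, -1.0]
policy = lambda t, i: 2.0                     # a constant stationary control
times, states = simulate_ctmc(rates, policy, i0=0, T=10.0, rng=rng)
print(times[:5], states[:5])
```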
Finite Horizon Problem:
Define $K = \{(i,u) : i \in S,\ u \in U_i\}$. Let $c : [0,\infty)\times K \to [0,\infty)$ be a bounded function such that $c(\cdot, i, \cdot)$ is continuous for each $i$, and let $g : S \to [0,\infty)$ be a bounded function. Let $0 < T < \infty$ be the length of the time horizon. Then for any Markov control $u$ consider the cost functional
\[ J^u_T(i) = \frac{1}{\theta}\log E^u_i\Big[\exp\Big(\theta\Big[\int_0^T c(s, X_s, U_s)\,ds + g(X_T)\Big]\Big)\Big] \tag{1.4} \]
for some $\theta \in (0,1)$, where $U_t = u(t, X_{t-})$. In the literature $c$ is referred to as the running cost function and $g$ as the terminal cost function. The aim of the controller is to minimise $J^u_T$ over all Markov controls $u$. A control $\hat u$ is said to be optimal if
\[ J^{\hat u}_T(i) = \inf_{\mathcal U} J^u_T(i). \]

Infinite Horizon Discounted Cost Problem:
For the infinite horizon problems the running cost function has no explicit time dependence. For each Markov control $u$ define
\[ I_\alpha(\theta, i, u) = \frac{1}{\theta}\log\Big(E^u_i\Big(\exp\Big[\theta\int_0^\infty e^{-\alpha t}c(X_t, U_t)\,dt\Big]\Big)\Big), \tag{1.5} \]
where $\theta$ is as in the finite horizon problem and $\alpha > 0$ is the discount rate. The aim of the controller is to minimise $I_\alpha(\theta, i, u)$ over all Markov controls $u$. A control $\hat u$ is said to be optimal if it satisfies
\[ I_\alpha(\theta, i, \hat u) = \inf_{\mathcal U} I_\alpha(\theta, i, u). \]

Infinite Horizon Average Cost Problem:
For the average cost problem the set of admissible controls is the set of stationary Markov controls. For a stationary control $u$, define
\[ J^u(i) = \limsup_{T\to\infty}\frac{1}{T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t, u(X_t))\,dt\Big)\Big]. \tag{1.6} \]
The controller wants to minimise $J^u(i)$ over all stationary controls $u$. An optimal control is defined analogously.

The rest of the paper is organised as follows. In Section 2 we study the finite horizon problem. The analysis of this problem is fairly straightforward. Using the dynamic programming heuristic we derive the Hamilton-Jacobi-Bellman (HJB) equation for this criterion. Then, using a fixed point theorem and some standard arguments involving Dynkin's formula, we show that the value function is the unique solution of the HJB equation in an appropriate class of functions. This in turn yields the existence of an optimal Markov control. Section 3 deals with the infinite horizon discounted cost case. The analysis of this problem is, from a technical viewpoint, surprisingly more involved. As usual, using the dynamic programming heuristic we derive the HJB equation and establish the corresponding verification theorem. However, establishing the existence of a smooth solution of the HJB equation for this criterion turns out to be quite tricky. We work around this problem by an appropriate limiting procedure which yields a solution of the HJB equation in the sense of distributions. Then, under certain assumptions, we establish the desired regularity of the solution. In Section 4 we investigate the average cost problem. Again this problem turns out to be technically involved. The traditional vanishing discount approach does not seem to work. Instead we use the multiplicative Poisson equation to obtain the desired result. Using a limiting argument involving the multiplicative Poisson equation we establish the existence of an optimal stationary control. In Section 5 we give a policy improvement algorithm for the average cost case. Finally, in Section 6 we conclude the paper with some remarks.
2. Finite Horizon Case
In this section we study the finite horizon case. For this we first study the exponential cost criterion. For $t \in [0,T]$ and $u \in \mathcal U$, define
\[ \hat J^u_T(t,i) = E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s, X_s, U_s)\,ds + g(X_T)\Big]\Big)\Big]. \tag{2.1} \]
Define the value function $V_T$ by
\[ V_T(t,i) = \inf_{\mathcal U}\hat J^u_T(t,i), \]
where the infimum is over all Markov controls. Our aim is to characterise the value function and to obtain an optimal control. To this end we first describe a heuristic derivation of the Hamilton-Jacobi-Bellman (HJB) equation. Formally,
\begin{align*}
V_T(t,i) &= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds + \theta\int_{t+h}^T c(s,X_s,U_s)\,ds + \theta g(X_T)\Big]\Big\}\\
&= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds\Big]\,E^u_{t+h,X_{t+h}}\Big(\exp\Big[\theta\int_{t+h}^T c(s,X_s,U_s)\,ds + \theta g(X_T)\Big]\Big)\Big\}\\
&= \inf_{\mathcal U} E^u_{t,i}\Big\{\exp\Big[\theta\int_t^{t+h} c(s,X_s,U_s)\,ds\Big]\,V_T(t+h, X_{t+h})\Big\}.
\end{align*}
If the function $V_T(\cdot,i)$ is continuously differentiable, then standard dynamic programming arguments involving Dynkin's formula lead to the following HJB equation for the finite horizon problem:
\[
\begin{cases}
\dfrac{d\varphi}{dt}(t,i) + \inf_{u\in U_i}\Big[\theta c(t,i,u)\varphi(t,i) + \displaystyle\sum_j \lambda_{ij}(u)\varphi(t,j)\Big] = 0 & \text{on } [0,T)\times S,\\
\varphi(T,i) = e^{\theta g(i)}.
\end{cases}
\tag{2.2}
\]
The importance of this equation is highlighted by the following verification theorem:
Theorem 2.1.
Assume (A1)-(A3). Suppose there exists a smooth (continuously differentiable with respect to the first variable), bounded solution $\Psi$ to (2.2). Then $\Psi(t,i) = V_T(t,i)$ for all $(t,i) \in [0,T]\times S$.
Furthermore, an optimal Markov control for the cost criterion (2.1) exists and is given by $U^*_t = u^*(t, X_{t-})$, where $u^*$ satisfies
\[ \inf_{u\in U_i}\Big[\theta c(t,i,u)\Psi(t,i) + \sum_j \lambda_{ij}(u)\Psi(t,j)\Big] = \theta c(t,i,u^*(t,i))\Psi(t,i) + \sum_j \lambda_{ij}(u^*(t,i))\Psi(t,j). \tag{2.3} \]

Proof.
Let $u$ be any arbitrary Markov control. By the Feynman-Kac formula,
\begin{align*}
E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big] = \Psi(t,i) + E^u_{t,i}\int_t^T &\exp\Big(\theta\int_t^r c(s,X_s,U_s)\,ds\Big)\\
&\times\Big[\frac{d\Psi}{dr}(r,X_r) + \theta c(r,X_r,U_r)\Psi(r,X_r) + \sum_j \lambda_{X_r j}(U_r)\Psi(r,j)\Big]\,dr.
\end{align*}
Since $\Psi$ satisfies (2.2), we have
\[ \Psi(t,i) \le E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big]. \]
If we use the control $u^*$ as in (2.3), then we get equality in place of the inequality above. The existence of a $u^*$ satisfying (2.3) is ensured by a standard measurable selection theorem [2]. Hence the theorem follows. $\Box$

Next we prove that there exists a smooth, bounded solution to (2.2).
Theorem 2.2.
Assume (A1)-(A3). Then there exists a unique solution to (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$.

Proof. Let $\varphi(t,i) = e^{-\gamma t}\psi(t,i)$. Substituting in (2.2) we get
\[
\begin{cases}
\dfrac{d\psi}{dt} - \gamma\psi + \inf_{u\in U_i}\Big[\theta c(t,i,u)\psi(t,i) + \displaystyle\sum_j \lambda_{ij}(u)\psi(t,j)\Big] = 0 & \text{on } [0,T)\times S,\\
\psi(T,i) = e^{\gamma T}e^{\theta g(i)}.
\end{cases}
\tag{2.4}
\]
Consider the following integral equation:
\[ \psi(t,i) = e^{\gamma t}e^{\theta g(i)} + e^{\gamma t}\int_t^T e^{-\gamma s}\inf_{u\in U_i}\Big[\theta c(s,i,u)\psi(s,i) + \sum_j \lambda_{ij}(u)\psi(s,j)\Big]\,ds. \tag{2.5} \]
Define $\mathcal T : C_b([0,T]\times S) \to C_b([0,T]\times S)$ by
\[ \mathcal T\psi(t,i) = e^{\gamma t}e^{\theta g(i)} + e^{\gamma t}\int_t^T e^{-\gamma s}\inf_{u\in U_i}\Big[\theta c(s,i,u)\psi(s,i) + \sum_j \lambda_{ij}(u)\psi(s,j)\Big]\,ds. \]
Then
\begin{align*}
|\mathcal T\psi_1(t,i) - \mathcal T\psi_2(t,i)| &\le e^{\gamma t}\int_t^T e^{-\gamma s}\big\{\theta\|c\|\,\|\psi_1 - \psi_2\| + 2M\|\psi_1 - \psi_2\|\big\}\,ds\\
&= \frac{2M + \theta\|c\|}{\gamma}\,\|\psi_1 - \psi_2\|\,e^{\gamma t}\big(e^{-\gamma t} - e^{-\gamma T}\big)\\
&\le \frac{2M + \theta\|c\|}{\gamma}\,\|\psi_1 - \psi_2\|,
\end{align*}
where $M$ is as in (A2). Hence for $\gamma = 2M + \theta\|c\| + 1$, $\mathcal T$ is a contraction, and thus by Banach's fixed point theorem there exists a unique solution $\psi$ to (2.5) in $C_b([0,T]\times S)$. Using (A1)-(A3) and the boundedness and continuity of the cost function $c$, it follows that $\psi$ is in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. Then it follows that $\varphi(t,i) = e^{-\gamma t}\psi(t,i)$ is a solution of (2.2). The uniqueness follows from the previous theorem. $\Box$
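The contraction argument above is constructive: iterating $\mathcal T$ from any initial guess converges geometrically, at rate $(2M + \theta\|c\|)/\gamma < 1$, to the solution of (2.5). Below is a minimal numerical sketch of this iteration, assuming a hypothetical finite state space, a common finite action set, and a trapezoidal discretization of the time integral; all array shapes and parameter choices are illustrative, not part of the paper.

```python
import numpy as np

def solve_finite_horizon(lam, c, g, theta, T, n_t=2000, n_iter=80):
    """Picard iteration for the integral equation (2.5) on a time grid.

    lam[a, i, j] are the rates under action a (rows sum to zero, cf. (1.2)),
    c[k, i, a] is the running cost at grid time t_k, g[i] the terminal cost.
    Returns the grid and psi; phi = exp(-gamma*t) * psi then solves (2.2).
    """
    n_a, n_s, _ = lam.shape
    M = np.max(-np.einsum('aii->ai', lam))              # the bound in (A2)
    gamma = 2.0 * M + theta * np.max(c) + 1.0           # makes T a contraction
    t = np.linspace(0.0, T, n_t)
    dt = t[1] - t[0]
    psi = np.exp(gamma * t)[:, None] * np.exp(theta * g)[None, :]
    for _ in range(n_iter):
        # H(k, i) = inf_a [theta*c(t_k,i,a)*psi(t_k,i) + sum_j lam_ij(a)*psi(t_k,j)]
        H = np.min(theta * c * psi[:, :, None]
                   + np.einsum('aij,kj->kia', lam, psi), axis=2)
        F = np.exp(-gamma * t)[:, None] * H
        # tail(k, i) approximates int_{t_k}^T F(s, i) ds (trapezoid rule)
        seg = 0.5 * dt * (F[1:] + F[:-1])
        tail = np.vstack([np.flip(np.cumsum(np.flip(seg, 0), 0), 0),
                          np.zeros(n_s)])
        psi = np.exp(gamma * t)[:, None] * (np.exp(theta * g)[None, :] + tail)
    return t, psi
```

Each sweep is one application of $\mathcal T$, so after $n$ sweeps the error is of order $((2M+\theta\|c\|)/\gamma)^n$ plus the quadrature error of the time discretization.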
Thus combining the above two theorems we get the following theorem:

Theorem 2.3.
Under (A1)-(A3), the value function $V_T$ is the unique solution to (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. An optimal control is given by the Markov control $U^*_t = u^*(t, X_{t-})$, where $u^*$ satisfies
\[ \inf_{u\in U_i}\Big[\theta c(t,i,u)V_T(t,i) + \sum_j \lambda_{ij}(u)V_T(t,j)\Big] = \theta c(t,i,u^*(t,i))V_T(t,i) + \sum_j \lambda_{ij}(u^*(t,i))V_T(t,j). \tag{2.6} \]
Now, since the logarithm is an increasing function, the following theorem is evident.

Theorem 2.4.
Let $\varphi$ be the unique solution of (2.2) in $C_b([0,T]\times S)\cap C^1([0,T)\times S)$. Define $\psi = \theta^{-1}\log\varphi$. Then
\[ \psi(t,i) = \inf_{\mathcal U}\frac{1}{\theta}\log E^u_{t,i}\Big[\exp\Big(\theta\Big[\int_t^T c(s,X_s,U_s)\,ds + g(X_T)\Big]\Big)\Big]. \]
Moreover, the Markov control given by (2.6) is again an optimal control in this case.

3. Discounted Cost Case
In this section we turn our attention to the infinite horizon discounted cost problem. Define
\[ V_\alpha(\theta,i) = \inf_{\mathcal U} I_\alpha(\theta,i,u), \tag{3.1} \]
where $I_\alpha(\theta,i,u)$ is as in (1.5). The function $V_\alpha$ is called the $\alpha$-discounted value function. Our aim is to characterise the value function and to obtain an optimal control. Instead of working with $V_\alpha$, we first start with
\[ W_\alpha(\theta,i) = \inf_{\mathcal U}\exp\big[\theta I_\alpha(\theta,i,u)\big]. \tag{3.2} \]
Formally, for any $T > 0$,
\begin{align*}
W_\alpha(\theta,i) &= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt + \theta\int_T^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big\}\\
&= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt\Big]\,E^u_{X_T}\Big(\exp\Big[\theta e^{-\alpha T}\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big)\Big\}\\
&= \inf_{\mathcal U} E^u_i\Big\{\exp\Big[\theta\int_0^T e^{-\alpha t}c(X_t,U_t)\,dt\Big]\,W_\alpha(\theta e^{-\alpha T}, X_T)\Big\}.
\end{align*}
If $W_\alpha(\cdot,i)$ is smooth then, using Dynkin's formula and some heuristic arguments, we obtain that $W_\alpha$ should satisfy
\[ \alpha\theta\,\frac{dW_\alpha}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)W_\alpha(\theta,i) + \sum_j \lambda_{ij}(u)W_\alpha(\theta,j)\Big], \quad \lim_{\theta\to 0} W_\alpha(\theta,i) = 1. \tag{3.3} \]
Equation (3.3) is known as the HJB equation for the cost criterion (3.2). Now, starting from (3.3), the following verification theorem can be obtained.

Theorem 3.1.
Assume that there exists a bounded, smooth (continuously differentiable in the first variable) function $w(\theta,i)$ such that
\[ \alpha\theta\,\frac{dw}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)w(\theta,i) + \sum_j \lambda_{ij}(u)w(\theta,j)\Big] \quad\text{on } (0,1]\times S \tag{3.4} \]
and $w(\theta,i) \to 1$ as $\theta \to 0$, uniformly in $i$. Then $w(\theta,i) = W_\alpha(\theta,i)$. Furthermore, an optimal control for the cost criterion (3.2) is given by
\[ U^*_t = u^*(\theta e^{-\alpha t}, X_{t-}), \tag{3.5} \]
where $u^*$ is given by
\[ \inf_{u\in U_i}\Big[\theta c(i,u)w(\theta,i) + \sum_j \lambda_{ij}(u)w(\theta,j)\Big] = \theta c(i,u^*(\theta,i))w(\theta,i) + \sum_j \lambda_{ij}(u^*(\theta,i))w(\theta,j). \tag{3.6} \]

Proof.
Define $\theta_t = \theta e^{-\alpha t}$ and
\[ \Psi_t = \exp\Big\{\int_0^t \theta_s\,c(X_s,U_s)\,ds\Big\} \]
for any arbitrary Markov control $U_t = u(t, X_{t-})$. Then by the Feynman-Kac formula we get
\[ E^u_i\big\{\Psi_T\,w(\theta_T, X_T)\big\} - w(\theta,i) = E^u_i\Big\{\int_0^T \Psi_t\Big[-\alpha\theta_t\frac{dw}{d\theta}(\theta_t,X_t) + \theta_t\,c(X_t,U_t)\,w(\theta_t,X_t) + \sum_j \lambda_{X_t j}(U_t)\,w(\theta_t,j)\Big]\,dt\Big\}. \]
Since $w$ satisfies (3.4), the term on the right hand side above is non-negative. Therefore we get
\[ w(\theta,i) \le E^u_i\big\{\Psi_T\,w(\theta_T, X_T)\big\}. \]
Now $\theta_T \to 0$ as $T \to \infty$, and hence $w(\theta_T, X_T) \to 1$. Thus we get
\[ w(\theta,i) \le E^u_i\Big\{\exp\Big[\theta\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt\Big]\Big\}. \tag{3.7} \]
Similarly, if we take the Markov control $U^*$ given by (3.5) and (3.6), then we get equality in (3.7) in place of the inequality. Hence the theorem follows. $\Box$
Corollary 3.2.
For w as in Theorem 3.1, define v ( θ, i ) = θ − log w ( θ, i ) . Then v = V α ,where V α is as in (3.1) . Now we prove that the HJB equation (3.4) indeed has a smooth solution. To this endwe first prove the following result.
Proposition 3.3.
Let ǫ > be arbitrary but fixed. There exists a function W ǫ in C b ([ ǫ, × S ) T C (( ǫ, × S ) such that W ǫ satisfies αθ dW ǫ dθ ( θ, i ) = inf u ∈ U i (cid:20) θc ( i, u ) W ǫ ( θ, i ) + X j λ ij ( u ) W ǫ ( θ, j ) (cid:21) on ( ǫ, × SW ǫ ( ǫ, i ) = e ǫα || c || := h ǫ ( i ) . (3.8) Proof.
Let δ > T : C b ([ ǫ, ǫ + δ ] × S ) → C b ([ ǫ, ǫ + δ ] × S )by T f ( η, i ) = e ǫα || c || + 1 α Z ηǫ inf u ∈ U i (cid:2) c ( i, u ) f ( θ, i ) + 1 θ X j λ ij ( u ) f ( θ, j ) (cid:3) dθ . Then | T f ( η, i ) − T f ( η, i ) | ≤ α (cid:20) || c || δ || f − f || + 2 Mǫ δ || f − f || (cid:21) , where M is as in (A2) . Choose δ such that β := 1 α (cid:20) || c || δ + 2 Mǫ δ (cid:21) is strictly less than 1. Then T is a contraction. Hence by Banach’s fixed point theoremthere exists a W in C b ([ ǫ, ǫ + δ ] × S ) which is the unique fixed point of T . Now assumptions (A1)-(A3) and the continuity of c imply that W is in C b ([ ǫ, ǫ + δ ] × S ) T C (( ǫ, ǫ + δ ] × S ).Thus it follows that W satisfies (3.8) on [ ǫ, ǫ + δ ] × S . Proceeding in this way we will get afunction W ǫ ∈ C b ([ ǫ, × S ) T C (( ǫ, × S ) which satisfies (3.8). (cid:3) Next we take limit ǫ → W ǫ and show that the limit satisfies (3.4). In particular weprove the following theorem: Theorem 3.4.
Assume (A1)-(A3) and further assume that $S$ is finite. Then there exists a unique solution $W$ in the class $C_b((0,1]\times S)\cap C^1((0,1]\times S)$ to the equation
\[ \alpha\theta\,\frac{dW}{d\theta}(\theta,i) = \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big] \quad\text{on } (0,1]\times S, \]
with $\lim_{\theta\to 0} W(\theta,i) = 1$.

Proof.
Using Dynkin's formula it can be shown that $W_\epsilon$ has the following stochastic representation:
\[ W_\epsilon(\theta,i) = \inf_{\mathcal U} E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big], \]
where $h_\epsilon$ is as in (3.8) and $T_\epsilon = \inf\{t \ge 0 : \theta_t = \epsilon\}$, i.e., $T_\epsilon = \frac{1}{\alpha}\log(\theta/\epsilon)$. From this representation of $W_\epsilon$ we can deduce that for every $\epsilon > 0$,
\[ 1 \le W_\epsilon(\theta,i) \le e^{\frac{\theta}{\alpha}\|c\|} \le e^{\frac{\|c\|}{\alpha}}. \]
Now we show that $\frac{dW_\epsilon}{d\theta}$ is also uniformly (in $\epsilon > 0$) bounded.
For any arbitrary Markov control $u$,
\[ \Big|E^u_i\Big[h_\epsilon\exp\Big((\theta+\delta)\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big| \le I_1 + I_2, \]
where $T^\delta_\epsilon$ is defined by $(\theta+\delta)e^{-\alpha T^\delta_\epsilon} = \epsilon$ and
\begin{align*}
I_1 &= \Big|E^u_i\Big[h_\epsilon\exp\Big((\theta+\delta)\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big|,\\
I_2 &= \Big|E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big] - E^u_i\Big[h_\epsilon\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big]\Big|.
\end{align*}
Now
\[ I_1 \le e^{\frac{\|c\|}{\alpha}}\,E^u_i\Big[\exp\Big(\theta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big|\exp\Big(\delta\int_0^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big) - 1\Big|\Big] \le C_1\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C_1 > 0$ and $\delta > 0$ small enough. For $I_2$ we have
\[ I_2 \le e^{\frac{\|c\|}{\alpha}}\,E^u_i\Big[\exp\Big(\theta\int_0^{T_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big)\Big|\exp\Big(\theta\int_{T_\epsilon}^{T^\delta_\epsilon} e^{-\alpha t}c(X_t,U_t)\,dt\Big) - 1\Big|\Big] \le e^{\frac{\|c\|}{\alpha}}\Big[e^{\frac{\|c\|}{\alpha}\,\theta\left(e^{-\alpha T_\epsilon} - e^{-\alpha T^\delta_\epsilon}\right)} - 1\Big]. \]
But
\[ \theta e^{-\alpha T_\epsilon} - \theta e^{-\alpha T^\delta_\epsilon} = \delta e^{-\alpha T^\delta_\epsilon} = \frac{\epsilon\delta}{\theta+\delta}. \]
Hence we have
\[ I_2 \le e^{\frac{\|c\|}{\alpha}}\Big[e^{\frac{\|c\|}{\alpha}\,\frac{\epsilon\delta}{\theta+\delta}} - 1\Big] \le C_2\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C_2 > 0$ and $\delta > 0$ small enough. Hence for $\delta > 0$ small enough,
\[ \big|W_\epsilon(\theta+\delta,i) - W_\epsilon(\theta,i)\big| \le C\,e^{\frac{\|c\|}{\alpha}}\,\delta\,\frac{\|c\|}{\alpha} \]
for some constant $C > 0$. Similarly, for $\delta < 0$ small enough, we can get an estimate of the type
\[ \big|W_\epsilon(\theta+\delta,i) - W_\epsilon(\theta,i)\big| \le C\,e^{\frac{\|c\|}{\alpha}}\,|\delta|\,\frac{\|c\|}{\alpha}. \]
Thus we get that $\frac{dW_\epsilon}{d\theta}$ is uniformly bounded in $\epsilon > 0$. Define
\[ \widetilde W_\epsilon(\theta,i) = \begin{cases} W_\epsilon(\theta,i) & \text{for } \theta > \epsilon,\\ h_\epsilon(i) & \text{for } \theta \le \epsilon. \end{cases} \]
Then $\widetilde W_\epsilon$ satisfies the same bounds. Now since $\widetilde W_\epsilon$ is uniformly bounded and $\frac{d\widetilde W_\epsilon}{d\theta}$ is also uniformly bounded, by the Ascoli-Arzelà theorem there exist a function $W$ in $C_b((0,1]\times S)$ and a sequence $\epsilon_n \to 0$ such that $\widetilde W_{\epsilon_n} \to W$ uniformly over compact subsets of $(0,1]\times S$. Also, by the definition of $\widetilde W_\epsilon$, $W(\theta,i) \to 1$ as $\theta \to 0$.
Now taking $\varphi \in C^\infty_c(0,1)$, we get
\begin{align*}
-\int_0^1 \alpha\,\widetilde W_{\epsilon_n}\,\frac{d(\theta\varphi)}{d\theta}\,d\theta &= \int_0^1 \alpha\theta\,\frac{d\widetilde W_{\epsilon_n}}{d\theta}\,\varphi(\theta)\,d\theta\\
&= \int_0^1 \inf_{u\in U_i}\Big[\theta c(i,u)\widetilde W_{\epsilon_n}(\theta,i) + \sum_j \lambda_{ij}(u)\widetilde W_{\epsilon_n}(\theta,j)\Big]\varphi(\theta)\,d\theta\\
&\quad - \int_0^{\epsilon_n} \inf_{u\in U_i}\Big[\theta c(i,u)\widetilde W_{\epsilon_n}(\theta,i) + \sum_j \lambda_{ij}(u)\widetilde W_{\epsilon_n}(\theta,j)\Big]\varphi(\theta)\,d\theta.
\end{align*}
Now letting $n \to \infty$ we get
\[ -\int_0^1 \alpha W\,\frac{d(\theta\varphi)}{d\theta}\,d\theta = \int_0^1 \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big]\varphi(\theta)\,d\theta. \]
Thus
\[ \alpha\theta\,\frac{dW}{d\theta} = \inf_{u\in U_i}\Big[\theta c(i,u)W(\theta,i) + \sum_j \lambda_{ij}(u)W(\theta,j)\Big] \]
in the sense of distributions. But by our assumptions the right hand side is a continuous function. Therefore $\frac{dW}{d\theta}$ is in $C((0,1]\times S)$. Thus $W$ is a smooth solution to the HJB equation (3.4). The uniqueness follows from Theorem 3.1. $\Box$
Theorem 3.5.
Assume (A1)-(A3) and that $S$ is finite. Then the value function $V_\alpha$ as in (3.1) is the unique solution in $C_b((0,1]\times S)\cap C^1((0,1]\times S)$ to
\[ \alpha\theta\Big[v + \theta\frac{dv}{d\theta}\Big]e^{\theta v} = \inf_{u\in U_i}\Big[\theta c(i,u)e^{\theta v} + \sum_j \lambda_{ij}(u)e^{\theta v(\theta,j)}\Big] \quad\text{on } (0,1]\times S, \]
with
\[ \lim_{\theta\to 0} v(\theta,i) = \inf_{\mathcal U} E^u_i\int_0^\infty e^{-\alpha t}c(X_t,U_t)\,dt. \]
An optimal control is given by the Markov control $U_t = u^*(\theta e^{-\alpha t}, X_{t-})$, where $u^*$ is given by
\[ \inf_{u\in U_i}\Big[\theta c(i,u)e^{\theta V_\alpha} + \sum_j \lambda_{ij}(u)e^{\theta V_\alpha(\theta,j)}\Big] = \theta c(i,u^*(\theta,i))e^{\theta V_\alpha} + \sum_j \lambda_{ij}(u^*(\theta,i))e^{\theta V_\alpha(\theta,j)}. \]
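For finite $S$, the equation in Theorem 3.4 is a system of ODEs in $\theta$, so $W$ (and hence $V_\alpha$) can be approximated by integrating from a small $\epsilon > 0$ with the boundary data $W_\epsilon(\epsilon,i) = e^{\epsilon\|c\|/\alpha}$ of Proposition 3.3, mirroring the limiting construction above. Below is a minimal explicit-Euler sketch; the rates, costs, and step counts are hypothetical, for illustration only.

```python
import numpy as np

def solve_discounted_hjb(lam, c, alpha, theta_max=1.0, eps=1e-4, n=20000):
    """Integrate alpha*theta*dW/dtheta = min_a[theta*c*W + (Lam W)] on [eps, theta_max].

    lam[a, i, j] are rates under action a, c[i, a] the running cost.
    Initial data W(eps, i) = exp(eps*||c||/alpha), as in Proposition 3.3.
    Returns the theta grid, W on the grid, and v = log(W)/theta (i.e. V_alpha).
    """
    n_a, n_s, _ = lam.shape
    thetas = np.linspace(eps, theta_max, n)
    W = np.full(n_s, np.exp(eps * np.max(np.abs(c)) / alpha))
    Ws = [W.copy()]
    for k in range(n - 1):
        th, dth = thetas[k], thetas[k + 1] - thetas[k]
        # right hand side of the HJB equation, minimised over actions
        H = np.min(th * c * W[:, None] + np.einsum('aij,j->ia', lam, W), axis=1)
        W = W + dth * H / (alpha * th)        # explicit Euler step in theta
        Ws.append(W.copy())
    Ws = np.array(Ws)
    return thetas, Ws, np.log(Ws) / thetas[:, None]

# Hypothetical two-state, two-action model.
lam = np.array([[[-1.0, 1.0], [2.0, -2.0]],
                [[-3.0, 3.0], [0.5, -0.5]]])   # lam[a, i, j]
c = np.array([[1.0, 2.0], [0.5, 1.5]])          # c[i, a]
thetas, Ws, v = solve_discounted_hjb(lam, c, alpha=1.0)
print(v[-1])                                     # approximates V_alpha(1, .)
```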
The finiteness of the state space in Theorem 3.4 is forced upon us by the uniformity of the boundary condition in Theorem 3.1. Note that the limiting procedure that we have employed only yields that $\lim_{\theta\to 0} W(\theta,i) = 1$ for each $i$; hence the finiteness assumption on $S$. Note that a similar situation arises in the risk-sensitive control of diffusion processes [22]. In [22] the authors treat periodic diffusions, for which the state space is a torus, which is compact.

4. Infinite Horizon Average Cost
In this section we study the infinite horizon average cost case. In order to study the average cost case we make some further assumptions on our model.
(A4) For every stationary control $u$, the corresponding Markov chain is irreducible.

(A5) There exist a Lyapunov function $V : S \to \mathbb{R}_+$, an unbounded function $W : S \to [1,\infty)$ and constants $\delta > 0$, $b < \infty$ such that
\[ e^{-V(i)}\sum_j \lambda_{ij}(u)\,e^{V(j)} \le -\delta W(i) + b\,\mathbf{1}_{\{0\}}(i) \quad\text{for all } i, u. \tag{4.1} \]

An important consequence of (A5) is the following lemma:

Lemma 4.1.
Let $0 < \eta < \delta$ and
\[ \tau = \inf\{t > 0 : X_t = 0\}. \tag{4.2} \]
Then $\sup_u E^u_i\,e^{\eta\tau} \le e^{V(i)}$.

Proof.
Let $\tau_n = \inf\{t \ge 0 : X_t \ge n\}$. If $X_0 \ge n$, then $\tau_n = 0$. Assumption (A5) implies that there exists a $\tilde b$ such that, with $\widetilde V = e^V$,
\[ \sum_j \lambda_{ij}(u)\,\widetilde V(j) \le -\delta\,\widetilde V(i) + \tilde b\,\mathbf{1}_{\{0\}}(i). \]
By Dynkin's formula we get
\begin{align*}
E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\,\widetilde V(X_{\tau\wedge\tau_n\wedge n})\Big] &= \widetilde V(i) + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\Lambda^u + \eta)\widetilde V(X_s)\,ds\\
&\le \widetilde V(i) + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\eta - \delta)\widetilde V(X_s)\,ds.
\end{align*}
Thus we have
\begin{align*}
\widetilde V(i) &\ge E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\,\widetilde V(X_{\tau\wedge\tau_n\wedge n})\Big] + E^u_i\int_0^{\tau\wedge\tau_n\wedge n} e^{\eta s}\,(\delta - \eta)\widetilde V(X_s)\,ds\\
&\ge E^u_i\Big[e^{\eta(\tau\wedge\tau_n\wedge n)}\Big].
\end{align*}
Now letting $n \to \infty$ we get the desired result. $\Box$
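Condition (4.1) can be checked numerically for a concrete model. Here is a small sketch, assuming a hypothetical controlled birth-death chain (birth rate $u$, death rate $\mu i$, truncated at $N$ states) with $V(i) = \kappa i$ and $W(i) = 1 + i$; all constants are illustrative, and the growing death rates would in practice call for the relaxed growth condition on $\lambda_{ii}$ mentioned in Section 1 rather than (A2).

```python
import numpy as np

# Hypothetical controlled birth-death chain on {0,...,N-1}:
# birth rate u (the control), death rate mu*i; V(i) = kappa*i, W(i) = 1+i.
N, mu, kappa = 200, 3.0, 0.5
W = lambda i: 1.0 + i

def drift(i, u):
    """Left hand side of (4.1): e^{-V(i)} * sum_j lam_ij(u) * e^{V(j)}."""
    out = 0.0
    if i + 1 < N:
        out += u * (np.exp(kappa) - 1.0)        # birth rate times e^{V(i+1)-V(i)} - 1
    if i >= 1:
        out += mu * i * (np.exp(-kappa) - 1.0)  # death rate times e^{V(i-1)-V(i)} - 1
    return out

for u in (0.5, 1.0):
    # largest delta with lhs <= -delta*W(i) off state 0, then the b needed at 0
    delta = min(-drift(i, u) / W(i) for i in range(1, N))
    b = drift(0, u) + delta * W(0)
    print(f"u={u}: (4.1) holds with delta={delta:.3f}, b={b:.3f}")
```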
Finally we make the following assumption:
(A6) For $\tau$ as defined in (4.2), $\sup_{i,u} E^u_i\,\tau < \infty$.

Remark 4.2.
If the state space is finite, then it can be easily seen that (A5) implies (A6).
Theorem 4.3.
Under (A1)-(A6), an optimal control for the risk-sensitive average cost criterion exists for $\theta$ and $c$ satisfying $\theta\|c\| < \delta$, where $\delta$ is as in (4.1).

Proof. Let
\[ \theta\rho_u = \lim_{T\to\infty}\frac{1}{T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t, u(X_t))\,dt\Big)\Big]. \tag{4.3} \]
The existence of the above limit follows from the multiplicative ergodic theorems proved in [18] and [19]. Moreover, it also follows from the results in [18] and [19] that the limit in (4.3) is the principal eigenvalue of the operator $\Lambda^u + \theta c$ and has a positive eigenfunction belonging to the class $L^\infty_{e^V}$, i.e., if we denote an eigenfunction by $h_u$, then $\sup_i \frac{|h_u(i)|}{e^{V(i)}} < \infty$. Thus the following equation holds:
\[ \sum_j \lambda_{ij}(u(i))\,h_u(j) + \theta c(i,u(i))\,h_u(i) = \theta\rho_u\,h_u(i). \tag{4.4} \]
Equation (4.4) is referred to as the Poisson equation. Now it is clear that if $h_u$ satisfies (4.4), then so does any scalar multiple of $h_u$. Therefore, without any loss of generality, we may assume that $h_u(0) = 1$. With this restriction, using Dynkin's formula and the fact that $h_u$ satisfies (4.4), we get the following stochastic representation for $h_u$:
\[ h_u(i) = E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s, u(X_s)) - \rho_u\big)\,ds\Big)\Big]. \tag{4.5} \]
Now using the stochastic representation of $h_u$ we derive bounds on $h_u$. First we derive an upper bound. We have
\[ h_u(i) = E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big)\Big] \le E^u_i\,e^{\theta\|c\|\tau} \le e^{V(i)} \]
by Lemma 4.1, provided $\theta\|c\| < \delta$. This shows that the bound on $h_u$ is uniform in $u$. Next we obtain a lower bound. We have
\begin{align*}
h_u(i) &= E^u_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big)\Big]\\
&\ge \exp\Big\{E^u_i\,\theta\int_0^\tau\big(c(X_s,u(X_s)) - \rho_u\big)\,ds\Big\}\\
&\ge \exp\big(-\theta\rho_u\,E^u_i\tau\big) \ge \exp\big(-\theta\|c\|\,E^u_i\tau\big) > \epsilon
\end{align*}
for some $\epsilon > 0$.
In the above sequence of inequalities, the second one follows from Jensen's inequality and the last one follows from (A6). Let
\[ \rho^* = \inf_u \rho_u. \tag{4.6} \]
Next we show that there exists a control $u^*$ which attains the infimum in (4.6). From (4.6) it follows that there exists a sequence $\{u_n\}$ such that $\rho_{u_n} \to \rho^*$. Since each $U_i$ is compact, there exist a subsequence, again denoted by $\{u_n\}$, and a $u^*$ such that $u_n \to u^*$ pointwise. Again, since $h_{u_n}$ is pointwise bounded, there exists a subsequence, which we again call $\{h_{u_n}\}$, such that $h_{u_n}(i) \to h^*(i)$ for each $i$, for some $h^*$ with $\inf_i h^*(i) \ge \epsilon$. Therefore, by using Fatou's lemma, we have
\begin{align*}
\sum_{j\ne i}\lambda_{ij}(u^*(i))\,h^*(j) &\le \liminf_{n\to\infty}\sum_{j\ne i}\lambda_{ij}(u_n(i))\,h_{u_n}(j)\\
&= \liminf_{n\to\infty}\big[-\lambda_{ii}(u_n(i))h_{u_n}(i) - \theta c(i,u_n(i))h_{u_n}(i) + \theta\rho_{u_n}h_{u_n}(i)\big]\\
&= -\lambda_{ii}(u^*(i))h^*(i) - \theta c(i,u^*(i))h^*(i) + \theta\rho^* h^*(i).
\end{align*}
Thus we get
\[ \sum_j \lambda_{ij}(u^*(i))\,h^*(j) + \theta c(i,u^*(i))\,h^*(i) \le \theta\rho^*\,h^*(i). \]
Now we claim that $\rho^* = \rho_{u^*}$.
Indeed, with $\tau_n$ as in the proof of Lemma 4.1, we get from Dynkin's formula
\begin{align*}
E^{u^*}_i\Big[\exp\Big(\theta\int_0^{T\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{T\wedge\tau_n})\Big] &= h^*(i) + E^{u^*}_i\Big[\int_0^{T\wedge\tau_n}\exp\Big(\theta\int_0^t c(X_s,u^*(X_s))\,ds\Big)(\Lambda^{u^*} + \theta c)h^*(X_t)\,dt\Big]\\
&\le h^*(i) + \theta\rho^*\,E^{u^*}_i\Big[\int_0^{T\wedge\tau_n}\exp\Big(\theta\int_0^t c(X_s,u^*(X_s))\,ds\Big)h^*(X_t)\,dt\Big]\\
&\le h^*(i) + \theta\rho^*\int_0^T E^{u^*}_i\Big[\exp\Big(\theta\int_0^{t\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{t\wedge\tau_n})\Big]\,dt.
\end{align*}
Hence by Gronwall's inequality we have
\[ E^{u^*}_i\Big[\exp\Big(\theta\int_0^{T\wedge\tau_n} c(X_s,u^*(X_s))\,ds\Big)h^*(X_{T\wedge\tau_n})\Big] \le h^*(i)\,e^{\theta\rho^* T}. \]
Therefore, letting $n \to \infty$, we have
\[ h^*(i)\,e^{\theta\rho^* T} \ge \epsilon\,E^{u^*}_i\Big[\exp\Big(\theta\int_0^T c(X_s,u^*(X_s))\,ds\Big)\Big], \]
which implies
\[ \theta\rho^* \ge \lim_{T\to\infty}\frac{1}{T}\log E^{u^*}_i\Big[\exp\Big(\theta\int_0^T c(X_s,u^*(X_s))\,ds\Big)\Big] = \theta\rho_{u^*}. \]
Hence $\rho^* = \rho_{u^*}$. $\Box$
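For a finite state space, the objects in this proof are directly computable: $\theta\rho_u$ is the principal eigenvalue of $\Lambda^u + \theta\,\mathrm{diag}(c(\cdot,u(\cdot)))$, a matrix with nonnegative off-diagonal entries, so Perron-Frobenius theory yields a real principal eigenvalue with a positive eigenfunction $h_u$. A minimal sketch, with a hypothetical irreducible three-state model:

```python
import numpy as np

def multiplicative_poisson(lam_u, c_u, theta):
    """Solve the Poisson equation (4.4) for a fixed stationary control.

    lam_u is the rate matrix under the control, c_u the cost vector.
    theta*rho_u is the principal eigenvalue of lam_u + theta*diag(c_u),
    and h_u is the positive eigenfunction normalised by h_u(0) = 1.
    """
    vals, vecs = np.linalg.eig(lam_u + theta * np.diag(c_u))
    k = np.argmax(vals.real)             # principal (largest real) eigenvalue
    h = vecs[:, k].real
    h = h / h[0]                         # normalisation h_u(0) = 1
    assert np.all(h > 0), "principal eigenfunction should be positive"
    return vals[k].real / theta, h

# Hypothetical three-state example under one stationary control.
lam_u = np.array([[-2.0, 1.5, 0.5],
                  [1.0, -1.0, 0.0],
                  [0.2, 0.8, -1.0]])
c_u = np.array([1.0, 0.3, 2.0])
rho_u, h_u = multiplicative_poisson(lam_u, c_u, theta=0.2)
print(rho_u, h_u)
```

The returned pair satisfies (4.4) row by row up to floating-point error, which can be checked directly.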
5. Policy Improvement Algorithm

In the previous section we proved the existence of an optimal control. But our theorem is purely existential and does not give an algorithm to find an optimal control. In this section we focus on a computational approach for finding an optimal stationary control. Since we are concerned with algorithms in this section, we assume that both the state and action spaces are finite. Now we describe the policy improvement algorithm.
Algorithm
Step 1: Start with any initial stationary control $u_0$. For this $u_0$, set
\[ \rho_{u_0} = \lim_{T\to\infty}\frac{1}{\theta T}\log E^{u_0}_i\Big[\exp\Big(\theta\int_0^T c(X_t, u_0(X_t))\,dt\Big)\Big] \]
and
\[ h_{u_0}(i) = E^{u_0}_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_t, u_0(X_t)) - \rho_{u_0}\big)\,dt\Big)\Big]. \]
We know from the previous section that $h_{u_0}$ satisfies the Poisson equation
\[ \sum_j \lambda_{ij}(u_0(i))\,h_{u_0}(j) + \theta c(i,u_0(i))\,h_{u_0}(i) = \theta\rho_{u_0}\,h_{u_0}(i) \]
with the constraint $h_{u_0}(0) = 1$.
Step 2: Define $u_1$ to be a stationary control which attains the minimum in
\[ \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u_0}(i) + \sum_j \lambda_{ij}(u)\,h_{u_0}(j)\Big]. \]
With $\rho_{u_1}$ and $h_{u_1}$ defined as above, continue the procedure.
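The two steps admit a compact implementation when the state and action spaces are finite: Step 1 is the principal-eigenvalue computation of the previous section, and Step 2 is a pointwise minimisation. The sketch below stops when the minimising control repeats, in which case the stopping condition (5.2) in the proof of Theorem 5.1 below holds trivially; all model data are hypothetical.

```python
import numpy as np

def poisson_data(lam_u, c_u, theta):
    """Step 1: theta*rho is the principal eigenvalue of lam_u + theta*diag(c_u)."""
    vals, vecs = np.linalg.eig(lam_u + theta * np.diag(c_u))
    k = np.argmax(vals.real)
    h = vecs[:, k].real
    return vals[k].real / theta, h / h[0]    # normalised so that h(0) = 1

def policy_improvement(lam, c, theta, u0, max_iter=100):
    """Policy improvement for the risk-sensitive average cost criterion.

    lam[a, i, j] are transition rates under action a, c[i, a] the running cost.
    """
    n_a, n_s, _ = lam.shape
    u = np.asarray(u0)
    for _ in range(max_iter):
        lam_u = lam[u, np.arange(n_s), :]    # rate matrix under the control u
        c_u = c[np.arange(n_s), u]
        rho, h = poisson_data(lam_u, c_u, theta)
        # Step 2: minimise theta*c(i,a)*h(i) + sum_j lam_ij(a)*h(j) over a
        Q = theta * c * h[:, None] + np.einsum('aij,j->ia', lam, h)
        u_next = np.argmin(Q, axis=1)
        if np.array_equal(u_next, u):        # control repeats: (5.2) holds
            return u, rho, h
        u = u_next
    return u, rho, h

# Hypothetical model: 3 states, 2 actions, random rates and costs.
rng = np.random.default_rng(1)
lam = rng.uniform(0.1, 1.0, size=(2, 3, 3))
for a in range(2):
    np.fill_diagonal(lam[a], 0.0)
    lam[a] -= np.diag(lam[a].sum(axis=1))    # diagonal convention (1.2)
c = rng.uniform(0.0, 1.0, size=(3, 2))
u_opt, rho_opt, h_opt = policy_improvement(lam, c, theta=0.3, u0=[0, 0, 0])
print(u_opt, rho_opt)
```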
Theorem 5.1.
The above algorithm leads to an optimal control in a finite number of steps.

Proof.
In order to prove that this algorithm produces an optimal control in a finite number of steps, we first claim that
\[ \rho_{u_{n+1}} \le \rho_{u_n}. \tag{5.1} \]
Indeed, from the definition of $u_{n+1}$ we have
\begin{align*}
\sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i) &\le \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i)\\
&= \theta\rho_{u_n}\,h_{u_n}(i).
\end{align*}
Now, using arguments involving Dynkin's formula as in the previous section, it can be proved that $\rho_{u_{n+1}} \le \rho_{u_n}$.

Our second claim is that if for some $n$ and for all $i$
\[ \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i) = \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i), \tag{5.2} \]
then $u_n$ is optimal. To prove this, observe that if $u_n$ is as in (5.2), then
\begin{align*}
\theta\rho_{u_n}\,h_{u_n}(i) &= \sum_j \lambda_{ij}(u_n(i))\,h_{u_n}(j) + \theta c(i,u_n(i))\,h_{u_n}(i)\\
&= \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i)\\
&= \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u_n}(i) + \sum_j \lambda_{ij}(u)\,h_{u_n}(j)\Big]. \tag{5.3}
\end{align*}
Now for any stationary control $u$ we have, by Dynkin's formula,
\begin{align*}
E^u_i\Big[\exp\Big(\theta\int_0^T\big(c(X_t,u(X_t)) - \rho_{u_n}\big)\,dt\Big)h_{u_n}(X_T)\Big] &= h_{u_n}(i) + E^u_i\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u(X_s)) - \rho_{u_n}\big)\,ds\Big)\big[\Lambda^u + \theta c - \theta\rho_{u_n}\big]h_{u_n}(X_t)\,dt\\
&\ge h_{u_n}(i).
\end{align*}
The last inequality follows from (5.3). Thus we have
\[ h_{u_n}(i) \le \max_i h_{u_n}(i)\;E^u_i\Big[\exp\Big(\theta\int_0^T\big(c(X_t,u(X_t)) - \rho_{u_n}\big)\,dt\Big)\Big]. \]
This implies that
\[ \rho_{u_n} \le \lim_{T\to\infty}\frac{1}{\theta T}\log E^u_i\Big[\exp\Big(\theta\int_0^T c(X_t,u(X_t))\,dt\Big)\Big] = \rho_u. \]
Hence the claim.

Our final claim is that if $u_n$ is not an optimal control, then the inequality in (5.1) is in fact strict. Since $u_n$ is not optimal, we have
\[ \sum_j \lambda_{ij}(u_{n+1}(i))\,h_{u_n}(j) + \theta c(i,u_{n+1}(i))\,h_{u_n}(i) - \theta\rho_{u_n}\,h_{u_n}(i) = -g(i), \]
where $g$ is a non-negative function and there exists at least one $i$ such that $g(i) > 0$. For $T > 0$, by Dynkin's formula,
\begin{align*}
E^{u_{n+1}}_i\Big[\exp\Big(\theta&\int_0^T\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)h_{u_n}(X_T)\Big]\\
&= h_{u_n}(i) + E^{u_{n+1}}_i\Big[\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)\big(\theta c_{n+1} + \Lambda^{u_{n+1}} - \theta\rho_{u_n}\big)h_{u_n}(X_t)\,dt\Big]\\
&= h_{u_n}(i) - E^{u_{n+1}}_i\Big[\int_0^T \exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_n}\big)\,ds\Big)g(X_t)\,dt\Big]\\
&= h_{u_n}(i) - \int_0^T e^{-\theta(\rho_{u_n} - \rho_{u_{n+1}})t}\,E^{u_{n+1}}_i\Big[\exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_{n+1}}\big)\,ds\Big)g(X_t)\Big]\,dt\\
&= h_{u_n}(i) - h_{u_{n+1}}(i)\,T\,\frac{1}{T}\,\widetilde E^{u_{n+1}}_i\int_0^T \frac{g(X_t)}{h_{u_{n+1}}(X_t)}\,dt \tag{5.4}
\end{align*}
if $\rho_{u_{n+1}} = \rho_{u_n}$. In (5.4) the expectation operator $\widetilde E^{u_{n+1}}_i$ is given by
\[ \widetilde E^{u_{n+1}}_i f(X_t) = E^{u_{n+1}}_i\Big[\exp\Big(\theta\int_0^t\big(c(X_s,u_{n+1}(X_s)) - \rho_{u_{n+1}}\big)\,ds\Big)\frac{h_{u_{n+1}}(X_t)}{h_{u_{n+1}}(i)}\,f(X_t)\Big] \tag{5.5} \]
for any real valued bounded function $f$. It is easy to see that (5.5) uniquely determines a transition probability kernel $\widetilde P^{u_{n+1}}_i$, and under $\widetilde P^{u_{n+1}}_i$ the corresponding Markov chain is still irreducible. Since the state space is finite, the Markov chain under $\widetilde P^{u_{n+1}}_i$ is positive recurrent. Thus it has a unique invariant measure, say $\widetilde\pi$. Now observe that the right hand side of (5.4) is negative for $T$ sufficiently large, because
\[ \frac{1}{T}\,\widetilde E^{u_{n+1}}_i\int_0^T \frac{g(X_t)}{h_{u_{n+1}}(X_t)}\,dt \to \sum_i \widetilde\pi(i)\,\frac{g(i)}{h_{u_{n+1}}(i)} > 0. \]
But the left hand side is always non-negative. Thus we get a contradiction, and hence $\rho_{u_{n+1}} < \rho_{u_n}$.
From the above claims it follows that the algorithm arrives at an optimal control within a finite number of steps, because the number of stationary controls is finite. $\Box$
Some comments are now in order.
Remark 5.2.
Suppose the state and action spaces are finite. Let $u^*$ be an optimal control, let $\rho^*$ be the optimal average cost, and let
\[ h_{u^*}(i) = E^{u^*}_i\Big[\exp\Big(\theta\int_0^\tau\big(c(X_t,u^*(X_t)) - \rho^*\big)\,dt\Big)\Big]. \]
Then it follows from the arguments used in the proof of Theorem 5.1 that $(\rho^*, h_{u^*})$ satisfies the equation
\[ \theta\rho^*\,h_{u^*}(i) = \min_{u\in U_i}\Big[\theta c(i,u)\,h_{u^*}(i) + \sum_j \lambda_{ij}(u)\,h_{u^*}(j)\Big]. \tag{5.6} \]
Equation (5.6) is the HJB equation for the average cost criterion. If $(\lambda, h)$ is a solution of (5.6), where $h$ is a positive function, then using Dynkin's formula it can be shown that $\lambda$ is the optimal cost and the minimiser in (5.6) is an optimal control.
If the state space is countably infinite and equation (5.6) has a solution $(\lambda, h)$ such that $h$ is a bounded, positive function which is uniformly bounded away from $0$, then again it can be shown that $\lambda$ is the optimal cost and the minimiser in (5.6) is an optimal control. However, we have not been able to show that (5.6) has such a solution. If one assumes that (5.6) has such a solution, then one can develop value and policy iteration algorithms along the lines of [6]. In [6] the authors deal with discrete time Markov chains; there they have developed value and policy iteration algorithms under the assumption that the analogous dynamic programming equation has a solution.

6. Conclusion
In this paper we have studied the risk-sensitive optimal control problem for continuous time Markov chains. We have analysed the finite horizon case under fairly general conditions. For the infinite horizon discounted cost case we have assumed that the state space is finite, so it will be interesting to investigate the problem for the case of a countably infinite state space. The average cost case has been studied under an additional Lyapunov type stability condition. We have established the existence of an optimal control. We have also developed a policy iteration algorithm for the case of finite state and action spaces. For countable state spaces, an algorithmic approach to determining an optimal control needs further investigation.
Acknowledgement:
The authors wish to thank V. S. Borkar for helpful discussions.
References

[1] A. Arapostathis, V. S. Borkar and M. K. Ghosh, Ergodic Control of Diffusion Processes, Encyclopedia of Mathematics and its Applications 143, Cambridge University Press, Cambridge, 2012.
[2] V. E. Benes, Existence of optimal strategies based on specified information for a class of stochastic decision problems, SIAM J. Control 8 (1970), 179-188.
[3] A. Biswas, V. S. Borkar and K. S. Kumar, Risk-sensitive control with near monotone cost, Appl. Math. Optim. 62 (2010), 145-163.
[4] T. R. Bielecki and S. R. Pliska, Risk-sensitive dynamic asset management, Appl. Math. Optim. 39 (1999), 337-360.
[5] T. R. Bielecki, S. R. Pliska and S. J. Sheu, Risk sensitive portfolio management with Cox-Ingersoll-Ross interest rates: the HJB equation, SIAM J. Control Optim. 44 (2005), 1811-1843.
[6] V. S. Borkar and S. P. Meyn, Risk-sensitive optimal control for Markov decision processes with monotone cost, Math. Oper. Res. 27 (2002), 192-209.
[7] K. J. Chung and M. J. Sobel, Discounted MDPs: distribution functions and exponential utility maximization, SIAM J. Control Optim. 25 (1987), 49-62.
[8] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.
[9] W. H. Fleming and W. M. McEneaney, Risk sensitive control on an infinite horizon, SIAM J. Control Optim. 33 (1995), 1881-1915.
[10] M. K. Ghosh, A. Goswami and S. K. Kumar, Portfolio optimization in a semi-Markov modulated market, Appl. Math. Optim. 60 (2009), 275-296.
[11] X. Guo and O. Hernández-Lerma, Continuous-Time Markov Decision Processes. Theory and Applications, Springer-Verlag, 2009.
[12] X. Guo, O. Hernández-Lerma and T. Prieto-Rumeau, A survey of recent results on continuous-time Markov decision processes, TOP 14 (2006), 177-261.
[13] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
[14] D. Hernández-Hernández and S. I. Marcus, Risk sensitive control of Markov processes in countable state space, Systems Control Lett. 29 (1996), 147-155.
[15] R. A. Howard and J. E. Matheson, Risk-sensitive Markov decision processes, Management Sci. 18 (1972), 356-369.
[16] D. H. Jacobson, Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games, IEEE Trans. Automat. Control 18 (1973), 124-131.
[17] A. Jaśkiewicz, Average optimality for risk-sensitive control with general state space, Ann. Appl. Probab. 17 (2007), 654-675.
[18] I. Kontoyiannis and S. P. Meyn, Spectral theory and limit theorems for geometrically ergodic Markov processes, Ann. Appl. Probab. 13 (2003), 304-362.
[19] I. Kontoyiannis and S. P. Meyn, Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes, Electron. J. Probab. 10 (2005), 61-123.
[20] H. Markowitz, Portfolio selection, J. of Finance 7 (1952), 77-91.
[21] G. B. Di Masi and L. Stettner, Risk-sensitive control of discrete time Markov processes with infinite horizon, SIAM J. Control Optim. 38 (1999), 61-78.
[22] J. L. Menaldi and M. Robin, Remarks on risk sensitive control problems, Appl. Math. Optim. 52 (2005), 297-310.
[23] H. Nagai, Bellman equations of risk-sensitive control, SIAM J. Control Optim. 34 (1996), 74-101.
[24] G. E. Monahan and M. J. Sobel, Risk-sensitive dynamic market share attraction games, Games Econom. Behav. 20 (1997), 149-160.
[25] T. Prieto-Rumeau and O. Hernández-Lerma, Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, Math. Meth. Oper. Res. 70 (2009), 527-540.
[26] W. F. Sharpe, Capital asset prices: a theory of market equilibrium under conditions of risk, J. of Finance 19 (1964), 425-442.
[27] P. Whittle, Risk-Sensitive Optimal Control, Wiley, New York, 1990.
Department of Mathematics, Indian Institute of Science, Bangalore 560 012, India.
E-mail address: [email protected]

Department of Mathematics, Indian Institute of Science, Bangalore 560 012, India.
E-mail address: