On risk-sensitive piecewise deterministic Markov decision processes
arXiv: [math.OC], Nov.
Xin Guo∗ and Yi Zhang†

Abstract:
We consider a piecewise deterministic Markov decision process, where the expected exponential utility of the total (nonnegative) cost is to be minimized. The cost rate, transition rate and post-jump distributions are under control. The state space is Borel, and the transition and cost rates are locally integrable along the drift. Under natural conditions, we establish the optimality equation, justify the value iteration algorithm, and show the existence of a deterministic stationary optimal policy. Applied to special cases, the obtained results already significantly improve some existing results in the literature on finite horizon and infinite horizon discounted risk-sensitive continuous-time Markov decision processes.
Keywords:
Continuous-time Markov decision processes. Piecewise deterministic Markov decision processes. Exponential utility. Dynamic programming.
AMS 2000 subject classification:
Primary 90C40, Secondary 60J75
Introduction

Since the pioneering work [18], risk-sensitive discrete-time Markov decision processes (DTMDPs) have been studied intensively. Restricting attention to total undiscounted or discounted problems, let us mention e.g., [3, 6, 7, 11, 12, 15, 16], most of which deal with the exponential utility, as does the present paper. As an application, an open problem in insurance was recently solved in [4] in the framework of risk-sensitive DTMDPs. There are notable differences between risk-sensitive and risk-neutral DTMDPs. For instance, in a finite model, i.e., when the state and action spaces are both finite, there is always a deterministic stationary optimal policy in a discounted risk-neutral DTMDP, but not always in a discounted risk-sensitive DTMDP; see [15].

One of the first works on risk-sensitive continuous-time Markov decision processes (CTMDPs) is [21], where only verification theorems were presented. Recently, there has been revived interest in this topic; see e.g., [8, 14, 20, 24, 25, 27]. A finite horizon total undiscounted risk-sensitive CTMDP was considered in [14, 21, 24], whose arguments can be summarized as follows. Firstly, the optimality equation is shown to admit a solution out of a small enough class. Secondly, by using the Feynman-Kac formula, this solution is shown to be the value function, and any Markov policy providing the minimizer in the optimality equation is optimal. The proofs of [14, 24] reveal that the main technicalities lie in the first step, for which the state space was assumed to be denumerable. This assumption is important for the diagonalization argument used in [24], which is an extension of [14] from bounded transition rates to possibly unbounded transition rates, whose growth is bounded by a Lyapunov function. The latter requirement and the boundedness of the cost rate then validate the Feynman-Kac formula applied in the second step.
The author of [24] mentioned that it was unclear

∗ Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, U.K. E-mail: [email protected].
† Corresponding author. Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, U.K. E-mail: [email protected].
Notations and conventions.
In what follows, B(X) is the Borel σ-algebra of the topological space X, I stands for the indicator function, and δ_{x}(·) is the Dirac measure concentrated on the singleton {x}, assumed to be measurable. A measure is σ-additive and [0,∞]-valued. Below, unless stated otherwise, measurability is always understood in the Borel sense. Throughout this paper, we adopt the conventions

0/0 := 0, 0 · ∞ := 0, 1/0 := +∞, ∞ − ∞ := ∞. (1)

If a mapping f is defined on X, and {X_i} is a partition of X, then when f is piecewise defined by f(x) = g_i(x) for all x ∈ X_i, the notation f(x) = Σ_i I{x ∈ X_i} g_i(x) is used, even if f is not real-valued.

Let S be a nonempty Borel state space, A be a nonempty Borel action space, and q stand for a signed kernel q(dy|x,a) on B(S) given (x,a) ∈ S × A such that

q̃(Γ_S|x,a) := q(Γ_S \ {x}|x,a) ≥ 0, ∀ Γ_S ∈ B(S). (2)

Throughout this article we assume that q(·|x,a) is conservative and stable, i.e.,

q(S|x,a) = 0, q̄_x := sup_{a∈A} q_x(a) < ∞, (3)

where q_x(a) := −q({x}|x,a). The signed kernel q is often called the transition rate. Between two consecutive jumps, the state of the process evolves according to a measurable mapping φ from S × [0,∞) to S, see (5) below. It is assumed that for each x ∈ S,

φ(x, t+s) = φ(φ(x,t), s), ∀ s, t ≥ 0; φ(x, 0) = x, (4)

and t → φ(x,t) is continuous. Finally, let the cost rate c be a [0,∞)-valued measurable function on S × A. For simplicity, we do not consider the case of different admissible action spaces at different states.

Condition 2.1
(a) For each bounded measurable function f on S and each x ∈ S, ∫_S f(y) q̃(dy|x,a) is continuous in a ∈ A.
(b) For each x ∈ S, the (nonnegative) function c(x,a) is lower semicontinuous in a ∈ A.
(c) The action space A is a compact Borel space.

Condition 2.2
For each x ∈ S, ∫_0^t q̄_{φ(x,s)} ds < ∞ and ∫_0^t sup_{a∈A} c(φ(x,s), a) ds < ∞ for each t ∈ [0,∞).

The integrals in the above condition are well defined: the integrands are universally measurable in s ∈ [0,∞); see Chapter 7 of [5]. Let us take the sample space Ω by adjoining to the countable product space S × ((0,∞) × S)^∞ the sequences of the form (x_0, θ_1, x_1, ..., θ_n, x_n, ∞, x_∞, ∞, x_∞, ...), where x_0, x_1, ..., x_n belong to S, θ_1, ..., θ_n belong to (0,∞), and x_∞ ∉ S is an isolated point. We equip Ω with its Borel σ-algebra F. Let t_0(ω) := 0 =: θ_0, and for each n ≥ 1 and each element ω := (x_0, θ_1, x_1, θ_2, ...) ∈ Ω, let t_n(ω) := t_{n−1}(ω) + θ_n and t_∞(ω) := lim_{n→∞} t_n(ω). Obviously, (t_n(ω)) are measurable mappings on (Ω, F). In what follows, we often omit the argument ω ∈ Ω from the presentation for simplicity. Also, we regard x_n and θ_{n+1} as the coordinate variables, and note that the pairs {t_n, x_n} form a marked point process with the internal history {F_t}_{t≥0}, i.e., the filtration generated by {t_n, x_n}; see Chapter 4 of [19] for greater details. The marked point process {t_n, x_n} defines the stochastic process {ξ_t, t ≥ 0} on (Ω, F) of interest by

ξ_t = Σ_{n≥0} I{t_n ≤ t < t_{n+1}} φ(x_n, t − t_n) + I{t_∞ ≤ t} x_∞, t ≥ 0, (5)

where we accept 0 · x := 0 and 1 · x := x for each x ∈ S_∞, and below we denote S_∞ := S ∪ {x_∞}.

A (history-dependent) policy π is given by a sequence (π_n) such that, for each n = 0, 1, 2, ..., π_n(da|x_0, θ_1, ..., x_n, s) is a stochastic kernel on A, and for each ω = (x_0, θ_1, x_1, θ_2, ...) ∈ Ω and t > 0,

π(da|ω, t) = I{t ≥ t_∞} δ_{a_∞}(da) + Σ_{n=0}^∞ I{t_n < t ≤ t_{n+1}} π_n(da|x_0, θ_1, ..., θ_n, x_n, t − t_n), (6)

where a_∞ ∉ A is some isolated point. A policy π is called Markov if, with a slight abuse of notation, π(da|ω, s) = π^M(da|ξ_{s−}, s) for some stochastic kernel π^M. A Markov policy is further called deterministic if π^M(da|x, s) = δ_{f^M(x,s)}(da) for some measurable mapping f^M from S × (0,∞) to A. A policy is called deterministic stationary if for each n = 0, 1, ..., π_n(da|x_0, θ_1, ..., θ_n, x_n, t − t_n) = δ_{f(φ(x_n, t−t_n))}(da) for some measurable mapping f from S to A. We shall identify such a deterministic stationary policy with the underlying measurable mapping f. The class of all policies is denoted by Π.
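The construction of {ξ_t} from the marked point process can be made concrete in a small simulation. The following sketch is not from the paper: the flow φ(x,t) = x e^{−t}, the state-dependent jump intensity q_{φ(x,s)} = φ(x,s), and the fixed post-jump state are all invented for illustration. Sojourn times are drawn by inverse transform from the survival function exp(−∫_0^θ q_{φ(x,s)} ds), which here equals exp(−x(1 − e^{−θ})) and assigns positive probability e^{−x} to the event θ = ∞ of no further jump:

```python
import math
import random

def flow(x, t):
    # Hypothetical drift between jumps: phi(x, t) = x * exp(-t).
    # It satisfies the semigroup property (4): phi(x, t+s) = phi(phi(x, t), s).
    return x * math.exp(-t)

def sample_sojourn(x, rng):
    # Inverse-transform sample of the sojourn time theta when the jump
    # intensity along the drift is q_{phi(x,s)} = phi(x, s) = x * exp(-s),
    # so that P(theta > t) = exp(-x * (1 - exp(-t))), cf. (7).
    u = rng.random()
    if u <= math.exp(-x):
        return math.inf          # no further jump: theta = infinity
    return -math.log(1.0 + math.log(u) / x)

def simulate(x0, horizon, rng):
    # Piecewise deterministic trajectory: between the jump times t_n the
    # state follows the flow, as in (5); after each jump it is reset to a
    # fixed post-jump state (an invented post-jump kernel).
    t, x, jumps = 0.0, x0, [(0.0, x0)]
    while True:
        theta = sample_sojourn(x, rng)
        if t + theta > horizon:
            return jumps, flow(x, horizon - t)   # the state at the horizon, by (5)
        t += theta
        x = 2.0                                  # invented post-jump state
        jumps.append((t, x))
```

The probability of observing no further jump from state x matches the second displayed line of (7) for this intensity, namely exp(−∫_0^∞ q_{φ(x,s)} ds) = e^{−x}.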
Under a fixed policy π = (π_n), for each initial distribution γ on (S, B(S)), by using the Ionescu-Tulcea theorem, one can build a probability measure P^π_γ on (Ω, F) such that P^π_γ(x_0 ∈ Γ) = γ(Γ) for each Γ ∈ B(S), and the conditional distribution of (θ_{n+1}, x_{n+1}) given x_0, θ_1, x_1, ..., θ_n, x_n is given on {ω : x_n(ω) ∈ S} by

P^π_γ(θ_{n+1} ∈ Γ_1, x_{n+1} ∈ Γ_2 | x_0, θ_1, x_1, ..., θ_n, x_n)
= ∫_{Γ_1} e^{−∫_0^t ∫_A q_{φ(x_n,s)}(a) π_n(da|x_0,θ_1,...,θ_n,x_n,s) ds} ∫_A q̃(Γ_2|φ(x_n,t), a) π_n(da|x_0,θ_1,...,θ_n,x_n,t) dt, ∀ Γ_1 ∈ B((0,∞)), Γ_2 ∈ B(S);
P^π_γ(θ_{n+1} = ∞, x_{n+1} = x_∞ | x_0, θ_1, x_1, ..., θ_n, x_n) = e^{−∫_0^∞ ∫_A q_{φ(x_n,s)}(a) π_n(da|x_0,θ_1,...,θ_n,x_n,s) ds}, (7)

and given on {ω : x_n(ω) = x_∞} by P^π_γ(θ_{n+1} = ∞, x_{n+1} = x_∞ | x_0, θ_1, x_1, ..., θ_n, x_n) = 1. Below, when γ is a Dirac measure concentrated at x ∈ S, we write P^π_x. Expectations with respect to P^π_γ and P^π_x are denoted by E^π_γ and E^π_x, respectively. Roughly speaking, the uncontrolled version of the process evolves as follows: given the current state, the process evolves deterministically according to the mapping φ up to the next jump, which takes place after a random time whose distribution is (nonstationary) exponential, and the dynamics continue in a similar manner. A detailed book treatment with many examples of this and more general types of processes, allowing deterministic jumps, can be found in [10].

For each x ∈ S and policy π = (π_n),

E^π_x[e^{∫_0^∞ ∫_A c(ξ_t,a) π(da|ω,t) dt}] = E^π_x[e^{Σ_{n=0}^∞ ∫_0^{θ_{n+1}} ∫_A c(φ(x_n,s),a) π_n(da|x_0,θ_1,...,x_n,s) ds}] =: V(x, π)

defines the concerned performance measure of the policy π ∈ Π given the initial state x ∈ S. Here and below, we put c(x_∞, a) := 0 for each a ∈ A, and φ(x_∞, t) := x_∞ for each t ∈ [0,∞).
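To see what the criterion V(x,π) measures, consider a toy single-jump model (not from the paper; all numbers invented): from the initial state the process jumps once, after an Exp(q)-distributed sojourn θ, to a cost-free absorbing state, with constant cost rate c before the jump. The total cost is then cθ, and for c < q one has E[e^{cθ}] = q/(q − c) in closed form, which a short Monte Carlo sketch can confirm:

```python
import math
import random

# Invented single-jump model: from the initial state the process jumps once,
# after an Exp(q)-distributed sojourn theta, to a cost-free absorbing state;
# the cost rate before the jump is the constant c, so the total cost is
# c * theta and, for c < q, E[exp(c * theta)] = q / (q - c) in closed form.
q, c = 3.0, 1.0
rng = random.Random(0)
n = 100_000
samples = [math.exp(c * rng.expovariate(q)) for _ in range(n)]
risk_sensitive = sum(samples) / n   # Monte Carlo estimate of E[e^{total cost}]
risk_neutral = math.exp(c / q)      # e^{E[total cost]}, the risk-neutral analogue
```

By Jensen's inequality, E[e^{total cost}] ≥ e^{E[total cost]}, so the exponential utility penalizes the variability of the cost and not only its mean; this gap is the risk-sensitive feature of the criterion.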
We are interested in the following optimal control problem for each x ∈ S:

Minimize over π ∈ Π: V(x, π). (8)

A policy π* is called optimal if V(x, π*) = inf_{π∈Π} V(x, π) =: V*(x) for each x ∈ S. The objective of this paper is to show, under the imposed conditions, the existence of a deterministic stationary optimal policy, and to establish the corresponding optimality equation satisfied by the value function V*, together with its value iteration. Evidently, V*(x) ≥ 1 for each x ∈ S. Under the next condition, it will be seen that for each x ∈ S, V*(φ(x,s)) is absolutely continuous in s.

Condition 2.3
For each x ∈ S, V*(x) < ∞.

The above condition is mainly assumed for notational convenience. In fact, the main optimality results (such as the existence of a deterministic stationary optimal policy) obtained in this paper can be established without assuming Condition 2.3, at the cost of some additional notations. In a nutshell, one has to consider the sets Ŝ := {x ∈ S : V*(x) < ∞} and S \ Ŝ separately, and note that if x ∈ Ŝ, then φ(x,t) ∈ Ŝ for each t ∈ [0,∞). The reasoning presented under Condition 2.3 can be followed in an obvious manner. We formulate the corresponding optimality results in Remarks 3.1 and 3.2 below.
Main statements
We first present the main optimality results concerning problem (8) for the PDMDP model. Their proofs are postponed to the next section.
Theorem 3.1
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Then the following assertions hold.

(a) The value function V* for problem (8) is the minimal [1,∞)-valued solution to the following optimality equation:

−(V(φ(x,t)) − V(x)) = ∫_0^t inf_{a∈A} { ∫_S V(y) q̃(dy|φ(x,τ), a) − (q_{φ(x,τ)}(a) − c(φ(x,τ), a)) V(φ(x,τ)) } dτ, t ∈ [0,∞), x ∈ S.

In particular, V*(φ(x,t)) is absolutely continuous in t for each x ∈ S.

(b) There exists a deterministic stationary optimal policy f, which can be taken as any measurable mapping from S to A such that

inf_{a∈A} { ∫_S V*(y) q̃(dy|x,a) − (q_x(a) − c(x,a)) V*(x) } = ∫_S V*(y) q̃(dy|x, f(x)) − (q_x(f(x)) − c(x, f(x))) V*(x), ∀ x ∈ S.

Remark 3.1
By inspecting its proof, one can see that the following version of Theorem 3.1 holds without assuming Condition 2.3. Suppose Conditions 2.1 and 2.2 are satisfied. Then the following assertions hold.

(a) The value function V* for problem (8) is the minimal [1,∞]-valued solution to the following optimality equation:

−(V(φ(x,t)) − V(x)) = ∫_0^t inf_{a∈A} { ∫_S V(y) q̃(dy|φ(x,τ), a) − (q_{φ(x,τ)}(a) − c(φ(x,τ), a)) V(φ(x,τ)) } dτ, t ∈ [0,∞), x ∈ Ŝ;
V(x) < ∞, x ∈ Ŝ; V(x) = ∞, x ∈ S \ Ŝ.

In particular, V*(φ(x,t)) is absolutely continuous in t for each x ∈ Ŝ.

(b) There exists a deterministic stationary optimal policy f, which can be taken as any measurable mapping from S to A such that

inf_{a∈A} { ∫_S V*(y) q̃(dy|x,a) − (q_x(a) − c(x,a)) V*(x) } = ∫_S V*(y) q̃(dy|x, f(x)) − (q_x(f(x)) − c(x, f(x))) V*(x), ∀ x ∈ Ŝ.

Next, we present the value iteration algorithm for the value function V*.

Theorem 3.2 Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Let V^{(0)}(x) := 1 for each x ∈ S. For each n ≥ 0, let V^{(n+1)} be the minimal [1,∞)-valued measurable solution to

−(V^{(n+1)}(φ(x,t)) − V^{(n+1)}(x)) = ∫_0^t inf_{a∈A} { ∫_S V^{(n)}(y) q̃(dy|φ(x,τ), a) − (q_{φ(x,τ)}(a) − c(φ(x,τ), a)) V^{(n+1)}(φ(x,τ)) } dτ, t ∈ [0,∞), x ∈ S, (9)

such that V^{(n+1)}(φ(x,t)) is absolutely continuous in t for each x ∈ S. (For each n ≥ 0, such a solution always exists.) Furthermore, {V^{(n)}} is a monotone nondecreasing sequence of measurable functions on S such that for each x ∈ S, V^{(n)}(x) ↑ V*(x) as n ↑ ∞.

Remark 3.2
Similar to Remark 3.1, we have the following version of Theorem 3.2 without assuming Condition 2.3. Suppose Conditions 2.1 and 2.2 are satisfied. Let V^{(0)}(x) := 1 for each x ∈ Ŝ and V^{(0)}(x) := ∞ if x ∈ S \ Ŝ. For each n ≥ 0, let V^{(n+1)} be the minimal [1,∞]-valued measurable solution to

−(V^{(n+1)}(φ(x,t)) − V^{(n+1)}(x)) = ∫_0^t inf_{a∈A} { ∫_S V^{(n)}(y) q̃(dy|φ(x,τ), a) − (q_{φ(x,τ)}(a) − c(φ(x,τ), a)) V^{(n+1)}(φ(x,τ)) } dτ, t ∈ [0,∞), x ∈ Ŝ;
V^{(n+1)}(x) < ∞, x ∈ Ŝ; V^{(n+1)}(x) = ∞, x ∈ S \ Ŝ.

Here V^{(n+1)}(φ(x,t)) is absolutely continuous in t for each x ∈ Ŝ. (For each n ≥ 0, such a solution always exists.) Furthermore, {V^{(n)}} is a monotone nondecreasing sequence of measurable functions on S such that for each x ∈ S, V^{(n)}(x) ↑ V*(x) as n ↑ ∞.

We can apply our theorems to the special case of a CTMDP, that is, φ(x,t) ≡ x for each x ∈ S. The following α-discounted risk-sensitive CTMDP problem was considered in [14]:

Minimize over π ∈ Π: E^π_x[e^{∫_0^∞ e^{−αt} ∫_A c(ξ_t,a) π(da|ω,t) dt}], x ∈ S. (10)

Here α > 0 is the discount factor. In [14], it was assumed that sup_{x∈S} q̄_x < ∞ and sup_{x∈S, a∈A} c(x,a) < ∞, and that the state space S is finite. These restrictions, e.g., the finiteness of S, were needed for their investigations; see e.g., Remark 3.6 in [14]. Under the compactness-continuity condition (Condition 2.1), it was shown in [14] that there exists an optimal Markov policy for the discounted risk-sensitive CTMDP, and the optimality equation was established. By using the theorems presented earlier in this section, we can obtain these optimality results for problem (10) in a much more general setup: the state space S is Borel, there is no boundedness requirement on the transition rate with respect to the state x ∈ S, and the optimality is over the class of history-dependent policies.
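In the CTMDP case φ(x,t) ≡ x, one can check that equation (9) reduces to the pointwise relation inf_{a∈A} { Σ_y q̃(y|x,a) V^{(n)}(y) − (q_x(a) − c(x,a)) V^{(n+1)}(x) } = 0, so that whenever q_x(a) > c(x,a) for every action (a restrictive assumption made here only so the toy computation stays finite), the iterate takes the explicit form V^{(n+1)}(x) = min_a [Σ_y q̃(y|x,a) V^{(n)}(y)] / (q_x(a) − c(x,a)). The following sketch runs this iteration on a hypothetical three-state model with invented rates and costs:

```python
# Hypothetical three-state CTMDP (phi(x,t) = x); all rates and costs invented.
# Each action in state x is a triple (q_x(a), c(x,a), {y: qtilde(y|x,a)}).
# State 1 is absorbing and cost-free, so V(1) = 1 throughout.
model = {
    0: [(3.0, 1.0, {1: 2.0, 2: 1.0}),    # action "a"
        (2.0, 0.4, {1: 2.0})],           # action "b"
    2: [(2.0, 0.5, {1: 2.0})],
}

def value_iteration(n_iter=50):
    V = {0: 1.0, 1: 1.0, 2: 1.0}         # V^(0) := 1
    for _ in range(n_iter):
        new_v = {1: 1.0}                 # absorbing, cost-free state
        for x, actions in model.items():
            # V^(n+1)(x) = min_a [sum_y qtilde(y|x,a) V^(n)(y)] / (q_x(a) - c(x,a)),
            # valid here because q_x(a) > c(x,a) for every action.
            new_v[x] = min(sum(r * V[y] for y, r in qt.items()) / (q_ - c_)
                           for q_, c_, qt in actions)
        V = new_v
    return V
```

The iterates stabilize at V(0) = 1.25 and V(2) = 4/3; for state 2, which jumps straight to absorption, this matches the closed form q/(q − c) = 2/(2 − 0.5).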
Furthermore, we let the CTMDP model be nonhomogeneous, i.e., the transition rate q(dy|t,x,a) is now a signed kernel on B(S) from (t,x,a) ∈ [0,∞) × S × A, satisfying the corresponding version of (3); the notation q̃ is kept as before, see (2), with the extra argument t in addition to x. Similarly, the nonnegative cost rate c is allowed to be a measurable function on [0,∞) × S × A.
Consider the α-discounted risk-sensitive (nonhomogeneous) CTMDP problem (10) with c(ξ_t,a) being replaced by c(t,ξ_t,a). Suppose

sup_{t∈[0,∞)} q̄_{(t,x)} < ∞, ∀ x ∈ S; sup_{t∈[0,∞), x∈S, a∈A} c(t,x,a) < ∞,

and the corresponding version of Condition 2.1, where x is replaced by (t,x), is satisfied by the nonhomogeneous CTMDP model. Then the following assertions hold.

(a) There exists some [1,∞)-valued measurable solution V on [0,∞) × S to

−(V(t,x) − V(0,x)) = ∫_0^t inf_{a∈A} { ∫_S V(u,y) q̃(dy|u,x,a) + (e^{−αu} c(u,x,a) − q_{(u,x)}(a)) V(u,x) } du, x ∈ S, t ∈ [0,∞),

such that V(t,x) is absolutely continuous in t for each x ∈ S.

(b) Let L be the minimal [1,∞)-valued measurable solution on [0,∞) × S to the above equation. Then the value function, say L*, of the α-discounted risk-sensitive CTMDP problem (10) (with c(ξ_t,a) replaced by c(t,ξ_t,a)) is given by L*(x) = L(0,x) for each x ∈ S.

(c) There exists an optimal deterministic Markov policy f for the α-discounted risk-sensitive CTMDP problem (10) (with c(ξ_t,a) replaced by c(t,ξ_t,a)). One can take f as any measurable mapping from [0,∞) × S to A such that

inf_{a∈A} { ∫_S L(u,y) q̃(dy|u,x,a) + (e^{−αu} c(u,x,a) − q_{(u,x)}(a)) L(u,x) } = ∫_S L(u,y) q̃(dy|u,x,f(u,x)) + (e^{−αu} c(u,x,f(u,x)) − q_{(u,x)}(f(u,x))) L(u,x)

for each u ∈ [0,∞) and x ∈ S.

Proof.
We prove this by reformulating the nonhomogeneous version of the α-discounted risk-sensitive CTMDP problem (10) in the form of problem (8) for a PDMDP, which we introduce as follows. We use the notation "hat" to distinguish this model from the original (nonhomogeneous) CTMDP model.

• The state space is Ŝ = [0,∞) × S.

• The action space is the same as in the CTMDP: Â = A.

• The transition rate q̂(ds × dy|(t,x), a) is defined by

q̂(ds × dy|(t,x), a) := q̂̃(ds × dy|(t,x), a) − I{(t,x) ∈ ds × dy} q_{(t,x)}(a),

where q̂̃(ds × dy|(t,x), a) := I{t ∈ ds} q̃(dy|t,x,a), for each (t,x) ∈ Ŝ and a ∈ Â.

• The drift is given by φ̂((t,x), s) := (t+s, x) for each x ∈ S and t, s ≥ 0. Clearly it satisfies the corresponding version of (4).

• The cost rate is given by ĉ((t,x), a) := e^{−αt} c(t,x,a) for all t ∈ [0,∞), x ∈ S, a ∈ A.

The marked point process {t̂_n, x̂_n} and the controlled process ξ̂_t in this PDMDP model are connected to those in the original (nonhomogeneous) CTMDP model, namely (t_n, x_n) and ξ_t, via t̂_n = t_n, x̂_n = (t_n, x_n) and ξ̂_t = (t, ξ_t). For example, under a fixed policy π̂ and initial distribution γ̂ in this PDMDP model, the version of the first equation in (7) now reads, on {ω : x_n(ω) ∈ S},

P̂^{π̂}_{γ̂}(θ̂_{n+1} ∈ Γ_1, x̂_{n+1} ∈ Γ_2 × Γ_3 | x̂_0, θ̂_1, x̂_1, ..., θ̂_n, x̂_n)
= ∫_{Γ_1} e^{−∫_0^t ∫_A q_{(t_n+s, x_n)}(a) π̂_n(da|x̂_0, θ̂_1, ..., θ̂_n, x̂_n, s) ds} ∫_A I{t + t_n ∈ Γ_2} q̃(Γ_3|t + t_n, x_n, a) π̂_n(da|x̂_0, θ̂_1, ..., θ̂_n, x̂_n, t) dt, ∀ Γ_1 ∈ B((0,∞)), Γ_2 ∈ B([0,∞)), Γ_3 ∈ B(S).

Clearly, Conditions 2.1, 2.2 and 2.3 are satisfied by this PDMDP model. It remains to apply Theorem 3.1. ✷

The condition in the previous corollary is much weaker than in [14], and can be further weakened; one only needs the reformulated PDMDP to satisfy Conditions 2.1, 2.2 and 2.3.
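The bookkeeping behind this reformulation can be checked numerically in the simplest instance (a hypothetical uncontrolled model, not from the paper): a single state with no jumps and constant cost rate c. The discounted criterion (10) then equals exp(∫_0^∞ e^{−αt} c dt) = exp(c/α), while the augmented PDMDP accumulates the undiscounted cost rate ĉ((t,x),a) = e^{−αt} c along the drift φ̂((t,x),s) = (t+s, x); the two totals agree:

```python
import math

# Toy uncontrolled check of the augmentation bookkeeping: one state, no jumps,
# constant cost rate c. The discounted criterion totals c / alpha, while the
# augmented PDMDP accumulates the undiscounted cost rate
# chat((t, x), a) = exp(-alpha * t) * c along the drift (t, x) -> (t + s, x).
alpha, c = 1.0, 0.5

def chat(t):
    # cost rate of the augmented PDMDP at the augmented state (t, x)
    return math.exp(-alpha * t) * c

# integrate chat along the drift with the trapezoid rule (the tail beyond
# T = 40 is negligible for this decay rate)
dt, T = 1e-3, 40.0
total = sum(dt * (chat(k * dt) + chat((k + 1) * dt)) / 2
            for k in range(int(round(T / dt))))
# total matches the discounted total cost c / alpha
```

The same accounting underlies the general case: the discounting in (10) is absorbed into the cost rate of the time-augmented model, so the augmented problem is a genuinely undiscounted instance of (8).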
Moreover, the boundedness of the cost rate c was assumed in the previous corollary only to ensure that Condition 2.3 is satisfied. It can be relaxed if one formulates the previous corollary using the statements in Remarks 3.1 and 3.2.

One can also consider the risk-sensitive nonhomogeneous CTMDP problem on the finite horizon [0,T] with T > 0:

Minimize over π ∈ Π: E^π_x[e^{∫_0^T e^{−αt} ∫_A c(t,ξ_t,a) π(da|ω,t) dt + g(ξ_T)}], x ∈ S,

where g is a [0,∞)-valued measurable function; g(x) represents the terminal cost incurred when ξ_T = x ∈ S. Let us put g(x_∞) := 0. Here α is a fixed nonnegative finite constant. A simpler version of this problem was considered in [24] with α = 0 and a bounded cost rate, where additional restrictions were put on the growth of the transition rate. We can reformulate this problem into the PDMDP problem (8) just as in the above. The only difference is that now we put q_{(t,x)}(a) ≡ 0 for each x ∈ S and t ≥ T, and introduce the following cost rate for each x ∈ S, t ≥ 0 and a ∈ A:

ĉ((t,x), a) := e^{−αt} c(t,x,a), if t ≤ T; ĉ((t,x), a) := e^{−(t−T)} g(x), if t > T.

Indeed, after T the state is frozen and ∫_T^∞ e^{−(t−T)} dt = 1, so the post-horizon running cost contributes exactly g(ξ_T) to the total cost.

For the rest of this paper, it is convenient to introduce the following notations. Let P(A) be the space of probability measures on B(A), endowed with the standard weak topology. For each μ ∈ P(A),

q_x(μ) := ∫_A q_x(a) μ(da), q̃(dy|x,μ) := ∫_A q̃(dy|x,a) μ(da), c(x,μ) := ∫_A c(x,a) μ(da).

Let R denote the set of (Borel) measurable mappings ρ_t(da) from t ∈ (0,∞) to P(A). Here, we do not distinguish two measurable mappings in t ∈ (0,∞) which coincide almost everywhere with respect to the Lebesgue measure. Let us equip R with the Young topology, which is the weakest topology with respect to which the mapping ρ ∈ R → ∫_0^∞ ∫_A f(t,a) ρ_t(da) dt is continuous for each strongly integrable Carathéodory function f on (0,∞) × A.
Here a real-valued measurable function f on (0,∞) × A is called a strongly integrable Carathéodory function if for each fixed t ∈ (0,∞), f(t,a) is continuous in a ∈ A, and sup_{a∈A} |f(t,a)| is integrable in t, i.e., ∫_0^∞ sup_{a∈A} |f(t,a)| dt < ∞. It is known that if A is a compact Borel space, then so is R; see Chapter 4 of [10].

Lemma 4.1 Suppose Conditions 2.1 and 2.2 are satisfied. Then the following assertions hold.

(a) The value function V* is the minimal [1,∞]-valued measurable solution to

V*(x) = inf_{ρ∈R} { ∫_0^∞ e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ( ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) ) dτ + e^{−∫_0^∞ q_{φ(x,s)}(ρ_s) ds} e^{∫_0^∞ c(φ(x,s),ρ_s) ds} }, ∀ x ∈ S.

(b) The mapping

ρ ∈ R → W(x,ρ) := ∫_0^∞ e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ( ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) ) dτ + e^{−∫_0^∞ q_{φ(x,s)}(ρ_s) ds} e^{∫_0^∞ c(φ(x,s),ρ_s) ds}

is lower semicontinuous for each x ∈ S.

Proof.
One can legitimately consider the following DTMDP (discrete-time Markov decision process); according to Lemma 2.29 of [9], all the involved mappings are measurable.

• The state space is X := ((0,∞) × S) ∪ {(∞, x_∞)}. Whenever the topology is concerned, (∞, x_∞) is regarded as an isolated point in X.

• The action space is A := R.

• The transition kernel p on B(X) from X × A, c.f. (7), is given for each ρ ∈ A by

p(Γ_1 × Γ_2 | (θ,x), ρ) := ∫_{Γ_1} e^{−∫_0^t q_{φ(x,s)}(ρ_s) ds} q̃(Γ_2|φ(x,t), ρ_t) dt, ∀ Γ_2 ∈ B(S), Γ_1 ∈ B((0,∞)), x ∈ S, θ ∈ (0,∞);
p({(∞, x_∞)} | (θ,x), ρ) := e^{−∫_0^∞ q_{φ(x,s)}(ρ_s) ds}, ∀ x ∈ S, θ ∈ (0,∞); p({(∞, x_∞)} | (∞, x_∞), ρ) := 1.

• The cost function l is a [0,∞]-valued measurable function on X × A × X given by

l((θ,x), ρ, (τ,y)) := ∫_0^∞ I{s < τ} c(φ(x,s), ρ_s) ds, ∀ ((θ,x), ρ, (τ,y)) ∈ X × A × X.

The relevant facts and statements for the DTMDP are included in the Appendix. One can show that under Conditions 2.1 and 2.2, for each (θ,x) ∈ X, a ∈ A → ∫_X f(z) p(dz|(θ,x), a) is continuous for each bounded measurable function f on X; for each (θ,x) ∈ X and (τ,y) ∈ X, ρ ∈ A → l((θ,x), ρ, (τ,y)) is lower semicontinuous; and A is a compact Borel space. Hence, Condition A.1 for the DTMDP model {X, A, p, l} is satisfied.

The controlled process in the above DTMDP model {X, A, p, l} is denoted by {Y_n, n = 0, 1, ...}, where Y_n = (Θ_n, X_n), and the controlling process is denoted by {A_n, n = 0, 1, ...}. For n ≥ 1, Θ_n and X_n correspond to the nth sojourn time and the post-jump state in the PDMDP, Θ_0 is fictitious, and X_0 is the initial state in the PDMDP. Let Σ be the class of all strategies for the DTMDP model {X, A, p, l}, and Σ_DM be the class of deterministic Markov strategies of the form σ = (ϕ_n) where ϕ_0((θ,x)) does not depend on θ ∈ (0,∞) for each x ∈ S.
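The multiplicative structure of the reduced problem, minimizing E[e^{Σ_n l(Y_n, A_n, Y_{n+1})}], is easiest to see on a finite toy DTMDP (invented numbers; unlike the model {X, A, p, l} above, whose action space is the relaxed-control space R, the toy has two plain actions). The optimality operator multiplies the one-stage factor e^{cost} into the expectation, and iterating it from V^{(0)} ≡ 1 converges to the fixed point:

```python
import math

# Invented two-state absorbing DTMDP; state 1 absorbs at zero cost (V(1) = 1).
# From state 0: action "a" costs 1.0 and moves to 1 surely; action "b" costs
# 0.3, stays in 0 with probability 0.5 and moves to 1 with probability 0.5.
P = {"a": {1: 1.0}, "b": {0: 0.5, 1: 0.5}}
COST = {"a": 1.0, "b": 0.3}

def bellman(V):
    # Multiplicative optimality operator for minimizing E[exp(sum of costs)]:
    # V(0) = min_a e^{cost(a)} * sum_y p(y|0,a) V(y).
    v0 = min(math.exp(COST[a]) * sum(p * V[y] for y, p in P[a].items())
             for a in ("a", "b"))
    return {0: v0, 1: 1.0}

V = {0: 1.0, 1: 1.0}      # start from V^(0) = 1, as in the value iteration
for _ in range(80):
    V = bellman(V)
```

Action b is optimal here: the fixed point is 0.5e^{0.3}/(1 − 0.5e^{0.3}) ≈ 2.08, below the value e^{1.0} ≈ 2.72 of action a, illustrating that the risk-sensitive choice weighs the whole distribution of the remaining cost rather than its mean alone.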
We preserve the term "policy" for the PDMDP and the term "strategy" for the DTMDP.

According to Proposition A.1, the function

(θ,x) ∈ X → V*((θ,x)) := inf_{σ∈Σ} E^σ_{(θ,x)}[e^{Σ_{n=0}^∞ l(Y_n, A_n, Y_{n+1})}]

is the minimal [1,∞]-valued measurable solution to the optimality equation

V*((θ,x)) = inf_{ρ∈R} { ∫_0^∞ e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ( ∫_S V*((τ,y)) q̃(dy|φ(x,τ), ρ_τ) ) dτ + e^{−∫_0^∞ q_{φ(x,s)}(ρ_s) ds} e^{∫_0^∞ c(φ(x,s),ρ_s) ds} }

for each x ∈ S and θ ∈ (0,∞); this is just (20). Furthermore, by Proposition A.1, there exists a deterministic stationary strategy σ* for the DTMDP such that σ*((θ,x)) attains the above infimum for each x ∈ S and θ ∈ (0,∞), and any such strategy σ* verifies

E^{σ*}_{(θ,x)}[e^{Σ_{n=0}^∞ l(Y_n, A_n, Y_{n+1})}] = inf_{σ∈Σ} E^σ_{(θ,x)}[e^{Σ_{n=0}^∞ l(Y_n, A_n, Y_{n+1})}], ∀ (θ,x) ∈ X.

Let θ̂ ∈ (0,∞) be arbitrarily fixed. The function V*((θ,x)) being measurable in (θ,x) ∈ X, it follows that x ∈ S → V*((θ̂,x)) is measurable. The strategy σ* and the constant θ̂ induce a deterministic Markov strategy σ** = (ϕ_n) ∈ Σ_DM, where ϕ_0((θ,x)) := σ*((θ̂,x)) for each θ ∈ (0,∞), x ∈ S, and ϕ_n((θ,x)) := σ*((θ,x)) for each n ≥ 1, θ ∈ (0,∞), x ∈ S. (The control at the isolated point (∞, x_∞) is irrelevant, and we do not specify the definition of the strategy at that point.) This strategy can be identified with a policy π* in the PDMDP, c.f. (6). On the other hand, each policy π = (π_n) can be identified with a deterministic strategy in this DTMDP. Thus,

V*(x) ≥ V*((θ̂,x)) = E^{σ*}_{(θ̂,x)}[e^{Σ_{n=0}^∞ l(Y_n, A_n, Y_{n+1})}] = E^{σ**}_{(θ̂,x)}[e^{Σ_{n=0}^∞ l(Y_n, A_n, Y_{n+1})}] = V(x, π*) ≥ V*(x)

for each x ∈ S. Consequently, the policy π* is optimal, and V*(x) = V*((θ,x)) for each x ∈ S and θ ∈ (0,∞); recall that θ̂ was arbitrarily fixed. The statement of this lemma now follows.
✷

The policy π* in the proof of the previous lemma is actually optimal for problem (8). However, it need not be a deterministic or stationary policy. Also, the reduction of the risk-sensitive PDMDP problem (8) to a risk-sensitive problem for the DTMDP model {X, A, p, l}, as seen in the proof of the above lemma, will be used without special reference in what follows.

Lemma 4.2
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. For each x ∈ S and ρ ∈ R,

t ∈ [0,∞) → ∫_0^t e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) dτ + e^{−∫_0^t (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t))

is monotone nondecreasing in t ∈ [0,∞).

Proof. Let 0 ≤ t_1 < t_2 < ∞ be arbitrarily fixed. We need to show

∫_0^{t_2} e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) dτ + e^{−∫_0^{t_2} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t_2))
≥ ∫_0^{t_1} e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) dτ + e^{−∫_0^{t_1} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t_1)). (11)

It is without loss of generality to assume

∫_0^{t_2} e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) dτ < ∞.

Then all the four terms in (11) are nonnegative and finite, and (11) is equivalent to

∫_{t_1}^{t_2} e^{−∫_0^τ (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ∫_S V*(y) q̃(dy|φ(x,τ), ρ_τ) dτ + e^{−∫_0^{t_1} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ( e^{−∫_{t_1}^{t_2} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t_2)) − V*(φ(x,t_1)) )
= { ∫_0^{t_2−t_1} e^{−∫_0^τ (q_{φ(x,s+t_1)}(ρ_{s+t_1}) − c(φ(x,s+t_1),ρ_{s+t_1})) ds} ∫_S V*(y) q̃(dy|φ(x,t_1+τ), ρ_{t_1+τ}) dτ + e^{−∫_{t_1}^{t_2} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t_2)) − V*(φ(x,t_1)) } e^{−∫_0^{t_1} (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} ≥ 0, (12)

which is verified as follows.
Let δ > 0, and take ν̂ ∈ R (such a ν̂ exists by Lemma 4.1(a)) such that

V*(φ(x,t_2)) + δ ≥ ∫_0^∞ ∫_S V*(y) q̃(dy|φ(x,t_2+τ), ν̂_τ) e^{−∫_0^τ (q_{φ(x,t_2+s)}(ν̂_s) − c(φ(x,t_2+s),ν̂_s)) ds} dτ + e^{−∫_0^∞ q_{φ(x,t_2+s)}(ν̂_s) ds} e^{∫_0^∞ c(φ(x,t_2+s),ν̂_s) ds}.

(Recall that φ(x, t_1 + t_2) = φ(φ(x,t_1), t_2) for each t_1, t_2 ≥ 0.) Consider ν̃ ∈ R defined by

ν̃_s := ρ_{t_1+s}, if s ≤ t_2 − t_1; ν̃_s := ν̂_{s−(t_2−t_1)}, if s > t_2 − t_1.

Then

V*(φ(x,t_1)) ≤ ∫_0^{t_2−t_1} e^{−∫_0^τ (q_{φ(x,t_1+s)}(ν̃_s) − c(φ(x,t_1+s),ν̃_s)) ds} ( ∫_S V*(y) q̃(dy|φ(x,t_1+τ), ν̃_τ) ) dτ + ∫_{t_2−t_1}^∞ e^{−∫_0^τ (q_{φ(x,t_1+s)}(ν̃_s) − c(φ(x,t_1+s),ν̃_s)) ds} ( ∫_S V*(y) q̃(dy|φ(x,t_1+τ), ν̃_τ) ) dτ + e^{−∫_0^{t_2−t_1} (q_{φ(x,t_1+s)}(ν̃_s) − c(φ(x,t_1+s),ν̃_s)) ds} e^{−∫_{t_2−t_1}^∞ q_{φ(x,t_1+s)}(ν̃_s) ds} e^{∫_{t_2−t_1}^∞ c(φ(x,t_1+s),ν̃_s) ds}
= ∫_0^{t_2−t_1} e^{−∫_0^τ (q_{φ(x,t_1+s)}(ρ_{s+t_1}) − c(φ(x,t_1+s),ρ_{s+t_1})) ds} ∫_S V*(y) q̃(dy|φ(x,t_1+τ), ρ_{t_1+τ}) dτ + e^{−∫_0^{t_2−t_1} (q_{φ(x,t_1+s)}(ρ_{s+t_1}) − c(φ(x,t_1+s),ρ_{s+t_1})) ds} × { ∫_0^∞ e^{−∫_0^τ (q_{φ(x,t_2+s)}(ν̂_s) − c(φ(x,t_2+s),ν̂_s)) ds} ∫_S V*(y) q̃(dy|φ(x,t_2+τ), ν̂_τ) dτ + e^{−∫_0^∞ q_{φ(x,t_2+s)}(ν̂_s) ds} e^{∫_0^∞ c(φ(x,t_2+s),ν̂_s) ds} }
≤ ∫_0^{t_2−t_1} e^{−∫_0^τ (q_{φ(x,t_1+s)}(ρ_{s+t_1}) − c(φ(x,t_1+s),ρ_{s+t_1})) ds} ∫_S V*(y) q̃(dy|φ(x,t_1+τ), ρ_{t_1+τ}) dτ + e^{−∫_0^{t_2−t_1} (q_{φ(x,t_1+s)}(ρ_{s+t_1}) − c(φ(x,t_1+s),ρ_{s+t_1})) ds} ( V*(φ(x,t_2)) + δ ).

Since δ > 0 was arbitrarily fixed, the expression in braces in (12) is nonnegative, and (11) follows. ✷

Lemma 4.3
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. For each x ∈ S, there is some ρ* ∈ R such that

V*(x) = inf_{ρ∈R} { ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ_v) − c(φ(x,v),ρ_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t)) }
= ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ*_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} V*(φ(x,t)), ∀ t ≥ 0. (13)

Proof.
Let x ∈ S be fixed, and let ρ* ∈ R be such that V*(x) = W(x, ρ*), see Lemma 4.1. Suppose t ∈ [0,∞) is arbitrarily fixed. Consider ρ̃ ∈ R defined by ρ̃_s := ρ*_{t+s} for each s > 0. Then

V*(x) = ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ*_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} × { ∫_0^∞ e^{−∫_0^τ (q_{φ(x,t+s)}(ρ̃_s) − c(φ(x,t+s),ρ̃_s)) ds} ∫_S V*(y) q̃(dy|φ(x,t+τ), ρ̃_τ) dτ + e^{−∫_0^∞ q_{φ(x,t+s)}(ρ̃_s) ds} e^{∫_0^∞ c(φ(x,t+s),ρ̃_s) ds} }
≥ ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ*_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} V*(φ(x,t));

recall (4). On the other hand, by Lemma 4.2,

V*(x) ≤ inf_{ρ∈R} { ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ_v) − c(φ(x,v),ρ_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ_s) − c(φ(x,s),ρ_s)) ds} V*(φ(x,t)) }.

The statement of this lemma is thus proved. ✷

Lemma 4.4
Suppose Conditions 2.1, 2.2 and 2.3 are satisfied. Then for each x ∈ S, t ∈ [0,∞) → V*(φ(x,t)) is absolutely continuous.

Proof. This immediately follows from Lemma 4.3. ✷

Proof of Theorem 3.1. (a) Under Conditions 2.1, 2.2 and 2.3, by Lemma 4.4, for each x ∈ S, let t ∈ [0,∞) → U*(x,t) be an integrable real-valued function such that U*(x,t) coincides with the derivative of t ∈ [0,∞) → V*(φ(x,t)) almost everywhere. Let x ∈ S and t ∈ [0,∞) be fixed, and let ρ* ∈ R be from Lemma 4.3. By Lemmas 4.3 and 4.4,

∫_0^τ e^{−∫_0^s (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ*_s) ds and e^{−∫_0^τ (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} V*(φ(x,τ))

are absolutely continuous in τ and are finite for each τ ∈ [0,∞). Since φ(x,0) = x, see (4),

e^{−∫_0^t (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} V*(φ(x,t)) − V*(x) = ∫_0^t e^{−∫_0^τ (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} { U*(x,τ) − (q_{φ(x,τ)}(ρ*_τ) − c(φ(x,τ),ρ*_τ)) V*(φ(x,τ)) } dτ.

Now by Lemma 4.3,

0 = ∫_0^t e^{−∫_0^s (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} ∫_S V*(y) q̃(dy|φ(x,s), ρ*_s) ds + e^{−∫_0^t (q_{φ(x,s)}(ρ*_s) − c(φ(x,s),ρ*_s)) ds} V*(φ(x,t)) − V*(x)
= ∫_0^t e^{−∫_0^τ (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} { ∫_S V*(y) q̃(dy|φ(x,τ), ρ*_τ) + U*(x,τ) − (q_{φ(x,τ)}(ρ*_τ) − c(φ(x,τ),ρ*_τ)) V*(φ(x,τ)) } dτ
≥ ∫_0^t e^{−∫_0^τ (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} { U*(x,τ) + inf_{a∈A} { ∫_S V*(y) q̃(dy|φ(x,τ), a) − (q_{φ(x,τ)}(a) − c(φ(x,τ),a)) V*(φ(x,τ)) } } dτ
= ∫_0^t e^{−∫_0^τ (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} { U*(x,τ) + ∫_S V*(y) q̃(dy|φ(x,τ), f(φ(x,τ))) − (q_{φ(x,τ)}(f(φ(x,τ))) − c(φ(x,τ), f(φ(x,τ)))) V*(φ(x,τ)) } dτ, (14)

where f is a measurable mapping from S to A such that

inf_{a∈A} { ∫_S V*(y) q̃(dy|x,a) − (q_x(a) − c(x,a)) V*(x) } = ∫_S V*(y) q̃(dy|x, f(x)) − (q_x(f(x)) − c(x, f(x))) V*(x)

for each x ∈ S; the existence of such a mapping follows from a well known measurable selection theorem, c.f. Proposition D.5 of [17].

Note that e^{−∫_0^τ (q_{φ(x,v)}(ρ_v) − c(φ(x,v),ρ_v)) dv} is bounded and separated from zero in τ ∈ [0,t] for each ρ ∈ R; recall Condition 2.2. So

∫_0^t e^{−∫_0^τ (q_{φ(x,v)}(ρ*_v) − c(φ(x,v),ρ*_v)) dv} { U*(x,τ) − (q_{φ(x,τ)}(f(φ(x,τ))) − c(φ(x,τ), f(φ(x,τ)))) V*(φ(x,τ)) } dτ

is finite.
If
\[
\int_0^t\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))\,d\tau=\infty,
\]
then
\[
\int_0^t e^{-\int_0^\tau(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\biggl\{U^*(x,\tau)+\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))
-(q_{\phi(x,\tau)}(f(\phi(x,\tau)))-c(\phi(x,\tau),f(\phi(x,\tau))))V^*(\phi(x,\tau))\biggr\}\,d\tau=\infty,
\]
which contradicts (14). Therefore,
\[
\int_0^t\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))\,d\tau<\infty.
\]
Then
\[
\int_0^v e^{-\int_0^\tau(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))\,d\tau
+e^{-\int_0^v(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\,V^*(\phi(x,v))
\]
is absolutely continuous on [0, t]. After legitimately differentiating the above expression with respect to v, and applying Lemma 4.2, we see
\[
U^*(x,v)+\int_S V^*(y)\,\tilde q(dy|\phi(x,v),f(\phi(x,v)))-(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))V^*(\phi(x,v))\ge 0
\]
for almost every v ∈ [0, t]. This and (14) imply
\[
U^*(x,\tau)+\inf_{a\in A}\biggl\{\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),a)-(q_{\phi(x,\tau)}(a)-c(\phi(x,\tau),a))V^*(\phi(x,\tau))\biggr\}=0
\]
almost everywhere in τ ∈ [0, t]. Recall that t ∈ [0, ∞) was arbitrarily fixed. The first part of (a) is thus verified; we postpone the justification of the second part of (a) until after the proof of part (b).

(b) We use the same notation as above. Note that
\[
\lim_{t\to\infty}\Bigl\{e^{-\int_0^t(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\Bigr\}
\ge e^{-\int_0^\infty q_{\phi(x,s)}(f(\phi(x,s)))\,ds}\,e^{\int_0^\infty c(\phi(x,s),f(\phi(x,s)))\,ds}.
\tag{15}
\]
Indeed, if either \int_0^\infty q_{\phi(x,s)}(f(\phi(x,s)))\,ds or \int_0^\infty c(\phi(x,s),f(\phi(x,s)))\,ds is finite, then equality takes place in the above inequality; and if both are infinite, then the right-hand side of the inequality is zero according to (1).

In the proof of part (a), it was observed that
\[
\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\int_S V^*(y)\,\tilde q(dy|\phi(x,s),f(\phi(x,s)))\,ds
\quad\text{and}\quad
e^{-\int_0^t(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\,V^*(\phi(x,t))
\]
are absolutely continuous in t and are thus finite for each t ∈ [0, ∞). As in the proof of part (a), similar calculations to those in (14) imply that for each t ∈ [0, ∞),
\[
\begin{aligned}
&\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\int_S V^*(y)\,\tilde q(dy|\phi(x,s),f(\phi(x,s)))\,ds\\
&\qquad+e^{-\int_0^t(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\,V^*(\phi(x,t))-V^*(x)\\
&\quad=\int_0^t e^{-\int_0^\tau(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\biggl\{U^*(x,\tau)+\int_S V^*(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))\\
&\qquad\qquad-(q_{\phi(x,\tau)}(f(\phi(x,\tau)))-c(\phi(x,\tau),f(\phi(x,\tau))))V^*(\phi(x,\tau))\biggr\}\,d\tau=0,
\end{aligned}
\]
where the last equality is by what was established in part (a). Therefore, for each t ∈ [0, ∞),
\[
V^*(x)-\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\int_S V^*(y)\,\tilde q(dy|\phi(x,s),f(\phi(x,s)))\,ds
=e^{-\int_0^t(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds}\,V^*(\phi(x,t))
\ge e^{-\int_0^t(q_{\phi(x,s)}(f(\phi(x,s)))-c(\phi(x,s),f(\phi(x,s))))\,ds},
\]
where the inequality holds because V*(x) ≥ 1 for each x ∈ S.
Letting t → ∞ on both sides of the previous equality yields
\[
V^*(x)-\int_0^\infty e^{-\int_0^s(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\int_S V^*(y)\,\tilde q(dy|\phi(x,s),f(\phi(x,s)))\,ds
\ge e^{-\int_0^\infty q_{\phi(x,s)}(f(\phi(x,s)))\,ds}\,e^{\int_0^\infty c(\phi(x,s),f(\phi(x,s)))\,ds},
\]
with the inequality following from (15). Hence
\[
V^*(x)\ge\int_0^\infty e^{-\int_0^s(q_{\phi(x,v)}(f(\phi(x,v)))-c(\phi(x,v),f(\phi(x,v))))\,dv}\int_S V^*(y)\,\tilde q(dy|\phi(x,s),f(\phi(x,s)))\,ds
+e^{-\int_0^\infty q_{\phi(x,s)}(f(\phi(x,s)))\,ds}\,e^{\int_0^\infty c(\phi(x,s),f(\phi(x,s)))\,ds}
=W(x,\tilde f^x)\ge V^*(x).
\]
Here it is clear that s ∈ [0, ∞) → f(φ(x, s)) can be identified with an element of R, denoted by f̃^x. In fact, f̃^x_s = δ_{f(φ(x,s))} for each s ∈ [0, ∞), and x ∈ S → f̃^x ∈ R is measurable. This measurable mapping x ∈ S → f̃^x ∈ R defines a deterministic stationary optimal strategy for the risk-sensitive DTMDP problem (20) by Proposition A.1. It is clear that the measurable mapping x ∈ S → f(x) ∈ A defines an optimal deterministic stationary policy for the PDMDP problem (8).

Finally, we show the remaining part of (a). Let H* be a measurable [1, ∞)-valued function on S such that
\[
-(H^*(\phi(x,t))-H^*(x))
=\int_0^t\inf_{a\in A}\biggl\{\int_S H^*(y)\,\tilde q(dy|\phi(x,\tau),a)-(q_{\phi(x,\tau)}(a)-c(\phi(x,\tau),a))H^*(\phi(x,\tau))\biggr\}\,d\tau,
\quad t\in[0,\infty),\ x\in S.
\]
There exists a measurable mapping h from S to A such that
\[
\inf_{a\in A}\biggl\{\int_S H^*(y)\,\tilde q(dy|x,a)-(q_x(a)-c(x,a))H^*(x)\biggr\}
=\int_S H^*(y)\,\tilde q(dy|x,h(x))-(q_x(h(x))-c(x,h(x)))H^*(x),\qquad\forall\, x\in S;
\]
cf. Proposition D.5 of [17]. It follows that \int_0^s\int_S H^*(y)\,\tilde q(dy|\phi(x,\tau),h(\phi(x,\tau)))\,d\tau is absolutely continuous in s ∈ [0, t] for each t ≥ 0.
As in the proof of part (b),
\[
\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(h(\phi(x,v)))-c(\phi(x,v),h(\phi(x,v))))\,dv}\int_S H^*(y)\,\tilde q(dy|\phi(x,s),h(\phi(x,s)))\,ds
+e^{-\int_0^t(q_{\phi(x,s)}(h(\phi(x,s)))-c(\phi(x,s),h(\phi(x,s))))\,ds}\,H^*(\phi(x,t))-H^*(x)=0,\qquad\forall\, t\in[0,\infty),
\]
and by passing to the lower limit as t → ∞,
\[
\begin{aligned}
H^*(x)&\ge\int_0^\infty e^{-\int_0^s(q_{\phi(x,v)}(h(\phi(x,v)))-c(\phi(x,v),h(\phi(x,v))))\,dv}\int_S H^*(y)\,\tilde q(dy|\phi(x,s),h(\phi(x,s)))\,ds\\
&\qquad+e^{-\int_0^\infty q_{\phi(x,s)}(h(\phi(x,s)))\,ds}\,e^{\int_0^\infty c(\phi(x,s),h(\phi(x,s)))\,ds}\\
&\ge\inf_{\rho\in\mathcal R}\biggl\{\int_0^\infty e^{-\int_0^\tau(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\biggl(\int_S H^*(y)\,\tilde q(dy|\phi(x,\tau),\rho_\tau)\biggr)\,d\tau
+e^{-\int_0^\infty q_{\phi(x,s)}(\rho_s)\,ds}\,e^{\int_0^\infty c(\phi(x,s),\rho_s)\,ds}\biggr\},\qquad\forall\, x\in S.
\end{aligned}
\tag{16}
\]
It remains to refer to Proposition A.1 to conclude that H*(x) ≥ V*(x) for each x ∈ S. ✷

Proof of Theorem 3.2. Let V*_0(x) := 1 for each x ∈ S. For each n ≥ 0, one can legitimately define
\[
V^*_{n+1}(x)=\inf_{\rho\in\mathcal R}\biggl\{\int_0^\infty e^{-\int_0^\tau(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\biggl(\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),\rho_\tau)\biggr)\,d\tau
+e^{-\int_0^\infty q_{\phi(x,s)}(\rho_s)\,ds}\,e^{\int_0^\infty c(\phi(x,s),\rho_s)\,ds}\biggr\},\qquad\forall\, x\in S.
\tag{17}
\]
Recall that the DTMDP model {X, A, p, l} satisfies Condition A.1, as noted in the proof of Lemma 4.1. Then by Proposition A.1, {V*_n} is a monotone nondecreasing sequence of [1, ∞)-valued measurable functions on S such that V*_n(x) ↑ V*(x) as n ↑ ∞, for each x ∈ S. Let n ≥ 0 be fixed. For each x ∈ S, there is some ρ* ∈ R such that
\[
\begin{aligned}
V^*_{n+1}(x)&=\inf_{\rho\in\mathcal R}\biggl\{\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(\rho_v)-c(\phi(x,v),\rho_v))\,dv}\int_S V^*_n(y)\,\tilde q(dy|\phi(x,s),\rho_s)\,ds
+e^{-\int_0^t(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\,V^*_{n+1}(\phi(x,t))\biggr\}\\
&=\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\int_S V^*_n(y)\,\tilde q(dy|\phi(x,s),\rho^*_s)\,ds
+e^{-\int_0^t(q_{\phi(x,s)}(\rho^*_s)-c(\phi(x,s),\rho^*_s))\,ds}\,V^*_{n+1}(\phi(x,t)),\qquad\forall\, t\ge 0.
\end{aligned}
\]
For each x ∈ S and ρ ∈ R,
\[
t\in[0,\infty)\ \to\ \int_0^t e^{-\int_0^\tau(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),\rho_\tau)\,d\tau
+e^{-\int_0^t(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\,V^*_{n+1}(\phi(x,t))
\]
is monotone nondecreasing in t ∈ [0, ∞). Clearly, V*_{n+1}(φ(x, t)) is absolutely continuous in t ∈ [0, ∞) for each x ∈ S.

Corresponding to (14), we now have
\[
\begin{aligned}
0&=\int_0^t e^{-\int_0^s(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\int_S V^*_n(y)\,\tilde q(dy|\phi(x,s),\rho^*_s)\,ds
+e^{-\int_0^t(q_{\phi(x,s)}(\rho^*_s)-c(\phi(x,s),\rho^*_s))\,ds}\,V^*_{n+1}(\phi(x,t))-V^*_{n+1}(x)\\
&=\int_0^t e^{-\int_0^\tau(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\biggl\{\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),\rho^*_\tau)+U^*_{n+1}(x,\tau)
-(q_{\phi(x,\tau)}(\rho^*_\tau)-c(\phi(x,\tau),\rho^*_\tau))V^*_{n+1}(\phi(x,\tau))\biggr\}\,d\tau\\
&\ge\int_0^t e^{-\int_0^\tau(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\biggl\{U^*_{n+1}(x,\tau)
+\inf_{a\in A}\biggl\{\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),a)-(q_{\phi(x,\tau)}(a)-c(\phi(x,\tau),a))V^*_{n+1}(\phi(x,\tau))\biggr\}\biggr\}\,d\tau\\
&=\int_0^t e^{-\int_0^\tau(q_{\phi(x,v)}(\rho^*_v)-c(\phi(x,v),\rho^*_v))\,dv}\biggl\{U^*_{n+1}(x,\tau)+\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),f(\phi(x,\tau)))\\
&\qquad\qquad-(q_{\phi(x,\tau)}(f(\phi(x,\tau)))-c(\phi(x,\tau),f(\phi(x,\tau))))V^*_{n+1}(\phi(x,\tau))\biggr\}\,d\tau,
\end{aligned}
\]
where τ ∈ [0, t] → U*_{n+1}(x, τ) is integrable and coincides with ∂V*_{n+1}(φ(x, t))/∂t almost everywhere, and f is some measurable mapping from S to A, whose existence is guaranteed by Proposition D.5 of [17].
Continuing from the above relation, the reasoning in the proof of the first assertion in part (a) of Theorem 3.1 can be followed: eventually we see that
\[
U^*_{n+1}(x,\tau)+\inf_{a\in A}\biggl\{\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),a)-(q_{\phi(x,\tau)}(a)-c(\phi(x,\tau),a))V^*_{n+1}(\phi(x,\tau))\biggr\}=0
\]
almost everywhere in τ ∈ [0, t], i.e., the equation
\[
-(V(\phi(x,t))-V(x))
=\int_0^t\inf_{a\in A}\biggl\{\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),a)-(q_{\phi(x,\tau)}(a)-c(\phi(x,\tau),a))V(\phi(x,\tau))\biggr\}\,d\tau,
\quad t\in[0,\infty),\ x\in S,
\tag{18}
\]
is satisfied by V = V*_{n+1}.

Recall that V*_0 = V^{(0)}. Suppose the recursive definition in (9) is valid up to step n, and V*_n(x) = V^{(n)}(x) for each x ∈ S. Consider an arbitrarily fixed [1, ∞)-valued measurable solution V to (18), and let f* be a measurable mapping from S to A such that
\[
\inf_{a\in A}\biggl\{\int_S V^*_n(y)\,\tilde q(dy|x,a)-(q_x(a)-c(x,a))V(x)\biggr\}
=\int_S V^*_n(y)\,\tilde q(dy|x,f^*(x))-(q_x(f^*(x))-c(x,f^*(x)))V(x),\qquad\forall\, x\in S.
\]
Then
\[
\begin{aligned}
V(x)&\ge\int_0^\infty e^{-\int_0^s(q_{\phi(x,v)}(f^*(\phi(x,v)))-c(\phi(x,v),f^*(\phi(x,v))))\,dv}\int_S V^*_n(y)\,\tilde q(dy|\phi(x,s),f^*(\phi(x,s)))\,ds\\
&\qquad+e^{-\int_0^\infty q_{\phi(x,s)}(f^*(\phi(x,s)))\,ds}\,e^{\int_0^\infty c(\phi(x,s),f^*(\phi(x,s)))\,ds}\\
&\ge\inf_{\rho\in\mathcal R}\biggl\{\int_0^\infty e^{-\int_0^\tau(q_{\phi(x,s)}(\rho_s)-c(\phi(x,s),\rho_s))\,ds}\biggl(\int_S V^*_n(y)\,\tilde q(dy|\phi(x,\tau),\rho_\tau)\biggr)\,d\tau
+e^{-\int_0^\infty q_{\phi(x,s)}(\rho_s)\,ds}\,e^{\int_0^\infty c(\phi(x,s),\rho_s)\,ds}\biggr\}
=V^*_{n+1}(x),\qquad\forall\, x\in S,
\end{aligned}
\]
where the last equality is by (17). Thus, V*_{n+1} is the minimal [1, ∞)-valued measurable solution to (18), and coincides with V^{(n+1)}. Therefore, by induction, V*_n = V^{(n)} for each n ≥ 0. It follows now that V^{(n)}(x) ↑ V*(x) as n ↑ ∞ for each x ∈ S. ✷

5 Conclusion

In this paper, we considered a total undiscounted risk-sensitive PDMDP in Borel state and action spaces with a nonnegative cost rate.
The transition and cost rates are assumed to be locally integrable along the drift. Under quite natural conditions, we showed that the value function is a solution to the optimality equation, justified the value iteration algorithm, and showed the existence of a deterministic stationary optimal policy. As a corollary, the obtained results were applied to improve significantly the known results for finite horizon undiscounted and infinite horizon discounted risk-sensitive CTMDPs in the literature.
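In a finite model, the value iteration justified by Theorem 3.2 reduces, via the reduction to the risk-sensitive DTMDP of the appendix, to the elementary recursion V^{(n+1)}(x) = min_a Σ_y p(y|x,a) e^{l(x,a,y)} V^{(n)}(y) started from V^{(0)} ≡ 1. The following minimal sketch illustrates this recursion on a hypothetical two-state example of our own making (the transition probabilities and costs are illustrative, not from the paper): state 0 is absorbing and cost-free, and state 1 admits two actions.

```python
import math

# Risk-sensitive value iteration in a hypothetical two-state model:
#   V^(n+1)(x) = min_a sum_y p(y|x,a) * exp(l(x,a,y)) * V^(n)(y),  V^(0) = 1.
# State 0 is absorbing and cost-free; data below are illustrative only.

# (prob. of moving to state 0, prob. of staying in state 1, one-step cost)
actions = [
    (0.5, 0.5, 0.1),   # action 0: slow absorption, cheap
    (0.9, 0.1, 0.3),   # action 1: fast absorption, expensive
]

def iterate(V, n_iter=200, tol=1e-12):
    """Run the risk-sensitive Bellman recursion until successive
    iterates differ by less than tol; keep the whole trajectory."""
    history = [V]
    for _ in range(n_iter):
        v0 = V[0]  # absorbing, zero-cost state: exp(0) * V[0] = V[0]
        v1 = min(math.exp(c) * (p0 * V[0] + p1 * V[1]) for p0, p1, c in actions)
        V_next = (v0, v1)
        history.append(V_next)
        if abs(V_next[1] - V[1]) < tol:
            break
        V = V_next
    return V_next, history

V_star, hist = iterate((1.0, 1.0))
# The limit at state 1 solves V = exp(0.1) * (0.5 + 0.5 * V) under action 0,
# i.e. V = 0.5*e^0.1 / (1 - 0.5*e^0.1).
print(V_star[1])  # ≈ 1.2351
```

In line with Proposition A.1(d), the iterates are [1, ∞)-valued, nondecreasing in n, and converge to the minimal solution of the optimality equation; here the limit is finite because absorption occurs fast enough relative to the exponentiated cost.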
A Appendix
For ease of reference, we present the relevant notation and facts about the risk-sensitive problem for a DTMDP. The proofs of the presented statements can be found in [16] or [27]. A standard description of a DTMDP can be found in e.g., [17, 22].

Consider a discrete-time Markov decision process with the following primitives:

• X is a nonempty Borel state space.

• A is a nonempty Borel action space.

• p(dy|x, a) is a stochastic kernel on B(X) given (x, a) ∈ X × A.

• l is a [0, ∞]-valued measurable cost function on X × A × X.

Let Σ be the space of strategies, and Σ_{DM} be the space of all deterministic strategies for the DTMDP. Let the controlled and controlling processes be denoted by {Y_n, n = 0, 1, ...} and {A_n, n = 0, 1, ...}, respectively. The strategic measure of a strategy σ given the initial state x ∈ X is denoted by P^σ_x. The expectation taken with respect to P^σ_x is denoted by E^σ_x. Consider the optimal control problem
\[
\text{Minimize over }\sigma:\quad
E^\sigma_x\Bigl[e^{\sum_{n=0}^\infty l(Y_n,A_n,Y_{n+1})}\Bigr]=:V(x,\sigma),\qquad x\in\mathbf{X}.
\tag{19}
\]
It is also referred to as the risk-sensitive DTMDP problem. We denote the value function of problem (19) by V*. A strategy σ* is called optimal for problem (19) if V(x, σ*) = V*(x) for each x ∈ X.

Condition A.1 (a) The function l(x, a, y) is lower semicontinuous in a ∈ A for each x, y ∈ X. (b) For each bounded measurable function f on X and each x ∈ X, \int_X f(y)\,p(dy|x,a) is continuous in a ∈ A. (c) The space A is a compact Borel space.

Proposition A.1
Suppose Condition A.1 is satisfied.

(a) The value function V* is the minimal [1, ∞]-valued measurable solution to
\[
V(x)=\inf_{a\in A}\biggl\{\int_X p(dy|x,a)\,e^{l(x,a,y)}\,V(y)\biggr\},\qquad x\in\mathbf{X}.
\tag{20}
\]

(b) Let U be a [1, ∞]-valued lower semianalytic function on X. If
\[
U(x)\ge\inf_{a\in A}\biggl\{\int_X p(dy|x,a)\,e^{l(x,a,y)}\,U(y)\biggr\},\qquad\forall\, x\in\mathbf{X},
\]
then U(x) ≥ V*(x) for each x ∈ X. In particular, if the function U satisfying the above relation is [1, ∞)-valued, then so is the value function V*.

(c) Let ϕ be a deterministic stationary strategy for the DTMDP model {X, A, p, l}. If
\[
V^*(x)=\int_X p(dy|x,\varphi(x))\,e^{l(x,\varphi(x),y)}\,V^*(y),\qquad\forall\, x\in\mathbf{X},
\tag{21}
\]
then V*(x) = V(x, ϕ) for each x ∈ X.

(d) Let V^{(0)}(x) := 1 for each x ∈ X, and for each n = 1, 2, ...,
\[
V^{(n)}(x):=\inf_{a\in A}\biggl\{\int_X p(dy|x,a)\,e^{l(x,a,y)}\,V^{(n-1)}(y)\biggr\},\qquad\forall\, x\in\mathbf{X}.
\]
Then (V^{(n)}(x)) increases to V*(x) for each x ∈ X, where V* is the value function for problem (19). Furthermore, there exists a deterministic stationary strategy ϕ satisfying (21), and so in particular, there exists a deterministic stationary optimal strategy for the risk-sensitive DTMDP problem (19).

Acknowledgement.
We thank the referees for their remarks, which improved the presentation ofthis paper. This work is partially supported by a grant from the Royal Society (IE160503).
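The verification statement in Proposition A.1(c) can be checked numerically in a small finite model. The sketch below uses hypothetical data of our own (a cost-free absorbing state 0 and a controlled state 1; none of the numbers come from the paper): it computes V* by the value iteration of part (d), extracts a greedy deterministic stationary strategy attaining the minimum in (20), and evaluates that strategy separately to confirm V(·, ϕ) = V*.

```python
import math

# Numerical check of Proposition A.1(c) in a hypothetical finite DTMDP.
# State 0 is absorbing and cost-free; in state 1, action a moves to
# state 0 with probability p0[a] at one-step cost cost[a]. Illustrative only.
p0 = [0.5, 0.9]
cost = [0.1, 0.3]

def value_iteration(n=500):
    # V^(n+1)(1) = min_a exp(cost[a]) * (p0[a] + (1 - p0[a]) * V^(n)(1)),
    # starting from V^(0)(1) = 1 (state 0 keeps value 1 throughout).
    v = 1.0
    for _ in range(n):
        v = min(math.exp(c) * (p + (1 - p) * v) for p, c in zip(p0, cost))
    return v

def policy_evaluation(a, n=500):
    # V(1, phi) for the stationary strategy phi(1) = a, computed as the
    # monotone limit of the fixed-action operator applied to the constant 1.
    v = 1.0
    for _ in range(n):
        v = math.exp(cost[a]) * (p0[a] + (1 - p0[a]) * v)
    return v

v_star = value_iteration()
greedy = min(range(2),
             key=lambda a: math.exp(cost[a]) * (p0[a] + (1 - p0[a]) * v_star))
# The greedy action attains the minimum in (20), and its evaluation
# recovers the value function, as Proposition A.1(c) asserts.
print(greedy, v_star, policy_evaluation(greedy))
```

Here the cheap slow-absorbing action turns out to be greedy, and both computations agree on the fixed point 0.5e^{0.1}/(1 − 0.5e^{0.1}), illustrating that a strategy attaining the minimum in (20) is optimal.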
References

[1] Bäuerle, N. and Rieder, U. (2009). MDP algorithms for portfolio optimization problems in pure jump markets. Finance Stoch., 591-611.

[2] Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer, Berlin.

[3] Bäuerle, N. and Rieder, U. (2014). More risk-sensitive Markov decision processes. Math. Oper. Res., 105-120.

[4] Bäuerle, N. and Jaśkiewicz, A. (2015). Risk-sensitive dividend problems. Eur. J. Oper. Res., 161-171.

[5] Bertsekas, D. and Shreve, S. (1978). Stochastic Optimal Control. Academic Press, New York.

[6] Cavazos-Cadena, R. and Montes-de-Oca, R. (2000). Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards. Appl. Math. (Warsaw), 167-185.

[7] Chung, K. and Sobel, M. (1987). Discounted MDP's: distribution functions and exponential utility maximization. SIAM J. Control Optim., 49-62.

[8] Coraluppi, S. and Marcus, S. (1997). Risk-sensitive queueing. Proceedings of the 35th Annual Allerton Conference on Communication, Control and Computing, 943-952.

[9] Costa, O. and Dufour, F. (2013). Continuous Average Control of Piecewise Deterministic Markov Processes. Springer, New York.

[10] Davis, M. (1993). Markov Models and Optimization. Chapman and Hall, London.

[11] Di Masi, G. and Stettner, L. (1999). Risk-sensitive control of discrete-time Markov processes with infinite horizon. SIAM J. Control Optim., 61-78.

[12] Fainberg, E. (1982). Controlled Markov processes with arbitrary numerical criteria. Theory Probab. Appl., 486-503.

[13] Forwick, L., Schäl, M. and Schmitz, M. (2004). Piecewise deterministic Markov control processes with feedback controls and unbounded costs. Acta Appl. Math., 239-267.

[14] Ghosh, M. and Saha, S. (2014). Risk-sensitive control of continuous time Markov chains. Stochastics, 655-675.

[15] Jaquette, S. (1976). A utility criterion for Markov decision processes. Manag. Sci., 43-49.

[16] Jaśkiewicz, A. (2008). A note on negative dynamic programming for risk-sensitive control. Oper. Res. Lett., 531-534.

[17] Hernández-Lerma, O. and Lasserre, J. (1996). Discrete-Time Markov Control Processes. Springer-Verlag, New York.

[18] Howard, R. and Matheson, J. (1972). Risk-sensitive Markov decision processes. Manag. Sci., 356-369.

[19] Kitaev, M. and Rykov, V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton.

[20] Kumar, S. and Pal, C. (2013). Risk-sensitive control of pure jump process on countable space with near monotone cost. Appl. Math. Optim., 311-331.

[21] Piunovski, A. and Khametov, V. (1985). New effective solutions of optimality equations for the controlled Markov chains with continuous parameter (the unbounded price-function). Problems Control Inform. Theory, 303-318.

[22] Piunovskiy, A. (1997). Optimal Control of Random Sequences in Problems with Constraints. Kluwer, Dordrecht.

[23] Schäl, M. (1998). On piecewise deterministic Markov control processes: control of jumps and of risk processes in insurance. Insur. Math. Econ., 75-91.

[24] Wei, Q. (2016). Continuous-time Markov decision processes with risk-sensitive finite-horizon cost criterion. Math. Meth. Oper. Res., 461-487.

[25] Wei, Q. and Chen, X. (2016). Continuous-time Markov decision processes under the risk-sensitive average cost criterion. Oper. Res. Lett., 457-462.

[26] Yushkevich, A. (1980). On reducing a jump controllable Markov model to a model with discrete time. Theory Probab. Appl., 58-68.

[27] Zhang, Y. (2017). Continuous-time Markov decision processes with exponential utility. SIAM J. Control Optim. 55.