Convergence of Finite Element Methods for Singular Stochastic Control
Martin G. Vieten (University of Wisconsin-Milwaukee, [email protected])
Richard H. Stockbridge (University of Wisconsin-Milwaukee)
Abstract:
A numerical method is proposed for a class of stochastic control problems including singular behavior. This method solves an infinite-dimensional linear program equivalent to the stochastic control problem using a finite element type approximation, which results in a solvable finite-dimensional program. The discretization scheme as well as the necessary assumptions are discussed, and a detailed convergence analysis for the discretization scheme is given. Its performance is illustrated by two examples featuring a long-term average cost criterion.
Keywords: singular stochastic control, finite element method, linear programming, relaxed controls
AMS classification:
This paper considers singular stochastic control problems for a process X whose dynamics are initially specified by a stochastic differential equation (SDE)
$$dX_t = b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t + d\xi_t, \qquad X_0 = x_0, \tag{1.1}$$
where W is a Brownian motion process and $\xi$ is another stochastic process that evolves singularly in time. The process u represents the control influencing the evolution of X. Given two cost functions $\tilde{c}_0$ and $\tilde{c}_1$, u has to be chosen from a set of admissible controls in such a way that it minimizes either a long-term average cost criterion
$$\limsup_{t\to\infty} \frac{1}{t}\, E\left[\int_0^t \tilde{c}_0(X_s, u_s)\,ds + \int_0^t \tilde{c}_1(X_s, u_s)\,d\xi_s\right] \tag{1.2}$$
or a discounted infinite horizon cost criterion
$$E\left[\int_0^\infty e^{-\alpha s}\,\tilde{c}_0(X_s, u_s)\,ds + \int_0^\infty e^{-\alpha s}\,\tilde{c}_1(X_s, u_s)\,d\xi_s\right], \tag{1.3}$$
for some discounting rate α >
0. Such control problems are considered in a relaxed sense by using a martingale problem formulation involving the infinitesimal generators of X, and an equivalent infinite-dimensional linear program for the expected occupation measures of both the process X and the control u. Approximate solutions to this linear program are attained by discretizing the infinite-dimensional constraint space of functions using a finite element approach, together with discrete approximations of the expected occupation measures. The ε-optimality of approximate solutions is shown, and the method is applied to two example problems.

The classical analytic approach to stochastic control problems is given by methods based on the dynamic programming principle, as presented in Fleming and Rishel [6] or Fleming and Soner [7]. Central to these methods is the solution of the so-called Hamilton-Jacobi-Bellman (HJB) equation. Numerical methods can be derived by solving a control problem for an approximate, discrete Markov chain, as extensively discussed in Kushner and Dupuis [15], or by using discrete methods to approximate the solution to the HJB equation, frequently considering viscosity solutions. An example is given in Kumar and Muthuraman [12]. Another numerical technique using dynamic programming was analyzed in Anselmi et al. [1].

As an alternative, linear programming approaches have been instrumental in the analytic treatment of various stochastic control problems. The first example is given in Manne [18], where an ergodic Markov chain for an inventory problem under long-term average costs is analyzed. Bhatt and Borkar [2] as well as Kurtz and Stockbridge [13] investigated the linear programming approach for solutions of controlled martingale problems using long-term average and discounted cost criteria for infinite horizon problems, as well as finite horizon and first exit problems for absolutely continuous control. Taksar [22] establishes an equivalence between a linear program and a stochastic control problem for a multi-dimensional diffusion with singular control. Jump diffusions of Levy-Ito type are considered by Serrano [21].

To provide an alternative to numerical techniques based on the dynamic programming principle, the linear programming approach has been exploited using various discretization techniques. A very general setting can be found in Mendiondo and Stockbridge [19]. Moment-based approaches have been used in a line of publications, as can be seen in Helmes et al. [9] and Lasserre and Prieto-Rumeau [16]. Recent research by Kaczmarek et al. [11] and Rus [20] has investigated a novel approximation technique for the linear programming formulation by borrowing ideas from the finite element method used for solving partial differential equations. A discretization of the occupation measures (by discretizing their densities) and of the linear constraints with a finite set of basis functions gives a solvable finite-dimensional linear program. Kaczmarek et al. [11] indicated that a finite element discretization approach may outperform Markov chain approximation methods as well as a finite difference approximation to the Hamilton-Jacobi-Bellman equation stemming from the dynamic programming approach. However, no analytic treatment of the convergence properties was provided.

The present paper closes this gap by providing a modified finite element based approximation scheme for which convergence of the computed cost criterion values can be guaranteed. To this end, the approximation scheme is split up into several steps which deal either with the discretization of the measures or with that of the constraints. The separate steps are set up in such a way that convergence of the discrete optimal solutions to the analytic optimal solution can be proven.
The proofs are based, on the one hand, on the concept of weak convergence of measures and, on the other hand, on a detailed analysis of discretized approximations of the measures.

The paper is structured as follows. The next subsection presents the notation and formally introduces the linear programming formulation for singular stochastic control problems, along with a review of important results from the literature. The approximation scheme is discussed in Section 2. Then, we provide the convergence proof for this scheme in Section 3 and illustrate the performance of the numerical method on two examples in Section 4. A short outlook on possible research directions concludes this paper. Additional proofs needed to prove the results from Section 3 are given in Appendix A.

The natural numbers are denoted by $\mathbb{N}$, and the non-negative integers are $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. The symbol used for the real numbers is $\mathbb{R}$, and that for the non-negative real numbers is $\mathbb{R}_+$. The space of n-dimensional vectors is $\mathbb{R}^n$, and the space of n by m matrices is $\mathbb{R}^{n\times m}$. The set of continuous functions on a topological space S is denoted by $C(S)$. The set of twice continuously differentiable functions on S is denoted by $C^2(S)$, while its subset of twice differentiable functions with compact support is referred to by $C^2_c(S)$. The space of uniformly continuous, bounded functions is denoted by $C_{ub}(S)$. On a function space, $\|\cdot\|_\infty$ refers to the uniform norm of functions. On $\mathbb{R}^n$, $\|\cdot\|_\infty$ refers to the maximum norm of vector components, while on $\mathbb{R}^{n\times m}$ it refers to the maximum absolute row sum norm. The space of Lebesgue integrable functions is $L^1(S)$. For any given function f, let $f^+ : E \ni x \mapsto f^+(x) := \max(f(x), 0)$. In terms of measurable spaces, we use $\mathcal{B}(S)$ to describe the σ-algebra of Borel sets on a topological space S.
Given a measurable space (Ω, F), the set of probability measures on Ω is $\mathcal{P}(\Omega)$, while the set of finite Borel measures is denoted by $\mathcal{M}(\Omega)$. The symbol $\delta_{\{s\}}$ denotes the Dirac measure on $s \in S$. When using the differential dx as an integrator, it is understood that this refers to integration by Lebesgue measure. When we explicitly refer to the Lebesgue measure, we use the symbol λ. A Brownian motion process is denoted by the symbol W.

Consider the SDE given by (1.1). We assume that $X_t \in E = [e_l, e_r]$, with $-\infty < e_l < e_r < \infty$, and $u_t \in U = [u_l, u_r]$, with $-\infty < u_l < u_r < \infty$, for all $t \geq 0$. E and U are called the state space and control space, respectively. The coefficient functions $b : E \times U \mapsto \mathbb{R}$ and $\sigma : E \times U \mapsto \mathbb{R}_+ \setminus \{0\}$ are called the drift and diffusion functions. They are assumed to be continuous. The process $\xi$ is a singular stochastic process stemming from the behavior of X at the boundaries $e_l$ and $e_r$ of the state space, and is given by either a reflection, a jump or a combination of both. The infinitesimal generators of a process solving (1.1) are $\tilde{A} : C^2_c(E) \mapsto C(E \times U)$, called the continuous generator, and $B : C^2_c(E) \mapsto C(E \times U)$, called the singular generator. For $f \in C^2_c(E)$, $\tilde{A}$ is defined by
$$\tilde{A}f(x, u) = b(x, u) f'(x) + \tfrac{1}{2}\sigma^2(x, u) f''(x),$$
and B is defined by either of
$$Bf(x, u) = \pm f'(x) \quad \text{or} \quad Bf(x, u) = f(x + u) - f(x). \tag{1.4}$$
The first form of B models a reflection process (+ forcing a reflection to the right and − forcing a reflection to the left), and the second form models a jump process jumping from x to x + u. With these generators, a specification of the dynamics that requires
$$f(X_t) - f(x_0) - \int_0^t \tilde{A}f(X_s, u_s)\,ds - \int_0^t Bf(X_s, u_s)\,d\xi_s \tag{1.5}$$
to be a martingale for all $f \in C^2_c(E)$ is equivalent to (1.1) in terms of weak solutions. Hence, the values of the cost criteria determined by (1.2) and (1.3) remain identical.
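To make the continuous generator concrete, the following sketch evaluates $\tilde{A}f(x,u) = b(x,u)f'(x) + \frac{1}{2}\sigma^2(x,u)f''(x)$ numerically. The drift $b(x,u) = -ux$ and constant diffusion $\sigma \equiv 1$ are purely illustrative assumptions, not taken from the paper, and the derivatives are approximated by central finite differences.

```python
import numpy as np

def make_generator(b, sigma):
    """Continuous generator A~f(x,u) = b(x,u) f'(x) + 0.5 sigma(x,u)^2 f''(x),
    with f' and f'' approximated by central finite differences."""
    def Af(f, x, u, h=1e-5):
        fp = (f(x + h) - f(x - h)) / (2 * h)            # f'(x)
        fpp = (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # f''(x)
        return b(x, u) * fp + 0.5 * sigma(x, u) ** 2 * fpp
    return Af

# Hypothetical coefficients: controlled drift toward the origin, unit noise.
b = lambda x, u: -u * x
sigma = lambda x, u: 1.0
Af = make_generator(b, sigma)

f = lambda x: x ** 2   # smooth test function, f' = 2x, f'' = 2
# Here A~f(x,u) = -2*u*x**2 + 1, so A~f(1, 0.5) vanishes:
print(Af(f, 1.0, 0.5))   # ~ 0.0
```

For a jump boundary one would use the second form of (1.4), $Bf(x,u) = f(x+u) - f(x)$, which needs no derivatives at all.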
The following relaxed formulation of (1.5) is better suited for the purpose of stochastic control. Definition 1.1.
Let X be a stochastic process with state space E, let Λ be a stochastic process taking values in $\mathcal{P}(U)$, and let Γ be a random variable taking values in the space of measures on $[0,\infty) \times E \times U$, with $\Gamma([0,t] \times E \times U) \in \mathcal{M}([0,t] \times E \times U)$ for all t. The triplet (X, Λ, Γ) is a relaxed solution to the singular, controlled martingale problem for $(\tilde{A}, B)$ if there is a filtration $\{\mathcal{F}_t\}_{t\geq 0}$ such that X, Λ and Γ are $\mathcal{F}_t$-progressively measurable and
$$f(X_t) - f(x_0) - \int_0^t \int_U \tilde{A}f(X_s, u)\,\Lambda_s(du)\,ds - \int_{[0,t] \times E \times U} Bf(x, u)\,\Gamma(ds \times dx \times du)$$
is an $\{\mathcal{F}_t\}_{t\geq 0}$-martingale for all $f \in C^2_c(E)$.

The relaxation is given by the fact that the control is no longer represented by a process u, but is encoded in the random measures Λ and Γ. Assume that the cost functions $\tilde{c}_0$ and $\tilde{c}_1$ are continuous and non-negative. The cost criteria for a relaxed solution of the singular, controlled martingale problem are
$$\limsup_{t\to\infty} \frac{1}{t}\, E\left[\int_0^t \int_U \tilde{c}_0(X_s, u)\,\Lambda_s(du)\,ds + \int_{[0,t] \times E \times U} \tilde{c}_1(x, u)\,\Gamma(ds \times dx \times du)\right], \tag{1.6}$$
for the long-term average cost criterion, and, for α > 0,
$$E\left[\int_0^\infty \int_U e^{-\alpha s}\,\tilde{c}_0(X_s, u)\,\Lambda_s(du)\,ds + \int_{[0,\infty) \times E \times U} e^{-\alpha s}\,\tilde{c}_1(x, u)\,\Gamma(ds \times dx \times du)\right], \tag{1.7}$$
for the infinite horizon discounted cost criterion. A stochastic control problem given by (1.5) together with (1.6) or (1.7) can be reformulated as an infinite-dimensional linear program. To this end, we set
$$c_0(x, u) = \begin{cases} \tilde{c}_0(x, u) & \text{if } \alpha = 0 \\ \tilde{c}_0(x, u)/\alpha & \text{if } \alpha > 0 \end{cases} \qquad c_1(x, u) = \begin{cases} \tilde{c}_1(x, u) & \text{if } \alpha = 0 \\ \tilde{c}_1(x, u)/\alpha & \text{if } \alpha > 0. \end{cases}$$
Furthermore, for α ≥ 0, define $A : C^2_c(E) \mapsto C(E \times U)$ by
$$Af(x, u) = \tilde{A}f(x, u) - \alpha f(x) \tag{1.8}$$
and the functional $Rf = -\alpha f(x_0)$, $x_0$ being the starting point of the diffusion. Definition 1.2.
The infinite-dimensional linear program for a singular stochastic control problem is given by
$$\begin{array}{ll} \text{Minimize} & \int_{E\times U} c_0\,d\mu_0 + \int_{E\times U} c_1\,d\mu_1 \\ \text{Subject to} & \int_{E\times U} Af\,d\mu_0 + \int_{E\times U} Bf\,d\mu_1 = Rf \quad \forall f \in C^2_c(E) \\ & \mu_0 \in \mathcal{P}(E \times U),\ \mu_1 \in \mathcal{M}(E \times U). \end{array}$$
$\mu_0$ and $\mu_1$ are the expected occupation measures of X and Γ. We frequently consider the measures on $(E, \mathcal{B}(E))$ given by $\mu_{0,E}(\cdot) = \mu_0(\cdot \times U)$ and $\mu_{1,E}(\cdot) = \mu_1(\cdot \times U)$. We refer to these measures as the state space marginals of $\mu_0$ and $\mu_1$, respectively. The properties of such linear programs and their relation to stochastic control problems for singular, controlled martingale problems are stated in Theorem 1.4. These results use the notion of a regular conditional probability, defined as follows. Definition 1.3.
Let $(E \times U, \mathcal{B}(E \times U), \mu)$ be a measure space, and let $P : E \times U \ni (x, u) \mapsto x \in E$ be the projection map onto E. Let $\mu_E$ be the distribution of P, which is identical to the state space marginal of µ. A map $\eta : \mathcal{B}(U) \times E \mapsto [0, 1]$ is called a regular conditional probability if
i) for each $x \in E$, $\eta(\cdot, x) : \mathcal{B}(U) \mapsto [0, 1]$ is a probability measure,
ii) for each $V \in \mathcal{B}(U)$, $\eta(V, \cdot) : E \mapsto [0, 1]$ is a measurable function, and
iii) for all $V \in \mathcal{B}(U)$ and all $F \in \mathcal{B}(E)$ we have $\mu(F \times V) = \int_F \eta(V, x)\,\mu_E(dx)$. Theorem 1.4.
The problem of minimizing either the long-term average cost criterion of (1.6) or the infinite horizon discounted cost criterion of (1.7) over the set of all relaxed solutions (X, Λ, Γ) to the singular, controlled martingale problem for $(\tilde{A}, B)$ is equivalent to the linear program stated in Definition 1.2. Moreover, there exists an optimal solution $(\mu_0^*, \mu_1^*)$. Let $\eta_0^*$ and $\eta_1^*$ be the regular conditional probabilities of $\mu_0^*$ and $\mu_1^*$ with respect to their state space marginals. Then an optimal relaxed control is given in feedback form by $\Lambda_t^* = \eta_0^*(\cdot, X_t^*)$ and $\Gamma^*(ds \times dx \times du) = \eta_1^*(du, x)\,\tilde{\Gamma}^*(ds \times dx)$ for a random measure $\tilde{\Gamma}^*$ on $[0,\infty) \times E$, where $(X^*, \Lambda^*, \Gamma^*)$ is a relaxed solution to the singular, controlled martingale problem for $(\tilde{A}, B)$ having occupation measures $(\mu_0^*, \mu_1^*)$.

Proof. See Kurtz and Stockbridge [14], Theorem 2.1 and Theorem 3.3, respectively.

By this result, it suffices to find optimal solutions to the infinite-dimensional linear program when solving a singular stochastic control problem, and approximate solutions to the linear program serve as approximate solutions to the control problem. Section 2.1 presents how we discretize the infinite-dimensional linear program to a computationally attainable formulation, which is the basis for the numerical technique used in this paper. The analysis of this discretization scheme relies in part on the notion of weak convergence of finite measures, which is defined next. In the following, let S be a measurable space equipped with a topology. Definition 1.5.
Consider a sequence of finite measures $\{\mu_n\}_{n\in\mathbb{N}}$ and another finite measure µ on S. We say that $\mu_n$ converges weakly to µ, in symbols $\mu_n \Rightarrow \mu$, if for all $f \in C_{ub}(S)$
$$\int_S f(x)\,\mu_n(dx) \to \int_S f(x)\,\mu(dx) \quad \text{as } n \to \infty.$$
Note that we are considering finite measures, and not necessarily probability measures. In particular, we could encounter a situation where the sequence of numbers $\{\mu_n(S)\}_{n\in\mathbb{N}}$ is unbounded. This differs from 'classical' considerations of weak convergence, which for example can be found in Billingsley [3]. However, Bogachev [4] (see Chapter 8 in Volume 2) offers a discussion of the concept of weak convergence in this more general case. Central to our purposes is Theorem 1.9, which states sufficient conditions for the existence of weakly converging subsequences when considering sequences of finite measures, based on the following two concepts. Definition 1.6.
A sequence of finite measures $\{\mu_n\}_{n\in\mathbb{N}}$ on S is called tight if for each ε > 0, there is a compact set $K_\epsilon$ in S such that $\mu_n(K_\epsilon^C) < \epsilon$ holds for all $n \in \mathbb{N}$.

Remark 1.7. If S is compact, any sequence of finite measures on S is tight. Definition 1.8.
A sequence of finite measures $\{\mu_n\}_{n\in\mathbb{N}}$ on S is called uniformly bounded if for some l ≥ 0, $\mu_n(S) \leq l$ holds for all $n \in \mathbb{N}$.

If a sequence of finite measures on S is tight and uniformly bounded, the existence of convergent subsequences is guaranteed by the following result. Theorem 1.9.
Let $\{\mu_n\}_{n\in\mathbb{N}}$ be a sequence of finite measures on S. Then, the following are equivalent:
i) $\{\mu_n\}_{n\in\mathbb{N}}$ contains a weakly convergent subsequence,
ii) $\{\mu_n\}_{n\in\mathbb{N}}$ is tight and uniformly bounded.

Proof. See Bogachev [4], Theorem 8.6.2.
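Definition 1.5 can be illustrated numerically: the sketch below (an illustration, not part of the paper's method) takes $\mu_n$ to be the discrete measure placing mass $1/n$ on each of n equispaced points of $S = [0,1]$, so that $\mu_n \Rightarrow \lambda|_{[0,1]}$, and checks that integrals of a bounded, uniformly continuous test function converge.

```python
import numpy as np

# mu_n puts mass 1/n on each of n midpoints of [0, 1]; its weak limit is
# Lebesgue measure on [0, 1], so integral f dmu_n -> integral_0^1 f(x) dx
# for every bounded, uniformly continuous test function f.
f = lambda x: np.cos(x)              # one such test function

def integral_mu_n(n):
    pts = (np.arange(n) + 0.5) / n   # atoms of the discrete measure mu_n
    return np.mean(f(pts))           # integral of f against mu_n

exact = np.sin(1.0)                  # integral_0^1 cos(x) dx
for n in [10, 100, 1000]:
    print(n, abs(integral_mu_n(n) - exact))   # errors shrink as n grows
```

The same mechanism, tested against the countable B-spline basis of Section 2, underlies the convergence proof of Section 3.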
We begin the presentation of the proposed method by describing the discretization scheme. Then, we discuss the assumptions necessary for the convergence of the method.
The proposed numerical technique is based on a discretization of the infinite-dimensional linear program in three steps. First, we introduce a limit on the full mass of the measure $\mu_1$. Then, we restrict the number of constraint functions. Thirdly, we introduce discrete versions of the measures. In the process, several assumptions on the measure $\mu_0$ are made. For the sake of exposition, we elaborate on these assumptions separately in Section 2.2.

Since the discretization brings forth several distinct sets of measures, we define the cost criterion using the following general formulation:
$$J : \mathcal{P}(E\times U) \times \mathcal{M}(E\times U) \ni (\mu_0, \mu_1) \mapsto J(\mu_0, \mu_1) = \int_{E\times U} c_0\,d\mu_0 + \int_{E\times U} c_1\,d\mu_1 \in \mathbb{R}_+.$$
We choose to consider $C^2_c(E)$ as a normed space in the following sense. Set $\|f\|_D = \|f\|_\infty + \|f'\|_\infty + \|f''\|_\infty$ and define $D_\infty = (C^2_c(E), \|\cdot\|_D)$ to designate that we consider $C^2_c(E)$ to be a specific normed space. Set
$$M_\infty = \left\{(\mu_0, \mu_1) \in \mathcal{P}(E\times U) \times \mathcal{M}(E\times U) : \int Af\,d\mu_0 + \int Bf\,d\mu_1 = Rf\ \ \forall f \in D_\infty\right\}.$$
For analytical purposes, we introduce an upper bound on $\mu_1(E\times U)$. For l > 0, set
$$M_\infty^l = \{(\mu_0, \mu_1) \in M_\infty : \mu_1(E\times U) \leq l\}. \tag{2.1}$$
Remark 2.1. As l increases, more measures of $M_\infty$ will lie in $M_\infty^l$. For l large enough, the optimal solution will lie in $M_\infty^l$, as we have that $\mu_1^* \in \mathcal{M}(E\times U)$ and hence $\mu_1^*(E) < \infty$. Definition 2.2.
The l-bounded infinite-dimensional linear program is given by
$$\min\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_\infty^l\right\}.$$
The set $M_\infty^l$ features an infinite set of constraints, given by all $f \in D_\infty$, and measures $\mu_0$ and $\mu_1$ having an infinite number of degrees of freedom. First, we discretize the set of constraints using B-spline basis functions. To construct these basis functions, fix $q \in \mathbb{N}$ and consider a finite set of pairwise distinct grid points $\{e_k\}_{k=-3}^{q+3}$ in E, with $e_0 = e_l$, $e_q = e_r$ and $e_k < e_{k+1}$ for $k = -3, -2, \ldots, q+2$.

Definition 2.3. The set of cubic B-spline basis functions for a grid $\{e_k\}_{k=-3}^{q+3}$ is defined on $\mathbb{R}$ by
$$f_k(x) = (e_{k+4} - e_k) \sum_{i=k}^{k+4} \frac{\left[(e_i - x)^3\right]^+}{\Psi_k'(e_i)}, \quad k = -3, -2, \ldots, q-1,$$
where
$$\Psi_k(x) = \prod_{i=k}^{k+4} (x - e_i), \quad k = -3, -2, \ldots, q-1.$$
An analysis of these basis functions is given in de Boor [5]. Provided that
$$\max_{k=-3,\ldots,q+2}(e_{k+1} - e_k) \to 0 \quad \text{and} \quad \frac{\max_{k=-3,\ldots,q+2}(e_{k+1} - e_k)}{\min_{k=-3,\ldots,q+2}(e_{k+1} - e_k)} \to 1 \quad \text{as } n \to \infty,$$
Theorem 1 of Hall and Meyer [8] holds and the following statement can be shown. Proposition 2.4.
The normed space $D_\infty$ is separable, and a countable basis $\{f_k\}_{k\in\mathbb{N}}$ is given by the cubic B-spline basis functions.

For fixed $\tilde{q} \in \mathbb{N}$, define a grid using the dyadic partition of E given by
$$e_k = e_l + \frac{e_r - e_l}{2^{\tilde{q}}} \cdot k, \quad k = -3, \ldots, 2^{\tilde{q}} + 3,$$
and consider the $n := 2^{\tilde{q}} + 2$ B-spline basis functions $\{f_k\}_{k=1}^n$ on this grid. This allows us to define
$$M_n = \left\{(\mu_0, \mu_1) \in \mathcal{P}(E\times U) \times \mathcal{M}(E\times U) : \int Af_k\,d\mu_0 + \int Bf_k\,d\mu_1 = Rf_k,\ k = 1, \ldots, n\right\}$$
and we can define $M_n^l$ in a similar manner using the mass restriction on $\mu_1$ as seen in (2.1). Definition 2.5.
The l-bounded (n, ∞)-dimensional linear program is given by
$$\min\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_n^l\right\}.$$
Next, we discretize the measures. Theorem 1.4 reveals that it is sufficient to regard feedback controls which can be represented by regular conditional probabilities. In particular, this result states that we can consider measures $(\mu_0, \mu_1)$ which can be decomposed according to $\mu_0(dx \times du) = \eta_0(du, x)\,\mu_{0,E}(dx)$ and $\mu_1(dx \times du) = \eta_1(du, x)\,\mu_{1,E}(dx)$ for two regular conditional probabilities $\eta_0$ and $\eta_1$. We furthermore assume, first, that for any interval or singleton $V \subset U$, $x \mapsto \eta_0(V, x)$ is continuous almost everywhere with respect to Lebesgue measure, second, that $\mu_{0,E}$ has a density $p_0$ with respect to Lebesgue measure and, third, that $p_0$ satisfies the constraint $\lambda(\{x : p_0(x) = 0\}) = 0$. In other words, $p_0$ may only be equal to zero on a set of Lebesgue measure 0. The particulars of these assumptions are discussed in Section 2.2, and we continue here with the description of the approximation scheme.

Define a sequence $k_m$ as follows. As $c_0$, b and σ are continuous over a compact set, for all $m \in \mathbb{N}$ there is a $\delta_m > 0$ such that for $u, v \in U$ with $|u - v| \leq \delta_m$, it is true that
$$\max\left\{|c_0(x, u) - c_0(x, v)|,\ |b(x, u) - b(x, v)|,\ |\sigma(x, u) - \sigma(x, v)|\right\} \leq 2^{-(m+1)}, \tag{2.2}$$
uniformly in x. Set $k_m$ to be the smallest integer such that $\frac{u_r - u_l}{k_m} \leq \delta_m$. The parameter $k_m$ controls the discretization of the control space U, and the specific choice enables an accurate approximate integration of the cost function $c_0$ and the functions $Af_k$ against the relaxed control $\eta_0$ in the convergence proof of Section 3. So, define
$$U(m) = \left\{u_j = u_l + \frac{u_r - u_l}{k_m} \cdot j,\ j = 0, \ldots, k_m\right\}. \tag{2.3}$$
Similarly, we set
$$E(m) = \left\{e_j = e_l + \frac{e_r - e_l}{2^m} \cdot j,\ j = 0, \ldots, 2^m\right\}. \tag{2.4}$$
The union of these sets over all $m \in \mathbb{N}$ is dense in the control space and state space, respectively.
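The two grids can be sketched as follows. The choice of $\delta_m$ below assumes, purely for illustration, that $c_0$, b and σ share a common Lipschitz constant L in the control variable, so a grid cell of width $2^{-(m+1)}/L$ keeps their variation within the bound of (2.2); the function names and the constant L are hypothetical.

```python
import math
import numpy as np

def control_grid(u_l, u_r, m, L):
    """U(m) of (2.3): delta_m is chosen so the coefficient functions vary by at
    most 2^-(m+1) over one cell (assuming a common Lipschitz constant L in u),
    and k_m is the smallest integer with (u_r - u_l)/k_m <= delta_m."""
    delta_m = 2.0 ** (-(m + 1)) / L
    k_m = math.ceil((u_r - u_l) / delta_m)
    return np.linspace(u_l, u_r, k_m + 1)    # points u_0, ..., u_{k_m}

def state_grid(e_l, e_r, m):
    """E(m) of (2.4): dyadic partition of E with 2^m cells."""
    return np.linspace(e_l, e_r, 2 ** m + 1)

U5 = control_grid(0.0, 1.0, m=5, L=2.0)
E5 = state_grid(-1.0, 1.0, m=5)
print(len(U5), len(E5))   # 129 33
```

Increasing m refines both grids simultaneously, which is what makes the union over all m dense in U and E.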
The number m is called the discretization level. It determines the degrees of freedom of the discrete measures $\hat{\mu}_0$ (approximating $\mu_0$) and $\hat{\mu}_1$ (approximating $\mu_1$), which are defined as follows.

First, we approximate the density $p_0$ of $\mu_{0,E}$. Choose a countable basis of $L^1(E)$, say $\{p_n\}_{n\in\mathbb{N}}$, given by indicator functions over subintervals of E. We truncate this basis to $p_0, \ldots, p_{2^m-1}$ (given by the indicator functions of the intervals of the dyadic partition, compare (2.4)) to approximate the density $p_0$ by
$$\hat{p}_m(x) = \sum_{j=0}^{2^m-1} \gamma_j\, p_j(x), \tag{2.5}$$
where $\gamma_j \in \mathbb{R}_+$, $j = 0, \ldots, 2^m-1$, are weights chosen such that $\int_E \hat{p}_m(x)\,dx = 1$. Set $E_j = [x_j, x_{j+1})$ for $j = 0, 1, \ldots, 2^m-2$ and $E_{2^m-1} = [x_{2^m-1}, x_{2^m}]$ to define
$$\hat{\eta}_{0,m}(V, x) = \sum_{j=0}^{2^m-1} \sum_{i=0}^{k_m} \beta_{j,i}\, I_{E_j}(x)\, \delta_{\{u_i\}}(V), \tag{2.6}$$
where $\beta_{j,i} \in \mathbb{R}_+$, $j = 0, \ldots, 2^m-1$, $i = 0, \ldots, k_m$, are weights to be chosen under the constraint that $\sum_{i=0}^{k_m} \beta_{j,i} = 1$ for $j = 0, \ldots, 2^m −$
1. We approximate $\eta_0$ using (2.6), which means that this relaxed control is approximated by point masses in the U-'direction' and piecewise constantly in the E-'direction'. Then, we set $\hat{\mu}_{0,m}(dx \times du) = \hat{\eta}_{0,m}(du, x)\,\hat{p}_m(x)\,dx$.

To approximate the singular occupation measure $\mu_1$, we use the fact that the process only shows singular behavior at $e_l$ and $e_r$. Thus, if we introduce the regular conditional probability $\eta_1$ and write $\mu_1(dx \times du) = \eta_1(du, x)\,\mu_{1,E}(dx)$, we have for $F \in \mathcal{B}(E)$
$$\mu_{1,E}(F) = w_1\, \delta_{\{e_l\}}(F) + w_2\, \delta_{\{e_r\}}(F) \tag{2.7}$$
with $w_1, w_2 \in \mathbb{R}_+$. We approximate the relaxed control $\eta_1$ by
$$\hat{\eta}_{1,m}(V, e_l) = \sum_{i=0}^{k_m} \zeta_{1,i}\, \delta_{\{u_i\}}(V), \qquad \hat{\eta}_{1,m}(V, e_r) = \sum_{i=0}^{k_m} \zeta_{2,i}\, \delta_{\{u_i\}}(V) \tag{2.8}$$
with $\sum_{i=0}^{k_m} \zeta_{j,i} = 1$ for $j = 1,$
2. So, we have $\hat{\mu}_{1,m}(dx \times du) = \hat{\eta}_{1,m}(du, x)\,\mu_{1,E}(dx)$. In summary, we consider measures of the form
$$(\hat{\mu}_{0,m}, \hat{\mu}_{1,m})(dx \times du) = \left(\hat{\eta}_{0,m}(du, x)\,\hat{p}_m(x)\,dx,\ \hat{\eta}_{1,m}(du, x)\,\mu_{1,E}(dx)\right)$$
and we introduce the notation
$$M_{n,m}^l = \left\{(\mu_{0,m}, \mu_{1,m}) \in M_n^l : (\mu_{0,m}, \mu_{1,m})(dx \times du) = \left(\hat{\eta}_{0,m}(du, x)\,\hat{p}_m(x)\,dx,\ \hat{\eta}_{1,m}(du, x)\,\mu_{1,E}(dx)\right)\right\}.$$
This finalizes the discretization of the measures and leaves us with the following linear program.
Definition 2.6.
The l-bounded (n, m)-dimensional linear program is given by
$$\inf\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_{n,m}^l\right\}.$$
This program is linear in the coefficients given by the products $\beta_{j,i} \cdot \gamma_j$ and $\zeta_{j,i} \cdot w_j$, and the cost functional can likewise be expressed as a linear combination of these coefficients.

Up to this point, we have introduced four sets of measures, $M_\infty$, $M_\infty^l$, $M_n^l$ and $M_{n,m}^l$, and we will later use ε-optimal solutions in $M_{n,m}^l$ to approximate the optimal solution in $M_\infty$. However, the relations between those sets are $M_\infty^l \subset M_\infty$, $M_n^l \supset M_\infty^l$ and $M_{n,m}^l \subset M_n^l$. As this does not provide a clear nested structure, it has to be carefully analyzed how optimal solutions in these sets relate to each other. This is presented in Section 3.

2.2 Assumptions

Before we move to the presentation of the convergence argument, we elaborate on the assumptions on $\mu_0$ which were made in Section 2.1. These assumptions restrict the set of feasible measures considered in the linear program of Definition 1.2 to measures which allow the approximation to converge. Albeit technical, the imposed restrictions do not curtail the set of feasible measures beyond what can be considered to be 'implementable' solutions; in other words, the set of measures will still be large enough to include any type of control that could be used in a real-world application.

First, we assume that the state space marginal $\mu_{0,E}$ of the expected occupation measure $\mu_0$ has a density $p_0$ with respect to the Lebesgue measure. As shown in [23], Section II.2, this is guaranteed when certain assumptions on the regular conditional probability $\eta_0$ of the continuous occupation measure $\mu_0$ are fulfilled. To be precise, we have to assert that the functions
$$x \mapsto \int_U b(x, u)\,\eta_0(du, x), \qquad x \mapsto \int_U \sigma(x, u)\,\eta_0(du, x) \tag{2.9}$$
are continuous everywhere except for finitely many points in E.
On the one hand, this is satisfied for controls of the form given by (2.6), which includes the important class of so-called bang-bang controls. Bang-bang controls put full mass on either of the end points $u_l$ and $u_r$ of the control space U. Usually, when the cost function $c_0$ does not depend on the control value u, the optimal solution is given by a bang-bang control. If this is not the case, optimal controls are frequently given in the form of a continuous function $v : E \mapsto U$ and a control satisfying $\eta_0(\{v(x)\}, x) = 1$. It is easy to see that in both cases the two functions defined in (2.9) are continuous except for finitely many points.

Secondly, we assume that $p_0$ is equal to zero only on a set of Lebesgue measure 0. The analysis in [23], Section II.2 shows that the densities encountered when using both the long-term average cost criterion and the discounted infinite horizon criterion satisfy this assumption. Thirdly, we assume that for any set $V \subset U$ which is either an interval or a singleton, the function $x \mapsto \eta_0(V, x)$ is continuous almost everywhere with respect to Lebesgue measure. This allows us to approximate the function $x \mapsto \eta_0(V, x)$ uniformly by a function which is piecewise constant over intervals, where the approximate function values on these intervals are given by the values of $\eta_0(V, x)$ at the left endpoints of the intervals. This makes the statement of (3.8) of the convergence argument true. Controls of the form given in (2.6) satisfy this requirement. However, if $\eta_0$ fulfills $\eta_0(\{v(x)\}, x) = 1$ for some continuous function v on E, we have to assert more regularity on v, according to the following definition. Definition 2.7.
A continuous function $v : E \mapsto U$ is said to have finitely many modes if there are finitely many points $e_l = \hat{y}_0 < \hat{y}_1 < \ldots < \hat{y}_k = e_r$ such that for all $1 \leq i \leq k-1$, there are points $a_i$ and $b_i$ with $\hat{y}_{i-1} < a_i < \hat{y}_i < b_i < \hat{y}_{i+1}$ and either of the following statements holds:
i) v is strictly increasing on $(a_i, \hat{y}_i)$ and strictly decreasing on $(\hat{y}_i, b_i)$, as well as increasing on $(\hat{y}_{i-1}, \hat{y}_i)$ and decreasing on $(\hat{y}_i, \hat{y}_{i+1})$,
ii) v is strictly decreasing on $(a_i, \hat{y}_i)$ and strictly increasing on $(\hat{y}_i, b_i)$, as well as decreasing on $(\hat{y}_{i-1}, \hat{y}_i)$ and increasing on $(\hat{y}_i, \hat{y}_{i+1})$.

We assume in the following that v only has finitely many modes. The rationale behind this assumption is as follows. Obviously, $x \mapsto \eta_0(V, x)$ is piecewise constant, either 0 or 1. The fact that v 'oscillates' only finitely many times between its modes ensures that $x \mapsto \eta_0(V, x)$ does not switch from 0 to 1 or from 1 to 0 more than finitely many times, and hence it is discontinuous on a set that has measure 0 with respect to Lebesgue measure.
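Putting the pieces of Section 2.1 together, a finite program of the kind in Definition 2.6 can be assembled and handed to a standard LP solver. The sketch below is an illustration under several simplifying assumptions: a long-term average problem (so Rf = 0) on E = [0, 1], U = [−1, 1] with reflection at both boundaries; hypothetical model data b(x, u) = u, σ ≡ 1, c₀(x, u) = (x − 0.5)² + 0.1u², c₁ ≡ 1; monomial test functions x^p standing in for the paper's cubic B-spline basis; and a midpoint approximation of the running cost over each cell. The decision variables are the products of the weights of Section 2.1 (cell/control masses of µ̂₀ plus boundary masses of µ̂₁).

```python
import numpy as np
from scipy.optimize import linprog

m, n, l = 16, 6, 5.0                       # cells, test functions, mass bound
xs = np.linspace(0.0, 1.0, m + 1)          # cell endpoints of the state grid
mids = 0.5 * (xs[:-1] + xs[1:])            # cell midpoints
us = np.linspace(-1.0, 1.0, 5)             # control grid
nu = len(us)

# Variables: kappa[j, i] = mass of mu0 on cell j with control u_i (uniform
# density within the cell), plus reflection masses wl (at e_l) and wr (at e_r).
nvar = m * nu + 2
A_eq = np.zeros((n + 1, nvar))
b_eq = np.zeros(n + 1)
for p in range(1, n + 1):                  # constraint row for f_p(x) = x**p
    for j in range(m):
        dF = xs[j + 1] ** p - xs[j] ** p                 # integral of f_p' over cell j
        dfp = xs[j + 1] ** (p - 1) - xs[j] ** (p - 1)    # jump of f_p' across cell j
        for i, u in enumerate(us):
            # exact integral of Af_p(., u_i) against a uniform unit mass on cell j
            A_eq[p - 1, j * nu + i] = m * (u * dF + 0.5 * p * dfp)
    A_eq[p - 1, -2] = p * 0.0 ** (p - 1)   # +f_p'(0) * wl  (reflection to the right)
    A_eq[p - 1, -1] = -p * 1.0 ** (p - 1)  # -f_p'(1) * wr  (reflection to the left)
A_eq[n, : m * nu] = 1.0                    # mu0 must be a probability measure
b_eq[n] = 1.0

cost = np.zeros(nvar)
for j in range(m):
    for i, u in enumerate(us):             # midpoint approximation of c0 on cell j
        cost[j * nu + i] = (mids[j] - 0.5) ** 2 + 0.1 * u ** 2
cost[-2:] = 1.0                            # c1 = 1 per unit of boundary mass

A_ub = np.zeros((1, nvar)); A_ub[0, -2:] = 1.0    # mu1(E x U) <= l
res = linprog(cost, A_ub=A_ub, b_ub=[l], A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nvar, method="highs")
print(res.status, res.fun)                 # status 0 means an optimum was found
```

The program is feasible here (uniform mass on u = 0 with wl = wr = 1/2 satisfies every constraint row exactly), so the solver returns an optimal value, which Section 3 shows converges to the value of the original control problem as the discretization is refined.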
The first part of this section gives an overview of the convergence argument, illustrating the main ideas of the analysis. The proofs of the propositions and corollary are given in the second part.

3.1 Statement of the main results
The l-bounded (n, m)-dimensional linear program introduced in Section 2.1 is a finite-dimensional linear program that can be solved with standard solvers available in numerical libraries, and hence optimal solutions are attainable. We proceed to show that the optimal solution to the l-bounded (n, m)-dimensional linear program is an ε-optimal solution to the infinite-dimensional program for l, n and m large enough. We use the notations
$$J^* = \inf\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_\infty^l\right\},$$
$$J_n^* = \inf\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_n^l\right\},$$
$$J_{n,m}^* = \inf\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_{n,m}^l\right\}.$$
For l large enough, $J^*$ is indeed the optimal value of the unbounded problem, as stated in Remark 2.1; in other words,
$$J^* = \min\left\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_\infty\right\}.$$
Since an infimum might not be computationally attainable in $M_\infty^l$ and $M_n^l$, we withdraw to the slightly relaxed optimization problem of finding an ε-optimal solution; in other words, we try to find a pair of measures $(\mu_0^\epsilon, \mu_1^\epsilon) \in M_\infty^l$ such that
$$J(\mu_0^\epsilon, \mu_1^\epsilon) - J(\mu_0, \mu_1) \leq \epsilon \quad \forall (\mu_0, \mu_1) \in M_\infty^l.$$
Note that trivially, $J(\mu_0^\epsilon, \mu_1^\epsilon) - J^* \geq$
0. The ε-optimality for $M_n^l$ is defined analogously. The following convergence analysis proves that we can find ε-optimal measures in $M_\infty^l$ using the approximation proposed in Section 2. The outline of the proof is as follows. First, it is shown that it suffices to find an ε-optimal solution in $M_n^l$. Proposition 3.1.
For each $n \in \mathbb{N}$, assume that $(\mu_0^{\epsilon,n}, \mu_1^{\epsilon,n}) \in M_n^l$ is an ε-optimal solution for the l-bounded, (n, ∞)-dimensional linear program. Then, for δ > 0, there exists an N(δ) such that
$$J(\mu_0^{\epsilon,n}, \mu_1^{\epsilon,n}) - J^* \leq \epsilon + \delta$$
for all $n \geq N(\delta)$.

Next, we establish that ε-optimal solutions in $M_n^l$ can be obtained using the discretization introduced in Section 2.1. The central result reads as follows. Proposition 3.2.
For $(\mu_0, \mu_1) \in M_n^l$ and each ε > 0, there is an $m_0$ such that for all $m \geq m_0$ there exists a $(\hat{\mu}_{0,m}, \hat{\mu}_{1,m}) \in M_{n,m}^l$ with
$$|J(\mu_0, \mu_1) - J(\hat{\mu}_{0,m}, \hat{\mu}_{1,m})| < \epsilon.$$
This result shows that arbitrary (not necessarily optimal) measures in $M_n^l$ can be approximated, in terms of their cost criterion, by measures in $M_{n,m}^l$. Regarding optimal measures, the following statement is an easy consequence of Proposition 3.2. Corollary 3.3.
For each $m \in \mathbb{N}$, assume that $(\mu_0^{*,n,m}, \mu_1^{*,n,m}) \in M_{n,m}^l$ is an optimal solution to the l-bounded, (n, m)-dimensional linear program. Then, the sequence of numbers $\{J(\mu_0^{*,n,m}, \mu_1^{*,n,m})\}_{m\in\mathbb{N}}$ converges to $J_n^*$ as $m \to \infty$.

In conjunction, the preceding results allow us to prove the following theorem.
Theorem 3.4.
For any ε > 0, there is an l > 0, an $N \in \mathbb{N}$ and an $M \in \mathbb{N}$ such that $|J^* - J_{n,m}^*| < \epsilon$ holds for all $n \geq N$ and $m \geq M$.

Proof. Pick any ε >
0. Choose l large enough such that $J^* = \min\{J(\mu_0, \mu_1)\ |\ (\mu_0, \mu_1) \in M_\infty\}$. Pick $\hat{\epsilon}$ and δ > 0 such that $3\hat{\epsilon} + \delta < \epsilon$. By Proposition 3.1, choose $N \in \mathbb{N}$ large enough such that for all $n \geq N$, an $\hat{\epsilon}$-optimal solution $(\mu_0^{\hat{\epsilon},n}, \mu_1^{\hat{\epsilon},n})$ to the l-bounded, (n, ∞)-dimensional program is a $(2\hat{\epsilon} + \delta)$-optimal solution to the l-bounded infinite-dimensional linear program. Now, using Corollary 3.3, choose $M \in \mathbb{N}$ large enough such that for all $m \geq M$, the optimal solution $(\mu_0^{*,n,m}, \mu_1^{*,n,m})$ to the l-bounded (n, m)-dimensional linear program is an $\hat{\epsilon}$-optimal solution to the l-bounded (n, ∞)-dimensional program. But then,
$$|J^* - J_{n,m}^*| \equiv |J^* - J(\mu_0^{*,n,m}, \mu_1^{*,n,m})| < 3\hat{\epsilon} + \delta < \epsilon.$$

The proof of Proposition 3.1 is a rather straightforward application of the theory of weak convergence as introduced in Section 1.2. The proofs of Proposition 3.2 and Corollary 3.3, on the other hand, require an in-depth analysis of the approximation properties of the proposed discretization scheme. We begin with the proof of Proposition 3.1, stated again for the sake of exposition.
Proposition 3.1.
For each $n\in\mathbb{N}$, assume that $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})\in\mathcal{M}^l_n$ is an $\epsilon$-optimal solution of the $l$-bounded, $(n,\infty)$-dimensional linear program. Then, for $\delta>0$, there exists an $N(\delta)$ such that
$$J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n}) - J^* \le 2\epsilon+\delta$$
for all $n\ge N(\delta)$.

Proof. Assume first that $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})$ converges weakly to some $(\mu^\epsilon_0,\mu^\epsilon_1)$. Then, for $g\in D_\infty$, and $g_k\to g$ in $D_\infty$, we have
$$\int Ag\,d\mu^\epsilon_0 + \int Bg\,d\mu^\epsilon_1 = \lim_{k\to\infty}\lim_{n\to\infty}\left(\int Ag_k\,d\mu^\epsilon_{0,n} + \int Bg_k\,d\mu^\epsilon_{1,n}\right) = \lim_{k\to\infty}\lim_{n\to\infty} Rg_k = \lim_{k\to\infty} Rg_k = Rg.$$
Thus, $(\mu^\epsilon_0,\mu^\epsilon_1)\in\mathcal{M}^l_\infty$. Now observe that, since $c_0$ and $c_1$ are continuous, $J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})\to J(\mu^\epsilon_0,\mu^\epsilon_1)$. Assume there were a $(\hat\mu_0,\hat\mu_1)\in\mathcal{M}^l_\infty$ such that $J(\mu^\epsilon_0,\mu^\epsilon_1)>J(\hat\mu_0,\hat\mu_1)+\epsilon$, that is, assume $(\mu^\epsilon_0,\mu^\epsilon_1)$ were not $\epsilon$-optimal. For $n$ large enough, we would have $J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})>J(\hat\mu_0,\hat\mu_1)+\epsilon$. But as $\mathcal{M}^l_n\supset\mathcal{M}^l_\infty$, this would imply that $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})$ is not $\epsilon$-optimal. So $(\mu^\epsilon_0,\mu^\epsilon_1)$ is $\epsilon$-optimal.

In general, we cannot guarantee weak convergence of $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})$, but since $E\times U$ is compact and the full mass of $\mu^\epsilon_{0,n}$ and $\mu^\epsilon_{1,n}$ is uniformly bounded by $1$ and $l$, respectively, the existence of a convergent subsequence is given by Theorem 1.9. Consider two different convergent subsequences of $\epsilon$-optimal measures $(\mu^\epsilon_{0,n_1},\mu^\epsilon_{1,n_1})$ and $(\mu^\epsilon_{0,n_2},\mu^\epsilon_{1,n_2})$. Set $z_1=\lim_{n_1\to\infty}J(\mu^\epsilon_{0,n_1},\mu^\epsilon_{1,n_1})$ and $z_2=\lim_{n_2\to\infty}J(\mu^\epsilon_{0,n_2},\mu^\epsilon_{1,n_2})$. Both $z_1$ and $z_2$ are $\epsilon$-optimal cost criterion values in $\mathcal{M}^l_\infty$, hence we can conclude that $|z_1-z_2|<\epsilon$. In particular, there is a $z\in\mathbb{R}$ such that $J(\mu^\epsilon_0,\mu^\epsilon_1)\in[z-\epsilon,z+\epsilon]$ for any weak limit $(\mu^\epsilon_0,\mu^\epsilon_1)$ of a sequence of $\epsilon$-optimal measures. This means that for $\delta>$
$0$, there is an $N\equiv N(\delta)$ large enough such that for all $n\ge N$, $J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})\in(z-\epsilon-\delta,\,z+\epsilon+\delta)$. Now assume that $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})$ does not converge weakly, and that $J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})\notin(z-\epsilon-\delta,\,z+\epsilon+\delta)$ infinitely often. This would allow for the construction of a subsequence $(\mu^\epsilon_{0,n_1},\mu^\epsilon_{1,n_1})$ with $J(\mu^\epsilon_{0,n_1},\mu^\epsilon_{1,n_1})\notin(z-\epsilon-\delta,\,z+\epsilon+\delta)$ for all $n_1\in\mathbb{N}$. However, by the tightness argument of Theorem 1.9, that sequence contains a converging sub-subsequence $(\mu^\epsilon_{0,n_2},\mu^\epsilon_{1,n_2})$. This means that $J(\mu^\epsilon_{0,n_2},\mu^\epsilon_{1,n_2})\in(z-\epsilon-\delta,\,z+\epsilon+\delta)$ eventually, contradicting the preceding assumption. So there exists an $N\in\mathbb{N}$ such that $J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})\in(z-\epsilon-\delta,\,z+\epsilon+\delta)$ for all $n\ge N$. Consider a limit $(\mu^\epsilon_0,\mu^\epsilon_1)$ of a convergent subsequence of $(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n})$. By its $\epsilon$-optimality and the properties of the infimum,
$$J^*+\epsilon \ \ge\ J(\mu^\epsilon_0,\mu^\epsilon_1)\ \ge\ z \quad\Leftrightarrow\quad J^*\ \ge\ z-\epsilon.$$
Hence, for all $n\ge N$,
$$J(\mu^\epsilon_{0,n},\mu^\epsilon_{1,n}) - J^* \ \le\ (z+\epsilon+\delta) - (z-\epsilon) \ =\ 2\epsilon+\delta,$$
from which the claim follows.

We proceed to analyze the discretization scheme of Section 2.1 and its approximation properties in order to prove Proposition 3.2. From here on, we consider a fixed $n$, assuming that it is large enough to guarantee that the statement of Proposition 3.1 holds. In particular, we consider the space $D_n$, which is spanned by the basis functions $f_1,f_2,\dots,f_n$. Proposition 3.2 states that we can approximate the cost criterion of a pair of measures $(\mu_0,\mu_1)\in\mathcal{M}^l_n$ arbitrarily closely by a pair of measures $(\mu_{0,m},\mu_{1,m})\in\mathcal{M}^l_{n,m}$. In the following, we consider a fixed pair of measures $(\mu_0,\mu_1)$.
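For concreteness, the kind of object the discretization scheme of Section 2.1 produces from a relaxed control can be sketched numerically. The sketch below is illustrative only: it assumes that the relaxed control admits a conditional density with respect to $du$ (which need not exist in general), and the function and variable names are ours, not the paper's.

```python
import numpy as np

def control_coefficients(eta_density, x_points, u_points):
    """Discretize a relaxed control eta0(du, x) into coefficients
    beta[j, i] ~= eta0([u_i, u_{i+1}), x_j), assuming (for this sketch)
    that eta0(du, x) has the conditional density eta_density(u, x)
    with respect to du; each cell integral uses the midpoint rule."""
    k = len(u_points) - 1
    beta = np.zeros((len(x_points), k + 1))
    for j, x in enumerate(x_points):
        for i in range(k):
            mid = 0.5 * (u_points[i] + u_points[i + 1])
            beta[j, i] = eta_density(mid, x) * (u_points[i + 1] - u_points[i])
    # the last control set is a singleton; assign it the leftover mass
    # so that each row is a probability vector
    beta[:, k] = 1.0 - beta[:, :k].sum(axis=1)
    return beta

# uniform relaxed control on U = [-1, 1] (density 1/2), five mesh points
beta = control_coefficients(lambda u, x: 0.5,
                            np.linspace(0.0, 1.0, 5),
                            np.linspace(-1.0, 1.0, 5))
assert np.allclose(beta.sum(axis=1), 1.0)
```

Each row of `beta` is a discrete probability vector over the control cells, which is exactly the data the finite-dimensional program manipulates.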
If $\eta_0$ and $\eta_1$ are the regular conditional probabilities of $\mu_0$ and $\mu_1$, respectively, we use the following choice of the coefficients $\beta_{j,i}$ and $\zeta_{j,i}$ in (2.6) and (2.8), respectively. We use the mesh points in $E$ and $U$ as defined in (2.4) and (2.3). Set
$$U_i = [u_i,u_{i+1}) \text{ for } 0\le i\le 2^{k_m}-1, \qquad U_{2^{k_m}} = \{u_{2^{k_m}}\} \qquad (3.1)$$
and, with that,
$$\beta_{j,i} = \eta_0(U_i,x_j) = \int_{U_i}\eta_0(du,x_j), \qquad j=0,\dots,2^m-1,\ i=0,\dots,2^{k_m}, \qquad (3.2)$$
$$\zeta_{1,i} = \eta_1(U_i,e_l) = \int_{U_i}\eta_1(du,e_l), \qquad \zeta_{2,i} = \eta_1(U_i,e_r) = \int_{U_i}\eta_1(du,e_r), \qquad (3.3)$$
$i=0,\dots,2^{k_m}$. The following two facts can easily be derived using the uniform continuity of $c_0$, $c_1$ as well as $f_1,\dots,f_n$ (recall that these are continuous functions on a compact set), and the specific forms of the generators $A$ and $B$ as seen in (1.8) and (1.4), respectively.

Lemma 3.6.
For $\epsilon>0$, there is a $\delta>0$ such that, uniformly in $x\in E$,
$$\max\big\{|c_0(x,u)-c_0(x,v)|,\,|Af_1(x,u)-Af_1(x,v)|,\dots,|Af_n(x,u)-Af_n(x,v)|\big\} < \epsilon$$
holds whenever $|u-v|<\delta$.

Lemma 3.7.
For $\epsilon>0$, there is a $\delta>0$ such that, for $s=e_l$ or $s=e_r$,
$$\max\big\{|c_1(s,u)-c_1(s,v)|,\,|Bf_1(s,u)-Bf_1(s,v)|,\dots,|Bf_n(s,u)-Bf_n(s,v)|\big\} < \epsilon$$
holds whenever $|u-v|<\delta$.

The following two results ensure that we can approximate the cost criterion of a pair of measures $(\mu_0,\mu_1)\in\mathcal{M}^l_n$ arbitrarily closely, and that our approximate measures 'almost' satisfy the linear constraints.

Proposition 3.8.
Consider a regular conditional probability $\eta_0$ and a probability density function $p$ stemming from a continuous occupation measure $\mu_0$ such that $\mu_0(dx\times du)=\eta_0(du,x)\,p(x)\,dx$. Let $g(x,u)=c_0(x,u)$ or $g(x,u)=Af_k(x,u)$ for any $k=1,2,\dots,n$. For $\epsilon>0$, there exists an $m_0\in\mathbb{N}$ such that for all $m\ge m_0$,
$$\left|\int_E\int_U g(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U g(x,u)\,\hat\eta_{0,m}(du,x)\,p(x)\,dx\right| < \epsilon, \qquad (3.4)$$
where $\hat\eta_{0,m}$ is of the form (2.6), using the coefficients specified in (3.2).

Proof. Observe that, given (2.6),
$$|I| \equiv \left|\int_E\int_U g(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U g(x,u)\,\hat\eta_{0,m}(du,x)\,p(x)\,dx\right| = \left|\int_E\int_U g(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\sum_{j=0}^{2^m-1}\sum_{i=0}^{2^{k_m}}\beta_{j,i}\,I_{E_j}(x)\,g(x,u_i)\,p(x)\,dx\right|.$$
By the definition of $\beta_{j,i}$ in (3.2) and the triangle inequality, it follows that
$$|I| \le \left|\int_E\int_U g(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\sum_{i=0}^{2^{k_m}} g(x,u_i)\int_{U_i}\eta_0(du,x)\,p(x)\,dx\right| + \left|\int_E\sum_{i=0}^{2^{k_m}} g(x,u_i)\left(\int_{U_i}\eta_0(du,x) - \sum_{j=0}^{2^m-1}\int_{U_i}\eta_0(du,x_j)\,I_{E_j}(x)\right)p(x)\,dx\right| \equiv |I_1| + |I_2|.$$
Observe that
$$|I_1| = \left|\int_E\sum_{i=0}^{2^{k_m}}\int_{U_i}\big(g(x,u)-g(x,u_i)\big)\,\eta_0(du,x)\,p(x)\,dx\right|.$$
By Lemma 3.6, there is a $\delta>0$ such that whenever $|u-v|<\delta$, we have $|g(x,u)-g(x,v)|<\frac{\epsilon}{2}$, uniformly in $x\in E$. Choose $m_1$ large enough such that for all $m\ge m_1$ it is true that $2^{-k_m}<\delta$. Then
$$|I_1| < \left|\int_E\sum_{i=0}^{2^{k_m}}\int_{U_i}\frac{\epsilon}{2}\,\eta_0(du,x)\,p(x)\,dx\right| = \frac{\epsilon}{2}. \qquad (3.5)$$
We now examine the term
$$|I_2| = \left|\int_E\sum_{i=0}^{2^{k_m}} g(x,u_i)\left(\int_{U_i}\eta_0(du,x) - \sum_{j=0}^{2^m-1} I_{E_j}(x)\int_{U_i}\eta_0(du,x_j)\right)p(x)\,dx\right|. \qquad (3.6)$$
Regard $I_2$ as a sequence with two indices, say $I_2(a,b)\equiv I_2(a,k_b)\equiv I_2$, where $a$ and $b$ are the two discretization levels, with a slight abuse of notation.

Our first claim is that $I_2(a,b)$ is a Cauchy sequence in $b$. To see this, we analyze two successive elements of the sequence. Consider
$$|I_2(a,b+1)-I_2(a,b)| = |I_2(a,k_{b+1})-I_2(a,k_b)| = \left|\int_E\left[\sum_{i=0}^{2^{k_{b+1}}} g(x,\tilde u_i)\left(\eta_0(\tilde U_i,x)-\sum_{j=0}^{2^a-1}I_{E_j}(x)\,\eta_0(\tilde U_i,x_j)\right) - \sum_{i=0}^{2^{k_b}} g(x,u_i)\left(\eta_0(U_i,x)-\sum_{j=0}^{2^a-1}I_{E_j}(x)\,\eta_0(U_i,x_j)\right)\right]p(x)\,dx\right|,$$
where $\tilde U_i$ and $\tilde u_i$ are used to indicate the partition of $U$ and the points of the discrete set in $U$ of the discretization level $b+1$, defined analogously to (2.3) and (3.1). Due to the additivity of measures, the two sums over $i$, if regarded as a Riemann-type approximation to an integral, only differ by a more accurate choice of the 'rectangle heights' $g(x,u_i)$ and $g(x,\tilde u_i)$ in the Riemann sum. To formalize this, for $i\in\{0,\dots,2^{k_{b+1}}\}$ let $\pi(i)\in\{0,\dots,2^{k_b}\}$ be the index such that $\tilde U_i\subset U_{\pi(i)}$. Observe that $\sum_{i=0}^{2^{k_{b+1}}}\big|\eta_0(\tilde U_i,x)-\eta_0(\tilde U_i,x_j)\big|\le 2$ for any $x$ and $x_j$, and thus independently of our choice of $E^{(a)}$.
This is due to the fact that regular conditional probabilities are indeed probability measures with a full mass of 1. Then,
$$|I_2(a,b+1)-I_2(a,b)| = \left|\int_E\sum_{i=0}^{2^{k_{b+1}}}\big(g(x,\tilde u_i)-g(x,u_{\pi(i)})\big)\left(\eta_0(\tilde U_i,x)-\sum_{j=0}^{2^a-1}I_{E_j}(x)\,\eta_0(\tilde U_i,x_j)\right)p(x)\,dx\right|$$
$$\le \int_E\sum_{i=0}^{2^{k_{b+1}}}\big|g(x,\tilde u_i)-g(x,u_{\pi(i)})\big|\cdot\left|\eta_0(\tilde U_i,x)-\sum_{j=0}^{2^a-1}I_{E_j}(x)\,\eta_0(\tilde U_i,x_j)\right|p(x)\,dx$$
$$\le K\left(\tfrac12\right)^{b+1}\int_E\sum_{i=0}^{2^{k_{b+1}}}\left|\eta_0(\tilde U_i,x)-\sum_{j=0}^{2^a-1}I_{E_j}(x)\,\eta_0(\tilde U_i,x_j)\right|p(x)\,dx \ \le\ K\left(\tfrac12\right)^{b+1}\cdot 2 \ =\ K\left(\tfrac12\right)^{b},$$
by the fact that $|g(x,\tilde u_i)-g(x,u_{\pi(i)})|$ is uniformly bounded by $K(1/2)^{b+1}$, with $K=1$ if $g(x,u)=c_0(x,u)$, and $K=\max\{\|f_1\|_D,\dots,\|f_n\|_D\}$ if $g(x,u)=Af_k(x,u)$, by our choice of $U^{(k_m)}$, compare (2.2). Now, for some $\vartheta>$
$0$, choose $b_0$ large enough such that $\sum_{j=b_0}^\infty\left(\tfrac12\right)^j < \frac{\vartheta}{K}$. Then, for all $b_2>b_1\ge b_0$, we have
$$|I_2(a,b_2)-I_2(a,b_1)| = \left|\sum_{j=b_1}^{b_2-1}\big(I_2(a,j+1)-I_2(a,j)\big)\right| \le \sum_{j=b_1}^{b_2-1}|I_2(a,j+1)-I_2(a,j)| \le K\sum_{j=b_1}^{b_2-1}\left(\tfrac12\right)^j < \vartheta, \qquad (3.7)$$
revealing that $I_2$ is Cauchy in $b$, which is the same as saying it is Cauchy in $k_b$. The bound on the increment of $I_2$ in $k_b$ (given by (3.7)) is independent of $a$, so it does not depend on the choice of $E^{(a)}$. Given this result, choose $m_2\ge m_1$ such that for all $m\ge m_2$, we have that $k_m$ is large enough to make $|I_2(m,b_1)-I_2(m,b_2)|<\frac{\epsilon}{4}$ hold for all $b_1,b_2\ge k_m$. An application of the dominated convergence theorem, together with the assumption that $x\mapsto\eta_0(U_i,x)$ is continuous almost everywhere, reveals that the choice of coefficients in (3.2) ensures that for $k_m$ fixed and for each $i\in\{0,1,\dots,2^{k_m}\}$, there is an $m^{(1,i)}$ large enough such that for all $m(i)\ge m^{(1,i)}$, we have
$$\int_E\left|\eta_0(U_i,x)-\sum_{j=0}^{2^{m(i)}-1}I_{E_j}(x)\,\eta_0(U_i,x_j)\right|p(x)\,dx < \frac{\epsilon}{4\,\max\{\|c_0\|_\infty,\|Af_1\|_\infty,\dots,\|Af_n\|_\infty\}\,(2^{k_m}+1)}. \qquad (3.8)$$
Set $\tilde m = \max\big\{\max_{i=0,\dots,2^{k_m}} m^{(1,i)},\,m_2\big\}$. Then,
$$I_2(\tilde m,k_m) \le \int_E\|g\|_\infty\sum_{i=0}^{2^{k_m}}\left|\eta_0(U_i,x)-\sum_{j=0}^{2^{\tilde m}-1}I_{E_j}(x)\,\eta_0(U_i,x_j)\right|p(x)\,dx \le \|g\|_\infty\sum_{i=0}^{2^{k_m}}\int_E\left|\eta_0(U_i,x)-\sum_{j=0}^{2^{\tilde m}-1}I_{E_j}(x)\,\eta_0(U_i,x_j)\right|p(x)\,dx \le \frac{\epsilon}{4}.
$I_2(\tilde m,k_m)$ is decreasing in $\tilde m$, which again is revealed using the dominated convergence theorem. Also, for $l\ge k_m$, we have
$$|I_2(\tilde m,l)| \le |I_2(\tilde m,l)-I_2(\tilde m,k_m)| + |I_2(\tilde m,k_m)| < \frac{\epsilon}{4}+\frac{\epsilon}{4} = \frac{\epsilon}{2},$$
so $I_2\equiv I_2(\tilde m,k_m)$ does not exceed $\frac{\epsilon}{2}$ when $\tilde m$ or $k_m$ increase. Choose $m_0=\max\{\tilde m,m_1\}$. Then, for all $m\ge m_0$, we have that $|I|<\epsilon$.

Proposition 3.9.
Consider a singular occupation measure $\mu_1$ that decomposes into $\mu_1(dx\times du)=\eta_1(du,x)\,\mu_{1,E}(dx)$. Let $g(x,u)=c_1(x,u)$ or $g(x,u)=Bf_k(x,u)$ for any $k=1,2,\dots,n$. For $\epsilon>0$, there exists an $m_0\in\mathbb{N}$ such that for all $m\ge m_0$,
$$\left|\int_E\int_U g(x,u)\,\eta_1(du,x)\,\mu_{1,E}(dx) - \int_E\int_U g(x,u)\,\hat\eta_{1,m}(du,x)\,\mu_{1,E}(dx)\right| < \epsilon, \qquad (3.9)$$
where $\hat\eta_{1,m}$ is of the form (2.8), using the coefficients specified in (3.3).

Proof. We only have to show that
$$\left|\int_U g(s,u)\,\eta_1(du,s) - \int_U g(s,u)\,\hat\eta_{1,m}(du,s)\right| < \frac{\epsilon}{\mu_{1,E}(E)} \qquad (3.10)$$
uniformly for $s=e_l$ or $s=e_r$, since $\mu_{1,E}$ only puts mass on these two points. By (3.3),
$$\left|\int_U g(s,u)\,\eta_1(du,s) - \int_U g(s,u)\,\hat\eta_{1,m}(du,s)\right| = \left|\sum_{i=0}^{2^{k_m}}\left(\int_{U_i} g(s,u)\,\eta_1(du,s) - g(s,u_i)\,\zeta_{j,i}\right)\right| = \left|\sum_{i=0}^{2^{k_m}}\left(\int_{U_i} g(s,u)\,\eta_1(du,s) - g(s,u_i)\int_{U_i}\eta_1(du,s)\right)\right| \le \sum_{i=0}^{2^{k_m}}\int_{U_i}\big|g(s,u)-g(s,u_i)\big|\,\eta_1(du,s)$$
holds.
According to Lemma 3.7, there is a $\delta>0$ such that $|g(s,u)-g(s,v)|<\frac{\epsilon}{\mu_{1,E}(E)}$ whenever $|u-v|<\delta$. Hence it suffices to choose $m_0$ large enough such that for all $m\ge m_0$ it is true that $2^{-k_m}<\delta$, which ensures that $|u-u_i|<\delta$ for $u\in U_i$.

Proposition 3.8 and Proposition 3.9 have established the approximation properties of the coefficient choices made by (3.2) and (3.3), without paying any respect to the constraints defining $\mathcal{M}^l_{n,m}$. We proceed to link these approximate controls to measures that actually fulfill those constraints. To do so, we need to be able to quantify how far away a given approximation is from satisfying the constraints. This motivates the following definitions.

Definition 3.10.
Let $\eta_0$ be any relaxed control. For $n,m\in\mathbb{N}$, define the constraint matrix $C^{(m)}\in\mathbb{R}^{(n+1)\times 2^m}$ by
$$C^{(m)}_{k,j} = \int_E\int_U Af_k(x,u)\,\eta_0(du,x)\,p_j(x)\,dx, \qquad k=1,2,\dots,n,\ j=0,1,\dots,2^m-1,$$
$$C^{(m)}_{n+1,j} = \int_E p_j(x)\,dx, \qquad j=0,1,\dots,2^m-1.$$

Definition 3.11. For $m\in\mathbb{N}$ and $\hat\eta_{0,m}$ given by (3.2), $\hat\eta_{1,m}$ given by (3.3) and some $\tilde p$ in the span of $\{p_0,p_1,\dots,p_{2^m-1}\}$, with $\mu_{1,E}$ being the state space marginal of the previously fixed measure $\mu_1$, the constraint error $d^{(m)}(\tilde p)\in\mathbb{R}^{n+1}$ is defined for $k=1,\dots,n$ by
$$d^{(m)}_k(\tilde p) = Rf_k - \int_E\int_U Af_k(x,u)\,\hat\eta_{0,m}(du,x)\,\tilde p(x)\,dx - \int_E\int_U Bf_k(x,u)\,\hat\eta_{1,m}(du,x)\,\mu_{1,E}(dx),$$
and for $k=n+1$ by $d^{(m)}_{n+1} = 1 - \int_E\tilde p(x)\,dx$.

One can increase $m$ to the point that the constraint matrix $C^{(m)}$ has full rank. This will help us find adjustments to the coefficients of $\tilde p$ which will let the constraint error vanish. The following results show that we can attain an arbitrarily small constraint error using the proposed approximation. Their proofs are rather technical, and are given in Appendix A.

Lemma 3.12.
Let $p$ be a probability density function with $\lambda(\{x:p(x)=0\})=0$. Then, for any $\epsilon>0$ and $D>0$, there exist an $\hat\epsilon<\epsilon$ and an $m_1$ such that for all $m\ge m_1$, there is a $\tilde p_m$ in the span of $\{p_0,p_1,\dots,p_{2^m-1}\}$ with $\|p-\tilde p_m\|_{L^1(E)}<\frac{\hat\epsilon}{D}$ and $\tilde p_m\ge\hat\epsilon$ on $E$.

Lemma 3.13.
Consider a pair of measures $(\mu_0,\mu_1)\in\mathcal{M}_{n,\infty}$, and let $\mu_0(dx\times du)=\eta_0(du,x)\,p(x)\,dx$ as well as $\mu_1(dx\times du)=\eta_1(du,x)\,\mu_{1,E}(dx)$. Let $\bar A=\max_{k=1,\dots,n}\|Af_k\|_\infty$. For $\delta>0$ and $D\ge 1$, there exist an $\hat\epsilon\le\delta$ and an $m_1\in\mathbb{N}$ such that for all $m\ge m_1$, there is a function $\tilde p_m$ in the span of $\{p_0,p_1,\dots,p_{2^m-1}\}$ with $\|d^{(m)}(\tilde p_m)\|_\infty<\hat\epsilon$, where $d^{(m)}(\tilde p_m)$ is the constraint error using the approximations $\hat\eta_{0,m}(du,x)$ and $\hat\eta_{1,m}(du,x)$ of the given controls $\eta_0$ and $\eta_1$ defined by the coefficients given in (3.2) and (3.3). In particular, $\|p-\tilde p_m\|_{L^1(E)}<\frac{\hat\epsilon}{2\max\{\bar A,1\}}$ as well as $\tilde p_m\ge D\cdot\hat\epsilon$ holds.

Next, we establish that we can find a 'correction' term $y$ for the coefficients of $\tilde p$, which can be used to define a measure that satisfies the constraints, with a maximum norm that does not exceed a given bound $\epsilon$. For the proof of the following statement, we again refer to Appendix A.

Lemma 3.14.
Consider a pair of measures $(\mu_0,\mu_1)\in\mathcal{M}_{n,\infty}$, and let $\mu_0(dx\times du)=\eta_0(du,x)\,p(x)\,dx$ as well as $\mu_1(dx\times du)=\eta_1(du,x)\,\mu_{1,E}(dx)$. For any $\vartheta>0$, there are a $\hat\vartheta<\vartheta$ and an $m_2\in\mathbb{N}$ such that for all $m\ge m_2$, there is a $\tilde p_m\in\mathrm{span}\{p_0,p_1,\dots,p_{2^m-1}\}$ such that the equation $C^{(m)}y=-d^{(m)}(\tilde p_m)$ has a solution $\tilde y$ with $\|\tilde y\|_\infty\le\hat\vartheta$. In particular, $\tilde p_m\ge\hat\vartheta$ and $\|p-\tilde p_m\|_{L^1(E)}<\hat\vartheta$ hold.

At this point, we have gathered the results to prove Proposition 3.2, stated again for the sake of presentation.
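Numerically, the constraint matrix of Definition 3.10 and the correction step of Lemma 3.14 amount to ordinary linear algebra. The following sketch is ours and makes simplifying assumptions (the relaxed control is a discrete measure, cell integrals use the midpoint rule, and a minimum-Euclidean-norm least-squares solution stands in for the small-max-norm solution of the lemma):

```python
import numpy as np

def constraint_matrix(Af_list, u_points, u_weights, mesh):
    """Assemble a constraint matrix in the spirit of Definition 3.10 for
    piecewise-constant basis functions p_j = I_{E_j} on the cells
    E_j = [mesh[j], mesh[j+1]).  The relaxed control is assumed to be
    the discrete measure sum_i u_weights[i] * delta_{u_points[i]}."""
    n, m = len(Af_list), len(mesh) - 1
    C = np.zeros((n + 1, m))
    for j in range(m):
        h = mesh[j + 1] - mesh[j]
        xm = 0.5 * (mesh[j] + mesh[j + 1])   # midpoint of cell E_j
        for k, Af in enumerate(Af_list):
            C[k, j] = h * sum(w * Af(xm, u)
                              for u, w in zip(u_points, u_weights))
        C[n, j] = h  # last row: integral of p_j over E
    return C

def correction(C, d):
    """Solve C y = -d as in Lemma 3.14; the minimum-norm least-squares
    solution serves as a computable proxy for a small solution."""
    y, *_ = np.linalg.lstsq(C, -np.asarray(d), rcond=None)
    return y

# toy instance: one constraint function with A f(x, u) = x u,
# control mass concentrated at u = 1, four state cells on [0, 1]
C = constraint_matrix([lambda x, u: x * u], [1.0], [1.0],
                      np.linspace(0.0, 1.0, 5))
y = correction(C, [0.1, -0.05])
assert np.allclose(C @ y, [-0.1, 0.05])
```

When the residual of the least-squares solve is nonzero, the matrix is rank deficient and, as the text suggests, the discretization level $m$ has to be increased.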
Proposition 3.2.
For $(\mu_0,\mu_1)\in\mathcal{M}^l_n$ and each $\epsilon>0$, there is an $m_0$ such that for all $m\ge m_0$ there exists a $(\hat\mu_{0,m},\hat\mu_{1,m})\in\mathcal{M}^l_{n,m}$ with $|J(\mu_0,\mu_1)-J(\hat\mu_{0,m},\hat\mu_{1,m})|<\epsilon$.

Proof.
Fix $\epsilon>$
$0$. For $(\mu_0,\mu_1)\in\mathcal{M}^l_n$, let $\mu_{1,E}$ be the state space marginal of $\mu_1$ and let $\eta_1$ be the regular conditional probability such that $\mu_1(dx\times du)=\eta_1(du,x)\,\mu_{1,E}(dx)$. Similarly, let $\mu_0(dx\times du)=\eta_0(du,x)\,p(x)\,dx$. Define $\hat\eta_{0,m}$ and $\hat\eta_{1,m}$ using the coefficients given by (3.2) and (3.3), respectively. First, by Proposition 3.9, there is an $m_1$ such that for all $m\ge m_1$,
$$\left|\int_E\int_U c_1(x,u)\,\eta_1(du,x)\,\mu_{1,E}(dx) - \int_E\int_U c_1(x,u)\,\hat\eta_{1,m}(du,x)\,\mu_{1,E}(dx)\right| < \frac{\epsilon}{2}. \qquad (3.11)$$
Now, we consider the approximation of the cost accrued by $c_0$. We will show that
$$\left|\int_E\int_U c_0(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U c_0(x,u)\,\hat\eta_{0,m}(du,x)\,\hat p_m(x)\,dx\right| < \frac{\epsilon}{2} \qquad (3.12)$$
for the given choice of $\hat\eta_{0,m}$ and a choice of $\hat p_m$ to be identified. This will be done by a successive application of the triangle inequality. First, observe that
$$\left|\int_E\int_U c_0\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U c_0\,\hat\eta_{0,m}(du,x)\,\hat p_m(x)\,dx\right| \le \left|\int_E\int_U c_0\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U c_0\,\hat\eta_{0,m}(du,x)\,p(x)\,dx\right| + \left|\int_E\int_U c_0\,\hat\eta_{0,m}(du,x)\,p(x)\,dx - \int_E\int_U c_0\,\hat\eta_{0,m}(du,x)\,\hat p_m(x)\,dx\right| \equiv |I_1|+|I_2|.$$
Now set
$$\vartheta = \min\left\{\frac{\epsilon}{8\,\|c_0\|_\infty\max\{1,(e_r-e_l)\}},\ \frac{\epsilon}{8\,\max\{\bar A,1\}\,\|c_0\|_\infty}\right\}. \qquad (3.13)$$
By Lemma 3.14 we can choose an $m_2\ge m_1$ such that for all $m\ge m_2$, there is a function $\tilde p_m=\sum_{i=0}^{2^m-1}\tilde\gamma_i p_i\in\mathrm{span}\{p_0,p_1,\dots
,p_{2^m-1}\}$ that allows for a solution $\tilde y$ to $C^{(m)}y=-d^{(m)}(\tilde p_m)$ with $\|\tilde y\|_\infty\le\hat\vartheta<\vartheta$ for some $\hat\vartheta<\vartheta$, but $\tilde p_m\ge\hat\vartheta$. This $m_2$ is also large enough to approximate $p$ by $\tilde p_m$ with an accuracy of $\frac{\epsilon}{8\|c_0\|_\infty}$ for all $m\ge m_2$. Define new coefficients $\gamma_i=\tilde\gamma_i-\tilde y_i$ and set $\hat p_m=\sum_{i=0}^{2^m-1}\gamma_i p_i$. Then, for all $k=1,2,\dots,n$,
$$d^{(m)}_k(\hat p_m) = Rf_k - \big(C^{(m)}\gamma\big)_k - \int_E\int_U Bf_k(x,u)\,\hat\eta_{1,m}\,\mu_{1,E}(dx) = Rf_k - \big(C^{(m)}(\tilde\gamma-\tilde y)\big)_k - \int_E\int_U Bf_k(x,u)\,\hat\eta_{1,m}\,\mu_{1,E}(dx)$$
$$= Rf_k - \big(C^{(m)}\tilde\gamma\big)_k - \int_E\int_U Bf_k(x,u)\,\hat\eta_{1,m}\,\mu_{1,E}(dx) + \big(C^{(m)}\tilde y\big)_k = d^{(m)}_k(\tilde p_m) - d^{(m)}_k(\tilde p_m) = 0$$
and
$$d^{(m)}_{n+1}(\hat p_m) = 1 - (C^{(m)}\tilde\gamma)_{n+1} + (C^{(m)}\tilde y)_{n+1} = d^{(m)}_{n+1}(\tilde p_m) - d^{(m)}_{n+1}(\tilde p_m) = 0,$$
which shows that $\hat p_m$ fulfills the constraints. But also, $\hat p_m\ge$
$0$. So $(\hat\eta_{0,m}(du,x)\,\hat p_m(x)\,dx,\ \hat\eta_{1,m}(du,x)\,\mu_{1,E}(dx))\in\mathcal{M}^l_{n,m}$. Furthermore,
$$\|p-\hat p_m\|_{L^1(E)} \le \|p-\tilde p_m\|_{L^1(E)} + \|\tilde p_m-\hat p_m\|_{L^1(E)} \le \frac{\epsilon}{8\|c_0\|_\infty} + \frac{\epsilon}{8\|c_0\|_\infty} = \frac{\epsilon}{4\|c_0\|_\infty}.$$
This shows that
$$|I_2| \le \int_E\|c_0\|_\infty\int_U\hat\eta_{0,m}(du,x)\,|p(x)-\hat p_m(x)|\,dx \le \|c_0\|_\infty\,\|p-\hat p_m\|_{L^1} < \frac{\epsilon}{4},$$
since $\int_U\hat\eta_{0,m}(du,x)=1$. Turning to $|I_1|$, we have seen in Proposition 3.8 that there is an $m_3\ge m_2$ such that for all $m\ge m_3$, $|I_1|<\frac{\epsilon}{4}$. To sum up,
$$\left|\int_E\int_U c_0(x,u)\,\eta_0(du,x)\,p(x)\,dx - \int_E\int_U c_0(x,u)\,\hat\eta_{0,m}(du,x)\,\hat p_m(x)\,dx\right| \le |I_1|+|I_2| \le \frac{\epsilon}{4}+\frac{\epsilon}{4} = \frac{\epsilon}{2}$$
for $m_0=m_3$, which together with (3.11) proves the claim.

So far, we have investigated the approximation of arbitrary measures. Proposition 3.2 is instrumental in proving the next important result, Corollary 3.3, which analyzes how optimal solutions in $\mathcal{M}^l_{n,m}$ relate to $\epsilon$-optimal solutions in $\mathcal{M}^l_n$. The following lemma will be needed as well. Its proof is similar to an argument used in the proof of Proposition 3.1, and thus is omitted.

Lemma 3.15. Let $\{(\mu_{0,n,m},\mu_{1,n,m})\}$ be a sequence of measures such that for each $m$, $(\mu_{0,n,m},\mu_{1,n,m})\in\mathcal{M}^l_{n,m}$. Assume that $\mu_{0,n,m}\Rightarrow\hat\mu_{0,n}$ and $\mu_{1,n,m}\Rightarrow\hat\mu_{1,n}$ as $m\to\infty$. Then, $(\hat\mu_{0,n},\hat\mu_{1,n})\in\mathcal{M}^l_n$.

Now we can prove Corollary 3.3 from Section 3.1.
Corollary 3.3.
For each $m\in\mathbb{N}$, assume that $(\mu^*_{0,n,m},\mu^*_{1,n,m})\in\mathcal{M}^l_{n,m}$ is an optimal solution to the $l$-bounded, $(n,m)$-dimensional linear program. Then the sequence of numbers $\{J(\mu^*_{0,n,m},\mu^*_{1,n,m})\}_{m\in\mathbb{N}}$ converges to $J^*_n=\inf_{(\mu_{0,n},\mu_{1,n})\in\mathcal{M}^l_n}J(\mu_{0,n},\mu_{1,n})$ as $m\to\infty$.

Proof. First, observe that if $\mu^*_{0,n,m}\Rightarrow\hat\mu^*_{0,n}$ and $\mu^*_{1,n,m}\Rightarrow\hat\mu^*_{1,n}$ as $m\to\infty$ for some $(\hat\mu^*_{0,n},\hat\mu^*_{1,n})\in\mathcal{M}^l_n$, it follows that $J(\hat\mu^*_{0,n},\hat\mu^*_{1,n})=J^*_n$. The proof of this claim is as follows. Assume the opposite. Then there is a pair of measures $(\hat\mu_{0,n},\hat\mu_{1,n})\in\mathcal{M}^l_n$ with $J(\hat\mu^*_{0,n},\hat\mu^*_{1,n})-J(\hat\mu_{0,n},\hat\mu_{1,n})>\epsilon$ for some $\epsilon>0$. Choose $m_1$ large enough such that for all $m\ge m_1$, $|J(\hat\mu^*_{0,n},\hat\mu^*_{1,n})-J(\mu^*_{0,n,m},\mu^*_{1,n,m})|<\frac{\epsilon}{2}$, and hence $J(\mu^*_{0,n,m},\mu^*_{1,n,m})-J(\hat\mu_{0,n},\hat\mu_{1,n})>\frac{\epsilon}{2}$ for all $m\ge m_1$. By Proposition 3.2, select $m\ge m_1$ large enough that there is a pair of measures $(\hat\mu_{0,n,m},\hat\mu_{1,n,m})\in\mathcal{M}^l_{n,m}$ with $|J(\hat\mu_{0,n},\hat\mu_{1,n})-J(\hat\mu_{0,n,m},\hat\mu_{1,n,m})|<\frac{\epsilon}{2}$. But then $J(\hat\mu_{0,n,m},\hat\mu_{1,n,m})<J(\mu^*_{0,n,m},\mu^*_{1,n,m})$, contradicting that $(\mu^*_{0,n,m},\mu^*_{1,n,m})$ is the optimal solution in $\mathcal{M}^l_{n,m}$.

Also, $\{J(\mu^*_{0,n,m},\mu^*_{1,n,m})\}_{m\in\mathbb{N}}$ is a decreasing sequence which is bounded from below, so it converges. As $\{\mu^*_{0,n,m}\}_{m\in\mathbb{N}}$ and $\{\mu^*_{1,n,m}\}_{m\in\mathbb{N}}$ are sequences of measures over a compact space, they are tight, and the full mass of $\{\mu^*_{1,n,m}\}_{m\in\mathbb{N}}$ is uniformly bounded by $l$. So there is a convergent subsequence $\{(\mu^*_{0,n,m_1},\mu^*_{1,n,m_1})\}$ with
$$\mu^*_{0,n,m_1}\Rightarrow\hat\mu^*_{0,n} \quad\text{and}\quad \mu^*_{1,n,m_1}\Rightarrow\hat\mu^*_{1,n} \qquad (3.14)$$
for some $(\hat\mu^*_{0,n},\hat\mu^*_{1,n})\in\mathcal{M}^l_n$.
By the first part of this proof and because $c_0$ and $c_1$ are bounded and uniformly continuous,
$$J^*_n = J(\hat\mu^*_{0,n},\hat\mu^*_{1,n}) = \int c_0\,d\hat\mu^*_{0,n} + \int c_1\,d\hat\mu^*_{1,n} = \lim_{m_1\to\infty}\left(\int c_0\,d\mu^*_{0,n,m_1} + \int c_1\,d\mu^*_{1,n,m_1}\right) = \lim_{m_1\to\infty} J(\mu^*_{0,n,m_1},\mu^*_{1,n,m_1}),$$
but $\{J(\mu^*_{0,n,m},\mu^*_{1,n,m})\}_{m\in\mathbb{N}}$ converges, and any subsequence has to converge to its very limit. So,
$$\lim_{m\to\infty} J(\mu^*_{0,n,m},\mu^*_{1,n,m}) = J^*_n.$$

Consider a stochastic control problem with state space $E=[0,$
$1]$ such that the process is governed by the SDE
$$dX_t = u(X_t)\,dt + \sigma\,dW_t + d\xi_t, \qquad X_0 = x_0,$$
in which $u(x)\in U=[-1,1]$ and $\xi$ is a process that captures the singular behavior of $X$. The latter is given by a reflection to the right at $\{0\}$ and a jump from $\{1\}$ to $\{0\}$. We use the relaxed martingale formulation, compare Definition 1.1, and retain the coefficient functions $b(x,u)=u$ and $\sigma(x,u)\equiv\sigma$. We adopt the long-term average cost criterion, with cost functions $c_0(x,u)=x$, $c_1(e_r,u)\equiv c_1$ at the right endpoint for some $c_1\in\mathbb{R}^+$, and $c_1(e_l,u)=0$ at the left endpoint. This problem is known as the modified bounded follower in the literature. According to [10], the optimal control for this problem is a degenerate relaxed control $\eta_0$ with $\eta_0(\{u_a(x)\},x)=1$, where $u_a$ is of the form
$$u_a(x) = \begin{cases} -1 & x < a, \\ +1 & x \ge a. \end{cases}$$
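The controlled dynamics just described are easy to simulate directly, which gives a useful sanity check on the running-cost part of the criterion. The sketch below is ours: an Euler-Maruyama scheme under the bang-bang control $u_a$, with all parameter values chosen purely for illustration (they are not the configuration from the paper's tables).

```python
import numpy as np

# Euler-Maruyama sketch of the modified bounded follower under the
# bang-bang control u_a: reflection to the right at 0, jump from 1 to 0,
# running cost c0(x, u) = x.  Parameter values are illustrative only.
def simulate(a, sigma, x0=0.5, dt=1e-3, n_steps=100_000, seed=1):
    rng = np.random.default_rng(seed)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    x, running_cost, jumps = x0, 0.0, 0
    for dw in noise:
        u = -1.0 if x < a else 1.0   # the control u_a
        x += u * dt + dw
        if x < 0.0:                  # reflection at the left boundary
            x = -x
        if x >= 1.0:                 # jump from the right boundary to 0
            x = 0.0
            jumps += 1
        running_cost += x * dt
    # the full criterion also charges c1 for the singular behavior at the
    # right boundary; here we only report the running-cost part
    return running_cost / (n_steps * dt), jumps

avg_cost, jumps = simulate(a=0.75, sigma=np.sqrt(0.5))
assert 0.0 <= avg_cost < 1.0
```

Since the state stays in $[0,1)$, the time-averaged running cost must lie in $[0,1)$ as well, which the final assertion checks.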
[Figure 4.1: Computed optimal relaxed control, coarse grid, modified bounded follower. Figure 4.2: Computed optimal average control, coarse grid, modified bounded follower.]

The 'switching point' $a$ depends on the coefficient and cost functions. Furthermore, the state space marginal $\mu_{0,E}$ under the optimal control has the density
$$p_a(x) = \frac{\int_x^1 \exp\!\left(\int_y^x \frac{2}{\sigma^2}\,u_a(z)\,dz\right)dy}{\int_0^1\int_x^1 \exp\!\left(\int_y^x \frac{2}{\sigma^2}\,u_a(z)\,dz\right)dy\,dx}.$$
We will compare the performance of the proposed numerical method against this analytic solution. Table 4.1 shows the configuration of the problem, along with the optimal switching point $a$ under this configuration and the weights of the occupation measure $\mu_1$, capturing the singular behavior on the left boundary $\{0\}$ and on the right boundary $\{1\}$, denoted $w_0$ and $w_1$, respectively. It also shows the value of the cost criterion $J^*$ under the optimal control. Table 4.2 and Table 4.3 show the results and performance measures for various

[Table 4.1: Configuration and analytic solution, modified bounded follower — columns $x_0$, $\sigma$, $c_1$, $a$, $w_0$, $w_1$, $J^*$.]
[Table 4.2: Results (1), modified bounded follower — columns $n$, $m$, $T$, $J^*_{n,m}$, $e_a$, $e_r$, $e_{L^1}$.]

discretization levels $n$ and $m$. To achieve higher accuracy, we added another mesh point for the choice of basis functions for $p$ by cutting the interval in the middle of the state space in half. As the cost function does not depend on $u$, we expect the optimal solution to be a bang-bang control. Hence it suffices to choose $k_m=0$, which means the optimization has to choose between the two possible control values $\{-1,1\}$.

Table 4.2 shows the result for the approximate cost criterion $J^*_{n,m}$. The column $e_a$ refers to the absolute error between $J^*$ and $J^*_{n,m}$, the column $e_r$ to the relative error between $J^*$ and $J^*_{n,m}$, and the column $e_{L^1}$ to the $L^1(E)$-distance between $\hat p_m$ and $p$. $T$ is the execution time, which is an average taken from 1000 repetitions of the same optimization run. In Table 4.3, $\hat w_0$ and $\hat w_1$ refer to the approximate values for $w_0$ and $w_1$. The discretization levels $m$ for Table 4.3 are the same as in Table 4.2 for the respective $n$, and, as before, $e_a$ refers to the absolute error and $e_r$ to the relative error of these quantities. Note that the

[Table 4.3: Results (2), modified bounded follower — columns $n$, $\hat w_0$, $e_a$, $e_r$, $\hat w_1$, $e_a$, $e_r$.]
method already produces fairly accurate approximations in almost negligible time for $n=5$ or $n=6$. The over-proportional increase in computing time for higher discretization levels ($n=10$ and $n=11$) is due to longer execution times of the linear program solver, and might indicate that the approximate problem is becoming ill-conditioned. For $n=12$ and $m=12$, no reliable solution could be produced. In this case, the linear programming solver could find no point satisfying the constraints, which can be circumvented by increasing the discretization level $m$ without increasing the number of constraints $n$. However, this did not show better performance than the presented cases. The absolute error for $n=11$ is on a comparable level to results obtained in [20]. Both the error of the cost criterion value and the $L^1$-error of the state space density are steadily decreasing, which, together with the presented convergence results, is a strong indication of a convergent method. The inferior approximation quality at $n=11$ compared to $n=10$ is believed to be due to the problem becoming ill-conditioned.

Figure 4.1 shows the computed relaxed control $\hat\eta_0$ for $n=4$ and $m=4$. Figure 4.2 shows the average
101 0.750.5 p r ob a b ilit y state spacecontrol space Figure 4.3: Computed optimal relaxed control,fine grid, modified bounded follower state space -1-0.500.51 a v e r a g e c on t r o l v a l u e Figure 4.4: Computed optimal average control,fine grid, modified bounded followercontrol value specified by this relaxed control. These figures have to be understood as follows. Figure 4.1displays the full relaxed control, specifying the probability to pick a certain control value u when the processis in a certain state x . This state x is found on the x -axis of the plot, labeled ‘state space’ and the choice of acontrol value u corresponds to the y -axis of the plot, labeled ‘control space’, while the probability ˆ η ( { u } , x )of picking this control value u is presented on the z -axis, labeled ‘probability’. For example, at x = 0 . u = − u = 1 is chosen with probability0. We can see that for any possible value of x , ˆ η assigns full mass on either one of the two possible controlvalues u = − u = 1. Hence, ˆ η can be represented by its average control function, which is given by x (cid:55)→ (cid:82) U u ˆ η ( du, x ). It is shown in Figure 4.2. In both Figure 4.1 and Figure 4.2, the red dots represent themesh points of the mesh E ( m ) as defined in (2.4). The switching point a at x = 0 .
75, where the control switches from $-1$ to $+1$, is clearly visible. The approximate state space density for $n=4$ and $m=4$, displayed in blue in Figure 4.5, clearly shows the features inherited from the piecewise constant basis functions we use to approximate $p$. Its irregular pattern is due to the fact that we introduced an additional mesh point in the middle of the state space. Figure 4.5 also shows the exact solution, displayed in red.

For a finer grid with parameters $n=10$ and $m=10$, Figure 4.3 shows the computed relaxed control $\hat\eta_0$. Figure 4.4 shows the average control function. The switching point $a$ again is clearly visible. The red dots indicating the mesh points lie so densely that they form a solid line in both plots. Figure 4.6 shows the approximate state space density for the parameter choice of $n=10$ and $m=10$. The exact solution could not be visually distinguished from the approximate solution and is thus omitted from the figure. One can also see a change in concavity of the state space density at roughly $x=0.$
75, which is where the controlswitches its behavior from selecting u = − u = 1. state space d e n s it y Figure 4.5: State space density, coarse grid, mod-ified bounded follower state space d e n s it y Figure 4.6: State space density, fine grid, modifiedbounded follower
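The reduction of a degenerate relaxed control to its average control function, as used for Figures 4.2 and 4.4, amounts to a weighted sum over the control grid at each state mesh point. The following Python sketch illustrates this with hypothetical data: the state mesh, the two-point control grid, and the switching point are stand-ins for exposition, not the computed solution of the paper.

```python
import numpy as np

# Hypothetical discrete relaxed control: eta[i, j] is the probability of
# choosing control value u_grid[j] when the state is at mesh point x_grid[i].
x_grid = np.linspace(0.0, 1.0, 11)   # state-space mesh (assumed for illustration)
u_grid = np.array([-1.0, 1.0])       # two admissible control values
switch = 0.75                        # switching point a (assumed)

# Degenerate relaxed control: full mass on u = -1 left of a, on u = +1 right of a.
eta = np.zeros((x_grid.size, u_grid.size))
eta[x_grid < switch, 0] = 1.0
eta[x_grid >= switch, 1] = 1.0

# Average control function x -> \int_U u eta(du, x); in the discrete case this
# is a probability-weighted sum over the control grid at each state mesh point.
avg_control = eta @ u_grid
print(avg_control)  # -1 below the switching point, +1 above it
```

Since the relaxed control puts full mass on a single control value at each state, the average control carries the same information as the full surface plot, which is why the one-dimensional plots in Figures 4.2 and 4.4 suffice.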
To illustrate the performance of the numerical method on a different type of problem, consider a stochastic control problem with state space $E = [-1, 1]$ such that the process is governed by the SDE
$$dX_t = u(X_t)\,dt + \sigma\,dW_t + d\xi_t, \qquad X_0 = x_0,$$
in which $u(x) \in U = [-1, 1]$ and $\xi$ models reflections at $-1$ and $1$, the boundary points of $E$. $X$ can be viewed as a particle that randomly diffuses inside a confined space and bounces off at the boundaries. Again, we adopt the long-term average cost criterion and use the relaxed martingale formulation, compare Definition 1.1. We retain the coefficient functions $b(x,u) = u$ and $\sigma(x,u) \equiv \sigma$. To differentiate this example from the previous one, consider a cost structure given by $c_0(x,u) = x^2 + u^2$ and $c_1(x,u) \equiv c_1$ for some $c_1 \in \mathbb{R}_+$ at both the left endpoint $x = -1$ and the right endpoint $x = 1$. In particular, this means that using the control induces a cost. In contrast to the modified bounded follower of Section 4.1, we will see a different structure of the control since choosing the maximal or minimal control values might not be optimal any longer, as this introduces additional costs. For this problem, no analytical solution is known to the authors.

We examine the influence of the cost of reflection $c_1$ on the optimal control. All subsequent calculations use the same fixed value of $\sigma$ and the parameters $n = 9$, $m = 9$ and $k_m = m + 3 = 12$. The latter is needed to attain a sufficient approximation of the cost function, compare (2.2).

Figure 4.7 shows the average control $x \mapsto \int_U u\,\hat\eta(du,x)$ for a cost of reflection given by $c_1 = 0.01$. We chose to show a plot of the average control function rather than the full relaxed control since the numerical solutions were degenerate relaxed controls, putting full mass on the values attained by $x \mapsto \int_U u\,\hat\eta(du,x)$, with the exception of small rounding errors. Moreover, a full visualization of the relaxed control as seen in Section 4.1 is infeasible due to the high number of possible control values. Figure 4.8 shows the state space density associated with the control of Figure 4.7. The computed optimality criterion is $J^*_{n,m} = 0.\ldots$

[Figure 4.7: Average of optimal control, $c_1 = 0.01$, simple particle problem. Figure 4.8: State space density, $c_1 = 0.01$, simple particle problem.]

The interesting observation from this simulation is that the optimal control favors using the effect of the reflection to efficiently 'push' the process back into the interior. As the penalty for the reflection with $c_1 = 0.01$ is rather mild compared to the cost of using the control at full scale, increasingly less influence is exerted by the control as the process moves closer to the boundaries $-1$ and $1$.

Figure 4.9 shows the average control for $c_1 = 1$. It reveals that with a higher penalty for the reflection, it is beneficial to use the control more extensively, although a similar pattern as in the previous case can be observed when the process approaches the boundaries of the state space. The control is used slightly less in this area in order to benefit from the reflection in the direction of the origin. The overall heavier use of the control results in a state space density (Figure 4.10) which is more concentrated around the origin than the one from the previous case, see Figure 4.8. The value of the optimality criterion is given by $J^*_{n,m} = 0.\ldots$

Finally, consider a high cost of reflection given by $c_1 = 6$. Figure 4.11 shows the optimal control in this setting, and Figure 4.12 displays the state space density. In contrast to the previous two cases, the optimal control tries to avoid a reflection under all circumstances by pushing back towards the origin with full force when the process approaches the boundaries of the state space. Still, a trade-off is made when the process is close to the origin, and the control is used with less than full force to avoid the costs induced by $c_0$. The state space density concentrates even more around the origin in this setting. The value of the cost criterion is given by $J^*_{n,m} = 0.\ldots$

[Figure 4.9: Average of optimal control, $c_1 = 1$, simple particle problem. Figure 4.10: State space density, $c_1 = 1$, simple particle problem. Figure 4.11: Average of optimal control, $c_1 = 6$, simple particle problem. Figure 4.12: State space density, $c_1 = 6$, simple particle problem.]

The considerations presented in the present paper can be extended in several ways. From a numerical analysis point of view, it is highly interesting how the numerical scheme behaves if higher-order basis elements are used to approximate the density $p$ of the state space marginal of the occupation measure $\mu_0$. The analytic solution described in Section 4.1 has a density that is infinitely differentiable everywhere but at one point, thus justifying the use of, for example, piecewise linear basis functions.
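A caveat for elements of order larger than one, picked up in the next paragraph, is that non-negative nodal coefficients no longer guarantee a non-negative function. A short Python check on a generic quadratic Lagrange element makes this concrete; the reference element and its nodes are standard textbook choices, not the paper's implementation.

```python
import numpy as np

# Quadratic Lagrange basis on the reference interval [0, 1] with nodes 0, 1/2, 1.
# (Generic textbook element, used here only to illustrate the sign issue.)
def quad_lagrange(x):
    return np.array([
        2.0 * (x - 0.5) * (x - 1.0),  # phi_0: equals 1 at node 0, 0 at the others
        -4.0 * x * (x - 1.0),         # phi_1: equals 1 at node 1/2
        2.0 * x * (x - 0.5),          # phi_2: equals 1 at node 1
    ])

coeffs = np.array([1.0, 0.0, 0.0])    # non-negative nodal coefficients
xs = np.linspace(0.0, 1.0, 101)
values = np.array([coeffs @ quad_lagrange(x) for x in xs])

print(values.min())  # -0.125, attained near x = 0.75: the function dips below zero
```

By contrast, piecewise constant elements (and piecewise linear hat functions) are themselves non-negative, so restricting the coefficients to be non-negative does guarantee a non-negative approximate density.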
However, this would require an adaption of the presented convergence proof, in particular regarding the analysis leading up to the proof of Lemma 3.14. One aspect to be addressed is the fact that as soon as we use standard elements with an order larger than 1, for example quadratic Lagrange elements, the non-negativity of the approximate density cannot be guaranteed by restricting the coefficients to be non-negative.

Another topic to research is the introduction of adaptive meshing techniques for both state and control space. Analytic or heuristic error estimators could guide a successive refinement of the meshes, leading to an increase in accuracy without significantly higher computation time.

From a modeling point of view, on the one hand, an adaption of the discretization scheme to models featuring an unbounded state space would enhance the number of applications for this numerical scheme. Several control problems in finance and economics feature an unbounded state space, and are well suited for the linear programming approach. Initial investigations show that models with an unbounded state space can be approximated using a bounded state space with reflection boundaries. A full analysis of this approach would allow us to use the methods presented in this paper to solve such models.

On the other hand, problems with a finite time horizon or even optimal stopping problems could also be solved with similar numerical techniques. While the analytic linear programming approach to such problems is well studied, the discretization techniques presented in this paper would have to be enhanced to reflect the time dependency of both constraint functions and measures. A numerical analysis of such techniques was conducted in [17], but a convergence analysis remained unconsidered.

A Additional proofs
This appendix provides the proofs of Lemma 3.12, Lemma 3.13 and Lemma 3.14.
Proof of Lemma 3.12.
Find $\hat\epsilon < \epsilon$ such that $\lambda(\{x : p(x) \le \hat\epsilon\}) < D/2$, which is possible due to the continuity from above of measures. Define
$$\bar p(x) = \begin{cases} p(x) & p(x) > \hat\epsilon, \\ \hat\epsilon & p(x) \le \hat\epsilon. \end{cases}$$
Then, $\|p - \bar p\|_{L^1(E)} \le \hat\epsilon \cdot \lambda(\{x : p(x) \le \hat\epsilon\}) \le \hat\epsilon D/2$. Now, choose $m_0$ large enough such that for all $m \ge m_0$, there is a $\tilde p_m \in \mathrm{span}(p_0, p_1, \ldots, p_{m-1})$ with $\|\bar p - \tilde p_m\|_{L^1(E)} \le \hat\epsilon D/2$ and $\tilde p_m \ge \hat\epsilon$ (note there is no point in choosing $\tilde p_m < \hat\epsilon$ when approximating $\bar p$). Then,
$$\|p - \tilde p_m\|_{L^1(E)} \le \|p - \bar p\|_{L^1(E)} + \|\bar p - \tilde p_m\|_{L^1(E)} < \hat\epsilon D/2 + \hat\epsilon D/2 = \hat\epsilon D$$
holds.

Proof of Lemma 3.13.
Fix $\delta > 0$. Since $(\mu_0, \mu_1) \in \mathcal{M}_{n,\infty}$, we have that for each $k = 1, 2, \ldots, n$,
$$Rf_k = \int_E \int_U Af_k(x,u)\, \eta(du, x)\, p(x)\, dx + \int_E \int_U Bf_k(x,u)\, \mu_1(dx \times du),$$
and thereby, for any $\tilde p_m$ in the span of $\{p_0, p_1, \ldots, p_{m-1}\}$,
$$\begin{aligned} d^{(m)}_k(\tilde p_m) &= Rf_k - \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, \tilde p_m(x)\, dx - \int_E \int_U Bf_k(x,u)\, \hat\eta_{1,m}(du,x)\, \mu_{1,E}(dx) \\ &= \int_E \int_U Af_k(x,u)\, \eta(du,x)\, p(x)\, dx - \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, \tilde p_m(x)\, dx \\ &\quad + \int_E \int_U Bf_k(x,u)\, \eta_1(du,x)\, \mu_{1,E}(dx) - \int_E \int_U Bf_k(x,u)\, \hat\eta_{1,m}(du,x)\, \mu_{1,E}(dx) \end{aligned}$$
holds. The triangle inequality reveals that
$$\begin{aligned} \bigl|d^{(m)}_k(\tilde p_m)\bigr| &\le \left| \int_E \int_U Af_k(x,u)\, \eta(du,x)\, p(x)\, dx - \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, p(x)\, dx \right| \\ &\quad + \left| \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, p(x)\, dx - \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, \tilde p_m(x)\, dx \right| \\ &\quad + \left| \int_E \int_U Bf_k(x,u)\, \eta_1(du,x)\, \mu_{1,E}(dx) - \int_E \int_U Bf_k(x,u)\, \hat\eta_{1,m}(du,x)\, \mu_{1,E}(dx) \right| \\ &\equiv \bigl|d^{(m)}_{k,1}\bigr| + \bigl|d^{(m)}_{k,2}\bigr| + \bigl|d^{(m)}_{k,3}\bigr|. \end{aligned}$$
Use Lemma 3.12 with $\epsilon = \delta$ and the constant $1/(3D \max\{\bar A, 1\})$ in place of $D$. Take $\hat\epsilon_1$ and $m_0$ from this result and set $\hat\epsilon = \hat\epsilon_1 / D$. Then, $\hat\epsilon \le \delta$ and for all $m \ge m_0$, there is a $\tilde p_m \in \mathrm{span}\{p_0, p_1, \ldots, p_{m-1}\}$ such that $\|p - \tilde p_m\|_{L^1(E)} < \hat\epsilon / (3 \max\{\bar A, 1\})$ as well as $\tilde p_m \ge D \hat\epsilon$ holds. Also,
$$\bigl|d^{(m)}_{k,2}\bigr| \equiv \left| \int_E \int_U Af_k(x,u)\, \hat\eta_{0,m}(du,x)\, \bigl(p(x) - \tilde p_m(x)\bigr)\, dx \right| \le \bar A\, \|p - \tilde p_m\|_{L^1(E)} < \hat\epsilon/3.$$
By Proposition 3.8, we can choose $m_1 \ge m_0$ such that for all $m \ge m_1$, $|d^{(m)}_{k,1}|$ is bounded by $\hat\epsilon/3$. By Proposition 3.9, we can choose $m_2 \ge m_1$ such that $|d^{(m)}_{k,3}|$ is bounded by $\hat\epsilon/3$ for all $m \ge m_2$, which shows that $|d^{(m)}_k(\tilde p_m)| < \hat\epsilon$ for $k = 1, 2, \ldots, n$. For $k = n+1$, since $p$ is a probability density,
$$\|\tilde p_m\|_{L^1(E)} \le \|\tilde p_m - p\|_{L^1(E)} + \|p\|_{L^1(E)} < \hat\epsilon / (3 \max\{\bar A, 1\}) + 1 \le \hat\epsilon + 1.$$
Now assume that $\|\tilde p_m\|_{L^1(E)} < 1 - \hat\epsilon$. Then,
$$\|p\|_{L^1(E)} \le \|p - \tilde p_m\|_{L^1(E)} + \|\tilde p_m\|_{L^1(E)} < \hat\epsilon / (3 \max\{\bar A, 1\}) + 1 - \hat\epsilon \le \hat\epsilon + 1 - \hat\epsilon = 1,$$
a contradiction, and we have that $1 - \hat\epsilon \le \|\tilde p_m\|_{L^1(E)} \le 1 + \hat\epsilon$. Hence, $|d^{(m)}_{n+1}(\tilde p_m)| \le \hat\epsilon$, which completes the proof upon setting $\tilde m = m_2$.

Now we can show that the statement of Lemma 3.14 is true.

Proof of Lemma 3.14.
Fix $\vartheta > 0$. Select $m_1 \in \mathbb{N}$ large enough such that for all $m \ge m_1$, $C^{(m)}$ has full rank and thus $n+1$ independent columns. For any $m \ge m_1$, let $\bar C^{(m)} \in \mathbb{R}^{(n+1) \times (n+1)}$ be a matrix consisting of $n+1$ independent columns of $C^{(m)}$. Set
$$\delta = \frac{\vartheta}{\max\bigl\{1, \bigl\|\bigl(\bar C^{(m_1)}\bigr)^{-1}\bigr\|_\infty\bigr\}}$$
and, by Lemma 3.13 with this $\delta$ and $D = \max\{1, \|(\bar C^{(m_1)})^{-1}\|_\infty\}$, find $m_2 \ge m_1$ such that for all $m \ge m_2$, there is a $\tilde p_m$ with $\|d^{(m)}(\tilde p_m)\|_\infty < \hat\epsilon \le \delta$, for some $\hat\epsilon > 0$, satisfying
$$\tilde p_m \ge \max\bigl\{1, \bigl\|\bigl(\bar C^{(m_1)}\bigr)^{-1}\bigr\|_\infty\bigr\} \cdot \hat\epsilon \ge \bigl\|\bigl(\bar C^{(m_1)}\bigr)^{-1}\bigr\|_\infty \cdot \hat\epsilon$$
as well as $\|p - \tilde p_m\|_{L^1(E)} < \hat\epsilon / (3\max\{\bar A, 1\})$. Set $\hat\vartheta = \max\{1, \|(\bar C^{(m_1)})^{-1}\|_\infty\} \cdot \hat\epsilon$ and note that $\hat\epsilon / (3\max\{\bar A, 1\}) < \hat\vartheta$.

Consider the solution $\tilde y$ of $C^{(m_1)} y = -d^{(m_1)}(\tilde p_m)$ that is given by injecting $\bar y = \bigl(\bar C^{(m_1)}\bigr)^{-1}\bigl(-d^{(m_1)}(\tilde p_m)\bigr) \in \mathbb{R}^{n+1}$ into the full coefficient space, setting the coordinates corresponding to the remaining columns to zero. Then,
$$\|\tilde y\|_\infty = \|\bar y\|_\infty = \Bigl\|\bigl(\bar C^{(m_1)}\bigr)^{-1} d^{(m_1)}(\tilde p_m)\Bigr\|_\infty \le \Bigl\|\bigl(\bar C^{(m_1)}\bigr)^{-1}\Bigr\|_\infty \bigl\|d^{(m_1)}(\tilde p_m)\bigr\|_\infty \le \hat\vartheta.$$
We now show that there is a solution $\tilde y$ to $C^{(m)} y = -d^{(m)}(\tilde p_m)$ that satisfies $\|\tilde y\|_\infty \le \hat\vartheta$. By the definition of the constraint matrix, for any $m \in \mathbb{N}$, we have that for $k = 1, 2, \ldots, n+1$ and $i = 0, 1, \ldots, m-1$,
$$C^{(m+1)}_{k,\,2i} + C^{(m+1)}_{k,\,2i+1} = C^{(m)}_{k,\,i}$$
holds. Indeed, for $1 \le k \le n$, by the choice of the basis functions $\{p_0, p_1, \ldots, p_{m-1}\}$ as indicator functions over dyadic intervals, the entries $C^{(m)}_{k,i}$ are given by integration of the functions $Af_k$ over intervals that are cut in half upon refinement, and if $k = n+1$, the entries are simply given by the interval lengths $(x_{j+1} - x_j)$ since $p_j = 1$ on $[x_j, x_{j+1})$. Hence, if $y$ is a solution to $C^{(m)} y = -d$, the vector $\bar y$ with components
$$\bar y_{2i} = \bar y_{2i+1} = y_i, \qquad i = 0, 1, \ldots, m-1,$$
satisfies $C^{(m+1)} \bar y = -d$, and $\|y\|_\infty = \|\bar y\|_\infty$ holds. Inductively, this reveals that for any $m \ge m_1$, there is a solution $\tilde y$ to $C^{(m)} y = -d^{(m)}(\tilde p_m)$ which satisfies $\|\tilde y\|_\infty = \|y\|_\infty \le \hat\vartheta$. For any $m \ge m_2$, this analysis can be conducted similarly, showing the result for $\tilde m = m_2$.

References

[1]
J. Anselmi, F. Dufour, and T. Prieto-Rumeau, Computable approximations for continuous-time Markov decision processes on Borel spaces based on empirical measures, J. Math. Anal. Appl., 443 (2016), pp. 1323–1361.
[2] A. G. Bhatt and V. S. Borkar, Occupation measures for controlled Markov processes: characterization and optimality, Ann. Probab., 24 (1996), pp. 1531–1562.
[3] P. Billingsley, Convergence of Probability Measures, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., New York, second ed., 1999.
[4] V. I. Bogachev, Measure Theory. Vol. I, II, Springer-Verlag, Berlin, 2007.
[5] C. de Boor, A Practical Guide to Splines, vol. 27 of Applied Mathematical Sciences, Springer-Verlag, New York, revised ed., 2001.
[6] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Applications of Mathematics, No. 1, Springer-Verlag, Berlin-New York, 1975.
[7] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, vol. 25 of Stochastic Modelling and Applied Probability, Springer, New York, second ed., 2006.
[8] C. A. Hall and W. W. Meyer, Optimal error bounds for cubic spline interpolation, J. Approximation Theory, 16 (1976), pp. 105–122.
[9] K. Helmes, S. Röhl, and R. H. Stockbridge, Computing moments of the exit time distribution for Markov processes by linear programming, Oper. Res., 49 (2001), pp. 516–530.
[10] K. Helmes and R. H. Stockbridge, Determining the optimal control of singular stochastic processes using linear programming, in Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz, vol. 4 of Inst. Math. Stat. (IMS) Collect., Inst. Math. Statist., Beachwood, OH, 2008, pp. 137–153.
[11] P. Kaczmarek, S. T. Kent, G. A. Rus, R. H. Stockbridge, and B. A. Wade, Numerical solution of a long-term average control problem for singular stochastic processes, Math. Methods Oper. Res., 66 (2007), pp. 451–473.
[12] S. Kumar and K. Muthuraman, A numerical method for solving singular stochastic control problems, Operations Research, 52 (2004), pp. 563–582.
[13] T. G. Kurtz and R. H. Stockbridge, Existence of Markov controls and characterization of optimal Markov controls, SIAM J. Control Optim., 36 (1998), pp. 609–653.
[14] T. G. Kurtz and R. H. Stockbridge, Linear programming formulations of singular stochastic processes, arXiv:1707.09209, 2017.
[15] H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time, vol. 24 of Applications of Mathematics (New York), Springer-Verlag, New York, second ed., 2001.
[16] J.-B. Lasserre and T. Prieto-Rumeau, SDP vs. LP relaxations for the moment approach in some performance evaluation problems, Stoch. Models, 20 (2004), pp. 439–456.
[17] M. Lutz, A linear programming approach for optimal exercise and valuation of American lookback options, Master's thesis, University of Wisconsin-Milwaukee, 2007.
[18] A. S. Manne, Linear programming and sequential decisions, Management Sci., 6 (1960), pp. 259–267.
[19] M. S. Mendiondo and R. H. Stockbridge, Approximation of infinite-dimensional linear programming problems which arise in stochastic control, SIAM J. Control Optim., 36 (1998), pp. 1448–1472.
[20] G. A. Rus, Finite Element Methods for Control of Singular Stochastic Processes, PhD thesis, University of Wisconsin-Milwaukee, 2009.
[21] R. Serrano, On the LP formulation in measure spaces of optimal control problems for jump-diffusions, Systems & Control Letters, 85 (2015), pp. 33–36.
[22] M. I. Taksar, Infinite-dimensional linear programming approach to singular stochastic control, SIAM Journal on Control and Optimization, 35 (1997), pp. 604–625.
[23]