A Modified Method of Successive Approximations for Stochastic Recursive Optimal Control Problems
Shaolin Ji ∗ Rundong Xu † February 3, 2021
Abstract. Based on the global stochastic maximum principle established by Hu [11], a modified method of successive approximations (MSA for short) is established for decoupled forward-backward stochastic control systems. The second-order adjoint process is introduced in the augmented Hamiltonian minimization step in order to find the optimal control that reaches the global minimum of the cost functional. Thanks to the theory of bounded mean oscillation martingales (BMO-martingales for short), we give a delicate proof of the error estimate and obtain a convergence result for the modified MSA algorithm.
Key words. BMO-martingales; forward-backward stochastic control systems; method of successive approximations; global stochastic maximum principle; stochastic recursive optimal control
AMS subject classifications.
1 Introduction

Finding numerical solutions to optimal control problems via scientific computing methods has long attracted much attention. Among these methods, the method of successive approximations (MSA for short) is an efficient tool for tackling optimal control problems. On the one hand, compared with algorithms based on the dynamic programming approach (for example, the Bellman-Howard policy iteration algorithm in [15]), the MSA is an iterative method equipped with alternating propagation and optimization steps based on the maximum principle. On the other hand, as was suggested in [22], this leads to an alternative approach to deep residual networks (He et al. [9]) from the optimal control viewpoint, since such neural networks can be regarded as a discretization of certain continuous deterministic control systems.

The MSA based on Pontryagin's maximum principle [3] for numerical solutions of deterministic control systems was first proposed by Krylov et al. [18]. This method consists of successive integrations of the state and adjoint equations, and updates the control variables by minimizing the Hamiltonian. For classical stochastic control systems, the modified MSA of Kerimkulov et al. [16] imposes the assumption "$D_x\sigma(\cdot) \equiv 0$" to guarantee that the second term $Z^\alpha$ of the solution to the adjoint equation is bounded. Furthermore, since their modified MSA is based on the local SMP, it cannot deal with the case when the control domain is non-convex.

In this paper, on the one hand, we remove the above unnecessary assumption imposed on the diffusion coefficient by employing the BMO property of the solution to the adjoint equation.

∗ Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, PR China (Corresponding author). Research supported by the National Natural Science Foundation of China (Nos. 11971263, 11871458).
† Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, PR China.
More than that, we generalize the modified MSA to the decoupled forward-backward stochastic control system (FBSCS for short), for which the state equation is described by a decoupled forward-backward stochastic differential equation (FBSDE for short) (see [13], [24], [34] and the references therein). This kind of optimal control problem is also called a stochastic recursive optimal control problem, which plays an important role in economic and financial fields (see [5], [7], [20] and the references therein).

On the other hand, we study the general case in which the control domain may be non-convex. For this purpose, we need to establish the modified MSA based on the global SMP for FBSCSs. As for the global SMP, Peng [25] first established the general SMP for classical stochastic control systems. Since then, much progress has been made for various stochastic control systems (see [8], [26], [29], [30], [33] and the references therein). Recently, Hu [11] introduced two adjoint equations to obtain the global SMP for FBSCSs governed by decoupled FBSDEs and solved the open problem proposed by Peng [27]. Based on Hu's work, Hu et al. [12] proposed a new method to obtain the first- and second-order variational equations, which are essentially fully coupled FBSDEs, and derived the global SMP for fully coupled FBSCSs.

Our main contributions are as follows. Firstly, we extend the modified MSA from classical stochastic control systems to the decoupled FBSCSs (2.3). In contrast to the former, the emergence of $Z^u$ in the backward state equation of (2.3) makes the error estimate more difficult. By applying the Girsanov transformation, the process (3.9) disappears from the drift term, and we obtain the error estimate (3.8) under a new reference probability measure. Secondly, the convergence of our modified MSA does not need the assumption "$D_x\sigma(\cdot) \equiv 0$".
It is worth pointing out that the challenge in obtaining the desired error estimate is the unboundedness of the solution $q^u$ to the adjoint equation (2.7). As mentioned earlier, this technical difficulty is avoided if one imposes the restrictive assumption "$D_x\sigma(\cdot) \equiv 0$", which makes $q^u$ ($Z^\alpha$ in their context) bounded. Fortunately, we found that $q^u \cdot W$ is a multi-dimensional BMO-martingale and obtained the convergence of our modified MSA by means of some useful inequalities from harmonic analysis on the space of BMO-martingales.

Thirdly, since our modified MSA is based on the global SMP, the augmented Hamiltonian contains the second-order adjoint process $P^u$ (see (2.8)), whose boundedness is essential for obtaining the error estimate (3.8). We prove that the boundedness of $P^u$ depends on the BMO property of $q^u \cdot W$. Consequently, we claim that the optimal control we find is a candidate control that reaches the global minimum.

The rest of the paper is organized as follows. In Section 2, preliminaries and the formulation of our problem are given. In Section 3, we first show properties of the solutions to the adjoint equations, then state our main results, consisting of the error estimate and the convergence of our modified MSA algorithm.

2 Preliminaries and Problem Formulation
Fix a terminal time $T \in (0, +\infty)$ and three positive integers $n$, $d$ and $k$. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which a standard $d$-dimensional Brownian motion $W = (W^1_t, W^2_t, \dots, W^d_t)^\top_{t \in [0,T]}$ is defined, and let $\mathbb{F} := \{\mathcal{F}_t\}_{t \in [0,T]}$ be the $P$-augmentation of the natural filtration generated by $W$.

Denote by $\mathbb{R}^n$ the $n$-dimensional real Euclidean space, by $\mathbb{R}^{n \times m}$ the set of $n \times m$ real matrices ($n, m \geq 1$) and by $\mathbb{S}^{n \times n}$ the set of all $n \times n$ symmetric matrices. The scalar product (resp. norm) of $A, B \in \mathbb{R}^{n \times m}$ is denoted by $\langle A, B \rangle = \mathrm{tr}\{AB^\top\}$ (resp. $\|A\| = \sqrt{\mathrm{tr}\{AA^\top\}}$), where the superscript $\top$ denotes the transpose of vectors or matrices. Denote by $I_n$ the $n \times n$ identity matrix.

For any given $p, q \geq 1$, we introduce the following Banach spaces.

$L^p_{\mathcal{F}_T}(\Omega; \mathbb{R}^n)$: the space of $\mathcal{F}_T$-measurable $\mathbb{R}^n$-valued random variables $\xi$ such that $E[|\xi|^p] < \infty$.

$L^\infty_{\mathcal{F}_T}(\Omega; \mathbb{R}^n)$: the space of $\mathcal{F}_T$-measurable $\mathbb{R}^n$-valued random variables $\xi$ such that $\operatorname{ess\,sup}_{\omega \in \Omega} |\xi(\omega)| < \infty$.

$L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{F}$-adapted $\mathbb{R}^n$-valued processes $(\varphi_t)_{t \in [0,T]}$ such that $\|\varphi\|_\infty := \operatorname{ess\,sup}_{(t,\omega) \in [0,T] \times \Omega} |\varphi_t(\omega)| < \infty$.

$S^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{F}$-adapted $\mathbb{R}^n$-valued continuous processes $(\varphi_t)_{t \in [0,T]}$ such that $E[\sup_{t \in [0,T]} |\varphi_t|^p] < \infty$.

$H^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{R}^n$-valued $\mathbb{F}$-martingales $M = (M^1, \dots, M^n)^\top$ such that $M_0 = 0$ and $\|M\|_{H^p} := \big\| \sqrt{\mathrm{tr}\{\langle M \rangle_T\}} \big\|_{L^p} < \infty$, where $\langle M \rangle_t := (\langle M^i, M^j \rangle_t)_{1 \leq i,j \leq n}$ for $t \in [0,T]$.

$M^p(\mathbb{R}^{n \times d})$: the space of $\mathbb{R}^{n \times d}$-valued $\mathbb{F}$-progressively measurable processes $(\varphi_t)_{t \in [0,T]}$ such that $\|\varphi\|_{M^p} := E\big[ \big( \int_0^T |\varphi_t|^2\, dt \big)^{p/2} \big]^{1/p} < \infty$.

$BMO$: the space of processes $M \in H^2_{\mathbb{F}}([0,T]; \mathbb{R})$ such that
$\|M\|_{BMO} := \sup_\tau \big\| \big( E[\langle M \rangle_T - \langle M \rangle_\tau \mid \mathcal{F}_\tau] \big)^{1/2} \big\|_\infty < \infty$,  (2.1)
where the supremum is taken over all stopping times $\tau \in [0,T]$. Furthermore, one can replace $\tau$ by all deterministic times $t \in [0,T]$ in definition (2.1).

$K(\mathbb{R}^{n \times d})$: the space of $\mathbb{R}^{n \times d}$-valued processes $\varphi \in M^2(\mathbb{R}^{n \times d})$ such that
$\|\varphi\|_K := \sup_\tau \Big\| E\Big[ \int_\tau^T |\varphi_s|^2\, ds \mid \mathcal{F}_\tau \Big] \Big\|_\infty < \infty$,  (2.2)
where the supremum is taken over all stopping times $\tau \in [0,T]$. Furthermore, one can replace $\tau$ by all deterministic times $t \in [0,T]$ in definition (2.2).

We write $BMO(Q)$ and $K(\mathbb{R}^{n \times d}; Q)$ for any probability $Q$ defined on $(\Omega, \mathcal{F})$ whenever it is necessary to indicate the underlying probability. For simplicity, if the underlying probability is $P$, we still use the notations $BMO$ and $K(\mathbb{R}^{n \times d})$.

2.1 Some Notations and Results on BMO-Martingales

Here we list some notations and results on BMO-martingales which will be used in this paper. We refer the reader to [6], [10], [17] and the references therein for more details.

Denote by $\mathcal{E}(M)$ the Doléans-Dade exponential of a continuous local martingale $M$, that is, $\mathcal{E}(M)_t = \exp\{ M_t - \frac{1}{2}\langle M \rangle_t \}$ for any $t \in [0,T]$. If $M \in BMO$, then $\mathcal{E}(M)$ is a uniformly integrable martingale (see Theorem 2.3 in [17]).

Let $H$ be an $\mathbb{R}^d$-valued $\mathbb{F}$-adapted process. Denote by $H \cdot W$ the stochastic integral of $H$ with respect to the $d$-dimensional Brownian motion $W$, that is, $(H \cdot W)_t := \sum_{i=1}^d \int_0^t H^i_s\, dW^i_s$ for $t \in [0,T]$.

The following theorem plays a significant role in characterizing the duality between $H^1_{\mathbb{F}}([0,T]; \mathbb{R})$ and $BMO$.

Theorem 2.1 (Fefferman's inequality). If $M \in BMO$ and $N \in H^1_{\mathbb{F}}([0,T]; \mathbb{R})$, then
$E\Big[ \int_0^T |d\langle M, N \rangle_t| \Big] \leq \sqrt{2}\, \|M\|_{BMO} \|N\|_{H^1}.$
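The BMO norm (2.1) controls the conditional expected remaining quadratic variation uniformly over stopping times. The following sketch (entirely illustrative, not from the paper) checks numerically the elementary bound that for $M = H \cdot W$ with $|H_t| \leq c$, the remaining quadratic variation never exceeds $c^2(T - t)$, so $\|M\|_{BMO} \leq c\sqrt{T}$; the integrand `H` below is an arbitrary bounded example.

```python
import numpy as np

# Empirical illustration: for M = H.W with |H_t| <= c, the remaining
# quadratic variation <M>_T - <M>_t = int_t^T |H_s|^2 ds is bounded by
# c^2 (T - t), hence ||M||_BMO <= c sqrt(T).
rng = np.random.default_rng(0)
T, n_steps, n_paths, c = 1.0, 500, 200, 2.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)[:-1]

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
H = c * np.sin(t)[None, :] * np.tanh(np.cumsum(dW, axis=1))  # |H| <= c

qv_increments = H**2 * dt                                    # d<M>_s = |H_s|^2 ds
remaining_qv = qv_increments[:, ::-1].cumsum(axis=1)[:, ::-1]  # <M>_T - <M>_t

# The pathwise sup dominates the conditional expectation in (2.1):
bmo_sq_upper = remaining_qv.max()
assert bmo_sq_upper <= c**2 * T + 1e-12
print(f"empirical bound {bmo_sq_upper:.4f} <= c^2 T = {c**2 * T:.4f}")
```

The pathwise maximum used here is an upper bound for the BMO norm squared, which is all the deterministic bound requires.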
For any $M \in BMO$, the energy-type inequality for $\langle M \rangle$ is a significant result commonly used in BMO-martingale theory (see [17]). In essence, for any $\varphi \in K(\mathbb{R}^{n \times d})$, $\mathbb{F}$-stopping time $\tau$ on $[0,T]$ and $A \in \mathcal{F}_\tau$, we can apply Garsia's Lemma ([10], Lemma 10.35) to the continuous increasing process $\big( \mathbf{1}_A \int_\tau^t |\varphi_s|^2\, ds \big)_{t \in [\tau, T]}$ to obtain the following energy-type inequality.

Proposition 2.2 (Energy inequality). Let $\varphi \in K(\mathbb{R}^{n \times d})$. Then, for any integer $m \geq 1$ and any $\mathbb{F}$-stopping time $\tau$ on $[0,T]$, we have
$E\Big[ \Big( \int_\tau^T |\varphi_s|^2\, ds \Big)^m \,\Big|\, \mathcal{F}_\tau \Big] \leq m!\, \|\varphi\|_K^m.$

Recall that the space $BMO$ depends on the underlying probability measure. The following lemma shows the equivalence of different BMO-norms under the Girsanov transformation.
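The Girsanov transformation underlying this equivalence replaces $P$ by $d\tilde{P} = \mathcal{E}(N)_T\, dP$. The following Monte Carlo sketch (an illustration with an assumed bounded integrand, not taken from the paper) checks the basic fact that makes this a legitimate change of measure: the Doléans-Dade exponential of a stochastic integral with bounded predictable integrand has mean one.

```python
import numpy as np

# Monte Carlo check that E(theta . W)_T has mean one under P, so that
# dP~ := E(theta . W)_T dP defines an equivalent probability measure.
# The integrand theta (standing in for a bounded f_z) is illustrative.
rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 400, 100_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
# predictable version: theta at step k uses the path only up to step k-1
W_prev = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)
theta = np.tanh(W_prev)                        # bounded by 1, adapted

stoch_int = np.sum(theta * dW, axis=1)         # (theta . W)_T
qv = np.sum(theta**2, axis=1) * dt             # <theta . W>_T
density = np.exp(stoch_int - 0.5 * qv)         # Doleans-Dade exponential at T

print(f"E[dP~/dP] ~ {density.mean():.4f}")     # close to 1
assert abs(density.mean() - 1.0) < 0.02
```

Predictability of `theta` (it uses the path strictly before each increment) is what makes the discrete-time mean exactly one in expectation; the residual deviation is pure Monte Carlo error.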
Lemma 2.3 ([14], Lemma A.4). Let $K > 0$ be a given constant and let $M \in BMO$. Then there are constants $c_1 > 0$ and $c_2 > 0$ depending only on $K$ such that for any $N \in BMO$ with $\|N\|_{BMO} \leq K$, we have
$c_1 \|M\|_{BMO} \leq \big\| \tilde{M} \big\|_{BMO(\tilde{P})} \leq c_2 \|M\|_{BMO},$
where $\tilde{M} := M - \langle M, N \rangle$ and $d\tilde{P} := \mathcal{E}(N)_T\, dP$.

The following proposition is a deeper result obtained by applying Fefferman's inequality.
Proposition 2.4 ([6], Lemma 1.4). Let $p \geq 1$. Assume that $X \in S^p_{\mathbb{F}}([0,T]; \mathbb{R})$ and $M \in BMO$. Then $X \cdot M \in H^p_{\mathbb{F}}([0,T]; \mathbb{R})$. Moreover, we have the estimates
$\|X \cdot M\|_{H^p} \leq \sqrt{2}\, p\, \|X\|_{S^p} \|M\|_{BMO}$ for $p > 1$, and $\|X \cdot M\|_{H^1} \leq \sqrt{2}\, \|X\|_{S^1} \|M\|_{BMO}.$

Remark 2.5. Let $\varphi \in S^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ for some $p \geq 1$. For $i \in \{1, 2, \dots, d\}$, the stochastic integral of the real-valued process $(|\varphi_t|)_{t \in [0,T]}$ with respect to $W^i$ is well-defined. Moreover, $|\varphi| \cdot W^i \in H^p_{\mathbb{F}}([0,T]; \mathbb{R})$ since
$E\big[ \langle |\varphi| \cdot W^i \rangle_T^{p/2} \big] = E\Big[ \Big( \int_0^T |\varphi_t|^2\, dt \Big)^{p/2} \Big] \leq T^{p/2}\, E\Big[ \sup_{t \in [0,T]} |\varphi_t|^p \Big] < \infty.$

Consider the following decoupled FBSCS:
$dX^u_t = b(t, X^u_t, u_t)\, dt + \sigma(t, X^u_t, u_t)\, dW_t,$
$dY^u_t = -f(t, X^u_t, Y^u_t, Z^u_t, u_t)\, dt + (Z^u_t)^\top dW_t,$
$X^u_0 = x_0, \quad Y^u_T = \Phi(X^u_T),$  (2.3)
with the cost functional
$J(u(\cdot)) := Y^u_0$  (2.4)
for a given $x_0 \in \mathbb{R}^n$ and measurable functions $b: [0,T] \times \mathbb{R}^n \times U \to \mathbb{R}^n$, $\sigma: [0,T] \times \mathbb{R}^n \times U \to \mathbb{R}^{n \times d}$, $f: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times U \to \mathbb{R}$ and $\Phi: \mathbb{R}^n \to \mathbb{R}$, where the control domain $U$ is a nonempty subset of $\mathbb{R}^k$. We want to find an optimal control minimizing (2.4) over the admissible control set.

For deterministic control systems, it has been shown that the basic MSA may diverge when a bad initial value of the control is chosen (see [2]) or the feasibility errors blow up (see [23]). Kerimkulov et al. [16] proposed a modified MSA for classical stochastic control systems to ensure convergence. Going a step further, we aim to establish a modified MSA for stochastic recursive optimal control problems and to obtain the related convergence result.

Before giving the modified MSA algorithm for (2.3), we first introduce the global SMP for the stochastic control system (2.3). Set
$b(\cdot) = (b^1(\cdot), b^2(\cdot), \dots, b^n(\cdot))^\top \in \mathbb{R}^n, \quad \sigma(\cdot) = (\sigma_1(\cdot), \sigma_2(\cdot), \dots, \sigma_d(\cdot)) \in \mathbb{R}^{n \times d},$
$\sigma_i(\cdot) = (\sigma^1_i(\cdot), \sigma^2_i(\cdot), \dots, \sigma^n_i(\cdot))^\top \in \mathbb{R}^n \quad \text{for } i = 1, 2, \dots, d.$
An admissible control $u(\cdot)$ is an $\mathbb{F}$-adapted process with values in $U$ such that
$\sup_{t \in [0,T]} E\big[ |u_t|^2 \big] < \infty.$  (2.5)
Denote by $\mathcal{U}[0,T]$ the set of all admissible controls. We assume that there exists at least one optimal control minimizing (2.4), and impose the following assumptions on the coefficients of (2.3).

Assumption 2.6.
Let $L_i$, $i = 1, 2$, be given positive constants.
(i) $b$ and $\sigma$ are twice continuously differentiable with respect to $x$; $b$, $\sigma$, $b_x$, $\sigma_x$, $b_{xx}$, $\sigma_{xx}$ are continuous in $(x, u)$; $b_x$, $\sigma_x$, $b_{xx}$, $\sigma_{xx}$ are bounded; and $b$, $\sigma$ are bounded by $L_1(1 + |x| + |u|)$.
(ii) $\Phi$ is twice continuously differentiable with respect to $x$; $\Phi$, $\Phi_x$, $\Phi_{xx}$ are continuous in $x$; $\Phi_x$, $\Phi_{xx}$ are bounded; and $\Phi$ is bounded by $L_2(1 + |x|)$.
(iii) $f$ is twice continuously differentiable with respect to $(x, y, z)$; $f$, together with its gradient $Df$ and Hessian matrix $D^2 f$ with respect to $(x, y, z)$, is continuous in $(x, y, z, u)$; $Df$, $D^2 f$ are bounded; and $f$ is bounded by $L_2(1 + |x| + |y| + |z| + |u|)$.

Let us fix $u(\cdot) \in \mathcal{U}[0,T]$ arbitrarily. Under Assumption 2.6, thanks to [28] (Chapter V, Theorem 6) and Theorem 5.1 in [7], (2.3) admits a unique solution $(X^u, Y^u, Z^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times S^2_{\mathbb{F}}([0,T]; \mathbb{R}) \times M^2(\mathbb{R}^{n \times d})$. We call $(X^u, Y^u, Z^u)$ the state trajectory corresponding to $u(\cdot)$. In particular, let $\bar{u}(\cdot)$ be an optimal control, $(\bar{X}, \bar{Y}, \bar{Z})$ the corresponding state trajectory of (2.3), and $(\bar{p}, \bar{q})$, $(\bar{P}, \bar{Q})$ the corresponding unique solutions to the first-order adjoint equation (2.7) and the second-order adjoint equation (2.8) below, respectively.

The (stochastic) Hamiltonian $H: [0,T] \times \Omega \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \to \mathbb{R}$ is defined as follows:
$H(t, x, y, z, p, q, P, u) = p^\top b(t, x, u) + \sum_{i=1}^d (q^i)^\top \sigma_i(t, x, u) + f(t, x, y, z + \Delta(t, x, u), u) + \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, x, u) - \sigma_i(t, \bar{X}_t, \bar{u}_t) \big)^\top P \big( \sigma_i(t, x, u) - \sigma_i(t, \bar{X}_t, \bar{u}_t) \big),$
where $q^i$ is the $i$th column of $q$ for $i \in \{1, 2, \dots, d\}$, and
$\Delta(t, x, u) = \big( (\sigma_1(t, x, u) - \sigma_1(t, \bar{X}_t, \bar{u}_t))^\top p, \dots, (\sigma_d(t, x, u) - \sigma_d(t, \bar{X}_t, \bar{u}_t))^\top p \big)^\top.$
Then, the following maximum principle ([11], Theorem 3) holds:
Theorem 2.7.
Let Assumption 2.6 hold. Then, for all $u \in U$,
$H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t) \leq H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u), \quad P\text{-a.s., a.e. } t \in [0,T].$  (2.6)

Secondly, it follows from the pioneering works mentioned before that a key step in rigorously controlling the divergent behavior of the modified MSA is to obtain the error estimate, by estimating the difference between the two cost functionals $J(u(\cdot))$ and $J(v(\cdot))$ corresponding to different admissible controls $u(\cdot)$ and $v(\cdot)$. To do this, we need the following notations, and introduce the generalized Hamiltonian together with its augmented form.

Define the function $G: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times U \times U \to \mathbb{R}$ by
$G(t, x, y, z, p, q, v, u) = p^\top b(t, x, v) + \sum_{i=1}^d (q^i)^\top \sigma_i(t, x, v) + f(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
where $q = (q^1, q^2, \dots, q^d)$ and
$\tilde{\Delta}(t, x, p, v, u) := \big( (\sigma_1(t, x, v) - \sigma_1(t, x, u))^\top p, \dots, (\sigma_d(t, x, v) - \sigma_d(t, x, u))^\top p \big)^\top.$

Let $u(\cdot), v(\cdot) \in \mathcal{U}[0,T]$. For simplicity, for $\psi = b, \sigma, f, \Phi$ and $w = x, y, z$, set
$\Theta^u_t = (X^u_t, Y^u_t, Z^u_t), \quad \Theta^{u,v}_t = \big( X^u_t, Y^u_t, Z^u_t + \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big),$
$\psi^u(t) = \psi(t, \Theta^u_t, u_t), \quad \psi^{u,v}(t) = \psi(t, \Theta^{u,v}_t, v_t),$
$\psi^u_w(t) = \psi_w(t, \Theta^u_t, u_t), \quad \psi^{u,v}_w(t) = \psi_w(t, \Theta^{u,v}_t, v_t),$
$\psi^u_{ww}(t) = \psi_{ww}(t, \Theta^u_t, u_t), \quad \psi^{u,v}_{ww}(t) = \psi_{ww}(t, \Theta^{u,v}_t, v_t),$
$D^2 f^u(t) = D^2 f(t, \Theta^u_t, u_t)$
for all $t \in [0,T]$. In our context, the first-order (resp. second-order) adjoint equation in [11] for (2.3) is (2.7) (resp. (2.8)) below.
$p^u_t = \Phi_x(X^u_T) + \int_t^T \big\{ G_x(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) + G_y(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s)\, p^u_s + \Upsilon(s, X^u_s, p^u_s, q^u_s, u_s)\, G_z(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) \big\}\, ds - \sum_{i=1}^d \int_t^T (q^u_s)^i\, dW^i_s,$  (2.7)

$P^u_t = \Phi_{xx}(X^u_T) + \int_t^T \Big\{ f^u_y(s) P^u_s + (b^u_x(s))^\top P^u_s + (P^u_s)^\top b^u_x(s) + \sum_{i=1}^d f^u_{z^i}(s) \big[ ((\sigma^u_x(s))^i)^\top P^u_s + (P^u_s)^\top (\sigma^u_x(s))^i \big] + \sum_{i=1}^d ((\sigma^u_x(s))^i)^\top P^u_s (\sigma^u_x(s))^i + \sum_{i=1}^d f^u_{z^i}(s) (Q^u_s)^i + \sum_{i=1}^d \big[ ((\sigma^u_x(s))^i)^\top (Q^u_s)^i + ((Q^u_s)^i)^\top (\sigma^u_x(s))^i \big] + \Psi^u_s \Big\}\, ds - \sum_{i=1}^d \int_t^T (Q^u_s)^i\, dW^i_s,$  (2.8)

where
$\Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) := \big( (\sigma^1_x(t, X^u_t, u_t))^\top p^u_t + (q^u_t)^1, \dots, (\sigma^d_x(t, X^u_t, u_t))^\top p^u_t + (q^u_t)^d \big)$
and
$\Psi^u_t := \sum_{j=1}^n (b^u_{xx}(t))^j (p^u_t)^j + \sum_{i=1}^d \sum_{j=1}^n (\sigma^u_{xx}(t))^{ji} \big( f^u_{z^i}(t)(p^u_t)^j + (q^u_t)^{ji} \big) + \big( I_n, p^u_t, \Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) \big)\, D^2 f^u(t)\, \big( I_n, p^u_t, \Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) \big)^\top.$  (2.9)

Define the generalized Hamiltonian $H: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \times U \to \mathbb{R}$ by
$H(t, x, y, z, p, q, P, v, u) = G(t, x, y, z, p, q, v, u) + \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, x, v) - \sigma_i(t, x, u) \big)^\top P \big( \sigma_i(t, x, v) - \sigma_i(t, x, u) \big).$  (2.10)

Then, for all $u \in U$, the maximum principle in Theorem 2.7 can be rewritten as
$H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t, \bar{u}_t) \leq H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u, \bar{u}_t), \quad P\text{-a.s., a.e. } t \in [0,T].$
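As a small sanity check on (2.10), the following sketch evaluates the generalized Hamiltonian in the scalar case $n = d = 1$ with toy coefficients (all functions below are illustrative assumptions, not the paper's model) and confirms that the second-order correction vanishes when $v = u$, so that $H(\cdot, u, u) = G(\cdot, u, u)$.

```python
import numpy as np

# Toy evaluation of G and the generalized Hamiltonian (2.10), n = d = 1.
def G(t, x, y, z, p, q, v, u, b, sigma, f):
    delta = (sigma(t, x, v) - sigma(t, x, u)) * p   # ~Delta(t, x, p, v, u)
    return p * b(t, x, v) + q * sigma(t, x, v) + f(t, x, y, z + delta, v)

def H(t, x, y, z, p, q, P, v, u, b, sigma, f):
    ds = sigma(t, x, v) - sigma(t, x, u)
    return G(t, x, y, z, p, q, v, u, b, sigma, f) + 0.5 * ds * P * ds

# Illustrative coefficients (assumptions, chosen to be smooth and Lipschitz):
b = lambda t, x, u: -x + u
sigma = lambda t, x, u: 0.2 * x + u              # state-dependent diffusion allowed
f = lambda t, x, y, z, u: 0.5 * (x**2 + u**2) + 0.1 * y + 0.3 * z

args = dict(t=0.0, x=1.0, y=0.5, z=0.2, p=0.4, q=0.1, b=b, sigma=sigma, f=f)
# At v = u the quadratic correction vanishes, so H(.., u, u) = G(.., u, u):
assert H(P=2.0, v=0.7, u=0.7, **args) == G(v=0.7, u=0.7, **args)
print(H(P=2.0, v=0.3, u=0.7, **args))
```

For $v \neq u$ the correction term contributes $\frac{1}{2}(\sigma(t,x,v) - \sigma(t,x,u))^2 P$, which is exactly the second-order price of a non-convex control domain in the global SMP.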
Now we define the augmented form $\tilde{H}: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \times U \to \mathbb{R}$, for some $\rho \geq 0$, by
$\tilde{H}(t, x, y, z, p, q, P, v, u) = H(t, x, y, z, p, q, P, v, u) + \rho \Big( \sum_{\psi \in \{b, \sigma, f\}} |\psi(t, x, y, z, v) - \psi(t, x, y, z, u)|^2 + \sum_{w \in \{x, y, z\}} |G_w(t, x, y, z, p, q, v, u) - G_w(t, x, y, z, p, q, u, u)|^2 \Big).$  (2.11)

Note that when $\rho = 0$ we recover exactly the generalized Hamiltonian (2.10). Moreover, the maximum principle also holds for $\tilde{H}$, which is the basis for constructing the iterative algorithm.

Lemma 2.8 (Extended SMP). Let $\bar{u}(\cdot)$ be an optimal control, $(\bar{X}, \bar{Y}, \bar{Z})$ the corresponding state trajectory of (2.3), and $(\bar{p}, \bar{q})$, $(\bar{P}, \bar{Q})$ the corresponding unique solutions to (2.7) and (2.8), respectively. Then we have
$\tilde{H}(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t, \bar{u}_t) = \min_{u \in U} \tilde{H}(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u, \bar{u}_t), \quad P\text{-a.s., a.e. } t \in [0,T].$  (2.12)

The proof of the extended SMP is a direct application of Theorem 2.7 and (2.11), so we omit it. We call $(\bar{X}, \bar{Y}, \bar{Z}, \bar{p}, \bar{q}, \bar{P}, \bar{u}(\cdot))$ a solution to (2.12).

It should be emphasized that not every solution to (2.12) is globally optimal for (2.3), since the usual SMP (2.6) is only a necessary condition for optimality and (2.12) is weaker than (2.6). However, (2.6) is often strong enough to give good solution candidates in practice, and so is (2.12). In some special cases (see Example 1 in [11]), (2.6) even becomes a sufficient condition without any convexity assumptions.

To end this section, we introduce the modified MSA in the following algorithm.

Algorithm 1
Modified Method of Successive Approximations for Decoupled FBSCSs

1: Initialisation: make a guess of the control $u^0 = (u^0_t)_{t \in [0,T]}$.
2: while the difference between $J(u^m(\cdot))$ and $J(u^{m-1}(\cdot))$ is large do
3: Given the control $u^{m-1} = (u^{m-1}_t)_{t \in [0,T]}$, solve the FBSDE (2.3) to obtain the state trajectory $(X^m, Y^m, Z^m)$.
4: Solve the BSDEs (2.7) and (2.8) with the control $u^{m-1}(\cdot)$ and the state trajectory $(X^m, Y^m, Z^m)$ to obtain the first- and second-order adjoint variables $(p^m, q^m)$ and $(P^m, Q^m)$.
5: Update the control:
$u^m_t \in \arg\min_{u \in U} \tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u, u^{m-1}_t)$ for $t \in [0,T]$.  (2.13)
6: end while
7: return $u^m$.

3 Main Results

In this section, the universal constant $C$ may depend only on $n$, $d$, $T$, $\|b_x\|_\infty$, $\|\sigma_x\|_\infty$, $\|\Phi_x\|_\infty$, $\|Df\|_\infty$, $\|D^2 f\|_\infty$ and may change from line to line in our proofs.

Before stating the main results of the paper, we first give some properties of the solutions to the adjoint equations (2.7) and (2.8), which are necessary to prove the convergence of Algorithm 1. Under Assumption 2.6, the following lemma shows that the first (resp. second) component $p^u$ (resp. $q^u$) of the solution to the first-order adjoint equation (2.7) belongs to the space $L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ (resp. $K(\mathbb{R}^{n \times d})$).

Lemma 3.1.
Let Assumption 2.6 hold. Then, for any $u(\cdot) \in \mathcal{U}[0,T]$, equation (2.7) admits a unique solution $(p^u, q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times M^2(\mathbb{R}^{n \times d})$, where $q^u = ((q^u)^1, (q^u)^2, \dots, (q^u)^d)$. Moreover, $p^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K(\mathbb{R}^{n \times d})$.

Proof. First, equation (2.7) can be rewritten in the following form:
$p^u_t = \Phi_x(X^u_T) + \int_t^T \Big[ (A^u(s))^\top p^u_s + \sum_{i=1}^d ((B^u(s))^i)^\top (q^u_s)^i + f^u_x(s) \Big]\, ds - \sum_{i=1}^d \int_t^T (q^u_s)^i\, dW^i_s,$
where
$A^u(t) := \sum_{i=1}^d f^u_{z^i}(t)\, (\sigma^u_x(t))^i + f^u_y(t)\, I_n + b^u_x(t); \quad (B^u(t))^i := f^u_{z^i}(t)\, I_n + (\sigma^u_x(t))^i, \quad i = 1, 2, \dots, d.$
Since $b_x$, $\sigma_x$, $\Phi_x$, $f_x$, $f_y$ and $f_z$ are bounded, it is easily verified that both $A^u(\cdot)$ and $B^u(\cdot)$ are uniformly bounded. Moreover, by Theorem 5.1 in [7], there exists a unique solution $(p^u, q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times M^2(\mathbb{R}^{n \times d})$ to BSDE (2.7). Furthermore, $p^u$ can be expressed explicitly as
$p^u_t = E\Big[ (\Lambda^u_t)^\top (\Gamma^u_T)^\top \Phi_x(X^u_T) + \int_t^T (\Lambda^u_t)^\top (\Gamma^u_s)^\top f^u_x(s)\, ds \,\Big|\, \mathcal{F}_t \Big] \quad \text{for } t \in [0,T],$  (3.1)
where $\Gamma^u$ and $\Lambda^u$ satisfy the following matrix-valued SDEs, respectively:
$\Gamma^u_t = I_n + \int_0^t A^u(s) \Gamma^u_s\, ds + \sum_{i=1}^d \int_0^t (B^u(s))^i \Gamma^u_s\, dW^i_s \quad \text{for } t \in [0,T]$
and
$\Lambda^u_t = I_n + \int_0^t \Lambda^u_s \Big[ -A^u(s) + \sum_{i=1}^d \big( (B^u(s))^i \big)^2 \Big]\, ds - \sum_{i=1}^d \int_0^t \Lambda^u_s (B^u(s))^i\, dW^i_s \quad \text{for } t \in [0,T].$
By Itô's formula, it can be verified that $\Lambda^u_t = (\Gamma^u_t)^{-1}$ $P$-almost surely for all $t \in [0,T]$. For each fixed $t \in [0,T]$, set $(\Gamma^t_s)^u = \Gamma^u_s \Lambda^u_t$ for $s \in [t,T]$. Then it is easy to check that $(\Gamma^t)^u$ satisfies the following SDE:
$(\Gamma^t_s)^u = I_n + \int_t^s A^u(r) (\Gamma^t_r)^u\, dr + \sum_{i=1}^d \int_t^s (B^u(r))^i (\Gamma^t_r)^u\, dW^i_r, \quad s \in [t,T].$  (3.2)
By a standard SDE estimate, we obtain $E\big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u|^\beta \mid \mathcal{F}_t \big] \leq C$ for all $t \in [0,T]$ and any $\beta \geq 1$. It then follows immediately that
$|p^u_t| \leq E\Big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u| \Big( |\Phi_x(X^u_T)| + \int_t^T |f^u_x(s)|\, ds \Big) \,\Big|\, \mathcal{F}_t \Big] \leq (\|\Phi_x\|_\infty + \|f_x\|_\infty T)\, E\Big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u| \,\Big|\, \mathcal{F}_t \Big] \leq C,$
so $\|p^u\|_\infty < \infty$. Moreover, since $p^u$ is bounded, by Itô's formula one can obtain
$\int_t^T |q^u_s|^2\, ds \leq |\Phi_x(X^u_T)|^2 + 2 \int_t^T |\langle p^u_s, (A^u(s))^\top p^u_s \rangle|\, ds + 2 \int_t^T |\langle p^u_s, f^u_x(s) \rangle|\, ds + 2 \sum_{i=1}^d \int_t^T \big| \big\langle p^u_s, ((B^u(s))^i)^\top (q^u_s)^i \big\rangle \big|\, ds - 2 \sum_{i=1}^d \int_t^T \langle p^u_s, (q^u_s)^i \rangle\, dW^i_s$
$\leq \|\Phi_x\|^2_\infty + 2T \big( \|A^u(\cdot)\|_\infty \|p^u\|^2_\infty + \|f_x\|_\infty \|p^u\|_\infty \big) + 2 \|p^u\|_\infty \sum_{i=1}^d \|(B^u(\cdot))^i\|_\infty \int_t^T |(q^u_s)^i|\, ds - 2 \sum_{i=1}^d \int_t^T \langle p^u_s, (q^u_s)^i \rangle\, dW^i_s.$  (3.3)
Consequently, by taking conditional expectations on both sides of (3.3) and applying Young's inequality, we have $E[\int_t^T |q^u_s|^2\, ds \mid \mathcal{F}_t] \leq C$ for all $t \in [0,T]$, which implies that $q^u \in K(\mathbb{R}^{n \times d})$.

Remark 3.2.
An inspection of the above proof shows that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} ( \|p^u\|_\infty + \|q^u\|_K ) \leq C$, i.e., $C$ is independent of $u(\cdot)$.

Now we give the corresponding property of the solution $(P^u, Q^u)$ to the second-order adjoint equation.

Lemma 3.3.
Let Assumption 2.6 hold. Then, for any $u(\cdot) \in \mathcal{U}[0,T]$, equation (2.8) admits a unique solution $(P^u, Q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n}) \times (M^2(\mathbb{S}^{n \times n}))^d$, where $Q^u = ((Q^u)^1, (Q^u)^2, \dots, (Q^u)^d)$. Moreover, $P^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n})$.

Proof. By Theorem 5.1 in [7], together with the boundedness of $b_x$, $\sigma_x$, $\Phi_x$, $f_x$, $b_{xx}$, $\sigma_{xx}$, $\Phi_{xx}$ and $f_{xx}$, there exists a unique solution $(P^u, Q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n}) \times (M^2(\mathbb{R}^{n \times n}))^d$ to BSDE (2.8). Furthermore, denote by $(P^u)^j$ (resp. $(\Psi^u)^j$, $(Q^u)^{ji}$) the $j$th column of $P^u$ (resp. $\Psi^u$, $(Q^u)^i$) for $i = 1, \dots, d$, $j = 1, \dots, n$, and by $I^{ij}_n \in \mathbb{R}^{n \times n}$ the matrix whose entries all equal $0$ except that the entry in the $i$th row and $j$th column equals $1$. Set
$\tilde{P}^u_t := \big( ((P^u_t)^1)^\top, ((P^u_t)^2)^\top, \dots, ((P^u_t)^n)^\top \big)^\top \in \mathbb{R}^{n^2}, \quad (\tilde{Q}^u_t)^i := \big( ((Q^u_t)^{1i})^\top, ((Q^u_t)^{2i})^\top, \dots, ((Q^u_t)^{ni})^\top \big)^\top \in \mathbb{R}^{n^2},$
and define the $n^2 \times n^2$ block matrices
$\mathbf{A}^u(t) := \mathrm{diag}\big( (b^u_x(t))^\top, \dots, (b^u_x(t))^\top \big), \quad (\mathbf{B}^u(t))^i := \mathrm{diag}\big( ((\sigma^u_x(t))^i)^\top, \dots, ((\sigma^u_x(t))^i)^\top \big),$
$I^*_{n^2} := \begin{pmatrix} I^{11}_n & \cdots & I^{1n}_n \\ \vdots & \ddots & \vdots \\ I^{n1}_n & \cdots & I^{nn}_n \end{pmatrix}, \quad (\mathbf{D}^u(t))^i := \begin{pmatrix} (\sigma^u_{x_1}(t))^{1i} I_n & \cdots & (\sigma^u_{x_1}(t))^{ni} I_n \\ \vdots & \ddots & \vdots \\ (\sigma^u_{x_n}(t))^{1i} I_n & \cdots & (\sigma^u_{x_n}(t))^{ni} I_n \end{pmatrix}.$
Then $\tilde{P}^u$ satisfies
$\tilde{P}^u_t = \tilde{P}^u_T + \int_t^T \Big\{ \Big[ f^u_y(s) I_{n^2} + \sum_{i=1}^d f^u_{z^i}(s) (I_{n^2} + I^*_{n^2})(\mathbf{B}^u(s))^i + (I_{n^2} + I^*_{n^2}) \mathbf{A}^u(s) + \sum_{i=1}^d (\mathbf{D}^u(s))^i (\mathbf{B}^u(s))^i \Big] \tilde{P}^u_s + \sum_{i=1}^d \big[ f^u_{z^i}(s) I_{n^2} + (I_{n^2} + I^*_{n^2})(\mathbf{B}^u(s))^i \big] (\tilde{Q}^u_s)^i + \tilde{\Psi}^u_s \Big\}\, ds - \sum_{i=1}^d \int_t^T (\tilde{Q}^u_s)^i\, dW^i_s,$  (3.4)
where $I_{n^2}$ is the $n^2 \times n^2$ identity matrix and $\tilde{\Psi}^u_t := \big( ((\Psi^u_t)^1)^\top, ((\Psi^u_t)^2)^\top, \dots, ((\Psi^u_t)^n)^\top \big)^\top$. Obviously, (3.4) is a linear BSDE with bounded coefficients. Since $p^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K(\mathbb{R}^{n \times d})$, for all $t \in [0,T]$ it can be verified that
$E\Big[ \int_t^T |\tilde{\Psi}^u_s|\, ds \,\Big|\, \mathcal{F}_t \Big] = E\Big[ \int_t^T |\Psi^u_s|\, ds \,\Big|\, \mathcal{F}_t \Big] \leq C.$  (3.5)
Indeed, for instance, it follows immediately from the energy inequality that
$E\Big[ \int_t^T \sum_{j,k=1}^d \big| f^u_{z^j z^k}(s)\, (q^u_s)^j ((q^u_s)^k)^\top \big|\, ds \,\Big|\, \mathcal{F}_t \Big] \leq d\, \|f_{zz}\|_\infty\, E\Big[ \int_t^T |q^u_s|^2\, ds \,\Big|\, \mathcal{F}_t \Big] \leq d\, \|f_{zz}\|_\infty\, \|q^u\|_K.$
The other terms in (2.9) can be estimated similarly, so (3.5) holds. Therefore, similarly to the proof of Lemma 3.1, applying Itô's formula together with the estimate (3.5) yields $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|\tilde{P}^u\|_\infty < \infty$, which implies that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|P^u\|_\infty < \infty$.

In order to prove the convergence of Algorithm 1, we need the following lemma on the error estimate. It will be seen that if we directly minimize $H$ instead of $\tilde{H}$ (Step 5 in Algorithm 1), then the updated control variable may fail to be a descent direction for the cost functional when it incurs too much error in solving the state and adjoint dynamics (Steps 3 and 4 in Algorithm 1) in the next iteration.

Lemma 3.4.
Suppose that Assumption 2.6 holds. For any $u(\cdot) \in \mathcal{U}[0,T]$, let
$v_t \in \arg\min_{v \in U} \tilde{H}(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, v, u_t).$  (3.6)
Define a new probability $\tilde{P}$ and a Brownian motion $\tilde{W}$ with respect to $\tilde{P}$ by
$d\tilde{P} := \mathcal{E}\Big( \sum_{i=1}^d \int_0^\cdot f^{u,v}_{z^i}(t)\, dW^i_t \Big)_T\, dP; \quad \tilde{W}^i_t := W^i_t - \int_0^t f^{u,v}_{z^i}(s)\, ds, \quad i = 1, 2, \dots, d,$  (3.7)
and denote by $\tilde{E}[\cdot]$ the mathematical expectation corresponding to $\tilde{P}$. Then there exists a universal constant $C > 0$ such that
$J(v(\cdot)) - J(u(\cdot)) \leq \exp\{-\|f_y\|_\infty T\}\, \tilde{E}\Big[ \int_0^T \hat{H}(t)\, dt \Big] + C \Big( \sum_{\psi \in \{b,\sigma,f\}} \tilde{E}\Big[ \int_0^T |\psi(t, X^u_t, Y^u_t, Z^u_t, v_t) - \psi(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt \Big] + \sum_{w \in \{x,y,z\}} \tilde{E}\Big[ \int_0^T |G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt \Big] \Big),$  (3.8)
where $\hat{H}(t) := H(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, v_t, u_t) - H(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, u_t, u_t)$ for $t \in [0,T]$.

Remark 3.5.
In a way, one can observe that the last two terms in (3.8) determine whether the Hamiltonian minimization step makes $J(\cdot)$ descend. In other words, they measure the degree to which the state equation (2.3) and the adjoint equations (2.7) and (2.8) are satisfied. We refer to them as the feasibility errors (see [23]).

Proof. Let $u(\cdot) \in \mathcal{U}[0,T]$ and let $v(\cdot)$ be defined as in (3.6). Set
$\eta(t) = Y^v_t - Y^u_t - (p^u_t)^\top (X^v_t - X^u_t);$
$\zeta^i(t) = (Z^v_t)^i - (Z^u_t)^i - \tilde{\Delta}^i(t, X^u_t, p^u_t, v_t, u_t) - (\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t))^\top (X^v_t - X^u_t) - R^i(t) - (p^u_t)^\top [\sigma_{ix}(t, X^u_t, v_t) - \sigma_{ix}(t, X^u_t, u_t)](X^v_t - X^u_t)$ for $i = 1, 2, \dots, d$,  (3.9)
and
$\tilde{D}^2 f^v(t) = 2 \int_0^1 \int_0^1 \lambda\, D^2 f(t, \Theta^{u,v}_t + \lambda\mu(\Theta^v_t - \Theta^{u,v}_t), v_t)\, d\mu\, d\lambda,$
where
$R^i(t) := \sum_{j=1}^n (p^u_t)^j \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \sigma^{ji}_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu$
for $i = 1, 2, \dots, d$. We first note that
$G_x(t, x, y, z, p, q, v, u) = b^\top_x(t, x, v) p + \sum_{i=1}^d (\sigma_{ix}(t, x, v))^\top q^i + f_x(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v) + \tilde{\Delta}^\top_x(t, x, p, v, u)\, f_z(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
$G_y(t, x, y, z, p, q, v, u) = f_y(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
$G_z(t, x, y, z, p, q, v, u) = f_z(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v).$
Applying Itô's formula to $\eta(t)$, we have
$\eta(t) = \eta(T) + \int_t^T \Big\{ f^u_y(s)\eta(s) + \sum_{i=1}^d f^{u,v}_{z^i}(s)\zeta^i(s) + [f^{u,v}_y(s) - f^u_y(s)](Y^v_s - Y^u_s) + G(s, \Theta^u_s, p^u_s, q^u_s, v_s, u_s) - G(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) + \big( G_x(s, \Theta^u_s, p^u_s, q^u_s, v_s, u_s) - G_x(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) \big)^\top (X^v_s - X^u_s) + \sum_{i=1}^d [f^{u,v}_{z^i}(s) - f^u_{z^i}(s)] (\Upsilon^i(s, X^u_s, p^u_s, q^u_s, u_s))^\top (X^v_s - X^u_s) + \sum_{i=1}^d f^{u,v}_{z^i}(s) R^i(s) + R_1(s) + R_2(s) + R_3(s) \Big\}\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, dW^i_s,$  (3.10)
where
$\eta(T) = \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \Phi_{xx}(X^u_T + \lambda\mu(X^v_T - X^u_T))(X^v_T - X^u_T)(X^v_T - X^u_T)^\top \big\}\, d\lambda\, d\mu,$
$R_1(t) = \sum_{j=1}^n (p^u_t)^j \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ b^j_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu,$
$R_2(t) = \sum_{i=1}^d \sum_{j=1}^n (q^u_t)^{ji} \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \sigma^{ji}_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu$
and
$R_3(t) = \frac{1}{2} \big[ (X^v_t - X^u_t)^\top,\; Y^v_t - Y^u_t,\; \big( Z^v_t - Z^u_t - \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big)^\top \big]\, \tilde{D}^2 f^v(t)\, \big[ (X^v_t - X^u_t)^\top,\; Y^v_t - Y^u_t,\; \big( Z^v_t - Z^u_t - \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big)^\top \big]^\top.$
By the definition of the Hamiltonian $H$, (3.10) can be rewritten as
$\eta(t) = \eta(T) + \int_t^T \Big\{ f^u_y(s)\eta(s) + \sum_{i=1}^d f^{u,v}_{z^i}(s)\zeta^i(s) + \hat{H}(s) + \phi_s \Big\}\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, dW^i_s,$  (3.11)
where
$\phi_t := \big( G_x(t, \Theta^u_t, p^u_t, q^u_t, v_t, u_t) - G_x(t, \Theta^u_t, p^u_t, q^u_t, u_t, u_t) \big)^\top (X^v_t - X^u_t) + [f^{u,v}_y(t) - f^u_y(t)](Y^v_t - Y^u_t) + \sum_{i=1}^d [f^{u,v}_{z^i}(t) - f^u_{z^i}(t)] (\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t))^\top (X^v_t - X^u_t) + \sum_{i=1}^d f^{u,v}_{z^i}(t) R^i(t) + R_1(t) + R_2(t) + R_3(t) - \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, X^u_t, v_t) - \sigma_i(t, X^u_t, u_t) \big)^\top P^u_t \big( \sigma_i(t, X^u_t, v_t) - \sigma_i(t, X^u_t, u_t) \big).$
By definition (3.7), (3.11) can be further rewritten as
\[
\eta(t) = \eta(T) + \int_t^T \bigl[f^u_y(s)\eta(s) + \hat{H}(s) + \phi_s\bigr]\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, d\tilde{W}^i_s.
\tag{3.12}
\]
By applying Itô's formula to $\exp\bigl\{\int_0^t f^u_y(s)\, ds\bigr\}\eta(t)$ on $[0,T]$, we get
\[
\begin{aligned}
\eta(t) ={}& \exp\Bigl\{\int_t^T f^u_y(s)\, ds\Bigr\}\eta(T) + \int_t^T \exp\Bigl\{\int_t^s f^u_y(r)\, dr\Bigr\}\bigl(\hat{H}(s) + \phi_s\bigr)\, ds\\
&- \sum_{i=1}^d \int_t^T \exp\Bigl\{\int_t^s f^u_y(r)\, dr\Bigr\}\zeta^i(s)\, d\tilde{W}^i_s.
\end{aligned}
\tag{3.13}
\]
It can be verified that $\tilde{\mathbb{E}}\bigl[\bigl(\int_0^T |\zeta(t)|^2\, dt\bigr)^{1/2}\bigr] < \infty$; hence the stochastic integral in (3.13) is a true martingale under $\tilde{P}$. Then, by taking the mathematical expectation $\tilde{\mathbb{E}}[\cdot]$ on both sides of (3.13), we have
\[
\begin{aligned}
J(v(\cdot)) - J(u(\cdot)) = \eta(0) &= \tilde{\mathbb{E}}\Bigl[\exp\Bigl\{\int_0^T f^u_y(t)\, dt\Bigr\}\eta(T) + \int_0^T \exp\Bigl\{\int_0^t f^u_y(s)\, ds\Bigr\}\bigl(\hat{H}(t) + \phi_t\bigr)\, dt\Bigr]\\
&\le \exp\bigl\{\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] + \tilde{\mathbb{E}}\Bigl[\int_0^T \exp\Bigl\{\int_0^t f^u_y(s)\, ds\Bigr\}\hat{H}(t)\, dt\Bigr].
\end{aligned}
\]
Due to the definition of $v(\cdot)$ in (3.6) and the definition of $\tilde{H}$ in (2.11), we get
\[
\begin{aligned}
&H(t, \Theta^u_t, p^u_t, q^u_t, P^u_t, v_t, u_t) + \rho\Bigl(\sum_{\psi \in \{b,\sigma,f\}} |\psi(t, \Theta^u_t, v_t) - \psi(t, \Theta^u_t, u_t)|^2\\
&\qquad + \sum_{w \in \{x,y,z\}} |G_w(t, \Theta^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, \Theta^u_t, p^u_t, q^u_t, u_t, u_t)|^2\Bigr) \le H(t, \Theta^u_t, p^u_t, q^u_t, P^u_t, u_t, u_t),
\end{aligned}
\tag{3.14}
\]
which implies that $\hat{H}(t) \le 0$ for a.e. $t \in [0,T]$, $P$-almost surely. Thus we obtain
\[
J(v(\cdot)) - J(u(\cdot)) \le \exp\bigl\{\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] + \exp\bigl\{-\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[\int_0^T \hat{H}(t)\, dt\Bigr].
\tag{3.15}
\]
In order to obtain (3.8), we proceed to estimate $\tilde{\mathbb{E}}\bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\bigr]$ in the following five parts.
(i) Estimate of $\tilde{\mathbb{E}}\bigl[|\eta(T)| + \int_0^T \bigl(\sum_{i=1}^d |f^{u,v}_{z_i}(t) R^i(t)| + |R_1(t)|\bigr)\, dt\bigr]$.

Denote
\[
\tilde{b}^{u,v}_x(t) = \int_0^1 b_x\bigl(t, X^u_t + \lambda(X^v_t - X^u_t), v_t\bigr)\, d\lambda; \qquad \tilde{f}^{u,v}_x(t) = \int_0^1 f_x\bigl(t, \Theta^u_t + \lambda(\Theta^v_t - \Theta^u_t), v_t\bigr)\, d\lambda.
\]
$\tilde{\sigma}^{u,v}_x(t)$, $\tilde{f}^{u,v}_y(t)$ and $\tilde{f}^{u,v}_z(t)$ are defined similarly. Since
\[
\begin{aligned}
X^v_t - X^u_t ={}& \int_0^t \biggl\{\Bigl(\tilde{b}^{u,v}_x(s) + \sum_{i=1}^d f^{u,v}_{z_i}(s)\,(\tilde{\sigma}^{u,v}_x(s))^i\Bigr)(X^v_s - X^u_s) + b(s, X^u_s, v_s) - b(s, X^u_s, u_s)\\
&\qquad + \sum_{i=1}^d f^{u,v}_{z_i}(s)\bigl[\sigma^i(s, X^u_s, v_s) - \sigma^i(s, X^u_s, u_s)\bigr]\biggr\}\, ds\\
&+ \sum_{i=1}^d \int_0^t \Bigl\{(\tilde{\sigma}^{u,v}_x(s))^i (X^v_s - X^u_s) + \bigl[\sigma^i(s, X^u_s, v_s) - \sigma^i(s, X^u_s, u_s)\bigr]\Bigr\}\, d\tilde{W}^i_s,
\end{aligned}
\]
a standard SDE estimate yields
\[
\tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr] \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\tag{3.16}
\]
Therefore, by virtue of $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|p^u\|_\infty < \infty$ and the boundedness of $\Phi_{xx}$, $b_{xx}$, $\sigma_{xx}$, $f_z$, we get
\[
\tilde{\mathbb{E}}\Bigl[|\eta(T)| + \sum_{i=1}^d \int_0^T |f^{u,v}_{z_i}(t) R^i(t)|\, dt + \int_0^T |R_1(t)|\, dt\Bigr] \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\]

(ii) Estimate of $\tilde{\mathbb{E}}\bigl[\int_0^T |R_2(t)|\, dt\bigr]$.

Since $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|q^u\|_K < \infty$, for $i = 1, 2, \dots, d$ and $j = 1, 2, \dots, n$ we have $(q^u)^{ji} \cdot W^i \in \mathrm{BMO}$. Furthermore, by Lemma 2.3,
\[
c_1 \bigl\|(q^u)^{ji} \cdot W^i\bigr\|_{\mathrm{BMO}} \le \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|_{\mathrm{BMO}(\tilde{P})} \le c_2 \bigl\|(q^u)^{ji} \cdot W^i\bigr\|_{\mathrm{BMO}},
\tag{3.17}
\]
where $c_1$ and $c_2$ are two constants depending only on $\|f_z\|_\infty$ and $T$.
Then, it follows from Fefferman's inequality, estimate (3.16) and inequality (3.17) that
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T |R_2(t)|\, dt\Bigr] &\le \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(q^u_t)^{ji}\bigr|\, |X^v_t - X^u_t|^2\, dt\Bigr]\\
&\le \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \Bigl|\, d\bigl\langle (q^u)^{ji} \cdot \tilde{W}^i,\; |X^v - X^u|^2 \cdot \tilde{W}^i \bigr\rangle_t \Bigr|\Bigr]\\
&\le \sqrt{T}\, \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|_{\mathrm{BMO}(\tilde{P})}\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr]\\
&\le \sqrt{T}\, n d c_2\, \|\sigma_{xx}\|_\infty\, \|q^u\|_K\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr]\\
&\le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]

(iii) Estimate of $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$.

In order to estimate $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$, we only need to estimate the following term:
\[
\tilde{\mathbb{E}}\Bigl[\int_0^T \Bigl(\int_0^1\!\!\int_0^1 \lambda\, \bigl|f_{z_i z_i}\bigl(t, \Theta^{u,v}_t + \lambda\mu(\Theta^v_t - \Theta^{u,v}_t), v_t\bigr)\bigr|\, d\mu\, d\lambda\Bigr) \bigl|(Z^v_t)^i - (Z^u_t)^i - \tilde\Delta^i(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]
\tag{3.18}
\]
for any $i \in \{1, 2, \dots, d\}$.
Since
\[
\begin{aligned}
Y^v_t - Y^u_t ={}& \Phi(X^v_T) - \Phi(X^u_T) + \int_t^T \Bigl\{\bigl(\tilde{f}^{u,v}_x(s)\bigr)^\top (X^v_s - X^u_s) + \tilde{f}^{u,v}_y(s)(Y^v_s - Y^u_s)\\
&+ \bigl(\tilde{f}^{u,v}_z(s) - f^{u,v}_z(s)\bigr)^\top (Z^v_s - Z^u_s) - \bigl(\tilde{f}^{u,v}_z(s)\bigr)^\top \tilde\Delta(s, X^u_s, p^u_s, v_s, u_s) + f^{u,v}(s) - f^u(s)\Bigr\}\, ds\\
&- \int_t^T (Z^v_s - Z^u_s)^\top\, d\tilde{W}_s
\end{aligned}
\]
and
\[
\begin{aligned}
|f^{u,v}(t) - f^u(t)| &= \bigl|f\bigl(t, X^u_t, Y^u_t, Z^u_t + \tilde\Delta(t, X^u_t, p^u_t, v_t, u_t), v_t\bigr) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)\bigr|\\
&\le \|f_z\|_\infty \|p^u\|_\infty\, |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)| + |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|,
\end{aligned}
\]
we get
\[
\begin{aligned}
&\tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |Y^v_t - Y^u_t|^2 + \int_0^T |Z^v_t - Z^u_t|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&\qquad\qquad + \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\tag{3.19}
\]
Thus, from estimate (3.19), and due to $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|p^u\|_\infty < \infty$ and $\|D^2 f\|_\infty < \infty$, (3.18) can be dominated by
\[
\begin{aligned}
&\|D^2 f\|_\infty\, \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(Z^v_t)^i - (Z^u_t)^i - \tilde\Delta^i(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |Z^v_t - Z^u_t|^2\, dt + \int_0^T \bigl|\tilde\Delta(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&\qquad\qquad + \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]
The estimates for the other terms in $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$ are similar to (3.18). Hence, we obtain
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T |R_3(t)|\, dt\Bigr] \le{}& C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&+ \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]
(iv) Estimate of $\tilde{\mathbb{E}}\bigl[\sum_{i=1}^d \int_0^T \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\, dt\bigr]$.

We first note that
\[
\begin{aligned}
&\sum_{i=1}^d \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\\
&\quad \le \sum_{i=1}^d \Bigl(\bigl|f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr|^2 + \bigl|\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\bigr|^2\Bigr)\\
&\quad \le |G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\\
&\qquad + 2d\, \|\sigma_x\|^2_\infty \|p^u\|^2_\infty \sup_{t \in [0,T]} |X^v_t - X^u_t|^2 + 2n \sum_{i=1}^d \sum_{j=1}^n \bigl|(q^u_t)^{ji}\bigr|^2\, \bigl|(X^v_t - X^u_t)_j\bigr|^2.
\end{aligned}
\tag{3.20}
\]
Since $X^v - X^u \in S^2_{\mathcal{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K^2(\mathbb{R}^{n \times d})$, by Proposition 2.4 and inequality (3.17) we can estimate the last term in the second inequality of (3.20) as follows:
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T \sum_{i=1}^d \sum_{j=1}^n \bigl|(q^u_t)^{ji}\bigr|^2\, \bigl|(X^v_t - X^u_t)_j\bigr|^2\, dt\Bigr] &= \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(X^v_t - X^u_t)_j\bigr|^2\, d\bigl\langle (q^u)^{ji} \cdot \tilde{W}^i \bigr\rangle_t\Bigr]\\
&\le \sum_{i=1}^d \sum_{j=1}^n \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|^2_{\mathrm{BMO}(\tilde{P})}\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} \bigl|(X^v_t - X^u_t)_j\bigr|^2\Bigr]\\
&\le n d\, c_2^2\, \|q^u\|^2_K\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr].
\end{aligned}
\tag{3.21}
\]
Thus, combining estimates (3.16) and (3.21) with (3.20), we obtain
\[
\begin{aligned}
&\tilde{\mathbb{E}}\Bigl[\sum_{i=1}^d \int_0^T \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\, dt\Bigr]\\
&\quad \le C\Bigl\{\tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr]\\
&\qquad + \tilde{\mathbb{E}}\Bigl[\int_0^T |G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\]
The estimates for the first and second terms in $\phi$ are similar to the above inequality.

(v) Estimate of $\tilde{\mathbb{E}}\bigl[\sum_{i=1}^d \int_0^T \bigl|\bigl(\sigma^i(t, X^u_t, v_t) - \sigma^i(t, X^u_t, u_t)\bigr)^\top P^u_t \bigl(\sigma^i(t, X^u_t, v_t) - \sigma^i(t, X^u_t, u_t)\bigr)\bigr|\, dt\bigr]$.

Here we only need the fact that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|P^u\|_\infty < \infty$, which was proved in Lemma 3.3.

Consequently, we have
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] \le{}& C\Bigl\{\sum_{\psi \in \{b,\sigma,f\}} \tilde{\mathbb{E}}\Bigl[\int_0^T |\psi(t, X^u_t, Y^u_t, Z^u_t, v_t) - \psi(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr]\\
&+ \sum_{w \in \{x,y,z\}} \tilde{\mathbb{E}}\Bigl[\int_0^T |G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\tag{3.22}
\]
Combining (3.22) with (3.15), we obtain the desired result. This completes the proof.

Now we state the main result of the paper.

Theorem 3.6.
Suppose that Assumption 2.6 holds. Then, for $\rho > C \exp\{\|f_y\|_\infty T\}$, Algorithm 1 converges to the set of solutions to the extended SMP.

Proof. For any integer $m \ge 1$, define a new probability $P^m$ by $dP^m := \Xi^m_T\, dP$, where
\[
\Xi^m_t := \mathcal{E}\Bigl(\sum_{i=1}^d \int_0^t f^{u^{m-1},u^m}_{z_i}(s)\, dW^i_s\Bigr).
\]
Denote by $\mathbb{E}^m[\cdot]$ the mathematical expectation with respect to $P^m$, and set $\Theta^m_t = (X^m_t, Y^m_t, Z^m_t)$ for $t \in [0,T]$. In order to obtain convergence, set
\[
\hat{H}^m(t) = H(t, \Theta^m_t, p^m_t, q^m_t, P^m_t, u^m_t, u^{m-1}_t) - H(t, \Theta^m_t, p^m_t, q^m_t, P^m_t, u^{m-1}_t, u^{m-1}_t)
\]
and $\mu_m = \mathbb{E}^m\bigl[\int_0^T \hat{H}^m(t)\, dt\bigr]$. Similar to the analysis in (3.14), we get $\hat{H}^m(t) \le 0$, $P$-a.s., for a.e. $t \in [0,T]$, and observe that $\mu_m \le 0$. Applying Lemma 3.4 with $v(\cdot) = u^m(\cdot)$ and $u(\cdot) = u^{m-1}(\cdot)$, we have
\[
\begin{aligned}
J(u^m(\cdot)) - J(u^{m-1}(\cdot)) \le{}& \exp\bigl\{-\|f_y\|_\infty T\bigr\}\, \mathbb{E}^m\Bigl[\int_0^T \hat{H}^m(t)\, dt\Bigr]\\
&+ C\Bigl\{\sum_{\psi \in \{b,\sigma,f\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|\psi(t, X^m_t, Y^m_t, Z^m_t, u^m_t) - \psi(t, X^m_t, Y^m_t, Z^m_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\\
&+ \sum_{w \in \{x,y,z\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|G_w(t, \Theta^m_t, p^m_t, q^m_t, u^m_t, u^{m-1}_t) - G_w(t, \Theta^m_t, p^m_t, q^m_t, u^{m-1}_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\Bigr\}
\end{aligned}
\tag{3.23}
\]
for some universal constant $C > 0$ depending on $n$, $d$, $T$, $\|b_x\|_\infty$, $\|\sigma_x\|_\infty$, $\|\Phi_x\|_\infty$, $\|Df\|_\infty$ and $\|D^2 f\|_\infty$. Hence, by choosing $\rho > C \exp\{\|f_y\|_\infty T\}$ and using the definition of the augmented Hamiltonian $\tilde{H}$, one can rewrite (3.23) as
\[
J(u^m(\cdot)) - J(u^{m-1}(\cdot)) \le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)\mu_m \le 0.
\]
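The inequality above says that each iteration decreases the cost by at least a positive multiple of $-\mu_m$; since $J$ is bounded below, the quantities $-\mu_m$ must be summable, which forces $\mu_m \to 0$. The following toy numerical sketch illustrates only this descent mechanism, not the paper's Algorithm 1 itself; the scalar "cost" and the rule $\mu_m = -(J_{m-1} - \inf J)$ are purely illustrative assumptions.

```python
# Toy illustration of the descent mechanism: if J_m - J_{m-1} <= c * mu_m
# with c > 0 and mu_m <= 0, and J is bounded below (here inf J = 0), then
# the partial sums of -mu_m stay bounded by (J_0 - inf J)/c, so mu_m -> 0.

def descent_sequence(J0: float, steps: int, c: float = 0.5):
    """Simulate iterates whose available decrease mu_m is proportional to
    the remaining optimality gap (an assumption made for illustration)."""
    J, mus = J0, []
    for _ in range(steps):
        mu = -J          # illustrative choice: mu_m = -(J_{m-1} - inf J)
        J = J + c * mu   # descent inequality applied with equality
        mus.append(mu)
    return J, mus

J_final, mus = descent_sequence(J0=1.0, steps=50)
assert all(mu <= 0 for mu in mus)           # mu_m <= 0 at every step
assert sum(-mu for mu in mus) <= 1.0 / 0.5  # partial sums bounded by (J0 - inf J)/c
assert abs(mus[-1]) < 1e-12                 # mu_m -> 0
```

In this sketch the cost halves at every step, so the bounded partial sums of $-\mu_m$ mirror the telescoping-sum argument used below to conclude $\mu_m \to 0$.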
Consequently, for any integer $l \ge 1$, we have
\[
\begin{aligned}
\sum_{m=1}^l (-\mu_m) &\le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \sum_{m=1}^l \bigl[J(u^{m-1}(\cdot)) - J(u^m(\cdot))\bigr]\\
&= \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \bigl[J(u^0(\cdot)) - J(u^l(\cdot))\bigr]\\
&\le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \Bigl[J(u^0(\cdot)) - \inf_{u(\cdot) \in \mathcal{U}[0,T]} J(u(\cdot))\Bigr] < \infty,
\end{aligned}
\]
which implies that $\sum_{m=1}^\infty (-\mu_m) < \infty$. Since $-\mu_m \ge 0$, we obtain $\mu_m \to 0$ as $m \to \infty$.

As was mentioned in [23], the quantity $|\mu_m| = -\mu_m \ge 0$ measures the distance from a solution to the extended SMP (2.12). Indeed, if $\mu_m = 0$ for some integer $m$, then from the augmented Hamiltonian minimization step (2.13) we get
\[
\begin{aligned}
0 &\le \rho\Bigl(\sum_{\psi \in \{b,\sigma,f\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|\psi(t, X^m_t, Y^m_t, Z^m_t, u^m_t) - \psi(t, X^m_t, Y^m_t, Z^m_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\\
&\qquad + \sum_{w \in \{x,y,z\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|G_w(t, \Theta^m_t, p^m_t, q^m_t, u^m_t, u^{m-1}_t) - G_w(t, \Theta^m_t, p^m_t, q^m_t, u^{m-1}_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\Bigr) \le -\mu_m = 0,
\end{aligned}
\]
which implies that the feasibility error (see Remark 3.5) vanishes and
\[
\tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u^{m-1}_t, u^{m-1}_t) = \min_{u \in U} \tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u, u^{m-1}_t), \quad P\text{-a.s., a.e. } t \in [0,T].
\]
Thus $(X^m, Y^m, Z^m, p^m, q^m, P^m, u^{m-1}(\cdot))$ solves (2.12).

Corollary 3.7.
Suppose that Assumption 2.6 holds and that $f$ is independent of $y$, $z$. We further assume that the following local maximum principle holds:
\[
G(t, \bar{X}_t, \bar{p}_t, \bar{q}_t, \bar{u}_t) \le G(t, \bar{X}_t, \bar{p}_t, \bar{q}_t, u), \quad \forall u \in U,\ P\text{-a.s., a.e. } t \in [0,T].
\tag{3.24}
\]
Then the estimate (3.8) becomes
\[
\begin{aligned}
J(v(\cdot)) - J(u(\cdot)) \le{}& \mathbb{E}\Bigl[\int_0^T \bigl[G(t, X^u_t, p^u_t, q^u_t, v_t) - G(t, X^u_t, p^u_t, q^u_t, u_t)\bigr]\, dt\Bigr]\\
&+ C\Bigl\{\mathbb{E}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt\Bigr] + \mathbb{E}\Bigl[\int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr]\\
&+ \mathbb{E}\Bigl[\int_0^T |G_x(t, X^u_t, p^u_t, q^u_t, v_t) - G_x(t, X^u_t, p^u_t, q^u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\tag{3.25}
\]
Furthermore, Algorithm 1 converges to the set of solutions to the extended version of (3.24) for $\rho > C$.

Proof. Since $f$ is independent of $y$ and $z$, we have $\tilde{\mathbb{E}} = \mathbb{E}$, and it is not necessary to establish estimate (3.19) in the proof of Lemma 3.4. Moreover, since the arguments $y$, $z$ and $P$ drop out, the Hamiltonian reduces to $H(t, x, p, q, v) = G(t, x, p, q, v)$. In this case, the augmented Hamiltonian becomes
\[
\tilde{H}(t, x, p, q, v, u) = G(t, x, p, q, v) + \rho\, |b(t, x, v) - b(t, x, u)|^2 + \rho\, |\sigma(t, x, v) - \sigma(t, x, u)|^2 + \rho\, |G_x(t, x, p, q, v) - G_x(t, x, p, q, u)|^2.
\]
Then, similarly to the proof of Lemma 3.4, we update the control by
\[
u^m_t \in \operatorname*{arg\,min}_{u \in U}\, \tilde{H}(t, X^m_t, p^m_t, q^m_t, u, u^{m-1}_t)
\]
and obtain estimate (3.25). By Theorem 3.6, we get the convergence of Algorithm 1 for $\rho > C$.

Remark 3.8.