A Modified Method of Successive Approximations for Stochastic Recursive Optimal Control Problems
Shaolin Ji ∗ Rundong Xu † February 3, 2021
Abstract. Based on the global stochastic maximum principle established by Hu [11], a modified method of successive approximations (MSA for short) is established for decoupled forward-backward stochastic control systems. The second-order adjoint process is introduced in the augmented Hamiltonian minimization step in order to find the optimal control that reaches the global minimum of the cost functional. Thanks to the theory of bounded mean oscillation martingales (BMO-martingales for short), we give a delicate proof of the error estimate and obtain a convergence result for the modified MSA algorithm.
Key words. BMO-martingales; forward-backward stochastic control systems; method of successive approximations; global stochastic maximum principle; stochastic recursive optimal control
AMS subject classifications.
1 Introduction

Finding numerical solutions to optimal control problems via scientific computing methods has long attracted much attention. Among these methods, the method of successive approximations (MSA for short) is an efficient tool for tackling optimal control problems. On the one hand, compared with algorithms based on the dynamic programming approach (for example, the Bellman-Howard policy iteration algorithm in [15]), the MSA is an iterative method equipped with alternating propagation and optimization steps based on the maximum principle. On the other hand, as was suggested in [22], this leads to an alternative approach to deep residual networks (He et al. [9]) from the optimal control viewpoint, since such neural networks can be regarded as a discretization of certain continuous deterministic control systems.

The MSA based on Pontryagin's maximum principle [3] for numerical solutions of deterministic control systems was first proposed by Krylov et al. [18]. This method consists of successive integrations of the state and adjoint equations, and updates the control variables by minimizing the Hamiltonian. For classical stochastic control systems, the modified MSA of Kerimkulov et al. [16] imposes the assumption "$D_x\sigma(\cdot) \equiv 0$" to guarantee that the second term $Z^\alpha$ of the solution to the adjoint equation is bounded. Furthermore, since their modified MSA is based on the local SMP, it cannot deal with the case when the control domain is non-convex.

In this paper, on the one hand, we remove the above unnecessary assumption imposed on the diffusion coefficient by employing the BMO property of the solution to the adjoint equation.

∗ Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, PR China (Corresponding author). Research supported by the National Natural Science Foundation of China (Nos. 11971263, 11871458).
† Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, PR China.
More than that, we generalize the modified MSA to the decoupled forward-backward stochastic control system (FBSCS for short), for which the state equation is described by a decoupled forward-backward stochastic differential equation (FBSDE for short) (see [13], [24], [34] and the references therein). This kind of optimal control problem is also called a stochastic recursive optimal control problem, which plays an important role in economic and financial fields (see [5], [7], [20] and the references therein).

On the other hand, we study the general case in which the control domain may be non-convex. For this purpose, we need to establish the modified MSA based on the global SMP for FBSCSs. As for the global SMP, Peng [25] first established the general SMP for classical stochastic control systems. Since then, much progress has been made for various stochastic control systems (see [8], [26], [29], [30], [33] and the references therein). Recently, Hu [11] introduced two adjoint equations to obtain the global SMP for FBSCSs governed by decoupled FBSDEs and solved the open problem proposed by Peng [27]. Based on Hu's work, Hu et al. [12] proposed a new method to obtain the first- and second-order variational equations, which are essentially fully coupled FBSDEs, and derived the global SMP for fully coupled FBSCSs.

Our main contributions are as follows. Firstly, we extend the modified MSA from classical stochastic control systems to the decoupled FBSCSs (2.3). In contrast to the former, the emergence of $Z^u$ in the backward state equation of (2.3) makes the error estimate more difficult. By applying the Girsanov transformation, the process (3.9) disappears from the drift term, and we obtain the error estimate (3.8) under a new reference probability measure. Secondly, the convergence of our modified MSA does not need the assumption "$D_x\sigma(\cdot) \equiv 0$".
It is worth pointing out that the challenge in obtaining the desired error estimate is the unboundedness of the solution $q^u$ to the adjoint equation (2.7). As mentioned earlier, this technical difficulty is avoided if one imposes the restrictive assumption "$D_x\sigma(\cdot) \equiv 0$", which makes $q^u$ ($Z^\alpha$ in their context) bounded. Fortunately, we found that $q^u \cdot W$ is a multi-dimensional BMO-martingale and obtained the convergence of our modified MSA by means of some useful inequalities from harmonic analysis on the space of BMO-martingales.

Thirdly, since our modified MSA is based on the global SMP, the augmented Hamiltonian contains the second-order adjoint process $P^u$ (see (2.8)), whose boundedness is essential for obtaining the error estimate (3.8). We prove that the boundedness of $P^u$ depends on the BMO property of $q^u \cdot W$. Consequently, we claim that the optimal control we find is a candidate control that reaches the global minimum.

The rest of the paper is organized as follows. In Section 2, preliminaries and the formulation of our problem are given. In Section 3, we first show properties of the solutions to the adjoint equations, then state our main results, consisting of the error estimate and the convergence of our modified MSA algorithm.

2 Preliminaries and Problem Formulation
Fix a terminal time $T \in (0, +\infty)$ and three positive integers $n$, $d$ and $k$. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which a standard $d$-dimensional Brownian motion $W = (W^1_t, W^2_t, \dots, W^d_t)^\top_{t \in [0,T]}$ is defined, and let $\mathbb{F} := \{\mathcal{F}_t\}_{t \in [0,T]}$ be the $P$-augmentation of the natural filtration generated by $W$.

Denote by $\mathbb{R}^n$ the $n$-dimensional real Euclidean space, by $\mathbb{R}^{n \times m}$ the set of $n \times m$ real matrices ($n, m \geq 1$) and by $\mathbb{S}^{n \times n}$ the set of all $n \times n$ symmetric matrices. The scalar product (resp. norm) of $A, B \in \mathbb{R}^{n \times m}$ is denoted by $\langle A, B \rangle = \mathrm{tr}\{AB^\top\}$ (resp. $\|A\| = \sqrt{\mathrm{tr}\{AA^\top\}}$), where the superscript $\top$ denotes the transpose of vectors or matrices. Denote by $I_n$ the $n \times n$ identity matrix.

For any given $p, q \geq 1$, we introduce the following Banach spaces.

$L^p_{\mathcal{F}_T}(\Omega; \mathbb{R}^n)$: the space of $\mathcal{F}_T$-measurable $\mathbb{R}^n$-valued random variables $\xi$ such that $E[|\xi|^p] < \infty$.

$L^\infty_{\mathcal{F}_T}(\Omega; \mathbb{R}^n)$: the space of $\mathcal{F}_T$-measurable $\mathbb{R}^n$-valued random variables $\xi$ such that $\operatorname{ess\,sup}_{\omega \in \Omega} |\xi(\omega)| < \infty$.

$L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{F}$-adapted $\mathbb{R}^n$-valued processes $(\varphi_t)_{t \in [0,T]}$ such that $\|\varphi\|_\infty := \operatorname{ess\,sup}_{(t,\omega) \in [0,T] \times \Omega} |\varphi_t(\omega)| < \infty$.

$S^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{F}$-adapted $\mathbb{R}^n$-valued continuous processes $(\varphi_t)_{t \in [0,T]}$ such that $E[\sup_{t \in [0,T]} |\varphi_t|^p] < \infty$.

$H^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$: the space of $\mathbb{R}^n$-valued $\mathbb{F}$-martingales $M = (M^1, \dots, M^n)^\top$ such that $M_0 = 0$ and $\|M\|_{H^p} := \big\| \sqrt{\mathrm{tr}\{\langle M \rangle_T\}} \big\|_{L^p} < \infty$, where $\langle M \rangle_t := (\langle M^i, M^j \rangle_t)_{1 \leq i,j \leq n}$ for $t \in [0,T]$.

$M^p(\mathbb{R}^{n \times d})$: the space of $\mathbb{R}^{n \times d}$-valued $\mathbb{F}$-progressively measurable processes $(\varphi_t)_{t \in [0,T]}$ such that $\|\varphi\|_{M^p} := E\big[ \big( \int_0^T |\varphi_t|^2\, dt \big)^{p/2} \big]^{1/p} < \infty$.

$BMO$: the space of processes $M \in H^2_{\mathbb{F}}([0,T]; \mathbb{R})$ such that
$\|M\|_{BMO} := \sup_\tau \big\| \big( E[\langle M \rangle_T - \langle M \rangle_\tau \mid \mathcal{F}_\tau] \big)^{1/2} \big\|_\infty < \infty$,  (2.1)
where the supremum is taken over all stopping times $\tau \in [0,T]$. Furthermore, one can replace $\tau$ by all deterministic times $t \in [0,T]$ in definition (2.1).

$K(\mathbb{R}^{n \times d})$: the space of $\mathbb{R}^{n \times d}$-valued processes $\varphi \in M^2(\mathbb{R}^{n \times d})$ such that
$\|\varphi\|_K := \sup_\tau \Big\| E\Big[ \int_\tau^T |\varphi_s|^2\, ds \mid \mathcal{F}_\tau \Big] \Big\|_\infty < \infty$,  (2.2)
where the supremum is taken over all stopping times $\tau \in [0,T]$. Furthermore, one can replace $\tau$ by all deterministic times $t \in [0,T]$ in definition (2.2).

We write $BMO(Q)$ and $K(\mathbb{R}^{n \times d}; Q)$ for any probability $Q$ defined on $(\Omega, \mathcal{F})$ whenever it is necessary to indicate the underlying probability. For simplicity, if the underlying probability is $P$, we still use the notations $BMO$ and $K(\mathbb{R}^{n \times d})$.

2.1 Some Notations and Results on BMO-Martingales

Here we list some notations and results on BMO-martingales which will be used in this paper. We refer the reader to [6], [10], [17] and the references therein for more details.

Denote by $\mathcal{E}(M)$ the Doléans-Dade exponential of a continuous local martingale $M$, that is, $\mathcal{E}(M)_t = \exp\{ M_t - \frac{1}{2}\langle M \rangle_t \}$ for any $t \in [0,T]$. If $M \in BMO$, then $\mathcal{E}(M)$ is a uniformly integrable martingale (see Theorem 2.3 in [17]).

Let $H$ be an $\mathbb{R}^d$-valued $\mathbb{F}$-adapted process. Denote by $H \cdot W$ the stochastic integral of $H$ with respect to the $d$-dimensional Brownian motion $W$, that is, $(H \cdot W)_t := \sum_{i=1}^d \int_0^t H^i_s\, dW^i_s$ for $t \in [0,T]$.

The following theorem plays a significant role in characterizing the duality between $H^1_{\mathbb{F}}([0,T]; \mathbb{R})$ and $BMO$.

Theorem 2.1 (Fefferman's inequality). If $M \in BMO$ and $N \in H^1_{\mathbb{F}}([0,T]; \mathbb{R})$, then
$E\Big[ \int_0^T |d\langle M, N \rangle_t| \Big] \leq \sqrt{2}\, \|M\|_{BMO} \|N\|_{H^1}.$
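The BMO norm (2.1) controls the conditional expected remaining quadratic variation uniformly over stopping times. The following sketch (entirely illustrative, not from the paper) checks numerically the elementary bound that for $M = H \cdot W$ with $|H_t| \leq c$, the remaining quadratic variation never exceeds $c^2(T - t)$, so $\|M\|_{BMO} \leq c\sqrt{T}$; the integrand `H` below is an arbitrary bounded example.

```python
import numpy as np

# Empirical illustration: for M = H.W with |H_t| <= c, the remaining
# quadratic variation <M>_T - <M>_t = int_t^T |H_s|^2 ds is bounded by
# c^2 (T - t), hence ||M||_BMO <= c sqrt(T).
rng = np.random.default_rng(0)
T, n_steps, n_paths, c = 1.0, 500, 200, 2.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)[:-1]

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
H = c * np.sin(t)[None, :] * np.tanh(np.cumsum(dW, axis=1))  # |H| <= c

qv_increments = H**2 * dt                                    # d<M>_s = |H_s|^2 ds
remaining_qv = qv_increments[:, ::-1].cumsum(axis=1)[:, ::-1]  # <M>_T - <M>_t

# The pathwise sup dominates the conditional expectation in (2.1):
bmo_sq_upper = remaining_qv.max()
assert bmo_sq_upper <= c**2 * T + 1e-12
print(f"empirical bound {bmo_sq_upper:.4f} <= c^2 T = {c**2 * T:.4f}")
```

The pathwise maximum used here is an upper bound for the BMO norm squared, which is all the deterministic bound requires.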
For any $M \in BMO$, the energy-type inequality for $\langle M \rangle$ is a significant result commonly used in BMO-martingale theory (see [17]). In essence, for any $\varphi \in K(\mathbb{R}^{n \times d})$, $\mathbb{F}$-stopping time $\tau$ on $[0,T]$ and $A \in \mathcal{F}_\tau$, we can apply Garsia's Lemma ([10], Lemma 10.35) to the continuous increasing process $\big( \mathbf{1}_A \int_\tau^t |\varphi_s|^2\, ds \big)_{t \in [\tau, T]}$ to obtain the following energy-type inequality.

Proposition 2.2 (Energy inequality). Let $\varphi \in K(\mathbb{R}^{n \times d})$. Then, for any integer $m \geq 1$ and any $\mathbb{F}$-stopping time $\tau$ on $[0,T]$, we have
$E\Big[ \Big( \int_\tau^T |\varphi_s|^2\, ds \Big)^m \,\Big|\, \mathcal{F}_\tau \Big] \leq m!\, \|\varphi\|_K^m.$

Recall that the space $BMO$ depends on the underlying probability measure. The following lemma shows the equivalence of different BMO-norms under the Girsanov transformation.
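The Girsanov transformation underlying this equivalence replaces $P$ by $d\tilde{P} = \mathcal{E}(N)_T\, dP$. The following Monte Carlo sketch (an illustration with an assumed bounded integrand, not taken from the paper) checks the basic fact that makes this a legitimate change of measure: the Doléans-Dade exponential of a stochastic integral with bounded predictable integrand has mean one.

```python
import numpy as np

# Monte Carlo check that E(theta . W)_T has mean one under P, so that
# dP~ := E(theta . W)_T dP defines an equivalent probability measure.
# The integrand theta (standing in for a bounded f_z) is illustrative.
rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 400, 100_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
# predictable version: theta at step k uses the path only up to step k-1
W_prev = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)
theta = np.tanh(W_prev)                        # bounded by 1, adapted

stoch_int = np.sum(theta * dW, axis=1)         # (theta . W)_T
qv = np.sum(theta**2, axis=1) * dt             # <theta . W>_T
density = np.exp(stoch_int - 0.5 * qv)         # Doleans-Dade exponential at T

print(f"E[dP~/dP] ~ {density.mean():.4f}")     # close to 1
assert abs(density.mean() - 1.0) < 0.02
```

Predictability of `theta` (it uses the path strictly before each increment) is what makes the discrete-time mean exactly one in expectation; the residual deviation is pure Monte Carlo error.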
Lemma 2.3 ([14], Lemma A.4). Let $K > 0$ be a given constant and let $M \in BMO$. Then there are constants $c_1 > 0$ and $c_2 > 0$ depending only on $K$ such that for any $N \in BMO$ with $\|N\|_{BMO} \leq K$, we have
$c_1 \|M\|_{BMO} \leq \big\| \tilde{M} \big\|_{BMO(\tilde{P})} \leq c_2 \|M\|_{BMO},$
where $\tilde{M} := M - \langle M, N \rangle$ and $d\tilde{P} := \mathcal{E}(N)_T\, dP$.

The following proposition is a deeper result obtained by applying Fefferman's inequality.
Proposition 2.4 ([6], Lemma 1.4). Let $p \geq 1$. Assume that $X \in S^p_{\mathbb{F}}([0,T]; \mathbb{R})$ and $M \in BMO$. Then $X \cdot M \in H^p_{\mathbb{F}}([0,T]; \mathbb{R})$. Moreover, we have the estimates
$\|X \cdot M\|_{H^p} \leq \sqrt{2}\, p\, \|X\|_{S^p} \|M\|_{BMO}$ for $p > 1$, and $\|X \cdot M\|_{H^1} \leq \sqrt{2}\, \|X\|_{S^1} \|M\|_{BMO}.$

Remark 2.5. Let $\varphi \in S^p_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ for some $p \geq 1$. For $i \in \{1, 2, \dots, d\}$, the stochastic integral of the real-valued process $(|\varphi_t|)_{t \in [0,T]}$ with respect to $W^i$ is well-defined. Moreover, $|\varphi| \cdot W^i \in H^p_{\mathbb{F}}([0,T]; \mathbb{R})$ since
$E\big[ \langle |\varphi| \cdot W^i \rangle_T^{p/2} \big] = E\Big[ \Big( \int_0^T |\varphi_t|^2\, dt \Big)^{p/2} \Big] \leq T^{p/2}\, E\Big[ \sup_{t \in [0,T]} |\varphi_t|^p \Big] < \infty.$

Consider the following decoupled FBSCS:
$dX^u_t = b(t, X^u_t, u_t)\, dt + \sigma(t, X^u_t, u_t)\, dW_t,$
$dY^u_t = -f(t, X^u_t, Y^u_t, Z^u_t, u_t)\, dt + (Z^u_t)^\top dW_t,$
$X^u_0 = x_0, \quad Y^u_T = \Phi(X^u_T),$  (2.3)
with the cost functional
$J(u(\cdot)) := Y^u_0$  (2.4)
for a given $x_0 \in \mathbb{R}^n$ and measurable functions $b: [0,T] \times \mathbb{R}^n \times U \to \mathbb{R}^n$, $\sigma: [0,T] \times \mathbb{R}^n \times U \to \mathbb{R}^{n \times d}$, $f: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times U \to \mathbb{R}$ and $\Phi: \mathbb{R}^n \to \mathbb{R}$, where the control domain $U$ is a nonempty subset of $\mathbb{R}^k$. We want to find an optimal control minimizing (2.4) over the admissible control set.

For deterministic control systems, it has been shown that the basic MSA may diverge when a bad initial value of the control is chosen (see [2]) or the feasibility errors blow up (see [23]). Kerimkulov et al. [16] proposed a modified MSA for classical stochastic control systems to ensure convergence. Going a step further, we aim to establish a modified MSA for stochastic recursive optimal control problems and to obtain the related convergence result.

Before giving the modified MSA algorithm for (2.3), we first introduce the global SMP for the stochastic control system (2.3). Set
$b(\cdot) = (b^1(\cdot), b^2(\cdot), \dots, b^n(\cdot))^\top \in \mathbb{R}^n, \quad \sigma(\cdot) = (\sigma_1(\cdot), \sigma_2(\cdot), \dots, \sigma_d(\cdot)) \in \mathbb{R}^{n \times d},$
$\sigma_i(\cdot) = (\sigma^1_i(\cdot), \sigma^2_i(\cdot), \dots, \sigma^n_i(\cdot))^\top \in \mathbb{R}^n \quad \text{for } i = 1, 2, \dots, d.$
An admissible control $u(\cdot)$ is an $\mathbb{F}$-adapted process with values in $U$ such that
$\sup_{t \in [0,T]} E\big[ |u_t|^2 \big] < \infty.$  (2.5)
Denote by $\mathcal{U}[0,T]$ the set of all admissible controls. We assume that there exists at least one optimal control minimizing (2.4), and impose the following assumptions on the coefficients of (2.3).

Assumption 2.6.
Let $L_i$, $i = 1, 2$, be given positive constants.
(i) $b$ and $\sigma$ are twice continuously differentiable with respect to $x$; $b$, $\sigma$, $b_x$, $\sigma_x$, $b_{xx}$, $\sigma_{xx}$ are continuous in $(x, u)$; $b_x$, $\sigma_x$, $b_{xx}$, $\sigma_{xx}$ are bounded; and $b$, $\sigma$ are bounded by $L_1(1 + |x| + |u|)$.
(ii) $\Phi$ is twice continuously differentiable with respect to $x$; $\Phi$, $\Phi_x$, $\Phi_{xx}$ are continuous in $x$; $\Phi_x$, $\Phi_{xx}$ are bounded; and $\Phi$ is bounded by $L_2(1 + |x|)$.
(iii) $f$ is twice continuously differentiable with respect to $(x, y, z)$; $f$, together with its gradient $Df$ and Hessian matrix $D^2 f$ with respect to $(x, y, z)$, is continuous in $(x, y, z, u)$; $Df$, $D^2 f$ are bounded; and $f$ is bounded by $L_2(1 + |x| + |y| + |z| + |u|)$.

Let us fix $u(\cdot) \in \mathcal{U}[0,T]$ arbitrarily. Under Assumption 2.6, thanks to [28] (Chapter V, Theorem 6) and Theorem 5.1 in [7], (2.3) admits a unique solution $(X^u, Y^u, Z^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times S^2_{\mathbb{F}}([0,T]; \mathbb{R}) \times M^2(\mathbb{R}^{n \times d})$. We call $(X^u, Y^u, Z^u)$ the state trajectory corresponding to $u(\cdot)$. In particular, let $\bar{u}(\cdot)$ be an optimal control, $(\bar{X}, \bar{Y}, \bar{Z})$ the corresponding state trajectory of (2.3), and $(\bar{p}, \bar{q})$, $(\bar{P}, \bar{Q})$ the corresponding unique solutions to the first-order adjoint equation (2.7) and the second-order adjoint equation (2.8) below, respectively.

The (stochastic) Hamiltonian $H: [0,T] \times \Omega \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \to \mathbb{R}$ is defined as follows:
$H(t, x, y, z, p, q, P, u) = p^\top b(t, x, u) + \sum_{i=1}^d (q^i)^\top \sigma_i(t, x, u) + f(t, x, y, z + \Delta(t, x, u), u) + \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, x, u) - \sigma_i(t, \bar{X}_t, \bar{u}_t) \big)^\top P \big( \sigma_i(t, x, u) - \sigma_i(t, \bar{X}_t, \bar{u}_t) \big),$
where $q^i$ is the $i$th column of $q$ for $i \in \{1, 2, \dots, d\}$, and
$\Delta(t, x, u) = \big( (\sigma_1(t, x, u) - \sigma_1(t, \bar{X}_t, \bar{u}_t))^\top p, \dots, (\sigma_d(t, x, u) - \sigma_d(t, \bar{X}_t, \bar{u}_t))^\top p \big)^\top.$
Then, the following maximum principle ([11], Theorem 3) holds:
Theorem 2.7.
Let Assumption 2.6 hold. Then, for all $u \in U$,
$H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t) \leq H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u), \quad P\text{-a.s., a.e. } t \in [0,T].$  (2.6)

Secondly, it follows from the pioneering works mentioned before that a key step in rigorously controlling the divergent behavior of the modified MSA is to obtain the error estimate, by estimating the difference between the two cost functionals $J(u(\cdot))$ and $J(v(\cdot))$ corresponding to different admissible controls $u(\cdot)$ and $v(\cdot)$. To do this, we need the following notations, and introduce the generalized Hamiltonian together with its augmented form.

Define the function $G: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times U \times U \to \mathbb{R}$ by
$G(t, x, y, z, p, q, v, u) = p^\top b(t, x, v) + \sum_{i=1}^d (q^i)^\top \sigma_i(t, x, v) + f(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
where $q = (q^1, q^2, \dots, q^d)$ and
$\tilde{\Delta}(t, x, p, v, u) := \big( (\sigma_1(t, x, v) - \sigma_1(t, x, u))^\top p, \dots, (\sigma_d(t, x, v) - \sigma_d(t, x, u))^\top p \big)^\top.$

Let $u(\cdot), v(\cdot) \in \mathcal{U}[0,T]$. For simplicity, for $\psi = b, \sigma, f, \Phi$ and $w = x, y, z$, set
$\Theta^u_t = (X^u_t, Y^u_t, Z^u_t), \quad \Theta^{u,v}_t = \big( X^u_t, Y^u_t, Z^u_t + \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big),$
$\psi^u(t) = \psi(t, \Theta^u_t, u_t), \quad \psi^{u,v}(t) = \psi(t, \Theta^{u,v}_t, v_t),$
$\psi^u_w(t) = \psi_w(t, \Theta^u_t, u_t), \quad \psi^{u,v}_w(t) = \psi_w(t, \Theta^{u,v}_t, v_t),$
$\psi^u_{ww}(t) = \psi_{ww}(t, \Theta^u_t, u_t), \quad \psi^{u,v}_{ww}(t) = \psi_{ww}(t, \Theta^{u,v}_t, v_t),$
$D^2 f^u(t) = D^2 f(t, \Theta^u_t, u_t)$
for all $t \in [0,T]$. In our context, the first-order (resp. second-order) adjoint equation in [11] for (2.3) is (2.7) (resp. (2.8)) below.
$p^u_t = \Phi_x(X^u_T) + \int_t^T \big\{ G_x(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) + G_y(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s)\, p^u_s + \Upsilon(s, X^u_s, p^u_s, q^u_s, u_s)\, G_z(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) \big\}\, ds - \sum_{i=1}^d \int_t^T (q^u_s)^i\, dW^i_s,$  (2.7)

$P^u_t = \Phi_{xx}(X^u_T) + \int_t^T \Big\{ f^u_y(s) P^u_s + (b^u_x(s))^\top P^u_s + (P^u_s)^\top b^u_x(s) + \sum_{i=1}^d f^u_{z^i}(s) \big[ ((\sigma^u_x(s))^i)^\top P^u_s + (P^u_s)^\top (\sigma^u_x(s))^i \big] + \sum_{i=1}^d ((\sigma^u_x(s))^i)^\top P^u_s (\sigma^u_x(s))^i + \sum_{i=1}^d f^u_{z^i}(s) (Q^u_s)^i + \sum_{i=1}^d \big[ ((\sigma^u_x(s))^i)^\top (Q^u_s)^i + ((Q^u_s)^i)^\top (\sigma^u_x(s))^i \big] + \Psi^u_s \Big\}\, ds - \sum_{i=1}^d \int_t^T (Q^u_s)^i\, dW^i_s,$  (2.8)

where
$\Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) := \big( (\sigma^1_x(t, X^u_t, u_t))^\top p^u_t + (q^u_t)^1, \dots, (\sigma^d_x(t, X^u_t, u_t))^\top p^u_t + (q^u_t)^d \big)$
and
$\Psi^u_t := \sum_{j=1}^n (b^u_{xx}(t))^j (p^u_t)^j + \sum_{i=1}^d \sum_{j=1}^n (\sigma^u_{xx}(t))^{ji} \big( f^u_{z^i}(t)(p^u_t)^j + (q^u_t)^{ji} \big) + \big( I_n, p^u_t, \Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) \big)\, D^2 f^u(t)\, \big( I_n, p^u_t, \Upsilon(t, X^u_t, p^u_t, q^u_t, u_t) \big)^\top.$  (2.9)

Define the generalized Hamiltonian $H: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \times U \to \mathbb{R}$ by
$H(t, x, y, z, p, q, P, v, u) = G(t, x, y, z, p, q, v, u) + \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, x, v) - \sigma_i(t, x, u) \big)^\top P \big( \sigma_i(t, x, v) - \sigma_i(t, x, u) \big).$  (2.10)

Then, for all $u \in U$, the maximum principle in Theorem 2.7 can be rewritten as
$H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t, \bar{u}_t) \leq H(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u, \bar{u}_t), \quad P\text{-a.s., a.e. } t \in [0,T].$
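As a small sanity check on (2.10), the following sketch evaluates the generalized Hamiltonian in the scalar case $n = d = 1$ with toy coefficients (all functions below are illustrative assumptions, not the paper's model) and confirms that the second-order correction vanishes when $v = u$, so that $H(\cdot, u, u) = G(\cdot, u, u)$.

```python
import numpy as np

# Toy evaluation of G and the generalized Hamiltonian (2.10), n = d = 1.
def G(t, x, y, z, p, q, v, u, b, sigma, f):
    delta = (sigma(t, x, v) - sigma(t, x, u)) * p   # ~Delta(t, x, p, v, u)
    return p * b(t, x, v) + q * sigma(t, x, v) + f(t, x, y, z + delta, v)

def H(t, x, y, z, p, q, P, v, u, b, sigma, f):
    ds = sigma(t, x, v) - sigma(t, x, u)
    return G(t, x, y, z, p, q, v, u, b, sigma, f) + 0.5 * ds * P * ds

# Illustrative coefficients (assumptions, chosen to be smooth and Lipschitz):
b = lambda t, x, u: -x + u
sigma = lambda t, x, u: 0.2 * x + u              # state-dependent diffusion allowed
f = lambda t, x, y, z, u: 0.5 * (x**2 + u**2) + 0.1 * y + 0.3 * z

args = dict(t=0.0, x=1.0, y=0.5, z=0.2, p=0.4, q=0.1, b=b, sigma=sigma, f=f)
# At v = u the quadratic correction vanishes, so H(.., u, u) = G(.., u, u):
assert H(P=2.0, v=0.7, u=0.7, **args) == G(v=0.7, u=0.7, **args)
print(H(P=2.0, v=0.3, u=0.7, **args))
```

For $v \neq u$ the correction term contributes $\frac{1}{2}(\sigma(t,x,v) - \sigma(t,x,u))^2 P$, which is exactly the second-order price of a non-convex control domain in the global SMP.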
Now we define the augmented form $\tilde{H}: [0,T] \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathbb{S}^{n \times n} \times U \times U \to \mathbb{R}$, for some $\rho \geq 0$, by
$\tilde{H}(t, x, y, z, p, q, P, v, u) = H(t, x, y, z, p, q, P, v, u) + \rho \Big( \sum_{\psi \in \{b, \sigma, f\}} |\psi(t, x, y, z, v) - \psi(t, x, y, z, u)|^2 + \sum_{w \in \{x, y, z\}} |G_w(t, x, y, z, p, q, v, u) - G_w(t, x, y, z, p, q, u, u)|^2 \Big).$  (2.11)

Note that when $\rho = 0$ we recover exactly the generalized Hamiltonian (2.10). Moreover, the maximum principle also holds for $\tilde{H}$, which is the basis for constructing the iterative algorithm.

Lemma 2.8 (Extended SMP). Let $\bar{u}(\cdot)$ be an optimal control, $(\bar{X}, \bar{Y}, \bar{Z})$ the corresponding state trajectory of (2.3), and $(\bar{p}, \bar{q})$, $(\bar{P}, \bar{Q})$ the corresponding unique solutions to (2.7) and (2.8), respectively. Then we have
$\tilde{H}(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, \bar{u}_t, \bar{u}_t) = \min_{u \in U} \tilde{H}(t, \bar{X}_t, \bar{Y}_t, \bar{Z}_t, \bar{p}_t, \bar{q}_t, \bar{P}_t, u, \bar{u}_t), \quad P\text{-a.s., a.e. } t \in [0,T].$  (2.12)

The proof of the extended SMP is a direct application of Theorem 2.7 and (2.11), so we omit it. We call $(\bar{X}, \bar{Y}, \bar{Z}, \bar{p}, \bar{q}, \bar{P}, \bar{u}(\cdot))$ a solution to (2.12).

It should be emphasized that not every solution to (2.12) is globally optimal for (2.3), since the usual SMP (2.6) is only a necessary condition for optimality and (2.12) is weaker than (2.6). However, (2.6) is often strong enough to give good solution candidates in practice, and so is (2.12). In some special cases (see Example 1 in [11]), (2.6) even becomes a sufficient condition without any convexity assumptions.

To end this section, we introduce the modified MSA in the following algorithm.

Algorithm 1
Modified Method of Successive Approximations for Decoupled FBSCSs

1: Initialisation: make a guess of the control $u^0 = (u^0_t)_{t \in [0,T]}$.
2: while the difference between $J(u^m(\cdot))$ and $J(u^{m-1}(\cdot))$ is large do
3: Given the control $u^{m-1} = (u^{m-1}_t)_{t \in [0,T]}$, solve the FBSDE (2.3) to obtain the state trajectory $(X^m, Y^m, Z^m)$.
4: Solve the BSDEs (2.7) and (2.8) with the control $u^{m-1}(\cdot)$ and the state trajectory $(X^m, Y^m, Z^m)$ to obtain the first- and second-order adjoint variables $(p^m, q^m)$ and $(P^m, Q^m)$.
5: Update the control:
$u^m_t \in \arg\min_{u \in U} \tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u, u^{m-1}_t)$ for $t \in [0,T]$.  (2.13)
6: end while
7: return $u^m$.

3 Main Results

In this section, the universal constant $C$ may depend only on $n$, $d$, $T$, $\|b_x\|_\infty$, $\|\sigma_x\|_\infty$, $\|\Phi_x\|_\infty$, $\|Df\|_\infty$, $\|D^2 f\|_\infty$ and may change from line to line in our proofs.

Before stating the main results of the paper, we first give some properties of the solutions to the adjoint equations (2.7) and (2.8), which are necessary to prove the convergence of Algorithm 1. Under Assumption 2.6, the following lemma shows that the first (resp. second) component $p^u$ (resp. $q^u$) of the solution to the first-order adjoint equation (2.7) belongs to the space $L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ (resp. $K(\mathbb{R}^{n \times d})$).

Lemma 3.1.
Let Assumption 2.6 hold. Then, for any $u(\cdot) \in \mathcal{U}[0,T]$, equation (2.7) admits a unique solution $(p^u, q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times M^2(\mathbb{R}^{n \times d})$, where $q^u = ((q^u)^1, (q^u)^2, \dots, (q^u)^d)$. Moreover, $p^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K(\mathbb{R}^{n \times d})$.

Proof. First, equation (2.7) can be rewritten in the following form:
$p^u_t = \Phi_x(X^u_T) + \int_t^T \Big[ (A^u(s))^\top p^u_s + \sum_{i=1}^d ((B^u(s))^i)^\top (q^u_s)^i + f^u_x(s) \Big]\, ds - \sum_{i=1}^d \int_t^T (q^u_s)^i\, dW^i_s,$
where
$A^u(t) := \sum_{i=1}^d f^u_{z^i}(t)\, (\sigma^u_x(t))^i + f^u_y(t)\, I_n + b^u_x(t); \quad (B^u(t))^i := f^u_{z^i}(t)\, I_n + (\sigma^u_x(t))^i, \quad i = 1, 2, \dots, d.$
Since $b_x$, $\sigma_x$, $\Phi_x$, $f_x$, $f_y$ and $f_z$ are bounded, it is easily verified that both $A^u(\cdot)$ and $B^u(\cdot)$ are uniformly bounded. Moreover, by Theorem 5.1 in [7], there exists a unique solution $(p^u, q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{R}^n) \times M^2(\mathbb{R}^{n \times d})$ to BSDE (2.7). Furthermore, $p^u$ can be expressed explicitly as
$p^u_t = E\Big[ (\Lambda^u_t)^\top (\Gamma^u_T)^\top \Phi_x(X^u_T) + \int_t^T (\Lambda^u_t)^\top (\Gamma^u_s)^\top f^u_x(s)\, ds \,\Big|\, \mathcal{F}_t \Big] \quad \text{for } t \in [0,T],$  (3.1)
where $\Gamma^u$ and $\Lambda^u$ satisfy the following matrix-valued SDEs, respectively:
$\Gamma^u_t = I_n + \int_0^t A^u(s) \Gamma^u_s\, ds + \sum_{i=1}^d \int_0^t (B^u(s))^i \Gamma^u_s\, dW^i_s \quad \text{for } t \in [0,T]$
and
$\Lambda^u_t = I_n + \int_0^t \Lambda^u_s \Big[ -A^u(s) + \sum_{i=1}^d \big( (B^u(s))^i \big)^2 \Big]\, ds - \sum_{i=1}^d \int_0^t \Lambda^u_s (B^u(s))^i\, dW^i_s \quad \text{for } t \in [0,T].$
By Itô's formula, it can be verified that $\Lambda^u_t = (\Gamma^u_t)^{-1}$ $P$-almost surely for all $t \in [0,T]$. For each fixed $t \in [0,T]$, set $(\Gamma^t_s)^u = \Gamma^u_s \Lambda^u_t$ for $s \in [t,T]$. Then it is easy to check that $(\Gamma^t)^u$ satisfies the following SDE:
$(\Gamma^t_s)^u = I_n + \int_t^s A^u(r) (\Gamma^t_r)^u\, dr + \sum_{i=1}^d \int_t^s (B^u(r))^i (\Gamma^t_r)^u\, dW^i_r, \quad s \in [t,T].$  (3.2)
By a standard SDE estimate, we obtain $E\big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u|^\beta \mid \mathcal{F}_t \big] \leq C$ for all $t \in [0,T]$ and any $\beta \geq 1$. It then follows immediately that
$|p^u_t| \leq E\Big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u| \Big( |\Phi_x(X^u_T)| + \int_t^T |f^u_x(s)|\, ds \Big) \,\Big|\, \mathcal{F}_t \Big] \leq (\|\Phi_x\|_\infty + \|f_x\|_\infty T)\, E\Big[ \sup_{s \in [t,T]} |(\Gamma^t_s)^u| \,\Big|\, \mathcal{F}_t \Big] \leq C,$
so $\|p^u\|_\infty < \infty$. Moreover, since $p^u$ is bounded, by Itô's formula one can obtain
$\int_t^T |q^u_s|^2\, ds \leq |\Phi_x(X^u_T)|^2 + 2 \int_t^T |\langle p^u_s, (A^u(s))^\top p^u_s \rangle|\, ds + 2 \int_t^T |\langle p^u_s, f^u_x(s) \rangle|\, ds + 2 \sum_{i=1}^d \int_t^T \big| \big\langle p^u_s, ((B^u(s))^i)^\top (q^u_s)^i \big\rangle \big|\, ds - 2 \sum_{i=1}^d \int_t^T \langle p^u_s, (q^u_s)^i \rangle\, dW^i_s$
$\leq \|\Phi_x\|^2_\infty + 2T \big( \|A^u(\cdot)\|_\infty \|p^u\|^2_\infty + \|f_x\|_\infty \|p^u\|_\infty \big) + 2 \|p^u\|_\infty \sum_{i=1}^d \|(B^u(\cdot))^i\|_\infty \int_t^T |(q^u_s)^i|\, ds - 2 \sum_{i=1}^d \int_t^T \langle p^u_s, (q^u_s)^i \rangle\, dW^i_s.$  (3.3)
Consequently, by taking conditional expectations on both sides of (3.3) and applying Young's inequality, we have $E[\int_t^T |q^u_s|^2\, ds \mid \mathcal{F}_t] \leq C$ for all $t \in [0,T]$, which implies that $q^u \in K(\mathbb{R}^{n \times d})$.

Remark 3.2.
An inspection of the above proof shows that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} ( \|p^u\|_\infty + \|q^u\|_K ) \leq C$, i.e., $C$ is independent of $u(\cdot)$.

Now we give the corresponding property of the solution $(P^u, Q^u)$ to the second-order adjoint equation.

Lemma 3.3.
Let Assumption 2.6 hold. Then, for any $u(\cdot) \in \mathcal{U}[0,T]$, equation (2.8) admits a unique solution $(P^u, Q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n}) \times (M^2(\mathbb{S}^{n \times n}))^d$, where $Q^u = ((Q^u)^1, (Q^u)^2, \dots, (Q^u)^d)$. Moreover, $P^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n})$.

Proof. By Theorem 5.1 in [7], together with the boundedness of $b_x$, $\sigma_x$, $\Phi_x$, $f_x$, $b_{xx}$, $\sigma_{xx}$, $\Phi_{xx}$ and $f_{xx}$, there exists a unique solution $(P^u, Q^u) \in S^2_{\mathbb{F}}([0,T]; \mathbb{S}^{n \times n}) \times (M^2(\mathbb{R}^{n \times n}))^d$ to BSDE (2.8). Furthermore, denote by $(P^u)^j$ (resp. $(\Psi^u)^j$, $(Q^u)^{ji}$) the $j$th column of $P^u$ (resp. $\Psi^u$, $(Q^u)^i$) for $i = 1, \dots, d$, $j = 1, \dots, n$, and by $I^{ij}_n \in \mathbb{R}^{n \times n}$ the matrix whose entries all equal $0$ except that the entry in the $i$th row and $j$th column equals $1$. Set
$\tilde{P}^u_t := \big( ((P^u_t)^1)^\top, ((P^u_t)^2)^\top, \dots, ((P^u_t)^n)^\top \big)^\top \in \mathbb{R}^{n^2}, \quad (\tilde{Q}^u_t)^i := \big( ((Q^u_t)^{1i})^\top, ((Q^u_t)^{2i})^\top, \dots, ((Q^u_t)^{ni})^\top \big)^\top \in \mathbb{R}^{n^2},$
and define the $n^2 \times n^2$ block matrices
$\mathbf{A}^u(t) := \mathrm{diag}\big( (b^u_x(t))^\top, \dots, (b^u_x(t))^\top \big), \quad (\mathbf{B}^u(t))^i := \mathrm{diag}\big( ((\sigma^u_x(t))^i)^\top, \dots, ((\sigma^u_x(t))^i)^\top \big),$
$I^*_{n^2} := \begin{pmatrix} I^{11}_n & \cdots & I^{1n}_n \\ \vdots & \ddots & \vdots \\ I^{n1}_n & \cdots & I^{nn}_n \end{pmatrix}, \quad (\mathbf{D}^u(t))^i := \begin{pmatrix} (\sigma^u_{x_1}(t))^{1i} I_n & \cdots & (\sigma^u_{x_1}(t))^{ni} I_n \\ \vdots & \ddots & \vdots \\ (\sigma^u_{x_n}(t))^{1i} I_n & \cdots & (\sigma^u_{x_n}(t))^{ni} I_n \end{pmatrix}.$
Then $\tilde{P}^u$ satisfies
$\tilde{P}^u_t = \tilde{P}^u_T + \int_t^T \Big\{ \Big[ f^u_y(s) I_{n^2} + \sum_{i=1}^d f^u_{z^i}(s) (I_{n^2} + I^*_{n^2})(\mathbf{B}^u(s))^i + (I_{n^2} + I^*_{n^2}) \mathbf{A}^u(s) + \sum_{i=1}^d (\mathbf{D}^u(s))^i (\mathbf{B}^u(s))^i \Big] \tilde{P}^u_s + \sum_{i=1}^d \big[ f^u_{z^i}(s) I_{n^2} + (I_{n^2} + I^*_{n^2})(\mathbf{B}^u(s))^i \big] (\tilde{Q}^u_s)^i + \tilde{\Psi}^u_s \Big\}\, ds - \sum_{i=1}^d \int_t^T (\tilde{Q}^u_s)^i\, dW^i_s,$  (3.4)
where $I_{n^2}$ is the $n^2 \times n^2$ identity matrix and $\tilde{\Psi}^u_t := \big( ((\Psi^u_t)^1)^\top, ((\Psi^u_t)^2)^\top, \dots, ((\Psi^u_t)^n)^\top \big)^\top$. Obviously, (3.4) is a linear BSDE with bounded coefficients. Since $p^u \in L^\infty_{\mathbb{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K(\mathbb{R}^{n \times d})$, for all $t \in [0,T]$ it can be verified that
$E\Big[ \int_t^T |\tilde{\Psi}^u_s|\, ds \,\Big|\, \mathcal{F}_t \Big] = E\Big[ \int_t^T |\Psi^u_s|\, ds \,\Big|\, \mathcal{F}_t \Big] \leq C.$  (3.5)
Indeed, for instance, it follows immediately from the energy inequality that
$E\Big[ \int_t^T \sum_{j,k=1}^d \big| f^u_{z^j z^k}(s)\, (q^u_s)^j ((q^u_s)^k)^\top \big|\, ds \,\Big|\, \mathcal{F}_t \Big] \leq d\, \|f_{zz}\|_\infty\, E\Big[ \int_t^T |q^u_s|^2\, ds \,\Big|\, \mathcal{F}_t \Big] \leq d\, \|f_{zz}\|_\infty\, \|q^u\|_K.$
The other terms in (2.9) can be estimated similarly, so (3.5) holds. Therefore, similarly to the proof of Lemma 3.1, applying Itô's formula together with the estimate (3.5) yields $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|\tilde{P}^u\|_\infty < \infty$, which implies that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|P^u\|_\infty < \infty$.

In order to prove the convergence of Algorithm 1, we need the following lemma on the error estimate. It will be seen that if we directly minimize $H$ instead of $\tilde{H}$ (Step 5 in Algorithm 1), then the updated control variable may fail to be a descent direction for the cost functional when it incurs too much error in solving the state and adjoint dynamics (Steps 3 and 4 in Algorithm 1) in the next iteration.

Lemma 3.4.
Suppose that Assumption 2.6 holds. For any $u(\cdot) \in \mathcal{U}[0,T]$, let
$v_t \in \arg\min_{v \in U} \tilde{H}(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, v, u_t).$  (3.6)
Define a new probability $\tilde{P}$ and a Brownian motion $\tilde{W}$ with respect to $\tilde{P}$ by
$d\tilde{P} := \mathcal{E}\Big( \sum_{i=1}^d \int_0^\cdot f^{u,v}_{z^i}(t)\, dW^i_t \Big)_T\, dP; \quad \tilde{W}^i_t := W^i_t - \int_0^t f^{u,v}_{z^i}(s)\, ds, \quad i = 1, 2, \dots, d,$  (3.7)
and denote by $\tilde{E}[\cdot]$ the mathematical expectation corresponding to $\tilde{P}$. Then there exists a universal constant $C > 0$ such that
$J(v(\cdot)) - J(u(\cdot)) \leq \exp\{-\|f_y\|_\infty T\}\, \tilde{E}\Big[ \int_0^T \hat{H}(t)\, dt \Big] + C \Big( \sum_{\psi \in \{b,\sigma,f\}} \tilde{E}\Big[ \int_0^T |\psi(t, X^u_t, Y^u_t, Z^u_t, v_t) - \psi(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt \Big] + \sum_{w \in \{x,y,z\}} \tilde{E}\Big[ \int_0^T |G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt \Big] \Big),$  (3.8)
where $\hat{H}(t) := H(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, v_t, u_t) - H(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, P^u_t, u_t, u_t)$ for $t \in [0,T]$.

Remark 3.5.
In a way, one can observe that the last two terms in (3.8) determine whether the Hamiltonian minimization step makes $J(\cdot)$ descend. In other words, they measure the degree to which the state equation (2.3) and the adjoint equations (2.7) and (2.8) are satisfied. We refer to them as the feasibility errors (see [23]).

Proof. Let $u(\cdot) \in \mathcal{U}[0,T]$ and let $v(\cdot)$ be defined as in (3.6). Set
$\eta(t) = Y^v_t - Y^u_t - (p^u_t)^\top (X^v_t - X^u_t);$
$\zeta^i(t) = (Z^v_t)^i - (Z^u_t)^i - \tilde{\Delta}^i(t, X^u_t, p^u_t, v_t, u_t) - (\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t))^\top (X^v_t - X^u_t) - R^i(t) - (p^u_t)^\top [\sigma_{ix}(t, X^u_t, v_t) - \sigma_{ix}(t, X^u_t, u_t)](X^v_t - X^u_t)$ for $i = 1, 2, \dots, d$,  (3.9)
and
$\tilde{D}^2 f^v(t) = 2 \int_0^1 \int_0^1 \lambda\, D^2 f(t, \Theta^{u,v}_t + \lambda\mu(\Theta^v_t - \Theta^{u,v}_t), v_t)\, d\mu\, d\lambda,$
where
$R^i(t) := \sum_{j=1}^n (p^u_t)^j \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \sigma^{ji}_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu$
for $i = 1, 2, \dots, d$. We first note that
$G_x(t, x, y, z, p, q, v, u) = b^\top_x(t, x, v) p + \sum_{i=1}^d (\sigma_{ix}(t, x, v))^\top q^i + f_x(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v) + \tilde{\Delta}^\top_x(t, x, p, v, u)\, f_z(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
$G_y(t, x, y, z, p, q, v, u) = f_y(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v),$
$G_z(t, x, y, z, p, q, v, u) = f_z(t, x, y, z + \tilde{\Delta}(t, x, p, v, u), v).$
Applying Itô's formula to $\eta(t)$, we have
$\eta(t) = \eta(T) + \int_t^T \Big\{ f^u_y(s)\eta(s) + \sum_{i=1}^d f^{u,v}_{z^i}(s)\zeta^i(s) + [f^{u,v}_y(s) - f^u_y(s)](Y^v_s - Y^u_s) + G(s, \Theta^u_s, p^u_s, q^u_s, v_s, u_s) - G(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) + \big( G_x(s, \Theta^u_s, p^u_s, q^u_s, v_s, u_s) - G_x(s, \Theta^u_s, p^u_s, q^u_s, u_s, u_s) \big)^\top (X^v_s - X^u_s) + \sum_{i=1}^d [f^{u,v}_{z^i}(s) - f^u_{z^i}(s)] (\Upsilon^i(s, X^u_s, p^u_s, q^u_s, u_s))^\top (X^v_s - X^u_s) + \sum_{i=1}^d f^{u,v}_{z^i}(s) R^i(s) + R_1(s) + R_2(s) + R_3(s) \Big\}\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, dW^i_s,$  (3.10)
where
$\eta(T) = \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \Phi_{xx}(X^u_T + \lambda\mu(X^v_T - X^u_T))(X^v_T - X^u_T)(X^v_T - X^u_T)^\top \big\}\, d\lambda\, d\mu,$
$R_1(t) = \sum_{j=1}^n (p^u_t)^j \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ b^j_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu,$
$R_2(t) = \sum_{i=1}^d \sum_{j=1}^n (q^u_t)^{ji} \int_0^1 \int_0^1 \lambda\, \mathrm{tr}\big\{ \sigma^{ji}_{xx}(t, X^u_t + \lambda\mu(X^v_t - X^u_t), v_t)(X^v_t - X^u_t)(X^v_t - X^u_t)^\top \big\}\, d\lambda\, d\mu$
and
$R_3(t) = \frac{1}{2} \big[ (X^v_t - X^u_t)^\top,\; Y^v_t - Y^u_t,\; \big( Z^v_t - Z^u_t - \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big)^\top \big]\, \tilde{D}^2 f^v(t)\, \big[ (X^v_t - X^u_t)^\top,\; Y^v_t - Y^u_t,\; \big( Z^v_t - Z^u_t - \tilde{\Delta}(t, X^u_t, p^u_t, v_t, u_t) \big)^\top \big]^\top.$
By the definition of the Hamiltonian $H$, (3.10) can be rewritten as
$\eta(t) = \eta(T) + \int_t^T \Big\{ f^u_y(s)\eta(s) + \sum_{i=1}^d f^{u,v}_{z^i}(s)\zeta^i(s) + \hat{H}(s) + \phi_s \Big\}\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, dW^i_s,$  (3.11)
where
$\phi_t := \big( G_x(t, \Theta^u_t, p^u_t, q^u_t, v_t, u_t) - G_x(t, \Theta^u_t, p^u_t, q^u_t, u_t, u_t) \big)^\top (X^v_t - X^u_t) + [f^{u,v}_y(t) - f^u_y(t)](Y^v_t - Y^u_t) + \sum_{i=1}^d [f^{u,v}_{z^i}(t) - f^u_{z^i}(t)] (\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t))^\top (X^v_t - X^u_t) + \sum_{i=1}^d f^{u,v}_{z^i}(t) R^i(t) + R_1(t) + R_2(t) + R_3(t) - \frac{1}{2} \sum_{i=1}^d \big( \sigma_i(t, X^u_t, v_t) - \sigma_i(t, X^u_t, u_t) \big)^\top P^u_t \big( \sigma_i(t, X^u_t, v_t) - \sigma_i(t, X^u_t, u_t) \big).$
By definition (3.7), (3.11) can be further rewritten as
\[
\eta(t) = \eta(T) + \int_t^T \bigl[f^u_y(s)\eta(s) + \hat{H}(s) + \phi_s\bigr]\, ds - \sum_{i=1}^d \int_t^T \zeta^i(s)\, d\tilde{W}^i_s.
\tag{3.12}
\]
By applying Itô's formula to $\exp\bigl\{\int_0^t f^u_y(s)\, ds\bigr\}\eta(t)$ on $[0,T]$, we get
\[
\begin{aligned}
\eta(t) ={}& \exp\Bigl\{\int_t^T f^u_y(s)\, ds\Bigr\}\eta(T) + \int_t^T \exp\Bigl\{\int_t^s f^u_y(r)\, dr\Bigr\}\bigl(\hat{H}(s) + \phi_s\bigr)\, ds\\
&- \sum_{i=1}^d \int_t^T \exp\Bigl\{\int_t^s f^u_y(r)\, dr\Bigr\}\zeta^i(s)\, d\tilde{W}^i_s.
\end{aligned}
\tag{3.13}
\]
It can be verified that $\tilde{\mathbb{E}}\bigl[\bigl(\int_0^T |\zeta(t)|^2\, dt\bigr)^{1/2}\bigr] < \infty$; hence the stochastic integral in (3.13) is a true martingale under $\tilde{P}$. Then, by taking the mathematical expectation $\tilde{\mathbb{E}}[\cdot]$ on both sides of (3.13), we have
\[
\begin{aligned}
J(v(\cdot)) - J(u(\cdot)) = \eta(0) &= \tilde{\mathbb{E}}\Bigl[\exp\Bigl\{\int_0^T f^u_y(t)\, dt\Bigr\}\eta(T) + \int_0^T \exp\Bigl\{\int_0^t f^u_y(s)\, ds\Bigr\}\bigl(\hat{H}(t) + \phi_t\bigr)\, dt\Bigr]\\
&\le \exp\bigl\{\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] + \tilde{\mathbb{E}}\Bigl[\int_0^T \exp\Bigl\{\int_0^t f^u_y(s)\, ds\Bigr\}\hat{H}(t)\, dt\Bigr].
\end{aligned}
\]
Due to the definition of $v(\cdot)$ in (3.6) and the definition of $\tilde{H}$ in (2.11), we get
\[
\begin{aligned}
&H(t, \Theta^u_t, p^u_t, q^u_t, P^u_t, v_t, u_t) + \rho\Bigl(\sum_{\psi \in \{b,\sigma,f\}} |\psi(t, \Theta^u_t, v_t) - \psi(t, \Theta^u_t, u_t)|^2\\
&\qquad + \sum_{w \in \{x,y,z\}} |G_w(t, \Theta^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, \Theta^u_t, p^u_t, q^u_t, u_t, u_t)|^2\Bigr) \le H(t, \Theta^u_t, p^u_t, q^u_t, P^u_t, u_t, u_t),
\end{aligned}
\tag{3.14}
\]
which implies that $\hat{H}(t) \le 0$ for a.e. $t \in [0,T]$, $P$-almost surely. Thus we obtain
\[
J(v(\cdot)) - J(u(\cdot)) \le \exp\bigl\{\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] + \exp\bigl\{-\|f_y\|_\infty T\bigr\}\, \tilde{\mathbb{E}}\Bigl[\int_0^T \hat{H}(t)\, dt\Bigr].
\tag{3.15}
\]
In order to obtain (3.8), we proceed to estimate $\tilde{\mathbb{E}}\bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\bigr]$ in the following five parts.
(i) Estimate of $\tilde{\mathbb{E}}\bigl[|\eta(T)| + \int_0^T \bigl(\sum_{i=1}^d |f^{u,v}_{z_i}(t) R^i(t)| + |R_1(t)|\bigr)\, dt\bigr]$.

Denote
\[
\tilde{b}^{u,v}_x(t) = \int_0^1 b_x\bigl(t, X^u_t + \lambda(X^v_t - X^u_t), v_t\bigr)\, d\lambda; \qquad \tilde{f}^{u,v}_x(t) = \int_0^1 f_x\bigl(t, \Theta^u_t + \lambda(\Theta^v_t - \Theta^u_t), v_t\bigr)\, d\lambda.
\]
$\tilde{\sigma}^{u,v}_x(t)$, $\tilde{f}^{u,v}_y(t)$ and $\tilde{f}^{u,v}_z(t)$ are defined similarly. Since
\[
\begin{aligned}
X^v_t - X^u_t ={}& \int_0^t \biggl\{\Bigl(\tilde{b}^{u,v}_x(s) + \sum_{i=1}^d f^{u,v}_{z_i}(s)\,(\tilde{\sigma}^{u,v}_x(s))^i\Bigr)(X^v_s - X^u_s) + b(s, X^u_s, v_s) - b(s, X^u_s, u_s)\\
&\qquad + \sum_{i=1}^d f^{u,v}_{z_i}(s)\bigl[\sigma^i(s, X^u_s, v_s) - \sigma^i(s, X^u_s, u_s)\bigr]\biggr\}\, ds\\
&+ \sum_{i=1}^d \int_0^t \Bigl\{(\tilde{\sigma}^{u,v}_x(s))^i (X^v_s - X^u_s) + \bigl[\sigma^i(s, X^u_s, v_s) - \sigma^i(s, X^u_s, u_s)\bigr]\Bigr\}\, d\tilde{W}^i_s,
\end{aligned}
\]
a standard SDE estimate yields
\[
\tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr] \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\tag{3.16}
\]
Therefore, by virtue of $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|p^u\|_\infty < \infty$ and the boundedness of $\Phi_{xx}$, $b_{xx}$, $\sigma_{xx}$, $f_z$, we get
\[
\tilde{\mathbb{E}}\Bigl[|\eta(T)| + \sum_{i=1}^d \int_0^T |f^{u,v}_{z_i}(t) R^i(t)|\, dt + \int_0^T |R_1(t)|\, dt\Bigr] \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\]

(ii) Estimate of $\tilde{\mathbb{E}}\bigl[\int_0^T |R_2(t)|\, dt\bigr]$.

Since $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|q^u\|_K < \infty$, for $i = 1, 2, \dots, d$ and $j = 1, 2, \dots, n$ we have $(q^u)^{ji} \cdot W^i \in \mathrm{BMO}$. Furthermore, by Lemma 2.3,
\[
c_1 \bigl\|(q^u)^{ji} \cdot W^i\bigr\|_{\mathrm{BMO}} \le \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|_{\mathrm{BMO}(\tilde{P})} \le c_2 \bigl\|(q^u)^{ji} \cdot W^i\bigr\|_{\mathrm{BMO}},
\tag{3.17}
\]
where $c_1$ and $c_2$ are two constants depending only on $\|f_z\|_\infty$ and $T$.
Then, it follows from Fefferman's inequality, estimate (3.16) and inequality (3.17) that
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T |R_2(t)|\, dt\Bigr] &\le \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(q^u_t)^{ji}\bigr|\, |X^v_t - X^u_t|^2\, dt\Bigr]\\
&\le \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \Bigl|\, d\bigl\langle (q^u)^{ji} \cdot \tilde{W}^i,\; |X^v - X^u|^2 \cdot \tilde{W}^i \bigr\rangle_t \Bigr|\Bigr]\\
&\le \sqrt{T}\, \|\sigma_{xx}\|_\infty \sum_{i=1}^d \sum_{j=1}^n \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|_{\mathrm{BMO}(\tilde{P})}\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr]\\
&\le \sqrt{T}\, n d c_2\, \|\sigma_{xx}\|_\infty\, \|q^u\|_K\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr]\\
&\le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]

(iii) Estimate of $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$.

In order to estimate $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$, we only need to estimate the following term:
\[
\tilde{\mathbb{E}}\Bigl[\int_0^T \Bigl(\int_0^1\!\!\int_0^1 \lambda\, \bigl|f_{z_i z_i}\bigl(t, \Theta^{u,v}_t + \lambda\mu(\Theta^v_t - \Theta^{u,v}_t), v_t\bigr)\bigr|\, d\mu\, d\lambda\Bigr) \bigl|(Z^v_t)^i - (Z^u_t)^i - \tilde\Delta^i(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]
\tag{3.18}
\]
for any $i \in \{1, 2, \dots, d\}$.
Since
\[
\begin{aligned}
Y^v_t - Y^u_t ={}& \Phi(X^v_T) - \Phi(X^u_T) + \int_t^T \Bigl\{\bigl(\tilde{f}^{u,v}_x(s)\bigr)^\top (X^v_s - X^u_s) + \tilde{f}^{u,v}_y(s)(Y^v_s - Y^u_s)\\
&+ \bigl(\tilde{f}^{u,v}_z(s) - f^{u,v}_z(s)\bigr)^\top (Z^v_s - Z^u_s) - \bigl(\tilde{f}^{u,v}_z(s)\bigr)^\top \tilde\Delta(s, X^u_s, p^u_s, v_s, u_s) + f^{u,v}(s) - f^u(s)\Bigr\}\, ds\\
&- \int_t^T (Z^v_s - Z^u_s)^\top\, d\tilde{W}_s
\end{aligned}
\]
and
\[
\begin{aligned}
|f^{u,v}(t) - f^u(t)| &= \bigl|f\bigl(t, X^u_t, Y^u_t, Z^u_t + \tilde\Delta(t, X^u_t, p^u_t, v_t, u_t), v_t\bigr) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)\bigr|\\
&\le \|f_z\|_\infty \|p^u\|_\infty\, |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)| + |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|,
\end{aligned}
\]
we get
\[
\begin{aligned}
&\tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |Y^v_t - Y^u_t|^2 + \int_0^T |Z^v_t - Z^u_t|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&\qquad\qquad + \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\tag{3.19}
\]
Thus, from estimate (3.19), and due to $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|p^u\|_\infty < \infty$ and $\|D^2 f\|_\infty < \infty$, (3.18) can be dominated by
\[
\begin{aligned}
&\|D^2 f\|_\infty\, \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(Z^v_t)^i - (Z^u_t)^i - \tilde\Delta^i(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |Z^v_t - Z^u_t|^2\, dt + \int_0^T \bigl|\tilde\Delta(t, X^u_t, p^u_t, v_t, u_t)\bigr|^2\, dt\Bigr]\\
&\quad \le C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&\qquad\qquad + \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]
The estimates for the other terms in $\tilde{\mathbb{E}}\bigl[\int_0^T |R_3(t)|\, dt\bigr]$ are similar to (3.18). Hence, we obtain
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T |R_3(t)|\, dt\Bigr] \le{}& C\, \tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\\
&+ \int_0^T |f(t, X^u_t, Y^u_t, Z^u_t, v_t) - f(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr].
\end{aligned}
\]
(iv) Estimate of $\tilde{\mathbb{E}}\bigl[\sum_{i=1}^d \int_0^T \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\, dt\bigr]$.

We first note that
\[
\begin{aligned}
&\sum_{i=1}^d \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\\
&\quad \le \sum_{i=1}^d \Bigl(\bigl|f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr|^2 + \bigl|\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\bigr|^2\Bigr)\\
&\quad \le |G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\\
&\qquad + 2d\, \|\sigma_x\|^2_\infty \|p^u\|^2_\infty \sup_{t \in [0,T]} |X^v_t - X^u_t|^2 + 2n \sum_{i=1}^d \sum_{j=1}^n \bigl|(q^u_t)^{ji}\bigr|^2\, \bigl|(X^v_t - X^u_t)_j\bigr|^2.
\end{aligned}
\tag{3.20}
\]
Since $X^v - X^u \in S^2_{\mathcal{F}}([0,T]; \mathbb{R}^n)$ and $q^u \in K^2(\mathbb{R}^{n \times d})$, by Proposition 2.4 and inequality (3.17) we can estimate the last term in the second inequality of (3.20) as follows:
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^T \sum_{i=1}^d \sum_{j=1}^n \bigl|(q^u_t)^{ji}\bigr|^2\, \bigl|(X^v_t - X^u_t)_j\bigr|^2\, dt\Bigr] &= \sum_{i=1}^d \sum_{j=1}^n \tilde{\mathbb{E}}\Bigl[\int_0^T \bigl|(X^v_t - X^u_t)_j\bigr|^2\, d\bigl\langle (q^u)^{ji} \cdot \tilde{W}^i \bigr\rangle_t\Bigr]\\
&\le \sum_{i=1}^d \sum_{j=1}^n \bigl\|(q^u)^{ji} \cdot \tilde{W}^i\bigr\|^2_{\mathrm{BMO}(\tilde{P})}\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} \bigl|(X^v_t - X^u_t)_j\bigr|^2\Bigr]\\
&\le n d\, c_2^2\, \|q^u\|^2_K\, \tilde{\mathbb{E}}\Bigl[\sup_{t \in [0,T]} |X^v_t - X^u_t|^2\Bigr].
\end{aligned}
\tag{3.21}
\]
Thus, combining estimates (3.16) and (3.21) with (3.20), we obtain
\[
\begin{aligned}
&\tilde{\mathbb{E}}\Bigl[\sum_{i=1}^d \int_0^T \bigl[f^{u,v}_{z_i}(t) - f^u_{z_i}(t)\bigr]\bigl(\Upsilon^i(t, X^u_t, p^u_t, q^u_t, u_t)\bigr)^\top (X^v_t - X^u_t)\, dt\Bigr]\\
&\quad \le C\Bigl\{\tilde{\mathbb{E}}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt + \int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr]\\
&\qquad + \tilde{\mathbb{E}}\Bigl[\int_0^T |G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_z(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\]
The estimates for the first and second terms in $\phi$ are similar to the above inequality.

(v) Estimate of $\tilde{\mathbb{E}}\bigl[\sum_{i=1}^d \int_0^T \bigl|\bigl(\sigma^i(t, X^u_t, v_t) - \sigma^i(t, X^u_t, u_t)\bigr)^\top P^u_t \bigl(\sigma^i(t, X^u_t, v_t) - \sigma^i(t, X^u_t, u_t)\bigr)\bigr|\, dt\bigr]$.

Here we only need the fact that $\sup_{u(\cdot) \in \mathcal{U}[0,T]} \|P^u\|_\infty < \infty$, which was proved in Lemma 3.3.

Consequently, we have
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[|\eta(T)| + \int_0^T |\phi_t|\, dt\Bigr] \le{}& C\Bigl\{\sum_{\psi \in \{b,\sigma,f\}} \tilde{\mathbb{E}}\Bigl[\int_0^T |\psi(t, X^u_t, Y^u_t, Z^u_t, v_t) - \psi(t, X^u_t, Y^u_t, Z^u_t, u_t)|^2\, dt\Bigr]\\
&+ \sum_{w \in \{x,y,z\}} \tilde{\mathbb{E}}\Bigl[\int_0^T |G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, v_t, u_t) - G_w(t, X^u_t, Y^u_t, Z^u_t, p^u_t, q^u_t, u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\tag{3.22}
\]
Combining (3.22) with (3.15), we obtain the desired result. This completes the proof.

Now we state the main result of the paper.

Theorem 3.6.
Suppose that Assumption 2.6 holds. Then, for $\rho > C \exp\{\|f_y\|_\infty T\}$, Algorithm 1 converges to the set of solutions to the extended SMP.

Proof. For any integer $m \ge 1$, define a new probability $P^m$ by $dP^m := \Xi^m_T\, dP$, where
\[
\Xi^m_t := \mathcal{E}\Bigl(\sum_{i=1}^d \int_0^t f^{u^{m-1},u^m}_{z_i}(s)\, dW^i_s\Bigr).
\]
Denote by $\mathbb{E}^m[\cdot]$ the mathematical expectation with respect to $P^m$, and set $\Theta^m_t = (X^m_t, Y^m_t, Z^m_t)$ for $t \in [0,T]$. In order to obtain convergence, set
\[
\hat{H}^m(t) = H(t, \Theta^m_t, p^m_t, q^m_t, P^m_t, u^m_t, u^{m-1}_t) - H(t, \Theta^m_t, p^m_t, q^m_t, P^m_t, u^{m-1}_t, u^{m-1}_t)
\]
and $\mu_m = \mathbb{E}^m\bigl[\int_0^T \hat{H}^m(t)\, dt\bigr]$. Similar to the analysis in (3.14), we get $\hat{H}^m(t) \le 0$, $P$-a.s., for a.e. $t \in [0,T]$, and observe that $\mu_m \le 0$. Applying Lemma 3.4 with $v(\cdot) = u^m(\cdot)$ and $u(\cdot) = u^{m-1}(\cdot)$, we have
\[
\begin{aligned}
J(u^m(\cdot)) - J(u^{m-1}(\cdot)) \le{}& \exp\bigl\{-\|f_y\|_\infty T\bigr\}\, \mathbb{E}^m\Bigl[\int_0^T \hat{H}^m(t)\, dt\Bigr]\\
&+ C\Bigl\{\sum_{\psi \in \{b,\sigma,f\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|\psi(t, X^m_t, Y^m_t, Z^m_t, u^m_t) - \psi(t, X^m_t, Y^m_t, Z^m_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\\
&+ \sum_{w \in \{x,y,z\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|G_w(t, \Theta^m_t, p^m_t, q^m_t, u^m_t, u^{m-1}_t) - G_w(t, \Theta^m_t, p^m_t, q^m_t, u^{m-1}_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\Bigr\}
\end{aligned}
\tag{3.23}
\]
for some universal constant $C > 0$ depending on $n$, $d$, $T$, $\|b_x\|_\infty$, $\|\sigma_x\|_\infty$, $\|\Phi_x\|_\infty$, $\|Df\|_\infty$ and $\|D^2 f\|_\infty$. Hence, by choosing $\rho > C \exp\{\|f_y\|_\infty T\}$ and using the definition of the augmented Hamiltonian $\tilde{H}$, one can rewrite (3.23) as
\[
J(u^m(\cdot)) - J(u^{m-1}(\cdot)) \le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)\mu_m \le 0.
\]
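The inequality above says that each iteration decreases the cost by at least a positive multiple of $-\mu_m$; since $J$ is bounded below, the quantities $-\mu_m$ must be summable, which forces $\mu_m \to 0$. The following toy numerical sketch illustrates only this descent mechanism, not the paper's Algorithm 1 itself; the scalar "cost" and the rule $\mu_m = -(J_{m-1} - \inf J)$ are purely illustrative assumptions.

```python
# Toy illustration of the descent mechanism: if J_m - J_{m-1} <= c * mu_m
# with c > 0 and mu_m <= 0, and J is bounded below (here inf J = 0), then
# the partial sums of -mu_m stay bounded by (J_0 - inf J)/c, so mu_m -> 0.

def descent_sequence(J0: float, steps: int, c: float = 0.5):
    """Simulate iterates whose available decrease mu_m is proportional to
    the remaining optimality gap (an assumption made for illustration)."""
    J, mus = J0, []
    for _ in range(steps):
        mu = -J          # illustrative choice: mu_m = -(J_{m-1} - inf J)
        J = J + c * mu   # descent inequality applied with equality
        mus.append(mu)
    return J, mus

J_final, mus = descent_sequence(J0=1.0, steps=50)
assert all(mu <= 0 for mu in mus)           # mu_m <= 0 at every step
assert sum(-mu for mu in mus) <= 1.0 / 0.5  # partial sums bounded by (J0 - inf J)/c
assert abs(mus[-1]) < 1e-12                 # mu_m -> 0
```

In this sketch the cost halves at every step, so the bounded partial sums of $-\mu_m$ mirror the telescoping-sum argument used below to conclude $\mu_m \to 0$.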
Consequently, for any integer $l \ge 1$, we have
\[
\begin{aligned}
\sum_{m=1}^l (-\mu_m) &\le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \sum_{m=1}^l \bigl[J(u^{m-1}(\cdot)) - J(u^m(\cdot))\bigr]\\
&= \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \bigl[J(u^0(\cdot)) - J(u^l(\cdot))\bigr]\\
&\le \Bigl(\exp\bigl\{-\|f_y\|_\infty T\bigr\} - \frac{C}{\rho}\Bigr)^{-1} \Bigl[J(u^0(\cdot)) - \inf_{u(\cdot) \in \mathcal{U}[0,T]} J(u(\cdot))\Bigr] < \infty,
\end{aligned}
\]
which implies that $\sum_{m=1}^\infty (-\mu_m) < \infty$. Since $-\mu_m \ge 0$, we obtain $\mu_m \to 0$ as $m \to \infty$.

As was mentioned in [23], the quantity $|\mu_m| = -\mu_m \ge 0$ measures the distance from a solution to the extended SMP (2.12). Indeed, if $\mu_m = 0$ for some integer $m$, then from the augmented Hamiltonian minimization step (2.13) we get
\[
\begin{aligned}
0 &\le \rho\Bigl(\sum_{\psi \in \{b,\sigma,f\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|\psi(t, X^m_t, Y^m_t, Z^m_t, u^m_t) - \psi(t, X^m_t, Y^m_t, Z^m_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\\
&\qquad + \sum_{w \in \{x,y,z\}} \mathbb{E}^m\Bigl[\int_0^T \bigl|G_w(t, \Theta^m_t, p^m_t, q^m_t, u^m_t, u^{m-1}_t) - G_w(t, \Theta^m_t, p^m_t, q^m_t, u^{m-1}_t, u^{m-1}_t)\bigr|^2\, dt\Bigr]\Bigr) \le -\mu_m = 0,
\end{aligned}
\]
which implies that the feasibility error (see Remark 3.5) vanishes and
\[
\tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u^{m-1}_t, u^{m-1}_t) = \min_{u \in U} \tilde{H}(t, X^m_t, Y^m_t, Z^m_t, p^m_t, q^m_t, P^m_t, u, u^{m-1}_t), \quad P\text{-a.s., a.e. } t \in [0,T].
\]
Thus $(X^m, Y^m, Z^m, p^m, q^m, P^m, u^{m-1}(\cdot))$ solves (2.12).

Corollary 3.7.
Suppose that Assumption 2.6 holds and that $f$ is independent of $y$, $z$. We further assume that the following local maximum principle holds:
\[
G(t, \bar{X}_t, \bar{p}_t, \bar{q}_t, \bar{u}_t) \le G(t, \bar{X}_t, \bar{p}_t, \bar{q}_t, u), \quad \forall u \in U,\ P\text{-a.s., a.e. } t \in [0,T].
\tag{3.24}
\]
Then the estimate (3.8) becomes
\[
\begin{aligned}
J(v(\cdot)) - J(u(\cdot)) \le{}& \mathbb{E}\Bigl[\int_0^T \bigl[G(t, X^u_t, p^u_t, q^u_t, v_t) - G(t, X^u_t, p^u_t, q^u_t, u_t)\bigr]\, dt\Bigr]\\
&+ C\Bigl\{\mathbb{E}\Bigl[\int_0^T |b(t, X^u_t, v_t) - b(t, X^u_t, u_t)|^2\, dt\Bigr] + \mathbb{E}\Bigl[\int_0^T |\sigma(t, X^u_t, v_t) - \sigma(t, X^u_t, u_t)|^2\, dt\Bigr]\\
&+ \mathbb{E}\Bigl[\int_0^T |G_x(t, X^u_t, p^u_t, q^u_t, v_t) - G_x(t, X^u_t, p^u_t, q^u_t, u_t)|^2\, dt\Bigr]\Bigr\}.
\end{aligned}
\tag{3.25}
\]
Furthermore, Algorithm 1 converges to the set of solutions to the extended version of (3.24) for $\rho > C$.

Proof. Since $f$ is independent of $y$ and $z$, we have $\tilde{\mathbb{E}} = \mathbb{E}$, and it is not necessary to establish estimate (3.19) in the proof of Lemma 3.4. Moreover, since the arguments $y$, $z$ and $P$ drop out, the Hamiltonian reduces to $H(t, x, p, q, v) = G(t, x, p, q, v)$. In this case, the augmented Hamiltonian becomes
\[
\tilde{H}(t, x, p, q, v, u) = G(t, x, p, q, v) + \rho\, |b(t, x, v) - b(t, x, u)|^2 + \rho\, |\sigma(t, x, v) - \sigma(t, x, u)|^2 + \rho\, |G_x(t, x, p, q, v) - G_x(t, x, p, q, u)|^2.
\]
Then, similarly to the proof of Lemma 3.4, we update the control by
\[
u^m_t \in \operatorname*{arg\,min}_{u \in U}\, \tilde{H}(t, X^m_t, p^m_t, q^m_t, u, u^{m-1}_t)
\]
and obtain estimate (3.25). By Theorem 3.6, we get the convergence of Algorithm 1 for $\rho > C$.

Remark 3.8.