Pricing Options Under Rough Volatility with Backward SPDEs
Christian Bayer ∗ Jinniao Qiu † Yao Yao † August 5, 2020
Abstract
In this paper, we study option pricing problems for rough volatility models. As the framework is non-Markovian, the value function for a European option is not deterministic; rather, it is random and satisfies a backward stochastic partial differential equation (BSPDE). The existence and uniqueness of weak solutions is proved for general nonlinear BSPDEs with unbounded random leading coefficients, whose connections with certain forward-backward stochastic differential equations are derived as well. These BSPDEs are then used to approximate American option prices. A deep learning-based method is also investigated for the numerical approximation of such BSPDEs and the associated non-Markovian pricing problems. Finally, examples of rough Bergomi type are numerically computed for both European and American options.
Mathematics Subject Classification (2010):
Keywords: rough volatility, option pricing, stochastic partial differential equation, machine learning, stochastic Feynman-Kac formula
∗ Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany. Email: [email protected]. C. Bayer gratefully acknowledges funding by the German Research Foundation (DFG) (project AA4-2 within the cluster of excellence MATH+).
† Department of Mathematics & Statistics, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada. Email: [email protected] (J. Qiu), [email protected] (Y. Yao). J. Qiu was partially supported by the Natural Sciences and Engineering Research Council of Canada and by start-up funds from the University of Calgary.

1 Introduction

Let $(\Omega,\mathscr{F},(\mathscr{F}_t)_{t\in[0,T]},\mathbb{P})$ be a complete filtered probability space, where the filtration $(\mathscr{F}_t)_{t\in[0,T]}$ is the augmented filtration generated by two independent Wiener processes $W$ and $B$. Throughout this paper, we denote by $(\mathscr{F}^W_t)_{t\in[0,T]}$ the augmented filtration generated by the Wiener process $W$. The predictable $\sigma$-algebras on $\Omega\times[0,T]$ corresponding to $(\mathscr{F}^W_t)_{t\in[0,T]}$ and $(\mathscr{F}_t)_{t\in[0,T]}$ are denoted by $\mathscr{P}^W$ and $\mathscr{P}$, respectively.

We consider a general stochastic volatility model given under a risk-neutral probability measure as
$$ dS_t = rS_t\,dt + S_t\sqrt{V_t}\left(\rho\,dW_t + \sqrt{1-\rho^2}\,dB_t\right); \qquad S_0 = s_0, \tag{1.1} $$
where $\rho\in[-1,1]$ denotes the correlation coefficient and the constant $r$ the interest rate. We impose the following assumptions on the stochastic variance process $V$.

Assumption 1.1. $V$ has continuous trajectories, takes values in $\mathbb{R}_{\ge0}$, and is adapted to the filtration generated by the Brownian motion $W$. We further assume that $V$ is integrable, i.e.,
$$ E\left[\int_0^T V_s\,ds\right] < \infty, \qquad \forall\, T>0. $$

Note that we do not assume that $V$ (or even $(S,V)$) is a Markov process or a semi-martingale, and, in fact, our main examples will be neither. Indeed, the motivation of this work is to extend the backward stochastic differential equation-based pricing theory to rough volatility models. These models were put forth in [GJR18] in order to explain the roughness of time series of daily realized variance estimates. The idea is that the spot price process is modeled by a stochastic volatility model, with the stochastic variance process essentially behaving like an exponential fractional Brownian motion with Hurst index $0 < H < 1/2$. In the pricing domain, rough volatility was found in [BFG16] to lead to extremely accurate fits of SPX implied volatility surfaces with very few parameters, in particular explaining the power law behaviour of the ATM implied volatility skew for short maturities; see also [ALV07, Fuk11]. Since then, there have been many new contributions to the literature of rough volatility models, including developments of rough Heston models with closed expressions for the characteristic functions (see [EER19]), microstructural foundations of rough volatility models ([EEFR18]), and calibration of rough volatility models by machine learning techniques ([BHM+19]).

Example 1.1.
In the rough Bergomi model (see [BFG16]), the stochastic variance is given as
$$ V_t = \xi_t\,\mathcal{E}\big(\eta\widehat{W}_t\big), \tag{1.2} $$
where $\xi_t$ denotes the forward variance curve (a quantity which can be computed from the implied volatility surface), $\mathcal{E}$ denotes the Wick exponential, i.e., $\mathcal{E}(Z) := \exp\big(Z - \frac12\operatorname{Var}Z\big)$ for a zero-mean normal random variable $Z$, and $\eta\ge0$. Finally, $\widehat{W}$ denotes a fractional Brownian motion (fBm) of Riemann-Liouville type with Hurst index $0<H<1/2$, i.e.,
$$ \widehat{W}_t := \int_0^t K(t-s)\,dW_s, \qquad K(r) := \sqrt{2H}\,r^{H-1/2}, \quad r>0. \tag{1.3} $$
If the correlation $\rho$ is negative, then Gassiat [Gas18] showed that the discounted price $e^{-rt}S_t$ is, indeed, a martingale; otherwise, it may not be a martingale. But the conditions of Assumption 1.1 are always satisfied.
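To make Example 1.1 concrete, the following Python sketch simulates the Riemann-Liouville fBm in (1.3) by a left-point Riemann-sum discretization and then forms the variance (1.2); since $\widehat{W}_t$ is Gaussian with $\operatorname{Var}(\widehat{W}_t) = t^{2H}$, the Wick exponential reads $\mathcal{E}(\eta\widehat{W}_t) = \exp(\eta\widehat{W}_t - \frac12\eta^2 t^{2H})$. This is a minimal sketch only: the function name and default parameter values are ours, and more accurate discretizations (e.g., the hybrid scheme) exist.

```python
import numpy as np

def simulate_rough_bergomi_variance(xi=0.09, eta=1.9, H=0.07, T=1.0,
                                    N=250, n_paths=1000, seed=0):
    """Simulate V_t = xi * exp(eta*What_t - 0.5*eta^2*t^{2H}) on a uniform grid.

    What_t = int_0^t sqrt(2H)(t-s)^{H-1/2} dW_s is approximated by a left-point
    Riemann sum; the exact variance t^{2H} in the Wick correction keeps
    E[V_t] = xi (a flat forward variance curve)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    t = dt * np.arange(1, N + 1)                       # grid t_1, ..., t_N
    dW = rng.standard_normal((n_paths, N)) * np.sqrt(dt)
    What = np.zeros((n_paths, N))
    for j in range(N):
        # kernel evaluated at left endpoints s_i = i*dt, so t_j - s_i >= dt > 0
        k = np.sqrt(2 * H) * (t[j] - t[:j + 1] + dt) ** (H - 0.5)
        What[:, j] = dW[:, :j + 1] @ k
    V = xi * np.exp(eta * What - 0.5 * eta**2 * t ** (2 * H))
    return t, V, dW
```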
Example 1.2. In the rough Heston model introduced in [EER19], the stochastic variance satisfies the stochastic Volterra equation
$$ V_t = V_0 + \int_0^t K(t-s)\,\lambda(\theta - V_s)\,ds + \int_0^t K(t-s)\,\zeta\sqrt{V_s}\,dW_s, \tag{1.4} $$
where the kernel satisfies
$$ K(r) := \frac{r^{\alpha-1}}{\Gamma(\alpha)}, \qquad r>0, \quad 1/2<\alpha<1. \tag{1.5} $$
The rough Heston process also satisfies Assumption 1.1; see [JLP19].

For each $(t,s)\in[0,T]\times\mathbb{R}_+$, denote the asset/security price process by $S^{t,s}_\tau$, for $\tau\in[t,T]$, which satisfies the stochastic differential equation (SDE) in (1.1) but with initial time $t$ and initial state $s$ (price at time $t$). The fair price of a European option with payoff $H$, as the smallest initial wealth required to finance an admissible (super-replicating) wealth process, is given by
$$ P_t(s) := E\Big[e^{-r(T-t)}H(S^{t,s}_T)\,\big|\,\mathscr{F}_t\Big]; \tag{1.6} $$
refer to [CH05] for the cases when the discounted price $e^{-rt}S_t$ is just a local martingale. Taking $X_t = -rt + \log S_t$, we may reformulate the above pricing problem, i.e.,
$$ u_t(x) := E\Big[e^{-r(T-t)}H\big(e^{X^{t,x}_T+rT}\big)\,\big|\,\mathscr{F}_t\Big], \qquad (t,x)\in[0,T]\times\mathbb{R}, \tag{1.7} $$
subject to
$$ dX^{t,x}_s = \sqrt{V_s}\Big(\rho\,dW_s + \sqrt{1-\rho^2}\,dB_s\Big) - \frac{V_s}{2}\,ds, \quad 0\le t\le s\le T; \qquad X^{t,x}_t = x. \tag{1.8} $$
Obviously, we have the relation $u_t(x) = P_t(e^{x+rt})$ a.s.
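As a plain Monte Carlo counterpart of (1.6)-(1.8) (and a sanity check for the deep learning results of Section 4), one may estimate $u_0(x)$ for a put payoff by an Euler scheme for (1.8). The sketch below reuses the hypothetical simulate_rough_bergomi_variance helper from the previous snippet; the parameter defaults are again placeholders.

```python
import numpy as np

def european_put_price_mc(K=100.0, r=0.05, rho=-0.9, xi=0.09, T=1.0,
                          x0=np.log(100.0), n_paths=20000, N=250, seed=1):
    """Monte Carlo estimate of u_0(x0) in (1.7) for a put payoff H(s) = (K - s)^+.

    The log-price X of (1.8) is simulated with a left-point Euler scheme,
    driven by the same W that builds the variance V plus an independent B."""
    t, V, dW = simulate_rough_bergomi_variance(xi=xi, T=T, N=N,
                                               n_paths=n_paths, seed=seed)
    rng = np.random.default_rng(seed + 1)
    dt = T / N
    dB = rng.standard_normal((n_paths, N)) * np.sqrt(dt)
    # variance at left grid points: V_0 = xi, then V_{t_1}, ..., V_{t_{N-1}}
    V_left = np.hstack([np.full((n_paths, 1), xi), V[:, :-1]])
    X_T = x0 + np.sum(np.sqrt(V_left) * (rho * dW + np.sqrt(1.0 - rho**2) * dB)
                      - 0.5 * V_left * dt, axis=1)
    payoff = np.maximum(K - np.exp(X_T + r * T), 0.0)
    return np.exp(-r * T) * payoff.mean()
```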
The non-Markovianity of the pair $(S,V)$ (or $(X,V)$) makes it impossible to characterize the value function $u_t(x)$ with a conventional (deterministic) partial differential equation (PDE). Indeed, we prove that the function $u_t(x)$, for $(t,x)\in[0,T]\times\mathbb{R}$, is a random field which together with another random field $\psi_t(x)$ satisfies the following backward stochastic partial differential equation (BSPDE):
$$ \begin{cases} -du_t(x) = \Big[\dfrac{V_t}{2}D^2u_t(x) + \rho\sqrt{V_t}\,D\psi_t(x) - \dfrac{V_t}{2}Du_t(x) - ru_t(x)\Big]dt - \psi_t(x)\,dW_t; \\ u_T(x) = H(e^{x+rT}), \end{cases} \tag{1.9} $$
where the pair $(u,\psi)$ is unknown and the volatility process $(V_t)_{t\ge0}$ is defined exogenously as in Examples 1.1 and 1.2.

While BSPDEs have been extensively studied (see [BD14, DQT11, HMY02, Pen92] for instance), to the best of our knowledge, there is no available theory for the well-posedness of BSPDE (1.9), because the leading coefficient $V_t$ is neither uniformly bounded from above nor uniformly (strictly) positive from below, and the terminal value $H(e^{\cdot+rT})$ may not belong to any space $L^p(\Omega\times\mathbb{R})$ for $p\in(1,\infty)$. Hence, a weak solution theory is established for the well-posedness of general nonlinear BSPDEs and the associated stochastic Feynman-Kac formula, particularly applicable to (1.9). Such nonlinear BSPDEs are further used to approximate American option prices. Based on the stochastic Feynman-Kac formula with forward-backward stochastic differential equations (FBSDEs), we develop a deep learning-based method for the numerical approximation of the solutions, which are essentially defined on the (infinite-dimensional) probability space due to the randomness. Accordingly, the universal approximation theorem of neural networks is generalized from finite-dimensional input spaces to infinite-dimensional cases in the probabilistic setting. On the basis of this approximation result, we design schemes in the spirit of the Markovian counterpart by Huré, Pham, and Warin [HPW19], but equipped with neural networks with changing and high input dimensions. Some numerical results are also presented for examples of rough Bergomi type, along with an appended convergence analysis.

Here, although the theory and application results are presented for the case of a single risky asset under rough volatility, leading to associated BSPDEs on the one-dimensional space $\mathbb{R}$, a multi-dimensional extension may be obtained under certain assumptions in a similar manner; nevertheless, we would not seek such a generality, to avoid cumbersome arguments.

Finally, let us contrast the present work with the recent work [JO19]. Therein, with the method developed in [VZ+19], the European option price in a local rough volatility model is expressed as a function of $t$, $S_t$ and an additional, infinite-dimensional term $\Theta$, which is closely related to the forward variance curve. An infinite-dimensional pricing PDE for the option price with respect to these variables is then formulated and solved with a discretization method using deep neural networks as basis functions. The focus of [JO19] is clearly on the mathematical finance and numerical side, whereas well-posedness of the path-dependent PDE is more or less assumed. (They do refer to [EKTZ14], which, however, only covers the case of path-dependent PDEs with constant diffusion coefficients. Moreover, the arguments in [JO19] seem to require classical, rather than viscosity, solutions of the path-dependent PDE.) In this sense, our present work is complementary, as the well-posedness of the BSPDE is a serious concern of this paper. We also extend the consideration from the European to the American case, and provide a similar type of numerical discretization, also based on deep neural networks, but for the approximation of the associated FBSDEs.

The rest of this paper is organized as follows. Section 2 is devoted to the well-posedness of a class of nonlinear BSPDEs and the associated stochastic Feynman-Kac formula. The weak solution theory is then applied to approximations of American option prices under rough volatility in Section 3. In Section 4, we discuss the numerical approximations with a deep learning-based method: in the first subsection we address the approximation of random functions involving infinite-dimensional input spaces by neural networks in the probabilistic setting; a deep learning-based method is then introduced for non-Markovian BSDEs and associated BSPDEs in the second subsection; and in the third subsection we present some numerical examples for the rough Bergomi model. Finally, in the appendix, a convergence analysis is presented for the deep learning-based method.

2 Well-posedness of nonlinear BSPDEs and the stochastic Feynman-Kac formula

This section is devoted to a weak solution theory for the following nonlinear BSPDE:
$$ \begin{cases} -du_t(x) = \Big[\dfrac{V_t}{2}D^2u_t(x) + \rho\sqrt{V_t}\,D\psi_t(x) - \dfrac{V_t}{2}Du_t(x) + F_t\big(e^x, u_t(x), \sqrt{(1-\rho^2)V_t}\,Du_t(x), \psi_t(x)+\rho\sqrt{V_t}\,Du_t(x)\big)\Big]dt - \psi_t(x)\,dW_t, \quad (t,x)\in[0,T)\times\mathbb{R}; \\ u_T(x) = G(e^x), \quad x\in\mathbb{R}. \end{cases} \tag{2.1} $$
Noteworthily, BSPDE (1.9) turns out to be a particular case when $F_t(x,y,z,\tilde z)\equiv -ry$ and $G(e^x) = H(e^{x+rT})$.

We shall study the well-posedness of BSPDE (2.1) for a given continuous nonnegative process $(V_t)_{t\ge0}$ and address the representation relationship between BSPDE (2.1) and the associated FBSDE. Following are the assumptions on the coefficients $G$ and $F$.

Assumption 2.1. (1) The function $G: (\Omega\times\mathbb{R}, \mathscr{F}^W_T\otimes\mathcal{B}(\mathbb{R})) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ satisfies
$$ |G(x)| \le L(1+|x|), \qquad x\in\mathbb{R}, $$
for some constant $L>0$.
(2) The function $F: (\Omega\times[0,T]\times\mathbb{R}^4, \mathscr{P}^W\otimes\mathcal{B}(\mathbb{R}^4)) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ satisfies that there exists a positive constant $L\in(0,\infty)$ such that for all $x, y_1, y_2, z_1, z_2, \tilde z_1, \tilde z_2\in\mathbb{R}$ and $t\in[0,T]$,
$$ |F_t(x,y_1,z_1,\tilde z_1) - F_t(x,y_2,z_2,\tilde z_2)| \le L\big(|y_1-y_2| + |z_1-z_2| + |\tilde z_1-\tilde z_2|\big), \quad \text{a.s.}, $$
$$ |F_t(x,0,0,0)| \le L(1+|x|), \quad \text{a.s.}, $$
$$ |F_t(x,y_1,z_1,\tilde z_1) - F_t(x,y_1,0,0)| \le L, \quad \text{a.s.} $$

For the well-posedness of BSPDE (2.1) under Assumption 2.1, the difficulty lies in the combination of the non-uniform-boundedness of $(V_t)_{t\in[0,T]}$ and the non-integrability of $G(e^x)$ and $F_t(e^x,y,z,\tilde z)$ w.r.t. $x$ on the whole space $\mathbb{R}$.
Indeed, from the condition on $(V_t)_{t\ge0}$ in Assumption 1.1, we may conclude that $e^{X^{0,x}_s}$ is a positive local martingale and thus a supermartingale, satisfying $E[e^{X^{0,x}_t}]\le e^x$ for instance; however, it is not appropriate to expect $E\big[|e^{X^{0,x}_t}|^p\big]<\infty$ for some $p>1$. The boundedness condition on $F$ in $(Z,\tilde Z)$ is not necessary for the concerned examples in this paper; we assume the Lipschitz continuity and boundedness in $(Z,\tilde Z)$ for the reader's interest. In fact, for the well-posedness of the involved BSDEs and BSPDEs in the $L^1$ spaces, it is not appropriate to assume linear growth in $(Z,\tilde Z)$, as indicated in the theory of $L^1$ solutions for BSDEs (see [BDH+03, Section 6]); it might be workable for certain fractional growths in $(Z,\tilde Z)$, while we would not seek such a generality, to avoid cumbersome arguments in this work.

Corresponding to BSPDE (2.1), there follows the BSDE:
$$ \begin{cases} -dY^{t,x}_s = F_s\big(e^{X^{t,x}_s}, Y^{t,x}_s, Z^{t,x}_s, \tilde Z^{t,x}_s\big)\,ds - \tilde Z^{t,x}_s\,dW_s - Z^{t,x}_s\,dB_s, \quad 0\le t\le s\le T; \\ Y^{t,x}_T = G\big(e^{X^{t,x}_T}\big), \end{cases} \tag{2.2} $$
where the triple $(Y^{t,x}_s, Z^{t,x}_s, \tilde Z^{t,x}_s)$ is defined as the solution to BSDE (2.2) in the sense of [BDH+03, Definition 2.1]. Under Assumptions 1.1 and 2.1, BSDE (2.2) has a unique solution $(Y^{t,x}_s, Z^{t,x}_s, \tilde Z^{t,x}_s)$ for each $(t,x)\in[0,T)\times\mathbb{R}$ (see [BDH+03, Theorem 6.3]).

2.1 Weak solutions of BSPDE (2.1)
Denote by $C_c^\infty$ the space of infinitely differentiable functions with compact support in $\mathbb{R}$, and let $\mathcal{D}$ be the space of real-valued Schwartz distributions on $C_c^\infty$. The Lebesgue measure in $\mathbb{R}$ will be denoted by $dx$. $L^2(\mathbb{R})$ ($L^2$ for short) is the usual space of square-integrable functions, with scalar product and norm defined by
$$ \langle\phi,\psi\rangle = \int_{\mathbb{R}}\phi(x)\psi(x)\,dx, \qquad \|\phi\| = \langle\phi,\phi\rangle^{1/2}, \quad \forall\,\phi,\psi\in L^2. $$
For convenience, we shall also use $\langle\cdot,\cdot\rangle$ to denote the duality between the Schwartz distribution space $\mathcal{D}$ and $C_c^\infty$.

By $\mathcal{D}_{\mathscr{F}}$ (respectively, $\mathcal{D}_{\mathscr{F}^W}$) we denote the set of all $\mathcal{D}$-valued functions defined on $\Omega\times[0,T]$ such that, for any $u\in\mathcal{D}_{\mathscr{F}}$ (respectively, $u\in\mathcal{D}_{\mathscr{F}^W}$) and $\phi\in C_c^\infty$, the function $\langle u,\phi\rangle$ is $\mathscr{P}$ (respectively, $\mathscr{P}^W$)-measurable. When there is no confusion about the involved filtration, we shall just write $\mathcal{D}$.

For $p=1,2$, denote by $\mathcal{D}^p$ the totality of $u\in\mathcal{D}$ such that for any $R\in(0,\infty)$ and $\phi\in C_c^\infty$, we have
$$ \int_0^T \sup_{|x|\le R}\big|\langle u_t(\cdot),\phi(\cdot-x)\rangle\big|^p\,dt < \infty \quad \text{a.s.} $$

Lemma 2.1.
Given $u\in\mathcal{D}^p$ for $p=1,2$, it holds that:
(i) $Du\in\mathcal{D}^p$;
(ii) for each continuous function $\varrho$ on $\mathbb{R}$, we have $\varrho u\in\mathcal{D}^p$ if $u\in L^2(\Omega\times[0,T]\times\mathbb{R})$;
(iii) for any continuous processes $(x_t)_{t\in[0,T]}$ and $(y_t)_{t\in[0,T]}$ with $\max_{t\in[0,T]}|x_t|+|y_t|<\infty$ a.s., the random field $\tilde u_t(x) := y_t\,u_t(x+x_t)$ also lies in $\mathcal{D}^p$.

Proof. The assertion (i) may also be found in [Kry10, page 297]. In fact, for each $\phi\in C_c^\infty$, we have $D\phi\in C_c^\infty$, and the integration-by-parts formula indicates that
$$ \langle Du_t(\cdot),\phi(\cdot-x)\rangle = -\langle u_t(\cdot),(D\phi)(\cdot-x)\rangle. $$
Hence, $Du\in\mathcal{D}^p$ if $u\in\mathcal{D}^p$.

For assertion (ii), notice that for each $\gamma\in(0,\infty)$,
$$ \sup_{|x|\le\gamma}\big|\langle\varrho(\cdot)u_t(\cdot),\phi(\cdot-x)\rangle\big|^p \le \|u_t\|^p\,\|\phi\|^p \max_{|x|\le\gamma+R_0}|\varrho(x)|^p, $$
where we choose a sufficiently big $R_0>0$ such that the support of $\phi$ is contained in $[-R_0,R_0]$. Then it follows obviously that $\varrho u\in\mathcal{D}^p$.

Lastly, as $\max_{t\in[0,T]}|x_t|+|y_t|<\infty$ a.s. and for each $\gamma\in(0,\infty)$,
$$ \sup_{|x|\le\gamma}\big|\langle y_t u_t(\cdot+x_t),\phi(\cdot-x)\rangle\big|^p = \sup_{|x|\le\gamma}\big|\langle u_t(\cdot), y_t\phi(\cdot-x_t-x)\rangle\big|^p \le \sup_{|x|\le\gamma+\max_{t\in[0,T]}|x_t|}\big|\langle u_t(\cdot),\phi(\cdot-x)\rangle\big|^p \max_{t\in[0,T]}|y_t|^p, $$
there holds assertion (iii).

For $u,f,g\in\mathcal{D}$, we say that the equality
$$ du_t(x) = f_t(x)\,dt + g_t(x)\,dW_t, \qquad t\in[0,T], $$
holds in the sense of distributions if $f\in\mathcal{D}^1$, $g\in\mathcal{D}^2$ and for each $\phi\in C_c^\infty$, it holds a.s. that
$$ \langle u_t(\cdot),\phi\rangle = \langle u_0(\cdot),\phi\rangle + \int_0^t \langle f_s(\cdot),\phi\rangle\,ds + \int_0^t \langle g_s(\cdot),\phi\rangle\,dW_s, \qquad \forall\,t\in[0,T]. $$

Definition 2.1.
A pair $(u,\psi)\in\mathcal{D}^1_{\mathscr{F}^W}\times\mathcal{D}^2_{\mathscr{F}^W}$ is said to be a weak solution of BSPDE (2.1) if
(i) $u_T(x) = G(e^x)$ a.s.;
(ii) for almost all $(\omega,t)\in\Omega\times[0,T]$, the functions $u_t(x)$, $\sqrt{(1-\rho^2)V_t}\,Du_t(x)$, and $\rho\sqrt{V_t}\,Du_t(x)+\psi_t(x)$ are locally integrable in $x\in\mathbb{R}$;¹
(iii) the equality
$$ -du_t(x) = \Big[\frac{V_t}{2}D^2u_t(x) + \rho\sqrt{V_t}\,D\psi_t(x) - \frac{V_t}{2}Du_t(x) + F_t\big(e^x, u_t(x), \sqrt{(1-\rho^2)V_t}\,Du_t(x), \psi_t(x)+\rho\sqrt{V_t}\,Du_t(x)\big)\Big]dt - \psi_t(x)\,dW_t $$
holds in the sense of distributions.

By Assumption 2.1, the linear growth of $(G,F)$ w.r.t. $e^x$ produces the local integrability in $x\in\mathbb{R}$. Therefore, in Definition 2.1 the local integrability is required of the weak solution, which not only gives a point-wise meaning to the compositions involved in the function $F$ but also makes the weak solution potentially workable under Assumption 2.1, particularly encompassing the concerned examples in this paper. Obviously, it differs from the $L^p$ ($p\in(1,\infty]$) integrability requirements for the weak or viscosity solutions in the existing BSPDE literature (see [DQT11, HMY02, Qiu18, Zho92] for instance).

2.2 Well-posedness of BSPDE (2.1) and the stochastic Feynman-Kac formula

First comes a result about the measurability of $Y^{t,x}_t$, which basically states that the randomness from the Wiener process $B$ is averaged out, as the randomness of all the coefficients is only (explicitly) subject to the sub-filtration $\{\mathscr{F}^W_t\}_{t\ge0}$.

Theorem 2.2.
Under Assumptions 1.1 and 2.1, for each $(t,x)\in[0,T]\times\mathbb{R}$, let $(Y^{t,x}_s, Z^{t,x}_s, \tilde Z^{t,x}_s)$ be the solution to BSDE (2.2). Then the value function $\Phi_t(x) := Y^{t,x}_t$ is just $\mathscr{F}^W_t$-measurable.

Proof. We shall adopt some techniques by Buckdahn and Li in [BL08]. For the underlying probability space, w.l.o.g., we may take $\Omega = C([0,T];\mathbb{R}^2) = \Omega^W\times\Omega^B$, with $\Omega^W = C([0,T];\mathbb{R})$, $\Omega^B = C([0,T];\mathbb{R})$, and for each $\omega\in\Omega$ one has $\omega = (\omega^W,\omega^B)$ with $\omega^W\in\Omega^W$ and $\omega^B\in\Omega^B$. And the two independent Wiener processes $W$ and $B$ may be defined on $\Omega^W$ and $\Omega^B$, respectively. Set
$$ H = \Big\{h;\ h(0)=0,\ \frac{dh}{dt}\in L^2(0,T;\mathbb{R})\Big\}, $$
which is the Cameron-Martin space associated with the Wiener process $B$. For any $h\in H$, we define the translation operator $\tau_h:\Omega\to\Omega$, $\tau_h((\omega^W,\omega^B)) = (\omega^W,\omega^B+h)$ for $\omega = (\omega^W,\omega^B)\in\Omega$. It is obvious that $\tau_h$ is a bijection and that it defines the probability transformation
$$ \big(\mathbb{P}\circ\tau_h^{-1}\big)(d\omega) = \exp\Big\{\int_0^T\frac{dh}{dt}\,dB_t - \frac12\int_0^T\Big|\frac{dh}{dt}\Big|^2\,dt\Big\}\,\mathbb{P}(d\omega). $$
Fix some $(t,x)\in[0,T]\times\mathbb{R}$ and set $H_t = \{h\in H \,|\, h(\cdot) = h(\cdot\wedge t)\}$. Recall
$$ X^{t,x}_T = x - \frac12\int_t^T V_s\,ds + \int_t^T\rho\sqrt{V_s}\,dW_s + \int_t^T\sqrt{(1-\rho^2)V_s}\,dB_s. $$

¹Here, by the local integrability of a function $g$ in $x\in\mathbb{R}$ we mean that for each bounded measurable set $D\subset\mathbb{R}$, the truncated function $g\cdot 1_D$ lies in $L^1(\mathbb{R})$.
By the Girsanov theorem, it follows that $X^{t,x}_T(\tau_h) = X^{t,x}_T$ for all $h\in H_t$, and thus we have $\Phi_t(x)(\tau_h) = \Phi_t(x)$ $\mathbb{P}$-a.s. for any $h\in H_t$. In particular, for any continuous and bounded function $G_0$,
$$ E\Big[G_0(\Phi_t(x))\exp\Big\{-\frac12\int_0^T\Big|\frac{dh}{ds}\Big|^2ds - \int_0^T\frac{dh}{ds}\,dB_s\Big\}\Big] = E\Big[G_0(\Phi_t(x))(\tau_h)\exp\Big\{-\frac12\int_0^T\Big|\frac{dh}{ds}\Big|^2ds - \int_0^T\frac{dh}{ds}\,dB_s\Big\}\Big] = E\big[G_0(\Phi_t(x))\big] = E\big[G_0(\Phi_t(x))\big]\,E\Big[\exp\Big\{-\frac12\int_0^T\Big|\frac{dh}{ds}\Big|^2ds - \int_0^T\frac{dh}{ds}\,dB_s\Big\}\Big], $$
which together with the arbitrariness of $(G_0,h)$ implies that $\Phi_t(x)$ is just $\mathscr{F}^W_t$-measurable.

Following is the Itô-Wentzell-Krylov formula.

Lemma 2.3 (Theorem 1 of [Kry10]). Let $x_t$ be an $\mathbb{R}$-valued predictable process of the following form:
$$ x_t = \int_0^t b_s\,ds + \int_0^t\beta_s\,dW_s + \int_0^t\sigma_s\,dB_s, $$
where $b$, $\sigma$ and $\beta$ are predictable processes such that for all $\omega\in\Omega$ and $s\in[0,T]$, it holds that $|\beta_s|+|\sigma_s|<\infty$ and
$$ \int_0^T\big(|b_t| + |\beta_t|^2 + |\sigma_t|^2\big)\,dt < \infty. $$
Assume that the equality
$$ du_t(x) = f_t(x)\,dt + g_t(x)\,dW_t, \qquad t\in[0,T], $$
holds in the sense of distributions, and define $v_t(x) := u_t(x+x_t)$. Then
$$ dv_t(x) = \Big(f_t(x+x_t) + \frac12\big(|\beta_t|^2+|\sigma_t|^2\big)D^2v_t(x) + \beta_t\,Dg_t(x+x_t) + b_t\,Dv_t(x)\Big)dt + \big(g_t(x+x_t) + \beta_t\,Dv_t(x)\big)\,dW_t + \sigma_t\,Dv_t(x)\,dB_t, \quad t\in[0,T], $$
holds in the sense of distributions.

We note that in the Itô-Wentzell formula by Krylov [Kry10, Theorem 1], the Wiener process $(W_t)_{t\ge0}$ may be general separable Hilbert space-valued and the process $(x_t)_{t\ge0}$ may be multi-dimensional. An application of the above Itô-Wentzell-Krylov formula gives the following stochastic Feynman-Kac formula, that is, the probabilistic representation of the weak solution to BSPDE (2.1) via the solution of the associated BSDE (2.2) coupled with the forward SDE (1.8).
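For the reader's convenience (this specialization is ours, not spelled out in [Kry10]), one may record how Lemma 2.3 produces the cancellations used in the proof of the next theorem. Taking the shift process from (1.8), i.e., $b_s = -V_s/2$, $\beta_s = \rho\sqrt{V_s}$, $\sigma_s = \sqrt{(1-\rho^2)V_s}$, so that $\frac12(|\beta_s|^2+|\sigma_s|^2) = V_s/2$, and reading $g_t = \psi_t$ and $f_t$ as the negative of the $dt$-coefficient of (2.1), the $dt$-terms of $dv_t(x)$ collapse:
$$ f_t(x+x_t) + \frac{V_t}{2}D^2v_t(x) + \rho\sqrt{V_t}\,(D\psi_t)(x+x_t) - \frac{V_t}{2}Dv_t(x) = -F_t\Big(e^{x+x_t},\, v_t(x),\, \sqrt{(1-\rho^2)V_t}\,Dv_t(x),\, \psi_t(x+x_t)+\rho\sqrt{V_t}\,Dv_t(x)\Big), $$
since $Dv_t(x) = (Du_t)(x+x_t)$ and $D^2v_t(x) = (D^2u_t)(x+x_t)$; only the driver $F$ survives in the drift, while the martingale parts are exactly $\big(\psi_t(x+x_t)+\rho\sqrt{V_t}\,Dv_t(x)\big)dW_t + \sqrt{(1-\rho^2)V_t}\,Dv_t(x)\,dB_t$.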
Theorem 2.4. Let Assumptions 1.1 and 2.1 hold. Let $(u,\psi)$ be a weak solution of BSPDE (2.1) such that there is $C_u\in(0,\infty)$ satisfying for each $t\in[0,T]$
$$ |u_t(x)| \le C_u(1+e^x), \quad \text{for almost all } (\omega,x)\in\Omega\times\mathbb{R}. \tag{2.3} $$
Then $(u,\psi)$ admits a version (denoted by itself) satisfying a.s.
$$ u_\tau(X^{t,x}_\tau) = Y^{t,x}_\tau, \qquad \sqrt{(1-\rho^2)V_\tau}\,Du_\tau(X^{t,x}_\tau) = Z^{t,x}_\tau, \qquad \psi_\tau(X^{t,x}_\tau) + \rho\sqrt{V_\tau}\,Du_\tau(X^{t,x}_\tau) = \tilde Z^{t,x}_\tau, $$
for $0\le t\le\tau\le T$ and $x\in\mathbb{R}$, where $(Y^{t,x}_\tau, Z^{t,x}_\tau, \tilde Z^{t,x}_\tau)$ is the unique solution to BSDE (2.2).

Proof. For each $t\in[0,T)$, recall
$$ X^{t,x}_s = x - \frac12\int_t^s V_r\,dr + \int_t^s\rho\sqrt{V_r}\,dW_r + \int_t^s\sqrt{(1-\rho^2)V_r}\,dB_r, \qquad t\le s\le T. $$
Applying Lemma 2.3 to $u$ over the interval $[t,T]$ yields that
$$ du_s(X^{t,x}_s) = \Big(\psi_s(X^{t,x}_s) + \rho\sqrt{V_s}\,Du_s(X^{t,x}_s)\Big)dW_s + \sqrt{(1-\rho^2)V_s}\,Du_s(X^{t,x}_s)\,dB_s - F_s\Big(e^{X^{t,x}_s}, u_s(X^{t,x}_s), \sqrt{(1-\rho^2)V_s}\,Du_s(X^{t,x}_s), \psi_s(X^{t,x}_s)+\rho\sqrt{V_s}\,Du_s(X^{t,x}_s)\Big)ds, \quad s\in[t,T], $$
holds in the sense of distributions, with $u_T(X^{t,x}_T) = G(e^{X^{t,x}_T})$.

Notice that for all $\tau\in[t,T]$ we have $e^{X^{t,x}_\tau}\in L^1(\Omega,\mathbb{P})$ and $E\big[e^{X^{t,x}_\tau}\big]\le e^x$. This, together with Assumption 2.1 and relation (2.3), implies that $u_\tau(X^{t,x}_\tau)\in L^1(\Omega,\mathbb{P})$ for all $\tau\in[t,T]$. Further, the uniqueness of the $L^1$-solution for BSDEs (see [BDH+03, Section 6]) yields a version of $(u,\psi)$ (denoted by itself) satisfying a.s.
$$ u_\tau(X^{t,x}_\tau) = Y^{t,x}_\tau, \qquad \sqrt{(1-\rho^2)V_\tau}\,Du_\tau(X^{t,x}_\tau) = Z^{t,x}_\tau, \qquad \psi_\tau(X^{t,x}_\tau) + \rho\sqrt{V_\tau}\,Du_\tau(X^{t,x}_\tau) = \tilde Z^{t,x}_\tau, $$
for $0\le t\le\tau\le T$ and $x\in\mathbb{R}$, where $(Y^{t,x}_\tau, Z^{t,x}_\tau, \tilde Z^{t,x}_\tau)$ is the unique solution to BSDE (2.2).

From the proof, we may see that the growth condition (2.3) confirms that the distribution-valued process $u$ is locally integrable and a.e. defined on $\Omega\times[0,T]\times\mathbb{R}$, which means more than a distribution. More importantly, it implies the integrability of $u(\tau, X^{t,x}_\tau)$, which is needed for the uniqueness of the solution to BSDEs. The growth condition (2.3) may be relaxed; however, a power growth condition like $|u_t(x)|\le C(1+|e^x|^p)$ for some $p>1$ may fail to ensure the integrability of $u(\tau, X^{t,x}_\tau)$ (see [Gas18, Theorem 2]). On the other hand, the stochastic Feynman-Kac formula in Theorem 2.4 actually implies the uniqueness of the weak solution for BSPDE (2.1), which together with the existence is summarized in what follows.

Theorem 2.5.
Under Assumptions 1.1 and 2.1, suppose further that there is an infinitely differentiable function $\zeta$ such that $\zeta(x)>0$ for all $x\in\mathbb{R}$ and
$$ G\big(e^{\cdot+X^{0,0}_T}\big)\,\zeta(\cdot)\in L^2\big(\Omega,\mathscr{F}_T;L^2(\mathbb{R})\big), \qquad \zeta(\cdot)\,F_\cdot\big(e^{\cdot+X^{0,0}_\cdot},0,0,0\big)\in L^2\big(\Omega\times[0,T];L^2(\mathbb{R})\big). \tag{2.4} $$
Then BSPDE (2.1) admits a unique weak solution $(u,\psi)$ such that there is $C_u\in(0,\infty)$ satisfying for each $t\in[0,T]$
$$ |u_t(x)| \le C_u(1+e^x), \quad \text{for almost all } (\omega,x)\in\Omega\times\mathbb{R}. \tag{2.5} $$

Proof.
Step 1 (Existence). Put $\theta(x) = \dfrac{\zeta(x)}{(1+\zeta(x))(1+x^2)}$ for $x\in\mathbb{R}$. The theory of Banach space-valued BSDEs in [DQT11, Section 3] may be extended to nonlinear cases under Lipschitz assumptions with the standard application of Picard iteration. In particular, for the case of Hilbert spaces, applying [HP91, Theorem 3.1] to the following Hilbert space-valued BSDE (with a trivial operator $A=0$ therein),
$$ \tilde u_t(x) = G\big(e^{x+X^{0,0}_T}\big)\theta(x) + \int_t^T \theta(x)\,F_s\big(e^{x+X^{0,0}_s}, (\theta(x))^{-1}\tilde u_s(x), (\theta(x))^{-1}\tilde\psi^B_s(x), (\theta(x))^{-1}\tilde\psi^W_s(x)\big)\,ds - \int_t^T\tilde\psi^B_s(x)\,dB_s - \int_t^T\tilde\psi^W_s(x)\,dW_s, \quad t\in[0,T], \tag{2.6} $$
gives the solution as a triple of $L^2(\mathbb{R})$-valued $(\mathscr{F}_t)$-adapted random fields
$$ (\tilde u,\tilde\psi^B,\tilde\psi^W)\in L^2\big(\Omega;C([0,T];L^2(\mathbb{R}))\big)\times L^2(\Omega\times[0,T]\times\mathbb{R})\times L^2(\Omega\times[0,T]\times\mathbb{R}). \tag{2.7} $$
Obviously, we have $(\tilde u,\tilde\psi^B,\tilde\psi^W)\in\mathcal{D}_{\mathscr{F}}\times\mathcal{D}_{\mathscr{F}}\times\mathcal{D}_{\mathscr{F}}$, and thus by assertion (ii) of Lemma 2.1, it holds that
$$ (\hat u,\hat\psi^B,\hat\psi^W) := (\tilde u,\tilde\psi^B,\tilde\psi^W)\,\theta^{-1}\in\mathcal{D}_{\mathscr{F}}\times\mathcal{D}_{\mathscr{F}}\times\mathcal{D}_{\mathscr{F}}, $$
satisfying the BSDE
$$ \hat u_t(x) = G\big(e^{x+X^{0,0}_T}\big) + \int_t^T F_s\big(e^{x+X^{0,0}_s},\hat u_s(x),\hat\psi^B_s(x),\hat\psi^W_s(x)\big)\,ds - \int_t^T\hat\psi^B_s(x)\,dB_s - \int_t^T\hat\psi^W_s(x)\,dW_s, \quad t\in[0,T]. $$
Also, it is straightforward to have that
$$ \hat u_t(x) = Y^{t,\,x+X^{0,0}_t}_t \quad \text{a.s., for all } (t,x)\in[0,T]\times\mathbb{R}, \tag{2.8} $$
with the triple $(Y^{t,x}_s, Z^{t,x}_s, \tilde Z^{t,x}_s)_{s\in[t,T]}$ satisfying BSDE (2.2).

By Lemma 2.1, we may apply the Itô-Wentzell-Krylov formula in Lemma 2.3, which yields that the equality
$$ -d\hat u_t(x-X^{0,0}_t) = \Big\{\frac{V_t}{2}D^2\hat u_t(x-X^{0,0}_t) + \sqrt{(1-\rho^2)V_t}\,D\hat\psi^B_t(x-X^{0,0}_t) + \rho\sqrt{V_t}\,D\hat\psi^W_t(x-X^{0,0}_t) - \frac{V_t}{2}D\hat u_t(x-X^{0,0}_t) + F_t\big(e^x,\hat u_t(x-X^{0,0}_t),\hat\psi^B_t(x-X^{0,0}_t),\hat\psi^W_t(x-X^{0,0}_t)\big)\Big\}\,dt - \Big(\hat\psi^W_t(x-X^{0,0}_t) - \rho\sqrt{V_t}\,D\hat u_t(x-X^{0,0}_t)\Big)\,dW_t - \Big(\hat\psi^B_t(x-X^{0,0}_t) - \sqrt{(1-\rho^2)V_t}\,D\hat u_t(x-X^{0,0}_t)\Big)\,dB_t, \quad t\in[0,T], \tag{2.9-2.10} $$
holds in the sense of distributions. Notice that the equality (2.8) indicates that for each $s\in[0,T]$,
$$ \hat u_s(x-X^{0,0}_s) = Y^{s,x}_s, \tag{2.11} $$
which is just $\mathscr{F}^W_s$-measurable by Theorem 2.2. Thus, the stochastic integration w.r.t. $B$ must vanish, i.e., we have
$$ \hat\psi^B_t(x) - \sqrt{(1-\rho^2)V_t}\,D\hat u_t(x) = 0, \quad \text{a.s. for all } (t,x)\in[0,T]\times\mathbb{R}. $$
Put
$$ u_t(x) = \hat u_t(x-X^{0,0}_t) \quad \text{and} \quad \psi_t(x) = \hat\psi^W_t(x-X^{0,0}_t) - \rho\sqrt{V_t}\,D\hat u_t(x-X^{0,0}_t), \qquad (t,x)\in[0,T]\times\mathbb{R}. $$
The $\mathscr{F}^W_t$-adaptedness of $u_t(x)$ and the assertions (i) and (iii) of Lemma 2.1 imply $(u,\psi)\in\mathcal{D}_{\mathscr{F}^W}\times\mathcal{D}_{\mathscr{F}^W}$, and the equality (2.9)-(2.10) writes equivalently as
$$ -du_t(x) = \Big\{\frac{V_t}{2}D^2u_t(x) + \rho\sqrt{V_t}\,D\psi_t(x) - \frac{V_t}{2}Du_t(x) + F_t\big(e^x,u_t(x),\sqrt{(1-\rho^2)V_t}\,Du_t(x),\psi_t(x)+\rho\sqrt{V_t}\,Du_t(x)\big)\Big\}\,dt - \psi_t(x)\,dW_t, \quad t\in[0,T], $$
which holds in the sense of distributions with the terminal condition $u_T(x) = G(e^x)$. The local integrability of $\big(u,\sqrt{(1-\rho^2)V_t}\,Du,\psi+\rho\sqrt{V_t}\,Du\big)$ required in Definition 2.1(ii) may be obtained by combining the relation (2.7), the path-continuity of $(X^{0,0}_s)_{s\ge0}$, and the positivity of $\theta$. Therefore, the pair $(u,\psi)$ is a weak solution of BSPDE (2.1).

Step 2 (Growth condition (2.5)).
Consider the following Hilbert space-valued BSDE:
$$ \tilde Y_t(x) = \big|G(e^{x+X^{0,0}_T})\big|\,\theta(x) - \int_t^T\tilde Z^B_s(x)\,dB_s - \int_t^T\tilde Z^W_s(x)\,dW_s + \int_t^T\Big(\theta(x)\big|F_s(e^{x+X^{0,0}_s},0,0,0)\big| + L\,\theta(x) + L|\tilde Y_s(x)|\Big)\,ds, \tag{2.12} $$
where the positive constant $L$ is from Assumption 2.1. The standard BSDE theory (see [PP90]) yields the unique existence of the $L^2$-solution to BSDE (2.12). In fact, for each $(t,x)\in[0,T)\times\mathbb{R}$ we have
$$ \tilde Y_t(x) = E\Big[\big|G(e^{x+X^{0,0}_T})\big|\,\theta(x)\,\gamma^t_T + \int_t^T\theta(x)\big(L + \big|F_s(e^{x+X^{0,0}_s},0,0,0)\big|\big)\cdot\gamma^t_s\,ds \,\Big|\,\mathscr{F}_t\Big], \tag{2.13} $$
with $\gamma^t_s = \exp\{L(s-t)\}$, $s\in[t,T]$. Putting the BSDEs (2.6) and (2.12) together, we may use the comparison theorem (see [EPQ97, Theorem 2.2]) to achieve the relation
$$ \tilde u_t(x) \le \tilde Y_t(x), \quad \text{a.s.}, \ \forall(t,x)\in[0,T]\times\mathbb{R}, $$
which together with (2.13) implies that
$$ \begin{aligned} u_t(x) &\le \big(\theta(x-X^{0,0}_t)\big)^{-1}\,\tilde Y_t(x-X^{0,0}_t) = E\Big[\big|G(e^{X^{t,x}_T})\big|\,\gamma^t_T + \int_t^T\Big(\big|F_s(e^{X^{t,x}_s},0,0,0)\big| + L\Big)\cdot\gamma^t_s\,ds \,\Big|\,\mathscr{F}_t\Big] \\ &\le E\Big[\big(Le^{X^{t,x}_T}+L\big)\gamma^t_T + \int_t^T\big(Le^{X^{t,x}_s}+2L\big)\cdot\gamma^t_s\,ds \,\Big|\,\mathscr{F}_t\Big] \\ &\le E\Big[Le^{x}e^{L(T-t)} + Le^{L(T-t)} + \int_t^T\big(Le^{x}e^{L(s-t)} + 2Le^{L(s-t)}\big)\,ds \,\Big|\,\mathscr{F}_t\Big] \le C(L,T)(1+e^x), \quad \text{a.s.}, \ \forall(t,x)\in[0,T]\times\mathbb{R}, \end{aligned} $$
where we have used the relation $E\big[e^{X^{t,x}_s}\,\big|\,\mathscr{F}_t\big]\le e^x$ a.s., for $0\le t\le s\le T$; a symmetric comparison argument bounds $u$ from below. This gives the growth estimate (2.5).

Step 3 (Uniqueness). The uniqueness follows from Theorem 2.4, and the proof is complete.
Remark 2.1.
In view of the above proof, the assumption (2.4) on $G$ and $F$ is to ensure $(\tilde\psi^B,\tilde\psi^W)\in\mathcal{D}_{\mathscr{F}}\times\mathcal{D}_{\mathscr{F}}$ and further $\psi\in\mathcal{D}_{\mathscr{F}^W}$. It is made for simplicity and may be relaxed; for instance, the $L^2$-requirements in (2.4) may be replaced correspondingly by $L^p$-integrability with $1<p<\infty$, and the associated well-posedness result with $L^p$-integrability in (2.7) may be obtained by standardly extending the theory of Banach space-valued BSDEs in [DQT11, Section 3], as stated at the beginning of the proof. A typical example satisfying (2.4) is the European put option, where $F_t(x,y,z,\tilde z) = -ry$, $G(e^x) = (K-e^{x+rT})^+$ for some $K\in(0,\infty)$, and one may take $\zeta(x) = e^x$ for instance. However, it is by no means obvious whether it is satisfied for call options, while for pricing calls we may use the put-call parity if applicable.

3 Approximation of American option prices

Assuming the same setting as for the European options, we consider instead the American type, that is, to compute
$$ u_t(x) := \sup_{\tau\in\mathcal{T}_t} E\Big[e^{-(\tau-t)r}\,g_\tau\big(e^{X^{t,x}_\tau}\big)\,\Big|\,\mathscr{F}_t\Big], \qquad (t,x)\in[0,T]\times\mathbb{R}, $$
where $r\ge0$ and $\mathcal{T}_t$ denotes the set of all stopping times $\tau$ satisfying $t\le\tau\le T$. For simplicity, we assume:

Assumption 3.1.
The function $g: (\Omega\times[0,T]\times\mathbb{R}, \mathscr{P}^W\otimes\mathcal{B}(\mathbb{R}))\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ satisfies that there exists a positive constant $L>0$ such that for each $(t,x)\in[0,T]\times\mathbb{R}$:
(i) $g_s(e^{X^{t,x}_s})$ is almost surely continuous in $s\in[t,T]$;
(ii) $|g_s(e^x)| \le L(1+e^x)$, a.s.;
(iii) $\big|g_s\big(e^{X^{t,x}_s}\big)\big| \le \Gamma^t_s\,\tilde\theta(x)$, a.s., $\forall s\in[t,T]$, with $E\Big[\sup_{s\in[t,T]}\big|\Gamma^t_s\big|^2\Big]<\infty$, where the positive function $\tilde\theta:\mathbb{R}\to(0,\infty)$ is infinitely differentiable.

A typical example satisfying Assumption 3.1 is the American put option with $g_t(e^x) = (K-e^{x+rt})^+$ for some $K>0$, where one may take $L=K$, $\Gamma^t_s\equiv K$, and $\tilde\theta(x)\equiv1$. By the theory of reflected BSDEs (see [EKP+97, Section 3]), the following reflected BSDE
$$ \begin{cases} -dY^{t,x}_s = -rY^{t,x}_s\,ds + dA^{t,x}_s - Z^{t,x;B}_s\,dB_s - Z^{t,x;W}_s\,dW_s, \quad s\in[t,T]; \\ Y^{t,x}_T = g_T\big(e^{X^{t,x}_T}\big); \qquad Y^{t,x}_s \ge g_s\big(e^{X^{t,x}_s}\big), \quad s\in[t,T]; \\ A^{t,x}_\cdot \text{ is increasing and continuous, } A^{t,x}_t = 0, \quad \displaystyle\int_t^T\big(Y^{t,x}_s - g_s(e^{X^{t,x}_s})\big)\,dA^{t,x}_s = 0, \end{cases} \tag{3.1} $$
admits a unique solution $(Y^{t,x}, A^{t,x}, Z^{t,x;B}, Z^{t,x;W})$ for each $(t,x)\in[0,T]\times\mathbb{R}$, and in particular, by [EKP+97, Proposition 7.1], we have
$$ Y^{t,x}_t = u_t(x), \quad \text{a.s. for each } (t,x)\in[0,T]\times\mathbb{R}. \tag{3.2} $$
We would stress that the above relation (3.2) only indicates that $u_t(x)$ is $\mathscr{F}_t$-measurable for each $(t,x)\in[0,T]\times\mathbb{R}$.

In fact, the penalization method provides an approximation of the reflected BSDE (3.1) with a sequence of BSDEs without reflections (see [EKP+97, Section 6]); i.e., for each $N\in\mathbb{N}^+$, the following BSDE
$$ \begin{cases} -dY^{t,x;N}_s = \Big[-rY^{t,x;N}_s + N\big(g_s(e^{X^{t,x}_s}) - Y^{t,x;N}_s\big)^+\Big]ds - Z^{t,x;B,N}_s\,dB_s - Z^{t,x;W,N}_s\,dW_s, \quad s\in[t,T]; \\ Y^{t,x;N}_T = g_T\big(e^{X^{t,x}_T}\big), \end{cases} \tag{3.3} $$
admits a unique solution $(Y^{t,x;N}, Z^{t,x;B,N}, Z^{t,x;W,N})$ such that $Y^{t,x;N}_s$ converges increasingly to $Y^{t,x}_s$, with
$$ \lim_{N\to\infty} E\bigg[\sup_{s\in[t,T]}\big|Y^{t,x;N}_s - Y^{t,x}_s\big|^2 + \int_t^T\Big(\big|Z^{t,x;B,N}_s - Z^{t,x;B}_s\big|^2 + \big|Z^{t,x;W,N}_s - Z^{t,x;W}_s\big|^2\Big)ds\bigg] = 0, \tag{3.4} $$
$$ \lim_{N\to\infty} E\bigg[\sup_{s\in[t,T]}\big|A^{t,x;N}_s - A^{t,x}_s\big|^2\bigg] = 0, \tag{3.5} $$
for each $(t,x)\in[0,T]\times\mathbb{R}$, where
$$ A^{t,x;N}_r = \int_t^r N\big(g_s(e^{X^{t,x}_s}) - Y^{t,x;N}_s\big)^+\,ds, \qquad \text{for } 0\le t\le r\le T. $$
Notice that Theorem 2.2 says that $Y^{t,x;N}_t$ is $\mathscr{F}^W_t$-measurable for each $(t,x)\in[0,T]\times\mathbb{R}$. Hence, the approximation (3.4) implies that $Y^{t,x}_t$ (and thus $u_t(x)$) is also just $\mathscr{F}^W_t$-measurable for each $(t,x)\in[0,T]\times\mathbb{R}$, which together with Theorems 2.4 and 2.5 yields the following

Corollary 3.1.
Let Assumptions 1.1 and 3.1 hold. It holds that:
(i) The value function $u_t(x)$ is just $\mathscr{F}^W_t$-measurable for each $(t,x)\in[0,T]\times\mathbb{R}$.
(ii) For each $N\in\mathbb{N}^+$, the following BSPDE
$$ \begin{cases} -du^N_t(x) = \Big[\dfrac{V_t}{2}D^2u^N_t(x) + \rho\sqrt{V_t}\,D\psi^N_t(x) - \dfrac{V_t}{2}Du^N_t(x) - ru^N_t(x) + N\big(g_t(e^x) - u^N_t(x)\big)^+\Big]dt - \psi^N_t(x)\,dW_t; \\ u^N_T(x) = g_T(e^x), \end{cases} \tag{3.6} $$
admits a unique weak solution $(u^N,\psi^N)$ such that there exists $C_N\in(0,\infty)$ satisfying for each $t\in[0,T]$
$$ |u^N_t(x)| \le C_N(1+e^x), \quad \text{for almost all } (\omega,x)\in\Omega\times\mathbb{R}. $$
(iii) For each $N\in\mathbb{N}^+$, the above weak solution $(u^N,\psi^N)$ satisfies a.s.
$$ u^N_\tau(X^{t,x}_\tau) = Y^{t,x;N}_\tau, \qquad \sqrt{(1-\rho^2)V_\tau}\,Du^N_\tau(X^{t,x}_\tau) = Z^{t,x;B,N}_\tau, \qquad \psi^N_\tau(X^{t,x}_\tau) + \rho\sqrt{V_\tau}\,Du^N_\tau(X^{t,x}_\tau) = Z^{t,x;W,N}_\tau, $$
for $0\le t\le\tau\le T$ and $x\in\mathbb{R}$, where $(Y^{t,x;N}, Z^{t,x;B,N}, Z^{t,x;W,N})$ is the unique solution to BSDE (3.3).
(iv) For each $(t,x)\in[0,T]\times\mathbb{R}$, $u^N_t(x)$ converges increasingly to $u_t(x)$ in $L^2(\Omega,\mathscr{F}_t;\mathbb{R})$.
(v) There is a triple $(u,\psi^B,\psi^W)$ defined on $(\Omega\times[0,T]\times\mathbb{R}, \mathscr{P}^W\otimes\mathcal{B}(\mathbb{R}))$ such that
$$ u_\tau(X^{t,x}_\tau) = Y^{t,x}_\tau, \qquad \psi^B_\tau(X^{t,x}_\tau) = Z^{t,x;B}_\tau, \qquad \psi^W_\tau(X^{t,x}_\tau) = Z^{t,x;W}_\tau, \quad \text{a.s.}, $$
for $0\le t\le\tau\le T$.

Remark 3.1.
The assertion (v) is concluded from the approximating relations (3.4) and (3.5). In fact, by the theory of reflected BSPDEs (see [QW14] or [Qiu17, Section 3.3]), one may expect the value function $u_t(x)$ to be characterized via the following reflected BSPDE:
$$ \begin{cases} -du_t(x) = \Big[\dfrac{V_t}{2}D^2u_t(x) + \rho\sqrt{V_t}\,D\psi_t(x) - \dfrac{V_t}{2}Du_t(x) - ru_t(x)\Big]dt + \mu(dt,x) - \psi(t,x)\,dW_t, \quad (t,x)\in[0,T]\times\mathbb{R}; \\ u_T(x) = g_T(e^x), \quad x\in\mathbb{R}; \qquad u_t(x)\ge g_t(e^x), \quad d\mathbb{P}\otimes dt\otimes dx\text{-a.e.}; \\ \displaystyle\int_{[0,T]\times\mathbb{R}}\big(u_t(x) - g_t(e^x)\big)\,\mu(dt,dx) = 0, \quad \text{a.s.} \quad \text{(Skorokhod condition)}, \end{cases} \tag{3.7} $$
for which the solution is a triple $(u,\psi,\mu)$ with $\mu$ being a regular random Radon measure. A solution theory may be developed by generalizing the regular stochastic potential and capacity theory in [Qiu17, QW14]; nevertheless, we would not seek such a generality in this paper, in order to put more effort into the numerical approximations.

4 Numerical approximations with a deep learning-based method

Throughout this section, we assume that the functions $G$, $F$ and $g$ are deterministic, i.e.,
$$ (\mathcal{A}^*) \qquad G:\mathbb{R}\to\mathbb{R}, \quad F:[0,T]\times\mathbb{R}^4\to\mathbb{R}, \quad g:[0,T]\times\mathbb{R}\to\mathbb{R}. $$
In fact, this assumption may be relaxed by allowing (explicit) dependence on the variance process $V$ and the Wiener process $W$; together with Assumptions 1.1, 2.1, and 3.1, it ensures that all the coefficients may be simulated in the subsequent numerical computations, given the approximations of the unknown functions. In what follows, we first introduce and discuss neural networks approximating random functions; a deep learning-based method is then introduced for non-Markovian BSDEs and associated BSPDEs; and finally, numerical examples are presented for the rough Bergomi model.

4.1 Neural network approximations of random functions

First, we introduce a feedforward neural network with input dimension $d_0$ and output dimension $d_1$. Suppose that it has $M+1\in\mathbb{N}^+\setminus\{1,2\}$ layers, with each layer having $m_n$ neurons, $n=0,\dots,M$. For simplicity, we choose an identical number of neurons for all hidden layers, i.e., $m_n = m$, $n=1,\dots,M-1$.
Obviously, we have $m_0 = d_0$ and $m_M = d_1$. The neural network may be thought of as a function from $\mathbb{R}^{d_0}$ to $\mathbb{R}^{d_1}$ defined by composition of simple functions as
$$ x\in\mathbb{R}^{d_0} \mapsto A_M\circ\varrho\circ A_{M-1}\circ\cdots\circ\varrho\circ A_1(x)\in\mathbb{R}^{d_1}. \tag{4.1} $$
Here, $A_1:\mathbb{R}^{d_0}\to\mathbb{R}^m$, $A_M:\mathbb{R}^m\to\mathbb{R}^{d_1}$ and $A_n:\mathbb{R}^m\to\mathbb{R}^m$, $n=2,\dots,M-1$, are affine maps of the form
$$ A_n(x) = \mathcal{W}_n x + \beta_n, $$
where the matrix $\mathcal{W}_n$ and the vector $\beta_n$ are called the weight and bias, respectively, of the $n$th layer of the network. For the last layer we choose the identity as activation function, while the activation function $\varrho$ is applied component-wise to the outputs of $A_n$, for $n=1,\dots,M-1$. The parameters of the network are collected in $\theta = (\mathcal{W}_n,\beta_n)_{n=1}^M$. Given $d_0$, $d_1$, $M$ and $m$, the total number of parameters in a network is
$$ M_m = \sum_{n=0}^{M-1}(m_n+1)\,m_{n+1} = (d_0+1)m + (m+1)m(M-2) + (m+1)d_1, $$
and thus $\theta\in\mathbb{R}^{M_m}$. By $\Theta_m$ we denote the set of all possible parameters; if there are no constraints on the parameters, we have $\Theta_m = \mathbb{R}^{M_m}$. By $\Phi_m(\cdot;\theta)$ we denote the neural network function defined in (4.1), and the set of all such neural networks $\Phi_m(\cdot;\theta)$, $\theta\in\Theta_m$, is denoted by $\mathcal{NN}^\varrho_{d_0,d_1,M,m}(\Theta_m)$.

Deep neural networks may approximate large classes of unknown functions. Following is a fundamental result by Hornik et al. [HSW89, HSW90]:

Lemma 4.1 (Universal Approximation Theorem). It holds that:
(i) For each $M\in\mathbb{N}^+\setminus\{1\}$, the set $\cup_{m\in\mathbb{N}}\,\mathcal{NN}^\varrho_{d_0,d_1,M,m}(\mathbb{R}^{M_m})$ is dense in $L^2(\mathbb{R}^{d_0},\nu(dx);\mathbb{R}^{d_1})$ for any finite measure $\nu$ on $\mathbb{R}^{d_0}$, whenever $\varrho$ is continuous and non-constant.
(ii) Assume that $\varrho$ is a non-constant $C^k$ function. Then the neural networks/functions in $\cup_{m\in\mathbb{N}}\,\mathcal{NN}^\varrho_{d_0,d_1,2,m}(\mathbb{R}^{M_m})$ can approximate any function and its derivatives up to order $k$ arbitrarily well on any compact set of $\mathbb{R}^{d_0}$.

Notice that in the above lemma the approximated functions are defined on finite-dimensional spaces, i.e., $\mathbb{R}^{d_0}$. In fact, the approximations may be extended to some classes of functions defined on infinite-dimensional spaces. In this paper, we need the following one:

Proposition 4.2.
For each $T_0\in(0,T]$, $M\in\mathbb{N}^+\setminus\{1\}$, and $d_0,d_1\in\mathbb{N}^+$, the function set
$$ \Big\{\Phi_m(W_{t_1},\dots,W_{t_k},x;\theta) : \Phi_m(\cdot;\theta)\in\mathcal{NN}^\varrho_{d_0+k,d_1,M,m}(\mathbb{R}^{M_m}),\ m,k\in\mathbb{N}^+,\ 0<t_1<t_2<\cdots<t_k\le T_0\Big\} $$
is dense in $L^2\big(\Omega\times\mathbb{R}^{d_0},\mathscr{F}^W_{T_0}\otimes\mathcal{B}(\mathbb{R}^{d_0}),\mathbb{P}(d\omega)\otimes dx;\mathbb{R}^{d_1}\big)$, whenever $\varrho$ is continuous and non-constant.

Proof. Take $f\in L^2\big(\Omega\times\mathbb{R}^{d_0},\mathscr{F}^W_{T_0}\otimes\mathcal{B}(\mathbb{R}^{d_0}),\mathbb{P}(d\omega)\otimes dx\big)$ arbitrarily. Notice that
$$ L^2\big(\Omega\times\mathbb{R}^{d_0},\mathscr{F}^W_{T_0}\otimes\mathcal{B}(\mathbb{R}^{d_0}),\mathbb{P}(d\omega)\otimes dx;\mathbb{R}^{d_1}\big) \equiv L^2\big(\Omega,\mathscr{F}^W_{T_0},\mathbb{P};L^2(\mathbb{R}^{d_0};\mathbb{R}^{d_1})\big). $$
The denseness of simple random variables (see [DPZ14, Lemma 1.2, Page 16] for instance) implies that the function $f$ may be approximated monotonically by simple random variables of the following form:
$$ \sum_{i=1}^l 1_{A_i}(\omega)\,h_i(x), \quad \text{with } h_i\in L^2(\mathbb{R}^{d_0};\mathbb{R}^{d_1}),\ A_i\in\mathscr{F}^W_{T_0},\ l\in\mathbb{N}^+,\ i=1,\dots,l. $$
Each indicator $1_{A_i}$ may be approximated in $L^2(\Omega,\mathscr{F}^W_{T_0})$ by functions in the following set:
$$ \big\{g_i\big(W_{\tilde t^i_1},\dots,W_{\tilde t^i_{k_i}}\big) : k_i\in\mathbb{N}^+,\ g_i\in C_c^\infty(\mathbb{R}^{k_i}),\ 0<\tilde t^i_1<\cdots<\tilde t^i_{k_i}\le T_0\big\}. $$
To sum up, the function $f$ may be approximated in $L^2\big(\Omega\times\mathbb{R}^{d_0},\mathscr{F}^W_{T_0}\otimes\mathcal{B}(\mathbb{R}^{d_0}),\mathbb{P}(d\omega)\otimes dx;\mathbb{R}^{d_1}\big)$ by random fields of the form
$$ f_k(W_{\bar t_1},\dots,W_{\bar t_k},x) = \sum_{i=1}^l g_i\big(W_{\tilde t^i_1},\dots,W_{\tilde t^i_{k_i}}\big)\,h_i(x), $$
where $g_i\in C_c^\infty(\mathbb{R}^{k_i})$, $h_i\in L^2(\mathbb{R}^{d_0})$, $0<\bar t_1<\cdots<\bar t_k\le T_0$, and $\{\bar t_1,\dots,\bar t_k\} = \cup_{i=1}^l\{\tilde t^i_1,\dots,\tilde t^i_{k_i}\}$. Applying the approximation in (i) of Lemma 4.1 to the functions $f_k$ yields the approximation of $f$, and this completes the proof.
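As an illustration of the architecture (4.1) and of the path-dependent inputs $(W_{t_1},\dots,W_{t_k},x)$ from Proposition 4.2, the following PyTorch sketch builds $\Phi_m(\cdot;\theta)$ with sigmoid activation; the class name and defaults are ours, not from the paper.

```python
import torch
import torch.nn as nn

class FeedForwardNN(nn.Module):
    """Phi_m(.; theta): R^{d0} -> R^{d1} with M+1 layers, m neurons per hidden
    layer, and the activation applied after every affine map except the last.

    In the scheme of Section 4.2 the input at time t_i is the concatenation
    (W_{t_1},...,W_{t_i}, What_{t_1},...,What_{t_i}, x), so d0 = 2*i + 1
    changes with the time index i."""

    def __init__(self, d0, d1, M=2, m=None):
        super().__init__()
        m = m if m is not None else (d0 + d1 + 1) // 2   # half of in+out neurons
        dims = [d0] + [m] * (M - 1) + [d1]               # m_0, ..., m_M
        layers = []
        for n in range(M):
            layers.append(nn.Linear(dims[n], dims[n + 1]))   # affine map A_{n+1}
            if n < M - 1:
                layers.append(nn.Sigmoid())                  # rho, component-wise
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# sanity check of the parameter count M_m = sum_n (m_n + 1) * m_{n+1}:
# sum(p.numel() for p in FeedForwardNN(5, 1).parameters())
```

With $M=2$ this is exactly the single-hidden-layer configuration used in the experiments of Section 4.3; a fresh instance is created at every time grid point because the input dimension grows along the partition.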
Remark 4.1. In fact, the process $(W_t)_{t\ge0}$ and the filtration $(\mathscr{F}^W_t)_{t\ge0}$ may be replaced by an arbitrary continuous process $(\mathcal{W}_t)_{t\ge0}$ and the correspondingly generated filtration $(\mathscr{F}^{\mathcal{W}}_t)_{t\ge0}$, where the process $(\mathcal{W}_t)_{t\ge0}$ is not necessarily a Brownian motion.

4.2 A deep learning-based method for non-Markovian BSDEs and BSPDEs

Inspired by [HPW19, HJW18], we adopt a deep learning method based on the following representation relationship given by Theorems 2.4 and 2.5. Letting the quadruple $(X_s, Y_s, Z_s, \tilde Z_s)$ be the solution to the following FBSDE:
$$ \begin{cases} -dY_s = F_s(e^{X_s}, Y_s, Z_s, \tilde Z_s)\,ds - \tilde Z_s\,dW_s - Z_s\,dB_s, \quad 0\le s\le T; \qquad Y_T = G(e^{X_T}); \\ dX_s = \sqrt{V_s}\big(\rho\,dW_s + \sqrt{1-\rho^2}\,dB_s\big) - \dfrac{V_s}{2}\,ds, \quad 0\le s\le T; \qquad X_0 = x; \\ V_s = \xi_s\,\mathcal{E}(\eta\widehat{W}_s) \quad \text{with } \widehat{W}_s = \displaystyle\int_0^s K(s,r)\,dW_r, \quad s\in[0,T], \end{cases} \tag{4.2} $$
with $K$ being a general kernel function including the particular cases in Examples 1.1 and 1.2, one has
$$ u_\tau(X_\tau) = Y_\tau, \qquad \sqrt{(1-\rho^2)V_\tau}\,Du_\tau(X_\tau) = Z_\tau, \qquad \psi_\tau(X_\tau) + \rho\sqrt{V_\tau}\,Du_\tau(X_\tau) = \tilde Z_\tau, $$
for $0\le\tau\le T$ and $x\in\mathbb{R}$, where the pair $(u,\psi)$ is the unique weak solution to BSPDE (2.1) in Theorem 2.5. In particular, we may write forwardly, for $t\in[0,T]$,
$$ u_t(X_t) = u_0(X_0) - \int_0^t F_s\Big(e^{X_s}, u_s(X_s), \sqrt{(1-\rho^2)V_s}\,Du_s(X_s), \psi_s(X_s)+\rho\sqrt{V_s}\,Du_s(X_s)\Big)ds + \int_0^t\Big(\psi_s(X_s)+\rho\sqrt{V_s}\,Du_s(X_s)\Big)dW_s + \int_0^t\sqrt{(1-\rho^2)V_s}\,Du_s(X_s)\,dB_s. \tag{4.3} $$
Given a partition of the time interval, $\pi = \{0 = t_0 < t_1 < \dots < t_N = T\}$ with modulus $|\pi| = \max_{i=0,1,\dots,N-1}\Delta t_i$, $\Delta t_i = t_{i+1}-t_i$, we first simulate (or approximate) the joint process $(B,W,V)$; then the forward process $X$ may be approximated by $X^\pi$ obtained through an Euler scheme. Further, the forward representation (4.3) yields an approximation for $(u,\psi)$ under the Euler scheme:
$$ u_{t_{i+1}}(X_{t_{i+1}}) \approx \mathcal{H}_{t_i}\Big(X_{t_i}, u_{t_i}(X_{t_i}), \sqrt{(1-\rho^2)V_{t_i}}\,Du_{t_i}(X_{t_i}), \psi_{t_i}(X_{t_i})+\rho\sqrt{V_{t_i}}\,Du_{t_i}(X_{t_i}), \Delta B_{t_i}, \Delta W_{t_i}\Big) $$
with
$$ \mathcal{H}_t(x,y,z,\tilde z,b,w) := y - F_t(e^x,y,z,\tilde z)\,\Delta t_i + zb + \tilde zw. $$
Inspired by [HPW19], we design the numerical approximation of $u_{t_i}(X_{t_i})$ as follows:
(1) start with $\widehat{\mathcal{U}}_N = G$;
(2) for $i = N-1,\dots,0$,
given $\widehat{\mathcal{U}}_{i+1}$, use the triple of deep neural networks
$$ \big(\mathcal{U}_i(\cdot,\theta), \mathcal{Z}_i(\cdot,\theta), \widetilde{\mathcal{Z}}_i(\cdot,\theta)\big) \in \mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m})\times\mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m})\times\mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m}) \tag{4.4} $$
for the approximation of
$$ \Big(u_{t_i}(X_{t_i}), \sqrt{(1-\rho^2)V_{t_i}}\,Du_{t_i}(X_{t_i}), \psi_{t_i}(X_{t_i})+\rho\sqrt{V_{t_i}}\,Du_{t_i}(X_{t_i})\Big), $$
to achieve an estimate
$$ \mathcal{U}_{i+1} = \mathcal{H}_{t_i}\Big(X_{t_i}, \mathcal{U}_i(X_{t_i},\theta_i), \mathcal{Z}_i(X_{t_i},\theta_i), \widetilde{\mathcal{Z}}_i(X_{t_i},\theta_i), \Delta B_{t_i}, \Delta W_{t_i}\Big); $$
(3) compute the minimizer of the expected quadratic loss function
$$ \hat L_i(\theta) := E\Big|\widehat{\mathcal{U}}_{i+1} - \mathcal{H}_{t_i}\big(X_{t_i}, \mathcal{U}_i(X_{t_i},\theta_i), \mathcal{Z}_i(X_{t_i},\theta_i), \widetilde{\mathcal{Z}}_i(X_{t_i},\theta_i), \Delta B_{t_i}, \Delta W_{t_i}\big)\Big|^2 \approx \frac1J\sum_{j=1}^J\Big|\widehat{\mathcal{U}}^{(j)}_{i+1} - \mathcal{H}_{t_i}\big(X^{(j)}_{t_i}, \mathcal{U}_i(X^{(j)}_{t_i},\theta_i), \mathcal{Z}_i(X^{(j)}_{t_i},\theta_i), \widetilde{\mathcal{Z}}_i(X^{(j)}_{t_i},\theta_i), \Delta B^{(j)}_{t_i}, \Delta W^{(j)}_{t_i}\big)\Big|^2, \qquad \theta^*_i\in\arg\min_{\theta\in\mathbb{R}^{M_m}}\hat L_i(\theta), $$
where the Adam (adaptive moment estimation) optimizer may be used to get the optimal parameter $\theta^*_i$;
(4) update and set $\widehat{\mathcal{U}}_i = \mathcal{U}_i(\cdot,\theta^*_i)$, $\widehat{\mathcal{Z}}_i = \mathcal{Z}_i(\cdot,\theta^*_i)$, and $\widehat{\widetilde{\mathcal{Z}}}_i = \widetilde{\mathcal{Z}}_i(\cdot,\theta^*_i)$.
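The following PyTorch sketch implements one backward pass of steps (1)-(4) for the European case $F_t(x,y,z,\tilde z) = -ry$, $G(e^x) = (K-e^{x+rT})^+$. It reuses the FeedForwardNN class and assumes pre-simulated paths; the epoch count and learning rate are our placeholders rather than the paper's exact settings (the paper reports $N=20$, mini-batches of 10000 trajectories, sigmoid activation, and Adam).

```python
import numpy as np
import torch

def train_backward_scheme(W, What, X, dW, dB, dt, K=100.0, r=0.05,
                          T=1.0, epochs=200, lr=1e-3):
    """Steps (1)-(4) of Section 4.2 with F = -r*y and a put terminal condition.

    W, What, X: (J, N+1) arrays of paths on the grid t_0, ..., t_N;
    dW, dB: (J, N) Brownian increments; dt: uniform step size.
    Returns the trained networks; nets[0][0] evaluated at X_0 estimates u_0."""
    J, N = dW.shape
    # step (1): hatU_N = G(e^{X_{t_N}})
    hatU = torch.tensor(np.maximum(K - np.exp(X[:, N] + r * T), 0.0)).float().view(-1, 1)
    nets = {}
    for i in range(N - 1, -1, -1):
        # inputs (W_{t_1..t_i}, What_{t_1..t_i}, X_{t_i}): dimension 2*i + 1
        feats = torch.tensor(np.hstack([W[:, 1:i+1], What[:, 1:i+1],
                                        X[:, i:i+1]])).float()
        Ui, Zi, Zti = (FeedForwardNN(2 * i + 1, 1) for _ in range(3))
        opt = torch.optim.Adam([*Ui.parameters(), *Zi.parameters(),
                                *Zti.parameters()], lr=lr)
        b = torch.tensor(dB[:, i:i+1]).float()
        w = torch.tensor(dW[:, i:i+1]).float()
        for _ in range(epochs):                 # step (3): minimize hat L_i
            y, z, zt = Ui(feats), Zi(feats), Zti(feats)
            pred = y + r * y * dt + z * b + zt * w   # H_{t_i} with F = -r*y
            loss = ((hatU - pred) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            hatU = Ui(feats)                    # step (4): hatU_i = U_i(., theta*)
        nets[i] = (Ui, Zi, Zti)
    return nets
```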
Remark 4.2. Here, $(X^{(j)}, B^{(j)}, W^{(j)}, \widehat{W}^{(j)}, V^{(j)})_{1\le j\le J}$ are independent simulations of $(X, B, W, \widehat{W}, V)$. Noticing that $\mathscr{F}^W_t = \mathscr{F}^{W,\widehat{W}}_t$ for $t\in[0,T]$, by Proposition 4.2 and Remark 4.1 we have the functions in $\mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m})$ of the following form:
$$ \Phi_m\big(W_{t_1},\dots,W_{t_i},\widehat{W}_{t_1},\dots,\widehat{W}_{t_i},x\big), \qquad i = 0,1,2,\dots,N-1, $$
which incorporates all the simulated values of $(W,\widehat{W})$ until time $t_i$, leading to the changing dimension of the inputs. One may also see that the finer the partition of $[0,T]$ is, the higher the input dimension it involves. The changing and high dimensionality arising from the approximations prompts us to adopt a deep learning-based method, and this also unveils the difference from the scheme in [HPW19].

On the other hand, a convergence analysis of the above scheme is given in the appendix. Even though we are working with dimension-changing neural networks under a non-Markovian framework with different assumptions, we adopt a similar strategy to [HPW19] for the proof of the convergence analysis.

4.3 Numerical examples for the rough Bergomi model

European options. We consider the rough Bergomi model of [BFG16] in Example 1.1 with the following choice of parameters: $H = 0.07$, $\eta = 1.9$, $\rho = -0.9$, $r = 0.05$, $T = 1$, $X_0 = \ln(100)$. For simplicity, we choose the forward variance curve to be
$\xi(t)\equiv0.09$, independent of time. We compute the numerical approximations to the European option price given in (1.7). The value function $u$ together with another random field $\psi$ constitutes the unique solution to BSPDE (1.9), which corresponds to the BSPDE (2.1) in Theorem 2.5 with
$$ F_s(x,y,z,\tilde z) = -ry \quad \text{and} \quad G(e^x) = (K-e^{x+rT})^+. $$
By Theorems 2.4 and 2.5, the triple $(Y^{0,x}_t, Z^{0,x}_t, \tilde Z^{0,x}_t)_{t\in[0,T]}$ with
$$ Y^{0,x}_t := u_t(X^{0,x}_t), \qquad \tilde Z^{0,x}_t := \rho\sqrt{V_t}\,Du_t(X^{0,x}_t) + \psi_t(X^{0,x}_t), \qquad Z^{0,x}_t := \sqrt{(1-\rho^2)V_t}\,Du_t(X^{0,x}_t), $$
for $t\in[0,T]$, satisfies the following FBSDE:
$$ \begin{cases} dX^{0,x}_s = \sqrt{V_s}\big(\rho\,dW_s + \sqrt{1-\rho^2}\,dB_s\big) - \dfrac{V_s}{2}\,ds, \quad 0\le s\le T; \qquad X^{0,x}_0 = x; \\ V_s = \xi_s\,\mathcal{E}(\eta\widehat{W}_s) \quad \text{with } \widehat{W}_s = \displaystyle\int_0^s\sqrt{2H}\,(s-r)^{H-1/2}\,dW_r, \quad s\in[0,T]; \\ dY^{0,x}_s = rY^{0,x}_s\,ds + \tilde Z^{0,x}_s\,dW_s + Z^{0,x}_s\,dB_s, \quad s\in[0,T]; \qquad Y^{0,x}_T = G\big(e^{X^{0,x}_T}\big). \end{cases} \tag{4.5} $$
Then the deep learning-based method in Section 4.2 is used for the numerical approximations. We take $N = 20$ in the Euler scheme and set a single hidden layer whose number of neurons is equal to half of the total number of neurons in the input and output layers. We adopt the sigmoid function as the activation function, and the optimization algorithm is Adam. We implement 10000 trajectories in each mini-batch and check the loss convergence every 50 iterations. In Table 1, the reference values are calculated by the Monte Carlo method, and they are close to the results obtained by averaging 20 independent runs with the deep learning method.

Table 1: Reference values (Monte Carlo) and deep-learning estimates of the European put price, with relative standard deviations (RSD = standard deviation / average value), for strikes $K = 90, 100, 110, 120$.

To investigate the influence of the trajectories of $V$, we simulate 10000 independent trajectories of the stochastic variance process $V$ and evaluate the corresponding values of $u(0.5,\ln100)$ when $t = 0.5$, $x = \ln100$, and $K = 100$. The mean of these values of $u(0.5,\ln100)$ is about 9; four sample paths of $V$ with the associated values of $u(0.5,\ln100)$ are listed in Table 2. From Figure 1(a) and Table 2, one may see that bigger values of $V(0.5)$ are associated with bigger values of $u(0.5,\ln100)$. To isolate the influence of the trajectory of $V$, we reset the values of $V$ to be the same and equal to the average of the simulated values of $V(t)$ at time $t = 0.5$,
i.e., we fix $V(0.5)$ at this common average value. The mean of the resulting values of $u(0.5,\ln100)$ again turns out to be about 9, and four paths of $V$ with the associated values of $u(0.5,\ln100)$ are listed in Table 3. Comparing the obtained means, the standard deviations, and the four paths and associated values of $u(0.5,\ln100)$ in these two cases, we may see that the value of $V$ at $t = 0.5$ alone cannot determine $u(0.5,\ln100)$, which is different from the classical Markovian cases; this is due to the path-dependence and thus the non-Markovianity, i.e., the trajectory of $V$ before $t = 0.5$ influences $u(0.5,\ln100)$ in a non-negligible manner.

Figure 1: Different paths of $V$ on the time interval $[0,0.5]$: (a) paths of $V$ with different values at $t = 0.5$; (b) paths of $V$ with a fixed value at $t = 0.5$.

Table 2: Values of $u(0.5,\ln100)$ on the different paths of $V$ in Figure 1(a), each path reported with its value $V(0.5)$.

Table 3: Values of $u(0.5,\ln100)$ on the different paths of $V$ in Figure 1(b), all paths sharing the same value $V(0.5)$.

American options. Again, consider the rough Bergomi model in Example 1.1 with the following choice of parameters: $H = 0.07$, $\eta = 1.9$, $\rho = -0.9$, $r = 0.05$, $T = 1$, $X_0 = \ln(100)$. Also, we choose the forward variance curve to be
$\xi(t)\equiv0.09$, independent of time, for simplicity. The strike prices may take different values. Then, pricing the American put option is to compute
$$ u_0(x) := \sup_{\tau\in\mathcal{T}_0} E\Big[e^{-\tau r}\,g_\tau\big(e^{X^{0,x}_\tau}\big)\Big], \quad \text{with } g_\tau(e^x) = (K-e^{r\tau+x})^+, \quad (\tau,x)\in[0,T]\times\mathbb{R}. $$
We shall adopt two different schemes for the numerical approximations.

The first scheme is based on penalization. By Corollary 3.1, $u_0(x)$ may be approximated by $u^{\tilde N}_0(x)$ as $\tilde N$ tends to infinity, where the pair $(u^{\tilde N},\psi^{\tilde N})$ is the unique weak solution to BSPDE (2.1) with
$$ F_t(e^x,y,z,\tilde z) = -ry + \tilde N\big(g_t(e^x)-y\big)^+ \quad \text{and} \quad G(e^x) = g_T(e^x). $$
Then the first scheme is to use the algorithm in Section 4.2 to compute $u^{\tilde N}_0(X_0)$, which approximates $u_0(x)$ when $\tilde N$ tends to infinity.

The second scheme is based on the representation via the following forward-backward system:
$$ \begin{cases} dX^{0,x}_s = \sqrt{V_s}\big(\rho\,dW_s + \sqrt{1-\rho^2}\,dB_s\big) - \dfrac{V_s}{2}\,ds, \quad 0\le s\le T; \qquad X^{0,x}_0 = x; \\ V_s = \xi_s\,\mathcal{E}(\eta\widehat{W}_s) \quad \text{with } \widehat{W}_s = \displaystyle\int_0^s\sqrt{2H}\,(s-r)^{H-1/2}\,dW_r, \quad s\in[0,T]; \\ -dY^{0,x}_s = -rY^{0,x}_s\,ds + dA^{0,x}_s - Z^{0,x;B}_s\,dB_s - Z^{0,x;W}_s\,dW_s, \quad s\in[0,T]; \\ Y^{0,x}_T = g_T\big(e^{X^{0,x}_T}\big); \qquad Y^{0,x}_s \ge g_s\big(e^{X^{0,x}_s}\big), \quad s\in[0,T]; \\ A^{0,x}_\cdot \text{ is increasing and continuous, } A^{0,x}_0 = 0, \quad \displaystyle\int_0^T\big(Y^{0,x}_s - g_s(e^{X^{0,x}_s})\big)\,dA^{0,x}_s = 0. \end{cases} \tag{4.6} $$
Recalling assertion (v) in Corollary 3.1, which gives the representation
$$ u_\tau(X^{0,x}_\tau) = Y^{0,x}_\tau, \qquad \psi^B_\tau(X^{0,x}_\tau) = Z^{0,x;B}_\tau, \qquad \psi^W_\tau(X^{0,x}_\tau) = Z^{0,x;W}_\tau, \quad \text{a.s.}, $$
for $0\le\tau\le T$, for some triple $(u,\psi^B,\psi^W)$ defined on $(\Omega\times[0,T]\times\mathbb{R}, \mathscr{P}^W\otimes\mathcal{B}(\mathbb{R}))$, we may use the following scheme:
(1) Start with $\widehat{\mathcal{U}}_N = g_T$.
(2) For $i = N-1,\dots,0$,
given $\widehat{\mathcal{U}}_{i+1}$, use the triple of deep neural networks
$$ \big(\mathcal{U}_i(\cdot,\theta), \mathcal{Z}^B_i(\cdot,\theta), \mathcal{Z}^W_i(\cdot,\theta)\big) \in \mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m})\times\mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m})\times\mathcal{NN}^\varrho_{2i+1,1,M,m}(\mathbb{R}^{M_m}) \tag{4.7} $$
for the approximation of $\big(u_{t_i}(X_{t_i}), \psi^B_{t_i}(X_{t_i}), \psi^W_{t_i}(X_{t_i})\big)$, and obtain an estimate
$$ \mathcal{U}_{i+1} = \mathcal{U}_i(X_{t_i},\theta_i) + r\,\mathcal{U}_i(X_{t_i},\theta_i)\,\Delta t_i + \mathcal{Z}^B_i(X_{t_i},\theta_i)\,\Delta B_{t_i} + \mathcal{Z}^W_i(X_{t_i},\theta_i)\,\Delta W_{t_i}. $$
(3) Compute the minimizer of the expected quadratic loss function:
$$ \hat L_i(\theta) := E\big|\widehat{\mathcal{U}}_{i+1} - \mathcal{U}_{i+1}\big|^2, \qquad \theta^*_i\in\arg\min_{\theta\in\mathbb{R}^{M_m}}\hat L_i(\theta). $$
(4) Update $\widehat{\mathcal{U}}_i = \max\{\mathcal{U}_i(X_{t_i},\theta^*_i), g_{t_i}(X_{t_i})\}$.

The above scheme extends the one proposed in [HPW19, Section 3.3] from Markovian cases to a non-Markovian setting, with the main difference lying in the changing dimensions of the neural networks (4.7). Looking into the appendix for the convergence analysis of the scheme in Section 4.2, we may extend the convergence analysis in [HPW19, Section 4.3] to our non-Markovian setting; as such an extension is similar to that for the scheme in Section 4.2, the proof is omitted.

In Table 4, the estimates of the above two schemes are presented together with the reference values, which are lower-bound estimates from [BTW18]. We take $N = 20$ and implement a single hidden layer whose number of neurons is equal to half of the total number of neurons in the input and output layers. The activation function and optimization algorithm we use here are the sigmoid function and Adam. The results are obtained by averaging 20 independent runs. For the first scheme, in theory, $u^{\tilde N}_0(X_0)$ is (bigger and) closer to the real value than $u^{\bar N}_0(X_0)$ when $\tilde N > \bar N$, which is affirmed by the numerical experiments; we set $\tilde N$ equal to 40 and 10000 for comparisons. The same neural networks are put to use in the second scheme.
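For concreteness, the two American-specific modifications to the generic solver sketched after (4.4) can be written as follows; the helper names are ours. The penalized driver corresponds to the first scheme, and the max-update corresponds to step (4) of the second scheme.

```python
import torch

def g_put(x, t, K=100.0, r=0.05):
    """American put obstacle g_t(e^x) = (K - e^{x + r t})^+ for a tensor x."""
    return torch.clamp(K - torch.exp(x + r * t), min=0.0)

# first scheme: penalized driver F(e^x, y) = -r*y + N_pen*(g - y)^+ plugged into
# the one-step update H of Section 4.2; the rest of the solver is unchanged
def H_penalized(x, y, z, zt, db, dw, t, dt, N_pen=10000, r=0.05):
    F = -r * y + N_pen * torch.clamp(g_put(x, t) - y, min=0.0)
    return y - F * dt + z * db + zt * dw

# second scheme: after the Adam fit at time t_i, reflect the estimate on the
# obstacle, i.e., step (4): hatU_i = max{ U_i(., theta*), g_{t_i} }
def reflect_update(U_i_values, x, t):
    return torch.maximum(U_i_values, g_put(x, t))
```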
32 5 . . . . . . K = 100 8 .
51 9 . . . . . . K = 110 13 .
24 15 . . . . . . K = 120 20 22 . . . . . . ρ = η = 0 and keeping the other parameters unchanged, The estimates of the above twoschemes are compared with the option prices calculated by binprice function in the financialtoolbox of Matlab. It can be seen from Table 5 that our results are pretty close to the optionprice estimates by using the Cox-Ross-Rubinstein binomial model.Reference value 1st scheme 2nd scheme RSDN=40 RSD N=10000 RSD K = 90 5 . . . . . . . K = 100 9 . . . . . . . K = 110 15 . . . . . . . K = 120 22 . . . . . . . ρ = η = 0. A Convergence analysis
This section is devoted to a convergence analysis for the deep learning-based scheme proposed in Section 4.2. The discussions are conducted under Assumptions $(\mathcal{A}^*)$, 1.1, 2.1, and the following one:

(H1) (i) There exists a continuous and increasing function $\rho_0:[0,\infty)\to[0,\infty)$ with $\rho_0(0)=0$ such that for any $0\le t_1\le t_2\le T$, it holds that
$$ E\bigg[\int_{t_1}^{t_2} V_s\,ds\bigg] + E\bigg[\bigg(\int_{t_1}^{t_2} V_s\,ds\bigg)^2\bigg] \le \rho_0(|t_1-t_2|). $$
(ii) There exists a constant $L_0>0$ such that
$$ \big|F_{t_1}(e^{x_1},y_1,z_1,\tilde z_1) - F_{t_2}(e^{x_2},y_2,z_2,\tilde z_2)\big| \le L_0\Big(\sqrt{\rho_0(|t_1-t_2|)} + |x_1-x_2| + |y_1-y_2| + |z_1-z_2| + |\tilde z_1-\tilde z_2|\Big), $$
for all $(t_1,x_1,y_1,z_1,\tilde z_1)$ and $(t_2,x_2,y_2,z_2,\tilde z_2)$ in $[0,T]\times\mathbb{R}^4$.

Remark A.1.
In fact, for examples like 1.1 and 1.2, one has
$$ E\bigg[\int_0^T |V_t|^p\,dt\bigg] < \infty, \quad \text{for some } p>2, $$
which, by Hölder's inequality, implies
$$ E\int_{t_1}^{t_2} V_t\,dt \le |t_1-t_2|^{\frac{p-1}{p}}\bigg(E\int_{t_1}^{t_2}|V_t|^p\,dt\bigg)^{1/p} \le C_p\,|t_1-t_2|^{\frac{p-1}{p}}, \qquad \text{for } 0\le t_1\le t_2\le T, $$
$$ E\bigg[\bigg(\int_{t_1}^{t_2} V_t\,dt\bigg)^2\bigg] \le |t_1-t_2|^{\frac{2(p-1)}{p}}\bigg(E\int_{t_1}^{t_2}|V_t|^p\,dt\bigg)^{2/p} \le C_p^2\,|t_1-t_2|^{\frac{2(p-1)}{p}}, \qquad \text{for } 0\le t_1\le t_2\le T, $$
and thus we may take
$$ \rho_0(r) = (C_p + C_p^2)\cdot\Big(|r|^{\frac{p-1}{p}} \vee |r|^{\frac{2(p-1)}{p}}\Big), \qquad \text{for } r\ge0. $$
Further, one may straightforwardly check that the numerical examples discussed in Section 4.3 satisfy Assumption (H1).

In what follows, we denote by $C$ a positive generic constant whose value is independent of $\pi$ and may vary from line to line; by $\mathcal{X}$ we denote the unique (strong) solution to the SDE (1.8) started at $t=0$, and by $X = X^\pi$ the Euler-Maruyama approximation with a time grid $\pi = \{t_0 = 0 < t_1 < \dots < t_N = T\}$, with modulus $|\pi| = \max_{1\le i\le N}|t_i-t_{i-1}|$ bounded by $\frac{CT}{N}$ for some constant $C$. Under Assumptions 1.1 and (H1), standard calculations yield that
$$ E\bigg[\sup_{0\le t\le T}|\mathcal{X}_t|^2\bigg] \le C(1+|x|^2), \tag{A.1} $$
$$ \max_{i=0,\dots,N-1} E\bigg[|\mathcal{X}_{t_{i+1}} - X_{t_{i+1}}|^2 + \sup_{t\in[t_i,t_{i+1}]}|\mathcal{X}_t - \mathcal{X}_{t_i}|^2\bigg] \le C\rho_0(|\pi|). \tag{A.2} $$
By the theory of BSDEs (see [BDH+
03] for instance), Assumptions 1.1, 2.1, and (H1) imply the existence and uniqueness of an adapted $L^2$-solution $(Y,Z,\tilde Z)$ to BSDE (2.2), which together with (A.1) and (H1)(ii) gives
$$ E\bigg[\int_0^T\big|F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \tilde Z_t)\big|^2\,dt\bigg] < \infty \tag{A.3} $$
and the standard $L^2$-regularity result on $Y$:
$$ \max_{i=0,\dots,N-1} E\bigg[\sup_{t\in[t_i,t_{i+1}]}|Y_t - Y_{t_i}|^2\bigg] = O(|\pi|). \tag{A.4} $$
For the pair $(Z,\tilde Z)$, set
$$ \varepsilon^Z(\pi) := E\bigg[\sum_{i=0}^{N-1}\int_{t_i}^{t_{i+1}}|Z_t - \bar Z_{t_i}|^2\,dt\bigg], \quad \text{with } \bar Z_{t_i} := \frac{1}{\Delta t_i}E_i\bigg[\int_{t_i}^{t_{i+1}} Z_t\,dt\bigg]; \qquad \varepsilon^{\tilde Z}(\pi) := E\bigg[\sum_{i=0}^{N-1}\int_{t_i}^{t_{i+1}}|\tilde Z_t - \bar{\tilde Z}_{t_i}|^2\,dt\bigg], \quad \text{with } \bar{\tilde Z}_{t_i} := \frac{1}{\Delta t_i}E_i\bigg[\int_{t_i}^{t_{i+1}}\tilde Z_t\,dt\bigg], \tag{A.5} $$
where $E_i$ denotes the conditional expectation given $\mathscr{F}_{t_i}$.

To investigate the convergence of the deep learning scheme, we define, for $i=0,\dots,N-1$,
$$ \begin{cases} \widehat V_{t_i} := E_i\big[\widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})\big] + F_{t_i}\big(e^{X_{t_i}}, \widehat V_{t_i}, \widehat Z_{t_i}, \widehat{\tilde Z}_{t_i}\big)\,\Delta t_i, \\ \widehat Z_{t_i} := \dfrac{1}{\Delta t_i}E_i\big[\widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})\,\Delta B_{t_i}\big], \qquad \widehat{\tilde Z}_{t_i} := \dfrac{1}{\Delta t_i}E_i\big[\widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})\,\Delta W_{t_i}\big], \end{cases} \tag{A.6} $$
where $\widehat V_{t_i}$ is well-defined for sufficiently small $|\pi|$ due to the uniform Lipschitz continuity of $F$. In view of Theorem 2.4, we may find $\mathscr{F}^W_{t_i}\otimes\mathcal{B}(\mathbb{R})$-measurable functions $\hat v_i$, $\hat z_i$, and $\hat{\tilde z}_i$ s.t.
$$ \widehat V_{t_i} = \hat v_i(X_{t_i}), \qquad \widehat Z_{t_i} = \hat z_i(X_{t_i}), \qquad \widehat{\tilde Z}_{t_i} = \hat{\tilde z}_i(X_{t_i}), \qquad i=0,\dots,N-1. \tag{A.7} $$
On the other hand, by the martingale representation theorem, there exist two $\mathbb{R}$-valued square integrable processes $\{\widehat Z_t\}$ and $\{\widehat{\tilde Z}_t\}$ s.t.
$$ \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}}) = \widehat V_{t_i} - F_{t_i}\big(e^{X_{t_i}}, \widehat V_{t_i}, \widehat Z_{t_i}, \widehat{\tilde Z}_{t_i}\big)\,\Delta t_i + \int_{t_i}^{t_{i+1}}\widehat Z_t\,dB_t + \int_{t_i}^{t_{i+1}}\widehat{\tilde Z}_t\,dW_t, \tag{A.8} $$
and Itô's isometry gives
$$ \widehat Z_{t_i} = \frac{1}{\Delta t_i}E_i\bigg[\int_{t_i}^{t_{i+1}}\widehat Z_t\,dt\bigg], \qquad \widehat{\tilde Z}_{t_i} = \frac{1}{\Delta t_i}E_i\bigg[\int_{t_i}^{t_{i+1}}\widehat{\tilde Z}_t\,dt\bigg], \qquad i=0,\dots,N-1. $$
The distance between the optimal triple $(\widehat{\mathcal{U}}_i, \widehat{\mathcal{Z}}_i, \widehat{\widetilde{\mathcal{Z}}}_i)$ from the deep learning-based scheme and $(\widehat V_{t_i}, \widehat Z_{t_i}, \widehat{\tilde Z}_{t_i})$ from the system (A.6) is given as follows.

Lemma A.1.
The distance between the optimal triple $(\widehat{\mathcal{U}}_i, \widehat{\mathcal{Z}}_i, \widehat{\widetilde{\mathcal{Z}}}_i)$ from the deep learning-based scheme and $(\widehat{V}_{t_i}, \widehat{Z}_{t_i}, \widehat{\widetilde{Z}}_{t_i})$ from the system (A.6) is estimated as follows.

Lemma A.1. Let Assumptions $(\mathcal{A}^*)$, 1.1, 2.1, and (H1) hold. When $|\pi|$ is sufficiently small, we have
\[
E|\widehat{V}_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 + \Delta t_i E\left[|\widehat{Z}_{t_i} - \widehat{\mathcal{Z}}_i(X_{t_i})|^2 + |\widehat{\widetilde{Z}}_{t_i} - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2\right] \le C\varepsilon^{N,v}_i + C\Delta t_i \varepsilon^{N,z}_i + C\Delta t_i \varepsilon^{N,\tilde{z}}_i, \tag{A.9}
\]
where we use $\varepsilon^{N,v}_i := \inf_{\xi} E|\hat{v}_i(X_{t_i}) - \mathcal{U}_i(X_{t_i}; \xi)|^2$, $\varepsilon^{N,z}_i := \inf_{\eta} E|\hat{z}_i(X_{t_i}) - \mathcal{Z}_i(X_{t_i}; \eta)|^2$, and $\varepsilon^{N,\tilde{z}}_i := \inf_{\tilde{\eta}} E|\hat{\tilde{z}}_i(X_{t_i}) - \widetilde{\mathcal{Z}}_i(X_{t_i}; \tilde{\eta})|^2$ to denote the $L^2$-approximation errors of $\hat{v}_i$, $\hat{z}_i$, and $\hat{\tilde{z}}_i$ by neural networks $\mathcal{U}_i$, $\mathcal{Z}_i$, and $\widetilde{\mathcal{Z}}_i$, for $i = 0, \dots, N-1$.

To focus on the convergence analysis, we postpone the proof of Lemma A.1. Define the following square error:
\[
\mathcal{E}[(\widehat{\mathcal{U}}, \widehat{\mathcal{Z}}, \widehat{\widetilde{\mathcal{Z}}}), (Y, Z, \widetilde{Z})] = \max_{i=0,\dots,N-1} E\left[|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2\right] + E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} |Z_t - \widehat{\mathcal{Z}}_i(X_{t_i})|^2 \, dt\right] + E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2 \, dt\right].
\]
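The errors $\varepsilon^{N,v}_i$, $\varepsilon^{N,z}_i$, and $\varepsilon^{N,\tilde{z}}_i$ quantify how well the chosen network classes represent $\hat{v}_i$, $\hat{z}_i$, and $\hat{\tilde{z}}_i$; by universal approximation [HSW89, HSW90] they can in principle be made arbitrarily small by enlarging the networks. A minimal sketch of such a per-step family is the following (in PyTorch; the width, depth, activation, and the use of the discretized history as input are our assumptions, the last reflecting the dimension-varying networks mentioned after Theorem A.2).

```python
import torch
import torch.nn as nn

class StepNet(nn.Module):
    """Feedforward approximator for one of v_i, z_i, or z~_i at a fixed step.

    Width and depth are illustrative choices, not the paper's architecture.
    """
    def __init__(self, in_dim, width=32, depth=2):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def make_step_networks(i):
    """Three networks per time step t_i, as in Lemma A.1.

    The input dimension grows with i (here: the discretized history up to
    t_i, which is one reading of the "dimension-varying" networks).
    """
    in_dim = i + 1
    return StepNet(in_dim), StepNet(in_dim), StepNet(in_dim)
```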
Theorem A.2. Under Assumptions $(\mathcal{A}^*)$, 1.1, 2.1, and (H1), it holds that
\[
\mathcal{E}[(\widehat{\mathcal{U}}, \widehat{\mathcal{Z}}, \widehat{\widetilde{\mathcal{Z}}}), (Y, Z, \widetilde{Z})] \le C\left\{E|G(\mathcal{X}_T) - G(X_T)|^2 + \rho(|\pi|) + |\pi| + \varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + \sum_{i=0}^{N-1} (N\varepsilon^{N,v}_i + \varepsilon^{N,z}_i + \varepsilon^{N,\tilde{z}}_i)\right\}, \tag{A.10}
\]
where the constant $C$ is independent of the partition $\pi$.

The computations involved in the proofs of Lemma A.1 and Theorem A.2 are conducted in a similar way to [HPW19, Section 4.1] by Hur\'e, Pham, and Warin, with the main differences lying in the approximation of the random variables by dimension-varying neural networks and in the general modulus function $\rho(|\pi|)$. We provide the proofs for the reader's convenience.

Proof of Theorem A.2. Step 1.
We first derive a recursive estimate for the square norm of $Y_{t_i} - \widehat{V}_{t_i}$, namely
\[
E|Y_{t_i} - \widehat{V}_{t_i}|^2 \le (1 + C|\pi|) E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 + C|\pi| E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right] + CE\left[\int_{t_i}^{t_{i+1}} \left(|\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2 + |Z_t - \bar{Z}_{t_i}|^2\right) dt\right] + C\rho(|\pi|)|\pi|, \tag{A.11}
\]
for each $i \in \{0, \dots, N-1\}$.

In view of (2.2) and (A.6), we have
\[
Y_{t_i} - \widehat{V}_{t_i} = E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})] + E_i\left[\int_{t_i}^{t_{i+1}} F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t) - F_{t_i}(e^{X_{t_i}}, \widehat{V}_{t_i}, \widehat{Z}_{t_i}, \widehat{\widetilde{Z}}_{t_i}) \, dt\right].
\]
Young's inequality gives $(a + b)^2 \le (1 + \gamma\Delta t_i)a^2 + (1 + \frac{1}{\gamma\Delta t_i})b^2$ for any $a, b \in \mathbb{R}$ and $\gamma > 0$. This, combined with the Lipschitz continuity of $F$ in (H1) and the estimate (A.2) on the forward process, implies that
\[
\begin{aligned}
E|Y_{t_i} - \widehat{V}_{t_i}|^2
&\le E\left\{(1 + \gamma\Delta t_i)\left(E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]\right)^2 + \left(1 + \frac{1}{\gamma\Delta t_i}\right)\left(E_i\left[\int_{t_i}^{t_{i+1}} \big(F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t) - F_{t_i}(e^{X_{t_i}}, \widehat{V}_{t_i}, \widehat{Z}_{t_i}, \widehat{\widetilde{Z}}_{t_i})\big) \, dt\right]\right)^2\right\} \\
&\le (1 + \gamma\Delta t_i) E\left[|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right] + 5\left(1 + \frac{1}{\gamma\Delta t_i}\right) L^2 \Delta t_i \left\{C\rho(|\pi|)|\pi| + E\left[\int_{t_i}^{t_{i+1}} |Y_t - \widehat{V}_{t_i}|^2 \, dt\right] + E\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \widehat{Z}_{t_i}|^2 + |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2\right) dt\right]\right\} \\
&\le (1 + \gamma\Delta t_i) E\left[|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right] + \frac{5(1 + \gamma\Delta t_i)L^2}{\gamma} \left\{C\rho(|\pi|)|\pi| + 2\Delta t_i E|Y_{t_i} - \widehat{V}_{t_i}|^2 + E\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \widehat{Z}_{t_i}|^2 + |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2\right) dt\right]\right\},
\end{aligned} \tag{A.12}
\]
where the $L^2$-regularity (A.4) of $Y$ is used in the last inequality.

Recalling that $\bar{Z}$ and $\bar{\widetilde{Z}}$ are the $L^2$-projections of $Z$ and $\widetilde{Z}$ respectively, we have
\[
E\left[\int_{t_i}^{t_{i+1}} |Z_t - \widehat{Z}_{t_i}|^2 \, dt\right] = E\left[\int_{t_i}^{t_{i+1}} |Z_t - \bar{Z}_{t_i}|^2 \, dt\right] + \Delta t_i E\left[|\bar{Z}_{t_i} - \widehat{Z}_{t_i}|^2\right],
\]
\[
E\left[\int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2 \, dt\right] = E\left[\int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2 \, dt\right] + \Delta t_i E\left[|\bar{\widetilde{Z}}_{t_i} - \widehat{\widetilde{Z}}_{t_i}|^2\right]. \tag{A.13}
\]
Integrating equation (2.2) over the time interval $[t_i, t_{i+1}]$ and multiplying by $\Delta W_{t_i}$ and $\Delta B_{t_i}$ respectively, together with (A.6), gives
\[
\Delta t_i \left(\bar{\widetilde{Z}}_{t_i} - \widehat{\widetilde{Z}}_{t_i}\right) = E_i\left[\Delta W_{t_i}\left(Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}}) - E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]\right)\right] + E_i\left[\Delta W_{t_i} \int_{t_i}^{t_{i+1}} F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t) \, dt\right],
\]
\[
\Delta t_i \left(\bar{Z}_{t_i} - \widehat{Z}_{t_i}\right) = E_i\left[\Delta B_{t_i}\left(Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}}) - E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]\right)\right] + E_i\left[\Delta B_{t_i} \int_{t_i}^{t_{i+1}} F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t) \, dt\right].
\]
Standard computations further indicate that
\[
\Delta t_i E\left[|\bar{Z}_{t_i} - \widehat{Z}_{t_i}|^2\right] \le 2\left(E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 - E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right) + 2\Delta t_i E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right]; \tag{A.14}
\]
an analogous estimate holds for $\widetilde{Z}$.
Then, plugging (A.13) and (A.14) into (A.12) and choosing $\gamma = 20L^2$, we have
\[
\begin{aligned}
E\left[|Y_{t_i} - \widehat{V}_{t_i}|^2\right]
&\le (1 + \gamma\Delta t_i) E\left[|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right] + \frac{5(1 + \gamma\Delta t_i)L^2}{\gamma} \bigg\{C\rho(|\pi|)|\pi| + 2\Delta t_i E|Y_{t_i} - \widehat{V}_{t_i}|^2 \\
&\qquad + E\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \bar{Z}_{t_i}|^2 + |\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2\right) dt\right] + 4\left(E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 - E\left[|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right]\right) \\
&\qquad + 4\Delta t_i E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right]\bigg\} \\
&\le C\rho(|\pi|)|\pi| + (1 + C|\pi|) E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 + C\Delta t_i E|Y_{t_i} - \widehat{V}_{t_i}|^2 \\
&\qquad + CE\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \bar{Z}_{t_i}|^2 + |\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2\right) dt\right] + C\Delta t_i E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right],
\end{aligned} \tag{A.15}
\]
where the choice of $\gamma$ makes the terms involving $E|E_i[\,\cdot\,]|^2$ cancel; this implies (A.11) when $|\pi|$ is sufficiently small.

Step 2.
We prove the estimate for the $Y$-component in (A.10), i.e.,
\[
\max_{i=0,\dots,N-1} E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 \le C\rho(|\pi|) + CE|G(\mathcal{X}_T) - G(X_T)|^2 + C\varepsilon_Z(\pi) + C\varepsilon_{\widetilde{Z}}(\pi) + C\sum_{i=0}^{N-1}(N\varepsilon^{N,v}_i + \varepsilon^{N,z}_i + \varepsilon^{N,\tilde{z}}_i). \tag{A.16}
\]
Indeed, using Young's inequality in the form
\[
(a + b)^2 \ge (1 - |\pi|)a^2 + \left(1 - \frac{1}{|\pi|}\right)b^2 \ge (1 - |\pi|)a^2 - \frac{1}{|\pi|}b^2,
\]
we have
\[
E|Y_{t_i} - \widehat{V}_{t_i}|^2 = E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i}) + \widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2 \ge (1 - |\pi|)E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 - \frac{1}{|\pi|}E|\widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2. \tag{A.17}
\]
Plugging the above inequality into (A.11) and letting $|\pi|$ be small enough yields
\[
E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 \le C\rho(|\pi|)|\pi| + (1 + C|\pi|)E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 + CE\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \bar{Z}_{t_i}|^2 + |\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2\right) dt\right] + C|\pi| E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right] + CN E|\widehat{V}_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2. \tag{A.18}
\]
Recalling that $Y_{t_N} = G(\mathcal{X}_T)$ and $\widehat{\mathcal{U}}_N(X_{t_N}) = G(X_T)$, and using (A.3), we may apply the discrete Gronwall inequality to reach the following estimate:
\[
\max_{i=0,\dots,N-1} E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 \le C\left\{\rho(|\pi|) + |\pi| + E|G(\mathcal{X}_T) - G(X_T)|^2 + \varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + N\sum_{i=0}^{N-1} E|\widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2\right\}, \tag{A.19}
\]
which combined with Lemma A.1 gives (A.16).
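For the reader's convenience, the discrete Gronwall inequality is used here in the following standard form, stated in the notation of (A.18):
\[
a_i \le (1 + C|\pi|)\,a_{i+1} + b_i, \quad i = 0, \dots, N-1, \qquad \Longrightarrow \qquad \max_{0 \le i \le N-1} a_i \le e^{CN|\pi|}\Big(a_N + \sum_{i=0}^{N-1} b_i\Big),
\]
with $a_i = E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2$ and $b_i$ collecting the remaining terms on the right-hand side of (A.18); since $N|\pi| \le CT$, the factor $e^{CN|\pi|}$ is absorbed into the constant in (A.19).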
Step 3. We prove the estimate for the $(Z, \widetilde{Z})$-component in (A.10), i.e.,
\[
E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} \left(|Z_t - \widehat{\mathcal{Z}}_i(X_{t_i})|^2 + |\widetilde{Z}_t - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2\right) dt\right] \le C\left\{\varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + \rho(|\pi|) + |\pi| + E|G(\mathcal{X}_T) - G(X_T)|^2 + \sum_{i=0}^{N-1}(N\varepsilon^{N,v}_i + \varepsilon^{N,z}_i + \varepsilon^{N,\tilde{z}}_i)\right\}.
\]
From (A.13) and (A.14), it follows that for any $i = 0, \dots, N-1$,
\[
E\left[\int_{t_i}^{t_{i+1}} |Z_t - \widehat{Z}_{t_i}|^2 \, dt\right] \le E\left[\int_{t_i}^{t_{i+1}} |Z_t - \bar{Z}_{t_i}|^2 \, dt\right] + 2\left(E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 - E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right) + 2|\pi| E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right],
\]
which, together with (A.3), gives
\[
E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} |Z_t - \widehat{Z}_{t_i}|^2 \, dt\right] \le \varepsilon_Z(\pi) + 2E|G(\mathcal{X}_T) - G(X_T)|^2 + 2\sum_{i=0}^{N-1}\left(E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 - E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right) + C|\pi|, \tag{A.20}
\]
where the indices are shifted in the last summation. Analogously,
\[
E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2 \, dt\right] \le \varepsilon_{\widetilde{Z}}(\pi) + 2E|G(\mathcal{X}_T) - G(X_T)|^2 + 2\sum_{i=0}^{N-1}\left(E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 - E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right) + C|\pi|. \tag{A.21}
\]
Notice that by (A.12) and (A.17) we have
\[
\begin{aligned}
&2\left(E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 - E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right) \\
&\quad\le \frac{2}{1 - |\pi|}\bigg\{(1 + \gamma\Delta t_i)E\left[|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2\right] + \frac{5(1 + \gamma\Delta t_i)L^2}{\gamma}\Big(C\rho(|\pi|)|\pi| + 2\Delta t_i E|Y_{t_i} - \widehat{V}_{t_i}|^2 \\
&\qquad\quad + E\Big[\int_{t_i}^{t_{i+1}} |Z_t - \widehat{Z}_{t_i}|^2 \, dt\Big] + E\Big[\int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2 \, dt\Big]\Big)\bigg\} - 2E|E_i[Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})]|^2 \\
&\qquad\quad + \frac{3}{|\pi|(1 - |\pi|)} E|\widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2. \tag{A.22}
\end{aligned}
\]
Take $\gamma = 50L^2$, so that $\frac{5L^2}{\gamma}\cdot\frac{1 + \gamma|\pi|}{1 - |\pi|} \le \frac{1}{4}$ for $|\pi|$ small enough, and notice that $\big[\frac{1 + \gamma|\pi|}{1 - |\pi|} - 1\big] = O(|\pi|)$.
This, together with (A.3), (A.9), (A.11), (A.16), and (A.20), yields
\[
\begin{aligned}
&E\left[\sum_{i=0}^{N-1} \int_{t_i}^{t_{i+1}} \left(|Z_t - \widehat{Z}_{t_i}|^2 + |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2\right) dt\right] \\
&\quad\le \varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + C\max_{i=0,\dots,N} E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 + C\rho(|\pi|) + CE|G(\mathcal{X}_T) - G(X_T)|^2 \\
&\qquad + C|\pi|\sum_{i=0}^{N-1} E|Y_{t_i} - \widehat{V}_{t_i}|^2 + CN\sum_{i=0}^{N-1} E|\widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2 + C|\pi| \\
&\quad\le \varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + C\max_{i=0,\dots,N} E|Y_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 + C\rho(|\pi|) + C|\pi| \\
&\qquad + C|\pi|\sum_{i=0}^{N-1}\bigg\{C\rho(|\pi|)|\pi| + CE\left[\int_{t_i}^{t_{i+1}} \left(|Z_t - \bar{Z}_{t_i}|^2 + |\widetilde{Z}_t - \bar{\widetilde{Z}}_{t_i}|^2\right) dt\right] + (1 + C|\pi|)E|Y_{t_{i+1}} - \widehat{\mathcal{U}}_{i+1}(X_{t_{i+1}})|^2 \\
&\qquad\qquad + C|\pi| E\left[\int_{t_i}^{t_{i+1}} |F_t(e^{\mathcal{X}_t}, Y_t, Z_t, \widetilde{Z}_t)|^2 \, dt\right]\bigg\} + CN\sum_{i=0}^{N-1} E|\widehat{\mathcal{U}}_i(X_{t_i}) - \widehat{V}_{t_i}|^2 \\
&\quad\le C\left\{\varepsilon_Z(\pi) + \varepsilon_{\widetilde{Z}}(\pi) + \rho(|\pi|) + |\pi| + E|G(\mathcal{X}_T) - G(X_T)|^2 + \sum_{i=0}^{N-1}(N\varepsilon^{N,v}_i + \varepsilon^{N,z}_i + \varepsilon^{N,\tilde{z}}_i)\right\}.
\end{aligned} \tag{A.23}
\]
Finally, noticing the relations
\[
E\left[\int_{t_i}^{t_{i+1}} |Z_t - \widehat{\mathcal{Z}}_i(X_{t_i})|^2 \, dt\right] \le 2E\left[\int_{t_i}^{t_{i+1}} |Z_t - \widehat{Z}_{t_i}|^2 \, dt\right] + 2\Delta t_i E|\widehat{Z}_{t_i} - \widehat{\mathcal{Z}}_i(X_{t_i})|^2,
\]
\[
E\left[\int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2 \, dt\right] \le 2E\left[\int_{t_i}^{t_{i+1}} |\widetilde{Z}_t - \widehat{\widetilde{Z}}_{t_i}|^2 \, dt\right] + 2\Delta t_i E|\widehat{\widetilde{Z}}_{t_i} - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2,
\]
and using (A.9) and (A.23), we obtain, by summing over $i = 0, \dots, N-1$, the desired error estimate for the $(Z, \widetilde{Z})$-component, completing the proof. $\square$

Finally, we prove the claim in Lemma A.1.
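As context for the quadratic loss $\widehat{L}_i(\theta)$ minimized in that proof, the following PyTorch training loop sketches one per-step optimization. It is our own illustration, consistent with the one-step relation (A.8); the signature of F, the optimizer, the learning rate, and the iteration count are assumptions rather than the paper's specification.

```python
import torch

def train_step_i(U_next_vals, X_i, dB, dW, dt, F, nets, n_iter=2000, lr=1e-3):
    """Minimize the empirical quadratic loss over theta (a sketch).

    U_next_vals: samples of U_hat_{i+1}(X_{t_{i+1}}) on simulated paths;
    X_i: network input at t_i; F(u, z, z_tilde): generator at t_i, applied
    pathwise. nets = (U, Z, Z_tilde) are the three step-i networks.
    """
    U, Z, Z_tilde = nets
    params = (list(U.parameters()) + list(Z.parameters())
              + list(Z_tilde.parameters()))
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        u = U(X_i).squeeze(-1)
        z = Z(X_i).squeeze(-1)
        zt = Z_tilde(X_i).squeeze(-1)
        # one-step residual of (A.8) with the networks in place of
        # (V_hat_{t_i}, Z_hat_{t_i}, tilde-Z_hat_{t_i})
        residual = U_next_vals - u + F(u, z, zt) * dt - z * dB - zt * dW
        loss = residual.pow(2).mean()
        loss.backward()
        opt.step()
    return nets
```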
Proof of Lemma A.1. Fix $i \in \{0, \dots, N-1\}$. Using relation (A.8) in the expression of the expected quadratic loss function, and recalling the definitions of $\widehat{Z}_{t_i}$ and $\widehat{\widetilde{Z}}_{t_i}$ as the $L^2$-projections of $\widehat{Z}_t$ and $\widehat{\widetilde{Z}}_t$, we have, for all parameters $\theta$ of the neural networks $\mathcal{U}_i(\cdot\,; \theta)$, $\mathcal{Z}_i(\cdot\,; \theta)$, and $\widetilde{\mathcal{Z}}_i(\cdot\,; \theta)$,
\[
\widehat{L}_i(\theta) = \widetilde{L}_i(\theta) + E\left[\int_{t_i}^{t_{i+1}} \left(|\widehat{Z}_t - \widehat{Z}_{t_i}|^2 + |\widehat{\widetilde{Z}}_t - \widehat{\widetilde{Z}}_{t_i}|^2\right) dt\right], \tag{A.24}
\]
with
\[
\begin{aligned}
\widetilde{L}_i(\theta) := {}& E\Big[\big|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta) + \big(F_{t_i}(e^{X_{t_i}}, \mathcal{U}_i(X_{t_i}; \theta), \mathcal{Z}_i(X_{t_i}; \theta), \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)) - F_{t_i}(e^{X_{t_i}}, \widehat{V}_{t_i}, \widehat{Z}_{t_i}, \widehat{\widetilde{Z}}_{t_i})\big)\Delta t_i\big|^2\Big] \\
&+ \Delta t_i E\left[|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2\right] + \Delta t_i E\left[|\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2\right].
\end{aligned} \tag{A.25}
\]
Using Young's inequality $(a + b)^2 \le (1 + \gamma\Delta t_i)a^2 + (1 + \frac{1}{\gamma\Delta t_i})b^2$, together with the Lipschitz condition on $F$ in (H1), we see that
\[
\widetilde{L}_i(\theta) \le (1 + C\Delta t_i)E|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta)|^2 + C\Delta t_i E\left[|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2 + |\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2\right]. \tag{A.26}
\]
On the other hand, using Young's inequality in the form $(a + b)^2 \ge (1 - \gamma\Delta t_i)a^2 + (1 - \frac{1}{\gamma\Delta t_i})b^2 \ge (1 - \gamma\Delta t_i)a^2 - \frac{1}{\gamma\Delta t_i}b^2$, together with the Lipschitz condition on $F$, gives
\[
\begin{aligned}
\widetilde{L}_i(\theta) \ge {}& (1 - \gamma\Delta t_i)E|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta)|^2 - \frac{3L^2\Delta t_i}{\gamma}\left(E|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta)|^2 + E|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2 + E|\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2\right) \\
&+ \Delta t_i E|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2 + \Delta t_i E|\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2.
\end{aligned} \tag{A.27}
\]
Choosing $\gamma = 6L^2$, this yields
\[
\widetilde{L}_i(\theta) \ge (1 - C\Delta t_i)E|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta)|^2 + \frac{\Delta t_i}{2} E\left[|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2 + |\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2\right]. \tag{A.28}
\]
For each $i \in \{0, \dots, N-1\}$, take $\theta^*_i \in \arg\min_{\theta} \widehat{L}_i(\theta)$, so that $\widehat{\mathcal{U}}_i = \mathcal{U}_i(\cdot\,; \theta^*_i)$, $\widehat{\mathcal{Z}}_i = \mathcal{Z}_i(\cdot\,; \theta^*_i)$, and $\widehat{\widetilde{\mathcal{Z}}}_i = \widetilde{\mathcal{Z}}_i(\cdot\,; \theta^*_i)$. As the second term on the right-hand side of (A.24) is independent of the parameters $\theta$, it also holds that $\theta^*_i \in \arg\min_{\theta} \widetilde{L}_i(\theta)$. Combining (A.28) and (A.26) implies that for all $\theta$,
\[
\begin{aligned}
&(1 - C\Delta t_i)E|\widehat{V}_{t_i} - \widehat{\mathcal{U}}_i(X_{t_i})|^2 + \frac{\Delta t_i}{2} E\left[|\widehat{Z}_{t_i} - \widehat{\mathcal{Z}}_i(X_{t_i})|^2 + |\widehat{\widetilde{Z}}_{t_i} - \widehat{\widetilde{\mathcal{Z}}}_i(X_{t_i})|^2\right] \\
&\quad\le \widetilde{L}_i(\theta^*_i) \le \widetilde{L}_i(\theta) \le (1 + C\Delta t_i)E|\widehat{V}_{t_i} - \mathcal{U}_i(X_{t_i}; \theta)|^2 + C\Delta t_i E\left[|\widehat{Z}_{t_i} - \mathcal{Z}_i(X_{t_i}; \theta)|^2 + |\widehat{\widetilde{Z}}_{t_i} - \widetilde{\mathcal{Z}}_i(X_{t_i}; \theta)|^2\right].
\end{aligned} \tag{A.29}
\]
By (A.7), taking the infimum over $\theta$ and letting $|\pi|$ be sufficiently small gives (A.9). $\square$

References
[ALV07] Elisa Alòs, Jorge A. León, and Josep Vives. On the short-time behavior of the implied volatility for jump-diffusion models with stochastic volatility. Finance and Stochastics, 11(4):571–589, 2007.

[BD14] Christian Bender and Nikolai Dokuchaev. A first-order BSPDE for swing option pricing. Math. Finance, 2014. DOI: 10.1111/mafi.12067.

[BDH+03] P. Briand, B. Delyon, Y. Hu, E. Pardoux, and L. Stoica. $L^p$ solutions of backward stochastic differential equations. Stoch. Process. Appl., 108(4):604–618, 2003.

[BFG16] Christian Bayer, Peter Friz, and Jim Gatheral. Pricing under rough volatility. Quantitative Finance, 16(6):887–904, 2016.

[BFG+19] Christian Bayer, Peter K. Friz, Paul Gassiat, Jörg Martin, and Benjamin Stemper. A regularity structure for rough volatility. Mathematical Finance, 2019.

[BHM+19] Christian Bayer, Blanka Horvath, Aitor Muguruza, Benjamin Stemper, and Mehdi Tomas. On deep calibration of (rough) stochastic volatility models. arXiv preprint arXiv:1908.08806, 2019.

[BL08] Rainer Buckdahn and Juan Li. Stochastic differential games and viscosity solutions of Hamilton-Jacobi-Bellman-Isaacs equations. SIAM J. Control Optim., 47(1):444–475, 2008.

[BTW18] Christian Bayer, Raúl Tempone, and Sören Wolfers. Pricing American options by exercise rate optimization. arXiv preprint arXiv:1809.07300, 2018.

[CCR12] Fabienne Comte, Laure Coutin, and Eric Renault. Affine fractional stochastic volatility models. Annals of Finance, 8(2-3):337–378, 2012.

[CH05] Alexander M. G. Cox and David G. Hobson. Local martingales, bubbles and option prices. Finance and Stochastics, 9(4):477–492, 2005.

[DPZ14] Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, 2014.

[DQT11] Kai Du, Jinniao Qiu, and Shanjian Tang. $L^p$ theory for super-parabolic backward stochastic partial differential equations in the whole space. Appl. Math. Optim., 65(2):175–219, 2011.

[EEFR18] Omar El Euch, Masaaki Fukasawa, and Mathieu Rosenbaum. The microstructural foundations of leverage effect and rough volatility. Finance and Stochastics, 22(2):241–280, 2018.

[EER19] Omar El Euch and Mathieu Rosenbaum. The characteristic function of rough Heston models. Mathematical Finance, 29(1):3–38, 2019.

[EKP+97] N. El Karoui, C. Kapoudjian, E. Pardoux, S. Peng, and M. C. Quenez. Reflected solutions of backward SDE's, and related obstacle problems for PDE's. Ann. Probab., 25(2):702–737, 1997.

[EKTZ14] Ibrahim Ekren, Christian Keller, Nizar Touzi, and Jianfeng Zhang. On viscosity solutions of path dependent PDEs. The Annals of Probability, 42(1):204–236, 2014.

[EPQ97] N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance. Math. Finance, 7(1):1–71, 1997.

[Fuk11] Masaaki Fukasawa. Asymptotic analysis for stochastic volatility: martingale expansion. Finance and Stochastics, 15(4):635–654, 2011.

[Gas18] Paul Gassiat. On the martingale property in the rough Bergomi model, 2018.

[GJR18] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. Volatility is rough. Quantitative Finance, 18(6):933–949, 2018.

[GMZ20] Ludovic Goudenège, Andrea Molent, and Antonino Zanette. Machine learning for pricing American options in high-dimensional Markovian and non-Markovian models. Quantitative Finance, 20(4):573–591, 2020.

[HJW18] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.

[HMY02] Y. Hu, J. Ma, and J. Yong. On semi-linear degenerate backward stochastic partial differential equations. Probab. Theory Relat. Fields, 123:381–411, 2002.

[HP91] Y. Hu and S. Peng. Adapted solution of a backward semilinear stochastic evolution equation. Stoch. Anal. Appl., 9:445–459, 1991.

[HPW19] Côme Huré, Huyên Pham, and Xavier Warin. Some machine learning schemes for high-dimensional nonlinear PDEs. arXiv preprint arXiv:1902.01599, 2019.

[HSW89] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[HSW90] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5):551–560, 1990.

[JLP19] Eduardo Abi Jaber, Martin Larsson, and Sergio Pulido. Affine Volterra processes. The Annals of Applied Probability, 29(5):3155–3200, 2019.

[JO19] Antoine Jack Jacquier and Mugad Oumgari. Deep PPDEs for rough local stochastic volatility. Available at SSRN 3400035, 2019.

[Kry10] N. V. Krylov. On the Itô-Wentzell formula for distribution-valued processes and related topics. Probab. Theory Relat. Fields, 150:295–319, 2010.

[Oks03] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer, 2003.

[Pen92] Shige Peng. Stochastic Hamilton-Jacobi-Bellman equations. SIAM J. Control Optim., 30:284–304, 1992.

[PP90] E. Pardoux and S. Peng. Adapted solution of a backward stochastic differential equation. Syst. Control Lett., 14(1):55–61, 1990.

[Qiu17] Jinniao Qiu. Weak solution for a class of fully nonlinear stochastic Hamilton-Jacobi-Bellman equations. Stoch. Process. Appl., 127(6):1926–1959, 2017.

[Qiu18] Jinniao Qiu. Viscosity solutions of stochastic Hamilton-Jacobi-Bellman equations. SIAM J. Control Optim., 56(5):3708–3730, 2018.

[QW14] Jinniao Qiu and Wenning Wei. On the quasi-linear reflected backward stochastic partial differential equations. J. Funct. Anal., 267:3598–3656, 2014.
[VZ+19] Frederi Viens and Jianfeng Zhang. A martingale approach for fractional Brownian motions and related path dependent PDEs. The Annals of Applied Probability, 29(6):3489–3540, 2019.

[Zho92] Xun Yu Zhou. A duality analysis on stochastic partial differential equations. J. Funct. Anal., 103(2):275–293, 1992.