PDGM: a Neural Network Approach to Solve Path-Dependent Partial Differential Equations
Yuri F. Saporito
School of Applied Mathematics, Getulio Vargas Foundation, Rio de Janeiro, Brazil
[email protected]
Zhaoyu Zhang
Department of Industrial Engineering and Operations Research, Columbia University, New York, USA
[email protected]
April 7, 2020

Abstract
In this paper, we propose a novel numerical method for Path-Dependent Partial Differential Equations (PPDEs). These equations first appeared in the seminal work of Dupire [2009], where the functional Itô calculus was developed to deal with path-dependent financial derivatives contracts. More specifically, we generalize the Deep Galerkin Method (DGM) of Sirignano and Spiliopoulos [2018] to deal with these equations. The method, which we call Path-Dependent DGM (PDGM), consists of using a combination of feed-forward and Long Short-Term Memory architectures to model the solution of the PPDE. We then analyze several numerical examples, many from the Financial Mathematics literature, that show the capabilities of the method under very different situations.

Keywords: Functional Itô Calculus · Path-Dependent Partial Differential Equations · Neural Networks · Long Short-Term Memory · Deep Galerkin Method
1 Introduction

Neural networks and their modern computational implementations, generally called deep learning, have been successfully applied in several areas of mathematics and science in recent years. In this paper, we will generalize the methodology proposed in Sirignano and Spiliopoulos [2018] to numerically solve PDEs, known as the Deep Galerkin Method (DGM), to an infinite-dimensional setting. Applications of deep learning to solve PDEs date back to Lee and Kang [1990], Lagaris et al. [1998], Parisi et al. [2003]. Lately, many articles have dealt with finite-dimensional PDE problems, see, for instance, E et al. [2017, 2018], Raissi et al. [2019], Al-Aradi et al. [2019]. Many of them examine non-linear PDEs and then consider the Backward Stochastic Differential Equation (BSDE) technique. We will not pursue this approach here.

We propose a numerical method based on neural networks to solve path-dependent partial differential equations (PPDEs) that arise from the functional calculus framework proposed in Dupire [2009]. This theory was first proposed in the aforesaid reference with the goal of extending results available for vanilla derivatives contracts in Financial Mathematics to more general, path-dependent derivatives, as, for instance, Asian, barrier and lookback options. Additionally, non-linear PPDEs appear in the context of stochastic optimal control and differential games, see Saporito [2019] and Pham and Zhang [2014].

One of the main features of this functional calculus is the fact that all the modelling is non-anticipative, meaning that it does not look into the future of the evolution of the state dynamics. This fact suggests the choice of Long Short-Term Memory (LSTM) networks to model these objects. In fact, we propose a novel architecture that combines LSTM and feed-forward networks, which we call the Path-Dependent Deep Galerkin Method (PDGM) architecture, that captures the non-anticipativeness of functionals and deals with the necessary path deformations from this functional calculus.

Recently, in Fouque and Zhang [2019], it has been shown that the LSTM network can be used to numerically solve coupled forward anticipated BSDEs, and effectively approximate the conditional expectation for a non-Markovian process.

There are very few methods available to solve PPDEs; for a discussion about them, see Ren and Tan [2017] and references therein. In that paper, the authors summarize some numerical methods to deal with PPDEs, namely finite difference, trinomial tree, and probabilistic schemes. These methods are either Monte Carlo or tree based. Our method differs from all of them by considering the recent neural network approach for differential equations.

The closest work to ours, but different nonetheless, is Jacquier and Oumgari [2019]. In this paper, the authors consider the functional framework proposed by Viens and Zhang [2019] that generalizes the functional Itô calculus to deal with the fractional Brownian motion in a very inventive way. The numerical procedure proposed in Jacquier and Oumgari [2019] uses the approach that combines BSDEs and deep learning to numerically solve the PPDE that arises from the rough Heston model. Our approach could be modified to handle those PPDEs. However, it is outside the scope of this paper.

The paper is organized as follows. In Section 2 we introduce the functional Itô calculus and the main theoretical object of our study, the PPDEs. The algorithm is presented and studied in Section 3. Finally, we show several numerical examples in Section 4. In order to show the capabilities of the method, we mostly consider cases where closed-form solutions are available.
2 Functional Itô Calculus and PPDEs

In this section, we review the notion of Path-Dependent Partial Differential Equations (PPDEs) and the theory that created them, the functional Itô calculus. Proposed in the seminal paper Dupire [2009], this framework allows us to apply the techniques of differential calculus to functionals that depend on the history of the state variable being considered. It was first developed in the Itô stochastic calculus setting, but the generalization could obviously be applied in the usual, deterministic differential calculus. Below we present the necessary definitions and results to define precisely what a PPDE is.
We start by fixing a time horizon
T > 0. Denote by Λ_t the space of càdlàg paths on [0, t] taking values in R^n and define Λ = ∪_{t ∈ [0,T]} Λ_t. Capital letters will denote elements of Λ (i.e. paths) and lower-case letters will denote the spot value of paths. In symbols, Y_t ∈ Λ means Y_t ∈ Λ_t and y_s = Y_t(s), for s ≤ t.

A functional is any function f : Λ → R. For such objects, we define, when the limits exist, the time and space functional derivatives, respectively, as

∆_t f(Y_t) = lim_{δt → 0⁺} [f(Y_{t,δt}) − f(Y_t)] / δt,   (1)

∆_x f(Y_t) = lim_{h → 0} [f(Y_t^h) − f(Y_t)] / h,   (2)

where

Y_{t,δt}(u) = y_u, if 0 ≤ u ≤ t, and y_t, if t ≤ u ≤ t + δt,
Y_t^h(u) = y_u, if 0 ≤ u < t, and y_t + h, if u = t,

see Figures 1 and 2. In the case when the path Y_t lies in a multidimensional space, the path deformations above are understood as follows: the flat extension is applied to all dimensions jointly and equally and the bump is applied to each dimension individually.

We consider here continuity of functionals as the usual continuity in metric spaces with respect to the metric

d_Λ(Y_t, Z_s) = ‖Y_{t,s−t} − Z_s‖_∞ + |s − t|,

where, without loss of generality, we are assuming s ≥ t, and ‖Y_t‖_∞ = sup_{u ∈ [0,t]} |y_u|.
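To make the two deformations concrete, here is a minimal discrete-time sketch (an illustrative helper, not the paper's code): a path observed on a grid is stored as an array, the flat extension appends one more point holding the current value, and the bump shifts only the current value.

```python
import numpy as np

def flat_extension(y_path, t_grid, delta_t):
    """Flat extension Y_{t, delta_t}: hold the current value y_t on [t, t + delta_t]."""
    return np.append(y_path, y_path[-1]), np.append(t_grid, t_grid[-1] + delta_t)

def bump(y_path, h):
    """Bumped path Y_t^h: perturb only the current (last) value of the path by h."""
    bumped = y_path.copy()
    bumped[-1] += h
    return bumped
```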
Figure 1: Flat extension of a path.

Figure 2: Bumped path.

The norm | · | is the usual Euclidean norm in the appropriate Euclidean space, depending on the dimension of the path being considered. This continuity notion could be relaxed, see, for instance, Oberhauser [2016].

Moreover, we say a functional f is boundedness-preserving if, for every compact set K ⊂ R^n, there exists a constant C such that |f(Y_t)| ≤ C for every path Y_t satisfying Y_t([0, t]) = {y ∈ R^n ; Y_t(s) = y for some s ∈ [0, t]} ⊂ K, see Cont and Fournié [2010].

A functional f : Λ → R is said to belong to C^{1,2} if it is Λ-continuous, boundedness-preserving and it has Λ-continuous, boundedness-preserving derivatives ∆_t f, ∆_x f and ∆_{xx} f. Here, clearly, ∆_{xx} = ∆_x ∆_x.

Our numerical method is based on the following approximation of the functional derivatives: for a smooth functional f ∈ C^{1,2}, we use

∆_t f(Y_t) = [f(Y_{t,δt}) − f(Y_t)] / δt + o(δt),
∆_x f(Y_t) = [f(Y_t^h) − f(Y_t)] / h + o(h),
∆_{xx} f(Y_t) = [f(Y_t^h) − 2 f(Y_t) + f(Y_t^{−h})] / h² + o(h).   (3)

Additionally, one could obviously consider the central difference

∆_x f(Y_t) = [f(Y_t^h) − f(Y_t^{−h})] / (2h) + o(h).

For any s ≤ t in [0, T], denote by Λ_{s,t} the space of R^n-valued càdlàg paths on [s, t]. Now define the operator (· ⊗ ·) : Λ_{s,t} × Λ_{t,T} → Λ_{s,T}, the concatenation of paths, by

(Y ⊗ Z)(u) = y_u, if s ≤ u < t, and z_u − z_t + y_t, if t ≤ u ≤ T,

which is a paste of Y and Z.

Given functionals µ and σ and fixing a probability space (Ω, F, P), we consider a process x given by the stochastic differential equation (SDE)

dx_s = µ(X_s) ds + σ(X_s) dw_s,   (4)

with s ≥ t and X_t = Y_t. The process (w_s)_{s ∈ [0,T]} denotes a standard Brownian motion in (Ω, F, P) and we assume µ and σ are such that there exists a unique strong solution of the SDE (4). This unique solution will be denoted by x^{Y_t}_s and the path solution from t to T by X^{Y_t}_{t,T}. We forward the reader, for instance, to Rogers and Williams [2000] for results on SDEs with functional coefficients.

Finally, we define the conditioned expectation as

E[g(X_T) | Y_t] = E[g(Y_t ⊗ X^{Y_t}_{t,T})],   (5)

for any Y_t ∈ Λ. The path Y_t ⊗ X^{Y_t}_{t,T} ∈ Λ_T is equal to the path Y_t up to t and follows the dynamics of the SDE (4) from t to T with initial path Y_t. Moreover, if we define the filtration F^x_t generated by {x_s ; s ≤ t}, one may prove

E[g(X_T) | X_t(ω)] = E[g(X_T) | F^x_t](ω)   P-a.s.,

where the expectation on the left-hand side is the one discussed above and the one on the right-hand side is the usual conditional expectation.

The Feynman-Kac formula in the classical stochastic calculus is a very important result that relates conditional expectations of functions of diffusions and PDEs. It turns out that a functional extension of this result is available.
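Before stating the functional Feynman-Kac formula, the conditioned expectation (5) can be illustrated with a small Monte Carlo sketch: freeze the observed history Y_t, simulate continuations of the SDE (4) by Euler steps, concatenate them to the history, and average g. This is only an illustration under stated assumptions (an Euler scheme and generic functionals mu and sigma acting on the running path), not the method of the paper.

```python
import numpy as np

def conditioned_expectation(y_hist, t_hist, g, mu, sigma, T=1.0, n_sims=5000, seed=0):
    """Monte Carlo estimate of E[g(X_T) | Y_t] = E[g(Y_t concat X^{Y_t}_{t,T})] in (5)."""
    rng = np.random.default_rng(seed)
    t = t_hist[-1]
    dt = t_hist[1] - t_hist[0]
    n_steps = int(round((T - t) / dt))
    vals = np.empty(n_sims)
    for j in range(n_sims):
        path = list(y_hist)                       # the frozen history Y_t
        for _ in range(n_steps):                  # Euler continuation of the SDE (4)
            x = path[-1]
            p = np.asarray(path)
            path.append(x + mu(p) * dt + sigma(p) * np.sqrt(dt) * rng.standard_normal())
        vals[j] = g(np.asarray(path))             # g evaluated on the concatenated path
    return vals.mean()
```

For instance, with mu = lambda p: 0.0 and sigma = lambda p: 1.0 this reproduces the Brownian setting of Section 4.1.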
Theorem 2.1 (Functional Feynman-Kac Formula; Dupire [2009])
Let x be a process given by the SDE (4). Consider functionals g : Λ_T → R, λ : Λ → R and k : Λ → R and define the functional f as

f(Y_t) = E[ e^{−∫_t^T λ(X_u) du} g(X_T) + ∫_t^T e^{−∫_t^s λ(X_u) du} k(X_s) ds | Y_t ],

for any path Y_t ∈ Λ, t ∈ [0, T]. Thus, if f ∈ C^{1,2} and k, λ, µ and σ are Λ-continuous, then f satisfies the (linear) Path-Dependent Partial Differential Equation (PPDE):

∆_t f(Y_t) + µ(Y_t) ∆_x f(Y_t) + (1/2) σ²(Y_t) ∆_{xx} f(Y_t) − λ(Y_t) f(Y_t) + k(Y_t) = 0,   (6)

with f(Y_T) = g(Y_T), for any Y_t in the topological support of the stochastic process x.

Remark 2.2
In diffusion models (µ(Y_t) = µ(t, y_t) and σ(Y_t) = σ(t, y_t)), under mild assumptions on µ and σ, the Stroock-Varadhan Support Theorem states that the topological support of x is the space of continuous paths starting at x_0, see for instance [Pinsky, 1995, Chapter 2]. So, under these assumptions, the PPDE (6) will hold for any continuous path. See Jazaerli and Saporito [2017] for a discussion of this type of result in the case of SDEs with functional coefficients. For instance, the arithmetic and geometric Brownian motions have full support on the space of continuous paths, with the GBM restricted to paths taking positive values.

Remark 2.3
Existence and uniqueness of classical (in the functional sense) solutions of PPDEs of the form (6) was studied in Flandoli and Zanco [2016], for instance. We forward the reader to the aforesaid reference for conditions on the functional parameters that ensure this result. Furthermore, several results have been developed to study non-linear versions of such PPDEs and the existence and uniqueness of viscosity solutions, see Ekren et al. [2014], Ekren et al. [2016a,b].
3 The PDGM Algorithm

In this section, we will present our algorithm to numerically solve a vast class of PPDEs. The main idea of the algorithm is to apply the DGM methodology of Sirignano and Spiliopoulos [2018] to the PPDE framework. By DGM methodology we mean approximating the solution of the equation by finding a neural network that approximately solves the equation, in a given sense, together with any additional conditions. In order to achieve this we need to consider a neural network architecture that correctly models the functionals that appear in PPDEs. Since functionals are non-anticipative, their value at t does not depend on state values after t, for any given time t. Because of this characteristic we consider a combination of feed-forward and LSTM networks.

Another difference between our setting and the DGM is the space where the equation is defined. In their case, the domain of the PDE is some subset of a Euclidean space. In our case, it is a subset of the space of paths Λ or of the space of continuous paths.

We start by stating some useful definitions. A set of layers M^ρ_{d,k} with input x ∈ R^d in a feed-forward neural network can be defined as

M^ρ_{d,k} := {M : R^d → R^k ; M(x) = ρ(Ax + b), A ∈ R^{k×d}, b ∈ R^k},   (7)

where ρ is some activation function such as ρ_tanh(x) := tanh(x), ρ_s(x) := 1/(1 + e^{−x}) and ρ_Id(x) := x. Then the set of feed-forward neural networks with ℓ hidden layers is defined as a composition of layers:

NN^ℓ_{d_0,d_1} = { M̃ : R^{d_0} → R^{d_1} ; M̃ = M_ℓ ∘ · · · ∘ M_1 ∘ M_0, M_0 ∈ M^ρ_{d_0,k_1}, M_ℓ ∈ M^{ρ_Id}_{k_ℓ,d_1}, M_i ∈ M^ρ_{k_i,k_{i+1}}, k_i ∈ Z_+, i = 1, . . . , ℓ − 1 }.

Instead of all the inputs being unordered as in the feed-forward neural network, we often encounter sequential information as input, which is the case of our application. Additionally, for instance, in natural language processing, one
of the main topics is sentiment analysis, where, given paragraphs of text, one classifies them into different categories. In such cases, recurrent neural networks (RNNs) come into play, which store the information seen so far and use it to perform computations in the next step. However, it was shown in Bengio et al. [1994] that plain RNNs suffer from exploding or vanishing gradient problems. The LSTM network of Hochreiter and Schmidhuber [1997] is designed to tackle this problem, in which the inputs and outputs are controlled by gates inside each LSTM cell. This architecture is powerful for capturing long-range dependence of the data. Each LSTM cell is composed of a cell state, which contains the information, and three gates, which regulate the flow of information. Mathematically, the rule inside the i-th cell is, for x_i ∈ R^d,

Γ^F_i(x_i, a_{i−1}) = ρ_s(A^F x_i + U^F a_{i−1} + b^F),
Γ^I_i(x_i, a_{i−1}) = ρ_s(A^I x_i + U^I a_{i−1} + b^I),
Γ^O_i(x_i, a_{i−1}) = ρ_s(A^O x_i + U^O a_{i−1} + b^O),
c_i = Γ^F_i ⊙ c_{i−1} + Γ^I_i ⊙ ρ_tanh(A^C x_i + U^C a_{i−1} + b^C),
a_i = Γ^O_i ⊙ ρ_tanh(c_i),   (8)

where the operator ⊙ denotes the element-wise product. Additionally, a_i ∈ R^k is known as the output vector with initial value a_{−1} = 0, and c_i ∈ R^k is called the cell state vector with initial value c_{−1} = 0; k refers to the number of hidden units. Moreover, A^· ∈ R^{k×d} and U^· ∈ R^{k×k} are weight matrices, and b^· ∈ R^k is a bias vector. These parameters are learned during training via a stochastic gradient descent algorithm combined with backpropagation to compute the gradients.

The set of LSTM networks up to time i is defined as

LSTM_{i,d,k} = { M : (R^d)^i × R^k × R^k → R^k × R^k ; M(x_{[0,i]}, a_{−1}, c_{−1}) = (a_i, c_i),
c_i = Γ^F_i ⊙ c_{i−1} + Γ^I_i ⊙ ρ_tanh(A^C x_i + U^C a_{i−1} + b^C), a_i = Γ^O_i ⊙ ρ_tanh(c_i), a_{−1} = c_{−1} = 0 },   (9)

where Γ^F, Γ^I, Γ^O are defined in (8) and x_{[0,i]} = [x_0, . . . , x_i].

In order to model the objects from the functional Itô calculus, we propose a novel neural network architecture that combines LSTM and feed-forward networks in order to guarantee non-anticipativeness and to deal with the necessary path deformations from the functional Itô calculus. We call this architecture the Path-Dependent Deep Galerkin Method (PDGM) architecture. The network structure is displayed in Figure 3.

Figure 3: PDGM architecture.

The PDGM architecture approximates a functional f as follows. We start by considering a time discretization {t_i}_{i=1,...,N}, with δt = t_i − t_{i−1}. We then approximate f(Y_t) by a feed-forward neural network u(Y_{t_i}; θ) = ϕ(t_i, y_{t_i}, a_{t_{i−1}}; θ^f), where t_i ≤ t < t_{i+1}. Here ϕ ∈ NN^ℓ_{k+2,1}, where a is the output vector from an LSTM network,
i.e. a_{t_{i−1}} = ψ(y_{t_0}, . . . , y_{t_{i−1}}; θ^r), for some ψ ∈ LSTM_{i−1,1,k}. Here θ = [θ^f, θ^r] are the neural network's parameters. The spatial and time extensions can be properly obtained by assigning the correct inputs to the feed-forward neural network. This shows the effectiveness of the PDGM architecture for the functional Itô calculus setting. Therefore, the functional derivatives can be approximated according to the approximation in (3):

u(Y^h_{t_i}; θ) = ϕ(t_i, y_{t_i} + h, a_{t_{i−1}}; θ^f),
u(Y_{t_i,δt}; θ) = ϕ(t_{i+1}, y_{t_i}, a_{t_i}; θ^f),
∆^{[δt]}_t u(Y_{t_i}; θ) = [u(Y_{t_i,δt}; θ) − u(Y_{t_i}; θ)] / δt,
∆^{[h]}_x u(Y_{t_i}; θ) = [u(Y^h_{t_i}; θ) − u(Y_{t_i}; θ)] / h,
∆^{[h]}_{xx} u(Y_{t_i}; θ) = [u(Y^h_{t_i}; θ) − 2 u(Y_{t_i}; θ) + u(Y^{−h}_{t_i}; θ)] / h².   (10)

Consider the general class of final-value PPDE problems:

∆_t f(Y_t) + L f(Y_t) = 0,
f(Y_T) = g(Y_T),   (11)

where Y_t ∈ Λ_0 ⊂ Λ, for some subset of paths Λ_0. As an illustration, L could be given by the linear operator

L f(Y_t) = µ(Y_t) ∆_x f(Y_t) + (1/2) σ²(Y_t) ∆_{xx} f(Y_t) − λ(Y_t) f(Y_t) + k(Y_t).   (12)

We train the neural network to minimize the following objective function:

J(θ) = ‖∆^{[δt]}_t u(·; θ) + L^{[h]} u(·; θ)‖²_{0,ν₁} + ‖u(·; θ) − g‖²_{T,ν₂},

where L^{[h]} is the operator L with the finite-difference approximation of the functional derivatives and

‖f‖²_{0,ν₁} = E^{ν₁}[ ∫_0^T f²(X_t) dt ],   ‖g‖²_{T,ν₂} = E^{ν₂}[ g²(X_T) ].

Here, ν₁ and ν₂ are measures on the path spaces Λ and Λ_T, respectively. The choice of these measures, and consequently how we should sample the paths X_t in order to approximate the theoretical loss function J, will be discussed below. Additionally, we apply stochastic gradient descent to minimize the loss function over the set of parameters θ. The neural network with optimized parameters θ delivers an approximation of the solution of the PPDE (11). Given M paths simulated according to the laws ν₁ and ν₂ and time and space discretization parameters δt and h, the loss J will be approximated by

J_{N,M}(θ) = (1/(MN)) Σ_{j=1}^M Σ_{i=0}^N ( ∆^{[δt]}_t u(Y^{(j)}_{t_i}; θ) + L^{[h]} u(Y^{(j)}_{t_i}; θ) )² + (1/M) Σ_{j=1}^M ( u(Y^{(j)}_{t_N}; θ) − g(Y^{(j)}_{t_N}) )².   (13)

Moreover, when a closed-form solution is available, we compute the L²-error between it and our numerical solution by approximating the mean squared error ‖ · ‖²_{0,ν₁} defined above.

Our algorithm works as follows:
Algorithm 1:
Path-Dependent DGM (PDGM)
initialize the discretization parameter δt, mini-batch size M and threshold ε
while J_{N,M}(θ) > ε do
    generate a mini-batch of M paths {(Y^{(j)}_{t_i})_{i=0,...,N}}_{j=1,...,M}
    for i ∈ {0, . . . , N} do
        calculate u(Y^{(j)}_{t_i}; θ), ∆^{[δt]}_t u(Y^{(j)}_{t_i}; θ), ∆^{[h]}_x u(Y^{(j)}_{t_i}; θ) and ∆^{[h]}_{xx} u(Y^{(j)}_{t_i}; θ) according to (10);
        put them all together to compute L^{[h]} u(Y^{(j)}_{t_i}; θ);
    end
    calculate the approximated loss function J_{N,M}(θ) as in (13);
    to minimize J_{N,M}(θ), update θ using stochastic gradient descent.
end

3.3.1 Simulation

One important ingredient of the method above is the simulation of the paths Y^{(j)}. The goal of this step is to select good representatives of the set Λ_0, the domain of the PPDE. Usually, for problems that arise from the Feynman-Kac formula (even in its non-linear form), it is straightforward to choose the generating process that should be considered for the simulation of these paths. For instance, if we have the path-dependent heat equation (studied in Section 4.1), one should simulate from the Brownian motion.

However, the simulation does not have to be as precise as in Monte Carlo methods. The reason is that the PPDE itself has the dynamics of the state variable within its formulation. For example, in the Heston model studied in Section 4.2.4, one could simulate the CIR dynamics simplifying the treatment of the natural reflecting barrier at 0 (taking the maximum of the simulated value and zero, for instance).

Nonetheless, one should be aware of the choice of simulated paths for the training. Usually, the space Λ_0 is much bigger than the set of possible simulated paths (e.g. in the Brownian case, Λ_0 is the space of continuous paths on [0, T]). An interesting exercise is to verify that training on a given set of simulated paths gives the algorithm sufficient knowledge to predict the value of the functional on a different type of path. Numerical experiments showed us that one important aspect is the range of the test paths. If the range is very different from that of the training paths, the approximation will not work very well. In the numerical examples below, we consider the exact model coming from the PPDE to simulate the paths for the training sets and test the trained functional on very smooth and very rough paths different from the generating process of the training paths, but respecting their range. The method performs very well in all of them.
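A compact TensorFlow sketch of the PDGM network and of the loss (13) for the path-dependent heat equation (14) is given below. It is a schematic re-implementation under stated assumptions (layer sizes, tanh activations, and the way the LSTM memory is shifted so the head at t_i sees a_{t_{i−1}}); it is not the paper's released code, which is linked in Section 4 for the geometric Asian example.

```python
import tensorflow as tf

class PDGMNet(tf.keras.Model):
    """LSTM memory of the past path feeding a feed-forward head, as in Figure 3."""
    def __init__(self, lstm_units=64, hidden=(64, 128, 64)):
        super().__init__()
        self.lstm = tf.keras.layers.LSTM(lstm_units, return_sequences=True)
        self.dense = [tf.keras.layers.Dense(n, activation="tanh") for n in hidden]
        self.out = tf.keras.layers.Dense(1)

    def memory(self, y):
        # y: (batch, N+1, 1) path values; returns a_{t_i} for every i, shape (batch, N+1, k)
        return self.lstm(y)

    def head(self, t, y_last, a_prev):
        # phi(t_i, y_{t_i}, a_{t_{i-1}}; theta^f)
        z = tf.concat([t, y_last, a_prev], axis=-1)
        for layer in self.dense:
            z = layer(z)
        return self.out(z)

def heat_pdgm_loss(model, t, y, g_terminal, dt=0.01, h=0.01):
    """Discretized loss (13) for Delta_t f + 0.5 Delta_xx f = 0 with f(Y_T) = g(Y_T).
    t, y: (batch, N+1, 1); g_terminal: (batch, 1)."""
    a = model.memory(y)                                                # a_{t_i}
    a_prev = tf.concat([tf.zeros_like(a[:, :1]), a[:, :-1]], axis=1)   # a_{t_{i-1}}, with a_{-1} = 0
    u      = model.head(t,      y,     a_prev)                         # u(Y_{t_i})
    u_up   = model.head(t,      y + h, a_prev)                         # bump  Y_{t_i}^{h}
    u_dn   = model.head(t,      y - h, a_prev)                         # bump  Y_{t_i}^{-h}
    u_flat = model.head(t + dt, y,     a)                              # flat extension Y_{t_i, dt}
    du_t  = (u_flat - u) / dt
    du_xx = (u_up - 2.0 * u + u_dn) / h ** 2
    interior = tf.reduce_mean(tf.square(du_t + 0.5 * du_xx))           # first term of (13)
    terminal = tf.reduce_mean(tf.square(u[:, -1] - g_terminal))        # second term of (13)
    return interior + terminal
```

Training then follows Algorithm 1: sample a mini-batch of Brownian paths, evaluate heat_pdgm_loss inside a tf.GradientTape, and apply an optimizer step on the network parameters.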
Remark 3.1

An idea similar to the control variates used in Monte Carlo methods would be the following. Suppose that φ is a path-independent functional (i.e. φ(Y_t) = φ(t, y_t)) such that ∆_t φ + L φ = 0, with φ(T, y_T) being somewhat analogous to g(Y_T) (e.g. growing similarly). Then, the functional f̃(Y_t) = f(Y_t) − φ(t, y_t) solves the same PPDE and the final condition might be better behaved. We could then apply the algorithm to approximate f̃ and use the formula f(Y_t) = f̃(Y_t) + φ(t, y_t) to find an approximation for f.

The derivation of convergence results similar to the ones shown in Sirignano and Spiliopoulos [2018] is very challenging in this setting. We leave it for possible future work since it would require some new results from the functional Itô calculus theory. However, we sketch an approach for the proof of existence of a PDGM network such that the loss function is arbitrarily small.

The argument would be as follows: fix N as the time discretization parameter and consider the approximation of the functional f as f(Y_t) ≈ φ_N(y_{t_0}, . . . , y_{t_i}, y_t), where t_i < t < t_{i+1}. If f is smooth, then φ_N is smooth in the last variable and φ_N → f as N → +∞, where the convergence is of the functionals and their derivatives. Now, fundamentally, the PDGM approximates φ_N and this is very similar to the argument presented in Sirignano and Spiliopoulos [2018]. The result, under possibly additional technical conditions, could be formulated as:

Conjecture 3.1
Assume there exists a classical solution of the PPDE (11). Then, for any ε > 0, there exist k, ℓ ∈ N, ϕ ∈ NN^ℓ_{k+2,1} and ψ ∈ LSTM_{i−1,1,k} such that u(Y_{t_i}; θ) = ϕ(t_i, y_{t_i}, a_{t_{i−1}}; θ^f), with a_{t_{i−1}} = ψ(y_{t_0}, . . . , y_{t_{i−1}}; θ^r), satisfies J(θ) < ε.
4 Numerical Examples

In this section we will provide several examples of PPDEs with their closed-form and PDGM solutions. We will consider different dynamics (Brownian motion, geometric Brownian motion and the Heston model) and different path-dependent final conditions (running integral, running maximum and running minimum). Moreover, we will also consider a non-linear case. The algorithm could handle more complex problems, as, for instance, high-dimensional PPDEs. We decided to choose classic examples for pedagogical reasons: they are well-known to the readers, they have closed-form solutions and they demonstrate how powerful the method is. Furthermore, as is clear from the exposition of the method, the PDGM is able to deal with any path-dependent structure as long as it can be written as a PPDE of the form (11). Additional conditions, such as boundary and integral conditions, could be added to the loss function similarly to the DGM methodology.

We have used a personal desktop with an Intel Core i7, 16GB RAM, and an NVIDIA RTX 2080 graphics card to run these numerical examples. Additionally, we have used
TensorFlow. Each epoch takes approximately 0.4s to 0.8s depending on the complexity of the neural network and the PPDE. The Python code for an illustrative example of the geometric Asian option shown in Section 4.2.1 is available at https://github.com/zhaoyu-zhang/PDGM-Geometric_Asian.

4.1 Path-Dependent Heat Equation

In this section, we consider the class of examples

∆_t f(Y_t) + (1/2) ∆_{xx} f(Y_t) = 0,
f(Y_T) = g(Y_T),   (14)

where Y_T is any continuous path, see Remark 2.2. These PPDEs arise from the linear expectations of a path-dependent final condition g under a Brownian model. Under smoothness conditions on f, the PPDE above holds for any continuous path Y. These simple examples allow us to provide a very clear introduction to the method and serve as illustrations. Below we will consider five different final conditions g: path-independent, linear running integral, quadratic running integral, one high-dimensional case and a strongly path-dependent example.

Training paths in this subsection are sampled from standard Brownian motion paths with T = 1 and time discretization N = 100. For the path-independent, linear running integral and quadratic running integral examples, we choose a mini-batch size of M = 128 paths. We use a single-layer LSTM network with 64 units connecting with a deep feed-forward neural network which consists of three hidden layers with 64, 128 and 64 neurons, respectively. Although we only train our neural network using simulated standard Brownian motion paths, our algorithm is able to provide a good approximation to the true solution for paths other than those. Furthermore, we show the train losses, test losses and MSE after 10,000 epochs in the table below.

Example                                          Train Loss    Test Loss    MSE
Path Independent
Linear Running Integral
Quadratic Running Integral
High Dimensional Example with Dimension = 20
Stopping Time Example                                                       –

Table 1: Train and test losses for the Brownian case.

For the path-independent, linear running integral and quadratic running integral examples, three representative test paths with their corresponding solutions and derivatives are plotted in Figures 5, 7, and 12, respectively. Path 1 is a standard Brownian motion path. Path 2 is the smooth path y_t = (1 − t) for t ∈ [0, 1]. Path 3 is a realization of a sequence of uniform random variables between -1 and 1, i.e., y_{t_i} ∼ U(−1, 1), for each i ∈ {0, . . . , N}.
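As an illustration of the three kinds of test paths just described, the following sketch generates one example of each under the stated setup T = 1 and N = 100 (a hypothetical helper, not part of the released code):

```python
import numpy as np

def section41_test_paths(N=100, T=1.0, seed=0):
    """Path 1: standard Brownian motion; Path 2: the smooth path y_t = 1 - t;
    Path 3: i.i.d. uniforms on (-1, 1) at each grid point."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, T, N + 1)
    dt = T / N
    path1 = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(N))])
    path2 = 1.0 - t
    path3 = rng.uniform(-1.0, 1.0, N + 1)
    return t, path1, path2, path3
```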
4.1.1 Path-Independent Final Condition

As a sanity check, consider the case where g(Y_T) = φ(y_T), which yields a PDE with solution

f(Y_t) = ∫_R φ(y_t + z) e^{−z²/(2(T−t))} / √(2π(T−t)) dz.

As an example, we consider φ(y) = y², which gives f(Y_t) = y_t² + T − t. Figure 4 shows the training and testing losses. Three representative paths with their corresponding solutions and derivatives are shown in Figure 5. It can be seen that our algorithm provides a good approximation. The functional derivatives for this example are ∆_t f(Y_t) = −1 and ∆_{xx} f(Y_t) = 2, which are also captured by the algorithm.
Figure 4: Train and test losses for the path-independent example.

Figure 5: Three representative paths with corresponding solutions and functional derivatives for the path-independent example.
4.1.2 Linear Running Integral

Consider the path-dependent case of

g(Y_T) = ∫_0^T y_u du,

which gives the solution

f(Y_t) = ∫_0^t y_u du + y_t (T − t).

In this example, the training and test losses after 10,000 epochs are shown in Figure 6. Figure 7 plots three representative paths (as in the path-independent example) with their corresponding solutions and functional derivatives. The predicted solutions are approximately the same as the true solutions. From the plot, the derivatives ∆_t f(Y_t) and ∆_{xx} f(Y_t) are both 0 in this example, which is also true by direct computation.
Figure 6: Train and test losses for the linear running integral example.

Figure 7: Three representative paths with corresponding solutions and functional derivatives for the linear running integral example.
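The closed-form solution above is easy to evaluate along a discretized path, with the running integral approximated by the trapezoidal rule (an illustrative helper, not from the paper's repository):

```python
import numpy as np

def linear_running_integral_solution(t_grid, y_path, T=1.0):
    """f(Y_t) = int_0^t y_u du + y_t (T - t) at every grid time t."""
    increments = 0.5 * (y_path[1:] + y_path[:-1]) * np.diff(t_grid)
    running_int = np.concatenate([[0.0], np.cumsum(increments)])
    return running_int + y_path * (T - t_grid)
```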
Though using Brownian motions as training paths yields faster convergence with a small number of neurons, one drawback is that when a test path is outside the domain of the trained paths, the network yields a poor prediction, as discussed in Section 3.3.1. In particular, Figure 8 plots the 128 Brownian paths used for training, showing that the domain is from −2 to 2. In Figure 9, the neural network is not able to find the right solution for a Brownian path with volatility 4 which starts well below this range.
Figure 8: A sample of 128 Brownian motion paths.
Figure 9: Prediction failure due to the limitation of the training domain.

One easy remedy for the above problem is to use Brownian paths with varying volatility and varying initial values. In addition, one may also need to enlarge the neural network. For example, the training paths for Figure 10 are Brownian paths with several different volatilities σ and initial values drawn uniformly at random. The single-layer LSTM network consists of 128 units, and each of the three feed-forward hidden layers contains 128 neurons. In Figure 10, test path 1 is a Brownian path with volatility 4 and a negative starting value; test path 2 is the function y_t = (2 − t); test path 3 is a realization of a sequence of i.i.d. uniform random variables taking values up to 5. As a result, according to the above setup, the neural network is capable of predicting solutions to paths with a wider domain.

Figure 10: Three representative paths with corresponding solutions and functional derivatives for the linear running integral example.

4.1.3 Quadratic Running Integral

In order to consider a more complicated case, take

g(Y_T) = ( ∫_0^T y_u du )²,

then

f(Y_t) = E[ ( ∫_0^t y_u du + ∫_t^T (y_t + w_u − w_t) du )² ]
       = E[ ( ∫_0^t y_u du + y_t (T − t) + ∫_t^T (w_u − w_t) du )² ]
       = ( ∫_0^t y_u du )² + y_t² (T − t)² + 2 y_t (T − t) ∫_0^t y_u du + E[ ( ∫_t^T (w_u − w_t) du )² ].

Moreover,

E[ ( ∫_t^T (w_u − w_t) du )² ] = E[ ( ∫_0^{T−t} (w_{u+t} − w_t) du )² ] = E[ ( ∫_0^{T−t} w_u du )² ] = (1/3)(T − t)³,

yielding

f(Y_t) = ( ∫_0^t y_u du )² + y_t² (T − t)² + 2 y_t (T − t) ∫_0^t y_u du + (1/3)(T − t)³.

Similar to the above examples, the training and test losses after 10,000 epochs are shown in Figure 11, and three representative paths with their corresponding solutions and derivatives are plotted in Figure 12.

4.1.4 High-Dimensional Example

Here we will show that the methodology we have developed can handle high-dimensional PPDEs. Since this is not the main focus of the paper, the example serves more as an illustration of the method under this setting. Several numerical improvements could be introduced following the suggestions outlined in Sirignano and Spiliopoulos [2018].
Figure 11: Train and test losses for the quadratic running integral example.

Figure 12: Three representative paths with corresponding solutions and functional derivatives for the quadratic running integral example.
Consider a d-dimensional Brownian motion w_t = (w^{(1)}_t, . . . , w^{(d)}_t) and the payoff functional

g(Y_T) = ( ∫_0^T Σ_{i=1}^d y^{(i)}_u du )².

It can be straightforwardly shown that

f(Y_t) = ( ∫_0^t Σ_{i=1}^d y^{(i)}_u du )² + (T − t)² ( Σ_{i=1}^d y^{(i)}_t )² + 2 (T − t) ( Σ_{i=1}^d y^{(i)}_t ) ∫_0^t Σ_{i=1}^d y^{(i)}_u du + (d/3)(T − t)³.

Moreover, f satisfies the PPDE:

∆_t f(Y_t) + (1/2) Σ_{i=1}^d ∆_{x_i x_i} f(Y_t) = 0,
f(Y_T) = g(Y_T).   (15)

As an illustration, we choose d = 20, T = 1 and δt = 0.01. Each dimension of the training paths is sampled from a Brownian motion. For the numerical implementation, our algorithm works the same way as in the other aforementioned examples. On the left of Figure 13, we plot each dimension of a 20-dimensional test path separately. The first 10 dimensions are sampled from standard Brownian motion paths. For the remaining 10 dimensions, the points at each time step are sampled from a uniform distribution between -2 and 2. The right plot in Figure 13 compares the true solution and the predicted solution, which are similar. Time and space derivatives can also be found on the right plot of Figure 13.

Figure 13: High-Dimensional Example.
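For reference, the closed-form expression above can be evaluated along a discretized d-dimensional path as follows (a hypothetical helper used only to check the network's output):

```python
import numpy as np

def high_dim_solution(t_grid, paths, T=1.0):
    """f(Y_t) for g(Y_T) = (int_0^T sum_i y^(i)_u du)^2; paths has shape (d, N+1)."""
    d = paths.shape[0]
    s = paths.sum(axis=0)                                   # sum_i y^(i)_t on the grid
    increments = 0.5 * (s[1:] + s[:-1]) * np.diff(t_grid)
    I = np.concatenate([[0.0], np.cumsum(increments)])      # int_0^t sum_i y^(i)_u du
    tau = T - t_grid
    return I ** 2 + tau ** 2 * s ** 2 + 2.0 * tau * s * I + d / 3.0 * tau ** 3
```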
4.1.5 A Stopping Time Example

Consider the final functional

g(Y_T) = inf{ t ∈ [0, T] ; y_t = y_T },

which is the hitting time of the final value of the path Y_T. This example presents a stronger type of path-dependence than the ones presented so far. The value of f(Y_0) can be found in closed form. Indeed, notice that in the Brownian case,

f(Y_0) = E[ inf{ t ∈ [0, T] ; y_0 + w_t = y_0 + w_T } ] = E[ inf{ t ∈ [0, T] ; w_t = w_T } ] = ∫_R E[T_x] f_{w_T}(x) dx = 2 ∫_0^{+∞} E[T_x] f_{w_T}(x) dx,

where T_x is the hitting time of the value x by a Brownian bridge from 0 to x and f_{w_T} is the probability density of w_T. Fixing T = 1 and x > 0, one might show that the probability density of T_x is given by

f_{T_x}(t) = x / √(2π t³ (1 − t)) · e^{−(1−t)x²/(2t)}.

We might then compute

E[T_x] = (x/√(2π)) ∫_0^1 t / √(t³(1 − t)) e^{−(1−t)x²/(2t)} dt = (x/√(2π)) ∫_0^1 1/√(t(1 − t)) e^{−(1−t)x²/(2t)} dt = (x/√(2π)) π e^{x²/2} erfc(x/√2) = x √(π/2) e^{x²/2} erfc(x/√2).

Therefore,

f(Y_0) = 2 ∫_0^{+∞} x √(π/2) e^{x²/2} erfc(x/√2) f_{w_T}(x) dx = 2 ∫_0^{+∞} x √(π/2) e^{x²/2} erfc(x/√2) · (1/√(2π)) e^{−x²/2} dx = ∫_0^{+∞} x erfc(x/√2) dx = 1/2.

In this example, we use Brownian motion paths with starting points following a standard normal distribution as training paths. The discretization mesh size is again chosen to be 0.01. The training and testing losses approach 0.01 after 25,000 epochs, as shown on the left side of Figure 14. On the right of Figure 14, we compare f(Y_0) of 12,800 paths between the PDGM architecture and the Monte Carlo method. In the Monte Carlo method, for each starting position Y_0, we simulate 5,000 Brownian motion paths in order to compute the sample mean. The results from the Monte Carlo simulation have a bell shape with mean around 0.54, while the results from our method are more concentrated around 0.53, and the difference is less than 1%. This bias comes from the discretization of time, as also discussed in Remark 4.1.

Figure 15 shows three representative test paths and their solutions from both our method and Monte Carlo simulation. Finding the solution from Monte Carlo simulation for an entire path is quite expensive. At each time step of a given path, we simulate 2,000 Brownian motion paths concatenated with the original path, i.e. we need to simulate 200,000 paths to approximate the pathwise solution. Test path 1 is a Brownian motion path starting at 0.1233; test path 2 is a straight line from 0 to 3; test path 3 is a realization of a sequence of i.i.d. uniform random variables between 2 and 2.5. The solutions are similar, and the solution from the PDGM algorithm tends to be smoother. Our algorithm, once properly trained, is able to compute pathwise solutions for any path with a similar range, whereas the Monte Carlo simulation is only capable of computing the solution for one entire path at a time.
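The value f(Y_0) = 1/2 can also be checked with a crude Monte Carlo sketch that records, for each simulated Brownian path, the first grid interval in which the path crosses its terminal level (an illustrative check; the discretization bias it exhibits is the one discussed above and in Remark 4.1):

```python
import numpy as np

def expected_hitting_time_mc(n_paths=5000, N=1000, seed=0):
    """Estimate E[ inf{t in [0,1] : w_t = w_1} ], whose exact value is 1/2."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / N
    increments = np.sqrt(dt) * rng.standard_normal((n_paths, N))
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)
    level = w[:, -1:]                                  # terminal value of each path
    crossed = (w[:, :-1] - level) * (w[:, 1:] - level) <= 0.0
    first_crossing = crossed.argmax(axis=1) * dt       # first grid interval containing the level
    return first_crossing.mean()

print(expected_hitting_time_mc())                      # close to 0.5; a small discretization bias remains
```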
Figure 14: Train and test losses on the left, and the histogram comparison of f(Y_0) between the PDGM architecture and the Monte Carlo method on the right, for the hitting time example.

Figure 15: Three representative paths with corresponding solutions for the hitting time example.

4.2 Mathematical Finance Examples

Functional Itô calculus, and hence PPDEs, was born from the necessity to deal with path-dependent financial derivatives in the Mathematical Finance literature. In this section we will consider the classical Black–Scholes model, where the spot value follows a geometric Brownian motion with constant parameters

dx_t = (r − q) x_t dt + σ x_t dw_t.

Under this model, the price of a general path-dependent financial derivative with maturity T and payoff g : Λ_T → R solves the PPDE

∆_t f(Y_t) + (r − q) y_t ∆_x f(Y_t) + (1/2) σ² y_t² ∆_{xx} f(Y_t) − r f(Y_t) = 0,
f(Y_T) = g(Y_T),   (16)

for any continuous path Y_T taking positive values, see Remark 2.2.

We will consider three examples (geometric Asian, lookback and barrier options) where closed-form solutions are available. Moreover, we will consider one path-dependent example with the process x having stochastic volatility. Additionally, one could consider several other path-dependent, exotic derivatives with different dynamics. The PDGM could be applied similarly to these cases, requiring possibly more computational power or time.

In the examples below, we consider the PPDE (16) with parameters x_0 = 1, σ = 1 and T = 1, together with fixed constants r and q. The payoff functional g will vary for each case. We use the geometric Brownian motion with these parameters as training paths for the algorithm, with a mini-batch size of M = 128 paths and N = 100 time steps. Moreover, there are 128 units in a single-layer LSTM cell, and the deep feed-forward neural network consists of three hidden layers with 128 neurons in each.

For the geometric Asian option and the lookback option, Figures 16 and 17 show three representative test paths with corresponding closed-form solutions. Path 1 is a geometric Brownian motion path with the same parameters as above. Path 2 is the smooth path y_t = (1 − t) for t ∈ [0, 1]. Path 3 is a realization of a sequence of i.i.d. uniform random variables between 1 and 3. For the barrier option, we consider a down-and-out option. Figure 18 shows three representative test paths (different from the ones above; they are defined in Section 4.2.3) with corresponding closed-form solutions.

The solutions predicted by our algorithm are approximately the same as the true solutions. Our algorithm is able to predict solutions for any given path in the domain of the training paths regardless of the shape of the path. Furthermore, the losses after 15,000 epochs in these examples, together with the MSE when a closed-form solution is available, are given in the table below.
Example                                   Train Loss    Test Loss    MSE
Geometric Asian
Barrier
Lookback
Heston model (fixed strike K, T = 1)                                 –
Nonlinear

Table 2: Train and test losses for the Mathematical Finance examples.
4.2.1 Geometric Asian Option

The case of continuously-monitored geometric Asian options with fixed strike is determined by the payoff

g(Y_T) = ( exp{ (1/T) ∫_0^T log y_t dt } − K )_+,

where x_+ is the positive part of x and K > 0 is called the strike. A closed-form solution is available in this case:

f(Y_t) = e^{−r(T−t)} ( G_t^{t/T} y_t^{1−t/T} e^{μ̄ + σ̄²/2} Φ(d₁) − K Φ(d₂) ),

where Φ is the cumulative distribution function of the standard normal,

G_t = exp{ (1/t) ∫_0^t log y_u du },
μ̄ = (r − q − σ²/2) (T − t)² / (2T),
σ̄ = (σ/T) √( (T − t)³ / 3 ),
d₂ = [ (t/T) log G_t + (1 − t/T) log y_t + μ̄ − log K ] / σ̄,
d₁ = d₂ + σ̄.   (17)

In the numerical examples below, we fix the strike at a value K ∈ (0, 1). Three representative test paths with corresponding closed-form solutions are given in Figure 16.

4.2.2 Lookback Option

A lookback call option with floating strike is given by the payoff

g(Y_T) = y_T − inf_{0 ≤ t ≤ T} y_t.

If we denote m_t = inf_{0 ≤ u ≤ t} y_u, the closed-form solution, assuming q = 0, for the price of this option can be written as

f(Y_t) = y_t Φ(a₁) − m_t e^{−r(T−t)} Φ(a₂) − y_t (σ²/(2r)) ( Φ(−a₁) − e^{−r(T−t)} (m_t/y_t)^{2r/σ²} Φ(−a₃) ),

where

a₁ = [ log(y_t/m_t) + (r + σ²/2)(T − t) ] / (σ √(T − t)),
a₂ = a₁ − σ √(T − t),
a₃ = a₁ − (2r/σ) √(T − t).

Figure 17 shows three representative test paths with corresponding closed-form solutions. Our algorithm shows promising results.
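A sketch of the closed-form price (17), evaluated from a path observed on a grid up to time t < T, is given below (the parameter defaults, including the strike, are placeholders; the repository linked in Section 4 contains the released code for this example):

```python
import numpy as np
from scipy.stats import norm

def geometric_asian_closed_form(t, t_grid, y_path, K=0.8, r=0.0, q=0.0, sigma=1.0, T=1.0):
    """Price (17) of the fixed-strike geometric Asian call at time t < T."""
    i = np.searchsorted(t_grid, t)
    y_t = y_path[i]
    log_avg = np.trapz(np.log(y_path[: i + 1]), t_grid[: i + 1])    # int_0^t log y_u du
    mu_bar = (r - q - 0.5 * sigma ** 2) * (T - t) ** 2 / (2.0 * T)
    sig_bar = sigma / T * np.sqrt((T - t) ** 3 / 3.0)
    m = log_avg / T + (1.0 - t / T) * np.log(y_t) + mu_bar          # (t/T) log G_t + (1 - t/T) log y_t + mu_bar
    d2 = (m - np.log(K)) / sig_bar
    d1 = d2 + sig_bar
    return np.exp(-r * (T - t)) * (np.exp(m + 0.5 * sig_bar ** 2) * norm.cdf(d1) - K * norm.cdf(d2))
```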
Figure 16: Three representative paths with corresponding solutions for the geometric Asian option.

Figure 17: Three representative paths with corresponding solutions for the lookback option.
4.2.3 Barrier Option

There are several types of barrier options, see, for instance, Reiner and Rubinstein [1991] and Fouque et al. [2011]. Here, we will focus on the case of down-and-out call options. More precisely, the option becomes worthless when the spot value crosses a down barrier B below the initial spot value. Otherwise, the payoff is a call with strike K ≥ B. The payoff functional can then be written as

g(Y_T) = (y_T − K)_+ 1{ inf_{0 ≤ t ≤ T} y_t > B }.

In addition, the solution should also satisfy the boundary condition f(Y_t) = 0 if the barrier B was crossed by the path Y_t. A closed-form solution is available:

f(Y_t) = f_do(y_t, T − t), if inf_{0 ≤ u ≤ t} y_u > B,
         0,                if inf_{0 ≤ u ≤ t} y_u ≤ B,   (18)

where

f_do(y_t, T − t) = C_BS(y_t, T − t) − (y_t/B)^{1−λ} C_BS(B²/y_t, T − t),

C_BS(y_t, T − t) is the price of a call option with strike K and maturity T at (t, y_t) and λ = 2(r − q)/σ².

Due to the fact that the option becomes valueless when the stock price crosses the barrier, we need to slightly modify the loss function in our algorithm. In this case, the loss for a given sample path j at time t_i is

J^{(j)}_{t_i}(θ) = u(Y^{(j)}_{t_i}; θ)²,  if inf_{0 ≤ i' ≤ i} y^{(j)}_{t_{i'}} < B,
                  ( ∆^{[δt]}_t u(Y^{(j)}_{t_i}; θ) + L^{[h]} u(Y^{(j)}_{t_i}; θ) )²,  otherwise.

The total loss is calculated as

J_{N,M}(θ) = (1/(MN)) Σ_{j=1}^M Σ_{i=0}^N J^{(j)}_{t_i}(θ) + (1/M) Σ_{j=1}^M [ ( u(Y^{(j)}_{t_N}; θ) − g(Y^{(j)}_{t_N}) )² 1{ inf_{0 ≤ i ≤ N} y_{t_i} > B } + u(Y^{(j)}_{t_N}; θ)² 1{ inf_{0 ≤ i ≤ N} y_{t_i} ≤ B } ].

We then minimize the above loss objective using the stochastic gradient descent algorithm and update the parameters θ.

In the numerical implementation, we choose a barrier B and a strike K with 0 < B ≤ K < 1. Figure 18 plots three representative paths and the corresponding solutions. Test path 1 is a geometric Brownian motion with the parameters described above. Note this path does not cross the barrier. Test path 2 is another geometric Brownian motion but with σ = 2. This path down-crosses the barrier before maturity. The third test path is a smooth decreasing path. As a result, the predicted solutions and the true solutions are approximately the same.
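For completeness, the closed-form price (18) can be evaluated as follows (a minimal sketch; the Black-Scholes call helper and the placeholder values of B and K are assumptions, since the exact values used in the experiments are not recorded here):

```python
import numpy as np
from scipy.stats import norm

def bs_call(y, tau, K, r=0.0, q=0.0, sigma=1.0):
    """Plain Black-Scholes call price C_BS(y, tau) with strike K."""
    d1 = (np.log(y / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return y * np.exp(-q * tau) * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

def down_and_out_call(y_path, tau, B=0.5, K=0.6, r=0.0, q=0.0, sigma=1.0):
    """Closed form (18): zero once the barrier was crossed, image formula otherwise."""
    if y_path.min() <= B:
        return 0.0
    y = y_path[-1]
    lam = 2.0 * (r - q) / sigma ** 2
    return bs_call(y, tau, K, r, q, sigma) - (y / B) ** (1.0 - lam) * bs_call(B ** 2 / y, tau, K, r, q, sigma)
```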
Remark 4.1

In our numerical experiments, we simulate the paths Y at times t_i equally spaced by δt. When verifying whether the barrier was crossed, we only have available these discretized values y_{t_i}. It would be possible to have inf_{i=0,...,N} y_{t_i} > B but inf_{0 ≤ t ≤ T} y_t ≤ B. This problem vanishes as δt goes to 0. One approach would then be to consider a sufficiently small δt in order to diminish this issue. This concern appears in the usual Monte Carlo methods to price barrier options and the methods used there might be adapted to assist us here, see Gobet [2009].

4.2.4 Heston Model

A more complex model we could consider is the well-known Heston model:

dx_t = (r − q) x_t dt + √v_t x_t dw_t,
dv_t = κ (m − v_t) dt + ξ √v_t dw*_t,
dw_t dw*_t = ρ dt.

The price at time t of a general path-dependent option with maturity T and payoff g : Λ_T → R can be written as the functional f(Y_t, v) and solves the PPDE

∆_t f(Y_t, v) + (r − q) y_t ∆_x f(Y_t, v) + (1/2) v y_t² ∆_{xx} f(Y_t, v) − r f(Y_t, v) + κ (m − v) ∂_v f(Y_t, v) + (1/2) ξ² v ∂_{vv} f(Y_t, v) + ρ ξ v y_t ∆_x ∂_v f(Y_t, v) = 0,
f(Y_T, v) = g(Y_T).   (19)
Figure 18: Three representative paths with corresponding solutions for the down-and-out call option.

The generalization of our algorithm to this multidimensional case is straightforward. We will consider the geometric Asian option as in Section 4.2.1. For the numerical implementation, we specify κ = 3, m = 1, ξ = 1, x_0 = v_0 = 1, together with fixed constants r, q and ρ. For a fixed maturity time T = 1, Figure 19 plots a pair of stock price and volatility path realizations on the left-hand side. Solutions predicted by our algorithm are shown on the right for different strikes from 0 to 1. On the left of Figure 20, we plot option prices versus strikes with fixed maturity T = 1, and on the right we plot option prices versus different maturities with a fixed strike price K ∈ (0, 1).
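Training paths for this example can be generated with a simple Euler scheme in which the variance is floored at zero, as suggested in Section 3.3.1 (an illustrative sketch; the values of r, q and rho below are placeholders):

```python
import numpy as np

def simulate_heston_paths(M=128, N=100, T=1.0, x0=1.0, v0=1.0, r=0.0, q=0.0,
                          kappa=3.0, m=1.0, xi=1.0, rho=0.1, seed=0):
    """Euler simulation of (x_t, v_t) with the CIR variance truncated at zero."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.full((M, N + 1), x0)
    v = np.full((M, N + 1), v0)
    for i in range(N):
        z1 = rng.standard_normal(M)
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(M)
        vi = np.maximum(v[:, i], 0.0)
        x[:, i + 1] = x[:, i] * (1.0 + (r - q) * dt + np.sqrt(vi * dt) * z1)
        v[:, i + 1] = np.maximum(vi + kappa * (m - vi) * dt + xi * np.sqrt(vi * dt) * z2, 0.0)
    return x, v
```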
Figure 19: Solutions to the Heston model given a pair of paths (Y_t, V_t) (on the left), varying the strike price.
Figure 20: On the left: prices vs. strike prices K. On the right: prices vs. maturity times T.

4.2.5 A Non-Linear PPDE

This is the last numerical example. We consider a non-linear PPDE with closed-form solution. This example was studied in Ren and Tan [2017].

The closed-form solution is given by f(Y_t) = cos(y_t + I_t), where I_t = ∫_0^t y_u du is the running integral, and the PPDE being considered is

∆_t f(Y_t) + ( µ 1{∆_x f(Y_t) > 0} + µ̄ 1{∆_x f(Y_t) < 0} ) ∆_x f(Y_t) + (1/2) ( σ² 1{∆_{xx} f(Y_t) < 0} + σ̄² 1{∆_{xx} f(Y_t) > 0} ) ∆_{xx} f(Y_t) + φ(Y_t) = 0,
f(Y_T) = g(Y_T).

We want to find a suitable source term φ so that f satisfies the PPDE above. Notice that

∆_t f(Y_t) = −sin(y_t + I_t) y_t,
∆_x f(Y_t) = −sin(y_t + I_t),
∆_{xx} f(Y_t) = −cos(y_t + I_t).

Plugging them into the PPDE yields

−sin(y_t + I_t) y_t + ( µ 1{∆_x f(Y_t) > 0} + µ̄ 1{∆_x f(Y_t) < 0} ) [−sin(y_t + I_t)] + (1/2) ( σ² 1{∆_{xx} f(Y_t) < 0} + σ̄² 1{∆_{xx} f(Y_t) > 0} ) [−cos(y_t + I_t)] + φ(Y_t) = 0.

Rearranging,

−sin(y_t + I_t) ( y_t + µ 1{∆_x f(Y_t) > 0} + µ̄ 1{∆_x f(Y_t) < 0} ) − (1/2) cos(y_t + I_t) ( σ² 1{∆_{xx} f(Y_t) < 0} + σ̄² 1{∆_{xx} f(Y_t) > 0} ) + φ(Y_t) = 0.

Then,

φ(Y_t) = (y_t + µ) min(sin(y_t + I_t), 0) + (y_t + µ̄) max(sin(y_t + I_t), 0) + (1/2) σ² max(cos(y_t + I_t), 0) + (1/2) σ̄² min(cos(y_t + I_t), 0).

Moreover, g(Y_T) = cos(y_T + I_T). The motivation for this problem is the following stochastic differential game:

u = inf_{µ ∈ [µ, µ̄]} sup_{σ ∈ [σ, σ̄]} E[ g(X^{µ,σ}_T) + ∫_0^T φ(X^{µ,σ}_t) dt ],
where dx^{µ,σ}_t = µ_t dt + σ_t dW_t with x^{µ,σ}_0 = 0. We use standard Brownian motion paths to train the neural network. We specify fixed values of the bounds µ < 0 < µ̄ and 0 < σ < σ̄. While keeping track of the signs of the spatial derivatives, our algorithm works in the same way as in the other examples. The loss after 15,000 epochs is plotted in Figure 21. Three representative paths with their corresponding solutions are presented in Figure 22. Test path 1 is a realization of a standard Brownian motion path. Test path 2 is the smooth path y_t = (1 − t). Test path 3 is a realization of a sequence of i.i.d. uniform random variables y_{t_i}, i ∈ {0, . . . , N}, drawn from an interval with lower endpoint 1.
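The source term derived above and the exact solution are straightforward to evaluate along a discretized path; the interval endpoints in the sketch below are placeholders, since the exact values used in the experiment are not legible here.

```python
import numpy as np

def phi_source(y, I, mu_lo=-0.2, mu_hi=0.2, sig_lo=0.2, sig_hi=0.4):
    """Source phi(Y_t) making f(Y_t) = cos(y_t + I_t) solve the non-linear PPDE."""
    s, c = np.sin(y + I), np.cos(y + I)
    return ((y + mu_lo) * np.minimum(s, 0.0) + (y + mu_hi) * np.maximum(s, 0.0)
            + 0.5 * sig_lo ** 2 * np.maximum(c, 0.0) + 0.5 * sig_hi ** 2 * np.minimum(c, 0.0))

def exact_solution(t_grid, y_path):
    """Closed-form solution f(Y_t) = cos(y_t + I_t), with I_t the running integral of the path."""
    I = np.concatenate([[0.0], np.cumsum(0.5 * (y_path[1:] + y_path[:-1]) * np.diff(t_grid))])
    return np.cos(y_path + I)
```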
Train LossTest Loss
Figure 21: Train and test losses for the path-independent example. Y t Test Path 1Test Path 2Test Path 3 0.0 0.2 0.4 0.6 0.8 1.0t0.750.500.250.000.250.500.751.00 f ( Y t ) Solution to Test Path 1
TruePredTerminal Value0.0 0.2 0.4 0.6 0.8 1.0t0.60.70.80.91.0 f ( Y t ) Solution to Test Path 2
TruePredTerminal Value 0.0 0.2 0.4 0.6 0.8 1.0t1.00.80.60.40.20.00.2 f ( Y t ) Solution of Test Path 3
TruePredTerminal Value
Figure 22: Three representative paths with corresponding solutions for the non-linear PPDE.22
5 Conclusion

We have proposed a new method to solve PPDEs based on neural networks, called the Path-Dependent Deep Galerkin Method (PDGM). A novel network architecture was developed in order to deal with the objects from the functional Itô calculus. There are very few methods available to solve these equations; for a discussion about them, see Ren and Tan [2017] and references therein. We then showed the vast capabilities of the PDGM in various examples.

Future work could be divided between two main avenues. Firstly, one could study theoretical questions regarding the PDGM method such as its consistency, speed of convergence and stability. Secondly, one could apply the method to more complex situations. As mentioned in the introduction, one could also extend the PDGM to the different family of PPDEs originating from Viens and Zhang [2019].

Additionally, the notion of monotone numerical schemes was generalized to the PPDE setting in Ren and Tan [2017]. As future research one could study whether the method proposed here is monotone as defined in the aforesaid reference.
References
A. Al-Aradi, A. Correia, D. Naiff, G. Jardim, and Y. F. Saporito. Applications of the Deep Galerkin Method to Solving Partial Integro-Differential and Hamilton-Jacobi-Bellman Equations. Preprint, 2019. Available at arXiv: http://arxiv.org/abs/1912.01455.

Y. Bengio, P. Simard, and P. Frasconi. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Transactions on Neural Networks, 5:157–166, 1994.

R. Cont and D.-A. Fournié. Change of Variable Formulas for Non-Anticipative Functionals on Path Space. J. Funct. Anal., 259(4):1043–1072, 2010.

B. Dupire. Functional Itô Calculus. Quantitative Finance, 19(5):721–729, 2019 (first version, 2009).

W. E, J. Han, and A. Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380, 2017.

W. E, J. Han, and A. Jentzen. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.

I. Ekren, C. Keller, N. Touzi, and J. Zhang. On Viscosity Solutions of Path Dependent PDEs. Ann. Probab., 42:204–236, 2014.

I. Ekren, N. Touzi, and J. Zhang. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part I. Ann. Probab., 44:1212–1253, 2016a.

I. Ekren, N. Touzi, and J. Zhang. Viscosity Solutions of Fully Nonlinear Parabolic Path Dependent PDEs: Part II. Ann. Probab., 44:2507–2553, 2016b.

F. Flandoli and G. Zanco. An infinite-dimensional approach to path-dependent Kolmogorov equations. Ann. Probab., 44, 2016.

J.-P. Fouque and Z. Zhang. Deep learning methods for mean field control problems with delay. Available at arXiv: https://arxiv.org/abs/1905.00358, 2019.

J.-P. Fouque, G. Papanicolaou, R. Sircar, and K. Sølna. Multiscale Stochastic Volatility for Equity, Interest Rate, and Credit Derivatives. Cambridge University Press, 2011.

E. Gobet. Advanced Monte Carlo methods for barrier and related exotic options. In A. Bensoussan, Q. Zhang, and P. Ciarlet, editors, Handbook of Numerical Analysis, pages 497–528. Elsevier, 2009.

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

A. Jacquier and M. Oumgari. Deep PPDEs for rough local stochastic volatility. Available at arXiv: https://arxiv.org/abs/1906.02551, 2019.

S. Jazaerli and Y. F. Saporito. Functional Itô Calculus, Path-dependence and the Computation of Greeks. Stochastic Process. Appl., 127:3997–4028, 2017.

I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial Neural Networks for Solving Ordinary and Partial Differential Equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.

H. Lee and I. S. Kang. Neural algorithm for solving differential equations. Journal of Computational Physics, 91(1):110–131, 1990.

H. Oberhauser. An extension of the functional Itô formula under a family of non-dominated measures. Stoch. Dyn., 16, 2016.

D. R. Parisi, M. C. Mariani, and M. A. Laborde. Solving differential equations with unsupervised neural networks. Chemical Engineering and Processing: Process Intensification, 42(8-9):715–721, 2003.

T. Pham and J. Zhang. Two Person Zero-sum Game in Weak Formulation and Path Dependent Bellman-Isaacs Equation. SIAM J. Control Optim., 52(4):2090–2121, 2014.

R. G. Pinsky. Positive Harmonic Functions and Diffusion. Cambridge University Press, 1995.

M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

E. Reiner and M. Rubinstein. Breaking Down the Barriers. Risk Magazine, 4, 1991.

Z. Ren and X. Tan. On the convergence of monotone schemes for path-dependent PDEs. Stochastic Process. Appl., 127:1738–1762, 2017.

L. Rogers and D. Williams. Diffusions, Markov Processes and Martingales. Cambridge Mathematical Library, second edition, 2000.

Y. F. Saporito. Stochastic Control and Differential Games with Path-Dependent Influence of Controls on Dynamics and Running Cost. SIAM Journal on Control and Optimization, 57(2):1312–1327, 2019. Available at arXiv: http://arxiv.org/abs/1611.00589.

J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.

F. Viens and J. Zhang. A martingale approach for fractional Brownian motions and related path dependent PDEs. Ann. Appl. Probab., 29(6):3489–3540, 2019.