A differential neural network learns stochastic differential equations and the Black-Scholes equation for pricing multi-asset options
Sang-Mun Chi
Department of Computer Science, Kyungsung University, Busan 48434, South Korea
Abstract
Neural networks with sufficiently smooth activation functions can approximate the values and derivatives of any smooth function, and they are differentiable themselves. We improve the approximation capability of neural networks by utilizing this differentiability: the gradient and Hessian of a neural network are used to train the network to satisfy the differential equations of the problem of interest. Several activation functions are also compared in terms of how effectively the neural network can be differentiated. We apply the differential neural networks to the pricing of financial options, where stochastic differential equations and the Black-Scholes partial differential equation describe the relation between the price of an option and its underlying assets, and where the first and second derivatives of the option price, the Greeks, play important roles in financial engineering. The proposed neural network learns (a) the sample paths of option prices generated by stochastic differential equations and (b) the Black-Scholes equation at each time and asset price. Option pricing experiments were performed on multi-asset options such as exchange and basket options. Experimental results show that the proposed method gives accurate option values and Greeks; sufficiently smooth activation functions and the constraint of the Black-Scholes equation contribute significantly to accurate option pricing.
Keywords: differential neural networks, stochastic differential equations, the Black-Scholes equation, option pricing, smooth activation functions.
1. Introduction
Neural networks (NNs) have found successful applications such as image understanding, handwriting recognition, speech recognition, and drug discovery Krizhevsky et al. (2012); Le et al. (2012); Graves and Schmidhuber (2009); Deng and Yu (2014); Ramsundar et al. (2015). This success is due to the universal approximation capability of NNs, which approximate a function accurately without assuming any mathematical relationship between its input and output. Whereas no such mathematical relationships are known for the foregoing applications, strong governing differential equations are present in many fields such as finance, fluid dynamics, quantum mechanics, and diffusion phenomena. For these problems, the approximation capability of NNs can be improved when the differential relationships are included in the modeling.

Neural networks with sufficiently smooth activation functions can approximate a function and its derivatives Cybenko (1989); Hornik et al. (1990); Gallant and White (1992); they are also differentiable themselves, because the composition of differentiable functions is differentiable. Thus, we can obtain the gradient and Hessian of differentiable NNs, and use these derivatives to make the NNs themselves satisfy the differential equations of the problems of interest. To obtain accurate derivatives of NNs, we calculate the gradient and Hessian of NNs using automatic differentiation Nocedal and Wright (2006); Abadi et al. (2015); Raissi (2018), which is an exact method based on the chain rule for differentiating compositions of functions. Since the differentiation of NNs plays a key role in the present work, activation functions should be sufficiently smooth as well as effective for our applications. Although any smooth activation function makes NNs differentiable, we need activation functions that show good practical performance; we therefore compare several activation functions in terms of estimation accuracy for the value, gradient, and Hessian.

We need suitable benchmark problems to examine the effectiveness of our differential NN. We consider financial option pricing because the option pricing problem has a widely known mathematical relationship, the Black-Scholes equation Black and Scholes (1973), and the derivatives of the option price, the Greeks, are extensively studied since they have many practical applications such as hedging or speculating on future asset prices Hull (2018); Shreve (2004); Glasserman (2004). Hence, the performance of the proposed differential NNs can be easily tested in option pricing experiments.

Most multidimensional options have no general closed-form pricing solution and are priced by numerical approximation techniques. Various numerical methods have been developed for this problem, such as finite difference methods (FDMs), Fourier methods, and Monte Carlo (MC) simulations Hull (2018); Shreve (2004); Carr et al. (2001); Glasserman (2004). FDMs approximate a discrete version of the Black-Scholes partial differential equation (PDE); MC methods use generated sequences of stock prices governed by the stochastic differential equation (SDE). The present work combines the information of the PDE and the SDE, and trains NNs using (a) the option prices generated by the SDE and (b) the constraint of the Black-Scholes equation. As a result, the proposed method is mesh-free like MC simulations, and does not suffer from the curse of dimensionality associated with high-dimensional FDMs.
Furthermore, the Greeks are easily computed by differentiating the neural network, because the proposed method represents the solution of the Black-Scholes equation in the form of a neural network.

NNs have also been applied to option pricing problems Gencay and Min Qi (2001); Kohler et al. (2010); Sirignano and Spiliopoulos (2018); E et al. (2017); Becker et al. (2019). Our approach relates to the methods of differentiating NNs used in Sirignano and Spiliopoulos (2018); Raissi (2018). We use automatic differentiation to compute the gradient and Hessian of NNs, while Sirignano and Spiliopoulos (2018) use an MC method to approximate the Hessian; automatic differentiation is an exact operation, not an approximation of the kind used in MC methods and FDMs Nocedal and Wright (2006); Abadi et al. (2015). The present work generates training data using the SDE, which gives a more realistic evolution of asset prices, while Sirignano and Spiliopoulos (2018) use random samples from the region where the function is defined. Raissi (2018) uses automatic differentiation to compute the gradient of NNs and the SDE to generate training data; however, the loss function for training the NN does not contain the PDE, and the Hessian of the NN is not used. In addition, based on our performance comparison experiments, we adopt the sufficiently smooth activation function softplus Glorot et al. (2011) for the efficient differentiation of NNs, while Sirignano and Spiliopoulos (2018) use tanh and Raissi (2018) uses sine as an activation function.

The remainder of the paper is organized as follows. We discuss related work on multidimensional option pricing and numerical approximation methods in Section 2. The proposed method for learning stochastic differential equations and the Black-Scholes equation for option pricing is developed in Section 3. The validity of the proposed method is demonstrated in Section 4, and conclusions are drawn in Section 5.
2. Price and Greeks of multidimensional options
A financial option is a contract between the seller and the buyer (the holder of the option). A European call option gives the holder the right, but no obligation, to buy the risky assets at expiration time $T$ for a price that is agreed on now, the strike price $K$. The option itself has a price because it gives the holder a right. Let the stock price $S_i(t)$ be a geometric Brownian motion,

$dS_i(t) = rS_i(t)\,dt + \sigma_i S_i(t)\,d\tilde W_i(t)$, (1)

where $t$ denotes time, $r$ the risk-free interest rate, $\sigma_i$ the volatility of the asset price, and $\tilde W_i$ a Brownian motion under a risk-neutral probability measure with covariance $\mathrm{Cov}(\tilde W_i(t), \tilde W_j(t)) = \rho_{ij} t$. The correlated Brownian motion $\tilde W_i(t)$ can be expressed as $\sqrt{t}\,L_i Z$, where $Z$ is a standard $n$-dimensional normally distributed vector and $L_i$ is the $i$th row of $L$; $L$ represents any matrix such that $\Sigma = LL'$, where the $(i,j)$ component of $\Sigma$ is $\rho_{ij}$. We choose $L$ to be the Cholesky factor of $\Sigma$. Then the stock prices at times $t_0$ and $t_1$ are related by

$S_i(t_1) = S_i(t_0)\exp\big((r - \frac{\sigma_i^2}{2})(t_1 - t_0) + \sigma_i\sqrt{t_1 - t_0}\,L_i Z\big)$ (2)

Glasserman (2004); Shreve (2004); Hull (2018).

2.1. Exchange option

An exchange option gives the holder the right to exchange one asset for another at a given time in the future; it is commonly seen in the energy market. The payoff of the exchange option is

$\max(S_1(T) - S_2(T),\, 0)$. (3)

Margrabe (1978) developed an analytical solution formula for the price of this option. The price of the exchange option at time $t$ is given by

$u(t, S_1, S_2) = S_1 N(d_1) - S_2 N(d_2)$, (4)

where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$, $d_1 = \frac{1}{\sigma\sqrt{T-t}}\big[\ln(S_1/S_2) + \frac{\sigma^2}{2}(T-t)\big]$, $d_2 = d_1 - \sigma\sqrt{T-t}$, and $N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz$.

The Greeks are the sensitivities of the option price to the movement of various parameters. They are used for risk management; the risk in a short position in an option is offset by holding delta units of each underlying asset, where the delta is the partial derivative of the option price with respect to the current price of that underlying asset. Sensitivities with respect to other parameters are also widely used to measure and manage risk. The Greeks of the exchange option are obtained by differentiating Eq. (4). Let $\phi(x) = \frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2})$; differentiating Eq. (4) with respect to $S_1$ and using the property $\phi(d_1)/\phi(d_2) = S_2/S_1$ gives the delta

$\Delta_1 = \frac{\partial u}{\partial S_1} = N(d_1)$. (5)

In some other hedging strategies, we need to hedge away the risk due to changes of the underlying asset's delta; the gamma is defined by

$\Gamma_1 = \frac{\partial \Delta_1}{\partial S_1} = \frac{\partial^2 u}{\partial S_1^2} = \frac{\phi(d_1)}{S_1\,\sigma\sqrt{T-t}}$. (6)

The time decay of the value of an option is called theta, given by

$\Theta = \frac{\partial u}{\partial t} = -\frac{\sigma}{4\sqrt{T-t}}\big[S_1\phi(d_1) + S_2\phi(d_2)\big]$. (7)
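To make the formulas concrete, the following is a short numerical sketch of Eqs. (4)-(7); it is our illustration rather than code from the paper, and the function name `margrabe` and its signature are ours.

```python
import numpy as np
from scipy.stats import norm

def margrabe(t, T, S1, S2, sigma1, sigma2, rho):
    """Exchange option price and Greeks from Eqs. (4)-(7)."""
    tau = T - t                                         # time to maturity
    sigma = np.sqrt(sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2)
    d1 = (np.log(S1 / S2) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    price = S1 * norm.cdf(d1) - S2 * norm.cdf(d2)                 # Eq. (4)
    delta1 = norm.cdf(d1)                                         # Eq. (5)
    gamma1 = norm.pdf(d1) / (S1 * sigma * np.sqrt(tau))           # Eq. (6)
    theta = -sigma / (4.0 * np.sqrt(tau)) * (
        S1 * norm.pdf(d1) + S2 * norm.pdf(d2))                    # Eq. (7)
    return price, delta1, gamma1, theta

# Example with illustrative parameters (the correlation 0.3 is a placeholder):
# price, delta1, gamma1, theta = margrabe(0.0, 1.0, 60.0, 60.0, 0.4, 0.2, 0.3)
```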
2.2. Basket option

A European basket call option gives the holder the right to buy a group of underlying assets at the same time. The price of the option at maturity is given by

$C(T, S_1(T), S_2(T), \ldots, S_n(T)) = \max\big(\sum_{i=1}^{n} w_i S_i(T) - K,\; 0\big)$, (8)

where $w_i$ denotes the quantity of the $i$th asset in the basket option contract. Since this option has no exact formula for its price and Greeks, these values are approximated by numerical methods such as finite difference methods (FDMs) and Monte Carlo (MC) simulations Hull (2018); Shreve (2004); Glasserman (2004). While FDMs are accurate in low-dimensional problems, they become infeasible in higher dimensions due to the increased number of grid points and numerical instability. MC methods are versatile, handling general types of options, multi-asset problems, and path-dependent problems; however, the MC method converges slowly, at rate $O(1/\sqrt{M})$, where $M$ denotes the number of random samples.

FDMs approximate the price of the option, $u(t, S_1, S_2, \ldots, S_n)$, on the basis of the Black-Scholes equation Hull (2018); Shreve (2004),

$\frac{\partial u}{\partial t} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_i\sigma_j\rho_{i,j} S_i S_j \frac{\partial^2 u}{\partial S_i \partial S_j} + r\sum_{i=1}^{n} S_i \frac{\partial u}{\partial S_i} = ru$, (9)

where $r$ denotes the risk-free interest rate and $(\sigma_i\sigma_j\rho_{i,j})_{1\le i,j\le n}$ denotes the covariance matrix of the stock prices. FDMs discretize the stock price and time dimensions and use the option values at the time of expiration. We use central difference equations to obtain the first and second derivatives with respect to the stock price, and a forward difference equation to obtain the time derivative. The Greeks at time zero are obtained from the solved $u(\cdot)$. The delta is given by

$\Delta_i = \frac{u(0, S_1, \ldots, S_i + \delta_s, \ldots, S_n) - u(0, S_1, \ldots, S_i - \delta_s, \ldots, S_n)}{2\delta_s}$, (10)

where $\delta_s$ denotes a stock price discretization unit. The gamma is given by

$\Gamma_i = \frac{u(0, \ldots, S_i + \delta_s, \ldots) - 2u(0, \ldots, S_i, \ldots) + u(0, \ldots, S_i - \delta_s, \ldots)}{\delta_s^2}$. (11)

The theta is given by

$\Theta = \frac{u(\delta_t, S_1, \ldots, S_n) - u(0, S_1, \ldots, S_n)}{\delta_t}$, (12)

where $\delta_t$ denotes a time discretization unit.

MC methods generate multiple simulated paths of the stock price by Eq. (2). The price of the option is obtained by averaging the payoff as follows:

$V(t) = E^{Q}\big[e^{-r(T-t)} V(T) \mid \mathcal{F}(t)\big]$, (13)

where $V(T)$ is the payoff function, $\mathcal{F}(t)$ is a filtration for the Brownian motion $\tilde W_i(t)$ in Eq. (1), and $Q$ is the risk-neutral measure Hull (2018); Shreve (2004); Glasserman (2004). The estimation of the Greeks is a practical challenge in MC methods. In the present work, the pathwise derivative estimate is used for calculating the delta and theta, and a combination of the likelihood ratio method and the pathwise derivative estimate is used for calculating the gamma. The Greeks at time zero are given by averaging the formulas in Table 1. Derivations of these formulas are described in the Appendix.

Table 1: Greeks estimation by MC simulation. Each Greek is estimated as the expectation of the listed expression.

Exchange option:
  $\Delta_1$: $e^{-rT}\,\frac{S_1(T)}{S_1(0)}\,\mathbf{1}\{S_1(T) > S_2(T)\}$
  $\Theta$: $e^{-rT}\big[S_1(T)\big(\frac{\sigma_1^2}{2} - \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\big) - S_2(T)\big(\frac{\sigma_2^2}{2} - \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\big)\big]\,\mathbf{1}\{S_1(T) > S_2(T)\}$
  $\Gamma_1$: $e^{-rT}\,\frac{(Z'L^{-1}A^{-1})_1}{\sqrt{T}\,S_1(0)^2}\,S_2(T)\,\mathbf{1}\{S_1(T) > S_2(T)\}$

Basket option:
  $\Delta_i$: $e^{-rT}\,\frac{S_i(T)}{S_i(0)}\,w_i\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
  $\Theta$: $e^{-rT}\big[-rK + \sum_{j=1}^{n} w_j S_j(T)\big(\frac{\sigma_j^2}{2} - \frac{\sigma_j L_j Z}{2\sqrt{T}}\big)\big]\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
  $\Gamma_i$: $e^{-rT}\,\frac{(Z'L^{-1}A^{-1})_i}{\sqrt{T}\,S_i(0)^2}\,\big\{w_i S_i(T) - \sum_{j=1}^{n} w_j S_j(T) + K\big\}\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
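As an illustration of Eqs. (2) and (13) and of the pathwise delta in Table 1, a minimal MC sketch for the exchange option might look as follows; this is our code, and the names `mc_exchange`, `corr`, and `n_paths` are assumptions rather than notation from the paper.

```python
import numpy as np

def mc_exchange(S0, sigma, corr, r, T, n_paths=100_000, seed=0):
    """MC price (Eq. (13)) and pathwise delta (Table 1) of an exchange option."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)                # Sigma = L L'
    Z = rng.standard_normal((n_paths, 2))
    # Terminal prices by Eq. (2): S_i(T) = S_i(0) exp((r - sigma_i^2/2) T + sigma_i sqrt(T) L_i Z)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * (Z @ L.T))
    payoff = np.maximum(ST[:, 0] - ST[:, 1], 0.0)
    in_money = ST[:, 0] > ST[:, 1]
    price = np.exp(-r * T) * payoff.mean()
    delta1 = np.exp(-r * T) * np.mean(ST[:, 0] / S0[0] * in_money)
    return price, delta1

# Example: two assets with sigma = (0.4, 0.2) and an illustrative correlation of 0.3.
# p, d = mc_exchange(np.array([60.0, 60.0]), np.array([0.4, 0.2]),
#                    np.array([[1.0, 0.3], [0.3, 1.0]]), r=0.1, T=1.0)
```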
3. Learning stochastic differential equations and the Black-Scholes equation

We approximate the price of the option $u(t, S_1, S_2, \ldots, S_n)$ in Eq. (9) using a neural network $\mathcal{N}(t, S)$. The expiry time $T$ is divided into $N$ (= 200) equal intervals, $t_0 = 0 < t_1 < \cdots < t_N = T$. The network takes $(t, S)$ as input and has $l = 5$ layers, with one unit in the last layer.

To train the neural network $\mathcal{N}(t, S)$, we need to know the target $u(t, S)$. Although $u(\cdot)$ is of course unknown at $t < T$, this function satisfies the following SDE when the stock price follows Eq. (1) Hull (2018); Shreve (2004); Glasserman (2004):

$du = \big(u_t + \sum_{i=1}^{n} r S_i u_{S_i} + \frac{1}{2} S' H S\big)\, dt + \sum_{i=1}^{n} \sigma_i S_i u_{S_i}\, d\tilde W_i$, (16)

where $u_t = \frac{\partial u}{\partial t}$, $S' = (S_1, S_2, \ldots, S_n)$, $H_{ij} = \rho_{ij}\sigma_i\sigma_j \frac{\partial^2 u}{\partial S_i \partial S_j}$, and $u_{S_i} = \frac{\partial u}{\partial S_i}$. Based on this SDE and the derivatives of the neural network $\mathcal{N}(t, S)$, we generate the following $\tilde u(t_k, S(t_k))$ that approximates $u(t_k, S(t_k))$:

$\tilde u(t_k, S(t_k)) = \mathcal{N}(t_{k-1}, S(t_{k-1})) + \big\{\mathcal{N}_t(t_{k-1}, S(t_{k-1})) + \sum_{i=1}^{n} r S_i(t_{k-1})\, \mathcal{N}_{S_i}(t_{k-1}, S(t_{k-1})) + \frac{1}{2} S(t_{k-1})' H_{k-1} S(t_{k-1})\big\}\, dt_k + \sum_{i=1}^{n} \sigma_i S_i(t_{k-1})\, \mathcal{N}_{S_i}(t_{k-1}, S(t_{k-1}))\, d\tilde W_i(t_k)$, (17)

where $\mathcal{N}_t$ and $\mathcal{N}_{S_i}$ respectively denote the partial derivatives of the neural network $\mathcal{N}(t, S)$ with respect to $t$ and $S_i$, $(H_{k-1})_{ij} = \rho_{ij}\sigma_i\sigma_j \frac{\partial^2 \mathcal{N}(t_{k-1}, S(t_{k-1}))}{\partial S_i \partial S_j}$, $dt_k = t_k - t_{k-1}$, and $d\tilde W_i(t_k) = \sqrt{t_k - t_{k-1}}\, L_i Z$. This $\tilde u(t_k, S(t_k))$ is used to train $\mathcal{N}(t_k, S(t_k))$ with the following loss function, which is minimized during training of the neural network:

$L_{SDE} = \sum_{k=1}^{N} \big[\tilde u(t_k, S(t_k)) - \mathcal{N}(t_k, S(t_k))\big]^2$. (18)

We enforce the neural network $\mathcal{N}(t, S)$ to satisfy the Black-Scholes equation in Eq. (9) by minimizing the following representation of the Black-Scholes equation in the form of a neural network:

$L_{BS} = \sum_{k=1}^{m} \big[\mathcal{N}_t(t_k, S(t_k)) + \sum_{i=1}^{n} r S_i(t_k)\, \mathcal{N}_{S_i}(t_k, S(t_k)) + \frac{1}{2} S(t_k)' H_k S(t_k) - r\,\mathcal{N}(t_k, S(t_k))\big]^2$. (19)

We need to differentiate the neural network with respect to the variables $t_k$ and $S(t_k)$ for the calculation of Eq. (17) and Eq. (19). This differentiation is performed using automatic differentiation. Automatic differentiation is an exact differentiation method, not an approximation method, so it gives accurate differential results. It applies the chain rule for differentiating compositions of a set of elementary functions whose derivatives are known exactly Nocedal and Wright (2006); Abadi et al. (2015); Raissi (2018). Since software libraries like TensorFlow Abadi et al. (2015) already provide operations for automatic differentiation, we used the tensorflow.GradientTape operation in TensorFlow to calculate the gradient of the neural network; the tensorflow.GradientTape operation is applied to the gradient of the NN for the calculation of the Hessian of the NN.

At expiration time $T$, the true $u(T, S_1, S_2, \ldots, S_n)$ is given by the payoff function. To match the option value at maturity time $T$, the third loss function is

$L_T = \big[\mathcal{N}(T, S(T)) - h(T)\big]^2 \cdot w_T$, (20)

where $h(T)$ denotes the payoff function, such as Eq. (3) or Eq. (8), and $w_T$ denotes the weight for accurate approximation at $T$; $w_T = N/\ldots$ is used as the default weight in this work. Finally, we use the following loss function, which penalizes deviations of the neural network output from the stochastic differential equation, deviations of the neural network differential structure from the Black-Scholes equation, and deviations of the neural network output from the payoff function at expiration time $T$:

$Loss = L_{SDE} + L_{BS} + L_T$. (21)

We call our neural network SDBS, because it is based on the stochastic differential equation and the Black-Scholes equation.
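To make the role of automatic differentiation concrete, the following TensorFlow sketch computes $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, the Hessian, and the Black-Scholes residual inside the sum of Eq. (19). This is our own minimal illustration, not the authors' code; the helper name `bs_residual` and the argument layout are assumptions.

```python
import tensorflow as tf

def bs_residual(net, t, S, r, cov):
    """Residual of the Black-Scholes equation, Eq. (19), for a network net(t, S).

    t: (batch, 1), S: (batch, n); cov[i, j] = rho_ij * sigma_i * sigma_j.
    net is assumed to map the concatenated input (batch, 1 + n) to (batch, 1).
    """
    with tf.GradientTape() as outer:
        outer.watch(S)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([t, S])
            u = net(tf.concat([t, S], axis=1))       # N(t, S), shape (batch, 1)
        u_t = inner.gradient(u, t)                   # N_t
        u_S = inner.gradient(u, S)                   # N_{S_i}
    u_SS = outer.batch_jacobian(u_S, S)              # Hessian d^2 N / dS_i dS_j
    H = cov[tf.newaxis, :, :] * u_SS                 # (H_k)_{ij} of Eqs. (17), (19)
    quad = 0.5 * tf.einsum('bi,bij,bj->b', S, H, S)  # (1/2) S' H S
    return (u_t[:, 0] + tf.reduce_sum(r * S * u_S, axis=1)
            + quad - r * u[:, 0])
```

Squaring and summing this residual over the sampled points $(t_k, S(t_k))$ yields $L_{BS}$.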
We summarize the training procedure of the SDBS in Algorithm 1. For updating the parameters of the SDBS (line 12 in Algorithm 1), we use a stochastic gradient descent method, adam, which is based on adaptive estimation of first-order and second-order moments Kingma and Ba (2014). It is commonly observed that a monotonically decreasing learning rate results in a better-performing model. The present work uses a PolynomialDecay schedule in TensorFlow, and the learning rate is linearly decayed from $10^{-\ldots}$ to $10^{-\ldots}$. As the final trained model, we choose the model that has the minimum $Loss$ during the training iterations.

The estimation procedure is similar to the training procedure. There are three differences: (1) in addition to the input of the training procedure, the parameters of the trained SDBS are loaded; (2) the learning rate is fixed at $10^{-\ldots}$ in line 12 of Algorithm 1; (3) lines 13, 14, and 15 are replaced by the outputs Price $= \mathcal{N}(t, S(t))$, $\Delta_i = \mathcal{N}_{S_i}(t, S(t))$, $\Gamma_i = \mathcal{N}_{S_i S_i}(t, S(t))$, and $\Theta = \mathcal{N}_t(t, S(t))$.

Algorithm 1 Training of SDBS.
Input: $S(t_0)$, $T$, $\sigma_i$, $r$, $\rho_{ij}$, $N$, nEpoch
1:  Calculate $L$ such that $\Sigma = LL'$
2:  $t_0 = 0$
3:  for epoch = 1, 2, ..., nEpoch do
4:    Calculate $\mathcal{N}$, $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, $H$ at $(t_0, S(t_0))$
5:    for k = 1, 2, ..., N do
6:      $t_k = t_{k-1} + T/N$
7:      Calculate $\tilde u(t_k, S(t_k))$ by Eq. (17)
8:      Calculate $S(t_k)$ by Eq. (14)
9:      Calculate $\mathcal{N}$, $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, $H$ at $(t_k, S(t_k))$
10:   end for
11:   Calculate $Loss$ by Eq. (21)
12:   Update the parameters of the neural network by minimizing $Loss$
13:   if $Loss$ is reduced compared with that of the previous epoch then
14:     Save the parameters of the SDBS
15:   end if
16: end for

3.4. Activation functions

Table 2: Classification of activation functions.

name      definition                                          range                    order of continuity
sigmoid   $1/(1 + e^{-x})$                                    $(0, 1)$                 $C^\infty$
tanh      $(e^x - e^{-x})/(e^x + e^{-x})$                     $(-1, 1)$                $C^\infty$
sin       $\sin(x)$                                           $[-1, 1]$                $C^\infty$
relu      $0$ if $x \le 0$; $x$ if $x > 0$                    $[0, \infty)$            $C^0$
elu       $\alpha(e^x - 1)$ if $x \le 0$; $x$ if $x > 0$      $(-\alpha, \infty)$      $C^1$ if $\alpha = 1$; $C^0$ otherwise
selu      $\lambda\,[\alpha(e^x - 1)$ if $x \le 0$; $x$ if $x > 0]$  $(-\lambda\alpha, \infty)$  $C^0$
softplus  $\ln(1 + e^x)$                                      $(0, \infty)$            $C^\infty$

Activation functions determine the output of a node and the characteristics of NNs. They have various shapes, ranges, orders of continuity, and magnitudes of gradient. Every function in Table 2 has a nonlinear shape, because the nonlinearity of the activation functions allows NNs to be universal function approximators Cybenko (1989); Hornik et al. (1990). The range of the function is finite for sigmoid, tanh, and sin, while it is infinite for relu Nair and Hinton (2010), elu Clevert et al. (2015), selu Klambauer et al. (2017), and softplus Glorot et al. (2011). The boundedness and continuity of the partial derivatives of the activation functions up to order $m$ are required to approximate functions whose partial derivatives up to order $m$ are continuous and bounded Hornik et al. (1990); Gallant and White (1992). Hence, the order of continuity of the activation functions critically affects the performance of the proposed method, in which the first and second derivatives of a neural network play a key role. This consideration suggests that the following sufficiently smooth activation functions are suitable for the present work:

$C^\infty$ functions = {sigmoid, tanh, sin, softplus}. (22)

The magnitude of the gradient of an activation function should not vanish during the update of the parameters of NNs in training. When the input to sigmoid or tanh is either too high or too low, vanishing gradients are observed. The following relu-like functions have been widely used in many neural network applications because they show high performance and their gradients are non-zero for all positive values:

relu-like functions = {relu, elu, selu, softplus}, (23)

where $\alpha = 1$ is used for elu, and the pre-defined constants $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$ are used for selu.
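As a sketch of the outer loop of Algorithm 1 under the stated optimizer and schedule, one might write the following. This is our illustration under assumptions: the decay endpoints are placeholders (the learning rate exponents are not legible in this copy), and `loss_fn` stands for the construction of $Loss$ in Eq. (21).

```python
import tensorflow as tf

# Linearly decaying learning rate, as in the PolynomialDecay schedule the paper cites.
schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=1e-3, decay_steps=100_000,
    end_learning_rate=1e-5, power=1.0)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

def train(net, loss_fn, n_epochs):
    """Algorithm 1 skeleton: minimize Loss (Eq. (21)) and keep the best parameters."""
    best = float('inf')
    best_weights = None
    for epoch in range(n_epochs):
        with tf.GradientTape() as tape:
            loss = loss_fn(net)              # L_SDE + L_BS + L_T, built as in Section 3
        grads = tape.gradient(loss, net.trainable_variables)
        optimizer.apply_gradients(zip(grads, net.trainable_variables))
        if float(loss) < best:               # lines 13-15 of Algorithm 1
            best = float(loss)
            best_weights = net.get_weights()
    if best_weights is not None:
        net.set_weights(best_weights)        # restore the minimum-Loss model
    return best
```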
4. Numerical experiments

In this section, numerical experiments are performed to demonstrate the effectiveness of the proposed method for pricing multidimensional options such as exchange and basket options.

4.1. Exchange option

We can compare the performance of numerical methods using an exchange option because this option has an exact solution. Margrabe (1978) provided the formula for the exchange option price in Eq. (4); by differentiating the Margrabe formula, we obtained the Greeks in Eqs. (5)-(7). In the same notation as in Section 2.1, the initial stock prices are $S_1(0) = 60$ and $S_2(0) = 60$, the time to maturity $T = 1$, the volatility of the first asset $\sigma_1 = 0.4$, the volatility of the second asset $\sigma_2 = 0.2$, the interest rate $r = 0.1$, and the correlation coefficient $\rho = 0.\ldots$. The accuracy of an estimate is measured by the relative error,

$\mathrm{rError} = \left|\frac{\mathrm{exact} - \mathrm{estimate}}{\mathrm{exact}}\right|$. (24)

Table 3: Numerical estimates of the price and Greeks of the exchange option. The best results are highlighted in bold face.

           Exact      FDM1       FDM2       MC1        MC2        MC3
Price      8.777591   8.765359   8.776234   8.784203   8.776402   8.777109
rError     0          1.39e-03   1.55e-04   7.53e-04   1.35e-04
$\Delta$   0.573140   0.572740   0.573102   0.573611   0.573094   0.57313
rError     0          7.09e-04   7.80e-05   8.10e-04   9.24e-05
$\Gamma$   0.017726   0.017728   0.017726   0.017738   0.017724   0.017725
rError     0          1.15e-04

Table 4: SDBS estimates of the price and Greeks of the exchange option for several numbers of training epochs. The best results are highlighted in bold face.

           Exact      25,000     50,000     100,000    200,000    400,000
Price      8.777591   8.777736   8.777681   8.777660   8.777587   8.777516
rError     0          1.65e-05   1.02e-05   7.85e-06
$\Gamma$   0.017726   0.017692   0.017653   0.017699   0.017708   0.017701
rError     0          1.90e-03   4.12e-03   1.52e-03

In Table 3, the FDMs discretize the stock price and time domain $[0, \ldots] \times [0, T]$ with 5,000 uniform intervals for FDM1 and 50,000 for FDM2. The MC simulations calculate the price and Greeks of the option by Eq. (13) and Table 1; MC1 uses $10^{\ldots}$ simulated paths, $10^{\ldots}$ for MC2, and $10^{\ldots}$ for MC3. As can be seen in Table 3, the FDMs and MC methods give accurate estimates. Furthermore, a finer discretization makes the FDMs more accurate, and the MC methods estimate more accurately with more simulated paths.

Five independent SDBS models are trained for several values of nEpoch using Algorithm 1. A batch size of 10,000 simulated stock prices $S(t_0)$ was used for the training and estimation procedures. The price and Greeks are generated 1,000 times for each trained NN, and the average of the 5,000 values (5 models × 1,000 times) is used as the final estimate. Compared with the results in Table 3, the SDBS in Table 4 gives performance comparable to the classical FDM and MC methods; in particular, the price estimate for nEpoch 200,000 is notably accurate. These experiments show that the differential neural network underlying the SDBS performs accurately, estimating Price $= \mathcal{N}(t_0, S(t_0))$, $\Delta_i = \mathcal{N}_{S_i}(t_0, S(t_0))$, $\Gamma_i = \mathcal{N}_{S_i S_i}(t_0, S(t_0))$, and $\Theta = \mathcal{N}_t(t_0, S(t_0))$.
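For completeness, reading these estimates off a trained network can be sketched as follows; this is our illustration reusing the differentiation pattern of Section 3, and the function name `price_and_greeks` is ours.

```python
import tensorflow as tf

def price_and_greeks(net, t0, S0):
    """Estimate Price, Delta_i, Gamma_i, and Theta from a trained SDBS network."""
    t = tf.constant([[t0]])          # shape (1, 1)
    S = tf.constant([S0])            # shape (1, n)
    with tf.GradientTape() as outer:
        outer.watch(S)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([t, S])
            u = net(tf.concat([t, S], axis=1))
        delta = inner.gradient(u, S)             # Delta_i = N_{S_i}
        theta = inner.gradient(u, t)             # Theta = N_t
    hess = outer.batch_jacobian(delta, S)        # full Hessian w.r.t. S
    gamma = tf.linalg.diag_part(hess[0])         # Gamma_i = N_{S_i S_i}
    return u[0, 0], delta[0], gamma, theta[0, 0]
```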
Table 5 shows the estimates of the price and Greeks of exchange options with different initial stock price pairs $(S_1(0), S_2(0))$, where the SDBS uses the same experimental configuration as in Table 4 with 100,000 nEpoch. In the estimation of the Price and $\Delta$, the SDBS shows good performance, with five bests and five second bests out of the ten combinations of initial stock price pairs $(S_1(0), S_2(0))$; however, the performance of the SDBS is degraded for the estimation of $\Gamma$ and $\Theta$.

From these experiments, we find that the differential neural network, the SDBS, accurately estimates the value and the first and second derivatives of the solution function. However, the SDBS consumes a large amount of memory for storing the many simulated paths and the model parameters of the NNs; our simulation configuration needs the 8 GB memory of an NVIDIA GTX 1080, and the SDBS takes about 4.5 hours per 10,000 training epochs. Thus, the current SDBS cannot be a practical substitute for MC simulations in the pricing of multidimensional European options.

Table 5: rErrors of numerical methods for the exchange option with several initial stock prices. The best results are highlighted in bold face.

$(S_1(0), S_2(0))$     20,60      40,60      60,60      60,40      60,20
Price     FDM2         5.64e-02   5.55e-04   1.55e-04
          MC2
$\Gamma$  FDM2         4.06e-02
          MC2          1.74e-03   2.55e-04   9.47e-05
          MC2          1.15e-03   1.55e-04

4.2. Basket option

We consider a call option on a basket of four independent stocks. The parameters are as in Korn and Zeytun (2013): $T = 0.\ldots$, $r = 0.06$, $(S_1, S_2, S_3, S_4) = (40, \ldots, \ldots, \ldots)$, and $(w_1, w_2, w_3, w_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$. We compare our method, the SDBS used in Table 5, with LN (the log-normal approximation of Levy (1992)), RG (the reciprocal gamma approximation of Milevsky and Posner (1998)), SLN (the shifted log-normal approximation of Korn and Zeytun (2013)), JU (the Taylor expansion approximation of Ju (2002)), and MC2, the MC method with $10^{\ldots}$ simulated paths used in Table 3. We rounded to four decimals to be consistent with the previous results in Korn and Zeytun (2013).

Table 6: Basket call option prices and rErrors (the upper and lower values in each cell). The best results are highlighted in bold face.

K      MC        LN         RG         SLN        JU         MC2        SDBS
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
60     0.5041    0.5037     0.5133     0.4719     0.5049     0.5049     0.5049
       0
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
60     2.7402    2.7463     2.7444     2.6729     2.7450     2.7441     2.7436
       0         2.23e-03   1.53e-03   2.46e-02   1.75e-03   1.42e-03
65     1.4468    1.4413     1.4831     1.2550     1.4488     1.4479     1.4476
       0         3.80e-03   2.51e-02   1.33e-01   1.38e-03   7.60e-04
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
65     3.8179    3.8418     3.8123     3.5776     3.8336     3.8230     3.8215
       0         6.26e-03   1.47e-03   6.29e-02   4.11e-03   1.34e-03
70     2.7011    2.7003     2.7430     2.2590     2.7135     2.7025     2.7019
       0
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
65     4.1555    4.3459     4.2836     4.0874     4.1973     4.1604     4.1588
       0         4.58e-02   3.08e-02   1.64e-02   1.01e-02   1.18e-03
70     3.1196    3.1607     3.1798     2.6941     3.1710     3.1222     3.1207
       0         1.32e-02   1.93e-02   1.36e-01   1.65e-02   8.33e-04

Over the combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$, the SDBS gives 8 best and 1 equal-best, LN 2 best, RG 1 best, and JU 1 equal-best performances. The SDBS gives good performance for all levels of the strike price, and it gives more accurate option prices with increasing and non-uniform volatilities. The simulation-based methods, MC2 and the SDBS, can effectively handle options with large and non-uniform volatilities, while the analytic approximation methods LN, RG, SLN, and JU degrade for such volatilities. This trend is consistent with the results in Korn and Zeytun (2013).

Table 7 shows the rErrors of the Greeks estimates obtained with the methods JU, MC2, and SDBS used in Table 6. Since benchmark values are required for the calculation of the relative errors of the Greeks estimates, a large number ($= 10^{\ldots}$) of MC simulated paths were generated and the benchmark Greeks were estimated with the formulas shown in Table 1.
In the $\Delta$ estimation experiments, MC2 gives 11 best and 1 equal-best, SDBS 10 best and 1 equal-best, and JU 2 best performances over the 24 combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$. In the estimation experiments for $\Gamma$ and $\Theta$, MC2 gives 26 best, SDBS 3 best, and JU 1 best performances over the 30 combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$. The estimation performance of the SDBS is high for $\Delta$ but low for $\Gamma$ and $\Theta$; this trend is consistent with the results on exchange options in Table 5.

As can be seen in Tables 6 and 7, the SDBS estimates the price and $\Delta$ of the basket option with four assets accurately; for the estimation of $\Gamma$ and $\Theta$, the SDBS is worse than the MC simulation methods but better than the analytic approximation methods.

Table 7: rErrors of Greeks estimation of basket options. The best results are highlighted in bold face.

$(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$   (0.5, 0.5, 0.5, 0.5)              (0.6, 1.2, 0.3, 0.9)
K                   55         60         65         60         65         70
$\Delta_1$  JU      2.15e-04
$\Delta_2$  JU      9.75e-05
$\Delta_3$  JU      1.28e-04
            SDBS    5.57e-05   9.23e-05   2.46e-04   5.15e-04   6.89e-04   7.48e-04
$\Delta_4$  JU      5.34e-04   2.27e-04   7.55e-04   1.09e-03   9.20e-03   1.87e-02
            MC2
            SDBS    1.10e-04
$\Gamma_1$  JU      2.47e-03   1.84e-03   1.49e-03   5.22e-03   8.22e-03   1.43e-02
            MC2
            SDBS    7.99e-03   2.98e-03   7.61e-03   2.29e-03   9.91e-03   6.70e-03
$\Gamma_2$  JU      2.96e-03   3.81e-03   4.10e-03   1.38e-01   1.50e-01   1.54e-01
            MC2     2.01e-04
            SDBS
$\Gamma_3$  JU      1.25e-04   8.40e-04   1.70e-03   1.96e-02   2.24e-02   1.28e-02
            MC2
            SDBS    1.23e-03   8.23e-04   2.08e-03   8.70e-04   1.03e-02   6.38e-03
$\Gamma_4$  JU      1.16e-03
            SDBS    3.81e-04   9.63e-04   2.49e-03   8.83e-04   8.10e-04   1.89e-03
$\Theta$    JU      1.44e+00   8.76e-01   5.12e-01   1.40e-01   5.32e-02   1.91e-01
            MC2

We expect that the loss $L_{BS}$ in Eq. (19) constrains the SDBS to satisfy the Black-Scholes partial differential equation; the derivatives of the SDBS, the Greeks, should then be estimated more accurately with $L_{BS}$ than without it. To validate the effectiveness of this constraint, we test several weights ($w = 10^{-\ldots}, \ldots, 1, \ldots, 10^{\ldots}$) in the weighted loss $L_w$,

$L_w = L_{SDE} + w \cdot L_{BS} + L_T$, (25)

where $w = 1$ corresponds to the loss in Eq. (21); a small $w$ means a small contribution of the PDE constraint and a relatively large contribution of the SDE, and a large $w$ means a large contribution of the PDE constraint and a relatively small contribution of the SDE. We perform the same experiments as in Table 5 with $(S_1, S_2) = (20, 60), (60, 60), (60, 20)$, except that this experiment uses the weighted loss $L_w$ instead of the original loss in Eq. (21).

Table 8: rErrors of the weighted loss $L_w$ for the exchange option, for $(S_1, S_2) = (20, 60)$, $(60, 60)$, and $(60, 20)$ and weights $w$ ranging from $10^{-\ldots}$ to $10^{\ldots}$. The best results are highlighted in bold face.

Table 8 shows that this version of the SDBS performs accurately for a wide range of $w$, but it degrades when $w$ deviates far from unity. This means that a suitable constraint of the Black-Scholes equation enhances the performance of option pricing. Price estimation performs better with $w \le 1$, and Greeks estimation with $w \ge 1$. This is because weighting the loss of the differential equation $L_{BS}$ more heavily gives accurate estimation of derivatives, while weighting the loss of the value $L_{SDE}$ more heavily gives accurate estimation of values. When $L_{BS}$ is removed completely, no convergent value is obtained. These observations reveal that the loss $L_{BS}$ is essential for the accurate estimation of the gradient and Hessian of the NN, and that both losses $L_{BS}$ and $L_{SDE}$ play a key role in making the SDBS suitable for option pricing problems.
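The weight sweep of Eq. (25) is straightforward to express in code. The following is a minimal sketch; the grid of $w$ values is illustrative, since the exact weights tested are not legible in this copy.

```python
# Illustrative PDE-constraint weights around w = 1 (the paper's exact grid is unknown).
W_GRID = [1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3]

def weighted_loss(l_sde, l_bs, l_t, w=1.0):
    """Weighted loss L_w of Eq. (25); w = 1 recovers the loss of Eq. (21)."""
    return l_sde + w * l_bs + l_t
```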
In Section 3.4, we described the mathematical conditions on activation functions for approximating a function together with its partial derivatives: the boundedness and continuity of all partial derivatives of the activation function up to order $m$ are required to approximate functions whose partial derivatives up to order $m$ are continuous and bounded Hornik et al. (1990); Gallant and White (1992). Activation functions such as sigmoid, tanh, and sin satisfy these conditions. However, practical performance can differ from one activation function to another. We therefore compare the performance of several activation functions in terms of their suitability for option pricing problems.

Using the same experimental setting used for the SDBS in Table 5 with nEpoch 100,000, we examine the rErrors of estimation with several activation functions. As can be seen in Table 9, the accuracy differs substantially from activation function to activation function. Sigmoid and softplus show significantly better accuracy than the other functions for estimating the price and $\Delta$; however, sigmoid degrades for $(S_1, S_2) = (20, 60)$. The $C^0$ functions relu and selu give the lowest performance, with especially large errors ($\sim$e-01 and e+00). Another relu-like function, the $C^1$ function elu, gives intermediate performance, while the relu-like infinitely differentiable function softplus shows high performance. This is because the smoothness of the neural network is determined by the smoothness of its component functions, and sufficient smoothness of the activation functions is an essential factor for training the SDBS, which is strongly based on the differentiability of NNs. The accuracy of price estimation by the relu-like functions in Eq. (23) is comparable to that by the $C^\infty$ functions in Eq. (22); for example, elu shows good performance for estimating the price compared with the $C^\infty$ functions. This is because the vanishing gradient problem is reduced for relu-like functions.

Table 9: rErrors of activation functions for the exchange option. The best results are highlighted in bold face.

                  sigmoid    tanh       sin        relu       elu        selu       softplus
$(S_1, S_2) = (20, 60)$
$\Delta$          5.06e-01   5.29e-03   9.93e-01   9.94e-01   2.01e-01   7.59e-01
$\Gamma$          5.83e-01   1.90e-02   9.92e-01   1.00e+00   1.39e-01   9.51e-01
$\Theta$          5.55e-01   3.40e-02   1.00e+00   1.00e+00   2.59e-01   9.07e-01
$(S_1, S_2) = (60, 60)$
$\Gamma$          2.99e-03   3.65e-03   5.49e-03   1.00e+00   1.81e-03   5.04e-03
$\Theta$          8.14e-03
$(S_1, S_2) = (60, 20)$
$\Delta$          8.50e-05   3.37e-04   4.49e-04   6.84e-05   1.68e-04   5.05e-03

The above considerations suggest that relu-like behavior is required for accurate price estimation and $C^\infty$ smoothness is required for accurate Greeks estimation. Hence, softplus, the intersection of the relu-like functions in Eq. (23) and the $C^\infty$ functions in Eq. (22), gives the best performance for the present work.

5. Conclusions

We improve the approximation capability of NNs by making the NNs satisfy the differential relationship of the problem we wish to solve. Following this approach, the proposed method learns the Black-Scholes equation of the option price. Sufficient smoothness of the activation functions is an essential factor for the proposed method, which relies heavily on the differentiability of NNs; the softplus activation function is suitable for our method.

The SDBS is trained using realistic asset price paths generated by SDEs. Hence, the SDBS can accurately model the option price and does not suffer from the curse of dimensionality associated with high-dimensional FDMs. Since the SDBS utilizes the exact differential relationship along a single simulation path, it can easily use parallel and distributed computing.
We can also use this single-path dependency for backward recursion in American option pricing, which will be discussed in a separate paper.

Further improvements can be made using sample paths generated by advanced MC simulation methods such as variance reduction techniques and low-discrepancy sequences. Although the proposed method was applied to a typical Black-Scholes model of option pricing in this work, it can be modified to suit different models and option types. In addition, differential neural networks can be applied to solve problems related to partial differential equations such as inverse problems, the Navier-Stokes equation, and the Schrödinger equation.

Appendix

In this section we derive the calculation formulas for the Greeks in Table 1. Although the exchange option has an exact solution, we prepare the MC approximation of its Greeks for comparison with other numerical methods. The pathwise derivative estimate (PW) method first differentiates $e^{-r(T-t)}V(T)$ in Eq. (13) and then takes the expectation Glasserman (2004); i.e., differentiation and expectation are interchanged. For the delta of the exchange option, $e^{-r(T-t)}[S_1(T) - S_2(T)]\,\mathbf{1}\{S_1(T) > S_2(T)\}$ at $t = 0$ is differentiated with respect to $S_1(0)$:

$\frac{d}{dS_1(0)} e^{-rT} V(T) = e^{-rT}\frac{dS_1(T)}{dS_1(0)}\frac{dV(T)}{dS_1(T)} = e^{-rT}\,\frac{S_1(T)}{S_1(0)}\,\mathbf{1}\{S_1(T) > S_2(T)\}$, (26)

where $S_1(T)$ is given by Eq. (2) and $\mathbf{1}$ denotes an indicator function.

Similarly to the calculation of the delta, the theta of the exchange option is obtained as the expectation of

$\frac{\partial e^{-r(T-t)} V(T-t)}{\partial t}\Big|_{t=0} = \frac{\partial e^{-r(T-t)}}{\partial t}\Big|_{t=0} V(T) + e^{-rT}\frac{\partial V(T-t)}{\partial t}\Big|_{t=0}$
$= r e^{-rT} V(T) + e^{-rT}\frac{\partial\, [S_1(T-t) - S_2(T-t)]\,\mathbf{1}\{S_1(T-t) > S_2(T-t)\}}{\partial t}\Big|_{t=0}$
$= e^{-rT}\Big(r[S_1(T) - S_2(T)] - \Big[S_1(T)\Big(r - \frac{\sigma_1^2}{2} + \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\Big) - S_2(T)\Big(r - \frac{\sigma_2^2}{2} + \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\Big)\Big]\Big)\,\mathbf{1}\{S_1(T) > S_2(T)\}$
$= e^{-rT}\Big[S_1(T)\Big(\frac{\sigma_1^2}{2} - \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\Big) - S_2(T)\Big(\frac{\sigma_2^2}{2} - \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\Big)\Big]\,\mathbf{1}\{S_1(T) > S_2(T)\}$, (27)

where $L_i$ and $Z$ are as described in Eq. (2).

We also use the PW method for the basket call option. The delta of the basket option is calculated with $V(T)$ in Eq. (8); the delta is given by the expectation of

$\frac{d\, e^{-rT} V(T)}{dS_i(0)} = e^{-rT}\,\frac{S_i(T)}{S_i(0)}\, w_i\,\mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (28)

The theta of the basket call option is given by the expectation of

$\frac{\partial e^{-r(T-t)} V(T-t)}{\partial t}\Big|_{t=0} = e^{-rT}\Big[-rK + \sum_{j=1}^{n} w_j S_j(T)\Big(\frac{\sigma_j^2}{2} - \frac{\sigma_j L_j Z}{2\sqrt{T}}\Big)\Big]\,\mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (29)

For the calculation of the gamma using the PW method, the first derivatives of the payoff functions would have to be differentiable functions of the parameter $S_i(0)$. However, the first derivatives, the deltas, are not differentiable for the exchange and basket options, so PW cannot be applied directly to these options. The likelihood ratio (LR) method, an alternative, assumes that the assets $S_i$ have a probability density $g_\alpha$, where $\alpha$ denotes a parameter of this density. To emphasize that the expectation is computed with respect to $g_\alpha$, we write $E_\alpha$. LR interchanges the order of differentiation and integration to derive a Greek:

$\frac{d}{d\alpha} E_\alpha[V(T)] = \int_{\mathbb{R}^n} V(T)\, \frac{d}{d\alpha} g_\alpha(x)\, dx = \int_{\mathbb{R}^n} V(T)\, \frac{\dot g_\alpha(x)}{g_\alpha(x)}\, g_\alpha(x)\, dx = E_\alpha\Big[V(T)\,\frac{\dot g_\alpha(x)}{g_\alpha(x)}\Big]$. (30)

In the present work, Eq. (2) suggests that $S_i(T)$, for $i = 1, \ldots, n$, has the distribution of $\exp(Y_i)$. The vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ follows the normal distribution $N(\mu(\alpha), \tilde\Sigma(\alpha))$, where $\mu(\alpha)$ denotes a vector with elements $\mu_i = \ln(S_i(0)) + (r - \frac{\sigma_i^2}{2})T$, $\tilde\Sigma(\alpha) = T A \Sigma A'$ with $A = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, and $\alpha = (S_1(0), \ldots, S_n(0))$.
The probability density function of $Y$ can be written as

$g_\alpha(Y) = \frac{1}{\sqrt{(2\pi)^n |\tilde\Sigma(\alpha)|}} \exp\big[-\frac{1}{2}(y - \mu(\alpha))'\, \tilde\Sigma(\alpha)^{-1}\, (y - \mu(\alpha))\big]$. (31)

We generate the sample paths $Y$ by $\mu + \sqrt{T} A L Z$, where $LL' = \Sigma$ and $Z$ denotes a standard normal vector. Hence,

$\frac{\dot g_{S_i(0)}}{g_{S_i(0)}} = \frac{d}{dS_i(0)} \log g_{S_i(0)}(Y) = (Y - \mu)'\, \tilde\Sigma^{-1}\, \frac{d\mu}{dS_i(0)} = (\sqrt{T} A L Z)'\, \tilde\Sigma^{-1}\, \frac{d\mu}{dS_i(0)} = (\sqrt{T} A L Z)'\, (T A \Sigma A')^{-1}\, \big(0, \cdots, \frac{1}{S_i(0)}, \cdots, 0\big)' = \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)}$, (32)

where $(Z' L^{-1} A^{-1})_i$ denotes the $i$th component of the row vector $Z' L^{-1} A^{-1}$.

For calculating the gamma in the present study, the LR method is first applied to the delta calculation, and this delta estimator is then differentiated and averaged by the PW method. This LR-PW method for the gamma calculation gives superior performance compared with the pure LR method Glasserman (2004).

For the exchange option, the gamma is given by the expectation of the following expression:

$\frac{d}{dS_1(0)}\Big(e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)}\Big)$
$= e^{-rT}\Big(\frac{d \max[S_1(T) - S_2(T), 0]}{dS_1(0)}\Big)\frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)} + e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{d}{dS_1(0)}\Big(\frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)}\Big)$
$= e^{-rT}\, \mathbf{1}\{S_1(T) > S_2(T)\}\, \frac{S_1(T)}{S_1(0)}\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)} + e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{-(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)^2}$
$= e^{-rT}\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)^2}\, S_2(T)\, \mathbf{1}\{S_1(T) > S_2(T)\}$. (33)

For the basket call option, the gamma is given by the expectation of the following expression:

$\frac{d}{dS_i(0)}\Big[e^{-rT} \max\Big(\sum_{j=1}^{n} w_j S_j(T) - K, 0\Big)\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)}\Big]$
$= e^{-rT}\, w_i\, \frac{S_i(T)}{S_i(0)}\, \mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)} + e^{-rT} \max\Big(\sum_{j=1}^{n} w_j S_j(T) - K, 0\Big)\, \frac{-(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)^2}$
$= e^{-rT}\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)^2}\, \Big\{w_i S_i(T) - \sum_{j=1}^{n} w_j S_j(T) + K\Big\}\, \mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (34)

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Becker, S., Cheridito, P., Jentzen, A., Welti, T., 2019. Solving high-dimensional optimal stopping problems using deep learning. CoRR abs/1908.01602.

Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. The Journal of Political Economy 81, 637–654.

Carr, P., Stanley, M., Madan, D., 2001. Option valuation using the fast Fourier transform. J. Comput. Finance 2.

Clevert, D.A., Unterthiner, T., Hochreiter, S., 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289.

Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2, 303–314.

Deng, L., Yu, D., 2014. Deep Learning: Methods and Applications.
Technical Report MSR-TR-2014-21.

E, W., Han, J., Jentzen, A., 2017. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5, 349–380.

Gallant, A.R., White, H., 1992. On learning the derivatives of an unknown mapping with multilayer feedforward networks. Neural Networks 5, 129–138.

Gencay, R., Min Qi, 2001. Pricing and hedging derivative securities with neural networks: Bayesian regularization, early stopping, and bagging. IEEE Transactions on Neural Networks 12, 726–734.

Glasserman, P., 2004. Monte Carlo Methods in Financial Engineering. Springer, New York.

Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Gordon, G., Dunson, D., Dudík, M. (Eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA. pp. 315–323.

Graves, A., Schmidhuber, J., 2009. Offline handwriting recognition with multidimensional recurrent neural networks, in: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), Advances in Neural Information Processing Systems 21. Curran Associates, Inc., pp. 545–552.

Hornik, K., Stinchcombe, M., White, H., 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560.

Hull, J., 2018. Options, Futures, and Other Derivatives. 10th ed., Pearson.

Ju, N., 2002. Pricing Asian and basket options via Taylor expansion. J. Comput. Finance 5, 79–103.

Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Published as a conference paper at the 3rd International Conference on Learning Representations, San Diego, 2015.

Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S., 2017. Self-normalizing neural networks, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pp. 971–980.

Kohler, M., Krzyżak, A., Todorovic, N., 2010. Pricing of high-dimensional American options by neural networks. Mathematical Finance 20, 384–410.

Korn, R., Zeytun, S., 2013. Efficient basket Monte Carlo option pricing via a simple analytical approximation. Journal of Computational and Applied Mathematics 243, 48–59.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, Curran Associates Inc., Red Hook, NY, USA. pp. 1097–1105.

Le, Q.V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., Ng, A.Y., 2012. Building high-level features using large scale unsupervised learning, in: Proceedings of the 29th International Conference on International Conference on Machine Learning, Omnipress, Madison, WI, USA. pp. 507–514.

Levy, E., 1992.
Pricing European average rate currency options. Journal of International Money and Finance 11, 474–491.

Margrabe, W., 1978. The value of an option to exchange one asset for another. Journal of Finance 33, 177–186.

Milevsky, M.A., Posner, S.E., 1998. A closed-form approximation for valuing basket options. The Journal of Derivatives 5, 54–61.

Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines, in: Proceedings of the 27th International Conference on International Conference on Machine Learning, Omnipress, Madison, WI, USA. pp. 807–814.

Nocedal, J., Wright, S.J., 2006. Numerical Optimization. 2nd ed., Springer, New York, NY, USA.

Raissi, M., 2018. Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. arXiv preprint arXiv:1804.07010.

Ramsundar, B., Kearnes, S.M., Riley, P., Webster, D., Konerding, D.E., Pande, V.S., 2015. Massively multitask networks for drug discovery. arXiv abs/1502.02072.

Shreve, S.E., 2004. Stochastic Calculus for Finance II: Continuous-Time Models. Springer, New York, NY.

Sirignano, J., Spiliopoulos, K., 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, 1339–1364. doi:10.1016/j.jcp.2018.08.029.