A differential neural network learns stochastic differential equations and the Black-Scholes equation for pricing multi-asset options
Sang-Mun Chi
Department of Computer Science, Kyungsung University, Busan 48434, South Korea
Abstract
Neural networks with sufficiently smooth activation functions can approximate the values and derivatives of any smooth function, and they are differentiable themselves. We improve the approximation capability of neural networks by utilizing this differentiability: the gradient and Hessian of a neural network are used to train the network to satisfy the differential equations of the problem of interest. Several activation functions are also compared in terms of how effectively the neural network can be differentiated. We apply the differential neural networks to the pricing of financial options, where stochastic differential equations and the Black-Scholes partial differential equation describe the relation between the price of an option and its underlying assets, and where the first and second derivatives of the option price, the Greeks, play important roles in financial engineering. The proposed neural network learns (a) the sample paths of option prices generated by stochastic differential equations and (b) the Black-Scholes equation at each time and asset price. Option pricing experiments were performed on multi-asset options such as exchange and basket options. Experimental results show that the proposed method gives accurate option values and Greeks; sufficiently smooth activation functions and the constraint of the Black-Scholes equation contribute significantly to accurate option pricing.
Keywords: differential neural networks, stochastic differential equations, the Black-Scholes equation, option pricing, smooth activation functions.
1. Introduction
Neural networks (NNs) have found successful applications such as image understanding, handwriting recognition, speech recognition, and drug discovery Krizhevsky et al. (2012); Le et al. (2012); Graves and Schmidhuber (2009); Deng and Yu (2014); Ramsundar et al. (2015). This success is due to the universal approximation capability of NNs, which approximate a function accurately without assuming any mathematical relationship between its input and output. Whereas no such mathematical relationships are known for the foregoing applications, strong governing differential equations are present in many fields such as finance, fluid dynamics, quantum mechanics, and diffusion phenomena. For these problems, the approximation capability of NNs can be improved when the differential relationships are included in the modeling.

Neural networks with sufficiently smooth activation functions can approximate a function and its derivatives Cybenko (1989); Hornik et al. (1990); Gallant and White (1992); they are also differentiable themselves, because the composition of differentiable functions is differentiable. Thus, we can obtain the gradient and Hessian of differentiable NNs, and use these derivatives to make the NNs themselves satisfy the differential equations of the problems of interest. To obtain accurate derivatives of NNs, we calculate the gradient and Hessian of NNs using automatic differentiation Nocedal and Wright (2006); Abadi et al. (2015); Raissi (2018), which is an exact method based on the chain rule for differentiating compositions of functions. Since the differentiation of NNs plays a key role in the present work, activation functions should be sufficiently smooth as well as effective for our applications. Although any smooth activation function makes NNs differentiable, we need activation functions that show good practical performance; we therefore compare several activation functions in terms of estimation accuracy for the value, gradient, and Hessian.

We need suitable benchmark problems to examine the effectiveness of our differential NN. We consider financial option pricing because the option pricing problem has a widely known mathematical relationship, the Black-Scholes equation Black and Scholes (1973), and the derivatives of the option price, the Greeks, are extensively studied since they have many practical applications such as hedging or speculating on future asset prices Hull (2018); Shreve (2004); Glasserman (2004). Hence, the performance of the proposed differential NNs can be easily tested in option pricing experiments.

Most multidimensional options have no general closed-form pricing solution and are priced by numerical approximation techniques. Various numerical methods have been developed for this problem, such as finite difference methods (FDMs), Fourier methods, and Monte Carlo (MC) simulations Hull (2018); Shreve (2004); Carr et al. (2001); Glasserman (2004). FDMs approximate a discrete version of the Black-Scholes partial differential equation (PDE); MC methods use generated sequences of stock prices governed by the stochastic differential equation (SDE). The present work combines the information of the PDE and the SDE, and trains NNs using (a) the option prices generated by the SDE and (b) the constraint of the Black-Scholes equation. As a result, the proposed method is mesh-free like MC simulations, and does not suffer from the curse of dimensionality associated with high-dimensional FDMs.
Furthermore, the Greeks are easily computed by differentiating the neural network, because the proposed method represents the solution of the Black-Scholes equation in the form of a neural network.

NNs have also been applied to option pricing problems Gencay and Min Qi (2001); Kohler et al. (2010); Sirignano and Spiliopoulos (2018); E et al. (2017); Becker et al. (2019). Our approach relates to the methods of differentiating NNs used in Sirignano and Spiliopoulos (2018); Raissi (2018). We use automatic differentiation to compute the gradient and Hessian of NNs, while Sirignano and Spiliopoulos (2018) use an MC method to approximate the Hessian; automatic differentiation is an exact operation, not an approximation of the kind used in MC methods and FDMs Nocedal and Wright (2006); Abadi et al. (2015). The present work generates training data using the SDE, which gives a more realistic evolution of asset prices, while Sirignano and Spiliopoulos (2018) use random samples from the region where the function is defined. Raissi (2018) uses automatic differentiation to compute the gradient of NNs and the SDE to generate training data; however, the loss function for training the NN does not contain the PDE, and the Hessian of the NN is not used. In addition, based on our performance comparison experiments, we adopt the sufficiently smooth activation function softplus Glorot et al. (2011) for the efficient differentiation of NNs, while Sirignano and Spiliopoulos (2018) use tanh and Raissi (2018) uses sine as an activation function.

The remainder of the paper is organized as follows. We discuss related work on multidimensional option pricing and numerical approximation methods in Section 2. The proposed method for learning stochastic differential equations and the Black-Scholes equation for option pricing is developed in Section 3. The validity of the proposed method is demonstrated in Section 4, and conclusions are drawn in Section 5.
2. Price and Greeks of multidimensional options
A financial option is a contract between the seller and the buyer (the holder of the option). A European call option gives the holder the right, but no obligation, to buy the risky assets at expiration time $T$ for a price that is agreed on now, the strike price $K$. The option itself has a price because it gives the holder a right. Let the stock price $S_i(t)$ be a geometric Brownian motion,

$dS_i(t) = rS_i(t)\,dt + \sigma_i S_i(t)\,d\tilde W_i(t)$, (1)

where $t$ denotes time, $r$ the risk-free interest rate, $\sigma_i$ the volatility of the asset price, and $\tilde W_i$ a Brownian motion under a risk-neutral probability measure with covariance $\mathrm{Cov}(\tilde W_i(t), \tilde W_j(t)) = \rho_{ij} t$. The correlated Brownian motion $\tilde W_i(t)$ can be expressed as $\sqrt{t}\,L_i Z$, where $Z$ is a standard $n$-dimensional normally distributed vector and $L_i$ is the $i$th row of $L$; $L$ represents any matrix such that $\Sigma = LL'$, where the $(i,j)$ component of $\Sigma$ is $\rho_{ij}$. We choose $L$ to be the Cholesky factor of $\Sigma$. Then the stock prices at times $t_0$ and $t_1$ are related by

$S_i(t_1) = S_i(t_0)\exp\big((r - \frac{\sigma_i^2}{2})(t_1 - t_0) + \sigma_i\sqrt{t_1 - t_0}\,L_i Z\big)$ (2)

Glasserman (2004); Shreve (2004); Hull (2018).

2.1. Exchange option

An exchange option gives the holder the right to exchange one asset for another at a given time in the future; it is commonly seen in the energy market. The payoff of the exchange option is

$\max(S_1(T) - S_2(T),\, 0)$. (3)

Margrabe (1978) developed an analytical solution formula for the price of this option. The price of the exchange option at time $t$ is given by

$u(t, S_1, S_2) = S_1 N(d_1) - S_2 N(d_2)$, (4)

where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$, $d_1 = \frac{1}{\sigma\sqrt{T-t}}\big[\ln(S_1/S_2) + \frac{\sigma^2}{2}(T-t)\big]$, $d_2 = d_1 - \sigma\sqrt{T-t}$, and $N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz$.

The Greeks are the sensitivities of the option price to the movement of various parameters. They are used for risk management; the risk in a short position in an option is offset by holding delta units of each underlying asset, where the delta is the partial derivative of the option price with respect to the current price of that underlying asset. Sensitivities with respect to other parameters are also widely used to measure and manage risk. The Greeks of the exchange option are obtained by differentiating Eq. (4). Let $\phi(x) = \frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2})$; differentiating Eq. (4) with respect to $S_1$ and using the property $\phi(d_1)/\phi(d_2) = S_2/S_1$ gives the delta

$\Delta_1 = \frac{\partial u}{\partial S_1} = N(d_1)$. (5)

In some other hedging strategies, we need to hedge away the risk due to changes of the underlying asset's delta; the gamma is defined by

$\Gamma_1 = \frac{\partial \Delta_1}{\partial S_1} = \frac{\partial^2 u}{\partial S_1^2} = \frac{\phi(d_1)}{S_1\,\sigma\sqrt{T-t}}$. (6)

The time decay of the value of an option is called theta, given by

$\Theta = \frac{\partial u}{\partial t} = -\frac{\sigma}{4\sqrt{T-t}}\big[S_1\phi(d_1) + S_2\phi(d_2)\big]$. (7)
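To make the formulas concrete, the following is a short numerical sketch of Eqs. (4)-(7); it is our illustration rather than code from the paper, and the function name `margrabe` and its signature are ours.

```python
import numpy as np
from scipy.stats import norm

def margrabe(t, T, S1, S2, sigma1, sigma2, rho):
    """Exchange option price and Greeks from Eqs. (4)-(7)."""
    tau = T - t                                         # time to maturity
    sigma = np.sqrt(sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2)
    d1 = (np.log(S1 / S2) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    price = S1 * norm.cdf(d1) - S2 * norm.cdf(d2)                 # Eq. (4)
    delta1 = norm.cdf(d1)                                         # Eq. (5)
    gamma1 = norm.pdf(d1) / (S1 * sigma * np.sqrt(tau))           # Eq. (6)
    theta = -sigma / (4.0 * np.sqrt(tau)) * (
        S1 * norm.pdf(d1) + S2 * norm.pdf(d2))                    # Eq. (7)
    return price, delta1, gamma1, theta

# Example with illustrative parameters (the correlation 0.3 is a placeholder):
# price, delta1, gamma1, theta = margrabe(0.0, 1.0, 60.0, 60.0, 0.4, 0.2, 0.3)
```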
2.2. Basket option

A European basket call option gives the holder the right to buy a group of underlying assets at the same time. The price of the option at maturity is given by

$C(T, S_1(T), S_2(T), \ldots, S_n(T)) = \max\big(\sum_{i=1}^{n} w_i S_i(T) - K,\; 0\big)$, (8)

where $w_i$ denotes the quantity of the $i$th asset in the basket option contract. Since this option has no exact formula for its price and Greeks, these values are approximated by numerical methods such as finite difference methods (FDMs) and Monte Carlo (MC) simulations Hull (2018); Shreve (2004); Glasserman (2004). While FDMs are accurate in low-dimensional problems, they become infeasible in higher dimensions due to the increased number of grid points and numerical instability. MC methods are versatile, handling general types of options, multi-asset problems, and path-dependent problems; however, the MC method converges slowly, at rate $O(1/\sqrt{M})$, where $M$ denotes the number of random samples.

FDMs approximate the price of the option, $u(t, S_1, S_2, \ldots, S_n)$, on the basis of the Black-Scholes equation Hull (2018); Shreve (2004),

$\frac{\partial u}{\partial t} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_i\sigma_j\rho_{i,j} S_i S_j \frac{\partial^2 u}{\partial S_i \partial S_j} + r\sum_{i=1}^{n} S_i \frac{\partial u}{\partial S_i} = ru$, (9)

where $r$ denotes the risk-free interest rate and $(\sigma_i\sigma_j\rho_{i,j})_{1\le i,j\le n}$ denotes the covariance matrix of the stock prices. FDMs discretize the stock price and time dimensions and use the option values at the time of expiration. We use central difference equations to obtain the first and second derivatives with respect to the stock price, and a forward difference equation to obtain the time derivative. The Greeks at time zero are obtained from the solved $u(\cdot)$. The delta is given by

$\Delta_i = \frac{u(0, S_1, \ldots, S_i + \delta_s, \ldots, S_n) - u(0, S_1, \ldots, S_i - \delta_s, \ldots, S_n)}{2\delta_s}$, (10)

where $\delta_s$ denotes a stock price discretization unit. The gamma is given by

$\Gamma_i = \frac{u(0, \ldots, S_i + \delta_s, \ldots) - 2u(0, \ldots, S_i, \ldots) + u(0, \ldots, S_i - \delta_s, \ldots)}{\delta_s^2}$. (11)

The theta is given by

$\Theta = \frac{u(\delta_t, S_1, \ldots, S_n) - u(0, S_1, \ldots, S_n)}{\delta_t}$, (12)

where $\delta_t$ denotes a time discretization unit.

MC methods generate multiple simulated paths of the stock price by Eq. (2). The price of the option is obtained by averaging the payoff as follows:

$V(t) = E^{Q}\big[e^{-r(T-t)} V(T) \mid \mathcal{F}(t)\big]$, (13)

where $V(T)$ is the payoff function, $\mathcal{F}(t)$ is a filtration for the Brownian motion $\tilde W_i(t)$ in Eq. (1), and $Q$ is the risk-neutral measure Hull (2018); Shreve (2004); Glasserman (2004). The estimation of the Greeks is a practical challenge in MC methods. In the present work, the pathwise derivative estimate is used for calculating the delta and theta, and a combination of the likelihood ratio method and the pathwise derivative estimate is used for calculating the gamma. The Greeks at time zero are given by averaging the formulas in Table 1. Derivations of these formulas are described in the Appendix.

Table 1: Greeks estimation by MC simulation. Each Greek is estimated as the expectation of the listed expression.

Exchange option:
  $\Delta_1$: $e^{-rT}\,\frac{S_1(T)}{S_1(0)}\,\mathbf{1}\{S_1(T) > S_2(T)\}$
  $\Theta$: $e^{-rT}\big[S_1(T)\big(\frac{\sigma_1^2}{2} - \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\big) - S_2(T)\big(\frac{\sigma_2^2}{2} - \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\big)\big]\,\mathbf{1}\{S_1(T) > S_2(T)\}$
  $\Gamma_1$: $e^{-rT}\,\frac{(Z'L^{-1}A^{-1})_1}{\sqrt{T}\,S_1(0)^2}\,S_2(T)\,\mathbf{1}\{S_1(T) > S_2(T)\}$

Basket option:
  $\Delta_i$: $e^{-rT}\,\frac{S_i(T)}{S_i(0)}\,w_i\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
  $\Theta$: $e^{-rT}\big[-rK + \sum_{j=1}^{n} w_j S_j(T)\big(\frac{\sigma_j^2}{2} - \frac{\sigma_j L_j Z}{2\sqrt{T}}\big)\big]\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
  $\Gamma_i$: $e^{-rT}\,\frac{(Z'L^{-1}A^{-1})_i}{\sqrt{T}\,S_i(0)^2}\,\big\{w_i S_i(T) - \sum_{j=1}^{n} w_j S_j(T) + K\big\}\,\mathbf{1}\{\sum_{j=1}^{n} w_j S_j(T) > K\}$
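As an illustration of Eqs. (2) and (13) and of the pathwise delta in Table 1, a minimal MC sketch for the exchange option might look as follows; this is our code, and the names `mc_exchange`, `corr`, and `n_paths` are assumptions rather than notation from the paper.

```python
import numpy as np

def mc_exchange(S0, sigma, corr, r, T, n_paths=100_000, seed=0):
    """MC price (Eq. (13)) and pathwise delta (Table 1) of an exchange option."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)                # Sigma = L L'
    Z = rng.standard_normal((n_paths, 2))
    # Terminal prices by Eq. (2): S_i(T) = S_i(0) exp((r - sigma_i^2/2) T + sigma_i sqrt(T) L_i Z)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * (Z @ L.T))
    payoff = np.maximum(ST[:, 0] - ST[:, 1], 0.0)
    in_money = ST[:, 0] > ST[:, 1]
    price = np.exp(-r * T) * payoff.mean()
    delta1 = np.exp(-r * T) * np.mean(ST[:, 0] / S0[0] * in_money)
    return price, delta1

# Example: two assets with sigma = (0.4, 0.2) and an illustrative correlation of 0.3.
# p, d = mc_exchange(np.array([60.0, 60.0]), np.array([0.4, 0.2]),
#                    np.array([[1.0, 0.3], [0.3, 1.0]]), r=0.1, T=1.0)
```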
3. Learning stochastic differential equations and the Black-Scholes equation

We approximate the price of the option $u(t, S_1, S_2, \ldots, S_n)$ in Eq. (9) using a neural network $\mathcal{N}(t, S)$. The expiry time $T$ is divided into $N$ (= 200) equal intervals, $t_0 = 0 < t_1 < \cdots < t_N = T$. The network takes $(t, S)$ as input and has $l = 5$ layers, with one unit in the last layer.

To train the neural network $\mathcal{N}(t, S)$, we need to know the target $u(t, S)$. Although $u(\cdot)$ is of course unknown at $t < T$, this function satisfies the following SDE when the stock price follows Eq. (1) Hull (2018); Shreve (2004); Glasserman (2004):

$du = \big(u_t + \sum_{i=1}^{n} r S_i u_{S_i} + \frac{1}{2} S' H S\big)\, dt + \sum_{i=1}^{n} \sigma_i S_i u_{S_i}\, d\tilde W_i$, (16)

where $u_t = \frac{\partial u}{\partial t}$, $S' = (S_1, S_2, \ldots, S_n)$, $H_{ij} = \rho_{ij}\sigma_i\sigma_j \frac{\partial^2 u}{\partial S_i \partial S_j}$, and $u_{S_i} = \frac{\partial u}{\partial S_i}$. Based on this SDE and the derivatives of the neural network $\mathcal{N}(t, S)$, we generate the following $\tilde u(t_k, S(t_k))$ that approximates $u(t_k, S(t_k))$:

$\tilde u(t_k, S(t_k)) = \mathcal{N}(t_{k-1}, S(t_{k-1})) + \big\{\mathcal{N}_t(t_{k-1}, S(t_{k-1})) + \sum_{i=1}^{n} r S_i(t_{k-1})\, \mathcal{N}_{S_i}(t_{k-1}, S(t_{k-1})) + \frac{1}{2} S(t_{k-1})' H_{k-1} S(t_{k-1})\big\}\, dt_k + \sum_{i=1}^{n} \sigma_i S_i(t_{k-1})\, \mathcal{N}_{S_i}(t_{k-1}, S(t_{k-1}))\, d\tilde W_i(t_k)$, (17)

where $\mathcal{N}_t$ and $\mathcal{N}_{S_i}$ respectively denote the partial derivatives of the neural network $\mathcal{N}(t, S)$ with respect to $t$ and $S_i$, $(H_{k-1})_{ij} = \rho_{ij}\sigma_i\sigma_j \frac{\partial^2 \mathcal{N}(t_{k-1}, S(t_{k-1}))}{\partial S_i \partial S_j}$, $dt_k = t_k - t_{k-1}$, and $d\tilde W_i(t_k) = \sqrt{t_k - t_{k-1}}\, L_i Z$. This $\tilde u(t_k, S(t_k))$ is used to train $\mathcal{N}(t_k, S(t_k))$ with the following loss function, which is minimized during training of the neural network:

$L_{SDE} = \sum_{k=1}^{N} \big[\tilde u(t_k, S(t_k)) - \mathcal{N}(t_k, S(t_k))\big]^2$. (18)

We enforce the neural network $\mathcal{N}(t, S)$ to satisfy the Black-Scholes equation in Eq. (9) by minimizing the following representation of the Black-Scholes equation in the form of a neural network:

$L_{BS} = \sum_{k=1}^{m} \big[\mathcal{N}_t(t_k, S(t_k)) + \sum_{i=1}^{n} r S_i(t_k)\, \mathcal{N}_{S_i}(t_k, S(t_k)) + \frac{1}{2} S(t_k)' H_k S(t_k) - r\,\mathcal{N}(t_k, S(t_k))\big]^2$. (19)

We need to differentiate the neural network with respect to the variables $t_k$ and $S(t_k)$ for the calculation of Eq. (17) and Eq. (19). This differentiation is performed using automatic differentiation. Automatic differentiation is an exact differentiation method, not an approximation method, so it gives accurate differential results. It applies the chain rule for differentiating compositions of a set of elementary functions whose derivatives are known exactly Nocedal and Wright (2006); Abadi et al. (2015); Raissi (2018). Since software libraries like TensorFlow Abadi et al. (2015) already provide operations for automatic differentiation, we used the tensorflow.GradientTape operation in TensorFlow to calculate the gradient of the neural network; the tensorflow.GradientTape operation is applied to the gradient of the NN for the calculation of the Hessian of the NN.

At expiration time $T$, the true $u(T, S_1, S_2, \ldots, S_n)$ is given by the payoff function. To match the option value at maturity time $T$, the third loss function is

$L_T = \big[\mathcal{N}(T, S(T)) - h(T)\big]^2 \cdot w_T$, (20)

where $h(T)$ denotes the payoff function, such as Eq. (3) or Eq. (8), and $w_T$ denotes the weight for accurate approximation at $T$; $w_T = N/\ldots$ is used as the default weight in this work. Finally, we use the following loss function, which penalizes deviations of the neural network output from the stochastic differential equation, deviations of the neural network differential structure from the Black-Scholes equation, and deviations of the neural network output from the payoff function at expiration time $T$:

$Loss = L_{SDE} + L_{BS} + L_T$. (21)

We call our neural network SDBS, because it is based on the stochastic differential equation and the Black-Scholes equation.
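To make the role of automatic differentiation concrete, the following TensorFlow sketch computes $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, the Hessian, and the Black-Scholes residual inside the sum of Eq. (19). This is our own minimal illustration, not the authors' code; the helper name `bs_residual` and the argument layout are assumptions.

```python
import tensorflow as tf

def bs_residual(net, t, S, r, cov):
    """Residual of the Black-Scholes equation, Eq. (19), for a network net(t, S).

    t: (batch, 1), S: (batch, n); cov[i, j] = rho_ij * sigma_i * sigma_j.
    net is assumed to map the concatenated input (batch, 1 + n) to (batch, 1).
    """
    with tf.GradientTape() as outer:
        outer.watch(S)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([t, S])
            u = net(tf.concat([t, S], axis=1))       # N(t, S), shape (batch, 1)
        u_t = inner.gradient(u, t)                   # N_t
        u_S = inner.gradient(u, S)                   # N_{S_i}
    u_SS = outer.batch_jacobian(u_S, S)              # Hessian d^2 N / dS_i dS_j
    H = cov[tf.newaxis, :, :] * u_SS                 # (H_k)_{ij} of Eqs. (17), (19)
    quad = 0.5 * tf.einsum('bi,bij,bj->b', S, H, S)  # (1/2) S' H S
    return (u_t[:, 0] + tf.reduce_sum(r * S * u_S, axis=1)
            + quad - r * u[:, 0])
```

Squaring and summing this residual over the sampled points $(t_k, S(t_k))$ yields $L_{BS}$.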
We summarize the training procedure of the SDBS in Algorithm 1. For updating the parameters of the SDBS (line 12 in Algorithm 1), we use a stochastic gradient descent method, adam, which is based on adaptive estimation of first-order and second-order moments Kingma and Ba (2014). It is commonly observed that a monotonically decreasing learning rate results in a better-performing model. The present work uses a PolynomialDecay schedule in TensorFlow, and the learning rate is linearly decayed from $10^{-\ldots}$ to $10^{-\ldots}$. As the final trained model, we choose the model that has the minimum $Loss$ during the training iterations.

The estimation procedure is similar to the training procedure. There are three differences: (1) in addition to the input of the training procedure, the parameters of the trained SDBS are loaded; (2) the learning rate is fixed at $10^{-\ldots}$ in line 12 of Algorithm 1; (3) lines 13, 14, and 15 are replaced by the outputs Price $= \mathcal{N}(t, S(t))$, $\Delta_i = \mathcal{N}_{S_i}(t, S(t))$, $\Gamma_i = \mathcal{N}_{S_i S_i}(t, S(t))$, and $\Theta = \mathcal{N}_t(t, S(t))$.

Algorithm 1 Training of SDBS.
Input: $S(t_0)$, $T$, $\sigma_i$, $r$, $\rho_{ij}$, $N$, nEpoch
1:  Calculate $L$ such that $\Sigma = LL'$
2:  $t_0 = 0$
3:  for epoch = 1, 2, ..., nEpoch do
4:    Calculate $\mathcal{N}$, $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, $H$ at $(t_0, S(t_0))$
5:    for k = 1, 2, ..., N do
6:      $t_k = t_{k-1} + T/N$
7:      Calculate $\tilde u(t_k, S(t_k))$ by Eq. (17)
8:      Calculate $S(t_k)$ by Eq. (14)
9:      Calculate $\mathcal{N}$, $\mathcal{N}_t$, $\mathcal{N}_{S_i}$, $H$ at $(t_k, S(t_k))$
10:   end for
11:   Calculate $Loss$ by Eq. (21)
12:   Update the parameters of the neural network by minimizing $Loss$
13:   if $Loss$ is reduced compared with that of the previous epoch then
14:     Save the parameters of the SDBS
15:   end if
16: end for

3.4. Activation functions

Table 2: Classification of activation functions.

name      definition                                          range                    order of continuity
sigmoid   $1/(1 + e^{-x})$                                    $(0, 1)$                 $C^\infty$
tanh      $(e^x - e^{-x})/(e^x + e^{-x})$                     $(-1, 1)$                $C^\infty$
sin       $\sin(x)$                                           $[-1, 1]$                $C^\infty$
relu      $0$ if $x \le 0$; $x$ if $x > 0$                    $[0, \infty)$            $C^0$
elu       $\alpha(e^x - 1)$ if $x \le 0$; $x$ if $x > 0$      $(-\alpha, \infty)$      $C^1$ if $\alpha = 1$; $C^0$ otherwise
selu      $\lambda\,[\alpha(e^x - 1)$ if $x \le 0$; $x$ if $x > 0]$  $(-\lambda\alpha, \infty)$  $C^0$
softplus  $\ln(1 + e^x)$                                      $(0, \infty)$            $C^\infty$

Activation functions determine the output of a node and the characteristics of NNs. They have various shapes, ranges, orders of continuity, and magnitudes of gradient. Every function in Table 2 has a nonlinear shape, because the nonlinearity of the activation functions allows NNs to be universal function approximators Cybenko (1989); Hornik et al. (1990). The range of the function is finite for sigmoid, tanh, and sin, while it is infinite for relu Nair and Hinton (2010), elu Clevert et al. (2015), selu Klambauer et al. (2017), and softplus Glorot et al. (2011). The boundedness and continuity of the partial derivatives of the activation functions up to order $m$ are required to approximate functions whose partial derivatives up to order $m$ are continuous and bounded Hornik et al. (1990); Gallant and White (1992). Hence, the order of continuity of the activation functions critically affects the performance of the proposed method, in which the first and second derivatives of a neural network play a key role. This consideration suggests that the following sufficiently smooth activation functions are suitable for the present work:

$C^\infty$ functions = {sigmoid, tanh, sin, softplus}. (22)

The magnitude of the gradient of an activation function should not vanish during the update of the parameters of NNs in training. When the input to sigmoid or tanh is either too high or too low, vanishing gradients are observed. The following relu-like functions have been widely used in many neural network applications because they show high performance and their gradients are non-zero for all positive values:

relu-like functions = {relu, elu, selu, softplus}, (23)

where $\alpha = 1$ is used for elu, and the pre-defined constants $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$ are used for selu.
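As a sketch of the outer loop of Algorithm 1 under the stated optimizer and schedule, one might write the following. This is our illustration under assumptions: the decay endpoints are placeholders (the learning rate exponents are not legible in this copy), and `loss_fn` stands for the construction of $Loss$ in Eq. (21).

```python
import tensorflow as tf

# Linearly decaying learning rate, as in the PolynomialDecay schedule the paper cites.
schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=1e-3, decay_steps=100_000,
    end_learning_rate=1e-5, power=1.0)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

def train(net, loss_fn, n_epochs):
    """Algorithm 1 skeleton: minimize Loss (Eq. (21)) and keep the best parameters."""
    best = float('inf')
    best_weights = None
    for epoch in range(n_epochs):
        with tf.GradientTape() as tape:
            loss = loss_fn(net)              # L_SDE + L_BS + L_T, built as in Section 3
        grads = tape.gradient(loss, net.trainable_variables)
        optimizer.apply_gradients(zip(grads, net.trainable_variables))
        if float(loss) < best:               # lines 13-15 of Algorithm 1
            best = float(loss)
            best_weights = net.get_weights()
    if best_weights is not None:
        net.set_weights(best_weights)        # restore the minimum-Loss model
    return best
```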
4. Numerical experiments

In this section, numerical experiments are performed to demonstrate the effectiveness of the proposed method for pricing multidimensional options such as exchange and basket options.

4.1. Exchange option

We can compare the performance of numerical methods using an exchange option because this option has an exact solution. Margrabe (1978) provided the formula for the exchange option price in Eq. (4); by differentiating the Margrabe formula, we obtained the Greeks in Eqs. (5)-(7). In the same notation as in Section 2.1, the initial stock prices are $S_1(0) = 60$ and $S_2(0) = 60$, the time to maturity $T = 1$, the volatility of the first asset $\sigma_1 = 0.4$, the volatility of the second asset $\sigma_2 = 0.2$, the interest rate $r = 0.1$, and the correlation coefficient $\rho = 0.\ldots$. The accuracy of an estimate is measured by the relative error,

$\mathrm{rError} = \left|\frac{\mathrm{exact} - \mathrm{estimate}}{\mathrm{exact}}\right|$. (24)

Table 3: Numerical estimates of the price and Greeks of the exchange option. The best results are highlighted in bold face.

           Exact      FDM1       FDM2       MC1        MC2        MC3
Price      8.777591   8.765359   8.776234   8.784203   8.776402   8.777109
rError     0          1.39e-03   1.55e-04   7.53e-04   1.35e-04
$\Delta$   0.573140   0.572740   0.573102   0.573611   0.573094   0.57313
rError     0          7.09e-04   7.80e-05   8.10e-04   9.24e-05
$\Gamma$   0.017726   0.017728   0.017726   0.017738   0.017724   0.017725
rError     0          1.15e-04

Table 4: SDBS estimates of the price and Greeks of the exchange option for several numbers of training epochs. The best results are highlighted in bold face.

           Exact      25,000     50,000     100,000    200,000    400,000
Price      8.777591   8.777736   8.777681   8.777660   8.777587   8.777516
rError     0          1.65e-05   1.02e-05   7.85e-06
$\Gamma$   0.017726   0.017692   0.017653   0.017699   0.017708   0.017701
rError     0          1.90e-03   4.12e-03   1.52e-03

In Table 3, the FDMs discretize the stock price and time domain $[0, \ldots] \times [0, T]$ with 5,000 uniform intervals for FDM1 and 50,000 for FDM2. The MC simulations calculate the price and Greeks of the option by Eq. (13) and Table 1; MC1 uses $10^{\ldots}$ simulated paths, $10^{\ldots}$ for MC2, and $10^{\ldots}$ for MC3. As can be seen in Table 3, the FDMs and MC methods give accurate estimates. Furthermore, a finer discretization makes the FDMs more accurate, and the MC methods estimate more accurately with more simulated paths.

Five independent SDBS models are trained for several values of nEpoch using Algorithm 1. A batch size of 10,000 simulated stock prices $S(t_0)$ was used for the training and estimation procedures. The price and Greeks are generated 1,000 times for each trained NN, and the average of the 5,000 values (5 models × 1,000 times) is used as the final estimate. Compared with the results in Table 3, the SDBS in Table 4 gives performance comparable to the classical FDM and MC methods; in particular, the price estimate for nEpoch 200,000 is notably accurate. These experiments show that the differential neural network underlying the SDBS performs accurately, estimating Price $= \mathcal{N}(t_0, S(t_0))$, $\Delta_i = \mathcal{N}_{S_i}(t_0, S(t_0))$, $\Gamma_i = \mathcal{N}_{S_i S_i}(t_0, S(t_0))$, and $\Theta = \mathcal{N}_t(t_0, S(t_0))$.
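For completeness, reading these estimates off a trained network can be sketched as follows; this is our illustration reusing the differentiation pattern of Section 3, and the function name `price_and_greeks` is ours.

```python
import tensorflow as tf

def price_and_greeks(net, t0, S0):
    """Estimate Price, Delta_i, Gamma_i, and Theta from a trained SDBS network."""
    t = tf.constant([[t0]])          # shape (1, 1)
    S = tf.constant([S0])            # shape (1, n)
    with tf.GradientTape() as outer:
        outer.watch(S)
        with tf.GradientTape(persistent=True) as inner:
            inner.watch([t, S])
            u = net(tf.concat([t, S], axis=1))
        delta = inner.gradient(u, S)             # Delta_i = N_{S_i}
        theta = inner.gradient(u, t)             # Theta = N_t
    hess = outer.batch_jacobian(delta, S)        # full Hessian w.r.t. S
    gamma = tf.linalg.diag_part(hess[0])         # Gamma_i = N_{S_i S_i}
    return u[0, 0], delta[0], gamma, theta[0, 0]
```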
Table 5 shows the estimates of the price and Greeks of exchange options with different initial stock price pairs $(S_1(0), S_2(0))$, where the SDBS uses the same experimental configuration as in Table 4 with 100,000 nEpoch. In the estimation of the Price and $\Delta$, the SDBS shows good performance, with five bests and five second bests out of the ten combinations of initial stock price pairs $(S_1(0), S_2(0))$; however, the performance of the SDBS is degraded for the estimation of $\Gamma$ and $\Theta$.

From these experiments, we find that the differential neural network, the SDBS, accurately estimates the value and the first and second derivatives of the solution function. However, the SDBS consumes a large amount of memory for storing the many simulated paths and the model parameters of the NNs; our simulation configuration needs the 8 GB memory of an NVIDIA GTX 1080, and the SDBS takes about 4.5 hours per 10,000 training epochs. Thus, the current SDBS cannot be a practical substitute for MC simulations in the pricing of multidimensional European options.

Table 5: rErrors of numerical methods for the exchange option with several initial stock prices. The best results are highlighted in bold face.

$(S_1(0), S_2(0))$     20,60      40,60      60,60      60,40      60,20
Price     FDM2         5.64e-02   5.55e-04   1.55e-04
          MC2
$\Gamma$  FDM2         4.06e-02
          MC2          1.74e-03   2.55e-04   9.47e-05
          MC2          1.15e-03   1.55e-04

4.2. Basket option

We consider a call option on a basket of four independent stocks. The parameters are as in Korn and Zeytun (2013): $T = 0.\ldots$, $r = 0.06$, $(S_1, S_2, S_3, S_4) = (40, \ldots, \ldots, \ldots)$, and $(w_1, w_2, w_3, w_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$. We compare our method, the SDBS used in Table 5, with LN (the log-normal approximation of Levy (1992)), RG (the reciprocal gamma approximation of Milevsky and Posner (1998)), SLN (the shifted log-normal approximation of Korn and Zeytun (2013)), JU (the Taylor expansion approximation of Ju (2002)), and MC2, the MC method with $10^{\ldots}$ simulated paths used in Table 3. We rounded to four decimals to be consistent with the previous results in Korn and Zeytun (2013).

Table 6: Basket call option prices and rErrors (the upper and lower values in each cell). The best results are highlighted in bold face.

K      MC        LN         RG         SLN        JU         MC2        SDBS
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
60     0.5041    0.5037     0.5133     0.4719     0.5049     0.5049     0.5049
       0
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
60     2.7402    2.7463     2.7444     2.6729     2.7450     2.7441     2.7436
       0         2.23e-03   1.53e-03   2.46e-02   1.75e-03   1.42e-03
65     1.4468    1.4413     1.4831     1.2550     1.4488     1.4479     1.4476
       0         3.80e-03   2.51e-02   1.33e-01   1.38e-03   7.60e-04
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
65     3.8179    3.8418     3.8123     3.5776     3.8336     3.8230     3.8215
       0         6.26e-03   1.47e-03   6.29e-02   4.11e-03   1.34e-03
70     2.7011    2.7003     2.7430     2.2590     2.7135     2.7025     2.7019
       0
$(\sigma_1, \sigma_2, \sigma_3, \sigma_4) = (0.\ldots, 0.\ldots, 0.\ldots, 0.\ldots)$
65     4.1555    4.3459     4.2836     4.0874     4.1973     4.1604     4.1588
       0         4.58e-02   3.08e-02   1.64e-02   1.01e-02   1.18e-03
70     3.1196    3.1607     3.1798     2.6941     3.1710     3.1222     3.1207
       0         1.32e-02   1.93e-02   1.36e-01   1.65e-02   8.33e-04

Over the combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$, the SDBS gives 8 best and 1 equal-best, LN 2 best, RG 1 best, and JU 1 equal-best performances. The SDBS gives good performance for all levels of the strike price, and it gives more accurate option prices with increasing and non-uniform volatilities. The simulation-based methods, MC2 and the SDBS, can effectively handle options with large and non-uniform volatilities, while the analytic approximation methods LN, RG, SLN, and JU degrade for such volatilities. This trend is consistent with the results in Korn and Zeytun (2013).

Table 7 shows the rErrors of the Greeks estimates obtained with the methods JU, MC2, and SDBS used in Table 6. Since benchmark values are required for the calculation of the relative errors of the Greeks estimates, a large number ($= 10^{\ldots}$) of MC simulated paths were generated and the benchmark Greeks were estimated with the formulas shown in Table 1.
In the $\Delta$ estimation experiments, MC2 gives 11 best and 1 equal-best, SDBS 10 best and 1 equal-best, and JU 2 best performances over the 24 combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$. In the estimation experiments for $\Gamma$ and $\Theta$, MC2 gives 26 best, SDBS 3 best, and JU 1 best performances over the 30 combinations of $K$ and $(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$. The estimation performance of the SDBS is high for $\Delta$ but low for $\Gamma$ and $\Theta$; this trend is consistent with the results on exchange options in Table 5.

As can be seen in Tables 6 and 7, the SDBS estimates the price and $\Delta$ of the basket option with four assets accurately; for the estimation of $\Gamma$ and $\Theta$, the SDBS is worse than the MC simulation methods but better than the analytic approximation methods.

Table 7: rErrors of Greeks estimation of basket options. The best results are highlighted in bold face.

$(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$   (0.5, 0.5, 0.5, 0.5)              (0.6, 1.2, 0.3, 0.9)
K                   55         60         65         60         65         70
$\Delta_1$  JU      2.15e-04
$\Delta_2$  JU      9.75e-05
$\Delta_3$  JU      1.28e-04
            SDBS    5.57e-05   9.23e-05   2.46e-04   5.15e-04   6.89e-04   7.48e-04
$\Delta_4$  JU      5.34e-04   2.27e-04   7.55e-04   1.09e-03   9.20e-03   1.87e-02
            MC2
            SDBS    1.10e-04
$\Gamma_1$  JU      2.47e-03   1.84e-03   1.49e-03   5.22e-03   8.22e-03   1.43e-02
            MC2
            SDBS    7.99e-03   2.98e-03   7.61e-03   2.29e-03   9.91e-03   6.70e-03
$\Gamma_2$  JU      2.96e-03   3.81e-03   4.10e-03   1.38e-01   1.50e-01   1.54e-01
            MC2     2.01e-04
            SDBS
$\Gamma_3$  JU      1.25e-04   8.40e-04   1.70e-03   1.96e-02   2.24e-02   1.28e-02
            MC2
            SDBS    1.23e-03   8.23e-04   2.08e-03   8.70e-04   1.03e-02   6.38e-03
$\Gamma_4$  JU      1.16e-03
            SDBS    3.81e-04   9.63e-04   2.49e-03   8.83e-04   8.10e-04   1.89e-03
$\Theta$    JU      1.44e+00   8.76e-01   5.12e-01   1.40e-01   5.32e-02   1.91e-01
            MC2

We expect that the loss $L_{BS}$ in Eq. (19) constrains the SDBS to satisfy the Black-Scholes partial differential equation; the derivatives of the SDBS, the Greeks, should then be estimated more accurately with $L_{BS}$ than without it. To validate the effectiveness of this constraint, we test several weights ($w = 10^{-\ldots}, \ldots, 1, \ldots, 10^{\ldots}$) in the weighted loss $L_w$,

$L_w = L_{SDE} + w \cdot L_{BS} + L_T$, (25)

where $w = 1$ corresponds to the loss in Eq. (21); a small $w$ means a small contribution of the PDE constraint and a relatively large contribution of the SDE, and a large $w$ means a large contribution of the PDE constraint and a relatively small contribution of the SDE. We perform the same experiments as in Table 5 with $(S_1, S_2) = (20, 60), (60, 60), (60, 20)$, except that this experiment uses the weighted loss $L_w$ instead of the original loss in Eq. (21).

Table 8: rErrors of the weighted loss $L_w$ for the exchange option, for $(S_1, S_2) = (20, 60)$, $(60, 60)$, and $(60, 20)$ and weights $w$ ranging from $10^{-\ldots}$ to $10^{\ldots}$. The best results are highlighted in bold face.

Table 8 shows that this version of the SDBS performs accurately for a wide range of $w$, but it degrades when $w$ deviates far from unity. This means that a suitable constraint of the Black-Scholes equation enhances the performance of option pricing. Price estimation performs better with $w \le 1$, and Greeks estimation with $w \ge 1$. This is because weighting the loss of the differential equation $L_{BS}$ more heavily gives accurate estimation of derivatives, while weighting the loss of the value $L_{SDE}$ more heavily gives accurate estimation of values. When $L_{BS}$ is removed completely, no convergent value is obtained. These observations reveal that the loss $L_{BS}$ is essential for the accurate estimation of the gradient and Hessian of the NN, and that both losses $L_{BS}$ and $L_{SDE}$ play a key role in making the SDBS suitable for option pricing problems.
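The weight sweep of Eq. (25) is straightforward to express in code. The following is a minimal sketch; the grid of $w$ values is illustrative, since the exact weights tested are not legible in this copy.

```python
# Illustrative PDE-constraint weights around w = 1 (the paper's exact grid is unknown).
W_GRID = [1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3]

def weighted_loss(l_sde, l_bs, l_t, w=1.0):
    """Weighted loss L_w of Eq. (25); w = 1 recovers the loss of Eq. (21)."""
    return l_sde + w * l_bs + l_t
```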
In Section 3.4, we described the mathematical conditions on activation functions for approximating a function together with its partial derivatives: the boundedness and continuity of all partial derivatives of the activation function up to order $m$ are required to approximate functions whose partial derivatives up to order $m$ are continuous and bounded Hornik et al. (1990); Gallant and White (1992). Activation functions such as sigmoid, tanh, and sin satisfy these conditions. However, practical performance can differ from one activation function to another. We therefore compare the performance of several activation functions in terms of their suitability for option pricing problems.

Using the same experimental setting used for the SDBS in Table 5 with nEpoch 100,000, we examine the rErrors of estimation with several activation functions. As can be seen in Table 9, the accuracy differs substantially from activation function to activation function. Sigmoid and softplus show significantly better accuracy than the other functions for estimating the price and $\Delta$; however, sigmoid degrades for $(S_1, S_2) = (20, 60)$. The $C^0$ functions relu and selu give the lowest performance, with especially large errors ($\sim$e-01 and e+00). Another relu-like function, the $C^1$ function elu, gives intermediate performance, while the relu-like infinitely differentiable function softplus shows high performance. This is because the smoothness of the neural network is determined by the smoothness of its component functions, and sufficient smoothness of the activation functions is an essential factor for training the SDBS, which is strongly based on the differentiability of NNs. The accuracy of price estimation by the relu-like functions in Eq. (23) is comparable to that by the $C^\infty$ functions in Eq. (22); for example, elu shows good performance for estimating the price compared with the $C^\infty$ functions. This is because the vanishing gradient problem is reduced for relu-like functions.

Table 9: rErrors of activation functions for the exchange option. The best results are highlighted in bold face.

                  sigmoid    tanh       sin        relu       elu        selu       softplus
$(S_1, S_2) = (20, 60)$
$\Delta$          5.06e-01   5.29e-03   9.93e-01   9.94e-01   2.01e-01   7.59e-01
$\Gamma$          5.83e-01   1.90e-02   9.92e-01   1.00e+00   1.39e-01   9.51e-01
$\Theta$          5.55e-01   3.40e-02   1.00e+00   1.00e+00   2.59e-01   9.07e-01
$(S_1, S_2) = (60, 60)$
$\Gamma$          2.99e-03   3.65e-03   5.49e-03   1.00e+00   1.81e-03   5.04e-03
$\Theta$          8.14e-03
$(S_1, S_2) = (60, 20)$
$\Delta$          8.50e-05   3.37e-04   4.49e-04   6.84e-05   1.68e-04   5.05e-03

The above considerations suggest that relu-like behavior is required for accurate price estimation and $C^\infty$ smoothness is required for accurate Greeks estimation. Hence, softplus, the intersection of the relu-like functions in Eq. (23) and the $C^\infty$ functions in Eq. (22), gives the best performance for the present work.

5. Conclusions

We improve the approximation capability of NNs by making the NNs satisfy the differential relationship of the problem we wish to solve. Following this approach, the proposed method learns the Black-Scholes equation of the option price. Sufficient smoothness of the activation functions is an essential factor for the proposed method, which relies heavily on the differentiability of NNs; the softplus activation function is suitable for our method.

The SDBS is trained using realistic asset price paths generated by SDEs. Hence, the SDBS can accurately model the option price and does not suffer from the curse of dimensionality associated with high-dimensional FDMs. Since the SDBS utilizes the exact differential relationship along a single simulation path, it can easily use parallel and distributed computing.
We can also use this single-path dependency for backward recursion in American option pricing, which will be discussed in a separate paper.

Further improvements can be made using sample paths generated by advanced MC simulation methods such as variance reduction techniques and low-discrepancy sequences. Although the proposed method was applied to a typical Black-Scholes model of option pricing in this work, it can be modified to suit different models and option types. In addition, differential neural networks can be applied to solve problems related to partial differential equations such as inverse problems, the Navier-Stokes equation, and the Schrödinger equation.

Appendix

In this section we derive the calculation formulas for the Greeks in Table 1. Although the exchange option has an exact solution, we prepare the MC approximation of its Greeks for comparison with other numerical methods. The pathwise derivative estimate (PW) method first differentiates $e^{-r(T-t)}V(T)$ in Eq. (13) and then takes the expectation Glasserman (2004); i.e., differentiation and expectation are interchanged. For the delta of the exchange option, $e^{-r(T-t)}[S_1(T) - S_2(T)]\,\mathbf{1}\{S_1(T) > S_2(T)\}$ at $t = 0$ is differentiated with respect to $S_1(0)$:

$\frac{d}{dS_1(0)} e^{-rT} V(T) = e^{-rT}\frac{dS_1(T)}{dS_1(0)}\frac{dV(T)}{dS_1(T)} = e^{-rT}\,\frac{S_1(T)}{S_1(0)}\,\mathbf{1}\{S_1(T) > S_2(T)\}$, (26)

where $S_1(T)$ is given by Eq. (2) and $\mathbf{1}$ denotes an indicator function.

Similarly to the calculation of the delta, the theta of the exchange option is obtained as the expectation of

$\frac{\partial e^{-r(T-t)} V(T-t)}{\partial t}\Big|_{t=0} = \frac{\partial e^{-r(T-t)}}{\partial t}\Big|_{t=0} V(T) + e^{-rT}\frac{\partial V(T-t)}{\partial t}\Big|_{t=0}$
$= r e^{-rT} V(T) + e^{-rT}\frac{\partial\, [S_1(T-t) - S_2(T-t)]\,\mathbf{1}\{S_1(T-t) > S_2(T-t)\}}{\partial t}\Big|_{t=0}$
$= e^{-rT}\Big(r[S_1(T) - S_2(T)] - \Big[S_1(T)\Big(r - \frac{\sigma_1^2}{2} + \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\Big) - S_2(T)\Big(r - \frac{\sigma_2^2}{2} + \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\Big)\Big]\Big)\,\mathbf{1}\{S_1(T) > S_2(T)\}$
$= e^{-rT}\Big[S_1(T)\Big(\frac{\sigma_1^2}{2} - \frac{\sigma_1 L_1 Z}{2\sqrt{T}}\Big) - S_2(T)\Big(\frac{\sigma_2^2}{2} - \frac{\sigma_2 L_2 Z}{2\sqrt{T}}\Big)\Big]\,\mathbf{1}\{S_1(T) > S_2(T)\}$, (27)

where $L_i$ and $Z$ are as described in Eq. (2).

We also use the PW method for the basket call option. The delta of the basket option is calculated with $V(T)$ in Eq. (8); the delta is given by the expectation of

$\frac{d\, e^{-rT} V(T)}{dS_i(0)} = e^{-rT}\,\frac{S_i(T)}{S_i(0)}\, w_i\,\mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (28)

The theta of the basket call option is given by the expectation of

$\frac{\partial e^{-r(T-t)} V(T-t)}{\partial t}\Big|_{t=0} = e^{-rT}\Big[-rK + \sum_{j=1}^{n} w_j S_j(T)\Big(\frac{\sigma_j^2}{2} - \frac{\sigma_j L_j Z}{2\sqrt{T}}\Big)\Big]\,\mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (29)

For the calculation of the gamma using the PW method, the first derivatives of the payoff functions would have to be differentiable functions of the parameter $S_i(0)$. However, the first derivatives, the deltas, are not differentiable for the exchange and basket options, so PW cannot be applied directly to these options. The likelihood ratio (LR) method, an alternative, assumes that the assets $S_i$ have a probability density $g_\alpha$, where $\alpha$ denotes a parameter of this density. To emphasize that the expectation is computed with respect to $g_\alpha$, we write $E_\alpha$. LR interchanges the order of differentiation and integration to derive a Greek:

$\frac{d}{d\alpha} E_\alpha[V(T)] = \int_{\mathbb{R}^n} V(T)\, \frac{d}{d\alpha} g_\alpha(x)\, dx = \int_{\mathbb{R}^n} V(T)\, \frac{\dot g_\alpha(x)}{g_\alpha(x)}\, g_\alpha(x)\, dx = E_\alpha\Big[V(T)\,\frac{\dot g_\alpha(x)}{g_\alpha(x)}\Big]$. (30)

In the present work, Eq. (2) suggests that $S_i(T)$, for $i = 1, \ldots, n$, has the distribution of $\exp(Y_i)$. The vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ follows the normal distribution $N(\mu(\alpha), \tilde\Sigma(\alpha))$, where $\mu(\alpha)$ denotes a vector with elements $\mu_i = \ln(S_i(0)) + (r - \frac{\sigma_i^2}{2})T$, $\tilde\Sigma(\alpha) = T A \Sigma A'$ with $A = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, and $\alpha = (S_1(0), \ldots, S_n(0))$.
The probability density function of $Y$ can be written as

$g_\alpha(Y) = \frac{1}{\sqrt{(2\pi)^n |\tilde\Sigma(\alpha)|}} \exp\big[-\frac{1}{2}(y - \mu(\alpha))'\, \tilde\Sigma(\alpha)^{-1}\, (y - \mu(\alpha))\big]$. (31)

We generate the sample paths $Y$ by $\mu + \sqrt{T} A L Z$, where $LL' = \Sigma$ and $Z$ denotes a standard normal vector. Hence,

$\frac{\dot g_{S_i(0)}}{g_{S_i(0)}} = \frac{d}{dS_i(0)} \log g_{S_i(0)}(Y) = (Y - \mu)'\, \tilde\Sigma^{-1}\, \frac{d\mu}{dS_i(0)} = (\sqrt{T} A L Z)'\, \tilde\Sigma^{-1}\, \frac{d\mu}{dS_i(0)} = (\sqrt{T} A L Z)'\, (T A \Sigma A')^{-1}\, \big(0, \cdots, \frac{1}{S_i(0)}, \cdots, 0\big)' = \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)}$, (32)

where $(Z' L^{-1} A^{-1})_i$ denotes the $i$th component of the row vector $Z' L^{-1} A^{-1}$.

For calculating the gamma in the present study, the LR method is first applied to the delta calculation, and this delta estimator is then differentiated and averaged by the PW method. This LR-PW method for the gamma calculation gives superior performance compared with the pure LR method Glasserman (2004).

For the exchange option, the gamma is given by the expectation of the following expression:

$\frac{d}{dS_1(0)}\Big(e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)}\Big)$
$= e^{-rT}\Big(\frac{d \max[S_1(T) - S_2(T), 0]}{dS_1(0)}\Big)\frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)} + e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{d}{dS_1(0)}\Big(\frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)}\Big)$
$= e^{-rT}\, \mathbf{1}\{S_1(T) > S_2(T)\}\, \frac{S_1(T)}{S_1(0)}\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)} + e^{-rT} \max[S_1(T) - S_2(T), 0]\, \frac{-(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)^2}$
$= e^{-rT}\, \frac{(Z' L^{-1} A^{-1})_1}{\sqrt{T}\, S_1(0)^2}\, S_2(T)\, \mathbf{1}\{S_1(T) > S_2(T)\}$. (33)

For the basket call option, the gamma is given by the expectation of the following expression:

$\frac{d}{dS_i(0)}\Big[e^{-rT} \max\Big(\sum_{j=1}^{n} w_j S_j(T) - K, 0\Big)\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)}\Big]$
$= e^{-rT}\, w_i\, \frac{S_i(T)}{S_i(0)}\, \mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)} + e^{-rT} \max\Big(\sum_{j=1}^{n} w_j S_j(T) - K, 0\Big)\, \frac{-(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)^2}$
$= e^{-rT}\, \frac{(Z' L^{-1} A^{-1})_i}{\sqrt{T}\, S_i(0)^2}\, \Big\{w_i S_i(T) - \sum_{j=1}^{n} w_j S_j(T) + K\Big\}\, \mathbf{1}\Big\{\sum_{j=1}^{n} w_j S_j(T) > K\Big\}$. (34)

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Becker, S., Cheridito, P., Jentzen, A., Welti, T., 2019. Solving high-dimensional optimal stopping problems using deep learning. CoRR abs/1908.01602.

Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. The Journal of Political Economy 81, 637–654.

Carr, P., Stanley, M., Madan, D., 2001. Option valuation using the fast Fourier transform. J. Comput. Finance 2.

Clevert, D.A., Unterthiner, T., Hochreiter, S., 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289.

Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2, 303–314.

Deng, L., Yu, D., 2014. Deep Learning: Methods and Applications.
Technical Report MSR-TR-2014-21.

E, W., Han, J., Jentzen, A., 2017. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5, 349–380.

Gallant, A.R., White, H., 1992. On learning the derivatives of an unknown mapping with multilayer feedforward networks. Neural Networks 5, 129–138.

Gencay, R., Min Qi, 2001. Pricing and hedging derivative securities with neural networks: Bayesian regularization, early stopping, and bagging. IEEE Transactions on Neural Networks 12, 726–734.

Glasserman, P., 2004. Monte Carlo Methods in Financial Engineering. Springer, New York.

Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Gordon, G., Dunson, D., Dudík, M. (Eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA. pp. 315–323.

Graves, A., Schmidhuber, J., 2009. Offline handwriting recognition with multidimensional recurrent neural networks, in: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (Eds.), Advances in Neural Information Processing Systems 21. Curran Associates, Inc., pp. 545–552.

Hornik, K., Stinchcombe, M., White, H., 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560.

Hull, J., 2018. Options, Futures, and Other Derivatives. 10th ed., Pearson.

Ju, N., 2002. Pricing Asian and basket options via Taylor expansion. J. Comput. Finance 5, 79–103.

Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Published as a conference paper at the 3rd International Conference on Learning Representations, San Diego, 2015.

Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S., 2017. Self-normalizing neural networks, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pp. 971–980.

Kohler, M., Krzyżak, A., Todorovic, N., 2010. Pricing of high-dimensional American options by neural networks. Mathematical Finance 20, 384–410.

Korn, R., Zeytun, S., 2013. Efficient basket Monte Carlo option pricing via a simple analytical approximation. Journal of Computational and Applied Mathematics 243, 48–59.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, Curran Associates Inc., Red Hook, NY, USA. pp. 1097–1105.

Le, Q.V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., Ng, A.Y., 2012. Building high-level features using large scale unsupervised learning, in: Proceedings of the 29th International Conference on International Conference on Machine Learning, Omnipress, Madison, WI, USA. pp. 507–514.

Levy, E., 1992.
Pricing European average rate currency options. Journal of International Money and Finance 11, 474–491.

Margrabe, W., 1978. The value of an option to exchange one asset for another. Journal of Finance 33, 177–186.

Milevsky, M.A., Posner, S.E., 1998. A closed-form approximation for valuing basket options. The Journal of Derivatives 5, 54–61.

Nair, V., Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines, in: Proceedings of the 27th International Conference on International Conference on Machine Learning, Omnipress, Madison, WI, USA. pp. 807–814.

Nocedal, J., Wright, S.J., 2006. Numerical Optimization. 2nd ed., Springer, New York, NY, USA.

Raissi, M., 2018. Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. arXiv preprint arXiv:1804.07010.

Ramsundar, B., Kearnes, S.M., Riley, P., Webster, D., Konerding, D.E., Pande, V.S., 2015. Massively multitask networks for drug discovery. arXiv abs/1502.02072.

Shreve, S.E., 2004. Stochastic Calculus for Finance II: Continuous-Time Models. Springer, New York, NY.

Sirignano, J., Spiliopoulos, K., 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, 1339–1364. doi:10.1016/j.jcp.2018.08.029.