Deep-learning based numerical BSDE method for barrier options
Bing Yu*, Xiaojing Xing†, Agus Sudjianto‡

April 15, 2019
Abstract
As is known, an option price is a solution to a certain partial differential equation (PDE) with terminal conditions (payoff functions). There is a close association between the solution of a PDE and the solution of a backward stochastic differential equation (BSDE). We can either solve the PDE to obtain option prices or solve its associated BSDE. Recently, a deep learning technique has been applied to solve option prices using the BSDE approach. In this approach, deep learning is used to learn some deterministic functions, which are used in solving the BSDE with terminal conditions. In this paper, we extend the deep-learning technique to solve a PDE with both terminal and boundary conditions. In particular, we will employ the technique to solve barrier options using Brownian bridges.
Introduction

A barrier option is a type of derivative where the payoff depends on whether the underlying asset has breached a predetermined barrier price. For a simple barrier case, an analytical pricing formula is available (see [1]). Because barrier options have additional conditions built in, they tend to have cheaper premiums than comparable options without barriers. Therefore, if a trader believes the barrier is unlikely to be reached, they may prefer to buy a knock-out barrier option for its lower premium. There are different methods to obtain option prices, ranging from analytical solutions and numerical PDE methods to Monte Carlo simulation. Recently, a different approach using machine learning has been proposed. Using machine learning to solve PDEs was studied in [2]. In that work, a new method was proposed for solving parabolic partial differential equations with terminal conditions, which we will call the standard framework hereafter.

* Corporate Model Risk, Wells Fargo, [email protected]
† Corporate Model Risk, Wells Fargo
‡ Corporate Model Risk, Wells Fargo
Basic method to solve BSDE by machine learning
We briefly introduce the deep-learning-based numerical BSDE algorithm proposed in [2]. We start from an FBSDE, a formulation first proposed in [9]:

X_t = X_0 + \int_0^t b_s(X_s)\, ds + \int_0^t \sigma_s(X_s)\, dW_s,

Y_t = h(X_T) + \int_t^T f_s(X_s, Y_s, Z_s)\, ds - \int_t^T Z_s\, dW_s.

Here {W_s} is a standard Brownian motion, X is the forward diffusion, (Y, Z) is the pair of adapted processes solving the backward equation, h is the terminal condition, and f is the driver.
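In the standard framework, Y_0 is treated as a trainable variable and the Z_{t_i} are produced by a neural network; the FBSDE is discretized with an Euler scheme and the parameters are trained so that the simulated Y_N matches the payoff h(X_N) in mean square. The following is a minimal PyTorch sketch of this scheme under our own illustrative assumptions (geometric Brownian motion with risk-neutral drift, driver f = -rY, call payoff, arbitrary hyperparameters), not the authors' implementation; with these choices the learned Y_0 approximates the Black-Scholes call price.

import torch

T, N, M = 1.0, 50, 256                     # horizon, time steps, paths per batch
x0, sigma, r, K = 100.0, 0.2, 0.05, 100.0  # GBM under the risk-neutral measure
dt = T / N

class ZNet(torch.nn.Module):
    """Network for Z_{t_i}, fed with (X_{t_i}, T - t_i) as in eq. (10) below."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))
    def forward(self, x, ttm):
        return self.net(torch.cat([x, ttm.expand_as(x)], dim=1))

znet = ZNet()
y0 = torch.nn.Parameter(torch.tensor([[10.0]]))  # trainable Y_0 = u(0, x0)
opt = torch.optim.Adam(list(znet.parameters()) + [y0], lr=1e-2)

for step in range(2000):
    x = torch.full((M, 1), x0)
    y = y0.expand(M, 1)
    for i in range(N):
        dw = torch.randn(M, 1) * dt ** 0.5
        z = znet(x, torch.tensor([[T - i * dt]]))
        # dY = -f dt + Z dW with driver f = -rY, so Y_{i+1} = Y_i + r Y_i dt + Z_i dW
        y = y + r * y * dt + z * dw
        # Euler step for the forward process X
        x = x + r * x * dt + sigma * x * dw
    loss = torch.mean((y - torch.clamp(x - K, min=0.0)) ** 2)  # E|Y_N - h(X_N)|^2
    opt.zero_grad(); loss.backward(); opt.step()

print("learned Y_0 (price estimate):", y0.item())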
Barrier options

Barrier options are options where the payoff depends on whether the underlying asset's price reaches a certain level during a certain period of time. Barrier options can be classified as either knock-out options or knock-in options. A knock-out option ceases to exist when the underlying asset price reaches a certain barrier; a knock-in option comes into existence only when the underlying asset price reaches a barrier. An up-and-out call is a regular call option that ceases to exist if the asset price reaches a barrier level B that is higher than the current asset price. An up-and-in call is a regular call option that comes into existence only if the barrier is reached. The down-and-in and down-and-out options are defined similarly. Under the Black-Scholes framework, assuming constant coefficients, it is not hard to derive an analytical solution for these kinds of barrier options. Therefore, we will use the analytical solution as the benchmark.

We would like to start from the most general form of a Cauchy-Dirichlet problem. As is well known, the Feynman-Kac formula provides a way of translating a partial differential equation problem into a probabilistic one. A Dirichlet condition is translated probabilistically by stopping the underlying diffusion process as it exits from a domain.

Theorem 1 (see [10], Chapter 4). Let W be a Brownian motion and assume the process X_t satisfies

X_t = X_0 + \int_0^t b_s(X_s)\, ds + \int_0^t \sigma_s(X_s)\, dW_s,

where b and \sigma satisfy the usual regularity and boundedness conditions. Let D be a bounded domain and define \tau^{t,x} = \inf\{s > t : X_s^{t,x} \notin D\}, the first exit time from D of the process X started from (t,x). Assume the boundary \partial D is smooth and the functions r, g : [0,T] \times \bar{D} \to \mathbb{R} are continuous. Then the solution u(t,x), of class C^{1,2}, of the following PDE

\partial_t u(t,x) + b(t,x)\, \partial_x u(t,x) + \tfrac{1}{2} \sigma^2(t,x)\, \partial_{xx} u(t,x) - r(t,x)\, u(t,x) = 0, \quad t < T,\ x \in D,
u(T,x) = g(T,x), \quad x \in \bar{D},
u(t,x) = g(t,x), \quad (t,x) \in [0,T] \times \partial D,    (3)

can be expressed by the probabilistic representation

u(t,x) = E\big[ g(\tau^{t,x} \wedge T, X^{t,x}_{\tau^{t,x} \wedge T})\, e^{-\int_t^{\tau^{t,x} \wedge T} r(s, X_s^{t,x})\, ds} \big].

Remark. The domain D is assumed to be bounded; in fact, it is enough for the boundary to be compact.

u(t,x) is an average over all paths started at (t,x). If a path never exits the domain D (i.e., \tau^{t,x} \geq T), we use the terminal value g(T, X_T^{t,x}); if the path exits the domain (i.e., \tau^{t,x} < T), we use the boundary value at the exit point, g(\tau^{t,x}, X^{t,x}_{\tau^{t,x}}). We can write this as

u(t,x) = E\big[ g(T, X_T^{t,x})\, e^{-\int_t^T r(s, X_s^{t,x})\, ds} \mid \tau^{t,x} \geq T \big]\, P(\tau^{t,x} \geq T)
       + E\big[ g(\tau^{t,x}, X^{t,x}_{\tau^{t,x}})\, e^{-\int_t^{\tau^{t,x}} r(s, X_s^{t,x})\, ds} \mid \tau^{t,x} < T \big]\, P(\tau^{t,x} < T).    (4)

An up-and-out call barrier option is a special case of the above general setting with the domain, terminal condition, and boundary condition specifically defined. In up-and-out call pricing, the domain is D = \{x < B\}, the terminal condition is g(T,x) = (x - K)^+\, 1_{\{x < B\}}, and the boundary condition is g(t,x) = 0 for x = B. Since the boundary value is zero, the second term in (4) vanishes and

u(t,x) = E\big[ (X_T^{t,x} - K)^+\, 1_{\{X_T^{t,x} < B\}}\, e^{-\int_t^T r(s, X_s^{t,x})\, ds} \mid \tau^{t,x} \geq T \big]\, P(\tau^{t,x} \geq T).    (5)

To evaluate the exit probability on a discrete time grid, we use the distribution of the maximum of a Brownian bridge.

Lemma 1. Let X follow the diffusion above on a small interval [t, t + \Delta t], over which \sigma is approximately constant, and condition on the endpoint values X_t and X_{t+\Delta t}. If B > \max(X_t, X_{t+\Delta t}), then

P\big[ \max_{t \leq s \leq t+\Delta t} X_s < B \mid X_t, X_{t+\Delta t} \big] = \xi(B) = 1 - \exp\Big( -\frac{2 (B - X_t)(B - X_{t+\Delta t})}{\sigma^2\, \Delta t} \Big),

and consequently P[\max_{t \leq s \leq t+\Delta t} X_s \geq B \mid X_t, X_{t+\Delta t}] = 1 - \xi(B).

Making use of Lemma 1 and Lemma 2 (see the Appendix) and plugging in the terminal condition for the barrier option, the probabilistic representation becomes

u(t,x) = E\big\{ (X_T^{t,x} - K)^+\, 1_{\{X_T^{t,x} < B\}}\, e^{-\int_t^T r(s, X_s^{t,x})\, ds}\, P(\tau^{t,x} \geq T \mid X^{t,x}) \big\},    (6)

where, on a discrete grid t = t_0 < t_1 < \dots < t_N = T, the conditional survival probability P(\tau^{t,x} \geq T \mid X^{t,x}) is approximated by the product of the Brownian-bridge factors \xi(B) of Lemma 1 over the subintervals.
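To make the roles of Lemma 1 and equation (6) concrete, here is a small numpy sketch (our illustration, not the paper's code) that prices an up-and-out call by Monte Carlo: each simulated path's discounted payoff is weighted by the product of Brownian-bridge survival factors \xi(B), with the local volatility sigma*x frozen at the left endpoint of each subinterval. All parameter values are hypothetical.

import numpy as np

def up_and_out_call_mc(x0=100.0, K=100.0, B=130.0, r=0.05, sigma=0.2,
                       T=0.5, n_steps=50, n_paths=200_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    survive = np.ones(n_paths)  # running product of the xi(B) factors per path
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x_new = x + r * x * dt + sigma * x * dw   # Euler step, risk-neutral drift
        # Lemma 1 with the diffusion coefficient sigma*x frozen at the left
        # endpoint; np.maximum(.., 0) forces xi = 0 when an endpoint breaches B.
        xi = 1.0 - np.exp(-2.0 * np.maximum(B - x, 0.0) * np.maximum(B - x_new, 0.0)
                          / ((sigma * x) ** 2 * dt))
        survive *= xi
        x = x_new
    # the indicator 1_{X_T < B} is already enforced: survive = 0 if X_T >= B
    payoff = np.maximum(x - K, 0.0) * survive
    return np.exp(-r * T) * payoff.mean()

print(up_and_out_call_mc())   # compare with the analytical up-and-out price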
Numerical results

In this section, we present the results for the 72 cases tested. In test 1, we ran 8,000 iterations. As Table 3 indicates, on average the results differ from the analytical solutions by 4.5 cents, a relative difference of 1.21%. In approximately 50% of cases, the pricing difference is less than a penny, with a relative error of about 0.4% (the median rows of Table 3). Some isolated cases, shown in Table 4, have large differences. To improve convergence, we applied the batch normalization technique.

The batch normalization technique was proposed in [11]. In a neural network, the change in the input distribution at each layer presents a problem because the parameters need to continuously adapt to a new distribution, a phenomenon known as internal covariate shift. Eliminating internal covariate shift allows faster training, and batch normalization is a mechanism to do so: it adds a normalization step that fixes the means and variances of the input distributions. We note that adding batch normalization in our tests introduces additional trainable variables, and these trainable parameters within batch normalization are different across time steps (i.e., the network will still depend on t_i):

Z_{t_i} = \mathrm{net}_{\theta, \beta_{t_i}}(X_{t_i}, T - t_i), \quad i = 0, \dots, N.    (10)

Here \beta_{t_i} are the parameters introduced by batch normalization.

Test 2, shown in Table 3, reports the results of applying batch normalization at every layer. Comparing the results from test 1 with those from test 2, we see significant improvement in both absolute and relative differences when batch normalization is applied. In addition, the isolated cases that failed to converge in test 1 now converge nicely, with less than 1% relative difference, as shown in Table 4.

When applying batch normalization, we can apply it at every layer or only at the input layer. Whether we apply it at each layer or just the input layer, the results are similar, as shown in Table 3. However, the computational efficiency is very different, as we show below.

Table 3: Grid test results - error statistics

Statistics      Test 1 rel  Test 2 rel  Test 3 rel  Test 1 abs  Test 2 abs  Test 3 abs
Average         1.21%       0.57%       0.56%       0.0452      0.0099      0.0086
STD             2.56%       0.54%       0.53%       0.1399      0.0116      0.0093
25% quantile    0.24%       0.21%       0.13%       0.0017      0.0017      0.0013
Median          0.41%       0.39%       0.48%       0.0043      0.0059      0.0047
75% quantile    1.04%       0.70%       0.76%       0.0174      0.0133      0.0132

Table 4: Results for isolated test cases

Maturity  Underlying  Volatility  Barrier  Test 1 rel error  Test 2 rel error
0.5       17          1.2         100      18%               0.65%
0.5       22          0.8         100      4.09%             0.34%
0.5       27          0.8         100      6.76%             0.28%
0.5       32          0.8         100      9.03%             0.39%
2         22          0.4         100      4.02%             0.18%
2         27          0.4         100      6.42%             0.15%

As mentioned before, we can apply batch normalization at every layer or only at the input layer. When we apply it at every layer, we increase the number of trainable variables, but fewer iterations are required to achieve convergence. If we apply it only at the input layer, we have fewer trainable parameters, but more iterations are needed. However, the overall time needed to achieve convergence is less in the latter case, as shown in Table 5: the overall running time when applying batch normalization at only the input layer is almost three times shorter.

Table 5: Efficiency results

Indicator                                                  Test 2 result  Test 3 result
Average iteration steps needed                             1000           1670
Time consumed per 200 training iterations                  15-20s         4-5s
Approximate running time per case (including build time)   200s           75s
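One way to read eq. (10) is a single network whose weights \theta are shared across time steps while the batch-normalization parameters \beta_{t_i} are kept separate for each step. Below is a possible PyTorch rendering of that reading; it is our sketch under that assumption, not the paper's architecture, and the layer sizes are arbitrary.

import torch

class SharedZNet(torch.nn.Module):
    """Eq. (10): shared weights theta across time steps, with batch-norm
    parameters beta_{t_i} trained separately for each step i."""
    def __init__(self, n_steps, n_hidden=16):
        super().__init__()
        self.fc1 = torch.nn.Linear(2, n_hidden)        # shared theta
        self.fc2 = torch.nn.Linear(n_hidden, n_hidden)
        self.out = torch.nn.Linear(n_hidden, 1)
        # one BN module per time step and hidden layer: the beta_{t_i}
        self.bn1 = torch.nn.ModuleList(torch.nn.BatchNorm1d(n_hidden) for _ in range(n_steps))
        self.bn2 = torch.nn.ModuleList(torch.nn.BatchNorm1d(n_hidden) for _ in range(n_steps))

    def forward(self, x, ttm, i):
        h = torch.cat([x, ttm.expand_as(x)], dim=1)    # input (X_{t_i}, T - t_i)
        h = torch.relu(self.bn1[i](self.fc1(h)))
        h = torch.relu(self.bn2[i](self.fc2(h)))
        return self.out(h)

net = SharedZNet(n_steps=50)
z = net(torch.rand(256, 1) * 100, torch.tensor([[0.5]]), i=3)  # Z at step i

The input-layer-only variant (test 3) would keep a single BatchNorm1d on the (X_{t_i}, T - t_i) inputs and drop the per-layer BN modules, trading more training iterations for cheaper ones.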
Conclusion
In this work, we solved a PDE with boundary conditions, using barrier options as a concrete example. In this problem, the diffusion domain is restricted by a barrier. By viewing the terminal condition probabilistically (i.e., including the probability of breaching the barrier), we are able to recast this problem into the standard framework, namely a PDE with terminal conditions only. This PDE is solved through its equivalent BSDE using a machine learning technique. We have completed extensive testing over a wide range of market conditions and achieved good results compared with the known analytical results. In some isolated cases, the batch normalization technique is needed to improve learning.
Appendix

For completeness, we present a technical lemma that was used in the barrier option section for the transition from equation (5) to (6).
Lemma 2.
Assume X is a random variable and A is an event. Then

E[f(X) \mid A]\, P(A) = E[f(X)\, P(A \mid X)]

for any function f(\cdot).

Proof. Starting from the left side, we have

E[f(X) \mid A]\, P(A) = E\big[ 1_A\, E[f(X) \mid A] \big] = E\big[ E[f(X) 1_A \mid A] \big] = E[f(X) 1_A].

Then, starting from the right side,

E[f(X)\, P(A \mid X)] = E\big[ f(X)\, E[1_A \mid X] \big] = E\big[ E[f(X) 1_A \mid X] \big] = E[f(X) 1_A].

We arrive at the same quantity from both sides; thus, the statement is proved.
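As a quick numerical sanity check of Lemma 2 (our illustration, not part of the paper), take X ~ N(0,1), an independent eps ~ N(0,1), the event A = {X + eps > 0} so that P(A | X) = Phi(X), and f(x) = x^2; both sides of the identity then agree up to Monte Carlo error.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
eps = rng.normal(size=x.size)
a = x + eps > 0            # the event A; P(A | X = x) = Phi(x)
f = x ** 2

lhs = f[a].mean() * a.mean()        # E[f(X) | A] * P(A)
rhs = (f * norm.cdf(x)).mean()      # E[f(X) * P(A | X)]
print(lhs, rhs)                     # both ~ 0.5 up to Monte Carlo error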
References

[1] John C. Hull. Options, Futures and Other Derivatives, 9th Edition. 2015.
[2] Weinan E, Jiequn Han, Arnulf Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 2017.
[3] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 1989.
[4] Christian Beck, Weinan E, Arnulf Jentzen. Machine learning approximation algorithms for high dimensional fully nonlinear partial differential equations and second order backward stochastic differential equations. arXiv:1709.05963, 2017.
[5] Maziar Raissi. Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. arXiv:1804.07010, 2018.
[6] Quentin Chan-Wai-Nam, Joseph Mikael, Xavier Warin. Machine learning for semi linear PDEs. arXiv:1809.07609v1, 2018.
[7] Masaaki Fujii, Akihiko Takahashi, Masayuki Takahashi. Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. arXiv:1710.07030, 2017.
[8] Haojie Wang, Han Chen, Agus Sudjianto, Richard Liu, Qi Shen. Deep learning-based BSDE solver for LIBOR market model with application to Bermudan swaption pricing and hedging. arXiv:1807.06622, 2018.
[9] Etienne Pardoux, Shige Peng. Adapted solution of a backward stochastic differential equation. Systems & Control Letters, 1990.
[10] Emmanuel Gobet. Monte-Carlo Methods and Stochastic Processes: From Linear to Non-Linear. Chapman and Hall/CRC, 1st edition, 2016.
[11] Sergey Ioffe, Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.