Financial option valuation by unsupervised learning with artificial neural networks
Beatriz Salvador, Cornelis W. Oosterlee, Remco van der Meer
CWI – Centrum Wiskunde & Informatica, Amsterdam, the Netherlands
DIAM, Delft University of Technology, Delft, the Netherlands
Abstract
Artificial neural networks (ANNs) have recently also been applied to solve partial differential equations (PDEs). In this work, the classical problem of pricing European and American financial options, based on the corresponding PDE formulations, is studied. Instead of using numerical techniques based on finite element or difference methods, we address the problem using ANNs in the context of unsupervised learning. As a result, the ANN learns the option values for all possible underlying stock values at future time points, based on the minimization of a suitable loss function. For the European option, we solve the linear Black-Scholes equation, whereas for the American option, we solve the linear complementarity problem formulation. Two-asset exotic option values are also computed, since ANNs enable the accurate valuation of high-dimensional options. The resulting errors of the ANN approach are assessed by comparing to the analytic option values or to numerical reference solutions (for American options, computed by finite elements).
Keywords: (non)linear PDEs, Black-Scholes model, artificial neural network, loss function, multi-asset options
1. Introduction
The interest in machine learning techniques, due to their remarkable successes in different application areas, is growing exponentially. Impressive results have been achieved in image recognition and natural language processing problems, among others. The availability of large data sets and powerful compute units has brought the broad field of data science to a next level. ANNs are learning systems based on a collection of artificial neurons that constitute a connected network [26]. Such systems "learn" to perform tasks, generally without being programmed with task-specific rules. The neurons are organized in multiple layers: the input layer receives external data, the output layer produces the final result. The layers in between input and output are the so-called hidden
layers [25]. Many different financial problems have also been addressed with machine learning, like stock price prediction, where ANNs are trained to detect patterns in historical data sets to predict future trends [12, 20], or bond rating predictions, see [6, 17, 21].

Preprint submitted to Elsevier, May 26, 2020.

Motivated by the universal approximation theorems [3, 4], ANNs are nowadays also being used to approximate solutions to ordinary differential equations (ODEs) and partial differential equations (PDEs) [1, 5-8]. We will contribute to this field by solving, with ANNs, some PDEs that appear in computational finance applications, following the unsupervised learning methodology introduced in [19] and refined in [16]. The resulting ANN-based methods do not require a discretization of the differential equation, so mesh generation is not needed.

The financial application on which we focus is the valuation of financial derivatives with PDEs. Generally, we can distinguish between supervised and unsupervised machine learning techniques. Research so far has mainly focused on supervised machine learning, i.e. given input variables x and labeled output variables y, the ANN is employed to learn the mapping function from the input to the output. The goal is then to approximate the mapping function accurately, so that for new input data x′, the corresponding output y′ is well approximated. Such an ANN methodology usually consists of two phases. During the training phase, the ANN should learn the PDE solver with input parameters and output. This (off-line) phase usually takes substantial computing time. In the testing phase, the trained model is used to very rapidly approximate solutions for other parameter sets. In [8], the authors showed that ANNs efficiently approximate the solution to the Black-Scholes equation.
In [14], option values as well as the corresponding implied volatilities were directly computed with one neural network in a supervised learning approach. The authors in [1] examined whether an ANN could derive option pricing formulas based on market prices. ANN studies for American options are also found, like in [11], and in [22], where the option was formulated as a free boundary problem. In [13] the American option implied volatility and implied dividend were assessed with the help of ANNs.

The goal of the current work is to solve the financial PDEs by applying unsupervised machine learning techniques. In such a case, only the inputs of the network are known, and, based on a suitable loss function that needs to be minimized, the ANN should "converge" to the solution of the PDE problem. The ANN should learn solutions that satisfy the constraints imposed by the PDE and the boundary conditions, without using any information about the true solution. These constraints are typically formulated as soft constraints, satisfied by minimizing some loss function. The potential advantage of applying ANNs to PDE problems, instead of classical numerical methods, lies in the problem's dimensionality: an ANN-based methodology does not suffer much from the curse of dimensionality. The authors of [1, 6, 7] provide evidence that, for the well-known Poisson and Burgers equations, these unsupervised learning methods yield accurate results. The authors in [16] extended the class of PDE solutions that may be approximated by these unsupervised learning methods, by translating the PDEs to a suitably weighted minimization problem for the ANNs to solve. Moreover, in [4, 5] American options were formulated as optimal stopping problems, where optimal stopping decisions were learned and so-called ANN regression was used to estimate the continuation values.
This is an example of the unsupervised learning approach to solve a specific formulation of options with early-exercise features. We will price European and American options modeled by the Black-Scholes PDE and look for solutions for all future time points and stock values. So, linear and nonlinear partial differential equations need to be solved. We will solve European and American option problems based on one and two underlying assets, as the methodology is easily extended to solving multi-asset options. For the European problems, the accuracy of the network can be measured, as we have the analytic Black-Scholes solution as a reference. American options will be formulated as linear complementarity problems. Since an analytic solution is not known in this case, the reference solutions are obtained by finite element computations on fine meshes.

This paper is organized as follows. In Section 2, the methodology to train the neural network is introduced. In Section 3, the financial PDE problems are formulated, for the linear and the nonlinear case. Numerical results, ANN convergence and solution accuracy, are presented in Section 4. Finally, Section 5 concludes.
2. Artificial Neural Networks Solving PDEs
In this section, we introduce the methodology, following [16], to solve linear and nonlinear time-dependent PDEs by ANNs. With this aim, we write a general PDE problem as follows:

$$\mathcal{N}_I(v(t,x)) = 0, \quad x \in \tilde\Omega,\ t \in [0,T],$$
$$\mathcal{N}_B(v(t,x)) = 0 \quad \text{on } \partial\tilde\Omega, \qquad (1)$$
$$\mathcal{N}(v(t^*,x)) = 0, \quad x \in \tilde\Omega \ \text{and} \ t^* = 0 \ \text{or} \ t^* = T,$$

where $v(t,x)$ denotes the solution of the PDE, $\mathcal{N}_I(\cdot)$ is a linear or nonlinear time-dependent differential operator, $\mathcal{N}_B(\cdot)$ is a boundary operator, $\mathcal{N}(\cdot)$ is an initial or final time operator, $\tilde\Omega$ is a subset of $\mathbb{R}^D$ and $\partial\tilde\Omega$ denotes the boundary of the domain $\tilde\Omega$.

As mentioned in the introduction, we will compute European and American option values for one and two underlying assets by unsupervised learning. The goal is to obtain $\hat v(t,x)$ by minimizing a suitable loss function $L(v)$ over the space of $k$-times differentiable functions, where $k$ depends on the order of the derivatives in the PDE, i.e.

$$\arg\min_{v \in C^k} L(v) = \hat v,$$

where we denote by $\hat v(t,x)$ the true solution of the PDE. Results are available that establish a relation between the value of the loss function and the accuracy of the approximated solution. A general expression for the loss function, defined in terms of the $L^p$ norm and including a weighting, reads [19, 16]:

$$L(v) = \lambda \int_\Omega |\mathcal{N}_I(v(t,x))|^p \, d\Omega \qquad (2)$$
$$\qquad + (1-\lambda) \int_{\partial\Omega} \big(|\mathcal{N}_B(v(t,x))|^p + |\mathcal{N}(v(t,x))|^p\big) \, d\gamma, \qquad (3)$$

where $\Omega = \tilde\Omega \times [0,T]$, $\partial\Omega$ is the boundary of $\Omega$ and

$$\mathcal{N}_I(v(t,x)) \equiv \mathcal{N}(v(t,x)) - F(t,x) \quad \text{in } \Omega,$$
$$\mathcal{N}_B(v(t,x)) \equiv \mathcal{B}(v(t,x)) - G(t,x) \quad \text{on } \partial\tilde\Omega,$$
$$\mathcal{N}(v(t^*,x)) \equiv H(x) - v(t^*,x) \quad \text{in } \tilde\Omega \times t^*, \ \text{with } t^* = 0 \text{ or } t^* = T.$$

The integrals of the loss function are labeled as

$$L_I(v) \equiv \int_\Omega |\mathcal{N}_I(v(t,x))|^p \, d\Omega \quad \text{and} \quad L_B(v) \equiv \int_{\partial\Omega} \big(|\mathcal{N}_B(v(t,x))|^p + |\mathcal{N}(v(t,x))|^p\big) \, d\gamma,$$

which are denoted as the interior and the boundary loss functions, respectively.

Financial options with early-exercise features give rise to free boundary PDE problems. Free boundary problems are well known and appear in a variety of engineering problems. We recall some classical formulations of the free boundary problems that we encounter here:
• An optimal stopping time problem,
• A linear complementarity problem (LCP),
• A parabolic variational inequality,
• A penalty problem.
We will focus on the reformulation of the free boundary problem as an LCP, and aim to solve this formulation by ANNs and unsupervised learning.
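Before turning to the LCP, the basic weighted loss (2)-(3) can be illustrated on a toy problem. The following is a minimal sketch (our own construction, not the paper's code): the interior PDE residual and the boundary terms are estimated over random collocation points and combined with the weight λ. The toy equation u″(x) = −π² sin(πx) on (0,1), with u(0) = u(1) = 0 and exact solution u(x) = sin(πx), is our choice purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(u, d2u, lam=0.5, p=2, n_int=2000):
    """Monte Carlo estimate of the weighted loss: lam * L_I + (1 - lam) * L_B."""
    x_int = rng.uniform(0.0, 1.0, n_int)                      # interior collocation points
    residual = d2u(x_int) + np.pi**2 * np.sin(np.pi * x_int)  # interior operator N_I(u)
    interior = np.mean(np.abs(residual)**p)
    boundary = abs(u(0.0))**p + abs(u(1.0))**p                # boundary operator N_B(u)
    return lam * interior + (1.0 - lam) * boundary

exact = lambda x: np.sin(np.pi * x)
exact_d2 = lambda x: -np.pi**2 * np.sin(np.pi * x)
wrong = lambda x: x * (1.0 - x)
wrong_d2 = lambda x: -2.0 * np.ones_like(np.asarray(x, dtype=float))

print(loss(exact, exact_d2))   # essentially zero: the true solution minimizes the loss
print(loss(wrong, wrong_d2))   # clearly positive for a wrong candidate
```

In the actual method, the candidate u is the ANN output and its derivatives are obtained by automatic differentiation; here analytic derivatives are passed in to keep the sketch self-contained.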
The generic LCP formulation reads

$$\mathcal{N}_I(v(t,x)) \cdot \mathcal{N}(v(t,x)) = 0, \quad x \in \tilde\Omega,\ t \in [0,T], \qquad (4)$$

or, equivalently,

$$\max(\mathcal{N}(v(t,x)), \mathcal{N}_I(v(t,x))) = 0, \quad x \in \tilde\Omega,\ t \in [0,T],$$
$$\mathcal{N}_B(v(t,x)) = 0 \quad \text{on } \partial\tilde\Omega,$$
$$\mathcal{N}(v(t^*,x)) = 0, \quad x \in \tilde\Omega \ \text{and} \ t^* = 0 \ \text{or} \ t^* = T.$$

Our expression for the loss function, to solve the linear complementarity problem, is as follows:

$$L(v) = \lambda \int_\Omega |\max(\mathcal{N}(t,x,v), \mathcal{N}_I(t,x,v))|^p \, d\Omega + (1-\lambda) \int_{\partial\Omega} \big(|\mathcal{N}_B(t,x,v)|^p + |\mathcal{N}(t,x,v)|^p\big) \, d\gamma. \qquad (5)$$

As an alternative loss function for the LCP, a variance normalization loss function has also been considered [16], which is defined as

$$L(v) = \frac{\int_\Omega |\max(\mathcal{N}(t,x,v), \mathcal{N}_I(t,x,v))|^p \, dx}{\int_\Omega \big(\max(|\mathcal{N}(t,x,v)|, \hat{\mathcal{N}}_I(t,x,v))\big)^p \, dx} + \frac{\int_{\partial\Omega} \big(|\mathcal{N}_B(t,x,v)|^p + |\mathcal{N}(t,x,v)|^p\big) \, d\gamma}{\int_{\partial\Omega} |v(t,x) - \bar v|^p \, d\gamma}, \qquad (6)$$

where $\hat{\mathcal{N}}_I$ is defined as $\mathcal{N}_I$ but with each term taken in absolute value, and $\bar v$ is the mean of $v$ over the corresponding domain.

The parameter λ ∈ (0,
1) in the loss functions represents the relative importance of the interior and boundary functions in the minimization process. The choice of this value can be addressed in different ways, see [19, 16]. In this work, the loss weight is, in most of the tests, set equal to λ = 0.
5. It was found in [16] that this choice works very well for PDE problems with smooth, non-oscillatory solutions (as we also encounter in the option valuation problems under consideration). For some linear complementarity problems, we will compare the basic choice with the variance normalization loss function. In addition, for some other cases, we will compute the loss function using a so-called optimal loss weight (as in [16]).

Based on the loss function, the ANN has been trained with Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization. This is a quasi-Newton method which employs an approximate Hessian matrix. In particular, we use the L-BFGS algorithm to optimize the vector θ, which contains all parameters defining the neural network. The activation function used in the ANN is the hyperbolic tangent, tanh(x); other activation functions can also be used, like the sigmoid (giving very similar results in this work). We work with relatively small neural networks, formed by four hidden layers with 20 neurons each, for the European and American options. Increasing the number of layers did not improve the accuracy of the solution significantly for these particular problems. Finally, the integral terms in the loss function are approximated by Monte Carlo techniques.

3. Financial derivative pricing partial differential equations

In this section the option pricing partial differential equation problems are presented. We briefly introduce the models.
The reference option pricing PDE for the valuation of a plain vanilla European put or call option is the Black-Scholes equation. The underlying asset $S_t$ is assumed to pay a constant dividend yield $\delta$, and follows the geometric Brownian motion

$$dS_t = (\mu - \delta) S_t \, dt + \sigma S_t \, dW_t^P, \qquad (7)$$

where $W_t^P$ is a Brownian motion. The drift term $\mu$, the risk-free interest rate $r$, and the asset volatility $\sigma$ are known functions. Assuming there are no arbitrage opportunities, the European option value follows from the Black-Scholes equation,

$$\begin{cases} \mathcal{L}(v) = \partial_t v + \mathcal{A} v - r v = 0, & S \in \tilde\Omega,\ t \in [0,T), \\ v(T,S) = H(S), \end{cases} \qquad (8)$$

where the operator $\mathcal{A}$ is defined as

$$\mathcal{A} v \equiv \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 v}{\partial S^2} + (r - \delta) S \frac{\partial v}{\partial S}, \qquad (9)$$

and the function $H$ denotes the option's payoff, which is given by

$$H(S) = \begin{cases} (K - S)^+ & \text{for a put option}, \\ (S - K)^+ & \text{for a call option}, \end{cases} \qquad (10)$$

with $K$ the strike price in the option contract.

In order to apply numerical methods to solve the PDE, a bounded domain should be considered and a proper set of boundary conditions should be imposed. We assume a sufficiently large domain $[0, S_\infty]$, with $S_\infty$ four times the strike $K$. Depending on the kind of option, call $v_c$ or put $v_p$, problem (8) is subject to the conditions

$$\begin{cases} v_c(t,0) = 0, \\ v_c(t,S_{\max}) = S_{\max} - K e^{-r(T-t)}, \end{cases} \qquad \begin{cases} v_p(t,0) = K e^{-r(T-t)}, \\ v_p(t,S_{\max}) = 0. \end{cases} \qquad (11)$$

The analytic solution of (8) is known:

$$v_c(t,S) = S e^{-\delta(T-t)} N_{0,1}(d_1) - K e^{-r(T-t)} N_{0,1}(d_2),$$
$$v_p(t,S) = K e^{-r(T-t)} N_{0,1}(-d_2) - S e^{-\delta(T-t)} N_{0,1}(-d_1),$$
$$d_1 = \frac{\log(S/K) + (r - \delta + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = \frac{\log(S/K) + (r - \delta - \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}},$$

with $N_{0,1}(x)$ the distribution function of a standard $N(0,1)$ random variable.

Regarding the numerical solution with ANNs, we use the methodology introduced in the previous section. In particular, the loss function is defined as

$$L(v) = \lambda \int_\Omega |\mathcal{L}(v(t,x))|^p \, d\Omega + (1-\lambda) \int_{\partial\Omega} \big(|v(t,x) - G(t,x)|^p + |v(t,x) - H(x)|^p\big) \, d\gamma, \qquad (12)$$

where the functions $G$ and $H$ denote the values of the spatial boundary conditions and the final condition, respectively. The integral terms in the loss function are approximated by Monte Carlo techniques; as a result, we obtain the following interior and boundary loss function for the parameter vector $\theta$:

$$\hat L(\theta) = \frac{\lambda}{n_I} \sum_{i=1}^{n_I} |\mathcal{L}(v(y_i^I, \theta))|^p + (1-\lambda) \left( \frac{1}{n_B} \sum_{i=1}^{n_B} |v(y_i^B, \theta) - G(y_i^B)|^p + \frac{1}{n} \sum_{i=1}^{n} |v(y_i, \theta) - H(x_i)|^p \right). \qquad (13)$$

The collocation points $\{y_i^I\}_{i=1}^{n_I}$ and $\{y_i^B\}_{i=1}^{n_B}$ are uniformly distributed over the domain $\Omega$ and the boundary $\partial\tilde\Omega$, respectively, the points $\{y_i\}_{i=1}^{n}$ are uniformly distributed over the final-time slice $T \times \tilde\Omega$, and $y = (t,x)$.

We extend the model for one underlying asset to valuing basket options with two underlying assets. The two asset prices follow the dynamics

$$dS_{1,t} = (\mu_1 - \delta_1) S_{1,t} \, dt + \sigma_1 S_{1,t} \, dW_{1,t},$$
$$dS_{2,t} = (\mu_2 - \delta_2) S_{2,t} \, dt + \sigma_2 S_{2,t} \, dW_{2,t},$$

where $\mu_1, \mu_2$ are drift terms, $\delta_1, \delta_2$ dividend yields, and the Brownian increments dW_i, for i = 1,
2, satisfy $E(dW_i) = 0$, and the underlying assets are correlated: $\mathrm{Cov}(W_1, W_2) = \rho t$, or $E(dW_1 \, dW_2) = \rho \, dt$.

In the Black-Scholes framework, the two-asset European option price $v(t, S_1, S_2)$ satisfies the following PDE:

$$\begin{cases} \mathcal{L}(v) = \partial_t v + \mathcal{B} v - r v = 0, & (S_1, S_2) \in \tilde\Omega,\ t \in [0,T), \\ v(T, S_1, S_2) = H(S_1, S_2), \end{cases} \qquad (14)$$

where the operator $\mathcal{B}$ is defined as follows:

$$\mathcal{B} v \equiv \frac{1}{2}\sigma_1^2 S_1^2 \frac{\partial^2 v}{\partial S_1^2} + \frac{1}{2}\sigma_2^2 S_2^2 \frac{\partial^2 v}{\partial S_2^2} + \rho \sigma_1 \sigma_2 S_1 S_2 \frac{\partial^2 v}{\partial S_1 \partial S_2} + (r - \delta_1) S_1 \frac{\partial v}{\partial S_1} + (r - \delta_2) S_2 \frac{\partial v}{\partial S_2}, \qquad (15)$$

and the function $H(S_1, S_2)$ denotes the payoff function. By prescribing different payoff functions, different options can be defined, like an exchange option, a rainbow option or an average put option. We will deal with the exchange option, for which an analytic solution is given by Margrabe's formula [15], and the max-on-call rainbow option, for which a closed-form expression was introduced in [10] and [23]. These particular options are defined by their payoff functions:

$$H(S_1, S_2) = (S_1 - S_2)^+ \quad \text{exchange option},$$
$$H(S_1, S_2) = (\max(S_1, S_2) - K)^+ \quad \text{max-on-call rainbow option}.$$

According to Margrabe's formula, the fair value of a European exchange option at time $t$ is given by

$$v(t, S_1, S_2) = e^{-\delta_1(T-t)} S_1(t) N_{0,1}(d_1) - e^{-\delta_2(T-t)} S_2(t) N_{0,1}(d_2), \qquad (16)$$

where $N_{0,1}$ again denotes the cumulative distribution function of the standard normal, $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\rho}$, and

$$d_1 = \frac{\log(S_1(t)/S_2(t)) + (\delta_2 - \delta_1 + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}.$$

With the following parameters,

$$d_i = \frac{\log(S_i/K) + (r - \delta_i + \frac{1}{2}\sigma_i^2)(T-t)}{\sigma_i\sqrt{T-t}}, \quad i = 1, 2, \qquad \rho_1 = \frac{\sigma_1 - \rho\sigma_2}{\sigma}, \qquad \rho_2 = \frac{\sigma_2 - \rho\sigma_1}{\sigma},$$

and $d$ denoting the quantity $d_1$ from Margrabe's formula above, the closed-form formula for a call on the maximum is given by

$$v_c^{\max}(t, S_1, S_2) = S_1 e^{-\delta_1(T-t)} M(d_1, d; \rho_1) + S_2 e^{-\delta_2(T-t)} M(d_2, -d + \sigma\sqrt{T-t}; \rho_2) - K e^{-r(T-t)} \big(1 - M(-d_1 + \sigma_1\sqrt{T-t}, -d_2 + \sigma_2\sqrt{T-t}; \rho)\big), \qquad (17)$$

where $M$ is the cumulative bivariate normal distribution,

$$M(a, b; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{a} \int_{-\infty}^{b} e^{-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}} \, dx \, dy.$$
To obtain a numerical solution of the PDE (14), we bound the domain and impose appropriate boundary conditions. The computational domain should be sufficiently large, $[0, S_{1,\infty}] \times [0, S_{2,\infty}]$, where $S_{1,\infty} = S_{2,\infty} = 4K$ ($K$ the option strike). In the particular case of the exchange and rainbow max-on-call options, where the analytic solutions are known, we impose the analytic option value as the boundary condition on each boundary. Similar to the one-dimensional problem, we address the European exchange option problem by building the loss function as a sum of the interior and boundary loss functions, using λ = 0.5.

As introduced in Section 2, we also address the problem for an American option depending on one underlying asset price. With this aim, we focus on the linear complementarity formulation.
We will here consider the linear complementarity problem (LCP) formulation of American option valuation, see, for example, [24, 9]:

$$\begin{cases} \mathcal{L}(v) = \partial_t v + \mathcal{A} v - r v \le 0, & S \in \tilde\Omega,\ t \in [0,T), \\ v(t,S) \ge H(S), \\ \mathcal{L}(v)(v - H) = 0, \\ v(T,S) = H(S). \end{cases} \qquad (18)$$

This LCP can be rewritten as a nonlinear PDE as follows:

$$\begin{cases} \max\{H(S) - v(t,S), \mathcal{L}(v)\} = 0, & S \in \tilde\Omega,\ t \in [0,T), \\ v(T,S) = H(S). \end{cases} \qquad (19)$$

Essentially using the same methodology as for the European option PDEs, we address the linear complementarity formulation via its equivalent formulation as the nonlinear PDE (19). As introduced in Section 2, the loss function can be formulated using variance normalization. Moreover, in the case of the American option we will also compute λ as the optimal loss weight.

The loss function based on variance normalization depends on the variance of the network output. For the Black-Scholes American option problem, the loss function following variance normalization is given by

$$L(v) = \frac{\int_\Omega |\max(H(x) - v(t,x), \mathcal{L}(v(t,x)))|^p \, dx}{\int_\Omega \big(\max(|H(x) - v(t,x)|, \tilde{\mathcal{L}}(v(t,x)))\big)^p \, dx} + \frac{\int_{\partial\Omega} \big(|v(t,x) - G(t,x)|^p + |v(t,x) - H(x)|^p\big) \, d\gamma}{\int_{\partial\Omega} |v(t,x) - \bar v|^p \, d\gamma}, \qquad (20)$$

where $\tilde{\mathcal{L}}(v(t,x))$ is defined as

$$\tilde{\mathcal{L}}(\hat v) = |\partial_t \hat v| + \left|\tfrac{1}{2}\sigma^2 S^2 \partial_{SS} \hat v\right| + |(r - \delta) S \partial_S \hat v| + |r \hat v|, \qquad (21)$$

the function $G$ refers to the boundary conditions imposed on the bounded domain, defined as in (11), and the function $H$ denotes the final condition. Moreover, $\bar v$ is the mean of $v$ over the corresponding domain, given by

$$\bar v = \frac{1}{\|\partial\Omega\|} \int_{\partial\Omega} v(t,x) \, d\Omega. \qquad (22)$$

Then, approximating each integral term by Monte Carlo techniques, the resulting function is defined as follows:

$$\hat L(\theta) = \frac{\sum_{i=1}^{n_I} |\max(H(x_i^I) - v(y_i^I, \theta), \mathcal{L}(v(y_i^I, \theta)))|^p}{\sum_{i=1}^{n_I} \max(|H(x_i^I) - v(y_i^I, \theta)|, \tilde{\mathcal{L}}(v(y_i^I, \theta)))^p} + \frac{\frac{1}{n_B}\sum_{i=1}^{n_B} |v(y_i^B, \theta) - G(y_i^B)|^p + \frac{1}{n}\sum_{i=1}^{n} |v(y_i, \theta) - H(x_i)|^p}{\frac{1}{n^*}\sum_{i=1}^{n^*} \big|v(y_i^*, \theta) - \frac{1}{n^*}\sum_{j=1}^{n^*} v(y_j^*)\big|^p}, \qquad (23)$$

with $\theta$ containing all parameters of the neural network, $y = (t,x)$, and the collocation points $\{y_i^*\}_{i=1}^{n^*}$ uniformly distributed over the boundary $\partial\Omega$.

An alternative is to build the loss function based on an optimal loss weight. However, optimizing λ can be nontrivial. In order to find the optimal loss weight, we may look for a so-called ε-close solution to the true solution $\hat v$, see [16],

$$\left|\frac{\partial^n v}{\partial y_i^n} - \frac{\partial^n \hat v}{\partial y_i^n}\right| \le \epsilon \left|\frac{\partial^n \hat v}{\partial y_i^n}\right|, \quad \text{for all } n \ge 1, \ i \in \{1, \ldots, d\},$$

where $d$ is the dimension of the problem. Satisfying such a condition, the value of the optimal loss weight $\lambda^*$ should be

$$\lambda^* = \frac{\int_{\partial\Omega} |\hat v(t,x)|^p \, d\gamma}{\int_\Omega (\hat{\mathcal{N}}_I(t,x,\hat v))^p \, d\Omega + \int_{\partial\Omega} |\hat v(t,x)|^p \, d\gamma}, \qquad (24)$$

where the function $\hat{\mathcal{N}}_I(x, \hat v)$ is defined as the function $\mathcal{N}_I(x, \hat v)$ with each term in absolute value. This expression of $\lambda^*$ is constant when the analytical solution is known. However, for American options, where the analytical solution is not known, the optimal loss weight can be computed by approximating the value of $\hat v$ in (24) by the trained solution.
Note that in this case the loss weight is a function instead of a constant value, and it is optimized by the neural network. As a result, the loss function is built in the following way:

$$L(v) = \lambda^* L_I(v) + (1-\lambda^*) L_B(v) = \lambda^* \int_\Omega |\max(H(x) - v(t,x), \mathcal{L}(v(t,x)))|^p \, d\Omega + (1-\lambda^*) \int_{\partial\Omega} \big(|v(t,x) - G(t,x)|^p + |v(t,x) - H(x)|^p\big) \, d\gamma, \qquad (25)$$

and the optimal loss weight is given in terms of the trained solution $v$ as follows:

$$\lambda^* = \frac{\int_{\partial\Omega} |v(t,x)|^p \, d\gamma}{\int_\Omega (\tilde{\mathcal{L}}_{\max}(v))^p \, d\Omega + \int_{\partial\Omega} |v(t,x)|^p \, d\gamma}, \quad \text{with } \tilde{\mathcal{L}}_{\max}(v) = \max(|H(x) - v(t,x)|, \tilde{\mathcal{L}}(v)),$$

where $\tilde{\mathcal{L}}(v)$ is defined as in (21).

The one-underlying-asset American option pricing problem is extended to also price multi-asset American options. We focus on two underlying assets and formulate the problem as a linear complementarity problem. Based on two asset prices following correlated geometric Brownian motions, the American option value can be modeled by the following linear complementarity problem:

$$\begin{cases} \mathcal{L}(v) = \partial_t v + \mathcal{B} v - r v \le 0, & (S_1, S_2) \in \tilde\Omega,\ t \in [0,T), \\ v(t, S_1, S_2) \ge H(S_1, S_2), \\ \mathcal{L}(v)(v - H) = 0, \\ v(T, S_1, S_2) = H(S_1, S_2). \end{cases} \qquad (26)$$

The operator $\mathcal{B}$ is defined as in (15) and the function $H(S_1, S_2)$ denotes the payoff function. In order to compare with the European option problem, an American call on the maximum is also priced; moreover, we address a two-asset spread option and an arithmetic average put option. The payoff functions are defined as:

$$H(S_1, S_2) = (\max(S_1, S_2) - K)^+ \quad \text{max-on-call rainbow option},$$
$$H(S_1, S_2) = (S_1 - S_2 - K)^+ \quad \text{asset spread option},$$
$$H(S_1, S_2) = (K - (S_1 + S_2)/2)^+ \quad \text{arithmetic average put}.$$

In order to solve the linear complementarity formulation using numerical methods, a bounded domain should be considered and appropriate boundary conditions should be imposed.
In particular, we consider a domain large enough to avoid the solution being affected by these conditions in the regions of asset prices of interest. Whereas for the European option problem the analytical solution is known and imposed as a boundary condition, for the American option problem, where the analytical solution is not known, we must define appropriate boundary conditions ourselves. We therefore start by studying on which boundaries a condition should be imposed. Following [18], which includes the theory of Fichera [7], we introduce the notation $x_0 = \tau$, $x_1 = S_1$, $x_2 = S_2$, and the domain $\Omega^* = (0, x_0^\infty) \times (0, x_1^\infty) \times (0, x_2^\infty)$, where $x_0^\infty = T$, $x_1^\infty = S_{1,\infty}$ and $x_2^\infty = S_{2,\infty}$. The boundary of $\Omega^*$ is

$$\partial\Omega^* = \bigcup_{i=0}^{2} (\Gamma_i^{*,-} \cup \Gamma_i^{*,+}),$$

where we use the notation

$$\Gamma_i^{*,-} = \{(x_0, x_1, x_2) \in \partial\Omega^*,\ x_i = 0\}, \qquad \Gamma_i^{*,+} = \{(x_0, x_1, x_2) \in \partial\Omega^*,\ x_i = x_i^\infty\}.$$

Then, the PDE in (26) can be written in the form

$$\sum_{i,j=0}^{2} b_{ij} \frac{\partial^2 v}{\partial x_i \partial x_j} + \sum_{j=0}^{2} p_j \frac{\partial v}{\partial x_j} + c_0 v \le g_0,$$

where the involved data are defined as follows:

$$B(x_0, x_1, x_2) = (b_{ij}) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{2}\sigma_1^2 x_1^2 & \frac{1}{2}\rho\sigma_1\sigma_2 x_1 x_2 \\ 0 & \frac{1}{2}\rho\sigma_1\sigma_2 x_1 x_2 & \frac{1}{2}\sigma_2^2 x_2^2 \end{pmatrix}, \qquad c_0 = -r,$$
$$p(x_0, x_1, x_2) = (p_j) = \begin{pmatrix} -1 \\ (r - \delta_1) x_1 \\ (r - \delta_2) x_2 \end{pmatrix}, \qquad g_0(x_0, x_1, x_2) = 0.$$

Next, we introduce the following subsets of $\partial\Omega^*$ in terms of the normal vector to the boundary pointing inwards $\Omega^*$, $\vec m = (m_0, m_1, m_2)$:

$$\Sigma^0 = \Big\{ x \in \partial\Omega^* \ \Big/\ \sum_{i,j=0}^{2} b_{ij} m_i m_j = 0 \Big\}, \qquad \Sigma^1 = \partial\Omega^* - \Sigma^0,$$
$$\Sigma^2 = \Big\{ x \in \Sigma^0 \ \Big/\ \sum_{i=0}^{2} \Big( p_i - \sum_{j=0}^{2} \frac{\partial b_{ij}}{\partial x_j} \Big) m_i \le 0 \Big\}.$$

In this particular problem, we have $\Sigma^0 = \Gamma_0^{*,-} \cup \Gamma_0^{*,+} \cup \Gamma_1^{*,-} \cup \Gamma_2^{*,-}$, $\Sigma^1 = \Gamma_1^{*,+} \cup \Gamma_2^{*,+}$ and $\Sigma^2 = \Gamma_0^{*,-}$. Thus, following [18], the boundary conditions must be imposed on the subset $\Sigma^1 \cup \Sigma^2$, which coincides with the set $\Gamma_0^{*,-} \cup \Gamma_1^{*,+} \cup \Gamma_2^{*,+}$. It is therefore not necessary to impose boundary conditions on the boundaries where the asset prices $S_1$ and $S_2$ equal zero.
Moreover, for simplicity, we assume that the option value equals the payoff when the asset prices $S_1$ and $S_2$ take their maximum values.

Next, taking into account the methodologies proposed to solve the one-dimensional American problem and the two-dimensional European problem, we propose the loss function to solve the multi-asset American option by artificial neural networks. First of all, we rewrite the linear complementarity problem (26) as the equivalent nonlinear PDE:

$$\begin{cases} \max\{H(S_1, S_2) - v(t, S_1, S_2), \mathcal{L}(v)\} = 0, \\ v(T, S_1, S_2) = H(S_1, S_2). \end{cases} \qquad (27)$$

Similar to the previous problems, we build the loss function as the sum of the interior and boundary loss functions:

$$L(v) = \lambda L_I(v) + (1-\lambda) L_B(v) = \lambda \int_\Omega |\max(H(x) - v, \mathcal{L}(v))|^p \, d\Omega + (1-\lambda) \int_{\partial\Omega} \big(|v(t,x) - G(t,x)|^p + |v(t,x) - H(x)|^p\big) \, d\gamma, \qquad (28)$$

where the function $G$ refers to the boundary conditions and the function $H$ denotes the final condition imposed for the problems. Note that this loss function is a generalization of the loss function introduced for the one-asset problem, and the integral terms are again approximated by Monte Carlo techniques.
4. ANN Option Pricing Results
In this section the European and American option values are computed with the ANNs, based on the loss functions introduced. We apply the unsupervised learning methodology from the previous sections to compute the solutions and show some results. For the following tests, we have considered the parameter p = 2 in the loss functions.

First of all, we discuss the single-asset European option results, obtained by solving the PDE problem (8) with ANNs. The results are presented with the loss function introduced in (12), and here we use the basic choice λ = 0.
5. Recall that the optimal-loss-weight-based loss function and the variance normalization technique are especially useful in the case of nontrivial solutions. We start with a European put option, with the following parameter values: σ = 0.·, r = 0.·, T = 1, K = 15, S∞ = 4K, δ = 0.
0. In Figure 1, the ANN-based trained solution and the analytical solution are plotted for two time instances.

Figure 1: European put option for different time instances, t = 0, t = 0.
5, with λ = 0.5.

We measure the accuracy of the solution generated by the ANN by the relative error of the trained solution $v_{ANN}$ with respect to the analytic solution $v_{BS}$:

$$\text{error} = \frac{\|v_{BS} - v_{ANN}\|_{L^2}}{\|v_{BS}\|_{L^2}}. \qquad (29)$$

In Figure 2 the error throughout the domain is plotted. Clearly, the biggest error in the ANN solution is found close to the strike price at maturity time t = T, where the payoff is non-smooth. The relative error according to (29) with λ = 0.5 is ·×10^(−·).

Figure 2: Error surface for the ANN solution.

Next, we show some results for a European option depending on two underlying assets. The corresponding loss function has been optimized by means of the L-BFGS algorithm, choosing tanh as the activation function; in the last layer a linear activation function is considered. We have plotted in Figure 3 the ANN solution for the European exchange option. The error, comparing the approximated ANN solution with the analytical solution given by (16), is also plotted. Note that the maximum error is obtained for the minimum value of both asset prices, which is related to the term S₁/S₂ in the expression of d₁ in (16).

Figure 3: European exchange option, with parameters: σ₁ = σ₂ = 0.·, ρ = 0.·, r = 0.·, δᵢ = 0.·, S₁,∞ = S₂,∞ = 60 and loss weight λ = 0.·.

(S₁,∞, S₂,∞)    Relative error
(10, 10)        2.·×10^(−·)
(60, 60)        3.·×10^(−·)
(120, 120)      8.·×10^(−·)
(180, 180)      1.·×10^(−·)
(240, 240)      2.·×10^(−·)
(300, 300)      3.·×10^(−·)
(360, 360)      4.·×10^(−·)
Table 1: Relative error for different domains.
Due to the relatively big differences in the asset prices S₁, S₂ and the time values t, we have scaled the inputs of the artificial neural network, i.e. the original domain Ω̃ = [0, S₁,∞] × [0, S₂,∞] is scaled to a dimensionless computational domain Ω̃* = [0, 1] × [0, 1]. Considering S₁,∞ = S₂,∞ = 4K, σ₁ = σ₂ = 0.·, ρ = 0.·, r = 0.·, T = 1, for several values of K (modifying the original domain), we found that scaling the input parameters is not sufficient to obtain highly accurate results for large domain sizes. In Table 1, the error for a European max-call option is presented, based on different unscaled domain sizes. It can be observed that, as the domain increases, the accuracy of the neural network solution decreases.

In order to understand the reasons for the degraded accuracy with an increasing domain size, we have computed the gradients of the interior and boundary loss functions. In Table 2, we present these values for the European max-call. The gradient of the interior loss remains constant; note that the scaled domain is always [0, 1] × [0, 1].

Table 2: Gradient values for different domain sizes with standard weights.

The gradient of the boundary loss, on the other hand, grows with the domain size, and therefore the ANN needs to be modified. The initialization of the weights is adapted by using a variation of the Xavier initialization. In particular, the initial values of the weights in the last layer of the ANN are scaled, by multiplying them by the maximum option value. As a result, we obtain a solution which is accurate independently of the size of the domain, see Table 3.

(S₁,∞, S₂,∞)    Relative error
(10, 10)        3.·×10^(−·)
(60, 60)        3.·×10^(−·)
(120, 120)      3.·×10^(−·)
(180, 180)      4.·×10^(−·)
(240, 240)      2.·×10^(−·)
(300, 300)      3.·×10^(−·)
(360, 360)      4.·×10^(−·)
Table 3: Relative error with scaled weights.

This adaptation, i.e.
the weights having similar magnitude as the expected largest option value in the output, forms a robust weight initialization. Moreover, such an initialization helps the interior and boundary loss functions to have a similar sensitivity to the domain size. In Table 4, we can observe this behaviour, where the ratio between both gradients remains close to 1 (λ = 0.5). The parameters used are σ₁ = σ₂ = 0.·, ρ = 0.·, δᵢ = 0.·, r = 0.
05 and T = 1. Moreover, the maximum value of the asset prices is S ∞ = S ∞ = 4 K .Note that the accuracy of the trained solution is not affected by the size of thedomain, in addition, similar to the one-dimensional case, the maximum error is16 S , ∞ , S , ∞ ) (cid:107) ∂L I /∂ω (cid:107) L (cid:107) ∂L B /∂ω (cid:107) L (10 , 10) 26 .
372 74 . .
424 2692 . .
696 10769 . .
816 24232 . .
74 43079 . .
602 67311 . .
266 96928 . Table 4: Gradient value for different domains with scaled weights. strike ( S , S ) ANN Analytical(15 , − . × − . × −
15 (10 ,
20) 4 . × − . × − (25 ,
5) 2 . × − . × − (30 , − . × − . × −
30 (20 ,
40) 4 . × − . × − (50 ,
10) 4 . × − . × − (60 ,
60) 1 . × − . × −
60 (40 ,
80) 1 . × − . × − (100 ,
20) 7 . × − . × − Table 5: European max-call option value. obtained when the underlying value is close to the strike price.In Table 6, we present the error for the two-asset European options. The valuesare computed based on the expression in (29). We observe very fine accuracyfor the problems.
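The input rescaling and the adapted last-layer initialization described above can be sketched in a few lines of NumPy. This is only a minimal illustration under our own assumptions: the function names, layer sizes and the use of a Glorot/Xavier *uniform* variant are ours, not taken from the paper.

```python
import numpy as np

def xavier_layer(rng, n_in, n_out):
    # Glorot/Xavier uniform initialization for one dense layer
    bound = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-bound, bound, size=(n_in, n_out)), np.zeros(n_out)

def init_scaled_network(rng, sizes, v_max):
    # Standard Xavier initialization for all layers, after which the
    # weights of the LAST layer are multiplied by the maximum expected
    # option value, so the network output starts on the right scale.
    params = [xavier_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
    w_last, b_last = params[-1]
    params[-1] = (v_max * w_last, b_last)
    return params

def scale_inputs(s1, s2, t, s_max, t_max):
    # Map (S1, S2, t) from [0, S_max]^2 x [0, T] to the unit cube
    return np.stack([np.asarray(s1) / s_max,
                     np.asarray(s2) / s_max,
                     np.asarray(t) / t_max], axis=-1)
```

For a max-call with strike K and domain [0, 4K]^2, `v_max` could be taken as the largest payoff on the domain; the point is only that the output layer starts at the magnitude of the option values rather than at order one.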
The goal of this section is to address the American option problem by using unsupervised learning with the ANN. As for the European option, we compute the value for one and two underlying assets. However, whereas for the European option an analytical solution is known, for the American case we will use a numerical reference solution computed by the finite element method (FEM).

Table 6: Error according to the loss weight values.

With the aim of comparing both methodologies, we price an American option with the same parameter data as in the previous example, considering now the optimal loss weight, which equals λ ≈ 0.90. We first determine American options with the following parameter data: T = 1, K = 15, S_∞ = 4K, and given values of σ, δ and r. Figure 4 shows the trained solution and the error related to a reference FEM solution, for all time points. As for the European options, the maximum error is reached when the asset price is equal to the strike price, close to the maturity time, where the payoff function is not smooth. In Figure 5, a comparison of the American option value computed by the ANN or by FEM is presented for different time points. Moreover, the payoff is plotted to demonstrate that the obstacle condition is satisfied.

Figure 4: American option price with dividends (left). Error surface compared with the solution obtained by the finite element method (right). Solution obtained by the variance normalization method.

Figure 5: American option price and the payoff function: finite element method (green line) and neural network (red line) for different time points. Solution obtained by the variance normalization method.
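For intuition on the linear complementarity formulation, one common pointwise encoding rewrites the LCP conditions as a single residual via the min-function. This is a sketch of ours and not necessarily the exact loss used in this work.

```python
import numpy as np

def lcp_residual(pde_res, gap):
    # pde_res = v_t + L v  (the Black-Scholes operator applied to v)
    # gap     = v - g      (distance of the value to the payoff/obstacle)
    # The LCP  -(v_t + L v) >= 0,  v - g >= 0,  product = 0  holds at a
    # point iff  min(-(v_t + L v), v - g) = 0  there, so this residual
    # can be squared and averaged into an interior loss term.
    return np.minimum(-np.asarray(pde_res), np.asarray(gap))
```

In the continuation region the PDE residual vanishes while the gap is positive, and in the exercise region the gap vanishes while the PDE inequality is strict; in both cases the min-residual is zero, so minimizing its square drives the network toward an LCP solution.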
The accuracy of the two loss functions for the LCP, one based on the optimal loss weight and the other based on variance normalization, is compared by means of the relative error of the solution, computed in terms of the L²-norm, similar to (29), i.e.

error = ‖v_FEM − v_ANN‖_{L²} / ‖v_FEM‖_{L²}.

Very similar accuracy is obtained with both loss functions. However, comparing the convergence of both methodologies, which is presented in Figure 6, we clearly observe that when defining the loss function with a variance normalization (left), the neural network converges faster than when using the optimal loss weight (right). In this figure, the relative error is plotted for different numbers of iterations of the L-BFGS algorithm.

Figure 6: Error value (represented in log scale) obtained for different iterations with the variance normalization method (left) and using the optimal loss weight (right). The reference solution has been obtained by solving the PDE with the finite element method.
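The two ways of combining interior and boundary losses compared above can be illustrated schematically. The following is only our hedged sketch of the idea, a fixed convex weight λ versus rescaling each term by the empirical variance of its residuals; the exact normalization used in the paper may differ.

```python
import numpy as np

def weighted_loss(res_interior, res_boundary, lam=0.5):
    # fixed convex combination of the two mean-squared residuals
    return lam * np.mean(np.square(res_interior)) + \
           (1.0 - lam) * np.mean(np.square(res_boundary))

def variance_normalized_loss(res_interior, res_boundary, eps=1e-12):
    # each term is rescaled by the empirical variance of its residuals;
    # in an actual training loop the variances would be treated as
    # constants (no gradient through them), acting as adaptive weights
    # that keep both terms at a comparable magnitude
    li = np.mean(np.square(res_interior)) / (np.var(res_interior) + eps)
    lb = np.mean(np.square(res_boundary)) / (np.var(res_boundary) + eps)
    return li + lb
```

The appeal of the normalized variant is that no problem-dependent λ has to be tuned: the weighting adapts automatically as the residual distributions change during training.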
Next, we value the American options depending on two underlying assets. Optimizing the loss function with the L-BFGS algorithm, with tanh(x) as the activation function and equal weighting of the boundary and interior losses, λ = 0.5, the following results have been obtained for the three types of options. In Table 7, a comparison between the ANN and FEM solutions is shown. We focus on an American max-call option with several strike values and the parameter data ρ, σ_1 = σ_2, r, δ = 0.01 and T = 0.5. Moreover, for the FEM, 75 time steps have been considered for the time discretization and the spatial discretization is based on a 101 × 101 mesh.

Table 7: Comparison of American max-call option values.

Figure 7 shows the trained solution for the spread American option with the parameter values K = 15, S_∞^1 = S_∞^2 = 4K, given σ_1 = σ_2, ρ, r_R and r, and T = 1, together with the error surface using the FEM solution following [3] as a reference. In Figure 8, the option value and the difference with the payoff function are shown. Finally, in order to show the accuracy of the method applied to train the ANN to price American options depending on two asset prices, the relative error is presented in Table 8.

Figure 7: Two-asset spread American option value in the whole domain (left). Error surface between the FEM and the ANN solution (right).

Figure 8: Two-asset spread American option value in a reduced domain (left). Difference between the ANN solution and the payoff function (right).
Note that the accuracy of the neural network for the American options depending on two stochastic factors is lower than for the European options. However, this may be because here a numerical solution is our reference and not a closed-form expression.

Table 8: Error for different multi-asset American options.
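The relative errors reported above follow the relative L²-norm definition used throughout the experiments. On a discrete grid of solution values this reduces to a one-line computation; the function name is ours.

```python
import numpy as np

def relative_l2_error(v_ref, v_ann):
    # discrete version of ||v_ref - v_ann||_{L2} / ||v_ref||_{L2},
    # with the reference solution (e.g. FEM or analytic) as v_ref
    v_ref = np.asarray(v_ref, dtype=float)
    v_ann = np.asarray(v_ann, dtype=float)
    return np.linalg.norm(v_ref - v_ann) / np.linalg.norm(v_ref)
```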
5. Conclusions
In this work, classical problems in financial option pricing have been addressed with artificial neural networks. In particular, following the classical Black-Scholes model, European and American options depending on one and two underlying assets have been valued. A new unsupervised learning methodology is introduced to solve the option valuation problems based on the PDE formulation. With this aim, we proposed appropriate loss functions. The classical Black-Scholes American option pricing problem has been formulated as a linear complementarity problem.

For the European option problem, the accuracy of the methods was compared to the analytical solution, whereas, for American options, solutions computed by the finite element method were used as reference values. For all problems considered, the final error in the ANN solution was highly satisfactory. Needless to say, ANNs can easily be extended to solve higher-dimensional problems, as they are not drastically affected by the curse of dimensionality. Finally, the PDE problem formulation can easily be generalized by introducing counterparty risk, which gives rise to nonlinear option valuation PDEs.
References

[1] Amilon, H., 2003. A neural network versus Black–Scholes: a comparison of pricing and hedging performances. Journal of Forecasting 22, 317–335.

[2] Arregui, I., Salvador, B., Vázquez, C., 2017. PDE models and numerical methods for total value adjustment in European and American options with counterparty risk. Applied Mathematics and Computation 308, 31–53.

[3] Arregui, I., Salvador, B., Ševčovič, D., Vázquez, C., 2020. PDE models for American options with counterparty risk and two stochastic factors: mathematical analysis and numerical solution. Computers & Mathematics with Applications. https://doi.org/10.1016/j.camwa.2019.09.014.

[4] Becker, S., Cheridito, P., Jentzen, A., 2019a. Pricing and hedging American-style options with deep learning. ArXiv:1912.11060.

[5] Becker, S., Cheridito, P., Jentzen, A., Welti, T., 2019b. Solving high-dimensional optimal stopping problems using deep learning. ArXiv:1908.01602.

[6] Dutta, S., Shekar, S., 1988. Bond rating: a non-conservative application of neural networks. Proceedings of the IEEE International Conference on Neural Networks 2, 443–450.

[7] Fichera, G., 1963. On a unified theory of boundary value problems for elliptic–parabolic equations of second order. Mathematika 7, 99–122. Also in