Volatility model calibration with neural networks: a comparison between direct and indirect methods
Dirk Roeder∗ ([email protected])
Georgi Dimitroff† (georgi.dimitroff@allianzgi.com)

Abstract
In a recent paper [1] a fast two-step deep calibration algorithm for rough volatility models was proposed: in the first step the time-consuming mapping from the model parameters to the implied volatilities is learned by a neural network, and in the second step standard solver techniques are used to find the best model parameters. In our paper we compare these results with an alternative direct approach, where the mapping from market implied volatilities to model parameters is approximated by the neural network, without the need for an extra solver step. Using a whitening procedure and a projection of the target parameters to [0, 1] …

∗ The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the official positions of DZ BANK.
† The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the official positions of Allianz Global Investors.

arXiv: [q-fin.CP], Jul

Calibrating the parameters of a volatility model to the market can be very time consuming, especially if there is no analytic solution for pricing the calibration products (mostly plain-vanilla options), e.g. for the Rough Bergomi model [3]. Therefore a growing field of research is the use of neural networks as part of the calibration algorithm to speed up the calibration process.

In [1] the authors proposed a two-step algorithm based on neural networks. In the first step the neural network is trained to predict the implied volatilities from the volatility model parameters.
Once the network is trained, the pricing can be done very efficiently, as it is just a forward pass through the network. In the second step a standard solver, like Levenberg-Marquardt, is used to calibrate the model, that is, to find the volatility model parameters which minimize the reconstruction error between the target/market volatilities and the volatilities predicted by the trained model. They found that the reconstruction errors are within the Monte Carlo error of the underlying volatility model and that solving can be done fast.

In a previous technical note [4] we proposed an alternative approach where the neural network approximates the implicit mapping from the market implied volatilities to the optimal model parameters. In this case there is no need to wrap the neural network into an additional numerical solver in order to get the optimal model parameters, which is more practicable, especially in a portfolio simulation context where one needs to calibrate derivative pricing models on each time step and path of a Monte Carlo simulation. In the context of the Heston model we have shown that the direct neural network calibration produces very accurate Heston model parameters.

In the following we show that, for the five data sets used in [1], a direct calibration of the volatility model parameters to the market implied volatilities can be done with a neural network very accurately and without over-fitting. The big advantage is that, once the network is trained offline, no further online solver step is necessary to find the parameters of the volatility model.

The data sets and notebooks for [1] can be found in their GitHub repository. Our alternative Ansatz can be found in [5]. Throughout the paper our method is referred to as Volatilities-2-Model and the results from [1] as Model-2-Volatilities.
The no-arbitrage derivative pricing theory states that the price of a European-style derivative can be calculated as a discounted risk-neutral expectation of the pay-off function. The pricing measure Q used to calculate the risk-neutral expectation is unknown and needs to be estimated from the available market data. Typically Q is modeled as the weak solution of a parameterized stochastic differential equation (SDE). In the case of the Rough Bergomi model the underlying S_t under the pricing measure Q follows the SDE [3]:

    dS_t = \mu_t S_t \, dt + \sigma_t S_t \, dZ_t,
    \sigma_t = \exp(X_t),
    dX_t = \nu \, dW^H_t - \alpha (X_t - m) \, dt,

where \mu_t is an appropriate drift, like the repo rate associated with the underlying S_t, Z_t is a standard Brownian motion, and X_t is a fractional Ornstein-Uhlenbeck process, that is, X_t satisfies an Ornstein-Uhlenbeck SDE with respect to a fractional Brownian motion W^H_t.

The parameters of any market model are determined so that the observed market prices of plain-vanilla options are closely replicated by the model. To be more specific let us introduce some notation: denote the model parameters by \theta and the distribution of the solution of the model SDE by Q(\theta). The (plain-vanilla) pricing function of the model is denoted by F_{PVM}:

    F_{PVM}(\theta : K, T) = D_T \, E_{Q(\theta)}[(S_T - K)^+],

where D_T denotes the appropriate discounting factor, depending on the deal's collateralisation, and S_T is the terminal value of the underlying under the model SDE, over which the expectation is taken. Given the observed market prices p^{mkt} = {p^{mkt}_i : i = 1, ...
, n} of call options with strikes and maturities {(K_i, T_i) : i = 1, ..., n}, the calibration of the parameters to the observed market data amounts to minimizing the loss L over the parameter \theta:

    L(\theta : p^{mkt}) = \sum_{i=1}^{n} l(F_{PVM}(\theta : K_i, T_i), p^{mkt}_i),    (1)

where l is some distance function, for example the squared distance l(x, y) = (x - y)^2.

The solution \hat\theta of (1) can be viewed as a function of the market data, mapping to the domain where the model parameters live:

    \hat\theta(p^{mkt}) : p^{mkt} \mapsto \operatorname{argmin}_\theta L(\theta : p^{mkt}) =: \hat\theta(p^{mkt}).    (2)

In this setup there are at least two approaches to leverage the function-approximation capabilities of deep neural networks:

1. Use a deep neural network to approximate the pricing function F_{PVM}(\theta : K, T).

2. Use a deep neural network to approximate the calibrated model parameter function \hat\theta(p^{mkt}) from (2).

In some models, like the prominent Heston model, the pricing function is known in at least quasi-explicit form, and the model-to-market loss L can be computed very efficiently. Therefore it would not make much sense to approximate it using neural networks. In contrast, for some models no efficient pricing formula is available, and in [1] a neural network approximation of the pricing function F_{PVM}(\theta : K, T) is proposed. (The observed market data obviously contain call and put options among others, but without loss of generality, and for the sake of conciseness of the presentation, we restrict ourselves to call options.)

An important disadvantage of the first approach is that an additional numerical optimization algorithm needs to be used in order to calculate the model parameters \hat\theta. In the context of portfolio simulation one needs to perform this numerical optimization on each time-discretization step and each Monte Carlo path. Even if the neural network approximation of F_{PVM}(\theta : K, T), and hence of the loss L, is efficient, the numerical optimization within the Monte Carlo simulation becomes a significant computational bottleneck.
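The indirect, solver-wrapped calibration can be sketched as follows. The two-parameter pricing map below is a purely hypothetical stand-in for a trained pricing network, chosen so the example is self-contained; the solver is a damped Gauss-Newton iteration in the spirit of Levenberg-Marquardt.

```python
import numpy as np

# Hypothetical stand-in for a trained pricing network F_PVM(theta; K, T):
# a two-parameter map (level a, smile curvature b) to implied vols on a
# fixed strike grid. A real setup would call the neural network here.
strikes = np.array([0.8, 0.9, 1.0, 1.1, 1.2])

def f_pvm(theta):
    a, b = theta
    return a + b * (strikes - 1.0) ** 2

def jac_f_pvm(theta):
    # Jacobian d f_pvm / d theta; for a network this would come from
    # automatic differentiation or finite differences.
    return np.stack([np.ones_like(strikes), (strikes - 1.0) ** 2], axis=1)

def calibrate(vols_mkt, theta0, n_steps=50, damping=1e-8):
    """Minimize the squared reconstruction error over theta with damped
    Gauss-Newton steps (Levenberg-Marquardt style)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        resid = f_pvm(theta) - vols_mkt
        jac = jac_f_pvm(theta)
        step = np.linalg.solve(jac.T @ jac + damping * np.eye(len(theta)),
                               jac.T @ resid)
        theta -= step
    return theta

theta_true = np.array([0.2, 0.5])
vols_mkt = f_pvm(theta_true)               # synthetic "market" vols
theta_hat = calibrate(vols_mkt, theta0=[0.1, 0.1])
```

Since this toy pricer is linear in theta, a single Gauss-Newton step essentially solves the problem; for a real network it is this iteration, repeated on every step and path of a Monte Carlo simulation, that the direct approach avoids.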
On the other hand, a neural network that directly approximates the function p^{mkt} \mapsto \hat\theta(p^{mkt}) does not have to be wrapped in an additional optimization algorithm. In [4] the neural network approximation of \hat\theta(p^{mkt}) in the case of the Heston stochastic volatility model was investigated and found to be of very good quality. In this technical note we continue this line of work by applying the same direct approach in the context of the Rough Bergomi model and compare it against the two-step alternative, where one uses a neural network to approximate the pricing function F_{PVM}(\theta : K, T).

To directly compare the two methods, we use the data sets and notebooks from [1] as published on GitHub and compare the results found for the train and test data sets with the results of our approach. There are five different data sets: 1. the Rough Bergomi model with flat forward variance, 2. the Rough Bergomi model with piecewise forward variance, 3. a one-factor model with flat forward variance, 4. a one-factor model with piecewise forward variance, and 5. a Heston model. The number of samples and volatility model parameters per data set are summarized in Table 1.

    data set name                          train   test   model parameters
    RoughBergomiFlatForwardVariance        34000   6000    4
    RoughBergomiPiecewiseForwardVariance   68000  12000   11
    1FactorFlatForwardVariance             34000   6000    4
    1FactorPiecewiseForwardVariance        68000  12000   11
    Heston                                 10200   1800    5

Table 1: The number of items in the data sets for the five volatility models.
Before constructing the neural network, it is important to take a closer look at the input features and their inter-correlations. In figure 1 the correlation matrix of the input features (the volatility surface) is shown for the Rough Bergomi model with piecewise forward variance. It is not surprising that the implied volatilities on the surface are highly correlated, and this correlation depends on the relative positions of the data points on the surface grid: obviously neighbouring volatilities exhibit stronger correlations than far-off data points. For example, in the upper left corner the volatility for the shortest maturity of 0.1 years and strike 0.6 is strongly correlated with the one at strike 0.8 and more weakly correlated with the one at strike 1.4.

Figure 1: The correlation matrix of the train data for RoughBergomiPiecewiseForwardVariance (with T the maturity and K the strike).

The neural network is supposed to learn the correlations between the volatilities at different grid points of the volatility surface. One could support this process by adding the maturity and strike of each volatility instance to the input data, as is done e.g. in [4]. However, in the data sets used here the strike-maturity grid is fixed, so we refrained from doing so.

It is also common to standardize the data in order to numerically aid the training. We centered the data and, instead of just scaling to get unit sample variance, we used ZCA-Mahalanobis whitening in order to also de-correlate the input matrices. In this process the centered input data are linearly transformed, that is, multiplied by a de-correlation matrix W, so that the sample correlation matrix of the training data is the identity. For more on the ZCA-Mahalanobis whitening procedure please refer to [6]. We decided to use precisely this whitening approach because of its very natural property that the de-correlation is achieved by a minimal additional adjustment, that is, the input data remain as close as possible in the L²-sense to the original input data (after centering, of course). The correlation matrix after whitening is shown in figure 2.

Figure 2: The correlation matrix of the train data for BergomiPiecewiseForwardVariance after whitening (with T the maturity and K the strike).

The results after whitening are very similar for the first four data sets. However, for the Heston data the correlation matrix is more problematic, and the whitening does not seem to work very well here, as can be seen in figure 3. Probably this correlation structure is the reason that the prediction results, shown later, are not as good for the Heston data as for the other four models.

Figure 3: The correlation matrix of the train data for the Heston model before (left) and after (right) whitening (with T the maturity and K the strike).

Note that the matrix of the ZCA whitening is constructed from the singular value decomposition of the sample covariance matrix of the training data; it is then kept fixed and applied as such to the input data at inference time.

In the following chapter we explain how to construct the neural networks which are able to learn the implicit mapping from the volatility surface directly to the parameters of the model. We will highlight the main points; all details can be found in the Jupyter notebooks made public in the associated Git repositories [5].
After pre-processing, in particular whitening, the input features are fed into a simple feed-forward network with fully connected layers, as shown in figure 4. We found three hidden layers with a decreasing number of neurons to be sufficient to obtain excellent results. For example, for the Rough Bergomi model with piecewise forward variance we use three hidden layers with 68, 49, and 30 neurons, which amounts to 11274 calibration parameters of the neural network.

Figure 4: A schematic plot of the neural network architecture.
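As a sanity check, the quoted count of 11274 is consistent with an input layer of 88 implied volatilities and an output layer of 11 model parameters; both sizes are our inference (e.g. an 8 × 11 maturity-strike grid), not stated explicitly above.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# input 88 vols -> hidden 68 -> 49 -> 30 -> output 11 parameters
n_params = dense_param_count([88, 68, 49, 30, 11])
print(n_params)  # -> 11274
```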
In figure 5 three popular activation functions are shown. In computer vision the ReLU function is very widely used because of its simplicity. A slight modification of it is the ELU function (used in [1]), which has the advantage that negative values from the previous layer are not discarded. In our work we use another modification, the SELU, a self-normalizing activation introduced in 2017 in [7]. The big advantage is that SELU layers tend to preserve the sample mean and variance of their respective inputs, which leads to improved training performance by avoiding vanishing-gradient issues.

Figure 5: Three popular activation functions.
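For reference, the three activations can be written in a few lines of numpy; the SELU constants below are the values from [7].

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # negative inputs are mapped to alpha*(exp(x) - 1) instead of 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# scale (lambda) and alpha from Klambauer et al. [7]
_SELU_SCALE = 1.0507009873554805
_SELU_ALPHA = 1.6732632423543772

def selu(x):
    # scaled ELU; the constants are chosen so that mean and variance are
    # (approximately) preserved from layer to layer
    return _SELU_SCALE * np.where(x > 0, x, _SELU_ALPHA * (np.exp(x) - 1.0))
```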
The output of the neural network should lie in the parameter range expected by the volatility model. An easy way to enforce this is to scale the target values (the parameters of the volatility model) to the unit interval [0, 1] and, correspondingly, to use a sigmoid output activation function. Obviously, to obtain the volatility model parameters one then needs to map the predicted [0, 1] values back to the original parameter domain. For the scaling we use p = p_{[0,1]} · (ub − lb) + lb, with ub the upper bound of the parameter p, lb its lower bound, and p_{[0,1]} the transformed parameter. For numerical simplicity the hard sigmoid, which is not smooth but numerically simpler and faster, can be used, cf. figure 6.

Figure 6: The sigmoid functions.

For the implementation and training of the neural network we use standard methods of TensorFlow/Keras [2]. The Adam optimizer is used as solver, and the training is performed on mini-batches. Early stopping was implemented in order to prevent over-fitting of the network. As loss function we use the mean squared error between the target and predicted volatility model parameters, cf. figure 7. Again, all technical details can be found in the Jupyter notebooks in our GitHub repository [5].

Figure 7: The loss as a function of epochs for the training of the RoughBergomiPiecewiseForwardVariance data set. The blue line is the validation set and the orange one the train set.
In figures 8, 9, 10, 11, and 12 the results for the five data sets (cf. Table 1) and the two methods, Volatilities-2-Model (left part of the pictures) and Model-2-Volatilities (right part), are shown. The rows represent the different parameters of the volatility model, e.g. in figure 8 from top to bottom ξ, ν, β, and ρ. In the first column of the left (right) picture the target vs. the predicted parameters are plotted for all training data with the Volatilities-2-Model (Model-2-Volatilities) Ansatz. The second column shows the reconstruction loss from the first column as a density (the integral is normalized to one). The third and fourth columns show the same for the test data.

As one can see, the predictions are indeed very close to the target parameters, and the quality of the direct implied-vol-to-parameter approach, dubbed the Volatilities-2-Model Ansatz, is superior. On the test data we see the same performance as on the train subset, meaning that the networks are able to generalize to unseen data; they do not experience over-fitting issues.

It is remarkable that the results for the Heston model in figure 12 tend to be slightly worse than for the Rough Bergomi model. As mentioned above, this may be an effect of the correlation structure of the volatilities used here as training data. Another issue with the Heston model can be model parameter identifiability: there are Heston parameterizations which differ substantially in the values of the Heston parameters but correspond to extremely similar implied volatility surfaces.

Figure 8: 1FactorFlatForwardVariance.

In [1] different solvers are compared, but the results are very similar; here the results for Levenberg-Marquardt are shown. The reconstruction loss is the difference between the target and the predicted parameters.
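The per-parameter density plots described above can be reproduced with a normalized histogram of the reconstruction errors; a minimal sketch on synthetic numbers (the data and names are ours, for illustration only):

```python
import numpy as np

def error_density(target, predicted, bins=50):
    """Reconstruction errors and their histogram, normalized so that the
    integral over the bins equals one (a density, as in the figures)."""
    err = np.asarray(predicted) - np.asarray(target)
    hist, edges = np.histogram(err, bins=bins, density=True)
    return err, hist, edges

rng = np.random.default_rng(1)
target = rng.uniform(0.0, 1.0, size=10000)          # stand-in parameters
predicted = target + rng.normal(0.0, 0.01, 10000)   # small reconstruction error
err, hist, edges = error_density(target, predicted)
```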
There are two main approaches for aiding a volatility model calibration with deep neural networks:

1. Use the neural network to directly learn the implicit mapping from the market implied volatilities to the volatility model parameters.

2. Use the network to learn the pricing function of the model, that is, the function mapping the model parameters, strike and maturity to the corresponding implied volatility; then use this approximation within a numerical optimization routine (a solver) in order to calibrate the model parameters given the market volatilities.

We have implemented the first approach and compared its performance with the second approach followed in [1]. The predictions using the first, direct, approach are superior. Our experiments also show that the direct market-vol-to-model-parameter neural network generalizes well to unseen data. Additionally, from the computational perspective the first approach is also to be preferred, since it does not require an external solver loop.

We found that whitening the highly correlated vol-surface inputs leads to faster and more stable training. Scaling the target parameters to the unit interval [0, 1] and using sigmoid-like output activations forced the predicted parameters to lie within the target boundaries of the model and hence improved the interpretability and usability of the results.

Note that the parameter sets used here are generated with the models themselves, that is, every volatility surface fits perfectly to a valid model parameter set by construction. What would happen if the target volatility model cannot fit a volatility surface observed in the real market? In such cases it would be beneficial to bias the network towards fitting the more liquid sections of the volatility surface, as the neural network has no information on which ranges of the surface are more important. In practical situations ATM volatilities are more important than far out-of-the-money ones. All this is routinely done in the daily calibration process in financial institutions, for example by putting more weight on the ATM calibration. One way to train a neural network to reflect these requirements is to use the standard calibration process to generate the training data. In [4] we used such data for a Heston model with very good results.

Certainly one should not trust the results without back-testing: if one would like to replace a calibration procedure with a fast neural network approach, then the calibration error should be monitored on a regular basis.
References

[1] Blanka Horvath, Aitor Muguruza, and Mehdi Tomas. Deep learning volatility, 2019.

[2] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[3] Christian Bayer, Peter Friz, and Jim Gatheral. Pricing under rough volatility. Quantitative Finance, 16(6):887–904, June 2016.

[4] Georgi Dimitroff, Dirk Röder, and Christian P. Fries. Volatility model calibration with convolutional neural networks, 2018.

[5] Dirk Röder and Georgi Dimitroff. Volatility Model Calibration With Neural Networks, 2020. https://github.com/roederd/volatility_model_calibration_with_nn.

[6] Agnan Kessy, Alex Lewin, and Korbinian Strimmer. Optimal whitening and decorrelation. The American Statistician, 72(4):309–314, Jan 2018.

[7] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural networks. In Advances in Neural Information Processing Systems, 2017.