Volatility model calibration with neural networks: a comparison between direct and indirect methods
Dirk Roeder∗ ([email protected])
Georgi Dimitroff† (georgi.dimitroff@allianzgi.com)

Abstract
In a recent paper [1] a fast two-step deep calibration algorithm for rough volatility models was proposed: in the first step the time-consuming mapping from the model parameters to the implied volatilities is learned by a neural network, and in the second step standard solver techniques are used to find the best model parameters. In our paper we compare these results with an alternative direct approach, where the mapping from market implied volatilities to model parameters is approximated by the neural network, without the need for an extra solver step. Using a whitening procedure and a projection of the target parameters to [0, 1] …

∗ The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the official positions of DZ BANK.
† The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the official positions of Allianz Global Investors.

arXiv: [q-fin.CP], Jul

Calibrating the parameters of a volatility model to the market can be very time consuming, especially if there is no analytic solution for pricing the calibration products (mostly plain-vanilla options), e.g. for the Rough Bergomi model [3]. Therefore a growing field of research is the use of neural networks as part of the calibration algorithm to speed up the calibration process.

In [1] the authors proposed a two-step algorithm based on neural networks. In the first step the neural network is trained to predict the implied volatilities from the volatility model parameters.
Once the network is trained, the pricing can be done very efficiently, as it is just a forward pass through the network. In the second step a standard solver, like Levenberg-Marquardt, is used to calibrate the model, that is, to find the volatility model parameters which minimize the reconstruction error between the target/market volatilities and the volatilities predicted by the trained model. They found that the reconstruction errors are within the Monte Carlo error of the underlying volatility model and that solving can be done fast.

In a previous technical note [4] we proposed an alternative approach where the neural network approximates the implicit mapping from the market implied volatilities to the optimal model parameters. In this case there is no need to wrap the neural network into an additional numerical solver in order to get the optimal model parameters, which is more practicable, especially in a portfolio simulation context where one needs to calibrate derivative pricing models on each time step and path of a Monte Carlo simulation. In the context of the Heston model we have shown that the direct neural network calibration produces very accurate Heston model parameters.

In the following we show that, for the five data sets used in [1], a direct calibration of the volatility model parameters to the market implied volatilities can be done with a neural network very accurately and without over-fitting. The big advantage is that, once the network is trained offline, no further online solver step is necessary to find the parameters of the volatility model.

The data sets and notebooks for [1] can be found in their GitHub repository. Our alternative Ansatz can be found in [5]. Throughout the paper our method is referred to as Volatilities-2-Model and the results from [1] as Model-2-Volatilities.
The no-arbitrage derivative pricing theory states that the price of a European-style derivative can be calculated as a discounted risk-neutral expectation of the pay-off function. The pricing measure Q used to calculate the risk-neutral expectation is unknown and needs to be estimated from the available market data. Typically Q is modeled as the weak solution of a parameterized stochastic differential equation (SDE). In the case of the Rough Bergomi model the underlying S_t under the pricing measure Q follows the SDE [3]:

    dS_t = \mu_t S_t \, dt + \sigma_t S_t \, dZ_t,
    \sigma_t = \exp(X_t),
    dX_t = \nu \, dW^H_t - \alpha (X_t - m) \, dt,

where \mu_t is an appropriate drift, like the repo rate associated with the underlying S_t, Z_t is a standard Brownian motion, and X_t is a fractional Ornstein-Uhlenbeck process, that is, X_t satisfies an Ornstein-Uhlenbeck SDE with respect to a fractional Brownian motion W^H_t.

The parameters of any market model are determined so that the observed market prices of plain-vanilla options are closely replicated by the model. To be more specific let us introduce some notation: denote the model parameters by \theta and the distribution of the solution of the model SDE by Q(\theta). The (plain-vanilla) pricing function of the model is denoted by F_{PVM}:

    F_{PVM}(\theta : K, T) = D_T \, E_{Q(\theta)}[(S_T - K)^+],

where D_T denotes the appropriate discounting factor, depending on the deal's collateralisation, and S_T is the terminal value of the underlying under the model SDE, over which the expectation is taken. Given the observed market prices p^{mkt} = {p^{mkt}_i : i = 1, ...
, n} of call options with strikes and maturities {(K_i, T_i) : i = 1, ..., n}, the calibration of the parameters to the observed market data amounts to minimizing the loss L over the parameter \theta:

    L(\theta : p^{mkt}) = \sum_{i=1}^{n} l(F_{PVM}(\theta : K_i, T_i), p^{mkt}_i),    (1)

where l is some distance function, for example the squared distance l(x, y) = (x - y)^2.

The solution \hat\theta of (1) can be viewed as a function of the market data, mapping to the domain where the model parameters live:

    \hat\theta(p^{mkt}) : p^{mkt} \mapsto \operatorname{argmin}_\theta L(\theta : p^{mkt}) =: \hat\theta(p^{mkt}).    (2)

In this setup there are at least two approaches to leverage the function-approximation capabilities of deep neural networks:

1. Use a deep neural network to approximate the pricing function F_{PVM}(\theta : K, T).

2. Use a deep neural network to approximate the calibrated model parameter function \hat\theta(p^{mkt}) from (2).

In some models, like the prominent Heston model, the pricing function is known in at least quasi-explicit form, and the model-to-market loss L can be computed very efficiently. Therefore it would not make much sense to approximate it using neural networks. In contrast, for some models no efficient pricing formula is available, and in [1] a neural network approximation of the pricing function F_{PVM}(\theta : K, T) is proposed. (The observed market data obviously contain call and put options among others, but without loss of generality, and for the sake of conciseness of the presentation, we restrict ourselves to call options.)

An important disadvantage of the first approach is that an additional numerical optimization algorithm needs to be used in order to calculate the model parameters \hat\theta. In the context of portfolio simulation one needs to perform this numerical optimization on each time-discretization step and each Monte Carlo path. Even if the neural network approximation of F_{PVM}(\theta : K, T), and hence of the loss L, is efficient, the numerical optimization within the Monte Carlo simulation becomes a significant computational bottleneck.
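The indirect, solver-wrapped calibration can be sketched as follows. The two-parameter pricing map below is a purely hypothetical stand-in for a trained pricing network, chosen so the example is self-contained; the solver is a damped Gauss-Newton iteration in the spirit of Levenberg-Marquardt.

```python
import numpy as np

# Hypothetical stand-in for a trained pricing network F_PVM(theta; K, T):
# a two-parameter map (level a, smile curvature b) to implied vols on a
# fixed strike grid. A real setup would call the neural network here.
strikes = np.array([0.8, 0.9, 1.0, 1.1, 1.2])

def f_pvm(theta):
    a, b = theta
    return a + b * (strikes - 1.0) ** 2

def jac_f_pvm(theta):
    # Jacobian d f_pvm / d theta; for a network this would come from
    # automatic differentiation or finite differences.
    return np.stack([np.ones_like(strikes), (strikes - 1.0) ** 2], axis=1)

def calibrate(vols_mkt, theta0, n_steps=50, damping=1e-8):
    """Minimize the squared reconstruction error over theta with damped
    Gauss-Newton steps (Levenberg-Marquardt style)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        resid = f_pvm(theta) - vols_mkt
        jac = jac_f_pvm(theta)
        step = np.linalg.solve(jac.T @ jac + damping * np.eye(len(theta)),
                               jac.T @ resid)
        theta -= step
    return theta

theta_true = np.array([0.2, 0.5])
vols_mkt = f_pvm(theta_true)               # synthetic "market" vols
theta_hat = calibrate(vols_mkt, theta0=[0.1, 0.1])
```

Since this toy pricer is linear in theta, a single Gauss-Newton step essentially solves the problem; for a real network it is this iteration, repeated on every step and path of a Monte Carlo simulation, that the direct approach avoids.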
On the other hand, a neural network that directly approximates the function p^{mkt} \mapsto \hat\theta(p^{mkt}) does not have to be wrapped in an additional optimization algorithm. In [4] the neural network approximation of \hat\theta(p^{mkt}) in the case of the Heston stochastic volatility model was investigated and found to be of very good quality. In this technical note we continue this line of work by applying the same direct approach in the context of the Rough Bergomi model and compare it against the two-step alternative, where one uses a neural network to approximate the pricing function F_{PVM}(\theta : K, T).

To directly compare the two methods, we use the data sets and notebooks from [1] as published on GitHub and compare the results found for the train and test data sets with the results of our approach. There are five different data sets: 1. the Rough Bergomi model with flat forward variance, 2. the Rough Bergomi model with piecewise forward variance, 3. a one-factor model with flat forward variance, 4. a one-factor model with piecewise forward variance, and 5. a Heston model. The number of samples and volatility model parameters per data set are summarized in Table 1.

    data set name                          train   test   model parameters
    RoughBergomiFlatForwardVariance        34000   6000    4
    RoughBergomiPiecewiseForwardVariance   68000  12000   11
    1FactorFlatForwardVariance             34000   6000    4
    1FactorPiecewiseForwardVariance        68000  12000   11
    Heston                                 10200   1800    5

Table 1: The number of items in the data sets for the five volatility models.
Before constructing the neural network, it is important to take a closer look at the input features and their inter-correlations. In figure 1 the correlation matrix of the input features (the volatility surface) is shown for the Rough Bergomi model with piecewise forward variance. It is not surprising that the implied volatilities on the surface are highly correlated, and this correlation depends on the relative positions of the data points on the surface grid: obviously neighbouring volatilities exhibit stronger correlations than far-off data points. For example, in the upper left corner the volatility for the shortest maturity of 0.1 years and strike 0.6 is strongly correlated with the one at strike 0.8 and more weakly correlated with the one at strike 1.4.

Figure 1: The correlation matrix of the train data for RoughBergomiPiecewiseForwardVariance (with T the maturity and K the strike).

The neural network is supposed to learn the correlations between the volatilities at different grid points of the volatility surface. One could support this process by adding the maturity and strike of each volatility instance to the input data, as is done e.g. in [4]. However, in the data sets used here the strike-maturity grid is fixed, so we refrained from doing so.

It is also common to standardize the data in order to numerically aid the training. We centered the data and, instead of just scaling to get unit sample variance, we used ZCA-Mahalanobis whitening in order to also de-correlate the input matrices. In this process the centered input data are linearly transformed, that is, multiplied by a de-correlation matrix W, so that the sample correlation matrix of the training data is the identity. For more on the ZCA-Mahalanobis whitening procedure please refer to [6]. We decided to use precisely this whitening approach because of its very natural property that the de-correlation is achieved by a minimal additional adjustment, that is, the input data remain as close as possible in the L²-sense to the original input data (after centering, of course). The correlation matrix after whitening is shown in figure 2.

Figure 2: The correlation matrix of the train data for BergomiPiecewiseForwardVariance after whitening (with T the maturity and K the strike).

The results after whitening are very similar for the first four data sets. However, for the Heston data the correlation matrix is more problematic, and the whitening does not seem to work very well here, as can be seen in figure 3. Probably this correlation structure is the reason that the prediction results, shown later, are not as good for the Heston data as for the other four models.

Figure 3: The correlation matrix of the train data for the Heston model before (left) and after (right) whitening (with T the maturity and K the strike).

Note that the matrix of the ZCA whitening is constructed from the singular value decomposition of the sample covariance matrix of the training data; it is then kept fixed and applied as such to the input data at inference time.

In the following chapter we explain how to construct the neural networks which are able to learn the implicit mapping from the volatility surface directly to the parameters of the model. We will highlight the main points; all details can be found in the Jupyter notebooks made public in the associated Git repositories [5].
After pre-processing, in particular whitening, the input features are fed into a simple feed-forward network with fully connected layers, as shown in figure 4. We found three hidden layers with a decreasing number of neurons to be sufficient to obtain excellent results. For example, for the Rough Bergomi model with piecewise forward variance we use three hidden layers with 68, 49, and 30 neurons, which amounts to 11274 calibration parameters of the neural network.

Figure 4: A schematic plot of the neural network architecture.
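As a sanity check, the quoted count of 11274 is consistent with an input layer of 88 implied volatilities and an output layer of 11 model parameters; both sizes are our inference (e.g. an 8 × 11 maturity-strike grid), not stated explicitly above.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# input 88 vols -> hidden 68 -> 49 -> 30 -> output 11 parameters
n_params = dense_param_count([88, 68, 49, 30, 11])
print(n_params)  # -> 11274
```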
In figure 5 three popular activation functions are shown. In computer vision the ReLU function is very widely used because of its simplicity. A slight modification of it is the ELU function (used in [1]), which has the advantage that negative values from the previous layer are not discarded. In our work we use another modification, the SELU, a self-normalizing activation introduced in 2017 in [7]. The big advantage is that SELU layers tend to preserve the sample mean and variance of their respective inputs, which leads to improved training performance by avoiding vanishing-gradient issues.

Figure 5: Three popular activation functions.
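For reference, the three activations can be written in a few lines of numpy; the SELU constants below are the values from [7].

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # negative inputs are mapped to alpha*(exp(x) - 1) instead of 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# scale (lambda) and alpha from Klambauer et al. [7]
_SELU_SCALE = 1.0507009873554805
_SELU_ALPHA = 1.6732632423543772

def selu(x):
    # scaled ELU; the constants are chosen so that mean and variance are
    # (approximately) preserved from layer to layer
    return _SELU_SCALE * np.where(x > 0, x, _SELU_ALPHA * (np.exp(x) - 1.0))
```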
The output of the neural network should lie in the parameter range expected by the volatility model. An easy way to enforce this is to scale the target values (the parameters of the volatility model) to the unit interval [0, 1] and, correspondingly, to use a sigmoid output activation function. Obviously, to obtain the volatility model parameters one then needs to map the predicted [0, 1] values back to the original parameter domain. For the scaling we use p = p_{[0,1]} · (ub − lb) + lb, with ub the upper bound of the parameter p, lb its lower bound, and p_{[0,1]} the transformed parameter. For numerical simplicity the hard sigmoid, which is not smooth but numerically simpler and faster, can be used, cf. figure 6.

Figure 6: The sigmoid functions.

For the implementation and training of the neural network we use standard methods of TensorFlow/Keras [2]. The Adam optimizer is used as solver, and the training is performed on mini-batches. Early stopping was implemented in order to prevent over-fitting of the network. As loss function we use the mean squared error between the target and predicted volatility model parameters, cf. figure 7. Again, all technical details can be found in the Jupyter notebooks in our GitHub repository [5].

Figure 7: The loss as a function of epochs for the training of the RoughBergomiPiecewiseForwardVariance data set. The blue line is the validation set and the orange one the train set.
In figures 8, 9, 10, 11, and 12 the results for the five data sets (cf. Table 1) and the two methods, Volatilities-2-Model (left part of the pictures) and Model-2-Volatilities (right part), are shown. The rows represent the different parameters of the volatility model, e.g. in figure 8 from top to bottom ξ, ν, β, and ρ. In the first column of the left (right) picture the target vs. the predicted parameters are plotted for all training data with the Volatilities-2-Model (Model-2-Volatilities) Ansatz. The second column shows the reconstruction loss from the first column as a density (the integral is normalized to one). The third and fourth columns show the same for the test data.

As one can see, the predictions are indeed very close to the target parameters, and the quality of the direct implied-vol-to-parameter approach, dubbed the Volatilities-2-Model Ansatz, is superior. On the test data we see the same performance as on the train subset, meaning that the networks are able to generalize to unseen data; they do not experience over-fitting issues.

It is remarkable that the results for the Heston model in figure 12 tend to be slightly worse than for the Rough Bergomi model. As mentioned above, this may be an effect of the correlation structure of the volatilities used here as training data. Another issue with the Heston model can be model parameter identifiability: there are Heston parameterizations which differ substantially in the values of the Heston parameters but correspond to extremely similar implied volatility surfaces.

Figure 8: 1FactorFlatForwardVariance.

In [1] different solvers are compared, but the results are very similar; here the results for Levenberg-Marquardt are shown. The reconstruction loss is the difference between the target and the predicted parameters.
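The per-parameter density plots described above can be reproduced with a normalized histogram of the reconstruction errors; a minimal sketch on synthetic numbers (the data and names are ours, for illustration only):

```python
import numpy as np

def error_density(target, predicted, bins=50):
    """Reconstruction errors and their histogram, normalized so that the
    integral over the bins equals one (a density, as in the figures)."""
    err = np.asarray(predicted) - np.asarray(target)
    hist, edges = np.histogram(err, bins=bins, density=True)
    return err, hist, edges

rng = np.random.default_rng(1)
target = rng.uniform(0.0, 1.0, size=10000)          # stand-in parameters
predicted = target + rng.normal(0.0, 0.01, 10000)   # small reconstruction error
err, hist, edges = error_density(target, predicted)
```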
There are two main approaches for aiding a volatility model calibration with deep neural networks:

1. Use the neural network to directly learn the implicit mapping from the market implied volatilities to the volatility model parameters.

2. Use the network to learn the pricing function of the model, that is, the function mapping the model parameters, strike and maturity to the corresponding implied volatility; then use this approximation within a numerical optimization routine (a solver) in order to calibrate the model parameters given the market volatilities.

We have implemented the first approach and compared its performance with the second approach followed in [1]. The predictions using the first, direct, approach are superior. Our experiments also show that the direct market-vol-to-model-parameter neural network generalizes well to unseen data. Additionally, from the computational perspective the first approach is also to be preferred, since it does not require an external solver loop.

We found that whitening the highly correlated vol-surface inputs leads to faster and more stable training. Scaling the target parameters to the unit interval [0, 1] and using sigmoid-like output activations forced the predicted parameters to lie within the target boundaries of the model and hence improved the interpretability and usability of the results.

Note that the parameter sets used here are generated with the models themselves, that is, every volatility surface fits perfectly to a valid model parameter set by construction. What would happen if the target volatility model cannot fit a volatility surface observed in the real market? In such cases it would be beneficial to bias the network towards fitting the more liquid sections of the volatility surface, as the neural network has no information on which ranges of the surface are more important. In practical situations ATM volatilities are more important than far out-of-the-money ones. All this is routinely done in the daily calibration process in financial institutions, for example by putting more weight on the ATM calibration. One way to train a neural network to reflect these requirements is to use the standard calibration process to generate the training data. In [4] we used such data for a Heston model with very good results.

Certainly one should not trust the results without back-testing: if one would like to replace a calibration procedure with a fast neural network approach, then the calibration error should be monitored on a regular basis.
References

[1] Blanka Horvath, Aitor Muguruza, and Mehdi Tomas. Deep learning volatility, 2019.

[2] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[3] Christian Bayer, Peter Friz, and Jim Gatheral. Pricing under rough volatility. Quantitative Finance, 16(6):887–904, June 2016.

[4] Georgi Dimitroff, Dirk Röder, and Christian P. Fries. Volatility model calibration with convolutional neural networks, 2018.

[5] Dirk Röder and Georgi Dimitroff. Volatility Model Calibration With Neural Networks, 2020. https://github.com/roederd/volatility_model_calibration_with_nn.

[6] Agnan Kessy, Alex Lewin, and Korbinian Strimmer. Optimal whitening and decorrelation. The American Statistician, 72(4):309–314, Jan 2018.

[7] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural networks. In Advances in Neural Information Processing Systems, 2017.