Variational Autoencoders: A Hands-Off Approach to Volatility
Maxime Bergeron, Nicholas Fung, John Hull, Zissis Poulos ∗ February 9, 2021
Maxime Bergeron is the Director of Research and Development at Riskfuel Analytics in Toronto, ON. [email protected]

Nicholas Fung is a Masters student in the Edward S. Rogers Sr. Department of Electrical & Computer Engineering at the University of Toronto, and a Research Associate at Riskfuel Analytics in Toronto, ON. [email protected]

John Hull is a professor at the Joseph L. Rotman School of Management, University of Toronto. [email protected]

Zissis Poulos is a postdoctoral fellow at the Joseph L. Rotman School of Management, University of Toronto. [email protected]

Corresponding author:
Nicholas Fung [email protected] ∗ We would like to thank Ryan Ferguson, Vlad Lucic, Ivan Sergienko, Andreas Veneris and Gary Wong for theirinterest in this work as well as their many helpful comments. We would also like to thank Mitacs for providingfinancial support for this research. a r X i v : . [ q -f i n . C P ] F e b ariational Autoencoders: A Hands-Off Approach to Volatility AbstractA volatility surface is an important tool for pricing and hedging derivatives. The surfaceshows the volatility that is implied by the market price of an option on an asset as a function ofthe option’s strike price and maturity. Often, market data is incomplete and it is necessary toestimate missing points on partially observed surfaces. In this paper, we show how variationalautoencoders can be used for this task. The first step is to derive latent variables thatcan be used to construct synthetic volatility surfaces that are indistinguishable from thoseobserved historically. The second step is to determine the synthetic surface generated by ourlatent variables that fits available data as closely as possible. As a dividend of our first step,the synthetic surfaces produced can also be used in stress testing, in market simulators fordeveloping quantitative investment strategies, and for the valuation of exotic options. Weillustrate our procedure and demonstrate its power using foreign exchange market data.THREE KEY TAKEAWAYS:1. We show how synthetic yet realistic volatility surfaces for an asset can be generatedusing variational autoencoders trained on multiple assets at once.2. We illustrate how variational autoencoders can be used to construct a complete volatilitysurface when only a small number of points are available without making assumptionsabout the process driving the underlying asset or the shape of the surface.3. 
We empirically demonstrate our approach using foreign exchange data.

Keywords: Derivatives; Unsupervised learning; Variational autoencoders
JEL: G10, G20

The famous Black and Scholes (1973) formula does not provide a perfect model for pricing options, but it has been very influential in the way traders manage portfolios of options and communicate prices. The formula has the attractive property that it involves only one unobservable variable: volatility. As a result, there is a one-to-one correspondence between the volatility substituted into Black-Scholes and the option price. The volatility that is consistent with the price of an option is known as its implied volatility. Traders frequently communicate prices in the form of implied volatilities. This is convenient because implied volatilities tend to be less variable than the prices themselves.

A volatility surface shows the implied volatility of an option as a function of its strike price and time to maturity. If the Black-Scholes formula provided a perfect description of prices in the market, the volatility surface for an asset would be flat (i.e., implied volatilities would be the same for all strike prices and maturities) and never change. However, in practice, volatility surfaces exhibit a variety of different shapes and vary through time.

Traders monitor implied volatilities carefully and use them to provide quotes and value their portfolios. Option prices, and therefore implied volatilities, are of course determined by supply and demand. When transactions for many different strike prices and maturities are available on a particular day, there is very little uncertainty about the volatility surface. However, in situations where only a few points on the surface can be reliably obtained, it is necessary to develop a way of estimating the rest of the surface. We refer to this problem as "completing the volatility surface".

Black-Scholes assumes that the asset price follows geometric Brownian motion.
This leads to a lognormal distribution for the future asset price. Many other more sophisticated models have been suggested in the literature in an attempt to fit market prices more accurately. Some, such as Heston (1993), assume that the volatility is stochastic. Others, such as Merton (1976), assume that a diffusion process for the underlying asset is overlaid with jumps. Bates (1996) incorporates both stochastic volatility and jumps. Madan, Carr, and Chang (1998) propose a "variance-gamma" model where there are only jumps. Recently, rough volatility models have been proposed by authors such as Gatheral, Jaisson, and Rosenbaum (2014). In these, volatility follows a non-Markovian process. One approach to completing the volatility surface is to assume one of these models and fit its parameters to the known points as closely as possible.

Parametric models are another way to complete volatility surfaces. The popular stochastic volatility inspired (SVI) representation (Gatheral 2004), as well as its time-dependent extension (Gatheral and Jacquier 2013), characterizes the geometry of surfaces directly through each of its parameters. Compared to stochastic volatility models, parametric representations are easier to calibrate and provide better fits to empirical data.

We propose an alternative deep learning approach using variational autoencoders (VAEs). The advantage of the approach is that it makes no assumptions about the process driving the underlying asset or the shape of the surface. The VAE is trained on historical data from multiple assets to provide a way in which realistic volatility surfaces can be generated from a small number of parameters. A volatility surface can then be completed by choosing values for the parameters that fit the known points as closely as possible. VAEs also make it possible to generate synthetic-yet-realistic surfaces, which can be used for other tasks such as stress testing and in market simulators for developing quantitative investment strategies.
We illustrate our approach using data from foreign exchange markets.

Deep learning techniques are becoming widely used in the field of mathematical finance. Ferguson and Green (2018) pioneered the use of neural networks for pricing exotic options. Several researchers, such as Hernandez (2016), Horvath, Muguruza, and Tomas (2019), and Bayer et al. (2019), have used deep learning to calibrate models to market data. One advantage of these approaches is that, once computational time has been invested upfront in developing the model, results can be produced quickly. Our application of VAEs shares this advantage, but also aims to empirically learn a parameterization of volatility surfaces.

Two works use deep learning to model volatility surfaces directly. Ackerer, Tagasovska, and Vatter (2020) propose an approach where volatility is assumed to be a product of an existing model and a neural network. Chataigner, Crépey, and Dixon (2020) use neural networks to model local volatility using soft and hard constraints inspired by existing models. A potential disadvantage of both approaches is that they train the neural network on each surface individually, which can be costly and impractical for real-time inference. In contrast, much of the robustness of our approach stems from the fact that we train our networks using data from multiple different assets at once.

We conclude this introduction with a brief outline of the paper. Section 1 introduces variational autoencoders. Section 2 describes how variational autoencoders can be applied to volatility surfaces. Section 3 presents experimental results. Finally, conclusions are presented in Section 4.
1. Variational Autoencoders
The architecture of a vanilla neural network is illustrated in Exhibit 1. There is a series of hidden layers between the inputs (which form the input layer) and the outputs (which form the output layer). The value at each neuron of a layer (except the input layer) is F(c + wvᵀ), where F is a nonlinear activation function, c is a constant, w is a vector of weights and v is the vector of values at the neurons of the immediately preceding layer. Popular activation functions are the rectified linear unit (F(x) = max(x, 0)) and the sigmoid function (F(x) = 1/(1 + e⁻ˣ)). The network's parameters, c and w, are in general different for each neuron. A training set consisting of inputs and outputs is provided to the network and parameter values are chosen so that the network determines outputs from inputs as accurately as possible. Further details are provided by Goodfellow et al (2017).

An autoencoder is a special type of neural network where the output layer is the same as the input layer. The objective is to determine a small number of latent variables that are capable of reproducing the inputs as accurately as possible. The architecture is illustrated
Exhibit 1:
A neural network with L hidden layers, with M inputs and N outputs. The i-th hidden layer contains m(i) neurons, and h(i)_k is the value at the k-th neuron of hidden layer i.

Exhibit 2:
An autoencoder can be split into encoder and decoder networks. Note that the dimensionality of the latent encoding is typically smaller than the original input dimension.

in Exhibit 2. The encoding function E consists of a number of layers that produce a vector of latent variables, z, from the vector of inputs, x. The decoder function, D, attempts to reproduce the inputs from z. In the simple example in Exhibit 2, there are five input variables. These are reduced to two variables by the encoder and the decoder attempts to reconstruct the original five variables from the latent variables. The parameters of the neural network are chosen to minimize the difference between D(z) and x. Specifically, we choose the network's parameters to minimize the reconstruction error (RE):

RE = (1/M) Σ_{i=1}^{M} (x_i − y_i)²    (1)

where M is the dimensionality of the input and output, x_i is the i-th input value and y_i is the i-th output value obtained by the decoder. Principal components analysis (PCA) is an alternative approach to dimensionality reduction, and can be regarded as a degenerate autoencoder without any hidden layers or nonlinear activation functions.

A useful extension of autoencoders is the variational autoencoder (VAE), which was introduced by Kingma and Welling (2014). As its name suggests, the VAE is closely linked to variational inference methods in statistics, which aim to approximate intractable probability distributions. Rather than producing latent variables in a deterministic manner, the latent variable is sampled from a distribution that is parameterized by the encoder. By sampling from the distribution, synthetic data similar to the input data can be generated. A useful prior distribution for the latent variables is a multivariate normal distribution, N(0, I), where the variables are uncorrelated with mean zero and standard deviation one. This is what we will use in what follows.
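As a minimal illustration of the reconstruction error in equation (1), the following sketch compares a made-up five-point input vector with an imperfect reconstruction (the numbers are invented for illustration only):

```python
import numpy as np

def reconstruction_error(x, y):
    """Mean squared reconstruction error of equation (1):
    RE = (1/M) * sum_i (x_i - y_i)^2, where M is the input dimension."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)

# Toy example: a 5-point "surface" and an imperfect reconstruction.
x = np.array([0.10, 0.12, 0.11, 0.13, 0.15])   # input implied vols
y = np.array([0.11, 0.12, 0.10, 0.13, 0.16])   # decoder output
print(reconstruction_error(x, y))  # ≈ 6e-05 (three errors of 0.01 each)
```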
Contrary to deterministic autoencoders, there are now two parts to the objective function which is to be minimized. The first part is the loss function in equation (1). The second part is the Kullback-Leibler (KL) divergence between the parameterized distribution and N(0, I). That is:

KL = (1/2) Σ_{k=1}^{d} (−1 − log σ_k² + σ_k² + µ_k²)    (2)

where µ_k and σ_k are the mean and standard deviation of the k-th latent variable. The objective function is:

RE + β KL    (3)

where β is a hyperparameter that tunes the strength of the regularization provided by KL. Note that in the limiting case where β goes to 0, the VAE behaves like a deterministic autoencoder. The reason for introducing the KL divergence term in the loss function is to encourage the model to encode a distribution that is as close to normal as possible. This helps ensure stability during training and tractability during inference.
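The objective in equations (2) and (3) can be sketched directly, assuming the closed-form KL term for a diagonal Gaussian encoder (the input vectors below are made-up numbers, not model output):

```python
import numpy as np

def kl_divergence(mu, sigma):
    """KL divergence of equation (2) between N(mu, diag(sigma^2)) and N(0, I):
    KL = 0.5 * sum_k (-1 - log sigma_k^2 + sigma_k^2 + mu_k^2)."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    return 0.5 * np.sum(-1.0 - np.log(sigma**2) + sigma**2 + mu**2)

def vae_objective(x, y, mu, sigma, beta):
    """Objective of equation (3): reconstruction error plus beta-weighted KL."""
    re = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return re + beta * kl_divergence(mu, sigma)

# When the encoded distribution matches the prior N(0, I), the KL term vanishes.
print(kl_divergence([0.0, 0.0], [1.0, 1.0]))  # 0.0
```

Setting beta to 0 recovers the deterministic autoencoder loss, mirroring the limiting case discussed above.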
2. Application for Volatility Surfaces

2.1 Implied Volatility Surfaces
We now show how VAEs can be applied to volatility surfaces. As mentioned earlier, a volatility surface is a function of the strike price and time to maturity, where the implied volatilities are obtained by inverting Black-Scholes on observed prices.

For a European call option with strike K ≥ 0 and time to maturity T > 0, let S denote the current price of the underlying asset, and let r denote the (constant) risk-free rate. Let C_mkt(K, T) denote the market price of this option, and let C_BS be the price of this option as predicted by the Black-Scholes formula (Black and Scholes 1973). The implied volatility σ(K, T) ≥ 0 is defined implicitly by

C_mkt(K, T) = C_BS(S, K, T, r, σ(K, T)).    (4)
1. Avellaneda et al (2020) provides a recent application of PCA to volatility surface changes.

The moneyness of an option is a measure of the extent to which the option is likely to be exercised. A moneyness measure providing equivalent information to the strike price usually replaces the strike price in the definition of the volatility surface. One common moneyness measure is the ratio of strike price to asset price. Another is the delta of the option. The delta is the partial derivative of the option price with respect to the asset price. Intuitively, the delta approximates the probability that an option expires in-the-money. For a call option on an asset this varies from zero for a deep out-of-the-money option (high strike price) to one for a deep in-the-money option (low strike price). As per convention, we present results on foreign exchange rates using delta as the measure of moneyness.

Many different shapes are observed for the surface, and both the level of volatilities and the shape of the surface can change through time. However, implied volatility surfaces do not come in completely arbitrary shapes. Indeed, there are several restrictions on their geometry arising from the absence of (static) arbitrage, that is, of a trading strategy providing instantaneous risk-free profit. Lucic (2019) provides a good discussion of approaches that can be used to understand such constraints.

Inspired by Bayer et al. (2019), we considered two methods for modelling volatility surfaces: the grid-based approach and the pointwise approach. Exhibit 3 provides an illustration of the differences between these approaches. In both approaches, the input to the encoder is a volatility surface, sampled at N prespecified grid points, which is then flattened into a vector, as shown in Exhibit 3a.
Exhibit 3b illustrates the grid-based approach, which follows the same architecture as traditional VAEs, where the decoder uses a d-dimensional latent variable to reconstruct the original grid points. Finally, the pointwise approach, as shown in Exhibit 3c, is an alternative architecture where the option parameters (moneyness and maturity) are defined explicitly. Concretely, the input for the pointwise decoder is a single option's parameters and the latent variable for the entire surface, and the output is a single point on the volatility surface. We can then use batch inference to output all volatility surface points.

While Bayer et al. opt to use the grid-based approach for their application, we choose the pointwise approach for greater expressivity. The pointwise approach has the advantage that interpolation is performed entirely by neural networks, and therefore the derivatives with respect to option parameters (the "Greeks") can be calculated precisely and efficiently using backpropagation. This is not true for the grid-based approach, where derivatives need to be approximated.

Throughout our investigation, we found that VAEs interpolated volatility surfaces quite well even in environments with limited data. However, as usual, if more data is available it should be used, since it will improve results. We also experimented with VAEs that were penalized for constructing surfaces that exhibited arbitrage. Nevertheless, we found that this
2. The partial derivative is calculated using the Black-Scholes model with volatility set equal to the implied volatility.
3. For convenience, we also include the conditions that we use to check for static arbitrage in Appendix A.

(a)
The encoder architecture.

(b) The decoder architecture for the grid-based training approach.

(c) The decoder architecture for the pointwise training approach.
Exhibit 3:
An illustration of the grid-based and pointwise architectures.

did not significantly improve results, as the majority of surfaces produced by our VAEs did not exhibit arbitrage. For further details, we refer the reader to Appendix A.
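Such an arbitrage check can be run numerically on a grid of total implied variance. The sketch below is our own illustration (not code from the paper): it applies finite differences to the calendar and butterfly conditions reproduced in Appendix A, on a hypothetical flat 20% volatility surface.

```python
import numpy as np

def arbitrage_flags(w, X, t):
    """Check the static no-arbitrage conditions of Appendix A on a total implied
    variance grid w[i, j] = w(X[i], t[j]) using finite differences.
    Calendar: dw/dt >= 0.  Butterfly: g(X, t) >= 0 with
    g = (1 - X*w'/(2w))^2 - (w'^2/4)*(1/w + 1/4) + w''/2."""
    w = np.asarray(w, float)
    dw_dt = np.gradient(w, t, axis=1)        # dw/dt
    w1 = np.gradient(w, X, axis=0)           # w'  = dw/dX
    w2 = np.gradient(w1, X, axis=0)          # w'' = d2w/dX2
    Xc = np.asarray(X, float)[:, None]
    g = (1 - Xc * w1 / (2 * w))**2 - (w1**2 / 4) * (1 / w + 0.25) + w2 / 2
    return bool((dw_dt >= 0).all()), bool((g >= 0).all())

# A flat 20% volatility surface (w = sigma^2 * t) is trivially arbitrage-free.
X = np.linspace(-0.5, 0.5, 21)
t = np.linspace(0.1, 2.0, 20)
w = 0.04 * np.tile(t, (len(X), 1))
print(arbitrage_flags(w, X, t))  # (True, True)
```

A penalized training loss would add terms proportional to the violations, as in equation (9) of Appendix A.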
Once the VAE has been trained, the network's parameters can be fixed and used for inference tasks. During the calibration procedure, the goal is to identify the latent variables such that the outputs of the decoder match the market data as closely as possible. We propose two methods for calibration. One method is to use the encoder to infer the latent variables. The alternative is to use the decoder in conjunction with an external optimizer (such as the Levenberg-Marquardt algorithm) to minimize the reconstruction loss. While the former is more computationally efficient, requiring only a single pass through the network, the latter is more suitable when option data is sparse.

After the parameters have been calibrated, the VAE can be used to infer unobserved option prices. Although we focus on the use of VAEs for completing volatility surfaces, there are several other notable applications. In lieu of PCA, VAEs can be used for efficient dimensionality reduction to analyze the dynamics of volatility surfaces. Additionally, the model can be used to generate synthetic-yet-realistic volatility surfaces, which can be used in stress tests or as inputs to other analyses such as the valuation of exotic options.
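To make the decoder-plus-optimizer calibration concrete, here is a minimal sketch in which a hypothetical fixed linear map stands in for the trained decoder (in the paper the decoder is a neural network with frozen weights and the optimizer is gradient-based such as L-BFGS or Levenberg-Marquardt; the matrix, indices and latent values below are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for a trained decoder: a fixed linear map from a
# 2-dimensional latent variable to 6 surface points.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [0.5, 0.2],
              [0.3, 0.9]])
decoder = lambda z: W @ z

# Suppose only 3 of the 6 surface points are observed on a given day.
z_true = np.array([0.7, -1.2])
idx = np.array([0, 2, 4])
observed = decoder(z_true)[idx]

# Calibrate z by gradient descent on the reconstruction error (equation (1))
# restricted to the observed points.
z = np.zeros(2)
for _ in range(200):
    resid = decoder(z)[idx] - observed
    z -= 0.5 * (2.0 / len(idx)) * W[idx].T @ resid

# The full surface is then read off the decoder at the calibrated z.
print(np.allclose(z, z_true))  # True
```

The unobserved points are recovered by evaluating the decoder at the calibrated latent variables, which is the "completion" step described above.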
3. Experimental Results

3.1 Methodology
To test our methodology, we use over-the-counter option data from 2012–2020 for the AUD/USD, USD/CAD, EUR/USD, GBP/USD, USD/JPY and USD/MXN currency pairs, provided by Exchange Data International. The prespecified grid we chose consists of 40 points formed from eight times to maturity (one week, one month, two months, three months, six months, nine months, one year and three years) and five different deltas (0.1, 0.25, 0.5, 0.75 and 0.9). As prices are quoted for at-the-money (ATM), butterfly, and risk-reversal options, we use the equations provided in Reiswich and Wystup (2012) to obtain the implied volatilities for the call options (for further details refer to Clark (2010)).

The dataset is partitioned into a training set, which is used to train the VAE, and a validation set, which is used to evaluate performance. The partitions are split chronologically to prevent leakage of information. We use 15% of the available data as the validation set, which contains data from March 2020 to December 2020.

We find that the choice of network architecture makes only a marginal difference to the results, and so we choose to use two hidden layers in the encoder and decoder, with 32 units in each layer. We leave the latent dimension (i.e., the dimension of the encoder output) as a variable in our experiments. To train our model, we minimize the objective function in equation (3) using the Adam optimizer from Kingma and Ba (2017). With various combinations of hyperparameters, including learning rate and batch size, we use a random grid search to identify suitable hyperparameter choices that optimally balance the reconstruction loss and KL divergence to ensure continuity in the latent space.
To evaluate the model's ability to complete volatility surfaces, we randomly sample a subset of all options observed on a given day, and assume that these provide the only known points on the volatility surface. We then use these points to calibrate our model using a gradient-based optimizer, minimizing the reconstruction error in equation (1). All 40 option prices are then predicted using the inferred latent variables. We vary the number of sample points and the number of latent dimensions in the trained VAEs to see how our model performs under various conditions.

Initially, we trained VAEs on volatility surfaces from single currency pairs. However, we found that training models using data from multiple currencies yielded more robust models. Exhibit 4(a) shows the mean absolute error when the models are trained using only the AUD/USD data, while Exhibit 4(b) shows the mean absolute errors when VAEs are trained using volatility surfaces from all six currency pairs. It can be seen that in all but two cases better results are achieved by training the model on all six currency pairs. This suggests that there is similarity in the drivers of volatility surfaces across different currency pairs.

To compare our results to traditional volatility models, we perform the same task using the Heston model. The mean absolute error for each currency pair in the validation set is shown in Exhibit 5, where we compare Heston to our best performing model from Exhibit 4. In addition to consistently outperforming Heston in reconstructing volatility surfaces, there are some additional practical benefits to using VAEs. A primary advantage of using the VAE is that it predicts prices significantly faster, which makes calibration much more efficient.
Another advantage is that regularization during training encourages the latent space to be continuous: small perturbations in latent space result in small perturbations in the volatility surface. This is not true for a model such as Heston, as the inverse map from market prices to model parameters can be multivalued. Finally, we highlight the flexibility of our approach. When extreme market conditions are encountered, a VAE can easily be retrained. In our experience, this can be done in only a few minutes using just over 10,000 surfaces.

To investigate where our model performs best, we calculate the mean absolute error for individual grid points. We found that the parts of the volatility surface that correspond to options that are close to expiry have the greatest error. This is not surprising, as these options, particularly when they are close to the money, exhibit the most volatile prices. We note that our models were trained using an equal weighting of all options; however, practitioners can easily alter the weights to suit their requirements.
4. We use the L-BFGS algorithm.

(a)
Models trained using only AUD/USD volatility surfaces.
Assumed Number of Known Points on Volatility Surface

Latent Dimensions      5     10     15     20     25     30     35     40
2                  107.6   82.5   71.4   64.2   63.9   63.5   63.5   63.3
3                   75.9   53.8   49.8   48.2   47.0   46.6   46.5   46.3
4                   61.1   41.5   37.7   35.9   34.7   34.2   34.1   33.6

(b)
Models trained using all available currency pairs.
Exhibit 4:
The mean absolute error across the AUD/USD validation set for inferring volatility surfaces when given partial observations. Each row contains a trained model with a different number of latent dimensions. Units are in basis points.

Currency Pair    Heston    VAE
AUD/USD            56.6   33.6
USD/CAD            35.3   32.5
EUR/USD            32.2   30.9
GBP/USD            47.6   34.0
USD/JPY            58.5   38.2
USD/MXN            92.2   56.7
Exhibit 5:
Mean absolute error when calibrating using Heston and a four-dimensional VAE. All 40 points of the volatility surface are observed for calibration. Units are in basis points.

3.2 Generating Synthetic Surfaces
As mentioned, VAEs can be useful for generating synthetic volatility surfaces (e.g., for stress testing a portfolio) as well as for completing partially observed surfaces. A straightforward approach is to sample a latent variable from the prior distribution (in this case, a normal distribution), and use the decoder to construct a volatility surface. To show that this would yield a variety of volatility surfaces, we interpolate between points in latent space and construct the corresponding surfaces. This is illustrated in the case of a two-dimensional VAE in Exhibit 6. While, in general, interpreting the latent dimensions of a VAE is a non-trivial task, we can observe how the direction of skew and the term structure of volatility vary across both dimensions. A rich variety of volatility surface patterns is obtained.
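The sampling-and-interpolation step can be sketched as follows (the latent points are made up for illustration; decoding each z into a surface would use the trained decoder, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Sample latent variables from the N(0, I) prior; each z decodes to a surface.
z_samples = rng.standard_normal(size=(10, 2))

def interpolate(z_a, z_b, n=5):
    """Linear interpolation between two latent points, the operation used to
    produce a grid of synthetic surfaces like the one in Exhibit 6."""
    alphas = np.linspace(0.0, 1.0, n)
    return np.array([(1 - a) * z_a + a * z_b for a in alphas])

path = interpolate(np.array([-2.0, -2.0]), np.array([2.0, 2.0]), n=5)
print(path[2])  # midpoint of the path: [0. 0.]
```

Because the KL regularization keeps the latent space continuous, decoding each point along such a path yields a smoothly varying family of surfaces.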
Exhibit 6:
Examples of synthetic volatility surfaces generated when interpolating across two latent dimensions.

We may wish to observe the behavior of volatility surfaces in specific scenarios. Exhibit 7 shows the encoded latent variables for the volatility surfaces in the validation set. While the majority of the points are clustered near the origin, we notice many outliers, which correspond to volatility surfaces observed at the beginning of the COVID-19 pandemic in March 2020. By sampling latent variables in these outlying regions, we can simulate extreme scenarios that may occur.
Exhibit 7:
Latent (colour coded) encodings of implied volatility surfaces.

4. Conclusion
Our results demonstrate that VAEs provide a useful approach to analyzing volatility surfaces empirically. We have described how a VAE can be trained, and used the context of foreign exchange markets as a realistic testing ground. Our results show that volatility surfaces can be captured using VAEs with as few as two latent dimensions, and that the resulting models can be used for practical and exploratory purposes. For the sake of concreteness, we limited the scope of this paper to VAEs with Gaussian priors. In future work, it should be determined whether a VAE model without Gaussian priors is able to learn an even wider range of market behaviours while retaining the stability of our model.

References
Ackerer, Damien, Natasa Tagasovska, and Thibault Vatter. 2020. "Deep Smoothing of the Implied Volatility Surface." arXiv:1906.05065 [cs, q-fin, stat]. http://arxiv.org/abs/1906.05065.

Bates, David S. 1996. "Jumps and Stochastic Volatility: Exchange Rate Processes Implicit in Deutsche Mark Options." The Review of Financial Studies 9 (1): 69–107. https://doi.org/10.1093/rfs/9.1.69.

Bayer, Christian, Blanka Horvath, Aitor Muguruza, Benjamin Stemper, and Mehdi Tomas. 2019. "On deep calibration of (rough) stochastic volatility models." arXiv:1908.08806 [q-fin]. http://arxiv.org/abs/1908.08806.

Black, Fischer, and Myron Scholes. 1973. "The Pricing of Options and Corporate Liabilities." The Journal of Political Economy 81 (3): 637–654.

Chataigner, Marc, Stéphane Crépey, and Matthew Dixon. 2020. "Deep Local Volatility." arXiv:2007.10462 [q-fin]. http://arxiv.org/abs/2007.10462.

Clark, Iain J. 2010. Foreign Exchange Option Pricing: A Practitioner's Guide. Chichester: Wiley.

Ferguson, Ryan, and Andrew Green. 2018. "Deeply Learning Derivatives." arXiv:1809.02233 [cs, q-fin]. http://arxiv.org/abs/1809.02233.

Gatheral, Jim. 2004. "A parsimonious arbitrage-free implied volatility parameterization with application to the valuation of volatility derivatives." http://faculty.baruch.cuny.edu/jgatheral/madrid2004.pdf.

Gatheral, Jim, and Antoine Jacquier. 2013. "Arbitrage-free SVI volatility surfaces." arXiv:1204.0646 [q-fin]. http://arxiv.org/abs/1204.0646.

Gatheral, Jim, Thibault Jaisson, and Mathieu Rosenbaum. 2014. "Volatility is rough." arXiv:1410.3394 [q-fin]. http://arxiv.org/abs/1410.3394.

Hernandez, Andres. 2016. Model Calibration with Neural Networks. SSRN Scholarly Paper ID 2812140. Rochester, NY: Social Science Research Network. https://doi.org/10.2139/ssrn.2812140.

Heston, Steven. 1993. "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options." The Review of Financial Studies 6 (2): 327–343. https://doi.org/10.1093/rfs/6.2.327.

Horvath, Blanka, Aitor Muguruza, and Mehdi Tomas. 2019. "Deep Learning Volatility." arXiv:1901.09647 [q-fin]. http://arxiv.org/abs/1901.09647.

Kingma, Diederik P., and Jimmy Ba. 2017. "Adam: A Method for Stochastic Optimization." arXiv:1412.6980 [cs]. http://arxiv.org/abs/1412.6980.

Kingma, Diederik P., and Max Welling. 2014. "Auto-Encoding Variational Bayes." arXiv:1312.6114 [cs, stat]. http://arxiv.org/abs/1312.6114.

Lucic, Vladimir. 2019. Volatility Notes. SSRN Scholarly Paper ID 3211920. Rochester, NY: Social Science Research Network. https://doi.org/10.2139/ssrn.3211920.

Madan, Dilip B., Peter P. Carr, and Eric C. Chang. 1998. "The Variance Gamma Process and Option Pricing." Review of Finance 2 (1): 79–105. https://doi.org/10.1023/A:1009703431535.

Merton, Robert C. 1976. "Option pricing when underlying stock returns are discontinuous." Journal of Financial Economics 3 (1–2): 125–144.

Reiswich, Dimitri, and Uwe Wystup. 2012. "FX Smile Construction." Wilmott. https://doi.org/10.1002/wilm.10132.

Appendix A. Arbitrage Conditions
Following Gatheral and Jacquier (2013), we can specify static arbitrage conditions as follows. Let F_t denote the forward price at time t and X = log(K/F_t). Define w(X, t) = t · σ²(X, t) as the total implied variance surface. An implied volatility surface is free of calendar arbitrage if

∂w/∂t ≥ 0.    (5)

Let w′ = ∂w/∂X and w″ = ∂²w/∂X². Suppressing the arguments of w(X, t), the volatility surface is free of butterfly arbitrage if

g(X, t) := (1 − Xw′/(2w))² − (w′²/4)(1/w + 1/4) + w″/2 ≥ 0.    (6)

A volatility surface is said to be free of static arbitrage if the conditions in equations (5) and (6) are met.

If we let

L_cal = ‖max(0, −∂w/∂t)‖,    (7)

and

L_but = ‖max(0, −g)‖,    (8)

the loss function in equation (3) can then be extended as follows:

RE + β KL + λ_cal L_cal + λ_but L_but.    (9)

As the parameters λ_cal and λ_but