A Probabilistic Approach to Nonparametric Local Volatility
Martin Tegnér* and Stephen Roberts

Department of Engineering Science
Oxford-Man Institute
University of Oxford, UK

Abstract
The local volatility model is widely used for pricing and hedging financial derivatives. While its main appeal is its capability of reproducing any given surface of observed option prices—it provides a perfect fit—the essential component is a latent function which can be uniquely determined only in the limit of infinite data. To (re)construct this function, numerous calibration methods have been suggested involving steps of interpolation and extrapolation, most often of parametric form and with point-estimate representations. We look at the calibration problem in a probabilistic framework with a nonparametric approach based on a Gaussian process prior. This immediately gives a way of encoding prior beliefs about the local volatility function and a hypothesis model which is highly flexible yet not prone to overfitting. Besides providing a method for calibrating a (range of) point-estimate(s), we draw posterior inference from the distribution over local volatility. This leads to a better understanding of uncertainty associated with the calibration in particular, and with the model in general. Further, we infer dynamical properties of local volatility by augmenting the hypothesis space with a time dimension. Ideally, this provides predictive distributions not only locally, but also for entire surfaces forward in time. We apply our approach to S&P 500 market data.
Keywords: Option pricing, local volatility, probabilistic inference, Gaussian process models.
In the context of option pricing, the local volatility model introduced by Derman and Kani (1994) and Dupire (1994) is well celebrated, and a versatile generalisation of the foundational Black-Scholes model (Black and Scholes (1973) and Merton (1973)). From the latter, the constant diffusion coefficient of the underlying stock price—the volatility—is replaced by a function of current time and stock price—the local volatility function. As volatility effectively determines how a diffusion model prices options on the underlying stock, going from a single- to an infinite-dimensional parameter is a tactical move: while Black-Scholes fits only one observed option price perfectly, the local volatility function provides flexibility to fit a whole surface of prices. Dupire's formula even provides an explicit recipe for how this can be done. However, the latent local volatility function can only be uniquely retrieved if one has access to a continuum of option prices at all strikes and maturities. Here begins the challenge of local volatility modelling.

*Correspondence to: [email protected]. We thank Chris Oates, Mike Osborne and Christoph Reisinger for helpful comments and discussions.

Figure 1: Left: Market prices of S&P 500 call options: best bid-ask quotes as of closing 4 January 2010. Right: Calibrated local volatility surface from optimising squared price errors (7) over all local volatility points. The surface achieves the same error as the maximum a posteriori estimate shown in Figure 3.

With a discrete and finite set of market prices that can be utilised for calibration (Figure 1, left, illustrates one set of data we use in this paper), local volatility modelling becomes tantamount to proposing a finite-dimensional approximation to the local volatility function along with a routine that details how to calibrate the suggested model to market prices. Two general approaches appear to form common practice: either one works with observable option prices (or equivalently, implied volatilities) with an interpolation/extrapolation scheme such that Dupire's formula can be employed for a direct calculation of local volatility (Kahalé (2004), Benko et al. (2007), Fengler (2009), Fengler and Hin (2011), Glaser and Heider (2012), to mention just a few).
Alternatively, a parametrised functional form is assumed for the non-observable local volatility function and calibrated to market prices by minimisation of a suitable objective function based on model-to-market discrepancy (see for example Jackson et al. (1999), Luo and Liu (2010), Andreasen and Huge (2011), Lipton and Sepp (2011), Reghai et al. (2012)).

One challenge with the first approach is that it requires the call price surface to be interpolated in an arbitrage-free manner. Indeed, constructing an interpolation method that produces option prices consistent with observed data is essentially equivalent to constructing (and estimating) an option price model; and so we are back to square one. Furthermore, the resulting local volatility model is generally non-robust and non-smooth, both of which are problems that make further employment (hedging, pricing exotics, quantifying risks, etc.) under this approach questionable (Cont (2010), Hirsa et al. (2003)).

Proposing a functional form for local volatility and formulating the calibration problem as one of targeting a model-to-market distance is perhaps a more appealing approach, even if it also has the challenge of designing an appropriate model. As for the number of parameters of the functional form, choosing too few might lead to under-parametrised models unable to fit market data. At the other extreme, choosing as many parameters as observed option prices makes the ill-posedness of the problem severe: with finite market data, a solution to the minimal model-to-market distance under some objective is non-unique, and generally non-smooth and non-stable. For a vivid illustration of this point, Figure 1, right, shows a local volatility surface obtained from optimisation over all points of the surface that correspond to market data. On top of this is the fact that the optimisation problem is non-trivial.
Common objectives, such as weighted least squares, yield non-convex maps with respect to local volatility and are also unsuitable for gradient-based optimisers due to dimensionality. To this end, regularisation has been proposed both to address the ill-posedness of the problem and to impose structure on the local volatility parameter (Lagnado et al. (1997), Chiarella et al. (2000), Crépey (2003), Cezaro et al. (2008)).

This paper adopts the latter approach and attempts to address its shortcomings. Firstly, the non-uniqueness of an optimal local volatility is tackled by acknowledging that parameter uncertainty pertains to the model, whereby the focus of calibration is shifted from an optimal point-estimate to a probability distribution over local volatility functions. This is reasonable from a statistical perspective, by realising that common optimisation objectives correspond to likelihoods of (generalised) Gaussian noise models, and further by the fact that market prices are only observed up to a bid-ask spread.

A probabilistic formulation for local volatility appears in Hamida and Cont (2005), who suggest an algorithm for generating samples of a parametrised model that yields option prices within bid-ask spreads of market data. While this sample is intended for a coherent measure of model risk (Cont (2006)), we adopt a Bayesian formalism and concentrate on drawing inference from the joint posterior. In this regard, our approach is similar in spirit to what has been proposed in a recent paper by Gupta and Reisinger (2014) for the parametric model of Jackson et al. (1999). Most notably, we extend their Bayesian idea by proposing a nonparametric functional prior for local volatility, and leverage the rich framework of Gaussian processes for this purpose. We also draw inference about the observation model for a full Bayesian treatment.
Fixing parameters of the latter, as in Gupta and Reisinger (2014), influences the posterior over local volatility and critically leads to biased uncertainty estimates.

Gaussian processes are nonparametric models which offer a generic tool for estimation that is particularly suitable for Bayesian setups. They are widely used by the machine learning community due to their expressiveness, tractability and robustness to overfitting. (We use a stochastic gradient method for this purpose; Spall (1998).) Our approach offers the following:

• The ability to explicitly encode prior beliefs about local volatility in a straightforward way. For instance, it is widely understood that a smooth surface is desirable, especially for further hedging and pricing purposes. Such functional characteristics are easily encoded by specifying a suitable mean and covariance kernel. Comparing with the prior formulation outlined in Gupta and Reisinger (2014), we find a Gaussian process specification more intuitive, flexible and direct. Indeed, even more so in contrast to calibration approaches which use regularisation.

• A Gaussian process model provides full posterior predictions, both interpolations within and extrapolations outside input strike-maturity data. Moreover, predictions come with an associated notion of uncertainty, which is appealing from a model-risk perspective. We utilise this prediction ability to estimate volatility points both internally on the surface, and for prediction of the entire local volatility surface forward in time.

• The inference process for Gaussian processes, i.e. the practical steps of model calibration to data, is well studied. We utilise methods widely used in the machine learning literature for efficient sampling.

• Finally, we use the fact that inferring a latent local volatility surface, by minimising an objective of option prices, effectively corresponds to a likelihood model; most notably with additive Gaussian noise for the squared error objective. In fact, our setup is flexible enough to accommodate any choice of calibration instruments and noise model, where the latter takes into account that one observes bid-ask quotes in place of idealised no-arbitrage prices. The noise variance (and more generally correlation structure) is then associated with the size of the bid-ask spread, and it is naturally included in the Bayesian inference process.

The rest of the paper is organised as follows. Section 2 reviews the local volatility model and discusses practical aspects that motivate our approach. The latter is detailed in Section 3 along with predictive equations and numerical methods for inference. Section 4 contains experiments, and Section 5 an extension on local volatility over time. Finally, Section 6 concludes.
To set the scene, we consider a financial market model consisting of a risk-free money account and a risky asset whose price process is modelled by the local volatility model. Since we are interested in option pricing in this paper, we focus on the risk-neutral perspective and describe some theoretical and practical aspects thereof in the following sections.

2.1 European option pricing
The local volatility model is formulated as the class of diffusion processes under which the asset price S has risk-neutral dynamics under Q

dS_t = (r − q) S_t dt + σ(t, S_t) S_t dW_t,   t ∈ [0, T̄]    (1)

where the latent local volatility function, σ : X → R, on the domain X = [0, T̄] × R₊, essentially represents the model.

For a current time t, a European call option with strike K and maturity T ≥ t is defined by its payoff (S_T − K)⁺ at maturity, where S_T is unknown for T > t. If S follows the local volatility model, the price of the option at time t is given by a pricing functional

(t, S_t, T, K, σ) ↦ C(t, S_t, T, K, σ)

which satisfies Black-Scholes' equation in the variables (t, s)

∂C/∂t + ½ σ²(t, s) s² ∂²C/∂s² + (r − q) s ∂C/∂s − rC = 0    (2)

with terminal condition C(T, s) = (s − K)⁺. This is due to no-arbitrage pricing theory (see e.g. Björk (2009)), and equation (2) can equivalently be represented by the risk-neutral pricing formula

C(t, s) = E^Q[ e^{−r(T−t)} (S_T − K)⁺ | S_t = s ].    (3)

More generally, the Markovian pricing formula (3) yields the price of any contingent claim by replacing the payoff of the call option with the corresponding payoff of the claim. For European-style exercises, (3) also admits a representation (2) with the terminal condition replaced by appropriate boundary conditions associated with the claim.

The main appeal of the local volatility model, at least from a theoretical viewpoint, is that it is capable of reproducing any set {cᵢ} of time-t observable call prices. To each price cᵢ, there is an associated maturity and strike price which we collect with an input variable x = (T, K) ∈ X. Thus, there exists a local volatility function σ such that

cᵢ = C(xᵢ, σ),  ∀i    (4)

for any set {cᵢ} with corresponding strike-maturity inputs {xᵢ}.
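Since (3) is a conditional expectation, it can be approximated directly by Monte Carlo simulation of (1). The sketch below is our own illustration, not part of the paper's method; the Euler discretisation, the flat test volatility and all parameter values are assumptions.

```python
import numpy as np

def euler_call_price(sigma, s0, strike, maturity, r=0.02, q=0.01,
                     n_paths=20000, n_steps=100, seed=0):
    """Monte Carlo estimate of the discounted payoff in (3) under the
    risk-neutral dynamics (1), discretised with an Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    s = np.full(n_paths, s0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        s = s + (r - q) * s * dt + sigma(t, s) * s * dw
        s = np.maximum(s, 1e-12)  # keep simulated prices positive
    payoff = np.maximum(s - strike, 0.0)
    return float(np.exp(-r * maturity) * payoff.mean())

# With a flat local volatility function the model reduces to Black-Scholes
flat_vol = lambda t, s: 0.2 * np.ones_like(s)
price = euler_call_price(flat_vol, s0=100.0, strike=100.0, maturity=1.0)
```

A non-constant `sigma(t, s)` plugs into the same routine unchanged, which is the point of the infinite-dimensional parametrisation.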
In (4) and in what follows, we suppress the current state variable (t, S_t) = (0, S₀) from the call price functional and simply write C(x, σ) ≡ C(T, K, σ). Hence, we consider the market as seen from a single date only. We will have more to say about the time evolution of volatility in Section 5.

In view of (4), one would expect that the knowledge about σ grows with the number of observations. Indeed, given a consistent continuum of call prices {C(T, K) : (T, K) ∈ X}, or equivalently, a price function C : X → R, Dupire's formula (Dupire (1994)) gives the unique local volatility function

σ(T, K) = √( [∂C/∂T + (r − q) K ∂C/∂K + qC] / [½ K² ∂²C/∂K²] )    (5)

compatible with this infinite set of prices. The converse is also true: given a local volatility function, Dupire's forward equation gives the unique call price function via

∂C/∂T + (r − q) K ∂C/∂K − ½ K² σ²(T, K) ∂²C/∂K² + qC = 0    (6)

with initial condition C(0, K) = (S₀ − K)⁺. This is a restatement of the backward equation (2) in the variables (T, K), and there is thus a one-to-one correspondence between a local volatility function and a call price function.

Footnote: Generally speaking, one can reproduce {cᵢ} only if it lies within the range of prices attainable by the model. For local volatility, all sets of prices are attainable as long as they are consistent in the sense of no static arbitrage opportunities (see Carr and Madan (2005)). This is encoded in Dupire's formula (5), since C must be non-decreasing in maturity (otherwise conversion arbitrage) and convex in strike (otherwise butterfly arbitrage) for the local volatility to be admissible.
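Dupire's formula (5) can be evaluated numerically whenever a call price function is available. As a sanity check, the following sketch (our own illustration; `bs_call` and `dupire_local_vol` are assumed helper names, and the parameters are made up) applies finite differences to a flat Black-Scholes surface, for which (5) should recover the constant volatility.

```python
import math

def bs_call(s, k, t, sigma, r, q):
    """Black-Scholes call price with continuous dividend yield q."""
    d1 = (math.log(s / k) + (r - q + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s * math.exp(-q * t) * phi(d1) - k * math.exp(-r * t) * phi(d2)

def dupire_local_vol(price, t, k, r, q, dt=1e-4, dk=1e-2):
    """Finite-difference evaluation of Dupire's formula (5):
    sigma^2 = (C_T + (r - q) K C_K + q C) / (0.5 K^2 C_KK)."""
    c = price(t, k)
    c_t = (price(t + dt, k) - price(t - dt, k)) / (2 * dt)
    c_k = (price(t, k + dk) - price(t, k - dk)) / (2 * dk)
    c_kk = (price(t, k + dk) - 2 * c + price(t, k - dk)) / dk**2
    num = c_t + (r - q) * k * c_k + q * c
    den = 0.5 * k**2 * c_kk
    return math.sqrt(num / den)

# With a flat Black-Scholes surface, Dupire recovers the constant volatility
r, q, vol = 0.02, 0.01, 0.25
surface = lambda t, k: bs_call(100.0, k, t, vol, r, q)
lv = dupire_local_vol(surface, t=0.5, k=105.0, r=r, q=q)
```

On noisy, discretely observed market prices the same differences become unstable, which is precisely the "spiky surface" problem discussed below.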
Remark 1.
The local volatility model in (1) is formulated with a constant risk-free rate of return r and dividend yield q. This is for convenience only, and the methodology suggested in this paper applies equally to settings with time-dependent rates and dividends, both continuous and discrete, as long as the set-up allows for a pricing equation that can be solved at least numerically.

As for most option pricing models, there is no known closed-form solution to the call price equation (2) nor (6), and one has to resort to numerical methods. To this end, Dupire's equation advances a second major appeal of the local volatility model: for a given input set of strike-maturities, a numerical solution of (6) gives the whole surface of corresponding model prices simultaneously. This is an attractive feature from a practical viewpoint, since one is interested in computing a large set of prices corresponding to market quoted options when performing calibration of the model.

Model-to-market calibration is an inverse problem commonly formalised with least squares. Given a finite set of market data of call prices {ĉᵢ, x̂ᵢ}ⁿᵢ₌₁, find a local volatility function such that the squared error functional

G(σ) = Σᵢ₌₁ⁿ (C(x̂ᵢ, σ) − ĉᵢ)²    (7)

is minimised. Weighted least squares is also common, where the weights may represent liquidity through inverse bid-ask spreads, or the inverse Black-Scholes vega for a first-order approximation of the squared error in terms of implied volatility.

In view of (4), it should be possible to find a σ such that the error (7) is zero. However, due to what is effectively expressed in (5), one cannot expect that there is a unique solution σ to the calibration problem of minimising (7).

Footnote: At least, this could be expected when the data is a set of consistent prices observed without noise. In practice, however, prices are observed up to their bid-ask quotes, such that neither consistent nor noise-free prices are readily available.
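A minimal sketch of how (6) can be solved with an explicit finite-difference scheme, producing prices at every strike of the grid in a single forward pass. The grid sizes, boundary conditions and the flat test volatility are our own illustrative assumptions; a production solver would rather use an implicit or Crank-Nicolson scheme.

```python
import numpy as np

def dupire_forward_solver(local_vol, s0, r, q, t_max, k_max, nk=240, nt=4000):
    """Explicit finite-difference scheme for Dupire's forward equation (6):
    C_T = 0.5 sigma(T,K)^2 K^2 C_KK - (r - q) K C_K - q C,  C(0,K) = (S0 - K)^+."""
    k = np.linspace(0.0, k_max, nk + 1)
    dk = k[1] - k[0]
    dt = t_max / nt  # chosen small enough for explicit-scheme stability
    c = np.maximum(s0 - k, 0.0)  # initial condition at T = 0
    for n in range(nt):
        t = n * dt
        sig = local_vol(t, k[1:-1])
        c_k = (c[2:] - c[:-2]) / (2 * dk)
        c_kk = (c[2:] - 2 * c[1:-1] + c[:-2]) / dk**2
        rhs = 0.5 * sig**2 * k[1:-1]**2 * c_kk - (r - q) * k[1:-1] * c_k - q * c[1:-1]
        c[1:-1] += dt * rhs
        c[0] = s0 * np.exp(-q * (t + dt))  # K = 0 boundary: C_T = -qC
        c[-1] = 0.0                        # far-field boundary
    return k, c

# One solve yields call prices at all strikes of the grid for maturity t_max
flat = lambda t, k: 0.2 * np.ones_like(k)
strikes, prices = dupire_forward_solver(flat, s0=100.0, r=0.02, q=0.01,
                                        t_max=1.0, k_max=300.0)
```

Storing the intermediate time layers would give the whole strike-maturity surface from the same pass, which is the simultaneity advantage noted above.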
Zero error could only be achieved in the limit of infinite data. In other words, when calibrating to finite data, one can hope for a local volatility function to be pinned down only at a finite set of strike-maturities (not necessarily the same as the market quoted strike-maturities), and subject to uncertainty.

Dupire's formula also suggests a converse estimation approach, in at least two flavours: either non-parametrically, by approximating the derivatives of (5) with finite differences, or by assuming a parametric form C̃ in place of C, fitting its parameters to data, and then plugging derivatives of C̃ into (5). However, while the former is known to output "spiky" local volatility surfaces along with model-to-market errors that are far from minimal (a finite difference approximation is not even guaranteed to preserve the positiveness of local volatility), the latter brings questions about overfitting, interpolation/extrapolation behaviour, as well as the choice of parametric form—design choices which arguably should be made with minimisation of a model-to-market error in mind. Further, with a fixed choice of C̃, note that an increasing dataset informs the estimation of the parameters of C̃, not necessarily the minimisation of (7). In all, both approaches are subject to the ambiguity present under finite data: the latter in that, as long as the parametric form fits well with observations from the call price surface, one has the liberty to choose its functional form; the former in that one is free to choose and tune the approximation methods for the first- and second-order derivatives.
Finally, even if these methods appear more direct, as they provide means of computing the local volatility function in place of imputing it, they should, at least in principle, support an equivalent formulation as an optimisation problem of the error functional (7), since this explicates an objective of minimising a distance to observed call prices.

In what follows we build on the calibration approach based on the model-to-market measure. We propose to cast local volatility modelling in a probabilistic framework, such that the calibration problem effectively corresponds to optimising a regularised version of (7). Conceptually, this gives a clear probabilistic representation for a prior local volatility model (corresponding to the regularising term) and results in a posterior distribution over local volatility, as opposed to a point-estimate obtained from the optimum under regularisation.

Following the approach of inverse calibration through optimisation with a squared error, the next section outlines how a probabilistic framework can be adopted for local volatility with a nonparametric Gaussian process model. We detail posterior predictive equations for local volatility and call prices, and end the section by proposing a numerical method.

Footnote: Indeed, as mentioned in the introduction, the calibration problem is ill-posed in the sense that there exist several solutions to the minimisation of (7). What further complicates matters is that σ ↦ G(σ) is neither convex nor suitable for gradient-based optimisers; see the discussion in Hamida and Cont (2005).

3.1 The calibration problem recast

In practice, prices of options are quoted only up to bid-ask spreads, from which it is common to use the mid-market price for calibration. To this end, we assume a set of observable mid-market call prices ĉ = {ĉᵢ}ⁿᵢ₌₁ at a market set of strikes and maturities x̂ = {x̂ᵢ}ⁿᵢ₌₁ to be generated from the noise model

ĉᵢ = C(x̂ᵢ, σ) + εᵢ    (8)

for a common local volatility function.
Hence C(·, σ) represents the fair price surface from which the mid-market prices are noisy observations. For simplicity, we assume {εᵢ} is independent Gaussian noise with variance σ_ε², and note that our approach generalises easily to heteroscedastic noise structured over X. This leads to the log-likelihood function

log p(ĉ | σ, σ_ε) = −(1/(2σ_ε²)) Σᵢ₌₁ⁿ (C(x̂ᵢ, σ) − ĉᵢ)² − (n/2) log(2πσ_ε²).    (9)

In terms of maximising the likelihood with respect to the local volatility function, this is equivalent to minimising the sum of squared errors (7).

As mentioned in the introduction, a standard approach is to model the local volatility function as a parametric function σ(T, K) ≡ σ_θ(T, K) with a finite-dimensional parameter θ, and then to calibrate by minimising θ ↦ G(σ_θ). With this as an example, taking the calibration problem to a probabilistic setting means postulating a prior distribution p(θ) over the parameters based on prior knowledge about θ. Note that it is possible to include the noise variance σ_ε² in θ. Conditioning on observed data, Bayes' rule then gives the posterior distribution over parameters

p(θ | ĉ) = p(ĉ | θ) p(θ) / p(ĉ)

where p(ĉ | θ) is the likelihood (9) written as a function of θ, and the marginal likelihood is given by p(ĉ) = ∫ p(ĉ | θ) p(θ) dθ.

In a Bayesian framework, the inference process considers the posterior distribution over θ and not only point-estimates. For summary measures, one may use the maximum a posteriori (MAP) estimator

θ_MAP = arg max_θ p(θ | ĉ).

Another common choice is the mean of θ with respect to the posterior distribution, which is referred to as Bayes' estimator. For a notion of uncertainty attached to the parameter estimate, quantiles of the posterior distribution give credible intervals.
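For a one-dimensional θ, the posterior and its MAP and mean summaries can be illustrated in a few lines. The pricing map below is a hypothetical linear stand-in (a real model would price via (6)); the prior, noise level and random-walk Metropolis settings are likewise assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the pricing map theta -> model prices;
# prices are simply linear in theta here
x_hat = np.linspace(0.5, 2.0, 12)
price_model = lambda theta: theta * x_hat
c_hat = price_model(1.4) + rng.normal(0.0, 0.05, size=x_hat.size)  # synthetic data

sigma_eps = 0.05
def log_post(theta):
    """log p(theta | c_hat) up to a constant: Gaussian likelihood (9) + N(1,1) prior."""
    sse = np.sum((price_model(theta) - c_hat) ** 2)
    return -0.5 * sse / sigma_eps**2 - 0.5 * (theta - 1.0) ** 2

# Random-walk Metropolis over theta
theta, samples = 1.0, []
lp = log_post(theta)
for _ in range(20000):
    prop = theta + 0.05 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
samples = np.array(samples[5000:])  # discard burn-in

theta_map = samples[np.argmax([log_post(t) for t in samples])]
theta_mean = samples.mean()
lo, hi = theta_mean - 2 * samples.std(), theta_mean + 2 * samples.std()
```

The interval `(lo, hi)` is the mean-plus/minus-two-standard-deviations summary used throughout the paper.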
We will make common use of the posterior mean plus/minus two standard deviations as a more robust alternative to quantile estimates.

Footnote: Examples include Gupta and Reisinger (2014), Hamida and Cont (2005), Luo and Liu (2010) and Jackson et al. (1999).

Footnote: It is also common to calibrate parts of θ sequentially to subsets of ĉ to obtain a more direct correspondence between calibrated parameters and particular sets of strikes-maturities; for instance, slices of common maturity (see the discussion in Luo and Liu (2010)). However, since a particular model price C(Tᵢ, Kᵢ, σ) depends on σ(T, K) for all K and all T ≤ Tᵢ, a partial calibration to local parts of the price surface always yields a worse fit than a global optimum.

3.2 Probabilistic nonparametric local volatility

At the base of our approach is a random nonparametric local volatility function. To this end, we assume a Gaussian process functional prior with input space X (Rasmussen and Williams (2006))

f ∼ GP(m, k(x, x′)),
σ = φ(f)

where φ is a positive function applied pointwise for imposing positivity of the local volatility function. The Gaussian process defines a distribution over real-valued functions on X where any finite set of function evaluations of f follows the multivariate normal distribution—see (14). In particular, prior beliefs are encoded in a mean m and covariance function

k(x, x′) = Cov(f(x), f(x′))

defined for any pair of inputs in the strike-maturity space X. We denote the parameters of the covariance function with κ, which together with m and σ_ε are referred to as the model's hyperparameters. Note that the mean and covariance function uniquely specify a Gaussian process.
We use a constant m for convenience, although it is possible to use a parametrised function for more informed beliefs about the mean.

The observed prices ĉ are conditioned on the function evaluated at an input set which we denote with x = {xᵢ}ᴺᵢ₌₁, that is, on the surface f = f(x), such that we have the log-likelihood

log p(ĉ | f, σ_ε) = −(1/(2σ_ε²)) Σᵢ₌₁ⁿ (C(x̂ᵢ, f) − ĉᵢ)² − (n/2) log(2πσ_ε²).    (10)

In (10), the call price is written directly as a function of f, i.e. implicitly as a composition with the link function f ↦ φ(f) = σ, where σ = σ(x).

There are a few aspects to point out here. First, considering the likelihood as a function of f = f(x) instead of f(·)—i.e. a finite set of function evaluations in place of the function object itself—is without loss of generality (see Remark 3), although a necessity, since we can only work with finite sets of function values in practice. When working with local volatility models, the call price function is given only up to a numerical solver of (6), which in turn relies on local volatility values taken at a finite input set. Schematically, for a finite difference solver, we have

f ↦ C(x, f).    (11)

Second, while the finite difference mapping (11) outputs call prices over a strike-maturity set corresponding to the set of input local volatility values, x does not necessarily coincide with the observed market set x̂. Typically, x̂ is scattered over the strike-maturity space, while a finite difference grid is a regular Cartesian product

x = T × K = {(Tᵢ, Kⱼ) : i = 1, ..., I, j = 1, ..., J}.    (12)

As long as x is constructed such that x̂ ⊆ x (n ≤ N), we may evaluate the likelihood (10), which depends on call prices over the market set only—see Figure 2 for an example.

Footnote: With f(x) we mean the set of f-values at every point of the finite set x, i.e. f(x) ≡ {f(x)}_{x ∈ x}.
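The embedding x̂ ⊆ x amounts to a simple index lookup: solve for prices on the full Cartesian grid (12), then select the n entries entering the likelihood (10). A sketch with made-up strike and maturity values (the numbers are illustrative, not the paper's data):

```python
import numpy as np

# Regular Cartesian grid x = T x K as in (12)
maturities = np.array([0.25, 0.5, 1.0, 2.0])           # T_1, ..., T_I
strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # K_1, ..., K_J
grid = [(t, k) for t in maturities for k in strikes]   # N = I * J points

# Scattered market set x_hat, constructed to satisfy x_hat ⊆ x (n <= N)
x_hat = [(0.5, 90.0), (1.0, 100.0), (2.0, 120.0)]
idx = [grid.index(p) for p in x_hat]  # positions of the market points in the grid

def likelihood_inputs(model_prices_on_grid):
    """Select the n model prices entering the likelihood (10) from the N grid prices."""
    return model_prices_on_grid[idx]

prices_on_grid = np.arange(len(grid), dtype=float)  # placeholder solver output
subset = likelihood_inputs(prices_on_grid)
```

In practice `prices_on_grid` would be the output of the finite-difference mapping (11) rather than a placeholder.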
Note that the likelihood depends on every point of x (on every value of f), since the model price at every strike-maturity point x is dependent on the entire local volatility function up to the maturity of x, albeit with a stronger dependency locally around x due to the smoothness properties of (6). Ultimately, we want to infer posterior knowledge about the local volatility function not only at the points of the market set, but also for strike-maturities lying outside, and quantify how uncertain such predictions might be. Equations for this purpose will be given in Section 3.4 (see also Remark 3).

Third, the likelihood (10) is neither factorisable nor Gaussian with respect to f (two properties commonly assumed in Gaussian process regression). This hinders analytical tractability and efficient computations. In effect, what we face here is a nonlinear regression problem in a latent space, where observations are noisy outputs from a transformation of an unobservable function.

At the next level of inference, hyperparameters are considered unknown up to some hyperprior. We then obtain the joint posterior over hyperparameters and function values

p(f, κ, m, σ_ε | ĉ) = (1/p(ĉ)) × p(ĉ | f, σ_ε) × p(f | κ, m) × p(κ, m, σ_ε)    (13)

with the three factors being the likelihood, the f-prior and the hyperprior, respectively, and where p(f | κ, m) is short for the prior over f induced by the Gaussian process. By definition,

f ∼ N(m, K_ff)    (14)

with K_ff the covariance matrix given by evaluating the covariance function at all pairs of input points x, that is, with elements

[K_ff]_{i,j} = k(xᵢ, xⱼ).

Eventually we write K_κ ≡ K_ff to stress that it is a matrix-valued function of κ.
We use a product covariance function of squared exponentials,

k(xᵢ, xⱼ; κ) = σ_f² k_SE(Tᵢ, Tⱼ; l_T) k_SE(Kᵢ, Kⱼ; l_K),    (15)
k_SE(x, x′; l) = exp(−(x − x′)² / (2l²)),

with parameters collected in κ = (l_T, l_K, σ_f). Since (15) is smooth with finite derivatives at zero of all orders, f will have continuous sample-function derivatives of all orders (Lindgren (2012)), and we thus impose a strong smoothness property on the local volatility function. This is a desire commonly expressed in the literature, especially when the model is considered for derivative hedging and pricing purposes (e.g. Jackson et al. (1999)). We also consider a product Matérn 3/2 covariance function with factors

k_Mat32(x, x′; l) = (1 + √3 |x − x′| / l) exp(−√3 |x − x′| / l)    (16)

which yields sample paths of f that are continuously differentiable. This covariance will potentially achieve higher likelihood values (i.e. a closer fit in terms of model-to-market errors), but at the cost of losing smoothness of the local volatility surface.

For κ, and for the remaining hyperparameters, we assume scaled sigmoid Gaussian priors; each of l_T, l_K, σ_f, m, σ_ε is generated independently as (with θ a generic parameter)

θ = θ_min + (θ_max − θ_min) / (1 + e^(−ξ)),   ξ ∼ N(µ_ξ, σ_ξ²).    (17)

The motivation behind this choice of hyperprior is that it gives a convenient parametrisation. Each parameter is constrained to take values in (θ_min, θ_max), while µ_ξ and σ_ξ provide flexibility over the distributional shape on this interval. This makes the specification of desirable hyperprior assumptions straightforward. A limited support also makes the model less prone to identifiability issues and therefore improves sampling efficiency; see Trangucci et al. (2016).
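The covariance functions (15)-(16), a draw from the induced prior (14), and the scaled sigmoid transform (17) can be sketched as follows. The grid, the hyperparameter values and the choice φ = exp for the positivity link are our assumptions for illustration only.

```python
import numpy as np

def k_se(a, b, l):
    """Squared-exponential factor in (15): exp(-(x - x')^2 / (2 l^2))."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * l**2))

def k_mat32(a, b, l):
    """Matern 3/2 factor (16): (1 + sqrt(3)|x - x'|/l) exp(-sqrt(3)|x - x'|/l)."""
    d = np.sqrt(3.0) * np.abs(a[:, None] - b[None, :]) / l
    return (1.0 + d) * np.exp(-d)

def product_cov(T, K, l_T, l_K, sigma_f, factor=k_se):
    """Product covariance over points x_i = (T_i, K_i): sigma_f^2 k(T) k(K)."""
    return sigma_f**2 * factor(T, T, l_T) * factor(K, K, l_K)

def ssg(xi, theta_min, theta_max):
    """Scaled sigmoid transform (17) of xi ~ N(mu_xi, sigma_xi^2)."""
    return theta_min + (theta_max - theta_min) / (1.0 + np.exp(-xi))

# Draw one surface f ~ N(m, K_ff) on a small Cartesian grid and map it to a
# positive local volatility; phi = exp is one admissible choice of link
rng = np.random.default_rng(0)
T = np.repeat([0.25, 0.5, 1.0], 4)           # maturities of the 12 grid points
K = np.tile([90.0, 100.0, 110.0, 120.0], 3)  # strikes of the 12 grid points
K_ff = product_cov(T, K, l_T=1.0, l_K=20.0, sigma_f=0.3)
m = -1.5  # constant prior mean
f = m + np.linalg.cholesky(K_ff + 1e-10 * np.eye(12)) @ rng.standard_normal(12)
local_vol = np.exp(f)  # sigma = phi(f) > 0
```

Swapping `factor=k_mat32` into `product_cov` switches between the two model-family members compared in the model selection discussion below.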
For convenience, we henceforth use µ_ξ = 0, σ_ξ = 1, and write θ = ssg(ξ) for the transformation in (17).

In the Bayesian framework, the model calibration problem effectively amounts to generating a sample of local volatility surfaces, {f⁽¹⁾, f⁽²⁾, f⁽³⁾, ...}, from the joint posterior (13). We detail an algorithm for this in Section 3.5. From this sample, one can extract the sample counterpart of a MAP surface (the "point-estimate" surface which achieves the highest posterior likelihood), calculate a posterior mean surface, and quantify uncertainty in the calibrated local volatility by means of credible intervals. Furthermore, it gives a principled way of predicting volatility at unseen inputs consistent with observed data. The posterior predictive distribution is outlined in Section 3.4.

Remark 2.
To gain some insight into the inference process, we consider the likelihood written as

log p(ĉ | σ_ε, SSE) = −(1/(2σ_ε²)) SSE − (n/2) log(2πσ_ε²)    (18)

with the sum of squared errors SSE = Σᵢ₌₁ⁿ (C(x̂ᵢ, f) − ĉᵢ)². The first term of (18), the "data-fit", assigns probability mass to small values of SSE (the likelihood is an exponential distribution over
SSE with its mode at zero) and to large variances σ_ε². The second term, the "model-complexity", on the other hand prefers small σ_ε². As a result, if we ignore the prior over (σ_ε, f), the likelihood strictly works to minimise the model-to-data error whilst balancing with an appropriate noise variance to accommodate the variability in the data. Similarly, the Gaussian process prior

log p(f | κ, m) = −½ (f − m)ᵀ K_κ⁻¹ (f − m) − ½ log(|2π K_κ|)

punishes complex models through the second term. The first term assigns probability to surfaces referenced to the mean, i.e. it balances the likelihood over f, while it also tunes an appropriate covariance matrix. In all, the inference process automatically performs a trade-off between model fit (bias) and complexity (variance), in a principled manner. This will also be the theme of the next section.

We have proposed the use of different covariance functions, which effectively define separate members of a local volatility model family. To discriminate between different such members in a principled way, we consider model selection from a Bayesian perspective. Consider yet another level of inference with a discrete set of possible model structures {Mᵢ}. Before seeing data, these are distributed according to a prior, p(Mᵢ), which may be uniform in the absence of preferences. Given data, Bayes' rule yields the posterior over models

p(Mᵢ | ĉ) = p(ĉ | Mᵢ) p(Mᵢ) / p(ĉ)

where p(ĉ) = Σᵢ p(ĉ | Mᵢ) p(Mᵢ). The key factor in this expression is the model evidence, the probability of the data under each model,

p(ĉ | Mᵢ) = ∫ p(ĉ | f, σ_ε) p(f | κ, m, Mᵢ) p(κ, m, σ_ε) df dκ dm dσ_ε    (19)

where we keep the hyperprior (and likelihood) invariant across models—cf. (13).

A selection process based on the model evidence incorporates an automatic trade-off between model fit and model complexity, such that the least complex model which is able to explain the data is favoured. This is the Occam's razor effect (MacKay (2003), Rasmussen and Ghahramani (2001)): a simple model can account for a smaller range of possible data sets. Since its evidence is a distribution over data normalising to unity, it has large probabilities attached to sets within its range.
Conversely, a complex model may account for a wide range of data but with relatively smaller probabilities, as it has to spread its probability mass across a larger range. The latter is the reason why the evidence does not necessarily favour (complex) models with the best data fit—their complexity is penalised as well—but rather the model which strikes a balance between the two.
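The decomposition of the log-likelihood (18) and of the GP log prior into a data-fit term and a complexity term can be written down directly; a toy sketch (the pricing map C(·) is outside its scope, so model prices are passed in precomputed):

```python
import numpy as np

def log_likelihood(c_hat, c_model, sigma_eps):
    """Gaussian log-likelihood (18): data-fit term plus a complexity
    penalty in the noise variance sigma_eps^2."""
    sse = np.sum((c_model - c_hat) ** 2)
    n = len(c_hat)
    return -sse / (2.0 * sigma_eps ** 2) - 0.5 * n * np.log(2.0 * np.pi * sigma_eps ** 2)

def log_gp_prior(f, m, K):
    """GP log prior: quadratic term referenced to the mean plus a
    log-determinant complexity penalty."""
    d = f - m
    _, logdet = np.linalg.slogdet(2.0 * np.pi * K)
    return -0.5 * d @ np.linalg.solve(K, d) - 0.5 * logdet
```

For fixed σ_ε, a smaller model-to-data error always increases the likelihood; the complexity terms only come into play when σ_ε and the covariance are tuned as well.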
To make predictions at a strike-maturity point x⋆ (at a set x⋆) which is not included in the input grid x, we have the posterior predictive distribution of f⋆ = f(x⋆)

p(f⋆ | ĉ) = ∫ p(f⋆ | f, κ, m, σ_ε, ĉ) p(f, κ, m, σ_ε | ĉ) dκ dm dσ_ε df,   (20)

where the second factor is the joint posterior. In standard Gaussian process regression, the conditional distribution p(f⋆ | f, κ, m, σ_ε, ĉ) does not depend on data; here, however,

p(f⋆ | f, κ, m, σ_ε, ĉ) = p(f⋆ | f, κ, m) × p(ĉ | f⋆, f, σ_ε) / p(ĉ | f, σ_ε),   (21)

the conditional prior times a likelihood ratio. Except for the case of predicting at maturities T⋆ greater than the latest maturity of x̂, the likelihood ratio in (21) does not cancel out. This is because (10) is non-factorisable over the local volatility surface: a model price C(x̂, f, f⋆) depends on all values in {f⋆, f} taken at maturities less than T̂, even if the dependency is strongest on function values taken locally around x̂.

The conditional prior in (21) is the predictive distribution for a Gaussian process (Rasmussen and Williams (2006))

p(f⋆ | f, κ, m) = N(f⋆ | m_{f⋆|f}, K_{f⋆|f})   (22)

with

m_{f⋆|f} = m + K_{f⋆f} K_{ff}^{−1} (f − m),  K_{f⋆|f} = K_{f⋆f⋆} − K_{f⋆f} K_{ff}^{−1} K_{ff⋆}.   (23)

From the posterior predictive distribution we obtain the distribution for predicting unseen call prices c⋆ at corresponding strike-maturities x⋆

p(c⋆ | ĉ) = ∫ p(c⋆ | f, f⋆, σ_ε) p(f⋆ | f, κ, m, σ_ε, ĉ) p(f, κ, m, σ_ε | ĉ) dκ dm dσ_ε df df⋆.   (24)

The data distribution is a spherical Gaussian, as follows from the noise model (8),

p(c⋆ | f, f⋆, σ_ε) = N(c⋆ | C({f⋆, f}, x⋆), σ_ε² I)   (25)

with I denoting the identity matrix of appropriate dimension.

Remark 3.
The joint distribution that is marginalised in (20) is equal to

(1/p(ĉ)) p(ĉ | f⋆, f, σ_ε) p(f⋆, f | κ, m) p(κ, m, σ_ε),   (26)

which is the posterior over (f⋆, f, κ, m, σ_ε) given ĉ. Consider x⋆ to be the inputs of x which are not included in x̂, such that {f⋆, f(x̂)} = f, and replace f with f(x̂) in (26). We thus have that local volatility at unquoted (unobserved) strike-maturities can either be inferred from the posterior (13) or, equivalently, by predicting these points with respect to the joint posterior over (f(x̂), κ, m, σ_ε) from (20), i.e. with respect to local volatility at market-quoted (observed) inputs only. The convenient consequence of this fact is that we can choose either to include the prediction inputs x⋆ in the input set x and generate their functional values jointly by posterior sampling, or to first sample from the posterior over values at (the grid which spans) the market set x̂ and then infer the predictive distribution over x⋆ in a second step.

A posterior distribution of the form (13) is intractable in general, with the case of Gaussian likelihoods standing as a rare exception (under known hyperparameters). We will therefore represent the posterior with samples generated by Markov chain Monte Carlo (MCMC), which is an asymptotically exact method (see e.g. Robert and Casella (2004)). Several methods also exist for deterministic approximations, where a tractable distribution is chosen within a suitable family to be close to the posterior. These approximations do, however, depend on factorising likelihoods to be effective, and are known to be exclusive in fitting mass to a (local) mode of the posterior, hence potentially underestimating uncertainty.
Moreover, when employing a calibrated model in practice, we still require posterior samples.

To sample (f, κ, m, σ_ε) from the joint posterior (13), we use blocked Gibbs sampling (Geman and Geman (1984)) and alternate between updating the variable of function values conditional on hyperparameters, and hyperparameters conditional on function values, respectively. This gives a Markov chain with the joint target as the limiting distribution (Tierney (1994)).

For the update of functional values we use elliptical slice sampling (Murray et al. (2010)). The model is first re-parametrised to have a zero-mean prior with parameter κ, while m is absorbed into the likelihood:

f | κ ∼ p(f | κ) ≡ N(0, K_κ),  ĉ | f, m, σ_ε ∼ p(ĉ | f, σ_ε, m) ≡ p(ĉ | f + m, σ_ε),

p(f, κ, m, σ_ε | ĉ) = (1/p(ĉ)) p(ĉ | f, σ_ε, m) p(f | κ) p(κ) p(σ_ε) p(m).   (27)

For hyperparameters fixed at some values, i.e. given K_κ and (σ_ε, m), the elliptical slice sampler targets the conditional ∝ p(ĉ | f, σ_ε, m) p(f | κ) with a proposed transition f → f′ defined by an ellipse

f′(θ) = f cos(θ) + f̄ sin(θ),  f̄ ∼ N(0, K_κ),  θ ∼ Uniform[0, 2π].

The angle θ is drawn uniformly over a bracket [θ_min, θ_max] = [θ − 2π, θ], which is shrunk ("sliced", following Neal (2003)) towards θ = 0 until the proposal is accepted—see Murray et al. (2010) for further details.

Next we consider updating the covariance hyperparameter κ. Its conditional given functional values is ∝ p(f | κ) p(κ). Targeting this density will, however, lead to a slow exploration of the support for κ, since the functional prior is very informative of its hyperparameters in the sense that the two are strongly correlated. Further, there is no direct guidance from data through the likelihood in this update.
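The elliptical slice sampler used for the f-update (and, in z-space, for the hyperparameter updates below) can be sketched as a single transition, following Murray et al. (2010); names are our own, and a zero-mean prior with Cholesky factor L (K = LL^⊤) is assumed:

```python
import numpy as np

def elliptical_slice(f, L_chol, log_lik, rng):
    """One elliptical-slice-sampling transition for a zero-mean
    Gaussian prior with Cholesky factor L_chol."""
    nu = L_chol @ rng.standard_normal(len(f))   # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())  # slice level
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta         # initial bracket
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop) > log_y:             # accept a point on the slice
            return f_prop
        if theta < 0.0:                         # otherwise shrink towards 0
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)
```

The loop always terminates, since θ = 0 recovers the current state, which lies on the slice almost surely.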
To improve mixing, we therefore decouple the prior dependency between f and κ by the parametrisation

f(κ, ν) = L_κ ν,  L_κ L_κ^⊤ = K_κ,  ν ∼ N(0, I),   (28)

with a Cholesky decomposition for the covariance matrix square root. This results in an equivalent posterior

p(ν, κ, m, σ_ε | ĉ) = (1/p(ĉ)) p(ĉ | f(ν, κ), m, σ_ε) p(ν) p(κ) p(m) p(σ_ε)   (29)

(likelihood × ν-prior × hyperprior), where we stress that the likelihood now depends on covariance hyperparameters through the deterministic transformation f(κ, ν) = L_κ ν, and that the prior for ν is independent of the hyperprior. For a current state (f, κ), the corresponding ν is implied by ν = L_κ^{−1} f. Given (ν, m, σ_ε), a proposed transition κ → κ′ then targets the unnormalised¹ conditional

p(ĉ | f(ν, κ), m, σ_ε) p(κ)   (30)

where functional values are indirectly updated through f → f′ = L_{κ′} ν. Further, due to the hyperprior assumption (17) with κ = κ(z) ≡ ssg(z) for a Gaussian z ∼ N(0, I), we employ the elliptical slice sampler on p(z | ν, m, σ_ε, ĉ) ∝ p(ĉ | f(ν, κ(z)), m, σ_ε) p(z).

¹By ∝ p(θ) we mean an unnormalised density whose normalising constant does not depend on the target variable of interest. The normalising constant cancels out in a Metropolis–Hastings acceptance ratio for all the sampling steps we consider here.

The parametrisation (28) is effective when the likelihood over f is relatively weak, such that f is distributed almost according to its prior. This is discussed by Murray and Adams (2010), who propose surrogate data slice sampling (SDSS) for models with strong likelihoods. In that case, proposals targeting (30) are often rejected, such that mixing of the Markov chain for κ is poor. As a remedy, they augment the model with a noise variable centred around function values, g | f ∼ N(f, S), where S is a user-specified diagonal covariance. They then make hyperparameter proposals that update f conditional on its noisy version g, through the parametrisation

f(κ, η, g) = L_{R_κ} η + m_{κ,g},  L_{R_κ} L_{R_κ}^⊤ = R_κ,  η ∼ N(0, I),   (31)

where R_κ and m_{κ,g} from the conditional f | g, κ ∼ N(m_{κ,g}, R_κ) are given by

R_κ = S − S(S + K_κ)^{−1} S,  m_{κ,g} = R_κ S^{−1} g.

In practice, a current state (f, κ, m, σ_ε) is first augmented by drawing g from N(f, S). The implied variable is calculated, η = L_{R_κ}^{−1}(f − m_{κ,g}), whereby a proposal κ → κ′ targets its conditional posterior given (η, g, m, σ_ε),

p(κ | η, g, m, σ_ε, ĉ) = (1/p(ĉ)) p(ĉ | f(κ, η, g), m, σ_ε) p(g | κ, S) p(η) p(κ) p(m) p(σ_ε),

where the relevant term is p(ĉ | f(κ, η, g), m, σ_ε) p(g | κ, S) p(κ)—see Murray and Adams (2010) for details. Again, due to our hyperparameter assumption, we perform this update in z-space with elliptical slice sampling.

The final step in our sampling scheme updates the hyperparameters (m, σ_ε) of the likelihood. We target its conditional ∝ p(ĉ | f, σ_ε(z), m(z)) p(z) with elliptical slice sampling on the Gaussian z. As the mean level m and σ_ε are one-dimensional parameters to which the likelihood is quite sensitive, this sampling step is relatively effective; it is also cheap, since the target conditional does not involve the functional prior.

The three updates—functional values, covariance and likelihood parameters—summarise each sequential step of the Markov chain. We combine them in a hybrid strategy where we randomise between sampling based on (28) and (31) when updating κ (see Robert and Casella (2010)).
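The decoupling (28) used in the κ-update amounts to a round trip through the whitened variable ν: invert the current Cholesky factor to recover ν, then map ν forward under the proposed hyperparameters. A sketch under dense covariance matrices (function names are our own):

```python
import numpy as np

def whiten(f, K):
    """Invert (28): recover nu = L_kappa^{-1} f under the current K_kappa."""
    L = np.linalg.cholesky(K)
    return np.linalg.solve(L, f)

def unwhiten(nu, K_new):
    """Apply (28) under proposed hyperparameters: f' = L_{kappa'} nu."""
    return np.linalg.cholesky(K_new) @ nu
```

By construction, ν carries a N(0, I) prior regardless of κ, so a κ-proposal moves f deterministically while leaving the ν-prior untouched.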
Similarly, we follow the strategy of Murray and Adams (2010) and apply three computationally "cheap" updates of f between every expensive update of the covariance.

To evaluate the likelihood, we need to calculate the mapping (11) of a local volatility surface to a call price surface. To this end, we use a Crank–Nicolson finite difference scheme for a numerical solution of Dupire's equation (6), see Hirsa (2012). For this purpose, we construct the input set x by taking the Cartesian product (12) of all observed maturities and strikes in the market set x̂. This results in a grid with non-equidistant spacing in both dimensions, as illustrated for market data by the grey mesh in Figure 2. For boundary conditions across maturity, we use that the call price approaches S − Ke^{−rT} for small K and zero for large K. To improve the accuracy of these approximations, we extend the grid in the strike dimension with a flat extrapolation of the input local volatility surface, with moneyness S/K down to 0.1 and up to 4, respectively.

In terms of computational complexity, the second update (of covariance parameters) is the most expensive, with a cost O(N³) for (standard) inversion of the covariance matrix each time a new κ is proposed. Proposing a new f also costs O(N³) for the Cholesky decomposition of K_κ, although the same decomposition can be recycled for consecutive proposals. The third update is the cheapest, since it only requires computing the likelihood. One such evaluation involves solving I systems of J × J tridiagonal matrices for the Crank–Nicolson scheme of the call price, where I (J) is the number of maturities (strikes) of the input set x, i.e. a cost O(IJ) = O(N). In fact, we can readily exploit that x is a Cartesian product and use Kronecker methods when computing matrix decompositions and inversions, see Saatchi (2012).
This reduces the cost to O(N^{3/2}) for both the second update (for each κ proposal) and the first (recycled throughout an f-update with elliptical slice sampling).

Finally, we consider sampling from the predictive distribution (20). For a prediction input x⋆ with maturity T⋆ smaller than the latest maturity of x̂, we approximate the likelihood ratio in (21) as constant (note that this is exact if T⋆ is larger than the latest maturity of x̂). We then generate predictions by direct simulation (an ancestral pass), i.e. from an approximating Gaussian mixture

p(f⋆ | ĉ) ≈ (1/M) Σ_{i=1}^M p(f⋆ | f^(i), κ^(i), m^(i))   (32)

where {f^(i), κ^(i), m^(i)}_{i=1}^M is a sample from the posterior. This approach is used in Wilson and Ghahramani (2010) and Osborne (2010), and yields sampling from the exact predictive distribution (under a constant likelihood ratio) in the limit M → ∞.

To sample from the complete distribution (20), we may add an importance step to the above procedure. First we generate a sample {f⋆^(i)} from the Gaussian mixture (32), i.e. we employ

p(f⋆ | f, κ, m) p(f, κ, m, σ_ε | ĉ)   (33)

as importance distribution. The unnormalised weight function

w(f⋆, f, σ_ε) = p(ĉ | f⋆, f, σ_ε) / p(ĉ | f, σ_ε)   (34)

evaluated at each sample point (f⋆^(i), f^(i), σ_ε^(i)) is then used for resampling over {f⋆^(i)}. Importance sampling generally works well when the variance of the importance weights is low, in our case when the likelihood p(ĉ | f⋆, f, σ_ε) is weak with respect to f⋆. Again, this depends on how many (and how strongly) model prices corresponding to ĉ vary with f⋆, which is determined by the locations of the prediction inputs x⋆.
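Direct simulation from the mixture (32) via the conditional moments (23), followed by the optional resampling step with weights (34), can be sketched as follows. For brevity the kernel matrices are held fixed across posterior states here, whereas in the text they depend on κ^(i); all names are our own:

```python
import numpy as np

def gp_conditional(f, m, K_ff, K_sf, K_ss):
    """Conditional prior moments (23) of f* given f, constant mean level m."""
    alpha = np.linalg.solve(K_ff, f - m)
    mean = m + K_sf @ alpha
    cov = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)
    return mean, cov

def mixture_draw(states, K_ff, K_sf, K_ss, rng):
    """Direct simulation from (32): one f* draw per posterior state (f^(i), m^(i))."""
    draws = []
    for f, m in states:
        mean, cov = gp_conditional(f, m, K_ff, K_sf, K_ss)
        draws.append(rng.multivariate_normal(mean, cov))
    return np.array(draws)

def importance_resample(draws, log_w, rng):
    """Resample the mixture draws according to normalised weights (34)."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(draws), size=len(draws), p=w)
    return draws[idx]
```

Predicting at the observed inputs themselves (K_sf = K_ff, K_ss = K_ff) collapses the conditional onto f, a useful sanity check.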
In the case of a sensitive likelihood, one may improve the resampling procedure by using annealed importance sampling (see Neal (2001) for details).

With a sample of local volatility predictions in hand, we approach the predictive distribution over call prices (25). To this end, we first map {f⋆^(i), f^(i)}_{i=1}^M through C(·, x⋆) to obtain a sample {C⋆^(i)}_{i=1}^M from the predictive distribution over fair call prices. We use this in turn for representing the distribution over predicted data (24) as a Gaussian centred around the mean over predicted fair prices, with a covariance matrix from the law of total variance, i.e. mean({σ_ε^{2,(i)} I}_i) + cov({C⋆^(i)}_i).
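The Gaussian summary of the predicted-data distribution, with its covariance from the law of total variance, is a one-liner per moment; a sketch (array shapes and names are our own):

```python
import numpy as np

def predicted_data_moments(C_star_samples, sigma_eps_samples):
    """Gaussian representation of (24): mean over predicted fair prices and
    covariance mean_i(sigma_eps_i^2 I) + cov_i(C*_i) (law of total variance).

    C_star_samples : (M, d) array of predicted fair prices.
    sigma_eps_samples : (M,) array of posterior noise standard deviations.
    """
    mean = C_star_samples.mean(axis=0)
    noise = np.mean(sigma_eps_samples ** 2) * np.eye(C_star_samples.shape[1])
    cov = noise + np.cov(C_star_samples, rowvar=False)
    return mean, cov
```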
Figure 2: Left:
Strikes and maturities of observed S&P 500 call options as of 4 January 2010.The red and blue dots show all 473 market observations; blue dots show the subset ˆ x of 175 pointsused for the experiments (the market set). The strikes and maturities of this set span the gridshown in grey, from which all nodes are collected as the input set x (i.e. the Cartesian productof all unique strikes and maturities available in ˆ x ). Right:
Black-Scholes volatilities implied by mid-market prices (corresponding bid-ask prices are shown in Figure 1).
In the following sections, we put our approach into practice with an example based on market data from the S&P 500 index. We draw posterior inference about local volatility, and demonstrate the benefit of our probabilistic formulation with a focus on parameter uncertainty, i.e. to what extent we can be confident about the local volatility estimate we learn from data. In our view, the Gaussian process framework provides a major benefit in that the inference process is straightforward. Further, we demonstrate posterior prediction of both local volatility and call prices. We focus on uncertainty in the predictions, while the setup still allows for straightforward and robust inference without additional extrapolation assumptions.
Figure 2 shows quoted strikes and maturities, and implied volatilities, from S&P 500 call options as observed on 4 January 2010. The data is pre-filtered following Constantinides et al. (2013): quotes are removed having any of (i) zero bid prices, (ii) less than 7 days to maturity, or (iii) implied volatility below 0.05 or above 1. Corresponding bid and ask prices are shown in Figure 1 (left).

As seen in Figure 2, the quotation density is higher for options with short maturities (less than three months) and strikes close to the current spot price $1133, shown with a vertical dashed line—options close to "at-the-money". Quotes are sparser at later maturities, and the same holds for options away from the at-the-money strike level: for calls deep "out-of-money" (strikes > $1750) there is only a handful of quotes, while no options are quoted deep "in-the-money" (strikes < $750).

To reduce the computational cost, we select a subsample from the full (pre-filtered) market data as the set {ĉ, x̂} we use for inference. This is illustrated in Figure 2, where x̂ is plotted with blue dots while red dots show excluded data. The selection is based on a simple algorithm that sequentially includes the strike level with the highest number of available quotes across maturity, until a fixed number of strikes are included in x̂. For each selection, only strikes lying outside the range of strikes available in the current x̂ are considered as candidates. The set is then pruned, by sequentially excluding the maturity with the smallest number of quotes, until a fixed number of maturities remain in the set. The purpose of this procedure is to end up with a grid x = T̂ × K̂ (where T̂ and K̂ are the maturities and strikes included in x̂) of a reduced size. The full grid spanned by the data is of size 103 × 14 = 1442 with 473 observations (33%), while the selected grid x is of size 20 × 8 = 160 and x̂ has 117 observations (73%). The computational saving is in having to handle the covariance matrix associated with f = f(x) of N = 160 values, while the full data would require N = 1442.

To set up an operational Gaussian process model, we scale x to lie in the unit square and set κ_{i,max} = 1, ∀i, for the covariance hyperprior, together with σ_{ε,max} = 0.75 for the likelihood noise and a corresponding bound m_max for the mean level on the log scale. κ-ranges of 0–1 should provide more than enough expressiveness. For the likelihood, the maximum noise standard deviation 0.75 is of the same magnitude as the average bid-ask spread of 3.6. Note, however, that the noise models the deviance of the mid-market price from the unobserved fair price, not the (half) bid-ask spread.

Calibration
In the first part of our experiment, we calibrate the local volatility model to the S&P 500 mid-market call prices as described in Section 3.5, by sampling from the joint posterior (13) under a squared exponential covariance. We generate 50,000 states from the Markov chain, discard the first 10,000 as burn-in and subsample the remainder to a thinned sample of size 1000. Figure 3 (left) represents the posterior with a credible region of ±2 standard deviations.

Figure 3:
Local volatility calibrated to S&P 500 call prices.
Left: credible envelope of ±2 standard deviations. Right: maximum a posteriori surface. Dashed black lines indicate cross sections taken at six maturities and shown in Figure 4.
The shapes of the credible intervals in Figure 4 are symptomatic of two related features: non-linearity of the pricing/likelihood function, and data availability across strikes and maturities. For short maturities and strikes K ≫ S, there is not much chance of the option reaching in-the-money before its (short) time to maturity, and hence its price is close to zero; the option is likely to expire worthless. Likewise, for K ≪ S there is little chance of going out-of-money and the option's price is close to its intrinsic value S − K (the remaining time-value is small in both cases). In effect, the price insensitivity makes the likelihood uninformative about local volatility over these strike-maturity points, i.e. the probabilistic inference process attributes more uncertainty to the local volatility values. For the same reasons there is little trading in these options, which also contributes to an increased posterior uncertainty through lack of data. Market quotes are concentrated around at-the-money, K = S, and these options are also the most sensitive to changes in local volatility, since the volatility affects the prospects of moving out of/into the money. The overall effects can be seen in the top panes of Figure 4: a high degree of estimation uncertainty is attached to the local volatility at large and small strikes, while close to at-the-money there is almost no posterior uncertainty in the estimate.

For later maturities the price sensitivity increases, also for strikes away from at-the-money. With time, the volatility has an increased effect on the probability that the option leaves its in- or out-of-money region—in view of (1), it is a parameter of diffuseness of the underlying. For long maturities, local volatility therefore has a higher impact on option prices across all strikes. In effect, the credible intervals of the bottom panes in Figure 4 show rather constant uncertainty across strikes. Compared to earlier maturities, the band is wider around at-the-money. Data availability is also a reason, since there is a large gap in market quotes between maturities of 2 and 3 years—see Figure 2.
Figure 4:
Posterior distribution over local volatility, represented by the maximum a posteriori surface (solid grey line) and a credible interval of ±2 standard deviations, shown as cross sections across strike at maturities 0.13, 0.28, 0.48, 0.74, 0.99, 1.5, 2 and 3 years.
Figure 5:
Posterior distribution over implied volatilities: maximum a posteriori surface (solid grey line) and ±2SD credible interval. Data used for calibration is plotted with blue dots, while red asterisks show the excluded market data (see Figure 2).

The posterior sample of local volatility surfaces gives a probabilistic representation of the local volatility function. This is interesting in its own right, as it provides information on what can be learned (and not learned) from call price data about the latent function, i.e. to what degree we may pin down σ-values over different strike-maturity points. The result—point-estimate(s) and associated uncertainty—can then be used when employing the model for secondary purposes. For example, if pricing a derivative which is sensitive to volatility over points with high calibration uncertainty, one should take this into account for a robust price estimate. Next we use the posterior sample to re-price the calibration instruments.

Re-pricing market options
To see how well the calibrated model explains market data, we transform the posterior sample over local volatility to samples over call prices and implied volatilities. The result is represented in Figure 5 for implied volatilities. The figure shows that the MAP surface is close both to the data used for calibration (blue dots) and to the market data excluded from this set (red asterisks). The only exceptions are some out-of-money implied volatilities at the shortest maturity, as shown in the top left pane of Figure 5. This confirms the fact that the local volatility model is often inconsistent with option market prices at short maturities (Gatheral (2011)). An explanation is that the underlying stock price is a continuous process in the local volatility model, while the market anticipates sudden price changes, "jumps", which contribute to more pronounced implied-volatility skews (as observed in the top left of Figure 5).

Implied-volatility MAP-to-market errors are within ±0.02, and within ±0.007 if excluding the shortest maturity. Both indicate a good re-pricing performance of our approach (still, recall that the likelihood is based on errors of model and market prices, not implied volatilities), competitive with what is reported in the literature: Hamida and Cont (2005) report ∼ ±0.04 for a set of DAX options, Luo and Liu (2010) a few percentage points.

The uncertainty in the local volatility estimate propagates to only modest credible intervals for implied volatility, especially over the in-the-money region. The reason is, again, model price/implied volatility (in)sensitivity to the local volatility parameter—even if there is a high level of estimation uncertainty in local volatility over some strike-maturity regions, it is not transferred back to uncertainty over prices/implied volatilities. This robust behaviour of call option prices should, however, not be taken for granted for other derivatives, especially if they are designed to be heavily dependent on volatility.

If we plot confidence regions over call prices they appear completely tight, with all market prices indistinguishable from the MAP estimate (figures not shown for brevity). Prices are $0–500, while their posterior standard deviations are small, at most $0.6. Figure 6 represents the posterior over call prices on a logarithmic scale together with bid-ask spreads, from which mid-prices are the observed data. For the earliest maturity in the left figure (with the largest implied-volatility errors), most bid-ask spreads—also for excluded data—enclose the credible region over the model price, even if the mid-price sometimes falls outside. The fair price described by the posterior is hence permitted to be within the market spread. This is of course desirable, since otherwise model and market would disagree on arbitrage. For out-of-money options where prices are relatively small (K > $1200), the noise is dominating (the posterior mean of σ_ε is 0.45), such that mid-prices (and spreads) fall within their ±2SD noise bands.

Prediction
We apply the predictive distributions of Section 3.4 to predict local volatility over unseen inputs. As an illustrative example, we consider long-dated options and extend the input grid to maturities 4–10 years. We generate a prediction sample of 1000 local volatility surfaces by direct simulation: each of 1000 randomly selected states (f, κ, m) from the posterior sample is used to calculate the moments of the conditional prior in (20), and we then generate a local volatility surface from every such set of moments.
Figure 6:
The posterior distribution over call prices on a logarithmic scale, for the shortest (left) and longest (right) maturities of the market data used for calibration. Note that while the mid-market price is used for inference about the fair price, the figure shows quoted bid-ask spreads.

The result is shown in Figure 7 (left) as a credible envelope of ±2SD.
This is a consequence of the predictive properties of a Gaussian process. For points far away from the input grid there is not much predictive power from data, as the conditional covariance declines, and the flat prior mean yields optimal predictions—see (23), where the second term, in both moments, goes to zero.

The local volatility sample is used to predict call prices (Figure 7, middle), which in turn are transformed to a sample over implied volatilities (Figure 7, right). Both figures illustrate how prediction uncertainty associated with local volatility propagates to prices and implied volatilities: long-dated options can only be consistently priced subject to an increasing uncertainty, the range of which is substantial for the long-dated maturities.

In Figure 8 we show predictions at a maturity of 12 days. This is the shortest maturity of the original market data, but it was not included in the set {ĉ, x̂} used for inference (see Figure 2). As we consider predictive inputs close to the input grid, we obtain a predictive interval similar in shape to that of the calibration (cf. Figure 4, top left). Transformed to call prices and implied volatilities, the predicted surfaces are again too flat in the out-of-the-money region, confirming that the local volatility model is incapable of capturing market behaviour at very short maturities.

Comparison with a Matérn covariance
Figure 9 (left) shows the credible envelope over local volatility from the posterior sample generated under a Matérn 3/2 covariance (16). As encoded by this prior, surfaces are generally less smooth compared to those of a squared exponential, cf. Figure 3 (left). Still, credible regions over implied volatility from the Matérn covariance (not shown for brevity) are almost identical to those of the squared exponential shown in Figure 5. In terms of call prices, this covariance thus produces a

[Figure: three surfaces over strike and maturity: local volatility (left), call price (middle), implied volatility (right).]
Figure 7: Left:
Posterior predictive distribution over local volatility for maturities of 4–10 years: credible interval of ±2SD. Middle:
Predicted call prices. The figure is rotated to better show prediction uncertainty for long-dated options.
Right:
Implied volatilities corresponding to predicted call prices. Compared to the calibration (Figure 5), the prediction uncertainty is generally higher (same z-axis scale in both figures) and notably increasing in maturity.
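The transformation from a local volatility draw to option prices can be sketched with a simple Monte Carlo scheme. This is only a stand-in for the call price operator C used in the text; the toy surface, the zero rates and all names are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_vol(t, s):
    # toy surface standing in for one posterior draw of local volatility
    return 0.2 + 0.1 * np.exp(-t) * np.abs(np.log(s))

def mc_call(sigma, s0, strike, maturity, n_paths=40000, n_steps=50):
    # log-Euler scheme for dS = sigma(t, S) S dW, zero rates and dividends
    dt = maturity / n_steps
    s = np.full(n_paths, float(s0))
    for j in range(n_steps):
        v = sigma(j * dt, s)
        z = rng.standard_normal(n_paths)
        s *= np.exp(-0.5 * v**2 * dt + v * np.sqrt(dt) * z)
    return float(np.mean(np.maximum(s - strike, 0.0)))

# applying this map to every surface in a posterior sample turns the
# distribution over local volatility into a distribution over prices
price = mc_call(local_vol, s0=1.0, strike=1.0, maturity=1.0)
```

Mapping each sampled surface through such a pricer, and then inverting for implied volatility, is what propagates the local volatility uncertainty to the price and implied volatility panels.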
[Figure: local volatility (left) and implied volatility (right) against strike at maturity 0.033 years; legend: ±2SD predictive; data, unobserved.]
Figure 8:
Predictive distribution at a maturity of 12 days (the shortest of the original data set). Credible interval of ±2SD.

[Figure: local volatility surface over strike and maturity (left) and at-the-money cross-section along maturity (right); legend: ±2SD SE, ±2SD Mat32, pseudo obs.]
Figure 9: Left:
Local volatility calibrated to S&P 500 call prices with a prior Matérn 3/2 covariance; credible envelope of ±2SD. Right:
Cross-section along maturity taken at the at-the-money strike level for the squared exponential (light grey) and Matérn (shaded dark grey).

similar distribution over the observed input set. What is slightly different is how the distribution is induced by the distribution over local volatility. The squared exponential assigns more probability to smooth surfaces while it has to work with shorter length scales: posterior sample averages of the maturity length-scale are l̄_T = 0.3 for the squared exponential against l̄_T = 0.7 for the Matérn covariance. The approximate model evidence decomposes as

p(ĉ | M_i) ≈ p(ĉ | f_map, σ_ε^map) × ( |K_{f|ĉ}| / |K_{κ_map}| ) × ( |K_{κ,m,σ_ε|ĉ}| / |I| ),

with the first factor the best-fit likelihood and the second and third the Occam factors for f and the parameters, respectively. This is the best-fit likelihood achieved by the model times an Occam factor which penalises the model for having parameters. The Occam factor is a ratio of volumes of the parameter space: the posterior accessible volume to the prior accessible volume. The denominator thus represents the prior range of possible parameter values, while the numerator represents

[Figure: local volatility (left) and implied volatility (right) surfaces over strike and maturity.]
Figure 10:
Posterior predictive distribution under a Matérn 3/2 prior, illustrated by credible intervals of ±2SD.

the posterior range. Prior model complexity is penalised by the former, while models which do not have to be fine-tuned to data are favoured by the latter. The log-evidence for the squared exponential covariance is −[…]4, of which the Occam factors are 0.45 for the Gaussian process and 0.52 for the hyperparameters. The log-evidence for the Matérn covariance is −[…].

Up to this point, we have considered local volatility calibration based on data observed at a single time point t = 0, i.e. in view of "today's" call prices. While the local volatility model is able to consistently and effectively fit the option market in this static sense—see the discussion of equation (4)—it is also known to have poor properties in that the optimal calibration as of today will not yield an optimal calibration as of tomorrow. For this reason, the local volatility model has to be recalibrated, say daily, to be useful in practice (see the discussion in Hull and Suo (2002)).

This motivates us to introduce a third input dimension in our prior model that represents time-dependency of local volatility,

σ : (t, T, K) ↦ σ(t, T, K),    (35)

where we use a variable (t, x) ≡ (t, T, K) in the augmented input space X̄ = R_+ × X. Our Bayesian framework with Gaussian processes readily generalises to this setting, such that we may draw posterior inference about local volatility over (calendar) time.

Augmented probabilistic model

To accommodate (35), we take X̄ to be the input space of our Gaussian process prior and introduce a time component in the covariance function with a separate length-scale. For the Matérn 3/2 covariance function,

k(t_i, x_i; t_j, x_j; κ) = σ_f² k_Mat32(T_i, T_j; l_T) k_Mat32(K_i, K_j; l_K) k_Mat32(t_i, t_j; l_t),    (36)

where κ = (l_T, l_K, l_t, σ_f) is the vector of covariance parameters. For the observation model, we assume independent and identically distributed noise across time,

log p(ĉ | f, σ_ε) = −(1 / 2σ_ε²) Σ_{t=1}^{τ} Σ_{i=1}^{n_t} (C(x̂_{t,i}, f_t) − ĉ_{t,i})² − (n_tot / 2) log(2πσ_ε²).    (37)

We introduce a separate time index t ∈ {1, 2, . . .
, τ}, such that for a set of n_tot observations ĉ = {ĉ_t}_{t=1}^τ, each ĉ_t collects a surface ĉ_t = {ĉ_{t,i}}_{i=1}^{n_t} of n_t mid-prices observed at a single date t. The index i refers to the corresponding strike-maturity x̂_{t,i} = (t, T_i, K_i) of the option at that date. Similarly, we use x_t to denote an input set of N_t strike-maturities as of date t, and x = {x_t}_{t=1}^τ for the full input set over all dates (and likewise for the market sets x̂_t and x̂). We follow the same convention for functional evaluations: f_t = f(x_t) for the local volatility surface at t, and f = {f_t}_{t=1}^τ for the collection of local volatility surfaces over all τ dates or, equivalently, for f evaluated at every input of the full input set x. Together with the hyper-prior over (κ, m, σ_ε), where l_t is assumed independently distributed according to (17), we obtain a joint posterior over functional values and hyperparameters of the same form as (13). With the shorthand notation z_{1:τ} ≡ {z_t}_{t=1}^τ,

p(f_{1:τ}, κ, m, σ_ε | ĉ_{1:τ}) = (1 / p(ĉ)) × p(ĉ_{1:τ} | f_{1:τ}, σ_ε) × p(f_{1:τ} | κ, m) × p(κ, m, σ_ε),    (38)

with the three factors on the right being the likelihood, the f-prior and the hyperprior.

Inference
Sampling the full posterior (38) with the strategy outlined in Section 3.5 quickly becomes prohibitive, as the computational cost of making proposals grows cubically with the total number of inputs over all τ dates. Critically, the observed strikes and maturities change over time, such that x no longer has Kronecker structure. For this reason, we exploit the time-structure of the problem and approach the sampling of (38) with a sequential method proposed by Tegnér et al. (2018).

To this end, consider sampling f_t when a new set of market data {ĉ_t, x̂_t} arrives at time t. For given hyperparameters and past values f_{1:t−1}, this means targeting the conditional

p(f_t | f_{1:t−1}, κ, m, σ_ε, ĉ_{1:t}) ∝ p(ĉ_t | f_t, σ_ε) p(f_t | f_{1:t−1}, κ, m),    (39)

where we use that all terms of the full likelihood drop out except the most recent, p(ĉ_t | f_t, σ_ε), due to the factorising form of (37). This is blocked sampling of f = {f_t, f_{1:t−1}} with f_{1:t−1} fixed. We can therefore apply the updating step for functional values described in Section 3.5 by treating the predictive prior in (39)—defined by the predictive equations (22)—as the Gaussian variable of (27). The cost of this, however, increases cubically with t, and as a remedy we apply a simple form of data selection,

p(f_t | f_{1:t−1}, κ, m) ≈ p(f_t | f_{t−k:t−1}, κ, m),    (40)

limiting the conditional dependency to the k ≥ 1 most recent surfaces and keeping the cost bounded for a fixed k. It (re)introduces more variance in the conditional posterior over f_t through its approximative "acting prior" (40), which is the proposal distribution of the elliptical slice sampler. This can potentially have beneficial effects for the sampling efficiency—see e.g. Jacob et al. (2017) for related ideas.

While sequential sampling of functional values is relatively straightforward, the computational challenge is updating hyperparameters from their posterior conditional on the growing sequence of functional values.
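A sketch of the separable covariance (36) and the truncated conditional prior (40), with illustrative grids and hyperparameter values; the helper names are not from the paper:

```python
import numpy as np

def matern32(u, v, ls):
    # Matern 3/2 correlation in one input dimension
    r = np.abs(u[:, None] - v[None, :]) / ls
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def k_sep(X1, X2, l_T, l_K, l_t, s_f):
    # separable covariance over inputs (t, T, K), cf. (36)
    return (s_f**2
            * matern32(X1[:, 0], X2[:, 0], l_t)
            * matern32(X1[:, 1], X2[:, 1], l_T)
            * matern32(X1[:, 2], X2[:, 2], l_K))

def lagged_conditional(X_new, X_lag, f_lag, m, theta, jitter=1e-8):
    # moments of p(f_t | f_{t-k:t-1}), the truncated conditional prior (40)
    K_ll = k_sep(X_lag, X_lag, *theta) + jitter * np.eye(len(X_lag))
    K_nl = k_sep(X_new, X_lag, *theta)
    K_nn = k_sep(X_new, X_new, *theta)
    A = K_nl @ np.linalg.inv(K_ll)
    return m + A @ (f_lag - m), K_nn - A @ K_nl.T

# toy grids: k = 2 lagged dates and one new date, 6 strike-maturity points
TK = np.array([[T, K] for T in (0.5, 1.0) for K in (0.8, 1.0, 1.2)])
X_lag = np.vstack([np.c_[np.full(len(TK), t), TK] for t in (0.00, 0.02)])
X_new = np.c_[np.full(len(TK), 0.04), TK]
theta = (0.5, 0.3, 1.0, 0.2)  # (l_T, l_K, l_t, sigma_f), illustrative
f_lag = 0.2 + 0.02 * np.sin(np.arange(len(X_lag)))
mean, cov = lagged_conditional(X_new, X_lag, f_lag, m=0.2, theta=theta)
```

In the sequential sampler these moments define the proposal of the elliptical slice update for f_t; conditioning on only k lags keeps the matrix sizes, and hence the cubic cost per step, fixed over time.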
At time t, the target is

p(κ, m, σ_ε | f_{1:t}, ĉ_{1:t}) ∝ p(ĉ_t | f_t, σ_ε) p(f_t | f_{1:t−1}, κ, m) p(f_{1:t−1}, κ, m, σ_ε | ĉ_{1:t−1}),    (41)

where the rightmost distribution—the posterior from the previous time-step—is the challenging factor. We approach (41) by breaking the dependency between parameters and functional values of the previous posterior, and approximate the parameter marginal based on the hyperprior assumption (17). We thus target the vector z of the transformation (κ, m, σ_ε) = ssg(z), and take

p(f_{1:t−1}, z | ĉ_{1:t−1}) ≈ N(z | m_{z,t}, K_{z,t}) q(f_{1:t−1}),    (42)

where m_{z,t} and K_{z,t} are the sample mean-vector and covariance matrix estimated from the parameter states generated at the previous sequential step t −
1. Combining (42) and (40) with (41), we can again apply the updating steps for hyperparameters of Section 3.5, at a cost capped by the fixed k. Here we note that the marginal term with functional values in (42) is irrelevant, as is (40) when updating (m, σ_ε). When updating κ, the conditioning in (41) prescribes a strong informativeness about f_t relative to the likelihood, such that an update based on (28) may be preferred to the more involved surrogate-data slice sampling.

In all, we arrive at a sequential procedure: given the time-(t−1) sample {f^{(l)}_{1:t−1}, κ^{(l)}, m^{(l)}, σ_ε^{(l)}}_{l=1}^M, we continue to build up the vector of functional values by adding a sampled f^{(l)}_t based on f^{(l)}_{t−k:t−1}, and then update κ^{(l)}, m^{(l)} and σ_ε^{(l)} for l = 1, . . . , M. For the initial time step t = 1, we use the prior methodology of Section 3.5 (i.e. the generated sample of Section 4.2).

Data and set-up
We use market data of S&P 500 call options from the period 4 January 2010 to 31 December 2010. We consider weekly observations, i.e. 52 surfaces of call prices, where each set is the result of the data-preparation procedure of Section 4.1. For every observation time t = 1, . . . ,
52, we construct a calibration set {ĉ_t, x̂_t} and an input set x_t to include 20 different strikes and 8 maturities of the original data. These are illustrated in Figure 11 in terms of implied volatilities. In terms of strikes and maturities, the distribution of market observations x̂_t over the input set x_t is fairly constant across time—similar to what is observed and discussed with Figure 2 (left)—as is the coverage.
[Figure: weekly implied volatility data; three panels described in the caption.]
Figure 11:
Weekly data of S&P 500 options from 4 January 2010 to 31 December 2010.
Left:
Implied volatilities from the market data {ĉ_t}_{t=1}^τ, each surface grouped together by its observation time t. The black line shows the mean of each implied volatility surface. Middle:
Implied volatility surface of 24 May 2010. This surface has the highest mean value.
Right:
Implied volatility surface of 27 December 2010 (the last date of the dataset).

We keep the hyperprior assumptions of Section 4.1, along with l_{t,max} = 1 for the calendar-time length-scale. We thus scale the observation times to lie in the unit interval.

Calibration
We run the sequential method of the previous section to obtain a sample {f^{(l)}_{1:52}}_{l=1}^M, where a set of surfaces f^{(l)}_{1:52} = {f^{(l)}_1, . . . , f^{(l)}_{52}} is the l-th generated state of a local volatility time-series. Similarly, for a single time t, the sample {f^{(l)}_t}_{l=1}^M represents the calibration—the posterior over local volatility—as of that date. Figure 12 illustrates posteriors at nine different time points, as represented by their credible envelopes.

To assess the calibration in terms of re-pricing performance, we consider the root mean squared error from each sampled local volatility surface,

RMSE(f^{(l)}_t) = sqrt( (1/n_t) Σ_{i=1}^{n_t} ( σ_BS(x̂_{t,i}, f^{(l)}_t) − σ̂_BS,t,i )² ),    (43)

where σ_BS(x̂_i, f) and σ̂_BS,i are the volatilities implied by C(x̂_i, f) and ĉ_i, respectively. Figure 13 (left) shows the root mean squared error obtained from the MAP surface for each t, along with the mean and minimum across each sample (the latter not necessarily the same as the error achieved by the MAP estimate). The minimum measure is fairly stable over time, which indicates that the sequential sampling method yields satisfying results. In particular, note that the minimal errors of the entire period do not appear for the t = 1 sample.

Figure 13 (right) shows the corresponding errors in terms of call prices (for which MAP and minimum coincide). Here, the initial sample t = 1 produces the smallest (minimum) error, since it results from a Markov chain with many more states (50,000 vs. 1000) and since it has a much less informative prior over both f and its hyperparameters, while consecutive samples draw on information from their predecessors. After all, we are interested in inferring each individual local volatility surface as well as their dependency over time.
[Figure: nine local volatility surfaces over moneyness and maturity, with credible envelopes.]
Figure 12:
Local volatility calibrated to S&P 500 call prices from different dates during 2010. The figures show credible envelopes and MAP estimates as extracted from the posterior sample of each time point.

[Figure: root mean squared errors over January to December 2010; legend: mean, min, MAP.]
Figure 13:
Root mean squared errors from the local volatility sample at each time t . Left:
Errors in terms of implied volatility from the MAP surface, along with the mean and minimum across the sample.
Right:
Errors in terms of call prices of the MAP local volatility.

To be able to do the latter, especially from a computational point of view, we must expect to sacrifice precision in the former.

The poorest calibration, in terms of the largest call-price RMSE (at MAP), appears on 24 May 2010. This coincides with the largest observed average implied volatility; see the left and middle panes of Figure 11. The latter followed a rapid rise in volatilities after the
Flash Crash of 6 May 2010. The overall calibration error mostly stems from errors at the shorter end of maturities (as also observed in Section 4.2)—see Figure 14 for implied volatilities.
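The implied-volatility error measure (43) can be sketched as follows, using bisection to invert the Black-Scholes formula; zero rates, all names and the round-trip numbers are illustrative assumptions:

```python
import math

def bs_call(s, k, t, vol):
    # Black-Scholes call price, zero rates and dividends
    d1 = (math.log(s / k) + 0.5 * vol**2 * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s * N(d1) - k * N(d2)

def implied_vol(price, s, k, t, lo=1e-4, hi=5.0, tol=1e-10):
    # invert bs_call by bisection: the price is increasing in vol
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def rmse_implied(model_prices, market_vols, strikes, maturities, s0):
    # root mean squared error in implied-volatility units, cf. (43)
    errs = [implied_vol(p, s0, k, t) - v
            for p, v, k, t in zip(model_prices, market_vols, strikes, maturities)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# round trip: model prices generated at 20% vol against a 22% "market"
strikes, maturities, s0 = [0.9, 1.0, 1.1], [0.5, 0.5, 0.5], 1.0
model_prices = [bs_call(s0, k, t, 0.20) for k, t in zip(strikes, maturities)]
rmse = rmse_implied(model_prices, [0.22] * 3, strikes, maturities, s0)  # ≈ 0.02
```

Measuring the error in implied-volatility units, rather than in dollars, is what makes the RMSE comparable across strikes and maturities with very different price levels.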
Prediction at a future time
The sequential calibration approach can be viewed as day-to-day (in our case weekly) re-calibration, aided by the previous date's estimate acting as a regularisation term. What is more, we infer dependency over time in our probabilistic model. We can therefore consider posterior predictions of the local volatility surface at future time-points, along with associated uncertainty measures. To this end, the predictive distribution is obtained by marginalisation.

Suppose we have observations up to and including time τ, i.e. a time-series of market data {ĉ_t, x̂_t}_{t=1}^τ. The s-step forward prediction of local volatility at time τ + s and unseen inputs x*_{τ+s} is then given by

p(f*_{τ+s} | ĉ_{1:τ}) = ∫ p(f*_{τ+s} | f_τ, κ, m) p(f_{1:τ}, κ, m, σ_ε | ĉ_{1:τ}) dκ dm dσ_ε df_{1:τ},    (44)

where the first factor is the conditional prior and the second is the time-τ posterior. In (44) we use that p(f*_{τ+s} | f_{1:τ}, κ, m, σ_ε, ĉ_{1:τ}) = p(f*_{τ+s} | f_τ, κ, m), due to the likelihood (37) factorising over time.

As an example, we consider 1-week and 3-week predictive distributions over local volatility at τ + s = 10 May 2010, i.e. as predicted from τ = 3 May 2010 and τ = 19 April 2010, respectively.
[Figure: implied volatility against strike at maturities 0.15, 0.24, 0.35, 1.1, 1.6 and 2.6 years; legend: ±2SD, MAP, data, excluded.]
Figure 14:
Posterior distribution over implied volatilities on 24 May 2010.

The i-th component of the prediction sample is drawn from the conditional prior p(f*_{τ+s} | f_τ, κ, m), computed with the state (f^{(i)}_{1:τ}, κ^{(i)}, m^{(i)}) from the posterior sample of time τ. For simplicity, we condition on only the most recent functional values f_τ in the conditional prior. The result, based on a prediction sample of 1000 surfaces, is illustrated in Figure 15 for four maturities. In the left column, the 1-step predictive intervals (±2SD) are shown against the intervals realised with the calibration of 10 May 2010. The 3-step predictive intervals in the right column are considerably wider, with a high degree of uncertainty due to the longer prediction horizon.

In a similar fashion, s-step predictions of the call price are obtained by mapping predicted local volatility surfaces through the call price operator. Only here we do not have access to the future stock price S_{τ+s}. Instead, we predict implied volatilities over different levels of moneyness. These are invariant to the stock price level, as C(S_t, T, K = mS_t) and C(S_t = 1, T, K = m) give equivalent implied volatilities. We thus predict the latter at a range of different levels of moneyness and back out their implied volatilities.

Figure 16 represents the 1-step and 3-step predictive distributions over implied volatility at 10 May 2010, corresponding to the predicted local volatility in Figure 15. In terms of credible intervals, the 1-step predictions are fairly similar to the realised distribution. In contrast, with 3-step predictions, the uncertainty propagated from the prediction of local volatility is profound, especially at later maturities.

For a measure of predictive accuracy, we use the root mean squared error of implied volatilities (43), taken at each predicted surface f*^{(l)}_{τ+s}, l = 1, . . . , M. Figure 17 (left)
[Figure: local volatility against moneyness at maturities 0.033, 0.28, 1.1 and 2.6 years; 1-step (left column) and 3-step (right column) predictive intervals against realised intervals.]
Figure 15:
Credible intervals representing the predictive distribution over local volatility as of 10 May 2010. Realised ("true") calibration intervals shown in grey.
Left column:
Right column:

shows the 1-step minimum and mean across l for every prediction time t = τ + s of the data set. If we contrast these implied-volatility prediction errors with their realised correspondents (Figure 17, right, and Figure 13, left), we see that the errors are of the same order of magnitude. While sample-minimum errors of 3-step predictions are only slightly higher—see Figure 17 (middle)—the averages are considerably higher. Again, this is an effect of the longer prediction horizon.

Prediction of the VIX
For a more consolidated view of the predictive power of our approach, we consider the VIX index. The Chicago Board Options Exchange's Volatility Index (VIX) is a benchmark index for stock market volatility. It is based on prices of S&P 500 options through the general formula
VS(T) = (2/T) Σ_i (ΔK_i / K_i²) e^{rT} O(K_i, T) − (1/T) (F_T / K_0 − 1)²    (45)

[Figure: implied volatility against moneyness at maturities 0.033, 0.28, 1.1 and 2.6 years; 1-step (left column) and 3-step (right column) predictive intervals against realised intervals; legend: ±2SD predictive, ±2SD realised, data unobserved.]
Figure 16:
Credible intervals representing the predictive distribution over implied volatility at10 May 2010. 1-step prediction from 3 May ( left column ) and 3-step prediction from 19 April( right column: ). at the two shortest maturities T and T which are greater than 8 days. The index runsthrough all available options, i < i > O ( K i , T ) denotes their price. For i = 0, the price O ( K , T ) is theaverage of the put and call price and the last factor in (45) compensates for the call beingin the money. V S ( T ) and V S ( T ) are interpolated for an estimate at 30 days to maturityand then annualised for the final VIX, see Carr and Wu (2006) for further details.We use (45) as a plug-in formula and map the sample of s -step prices for a predictivedistribution over VIX. Here we also use predictions of C ( S t = 1 , T, K = m ) for differentlevels of moneyness in place of C ( S t , T, K = mS t ) as their VIX calculations (45) areequivalent. For a benchmark, we construct the “true” VIX by mapping correspondingmarket prices through (45). The result is shown in Figure 18 for both the 1-step and 3-steppredictions. While the latter yield very wide intervals, the former show rater encouragingpredictions. For reference, we have included the index calculated with the data as of theprediction date. Compared to this, our approach outperforms.34 . . . . R M SE , i m p li ed v o l a t ili t y Jan 2010 May 2010 Aug 2010 Dec 2010__________________________________________________ |_ meanmin . . . . R M SE , i m p li ed v o l a t ili t y Feb 2010 May 2010 Sep 2010 Dec 2010________________________________________________ |_ meanmin . . . . R M SE , i m p li ed v o l a t ili t y Jan 2010 May 2010 Aug 2010 Dec 2010____________________________________________________ |_ meanmin Figure 17:
Root mean squared errors of implied volatilities at t = τ + s as predicted at τ. Left with s = 1 step predictions, middle with s = 3 step predictions, while right shows s = 0 for comparison, i.e. the calibration errors also shown in Figure 13.

[Figure: the VIX index over time, March through January.]
Figure 18: Predictive distributions over the VIX index.

Conclusion
Whilst the local volatility model in theory provides the flexibility to fit any set of traded option prices, practical estimation is challenging. Indeed, with discrete and finite sets of data, there is no unique local volatility à la Dupire. Thus, along with designing an appropriate model—a parametric functional form is common practice in the literature—come identifiability issues when calibrating to real market prices. Without sufficient regularisation (also imposing structure on local volatility), the optimisation of common objectives is ill-posed. This has motivated us to attempt probabilistic learning for local volatility with a cleaner, nonparametric functional hypothesis model.

In this paper, we proposed to approach the calibration problem of local volatility with Bayesian statistics, inferring a conditional distribution over functions given observed data. Notably, common calibration objectives such as squared errors correspond directly to likelihood assumptions. We introduced the Gaussian process framework for local volatility modelling and defined a prior distribution over functions. The motivation was at least twofold: first, prior domain knowledge, such as smoothness properties, is easily encoded through the functional prior; second, while the model is nonparametric in the sense that the number of parameters adapts with data, its complexity is automatically tuned in the Bayesian inference process. We further leveraged the latter to approach the model selection process itself, simply by considering yet another level of inference. Lastly, we built on the rich literature on Gaussian processes to propose efficient numerical algorithms for training the model to data, i.e. for sampling the posterior distribution over local volatility.

We exemplified the utility of our approach with price quotes from S&P 500 options. This included a practical set-up for the model and for selecting an appropriate set of market data.
In light of the calibration results, we discussed the benefits of a probabilistic representation with a particular focus on uncertainty: its origin as well as its propagation to model derivatives. Further, as a machinery for prediction came almost for free with our approach, we considered probabilistic representations at maturities much longer than those priced by the market, only to see that the uncertainty in such predictions was substantial.

Since local volatility needs to be re-calibrated on a regular basis, we extended our model to account for a time dimension in the input space. We could then infer dependencies of local volatility across time by sequential sampling of the posterior conditional on a series of price surfaces. With our framework already rolled out for predictions, we immediately tried to predict local and implied volatility forward in time. We showed encouraging results for a 1-week horizon, with prediction errors not far off from the corresponding realised calibration errors. Finally, we demonstrated how straightforward it is to employ the set-up for a predictive distribution over the VIX index.

Several extensions of our work are immediate—we have merely proposed a framework. The breadth of the Gaussian process model through its covariance function is extensive—Rasmussen and Williams (2006) give a basic account, while Wilson and Adams (2013) and Wilson et al. (2016) exemplify extensions—and learning the covariance structure can even be included in the inference process; see Duvenaud et al. (2013) for the case of Gaussian process regression. Exploring different covariances is one direction for future research. Similarly, we have only considered a simple data model with homoscedastic noise. Still, alternative calibration objectives, such as weighted least squares and absolute pricing error, directly transfer to likelihood models which can be adopted in our framework.
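The correspondence between calibration objectives and likelihoods can be made concrete: under independent Gaussian noise with per-quote standard deviations σ_i (say, proportional to bid-ask spreads), the log-likelihood differs from the weighted least-squares objective only by a sign, a factor of one half, and a model-independent constant, so both are optimised by the same parameter. A small numerical check (the pricing function and all variable names here are our own stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
strikes = np.array([90.0, 100.0, 110.0, 120.0])
sigma = np.array([0.02, 0.03, 0.05, 0.10])  # per-quote noise SDs, e.g. from spreads

def model_price(theta):
    # Illustrative one-parameter pricing function (not a real option model).
    return np.maximum(100.0 * theta - 0.01 * strikes, 0.0) + 1.0

# Synthetic market quotes generated at theta = 0.5 with heteroscedastic noise.
market = model_price(0.5) + rng.normal(0.0, sigma)

thetas = np.linspace(0.3, 0.7, 401)
wls = np.array([np.sum((market - model_price(t)) ** 2 / sigma ** 2)
                for t in thetas])
loglik = np.array([-0.5 * np.sum((market - model_price(t)) ** 2 / sigma ** 2)
                   - np.sum(np.log(sigma))
                   - 0.5 * len(strikes) * np.log(2 * np.pi)
                   for t in thetas])
```

Maximising `loglik` over the grid picks out exactly the weighted least-squares minimiser, which is why such objectives transfer directly to likelihood models in our framework.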
Here we also stress that more general models with correlated noise (representing beliefs about the bid-ask spreads across strike-maturities) would easily fit into our approach, such that the noise structure may be inferred. Finally, for ease of exposition, we have not looked at model derivatives, such as prices of exotics, nor at hedging quantities or risk measures.

A more fundamental extension of our approach would be to cast the Brunick-Shreve model (Brunick and Shreve, 2013) in a Bayesian setting. The latent diffusion function is here extended to depend on a third state, the running maximum of the stock price. Since the payoff of barrier options depends on the latter, barriers have a similar role for this model as calls have for local volatility. With a pricing equation corresponding to Dupire's equation derived in Hambly et al. (2016), we could thus mimic our prevailing approach with an extended input space of the Gaussian process. However, given the limited availability of market data on barrier options, this set-up would have its natural limitations. We leave this for future work.

References