Consistent Recalibration Models and Deep Calibration
Matteo Gambara∗ & Josef Teichmann†

Abstract
Consistent Recalibration models (CRC) have been introduced to capture in necessary generality the dynamic features of term structures of derivatives' prices. Several approaches have been suggested to tackle this problem, but all of them, including CRC models, suffered from numerical intractabilities, mainly due to the presence of complicated drift terms or consistency conditions. We overcome this problem by machine learning techniques, which allow us to store the crucial drift term's information in neural network type functions. This yields, for the first time, dynamic term structure models which can be efficiently simulated.
∗ [email protected], ETH Zürich, Rämistrasse 101, Zürich.
† [email protected], ETH Zürich, Rämistrasse 101, Zürich.

1 Introduction
Term structures of prices exist in many different markets and belong to the most challenging topics for dynamic modelling in mathematical finance or econometrics. Reasons for this are two-fold: first, we have to deal with potentially infinitely many, strongly dependent prices satisfying mutual relations, and, second, the dynamics of those prices is subject to absence-of-arbitrage conditions. Whence it is a delicate issue to write models which take the full market information as state variables into account, i.e. all available prices, and which evolve at the same time in a way which satisfies all constraints. The problem has been successfully dealt with in the context of bond prices within the Heath-Jarrow-Morton framework [16], but already the term structure of plain vanilla option prices on one underlying $(S_t)$ has so far been too challenging to come up with fully satisfying solutions despite deep theoretical insights. Of course there are more involved term structures stemming from volatility cubes in treasury markets or the combined S&P-VIX term structures. We consider our work a first step in the direction of tractable term structure modelling.

Let us be more precise on this point (we assume an environment free of interest rates): the classical approach to modelling term structures consists in choosing a class of stochastic processes $(S_t)$, which models the price of the underlying with respect to a pricing or physical measure, on a stochastic basis, which encodes the information structure through a filtration. Given market data, one model is selected from the given class via calibration, i.e. the procedure guaranteeing that all market prices are reproduced by the model. Then new products are priced and hedged with the calibrated model.
In the realm of the term structure of option prices on one underlying, parametric models like SABR, Heston or rough volatility models are used for this purpose, as are non-parametric models like local volatility or stochastic local volatility models. This works in many respects very well, but suffers from dynamic shortcomings: newly arriving information leads via recalibration to a new model choice, whence an inconsistency over time in modelling (see, for example, [10], [13], [14], [26]). Term structure models try to overcome this issue by making market prices state variables of the model: the price to pay for this neo-classical approach is complexity. We shall outline, without going into detail, some of the suggested approaches here. Let us denote by $(C_t(T,K))_{0 \le t \le T}$ the stochastic process of prices of plain vanilla options on one underlying $(S_t)$. Here $K$ denotes the option's strike price and $T$ its maturity. The dynamic and static no-arbitrage constraints are expressed on the given stochastic basis by the existence of an equivalent measure $Q$ such that
$$E^Q[(S_T - K)^+ \mid \mathcal{F}_t] = C_t(T,K) \quad \text{and} \quad C_t(T,0) = S_t$$
for $0 \le t \le T$. The question has been raised which codebook should be used to facilitate dealing with those constraints. Roughly speaking, three suggestions have been made, which we shortly introduce here:

• Given the current market price $S_t$ there is a unique (implied) volatility $\sigma_t(T,K)$ such that the Black-Scholes formula BS produces the correct market price
$$\mathrm{BS}(T, K, S_t, \sigma_t(T,K)) = C_t(T,K)$$
for $0 \le t \le T$. This, however, yields two problematic aspects: first, how to deal with the dynamic absence of arbitrage, and, second, how to express the static absence-of-arbitrage conditions for implied volatilities. Under some regularity assumptions it turns out that one can write necessary conditions for such a dynamics by imposing that $(\sigma_t(T,K))_{0 \le t \le T}$ remains within the set of statically arbitrage-free surfaces and that the process $(\mathrm{BS}(T, K, S_t, \sigma_t(T,K)))_{0 \le t \le T}$ is a martingale for all $T, K$, provided that $S$ is one. This case was outlined by Schweizer and Wissel in [28] following the work done by Schönbucher in [27].

• Given the current market price $S_t$ there is a unique local volatility $\sigma(t,s)$ such that the pricing operator $P$ for plain vanilla calls of the local volatility equation
$$dX_r = \sigma(r, X_r)\, dB_r, \quad X_t = S_t$$
produces the market prices $C_t(T,K)$ for $T \ge t$ and all $K$. Here the situation is simpler, since local variance must simply be non-negative; however, the dynamic absence of arbitrage is more involved: first, we impose that $S$ has to be a martingale, and, second, we need that $(P(S_t, \sigma_t(\cdot,\cdot), T, K))_{0 \le t \le T}$ is a martingale, too, for all $T, K$. This involves a full-fledged solution operator of local volatility equations (see [2] for more information).

• Given the current market price $S_t$ there is a unique time-dependent Lévy process with triplet $(b_t, c_t, \nu_t) = L_t$ such that the pricing operator $P$ of the corresponding exponential Lévy martingale produces the market prices $C_t(T,K)$ for $T \ge t$ and all $K$, just as before. Here the situation is slightly more complicated than in the case of local volatility, since Lévy triplets are more complicated objects than non-negative functions of two variables. On the other hand, the pricing operator is considerably simpler due to Fourier pricing. We impose again that $S$ is a martingale, and that $(P(S_t, L_t, T, K))_{0 \le t \le T}$ is a martingale, too, for all $T, K$. This approach has been developed at the same time by Carmona and Nadtochiy ([2]) and by Kallsen and Krühner in [21], still with different choices of codebooks. Therefore, in the following, we will refer to this approach as CNKK from the authors' initials.

It is the goal of this article to make the CNKK approach work in the setting of Consistent Recalibration Models, i.e. where we consider tangent affine models. Basically speaking, this amounts to storing the information of a non-linear drift operator in a neural network in an optimal way, in the case when the time evolution is locally mimicking a dynamically changing affine model. This is the simplest way to consistently construct term structure dynamics which do not come from finite dimensional realizations.

Actually this information, which is stored in the drift, corresponds to solving an inverse problem or a calibration problem, see [6] and the references therein for a general background on this problem: more precisely, it is the inversion of the above mentioned pricing operators given the current market state of the underlyings' price and the term structure of derivatives' prices. Let us outline this in the case of the Lévy codebook: there the inverse problem corresponds to calculating the time-dependent Lévy triplet $L$ given the price of the underlying $S$ and the term structure of derivatives' prices. Even though the map from model characteristics to prices is usually smooth, it is, due to smoothing properties, often hard to invert: existence, uniqueness and stability issues (in the sense of Jacques Hadamard) appear. Machine learning technology provides one way to fix these issues, which otherwise require sophisticated regularization techniques, by implicit regularization, see, e.g., [18].
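The first codebook above already illustrates such a pricing-operator inversion in its simplest form: recovering $\sigma_t(T,K)$ means inverting the Black-Scholes formula in its volatility argument, which is well posed strike by strike since the price is strictly increasing in $\sigma$. The following minimal sketch works in the zero-interest-rate setting of this paper; all numerical values are illustrative:

```python
from math import erf, log, sqrt


def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, T, sigma):
    # Black-Scholes call price with zero interest rates
    d1 = (log(S / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * norm_cdf(d2)


def implied_vol(C, S, K, T, lo=1e-6, hi=5.0, tol=1e-10):
    # bisection: bs_call is strictly increasing in sigma, so the
    # inverse problem has a unique solution in (lo, hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, mid) < C:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Round-tripping a price through `implied_vol` recovers the input volatility; the stability issues mentioned above only appear once one inverts a whole surface jointly rather than strike by strike.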
In other words: learning the map from derivatives' prices (given the current price of the underlying) to model characteristics $L$ and storing the information in a neural network provides an accurate map satisfying Hadamard's requirements. We shall pursue this approach not in the originally proposed way by solving a supervised learning problem, see, e.g., [19], but rather by first storing the information of the pricing operator in a neural network and then inverting this network, compare here, e.g., [20].

One could also view the current model as an unusually parameterized neural stochastic differential equation (NSDE) model, see, e.g., [6] for details on this concept. NSDEs, i.e. stochastic differential equations with neural network characteristics, are a wonderful concept to construct non-parametric models, but it is quite delicate to write constrained dynamics with neural networks. Therefore we have chosen Consistent Recalibration Models with tangent affine models, where it is easier to express constraints in terms of neural networks for the drift. A non-parametric approach alluding to NSDEs (and based on random signature methods, see [7]) will be presented in upcoming work.

The remainder of the article is structured as follows. In the second section, we introduce the concept of a Lévy triplet codebook, as shortly alluded to above, and we define consistent recalibration (CRC) models. We also briefly review affine models and embed stochastic volatility affine models in the context of CRC models, outlining some key properties of this codebook. The third section is dedicated to one of the building blocks of the whole theory: generalised Hull-White extensions. We do not simply use the Hull-White extension, as it was exploited when first defined, for the calibration at the initial time of the term structure in interest rate models, but we think of it as a tool that allows for recalibration of the model parameters.
Further, we talk about a generalised extension, since we are replacing the pure drift addition typical of interest rate models with a Lévy process, which naturally encodes a greater calibration power. To make things clearer, an example is laid out in the fourth section, where we analyse how the generalised Hull-White extension is added to the Heston model in order to get a consistent recalibration model, which we call a generalised Bates model. The same example is important because a very similar version of this model has been implemented numerically. The fifth section is devoted to the CNKK equation: how it is derived and defined, and how it can be seen as a generalisation of the HJM equation. Section 6 is dedicated to the formal definition of CRC models of stochastic volatility affine models with piecewise constant model parameters for pricing stocks' derivatives. Some numerical considerations are also listed, to show what the most relevant aspects to deal with are in case of a concrete implementation. One of these points is the main subject of Section 7, where we discuss calibrating the model using a neural network; subsections are dedicated to the available literature and to the architecture and training techniques used to achieve the final result. A geometric interpretation is presented in Section 8, starting from the theory of finite dimensional realisations and its connections with Frobenius' theorem. In the end, the conclusion summarises the main results and novelties of the paper.

Notation.
The set $\mathbb{N}$ denotes the set of natural numbers with $0$ included; $\mathbb{R}^m_+$ the real vectors in $\mathbb{R}^m$ whose components are greater than $0$. With $L(X)$ we denote the set of $X$-integrable predictable processes for a semimartingale $X$. If we talk about $(X,Y)$ as an $(m+n)$-semimartingale, we mean that $X$ is an $\mathbb{R}^m$-valued semimartingale and $Y$ is an $\mathbb{R}^n$-valued semimartingale. Whenever we apply the complex logarithm to continuous functions $\mathbb{R}^n \ni u \mapsto f(u) \ne 0$, we use the normalisation $\log f(0) = 0$, so that logarithms are uniquely defined. For the sake of readability, SDE and SPDE will be used as acronyms for stochastic differential equation and stochastic partial differential equation, respectively.

2 Consistent Recalibration Models
We take inspiration from the seminal paper of Kallsen and Krühner [21], but we place ourselves in a more general framework that does not necessarily rely on the infinite divisibility of the processes. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ denote a filtered probability space, where $Q$ represents a risk-neutral measure, so that discounted asset prices are supposed to be $Q$-martingales. All expectations, if not differently specified, are also taken with respect to the same probability measure and are denoted by $E$. We consider an adapted (multivariate) stochastic process $X := (X_t)_{t \ge 0}$ taking values in $\mathbb{R}^n$, which can be considered as logarithms of price processes. We assume that call options of any strike and maturity are liquidly traded and denote the time-$t$ value of a call option with maturity $T$ and strike $K$ by $C_t(T,K)$. Having liquid market prices translates, in mathematical terms, into having the marginal distributions of several (at most $n$) underlying processes. Similar to [21], we could at this point require that the given marginal distributions of $X$ are infinitely divisible and that the characteristic functions are absolutely continuous with respect to time, define the codebook of our model as a "forward" Lévy exponent, and then define dynamics for such an (infinite dimensional) codebook. Rather than focusing on the joint behaviour of $X$ and the codebook, whose dynamics are expressed in terms of a generic semimartingale $M$, as done in [21], we exploit the intuition behind the choice of such a codebook, since it provides easier conditions to avoid dynamic and static arbitrage, but we build a new framework around it.

For this reason, we conveniently define consistent recalibration models as models that keep consistency, which means that future realisations will be in a neighbourhood of the current state that can always be reached with positive probability, and that are analytically tractable, thus looking like finite factor models instantaneously.
This is reached by introducing stochastic parameters, whose dynamics could be extrapolated from market data, and by means of a Hull-White extension, used to compensate the stochastic updates in the parameters, while leaving the marginal distributions of the state variables unchanged.

We start by defining the set of functions that is the base for our theory:

Definition 2.1 ($\Gamma^n$). The set $\Gamma^n$ denotes the collection of continuous functions $\eta : \mathbb{R}^n \times \mathbb{R}_{\ge 0} \to \mathbb{C}$ such that there exists a càdlàg process $Z$ with independent increments and finite exponential moments $E[\exp((1+\varepsilon)\|Z_T\|)] < \infty$ for all $T \ge 0$ and some $\varepsilon > 0$ satisfying
$$E[\exp(i\langle u, Z_T\rangle)] = \exp\Bigl(i\langle u, Z_0\rangle + \int_0^T \eta(u,r)\, dr\Bigr) \qquad (2.1)$$
for $T \ge 0$.

Remark 2.2. Requiring that, for some $\varepsilon > 0$, $E[\exp((1+\varepsilon)\|Z_T\|)] < \infty$ implies that we can extend the left hand side of (2.1) to the strip $-i[0,1]^n + \mathbb{R}^n$, thus we could choose, for example, $u = -i$.

Remark 2.3. All functions $\eta \in \Gamma^n$ are necessarily of Lévy-Khintchine type at the short end ($r = 0$). Note that by doing so, we are extending the definitions given in [2] and [21], since we only assume the function $\eta$ to be of Lévy-Khintchine form at the short end.

Remark 2.4. Often, elements in $\Gamma^n$ are subject to additional no-arbitrage constraints in order to satisfy, for example, the martingale property for both the price processes $S = \exp(X)$ and the call options $C_t(T,K)$ for all $T, K > 0$ (see Theorem 3.7 in [21]). For instance, if we consider $X = (X^i)_{i=1}^d$ with $X^i$ being a log-price process for some $i$, we also assume that $\exp(X^i)$ is a martingale, which is equivalent to stating that $\eta(-i e_i, r) = 0$ with $e_i$ being the $i$-th basis vector of $\mathbb{R}^n$, i.e. $\langle e_i, X_T\rangle = X^i_T$. We assume tacitly that such conditions are imposed if necessary.
Notice that, in case of components of $X$ corresponding to interest rates, we do not need to impose such a condition (since we do not need martingality).

We can think of the set $\Gamma^n$ as a chart, in the language of geometry, or a codebook, in the language of mathematical finance, for all liquid market prices at one instant of time. If we want to consider their time evolution, we had better define $\Gamma^n$-valued processes:

Definition 2.5 ($\Gamma^n$-valued semimartingale). Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ be a filtered probability space. A stochastic process $\eta$ is called a $\Gamma^n$-valued semimartingale if $(\eta_t(u,T))_{0 \le t \le T}$ is a complex-valued semimartingale for $T \ge 0$ and $u \in \mathbb{R}^n$ and if $\bigl((u,r) \mapsto \eta_t(u, r+t)\bigr) \in \Gamma^n$. In particular, all trajectories are assumed to be càdlàg.
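For intuition, the simplest elements of $\Gamma^n$ come from Lévy processes: for a one-dimensional Brownian motion with drift, $Z_t = bt + W_t$, the exponent $\eta(u,r) = iub - u^2/2$ does not depend on $r$, and (2.1) can be checked by Monte Carlo simulation. A small sketch, where the drift value and sample size are arbitrary choices:

```python
import cmath
import math
import random

B_DRIFT = 0.1  # hypothetical drift b of Z_t = b*t + W_t


def eta(u, r):
    # Levy exponent of Z; constant in r, of Levy-Khintchine form at every r
    return 1j * u * B_DRIFT - 0.5 * u * u


def empirical_cf(u, T, n_paths=100_000, seed=0):
    # Monte Carlo estimate of E[exp(i*u*Z_T)], with Z_0 = 0
    rng = random.Random(seed)
    acc = 0j
    for _ in range(n_paths):
        z_T = B_DRIFT * T + math.sqrt(T) * rng.gauss(0.0, 1.0)
        acc += cmath.exp(1j * u * z_T)
    return acc / n_paths


u, T = 1.3, 0.75
theory = cmath.exp(eta(u, 0.0) * T)  # here int_0^T eta(u,r) dr = eta(u,.) * T
estimate = empirical_cf(u, T)
```

The estimate agrees with the right hand side of (2.1) up to Monte Carlo error; a genuinely time-dependent $\eta(u,r)$ would simply replace the product $\eta(u,\cdot)\,T$ by a quadrature of $r \mapsto \eta(u,r)$.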
Definition 2.6 (Regular decomposition). We say that $\eta$ allows for a regular decomposition with respect to a $d$-dimensional semimartingale $M$ if there exist predictable processes $(\alpha_t(u,T))_{0 \le t \le T}$ taking values in $\mathbb{C}$ with $\alpha_t(0,T) = 0$ for all $0 \le t \le T$, and $(\beta_t(u,T))_{0 \le t \le T}$, $\mathbb{C}^d$-valued, with $\beta^i_t(0,T) = 0$ for all $i$ and all $0 \le t \le T$, for $T \ge 0$ and $u \in \mathbb{R}^n$, such that
$$\eta_t(u,T) = \eta_0(u,T) + \int_0^t \alpha_s(u,T)\, ds + \sum_{i=1}^d \int_0^t \beta^i_s(u,T)\, dM^i_s \qquad (2.2)$$
for $0 \le t \le T$, and $\Bigl(\sqrt{\int_t^T \|\beta_t(u,r)\|^2\, dr}\Bigr)_{t \ge 0} \in L(M)$.

In view of the two new definitions, we can also generalise the condition expressed in (2.1):
Definition 2.7 (Conditional expectation condition). Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ be a filtered probability space. We say that a tuple $(X, \eta)$ of an $n$-dimensional semimartingale $X$ and a $\Gamma^n$-valued semimartingale $\eta$ satisfies the conditional expectation condition if
$$E[\exp(i\langle u, X_t\rangle) \mid \mathcal{F}_s] = \exp\Bigl(i\langle u, X_s\rangle + \int_s^t \eta_s(u,r)\, dr\Bigr) \qquad (2.3)$$
for $0 \le s \le t$.

At this point, the link of the whole theory to its discrete-time counterpart exposed in [25] becomes even clearer:
Definition 2.8 (Forward and process characteristics). Let $X$ be an adapted semimartingale taking values in $\mathbb{R}^n$ and $\eta$ a $\Gamma^n$-valued semimartingale with $\eta_s(0,t) = 0$ for all $0 \le s \le t$ and for which the conditional expectation condition (2.3) is satisfied. Then the process $\eta$ is called the forward characteristic process of $X$. Analogously, the process denoted by $\kappa^X_s$ that coincides with the short end of the forward characteristics of $X$, i.e. $\eta_{s-}(\cdot, s)$, with $\kappa^X_s(0) = 0$ for all $s \ge 0$, is called the (process) characteristic of $X$.

Remark 2.9. Note that both processes are uniquely defined (up to a $dQ \otimes dt$-nullset):
1. The normalisation $\eta_s(0,t) = 0$ for all $0 \le s \le t$ ensures that the map $u \mapsto \eta_s(u,t)$ is continuous and uniquely defined through the use of the complex logarithm.
2. Since the adapted process $\bigl(\exp\bigl(i\langle u, X_s\rangle - \int_0^s \kappa^X_r(u)\, dr\bigr)\bigr)_{s \ge 0}$ is a local martingale (see Theorem 2.11 below) and $\kappa^X_r(0) = 0$ for any $r \ge 0$, uniqueness follows from Lemma A.5 in [21].

Definition 2.10 (Term structure for derivatives). We call the tuple $(X, \eta)$ of an $n$-dimensional semimartingale $X$ and its $\Gamma^n$-valued forward characteristic process $\eta$ a term structure model for derivatives' prices.

With the following theorems, we are able to characterise which processes $\eta$ can be considered forward processes, given the existence of a regular decomposition. Discrete versions of the same theorems are given in [25], while proofs for the continuous case under examination are in [21].

Theorem 2.11.
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ be a filtered probability space together with a tuple $(X, \eta)$ of an $n$-dimensional semimartingale $X$ and a $\Gamma^n$-valued semimartingale $\eta$ satisfying the conditional expectation condition. Then:

• The differentiable, predictable characteristic $\kappa^X$ of the $n$-dimensional semimartingale $X$ exists and is given by $\kappa^X_t(u) = \eta_{t-}(u,t)$ (usually called the short end condition) for $t \ge 0$ and $u \in \mathbb{R}^n$, i.e. the process
$$\exp\Bigl(i\langle u, X_t\rangle - \int_0^t \eta_{s-}(u,s)\, ds\Bigr) \qquad (2.4)$$
is a local martingale.

• If $\eta$ allows for a regular decomposition (2.2) with respect to a $d$-dimensional semimartingale $M$, then the drift condition
$$\int_t^T \alpha_t(u,r)\, dr = \eta_{t-}(u,t) - \kappa^{(X,M)}_t\Bigl(u, -i \int_t^T \beta_t(u,r)\, dr\Bigr) \qquad (2.5)$$
holds for $0 \le t \le T$ and $u \in -i[0,1]^n + \mathbb{R}^n$.
It might be useful for the reader to have in mind the following expression for the forward characteristics of the $(n+d)$-semimartingale $(X,M)$: for all $t$ such that $0 \le t \le T$, we have
$$\exp\bigl(\eta^{(X,M)}_t(u,v;T)\bigr) = E[\exp(i\langle u, X_T - X_t\rangle + i\langle v, M_T - M_t\rangle) \mid \mathcal{F}_t],$$
from which it is easier to derive the expression for $\kappa^{(X,M)}_t$ for $0 \le t \le T$.

Theorem 2.12.
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ be a filtered probability space together with a tuple $(X, \eta)$ of an $n$-dimensional semimartingale $X$ and a $\Gamma^n$-valued semimartingale $\eta$. Furthermore, assume that $\eta$ allows for a regular decomposition (2.2) with respect to a $d$-dimensional semimartingale $M$ such that the predictable characteristics of $X$ satisfy (2.4) and such that the drift condition (2.5) holds. Then the conditional expectation condition holds true.

Corollary 2.13.
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, Q)$ be a filtered probability space together with a tuple $(X, \eta)$, where $X$ is an $n$-dimensional semimartingale and $\eta$ is a $\Gamma^n$-valued semimartingale, satisfying the conditional expectation condition. Moreover, assume that $\eta$ allows for a regular decomposition (2.2) with respect to a $d$-dimensional semimartingale $M$ and that the processes $X$ and $M$ are locally independent, i.e.
$$\kappa^{(X,M)}_t(u_1, u_2) = \kappa^X_t(u_1) + \kappa^M_t(u_2) \qquad (2.6)$$
for $u_1 \in \mathbb{R}^n$ and $u_2 \in \mathbb{R}^d$. Then
$$\int_t^T \alpha_t(u,r)\, dr = -\kappa^M_t\Bigl(-i \int_t^T \beta_t(u,r)\, dr\Bigr)$$
for $0 \le t \le T$ and $u \in -i[0,1]^n + \mathbb{R}^n$ and, furthermore, the conditional expectation condition (2.3) rewrites as
$$E\Bigl[\exp\Bigl(\int_s^t \eta_{r-}(u,r)\, dr\Bigr) \Bigm| \mathcal{F}_s\Bigr] = \exp\Bigl(\int_s^t \eta_s(u,r)\, dr\Bigr)$$
for $0 \le s \le t$.

Proof. To obtain the new form of the conditional expectation condition, it is enough to use (2.4).

The two previous theorems basically ratify the equivalence between the conditional expectation condition on one hand, and the short end and drift conditions on the other. In [21], since the authors assume $\eta$ to be of Lévy-Khintchine type for all times, it is possible to show equivalence with the fact that $S = \exp(X)$ and $C_t(T,K)$ are martingales (see [21] for a rigorous definition). This is not given for free in our setting, but requires additional assumptions (see Remark 2.4). For example, requiring that $S = \exp(X)$ is a one-dimensional martingale is equivalent to the condition $\eta_s(-i, t) = 0$ for all $0 \le s \le t$. In this case, indeed, we can write
$$E\bigl[e^{iu(X_t - X_s)} \bigm| \mathcal{F}_s\bigr] = \exp\Bigl(\int_s^t \eta_s(u,r)\, dr\Bigr)$$
and for $u = -i$ we have $E[e^{X_t - X_s} \mid \mathcal{F}_s] = 1$ for all $0 \le s \le t$.

Remark 2.14. Forward characteristics encode the term structure of distributions of increments of the stochastic process $X$, i.e.
for $0 \le t \le T$, the distributions of $X_T - X_t$ conditional on the information $\mathcal{F}_t$ at time $t$. Notice that there is redundant information in processes of forward characteristics, which then translates into the drift conditions (2.5).

In this subsection, we introduce affine processes and give some important results on their forward characteristic processes. Moreover, since we are mainly interested in affine stochastic volatility models, we will state some properties for their particular case.

Let $D$ be a non-empty Borel subset of $\mathbb{R}^d$ to which we associate the set $\mathcal{U} := \{u \in \mathbb{C}^d : \sup_{x \in D} \operatorname{Re}\langle u, x\rangle < \infty\}$.

Definition 2.15 (Affine process). An affine process is a time-homogeneous Markov process $(X_t, P_x)_{t \ge 0,\, x \in D}$ with state space $D$, whose characteristic function is an exponentially affine function of the state vector. This means that its transition kernel $p_t$ satisfies the following:

• it is stochastically continuous, i.e. $\lim_{s \to t} p_s(x, \cdot) = p_t(x, \cdot)$ weakly on $D$ for every $t \ge 0$ and $x \in D$, and

• its Fourier-Laplace transform has exponential affine dependence on the initial state. This means that there exist functions $\Phi : \mathcal{U} \times \mathbb{R}_{\ge 0} \to \mathbb{C}$ and $\psi : \mathcal{U} \times \mathbb{R}_{\ge 0} \to \mathbb{C}^d$ with
$$E_x\bigl[e^{\langle u, X_t\rangle}\bigr] = \Phi(u,t)\, e^{\langle x, \psi(u,t)\rangle} \qquad (2.7)$$
for all $x \in D$, $u \in \mathcal{U}$ and $t \in \mathbb{R}_{\ge 0}$.

Remark 2.16. The existence of a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0})$ is already included in the notion of Markov process (see [23]).

Remark 2.17. The definition we gave is not the original one provided by Duffie et al. in [9] but a slightly more general one: the right hand side of (2.7) is equal to $e^{\phi(u,t) + \langle x, \psi(u,t)\rangle}$ as long as we know that $\Phi(u,t) \ne 0$, but this can be shown ([24]) and need not be postulated (as done in [9]). From now on, we assume $\Phi(u,t) = \exp(\phi(u,t))$. A priori we do not even have a unique definition of the functions $\psi$ and $\phi$, but we can assume the normalisations $\phi(u,0) = 0$ and $\psi(u,0) = u$ for all $u \in \mathcal{U}$, which makes the functions unique.

In this subsection we build a generic example of term structure models for derivatives' prices. Therefore, we define an affine stochastic volatility model:

Definition 2.18 (Affine stochastic volatility model). Let us consider a proper convex cone $C \subset \mathbb{R}^m$ (the stochastic covariance structures). An affine stochastic volatility model is a time-homogeneous affine (Markov) process $(X, Y)$ taking values in $\mathbb{R}^n \times C$ relative to some filtration $(\mathcal{F}_t)_{t \ge 0}$ and with state space $D = \mathbb{R}^n \times C$ such that

• it is stochastically continuous, that is, $\lim_{s \to t} p_s(x, y, \cdot) = p_t(x, y, \cdot)$ weakly on $D$ for every $t \ge 0$ and $(x,y) \in D$, and

• its Fourier-Laplace transform has exponential affine dependence on the initial state. This means that there exist (deterministic) functions $\phi : \mathcal{U} \times \mathbb{R}_{\ge 0} \to \mathbb{C}$ and $\psi_C : \mathcal{U} \times \mathbb{R}_{\ge 0} \to \mathbb{C}^m$ with
$$E\bigl[e^{\langle u, X_t\rangle + \langle v, Y_t\rangle} \bigm| \mathcal{F}_s\bigr] = e^{\phi(u,v,t-s) + \langle u, X_s\rangle + \langle \psi_C(u,v,t-s), Y_s\rangle} \qquad (2.8)$$
for all $(x,y) \in D$, $0 \le s \le t$ and $(u,v) \in \mathcal{U}$, where $\mathcal{U} := \{(u,v) \in \mathbb{C}^{n+m} \mid e^{\langle u, \cdot\rangle + \langle v, \cdot\rangle} \in L^\infty(D)\}$, and the normalisations $\phi(u,v,0) = 0$ and $\psi^i_C(u,v,0) = v_i$ for all $(u,v) \in \mathcal{U}$ and $i = 1, \dots, m$.

Remark 2.19. In line with the literature on affine processes there is a $\mathbb{C}^{n+m}$-valued function $\psi$, whose projection onto the $X$-directions is $u$, as exemplified in (2.8). Whence we only need the projection onto the $C$-directions, which we denote by $\psi_C$. This corresponds to a standard assumption if we consider $X$ as a price process: if we move $X_s$ by a quantity $x$, then also $X_t$ gets shifted by the same amount.

The functions $\phi$ and $\psi_C$ are important because they allow the introduction of the so-called functional characteristics (because of complete characterisation) of the affine process $(X, Y)$. We define
$$F(u,v) := \frac{\partial \phi}{\partial t}(u,v,t)\Big|_{t=0^+}, \qquad R_C(u,v) := \frac{\partial \psi_C}{\partial t}(u,v,t)\Big|_{t=0^+} \qquad (2.9)$$
for all $(u,v) \in \mathcal{U}$, continuous at $(0,0)$ (see [24]). Equations (2.9) are called Riccati equations. More in general, we can also define the generalised Riccati equations and prove the following theorem (from [22]):

Theorem 2.20.
Suppose that $|\phi(u,w,T)| < \infty$ and $\|\psi_C(u,w,T)\| < \infty$ for some $(u,w,T) \in \mathcal{U} \times \mathbb{R}_{\ge 0}$. Then for all $t \in [0,T]$ and $v$ with $\operatorname{Re} v \le \operatorname{Re} w$ the derivatives (2.9) exist. Moreover, for $t \in [0,T)$, $\phi$ and $\psi_C$ satisfy the generalised Riccati equations (so named because they boil down to the well-known Riccati equations when $(X,Y)$ is a diffusion process):
$$\frac{\partial}{\partial t}\phi(u,v,t) = F(u, \psi_C(u,v,t)), \qquad \phi(u,v,0) = 0, \qquad (2.10a)$$
$$\frac{\partial}{\partial t}\psi_C(u,v,t) = R_C(u, \psi_C(u,v,t)), \qquad \psi_C(u,v,0) = v. \qquad (2.10b)$$

We can also derive the following proposition:

Proposition 2.21.
Let $(X, Y)$ be a homogeneous affine process taking values in $D = \mathbb{R}^n \times C$. Then for $t \le T$ we have that
$$\phi(u,0,t) = \int_0^t F(u, \psi_C(u,0,s))\, ds \quad \text{and} \quad \psi_C(u,0,t) = \int_0^t R_C(u, \psi_C(u,0,s))\, ds,$$
where $(u,v) \mapsto F(u,v)$ and $(u,v) \mapsto \langle R_C(u,v), y\rangle$ are of Lévy-Khintchine form.

Proof. While the first part automatically comes from the definition of the generalised Riccati equations, the second can be found in [22].
Corollary 2.22.
Let $(X, Y)$ be a homogeneous affine process taking values in $D = \mathbb{R}^n \times C$ and assume that the finite moment condition $E[\exp((1+\varepsilon)\|X_t\|)] < \infty$ holds true for some $\varepsilon > 0$. Then, for $0 \le t \le T$,
$$\eta_t(-iu, T) := F(u, \psi_C(u,0,T-t)) + \langle R_C(u, \psi_C(u,0,T-t)), Y_t\rangle$$
defines a $\Gamma^n$-valued semimartingale and the tuple $(X, \eta)$ satisfies the conditional expectation condition.

Proof. The proof follows from the previous proposition and simple algebraic operations. For any $0 \le t \le T$, we have
$$\begin{aligned}
E\bigl[e^{\langle u, X_T\rangle} \bigm| \mathcal{F}_t\bigr] &= e^{\phi(u,0,T-t) + \langle u, X_t\rangle + \langle \psi_C(u,0,T-t), Y_t\rangle} \\
&= e^{\langle u, X_t\rangle + \int_0^{T-t} F(u,\psi_C(u,0,r))\, dr + \langle \int_0^{T-t} R_C(u,\psi_C(u,0,r))\, dr,\, Y_t\rangle} \\
&= e^{\langle u, X_t\rangle + \int_0^{T-t} F(u,\psi_C(u,0,r)) + \langle R_C(u,\psi_C(u,0,r)),\, Y_t\rangle\, dr} \\
&= e^{\langle u, X_t\rangle + \int_t^{T} F(u,\psi_C(u,0,r-t)) + \langle R_C(u,\psi_C(u,0,r-t)),\, Y_t\rangle\, dr} \\
&= e^{\langle u, X_t\rangle + \int_t^{T} \eta_t(-iu, r)\, dr}.
\end{aligned}$$
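For the Heston model, a main example later in the paper, the functional characteristics are explicit, and Corollary 2.22 can be turned into a small numerical sketch: integrate the generalised Riccati equations (2.10a)-(2.10b) at $v = 0$ and assemble $\eta_t(-iu, T)$. The parameter values below are arbitrary, and a plain Euler scheme stands in for a proper ODE solver:

```python
KAPPA, THETA, SIGMA, RHO = 1.5, 0.04, 0.5, -0.7  # hypothetical Heston parameters


def F(u, w):
    # state-independent functional characteristic: F(u, w) = kappa * theta * w
    return KAPPA * THETA * w


def R(u, w):
    # state-dependent functional characteristic (Riccati right-hand side)
    return 0.5 * (u * u - u) + (RHO * SIGMA * u - KAPPA) * w + 0.5 * SIGMA**2 * w * w


def riccati(u, t, n_steps=2000):
    # explicit Euler for (2.10a)-(2.10b) with initial condition v = 0
    h = t / n_steps
    phi, psi = 0.0, 0.0
    for _ in range(n_steps):
        phi, psi = phi + h * F(u, psi), psi + h * R(u, psi)
    return phi, psi


def forward_characteristic(u, v_t, t, T):
    # eta_t(-iu, T) = F(u, psi_C(u,0,T-t)) + R(u, psi_C(u,0,T-t)) * Y_t,
    # per Corollary 2.22 (real u, one variance factor)
    _, psi = riccati(u, T - t)
    return F(u, psi) + R(u, psi) * v_t
```

The martingale direction $u = 1$ gives $F(1,0) = R(1,0) = 0$, so $\phi$ and $\psi_C$ vanish identically, reflecting the martingale property of $S = e^X$; complex $u$ works the same way with complex arithmetic.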
In interest rate theory, where affine models have proved to be a powerful tool, the Hull-White extension is realised by making the drift term time-dependent, and it plays the fundamental role of allowing the calibration of an initial yield curve to the prescribed model. This will be the topic of the next section. In the following, we present some applications of this theory.
Example. Deterministic term structure of forward characteristics: Deterministic forward term structure models correspond to time-dependent Lévy processes. More precisely, let $(X, \eta)$ be a tuple satisfying the conditional expectation condition and assume that $\eta$ is deterministic; then $X$ is an additive process and $\eta_t(u,T) = \eta_0(u,T)$ is of Lévy-Khintchine form for every $T \ge 0$ (compare, for example, with Definition 2.6). A particular example would be any time-dependent Lévy model.

Example. Interest rate models:
If the process $X$ is one-dimensional, pure-drift and absolutely continuous with respect to Lebesgue measure, then we fall into the case treated in Corollary 2.13 and we have
$$\int_t^T \alpha_t(u,r)\, dr = -\kappa^M_t\Bigl(-i \int_t^T \beta_t(u,r)\, dr\Bigr),$$
but also
$$E\Bigl[\exp\Bigl(-\int_t^T \eta_{s-}(u,s)\, ds\Bigr) \Bigm| \mathcal{F}_t\Bigr] = \exp\Bigl(-\int_t^T \eta_t(u,r)\, dr\Bigr), \qquad (2.11)$$
from which we obtain
$$uX_t = uX_0 - \int_0^t \eta_{s-}(u,s)\, ds.$$
Equation (2.11) is also well known in interest rate theory: if we denote by $P(t,T)$ the price of a risk-less zero coupon bond, by $f(t,T)$ the forward rate prevailing at $t$ for $T$ and by $r(t)$ the short-term interest rate at $t$, then we have
$$P(t,T) = E\Bigl[e^{-\int_t^T r(s)\, ds} \Bigm| \mathcal{F}_t\Bigr] = e^{-\int_t^T f(t,S)\, dS}$$
for $0 \le t \le T$ and $u \in \mathbb{R}$. Moreover, if we assumed $M$ to be a Brownian motion, then we would have $\kappa^M_t(u) = -u^2/2$ and, thus,
$$\int_t^T \alpha_t(u,r)\, dr = -\frac{1}{2}\Bigl(\int_t^T \beta_t(u,r)\, dr\Bigr)^2,$$
from which, differentiating both sides with respect to $T$, we obtain the well-known HJM drift condition
$$\alpha_t(u,T) = -\beta_t(u,T) \int_t^T \beta_t(u,r)\, dr.$$
Notice that $(\eta_{s-}(u,s))_{s \ge 0}$ is linear in $u$, since $X$ is pure drift.

3 Generalised Hull-White extension
The Hull-White extension of the Vašíček model was performed by adding a time-dependent drift to the equation for the short-term interest rate $r$, in order to have a perfect match with the current ($t = 0$) term structure of forward rates and, thus, to enhance calibration.

In this case, we will take a more general approach and will encode the extension, represented by a Lévy process, in the constant part of the affine process (responsible for its state-independent characteristics). In other words, the function $F$ will consequently become time-inhomogeneous, thus modifying the forward characteristics of the process $X$.

Corollary 3.1.
Let $(\widetilde X, Y)$ be a time-inhomogeneous, homogeneous càdlàg affine process taking values in $\mathbb{R}^n \times C$ with time-dependent continuous $T \mapsto F_T$, and assume that the finite moment condition $\mathbb{E}\big[\exp\big((1+\varepsilon)\|\widetilde X_t\|\big)\big] < \infty$ holds true for some $\varepsilon > 0$. Then, for $0 \le t \le T$,
\[
\widetilde\eta_t(-iu, T) := F_T\big(u, \psi^C(u, 0, T-t)\big) + \big\langle R^C\big(u, \psi^C(u, 0, T-t)\big), Y_t \big\rangle
\]
defines a $\Gamma^n$-valued semimartingale and the tuple $(\widetilde X, \widetilde\eta)$ satisfies the conditional expectation condition.

Remark 3.2. Here time-inhomogeneous, homogeneous affine processes appear as a generalisation of the approaches in [2] and [21] (CNKK approach), since we can calibrate a large variety of (virtually, any) initial term structures into $t \mapsto F_t$.

Remark 3.3. Although we are only modifying the forward characteristic process of $X$, the process $Y$, which is Markov in its own filtration, remains the same. This keeps the transformation simple and the processes tractable, since it does not affect the stochastic covariance structure.

The above structure improves the calibration properties of the original model. In addition, since the Lévy process is allowed to change over time, we could calibrate it to match market conditions at other instants of time (apart from the initial time). The main consequence of having such a generalised Hull-White extension is that we could compensate fluctuations (i.e. recalibrations) in the original model's parameters with a calibration of the Lévy process, so as to keep the price/volatility surface unchanged. In other words, we could consider the parameters of the original model as state variables.
When this is possible, we will talk about a model that satisfies the consistent recalibration property. A valid question, at this point, is when this is actually possible: are there conditions that we could impose or verify to make sure that such a compensating mechanism can always take place? Let us denote by $(\nu^L_t)_{t \ge 0}$ the Lévy measure of the time-dependent Lévy process $L$, by $p_t$ and $Z_t$ the set of parameters and the state variables belonging to the time-homogeneous model at time $t$, respectively (since parameters can be considered as state variables, they are allowed to change in time), and by $\nu^{p_t, Z_s}$ the corresponding Lévy measure, where we made explicit the dependence on the parameters $p_t$ and state variables $Z_s$, for $s \le t$.

Proposition 3.4.
Let us assume that the stochastic parameter process $(p_t)_{t \ge 0}$ has trajectories whose total variation is bounded by a deterministic constant and takes values in the compact set $\Theta$ of admissible parameters. Moreover, assume that $p \mapsto \nu^{p,\cdot}$ is continuously differentiable and that $p$ remains constant whenever $Z$ leaves a prespecified compact set $K$. Then, if for all $t \ge 0$ we have the non-negativity condition
\[
\nu^L_t \ \ge \sum_{0 \le s \le t} \big( \nu^{p_s, Z_{s-}}_t - \nu^{p_{s-}, Z_{s-}}_t \big), \qquad (3.1)
\]
the consistent recalibration property holds.

Proof. The proof is done by induction on the jump times of the parameter process $p$. For more details, see [25].

As already mentioned above, this add-on transforms the functional characteristic $F$ into a time-dependent function, and it is worth noticing how this happens in practice. Using the same notation introduced in [25], we can define $F_T$ as it appears in Corollary 3.1 by adding a new time-dependent function $\mu$:

Definition 3.5 ($\mathrm{Inc}_D$). Let $Z$ be a generic stochastic process with values in the (state) space $D$, such that all increments $\Delta Z_s$ satisfy $z + \Delta Z_s \in D$ for any $z \in D$ and any $s \ge 0$. We denote by $\mathrm{Inc}_D$ the set which contains all continuous functions $\mu: U \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ of the type
\[
\mu(u, t) := \log \mathbb{E}\big[\exp(\langle u, \Delta Z_t \rangle)\big]
\]
for which $\mu(0, t) = 0$ for all $t \ge 0$.

In other words, we are adding to the "old" $F$ the cumulant generating function of the process $(\Delta Z_s)_{s \ge 0}$, that is, $F_t(u,v) := F(u,v) + \mu(u,v,t)$ for all $u \in U$ and $t \ge 0$. This will become even clearer in the following, when we specify our consistent recalibration model. Analogously to what has already been done, we can define $\widetilde\phi$ and $\widetilde\psi$ as the time-inhomogeneous versions of $\phi$ and $\psi$, solutions to the time-inhomogeneous version of the Riccati equations.
In particular, for stochastic volatility affine processes, similarly to Theorem 2.20, for $s \le t$ we can write (compare with [25]):
\[
\frac{\partial}{\partial t}\widetilde\phi(u,v;s,t) = F_t\big(u, \widetilde\psi^C(u,v;s,t)\big), \qquad \widetilde\phi(u,v;0,0) = 0, \qquad (3.2a)
\]
\[
\frac{\partial}{\partial t}\widetilde\psi^C(u,v;s,t) = R^C\big(u, \widetilde\psi^C(u,v;s,t)\big), \qquad \widetilde\psi^C(u,v;0,0) = v. \qquad (3.2b)
\]
(With "unchanged surface" above we mean that the price or volatility surface created by the model and by the Lévy measure should be the same.) Note that $\widetilde\psi^C(u,v;s,t) = \psi^C(u,v;t-s)$, since $R^C$ is time-independent. At this point, it is also possible to rewrite the expression for the forward characteristics of the time-inhomogeneous process $\widetilde X$ as
\[
\int_t^T \widetilde\eta_t(u,r)\,dr = \widetilde\phi(iu, 0; t, T) + \big\langle \widetilde\psi^C(iu, 0; t, T), Y_t \big\rangle. \qquad (3.3)
\]
In particular, for $t = 0$ we recover the characteristic function and we obtain
\[
\int_0^T \widetilde\eta_0(u,r)\,dr = \widetilde\phi(iu, 0; 0, T) + \big\langle \widetilde\psi^C(iu, 0; 0, T), y \big\rangle
\]
and, if we denote by $\mathcal{C}(y)$ the set of characteristic functions $\widetilde\eta$, we notice that for any element $\widetilde\eta \in \mathcal{C}(y)$ there exists (at least) one $\mu \in \mathrm{Inc}_D$ that defines $\widetilde\eta$ itself. It is thus possible to establish a surjective function $g$ between $\mathrm{Inc}_D$ and $\mathcal{C}(y)$. The existence of such a $g$ is nothing but Condition (3.1) previously stated, since we can consider any jump time as an initial starting point for the process $(\widetilde X, Y)$ due to the Markovianity of the process.

Before introducing the consistent recalibration model more mathematically, let us briefly recall the Heston model, an affine stochastic volatility model for $X = \log(S)$, the log-return of the underlying price:
\[
dX(t) = \Big(r - q - \tfrac12 V(t)\Big)dt + \sqrt{V(t)}\,dW_1(t), \qquad X(0) = x_0,
\]
\[
dV(t) = k[\theta - V(t)]\,dt + \sigma\sqrt{V(t)}\,dW_2(t), \qquad V(0) = v_0, \qquad (4.1)
\]
\[
dW_1(t)\,dW_2(t) = \rho\,dt, \qquad \rho \in [-1, 1],
\]
where $r$ and $q$ represent the (constant) instantaneous risk-free and dividend yields respectively, $\theta > 0$ is the long-term mean of the variance, $k > 0$ is the speed of mean reversion and $\sigma > 0$ represents the instantaneous volatility of the variance process $V$. In order to ensure positivity of the variance process, we need to satisfy $2k\theta > \sigma^2$ (Feller condition).

For $0 \le t \le T$, we have that $\eta$ defines a $\Gamma^1$-semimartingale:
\[
\eta_t(u, T) = F\big(iu, \psi^C(iu, 0, T-t)\big) + R^C\big(iu, \psi^C(iu, 0, T-t)\big)\, V_t,
\]
where $C$ coincides with $\mathbb{R}_{>0}$ and
\[
F(u_1, u_2) = k\theta u_2 + (r - q)u_1, \qquad
R^C(u_1, u_2) = \tfrac12 u_1(u_1 - 1) + \tfrac12 \sigma^2 u_2^2 + \sigma\rho u_1 u_2 - k u_2.
\]
The Hull-White extension of the Heston model consists in a generalised version of the so-called Bates model, in which we add a compensated jump process $L$ with Lévy measure $\nu(t, dx)$ to the dynamics of the log-return $X$ (compensation is necessary to have a martingale process, as is often the case in the pricing context):
\[
dX(t) = \Big(r - q - \tfrac12 V(t)\Big)dt + \sqrt{V(t)}\,dW_1(t) + dL_t.
\]
The first consequence, which should appear obvious, is that we are enriching the space of calibrated volatility surfaces, thanks to the Lévy process, while keeping the same dimension of the state variables. Accordingly, the functional characteristic $F$ changes to
\[
F_t(u_1, u_2) = k\theta u_2 + (r - q)u_1 + \mu^L(u_1, u_2, t), \qquad (4.3)
\]
where $\mu^L$ is the cumulant generating function of $L$. As we will see below, we can establish a bijective relation between $\mu^L$ and the Lévy measure $\nu^L$. We speak of a generalised Bates model since the Lévy measure is also allowed to change in time and, as already said, this permits making other parameters time-dependent as well.

The time is now ripe to explain how the generalised Bates model can be used as a consistent recalibration (CRC) model. Recall that, although formally there are only two state variables, the parameters are now also free to change in time thanks to the compensation mechanism that the Hull-White extension provides. Let us start at time $t = t_0$ with a log-price $X_{t_0}$, a set of parameters $p_{t_0} = (r, q, k, \theta_{t_0}, \sigma_{t_0}, \rho_{t_0})$, an initial variance $V_{t_0}$ for the log-price and the compensated jump Lévy process $L_{t_0}$ which represents the Hull-White extension. Note that some of the parameters are not constant, but change over time and are denoted by a time index.
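The dynamics above are straightforward to simulate. The following sketch (our own illustration with hypothetical parameter values, not taken from the text) uses a full-truncation Euler scheme for $(X, V)$ with a compensated compound Poisson jump part, and checks that the compensation indeed makes the discounted price a martingale:

```python
import numpy as np

def simulate_bates(x0=0.0, v0=0.04, r=0.01, q=0.0, k=1.5, theta=0.04,
                   sigma=0.3, rho=-0.7, lam=0.5, jump_mu=-0.1, jump_sd=0.15,
                   T=1.0, n_steps=252, n_paths=10_000, seed=0):
    """Full-truncation Euler scheme for Heston dynamics plus compensated jumps."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    V = np.full(n_paths, v0)
    # compensator of the compound Poisson part, so that exp(X - (r - q) t)
    # stays a martingale
    comp = lam * (np.exp(jump_mu + 0.5 * jump_sd**2) - 1.0)
    for _ in range(n_steps):
        Z1 = rng.standard_normal(n_paths)
        Z2 = rho * Z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        Vp = np.maximum(V, 0.0)                    # full truncation of V
        N = rng.poisson(lam * dt, n_paths)         # number of jumps in [t, t+dt)
        J = jump_mu * N + jump_sd * np.sqrt(N) * rng.standard_normal(n_paths)
        X = X + (r - q - 0.5 * Vp - comp) * dt + np.sqrt(Vp * dt) * Z1 + J
        V = V + k * (theta - Vp) * dt + sigma * np.sqrt(Vp * dt) * Z2
    return X, V

X, V = simulate_bates()
print(np.exp(X).mean())   # ≈ exp((r - q) T) = exp(0.01), up to Monte Carlo error
```

The Feller condition $2k\theta > \sigma^2$ holds for the chosen values ($0.12 > 0.09$), so negative excursions of $V$ stem only from discretisation.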
This particular combination of state variables and parameters fully specifies a particular model $M_0$ among all possible models $M_i$ that can represent an implied volatility surface (IVS) without breaking any no-arbitrage constraints, and it should be able to reflect those market conditions that are summarised by the IVS at time $t_0$. It is thus natural to write $\mathrm{IVS}_{t_0} = \mathrm{IVS}(X_{t_0}, V_{t_0}; \theta_{t_0}, \sigma_{t_0}, \rho_{t_0}; \{T_i\}, \{K_j\})$ for the volatility surface at time $t_0$. The model state variables $X$ and $V$ are thus able to evolve in time as long as $M_0$ is able to mirror the market. Eventually, this situation will break at time $t = t_1$ and a new calibration will become necessary:

1. From time $t_0$ on, $X$ and $V$ can evolve until $t_1 = t_0 + \Delta t$, where the new volatility surface is given by $\mathrm{IVS}(X_{t_1}, V_{t_1}; \theta_{t_0}, \sigma_{t_0}, \rho_{t_0}; \{T_i\}, \{K_j\})$.
2. The parameters $(\theta_{t_0}, \sigma_{t_0}, \rho_{t_0})$ will move to another configuration $(\theta_{t_1}, \sigma_{t_1}, \rho_{t_1})$; but, in order to have a smooth change between the first configuration and the second,
3. the Lévy process will also be adjusted and will compensate the changes in the parameters $(\theta_{t_1}, \sigma_{t_1}, \rho_{t_1})$ so as to reproduce the same IVS.
4. In this way we can represent the behaviour of the market realistically.

Figure 1: At $t = t_1$ the change in the parameters $(\theta_{t_1}, \sigma_{t_1}, \rho_{t_1})$ would cause a jump in the volatility structure (snake arrow), but this is compensated by the change in the Lévy process $L$ (red bent arrow).

The recalibration of the Hull-White extension also preserves the drift condition of the forward characteristics, thus ensuring that we do not violate any no-arbitrage constraints.
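The compensation mechanism in steps 2-3 can be caricatured in a few lines. In the toy sketch below the surface map is a purely illustrative stand-in of ours (not the model's pricing map): the Lévy contribution enters additively, so recalibrating it after a parameter jump reproduces the previous surface exactly, which is the role the Hull-White extension plays above:

```python
import numpy as np

# Illustrative stand-in for the IVS as a function of state, parameters and the
# Hull-White (Lévy) extension; additive on purpose, so compensation is exact.
def surface(state, params, levy_part):
    grid = np.linspace(0.8, 1.2, 13)          # toy moneyness grid
    return params[0] * grid**2 + params[1] * state + levy_part

state = 0.5
params = np.array([0.20, 0.10])
levy = np.zeros(13)
ivs_before = surface(state, params, levy)

# recalibration time: the parameters jump to a new configuration ...
new_params = np.array([0.25, 0.05])
# ... and the Lévy part is recalibrated (the compensating role of the
# Hull-White extension), so that the quoted surface is reproduced exactly:
levy = ivs_before - surface(state, new_params, np.zeros(13))
ivs_after = surface(state, new_params, levy)
print(np.max(np.abs(ivs_after - ivs_before)))   # 0 up to floating-point rounding
```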
The new model $M_1$ will then specify the evolution in time of the state variables until another recalibration is needed. Notice that the evolution of the state variables is determined by the "old" parameters $(\theta_{t_i}, \sigma_{t_i}, \rho_{t_i})$ on the closed interval $[t_i, t_{i+1}]$, and for this reason there is no discontinuity at $t = t_{i+1}$ in the modelled IVS caused by the parameter movement.

The mathematical formulation needed to describe what we sketched above starts from (2.2). If we rewrite the same equation using Musiela's parametrisation, defining $x := T - t$, then the map becomes $(u, t, x) \mapsto \eta_t(u, t + x)$ in the new notation. In addition, let us introduce the strongly continuous semigroup $\{S(t) \mid t \ge 0\}$ of right shifts, such that a suitable function $g$ is mapped to $S(t)g(u, \cdot) = g(u, t + \cdot)$. Then we can rewrite Equation (2.2) as
\[
\eta_t(u, t+x) = S(t)\eta_0(u, x) + \int_0^t S(t-s)\alpha_s(u, t+x)\,ds + \sum_{i=1}^d \int_0^t S(t-s)\beta^i_s(u, t+x)\,dM^i_s, \qquad (5.1)
\]
which can be rewritten in terms of $\theta_t(u, x) := \eta_t(u, t + x)$ and, with abuse of notation, $\alpha_t(u, x) := \alpha_t(u, t+x)$, $\beta_t(u, x) := \beta_t(u, t+x)$, as
\[
\theta_t(u, x) = S(t)\theta_0(u, x) + \int_0^t S(t-s)\alpha_s(u, x)\,ds + \sum_{i=1}^d \int_0^t S(t-s)\beta^i_s(u, x)\,dM^i_s. \qquad (5.2)
\]
Finally, the passage to the limit will justify what is written in the next subsection.

The framework we develop in the following allows a thorough analysis of factor models in the CNKK approach introduced in [2] and [21], yet with crucial differences. For example, as already done by Kallsen and Krühner, we assume that the volatility processes $\beta^i$ of the forward characteristic $\eta$ are functions of the present state of $\eta$ itself, i.e. for all $i = 1, \ldots, d$,
\[
\beta^i_t(u, T)(\omega) = \sigma_i\big(t, \eta_{t-}(\cdot, \cdot)(\omega)\big)(u, T),
\]
but, since we introduce the right-shift operator, we will obtain an SPDE and not simply an SDE.
Definition 5.1 (Lévy codebook Hilbert space). Let $G$ be a Hilbert space of continuous complex-valued functions defined on the strip $(-i[0,1]^n) \times \mathbb{R}^n$, i.e.
\[
G \subset C\big((-i[0,1]^n) \times \mathbb{R}^n; \mathbb{C}\big).
\]
$H$ is called a Lévy codebook Hilbert space if $H$ is a Hilbert space of continuous functions $\eta: \mathbb{R}_{\ge 0} \to G$, i.e. $H \subset C(\mathbb{R}_{\ge 0}; G)$, such that

• there is a continuous embedding $H \subset C\big(\mathbb{R}_{\ge 0} \times (-i[0,1]^n) \times \mathbb{R}^n; \mathbb{C}\big)$,
• the shift semigroup $(S_t \eta)(u, x) := \eta(u, t + x)$ acts as a strongly continuous semigroup of linear operators on $H$,
• continuous functions of finite-activity Lévy-Khintchine type
\[
(u, t) \mapsto i\langle a(t), u\rangle - \tfrac12 \langle u, b(t)u\rangle + \int_{\mathbb{R}^n} \big(\exp(i\langle \xi, u\rangle) - 1\big)\,\nu_t(d\xi)
\]
lie in $H$, where $a$, $b$, $\nu$ are continuous functions defined on $\mathbb{R}_{\ge 0}$ taking values in $\mathbb{R}^n$, in the positive-semidefinite matrices on $\mathbb{R}^n$ and in the finite positive measures on $\mathbb{R}^n$, respectively. This is for example the case for processes with independent increments and finite variation.

Remark 5.2. Notice that we do not assume that there are additional stochastic factors outside the considered parametrisation of liquid market prices.
Remark 5.3. Notice that elements of the Hilbert space $H$ are understood in Musiela parametrisation and are therefore denoted by a different letter in the sequel. As already written in Subsection 5.1, we have the relationship $\eta_t(u, t+x) = \theta_t(u, x)$, with $x := T - t$. In this sense, we also have the equality $\theta_t(u, 0) = \kappa^X_t(u)$ for the predictable characteristics of $X$.

Definition 5.4 (CNKK equation). Let $H$ be a Lévy codebook Hilbert space. We call the stochastic partial differential equation
\[
d\theta_t = \big(A\theta_t + \mu^{\mathrm{CNKK}}(\theta_t)\big)\,dt + \sum_{i=1}^d \sigma_i(\theta_t)\,dB^i_t \qquad (5.3)
\]
a CNKK equation $(\theta_0, \kappa, \sigma)$ with initial term structure $\theta_0$ and characteristics $\kappa$ and $\sigma$, if

• $A = \frac{d}{dx}$ is the generator of the shift semigroup on $H$,
• $\sigma_i: U \subset H \to H$, with $U$ an open subset of $H$, are locally Lipschitz vector fields, and
• $\mu^{\mathrm{CNKK}}: U \to H$ is locally Lipschitz and satisfies, for all $\theta \in \Gamma^n$,
\[
\int_0^{T-t} \mu^{\mathrm{CNKK}}(\theta)(u, r)\,dr = \theta(u, 0) - \kappa^\theta\Big(u, -i\int_0^{T-t} \sigma(\theta)(r, u)\,dr;\, 0\Big), \qquad (5.4)
\]
where $(\kappa^\theta)_{\theta \in U}$ is $\Gamma^{n+d}$-valued for each $\theta \in \Gamma^n$, such that $\kappa^\theta(u, 0; 0) = \theta(u, 0)$ and $\kappa^\theta(0, v; 0) = -\tfrac12\|v\|^2$, for $u \in \mathbb{R}^n$, $v \in \mathbb{R}^d$.

Remark 5.5. $\kappa^\theta$ is the forward characteristic process associated with the couple $(X, B)$, where $X$ is (still) the log-return price process and $B$ the driving process of $\theta$. Moreover, Equation (5.4) can be seen as a drift condition and is analogous to Equation (2.5), reformulated under Musiela's parametrisation.

Remark 5.6. It is evident how Equations (5.3) and (5.2) relate to each other and how the former can be seen as the limit case of the latter.
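To make the relation between (5.2) and (5.3) concrete, here is a toy finite-difference discretisation (entirely our own sketch: the initial curve, the volatility function and all grid sizes are hypothetical) of an HJM-type equation $d\theta = (A\theta + \mu)\,dt + \sigma\,dB$, with the shift generator $A = d/dx$ approximated by an upwind difference and the leverage-free drift $\mu(\theta)(x) = -\sigma(x)\int_0^x \sigma(s)\,ds$ of the HJM special case discussed later:

```python
import numpy as np

rng = np.random.default_rng(6)
n_x, n_t = 200, 400
dx, dt = 0.05, 0.0005                          # dt/dx < 1: stable upwind step
x = np.arange(n_x) * dx
theta = 0.02 + 0.01 * x / (1.0 + x)            # toy initial term structure
sig = 0.01 * np.exp(-x)                        # toy deterministic volatility
mu = -sig * np.cumsum(sig) * dx                # one-factor HJM-type drift
for _ in range(n_t):
    A_theta = np.empty(n_x)
    A_theta[:-1] = (theta[1:] - theta[:-1]) / dx   # upwind shift derivative
    A_theta[-1] = A_theta[-2]                      # crude boundary condition
    theta = theta + (A_theta + mu) * dt + sig * np.sqrt(dt) * rng.standard_normal()
print(theta.shape)   # (200,)
```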
Remark 5.7. We do not require that all solutions of Equation (5.3) are $\Gamma^n$-valued, which would be too strong a condition and difficult to characterise. In particular, $\Gamma^n$ is more general than what we need.

Proposition 5.8.
Let $\theta$ be a $\Gamma^n$-valued solution of a CNKK equation and let $X$ be a semimartingale such that the predictable characteristics satisfy
\[
\kappa^{(X,B)}_t(u, v) = \kappa^{\theta_t}(u, v; t)
\]
for $u \in \mathbb{R}^n$, $v \in \mathbb{R}^d$ and $t \ge 0$; then the tuple $(X, \theta)$ satisfies the conditional expectation condition.

Proof. The proposition is basically a consequence of Theorem 2.12: the drift condition is satisfied by assumption and
\[
\exp\Big(i\langle u, X_t\rangle - \int_0^t \kappa^{(X,B)}_{s-}(u, v; s)\,ds\Big)
\]
is a (local) martingale because of the Lévy-Khintchine assumption in Definition 5.1 regarding the functions of the Lévy codebook Hilbert space.

5.3 Generalisation of the HJM equation

Equation (5.3) is very similar to the famous HJM equation, but there are relevant differences: for example, both the drift and the drift condition are different and, what is more, functions have an additional argument (a strike dimension) that is completely missing in the case of the HJM equation, where only a time dimension is considered.

We can construct a particular example which corresponds indeed to the HJM equation: let us consider a situation without leverage (where the Brownian motion $B$ is independent of the return process $X$), assuming that
\[
\kappa^\theta(u, v; 0) = \theta(u, 0) - \tfrac12\|v\|^2,
\]
for $u \in \mathbb{R}^n$, $v \in \mathbb{R}^d$ and $t \ge 0$. Basically, we are looking at functions in a restricted Lévy space, for which the Lévy measure is null. This implies that the CNKK equation is a parameter-dependent HJM equation. In this case, Condition (5.4) simplifies to
\[
\mu^{\mathrm{CNKK}}(\theta)(u, x) = -\sum_{i=1}^d \sigma_i(\theta)(u, x)\int_0^x \sigma_i(\theta)(u, s)\,ds,
\]
for $x \ge 0$ and $u \in \mathbb{R}^n$ (note the analogies with Example 2.24).

Example 5.9 (Black-Scholes model).
It might be interesting at this point to see a concrete example coming from a simpler model. If we consider asset prices described by a geometric Brownian motion $dS_t = S_t\sigma\,dW_t$, where $\sigma > 0$ is constant and $W$ is a standard Brownian motion, then the log-prices $X$ are given by $dX_t = d\log S_t = -\tfrac12\sigma^2\,dt + \sigma\,dW_t$. We find that
\[
\eta_t(-iu, r) = \tfrac12\sigma^2 u(u - 1) = F(u),
\]
in the notation of (2.9). It is easy to see that $\eta$ is pure drift and that the extended functional characteristic becomes
\[
F_t = \tfrac12\sigma_t^2\, u(u - 1) + \mu^L(u, t),
\]
where $\mu^L$ is the cumulant of the generalised Hull-White extension. From what has been said so far, it is clear how the CNKK equation is in fact a generalisation of the HJM equation.

Remark 5.10. It is possible to further increase the complexity of the equation, for example considering options on a term structure. In this case, we would need an additional argument to account for both drift and volatility.
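The pure-drift claim for this example can be checked by direct Monte Carlo (a quick sanity check of ours, with an arbitrary $\sigma$): the empirical cumulant rate of $X_T = -\sigma^2 T/2 + \sigma W_T$ must match $F(u) = \tfrac12\sigma^2 u(u-1)$:

```python
import numpy as np

sigma, T, n = 0.2, 1.0, 2_000_000
rng = np.random.default_rng(2)
# Black-Scholes log-returns X_T = -sigma^2 T / 2 + sigma W_T
X = -0.5 * sigma**2 * T + sigma * np.sqrt(T) * rng.standard_normal(n)
for u in (0.5, 1.0, 2.0):
    mc = np.log(np.exp(u * X).mean()) / T          # empirical cumulant rate
    F = 0.5 * sigma**2 * u * (u - 1.0)             # predicted pure-drift rate
    print(u, round(mc, 4), round(F, 4))
```

The case $u = 1$ gives $F(1) = 0$, i.e. the martingale property of $S = e^X$.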
Remark 5.11. All these considerations are conceivable only because we are dealing with affine processes. In general, for a return price process $X$, it might not be possible to write the conditional expectation condition and to proceed with the statements that follow.

Remark 5.12. It is possible to generalise what has been said in this section to processes driven by infinite-dimensional Brownian motions; e.g. in Equation (5.3) we could replace the sum $\sum_{i=1}^d$ with $\sum_{i \in \mathbb{N}}$. (In the literature, the HJM equation in Musiela parametrisation is also known as the HJMM equation, where the last M stands for Musiela.) The theory has been [...] $\mu^{\mathrm{CNKK}}$, for which no explicit expression is available.
We are now ready to face the same considerations reported above in a more rigorous setting. First of all, let us recap the most important equations. For the return process, we have
\[
dX(t) = \delta^X_t(X_t, V_t)\,dt + \gamma^X_t(X_t, V_t)\,dW_1(t) + dL_t, \qquad X(0) = x_0,
\]
\[
dV(t) = \delta^V_t(X_t, V_t)\,dt + \gamma^V_t(X_t, V_t)\,dW_2(t), \qquad V(0) = v_0, \qquad (6.1)
\]
with $dW_1(t)\,dW_2(t) = \rho_t\,dt$, for $\rho_t \in [-1, 1]$. Drift and volatility coefficients are denoted by $\delta$ and $\gamma$ respectively and can be functions of $X$ and $V$, e.g. $\gamma^X(x, v) = \gamma^V(x, v) = \sqrt{v}$. For the forward characteristic process, our codebook, we report the CNKK SPDE
\[
d\theta(t) = \big[A\theta(t) + \mu^{\mathrm{CNKK}}(\theta(t))\big]\,dt + \sum_{i=1}^d \sigma_i(\theta(t))\,dB^i(t), \qquad \theta(0) = \theta_0, \qquad (6.2)
\]
where the dependence among the Brownian motions $B^i$, $i = 1, \ldots, d$, and between these and $W_j$, $j = 1, 2$, is not specified. The key relation that connects the two systems is given in Corollary 3.1 and reads as follows (rewritten in the Musiela notation):
\[
\theta_t(-iu, x) = F_t\big(u, \psi^C(u, x)\big) + \big\langle R^C\big(u, \psi^C(u, x)\big), V_t \big\rangle. \qquad (6.3)
\]
Last but not least, we should also remember that the parameters are free to move in time. As such, we consider the process $p$, whose dynamics are exogenously given, but which is confined to the space of admissible parameters $\Theta \subset \mathbb{R}^M$. In the example described in Subsection 4.1 it is defined as
\[
p_t = (\theta_t, \sigma_t, \rho_t), \qquad t \ge 0,
\]
given the positivity constraints and the Feller condition for $\sigma_t$ and $\theta_t$, and $\rho_t \in [-1, 1]$. This is the reason why we used the subscript $t$ in Equations (6.1) above. Let us suppose that at time $t_0$ the model fits well the market surface given by observed call prices (or, equivalently, by the implied volatility surface) $C^{\mathrm{obs}}_{t_0}(T_i, K_j)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. In this situation, the processes $(\theta, (X, V))$ are free to evolve in time until a new calibration becomes necessary.
This is the case when
\[
\Delta C_t := \sum_{i=1}^n \sum_{j=1}^m \big| C^{\mathrm{model}}_t(T_i, K_j) - C^{\mathrm{obs}}_t(T_i, K_j) \big| > \varepsilon, \qquad (6.4)
\]
where $C^{\mathrm{model}}_t$ is the price of a call option at time $t$ given by the model and $\varepsilon$ is a threshold fixed a priori. Thus, we can define the following hitting times: for $i \in \mathbb{N}$,
\[
\tau_0 := \inf\{t > t_0 : \Delta C_t > \varepsilon\}, \qquad \tau_{i+1} := \inf\{t > \tau_i : \Delta C_t > \varepsilon\}. \qquad (6.5)
\]
As already underlined in [21], the model price $C^{\mathrm{model}}_t$ can be expressed as a measurable function of the codebook $\theta$ satisfying Equation (6.2). In particular, since this is a progressively measurable process (it is right-continuous on a complete probability space), we can use the Début theorem, which tells us that the sequence $(\tau_i)_{i \in \mathbb{N}}$ is in fact made of stopping times.

Remark 6.1. Since we are in the same framework as [8], and indeed we could generalise all results to infinite-dimensional Brownian motions, it is worth mentioning that the solution process $\theta$ satisfies the strong Markov property.

The strictly increasing sequence $(\tau_i)_{i \in \mathbb{N}}$ is important since it is at these random (stopping) times that we have to run a new calibration procedure for the model. Both the "true" state variables $X$ and $V$ and the parameter process $p$ are allowed to change in order to bring $\Delta C_t$ below $\varepsilon$ again. This is only the first calibration problem we need to solve. Indeed, in order to compensate the changes caused by the new parameters, we have to modify $F_t$. In particular, we can calibrate the so-called Hull-White extension part, which enters $F_t$ as the cumulant generating function $\mu^L$ of the Lévy process $L$. This recalibration ensures that we are not breaking the validity of Equation (6.3), while allowing for an "exact" match (in the sense of having $\Delta C_t < \varepsilon$) with the observed data. Once $\mu^L$ is recovered, we are able to write down again the equation for the codebook $\theta$.
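In discrete time the trigger (6.4)-(6.5) is immediate to implement. The sketch below (synthetic toy prices of our own on a 10 x 13 grid, with an arbitrarily chosen threshold $\varepsilon$) computes $\Delta C_t$ and the first hitting time $\tau_0$:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy data: observed and model call prices on a 10 x 13 (maturity, strike)
# grid over 100 dates, with the model error drifting upwards in time
obs = rng.uniform(1.0, 10.0, size=(100, 10, 13))
model = obs + 0.001 * np.arange(100)[:, None, None] \
            * rng.standard_normal((100, 10, 13))

delta_C = np.abs(model - obs).sum(axis=(1, 2))   # Delta C_t as in (6.4)
eps = 5.0                                        # threshold fixed a priori
tau0 = int(np.argmax(delta_C > eps))             # first hitting time, cf. (6.5)
print(tau0, bool(delta_C[tau0] > eps))
```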
We can summarise this last passage more mathematically by introducing an operator $I$ such that
\[
I: \Theta \times \mathbb{R}^{n+m+2} \to \mathrm{Inc}_{\mathbb{R}^n \times C}, \qquad \big(p, (X, V), C^{\mathrm{obs}}\big) \mapsto \mu^L. \qquad (6.6)
\]

Remark 6.2. The cumulant generating function $\mu^L$ identifies $L$ uniquely if and only if the process $L$ has finite moments of order $n$ for all $n \in \mathbb{N}$. If we denote by $\nu^L$ the Lévy measure associated with $L$, then this is true if and only if
\[
\forall n \in \mathbb{N}, \qquad \int_{\|x\| \ge 1} \|x\|^n\,\nu^L(t, dx) < \infty.
\]

Eventually, we are now ready to give the definition of a Consistent Recalibration (CRC) model with piecewise constant parameters $p$:

Definition 6.3 (Consistent Recalibration Model with piecewise-constant $p$). Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a complete filtered probability space. The quintuple $(\theta, (X, V), p, L, (\tau_i)_{i \in \mathbb{N}})$ is called a consistent recalibration model for equity derivative pricing if for the stochastic processes $(\theta, (X, V), p)$ with values in $H \times (\mathbb{R}^n \times C) \times \Theta$ there exists a jump Lévy process $L$ (with finite moments) such that the following conditions are satisfied for all $n \in \mathbb{N}$:

(i) The Hull-White extension $L$ on $[\tau_n, \tau_{n+1}]$ is determined by calibration to $\theta(\tau_n)$ through $\mu^L$:
\[
\theta(\tau_n)(u, 0) = \kappa^{\theta(\tau_n)}(u, 0; 0), \qquad \mu^{L(\tau_n)} = I\big(p(\tau_n), (X(\tau_n), V(\tau_n)), C^{\mathrm{obs}}_{\tau_n}\big),
\]
and for $t \in [\tau_n, \tau_{n+1}]$ we have $L(t) = S(t - \tau_n)L(\tau_n)$.

(ii) The evolution of $(X, V)$ on $[\tau_n, \tau_{n+1}]$ corresponds to the Hull-White extended stochastic volatility affine model determined by the parameters $p(\tau_n)$ and by the process $L(\tau_n)$:
\[
(X, V)(t) = \big(X^{\tau_n, X(\tau_n)}, V^{\tau_n, V(\tau_n)}\big)(t) \qquad \text{for } t \in [\tau_n, \tau_{n+1}],
\]
where $(X^{s,x}, V^{s,v})$ is the unique solution of the system of SDEs (6.1) on $[s, \infty)$ with initial conditions $X(s) = x$ and $V(s) = v$ and with $L_t$ replaced by $L_{t-s}$. Implicitly, we also assume that all parameters $p$ entering the model are admissible (with the usual meaning).

(iii) The evolution of $\theta$ on $[\tau_n, \tau_{n+1}]$ is determined by $X$ and $V$ according to the prevailing Hull-White extended stochastic volatility model: for $t \in [\tau_n, \tau_{n+1}]$ and $x \in [0, \tau_{n+1} - \tau_n]$,
\[
\theta(t)(-iu, x) = F_{\tau_n}\big(u, \psi^C(u, x)\big) + \big\langle R^C\big(u, \psi^C(u, x)\big), V(t) \big\rangle.
\]

For the processes $(X, V)$ and $\theta$ we use the same symbols as in Equations (6.1) and (6.2), with a slight abuse of notation, since these stochastic processes evolve on the intervals $[\tau_n, \tau_{n+1}]$ following the same dynamics, but according to the parameters $p(\tau_n)$ and to the process $L(\tau_n)$. The parameters $p$ remain constant on each interval of the type $[\tau_n, \tau_{n+1})$ and, by construction, $(\theta, (X, V))$ is continuous at every stopping time $\tau_n$. In this sense, any CRC model can be seen as the concatenation of stochastic volatility affine models with static parameters.

There are practical remarks that we should consider when dealing with CRC models as defined above in a numerical framework.
• Simulations: As already mentioned in [15], it is worth noting that simulating $(X, V)$ is much easier than simulating the HJM codebook $\theta$, in particular when the latter is infinite-dimensional, as could also be the case here. In fact, with the approach we are outlining, we do not need to simulate anything from an infinite-dimensional distribution.

• Drift term: Even if we wanted to simulate $\theta$ by solving the SPDE (6.2), we would need to write down explicitly the drift term $\mu^{\mathrm{CNKK}}(\theta)$, but, apart from some degenerate cases, this is not possible. The only way to overcome this chasm is to acknowledge Equation (6.3) as the key relation for the entire construction.

• Process $p$: If we assume that a piecewise constant process for the parameters $p$ is given (or obtained through calibration), then CRC models can be simulated following steps (i) to (iii) in Definition 6.3.

• Operator $I$: Last but not least, we have not specified precisely how the operator $I$ acts. For the moment, we consider it as an abstract operator. It is nevertheless of great relevance, because once we are able to recover $L$ (or, alternatively, $\mu^L$), we can obtain $\theta$ through Equation (6.3). Otherwise said, we could solve the SPDE (6.2).

If we look more closely at steps (i)-(iii) of Definition 6.3, we realise that the most complex aspect is the application of the operator $I$. This is basically a calibration conditioned on some (new) parameters and state variables, whose complexity depends on the distribution of the Lévy process $L$. In general, this is not a trivial operation, since it consists in solving an inverse problem that is ill-posed in the sense of Hadamard even in the easiest cases (e.g. the Bates model). The inverse problem is ill-posed because of an identifiability issue, which means that the information coming from market data is insufficient to exactly identify the parameters.
If we express the quantity $\Delta C_t$ of Equation (6.4) as a function of the model parameters $\vartheta$, that is,
\[
\Delta C_t(\vartheta) = \sum_{i=1}^n \sum_{j=1}^m \big| C^{\mathrm{model}}_t(\vartheta; T_i, K_j) - C^{\mathrm{obs}}_t(T_i, K_j) \big|,
\]
we can phrase the identifiability problem as the fact that the function $\Delta C_t(\vartheta)$ has many local minima. Furthermore, it is in general unclear whether these minima can be reached by the adopted algorithm. For example, Cont and Tankov show in [4] that if one had available a set of call option prices (or, equivalently, implied volatilities) for all strikes (in a given interval!) and a single maturity, then it would be possible to deduce all the parameters of the model and, in particular, the Lévy triplet. But in reality this is never the case, since we only know prices for a finite number of strikes and, in addition, we also have observational errors in the data. As a result, we have a serious identification problem, exemplified by the fact that we can obtain the same prices for (infinitely) many combinations of the parameters. The strategy which they begin to develop in [4] and complete in [5] is the use of the Kullback-Leibler divergence, also called relative entropy, as a regulariser in order to obtain a well-posed inverse problem. To overcome this issue, we decided to follow a different strategy, making use of the implicit regularisation present in neural networks.

Andres Hernandez was among the first to apply neural networks (NN) to calibration tasks in finance. In [19], he showed that a feedforward NN can actually approximate the inverse map given by the pricing formula and obtain the two parameters of the Hull-White interest rate model $(a, \sigma)$ as the output of a NN. Just as a reminder, the Hull-White model consists of the following SDE:
\[
dr(t) = [\beta(t) - a r(t)]\,dt + \sigma\,dW(t),
\]
where $a, \sigma > 0$ and $\beta(t)$ is uniquely determined by the term structure.
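As a reminder of how this short rate behaves, the following sketch (our own, with hypothetical parameter values and, for simplicity, a constant $\beta$, whereas in the model $\beta(t)$ is pinned down by the initial term structure) simulates the dynamics with the exact Gaussian transition of the underlying Ornstein-Uhlenbeck process and compares the Monte Carlo mean with the closed form:

```python
import numpy as np

a, sigma, beta, r0 = 0.5, 0.01, 0.02, 0.01
T, n_steps, n_paths = 5.0, 500, 100_000
rng = np.random.default_rng(4)
dt = T / n_steps
r = np.full(n_paths, r0)
ea = np.exp(-a * dt)
sd = sigma * np.sqrt((1.0 - np.exp(-2.0 * a * dt)) / (2.0 * a))
for _ in range(n_steps):
    # exact Ornstein-Uhlenbeck transition conditional on the current value
    r = r * ea + (beta / a) * (1.0 - ea) + sd * rng.standard_normal(n_paths)
mean_exact = r0 * np.exp(-a * T) + (beta / a) * (1.0 - np.exp(-a * T))
print(r.mean(), mean_exact)   # the two agree to about 1e-4
```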
The greatest achievement that he highlighted in the paper is the possibility of replacing the traditional "slow" and cumbersome calibration procedure with a new, straightforward deterministic map, which makes calibration itself a very efficient task, since the core of all calculations is offloaded to the training phase. In fact, once a NN is trained, its application is extremely cheap from a computational point of view, the most expensive operations being simple matrix-vector multiplications.

Despite the good results obtained by Hernandez, learning the map from prices to parameters can be critical, since this map is not known in explicit form. In principle, we do not even know whether the universal approximation theorem can be applied, because the direct map might not be bijective (thus having a discontinuous inverse). In general, since the inverse map is not known, we lack control over it, and it might well be that a NN learns the map appropriately on the given training sample, but is not able to generalise to out-of-sample data. This is actually what happened when we tried to apply this approach to our problem, since our situation is considerably more involved than Hernandez's. For this reason, it is a better idea to learn the direct (or forward) map from the parameters to the prices/implied volatilities. This is done, for example, by Horvath and coauthors in [20], who implemented a feedforward NN to obtain the volatilities directly from the model parameters. Note that for the training of the networks the data are artificially generated and the grid of strikes and maturities is fixed at the beginning. Again, the most appealing advantage they see in the application of NNs is the possibility of enabling live calibration of derivative instruments, since the application of the NN itself only requires milliseconds, avoiding the traditional bottleneck of calibration. This allows making use of models that were previously considered too computationally expensive
This allows makinguse of new models that were before considered too computationally expansive This curve is calibrated through the derivative of the instantaneous forward rate f ( t, T ) at time t = 0 , i.e. β ( t ) = ∂f (0 ,t ) ∂T + af (0 , t ) + σ a (cid:0) − e − at (cid:1) once the other two parameter a and σ have been calibrated. I defined in (6.6).In order to make the system works, we need two neural networks. The first,denoted in the following as NN , is a map between parameters and volatilities(basically, the usual pricing function, as learnt in [20]), while the second, called NN , maps volatilities to parameters (but is not trained in the usual way); inour case, to the parameters defining the Lévy process L . Then, we compose thetwo networks, where the first is trained, while the second is not, to obtain a newneural network NN which receives in input volatilities and returns the (same)volatilities: NN := NN ◦ NN . In other words, NN will learn the identity and, during the training phase, NN will get trained. In this respect, we can see NN as the inverse neural networkof NN . The trick is as simple as that. However, notice that we might notrecover exactly the same parameters that gave birth to the original IVS, but an equivalent combination that resulted in the same surface through NN . In broad terms, the numerical implementation follows the model outlined inSection 4, where the process L is a compensated compound Poisson process We could think of the combinations selected by the neural network as the representativesfor an equivalence class, where all members originate the same implied volatility surface.
[Figure 2: Keras representation of NN3 (with just 2 "main" hidden layers instead of the 4 of NN1, to fit the picture on the page). NN1 is summarised as model_1 at the very bottom. Between the two "main" hidden layers elu_2 and elu_4 one finds another layer (whence the nomenclature used in the text).]
in which the size of jumps is normally distributed. Since the parameters are allowed to change in time, the mean and variance of the Gaussian distribution for the jump size of L are time-dependent, while the Poisson rate is considered constant in time. The same holds true for the other parameters belonging to the "Heston" part, with the exception of the interest rate r, the dividend rate q and k, which is the speed of mean reversion of the variance process. We decided to free ourselves from a static framework and to use a variable maturities-strikes grid. More precisely, we used ten different times-to-maturity {τ_i}_{i=1,…,10} with τ_1 < ⋯ < τ_10 ranging between 7 and 440 days (extremes included), more concentrated at short maturities, and thirteen different moneynesses m_1 < ⋯ < m_13 ranging between 0.8 and 1.2 (extremes included) in strictly increasing order. One difference with respect to the generalised Bates model described in Section 4 is that we have a maturity-dependent jump distribution: mean and variance depend on the maturity in the sense that they are piecewise constant between two adjacent times-to-maturity; therefore the process L is here modelled through 11 parameters: the Poisson rate and five mean-variance tuples for the normal distributions.

The first neural network, NN1, is a 1-cell residual feedforward NN (see [17] for more information on ResNets) composed of 4 "main" hidden layers with 1024 nodes each. The input layer has dimension 41 and includes r, q, {τ_i}_{i=1,…,10}, {m_i}_{i=1,…,13}, v_0, k, θ, σ, ρ, λ, {ν_i}_{i=1,…,5}, {δ_i}_{i=1,…,5}, where ν_i and δ_i are the means and standard deviations of the normal distributions for the jump size. The output layer has dimension 130 and includes the entire point-valued volatility surface, denoted as {IVS_i}_{i=1,…,130}. The activation function used for all layers (apart from the output layer) is ELU.
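One plausible reading of the NN1 architecture just described can be sketched in Keras. This is our interpretation of the text (in particular of how the single residual cell wraps the "main" hidden layers), not the authors' code:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sketch of NN1 (model parameters -> implied vols): 41 inputs,
# 4 "main" hidden layers of 1024 ELU nodes with batch normalisation, one
# residual (skip) connection and 130 linear outputs for the 10x13 IVS grid.
inp = keras.Input(shape=(41,))            # r, q, tenors, moneynesses, parameters
x = layers.Dense(1024)(inp)
x = layers.BatchNormalization()(x)
x = layers.Activation("elu")(x)
skip = x                                  # entry point of the 1-cell residual block
for _ in range(3):                        # remaining "main" hidden layers
    x = layers.Dense(1024)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("elu")(x)
x = layers.Add()([x, skip])               # residual connection (see [17])
out = layers.Dense(130)(x)                # point-valued implied volatility surface
nn1 = keras.Model(inp, out)
```

The linear output layer matches the text, which reserves the ELU activation for the hidden layers only.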
This network is trained first with artificially generated data: all parameters are sampled from uniform distributions whose extremes are defined a priori (and are kept fixed throughout the process). Then QuantLib Python routines (see [1]) are used to obtain all the necessary prices in an efficient and fast way. Implied volatilities are then retrieved through the algorithm outlined by Fabien Le Floc'h in http://chasethedevil.github.io/post/implied-volatility-from-black-scholes-price/ and implemented in Python.

Second, NN2 is created, but not (immediately) trained. As already explained, this second neural network will be trained only after being composed with the trained NN1, which is marked as non-trainable in this second phase. The composed network is called NN3. In order to learn the operator I, NN3 will be trained and, as a side result, NN2 will be trained as well. That is to say, NN3 is merely used as a tool to get NN2 trained. (With "main" hidden layers we mean that between the predefined hidden layers one only finds a single application of the activation function, basically another hidden layer; the situation might be clearer by looking at Figure 2.)

Finally, let us describe NN2. As already said, the goal of NN3 is basically to learn the identity function. Thus the input of NN2 is θ, σ, ρ, {IVS_i}_{i=1,…,130}, while the output, since we have to learn the Lévy process L, is {ν_i}_{i=1,…,5}, {δ_i}_{i=1,…,5}. From an architectural viewpoint, NN2 has 4 "main" hidden layers with 1024 nodes each. The activation function is ELU, as for NN1. While training NN3 (and, implicitly, NN2), we provide as output the entire implied volatility structure {IVS_i}_{i=1,…,130}, and as input the concatenation of the complete input of NN2 plus the incomplete input of NN1, that is, everything listed above apart from {ν_i}_{i=1,…,5}, {δ_i}_{i=1,…,5}, which have to be guessed during the training.
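The implied-volatility retrieval step mentioned above can be illustrated with a generic root-finding inversion. The paper uses Le Floc'h's algorithm; a plain Brent search on the Black-Scholes formula (shown here as a stand-in, not the paper's implementation) has the same input/output contract, only slower:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Black-Scholes call price with continuous dividend yield q.
def bs_call(S, K, T, r, q, sigma):
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * np.exp(-q * T) * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Invert the price for the volatility by bracketing the (monotone) root.
def implied_vol(price, S, K, T, r=0.0, q=0.0):
    return brentq(lambda s: bs_call(S, K, T, r, q, s) - price, 1e-6, 5.0)

price = bs_call(100.0, 95.0, 0.5, 0.01, 0.0, 0.25)   # synthetic "market" price
print(round(implied_vol(price, 100.0, 95.0, 0.5, 0.01), 6))  # -> 0.25
```

Le Floc'h's rational initial guess exists precisely to avoid the iteration count of such generic solvers when hundreds of thousands of surfaces must be generated.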
To obtain a satisfactory training procedure, we also tried different activation functions for the output layer of NN2. In the end, the best results were reached using the standard sigmoid function stretched to completely cover the intervals from which ν_i and δ_i were (randomly) drawn. Without this precaution the training process did not converge to a reliable result.

For both training processes we used the mean squared error on the implied volatilities as loss function, since we were dealing with regression-type tasks (other loss functions were tried, but they gave birth to NNs performing more poorly). To obtain better results, a linear transformation of the input and output data was also really helpful: apart from the implied volatilities, which were kept unchanged, all other quantities were scaled to reside in the interval (0, 1]. Moreover, as can be seen from Figure 2, we made use of batch normalisation, while we avoided drop-out. The best batch size for both training processes was 10'000 (out of a database of around 600'000 elements). All hyperparameters were selected after tuning the networks, using not only manual adjustments but also other techniques like randomised search.

The whole neural network architecture was developed using Keras. A schematic representation can be found in Figure 3.

Having a concrete numerical tool that allows the recalibration of our model online and basically instantaneously has another "cheerful" consequence. Let us imagine, for the moment, that the dynamics of the parameters p are known. If this is the case, then we can model the evolution of an implied volatility surface for an indefinite time without breaking any arbitrage constraints, neither static nor dynamic ones. To our knowledge, this is the first time this has been achieved in an efficient way; one impressive other implementation has been presented in [3]. The algorithm used to accomplish this is outlined in Algorithm 1.
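The training scheme described above (MSE identity loss, gradients flowing through the frozen NN1 into NN2) can be demonstrated on a deliberately tiny linear toy model. Everything below is our illustration of the principle, not the paper's code; both networks are reduced to linear maps so the whole loop fits in a few lines:

```python
import numpy as np

# NN1 is a frozen "pricing" map (parameters -> vols); NN2 (vols -> parameters)
# is trained only through the composed map NN3 = NN1 ∘ NN2 on the identity loss.
rng = np.random.default_rng(0)

# Frozen stand-in for the trained forward map NN1: a linear map with
# orthonormal columns, 3 "model parameters" -> 6 "implied vols".
W1 = np.linalg.qr(rng.normal(size=(6, 3)))[0]

def nn1(p):            # non-trainable in this phase
    return p @ W1.T

W2 = np.zeros((3, 6))  # trainable inverse map NN2, initialised at zero

def nn2(v):
    return v @ W2.T

# Training data: sample parameters uniformly, generate the matching surfaces.
params = rng.uniform(0.0, 1.0, size=(512, 3))
vols = nn1(params)

# Gradient descent on the identity loss ||NN1(NN2(v)) - v||^2:
# only W2 is updated; gradients merely flow through the frozen W1.
lr = 0.5
for _ in range(2000):
    resid = nn1(nn2(vols)) - vols            # NN3 output minus its own input
    W2 -= lr * (2.0 / len(vols)) * W1.T @ resid.T @ vols

# NN3 now reproduces the surfaces; NN2 returns *a* parameter combination
# generating each one (possibly not the original, as noted in the text).
max_err = np.abs(nn1(nn2(vols)) - vols).max()
```

The same mechanism carries over verbatim to the deep nonlinear networks of the paper: mark NN1 as non-trainable, stack NN2 before it, and fit the composition to the identity.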
(Stretched, that is, instead of the interval [0, 1] which represents the codomain of the function.)

[Figure 3: Representation of NN3 as a fully-connected feedforward neural network (residual cells are ignored).]

Algorithm 1
1. Pick initial values for the state variables (return of asset X and variance V), the Heston parameters (θ, σ, ρ and k), the jump frequency ∼ Pois(λ dt) and the jump-size normal distributions ∼ N(ν_i, δ_i) for i = 1, …, 5. The parameters λ and k remain fixed throughout the procedure.
2. Compute the implied volatility surface (IVS) given the initial values.
3. Bates step: update the two state variables X and V into X_new and V_new.
4. Compute the new implied volatility surface IVS_new given X_new and V_new.
5. Heston-parameter step: update the three parameters θ, σ, ρ according to an exogenously given dynamics and obtain θ_new, σ_new, ρ_new.
6. Given IVS_new together with θ_new, σ_new, ρ_new, compute the new parameters (ν_new_i, δ_new_i)_{i=1,…,5} such that the IVS obtained with X_new, V_new, ρ_new, θ_new, σ_new, λ, k and (ν_new_i, δ_new_i)_{i=1,…,5} remains constant (equal to IVS_new).
7. Overwrite the initial parameters with the new parameters (having the sub/super-script new).
8. Restart from point 3.

As already written in the algorithm, for our purposes we decided to initially pick the parameters θ, σ, ρ at random, but then to let them evolve according to very simple dynamics, namely adding some noise to the current value to get the new one. The variance of the Gaussian noise has been chosen relatively small and values are scaled if they overcome a certain threshold, so that the relative change (with respect to the initial value) cannot exceed 5%. In addition, we made sure that the Feller condition was always satisfied and that the values could not exit the natural domains we assigned them.
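The Heston-parameter step with capped noise can be sketched as follows. All concrete bounds, noise scales and initial values below are illustrative choices of ours, not the paper's:

```python
import numpy as np

# Sketch of Step 5 of Algorithm 1: Gaussian noise with small variance, the
# relative change w.r.t. the initial value capped at 5%, and values clipped
# to their natural domains (e.g. rho collapsed into [-1, 1]).
rng = np.random.default_rng(1)

def update_param(value, init_value, noise_scale, lo, hi):
    prop = value + rng.normal(0.0, noise_scale)
    cap = 0.05 * abs(init_value)                 # |change| <= 5% of initial value
    prop = init_value + np.clip(prop - init_value, -cap, cap)
    return float(np.clip(prop, lo, hi))          # collapse to the closest extreme

k = 2.0                                          # fixed throughout the procedure
theta0, sigma0, rho0 = 0.04, 0.30, -0.70         # randomly picked initial values
theta = update_param(theta0, theta0, 1e-3, 0.01, 0.50)
sigma = update_param(sigma0, sigma0, 1e-3, 0.05, 0.90)
rho = update_param(rho0, rho0, 1e-3, -1.0, 1.0)

# The Feller condition 2*k*theta >= sigma^2 must survive the update.
assert 2.0 * k * theta >= sigma**2
```

Iterating this update between the Bates and recalibration steps yields the exogenous parameter dynamics used in the simulation.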
For example, if the correlation ρ were brought outside of the interval [−1, 1], then we would force it to remain inside by collapsing the value to the closest extreme. For both θ and σ a fixed sub-interval of (0, 1) was chosen. Notice also that Steps 2, 4 and 6 of Algorithm 1 are carried out by neural networks: NN1 for Steps 2 and 4, and NN2 for Step 6.

Solving the same problem with the desired precision without neural networks would have required immense computational power, since the inverse problem is notoriously ill-posed and the regularised inverse problem has to be solved at every point in time along the discretisation grid. This becomes possible on a standard laptop only through these techniques. Finally, it is important to underline that we do not break any arbitrage condition, because the CNKK drift condition is fully incorporated in the steps of Algorithm 1.

8 Finite dimensional realisations for CNKK equations
We can formulate a geometric interpretation of CRC models gathering technical material from [12], on which we heavily rely. In the sequel, we shall consider particular vector fields σ which depend on the state of the forward characteristic θ only via a tenor 0 ≤ x_1 < ⋯ < x_n of times-to-maturity, through continuous linear functionals ℓ. As in interest rate theory, such vector fields allow for a geometric analysis of solutions of CNKK equations. In particular, we would like to find conditions under which these kinds of vector fields σ lead to an affine solution of the CNKK SPDE.

Recall that G is a Hilbert space of continuous complex-valued functions defined on the strip (−i[0, 1])^n × R^n, i.e. G ⊂ C((−i[0, 1])^n × R^n; C). The theory is developed on the Fréchet space

D(A^∞) := ⋂_{n ∈ N} D(A^n),

equipped with the family of seminorms

p_n(h) = Σ_{i=0}^{n} ||A^i h||_H, for all n ∈ N,

where H is as in Definition 5.1. An essential part is also played by Banach maps:

Definition 8.1 (Banach map). Given a Fréchet space E, a smooth (i.e. C^∞) map P : E ⊃ U → E is called a Banach map if there exist smooth (not necessarily linear) maps R : E ⊃ U → B and Q : B ⊃ V → E such that P = Q ∘ R, where B is a Banach space and U and V are open sets.

We are now ready for the following definition:
Definition 8.2 (Tenor-dependent volatility). We call the volatility vector fields σ_1, …, σ_d of a CNKK equation tenor-dependent if

• σ_i(θ) = φ_i(ℓ(θ)) for 1 ≤ i ≤ d, where ℓ ∈ L(H, G^p) for some p ∈ N, and φ_1, …, φ_d : G^p → D(A^∞) are smooth and pointwise linearly independent maps; moreover μ_CNKK(θ) = φ_0(ℓ(θ)), where φ_0 : G^p → D(A^∞) is smooth (we usually have to assume ℓ(η) = η(0, ·));

• for every q ≥ 1, the map (ℓ, ℓ ∘ (d/dx), …, ℓ ∘ (d/dx)^q) : D((d/dx)^∞) → G^{p(q+1)} is open;

• A is an unbounded linear operator; that is, D(A) is a strict subset of H. Equivalently, A : D(A^∞) → D(A^∞) is not a Banach map (by Lemma 2.12 in [12]).

Before continuing, it might be useful to recall some concepts from geometry.
Definition 8.3 (Distribution). Given a Fréchet space E, a distribution on an open subset U of E is a collection of vector subspaces D = {D_f}_{f ∈ U} of E. A distribution D on U is said to be involutive if for any two locally given vector fields X_1, X_2 with values in D (i.e. X_i(f) ∈ D(f) for any f ∈ U) the Lie bracket [X_1, X_2] also has values in D.

Definition 8.4 (Weak foliation). A weak foliation F of dimension n on an open subset U of a Fréchet space E is a collection of submanifolds with boundary {M_r}_{r ∈ U} such that

i) for all r ∈ U we have r ∈ M_r with dim(M_r) = n;

ii) the distribution D(F)(f) := span{T_f M_r | r ∈ U with f ∈ M_r} has dimension n for all f ∈ U, i.e. given f ∈ U the tangent spaces T_f M_r agree for all M_r ∋ f. This distribution is called the tangent distribution of F.

In general, we say that a distribution D is tangent to F if D(f) ⊂ D(F)(f) for all f ∈ U.

Finally, let us remind the reader that the set generated by all multiple Lie brackets of the vector fields μ_CNKK, σ_1, …, σ_d generates a Lie algebra, denoted D_LA. Moreover, let us define D as the distribution given by the same vector fields, that is, D := span{μ_CNKK, σ_1, …, σ_d}. All these concepts are extremely important since they enter the formulation of a weak version of the Frobenius theorem (see Theorem 3.9 and Proposition 4.8 in [12]), which states:

Theorem 8.5.
Let U be an open set in D(A^∞) and F an n-dimensional weak foliation on U, for n ∈ N. Then D is involutive ⟺ D is tangent to F.

Being involutive is equivalent to saying that D_LA ⊂ D(F) on U. Therefore, boundedness of dim(D_LA) is necessary for the existence of a weak foliation on U. If we assume that D_LA has constant and finite dimension N_LA on the open and connected set U, then we can also prove the next theorem:

Theorem 8.6. Let U be an open and connected subset of the Fréchet space D(A^∞) such that dim(D_LA) = N_LA on it. For a tenor-dependent volatility structure, there exist linearly independent constant vectors λ_1, …, λ_{N_LA − 1} such that D_LA = span{μ_CNKK, λ_1, …, λ_{N_LA − 1}} and σ_i(θ) ∈ span{λ_1, …, λ_{N_LA − 1}}, for 1 ≤ i ≤ d on U.

If we define
Σ := { h ∈ D(A^∞) | μ_CNKK(h) ∈ span{λ_1, …, λ_{N_LA − 1}} },

it is clear that we can only expect the results of Theorem 8.6 to be true on at most D(A^∞) \ Σ. It can be shown (again in [12]) that Σ is closed and nowhere dense in D(A^∞) and that we have N_LA = dim(D_LA) ≥ dim(D) = d + 1 on D(A^∞) \ Σ. This is the reason why all leaves of our foliation will have dimension N ≥ d + 1.

But what do these leaves look like? All forward characteristics remain within the finite-dimensional manifold with boundary given by

{ F_t(u, ψ_C(u, x)) + ⟨R_C(u, ψ_C(u, x)), y⟩ | (t, u, y) ∈ R_+ × R^n × C },

as one can see from (6.3), where the model parameters are fixed. Any forward characteristic h ∈ D(A^∞) ⊂ H will imply a system of (Riccati) ODEs which will intersect the leaf at most once. But if we are able to change the couple (F_t, R_C) by varying the model parameters, we can obtain more intersections. In this sense, a CRC model can be seen as the concatenation of different forward characteristics h on different leaves identified by the functional characteristics (F_t, R_C), which can evolve in time. Every time this selection is operated, the forward characteristic instantiates a finite-dimensional realisation in D(A^∞) \ Σ. We conclude the section with one last theorem.

Theorem 8.7.
Let σ_1, …, σ_d be a tenor-dependent volatility structure of a CNKK equation. Assume furthermore that, for initial values in a large enough subset of Γ_n, the local mild solutions θ of the CNKK equation leave the leaves of a given foliation with constant dimension N locally invariant (regular finite-dimensional realisation). Then there exist λ_1, …, λ_{N_LA − 1} such that σ_i(θ) ∈ span{λ_1, …, λ_{N − 1}}. This means in particular that there exist a function A_t defined on R^n × R_+ and an R^{N_LA − 1}-valued process Z with Z_0 = 0 for which

θ_t(u, x) = A_t(u, x) + Σ_{i=1}^{N_LA − 1} λ_i(u, x) Z_t^i   (8.1)

up to some stopping time τ, for x ≥ 0 and u ∈ R^n. Here τ is related to the definition of local mild solution (see [11]).

Remark. The affine character of the representation of the solution process θ in (8.1) is apparent. In particular, this representation leads via the conditional expectation formula (in the case of a global solution of the CNKK equation) to affine factor processes Z and a homogeneous, time-inhomogeneous affine process (X, Y).

In this paper, we have tried to set up a new rigorous framework in continuous time for the dynamics of volatility surfaces (or cubes, etc.), so-called consistent recalibration models, together with applications. To do so, we took inspiration from similar work in discrete time by Richter and Teichmann [25] and from another paper by Harms et al. [15], which builds the theory in continuous time but focuses on yield curve modelling. With respect to the latter, our setting is more complex due to the richer term structure, which is here enriched with a "strike" dimension. This is reflected in what we called the CNKK equation, a generalisation of the more popular HJM equation, but with a considerably more involved drift term. It goes without saying that this makes the equation intractable from an analytical point of view. To overcome this issue, we decided to represent the drift term by neural networks.
We therefore proposed a new way of solving the (ill-posed) calibration problem, by exploiting the fact that a composition of neural networks is still a neural network and by defining, in this sense, a sort of inverse network relying on implicit regularisation. The same trick could be used for other applications, also in branches other than mathematical finance, to solve inverse problems. The use of neural networks was crucial to making the numerical procedures tractable and to obtaining information on the solution of the CNKK SPDE. In this case, we can say that the neural network helped us solve an equation which we could not even write down (explicitly).

Finally, we could use the same inverse neural network to simulate the evolution in time of an implied volatility surface, in this case generated by a generalised Bates model. To the best of our knowledge, it is the first time this can be achieved for an indefinite time without breaking arbitrage constraints.
References

[1] Ferdinando Ametrano, Luigi Ballabio, et al. QuantLib – a free/open-source library for quantitative finance, 2003. http://quantlib.org/.
[2] René Carmona and Sergey Nadtochiy. Tangent Lévy market models. Finance and Stochastics, 16(1):63–104, Jan 2012.
[3] R. Carmona, Y. Ma, and S. Nadtochiy. Simulation of implied volatility surfaces via tangent Lévy models. SIAM Journal on Financial Mathematics, 8(1):171–213, 2017.
[4] Rama Cont and Peter Tankov. Nonparametric calibration of jump-diffusion option pricing models. Journal of Computational Finance, 7:1–49, 2004.
[5] Rama Cont and Peter Tankov. Retrieving Lévy processes from option prices: Regularization of an ill-posed inverse problem. SIAM Journal on Control and Optimization, 45(1):1–25, 2005.
[6] Christa Cuchiero, Wahid Khosrawi, and Josef Teichmann. A generative adversarial network approach to calibration of local stochastic volatility models. arXiv:2005.02505, preprint, 2020.
[7] Christa Cuchiero, Lukas Gonon, Lyudmila Grigoryeva, Juan-Pablo Ortega, and Josef Teichmann. Approximation of dynamics by randomized signature. Working paper, 2020.
[8] Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 2014.
[9] Darrell Duffie, Damir Filipović, and Walter Schachermayer. Affine processes and applications in finance. The Annals of Applied Probability, 13(3):984–1053, 2003.
[10] Bruno Dupire. A unified theory of volatility. Derivatives Pricing: The Classic Collection, 185–196, 1996.
[11] Damir Filipović. Consistency Problems for Heath-Jarrow-Morton Interest Rate Models, volume 1760 of Lecture Notes in Mathematics. Springer Berlin Heidelberg, Jan 2001.
[12] Damir Filipović and Josef Teichmann. Existence of invariant manifolds for stochastic equations in infinite dimension. Journal of Functional Analysis, 197(2):398–432, 2003.
[13] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. Volatility is rough. Quantitative Finance, 18(6):933–949, 2018.
[14] Julien Guyon and Pierre Henry-Labordère. The smile calibration problem solved. Preprint, available at https://ssrn.com/abstract=1885032, 2011.
[15] Philipp Harms, David Stefanovits, Josef Teichmann, and Mario V. Wüthrich. Consistent recalibration of yield curve models. Mathematical Finance, 28(3):757–799, 2018.
[16] David Heath, Robert Jarrow, and Andrew Morton. Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica: Journal of the Econometric Society, 77–105, 1992.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[18] Jakob Heiss, Josef Teichmann, and Hanna Wutte. How implicit regularization of neural networks affects the learned function – Part I. arXiv:1911.02903, preprint, 2019.
[19] Andres Hernandez. Model calibration with neural networks. Risk, 2017.
[20] Blanka Horvath, Aitor Muguruza, and Mehdi Tomas. Deep learning volatility. arXiv e-prints, arXiv:1901.09647, Jan 2019.
[21] Jan Kallsen and Paul Krühner. On a Heath–Jarrow–Morton approach for stock options. Finance and Stochastics, 19(3):583–615, Jul 2015.
[22] Martin Keller-Ressel. Affine processes – theory and applications in finance. PhD thesis, TU Wien, 2008.
[23] Martin Keller-Ressel, Walter Schachermayer, and Josef Teichmann. Affine processes are regular. Probability Theory and Related Fields, 151(3):591–611, Dec 2011.
[24] Martin Keller-Ressel, Walter Schachermayer, and Josef Teichmann. Regularity of affine processes on general state spaces. Electron. J. Probab., 18:17 pp., 2013.
[25] Anja Richter and Josef Teichmann. Discrete time term structure theory and consistent recalibration models. SIAM Journal on Financial Mathematics, 8(1):504–531, 2017.
[26] Yuri F. Saporito, Xu Yang, and Jorge P. Zubelli. The calibration of stochastic local-volatility models: An inverse problem perspective. Computers & Mathematics with Applications, 77(12):3054–3067, 2019.
[27] Philipp J. Schönbucher. A market model for stochastic implied volatility. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 1758, 2071–2092, 1999.
[28] Martin Schweizer and Johannes Wissel. Term structures of implied volatilities: Absence of arbitrage and existence results.