Fast calibration of the LIBOR Market Model with Stochastic Volatility based on analytical gradient
Hervé Andres, Pierre-Edouard Arrouy, Paul Bonnefoy, Alexandre Boumezoued, Sophian Mehalla
Milliman R&D, 14 Avenue de la Grande Armée, Paris, France; CERMICS
June 25, 2020
Abstract
We propose to take advantage of the common knowledge of the characteristic function of the swap rate process as modelled in the LIBOR Market Model with Stochastic Volatility and Displaced Diffusion (DDSVLMM) to derive analytical expressions of the gradient of swaption prices with respect to the model parameters. We use this result to derive an efficient calibration method for the DDSVLMM using gradient-based optimization algorithms. Our study relies on and extends the work by (Cui et al., 2017), which developed the analytical gradient for fast calibration of the Heston model, based on an alternative formulation of the Heston moment generating function proposed by (del Baño et al., 2010). Our main conclusion is that the analytical gradient-based calibration is highly competitive for the DDSVLMM, as it significantly limits the number of steps in the optimization algorithm while improving its accuracy. The efficiency of this novel approach is compared to classical standard optimization procedures.
Keywords:
LIBOR Market Model; Stochastic Volatility; Displaced Diffusion; Swaptions pricing; Affine processes; Optimization algorithms.
To speed up the calibration procedure of the DDSVLMM, two main strategies can be considered.

First, reduce the computational time required for the numerical calculation of one (or multiple) swaption prices. In (Wu and Zhang, 2006), pricing under the SVLMM is performed based on both the classical (Heston, 1993) numerical integration method and the famous Fast Fourier Transform (FFT) approach of (Carr and Madan, 1999), which has become a standard for option valuation for models with known characteristic function, as is in particular the case for affine diffusion processes. As an alternative to moment generating function calculation, (Devineau et al., 2020) have shown the efficiency of using Edgeworth and Gram-Charlier expansions in the calibration of the DDSVLMM. Further work on this type of strategy, which reduces the computational cost of the objective function, is provided in a companion paper, see (Arrouy et al., 2020).

Note that a typical number of parameters for a DDSVLMM calibration is 8 or 9, depending on whether the displacement parameter is included in the calibration process or not, whereas a standard Heston model calibration involves 5 parameters. In such a calibration problem, the regularity of the objective function is unknown. This latter issue can be tackled using optimization algorithms unconcerned with the regularity of the target function; this has been done in (Devineau et al., 2020), in which the authors used the Nelder-Mead method.

In this paper, we develop a second strategy, which consists in decreasing the number of objective function calls required by the optimization algorithm. In this context, the use of gradient-based algorithms is of interest. This can be done by relying on either numerical or analytical gradient computation, the latter being particularly efficient in terms of accuracy and computational cost.
A central reference for the present paper is (Cui et al., 2017), which developed the analytical gradient for fast calibration of the Heston model, based on an alternative formulation of the Heston moment generating function proposed by (del Baño et al., 2010). Alternatives focusing on 'learning' the gradient, without analytical calculation, have been proposed by (Liu et al., 2019) based on Artificial Neural Networks. Apart from the Heston model, analytical gradient-based methods have also been developed by (Gudmundsson and Vyncke, 2019) for the calibration of the so-called 3/2 model.

By adapting the approach of (Cui et al., 2017), the present paper provides a numerically stable analytical gradient for the DDSVLMM and uses it in gradient-based optimization routines. Note that the work of (Cui et al., 2017) focused on the Heston model, did not make use of real market data and focused on the Levenberg-Marquardt optimization algorithm. We specify the methodology in the context of DDSVLMM swaption modelling and we demonstrate the efficiency of our approach by inputting the analytical gradient into both the Levenberg-Marquardt algorithm (denoted by LM in the following, see (Marquardt, 1963)) and the Broyden–Fletcher–Goldfarb–Shanno algorithm (denoted by BFGS, see (Byrd et al., 1995)) using real market swaption data.

In addition, we also consider ensuring that the Feller condition is satisfied by using an alternative version of the Levenberg-Marquardt algorithm including constraints; note that although the Feller condition is not targeted in (Cui et al., 2017), in our context this condition is of interest for further uses as it preserves the stability of some numerical discretization schemes.
As an example, such a Feller condition ensures strong convergence of order 1/2 of discretization schemes, as discussed in (Alfonsi, 2015) (Chapter 3) and references therein.

It is worth mentioning that in our experiment, the computation of the objective function based on the moment generating function representation proposed by (Albrecher, 2007) is on average slightly faster compared to that of (del Baño et al., 2010). This leads us to consider an 'optimal' optimization routine made of 1) the analytical gradient as an input of the gradient-based method (LM or BFGS), based on the differentiation of the moment generating function as provided in (del Baño et al., 2010), and 2) objective function calls during the optimization procedure that rely on the numerical evaluation of the moment generating function representation of (Albrecher, 2007).

Finally, the efficiency of our approach is compared to the following methods: the classical Heston-type pricing method (Heston, 1993) or the Edgeworth expansion of swaption prices as developed in (Devineau et al., 2020), both combined with the Nelder-Mead algorithm, as well as the Levenberg-Marquardt algorithm with numerical gradient. As a main result, this paper shows that the analytical gradient method is highly competitive both from a computational standpoint and in terms of calibration accuracy, as it significantly limits the number of steps in the optimization algorithm while still offering accurate data replication. Our calibration experiments on real data also show that expansion methodologies that reduce the computational cost of an objective function call (see for instance (Devineau et al., 2020) and (Arrouy et al., 2020)) are a complementary efficient alternative; the combination of the two strategies (expansion approach and computation of the related analytical gradient) appears as a promising direction which is left for further research.

The remainder of this paper is structured as follows.
In Section 2, we recall the (approximated) swap rate dynamics under the DDSVLMM, as well as the corresponding moment generating function. Section 3 presents the alternative representation of the moment generating function we propose, along with the analytical gradient calculation of the objective function it implies; the optimization algorithm used is also detailed. Section 4 illustrates the efficiency of the proposed calibration method and compares it to the classical alternative methods listed above.

Notations
In this work, we consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfying the usual conditions. In our financial context, $\mathbb{P}$ stands for the historical probability measure while $(\mathcal{F}_t)_{t\ge 0}$ represents accumulated market information. Bold font will be employed for denoting either vectors, as $\mathbf{u}$, or matrices, using capital letters as $\mathbf{M}$. The canonical scalar product of two vectors will be denoted by $\mathbf{u}\cdot\mathbf{v}$; $\|\mathbf{u}\|$ will stand for the ($L^2$-) norm induced by the scalar product: $\|\mathbf{u}\| = \sqrt{\mathbf{u}\cdot\mathbf{u}}$.

The LIBOR Market Model relies on the modelling of the forward rates, which are quantities directly observable on the interest rates market. Let $P(t,T)$ be the time-$t$ price of a Zero-Coupon bond maturing at time $T > t$ with par value 1. Let us consider a finite tenor structure $T_0 < T_1 < \cdots < T_K < \infty$. For a given $j \in \{0,\ldots,K-1\}$, the simply compounded forward lending rate, seen at time $t \le T_j$ and prevailing over the period $[T_j, T_{j+1})$, is denoted by $F_j(t)$, and an arbitrage-free argument (Brigo and Mercurio, 2007) shows that it can be expressed in terms of Zero-Coupon prices as:
$$F_j(t) = \frac{1}{\Delta T_j}\left(\frac{P(t,T_j)}{P(t,T_{j+1})} - 1\right), \qquad (1)$$
with $\Delta T_j := T_{j+1} - T_j$. To account for the modelling of negative forward rates, a displacement coefficient $\delta \ge 0$, often called shift, is introduced so that in this framework, the modelled quantities are the shifted rates $F_j(t) + \delta$, $j \in \{0,\ldots,K-1\}$. These are assumed to stay non-negative and a log-normal setting can be prescribed for the modelling.

Let us consider the spot LIBOR measure $\mathbb{Q}$ - sometimes assimilated to the Risk-Neutral measure - associated with the numéraire $B(t) = \frac{P(t,T_{m(t)})}{\prod_{i=0}^{m(t)-1}P(T_i,T_{i+1})}$, where $m(t) = \inf\{0 \le j \le K-1 : t \le T_j\}$ is the index of the first forward rate that has not expired by $t$. For each $j \in \{0,\ldots,K-1\}$, we assume the existence of an $N_f$-dimensional deterministic function $t \in \mathbb{R}_+ \mapsto \gamma_j(t)$ such that under $\mathbb{Q}$, the dynamics of the shifted rates writes:
$$\mathrm{d}F_j(t) = \sqrt{V_t}\,(F_j(t)+\delta)\,\gamma_j(t)\cdot\left(-\sigma_{j+1}(t)\,\mathrm{d}t + \mathrm{d}W^*_t\right), \qquad (2)$$
where $(W^*_t)_{t\ge 0}$ is an $N_f$-dimensional Brownian motion (with independent components) under $\mathbb{Q}$, the function $\sigma_{j+1}(t) := -\sum_{k=m(t)}^{j}\frac{\Delta T_k (F_k(t)+\delta)}{1+\Delta T_k F_k(t)}\,\gamma_k(t)$ has been introduced, and the process $(V_t)_{t\ge 0}$ is a stochastic process allowing to replicate some market data features (smile, skew). Namely, the volatility process lies in the family of Cox-Ingersoll-Ross processes under the Risk-Neutral measure, assuming then the following dynamics:
$$\mathrm{d}V_t = \kappa(\theta - V_t)\,\mathrm{d}t + \epsilon\sqrt{V_t}\,\mathrm{d}W_t, \qquad (3)$$
with $(W_t)_{t\ge 0}$ a Brownian motion and $\kappa$, $\theta$, $\epsilon$ three non-negative parameters. The Feller condition $2\kappa\theta \ge \epsilon^2$ ensures that the process remains almost surely positive at any time $t$ as long as $V_0 > 0$. Finally, the correlation structure between the forward rates and the volatility factor is captured through a time-dependent function $t \in \mathbb{R}_+ \mapsto \rho_j(t)$ satisfying:
$$\mathbb{E}\left[\left(\frac{\gamma_j(t)}{\|\gamma_j(t)\|}\cdot\mathrm{d}W^*_t\right)\mathrm{d}W_t\right] =: \rho_j(t)\,\mathrm{d}t.$$
The specification of the parametric functions $\gamma_j$ and $\rho_j$ is detailed in Section 3.2. In the remainder of the paper, we will denote by $\Theta$ the vector of parameters that are to be estimated: it includes the parametric specifications of the maps $\gamma_j$ and $\rho_j$ and the volatility parameters $\kappa$, $\theta$ and $\epsilon$. We chose not to include the displacement coefficient $\delta$ in the set of parameters to be estimated and to fix it at an arbitrary value for our experiment. However, the developed method can be extended to the case when $\delta$ is considered as a parameter to be calibrated.

The time-$t$ value of the (forward) swap rate, $t \le T_m$, prevailing over the time interval $[T_m, T_n]$ ($m \le n \le K$) is given by an arbitrage-free argument again:
$$R_{m,n}(t) = \frac{P(t,T_m) - P(t,T_n)}{B_S(t)},$$
where $B_S(t) = \sum_{j=m}^{n-1}\Delta T_j\,P(t,T_{j+1})$ is the annuity of the considered swap (which strictly depends on $m$ and $n$, although we omit this in the notation for simplicity). For similar reasons as those previously mentioned, we are led to consider the shifted swap rate $R_{m,n}(t) + \delta$. The shifted swap rate can be expressed as a (stochastic) function of the shifted forward rates involved between $T_m$ and $T_n$:
$$R_{m,n}(t) + \delta = \sum_{j=m}^{n-1}\alpha_j(t)\left(F_j(t)+\delta\right) \quad \text{with} \quad \alpha_j(t) = \frac{\Delta T_j\,P(t,T_{j+1})}{B_S(t)}.$$
Observe that the stochastic weights $\alpha_j$ are themselves functions of the shifted forward rates.
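For intuition, the relations between Zero-Coupon prices, forward rates and the swap rate can be sketched numerically as follows (a minimal illustration; the function and variable names are ours, not part of the model implementation):

```python
import numpy as np

def forward_rates(T, P):
    """Simply compounded forwards F_j(0) = (P(0,T_j)/P(0,T_{j+1}) - 1) / Delta T_j, cf. (1)."""
    T, P = np.asarray(T, float), np.asarray(P, float)
    return (P[:-1] / P[1:] - 1.0) / np.diff(T)

def swap_rate(T, P, m, n):
    """Forward swap rate R_{m,n}(0) = (P(0,T_m) - P(0,T_n)) / B_S(0) and the annuity B_S(0)."""
    T, P = np.asarray(T, float), np.asarray(P, float)
    dT = np.diff(T)
    B_S = np.sum(dT[m:n] * P[m + 1 : n + 1])  # B_S(0) = sum_j Delta T_j P(0, T_{j+1})
    return (P[m] - P[n]) / B_S, B_S

# Example on a flat 2% curve: all forwards coincide, and so does the swap rate,
# since R_{m,n} = sum_j alpha_j F_j with weights alpha_j summing to one.
T = np.arange(0.0, 11.0)        # annual tenor structure T_0, ..., T_10
P = np.exp(-0.02 * T)           # zero-coupon prices
F = forward_rates(T, P)
R, B_S = swap_rate(T, P, 0, 10)
```

Note that the representation of the swap rate as an $\alpha_j$-weighted average of forward rates is exact here (by telescoping), which is what the freezing technique below exploits.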
Then, the swap rate dynamics under the so-called forward swap rate measure $\mathbb{Q}^S$, associated with the numéraire $B_S$, can be deduced from the forward rates dynamics (2) using Itô's lemma, see (Wu and Zhang, 2006):
$$\mathrm{d}R_{m,n}(t) = \sqrt{V_t}\sum_{j=m}^{n-1}\frac{\partial R_{m,n}(t)}{\partial F_j(t)}\,(F_j(t)+\delta)\,\gamma_j(t)\cdot\mathrm{d}Z^S_t,$$
where $\frac{\partial R_{m,n}(t)}{\partial F_j(t)} = \alpha_j(t) + \frac{\Delta T_j}{1+\Delta T_j F_j(t)}\sum_{k=m}^{j-1}\alpha_k(t)\left(F_k(t) - R_{m,n}(t)\right)$ and $(Z^S_t)_{t\ge 0}$ is an $N_f$-dimensional Brownian motion. Moreover, under $\mathbb{Q}^S$, the dynamics of the volatility factor writes:
$$\mathrm{d}V_t = \kappa\left(\theta - \tilde\xi^S(t)\,V_t\right)\mathrm{d}t + \epsilon\sqrt{V_t}\,\mathrm{d}\bar{Z}^S_t,$$
where $(\bar{Z}^S_t)_{t\ge 0}$ is a scalar Brownian motion and the factor $\tilde\xi^S(t) := 1 + \frac{\epsilon}{\kappa}\sum_{j=m}^{n-1}\alpha_j(t)\sum_{k=m(t)}^{j}\frac{\Delta T_k(F_k(t)+\delta)}{1+\Delta T_k F_k(t)}\,\rho_k(t)\,\|\gamma_k(t)\|$ appeared when applying Girsanov's theorem.

As it stands, the model is too complex to be calibrated. We then resort to the so-called freezing technique, which relies on the assumption of low variability of some stochastic quantities. We obtain then the following approximated dynamics under $\mathbb{Q}^S$, in which $R_{m,n}(t)$ denotes the approximated shifted swap rate:
$$\mathrm{d}R_{m,n}(t) = \sqrt{V_t}\,(R_{m,n}(t)+\delta)\sum_{j=m}^{n-1}\omega_j(0)\,\gamma_j(t)\cdot\mathrm{d}Z^S_t, \qquad \mathrm{d}V_t = \kappa\left(\theta - \tilde\xi^S(t)\,V_t\right)\mathrm{d}t + \epsilon\sqrt{V_t}\,\mathrm{d}\bar{Z}^S_t, \qquad (4)$$
where $\omega_j(0) := \frac{\partial R_{m,n}(0)}{\partial F_j(0)}\,\frac{F_j(0)+\delta}{R_{m,n}(0)+\delta}$ and $\tilde\xi^S(t) := 1 + \frac{\epsilon}{\kappa}\sum_{j=m}^{n-1}\alpha_j(0)\sum_{k=m(t)}^{j}\frac{\Delta T_k(F_k(0)+\delta)}{1+\Delta T_k F_k(0)}\,\rho_k(t)\,\|\gamma_k(t)\|$. Note that (4) is a Heston-type model (with time-dependent coefficients) that allows taking advantage of well-known pricing methods based on the analytical form of the characteristic function of the logarithm of $R_{m,n}$ (the method is detailed in the following section). Indeed, when considering the dynamics of the log-shifted swap rate, one recovers an affine diffusion.
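For intuition, the frozen dynamics (4) can be simulated with a simple Euler scheme (full truncation for the CIR factor). This is only an illustrative sketch with made-up parameter values, not part of the calibration procedure; we also take the drift adjustment $\tilde\xi^S \approx 1$ and a scalar volatility loading for simplicity:

```python
import numpy as np

def simulate_frozen_dynamics(R0, delta, V0, kappa, theta, eps, lam, rho, T, n_steps, n_paths, seed=0):
    """Euler scheme for dR = sqrt(V) (R + delta) lam dZ1, dV = kappa (theta - V) dt + eps sqrt(V) dZ2,
    with corr(dZ1, dZ2) = rho. Illustrative assumptions: xi ~ 1 and a constant scalar loading lam."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    R = np.full(n_paths, R0)
    V = np.full(n_paths, V0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        Vp = np.maximum(V, 0.0)                               # full truncation of the CIR factor
        R = R + np.sqrt(Vp) * (R + delta) * lam * np.sqrt(dt) * z1
        V = V + kappa * (theta - Vp) * dt + eps * np.sqrt(Vp * dt) * z2
    return R, V
```

When the Feller condition holds (as for the parameters in the test below), truncation is rarely active and the scheme behaves well, which echoes the stability remark made in the introduction.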
As a result, the moment generating function of $\ln\left(R_{m,n}(t)+\delta\right)$ is explicitly known after solving some Riccati equations.

2.2 Swaption price

The spot price of a payer swaption contract with strike $K$ expresses as the following expectation associated to $\mathbb{Q}^S$:
$$P^S(\Theta;0,K) = B_S(0)\,\mathbb{E}^S\left[\max\left(R_{m,n}(T_m) - K,\, 0\right)\right], \qquad (5)$$
where $R_{m,n}(T_m)$ is modelled thanks to (4). Note that the swaption price is expressed here as a function of the approximated swap rate, so that the swaption price is actually an approximated swaption price. Observe also that we omit the dependence of the swaption price on the maturity and the tenor. It is straightforward to rewrite the swaption price as:
$$P^S(\Theta;0,K) = B_S(0)\left((R_{m,n}(0)+\delta)\,P_1(\Theta;0,K) - (K+\delta)\,P_2(\Theta;0,K)\right), \qquad (6)$$
where
$$P_1(\Theta;0,K) = \mathbb{E}^S\left[e^{\ln\frac{R_{m,n}(T_m)+\delta}{R_{m,n}(0)+\delta}}\,\mathbb{1}_{R_{m,n}(T_m)\ge K}\right], \qquad P_2(\Theta;0,K) = \mathbb{E}^S\left[\mathbb{1}_{R_{m,n}(T_m)\ge K}\right].$$
Let us denote by $(\mathcal{F}_t)_{t\ge 0}$ the filtration generated by the Brownian motions $(Z^S_t, \bar Z^S_t)_{t\ge 0}$ and by $\psi$ the moment generating function of the variable $X_{m,n}(T_m) := \ln\left(\frac{R_{m,n}(T_m)+\delta}{R_{m,n}(0)+\delta}\right)$, defined for $z \in \mathcal{D} \subset \mathbb{C}$ by
$$\psi(\Theta; X_{m,n}(t), V_t, t; z) = \mathbb{E}^S\left[e^{z X_{m,n}(T_m)}\,\middle|\,\mathcal{F}_t\right], \qquad (7)$$
where $\mathcal{D}$ is the domain of convergence of the moment generating function. Let $\varphi$ be the characteristic function of $X_{m,n}(T_m)$, defined for $z \in \{z' \in \mathbb{C} : \mathrm{Re}(z') \in \mathcal{D}\}$ by
$$\varphi(\Theta; z) = \psi(\Theta; X_{m,n}(0), V_0, 0; z),$$
where $\mathrm{Re}(\cdot)$ is the real part. In the following, $i$ denotes the imaginary unit satisfying $i^2 = -1$. The two expectations appearing in the swaption price (6) can be computed using the characteristic function $\varphi$ in the following way (we refer the reader to (Duffie et al., 2000) for a detailed justification):
$$P_1(\Theta;0,K) = \frac12 + \frac1\pi\int_0^{+\infty}\mathrm{Re}\left(\frac{e^{-iu\ln\frac{K+\delta}{R_{m,n}(0)+\delta}}\,\varphi(\Theta;u-i)}{iu}\right)\mathrm{d}u,$$
$$P_2(\Theta;0,K) = \frac12 + \frac1\pi\int_0^{+\infty}\mathrm{Re}\left(\frac{e^{-iu\ln\frac{K+\delta}{R_{m,n}(0)+\delta}}\,\varphi(\Theta;u)}{iu}\right)\mathrm{d}u.$$
As defined in (7), the moment generating function is a martingale, and thus, using Itô's lemma and setting the drift term to zero, we get that $\psi$ is solution of the following Kolmogorov backward equation:
$$\frac{\partial\psi}{\partial t} + \kappa\left(\theta - \xi(t)v\right)\frac{\partial\psi}{\partial v} - \frac12\lambda^2(t)\,v\,\frac{\partial\psi}{\partial x} + \frac12\epsilon^2 v\,\frac{\partial^2\psi}{\partial v^2} + \epsilon\,\tilde\rho(t)\,\lambda(t)\,v\,\frac{\partial^2\psi}{\partial v\,\partial x} + \frac12\lambda^2(t)\,v\,\frac{\partial^2\psi}{\partial x^2} = 0, \qquad (8)$$
with terminal condition $\psi(\Theta; x, v, T_m; z) = e^{zx}$. The following quantities have been introduced in the equation above:
$$\xi(t) := \tilde\xi^S(t), \qquad \lambda(t) := \left\|\sum_{j=m}^{n-1}\omega_j(0)\,\gamma_j(t)\right\|, \qquad \tilde\rho(t) := \frac{1}{\lambda(t)}\sum_{j=m}^{n-1}\omega_j(0)\,\|\gamma_j(t)\|\,\rho_j(t).$$
Following Heston's approach developed in (Heston, 1993) and (Wu and Zhang, 2006), one can look for a solution of (8) having the separable form $\psi(\Theta;x,V,t;z) = e^{A(\tau,z)+B(\tau,z)V+zx}$, where $\tau = T_m - t$ (the dependence of $A$ and $B$ on $\Theta$ is omitted for the sake of simplicity). The problem then reduces to the resolution of some Riccati equations (see (Wu and Zhang, 2006) for the details), which allows to get closed-form analytical expressions of $A$ and $B$ under the common assumption that $\lambda$ and $\tilde\rho$ are piecewise constant functions on the grid $(\tau_j,\tau_{j+1}]$, where $\tau_j = T_m - T_{m-j}$, $j = 0,\ldots,m-1$. The recursive computation of $A$ and $B$ is done as follows: for each $j \in \{0,1,\ldots,m-1\}$, for each $\tau$ in $(\tau_j,\tau_{j+1}]$,
$$A(\tau,z) = A(\tau_j,z) + \frac{\kappa\theta}{\epsilon^2}\left[(\mu+\nu)(\tau-\tau_j) - 2\ln\frac{1-g_j\,e^{\nu(\tau-\tau_j)}}{1-g_j}\right],$$
$$B(\tau,z) = B(\tau_j,z) + \frac{\left(\mu+\nu-\epsilon^2 B(\tau_j,z)\right)\left(1-e^{\nu(\tau-\tau_j)}\right)}{\epsilon^2\left(1-g_j\,e^{\nu(\tau-\tau_j)}\right)},$$
with initial condition $A(0,z) = B(0,z) = 0$ and
$$\mu = \kappa\,\xi(\tau_j) - \tilde\rho(\tau_j)\,\epsilon\,\lambda(\tau_j)\,z, \qquad \nu = \sqrt{\mu^2 - \lambda^2(\tau_j)\,\epsilon^2\,(z^2-z)}, \qquad g_j = \frac{\mu+\nu-\epsilon^2 B(\tau_j,z)}{\mu-\nu-\epsilon^2 B(\tau_j,z)}.$$
For the sake of simplicity, we omitted the time dependency of $\mu$, $\nu$ and $g_j$.

As pointed out by (Albrecher, 2007), the representation of the characteristic function by (Heston, 1993) suffers from numerical discontinuities for long maturities. To overcome this difficulty, they proposed an equivalent representation that is continuous. Another alternative formulation of the Heston moment generating function has been developed by (del Baño et al., 2010) that is also continuous. This alternative formulation is easier to differentiate and thus well suited for gradient calculation, as derived by (Cui et al., 2017) in the context of the Heston model. For our purpose, we rely on the following modifications of the functions $A$ and $B$: for $\tau_j \le \tau < \tau_{j+1}$, $j = 0,1,\ldots,m-1$,
$$A(\tau,z) = A(\tau_j,z) - \frac{\kappa\theta\,\tilde\rho\,\lambda\,z\,(\tau-\tau_j)}{\epsilon} + \frac{2\kappa\theta}{\epsilon^2}\,D_j(\tau), \qquad B(\tau,z) = B(\tau_j,z) - A_j(\tau), \qquad (9)$$
where:
$$E_j(\tau) = \nu + \mu - \epsilon^2 B(\tau_j,z) + \left(\nu - \mu + \epsilon^2 B(\tau_j,z)\right)e^{-\nu(\tau-\tau_j)},$$
$$D_j(\tau) = \ln\nu + \frac{\kappa\,\xi(\tau_j)-\nu}{2}\,(\tau-\tau_j) - \ln\frac{E_j(\tau)}{2},$$
$$A_{1,j}(\tau) = \left[B(\tau_j,z)\left(2\mu - \epsilon^2 B(\tau_j,z)\right) + \lambda^2(z-z^2)\right]\tanh\frac{\nu(\tau-\tau_j)}{2},$$
$$A_{2,j}(\tau) = \nu + \left(\mu - \epsilon^2 B(\tau_j,z)\right)\tanh\frac{\nu(\tau-\tau_j)}{2},$$
$$A_j(\tau) = \frac{A_{1,j}(\tau)}{A_{2,j}(\tau)}.$$
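As an illustration, the piecewise Riccati recursion for $A$ and $B$ can be sketched as follows. This is a minimal sketch of the first, Heston-type representation (the tanh-based variant is analogous); the function name, signature and test parameters are ours, with $\lambda$, $\tilde\rho$ and $\xi$ passed as per-interval constants:

```python
import numpy as np

def mgf(z, grid, lam, rho, xi, kappa, theta, eps, V0):
    """psi(0; z) = exp(A + B * V0) via the piecewise recursion for A and B.
    grid = [tau_0 = 0, ..., tau_m = T_m]; lam, rho, xi hold one value per interval."""
    z = complex(z)
    A, B = 0.0 + 0.0j, 0.0 + 0.0j
    for j in range(len(grid) - 1):
        dt = grid[j + 1] - grid[j]
        mu = kappa * xi[j] - rho[j] * eps * lam[j] * z
        nu = np.sqrt(mu**2 - lam[j] ** 2 * eps**2 * (z**2 - z))
        g = (mu + nu - eps**2 * B) / (mu - nu - eps**2 * B)   # uses B(tau_j)
        e = np.exp(nu * dt)
        # A must be updated with the old B (through g) before B is overwritten.
        A = A + kappa * theta / eps**2 * ((mu + nu) * dt - 2.0 * np.log((1 - g * e) / (1 - g)))
        B = B + (mu + nu - eps**2 * B) * (1 - e) / (eps**2 * (1 - g * e))
    return np.exp(A + B * V0)
```

A useful self-consistency check is the semigroup property of the Riccati flow: with constant coefficients, refining the grid must not change the result (this is exactly what the recursion through $B(\tau_j)$ encodes).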
Note that these formulas have been adapted from (Cui et al., 2017) by using hyperbolic tangent functions in order to avoid numerical instabilities. We now state our key result.
Proposition 1
The gradient of the swaption price under the approximated Heston-type dynamics (4) is given by:
$$\nabla P^S(\Theta;0,K) = B_S(0)\left((R_{m,n}(0)+\delta)\,\nabla P_1(\Theta;0,K) - (K+\delta)\,\nabla P_2(\Theta;0,K)\right)$$
with
$$\nabla P_1(\Theta;0,K) = \frac1\pi\int_0^{+\infty}\mathrm{Re}\left(\frac{e^{-iu\ln\frac{K+\delta}{R_{m,n}(0)+\delta}}\,\nabla\varphi(\Theta;u-i)}{iu}\right)\mathrm{d}u,$$
$$\nabla P_2(\Theta;0,K) = \frac1\pi\int_0^{+\infty}\mathrm{Re}\left(\frac{e^{-iu\ln\frac{K+\delta}{R_{m,n}(0)+\delta}}\,\nabla\varphi(\Theta;u)}{iu}\right)\mathrm{d}u,$$
and $\nabla\varphi(\Theta;u) = \varphi(\Theta;u)\,\chi(\Theta;u)$, where $\chi(\Theta;u)$ is the gradient vector (whose components are the partial derivatives of the characteristic function with respect to each parameter), detailed in Appendix A.

Remark 1
The particular form of the analytical gradient allows one to compute $\varphi(\Theta;u)$ only once.

The calibration problem amounts to finding the model parameters that best replicate market data. In this paper, we choose to calibrate on market swaption prices rather than on implied volatilities, since we derived the analytical gradient of the swaption price.

Let us consider the following standard parametrizations for the maps $\gamma_j$ and $\rho_j$ introduced in Section 2: for $t \in [T_k, T_{k+1})$,
$$\gamma_j(t) \equiv \gamma_j(T_k) = g(T_j - T_k)\,\beta_{j-k+1}, \qquad \rho_j(t) \equiv \rho_j(T_k) = \frac{\rho}{\sqrt{N_f}\,\|\gamma_j(T_k)\|}\sum_{p=1}^{N_f}\gamma_j^{(p)}(T_k),$$
where $g(u) = (a+bu)e^{-cu} + d$ with $a$, $b$, $c$ and $d$ non-negative parameters to calibrate, $\beta_{j-k+1}$ is an $N_f$-dimensional unit vector representing the inter-forward correlation structure and $\rho$ is a correlation parameter to calibrate in $(-1,1)$. One can differentiate $g$ (which is the norm of $\gamma_j$ by construction) with respect to $a$, $b$, $c$ and $d$ as follows, for $u \ge 0$:
$$\frac{\partial g(u)}{\partial a} = e^{-cu}, \qquad \frac{\partial g(u)}{\partial b} = u\,\frac{\partial g(u)}{\partial a}, \qquad \frac{\partial g(u)}{\partial c} = -(a+bu)\,\frac{\partial g(u)}{\partial b}, \qquad \frac{\partial g(u)}{\partial d} = 1.$$
Finally, the set of parameters to be calibrated writes $\Theta = (a,b,c,d,\kappa,\theta,\epsilon,\rho)$.

Assume that we have a set $\left(P^{S,\mathrm{mkt}}_{m,n}(0,K_j)\right)_{j\in J,\,m\in M,\,n\in N}$ of market swaption prices of different strikes, maturities and tenors. For the same set of strikes, maturities and tenors, we denote the model prices computed as in (6) by $\left(P^S_{m,n}(0,K_j)\right)_{j\in J,\,m\in M,\,n\in N}$. We formulate the calibration problem as an inverse least squares minimization problem with bound constraints and with the Feller condition as an inequality constraint, in the following way:
$$\arg\min_{\Theta}\ \frac{1}{W}\sum_{(j,m,n)\in J\times M\times N} w_{j,m,n}\left(\frac{P^S_{m,n}(\Theta;0,K_j) - P^{S,\mathrm{mkt}}_{m,n}(0,K_j)}{P^{S,\mathrm{mkt}}_{m,n}(0,K_j)}\right)^2$$
$$\text{such that } 2\kappa\theta \ge \epsilon^2, \quad (a,b,c,d,\kappa,\theta,\epsilon,\rho) \in (\mathbb{R}_+)^4\times(\mathbb{R}^*_+)^3\times(-1;1), \qquad (10)$$
where the $w_{j,m,n}$ are fixed positive weights associated to each swaption and $W = \sum_{(j,m,n)\in J\times M\times N} w_{j,m,n}$. In the following, $F$ stands for the objective function in the previous optimization problem and $f$ stands for the vector of residuals, defined by:
$$f(\Theta) = \left[\sqrt{\frac{w_{j,m,n}}{W}}\ \frac{P^S_{m,n}(\Theta;0,K_j) - P^{S,\mathrm{mkt}}_{m,n}(0,K_j)}{P^{S,\mathrm{mkt}}_{m,n}(0,K_j)}\right]_{j,m,n},$$
such that we have $F(\Theta) = \|f(\Theta)\|^2$. As mentioned in (Gudmundsson and Vyncke, 2019), a regularization term, of the form $\|\Gamma\Theta\|^2$, can be added to $F$ to promote some solutions. As an example, a classical choice is to take $\Gamma = \mathrm{Id}$, thus preventing the norm of $\Theta$ from becoming too large.

Such an optimization problem can be solved numerically using general optimization methods like the Nelder-Mead algorithm. It is a direct search method, i.e. it does not require any information about the gradient of the objective function. Instead, it relies on the concept of simplex, that is, a special polytope of $n+1$ vertices in an $n$-dimensional space. The algorithm starts by evaluating the objective function on each vertex of the simplex. The vertex with the highest value is replaced by a new point where the objective function is lower. The simplex is updated in this manner until the sample standard deviation of the function values on the current simplex falls below some preassigned tolerance. More details on the Nelder-Mead method can be found in (Nash, 1979). Note that with this algorithm, one can enforce the bound and inequality constraints in (10) by modifying the objective function in the following way:
$$\tilde F(\Theta) = \begin{cases} F(\Theta), & \text{if } \Theta \in (\mathbb{R}_+)^4\times(\mathbb{R}^*_+)^3\times(-1;1) \text{ and } 2\kappa\theta \ge \epsilon^2, \\ +\infty & \text{otherwise}. \end{cases} \qquad (11)$$
Although the Nelder-Mead method has proven to be efficient in some contexts, we observed that it turns out to be very time-consuming in our framework. As a matter of fact, this method requires a lot of evaluations of the objective function, and the computation of the swaption prices is very expensive due to the computation of integrals in the complex field and the recursive definition of the characteristic function. This has already been pointed out by (Devineau et al., 2020). Consequently, an optimization algorithm that does not require many objective function evaluations is preferred in order to achieve a fast calibration. Since we have been able to derive an analytical formula for the gradient of the model swaption price, we study gradient-based optimization algorithms. We present two such methods in the next section.

In this section, we quickly remind the main ideas behind gradient-based algorithms before presenting more specifically two of these algorithms, namely the BFGS and the LM algorithms. Gradient-based algorithms are iterative methods that start from a given point and proceed by successive adjustments. Each improvement is obtained by moving from the current point along a conveniently chosen descent direction, in such a manner that the value of the objective function decreases. We describe in Algorithm 1 the general algorithmic scheme of gradient-based algorithms. More details on gradient-based algorithms can be found in (Nocedal and Wright, 2006).

Algorithm 1:
Gradient-based algorithms in pseudo code
Input:
Initial guess $\Theta_0$, objective function $F$, objective function gradient $\nabla F$, tolerance $\epsilon_{tol}$ and maximum number of iterations $k_{max}$
begin
    $k \leftarrow 0$
    while $k \le k_{max}$ do
        Compute $\nabla F(\Theta_k)$
        if $\|\nabla F(\Theta_k)\| \le \epsilon_{tol}$ then break
        Compute a descent direction $d_k$, generally by using $\nabla F(\Theta_k)$
        Find $\alpha_k$ such that $F(\Theta_k + \alpha_k d_k)$ is reasonably lower than $F(\Theta_k)$ (line search)
        $\Theta_{k+1} \leftarrow \Theta_k + \alpha_k d_k$
        $k \leftarrow k + 1$
    end
end
Output: Last computed $\Theta_k$

We now take a closer look at the BFGS and LM algorithms. The BFGS algorithm is a quasi-Newton method, which means that it uses an approximation of the inverse of the Hessian matrix instead of the exact inverse used in Newton's method. It is designed for solving all kinds of unconstrained non-linear optimization problems. The descent direction $d_k$ is given by
$$d_k = -H_k\,\nabla F(\Theta_k),$$
where $H_k$ is an approximation of the inverse of the Hessian matrix that is defined recursively by
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k},$$
with initial value $H_0 = I$ ($I$ is the identity matrix), $s_k = \Theta_{k+1} - \Theta_k$ and $y_k = \nabla F(\Theta_{k+1}) - \nabla F(\Theta_k)$. The gradient of the objective function $F$ writes as a function of the gradient of the swaption price $\nabla P^S_{m,n}$:
$$\nabla F(\Theta) = \frac{2}{W}\sum_{(j,m,n)\in J\times M\times N} w_{j,m,n}\,\frac{P^S_{m,n}(\Theta;0,K_j) - P^{S,\mathrm{mkt}}_{m,n}(0,K_j)}{\left(P^{S,\mathrm{mkt}}_{m,n}(0,K_j)\right)^2}\,\nabla P^S_{m,n}(\Theta;0,K_j).$$
The calibration problem (10) is a constrained optimization problem and thus the BFGS algorithm cannot be used directly. We therefore rely on an extension of the classical BFGS algorithm, known as L-BFGS-B, that has been developed to handle bound constraints (Byrd et al., 1995). However, inequality constraints cannot be easily enforced. Consequently, we relax the Feller condition when using this method.

Furthermore, the Levenberg-Marquardt algorithm is specifically designed for solving non-linear least squares problems, which is exactly the type of problem we are coping with.
This algorithm has the particularity of behaving like the steepest descent method when the current point is far (in some sense) from a solution, and like the Gauss-Newton method when the current point is near a solution. This is achieved by introducing a damping parameter in the expression of the descent direction, as follows:
$$d_k = -\left(J_k^T J_k + \mu_k I\right)^{-1} g_k,$$
where $J_k$ is the Jacobian matrix of $f$ at $\Theta_k$, $g_k = J_k^T f(\Theta_k)$ is, up to a constant factor, the gradient of $F$ at $\Theta_k$, and $\mu_k$ is the damping parameter. For large values of $\mu_k$ (compared to the coefficients of $J_k^T J_k$), we have $d_k \simeq -\frac{1}{\mu_k}g_k$, which corresponds to the descent direction in the steepest descent method. For small values of $\mu_k$, we have $d_k \simeq -\left(J_k^T J_k\right)^{-1}g_k$, which corresponds to the descent direction in the Gauss-Newton method. The updating strategy of $\mu_k$ is described in Algorithm 2 in Appendix B.1. There are plenty of different updating strategies, leading to several versions of the Levenberg-Marquardt algorithm. More details on this algorithm can be found in (Madsen et al., 1999).

Adding bound constraints to the LM algorithm
As for the BFGS algorithm, the classic LM algorithm does not handle bound and inequality constraints. However, it can be extended to do so. Bound constraints can be ensured by using a projection of $\Theta_k$ onto the feasible set. We detail the modifications in Algorithm 3 in Appendix B.2. Handling linear inequality constraints requires many modifications, so we will not detail them here, but the interested reader can find a discussion on this topic in (Nocedal and Wright, 2006) (Chapter 15). Observe that the Feller condition is not linear in the parameters of the DDSVLMM. However, it can be easily linearized using the following change of variables:
$$\tilde\kappa = \ln\kappa, \qquad \tilde\theta = \ln\theta, \qquad \tilde\epsilon = \ln\epsilon.$$
The Feller condition writes in terms of the new volatility parameters as:
$$\tilde\kappa + \tilde\theta + \ln 2 \ge 2\tilde\epsilon.$$
To account for this change of variables, the gradient of the swaption price has to be modified by replacing the partial derivatives with respect to $\kappa$, $\theta$ and $\epsilon$ by the corresponding derivatives with respect to $\tilde\kappa$, $\tilde\theta$ and $\tilde\epsilon$:
$$\frac{\partial P^S}{\partial\tilde\kappa}(\Theta;0,K_j) = \kappa\,\frac{\partial P^S}{\partial\kappa}(\Theta;0,K_j), \qquad \frac{\partial P^S}{\partial\tilde\theta}(\Theta;0,K_j) = \theta\,\frac{\partial P^S}{\partial\theta}(\Theta;0,K_j), \qquad \frac{\partial P^S}{\partial\tilde\epsilon}(\Theta;0,K_j) = \epsilon\,\frac{\partial P^S}{\partial\epsilon}(\Theta;0,K_j).$$

In this section, we present our experimental results for the calibration of the DDSVLMM (4) using the BFGS and LM algorithms. We first detail the market data to be replicated and discuss some implementation aspects. Then, we compare the BFGS and LM routines with existing calibration methods with regard to the objective function value and to the computational time.

For the calibration, we used a set of 280 market EURO and USD swaption volatilities. For the purpose of the calibration, these volatilities are converted into prices using the Bachelier formula based on a rate curve as used under the Solvency II regulation. The ATM swaptions considered cover a fixed grid of maturities and tenors.
For away-from-the-money swaptions, we consider the same range of maturities and focus on a 10-year reference tenor; the strikes range over the ATM level plus or minus a fixed set of offsets (in bps). As mentioned previously, the shift $\delta$ is objectified otherwise: we take $\delta = 0.1$. The inter-forward correlation structure, captured by the $\beta_k$ parameters, is assessed by a PCA technique we do not detail here. The number $N_f$ of risk factors is set to 2.

We implemented the pricing and gradient functions in C++. We used the R base function optim for the Nelder-Mead and BFGS algorithms and we used the C++ LEVMAR package (Lourakis, 2005) for the Levenberg-Marquardt algorithm. This choice is particularly motivated by the fact that the
LEVMAR package implements both the extended version of the Levenberg-Marquardt algorithm handling bound constraints and the extended version handling both bound and linear inequality constraints. As for the computation of the numerical integrals required in the pricing and gradient functions, we resorted to the Gauss-Laguerre quadrature with 90 nodes.
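For illustration, the price (6) and the gradient of Proposition 1 can be evaluated in one pass with a Gauss-Laguerre rule, sharing the $\varphi$ evaluations as noted in Remark 1. The sketch below is ours (in Python rather than C++) and takes $\varphi$ and $\chi$ as black boxes:

```python
import numpy as np

def swaption_price_and_gradient(phi, chi, R0, K, delta, BS0, n_nodes=90):
    """Price (6) and gradient (Proposition 1) by Gauss-Laguerre quadrature on [0, inf).
    phi(z): characteristic function E[exp(i z X)]; chi(z): factor with grad phi = phi * chi."""
    x, w = np.polynomial.laguerre.laggauss(n_nodes)  # rule for int_0^inf e^{-u} f(u) du
    k = np.log((K + delta) / (R0 + delta))

    def half(shift):
        p, g = 0.5, 0.0
        for u, wi in zip(x, w):
            f = np.exp(-1j * u * k) * phi(u + shift) / (1j * u)
            p += wi * np.exp(u) * f.real / np.pi           # e^{u} undoes the Laguerre weight
            g = g + wi * np.exp(u) * (f * chi(u + shift)).real / np.pi
        return p, g

    P1, G1 = half(-1j)   # P1 uses phi(u - i)
    P2, G2 = half(0.0)   # P2 uses phi(u)
    price = BS0 * ((R0 + delta) * P1 - (K + delta) * P2)
    grad = BS0 * ((R0 + delta) * G1 - (K + delta) * G2)
    return price, grad
```

With a lognormal test characteristic function, this reproduces the (shifted) Black price and vega, which is a convenient sanity check of the quadrature.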
We compare the BFGS and Levenberg-Marquardt algorithms with existing calibration methods based on the criterion of the objective function value. First, let us introduce the three reference calibration methods used for the purpose of comparison.

The first one is the classical Heston method, in which the price is computed through formula (6) and the optimization is performed via the Nelder-Mead algorithm. We set the maximum number of iterations to 500 and we repeat the optimization 3 times in order to achieve a better convergence.

The second calibration method relies on Edgeworth approximations: it consists in using an approximate swaption price obtained from an Edgeworth expansion of the unknown density of the swap rate $R_{m,n}$ (see (Devineau et al., 2020) for a thorough description of the method). The associated optimization method is the Nelder-Mead algorithm. We use the same parametrization for the Nelder-Mead method as for the classical Heston method.

The last method is based on the LM algorithm but associated with a numerical gradient estimation instead of the derived analytical gradient. The price is still computed with the pricing formula (6). We use the central difference scheme in order to approximate each component of the gradient as:
$$\frac{\partial P^S}{\partial\Theta_i}(\Theta;0,K_j) \simeq \frac{P^S(\Theta + h\,e_i;0,K_j) - P^S(\Theta - h\,e_i;0,K_j)}{2h},$$
with $e_i$ the $i$-th canonical basis vector and $h$ a small quantity. We take a fixed small value of $h$ and a maximal number of 15 iterations.

Let us also present the different parametrizations studied for the BFGS and Levenberg-Marquardt algorithms relying on the analytical gradient as derived in this paper. For the BFGS algorithm, we test one configuration in which the maximum number of iterations is set to 30. For the Levenberg-Marquardt algorithm, we test two configurations having the same tolerance levels $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$, all set to the same small value (see Appendix B.1).
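This componentwise central-difference estimator can be sketched as follows (our illustration; `price` stands for any scalar pricing function):

```python
import numpy as np

def numerical_gradient(price, theta, h=1e-5):
    """Central finite differences: one pair of price calls per parameter, i.e.
    2 * dim(theta) evaluations per gradient (versus a single pass for the
    analytical gradient shared through Remark 1)."""
    theta = np.asarray(theta, float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h                                   # perturb the i-th coordinate only
        grad[i] = (price(theta + e) - price(theta - e)) / (2.0 * h)
    return grad
```

The cost comparison above is the reason the numerical-gradient variant (LM-NUM) is expected to be slower than its analytical counterparts for the same number of optimization steps.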
The two configurations, respectively denoted by LM-BC-15 and LM-BC-30, use the version of the LM algorithm handling bound constraints only, with a maximum number of iterations of 15 and 30 respectively. We summarize the studied methods and their main characteristics in Table 1.

Method name | Optimization method | Max. number of iterations | Ensures Feller condition | Features
------------|---------------------|---------------------------|--------------------------|---------
Heston      | Nelder-Mead         | 500 | Yes | Repeated 3 times
Edgeworth   | Nelder-Mead         | 500 | Yes | Repeated 3 times; uses a different swaption pricing formula
LM-NUM      | Levenberg-Marquardt | 15  | No  | Uses the numerical gradient
BFGS        | L-BFGS-B            | 30  | No  | Uses the analytical gradient
LM-BC-30    | Levenberg-Marquardt | 30  | No  | Uses the analytical gradient
LM-BC-15    | Levenberg-Marquardt | 15  | No  | Uses the analytical gradient
Table 1: Summary of studied methods

Before going further, let us make precise the bounds, in particular the lower bounds, for the 8 parameters of the DDSVLMM to calibrate. Indeed, we experienced numerical instabilities when some parameters are equal or very close to zero. For instance, if the speed of mean reversion κ or the volatility of volatility ε becomes almost zero, the behavior of the model is pathological. Therefore, we impose lower bounds LB and upper bounds UB on Θ = (a, b, c, d, κ, θ, ε, ρ): the lower bounds of the first seven parameters are set to a small positive power of ten and the lower bound of ρ is set slightly above −1, while the upper bounds of the first seven parameters are +∞ and the upper bound of ρ is set slightly below 1.

The procedure led in order to compare the various calibration methods is the following: for each data set, we randomly draw 100 initial parameter vectors between LB and UB that satisfy the Feller condition, and we perform the calibration with each described method starting from each of these initial guesses. From this procedure, we retrieve the boxplots of Figure 1 using EURO data, which provide statistics on the objective function value over the 100 calibrations.

Figure 1: Boxplots of our selected benchmark of calibration methods.

Note first that the variances of the replication errors when using the Nelder-Mead and BFGS algorithms are comparable and significantly lower than those obtained when using the Levenberg-Marquardt routines. To this extent, Nelder-Mead and BFGS seem less dependent on the initial guess. However, the median errors reached by the Levenberg-Marquardt algorithm are lower: this can be explained by the fact that this algorithm is particularly suited to least-squares problems.

When working with USD data, for which results are presented in Appendix C.1, these conclusions still hold. Nevertheless, the Nelder-Mead-based techniques as well as BFGS using the analytical gradient perform significantly better on USD data.
This can be explained by the fact that the Nelder-Mead algorithm is uniquely based on the topology of the graph of the objective function and thus depends substantially on the market data in our context.

Concerning the Levenberg-Marquardt approach more specifically, we first note that increasing the number of iterations yields a better calibration (both on average and in variance), which is an expected behaviour. The improvement is significant using USD data. Using the Levenberg-Marquardt technique coupled with a numerical approximation of the gradient of the target function leads to a rather wide range of objective function values, which illustrates, in comparison, the benefit of using the analytical gradient. In addition, the median value for this approach is higher than the one obtained when using Levenberg-Marquardt optimization with the analytical gradient. Therefore, the information conveyed by the Hessian approximation built from the analytical gradient turns out to be valuable in order to stabilize the calibration process and to reduce the dependency on the starting point. This conclusion holds for both EURO and USD data.

So far, we did not impose the Feller condition to be satisfied by the outputs of the calibration procedures. The percentages of obtained parameters (over 100 calibrations) that do not satisfy it are given in Table 2. The number of unsatisfied Feller conditions is rather significant, especially for the LM algorithm. Note that this condition is always ensured for Nelder-Mead-based calibrations, as the Feller condition has been imposed as depicted in (11). When working with USD data, the behaviours of the different algorithms are similar.

Method    | Percentage of unsatisfied Feller condition
----------|-------------------------------------------
Heston    | 0 %
Edgeworth | 0 %
LM-NUM    | 34 %
BFGS      | 13 %
LM-BC-30  | 33 %
LM-BC-15  | 32 %

Table 2: Percentages of unsatisfied Feller condition over the 100 calibrations for each method
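The random initialization used in the procedure above can be sketched as follows. We assume the Feller condition takes the usual CIR form 2κθ ≥ ε² for the variance process, and the finite upper bounds below are illustrative only (the paper's upper bounds are +∞ except for ρ):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_initial_guess(lb, ub, rng):
    """Draw Theta = (a, b, c, d, kappa, theta, eps, rho) uniformly between
    finite bounds, resampling until the Feller condition
    2 * kappa * theta >= eps^2 holds (indices follow the ordering of
    Theta in the text; the condition's form is the standard CIR one)."""
    while True:
        theta = rng.uniform(lb, ub)
        kappa, vbar, eps = theta[4], theta[5], theta[6]
        if 2.0 * kappa * vbar >= eps ** 2:
            return theta

# Illustrative finite bounds (hypothetical values, for the sketch only).
lb = np.array([1e-4] * 7 + [-0.999])
ub = np.array([2.0] * 7 + [0.999])
guess = draw_initial_guess(lb, ub, rng)
```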
Numerical results with bound constraints
In view of the previous results, we propose to study two other configurations of the Levenberg-Marquardt algorithm. The first one, denoted by LM-BLEIC, uses the version of the Levenberg-Marquardt algorithm handling both bound constraints and linear inequality constraints. The maximum number of iterations is set to 50. The second one, denoted by LM-BLEIC-NM, uses the same algorithm as the LM-BLEIC configuration, but here 200 iterations of the Nelder-Mead method are performed at the end of the Levenberg-Marquardt algorithm whenever the latter converges towards a point whose objective function value is greater than a given threshold, set to 0.3. We present the boxplots for the LM-BLEIC and LM-BLEIC-NM methods in Figure 2 using EURO data.

Figure 2: Boxplots for LM-BLEIC and LM-BLEIC-NM (log-scale).

We observe that for some initial guesses, the LM-BLEIC method converges towards points at which the objective function takes extremely high values. This is actually due to the fact that the implementation of the Levenberg-Marquardt algorithm handling linear inequality constraints cannot ensure that the iterates stay in the feasible set. As a consequence, some parameters may take negative values, leading to numerical precision errors that force the algorithm to stop prematurely. This is the reason why we introduced the LM-BLEIC-NM method: when the objective function value at the exit of the LM-BLEIC method exceeds a given threshold, we perform some Nelder-Mead iterations in order to reach a better optimum. This method gives quite satisfying results compared to LM-BLEIC, insofar as it gets rid of the extreme points. Using the Nelder-Mead method thus reduces the variance of the outputs of the calibration procedures, but it does not significantly reduce the median value of the replication errors, which remain close. Similar conclusions hold when comparing these two methods on USD data, as depicted in Figure 4 in Appendix C.1.
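The fallback logic of LM-BLEIC-NM can be sketched as follows, using SciPy's bounded least-squares solver and Nelder-Mead as stand-ins for the LEVMAR routines; the residual function, bounds and threshold below are illustrative, not the paper's calibration problem:

```python
import numpy as np
from scipy.optimize import least_squares, minimize

def calibrate_with_fallback(residuals, theta0, bounds, threshold=0.3):
    """Run a bounded least-squares solve; if the objective at the exit
    point still exceeds `threshold`, polish with Nelder-Mead iterations
    (the LM-BLEIC-NM idea, sketched with SciPy instead of LEVMAR)."""
    res = least_squares(residuals, theta0, bounds=bounds, max_nfev=50)
    objective = lambda t: 0.5 * np.sum(residuals(t) ** 2)
    theta, value = res.x, objective(res.x)
    if value > threshold:
        polish = minimize(objective, theta, method="Nelder-Mead",
                          options={"maxiter": 200})
        if polish.fun < value:  # keep the Nelder-Mead point only if better
            theta, value = polish.x, polish.fun
    return theta, value

# Toy residual with minimizer (1, 2) strictly inside the bounds.
resid = lambda t: np.array([t[0] - 1.0, t[1] - 2.0])
theta, value = calibrate_with_fallback(resid, np.array([0.5, 0.5]),
                                       bounds=([0.0, 0.0], [5.0, 5.0]))
```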
Following the same procedure as in the previous section, we compute the average CPU time, the average number of calls to the objective function and the average duration of a call. For gradient-based methods, we also compute the average number of calls to the gradient and the average duration of such a call. The computations were performed on a computer with a 2.6 GHz Intel Core i7 processor and 8 GB of RAM. The results are presented in Table 3 for EURO data and in Table 5 of Appendix C.2 for USD data. Note that when using the Levenberg-Marquardt algorithm, we compute the average call time to the residual function f and its gradient ∇f instead of the objective function F and its gradient ∇F, because the LEVMAR implementation takes the functions f and ∇f as inputs, see Section 3.2. This explains the difference between the BFGS and Levenberg-Marquardt methods in terms of average call time to the gradient. Note also that calling the objective function in the Heston method appears to take less time on average than in gradient-based methods, since F is actually replaced by F̃ defined in (11) for Nelder-Mead methods, which simply returns a large value whenever the Feller condition is not satisfied.
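The call counts and average call times of this kind can be collected by wrapping the objective (or residual) function before handing it to the optimizer; a minimal sketch, with hypothetical names:

```python
import time

class CallCounter:
    """Wrap a function to record how many times it is called and the
    cumulative wall-clock time spent inside it."""
    def __init__(self, fn):
        self.fn, self.calls, self.total_time = fn, 0, 0.0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        start = time.perf_counter()
        try:
            return self.fn(*args, **kwargs)
        finally:
            self.total_time += time.perf_counter() - start

    @property
    def average_call_time(self):
        return self.total_time / self.calls if self.calls else 0.0

# Usage: wrap the objective before handing it to the optimizer.
objective = CallCounter(lambda x: (x - 3.0) ** 2)
for x in (0.0, 1.0, 2.0):
    objective(x)
```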
Method      | Average CPU time | Average number of calls to F/f | Average call time to F/f | Average number of calls to ∇F/∇f | Average call time to ∇F/∇f
------------|------------------|--------------------------------|--------------------------|----------------------------------|---------------------------
Heston      | 159.45 s | 1489.20 | 0.11 s          | 0     | 0 s
Edgeworth   | 8.47 s   | 1469.46 | 5.55 × 10^−3 s  | 0     | 0 s
LM-NUM-CENT | 40.06 s  | 277.36  | 0.14 s          | 0     | 0 s
BFGS        | 36.40 s  | 39.34   | 0.14 s          | 39.34 | 0.78 s
LM-BC-30    | 33.89 s  | 104.80  | 0.14 s          | 30.00 | 0.63 s
LM-BC-15    | 14.94 s  | 38.93   | 0.14 s          | 15.00 | 0.63 s
LM-BLEIC    | 44.29 s  | 95.76   | 0.14 s          | 48.34 | 0.63 s
LM-BLEIC-NM | 45.51 s  | 102.29  | 0.14 s          | 48.34 | 0.63 s

Table 3: Computational times

The gradient-based algorithms (including the one using the numerical gradient) appear to be much faster than the classical Heston calibration method using the Nelder-Mead algorithm, since they provide reductions in computational time ranging from 71 % (LM-BLEIC-NM) to 91 % (LM-BC-15). This gain in time results directly from the massive reduction of the number of calls to the objective function and thus to the characteristic function. However, these methods are still not as fast as the Edgeworth method, which uses a large number of function calls but for which each call is very fast; this was the purpose of the method at its origin, see (Devineau et al., 2020). However, one needs to keep in mind that the reduction in computational time achieved by the Edgeworth expansion comes at the cost of a lower fitting accuracy to market data, as pointed out in (Devineau et al., 2020).

As a main conclusion, the use of the analytical gradient rather than the numerical gradient reduces the calibration duration by a factor of 2.7 when looking at LM-BC-15.

Finally, let us observe that the LM algorithm using the analytical gradient and including bound constraints only (LM-BC) is faster than the BFGS algorithm. This can be explained by the fact that the Levenberg-Marquardt algorithm takes advantage of the particular shape of the calibration problem, i.e.
a least-squares optimization problem.

To conclude this section, we justify numerically why we use the characteristic function of (Albrecher, 2007) instead of that of (del Baño et al., 2010) in the computation of the swaption price. We compared the average call time of both characteristic function representations over 1000 randomly chosen parameter sets and observed an average call time for Albrecher's representation 5 % lower than the average call time for Cui's representation. The time difference between these two representations of the characteristic function is essentially due to the fact that the coefficients A and B in Albrecher's representation can easily be written as functions of a few quantities (see e.g. Appendix 5.3 of (Devineau et al., 2020)), which allows performing fewer computations than for Cui's representation.

A Gradient expression
The vector χ in Proposition 1 reads:

    χ(Θ; z) := [χ_a(z), χ_b(z), χ_c(z), χ_d(z), χ_κ(z), χ_θ(z), χ_ε(z), χ_ρ(z)]^T

where χ_x denotes the partial derivative of φ with respect to the parameter x. Using that χ_x(z) = ∂ψ(Θ; X_{m,n}(0), V_0, z)/∂x, and since ψ(Θ; X_{m,n}(0), V_0, z) is defined recursively with terminal value ψ(Θ; X_{m,n}(0), V_0, T_m; z), one needs to compute ∂ψ(Θ; X_{m,n}(0), V_0, t; z)/∂x on each interval (τ_j, τ_{j+1}]. We will rather give the partial derivatives of ψ at any time.

A.1 Partial derivative of ψ with respect to θ

Since A_j(τ), and consequently also B(τ, z), is independent of θ for all t, the partial derivative of ψ with respect to θ reads:

    ∂ψ(X_{m,n}(t), V(t), t; z)/∂θ = (∂A(τ, z)/∂θ) ψ(X_{m,n}(t), V(t), t; z).

The partial derivative of A(τ, z) is given by:

    ∂A(τ, z)/∂θ = ∂A(τ_j, z)/∂θ − κρ̃λz(τ − τ_j)/ε + (2κ/ε²) D_j(τ)

since D_j(τ) is independent of θ.
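Implementations of the derivative chains in the remainder of this appendix are error-prone; a standard safeguard is to check each analytical partial derivative against a central difference before using it in calibration. A minimal sketch of such a check, on a toy function in the spirit of the exponential-affine form ψ = exp(A + B·V) (all names hypothetical):

```python
import numpy as np

def check_partial(f, df, x, i, h=1e-6, tol=1e-5):
    """Compare the analytical partial derivative df(x) (with respect to
    coordinate i) to a central-difference estimate; True when they agree
    up to a relative tolerance."""
    bump = np.zeros_like(x)
    bump[i] = h
    fd = (f(x + bump) - f(x - bump)) / (2.0 * h)
    return abs(fd - df(x)) <= tol * max(1.0, abs(fd))

# Toy example: psi = exp(A + B*V) with only A depending on theta,
# here A(theta) = 2*theta and B*V = 0.5, so d(psi)/d(theta) = 2*psi.
f = lambda x: np.exp(2.0 * x[0] + 0.5)
df = lambda x: 2.0 * np.exp(2.0 * x[0] + 0.5)   # (dA/dtheta) * psi
ok = check_partial(f, df, np.array([0.3]), i=0)
```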
A.2 Partial derivative of ψ with respect to κ

The partial derivative of ψ with respect to κ reads:

    ∂ψ(X_{m,n}(t), V(t), t; z)/∂κ = [∂A(τ, z)/∂κ + V(t) ∂B(τ, z)/∂κ] ψ(X_{m,n}(t), V(t), t; z)

with

    ∂A(τ, z)/∂κ = ∂A(τ_j, z)/∂κ − θρ̃λz(τ − τ_j)/ε + (2θ/ε²) D_j(τ) + (2κθ/ε²) ∂D_j(τ)/∂κ,

    ∂B(τ, z)/∂κ = ∂B(τ_j, z)/∂κ − V_0 ∂A_j(τ)/∂κ,

    ∂D_j(τ)/∂κ = µ/ν² + (1/2)(1 − µ/ν)(τ − τ_j) − (1/E_j(τ)) ∂E_j(τ)/∂κ,

    (1/E_j(τ)) ∂E_j(τ)/∂κ = (1/(ν V_0 A_{2,j}(τ))) [µ + ν (1 − ε ∂B(τ_j, z)/∂κ) tanh(ν(τ − τ_j)/2)]
        − (1/E_j(τ)) (ν − µ + ε B(τ_j, z)) (µ/ν)(τ − τ_j) e^{−ν(τ−τ_j)},

    ∂A_j(τ)/∂κ = (1/Ã_{2,j}(τ)) ∂Ã_{1,j}(τ)/∂κ − (A_j(τ)/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂κ,

    (1/Ã_{1,j}(τ)) ∂Ã_{1,j}(τ)/∂κ = (1/A_{1,j}(τ)) [ (ν(τ − τ_j)/2)(µ ∂B(τ_j, z)/∂κ − ε B(τ_j, z) ∂B(τ_j, z)/∂κ + B(τ_j, z))
        + (µ(τ − τ_j)/(2ν))(B(τ_j, z)(2µ − ε B(τ_j, z)) + λ(z − z²)) ],

    (1/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂κ = (1/(V_0 A_{2,j}(τ))) [ (µ(τ − τ_j)/2 − ε ∂B(τ_j, z)/∂κ) tanh(ν(τ − τ_j)/2)
        + (µ/ν)(((τ − τ_j)/2)(µ − ε B(τ_j, z)) + 1) ].

Note that in order to make the calculation of the partial derivative of A_j easier, we write A_j as the ratio of Ã_{1,j} and Ã_{2,j} instead of A_{1,j} and A_{2,j}, where Ã_{1,j} and Ã_{2,j} are defined as:

    Ã_{1,j}(τ) = A_{1,j}(τ)/cosh(ν(τ − τ_j)/2),    Ã_{2,j}(τ) = A_{2,j}(τ)/cosh(ν(τ − τ_j)/2).

This trick will be re-employed for the other derivatives.
A.3 Partial derivative of ψ with respect to ε

The partial derivative of ψ with respect to ε reads:

    ∂ψ(X_{m,n}(t), V(t), t; z)/∂ε = [∂A(τ, z)/∂ε + V(t) ∂B(τ, z)/∂ε] ψ(X_{m,n}(t), V(t), t; z)

with

    ∂A(τ, z)/∂ε = ∂A(τ_j, z)/∂ε + κθρ̃λz(τ − τ_j)/ε² − (4κθ/ε³) D_j(τ) + (2κθ/ε²) ∂D_j(τ)/∂ε,

    ∂B(τ, z)/∂ε = ∂B(τ_j, z)/∂ε − V_0 ∂A_j(τ)/∂ε,

    ∂ξ/∂ε = (ξ − 1)/ε,    ∂µ/∂ε = (µ − κ)/ε,    ∂ν/∂ε = (ν² − κµ)/(εν),

    ∂D_j(τ)/∂ε = (1/ν) ∂ν/∂ε + (1/2)(κ ∂ξ/∂ε − ∂ν/∂ε)(τ − τ_j) − (1/E_j(τ)) ∂E_j(τ)/∂ε,

    (1/E_j(τ)) ∂E_j(τ)/∂ε = (1/(V_0 A_{2,j}(τ))) [∂ν/∂ε + (∂µ/∂ε − B(τ_j, z) − ε ∂B(τ_j, z)/∂ε) tanh(ν(τ − τ_j)/2)]
        − (1/E_j(τ)) (ν − µ + ε B(τ_j, z)) (∂ν/∂ε)(τ − τ_j) e^{−ν(τ−τ_j)},

    ∂A_j(τ)/∂ε = (1/Ã_{2,j}(τ)) ∂Ã_{1,j}(τ)/∂ε − (A_j(τ)/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂ε,

    (1/Ã_{1,j}(τ)) ∂Ã_{1,j}(τ)/∂ε = (1/A_{1,j}(τ)) [ (ν(τ − τ_j)/2)(µ ∂B(τ_j, z)/∂ε − ε B(τ_j, z) ∂B(τ_j, z)/∂ε + (∂µ/∂ε) B(τ_j, z) − (1/2)(B(τ_j, z))²)
        + (∂ν/∂ε)((τ − τ_j)/2)(B(τ_j, z)(2µ − ε B(τ_j, z)) + λ(z − z²)) ],

    (1/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂ε = (1/(V_0 A_{2,j}(τ))) [ (∂ν/∂ε)(µ − ε B(τ_j, z))((τ − τ_j)/2)
        + (ν (∂ν/∂ε)(τ − τ_j)/2 + ∂µ/∂ε − B(τ_j, z) − ε ∂B(τ_j, z)/∂ε) tanh(ν(τ − τ_j)/2) ].
A.4 Partial derivative of ψ with respect to ρ

The partial derivative of ψ with respect to ρ reads:

    ∂ψ(X_{m,n}(t), V(t), t; z)/∂ρ = [∂A(τ, z)/∂ρ + V(t) ∂B(τ, z)/∂ρ] ψ(X_{m,n}(t), V(t), t; z)

with

    ∂A(τ, z)/∂ρ = ∂A(τ_j, z)/∂ρ − κθρ̃λz(τ − τ_j)/(ερ) + (2κθ/ε²) ∂D_j(τ)/∂ρ,

    ∂B(τ, z)/∂ρ = ∂B(τ_j, z)/∂ρ − V_0 ∂A_j(τ)/∂ρ,

    ∂µ/∂ρ = (µ − κ)/ρ,    ∂ν/∂ρ = (µ/ν) ∂µ/∂ρ,

    ∂D_j(τ)/∂ρ = (1/ν) ∂ν/∂ρ + (1/2)(κ(ξ − 1)/ρ − ∂ν/∂ρ)(τ − τ_j) − (1/E_j(τ)) ∂E_j(τ)/∂ρ,

    (1/E_j(τ)) ∂E_j(τ)/∂ρ = (1/(V_0 A_{2,j}(τ))) [∂ν/∂ρ + (∂µ/∂ρ − ε ∂B(τ_j, z)/∂ρ) tanh(ν(τ − τ_j)/2)]
        − (1/E_j(τ)) (∂ν/∂ρ)(τ − τ_j)(ν − µ + ε B(τ_j, z)) e^{−ν(τ−τ_j)},

    ∂A_j(τ)/∂ρ = (1/Ã_{2,j}(τ)) ∂Ã_{1,j}(τ)/∂ρ − (A_j(τ)/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂ρ,

    (1/Ã_{1,j}(τ)) ∂Ã_{1,j}(τ)/∂ρ = (1/A_{1,j}(τ)) [ (ν(τ − τ_j)/2)(µ ∂B(τ_j, z)/∂ρ − ε B(τ_j, z) ∂B(τ_j, z)/∂ρ + B(τ_j, z) ∂µ/∂ρ)
        + (∂ν/∂ρ)((τ − τ_j)/2)(B(τ_j, z)(2µ − ε B(τ_j, z)) + λ(z − z²)) ],

    (1/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂ρ = (1/(V_0 A_{2,j}(τ))) [ (µ − ε B(τ_j, z))((τ − τ_j)/2) ∂ν/∂ρ
        + (ν (∂ν/∂ρ)(τ − τ_j)/2 + ∂µ/∂ρ − ε ∂B(τ_j, z)/∂ρ) tanh(ν(τ − τ_j)/2) ].

A.5 Partial derivatives of ψ with respect to a, b, c and d

One can observe that only γ_k(τ) depends on the parameters a, b, c and d, which means that the derivatives of the characteristic function with respect to these parameters are close to each other.
Consequently, we group the four partial derivatives in this section. Let x ∈ {a, b, c, d}; the partial derivative of ψ with respect to x reads:

    ∂ψ(X_{m,n}(t), V(t), t; z)/∂x = [∂A(τ, z)/∂x + V(t) ∂B(τ, z)/∂x] ψ(X_{m,n}(t), V(t), t; z)

with

    ∂A(τ, z)/∂x = ∂A(τ_j, z)/∂x − (κθz(τ − τ_j)/ε) ∂(ρ̃λ)/∂x + (2κθ/ε²) ∂D_j(τ)/∂x,

    ∂B(τ, z)/∂x = ∂B(τ_j, z)/∂x − V_0 ∂A_j(τ)/∂x,

    ∂(ρ̃λ)/∂x = Σ_{k=m}^{n−1} w_k(0) ρ_k(τ) ∂‖γ_k(τ)‖/∂x,

    ∂ξ/∂x = (ε/κ) Σ_{k=m}^{n−1} α_k(0) Σ_{l=m}^{k} [ΔT_l (F_l(0) + δ) ρ_l(τ) / (1 + ΔT_l F_l(0))] ∂‖γ_l(τ)‖/∂x,

    ∂µ/∂x = κ ∂ξ/∂x − εz ∂(ρ̃λ)/∂x,

    ∂λ/∂x = 2 ⟨ Σ_{k=m}^{n−1} w_k(0) ∂γ_k(τ)/∂x , Σ_{k=m}^{n−1} w_k(0) γ_k(τ) ⟩,

    ∂ν/∂x = (1/ν)(µ ∂µ/∂x + (1/2) ε²(z − z²) ∂λ/∂x),

    ∂D_j(τ)/∂x = (1/ν) ∂ν/∂x + (1/2)(κ ∂ξ/∂x − ∂ν/∂x)(τ − τ_j) − (1/E_j(τ)) ∂E_j(τ)/∂x,

    (1/E_j(τ)) ∂E_j(τ)/∂x = (1/(V_0 A_{2,j}(τ))) [∂ν/∂x + (∂µ/∂x − ε ∂B(τ_j, z)/∂x) tanh(ν(τ − τ_j)/2)]
        − (1/E_j(τ)) (∂ν/∂x)(τ − τ_j)(ν − µ + ε B(τ_j, z)) e^{−ν(τ−τ_j)},

    ∂A_j(τ)/∂x = (1/Ã_{2,j}(τ)) ∂Ã_{1,j}(τ)/∂x − (A_j(τ)/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂x,

    (1/Ã_{1,j}(τ)) ∂Ã_{1,j}(τ)/∂x = (1/A_{1,j}(τ)) [ (ν(τ − τ_j)/2)(µ ∂B(τ_j, z)/∂x − ε B(τ_j, z) ∂B(τ_j, z)/∂x + B(τ_j, z) ∂µ/∂x + (1/2)(z − z²) ∂λ/∂x)
        + (∂ν/∂x)((τ − τ_j)/2)(B(τ_j, z)(2µ − ε B(τ_j, z)) + λ(z − z²)) ],

    (1/Ã_{2,j}(τ)) ∂Ã_{2,j}(τ)/∂x = (1/(V_0 A_{2,j}(τ))) [ (µ − ε B(τ_j, z))((τ − τ_j)/2) ∂ν/∂x
        + (ν (∂ν/∂x)(τ − τ_j)/2 + ∂µ/∂x − ε ∂B(τ_j, z)/∂x) tanh(ν(τ − τ_j)/2) ].

B On the Levenberg-Marquardt algorithm
In this section, we detail the classical Levenberg-Marquardt algorithm and the extended version handling bound constraints.

B.1 Standard Levenberg-Marquardt algorithm
Algorithm 2:
Levenberg-Marquardt algorithm
Input: Θ_0, F, f, J, L, ε_1, ε_2, ε_3, k_max, τ

    begin
        k ← 0; ν ← 2
        A ← J(Θ_0)^T J(Θ_0); g ← J(Θ_0)^T f(Θ_0)
        found ← (F(Θ_0) ≤ ε_3 or ‖g‖_∞ ≤ ε_1)
        µ ← τ max_i {a_ii}    (a_ii being the diagonal entries of A)
        while not found and k < k_max do
            Solve (A + µI) d = −g
            if ‖d‖ ≤ ε_2 ‖Θ_k‖ then
                found ← true
            else
                Θ_{k+1} ← Θ_k + d
                if F(Θ_k) − F(Θ_{k+1}) > 0 and L(0) − L(d) > 0 then
                    η ← (F(Θ_k) − F(Θ_{k+1})) / (L(0) − L(d))
                    A ← J(Θ_{k+1})^T J(Θ_{k+1}); g ← J(Θ_{k+1})^T f(Θ_{k+1})
                    found ← (F(Θ_{k+1}) ≤ ε_3 or ‖g‖_∞ ≤ ε_1)
                    µ ← µ max{1/3, 1 − (2η − 1)^3}; ν ← 2
                else
                    µ ← µν; ν ← 2ν
                end
            end
            k ← k + 1
        end
    end

In Algorithm 2, the function L corresponds to the value of the objective function F at Θ_{k+1} when the residuals are approximated by a first-order Taylor expansion. Mathematically, we have:

    F(Θ_k + d) ≈ L(d) = (1/2) ‖f(Θ_k) + J(Θ_k) d‖².

Hence, L(0) − L(d) can be interpreted as the gain predicted by a linear model. It is easy to check that this quantity is always positive.

The quantity η (appearing first in line 13 of the routine above) measures how good the approximation of F(Θ_k + d) by L(d) is. A large value of η indicates that L(d) is a good approximation of F(Θ_k + d), whereas a small value of η indicates the contrary. In the first case, µ is decreased in order to imitate the behaviour of the Gauss-Newton algorithm; in the second case, µ is increased in order to imitate the behaviour of the steepest descent method. As regards the updating strategy of the damping parameter, we use the one introduced by (Nielsen, 1999).

B.2 Extended Levenberg-Marquardt algorithm handling bound constraints

We present an extension of the classic Levenberg-Marquardt algorithm which can handle bound constraints.
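Both the standard iteration and a crude bound-constrained variant can be sketched in a few lines of Python. The damping update is Nielsen's; the box projection via clipping is a deliberate simplification of the extension discussed here, not the full Kanzow et al. scheme (no line search or projected gradient fallback), and the toy residual is illustrative:

```python
import numpy as np

def levenberg_marquardt(f, J, theta0, bounds=None, eps1=1e-8, eps2=1e-8,
                        eps3=1e-8, k_max=100, tau=1e-3):
    """Minimal Levenberg-Marquardt iteration with Nielsen's damping update.
    When `bounds = (lb, ub)` is given, each trial point is clipped onto the
    box, a crude stand-in for the projection P_X of the extended version."""
    theta = np.asarray(theta0, dtype=float)
    F = lambda t: 0.5 * np.dot(f(t), f(t))
    A = J(theta).T @ J(theta)
    g = J(theta).T @ f(theta)
    found = F(theta) <= eps3 or np.max(np.abs(g)) <= eps1
    mu, nu, k = tau * np.max(np.diag(A)), 2.0, 0
    while not found and k < k_max:
        d = np.linalg.solve(A + mu * np.eye(theta.size), -g)
        if np.linalg.norm(d) <= eps2 * np.linalg.norm(theta):
            found = True
        else:
            theta_new = theta + d
            if bounds is not None:                   # projection onto the box
                theta_new = np.clip(theta_new, bounds[0], bounds[1])
                d = theta_new - theta
            predicted = 0.5 * np.dot(d, mu * d - g)  # L(0) - L(d)
            actual = F(theta) - F(theta_new)
            if actual > 0 and predicted > 0:         # step accepted
                eta = actual / predicted
                theta = theta_new
                A = J(theta).T @ J(theta)
                g = J(theta).T @ f(theta)
                found = F(theta) <= eps3 or np.max(np.abs(g)) <= eps1
                mu *= max(1.0 / 3.0, 1.0 - (2.0 * eta - 1.0) ** 3)
                nu = 2.0
            else:                                    # step rejected
                mu *= nu
                nu *= 2.0
        k += 1
    return theta

# Toy residual with minimizer (1, 2); the bounds are inactive at the optimum.
f = lambda t: np.array([t[0] - 1.0, 2.0 * (t[1] - 2.0)])
J = lambda t: np.array([[1.0, 0.0], [0.0, 2.0]])
theta = levenberg_marquardt(f, J, [0.5, 0.5], bounds=([0.0, 0.0], [5.0, 5.0]))
```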
This extension was first proposed by (Kanzow et al., 2002). Let us denote by P_X the projection onto the feasible set X (which in the framework of the DDSVLMM is equal to (ℝ_+)^4 × (ℝ_+^*)^3 × ]−1, 1[). With respect to Algorithm 2, only lines 11 to 20 must be modified; they must be replaced by the following ones.
Algorithm 3:
Extended Levenberg-Marquardt algorithm (replacing lines 11 to 20 of Algorithm 2)

    Θ_{k+1} ← P_X(Θ_k + d); d ← Θ_{k+1} − Θ_k
    if F(Θ_{k+1}) ≤ γ F(Θ_k) then
        if F(Θ_k) − F(Θ_{k+1}) > 0 and L(0) − L(d) > 0 then
            η ← (F(Θ_k) − F(Θ_{k+1})) / (L(0) − L(d))
            A ← J(Θ_{k+1})^T J(Θ_{k+1}); g ← J(Θ_{k+1})^T f(Θ_{k+1})
            found ← (F(Θ_{k+1}) ≤ ε_3 or ‖g‖_∞ ≤ ε_1)
            µ ← µ max{1/3, 1 − (2η − 1)^3}; ν ← 2
        else
            µ ← µν; ν ← 2ν
        end
    else if ∇F(Θ_{k+1})^T d ≤ 0 then
        Perform a line search, i.e. look for α such that F(P_X(Θ_k + αd)) is reasonably lower than F(Θ_k)
    else
        Apply a projected gradient step, i.e. compute t = max_{l∈ℕ} β^l such that
            F(P_X(Θ_k − t g)) ≤ F(Θ_k) + σ g^T (P_X(Θ_k − t g) − Θ_k)
    end

The parameters γ, β and σ are empirically fixed parameters in (0, 1).

C Results for USD market data
C.1 Methods accuracy
Figure 3: Boxplots comparing the calibration methods of Table 1

Method    | Percentage of unsatisfied Feller condition
----------|-------------------------------------------
Heston    | 0 %
Edgeworth | 0 %
LM-NUM    | 25 %
BFGS      | 3 %
LM-BC-30  | 23 %
LM-BC-15  | 23 %

Table 4: Percentages of unsatisfied Feller condition over the 100 calibrations for each method

Figure 4: Boxplots comparing LM-BLEIC and LM-BLEIC-NM
C.2 Time efficiency
Method      | Average CPU time | Average number of calls to F/f | Average call time to F/f | Average number of calls to ∇F/∇f | Average call time to ∇F/∇f
------------|------------------|--------------------------------|--------------------------|----------------------------------|---------------------------
Heston      | 166.98 s | 1492.42 | 0.11 s          | 0     | 0 s
Edgeworth   | 9.00 s   | 1487.06 | 5.75 × 10^−3 s  | 0     | 0 s
LM-NUM-FWD  | 25.58 s  | 177.55  | 0.14 s          | 0     | 0 s
LM-NUM-CENT | 40.37 s  | 282.85  | 0.14 s          | 0     | 0 s
BFGS        | 34.90 s  | 37.89   | 0.14 s          | 37.89 | 0.77 s
LM-BC-15    | 15.67 s  | 42.70   | 0.14 s          | 15.00 | 0.63 s
LM-BC-30    | 31.33 s  | 85.06   | 0.14 s          | 30.00 | 0.63 s
LM-BLEIC    | 45.23 s  | 98.61   | 0.14 s          | 48.21 | 0.64 s
LM-BLEIC-NM | 47.21 s  | 104.64  | 0.14 s          | 48.21 | 0.64 s

Table 5: Computational times
References

(Albrecher, 2007) H. Albrecher, P. Mayer, W. Schoutens and J. Tistaert. Volatility skews and extensions of the Libor market model. Wilmott, (1), pp. 83-92, 2007.

(Alfonsi, 2015) A. Alfonsi. Affine Diffusions and Related Processes: Simulation, Theory and Applications. Springer, Vol. 6, 2015.

(Arrouy et al., 2020) P.-E. Arrouy, A. Boumezoued, B. Lapeyre and S. Mehalla. Jacobi Stochastic Volatility factor for the Libor Market Model. HAL preprint: hal-02468583, 2020.

(Brigo and Mercurio, 2007) D. Brigo and F. Mercurio. Interest rate models - theory and practice: with smile, inflation and credit. Springer Science & Business Media, 2007.

(Byrd et al., 1995) R.H. Byrd, P. Lu, J. Nocedal and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, Vol. 16(5), pp. 1190-1208, 1995.

(Carr and Madan, 1999) P. Carr and D. Madan. Option valuation using the fast Fourier transform. Journal of Computational Finance, Vol. 2(4), pp. 61-73, 1999.

(Cui et al., 2017) Y. Cui, S. del Baño Rollin and G. Germano. Full and fast calibration of the Heston stochastic volatility model. European Journal of Operational Research, Vol. 263(2), pp. 625-638, 2017.

(del Baño et al., 2010) S. del Baño Rollin, A. Ferreiro-Castilla and F. Utzet. On the density of log-spot in the Heston volatility model. Stochastic Processes and their Applications, Vol. 120(10), pp. 2037-2063, 2010.

(Devineau et al., 2020) L. Devineau, P.-E. Arrouy, P. Bonnefoy and A. Boumezoued. Fast calibration of the Libor Market Model with Stochastic Volatility and Displaced Diffusion. Journal of Industrial and Management Optimization, Vol. 16(4), p. 1699, 2020.

(Duffie et al., 2000) D. Duffie, J. Pan and K. Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, Vol. 68(6), pp. 1343-1376, 2000.

(Gudmundsson and Vyncke, 2019) H. Gudmundsson and D. Vyncke. On the calibration of the 3/2 model. European Journal of Operational Research, 2019.

(Heston, 1993) S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies, Oxford University Press, Vol. 6(2), pp. 327-343, 1993.

(Kanzow et al., 2002) C. Kanzow, M. Fukushima and N. Yamashita. Levenberg-Marquardt methods for constrained nonlinear equations with strong local convergence properties. Inst. of Applied Math. and Statistics, 2002.

(Liu et al., 2019) S. Liu, A. Borovykh, L. A. Grzelak and C. W. Oosterlee. A neural network-based framework for financial model calibration. Journal of Mathematics in Industry, Springer, Vol. 9(1), p. 9, 2019.

(Lourakis, 2005) M. I. A. Lourakis. A brief description of the Levenberg-Marquardt algorithm implemented by levmar. Foundation of Research and Technology, Vol. 4(1), pp. 1-6, 2005.

(Madsen et al., 1999) K. Madsen, H.B. Nielsen and O. Tingleff. Methods for non-linear least squares problems. 1st edition, 1999.

(Marquardt, 1963) D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, Vol. 11(2), pp. 431-441, 1963.

(Nash, 1979) J.C. Nash. Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation. Adam Hilger, Bristol, Vol. 3, pp. 30-48, 1979.

(Nielsen, 1999) H.B. Nielsen. Damping parameter in Marquardt's method. IMM, 1999.

(Nocedal and Wright, 2006) J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.

(Wu and Zhang, 2006) L. Wu and F. Zhang.