[PDF] Markovian approximation of the rough Bergomi model for Monte Carlo option pricing

Abstract

The recently developed rough Bergomi (rBergomi) model is a rough fractional stochastic volatility (RFSV) model which can generate more realistic term structure of at-the-money volatility skews compared with other RFSV models. However, its non-Markovianity brings mathematical and computational challenges for model calibration and simulation. To overcome these difficulties, we show that the rBergomi model can be approximated by the Bergomi model, which has the Markovian property. Our main theoretical result is to establish and describe the affine structure of the rBergomi model. We demonstrate the efficiency and accuracy of our method by implementing a Markovian approximation algorithm based on a hybrid scheme.

Full PDF

MMarkovian approximation of the rough Bergomi model forMonte Carlo option pricing

Qinwen Zhu ∗ , Gr´egoire Loeper †‡ , Wen Chen § , Nicolas Langren´e § July 7, 2020

Abstract

The recently developed rough Bergomi (rBergomi) model is a rough fractional stochastic volatility (RFSV)model which can generate more realistic term structure of at-the-money volatility skews compared with otherRFSV models. However, its non-Markovianity brings mathematical and computational challenges for modelcalibration and simulation. To overcome these difﬁculties, we show that the rBergomi model can be approxi-mated by the Bergomi model, which has the Markovian property. Our main theoretical result is to establish anddescribe the afﬁne structure of the rBergomi model. We demonstrate the efﬁciency and accuracy of our methodby implementing a Markovian approximation algorithm based on a hybrid scheme.

Keywords : rough fractional stochastic volatility, forward variance model, Markovian representation, volatilityskew, Volterra integral, rough Heston, hybrid scheme simulation

MSC codes : 60H35; 65C30; 91G20; 91G60; 65C05; 62P05;

JEL codes : C63; C15; C52; G13; G12; C02

ACM codes : G.3; I.6.1; F.2.1; G.1.2; I.6.3; G.1.10;

The rough Bergomi (rBergomi) model introduced by Bayer et al. (2016) has gained acceptance for stochasticvolatility modelling due to its power-law at-the-money volatility skew which is consistent with empirical studies(see Forde and Zhang 2017, Fukasawa 2017, Gatheral et al. 2018) and the market impact function under the no-arbitrage assumption (see Jusselin and Rosenbaum 2018). However, the stochastic process which characterizesthis volatility model is rougher than that of a Brownian motion; in particular, the lack of Markovianity makesclassical pricing methods infeasible.In order to price options under an rBergomi model and calibrate such a model, Bayer et al. (2018) proposehierarchical adaptive sparse grids for option pricing, Bayer et al. (2019) propose a deep learning method forrBergomi model calibration, Jacquier et al. (2018) develop pricing algorithms for VIX futures and options, andMcCrickerd and Pakkanen (2018) develop a ‘turbocharged’ Monte Carlo pricing method. In spite of these efforts,the inherent challenges brought by the rBergomi model still prevent its widespread adoption in industry.Inspired by the technique from Abi Jaber and El Euch (2019), Gatheral and Keller-Ressel (2019) and Harmsand Stefanovits (2019), in which the authors design a multi-factor stochastic volatility model with Markovianstructure to approximate the rough Heston model, we establish an analogous multi-factor afﬁne structure for therBergomi model. In the afﬁne structure, the Volterra kernel corresponds to a superposition of inﬁnitely manyOrnstein-Uhlenbeck (O-U) processes with different speeds of mean reversion. Truncating this inﬁnite sum into aﬁnite sum of O-U processes yields an approximation of the rBergomi model which is a Markovian approximated ∗ School of Mathematical Sciences, Nanjing Normal University, Nanjing, 210023, PR China. Email: [email protected] † School of Mathematics, Monash University, Clayton VIC Australia ‡ Centre for Quantitative Finance and Investment Strategies, Monash University, Clayton VIC Australia § CSIRO Data61, RiskLab, Docklands VIC Australia a r X i v : . [ q -f i n . M F ] J u l ergomi (aBergomi) model. We then prove the existence and uniqueness of solutions to this afﬁne aBergomimodel, and show that its afﬁne structure converges to the one of the rBergomi model.To numerically simulate the rBergomi model in practice, we adopt the hybrid scheme proposed in Bennedsenet al. (2017) for the stochastic Volterra-type integrals ˜ X = √ α + 1 (cid:82) ts ( t − s ) α dW s ( κ = 1 ). The hybrid schemeconsists in approximating the power-law kernel K pow = √ α + 1( t − s ) α by a combination of a power functionnear zero and a step function elsewhere, with a lower O ( N log N ) complexity, where N is the number of timesteps, as opposed to the O ( N ) complexity of the Cholesky method in Bayer et al. (2016). Here, the rBergomipower-law kernel K pow can be approximated by the exponential kernel K exp = (cid:80) ni =1 α i e − κ i ( t − s ) after truncating K pow before t . Our numerical tests demonstrate that using n = 25 exponential terms in K exp (i.e. a 25-term O-Uprocess), we can obtain an accurate (low root mean squared error), yet tractable and computationally efﬁcientapproximation of the fractional rBergomi model.The paper is organized as follows. In Section 2, we introduce the Bergomi and rBergomi models and discusstheir respective ATM volatility skews. In particular we provide for the ﬁrst time a proof that the ATM volatilityskew of the rBergomi model is equivalent to the power T H − while this does not hold for the Bergomi model(equation (8)). In Section 3, we establish the afﬁne structure of the rough Bergomi model. Section 4 is dedicatedto the approximation of the rough Bergomi model by a multi-factor Bergomi model. Finally, Section 5 comparesnumerical simulations of the rBergomi model with our approximated Bergomi (aBergomi) model with a ﬁnitenumber of terms, showing the effectiveness of our approximation. This section introduces the Bergomi and rough Bergomi stochastic volatility models (Deﬁnitions 2 and 1), alongwith the corresponding notations used throughout the paper.We consider a ﬁltered probability space (Ω , F , ( F t ) t ∈ R , Q ) , which supports two dimensional correlated Brownianmotions W and B . A log price process X t := log( S t ) is assumed to follow the dynamics dX t = − V t dt + (cid:112) V t dW t , (1)where V t ≥ is the instantaneous spot variance process. Let ξ ut , u ≥ t be the instantaneous forward variance fordate u observed at time t ; in particular ξ tt = V t corresponds to the spot variance.Bayer et al. (2016) proposed the so-called rough Bergomi model where the forward variance follows dξ ut = ξ ut η √ α + 1 ( u − t ) α dB t , u ≥ t (2)where W and B have correlation ρ , α (cid:44) H − ∈ (cid:0) − , (cid:1) is a negative exponent depending on the Hurstexponent H ∈ (cid:0) , (cid:1) of the underlying fractional Brownian motion, and η is a positive parameter depending on H . The deﬁnition of the rBergomi model is summarized below: Deﬁnition 1.

The rBergomi stochastic volatility model takes the form  dX t = − V t dt + (cid:112) V t dW t ,dξ ut = ξ ut η √ α + 1( u − t ) α dB t , (3) where α = H − ∈ (cid:0) − , (cid:1) , and d (cid:104) W, B (cid:105) t = ρdt . By contrast, the two-factor Bergomi model is deﬁned as follows.

Deﬁnition 2.

The two-factor Bergomi model (Bergomi 2005, Bergomi 2009) is deﬁned by:  dX t = − V t dt + (cid:112) V t dW St ,dξ ut = ξ ut α θ ω (cid:16) (1 − θ ) e − κ X ( u − t ) dW Xt + θe − κ Y ( u − t ) dW Yt (cid:17) (4)2 ith d (cid:104) W S , W X (cid:105) t = ρ SX dtd (cid:104) W S , W Y (cid:105) t = ρ SY dtd (cid:104) W X , W Y (cid:105) t = ρ XY dt, where ξ tt = V t = ω is the lognormal volatility of the instantaneous variance under the normalizing factor α θ = (cid:0) (1 − θ ) + 2 ρ XY θ (1 − θ ) + θ (cid:1) − and θ is a mixing parameter of the short term factor driven by W X and the long term factor driven by W Y ( κ X > κ Y ). Assumption 1.

Without loss of generality, we assume throughout the paper that the initial forward variance curve ξ u , u ≥ is ﬂat. This simpliﬁcation is common in the rBergomi literature, see for example Bayer et al. (2016),Bayer et al. (2018) and Bayer et al. (2019). We use the notation ξ for the constant initial forward variance curve. This subsection derives the ATM volatility skew of the rBergomi and Bergomi models, as the more realistic ATMvolatility skew of the rBergomi model over the Bergomi model is one of the motivations behind the introductionof the rBergomi model.From Bergomi and Guyon (2012), we can deﬁne the price and the volatility dynamics of a generic stochasticvolatility model as follows:  dX t = − V t dt + (cid:112) V t dW t dξ ut = λ ( t, u, ξ ut ) dB t , (5)where in particular note that X t = ξ tt = ln( S t ) is the log-spot, V t is the instantaneous spot variance, ξ ut is theinstantaneous forward variance for date u observed at time t , and λ = ( λ , · · · , λ d ) is the volatility of forwardinstantaneous variances which takes values in R d where d is the dimension of the Brownian motion B . Note thatin this formulation, the covariance between spot and variance is modelled through the ﬁrst component of λ , seeBergomi and Guyon (2012) for more details.One can derive the following second-order expression (w.r.t. volatility of volatility) for the Black-Scholes impliedvolatility: σ BS ( k, T ) = ˆ σ AT MT + S T k + C T k + O ( ε ) , (6)where k = ln (cid:16) KS (cid:17) , K is the strike and ε is a dimensionless scaling factor for the volatility of variances. TheATM volatility and the two coefﬁcients S T and C T are given by ˆ σ AT MT = ˆ σ V ST (cid:20) ε v C Xξ + ε v (cid:0) (cid:0) C Xξ (cid:1) − v ( v + 4) C ξξ + 4 v ( v − C µ (cid:1)(cid:21) , S T = ˆ σ V ST (cid:20) ε v C Xξ + ε v (cid:0) C µ v − C Xξ ) (cid:1)(cid:21) , C T = ˆ σ V ST ε v (cid:2) C µ v + C ξξ v − (cid:0) C Xξ (cid:1) (cid:3) where v = (cid:82) T ξ s ds is the total variance to expiration T , ˆ σ V ST = (cid:112) vT = (cid:113) (cid:82) T ξ s dsT is the effective volatility, and C Xξ , C ξξ , C µ are autocorrelations (Bergomi and Guyon, 2012): • C Xξt ( ξ ) = (cid:82) Tt ds (cid:82) Ts duµ ( s, u, ξ ) = (cid:82) Tt ds (cid:82) Ts du E [ dX s dξ us ] ds is the doubly integrated spot-variance covari-ance function • C Xξ = C Xξ ( ξ ) = (cid:82) T ds (cid:82) Ts du E [ dX s dξ u ] ds • C ξξt ( ξ ) = (cid:82) Tt ds (cid:82) Ts du (cid:82) Ts du (cid:48) ν ( s, u, u (cid:48) , ξ ) = (cid:82) Tt ds (cid:82) Ts du (cid:82) Ts u (cid:48) E (cid:104) dξ us dξ u (cid:48) s (cid:105) ds is the triply integrated vari-ance/variance covariance function 3 C ξξ = C ξξ ( ξ ) = (cid:82) T dt (cid:82) Ts du (cid:82) Ts du (cid:48) E (cid:104) dξ u dξ u (cid:48) (cid:105) ds . • C µt ( ξ ) = (cid:82) Tt ds (cid:82) Ts duµ ( s, u, ξ ) ∂ ξ u (cid:0) C Xξs ( ξ ) (cid:1) is the double time-integral of the instance spot variancecovariance function times the sensitivity of C Xξt ( ξ ) with respect to instantaneous forward variances • C µ = C µ ( ξ ) = (cid:82) T ds (cid:82) Ts du E [ dX s dξ u ] ds ∂ ξ u (cid:0) C Xξs ( ξ ) (cid:1) . In the rBergomi model (3) , the ATM volatility skew ψ ( T ) satisﬁes ψ ( T ) (cid:44) (cid:12)(cid:12)(cid:12)(cid:12) ∂∂ k σ BS ( k, T ) (cid:12)(cid:12)(cid:12)(cid:12) k =0 ∼ T H − . (7) Proof.

We ﬁrst explicit the autocorrelation functional in the rBergomi model. Using the fact that E [ dX t dξ ut ] dt = ρη √ α + 1( u − t ) α (cid:112) ξ tt ξ ut , the autocorrelation functionals C Xξ and C ξξ are given by C Xξ = (cid:90) T ds (cid:90) Ts du E [ dX s dξ u ] ds = ρη √ α + 1 (cid:90) T (cid:112) ξ s ds (cid:90) Ts ξ u ( u − s ) α du + O (cid:0) ε (cid:1) ,C ξξ = (cid:90) T ds (cid:90) Ts du (cid:90) Ts du (cid:48) E [ dξ u dξ u (cid:48) ] ds = (cid:90) T ds (cid:90) Ts du (cid:90) Ts du (cid:48) η (2 α + 1)( u − s ) α ( u (cid:48) − s ) α ξ u ξ u (cid:48) = η (2 α + 1) (cid:90) T ds (cid:32)(cid:90) T ξ u ( u − s ) α du (cid:33) + O (cid:0) ε (cid:1) . Then, using the fact that ∂ ξ us ( C Xξs ( ξ )) = ρη √ α + 1 (cid:34)(cid:90) Ts dt (cid:112) ξ ts ( u − t ) α u>t + 12 (cid:112) ξ us (cid:90) Tu ξ ts ( t − u ) α dt (cid:35) = ρη √ α + 1 (cid:34)(cid:90) us dt (cid:112) ξ ts ( u − t ) α + 12 (cid:112) ξ us (cid:90) Tu ξ ts ( t − u ) α dt (cid:35) , we obtain C µ = (cid:90) T ds (cid:90) Ts du E [ dX s dξ u ] dt ∂ ξ u (cid:0) C Xξs ( ξ ) (cid:1) = ρ η (2 α + 1) (cid:90) T (cid:112) ξ s ds (cid:90) Ts ( u − s ) α du × (cid:34)(cid:90) us (cid:113) ξ t ξ u ( u − t ) α dt + (cid:112) ξ u (cid:90) Tu ξ t ( t − u ) α dt (cid:35) + O (cid:0) ε (cid:1) . Therefore, using Assumption 1, we obtain the following explicit ﬁrst-order approximation: C Xξ = ρη √ H (cid:90) T (cid:112) ξ ds (cid:90) Ts ξ ( u − s ) α du + O (cid:0) ε (cid:1) ≈ C H ξ T H + , C H is a constant depending on H . We are then able to compute the ﬁrst-order approximations of thethree correlation values C Xξ , C ξξ , C µ explicitly. The ﬁrst-order approximation of σ BS ( k, T ) can be written asfollows: σ BS ( k, T ) = ˆ σ V ST + 14 v C xξ ˆ σ V ST ε + 12 v C Xξ ˆ σ V ST εk = ˆ σ V S + (cid:18) v + k v (cid:19) C H ξ T H + ˆ σ V ST ε, Thus, the ATM volatility skew generated by the rBergomi model satisﬁes Equation 7, which is consistent withempirical evidence (see for example, Gatheral et al. (2018)).

We now compare this result to the volatilityskew in the classical two-factor Bergomi model.

Theorem 2.

In the two-factor Bergomi model , the ATM volatility skew satisﬁes ψ ( T ) ∼ C (cid:0) κ X T − e − κ X T (cid:1) T + C (cid:0) κ Y T − e − κ Y T (cid:1) T (8) Proof.

The Brownian motions W S , W X , W Y can be decomposed as: W S = W W X = ρ SX W + (cid:113) − ρ SX W W Y = ρ SY W + χ (cid:113) − ρ SY W + (cid:113) (1 − χ )(1 − ρ SY ) W , where W , W , W are three independent Brownian motions and χ (cid:44) ρ XY − ρ SX ρ SY √ − ρ SX √ − ρ SY . Thus the volatilities ofvariance λ = ( λ , λ , λ ) in the general formulation (5) can be written as: λ ( t, u, ξ ) = α θ ωξ u (cid:104) (1 − θ ) ρ SX e − κ X ( u − t ) + θρ SY e − κ Y ( u − t ) (cid:105) ,λ ( t, u, ξ ) = α θ ωξ u (cid:20) (1 − θ ) (cid:113) − ρ SX e − κ X ( u − t ) + θχ (cid:113) − ρ SY e − κ Y ( u − t ) (cid:21) ,λ ( t, u, ξ ) = α θ ωξ u θ (cid:113) (1 − χ ) (1 − ρ SY ) e − κ Y ( u − t ) , or equivalently: λ i ( t, u, ξ ) = α θ ωξ u (cid:16) ω iX e − κ X ( u − t ) + ω iY e − κ Y ( u − t ) (cid:17) , where ( ω iX ) i =1 , , (cid:44) (cid:18) (1 − θ ) ρ SX , (1 − θ ) (cid:113) − ρ SX , (cid:19) (cid:62) , ( ω iY ) i =1 , , (cid:44) (cid:18) θρ SY , θχ (cid:113) − ρ SY , θ (cid:113) (1 − χ )(1 − ρ SY ) (cid:19) (cid:62) . C Xξ = (cid:90) T du (cid:90) u dt (cid:113) ξ t λ ( t, u, ξ )= α θ ω (cid:34) (1 − θ ) ρ SX (cid:90) T duξ u (cid:90) u dt (cid:113) ξ t e − κ X ( u − t ) + θρ SY (cid:90) T duξ u (cid:90) u dt (cid:113) ξ t e − κ Y ( u − t ) (cid:35) C ξξ = (cid:88) i =1 (cid:90) T ds (cid:32)(cid:90) Ts duλ i ( s, u, ξ ) (cid:33) = α θ ω (cid:88) i =1 (cid:90) T ds (cid:32) ω iX (cid:90) Ts duξ u e − κ X ( u − s ) + ω iY (cid:90) Ts duξ u e − κ Y ( u − s ) (cid:33) C µ = (cid:90) T ds (cid:90) Ts du (cid:112) ξ s λ ( s, u, ξ ) (cid:32) (cid:112) ξ u (cid:90) Tu dtλ ( u, t, ξ ) + (cid:90) us dr (cid:112) ξ r ∂ ξ u λ ( r, u, ξ ) (cid:33) . Using once again Assumption 1, we obtain C Xξ = α θ ωξ T ( ω X J ( κ X T ) + ω Y J ( κ Y T )) C ξξ = α θ ωξ T ( ω + ω X I ( κ X T ) + ω Y I ( κ Y T ) + ω XX I (2 κ X T ) + ω Y Y I (2 κ Y T ) + ω XY I (( κ X + κ Y ) T )) , where ω = (cid:88) i =1 (cid:18) ω iX κ X T + ω iY κ Y T (cid:19) , ω X = − (cid:88) i =1 ω iX κ X T (cid:18) ω iX κ X T + ω iY κ Y T (cid:19) , ω Y = − (cid:88) i =1 ω iY κ Y T (cid:18) ω iX κ X T + ω iY κ Y T (cid:19) ,ω XX = (cid:88) i =1 ω iX κ X T , ω Y Y = (cid:88) i =1 ω iY κ Y T , ω XY = 2 (cid:88) i =1 ω iX ω iY κ X κ Y T , and I ( z ) = 1 − e − z z , J ( z ) = z − e − z z , K ( z ) = 1 − e − z − ze − z z , H ( z ) = J ( z ) − K ( z ) z . Similarly, we have C µ = α θ ω ξ T ( C µ + C µ ) , where the coefﬁcients C µ = 12 ω X H ( κ X T ) + 12 ω Y H ( κ Y T ) − ω X ω Y J ( κ Y T ) − J ( κ X T )( κ X + κ Y ) T ,C µ = ω (cid:48)(cid:48) X J ( κ X T ) + ω (cid:48)(cid:48) Y J ( κ Y T ) + ω (cid:48)(cid:48) XX J (2 κ X T ) + ω (cid:48)(cid:48) Y Y J (2 κ Y T ) + ω (cid:48)(cid:48) XY J (( κ X + κ Y ) T ) , and ω (cid:48)(cid:48) X = ω X κ X T + ω X ω Y κ Y T , ω (cid:48)(cid:48) Y = ω Y κ Y T + ω X ω Y κ Y T ,ω (cid:48)(cid:48) XX = − ω X κ X T , ω (cid:48)(cid:48)

Y Y = − ω Y κ Y T , ω (cid:48)(cid:48) XY = − ω X ω Y κ X T − ω X ω Y κ Y T .

Since C Xξ ∼ T (cid:16) C · κ X T − e − κXT ( κ X T ) + C · κ Y T − e − κY T ( κ Y T ) (cid:17) and C , C are constants, we can derive the termstructure of the ATM volatility skew as in equation (8) with ﬁrst order in ε .However, this result derived for the Bergomi model by the Bergomi-Guyon expansion (Bergomi and Guyon,2012) is inconsistent with empirical evidence, see for example Bayer et al. (2016). This suggests that the power-law kernel of the forward variance curve in the rBergomi model will lead to more realistic and accurate pricingand hedging results than the exponential kernel of the forward variance curve in the Bergomi model.6 Markovian representation of the rough Bergomi model

The purpose of this section is to establish the inﬁnite-dimensional afﬁne nature and Markovianity of the rBergomimodel.

Deﬁnition 3.

An Ornstein-Uhlenbeck (O-U) process Y xt is the solution of the following stochastic differentialequation (SDE): dY xt = x ( a − Y xt ) dt + σdB t (9) where x > is the mean-reversion speed, a > is the mean-reversion level, and B s is a standard Brownianmotion. Its strong solution is explicitly given by Y xt = Y + σ (cid:90) t e − x ( t − s ) dB s . (10) Assumption 2.

In the rest of the paper, we always assume that a (cid:44) Y (11) σ (cid:44) η √ α + 1 (12) where η and α come from the Deﬁnition 1 of the rBergomi model (see Bayer et al. 2016). Deﬁnition 4.

Without loss of generality, we deﬁne the sigma-ﬁnite measure µ ( dx ) on (0 , ∞ ) as µ ( dx ) = dxx

12 + H Γ( − H ) . Theorem 3.

Using Deﬁnitions 3 and 4, the Volterra type integral ˜ X t (cid:44) (cid:82) t ( t − s ) H − dB s in the rBergomi modelhas the Markovian representation σ ˜ X t = (cid:90) ∞ ( Y xt − Y x ) µ ( dx ) . (13) Proof. the Laplace transform of the measure µ in Deﬁnition 4 is L ( µ )( τ ) = (cid:90) ∞ e − τx µ ( dx ) = (cid:90) ∞ e − τx x − − H Γ (cid:0) − H (cid:1) dx = τ H − , which can be recognised as the power-law kernel in the Volterra type integral. Consequently, we have σ ˜ X t = (cid:82) t (cid:82) ∞ σe − x ( t − s ) µ ( dx ) dB s , and using Fubini’s stochastic theorem (Protter, 2005), we obtain σ ˜ X t = (cid:82) ∞ (cid:82) t σe − x ( t − s ) dB s µ ( dx ) .From Deﬁnition 3, where (cid:82) t σe − x ( t − s ) dB s = Y xt − Y , we obtain the Markovian representation given by equation13. Theorem 4.

The O-U process (10) has the afﬁne structure E (cid:104) e (cid:82) ∞ Y xt µ ( dx ) | F s (cid:105) = e σ (cid:82) t − s ( (cid:82) ∞ e − sx µ ( dx ) ) ds + (cid:82) ∞ Y xs e − ( t − s ) x µ ( dx ) . Proof.

From Fubini’s stochastic theorem, (cid:82) ∞ Y xt µ ( dx ) is Gaussian under the ﬁltration F s for ≤ s ≤ t , withmean E (cid:20)(cid:90) ∞ Y xt µ ( dx ) | F s (cid:21) = (cid:90) ∞ Y xs e − ( t − s ) x µ ( dx ) . Furthermore, using It¯o’s isometry, we have the conditional variance:

Var (cid:18)(cid:90) ∞ Y xt µ ( dx ) | F s (cid:19) = σ (cid:90) ts (cid:18)(cid:90) ∞ e − ( t − s ) x µ ( dx ) (cid:19) ds = σ (cid:90) t − s (cid:18)(cid:90) ∞ e − sx µ ( dx ) (cid:19) ds. E (cid:104) e (cid:82) ∞ Y xt µ ( dx ) | F s (cid:105) = e Var ( (cid:82) ∞ Y xt µ ( dx ) |F s ) + E [ (cid:82) ∞ Y xt µ ( dx ) |F s ]= e σ (cid:82) t − s ( (cid:82) ∞ e − sx µ ( dx ) ) ds + (cid:82) ∞ Y xs e − ( t − s ) x µ ( dx ) . From Deﬁnition 1 and Theorem 3, the rBergomi model can be rewritten in the following form:  dX t = − V t dt + (cid:112) V t dW t log V t ξ = (cid:90) ∞ ( Y xt − Y ) µ ( dx ) , where X t is the log stock price, ξ is the initial ﬂat forward variance curve, and W, B are two Brownian motionswith correlation d (cid:104) W, B (cid:105) t = ρdt and ρ ∈ [ − , .Our aim is now to write the log stock price X t in afﬁne form as the ﬁrst coordinate of an inﬁnite-dimensionalafﬁne process. To do so, we introduce the following symmetric nonnegative tensor: L ( µ ) ⊗ s L ( µ ) = (cid:8) y ⊗ : y ∈ L ( µ ) (cid:9) ⊂ L ( µ ) ⊗ ⊂ L ( µ ⊗ ) . Let Π t = ( i ⊗

1) ( Y xt ) ⊗ ∈ iL ( µ ) ⊗ s L ( µ ) . The relation (cid:0)(cid:82) ∞ Y xt µ ( dx ) (cid:1) = (cid:82) ∞ ( i ⊗ Y xt ) ⊗ µ ⊗ ( dx ) holds.Therefore, the log stock price dynamics can be written as dX t = (cid:112) ξ · (cid:18) E (cid:82) ∞ tµ ⊗ dx )4 dW t − E (cid:82) ∞ Y xt µ ( dx ) (cid:19) = (cid:112) ξ e (cid:82) ∞ tµ ⊗ dx )4 e − η t α +1 dW t − √ ξ e (cid:82) ∞ Y xt µ ( dx ) e − η t α +1 dt, where E is the Dol´eans-Dade stochastic exponential. Theorem 5.

The process Π t = ( i ⊗ Y xt ) ⊗ satisﬁes the afﬁne structure E (cid:104) e (cid:82) ∞ Π t µ ⊗ ( dx ) | F s (cid:105) = e Φ +Φ (14) where Φ (cid:44) −

12 log (cid:18) − (cid:90) t − s (cid:18)(cid:90) ∞ e − sx µ ( dx ) (cid:19) (cid:19) ds (15) Φ (cid:44) σ (cid:0) e − ( t − s ) x (cid:1) ⊗ − (cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds . (16) Proof.

From Fubini’s stochastic theorem, (cid:82) ∞ Y xt µ ( dx ) σ √ (cid:82) t − s ( (cid:82) ∞ e − sx µ ( dx )) ds is Gaussian under the ﬁltration F s for ≤ s ≤ t , with conditional mean E  (cid:82) ∞ Y xt µ ( dx ) σ (cid:113)(cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds | F s  = (cid:82) ∞ Y xs e − ( t − s ) x µ ( dx ) σ (cid:113)(cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds and conditional variance Var  (cid:82) ∞ Y xt µ ( dx ) σ (cid:113)(cid:82) t − s ( (cid:82) ∞ e − sx µ ( dx )) ds | F s  = 1 . (cid:82) ∞ Π t µ ⊗ ( dx ) σ (cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds =  (cid:82) ∞ Y xt µ ( dx ) σ (cid:113)(cid:82) t − s ( (cid:82) ∞ e − sx µ ( dx )) ds  is a noncentral χ distribution with one degree of freedom and noncentrality parameter (cid:0)(cid:82) ∞ Y xs e − ( t − s ) x µ ( dx ) (cid:1) σ (cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds = (cid:82) ∞ Π s (cid:0) e − ( t − s ) x (cid:1) ⊗ µ ⊗ ( dx ) σ (cid:82) t − s (cid:0)(cid:82) ∞ e − sx µ ( dx ) (cid:1) ds . Thus the formulas (15) and (16) for Φ and Φ follow from the characteristic function of the noncentral χ distribution, which concludes the proof. Corollary 1.

The rBergomi model is an inﬁnite-dimensional Markovian process.Proof.

This corollary follows from Theorem 5 which exhibits that the rBergomi model has an exponential-afﬁnedependence on x , hence the model is Markovian in each dimension. In this Section, we ﬁrst introduce the aBergomi model which is used to approximate the rBergomi model (3).After that, we will demonstrate the existence and uniqueness of the solution of this aBergomi model. We alsoprove that the aBergomi model is well-deﬁned and the solution of the aBergomi model converges to that of therBergomi model when the number of terms n in the aBergomi model goes to inﬁnity. At the same time, we showthat the rBergomi model inherits the afﬁne structure of the Bergomi model.Since the rBergomi model can be represented by  dS t = S t (cid:112) V t dW t log (cid:26) V t ξ (cid:27) = (cid:90) ∞ σ (cid:90) t e − x ( t − s ) dB s µ ( dx ) and the n -term Bergomi model with the same Brownian motion in the variance process can be represented by  dS t = S t (cid:112) V t dW t log (cid:26) V t ξ (cid:27) = (cid:90) t (cid:32) n (cid:88) i =1 α i e − κ i ( t − s ) (cid:33) dB s , (17)we can view the rBergomi model as a continuous inﬁnite-term Bergomi model under the measure µ ( · ) , in whichthe mean-reversion speed x has been integrated from to ∞ , with the Brownian motion B s . We can thereforeapproximate the rBergomi model by a n -term exponential kernel K exp = (cid:80) ni =1 α i e − κ i ( t − s ) instead of the powerkernel K pow = √ α + 1( t − s ) α of the Volterra process in the rBergomi model.Following equation (17), after approximating the exponential kernel K ( τ ) = (cid:82) ∞ e − xτ µ ( dx ) by the kernel K n ( τ ) = (cid:80) ni =1 α ni e − τx ni , we can rewrite the aBergomi model (17) as follows:  dS nt = S nt (cid:112) V nt dW log (cid:26) V nt ξ (cid:27) = n (cid:88) i =1 α ni V n,it dV n,it = − x ni ( a − V nt ) dt + σdB t a = Y , σ = η √ α + 1 , (18)where ( α ni ) ≤ i ≤ n are positive weights, ( x ni ) ≤ i ≤ n are mean-reverting speeds, and (cid:104) W, B (cid:105) t = ρdt , with initialconditions S n = S = 1 and V n,i = V = 0 . 9 .1 Existence and uniqueness of ( S n , V n ) We rewrite V n in (18) as the following stochastic equation log (cid:18) V nt ξ (cid:19) = σ (cid:90) t K n ( t − s ) dB s . (19) Theorem 6.

Under the conditions of the model (18) , there exists a unique strong non-negative solution V n toequation (19) .Proof. Øksendal and Zhang (1993) implies that there exists a unique strong non-negative solution V n to equation(19) under the conditions of the model (18) .Then the strong existence and uniqueness of ( S n , V n ) follows, along with its Markovianity w.r.t. the spot price S n and the factors V n,i for i ∈ { , · · · , n } . ( S n , V n ) to ( S, V ) To prove that the solution of the aBergomi model ( S n , V n ) converges to the solution of the rBergomi model ( S, V ) , we need to choose a suitable K n ( τ ) = (cid:80) ni =1 α ni e − x ni τ to approximate K ( τ ) = τ H − . When n → + ∞ , ( V n ) n ≥ → V (see Carmona et al. 2000, Muravlev 2011, Harms and Stefanovits 2019). Theorem 7.

There exist weights ( α ni ) ≤ i ≤ n > and mean reversion speeds ( x ni ) ≤ i ≤ n > , such that (cid:107) K n − K (cid:107) ,T → , where (cid:107) · (cid:107) ,T is the L ([0 , T ] , R ) norm. The proof of this theorem is in the Appendix.Applying the previous computations and the Kolmogorov tightness criterion, we can get that the sequence ( S n , V n ) is tight for the uniform topology and the limit satisﬁes the model (18). In this section, we detail the afﬁne property of the aBergomi model.

Theorem 8.

The process V n (equation (19) ) has the following afﬁne structure E [ V nt | F s ] = ξ exp (cid:40) σ n (cid:88) i =1 α ni (cid:18) x ni − e − ( t − s ) x ni x ni (cid:19) + n (cid:88) i =1 V n,is α ni e − ( t − s ) x ni (cid:41) Proof.

Using Theorem 4, we have E [ V nt | F s ] = ξ exp (cid:40) σ (cid:90) t − s ( K n ( s )) ds + n (cid:88) i =1 V n,is α ni e − ( t − s ) x ni (cid:41) = ξ exp (cid:40) σ (cid:90) t − s (cid:32) n (cid:88) i =1 α ni e − sx ni (cid:33) ds + n (cid:88) i =1 V n,is α ni e − ( t − s ) x ni (cid:41) = ξ exp (cid:40) σ n (cid:88) i =1 α ni (cid:18) x ni − e − ( t − s ) x ni x ni (cid:19) + n (cid:88) i =1 V n,is α ni e − ( t − s ) x ni (cid:41) , Similarly we can derive the afﬁne structure of S n by Theorem 5.10 Numerical method

In this section, we ﬁrst introduce the hybrid scheme and algorithm to approximate an rBergomi model by an aBer-gomi model. And then, we compare the simulated volatilities of both models. To demonstrate the approximationaccuracy and efﬁciency, we investigate the RMSE of simulated results for different number of terms and numberof time steps in numerical tests. By some improved algorithms, we observe that 25-term O-U process and 100time steps can produce a good output, with reliable outcomes and fast calculation speed under 20000 Monte Carlopaths.

Recalling equation (3), the rough Bergomi model with time horizon

T > under an equivalent martingalemeasure identiﬁed with P can be written as:  dS t = S t (cid:112) V t dW t dξ ts ξ ts = η √ α + 1( t − s ) α dB s , (20)where W, B are two standard Brownian motions with correlation ρ . We recall Assumption 1 that the forwardvariance curve ξ t is ﬂat for all t ∈ [0 , T ] : ξ t = ξ > Thus, the spot variance V t in Equation (20) is given by V t = ξ exp (cid:18) η √ α + 1 (cid:90) t ( t − s ) α dB s − η t α +1 (cid:19) . To simulate the Volterra-type integral ˜ X = √ α + 1 (cid:82) t ( t − s ) α dB s , we apply the hybrid scheme proposed inBennedsen et al. (2017), which approximates the kernel function of the Brownian semi-stationary processes by aWiener integrals of the power function at t = s and a Riemann sum elsewhere.Let (cid:0) Ω , F , ( F t ) t ∈ R , P (cid:1) be a ﬁltered probability space which supports a standard Brownian motion W = ( W t ) t ∈ R .We consider a Brownian semi-stationary process ( Bss ): ¯ X t = (cid:90) t −∞ g ( t − s ) σ s dW s t ∈ R , (21)where σ = ( σ t ) t ∈ R is an ( F t ) t ∈ R -predictable process which captures the stochastic volatility of ¯ X and g :(0 , ∞ ) → [0 , ∞ ) is a Borel-measurable kernel function. We assume that E (cid:2) σ t (cid:3) < ∞ for all t ∈ R and theprocess is covariance-stationary, namely E [ σ s ] = E [ σ t ]cov ( σ s , σ t ) = cov (cid:0) σ , σ | s − t | (cid:1) , s, t ∈ R . These assumptions imply that ¯ X is covariance-stationary. However, the process ¯ X need not be strictly stationary. Assumption 3.

The assumptions concerning the kernel function g are as follows: (A1) For some α ∈ (cid:0) − , (cid:1) \{ } , g ( x ) = x α L g ( x ) , x ∈ (0 , , where L g : (0 , → [0 , ∞ ) is continuously differentiable, slowly varying at 0 and bounded away from 0.Moreover, there exists a constant C > such that the derivative L (cid:48) g of L g satisﬁes | L (cid:48) g ( x ) | ≤ C (cid:18) x (cid:19) , x ∈ (0 , . (A2) The function g is continuously differentiable on (0 , ∞ ) , with derivative g (cid:48) that is ultimately monotonic andalso satisﬁes (cid:82) ∞ g (cid:48) ( x ) dx < ∞ . A3)

For some β ∈ (cid:0) −∞ , − (cid:1) , g ( x ) = O (cid:0) x β (cid:1) , x → ∞ . In order to implement the hybrid scheme to the rBergomi model, we need to introduce a particular class of non-stationary processes, namely truncated Brownian semi-stationary ( tBss ) processes, ˜ X t = (cid:90) t g ( t − s ) σ s dW s t ≥ , (22)where the kernel function g ( t ) , the volatility process σ s and the driving Brownian motion W s are as deﬁned inthe deﬁnition of Bss processes. ˜ X t can also be seen as the truncated stochastic integral at of the Bss process ¯ X t .Equation (22) is integrable since g ( t ) is differentiable on (0 , ∞ ) . Now, we can discretise equation (22) in time. Let N be the total number of time steps, ∆ t = T /N be the timestep size, and t = 0 ≤ . . . ≤ t j = j ∆ t ≤ . . . ≤ t N = T be a time grid on the interval [0 , T ] .According to Bennedsen et al. (2017), the observations ˜ X Nt j , j = 0 , , · · · , N can be computed via ( κ = 1 case) ˜ X Nt j = L g (∆ t ) σ Nj − W Nj − , + j (cid:88) k =1 g ( b ∗ k ∆ t ) σ Nj − k ¯ W Nj − k (23)using the random vectors W Nj , j = 0 , , · · · , N − , the random variables σ Nj , j = 0 , , · · · , N − , where b ∗ k = (cid:16) k α +1 − ( k − α +1 α +1 (cid:17) α , and the random vectors ¯ W Ni (cid:44) (cid:82) i +1 NiN dW s (see Proposition 2.8 in Bennedsen et al.2017).To simulate the Volterra process ˜ X , we use:  L g ≡ ,g ( x ) ≡ x H − ,σ ( · ) ≡ √ α + 1 . then, W Nj − , = (cid:90) t j t j − ( t j − s ) α dW s ≈ (cid:18) ∆ t (cid:19) α (cid:0) W t j − W t j − (cid:1) ¯ W Nj = (cid:90) t j +1 t j dW s = W t j +1 − W t j σ Nj = σ t j . The related matrix representation takes the form of  ˜ X t ˜ X t ˜ X t ... ˜ X t N  =  W , · · · W , g ( b ∗ ∆ t ) ¯ W · · · W , g ( b ∗ ∆ t ) ¯ W · · · ... ... . . . ... ... W N − , g ( b ∗ ∆ t ) ¯ W N − · · · g (cid:0) b ∗ N − ∆ t (cid:1) ¯ W g ( b ∗ N ∆ t ) ¯ W   σ t σ t σ t ... σ t N  . (24)In the rBergomi model, σ t i = σ is a constant for i = 1 , , ..., N deﬁned in equation (18). When simulating ˜ X i ,we need to perform a matrix multiplication, the computational complexity of which is of order O (cid:0) N (cid:1) whenusing the conventional matrix multiplication algorithm. However, multiplying a lower triangular Toeplitz matrixcan be regarded as a discrete convolution which can be evaluated efﬁciently by fast Fourier transform. Therefore12able 1: Parameters in the rBergomi model ξ η α -0.43the computational complexity can be reduced to O ( N log N ) . The algorithm to simulate the Volterra process ˜ X is described in Algorithm 1. Then we can use a standard Euler scheme to simulate the price ( S t , S t , · · · , S t N ) . Algorithm 1:

Volterra process ˜ X(cid:46)

Simulate W t j while j = 0 , , , · · · , N − do generate random vectors W t j end (cid:46) Simulate W Nt j − , while j = 1 , , · · · , N do W Nt j − , = (cid:0) ∆ t (cid:1) α (cid:0) W t j − W t j − (cid:1) end (cid:46) Simulate ¯ W Nj while j = 0 , , , · · · , N − do ¯ W Nj = W t j +1 − W t j end Simulate ˜ X by the matrix multiplication (24) using FFTBelow we give a simulation of the stock price in the rBergomi model in Fig. ?? (see Algorithm 2). The parametersare listed in Table 1. Algorithm 2:

Rough Bergomi modelSimulate the Volterra process ˜ X by the hybrid scheme referring to Algorithm 1 (cid:46) Spot variance V t Set initial values V t = ξ while t = t , t , · · · , t N do V t = ξ e η ˜ X − η t α +1 end (cid:46) Log-stock price log( S t ) Set initial values log( S t ) = 0 while t = t , t , · · · , t N do log( S t +∆ t ) ← log( S t ) + √ V t ∆ W t − V t ∆ t end For sake of simplicity, we start with deriving the approximation of the rBergomi model with 2 terms. It works inthe same way when terms number is bigger than 2. The 2-term Bergomi model (4) that we used to approximatethe rBergomi model is given as follows.  dS t = S t (cid:112) V t dW t dξ ts = ηξ ts (cid:16) α e − κ ( t − s ) + α e − κ ( t − s ) (cid:17) dB s , (25)13here s ∈ [0 , t ) . Here, we introduce the process y ts deﬁned as  y ts = α e − κ ( t − s ) Y s + α e − κ ( t − s ) Y s dY s = − κ Y s ds + dB s Y = 0 dY s = − κ Y s ds + dB s Y = 0 . (26)where κ , κ are from the exponential kernel K exp , and Y s and Y s are two O-U processes. Hence the process y ts can be written as a driftless Gaussian process as follows: dy ts = α e − κ ( t − s ) dB s + α e − κ ( t − s ) dB s , and its quadratic variation is (cid:104) dy t , dy t (cid:105) s = ς ( t − s ) ds where ς ( u ) = (cid:112) α e − κ u + α e − κ u + 2 α α e − ( κ + κ ) u .The forward variation process ξ ts can be written as dξ ts = η ts dy ts . Thus, the solution of the forward variation pro-cess is ξ ts = ξ f t ( s, y ts ) where f t ( s, y ) = e ηy − η χ ( s,t ) and χ ( s, t ) = (cid:90) tt − s ς ( u ) du = (cid:90) tt − s α e − κ u + α e − κ u + 2 α α e − ( κ + κ ) u du = α e − κ ( t − s ) − e − κ s κ + α e − κ ( t − s ) − e − κ s κ + 2 α α e − ( κ + κ )( t − s ) − e − ( κ + κ ) s κ + κ (27)Recall that V t = ξ tt = ξ e ηy tt − η χ ( t,t ) and χ ( t, t ) (cid:39) s → t t α +1 when s → t under the condition that the factornumber is large enough (this formula is more applicable than (27) when s → t , provided n is large enough).Using the approximation by Bergomi model, we consider the parameters { α i , κ i } ( i =1 , , ··· ,n ) in the exponentialkernel K exp = (cid:80) ni =1 α i e − κ i ( t − s ) on s ∈ [0 , t ) . Note that when s → t , the power kernel K pow → ∞ while K exp is ﬁnite. To compute the approximation numerically, we need to truncate the kernel K exp . To do so we can use the scipy.optimize module in Python or the nlinfit function in

MATLAB for the nonlinear regression ofthe parameters { α i , κ i } ( i =1 , , ··· ,n ) and the simulated price { S t } . We exemplify the truncation of K exp by letting s ∈ [0 , T − ∆ t ] , the truncated parameter θ = T − TN = T − ∆ t and let T = 1 .We deﬁne the integral I trunc on the truncated region [0 , θt ) and apply the scaling property of Brownian motion asfollows: I trunc = n (cid:88) i =1 α i (cid:90) θtT e − κ i ( t − s ) dB s = n (cid:88) i =1 α i (cid:114) θT (cid:90) t e − κ i (1 − θT ) s dB s . After scaling B s , the process y s demands change to be driftless Gaussian and satisfy y s = (cid:80) ni =1 α i e − κ i (1 − θT ) s Y is where dY is = κ i (1 − θT ) Y is ds + dB s Y i = 0 . Then the process y s can be written as dy s = (cid:80) ni =1 α i e − κ i (1 − θT ) s dB s .Thus, the kernel in the rBergomi model on [0 , θT t ) can be approximated by I trunc = (cid:113) θT y t .14igure 1: The power kernel K pow in the rBergomi model and the exponential K exp in the 25-term aBergomi modelwhen T = 1 and N = 100 .Figure 1 displays the power kernel K pow in the rBergomi model and the K exp in the 25-term aBergomi modelwhen T = 1 and N = 100 . This ﬁgure suggests that K exp is sufﬁciently accurate for nonlinear regression, with aRoot-mean-square error (RMSE) of . × − .Figure 2: Volatility smiles for rBergomi and 25-term aBergomi models with T = 1 simulated by MonteCarlo paths.The method for simulating the variance in the aBergomi model is described in Section 5.2, which leads directlyto the volatility smiles in Figure 2 (see Algorithm 3). From Figure 2, we note that the at-the-money calibrationis better with 50 time steps at the cost of a worse out-of-the-money calibration. Meanwhile, 100 time steps canapproximate the rBergomi model visually well. However, we multiply the aBergomi smile by a constant fordifferent time steps since the Riemann-sum scheme is able to capture the shape of the implied volatility smile, butnot its level (see Bennedsen et al. 2017). To generate realistic implied volatility smiles, we determine the squareof multiplication factors for different time steps in Table 2.15able 2: The square of multiplication factors for different stepstime steps square of multiplication factors50 0.750323909100 0.550447453150 0.485093611200 0.450392126Figure 3: RMSE of the the implied volatility smiles of aBergomi model with three different number of terms (15,20 and 25) at different number of time steps under 20000 Monte Carlo paths

Algorithm 3: n-term aBergomi model when T = 1 (cid:46) Driftless Gaussian process y s = (cid:80) ni =1 α i e − κ i (1 − θ ) s Y is Set initial values y s = zeros ( M, N ) , Y i = 0 while ( s = t , t , · · · , t N ) and ( i = 1 , , · · · , n ) do Y s +∆ t ← Y is + κ i (1 − θ ) Y is ∆ t + ∆ W s end (cid:46) Spot variance V t Set initial values V t = ξ while t = t , t , · · · , t N do V t = ξ e multiplication factor ·√ θy t − η t α +1 end (cid:46) Log-stock price log( S t ) Set initial values log( S t ) = 0 while t = t , t , · · · , t N do log( S t +∆ t ) ← log( S t ) + √ V t ∆ W t − V t ∆ t end We compute the RMSE of the implied volatility approximation with different numbers of terms in the aBergomimodel and different time steps in Figure 3 and compare the pricing speed in Table 3. The numerical resultssuggest that the RMSE of different term numbers reduces to the same low level as the number of time steps16able 3: Runtime (in s) of the rBergomi model and the aBergomi model for different time steps with T = 1 and20000 Monte Carlo pathsTime steps rBergomi 10-term aBergomi 15-term aBergomi 20-term aBergomi 25-term aBergomi50 0.408105 0.099360 0.139226 0.172056 0.216408100 0.500114 0.236882 0.313049 0.402372 0.454303150 0.560219 0.365012 0.449856 0.558937 0.686911200 0.586050 0.425685 0.590825 0.725772 0.872691increases. Therefore, we may conclude that choosing the 25-term O-U process and 100 time steps can produce agood output, with reliable outcomes and fast calculation speed with 20000 Monte Carlo paths. In this paper, we prove the power-law behavior of the ATM volatility skew as time to maturity goes to zero of therBergomi model and we also propose an aBergomi model with ﬁnite terms to approximate the rBergomi model.The approximation enables the adoption of classical pricing methods while keeping the fractional feature of themodel. When the terms number in the aBergomi model is large enough, we can prove its convergence to therBergomi model. We not only give the theoretical proofs, but also give its numerical results. A hybrid scheme forthe rBergomi model with the computational complexity O ( N log N ) is developed for the aBergomi model. Nu-merically simulated results by the hybrid scheme demonstrate the accuracy and efﬁciency of the approximation.The model parameters used in the numerical test are calibrated from the regression of the power-law kernel of therBergomi model. Other efﬁcient calibration methods are worth investigation for future research. A Appendix

A.1 Proof of Theorem 7

In this subsection, we give the proof of the theoretical results in Theorem 7.

Proof.

Let ( p ni ) ≤ i ≤ n be auxiliary mean reversion speeds such that p ni − ≤ x ni ≤ p ni for i ≤ { , · · · , n } and p n = 0 . Recall that K ( τ ) = (cid:82) ∞ e − xτ µ ( dx ) . We have (cid:107) K n − K (cid:107) ,T = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n (cid:88) i =1 α ni e − x ni τ − (cid:90) ∞ e − xτ µ ( dx ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ,T ≤ (cid:90) ∞ (cid:13)(cid:13)(cid:13) e − x ( · ) (cid:13)(cid:13)(cid:13) ,T µ ( dx ) + n (cid:88) i =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) α ni e − x ni ( · ) − (cid:90) p ni p ni − e − x ( · ) µ ( dx ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ,T . (28)The ﬁrst term on the RHS of the inequality (28) can be estimated as below: (cid:90) ∞ p nn (cid:13)(cid:13)(cid:13) e − x ( · ) (cid:13)(cid:13)(cid:13) ,T µ ( dx ) = (cid:90) ∞ p nn (cid:114) − e − xT x µ ( dx ) ≤ ( p nn ) − H √ H Γ (cid:0) − H (cid:1) . e x = 1 + x + x + (cid:82) x x − u ) du for t ∈ [0 , T ] , choosing α ni = (cid:82) p ni p ni − µ ( dx ) and x ni = (cid:32) (cid:82) pnipni − x µ ( dx ) (cid:82) pnipni − µ ( dx ) (cid:33) , we can obtain that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) α ni e − x ni t − (cid:90) p ni p ni − e − xt µ ( dx ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) α ni (cid:18) − x ni t ) + ( − x ni t ) (cid:19) − (cid:90) p ni p ni − (cid:32) − xt ) + ( − xt ) (cid:33) µ ( dx ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) α ni (cid:32)(cid:90) x ni t ( x ni t − u ) du (cid:33) − (cid:90) p ni p ni − (cid:90) xt ( xt − u ) duµ ( dx ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:90) p ni p ni − ( xt − x ni t ) + ( − x ni t ) − ( − xt ) µ ( dx ) ≤ t (cid:90) p ni p ni − ( x − x ni ) µ ( dx ) since (cid:90) p ni p ni − (cid:40)(cid:90) x ni t ( x ni t − u ) du − (cid:90) xt ( xt − u ) du (cid:41) µ ( dx )= (cid:90) p ni p ni − (cid:26) x ni t (cid:90) ( x ni t − x ni ts ) ds − xt (cid:90) ( xt − xts ) ds (cid:27) µ ( dx ) , s = uxt = (cid:90) p ni p ni − (cid:26) ( x ni t ) (cid:90) (1 − s ) ds − ( xt ) (cid:90) (1 − s ) ds (cid:27) µ ( dx )= (cid:26) t (cid:90) (1 − s ) ds (cid:27) (cid:90) p ni p ni − (cid:110) ( x ni ) − ( x ) (cid:111) µ ( dx )= (cid:26) t (cid:90) (1 − s ) ds (cid:27) (cid:90) p ni p ni −  (cid:82) p ni p ni − xµ ( dx ) (cid:82) p ni p ni − µ ( dx )  − ( x )  µ ( dx )=0 . Hence, n (cid:88) i =1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) α ni e − x ni ( · ) − (cid:90) p ni p ni − e − x ( · ) µ ( dx ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ,T ≤ T √ n (cid:88) i =1 (cid:90) p ni p ni − ( x − x ni ) µ ( dx ) . Thus, the convergence of K n depends on the weights α i and mean reversions x i . Let p ni = iπ n for each i ∈ { , · · · , n } and π n > . We have n (cid:88) i =1 (cid:90) p ni p ni − ( x − x ni ) µ ( dx ) ≤ π n (cid:90) p nn µ ( dx ) = π − Hn n − H (cid:0) − H (cid:1) Γ (cid:0) − H (cid:1) We can also proceed to get the explicit expressions of α ni and x ni as follows: α ni = ( iπ n ) − H − [( i − π n ] − H ( − H ) Γ ( − H ) , x ni = 1 − H − H · ( iπ n ) − H − [( i − π n ] − H ( iπ n ) − H − [( i − π n ] − H . Since p nn = nπ n → ∞ , we have π − Hn n − H → as n → + ∞ when π n < n − ,18 K n − K (cid:107) ,T ≤ √ H Γ (cid:0) − H (cid:1) (cid:34) ( p nn ) − H + T H √ (cid:0) − H (cid:1) ( p nn ) − H π n (cid:35) = 1 √ H Γ (cid:0) − H (cid:1) (cid:34) n − H π − Hn + T H √ (cid:0) − H (cid:1) n − H π − Hn (cid:35) = ax − H + bx − H (29)Let x = π n , RHS y = ax − H + bx − H and y (cid:48) = − aHx − H − + b (cid:0) − H (cid:1) x − H = 0 , solving for x , we have x = aHb ( − H ) , where a = n − H and b = T H √ ( − H ) n − H x = π n = (cid:34) n − H H √ (cid:0) − H (cid:1) T Hn − H (cid:0) − H (cid:1) (cid:35) = (cid:34) n − √ (cid:0) − H (cid:1) T (cid:0) − H (cid:1) (cid:35) = n − T (cid:34) √ (cid:0) − H (cid:1)(cid:0) − H (cid:1) (cid:35) When π n = n − T (cid:20) √ ( − H )( − H ) (cid:21) , the RHS of equation (29) attains its minimum and (cid:107) K n − K (cid:107) ,T ≤ Cn − H where C = √ H Γ ( − H ) T H (cid:20) √ ( − H ) − H (cid:21) − H − H is a constant. References

Abi Jaber, E. and El Euch, O. (2019). Multifactor approximation of rough volatility models.

SIAM Journal onFinancial Mathematics , 10(2):309–349.Bayer, C., Friz, P., and Gatheral, J. (2016). Pricing under rough volatility.

Quantitative Finance , 16(6):887–904.Bayer, C., Hammouda, C. B., and Tempone, R. (2018). Hierarchical adaptive sparse grids for option pricing underthe rough Bergomi model. arXiv preprint arXiv:1812.08533 .Bayer, C., Horvath, B., Muguruza, A., Stemper, B., and Tomas, M. (2019). On deep calibration of (rough)stochastic volatility models. arXiv preprint arXiv:1908.08806 .Bennedsen, M., Lunde, A., and Pakkanen, M. S. (2017). Hybrid scheme for Brownian semistationary processes.

Finance and Stochastics , 21(4):931–965.Bergomi, L. (2005). Smile dynamics II.

Risk magazine , 18(10).Bergomi, L. (2009). Smile dynamics IV.

Risk magazine , 22(12).Bergomi, L. and Guyon, J. (2012). Stochastic volatility’s orderly smiles.

Risk , 25(5):60.Carmona, P., Coutin, L., and Montseny, G. (2000). Approximation of some Gaussian processes.

Statisticalinference for stochastic processes , 3(1-2):161–171.Forde, M. and Zhang, H. (2017). Asymptotics for rough stochastic volatility models.

SIAM Journal on FinancialMathematics , 8(1):114–145.Fukasawa, M. (2017). Short-time at-the-money skew and rough fractional volatility.

Quantitative Finance ,17(2):189–198.Gatheral, J., Jaisson, T., and Rosenbaum, M. (2018). Volatility is rough.

Quantitative Finance , 18(6):933–949.Gatheral, J. and Keller-Ressel, M. (2019). Afﬁne forward variance models.

Finance and Stochastics , pages 1–33.19arms, P. and Stefanovits, D. (2019). Afﬁne representations of fractional processes with applications in mathe-matical ﬁnance.

Stochastic Processes and their Applications , 129(4):1185–1228.Jacquier, A., Martini, C., and Muguruza, A. (2018). On VIX futures in the rough Bergomi model.

QuantitativeFinance , 18(1):45–61.James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013).

An introduction to statistical learning , volume 112.Springer.Jusselin, P. and Rosenbaum, M. (2018). No-arbitrage implies power-law market impact and rough volatility.

Available at SSRN 3180582 .McCrickerd, R. and Pakkanen, M. S. (2018). Turbocharging Monte Carlo pricing for the rough Bergomi model.

Quantitative Finance , 18(11):1877–1886.Muravlev, A. A. (2011). Representation of a fractional Brownian motion in terms of an inﬁnite-dimensionalOrnstein-Uhlenbeck process.

Russian Mathematical Surveys , 66(2):439–441.Øksendal, B. and Zhang, T.-S. (1993). The stochastic Volterra equation. In

Barcelona Seminar on StochasticAnalysis , pages 168–202. Springer.Protter, P. E. (2005). Stochastic differential equations. In