Asymptotics for volatility derivatives in multi-factor rough volatility models

Abstract

We present small-time implied volatility asymptotics for Realised Variance (RV) and VIX options for a number of (rough) stochastic volatility models via a large deviations principle. We provide numerical results, along with efficient and robust numerical recipes to compute the rate function, the backbone of our theoretical framework. Based on our results, we further develop approximation schemes for the density of RV, which in turn allow the volatility swap to be expressed in closed form. Lastly, we investigate different constructions of multi-factor models and how each of them affects the convexity of the implied volatility smile. Interestingly, we identify the class of models that generate non-linear smiles around the money.


ASYMPTOTICS FOR VOLATILITY DERIVATIVES IN MULTI-FACTOR ROUGH VOLATILITY MODELS

CHLOE LACOMBE, AITOR MUGURUZA, AND HENRY STONE

Abstract.

We study the small-time implied volatility smile for Realised Variance options, and investigate the effect of correlation in multi-factor models on the linearity of the smile. We also develop an approximation scheme for the Realised Variance density, allowing fast and accurate pricing of Volatility Swaps. Additionally, we establish small-noise asymptotic behaviour of a general class of VIX options in the large strike regime.

1. Introduction

Following the works by Alòs, León and Vives [ALV07], Gatheral, Jaisson, and Rosenbaum [GJR18] and Bayer, Friz and Gatheral [BFG16], rough volatility is becoming a new breed in financial modelling by generalising Bergomi's 'second generation' stochastic volatility models to a non-Markovian setting. The most basic form of (lognormal) rough volatility model is the so-called rough Bergomi model introduced in [BFG16]. Gassiat [Gas18] recently proved that such a model (under certain correlation regimes) generates true martingales for the spot process. The lack of Markovianity imposes numerous fundamental theoretical questions and practical challenges in order to make rough volatility usable in an industrial environment. On the theoretical side, Jacquier, Pakkanen, and Stone [JPS18] prove a large deviations principle for a rescaled version of the log stock price process. In this same direction, Bayer, Friz, Gulisashvili, Horvath and Stemper [BFGHS17], Forde and Zhang [FZ17], Horvath, Jacquier and Lacombe [HJL18] and most recently Friz, Gassiat and Pigato [FGP18] (to name a few) prove large deviations principles for a wide range of rough volatility models. On the practical side, competitive simulation methods are developed in Bennedsen, Lunde and Pakkanen [BLP15], Horvath, Jacquier and Muguruza [HJM17] and McCrickerd and Pakkanen [MP18]. Moreover, recent developments by Stone [Sto19] and Horvath, Muguruza and Tomas [HMT19] allow the use of neural networks for calibration; their calibration schemes are considerably faster and more accurate than existing methods for rough volatility models.

Crucially, the lack of a pricing PDE imposes a fundamental constraint on the comprehension and interpretation of rough volatility models driven by Volterra-like Gaussian processes. The only current exception in the rough volatility literature is the rough Heston model, developed by El Euch and Rosenbaum [ER18, ER19], which allows a better understanding through the fractional PDE derived in [ER19]. Nevertheless, in this work our attention is turned to the class of models for which such a pricing PDE is unknown, and hence further theoretical results are required.

Date: November 3, 2020.

2010 Mathematics Subject Classification. Primary 60F10, 60G22; Secondary 91G20, 60G15, 91G60.

Key words and phrases. Rough volatility, VIX, large deviations, realised variance, small-time asymptotics, Gaussian measure, reproducing kernel Hilbert space.

The authors are grateful to Antoine Jacquier, Mikko Pakkanen and Ryan McCrickerd for stimulating discussions. AM and HS thank the EPSRC CDT in Financial Computing and Analytics for financial support.

Options on volatility itself are perhaps the most natural objects to analyse first within the class of rough volatility models. In this direction, Jacquier, Martini, and Muguruza [JMM18] provide algorithms for pricing VIX options and futures. Horvath, Jacquier and Tankov [HJT18] further study VIX smiles in the presence of stochastic volatility of volatility combined with rough volatility. Nevertheless, the precise effect of model parameters (with particular interest in the Hurst parameter effect) on implied volatility smiles for VIX (or volatility derivatives in general) has not been studied until very recently in Alòs, García-Lorite and Muguruza [AGM18].

The main focus of the paper is to derive the small-time behaviour of the realised variance process of the rough Bergomi model, as well as related but more complicated multi-factor rough volatility models, together with the small-time behaviour of options on realised variance. These results, which are interesting from a theoretical perspective, have practical applicability to the quantitative finance industry as they allow practitioners to better understand the Realised Variance smile, as well as the effect of correlation on the smile's linearity (or possibly convexity). To the best of our knowledge, this is the first paper to study the small-time behaviour of options on realised variance. An additional major contribution of the paper is the numerical scheme used to compute the implied volatility smiles, using an accurate approximation of the rate function from a large deviations principle. In general, rate functions are highly non-trivial to compute; our method is simple, intuitive, and accurate. The numerical methods are publicly available on GitHub: LDP-VolOptions.

Volatility options are becoming increasingly popular in the financial industry. For instance, the liquidity of VIX options has consistently increased since their creation by the Chicago Board Options Exchange (CBOE). One of the main popularity drivers is that volatility tends to be negatively correlated with the underlying dynamics, making it desirable for portfolio diversification. Due to the appealing nature of volatility options, their modelling has attracted the attention of many academics such as Carr, Geman, Madan and Yor [CGMY05] and Carr and Lee [CL09], to name a few.

For a log stock price process $X$ defined as $X_t = -\frac{1}{2}\int_0^t v_s\,\mathrm{d}s + \int_0^t \sqrt{v_s}\,\mathrm{d}B_s$, $X_0 = 0$, where $B$ is a standard Brownian motion, we denote the quadratic variation of $X$ at time $t$ by $\langle X \rangle_t$. Then, the core object to analyse in this setting is the realised variance option with payoff
$$\left(\frac{1}{T}\int_0^T \mathrm{d}\langle X \rangle_s - K\right)^+, \tag{1.1}$$
which in turn defines the risk neutral density of the realised variance. In this work, we analyse the short time behaviour of the implied volatility given by (1.1) for a number of (rough) stochastic volatility models by means of large deviation techniques. We specifically focus on the construction of correlated factors and their effect on the distribution of the realised variance. We find our results consistent with those of Alòs, García-Lorite and Muguruza [AGM18], which also help us characterise in closed form the implied volatility around the money. Moreover, we also obtain some asymptotic results for VIX options.

While implied volatilities for options on equities are typically convex functions of log-moneyness, giving them their "smile" moniker, implied volatility smiles for options on realised variance tend to be linear. Options on integrated variance are OTC products, and so their implied volatility smiles are not publicly available. VIX smiles are, however, and provide a good proxy for integrated variance smiles; see Figure 1 below for evidence of their linearity. The data also indicates both a power-law term structure ATM and its skew.

Figure 1. Implied volatility smiles for Call options on VIX for small maturities, close to the money. Data provided by OptionMetrics.

In spite of most of the literature agreeing on the fact that more than a single factor is needed to model volatility (see Bergomi's [Ber16] two-factor model, Avellaneda and Papanicolaou [AP18] or Horvath, Jacquier and Tankov [HJT18] for instance), there is no in-depth analysis on how to construct these (correlated) factors, nor the effect of correlation on the price of volatility derivatives and their corresponding implied volatility smiles. Our aim is to understand multi-factor models and analyse the effect of factors in implied volatility smiles. This paper, to the best of our knowledge, is the first to address such questions, which are of great interest to practitioners in the quantitative finance industry; it is also the first to provide a rigorous mathematical analysis of the small-time behaviour of options on integrated variance in rough volatility models.

The structure of the paper is as follows. Section 2 introduces the models, the rough Bergomi model and two closely related processes, whose small-time realised variance behaviour we study; the main results are given in Section 3. Section 4 is dedicated to numerical examples of the results attained in Section 3. In Section 5 we introduce a general variance process, which includes the rough Bergomi model for a specific choice of kernel, and briefly investigate the small-noise behaviour of VIX options in this general setting. Motivated by the numerical examples in Section 4, we propose a simple and very feasible approximation for the density of the realised variance for the mixed rough Bergomi model (see (2.5)) in Appendix A. The proofs of the main results are given in Appendix B; the details of the numerics are given in Appendix C.

Notations: Let $\mathbb{R}_+ := [0, +\infty)$ and $\mathbb{R}_+^* := (0, +\infty)$. For some index set $\mathcal{T} \subseteq \mathbb{R}_+$, the notation $L^2(\mathcal{T})$ denotes the space of real-valued square integrable functions on $\mathcal{T}$, and $C(\mathcal{T}, \mathbb{R}^d)$ the space of $\mathbb{R}^d$-valued continuous functions on $\mathcal{T}$. $\mathcal{E}$ denotes the Wick stochastic exponential.

2. A showcase of rough volatility models

In this section we introduce the models that will be considered in the forthcoming computations. We shall always work on a given filtered probability space $(\Omega, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$. For notational convenience, we introduce
$$Z_t := \int_0^t K_\alpha(s,t)\,\mathrm{d}W_s, \quad \text{for any } t \in \mathcal{T}, \tag{2.1}$$
where $\alpha \in \left(-\frac{1}{2}, 0\right)$, $W$ is a standard Brownian motion, and where the kernel $K_\alpha : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ reads
$$K_\alpha(s,t) := \eta\sqrt{2\alpha+1}\,(t-s)^\alpha, \quad \text{for all } 0 \le s < t, \tag{2.2}$$
for some strictly positive constant $\eta$. Note that, for any $t \ge 0$, the map $s \mapsto K_\alpha(s,t)$ belongs to $L^2(\mathcal{T})$, so that the stochastic integral (2.1) is well defined. We also define an analogous multi-dimensional version of (2.1) by
$$\mathbf{Z}_t := \left(\int_0^t K_\alpha(s,t)\,\mathrm{d}W^1_s, \dots, \int_0^t K_\alpha(s,t)\,\mathrm{d}W^m_s\right) := \left(Z^1_t, \dots, Z^m_t\right), \quad \text{for any } t \in \mathcal{T}, \tag{2.3}$$
where $W^1, \dots, W^m$ are independent Brownian motions.

Model 2.1 (Rough Bergomi). The rough Bergomi model, where $X$ is the log stock price process and $v$ is the instantaneous variance process, is then defined (see [BFG16]) as
$$\begin{aligned} X_t &= -\frac{1}{2}\int_0^t v_s\,\mathrm{d}s + \int_0^t \sqrt{v_s}\,\mathrm{d}B_s, \quad X_0 = 0, \\ v_t &= v_0 \exp\left(Z_t - \frac{\eta^2}{2}t^{2\alpha+1}\right), \quad v_0 > 0, \end{aligned} \tag{2.4}$$
where the Brownian motion $B$ is defined as $B := \rho W + \sqrt{1-\rho^2}\,W^\perp$ for $\rho \in [-1,1]$ and some standard Brownian motion $W^\perp$ independent of $W$.

Remark 2.2. The process $\log v$ has a modification whose sample paths are almost surely locally $\gamma$-Hölder continuous, for all $\gamma \in \left(0, \alpha + \frac{1}{2}\right)$ [JPS18, Proposition 2.2]. For rough volatility models involving a fractional Brownian motion, the sample path regularity of the log volatility process is referred to in terms of the Hurst parameter $H$; recall that the fractional Brownian motion has sample paths that are $\gamma$-Hölder continuous for any $\gamma \in (0, H)$ [BHØZ08, Theorem 1.6.1]. By identification, therefore, we have that $\alpha = H - 1/2$.

Model 2.3 (Mixed rough Bergomi). The mixed rough Bergomi model is given in terms of log stock price process $X$ and instantaneous variance process $v^{(\gamma,\nu)}$ as
$$\begin{aligned} X_t &= -\frac{1}{2}\int_0^t v^{(\gamma,\nu)}_s\,\mathrm{d}s + \int_0^t \sqrt{v^{(\gamma,\nu)}_s}\,\mathrm{d}B_s, \quad X_0 = 0, \\ v^{(\gamma,\nu)}_t &= v_0 \sum_{i=1}^n \gamma_i \exp\left(\frac{\nu_i}{\eta}Z_t - \frac{\nu_i^2}{2}t^{2\alpha+1}\right), \quad v_0 > 0, \end{aligned} \tag{2.5}$$
where $\gamma := (\gamma_1, \dots, \gamma_n) \in [0,1]^n$ such that $\sum_{i=1}^n \gamma_i = 1$ and $\nu := (\nu_1, \dots, \nu_n) \in \mathbb{R}^n$, such that $0 < \nu_1 < \dots < \nu_n$.

The above modification of the rough Bergomi model, inspired by Bergomi [Ber08], allows a bigger slope (hence bigger skew) on the implied volatility of variance/volatility options to be created, whilst maintaining a tractable instantaneous variance form. This will be made precise in Section 4.2.
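As an illustration of Models 2.1 and 2.3, the variance processes can be simulated on a time grid. The Python sketch below is not the authors' scheme (see [BLP15], [HJM17] and [MP18] for competitive simulation methods). It assumes a simple cell-averaged discretisation of the kernel (2.2), and the function names `kernel_matrix` and `simulate_variance` are hypothetical. As a further assumption, the compensator uses the variance of the discretised $Z$ rather than $\eta^2 t^{2\alpha+1}$, so that the simulated variance has mean exactly $v_0$ on the grid.

```python
import numpy as np


def kernel_matrix(n_steps, horizon, alpha, eta):
    """Cell averages of K_alpha(s, t_i) = eta*sqrt(2*alpha+1)*(t_i - s)**alpha."""
    dt = horizon / n_steps
    right = dt * np.arange(1, n_steps + 1)  # grid times t_1, ..., t_n
    left = dt * np.arange(0, n_steps)       # left end points of the cells
    # Distances from t_i to both ends of cell j, clipped to zero for later cells.
    far = np.clip(right[:, None] - left[None, :], 0.0, None)
    near = np.clip(right[:, None] - right[None, :], 0.0, None)
    # Exact integral of the kernel over each cell, then divided by the cell width.
    integral = (eta * np.sqrt(2 * alpha + 1)
                * (far ** (alpha + 1) - near ** (alpha + 1)) / (alpha + 1))
    return np.tril(integral) / dt, right


def simulate_variance(n_paths, n_steps, horizon, alpha, eta, v0,
                      gammas=(1.0,), nus=None, seed=0):
    """Simulate v (rough Bergomi) or v^(gamma, nu) (mixed rough Bergomi) on a grid."""
    rng = np.random.default_rng(seed)
    weights, times = kernel_matrix(n_steps, horizon, alpha, eta)
    dt = horizon / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    Z = dW @ weights.T                         # Z[p, i] approximates Z_{t_i}
    var_Z = dt * np.sum(weights ** 2, axis=1)  # variance of the discretised Z_{t_i}
    if nus is None:
        nus = (eta,)                           # nu = eta gives the form in (2.4)
    v = np.zeros_like(Z)
    for gamma, nu in zip(gammas, nus):
        scale = nu / eta
        # Discrete analogue of exp(nu/eta * Z_t - nu^2/2 * t^(2*alpha+1)).
        v += gamma * np.exp(scale * Z - 0.5 * scale ** 2 * var_Z)
    return times, v0 * v
```

For example, `simulate_variance(20000, 50, 1.0, -0.4, 1.0, 0.04, gammas=(0.5, 0.5), nus=(0.5, 1.5))` returns a grid of 50 times and an array of 20000 mixed variance paths whose sample mean should stay close to $v_0 = 0.04$.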

Model 2.4 (Mixed multi-factor rough Bergomi). The mixed multi-factor rough Bergomi model is given in terms of log stock price process $X$ and instantaneous variance process $v^{(\gamma,\nu,\Sigma)}$ as
$$\begin{aligned} X_t &= -\frac{1}{2}\int_0^t v^{(\gamma,\nu,\Sigma)}_s\,\mathrm{d}s + \int_0^t \sqrt{v^{(\gamma,\nu,\Sigma)}_s}\,\mathrm{d}B_s, \quad X_0 = 0, \\ v^{(\gamma,\nu,\Sigma)}_t &= v_0 \sum_{i=1}^n \gamma_i\,\mathcal{E}\left(\frac{\nu^i}{\eta} \cdot L_i \mathbf{Z}_t\right), \quad v_0 > 0, \end{aligned} \tag{2.6}$$
where $\gamma := (\gamma_1, \dots, \gamma_n) \in [0,1]^n$ such that $\sum_{i=1}^n \gamma_i = 1$. The vector $\nu^i = (\nu^i_1, \dots, \nu^i_m) \in \mathbb{R}^m$ satisfies $0 < \nu^i_1 < \dots < \nu^i_m$ for all $i \in \{1, \dots, n\}$. In addition, $L_i \in \mathbb{R}^{m \times m}$ is a lower triangular matrix such that $L_i L_i^T =: \Sigma_i$ is a positive definite matrix for all $i \in \{1, \dots, n\}$, denoting the covariance matrix.

For all results involving models (2.4), (2.5), and (2.6) we fix $\mathcal{T} = [0,1]$. We additionally define $\beta := 2\alpha + 1 \in (0,1)$ for notational convenience.

Remark 2.5. In models (2.4)-(2.6) we have considered a flat or constant initial forward variance curve $v_0 > 0$. However, our framework can be easily extended to functional forms $v_0(\cdot) : \mathcal{T} \mapsto \mathbb{R}_+$ via the Contraction Principle, see Appendix D, as long as the mapping is continuous.

Remark 2.6. The reader may already have realised that the mixed multi-factor rough Bergomi model defined in (2.6) is indeed general enough to cover both (2.4) and (2.5). However, we provide our theoretical results in an orderly fashion starting from (2.4) and finishing with (2.6), which we find the most natural way to increase the complexity of the model.

In place of $K_\alpha$ in (2.1), one may also consider more general kernels of the form $G(t,s) := K_\alpha(t,s)L(t-s)$, where $L \in C(0,\infty)$ is a measurable function such that the stochastic integral is well defined, $L(\cdot)$ is decreasing and $L(\cdot)$ is continuous at 0. We note that under such conditions, $L(\cdot)$ is also slowly varying at zero, i.e. $\lim_{x \downarrow 0} \frac{L(tx)}{L(x)} = 1$ for any $t > 0$. Such kernels are naturally related to the class of Truncated Brownian Semistationary (TBSS) processes introduced by Barndorff-Nielsen and Schmiegel [BS07]. Examples include the Gamma and Power-law kernels:
$$L_{\mathrm{Gamma}}(t-s) = \exp(-\kappa(t-s)), \quad \kappa > 0, \qquad L_{\mathrm{Power}}(t-s) = (1+t-s)^{\zeta-\alpha}, \quad \zeta < -\frac{1}{2}.$$
The following result gives the exponential equivalence between the sequences of the rescaled stochastic integrals of $K_\alpha$ and $G$; it is therefore justified to consider only the case $K_\alpha$, without any loss of generality.

Proposition 2.7.

The sequences of processes $\left(\varepsilon^{\beta/2}\int_0^\cdot K_\alpha(\cdot,s)L(\varepsilon(\cdot-s))\,\mathrm{d}W_s\right)_{\varepsilon>0}$ and $\left(\varepsilon^{\beta/2}Z_\cdot\right)_{\varepsilon>0}$ are exponentially equivalent.

Proof. As $L$ is a slowly varying function, the so-called Potter bounds [BGT89, Theorem 1.5.6, page 25] hold on the interval $(0,1]$. Indeed, for all $\xi > 0$, there exist constants $0 < \underline{C}_\xi \le \overline{C}_\xi$ such that
$$\underline{C}_\xi(\varepsilon x)^\xi < L(\varepsilon x) < \overline{C}_\xi(\varepsilon x)^\xi, \quad \text{for all } (\varepsilon x) \in (0,1].$$
In particular, for $\varepsilon > 0$ and $(\varepsilon x) \in [0,1]$, $L(\varepsilon x) - 1 \le K_\xi(\varepsilon x)^\xi$, where $K_\xi < \infty$ as $\xi > 0$. Thus, for all $\delta > 0$,
$$\begin{aligned} \mathbb{P}\left(\left\|\varepsilon^{\beta/2}\int_0^\cdot K_\alpha(\cdot,s)\left[L(\varepsilon(\cdot-s)) - 1\right]\mathrm{d}W_s\right\|_\infty > \delta\right) &= \mathbb{P}\left(|\mathcal{N}(0,1)| > \frac{\delta}{\varepsilon^{\beta/2}\|V^\varepsilon\|_\infty^{1/2}}\right) \\ &= \sqrt{\frac{2}{\pi}}\,\frac{\varepsilon^{\beta/2}\|V^\varepsilon\|_\infty^{1/2}}{\delta}\exp\left(-\frac{\delta^2}{2\varepsilon^\beta\|V^\varepsilon\|_\infty}\right)\left(1 + O(\varepsilon^\beta)\right), \end{aligned}$$
where the final equality follows by using the asymptotic expansion of the Gaussian density near infinity [AS72, Formula (26.2.12)], and $V^\varepsilon(t) := \int_0^t K_\alpha(t,s)^2\left[L(\varepsilon(t-s)) - 1\right]^2\mathrm{d}s$. Then,
$$V^\varepsilon(t) \le K_\xi^2\,\varepsilon^{2\xi}\,\eta^2(2\alpha+1)\int_0^t (t-s)^{2(\alpha+\xi)}\,\mathrm{d}s = (K_\xi\eta)^2\,\frac{2\alpha+1}{2(\alpha+\xi)+1}\,\varepsilon^{2\xi}\,t^{2(\alpha+\xi)+1},$$
so that $\lim_{\varepsilon\downarrow 0}\|V^\varepsilon\|_\infty^{1/2} = 0$, as well as $\lim_{\varepsilon\downarrow 0}\|V^\varepsilon\|_\infty = 0$. Therefore
$$\varepsilon^\beta\log\mathbb{P}\left(\left\|\varepsilon^{\beta/2}\int_0^\cdot K_\alpha(\cdot,s)\left[L(\varepsilon(\cdot-s)) - 1\right]\mathrm{d}W_s\right\|_\infty > \delta\right) \le \varepsilon^\beta\log\sqrt{\frac{2}{\pi}} + \varepsilon^\beta\log\left(\frac{\varepsilon^{\beta/2}\|V^\varepsilon\|_\infty^{1/2}}{\delta}\right) - \frac{\delta^2}{2\|V^\varepsilon\|_\infty} + \varepsilon^\beta\log\left(1 + O(\varepsilon^\beta)\right).$$
As $\varepsilon$ tends to zero, the first and second terms in the above inequality tend to zero (recall that $0 < \beta < 1$), while the third term tends to $-\infty$. Hence for all $\delta > 0$,
$$\lim_{\varepsilon\downarrow 0}\varepsilon^\beta\log\mathbb{P}\left(\left\|\varepsilon^{\beta/2}\int_0^\cdot K_\alpha(\cdot,s)\left[L(\varepsilon(\cdot-s)) - 1\right]\mathrm{d}W_s\right\|_\infty > \delta\right) = -\infty,$$
and thus the two processes are exponentially equivalent [DZ10, Definition 4.2.10]; see Appendix D. $\square$

Corollary 2.8.

The sequences of processes $\left(\varepsilon^{\beta/2}\int_0^t e^{-\kappa_i\varepsilon(t-s)}K_\alpha(s,t)\,\mathrm{d}W^i_s\right)_{\varepsilon>0}$ and $\left(\varepsilon^{\beta/2}\int_0^t K_\alpha(s,t)\,\mathrm{d}W^i_s\right)_{\varepsilon>0}$ are exponentially equivalent for $i = 1, \dots, m$, where each $\kappa_i > 0$.

Proof. The proof is similar to the proof of Proposition 2.7. The variance in the asymptotic expansion of the Gaussian density near infinity [AS72, Formula (26.2.12)] is defined as
$$V^\varepsilon(t) := \int_0^t\left[e^{-\kappa_i\varepsilon(t-s)} - 1\right]^2 K_\alpha(s,t)^2\,\mathrm{d}s = \eta^2\beta\left[\frac{t^\beta}{\beta} - 2\int_0^t e^{-\kappa_i\varepsilon(t-s)}(t-s)^{2\alpha}\,\mathrm{d}s + \int_0^t e^{-2\kappa_i\varepsilon(t-s)}(t-s)^{2\alpha}\,\mathrm{d}s\right],$$
such that
$$0 < \varepsilon^\beta V^\varepsilon \le \eta^2\beta\varepsilon^\beta\left[\frac{t^\beta}{\beta} + \int_0^t e^{-2\kappa_i\varepsilon(t-s)}(t-s)^{2\alpha}\,\mathrm{d}s\right] \le 2\eta^2(\varepsilon t)^\beta,$$
and therefore $\lim_{\varepsilon\downarrow 0}V^\varepsilon = 0$ and $\lim_{\varepsilon\downarrow 0}\varepsilon^\beta\|V^\varepsilon\|_\infty = 0$. Then, for all $\delta > 0$,
$$\lim_{\varepsilon\downarrow 0}\varepsilon^\beta\log\mathbb{P}\left(\left\|\varepsilon^{\beta/2}\int_0^\cdot K_\alpha(s,\cdot)\left[\exp(-\kappa_i\varepsilon(\cdot-s)) - 1\right]\mathrm{d}W^i_s\right\|_\infty > \delta\right) = -\infty,$$
and thus the two processes are exponentially equivalent [DZ10, Definition 4.2.10]. $\square$

3. Small-time results for options on integrated variance

We start our theoretical analysis by considering options on realised variance, which we also refer to as integrated variance and RV interchangeably. We recall that volatility is neither directly observable nor a tradeable asset. Options on realised variance, however, exist and are traded as OTC products. Below are two examples of the payoff structure of such products:
$$(i)\ \left(RV(v)(T) - K\right)^+, \qquad (ii)\ \left(\sqrt{RV(v)(T)} - K\right)^+, \tag{3.1}$$
where $T, K \ge 0$, and where we define the following $C(\mathcal{T})$ operator
$$RV(f)(\cdot) : f \mapsto \frac{1}{\cdot}\int_0^\cdot f(s)\,\mathrm{d}s, \qquad RV(f)(0) := f(0), \tag{3.2}$$
and $v$ represents the instantaneous variance in a given stochastic volatility model. Note that $RV(v)(0) = v_0$.

Remark 3.1.

As shown by Neuberger [Neu94], we may rewrite the variance swap in terms of the log contract as
$$\mathbb{E}\left[RV(v)(T)\right] = \mathbb{E}\left[\frac{1}{T}\int_0^T v_s\,\mathrm{d}s\right] = \mathbb{E}\left[-\frac{2X_T}{T}\right], \tag{3.3}$$
where $\mathbb{E}[\cdot]$ is taken under the risk-neutral measure and $S = \exp(X)$ is a risk-neutral martingale (assuming interest rates and dividends to be null). Therefore, the risk neutral pricing of $RV(v)(T)$ or options on it is fully justified by (3.3).

3.1. Small-time results for the rough Bergomi model.

Proposition 3.2.

The set $\mathcal{H}^{K_\alpha} := \left\{\int_0^\cdot K_\alpha(s,\cdot)f(s)\,\mathrm{d}s : f \in L^2(\mathcal{T})\right\}$ defines the reproducing kernel Hilbert space for $Z$ with inner product $\left\langle\int_0^\cdot K_\alpha(s,\cdot)f_1(s)\,\mathrm{d}s, \int_0^\cdot K_\alpha(s,\cdot)f_2(s)\,\mathrm{d}s\right\rangle_{\mathcal{H}^{K_\alpha}} = \langle f_1, f_2\rangle_{L^2(\mathcal{T})}$.

Proof. See [JPS18, Theorem 3.1]. $\square$

Before stating Theorem 3.3, we define the function $\Lambda^Z : C(\mathcal{T}) \to \mathbb{R}_+$ as $\Lambda^Z(\mathrm{x}) := \frac{1}{2}\|\mathrm{x}\|^2_{\mathcal{H}^{K_\alpha}}$, and if $\mathrm{x} \notin \mathcal{H}^{K_\alpha}$ then $\Lambda^Z(\mathrm{x}) = +\infty$.

Theorem 3.3. The variance process $(v_\varepsilon)_{\varepsilon>0}$ satisfies a large deviations principle on $C(\mathcal{T})$ as $\varepsilon$ tends to zero, with speed $\varepsilon^{-\beta}$ and rate function $\Lambda^v(\mathrm{x}) := \Lambda^Z\left(\log\left(\frac{\mathrm{x}}{v_0}\right)\right)$, where $\Lambda^v(v_0) = 0$ and $\mathrm{x} \in C(\mathcal{T})$.

Proof. To ease the flow of the paper the proof is postponed to Appendix B.1. $\square$

Corollary 3.4. The integrated variance process $(RV(v)(t))_{t\in\mathcal{T}}$ satisfies a large deviations principle on $\mathbb{R}_+^*$ as $t$ tends to zero, with speed $t^{-\beta}$ and rate function $\hat\Lambda^v$ defined as $\hat\Lambda^v(\mathrm{y}) := \inf\{\Lambda^v(\mathrm{x}) : \mathrm{y} = RV(\mathrm{x})(1)\}$, where $\hat\Lambda^v(v_0) = 0$.

Proof. As proved in Theorem 3.3, the process $(v_\varepsilon)_{\varepsilon>0}$ satisfies a pathwise large deviations principle on $C(\mathcal{T})$ as $\varepsilon$ tends to zero. For small perturbations $\delta^v \in C(\mathcal{T})$, we have
$$\left\|RV(v + \delta^v)(t) - RV(v)(t)\right\|_\infty \le \sup_{t\in\mathcal{T}}\frac{1}{t}\left|\int_0^t \delta^v(s)\,\mathrm{d}s\right| \le M,$$
where $M = \sup_{t\in\mathcal{T}}|\delta^v(t)|$, which is finite as $\delta^v \in C(\mathcal{T})$. Clearly $M$ tends to zero as $\delta^v$ tends to zero, and hence the operator $RV$ is continuous with respect to the sup norm on $C(\mathcal{T})$. Therefore we can apply the Contraction Principle (see Appendix D), and consequently the integrated variance process $RV(v_\varepsilon)$ satisfies a large deviations principle on $C(\mathcal{T})$ as $\varepsilon$ tends to zero. Clearly $RV(v_\varepsilon)(t) = RV(v)(\varepsilon t)$, for all $t \in \mathcal{T}$, and so setting $t = 1$ and mapping $\varepsilon$ to $t$ then yields the result. By definition, $\hat\Lambda^v(\mathrm{y}) := \inf\{\Lambda^v(\mathrm{x}) : \mathrm{y} = RV(\mathrm{x})(1)\}$. If we choose $\mathrm{x} \equiv v_0$ then clearly $v_0 = RV(\mathrm{x})(1)$, and $\Lambda^v(\mathrm{x}) = 0$. Since $\Lambda^v$ is defined in terms of a norm, it is a non-negative function and therefore $\hat\Lambda^v(v_0) = 0$. This concludes the proof. $\square$
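Corollary 3.4 expresses $\hat\Lambda^v$ as a constrained minimisation over $f \in L^2(\mathcal{T})$, which lends itself to direct numerical approximation. The Python sketch below is one possible approach and is not necessarily the method of Appendix C. It assumes the time interval $[0,1]$, piecewise-constant $f$ on a uniform grid, a trapezoidal rule for $RV$, and a sequential linearisation of the constraint, which should only be expected to converge for moderate values of $|\log(\mathrm{y}/v_0)|$. The name `rv_rate_function` is hypothetical. As a sanity check, linearising the constraint around $f \equiv 0$ suggests that $k^2 / (2\hat\Lambda^v(v_0 e^k))$ approaches $\eta^2(2\alpha+1)/((\alpha+1)^2(2\alpha+3))$ as $k \to 0$; this limit is our own first-order calculation, not a statement taken from the text.

```python
import numpy as np


def rv_rate_function(y, v0, alpha, eta, n_steps=200, n_iter=200, tol=1e-13):
    """Approximate inf{ 0.5*||f||^2 : RV(v0*exp(int K_alpha f))(1) = y } on [0, 1]."""
    dt = 1.0 / n_steps
    right = dt * np.arange(1, n_steps + 1)
    left = dt * np.arange(0, n_steps)
    far = np.clip(right[:, None] - left[None, :], 0.0, None)
    near = np.clip(right[:, None] - right[None, :], 0.0, None)
    # A[i, j] is the integral of K_alpha(s, t_i) over cell j, so h = A @ f is
    # exact for piecewise-constant f.
    A = np.tril(eta * np.sqrt(2 * alpha + 1)
                * (far ** (alpha + 1) - near ** (alpha + 1)) / (alpha + 1))
    w = np.full(n_steps, dt)
    w[-1] = 0.5 * dt                  # trapezoid weights at t_1, ..., t_n
    target = np.log(y)
    f = np.zeros(n_steps)
    for _ in range(n_iter):
        x = v0 * np.exp(A @ f)        # candidate path x(t_i) = v0 * exp(h(t_i))
        rv = 0.5 * dt * v0 + w @ x    # trapezoidal RV(x)(1), including x(0) = v0
        grad = A.T @ (w * x) / rv     # gradient of log RV with respect to f
        # Smallest-norm f satisfying the constraint linearised at the current f.
        lam = (target - np.log(rv) + grad @ f) / (grad @ grad)
        f_new = lam * grad
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return 0.5 * dt * np.sum(f ** 2)
```

With $v_0 = 0.04$, $\alpha = -0.4$ and $\eta = 1.5$, calling `rv_rate_function(v0 * np.exp(0.01), v0, alpha, eta)` gives a small positive value, while `rv_rate_function(v0, v0, alpha, eta)` is numerically zero, in line with $\hat\Lambda^v(v_0) = 0$.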

Remark 3.5. Corollary 3.4 can be applied to a large number of existing results on large deviations for (rough) variance processes to get a large deviations result for the integrated (rough) variance process; for example Forde and Zhang [FZ17] and Horvath, Jacquier, and Lacombe [HJL18].

Corollary 3.6. The rate function $\hat\Lambda^v$ is continuous.

Proof. Indeed, as a rate function, $\hat\Lambda^v$ is lower semi-continuous. Moreover, as $\Lambda^v$ is continuous, one can use similar arguments to [FZ17, Corollary 4.6], and deduce that $\hat\Lambda^v$ is upper semi-continuous, and hence is continuous. $\square$

Before stating results on the small-time behaviour of options on integrated variance, we note that the log integrated variance process $\log RV(v)$ satisfies a large deviations principle on $\mathbb{R}$ as $t$ tends to zero, with speed $t^{-\beta}$ and rate function $\hat\Lambda^v(e^\cdot)$. Then, the small-time behaviour of such options can be obtained as an application of Corollary 3.4.

Corollary 3.7. For log moneyness $k := \log\frac{K}{RV(v)(0)} \ne 0$, the following equality holds true for Call options on integrated variance:
$$\lim_{t\downarrow 0} t^\beta\log\mathbb{E}\left[\left(RV(v)(t) - e^k\right)^+\right] = -\mathrm{I}(k), \tag{3.4}$$
where $\mathrm{I}$ is defined as $\mathrm{I}(x) := \inf_{y>x}\hat\Lambda^v(e^y)$ for $x > 0$ and $\mathrm{I}(x) := \inf_{y<x}\hat\Lambda^v(e^y)$ for $x < 0$ […] and $\bar{\mathrm{I}}(x) := \inf_{y}$ […]

As with Call options on stock price processes, we can define and study the implied volatility of such options. In the case of (3.1)(i) we define the implied volatility $\hat\sigma(T,k)$ to be the solution to
$$\mathbb{E}\left[\left(RV(v)(T) - e^k\right)^+\right] = C_{BS}\left(RV(v)(0), k, T, \hat\sigma(T,k)\right), \tag{3.6}$$
where $C_{BS}$ denotes the Call price in the Black-Scholes model. Using Corollary 3.7, we deduce the small-time behaviour of the implied volatility $\hat\sigma$, as defined in (3.6).

Corollary 3.8. The small-time asymptotic behaviour of the implied volatility is given by the following limit, for a log moneyness $k \ne 0$:
$$\lim_{t\downarrow 0} t^{1-\beta}\hat\sigma^2(t,k) =: \hat\sigma^2(k) = \frac{k^2}{2\mathrm{I}(k)}.$$

Proof. The log integrated variance process $\log RV(v)$ satisfies a large deviations principle with speed $t^{-\beta}$ and rate function $\hat\Lambda^v(e^\cdot)$, which is continuous. Therefore, it follows that
$$\lim_{t\downarrow 0} t^\beta\log\mathbb{P}\left(RV(v)(t) \ge e^k\right) = -\mathrm{I}(k).$$
In the Black-Scholes model, i.e. a geometric Brownian motion with $S_0 = RV(v)(0)$ and constant volatility $\xi$, we have the following small-time behaviour:
$$\lim_{t\downarrow 0}\xi^2 t\log\mathbb{P}\left(RV(v)(t) \ge e^k\right) = -\frac{k^2}{2}.$$
We then apply Corollary 3.7 and [GL14, Corollary 7.1], identifying $\xi \equiv \hat\sigma(k,t)$, to conclude. $\square$

Remark 3.9. Notice that the level of implied volatility in Corollary 3.8 has a power law behaviour as a function of time to maturity. This power law is of order $\sqrt{t^{\beta-1}}$, which is consistent with the at-the-money RV implied volatility results by Alòs, García-Lorite and Muguruza [AGM18], where the order is found to be $t^{H-1/2}$ using Malliavin Calculus techniques. Recall that $\beta = 2\alpha + 1$, and $\alpha = H - 1/2$.

3.2. Small-time results for the mixed rough Bergomi model.

Minor adjustments to Theorem 3.3 give the following small-time result for the mixed variance process $v^{(\gamma,\nu)}$ introduced in Model (2.5); the proof is given in Appendix B.3.

Theorem 3.10. The mixed variance process $(v^{(\gamma,\nu)}_\varepsilon)_{\varepsilon>0}$ satisfies a large deviations principle on $C(\mathcal{T})$ with speed $\varepsilon^{-\beta}$ and rate function
$$\Lambda^{(\gamma,\nu)}(\mathrm{x}) := \inf\left\{\Lambda^Z\left(\frac{\eta}{\nu_1}\mathrm{y}\right) : \mathrm{x}(\cdot) = v_0\sum_{i=1}^n\gamma_i e^{\frac{\nu_i}{\nu_1}\mathrm{y}(\cdot)}\right\},$$
satisfying $\Lambda^{(\gamma,\nu)}(v_0) = 0$.

By Remark 3.5, we immediately get the following result for the small-time behaviour of the integrated mixed variance process $RV(v^{(\gamma,\nu)})$.

Corollary 3.11. The integrated mixed variance process $(RV(v^{(\gamma,\nu)})(t))_{t\in\mathcal{T}}$ satisfies a large deviations principle on $\mathbb{R}_+^*$ as $t$ tends to zero, with speed $t^{-\beta}$ and rate function $\tilde\Lambda^{(\gamma,\nu)}(\mathrm{y}) := \inf\left\{\Lambda^{(\gamma,\nu)}(\mathrm{x}) : \mathrm{y} = RV(\mathrm{x})(1)\right\}$, where $\tilde\Lambda^{(\gamma,\nu)}(v_0) = 0$.

To get the small-time implied volatility result, analogous to Corollary 3.8, we need the following Lemma, which is used in place of (B.4). The remainder of the proof then follows identically.

Lemma 3.12.

For all t ∈ T and q > we have E (cid:104)(cid:16) RV (cid:16) v ( γ,ν ) (cid:17) ( t ) (cid:17) q (cid:105) ≤ v q n q t q − exp (cid:18) q ( ν ∗ ) η (cid:0) q − q (cid:1) t α +1 (cid:19) , where ν ∗ = max { ν , ..., ν n } .Proof. First we note that by H¨older’s inequality ( (cid:80) ni =1 x i ) q ≤ n q − (cid:80) ni =1 ( x i ) q , for x i >

0. Since, γ i ≤ i = 1 , ..., n , we obtain (cid:16) RV (cid:16) v ( γ,ν ) (cid:17) ( t ) (cid:17) q ≤ v q t q n q − n (cid:88) i =1 (cid:90) t E (cid:20) exp (cid:18) qν i η Z s − qν i η s α +1 (cid:19)(cid:21) d s ≤ v q t q − n q − n (cid:88) i =1 exp (cid:18) ν i η (cid:0) q − q (cid:1) t α +1 (cid:19) . Choosing ν ∗ = max { ν , ..., ν n } the result directly follows. (cid:3) Corollary 3.13.

For log moneyness $k := \log\frac{K}{RV(v^{(\gamma,\nu)})(0)} \ne 0$, the following equality holds true for Call options on integrated variance in the mixed rough Bergomi model:
$$\lim_{t\downarrow 0} t^\beta\log\mathbb{E}\left[\left(RV(v^{(\gamma,\nu)})(t) - e^k\right)^+\right] = -\mathrm{I}(k), \tag{3.7}$$
where $\mathrm{I}$ is defined as $\mathrm{I}(x) := \inf_{y>x}\tilde\Lambda^{(\gamma,\nu)}(e^y)$ for $x > 0$ and $\mathrm{I}(x) := \inf_{y<x}\tilde\Lambda^{(\gamma,\nu)}(e^y)$ for $x < 0$ […] and $\bar{\mathrm{I}}(x) := \inf_{y}$ […]

The small-time implied volatility behaviour for the mixed rough Bergomi model is then given by Corollary 3.8, where the function $\mathrm{I}$ is in this case defined in terms of the rate function $\tilde\Lambda^{(\gamma,\nu)}$, as in Corollary 3.13.

3.3. Small-time results for the multi-factor rough Bergomi model.

The small-time behaviour of the multi-factor rough Bergomi model (2.6) can then be obtained; see Theorem 3.14 below. Note that $\Lambda^m$ is the rate function associated to the reproducing kernel Hilbert space of the measure induced by $(W^1, \dots, W^m)$ on $C(\mathcal{T}, \mathbb{R}^m)$. The proof is given in Appendix B.4.

Theorem 3.14. The variance process in the multi-factor rough Bergomi model $\left(v^{(\gamma,\nu,\Sigma)}_t\right)_{t\in\mathcal{T}}$ satisfies a large deviations principle on $\mathbb{R}_+^*$ with speed $t^{-\beta}$ and rate function
$$\Lambda^{(\gamma,\nu,\Sigma)}(\mathrm{y}) = \inf\left\{\Lambda^m(\mathrm{x}) : \mathrm{x} \in \mathcal{H}^m,\ \mathrm{y} = v_0\sum_{i=1}^n\gamma_i\exp\left(\frac{\nu^i}{\eta}\cdot L_i\mathrm{x}(1)\right)\right\},$$
satisfying $\Lambda^{(\gamma,\nu,\Sigma)}(v_0) = 0$.

As with the mixed variance process, Remark 3.5 directly gives us the following small-time result for $RV(v^{(\gamma,\nu,\Sigma)})$.

Corollary 3.15. The integrated variance process $(RV(v^{(\gamma,\nu,\Sigma)})(t))_{t\in\mathcal{T}}$ in the multi-factor rough Bergomi model satisfies a large deviations principle on $\mathbb{R}_+^*$ as $t$ tends to zero, with speed $t^{-\beta}$ and rate function $\tilde\Lambda^{(\gamma,\nu,\Sigma)}(\mathrm{y}) := \inf\left\{\Lambda^{(\gamma,\nu,\Sigma)}(\mathrm{x}) : \mathrm{y} = RV(\mathrm{x})(1)\right\}$, where $\tilde\Lambda^{(\gamma,\nu,\Sigma)}(v_0) = 0$.

We now establish the small-time behaviour for Call options on realised variance in Corollary 3.17, by adapting the proof of Corollary 3.7 as in the previous subsection. To do so we use Lemma 3.16 in place of (B.4). The small-time implied volatility behaviour for the multi-factor rough Bergomi model then follows from Corollary 3.8, where the function $\mathrm{I}$ is given by Corollary 3.17.
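The passage from a rate function to the limiting smile, $\hat\sigma^2(k) = k^2/(2\mathrm{I}(k))$ as in Corollaries 3.7 and 3.8, can be sketched numerically. The Python snippet below is illustrative only: `limiting_implied_vol` is a hypothetical helper that takes any callable `rate_of_exp` standing for $y \mapsto \tilde\Lambda(e^y)$, approximates the infimum defining $\mathrm{I}(k)$ on a finite grid (the grid width and size are assumptions, and continuity of the rate function is assumed so that the end point $y = k$ may be included), and returns $|k|/\sqrt{2\mathrm{I}(k)}$. The quadratic `toy_rate` is a toy choice, not one of the models above, for which the limiting smile is flat at level `s`.

```python
import numpy as np


def limiting_implied_vol(rate_of_exp, k, width=5.0, n_grid=20001):
    """Return |k| / sqrt(2 I(k)), with I(k) the infimum of rate_of_exp beyond k."""
    if k == 0:
        raise ValueError("the limit is stated for non-zero log-moneyness only")
    # I(k) is an infimum over y > k when k > 0 and over y < k when k < 0;
    # both are approximated on a finite grid starting or ending at k.
    if k > 0:
        ys = np.linspace(k, k + width, n_grid)
    else:
        ys = np.linspace(k - width, k, n_grid)
    values = np.array([rate_of_exp(y) for y in ys])
    return abs(k) / np.sqrt(2.0 * values.min())


def toy_rate(y, s=0.8):
    """Toy rate function y -> y^2 / (2 s^2), giving a flat limiting smile at s."""
    return y ** 2 / (2.0 * s ** 2)
```

For instance, `limiting_implied_vol(toy_rate, 0.3)` and `limiting_implied_vol(toy_rate, -0.3)` both recover the level `s` of the toy rate function; with a model rate function in place of `toy_rate`, the same helper traces out the limiting smile strike by strike.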


Lemma 3.16.

For all t ∈ T and q > we have E (cid:104)(cid:16) RV (cid:16) v ( γ,ν, Σ) (cid:17) ( t ) (cid:17) q (cid:105) ≤ v q n q t q − exp (cid:18) ( ν ∗ ) η (cid:0) q − q (cid:1) t α +1 (cid:19) , where ν ∗ = max { ν , ..., ν n } Proof.

First we note that by H¨older’s inequality ( (cid:80) ni =1 x i ) q ≤ n q − (cid:80) ni =1 ( x i ) q , for x i >

0. Since, γ i ≤ i = 1 , ..., n , we obtain (cid:16) RV (cid:16) v ( γ,ν, Σ) (cid:17) ( t ) (cid:17) q ≤ v q t q n q − n (cid:88) i =1 (cid:90) t E (cid:20) E (cid:18) ν i η · L i Z s (cid:19) q (cid:21) d s ≤ v q t q − n q − n (cid:88) i =1 exp (cid:18) ν i η (cid:0) q − q (cid:1) t α +1 (cid:19) . Choosing ν ∗ = max { ν , ..., ν n } the result directly follows. (cid:3) Corollary 3.17.

For log moneyness $k := \log\frac{K}{RV(v^{(\gamma,\nu,\Sigma)})(0)} \ne 0$, the following equality holds true for Call options on integrated variance in the multi-factor rough Bergomi model:
$$\lim_{t\downarrow 0} t^\beta\log\mathbb{E}\left[\left(RV(v^{(\gamma,\nu,\Sigma)})(t) - e^k\right)^+\right] = -\mathrm{I}(k), \tag{3.9}$$
where $\mathrm{I}$ is defined as $\mathrm{I}(x) := \inf_{y>x}\tilde\Lambda^{(\gamma,\nu,\Sigma)}(e^y)$ for $x > 0$ and $\mathrm{I}(x) := \inf_{y<x}\tilde\Lambda^{(\gamma,\nu,\Sigma)}(e^y)$ for $x < 0$ […] and $\bar{\mathrm{I}}(x) := \inf_{y}$ […]

In this section we present numerical results for each of the three models given in Section 2. We also analyse the effect of each parameter on the implied volatility smile. Numerical experiments and codes are provided on GitHub: LDP-VolOptions.

4.1. RV smiles for rough Bergomi.

We begin with numerical results for the rough Bergomi model (2.4) using Corollary 3.8. For the detailed numerical method we refer the reader to Appendix C. In Figure 3, we represent the rate function given in Corollary 3.4, which is the fundamental object to compute numerically. In particular, we notice that $\hat\Lambda^v$ is convex; a rigorous mathematical proof of this statement is left for future research.

More interestingly, in Figure 2 we provide a comparison of Corollary 3.8 with respect to a benchmark generated by Monte Carlo simulations, and see that all smiles follow a linear trend. In particular, we notice that Corollary 3.8 provides a surprisingly accurate estimate, even for relatively large maturities. As a further numerical check, in Figure 4 we compare our results with the closed-form at-the-money asymptotics given by Alòs, García-Lorite and Muguruza [AGM18] and once again find the correct convergence, suggesting a consistent numerical framework.

Figure 2. Comparison of Monte Carlo computed implied volatilities (straight lines) and LDP based implied volatilities (stars), in the rough Bergomi model, for different values of $\alpha$ and maturities $T$. We set $\eta = 2$ and $v_0 = 0.04$; for Monte Carlo we use 200,[…] and $t = $ […].

4.2. RV smiles for mixed rough Bergomi.

We now consider the mixed rough Bergomi model (2.5) in a simplified form given by $v_t = v_0\left(\gamma_1\mathcal{E}(\nu_1 Z_t) + \gamma_2\mathcal{E}(\nu_2 Z_t)\right)$. In Figure 5, we observe that a constraint of the type $\gamma_1\nu_1 + \gamma_2\nu_2 = 2$ in the mixed variance process (2.5) allows us to fix the at-the-money implied volatility at a given level, whilst generating different slopes for different values of $(\nu_1, \nu_2, \gamma_1, \gamma_2)$; as in […]

Figure 3. Rate function $\hat\Lambda^v$ for different values of $\alpha$.

Figure 4. Comparison of Alòs, García-Lorite and Muguruza [AGM18] at-the-money implied volatility asymptotics and LDP based implied volatilities for different values of $\alpha$, in the rough Bergomi model, with $\eta = 1.$[…], $v_0 = 0.$[…]

Figure 5. Comparison of LDP based implied volatilities for different values of $(\nu_1, \nu_2, \gamma_1, \gamma_2)$ in the mixed rough Bergomi process (2.5) such that $\gamma_1\nu_1 + \gamma_2\nu_2 = 2$, with $\alpha = -$[…].

Remark 4.1.

At this point it is important to note that the mixed rough Bergomi model (2.5) allows both the at-the-money implied volatility and its skew to be controlled through $(\gamma, \nu)$, whilst in the rough Bergomi model (2.4) there is not enough freedom to arbitrarily fit both quantities.

4.3. Linearity of smiles and approximation of the RV density.

Remarkably, we observe a linear pattern in Figures 2-5 for around-the-money strikes in the (mixed) rough Bergomi model. By assuming this linear implied volatility smile in log-moneyness space, in Appendix A we are able to provide an approximation scheme for the realised variance density $\psi_{RV}(x, T)$, $x, T \geq 0$, which gives

$$P_{VolSwap}(T) = \int_0^\infty \sqrt{x}\,\psi_{RV}(x, T)\,\mathrm{d}x,$$

where $P_{VolSwap}(T)$ is the price of a Volatility Swap with maturity $T$. Figure 9 presents the results of this approximation technique, which turns out to be surprisingly accurate. We believe that this approximation could be useful to practitioners; more details are provided in Appendix A.

4.4. RV smiles for mixed multi-factor rough Bergomi.

We conclude our analysis by introducing the correlation effect in the implied volatility smiles, by considering the mixed multi-factor rough Bergomi model (2.6). We shall consider the following two simplified models for instantaneous variance:

(4.1) $$v_t = \mathcal{E}\left(\nu\int_0^t (t-s)^\alpha\,\mathrm{d}W_s + \eta\left(\rho\int_0^t (t-s)^\alpha\,\mathrm{d}W_s + \sqrt{1-\rho^2}\int_0^t (t-s)^\alpha\,\mathrm{d}W_s^\perp\right)\right),$$

(4.2) $$v_t = \frac{1}{2}\left(\mathcal{E}\left(\nu\int_0^t (t-s)^\alpha\,\mathrm{d}W_s\right) + \mathcal{E}\left(\eta\left(\rho\int_0^t (t-s)^\alpha\,\mathrm{d}W_s + \sqrt{1-\rho^2}\int_0^t (t-s)^\alpha\,\mathrm{d}W_s^\perp\right)\right)\right),$$

where $W$ and $W^\perp$ are independent standard Brownian motions and $\nu, \eta > 0$.

Figure 6. Rate function and corresponding implied volatilities for the model (4.1), with ( α, ν, η ) = ( − . , . , .

Remark 4.2. Note here that models such as the 2 Factor Bergomi (and its mixed version) [Ber08, Ber16] have the same small-time limiting behaviour as (4.1) due to Proposition 2.7, where we set L = L Gamma and $\alpha = 0$, meaning that their smiles follow a linear trend. Despite the empirical evidence given in Figure 1, the commitment to linear smiles from a modelling perspective is strong. The simplified mixed multi-factor rough Bergomi model (4.2), however, is sufficiently flexible to generate both linear ($\rho \geq 0$) and non-linear ($\rho < 0$) smiles.

Figure 7. Rate function and corresponding implied volatilities for the model (4.2) with ( α, ν, η ) = ( − . , . , .

Figure 8. Implied volatilities and superimposed linear smiles for the model (4.2), with ( α, ν, η ) = ( − . , . , .

Options on VIX

Although options on realised variance are the most natural core modelling object for stochastic volatility models, in practice the most popular variance derivative is the VIX. In this section we therefore turn our attention to the VIX and VIX options and study their asymptotic behaviour. For this section, we fix $\mathcal{T} := [0, T]$. Let us now consider the following general model $(v_t)_{t \geq 0}$ for instantaneous variance:

(5.1) $$v_t = \xi(t)\,\mathcal{E}\left(\int_0^t g(t, s)\,\mathrm{d}W_s\right).$$

Then, the VIX process is given by

$$\mathrm{VIX}_T = \sqrt{\frac{1}{\Delta}\int_T^{T+\Delta} \mathbb{E}[v_t \mid \mathcal{F}_T]\,\mathrm{d}t}.$$

We introduce the following stochastic process $(V^{g,T}_t)_{t \in [T, T+\Delta]}$, for notational convenience, as

(5.2) $$V^{g,T}_t := \int_0^T g(t, s)\,\mathrm{d}W_s,$$

and assume that the mapping $s \mapsto g(t, s)$ is in $L^2[0, T]$ for all $t \in [T, T+\Delta]$, so that the stochastic integral in (5.2) is well-defined.

Proposition 5.1.

The VIX dynamics in model (5.1), where the volatility of volatility is denoted by $\nu$, are given by

$$\mathrm{VIX}^2_{T,\nu} := \frac{1}{\Delta}\int_T^{T+\Delta} \xi(t)\exp\left(\nu V^{g,T}_t - \frac{\nu^2}{2}\,\mathbb{E}\left[(V^{g,T}_t)^2\right]\right)\mathrm{d}t.$$

Proof. Follows directly from [JMM18, Proposition 3.1]. □
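As an illustration of Proposition 5.1, the following minimal sketch evaluates the time integral with a midpoint rule for a given path of $V^{g,T}$. It assumes the formula for the squared VIX with drift correction $\frac{\nu^2}{2}\mathbb{E}[(V^{g,T}_t)^2]$ and, purely for illustration, the normalised power kernel $g(t,s) = \sqrt{2\alpha+1}\,(t-s)^\alpha$, for which $\mathbb{E}[(V^{g,T}_t)^2] = t^{2\alpha+1} - (t-T)^{2\alpha+1}$. The function names `second_moment` and `vix_squared` are hypothetical and are not taken from the accompanying code repository.

```python
import math


def second_moment(t, T, alpha):
    """E[(V_t^{g,T})^2] = int_0^T g(t,s)^2 ds for the assumed kernel
    g(t,s) = sqrt(2*alpha + 1) * (t - s)**alpha, with t >= T."""
    beta = 2.0 * alpha + 1.0
    return t ** beta - (t - T) ** beta


def vix_squared(V, xi, nu, T, delta, alpha, n=200):
    """Midpoint-rule approximation of
    (1/delta) * int_T^{T+delta} xi(t) * exp(nu*V(t) - nu^2/2 * E[V(t)^2]) dt.

    V and xi are callables in t: V is one realisation of t -> V_t^{g,T},
    xi is the (deterministic) forward variance curve.
    """
    h = delta / n
    total = 0.0
    for k in range(n):
        t = T + (k + 0.5) * h  # midpoint of the k-th sub-interval
        drift = 0.5 * nu ** 2 * second_moment(t, T, alpha)
        total += xi(t) * math.exp(nu * V(t) - drift)
    return total * h / delta
```

With $\nu = 0$ the routine returns the average of $\xi$ over $[T, T+\Delta]$, which is a quick sanity check; simulating the path `V` itself is left outside the scope of this sketch.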

We now define the operator $\mathcal{I}^{g,T} : L^2[0, T] \to C[T, T+\Delta]$ and the space $\mathcal{H}^{g,T}$ as

(5.3) $$\mathcal{I}^{g,T} f(\cdot) := \int_0^T g(\cdot, s) f(s)\,\mathrm{d}s, \qquad \mathcal{H}^{g,T} := \{\mathcal{I}^{g,T} f : f \in L^2[0, T]\},$$

where the space $\mathcal{H}^{g,T}$ is equipped with the inner product $\langle \mathcal{I}^{g,T} f_1, \mathcal{I}^{g,T} f_2 \rangle_{\mathcal{H}^{g,T}} := \langle f_1, f_2 \rangle_{L^2[0,T]}$. Note that the function $g$ must be such that the operator $\mathcal{I}^{g,T}$ is injective, so that the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}^{g,T}}$ on $\mathcal{H}^{g,T}$ is well-defined.

Proposition 5.2. Assume that there exists $h \in L^2[0, T]$ such that $\int_0^\varepsilon |h(s)|\,\mathrm{d}s < +\infty$ for some $\varepsilon > 0$ and $g(t, \cdot) = h(t - \cdot)$ for any $t \in [T, T+\Delta]$. Then, the space $\mathcal{H}^{g,T}$ is the reproducing kernel Hilbert space for the process $(V^{g,T}_t)_{t \in [T, T+\Delta]}$.

Proof. See Appendix B.5. □

Theorem 5.3.

For any $\gamma > 0$, the sequence of stochastic processes $(\varepsilon^{\gamma/2} V^{g,T})_{\varepsilon > 0}$ satisfies a large deviations principle on $C[T, T+\Delta]$ with speed $\varepsilon^{-\gamma}$ and rate function $\Lambda^V$, defined as

(5.4) $$\Lambda^V(\mathrm{x}) := \begin{cases} \frac{1}{2}\|\mathrm{x}\|^2_{\mathcal{H}^{g,T}}, & \text{if } \mathrm{x} \in \mathcal{H}^{g,T}, \\ +\infty, & \text{otherwise.} \end{cases}$$

Proof. Direct application of the generalised Schilder's Theorem [DS89, Theorem 3.4.12]. □

Remark 5.4.

We now introduce a Borel subset of $C[T, T+\Delta]$, defined as $A := \{g \in C[T, T+\Delta] : g(x) \geq 1 \text{ for all } x \in \mathbb{R}\}$. Then, by a simple application of Theorem 5.3 and using that the rate function $\Lambda^V$ is continuous on $A$, we obtain the following tail behaviour of the process $V^{g,T}$:

(5.5) $$\lim_{\varepsilon \downarrow 0} \varepsilon^\gamma \log \mathbb{P}\left(V^{g,T}_t \geq \frac{1}{\varepsilon^{\gamma/2}}\right) = -\inf_{g \in A} \Lambda^V(g),$$

for any $\gamma > 0$ and $t \in [T, T+\Delta]$.

Remark 5.5.

Let us again ﬁx the kernel g as the rough Bergomi kernel and denote the correspondingreproducing kernel Hilbert space by H η,α,T and the corresponding process V g,T as V η,α,T . If x ∈ H η,α,T it follows that there exists f ∈ L [0 , T ] such that x( t ) = (cid:82) T η √ α + 1( t − s ) α f ( s )d s for all t ∈ [ T, T +∆].Clearly, it follows that x ∈ H aη,α,T for any a >

0, as f ∈ L [0 , T ] implies that a f =: f a ∈ L [0 , T ]. Wecan compute the norm of x in each of these spaces to arrive at the following isometry:(5.6) (cid:107) x (cid:107) H aη,α,T = (cid:107) f a (cid:107) L [0 ,T ] = 1 a (cid:90) T f ( s )d s = 1 a (cid:107) x (cid:107) H η,α,T . We may now amalgamate (5.4), (5.5), and (5.6) to arrive at the following statement, which tells us howthe large strike behaviour scales with the vol-of-vol parameter η in the rough Bergomi model:(5.7) lim ε ↓ ε γ log P (cid:18) V aη,α,Tt ≥ ε γ/ (cid:19) = lim ε ↓ ε γ log (cid:32) P (cid:18) V η,α,Tt ≥ ε γ/ (cid:19) /a (cid:33) Indeed, (5.7) tells us precisely how increasing the vol-of-vol parameter η multiplicatively by a factor a inthe rough Bergomi model increases the probability that the associated process V g,T will exceed a certainlevel.Before stating the main theorem of this section, we ﬁrst deﬁne the following rescaled process:(5.8) V g,T,εt := ε γ/ V g,Tt , (cid:101) V g,T,εt := V g,T,εt − ε γ (cid:90) t g ( t, u )d u + ε γ/ , for ε ∈ [0 , t ∈ [ T, T + ∆]. We also deﬁne the following C ([ T, T + ∆] × [0 , ϕ ,ξ , ϕ , whichmap to C ([ T, T + ∆] × [0 , C [0 ,

1] respectively, as(5.9) ( ϕ ,ξ f )( s, ε ) := ξ ( s ) exp( f ( s, ε )) , ( ϕ g )( ε ) := 1∆ (cid:90) T +∆ T g ( s, ε )d s. Note that in the deﬁnition of ϕ ,ξ in (5.9) we assume ξ to be a continuous, single valued, and strictlypositive function on [ T, T +∆]. This then implies that for every s ∈ [ T, T +∆], the map ε (cid:55)→ ( ϕ ,ξ f )( s, ε )is a bijection and hence has an inverse, denoted by ϕ − ,ξ , which is deﬁned as ( ϕ − ,ξ f )( s, ε ) := log (cid:16) f ( s,ε ) ξ ( s ) (cid:17) . Theorem 5.6.

For any γ > , the sequence of rescaled VIX processes ( e ε γ/ VIX

T,ε γ/ ) ε ∈ [0 , satisﬁes apathwise large deviations principle on C [0 , with speed ε − γ and rate function Λ VIX (x) := inf s ∈ [ T,T +∆] (cid:26) Λ V (cid:18) log (cid:18) y( s, · ) ξ ( s ) (cid:19)(cid:19) : x( · ) = ( ϕ y)( · ) (cid:27) . Proof.

See Appendix B.6. (cid:3)

Remark 5.7. Using Theorem 5.6, we can deduce the small-noise, large strike behaviour of VIX options. Indeed, for the Borel subset $A$ of $C[T, T+\Delta]$ introduced in Remark 5.4 we have that

$$\lim_{\varepsilon \downarrow 0} \varepsilon^\gamma \log \mathbb{P}\left(\mathrm{VIX}_{T,\varepsilon^{\gamma/2}} \geq e^{-\varepsilon^{\gamma/2}}\right) = -\inf_{g \in A} \Lambda^{\mathrm{VIX}}(g),$$

for any $\gamma > 0$.

Conclusions

In this paper we have characterised, for the first time, the small-time behaviour of options on integrated variance in rough volatility models, using large deviations theory. Our approach has a solid theoretical basis, and the corresponding numerics agree with observed market phenomena and with the theoretical results obtained by Alòs, García-Lorite and Muguruza [AGM18]. Both the theoretical and the numerical results hold for each of the three rough volatility models presented, which are of increasing complexity. Any of the three, with our corresponding results, would be suitable for practical use; the user would simply choose the level of complexity needed to satisfy their individual needs. Note also that the theoretical results are widely applicable, and one could easily adapt the results presented in this paper to other models where the volatility process also satisfies a large deviations principle, and whose rate function can be computed easily and accurately.

Perhaps surprisingly, we have discovered that lognormal models such as rough Bergomi [BFG16], 2 Factor Bergomi [Ber08, Ber16] and mixed versions thereof generate linear smiles around the money for options on realised variance in log-space. This is, at the very least, a property to be taken into account when modelling volatility derivatives and, to our knowledge, has never been addressed or commented on in previous works. Whether such an assumption is realistic or not, we have in addition provided an explicit way to construct a model that generates non-linear smiles should this be required or desired.

We have also proved a pathwise large deviations principle for rescaled VIX processes, in a fairly general setting with minimal assumptions on the kernel of the stochastic integral used to define the instantaneous variance; these results are then used to establish the small-noise, large strike asymptotic behaviour of the VIX. The current set-up does not allow us to deduce the small-time VIX behaviour from the pathwise large deviations principle, but this would be a very interesting area for future research. Our numerical scheme would most likely give a good approximation for the rate function and corresponding small-time VIX smiles.

Appendix A. Approximating the density of realised variance in the mixed rough Bergomi model

In light of the numerical results shown in Section 4 (see Figures 2-6) we identify a clear linear trend in the implied volatility smiles generated by both the rough Bergomi and mixed rough Bergomi models. Therefore, it is natural to postulate the following conjecture/approximation of log-linear smiles.

Assumption A.1.

The implied volatility of realised variance options in the mixed rough Bergomi (2.5)model is linear in log-moneyness, and takes the following form:ˆ σ ( K, T ) = (cid:18) T β (cid:18) a ( α, γ, ν ) + b ( α, γ, ν ) log (cid:18) KRV ( v )(0) (cid:19)(cid:19)(cid:19) + where a ( α, γ, ν ) = √ α + 1 (cid:80) ni =1 γ i ν i ( α + 1) √ α + 3 ,b ( α, γ, ν ) = √ α + 1 (cid:32) (cid:80) ni =1 γ i ν i (cid:80) ni =1 γ i ν i I ( α )(2 α + 3) / ( α + 1) − (cid:80) ni =1 γ i ν i (2 α + 2) (cid:112) (2 α + 3) (cid:33) , with I ( α ) = (cid:18) ∞ (cid:88) n =0 ( α ) n ( α + 2) n − − α − − n α + 3 + n + ∞ (cid:88) n =0 ( − n ( − α ) n ( α + 1)( α + 2 + n ) n ! ˆ F ( n, − n − / − n ˆ F ( n, / α + 1 − n (cid:19) ( α + 1)(4 α + 5)such that ˆ F ( n, x ) = F ( − n − α − , α + 1 − n, α + 2 − n, x ) and ( x ) n = n − (cid:89) i =0 ( x + i ) represents the risingPochhammer factorial. Remark A.2.

The values of the constants $a(\alpha, \gamma, \nu)$ and $b(\alpha, \gamma, \nu)$ in Assumption A.1, which give the level and slope of the implied volatility respectively, are given in [AGM18, Example 24 and Example 27] respectively; we generalise to $n$ factors. These results are given in terms of the Hurst parameter $H$; to avoid any confusion we will continue with our use of $\alpha$. Recall that, by Remark 2.2, $\alpha = H - 1/2$.

Proposition A.3.

Under Assumption A.1 , the density of RV ( v ( γ,ν ) )( T ) is given by ψ RV ( x, T ) = − φ ( d ( x )) ∂d ( x ) ∂x (cid:16) a ( α, γ, ν ) T α +1 / d ( x ) + 1 (cid:17) , x ≥ where d ( x ) = log( v ) − log( x )ˆ σ ( x,T ) √ T + ˆ σ ( x, T ) √ T , d ( x ) = d ( x ) − ˆ σ ( x, T ) √ T for x ≥ and φ ( · ) is the standardGaussian probability density function.Proof. Let us denote C ( K, T ) := E [( RV ( v ( γ,ν ) )( T ) − K ) + ] . The well-known Breeden-Litzenberger formula [BL78] tells us that ∂ C ( x, T ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K = ψ RV ( K, T ) . Under Assumption A.1, we have that C ( K, T ) = BS ( v , ˆ σ ( K, T ) , K, T )where BS ( v , σ, K, T ) = v Φ( d ) − K Φ( d ) is the Black-Scholes Call pricing formula with Φ the standardGaussian cumulative distribution function. Then, diﬀerentiating C with respect to the strike gives ∂C ( x, T ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K = v φ ( d ( K )) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K − xφ ( d ( K )) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K − Φ( d ( K ))where ∂d ( x ) ∂x = − ˆ σ ( x, T ) + log( x/v ) a ( α, γ, ν ) T α x ˆ σ ( x, T ) √ T + 12 a ( α, γ, ν ) T α +1 / x = − b ( α, γ, ν ) T α x ˆ σ ( x, T ) √ T + 12 a ( α, γ, ν ) T α +1 / x and ∂d ( x ) ∂x = ∂d ( x ) ∂x − a ( α, γ, ν ) T α +1 / x . Using the well known identity v φ ( d ( x )) = xφ ( d ( x )), proved in Appendix E, we further simplify ∂C ( K, T ) ∂K = v φ ( d ( K )) (cid:18) a ( α, γ, ν ) T α +1 / K (cid:19) − Φ( d ( K )) . Diﬀerentiating again we obtain,

SYMPTOTICS FOR VOLATILITY DERIVATIVES IN MULTI-FACTOR ROUGH VOLATILITY MODELS 21 ψ RV ( K, T ) = − v φ ( d ( K )) a ( α, γ, ν ) T α +1 / K (cid:18) d ( K ) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K + 1 K (cid:19) − φ ( d ( K )) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K . Then, by using v φ ( d ( x )) = xφ ( d ( x )), we ﬁnd that ψ RV ( K, T ) = − φ ( d ( K )) (cid:18) a ( α, γ, ν ) T α +1 / (cid:18) d ( x ) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K + 1 K (cid:19) + ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K (cid:19) , which we further simplify to ψ RV ( K, T ) = − φ ( d ( K )) ∂d ( x ) ∂x (cid:12)(cid:12)(cid:12)(cid:12) x = K (cid:16) a ( α, γ, ν ) T α +1 / d ( K ) + 1 (cid:17) , and the result then follows. Note that the density ψ RV ( · , T ) is indeed continuous for all T > (cid:3) Remark A.4.

Note that Proposition A.3 gives the density of $RV(v^{(\gamma,\nu)})(T)$ in closed form. In addition, Proposition A.3 can easily be used to get the density of the Arithmetic Asian option under the Black-Scholes model. This would correspond to the case $\alpha = 0$ and $\nu = \sigma > 0$.

Remark A.5. Assuming the density $\psi_{RV}$ exists, we have the following volatility swap price:

$$\mathbb{E}\left[\sqrt{RV(v^{(\gamma,\nu)})(T)}\right] = \int_0^\infty \sqrt{x}\,\psi_{RV}(x, T)\,\mathrm{d}x.$$

In Figure 9, we provide numerical results for the volatility swap approximation, which performs best for short maturities, since the approximation is motivated by small-time smile behaviour. Interestingly, it captures rather accurately the short-time decay of the Volatility Swap price for maturities less than 3 months; for larger maturities the absolute error does not exceed 20 basis points.
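The chain from a postulated smile to a volatility swap price can be sketched numerically as follows. This is a hedged illustration rather than the closed-form route of Proposition A.3: the density is obtained from the Breeden-Litzenberger relation by a central second difference in the strike, and the integral is computed with a midpoint rule in log-moneyness. The smile is passed as a callable `sigma_hat(K)`; the function names, the step sizes and the truncation width are assumptions made for this example.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(K, T, v0, sigma_hat):
    """Black-Scholes call on realised variance with forward v0 and
    strike-dependent implied volatility sigma_hat(K)."""
    s = sigma_hat(K) * math.sqrt(T)
    d1 = math.log(v0 / K) / s + 0.5 * s
    d2 = d1 - s
    return v0 * norm_cdf(d1) - K * norm_cdf(d2)


def rv_density(K, T, v0, sigma_hat, rel_step=1e-3):
    """Breeden-Litzenberger density: second strike derivative of the call
    price, approximated by a central finite difference."""
    h = rel_step * K
    c = lambda x: call_price(x, T, v0, sigma_hat)
    return (c(K + h) - 2.0 * c(K) + c(K - h)) / (h * h)


def vol_swap(T, v0, sigma_hat, width=10.0, n=2000):
    """Midpoint rule for int sqrt(x) * psi(x) dx after the substitution
    x = v0 * exp(z), truncated to |z| <= width * sigma_hat(v0) * sqrt(T)."""
    L = width * sigma_hat(v0) * math.sqrt(T)
    dz = 2.0 * L / n
    total = 0.0
    for k in range(n):
        x = v0 * math.exp(-L + (k + 0.5) * dz)
        total += math.sqrt(x) * rv_density(x, T, v0, sigma_hat) * x * dz
    return total
```

For a flat smile $\hat\sigma \equiv a$ the realised variance is lognormal under this assumption and the volatility swap equals $\sqrt{v_0}\exp(-a^2 T/8)$, which gives a simple benchmark for the routine. A sloped smile such as `lambda K: a + b * math.log(K / v0)` can be used provided it stays strictly positive on the integration range.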

Figure 9. Volatility Swap Monte Carlo price estimates (straight lines) and LDP based approximation (stars) for η = 1 . $v_0 = 0.04$; for Monte Carlo we use 200 , t = .

Appendix B. Proof of Main Results

B.1.

Proof of Theorem 3.3.

Proof.

For t ∈ T , ε >

0, we ﬁrst deﬁne the rescaled processes(B.1) Z εt := ε β/ Z t d = Z εt ,v εt := v exp (cid:18) Z εt − η εt ) β (cid:19) , where β := 2 α + 1 ∈ (0 , Z ε ) ε> satisﬁes a large deviations principle on C ( T ) with speed ε − β and rate function Λ Z . We now prove that the two sequences of stochastic processes Z ε and (cid:101) Z ε : = Z ε − η ( ε · ) β are exponentially equivalent [DZ10, Deﬁnition 4.2.10]. For each δ > t ∈ T , there exists ε ∗ : = t (cid:16) δη (cid:17) /β > t ∈T | Z εt − (cid:101) Z εt | = sup t ∈T | η εt ) β | ≤ δ, for all 0 < ε < ε ∗ . Therefore, for all δ >

0, lim sup ε ↓ ε β log P ( (cid:107) Z ε − (cid:101) Z ε (cid:107) ∞ > δ ) = −∞ , and thetwo processes are indeed exponentially equivalent. Then, using [DZ10, Theorem 4.2.13], the sequence ofstochastic processes ( (cid:101) Z ε ) ε> also satisﬁes a large deviations principle on C ( T ), with speed ε − β and ratefunction Λ Z . Moreover, for all ε, t , we have that v εt = v exp( (cid:101) Z εt ), where the bijective transformation x (cid:55)→ v exp( x ) is clearly continuous with respect to the sup norm metric. Therefore we can applythe Contraction Principle [DZ10, Theorem 4.2.1], which is stated in Appendix D, concluding that thesequence of processes ( v ε ) ε> satisﬁes a large deviations principle on C ( T ) with speed ε − β and ratefunction Λ Z (cid:16) log (cid:16) xv (cid:17)(cid:17) . Here we have used that, for each ε > t ∈ T and x ∈ C ( T ), the inversemapping of the bijection transformation x( t, ε ) (cid:55)→ v exp( x ) is given by log (cid:16) xv (cid:17) . Since, For all ε > t ∈T | v εt − v εt | = 0, which implies that for any δ >

0, lim sup ε ↓ ε β log P ( (cid:107) v ε · − v ( ε · ) (cid:107) ∞ > δ ) = −∞ trivially. Thus the sequence of processes ( v εt ) t ∈T and ( v εt ) t ∈T ) are exponentially equivalent and thereforesatisfy the same LDP. Notice also that Λ v ( v ) = Λ Z (0) = (cid:107) (cid:107) H Kα = 0 . (cid:3) B.2.

Proof of Corollary 3.7.

Proof.

The proof of Equation (3.4) is similar to the proof of [FZ17, Corollary 4.9], and we shall prove the lower and upper bounds separately, which turn out to be equal. Firstly, as the rate function $\hat\Lambda^v$ is continuous on $C(\mathcal{T})$, we have that, for all $k > 0$,

$$\lim_{t \downarrow 0} t^\beta \log \mathbb{P}\left(\log[RV(v)(t)] > k\right) = -\mathrm{I}(k),$$

as an application of Corollary 3.4.

(1) The proof of the lower bound is exactly the same as presented in [FZ17, Appendix C] and will be omitted here; we arrive at

$$\liminf_{t \downarrow 0} t^\beta \log \mathbb{E}\left[(RV(v)(t) - e^k)^+\right] \geq -\mathrm{I}(k).$$

(2) To establish the upper bound, we closely follow [PZ16]. First we prove that

(B.2) $$\lim_{t \downarrow 0} t^\beta \log \mathbb{E}\left[(RV(v)(t) - e^k)^+\right] = \lim_{t \downarrow 0} t^\beta \log \mathbb{P}\left(RV(v)(t) \geq e^k\right).$$

We apply H¨older’s inequality: E [( RV ( v )( t ) − e k )] + = E (cid:2) ( RV ( v )( t ) − e k )11 { RV ( v )( t ) ≥ e k } (cid:3) ≤ E (cid:2) ( RV ( v )( t ) − e k ) q (cid:3) /q P ( RV ( v )( t ) ≥ e k ) − /q , which holds for all q >

1. We note that for all q ≥

2, the mapping x (cid:55)→ x q for any x ≥ x + y ) q ≤ x q + y q − q for any x, y ≥

0, which inturn implies(B.3) E (cid:2) | RV ( v )( t ) − e k | q (cid:3) ≤ q − (cid:2) E [( RV ( v )( t )) q ] + (e k ) q (cid:3) . We further obtain the following inequality by applying Jensen’s inequality and the fact that allmoments exist for ( RV ( v )( t )) q (B.4) E [( RV ( v )( t )) q ] ≤ t q (cid:90) t E [ v qs ]d s ≤ v q t q (cid:90) t exp (cid:18)(cid:18) q η − qη (cid:19) s α +1 (cid:19) d s ≤ v q t q − exp (cid:18)(cid:18) q η − qη (cid:19) t α +1 (cid:19) Therefore, using (B.3) and (B.4) we obtain an upper boundlim sup t ↓ t β log E [( RV ( v )( t ) − e k ) + ] ≤ lim sup t ↓ (1 − /q ) t β log P ( RV ( v )( t ) ≥ e k ) , since it holds for (1 − /q ) <

1. To obtain a lower bound, we have for any ε > E [( RV ( v )( t ) − e k )] + ≥ E (cid:2) ( RV ( v )( t ) − e k )11 { RV ( v )( t ) ≥ e k + ε } (cid:3) ≥ ε P (cid:0) RV ( v )( t ) ≥ e k + ε (cid:1) , which implieslim inf t ↓ t β log E [( RV ( v )( t ) − e k ) + ] ≥ lim inf t ↓ t β log P (cid:0) RV ( v )( t ) ≥ e k + ε (cid:1) . Now, using (B.2) leads to lim sup t ↓ t β log E (cid:2) ( RV ( v )( t ) − e k ) + (cid:3) ≤ − I( k ). The conclusion forEquation (3.4) then follows directly.The proof of Equation (3.5) follows the same steps, after proving that the process (cid:112) RV ( v ) satisﬁes alarge deviations principle on R + . Indeed, as the function x (cid:55)→ x is a continuous bijection on R + , wehave that the square root of the integrated variance process (cid:112) RV ( v ) satisﬁes a large deviations principleon R + as t tends to zero, with speed t − β and rate function ˆΛ v (( · ) ), using [DZ10, Theorem 4.2.4]. (cid:3) B.3.

Proof of Theorem 3.10.

Proof.

For brevity we set n = 2, but for larger n , identical arguments can be applied. From Schilder’sTheorem [DS89, Theorem 3.4.12 ] and Proposition 3.2, we have that the sequence of processes ( Z ε ) ε> satisﬁes a large deviations principle on C ( T ) with speed ε − β and rate function Λ Z . Deﬁne the operator f : C ( T ) → C (( T ) , R ) by f (x) := ( ν η x , ν η x), which is clearly continuous with respect to the sup-norm (cid:107) · (cid:107) ∞ on C ( T , R ). Applying the Contraction Principle then yields that the sequence of two-dimensionalprocesses (( ν η Z ε , ν η Z ε )) ε> satisﬁes a large deviations principle on C ( T , R ) as ε tends to zero with speed ε − β and rate function˜Λ(y , z) := inf { Λ Z (x) : f (x) = (y , z) } = inf { Λ Z ( ην y) : z = ν ν y } . Identical arguments to the proof of Theorem 3.3 give that the sequences of processes (( ν η Z ε , ν η Z ε )) ε> and (( ν η Z ε − ν ( ε · ) β , ν η Z ε − ν ( ε · ) β )) ε> are exponentially equivalent, thus satisfy the same large devi-ations principle, with the same rate function and the same speed. We now deﬁne the operator g γ : C ( T , R ) → C ( T ) as g γ (x , y) = v ( γe x + (1 − γ ) e y ). For small perturba-tions δ x , δ y ∈ C ( T ) we have thatsup t ∈T | g γ (x + δ x , y + δ y ) − g γ (x , y) | ≤ | v | (cid:18) sup t ∈T | γe x( t ) ( e δ x ( t ) − | + sup t ∈T | (1 − γ ) e y( t ) ( e δ y ( t ) − | (cid:19) . Clearly the right hand side tends to zero as δ x , δ y tends to zero; thus the operator g γ is continuous withrespect to the sup-norm (cid:107) · (cid:107) ∞ on C ( T ). 
Applying the Contraction Principle then yields that the sequenceof processes ( v ( ε,γ,ν ) ) ε> := (cid:16) v (cid:16) γ exp( ν η Z ε − ν ( ε · ) β ) + (1 − γ ) exp( ν η Z ε − ν ( ε · ) β ) (cid:17)(cid:17) ε> satisﬁes alarge deviations principle on C ( T ) as ε tends to zero, with speed ε − β and rate functionx (cid:55)→ inf { ˜Λ(y , z) : x = g γ (y , z) } = inf { Λ Z ( ην y) : x = g γ (y , ν ν y) } = inf { Λ Z ( ην y) : x = v ( γe y +(1 − γ ) e ν ν y ) } . Since, for all ε > t ∈ T , v ( γ,ν ) εt and v ( ε,γ,ν ) t are equal, the theorem follows immediately. Identicalarguments to the proof of Theorem 3.3 then yield that Λ γ ( v ) = 0. (cid:3) B.4.

Proof of Theorem 3.14.

Proof.

We begin by introducing a small-time rescaling of (2.6) for ε >

0, so that the system becomes(B.5) v ( γ,ν, Σ ,ε ) t := v ( γ,ν, Σ) εt = v n (cid:88) i =1 γ i E (cid:18) ν i η · L i Z εt (cid:19) , with the rescaled process Z εt deﬁned as Z εt := Z εt = ε α + (cid:16)(cid:82) t K α ( s, t )d W s , ..., (cid:82) t K α ( s, t )d W ms (cid:17) .The m -dimensional sequence of processes ( ε β/ ( W , · · · , W m )) ε> satisﬁes a large deviations principleon C ( T , R m ) as ε goes to zero with speed ε − β and rate function Λ m deﬁned by Λ m (x) := (cid:107) x (cid:107) forx ∈ H m and + ∞ otherwise, by Schilder’s Theorem [DS89, Theorem 3.4.12 ]. H m is the reproducingkernel Hilbert space of the measure induced by ( W , · · · , W m ) on C ( T , R m ), deﬁned as H m := (cid:26) ( g , · · · , g m ) ∈ C ( T , R m ) : g i ( t ) = (cid:90) t f i ( s )d s, f i ∈ L ( T ) for all i ∈ · · · m (cid:27) . Then, using an extension of the proof of [HJL18, Theorem 3.6], for i = 1 , · · · , n , the sequence of m -dimensional processes (L i Z ε · ) ε> satisﬁes a large deviations principle on C ( T , R m ) as ε tends to zerowith speed ε − β and rate function y (cid:55)→ inf { Λ m (x) : x ∈ H m , y = L i x } with L i the lower triangular matrixintroduced in Model (2.6). 
Consequently, for i = 1 , · · · , n each (one-dimensional) sequence of processes (cid:16) ν i η · L i Z ε · (cid:17) ε> also satisﬁes a large deviations principle as ε tends to zero, with speed ε − β and ratefunction Λ Σ i (y) := inf (cid:110) Λ m (x) : x ∈ H m , y = ν i η · L i x (cid:111) .Analogously to Theorem 3.3, each sequence of processes (cid:16) ν i η · L i Z ε · (cid:17) ε> and (cid:16) ν i η · L i Z ε · − ν i Σ i ν i ( ε · ) β (cid:17) ε> are exponential equivalent for i = 1 · · · n ; therefore they satisfy the same large deviations principle withthe same speed ε − β and the same rate function Λ Σ i .We now deﬁne the operator g γ : C ( T , R n ) → C ( T ) as g γ (x)( · ) := v n (cid:88) i =1 γ i exp (cid:18) ν i η · x( · ) (cid:19) , SYMPTOTICS FOR VOLATILITY DERIVATIVES IN MULTI-FACTOR ROUGH VOLATILITY MODELS 25 with x := (x , · · · , x n ). For small perturbations δ , · · · , δ n ∈ C ( T ) with δ := ( δ , · · · , δ n ), we have thatsup t ∈T | g γ (x + δ )( t ) − g γ (x)( t ) | = sup t ∈T (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) v n (cid:88) i =1 γ i exp( ν i η · (x( t ) + δ ( t ))) − exp( ν i η · x( t )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup t ∈T | v | n (cid:88) i =1 (cid:12)(cid:12)(cid:12)(cid:12) exp( ν i η · x( t ))(exp( δ ( t )) − (cid:12)(cid:12)(cid:12)(cid:12) The right-hand side tends to zero as δ , · · · , δ n tends to zero; thus the operator g γ is continuous withrespect to the sup-norm (cid:107)·(cid:107) ∞ on C ( T ). 
Using that v ( γ,ν, Σ ,ε ) t = g γ ( ν i η · L i Z ε · − ν i Σ i ν i ( ε · ) β )( t ) for each ε > t ∈ T , we can apply the Contraction Principle then yields that the sequence of processes( v ( γ,ν, Σ ,ε ) ) ε> satisﬁes a large deviations principle on C ( T ) as ε tends to zero, with speed ε − β and ratefunctiony (cid:55)→ inf (cid:40) Λ Σ i (x) : y = v n (cid:88) i =1 γ i exp (cid:18) ν i η · x (cid:19)(cid:41) = inf (cid:40) Λ m (x) : x ∈ H m , y = v n (cid:88) i =1 γ i exp (cid:18) ν i η · L i x (cid:19)(cid:41) . As with the previous two models we have that, for all ε > t ∈ T , v ( γ,ν, Σ ,ε ) t and v ( γ,ν, Σ) εt are equalin law and so the result follows directly. (cid:3) B.5.

Proof of Proposition 5.2.

Proof.

We recall that the inner product is defined by $\langle \mathcal{I}^{g,T} f_1, \mathcal{I}^{g,T} f_2 \rangle_{\mathcal{H}^{g,T}} := \langle f_1, f_2 \rangle_{L^2[0,T]}$, where

$$\mathcal{I}^{g,T} f(\cdot) = \int_0^T g(\cdot, s) f(s)\,\mathrm{d}s, \qquad \mathcal{H}^{g,T} = \{\mathcal{I}^{g,T} f : f \in L^2[0, T]\}.$$

The proof of Proposition 5.2, which is similar to the proofs given in [JPS18], is made up of three parts.

The first part is to prove that $(\mathcal{H}^{g,T}, \langle \cdot, \cdot \rangle_{\mathcal{H}^{g,T}})$ is a separable Hilbert space. Clearly $\mathcal{I}^{g,T}$ is surjective on $\mathcal{H}^{g,T}$. Now take $f_1, f_2 \in L^2[0, T]$ such that $\mathcal{I}^{g,T} f_1 = \mathcal{I}^{g,T} f_2$. For any $t \in [T, T+\Delta]$ it follows that $\int_0^T g(t, s)[f_1(s) - f_2(s)]\,\mathrm{d}s = 0$; applying the Titchmarsh convolution theorem then implies that $f_1 = f_2$ almost everywhere, and so $\mathcal{I}^{g,T} : L^2[0, T] \to \mathcal{H}^{g,T}$ is a bijection. $\mathcal{I}^{g,T}$ is a linear operator, and therefore $\langle \cdot, \cdot \rangle_{\mathcal{H}^{g,T}}$ is indeed an inner product; hence $(\mathcal{H}^{g,T}, \langle \cdot, \cdot \rangle_{\mathcal{H}^{g,T}})$ is a real inner product space. Since $L^2[0, T]$ is a complete (Hilbert) space, for every Cauchy sequence of functions $\{f_n\}_{n \in \mathbb{N}} \subset L^2[0, T]$, there exists a function $\tilde f \in L^2[0, T]$ such that $\{f_n\}_{n \in \mathbb{N}}$ converges to $\tilde f$; note also that $\{\mathcal{I}^{g,T} f_n\}_{n \in \mathbb{N}}$ converges to $\mathcal{I}^{g,T} \tilde f$ in $\mathcal{H}^{g,T}$. Assume for a contradiction that there exists $f \in L^2[0, T]$ such that $f \neq \tilde f$ and $\{\mathcal{I}^{g,T} f_n\}_{n \in \mathbb{N}}$ converges to $\mathcal{I}^{g,T} f$ in $\mathcal{H}^{g,T}$. Then, since $\mathcal{I}^{g,T}$ is a bijection, the triangle inequality yields

$$0 < \left\|\mathcal{I}^{g,T} f - \mathcal{I}^{g,T} \tilde f\right\|_{\mathcal{H}^{g,T}} \leq \left\|\mathcal{I}^{g,T} f - \mathcal{I}^{g,T} f_n\right\|_{\mathcal{H}^{g,T}} + \left\|\mathcal{I}^{g,T} \tilde f - \mathcal{I}^{g,T} f_n\right\|_{\mathcal{H}^{g,T}},$$

where the right-hand side converges to zero as $n$ tends to infinity, which is a contradiction. Therefore $f = \tilde f$, $\mathcal{I}^{g,T} f \in \mathcal{H}^{g,T}$ and $\mathcal{H}^{g,T}$ is complete, hence a real Hilbert space. Since $L^2[0, T]$ is separable with countable orthonormal basis $\{\phi_n\}_{n \in \mathbb{N}}$, $\{\mathcal{I}^{g,T} \phi_n\}_{n \in \mathbb{N}}$ is an orthonormal basis for $\mathcal{H}^{g,T}$, which is then separable.

The second part of the proof is to show that there exists a dense embedding $\iota : \mathcal{H}^{g,T} \to C[T, T+\Delta]$. Since there exists $h \in L^2[0, T]$ such that $\int_0^\varepsilon |h(s)|\,\mathrm{d}s < +\infty$ for some $\varepsilon > 0$ and $g(t, \cdot) = h(t - \cdot)$ for any $t \in [T, T+\Delta]$, we can apply [Che08, Lemma 2.1], which tells us that $\mathcal{H}^{g,T}$ is dense in $C[T, T+\Delta]$, and so we choose the embedding to be the inclusion map.

Finally, we must prove that every $f^* \in C[T, T+\Delta]^*$, where $f^*$ is defined in [JPS18, Definition 3.1], is Gaussian on $C[T, T+\Delta]$, with variance $\|\iota^* f^*\|^2_{\mathcal{H}^{g,T*}}$, where $\iota^*$ is the dual of $\iota$. Take $f^* \in C[T, T+\Delta]^*$. Since $(C[T, T+\Delta], \mathcal{H}^{g,T}, \mu)$ is a RKHS triplet, by similar arguments to [FZ17, Lemma 3.1], we obtain using [JPS18, Remark 3.6] that $\iota^*$ admits an isometric embedding $\bar\iota^*$ such that

$$\|\bar\iota^* f^*\|^2_{\mathcal{H}^{g,T*}} = \|f^*\|^2_{L^2(C[T,T+\Delta],\mu)} = \int_{C[T,T+\Delta]} (f^*)^2\,\mathrm{d}\mu = \mathrm{VAR}(f^*),$$

where $\mu$ is the Gaussian measure induced by the process $V^{g,T} = \int_0^T g(\cdot, s)\,\mathrm{d}W_s$ on $(C[T, T+\Delta], \mathcal{B}(C[T, T+\Delta]))$. □

B.6.

Proof of Theorem 5.6.

Proof.

First we recall (cid:101) V g,T,εt := V g,T,εt − ε γ (cid:82) t g ( t, u )d u + ε γ/ . We begin the proof by showing thatthe sequence of processes ( V g,T,ε ) ε ∈ [0 , and ( (cid:101) V g,T,ε ) ε ∈ [0 , are exponentially equivalent [DZ10, Deﬁnition4.2.10]. As g ( t, · ) ∈ L [0 , T ] for all t ∈ [ T, T + ∆], for each δ > ε ∗ > t ∈ [ T,T +∆] (cid:12)(cid:12)(cid:12) ε γ/ ∗ − ε γ ∗ (cid:82) T g ( t, u )d u (cid:12)(cid:12)(cid:12) ≤ δ. Therefore, for the C [ T, T + ∆] norm (cid:107) · (cid:107) ∞ we have that for all ε ∗ > ε > P (cid:16)(cid:13)(cid:13)(cid:13) V g,T,ε − (cid:101) V g,T,ε (cid:13)(cid:13)(cid:13) ∞ > δ (cid:17) = P (cid:32) sup t ∈ [ T,T +∆] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ε γ/ − ε γ (cid:90) T g ( t, u )d u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > δ (cid:33) = 0 . Therefore lim sup ε ↓ ε γ log P (cid:16)(cid:13)(cid:13)(cid:13) V g,T,ε − (cid:101) V g,T,ε (cid:13)(cid:13)(cid:13) ∞ > δ (cid:17) = −∞ , and so the two sequences of processes( V g,T,ε ) ε ∈ [0 , and ( (cid:101) V g,T,ε ) ε ∈ [0 , are exponentially equivalent; applying [DZ10, Theorem 4.2.13] thenyields that ( (cid:101) V g,T,ε ) ε ∈ [0 , satisﬁes a large deviations principle on C [ T, T + ∆] with speed ε − γ and ratefunction Λ V .We now prove that the operators ϕ ,ξ and ϕ are continuous with respect to the C ([ T, T + ∆] × [0 , C [0 , (cid:107) · (cid:107) ∞ norms respectively. The proofs are very simple, and are included for completeness. Firstlet us take a small perturbation δ f ∈ C ([ T, T + ∆] × [0 , (cid:13)(cid:13) ϕ ,ξ ( f + δ f ) − ϕ ,ξ ( f ) (cid:13)(cid:13) ∞ = sup ε ∈ [0 , s ∈ [ T,T +∆] (cid:12)(cid:12)(cid:12) ξ ( s ) e f ( s,ε ) (cid:16) e δ f ( s,ε ) − (cid:17)(cid:12)(cid:12)(cid:12) ≤ sup ε ∈ [0 , s ∈ [ T,T +∆] | ξ ( s ) | sup ε ∈ [0 , s ∈ [ T,T +∆] | e f ( s,ε ) | sup ε ∈ [0 , s ∈ [ T,T +∆] | e δ f ( s,ε ) − | . 
Since $\xi_0$ is continuous on $[T,T+\Delta]$ and $f$ is continuous on $[T,T+\Delta]\times[0,1]$, the first two suprema are finite. Clearly $\sup|e^{\delta^f(s,\varepsilon)}-1|$ tends to zero as $\delta^f$ tends to zero, and hence the operator $\varphi_{1,\xi_0}$ is continuous. Now take a small perturbation $\delta^f\in C([T,T+\Delta]\times[0,1])$. Then
$$\left\|\varphi_2(f+\delta^f)-\varphi_2(f)\right\|_\infty=\sup_{\varepsilon\in[0,1]}\left|\int_T^{T+\Delta}\delta^f(s,\varepsilon)\,\mathrm{d}s\right|\le\Delta M,$$
where $M:=\sup_{\varepsilon\in[0,1],\,s\in[T,T+\Delta]}|\delta^f(s,\varepsilon)|$. Clearly $M$ tends to zero as $\delta^f$ tends to zero, thus the operator $\varphi_2$ is also continuous.

For every $s\in[T,T+\Delta]$ we have the following: by an application of the Contraction Principle [DZ10, Theorem 4.2.1], and using the fact that $\varepsilon\mapsto(\varphi_{1,\xi_0}f)(s,\varepsilon)$ is a bijection for all $f\in C[T,T+\Delta]$, it follows that the sequence of stochastic processes $\left(\left(\varphi_{1,\xi_0}\widetilde{V}^{g,T,\varepsilon}_s\right)(s,\varepsilon)\right)_{\varepsilon\in[0,1]}$ satisfies a large deviations principle on $C[0,1]$ as $\varepsilon$ tends to zero with speed $\varepsilon^{-\gamma}$ and rate function
$$\hat\Lambda^V_s(\mathrm{y}):=\Lambda^V\left((\varphi_{1,\xi_0}\mathrm{y})^{-1}(s,\cdot)\right)=\Lambda^V\left(\log\left(\frac{\mathrm{y}(s,\cdot)}{\xi_0(s)}\right)\right).$$

A second application of the Contraction Principle then yields that the sequence of stochastic processes $\left(\left(\varphi_2(\varphi_{1,\xi_0}\widetilde{V}^{g,T,\varepsilon}_s)\right)(\varepsilon)\right)_{\varepsilon\in[0,1]}$ satisfies a large deviations principle on $C[0,1]$ with speed $\varepsilon^{-\gamma}$ and rate function
$$\Lambda^{\mathrm{VIX}}(\mathrm{x})=\inf_{s\in[T,T+\Delta]}\left\{\Lambda^V\left((\varphi_{1,\xi_0}\mathrm{y})^{-1}(s,\cdot)\right):\mathrm{x}(\cdot)=(\varphi_2\mathrm{y})(\cdot)\right\}.$$
By definition, the sequence of processes $\left(\left(\varphi_2(\varphi_{1,\xi_0}\widetilde{V}^{g,T,\varepsilon}_s)\right)(\varepsilon)\right)_{\varepsilon\in[0,1]}$ is almost surely equal to the rescaled VIX processes $\left(e^{\varepsilon^{\gamma/2}}\mathrm{VIX}_{T,\varepsilon^{\gamma/2}}\right)_{\varepsilon\in[0,1]}$, and hence the latter satisfies the same large deviations principle. $\square$

Appendix C. Numerical recipes

We first consider the simple rough Bergomi model (2.4) for the sake of simplicity, and further develop the mixed multi-factor rough Bergomi model (2.6) in Appendix C.3 (which also includes (2.5)). Therefore, we tackle the numerical computation of the rate function
$$\hat\Lambda^v(\mathrm{y}):=\inf\left\{\Lambda^v(\mathrm{x}):\mathrm{y}=\mathrm{RV}(\mathrm{x})(1)\right\}.$$
This problem, in turn, is equivalent to the following optimisation:
$$\hat\Lambda^v(\mathrm{y}):=\inf_{f\in L^2[0,1]}\left\{\frac12\|f\|^2:\mathrm{y}=\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)f(u)\,\mathrm{d}u\right)\right)(1)\right\}.\tag{C.1}$$
A natural approach is to consider a class of functions that is dense in $L^2[0,1]$, for instance the polynomials
$$\hat f^{(n)}(s)=\sum_{i=0}^n a_i s^i,$$
such that $\{\hat f^{(n)}\}_{a_i\in\mathbb{R}}$ is dense in $L^2[0,1]$ as $n$ tends to $+\infty$. Problem (C.1) may then be approximated via
$$\hat\Lambda^v_n(\mathrm{y}):=\inf_{a\in\mathbb{R}^{n+1}}\left\{\frac12\|\hat f^{(n)}\|^2:\mathrm{y}=\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(n)}(u)\,\mathrm{d}u\right)\right)(1)\right\},$$
where $a=(a_0,\dots,a_n)$. In order to obtain the solution, first the constraint
$$\mathrm{y}=\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(n)}(u)\,\mathrm{d}u\right)\right)(1)$$
needs to be satisfied. To accomplish this, we consider anchoring one of the coefficients in $\hat f^{(n)}$ such that
$$a_i^*=\underset{a_i\in\mathbb{R}}{\operatorname{argmin}}\left\{\left(\mathrm{y}-\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(n)}(u)\,\mathrm{d}u\right)\right)(1)\right)^2\right\}\tag{C.2}$$
and the constraint will be satisfied for all combinations of the vector $a^*=(a_0,\dots,a_{i-1},a_i^*,a_{i+1},\dots,a_n)$. Numerically, (C.2) is easily solved using a few iterations of the Newton-Raphson algorithm. Then we may easily solve
$$\inf_{a^*\in\mathbb{R}^{n+1}}\left\{\frac12\|\hat f^{(n)}\|^2\right\},$$
which will converge to the original problem (C.1) as $n\to+\infty$. The polynomial basis is particularly convenient since we have that
$$\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(n)}(u)\,\mathrm{d}u\right)\right)(1)=\int_0^1\exp\left(\eta\sqrt{2\alpha+1}\sum_{i=0}^n a_i s^{\alpha+1+i}\,\frac{{}_2F_1(i+1,-\alpha,i+2,1)}{i+1}\right)\mathrm{d}s,\tag{C.3}$$
where ${}_2F_1$ denotes the Gaussian hypergeometric function. In particular one may store the values $\{{}_2F_1(i+1,-\alpha,i+2,1)\}_{i=0}^n$ in the computer memory and reuse them through different iterations. In addition, the outer integral in (C.3) is efficiently computed using Gauss-Legendre quadrature, i.e.
$$\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(n)}(u)\,\mathrm{d}u\right)\right)(1)\approx\frac12\sum_{k=1}^m\exp\left(\eta\sqrt{2\alpha+1}\sum_{i=0}^n a_i\left(\tfrac12(1+p_k)\right)^{\alpha+1+i}\frac{{}_2F_1(i+1,-\alpha,i+2,1)}{i+1}\right)w_k,$$
where $\{p_k,w_k\}_{k=1}^m$ are $m$-th order Legendre points and weights respectively.

C.1. Convergence Analysis of the numerical scheme.
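As a hedged illustration of the recipe in (C.1)-(C.3), and of the successive differences $|\hat\Lambda^v_{n+1}(\mathrm{y})-\hat\Lambda^v_n(\mathrm{y})|$ examined in this subsection, the following Python sketch anchors $a_0$ to enforce the constraint and then minimises $\frac12\|\hat f^{(n)}\|^2$ over the remaining coefficients. Everything not stated in the text is an assumption made for illustration: the parameter values `ETA` and `ALPHA`, the 64 quadrature nodes, the function names, the choice $i=0$ for the anchored coefficient, a bracketing root-finder (Brent) in place of Newton-Raphson, and Nelder-Mead for the outer minimisation. The constants ${}_2F_1(i+1,-\alpha,i+2,1)/(i+1)$ are evaluated as $B(i+1,\alpha+1)$, which is the same number by Gauss's summation theorem.

```python
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.special import beta

ETA, ALPHA = 1.5, -0.4  # illustrative parameters (assumptions, not from the text)
NODES, WEIGHTS = np.polynomial.legendre.leggauss(64)
S = 0.5 * (1.0 + NODES)  # Gauss-Legendre nodes mapped from [-1, 1] to [0, 1]


def kernel_coeffs(n):
    """2F1(i+1, -alpha; i+2; 1) / (i+1) = B(i+1, alpha+1) for i = 0..n."""
    return beta(np.arange(n + 1) + 1.0, ALPHA + 1.0)


def realised_variance(a):
    """Right-hand side of (C.3), evaluated by Gauss-Legendre quadrature."""
    a = np.asarray(a, dtype=float)
    i = np.arange(a.size)
    powers = S[:, None] ** (ALPHA + 1.0 + i[None, :])
    exponent = ETA * np.sqrt(2.0 * ALPHA + 1.0) * (powers @ (a * kernel_coeffs(a.size - 1)))
    return 0.5 * np.dot(WEIGHTS, np.exp(exponent))


def half_squared_norm(a):
    """0.5 * ||f||^2 in L^2[0,1] for f(s) = sum_i a_i s^i (Hilbert-matrix form)."""
    a = np.asarray(a, dtype=float)
    i = np.arange(a.size)
    return 0.5 * a @ (1.0 / (i[:, None] + i[None, :] + 1.0)) @ a


def anchor_a0(y, tail):
    """Solve the constraint y = RV for a_0 given (a_1..a_n); RV is increasing in a_0."""
    def gap(a0):
        return np.log(realised_variance(np.concatenate(([a0], tail)))) - np.log(y)
    return brentq(gap, -50.0, 50.0)  # bracket width is an ad hoc assumption


def rate_function(y, n):
    """Degree-n polynomial approximation of (C.1) via anchoring plus minimisation."""
    if n == 0:
        return half_squared_norm([anchor_a0(y, np.zeros(0))])

    def objective(tail):
        return half_squared_norm(np.concatenate(([anchor_a0(y, tail)], tail)))

    result = minimize(objective, np.zeros(n), method="Nelder-Mead",
                      options={"xatol": 1e-10, "fatol": 1e-12})
    return result.fun


if __name__ == "__main__":
    values = [rate_function(1.3, n) for n in range(5)]
    print(np.abs(np.diff(values)))  # successive differences in n
```

Anchoring turns the constrained problem into an unconstrained one in $n$ variables, at the price of one scalar root-solve per objective evaluation; the achievable accuracy is then bounded by the tolerances of the two solvers.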

In Figure 10 we report the absolute differences $|\hat\Lambda^v_{n+1}(\mathrm{y})-\hat\Lambda^v_n(\mathrm{y})|$, which decrease as we increase $n$, the degree of the approximating polynomial. We must emphasise that both numerical routines performing the minimisation steps have a default tolerance of $10^{-}$, hence we cannot expect to obtain higher accuracy than this tolerance, which is why Figure 10 becomes noisy once this accuracy is reached. We note that in Figure 10 we observe fast convergence, and with just $n=5$ we usually obtain accuracy of $10^{-}$.

Figure 10. Absolute difference of subsequent approximating polynomials $|\hat\Lambda^v_{n+1}(\mathrm{y})-\hat\Lambda^v_n(\mathrm{y})|$.

C.2. A tailor-made polynomial basis for rough volatility.
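The kernel identity behind (C.3), on which this subsection builds, can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $K_\alpha(u,s)=\eta\sqrt{2\alpha+1}\,(s-u)^\alpha$, uses hypothetical parameter values `ETA` and `ALPHA`, and the function names are invented for the example. It compares a direct quadrature of $\int_0^s K_\alpha(u,s)u^k\,\mathrm{d}u$ with the closed form in terms of ${}_2F_1$, including the non-integer exponent $k=-\alpha-1$ for which the result no longer depends on $s$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import hyp2f1

ETA, ALPHA = 1.5, -0.4  # illustrative values (assumptions, not from the text)
SCALE = ETA * np.sqrt(2.0 * ALPHA + 1.0)


def kernel_integral_quad(s, k):
    """Integrate SCALE * (s - u)^ALPHA * u^k over u in [0, s] numerically."""
    # The 'alg' weight lets QUADPACK absorb the endpoint singularities
    # (u - 0)^k * (s - u)^ALPHA; both exponents must exceed -1.
    value, _ = quad(lambda u: 1.0, 0.0, s, weight="alg", wvar=(k, ALPHA))
    return SCALE * value


def kernel_integral_closed(s, k):
    """Closed form: SCALE * s^(alpha+1+k) * 2F1(k+1, -alpha; k+2; 1) / (k+1)."""
    return SCALE * s ** (ALPHA + 1.0 + k) * hyp2f1(k + 1.0, -ALPHA, k + 2.0, 1.0) / (k + 1.0)


if __name__ == "__main__":
    for k in (0, 1, 2, -ALPHA - 1.0):
        for s in (0.3, 1.0):
            print(k, s, kernel_integral_quad(s, k), kernel_integral_closed(s, k))
```

For $k=-\alpha-1$ the power of $s$ vanishes, so both columns should agree across different values of $s$; this is the property that makes the extra basis function convenient.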

We may improve the computation time of the previous approach by considering a tailor-made polynomial basis. In particular, recall the following relation
$$\int_0^s K_\alpha(u,s)\,u^k\,\mathrm{d}u=\eta\sqrt{2\alpha+1}\;s^{\alpha+1+k}\,\frac{{}_2F_1(k+1,-\alpha,k+2,1)}{k+1};$$
then, for $k=-\alpha-1$,
$$\int_0^s K_\alpha(u,s)\,u^{-\alpha-1}\,\mathrm{d}u=\eta\sqrt{2\alpha+1}\;\frac{{}_2F_1(-\alpha,-\alpha,1-\alpha,1)}{-\alpha},$$
which in turn is a constant that does not depend on the upper integral bound $s$.

Proposition C.1. Consider the basis $\hat g^{(n)}(s)=c\,s^{-\alpha-1}+\sum_{i=0}^n a_i s^i$, where $c\in\mathbb{R}$. Then, for $c=c^*$ with
$$c^*=\frac{-\alpha}{\eta\sqrt{2\alpha+1}\;{}_2F_1(-\alpha,-\alpha,1-\alpha,1)}\,\log\left(\frac{\mathrm{y}}{\int_0^1\exp\left(\eta\sqrt{2\alpha+1}\sum_{i=0}^n a_i s^{\alpha+1+i}\,\frac{{}_2F_1(i+1,-\alpha,i+2,1)}{i+1}\right)\mathrm{d}s}\right),$$
$\hat g^{(n)}(s)$ solves (C.2).

Proof. We have that
$$\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat g^{(n)}(u)\,\mathrm{d}u\right)\right)(1)=\exp\left(c\,\eta\sqrt{2\alpha+1}\;\frac{{}_2F_1(-\alpha,-\alpha,1-\alpha,1)}{-\alpha}\right)\times\int_0^1\exp\left(\eta\sqrt{2\alpha+1}\sum_{i=0}^n a_i s^{\alpha+1+i}\,\frac{{}_2F_1(i+1,-\alpha,i+2,1)}{i+1}\right)\mathrm{d}s,$$
and the proof follows by solving $\mathrm{y}=\mathrm{RV}\left(\exp\left(\int_0^\cdot K_\alpha(u,\cdot)\hat g^{(n)}(u)\,\mathrm{d}u\right)\right)(1)$ for $c$. $\square$

Remark C.2. Notice that Proposition C.1 gives a semi-closed form solution to (C.2). Then, we only need to solve
$$\inf_{(a_0,\dots,a_n)\in\mathbb{R}^{n+1}}\left\{\frac12\|\hat g^{(n)}\|^2:c=c^*\right\}$$
in order to recover a solution for (C.1).

Remark C.3.

Notice that $u^{-\alpha-1}\notin L^2[0,1]$, but $u^{-\alpha-1}\mathbb{1}_{\{u>\varepsilon\}}\in L^2[0,1]$ for all $\varepsilon>0$. Moreover,
$$\int_0^s K_\alpha(u,s)\,u^{-\alpha-1}\mathbb{1}_{\{u>\varepsilon\}}\,\mathrm{d}u=\eta\sqrt{2\alpha+1}\left(\frac{{}_2F_1(-\alpha,-\alpha,1-\alpha,1)}{-\alpha}-\varepsilon^{-\alpha}s^{\alpha}\,\frac{{}_2F_1(-\alpha,-\alpha,1-\alpha,\varepsilon/s)}{-\alpha}\right)=\eta\sqrt{2\alpha+1}\;\frac{{}_2F_1(-\alpha,-\alpha,1-\alpha,1)}{-\alpha}+O(\varepsilon^{-\alpha}),$$
hence for $\varepsilon$ sufficiently small the error is bounded as long as $\alpha\neq 0$. In our applications we find that this method behaves nicely for α ∈ (−. , −. α (which is rather surprising behaviour, as the converse is true of other approximation schemes when the volatility trajectories become more rough) as well as strikes around the money. Moreover, the truncated basis approach constitutes a 30-fold speed improvement in our numerical tests. As benchmark we consider the standard numerical algorithm introduced in (C.2), with accuracy measured by absolute error.

C.3. Multi-Factor case.

The correlated mixed multi-factor rough Bergomi model (2.6) requires a slightly more complex setting. By Corollary 3.15 the rate function that we aim at is given by the following multidimensional optimisation problem:
$$\hat\Lambda^{(v,\Sigma)}(\mathrm{y}):=\inf_{(f_1,\dots,f_n)\in L^2[0,1]}\left\{\frac12\sum_{i=1}^n\|f_i\|^2:\mathrm{y}=\mathrm{RV}\left(\sum_{i=1}^m\gamma_i\exp\left(\frac{\nu_i}{\eta}\cdot\Sigma_i\,\mathrm{f}^{K_\alpha}_{\cdot}\right)u\right)(1)\right\},\tag{C.4}$$
where $\mathrm{f}^{K_\alpha}_{\cdot}=\left(\int_0^\cdot K_\alpha(u,\cdot)f_1(u)\,\mathrm{d}u,\dots,\int_0^\cdot K_\alpha(u,\cdot)f_n(u)\,\mathrm{d}u\right)$. The approach to solve this problem is similar to that of (C.1). Nevertheless, in order to solve (C.4) we shall use a multi-dimensional polynomial basis
$$\left(\hat f^{(p)}_1(s),\dots,\hat f^{(p)}_n(s)\right)=\left(\sum_{i=0}^p a^1_i s^i,\dots,\sum_{i=0}^p a^n_i s^i\right)$$
such that each $\hat f^{(p)}_i(s)$ for $i\in\{1,\dots,n\}$ is dense in $L^2[0,1]$ as $p$ tends to $+\infty$ by the Stone-Weierstrass Theorem. Then we may equivalently solve
$$\inf_{(a^1_0,\dots,a^1_p,\dots,a^n_0,\dots,a^n_p)\in\mathbb{R}^{(p+1)n}}\left\{\frac12\sum_{i=1}^n\|\hat f^{(p)}_i\|^2:\mathrm{y}=\mathrm{RV}\left(\sum_{i=1}^m\gamma_i\exp\left(\frac{\nu_i}{\eta}\cdot\Sigma_i\,\hat{\mathrm{f}}^{(K_\alpha,p)}_{\cdot}\right)u\right)(1)\right\},\tag{C.5}$$
where $\hat{\mathrm{f}}^{(K_\alpha,p)}_{\cdot}=\left(\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(p)}_1(u)\,\mathrm{d}u,\dots,\int_0^\cdot K_\alpha(u,\cdot)\hat f^{(p)}_n(u)\,\mathrm{d}u\right)$. Then as $p$ tends to $+\infty$, (C.5) will converge to the original problem (C.4). In order to numerically accelerate the optimisation problem in (C.5), we anchor the coefficients $(a^1_0,\dots,a^n_0)$ to satisfy the constraint $\mathrm{y}=\mathrm{RV}(\cdot)(1)$ (in the same way as in the one-dimensional case), that is
$$a^*_0:=\underset{(a^1_0,\dots,a^n_0)\in\mathbb{R}^n}{\operatorname{arginf}}\left(\mathrm{y}-\mathrm{RV}\left(\sum_{i=1}^m\gamma_i\exp\left(\frac{\nu_i}{\eta}\cdot\Sigma_i\,\hat{\mathrm{f}}^{(K_\alpha,p)}_{\cdot}\right)u\right)(1)\right)^2,$$
where $a^*_0=(a^{1*}_0,\dots,a^{n*}_0)$, and one may use (C.3) and Gauss-Legendre quadrature to efficiently compute $\mathrm{RV}(\cdot)(1)$. Then, the constraint will always be satisfied by construction and instead we may solve
$$\inf_{(a^{1*}_0,a^1_1,\dots,a^1_p,\dots,a^{n*}_0,a^n_1,\dots,a^n_p)\in\mathbb{R}^{(p+1)n}}\left\{\frac12\sum_{i=1}^n\|\hat f^{(p)}_i\|^2\right\}.\tag{C.6}$$

Figure 11. Absolute error of the rate function. We consider the truncated basis approach against the standard polynomial basis with η = 1. v_0 = 0.04 and different values of α.

Appendix D. Exponential Equivalence and Contraction Principle

Definition D.1. On a metric space $(\mathcal{Y},d)$, two $\mathcal{Y}$-valued sequences $(X^\varepsilon)_{\varepsilon>0}$ and $(\widetilde{X}^\varepsilon)_{\varepsilon>0}$ are called exponentially equivalent (with speed $h_\varepsilon$) if there exist probability spaces $(\Omega,\mathcal{B}_\varepsilon,\mathbb{P}_\varepsilon)_{\varepsilon>0}$ such that, for any $\varepsilon>0$, $\mathbb{P}_\varepsilon$ is the joint law of $(\widetilde{X}^\varepsilon,X^\varepsilon)$ and, for each $\delta>0$, the set $\left\{\omega:(\widetilde{X}^\varepsilon,X^\varepsilon)\in\Gamma_\delta\right\}$ is $\mathcal{B}_\varepsilon$-measurable, and
$$\limsup_{\varepsilon\downarrow 0}h_\varepsilon\log\mathbb{P}_\varepsilon(\Gamma_\delta)=-\infty,$$
where $\Gamma_\delta:=\{(\tilde y,y):d(\tilde y,y)>\delta\}\subset\mathcal{Y}\times\mathcal{Y}$.

Theorem D.2. Let $\mathcal{X}$ and $\mathcal{Y}$ be topological spaces and $f:\mathcal{X}\to\mathcal{Y}$ a continuous function. Consider a good rate function $I:\mathcal{X}\to[0,\infty]$. For each $y\in\mathcal{Y}$, define $I'(y):=\inf\{I(x):x\in\mathcal{X},\,y=f(x)\}$. Then, if $I$ controls the LDP associated with a family of probability measures $\{\mu_\varepsilon\}$ on $\mathcal{X}$, then $I'$ controls the LDP associated with the family of probability measures $\{\mu_\varepsilon\circ f^{-1}\}$ on $\mathcal{Y}$, and $I'$ is a good rate function on $\mathcal{Y}$.

Appendix E. Proof of $v_0\,\phi(d_1(x))=x\,\phi(d_2(x))$

In order to prove $v_0\,\phi(d_1(x))=x\,\phi(d_2(x))$, we will prove the following equivalent result:
$$(d_1(x))^2-(d_2(x))^2=2\log\left(\frac{v_0}{x}\right).$$
Proof.

Recall that $\phi(\cdot)$ is the standard Gaussian probability density function. Using that for $x\ge 0$, $d_1(x)=\frac{\log(v_0)-\log(x)}{\hat\sigma(x,T)\sqrt{T}}+\frac12\hat\sigma(x,T)\sqrt{T}$ and $d_2(x)=d_1(x)-\hat\sigma(x,T)\sqrt{T}$, we obtain
$$\begin{aligned}(d_1(x))^2-(d_2(x))^2&=(d_1(x))^2-\left(d_1(x)-\hat\sigma(x,T)\sqrt{T}\right)^2\\&=2d_1(x)\hat\sigma(x,T)\sqrt{T}-T\hat\sigma^2(x,T)\\&=2\left[\log(v_0)-\log(x)+\frac12\hat\sigma^2(x,T)T\right]-T\hat\sigma^2(x,T)\\&=2\log\left(\frac{v_0}{x}\right).\end{aligned}$$
$\square$

References

[AGM18] E. Alòs, D. García-Lorite and A. Muguruza. On smile properties of volatility derivatives and exotic products: understanding the VIX skew. arXiv:1808.03610, 2018.
[ALV07] E. Alòs, J. León and J. Vives. On the short-time behavior of the implied volatility for jump-diffusion models with stochastic volatility. Finance and Stochastics, (4), 571-589, 2007.
[AP18] M. Avellaneda and A. Papanicolau. Statistics of VIX Futures and Their Applications to Trading Volatility Exchange-Traded Products. The Journal of Investment Strategies, Vol. 7, No. 2, 1-33, 2018.
[AS72] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical tables. Dover, New York, 1972.
[AT07] R.J. Adler and J.E. Taylor. Random Fields and Geometry. Springer-Verlag, New York, 2007.
[Ber08] L. Bergomi. Smile dynamics III. Risk, October: 90-96, 2008.
[Ber16] L. Bergomi. Stochastic Volatility Modeling. Chapman & Hall/CRC financial mathematics series, 2016.
[BFG16] C. Bayer, P.K. Friz and J. Gatheral. Pricing under rough volatility. Quantitative Finance, (6): 887-904, 2016.
[BFGHS17] C. Bayer, P. Friz, A. Gulisashvili, B. Horvath and B. Stemper. Short-time near the money skew in rough fractional stochastic volatility models. Quantitative Finance, (3): 2017.
[BGT89] N. H. Bingham, C. M. Goldie and J. L. Teugels. Regular Variation, 2nd edition. Cambridge University Press, Cambridge, 1989.
[BHØZ08] F. Biagini, Y. Hu, B. Øksendal and T. Zhang. Stochastic Calculus for Fractional Brownian Motion and Applications. Springer London, 2008.
[BL78] D.T. Breeden and R.H. Litzenberger. Prices of state-contingent claims implicit in options prices. J. Business, : 621–651, 1978.
[BLP15] M. Bennedsen, A. Lunde and M.S. Pakkanen. Hybrid scheme for Brownian semistationary processes. Finance and Stochastics, (4): 931-965, 2017.
[BS07] O. E. Barndorff-Nielsen and J. Schmiegel. Ambit processes: with applications to turbulence and tumour growth. In F. E. Benth, G. Di Nunno, T. Lindstrøm, B. Øksendal and T. Zhang (Eds.), 2007. Stochastic analysis and applications, volume 2 of Abel Symp.: 93-124, Springer, Berlin.
[CGMY05] P. Carr, H. Geman, D. Madan, and M. Yor. Pricing options on realized variance. Finance and Stochastics, (4): 453–475, 2005.
[Che08] A. Cherny. Brownian moving averages have conditional full support. Ann. App. Prob., (5): 1825-1850, 2008.
[CL09] P. Carr and R. Lee. Volatility Derivatives. Annual Review of Financial Economics (1), 319-339, 2009.
[DS89] J.D. Deuschel and D.W. Stroock. Large Deviations. Academic Press, Boston, 1989.
[DZ10] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, Second Edition. Springer-Verlag Berlin Heidelberg, 2010.
[ER18] O. El Euch and M. Rosenbaum. Perfect hedging under rough Heston models. The Annals of Applied Probability, (6): 3813-3856, 2018.
[ER19] O. El Euch and M. Rosenbaum. The characteristic function of rough Heston models. Mathematical Finance, (1): 3-38, 2019.
[FGP18] P. Friz, P. Gassiat and P. Pigato. Precise asymptotics: robust stochastic volatility models. Preprint available at arXiv:1811.00267, 2018.
[FZ17] M. Forde and H. Zhang. Asymptotics for rough stochastic volatility models. SIAM J. Finan. Math., (1): 114-145, 2017.
[Gas18] P. Gassiat. On the martingale property in the rough Bergomi model. Electron. Commun. Probab. (33), 2019.
[GJR18] J. Gatheral, T. Jaisson and M. Rosenbaum. Volatility is rough. Quantitative Finance, (6): 933-949, 2018.
[GL14] K. Gao and R. Lee. Asymptotics of implied volatility to arbitrary order. Finance and Stochastics, (2): 349-392, 2014.
[HJL18] B. Horvath, A. Jacquier, and C. Lacombe. Asymptotic behaviour of randomised fractional volatility models. Forthcoming Journal of Applied Probability, 2019.
[HJM17] B. Horvath, A. Jacquier and A. Muguruza. Functional central limit theorems for rough volatility. arXiv:1711.03078, 2017.
[HJT18] B. Horvath, A. Jacquier and P. Tankov. Volatility options in rough volatility models. SIAM J. Finan. Math., (2): 437-469, 2020.
[HMT19] B. Horvath, A. Muguruza and M. Tomas. Deep Learning Volatility. arXiv:1901.09647, 2019.
[JMM18] A. Jacquier, C. Martini, A. Muguruza. On VIX Futures in the rough Bergomi model. Quant. Finance, (1): 45-61, 2018.
[JPS18] A. Jacquier, M. Pakkanen, and H. Stone. Pathwise Large Deviations for the rough Bergomi model. Journal of Applied Probability
Quant. Finance, (11): 1877–1886, 2018.
[Neu94] A. Neuberger. The log contract. Journal of Portfolio Management, (2): 74-80, 1994.
[PZ16] D. Pirjol and L. Zhu. Short Maturity Asian Options in Local Volatility Models. SIAM Journal on Financial Mathematics, : 947-992, 2016.
[Sto19] H. Stone. Calibrating rough volatility models: a convolutional neural network approach. Quantitative Finance, (3): 379-392, 2020.

Department of Mathematics, Imperial College London

Department of Mathematics, Imperial College London and Synergis

Department of Mathematics, Imperial College London