Kernel Estimation of Spot Volatility with Microstructure Noise Using Pre-Averaging
José E. Figueroa-López ∗ Bei Wu † April 4, 2020
Abstract
We first revisit the problem of kernel estimation of spot volatility in a general continuous Itô semimartingale model in the absence of microstructure noise, and prove a Central Limit Theorem with optimal convergence rate, which extends Figueroa-López & Li (2020a) by allowing for a general two-sided kernel function. Next, to handle the microstructure noise of ultra high-frequency observations, we present a new type of pre-averaging/kernel estimator for spot volatility under the presence of additive microstructure noise. We prove Central Limit Theorems for the estimation error with an optimal rate and study the problems of optimal bandwidth and kernel selection. As in the case of a simple kernel estimator of spot volatility in the absence of microstructure noise, we show that the asymptotic variance of the pre-averaging/kernel estimator is minimal for exponential or Laplace kernels, hence justifying the need of working with unbounded kernels as proposed in this work. A feasible implementation of the proposed estimators with optimal bandwidth is also developed. Monte Carlo experiments confirm the superior performance of the devised method.
AMS 2000 subject classifications : 62M09, 62G05.
Keywords and Phrases: Spot volatility estimation; kernel estimation; pre-averaging; microstructure noise; bandwidth selection; kernel function selection.
Continuous Itô semimartingale models for the dynamics of asset returns have been widely used in financial econometrics. Such a process takes the form
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t, \qquad (1.1)$$
where $W_t$ is a standard Brownian motion. A key component of the model, the spot volatility $\sigma_t$, plays a crucial role in option pricing, portfolio management, and financial risk management. Over the last decade, there has been growing interest in the estimation of volatility due to the wide availability of high-frequency data. In this work, we are concerned with spot volatility estimation in an Itô semimartingale model via kernel estimation. This is one of the most widely used nonparametric methods in statistics, dating back to the seminal work of Rosenblatt (1956) and Parzen (1962) (see also the monograph Wand & Jones (1995)).

One of the earliest works on kernel-based estimation of spot volatility dates back to Foster & Nelson (1996), where they studied a weighted rolling window estimator, which is essentially a kernel estimator with compact support. Asymptotic normality was established under abstract conditions that are not directly satisfied by a general Itô model of the form (1.1). Concretely, they worked with a time series discretization of the model (1.1). Fan & Wang (2008) also established the asymptotic normality for a general kernel estimator, but the result was limited to a specific constraint on the bandwidth, which allowed them to neglect the target error coming from approximating the spot volatility by a kernel weighted discrete volatility. As a result, the achieved convergence rates were suboptimal.

∗ Department of Mathematics and Statistics, Washington University in St. Louis, St. Louis, MO 63130, USA ([email protected]). Research supported in part by the NSF Grants: DMS-1561141, DMS-1613016.
† Department of Mathematics and Statistics, Washington University in St. Louis, St. Louis, MO 63130, USA ([email protected]).
See also Kristensen (2010) and Mancini et al. (2015) for other Central Limit Theorems (CLT) of kernel-based estimators with suboptimal rate of convergence under different conditions. Alvarez et al. (2012) proposed an estimator of $\sigma_t^p$ by considering forward finite difference approximations of the realized power variation process of order $p$, which is essentially a forward-looking kernel estimator with uniform kernel. Jacod & Protter (2011) considered both backward and forward finite difference approximations of the realized quadratic variation. Both works obtained the best possible convergence rates for their CLTs under a quite general Itô semimartingale model (including jumps). We also refer to (Aït-Sahalia & Jacod, 2014, Chapter 8) for an extensive review of the relevant literature.

More recently, Figueroa-López & Li (2020a) studied the leading order terms of the mean-square error (MSE) of kernel-based estimators under a certain local condition on the covariance function of the spot variance $\sigma_t^2$, which covers not only Brownian-driven volatilities but also those driven by fractional Brownian motion and other Gaussian processes. Using the asymptotics for the MSE, the optimal convergence rate was established and formulas for the optimal bandwidth and kernel functions were derived. CLTs for general one-sided kernel estimators were also obtained (see also Remark 8.10 in Aït-Sahalia & Jacod (2014), where a result for a general right-sided kernel with compact support was stated without proof).
One of the objectives of the present work is then to extend the results of Figueroa-López & Li (2020a) and prove a CLT for a general unbounded two-sided kernel, since such kernels can yield estimators with better performance than one-sided kernels, as shown in Figueroa-López & Li (2020a) (see also Section 3 below).

While the results described in the previous paragraph are useful for intermediate intraday frequencies (e.g., 1 to 5 minutes), it is widely accepted that financial returns at ultra high frequency are contaminated by market microstructure noise. Specifically, high-frequency asset prices exhibit several stylized features, which cannot be accounted for by Itô semimartingales, such as clustering noises, bid/ask bounce effects, and roundoff errors (cf. (Campbell et al., 1997, Chapter 3), Zeng (2003), (Aït-Sahalia & Jacod, 2014, Chapter 2)). Such discrepancies between macro and micro movements are typically modeled by an additive noise. The literature on statistical estimation methods under microstructure noise has grown extensively over the last decade and is still a highly researched subject (see Zhang et al. (2005), Hansen & Lunde (2006), Bandi & Russell (2008), Mykland & Zhang (2012), Barndorff-Nielsen et al. (2008), Podolskij & Vetter (2009), and Jacod et al. (2009) for a few seminal works in the area, as well as the monograph Aït-Sahalia & Jacod (2014)). Most of the existing literature on volatility estimation for high frequency data with microstructure noise has mainly focused on the estimation of integrated volatility or variance (IV), defined as $IV_T = \int_0^T \sigma_t^2\,dt$. Zhang et al.
(2005) showed that, scaled by $(2n)^{-1}$, the realized variance estimator, the gold standard for IV estimation in the absence of microstructure noise, consistently estimates the variance of the microstructure noise, instead of the integrated volatility, as the sampling frequency $n$ increases. There are several approaches to overcome this problem: the Two Scale Realized Variance (TSRV) estimator by Zhang et al. (2005) and the efficient Multiscale Realized Variance by Zhang (2006); the Realized Kernel estimator by Barndorff-Nielsen et al. (2008); the pre-averaging method by Podolskij & Vetter (2009) and Jacod et al. (2009); and the Quasi-Maximum Likelihood Estimator (QMLE) by Xiu (2010).

Spot volatility estimation is often viewed as a byproduct of integrated volatility estimation. Indeed, if we choose a shrinking time span on which the integral of volatility is calculated, then the estimates of integrated volatility should converge to the spot volatility. Following this idea, Zu & Boswijk (2014) construct the Two Scale Realized Spot Variance (TSRSV) estimator based on the TSRV integrated variance estimator. They proved consistency and derived the asymptotic distribution of the estimation error with a convergence rate that is known to be suboptimal.

The second objective of our work is to construct a kernel-based estimator of the spot volatility based on the pre-averaging integrated variance estimator of Jacod et al. (2009). The basic idea is simple and natural. If we denote by $\widehat{IV}^{pre\text{-}av}_t$ the pre-averaging estimator of $IV_t = \int_0^t \sigma_s^2\,ds$, our estimator combines this with a kernel localization technique as follows:
$$\hat{\sigma}_t^2 = \int \frac{1}{b_n}\,K\Big(\frac{s-t}{b_n}\Big)\,d\widehat{IV}^{pre\text{-}av}_s,$$
where $K$ is a suitable kernel function and $b_n > 0$ is the bandwidth, which should converge to $0$ at an appropriate rate. We establish the asymptotic mixed normality of our estimator and identify two asymptotic regimes for two different bandwidth convergence regimes.
One of those regimes yields the optimal convergence rate of $n^{-1/8}$ for our estimator. It is important to point out that the asymptotic theory for the kernel/pre-averaging estimator cannot be derived from that for the pre-averaging integrated variance, and it is also substantially different and harder than that for kernel-based estimators in the absence of microstructure noise. The only related result we know is that of (Aït-Sahalia & Jacod, 2014, Section 8.7), who stated, without proof, a stable convergence result for a pre-averaging type of estimator of the spot volatility, but only in the case of a one-sided uniform kernel $K(t) = \mathbf{1}_{[0,1)}(t)$. Here we consider a general two-sided kernel (see below as to the need of considering such kernels).

As an important application of our results, we study the problem of bandwidth and kernel function selection. Using our CLT, we first derive the optimal bandwidth and then the optimal kernel function (the one that minimizes the limiting variance) at the optimal rate. As in Figueroa-López & Li (2020a), we show that the optimal kernel is a two-sided exponential or Laplace function $K(x) = \frac{1}{2}e^{-|x|}$. This fact justifies the necessity of developing the asymptotic theory for general kernels over the more widely used uniform kernels.

The implementation of the optimal bandwidth (at the optimal rate) is more challenging because it involves the vol vol and the spot volatility itself. Hence, to implement it we develop a new method, which iteratively estimates the spot volatility, the vol vol, and the optimal bandwidth. Using Monte Carlo simulation, we compare our estimator with the TSRSV estimator of Zu & Boswijk (2014) and show a significantly improved accuracy. We also illustrate the improvement achieved by the optimal exponential kernel and the calibrated optimal bandwidth via our iterative method.

We finish the introduction by giving one more reason as to the importance of estimating the spot volatility.
As mentioned above, while spot volatility estimation can, at least conceptually, be seen as a byproduct of integrated variance estimation, interestingly enough, one can also use spot volatility estimation as an intermediate step toward the estimation of integrated volatility functionals of the form $I_T(g) := \int_0^T g(\sigma_s)\,ds$. Specifically, once an estimator $\hat{\sigma}_t$ of $\sigma_t$ has been developed, one can naturally devise an estimator for $I_T(g)$ of the form $\hat{I}_T(g) = \Delta_n \sum_{i=1}^n g(\hat{\sigma}_{t_i})$, where $t_i = i\Delta_n$ and $\Delta_n = T/n$, followed by an appropriate bias correction adjustment. In the absence of noise, Jacod & Rosenbaum (2013), Li et al. (2019), and Mykland & Zhang (2009) have developed methods for the estimation of these functionals (see also Li & Xiu (2016), Aït-Sahalia & Xiu (2019), and Li et al. (2017) for related methods and other applications thereof). Recently, Chen (2019) developed an estimator for $\hat{I}_T(g)$ based on a forward finite difference approximation of the standard pre-averaging estimator of the integrated variance.

The rest of the paper is organized as follows. Section 2 introduces the setting of the problem and the main result. Section 3 shows an application of our main theorem: the optimal parameter and kernel selection. The simulations are provided in Section 4. Proofs of our main results can be found in two appendices.

Throughout, we consider the following stochastic differential equation (SDE):
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t, \qquad (2.1)$$
where all stochastic processes ($\mu := \{\mu_t\}_{t\ge 0}$, $\sigma := \{\sigma_t\}_{t\ge 0}$, $W := \{W_t\}_{t\ge 0}$) are defined on a complete filtered probability space $\big(\Omega^{(0)}, \mathcal{F}^{(0)}, \mathbb{F}^{(0)}, \mathbb{P}^{(0)}\big)$ with filtration $\mathbb{F}^{(0)} = \big(\mathcal{F}^{(0)}_t\big)_{t\ge 0}$, and $W := \{W_t\}_{t\ge 0}$ is a standard Brownian Motion (BM) adapted to the filtration $\mathbb{F}^{(0)}$. For an arbitrary process $\{U_t\}_{t\ge 0}$ and a given time span $\Delta_n > 0$, we shall use the notation $U^n_i := U_{i\Delta_n}$, $\Delta^n_i U := U^n_i - U^n_{i-1}$.
Stable convergence in law is denoted by $\stackrel{st}{\longrightarrow}$ throughout the paper. See (2.2.4) in Jacod & Protter (2011) for the definition of this type of convergence. As usual, $a_n \sim b_n$ means that $a_n/b_n \to 1$ as $n \to \infty$.

Throughout the paper, we consider two settings: observations with and without market microstructure noise. In the absence of microstructure noise, we use standard kernel estimation, while to deal with the noise we propose a type of pre-averaging kernel estimator. These two settings together with the main results are presented in the following two subsections.

In this subsection, we assume that we can directly observe the process $X$ in (2.1) at discrete times $t_i := t_{i,n} := i\Delta_n$, where $\Delta_n := T/n$ and $T \in (0,\infty)$ is a given fixed time horizon. To estimate the spot volatility $\sigma_\tau$, at a given time $\tau \in (0,T)$, we adopt the kernel estimator studied in Fan & Wang (2008), Kristensen (2010), and Figueroa-López & Li (2020a):
$$\hat{\sigma}^{2,(m_n)}_\tau := \sum_{i=1}^n K_{m_n\Delta_n}(t_{i-1} - \tau)\,(\Delta^n_i X)^2, \qquad (2.2)$$
where $K_b(x) := K(x/b)/b$, $m_n \in \mathbb{N}$, and $b_n := m_n\Delta_n$ is the bandwidth of the kernel function. The asymptotic behavior of this estimator with one-sided uniform kernels (i.e., $K(x) = \mathbf{1}_{[0,1)}(x)$ or $K(x) = \mathbf{1}_{[-1,0)}(x)$) was studied in Jacod & Protter (2011). In this part, we extend the results to general two-sided kernels with possibly unbounded support. There is an important motivation for considering such kernels since, as proved in Figueroa-López & Li (2020a) (see also Section 3 below), exponential and some other nonuniform unbounded kernels can yield estimators with better performance than those based on uniform kernels. To establish a central limit theorem for the kernel estimator $\hat{\sigma}^2_\tau$, we first need some assumptions.

Assumption 1.
The process $\{\mu_t\}_{t\ge 0}$ is locally bounded and the spot volatility process $\{\sigma_t\}_{t\ge 0}$ is an Itô process with dynamics
$$d\sigma_t = \tilde{\mu}_t\,dt + \tilde{\sigma}_t\,dB_t, \qquad (2.3)$$
where $B_t$ is a standard Brownian Motion adapted to $\mathbb{F}^{(0)}$ and correlated with $W_t$ so that $d\langle W,B\rangle_t = \rho_t\,dt$. Here, $\{\tilde{\mu}_t\}_{t\ge 0}$ is adapted and locally bounded, $\{\rho_t\}_{t\ge 0}$ is adapted, locally bounded, and càdlàg, and $\{\tilde{\sigma}_t\}_{t\ge 0}$ is adapted and càdlàg, satisfying standard conditions for the process above to be well-defined.

Assumption 2.
The kernel function $K : \mathbb{R} \to \mathbb{R}$ is bounded and
1. $\int K(x)\,dx = 1$;
2. $K$ is Lipschitz and piecewise $C^1$ on $(-\infty,\infty)$;
3. (i) $\int |K(x)\,x|\,dx < \infty$; (ii) $K(x)\,x \to 0$, as $|x| \to \infty$; (iii) $\int |K'(x)|\,dx < \infty$.

We now proceed to describe the limiting distribution of the estimation error of (2.2). Let
$V, V'$ be independent centered Gaussian variables, defined on a "very good" filtered extension $\big(\tilde{\Omega}^{(0)}, \tilde{\mathcal{F}}^{(0)}, \big(\tilde{\mathcal{F}}^{(0)}_t\big)_{t\ge 0}, \tilde{\mathbb{P}}^{(0)}\big)$ of $\big(\Omega^{(0)}, \mathcal{F}^{(0)}, \big(\mathcal{F}^{(0)}_t\big)_{t\ge 0}, \mathbb{P}^{(0)}\big)$ (see Jacod & Protter (2011) for the definition), and independent of $\mathcal{F}^{(0)}$, such that
$$\mathbb{E}\big(V^2\big) = 2\int K^2(u)\,du, \qquad \mathbb{E}\big(V'^2\big) = \int L^2(t)\,dt, \qquad (2.4)$$
where $L(t) = \int_t^\infty K(u)\,du\,\mathbf{1}_{\{t>0\}} - \int_{-\infty}^t K(u)\,du\,\mathbf{1}_{\{t\le 0\}}$. Next, let $Z^{(0)}_\tau, Z'^{(0)}_\tau$ be defined as
$$Z^{(0)}_\tau = \sigma^2_\tau\,V, \qquad Z'^{(0)}_\tau = 2\,\sigma_\tau\tilde{\sigma}_\tau\,V'. \qquad (2.5)$$
Now we are ready to introduce our main theorem for a general kernel estimator in the absence of microstructure noise. The proof is given in Appendix A.

Footnote: Here, $m_n$ is equivalent to $k_n$ in Theorem 13.3.7 of Jacod & Protter (2011), while $m_n\Delta_n$ is equivalent to the bandwidth $h_n$ of Figueroa-López & Li (2020a).

Theorem 2.1.
Let the sequence $\{m_n\}_{n\ge 1}$ that controls the bandwidth of the kernel estimator be such that $m_n \to \infty$, $m_n\Delta_n \to 0$, and $m_n\sqrt{\Delta_n} \to \beta$, for some $\beta \in [0,\infty]$. Then, under Assumptions 1 and 2 above, at a given time $\tau \in [0,T]$, we have the following stable convergence in law, as $n \to \infty$:
$$\text{(i)}\quad \sqrt{m_n}\,\big(\hat{\sigma}^{2,(m_n)}_\tau - \sigma^2_\tau\big) \stackrel{st}{\longrightarrow} Z^{(0)}_\tau + \beta Z'^{(0)}_\tau, \quad \text{if } \beta < \infty,$$
$$\text{(ii)}\quad \frac{1}{\sqrt{m_n}\,\Delta_n}\cdot\frac{1}{m_n}\,\big(\hat{\sigma}^{2,(m_n)}_\tau - \sigma^2_\tau\big)\cdot m_n\sqrt{\Delta_n} = \frac{1}{m_n\Delta_n}\,... \qquad$$
$$\text{(ii)}\quad \frac{1}{m_n\Delta_n}\,\big(\hat{\sigma}^{2,(m_n)}_\tau - \sigma^2_\tau\big) \stackrel{st}{\longrightarrow} Z'^{(0)}_\tau, \quad \text{if } \beta = \infty, \qquad (2.6)$$
where $Z^{(0)}_\tau, Z'^{(0)}_\tau$ are defined as in (2.5).

Remark 2.1.
The result above extends Theorem 6.3 in Figueroa-López & Li (2020a) to general two-sided kernels and to stable convergence in law. The proof here is also different from that of Figueroa-López & Li (2020a) and is based on the approach of Jacod & Protter (2011).
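For concreteness, the estimator (2.2) can be implemented in a few lines. The following is a minimal sketch under illustrative choices (an exponential kernel and a simulated constant-volatility path; the function and variable names are ours, not from the paper):

```python
import numpy as np

def kernel_spot_vol(X, T, tau, b, K):
    """Kernel estimator (2.2): sum_i K_b(t_{i-1} - tau) * (Delta_i X)^2."""
    n = len(X) - 1
    dt = T / n
    t_left = np.arange(n) * dt            # left endpoints t_{i-1}
    dX = np.diff(X)
    w = K((t_left - tau) / b) / b         # K_b(x) = K(x/b)/b
    return float(np.sum(w * dX**2))

def K_exp(x):
    """Two-sided exponential (Laplace) kernel, integrates to 1."""
    return 0.5 * np.exp(-np.abs(x))

# Sanity check on a simulated path with constant volatility sigma = 0.2,
# so the spot variance is sigma^2 = 0.04 at every time point.
rng = np.random.default_rng(0)
n, T, sigma = 200_000, 1.0, 0.2
X = np.concatenate([[0.0], np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n))])
est = kernel_spot_vol(X, T, tau=0.5, b=0.02, K=K_exp)
```

With a constant volatility path the estimate should be close to $\sigma^2 = 0.04$; in the noisy-data setting treated next, this plain estimator is no longer consistent, which motivates the pre-averaging construction below.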
In this part, we assume that our observations of $X$ are contaminated by "microstructure" noise. That is, we assume we observe
$$Y_{t_i} := X_{t_i} + \epsilon_{t_i}, \qquad (2.7)$$
where $\epsilon = \{\epsilon_t\}$ is the noise process and, as before, $t_i := t_{i,n} := i\Delta_n$, $0 \le i \le n$, with $\Delta_n := T/n$ and a fixed time horizon $T \in (0,\infty)$. We allow the noise $\epsilon$ to depend on $X$, but in such a way that, conditionally on the whole process $X$, $\{\epsilon_t\}_{t\ge 0}$ is a family of independent, centered random variables. More formally, following the framework of Jacod & Protter (2011), for each time $t$, we consider a transition probability $Q_t\big(\omega^{(0)}, dz\big)$ from $\big(\Omega^{(0)}, \mathcal{F}^{(0)}_t\big)$ into $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, and the canonical process $\{\epsilon_t\}_{t\ge 0}$ on $\mathbb{R}^{[0,\infty)}$ defined as $\epsilon_t(\tilde{\omega}) = \tilde{\omega}(t)$ for $t \ge 0$ and $\tilde{\omega} \in \mathbb{R}^{[0,\infty)}$. Next, we construct a new probability space $\big(\mathbb{R}^{[0,\infty)}, \mathcal{B}, Q\big)$, where $\mathcal{B}$ is the product Borel $\sigma$-field and $Q = \otimes_{t\ge 0} Q_t$. We then define an enlarged filtered probability space $\big(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P}\big)$ and a filtration $(\mathcal{H}_t)$ as follows:
$$\Omega = \Omega^{(0)} \times \mathbb{R}^{[0,\infty)}, \quad \mathcal{F}_t = \mathcal{F}^{(0)}_t \otimes \sigma(\epsilon_s : s \in [0,t)), \quad \mathcal{H}_t = \mathcal{F}^{(0)} \otimes \sigma(\epsilon_s : s \in [0,t)),$$
$$\mathbb{P}(d\omega^{(0)}, d\tilde{\omega}) = \mathbb{P}^{(0)}(d\omega^{(0)})\,Q(\omega^{(0)}, d\tilde{\omega}). \qquad (2.8)$$
Hence, any variable or process on either $\Omega^{(0)}$ or $\mathbb{R}^{[0,\infty)}$ can be extended in the usual way to a variable or a process on $\Omega$. We are now ready to state the assumptions on the $\mathcal{F}^{(0)}$-conditional law of the noise process, as well as some slightly different assumptions on the spot variance process and kernel function.

Assumption 3.
All variables $(\epsilon_t : t \ge 0)$ are independent conditionally on $\mathcal{F}^{(0)}$, and:
• $\mathbb{E}\big(\epsilon_t\,|\,\mathcal{F}^{(0)}\big) = 0$;
• for all $p > 0$, the process $\mathbb{E}\big(|\epsilon_t|^p\,|\,\mathcal{F}^{(0)}\big)$ is $\big(\mathcal{F}^{(0)}_t\big)$-adapted and locally bounded;
• the conditional variance process $\gamma_t = \mathbb{E}\big(|\epsilon_t|^2\,\big|\,\mathcal{F}^{(0)}\big)$ is càdlàg.

In order to simplify the expressions for the limiting distribution of the estimation error, it is convenient to write Assumption 1 in the following form:
$$d\sigma^2_t = \Gamma_t\,dt + \Lambda_t\,dB_t, \qquad (2.9)$$
where $B_t$ is a standard Brownian Motion adapted to $\mathbb{F}^{(0)}$ such that $d\langle W,B\rangle_t = \rho_t\,dt$, and $\{\Gamma_t\}_{t\ge 0}$ and $\{\Lambda_t\}_{t\ge 0}$ are adapted to $\mathbb{F}^{(0)}$ and satisfy standard conditions for the integrals in (2.9) to be well-defined.

Along the lines of Jacod & Protter (2011) (originally proposed in Jacod et al. (2009)), to construct the pre-averaging estimator, we need:
(i) A sequence of positive integers $k_n$, which represent the length of the pre-averaging window, satisfying
$$k_n = \frac{1}{\theta\sqrt{\Delta_n}} + o\big(\Delta_n^{-1/4}\big), \quad \text{for some } \theta > 0; \qquad (2.10)$$
(ii) A real-valued weight function $g$ on $[0,1]$, which is continuous and piecewise $C^1$ with a piecewise Lipschitz derivative $g'$, such that
$$g(0) = g(1) = 0, \qquad \int_0^1 g^2(s)\,ds = 1.$$
Next, for an arbitrary process $U$, we define the sequences:
$$\bar{U}^n_i = \sum_{j=1}^{k_n-1} g\Big(\frac{j}{k_n}\Big)\,\Delta^n_{i+j-1}U = -\sum_{j=1}^{k_n}\Big(g\Big(\frac{j}{k_n}\Big) - g\Big(\frac{j-1}{k_n}\Big)\Big)\,U^n_{i+j-1},$$
$$\widehat{U}^n_i = \sum_{j=1}^{k_n}\Big(g\Big(\frac{j}{k_n}\Big) - g\Big(\frac{j-1}{k_n}\Big)\Big)^2\,\big(\Delta^n_{i+j-1}U\big)^2. \qquad (2.11)$$
As seen from the definition, $\bar{U}^n_i$ is a weighted average of the increments $\Delta^n_{i+j-1}U$, $j = 1,\dots,k_n-1$, while $\widehat{U}^n_i$ is a de-biasing term.
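The two expressions for $\bar{U}^n_i$ in (2.11) agree because $g(0) = g(1) = 0$ (a discrete summation by parts). A quick numerical check, with an illustrative triangular weight function and a zero-based indexing convention of our own choosing:

```python
import numpy as np

def g(x):
    """Triangular weight function with g(0) = g(1) = 0 (illustrative choice)."""
    return 2.0 * np.minimum(x, 1.0 - x)

def bar_U_increments(U, i, kn):
    """First form of (2.11): weighted average of the increments of U."""
    j = np.arange(1, kn)
    return float(np.sum(g(j / kn) * (U[i + j] - U[i + j - 1])))

def bar_U_levels(U, i, kn):
    """Second form of (2.11): summation by parts onto the levels of U."""
    j = np.arange(1, kn + 1)
    return float(-np.sum((g(j / kn) - g((j - 1) / kn)) * U[i + j - 1]))

def hat_U(U, i, kn):
    """De-biasing term: squared weight differences times squared increments."""
    j = np.arange(1, kn + 1)
    dg = g(j / kn) - g((j - 1) / kn)
    return float(np.sum(dg**2 * (U[i + j] - U[i + j - 1])**2))

rng = np.random.default_rng(1)
U = np.cumsum(rng.standard_normal(200))   # a generic random-walk path
a = bar_U_increments(U, 3, 20)
b = bar_U_levels(U, 3, 20)
h = hat_U(U, 3, 20)
```

The two forms coincide up to floating-point error, and the de-biasing term is nonnegative by construction.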
For a weight function $g$ as above, let
$$\phi_{k_n}(g) = \sum_{i=1}^{k_n} g^2\Big(\frac{i}{k_n}\Big), \qquad \phi'_{k_n}(g) = \sum_{i=1}^{k_n}\Big(g\Big(\frac{i}{k_n}\Big) - g\Big(\frac{i-1}{k_n}\Big)\Big)^2, \qquad (2.12)$$
and note that
$$\phi_{k_n}(g) = k_n\int_0^1 g^2(s)\,ds + O(1) = k_n + O(1), \qquad \phi'_{k_n}(g) = \frac{1}{k_n}\int_0^1 (g'(s))^2\,ds + O\Big(\frac{1}{k_n^2}\Big). \qquad (2.13)$$
Finally, the pre-averaging estimator of the spot variance $\sigma^2_\tau$ at $\tau \in (0,T)$ is defined as
$$\hat{\sigma}^{2,(k_n,m_n)}_\tau = \frac{1}{\phi_{k_n}(g)}\sum_{j=1}^{n-k_n+1} K_{m_n\Delta_n}(t_{j-1} - \tau)\Big(\big(\bar{Y}^n_j\big)^2 - \tfrac{1}{2}\widehat{Y}^n_j\Big), \qquad (2.14)$$
where, as before, $K_b(x) = K(x/b)/b$. The following result establishes the asymptotic behavior of the estimation error for the proposed estimator. The proof is given in Appendix B.

Footnote: It is enough to ask $g \in L^2([0,1])$, but, since the pre-averaging estimator is invariant to scalings of the weight function $g$, without loss of generality, we can impose the condition $\|g\|_{L^2} = 1$ as we did above.

Theorem 2.2. Let $\{m_n\}_{n\ge 1}$ be a sequence of positive integers such that $m_n \to \infty$, $m_n\Delta_n \to 0$, $m_n\sqrt{\Delta_n} \to \infty$, and $m_n\Delta_n^{3/4} \to \beta$ for some $\beta \in [0,\infty]$, and let $k_n$ and $g$ be as described in (i)-(ii) above. Then, under the model described by Eqs.
(2.1), (2.7), and (2.9) and Assumptions 2 and 3, the pre-averaging estimator (2.14) is such that, as $n \to \infty$,
$$\text{(i)}\quad m_n^{1/2}\Delta_n^{1/4}\,\big(\hat{\sigma}^{2,(k_n,m_n)}_\tau - \sigma^2_\tau\big) \stackrel{st}{\longrightarrow} Z_\tau + \beta Z'_\tau, \quad \text{if } \beta < \infty,$$
$$\text{(ii)}\quad \frac{1}{m_n\Delta_n}\,\big(\hat{\sigma}^{2,(k_n,m_n)}_\tau - \sigma^2_\tau\big) \stackrel{st}{\longrightarrow} Z'_\tau, \quad \text{if } \beta = \infty, \qquad (2.15)$$
for $\tau \in (0,T)$, where $Z_\tau, Z'_\tau$ are defined on a good extension $\big(\tilde{\Omega}, \tilde{\mathcal{F}}, \big(\tilde{\mathcal{F}}_t\big)_{t\ge 0}, \tilde{\mathbb{P}}\big)$ of the space $\big(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P}\big)$ and, conditionally on $\mathcal{F}$, are independent Gaussian random variables with conditional variances
$$\delta_1(\tau) := \tilde{\mathbb{E}}\big(Z^2_\tau\,|\,\mathcal{F}\big) = 4\big(\Phi_{22}\,\sigma^4_\tau/\theta + 2\Phi_{12}\,\sigma^2_\tau\gamma_\tau\,\theta + \Phi_{11}\,\gamma^2_\tau\,\theta^3\big)\int K^2(u)\,du,$$
$$\delta_2(\tau) := \tilde{\mathbb{E}}\big(Z'^2_\tau\,|\,\mathcal{F}\big) = \Lambda^2_\tau\int L^2(t)\,dt, \qquad (2.16)$$
with $\phi_1(s) = \int_s^1 g'(u)\,g'(u-s)\,du$, $\phi_2(s) = \int_s^1 g(u)\,g(u-s)\,du$, $\Phi_{ij} = \int_0^1 \phi_i(s)\,\phi_j(s)\,ds$, and $L(t) = \int_t^\infty K(u)\,du\,\mathbf{1}_{\{t>0\}} - \int_{-\infty}^t K(u)\,du\,\mathbf{1}_{\{t\le 0\}}$.

Remark 2.2.
For the estimation of the integrated variance (IV), $[X,X]_T = \int_0^T \sigma^2_t\,dt$, Jacod et al. (2009) proposed the pre-averaging estimator:
$$\widehat{[X,X]}_s := \frac{1}{\phi_{k_n}(g)}\,\frac{s}{s - k_n\Delta_n}\sum_{j=1}^{[s/\Delta_n]-k_n+1}\Big(\big(\bar{Y}^n_j\big)^2 - \tfrac{1}{2}\widehat{Y}^n_j\Big), \quad \text{for } s \in (0,T], \qquad (2.17)$$
and showed the following limiting behavior:
$$\Delta_n^{-1/4}\Big(\widehat{[X,X]}_T - [X,X]_T\Big) \stackrel{st}{\longrightarrow} U^{noise}_T, \qquad (2.18)$$
where $U^{noise}_T$ is a centered Gaussian process with conditional variance
$$\delta_T := \mathbb{E}\Big(\big(U^{noise}_T\big)^2\,\big|\,\mathcal{F}\Big) = \int_0^T 4\big(\Phi_{22}\,\sigma^4_t/\theta + 2\Phi_{12}\,\sigma^2_t\gamma_t\,\theta + \Phi_{11}\,\gamma^2_t\,\theta^3\big)\,dt.$$
The spot volatility estimator (2.14) can be viewed as a localization of IV in that
$$\hat{\sigma}^2_t \approx \int K_{m_n\Delta_n}(s-t)\,d\widehat{[X,X]}_s. \qquad (2.19)$$
More specifically, the factor $\frac{s}{s-k_n\Delta_n}$ is omitted for the spot volatility estimator, as suggested in (Aït-Sahalia & Jacod, 2014, Section 8.7). We can then heuristically argue that, for the spot volatility estimator, the variance of the estimation error at time $t$ should be close to
$$\sqrt{\Delta_n}\int K^2_{m_n\Delta_n}(s-t)\,d\delta_s \approx \frac{1}{m_n\sqrt{\Delta_n}}\,4\big(\Phi_{22}\,\sigma^4_t/\theta + 2\Phi_{12}\,\sigma^2_t\gamma_t\,\theta + \Phi_{11}\,\gamma^2_t\,\theta^3\big)\int K^2(u)\,du, \qquad (2.20)$$
which is indeed the case when we have $m_n\Delta_n^{3/4} \to \beta = 0$, as formally shown in Theorem 2.2.

3 Optimal parameter and kernel selection

The tuning parameters $\theta, \beta$ and the kernel function $K$ affect the variance of the limiting distribution of the estimation error. In this section, as an application of our main Theorem 2.2, we show how to find the tuning parameters and kernel function of the estimator in order to minimize the asymptotic variance of the estimation error. By necessity, the optimal choices of $\theta$ and $\beta$ will be given in terms of the integrated variance and quarticity, $IV_T := \int_0^T\sigma^2_t\,dt$ and $QrT_T := \int_0^T\sigma^4_t\,dt$, respectively, the Integrated Volatility of Volatility (IVV), $\int_0^T\Lambda^2_t\,dt$, and the integrated variance of the noise $\epsilon_t$, $\int_0^T\gamma^2_t\,dt$.
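Before turning to parameter selection, the estimator (2.14) can be sketched end-to-end. This is a minimal illustration, not the paper's code: the triangular weight, exponential kernel, constant volatility, i.i.d. Gaussian noise, and all tuning values below are illustrative choices of ours:

```python
import numpy as np

def g(x):
    return 2.0 * np.minimum(x, 1.0 - x)      # triangular weight, g(0) = g(1) = 0

def K_exp(x):
    return 0.5 * np.exp(-np.abs(x))          # exponential kernel, integrates to 1

def preavg_kernel_spot_var(Y, T, tau, kn, b, K):
    """Sketch of the pre-averaging kernel estimator (2.14) of sigma_tau^2."""
    n = len(Y) - 1
    dt = T / n
    dY = np.diff(Y)
    w = g(np.arange(1, kn) / kn)                       # pre-averaging weights
    dg = np.diff(g(np.arange(kn + 1) / kn))            # g(j/kn) - g((j-1)/kn)
    m = n - kn + 1                                     # number of windows
    bar_Y = np.array([np.dot(w, dY[i:i + kn - 1]) for i in range(m)])
    hat_Y = np.array([np.dot(dg**2, dY[i:i + kn]**2) for i in range(m)])
    phi_kn = np.sum(g(np.arange(1, kn + 1) / kn)**2)   # phi_{k_n}(g) of (2.12)
    t_left = np.arange(m) * dt
    wk = K((t_left - tau) / b) / b                     # K_b(t_{j-1} - tau)
    return float(np.sum(wk * (bar_Y**2 - 0.5 * hat_Y)) / phi_kn)

# Sanity check: constant volatility sigma = 0.2 plus i.i.d. Gaussian noise,
# so the target spot variance is sigma^2 = 0.04 at every time point.
rng = np.random.default_rng(2)
n, T, sigma, noise_sd = 23400, 1.0, 0.2, 1e-3
X = np.concatenate([[0.0], np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n))])
Y = X + noise_sd * rng.standard_normal(n + 1)
kn = int(1 / (2 * np.sqrt(T / n)))   # k_n ~ 1/(theta sqrt(dt)) with theta = 2
est = preavg_kernel_spot_var(Y, T, tau=0.5, kn=kn, b=0.1, K=K_exp)
```

The de-biasing term $\frac{1}{2}\widehat{Y}^n_j$ removes the noise-induced bias of $(\bar{Y}^n_j)^2$; without it the estimate drifts upward, as the simulations of Section 4 illustrate.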
We can estimate $\int_0^T \Lambda^2_t\,dt$ and $\int_0^T \gamma^2_t\,dt$ separately, while for $IV_T$ and $QrT_T$, we can devise an iterative method in which an initial rough estimate of $\sigma^2_t$ on a grid of $[0,T]$ is used to determine estimates of the integrated variance and quarticity. These estimates can in turn be used to find suitable estimates of the optimal values for $\theta$ and $\beta$. These values are then applied in the kernel pre-averaging estimator (2.14) to refine our estimate of $\sigma^2_t$ on the grid.

3.1 Optimal choice of θ

Recall we set $k_n = \frac{1}{\theta\sqrt{\Delta_n}} + o\big(\Delta_n^{-1/4}\big)$ and, thus, the parameter $\theta$ determines the length of the pre-averaging window $k_n$. The following corollary can easily be derived from Theorem 2.2.

Corollary 3.1.
The optimal value of $\theta$, which is set to minimize $\int_0^T \delta_1(t)\,dt$, is given by
$$\theta^\star = \sqrt{\frac{\sqrt{\Phi_{12}^2 + 3\,\Phi_{11}\Phi_{22}} - \Phi_{12}}{3\,\Phi_{11}}}\;\Bigg(\frac{\int_0^T \sigma^4_t\,dt}{\int_0^T \gamma^2_t\,dt}\Bigg)^{1/4}. \qquad (3.1)$$

3.2 Optimal bandwidth selection

Define the conditional Mean Integrated Square Error (MISE),
$$MISE := \mathbb{E}_\sigma\bigg[\int_0^T \big(\hat{\sigma}^{2,(k_n,m_n)}_\tau - \sigma^2_\tau\big)^2\,d\tau\bigg], \qquad (3.2)$$
where $\mathbb{E}_\sigma$ denotes the conditional expectation with respect to the sigma-field generated by $\{\sigma_t\}_{t\in[0,T]}$. From Theorem 2.2, we can deduce that when $m_n$ (the bandwidth length in $\Delta_n$ units) is of the form $m_n = \beta\,\Delta_n^{-3/4}$ for some constant $\beta \in (0,\infty)$, the optimal convergence rate of $\Delta_n^{1/8}$ is attained and we further have:
$$\Delta_n^{-1/8}\big(\hat{\sigma}^{2,(k_n,m_n)}_\tau - \sigma^2_\tau\big) \stackrel{st}{\longrightarrow} \beta^{-1/2}\big(Z_\tau + \beta Z'_\tau\big). \qquad (3.3)$$
Therefore, the limiting distribution has conditional variance $\big(\beta^{-1}\delta_1(\tau) + \beta\,\delta_2(\tau)\big)$, where $\delta_1(\tau)$ and $\delta_2(\tau)$ are given as in (2.16). This result suggests that (3.2) can be approximated by
$$\Delta_n^{1/4}\int_0^T\Big(\frac{1}{\beta}\,\delta_1(\tau) + \beta\,\delta_2(\tau)\Big)\,d\tau. \qquad (3.4)$$
We then have the following result.

Corollary 3.2.
Let
$$\Theta(\theta) := \Theta(\theta; g) := \frac{\Phi_{22}}{\theta}\int_0^T\sigma^4_t\,dt + 2\Phi_{12}\,\theta\int_0^T\gamma_t\sigma^2_t\,dt + \Phi_{11}\,\theta^3\int_0^T\gamma^2_t\,dt. \qquad (3.5)$$
With the bandwidth $b_n = m_n\Delta_n = \beta\,\Delta_n^{1/4}$, the optimal value of $b_n$, which is set to minimize the approximated MISE (3.4), is given by
$$b^\star_n = \sqrt{\frac{\int_0^T\delta_1(t)\,dt}{\int_0^T\delta_2(t)\,dt}}\;\Delta_n^{1/4} = \Delta_n^{1/4}\sqrt{\frac{4\,\Theta(\theta)\int K^2(u)\,du}{\int_0^T\Lambda^2_t\,dt\int L^2(v)\,dv}}. \qquad (3.6)$$
With this optimal bandwidth, the integrated variance of the limiting distribution for the estimation error is given by
$$2\sqrt{\int_0^T\delta_1(t)\,dt\int_0^T\delta_2(t)\,dt} = 4\sqrt{\Theta(\theta)\int_0^T\Lambda^2_t\,dt\int K^2(u)\,du\int L^2(v)\,dv}. \qquad (3.7)$$
Note that $b^\star_n$ contains unknown theoretical quantities that need to be estimated in order to devise a plug-in type estimator. Under the assumption of $\gamma_t \equiv \gamma$, the variance of the noise, $\gamma$, can be estimated using the estimator in Zhang et al. (2005):
$$\hat{\gamma} = \frac{1}{2n}\sum_{i=1}^n\big(Y^n_i - Y^n_{i-1}\big)^2. \qquad (3.8)$$
For the estimation of the IVV, $\int_0^T\Lambda^2_t\,dt$, we introduce the following estimator. We start by obtaining a preliminary estimate of the spot variance path $\sigma^2$ on the grid $\tau \in \{t_i\}_{i=0,\dots,n}$ via the estimator (2.14), starting with some sensible initial estimates of the tuning parameter values. For example, we can set $b_n = m_n\Delta_n = \Delta_n^{1/4}$. Let us call $\hat{\sigma}^2_{t_i,0}$ these initial estimates. We can then use sparsely sampled (say, 5-minute) spot variance estimates to estimate the IVV via a standard Realized Variance estimator
$$\widehat{IVV}_{T,0} := \sum_{i=0}^{[n/p]-1}\big(\hat{\sigma}^2_{t_{(i+1)p},0} - \hat{\sigma}^2_{t_{ip},0}\big)^2,$$
for some positive integer $p \ll n$. We also implemented a pre-averaging integrated variance estimator for the IVV based on the spot variance estimates. However, the choice of tuning parameters here could be tricky and the performance is similar to the Realized Variance estimator above.
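A minimal sketch of the noise-variance estimator (3.8) and of the sparsely sampled Realized Variance estimator of the IVV follows; all simulation values (volatility, noise level, vol-of-vol, subsampling step) are illustrative and not from the paper:

```python
import numpy as np

def noise_variance(Y):
    """Estimator (3.8): hat(gamma) = (1/(2n)) * sum_i (Y_i - Y_{i-1})^2."""
    dY = np.diff(Y)
    return float(np.sum(dY**2) / (2.0 * len(dY)))

def ivv_realized_variance(spot_var, p):
    """Realized variance of the sparsely sampled spot-variance path (IVV estimate)."""
    sub = spot_var[::p]
    return float(np.sum(np.diff(sub)**2))

rng = np.random.default_rng(3)
n, T = 23400, 1.0
sigma, noise_sd = 0.2, 5e-3
X = np.concatenate([[0.0], np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n))])
Y = X + noise_sd * rng.standard_normal(n + 1)
# At high frequency the squared returns are dominated by the noise, so
# gamma_hat is close to noise_sd**2 (the signal only adds a bias of order dt).
gamma_hat = noise_variance(Y)

# IVV check on a synthetic variance path with constant vol-of-vol lam,
# for which the quadratic variation over [0, T] is lam^2 * T.
lam = 0.05
v = 0.04 + np.cumsum(lam * np.sqrt(T / n) * rng.standard_normal(n + 1))
ivv_hat = ivv_realized_variance(v, p=100)
```

The IVV estimator is insensitive to the subsampling step $p$ in expectation; sparse sampling is used, as in the text, to reduce the impact of estimation error in the preliminary spot-variance grid.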
As for $\int_0^T\sigma^4_t\,dt$, we can simply compute the sum of squares of the preliminary estimates $\hat{\sigma}^2_{t_i,0}$ and multiply by $\Delta_n$. Now, with these estimates, we can calculate an estimate of the optimal bandwidth $b^\star_n$ using the result of Corollary 3.2. Such an approximate optimal bandwidth can then be used to refine our estimates of the spot variance grid. Continuing this procedure iteratively, we hope to obtain good estimates of the optimal bandwidth.

Note that (3.6) sets the same bandwidth for the entire path of $X$. We can also consider a local or non-homogeneous bandwidth: for $\tau \in [0,T]$, the optimal local bandwidth is defined to minimize the variance of the estimation error at time $\tau$. By setting $m_n = \beta\,\Delta_n^{-3/4}$ and minimizing the resulting asymptotic variance of the estimation error derived from Theorem 2.2, the optimal local bandwidth is given by
$$b^{\star,local}_n = \sqrt{\frac{\delta_1(\tau)}{\delta_2(\tau)}}\;\Delta_n^{1/4} = \Delta_n^{1/4}\sqrt{\frac{4\,\Theta_\tau(\theta)\int K^2(u)\,du}{\Lambda^2_\tau\int L^2(u)\,du}}, \qquad (3.9)$$
with $\delta_1(\tau)$ and $\delta_2(\tau)$ defined as in Theorem 2.2 and $\Theta_\tau(\theta)$ defined as:
$$\Theta_\tau(\theta) := \frac{\Phi_{22}}{\theta}\,\sigma^4_\tau + 2\Phi_{12}\,\gamma_\tau\theta\,\sigma^2_\tau + \Phi_{11}\,\gamma^2_\tau\,\theta^3. \qquad (3.10)$$
With this optimal bandwidth, the variance of the limiting distribution for the estimation error is given by
$$2\sqrt{\delta_1(\tau)\,\delta_2(\tau)} = 4\sqrt{\Theta_\tau(\theta)\,\Lambda^2_\tau\int K^2(u)\,du\int L^2(u)\,du}. \qquad (3.11)$$
Since the bandwidth now has the flexibility to depend on the volatility, we may expect it to have a better performance than the homogeneous bandwidth. We will analyze this point in greater detail in the Monte Carlo simulations of Section 4.

Remark 3.1. We can see the constant bandwidth (3.6) as an approximation of the optimal local bandwidth (3.9), where the average values $\bar{\Theta}(\theta) := \int_0^T\Theta_t(\theta)\,dt/T$ and $\bar{\Lambda}^2 := \int_0^T\Lambda^2_t\,dt/T$ are used as proxies of the spot values $\Theta_\tau(\theta)$ and $\Lambda^2_\tau$, respectively. These global proxies have the advantage of being easier and more accurate to estimate.
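For the exponential kernel, the constants $\int K^2(u)\,du$ and $\int L^2(u)\,du$ entering (3.6) and (3.9) can be computed numerically (both equal $1/4$ for $K(x) = \frac{1}{2}e^{-|x|}$). The plug-in bandwidth below uses illustrative placeholder values for $\Theta(\theta)$ and $\int_0^T\Lambda^2_t\,dt$, which would in practice come from the iterative procedure described above:

```python
import numpy as np

def K_exp(x):
    return 0.5 * np.exp(-np.abs(x))

# Numerical values of the kernel constants entering (3.6):
x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
Kvals = K_exp(x)
int_K2 = np.sum(Kvals**2) * dx                # int K(u)^2 du   (= 1/4 here)

tail = np.cumsum(Kvals[::-1])[::-1] * dx      # ~ int_x^inf K(u) du
L_vals = np.where(x > 0, tail, tail - 1.0)    # L(t), using int K = 1
int_L2 = np.sum(L_vals**2) * dx               # int L(t)^2 dt   (= 1/4 here)

def optimal_bandwidth(Theta, int_Lambda2, Delta_n):
    """Plug-in version of (3.6): b* = Dn^{1/4} sqrt(4 Theta K2 / (Lambda2 L2))."""
    return Delta_n**0.25 * np.sqrt(4.0 * Theta * int_K2 / (int_Lambda2 * int_L2))

# Placeholder inputs, purely for illustration of the formula's shape:
b_star = optimal_bandwidth(Theta=1.0, int_Lambda2=1.0, Delta_n=1.0 / 23400)
```

With both constants equal to $1/4$, the ratio inside the square root in (3.6) simplifies and the kernel-dependent factor drops out for this particular kernel.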
3.3 Optimal kernel selection

With the optimal bandwidths of Section 3.2, we can now obtain a formula for the asymptotic variance, which enjoys an explicit dependence on the kernel function $K$. It is then natural to attempt to find the kernel that minimizes such a variance. As observed from (3.7) or (3.11), we only need to minimize
$$I(K) = \int K^2(u)\,du\int L^2(u)\,du = \int K^2(u)\,du\iint_{xy\ge 0} K(x)K(y)\,(|x|\wedge|y|)\,dx\,dy, \qquad (3.12)$$
over all kernels $K$ such that $\int K(u)\,du = 1$, where for the second equality above we used that $L(t) = \int_t^\infty K(u)\,du\,\mathbf{1}_{\{t>0\}} - \int_{-\infty}^t K(u)\,du\,\mathbf{1}_{\{t\le 0\}}$. It has been proved in Figueroa-López & Li (2020a), Section 4.1, that, among all the kernel functions satisfying Assumption 2, the exponential kernel function $K_{exp}(x) = \frac{1}{2}\exp(-|x|)$ is the one that minimizes the functional $I(K)$. As shown in Figueroa-López & Li (2020a), exponential kernels also have another computational advantage, since they allow one to reduce the time complexity for estimating the volatility on all grid points $t_j$ from $O(n^2)$ to $O(n)$. This property is particularly useful when working with high-frequency observations.

4 Simulations

In this section, we study the performance of the kernel pre-averaging estimator (2.14), together with the implementation procedure described in Subsection 3.2, and compare the results with the Two Scale Realized Spot Variance (TSRSV) estimator proposed in Zu & Boswijk (2014).
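The $O(n)$ evaluation with an exponential kernel noted above rests on standard forward/backward recursions for exponentially weighted sums. A minimal illustration on a generic sequence, not tied to the de-biasing details of (2.14):

```python
import numpy as np

def exp_kernel_sums(a, dt, b):
    """All sums S_j = (1/(2b)) * sum_i exp(-|t_i - t_j|/b) * a_i in O(n) time."""
    n = len(a)
    r = np.exp(-dt / b)
    F = np.empty(n)                 # F_j = sum_{i <= j} r^(j-i) a_i
    B = np.empty(n)                 # B_j = sum_{i >= j} r^(i-j) a_i
    F[0] = a[0]
    for j in range(1, n):
        F[j] = r * F[j - 1] + a[j]
    B[n - 1] = a[n - 1]
    for j in range(n - 2, -1, -1):
        B[j] = r * B[j + 1] + a[j]
    return (F + B - a) / (2.0 * b)  # a_j is counted in both F_j and B_j

def exp_kernel_sums_naive(a, dt, b):
    """O(n^2) brute force, for comparison."""
    t = np.arange(len(a)) * dt
    return np.array([np.sum(np.exp(-np.abs(t - tj) / b) * a) for tj in t]) / (2.0 * b)

rng = np.random.default_rng(4)
a = rng.standard_normal(500)
fast = exp_kernel_sums(a, dt=0.01, b=0.1)
slow = exp_kernel_sums_naive(a, dt=0.01, b=0.1)
```

Replacing $a_i$ by the de-biased pre-averaged statistics of (2.14) gives the whole spot-variance grid in a single pass, which is what makes the exponential kernel attractive at the second-by-second frequencies considered next.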
Throughout, we consider the Heston model:
$$dX_t = (\mu - v_t/2)\,dt + v_t^{1/2}\,dW_t, \qquad Y_{t_i} = X_{t_i} + \epsilon_{t_i},$$
$$dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dB_t, \qquad (4.1)$$
where we assume $B_t = \rho W_t + \sqrt{1-\rho^2}\,\tilde{W}_t$, with $\tilde{W}$ being a Brownian motion independent of $W$. Note that the variance process is given by $\sigma^2_t = v_t$. We adopt the same parameter values as in Zhang et al. (2005), but properly normalized so that the time unit is one day:
$$\mu = 0.05/252, \quad \kappa = 5/252, \quad \alpha = 0.04/252, \quad \gamma = 0.5/252, \quad \rho = -0.5. \qquad (4.2)$$
We set the noise as $\epsilon^n_i := \epsilon_{t_i}$, i.i.d. centered Gaussian, and the initial values to $X_0 = 1$ and $v_0 = 0.04/252$. We use the usual triangular weight function $g(x) = 2\,(x \wedge (1-x))$. We simulate data for one day ($T = 1$), and assume the data is observed once every second, with 6.5 trading hours per day. The number of observations is then $n = 23400$.

4.2 Validity of the asymptotic theory and necessity of de-biasing

We first show that the asymptotic behavior of the estimation error is consistent with our theoretical result. By Corollary 3.2, the optimal rate of convergence of the estimation error is attained when the bandwidth takes the form $m^\star_n\Delta_n = \beta\,\Delta_n^{1/4}$, for some $\beta \in (0,\infty)$, and, thus, we only analyze the first case of Theorem 2.2. We aim to estimate the spot variance $v_{0.5} := \sigma^2_{0.5}$ using our pre-averaging kernel estimator (2.14), with $\beta = 1$ and the exponential kernel. The histogram of the estimation errors, $\hat{v}_{0.5} - v_{0.5}$, based on 25,000 simulated paths, is shown in Figure 1. We also plot the theoretical density of the estimation error as prescribed by Theorem 2.2, but with the true parameter values for $\gamma$ and $\theta$, and replacing $v_{0.5} = \sigma^2_{0.5}$ with the mean of $v_{0.5}$ over all 25,000 paths. As it can be seen, the theoretical density is consistent with the empirical results.

Figure 1: Histogram of $\hat{\sigma}^2_t$ at $t = 0.5$
and the density of the theoretical limiting distribution.

To investigate the need for the bias-correction term Ŷ_j^n in (2.14), let us consider a new estimator without it, ṽ_τ = Σ_{j=1}^{n−k_n+1} K_{m_nΔ_n}(t_{j−1} − τ)(Ȳ_j^n)². We show the histogram of the estimation errors ṽ_τ − v_τ for 25,000 simulated paths and, for comparison, also plot the same theoretical asymptotic density function of Figure 1. As shown in the left panel of Figure 2, the estimator ṽ_τ significantly overestimates the spot variance, which demonstrates the necessity of the bias-correction term Ŷ_j^n in (2.14).

Figure 2: Left panel: the effect of the bias-correction term. Right panel: comparison of the asymptotic distributions for the uniform and exponential kernels.

Before analyzing the empirical performance of the estimators for different kernels, we compare the theoretical asymptotic densities of the estimation error for the exponential and uniform kernels. This is shown in the right panel of Figure 2. We can see therein that, as predicted in Subsection 3.3, the exponential kernel estimator has a much smaller asymptotic variance.

We now proceed to compare the finite-sample performance of the estimator (2.14) for different kernels. We consider both a no-leverage setting (ρ = 0) and a negative-correlation setting (ρ = −0.5). In order to alleviate boundary effects, the following estimator is used in the simulations (as proposed in Kristensen (2010)):

σ̂²(k_n, m_n)_τ = [ Σ_{j=1}^{n−k_n+1} K_{m_nΔ_n}(t_{j−1} − τ) ( (Ȳ_j^n)² − Ŷ_j^n ) ] / [ φ_{k_n}(g) Δ_n Σ_{j=1}^n K_{m_nΔ_n}(t_{j−1} − τ) ].   (4.3)

For the jth simulated path {X_{t_i}^{(j)} : 0 ≤ i ≤ n, t_i = iT/n}, we estimate the corresponding skeleton of the spot variance process, {σ²_{t_i,j}}_{i=1,…,n}, using θ = 5 and the initial bandwidth coefficient β = 1. The estimated path is denoted by {σ̂²_{t_i,j}}_{i=1,…,n}.
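The simulation-and-estimation workflow above can be sketched in a self-contained Python example. This is an illustration rather than the paper's implementation: the noise level `omega`, the full-truncation Euler scheme, and the bias-correction and normalization conventions (`phi`, the noise-bias term in `bias`) are simplifying assumptions chosen so that the estimator is approximately unbiased for a (locally) constant volatility; the paper's exact definitions are in (2.14) and (4.3).

```python
import numpy as np

def simulate_heston_noisy(n, T=1.0, mu=0.05/252, kappa=5/252, alpha=0.04/252,
                          gamma=0.5/252, rho=-0.5, x0=1.0, omega=5e-4, seed=0):
    """Euler scheme for the Heston model (4.1), observed with additive noise."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), n)
    dWp = rng.normal(0.0, np.sqrt(dt), n)
    dB = rho * dW + np.sqrt(1.0 - rho**2) * dWp   # leverage correlation
    X = np.empty(n + 1); v = np.empty(n + 1)
    X[0], v[0] = x0, alpha
    for i in range(n):
        vp = max(v[i], 0.0)                       # full truncation keeps sqrt well-defined
        X[i + 1] = X[i] + (mu - vp / 2.0) * dt + np.sqrt(vp) * dW[i]
        v[i + 1] = v[i] + kappa * (alpha - vp) * dt + gamma * np.sqrt(vp) * dB[i]
    Y = X + rng.normal(0.0, omega, n + 1)         # noisy observations Y_{t_i}
    return Y, v

def preavg_kernel_spot_var(Y, T, tau, theta=5.0, beta=1.0):
    """Sketch of a bias-corrected pre-averaging/kernel spot-variance estimate."""
    n = len(Y) - 1
    dt = T / n
    k = int(np.ceil(theta / np.sqrt(dt)))         # pre-averaging window k_n
    h = beta * dt**0.25                           # optimal-order bandwidth m_n * dt
    g = lambda x: np.minimum(2.0 * x, 2.0 * (1.0 - x))   # triangular weight
    w = g(np.arange(1, k) / k)                    # g(i/k_n), i = 1..k_n-1
    dg = np.diff(g(np.arange(0, k + 1) / k))      # g(i/k_n) - g((i-1)/k_n)
    phi = np.sum(w**2)                            # normalization (a phi_{k_n}(g) convention)
    dY = np.diff(Y)
    m = n - k + 1
    Ybar2 = np.array([(w @ dY[j:j + k - 1])**2 for j in range(m)])
    bias = np.array([0.5 * (dg**2 @ dY[j:j + k]**2) for j in range(m)])
    Kh = 0.5 * np.exp(-np.abs(np.arange(m) * dt - tau) / h) / h  # exponential kernel
    return float(np.sum(Kh * (Ybar2 - bias)) / (phi * dt * np.sum(Kh)))
```

With the daily normalization of (4.2), the true spot variance stays near α = 0.04/252 ≈ 1.6e-4, and the estimate at an interior time τ should be of that order.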
Next, we calculate the average of the squared errors (ASE),

ASE_j = (1/(n − 2l + 1)) Σ_{i=l}^{n−l} (σ̂²_{t_i,j} − σ²_{t_i,j})².

Here, l is taken to be a small fraction of n in order to further alleviate boundary effects. Then, we take the square root of the average of the ASEs over all the simulated paths:

\widehat{RMSE} = ( (1/m) Σ_{j=1}^m ASE_j )^{1/2},

where m is the number of simulations. This is an estimate of

RMSE = ( E[ (1/(n − 2l + 1)) Σ_{i=l}^{n−l} (σ̂²_{t_i} − σ²_{t_i})² ] )^{1/2}.

Next, we fix θ = 5 and apply the iterative homogeneous bandwidth selection method introduced in Subsection 3.2 to further investigate the performance of different kernels. We report the estimated \widehat{RMSE} with the initial bandwidth coefficient β = 1, together with the result of the iterative bandwidth selection method after one iteration, in Table 1 for the following four kernels: the exponential kernel K_exp(x) = (1/2)e^{−|x|}, the uniform kernel K_unif(x) = (1/2)1_{|x|<1}, the triangular kernel K_tri(x) = (1 − |x|)1_{|x|<1}, and the Epanechnikov kernel K_epa(x) = (3/4)(1 − x²)1_{|x|<1}. This shows that, indeed, the exponential kernel provides the best performance.
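The ASE/RMSE aggregation above can be sketched as follows; the boundary-trimming fraction `trim` is an illustrative stand-in for the paper's choice of l:

```python
import numpy as np

def rmse_hat(sig2_hat, sig2_true, trim=0.1):
    """Estimated RMSE over simulated paths.

    sig2_hat, sig2_true: (num_paths, n) arrays holding the estimated and true
    spot-variance skeletons. A fraction `trim` of grid points is dropped at
    each boundary before averaging the squared errors."""
    num_paths, n = sig2_hat.shape
    l = int(trim * n)
    err2 = (sig2_hat[:, l:n - l] - sig2_true[:, l:n - l])**2
    ase = err2.mean(axis=1)                 # ASE_j, one value per simulated path
    return float(np.sqrt(ase.mean()))       # sqrt of the ASE average over paths
```

For instance, if every estimated skeleton is off by a constant 0.1, `rmse_hat` returns exactly 0.1.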
[Table 1: estimated \widehat{RMSE} (ρ = 0) for the kernels K_exp, K_unif, K_tri, and K_epa, with the initial bandwidth coefficient β = 1 and with the optimal bandwidth selection.]

First, we show that the suboptimal bandwidth, which corresponds to β = 0 in Theorem 2.2, indeed performs worse than the optimal bandwidth, even though its asymptotic variance is easier to estimate without the βZ'_τ term. In Table 2, we compare the optimal bandwidth h₁ = βΔ_n^{1/4} with two suboptimal bandwidths h₂ and h₃ of the form h = βΔ_n^a, for exponents a > 1/4, using the exponential kernel with β = 1, 2, 3, 4, on 1000 simulated paths. The results show the advantage of using the optimal bandwidth for the same level of the coefficient β.

[Table 2: estimated \widehat{RMSE} (ρ = −0.5) for the optimal bandwidth h₁ and the suboptimal bandwidths h₂ and h₃, for β = 1, 2, 3, 4.]

The parameter θ, which controls the length of the pre-averaging window k_n via k_n = θΔ_n^{−1/2}, has a comparatively smaller effect on the performance of the estimator than the bandwidth. Therefore, throughout this section, we fix θ = 5, which is computed by (3.1) using the true parameter values, and consider different bandwidth selection techniques. (We also considered other values of θ, and the results were similar.)
In Table 3, we report the estimated RMSE for different bandwidth selection methods. For the homogeneous bandwidth selection method (3.6), we apply the realized variance of sparsely sampled (5-min) spot variance estimates {σ̂²_{t_i}} to estimate the vol-of-vol quantity ∫_0^T Λ_t² dt, as described in Section 3.2. We fix the estimated vol-of-vol after the first iteration to prevent the increased variance brought by the iterative method. The first two iterations are shown in the first two columns of the table, and we can see that the second iteration does not improve the result significantly. Therefore, one iteration of the bandwidth selection method is sufficient in practice. For the local bandwidth method, we use ∫_0^T Λ_t² dt/T as a proxy of Λ_τ² in formula (3.9). As a reference, we also give the results of using an oracle optimal bandwidth, computed from the true parameter values and the simulated spot variance process via Eqs. (3.6) and (3.9) for the optimal homogeneous and optimal local bandwidths, respectively. In the last column, we provide the result of a semi-oracle type of bandwidth, where we use the estimated spot variance "skeleton" {σ̂²_{t_i}} to estimate ∫_0^T σ_t² dt and ∫_0^T σ_t⁴ dt via Riemann sums, while using the true value of the parameter γ given in (4.2) to estimate ∫_0^T Λ_t² dt = γ² ∫_0^T σ_t² dt. The last simplification is possible due to the special structure of the diffusion coefficient of the variance process in the Heston model (4.1). A similar approach can be applied to other popular volatility models, such as CEV models. As we can see therein, the data-driven approaches (first two columns) are quite close to the oracle and semi-oracle estimates.

[Table 3: estimated \widehat{RMSE} (ρ = −0.5) for the different bandwidth selection methods; for reference, the \widehat{RMSE} with β = 1 is 1.4086. Columns 2 and 3 show the results of the 1st and 2nd iterations of the bandwidth selection method; columns 4 and 5 show the results using the oracle and semi-oracle bandwidths, respectively.]

Remark 4.1.
Theoretically, the estimator with a local bandwidth has the flexibility to adjust its bandwidth at different times based on the data. Therefore, this estimator should be able to achieve a lower bound of the approximate MISE defined in (3.4):

E_σ[ ∫_0^T (σ̂²_t − σ²_t)² dt ] ≈ Δ_n^{1/4} ( ∫_0^T β_t^{−1} δ₁(t) dt + ∫_0^T β_t δ₂(t) dt ).   (4.4)

However, the simulations show that the performance of the local bandwidth with known parameter values is almost the same as that of the homogeneous bandwidth. To further investigate this phenomenon, in the left panel of Figure 3 we show the estimated RMSE for different fixed values of τ against the coefficient β in the bandwidth formula b_n = βΔ_n^{1/4}. As before, we simulate the Heston model (4.1) with the same parameters as in (4.2), but with the vol-of-vol parameter γ = 1/252. We can conclude from the figure that the optimal bandwidths for different τ's are almost the same and consistent with the optimal bandwidth based on the asymptotic variance of the estimator. Thus, an estimator with homogeneous bandwidth can achieve a similar result without extra computational cost. This trend is less obvious when the vol-of-vol parameter γ is relatively small. In the right panel of Figure 3, we show the estimated RMSE vs. β for a smaller value of γ. The perceived almost-flat trend as the bandwidth increases shows that, in this case, the realized variance can serve as a good proxy of the spot volatility, at least for the purpose of tuning the parameters of the estimators, since the spot volatility estimator degenerates to the integrated volatility estimator when the bandwidth gets large. Note, however, that the MSE paths are slowly ticking up as β increases, and each of those paths exhibits an optimal bandwidth. (We also applied the pre-averaging estimate of the quarticity given in Jacod et al. (2010), but the results were less accurate.)

Figure 3: Left panel: RMSE vs. bandwidth when γ = 1/252. Right panel: RMSE vs. bandwidth for a smaller value of γ.
These are again relatively close for different times τ and also close to the theoretical optimal bandwidth. In conclusion, when the vol-of-vol parameter is not known, the theoretical optimal bandwidth can provide a good guideline for empirical experiments, and a homogeneous bandwidth is sufficient to achieve similar results as the local bandwidth, while reducing the estimation error and computational cost caused by the latter.

Finally, we compare the estimated RMSE of our pre-averaging kernel estimator to that of the TSRSV estimator proposed in Zu & Boswijk (2014) on 2000 paths. We take the leverage ρ = −0.5, choose several different tuning-parameter values for the TSRSV estimator, and report the top three parameter combinations. We also tried the optimal tuning parameters proposed in Zu & Boswijk (2014), but the result is not as good as the ones reported here (the RMSE is about 2.064055e-04).

[Table 4: estimated \widehat{RMSE} of the kernel estimator and of the TSRSV estimator for the top three tuning-parameter combinations.]
A Proof of Theorem 2.1

We follow the steps in the proof of Theorem 13.3.3 in Jacod & Protter (2011) (which implies Theorem 13.3.7 therein). By virtue of localization, without loss of generality, we assume throughout the proof that µ̃_t, σ̃_t, σ_t, and ρ_t are bounded (see Section 4.4.1 in Jacod & Protter (2011) and Appendix A.5 in Aït-Sahalia & Jacod (2014) for details). We use C to denote a generic constant that may change from line to line.

We first introduce some notation. Recall that U_i^n := U_{iΔ_n} and, for t ∈ ((i−1)Δ_n, iΔ_n], let

V_t^n := Σ_{j=1}^n K_{m_nΔ_n}(t_{j−1} − t) ((Δ_j^n W)² − Δ_n),
V'_t^n := Δ_n Σ_{j=1}^n K_{m_nΔ_n}(t_{j−1} − t) (B_{j−1}^n − B_i^n),
Z_t^n := (σ_i^n)² V_t^n,   Z'_t^n := 2 σ_i^n σ̃_i^n V'_t^n,
Z''_t^n := σ̂²(m_n)_t − σ_t² − Z_t^n − Z'_t^n.   (A.1)

All three cases in Theorem 2.1 follow from the next two lemmas:

Lemma A.1.
Under the standing assumptions, and assuming that {µ̃_t}, {σ̃_t}, and {σ_t} are bounded, with Δ_n → 0, m_nΔ_n → 0, and m_n√Δ_n → ∞, we have the following stable convergence in law:

( √m_n Z_t^n , (m_nΔ_n)^{−1/2} Z'_t^n ) →st ( Z_t^{(0)}, Z'^{(0)}_t ),

where Z_t^{(0)} and Z'^{(0)}_t are defined in (2.5).

Lemma A.2.
Under the standing assumptions, and assuming that µ̃_t, σ̃_t, and σ_t are bounded, we have, for all t ∈ [0, T],

z_n^{(0)} Z''_t^n →P 0,

where

z_n^{(0)} = m_n^{1/2}, if m_nΔ_n^{1/2} → β < ∞;   z_n^{(0)} = (m_nΔ_n)^{−1/2}, if m_nΔ_n^{1/2} → β = ∞.

We prove these two lemmas in the next two subsections.
A.1 Proof of Lemma A.1

We first show that

( √m_n V_t^n , (m_nΔ_n)^{−1/2} V'_t^n ) →st (V, V'),   (A.2)

where (V, V') is defined in (2.4). Denote the bandwidth of the kernel by b_n := m_nΔ_n and recall that t ∈ ((i−1)Δ_n, iΔ_n]. We can write the pair (√m_n V_t^n, (m_nΔ_n)^{−1/2} V'_t^n) as Σ_{j=1}^n (ζ_j^n(t), ζ'_j^n(t)), where

ζ_j^n(t) = √m_n K_{b_n}(t_{j−1} − t) ((Δ_j^n W)² − Δ_n),

ζ'_j^n(t) = (Δ_n/√(m_nΔ_n)) × { 0, if j = 1;  −( Σ_{l=1}^{j−1} K_{b_n}(t_{l−1} − t) ) Δ_j^n B, if 2 ≤ j ≤ i;  ( Σ_{l=j}^n K_{b_n}(t_{l−1} − t) ) Δ_j^n B, if i < j ≤ n }.

Each (ζ_j^n(t), ζ'_j^n(t)) is F_{t_j}^{(0)}-measurable and, with F_j^{(0)} := F_{t_j}^{(0)},

Σ_{j=1}^n E( ζ_j^n(t) | F_{j−1}^{(0)} ) = 0,   Σ_{j=1}^n E( ζ'_j^n(t) | F_{j−1}^{(0)} ) = 0.

Recall that ρ_s = d⟨W, B⟩_s/ds is càdlàg and bounded on the interval [t_{j−1}, t_j]. By Itô's lemma, the Cauchy-Schwarz inequality, and Doob's inequality, we have

| E( (Δ_j^n W)² Δ_j^n B | F_{j−1}^{(0)} ) | = 2 | ∫_{t_{j−1}}^{t_j} E( ρ_s ( ∫_{t_{j−1}}^s dW_u ) | F_{j−1}^{(0)} ) ds |
= 2 | ∫_{t_{j−1}}^{t_j} E( (ρ_s − ρ_{t_{j−1}}) ( ∫_{t_{j−1}}^s dW_u ) | F_{j−1}^{(0)} ) ds |
≤ 2 ∫_{t_{j−1}}^{t_j} ( E( (ρ_s − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) )^{1/2} √Δ_n ds
≤ C Δ_n^{3/2} ( E( (ρ_{t_j} − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) )^{1/2}.
Then, by a change of variables,

| Σ_{j=1}^n E( ζ_j^n(t) ζ'_j^n(t) | F_{j−1} ) |
≤ C √Δ_n Σ_{j=2}^{i} |K_{b_n}(t_{j−1} − t)| ( Σ_{l=1}^{j−1} |K_{b_n}(t_{l−1} − t)| ) | E( (Δ_j^n W)² Δ_j^n B | F_{j−1}^{(0)} ) |
+ C √Δ_n Σ_{j=i+1}^n |K_{b_n}(t_{j−1} − t)| ( Σ_{l=j}^n |K_{b_n}(t_{l−1} − t)| ) | E( (Δ_j^n W)² Δ_j^n B | F_{j−1}^{(0)} ) |
≤ C Δ_n² Σ_{j=2}^{i} |K_{b_n}(t_{j−1} − t)| ( Σ_{l=1}^{j−1} |K_{b_n}(t_{l−1} − t)| ) max_j ( E( (ρ_{t_j} − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) )^{1/2}
+ C Δ_n² Σ_{j=i+1}^n |K_{b_n}(t_{j−1} − t)| ( Σ_{l=j}^n |K_{b_n}(t_{l−1} − t)| ) max_j ( E( (ρ_{t_j} − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) )^{1/2}
≤ C ∫ |K(u)| |L(u)| du · max_j ( E( (ρ_{t_j} − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) )^{1/2}.

We note that ρ is right-continuous and uniformly bounded on [0, T]; thus,

max_j E( (ρ_{t_j} − ρ_{t_{j−1}})² | F_{j−1}^{(0)} ) → 0.

Therefore,

Σ_{j=1}^n E( ζ_j^n(t) ζ'_j^n(t) | F_{j−1} ) → 0, as n → ∞.
Moreover,

Σ_{j=1}^n E( (ζ_j^n(t))² | F_{j−1} ) = 2 Σ_{j=1}^n m_n Δ_n² K_{b_n}²(t_{j−1} − t) → 2 ∫ K²(u) du,

Σ_{j=1}^n E( (ζ'_j^n(t))² | F_{j−1} ) = (Δ_n²/m_n) [ Σ_{j=i+1}^n ( Σ_{m=j}^n K_{b_n}(t_{m−1} − t) )² + Σ_{j=2}^{i} ( Σ_{m=1}^{j−1} K_{b_n}(t_{m−1} − t) )² ]
∼ (1/(m_nΔ_n)) [ ∫_t^T ( ∫_v^T K_{b_n}(s − t) ds )² dv + ∫_0^t ( ∫_0^v K_{b_n}(s − t) ds )² dv ] → ∫ L²(u) du,

where L(t) = ∫_t^∞ K(u) du 1_{t>0} − ∫_{−∞}^t K(u) du 1_{t≤0}. Note also that

Σ_{j=1}^n { E( (ζ_j^n(t))⁴ | F_{j−1} ) + E( (ζ'_j^n(t))⁴ | F_{j−1} ) } ≤ C m_n^{−1} ∫ K⁴(u) du + C m_n^{−1} ∫ L⁴(u) du → 0,

where we used that (Δ_j^n W)²/Δ_n is distributed as the square of a standard normal random variable and C is a generic constant. To apply Theorem 2.2.15 in Jacod & Protter (2011), we further need to show that

(i) Σ_{j=1}^n E( ζ_j^n(t) (M_{t_j} − M_{t_{j−1}}) | F_{j−1} ) →P 0,   (ii) Σ_{j=1}^n E( ζ'_j^n(t) (M_{t_j} − M_{t_{j−1}}) | F_{j−1} ) →P 0,   (A.3)

whenever M is either one of the components of (W, B) or belongs to the set N of all bounded (F_t^{(0)})-martingales orthogonal (in the martingale sense) to (W, B). When M = W or B, (A.3-i) holds true, since each summand is the F_{(j−1)Δ_n}^{(0)}-conditional expectation of an odd function of the increments of the process W after time (j−1)Δ_n.
On the other hand, by the boundedness of the process ρ, we have |E( Δ_j^n B Δ_j^n W | F_{j−1} )| = | E( ∫_{t_{j−1}}^{t_j} ρ_s ds | F_{j−1} ) | ≤ CΔ_n, for some constant C, and, thus, (A.3-ii) can be shown as follows:

Σ_{j=1}^n E( ζ'_j^n(t) (M_{t_j} − M_{t_{j−1}}) | F_{j−1} )
≤ C (Δ_n²/√(m_nΔ_n)) [ Σ_{j=i+1}^n | Σ_{m=j}^n K_{b_n}(t_{m−1} − t) | + Σ_{j=2}^{i} | Σ_{m=1}^{j−1} K_{b_n}(t_{m−1} − t) | ]
≤ C √(m_nΔ_n) ∫ |L(u)| du → 0.

Suppose now that M is a bounded martingale orthogonal to (W, B). By Itô's formula, we see that ζ_j^n(t) can be written as 2√m_n K_{b_n}(t_{j−1} − t) ∫_{t_{j−1}}^{t_j} (W_s − W_{t_{j−1}}) dW_s, i.e., a stochastic integral with respect to W on the interval [(j−1)Δ_n, jΔ_n]. Similarly, ζ'_j^n(t) is a stochastic integral with respect to B on the same interval. Then the orthogonality of M and (W, B) implies (A.3). Now, we can apply Theorem 2.2.15 in Jacod & Protter (2011) to conclude that

( √m_n V_t^n , (m_nΔ_n)^{−1/2} V'_t^n ) →st (V, V'),

where (V, V') is defined in (2.4). Finally, recall that Z_t^n := (σ_i^n)² V_t^n and Z'_t^n := 2σ_i^n σ̃_i^n V'_t^n. From the càdlàg property of σ and σ̃, we see that σ_i^n → σ_t and σ̃_i^n → σ̃_t, for t ∈ ((i−1)Δ_n, iΔ_n]. Then Lemma A.1 follows from (A.2) and the following property of stable convergence in law:

Z_n →st Z, Y_n →P Y ⟹ (Y_n, Z_n) →st (Y, Z).

A.2 Proof of Lemma A.2
For t ∈ ((i−1)Δ_n, iΔ_n], we can rewrite Z''_t^n defined in (A.1) as follows:

Z''_t^n = Σ_{l=1}^5 ζ_l^n(t),

where

ζ_1^n(t) = (σ_i^n)² Δ_n Σ_{j=1}^n K_{b_n}(t_{j−1} − t) − σ_t²,
ζ_2^n(t) = Σ_{j=1}^n K_{b_n}(t_{j−1} − t) ( (Δ_j^n X)² − (σ_{j−1}^n)² (Δ_j^n W)² ),
ζ_3^n(t) = 2σ_i^n σ̃_i^n Σ_{j=1}^n K_{b_n}(t_{j−1} − t) ((Δ_j^n W)² − Δ_n)(B_{j−1}^n − B_i^n),
ζ_4^n(t) = Σ_{j=1}^n K_{b_n}(t_{j−1} − t) (σ_{j−1}^n − σ_i^n)² (Δ_j^n W)²,
ζ_5^n(t) = 2σ_i^n Σ_{j=1}^n K_{b_n}(t_{j−1} − t) ( σ_{j−1}^n − σ_i^n − σ̃_i^n (B_{j−1}^n − B_i^n) ) (Δ_j^n W)².

Therefore, it is enough to prove that, for l = 1, 2, 3, 4, 5, and all t ∈ [0, T], we have

z_n^{(0)} ζ_l^n(t) →P 0.   (A.4)

Proof of (A.4) for l = 1. By Lemma 3.1 in Figueroa-López & Li (2020b), with f ≡ 1, and our assumptions on K, we have

Δ_n Σ_{j=1}^n K_{b_n}(t_{j−1} − t) − ∫_0^T K_{b_n}(s − t) ds = (1/2)( K(A+) − K(B−) ) Δ_n/b_n + o(Δ_n/b_n) = O(Δ_n/b_n),

where (A, B) is the support of K and −∞ ≤ A < 0 < B ≤ ∞. Therefore, the boundedness of σ implies

ζ_1^n(t) = (σ_i^n)² ( ∫_0^T K_{b_n}(s − t) ds ) − σ_t² + O(Δ_n/b_n) = (σ_i^n)² − σ_t² + C ∫_{(0,T)^c} K_{b_n}(s − t) ds + O(Δ_n/b_n).

Note that E( (σ_i^n)² − σ_t² )² ≤ CΔ_n, and our assumptions on K imply x^{1/2} ∫_x^∞ K(u) du → 0, as x → ∞. We then have

b_n^{−1/2} ∫_{(0,T)^c} K_{b_n}(s − t) ds = b_n^{−1/2} ( ∫_{−∞}^{−t/b_n} K(u) du + ∫_{(T−t)/b_n}^{∞} K(u) du ) → 0, as n → ∞.

Thus, z_n^{(0)} ζ_1^n(t) →P 0, since z_n^{(0)}√Δ_n → 0, z_n^{(0)}Δ_n/b_n → 0, and

z_n^{(0)} ∼ β b_n^{−1/2}, if m_nΔ_n^{1/2} → β < ∞;   z_n^{(0)} = b_n^{−1/2}, if m_nΔ_n^{1/2} → β = ∞.

Proof of (A.4) for l = 2. Let ρ_j^n(t) = Δ_j^n X − σ_{j−1}^n Δ_j^n W.
In view of (2.1.44) in Jacod & Protter (2011), for q ≥ 2, we have:

E( |ρ_j^n(t)|^q ) ≤ K_q Δ_n^q,   E( |σ_{j−1}^n Δ_j^n W|^q ) ≤ C Δ_n^{q/2}.

Then, since |(Δ_j^n X)² − (σ_{j−1}^n)²(Δ_j^n W)²| ≤ |ρ_j^n(t)|² + 2|ρ_j^n(t)| |σ_{j−1}^n Δ_j^n W|, the inequalities above and the Cauchy-Schwarz inequality yield

E|ζ_2^n(t)| ≤ Σ_{j=1}^n |K_{b_n}(t_{j−1} − t)| ( E|ρ_j^n(t)|² + 2√( E|ρ_j^n(t)|² E|σ_{j−1}^n Δ_j^n W|² ) )
≤ C Σ_{j=1}^n |K_{b_n}(t_{j−1} − t)| ( Δ_n² + Δ_n^{3/2} ) ∼ C √Δ_n ∫ |K(u)| du.

We then have the result, since z_n^{(0)} √Δ_n → 0.

Proof of (A.4) for l = 3. ζ_3^n(t) can be written as 2σ_i^n σ̃_i^n Φ_n(t), where σ_i^n σ̃_i^n is bounded and F_{iΔ_n}^{(0)}-measurable, and

Φ_n(t) = Σ_{j=1}^n K_{b_n}(t_{j−1} − t) ((Δ_j^n W)² − Δ_n)(B_{j−1}^n − B_i^n).

We can compute that E(Φ_n(t)) = 0 and |E( Δ_j^n W Δ_j^n B )| = |E( ∫_{t_{j−1}}^{t_j} ρ_s ds )| ≤ CΔ_n. Notice that (Δ_j^n W)² − Δ_n and B_{j−1}^n − B_i^n are independent when j ≥ i+1, and (Δ_j^n W)² − Δ_n and B_i^n − B_j^n are independent when j ≤ i.
Then, by the tower property, we have

E( Φ_n(t)² ) = Σ_{j=i+1}^n K_{b_n}²(t_{j−1} − t)·2Δ_n²·(j−1−i)Δ_n + Σ_{j=1}^{i} K_{b_n}²(t_{j−1} − t) E( ((Δ_j^n W)² − Δ_n)² (B_i^n − B_j^n + Δ_j^n B)² )
≤ 2Δ_n² Σ_{j=i+1}^n K_{b_n}²(t_{j−1} − t)(t_{j−1} − t_i) + Σ_{j=1}^{i} K_{b_n}²(t_{j−1} − t) ( 2Δ_n²(t_i − t_j) + √( E(((Δ_j^n W)² − Δ_n)⁴) E((Δ_j^n B)⁴) ) )
≤ 2Δ_n² Σ_{j=i+1}^n K_{b_n}²(t_{j−1} − t)(t_{j−1} − t_i) + Σ_{j=1}^{i} K_{b_n}²(t_{j−1} − t) ( 2Δ_n²(t_i − t_j) + CΔ_n³ )
∼ 2Δ_n ( ∫_t^T K_{b_n}²(s − t)(s − t) ds − ∫_0^t K_{b_n}²(s − t)(s − t) ds )
∼ 2Δ_n ( ∫_0^∞ K²(u) u du − ∫_{−∞}^0 K²(u) u du ),

where C = ( E((χ² − 1)⁴) E(χ⁴) )^{1/2} for a standard normal random variable χ. Then Δ_n^{−1/2} Φ_n(t) is bounded in probability, and the result follows, since z_n^{(0)} √Δ_n → 0.

Proof of (A.4) for l = 4. For j ≥ i+1, we have E( (σ_{j−1}^n − σ_i^n)⁴ | F_{iΔ_n}^{(0)} ) ≤ C((j−1−i)Δ_n)², in view of (2.1.44) in Jacod & Protter (2011), page 43. Therefore, E( (σ_{j−1}^n − σ_i^n)⁴ ) ≤ C(|j−1−i|Δ_n)², and

E|ζ_4^n(t)| ≤ C Σ_{j=1}^n |K_{b_n}(t_{j−1} − t)| √( E(σ_{j−1}^n − σ_i^n)⁴ E(Δ_j^n W)⁴ ) ≤ C Σ_{j=1}^n |K_{b_n}(t_{j−1} − t)| |j−1−i| Δ_n²
∼ C ∫_0^T |K_{b_n}(s − t)| |s − t| ds ∼ C b_n ∫ |K(u)| |u| du.

The result follows with z_n^{(0)} b_n → 0.

Proof of (A.4) for l = 5. ζ_5^n(t) can be written as

2σ_i^n Σ_{j=1}^n K_{b_n}(t_{j−1} − t) η_j^n(t) (Δ_j^n W)²,

where η_j^n(t) = σ_{j−1}^n − σ_i^n − σ̃_i^n(B_{j−1}^n − B_i^n) = ∫_{t_i}^{t_{j−1}} µ̃_s ds + ∫_{t_i}^{t_{j−1}} (σ̃_s − σ̃_i^n) dB_s. We then have that E( η_j^n(t)² ) ≤ C|j−1−i|Δ_n γ_j^n, with γ_j^n = (1/(|j−1−i|Δ_n)) E( ∫_{iΔ_n}^{(j−1)Δ_n} (σ̃_s − σ̃_i^n)² ds ).
Since σ̃ is càdlàg and bounded, we see that γ_j^n → 0 for all j. By successive conditioning and the above, plus the boundedness of σ and the Cauchy-Schwarz inequality, we have

E|ζ_5^n(t)| ≤ C Σ_{j=1}^n |K_{b_n}(t_{j−1} − t)| Δ_n √( |j−1−i|Δ_n γ_j^n ) = o( ∫_0^T |K_{b_n}(s − t)| √|s − t| ds ) = o( √b_n ∫ |K(u)| √|u| du ).

Then the result follows, since z_n^{(0)} √b_n is bounded (it converges to β or 1).

B Proof of Theorem 2.2

We first introduce some notation needed for the proofs. Then, we recall some needed estimates and preliminary results. Finally, we proceed to prove the result through three lemmas.

By virtue of localization, without loss of generality, we assume throughout the proof that Γ_t, Λ_t, and σ_t are bounded and that {|σ_t|}_{t≤T} is bounded below by a constant c > 0 (see Section 4.4.1 in Jacod & Protter (2011) and Appendix A.5 in Aït-Sahalia & Jacod (2014) for details).

Needed Notation
1. Define

φ(Y)_i^n = (Ȳ_i^n)² − (1/2) Ŷ_i^n = (X̄_i^n + ε̄_i^n)² − (1/2) Ŷ_i^n,
φ_{i,j}^n = ( σ_{(i−j−1)Δ_n} W̄_i^n + ε̄_i^n )² − (1/2) ε̂_i^n,
Ψ_{i,j}^n = E( φ_{i,j}^n | H_{(i−1)Δ_n} ) − ( σ_{(i−j−1)Δ_n} W̄_i^n )².   (B.1)

2. With any process U, we associate the variables

Γ(U)_i^n = sup_{t ∈ ((i−1)Δ_n, iΔ_n + k_nΔ_n]} | U_t − U_{(i−1)Δ_n} |,   Γ'(U)_i^n = ( E( (Γ(U)_i^n)² | F_{(i−1)Δ_n} ) )^{1/2}.

Some Preliminary Estimates and Results
1. By Lemma 16.5.14 in Jacod & Protter (2011), for some constant C,

| E( φ(Y)_i^n − φ_{i,0}^n | F_{(i−1)Δ_n} ) | ≤ C Δ_n^{3/4} ( Δ_n^{1/4} + Γ'(µ)_i^n + Γ'(σ̃)_i^n + Γ'(γ)_i^n ),   (B.2)

where σ̃_t = σ_t Λ_t.

2. As in Lemma 16.5.15 in Jacod & Protter (2011), if an array (δ_i^n) satisfies

0 ≤ δ_i^n ≤ K,   Δ_n E( Σ_{i=1}^n δ_i^n ) → 0,   (B.3)

then, for any q > 0, the array (|δ_i^n|^q) also satisfies (B.3). Furthermore, if U is a càdlàg bounded process, the two arrays (Γ(U)_i^n) and (Γ'(U)_i^n) also satisfy (B.3).

3. Under our assumptions and (2.9), by Lemma 16.5.13 in Jacod & Protter (2011), we have, for all q > 0,

E( |φ(Y)_i^n|^q + |φ_{i,0}^n|^q | F_{(i−1)Δ_n} ) ≤ C_q Δ_n^{q/2},   (B.4)
E( |φ(Y)_i^n − φ_{i,0}^n|² | F_{(i−1)Δ_n} ) ≤ C Δ_n ( Δ_n^{1/2} + (Γ'(σ)_i^n)² ).   (B.5)

Similarly, we can obtain

E( |φ_{i,j}^n|^q | F_{(i−j−1)Δ_n} ) ≤ C_q Δ_n^{q/2},   (B.6)

since σ is bounded.

4. Let γ'_t denote the conditional fourth moment of the noise, γ'_t = E( |ε_t|⁴ | F^{(0)} ). Under our assumptions and (2.9), by Lemma 16.5.12 in Jacod & Protter (2011), Ψ_{i,j}^n defined in (B.1) satisfies

E( |Ψ_{i,j}^n| | F_{(i−1)Δ_n} ) ≤ C Δ_n + C Δ_n^{3/4} ( Γ'(γ)_i^n + Γ'(γ')_i^n ),   (B.7)
E( |Ψ_{i,j}^n|² | F_{(i−1)Δ_n} ) ≤ C Δ_n^{3/2}.   (B.8)

5. As n → ∞, so that m_n → ∞ and m_nΔ_n → 0,

m_n Δ_n² Σ_i K²_{m_nΔ_n}(t_{i−1} − τ) → ∫ K²(x) dx,   Δ_n Σ_{i=j}^n |K_{m_nΔ_n}(t_{i−1} − τ)| → ∫ |K(x)| dx.   (B.9)
6. By Itô's lemma and the Burkholder-Davis-Gundy inequalities (see Section 2.1.5 in Jacod & Protter (2011)), we have, for all s, t ≥ 0 and p ≥ 2,

E( sup_{r∈[0,s]} |σ²_{t+r} − σ²_t|^p | F_t ) ≤ C_p s^{p/2},   (B.10)
E( sup_{r∈[0,s]} |σ_{t+r} − σ_t|^p | F_t ) ≤ C_p s^{p/2}.   (B.11)

Also, we have

Γ'(σ)_i^n = ( E( sup_{t∈((i−1)Δ_n, iΔ_n + k_nΔ_n]} |σ_t − σ_{(i−1)Δ_n}|² | F_{(i−1)Δ_n} ) )^{1/2} ≤ C (k_nΔ_n)^{1/2}.   (B.12)

The following decomposition will be instrumental in deducing the behavior of the estimation error:

σ̂²(k_n, m_n)_τ − σ_τ² = Σ_{l=1}^5 H_n^{(l)},

where

H_n^{(1)} = (1/φ_{k_n}(g)) Σ_{i=1}^{n−k_n+1} K_{m_nΔ_n}(t_{i−1} − τ) ( φ_{i,0}^n − E( φ_{i,0}^n | F_{(i−1)Δ_n} ) ),
H_n^{(2)} = ∫_0^T K_{m_nΔ_n}(t − τ) σ_t² dt − σ_τ²,
H_n^{(3)} = (1/φ_{k_n}(g)) Σ_{i=1}^{n−k_n+1} K_{m_nΔ_n}(t_{i−1} − τ) ( φ(Y)_i^n − φ_{i,0}^n ),
H_n^{(4)} = (1/φ_{k_n}(g)) Σ_{i=1}^{n−k_n+1} K_{m_nΔ_n}(t_{i−1} − τ) E( Ψ_{i,0}^n | F_{(i−1)Δ_n} ),
H_n^{(5)} = Δ_n Σ_{i=1}^{n−k_n+1} K_{m_nΔ_n}(t_{i−1} − τ) σ²_{t_{i−1}} − ∫_0^T K_{m_nΔ_n}(t − τ) σ_t² dt.

The first term is the statistical error, while the second term is the local approximation error. Each of these will contribute one term to the asymptotic variance in (2.15-i). Up to a negligible term, which is analyzed in H_n^{(4)}, the third term is obtained by freezing the volatility σ in X̄_i^n at the value σ_{(i−1)Δ_n}. The last term analyzes the error due to approximating the integral by its associated Riemann sum.

Theorem 2.2 will follow from the following lemmas:

Lemma B.1.
Under the standing assumptions and (2.9), with m_n → ∞, m_nΔ_n → 0, and m_n√Δ_n → ∞, we have

m_n^{1/2} Δ_n^{1/4} H_n^{(1)} →st Z_τ,

where Z_τ is described in Theorem 2.2.

Lemma B.2.
Under the standing assumptions and (2.9), with m_n → ∞, m_nΔ_n → 0, and m_nΔ_n^{3/4} → β ∈ (0, ∞),

m_n^{1/2} Δ_n^{1/4} ( H_n^{(1)} + H_n^{(2)} ) →st Z_τ + βZ'_τ,

where Z'_τ is described in Theorem 2.2.

Lemma B.3. Under the standing assumptions and (2.9), assuming m_nΔ_n^{3/4} → β ∈ [0, ∞], we have

z_n H_n^{(l)} →P 0, for l = 3, 4, 5,   (B.13)

where

z_n = m_n^{1/2} Δ_n^{1/4}, if m_nΔ_n^{3/4} → β < ∞;   z_n = (m_nΔ_n)^{−1/2}, if m_nΔ_n^{3/4} → β = ∞.

We prove the lemmas above in three steps. In Step 1, we prove the last lemma, which is more straightforward than the other two. In Step 2, we prove Lemma B.1. In Step 3, we show Lemma B.2.
Step 1
For l = 3, set

ζ_i^n = (1/φ_{k_n}(g)) K_{m_nΔ_n}(t_{i−1} − τ) ( φ(Y)_i^n − φ_{i,0}^n ).

By Lemma 2.2.10 in Jacod & Protter (2011), the result follows if the array z_n E( |ζ_i^n| | F_{(i−1)Δ_n} ) is asymptotically negligible. To this end, note that (B.2) yields

E( |ζ_i^n| | F_{(i−1)Δ_n} ) ≤ C Δ_n^{1/2} Δ_n^{3/4} |K_{m_nΔ_n}(t_{i−1} − τ)| ( Δ_n^{1/4} + Γ'(µ)_i^n + Γ'(σ̃)_i^n + Γ'(γ)_i^n ),

where we recall that σ̃, µ, and γ are càdlàg bounded processes by localization. Thus, from Lemma 16.5.15 in Jacod & Protter (2011), ((Γ'(σ̃)_i^n)²), ((Γ'(µ)_i^n)²), and ((Γ'(γ)_i^n)²) satisfy (B.3). By the Cauchy-Schwarz inequality and (B.9),

Δ_n Σ_{i=1}^{n−k_n+1} |K_{m_nΔ_n}(t_{i−1} − τ)| E( Γ'(µ)_i^n ) ≤ ( Σ_i Δ_n K²_{m_nΔ_n}(t_{i−1} − τ) )^{1/2} ( Σ_i Δ_n E((Γ'(µ)_i^n)²) )^{1/2} = o( (m_nΔ_n)^{−1/2} ).   (B.14)

We can obtain similar results for Γ'(σ̃)_i^n and Γ'(γ)_i^n. Thus,

z_n Σ_{i=1}^{n−k_n+1} E( |ζ_i^n| | F_{(i−1)Δ_n} ) ≤ O( z_n Δ_n^{1/2} ) + o( z_n m_n^{−1/2} Δ_n^{−1/4} ) → 0, as n → ∞.

This finishes the proof of Lemma B.3 for l = 3.

For l = 4, by (2.10), (2.13), (B.7), and (B.9), we have

|H_n^{(4)}| ≤ (C/k_n) Σ_{j=1}^{n−k_n+1} |K_{m_nΔ_n}(t_{j−1} − τ)| ( Δ_n + Δ_n^{3/4} ( Γ'(γ)_j^n + Γ'(γ')_j^n ) ) = O( Δ_n^{1/2} ) + o( Δ_n^{1/4} / √(m_nΔ_n) ),

where we used a similar argument as in (B.14) to deduce the second bound above. Thus, we deduce (B.13) for l = 4.
For l = 5, we have

|H_n^{(5)}| ≤ ∫_{(n−k_n+1)Δ_n}^T |K_{m_nΔ_n}(t − τ)| σ_t² dt   (B.15)
+ Σ_{j=1}^{n−k_n+1} ∫_{t_{j−1}}^{t_j} | K_{m_nΔ_n}(s − τ) σ_s² − K_{m_nΔ_n}(t_{j−1} − τ) σ²_{t_{j−1}} | ds   (B.16)
≤ C/(m_n√Δ_n) + (n − k_n − 1)Δ_n O_P( Δ_n^{1/2}/(m_nΔ_n) + Δ_n/(m_nΔ_n)² )   (B.17)
= O_P( 1/(m_n√Δ_n) ),   (B.18)

where the first term in (B.17) follows from the boundedness of K and σ as follows:

∫_{(n−k_n+1)Δ_n}^T |K_{m_nΔ_n}(t − τ)| σ_t² dt ≤ (C/(m_nΔ_n)) k_nΔ_n = C/(m_n√Δ_n),

while the second term in (B.17) can be deduced from (B.10) and the Lipschitz property of K. Indeed, for s ∈ [t_{j−1}, t_j] and b_n := m_nΔ_n,

| K_{b_n}(s − τ) σ_s² − K_{b_n}(t_{j−1} − τ) σ²_{t_{j−1}} | ≤ | K_{b_n}(s − τ) | | σ_s² − σ²_{t_{j−1}} | + | K_{b_n}(s − τ) − K_{b_n}(t_{j−1} − τ) | σ²_{t_{j−1}} = O_P( Δ_n^{1/2}/(m_nΔ_n) ) + O_P( Δ_n/(m_nΔ_n)² ).

So, we deduce (B.13) for l = 5.

Step 2
To show Lemma B., we need several preliminary lemmas. We employ the "block splitting" method proposed in Jacod & Protter (2011) (see Section 16.5.4, page 548 therein). Recall that
\[
H^{(1)}_n=\sum_{i=1}^{n-k_n+1}\zeta^n_i,\qquad\text{where}\qquad
\zeta^n_i=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i-1}-\tau)\big(\phi^n_{i,0}-E\big(\phi^n_{i,0}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)\big).
\]
The variables $\zeta^n_i$ are not martingale differences. To use martingale methods, we fix an integer $m$, and divide the summands in the definition of $H^{(1)}_n$ into blocks of sizes $mk_n$ and $k_n$. Concretely, the $\ell$th big block, of size $mk_n$, contains the indices between $I(m,n,\ell)=(\ell-1)(m+1)k_n+1$ and $I(m,n,\ell)+mk_n-1$. The number of such blocks before time $t$ is $l_n(m)=\big\lfloor\frac{n-k_n+1}{(m+1)k_n}\big\rfloor$. These big blocks are separated by small blocks of size $k_n$, and the "real" time corresponding to the beginning of the $\ell$th big block is $t(m,n,\ell)=(I(m,n,\ell)-1)\Delta_n$. Then we introduce the sum over all the big blocks,
\[
Z_n(m):=\sum_{\ell=1}^{l_n(m)}\delta(m)^n_\ell:=\sum_{\ell=1}^{l_n(m)}\sum_{r=0}^{mk_n-1}\zeta^n_{I(m,n,\ell)+r}.\tag{B.19}
\]
Note that the sequence $(\delta(m)^n_\ell)$ are now martingale differences w.r.t. the discrete filtration $\mathcal G_\ell=\mathcal F_{(I(m,n,\ell+1)-1)\Delta_n}$, for $\ell=1,\dots,l_n(m)$. We now show that the contribution of the small blocks, i.e. $H^{(1)}_n-Z_n(m)$, is asymptotically "negligible" compared to $m_n^{-1/2}\Delta_n^{-1/4}$.

Lemma B.4. Under the standing assumptions and (2.9), we have
\[
\lim_{m\to\infty}\limsup_{n\to\infty}E\Big(m_n^{1/2}\Delta_n^{1/4}\big|H^{(1)}_n-Z_n(m)\big|\Big)=0.
\]
Proof. Denote by $J(n,m)$ the set of all integers $j$ between $1$ and $n-k_n+1$ which are not in the big blocks (i.e., those corresponding to the small blocks). We further divide $J(n,m)$ into $k_n$ disjoint subsets $J(n,m,r)$, $r=1,\dots,k_n$, where $J(n,m,r)$ is the set of all $j\in J(n,m)$ equal to $r$ modulo $k_n$. Then,
\[
H^{(1)}_n-Z_n(m)=\sum_{r=1}^{k_n}\sum_{j\in J(n,m,r)}\zeta^n_j.
\]
Observe that $E\big(\zeta^n_j\,\big|\,\mathcal F_{(j-1)\Delta_n}\big)=0$ and $\zeta^n_j$ is $\mathcal F_{(j+k_n)\Delta_n}$-measurable. Then $\sum_{j\in J(n,m,r)}\zeta^n_j$ is a sum of martingale increments, because any two distinct indices in $J(n,m,r)$ are more than $k_n$ apart. Therefore, by (B.4) and the fact that $E\big(\zeta^n_j\,\big|\,\mathcal F_{(j-1)\Delta_n}\big)=0$, for some constant $C$ (changing from line to line) and large enough $n$,
\[
E\bigg|\sum_{j\in J(n,m,r)}\zeta^n_j\bigg|^2=\sum_{j\in J(n,m,r)}E\big|\zeta^n_j\big|^2
\le C\sum_{j\in J(n,m,r)}\Delta_n^2\,K_{m_n\Delta_n}(t_{j-1}-\tau)^2
\le\frac{C\,\Delta_n^2}{(m+1)k_n\Delta_n\,m_n\Delta_n}\int K(u)^2du
\le\frac{C\,\Delta_n^{3/2}}{m\,m_n},
\]
where the last inequality holds because of (2.10), and the second inequality holds because, recalling that two consecutive $j$'s in $J(n,m,r)$ are separated by $(m+1)k_n$ indices, we have
\[
(m+1)k_n\Delta_n\,m_n\Delta_n\sum_{j\in J(n,m,r)}K_{m_n\Delta_n}(t_{j-1}-\tau)^2\xrightarrow[n\to\infty]{}\int K(u)^2du.
\]
Then,
\[
E\Big(m_n^{1/2}\Delta_n^{1/4}\big|H^{(1)}_n-Z_n(m)\big|\Big)\le C\,m_n^{1/2}\Delta_n^{1/4}\,k_n\sqrt{\frac{\Delta_n^{3/2}}{m\,m_n}}\le\frac{C}{\sqrt m},
\]
for large enough $n$. As $m\to\infty$, the above quantity goes to $0$ and we get the result.

Next, we modify the "big-blocks" process $Z_n(m)$ defined in (B.19) in such a way that each summand involves the volatility at the beginning of the corresponding large block.
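For concreteness, the alternating big/small block layout used above can be summarized as follows (a sketch of the index arrangement; the labels are only illustrative):
\[
\underbrace{1,\dots,mk_n}_{\text{big block }1}\;\;\underbrace{mk_n+1,\dots,(m+1)k_n}_{\text{small block }1}\;\;\underbrace{(m+1)k_n+1,\dots,(m+1)k_n+mk_n}_{\text{big block }2}\;\;\cdots
\]
so that $I(m,n,\ell)=(\ell-1)(m+1)k_n+1$, and any two consecutive indices of $J(n,m,r)$ (at most one per small block) are at least $(m+1)k_n$ apart. This separation, larger than $k_n$, is exactly what makes each $\sum_{j\in J(n,m,r)}\zeta^n_j$ a sum of martingale increments.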
Recalling the notation in (B.1), we set
\begin{align}
\eta^n_{i,r}&=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i-1}-\tau)\big(\phi^n_{i,r}-E\big(\phi^n_{i,r}\,\big|\,\mathcal F_{(i-r-1)\Delta_n}\big)\big),\tag{B.20}\\
\eta'^n_{i,r}&=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i-1}-\tau)\big(E\big(\phi^n_{i,r}\,\big|\,\mathcal F_{(i-r-1)\Delta_n}\big)-E\big(\phi^n_{i,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)\big),\tag{B.21}\\
M_n(m)&=\sum_{i=1}^{l_n(m)}\sum_{r=0}^{mk_n-1}\eta^n_{I(m,n,i)+r,r},\qquad
M'_n(m)=\sum_{i=1}^{l_n(m)}\sum_{r=0}^{mk_n-1}\eta'^n_{I(m,n,i)+r,r}.\tag{B.22}
\end{align}

Lemma B.5. Under the standing assumptions and (2.9), for a fixed $m$,
\[
\lim_{n\to\infty}E\Big(m_n^{1/2}\Delta_n^{1/4}\big|Z_n(m)-M_n(m)-M'_n(m)\big|\Big)=0.
\]
Proof. We use a similar method as in the previous lemma. Let $J'(n,m)$ be the set of all integers $j$ between $1$ and $n-k_n+1$ which are inside the big blocks, that is, of the form $j=I(m,n,i)+l$ for some $i\ge1$ and $l\in\{0,\dots,mk_n-1\}$. Let $J'(n,m,r)$ be the set of all $j\in J'(n,m)$ equal to $r$ modulo $k_n$. We can then write
\[
Z_n(m)-M_n(m)-M'_n(m)=\sum_{r=1}^{k_n}\sum_{j\in J'(n,m,r)}\theta^n_j,
\]
where
\[
\theta^n_j=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{j-1}-\tau)\Big(\phi^n_{j,0}-\phi^n_{j,l}-E\big(\phi^n_{j,0}-\phi^n_{j,l}\,\big|\,\mathcal F_{(j-1)\Delta_n}\big)\Big),\qquad\text{when } j=I(m,n,i)+l.
\]
Note that $\phi^n_{j,0}$ and $\phi^n_{j,l}$ have the same noise part, $-\frac12\widehat\epsilon{}^n_j$, and the cross term $\overline W{}^n_j\,\overline\epsilon{}^n_j$ has expectation $0$. Then, for some constant $C$ and large enough $n$,
\[
E\big|\theta^n_j\big|^2\le\frac{C}{\phi_{k_n}(g)^2}K_{m_n\Delta_n}(t_{j-1}-\tau)^2\,E\Big(\big(\sigma^2_{(j-1)\Delta_n}-\sigma^2_{(j-l-1)\Delta_n}\big)^2\big(\overline W{}^n_j\big)^4\Big)
\le C\,K_{m_n\Delta_n}(t_{j-1}-\tau)^2\,mk_n\Delta_n\,\Delta_n^2,
\]
for $j\in J'(n,m,r)$, where the last inequality follows by conditioning on $\mathcal F_{(j-1)\Delta_n}$, using that $E\big[\big(\overline W{}^n_j\big)^4\,\big|\,\mathcal F_{(j-1)\Delta_n}\big]=3\phi_{k_n}(g)^2\Delta_n^2$, and applying (B.10). As in the proof of the previous lemma,
\[
E\bigg|\sum_{j\in J'(n,m,r)}\theta^n_j\bigg|^2=\sum_{j\in J'(n,m,r)}E\big|\theta^n_j\big|^2
\le C_m\,k_n\Delta_n^3\sum_{j\in J'(n,m,r)}K_{m_n\Delta_n}(t_{j-1}-\tau)^2
\le C_m\,\frac{\Delta_n}{m_n}\int K(u)^2du.
\]
So we have
\[
E\Big(m_n^{1/2}\Delta_n^{1/4}\big|Z_n(m)-M_n(m)-M'_n(m)\big|\Big)\le C_m\,m_n^{1/2}\Delta_n^{1/4}\,k_n\sqrt{\frac{\Delta_n}{m_n}}=O\big(\Delta_n^{1/4}\big)\longrightarrow0.
\]
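The conditional moments of $\overline W{}^n_j$ used in the proof above can be checked directly from its definition; summing the Abel rearrangement by parts — assuming, as is standard for pre-averaging weights, that $g(0)=g(1)=0$ and that $\phi_{k_n}(g)=\sum_{j=1}^{k_n-1}g(j/k_n)^2$ (this normalization is our assumption here) — gives the sketch
\[
\overline W{}^n_i=\sum_{j=1}^{k_n-1}g\Big(\frac{j}{k_n}\Big)\big(W_{(i+j)\Delta_n}-W_{(i+j-1)\Delta_n}\big),
\qquad
E\big[\big(\overline W{}^n_i\big)^2\,\big|\,\mathcal F_{(i-1)\Delta_n}\big]=\sum_{j=1}^{k_n-1}g\Big(\frac{j}{k_n}\Big)^2\Delta_n=\phi_{k_n}(g)\Delta_n,
\]
and, since $\overline W{}^n_i$ is a centered Gaussian variable independent of $\mathcal F_{(i-1)\Delta_n}$, also $E\big[\big(\overline W{}^n_i\big)^4\,\big|\,\mathcal F_{(i-1)\Delta_n}\big]=3\phi_{k_n}(g)^2\Delta_n^2$.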
Now we prove that $M'_n(m)$, defined in (B.22), is asymptotically negligible.

Lemma B.6. Under the standing assumptions and (2.9),
\[
\lim_{n\to\infty}E\Big(m_n^{1/2}\Delta_n^{1/4}\big|M'_n(m)\big|\Big)=0.
\]
Proof.
Recall that $\Psi^n_{i,j}=E\big(\phi^n_{i,j}\,\big|\,\mathcal H_{(i-1)\Delta_n}\big)-\big(\sigma_{(i-j-1)\Delta_n}\overline W{}^n_i\big)^2$ and, since $\mathcal H_t=\mathcal F^{(0)}\otimes\sigma\big(\epsilon_s:s\in[0,t)\big)$,
\begin{align*}
E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)&=E\big(\phi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)-E\Big(\big(\sigma_{(i-1)\Delta_n}\overline W{}^n_{i+r}\big)^2\,\Big|\,\mathcal F_{(i-1)\Delta_n}\Big),\\
E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)&=E\big(\phi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)-E\Big(\big(\sigma_{(i-1)\Delta_n}\overline W{}^n_{i+r}\big)^2\,\Big|\,\mathcal F_{(i+r-1)\Delta_n}\Big).
\end{align*}
Since $\overline W{}^n_{i+r}$ is a linear combination of $W_{(i+r)\Delta_n},\dots,W_{(i+r+k_n-1)\Delta_n}$, we have
\[
E\Big(\big(\sigma_{(i-1)\Delta_n}\overline W{}^n_{i+r}\big)^2\,\Big|\,\mathcal F_{(i-1)\Delta_n}\Big)=E\Big(\big(\sigma_{(i-1)\Delta_n}\overline W{}^n_{i+r}\big)^2\,\Big|\,\mathcal F_{(i+r-1)\Delta_n}\Big),
\]
and, hence,
\begin{align*}
\eta'^n_{i+r,r}&=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i+r-1}-\tau)\big(E\big(\phi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)-E\big(\phi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)\big)\\
&=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i+r-1}-\tau)\big(E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)-E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)\big).
\end{align*}
Next, note that, by (B.8), we have
\[
E\Big|E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)-E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)\Big|^2
\le E\Big(E\big(\Psi^n_{i+r,r}\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)^2\Big)
\le E\Big(E\big(\big(\Psi^n_{i+r,r}\big)^2\,\big|\,\mathcal F_{(i+r-1)\Delta_n}\big)\Big)\le C\Delta_n^{5/2}.
\]
We can then deduce that, for $r\neq l$,
\[
E\big(\eta'^n_{i+r,r}\,\eta'^n_{i+l,l}\big)\le\sqrt{E\big(\eta'^n_{i+r,r}\big)^2\,E\big(\eta'^n_{i+l,l}\big)^2}
\le\frac{C}{\phi_{k_n}(g)^2}\big|K_{m_n\Delta_n}(t_{i+r-1}-\tau)\big|\big|K_{m_n\Delta_n}(t_{i+l-1}-\tau)\big|\,\Delta_n^{5/2}.
\]
Therefore, denoting for simplicity $I_i=I(m,n,i)=(i-1)(m+1)k_n+1$,
\[
E\bigg|\sum_{r=0}^{mk_n-1}\eta'^n_{I_i+r,r}\bigg|^2
\le\frac{C}{\phi_{k_n}(g)^2}\bigg(\sum_{r=0}^{mk_n-1}\big|K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\big|\bigg)^2\Delta_n^{5/2}
\le\frac{C}{k_n^2\Delta_n^2}\bigg(\int_{t_{I_i-1}}^{t_{I_i-1}+mk_n\Delta_n}\big|K_{m_n\Delta_n}(s-\tau)\big|\,ds\bigg)^2\Delta_n^{5/2}
\le C\,\Delta_n^{3/2}\bigg(\int_{t_{I_i-1}}^{t_{I_i-1}+mk_n\Delta_n}\big|K_{m_n\Delta_n}(s-\tau)\big|\,ds\bigg)^2.
\]
The result is proved by the following:
\[
m_n^{1/2}\Delta_n^{1/4}\,E\big|M'_n(m)\big|\le C\,m_n^{1/2}\Delta_n^{1/4}\sum_{i=1}^{l_n(m)}\Delta_n^{3/4}\bigg(\int_{t_{I_i-1}}^{t_{I_i-1}+mk_n\Delta_n}\big|K_{m_n\Delta_n}(s-\tau)\big|\,ds\bigg)\le C\,m_n^{1/2}\Delta_n\int|K(u)|\,du\longrightarrow0.
\]
At this stage, we are ready to prove a CLT for the processes $M_n(m)$, for each fixed $m$. We follow the arguments of Jacod & Protter (2011) in page 550. For completeness, we outline them here. Let
\[
L(g)_t=\int_t^{t+1}g(u-t)\,d\overline W_u,\qquad L'(g)_t=\int_t^{t+1}g'(u-t)\,d\overline W{}'_u,\tag{B.23}
\]
where $\overline W$ and $\overline W{}'$ are two independent one-dimensional Brownian motions defined on an auxiliary space $\big(\widetilde\Omega,\widetilde{\mathcal F},(\widetilde{\mathcal F}_t)_{t\ge0},\widetilde P\big)$. The processes $L(g)$ and $L'(g)$ are independent, stationary, centered, and Gaussian with covariances
\[
E\big(L(g)_tL(g)_s\big)=\int_{t\vee s}^{(t+1)\wedge(s+1)}g(u-t)g(u-s)\,du,\qquad
E\big(L'(g)_tL'(g)_s\big)=\int_{t\vee s}^{(t+1)\wedge(s+1)}g'(u-t)g'(u-s)\,du.
\]
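Two consequences of these covariance formulas are used repeatedly below (with $\phi(f):=\int_0^1f(u)^2du$, the normalization appearing later in the proof): taking $s=t$ yields the stationary variances, and disjoint integration windows yield independence at lags at least $1$:
\[
\widetilde E\big(L(g)_t^2\big)=\int_t^{t+1}g(u-t)^2\,du=\phi(g),\qquad
\widetilde E\big(L'(g)_t^2\big)=\phi(g'),\qquad
\widetilde E\big(L(g)_tL(g)_s\big)=0\ \text{ if } |t-s|\ge1,
\]
the last equality holding because $[t,t+1]\cap[s,s+1]$ then has Lebesgue measure zero. This is precisely the lag-one independence of $(L_t,L'_t)$ invoked at the end of Step 2.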
Denoting by $\widetilde E$ the expectation with respect to $\widetilde P$, let
\begin{align*}
\mu(v,v')&=\widetilde E\Big(\big(vL(g)_s+v'L'(g)_s\big)^2-v'^2\phi(g')\Big),\\
\mu'(v,v';s,s')&=\widetilde E\Big(\Big(\big(vL(g)_s+v'L'(g)_s\big)^2-v'^2\phi(g')\Big)\Big(\big(vL(g)_{s'}+v'L'(g)_{s'}\big)^2-v'^2\phi(g')\Big)\Big),\\
R(v,v')&=\int_0^2\big(\mu'(v,v';1,s)-\mu(v,v')\,\mu(v,v')\big)\,ds.
\end{align*}
As argued in the proof of Theorem 7.20 in Aït-Sahalia & Jacod (2014), one can show that
\[
\frac{1}{\theta\,\phi(g)^2}R(\sigma_t,\theta v_t)=4\Big(\Phi_{22}\frac{\sigma_t^4}{\theta}+2\Phi_{12}\sigma_t^2\gamma_t\theta+\Phi_{11}\gamma_t^2\theta^3\Big),
\]
where $v_t=\sqrt{\gamma_t}$ is the conditional standard deviation of $\epsilon_t$. For a fixed $m$ and $t\in[0,T]$, let
\[
\gamma(m)_t=m\,\mu(\sigma_t,\theta v_t),\qquad \gamma'(m)_t=\int_0^mds\int_0^mds'\,\mu'(\sigma_t,\theta v_t;s,s').
\]
Lemma B.7. Under the standing assumptions and (2.9), for each $m\ge1$, as $n\to\infty$, the process $m_n^{1/2}\Delta_n^{1/4}M_n(m)$ converges in law to a r.v. $Y(m)$ which, conditionally on $\mathcal F$, is a centered Gaussian r.v. with variance
\[
E\big(Y(m)^2\,\big|\,\mathcal F\big)=\frac{1}{(m+1)\theta\,\phi(g)^2}\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)\int K(u)^2du.
\]
Proof.
Recall that
\[
M_n(m)=\sum_{i=1}^{l_n(m)}\sum_{r=0}^{mk_n-1}\eta^n_{I(m,n,i)+r,r},
\]
where
\[
\eta^n_{i+r,r}=\frac{1}{\phi_{k_n}(g)}K_{m_n\Delta_n}(t_{i+r-1}-\tau)\big(\phi^n_{i+r,r}-E\big(\phi^n_{i+r,r}\,\big|\,\mathcal F_{(i-1)\Delta_n}\big)\big),\qquad
\phi^n_{i+r,r}=\big(\sigma_{(i-1)\Delta_n}\overline W{}^n_{i+r}+\overline\epsilon{}^n_{i+r}\big)^2-\tfrac12\,\widehat\epsilon{}^n_{i+r},
\]
\[
\overline W{}^n_i=-\sum_{j=1}^{k_n}\Big(g\Big(\frac{j}{k_n}\Big)-g\Big(\frac{j-1}{k_n}\Big)\Big)W_{(i+j-1)\Delta_n},\qquad
I(m,n,i)=(i-1)(m+1)k_n+1,\qquad
l_n(m)=\bigg\lfloor\frac{n-k_n+1}{(m+1)k_n}\bigg\rfloor.
\]
For $i=1,\dots,l_n(m)$, let
\[
\eta(m)^n_i:=m_n^{1/2}\Delta_n^{1/4}\sum_{r=0}^{mk_n-1}\eta^n_{I(m,n,i)+r,r},\qquad \mathcal G^n_i=\mathcal F_{(I(m,n,i+1)-1)\Delta_n}.\tag{B.24}
\]
For simplicity, we write $I_i=I(m,n,i)$. Note that $\eta(m)^n_i$ depends only on $\sigma_{(I_i-1)\Delta_n}$, $W_{(I_i-1)\Delta_n},\dots,W_{(I_{i+1}-1)\Delta_n}$, and $\epsilon_{(I_i-1)\Delta_n},\dots,\epsilon_{(I_{i+1}-1)\Delta_n}$. Therefore, $\eta(m)^n_i$ is $\mathcal G^n_i$-measurable and, furthermore, $E\big[\eta(m)^n_i\,\big|\,\mathcal G^n_{i-1}\big]=0$. We will apply Theorem 2.2.15 in Jacod & Protter (2011) to the martingale increments $\eta(m)^n_i$, $i=1,\dots,l_n(m)$.

By the Jensen-type inequality $\big|\sum_ra_rb_r\big|^4\le\big(\sum_r|a_r|\big)^3\sum_r|a_r||b_r|^4$ and (B.6), we have, for each fixed $m$,
\[
\sum_{i=1}^{l_n(m)}E\big(|\eta(m)^n_i|^4\,\big|\,\mathcal G^n_{i-1}\big)
\le C_m\,m_n^2\Delta_n\,\Delta_n^2\sum_{i=1}^{l_n(m)}K_{m_n\Delta_n}(t_{I_i-1}-\tau)^4
\le C_m\,\frac{m_n^2\Delta_n^3}{k_n\Delta_n\,(m_n\Delta_n)^3}\int K(u)^4du
=O\bigg(\frac{1}{m_n\Delta_n^{1/2}}\bigg)\longrightarrow0.\tag{B.25}
\]
Therefore, for every $\varepsilon>0$,
\[
\sum_{i=1}^{l_n(m)}E\Big(|\eta(m)^n_i|^2\,\mathbf 1_{\{|\eta(m)^n_i|\ge\varepsilon\}}\,\Big|\,\mathcal G^n_{i-1}\Big)\le\frac{1}{\varepsilon^2}\sum_{i=1}^{l_n(m)}E\big(|\eta(m)^n_i|^4\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow[n\to\infty]{}0.
\]
It remains to prove that, for a fixed $m$,
\[
S_n:=\sum_{i=1}^{l_n(m)}E\big((\eta(m)^n_i)^2\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow{P}\frac{1}{(m+1)\theta\,\phi(g)^2}\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)\int K(u)^2du,\tag{B.26}
\]
and also that, for any bounded $\mathcal F_t$-martingale $N$ that is orthogonal to $W$, or for $N=W$,
\[
\sum_{i=1}^{l_n(m)}E\big(\eta(m)^n_i\big(N_{(I_{i+1}-1)\Delta_n}-N_{(I_i-1)\Delta_n}\big)\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow{P}0.\tag{B.27}
\]
We start by proving (B.26). Let
\[
\alpha^n_i:=\frac{1}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\,\phi^n_{I_i+r,r}
=\frac{1}{k_n^2\Delta_n}K_{m_n\Delta_n}(t_{I_i-1}-\tau)\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}+O_P\bigg(\frac{1}{m_n^2\Delta_n^{3/2}}\bigg),\tag{B.28}
\]
where for the second equality above we applied the Lipschitz property of $K$ and (B.4) to show
\[
\frac{1}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}\big|K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)-K_{m_n\Delta_n}(t_{I_i-1}-\tau)\big|\,E\big|\phi^n_{I_i+r,r}\big|
\le\frac{C}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}\frac{mk_n\Delta_n}{(m_n\Delta_n)^2}\,k_n\Delta_n
=O\bigg(\frac{1}{m_n^2\Delta_n^{3/2}}\bigg).
\]
For $(I(m,n,i)-1)\Delta_n\le s<(I(m,n,i+1)-1)\Delta_n$, set
\[
\gamma^n_s=E\bigg(\frac{1}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}\,\bigg|\,\mathcal G^n_{i-1}\bigg),\qquad
\gamma'^n_s=E\bigg(\bigg(\frac{1}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}\bigg)^2\,\bigg|\,\mathcal G^n_{i-1}\bigg).
\]
Then, we have
\[
S_n=\frac{m_n\Delta_n^{1/2}(k_n^2\Delta_n)^2}{\phi_{k_n}(g)^2}\sum_{i=1}^{l_n(m)}\Big(E\big((\alpha^n_i)^2\,\big|\,\mathcal G^n_{i-1}\big)-\big(E\big(\alpha^n_i\,\big|\,\mathcal G^n_{i-1}\big)\big)^2\Big)
=\frac{m_n\Delta_n^{1/2}(k_n^2\Delta_n)^2}{\phi_{k_n}(g)^2}\sum_{i=1}^{l_n(m)}K_{m_n\Delta_n}(t_{I_i-1}-\tau)^2\bigg(\gamma'^n_{t_{I_i-1}}-\Big(\gamma^n_{t_{I_i-1}}\Big)^2\bigg)+O_P\bigg(\frac{1}{m_n\sqrt{\Delta_n}}\bigg).
\]
If we can show that, for any $s\in[0,T]$,
\[
\gamma^n_s\xrightarrow{P}\gamma(m)_s,\qquad \gamma'^n_s\xrightarrow{P}\gamma'(m)_s,\tag{B.29}
\]
we can obtain (B.26):
\begin{align*}
S_n&=\frac{1}{(m+1)k_n\Delta_n}\,\frac{m_n\Delta_n^{1/2}(k_n^2\Delta_n)^2}{\phi_{k_n}(g)^2}\int_0^TK_{m_n\Delta_n}(s-\tau)^2\big(\gamma'(m)_s-\gamma(m)_s^2\big)\,ds+o_P(1)\\
&=\frac{1}{\theta(m+1)\phi(g)^2}\int_{-\tau/(m_n\Delta_n)}^{(T-\tau)/(m_n\Delta_n)}K(u)^2\big(\gamma'(m)_{\tau+um_n\Delta_n}-\gamma(m)_{\tau+um_n\Delta_n}^2\big)\,du+o_P(1)\\
&\xrightarrow{P}\frac{1}{\theta(m+1)\phi(g)^2}\int K(u)^2du\,\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big),
\end{align*}
where the last line can be shown as follows. For all $\varepsilon>0$, there exists an interval $I=[a,b]$ such that $\int_{I^c}K(u)^2du\le\varepsilon$. Let $I_n=\big[-\frac{\tau}{m_n\Delta_n},\frac{T-\tau}{m_n\Delta_n}\big]$, $f_n(u)=K(u)^2\big(\gamma'(m)_{\tau+um_n\Delta_n}-\gamma(m)_{\tau+um_n\Delta_n}^2\big)$, and $f(u)=K(u)^2\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)$. Then, for some constant $C$,
\[
\limsup_{n\to\infty}\bigg|\int_{I_n}f_n(u)\,du-\int f(u)\,du\bigg|\le\limsup_{n\to\infty}\int_I|f_n(u)-f(u)|\,du+\int_{I_n\cap I^c}|f_n(u)|\,du+\int_{I^c}|f(u)|\,du\le C\varepsilon,
\]
since $\gamma(m)$, $\gamma'(m)$ are continuous and bounded and $K$ is bounded. The result follows by letting $\varepsilon\to0$.

To show (B.29), we fix $s\in[0,T]$ and apply Lemma 16.3.9 in Jacod & Protter (2011) with the sequence $i_n=I(m,n,i)$ and $T_n=(I(m,n,i)-1)\Delta_n$ whenever $(I(m,n,i)-1)\Delta_n\le s<(I(m,n,i+1)-1)\Delta_n$.
Concretely, with the notation
\[
L^n_u=\frac{1}{\sqrt{k_n\Delta_n}}\overline W{}^n_{i_n+[k_nu]},\qquad
L'^n_u=\sqrt{k_n}\,\overline\epsilon{}^n_{i_n+[k_nu]},\qquad
\widehat L^n_u=k_n\,\widehat\epsilon{}^n_{i_n+[k_nu]},\qquad u\in[0,m],
\]
we have
\[
\frac{1}{k_n^2\Delta_n}\sum_{r=0}^{mk_n-1}\phi^n_{i_n+r,r}=F_n\big(\sigma_{T_n}L^n,L'^n,\widehat L^n\big),
\]
where $F_n$ is the function on $\mathbb D\times\mathbb D\times\mathbb D$ (here $\mathbb D=\mathbb D([0,m]:\mathbb R)$ is the Skorokhod space) defined by
\[
F_n(x,y,z)=\frac{1}{k_n}\sum_{r=0}^{mk_n-1}\bigg\{\bigg(x\Big(\frac{r}{k_n}\Big)+\frac{1}{k_n\sqrt{\Delta_n}}\,y\Big(\frac{r}{k_n}\Big)\bigg)^2-\frac{1}{2k_n^2\Delta_n}\,z\Big(\frac{r}{k_n}\Big)\bigg\}.\tag{B.30}
\]
Note that the functions $F_n$, $F_n^2$ converge pointwise to $F$, $F^2$, respectively, where
\[
F(x,y,z)=\int_0^m\Big\{\big(x(s)+\theta y(s)\big)^2-\frac{\theta^2}{2}z(s)\Big\}\,ds.
\]
Now we deduce from Lemma 16.3.9 in Jacod & Protter (2011), with $Z=1$, $\phi(f)=\int_0^1f(u)^2du$, and the notation from (B.23), that
\[
E\Big(F_n\big(\sigma_{T_n}L^n,L'^n,\widehat L^n\big)\,\Big|\,\mathcal G^n_{i-1}\Big)\xrightarrow{P}\widetilde E\big(F\big(\sigma_sL,v_sL',\phi(g')\gamma_s\big)\big)=\gamma(m)_s.
\]
Similarly, $E\big(F_n^2\big(\sigma_{T_n}L^n,L'^n,\widehat L^n\big)\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow{P}\gamma'(m)_s$. (Here we assume that the space $(\widetilde\Omega,\widetilde{\mathcal F},\widetilde P)$, on which $\overline W$ and $\overline W{}'$ — hence $L$ and $L'$ — are defined, is an extension of the space $(\Omega,\mathcal F,P)$ and that $\overline W$ and $\overline W{}'$ are independent of $X$ and $\epsilon$.)

We now turn to (B.27). Set
\[
\zeta^n_i=\frac{m_n^{1/2}\Delta_n^{1/4}}{\phi_{k_n}(g)}\sum_{r=0}^{mk_n-1}K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\,\phi^n_{I_i+r,r},
\]
and set $D^n_i(N)=N_{(I_{i+1}-1)\Delta_n}-N_{(I_i-1)\Delta_n}$. Since $E\big(D^n_i(N)\,\big|\,\mathcal G^n_{i-1}\big)=0$, we only need to prove that, for any bounded martingale $N$,
\[
\sum_{i=1}^{l_n(m)}E\big(\zeta^n_iD^n_i(N)\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow{P}0.\tag{B.31}
\]
Following the same argument as in (B.25) and inequality (B.6), we have
\[
\sum_{i=1}^{l_n(m)}E\big((\zeta^n_i)^2\big)
\le C\,m_n\Delta_n^{1/2}\,\Delta_n^2\sum_{i=1}^{l_n(m)}\bigg(\sum_{r=0}^{mk_n-1}\big|K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\big|\bigg)^2
\le C\,m_n\Delta_n^{1/2}\,mk_n\Delta_n^2\sum_{i=1}^{l_n(m)}\sum_{r=0}^{mk_n-1}K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)^2
\le C\,m^2k_n\Delta_n^{1/2}\int K(u)^2du=O(1).\tag{B.32}
\]
If $N$ is a square-integrable martingale, the Cauchy–Schwarz inequality yields
\[
\sum_{i=1}^{l_n(m)}E\big(\zeta^n_iD^n_i(N)\,\big|\,\mathcal G^n_{i-1}\big)\le\sqrt{\sum_{i=1}^{l_n(m)}E\big((\zeta^n_i)^2\big)\sum_{i=1}^{l_n(m)}E\big(\big(D^n_i(N)\big)^2\big)}\le C\sqrt{E\,N_T^2}.
\]
Note that, with the notation (B.1) and
\[
\zeta'^n_i=\frac{m_n^{1/2}\Delta_n^{1/4}}{\phi_{k_n}(g)}\sum_{r=0}^{mk_n-1}K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\,\Psi^n_{I_i+r,r},
\]
the same argument also yields
\[
E\big(\zeta'^n_iD^n_i(N)\,\big|\,\mathcal G^n_{i-1}\big)\le C\Delta_n^{1/4}\sqrt{E\,N_T^2}.\tag{B.33}
\]
As shown in page 552 of Jacod & Protter (2011), we just need to prove (B.31) for $N\in\mathcal N(i)$, $i=0,1$, where $\mathcal N(0)$ is the set of all bounded $\big(\mathcal F^{(0)}_t\big)$-martingales orthogonal to $W$ and $\mathcal N(1)$ is the set of all martingales having $N_\infty=h(\chi_{t_1},\dots,\chi_{t_w})$, where $h$ is a bounded Borel function on $\mathbb R^w$, $t_1<\dots<t_w$, and $w\ge1$. When $N$ is either $W$ or in $\mathcal N(0)$, $D(N)^n_i$ is $\mathcal H_\infty$-measurable.
Therefore, $E\big(\zeta^n_iD(N)^n_i\,\big|\,\mathcal G^n_{i-1}\big)$ is equal to
\[
E\big(\zeta'^n_iD(N)^n_i\,\big|\,\mathcal G^n_{i-1}\big)+\frac{m_n^{1/2}\Delta_n^{1/4}}{\phi_{k_n}(g)}\,E\bigg(\sum_{r=0}^{mk_n-1}K_{m_n\Delta_n}(t_{I_i+r-1}-\tau)\big(\sigma_{(I_i-1)\Delta_n}\overline W{}^n_{I_i+r}\big)^2D(N)^n_i\,\bigg|\,\mathcal G^n_{i-1}\bigg).
\]
The second term above vanishes when $N=W$, since it is the $\mathcal F_{(I_i-1)\Delta_n}$-conditional expectation of an odd function of the increments of the process $W$ after time $(I_i-1)\Delta_n$. Suppose now that $N$ is a bounded martingale orthogonal to $W$. By Itô's formula, we see that $\big(\overline W{}^n_j\big)^2$ is the sum of a constant (depending on $n$) and of a martingale which is a stochastic integral with respect to $W$ on the interval $[(j-1)\Delta_n,(j+k_n-1)\Delta_n]$. Then the orthogonality of $N$ and $W$ implies that this second term above vanishes as well. So, in view of (B.33), we have the following inequality, which implies the result:
\[
E\big(\zeta^n_iD(N)^n_i\,\big|\,\mathcal G^n_{i-1}\big)\le C\Delta_n^{1/4}\sqrt{E\,N_T^2}.
\]
When $N\in\mathcal N(1)$ is associated with $h$, $w$, and the $t_i$'s, the same argument as in Jacod & Protter (2011) and the inequality $E\big((\zeta^n_i)^2\big)\le\frac{C}{m_n\sqrt{\Delta_n}}$, deduced from (B.32), yield
\[
E\sum_{i=1}^{l_n(m)}\Big|E\big(\zeta^n_iD(N)^n_i\,\big|\,\mathcal G^n_{i-1}\big)\Big|\le Cw\bigg(\Delta_n^{1/4}+\frac{1}{m_n\sqrt{\Delta_n}}\bigg),
\]
and (B.27) is shown. This finishes the proof of Lemma B..

The only thing left to prove Lemma B. is the stable convergence in law $Y(m)\xrightarrow{st}Z_\tau$, as $m\to\infty$. For this, we only need to show that, for each $\tau\in(0,T)$, as $m\to\infty$,
\[
\frac{1}{m+1}\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)\xrightarrow{st}R(\sigma_\tau,\theta v_\tau).
\]
Recall that the process $(L,L')$ is stationary, and the variables $(L_t,L'_t)$ and $(L_s,L'_s)$ are independent if $|s-t|\ge1$. So $\mu'(v,v';s,s')=\big(\mu(v,v')\big)^2$ when $|s-s'|\ge1$, and $\mu'(v,v';s,s')=\mu'(v,v';1,s'+1-s)$ for all $s,s'\ge0$ with $s'+1-s\ge0$. Then, if $m\ge2$ and letting $\mu=\mu(\sigma_\tau,\theta v_\tau)$ and $\mu'(s,s')=\mu'(\sigma_\tau,\theta v_\tau;s,s')$, we have
\begin{align*}
\frac{1}{m+1}\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)
&=\frac{1}{m+1}\int_0^mds\int_0^m\mu'(s,s')\,ds'-\frac{m^2}{m+1}\mu^2\\
&=\frac{1}{m+1}\int_0^mds\int_{(s-1)\vee0}^{m\wedge(s+1)}\big(\mu'(1,s'+1-s)-\mu^2\big)\,ds'\\
&=\frac{m-1}{m+1}\int_0^2\big(\mu'(1,s')-\mu^2\big)\,ds'+\frac{1}{m+1}\int_0^1ds\int_{1-s}^2\big(\mu'(1,s'+1-s)-\mu^2\big)\,ds'\\
&\longrightarrow R(\sigma_\tau,\theta v_\tau),
\end{align*}
since $\mu$, $\mu'$ are bounded. This finishes the proof of Lemma B..

Step 3
We now show Lemma B..

Proof of Lemma B.. Let $b_n=m_n\Delta_n$ and $t(i)=(I(m,n,i)-1)\Delta_n$, where the notation $I(m,n,i)$ can be found in Step 2 above. From the proof of Theorem 6.2 in Figueroa-López & Li (2020a), we have
\[
b_n^{-1/2}\int_0^TK_{b_n}(t-\tau)\big(\sigma^2_t-\sigma^2_\tau\big)\,dt=b_n^{-1/2}\Lambda_{\tau-\sqrt{b_n}}\int_{\tau-\sqrt{b_n}}^TL\Big(\frac{t-\tau}{b_n}\Big)\,dB_t+o_P(1),
\]
where $L(t)=\int_t^\infty K(u)\,du\,\mathbf 1_{\{t>0\}}-\int_{-\infty}^tK(u)\,du\,\mathbf 1_{\{t\le0\}}$. Also, we have
\[
b_n^{-1/2}\int_{(0,T)^c}K_{b_n}(t-\tau)\,dt=\frac{1}{\sqrt{b_n}}\bigg(\int_{-\infty}^{-\tau/b_n}K(u)\,du+\int_{(T-\tau)/b_n}^\infty K(u)\,du\bigg)\longrightarrow0,
\]
as $n\to\infty$, since our assumptions imply that $x^{1/2}\int_x^\infty K(u)\,du\to0$ as $x\to\infty$. So, for a fixed $m$, we can rewrite $m_n^{1/2}\Delta_n^{1/4}H^{(2)}_n$ as
\[
\beta\,b_n^{-1/2}\int_0^TK_{b_n}(t-\tau)\big(\sigma^2_t-\sigma^2_\tau\big)\,dt+o_P(1)
=\beta\,b_n^{-1/2}\Lambda_{\tau-\sqrt{b_n}}\sum_{i:\,t(i)>\tau-\sqrt{b_n}}^{l_n(m)}\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)\,dB_t+o_P(1)
=:\sum_{i=1}^{l_n(m)}\alpha(m)^n_i+o_P(1),\tag{B.34}
\]
with $\alpha(m)^n_i=0$ if $i$ is such that $t(i)\le\tau-\sqrt{b_n}$. Combining with the proof of Lemma B., we can deduce the following lemma.

Lemma B.8. Under the standing assumptions and (2.9), with $m_n\to\infty$ and $m_n\Delta_n^{3/4}\to\beta\in(0,\infty)$,
\[
\lim_{m\to\infty}\limsup_{n\to\infty}m_n^{1/2}\Delta_n^{1/4}\,E\bigg|H^{(1)}_n+H^{(2)}_n-\sum_{i=1}^{l_n(m)}\zeta(m)^n_i-\sum_{i=1}^{l_n(m)}\alpha(m)^n_i\bigg|=0,
\]
where $\zeta(m)^n_i:=\frac{m_n^{1/2}\Delta_n^{1/4}}{\phi_{k_n}(g)}K_{b_n}(t(i)-\tau)\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}$, with the notation (B.1).

Now Lemma B. follows if we apply Theorem 2.2.15 in Jacod & Protter (2011) to the sum of martingale differences $\big(\zeta(m)^n_i+\alpha(m)^n_i\big)$ and the filtration $\mathcal G_i=\mathcal F_{(I_{i+1}-1)\Delta_n}$, and show that
\[
\sum_{i=1}^{l_n(m)}\big(\zeta(m)^n_i+\alpha(m)^n_i\big)\xrightarrow{st}Z_\tau+\beta Z'_\tau.
\]
To this end, we first need to show, for a fixed $m$,
\begin{align}
\sum_{i=1}^{l_n(m)}E\big((\zeta(m)^n_i)^2\,\big|\,\mathcal G_{i-1}\big)&\longrightarrow\frac{1}{(m+1)\theta\,\phi(g)^2}\big(\gamma'(m)_\tau-\gamma(m)_\tau^2\big)\int K(u)^2du,\tag{B.35}\\
\sum_{i=1}^{l_n(m)}E\big((\alpha(m)^n_i)^2\,\big|\,\mathcal G_{i-1}\big)&\longrightarrow\beta^2\Lambda_\tau^2\int L(u)^2du,\tag{B.36}\\
\sum_{i=1}^{l_n(m)}E\big(\zeta(m)^n_i\,\alpha(m)^n_i\,\big|\,\mathcal G_{i-1}\big)&\longrightarrow0.\tag{B.37}
\end{align}
The proof of (B.35) can be found in the proof of Lemma B.. (B.36) can be directly derived from the definition (B.34):
\[
\sum_{i=1}^{l_n(m)}E\big((\alpha(m)^n_i)^2\,\big|\,\mathcal G_{i-1}\big)=\beta^2\,b_n^{-1}\Lambda^2_{\tau-\sqrt{b_n}}\sum_{i:\,t(i)>\tau-\sqrt{b_n}}^{l_n(m)}\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)^2dt\longrightarrow\beta^2\Lambda_\tau^2\int L(u)^2du.
\]
So, we only need to show (B.37). With the notation (B.1), we have
\begin{align*}
&E\bigg(\bigg(\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}\bigg)\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)\,dB_t\,\bigg|\,\mathcal G_{i-1}\bigg)\\
&\quad=E\bigg(\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)\,dB_t\,E\bigg(\sum_{r=0}^{mk_n-1}\phi^n_{I_i+r,r}\,\bigg|\,\mathcal H_{t(i)}\bigg)\,\bigg|\,\mathcal G_{i-1}\bigg)\\
&\quad=\sigma^2_{t(i)}\,E\bigg(\bigg(\sum_{r=0}^{mk_n-1}\big(\overline W{}^n_{t(i)+r}\big)^2\bigg)\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)\,dB_t\,\bigg|\,\mathcal G_{i-1}\bigg)\\
&\qquad+E\bigg(\bigg(\sum_{r=0}^{mk_n-1}\Psi^n_{t(i)+r,r}\bigg)\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{t-\tau}{b_n}\Big)\,dB_t\,\bigg|\,\mathcal G_{i-1}\bigg)=:A_i+B_i.
\end{align*}
Let
\[
U^s_{i,r}=\int_{t(i)+r\Delta_n}^sg_n\Big(\frac{u-(t(i)+r\Delta_n)}{k_n\Delta_n}\Big)\,dW_u,\qquad
g_n(t)=\sum_{r=1}^{k_n}g\Big(\frac{r}{k_n}\Big)\mathbf 1_{\big(\frac{(r-1)\Delta_n}{k_n\Delta_n},\frac{r\Delta_n}{k_n\Delta_n}\big]}(t).
\]
By Itô's lemma, when $t(i)>\tau-\sqrt{b_n}$,
\begin{align*}
A_i&=\frac{2\sigma^2_{t(i)}}{k_n\Delta_n}\,E\bigg(\sum_{r=0}^{mk_n-1}\int_{t(i)+r\Delta_n}^{t(i)+(r+k_n)\Delta_n}U^s_{i,r}\,g_n\Big(\frac{s-(t(i)+r\Delta_n)}{k_n\Delta_n}\Big)\,dW_s\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{s-\tau}{b_n}\Big)\,dB_s\,\bigg|\,\mathcal G_{i-1}\bigg)\\
&=\frac{2\sigma^2_{t(i)}}{k_n\Delta_n}\,E\bigg(\sum_{r=0}^{mk_n-1}\int_{t(i)+r\Delta_n}^{t(i)+(r+k_n)\Delta_n}U^s_{i,r}\,g_n\Big(\frac{s-(t(i)+r\Delta_n)}{k_n\Delta_n}\Big)L\Big(\frac{s-\tau}{b_n}\Big)\rho_s\,ds\,\bigg|\,\mathcal G_{i-1}\bigg)=0,
\end{align*}
since $E\big(U^s_{i,r}\,\big|\,\mathcal G_{i-1}\big)=0$. As for $B_i$, we can apply the Cauchy–Schwarz inequality. By (B.8) and the boundedness of $L$,
\[
B_i\le\bigg[E\bigg(\bigg(\sum_{r=0}^{mk_n-1}\Psi^n_{t(i)+r,r}\bigg)^2\,\bigg|\,\mathcal G_{i-1}\bigg)\bigg]^{1/2}\bigg[\int_{t(i)}^{t(i)+(m+1)k_n\Delta_n}L\Big(\frac{s-\tau}{b_n}\Big)^2ds\bigg]^{1/2}
\le C\,(mk_n)\,\Delta_n^{5/4}\sqrt{(m+1)k_n\Delta_n}\le C\Delta_n.
\]
Finally, we can show
\[
\sum_{i=1}^{l_n(m)}E\big(\zeta(m)^n_i\,\alpha(m)^n_i\,\big|\,\mathcal G_{i-1}\big)
=\sum_{i:\,t(i)>\tau-\sqrt{b_n}}^{l_n(m)}C\,b_n\,K_{b_n}(t(i)-\tau)\,\Lambda_{\tau-\sqrt{b_n}}\big(A_i+B_i\big)
\le C\,b_n\Delta_n^{1/2}\sum_{i=1}^{l_n(m)}\big|K_{b_n}(t(i)-\tau)\big|=O(m_n\Delta_n)\longrightarrow0.
\]
Now we single out a two-dimensional Brownian motion $\widetilde W=(W,B)$ and a subset $\mathcal N$ of bounded martingales, all orthogonal to $\widetilde W$. Let $D^n_i(N)=N_{(I_{i+1}-1)\Delta_n}-N_{(I_i-1)\Delta_n}$. We need to prove that
\[
\sum_{i=1}^{l_n(m)}E\big(\big(\zeta(m)^n_i+\alpha(m)^n_i\big)D^n_i(N)\,\big|\,\mathcal G^n_{i-1}\big)\xrightarrow{P}0,
\]
when $N$ is a component of $\widetilde W$ or is in the set $\mathcal N$. Since $\langle W,B\rangle_t\le\langle W,W\rangle_t=t$, we can deduce this for the same reason as in the proof of (B.27). Next,
\[
\sum_{i=1}^{l_n(m)}E\big(\big|\zeta(m)^n_i+\alpha(m)^n_i\big|^4\,\big|\,\mathcal G_{i-1}\big)\xrightarrow{P}0
\]
can be easily deduced by a straightforward computation and (B.25). Thus, letting $m\to\infty$, we can conclude that $m_n^{1/2}\Delta_n^{1/4}\big(H^{(1)}_n+H^{(2)}_n\big)$ converges stably in law to a random variable defined on a good extension $\big(\widetilde\Omega,\widetilde{\mathcal F},(\widetilde{\mathcal F}_t)_{t\ge0},\widetilde P\big)$ of the space $\big(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},P\big)$ which, conditionally on $\mathcal F$, is a centered Gaussian random variable with conditional variance $\delta_1+\beta^2\delta_2$. Combining with Lemma B., we can finally deduce that
\[
m_n^{1/2}\Delta_n^{1/4}\big(H^{(1)}_n+H^{(2)}_n\big)\xrightarrow{st}Z_\tau+\beta Z'_\tau,
\]
where $Z_\tau$, $Z'_\tau$ are defined on $\big(\widetilde\Omega,\widetilde{\mathcal F},(\widetilde{\mathcal F}_t)_{t\ge0},\widetilde P\big)$ and are conditionally independent, with
\[
\widetilde E\big(Z_\tau^2\,\big|\,\mathcal F\big)=\delta_1=4\Big(\Phi_{22}\frac{\sigma_\tau^4}{\theta}+2\Phi_{12}\sigma_\tau^2\gamma_\tau\theta+\Phi_{11}\gamma_\tau^2\theta^3\Big)\int K(u)^2du,\qquad
\widetilde E\big(Z'^2_\tau\,\big|\,\mathcal F\big)=\delta_2=\Lambda_\tau^2\iint_{xy\ge0}K(x)K(y)\big(|x|\wedge|y|\big)\,dx\,dy.
\]

References
Aït-Sahalia, Y. & Jacod, J. (2014).
High-frequency financial econometrics . Princeton UniversityPress.
Aït-Sahalia, Y. & Xiu, D. (2019). Principal component analysis of high-frequency data.
Journal of the American Statistical Association (525), 287–303.
Alvarez, A., Panloup, F., Pontier, M. & Savy, N. (2012). Estimation of the instantaneous volatility.
Statistical inference for stochastic processes (1), 27–59. Bandi, F. & Russell, J. (2008). Microstructure noise, realized volatility and optimal sampling.
Review of Economic Studies, 339–369. Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. & Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise.
Econometrica (6), 1481–1536. Campbell, J., Lo, A. & MacKinlay, A. (1997).
The Econometrics of Financial Markets. Princeton.
Chen, R. Y. (2019). Inference for volatility functionals of multivariate Itô semimartingales observed with jump and noise. Tech. rep., Working paper. Available at arXiv:1810.04725v2.
Fan, J. & Wang, Y. (2008). Spot volatility estimation for high-frequency data.
Statistics and its Interface (2), 279–288. Figueroa-López, J. & Li, C. (2020a). Optimal kernel estimation of spot volatility of stochastic differential equations. To appear in Stochastic Processes and their Applications. doi.org/10.1016/j.spa.2020.01.013.
Figueroa-López, J. & Li, C. (2020b). Supplement to "Optimal kernel estimation of spot volatility of stochastic differential equations". Available online at https://pages.wustl.edu/figueroa/publications.
Foster, D. & Nelson, D. (1996). Continuous record asymptotics for rolling sample variance estimators.
Econometrica (1), 139–174. Hansen, P. & Lunde, A. (2006). Realized variance and market microstructure noise.
J. Bus. Econom. Statist., 127–218. Jacod, J., Li, Y., Mykland, P. A., Podolskij, M. & Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach.
Stochastic processes and their applications (7), 2249–2276.
Jacod, J., Podolskij, M. & Vetter, M. (2010). Limit theorems for moving averages of discretized processes plus noise.
The Annals of Statistics (3), 1478–1545. Jacod, J. & Protter, P. (2011).
Discretization of processes . Springer Science & Business Media.
Jacod, J. & Rosenbaum, M. (2013). Quarticity and other functionals of volatility: efficient estimation.
The Annals of Statistics (3), 1462–1484. Kristensen, D. (2010). Nonparametric filtering of the realized spot volatility: A kernel-based approach.
Econometric Theory (1), 60–93. Li, J., Liu, Y., Xiu, D. et al. (2019). Efficient estimation of integrated volatility functionals via multiscale jackknife.
The Annals of Statistics (1), 156–176. Li, J., Todorov, V. & Tauchen, G. (2017). Adaptive estimation of continuous-time regression models using high-frequency data.
Journal of Econometrics (1), 36–47.
Li, J. & Xiu, D. (2016). Generalized method of integrated moments for high-frequency data.
Econometrica (4), 1613–1633. Mancini, C., Mattiussi, V. & Renò, R. (2015). Spot volatility estimation using delta sequences.
Finance & Stochastics (2), 261–293. Mykland, P. & Zhang, L. (2012). The econometrics of high-frequency data.
In Statistical Methods for Stochastic Differential Equations, M. Kessler, A. Lindner, and M. Sorensen, eds., 109–190.
Mykland, P. A. & Zhang, L. (2009). Inference for continuous semimartingales observed at high frequency.
Econometrica (5), 1403–1445. Parzen, E. (1962). On estimation of a probability density function and mode.
The Annals of Mathematical Statistics (3), 1065–1076. Podolskij, M. & Vetter, M. (2009). Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps. Bernoulli (3), 634–658. Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function.
The Annals of Mathematical Statistics (3), 832–837. Wand, M. & Jones, M. (1995).
Kernel Smoothing. Monographs on Statistics and Applied Probability. Chapman and Hall, London, UK.
Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high frequency data.
Journal of Econometrics (1), 235–250.
Zeng, Y. (2003). A partially observed model for micromovement of asset prices with Bayes estimation via filtering.
Mathematical Finance (3), 411–444. Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach.
Bernoulli (6), 1019–1043. Zhang, L., Mykland, P. A. & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data.
Journal of the American Statistical Association (472), 1394–1411.
Zu, Y. & Boswijk, H. P. (2014). Estimating spot volatility with high-frequency financial data.
Journal of Econometrics, 181.