Adaptive Robust Large Volatility Matrix Estimation Based on High-Frequency Financial Data
Minseok Shin, Donggyu Kim and Jianqing Fan

KAIST and Princeton University

February 26, 2021
Abstract
Several novel statistical methods have been developed to estimate large integrated volatility matrices based on high-frequency financial data. To investigate their asymptotic behaviors, they require a sub-Gaussian or finite high-order moment assumption for observed log-returns, which cannot account for the heavy tail phenomenon of stock returns. Recently, a robust estimator was developed to handle heavy-tailed distributions with some bounded fourth-moment assumption. However, we often observe that log-returns have heavier-tailed distributions than a finite fourth moment allows, and that the degrees of heaviness of tails are heterogeneous across assets and over time. In this paper, to deal with the heterogeneous heavy-tailed distributions, we develop an adaptive robust integrated volatility estimator that employs pre-averaging and truncation schemes based on jump-diffusion processes. We call this an adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has a sub-Weibull tail concentration with only finite $2\alpha$-th moments for any $\alpha > 1$.

∗ Minseok Shin is a Ph.D. student, College of Business, KAIST, Seoul 02455, South Korea. Donggyu Kim is Ewon Assistant Professor, College of Business, KAIST, Seoul 02455, South Korea. His research was supported by KAIST Basic Research Funds by Faculty (A0601003029). Jianqing Fan is Frederick L. Moore '18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544. His research was supported by NSFC grants No. 71991471 and 71991470.

Key words:
Heterogeneity, tail index, pre-averaging, minimax lower bound, optimality, POET, factor model.
In modern financial studies and practices, volatility estimation is fundamental in risk management, performance evaluation, and portfolio allocation. Due to the wide availability of high-frequency financial data, many well-performing volatility estimation methods have been developed to estimate integrated volatilities. Examples include two-time scale realized volatility (TSRV) (Zhang et al., 2005), multi-scale realized volatility (MSRV) (Zhang, 2006, 2011), the wavelet estimator (Fan and Wang, 2007), pre-averaging realized volatility (PRV) (Christensen et al., 2010; Jacod et al., 2009), kernel realized volatility (KRV) (Barndorff-Nielsen et al., 2008, 2011), the quasi-maximum likelihood estimator (QMLE) (Aït-Sahalia et al., 2010; Xiu, 2010), and the local method of moments (Bibinger et al., 2014). One of the stylized features of financial data is the existence of price jumps, and empirical studies have shown that the decomposition of daily variation into its continuous and jump components can better explain the volatility dynamics (Aït-Sahalia et al., 2012; Andersen et al., 2007; Barndorff-Nielsen and Shephard, 2006; Corsi et al., 2010; Song et al., 2020). For example, Fan and Wang (2007) and Zhang et al. (2016) employed the wavelet method to identify the jumps given noisy high-frequency data. Mancini (2004) studied a threshold method for jump detection and presented the order of an optimal threshold, and Davies and Tauchen (2018) further examined a data-driven threshold method. These estimation methods perform well for a small number of assets. However, we often encounter a large number of assets in practice, such as in portfolio allocation, which results in the curse of dimensionality. To overcome the curse of dimensionality and obtain an efficient and effective large volatility estimator, we often impose the approximate factor structure on the volatility matrix (Fan and Kim, 2018; Fan et al., 2013, 2018; Kim and Fan, 2019).
For example, to account for common market factors such as sector, firm size, and book-to-market ratios, the factor-based high-dimensional Itô process is widely employed and the idiosyncratic volatility is assumed to be sparse (Aït-Sahalia and Xiu, 2017; Fan et al., 2016a,b; Kim et al., 2018; Kong, 2018). The principal orthogonal complement thresholding (POET) method (Fan et al., 2013) is often employed to estimate these low-rank plus sparse matrices.

The performance of the factor-based large volatility matrix estimator critically depends on the accuracy of each integrated volatility estimator. Specifically, sub-Weibull tail concentration for the input volatility matrix estimator is required to investigate its asymptotic behaviors. However, one stylized feature of stock return data is heavy-tailedness, which violates the sub-Gaussian assumption on the stock return data. Recently, with a bounded fourth-moment assumption on the microstructural noise, Fan and Kim (2018) developed a robust estimation method, which can attain sub-Gaussian tail concentration with the optimal convergence rate. See also Catoni (2012); Minsker (2018). However, empirical studies have demonstrated that the bounded fourth-moment condition is often violated (Cont, 2001; Mao and Zhang, 2018; Massacci, 2017). Figure 1 shows the box plots of daily log kurtoses of the returns of the 200 most liquid assets in the S&P 500 index, calculated from 1-min log-return data with the previous tick scheme, for each of 5 days in 2016: from the day with the largest interquartile range (IQR) to the day with the smallest IQR among 252 days. In Figure 1, we find that the log-return data are heavy-tailed and also have heterogeneous degrees of heaviness of tails over the different assets and different days.
These facts generate the demand for developing an adaptive robust estimation method that can handle heterogeneous heavy-tailedness.

In this paper, we develop an adaptive robust integrated volatility estimator based on jump-diffusion processes contaminated by microstructural noises. We first use the pre-averaging scheme (Jacod et al., 2009) to adjust the unbalanced order relationship between
Figure 1: The box plots for the daily distributions of log kurtoses calculated from 1-min log-returns based on the 200 most liquid stocks in the S&P 500 index in 2016. Day (a) has the largest IQR, and days (b)–(e) have the 75th, 50th, 25th, and 0th (smallest) percentiles of the IQR among 252 trading days in 2016, respectively. The red dashed line marks the kurtosis of the $t$-distribution with 5 degrees of freedom.

the microstructural noises and true log-returns. We then employ the truncation method (Minsker, 2018) using the daily moment conditions of assets. Specifically, we truncate pre-averaged variables according to their heavy-tailedness, which allows the merits of adaptive learning to be enjoyed. Also, the truncation method sufficiently mitigates the effect of the jump signal on the pre-averaged variables. We call the proposed estimator the adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has sub-Weibull tail concentration with a finite $2\alpha$-th moment assumption for any $\alpha > 1$.

We first define some notations. For any given $p$ by $p$ matrix $A = (A_{ij})_{1 \le i \le p,\, 1 \le j \le p}$, let
\[
\|A\|_1 = \max_{1 \le j \le p} \sum_{i=1}^{p} |A_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le p} \sum_{j=1}^{p} |A_{ij}|, \qquad \|A\|_{\max} = \max_{i,j} |A_{ij}|.
\]
The matrix spectral norm $\|A\|_2$ is the square root of the largest eigenvalue of $A A^\top$, and the Frobenius norm of $A$ is denoted by $\|A\|_F = \sqrt{\operatorname{tr}(A^\top A)}$. We will use $C$ to denote a generic positive constant whose value is free of $n$ and $p$ and may change from appearance to appearance.

Let $X(t) = (X_1(t), \ldots$
$, X_p(t))^\top$ be the vector of true log-prices for $p$ assets at time $t$. To model the high-frequency financial data, we often employ the jump-diffusion process as follows:
\[
dX(t) = dX^c(t) + L(t)\, d\Lambda(t) = \mu(t)\, dt + \sigma^\top(t)\, dW_t + L(t)\, d\Lambda(t), \tag{2.1}
\]
where $X^c(t) = (X_1^c(t), \ldots, X_p^c(t))^\top$ with $X^c(0) = X(0)$ is the vector of true continuous log-prices at time $t$, $\mu(t) = (\mu_1(t), \ldots, \mu_p(t))^\top$ is a drift vector, $\sigma(t)$ is a $q$ by $p$ matrix, and $W_t$
is a $q$-dimensional independent Brownian motion, and the stochastic processes $\mu(t)$, $X(t)$, $X^c(t)$, and $\sigma(t)$ are defined on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t, t \in [0,1]\}, P)$ with filtration $\mathcal{F}_t$ satisfying the usual conditions. For the jump part, $L(t) = (L_1(t), \ldots, L_p(t))^\top$ denotes the jump sizes and $\Lambda(t) = (\Lambda_1(t), \ldots, \Lambda_p(t))^\top$ is the $p$-dimensional Poisson process with bounded intensity $I(t) = (I_1(t), \ldots, I_p(t))^\top$. The instantaneous volatility matrix of the continuous log-price $X^c(t)$ is
\[
\gamma(t) = (\gamma_{ij}(t))_{1 \le i,j \le p} = \sigma^\top(t) \sigma(t),
\]
and its quadratic variation is
\[
[X^c, X^c]_t = \int_0^t \gamma(s)\, ds = \left( \int_0^t \gamma_{ij}(s)\, ds \right)_{1 \le i,j \le p} = \int_0^t \sigma^\top(s) \sigma(s)\, ds.
\]
The parameter of interest is the integrated volatility matrix of $X^c(t)$,
\[
\Gamma = [X^c, X^c]_1 = \int_0^1 \gamma(s)\, ds. \tag{2.2}
\]
Unfortunately, we cannot observe the true log-prices $X(t)$. In fact, observed high-frequency data are contaminated by microstructural noises. Furthermore, high-frequency data encounter a non-synchronization problem in that transactions for multiple assets often arrive asynchronously. In this regard, we assume that the observed log-price $Y_i(t_{i,k})$ obeys the following model:
\[
Y_i(t_{i,k}) = X_i(t_{i,k}) + \epsilon_i(t_{i,k}) \quad \text{for } i = 1, \ldots, p, \; k = 0, \ldots, n_i, \tag{2.3}
\]
where $t_{i,k}$ is the $k$-th observation time point of the $i$-th asset; for fixed $i = 1, \ldots, p$, the $\epsilon_i(t_{i,k})$, $k = 0, \ldots, n_i$, are i.i.d. noises with a mean of zero; and for $i, j = 1, \ldots, p$, $\mathrm{E}[\epsilon_i(t)\epsilon_j(t)] = \eta_{ij}$ and $\epsilon_i(t)$ is independent of $\epsilon_j(t')$ for $t \neq t'$.
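To make the model concrete, the following minimal sketch simulates one asset's observed log-prices under (2.1) and (2.3): a continuous diffusion plus compound-Poisson-style jumps, observed with i.i.d. microstructural noise. All numerical values (volatility level, jump intensity, noise scale) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23400                      # e.g. one observation per second over 6.5 hours
dt = 1.0 / n                   # trading day normalized to [0, 1]
sigma = 0.2                    # spot volatility, held constant for simplicity
lam = 3.0                      # Poisson jump intensity over the day
noise_sd = 5e-4                # microstructural noise standard deviation

# continuous part: dX^c(t) = mu(t) dt + sigma dW_t (zero drift here)
Xc = np.concatenate([[0.0], np.cumsum(sigma * rng.normal(0.0, np.sqrt(dt), n))])

# jump part L(t) dLambda(t): Poisson counts times Gaussian jump sizes
counts = rng.poisson(lam * dt, n + 1).astype(float)
X = Xc + np.cumsum(counts * rng.normal(0.0, 0.01, n + 1))

# observed log-prices (2.3): Y = X + epsilon with i.i.d. mean-zero noise
Y = X + rng.normal(0.0, noise_sd, n + 1)

# realized variance of raw noisy returns is inflated by roughly 2 n eta_ii
print(float(np.sum(np.diff(Y) ** 2)), 2 * n * noise_sd ** 2)
```

The printed comparison illustrates why raw realized variance fails here: the noise term $2 n \eta_{ii}$ does not vanish as the sampling frequency grows, which motivates the pre-averaging scheme of Section 3.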
In other words, we allow the microstructural noises to have cross-sectional dependency, and $\epsilon_i(\cdot)$ is independent of the price processes $X_i(\cdot)$.

To handle the microstructural noise issue, several estimation methods have been developed (Aït-Sahalia et al., 2010; Barndorff-Nielsen et al., 2008, 2011; Bibinger et al., 2014; Christensen et al., 2010; Fan and Wang, 2007; Jacod et al., 2009; Xiu, 2010; Zhang et al., 2005; Zhang, 2006, 2011). They work well for a finite number of assets and are widely adopted to develop large volatility matrix estimation procedures (Kim et al., 2016; Wang and Zou, 2010). However, the observed log-prices are heavy-tailed, so these methods cannot lead to estimators with the sub-Weibull concentration bound that is essential for the asymptotic analysis of large volatility matrix inferences. To tackle the heavy tail issue, Fan and Kim (2018) proposed the robust pre-averaging realized volatility estimation procedure, which can achieve sub-Gaussian tail concentration with only a finite fourth-moment condition on the microstructural noise. However, as shown in Figure 1, the degrees of heaviness of tails of log-returns are heterogeneous across assets and over time. Furthermore, jumps in the true log-price process can also cause heavy-tailed distributions. To account for these features, we accommodate heterogeneous degrees of tail distributions based on the jump-diffusion process contaminated by microstructural noises. We assume that each asset has a different order of the highest finite absolute moment (see Assumption 1 in Section 4 for details).

In this section, we introduce an adaptive robust integrated volatility estimation procedure to handle the non-synchronization, price jumps, and microstructural noise. To handle the non-synchronization problem, we consider the generalized sampling time proposed by Aït-Sahalia et al. (2010).
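As a concrete instance of such a synchronization scheme, the previous-tick rule can be sketched as follows; the function name and toy data are our own illustration.

```python
import numpy as np

def previous_tick(obs_times, grid):
    """Index of the last observation time <= tau_k, for each grid point tau_k.
    With at least one observation in each (tau_{k-1}, tau_k], the selected
    time lies in that interval, as the generalized sampling time requires."""
    return np.searchsorted(obs_times, grid, side='right') - 1

# toy example: irregular observation times for one asset on [0, 1]
obs_times = np.array([0.00, 0.13, 0.24, 0.38, 0.55, 0.71, 0.86, 0.99])
grid = np.array([0.25, 0.50, 0.75, 1.00])               # tau_1, ..., tau_4
print(obs_times[previous_tick(obs_times, grid)])        # → [0.24 0.38 0.71 0.99]
```

Applying the rule asset by asset against a common grid $\{\tau_k\}$ yields one synchronized observation per asset per interval.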
We note that the generalized sampling time scheme includes other synchronization schemes such as previous tick (Zhang, 2011; Wang and Zou, 2010) and refresh time (Barndorff-Nielsen et al., 2011; Fan et al., 2012). See also Bibinger et al. (2014); Chen et al. (2020); Hayashi and Yoshida (2005, 2011); Malliavin et al. (2009); Park et al. (2016). We define the generalized sampling time as follows.

Definition 1. (Aït-Sahalia et al., 2010). A sequence of time points $\tau = \{\tau_0, \ldots, \tau_n\}$ is said to be the generalized sampling time if

(1) $0 = \tau_0 < \tau_1 < \cdots < \tau_{n-1} < \tau_n = 1$;

(2) there exists at least one observation for each asset between consecutive $\tau_j$'s;

(3) the time intervals $\{\Delta_j = \tau_j - \tau_{j-1};\ j = 1, \ldots, n\}$ satisfy $\sup_j \Delta_j \stackrel{p}{\longrightarrow} 0$.

For the $i$-th asset, we select an arbitrary observation, $Y_i(\tau_{i,k})$, between $\tau_{k-1}$ and $\tau_k$. In other words, we choose any $\tau_{i,k} \in (\tau_{k-1}, \tau_k] \cap \{t_{i,l}, l = 0, 1, \ldots, n_i\}$, $i = 1, \ldots, p$.

Based on the synchronized time $\tau$, we adopt the pre-averaging method to manage the microstructural noise (Jacod et al., 2009). For the observed log-returns, $Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k})$, $i = 1, \ldots, p$, $k = 1, \ldots, n -$
$1$, the variance of the microstructural noise, $2\eta_{ii}$, dominates the continuous log-return volatility $\int_{\tau_{i,k}}^{\tau_{i,k+1}} \gamma_{ii}(t)\, dt$. Therefore, it is hard to estimate the integrated volatility without smoothing to denoise. To adjust the order relationship between the noises and continuous log-returns, we use the following pre-averaged data to suppress the noises (Christensen et al., 2010; Jacod et al., 2009):
\[
Z_i(\tau_k) = \sum_{l=0}^{K_n - 1} g\!\left(\frac{l}{K_n}\right) \{ Y_i(\tau_{i,k+l+1}) - Y_i(\tau_{i,k+l}) \} \quad \text{for } i = 1, \ldots, p, \; k = 1, \ldots, n - K_n, \tag{3.1}
\]
where the weight function $g(\cdot)$ is continuous and piecewise continuously differentiable with a piecewise Lipschitz derivative $g'$ and satisfies $g(0) = g(1) = 0$ and $\int_0^1 \{g(t)\}^2\, dt >$
$0$. In this paper, we choose the bandwidth parameter $K_n$ as $C_K n^{1/2}$ for some constant $C_K$, which provides the optimal rate $n^{-1/4}$. Then the continuous log-returns and noises in the $Z_i(\tau_k)$'s are of the same order of magnitude (Fan and Kim, 2018). However, as shown in Figure 1, the pre-averaged random variables still have heterogeneous heavy tails across assets. Furthermore, there exist jump variations in the pre-averaged data. We note that in $Z_i(\tau_k)$, the jumps have a higher order of magnitude than the noises and continuous log-returns. To handle these problems, we robustly estimate the volatility matrix by applying an adaptive truncation method according to the tails of the data.

Define the quadratic pre-averaged random variables
\[
Q_{ij}(\tau_k) = \frac{n - K_n}{\phi K_n} Z_i(\tau_k) Z_j(\tau_k) \quad \text{for } i, j = 1, \ldots, p, \; k = 1, \ldots, n - K_n, \tag{3.2}
\]
where $\phi = \frac{1}{K_n} \sum_{\ell=0}^{K_n - 1} \left\{ g\!\left(\frac{\ell}{K_n}\right) \right\}^2$, and let
\[
\alpha_{ij} = 2 \wedge \frac{2 \alpha_i \alpha_j}{\alpha_i + \alpha_j}, \tag{3.3}
\]
where $\alpha_i$ is the order of the highest finite moment for the continuous part of $Q_{ii}(\tau_k)$ (see Assumption 1 in Section 4). Then, to handle the heterogeneous heavy tails, we propose the following adaptive truncation method:
\[
\widehat{T}^{\alpha}_{ij,\theta} = \frac{1}{(n - K_n)\theta_{ij}} \sum_{k=1}^{n - K_n} \psi_{\alpha_{ij}}\{ \theta_{ij} Q_{ij}(\tau_k) \}, \tag{3.4}
\]
where $\theta_{ij}$ is a truncation parameter and $\psi_\alpha(x)$ is a bounded non-decreasing function defined for $1 < \alpha \le 2$:
\[
\psi_\alpha(x) = \begin{cases} -\log(1 - t_\alpha + c_\alpha t_\alpha^{\alpha}) & \text{if } x \ge t_\alpha, \\ -\log(1 - x + c_\alpha x^{\alpha}) & \text{if } 0 \le x \le t_\alpha, \\ \log(1 + x + c_\alpha |x|^{\alpha}) & \text{if } -t_\alpha \le x \le 0, \\ \log(1 - t_\alpha + c_\alpha t_\alpha^{\alpha}) & \text{if } x \le -t_\alpha, \end{cases}
\]
where $c_\alpha = \max\left\{ (\alpha - 1)/\alpha, \sqrt{(2 - \alpha)/\alpha} \right\}$ and $t_\alpha = (1/(\alpha c_\alpha))^{1/(\alpha - 1)}$. We note that the truncation detects the jumps and mitigates their impact on the estimator. Other truncations can also achieve a similar goal (see Fan et al. (2021)).
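A minimal end-to-end sketch of steps (3.1)–(3.4) for a single asset. The simulated path, the fixed truncation level $\theta$, and the tail index value are illustrative assumptions (the paper's data-driven choices of $\theta$ appear in Section 5); $K_n = \lfloor n^{1/2} \rfloor$ and $g(x) = x \wedge (1 - x)$ follow the paper's empirical study.

```python
import numpy as np

def g(x):                                    # weight function g(x) = x ∧ (1 - x)
    return np.minimum(x, 1.0 - x)

def psi(x, a):
    """Bounded, non-decreasing influence function psi_alpha for 1 < alpha <= 2."""
    c = max((a - 1.0) / a, np.sqrt((2.0 - a) / a))     # c_alpha
    t = (1.0 / (a * c)) ** (1.0 / (a - 1.0))           # t_alpha
    xc = np.clip(np.asarray(x, dtype=float), -t, t)    # flat outside [-t_a, t_a]
    pos = -np.log(1.0 - xc + c * np.abs(xc) ** a)      # branch for x >= 0
    neg = np.log(1.0 + xc + c * np.abs(xc) ** a)       # branch for x < 0
    return np.where(xc >= 0.0, pos, neg)

def arp_T(y, a_ij, theta):
    """Truncated estimator for one asset from synchronized log-prices y, (3.1)-(3.4)."""
    n = len(y) - 1
    K = int(np.sqrt(n))                                # K_n ~ C_K n^{1/2}
    w = g(np.arange(K) / K)                            # g(l / K_n)
    phi = np.mean(w ** 2)                              # phi = K_n^{-1} sum_l g(l/K_n)^2
    ret = np.diff(y)
    Z = np.array([w @ ret[k:k + K] for k in range(n - K)])   # pre-averaging (3.1)
    Q = (n - K) / (phi * K) * Z * Z                          # quadratic variables (3.2)
    return psi(theta * Q, a_ij).sum() / ((n - K) * theta)    # adaptive truncation (3.4)

rng = np.random.default_rng(0)
n = 23400
x = np.cumsum(rng.normal(0.0, 0.2 / np.sqrt(n), n + 1))  # diffusion with sigma = 0.2
x[n // 2:] += 0.05                                       # one price jump
y = x + 1e-3 * rng.standard_t(df=3, size=n + 1)          # heavy-tailed noise
est = arp_T(y, a_ij=1.5, theta=1.0)
print(est)                                # rough estimate of T_ii = Gamma_ii + rho_ii
```

Because $\psi_\alpha$ is bounded, the windows that contain the jump contribute at most $\psi_{\alpha_{ij}}$'s cap to the sum, instead of entering quadratically as they would in a plain average of the $Q_{ii}(\tau_k)$'s.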
It will be shown that the proposed adaptive robust estimator $\widehat{T}^{\alpha}_{ij,\theta}$ possesses sub-Weibull concentration bounds (see Theorem 1).

The adaptive robust estimator $\widehat{T}^{\alpha}_{ij,\theta}$ is, however, not a consistent estimator of the true integrated volatility $\Gamma_{ij}$, since the noises still remain in each $Q_{ij}(\tau_k)$. Indeed, it will be shown that $\widehat{T}^{\alpha}_{ij,\theta}$ converges to
\[
T_{ij} = \Gamma_{ij} + \rho_{ij}, \tag{3.5}
\]
where $\rho_{ij} = \frac{\sum_{k=1}^{n} 1(\tau_{i,k} = \tau_{j,k})}{\phi K_n} \zeta \eta_{ij}$,
\[
\zeta = \sum_{l=0}^{K_n - 1} \left\{ g\!\left(\frac{l}{K_n}\right) - g\!\left(\frac{l+1}{K_n}\right) \right\}^2 = O\!\left(\frac{1}{K_n}\right),
\]
with the covariance of noise $\eta_{ij}$ defined in (2.3), and $1(\cdot)$ is the indicator function. Hence, to estimate the integrated volatility $\Gamma_{ij}$, we adjust $\widehat{T}^{\alpha}_{ij,\theta}$ by subtracting an estimator of $\rho_{ij}$. For this purpose, let us first define an adaptive robust estimator, $\widehat{\rho}^{\alpha}_{ij,\theta}$, as
\[
\widehat{\rho}^{\alpha}_{ij,\theta} = \frac{\zeta}{\phi K_n \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\{ \theta_{\rho,ij} Q_{\rho,ij}(\tau_k) \}, \tag{3.6}
\]
where
\[
Q_{\rho,ij}(\tau_k) = \frac{1}{2} \{ Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k}) \}\{ Y_j(\tau_{j,k+1}) - Y_j(\tau_{j,k}) \} \tag{3.7}
\]
for $i, j = 1, \ldots, p$, $k = 1, \ldots, n - 1$, and $\theta_{\rho,ij}$ is a truncation parameter that will be specified in Theorem 3. We now define the integrated volatility estimator as follows:
\[
\widehat{\Gamma}_{ij} = \widehat{T}^{\alpha}_{ij,\theta} - \widehat{\rho}^{\alpha}_{ij,\theta}. \tag{3.8}
\]
We call this the adaptive robust pre-averaging realized volatility (ARP) estimator. This provides a preliminary consistent estimate of $\Gamma_{ij}$, which will be further regularized.

In this section, we show the concentration property and optimality of the ARP estimator by establishing matching upper and lower bounds for both $\widehat{T}^{\alpha}_{ij,\theta}$ and $\widehat{\rho}^{\alpha}_{ij,\theta}$. Note that we do not impose any restrictions on the jump sizes $L(t)$ in (2.1). In other words, for the true log-prices, we only need assumptions for the continuous part.
Specifically, Assumptions 1–2 are based on the following random variables:
\[
Y^c_i(\tau_{i,k}) = X^c_i(\tau_{i,k}) + \epsilon_i(\tau_{i,k}), \qquad Z^c_i(\tau_k) = \sum_{l=0}^{K_n - 1} g\!\left(\frac{l}{K_n}\right) \{ Y^c_i(\tau_{i,k+l+1}) - Y^c_i(\tau_{i,k+l}) \},
\]
\[
Q^c_{ij}(\tau_k) = \frac{n - K_n}{\phi K_n} Z^c_i(\tau_k) Z^c_j(\tau_k), \qquad Q^c_{\rho,ij}(\tau_k) = \frac{1}{2} \{ Y^c_i(\tau_{i,k+1}) - Y^c_i(\tau_{i,k}) \}\{ Y^c_j(\tau_{j,k+1}) - Y^c_j(\tau_{j,k}) \},
\]
where $X^c_i(t)$ is the true continuous log-price process defined in (2.1) and the superscript $c$ represents the continuous part of the true log-price. Now, to investigate the asymptotic properties of $\widehat{T}^{\alpha}_{ij,\theta}$, we make the following assumptions.

Assumption 1.

(a) There exist positive constants $\nu_\mu$ and $\nu_\gamma$ such that $\max_{1 \le i \le p} \max_{0 \le t \le 1} |\mu_i(t)| \le \nu_\mu$ a.s. and $\max_{1 \le i \le p} \max_{0 \le t \le 1} \gamma_{ii}(t) \le \nu_\gamma$ a.s.;

(b) The generalized sampling time $\{\tau_j\}$ is independent of the price process $X(t)$ and the noise $\epsilon_i(t_{i,k})$. The time intervals $\{\Delta_j = \tau_j - \tau_{j-1},\ 1 \le j \le n\}$ satisfy $\max_{1 \le j \le n} \Delta_j \le C n^{-1}$ a.s.;

(c) There exists a positive constant $\nu_Q$ such that $\max_{1 \le i \le p} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le \nu_Q$ for all $1 \le k \le n - K_n$.

Remark 1.
For Assumption 1(a), the boundedness condition on the instantaneous volatility process $\gamma_{ii}(t)$ can be relaxed to a local boundedness condition when we investigate the asymptotic behaviors of volatility estimators, such as their convergence rate (see Aït-Sahalia and Xiu (2017)). Specifically, Lemma 4.4.9 in Jacod and Protter (2012) indicates that if an asymptotic result, such as convergence in probability or stable convergence in law, is satisfied under the boundedness condition, it is also satisfied under the local boundedness condition. From this point of view, since we consider a finite time period, it is sufficient to investigate the asymptotic properties under the boundedness condition. Thus, Assumption 1(a) is not restrictive.

Remark 2.
Assumption 1(c) is the finite moment condition, which entails that the quadratic pre-averaged variable $Q^c_{ij}(\tau_k)$ for the continuous part satisfies
\[
\mathrm{E}\left\{ \left| Q^c_{ij}(\tau_k) \right|^{\alpha_{ij}} \,\middle|\, \mathcal{F}_{\tau_k} \right\} \le U_{ij}(\tau_k) \quad \text{a.s.}
\]
for all $1 \le i, j \le p$, $1 \le k \le n - K_n$, and some positive constants $U_{ij}(\tau_k)$, where $\alpha_{ij}$ is defined in (3.3) (see Proposition 2(a) in the Appendix). To account for the heterogeneous heavy-tailedness, we allow the tail index $\alpha_i$ to vary from 1 to infinity. If $\alpha_i = 2$ for all $i = 1, \ldots, p$, it is a similar setting to that of Fan and Kim (2018), and the ARP estimator has universal truncation, which we call the universal robust pre-averaging realized volatility (URP) estimator. To investigate the heterogeneous heavy tails, we compare the ARP and URP estimators in the numerical study.

The theorem below shows that $\widehat{T}^{\alpha}_{ij,\theta}$ has sub-Weibull tail concentration with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.

Theorem 1. (Upper bound) Under the models (2.1) and (2.3) and Assumption 1, let $\delta^{-1} \in [n^c, e^{\sqrt{n}}]$ for some positive constant $c > 0$. Take
\[
\theta_{ij} = \left( \frac{K_n \log(3 K_n \delta^{-1}) (\alpha_{ij} - 1)}{c_{\alpha_{ij}} S_{ij} (n - K_n)} \right)^{1/\alpha_{ij}},
\]
where $S_{ij} = \frac{1}{n - K_n} \sum_{k=1}^{n - K_n} U_{ij}(\tau_k)$. Then we have, for sufficiently large $n$,
\[
\Pr\left\{ \left| \widehat{T}^{\alpha}_{ij,\theta} - T_{ij} \right| \le C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta. \tag{4.1}
\]
Theorem 1 indicates that $\widehat{T}^{\alpha}_{ij,\theta}$ has a sub-Weibull concentration bound with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.
Specifically, as long as $p^{b+2} \in [n^c, e^{\sqrt{n}}]$ for some positive constant $c > 0$,
\[
\Pr\left\{ \max_{1 \le i,j \le p} \left| \widehat{T}^{\alpha}_{ij,\theta} - T_{ij} \right| \ge C_b \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha} \right\} \le p^{-b}
\]
for any constant $b > 0$ and $\alpha = \min_{1 \le i \le p} \alpha_i$, where $C_b$ is some constant depending on $b$; this is the essential condition for investigating large matrix inferences (see Proposition 1). An interesting finding is that there is a trade-off between the convergence rate $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ and the tail indexes $\alpha_i$ and $\alpha_j$. This raises the question of whether the upper bound in (4.1) is optimal or not.

Let $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta) = \widehat{T}_{ij}(Q_{ij}(\tau_1), \ldots, Q_{ij}(\tau_{n - K_n}), \delta)$ be any pre-averaging estimator for $T_{ij}$ defined in (3.5), which takes values of the pre-averaged variables $Q_{ij}(\tau_k)$, $k = 1, \ldots, n - K_n$, defined in (3.2). The following theorem establishes the lower bound for the maximum concentration probability among the class of pre-averaging estimators $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)$ which satisfy $\max_{1 \le i \le p} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le C$ for all $1 \le k \le n - K_n$.

Theorem 2. (Lower bound) Under the assumptions in Theorem 1, let $\alpha_{ij} \in (1, 2]$ for some $1 \le i, j \le p$. Then we have, for sufficiently large $n$,
\[
\min_{\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)} \; \max_{Q^c \in \Xi(\alpha_1, \ldots, \alpha_p)} \Pr\left\{ \left| \widehat{T}_{ij}(Q_{ij}(\tau_k), \delta) - T_{ij} \right| \ge C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge \delta, \tag{4.2}
\]
where $\Xi(\alpha_1, \ldots, \alpha_p) = \left\{ Q^c = (Q^c_{ii}(\tau_k))_{i = 1, \ldots, p,\, k = 1, \ldots, n - K_n} : \max_{i,k} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le C \right\}$.

Theorem 2 shows that the lower bound is $n^{(1-\alpha_{ij})/2\alpha_{ij}}$, which matches the upper bound in Theorem 1. Thus, the proposed estimator obtains the optimal convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.

Remark 3.
To handle the microstructural noise, we use the sub-sampling scheme, and the number of non-overlapping quadratic pre-averaged variables $Q_{ij}(\tau_k)$ is $C n^{1/2}$, which is known as the optimal choice. That is, we are only able to use $n^{1/2}$ observations to estimate $T_{ij}$ due to biases of varying spot volatilities, which is the cost of managing the microstructural noise. Thus, the optimal convergence rate is expected to be the square root of the rates of the estimators that are not affected by the microstructural noise. From this point of view, the convergence rate $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ is consistent with the results in Devroye et al. (2016) and Sun et al. (2020).

Recall that the ARP estimator has the bias adjustment as follows:
\[
\widehat{\Gamma}^{\alpha}_{ij} = \widehat{T}^{\alpha}_{ij,\theta} - \widehat{\rho}^{\alpha}_{ij,\theta}. \tag{4.3}
\]
Thus, to establish the concentration inequality for the ARP estimator $\widehat{\Gamma}^{\alpha}_{ij}$, we need to investigate $\widehat{\rho}^{\alpha}_{ij,\theta}$. To do this, we use the quadratic log-return random variables $Q_{\rho,ij}(\tau_k)$ defined in (3.7) and need the following moment condition.

Assumption 2.
There exists a positive constant $\nu_{\rho,Q}$ such that $\max_{1 \le i \le p} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le \nu_{\rho,Q}$ for all $1 \le k \le n - 1$.

Remark 4.
Assumption 2 indicates that the continuous part of the observed log-return, $Y^c_i(\tau_{i,k+1}) - Y^c_i(\tau_{i,k})$, has a finite $2\alpha_i$-th moment. We note that Assumption 1(c) is satisfied under Assumptions 1(a)–(b) and Assumption 2 (see Proposition 3 in the Appendix).

Under Assumptions 1(a)–(b) and Assumption 2, $Q^c_{\rho,ij}(\tau_k)$ has conditional $\alpha_{ij}$-th moments,
\[
\mathrm{E}\left\{ \left| Q^c_{\rho,ij}(\tau_k) \right|^{\alpha_{ij}} \,\middle|\, \mathcal{F}_{\tau_{k-1}} \right\} \le U_{\rho,ij}(\tau_k) \quad \text{a.s.}
\]
for all $1 \le i, j \le p$, $1 \le k \le n -$
$1$, and some positive constants $U_{\rho,ij}(\tau_k)$ (see Proposition 2(b) in the Appendix). With this $\alpha_{ij}$-th moment condition, we establish the concentration inequalities for the ARP estimator $\widehat{\Gamma}^{\alpha}_{ij}$ in the following theorem.

Theorem 3. (Upper bound) Under the assumptions in Theorem 1 and Assumption 2, take
\[
\theta_{\rho,ij} = \left( \frac{\log(6 \delta^{-1}) (\alpha_{ij} - 1)}{c_{\alpha_{ij}} S_{\rho,ij} (n - 1)} \right)^{1/\alpha_{ij}},
\]
where $S_{\rho,ij} = \frac{1}{n-1} \sum_{k=1}^{n-1} U_{\rho,ij}(\tau_k)$. Then, for sufficiently large $n$, we have
\[
\Pr\left\{ \left| \widehat{\rho}^{\alpha}_{ij,\theta} - \rho_{ij} \right| \le C \left( n^{-1} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta \tag{4.4}
\]
and
\[
\Pr\left\{ \left| \widehat{\Gamma}^{\alpha}_{ij} - \Gamma_{ij} \right| \le C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta. \tag{4.5}
\]
Theorem 3 shows that $\widehat{\rho}^{\alpha}_{ij,\theta}$ has a sub-Weibull tail concentration bound with a convergence rate of $n^{(1-\alpha_{ij})/\alpha_{ij}}$, which is negligible compared to the upper bound in Theorem 1. Thus, the ARP estimator has sub-Weibull tail concentration with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ as in Theorem 1, which is optimal as shown in Theorems 1–2. Although the upper bound for $\widehat{\rho}^{\alpha}_{ij,\theta}$ is dominated by the upper bound in Theorem 1, it is worth checking whether $\widehat{\rho}^{\alpha}_{ij,\theta}$ is an optimal estimator or not. Let $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta) = \widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_1), \ldots, Q_{\rho,ij}(\tau_{n-1}), \delta)$ be any estimator for $\rho_{ij}$, possibly depending on $\delta$. The following theorem provides a lower bound for the maximum concentration probability among the class of estimators $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)$ which satisfy $\max_{1 \le i \le p} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le C$ for all $1 \le k \le n - 1$.

Theorem 4. (Lower bound) Under the assumptions in Theorem 3, let $\alpha_{ij} \in (1, 2]$ for some $1 \le i, j \le p$.
Then we have, for sufficiently large $n$,
\[
\min_{\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)} \; \max_{Q^c_\rho \in \Xi_\rho(\alpha_1, \ldots, \alpha_p)} \Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta) - \rho_{ij} \right| \ge C \left( \log \delta^{-1} / n \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge \delta, \tag{4.6}
\]
where $\Xi_\rho(\alpha_1, \ldots, \alpha_p) = \left\{ Q^c_\rho = (Q^c_{\rho,ii}(\tau_k))_{i = 1, \ldots, p,\, k = 1, \ldots, n-1} : \max_{i,k} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le C \right\}$.

The upper bound in (4.4) and the lower bound in (4.6) match, which implies that $\widehat{\rho}^{\alpha}_{ij,\theta}$ achieves the optimal rate. To sum up, the proposed estimators for $T_{ij}$ and $\rho_{ij}$ are both optimal in terms of convergence rate, which indicates that the ARP estimator is also optimal in the class of pre-averaging approaches.

5 Application to large volatility matrix estimation
In this section, we discuss how to estimate large integrated volatility matrices based on the approximate factor model using the ARP estimator. Specifically, we assume that the integrated volatility matrix has the following low-rank plus sparse structure:
\[
\Gamma = \sum_{k=1}^{p} \lambda_k q_k q_k^\top = \Theta + \Sigma = \sum_{k=1}^{r} \bar{\lambda}_k \bar{q}_k \bar{q}_k^\top + \Sigma,
\]
where $\lambda_i$ and $\bar{\lambda}_i$ are the $i$-th largest eigenvalues of $\Gamma$ and $\Theta$, respectively, and their corresponding eigenvectors are $q_i$ and $\bar{q}_i$. The low-rank volatility matrix $\Theta$ accounts for the factor effect on the volatility matrix. We assume that the rank $r$ of $\Theta$ is bounded. The sparse volatility matrix $\Sigma$ stands for idiosyncratic risk and satisfies the following sparsity condition:
\[
\max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_{ij}|^q (\Sigma_{ii} \Sigma_{jj})^{(1-q)/2} \le M_\sigma s_p \quad \text{a.s.}, \tag{5.1}
\]
where $M_\sigma$ is a positive random variable with $\mathrm{E}(M_\sigma) < \infty$, $q \in [0, 1)$, and $s_p$ is a deterministic function of $p$ that grows slowly in $p$. When $\Sigma_{ii}$ is bounded from below and $q = 0$, $s_p$ measures the maximum number of non-vanishing elements in each row of the matrix $\Sigma$. This low-rank plus sparse structure is widely adopted for studying large matrix inferences (Aït-Sahalia and Xiu, 2017; Bai, 2003; Bai and Ng, 2002; Fan and Kim, 2018; Fan et al., 2018; Kim et al., 2018; Stock and Watson, 2002).

To harness the low-rank plus sparse structure, we employ the principal orthogonal complement thresholding (POET) method (Fan et al., 2013) as follows. We first decompose an input volatility matrix built from the ARP estimators in (3.8):
\[
\widehat{\Gamma} = \left( \widehat{\Gamma}^{\alpha}_{ij} \right)_{i,j = 1, \ldots, p} = \sum_{k=1}^{p} \widehat{\lambda}_k \widehat{q}_k \widehat{q}_k^\top,
\]
where $\widehat{\lambda}_i$ is the $i$-th largest eigenvalue of $\widehat{\Gamma}$, and $\widehat{q}_i$ is its corresponding eigenvector. Then, using the first $r$ principal components, we estimate the low-rank factor volatility matrix $\Theta$
16s follows: (cid:98) Θ = r (cid:88) k =1 (cid:98) λ k (cid:98) q k (cid:98) q (cid:62) k . To estimate the sparse volatility matrix Σ , we first calculate the input idiosyncratic volatilitymatrix estimator (cid:101) Σ = ( (cid:101) Σ ij ) ≤ i,j ≤ p = (cid:98) Γ − (cid:98) Θ and employ the adapted thresholding method asfollows: (cid:98) Σ ij = (cid:101) Σ ij ∨ , if i = js ij ( (cid:101) Σ ij ) ( | (cid:101) Σ ij | ≥ (cid:36) ij ) , if i (cid:54) = j and (cid:98) Σ = ( (cid:98) Σ ij ) ≤ i,j ≤ p , where the thresholding function s ij ( · ) satisfies that | s ij ( x ) − x | ≤ (cid:36) ij , and the adaptive thresh-olding level (cid:36) ij = (cid:36) n (cid:113) ( (cid:101) Σ ii ∨ (cid:101) Σ jj ∨ (cid:36) ij . Examples of the thresholding function s ij ( x ) include the soft thresholding func-tion s ij ( x ) = x − sign( x ) (cid:36) ij and the hard thresholding function s ij ( x ) = x . The tuningparameter (cid:36) n will be specified in Proposition 1. In the empirical study, we use the hardthresholding method.With the low-rank volatility matrix estimator (cid:98) Θ = ( (cid:98) Θ ij ) ≤ i,j ≤ p and the sparse volatilitymatrix estimator (cid:98) Σ = ( (cid:98) Σ ij ) ≤ i,j ≤ p , we estimate the integrated volatility matrix Γ by (cid:98) Γ P OET = (cid:98) Θ + (cid:98) Σ . To investigate asymptotic behaviors of the POET estimator, the sub-Weibull concentra-tion inequality is required, and is satisfied by the ARP estimator as shown in Theorem 3.Thus, the POET estimator based on the ARP estimators can enjoy the similar asymptoticproperties established in Fan and Kim (2018). To study its asymptotic behaviors, we needthe following technical conditions imposed by Fan and Kim (2018), but the sub-Weibullconcentration rate is different because we consider heterogeneous heavy-tailedness.
Assumption 3.

(a) Let $D_\lambda = \min\{ \bar{\lambda}_i - \bar{\lambda}_{i+1} : 1 \le i \le r \}$, $(\lambda_1 + p M_\sigma)/D_\lambda \le C_1$ a.s., and $D_\lambda \ge C_2 p$ a.s. for some generic constants $C_1$ and $C_2$, where $\bar{\lambda}_{r+1} = 0$, $M_\sigma$ is defined in (5.1), and $\bar{\lambda}_i$ and $\lambda_i$ are the $i$-th largest eigenvalues of $\Theta$ and $\Gamma$, respectively;

(b) For some fixed constant $C_3$, we have
\[
p \max_{1 \le i \le p} \sum_{j=1}^{r} \bar{q}_{ij}^2 \le C_3 \quad \text{a.s.},
\]
where $\bar{q}_j = (\bar{q}_{1j}, \ldots, \bar{q}_{pj})^\top$ is the $j$-th eigenvector of $\Theta$;

(c) The smallest eigenvalue of $\Sigma$ stays away from zero almost surely;

(d) $s_p/\sqrt{p} + \left( n^{-1/2} \log p \right)^{(\alpha-1)/\alpha} = o(1)$, where $\alpha = \min_{1 \le i \le p} \alpha_i$.

Under Assumption 3, we can establish the following proposition, similar to the proof of Theorem 3 in Fan and Kim (2018). Below, we assume a generic input $\widehat{\Gamma}$ that satisfies (5.2). In particular, the ARP estimator satisfies this condition, as shown in Theorem 3.

Proposition 1.
Under the model (2.1), let $\alpha = \min_{1 \le i \le p} \alpha_i$ and assume that the concentration inequality
\[
\Pr\left\{ \max_{1 \le i,j \le p} \left| \widehat{\Gamma}_{ij} - \Gamma_{ij} \right| \ge C \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha} \right\} \le p^{-1}, \tag{5.2}
\]
Assumption 3, and the sparsity condition (5.1) are met. Take $\varpi_n = C_\varpi \beta_n$ for some large fixed constant $C_\varpi$, where $\beta_n = M_\sigma s_p / p + \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha}$. Then, for sufficiently large $n$, with probability greater than $1 - p^{-1}$,
\[
\| \widehat{\Sigma} - \Sigma \|_2 \le C M_\sigma s_p \beta_n^{1-q}, \tag{5.3}
\]
\[
\| \widehat{\Sigma} - \Sigma \|_{\max} \le C \beta_n, \tag{5.4}
\]
\[
\| \widehat{\Gamma}_{POET} - \Gamma \|_\Gamma \le C \left[ p^{1/2} \left( n^{-1/2} \log p \right)^{(2\alpha - 2)/\alpha} + M_\sigma s_p \beta_n^{1-q} \right], \tag{5.5}
\]
\[
\| \widehat{\Gamma}_{POET} - \Gamma \|_{\max} \le C \beta_n, \tag{5.6}
\]
where the relative Frobenius norm is $\|A\|_\Gamma = p^{-1/2} \| \Gamma^{-1/2} A \Gamma^{-1/2} \|_F$. Furthermore, suppose that $M_\sigma s_p \beta_n^{1-q} = o(1)$. Then, with probability approaching 1, the minimum eigenvalue of $\widehat{\Sigma}$ is bounded away from 0, $\widehat{\Gamma}_{POET}$ is non-singular, and
\[
\| \widehat{\Sigma}^{-1} - \Sigma^{-1} \|_2 \le C M_\sigma s_p \beta_n^{1-q}, \tag{5.7}
\]
\[
\| \widehat{\Gamma}^{-1}_{POET} - \Gamma^{-1} \|_2 \le C M_\sigma s_p \beta_n^{1-q}. \tag{5.8}
\]

Remark 5.
Unlike Theorem 3 in Fan and Kim (2018), Proposition 1 imposes the sub-Weibull concentration condition (5.2), which is the optimal rate with only finite $2\alpha$-th moments, as shown in Theorems 1–4. Note that if $p \in [n^{c}, e^{\sqrt{n}}]$ for some positive constant $c > 0$, Theorem 3 shows that the ARP estimator satisfies (5.2) for $\delta = 1/(2p)$. Also, the POET estimator is consistent in terms of the relative Frobenius norm as long as $p = o(n^{(2\alpha-2)/\alpha})$. That is, the convergence rate is a function of the minimum tail index $\alpha$.

To implement the ARP estimation procedure, we need to choose tuning parameters. In this section, we discuss how to select the tuning parameters for the numerical studies. We first estimate the tail index $\alpha_i$ as follows. Let $D_i(\tau_k) = Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k})$ for $k = 1, \ldots, n-$
1. Then the tail index is estimated by
$$\widehat{\alpha}_i = \min\Bigg\{ a \in (1, c_{\alpha,1}] \,:\, \frac{1}{n-1}\sum_{k=1}^{n-1} \bigg| \frac{D_i(\tau_k) - \bar{D}_i}{s_{D_i}} \bigg|^{2a} > c_{\alpha,2}\, \mathrm{E}|Z|^{2a} \Bigg\}, \qquad (5.9)$$
where $\bar{D}_i$ and $s_{D_i}$ are the sample mean and sample standard deviation of the $D_i(\tau_k)$'s, respectively, $Z$ is a standard normal random variable, and $c_{\alpha,1}$ and $c_{\alpha,2}$ are two positive constants. If no $a$ satisfies the above inequality, we choose $\widehat{\alpha}_i = c_{\alpha,1}$. The intuition is that if the standardized $2a$-th moment is too large, we conclude that the $2a$-th moment does not exist. To quantify the degree of largeness, we compare it with a multiple of the corresponding standard Gaussian moment. This leads to the method in (5.9). In the empirical study, we use $c_{\alpha,1} = 5$ and $c_{\alpha,2} = 2$. The estimator $\widehat{\alpha}_{ij}$ of $\alpha_{ij}$ is then calculated as
$$\widehat{\alpha}_{ij} = 2 \wedge \frac{2\widehat{\alpha}_i \widehat{\alpha}_j}{\widehat{\alpha}_i + \widehat{\alpha}_j}.$$
Given $\widehat{\alpha}_{ij}$, we choose the thresholding levels as
$$\theta_{ij} = c \Bigg( \frac{K_n \log p}{(\widehat{\alpha}_{ij} - 1)\, c_{\widehat{\alpha}_{ij}}\, \widehat{S}_{ij}\, (n - K_n)} \Bigg)^{1/\widehat{\alpha}_{ij}} \qquad (5.10)$$
and
$$\theta_{\rho,ij} = c \Bigg( \frac{\log p}{(\widehat{\alpha}_{ij} - 1)\, c_{\widehat{\alpha}_{ij}}\, \widehat{S}_{\rho,ij}\, (n - 1)} \Bigg)^{1/\widehat{\alpha}_{ij}}, \qquad (5.11)$$
where $\widehat{S}_{ij} = \frac{1}{n-K_n}\sum_{k=1}^{n-K_n} |Q_{ij}(\tau_k)|^{\widehat{\alpha}_{ij}}$, $\widehat{S}_{\rho,ij} = \frac{1}{n-1}\sum_{k=1}^{n-1} |Q_{\rho,ij}(\tau_k)|^{\widehat{\alpha}_{ij}}$, and $c$ is a tuning parameter. For the empirical study, we choose $c$ as 0.15. Also, in the pre-averaging stage, we choose $K_n = \lfloor n^{1/2} \rfloor$ and $g(x) = x \wedge (1-x)$.

Remark 6.
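As an illustration, the following Python sketch implements the moment-comparison rule (5.9), the pairwise tail index, and the truncation level (5.10) as reconstructed above; the grid step, the stand-in constant `c_alpha`, and the helper names are our own assumptions rather than part of the paper.

```python
import math
import numpy as np

def gaussian_abs_moment(a):
    # E|Z|^(2a) for standard normal Z: 2^a * Gamma(a + 1/2) / sqrt(pi)
    return 2.0 ** a * math.gamma(a + 0.5) / math.sqrt(math.pi)

def estimate_tail_index(increments, c1=5.0, c2=2.0, step=0.01):
    # Smallest a in (1, c1] whose standardized empirical 2a-th moment exceeds
    # c2 times the Gaussian benchmark; return c1 if none does (cf. (5.9)).
    d = np.asarray(increments, dtype=float)
    z = (d - d.mean()) / d.std(ddof=1)
    for a in np.arange(1.0 + step, c1 + 1e-12, step):
        if np.mean(np.abs(z) ** (2 * a)) > c2 * gaussian_abs_moment(a):
            return float(a)
    return c1

def pairwise_index(ai, aj):
    # Harmonic-mean combination of the marginal indices, capped at 2
    return min(2.0, 2.0 * ai * aj / (ai + aj))

def truncation_level(alpha_ij, S_ij, n, K, p, c=0.15, c_alpha=1.0):
    # theta_ij in (5.10); c_alpha stands in for the theoretical constant
    return c * (K * math.log(p) /
                ((alpha_ij - 1.0) * c_alpha * S_ij * (n - K))) ** (1.0 / alpha_ij)
```

Heavier-tailed increments push the estimated index below the cap $c_{\alpha,1}$, which in turn enlarges the amount of truncation applied to the pre-averaged variables.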
There are other ways to estimate the tail indices. For example, one of the most popular methods is Hill's estimator (Hill, 1975), which is consistent under some conditions. However, the performance of Hill's estimator depends heavily on the choice of its own tuning parameter. In fact, we conducted an empirical study using Hill's estimator and found that its performance is not robust to this choice; that is, we simply run into another tuning parameter selection problem. Furthermore, the result is not better than that of the proposed method in (5.9). Thus, we use the proposed method in the numerical study. To fully justify the proposed method, we would need to investigate its asymptotic behavior. However, developing a tuning parameter selection procedure that not only works well in practice but also has good theoretical properties is a demanding task. We leave this for future study.
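For comparison, a minimal sketch of Hill's estimator; the choice of $k$ (the number of upper order statistics used) is exactly the tuning parameter whose influence the remark above describes.

```python
import numpy as np

def hill_estimator(x, k):
    # Hill (1975): tail index from the k largest absolute observations,
    # alpha_hat = 1 / mean(log(X_(n-i+1) / X_(n-k)))
    a = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = a.size
    gamma_hat = np.mean(np.log(a[n - k:] / a[n - k - 1]))
    return 1.0 / gamma_hat
```

Varying $k$ over a reasonable range can move the estimate substantially, which is the lack of robustness referred to above.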
To check the finite sample performance of the ARP estimator, we conducted a simulation study. To obtain the low-rank plus sparse structure, we considered the following true log-price process,
$$d\mathbf{X}(t) = \boldsymbol{\mu}(t)\,dt + \boldsymbol{\vartheta}^\top(t)\,d\mathbf{W}^*_t + \boldsymbol{\sigma}^\top(t)\,d\mathbf{W}_t + \mathbf{L}(t)\,d\boldsymbol{\Lambda}(t),$$
where the drift $\boldsymbol{\mu}(t)$ is a constant vector with identical components, $\mathbf{W}^*_t$ and $\mathbf{W}_t$ are $r$- and $p$-dimensional independent Brownian motions, respectively, $\boldsymbol{\vartheta}(t)$ and $\boldsymbol{\sigma}(t)$ are $r$ by $r$ and $p$ by $p$ matrices, respectively, $\mathbf{L}(t)$ is the jump size, and $\boldsymbol{\Lambda}(t)$ is the $p$-dimensional Poisson process with intensity $\mathbf{I}(t)$. We used a heterogeneous heavy tail process (heavy tail process 1), a homogeneous heavy tail process (heavy tail process 2), and a sub-Gaussian process to generate the volatility process. To generate the two heavy tail processes, we used a setting similar to those in Wang and Zou (2010) and Fan and Kim (2018). Specifically, let $\boldsymbol{\sigma}(t)$ be the Cholesky decomposition of $\boldsymbol{\varsigma}(t) = (\varsigma_{ij}(t))_{1\le i,j\le p}$. The diagonal elements of $\boldsymbol{\varsigma}(t)$ come from four different processes: geometric Ornstein-Uhlenbeck processes, the sum of two CIR processes (Cox et al., 1985; Barndorff-Nielsen, 2002), the volatility process in Nelson's GARCH diffusion limit model (Wang, 2002), and the two-factor log-linear stochastic volatility process (Huang and Tauchen, 2005) with leverage effect. Details can be found in Wang and Zou (2010). To control the tail behaviors of the instantaneous volatility matrix $\boldsymbol{\varsigma}(t)$, we used the $t$-distribution as follows:
$$\varsigma_{ii}(t_l) = \big(1 + |t_{df_i,l}|\big)\,\widetilde{\varsigma}_{ii}(t_l),$$
where, for $l = 1, \ldots, n_{all}$, the $t_{df_i,l}$ are i.i.d. $t$-distributed random variables with degrees of freedom $df_i$, $t_l = l/n_{all}$, and the $\widetilde{\varsigma}_{ii}(t_l)$ were generated by the above four different processes. To account for the heterogeneous heavy-tailed distribution (heavy tail process 1), the $df_i$ were generated from the unif(2.5, 4) distribution, whereas for the homogeneous heavy-tailed distribution (heavy tail process 2), we set $df_i = 5$.
To obtain the sparse instantaneous volatility matrix $\boldsymbol{\varsigma}(t)$, we generated its off-diagonal elements as follows:
$$\varsigma_{ij}(t_l) = \{\kappa(t_l)\}^{|i-j|} \sqrt{\varsigma_{ii}(t_l)\,\varsigma_{jj}(t_l)}, \quad 1 \le i \ne j \le p,$$
where $\kappa(t)$ is given by
$$\kappa(t) = \frac{e^{u(t)} - 1}{e^{u(t)} + 1}, \quad du(t) = 0.03\{0.64 - u(t)\}\,dt + 0.118\,u(t)\,dW_{\kappa,t},$$
$$W_{\kappa,t} = \sqrt{0.96}\,\widetilde{W}_{\kappa,t} - 0.2 \sum_{i=1}^{p} W_{it}/\sqrt{p},$$
and $\widetilde{W}_{\kappa,t}$ is a one-dimensional Brownian motion independent of the Brownian motions $\mathbf{W}^*_t$ and $\mathbf{W}_t$. The low-rank instantaneous volatility matrix $\boldsymbol{\vartheta}^\top(t)\boldsymbol{\vartheta}(t)$ is $\mathbf{B}^\top\{\boldsymbol{\vartheta}^f(t)\}^\top \boldsymbol{\vartheta}^f(t)\mathbf{B}$, where $\mathbf{B} = (B_{ij})_{1\le i\le r, 1\le j\le p} \in \mathbb{R}^{r\times p}$ and $B_{ij}$ was generated from the normal distribution with a mean of 0 and a standard deviation of 0.9, and $\boldsymbol{\vartheta}^f(t)$ was generated similarly to $\boldsymbol{\sigma}(t)$. Specifically, $\boldsymbol{\vartheta}^f(t)$ is the Cholesky decomposition of $\boldsymbol{\varsigma}^f(t)$, and the diagonal elements of $\boldsymbol{\varsigma}^f(t)$ at time $t_l$ were
$$\varsigma^f_{ii}(t_l) = \big(1 + |t^f_{df_i,l}|\big)\,\widetilde{\varsigma}^f_{ii}(t_l),$$
where the $t^f_{df_i,l}$, $l = 1, \ldots, n_{all}$, are i.i.d. $t$-distributed with degrees of freedom $df_i$, and the $\widetilde{\varsigma}^f_{ii}(t_l)$, $l = 1, \ldots, n_{all}$, were generated from the geometric Ornstein-Uhlenbeck processes. The off-diagonal elements of $\boldsymbol{\varsigma}^f(t)$ were set to zero. For the jump part, we chose $\mathbf{I}(t) = (5, \ldots, 5)^\top$, and the jump size $L_i(t)$ was drawn from an independent $t$-distribution with degrees of freedom $df_i$ and standard deviation proportional to $\sqrt{\int_0^1\gamma_{ii}(t)\,dt}$. We also generated a sub-Gaussian process in the same way as the heavy tail processes, except that the $t$-distribution terms were replaced by standard normal terms. To generate the observation time points, we first generated $n_{all} - 1$ time points $t_k$, $k = 1, \ldots, n_{all} -$
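The $t$-scaling of the diagonal spot volatilities can be sketched as follows; the constant base path stands in for the four volatility processes above, and all sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_all = 10, 1000
df = rng.uniform(2.5, 4.0, size=p)      # heterogeneous degrees of freedom (heavy tail process 1)
base = np.ones((p, n_all))              # stand-in for the four volatility processes
t_scale = np.abs(np.vstack([rng.standard_t(d, size=n_all) for d in df]))
vol_diag = (1.0 + t_scale) * base       # heavy-tailed diagonal spot volatilities
```

Because each asset draws its own degrees of freedom, the rows of `vol_diag` exhibit different degrees of tail heaviness, mimicking heavy tail process 1.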
1, with $t_0 = 0$ and $t_{n_{all}} = 1$. Based on these points, we generated non-synchronized data similar to the scheme in Aït-Sahalia et al. (2010) as follows. First, $p$ random proportions $w_i$, $i = 1, \ldots, p$, were independently generated from the unif(0.8, 1) distribution. Second, we set each $t_k$ as an observation time point of the $i$-th asset if and only if an independent Bernoulli random variable with parameter $w_i$ took the value 1. Third, the noise-contaminated high-frequency observations $Y_i(t_{i,k})$ were generated from the model (2.3). Specifically, the noise $\epsilon_i(t_{i,k})$ was drawn from an independent $t$-distribution with degrees of freedom $df_i$ and standard deviation proportional to $\sqrt{\int_0^1\gamma_{ii}(t)\,dt}$. We chose $p = 200$ and $r = 3$, and we varied $n_{all}$ from 1000 to 4000. We employed the refresh time scheme to obtain synchronized data. To investigate the effect of the adaptiveness of the proposed ARP procedure, we introduce a universal robust pre-averaging realized volatility (URP) estimator, which uses the same estimation procedure as the ARP estimator with $\widehat{\alpha}_{ij} = 2$ for all $1 \le i,j \le p$. That is, the URP estimator truncates the pre-averaged variables at a universal tail index level. Thus, we calculated the input volatility matrix using the adaptive robust pre-averaging realized volatility matrix (ARPM), universal robust pre-averaging realized volatility matrix (URPM), and pre-averaging realized volatility matrix (PRVM) estimators. We used the tuning parameters discussed in Section 5.1 and set the tuning parameter $c$ in (5.10)–(5.11) to 0.2. We note that, compared with the ARPM estimation procedure, the PRVM estimator was obtained by setting $\psi_\alpha(x) = x$ in Section 3 and the URPM estimator was obtained by setting $\alpha_{ij} = 2$ for all $i,j = 1, \ldots, p$.
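The Bernoulli thinning and the all-refresh synchronization can be sketched as follows; the refresh rule follows the all-refresh scheme of Barndorff-Nielsen et al. (2011), and the variable names and sizes are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_all = 5, 200
t = np.concatenate(([0.0], np.sort(rng.uniform(0, 1, n_all - 1)), [1.0]))

# Bernoulli thinning: asset i observes t_k with probability w_i ~ unif(0.8, 1)
w = rng.uniform(0.8, 1.0, p)
obs_times = [t[rng.random(t.size) < w[i]] for i in range(p)]

def refresh_times(times_list):
    # All-refresh scheme: the next refresh time is the first instant by which
    # every asset has recorded a new observation.
    pointers = [0] * len(times_list)
    refresh = []
    while all(ptr < len(ts) for ptr, ts in zip(pointers, times_list)):
        tau = max(ts[ptr] for ptr, ts in zip(pointers, times_list))
        refresh.append(tau)
        pointers = [int(np.searchsorted(ts, tau, side="right")) for ts in times_list]
    return np.array(refresh)

tau = refresh_times(obs_times)
```

The number of refresh times is at most the smallest number of observations across assets, which is why the synchronized sample sizes reported below are well under $n_{all}$.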
This means that the PRVM estimator cannot account for heavy tails and that the URPM estimator cannot capture the heterogeneity of the degrees of heaviness of the tail distributions. To make the estimates positive semi-definite, we projected the volatility matrix estimators onto the positive semi-definite cone in the spectral norm. To calculate the POET estimators, we used the hard thresholding scheme and selected the thresholding level by minimizing the corresponding Frobenius norm. The average estimation errors under the Frobenius norm, relative Frobenius norm $\|\cdot\|_{\Gamma}$, $\ell_2$-norm (spectral norm), and maximum norm were computed based on 1000 simulations. The average numbers of synchronized time points under the refresh time scheme were 300.5, 600.4, and 1199.7 for $n_{all} = 1000$, 2000, and 4000, respectively.

Table 1: The MSEs of the estimated $\alpha_{ij}$ given $n_{all} = 1000$, 2000, and 4000.

Table 1 reports the mean squared errors (MSEs) of the estimated $\alpha_{ij}$ against the sample size $n_{all}$ for the two heavy-tail processes. For the heterogeneous heavy-tail process, the $2\alpha_i$'s were generated from the unif(2.5, 4) distribution, while $2\alpha_i = 5$ for the homogeneous heavy-tail process. We calculated $\alpha_{ij}$ using (3.3). From Table 1, we find that for the heterogeneous heavy-tail process, the MSE decreases as the sample size $n_{all}$ increases. The MSEs for the homogeneous heavy-tail process are small regardless of $n_{all}$. We note that for the sub-Gaussian process, more than 99.99 percent of the $\alpha_{ij}$ were estimated to be 2 (regarded as correctly estimated due to the sub-Gaussianity of the truncated average) for all $n_{all}$. These results indicate that the proposed tail index estimator works well. Figure 2 plots the Frobenius, relative Frobenius, spectral, and max norm errors against the sample size $n_{all}$ for the POET estimators from the ARPM, URPM, and PRVM estimators. Figure 3 depicts the spectral norm errors against the sample size $n_{all}$ for the inverse POET estimators with the ARPM, URPM, and PRVM estimators. As expected, the ARPM estimator outperforms the other estimators for the heterogeneous heavy-tail process.
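The projection step can be implemented by eigenvalue clipping, which attains the spectral-norm distance $|\min(\lambda_{\min}, 0)|$ from the input and is therefore a projection onto the positive semi-definite cone in the spectral norm; a minimal sketch:

```python
import numpy as np

def project_psd(A):
    # Symmetrize, then clip negative eigenvalues at zero
    A = (A + A.T) / 2.0
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T
```

The same clipped matrix also solves the Frobenius-norm projection, so a single routine serves both purposes.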
For the homogeneous heavy-tail and sub-Gaussian processes, the ARPM and URPM estimators perform similarly and both outperform the PRVM estimator. One possible explanation for the poor performance of the PRVM estimator even in the Gaussian noise case is that the true return process contains heavy-tailed distributions over time, and hence the robust methods still outperform. To sum up, the ARPM estimator is robust to heterogeneity in the heaviness of tails and adapts to homogeneity in the heaviness of tails.

In this section, we apply the proposed ARP estimator to high-frequency trading data for 200 assets from January 1 to December 31, 2016 (252 trading days). The 200 stocks with the largest trading volumes were selected from among the S&P 500, and the data were obtained from the Wharton Research Data Services (WRDS) system. Figure 4 plots the daily synchronized sample sizes from the refresh time scheme for the 200 assets. As seen in Figure 4, sampling at a frequency higher than 1 minute can lead to the non-existence of observations between some consecutive sample points. Hence, we employed 1-min log-return data with the previous tick scheme.
Figure 2: The Frobenius, relative Frobenius, spectral, and max norm error plots of the POET estimators with the ARPM, URPM, and PRVM estimators for $p = 200$ and $n_{all} = 1000$, 2000, and 4000.

Figure 3: The spectral norm error plots of the inverse POET estimators with the ARPM, URPM, and PRVM estimators for $p = 200$ and $n_{all} = 1000$, 2000, and 4000.
Figure 4: The number of daily synchronized samples from the refresh time scheme for 200 assets over 252 days in 2016. The blue dashed and red solid lines mark the numbers of possible observations for 30-sec and 1-min log-returns in each trading day, which are 780 and 390, respectively.

To capture the heterogeneous heavy-tailedness over time, we estimated the tail indexes using the method (5.9) in Section 5.1. Figure 5 shows the box plots of the daily estimated tail indexes $\widehat{\alpha}_i$ of the 200 assets for each of 5 days in 2016, ranging from the day with the largest IQR to the day with the smallest IQR among the 252 days. Figure 5 shows that the tail indexes of the observed log-returns are heterogeneous over time, which matches the daily kurtosis results in Figure 1. This supports the heterogeneous heavy tail assumption.

To apply the POET estimation procedures, we first need to determine the rank $r$. We
Figure 5: The box plots of the distributions of the daily estimated tail indexes $\widehat{\alpha}_i$ for the 200 most liquid stocks among the S&P 500 index in 2016. Day (a) has the largest IQR, and days (b)–(e) have the 75th, 50th, 25th, and 0th (minimum) percentiles of the IQR among the 252 trading days in 2016, respectively.

calculated 252 daily integrated volatility matrices using the PRVM estimation procedure. Figure 6 shows the scree plot drawn using the eigenvalues of the sum of the 252 PRVM estimates. As seen in Figure 6, the possible values of the rank $r$ are 1, 2, and
3, and hence we conducted the empirical study for $r = 1$, 2, and 3.

Figure 6: The scree plot of the eigenvalues of the sum of the 252 PRVM estimates.

To estimate the sparse volatility matrix $\Sigma$, we used the Global Industry Classification Standard (GICS) (Fan et al., 2016a). Specifically, the covariances of the idiosyncratic components across different sectors are set to zero, while those within the same sector are maintained. This corresponds to hard-thresholding using the sector information. To make the estimates positive semi-definite, we projected the POET estimators onto the positive semi-definite cone in the spectral norm. To check the performance of the proposed ARP estimation procedure, we first investigated the mean squared prediction error (MSPE) of the POET estimators, where the MSPE is defined as
$$\mathrm{MSPE}(\widehat{\Gamma}) = \frac{1}{s-1}\sum_{d=1}^{s-1} \|\widehat{\Gamma}_d - \widehat{\Gamma}_{d+1}\|_F^2, \qquad (6.1)$$
where $s$ is the number of days in the period and $\widehat{\Gamma}_d$ can be any of the POET estimators from the ARPM, URPM, and PRVM estimators for the $d$-th day of the period. We used three different periods: 252 days, day 1 to day 126, and day 127 to day 252. Table 2 reports the MSPE results for the POET estimators from the inputs of the ARPM, URPM, and PRVM estimators. We find that for each period and rank $r$, the ARPM estimator has the smallest MSPE. The URPM estimator is slightly better than the PRVM estimator, but the improvement is insignificant when compared with that of the ARPM estimator.
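The criterion (6.1) is straightforward to compute from the sequence of daily estimates; a small sketch under the reconstruction above (average squared Frobenius distance between consecutive days):

```python
import numpy as np

def mspe(daily_estimates):
    # (6.1): average squared Frobenius distance between consecutive days
    diffs = [np.linalg.norm(g1 - g2, "fro") ** 2
             for g1, g2 in zip(daily_estimates[:-1], daily_estimates[1:])]
    return float(np.mean(diffs))
```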
One possible explanation is that the proposed ARP estimator helps deal with heterogeneous heavy-tailed distributions when estimating integrated volatility matrices.

To check the out-of-sample performance, we applied the ARPM, URPM, and PRVM estimators to the following minimum variance portfolio allocation problem:
$$\min_{\omega} \ \omega^\top \widehat{\Gamma}\, \omega, \quad \text{subject to } \omega^\top \mathbf{J} = 1 \text{ and } \|\omega\|_1 \le c_0,$$
where $\mathbf{J} = (1, \ldots, 1)^\top \in \mathbb{R}^p$, the gross exposure constraint $c_0$ was varied from 1 to 6, and $\widehat{\Gamma}$ could be any of the POET estimators from the ARPM, URPM, and PRVM estimators. To calculate the out-of-sample risks, we constructed the portfolios at the beginning of each trading day using the stock weights calculated from the previous day's data. We then held the portfolio for one day and calculated the square root of the realized volatility using the 10-min portfolio log-returns. Their average was used as the out-of-sample risk. We tested the performances for three different periods: 252 days, day 1 to day 126, and day 127 to day 252.

Table 2: The MSPEs of the POET estimators from the ARPM, URPM, and PRVM estimators (period 1: 252 days, period 2: day 1 to day 126, period 3: day 127 to day 252).

Figure 7 depicts the out-of-sample risks of the portfolios constructed by the POET estimators from the ARPM, URPM, and PRVM estimators. We find that, for the purpose of portfolio allocation, the ARPM estimator shows stable results and has the smallest risks. The URPM estimator performs better than the PRVM estimator, but the improvement is smaller than that of the ARPM estimator. It is worth noting that the results do not depend significantly on the period or the rank $r$. These results lend further support to our claim that the heavy-tailed distributions of observed log-returns are heterogeneous, as shown in Figures 1 and 5, and that the proposed ARP estimation procedure can account for the heterogeneity of the degrees of heaviness of tail distributions.
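The constrained minimum variance problem can be solved by splitting the weights into positive and negative parts so that the gross exposure bound becomes linear; the sketch below uses a generic solver and is illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_portfolio(gamma, c0):
    # Minimize w' Gamma w s.t. sum(w) = 1 and ||w||_1 <= c0, via the split
    # w = u - v with u, v >= 0, which makes the gross exposure bound linear.
    p = gamma.shape[0]

    def objective(x):
        w = x[:p] - x[p:]
        return w @ gamma @ w

    cons = [
        {"type": "eq", "fun": lambda x: np.sum(x[:p] - x[p:]) - 1.0},
        {"type": "ineq", "fun": lambda x: c0 - np.sum(x)},   # ||w||_1 <= c0
    ]
    x0 = np.concatenate([np.full(p, 1.0 / p), np.zeros(p)])
    res = minimize(objective, x0, bounds=[(0, None)] * (2 * p),
                   constraints=cons, method="SLSQP")
    return res.x[:p] - res.x[p:]
```

Tightening $c_0$ toward 1 forces a no-short-sale portfolio, while larger values allow the levered positions whose risks Figure 7 compares across exposure constraints.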
In this paper, we develop the adaptive robust pre-averaging realized volatility (ARP) estimation method to handle the heterogeneous heavy-tailed distributions of stock returns. To account for the heterogeneity of the heavy-tailedness arising from microstructural noise and price jumps, the ARP estimator truncates quadratic pre-averaged random variables according to daily tail indices. We show that the proposed ARP estimator achieves sub-Weibull tail concentration with the optimal convergence rate by showing that its upper bound matches its lower bound.

Figure 7: The out-of-sample risks of the optimal portfolios constructed by using the POET estimators from the ARPM, URPM, and PRVM estimators.

To estimate large integrated volatility matrices, the ARP estimator is further regularized using the POET procedure, and the asymptotic properties of the POET estimator built on the ARP estimator are also investigated. In the empirical study, for the purpose of portfolio allocation, the POET estimator based on the ARP estimator performs best. These findings suggest that, when it comes to estimating integrated volatility matrices, the proposed ARP estimation procedure helps handle the heterogeneous tail distributions of observed log-returns.

Non-synchronization could be another source of heavy-tailedness, and the heterogeneity of the observation time intervals can also cause heterogeneous variation.
However, in this paper, we do not focus on this issue and mainly consider the noise and jumps as the sources of heavy-tailedness. It would be interesting and important to study the observation time points from the perspective of heavy-tailedness. Furthermore, there are other possible sources of heavy-tailedness, and it is important to understand what actually causes it. We leave these interesting questions for future study.
References
Aït-Sahalia, Y., Fan, J., and Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association, 105(492):1504–1517.

Aït-Sahalia, Y., Jacod, J., and Li, J. (2012). Testing for jumps in noisy high frequency data. Journal of Econometrics, 168(2):207–222.

Aït-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2):384–399.

Andersen, T. G., Bollerslev, T., and Diebold, F. X. (2007). Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics, 89(4):701–720.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.

Barndorff-Nielsen, O. E. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(2):253–280.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6):1481–1536.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2011). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics, 162(2):149–169.

Barndorff-Nielsen, O. E. and Shephard, N. (2006). Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics, 4(1):1–30.

Bibinger, M., Hautsch, N., Malec, P., and Reiß, M. (2014). Estimating the quadratic covariation matrix from noisy observations: Local method of moments and efficiency. The Annals of Statistics, 42(4):1312–1346.

Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185.

Chen, D., Mykland, P. A., and Zhang, L. (2020). The five trolls under the bridge: Principal component analysis with asynchronous and noisy high frequency data. Journal of the American Statistical Association, 115(532):1960–1977.

Christensen, K., Kinnebrock, S., and Podolskij, M. (2010). Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics, 159(1):116–133.

Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223–236.

Corsi, F., Pirino, D., and Renò, R. (2010). Threshold bipower variation and the impact of jumps on volatility forecasting. Journal of Econometrics, 159(2):276–288.

Cox, J. C., Ingersoll Jr, J. E., and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53:385–407.

Davies, R. and Tauchen, G. (2018). Data-driven jump detection thresholds for application in jump regressions. Econometrics, 6(2):16.

Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. The Annals of Statistics, 44(6):2695–2725.

Fan, J., Furger, A., and Xiu, D. (2016a). Incorporating global industrial classification standard into portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency data. Journal of Business & Economic Statistics, 34(4):489–503.

Fan, J. and Kim, D. (2018). Robust high-dimensional volatility matrix estimation for high-frequency factor model. Journal of the American Statistical Association, 113(523):1268–1283.

Fan, J., Li, Y., and Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection. Journal of the American Statistical Association, 107(497):412–428.

Fan, J., Liao, Y., and Liu, H. (2016b). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1–C32.

Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680.

Fan, J., Wang, W., and Zhong, Y. (2018). An $\ell_\infty$ eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research, 18(207):1–42.

Fan, J., Wang, W., and Zhu, Z. (2021). A shrinkage principle for heavy-tailed data: High-dimensional robust low-rank matrix recovery. Annals of Statistics, to appear.

Fan, J. and Wang, Y. (2007). Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association, 102(480):1349–1362.

Hayashi, T. and Yoshida, N. (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli, 11(2):359–379.

Hayashi, T. and Yoshida, N. (2011). Nonsynchronous covariation process and limit theorems. Stochastic Processes and their Applications, 121(10):2416–2454.

Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5):1163–1174.

Huang, X. and Tauchen, G. (2005). The relative contribution of jumps to total price variance. Journal of Financial Econometrics, 3(4):456–499.

Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications, 119(7):2249–2276.

Jacod, J. and Protter, P. (2012). Discretization of Processes. Springer.

Kim, D. and Fan, J. (2019). Factor GARCH-Itô models for high-frequency data with application to large volatility matrix prediction. Journal of Econometrics, 208(2):395–417.

Kim, D., Liu, Y., and Wang, Y. (2018). Large volatility matrix estimation with factor-based diffusion model for high-frequency financial data. Bernoulli, 24(4B):3657–3682.

Kim, D., Wang, Y., and Zou, J. (2016). Asymptotic theory for large volatility matrix estimation based on high-frequency financial data. Stochastic Processes and their Applications, 126:3527–3577.

Kong, X.-B. (2018). On the systematic and idiosyncratic volatility with large panel high-frequency data. Annals of Statistics, 46(3):1077–1108.

Malliavin, P. and Mancino, M. E. (2009). A Fourier transform method for nonparametric estimation of multivariate volatility. The Annals of Statistics, 37(4):1983–2010.

Mancini, C. (2004). Estimation of the characteristics of the jumps of a general Poisson-diffusion model. Scandinavian Actuarial Journal, 2004(1):42–52.

Mao, G. and Zhang, Z. (2018). Stochastic tail index model for high frequency financial data with Bayesian analysis. Journal of Econometrics, 205(2):470–487.

Massacci, D. (2017). Tail risk dynamics in stock returns: Links to the macroeconomy and global markets connectedness. Management Science, 63(9):3072–3089.

Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903.

Park, S., Hong, S. Y., and Linton, O. (2016). Estimating the quadratic covariation matrix for asynchronously observed high frequency stock returns corrupted by additive measurement error. Journal of Econometrics, 191(2):325–347.

Song, X., Kim, D., Yuan, H., Cui, X., Lu, Z., Zhou, Y., and Wang, Y. (2020). Volatility analysis with realized GARCH-Itô models. Journal of Econometrics.

Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179.

Sun, Q., Zhou, W.-X., and Fan, J. (2020). Adaptive Huber regression. Journal of the American Statistical Association, 115(529):254–265.

Wang, Y. (2002). Asymptotic nonequivalence of GARCH models and diffusions. The Annals of Statistics, 30(3):754–783.

Wang, Y. and Zou, J. (2010). Vast volatility matrix estimation for high-frequency financial data. The Annals of Statistics, 38(2):943–978.

Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high frequency data. Journal of Econometrics, 159(1):235–250.

Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli, 12(6):1019–1043.

Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise. Journal of Econometrics, 160(1):33–47.

Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472):1394–1411.

Zhang, X., Kim, D., and Wang, Y. (2016). Jump variation estimation with noisy high frequency financial data via wavelets. Econometrics, 4(3):34.
Appendix
A.1 Proof of Theorem 1
For simplicity, we denote $K_n$ by $K$. Let
$$\bar{X}_i(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\{X^c_i(\tau_{i,k+l+1}) - X^c_i(\tau_{i,k+l})\}$$
$$= \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\int_{\tau_{i,k+l}}^{\tau_{i,k+l+1}} \mu_i(t)\,dt + \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\int_{\tau_{i,k+l}}^{\tau_{i,k+l+1}} \mathbf{e}_i^\top \sigma^\top(t)\,d\mathbf{W}_t$$
$$= \bar{X}^{\mu}_i(\tau_k) + \bar{X}^{\sigma}_i(\tau_k),$$
where $\mathbf{e}_i$ is the $p$-dimensional vector whose $i$-th coordinate is one and the others are zero, and
$$\bar{\epsilon}_i(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\{\epsilon_i(\tau_{i,k+l+1}) - \epsilon_i(\tau_{i,k+l})\} = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} \Big\{ g\Big(\frac{l}{K}\Big) - g\Big(\frac{l+1}{K}\Big) \Big\}\epsilon_i(\tau_{i,k+l+1}).$$
Then we have
$$Q^c_{ij}(\tau_k) = \big[\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)\big]\big[\bar{X}_j(\tau_k) + \bar{\epsilon}_j(\tau_k)\big]. \qquad (\mathrm{A.1})$$

Proposition 2.
Under models (2.1) and (2.3), (a) and (b) hold for all $1 \le i,j \le p$ and sufficiently large $n$.
(a) Under Assumption 1, there exist positive constants $U_{ij}(\tau_k)$, whose values are free of $n$ and $p$, such that
$$\mathrm{E}\big\{|Q^c_{ij}(\tau_k)|^{\alpha_{ij}} \,\big|\, \mathcal{F}_{\tau_k}\big\} \le U_{ij}(\tau_k) \quad a.s.$$
for all $1 \le k \le n - K_n$.
(b) Under Assumption 1(a)–(b) and Assumption 2, there exist positive constants $U_{\rho,ij}(\tau_k)$, whose values are free of $n$ and $p$, such that
$$\mathrm{E}\big\{|Q^c_{\rho,ij}(\tau_k)|^{\alpha_{ij}} \,\big|\, \mathcal{F}_{\tau_{k-1}}\big\} \le U_{\rho,ij}(\tau_k) \quad a.s.$$
for all $1 \le k \le n - 1$.

Proof of Proposition 2.
First, consider (a). By Hölder's inequality,
$$\mathrm{E}\big[|Q^c_{ij}(\tau_k)|^{2\alpha_i\alpha_j/(\alpha_i+\alpha_j)} \,\big|\, \mathcal{F}_{\tau_k}\big] \le \Big\{\mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i} \,\big|\, \mathcal{F}_{\tau_k}\big]\Big\}^{\alpha_j/(\alpha_i+\alpha_j)} \Big\{\mathrm{E}\big[|Q^c_{jj}(\tau_k)|^{\alpha_j} \,\big|\, \mathcal{F}_{\tau_k}\big]\Big\}^{\alpha_i/(\alpha_i+\alpha_j)} \quad a.s.$$
Therefore, it suffices to show that
$$\mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i} \,\big|\, \mathcal{F}_{\tau_k}\big] \le C \quad a.s.$$
By Assumption 1(c) and (A.1), we have
$$\nu_Q \ge \mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i}\big] = \mathrm{E}\big[|\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)|^{2\alpha_i}\big] = \mathrm{E}\Big\{\mathrm{E}\big[|\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)|^{2\alpha_i} \,\big|\, \bar{\epsilon}_i(\tau_k)\big]\Big\}$$
$$\ge \mathrm{E}\Big\{\big|\mathrm{E}\big[\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k) \,\big|\, \bar{\epsilon}_i(\tau_k)\big]\big|^{2\alpha_i}\Big\} \ge \mathrm{E}\Big\{\big|\, |\bar{\epsilon}_i(\tau_k)| - |\mathrm{E}[\bar{X}_i(\tau_k)]| \,\big|^{2\alpha_i}\Big\}.$$
Also, by the fact that $|\bar{X}^{\mu}_i(\tau_k)| \le Cn^{-1/4}$ a.s., we have
$$|\mathrm{E}[\bar{X}_i(\tau_k)]| = |\mathrm{E}[\bar{X}^{\mu}_i(\tau_k)]| \le Cn^{-1/4} \quad a.s.$$
Hence, using Hölder's inequality, we can show
$$\mathrm{E}\big[|\bar{\epsilon}_i(\tau_k)|^{2\alpha_i}\big] \le C.$$
g ( · ), we haveE (cid:104) | Q cii ( τ k ) | α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) = E (cid:104)(cid:12)(cid:12) ¯ X µi ( τ k ) + ¯ X σi ( τ k ) + ¯ (cid:15) i ( τ k ) (cid:12)(cid:12) α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) ≤ C + C E (cid:104)(cid:12)(cid:12) ¯ X σi ( τ k ) (cid:12)(cid:12) α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) ≤ C + C (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n − KφK K − (cid:88) l =0 g (cid:18) lk (cid:19) ν γ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) α i ≤ C a.s. , (A.2)where the first inequality is due to the H¨older’s inequality and the second inequality is fromthe Burkholder-Davis-Gundy inequality. Then (a) is proved, and we can show (b) similar tothe proof of (a). (cid:4) Proof of Theorem 1.
Without loss of generality, we assume that $n = K(L+1)$ for some $L \in \mathbb{N}$. We have
$$
\left| \widehat{T}_{ij,\theta}^{\alpha} - T_{ij} \right|
\leq \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - T_{ij} \right|
+ \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}(\tau_k) \right\} - \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} \right] \right|
= (I) + (II). \tag{A.3}
$$
First, consider $(I)$. Let $A_{ij}(\tau_k) = \mathrm{E}\left[ Q_{ij}^{c}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]$. Then for any $s > 0$, we obtain that
$$
\begin{aligned}
&\Pr\left\{ \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) \geq \frac{K}{n-K}\, s \right\} \\
&\leq \exp\{-\theta_{ij} s\}\, \mathrm{E}\left[ \prod_{m=0}^{K-1} \prod_{k=0}^{L-1} \exp\left\{ \frac{1}{K}\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right\} \right] \\
&\leq \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \left( \mathrm{E}\left[ \prod_{k=0}^{L-1} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right] \right)^{1/K} \\
&= \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \Bigg( \mathrm{E}\Bigg[ \prod_{k=0}^{L-2} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \\
&\qquad\qquad \times \mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}\!\left(\tau_{K(L-1)+m+1}\right) \right\} - \theta_{ij} A_{ij}\!\left(\tau_{K(L-1)+m+1}\right) \right] \,\Big|\, \mathcal{F}_{\tau_{K(L-1)+m+1}} \right] \Bigg] \Bigg)^{1/K} \\
&\leq \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \left( \mathrm{E}\left[ \prod_{k=0}^{L-2} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right] \right)^{1/K}
\times \exp\left\{ \frac{1}{K} \sum_{m=0}^{K-1} c_{\alpha_{ij}} U_{ij}\!\left(\tau_{K(L-1)+m+1}\right) \theta_{ij}^{\alpha_{ij}} \right\} \\
&\leq \exp\left\{ -\theta_{ij} s + \frac{n-K}{K}\, c_{\alpha_{ij}} S_{ij}\, \theta_{ij}^{\alpha_{ij}} \right\}, \tag{A.4}
\end{aligned}
$$
where the first and second inequalities are due to the Markov inequality and Hölder's inequality, respectively, and the third and fourth inequalities can be obtained by (A.5).
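Both this bound and (A.5) rest on the defining property of the truncation function $\psi_{\alpha}$ of Minsker (2018), namely $-\log(1-x+c_{\alpha}|x|^{\alpha}) \leq \psi_{\alpha}(x) \leq \log(1+x+c_{\alpha}|x|^{\alpha})$. As a numerical sanity check only (not part of the proof), the sketch below verifies this sandwich for the boundary choice of $\psi_{\alpha}$; the constants $\alpha = 1.5$ and $c_{\alpha} = 1$ are illustrative assumptions, not the paper's choices.

```python
import numpy as np

# Illustrative check of the sandwich bounds that define Minsker's (2018)
# truncation function psi_alpha:
#   -log(1 - x + c|x|^a) <= psi_a(x) <= log(1 + x + c|x|^a).
# We take the boundary choice (upper envelope for x >= 0, lower for x < 0),
# with ASSUMED constants a = 1.5 and c = 1.0 (illustrative only).

a, c = 1.5, 1.0

def psi(x):
    # odd boundary choice of psi_a; bounded-growth influence function
    return np.where(x >= 0,
                    np.log(1.0 + x + c * np.abs(x) ** a),
                    -np.log(1.0 - x + c * np.abs(x) ** a))

xs = np.linspace(-50.0, 50.0, 100001)
lower = -np.log(1.0 - xs + c * np.abs(xs) ** a)
upper = np.log(1.0 + xs + c * np.abs(xs) ** a)

# The sandwich is non-empty iff (1 + c|x|^a)^2 - x^2 >= 1, which holds here
# because 2c|x|^a + c^2|x|^{2a} >= x^2 for a in (1, 2] and c = 1.
assert np.all((1.0 + c * np.abs(xs) ** a) ** 2 - xs ** 2 >= 1.0 - 1e-9)
assert np.all(lower <= psi(xs) + 1e-12)
assert np.all(psi(xs) <= upper + 1e-12)
```

The sandwich is exactly what turns the exponential moment of $\psi_{\alpha_{ij}}\{\theta_{ij} Q^c_{ij}(\tau_k)\}$ into the polynomial-in-$\theta_{ij}$ bound used in (A.4).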
Since we can get $-\log(1 - x + c_{\alpha_{ij}} |x|^{\alpha_{ij}}) \leq \psi_{\alpha_{ij}}(x) \leq \log(1 + x + c_{\alpha_{ij}} |x|^{\alpha_{ij}})$ from Lemma A.2 (Minsker, 2018), we have
$$
\begin{aligned}
\mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \,\Big|\, \mathcal{F}_{\tau_k} \right]
&\leq \mathrm{E}\left[ \exp\left[ \log\left\{ 1 + \theta_{ij} Q_{ij}^{c}(\tau_k) + c_{\alpha_{ij}} \left| \theta_{ij} Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \,\Big|\, \mathcal{F}_{\tau_k} \right] \\
&= \exp\left[ \log\left\{ 1 + \theta_{ij} A_{ij}(\tau_k) + c_{\alpha_{ij}} \theta_{ij}^{\alpha_{ij}} \mathrm{E}\left[ \left| Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \big| \mathcal{F}_{\tau_k} \right] \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \\
&\leq \exp\left[ c_{\alpha_{ij}} \theta_{ij}^{\alpha_{ij}} \mathrm{E}\left[ \left| Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \big| \mathcal{F}_{\tau_k} \right] \right]
\leq \exp\left[ c_{\alpha_{ij}} U_{ij}(\tau_k)\, \theta_{ij}^{\alpha_{ij}} \right] \quad \text{a.s.}, \tag{A.5}
\end{aligned}
$$
where the second inequality is due to the fact that $\log(1+x) \leq x$ for any $x > -1$, and the last inequality is from Proposition 2. Choose
$$
\theta_{ij} = \left( \frac{K \log y^{-1}}{(\alpha_{ij}-1)\, c_{\alpha_{ij}} S_{ij} (n-K)} \right)^{1/\alpha_{ij}}, \qquad
s = \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{ij} (n-K) (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1} K} \right)^{1/\alpha_{ij}},
$$
where $c \log n \leq \log y^{-1} \leq \sqrt{n}$. Then we have
$$
\Pr\left[ \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k)
\geq \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{ij}\, K^{\alpha_{ij}-1} (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1} (n-K)^{\alpha_{ij}-1}} \right)^{1/\alpha_{ij}} \right] \leq y.
$$
Applying the same argument to the other tail, we obtain
$$
\Pr\left[ \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) \right| \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \geq 1 - 2y. \tag{A.6}
$$
Now, we need to establish the relationship between $\sum_{k=1}^{n-K} A_{ij}(\tau_k)/(n-K)$ and $T_{ij}$. Since $X$ and $\epsilon$ are independent, we have
$$
A_{ij}(\tau_k) = \mathrm{E}\left[ \bar{X}_i^{\mu}(\tau_k) \bar{X}_j^{\mu}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{X}_i^{\mu}(\tau_k) \bar{X}_j^{\sigma}(\tau_k) + \bar{X}_i^{\sigma}(\tau_k) \bar{X}_j^{\mu}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{X}_i^{\sigma}(\tau_k) \bar{X}_j^{\sigma}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{\epsilon}_i(\tau_k) \bar{\epsilon}_j(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
= (a) + (b) + (c) + (d).
$$
By the fact that $|\bar{X}_i^{\mu}(\tau_k)| \leq C n^{-1/2}$ a.s., we have
$$
|(a)| \leq C n^{-1} \quad \text{a.s.} \tag{A.7}
$$
Using the Burkholder–Davis–Gundy inequality, we can show
$$
|(b)| \leq C n^{-1/2} \left( \sqrt{\mathrm{E}\left[ \left\{ \bar{X}_i^{\sigma}(\tau_k) \right\}^2 \,\Big|\, \mathcal{F}_{\tau_k} \right]} + \sqrt{\mathrm{E}\left[ \left\{ \bar{X}_j^{\sigma}(\tau_k) \right\}^2 \,\Big|\, \mathcal{F}_{\tau_k} \right]} \right) \leq C n^{-1/2} \quad \text{a.s.} \tag{A.8}
$$
Consider $(c)$. Let
$$
\bar{X}_i^{\sigma}(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} H_{i,k,l},
$$
where
$$
H_{i,k,l} = g\!\left(\frac{l}{K}\right) \int_{\tau_{k+l}}^{\tau_{i,k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t + g\!\left(\frac{l+1}{K}\right) \int_{\tau_{i,k+l+1}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t
= g\!\left(\frac{l}{K}\right) \int_{\tau_{k+l}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t + \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\} \int_{\tau_{i,k+l+1}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t.
$$
Then we have
$$
(c) = \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \mathrm{E}\left[ H_{i,k,l} H_{j,k,l} \,\big|\, \mathcal{F}_{\tau_k} \right] \quad \text{a.s.}
$$
By Itô's isometry and the boundedness of $\gamma_{ij}(t)$, we can get, for all $0 \leq l \leq K-1$,
$$
\left| \mathrm{E}\left[ H_{i,k,l} H_{j,k,l} \,\big|\, \mathcal{F}_{\tau_k} \right] - \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} \right|
\leq C n^{-1} \left[ \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\}^2 + \left| g\!\left(\frac{l}{K}\right) \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\} \right| \right] \leq C n^{-3/2} \quad \text{a.s.},
$$
where the last inequality is by the piecewise Lipschitz derivative condition for $g(\cdot)$. Thus, we have
$$
\left| (c) - \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} \right| \leq C n^{-1/2} \quad \text{a.s.} \tag{A.9}
$$
Finally, for $(d)$, we have
$$
(d) = \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\}^2 \mathbf{1}\!\left( \tau_{i,k+l+1} = \tau_{j,k+l+1} \right) \eta_{ij} \quad \text{a.s.} \tag{A.10}
$$
Combining (A.7)–(A.10), we have
$$
\left| \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) - A^{*}_{ij} \right| \leq C n^{-1/2} \quad \text{a.s.}, \tag{A.11}
$$
where
$$
A^{*}_{ij} = \frac{1}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \sum_{k=1}^{n-K} \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} + \rho_{ij}.
$$
Now, we investigate the relationship between $A^{*}_{ij}$ and $T_{ij}$. Note that $\gamma_{ij}(t)$ is bounded and $\sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right]$ is the sum of $l+1$ martingales. Hence, using the Azuma–Hoeffding inequality for each martingale, we can show, for all $0 \leq l \leq K-1$,
$$
\Pr\left( \left| \sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right] \right| \geq C \left( n^{-1} \log y^{-1} \right)^{1/2} \right) \leq 2y.
$$
Also, simple algebraic manipulations show
$$
\left| A^{*}_{ij} - T_{ij} \right|
\leq \left| \frac{1}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right] \right| + \frac{2K}{n} \nu_{\gamma}.
$$
Therefore, we have
$$
\Pr\left\{ \left| A^{*}_{ij} - T_{ij} \right| \leq C \left( n^{-1} \log y^{-1} \right)^{1/2} + \frac{2K}{n} \nu_{\gamma} \right\} \geq 1 - 2Ky. \tag{A.12}
$$
Combining (A.6), (A.11), and (A.12), we have
$$
\Pr\left\{ (I) \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2(K+1)y. \tag{A.13}
$$
Consider $(II)$. Note that by the boundedness of the intensity, we have
$$
\Pr\left\{ \Lambda_i(1) \leq C \log y^{-1} \right\} \geq 1 - y.
$$
Hence, using the fact that $\psi_{\alpha}(x)$ is a bounded function, we have
$$
\Pr\left\{ (II) \leq C\, \frac{K \log y^{-1}}{n \theta_{ij}} \right\} \geq 1 - 2y,
$$
which implies
$$
\Pr\left\{ (II) \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2y. \tag{A.14}
$$
Collecting (A.3), (A.13), and (A.14), we obtain that, with probability at least $1 - 3Ky$,
$$
\left| \widehat{T}_{ij,\theta}^{\alpha} - T_{ij} \right| \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
and then substituting $\delta/(3K)$ for $y$ completes the proof. $\Box$

A.2 Proof of Theorem 2
Proof of Theorem 2.
Let $n_i = n_j = n$ and $t_{i,k} = t_{j,k} = \tau_{i,k} = \tau_{j,k} = \tau_k = k/n$ for $1 \leq k \leq n$. To derive a lower bound, we construct two quadratic pre-averaged random variables $Q_{1,ij}(\tau_k)$ and $Q_{2,ij}(\tau_k)$ as follows. Let $dX_1(t) = dX_2(t) = \sigma^{\top}(t)\, dW_t$ for any appropriate $\sigma(t)$, which implies $\bar{X}_{1,h}(\tau_k) = \bar{X}_{2,h}(\tau_k)$ for $1 \leq h \leq p$ and $1 \leq k \leq n-K$. Also, let $2\epsilon_{1,h}(t_{h,k}) = \epsilon_{2,h}(t_{h,k})$ for $1 \leq h \leq p$ and $0 \leq k \leq n$, where the distributions of $\epsilon_{1,h}(t_{h,k})$, $1 \leq h \leq p$, are defined as follows:
$$
\epsilon_{1,h}(t_{h,k}) = \begin{cases}
K^{(\alpha_h+1)/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d, \\
0 & \text{with probability } 1 - 2d, \\
-K^{(\alpha_h+1)/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d,
\end{cases}
$$
where $d = C_K \log(1/\delta)/K^2$. For each $0 \leq k \leq n$, let $\Pr\{\epsilon_{1,h}(t_{h,k}) > 0 \text{ for all } 1 \leq h \leq p\} = \Pr\{\epsilon_{1,h}(t_{h,k}) < 0 \text{ for all } 1 \leq h \leq p\} = d$ and $\Pr\{\epsilon_{1,h}(t_{h,k}) = 0 \text{ for all } 1 \leq h \leq p\} = 1 - 2d$. Then, using the fact that $1 - x \geq \exp(-x/(1-x))$ for any $0 \leq x \leq 1/2$ and $\log(1/\delta) \leq \sqrt{n}$, we can show, for an appropriate choice of $C_K$,
$$
\prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} = (1 - 2d)^{n} \geq 2\delta. \tag{A.15}
$$
Here, we need to check whether the construction satisfies Assumption 1(c). It suffices to show
$$
\mathrm{E}\left[ \left| Q_{v,ii}^{c}(\tau_1) \right|^{\alpha_i} \right] \leq C, \quad v = 1, 2. \tag{A.16}
$$
Note that, for all $1 \leq k \leq n$,
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_k)|^{2\alpha_i} \right] = C_K K^{\alpha_i - 1} \quad \text{and} \quad \mathrm{E}\left[ \epsilon_{1,i}^2(\tau_k) \right] = C_K \left( K^{-1} \log \frac{1}{\delta} \right)^{(\alpha_i-1)/\alpha_i}.
$$
Hence, by the Lipschitz continuity of $g(\cdot)$, we have
$$
\mathrm{E}\left[ |\bar{\epsilon}_{1,i}(\tau_1)|^{2\alpha_i} \right]
= \left( \frac{n-K}{\phi K} \right)^{\alpha_i} \mathrm{E}\left[ \left| \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\} \epsilon_{1,i}(\tau_{l+2}) \right|^{2\alpha_i} \right]
\leq C K^{-\alpha_i} \left\{ \sum_{l=0}^{K-1} \mathrm{E}\left[ |\epsilon_{1,i}(\tau_{l+2})|^{2\alpha_i} \right] + \left( \sum_{l=0}^{K-1} \mathrm{E}\left[ \epsilon_{1,i}^2(\tau_{l+2}) \right] \right)^{\alpha_i} \right\}
\leq C + C K^{-\alpha_i} \left\{ K \left( K^{-1} \log \frac{1}{\delta} \right)^{(\alpha_i-1)/\alpha_i} \right\}^{\alpha_i} \leq C, \tag{A.17}
$$
where the first inequality is due to Rosenthal's inequality. Then, similar to the proof of Proposition 2, we can show (A.16). Now, since
$$
\left| T_{1,ij} - T_{2,ij} \right| = \frac{3n\zeta}{\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right],
$$
we have, for any $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)$,
$$
\begin{aligned}
&\max\left[ \Pr\left\{ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right\},
\Pr\left\{ \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right\} \right] \\
&\geq \frac{1}{2} \Pr\left[ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right]
\text{ or } \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right] \\
&\geq \frac{1}{2} \Pr\left\{ \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) = \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) \right\}
\geq \frac{1}{2} \prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} \geq \delta, \tag{A.18}
\end{aligned}
$$
where the last inequality is from (A.15). Combining (A.18) and the fact that
$$
\mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] = C \left( K^{-1} \log(1/\delta) \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
we have, for sufficiently large $n$,
$$
\max\left[ \Pr\left\{ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq C \left( n^{-1/2} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\},
\Pr\left\{ \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq C \left( n^{-1/2} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \right] \geq \delta, \tag{A.19}
$$
which completes the proof. $\Box$

A.3 Proof of Theorem 3
Proposition 3.
Under Assumption 1(a)–(b) and Assumption 2, Assumption 1(c) is satisfied.
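Proposition 3 transfers the moment bound on the raw noise to the pre-averaged quantities. Before the proof, the mechanics can be illustrated with a small self-contained simulation sketch; the concrete choices below (weight function $g(x) = \min(x, 1-x)$, window $K = 100$, $t_5$-distributed noise, moment order $2\alpha_i = 2.8$) are illustrative assumptions, not the paper's specifications.

```python
import numpy as np

# Sketch of the pre-averaging mechanics behind Proposition 3 (illustrative
# choices only: g(x) = min(x, 1 - x), K = 100, i.i.d. noise from a
# t-distribution with 5 degrees of freedom, so E|eps|^{2.8} < infinity).

rng = np.random.default_rng(0)
K = 100
g = lambda x: np.minimum(x, 1.0 - x)

l = np.arange(K)
w = g(l / K) - g((l + 1) / K)   # difference weights applied to the noise

# Deterministic facts used implicitly in the proofs:
# (i) the difference weights telescope: sum_l w_l = g(0) - g(1) = 0;
assert abs(w.sum()) < 1e-12
# (ii) |w_l| <= C/K by the Lipschitz continuity of g (here C = 1);
assert np.max(np.abs(w)) <= 1.0 / K + 1e-12
# (iii) (1/K) sum_l g(l/K)^2 approximates phi = int_0^1 g^2 = 1/12.
phi_hat = np.mean(g(l / K) ** 2)
assert abs(phi_hat - 1.0 / 12.0) < 1e-2

# Monte Carlo: the scaled pre-averaged noise sqrt(K) * sum_l w_l eps_l
# (sqrt(K) mimics the (n-K)/(phi K) normalization with K ~ sqrt(n))
# keeps a bounded empirical 2*alpha_i moment for alpha_i = 1.4.
eps = rng.standard_t(df=5, size=(2000, K))
eps_bar = np.sqrt(K) * (eps * w).sum(axis=1)
m = np.mean(np.abs(eps_bar) ** 2.8)
assert np.isfinite(m)
```

Fact (ii) is what produces the $K^{-\alpha_i}$ factor in the proof below, after which Rosenthal's inequality controls the sum of the raw-noise moments.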
Proof of Proposition 3.
Similar to the proof of Proposition 2, we can show
$$
\mathrm{E}\left[ |\epsilon_i(\tau_{i,k})|^{2\alpha_i} \right] \leq C.
$$
Then we have
$$
\begin{aligned}
\mathrm{E}\left[ \left| Q_{ii}^{c}(\tau_k) \right|^{\alpha_i} \right]
&= \mathrm{E}\left[ \left| \bar{X}_i^{\mu}(\tau_k) + \bar{X}_i^{\sigma}(\tau_k) + \bar{\epsilon}_i(\tau_k) \right|^{2\alpha_i} \right]
\leq C + C\, \mathrm{E}\left[ \left| \bar{X}_i^{\sigma}(\tau_k) \right|^{2\alpha_i} \right] + C\, \mathrm{E}\left[ \left| \bar{\epsilon}_i(\tau_k) \right|^{2\alpha_i} \right] \\
&\leq C + C \left| \frac{n-K}{\phi K} \sum_{l=0}^{K-1} g^2\!\left(\frac{l}{K}\right) \frac{\nu_{\gamma}}{n} \right|^{\alpha_i}
+ C\, \mathrm{E}\left[ \left| \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\} \epsilon_i(\tau_{i,k+l+1}) \right|^{2\alpha_i} \right] \\
&\leq C + C K^{-\alpha_i} \left\{ \sum_{l=0}^{K-1} \mathrm{E}\left[ |\epsilon_i(\tau_{i,k+l+1})|^{2\alpha_i} \right] + \left( \sum_{l=0}^{K-1} \mathrm{E}\left[ \epsilon_i^2(\tau_{i,k+l+1}) \right] \right)^{\alpha_i} \right\}
\leq C \quad \text{a.s.},
\end{aligned}
$$
where the last line uses the Lipschitz continuity of $g(\cdot)$ and Rosenthal's inequality. $\Box$

Proof of Theorem 3.
Without loss of generality, we assume that $n = 2L+1$ for some $L \in \mathbb{N}$. We have
$$
\left| \widehat{\rho}_{ij,\theta}^{\alpha} - \rho_{ij} \right|
\leq \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} - \rho_{ij} \right|
+ \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}(\tau_k) \right\} - \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} \right] \right|
= (I) + (II). \tag{A.20}
$$
First, consider $(I)$. Let
$$
\frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} = \widehat{\rho}^{\alpha,c}_{1,ij,\theta} + \widehat{\rho}^{\alpha,c}_{2,ij,\theta},
$$
where
$$
\widehat{\rho}^{\alpha,c}_{1,ij,\theta} = \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{L} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\}, \qquad
\widehat{\rho}^{\alpha,c}_{2,ij,\theta} = \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{L} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k}) \right\}.
$$
Also, define $A_{\rho,ij}(\tau_k) = \mathrm{E}\left[ Q_{\rho,ij}^{c}(\tau_k) \,\big|\, \mathcal{F}_{\tau_{k-1}} \right]$. Then we can show, for any $s > 0$,
$$
\begin{aligned}
&\Pr\left\{ \widehat{\rho}^{\alpha,c}_{1,ij,\theta} - \frac{\zeta}{\phi K} \sum_{k=1}^{L} A_{\rho,ij}(\tau_{2k-1}) \geq \frac{\zeta s}{\phi K} \right\} \\
&\leq \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\left[ \exp\left\{ \sum_{k=1}^{L} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \right] \\
&= \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\Bigg[ \exp\left\{ \sum_{k=1}^{L-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \\
&\qquad\qquad \times \mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2L-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2L-1}) \right] \,\Big|\, \mathcal{F}_{\tau_{2L-2}} \right] \Bigg] \\
&\leq \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\left[ \exp\left\{ \sum_{k=1}^{L-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \right]
\times \exp\left\{ c_{\alpha_{ij}} U_{\rho,ij}(\tau_{2L-1})\, \theta_{\rho,ij}^{\alpha_{ij}} \right\} \\
&\leq \exp\left\{ -\theta_{\rho,ij} s + \frac{n-1}{2}\, c_{\alpha_{ij}} S_{\rho,ij}\, \theta_{\rho,ij}^{\alpha_{ij}} \right\}.
\end{aligned}
$$
Choose
$$
\theta_{\rho,ij} = \left( \frac{\log y^{-1}}{(\alpha_{ij}-1)\, c_{\alpha_{ij}} S_{\rho,ij} (n-1)} \right)^{1/\alpha_{ij}}, \qquad
s = \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{\rho,ij} (n-1) (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1}} \right)^{1/\alpha_{ij}},
$$
where $c \log n \leq \log y^{-1} \leq \sqrt{n}$. Then we have
$$
\Pr\left\{ \widehat{\rho}^{\alpha,c}_{1,ij,\theta} - \frac{\zeta}{\phi K} \sum_{k=1}^{L} A_{\rho,ij}(\tau_{2k-1}) \geq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \leq y.
$$
Applying the same argument to $\widehat{\rho}^{\alpha,c}_{2,ij,\theta}$ and to the other tails, we can show
$$
\Pr\left[ \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} - \frac{\zeta}{\phi K} \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k) \right| \geq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \leq 4y. \tag{A.21}
$$
Now, we need to establish the relationship between $\zeta \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k)/(\phi K)$ and $\rho_{ij}$. Since $X$ and $\epsilon$ are independent, similar to the proof of Theorem 1, we can show
$$
\left| \rho_{ij} - \frac{\zeta}{\phi K} \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k) \right| \leq C n^{-1/2} \quad \text{a.s.} \tag{A.22}
$$
Combining (A.21) and (A.22), we have
$$
\Pr\left\{ (I) \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 4y. \tag{A.23}
$$
Consider $(II)$. Note that
$$
\Pr\left\{ \Lambda_i(1) \leq C \log y^{-1} \right\} \geq 1 - y.
$$
Hence, using the fact that $\psi_{\alpha}(x)$ is a bounded function, we have
$$
\Pr\left\{ (II) \leq C\, \frac{\log y^{-1}}{n \theta_{\rho,ij}} \right\} \geq 1 - 2y,
$$
which implies
$$
\Pr\left\{ (II) \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2y. \tag{A.24}
$$
Collecting (A.20), (A.23), and (A.24), we obtain that, with probability at least $1 - 6y$,
$$
\left| \widehat{\rho}_{ij,\theta}^{\alpha} - \rho_{ij} \right| \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
and then substituting $\delta/6$ for $y$ completes the proof of (4.4). Also, (4.5) is proved by (4.3) and Theorem 1. $\Box$

A.4 Proof of Theorem 4
Proof of Theorem 4.
Let $n_i = n_j = n$ and $t_{i,k} = t_{j,k} = \tau_{i,k} = \tau_{j,k} = \tau_k = k/n$ for $1 \leq k \leq n$. Similar to the proof of Theorem 2, we construct two quadratic log-return random variables $Q_{1,\rho,ij}(\tau_k)$ and $Q_{2,\rho,ij}(\tau_k)$ as follows. Let $dX_1(t) = dX_2(t) = \sigma^{\top}(t)\, dW_t$ for any appropriate $\sigma(t)$, which implies $X_{1,h}(\tau_{k+1}) - X_{1,h}(\tau_k) = X_{2,h}(\tau_{k+1}) - X_{2,h}(\tau_k)$ for $1 \leq h \leq p$ and $1 \leq k \leq n-1$. Also, let $2\epsilon_{1,h}(t_{h,k}) = \epsilon_{2,h}(t_{h,k})$ for $1 \leq h \leq p$ and $0 \leq k \leq n$, where the distributions of $\epsilon_{1,h}(t_{h,k})$, $1 \leq h \leq p$, are defined as follows:
$$
\epsilon_{1,h}(t_{h,k}) = \begin{cases}
n^{1/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d, \\
0 & \text{with probability } 1 - 2d, \\
-n^{1/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d,
\end{cases}
$$
where $d = \log(1/\delta)/(8n)$. For each $0 \leq k \leq n$, let $\Pr\{\epsilon_{1,h}(t_{h,k}) > 0 \text{ for all } 1 \leq h \leq p\} = \Pr\{\epsilon_{1,h}(t_{h,k}) < 0 \text{ for all } 1 \leq h \leq p\} = d$ and $\Pr\{\epsilon_{1,h}(t_{h,k}) = 0 \text{ for all } 1 \leq h \leq p\} = 1 - 2d$. Then, using the fact that $1 - x \geq \exp(-x/(1-x))$ for any $0 \leq x \leq 1/2$, we can show
$$
\prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} = \left( 1 - \frac{\log(1/\delta)}{4n} \right)^{n} \geq 2\delta. \tag{A.25}
$$
Here, we need to check whether the construction satisfies Assumption 2. It suffices to show
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_1)|^{2\alpha_i} \right] \leq C. \tag{A.26}
$$
Note that
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_1)|^{2\alpha_i} \right] = \frac{1}{4} \quad \text{and} \quad \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] = \frac{1}{4} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}}.
$$
Hence, (A.26) is satisfied, and since
$$
\left| \rho_{1,ij} - \rho_{2,ij} \right| = \frac{3n\zeta}{\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right],
$$
we have, for any $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)$,
$$
\begin{aligned}
&\max\left[ \Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) - \rho_{1,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\},
\Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) - \rho_{2,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \right] \\
&\geq \frac{1}{2} \Pr\left[ \left| \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) - \rho_{1,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}}
\text{ or } \left| \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) - \rho_{2,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \\
&\geq \frac{1}{2} \Pr\left\{ \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) = \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) \right\}
\geq \frac{1}{2} \prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} \geq \delta, \tag{A.27}
\end{aligned}
$$
where the last inequality is from (A.25), which completes the proof. $\Box$
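All four results analyze estimators built on the same truncated-mean template, $\widehat{T} = \frac{1}{N\theta} \sum_{k=1}^{N} \psi_{\alpha}(\theta Q_k)$ with $\theta \asymp (\log(1/\delta)/(c_{\alpha} S N))^{1/\alpha}$. The following self-contained sketch illustrates that template on heavy-tailed inputs; the boundary choice of $\psi_{\alpha}$, the constants $\alpha = 1.4$, $c_{\alpha} = 1$, $\delta = 0.01$, the Pareto-type noise, and the plug-in empirical moment used for $S$ are all illustrative assumptions, not the paper's specifications.

```python
import numpy as np

# Minimal sketch of the adaptive robust truncation template behind the ARP
# estimators:  T_hat = (1/(N*theta)) * sum_k psi_a(theta * Q_k),
# with ASSUMED constants a = 1.4, c_a = 1, delta = 0.01 (illustrative only).

rng = np.random.default_rng(42)
a, c_a, delta = 1.4, 1.0, 0.01

def psi(x):
    # boundary choice of Minsker's psi_a (odd, log-envelope growth)
    return np.where(x >= 0,
                    np.log(1.0 + x + c_a * np.abs(x) ** a),
                    -np.log(1.0 - x + c_a * np.abs(x) ** a))

def robust_mean(q, a, delta):
    n = q.size
    s = np.mean(np.abs(q) ** a)   # empirical a-th moment as a plug-in for S
    theta = (np.log(1.0 / delta) / (c_a * s * n)) ** (1.0 / a)
    return np.sum(psi(theta * q)) / (n * theta)

# Heavy-tailed sample with finite a-th moment but infinite variance:
# symmetric Pareto with tail index 1.9 (so E|Q|^1.4 < infinity), shifted
# so that the target location is 5.
n = 50_000
q = (rng.pareto(1.9, size=n) + 1.0) * rng.choice([-1.0, 1.0], size=n) + 5.0

est = robust_mean(q, a, delta)
# The log envelopes cap the influence of any single extreme draw, so the
# estimate is always a finite number even under infinite variance.
assert np.isfinite(est)
```

The small $\theta$ makes $\psi_{\alpha}(\theta q) \approx \theta q$ for typical observations while clipping extreme ones at logarithmic growth, which is the mechanism behind the sub-Weibull concentration bounds in Theorems 1 and 3.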