Adaptive Robust Large Volatility Matrix Estimation Based on High-Frequency Financial Data
Minseok Shin, Donggyu Kim and Jianqing Fan

KAIST and Princeton University

February 26, 2021
Abstract
Several novel statistical methods have been developed to estimate large integrated volatility matrices based on high-frequency financial data. To investigate their asymptotic behaviors, they require a sub-Gaussian or finite high-order moment assumption for observed log-returns, which cannot account for the heavy tail phenomenon of stock returns. Recently, a robust estimator was developed to handle heavy-tailed distributions with some bounded fourth-moment assumption. However, we often observe that log-returns have heavier-tailed distributions than a finite fourth moment allows, and that the degrees of heaviness of tails are heterogeneous across assets and over time. In this paper, to deal with the heterogeneous heavy-tailed distributions, we develop an adaptive robust integrated volatility estimator that employs pre-averaging and truncation schemes based on jump-diffusion processes. We call this an adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has a sub-Weibull tail concentration with only finite $2\alpha$-th moments for any $\alpha > 1$.

∗ Minseok Shin is a Ph.D. student, College of Business, KAIST, Seoul 02455, South Korea. Donggyu Kim is Ewon Assistant Professor, College of Business, KAIST, Seoul 02455, South Korea. His research was supported by KAIST Basic Research Funds by Faculty (A0601003029). Jianqing Fan is Frederick L. Moore '18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544. His research was supported by NSFC grants No. 71991471 and 71991470.

Key words:
Heterogeneity, tail index, pre-averaging, minimax lower bound, optimality, POET, factor model.
In modern financial studies and practices, volatility estimation is fundamental in risk management, performance evaluation, and portfolio allocation. Due to the wide availability of high-frequency financial data, many well-performing volatility estimation methods have been developed to estimate integrated volatilities. Examples include two-time scale realized volatility (TSRV) (Zhang et al., 2005), multi-scale realized volatility (MSRV) (Zhang, 2006, 2011), the wavelet estimator (Fan and Wang, 2007), pre-averaging realized volatility (PRV) (Christensen et al., 2010; Jacod et al., 2009), kernel realized volatility (KRV) (Barndorff-Nielsen et al., 2008, 2011), the quasi-maximum likelihood estimator (QMLE) (Aït-Sahalia et al., 2010; Xiu, 2010), and the local method of moments (Bibinger et al., 2014). One of the stylized features of financial data is the existence of price jumps, and empirical studies have shown that the decomposition of daily variation into its continuous and jump components can better explain the volatility dynamics (Aït-Sahalia et al., 2012; Andersen et al., 2007; Barndorff-Nielsen and Shephard, 2006; Corsi et al., 2010; Song et al., 2020). For example, Fan and Wang (2007) and Zhang et al. (2016) employed the wavelet method to identify the jumps given noisy high-frequency data. Mancini (2004) studied a threshold method for jump detection and presented the order of an optimal threshold, and Davies and Tauchen (2018) further examined a data-driven threshold method. These estimation methods perform well for a small number of assets. However, we often encounter a large number of assets in practice, such as in portfolio allocation, which results in the curse of dimensionality. To overcome the curse of dimensionality and obtain an efficient and effective large volatility estimator, we often impose the approximate factor structure on the volatility matrix (Fan and Kim, 2018; Fan et al., 2013, 2018; Kim and Fan, 2019).
For example, to account for common market factors such as sector, firm size, and book-to-market ratios, the factor-based high-dimensional Itô process is widely employed and the idiosyncratic volatility is assumed to be sparse (Aït-Sahalia and Xiu, 2017; Fan et al., 2016a,b; Kim et al., 2018; Kong, 2018). The principal orthogonal complement thresholding (POET) method (Fan et al., 2013) is often employed to estimate these low-rank plus sparse matrices.

The performance of the factor-based large volatility matrix estimator critically depends on the accuracy of each integrated volatility estimator. Specifically, sub-Weibull tail concentration for the input volatility matrix estimator is required to investigate its asymptotic behaviors. However, one stylized feature of stock return data is heavy-tailedness, which violates the sub-Gaussian assumption on the stock return data. Recently, with a bounded fourth-moment assumption on the microstructural noise, Fan and Kim (2018) developed a robust estimation method, which can attain sub-Gaussian tail concentration with the optimal convergence rate. See also Catoni (2012); Minsker (2018). However, empirical studies have demonstrated that the bounded fourth-moment condition is often violated (Cont, 2001; Mao and Zhang, 2018; Massacci, 2017). Figure 1 shows the box plots of daily log kurtoses of the returns of the 200 most liquid assets in the S&P 500 index, calculated from 1-min log-return data with the previous tick scheme, for each of 5 days in 2016: from the day with the largest interquartile range (IQR) to the day with the smallest IQR among 252 days. In Figure 1, we find that the log-return data are heavy-tailed and also have heterogeneous degrees of heaviness of tails over the different assets and different days.
These facts generate the demand for developing an adaptive robust estimation method that can handle heterogeneous heavy-tailedness.

In this paper, we develop an adaptive robust integrated volatility estimator based on jump-diffusion processes contaminated by microstructural noises. We first use the pre-averaging scheme (Jacod et al., 2009) to adjust the unbalanced order relationship between
Figure 1: The box plots for the daily distributions of log kurtoses calculated from 1-min log-returns based on the 200 most liquid stocks in the S&P 500 index in 2016. Day (a) has the largest IQR, and days (b)–(e) have the 75th, 50th, 25th, and 0th (smallest) percentiles of the IQR among 252 trading days in 2016, respectively. The red dashed line marks the kurtosis of the $t$-distribution with 5 degrees of freedom.

the microstructural noises and true log-returns. We then employ the truncation method (Minsker, 2018) using the daily moment conditions of assets. Specifically, we truncate pre-averaged variables according to their heavy-tailedness, which allows the merits of adaptive learning to be enjoyed. Also, the truncation method sufficiently mitigates the effect of the jump signal on the pre-averaged variables. We call the proposed estimator the adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has sub-Weibull tail concentration with a finite $2\alpha$-th moment assumption for any $\alpha > 1$.

We first define some notations. For any given $p$ by $p$ matrix $A = (A_{ij})_{1 \le i \le p,\, 1 \le j \le p}$, let
\[
\|A\|_1 = \max_{1 \le j \le p} \sum_{i=1}^{p} |A_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le p} \sum_{j=1}^{p} |A_{ij}|, \qquad \|A\|_{\max} = \max_{i,j} |A_{ij}|.
\]
The matrix spectral norm $\|A\|_2$ is the square root of the largest eigenvalue of $A A^\top$, and the Frobenius norm of $A$ is denoted by $\|A\|_F = \sqrt{\operatorname{tr}(A^\top A)}$. We will use $C$ to denote a generic positive constant whose value is free of $n$ and $p$ and may change from appearance to appearance.

Let $X(t) = (X_1(t), \ldots$
$, X_p(t))^\top$ be the vector of true log-prices for $p$ assets at time $t$. To model the high-frequency financial data, we often employ the jump-diffusion process as follows:
\[
dX(t) = dX^c(t) + L(t)\, d\Lambda(t) = \mu(t)\, dt + \sigma^\top(t)\, dW_t + L(t)\, d\Lambda(t), \tag{2.1}
\]
where $X^c(t) = (X_1^c(t), \ldots, X_p^c(t))^\top$ with $X^c(0) = X(0)$ is the vector of true continuous log-prices at time $t$, $\mu(t) = (\mu_1(t), \ldots, \mu_p(t))^\top$ is a drift vector, $\sigma(t)$ is a $q$ by $p$ matrix, and $W_t$
is a $q$-dimensional independent Brownian motion, and the stochastic processes $\mu(t)$, $X(t)$, $X^c(t)$, and $\sigma(t)$ are defined on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t, t \in [0,1]\}, P)$ with filtration $\mathcal{F}_t$ satisfying the usual conditions. For the jump part, $L(t) = (L_1(t), \ldots, L_p(t))^\top$ denotes the jump sizes and $\Lambda(t) = (\Lambda_1(t), \ldots, \Lambda_p(t))^\top$ is the $p$-dimensional Poisson process with bounded intensity $I(t) = (I_1(t), \ldots, I_p(t))^\top$. The instantaneous volatility matrix of the continuous log-price $X^c(t)$ is
\[
\gamma(t) = (\gamma_{ij}(t))_{1 \le i,j \le p} = \sigma^\top(t) \sigma(t),
\]
and its quadratic variation is
\[
[X^c, X^c]_t = \int_0^t \gamma(s)\, ds = \left( \int_0^t \gamma_{ij}(s)\, ds \right)_{1 \le i,j \le p} = \int_0^t \sigma^\top(s) \sigma(s)\, ds.
\]
The parameter of interest is the integrated volatility matrix of $X^c(t)$,
\[
\Gamma = [X^c, X^c]_1 = \int_0^1 \gamma(s)\, ds. \tag{2.2}
\]
Unfortunately, we cannot observe the true log-prices $X(t)$. In fact, observed high-frequency data are contaminated by microstructural noises. Furthermore, high-frequency data encounter a non-synchronization problem in that transactions for multiple assets often arrive asynchronously. In this regard, we assume that the observed log-price $Y_i(t_{i,k})$ obeys the following model:
\[
Y_i(t_{i,k}) = X_i(t_{i,k}) + \epsilon_i(t_{i,k}) \quad \text{for } i = 1, \ldots, p, \; k = 0, \ldots, n_i, \tag{2.3}
\]
where $t_{i,k}$ is the $k$-th observation time point of the $i$-th asset; for fixed $i = 1, \ldots, p$, the $\epsilon_i(t_{i,k})$, $k = 0, \ldots, n_i$, are i.i.d. noises with a mean of zero; and for $i, j = 1, \ldots, p$, $\mathrm{E}[\epsilon_i(t)\epsilon_j(t)] = \eta_{ij}$ and $\epsilon_i(t)$ is independent of $\epsilon_j(t')$ for $t \neq t'$.
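To make the model concrete, the following minimal sketch simulates one asset's observed log-prices under (2.1) and (2.3): a continuous diffusion plus compound-Poisson-style jumps, observed with i.i.d. microstructural noise. All numerical values (volatility level, jump intensity, noise scale) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23400                      # e.g. one observation per second over 6.5 hours
dt = 1.0 / n                   # trading day normalized to [0, 1]
sigma = 0.2                    # spot volatility, held constant for simplicity
lam = 3.0                      # Poisson jump intensity over the day
noise_sd = 5e-4                # microstructural noise standard deviation

# continuous part: dX^c(t) = mu(t) dt + sigma dW_t (zero drift here)
Xc = np.concatenate([[0.0], np.cumsum(sigma * rng.normal(0.0, np.sqrt(dt), n))])

# jump part L(t) dLambda(t): Poisson counts times Gaussian jump sizes
counts = rng.poisson(lam * dt, n + 1).astype(float)
X = Xc + np.cumsum(counts * rng.normal(0.0, 0.01, n + 1))

# observed log-prices (2.3): Y = X + epsilon with i.i.d. mean-zero noise
Y = X + rng.normal(0.0, noise_sd, n + 1)

# realized variance of raw noisy returns is inflated by roughly 2 n eta_ii
print(float(np.sum(np.diff(Y) ** 2)), 2 * n * noise_sd ** 2)
```

The printed comparison illustrates why raw realized variance fails here: the noise term $2 n \eta_{ii}$ does not vanish as the sampling frequency grows, which motivates the pre-averaging scheme of Section 3.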
In other words, we allow the microstructural noises to have cross-sectional dependency, and $\epsilon_i(\cdot)$ is independent of the price processes $X_i(\cdot)$.

To handle the microstructural noise issue, several estimation methods have been developed (Aït-Sahalia et al., 2010; Barndorff-Nielsen et al., 2008, 2011; Bibinger et al., 2014; Christensen et al., 2010; Fan and Wang, 2007; Jacod et al., 2009; Xiu, 2010; Zhang et al., 2005; Zhang, 2006, 2011). They work well for a finite number of assets and are widely adopted to develop large volatility matrix estimation procedures (Kim et al., 2016; Wang and Zou, 2010). However, the observed log-prices are heavy-tailed, so these methods cannot lead to estimators with the sub-Weibull concentration bound that is essential for the asymptotic analysis of large volatility matrix inferences. To tackle the heavy tail issue, Fan and Kim (2018) proposed the robust pre-averaging realized volatility estimation procedure, which can achieve sub-Gaussian tail concentration with only a finite fourth-moment condition on the microstructural noise. However, as shown in Figure 1, the degrees of heaviness of tails of log-returns are heterogeneous across assets and over time. Furthermore, jumps in the true log-price process can also cause heavy-tailed distributions. To account for these features, we accommodate heterogeneous degrees of tail distributions based on the jump-diffusion process contaminated by microstructural noises. We assume that each asset has a different order of the highest finite absolute moment (see Assumption 1 in Section 4 for details).

In this section, we introduce an adaptive robust integrated volatility estimation procedure to handle the non-synchronization, price jumps, and microstructural noise. To handle the non-synchronization problem, we consider the generalized sampling time proposed by Aït-Sahalia et al. (2010).
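As a concrete instance of such a synchronization scheme, the previous-tick rule can be sketched as follows; the function name and toy data are our own illustration.

```python
import numpy as np

def previous_tick(obs_times, grid):
    """Index of the last observation time <= tau_k, for each grid point tau_k.
    With at least one observation in each (tau_{k-1}, tau_k], the selected
    time lies in that interval, as the generalized sampling time requires."""
    return np.searchsorted(obs_times, grid, side='right') - 1

# toy example: irregular observation times for one asset on [0, 1]
obs_times = np.array([0.00, 0.13, 0.24, 0.38, 0.55, 0.71, 0.86, 0.99])
grid = np.array([0.25, 0.50, 0.75, 1.00])               # tau_1, ..., tau_4
print(obs_times[previous_tick(obs_times, grid)])        # → [0.24 0.38 0.71 0.99]
```

Applying the rule asset by asset against a common grid $\{\tau_k\}$ yields one synchronized observation per asset per interval.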
We note that the generalized sampling time scheme includes other synchronization schemes such as previous tick (Zhang, 2011; Wang and Zou, 2010) and refresh time (Barndorff-Nielsen et al., 2011; Fan et al., 2012). See also Bibinger et al. (2014); Chen et al. (2020); Hayashi and Yoshida (2005, 2011); Malliavin et al. (2009); Park et al. (2016). We define the generalized sampling time as follows.

Definition 1. (Aït-Sahalia et al., 2010). A sequence of time points $\tau = \{\tau_0, \ldots, \tau_n\}$ is said to be the generalized sampling time if

(1) $0 = \tau_0 < \tau_1 < \cdots < \tau_{n-1} < \tau_n = 1$;

(2) there exists at least one observation for each asset between consecutive $\tau_j$'s;

(3) the time intervals $\{\Delta_j = \tau_j - \tau_{j-1};\ j = 1, \ldots, n\}$ satisfy $\sup_j \Delta_j \stackrel{p}{\longrightarrow} 0$.

For the $i$-th asset, we select an arbitrary observation, $Y_i(\tau_{i,k})$, between $\tau_{k-1}$ and $\tau_k$. In other words, we choose any $\tau_{i,k} \in (\tau_{k-1}, \tau_k] \cap \{t_{i,l}, l = 0, 1, \ldots, n_i\}$, $i = 1, \ldots, p$.

Based on the synchronized time $\tau$, we adopt the pre-averaging method to manage the microstructural noise (Jacod et al., 2009). For the observed log-returns, $Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k})$, $i = 1, \ldots, p$, $k = 1, \ldots, n -$
$1$, the variance of the microstructural noise, $2\eta_{ii}$, dominates the continuous log-return volatility $\int_{\tau_{i,k}}^{\tau_{i,k+1}} \gamma_{ii}(t)\, dt$. Therefore, it is hard to estimate the integrated volatility without smoothing to denoise. To adjust the order relationship between the noises and continuous log-returns, we use the following pre-averaged data to suppress the noises (Christensen et al., 2010; Jacod et al., 2009):
\[
Z_i(\tau_k) = \sum_{l=0}^{K_n - 1} g\!\left(\frac{l}{K_n}\right) \{ Y_i(\tau_{i,k+l+1}) - Y_i(\tau_{i,k+l}) \} \quad \text{for } i = 1, \ldots, p, \; k = 1, \ldots, n - K_n, \tag{3.1}
\]
where the weight function $g(\cdot)$ is continuous and piecewise continuously differentiable with a piecewise Lipschitz derivative $g'$ and satisfies $g(0) = g(1) = 0$ and $\int_0^1 \{g(t)\}^2\, dt >$
$0$. In this paper, we choose the bandwidth parameter $K_n$ as $C_K n^{1/2}$ for some constant $C_K$, which provides the optimal rate $n^{-1/4}$. Then the continuous log-returns and noises in the $Z_i(\tau_k)$'s are of the same order of magnitude (Fan and Kim, 2018). However, as shown in Figure 1, the pre-averaged random variables still have heterogeneous heavy tails across assets. Furthermore, there exist jump variations in the pre-averaged data. We note that in $Z_i(\tau_k)$, the jumps have a higher order of magnitude than the noises and continuous log-returns. To handle these problems, we robustly estimate the volatility matrix by applying an adaptive truncation method according to the tails of the data.

Define the quadratic pre-averaged random variables
\[
Q_{ij}(\tau_k) = \frac{n - K_n}{\phi K_n} Z_i(\tau_k) Z_j(\tau_k) \quad \text{for } i, j = 1, \ldots, p, \; k = 1, \ldots, n - K_n, \tag{3.2}
\]
where $\phi = \frac{1}{K_n} \sum_{\ell=0}^{K_n - 1} \left\{ g\!\left(\frac{\ell}{K_n}\right) \right\}^2$, and let
\[
\alpha_{ij} = 2 \wedge \frac{2 \alpha_i \alpha_j}{\alpha_i + \alpha_j}, \tag{3.3}
\]
where $\alpha_i$ is the order of the highest finite moment for the continuous part of $Q_{ii}(\tau_k)$ (see Assumption 1 in Section 4). Then, to handle the heterogeneous heavy tails, we propose the following adaptive truncation method:
\[
\widehat{T}^{\alpha}_{ij,\theta} = \frac{1}{(n - K_n)\theta_{ij}} \sum_{k=1}^{n - K_n} \psi_{\alpha_{ij}}\{ \theta_{ij} Q_{ij}(\tau_k) \}, \tag{3.4}
\]
where $\theta_{ij}$ is a truncation parameter and $\psi_\alpha(x)$ is a bounded non-decreasing function defined for $1 < \alpha \le 2$:
\[
\psi_\alpha(x) = \begin{cases} -\log(1 - t_\alpha + c_\alpha t_\alpha^{\alpha}) & \text{if } x \ge t_\alpha, \\ -\log(1 - x + c_\alpha x^{\alpha}) & \text{if } 0 \le x \le t_\alpha, \\ \log(1 + x + c_\alpha |x|^{\alpha}) & \text{if } -t_\alpha \le x \le 0, \\ \log(1 - t_\alpha + c_\alpha t_\alpha^{\alpha}) & \text{if } x \le -t_\alpha, \end{cases}
\]
where $c_\alpha = \max\left\{ (\alpha - 1)/\alpha, \sqrt{(2 - \alpha)/\alpha} \right\}$ and $t_\alpha = (1/(\alpha c_\alpha))^{1/(\alpha - 1)}$. We note that the truncation detects the jumps and mitigates their impact on the estimator. Other truncations can also achieve a similar goal (see Fan et al. (2021)).
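A minimal end-to-end sketch of steps (3.1)–(3.4) for a single asset. The simulated path, the fixed truncation level $\theta$, and the tail index value are illustrative assumptions (the paper's data-driven choices of $\theta$ appear in Section 5); $K_n = \lfloor n^{1/2} \rfloor$ and $g(x) = x \wedge (1 - x)$ follow the paper's empirical study.

```python
import numpy as np

def g(x):                                    # weight function g(x) = x ∧ (1 - x)
    return np.minimum(x, 1.0 - x)

def psi(x, a):
    """Bounded, non-decreasing influence function psi_alpha for 1 < alpha <= 2."""
    c = max((a - 1.0) / a, np.sqrt((2.0 - a) / a))     # c_alpha
    t = (1.0 / (a * c)) ** (1.0 / (a - 1.0))           # t_alpha
    xc = np.clip(np.asarray(x, dtype=float), -t, t)    # flat outside [-t_a, t_a]
    pos = -np.log(1.0 - xc + c * np.abs(xc) ** a)      # branch for x >= 0
    neg = np.log(1.0 + xc + c * np.abs(xc) ** a)       # branch for x < 0
    return np.where(xc >= 0.0, pos, neg)

def arp_T(y, a_ij, theta):
    """Truncated estimator for one asset from synchronized log-prices y, (3.1)-(3.4)."""
    n = len(y) - 1
    K = int(np.sqrt(n))                                # K_n ~ C_K n^{1/2}
    w = g(np.arange(K) / K)                            # g(l / K_n)
    phi = np.mean(w ** 2)                              # phi = K_n^{-1} sum_l g(l/K_n)^2
    ret = np.diff(y)
    Z = np.array([w @ ret[k:k + K] for k in range(n - K)])   # pre-averaging (3.1)
    Q = (n - K) / (phi * K) * Z * Z                          # quadratic variables (3.2)
    return psi(theta * Q, a_ij).sum() / ((n - K) * theta)    # adaptive truncation (3.4)

rng = np.random.default_rng(0)
n = 23400
x = np.cumsum(rng.normal(0.0, 0.2 / np.sqrt(n), n + 1))  # diffusion with sigma = 0.2
x[n // 2:] += 0.05                                       # one price jump
y = x + 1e-3 * rng.standard_t(df=3, size=n + 1)          # heavy-tailed noise
est = arp_T(y, a_ij=1.5, theta=1.0)
print(est)                                # rough estimate of T_ii = Gamma_ii + rho_ii
```

Because $\psi_\alpha$ is bounded, the windows that contain the jump contribute at most $\psi_{\alpha_{ij}}$'s cap to the sum, instead of entering quadratically as they would in a plain average of the $Q_{ii}(\tau_k)$'s.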
It will be shown that the proposed adaptive robust estimator $\widehat{T}^{\alpha}_{ij,\theta}$ possesses sub-Weibull concentration bounds (see Theorem 1).

The adaptive robust estimator $\widehat{T}^{\alpha}_{ij,\theta}$ is, however, not a consistent estimator of the true integrated volatility $\Gamma_{ij}$, since the noises still remain in each $Q_{ij}(\tau_k)$. Indeed, it will be shown that $\widehat{T}^{\alpha}_{ij,\theta}$ converges to
\[
T_{ij} = \Gamma_{ij} + \rho_{ij}, \tag{3.5}
\]
where $\rho_{ij} = \frac{\sum_{k=1}^{n} 1(\tau_{i,k} = \tau_{j,k})}{\phi K_n} \zeta \eta_{ij}$,
\[
\zeta = \sum_{l=0}^{K_n - 1} \left\{ g\!\left(\frac{l}{K_n}\right) - g\!\left(\frac{l+1}{K_n}\right) \right\}^2 = O\!\left(\frac{1}{K_n}\right),
\]
with the covariance of noise $\eta_{ij}$ defined in (2.3), and $1(\cdot)$ is the indicator function. Hence, to estimate the integrated volatility $\Gamma_{ij}$, we adjust $\widehat{T}^{\alpha}_{ij,\theta}$ by subtracting an estimator of $\rho_{ij}$. For this purpose, let us first define an adaptive robust estimator, $\widehat{\rho}^{\alpha}_{ij,\theta}$, as
\[
\widehat{\rho}^{\alpha}_{ij,\theta} = \frac{\zeta}{\phi K_n \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\{ \theta_{\rho,ij} Q_{\rho,ij}(\tau_k) \}, \tag{3.6}
\]
where
\[
Q_{\rho,ij}(\tau_k) = \frac{1}{2} \{ Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k}) \}\{ Y_j(\tau_{j,k+1}) - Y_j(\tau_{j,k}) \} \tag{3.7}
\]
for $i, j = 1, \ldots, p$, $k = 1, \ldots, n - 1$, and $\theta_{\rho,ij}$ is a truncation parameter that will be specified in Theorem 3. We now define the integrated volatility estimator as follows:
\[
\widehat{\Gamma}_{ij} = \widehat{T}^{\alpha}_{ij,\theta} - \widehat{\rho}^{\alpha}_{ij,\theta}. \tag{3.8}
\]
We call this the adaptive robust pre-averaging realized volatility (ARP) estimator. This provides a preliminary consistent estimate of $\Gamma_{ij}$, which will be further regularized.

In this section, we show the concentration property and optimality of the ARP estimator by establishing matching upper and lower bounds for both $\widehat{T}^{\alpha}_{ij,\theta}$ and $\widehat{\rho}^{\alpha}_{ij,\theta}$. Note that we do not impose any restrictions on the jump sizes $L(t)$ in (2.1). In other words, for the true log-prices, we only need assumptions for the continuous part.
Specifically, Assumptions 1–2 are based on the following random variables:
\[
Y^c_i(\tau_{i,k}) = X^c_i(\tau_{i,k}) + \epsilon_i(\tau_{i,k}), \qquad Z^c_i(\tau_k) = \sum_{l=0}^{K_n - 1} g\!\left(\frac{l}{K_n}\right) \{ Y^c_i(\tau_{i,k+l+1}) - Y^c_i(\tau_{i,k+l}) \},
\]
\[
Q^c_{ij}(\tau_k) = \frac{n - K_n}{\phi K_n} Z^c_i(\tau_k) Z^c_j(\tau_k), \qquad Q^c_{\rho,ij}(\tau_k) = \frac{1}{2} \{ Y^c_i(\tau_{i,k+1}) - Y^c_i(\tau_{i,k}) \}\{ Y^c_j(\tau_{j,k+1}) - Y^c_j(\tau_{j,k}) \},
\]
where $X^c_i(t)$ is the true continuous log-price process defined in (2.1) and the superscript $c$ represents the continuous part of the true log-price. Now, to investigate the asymptotic properties of $\widehat{T}^{\alpha}_{ij,\theta}$, we make the following assumptions.

Assumption 1.

(a) There exist positive constants $\nu_\mu$ and $\nu_\gamma$ such that $\max_{1 \le i \le p} \max_{0 \le t \le 1} |\mu_i(t)| \le \nu_\mu$ a.s. and $\max_{1 \le i \le p} \max_{0 \le t \le 1} \gamma_{ii}(t) \le \nu_\gamma$ a.s.;

(b) The generalized sampling time $\{\tau_j\}$ is independent of the price process $X(t)$ and the noise $\epsilon_i(t_{i,k})$. The time intervals $\{\Delta_j = \tau_j - \tau_{j-1},\ 1 \le j \le n\}$ satisfy $\max_{1 \le j \le n} \Delta_j \le C n^{-1}$ a.s.;

(c) There exists a positive constant $\nu_Q$ such that $\max_{1 \le i \le p} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le \nu_Q$ for all $1 \le k \le n - K_n$.

Remark 1.
For Assumption 1(a), the boundedness condition on the instantaneous volatility process $\gamma_{ii}(t)$ can be relaxed to a local boundedness condition when we investigate the asymptotic behaviors of volatility estimators, such as their convergence rate (see Aït-Sahalia and Xiu (2017)). Specifically, Lemma 4.4.9 in Jacod and Protter (2012) indicates that if an asymptotic result, such as convergence in probability or stable convergence in law, is satisfied under the boundedness condition, it is also satisfied under the local boundedness condition. From this point of view, since we consider a finite time period, it is sufficient to investigate the asymptotic properties under the boundedness condition. Thus, Assumption 1(a) is not restrictive.

Remark 2.
Assumption 1(c) is the finite moment condition, which entails that the quadratic pre-averaged variable $Q^c_{ij}(\tau_k)$ for the continuous part satisfies
\[
\mathrm{E}\left\{ \left| Q^c_{ij}(\tau_k) \right|^{\alpha_{ij}} \,\middle|\, \mathcal{F}_{\tau_k} \right\} \le U_{ij}(\tau_k) \quad \text{a.s.}
\]
for all $1 \le i, j \le p$, $1 \le k \le n - K_n$, and some positive constants $U_{ij}(\tau_k)$, where $\alpha_{ij}$ is defined in (3.3) (see Proposition 2(a) in the Appendix). To account for the heterogeneous heavy-tailedness, we allow the tail index $\alpha_i$ to vary from 1 to infinity. If $\alpha_i = 2$ for all $i = 1, \ldots, p$, it is a similar setting to that of Fan and Kim (2018), and the ARP estimator has universal truncation, which we call the universal robust pre-averaging realized volatility (URP) estimator. To investigate the heterogeneous heavy tails, we compare the ARP and URP estimators in the numerical study.

The theorem below shows that $\widehat{T}^{\alpha}_{ij,\theta}$ has sub-Weibull tail concentration with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.

Theorem 1. (Upper bound) Under the models (2.1) and (2.3) and Assumption 1, let $\delta^{-1} \in [n^c, e^{\sqrt{n}}]$ for some positive constant $c > 0$. Take
\[
\theta_{ij} = \left( \frac{K_n \log(3 K_n \delta^{-1}) (\alpha_{ij} - 1)}{c_{\alpha_{ij}} S_{ij} (n - K_n)} \right)^{1/\alpha_{ij}},
\]
where $S_{ij} = \frac{1}{n - K_n} \sum_{k=1}^{n - K_n} U_{ij}(\tau_k)$. Then we have, for sufficiently large $n$,
\[
\Pr\left\{ \left| \widehat{T}^{\alpha}_{ij,\theta} - T_{ij} \right| \le C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta. \tag{4.1}
\]
Theorem 1 indicates that $\widehat{T}^{\alpha}_{ij,\theta}$ has a sub-Weibull concentration bound with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.
Specifically, as long as $p^{b+2} \in [n^c, e^{\sqrt{n}}]$ for some positive constant $c > 0$,
\[
\Pr\left\{ \max_{1 \le i,j \le p} \left| \widehat{T}^{\alpha}_{ij,\theta} - T_{ij} \right| \ge C_b \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha} \right\} \le p^{-b}
\]
for any constant $b > 0$ and $\alpha = \min_{1 \le i \le p} \alpha_i$, where $C_b$ is some constant depending on $b$; this is the essential condition for investigating large matrix inferences (see Proposition 1). An interesting finding is that there is a trade-off between the convergence rate $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ and the tail indexes $\alpha_i$ and $\alpha_j$. This raises the question of whether the upper bound in (4.1) is optimal or not.

Let $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta) = \widehat{T}_{ij}(Q_{ij}(\tau_1), \ldots, Q_{ij}(\tau_{n - K_n}), \delta)$ be any pre-averaging estimator for $T_{ij}$ defined in (3.5), which takes values of the pre-averaged variables $Q_{ij}(\tau_k)$, $k = 1, \ldots, n - K_n$, defined in (3.2). The following theorem establishes the lower bound for the maximum concentration probability among the class of pre-averaging estimators $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)$ which satisfy $\max_{1 \le i \le p} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le C$ for all $1 \le k \le n - K_n$.

Theorem 2. (Lower bound) Under the assumptions in Theorem 1, let $\alpha_{ij} \in (1, 2]$ for some $1 \le i, j \le p$. Then we have, for sufficiently large $n$,
\[
\min_{\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)} \; \max_{Q^c \in \Xi(\alpha_1, \ldots, \alpha_p)} \Pr\left\{ \left| \widehat{T}_{ij}(Q_{ij}(\tau_k), \delta) - T_{ij} \right| \ge C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge \delta, \tag{4.2}
\]
where $\Xi(\alpha_1, \ldots, \alpha_p) = \left\{ Q^c = (Q^c_{ii}(\tau_k))_{i = 1, \ldots, p,\, k = 1, \ldots, n - K_n} : \max_{i,k} \mathrm{E}\{ |Q^c_{ii}(\tau_k)|^{\alpha_i} \} \le C \right\}$.

Theorem 2 shows that the lower bound is $n^{(1-\alpha_{ij})/2\alpha_{ij}}$, which matches the upper bound in Theorem 1. Thus, the proposed estimator obtains the optimal convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$.

Remark 3.
To handle the microstructural noise, we use the sub-sampling scheme, and the number of non-overlapping quadratic pre-averaged variables $Q_{ij}(\tau_k)$ is $C n^{1/2}$, which is known as the optimal choice. That is, we are only able to use $n^{1/2}$ observations to estimate $T_{ij}$ due to biases of varying spot volatilities, which is the cost of managing the microstructural noise. Thus, the optimal convergence rate is expected to be the square root of the rates of the estimators that are not affected by the microstructural noise. From this point of view, the convergence rate $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ is consistent with the results in Devroye et al. (2016) and Sun et al. (2020).

Recall that the ARP estimator has the bias adjustment as follows:
\[
\widehat{\Gamma}^{\alpha}_{ij} = \widehat{T}^{\alpha}_{ij,\theta} - \widehat{\rho}^{\alpha}_{ij,\theta}. \tag{4.3}
\]
Thus, to establish the concentration inequality for the ARP estimator $\widehat{\Gamma}^{\alpha}_{ij}$, we need to investigate $\widehat{\rho}^{\alpha}_{ij,\theta}$. To do this, we use the quadratic log-return random variables $Q_{\rho,ij}(\tau_k)$ defined in (3.7) and need the following moment condition.

Assumption 2.
There exists a positive constant $\nu_{\rho,Q}$ such that $\max_{1 \le i \le p} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le \nu_{\rho,Q}$ for all $1 \le k \le n - 1$.

Remark 4.
Assumption 2 indicates that the continuous part of the observed log-return, $Y^c_i(\tau_{i,k+1}) - Y^c_i(\tau_{i,k})$, has a finite $2\alpha_i$-th moment. We note that Assumption 1(c) is satisfied under Assumptions 1(a)–(b) and Assumption 2 (see Proposition 3 in the Appendix).

Under Assumptions 1(a)–(b) and Assumption 2, $Q^c_{\rho,ij}(\tau_k)$ has conditional $\alpha_{ij}$-th moments,
\[
\mathrm{E}\left\{ \left| Q^c_{\rho,ij}(\tau_k) \right|^{\alpha_{ij}} \,\middle|\, \mathcal{F}_{\tau_{k-1}} \right\} \le U_{\rho,ij}(\tau_k) \quad \text{a.s.}
\]
for all $1 \le i, j \le p$, $1 \le k \le n -$
$1$, and some positive constants $U_{\rho,ij}(\tau_k)$ (see Proposition 2(b) in the Appendix). With this $\alpha_{ij}$-th moment condition, we establish the concentration inequalities for the ARP estimator $\widehat{\Gamma}^{\alpha}_{ij}$ in the following theorem.

Theorem 3. (Upper bound) Under the assumptions in Theorem 1 and Assumption 2, take
\[
\theta_{\rho,ij} = \left( \frac{\log(6 \delta^{-1}) (\alpha_{ij} - 1)}{c_{\alpha_{ij}} S_{\rho,ij} (n - 1)} \right)^{1/\alpha_{ij}},
\]
where $S_{\rho,ij} = \frac{1}{n-1} \sum_{k=1}^{n-1} U_{\rho,ij}(\tau_k)$. Then, for sufficiently large $n$, we have
\[
\Pr\left\{ \left| \widehat{\rho}^{\alpha}_{ij,\theta} - \rho_{ij} \right| \le C \left( n^{-1} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta \tag{4.4}
\]
and
\[
\Pr\left\{ \left| \widehat{\Gamma}^{\alpha}_{ij} - \Gamma_{ij} \right| \le C \left( n^{-1/2} \log \delta^{-1} \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge 1 - \delta. \tag{4.5}
\]
Theorem 3 shows that $\widehat{\rho}^{\alpha}_{ij,\theta}$ has a sub-Weibull tail concentration bound with a convergence rate of $n^{(1-\alpha_{ij})/\alpha_{ij}}$, which is negligible compared to the upper bound in Theorem 1. Thus, the ARP estimator has sub-Weibull tail concentration with a convergence rate of $n^{(1-\alpha_{ij})/2\alpha_{ij}}$ as in Theorem 1, which is optimal as shown in Theorems 1–2. Although the upper bound for $\widehat{\rho}^{\alpha}_{ij,\theta}$ is dominated by the upper bound in Theorem 1, it is worth checking whether $\widehat{\rho}^{\alpha}_{ij,\theta}$ is an optimal estimator or not. Let $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta) = \widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_1), \ldots, Q_{\rho,ij}(\tau_{n-1}), \delta)$ be any estimator for $\rho_{ij}$, possibly depending on $\delta$. The following theorem provides a lower bound for the maximum concentration probability among the class of estimators $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)$ which satisfy $\max_{1 \le i \le p} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le C$ for all $1 \le k \le n - 1$.

Theorem 4. (Lower bound) Under the assumptions in Theorem 3, let $\alpha_{ij} \in (1, 2]$ for some $1 \le i, j \le p$.
Then we have, for sufficiently large $n$,
\[
\min_{\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)} \; \max_{Q^c_\rho \in \Xi_\rho(\alpha_1, \ldots, \alpha_p)} \Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta) - \rho_{ij} \right| \ge C \left( \log \delta^{-1} / n \right)^{(\alpha_{ij} - 1)/\alpha_{ij}} \right\} \ge \delta, \tag{4.6}
\]
where $\Xi_\rho(\alpha_1, \ldots, \alpha_p) = \left\{ Q^c_\rho = (Q^c_{\rho,ii}(\tau_k))_{i = 1, \ldots, p,\, k = 1, \ldots, n-1} : \max_{i,k} \mathrm{E}\left\{ \left| Q^c_{\rho,ii}(\tau_k) \right|^{\alpha_i} \right\} \le C \right\}$.

The upper bound in (4.4) and the lower bound in (4.6) match, which implies that $\widehat{\rho}^{\alpha}_{ij,\theta}$ achieves the optimal rate. To sum up, the proposed estimators for $T_{ij}$ and $\rho_{ij}$ are both optimal in terms of convergence rate, which indicates that the ARP estimator is also optimal in the class of pre-averaging approaches.

5 Application to large volatility matrix estimation
In this section, we discuss how to estimate large integrated volatility matrices based on the approximate factor model using the ARP estimator. Specifically, we assume that the integrated volatility matrix has the following low-rank plus sparse structure:
\[
\Gamma = \sum_{k=1}^{p} \lambda_k q_k q_k^\top = \Theta + \Sigma = \sum_{k=1}^{r} \bar{\lambda}_k \bar{q}_k \bar{q}_k^\top + \Sigma,
\]
where $\lambda_i$ and $\bar{\lambda}_i$ are the $i$-th largest eigenvalues of $\Gamma$ and $\Theta$, respectively, and their corresponding eigenvectors are $q_i$ and $\bar{q}_i$. The low-rank volatility matrix $\Theta$ accounts for the factor effect on the volatility matrix. We assume that the rank $r$ of $\Theta$ is bounded. The sparse volatility matrix $\Sigma$ stands for idiosyncratic risk and satisfies the following sparsity condition:
\[
\max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_{ij}|^q (\Sigma_{ii} \Sigma_{jj})^{(1-q)/2} \le M_\sigma s_p \quad \text{a.s.}, \tag{5.1}
\]
where $M_\sigma$ is a positive random variable with $\mathrm{E}(M_\sigma) < \infty$, $q \in [0, 1)$, and $s_p$ is a deterministic function of $p$ that grows slowly in $p$. When $\Sigma_{ii}$ is bounded from below and $q = 0$, $s_p$ measures the maximum number of non-vanishing elements in each row of the matrix $\Sigma$. This low-rank plus sparse structure is widely adopted for studying large matrix inferences (Aït-Sahalia and Xiu, 2017; Bai, 2003; Bai and Ng, 2002; Fan and Kim, 2018; Fan et al., 2018; Kim et al., 2018; Stock and Watson, 2002).

To harness the low-rank plus sparse structure, we employ the principal orthogonal complement thresholding (POET) method (Fan et al., 2013) as follows. We first decompose an input volatility matrix built from the ARP estimators in (3.8):
\[
\widehat{\Gamma} = \left( \widehat{\Gamma}^{\alpha}_{ij} \right)_{i,j = 1, \ldots, p} = \sum_{k=1}^{p} \widehat{\lambda}_k \widehat{q}_k \widehat{q}_k^\top,
\]
where $\widehat{\lambda}_i$ is the $i$-th largest eigenvalue of $\widehat{\Gamma}$, and $\widehat{q}_i$ is its corresponding eigenvector. Then, using the first $r$ principal components, we estimate the low-rank factor volatility matrix $\Theta$
16s follows: (cid:98) Θ = r (cid:88) k =1 (cid:98) λ k (cid:98) q k (cid:98) q (cid:62) k . To estimate the sparse volatility matrix Σ , we first calculate the input idiosyncratic volatilitymatrix estimator (cid:101) Σ = ( (cid:101) Σ ij ) ≤ i,j ≤ p = (cid:98) Γ − (cid:98) Θ and employ the adapted thresholding method asfollows: (cid:98) Σ ij = (cid:101) Σ ij ∨ , if i = js ij ( (cid:101) Σ ij ) ( | (cid:101) Σ ij | ≥ (cid:36) ij ) , if i (cid:54) = j and (cid:98) Σ = ( (cid:98) Σ ij ) ≤ i,j ≤ p , where the thresholding function s ij ( · ) satisfies that | s ij ( x ) − x | ≤ (cid:36) ij , and the adaptive thresh-olding level (cid:36) ij = (cid:36) n (cid:113) ( (cid:101) Σ ii ∨ (cid:101) Σ jj ∨ (cid:36) ij . Examples of the thresholding function s ij ( x ) include the soft thresholding func-tion s ij ( x ) = x − sign( x ) (cid:36) ij and the hard thresholding function s ij ( x ) = x . The tuningparameter (cid:36) n will be specified in Proposition 1. In the empirical study, we use the hardthresholding method.With the low-rank volatility matrix estimator (cid:98) Θ = ( (cid:98) Θ ij ) ≤ i,j ≤ p and the sparse volatilitymatrix estimator (cid:98) Σ = ( (cid:98) Σ ij ) ≤ i,j ≤ p , we estimate the integrated volatility matrix Γ by (cid:98) Γ P OET = (cid:98) Θ + (cid:98) Σ . To investigate asymptotic behaviors of the POET estimator, the sub-Weibull concentra-tion inequality is required, and is satisfied by the ARP estimator as shown in Theorem 3.Thus, the POET estimator based on the ARP estimators can enjoy the similar asymptoticproperties established in Fan and Kim (2018). To study its asymptotic behaviors, we needthe following technical conditions imposed by Fan and Kim (2018), but the sub-Weibullconcentration rate is different because we consider heterogeneous heavy-tailedness.
Assumption 3.

(a) Let $D_\lambda = \min\{ \bar{\lambda}_i - \bar{\lambda}_{i+1} : 1 \le i \le r \}$, $(\lambda_1 + p M_\sigma)/D_\lambda \le C_1$ a.s., and $D_\lambda \ge C_2 p$ a.s. for some generic constants $C_1$ and $C_2$, where $\bar{\lambda}_{r+1} = 0$, $M_\sigma$ is defined in (5.1), and $\bar{\lambda}_i$ and $\lambda_i$ are the $i$-th largest eigenvalues of $\Theta$ and $\Gamma$, respectively;

(b) For some fixed constant $C_3$, we have
\[
p \max_{1 \le i \le p} \sum_{j=1}^{r} \bar{q}_{ij}^2 \le C_3 \quad \text{a.s.},
\]
where $\bar{q}_j = (\bar{q}_{1j}, \ldots, \bar{q}_{pj})^\top$ is the $j$-th eigenvector of $\Theta$;

(c) The smallest eigenvalue of $\Sigma$ stays away from zero almost surely;

(d) $s_p/\sqrt{p} + \left( n^{-1/2} \log p \right)^{(\alpha-1)/\alpha} = o(1)$, where $\alpha = \min_{1 \le i \le p} \alpha_i$.

Under Assumption 3, we can establish the following proposition, similar to the proof of Theorem 3 in Fan and Kim (2018). Below, we assume a generic input $\widehat{\Gamma}$ that satisfies (5.2). In particular, the ARP estimator satisfies this condition, as shown in Theorem 3.

Proposition 1.
Under the model (2.1), let $\alpha = \min_{1 \le i \le p} \alpha_i$ and assume that the concentration inequality
\[
\Pr\left\{ \max_{1 \le i,j \le p} \left| \widehat{\Gamma}_{ij} - \Gamma_{ij} \right| \ge C \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha} \right\} \le p^{-1}, \tag{5.2}
\]
Assumption 3, and the sparsity condition (5.1) are met. Take $\varpi_n = C_\varpi \beta_n$ for some large fixed constant $C_\varpi$, where $\beta_n = M_\sigma s_p / p + \left( n^{-1/2} \log p \right)^{(\alpha - 1)/\alpha}$. Then, for sufficiently large $n$, with probability greater than $1 - p^{-1}$,
\[
\| \widehat{\Sigma} - \Sigma \|_2 \le C M_\sigma s_p \beta_n^{1-q}, \tag{5.3}
\]
\[
\| \widehat{\Sigma} - \Sigma \|_{\max} \le C \beta_n, \tag{5.4}
\]
\[
\| \widehat{\Gamma}_{POET} - \Gamma \|_\Gamma \le C \left[ p^{1/2} \left( n^{-1/2} \log p \right)^{(2\alpha - 2)/\alpha} + M_\sigma s_p \beta_n^{1-q} \right], \tag{5.5}
\]
\[
\| \widehat{\Gamma}_{POET} - \Gamma \|_{\max} \le C \beta_n, \tag{5.6}
\]
where the relative Frobenius norm is $\|A\|_\Gamma = p^{-1/2} \| \Gamma^{-1/2} A \Gamma^{-1/2} \|_F$. Furthermore, suppose that $M_\sigma s_p \beta_n^{1-q} = o(1)$. Then, with probability approaching 1, the minimum eigenvalue of $\widehat{\Sigma}$ is bounded away from 0, $\widehat{\Gamma}_{POET}$ is non-singular, and
\[
\| \widehat{\Sigma}^{-1} - \Sigma^{-1} \|_2 \le C M_\sigma s_p \beta_n^{1-q}, \tag{5.7}
\]
\[
\| \widehat{\Gamma}^{-1}_{POET} - \Gamma^{-1} \|_2 \le C M_\sigma s_p \beta_n^{1-q}. \tag{5.8}
\]

Remark 5.
Unlike Theorem 3 in Fan and Kim (2018), Proposition 1 imposes the sub-Weibull concentration condition (5.2), which is the optimal rate with only finite $2\alpha$-th moments, as shown in Theorems 1–4. Note that if $p \in [n^{c}, e^{\sqrt{n}}]$ for some positive constant $c > 0$, Theorem 3 shows that the ARP estimator satisfies (5.2) for $\delta = 1/(2p)$. Also, the POET estimator is consistent in terms of the relative Frobenius norm as long as $p = o(n^{(2\alpha-2)/\alpha})$. That is, the convergence rate is a function of the minimum tail index $\alpha$.

To implement the ARP estimation procedure, we need to choose tuning parameters. In this section, we discuss how to select the tuning parameters for the numerical studies. We first estimate the tail index $\alpha_i$ as follows. Let $D_i(\tau_k) = Y_i(\tau_{i,k+1}) - Y_i(\tau_{i,k})$ for $k = 1, \ldots, n-$
1. Then the tail index is estimated by
$$\widehat{\alpha}_i = \min\Bigg\{ a \in (1, c_{\alpha,1}] \,:\, \frac{1}{n-1}\sum_{k=1}^{n-1} \bigg| \frac{D_i(\tau_k) - \bar{D}_i}{s_{D_i}} \bigg|^{2a} > c_{\alpha,2}\, \mathrm{E}|Z|^{2a} \Bigg\}, \qquad (5.9)$$
where $\bar{D}_i$ and $s_{D_i}$ are the sample mean and sample standard deviation of the $D_i(\tau_k)$'s, respectively, $Z$ is a standard normal random variable, and $c_{\alpha,1}$ and $c_{\alpha,2}$ are two positive constants. If no $a$ satisfies the above inequality, we choose $\widehat{\alpha}_i = c_{\alpha,1}$. The intuition is that if the standardized $2a$-th moment is too large, we conclude that the $2a$-th moment does not exist. To quantify the degree of largeness, we compare it with a multiple of the corresponding standard Gaussian moment. This leads to the method in (5.9). In the empirical study, we use $c_{\alpha,1} = 5$ and $c_{\alpha,2} = 2$. The estimator $\widehat{\alpha}_{ij}$ of $\alpha_{ij}$ is then calculated as
$$\widehat{\alpha}_{ij} = 2 \wedge \frac{2\widehat{\alpha}_i \widehat{\alpha}_j}{\widehat{\alpha}_i + \widehat{\alpha}_j}.$$
Given $\widehat{\alpha}_{ij}$, we choose the thresholding levels as
$$\theta_{ij} = c \Bigg( \frac{K_n \log p}{(\widehat{\alpha}_{ij} - 1)\, c_{\widehat{\alpha}_{ij}}\, \widehat{S}_{ij}\, (n - K_n)} \Bigg)^{1/\widehat{\alpha}_{ij}} \qquad (5.10)$$
and
$$\theta_{\rho,ij} = c \Bigg( \frac{\log p}{(\widehat{\alpha}_{ij} - 1)\, c_{\widehat{\alpha}_{ij}}\, \widehat{S}_{\rho,ij}\, (n - 1)} \Bigg)^{1/\widehat{\alpha}_{ij}}, \qquad (5.11)$$
where $\widehat{S}_{ij} = \frac{1}{n-K_n}\sum_{k=1}^{n-K_n} |Q_{ij}(\tau_k)|^{\widehat{\alpha}_{ij}}$, $\widehat{S}_{\rho,ij} = \frac{1}{n-1}\sum_{k=1}^{n-1} |Q_{\rho,ij}(\tau_k)|^{\widehat{\alpha}_{ij}}$, and $c$ is a tuning parameter. For the empirical study, we choose $c$ as 0.15. Also, in the pre-averaging stage, we choose $K_n = \lfloor n^{1/2} \rfloor$ and $g(x) = x \wedge (1-x)$.

Remark 6.
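As an illustration, the following Python sketch implements the moment-comparison rule (5.9), the pairwise tail index, and the truncation level (5.10) as reconstructed above; the grid step, the stand-in constant `c_alpha`, and the helper names are our own assumptions rather than part of the paper.

```python
import math
import numpy as np

def gaussian_abs_moment(a):
    # E|Z|^(2a) for standard normal Z: 2^a * Gamma(a + 1/2) / sqrt(pi)
    return 2.0 ** a * math.gamma(a + 0.5) / math.sqrt(math.pi)

def estimate_tail_index(increments, c1=5.0, c2=2.0, step=0.01):
    # Smallest a in (1, c1] whose standardized empirical 2a-th moment exceeds
    # c2 times the Gaussian benchmark; return c1 if none does (cf. (5.9)).
    d = np.asarray(increments, dtype=float)
    z = (d - d.mean()) / d.std(ddof=1)
    for a in np.arange(1.0 + step, c1 + 1e-12, step):
        if np.mean(np.abs(z) ** (2 * a)) > c2 * gaussian_abs_moment(a):
            return float(a)
    return c1

def pairwise_index(ai, aj):
    # Harmonic-mean combination of the marginal indices, capped at 2
    return min(2.0, 2.0 * ai * aj / (ai + aj))

def truncation_level(alpha_ij, S_ij, n, K, p, c=0.15, c_alpha=1.0):
    # theta_ij in (5.10); c_alpha stands in for the theoretical constant
    return c * (K * math.log(p) /
                ((alpha_ij - 1.0) * c_alpha * S_ij * (n - K))) ** (1.0 / alpha_ij)
```

Heavier-tailed increments push the estimated index below the cap $c_{\alpha,1}$, which in turn enlarges the amount of truncation applied to the pre-averaged variables.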
There are other ways to estimate the tail indices. For example, one of the most popular methods is Hill's estimator (Hill, 1975), which is consistent under some conditions. However, the performance of Hill's estimator depends heavily on the choice of its own tuning parameter. In fact, we conducted an empirical study using Hill's estimator and found that its performance is not robust to this choice; that is, we simply run into another tuning parameter selection problem. Furthermore, the result is not better than that of the proposed method in (5.9). Thus, we use the proposed method in the numerical study. To fully justify the proposed method, we would need to investigate its asymptotic behavior. However, developing a tuning parameter selection procedure that not only works well in practice but also has good theoretical properties is a demanding task. We leave this for future study.
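For comparison, a minimal sketch of Hill's estimator; the choice of $k$ (the number of upper order statistics used) is exactly the tuning parameter whose influence the remark above describes.

```python
import numpy as np

def hill_estimator(x, k):
    # Hill (1975): tail index from the k largest absolute observations,
    # alpha_hat = 1 / mean(log(X_(n-i+1) / X_(n-k)))
    a = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = a.size
    gamma_hat = np.mean(np.log(a[n - k:] / a[n - k - 1]))
    return 1.0 / gamma_hat
```

Varying $k$ over a reasonable range can move the estimate substantially, which is the lack of robustness referred to above.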
To check the finite sample performance of the ARP estimator, we conducted a simulation study. To obtain the low-rank plus sparse structure, we considered the following true log-price process,
$$d\mathbf{X}(t) = \boldsymbol{\mu}(t)\,dt + \boldsymbol{\vartheta}^\top(t)\,d\mathbf{W}^*_t + \boldsymbol{\sigma}^\top(t)\,d\mathbf{W}_t + \mathbf{L}(t)\,d\boldsymbol{\Lambda}(t),$$
where the drift $\boldsymbol{\mu}(t)$ is a constant vector with identical components, $\mathbf{W}^*_t$ and $\mathbf{W}_t$ are $r$- and $p$-dimensional independent Brownian motions, respectively, $\boldsymbol{\vartheta}(t)$ and $\boldsymbol{\sigma}(t)$ are $r$ by $r$ and $p$ by $p$ matrices, respectively, $\mathbf{L}(t)$ is the jump size, and $\boldsymbol{\Lambda}(t)$ is the $p$-dimensional Poisson process with intensity $\mathbf{I}(t)$. We used a heterogeneous heavy tail process (heavy tail process 1), a homogeneous heavy tail process (heavy tail process 2), and a sub-Gaussian process to generate the volatility process. To generate the two heavy tail processes, we used a setting similar to those in Wang and Zou (2010) and Fan and Kim (2018). Specifically, let $\boldsymbol{\sigma}(t)$ be the Cholesky decomposition of $\boldsymbol{\varsigma}(t) = (\varsigma_{ij}(t))_{1\le i,j\le p}$. The diagonal elements of $\boldsymbol{\varsigma}(t)$ come from four different processes: geometric Ornstein-Uhlenbeck processes, the sum of two CIR processes (Cox et al., 1985; Barndorff-Nielsen, 2002), the volatility process in Nelson's GARCH diffusion limit model (Wang, 2002), and the two-factor log-linear stochastic volatility process (Huang and Tauchen, 2005) with leverage effect. Details can be found in Wang and Zou (2010). To control the tail behaviors of the instantaneous volatility matrix $\boldsymbol{\varsigma}(t)$, we used the $t$-distribution as follows:
$$\varsigma_{ii}(t_l) = \big(1 + |t_{df_i,l}|\big)\,\widetilde{\varsigma}_{ii}(t_l),$$
where, for $l = 1, \ldots, n_{all}$, the $t_{df_i,l}$ are i.i.d. $t$-distributed random variables with degrees of freedom $df_i$, $t_l = l/n_{all}$, and the $\widetilde{\varsigma}_{ii}(t_l)$ were generated by the above four different processes. To account for the heterogeneous heavy-tailed distribution (heavy tail process 1), the $df_i$ were generated from the unif(2.5, 4) distribution, whereas for the homogeneous heavy-tailed distribution (heavy tail process 2), we set $df_i = 5$.
To obtain the sparse instantaneous volatility matrix $\boldsymbol{\varsigma}(t)$, we generated its off-diagonal elements as follows:
$$\varsigma_{ij}(t_l) = \{\kappa(t_l)\}^{|i-j|} \sqrt{\varsigma_{ii}(t_l)\,\varsigma_{jj}(t_l)}, \quad 1 \le i \ne j \le p,$$
where $\kappa(t)$ is given by
$$\kappa(t) = \frac{e^{u(t)} - 1}{e^{u(t)} + 1}, \quad du(t) = 0.03\{0.64 - u(t)\}\,dt + 0.118\,u(t)\,dW_{\kappa,t},$$
$$W_{\kappa,t} = \sqrt{0.96}\,\widetilde{W}_{\kappa,t} - 0.2 \sum_{i=1}^{p} W_{it}/\sqrt{p},$$
and $\widetilde{W}_{\kappa,t}$ is a one-dimensional Brownian motion independent of the Brownian motions $\mathbf{W}^*_t$ and $\mathbf{W}_t$. The low-rank instantaneous volatility matrix $\boldsymbol{\vartheta}^\top(t)\boldsymbol{\vartheta}(t)$ is $\mathbf{B}^\top\{\boldsymbol{\vartheta}^f(t)\}^\top \boldsymbol{\vartheta}^f(t)\mathbf{B}$, where $\mathbf{B} = (B_{ij})_{1\le i\le r, 1\le j\le p} \in \mathbb{R}^{r\times p}$ and $B_{ij}$ was generated from the normal distribution with a mean of 0 and a standard deviation of 0.9, and $\boldsymbol{\vartheta}^f(t)$ was generated similarly to $\boldsymbol{\sigma}(t)$. Specifically, $\boldsymbol{\vartheta}^f(t)$ is the Cholesky decomposition of $\boldsymbol{\varsigma}^f(t)$, and the diagonal elements of $\boldsymbol{\varsigma}^f(t)$ at time $t_l$ were
$$\varsigma^f_{ii}(t_l) = \big(1 + |t^f_{df_i,l}|\big)\,\widetilde{\varsigma}^f_{ii}(t_l),$$
where the $t^f_{df_i,l}$, $l = 1, \ldots, n_{all}$, are i.i.d. $t$-distributed with degrees of freedom $df_i$, and the $\widetilde{\varsigma}^f_{ii}(t_l)$, $l = 1, \ldots, n_{all}$, were generated from the geometric Ornstein-Uhlenbeck processes. The off-diagonal elements of $\boldsymbol{\varsigma}^f(t)$ were set to zero. For the jump part, we chose $\mathbf{I}(t) = (5, \ldots, 5)^\top$, and the jump size $L_i(t)$ was drawn from an independent $t$-distribution with degrees of freedom $df_i$ and standard deviation proportional to $\sqrt{\int_0^1\gamma_{ii}(t)\,dt}$. We also generated a sub-Gaussian process in the same way as the heavy tail processes, except that the $t$-distribution terms were replaced by standard normal terms. To generate the observation time points, we first generated $n_{all} - 1$ time points $t_k$, $k = 1, \ldots, n_{all} -$
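The $t$-scaling of the diagonal spot volatilities can be sketched as follows; the constant base path stands in for the four volatility processes above, and all sizes and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_all = 10, 1000
df = rng.uniform(2.5, 4.0, size=p)      # heterogeneous degrees of freedom (heavy tail process 1)
base = np.ones((p, n_all))              # stand-in for the four volatility processes
t_scale = np.abs(np.vstack([rng.standard_t(d, size=n_all) for d in df]))
vol_diag = (1.0 + t_scale) * base       # heavy-tailed diagonal spot volatilities
```

Because each asset draws its own degrees of freedom, the rows of `vol_diag` exhibit different degrees of tail heaviness, mimicking heavy tail process 1.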
1, with $t_0 = 0$ and $t_{n_{all}} = 1$. Based on these points, we generated non-synchronized data similar to the scheme in Aït-Sahalia et al. (2010) as follows. First, $p$ random proportions $w_i$, $i = 1, \ldots, p$, were independently generated from the unif(0.8, 1) distribution. Second, we set each $t_k$ as an observation time point of the $i$-th asset if and only if an independent Bernoulli random variable with parameter $w_i$ took the value 1. Third, the noise-contaminated high-frequency observations $Y_i(t_{i,k})$ were generated from the model (2.3). Specifically, the noise $\epsilon_i(t_{i,k})$ was drawn from an independent $t$-distribution with degrees of freedom $df_i$ and standard deviation proportional to $\sqrt{\int_0^1\gamma_{ii}(t)\,dt}$. We chose $p = 200$ and $r = 3$, and we varied $n_{all}$ from 1000 to 4000. We employed the refresh time scheme to obtain synchronized data. To investigate the effect of the adaptiveness of the proposed ARP procedure, we introduce a universal robust pre-averaging realized volatility (URP) estimator, which uses the same estimation procedure as the ARP estimator with $\widehat{\alpha}_{ij} = 2$ for all $1 \le i,j \le p$. That is, the URP estimator truncates the pre-averaged variables at a universal tail index level. Thus, we calculated the input volatility matrix using the adaptive robust pre-averaging realized volatility matrix (ARPM), universal robust pre-averaging realized volatility matrix (URPM), and pre-averaging realized volatility matrix (PRVM) estimators. We used the tuning parameters discussed in Section 5.1 and set the tuning parameter $c$ in (5.10)–(5.11) to 0.2. We note that, compared with the ARPM estimation procedure, the PRVM estimator was obtained by setting $\psi_\alpha(x) = x$ in Section 3 and the URPM estimator was obtained by setting $\alpha_{ij} = 2$ for all $i,j = 1, \ldots, p$.
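The Bernoulli thinning and the all-refresh synchronization can be sketched as follows; the refresh rule follows the all-refresh scheme of Barndorff-Nielsen et al. (2011), and the variable names and sizes are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_all = 5, 200
t = np.concatenate(([0.0], np.sort(rng.uniform(0, 1, n_all - 1)), [1.0]))

# Bernoulli thinning: asset i observes t_k with probability w_i ~ unif(0.8, 1)
w = rng.uniform(0.8, 1.0, p)
obs_times = [t[rng.random(t.size) < w[i]] for i in range(p)]

def refresh_times(times_list):
    # All-refresh scheme: the next refresh time is the first instant by which
    # every asset has recorded a new observation.
    pointers = [0] * len(times_list)
    refresh = []
    while all(ptr < len(ts) for ptr, ts in zip(pointers, times_list)):
        tau = max(ts[ptr] for ptr, ts in zip(pointers, times_list))
        refresh.append(tau)
        pointers = [int(np.searchsorted(ts, tau, side="right")) for ts in times_list]
    return np.array(refresh)

tau = refresh_times(obs_times)
```

The number of refresh times is at most the smallest number of observations across assets, which is why the synchronized sample sizes reported below are well under $n_{all}$.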
This means that the PRVM estimator cannot account for heavy tails and that the URPM estimator cannot capture the heterogeneity of the degrees of heaviness of the tail distributions. To make the estimates positive semi-definite, we projected the volatility matrix estimators onto the positive semi-definite cone in the spectral norm. To calculate the POET estimators, we used the hard thresholding scheme and selected the thresholding level by minimizing the corresponding Frobenius norm. The average estimation errors under the Frobenius norm, relative Frobenius norm $\|\cdot\|_{\Gamma}$, $\ell_2$-norm (spectral norm), and maximum norm were computed based on 1000 simulations. The average numbers of synchronized time points under the refresh time scheme were 300.5, 600.4, and 1199.7 for $n_{all} = 1000$, 2000, and 4000, respectively.

Table 1: The MSEs of the estimated $\alpha_{ij}$ given $n_{all} = 1000$, 2000, and 4000.

Table 1 reports the mean squared errors (MSEs) of the estimated $\alpha_{ij}$ against the sample size $n_{all}$ for the two heavy-tail processes. For the heterogeneous heavy-tail process, the $2\alpha_i$'s were generated from the unif(2.5, 4) distribution, while $2\alpha_i = 5$ for the homogeneous heavy-tail process. We calculated $\alpha_{ij}$ using (3.3). From Table 1, we find that for the heterogeneous heavy-tail process, the MSE decreases as the sample size $n_{all}$ increases. The MSEs for the homogeneous heavy-tail process are small regardless of $n_{all}$. We note that for the sub-Gaussian process, more than 99.99 percent of the $\alpha_{ij}$ were estimated to be 2 (regarded as correctly estimated due to the sub-Gaussianity of the truncated average) for all $n_{all}$. These results indicate that the proposed tail index estimator works well. Figure 2 plots the Frobenius, relative Frobenius, spectral, and max norm errors against the sample size $n_{all}$ for the POET estimators from the ARPM, URPM, and PRVM estimators. Figure 3 depicts the spectral norm errors against the sample size $n_{all}$ for the inverse POET estimators with the ARPM, URPM, and PRVM estimators. As expected, the ARPM estimator outperforms the other estimators for the heterogeneous heavy-tail process.
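The projection step can be implemented by eigenvalue clipping, which attains the spectral-norm distance $|\min(\lambda_{\min}, 0)|$ from the input and is therefore a projection onto the positive semi-definite cone in the spectral norm; a minimal sketch:

```python
import numpy as np

def project_psd(A):
    # Symmetrize, then clip negative eigenvalues at zero
    A = (A + A.T) / 2.0
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T
```

The same clipped matrix also solves the Frobenius-norm projection, so a single routine serves both purposes.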
For the homogeneous heavy-tail and sub-Gaussian processes, the ARPM and URPM estimators perform similarly and both outperform the PRVM estimator. One possible explanation for the poor performance of the PRVM estimator even in the Gaussian noise case is that the true return process contains heavy-tailed distributions over time, and hence the robust methods still outperform. To sum up, the ARPM estimator is robust to heterogeneity in the heaviness of tails and adapts to homogeneity in the heaviness of tails.

In this section, we apply the proposed ARP estimator to high-frequency trading data for 200 assets from January 1 to December 31, 2016 (252 trading days). The 200 stocks with the largest trading volumes were selected from among the S&P 500, and the data were obtained from the Wharton Research Data Services (WRDS) system. Figure 4 plots the daily synchronized sample sizes from the refresh time scheme for the 200 assets. As seen in Figure 4, sampling at a frequency higher than 1 minute can lead to the non-existence of observations between some consecutive sample points. Hence, we employed 1-min log-return data with the previous tick scheme.
Figure 2: The Frobenius, relative Frobenius, spectral, and max norm error plots of the POET estimators with the ARPM, URPM, and PRVM estimators for $p = 200$ and $n_{all} = 1000$, 2000, and 4000.

Figure 3: The spectral norm error plots of the inverse POET estimators with the ARPM, URPM, and PRVM estimators for $p = 200$ and $n_{all} = 1000$, 2000, and 4000.
Figure 4: The number of daily synchronized samples from the refresh time scheme for 200 assets over 252 days in 2016. The blue dashed and red solid lines mark the numbers of possible observations for 30-sec and 1-min log-returns in each trading day, which are 780 and 390, respectively.

To capture the heterogeneous heavy-tailedness over time, we estimated the tail indexes using the method (5.9) in Section 5.1. Figure 5 shows the box plots of the daily estimated tail indexes $\widehat{\alpha}_i$ of the 200 assets for each of 5 days in 2016, ranging from the day with the largest IQR to the day with the smallest IQR among the 252 days. Figure 5 shows that the tail indexes of the observed log-returns are heterogeneous over time, which matches the daily kurtosis results in Figure 1. This supports the heterogeneous heavy tail assumption.

To apply the POET estimation procedures, we first need to determine the rank $r$. We
Figure 5: The box plots of the distributions of the daily estimated tail indexes $\widehat{\alpha}_i$ for the 200 most liquid stocks among the S&P 500 index in 2016. Day (a) has the largest IQR, and days (b)–(e) have the 75th, 50th, 25th, and 0th (minimum) percentiles of the IQR among the 252 trading days in 2016, respectively.

calculated 252 daily integrated volatility matrices using the PRVM estimation procedure. Figure 6 shows the scree plot drawn using the eigenvalues of the sum of the 252 PRVM estimates. As seen in Figure 6, the possible values of the rank $r$ are 1, 2, and
3, and hence we conducted the empirical study for $r = 1$, 2, and 3.

Figure 6: The scree plot of the eigenvalues of the sum of the 252 PRVM estimates.

To estimate the sparse volatility matrix $\Sigma$, we used the Global Industry Classification Standard (GICS) (Fan et al., 2016a). Specifically, the covariances of the idiosyncratic components across different sectors are set to zero, while those within the same sector are maintained. This corresponds to hard-thresholding using the sector information. To make the estimates positive semi-definite, we projected the POET estimators onto the positive semi-definite cone in the spectral norm. To check the performance of the proposed ARP estimation procedure, we first investigated the mean squared prediction error (MSPE) of the POET estimators, where the MSPE is defined as
$$\mathrm{MSPE}(\widehat{\Gamma}) = \frac{1}{s-1}\sum_{d=1}^{s-1} \|\widehat{\Gamma}_d - \widehat{\Gamma}_{d+1}\|_F^2, \qquad (6.1)$$
where $s$ is the number of days in the period and $\widehat{\Gamma}_d$ can be any of the POET estimators from the ARPM, URPM, and PRVM estimators for the $d$-th day of the period. We used three different periods: 252 days, day 1 to day 126, and day 127 to day 252. Table 2 reports the MSPE results for the POET estimators from the inputs of the ARPM, URPM, and PRVM estimators. We find that for each period and rank $r$, the ARPM estimator has the smallest MSPE. The URPM estimator is slightly better than the PRVM estimator, but the improvement is insignificant when compared with that of the ARPM estimator.
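The criterion (6.1) is straightforward to compute from the sequence of daily estimates; a small sketch under the reconstruction above (average squared Frobenius distance between consecutive days):

```python
import numpy as np

def mspe(daily_estimates):
    # (6.1): average squared Frobenius distance between consecutive days
    diffs = [np.linalg.norm(g1 - g2, "fro") ** 2
             for g1, g2 in zip(daily_estimates[:-1], daily_estimates[1:])]
    return float(np.mean(diffs))
```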
One possible explanation is that the proposed ARP estimator helps deal with heterogeneous heavy-tailed distributions when estimating integrated volatility matrices.

To check the out-of-sample performance, we applied the ARPM, URPM, and PRVM estimators to the following minimum variance portfolio allocation problem:
$$\min_{\omega} \ \omega^\top \widehat{\Gamma}\, \omega, \quad \text{subject to } \omega^\top \mathbf{J} = 1 \text{ and } \|\omega\|_1 \le c_0,$$
where $\mathbf{J} = (1, \ldots, 1)^\top \in \mathbb{R}^p$, the gross exposure constraint $c_0$ was varied from 1 to 6, and $\widehat{\Gamma}$ could be any of the POET estimators from the ARPM, URPM, and PRVM estimators. To calculate the out-of-sample risks, we constructed the portfolios at the beginning of each trading day using the stock weights calculated from the previous day's data. We then held the portfolio for one day and calculated the square root of the realized volatility using the 10-min portfolio log-returns. Their average was used as the out-of-sample risk. We tested the performances for three different periods: 252 days, day 1 to day 126, and day 127 to day 252.

Table 2: The MSPEs of the POET estimators from the ARPM, URPM, and PRVM estimators (period 1: 252 days, period 2: day 1 to day 126, period 3: day 127 to day 252).

Figure 7 depicts the out-of-sample risks of the portfolios constructed by the POET estimators from the ARPM, URPM, and PRVM estimators. We find that, for the purpose of portfolio allocation, the ARPM estimator shows stable results and has the smallest risks. The URPM estimator performs better than the PRVM estimator, but the improvement is smaller than that of the ARPM estimator. It is worth noting that the results do not depend significantly on the period or the rank $r$. These results lend further support to our claim that the heavy-tailed distributions of observed log-returns are heterogeneous, as shown in Figures 1 and 5, and that the proposed ARP estimation procedure can account for the heterogeneity of the degrees of heaviness of tail distributions.
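The constrained minimum variance problem can be solved by splitting the weights into positive and negative parts so that the gross exposure bound becomes linear; the sketch below uses a generic solver and is illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_portfolio(gamma, c0):
    # Minimize w' Gamma w s.t. sum(w) = 1 and ||w||_1 <= c0, via the split
    # w = u - v with u, v >= 0, which makes the gross exposure bound linear.
    p = gamma.shape[0]

    def objective(x):
        w = x[:p] - x[p:]
        return w @ gamma @ w

    cons = [
        {"type": "eq", "fun": lambda x: np.sum(x[:p] - x[p:]) - 1.0},
        {"type": "ineq", "fun": lambda x: c0 - np.sum(x)},   # ||w||_1 <= c0
    ]
    x0 = np.concatenate([np.full(p, 1.0 / p), np.zeros(p)])
    res = minimize(objective, x0, bounds=[(0, None)] * (2 * p),
                   constraints=cons, method="SLSQP")
    return res.x[:p] - res.x[p:]
```

Tightening $c_0$ toward 1 forces a no-short-sale portfolio, while larger values allow the levered positions whose risks Figure 7 compares across exposure constraints.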
In this paper, we develop the adaptive robust pre-averaging realized volatility (ARP) estimation method to handle the heterogeneous heavy-tailed distributions of stock returns. To account for the heterogeneity of the heavy-tailedness arising from microstructural noise and price jumps, the ARP estimator truncates quadratic pre-averaged random variables according to daily tail indices. We show that the proposed ARP estimator achieves sub-Weibull tail concentration with the optimal convergence rate by showing that its upper bound matches its lower bound.

Figure 7: The out-of-sample risks of the optimal portfolios constructed by using the POET estimators from the ARPM, URPM, and PRVM estimators.

To estimate large integrated volatility matrices, the ARP estimator is further regularized using the POET procedure, and the asymptotic properties of the POET estimator built on the ARP estimator are also investigated. In the empirical study, for the purpose of portfolio allocation, the POET estimator based on the ARP estimator performs best. These findings suggest that, when it comes to estimating integrated volatility matrices, the proposed ARP estimation procedure helps handle the heterogeneous tail distributions of observed log-returns.

Non-synchronization could be another source of heavy-tailedness, and the heterogeneity of the observation time intervals can also cause heterogeneous variation.
However, in this paper, we do not focus on this issue and mainly consider the noise and jumps as the sources of heavy-tailedness. It would be interesting and important to study the observation time points from the perspective of heavy-tailedness. Furthermore, there are other possible sources of heavy-tailedness, and it is important to understand what actually causes it. We leave these interesting questions for future study.
References
Aït-Sahalia, Y., Fan, J., and Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association, 105(492):1504–1517.

Aït-Sahalia, Y., Jacod, J., and Li, J. (2012). Testing for jumps in noisy high frequency data. Journal of Econometrics, 168(2):207–222.

Aït-Sahalia, Y. and Xiu, D. (2017). Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2):384–399.

Andersen, T. G., Bollerslev, T., and Diebold, F. X. (2007). Roughing it up: Including jump components in the measurement, modeling, and forecasting of return volatility. The Review of Economics and Statistics, 89(4):701–720.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.

Barndorff-Nielsen, O. E. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(2):253–280.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6):1481–1536.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2011). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics, 162(2):149–169.

Barndorff-Nielsen, O. E. and Shephard, N. (2006). Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics, 4(1):1–30.

Bibinger, M., Hautsch, N., Malec, P., and Reiß, M. (2014). Estimating the quadratic covariation matrix from noisy observations: Local method of moments and efficiency. The Annals of Statistics, 42(4):1312–1346.

Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185.

Chen, D., Mykland, P. A., and Zhang, L. (2020). The five trolls under the bridge: Principal component analysis with asynchronous and noisy high frequency data. Journal of the American Statistical Association, 115(532):1960–1977.

Christensen, K., Kinnebrock, S., and Podolskij, M. (2010). Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics, 159(1):116–133.

Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223–236.

Corsi, F., Pirino, D., and Renò, R. (2010). Threshold bipower variation and the impact of jumps on volatility forecasting. Journal of Econometrics, 159(2):276–288.

Cox, J. C., Ingersoll Jr, J. E., and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53:385–407.

Davies, R. and Tauchen, G. (2018). Data-driven jump detection thresholds for application in jump regressions. Econometrics, 6(2):16.

Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. The Annals of Statistics, 44(6):2695–2725.

Fan, J., Furger, A., and Xiu, D. (2016a). Incorporating global industrial classification standard into portfolio allocation: A simple factor-based large covariance matrix estimator with high-frequency data. Journal of Business & Economic Statistics, 34(4):489–503.

Fan, J. and Kim, D. (2018). Robust high-dimensional volatility matrix estimation for high-frequency factor model. Journal of the American Statistical Association, 113(523):1268–1283.

Fan, J., Li, Y., and Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection. Journal of the American Statistical Association, 107(497):412–428.

Fan, J., Liao, Y., and Liu, H. (2016b). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1–C32.

Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680.

Fan, J., Wang, W., and Zhong, Y. (2018). An $\ell_\infty$ eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research, 18(207):1–42.

Fan, J., Wang, W., and Zhu, Z. (2021). A shrinkage principle for heavy-tailed data: High-dimensional robust low-rank matrix recovery. Annals of Statistics, to appear.

Fan, J. and Wang, Y. (2007). Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association, 102(480):1349–1362.

Hayashi, T. and Yoshida, N. (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli, 11(2):359–379.

Hayashi, T. and Yoshida, N. (2011). Nonsynchronous covariation process and limit theorems. Stochastic Processes and their Applications, 121(10):2416–2454.

Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5):1163–1174.

Huang, X. and Tauchen, G. (2005). The relative contribution of jumps to total price variance. Journal of Financial Econometrics, 3(4):456–499.

Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications, 119(7):2249–2276.

Jacod, J. and Protter, P. (2012). Discretization of Processes. Springer.

Kim, D. and Fan, J. (2019). Factor GARCH-Itô models for high-frequency data with application to large volatility matrix prediction. Journal of Econometrics, 208(2):395–417.

Kim, D., Liu, Y., and Wang, Y. (2018). Large volatility matrix estimation with factor-based diffusion model for high-frequency financial data. Bernoulli, 24(4B):3657–3682.

Kim, D., Wang, Y., and Zou, J. (2016). Asymptotic theory for large volatility matrix estimation based on high-frequency financial data. Stochastic Processes and their Applications, 126:3527–3577.

Kong, X.-B. (2018). On the systematic and idiosyncratic volatility with large panel high-frequency data. Annals of Statistics, 46(3):1077–1108.

Malliavin, P. and Mancino, M. E. (2009). A Fourier transform method for nonparametric estimation of multivariate volatility. The Annals of Statistics, 37(4):1983–2010.

Mancini, C. (2004). Estimation of the characteristics of the jumps of a general Poisson-diffusion model. Scandinavian Actuarial Journal, 2004(1):42–52.

Mao, G. and Zhang, Z. (2018). Stochastic tail index model for high frequency financial data with Bayesian analysis. Journal of Econometrics, 205(2):470–487.

Massacci, D. (2017). Tail risk dynamics in stock returns: Links to the macroeconomy and global markets connectedness. Management Science, 63(9):3072–3089.

Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903.

Park, S., Hong, S. Y., and Linton, O. (2016). Estimating the quadratic covariation matrix for asynchronously observed high frequency stock returns corrupted by additive measurement error. Journal of Econometrics, 191(2):325–347.

Song, X., Kim, D., Yuan, H., Cui, X., Lu, Z., Zhou, Y., and Wang, Y. (2020). Volatility analysis with realized GARCH-Itô models. Journal of Econometrics.

Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179.

Sun, Q., Zhou, W.-X., and Fan, J. (2020). Adaptive Huber regression. Journal of the American Statistical Association, 115(529):254–265.

Wang, Y. (2002). Asymptotic nonequivalence of GARCH models and diffusions. The Annals of Statistics, 30(3):754–783.

Wang, Y. and Zou, J. (2010). Vast volatility matrix estimation for high-frequency financial data. The Annals of Statistics, 38(2):943–978.

Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high frequency data. Journal of Econometrics, 159(1):235–250.

Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli, 12(6):1019–1043.

Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise. Journal of Econometrics, 160(1):33–47.

Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472):1394–1411.

Zhang, X., Kim, D., and Wang, Y. (2016). Jump variation estimation with noisy high frequency financial data via wavelets. Econometrics, 4(3):34.
Appendix
A.1 Proof of Theorem 1
For simplicity, we denote $K_n$ by $K$. Let
$$\bar{X}_i(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\{X^c_i(\tau_{i,k+l+1}) - X^c_i(\tau_{i,k+l})\}$$
$$= \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\int_{\tau_{i,k+l}}^{\tau_{i,k+l+1}} \mu_i(t)\,dt + \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\int_{\tau_{i,k+l}}^{\tau_{i,k+l+1}} \mathbf{e}_i^\top \sigma^\top(t)\,d\mathbf{W}_t$$
$$= \bar{X}^{\mu}_i(\tau_k) + \bar{X}^{\sigma}_i(\tau_k),$$
where $\mathbf{e}_i$ is the $p$-dimensional vector whose $i$-th coordinate is one and the others are zero, and
$$\bar{\epsilon}_i(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} g\Big(\frac{l}{K}\Big)\{\epsilon_i(\tau_{i,k+l+1}) - \epsilon_i(\tau_{i,k+l})\} = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} \Big\{ g\Big(\frac{l}{K}\Big) - g\Big(\frac{l+1}{K}\Big) \Big\}\epsilon_i(\tau_{i,k+l+1}).$$
Then we have
$$Q^c_{ij}(\tau_k) = \big[\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)\big]\big[\bar{X}_j(\tau_k) + \bar{\epsilon}_j(\tau_k)\big]. \qquad (\mathrm{A.1})$$

Proposition 2.
Under models (2.1) and (2.3), (a) and (b) hold for all $1 \le i,j \le p$ and sufficiently large $n$.
(a) Under Assumption 1, there exist positive constants $U_{ij}(\tau_k)$, whose values are free of $n$ and $p$, such that
$$\mathrm{E}\big\{|Q^c_{ij}(\tau_k)|^{\alpha_{ij}} \,\big|\, \mathcal{F}_{\tau_k}\big\} \le U_{ij}(\tau_k) \quad a.s.$$
for all $1 \le k \le n - K_n$.
(b) Under Assumption 1(a)–(b) and Assumption 2, there exist positive constants $U_{\rho,ij}(\tau_k)$, whose values are free of $n$ and $p$, such that
$$\mathrm{E}\big\{|Q^c_{\rho,ij}(\tau_k)|^{\alpha_{ij}} \,\big|\, \mathcal{F}_{\tau_{k-1}}\big\} \le U_{\rho,ij}(\tau_k) \quad a.s.$$
for all $1 \le k \le n - 1$.

Proof of Proposition 2.
First, consider (a). By Hölder's inequality,
$$\mathrm{E}\big[|Q^c_{ij}(\tau_k)|^{2\alpha_i\alpha_j/(\alpha_i+\alpha_j)} \,\big|\, \mathcal{F}_{\tau_k}\big] \le \Big\{\mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i} \,\big|\, \mathcal{F}_{\tau_k}\big]\Big\}^{\alpha_j/(\alpha_i+\alpha_j)} \Big\{\mathrm{E}\big[|Q^c_{jj}(\tau_k)|^{\alpha_j} \,\big|\, \mathcal{F}_{\tau_k}\big]\Big\}^{\alpha_i/(\alpha_i+\alpha_j)} \quad a.s.$$
Therefore, it suffices to show that
$$\mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i} \,\big|\, \mathcal{F}_{\tau_k}\big] \le C \quad a.s.$$
By Assumption 1(c) and (A.1), we have
$$\nu_Q \ge \mathrm{E}\big[|Q^c_{ii}(\tau_k)|^{\alpha_i}\big] = \mathrm{E}\big[|\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)|^{2\alpha_i}\big] = \mathrm{E}\Big\{\mathrm{E}\big[|\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k)|^{2\alpha_i} \,\big|\, \bar{\epsilon}_i(\tau_k)\big]\Big\}$$
$$\ge \mathrm{E}\Big\{\big|\mathrm{E}\big[\bar{X}_i(\tau_k) + \bar{\epsilon}_i(\tau_k) \,\big|\, \bar{\epsilon}_i(\tau_k)\big]\big|^{2\alpha_i}\Big\} \ge \mathrm{E}\Big\{\big|\, |\bar{\epsilon}_i(\tau_k)| - |\mathrm{E}[\bar{X}_i(\tau_k)]| \,\big|^{2\alpha_i}\Big\}.$$
Also, by the fact that $|\bar{X}^{\mu}_i(\tau_k)| \le Cn^{-1/4}$ a.s., we have
$$|\mathrm{E}[\bar{X}_i(\tau_k)]| = |\mathrm{E}[\bar{X}^{\mu}_i(\tau_k)]| \le Cn^{-1/4} \quad a.s.$$
Hence, using Hölder's inequality, we can show
$$\mathrm{E}\big[|\bar{\epsilon}_i(\tau_k)|^{2\alpha_i}\big] \le C.$$
g ( · ), we haveE (cid:104) | Q cii ( τ k ) | α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) = E (cid:104)(cid:12)(cid:12) ¯ X µi ( τ k ) + ¯ X σi ( τ k ) + ¯ (cid:15) i ( τ k ) (cid:12)(cid:12) α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) ≤ C + C E (cid:104)(cid:12)(cid:12) ¯ X σi ( τ k ) (cid:12)(cid:12) α i (cid:12)(cid:12)(cid:12) F τ k (cid:105) ≤ C + C (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n − KφK K − (cid:88) l =0 g (cid:18) lk (cid:19) ν γ n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) α i ≤ C a.s. , (A.2)where the first inequality is due to the H¨older’s inequality and the second inequality is fromthe Burkholder-Davis-Gundy inequality. Then (a) is proved, and we can show (b) similar tothe proof of (a). (cid:4) Proof of Theorem 1.
Without loss of generality, we assume that $n = K(L+1)$ for some $L \in \mathbb{N}$. We have
$$
\left| \widehat{T}_{ij,\theta}^{\alpha} - T_{ij} \right|
\leq \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - T_{ij} \right|
+ \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}(\tau_k) \right\} - \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} \right] \right|
= (I) + (II). \tag{A.3}
$$
First, consider $(I)$. Let $A_{ij}(\tau_k) = \mathrm{E}\left[ Q_{ij}^{c}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]$. Then for any $s > 0$, we obtain that
$$
\begin{aligned}
&\Pr\left\{ \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) \geq \frac{K}{n-K}\, s \right\} \\
&\leq \exp\{-\theta_{ij} s\}\, \mathrm{E}\left[ \prod_{m=0}^{K-1} \prod_{k=0}^{L-1} \exp\left\{ \frac{1}{K}\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right\} \right] \\
&\leq \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \left( \mathrm{E}\left[ \prod_{k=0}^{L-1} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right] \right)^{1/K} \\
&= \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \Bigg( \mathrm{E}\Bigg[ \prod_{k=0}^{L-2} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \\
&\qquad\qquad \times \mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}\!\left(\tau_{K(L-1)+m+1}\right) \right\} - \theta_{ij} A_{ij}\!\left(\tau_{K(L-1)+m+1}\right) \right] \,\Big|\, \mathcal{F}_{\tau_{K(L-1)+m+1}} \right] \Bigg] \Bigg)^{1/K} \\
&\leq \exp\{-\theta_{ij} s\} \prod_{m=0}^{K-1} \left( \mathrm{E}\left[ \prod_{k=0}^{L-2} \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_{Kk+m+1}) \right\} - \theta_{ij} A_{ij}(\tau_{Kk+m+1}) \right] \right] \right)^{1/K}
\times \exp\left\{ \frac{1}{K} \sum_{m=0}^{K-1} c_{\alpha_{ij}} U_{ij}\!\left(\tau_{K(L-1)+m+1}\right) \theta_{ij}^{\alpha_{ij}} \right\} \\
&\leq \exp\left\{ -\theta_{ij} s + \frac{n-K}{K}\, c_{\alpha_{ij}} S_{ij}\, \theta_{ij}^{\alpha_{ij}} \right\}, \tag{A.4}
\end{aligned}
$$
where the first and second inequalities are due to the Markov inequality and Hölder's inequality, respectively, and the third and fourth inequalities can be obtained by (A.5).
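Both this bound and (A.5) rest on the defining property of the truncation function $\psi_{\alpha}$ of Minsker (2018), namely $-\log(1-x+c_{\alpha}|x|^{\alpha}) \leq \psi_{\alpha}(x) \leq \log(1+x+c_{\alpha}|x|^{\alpha})$. As a numerical sanity check only (not part of the proof), the sketch below verifies this sandwich for the boundary choice of $\psi_{\alpha}$; the constants $\alpha = 1.5$ and $c_{\alpha} = 1$ are illustrative assumptions, not the paper's choices.

```python
import numpy as np

# Illustrative check of the sandwich bounds that define Minsker's (2018)
# truncation function psi_alpha:
#   -log(1 - x + c|x|^a) <= psi_a(x) <= log(1 + x + c|x|^a).
# We take the boundary choice (upper envelope for x >= 0, lower for x < 0),
# with ASSUMED constants a = 1.5 and c = 1.0 (illustrative only).

a, c = 1.5, 1.0

def psi(x):
    # odd boundary choice of psi_a; bounded-growth influence function
    return np.where(x >= 0,
                    np.log(1.0 + x + c * np.abs(x) ** a),
                    -np.log(1.0 - x + c * np.abs(x) ** a))

xs = np.linspace(-50.0, 50.0, 100001)
lower = -np.log(1.0 - xs + c * np.abs(xs) ** a)
upper = np.log(1.0 + xs + c * np.abs(xs) ** a)

# The sandwich is non-empty iff (1 + c|x|^a)^2 - x^2 >= 1, which holds here
# because 2c|x|^a + c^2|x|^{2a} >= x^2 for a in (1, 2] and c = 1.
assert np.all((1.0 + c * np.abs(xs) ** a) ** 2 - xs ** 2 >= 1.0 - 1e-9)
assert np.all(lower <= psi(xs) + 1e-12)
assert np.all(psi(xs) <= upper + 1e-12)
```

The sandwich is exactly what turns the exponential moment of $\psi_{\alpha_{ij}}\{\theta_{ij} Q^c_{ij}(\tau_k)\}$ into the polynomial-in-$\theta_{ij}$ bound used in (A.4).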
Since we can get $-\log(1 - x + c_{\alpha_{ij}} |x|^{\alpha_{ij}}) \leq \psi_{\alpha_{ij}}(x) \leq \log(1 + x + c_{\alpha_{ij}} |x|^{\alpha_{ij}})$ from Lemma A.2 (Minsker, 2018), we have
$$
\begin{aligned}
\mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \,\Big|\, \mathcal{F}_{\tau_k} \right]
&\leq \mathrm{E}\left[ \exp\left[ \log\left\{ 1 + \theta_{ij} Q_{ij}^{c}(\tau_k) + c_{\alpha_{ij}} \left| \theta_{ij} Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \,\Big|\, \mathcal{F}_{\tau_k} \right] \\
&= \exp\left[ \log\left\{ 1 + \theta_{ij} A_{ij}(\tau_k) + c_{\alpha_{ij}} \theta_{ij}^{\alpha_{ij}} \mathrm{E}\left[ \left| Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \big| \mathcal{F}_{\tau_k} \right] \right\} - \theta_{ij} A_{ij}(\tau_k) \right] \\
&\leq \exp\left[ c_{\alpha_{ij}} \theta_{ij}^{\alpha_{ij}} \mathrm{E}\left[ \left| Q_{ij}^{c}(\tau_k) \right|^{\alpha_{ij}} \big| \mathcal{F}_{\tau_k} \right] \right]
\leq \exp\left[ c_{\alpha_{ij}} U_{ij}(\tau_k)\, \theta_{ij}^{\alpha_{ij}} \right] \quad \text{a.s.}, \tag{A.5}
\end{aligned}
$$
where the second inequality is due to the fact that $\log(1+x) \leq x$ for any $x > -1$, and the last inequality is from Proposition 2. Choose
$$
\theta_{ij} = \left( \frac{K \log y^{-1}}{(\alpha_{ij}-1)\, c_{\alpha_{ij}} S_{ij} (n-K)} \right)^{1/\alpha_{ij}}, \qquad
s = \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{ij} (n-K) (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1} K} \right)^{1/\alpha_{ij}},
$$
where $c \log n \leq \log y^{-1} \leq \sqrt{n}$. Then we have
$$
\Pr\left[ \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k)
\geq \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{ij}\, K^{\alpha_{ij}-1} (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1} (n-K)^{\alpha_{ij}-1}} \right)^{1/\alpha_{ij}} \right] \leq y.
$$
Applying the same argument to the other tail, we obtain
$$
\Pr\left[ \left| \frac{1}{(n-K)\theta_{ij}} \sum_{k=1}^{n-K} \psi_{\alpha_{ij}}\!\left\{ \theta_{ij} Q_{ij}^{c}(\tau_k) \right\} - \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) \right| \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \geq 1 - 2y. \tag{A.6}
$$
Now, we need to establish the relationship between $\sum_{k=1}^{n-K} A_{ij}(\tau_k)/(n-K)$ and $T_{ij}$. Since $X$ and $\epsilon$ are independent, we have
$$
A_{ij}(\tau_k) = \mathrm{E}\left[ \bar{X}_i^{\mu}(\tau_k) \bar{X}_j^{\mu}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{X}_i^{\mu}(\tau_k) \bar{X}_j^{\sigma}(\tau_k) + \bar{X}_i^{\sigma}(\tau_k) \bar{X}_j^{\mu}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{X}_i^{\sigma}(\tau_k) \bar{X}_j^{\sigma}(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
+ \mathrm{E}\left[ \bar{\epsilon}_i(\tau_k) \bar{\epsilon}_j(\tau_k) \,\big|\, \mathcal{F}_{\tau_k} \right]
= (a) + (b) + (c) + (d).
$$
By the fact that $|\bar{X}_i^{\mu}(\tau_k)| \leq C n^{-1/2}$ a.s., we have
$$
|(a)| \leq C n^{-1} \quad \text{a.s.} \tag{A.7}
$$
Using the Burkholder–Davis–Gundy inequality, we can show
$$
|(b)| \leq C n^{-1/2} \left( \sqrt{\mathrm{E}\left[ \left\{ \bar{X}_i^{\sigma}(\tau_k) \right\}^2 \,\Big|\, \mathcal{F}_{\tau_k} \right]} + \sqrt{\mathrm{E}\left[ \left\{ \bar{X}_j^{\sigma}(\tau_k) \right\}^2 \,\Big|\, \mathcal{F}_{\tau_k} \right]} \right) \leq C n^{-1/2} \quad \text{a.s.} \tag{A.8}
$$
Consider $(c)$. Let
$$
\bar{X}_i^{\sigma}(\tau_k) = \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} H_{i,k,l},
$$
where
$$
H_{i,k,l} = g\!\left(\frac{l}{K}\right) \int_{\tau_{k+l}}^{\tau_{i,k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t + g\!\left(\frac{l+1}{K}\right) \int_{\tau_{i,k+l+1}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t
= g\!\left(\frac{l}{K}\right) \int_{\tau_{k+l}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t + \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\} \int_{\tau_{i,k+l+1}}^{\tau_{k+l+1}} \mathbf{e}_i^{\top} \sigma^{\top}(t)\, dW_t.
$$
Then we have
$$
(c) = \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \mathrm{E}\left[ H_{i,k,l} H_{j,k,l} \,\big|\, \mathcal{F}_{\tau_k} \right] \quad \text{a.s.}
$$
By Itô's isometry and the boundedness of $\gamma_{ij}(t)$, we can get, for all $0 \leq l \leq K-1$,
$$
\left| \mathrm{E}\left[ H_{i,k,l} H_{j,k,l} \,\big|\, \mathcal{F}_{\tau_k} \right] - \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} \right|
\leq C n^{-1} \left[ \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\}^2 + \left| g\!\left(\frac{l}{K}\right) \left\{ g\!\left(\frac{l+1}{K}\right) - g\!\left(\frac{l}{K}\right) \right\} \right| \right] \leq C n^{-3/2} \quad \text{a.s.},
$$
where the last inequality is by the piecewise Lipschitz derivative condition for $g(\cdot)$. Thus, we have
$$
\left| (c) - \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} \right| \leq C n^{-1/2} \quad \text{a.s.} \tag{A.9}
$$
Finally, for $(d)$, we have
$$
(d) = \frac{n-K}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\}^2 \mathbf{1}\!\left( \tau_{i,k+l+1} = \tau_{j,k+l+1} \right) \eta_{ij} \quad \text{a.s.} \tag{A.10}
$$
Combining (A.7)–(A.10), we have
$$
\left| \frac{1}{n-K} \sum_{k=1}^{n-K} A_{ij}(\tau_k) - A^{*}_{ij} \right| \leq C n^{-1/2} \quad \text{a.s.}, \tag{A.11}
$$
where
$$
A^{*}_{ij} = \frac{1}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \sum_{k=1}^{n-K} \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} + \rho_{ij}.
$$
Now, we investigate the relationship between $A^{*}_{ij}$ and $T_{ij}$. Note that $\gamma_{ij}(t)$ is bounded and $\sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right]$ is the sum of $l+1$ martingales. Hence, using the Azuma–Hoeffding inequality for each martingale, we can show, for all $0 \leq l \leq K-1$,
$$
\Pr\left( \left| \sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right] \right| \geq C \left( n^{-1} \log y^{-1} \right)^{1/2} \right) \leq 2y.
$$
Also, simple algebraic manipulations show
$$
\left| A^{*}_{ij} - T_{ij} \right|
\leq \left| \frac{1}{\phi K} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) \right\}^2 \sum_{k=1}^{n-K} \left[ \mathrm{E}\left\{ \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \,\Big|\, \mathcal{F}_{\tau_k} \right\} - \int_{\tau_{k+l}}^{\tau_{k+l+1}} \gamma_{ij}(t)\, dt \right] \right| + \frac{2K}{n} \nu_{\gamma}.
$$
Therefore, we have
$$
\Pr\left\{ \left| A^{*}_{ij} - T_{ij} \right| \leq C \left( n^{-1} \log y^{-1} \right)^{1/2} + \frac{2K}{n} \nu_{\gamma} \right\} \geq 1 - 2Ky. \tag{A.12}
$$
Combining (A.6), (A.11), and (A.12), we have
$$
\Pr\left\{ (I) \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2(K+1)y. \tag{A.13}
$$
Consider $(II)$. Note that by the boundedness of the intensity, we have
$$
\Pr\left\{ \Lambda_i(1) \leq C \log y^{-1} \right\} \geq 1 - y.
$$
Hence, using the fact that $\psi_{\alpha}(x)$ is a bounded function, we have
$$
\Pr\left\{ (II) \leq C\, \frac{K \log y^{-1}}{n \theta_{ij}} \right\} \geq 1 - 2y,
$$
which implies
$$
\Pr\left\{ (II) \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2y. \tag{A.14}
$$
Collecting (A.3), (A.13), and (A.14), we obtain that, with probability at least $1 - 3Ky$,
$$
\left| \widehat{T}_{ij,\theta}^{\alpha} - T_{ij} \right| \leq C \left( n^{-1/2} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
and then substituting $\delta/(3K)$ for $y$ completes the proof. $\Box$

A.2 Proof of Theorem 2
Proof of Theorem 2.
Let $n_i = n_j = n$ and $t_{i,k} = t_{j,k} = \tau_{i,k} = \tau_{j,k} = \tau_k = k/n$ for $1 \leq k \leq n$. To derive a lower bound, we construct two quadratic pre-averaged random variables $Q_{1,ij}(\tau_k)$ and $Q_{2,ij}(\tau_k)$ as follows. Let $dX_1(t) = dX_2(t) = \sigma^{\top}(t)\, dW_t$ for any appropriate $\sigma(t)$, which implies $\bar{X}_{1,h}(\tau_k) = \bar{X}_{2,h}(\tau_k)$ for $1 \leq h \leq p$ and $1 \leq k \leq n-K$. Also, let $2\epsilon_{1,h}(t_{h,k}) = \epsilon_{2,h}(t_{h,k})$ for $1 \leq h \leq p$ and $0 \leq k \leq n$, where the distributions of $\epsilon_{1,h}(t_{h,k})$, $1 \leq h \leq p$, are defined as follows:
$$
\epsilon_{1,h}(t_{h,k}) = \begin{cases}
K^{(\alpha_h+1)/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d, \\
0 & \text{with probability } 1 - 2d, \\
-K^{(\alpha_h+1)/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d,
\end{cases}
$$
where $d = C_K \log(1/\delta)/K^2$. For each $0 \leq k \leq n$, let $\Pr\{\epsilon_{1,h}(t_{h,k}) > 0 \text{ for all } 1 \leq h \leq p\} = \Pr\{\epsilon_{1,h}(t_{h,k}) < 0 \text{ for all } 1 \leq h \leq p\} = d$ and $\Pr\{\epsilon_{1,h}(t_{h,k}) = 0 \text{ for all } 1 \leq h \leq p\} = 1 - 2d$. Then, using the fact that $1 - x \geq \exp(-x/(1-x))$ for any $0 \leq x \leq 1/2$ and $\log(1/\delta) \leq \sqrt{n}$, we can show, for an appropriate choice of $C_K$,
$$
\prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} = (1 - 2d)^{n} \geq 2\delta. \tag{A.15}
$$
Here, we need to check whether the construction satisfies Assumption 1(c). It suffices to show
$$
\mathrm{E}\left[ \left| Q_{v,ii}^{c}(\tau_1) \right|^{\alpha_i} \right] \leq C, \quad v = 1, 2. \tag{A.16}
$$
Note that, for all $1 \leq k \leq n$,
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_k)|^{2\alpha_i} \right] = C_K K^{\alpha_i - 1} \quad \text{and} \quad \mathrm{E}\left[ \epsilon_{1,i}^2(\tau_k) \right] = C_K \left( K^{-1} \log \frac{1}{\delta} \right)^{(\alpha_i-1)/\alpha_i}.
$$
Hence, by the Lipschitz continuity of $g(\cdot)$, we have
$$
\mathrm{E}\left[ |\bar{\epsilon}_{1,i}(\tau_1)|^{2\alpha_i} \right]
= \left( \frac{n-K}{\phi K} \right)^{\alpha_i} \mathrm{E}\left[ \left| \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\} \epsilon_{1,i}(\tau_{l+2}) \right|^{2\alpha_i} \right]
\leq C K^{-\alpha_i} \left\{ \sum_{l=0}^{K-1} \mathrm{E}\left[ |\epsilon_{1,i}(\tau_{l+2})|^{2\alpha_i} \right] + \left( \sum_{l=0}^{K-1} \mathrm{E}\left[ \epsilon_{1,i}^2(\tau_{l+2}) \right] \right)^{\alpha_i} \right\}
\leq C + C K^{-\alpha_i} \left\{ K \left( K^{-1} \log \frac{1}{\delta} \right)^{(\alpha_i-1)/\alpha_i} \right\}^{\alpha_i} \leq C, \tag{A.17}
$$
where the first inequality is due to Rosenthal's inequality. Then, similar to the proof of Proposition 2, we can show (A.16). Now, since
$$
\left| T_{1,ij} - T_{2,ij} \right| = \frac{3n\zeta}{\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right],
$$
we have, for any $\widehat{T}_{ij}(Q_{ij}(\tau_k), \delta)$,
$$
\begin{aligned}
&\max\left[ \Pr\left\{ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right\},
\Pr\left\{ \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right\} \right] \\
&\geq \frac{1}{2} \Pr\left[ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right]
\text{ or } \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq \frac{3n\zeta}{2\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] \right] \\
&\geq \frac{1}{2} \Pr\left\{ \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) = \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) \right\}
\geq \frac{1}{2} \prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} \geq \delta, \tag{A.18}
\end{aligned}
$$
where the last inequality is from (A.15). Combining (A.18) and the fact that
$$
\mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] = C \left( K^{-1} \log(1/\delta) \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
we have, for sufficiently large $n$,
$$
\max\left[ \Pr\left\{ \left| \widehat{T}_{ij}(Q_{1,ij}(\tau_k), \delta) - T_{1,ij} \right| \geq C \left( n^{-1/2} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\},
\Pr\left\{ \left| \widehat{T}_{ij}(Q_{2,ij}(\tau_k), \delta) - T_{2,ij} \right| \geq C \left( n^{-1/2} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \right] \geq \delta, \tag{A.19}
$$
which completes the proof. $\Box$

A.3 Proof of Theorem 3
Proposition 3.
Under Assumption 1(a)–(b) and Assumption 2, Assumption 1(c) is satisfied.
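Proposition 3 transfers the moment bound on the raw noise to the pre-averaged quantities. Before the proof, the mechanics can be illustrated with a small self-contained simulation sketch; the concrete choices below (weight function $g(x) = \min(x, 1-x)$, window $K = 100$, $t_5$-distributed noise, moment order $2\alpha_i = 2.8$) are illustrative assumptions, not the paper's specifications.

```python
import numpy as np

# Sketch of the pre-averaging mechanics behind Proposition 3 (illustrative
# choices only: g(x) = min(x, 1 - x), K = 100, i.i.d. noise from a
# t-distribution with 5 degrees of freedom, so E|eps|^{2.8} < infinity).

rng = np.random.default_rng(0)
K = 100
g = lambda x: np.minimum(x, 1.0 - x)

l = np.arange(K)
w = g(l / K) - g((l + 1) / K)   # difference weights applied to the noise

# Deterministic facts used implicitly in the proofs:
# (i) the difference weights telescope: sum_l w_l = g(0) - g(1) = 0;
assert abs(w.sum()) < 1e-12
# (ii) |w_l| <= C/K by the Lipschitz continuity of g (here C = 1);
assert np.max(np.abs(w)) <= 1.0 / K + 1e-12
# (iii) (1/K) sum_l g(l/K)^2 approximates phi = int_0^1 g^2 = 1/12.
phi_hat = np.mean(g(l / K) ** 2)
assert abs(phi_hat - 1.0 / 12.0) < 1e-2

# Monte Carlo: the scaled pre-averaged noise sqrt(K) * sum_l w_l eps_l
# (sqrt(K) mimics the (n-K)/(phi K) normalization with K ~ sqrt(n))
# keeps a bounded empirical 2*alpha_i moment for alpha_i = 1.4.
eps = rng.standard_t(df=5, size=(2000, K))
eps_bar = np.sqrt(K) * (eps * w).sum(axis=1)
m = np.mean(np.abs(eps_bar) ** 2.8)
assert np.isfinite(m)
```

Fact (ii) is what produces the $K^{-\alpha_i}$ factor in the proof below, after which Rosenthal's inequality controls the sum of the raw-noise moments.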
Proof of Proposition 3.
Similar to the proof of Proposition 2, we can show
$$
\mathrm{E}\left[ |\epsilon_i(\tau_{i,k})|^{2\alpha_i} \right] \leq C.
$$
Then we have
$$
\begin{aligned}
\mathrm{E}\left[ \left| Q_{ii}^{c}(\tau_k) \right|^{\alpha_i} \right]
&= \mathrm{E}\left[ \left| \bar{X}_i^{\mu}(\tau_k) + \bar{X}_i^{\sigma}(\tau_k) + \bar{\epsilon}_i(\tau_k) \right|^{2\alpha_i} \right]
\leq C + C\, \mathrm{E}\left[ \left| \bar{X}_i^{\sigma}(\tau_k) \right|^{2\alpha_i} \right] + C\, \mathrm{E}\left[ \left| \bar{\epsilon}_i(\tau_k) \right|^{2\alpha_i} \right] \\
&\leq C + C \left| \frac{n-K}{\phi K} \sum_{l=0}^{K-1} g^2\!\left(\frac{l}{K}\right) \frac{\nu_{\gamma}}{n} \right|^{\alpha_i}
+ C\, \mathrm{E}\left[ \left| \sqrt{\frac{n-K}{\phi K}} \sum_{l=0}^{K-1} \left\{ g\!\left(\frac{l}{K}\right) - g\!\left(\frac{l+1}{K}\right) \right\} \epsilon_i(\tau_{i,k+l+1}) \right|^{2\alpha_i} \right] \\
&\leq C + C K^{-\alpha_i} \left\{ \sum_{l=0}^{K-1} \mathrm{E}\left[ |\epsilon_i(\tau_{i,k+l+1})|^{2\alpha_i} \right] + \left( \sum_{l=0}^{K-1} \mathrm{E}\left[ \epsilon_i^2(\tau_{i,k+l+1}) \right] \right)^{\alpha_i} \right\}
\leq C \quad \text{a.s.},
\end{aligned}
$$
where the last line uses the Lipschitz continuity of $g(\cdot)$ and Rosenthal's inequality. $\Box$

Proof of Theorem 3.
Without loss of generality, we assume that $n = 2L+1$ for some $L \in \mathbb{N}$. We have
$$
\left| \widehat{\rho}_{ij,\theta}^{\alpha} - \rho_{ij} \right|
\leq \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} - \rho_{ij} \right|
+ \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}(\tau_k) \right\} - \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} \right] \right|
= (I) + (II). \tag{A.20}
$$
First, consider $(I)$. Let
$$
\frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} = \widehat{\rho}^{\alpha,c}_{1,ij,\theta} + \widehat{\rho}^{\alpha,c}_{2,ij,\theta},
$$
where
$$
\widehat{\rho}^{\alpha,c}_{1,ij,\theta} = \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{L} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\}, \qquad
\widehat{\rho}^{\alpha,c}_{2,ij,\theta} = \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{L} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k}) \right\}.
$$
Also, define $A_{\rho,ij}(\tau_k) = \mathrm{E}\left[ Q_{\rho,ij}^{c}(\tau_k) \,\big|\, \mathcal{F}_{\tau_{k-1}} \right]$. Then we can show, for any $s > 0$,
$$
\begin{aligned}
&\Pr\left\{ \widehat{\rho}^{\alpha,c}_{1,ij,\theta} - \frac{\zeta}{\phi K} \sum_{k=1}^{L} A_{\rho,ij}(\tau_{2k-1}) \geq \frac{\zeta s}{\phi K} \right\} \\
&\leq \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\left[ \exp\left\{ \sum_{k=1}^{L} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \right] \\
&= \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\Bigg[ \exp\left\{ \sum_{k=1}^{L-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \\
&\qquad\qquad \times \mathrm{E}\left[ \exp\left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2L-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2L-1}) \right] \,\Big|\, \mathcal{F}_{\tau_{2L-2}} \right] \Bigg] \\
&\leq \exp\{-\theta_{\rho,ij} s\}\, \mathrm{E}\left[ \exp\left\{ \sum_{k=1}^{L-1} \left[ \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_{2k-1}) \right\} - \theta_{\rho,ij} A_{\rho,ij}(\tau_{2k-1}) \right] \right\} \right]
\times \exp\left\{ c_{\alpha_{ij}} U_{\rho,ij}(\tau_{2L-1})\, \theta_{\rho,ij}^{\alpha_{ij}} \right\} \\
&\leq \exp\left\{ -\theta_{\rho,ij} s + \frac{n-1}{2}\, c_{\alpha_{ij}} S_{\rho,ij}\, \theta_{\rho,ij}^{\alpha_{ij}} \right\}.
\end{aligned}
$$
Choose
$$
\theta_{\rho,ij} = \left( \frac{\log y^{-1}}{(\alpha_{ij}-1)\, c_{\alpha_{ij}} S_{\rho,ij} (n-1)} \right)^{1/\alpha_{ij}}, \qquad
s = \left( \frac{\alpha_{ij}^{\alpha_{ij}}\, c_{\alpha_{ij}} S_{\rho,ij} (n-1) (\log y^{-1})^{\alpha_{ij}-1}}{(\alpha_{ij}-1)^{\alpha_{ij}-1}} \right)^{1/\alpha_{ij}},
$$
where $c \log n \leq \log y^{-1} \leq \sqrt{n}$. Then we have
$$
\Pr\left\{ \widehat{\rho}^{\alpha,c}_{1,ij,\theta} - \frac{\zeta}{\phi K} \sum_{k=1}^{L} A_{\rho,ij}(\tau_{2k-1}) \geq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \leq y.
$$
Applying the same argument to $\widehat{\rho}^{\alpha,c}_{2,ij,\theta}$ and to the other tails, we can show
$$
\Pr\left[ \left| \frac{\zeta}{\phi K \theta_{\rho,ij}} \sum_{k=1}^{n-1} \psi_{\alpha_{ij}}\!\left\{ \theta_{\rho,ij} Q_{\rho,ij}^{c}(\tau_k) \right\} - \frac{\zeta}{\phi K} \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k) \right| \geq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \leq 4y. \tag{A.21}
$$
Now, we need to establish the relationship between $\zeta \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k)/(\phi K)$ and $\rho_{ij}$. Since $X$ and $\epsilon$ are independent, similar to the proof of Theorem 1, we can show
$$
\left| \rho_{ij} - \frac{\zeta}{\phi K} \sum_{k=1}^{n-1} A_{\rho,ij}(\tau_k) \right| \leq C n^{-1/2} \quad \text{a.s.} \tag{A.22}
$$
Combining (A.21) and (A.22), we have
$$
\Pr\left\{ (I) \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 4y. \tag{A.23}
$$
Consider $(II)$. Note that
$$
\Pr\left\{ \Lambda_i(1) \leq C \log y^{-1} \right\} \geq 1 - y.
$$
Hence, using the fact that $\psi_{\alpha}(x)$ is a bounded function, we have
$$
\Pr\left\{ (II) \leq C\, \frac{\log y^{-1}}{n \theta_{\rho,ij}} \right\} \geq 1 - 2y,
$$
which implies
$$
\Pr\left\{ (II) \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \geq 1 - 2y. \tag{A.24}
$$
Collecting (A.20), (A.23), and (A.24), we obtain that, with probability at least $1 - 6y$,
$$
\left| \widehat{\rho}_{ij,\theta}^{\alpha} - \rho_{ij} \right| \leq C \left( n^{-1} \log y^{-1} \right)^{(\alpha_{ij}-1)/\alpha_{ij}},
$$
and then substituting $\delta/6$ for $y$ completes the proof of (4.4). Also, (4.5) is proved by (4.3) and Theorem 1. $\Box$

A.4 Proof of Theorem 4
Proof of Theorem 4.
Let $n_i = n_j = n$ and $t_{i,k} = t_{j,k} = \tau_{i,k} = \tau_{j,k} = \tau_k = k/n$ for $1 \leq k \leq n$. Similar to the proof of Theorem 2, we construct two quadratic log-return random variables $Q_{1,\rho,ij}(\tau_k)$ and $Q_{2,\rho,ij}(\tau_k)$ as follows. Let $dX_1(t) = dX_2(t) = \sigma^{\top}(t)\, dW_t$ for any appropriate $\sigma(t)$, which implies $X_{1,h}(\tau_{k+1}) - X_{1,h}(\tau_k) = X_{2,h}(\tau_{k+1}) - X_{2,h}(\tau_k)$ for $1 \leq h \leq p$ and $1 \leq k \leq n-1$. Also, let $2\epsilon_{1,h}(t_{h,k}) = \epsilon_{2,h}(t_{h,k})$ for $1 \leq h \leq p$ and $0 \leq k \leq n$, where the distributions of $\epsilon_{1,h}(t_{h,k})$, $1 \leq h \leq p$, are defined as follows:
$$
\epsilon_{1,h}(t_{h,k}) = \begin{cases}
n^{1/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d, \\
0 & \text{with probability } 1 - 2d, \\
-n^{1/(2\alpha_h)} \left( \log(1/\delta) \right)^{-1/(2\alpha_h)} & \text{with probability } d,
\end{cases}
$$
where $d = \log(1/\delta)/(8n)$. For each $0 \leq k \leq n$, let $\Pr\{\epsilon_{1,h}(t_{h,k}) > 0 \text{ for all } 1 \leq h \leq p\} = \Pr\{\epsilon_{1,h}(t_{h,k}) < 0 \text{ for all } 1 \leq h \leq p\} = d$ and $\Pr\{\epsilon_{1,h}(t_{h,k}) = 0 \text{ for all } 1 \leq h \leq p\} = 1 - 2d$. Then, using the fact that $1 - x \geq \exp(-x/(1-x))$ for any $0 \leq x \leq 1/2$, we can show
$$
\prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} = \left( 1 - \frac{\log(1/\delta)}{4n} \right)^{n} \geq 2\delta. \tag{A.25}
$$
Here, we need to check whether the construction satisfies Assumption 2. It suffices to show
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_1)|^{2\alpha_i} \right] \leq C. \tag{A.26}
$$
Note that
$$
\mathrm{E}\left[ |\epsilon_{1,i}(\tau_1)|^{2\alpha_i} \right] = \frac{1}{4} \quad \text{and} \quad \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right] = \frac{1}{4} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}}.
$$
Hence, (A.26) is satisfied, and since
$$
\left| \rho_{1,ij} - \rho_{2,ij} \right| = \frac{3n\zeta}{\phi K}\, \mathrm{E}\left[ \epsilon_{1,i}(\tau_1) \epsilon_{1,j}(\tau_1) \right],
$$
we have, for any $\widehat{\rho}_{ij}(Q_{\rho,ij}(\tau_k), \delta)$,
$$
\begin{aligned}
&\max\left[ \Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) - \rho_{1,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\},
\Pr\left\{ \left| \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) - \rho_{2,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right\} \right] \\
&\geq \frac{1}{2} \Pr\left[ \left| \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) - \rho_{1,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}}
\text{ or } \left| \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) - \rho_{2,ij} \right| \geq \frac{3n\zeta}{8\phi K} \left( n^{-1} \log \frac{1}{\delta} \right)^{(\alpha_{ij}-1)/\alpha_{ij}} \right] \\
&\geq \frac{1}{2} \Pr\left\{ \widehat{\rho}_{ij}(Q_{1,\rho,ij}(\tau_k), \delta) = \widehat{\rho}_{ij}(Q_{2,\rho,ij}(\tau_k), \delta) \right\}
\geq \frac{1}{2} \prod_{k=1}^{n} \Pr\left\{ \epsilon_{1,i}(\tau_k) = \epsilon_{1,j}(\tau_k) = \epsilon_{2,i}(\tau_k) = \epsilon_{2,j}(\tau_k) = 0 \right\} \geq \delta, \tag{A.27}
\end{aligned}
$$
where the last inequality is from (A.25), which completes the proof. $\Box$
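All four results analyze estimators built on the same truncated-mean template, $\widehat{T} = \frac{1}{N\theta} \sum_{k=1}^{N} \psi_{\alpha}(\theta Q_k)$ with $\theta \asymp (\log(1/\delta)/(c_{\alpha} S N))^{1/\alpha}$. The following self-contained sketch illustrates that template on heavy-tailed inputs; the boundary choice of $\psi_{\alpha}$, the constants $\alpha = 1.4$, $c_{\alpha} = 1$, $\delta = 0.01$, the Pareto-type noise, and the plug-in empirical moment used for $S$ are all illustrative assumptions, not the paper's specifications.

```python
import numpy as np

# Minimal sketch of the adaptive robust truncation template behind the ARP
# estimators:  T_hat = (1/(N*theta)) * sum_k psi_a(theta * Q_k),
# with ASSUMED constants a = 1.4, c_a = 1, delta = 0.01 (illustrative only).

rng = np.random.default_rng(42)
a, c_a, delta = 1.4, 1.0, 0.01

def psi(x):
    # boundary choice of Minsker's psi_a (odd, log-envelope growth)
    return np.where(x >= 0,
                    np.log(1.0 + x + c_a * np.abs(x) ** a),
                    -np.log(1.0 - x + c_a * np.abs(x) ** a))

def robust_mean(q, a, delta):
    n = q.size
    s = np.mean(np.abs(q) ** a)   # empirical a-th moment as a plug-in for S
    theta = (np.log(1.0 / delta) / (c_a * s * n)) ** (1.0 / a)
    return np.sum(psi(theta * q)) / (n * theta)

# Heavy-tailed sample with finite a-th moment but infinite variance:
# symmetric Pareto with tail index 1.9 (so E|Q|^1.4 < infinity), shifted
# so that the target location is 5.
n = 50_000
q = (rng.pareto(1.9, size=n) + 1.0) * rng.choice([-1.0, 1.0], size=n) + 5.0

est = robust_mean(q, a, delta)
# The log envelopes cap the influence of any single extreme draw, so the
# estimate is always a finite number even under infinite variance.
assert np.isfinite(est)
```

The small $\theta$ makes $\psi_{\alpha}(\theta q) \approx \theta q$ for typical observations while clipping extreme ones at logarithmic growth, which is the mechanism behind the sub-Weibull concentration bounds in Theorems 1 and 3.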