Streaming Approach to Quadratic Covariation Estimation Using Financial Ultra-High-Frequency Data
Streaming Perspective in Quadratic Covariation Estimation Using Financial Ultra-High-Frequency Data
Vladimír Holý
University of Economics, Prague, Winston Churchill Square 4, 130 67 Prague 3, [email protected]
Petra Tomanová
University of Economics, Prague, Winston Churchill Square 4, 130 67 Prague 3, [email protected]
March 31, 2020
Abstract:
We investigate the computational issues related to the memory size in the estimation of quadratic covariation using financial ultra-high-frequency data. In the multivariate price process, we consider both contamination by the market microstructure noise and the non-synchronous observations. We express the multi-scale, flat-top realized kernel, non-flat-top realized kernel, pre-averaging and modulated realized covariance estimators in a quadratic form and fix their bandwidth parameter at a constant value. This allows us to operate with limited memory and formulate such estimation approach as a streaming algorithm. We compare the performance of the estimators with fixed bandwidth parameter in a simulation study. We find that the estimators ensuring positive semidefiniteness require much higher bandwidth than the estimators without such constraint.
Keywords:
Ultra-High-Frequency Data, Market Microstructure Noise, Quadratic Covariation, Streaming Algorithm.
JEL Codes:
C32, C58, C63.
In finance, Engle (2000) coined the term ultra-high-frequency data, referring to irregularly spaced time series recorded at the highest possible frequency, corresponding to each transaction or change in the bid/ask offer. Financial high-frequency time series include stock prices, foreign exchange rates and commodity prices. The availability of these high-frequency data allows econometricians to construct more precise models while facing some new challenges.

A key object in financial econometrics is the volatility of the price process. For continuous processes, volatility over a given time interval (e.g. a day) is typically measured by the quadratic variation (see e.g. Andersen et al., 2001; Barndorff-Nielsen and Shephard, 2002). In the theoretically ideal setting for the price process, it can be straightforwardly estimated by the realized variance. In practice, however, the so-called efficient price is concealed as the observed prices are contaminated by the market microstructure noise, making the realized variance significantly biased at high frequencies. The market microstructure noise is caused by various frictions in the trading process such as discreteness of price values, bid-ask spread and information effects (see e.g. Hansen and Lunde, 2006).

One way to deal with the market microstructure noise is to sample the price process at lower frequencies and find the optimal trade-off between the bias due to noise and precision (see Aït-Sahalia et al., 2005; Zhang et al., 2005; Bandi and Russell, 2008). A better way is to utilize estimators robust to the noise with the full dataset. There are three dominant approaches in non-parametric estimation of quadratic variation: subsampling (see Zhang et al., 2005; Zhang, 2006; Nolte and Voev, 2012; Aït-Sahalia et al., 2011), autocovariance combining (see Barndorff-Nielsen et al., 2008, 2009) and pre-averaging (see Jacod et al., 2009; Hautsch and Podolskij, 2013; Jacod and Mykland, 2015).
These estimators have different motivations but very similar structure in the end. Sun (2006) and Andersen et al. (2011) show that the multi-scale estimator of Zhang (2006) based on subsampling and the flat-top realized kernel estimator of Barndorff-Nielsen et al. (2008) based on autocovariance combining can be expressed in a quadratic form. Furthermore, it can be shown that the pre-averaging estimator of Jacod et al. (2009) also has the quadratic form structure. Finally, when assuming a specific model for the price process, a parametric approach for the volatility estimation can also be adopted (see Aït-Sahalia et al., 2005, 2010; Xiu, 2010; Holý and Tomanová, 2018). Precise estimation of volatility is the cornerstone of derivative pricing (see e.g. Bandi et al., 2008) and risk management (see e.g. Žikeš and Baruník, 2015).

The estimation of quadratic covariation between two price processes is even more challenging due to non-synchronous trading. When the observations are simply synchronised using the previous-tick interpolation scheme, the Epps effect occurs and the realized covariance is biased (see e.g. Hayashi and Yoshida, 2005; Zhang, 2011). Zhang (2011), however, shows that subsampling in the two-scales estimator cancels not only the market microstructure noise but the Epps effect as well. Hayashi and Yoshida (2005) propose a consistent estimator for the quadratic covariation utilizing the original unaltered data. This approach is also adopted by Palandri (2006), Nolte and Voev (2008), Christensen et al. (2010) and Bibinger (2011). Finally, Martens (2004), Christensen et al. (2010) and Barndorff-Nielsen et al. (2011) synchronise observations using the refresh times of Harris et al. (1995). Estimation of vast covariation matrices is crucial in high-dimensional portfolio allocation (see e.g. Hautsch et al., 2015; Lunde et al., 2016).

From the computational perspective, it is natural to consider financial high-frequency data as a data stream. A streaming algorithm can examine a sequence of inputs in a single pass only. The available memory for the computation is limited and cannot store all data. The proper definition may vary in different papers (see e.g. Kontorovich, 2012 vs. Černý, 2019). Related concepts are the online algorithm and the recursive algorithm, which focus on the updating scheme rather than the memory constraints. Examples of streaming, online and recursive algorithms in the field of econometrics include the estimation and diagnostics of the linear regression model by Černý (2019), the estimation of the ARMA model by Ouakasse and Mélard (2014), the estimation of the GARCH model by Aknouche and Guerbyenne (2006), the estimation of the EWMA model by Hendrych and Cipra (2019), the estimation of the spot volatility by Dahlhaus and Neddermeyer (2014) and the detection of changepoints by Bodenham and Adams (2017). Naturally, computationally effective algorithms are crucial in high-frequency trading (see e.g. Christensen et al., 2012; Loveless et al., 2013; Arce et al., 2019).

In the paper, we focus on the non-parametric estimation of the quadratic covariation from the streaming perspective. We consider both the market microstructure noise and the non-synchronous trading. First, we introduce the commonly used estimators robust to the market microstructure noise: the multi-scale estimator of Zhang (2006), the flat-top realized kernel estimator of Barndorff-Nielsen et al. (2008), the non-flat-top realized kernel estimator of Barndorff-Nielsen et al. (2011), the pre-averaging estimator of Jacod et al. (2009) and the modulated realized covariance estimator of Christensen et al. (2010). With the aim of a unified and simple computational framework, we express the estimators in a quadratic form.
All five estimators depend on a bandwidth parameter. The respective papers show that the optimal value of this parameter is of order related to the number of observations. In contrast, we consider the bandwidth parameter to be constant. This of course leads to sub-optimal performance but allows us to adopt a streaming algorithm with fixed memory leading to fast computation. In a simulation study, we compare the estimators and show the impact of the constant bandwidth parameter. In the case of non-synchronous trading, we can straightforwardly synchronise the observations by the refresh times method of Harris et al. (1995) with no computational issues.

The rest of the paper is organized as follows. In Section 2, we present the standard framework for the price process and quadratic variation. In Section 3, we describe the class of quadratic estimators and our streaming approach. In Section 4, we evaluate the performance of the estimators with fixed bandwidth using simulations. We conclude the paper in Section 5.
Theoretical Framework
We utilize the standard framework for the price process. Let us denote the $m$-dimensional logarithmic efficient price as $P_t$ in continuous time $t \geq 0$. We consider the efficient price to follow a multivariate continuous Itô semimartingale given by
$$P_t = P_0 + \int_0^t \mu_s \,\mathrm{d}s + \int_0^t \sigma_s \,\mathrm{d}W_s, \tag{1}$$
where $\mu_s$ is a multivariate finite variation càdlàg drift process, $\sigma_s$ is a multivariate adapted càdlàg volatility process and $W_s$ is a vector of independent Wiener processes. This class is quite general as $\mu_s$ and $\sigma_s$ can vary over time. However, a limitation is that we do not consider jumps in this model as the process is defined to have continuous paths.

Without loss of generality, we restrict ourselves to the time interval $[0, 1]$. The quadratic covariation of the process $P_t$ on $[0, 1]$ is then given by
$$\mathrm{QV} = \operatorname*{plim}_{\Delta_n \to 0} \sum_{i=1}^{n} \left( P_{T_i} - P_{T_{i-1}} \right) \left( P_{T_i} - P_{T_{i-1}} \right)', \tag{2}$$
where plim denotes the limit in probability and $\Delta_n = \max \{ T_1 - T_0, T_2 - T_1, \ldots, T_n - T_{n-1} \}$ is the maximal lag between observations of synchronized partitions $0 = T_0 < T_1 < \cdots < T_n = 1$. Of course, as $\Delta_n \to 0$, we have that $n \to \infty$. In our case of the continuous Itô semimartingale, the quadratic variation is identical to the integrated covariance given by
$$\mathrm{IV} = \int_0^1 \sigma_s \sigma_s' \,\mathrm{d}s. \tag{3}$$
For general semimartingales, however, they differ due to the jump component.

Let us consider that we observe the $m$-dimensional price process at non-synchronous discrete times. Furthermore, the observed price process is contaminated by the market microstructure noise. Let us denote the $k$-th univariate observed price process as $X_i^k$ at discrete times $0 \leq T_0^k < T_1^k < \cdots < T_{n_k}^k \leq 1$. Note that the observations can be irregularly spaced. The $k$-th component of the latent price process $P_t^k$ and the observed price process $X_i^k$ are then related as
$$X_i^k = P^k_{T_i^k} + E_i^k, \qquad E_i^k \sim \left( 0, \omega_k^2 \right) \quad \text{for } i = 0, \ldots, n_k, \tag{4}$$
where $E_i^k$ is the market microstructure noise with zero expected value and standard deviation $\omega_k$. At this point, we do not impose additional assumptions on the noise $E_i^k$ as various estimators require various assumptions. We refer to the respective papers of the individual estimators for more details. In an empirical study of financial markets, Hansen and Lunde (2006) show that the market microstructure noise is dependent both on its past values and the efficient price process.

Next, we synchronize observation times. Similarly to Harris et al. (1995), we define the refresh times $0 \leq T_0 < T_1 < \cdots < T_n \leq 1$ in the following way. Let the initial refresh time be
$$T_0 = \max \left\{ T_0^1, \ldots, T_0^m \right\}. \tag{5}$$
Next, let the subsequent refresh times be
$$T_i = \min \left\{ t : t \geq T^1_{j_1} > T_{i-1},\ t \geq T^2_{j_2} > T_{i-1},\ \ldots,\ t \geq T^m_{j_m} > T_{i-1} \right\} \quad \text{for } i = 1, \ldots, n. \tag{6}$$
Martens (2004) uses this scheme for the realized covariance. Barndorff-Nielsen et al. (2011) show that this synchronisation leads to consistent estimates of quadratic covariation by the realized kernel estimator while Christensen et al. (2010) present similar results for the modulated realized covariance estimator. After the synchronisation, we can write the $m$-dimensional observed price process as $X_i = (X_i^1, X_i^2, \ldots, X_i^m)'$ and the market microstructure noise as $E_i = (E_i^1, E_i^2, \ldots, E_i^m)'$ with synchronised times $T_i$, $i = 0, \ldots, n$. Furthermore, we define the observed returns as $Y_i = X_i - X_{i-1}$, $i = 1, \ldots, n$. Finally, let $Y = (Y_1, \ldots, Y_n) = (Y_i^k)_{k=1,i=1}^{m,n}$ denote the matrix of the observed returns with rows indicating the asset and columns corresponding to the time. Of course, not all prices are observed exactly at the same moment corresponding to a refresh time. Often, a new price of only a single asset is observed. For the other assets, the last observed price is utilized.
This is similar to the previous-tick approach but the price interpolation is performed only at refresh times instead of all observation times.

We estimate the quadratic covariation QV in the presence of the market microstructure noise by various non-parametric methods within a unified framework based on a quadratic form. Sun (2006) and Andersen et al. (2011) consider the class of quadratic estimators in the univariate case of quadratic variation. The estimators of quadratic covariation in the quadratic class based on the returns $Y_i$ can be formulated as
$$\mathrm{QE} = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} Y_i Y_j' = Y W Y', \tag{7}$$
where $W = (w_{i,j})_{i=1,j=1}^{n,n}$ are weights associated with a given estimator. The formula can also be rewritten in a quadratic form for the actual prices $X_i$ as
$$\mathrm{QE} = \sum_{i=0}^{n} \sum_{j=0}^{n} v_{i,j} X_i X_j' = X V X', \tag{8}$$
where weights $V = (v_{i,j})_{i=0,j=0}^{n,n}$ are given by $V = F' W F$. The elements of matrix $F = (f_{i,j})_{i=1,j=0}^{n,n}$ map prices to returns, i.e. $Y_i = \sum_{j=0}^{n} f_{i,j} X_j = X_i - X_{i-1}$, and are given by
$$f_{i,j} = \begin{cases} 1 & \text{for } j = i, \\ -1 & \text{for } j = i - 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

In general, the computation of the quadratic form cannot be formulated as a streaming algorithm as each observation is required to be stored in the memory. However, we can impose restrictions on the weight matrix $W$ in order to make the computation streaming. Let the elements of the weight matrix $W$ meet
$$w_{i,j} = \begin{cases} u_{|i-j|} & \text{for } |i-j| < h, \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$
where $u = (u_0, \ldots, u_{h-1})'$ is the updating vector determining a given streaming estimator. Matrix $W$ is therefore a symmetric $(2h-1)$-diagonal matrix in which the elements in the main diagonal and each lower and upper diagonal are the same and determined by the updating vector $u$. At time $T_i$, the quadratic estimator can then be recursively computed as
$$\mathrm{QE}_i = \mathrm{QE}_{i-1} + u_0 Y_i Y_i' + \sum_{j=1}^{h-1} u_j Y_i Y_{i-j}' + \sum_{j=1}^{h-1} u_j Y_{i-j} Y_i'. \tag{11}$$
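The recursion (11) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class name `StreamingQuadraticEstimator`, the helper `batch_quadratic_form` and the use of NumPy are assumptions made here. The helper builds the full banded weight matrix from (10) so that the streaming result can be compared with the batch quadratic form (7).

```python
from collections import deque

import numpy as np


class StreamingQuadraticEstimator:
    """Hypothetical helper: single-pass update (11) for weights of the form (10)."""

    def __init__(self, u, m):
        self.u = np.asarray(u, dtype=float)        # updating vector (u_0, ..., u_{h-1})
        self.qe = np.zeros((m, m))                 # running estimate QE_i
        self.lags = deque(maxlen=len(self.u) - 1)  # Y_{i-1}, ..., Y_{i-h+1}, newest first

    def update(self, y):
        y = np.asarray(y, dtype=float)
        self.qe += self.u[0] * np.outer(y, y)
        for j, y_lag in enumerate(self.lags, start=1):
            cross = self.u[j] * np.outer(y, y_lag)  # u_j * Y_i Y_{i-j}'
            self.qe += cross + cross.T              # plus the transposed term u_j * Y_{i-j} Y_i'
        self.lags.appendleft(y)                     # the oldest return is dropped automatically
        return self.qe


def batch_quadratic_form(Y, u):
    """Reference computation Y W Y' with the full banded weight matrix from (10)."""
    u = np.asarray(u, dtype=float)
    n = Y.shape[1]
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    W = np.where(lag < len(u), u[np.minimum(lag, len(u) - 1)], 0.0)
    return Y @ W @ Y.T


# Usage: feed the returns one column at a time (arbitrary illustrative weights).
rng = np.random.default_rng(0)
Y = rng.normal(size=(2, 50))
estimator = StreamingQuadraticEstimator(u=[1.0, 0.5, 0.25], m=2)
for y in Y.T:
    estimator.update(y)
```

The streaming object keeps only the $m \times m$ matrix and the last $h - 1$ return vectors, whereas the batch helper needs all $n$ returns and an $n \times n$ weight matrix.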
Figure 1: Quadratic form of the realized variance with $n = 16$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

Besides the previous matrix $\mathrm{QE}_{i-1}$ and the vector of the current observations $Y_i$, we need to store the vectors of the previous observations $Y_{i-1}, \ldots, Y_{i-h+1}$ in the memory. In total, that is $m(m+h)$ real numbers.

In the following sections, we show that many estimators of quadratic covariation can be expressed in a quadratic form (7) with restriction (10) if the bandwidth of the estimators is fixed at $h$ and the possible edge effects are omitted. On one hand, fixed bandwidth leads to sub-optimal performance of the estimators. On the other hand, it makes the estimation of quadratic variation in the presence of the market microstructure noise a streaming algorithm. We define the edge effects as deviations of weights $w_{i,j}$ in the left upper corner $i, j = 1, \ldots, h-1$ and the right lower corner $i, j = n-h+2, \ldots, n$ from the values suggested by the updating vector $u$. The multi-scale, pre-averaging and modulated realized covariance estimators possess the edge effects (see Figures 4, 7 and 8 respectively) while both the flat-top and non-flat-top realized kernel estimators do not have them (see Figures 5 and 6 respectively). Jacod et al. (2009) argue that, in the case of the pre-averaging estimator, the edge effects do not cause asymptotic bias and allow for a simpler central limit theorem than in the case of the flat-top realized kernel estimator. In our streaming application, however, it is more suitable to omit the edge effects and modify all considered estimators in the fashion of Barndorff-Nielsen et al. (2008).

A natural estimator of the quadratic covariation is the realized covariance defined as
$$\mathrm{RV}_n = \sum_{i=1}^{n} Y_i Y_i'. \tag{12}$$
In the absence of the noise, it is a consistent estimator of the quadratic covariation. In the presence of the noise, however, it is biased and inconsistent.
Note that in the asymptotics for $n \to \infty$, the time interval remains $[0, 1]$ but the frequency of observations increases. The realized covariance can be easily expressed as a quadratic estimator using the weight matrix $W^{RV,n} = I$. An example of this weight matrix is shown in Figure 1. The updating vector is simply $u^{RV,n} = (1, 0, \ldots, 0)'$.

The first unbiased and consistent non-parametric estimator of the quadratic variation proposed in the literature is the two-scale estimator of Zhang et al. (2005). It combines the average of realized variances computed at a lower frequency with realized variance at the highest possible frequency. The first term serves as a biased estimate of quadratic variation while the second term estimates the noise variance and therefore functions as the bias correction. Zhang (2006) further generalizes the two-scale estimator to the multi-scale estimator based on average realized variances computed at multiple frequencies. Aït-Sahalia et al. (2011) show that the multi-scale estimator is robust to serial dependency in the market microstructure noise. In the subsampling spirit, Nolte and Voev (2012) propose to combine average realized variances using the least squares. Extensions to quadratic covariation estimation are proposed by Palandri (2006), Nolte and Voev (2008), Zhang (2011) and Bibinger (2011).

Figure 2: Quadratic form of the sparse realized variance with $n = 16$, $l = 1$ and $s = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

Before presenting the multi-scale estimator, we need to define some preliminary quantities. First, we introduce the sparse realized covariance, which is simply the realized covariance at a lower frequency. Let $l$ denote the lag of the initial observation and $s$ denote the sampling interval. For example, $l = 1$ and $s = 4$ would correspond to observations at times $\{T_1, T_5, T_9, T_{13}, \ldots\}$. The number of observed prices utilized in the estimation is then $\lfloor (n-l)/s \rfloor + 1$, where $\lfloor \cdot \rfloor$ denotes rounding down. The sparse realized covariance is defined as
$$\mathrm{SRV}_{n,l,s} = \sum_{k=1}^{\lfloor (n-l)/s \rfloor} \left( X_{ks+l} - X_{(k-1)s+l} \right) \left( X_{ks+l} - X_{(k-1)s+l} \right)' = \sum_{k=1}^{\lfloor (n-l)/s \rfloor} \Bigg( \sum_{j=1}^{s} Y_{(k-1)s+l+j} \Bigg) \Bigg( \sum_{j=1}^{s} Y_{(k-1)s+l+j} \Bigg)'. \tag{13}$$
It can be expressed as a quadratic estimator using the weight matrix $W^{SRV,n,l,s}$ given by elements
$$w^{SRV,n,l,s}_{i,j} = \begin{cases} 1 & \text{for } i, j \in [(k-1)s+l+1,\ ks+l],\ k = 1, \ldots, \lfloor (n-l)/s \rfloor, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$
Figure 2 shows a visualisation of the quadratic form.
However, the sparse realized covariance cannot be expressed using the updating vector as it does not meet the requirements given by (10).

Next, we introduce the average realized covariance. As the sparse realized covariance uses only a fraction of available observations, it is natural to utilize all observations by averaging sparse realized covariances over subgrids given by different lag of the initial observation $l$. For a given sampling interval $s$, the average realized covariance is defined as
$$\mathrm{ARV}_{n,s} = \frac{1}{s} \sum_{l=0}^{s-1} \mathrm{SRV}_{n,l,s}. \tag{15}$$
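As a minimal sketch of the two definitions above, assuming the synchronised prices are stored as an $m \times (n+1)$ NumPy array with columns $X_0, \ldots, X_n$, the sparse and average realized covariances can be computed as follows. The function names are hypothetical, and `srv_weight_matrix` is only a check that the block structure of the sparse weight matrix reproduces the subsampled estimator.

```python
import numpy as np


def sparse_realized_covariance(X, l, s):
    """Realized covariance of the prices sampled at columns l, l+s, l+2s, ..."""
    increments = np.diff(X[:, l::s], axis=1)   # floor((n-l)/s) sparse returns
    return increments @ increments.T


def average_realized_covariance(X, s):
    """Average of the sparse realized covariances over the subgrids l = 0, ..., s-1."""
    return sum(sparse_realized_covariance(X, l, s) for l in range(s)) / s


def srv_weight_matrix(n, l, s):
    """Block-diagonal weights: one block of ones per sparse return."""
    W = np.zeros((n, n))
    for k in range(1, (n - l) // s + 1):
        lo, hi = (k - 1) * s + l, k * s + l    # 0-based slice for returns (k-1)s+l+1, ..., ks+l
        W[lo:hi, lo:hi] = 1.0
    return W


# Usage with the configuration of Figure 2 (n = 16, l = 1, s = 4).
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(2, 17)), axis=1)   # illustrative random-walk prices
Y = np.diff(X, axis=1)                            # returns Y_1, ..., Y_16
W_sparse = srv_weight_matrix(16, 1, 4)
W_average = sum(srv_weight_matrix(16, l, 4) for l in range(4)) / 4
```

Away from the corners, the rows of `W_average` take the values $1, 3/4, 1/2, 1/4$ at lags $0, 1, 2, 3$, which is the banded structure required by (10).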
Figure 3: Quadratic form of the average realized variance with $n = 16$ and $s = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

Although this approach reduces the impact of the market microstructure noise, the average realized variance is still a biased estimator of the quadratic variation. It can be expressed as a quadratic estimator using the weight matrix
$$W^{ARV,n,s} = \frac{1}{s} \sum_{l=0}^{s-1} W^{SRV,n,l,s}. \tag{16}$$
This weight matrix is shown in Figure 3. The updating vector $u^{ARV,n,s}$ omitting the edge effects is given by elements
$$u^{ARV,n,s}_i = \frac{s-i}{s} \quad \text{for } i = 0, \ldots, s-1. \tag{17}$$

Finally, we present the multi-scale estimator of Zhang (2006). It is a weighted average of the average realized variances based on the sampling intervals ranging from 1 up to the bandwidth $h$. The estimator is given by
$$\mathrm{MSE}_{n,h} = \sum_{s=1}^{h} A(s,h)\, \mathrm{ARV}_{n,s}, \tag{18}$$
where $A(s,h)$ is the weight function. Zhang (2006) suggests using
$$A(s,h) = \frac{12 s^2}{h^3 - h} - \frac{6 s}{h^2 - h}. \tag{19}$$
The weight matrix is given by
$$W^{MSE,n,h} = \sum_{s=1}^{h} A(s,h)\, W^{ARV,n,s}. \tag{20}$$
It is illustrated in Figure 4. The updating vector $u^{MSE,n,h}$ omitting the edge effects is given by elements
$$u^{MSE,n,h}_i = \sum_{s=i+1}^{h} \frac{s-i}{s} A(s,h) \quad \text{for } i = 0, \ldots, h-1. \tag{21}$$
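The multi-scale weights and the resulting updating vector can be evaluated directly. The sketch below assumes the weight function $A(s,h) = 12 s^2 / (h^3 - h) - 6 s / (h^2 - h)$ and a bandwidth $h \geq 2$ (for $h = 1$ the denominators vanish); the function names are hypothetical.

```python
import numpy as np


def zhang_weight(s, h):
    """Weight A(s, h) attached to the average realized covariance with sampling interval s."""
    return 12 * s**2 / (h**3 - h) - 6 * s / (h**2 - h)


def multiscale_updating_vector(h):
    """Updating vector of the multi-scale estimator, edge effects omitted (requires h >= 2)."""
    u = np.zeros(h)
    for i in range(h):
        s = np.arange(i + 1, h + 1)            # sampling intervals contributing at lag i
        u[i] = np.sum((s - i) / s * zhang_weight(s, h))
    return u


# Usage with the bandwidth of Figure 4.
weights = zhang_weight(np.arange(1, 5), 4)     # A(1,4), ..., A(4,4)
u_mse = multiscale_updating_vector(4)
```

For $h = 4$ the weights are $(-0.3, -0.2, 0.3, 1.2)$. They sum to one, and the sum of $A(s,h)/s$ is zero, which is what removes the noise-induced bias. The updating vector is $(1, 1, 0.7, 0.3)$, so the weights at lags 0 and 1 are both equal to one, as in a flat-top kernel.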
Figure 4: Quadratic form of the multi-scale estimator with $n = 16$ and $h = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

Another approach for the robust estimation of the quadratic variation is combining realized autocovariances using kernel functions. Barndorff-Nielsen et al. (2008) propose to utilize the flat-top kernels with unit weights at lags 0 and 1. Furthermore, Barndorff-Nielsen et al. (2009) consider non-flat-top kernels with unit weight only at lag 0. Flat-top realized kernels have a faster convergence rate but do not guarantee non-negativity of the estimates. In contrast, non-flat-top realized kernels have a sub-optimal convergence rate but ensure non-negativity. Barndorff-Nielsen et al. (2011) propose the non-flat-top kernels for the estimation of the quadratic covariation.

The flat-top realized kernel estimator of Barndorff-Nielsen et al. (2008) is defined as
$$\mathrm{RKE}_{n,h} = \mathrm{RV}_n + \sum_{j=1}^{h-1} K\!\left( \frac{j-1}{h-1} \right) \left( \mathrm{RA}_{n,j} + \mathrm{RA}_{n,-j} \right), \tag{22}$$
where $K(\cdot)$ is a kernel function and $\mathrm{RA}_{n,l}$ is the realized autocovariance defined as
$$\mathrm{RA}_{n,l} = \sum_{i=l+1}^{n} Y_i Y_{i-l}' \quad \text{for } l \geq 0, \tag{23}$$
and $\mathrm{RA}_{n,l} = \mathrm{RA}_{n,-l}'$ for $l < 0$. Barndorff-Nielsen et al. (2008) consider many kernel functions and find that the modified Tukey–Hanning kernel of order 2 is near efficient. It is defined as
$$K(x) = \sin^2\!\left( \frac{\pi}{2} (1-x)^2 \right). \tag{24}$$
The flat-top realized kernel estimator can be expressed as a quadratic form with weight matrix $W^{RKE,n,h}$ given by elements
$$w^{RKE,n,h}_{i,j} = \begin{cases} 1 & \text{for } i = j, \\ K\!\left( \frac{|i-j|-1}{h-1} \right) & \text{for } 1 \leq |i-j| < h, \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$
It is illustrated in Figure 5. The updating vector $u^{RKE,n,h}$ is given by elements
$$u^{RKE,n,h}_i = \begin{cases} 1 & \text{for } i = 0, \\ K\!\left( \frac{i-1}{h-1} \right) & \text{for } i = 1, \ldots, h-1. \end{cases} \tag{26}$$
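A short sketch of the kernel-based updating vectors follows. It assumes the modified Tukey–Hanning kernel of order 2, $K(x) = \sin^2(\tfrac{\pi}{2}(1-x)^2)$, for the flat-top variant. For comparison it also evaluates the standard Parzen kernel at the points $i/h$, which is the choice the text makes for the non-flat-top variant introduced next. All function names are hypothetical.

```python
import numpy as np


def tukey_hanning2(x):
    """Modified Tukey-Hanning kernel of order 2 on [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.sin(np.pi / 2 * (1 - x) ** 2) ** 2


def parzen(x):
    """Standard Parzen kernel on [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 1 - 6 * x**2 + 6 * x**3, 2 * (1 - x) ** 3)


def flat_top_updating_vector(h, kernel=tukey_hanning2):
    """Unit weight at lag 0, kernel weights K((i-1)/(h-1)) at lags i = 1, ..., h-1 (h >= 2)."""
    i = np.arange(1, h)
    return np.concatenate(([1.0], kernel((i - 1) / (h - 1))))


def non_flat_top_updating_vector(h, kernel=parzen):
    """Kernel weights K(i/h) at lags i = 0, ..., h-1."""
    return kernel(np.arange(h) / h)


# Usage with bandwidth h = 4.
u_flat = flat_top_updating_vector(4)
u_parzen = non_flat_top_updating_vector(4)
```

With $h = 4$, the flat-top vector starts with two unit weights (lags 0 and 1), while the Parzen vector $(1, 0.71875, 0.25, 0.03125)$ starts decaying from lag 1. At equal bandwidth, the non-flat-top variant therefore weights the low lags differently.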
Figure 5: Quadratic form of the flat-top realized kernel estimator with $n = 16$ and $h = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

To ensure a positive semidefinite covariance matrix, the non-flat-top realized kernel of Barndorff-Nielsen et al. (2011) can be utilized. It is defined as
$$\mathrm{PD\text{-}RKE}_{n,h} = \sum_{j=-h+1}^{h-1} K\!\left( \frac{|j|}{h} \right) \mathrm{RA}_{n,j}. \tag{27}$$
Both Barndorff-Nielsen et al. (2009) in the univariate case and Barndorff-Nielsen et al. (2011) in the multivariate case suggest using the Parzen kernel given by
$$K(x) = \begin{cases} 1 - 6x^2 + 6x^3 & \text{for } 0 \leq x < \frac{1}{2}, \\ 2(1-x)^3 & \text{for } \frac{1}{2} \leq x \leq 1. \end{cases} \tag{28}$$
The non-flat-top realized kernel estimator can be expressed as a quadratic form with weight matrix $W^{PD\text{-}RKE,n,h}$ given by elements
$$w^{PD\text{-}RKE,n,h}_{i,j} = \begin{cases} K\!\left( \frac{|i-j|}{h} \right) & \text{for } |i-j| < h, \\ 0 & \text{otherwise.} \end{cases} \tag{29}$$
It is illustrated in Figure 6. The updating vector $u^{PD\text{-}RKE,n,h}$ is simply given by elements
$$u^{PD\text{-}RKE,n,h}_i = K\!\left( \frac{i}{h} \right) \quad \text{for } i = 0, \ldots, h-1. \tag{30}$$

The third class of estimators we present is the pre-averaging estimators pioneered by Jacod et al. (2009). The idea is to locally average returns and then sum their squares. Hautsch and Podolskij (2013) extend the theory of pre-averaging estimators to accommodate serial dependence in the market microstructure noise and jumps in the price process. Jacod and Mykland (2015) propose an adaptive method for the choice of the bandwidth parameter. Christensen et al. (2010) extend the pre-averaging estimator to the multivariate setting and use the name modulated realized covariance instead.
Figure 6: Quadratic form of the non-flat-top realized kernel estimator with $n = 16$ and $h = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

The pre-averaging estimator of Jacod et al. (2009) is based on the averaged returns given by
$$\bar{Y}_i = \sum_{j=0}^{h-1} G\!\left( \frac{j+1}{h+1} \right) Y_{i+j}, \tag{31}$$
where $G(\cdot)$ is a suitable weight function. Jacod et al. (2009) suggest using
$$G(x) = \min \{ x, 1-x \}. \tag{32}$$
The pre-averaging estimator is then defined as
$$\mathrm{PAE}_{n,h} = \frac{1}{\psi_2} \sum_{i=1}^{n-h+1} \bar{Y}_i \bar{Y}_i' - \frac{\psi_1}{2 \psi_2} \mathrm{RV}_n, \tag{33}$$
where
$$\psi_1 = \sum_{j=0}^{h} \left( G\!\left( \frac{j+1}{h+1} \right) - G\!\left( \frac{j}{h+1} \right) \right)^2, \qquad \psi_2 = \sum_{j=0}^{h-1} G^2\!\left( \frac{j+1}{h+1} \right). \tag{34}$$
The realized variance term serves as the bias correction. Note that similarly to Jacod and Mykland (2015), we use a simpler expression of the estimator than the one introduced in Jacod et al. (2009) and omit terms related to the asymptotics of the bandwidth parameter. We also do not include the adjustment for the sample size as the final number of observations is unknown in the typical streaming setting. The pre-averaging estimator can be expressed as a quadratic estimator using the weight matrix $W^{PAE,n,h}$ given by elements
$$w^{PAE,n,h}_{i,j} = \begin{cases} \dfrac{1}{\psi_2} \displaystyle\sum_{k=\max\{0,\, i+h-n-1\}}^{\min\{h-1,\, i-1\}} G^2\!\left( \frac{k+1}{h+1} \right) - \dfrac{\psi_1}{2\psi_2} & \text{for } i = j, \\[2ex] \dfrac{1}{\psi_2} \displaystyle\sum_{k=\max\{0,\, i+h-n-1,\, j+h-n-1\}}^{\min\{h-1-|i-j|,\, i-1,\, j-1\}} G\!\left( \frac{k+1}{h+1} \right) G\!\left( \frac{k+1+|i-j|}{h+1} \right) & \text{for } 1 \leq |i-j| < h, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{35}$$
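The pre-averaging construction can be sketched as below, under two assumptions: the weight function is $G(x) = \min\{x, 1-x\}$ and the bias correction is the realized covariance scaled by $\psi_1 / (2\psi_2)$. The function names are hypothetical. `pre_averaging_updating_vector` collects the interior (edge-free) weights, i.e. the sums of products $G(\tfrac{j+1}{h+1})\, G(\tfrac{j+1+i}{h+1})$ over all windows containing two returns that are $i$ lags apart.

```python
import numpy as np


def G(x):
    """Weight function min{x, 1 - x}."""
    return np.minimum(x, 1 - x)


def psi(h):
    """Constants psi_1 and psi_2 for bandwidth h."""
    g = G(np.arange(h + 2) / (h + 1))          # G(0), G(1/(h+1)), ..., G(1)
    return np.sum(np.diff(g) ** 2), np.sum(g[1:h + 1] ** 2)


def pre_averaging_estimator(Y, h):
    """Direct (non-streaming) pre-averaging estimate from an m x n matrix of returns."""
    psi1, psi2 = psi(h)
    g = G(np.arange(1, h + 1) / (h + 1))
    n = Y.shape[1]
    Y_bar = np.stack([Y[:, i:i + h] @ g for i in range(n - h + 1)], axis=1)
    return Y_bar @ Y_bar.T / psi2 - psi1 / (2 * psi2) * (Y @ Y.T)


def pre_averaging_updating_vector(h):
    """Interior weights at lags 0, ..., h-1 (edge effects omitted)."""
    psi1, psi2 = psi(h)
    g = G(np.arange(1, h + 1) / (h + 1))
    u = np.array([g[:h - i] @ g[i:] for i in range(h)]) / psi2
    u[0] -= psi1 / (2 * psi2)                  # bias correction acts on the main diagonal only
    return u


# Usage with bandwidth h = 4.
psi1, psi2 = psi(4)
u_pae = pre_averaging_updating_vector(4)
```

For $h = 4$ this gives $\psi_1 = 0.16$, $\psi_2 = 0.4$ and the updating vector $(0.8, 0.8, 0.4, 0.1)$. When the first and last $h - 1$ returns are zero, the direct estimate coincides with the banded quadratic form built from this vector, which illustrates that the two differ only through the edge effects.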
Figure 7: Quadratic form of the pre-averaging estimator with $n = 16$ and $h = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

The weight matrix is illustrated in Figure 7. The updating vector $u^{PAE,n,h}$ omitting the edge effects is given by elements
$$u^{PAE,n,h}_i = \begin{cases} \dfrac{1}{\psi_2} \displaystyle\sum_{j=0}^{h-1} G^2\!\left( \frac{j+1}{h+1} \right) - \dfrac{\psi_1}{2\psi_2} & \text{for } i = 0, \\[2ex] \dfrac{1}{\psi_2} \displaystyle\sum_{j=0}^{h-1-i} G\!\left( \frac{j+1}{h+1} \right) G\!\left( \frac{j+1+i}{h+1} \right) & \text{for } i = 1, \ldots, h-1. \end{cases} \tag{36}$$

When the bias correction term in (33) is omitted, the resulting estimator is guaranteed to be positive semidefinite. Similarly to the non-flat-top realized kernel estimator, however, it has a sub-optimal convergence rate. Christensen et al. (2010) define the modulated realized covariance estimator as
$$\mathrm{PD\text{-}PAE}_{n,h} = \frac{1}{\psi_2} \sum_{i=1}^{n-h+1} \bar{Y}_i \bar{Y}_i'. \tag{37}$$
The weight matrix $W^{PD\text{-}PAE,n,h}$ is given by elements
$$w^{PD\text{-}PAE,n,h}_{i,j} = \frac{1}{\psi_2} \sum_{k=\max\{0,\, i+h-n-1,\, j+h-n-1\}}^{\min\{h-1-|i-j|,\, i-1,\, j-1\}} G\!\left( \frac{k+1}{h+1} \right) G\!\left( \frac{k+1+|i-j|}{h+1} \right) \quad \text{for } 0 \leq |i-j| < h. \tag{38}$$
This weight matrix is shown in Figure 8. The updating vector $u^{PD\text{-}PAE,n,h}$ omitting the edge effects is given by elements
$$u^{PD\text{-}PAE,n,h}_i = \frac{1}{\psi_2} \sum_{j=0}^{h-1-i} G\!\left( \frac{j+1}{h+1} \right) G\!\left( \frac{j+1+i}{h+1} \right) \quad \text{for } i = 0, \ldots, h-1. \tag{39}$$

Figure 8: Quadratic form of the modulated realized covariance estimator with $n = 16$ and $h = 4$. (Panels: weight matrix for returns and weight matrix for prices; color scale: weight from −2 to 2.)

To compare the finite-sample performance of the estimators, we conduct a simulation study. We consider the same model for the observed price process as Barndorff-Nielsen et al. (2011). The individual efficient price $P^k$, $k = 1, \ldots, m$, in continuous time follows
$$\mathrm{d}P^k = \mu \,\mathrm{d}t + \mathrm{d}V^k + \mathrm{d}F^k, \tag{40}$$
where the component $V^k$ and the common factor $F^k$ are respectively given by
$$\mathrm{d}V^k = \rho S^k \,\mathrm{d}B^k, \qquad \mathrm{d}F^k = \sqrt{1 - \rho^2}\, S^k \,\mathrm{d}B^0, \tag{41}$$
and $B^0, B^1, \ldots, B^m$ are independent Wiener processes. The volatility process $S^k$ is given by
$$S^k = \exp\left( \alpha + \beta U^k \right), \qquad \mathrm{d}U^k = \theta U^k \,\mathrm{d}t + \mathrm{d}B^k. \tag{42}$$
We restrict ourselves to two-dimensional processes, i.e. $m = 2$. We generate the observation times $T_i^k$ by two independent Poisson point processes with scale parameters $\lambda = (\lambda_1, \lambda_2)$. We standardize time so that one unit corresponds to one second in a 6.5-hour trading day. The time window for which we compute the quadratic covariation is therefore 23 400 seconds long. In the case of two independent Poisson processes, the refresh time sampling results in the average number of observations
$$n = 23\,400\, \frac{\lambda_1 + \lambda_2}{\lambda_1^2 + \lambda_1 \lambda_2 + \lambda_2^2}. \tag{43}$$
We contaminate the efficient prices by the market microstructure noise. The observed prices $X^k$ are then
$$X_i^k = P^k_{T_i^k} + E_i^k, \qquad E_i^k \sim N\left( 0, \omega_k^2 \right), \qquad \omega_k^2 = \xi^2 \sqrt{ \frac{1}{n+1} \sum_{i=0}^{n} \left( S^k_{T_i^k} \right)^4 }. \tag{44}$$
The simulations are performed times, i.e. we simulate days. The volatility process (42) is simulated using the exact simulation algorithm for the Ornstein–Uhlenbeck process. The first observation on each day is generated using the stationary distribution $U_0^k \sim N\left( 0, (-2\theta)^{-1} \right)$. The efficient price process (40) with its components (41) is simulated using the Euler method. The initial observation is set to $P_0^k = 0$.

As in Barndorff-Nielsen et al. (2011), we set the parameter values for the efficient prices to $\mu = 0.03$, $\rho = -0.3$, $\alpha = -0.3125$, $\beta = 0.125$ and $\theta = -0.025$. Such values reflect the empirical properties of financial markets and result in the expected value of quadratic covariation given by
$$\mathrm{E}[\mathrm{QV}] = \begin{pmatrix} 1.00 & 0.67 \\ 0.67 & 1.00 \end{pmatrix}. \tag{45}$$
Furthermore, we consider three scenarios for the noise: $\xi^2 = 0$ (denoted as None), $\xi^2 = 0.001$ (denoted as Small) and $\xi^2 = 0.01$ (denoted as Large). Finally, we consider two scenarios for the sample size: $\lambda = (1, 0.5)$ (denoted as Moderate) and $\lambda = (0.1, 0.05)$ (denoted as High). We choose much higher frequencies than Barndorff-Nielsen et al. (2011) to reflect the current trading behaviour. The scenario with moderate frequency has on average 23 400 and 46 800 observations respectively while the high frequency scenario has 234 000 and 468 000 observations respectively. After the refresh time sampling, we have on average 20 057 observations for the moderate frequency and 200 571 observations for the high frequency.
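The refresh-time counts quoted above can be checked with a short calculation. The sketch assumes that the scale parameters are the mean waiting times between observations, i.e. 1 and 0.5 seconds in the moderate scenario and 0.1 and 0.05 seconds in the high scenario, as implied by the stated observation counts over 23 400 seconds. The expected gap between refresh times is then the expected maximum of two independent exponential waiting times, $\lambda_1 + \lambda_2 - \lambda_1 \lambda_2 / (\lambda_1 + \lambda_2)$. The function names are hypothetical, and `refresh_times` is a simple reference implementation of the refresh-time scheme for sorted observation times.

```python
import numpy as np


def expected_refresh_count(scale1, scale2, horizon=23_400):
    """Horizon divided by the expected maximum of two exponential waiting times."""
    return horizon * (scale1 + scale2) / (scale1**2 + scale1 * scale2 + scale2**2)


def refresh_times(times):
    """Refresh times for a list of sorted observation-time arrays, one array per asset."""
    times = [np.asarray(t, dtype=float) for t in times]
    current = max(t[0] for t in times)         # first time at which every asset has traded
    out = []
    while True:
        out.append(current)
        nxt = []
        for t in times:
            k = np.searchsorted(t, current, side="right")  # first observation after current
            if k == len(t):                    # one asset has no further observation
                return np.array(out)
            nxt.append(t[k])
        current = max(nxt)                     # all assets have refreshed by this time


# Usage: expected counts for the two scenarios and a small deterministic example.
moderate = expected_refresh_count(1.0, 0.5)
high = expected_refresh_count(0.1, 0.05)
example = refresh_times([[0.0, 1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 3.5]])
```

Rounded to integers, the two expected counts are 20 057 and 200 571, in line with the averages reported in the text. In the small example the refresh times are 0.5, 2.5 and 3.5, i.e. they are driven by the less frequently observed asset.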
The results of the simulation study are reported in Table 1, Figure 9 and Figure 10. In Table 1, we choose the best performing bandwidth parameter according to the root-mean-square error. In Figures 9 and 10, we investigate the behaviour of the root-mean-square error for values of the bandwidth parameter ranging from 2 to (regardless of the number of observations).

The realized variance is the best estimator in the univariate case when there is no market microstructure noise. When the noise is present, however, the bias of the realized variance is substantial and grows with the sampling frequency. In the multivariate case, the realized covariance is negatively biased under the refresh time sampling. These are both standard results well covered in the literature.

The multi-scale, flat-top realized kernel and pre-averaging estimators perform comparably. They have lower root-mean-square error of the univariate estimates than the non-flat-top realized kernel and modulated realized covariance estimators in the presence of the market microstructure noise. However, they do not ensure positive semidefiniteness, which is a major drawback. Table 1 with Figures 9 and 10 show that the multi-scale estimator requires the lowest bandwidth for the optimal performance and is the best choice when the bandwidth is lower than optimal. As this is the natural situation in the streaming setting, we recommend adopting the multi-scale estimator in practice when the memory is limited. When the bandwidth is higher than optimal, however, the flat-top realized kernel estimator is the most precise. The pre-averaging estimator represents the middle way.

The non-flat-top realized kernel and modulated realized covariance estimators ensure positive semidefiniteness. On the other hand, they are less precise and require higher bandwidth in the univariate case as shown in Table 1 and Figures 9 and 10. In the case of quadratic covariation between two series, however, they are quite comparable to the estimators not ensuring positive semidefiniteness. Interestingly, the non-flat-top realized kernel estimator has almost identical performance to the modulated realized covariance estimator even though their structure differs as illustrated in Figures 6 and 8. As a positive semidefinite covariance matrix is quite a reasonable requirement, either of these two methods should be utilized when the bandwidth parameter is not limited.
Table 1

Simulation Scenario           Quadratic Variation        Quadratic Covariation
Noise   Freq.   Method        Band.    Bias     RMSE     Band.    Bias     RMSE
None    Mod.    RV                1   -0.00     0.02         1   -0.20     0.28
None    Mod.    MSE               3   -0.00     0.04         2   -0.00     0.02
None    Mod.    RKE               3   -0.00     0.04         2   -0.00     0.02
None    Mod.    PD-RKE            2   -0.00     0.02        12   -0.01     0.03
None    Mod.    PAE              18   -0.02     0.07        18   -0.01     0.03
None    Mod.    PD-PAE            2   -0.00     0.02        12   -0.01     0.03
None    High    RV                1   -0.00     0.01         1   -0.20     0.28
None    High    MSE               5   -0.00     0.01         2   -0.00     0.01
None    High    RKE               8   -0.00     0.01         3   -0.00     0.01
None    High    PD-RKE            2   -0.00     0.01        20   -0.00     0.01
None    High    PAE              32   -0.01     0.03        30   -0.00     0.01
None    High    PD-PAE            2   -0.00     0.01        20   -0.00     0.01
Small   Mod.    RV                1   41.23    94.96         1   -0.17     0.41
Small   Mod.    MSE              13    0.00     0.08        17   -0.00     0.04
Small   Mod.    RKE              22    0.00     0.09        25   -0.00     0.04
Small   Mod.    PD-RKE           81    0.04     0.21        23   -0.00     0.04
Small   Mod.    PAE              18   -0.01     0.09        26   -0.01     0.04
Small   Mod.    PD-PAE           80    0.04     0.21        23   -0.00     0.04
Small   High    RV                1  405.33   937.21         1   -0.09     1.31
Small   High    MSE              78    0.01     0.04        44   -0.00     0.02
Small   High    RKE             108    0.01     0.04        72   -0.00     0.02
Small   High    PD-RKE          335    0.02     0.10        60   -0.00     0.02
Small   High    PAE             108    0.01     0.05        60   -0.00     0.02
Small   High    PD-PAE          334    0.02     0.10        59   -0.00     0.02
Large   Mod.    RV                1  412.50   950.21         1   -0.27     3.78
Large   Mod.    MSE              49    0.02     0.19        55   -0.00     0.08
Large   Mod.    RKE              73    0.02     0.20        84   -0.00     0.07
Large   Mod.    PD-RKE          195    0.09     0.50        65   -0.00     0.07
Large   Mod.    PAE              58    0.02     0.19        73   -0.00     0.08
Large   Mod.    PD-PAE          194    0.09     0.50        63   -0.00     0.07
Large   High    RV                1   […]

[Plot: Simulations with Large Noise and Moderate Frequency. Panels: Quadratic Variation, Quadratic Covariation. Horizontal axis: Bandwidth. Vertical axis: RMSE. Methods: MSE, RKE, PD-RKE, PAE, PD-PAE.]
Figure 9: Root-mean-square error of quadratic covariation estimates for various bandwidth parameters in the scenario with ξ = 0.[…] and δ = (1, […]).

[Plot: Simulations with Large Noise and High Frequency. Panels: Quadratic Variation, Quadratic Covariation. Horizontal axis: Bandwidth. Vertical axis: RMSE. Methods: MSE, RKE, PD-RKE, PAE, PD-PAE.]
Figure 10: Root-mean-square error of quadratic covariation estimates for various bandwidth parameters in the scenario with ξ = 0.[…] and δ = (0.[…], […]).

We deal with the estimation of the quadratic covariation using financial ultra-high-frequency data exhibiting the market microstructure noise and non-synchronous observations. To our knowledge, the high-frequency literature lacks a comprehensive overview of quadratic covariation estimators in a unified framework. We remedy this and present the multi-scale, flat-top realized kernel, non-flat-top realized kernel, pre-averaging and modulated realized covariance estimators in a quadratic form. We also illustrate differences in the structure between the individual estimators. This is the first contribution of the paper.

We approach the problem of quadratic covariation estimation from the computational point of view focusing on limited memory. We utilize the convenient quadratic structure and show that the estimates can be computed by a streaming algorithm when the bandwidth is fixed and the edge effects are omitted. The streaming representation of the estimators is crucial especially when the covariance matrix is vast. This is the second contribution of the paper.

We compare the finite-sample performance of the estimators with fixed bandwidth using simulations. We find that for small bandwidth, the multi-scale estimator is the most precise. The flat-top realized kernel and pre-averaging estimators, however, perform very similarly. In contrast, the non-flat-top realized kernel and modulated realized covariance estimators, which ensure positive semidefiniteness, require much higher bandwidth than the estimators without such a constraint. This is the third contribution of the paper.

Our results find their use in the implementation of the quadratic covariation estimators in practice. Financial applications include derivative pricing, risk management, portfolio allocation and high-frequency trading.

Acknowledgements
We would like to thank the organizers and participants of the 23rd International Conference on Computational Statistics (Iasi, August 28–31, 2018) for fruitful discussions.
Funding
The work on this paper was supported by the Internal Grant Agency of the University of Economics, Prague under project F4/21/2018 and the Czech Science Foundation under project 19-02773S.
References
Aït-Sahalia , Y.,
Mykland , P. A.,
Zhang , L. 2005. How Often to Sample a Continuous-TimeProcess in the Presence of Market Microstructure Noise.
The Review of Financial Studies . Volume18. Issue 2. Pages 351–416. ISSN 0893-9454. https://doi.org/10.1093/rfs/hhi016 . Aït-Sahalia , Y.,
Fan , J.,
Xiu , D. 2010. High-Frequency Covariance Estimates with Noisy andAsynchronous Financial Data.
Journal of the American Statistical Association . Volume 105. Issue492. Pages 1504–1517. ISSN 0162-1459. https://doi.org/10.1198/jasa.2010.tm10163 . Aït-Sahalia , Y.,
Mykland , P. A.,
Zhang , L. 2011. Ultra High Frequency Volatility Estimationwith Dependent Microstructure Noise.
Journal of Econometrics . Volume 160. Issue 1. Pages160–175. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2010.03.028 . Aknouche , A.,
Guerbyenne , H. 2006. Recursive Estimation of GARCH Models.
Communicationsin Statistics - Simulation and Computation . Volume 35. Issue 4. Pages 925–938. ISSN 0361-0918. https://doi.org/10.1080/03610910600880328 . Andersen , T. G.,
Bollerslev , T.,
Diebold , F. X.,
Labys , P. 2001. The Distribution of RealizedExchange Rate Volatility.
Journal of the American Statistical Association . Volume 96. Issue 453.Pages 42–55. ISSN 0162-1459. https://doi.org/10.1198/016214501750332965 . Andersen , T. G.,
Bollerslev , T.,
Meddahi , N. 2011. Realized Volatility Forecasting and MarketMicrostructure Noise.
Journal of Econometrics . Volume 160. Issue 1. Pages 220–234. ISSN0304-4076. https://doi.org/10.1016/j.jeconom.2010.03.032 . Arce , P.,
Antognini , J.,
Kristjanpoller , W.,
Salinas , L. 2019. Fast and Adaptive CointegrationBased Model for Forecasting High Frequency Financial Time Series.
Computational Economics . Vol-ume 54. Issue 1. Pages 99–112. ISSN 0927-7099. https://doi.org/10.1007/s10614-017-9691-7 . Bandi , F. M.,
Russell , J. R. 2008. Microstructure Noise, Realized Variance, and Optimal Sampling.
Review of Economic Studies . Volume 75. Issue 2. Pages 339–369. ISSN 0034-6527. https://doi.org/10.1111/j.1467-937X.2008.00474.x . Bandi , F. M.,
Russell , J. R.,
Yang , C. 2008. Realized Volatility Forecasting and Option Pricing.
Journal of Econometrics . Volume 147. Issue 1. Pages 34–46. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2008.09.002 . 16 arndorff-Nielsen , O. E.,
Shephard , N. 2002. Econometric Analysis of Realized Volatility andIts Use in Estimating Stochastic Volatility Models.
Journal of the Royal Statistical Society: SeriesB (Methodological) . Volume 64. Issue 2. Pages 253–280. ISSN 1369-7412. https://doi.org/10.1111/1467-9868.00336 . Barndorff-Nielsen , O. E.,
Hansen , P. R.,
Lunde , A.,
Shephard , N. 2008. Designing RealizedKernels to Measure the ex post Variation of Equity Prices in the Presence of Noise.
Econometrica .Volume 76. Issue 6. Pages 1481–1536. ISSN 0012-9682. https://doi.org/10.3982/ecta6495 . Barndorff-Nielsen , O. E.,
Hansen , P. R.,
Lunde , A.,
Shephard , N. 2009. Realized Kernelsin Practice: Trades and Quotes.
Econometrics Journal . Volume 12. Issue 3. Pages 1–32. ISSN1368-4221. https://doi.org/10.1111/j.1368-423X.2008.00275.x . Barndorff-Nielsen , O. E.,
Hansen , P. R.,
Lunde , A.,
Shephard , N. 2011. Multivariate RealisedKernels: Consistent Positive Semi-Definite Estimators of the Covariation of Equity Prices withNoise and Non-Synchronous Trading.
Journal of Econometrics . Volume 162. Issue 2. Pages149–169. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2010.07.009 . Bibinger , M. 2011. Efficient Covariance Estimation for Asynchronous Noisy High-Frequency Data.
Scandinavian Journal of Statistics . Volume 38. Issue 1. Pages 23–45. ISSN 0303-6898. https://doi.org/10.1111/j.1467-9469.2010.00712.x . Bodenham , D. A.,
Adams , N. M. 2017. Continuous Monitoring for Changepoints in Data StreamsUsing Adaptive Estimation.
Statistics and Computing . Volume 27. Issue 5. Pages 1257–1270. ISSN0960-3174. https://doi.org/10.1007/s11222-016-9684-8 . Černý , M. 2019. Narrow Big Data in a Stream: Computational Limitations and Regression.
In-formation Sciences . Volume 486. Pages 379–392. ISSN 0020-0255. https://doi.org/10.1016/j.ins.2019.02.052 . Christensen , H. L.,
Murphy , J.,
Godsill , S. J. 2012. Forecasting High-Frequency Futures ReturnsUsing online Langevin Dynamics.
IEEE Journal on Selected Topics in Signal Processing . Volume6. Issue 4. Pages 366–380. ISSN 1932-4553. https://doi.org/10.1109/jstsp.2012.2191532 . Christensen , K.,
Kinnebrock , S.,
Podolskij , M. 2010. Pre-Averaging Estimators of the Ex-PostCovariance Matrix in Noisy Diffusion Models with Non-Synchronous Data.
Journal of Economet-rics . Volume 159. Issue 1. Pages 116–133. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2010.05.001 . Dahlhaus , R.,
Neddermeyer , J. C. 2014. Online Spot Volatility-Estimation and Decompositionwith Nonlinear Market Microstructure Noise Models.
Journal of Financial Econometrics . Volume12. Issue 1. Pages 174–212. ISSN 1479-8409. https://doi.org/10.1093/jjfinec/nbt008 . Engle , R. F. 2000. The Econometrics of Ultra-High-Frequency Data.
Econometrica . Volume 68.Issue 1. Pages 1–22. ISSN 0012-9682. https://doi.org/10.1111/1468-0262.00091 . Hansen , P. R.,
Lunde , A. 2006. Realized Variance and Market Microstructure Noise.
Journal ofBusiness & Economic Statistics . Volume 24. Issue 2. Pages 127–161. ISSN 0735-0015. https://doi.org/10.1198/073500106000000071 . Harris , F. H. deB.,
McInish , T. H.,
Shoesmith , G. L.,
Wood , R. A. 1995. Cointegration,Error Correction, and Price Discovery on Informationally Linked Security Markets.
Journal ofFinancial and Quantitative Analysis . Volume 30. Issue 4. Pages 563–579. ISSN 0022-1090. https://doi.org/10.2307/2331277 . Hautsch , N.,
Podolskij , M. 2013. Preaveraging-Based Estimation of Quadratic Variation inthe Presence of Noise and Jumps: Theory, Implementation, and Empirical Evidence.
Journal ofBusiness & Economic Statistics . Volume 31. Issue 2. Pages 165–183. ISSN 0735-0015. https://doi.org/10.1080/07350015.2012.754313 . 17 autsch , N.,
Kyj , L. M.,
Malec , P. 2015. Do High-Frequency Data Improve High-DimensionalPortfolio Allocations?
Journal of Applied Econometrics . Volume 30. Issue 2. Pages 263–290. ISSN1099-1255. https://doi.org/10.1002/jae.2361 . Hayashi , T.,
Yoshida , N. 2005. On Covariance Estimation of Non-Synchronously Observed DiffusionProcesses.
Bernoulli . Volume 11. Issue 2. Pages 359–379. ISSN 1350-7265. https://doi.org/10.3150/bj/1116340299 . Hendrych , R.,
Cipra , T. 2019. Recursive Estimation of the Exponentially Weighted MovingAverage Model.
Journal of Risk . Volume 21. Issue 6. Pages 43–67. ISSN 1465-1211. https://doi.org/10.21314/jor.2019.413 . Holý , V.,
Tomanová , P. 2018.
Estimation of Ornstein-Uhlenbeck Process Using Ultra-High-Frequency Data with Application to Intraday Pairs Trading Strategy . Working Paper. https://arxiv.org/abs/1811.09312 . Jacod , J.,
Mykland , P. A. 2015. Microstructure Noise in the Continuous Case: ApproximateEfficiency of the Adaptive Pre-Averaging Method.
Stochastic Processes and Their Applications .Volume 125. Issue 8. Pages 2910–2936. ISSN 0304-4149. https://doi.org/10.1016/j.spa.2015.02.005 . Jacod , J., Li , Y., Mykland , P. A.,
Podolskij , M.,
Vetter , M. 2009. Microstructure Noise inthe Continuous Case: The Pre-Averaging Approach.
Stochastic Processes and Their Applications .Volume 119. Issue 7. Pages 2249–2276. ISSN 0304-4149. https://doi.org/10.1016/j.spa.2008.11.004 . Kontorovich , L. 2012. Statistical Estimation with Bounded Memory.
Statistics and Comput-ing . Volume 22. Issue 5. Pages 1155–1164. ISSN 0960-3174. https://doi.org/10.1007/s11222-011-9293-5 . Loveless , J.,
Stoikov , S.,
Waeber , R. 2013. Online Algorithms in High-Frequency Trading.
Communications of the ACM . Volume 56. Issue 10. Pages 50–56. ISSN 0001-0782. https://doi.org/10.1145/2507771.2507780 . Lunde , A.,
Shephard , N.,
Sheppard , K. 2016. Econometric Analysis of Vast Covariance MatricesUsing Composite Realized Kernels and Their Application to Portfolio Choice.
Journal of Business& Economic Statistics . Volume 34. Issue 4. Pages 504–518. ISSN 0735-0015. https://doi.org/10.1080/07350015.2015.1064432 . Martens , M. P. 2004.
Estimating Unbiased and Precise Realized Covariances . Working Paper. https://ssrn.com/abstract=556118 . Nolte , I.,
Voev , V. 2008.
Estimating High-Frequency Based (Co-) Variances: A Unified Approach .Working Paper. https://ssrn.com/abstract=1003201 . Nolte , I.,
Voev , V. 2012. Least Squares Inference on Integrated Volatility and the RelationshipBetween Efficient Prices and Noise.
Journal of Business & Economic Statistics . Volume 30. Issue1. Pages 94–108. ISSN 0735-0015. https://doi.org/10.1080/10473289.2011.637876 . Ouakasse , A.,
Mélard , G. 2014. On-Line Estimation of ARMA Models Using Fisher-Scoring.
Systems Science & Control Engineering . Volume 2. Issue 1. Pages 406–432. ISSN 2164-2583. https://doi.org/10.1080/21642583.2014.912572 . Palandri , A. 2006.
Consistent Realized Covariance for Asynchronous Observations Contaminatedby Market Microstructure Noise . Working Paper. https://ssrn.com/abstract=2727826 . Sun , Y. 2006.
Best Quadratic Unbiased Estimators of Integrated Variance in the Presence of MarketMicrostructure Noise . Working Paper. https://ssrn.com/abstract=1714751 .18 iu , D. 2010. Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data. Journal of Econometrics . Volume 159. Issue 1. Pages 235–250. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2010.07.002 . Zhang , L. 2006. Efficient Estimation of Stochastic Volatility Using Noisy Observations: A Multi-Scale Approach.
Bernoulli . Volume 12. Issue 6. Pages 1019–1043. ISSN 1350-7265. https://doi.org/10.2307/25464852 . Zhang , L. 2011. Estimating Covariation: Epps Effect, Microstructure Noise.
Journal of Economet-rics . Volume 160. Issue 1. Pages 33–47. ISSN 0304-4076. https://doi.org/10.1016/j.jeconom.2010.03.012 . Zhang , L.,
Mykland , P. A.,
Aït-Sahalia , Y. 2005. A Tale of Two Time Scales: Determining Inte-grated Volatility with Noisy High-Frequency Data.
Journal of the American Statistical Association .Volume 100. Issue 472. Pages 1394–1411. ISSN 0162-1459. https://doi.org/10.2307/27590680 . Žikeš , F.,
Baruník , J. 2015. Semi-Parametric Conditional Quantile Models for Financial Returnsand Realized Volatility.
Journal of Financial Econometrics . Volume 14. Issue 1. Pages 185–226.ISSN 1479-8417. https://doi.org/10.1093/jjfinec/nbu029https://doi.org/10.1093/jjfinec/nbu029