Streaming Approach to Quadratic Covariation Estimation Using Financial Ultra-High-Frequency Data

Abstract

We investigate the computational issues related to the memory size in the estimation of quadratic covariation, taking into account the specifics of financial ultra-high-frequency data. In multivariate price processes, we consider both contamination by the market microstructure noise and the non-synchronicity of the observations. We formulate a multi-scale, flat-top realized kernel, non-flat-top realized kernel, pre-averaging and modulated realized covariance estimators in quadratic form and fix their bandwidth parameter at a constant value. This allows us to operate with limited memory and formulate this estimation as a streaming algorithm. We compare the performance of the estimators with fixed bandwidth parameter in a simulation study. We find that the estimators ensuring positive semidefiniteness require much higher bandwidth than the estimators without this constraint.

Streaming Perspective in Quadratic Covariation Estimation Using Financial Ultra-High-Frequency Data

Vladimír Holý

University of Economics, Prague, Winston Churchill Square 4, 130 67 Prague 3, [email protected]

Petra Tomanová

University of Economics, Prague, Winston Churchill Square 4, 130 67 Prague 3, [email protected]

March 31, 2020

Abstract:

We investigate the computational issues related to the memory size in the estimation of quadratic covariation using financial ultra-high-frequency data. In the multivariate price process, we consider both contamination by the market microstructure noise and the non-synchronous observations. We express the multi-scale, flat-top realized kernel, non-flat-top realized kernel, pre-averaging and modulated realized covariance estimators in a quadratic form and fix their bandwidth parameter at a constant value. This allows us to operate with limited memory and formulate such an estimation approach as a streaming algorithm. We compare the performance of the estimators with fixed bandwidth parameter in a simulation study. We find that the estimators ensuring positive semidefiniteness require a much higher bandwidth than the estimators without such a constraint.

Keywords:

Ultra-High-Frequency Data, Market Microstructure Noise, Quadratic Covariation, Streaming Algorithm.

JEL Codes:

C32, C58, C63.

In finance, Engle (2000) coined the term ultra-high-frequency data to refer to irregularly spaced time series recorded at the highest possible frequency, corresponding to each transaction or change in the bid/ask offer. Financial high-frequency time series include stock prices, foreign exchange rates and commodity prices. The availability of these high-frequency data allows econometricians to construct more precise models while facing some new challenges.

A key object in financial econometrics is the volatility of the price process. For continuous processes, volatility over a given time interval (e.g. a day) is typically measured by the quadratic variation (see e.g. Andersen et al., 2001; Barndorff-Nielsen and Shephard, 2002). In the theoretically ideal setting for the price process, it can be straightforwardly estimated by the realized variance. In practice, however, the so-called efficient price is concealed, as the observed prices are contaminated by the market microstructure noise, which makes the realized variance significantly biased at high frequencies. The market microstructure noise is caused by various frictions in the trading process such as discreteness of price values, bid-ask spread and information effects (see e.g. Hansen and Lunde, 2006).

One way to deal with the market microstructure noise is to sample the price process at lower frequencies and find the optimal trade-off between the bias due to noise and precision (see Aït-Sahalia et al., 2005; Zhang et al., 2005; Bandi and Russell, 2008). A better way is to utilize estimators robust to the noise with the full dataset. There are three dominant approaches in non-parametric estimation of quadratic variation: subsampling (see Zhang et al., 2005; Zhang, 2006; Nolte and Voev, 2012; Aït-Sahalia et al., 2011), autocovariance combining (see Barndorff-Nielsen et al., 2008, 2009) and pre-averaging (see Jacod et al., 2009; Hautsch and Podolskij, 2013; Jacod and Mykland, 2015).
These estimators have different motivations but a very similar structure in the end. Sun (2006) and Andersen et al. (2011) show that the multi-scale estimator of Zhang (2006) based on subsampling and the flat-top realized kernel estimator of Barndorff-Nielsen et al. (2008) based on autocovariance combining can be expressed in a quadratic form. Furthermore, it can be shown that the pre-averaging estimator of Jacod et al. (2009) also has the quadratic form structure. Finally, when assuming a specific model for the price process, a parametric approach to volatility estimation can also be adopted (see Aït-Sahalia et al., 2005, 2010; Xiu, 2010; Holý and Tomanová, 2018). Precise estimation of volatility is the cornerstone of derivative pricing (see e.g. Bandi et al., 2008) and risk management (see e.g. Žikeš and Baruník, 2015).

The estimation of quadratic covariation between two price processes is even more challenging due to non-synchronous trading. When the observations are simply synchronised using the previous-tick interpolation scheme, the Epps effect occurs and the realized covariance is biased (see e.g. Hayashi and Yoshida, 2005; Zhang, 2011). Zhang (2011), however, shows that subsampling in the two-scales estimator cancels not only the market microstructure noise but the Epps effect as well. Hayashi and Yoshida (2005) propose a consistent estimator for the quadratic covariation utilizing the original unaltered data. This approach is also adopted by Palandri (2006), Nolte and Voev (2008), Christensen et al. (2010) and Bibinger (2011). Finally, Martens (2004), Christensen et al. (2010) and Barndorff-Nielsen et al. (2011) synchronise observations using the refresh times of Harris et al. (1995). Estimation of vast covariation matrices is crucial in high-dimensional portfolio allocation (see e.g. Hautsch et al., 2015; Lunde et al., 2016).

From the computational perspective, it is natural to consider financial high-frequency data as a data stream.
A streaming algorithm can examine a sequence of inputs in a single pass only. The available memory for the computation is limited and cannot store all data. The precise definition varies between papers (see e.g. Kontorovich, 2012 vs. Černý, 2019). Related concepts are the online algorithm and the recursive algorithm, which focus on the updating scheme rather than the memory constraints. Examples of streaming, online and recursive algorithms in the field of econometrics include the estimation and diagnostics of the linear regression model by Černý (2019), the estimation of the ARMA model by Ouakasse and Mélard (2014), the estimation of the GARCH model by Aknouche and Guerbyenne (2006), the estimation of the EWMA model by Hendrych and Cipra (2019), the estimation of the spot volatility by Dahlhaus and Neddermeyer (2014) and the detection of changepoints by Bodenham and Adams (2017). Naturally, computationally efficient algorithms are crucial in high-frequency trading (see e.g. Christensen et al., 2012; Loveless et al., 2013; Arce et al., 2019).

In the paper, we focus on the non-parametric estimation of the quadratic covariation from the streaming perspective. We consider both the market microstructure noise and the non-synchronous trading. First, we introduce the commonly used estimators robust to the market microstructure noise: the multi-scale estimator of Zhang (2006), the flat-top realized kernel estimator of Barndorff-Nielsen et al. (2008), the non-flat-top realized kernel estimator of Barndorff-Nielsen et al. (2011), the pre-averaging estimator of Jacod et al. (2009) and the modulated realized covariance estimator of Christensen et al. (2010). Aiming at a unified and simple computational framework, we express the estimators in a quadratic form. All five estimators depend on a bandwidth parameter. The respective papers show that the optimal value of this parameter is of an order that depends on the number of observations. In contrast, we consider the bandwidth parameter to be constant.
This of course leads to sub-optimal performance but allows us to adopt a streaming algorithm with fixed memory, leading to fast computation. In a simulation study, we compare the estimators and show the impact of the constant bandwidth parameter. In the case of non-synchronous trading, we can straightforwardly synchronise the observations by the refresh times method of Harris et al. (1995) with no computational issues.

The rest of the paper is organized as follows. In Section 2, we present the standard framework for the price process and quadratic variation. In Section 3, we describe the class of quadratic estimators and our streaming approach. In Section 4, we evaluate the performance of the estimators with fixed bandwidth using simulations. We conclude the paper in Section 5.

Theoretical Framework

We utilize the standard framework for the price process. Let us denote the $m$-dimensional logarithmic efficient price as $P_t$ in continuous time $t \geq 0$. We consider the efficient price to follow a multivariate continuous Itô semimartingale given by

$$P_t = P_0 + \int_0^t \mu_s \, \mathrm{d}s + \int_0^t \sigma_s \, \mathrm{d}W_s, \tag{1}$$

where $\mu_s$ is a multivariate finite variation càdlàg drift process, $\sigma_s$ is a multivariate adapted càdlàg volatility process and $W_s$ is a vector of independent Wiener processes. This class is quite general as $\mu_s$ and $\sigma_s$ can vary over time. However, a limitation is that we do not consider jumps in this model as the process is defined to have continuous paths.

Without loss of generality, we restrict ourselves to the time interval $[0, 1]$. The quadratic covariation of the process $P_t$ on $[0, 1]$ is then given by

$$QV = \operatorname*{plim}_{\Delta_n \to 0} \sum_{i=1}^{n} \left( P_{T_i} - P_{T_{i-1}} \right) \left( P_{T_i} - P_{T_{i-1}} \right)', \tag{2}$$

where $\operatorname{plim}$ denotes the limit in probability and $\Delta_n = \max \{ T_1 - T_0, T_2 - T_1, \ldots, T_n - T_{n-1} \}$ is the maximal lag between observations of synchronized partitions $T_0 < T_1 < \cdots < T_n = 1$. Of course, as $\Delta_n \to 0$, we have that $n \to \infty$. In our case of the continuous Itô semimartingale, the quadratic variation is identical to the integrated covariance given by

$$IV = \int_0^1 \sigma_s \sigma_s' \, \mathrm{d}s. \tag{3}$$

For general semimartingales, however, they differ due to the jump component.

Let us consider that we observe the $m$-dimensional price process at non-synchronous discrete times. Furthermore, the observed price process is contaminated by the market microstructure noise. Let us denote the $k$-th univariate observed price process as $X^k_i$ at discrete times $0 \leq T^k_0 < T^k_1 < \cdots < T^k_{n_k} \leq 1$. Note that the observations can be irregularly spaced. The $k$-th component of the latent price process $P^k_t$ and the observed price process $X^k_i$ are then related as

$$X^k_i = P^k_{T^k_i} + E^k_i, \qquad E^k_i \sim (0, \omega_k^2) \quad \text{for } i = 0, \ldots, n_k, \tag{4}$$

where $E^k_i$ is the market microstructure noise with zero expected value and standard deviation $\omega_k$. At this point, we do not impose additional assumptions on the noise $E^k_i$ as various estimators require various assumptions. We refer to the respective papers of the individual estimators for more details. In an empirical study of financial markets, Hansen and Lunde (2006) show that the market microstructure noise is dependent both on its past values and the efficient price process.

Next, we synchronize observation times. Similarly to Harris et al. (1995), we define the refresh times $0 \leq T_0 < T_1 < \cdots < T_n \leq 1$ in the following way. Let the initial refresh time be

$$T_0 = \max \left\{ T^1_0, \ldots, T^m_0 \right\}. \tag{5}$$

Next, let the subsequent refresh times be

$$T_i = \min \left\{ t : t \geq T^1_{j_1} > T_{i-1},\ t \geq T^2_{j_2} > T_{i-1},\ \ldots,\ t \geq T^m_{j_m} > T_{i-1} \right\} \quad \text{for } i = 1, \ldots, n. \tag{6}$$

In other words, $T_i$ is the first time by which every asset has been observed at least once after $T_{i-1}$. Martens (2004) uses this scheme for the realized covariance. Barndorff-Nielsen et al. (2011) show that this synchronisation leads to consistent estimates of quadratic covariation by the realized kernel estimator while Christensen et al. (2010) present similar results for the modulated realized covariance estimator. After the synchronisation, we can write the $m$-dimensional observed price process as $X_i = (X^1_i, X^2_i, \ldots, X^m_i)'$ and the market microstructure noise as $E_i = (E^1_i, E^2_i, \ldots, E^m_i)'$ with synchronised times $T_i$, $i = 0, \ldots, n$. Furthermore, we define the observed returns as $Y_i = X_i - X_{i-1}$, $i = 1, \ldots, n$. Finally, let $Y = (Y_1, \ldots, Y_n) = (Y^k_i)_{k=1,i=1}^{m,n}$ denote the matrix of the observed returns with rows indicating the asset and columns corresponding to the time. Of course, not all prices are observed exactly at the same moment corresponding to a refresh time. Often, a new price of only a single asset is observed. For the other assets, the last observed price is utilized.
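The refresh-time synchronisation in (5) and (6) can be sketched in Python as a single pass over a merged tick stream. This is only an illustrative sketch, not the authors' implementation: the function name `refresh_stream`, the `(time, asset, price)` tuple layout and the zero-based asset index are assumptions made for this example. The sketch keeps only the last price of each asset and the set of assets that have traded since the previous refresh time, so its memory use does not grow with the number of ticks.

```python
from itertools import groupby
from operator import itemgetter


def refresh_stream(ticks, m):
    """Yield (refresh_time, prices) pairs from a time-sorted tick stream.

    ticks: iterable of (time, asset, price) tuples sorted by time, where
           asset is an integer in 0..m-1 (hypothetical layout).
    m:     number of assets.
    """
    last = [None] * m  # last observed price of each asset
    fresh = set()      # assets observed since the previous refresh time
    # Ticks sharing a timestamp are processed together, so an observation
    # at time t never counts towards the interval after a refresh at t.
    for t, group in groupby(ticks, key=itemgetter(0)):
        for _, k, price in group:
            last[k] = price
            fresh.add(k)
        if len(fresh) == m:
            # Every asset has a new price: t is a refresh time. Assets that
            # traded earlier in the interval contribute their last price.
            yield t, tuple(last)
            fresh.clear()


if __name__ == "__main__":
    demo = [(0.1, 0, 10.0), (0.2, 1, 20.0), (0.3, 0, 10.1), (0.5, 1, 20.1)]
    for time, prices in refresh_stream(demo, m=2):
        print(time, prices)
```

The first yielded time is the latest of the assets' first observations, as in (5), and each later one is the first time at which all assets have traded again, as in (6).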
The refresh-time scheme is similar to the previous-tick approach, but the price interpolation is performed only at refresh times instead of at all observation times.

We estimate the quadratic covariation $QV$ in the presence of the market microstructure noise by various non-parametric methods within a unified framework based on a quadratic form. Sun (2006) and Andersen et al. (2011) consider the class of quadratic estimators in the univariate case of quadratic variation. The estimators of quadratic covariation in the quadratic class based on the returns $Y_i$ can be formulated as

$$QE = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{i,j} Y_i Y_j' = Y W Y', \tag{7}$$

where $W = (w_{i,j})_{i=1,j=1}^{n,n}$ are weights associated with a given estimator. The formula can also be rewritten in a quadratic form for the actual prices $X_i$ as

$$QE = \sum_{i=0}^{n} \sum_{j=0}^{n} v_{i,j} X_i X_j' = X V X', \tag{8}$$

where $X = (X_0, \ldots, X_n)$ and the weights $V = (v_{i,j})_{i=0,j=0}^{n,n}$ are given by $V = F' W F$. The elements of matrix $F = (f_{i,j})_{i=1,j=0}^{n,n}$, which maps prices to returns through $Y_i = X_i - X_{i-1}$, are given by

$$f_{i,j} = \begin{cases} 1 & \text{for } j = i, \\ -1 & \text{for } j = i - 1, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

In general, the computation of a quadratic form cannot be formulated as a streaming algorithm as each observation is required to be stored in the memory. However, we can impose restrictions on the weight matrix $W$ in order to make the computation streaming. Let the elements of the weight matrix $W$ meet

$$w_{i,j} = \begin{cases} u_{|i-j|} & \text{for } |i-j| < h, \\ 0 & \text{otherwise,} \end{cases} \tag{10}$$

where $u = (u_0, \ldots, u_{h-1})'$ is the updating vector determining a given streaming estimator. Matrix $W$ is therefore a symmetric $(2h-1)$-diagonal matrix in which the elements in the main diagonal and in each lower and upper diagonal are the same and determined by the updating vector $u$. At time $T_i$, the quadratic estimator can then be recursively computed as

$$QE_i = QE_{i-1} + u_0 Y_i Y_i' + \sum_{j=1}^{h-1} u_j Y_i Y_{i-j}' + \sum_{j=1}^{h-1} u_j Y_{i-j} Y_i'. \tag{11}$$

Only the last $h - 1$ returns and the current value of the estimator therefore need to be kept in memory.

Weight Matrix for Returns
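The banded weight matrix for returns in (10) and the recursion (11) can be illustrated with a short Python sketch using NumPy. It is a minimal sketch under stated assumptions rather than the authors' code: the class `StreamingQE`, the helper `batch_qe` and the updating vector used in the demo are hypothetical names and values chosen for illustration, and the weights do not correspond to any of the five estimators. The streaming version stores only the previous price, the last $h - 1$ returns and the $m \times m$ running estimate, while the batch version builds the full $n \times n$ matrix $W$ and evaluates $Y W Y'$ from (7) as a check.

```python
from collections import deque

import numpy as np


class StreamingQE:
    """Recursive quadratic estimator following (11), with bandwidth h = len(u)."""

    def __init__(self, u, m):
        self.u = np.asarray(u, dtype=float)  # updating vector (u_0, ..., u_{h-1})
        self.lags = deque(maxlen=len(self.u) - 1)  # Y_{i-1}, ..., Y_{i-h+1}
        self.qe = np.zeros((m, m))  # running estimate QE_i
        self.last_price = None  # X_{i-1}

    def update(self, price):
        """Process one synchronised price vector X_i and return QE_i."""
        x = np.asarray(price, dtype=float)
        if self.last_price is not None:
            y = x - self.last_price  # return Y_i
            self.qe += self.u[0] * np.outer(y, y)
            # The deque holds the most recent return first, so lag j = 1, 2, ...
            for j, y_lag in enumerate(self.lags, start=1):
                cross = self.u[j] * np.outer(y, y_lag)
                self.qe += cross + cross.T
            self.lags.appendleft(y)
        self.last_price = x
        return self.qe


def batch_qe(prices, u):
    """Evaluate Y W Y' from (7) with the banded weights (10), storing all data."""
    u = np.asarray(u, dtype=float)
    h = len(u)
    y = np.diff(np.asarray(prices, dtype=float), axis=0).T  # m x n returns
    idx = np.arange(y.shape[1])
    lag = np.abs(idx[:, None] - idx[None, :])
    w = np.where(lag < h, u[np.minimum(lag, h - 1)], 0.0)
    return y @ w @ y.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    prices = np.cumsum(0.01 * rng.normal(size=(200, 3)), axis=0)
    u = [1.0, 0.5, 0.25]  # illustrative weights only
    est = StreamingQE(u, m=3)
    for x in prices:
        est.update(x)
    print(np.allclose(est.qe, batch_qe(prices, u)))
```

The two computations agree because each pair of returns at lag $j < h$ enters the double sum in (7) twice, once as $Y_i Y_{i-j}'$ and once as its transpose, which is exactly what the two sums in (11) add when $Y_i$ arrives.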