Malliavin-Mancino estimators implemented with non-uniform fast Fourier transforms
Patrick Chang a , Etienne Pienaar a , Tim Gebbie a a Department of Statistical Sciences, University of Cape Town, Rondebosch 7700, South Africa
Abstract
We implement and test kernel averaging Non-Uniform Fast Fourier Transform (NUFFT) methods to enhance the performance of correlation and covariance estimation on asynchronously sampled event-data using the Malliavin-Mancino Fourier estimator. The methods are benchmarked for Dirichlet and Fejér Fourier basis kernels. We consider test cases formed from Geometric Brownian motions to replicate synchronous and asynchronous data for benchmarking purposes. We consider three standard averaging kernels to convolve the event-data for synchronisation via over-sampling for use with the Fast Fourier Transform (FFT): the Gaussian kernel, the Kaiser-Bessel kernel, and the exponential of semi-circle kernel. First, this allows us to demonstrate the performance of the estimator with different combinations of basis kernels and averaging kernels. Second, we investigate and compare the impact of the averaging scales explicit in each averaging kernel and their relationship with the time-scale averaging implicit in the Malliavin-Mancino estimator. Third, we demonstrate the relationship between the time-scale averaging based on the number of Fourier coefficients used in the estimator and a theoretical model of the Epps effect. We briefly demonstrate the methods on Trade-and-Quote (TAQ) data from the Johannesburg Stock Exchange to make an initial visualisation of the correlation dynamics at various time-scales under market microstructure.
Keywords:
Malliavin-Mancino estimator, non-uniform fast Fourier transform, Trade-and-Quote event-data, Epps effect
AMS subject classifications: 62G08, 65T04, 62P08
1. Introduction
Data-informed approaches to modelling the relationships between fast asynchronous streaming event-data features require efficient algorithms to compute the dependency or similarity across data features. This can be useful to relate collections of similar features to similar, but potentially useful, information on the appropriate decision time-scale. When the dependency structure can be approximated by an averaged realised correlation or covariance matrix, the problem of estimation from asynchronous event data is significantly simplified. The problem of correlation and covariance estimation over asynchronous event data can then be addressed using the Malliavin-Mancino estimator [27, 28, 29]. This has several advantages over ad-hoc averaging and interpolation methods built on underlying assumptions of continuity, such as the approach taken in the well understood Hayashi-Yoshida estimator [4]. However, the Malliavin-Mancino estimator is built on numerically evaluating Fourier transforms and their inverses, which carries a computational cost. Quickly extracting realised correlations or covariances on a given time-scale for large feature sets of distinct asynchronous events, without biased interpolation, is key to avoiding spurious correlations that can lead to ineffective decision making under uncertainty.
Email addresses: [email protected] (Patrick Chang), [email protected] (Etienne Pienaar), [email protected] (Tim Gebbie)
This paper directly addresses two key issues: first, that of performance as measured by computational speed; second, the implicit dependence on time-scale in the estimation of realised covariances and correlations on asynchronous event data using the Malliavin-Mancino estimator. The key contribution is to mitigate the first problem using non-uniform fast Fourier transforms to compute the Fourier coefficients in the Malliavin-Mancino estimator, and to provide clarity on the second using insights from the non-uniform fast Fourier transform.

Performance is a key requirement in two related use cases, namely simulation and real-time estimation. The first is being able to carry out large scale Monte-Carlo simulations over many features and many time-scales, where one needs to iterate and recompute the correlation matrix over event data. The second is a real-time environment where decisions are being made on streaming event-data; here fast methods can reduce the time-scales of effective data-sampling. For example, the minimum effective sampling rate of correlation based state detection is bounded by the compute time of the correlation matrix. A speed improvement in the compute time of the realised covariance, or realised correlation matrix, potentially allows more time for learning algorithm convergence and identification. This can be of particular importance for learning algorithms that require many updates to identify a reliable optimal relationship between actions and system states given an objective, such as Q-learning based implementations of reinforcement learning for trading [18, 19, 21].

Concretely, we extend an approach to performance enhancement based on the fast Fourier transform [10] in the context of the Malliavin-Mancino estimator [27, 28] by using non-uniform fast Fourier transform methods [2, 12, 16].
This combines the performance advantage of fast Fourier transforms while providing intuition into the time-scale averaging. This follows from the basic idea behind the non-uniform fast Fourier transform: convolving the data onto a uniform grid (dependent on the number of Fourier coefficients required) through a choice of averaging kernel. Furthermore, the averaging kernel has an explicit averaging scale which provides avenues for controlling speed and accuracy.

We hope to follow Renò [38] and Precup and Iori [37] by using the choice of the number of Fourier coefficients N as the method of tuning the estimation to different time-scales. To implement this with confidence using NUFFT methods, we need to understand the relative dependencies between kernel averaging (proxied by the tolerances) and time-scale averaging (proxied by the number of Fourier coefficients) under simulation, to evaluate their impact on the estimated correlations. Moreover, we need to ensure the NUFFT estimates recover the same estimates as the original implementation.

To explore this idea we consider three different averaging kernels: (i) the Gaussian kernel [16] (see eq. (5)), (ii) the Kaiser-Bessel kernel [36] (see eqs. (7) and (8)), and (iii) the exponential of semi-circle kernel [2] (see eqs. (9) and (10)). In conjunction with these choices of averaging kernels, we consider two different choices of Fourier basis kernels: (i) the Dirichlet, and (ii) the Fejér basis kernels. Combinations of these are compared for data-sets of different length and breadth and for different numbers of Fourier coefficients.
This allows us to better understand the relative algorithm performance by comparing algorithm compute times across data-sizes and various tolerance levels (see Figures 4 and 5).

These combinations of kernel choices are benchmarked against three vanilla algorithms that implement the Malliavin-Mancino estimator: (i) the benchmark "for-loop" implementation first provided by Mancino, Recchioni and Sanfelici [29], (ii) a vectorised implementation with speed enhancements assuming real-valued data [19, 25, 26], and (iii) a zero-padded Fast Fourier implementation [20, 26] that allows the use of the fast Fourier transform on asynchronous data without the need to apply an averaging kernel, but using an underlying missing data approach to implement lossless interpolation.

An important observation here is that using the zero-padded FFT to compute the Malliavin-Mancino estimator can only work for uniformly sampled data that has missing data points, and fails for truly asynchronous data (see Figure 6 and Section 2.2.4). This is the key motivation for requiring a non-uniform FFT when speeding up the compute time of the Malliavin-Mancino estimator with fast Fourier transform methods on asynchronous event data. The zero-padded FFT biases the data, while the non-uniform FFT, correctly used, does not. It is for this reason that we promote the use of the NUFFT in conjunction with the Malliavin-Mancino estimator when the data is asynchronous, discrete and event driven.

The paper is organised as follows: in Section 2 we outline the various implementation methods for the Malliavin-Mancino estimator. In Section 3 we benchmark the various algorithms to understand the factors impacting speed. Moreover, we determine the conditions required for the NUFFT implementation to recover the correct estimates.
In Section 4 we demonstrate the link between the number of Fourier coefficients and the implicit time-scale investigated, along with its relation to a theoretical model of the Epps effect. We then carry out exploratory data analysis (EDA) on real world TAQ data to investigate the correlation dynamics under market microstructure. We conclude in Section 5 with a summary of our findings.
2. Algorithm Outline
Malliavin and Mancino [27, 28] proposed an estimator that is constructed in the frequency domain. It expresses the Fourier coefficients of the volatility process using the Fourier coefficients of the price process p_i(t) = \ln(S_i(t)), where S_i(t) is the generic asset price at time t. By re-scaling the trading times from [0, T] to [0, 2\pi] (see algorithm 4) and using the Bohr convolution product (see Theorem 2.1 of [28]) we have that for all k \in \mathbb{Z} and N samples:

    F(\Sigma^{ij})(k) = \lim_{N \to \infty} \frac{2\pi}{2N+1} \sum_{|s| \le N} F(dp_i)(s) \, F(dp_j)(k - s).    (1)

Here F(*)(\star) is the \star-th Fourier coefficient of the * process. Now using previous tick interpolation to avoid a downward bias in the estimator [3], and a simple function approximation for the Fourier coefficients (see [4, 25, 28]), we obtain the Dirichlet representation of the integrated volatility/co-volatility estimator:

    \hat{\Sigma}^{ij}_{n,N} = \frac{1}{2N+1} \sum_{|s| \le N} \sum_{h=1}^{n_i - 1} \sum_{\ell=1}^{n_j - 1} e^{\mathrm{i} s (t^j_\ell - t^i_h)} \, \delta_i(I_h) \, \delta_j(I_\ell),    (2)

where (t^i_h)_{h=1,...,n_i} and (t^j_\ell)_{\ell=1,...,n_j} are the observation times for assets i and j, and the price fluctuations are:

    \delta_i(I_h) = p_i(t^i_{h+1}) - p_i(t^i_h),  \qquad  \delta_j(I_\ell) = p_j(t^j_{\ell+1}) - p_j(t^j_\ell),

for the i-th and j-th asset respectively. Note that n_i is the sample dimension for price p_i and n_j that of the price p_j, which a priori can be different.

An alternate version of the Fourier estimator is the Fejér representation:¹

    \Sigma^{ij}_{n,N} = \frac{1}{N+1} \sum_{|s| \le N} \sum_{h=1}^{n_i - 1} \sum_{\ell=1}^{n_j - 1} \left(1 - \frac{|s|}{N}\right) e^{\mathrm{i} s (t^j_\ell - t^i_h)} \, \delta_i(I_h) \, \delta_j(I_\ell),    (3)

which is more stable in the presence of market microstructure noise [28].

¹We try to follow the notation of [27, 28, 29, 30] where \mathrm{i} \in \mathbb{C} in the exponential defining the Fourier transform is such that Re(i) = 0 and Im(i) = 1. It should not be confused with integer indices i, for example on the times t^i_h.

The various implementation methods follow the same general structure (outlined in algorithm 3). First, we re-scale the trading times from [0, T] to [0, 2\pi] (see algorithm 4) and compute the Nyquist frequency (see algorithm 2). Second, we compute the non-normalised Fourier coefficients F(dp_i)(k) for k \in \{-N, ..., N\} for all assets. Finally, we compute either the Dirichlet or Fejér representation of the estimator. The difference between the implementation methods lies in the computation of the Fourier coefficients.

The computationally intensive step in the Malliavin-Mancino estimator is the computation of the Fourier coefficients for each asset, defined as:

    F(dp_i)(k) = \frac{1}{2\pi} \sum_{h=1}^{n_i - 1} \delta_i(I_h) \, e^{-\mathrm{i} k t^i_h},    (4)

for k \in \{-N, ..., N\}, and i = 1, ..., D features. We outline the various methods to evaluate eq. (4) along with the use-case, benefits, pitfalls and general algorithm complexity.

The Mancino et al. implementation (see algorithm 5) is from the appendix of [29] and uses a for-loop construction. The evaluation relies on looping through {-N, ..., N} to compute the k-th Fourier mode. The implementation does not rely on any techniques to improve performance and will act as a benchmark to compare against other methods. The method can be used for all synchronous and asynchronous cases. The complexity is O(n^2), the same as Discrete Fourier Transforms (DFTs).

The legacy code implementation (see algorithm 6) is based on a MATLAB implementation [20, 26]. The difference compared to the Mancino et al. implementation is that all the Fourier modes are evaluated in parallel by vectorising the computation.
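As a concrete reference point, eq. (4) can be evaluated by direct summation, in the spirit of the for-loop implementation. The sketch below is an illustrative NumPy translation (the paper's implementations are Julia scripts [5]); the function name and toy data are ours:

```python
import numpy as np

def fourier_coefficients_direct(t, p, N):
    """Evaluate eq. (4): F(dp_i)(k) = (1/2pi) * sum_h delta_i(I_h) e^{-i k t_h},
    for k = -N, ..., N, given times t (re-scaled to [0, 2pi]) and log-prices p."""
    delta = np.diff(p)            # price fluctuations delta_i(I_h)
    th = t[:-1]                   # previous-tick times t_h
    k = np.arange(-N, N + 1)      # Fourier modes
    # Outer product gives the (2N+1) x (n-1) matrix of phases e^{-i k t_h}
    return (np.exp(-1j * np.outer(k, th)) @ delta) / (2 * np.pi)

# Toy example: 5 log-price observations on a re-scaled time grid in [0, 2pi]
t = np.array([0.0, 1.0, 2.5, 4.0, 5.5])
p = np.log(np.array([100.0, 101.0, 100.5, 102.0, 101.5]))
F = fourier_coefficients_direct(t, p, N=2)
# The zeroth mode (index N) is the total price change scaled by 1/2pi:
assert np.isclose(F[2], (p[-1] - p[0]) / (2 * np.pi))
```

The cost of this direct evaluation is O(nN), which is what the faster methods below avoid.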
Here we further improve upon the legacy code by exploiting techniques found in Mancino et al. [29]. Concretely, we exploit the Hermitian symmetry F(dp_i)(k) = \overline{F(dp_i)(-k)}, where \overline{F} denotes the complex conjugate of F. This is possible because the source strengths \delta_i(I_h) are all real-valued. Therefore, we only need to evaluate k \in \{1, ..., N\} and obtain the conjugates for these Fourier modes. Finally, F(dp_i)(0) = \sum_{h=1}^{n_i - 1} \delta_i(I_h) / 2\pi must be computed to complete the range of Fourier modes required for the convolution. The method can be used for all synchronous and asynchronous cases with a complexity of O(n^2).² The key concern with this method is the memory usage constraints it can face. Inspecting algorithm 6, we see that a large matrix of size (n \times N) is required for the vectorisation, which can adversely affect performance by either: (i) prematurely ending the computation due to insufficient memory or heap-size constraints, or (ii) slowing down performance due to an over-reliance on virtual-memory management. Therefore, memory management is crucial for effective performance enhancement on large data-sets.

²Mancino et al. [29] pick N such that the MSE is minimised. The use-case refers to the ability to evaluate synchronous or asynchronous time-series data. The complexity is given only for the synchronous case when n_1 = n_2 = ... = n_D, N = n/2, and t^i_{h+1} - t^i_h = \Delta t for all i = 1, ..., D and all h = 1, ..., n-1.

The FFT implementation used here is the current state-of-the-art FFTW package [14], based on the Cooley-Tukey algorithm [10], to compute the Fourier modes. This implementation also exploits the Hermitian symmetry, making it the fastest implementation known to the authors. This implementation has a well understood complexity of O(n log n). The key constraint of this method is its restriction to strictly synchronous data, so that the evaluation becomes a simple DFT.
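The Hermitian-symmetry saving can be sketched as follows: only the modes k = 0, ..., N are evaluated, and the negative modes are obtained by conjugation. This is an illustrative NumPy sketch with our own helper names, not the paper's Julia code:

```python
import numpy as np

def fourier_coefficients_symmetric(t, p, N):
    """Exploit F(dp_i)(-k) = conj(F(dp_i)(k)), valid for real-valued increments:
    evaluate only k = 0, ..., N, then mirror the rest with conjugation."""
    delta = np.diff(p)
    th = t[:-1]
    k_pos = np.arange(0, N + 1)
    F_pos = (np.exp(-1j * np.outer(k_pos, th)) @ delta) / (2 * np.pi)
    # Assemble [conj(F(N)), ..., conj(F(1)), F(0), F(1), ..., F(N)]
    return np.concatenate([np.conj(F_pos[:0:-1]), F_pos])

t = np.linspace(0.0, 2 * np.pi, 6)[:-1]   # 5 sample times in [0, 2pi)
p = np.log(np.array([100.0, 100.8, 100.2, 101.1, 100.9]))
F_sym = fourier_coefficients_symmetric(t, p, N=3)
```

The phase matrix is now (N+1) x (n-1) rather than (2N+1) x (n-1), roughly halving both the work and the memory footprint.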
The Zero-padded FFT (ZFFT) implementation extends the FFT implementation by zero-padding missing observations, thereby allowing the computation of the asynchronous case under a missing data representation (see algorithm 7). The implementation computes the minimum sampling interval \Delta t and creates a new over-sampled grid with intervals \Delta t. The observations are then placed at the nearest neighbour on the over-sampled grid.³ The FFT algorithm is then applied to the new over-sampled grid. The implementation retains a complexity of O(n log n) but is slower than the FFT implementation since it does not exploit the Hermitian symmetry and requires the additional step of creating an over-sampled grid. This is our benchmark asynchronous approach to the fast Fourier transform.

³It is also recommended that this be implemented so as to preserve the filtration structure of time-series events by moving to the nearest right neighbour, so that information in the future is shifted, at worst, further into the future, but never into the past relative to a particular time, to avoid temporal contamination.

Figure 1: A toy example to show how the zero-padding works for the zero-padded FFT implementation. Here the minimum sampling interval is \Delta t = 1.0. A new uniform over-sampled grid is created and observations are placed on the nearest neighbouring point of the over-sampled grid.

Figure 1 demonstrates two points: (i) how the zero-padded implementation works, and (ii) why the implementation does not work for the asynchronous case using an arrival time representation. The original grid has equal spacing \Delta t when asynchrony is induced using a missing data representation. Therefore, the over-sampled grid will be at the same time points as the original grid, with a value of zero when there is a missing observation. This means there is no shifting of time points, allowing the correct recovery of eq. (4).
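Under a missing data representation, the zero-padding step can be sketched as below. This is a hedged Python illustration (grid construction and names are ours); it is exact only when the event times already sit on a uniform sub-grid of [0, 2\pi):

```python
import numpy as np

def zero_padded_fft_coefficients(t, delta, N):
    """ZFFT sketch: snap increments onto a uniform grid with spacing
    dt = minimum sampling interval, leave zeros where data is missing,
    then read eq. (4) off a standard FFT. Correct only for missing-data
    asynchrony (times already on a uniform grid)."""
    dt = np.min(np.diff(t))
    n_grid = int(round(2 * np.pi / dt))            # uniform grid over [0, 2pi)
    f = np.zeros(n_grid)
    idx = np.round(t / dt).astype(int) % n_grid    # nearest grid point
    np.add.at(f, idx, delta)                       # place increments, zeros elsewhere
    modes = np.fft.fft(f)                          # modes[m] = sum_l f_l e^{-2pi i m l / n_grid}
    # Grid point l sits at time 2*pi*l/n_grid, so F(dp)(k) = modes[k]/(2*pi),
    # with negative k wrapping around the FFT output.
    k = np.arange(-N, N + 1)
    return modes[k % n_grid] / (2 * np.pi)

# Missing-data example: times on a uniform grid of 8 points over [0, 2*pi),
# with the observation in slot 2 missing.
slots = np.array([0, 1, 3, 4])
t = 2 * np.pi * slots / 8
delta = np.array([0.01, -0.02, 0.015, 0.005])
F_zfft = zero_padded_fft_coefficients(t, delta, N=2)
```

For times that do not sit on a uniform grid, the snapping step shifts the time points and the recovery of eq. (4) fails, which is exactly the failure mode discussed next.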
However, the original grid does not have equal spacing \Delta t when asynchrony is induced using an arrival time representation. This results in the time points being shifted (seen in the third arrow from the left in Figure 1) and the incorrect recovery of eq. (4).

The Non-Uniform FFT (NUFFT) implementation of the Malliavin-Mancino estimator is the main contribution of this paper. We want a fast algorithm to evaluate eq. (4) when (t^i_h)_{h=1,...,n_i} are non-uniformly spaced in [0, 2\pi]. This can be achieved by using the 1-dimensional "type 1" NUFFT [2, 16] (also known as the adjoint NUFFT [36]). We adopt the popular NUFFT algorithm [2, 16, 36]: (i) convolve the non-uniform source points onto an over-sampled uniform grid, (ii) apply the FFT on the uniform up-sampled grid, and (iii) deconvolve the effects of the convolution in the Fourier space.

The convolution is achieved with a kernel \varphi(x). We consider the three most popular kernels:⁴ the Gaussian kernel using the fast Gaussian gridding implementation from [16], the Kaiser-Bessel kernel using the implementation approach of [36], and the exponential of semi-circle used by the state-of-the-art FINUFFT package [2].

⁴The choice of kernel has a fascinating history and has a significant impact on the speed of NUFFTs. We refer the reader to [2] for further details.

To set the theoretical scene: let M = 2N + 1 be the number of Fourier modes we want returned, \sigma be the over-sampling ratio (most studies have settled on \sigma = 2 [2]), \xi_\ell be the \ell-th location on the over-sampled grid of size M_r = \sigma M with \ell \in \{0, ..., M_r - 1\}, and \omega be the spreading width with M_{sp} as the spreading in each direction. The Gaussian kernel and its Fourier transform are defined as:

    \varphi_G(x) = e^{-x^2 / 4\tau}   and   \hat{\varphi}_G(k) = \sqrt{4\pi\tau} \, e^{-k^2 \tau}.    (5)

Here \tau is defined as

    \tau = \frac{1}{M^2} \frac{\pi}{\sigma(\sigma - 0.5)} M_{sp}.    (6)

The Kaiser-Bessel pair is defined as:

    \varphi_{KB}(x) = \frac{1}{\pi} \begin{cases} \dfrac{\sinh\left(b \sqrt{M_{sp}^2 - M_r^2 x^2}\right)}{\sqrt{M_{sp}^2 - M_r^2 x^2}} & |x| \le \frac{M_{sp}}{M_r}, \\ \dfrac{\sin\left(b \sqrt{M_r^2 x^2 - M_{sp}^2}\right)}{\sqrt{M_r^2 x^2 - M_{sp}^2}} & \text{otherwise}, \end{cases}    (7)

and

    \hat{\varphi}_{KB}(k) = \frac{1}{M_r} I_0\left(M_{sp} \sqrt{b^2 - (2\pi k / M_r)^2}\right),    (8)

where b = \pi(2 - 1/\sigma) and I_0(\cdot) is the modified zero-order Bessel function [36]. Finally, the exponential of semi-circle pair is defined as:

    \phi_{ES}(x) = \begin{cases} e^{\beta\left(\sqrt{1 - x^2} - 1\right)} & |x| \le 1, \\ 0 & \text{otherwise}, \end{cases}    (9)

and

    \hat{\phi}_{ES}(k) = \int_{-\infty}^{\infty} \phi_{ES}(x) \, e^{\mathrm{i} k x} \, dx,    (10)

where \beta = 2.3\omega. The kernel is re-scaled to have support on [-\alpha, \alpha] with \alpha = \pi\omega / M_r. Thus the re-scaled kernel is \varphi_{ES}(x) = \phi_{ES}(x/\alpha) and \hat{\varphi}_{ES}(k) = \alpha \hat{\phi}_{ES}(\alpha k). The exponential of semi-circle kernel has no known analytic Fourier transform; therefore, numerical integration is used to obtain \hat{\varphi}_{ES}(k). See [2] for more details on their implementation.

We focus our attention on the implementation for the various kernels, which have different periodicity. The Gaussian and exponential of semi-circle are 2\pi-periodic with domain on [0, 2\pi] [2, 16], while the Kaiser-Bessel kernel is 1-periodic with domain on [0, 1].⁵ Therefore \xi_\ell, t^i_h \in [0, 2\pi] for the Gaussian and exponential of semi-circle kernels, and \xi_\ell, t^i_h \in [0, 1] for the Kaiser-Bessel kernel.

⁵Potts and Steidl [36] have domain on [-1/2, 1/2], but we change it to [0, 1] for simpler implementation. The actual domain is not important provided the periodicity is correct; this is because all that matters for the convolution is the distances between t^i_h and \xi_\ell.

Now let p be the periodicity; then the periodisation of the kernel is

    \tilde{\varphi}(x) = \sum_{r = -\infty}^{\infty} \varphi(x - rp).    (11)

Hence the source strength on the over-sampled grid is given by the periodic discrete convolution

    f_{\varphi, dp_i}(\xi_\ell) = \sum_{h=1}^{n_i - 1} \delta_i(I_h) \, \tilde{\varphi}(\xi_\ell - t^i_h),   for \ell = 0, ..., M_r - 1.    (12)

The full derivation of eq. (12) can be found in [2, 16, 36]. The second step is to evaluate the DFT of the over-sampled grid using the standard FFT:

    F_\varphi(dp_i)(k) = \sum_{\ell = 0}^{M_r - 1} f_{\varphi, dp_i}(\xi_\ell) \, e^{-2\pi \mathrm{i} k \ell / M_r},   for k = -M_r/2, ..., M_r/2 - 1.    (13)

The final step, as a consequence of the convolution theorem, is to correct the effects of the convolution and retain the M central frequencies [2]:

    F(dp_i)(k) = F_\varphi(dp_i)(k) / \hat{\varphi}(k),   for k = -N, ..., N.    (14)

The Fourier coefficient F(dp_i)(k) in eq. (14) is the evaluation of F(dp_i)(k) in eq. (4) using non-uniform fast Fourier techniques. The level of numerical accuracy between eq. (14) and eq. (4) can be measured as the relative \ell_2-norm of the error in the output vector; writing \tilde{F}(dp_i) for the eq. (14) evaluation, it is defined as:

    \epsilon = \frac{\| \tilde{F}(dp_i) - F(dp_i) \|_2}{\| F(dp_i) \|_2}.    (15)

Moreover, the desired level of accuracy can be controlled by the amount of spreading in each direction M_{sp} (in terms of the number of grid points). We found that setting M_{sp} = \lfloor -\ln(\epsilon)(\sigma - 0.5)/(\pi(\sigma - 1)) + 1 \rfloor for the Gaussian kernel, M_{sp} = \lfloor \frac{1}{2}(\lceil \log_{10}(1/\epsilon) \rceil + 2) \rfloor for the Kaiser-Bessel kernel, and M_{sp} = \lfloor \frac{1}{2}(\lceil \log_{10}(1/\epsilon) \rceil + 2) \rfloor + 2 for the exponential of semi-circle kernel allows us to achieve the desired relative error level.

At first glance, eq. (12) seems far more expensive than it actually is. This is based on two observations: first, the kernels in eqs. (5), (7) and (9) are sharply peaked, in a manner such that the contribution of \delta_i(I_h) to grid points outside the kernel width is zero (the kernels have small numerical support). Second, the evaluation of \tilde{\varphi}(\cdot) is unnecessary; we only need to evaluate \varphi(\cdot) (see Figure 2). This is because the purpose of \tilde{\varphi}(\cdot) is to account for the periodicity when spreading near the end points of the over-sampled grid.
Using these observations, we can efficiently implement eq. (12) by looping through the source points. Find the nearest up-sampled grid point \xi_{\ell^*} that is less than or equal to t^i_h. Spread to the s \in \{-M_{sp}, ..., M_{sp}\} nearest grid points \xi_{\ell^* - s} with weight \delta_i(I_h) \, \varphi(t^i_h - \xi_{\ell^*} + s p/M_r) (the grid spacing is p/M_r), subject to the condition that when \ell^* - s < 0 the index becomes \ell^* - s + M_r, and when \ell^* - s \ge M_r the index becomes \ell^* - s - M_r, to obtain the correct indices under the periodicity.

The method can be used for all synchronous and asynchronous cases and has a complexity of O(M_r log M_r + n |log(\epsilon)|) [2].⁶

⁶We tuned M_{sp} such that it always strictly achieves the desired error level. We note that our choice of M_{sp} is stricter than that in the literature. Specifically, [2] set \omega = \lceil \log_{10}(1/\epsilon) \rceil + 1. NUFFT error is the test script to check that the desired relative \ell_2 error is strictly achieved; it can be found in the GitHub resource [5].

Figure 2: A toy example to show how the spreading works for a single source point t^i_h. The up-sampled grid is denoted by \xi_\ell, and the source strength is spread to the nearest grid points as \delta_i(I_h) \varphi(\xi_\ell - t^i_h). The figure aims to show that we only need to evaluate \varphi(\xi_\ell - t^i_h) instead of \tilde{\varphi}(\xi_\ell - t^i_h). The grid points \xi_{\ell^*} and \xi_0 (denoted by a red star) are the same point due to the periodicity, but the distance \xi_0 - t^i_h is large, resulting in \varphi(\xi_0 - t^i_h) \approx 0. \tilde{\varphi}(\xi_0 - t^i_h) fixes this by accounting for the periodicity. We can avoid the unnecessary computation of eq. (11) by contributing \delta_i(I_h) to f_{\varphi,dp_i}(\xi_0) with \delta_i(I_h) \varphi(\xi_{\ell^*} - t^i_h).
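The three steps (spread with periodic index wrapping, FFT, deconvolve) can be put together in a short sketch. This is an illustrative NumPy version with Gaussian gridding, not the paper's Julia implementation; the grid size, \tau of eq. (6) and the deconvolution constant follow a Greengard-Lee style derivation and should be read as assumptions of this sketch:

```python
import numpy as np

def nufft_type1_gaussian(t, delta, N, sigma=2, M_sp=12):
    """Sketch of the type-1 NUFFT for eq. (4): spread -> FFT -> deconvolve."""
    M = 2 * N + 1                     # Fourier modes wanted
    M_r = sigma * M                   # over-sampled grid size
    tau = (1.0 / M**2) * np.pi * M_sp / (sigma * (sigma - 0.5))   # eq. (6)
    h = 2 * np.pi / M_r               # grid spacing; xi_l = l * h

    # Step (i): spread each source onto the nearest grid points either side,
    # wrapping indices to respect the 2*pi periodicity (eq. (12)).
    f = np.zeros(M_r)
    for th, dh in zip(t, delta):
        l_star = int(np.floor(th / h))               # nearest grid point <= th
        for s in range(-M_sp, M_sp + 1):
            l = (l_star - s) % M_r                   # periodic index wrap
            x = th - (l_star - s) * h                # distance to unwrapped grid point
            f[l] += dh * np.exp(-x**2 / (4 * tau))   # Gaussian kernel, eq. (5)

    # Step (ii): FFT of the over-sampled grid (eq. (13)).
    F_grid = np.fft.fft(f)

    # Step (iii): deconvolve with phi_hat(k) = sqrt(4*pi*tau)*exp(-k^2*tau)
    # and keep the M central frequencies (eq. (14)).
    k = np.arange(-N, N + 1)
    phi_hat = np.sqrt(4 * np.pi * tau) * np.exp(-k**2 * tau)
    return F_grid[k % M_r] / (M_r * phi_hat)

# Accuracy check in the sense of eq. (15): relative l2 error vs direct summation
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 2 * np.pi, 100))
delta = rng.normal(0, 0.01, 100)
N = 20
F_fast = nufft_type1_gaussian(t, delta, N)
k = np.arange(-N, N + 1)
F_slow = np.exp(-1j * np.outer(k, t)) @ delta / (2 * np.pi)
rel_err = np.linalg.norm(F_fast - F_slow) / np.linalg.norm(F_slow)
```

With M_sp = 12 spreading points in each direction, the relative error is far below typical tolerance requests, illustrating how the spreading width trades speed for accuracy.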
The use of non-uniform FFT methods presents not only a speed advantage, but also provides insight into: (i) the implicit time-scale averaging (controlled by N) in the Malliavin-Mancino estimator, and (ii) the interpolation of financial data.

First, the Malliavin-Mancino estimator aims to represent the Fourier coefficients of the volatility process as a function of the Fourier coefficients of the price process. Therefore, investigation into different time scales of the volatility process is limited by the sampling rate of the price process. If the highest sampling rate present in the data is 2N, then the Nyquist frequency is 0.5 \times 2N = N — the highest component frequency we can investigate without introducing aliasing. This means we are band-limited to frequencies \le N.

To reconstruct the volatility process at the Nyquist frequency, we require at least 2N samples. This condition is satisfied by construction of the Bohr convolution product with Fourier modes ranging over {-N, ..., N} — resulting in a sampling frequency of M = 2N + 1 samples.

The relation between the number of Fourier modes and the sampling interval is simply T/M = \Delta t. Therefore, we can investigate different time scales by investigating different frequency ranges. This is a consequence of the sampling theorem — which in essence states that in order to perfectly reconstruct a certain frequency, one needs at least twice as many samples. By reducing the number of Fourier modes (investigating larger time scales), we are reducing the number of samples and thereby aliasing the larger frequencies. With this in mind, we are able to investigate different time scales by perfectly reconstructing frequencies < 0.5M at the cost of aliasing frequencies \ge 0.5M — a result used by [37, 38].

The insight from NUFFT methods is that the relation between N and the time scale is demonstrated more intuitively (see Figure 3).
For fixed M_{sp}, when N is small, M_r is also small. Therefore, the M_r grid points will be more spread out and each grid point will have contributions from multiple source strengths, averaged based on the choice of kernel \varphi(\cdot). When N is large, M_r will also be large. This means the M_r grid points are more tightly packed and fewer grid points will have contributions from separate source strengths — essentially there is less averaging.

Figure 3: A toy example to show how the choice of N impacts averaging and the time scale. For fixed M_{sp} = 6, when N is small (e.g. N = 5), M_r is also small (M_r = 22). The M_r grid points will be more spaced out and each grid point will have contributions from most of the source strengths \delta_i(I_h); therefore investigating larger time scales by averaging. On the other hand, when N is large (e.g. N = 40), M_r is also large (M_r = 162). The M_r grid points will be more closely spaced; only some will have contributions from separate source points, and the majority will have no contributions. This illustrates the intuitive idea of how changing N allows one to investigate different time scales using the ideas from NUFFTs.

Second, the interpolation is explicit in NUFFT methods. This is interesting because interpolation of financial data can result in estimates being biased — such as linear interpolation [3], or interpolation based on underlying continuity assumptions such as in the Hayashi-Yoshida estimator [4]. We argue these methods are flawed because they do not account for the effects of interpolation — whereas NUFFT methods account for this by deconvolving the interpolation effects in the Fourier space.

Before moving on, we highlight that one of the main characteristics of the Malliavin-Mancino estimator is that it does not require the manipulation of the original data in the computation of eq. (4).
However, there is an explicit averaging step in the evaluation of the Fourier coefficients using NUFFT methods. Therefore, we need to carefully examine whether the use of eq. (14) rather than eq. (4) in eq. (1) will affect the resulting estimates (see Section 3.2).
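The relation \Delta t = T/M between the number of Fourier modes and the implied sampling interval discussed above can be made concrete. A small illustrative computation (the horizon, function name and targets are our own assumptions, not values from the paper):

```python
# Choosing N to target a time scale: with M = 2N + 1 modes over a horizon T,
# the implied sampling interval is dt = T / M.
T = 6.5 * 60 * 60   # e.g. a 6.5 hour trading day in seconds (illustrative)

def n_for_timescale(T, dt):
    """Number of Fourier modes N so that T / (2N + 1) is approximately dt."""
    M = int(T // dt)
    return max((M - 1) // 2, 1)

N_1s = n_for_timescale(T, 1.0)     # target a ~1 second time scale
N_5m = n_for_timescale(T, 300.0)   # target a ~5 minute time scale
```

Shrinking N from N_1s towards N_5m discards the higher frequencies and so widens the implied averaging interval, which is exactly the mechanism used in Section 4 to trace out the Epps effect.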
3. Algorithm Performance and Benchmarking
The benchmarking is done using Monte Carlo simulations.⁷ We compare the relative performance of the algorithms and investigate the various factors influencing speed and accuracy. We use the Geometric Brownian Motion (GBM), which satisfies the following SDEs:

    \frac{dS_i(t)}{S_i(t)} = \mu_i \, dt + \sigma_i \, dW_i(t),   i = 1, ..., D,    (16)

with Corr(dW_i, dW_j) = \rho_{ij}. The GBM is simulated using the Euler-Maruyama scheme (which is strong order 0.5 in the sense of [23]) with equal spacing \Delta t between the observations (see algorithm 1), observed at the same times across the features. This is known as the synchronous case. Asynchrony is then induced from the synchronous case using two approaches: (i) the missing data representation, and (ii) the arrival time representation. The missing data representation is achieved by randomly sampling and removing a certain percentage of observations. The arrival time representation is achieved by sampling the synchronous price path using exponential inter-arrival times with rate \lambda_i.

⁷All the seeds for replication of the work are provided in the respective script files from our GitHub resource [5].

The common factors affecting the computation time for all the algorithms are: (i) the number of data points n_1, ..., n_D, (ii) the number of Fourier coefficients M = 2N + 1, and (iii) the number of features D. The parameter specific to the non-uniform FFT method is the tolerance \epsilon, which determines the spreading width \omega.

First, we investigate the common factors affecting computation time for the various algorithms. To this end, we investigate the computation time as a function of the number of data points for a synchronous GBM (n_1 = n_2 = ... = n_D). The synchronous GBM is used because the Nyquist frequency is N = n/2 for the synchronous case.
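The simulation setup above can be sketched in Python. The paper's simulations are Julia scripts [5]; the names and parameter values here are illustrative:

```python
import numpy as np

def gbm_euler(n, D, mu, sigma, rho, dt, S0=100.0, seed=1):
    """Euler-Maruyama scheme for the correlated GBM system of eq. (16):
    dS_i/S_i = mu_i dt + sigma_i dW_i, with Corr(dW_i, dW_j) = rho_ij."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(rho)                        # correlate Brownian increments
    dW = rng.normal(0.0, np.sqrt(dt), (n - 1, D)) @ L.T
    S = np.empty((n, D))
    S[0] = S0
    for step in range(1, n):
        S[step] = S[step - 1] * (1.0 + mu * dt + sigma * dW[step - 1])
    return S

def missing_data(t, x, frac, seed=2):
    """Missing data representation: randomly drop a fraction of observations."""
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.choice(len(t), size=int(len(t) * (1 - frac)), replace=False))
    return t[keep], x[keep]

def arrival_times(t, x, lam, seed=3):
    """Arrival time representation: sample the path at exponential
    inter-arrival times with rate lam (previous-tick sampling)."""
    rng = np.random.default_rng(seed)
    arrivals, s = [], 0.0
    while s < t[-1]:
        s += rng.exponential(1.0 / lam)
        if s < t[-1]:
            arrivals.append(s)
    idx = np.searchsorted(t, arrivals, side="right") - 1
    return t[idx], x[idx]

# Two correlated features, synchronous case
rho = np.array([[1.0, 0.35], [0.35, 1.0]])
dt = 1.0 / 86400
S = gbm_euler(n=1000, D=2, mu=np.array([0.01, 0.01]),
              sigma=np.array([0.1, 0.2]), rho=rho, dt=dt)
t = np.arange(1000) * dt
```

Applying `missing_data` keeps the surviving observations on the original uniform grid (the ZFFT-friendly case), while `arrival_times` produces genuinely non-uniform times (the case that requires a NUFFT).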
Therefore, the number of Fourier coefficients scales linearly with the number of data points.

In Figure 4 we compare the following algorithms: the for-loop implementation (MRS), the vectorised implementation (CFT), the FFT implementation (FFT), the zero-padded FFT implementation (ZFFT) and the fast Gaussian gridding implementation of the NUFFT (FGG) using the default tolerance \epsilon.⁸ The plots show compute time on a log scale⁹ for better comparison, and include the Dirichlet and Fejér representations for D = 2, 10 and 100 features.¹⁰ Taking a closer look at Figure 4 we notice several things.

First, the for-loop and vectorised implementations take on the same general shape, but the vectorised implementation is faster due to the exploitation of the Hermitian symmetry. Figures 4a and 4b demonstrate the limitation of the vectorisation: memory usage. This is because each element in the complex matrix (of size n \times N) requires 16 bytes to store the Complex 64-bit floating point number. For n = 5 \times 10^4 data points we require 20GB of memory, therefore demanding the use of virtual memory, which results in a deterioration of performance.

⁸The benchmarking is done using a 2.5GHz base clock speed Quad-Core Intel i7-4870HQ with 16GB of 1600MHz DDR3L (CL=11) RAM on MacOS version 10.15.1 with JuliaPro version 1.2.0. GCC8 is used as a requirement for the Julia interface to FINUFFT provided by [22].

⁹The compute time is the minimum estimate over 10 replications, as the minimum is a robust estimator for the location parameter of the time distribution [8].

¹⁰The induced correlation matrices for D = 10 and 100 are created using a uniform random matrix and re-scaled appropriately. The function gencovmatrix is provided in our GitHub resources [5]. We use a uniform random matrix; such a choice will only produce positive correlations, but is computationally convenient and has no influence on estimates of the compute times.
Figure 4: We investigate the algorithm complexity of traditional implementation methods against fast Fourier methods. We plot the logarithm of compute time (measured in seconds) as a function of the number of data points n for various implementation methods. Here n is the number of price observations simulated using a Geometric Brownian Motion with algorithm 1. We investigate the synchronous case since the Nyquist cutoff scales linearly with the number of data points when the data points are uniform in time. Furthermore, we obtain run times for the Dirichlet and Fejér kernel basis for D = 2, 10 and 100 features to investigate the impact breadth has on the run time. The traditional methods investigated are the vectorised implementation (CFT - blue dashes) and the "for-loop" implementation (MRS - orange dashes). The fast Fourier methods investigated are the FFT (FFT - green dots), zero-padded FFT (ZFFT - purple dash-dots) and the NUFFT implementation using the fast Gaussian gridding (FGG - dark green dash-dot-dots) with the default tolerance \epsilon. The figures demonstrate the efficacy of fast Fourier methods in reducing compute time for both basis kernels of the Malliavin-Mancino estimator. The figures can be recovered using the Julia script files Dirichlet Timing and Fejer Timing on the GitHub resource [5].

Table 1: The measured compute times are given in seconds for various algorithms using 2 features. The methods considered are: the "for-loop" implementation (MRS), the fast Gaussian gridding (FGG), Kaiser-Bessel (KB), exponential of semi-circle using our naive implementation (ES) and the implementation from FINUFFT (FINUFFT). The NUFFT methods are computed using the default tolerance \epsilon. The times are extracted from Figures 4 and 5.
                                  Dirichlet [sec]         Fejér [sec]
n1    n2    N      1/λ2 [sec]     FGG        MRS          FGG        MRS
319   336   48855  30             0.047676   3.83602      0.050635   3.93206
319   272   36164  40             0.027140   2.51448      0.032651   2.64331
319   214   7128   50             0.004719   0.40825      0.005266   0.44622
319   169   38917  60             0.021548   2.19374      0.028445   2.40181
319   168   25006  70             0.013686   1.39089      0.016143   1.50772
319   106   2281   80             0.000995   0.09976      0.001036   0.10474
319   119   3325   90             0.001844   0.14592      0.001923   0.15903
319   98    2227   100            0.000765   0.09277      0.000805   0.09451
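The Poissonian sampling behind Table 2 (exponential inter-arrival times with mean 1/λ) can be sketched as follows; this is a minimal Python illustration rather than the paper's Julia scripts, and the names are of our own choosing.

```python
import random

random.seed(1)

def arrival_times(T, mean_gap):
    """Sample observation times on (0, T] with exponential inter-arrival
    times of mean `mean_gap` seconds (Poissonian sampling)."""
    t, out = 0.0, []
    while True:
        t += random.expovariate(1.0 / mean_gap)
        if t > T:
            return out
        out.append(t)

# a mean gap of 30 seconds over T = 10,000 seconds yields roughly
# 10,000/30 ≈ 333 observed points, the order of the n2 column in Table 2
times = arrival_times(10_000, 30)
```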
Table 2: The Dirichlet and Fejér computation times (measured in seconds) are given for varying degrees of asynchrony. Here a synchronous GBM with … data points is sampled with 1/λ1 = 30 seconds for the first feature, while 1/λ2 ranges from 30 to 100 seconds in increments of 10 seconds. The table reports the exact n_i and N from the sampling. The fast Gaussian gridding implementation of the NUFFT (FGG) outperforms the for-loop implementation (MRS); as 1/λ2 increases, n2 and N decrease, resulting in a faster compute time. However, increasing 1/λ2 does not guarantee that ∆t_min will be larger, which can lead to a larger N and therefore a longer compute time.

Second, the difference in speed between the naive methods and the fast Fourier transform methods is significant. Looking at Table 1 for 2 features with n = 10^… data points, the for-loop implementation takes … seconds while the fast Gaussian gridding takes … seconds, … times faster than the naive for-loop compute time.

Third, among the fast Fourier methods, from fastest to slowest we have: FFT, zero-padded FFT, and FGG. This is because the FFT computes N Fourier modes, the zero-padded FFT computes M Fourier modes, and the FGG computes M_r Fourier modes. On top of that, the FFT requires no steps before performing the FFT, whereas the zero-padded FFT needs to zero-pad missing data, while the FGG requires the convolution and deconvolution steps.

Finally, the breadth of features D can impact computation time depending on the choice of N. The case when N is the same across all features is simple: we then only need to compute the M Fourier coefficients for D features. This is presented in Figure 4. When N is the same across all features, we are presented with two advantages: (i) the time scale investigated will be the same for all the features, and (ii) if one uses the Fejér basis kernel we can guarantee positive semi-definiteness of the covariance matrix [29, 30]. The case when N changes for the different features becomes more nuanced.
For example, different features have different Nyquist frequencies in the arrival time representation. Here we need to compute the D(D − 1)/2 pairwise estimates, one for each Σ̂^{ij}_{n,N} entry. A potential problem arises when N is independently obtained to investigate the co-movement between events for each feature pair [4], or when the Dirichlet basis kernel is used. We are then not guaranteed a positive semi-definite matrix, which can present challenges, for example, when an invertible covariance matrix is a necessary requirement, as in the case of portfolio optimisation. This can be ameliorated by transforming the non-positive semi-definite covariance matrix estimate to the closest positive semi-definite matrix under some appropriate norm [24], or by using extensions of these types of transformations [32]. However, doing so comes with an additional computational cost.

Let us investigate the degree of asynchrony as a variable of study, specifically its effect on computation time under the arrival time representation. Here we only consider the case when D = 2. Let 1/λ1 be the mean inter-arrival time used to sample the first feature, and N1 be the corresponding Nyquist frequency for the feature; similarly 1/λ2 and N2 for the second feature. Therefore, the N used in eqs. (1) to (3) is N = min{N1, N2} to avoid aliasing.

Table 2 reports the Dirichlet and Fejér computation times (minimum estimate over 10 replications and measured in seconds) for the for-loop implementation (MRS) and the fast Gaussian gridding implementation of the NUFFT (FGG) using the default ε = 10^−…. We simulate a GBM with n = 10^… data points. This process is then sampled using an exponential inter-arrival time with rate λ_i. Here the first feature is sampled with an average inter-arrival time (1/λ1) of 30 seconds, while the second feature is sampled with an average inter-arrival time (1/λ2) ranging from 30 to 100 seconds in increments of 10.
The exact number of observed data points n_i ≈ nλ_i and the Nyquist frequency N from the sampling are also reported.

Table 2 demonstrates two things. First, the FGG is significantly faster than the for-loop implementation.

The Nyquist frequency here is ⌊T/(2∆t_min)⌋, where ∆t_min is the smallest distance between two consecutive prices [29]. Using the Nyquist frequency under asynchrony will result in the estimate being biased. This is demonstrated in Section 3.2.

Figure 5: We investigate the algorithm complexity for various non-uniform fast Fourier transform methods by plotting the compute time (measured in seconds) as a function of the error tolerance ε, which determines the spreading width ω for the various averaging kernels. The FFT (FFT - purple line) and zero-padded FFT (ZFFT - dark green line) are plotted as baselines for comparison. We simulate a synchronous Geometric Brownian Motion using algorithm 1 with n = 10^… data points for D = 2, … and … respectively. The run times are obtained for the two basis kernels: Dirichlet (Dir.) and Fejér (Fej.). The non-uniform FFT methods considered are the Gaussian kernel (FGG - blue dashes), the Kaiser-Bessel kernel (KB - orange dashes) and the exponential of semi-circle kernel with our naive implementation (ES - red dashes) and the implementation by FINUFFT (FINUFFT - green dots). The results are consistent with the results from [2], where the FINUFFT implementation is faster than the fast Gaussian gridding, which is faster than the Kaiser-Bessel. The evaluations are all done "on-the-fly" without any pre-computation. The figures can be recovered using the Julia script file Error Timing on the GitHub resource [5].

Second, as 1/λ2 increases we get a faster compute time (most of the time) because n2 and N decrease. However, this is not guaranteed, because a larger 1/λ2 does not ensure ∆t_min (the minimum ∆t) will also be larger.
Therefore, N does not always decrease, which can lead to longer compute times.

The last variable influencing compute time to investigate is the impact of the tolerance ε. This is explored by plotting the computation time as a function of ε. Here we use the synchronous case (n1 = n2 = ... = n_D) with n = 10^… data points.

In Figure 5 we compare the following algorithms: the FFT implementation (FFT) and the zero-padded FFT implementation (ZFFT) serve as baselines for comparison. The NUFFT methods include the fast Gaussian gridding with the Gaussian kernel (FGG), the Kaiser-Bessel kernel (KB), the exponential of semi-circle using our naive implementation (ES) and the FINUFFT package (FINUFFT). The O(n) plots are plotted on the log scale as the minimum compute time estimate over 10 replications. The figures include the Dirichlet and Fejér representations for D = 2, … and … features.

Looking more closely, we see that the zero-padded FFT implementation needs to assign n data points to the over-sampled grid, while the NUFFT methods need to assign ωn points to the over-sampled grid. Furthermore, the zero-padded FFT requires no kernel evaluations, whereas the NUFFT methods require ωn evaluations of δ_i(I_h)ϕ(·).

The key difference between the NUFFT algorithms lies in how they reduce the number of evaluations required. The technique used in the fast Gaussian gridding is to reduce the number of exponential evaluations for ϕ_G(·). This is achieved by separating the exponential into three components:

e^{−(x_j − 2πm/M_r)²/(4τ)} = [ e^{−x_j²/(4τ)} ] [ e^{x_j π/(M_r τ)} ]^m [ e^{−(πm/M_r)²/τ} ].   (17)

By splitting the exponential this way, we only need two exponential evaluations per source point instead of ω exponential evaluations for each source point, reducing the total number of exponential evaluations from ωn to ω + 2n. The advantage of using the Kaiser-Bessel kernel is that it is both smooth and has narrow support [2].
This can be exploited to cut the number of kernel evaluations by reducing ω while maintaining a comparable level of accuracy. For example, the Gaussian kernel requires ω = 24 for roughly 12 digit accuracy, while the Kaiser-Bessel kernel requires ω = 13 for the same 12 digit accuracy [2]. Finally, the exponential of semi-circle has narrow support similar to that of the Kaiser-Bessel kernel, but is simpler and faster to evaluate. The downfall is that there is no known analytic Fourier transform, thus incurring the additional cost of numerical integration to evaluate eq. (10).

Our implementation of the exponential of semi-circle is naive compared to [2]. We do not exploit the piecewise polynomial kernel approximation to accelerate the evaluation of eq. (9). Furthermore, we use naive numerical integration to compute eq. (10) using the adaptive Gauss-Kronrod quadrature QuadGK rather than the Gauss-Legendre quadrature with "phase winding" [2]. The naive implementation of the exponential of semi-circle serves two purposes: (i) allowing the like-for-like comparison between the various kernels and their algorithms based on our implementation, and (ii) illustrating the importance of the implementation techniques used by [2]. This is seen in Figure 5, where the exponential of semi-circle is significantly slower than the Gaussian or Kaiser-Bessel kernel without the implementation techniques, due to the numerical integration required for eq. (10). However, [2] are able to reduce the compute time of the exponential of semi-circle to a time similar to that of our zero-padded FFT with the appropriate implementation techniques in place.

The results in Figure 5 are consistent with those of [2]. Between the non-uniform FFT methods considered: the FINUFFT implementation of the exponential of semi-circle is the fastest, followed by the fast Gaussian gridding, the Kaiser-Bessel kernel evaluated "on-the-fly", and lastly the exponential of semi-circle using the naive implementation.
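The three-factor split in eq. (17) can be sketched as follows. This is a minimal Python illustration of the fast Gaussian gridding idea rather than the paper's Julia implementation; the helper name and parameter choices are ours.

```python
import numpy as np

def fgg_weights(xj, Mr, tau, omega):
    """Gaussian gridding weights for one source point x_j spread over the
    omega nearest points of an M_r grid, using the split in eq. (17):
    only E1 and E2 depend on x_j (two exponentials per source point),
    while the m-dependent factor E3 can be precomputed once."""
    m0 = int(np.round(xj * Mr / (2 * np.pi)))      # nearest grid index
    m = m0 + np.arange(-omega // 2, omega // 2)    # grid points to spread to
    E1 = np.exp(-xj**2 / (4 * tau))
    E2 = np.exp(xj * np.pi / (Mr * tau))
    E3 = np.exp(-(np.pi * m / Mr) ** 2 / tau)      # independent of x_j
    return m, E1 * E2**m * E3

# sanity check against the direct evaluation of the Gaussian kernel
xj, Mr, tau = 0.7, 64, 1e-2
m, w = fgg_weights(xj, Mr, tau, 8)
direct = np.exp(-(xj - 2 * np.pi * m / Mr) ** 2 / (4 * tau))
assert np.allclose(w, direct)
```

The split matters because E3 depends only on the grid index m, so it is shared across all n source points, which is what reduces the exponential count from ωn to ω + 2n.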
We have demonstrated the merit of fast Fourier techniques in terms of speed. We now investigate the accuracy of the fast Fourier methods and the conditions under which they fail. Moreover, we need to find out what level of numerical accuracy is required to ensure that the Fourier coefficients evaluated using NUFFT techniques in eq. (14) can recover the same estimates in eqs. (2) and (3) as the direct evaluation of eq. (4). This is done by testing the fast Fourier methods on the synchronous case, the missing data representation, and the arrival time representation. We then look at the inter-relation between two types of averaging: (i) kernel averaging, the convolution step in the NUFFT algorithms, and (ii) time-scale averaging, the choice of N in the Malliavin-Mancino estimator. Ensuring that the NUFFT methods can correctly recover estimates for different choices of N allows us to quickly and accurately investigate various time scales. Finally, we compare the MSE and bias of the estimator under asynchronous sampling using the direct evaluation of eq. (4) against NUFFT methods of evaluating the Fourier coefficients in eq. (14).

The setting for the following two experiments is as follows: a bivariate Geometric Brownian Motion with n = 10^… data points is simulated using algorithm 1. The daily parameters for the GBM are: µ1 = 0.…, µ2 = 0.…, σ1 = 0.…, σ2 = 0.… and ρ = 0.…. We set ∆t = 1, therefore each unit interval can be thought of as a second in Calendar time.
From the synchronous case we create the missing data representation by randomly removing … of the data points from each path. The arrival time representation is achieved by sampling each price path with an exponential inter-arrival time with mean 30 and 45 seconds for the first and second price paths respectively.

We measure accuracy as the difference between the estimates from the fast Fourier methods and the estimates from the vectorised implementation, averaged over 100 replications. This allows us to directly see if the estimates obtained using the fast Fourier methods recover the same estimates as the direct evaluation.

Without any pre-computations, preventing large RAM overhead.

Figure 6: We investigate the accuracy of various fast Fourier methods as a function of the error tolerance ε for various simulation settings. Accuracy is measured as the difference between the estimates of the various fast Fourier methods ρ* and the estimate using the vectorised implementation (CFT) ρ_v, averaged over the 100 replications. The average correlation estimate from the vectorised implementation is provided as an inset in each figure. The base-line price process is the synchronous Geometric Brownian Motion with … data points simulated using algorithm 1. The three simulation cases are: the synchronous case (a) and (b), the missing data representation (c) and (d), and the arrival time representation (e) and (f). The missing data representation is down-sampled by …, while the arrival time representation is sampled with an exponential inter-arrival time (Rand. Exp.) with mean 30 and 45 for the first and second price path respectively. The fast Fourier methods investigated are: the fast Gaussian gridding (FGG - blue line), the Kaiser-Bessel kernel (KB - orange dashes), the exponential of semi-circle with our naive implementation (ES - red dash-dots), the FINUFFT implementation (FINUFFT - green dots) and finally, the zero-padded FFT (ZFFT - purple dash-dot-dot).
The accuracy is tested for the two basis kernels: Dirichlet (Dir.) and Fejér (Fej.). Furthermore, the Nyquist frequency (Nyq.) is used for all the correlation estimates, therefore it must be noted that for the arrival time representation, N changes for each replication. We see firstly that, provided the tolerance ε < 10^−…, the NUFFT methods can accurately recover the estimates; secondly, that the divergence from the CFT implementation when ε ≥ 10^−… may simply be an artefact of random errors from the lack of precision requested; and finally, that the zero-padded FFT fails for the arrival time representation. The figures can be recovered using the Julia script files AccSynDS and AccRE on the GitHub resource [5].

Figure 7: We investigate the inter-play between kernel averaging and time-scale averaging. We plot the accuracy of various fast Fourier methods as a function of the error tolerance ε for various choices of N. Accuracy is measured as the difference between the estimates of the various fast Fourier methods ρ* and the estimate using the vectorised implementation (CFT) ρ_v, averaged over the 100 replications. The average correlation estimate from the vectorised implementation is provided as an inset in each figure. The base-line price process is the synchronous Geometric Brownian Motion with … data points simulated using algorithm 1. The synchronous price paths are then sampled with an exponential inter-arrival time (Rand. Exp.) with mean 30 and 45 for the first and second price path respectively to create the arrival time representation of asynchrony. The fast Fourier methods investigated are: the fast Gaussian gridding (FGG - blue line), the Kaiser-Bessel kernel (KB - orange dashes), the exponential of semi-circle with our naive implementation (ES - red dash-dots) and the FINUFFT implementation (FINUFFT - green dots). The accuracy is tested for the two basis kernels: Dirichlet (Dir.) and Fejér (Fej.).
We see that there is no clear relation between the two types of averaging; rather, the divergence for higher tolerance levels seems to be an artefact of errors arising from the lack of precision requested. The figures can be recovered using the Julia script file AccRE on the GitHub resource [5].

… ε < 10^−…. Furthermore, we see that the ES kernel (red dashes) recovers the correct estimates for ε = 10^−…. This is not a property of the exponential of semi-circle kernel, as the FINUFFT implementation diverges from ε = 10^−… (due to their more lenient choice of ω). Rather, this is a result of our choice of M_sp for the ES implementation to ensure the requested tolerance is always strictly met. Concretely, this means each source point must be spread in each direction over a minimum of M_sp = 4 grid points for the Gaussian kernel, M_sp = 3 grid points for the Kaiser-Bessel kernel, and M_sp = 3 grid points for the exponential of semi-circle to recover the vectorised estimate. Second, the non-uniform FFT methods diverge away from the vectorised implementation when the tolerance ε ≥ 10^−…. There is no clear pattern in the divergence for the various kernels, therefore it seems the errors are a simple artefact arising from the lack of precision requested in F(dp_i). Finally, the zero-padded FFT recovers the correct estimate for the synchronous case and the missing data representation. More importantly, it fails for the arrival time representation because of the shifting of time points (see Figure 1). Non-uniform FFT methods overcome this through a convolution and deconvolution step, correcting the effects of shifting the points to a uniform grid by trying to preserve the power spectrum.

Previously, in Figure 6, the arrival time representation had N changing for each replication.
In Figure 7 we fix three cases of N and measure the accuracy using the arrival time representation (with the same parameters as before) to better understand the relationship between the kernel averaging and the time-scale averaging. The first N is computed as the minimum Nyquist frequency across the 100 replications, resulting in N = 18,…. The second N is computed based on the smallest average sampling interval, resulting in N = 166. Finally, the last N is chosen to be arbitrarily small, subject to the condition that the corresponding M_r is larger than ω for ε = 10^−…. We pick N = 15 for the final case. The zero-padded FFT is excluded because the implementation only computes the case when N is the Nyquist frequency. We see that there is no clear relation between the two types of averaging. For any choice of N, we can recover the vectorised estimate provided the tolerance ε < 10^−…. There is no clear pattern in the divergence for the various kernels, and the lack of accuracy in the estimates is due to the lack of precision requested in F(dp_i).

The M_sp requirement is calculated based on ε = 10^−… for the Gaussian and Kaiser-Bessel kernels and ε = 10^−… for the ES kernel. This is to ensure the up-sampled grid is larger than the total spreading width.
Figure 8 compares the MSE and bias of the integrated covariance as a function of the number of Fourier coefficients for the vectorised implementation and the NUFFT methods (with the default ε = 10^−…). A bivariate GBM with n = 100 data points is simulated with the same parameters as before (except ∆t = 1/n). Asynchrony is induced using a special case of the missing data representation: the regular non-synchronous trading used by [29]. Here the second asset is observed at every second trade of asset one. The methods investigated are: the vectorised implementation (CFT), the fast Gaussian gridding (FGG), the Kaiser-Bessel kernel (KB), the exponential of semi-circle with our naive implementation (ES) and the FINUFFT implementation (FINUFFT). We see that the NUFFT methods recover the same bias and MSE results as the vectorised implementation. When asynchrony is introduced, the integrated volatility Σ̂^{ii}_{n,N} (see eqs. (2) and (3)) does not present a bias for any value of N, but the MSE is still large for small values of N due to the variance of the estimator [29]. However, we have an increase in bias for larger N (smaller time-scales) with the co-volatility Σ̂^{ij}_{n,N}. This is a result of the Epps effect (the decay in correlations as time-scales decrease). To remove this effect, a smaller N (larger time-scales) must be chosen.

Mancino et al. [29] suggest picking N by minimising the MSE for an optimal bias and variance tradeoff. The speed of convergence with respect to the degree of asynchrony of the Malliavin-Mancino estimator was then investigated with the MSE criterion in mind [9, 29, 35]. Picking N to minimise the MSE could be seen as inadvertently assuming that there is some latent model generating the data with some appropriate limiting properties, and that we need only be concerned about an estimator's deviation from this implied latent model. This can potentially lead one to inadvertently average away empirically important sources of the Epps effect.
Allowing N to be chosen to isolate a particular time-scale by its implied choice of ∆t is a weaker criterion, but it allows us to disentangle genuine and statistical sources of the Epps effect (see Section 4.1 for further details).
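As a concrete illustration of the sampling scheme used for Figure 8, the regular non-synchronous trading (the second asset observed at every second trade of the first) can be sketched as follows; this is a toy Python version with hypothetical helper names, not the paper's algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def gbm_paths(n, mu, sigma, rho, dt):
    """Synchronous bivariate GBM: correlated log-returns, exponentiated."""
    cov = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                    [rho * sigma[0] * sigma[1], sigma[1] ** 2]]) * dt
    drift = (np.array(mu) - 0.5 * np.array(sigma) ** 2) * dt
    r = rng.multivariate_normal(drift, cov, size=n - 1)
    logp = np.vstack([np.zeros(2), np.cumsum(r, axis=0)])
    return np.exp(logp)

def regular_nonsynchronous(p):
    """Keep every trade of asset one; observe asset two only at every
    second trade of asset one (NaN marks the missing observations)."""
    q = p.copy()
    q[1::2, 1] = np.nan
    return q

# illustrative parameter values
p = gbm_paths(100, mu=[0.01, 0.01], sigma=[0.1, 0.1], rho=0.35, dt=1 / 100)
q = regular_nonsynchronous(p)
```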
4. Correlations and time-scale averaging
We now consider the relationship between the time-scales and the Epps effect [13]. Concretely, we demonstrate how different time-scales can be investigated with the Malliavin-Mancino estimator through the choice of N, specifically by using ∆t = T/(2N + 1) (see Section 2.3). This follows the insights introduced by Renò [38], and Precup and Iori [37], to investigate the Epps effect. Precup and Iori were able to demonstrate that the higher the level of asynchrony, the larger the drop in correlation for the Epps effect. This is

We also performed a sensitivity analysis confirming that the NUFFT methods and the vectorised implementation correctly recover the integrated covariance. This is confirmed through the linear relationship obtained by plotting the estimate against the true integrated covariance for a range of values (see Figure B.12).
Figure 8: Here we show the bias and MSE of the integrated covariance as a function of the number of Fourier coefficients using the Dirichlet representation (the Fejér case can be found in Figure A.11). The base-line price process is a synchronous GBM with n = 100 data points. The missing data representation is induced here using the regular non-synchronous trading [29], where the second asset is observed at every second trade of asset one. The methods investigated are: the vectorised implementation (CFT - black dashes), the fast Gaussian gridding (FGG - blue dash-dot-dot), the Kaiser-Bessel kernel (KB - orange dashes), the exponential of semi-circle with our naive implementation (ES - red dash-dots) and the FINUFFT implementation (FINUFFT - green dots). The NUFFT methods are computed using the default ε = 10^−…. We see that the fast Fourier methods recover the same bias and MSE results as the vectorised implementation and are consistent with the results obtained by [29]. The figures can be recovered using the Julia script file MSEBias on the GitHub resource [5].

demonstrated in Figure 6 with the average correlation provided as insets for varying levels of asynchrony. Additionally, Renò was able to demonstrate the Epps effect as a function of sampling frequency under the arrival time representation of asynchrony. This is demonstrated here in Figure 7 with the average correlation provided as insets for various N. Following the work of [37, 38], Tóth and Kertész [41] and Mastromatteo et al. [31] were able to analytically quantify the Epps effect arising from asynchrony under an arrival time representation as:

ρ̃^{ij}_{∆t} = c ( 1 + (e^{−λ∆t} − 1)/(λ∆t) ).   (18)

Here c is the induced correlation and λ is the sampling intensity; the same for both price paths [31].
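A minimal sketch of the baseline model in eq. (18) together with the ∆t-to-N conversion of eq. (19). The closed forms ρ̃(∆t) = c[1 + (e^{−λ∆t} − 1)/(λ∆t)] and N = ⌊(T/∆t − 1)/2⌋ are our reconstruction from the surrounding text, and the function names are ours.

```python
import math

def epps_asynchrony(dt, lam, c):
    """Correlation observed at time-scale dt under Poissonian sampling with
    intensity lam when the induced correlation is c (eq. (18))."""
    x = lam * dt
    return c * (1.0 + (math.exp(-x) - 1.0) / x)

def N_from_dt(T, dt):
    """Number of Fourier coefficients for an implied sampling interval dt,
    assuming dt = T/(2N + 1) so that N = floor((T/dt - 1)/2) (eq. (19))."""
    return math.floor((T / dt - 1) / 2)

# the curve decays towards zero for small dt and saturates at c for large dt
```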
This will serve as our base-line theoretical Epps effect.

We compare the relationship between the time-scale averaging in the Malliavin-Mancino estimator used by [37, 38] to the analytic formula characterising the Epps effect arising from Poissonian sampling in eq. (18). This is done by simulating T data points from a bivariate Geometric Brownian Motion with the same parameters as Section 3.2. We consider one hour, one trading day and one trading week's worth of simulated data, with a price realisation sampled each second. Thus, assuming that each trading day is 8 hours in Calendar time, we have T = 3600, 28,800 and 144,000 synchronous data points for the various cases. The synchronous price paths are then sampled using an exponential inter-arrival time with the same rate λ to create the arrival time representation of the asynchronous price paths.

Figure 9 plots eq. (18) as a function of ∆t (MMZ), ranging from … to … seconds, compared against the estimated correlations. The corresponding N for the Malliavin-Mancino estimator is obtained from the conversion

N = ⌊(T/∆t − 1)/2⌋.   (19)

We note that eq. (19) may not always be a perfect conversion due to the range of Fourier modes in eq. (1).

Figure 9: Here we demonstrate the relationship between the time-scale averaging in the Malliavin-Mancino estimator (as determined by the number of Fourier coefficients) and the analytic formula characterising the Epps effect arising from Poissonian sampling using eq. (18). The conversion from ∆t to N in the estimators is given by eq. (19). The arrival time representation samples T synchronous data points (each point representing a second in a day) with an inter-arrival time with rates λ = … and λ = … for the first and second columns respectively. The induced correlation across the replications is 0.…. Here the green dashes ("MMZ") are the plot of eq. (18). The Malliavin-Mancino estimates for each ∆t are obtained using the Dirichlet and Fejér basis kernels; these are respectively the blue line ("MM Dirichlet") and red line ("MM Fejér"). Here we repeat this process 100 times to obtain 100 arrival time representation paths, based on 100 different GBM paths, to obtain 100 estimates for each ∆t. The average correlation estimate at each ∆t is then plotted with error bars (computed using a t-distribution with 99 degrees of freedom and the sample standard deviation) representing 68% of the variability between the estimated paths. The Dirichlet basis kernel (plausibly) recovers the Epps effect given by eq. (18), while the Fejér basis kernel is biased upwards with respect to the Dirichlet basis kernel (and the theoretical Epps effect) because there is induced averaging. For the same reason the estimate curves using the Dirichlet basis kernel are more volatile than those estimated using the Fejér kernel basis (see Figure C.13). The figures can be recovered using the Julia script file MMZandMM on the GitHub resource [5].

The Malliavin-Mancino correlation estimates are computed using the fast Gaussian gridding implementation of the non-uniform fast Fourier transform with ε = 10^−… for the various choices of ∆t. This is done for the Dirichlet basis kernel (MM Dirichlet) and the Fejér basis kernel (MM Fejér). Furthermore, this process is repeated 100 times so that the variability between the measured estimates can be investigated for the various n_i with i = 1, 2 and N. Here, n_i ≈ Tλ_i on average based on the Poissonian sampling.

Figure 9 plots the average correlation estimates over the various replications, with the error bars representing 68% of the variability between the estimated paths. First, the precision of the estimates improves as n_i and N increase, i.e. for decreasing time-scales. The exact contributions of n_i and N leading to the increased precision for larger T are unclear, as larger T implies larger n_i and N. However, for a fixed T we see the effect that larger N has on the precision (ignoring the variability from n_i changing between the replications).
Second, the Dirichlet kernel can plausibly recover the theoretical Epps curve, while the Fejér kernel is biased upwards with respect to the Dirichlet basis and the theoretical Epps effect. This is because the Fejér kernel places more weight on the lower frequencies and less weight on the higher frequencies, which makes it more stable under microstructure noise [30]. Furthermore, due to the weighting of frequencies in the Fejér kernel, we get smoother estimates compared to the Dirichlet kernel (see Figure C.13 for an indication of individual realisations).

Deciding which basis kernel to use depends on how one wants to treat the Epps effect. The Epps effect is well known and has many factors contributing to it [31, 34, 33, 37, 38, 39, 40, 41]. The effects include statistical causes that require correction, such as asynchrony [33, 37, 38, 41] and tick-size [34, 33], but also genuine effects such as lead-lag [31, 38] and sampling-interval-dependent correlations [1]. Therefore, the Dirichlet kernel will be more appropriate if one is interested in recovering the empirical nature of correlation dynamics at various time-scales, as it (plausibly) recovers the theoretical Epps effect. However, the Fejér kernel is more effective if one is interested in correcting the Epps effect, as more weight is placed on lower order frequencies to avoid market microstructure noise. Furthermore, the Fejér kernel can be coupled with the N which minimises the MSE.

There have been many methods proposed to correct the Epps effect arising from asynchrony, such as the estimator proposed by Hayashi and Yoshida [17], a correction based on the distortion caused by asynchrony [33], or picking a smaller N with the Malliavin-Mancino estimator [29]. The implication of picking N to minimise the MSE is that control over the time-scale of interest is relinquished (as with the Hayashi-Yoshida estimator). It is clear in Figure 9 that, in order for the MSE to be minimised, N must be small, meaning that larger time intervals are investigated (which is a simple method to remove the Epps effect). This can be problematic if one is interested in disentangling the genuine causes of the Epps effect from the statistical causes at various time-scales [31]. The better approach here would be to measure the observed correlation dynamics (the Malliavin-Mancino estimator using NUFFTs provides a quick method to do so) and then correct for the statistical causes [7, 33], so that genuine causes of the decay in correlations at various time-scales can be investigated.

We use the sample standard deviation and a t-distribution with 99 degrees of freedom for 100 replications.

The estimated correlations at various high-frequency time-scales for 10 equity assets listed on the Johannesburg Stock Exchange (JSE) are given as a real-world example. The correlations are estimated using Trade and Quote (TAQ) event data for the 10 equities extracted from Bloomberg Pro and processed to remove repeated time stamps by aggregating trades with the same time stamp using a volume weighted average. The processed TAQ data can be found in [6]. The 10 equities considered are: FirstRand Limited (FSR), Shoprite Holdings Ltd (SHP), Absa Group Ltd (ABG), Nedbank Group Ltd (NED), Standard Bank Group Ltd (SBK), Sasol Ltd (SOL), Mondi Plc (MNP), Anglo American Plc (AGL), Naspers Ltd (NPN) and British American Tobacco Plc (BTI). The period considered is the week from 24/06/2019 to 28/06/2019. The data covers a 5 day period with equities trading 8 hours a day. This yields T = 5 ×
28,800 = 144,000 seconds in the period of consideration. The TAQ data is discrete and asynchronous with different rates of trading for different stocks.

Tickers   Vol. Traded   Unique Trades   1/λ̂ [sec]
BTI       3143263       7893            17.83 ± …

Table 3: The table provides a summary of the 10 equities considered for the week from 24/06/2019 to 28/06/2019. The table indicates the volume traded, the number of unique trades and the mean inter-arrival time between trades measured in seconds, with the 95% confidence interval provided.
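The time-stamp aggregation described above (trades sharing a time stamp replaced by their volume-weighted average) can be sketched as follows; this is a minimal Python illustration, not the paper's processing pipeline.

```python
from collections import defaultdict

def vwap_aggregate(trades):
    """trades: iterable of (time_stamp, price, volume) tuples.
    Collapse repeated time stamps into a single volume-weighted trade."""
    pv = defaultdict(float)   # sum of price * volume per time stamp
    vol = defaultdict(float)  # total volume per time stamp
    for t, p, v in trades:
        pv[t] += p * v
        vol[t] += v
    return sorted((t, pv[t] / vol[t], vol[t]) for t in pv)

# toy TAQ-style trades: two trades share the 09:00:01 time stamp
trades = [("09:00:00", 100.0, 200), ("09:00:01", 101.0, 100),
          ("09:00:01", 103.0, 300), ("09:00:03", 102.0, 150)]
# the repeated 09:00:01 stamp becomes (101*100 + 103*300)/400 = 102.5
agg = vwap_aggregate(trades)
```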
Table 3 provides the volume traded, the number of uniquetrades, and the mean inter-arrival times between the tradesfor the 10 equities used in the analysis. It is importantto notice that the measured intensities ˆ λ ’s are not the Here measured in seconds with a 95% confidence interval providedcomputed using a t-distribution and the standard errors. The ˆ λ ’s are indicative, estimated from the TAQ data. igure 10: The two most interesting correlation pairs from the 10 available equity data is shown by plotting the correlation as a function of thesampling interval ∆ t measured in seconds (the complete set is in Figure C.13 of Appendix C). The conversion between ∆ t and N is given byeq. (19) assuming T = 144 , . The correlation pairs considered are FSR/SBK (blue) and FSR/AGL (red). Here the lines with error bars arethe mean estimates and 95% variability between the paths obtained from block bootstrap, while the dashes are the measured estimates from thecomplete time series. Furthermore, we plot a simple theoretical Epps effect arising from asynchrony for the FSR/SBK (green dashes) pair (seeeq. (18)). This is done by assuming the inter-arrival time of trades follow an exponential distribution with the same rate ˆ λ = 1 / . . We foundthat c = 0 . provided a relatively good fit. The theoretical Epps is not plotted for the FSR/AGL pair because the correlation dynamics do notexhibit the Epps effect. The empirical reality of curves such as the upper blue curve for the FSR/SBK pair suggest that current theoretical Eppseffect models can plausibly model the correlation dynamics for some asset pairs; while curves such as the lower red curve for the FSR/AGL pairsuggest that the current theoretical explanations for the Epps effect are possibly insufficient. The figures can be recovered using the Julia script fileEmpirical on the GitHub resource [5]. same across the assets. In order to use eq. 
(18), we make the simplifying assumption that the λ̂'s are approximately the same and take on the larger intensity of the two, i.e. λ̂ = max(λ̂_i, λ̂_j). We highlight that an extension of eq. (18) to model different intensities and lead-lags is provided by Mastromatteo et al. [31]. They considered multiple intensities λ_i ≠ λ_j in order to decouple the effects of asynchronous sampling from the effects of lead-lag. We are interested in the general concave shape of the theoretical model (see [31, 41] and the plots therein) and do not use the extended model for the theoretical Epps effect. However, we point out that not all of the measured Epps curves conform to the shape of the known theoretical models, irrespective of whether one can conflate the lead-lag formulation for some lag τ with the asynchronous versions with different intensities λ_i ≠ λ_j.

Before comparing the theoretical Epps effect against the measured Epps effect, we perform some exploratory data analysis to identify the interesting correlation pairs out of the 45 available pairs. We investigate the 45 correlation pairs as a function of the sampling interval ∆t for the Dirichlet and Fejér basis kernels (see Figure C.13). The conversion from ∆t to N is given by eq. (19), assuming T = 144,000. The correlation estimates are computed using the fast Gaussian gridding implementation of the non-uniform fast Fourier transform with the default tolerance ϵ. We obtain correlation matrices for
100 ∆t's ranging from 1 to 100. The total compute time across the different N's was a matter of seconds for both the Dirichlet basis and the Fejér basis, demonstrating the efficacy of our fast Fourier method. Snapshots of the correlation structures are then plotted as heat-maps at four values of ∆t for easier identification of the pairs (see Figure C.14). From considering all the correlation pairs as a function of the sampling intervals and the reduced heat-maps (see Figures C.13 and C.14), we were able to make two initial observations. First, the Fejér kernel produces smoother estimates compared to the Dirichlet kernel. Second, nearly all the correlation pairs exhibit the Epps effect, where the correlations rise as ∆t increases, conforming to the theoretical models in the literature [31, 40, 41]. However, there are exceptions where correlation pairs do not exhibit the behaviours easily accounted for by the prevailing models, such as the FSR/AGL pair. Rather, it seems that the correlation drops as ∆t increases, to the point where the sign of the correlation switches.

Figure 10 investigates this in more detail by plotting the correlation as a function of ∆t for two particular asset pairs. In this instance both the Dirichlet and Fejér kernels produced positive semi-definite covariance matrices. The indicative sample error bars are obtained through block bootstrap, quoted to indicate 95% of the variability between the estimates for each ∆t. The block bootstrap is achieved by splitting the data into 100 calendar-time blocks and estimating correlations at various ∆t's with 1 block removed each time. T remains the same across the various replications, so the missing block is treated as missing data. Standard deviations are obtained from the block bootstrap and error bars computed using a t-distribution with 99 degrees of freedom. The errors are overlaid on the mean estimates from the block bootstrap. The first pair FSR/SBK (blue line/dashes) conforms to the Epps effect; the theoretical curve uses the intensity max(λ̂_FSR, λ̂_SBK) = λ̂. We found that this
produced a relatively good fit for the measured correlations using the Dirichlet basis. The second pair FSR/AGL (red line/dashes) does not behave in accordance with the Epps effect, so no theoretical Epps curve is plotted for this pair, as the correlation dynamics do not meaningfully fit the functional form of the model under estimation. One could try to argue that this is because there are stocks in the set with low relative correlations with respect to other stocks (see Appendix D for a simple simulated 3-asset example), where the sample error can generate measured sign changes. However, we argue that this is insufficient as it does not recover the pathological behaviour seen in the FSR/AGL pair.

Figure 10 illustrates a case where the Epps effect can be plausibly modelled and a case where it cannot be easily modelled. The majority of the correlation pairs fit into the more notable Epps effect models [31, 33, 40, 41], which account for a drop in magnitude with a concave decay in correlations (see Figure C.13). However, Mastromatteo et al. caution that a significant portion of the measured Epps effect cannot be completely accounted for by current models of the Epps effect, meaning there are other factors which can affect the dynamics of the observed correlation [31]. The FSR/AGL pair is one such example. This suggests that either: (i) current theoretical explanations for the Epps effect are possibly insufficient, or (ii) the correlation dynamics under market microstructure cannot be explained with only the Epps effect [4].
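Eq. (18) itself is not reproduced in this excerpt. As an illustrative sketch, the standard asynchrony-induced Epps curve of the type discussed in [41] attenuates an underlying correlation ρ by the factor 1 + (e^{−λ∆t} − 1)/(λ∆t) for Poissonian arrivals with rate λ, scaled by a fitted constant c. The helper below and its parameter values (e.g. a BTI-like mean inter-arrival time of 17.83 seconds from Table 3) are illustrative assumptions, not the paper's exact specification.

```python
import math

def epps_curve(dt, lam, rho, c=1.0):
    """Asynchrony-induced Epps attenuation for Poisson arrivals with rate lam:
    the measured correlation over interval dt is the underlying correlation rho
    scaled by c * (1 + (exp(-lam*dt) - 1) / (lam*dt)).

    Illustrative sketch only: the functional form follows the asynchrony model
    of Toth and Kertesz [41], not the paper's eq. (18) verbatim."""
    x = lam * dt
    return c * rho * (1.0 + (math.exp(-x) - 1.0) / x)
```

The attenuation factor tends to 1 for large ∆t and to 0 linearly for small ∆t, producing the concave saturating shape against which the measured FSR/SBK curve is compared in Figure 10.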
5. Conclusions
We provide a fast novel implementation of the Malliavin-Mancino Fourier estimators using non-uniform fast Fourier transforms and promote the use of fast Gaussian gridding with the Fejér basis function as our preferred implementation. First, we compared three averaging kernels: the Gaussian, the Kaiser-Bessel, and the exponential of semi-circle kernel. Based on the like-for-like algorithmic comparison, the fast Gaussian gridding is the fastest of the three non-uniform fast Fourier methods. However, with appropriate low-level implementation techniques the exponential of semi-circle kernel can be made faster than the fast Gaussian gridding [2]. All three non-uniform fast Fourier methods significantly outperform the naive implementations of the Malliavin-Mancino estimators.

Second, we demonstrate the requirement for using the non-uniform fast Fourier methods as motivated by the failure of the zero-padded fast Fourier transform using the arrival-time representation of asynchrony.
Third, we demonstrate that there is no adverse interplay between the kernel averaging and the time-scale averaging (arising from the choice of N), provided there is sufficient spreading to enough nearby grid points. Concretely, the requested tolerance ϵ must be sufficiently small under our choice of M_sp. We show that when this is the case, the non-uniform fast Fourier methods can recover the estimates of eqs. (2) and (3) to machine precision. Moreover, we show that the NUFFT methods recover the same bias and MSE results as the direct evaluation of eq. (4) and correctly recover the target integrated covariance.

Fourth, we provide the link between the work done by Renò [38] and Precup and Iori [37] and the work of Tóth and Kertész [41] and Mastromatteo, Marsili and Zoi [31]. Moreover, we argue that the Dirichlet kernel is the better choice if one wants to recover the empirical nature of correlation dynamics at various time-scales, while the Fejér kernel is more appropriate if one wants to correct the Epps effect (with an appropriate choice of N).

Finally, we demonstrate the efficacy of our non-uniform fast Fourier methods with one week of Trade-and-Quote data from the JSE. We argue that the current theoretical explanations for the Epps effect are possibly insufficient in explaining the entirety of the empirical correlation dynamics under specific market microstructures.

Future work aims to expand our empirical understanding of correlation dynamics on various streaming event-data sources using this convenient estimation tool, and to incorporate the estimator's NUFFT implementation with an extension that makes the estimates robust to samples that explicitly incorporate jumps [11].

Acknowledgements
We would like to thank Melusi Mavuso and Roger Bukuru for comments. We would also like to thank the reviewers for helpful critique and suggestions. The data was sourced from Bloomberg Professional via the University of Cape Town Library service. Patrick Chang would like to acknowledge the support of the Manuel & Luby Washkansky Scholarship and the South African Statistical Association [grant number 127931].
References

[1] Bacry, E., Delattre, S., Hoffmann, M., Muzy, J.F., 2013. Modelling microstructure noise with mutually exciting point processes. Quantitative Finance 13, 65-77.
[2] Barnett, A.H., Magland, J.F., af Klinteberg, L., 2018. A parallel non-uniform fast Fourier transform library based on an "exponential of semicircle" kernel. SIAM J. Scientific Computing 41, C479-C504.
[3] Barucci, E., Renò, R., 2002. On measuring volatility and the GARCH forecasting performance. Journal of International Financial Markets, Institutions and Money 12, 183-200.
[4] Chang, P., Bukuru, R., Gebbie, T., 2019. Revisiting the Epps effect using volume time averaging: An exercise in R. arXiv:1912.02416.
[5] Chang, P., Pienaar, E., Gebbie, T., 2020a. Julia code: Malliavin-Mancino estimators implemented with the non-uniform fast Fourier transform. URL: https://github.com/CHNPAT005/PCEPTG-MM-NUFFT.
[6] Chang, P., Pienaar, E., Gebbie, T., 2020b. Malliavin-Mancino estimators implemented with the non-uniform fast Fourier transform: Dataset.
[7] Chang, P., Pienaar, E., Gebbie, T., 2020c. Using the Epps effect to detect discrete data generating processes. arXiv:2005.10568.
[8] Chen, J., Revels, J., 2016. Robust benchmarking in noisy environments. arXiv:1608.04295.
[9] Chen, R.Y., 2019. The Fourier transform method for volatility functional inference by asynchronous observations. arXiv:1911.02205.
[10] Cooley, J., Tukey, J., 1965. An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation 19, 297-301.
[11] Cuchiero, C., Teichmann, J., 2015. Fourier transform methods for pathwise covariance estimation in the presence of jumps. Stochastic Processes and their Applications 125, 116-160. doi: https://doi.org/10.1016/j.spa.2014.07.023.
[12] Dutt, A., Rokhlin, V., 1993. Fast Fourier transforms for nonequispaced data. SIAM Journal on Scientific Computing 14, 1368-1393.
[13] Epps, T.W., 1979. Comovements in stock prices in the very short run. Journal of the American Statistical Association 74, 291-298.
[14] Frigo, M., Johnson, S.G., 2005. The design and implementation of FFTW3. Proceedings of the IEEE 93, 216-231. Special issue on "Program Generation, Optimization, and Platform Adaptation".
[15] Glasserman, P., 2004. Monte Carlo methods in financial engineering. Springer, New York.
[16] Greengard, L., Lee, J.Y., 2004. Accelerating the nonuniform fast Fourier transform. SIAM Review 46, 443-454.
[17] Hayashi, T., Yoshida, N., 2005. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11, 359-379.
[18] Hendricks, D., 2017. Using real-time cluster configurations of streaming asynchronous features as online state descriptors in financial markets. Pattern Recognition Letters 97, 21-28.
[19] Hendricks, D., Gebbie, T., Wilcox, D., 2016. Detecting intraday financial market states using temporal clustering. Quantitative Finance 16, 1657-1678.
[20] Hendricks, D., Gebbie, T., Wilcox, D., 2017. High-speed Fourier method estimation of covariances from asynchronous data. Working paper.
[21] Hendricks, D., Wilcox, D., 2014. A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution. 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), 457-464.
[22] af Klinteberg, L., 2018. Julia interface to FINUFFT. URL: https://github.com/ludvigak/FINUFFT.jl.
[23] Kloeden, P.E., Platen, E., 2013. Numerical solution of stochastic differential equations. Volume 23. Springer Science & Business Media.
[24] Lindskog, F., 2001. Linear correlation estimation. RiskLab Report, ETH Zurich.
[25] Malherbe, C., 2007. Fourier method for the measurement of univariate and multivariate volatility in the presence of high frequency data. MSc. Dissertation. University of Cape Town.
[26] Malherbe, C., Hendricks, D., Gebbie, T., Wilcox, D., 2005. Matlab functions ftcorrgpu.m and fftcorrgpu.m.
[27] Malliavin, P., Mancino, M.E., 2002. Fourier series method for measurement of multivariate volatilities. Finance and Stochastics 6, 49-61.
[28] Malliavin, P., Mancino, M.E., 2009. A Fourier transform method for nonparametric estimation of multivariate volatility. Ann. Statist. 37, 1983-2010.
[29] Mancino, M., Recchioni, M., Sanfelici, S., 2017. Fourier-Malliavin Volatility Estimation: Theory and Practice. Springer International Publishing.
[30] Mancino, M.E., Sanfelici, S., 2011. Estimating Covariance via Fourier Method in the Presence of Asynchronous Trading and Microstructure Noise. Journal of Financial Econometrics 9, 367-408.
[31] Mastromatteo, I., Marsili, M., Zoi, P., 2011. Financial correlations at ultra-high frequency: theoretical models and empirical estimation. The European Physical Journal B 80, 243-253.
[32] Matoti, L., 2009. Building a statistical linear factor model and a global minimum variance portfolio using estimated covariance matrices. MSc. Dissertation. University of Cape Town.
[33] Münnix, M.C., Schäfer, R., Guhr, T., 2011. Statistical causes for the Epps effect in microstructure noise. International Journal of Theoretical and Applied Finance 14, 1231-1246.
[34] Münnix, M.C., Schäfer, R., Guhr, T., 2010. Impact of the tick-size on financial returns and correlations. Physica A: Statistical Mechanics and its Applications 389, 4828-4843. doi: https://doi.org/10.1016/j.physa.2010.06.037.
[35] Park, S., Hong, S.Y., Linton, O., 2016. Estimating the quadratic covariation matrix for asynchronously observed high frequency stock returns corrupted by additive measurement error. Journal of Econometrics 191, 325-347. doi: https://doi.org/10.1016/j.jeconom.2015.12.005. Innovations in Measurement in Economics and Econometrics.
[36] Potts, D., Steidl, G., 2003. Fast Summation at Nonequispaced Knots by NFFT. SIAM Journal on Scientific Computing 24, 2013-2037.
[37] Precup, O.V., Iori, G., 2007. Cross-correlation measures in the high-frequency domain. The European Journal of Finance 13, 319-331.
[38] Renò, R., 2003. A closer look at the Epps effect. International Journal of Theoretical and Applied Finance 06, 87-102.
[39] Saichev, A., Sornette, D., 2014. A simple microstructure return model explaining microstructure noise and Epps effects. International Journal of Modern Physics C 25, 1450012.
[40] Tóth, B., Kertész, J., 2009. The Epps effect revisited. Quantitative Finance 9, 793-802.
[41] Tóth, B., Kertész, J., 2007. Modeling the Epps effect of cross correlations in asset prices, in: Kertész, J., Bornholdt, S., Mantegna, R.N. (Eds.), Noise and Stochastics in Complex Systems and Finance, International Society for Optics and Photonics. SPIE. pp. 89-97. URL: https://doi.org/10.1117/12.727127.

Appendix A. Additional figure sets
Figure A.11: The figures investigate the bias and MSE of the integrated covariance as a function of the number of Fourier coefficients using the Fejér representation. The baseline price process is a synchronous GBM with n = 100 data points. The missing-data representation is induced here using regular non-synchronous trading [29], where the second asset is observed at every second trade of asset one. The methods investigated are: the vectorised implementation (CFT - black dashes), the fast Gaussian gridding (FGG - blue dash-dot-dot), the Kaiser-Bessel kernel (KB - orange dashes), the exponential of semi-circle with our naive implementation (ES - red dash-dots) and the FINUFFT implementation (FINUFFT - green dots). The NUFFT methods are computed using the default tolerance ϵ. We see that the fast Fourier methods recover the same bias and MSE results as the vectorised implementation and behave similarly to fig. 8. The figures can be recovered using the Julia script file MSEBias on the GitHub resource [5].

Appendix B. Sensitivity test
We perform a sensitivity analysis to ensure that the NUFFT implementation can correctly recover the target integrated covariance, independent of the parameters chosen in the main document. Here we simulate a synchronous bivariate GBM with n^1 = n^2 = n = 10 data points, with discretisation size ∆t = 1/n. Figures B.12a and B.12b give the Dirichlet representation, while Figures B.12c and B.12d give the Fejér representation. For the case of the integrated variance ∫_0^T Σ^{11}(t) dt the true value ranges from 0.1 to 0.3, while for the case of the integrated covariance ∫_0^T Σ^{12}(t) dt the true value ranges from -0.1 to 0.1.

Figure B.12 plots the estimated integrated covariance as a function of the true integrated covariance. The methods investigated are: the vectorised implementation (CFT), the fast Gaussian gridding (FGG), the Kaiser-Bessel kernel (KB), the exponential of semi-circle with our naive implementation (ES) and the FINUFFT implementation (FINUFFT). The NUFFT methods are computed using the default tolerance ϵ. We see a linear relationship between the estimated integrated covariance and the true integrated covariance. This confirms that the NUFFT methods can correctly recover the target estimates. Moreover, the estimates for the various methods are exactly the same as each other. This confirms that the NUFFT methods can recover the same estimates as the traditional implementation of the estimator.

Figure B.12: The figures investigate if the NUFFT methods can correctly recover the target integrated covariance, independent of the specific parameters chosen.
This is achieved by simulating a synchronous bivariate GBM with n = 10 data points, with the true value of ∫_0^T Σ^{11}(t) dt ranging from 0.1 to 0.3 and the true value of ∫_0^T Σ^{12}(t) dt ranging from -0.1 to 0.1. The methods investigated are: the vectorised implementation (CFT - purple line), the fast Gaussian gridding (FGG - blue dash-dot-dot), the Kaiser-Bessel kernel (KB - orange dashes), the exponential of semi-circle with our naive implementation (ES - red dash-dots) and the FINUFFT implementation (FINUFFT - green dots). The NUFFT methods are computed using the default tolerance ϵ. The estimates recover a linear relationship between the estimated integrated covariance and the true integrated covariance, confirming that the NUFFT methods can correctly recover the target estimates. The figures can be recovered using the Julia script file Sensitivity on the GitHub resource [5].

Appendix C. Epps effect EDA for 10 JSE stocks
Here we consider the 10 stocks as described in Table 3 and provide the correlation heat-maps in Figure C.14 and the Epps effect plots in Figure C.13.

Figure C.13: We investigate the Epps effect on the JSE by plotting the correlation estimate from the Malliavin-Mancino estimator as a function of the sampling interval ∆t. The conversion between ∆t and N is given by eq. (19), assuming T = 144,000 seconds in the 5-day period. The correlation pairs are plotted for all 10 equities. Sub-figure (a) gives the estimates using the Dirichlet basis kernel and (b) the Fejér basis kernel. It is clear that the Fejér kernel produces smoother estimates compared to the Dirichlet kernel due to the induced averaging. Finally, we see that most of the equity pairs produce the Epps effect demonstrated in Figure 9, but more interesting is that there is an equity pair where the correlation switches sign for different ∆t. The figures can be recovered using the Julia script file Empirical on the GitHub resource [5].

Figure C.14: We investigate the Epps effect on the JSE by plotting the correlation structure as heat-maps for snapshots from Figure C.13. The snapshots are taken at four increasing values of ∆t, starting at ∆t = 1 second, for (a) to (d) and (e) to (h) respectively. (a) to (d) plot the correlation structure using the Dirichlet basis and (e) to (h) the Fejér basis. We see that the correlations at short time scales are generally positively correlated, with the top right quadrant being the most positively correlated as they are from the banking sector. More interestingly, we see that the correlation pair FSR/AGL goes from positively correlated to negatively correlated as ∆t increases. The figures can be recovered using the Julia script file Empirical on the GitHub resource [5].

Appendix D. Simulated 3-asset case

Here we consider 3 simulated correlated stocks to demonstrate the interplay of the combination of negative and positive correlations in Figure D.15.
The change of scale can lead to spurious negative and positive correlations when there is insufficientdata.
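The simulation pipeline used in this appendix (a correlated GBM, then asynchronisation via exponential inter-arrival times) can be sketched as follows. This is a hedged Python illustration rather than the paper's Julia code; the function names, the two-asset configuration, and the arrival rate are our own illustrative assumptions.

```python
import numpy as np

def simulate_gbm(n, mu, sigma, corr, s0, dt, rng):
    """Correlated multivariate GBM via the log-Euler scheme (exact for GBM):
    S_i(t_{k+1}) = S_i(t_k) * exp((mu_i - sigma_i^2/2) dt + sqrt(dt) * (A z)_i),
    where A is the Cholesky factor of the instantaneous covariance."""
    d = len(mu)
    cov = np.outer(sigma, sigma) * corr              # instantaneous covariance
    a = np.linalg.cholesky(cov)
    z = rng.standard_normal((n - 1, d))
    increments = (mu - 0.5 * sigma**2) * dt + np.sqrt(dt) * z @ a.T
    log_s = np.vstack([np.log(s0), np.log(s0) + np.cumsum(increments, axis=0)])
    return np.exp(log_s)

def asynchronise(prices, lam, rng):
    """Thin a synchronous path with exponential inter-arrival times (rate lam
    per time step): returns the sorted indices at which trades are observed."""
    n = prices.shape[0]
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=n))
    return np.unique(np.minimum(arrivals, n - 1).astype(int))

# Illustrative two-asset example (parameters are assumptions, not the paper's).
rng = np.random.default_rng(42)
mu = np.array([0.01, 0.01])
sigma = np.array([0.1, 0.2])
corr = np.array([[1.0, 0.35], [0.35, 1.0]])
s = simulate_gbm(28800, mu, sigma, corr, np.array([100.0, 100.0]), 1 / 28800, rng)
idx = asynchronise(s, lam=1 / 15, rng=rng)   # hypothetical: ~1 trade per 15 steps
```

Previous-tick sampling the asynchronised path at a grid of ∆t values, and repeating over many replications, reproduces the kind of experiment summarised in Figure D.15.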
Figure D.15: We investigate whether the interplay from correlation combinations or estimation uncertainty can explain the correlation dynamics found in FSR/AGL from Figure 10. We simulate T = 28800 data points for a three-feature Geometric Brownian Motion with induced pairwise correlations, one negative and two positive (blue, red and green lines) for (a), and three positive correlations (blue, red and green lines) for (b). The synchronous case is then sampled with an exponential inter-arrival process with rate λ. The Dirichlet estimates are obtained for ∆t ranging from 1 to 100 with the conversion to N given in eq. (19). This process is repeated 100 times and the average correlation estimate at each ∆t is then plotted with error bars (computed using a t-distribution with 99 degrees of freedom and the sample standard deviation) representing 68% of the variability between the estimation paths. We see that for ρ ≈ 0, when n and N are not large enough, estimation uncertainty arises, explaining the switching of signs; but it does not account for the correlation magnitude dropping as ∆t increases. The figures can be recovered using the Julia script file 3Asset on the GitHub resource [5].

Appendix E. Algorithms
Algorithm outline for the various implementation methods.
Algorithm 1
GBM Algorithm
Require:
1. n: number of price points to simulate.
2. µ: (D x 1) vector of drift parameters.
3. Σ: (D x D) covariance matrix.
4. start price: (D x 1) vector of S(0).
Procedure for the i-th feature:
1. Generate: Z ∼ N_D(0, I_{D×D}).
2. Set: S_i(t_{k+1}) = S_i(t_k) exp[(µ_i − σ_i²/2)(t_{k+1} − t_k) + √(t_{k+1} − t_k) Σ_{j=1}^{D} A_{ij} Z_j].
return (S)

Table E.4: The Geometric Brownian Motion (GBM) Algorithm (see algorithm 1) simulates a correlated multivariate GBM using the Euler-Maruyama scheme. It is subject to the initial condition S(0) = start price. A is the Cholesky decomposition of Σ. The Julia implementation can be found at GBM in the GitHub resource [5] and was provided by [15].

Algorithm 2 Nyquist Frequency
Require: ˜t: (n x D) matrix of re-scaled sampled times. Non-trade times are represented using NaNs or NAs.
Set: ∆t_i = minimum distance between sampled times for asset i.
Set: N_i = π/∆t_i.
Set: N_i = ⌊N_i⌋, where ⌊x⌋ denotes the floor of x.
Set: N = min_i {N_i}.
return (N)

Table E.5: The Nyquist frequency Algorithm (see algorithm 2) computes the Nyquist cutoff. The Julia implementation can be found in any of the Dirichlet or Fejér implementations in the GitHub resource [5] and is an auxiliary function based on the MATLAB implementation from [26, 20].
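The Nyquist-cutoff rule above can be sketched in a few lines of Python (the paper's implementation is in Julia; the function name here is illustrative), with NaN padding marking non-trade times as in the algorithm:

```python
import numpy as np

def nyquist_cutoff(times):
    """Nyquist cutoff for the Malliavin-Mancino Fourier coefficients.
    `times`: (n x D) array of re-scaled sample times in [0, 2*pi], with NaN
    marking non-trade times. For each asset the smallest inter-arrival gap
    dt_i gives N_i = floor(pi / dt_i); the cutoff is min over assets."""
    cutoffs = []
    for i in range(times.shape[1]):
        t = np.sort(times[~np.isnan(times[:, i]), i])   # drop NaN padding
        dt_min = np.min(np.diff(t))
        cutoffs.append(int(np.floor(np.pi / dt_min)))
    return min(cutoffs)

# Illustrative asynchronous example: asset 1 trades at t = 0, 1, 2, 3, 6 and
# asset 2 at t = 0, 0.5, 2 (NaN-padded), so dt_min is 1 and 0.5 respectively
# and the cutoff is min(floor(pi/1), floor(pi/0.5)) = min(3, 6) = 3.
times = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0],
                  [3.0, np.nan], [6.0, np.nan]])
```

Taking the minimum across assets ensures no asset's coefficients are requested beyond its own resolvable frequency.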
Algorithm 3
Malliavin-Mancino Estimators
Require:
1. P: (n x D) matrix of sampled prices. Non-trade times are represented using NaNs or NAs.
2. T: (n x D) matrix of sampled times. Non-trade times are represented using NaNs or NAs.
3. N (Optional): cutoff frequency (Integer) used in the convolution. Default is set to be the Nyquist cutoff.
4. tol (Optional): error tolerance for the NUFFTs. Determines the number of grid points to spread.
Step I. Initialisation.
I.1. Re-scale the sampled times (T) (see algorithm 4).
I.2. Compute the Nyquist cutoff (N), unless specified otherwise through the input parameter (see algorithm 2).
Step F: Compute the Fourier coefficients, k ∈ {−N, ..., N}.
for i = 1 to D do
  F.1. Extract the re-scaled sampled times for the i-th object: ˜t_i = T(i), excluding any NaNs or NAs.
  F.2. Extract and compute the logarithm of the sampled prices for the i-th object: ˜p_i = ln(p_i(˜t_i)), excluding any NaNs or NAs.
  F.3. Compute the returns: δ_i(I_h) = ˜p_i(˜t^i_{h+1}) − ˜p_i(˜t^i_h).
  F.4. Compute the Fourier coefficients:
    c^+_k(i) = Σ_{h=1}^{n_i−1} e^{i k ˜t^i_h} δ_i(I_h);  c^−_k(i) = Σ_{h=1}^{n_i−1} e^{−i k ˜t^i_h} δ_i(I_h)
end for
Step C: Convolution.
C.1. The Dirichlet implementation:
  Σ̂^{ij} = (1/(2N + 1)) Σ_{k=−N}^{N} c^+_k(i) c^−_k(j)
C.2. The Fejér implementation:
  Σ̂^{ij} = (1/(N + 1)) Σ_{k=−N}^{N} (1 − |k|/N) c^+_k(i) c^−_k(j)
Correlation: R^{ij} = Σ̂^{ij} / (√(Σ̂^{ii}) √(Σ̂^{jj}))
return (Σ̂, R)

Table E.6: The Malliavin-Mancino estimators algorithm (see algorithm 3) computes the Dirichlet or Fejér implementation of the Malliavin-Mancino estimator [27, 28] using a complex exponential formulation of the Fourier transform. The algorithm is a sketch provided by [20] and is based on their MATLAB implementation [26].

Algorithm 4 Time-rescaling Algorithm
Require: T: (n x D) matrix of sampled times. Non-trade times are represented using NaNs or NAs.
Set: t_min = minimum value of T.
Set: t_max = maximum value of T.
for i = 1 to D do
  for h = 1 to n_i do
    ˜t^i_h = 2π (t^i_h − t_min) / (t_max − t_min)
  end for
end for
return (˜t)

Table E.7: The Time-rescaling Algorithm (see algorithm 4) re-scales the trading times from [0, T] to [0, 2π]. The Julia implementation can be found in any of the Dirichlet or Fejér implementations in the GitHub resource [5] and is an auxiliary function based on the MATLAB implementation from [26, 20].

Algorithm 5 "for-loop" implementation
Require:
1. δ_i = (δ_i(I_h))_{h=1}^{n_i−1}: vector of source strengths for asset i.
2. ˜t_i = (˜t^i_h)_{h=1}^{n_i−1}: vector of re-scaled sample times for asset i.
3. N: the cutoff frequency.
for s = 1 to 2N + 1 do
  k = s − N − 1
  c^+_s = Σ_{h=1}^{n_i−1} e^{i k ˜t^i_h} δ_i(I_h)
  c^−_s = Σ_{h=1}^{n_i−1} e^{−i k ˜t^i_h} δ_i(I_h)
end for
return (c^+_k, c^−_k)

Table E.8: The for-loop implementation [29] (see algorithm 5) computes the Fourier coefficients using for-loops. The Julia implementation can be found in MScorrDK or MScorrFK on the GitHub resource [5], corresponding to the Dirichlet and Fejér representations respectively. The implementation is based on the MATLAB implementation from [29].

Algorithm 6 Vectorised code implementation
Require:
1. δ_i = (δ_i(I_h))_{h=1}^{n_i−1}: vector of source strengths for asset i.
2. ˜t_i = (˜t^i_h)_{h=1}^{n_i−1}: vector of re-scaled sample times for asset i.
3. N: the cutoff frequency.
Set: k = (1, 2, ..., N)^T, a column vector of 1 to N.
Compute: c_N = δ_i^⊤ exp(−i ˜t_i k^⊤)
Compute: c_0 = Σ_{h=1}^{n_i−1} δ_i(I_h)
Piece c_N, its complex conjugate, and c_0 together to obtain c^+_k and c^−_k.
return (c^+_k, c^−_k)

Table E.9: The vectorised code implementation [20, 26] (see algorithm 6) replaces for-loops to vectorise the computation of the Fourier coefficients and exploits the Hermitian symmetry of the real source strengths. The Julia implementation can be found in CFTcorrDK or CFTcorrFK on the GitHub resource [5], corresponding to the Dirichlet and Fejér representations respectively.
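A Python sketch of the structure of algorithms 3, 5 and 6 (the paper's implementations are in Julia; the names below are illustrative). The loop and vectorised routines compute the same coefficients c^±_k = Σ_h e^{±ik t̃_h} δ_h, and the Dirichlet convolution then averages c^+_k(i) c^−_k(j) over k:

```python
import numpy as np

def fourier_coeffs_loop(delta, t, N):
    """Algorithm 5 style: explicit loop over modes k = -N..N."""
    c_plus = np.empty(2 * N + 1, dtype=complex)
    c_minus = np.empty(2 * N + 1, dtype=complex)
    for s, k in enumerate(range(-N, N + 1)):
        c_plus[s] = np.sum(np.exp(1j * k * t) * delta)
        c_minus[s] = np.sum(np.exp(-1j * k * t) * delta)
    return c_plus, c_minus

def fourier_coeffs_vec(delta, t, N):
    """Algorithm 6 style: vectorised, exploiting the Hermitian symmetry of
    real source strengths (c_{-k}^+ = conj(c_k^+), c_k^- = conj(c_k^+))."""
    k = np.arange(1, N + 1)
    c_pos = np.exp(1j * np.outer(k, t)) @ delta      # c_k^+ for k = 1..N
    c0 = np.sum(delta)
    c_plus = np.concatenate([np.conj(c_pos[::-1]), [c0], c_pos])
    return c_plus, np.conj(c_plus)

def dirichlet_covariance(d1, t1, d2, t2, N):
    """Algorithm 3, step C.1: (1/(2N+1)) * sum_k c_k^+(1) c_k^-(2)."""
    cp1, _ = fourier_coeffs_vec(d1, t1, N)
    _, cm2 = fourier_coeffs_vec(d2, t2, N)
    return np.real(np.sum(cp1 * cm2)) / (2 * N + 1)
```

A useful correctness check: on a uniform grid with 2N + 1 equal to the number of returns, the normalised Dirichlet kernel is exactly a Kronecker delta at the grid spacings, so the Dirichlet convolution reduces to the realised covariance Σ_h δ_1(I_h) δ_2(I_h).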
Algorithm 7
Zero-padded FFT implementation
Require:
1. δ_i = (δ_i(I_h))_{h=1}^{n_i−1}: vector of source strengths for asset i.
2. ˜t_i = (˜t^i_h)_{h=1}^{n_i−1}: vector of re-scaled sample times for asset i (˜t^i_h ∈ [0, 2π]).
3. N* = ⌊π/∆t⌉, where ⌊x⌉ denotes rounding x to the nearest Integer and ∆t is the minimum distance between sampled times from algorithm 2.
Initialise: (˜f_ℓ)_{ℓ=1}^{N*} = 0, a zero vector of length N*.
for h = 1 to n_i − 1 do
  ℓ = ⌊˜t^i_h N*/(2π)⌉ + 1
  ˜f_ℓ = δ_i(I_h)
end for
return (˜f_ℓ for FFT computation)

Table E.10: The zero-padded FFT implementation (see algorithm 7) creates a uniform grid and shifts the non-uniform source points to the nearest grid point on an up-sampled uniform grid. The Julia implementation can be found in FFTZPcorrDK or FFTZPcorrFK on the GitHub resource [5], corresponding to the Dirichlet and Fejér representations respectively. Note that the index ℓ is set for languages with array indices starting from 1.

Algorithm 8 Fast Gaussian Gridding NUFFT implementation
Require:
1. δ_i = (δ_i(I_h))_{h=1}^{n_i−1}: vector of source strengths for asset i.
2. ˜t_i = (˜t^i_h)_{h=1}^{n_i−1}: vector of re-scaled sample times for asset i (˜t^i_h ∈ [0, 2π]).
3. M = 2N + 1: the number of Fourier modes computed.
4. ϵ: error tolerance.
Step I. Initialisation:
I.1. Set: σ = 2.
I.2. Set: M_r = σM; M_sp = ⌊−ln(ϵ)(σ − 0.5)/(π(σ − 1)) + 0.5⌋.
I.3. Set: λ = σ² M_sp/(σ(σ − 0.5)); h_x = 2π/M_r; t_1 = π/λ.
I.4. Set: τ = πλ/M_r².
I.5. Initialise: (˜f_ℓ)_{ℓ=1}^{M_r} = 0, a zero vector of length M_r.
for k = 1 to M_sp do
  E_3,k = exp(−t_1 k²)
end for
Step C: Convolution (see eq. (12)).
for h = 1 to n_i − 1 do
  b = ⌊˜t^i_h/h_x⌋, index of the nearest up-sampled grid point ξ_b ≤ ˜t^i_h.
  d = ˜t^i_h/h_x − b.
  Set: E_1 = e^{−t_1 d²} and E_2 = e^{2 t_1 d}, with the powers E_2^k built up by repeated multiplication.
  For each offset k = −M_sp + 1 to M_sp, accumulate onto the periodically wrapped grid index ℓ = ((b + k) mod M_r) + 1:
    ˜f_ℓ = ˜f_ℓ + δ_i(I_h) E_1 E_2^k E_3,|k|,
  where E_1 E_2^k E_3,|k| = exp(−t_1 (d − k)²) is the Gaussian spreading weight.
end for
Step F: Compute the FFT on the over-sampled grid (see eq. (13)).
F.1. Find the Fourier coefficients F_G(dp_i)(k) via the FFT on the grid ˜f_ℓ.
Step D: Deconvolution (see eq. (14)).
D.1. Compute: F(dp_i)(k) = √(π/τ) e^{k²τ} F_G(dp_i)(k)/M_r.
return (F(dp_i)(k), k ∈ {−N, ..., N})

Table E.11: The non-uniform FFT implementation (see algorithm 8) creates an up-sampled uniform grid and convolves the non-uniform source points onto the uniform grid, applies the FFT on the up-sampled grid and deconvolves the convolution effects in the Fourier space. Algorithm 8 is specific to the Gaussian kernel as it uses the fast Gaussian gridding implementation by [16] to reduce the number of exponential evaluations. The Julia implementation can be found in NUFFT-FGG on the GitHub resource [5]. The algorithm is a replication of the FORTRAN source code from [16], but adjusted for languages with array indices starting from 1.

Algorithm 9 Kaiser-Bessel NUFFT implementation
Require:
1. δ_i = (δ_i(I_h))_{h=1}^{n_i−1}: vector of source strengths for asset i.
2. ˜t_i = (˜t^i_h)_{h=1}^{n_i−1}: vector of re-scaled sample times for asset i (˜t^i_h ∈ [0, 1)).
3. M = 2N + 1: the number of Fourier modes computed.
4. ϵ: error tolerance.
Step I. Initialisation:
I.1. Set: σ = 2.
I.2. Set: M_r = σM; M_sp = ⌊(⌈log_10(1/ϵ)⌉ + 2)/2⌋.
I.3. Initialise: (˜f_ℓ)_{ℓ=1}^{M_r} = 0, a zero vector of length M_r.
Step C: Convolution (see eq. (12)).
for h = 1 to n_i − 1 do
  b = ⌊˜t^i_h M_r⌋, index of the nearest up-sampled grid point ξ_b ≤ ˜t^i_h.
  d = ˜t^i_h − b/M_r.
  For each offset k = −M_sp + 1 to M_sp, accumulate onto the periodically wrapped grid index ℓ = ((b + k) mod M_r) + 1:
    ˜f_ℓ = ˜f_ℓ + δ_i(I_h) ϕ_KB(d − k/M_r).
end for
Step F: Compute the FFT on the over-sampled grid (see eq. (13)).
F.1. Find the Fourier coefficients F_KB(dp_i)(k) via the FFT on the grid ˜f_ℓ.
Step D: Deconvolution (see eq. (14)).
D.1. Compute: F(dp_i)(k) = M_r F_KB(dp_i)(k)/ϕ̂_KB(k).
return (F(dp_i)(k), k ∈ {−N, ..., N})

Table E.12: The non-uniform FFT implementation (see algorithm 9) creates an up-sampled uniform grid and convolves the non-uniform source points onto the uniform grid, applies the FFT on the up-sampled grid and deconvolves the convolution effects in the Fourier space. Algorithm 9 uses the Kaiser-Bessel kernel here, but the algorithm can be applied to any kernel that is 1-periodic. The algorithm is structured without the pre-computation step used in [36] and evaluates the kernel "on-the-fly". The Julia implementation can be found in NUFFT-KB on the GitHub resource [5]. The algorithm is set for languages with array indices starting from 1.
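The gridding, FFT and deconvolution pattern shared by algorithms 8 and 9 can be sketched compactly in Python (the paper's implementations are in Julia). This simplified sketch uses the Gaussian kernel without the fast-gridding exponential factorisation; the function name is illustrative and the parameter choices (σ = 2 over-sampling, M_sp = 12, τ per Greengard and Lee [16]) are assumptions for the sketch:

```python
import numpy as np

def nufft_type1(delta, t, N, m_sp=12):
    """Evaluate F(k) = sum_h delta_h * exp(-1j*k*t_h) for k = -N..N by
    Gaussian gridding: spread each source onto m_sp grid points either side
    of its nearest node on a 2x over-sampled uniform grid, FFT the grid,
    then deconvolve the Gaussian in Fourier space."""
    m = 2 * N + 1                        # number of requested modes
    m_r = 2 * m                          # over-sampled grid size (sigma = 2)
    h_x = 2 * np.pi / m_r                # grid spacing
    tau = np.pi * m_sp / (3.0 * m**2)    # Gaussian width (R = 2, Greengard-Lee)
    f = np.zeros(m_r)
    ks = np.arange(-m_sp + 1, m_sp + 1)
    for x, d in zip(t, delta):
        b = int(x / h_x)                 # nearest grid node below x
        w = np.exp(-((x - (b + ks) * h_x) ** 2) / (4 * tau))
        np.add.at(f, (b + ks) % m_r, d * w)       # periodic wrap-around
    f_hat = np.fft.fft(f)                # F_G(k) = sum_l f_l exp(-1j*k*xi_l)
    k = np.arange(-N, N + 1)
    f_g = f_hat[k % m_r]                 # reorder FFT output to k = -N..N
    return np.sqrt(np.pi / tau) * np.exp(k**2 * tau) * f_g / m_r
```

A quick correctness check is to compare the output against the direct O(nM) sum Σ_h δ_h e^{−ik t_h}; with this spreading width the two agree to high accuracy, which is the machine-precision recovery property the paper relies on.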
Algorithm 10 Exponential of semi-circle NUFFT implementation