Complex Correlation Approach for High Frequency Financial Data

Abstract

We propose a novel approach that allows the Hilbert transform based complex correlation to be calculated for unevenly spaced data. This method is especially suitable for high frequency trading data, which are of particular interest in finance. Its most important feature is the ability to take into account lead-lag relations on different scales, without knowing them in advance. We also present results obtained with this approach for Tokyo Stock Exchange intraday quotations. We show that individual sectors and subsectors tend to form important market components which may follow each other with small but significant delays. These components may be recognized by analysing eigenvectors of the complex correlation matrix for Nikkei 225 stocks. Interestingly, sectorial components are also found in eigenvectors corresponding to the bulk eigenvalues, traditionally treated as noise.


Mateusz Wilinski,∗ Yuichi Ikeda, and Hideaki Aoyama
Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland
Scuola Normale Superiore, piazza dei Cavalieri 7, 56126 Pisa, Italy
Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
Graduate School of Science, Kyoto University, Kyoto, Japan
(Dated: November 9, 2018)

I. INTRODUCTION

Describing interactions between parts of a system is a crucial element of complexity science. The World Wide Web, the human brain and transportation networks are just a few of the complex systems that attract the attention of researchers. The main tools used to investigate the properties of connections in highly coupled systems are complex networks [1]. For many complex systems described with networks, it is clear what kind of relation is represented by a particular edge. In social networks, for example, it may be a friendship or a common workplace. In other cases, like scientific collaboration or city communication, we can even assign a weight to each connection [2]. There are, however, complex systems for which the relationship between any two elements is implicit. This is the case when we observe only the evolution of the system and are unable to track the actual interactions and the way parts of the system influence each other. This applies to, among others, brain activity and financial markets. The latter will be the main topic of this paper.

In order to find the real structure of a system with hidden dependencies we need to analyse the observable outcome of the system dynamics. For financial markets this outcome might be stock quotations and for brain activity it may be the electroencephalography (EEG) signal. It is a common practice to use different measures that indicate which parts of the system are connected based on their signal similarity. In finance, the most popular measure is the Pearson correlation coefficient, which proved to be successful in uncovering the hierarchical structure of the market [3]. In other areas, as for the EEG signal mentioned above, it is often necessary to develop other methods that suit the specifics of the data [4].

Using correlation as a measure of dependency in financial data has several drawbacks.
First, correlation does not imply causation and its high values may be a consequence of indirect relations. Some papers tried to deal with these problems by using more sophisticated methods, like partial correlation [5] or mutual information [6]. Second, price movements of related stocks may be desynchronized. In other words, one stock may lead another, which follows with a delay. The traditional Pearson correlation is not able to find this kind of lead-lag relation. This problem was approached by analysing shifted correlations [7, 8]. Unfortunately, this methodology may be inefficient when dealing with a high number of signals, since the delays in financial data may differ across pairs of stocks. Taking into account that in both [7, 8] around 40 lags were analysed, it would mean estimating around $10^6$ correlations in the case of our data. Moreover, this number grows like $N^2$. In this paper we propose a different approach, suited for high frequency financial data, which takes delays into account and does so in an efficient way. Furthermore, we use this approach to analyse intraday quotation data from the Tokyo Stock Exchange (TSE).

∗ Electronic address: [email protected]

II. DATA

The data used in this paper consist of all the trades and all the best bid and best ask offers submitted to the TSE in the period between 1 January and 31 December 2014. We focused on the stocks that form the Nikkei 225, the most important stock market index for the TSE and one of the most significant indexes in Asia. We chose 222 stocks that were components of the Nikkei 225 in 2014 and were traded on every analysed day. For each stock $i$ we calculated the mid price:

$$P_i(t) = \frac{A_i(t) + B_i(t)}{2}, \tag{1}$$

where $A_i(t)$ and $B_i(t)$ are respectively the best ask and the best bid for the $i$th stock. From the mid prices we get the logarithmic prices:

$$p_i(t) = \log(P_i(t)), \tag{2}$$

where $p_i(t)$ is a step function with steps at each quotation. We also grouped all the stocks into sectors and subsectors, according to the official Nikkei 225 website. See Table I for the names of sectors and subsectors, as well as the assigned colors and shapes used in further graphs. The chosen data subset consists of more than 2 · quotations; however, 90% of them do not change the mid price. The remaining part is around 1 . · significant quotations, with the least actively traded stock having 1778 data points and the second least 8632 data points. It should also be pointed out that the resolution of timestamps is one second, although we know the precise order of quotes that appear within the same second.

III. METHODOLOGY

A nonparametric methodology for calculating the covariance and correlation matrix of multivariate data was proposed and described in detail in [9]. It can be used with unevenly spaced time series and it has already shown meaningful results for financial data [10]. It was also shown to outperform other methods when dealing with finite samples [11]. A thorough study showed as well that it is almost unbiased in the presence of market microstructure noise [12]. The idea behind this method is based on the Fourier transform. Its major advantage is that it does not require any modifications of the raw data. We will describe the way this approach is used with empirical data. More technical details can be found in [9].

(Footnote to Sec. II: the official Nikkei 225 website is http://indexes.nikkei.co.jp/en/nkave.)

Let us denote the analysed price process as $p_i(t) = \log P_i(t)$. Additionally, let us assume that this process is well defined on the time interval $[0, T]$. First, we rescale this time interval into $[0, 2\pi]$. Second, we need to calculate the Fourier coefficients of $dp_i(t)$, which are defined as follows:

$$a_k(dp_i) = \frac{1}{\pi} \int_0^{2\pi} \cos(kt)\, dp_i(t),$$

$$b_k(dp_i) = \frac{1}{\pi} \int_0^{2\pi} \sin(kt)\, dp_i(t).$$

Assuming that $p_i(t)$ is a step function and $t_m$ is the time of the $m$th quotation, the above integrals become summations:

$$a_k(dp_i) = \frac{p_i(t_N) - p_i(t_1)}{\pi} - \frac{1}{\pi} \sum_{m=1}^{N-1} p_i(t_m) \left( \cos(k t_{m+1}) - \cos(k t_m) \right),$$

$$b_k(dp_i) = -\frac{1}{\pi} \sum_{m=1}^{N-1} p_i(t_m) \left( \sin(k t_{m+1}) - \sin(k t_m) \right),$$

where $N$ is the number of quotations, $t_1 = 0$ and $t_N = 2\pi$.

From these coefficients, we can obtain the log-returns variance-covariance matrix $\Sigma_{ij}$ as a function of time. In our case, though, we only want the average covariance over a particular time period. For this purpose, we only need the constant factor of the covariance Fourier representation, because the other parts reduce to zero in the process of averaging. A precise derivation of this constant part may be found in [9].
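The summation formulas for $a_k(dp_i)$ and $b_k(dp_i)$ can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `fourier_coefficients` and its arguments are hypothetical, and the sketch assumes that the first and last quotation mark the ends of the analysed interval, so that they map to $0$ and $2\pi$ after rescaling.

```python
import numpy as np

def fourier_coefficients(times, log_prices, n_harmonics):
    """Fourier coefficients a_k(dp), b_k(dp), k = 1..n_harmonics,
    for a log price that is a step function with steps at `times`.

    Assumption: times[0] and times[-1] are the ends of the interval,
    so they are mapped to 0 and 2*pi respectively.
    """
    times = np.asarray(times, dtype=float)
    p = np.asarray(log_prices, dtype=float)

    # Rescale the observation times to [0, 2*pi].
    t = 2.0 * np.pi * (times - times[0]) / (times[-1] - times[0])

    # Column of harmonics k = 1..K; broadcasting gives (K, N-1) arrays.
    k = np.arange(1, n_harmonics + 1)[:, None]
    d_cos = np.cos(k * t[1:]) - np.cos(k * t[:-1])
    d_sin = np.sin(k * t[1:]) - np.sin(k * t[:-1])

    # Sums over m = 1..N-1 of p(t_m) * (cos(k t_{m+1}) - cos(k t_m)), etc.
    a = (p[-1] - p[0]) / np.pi - (d_cos @ p[:-1]) / np.pi
    b = -(d_sin @ p[:-1]) / np.pi
    return a, b

# Toy series: a single unit jump of the log price in the middle of the interval.
a, b = fourier_coefficients([0.0, 1.0, 2.0], [0.0, 1.0, 1.0], n_harmonics=4)
```

For this toy series $dp$ is a single unit jump at $t = \pi$, so the integrals give $a_k = \cos(k\pi)/\pi$ and $b_k = \sin(k\pi)/\pi = 0$, which the summation form reproduces. No resampling onto a regular grid is involved; only the actual quotation times enter.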
The final result for the constant term of the covariance reads:

$$a_0(\Sigma_{ij}) = \lim_{\tau \to 0} \frac{\pi \tau}{T} \sum_{k=1}^{T/2\tau} \left[ a_k(dp_i)\, a_k(dp_j) + b_k(dp_i)\, b_k(dp_j) \right], \tag{3}$$

where $\tau$ determines the highest wave harmonic $T/2\tau$. In the case of our TSE data the minimum value of $\tau$ could be one second, because this is the timestamp resolution. However, after analysing the distribution of time intervals between quotations, we decided to set $\tau = 1$ minute. This is consistent with findings from [11, 13], where it was shown that this is the most efficient scale for the Fourier estimator. Moreover, as shown in [12], $\tau = 1$ minute gives the best balance between the problems of finite sample and asynchronicity. It should also be pointed out that taking a specific $\tau$ is not in any way equivalent to aggregating data into evenly spaced time series, and we still use all the data points. Finally, we can compute the covariance matrix estimator:

$$\hat{\sigma}_{ij} = 2\pi\, a_0(\Sigma_{ij}), \tag{4}$$

and further also the correlation matrix:

$$\rho_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii} \cdot \hat{\sigma}_{jj}}}. \tag{5}$$

An important advantage of this estimator is that it gives accurate estimates for step function type data. This makes it particularly suited for financial data.

We can now use all the tick data available, but the problem of lead-lag relationships remains. To overcome it, we adopt and adjust to our needs ideas from the Complex Hilbert Principal Component Analysis (CHPCA) [14], mainly the Hilbert transform. The CHPCA was used primarily in geophysics and meteorology. Recently, it also proved to be effective with financial data [15, 16]. The previous research, however, was done using daily data and was applicable only to synchronous data with a constant step. Although it makes sense to expect indexes of different countries to have lead-lag relations that span periods longer than one day, this is rather not the case for stocks from the same market. The increased activity of intraday traders and hedge funds made the market much more efficient and synchronized [17].
As a result, one needs to use high frequency data in order to find out which stocks are leading others. Since we do not want to aggregate the data, which may lead to data loss, we would like to propose a way to use the Hilbert transform together with the Fourier algorithm described before. We show that this may be done in a purely analytical way, which not only allows us to use raw data but also makes the procedure numerically efficient.

The Hilbert transform is a linear operator that does not change the time domain of a transformed function. A formal definition of this operator is as follows:

$$H(Z, t) = \mathrm{p.v.}\, \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{Z(s)}{t - s}\, ds, \tag{6}$$

where p.v. stands for the Cauchy principal value and $Z$ is some process that we want to transform. The Hilbert transform, at a particular time $t$, codes information about the future and past movement of the transformed series. For this reason it is useful when looking for correlations among different signals, especially if we expect that the relation may be lagged but we do not know the specific lag value, or we know that this lag may vary over time. In order to keep the information about both the time series value and its Hilbert transform, we produce a complexified time series. This new time series has complex values, with the real part being the original value and the imaginary part being the Hilbert transform:

$$\hat{Z}(t) = Z(t) + i H(Z, t). \tag{7}$$

Such a time series may now be used to obtain complex correlations, which should contain information about both immediate and lagged correlations. Using Eq. (6) to obtain the Hilbert transform of a time series may be very difficult. Our goal, however, is to find the correlation matrix of the complexified time series $\hat{Z}(t)$ and, as we will show, we do not need to know an explicit form of $H(Z, t)$. In our case $Z(t) = dp_i(t)$ and we already know how to obtain its Fourier coefficients, which lead to an estimate of its correlation matrix.
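To see why the complexified series of Eq. (7) carries lag information, consider the following toy sketch. It is not the estimator proposed in this paper: it assumes evenly sampled, periodic signals and uses the standard FFT construction of the discrete analytic signal, and the names `analytic_signal` and `complex_corr` are hypothetical. A cosine and a delayed copy of it have an ordinary correlation below one, while their complex correlation has unit magnitude and a phase equal to the delay expressed in radians.

```python
import numpy as np

def analytic_signal(x):
    """Return x + i*H(x) for an evenly sampled series (FFT construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                   # keep the mean
    if n % 2 == 0:
        weights[n // 2] = 1.0          # keep the Nyquist term
        weights[1:n // 2] = 2.0        # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

def complex_corr(x, y):
    """Complex correlation of the complexified versions of x and y."""
    zx, zy = analytic_signal(x), analytic_signal(y)
    zx = zx - zx.mean()
    zy = zy - zy.mean()
    num = np.mean(zx * np.conj(zy))
    den = np.sqrt(np.mean(np.abs(zx) ** 2) * np.mean(np.abs(zy) ** 2))
    return num / den

n, cycles, delay = 512, 8, 4            # toy parameters (assumptions)
omega = 2.0 * np.pi * cycles / n
t = np.arange(n)
x = np.cos(omega * t)
y = np.cos(omega * (t - delay))         # y follows x with a delay

rho = complex_corr(x, y)
pearson = np.corrcoef(x, y)[0, 1]
```

In this example the real part of `rho` coincides with the Pearson coefficient `pearson`, its magnitude is one, and its phase equals `omega * delay`. The sign convention relating this phase to the direction of the lead-lag relation is specific to this sketch.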
The Hilbert transform is additive and can be easily calculated for trigonometric functions, namely:

$$H(\sin(\cdot), x) = -\cos(x),$$

$$H(\cos(\cdot), x) = \sin(x).$$

As a result, if we know the Fourier coefficients of a given process we can easily calculate them also for its Hilbert transform:

$$a_k(H(Z)) = -b_k(Z),$$

$$b_k(H(Z)) = a_k(Z).$$

This way, we can easily obtain the Fourier coefficients of $\hat{Z}(t)$:

$$a_k(\hat{Z}) = a_k(Z) + i\, a_k(H(Z)) = a_k(Z) - i\, b_k(Z),$$

$$b_k(\hat{Z}) = b_k(Z) + i\, b_k(H(Z)) = b_k(Z) + i\, a_k(Z).$$

Finally, we can use them with Eq. (3) and obtain:

$$a_0(\Sigma_{ij}) = \lim_{\tau \to 0} \frac{\pi \tau}{T} \sum_{k=1}^{T/2\tau} \Big[ \big(a_k(dp_i) - i\, b_k(dp_i)\big)\big(a_k(dp_j) + i\, b_k(dp_j)\big) + \big(b_k(dp_i) + i\, a_k(dp_i)\big)\big(b_k(dp_j) - i\, a_k(dp_j)\big) \Big]. \tag{8}$$

Combining Eqs. (3), (8), (4) and (5) allows us to calculate the complex correlation matrix directly, with no numerical approximations. The precision of this estimate is limited only by the finite sample and the data timestamp resolution. In particular, the Hilbert transform part is included in an entirely analytical way. As a result, the precision of our estimation is the same as for the Fourier approach, namely it is proportional to N − . It is also worth mentioning that the number of parameters needed to estimate the complex correlation with our method is linear in $N$. Basically, we only need to obtain the parameters of the Fourier correlation estimator. In contrast, previous works on complex correlation of daily financial data [15, 16] required O ( N ) estimates.

Each element $\rho_{kl}$ of the complex correlation matrix has the form:

$$\rho_{kl} = s_{kl}\, e^{-i \theta_{kl}}, \tag{9}$$

where $s_{kl}$ is the magnitude of the correlation and $\theta_{kl}$ is the phase, which indicates the lead-lag relation between stock $k$ and stock $l$. The last part of the data analysis done in this paper is the eigendecomposition of the correlation matrix. This is often used as a tool of random matrix theory for financial data [18–20].
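As an illustration of how Eq. (8) turns Fourier coefficients into a complex correlation matrix, here is a hedged sketch. The function name `complex_correlation` and the array layout (one row per series, one column per harmonic) are assumptions made for this example. The limit in Eq. (8) is replaced by a finite number of harmonics $K = T/2\tau$, and the covariance is normalised by the square root of the product of the diagonal terms, which is the usual definition of a correlation.

```python
import numpy as np

def complex_correlation(a, b):
    """Complex correlation matrix from Fourier coefficients.

    a, b: real arrays of shape (n_series, K) holding a_k(dp_i), b_k(dp_i).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_harmonics = a.shape[1]

    # Fourier coefficients of the complexified series.
    a_hat = a - 1j * b
    b_hat = b + 1j * a

    # Finite-K version of Eq. (8): pi * tau / T equals pi / (2 K).
    a0 = (np.pi / (2 * n_harmonics)) * (
        a_hat @ a_hat.conj().T + b_hat @ b_hat.conj().T
    )
    sigma = 2.0 * np.pi * a0                      # Eq. (4)
    scale = np.sqrt(np.real(np.diag(sigma)))      # assumed normalisation
    return sigma / np.outer(scale, scale)

# Toy input: series 0 is cos(t), series 1 is cos(t - phi).
phi = 0.3
a = np.array([[1.0, 0.0, 0.0], [np.cos(phi), 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [np.sin(phi), 0.0, 0.0]])
rho = complex_correlation(a, b)
```

For this toy input the off-diagonal element `rho[0, 1]` has magnitude one and phase `phi`, and the matrix is Hermitian with ones on the diagonal. Each summand of Eq. (8) equals $2\,c_k(dp_i)\,\overline{c_k(dp_j)}$ with $c_k = a_k - i\,b_k$, so the whole matrix follows from one complex coefficient per series and harmonic.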
Since the complex correlation matrix $\rho$ is Hermitian, we can write it in the following form:

$$\rho = \sum_{i=1}^{N} \lambda_i\, V^{(i)} V^{(i)\dagger}, \tag{10}$$

where $\lambda_i$ is the $i$th eigenvalue, $V^{(i)}$ is the corresponding eigenvector and $\dagger$ denotes the conjugate transpose. The complex principal components $CP_i$ may be derived from the eigenvectors according to the equation:

$$CP_i(t) = \sum_{j=1}^{N} dp_j(t)\, V^{(i)}_j, \tag{11}$$

where $dp_j$ is in our case the process of log returns. The important thing is that $V^{(i)}_j$ is actually the complex correlation between the $j$th time series and the $i$th complex principal component. The complex form of this correlation informs us not only about which stocks, or groups of stocks, are closely related with a particular component, but also about the lead-lag relation between them. We will exploit it in the empirical analysis of financial data.

FIG. 1: Histogram of the frequency of eigenvalues of complex correlation matrix for TSE intraday data (2014). Horizontal axis: eigenvalue. Vertical axis: number of occurrences.
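The decomposition in Eq. (10) can be checked numerically with a short sketch. The 3×3 matrix below is made up for illustration and the function names are hypothetical; `numpy.linalg.eigh` is used because it is designed for Hermitian matrices and returns real eigenvalues. The second function drops the $i = 1$ term of the sum, which is one way to remove the component belonging to the largest eigenvalue.

```python
import numpy as np

def spectral_decomposition(rho):
    """Eigenvalues (descending) and eigenvectors (columns) of a Hermitian matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(rho)   # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

def remove_largest_component(rho):
    """Sum of Eq. (10) without the term of the largest eigenvalue."""
    lam, vecs = spectral_decomposition(rho)
    v1 = vecs[:, [0]]                                 # column vector
    return rho - lam[0] * (v1 @ v1.conj().T)

# Made-up Hermitian correlation matrix with complex off-diagonal entries.
rho = np.array([
    [1.0, 0.6 * np.exp(0.2j), 0.5 * np.exp(-0.1j)],
    [0.6 * np.exp(-0.2j), 1.0, 0.4 * np.exp(0.3j)],
    [0.5 * np.exp(0.1j), 0.4 * np.exp(-0.3j), 1.0],
])

lam, vecs = spectral_decomposition(rho)
rebuilt = (vecs * lam) @ vecs.conj().T    # sum_i lam_i V^(i) V^(i)^dagger
residual = remove_largest_component(rho)
```

Here `rebuilt` agrees with `rho` up to rounding, and the eigenvalues sum to the trace of the matrix. Numerical eigenvectors are defined only up to an overall complex phase factor, so the phases within one eigenvector are meaningful relative to each other rather than in absolute terms.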

IV. RESULTS

We start our analysis by looking at the eigenvalues of the complex correlation matrix obtained for Nikkei 225 stock quotations in 2014. As shown in Fig. 1, there is one extremely large eigenvalue, a few relatively big ones, and the rest form a bulk. This structure is in accordance with previous results on financial data correlations [18–20], which our approach is able to reproduce. It was often stated that the largest eigenvalue describes the so-called market mode, which represents the overall movement of the market. This movement was often consistent with the most important index in the analysed market. Moreover, it was shown in [21] that this eigenvalue, when all the correlations are positive, is simply a linear function of the average correlation. In our case correlations are complex, but we checked that the ordinary correlations are also positive for the data used in this work. To find out whether the same interpretation holds for complex correlations, let us look at the first eigenvector, corresponding to the largest eigenvalue. Panel a) of Fig. 2 shows that all the coefficients of this eigenvector, represented as points in the complex plane, group together on a small, nearly real and positive section. Each point corresponds to a certain stock, and the colors and shapes represent sectors as shown in Table I. There is no sectoral structure, apart perhaps from a few material stocks being closer to zero. If we recall that the coefficients of each eigenvector are actually correlations between a particular stock and a principal component, we see that all the stocks are positively correlated with this component and there are almost no delays, since the phase difference is close to zero for all the points.
This is exactly what we would expect from the market mode, which should lead all the stocks in a similar, positive and immediate manner.

The few big eigenvalues, apart from the largest one, are commonly associated with market sectors and said to contain significant information about the market structure. Finally, the rest of the eigenvalues, being part of the bulk, are supposed to represent market noise. Our analysis of the complex correlation matrix shows that the spectral structure of financial high frequency data is probably much richer and more complex. We observe that there are at least three significantly different groups of eigenvectors. The first group consists of the few largest eigenvalues, the ones that according to random matrix theory should contain most of the non-noise information. We called this group the immediate components and they are all shown in Fig. 2. The word immediate comes from the fact that all of them have eigenvector coefficients spread along the real axis, with their phases being approximately equal to zero. This means that these components influence and are influenced by all the stocks with negligible delay. Apart from the first eigenvector, we can also see that stocks from the same sectors tend to group close to each other, which is in line with the traditional interpretation of the components being related to market sectors.

FIG. 2: Immediate eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (f) show the eigenvectors corresponding to the largest, second, third, fourth, fifth and sixth largest eigenvalues, respectively. The vertical axis of each panel is the imaginary part.

If we look at further eigenvalues and the corresponding eigenvectors, the situation gets even more interesting. From the seventh to somewhere around the twenty-fifth eigenvector we can still observe stocks grouping into sectors. This time, however, they are no longer spread only along the real axis. Some of these components, which we called delayed, are presented in Fig. 3. There are always a few stocks from one sector or subsector occupying higher positive correlations on the real axis. We see them as being the core or the drivers of a given component. For example, the cores of the seventh and tenth components are respectively Financials and Capital Goods (specifically insurance and construction). Then we can see one or more groups of subsectors that are leading or delayed, i.e. have positive or negative phase. Is there an explanation for these lead-lag relations? We believe that they are not accidental, just like stocks being close to each other as a result of belonging to the same sector. To support this, let us take a closer look at the delayed eigenvectors, starting with the seventh component shown in panel a) of Fig. 3. The main driver of this component is the financial sector, with its stocks having the largest correlation magnitude and a nearly zero phase difference. After a closer look, we see that this component separates not only sectors but even subsectors. All the financial stocks on the far positive real axis are from the insurance subsector. Moreover, these are all the insurance companies in the Nikkei 225 index. The other financial institutions are not correlated or even negatively correlated with this component. Other groups highly correlated with this component are the gas and electricity subsectors. They seem to be leading this component, which might be surprising since it is represented mainly by insurance. We suspect, however, that this component is connected to households and their prosperity.
That is why it is led by gas and electricity, and is positively correlated with the insurance subsector. The next three components, shown in panels b), c) and d), are all strongly connected to all the stocks from the petroleum and mining sectors, specifically oil and coal products. These subsectors are of great importance in Japan, which lacks fossil fuels and needs to import them from other countries, mainly through sea transportation. We argue that these complex principal components represent fossil fuels and their dependencies across the market. The tenth component is highly correlated with construction and depicts its lagged relation to mining and petroleum. Both the thirteenth and sixteenth eigenvectors are driven by the fossil fuels. The first one follows the marine transport, which represents the main source of fossil fuels. The second one, on the other hand, is leading the train and bus transport, which is dependent on petrol. The eigenvectors shown in panels e) and f) are more chaotic, but we can still see stocks of the same sector being close to each other. These delayed eigenvectors, clearly linked to financial and economic dependencies, are in the bulk, which is meant to be mostly noise. The above findings confirm the predictions of [22], which showed that in the case of factor models the bulk does not need to be driven only by statistical uncertainty.

The last group consists of chaotic components. As shown in Fig. 4, it is difficult to find any structures, at least ones connected to sectors, among their complex coefficients. Nevertheless, we would like to comment on two interesting phenomena that we observed. First, for the vast majority of these components we observe that there is one stock, which may be different for different eigenvectors, that has the highest correlation magnitude and at the same time a phase equal to zero. This behavior is shown in both panels a) and b) of Fig. 4 but, as said before, it is very common among chaotic components.
An example of an eigenvector without this kind of driving stock is shown in panel c) of Fig. 4. We attribute this effect to mathematical constraints imposed by PCA, namely the orthogonality of components and their decreasing variance. The second observation is connected to the last eigenvector, shown in panel d) of Fig. 4, where there are three stocks with significantly high absolute correlation. These stocks are all from the same sector and they are all near the real axis, whereas all other stocks group closely around the $(0, 0)$ point. A closer look at these three stocks shows that they not only belong to the same sector but also have a closely related price formation (see the footnote below). Therefore, this component is a linear combination of very similar time series, with combination coefficients which sum up to somewhere around zero. As a result it has a very low variance, and again this is connected to the mathematical constraints of PCA.

Footnote: Specifically, they are: 8630 - SOMPO Holdings Inc, 8725 - Ms&Ad Insurance Group Holding Inc and 8766 - Tokio Marine Holdings Inc.

FIG. 3: Delayed eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (f) show the eigenvectors corresponding to the seventh, tenth, thirteenth, sixteenth, twentieth and twenty-fourth largest eigenvalues, respectively. The vertical axis of each panel is the imaginary part.

FIG. 4: Chaotic eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (c) show the eigenvectors corresponding to the forty-first, fifty-sixth and ninetieth largest eigenvalues, respectively, and panel (d) the eigenvector corresponding to the last eigenvalue. The vertical axis of each panel is the imaginary part.

As a next step, we would like to show aggregated results, instead of analysing each eigen subspace separately. We suspect the complex correlation matrix to code information about sectors, in accordance with what we have seen so far. At first, we get rid of the market mode by using the sum from Eq. (10) but without the first element ($i = 1$).
Then, we use a classical filtering method called the Minimum Spanning Tree, which was used a considerable number of times with financial data [3, 10, 23–25]. Because our correlations are complex, we cannot use the usual formula for the distances in the correlation network. Instead, we take the correlation magnitude $s_{kl} = |\rho_{kl}|$ as the edge weight in the filtering algorithm and calculate the Maximum Spanning Tree (MST), since we want to maximize correlations. Additionally, we use the phase $\theta_{kl}$ in order to determine the direction of an edge. If $\theta_{kl} > 0$ ($\theta_{kl} < 0$) then the edge goes from $l$ to $k$ ($k$ to $l$) and the latter leads the former. The result of this procedure is shown in panel a) of Fig. 5, where the four-digit numbers are the stock tickers at the TSE. We observe a significant sector and subsector clustering, which again confirms that our method is consistent with previous findings obtained by traditional correlation methods [3, 10, 24]. Moreover, there is an additional outcome, coded by the edge direction and the phase difference. Different arrow colors are connected to the size of the phase difference. Black corresponds to small phase differences, red means that the phase difference is close to $\pi/2$, and orange means that it is close to $\pi$ or $-\pi$, which suggests negative correlations. Connections between stocks from the same sector are all black, whereas many intersector edges are orange. This indicates that inside a sector, stocks tend to follow each other rather closely. On the other hand, there are strong, but probably negative, correlations between stocks from different sectors. Another interesting observation is that highly connected stocks have many more arrows pointing at them than pointing outside. That means that they are actually leading the rest of the sector. To point out a few examples, we see this for stocks 7203 (Automobiles and Auto parts), 5401 (Steel products) and 8031 (Trading companies). This result is consistent with a frequent remark that the highly connected stocks in filtered correlation networks are actually the most significant ones. Similar results and conclusions were also obtained by estimating the time-dependent correlation function [7, 8]. A different outcome of lead-lag analysis may be found in [26], but in that case all the correlations were analysed, without any prior data filtering.

In order to validate the statements made above, we also used a less restrictive filtering method, which leaves three times more connections but has the MST backbone among them.
This method is called the Planar Maximally Filtered Graph (PMFG) [27] and was often used as a filtering method for stock market dependencies [6, 28]. The PMFG in panel b) of Fig. 5, drawn without tickers to make the graph more readable, confirms the observations made on the MST. Furthermore, this filtering method allows cliques, and they are often formed by stocks belonging to the same sectors, similarly to what was shown in [27].

A single outlier to all the characteristics presented above is the stock 9983 - Fast Retailing (Retail). It is connected to different sectors, it has an out-degree higher than its in-degree despite having many connections, and it has edges with both high and low phase difference. The reason for this stock to have such a peculiar and exceptional structure of relations is that it is by far the most influential stock of the Nikkei 225 index. It had the highest index weight, much higher than any other stock analysed here.

As a last remark, we present Fig. 6, which shows the dependence between the complex correlation magnitude and the complex correlation phase difference. As we can see in panel b), the highest correlations are between stocks from the same sector, and they are characterized by small phase differences. The pairs with large phase differences and more significant correlations, on the other hand, are more likely to be from different sectors. This is in line with our findings in the complex correlation based filtered graphs.

V. CONCLUSIONS

We presented a novel approach that combines a Fourier transform based method for calculating correlations in high frequency data with a Hilbert transform based method of including lead-lag relations in a correlation measure. This way we propose a unique tool to work with unevenly spaced data. Moreover, the calculation of the complex correlation matrix from the Fourier estimator coefficients is completely analytical and does not require additional numerical approximations. We should point out, however, that like any other method it is limited by the data resolution and affected by finite size effects.

We further used this approach with TSE intraday data and showed that it gives insightful information, especially when analysing the eigenspace of the complex correlation matrix. We confirmed the dominance of sector relations in the market structure and showed that they also influence the bulk eigenvalues. Furthermore, even though the delays shown by the phase differences are small, they suggest that stocks with higher correlations are leading others. This is particularly observed for groups of stocks from the same sector and it is in line with the intuition that stocks with a higher degree in the filtered correlation network are the leading ones.

We believe that this method may and should be used with other types of data, in particular data characterised by unevenly spaced observations.

(a) Maximum Spanning Tree. (b) Planar Maximally Filtered Graph.

FIG. 5: Filtered complex correlation graphs for TSE intraday data (2014). Nodes represent stocks from the Nikkei 225 index and their colors represent sectors. Arrows indicate which stock is leading the other and their color corresponds to the phase difference between these stocks.

FIG. 6: Dependence between correlation magnitude and its phase for all pairs of stocks in the Nikkei 225 index in 2014. Panel (a): pairs of stocks from different sectors. Panel (b): pairs of stocks from the same sectors. Horizontal axis: correlation magnitude. Vertical axis: correlation phase.

VI. ACKNOWLEDGMENTS

This work was supported by the National Science Centre under Grant 2015/19/N/ST2/02701. The authors would also like to thank Tomasz Raducha and two anonymous referees for insightful discussions and suggestions.

[1] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics Reports, 175 (2006).
[2] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, Proceedings of the National Academy of Sciences of the United States of America, 3747 (2004).
[3] R. N. Mantegna, The European Physical Journal B - Condensed Matter and Complex Systems, 193 (1999).
[4] M. Kaminski and K. J. Blinowska, Biological Cybernetics, 203 (1991).
[5] D. Y. Kenett, M. Tumminello, A. Madi, G. Gur-Gershgoren, R. N. Mantegna, and E. Ben-Jacob, PloS one, e15032 (2010).
[6] P. Fiedor, Physical Review E, 052801 (2014).
[7] N. Huth and F. Abergel, Journal of Empirical Finance, 41 (2014).
[8] L. Kullmann, J. Kertész, and K. Kaski, Physical Review E, 026125 (2002).
[9] P. Malliavin and M. E. Mancino, Finance and Stochastics, 49 (2002).
[10] G. Iori and O. V. Precup, Physical Review E, 036110 (2007).
[11] M. Ø. Nielsen and P. Frederiksen, Journal of Empirical Finance, 265 (2008).
[12] M. E. Mancino and S. Sanfelici, Computational Statistics & Data Analysis, 2966 (2008).
[13] V. Mattiussi, M. Tumminello, G. Iori, and R. N. Mantegna, papers.ssrn.com (2011).
[14] J. D. Horel, Journal of Climate and Applied Meteorology, 1660 (1984).
[15] Y. Arai, T. Yoshikawa, and H. Iyetomi, Procedia Computer Science, 1826 (2015).
[16] I. Vodenska, H. Aoyama, Y. Fujiwara, H. Iyetomi, and Y. Arai, PloS one, e0150994 (2016).
[17] B. Tóth and J. Kertész, Physica A: Statistical Mechanics and its Applications, 505 (2006).
[18] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Physical Review Letters, 1467 (1999).
[19] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, and H. E. Stanley, Physical Review Letters, 1471 (1999).
[20] A. Utsugi, K. Ino, and M. Oshikawa, Physical Review E, 026110 (2004).
[21] S. Friedman and H. F. Weisberg, Educational and Psychological Measurement, 11 (1981).
[22] F. Lillo and R. Mantegna, Physical Review E, 016219 (2005).
[23] J.-P. Onnela, A. Chakraborti, K. Kaski, J. Kertesz, and A. Kanto, Physical Review E, 056110 (2003).
[24] G. Bonanno, F. Lillo, and R. N. Mantegna, Quantitative Finance, 96 (2001).
[25] M. Wiliński, A. Sienkiewicz, T. Gubiec, R. Kutner, and Z. Struzik, Physica A: Statistical Mechanics and its Applications, 5963 (2013).
[26] C. Curme, M. Tumminello, R. N. Mantegna, H. E. Stanley, and D. Y. Kenett, Quantitative Finance, 1375 (2015).
[27] M. Tumminello, T. Aste, T. Di Matteo, and R. N. Mantegna, Proceedings of the National Academy of Sciences of the United States of America, 10421 (2005).
[28] G. Buccheri, S. Marmi, and R. N. Mantegna, Physical Review E 88

TABLE I: Sectors of Nikkei 225 stocks.

Sector: Capital Goods and Others
Subsectors: Machinery; Shipbuilding; Other manufacturing; Construction; Real estate

Sector: Consumer Goods
Subsectors: Foods; Fishery; Retail; Services

Sector: Financials
Subsectors: Banking; Securities; Insurance; Other financial services

Sector: Materials
Subsectors: Textiles and apparel; Pulp and paper; Chemicals; Rubber products; Trading companies; Petroleum; Glass and ceramics; Steel products; Nonferrous metals; Mining

Sector: Technology
Subsectors: Electric machinery; Automobiles and Auto parts; Precision instruments; Communications; Pharmaceuticals

Sector: Transportation and Utilities
Subsectors: Railway and bus; Other land transport; Marine transport; Air transport; Warehousing; Electric power; Gas