Complex Correlation Approach for High Frequency Financial Data
Mateusz Wilinski,∗ Yuichi Ikeda, and Hideaki Aoyama
Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland
Scuola Normale Superiore, piazza dei Cavalieri 7, 56126 Pisa, Italy
Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
Graduate School of Science, Kyoto University, Kyoto, Japan
(Dated: November 9, 2018)

We propose a novel approach that allows one to calculate a Hilbert transform based complex correlation for unevenly spaced data. This method is especially suitable for high frequency trading data, which are of particular interest in finance. Its most important feature is the ability to take into account lead-lag relations on different scales, without knowing them in advance. We also present results obtained with this approach for Tokyo Stock Exchange intraday quotations. We show that individual sectors and subsectors tend to form important market components which may follow each other with small but significant delays. These components may be recognized by analysing eigenvectors of the complex correlation matrix for Nikkei 225 stocks. Interestingly, sectorial components are also found in eigenvectors corresponding to the bulk eigenvalues, traditionally treated as noise.
I. INTRODUCTION
Describing interactions between parts of a system is a crucial element of complexity science. The World Wide Web, the human brain and transportation networks are just a few of the complex systems that attract the attention of researchers. The main tools used to investigate the properties of connections in highly coupled systems are complex networks [1]. For many complex systems described with networks, it is clear what kind of relation is represented by a particular edge. In social networks, for example, it may be a friendship or a common work place. In other cases, like scientific collaboration or city communication, we can even assign a weight to each connection [2]. There are, however, complex systems for which the relationship between any two elements is implicit. This is the case when we observe only the evolution of the system and are unable to track the actual interactions and the way parts of the system influence each other. This applies to, among others, brain activity and financial markets. The latter will be the main topic of this paper.

In order to find the real structure of a system with hidden dependencies we need to analyse the observable outcome of the system dynamics. For financial markets this outcome might be stock quotations, and for brain activity it may be the electroencephalography (EEG) signal. It is a common practice to use different measures that indicate which parts of the system are connected, based on the similarity of their signals. In finance, the most popular measure is the Pearson correlation coefficient, which proved to be successful in uncovering the hierarchical structure of the market [3]. In other areas, such as the mentioned EEG signal, it is often necessary to develop other methods that suit the specifics of the data [4].

Using correlation as a measure of dependency in financial data has several drawbacks.
First, correlation does not imply causation, and its high values may be a consequence of indirect relations. Some papers tried to deal with these problems by using more sophisticated methods, like partial correlation [5] or mutual information [6]. Second, price movements of related stocks may be desynchronized. In other words, one stock may lead another, which follows with a delay. The traditional Pearson correlation is not able to find this kind of lead-lag relation. This problem was approached by analysing shifted correlations [7, 8]. Unfortunately, this methodology may be inefficient when dealing with a high number of signals, since the delays in financial data may be different across different pairs of stocks. Taking into account that in both [7, 8] around 40 lags were analysed, it would mean estimating around $10^6$ correlations in the case of our data. Moreover, this number grows like $N^2$. In this paper we propose a different approach, suited for high frequency financial data, which takes delays into account and does so in an efficient way. Furthermore, we use this approach to analyse intraday quotation data from the Tokyo Stock Exchange (TSE).

∗ Electronic address: [email protected]

II. DATA
The data used in this paper consist of all the trades and all the best bid and best ask offers submitted to the TSE in the period between 1 January and 31 December 2014. We focused on the stocks that form the Nikkei 225, the most important stock market index for the TSE and one of the most significant indexes in Asia. We chose 222 stocks that were components of the Nikkei 225 in 2014 and were traded on every analysed day. For each stock $i$ we calculated the mid price:

$$P_i(t) = \frac{A_i(t) + B_i(t)}{2}, \tag{1}$$

where $A_i(t)$ and $B_i(t)$ are respectively the best ask and the best bid for the $i$th stock. From mid prices we get the logarithmic prices:

$$p_i(t) = \log(P_i(t)), \tag{2}$$

where $p_i(t)$ is a step function with steps at each quotation. We also grouped all the stocks into sectors and subsectors, according to the official Nikkei 225 website. See Table I for the names of sectors and subsectors, as well as the assigned colors and shapes used in further graphs. The chosen data subset consists of more than 2 · quotations; however, 90% of them do not change the mid price. The remaining part is around 1 . · significant quotations, with the least actively traded stock having 1778 data points and the second least 8632 data points. It should also be pointed out that the resolution of timestamps is one second, although we know the precise order of quotes if they appear in the same second.

III. METHODOLOGY
A nonparametric methodology for calculating the covariance and correlation matrix of multivariate data was proposed and described in detail in [9]. It can be used with unevenly spaced time series and it has already shown some meaningful results for financial data [10]. It was also shown to outperform other methods when dealing with finite samples [11]. A thorough study showed as well that it is almost unbiased in the presence of market microstructure noise [12]. The idea behind this method is based on the Fourier transform. Its major advantage is that it does not require any modifications of the raw data. We will describe the way this approach is used with empirical data. More technical details can be found in [9].

Let us denote the analysed price process as $p_i(t) = \log P_i(t)$. Additionally, let us assume that this process is well defined on the time interval $[0, T]$. First, we rescale this time interval into $[0, 2\pi]$. Second, we need to calculate the Fourier coefficients of $dp_i(t)$, which are defined as follows:

$$a_k(dp_i) = \frac{1}{\pi} \int_0^{2\pi} \cos(kt)\, dp_i(t),$$
$$b_k(dp_i) = \frac{1}{\pi} \int_0^{2\pi} \sin(kt)\, dp_i(t).$$

Assuming that $p_i(t)$ is a step function and $t_m$ is the time of the $m$th quotation, the above integrals become summations:

$$a_k(dp_i) = \frac{p_i(t_N) - p_i(t_1)}{\pi} - \frac{1}{\pi} \sum_{m=1}^{N-1} p_i(t_m) \left( \cos(k t_{m+1}) - \cos(k t_m) \right),$$
$$b_k(dp_i) = -\frac{1}{\pi} \sum_{m=1}^{N-1} p_i(t_m) \left( \sin(k t_{m+1}) - \sin(k t_m) \right),$$

where $N$ is the number of quotations, $t_1 = 0$ and $t_N = 2\pi$. From these coefficients, we can obtain the log-returns variance-covariance matrix $\Sigma_{ij}$ as a function of time. In our case though, we only want the average covariance over a particular time period. For this purpose, we only need the constant term of the Fourier representation of the covariance, because the other terms reduce to zero in the process of averaging. A precise derivation of this constant part may be found in [9].

Footnote (official Nikkei 225 website): http://indexes.nikkei.co.jp/en/nkave
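The step-function sums for $a_k(dp_i)$ and $b_k(dp_i)$ above can be sketched numerically as follows. This is only a minimal illustration, not the authors' implementation: the function name `fourier_coefficients` and its arguments are hypothetical, and the quotation times are assumed to be already rescaled to $[0, 2\pi]$ with the first quotation at $0$ and the last at $2\pi$.

```python
import numpy as np

def fourier_coefficients(times, log_prices, n_harmonics):
    """Fourier coefficients a_k, b_k of dp for a step-function log price.

    Assumptions (not from the paper): `times` is sorted, already rescaled
    to [0, 2*pi], with times[0] = 0 and times[-1] = 2*pi; `log_prices[m]`
    is the log price holding from times[m] until the next quotation.
    Returns two arrays of length n_harmonics (k = 1, ..., n_harmonics).
    """
    t = np.asarray(times, dtype=float)
    p = np.asarray(log_prices, dtype=float)
    k = np.arange(1, n_harmonics + 1)[:, None]   # shape (K, 1)
    kt = k * t[None, :]                          # shape (K, N)

    # cos(k t_{m+1}) - cos(k t_m) and the sine analogue, for m = 1..N-1
    d_cos = np.cos(kt[:, 1:]) - np.cos(kt[:, :-1])
    d_sin = np.sin(kt[:, 1:]) - np.sin(kt[:, :-1])

    a = (p[-1] - p[0]) / np.pi - (d_cos @ p[:-1]) / np.pi
    b = -(d_sin @ p[:-1]) / np.pi
    return a, b

# Toy example: a single unit jump of the log price at t = pi.
t = np.array([0.0, np.pi, 2 * np.pi])
p = np.array([0.0, 1.0, 1.0])
a, b = fourier_coefficients(t, p, n_harmonics=4)
print(a * np.pi)  # proportional to cos(k*pi) for k = 1..4
```

For a single jump at $t = \pi$ the integral definition gives $a_k = \cos(k\pi)/\pi$ and $b_k = 0$, which the sums reproduce. Note that the sketch builds a dense array of size (number of harmonics) × (number of quotations), so for real tick data one would probably process harmonics or quotations in chunks.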
For this constant part of the covariance we only present the final result:

$$a_0(\Sigma_{ij}) = \lim_{\tau \to 0} \frac{\pi \tau}{T} \sum_{k=1}^{T/2\tau} \left[ a_k(dp_i) a_k(dp_j) + b_k(dp_i) b_k(dp_j) \right], \tag{3}$$

where $\tau$ determines the highest wave harmonic $T/2\tau$. In the case of our TSE data the minimum value of $\tau$ could be one second, because this is the timestamp resolution. However, after analysing the distribution of time intervals between quotations, we decided to set $\tau = 1$ minute. This is consistent with findings from [11, 13], where it was shown that this is the most efficient scale for the Fourier estimator. Moreover, as shown in [12], $\tau = 1$ minute gives the best balance between the problems of finite sample and asynchronicity. It should also be pointed out that taking a specific $\tau$ is not in any way equivalent to aggregating data into evenly spaced time series, and we still use all the data points. Finally, we can compute the covariance matrix estimator:

$$\hat{\sigma}_{ij} = 2\pi a_0(\Sigma_{ij}), \tag{4}$$

and further also the correlation matrix:

$$\rho_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii} \cdot \hat{\sigma}_{jj}}}. \tag{5}$$

An important advantage of this estimator is that it gives accurate estimates for step-function type data. This makes it particularly suited for financial data.

We can now use all the tick data available, but the problem of lead-lag relationships remains. To overcome it, we will adopt and adjust to our needs ideas from the Complex Hilbert Principal Component Analysis (CHPCA) [14], mainly the Hilbert transform. The CHPCA was used primarily in geophysics and meteorology. Recently, it also proved to be effective with financial data [15, 16]. The previous research, however, was carried out using daily data and was applicable only to synchronous data with a constant time step. Although it makes sense to expect indexes of different countries to have lead-lag relations that span periods longer than one day, this is rather not the case for stocks from the same market. The increased activity of intraday traders and hedge funds has made the market much more efficient and synchronized [17].
As a result, one needs to use high frequency data in order to find out which stocks are leading others. Since we do not want to aggregate the data, which may lead to data loss, we propose a way to use the Hilbert transform together with the Fourier algorithm described before. We show that this may be done in a purely analytical way, which not only allows us to use raw data, but also makes the procedure numerically efficient.

The Hilbert transform is a linear operator that does not change the time domain of the transformed function. A formal definition of this operator is as follows:

$$H(Z, t) = \mathrm{p.v.}\, \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{Z(s)}{t - s}\, ds, \tag{6}$$

where p.v. stands for the Cauchy principal value and $Z$ is the process that we want to transform. The Hilbert transform, at a particular time $t$, encodes information about the future and past movement of the transformed series. For this reason it is useful when looking for correlations among different signals, especially if we expect that the relation may be lagged, but we do not know the specific lag value or we know that this lag may vary over time. In order to keep the information about both the time series value and its Hilbert transform, we produce a complexified time series. This new time series has complex values, with the real part being the original value and the imaginary part being the Hilbert transform:

$$\hat{Z}(t) = Z(t) + iH(Z, t). \tag{7}$$

Such a time series may now be used to obtain complex correlations, which should contain information about both immediate and lagged correlations. Using Eq. (6) to obtain the Hilbert transform of a time series may be very difficult. Our goal, however, is to find the correlation matrix of the complexified time series $\hat{Z}(t)$ and, as we will show, we do not need to know an explicit form of $H(Z, t)$. In our case $Z(t) = dp_i(t)$ and we already know how to obtain its Fourier coefficients, which lead to an estimate of its correlation matrix.
Moreover, the Hilbert transform is additive and can be easily calculated for trigonometric functions, namely:

$$H(\sin(\cdot), x) = -\cos(x),$$
$$H(\cos(\cdot), x) = \sin(x).$$

As a result, if we know the Fourier coefficients of a given process, we can easily calculate them also for its Hilbert transform:

$$a_k(H(Z)) = -b_k(Z),$$
$$b_k(H(Z)) = a_k(Z).$$

This way, we can easily obtain the Fourier coefficients of $\hat{Z}(t)$:

$$a_k(\hat{Z}) = a_k(Z) + i a_k(H(Z)) = a_k(Z) - i b_k(Z),$$
$$b_k(\hat{Z}) = b_k(Z) + i b_k(H(Z)) = b_k(Z) + i a_k(Z).$$

Finally, we can use them with Eq. (3) and obtain:

$$a_0(\Sigma_{ij}) = \lim_{\tau \to 0} \frac{\pi \tau}{T} \sum_{k=1}^{T/2\tau} \left[ \left( a_k(dp_i) - i b_k(dp_i) \right) \left( a_k(dp_j) + i b_k(dp_j) \right) + \left( b_k(dp_i) + i a_k(dp_i) \right) \left( b_k(dp_j) - i a_k(dp_j) \right) \right]. \tag{8}$$

Combining Eqs. (3), (8), (4) and (5) allows us to calculate the complex correlation matrix directly, with no numerical approximations. The precision of this estimate is limited only by the finite sample and the data timestamp resolution. In particular, the Hilbert transform part is included in an entirely analytical way. As a result, the precision of our estimation is the same as for the Fourier approach, namely it is proportional to N − . It is also worth mentioning that the number of parameters needed to estimate the complex correlation with our method is linear in $N$. Basically, we only need to obtain the parameters of the Fourier correlation estimator. In contrast, previous works on complex correlation of daily financial data [15, 16] required $O(N^2)$ estimates.

Each element $\rho_{kl}$ of the complex correlation matrix has the form:

$$\rho_{kl} = s_{kl} e^{-i\theta_{kl}}, \tag{9}$$

where $s_{kl}$ is the magnitude of the correlation and $\theta_{kl}$ is the phase, which indicates the lead-lag relation between stock $k$ and stock $l$. The last part of the data analysis done in this paper is the eigendecomposition of the correlation matrix. This is often used as a tool of random matrix theory for financial data [18–20].
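Before moving on to the eigendecomposition, Eqs. (8), (4), (5) and (9) can be sketched numerically. The snippet below is a hedged illustration rather than the authors' code: the function name `complex_correlation` and its arguments are hypothetical, the limit $\tau \to 0$ is replaced by a fixed $\tau$, and the matrix is assumed to be normalised by its own (real) diagonal so that $\rho_{ii} = 1$.

```python
import numpy as np

def complex_correlation(a, b, tau, T):
    """Complex correlation matrix from Fourier coefficients of dp.

    a, b : arrays of shape (n_series, n_harmonics) holding a_k(dp_i)
           and b_k(dp_i). tau and T are in the same time unit.
    Assumption (ours): normalisation by the diagonal of the complex
    covariance itself, so that the diagonal of the result equals 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    a_hat = a - 1j * b   # a_k of the complexified series, a_k(Z) - i b_k(Z)
    b_hat = b + 1j * a   # b_k of the complexified series, b_k(Z) + i a_k(Z)

    # Sum over harmonics in Eq. (8): coefficient of series i times the
    # complex conjugate of the coefficient of series j.
    s = a_hat @ a_hat.conj().T + b_hat @ b_hat.conj().T

    a0 = np.pi * tau / T * s          # Eq. (8) at a fixed tau
    sigma = 2.0 * np.pi * a0          # Eq. (4)
    d = np.sqrt(np.real(np.diag(sigma)))
    return sigma / np.outer(d, d)     # Eq. (5)

# Toy example with one harmonic: series 2 is series 1 shifted in phase.
phi = 0.3
a = np.array([[1.0], [np.cos(phi)]])   # cos(kt) and cos(kt - phi)
b = np.array([[0.0], [np.sin(phi)]])
rho = complex_correlation(a, b, tau=60.0, T=3600.0)
print(np.abs(rho[0, 1]), -np.angle(rho[0, 1]))  # s_12 and theta_12 of Eq. (9)
```

Since $b_k(\hat{Z}) = i\, a_k(\hat{Z})$, the two products in Eq. (8) are equal, so the sum could also be computed as twice the first product. In the toy example the off-diagonal element equals $e^{i\varphi}$, that is $s_{12} = 1$ and, with the sign convention of Eq. (9) as written here, $\theta_{12} = -\varphi$.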
Since the complex correlation matrix $\rho$ is Hermitian, we can write it in the following form:

$$\rho = \sum_{i=1}^{N} \lambda_i V^{(i)} V^{(i)\dagger}, \tag{10}$$

where $\lambda_i$ is the $i$th eigenvalue, $V^{(i)}$ is the corresponding eigenvector and $\dagger$ denotes the conjugate transpose. The complex principal components $CP_i$ may be derived from the eigenvectors according to the equation:

$$CP_i(t) = \sum_{j=1}^{N} dp_j(t) V^{(i)}_j, \tag{11}$$

where $dp_j$ is in our case the process of log returns. The important thing is that $V^{(i)}_j$ is actually the complex correlation between the $j$th time series and the $i$th complex principal component. The complex form of this correlation informs us not only about which stocks, or groups of stocks, are closely related to a particular component, but also about the lead-lag relation between them. We will exploit this in the empirical analysis of financial data.

FIG. 1: Histogram of the frequency of eigenvalues of complex correlation matrix for TSE intraday data (2014). [Axes: Eigenvalue; Number of Occurrence]
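The decomposition in Eq. (10) can be illustrated with a short sketch. It assumes NumPy's Hermitian eigensolver and a hypothetical helper name `eigen_components`. Eigenvectors are defined only up to a global phase, so the rotation applied below (making the sum of each eigenvector's coefficients real and non-negative) is our own convention, not one stated in the paper.

```python
import numpy as np

def eigen_components(rho):
    """Eigenvalues (descending) and eigenvectors of a Hermitian matrix.

    Column i of the returned matrix is the eigenvector V^(i) of Eq. (10).
    The global phase of each eigenvector is fixed by an assumed
    convention: the sum of its coefficients is real and non-negative.
    """
    rho = np.asarray(rho, dtype=complex)
    lam, vec = np.linalg.eigh(rho)       # real eigenvalues, ascending
    order = np.argsort(lam)[::-1]        # largest eigenvalue first
    lam, vec = lam[order], vec[:, order]
    phase = np.exp(-1j * np.angle(vec.sum(axis=0)))
    return lam, vec * phase              # rotate each column separately

# Toy 2x2 complex correlation matrix with unit magnitude and phase phi.
phi = 0.3
rho = np.array([[1.0, np.exp(1j * phi)],
                [np.exp(-1j * phi), 1.0]])
lam, V = eigen_components(rho)

# Check Eq. (10): rho equals the sum of lambda_i V^(i) V^(i)^dagger.
print(np.allclose((V * lam) @ V.conj().T, rho))
```

The relative phase between two coefficients of the same eigenvector does not depend on the global phase convention, which is why the positions of stocks relative to each other in the complex plane carry the lead-lag information.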
IV. RESULTS
We start our analysis by looking at the eigenvalues of the complex correlation matrix obtained for Nikkei 225 stock quotations in 2014. As shown in Fig. 1, there is one extremely large eigenvalue, a few relatively big ones, and the rest form a bulk. This structure is in accordance with previous results on financial data correlations [18–20] and our approach is able to reproduce them. It was often stated that the largest eigenvalue describes the so-called market mode, which represents the overall movement of the market. This movement was often consistent with the most important index of the analysed market. Moreover, it was shown in [21] that this eigenvalue, when all the correlations are positive, is simply a linear function of the average correlation. In our case correlations are complex, but we checked that the normal correlations are also positive for the data used in this work. To find out whether the same interpretation holds for complex correlations, let us look at the first eigenvector, corresponding to the largest eigenvalue. Fig. 2 shows in panel a) that all the coefficients of the eigenvector, represented as points in the complex plane, group together on a small, nearly real and positive section of the complex plane. Each point corresponds to a certain stock, and their colors and shapes represent sectors as shown in Table I. There is no sectoral structure, apart perhaps from a few material stocks being closer to zero. If we recall that the coefficients of each eigenvector are actually correlations between a particular stock and a principal component, we see that all the stocks are positively correlated with this component and there are almost no delays, since the phase difference is close to zero for all the points. This is exactly what we would expect from the market mode, which should lead all the stocks in a similar, positive and immediate manner.

The few big eigenvalues, apart from the largest one, are commonly associated with market sectors and said to contain significant information about the market structure. Finally, the rest of the eigenvalues, being part of the bulk, are supposed to represent market noise. Our analysis of the complex correlation matrix shows that the spectral structure of financial high frequency data is probably much richer and more complex. We observe that there are at least three significantly different groups of eigenvectors. The first group consists of the few largest eigenvalues, the ones that according to random matrix theory should contain most of the non-noise information. We called this group the immediate components and they are all shown in Fig. 2. The word immediate comes from the fact that all of them have eigenvector coefficients spread across the real axis, with their phases being approximately equal to zero. This means that these components influence and are influenced by all the stocks with negligible delay. Apart from the first eigenvector, we can also see that stocks from the same sectors tend to group close to each other, which is in line with the traditional interpretation of the components being related to market sectors.

FIG. 2: Immediate eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (f): eigenvectors corresponding to the largest, second largest, third largest, fourth largest, fifth largest and sixth largest eigenvalue. [Axis label: Imaginary Part]
If we look at further eigenvalues and the corresponding eigenvectors, the situation gets even more interesting. From the seventh to somewhere around the twenty-fifth eigenvector we can still observe stocks grouping into sectors. This time, however, they are no longer spread only along the real axis. Some of these components, which we called delayed, are presented in Fig. 3. There are always a few stocks from one sector or subsector occupying higher positive correlations on the real axis. We see them as being the core or the drivers of a given component. For example, the cores of the seventh and tenth components are respectively Financials and Capital Goods (specifically insurance and construction). Then we can see one or more groups of subsectors that are leading or delayed, i.e. have a positive or negative phase. Is there an explanation for these lead-lag relations? We believe that they are not accidental, just like stocks being close to each other as a result of belonging to the same sector. To support that, let us take a closer look at the delayed eigenvectors, starting with the seventh component shown in panel a) of Fig. 3. The main driver of this component is the financial sector, with its stocks having the largest correlation magnitude and a nearly zero phase difference. After a closer look, we see that this component separates not only sectors but even subsectors. All the financial stocks on the far positive real axis are from the insurance subsector. Moreover, these are all the insurance companies in the Nikkei 225 index. The other financial institutions are not correlated or even negatively correlated with this component. Other groups highly correlated with this component are the gas and electricity subsectors. They seem to be leading this component, which might be surprising since it is represented mainly by insurance. We suspect, however, that this component is connected to households and their prosperity. That is why it is led by gas and electricity, and is positively correlated with the insurance subsector.

The next three components, shown in panels b), c) and d), are all strongly connected to all the stocks from the petroleum and mining sectors, specifically oil and coal products. These subsectors are of great importance in Japan, which lacks fossil fuels and needs to import them from other countries, mainly through sea transportation. We argue that these complex principal components represent fossil fuels and their dependencies across the market. The tenth component is highly correlated with construction and depicts its lagged relation to mining and petroleum. Both the thirteenth and sixteenth eigenvectors are driven by fossil fuels. The first one follows marine transport, which represents the main source of fossil fuels. The second one, on the other hand, is leading train and bus transport, which is dependent on petrol. The eigenvectors shown in panels e) and f) are more chaotic, but we can still see stocks of the same sector being close to each other. These delayed eigenvectors, clearly linked to financial and economic dependencies, are in the bulk, which is meant to be mostly noise. The above findings confirm the predictions of [22], which showed that in the case of factor models the bulk does not need to be driven only by statistical uncertainty.

The last group consists of chaotic components. As shown in Fig. 4, it is difficult to find any structures, at least ones connected to sectors, among their complex coefficients. Nevertheless, we would like to comment on two interesting phenomena that we observed. First, for the vast majority of these components we observe that there is one stock, which may be different for different eigenvectors, that has the highest correlation magnitude and at the same time a phase equal to zero. This behavior is shown in both panels a) and b) of Fig. 4 but, as said before, it is very common among chaotic components.
An example of an eigenvector without this kind of driving stock is shown in panel c) of Fig. 4. We attribute this effect to mathematical constraints imposed by PCA, namely the orthogonality of components and their decreasing variance. The second observation is connected to the last eigenvector, shown in panel d) of Fig. 4, where there are three stocks with significantly high absolute correlation. These stocks are all from the same sector and they are all near the real axis, whereas all other stocks group closely around the (0, 0) point. A closer look at these three stocks shows that they not only belong to the same sector but also have a closely related price formation (specifically, they are: 8630 - SOMPO Holdings Inc, 8725 - Ms&Ad Insurance Group Holding Inc and 8766 - Tokio Marine Holdings Inc). Therefore, this component is a linear combination of very similar time series, with combination coefficients which sum up to somewhere around zero. As a result it has a very low variance, and again this is connected to the mathematical constraints of PCA.

FIG. 3: Delayed eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (f): eigenvectors corresponding to the seventh, tenth, thirteenth, sixteenth, twentieth and twenty-fourth largest eigenvalue. [Axis label: Imaginary Part]

FIG. 4: Chaotic eigenvectors' coefficients obtained from complex correlation matrix. Panels (a) to (d): eigenvectors corresponding to the forty-first, fifty-sixth and ninetieth largest eigenvalue, and to the last eigenvalue. [Axis label: Imaginary Part]

As a next step, we would like to show aggregated results, instead of analysing each eigen subspace separately. We expect the complex correlation matrix to encode information about sectors, in accordance with what we have seen so far. At first, we get rid of the market mode by using the sum from Eq. (10) but without the first element ($i = 1$).
Then, we use a classical filtering method called the Minimum Spanning Tree, which has been used a considerable number of times with financial data [3, 10, 23–25]. Because our correlations are complex, we cannot use the usual formula for the distances in the correlation network. Instead, we take the correlation magnitude $s_{kl} = |\rho_{kl}|$ as the edge weight in the filtering algorithm, and calculate the Maximum Spanning Tree (MST), since we want to maximize correlations. Additionally, we use the phase $\theta_{kl}$ to determine the direction of an edge. If $\theta_{kl} > 0$ ($\theta_{kl} < 0$) then the edge goes from $l$ to $k$ ($k$ to $l$) and the latter leads the former. The result of this procedure is shown in Fig. 5 panel a), with the four-digit numbers being the stock tickers at the TSE. We observe a significant sector and subsector clustering, which again confirms that our method is consistent with previous findings obtained by traditional correlation methods [3, 10, 24]. Moreover, there is an additional outcome, encoded by the edge direction and the phase difference. Different arrow colors are connected to the size of the phase difference. Black corresponds to small phase differences, red means that the phase difference is close to $\pi/2$, and orange means that it is close to $\pi$ or $-\pi$, which suggests negative correlations. Connections between stocks from the same sector are all black, whereas many intersector edges are orange. This indicates that inside a sector, stocks tend to follow each other rather closely. On the other hand, there are strong, but probably negative, correlations between stocks from different sectors. Another interesting observation is that highly connected stocks have many more arrows pointing at them than pointing outside. That means that they are actually leading the rest of the sector. To point out a few examples, we see this for stocks 7203 (Automobiles and Auto parts), 5401 (Steel products) and 8031 (Trading companies). This result is consistent with the frequent remark that the highly connected stocks in filtered correlation networks are actually the most significant ones. Similar results and conclusions were also obtained by estimating the time-dependent correlation function [7, 8]. A different outcome of lead-lag analysis may be found in [26], but in that case all the correlations were analysed, without any prior data filtering.

In order to validate the statements made above, we also used a less restrictive filtering method, which leaves three times more connections but has the MST backbone among them.
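The Maximum Spanning Tree filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `directed_maximum_spanning_tree` is hypothetical, the tree is built with a plain Kruskal procedure on the magnitudes $s_{kl}$, the phase is read off as $\theta_{kl} = -\arg \rho_{kl}$ following Eq. (9), and the edge direction follows the rule quoted in the text (for $\theta_{kl} > 0$ the edge goes from $l$ to $k$, otherwise from $k$ to $l$).

```python
import numpy as np

def directed_maximum_spanning_tree(rho):
    """Maximum spanning tree on |rho_kl| with phase-based edge directions.

    Returns a list of (source, target, magnitude, theta) tuples.
    Assumption: theta_kl = -angle(rho_kl), following rho = s * exp(-i*theta).
    """
    rho = np.asarray(rho, dtype=complex)
    n = rho.shape[0]
    s = np.abs(rho)
    theta = -np.angle(rho)

    # All unordered pairs, strongest correlation magnitude first (Kruskal).
    pairs = sorted(
        ((s[k, l], k, l) for k in range(n) for l in range(k + 1, n)),
        reverse=True,
    )

    parent = list(range(n))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for weight, k, l in pairs:
        root_k, root_l = find(k), find(l)
        if root_k == root_l:
            continue  # adding this edge would close a cycle
        parent[root_k] = root_l
        # Direction rule from the text: theta_kl > 0 gives l -> k.
        source, target = (l, k) if theta[k, l] > 0 else (k, l)
        edges.append((source, target, float(weight), float(theta[k, l])))
    return edges

# Toy example with three series.
rho = np.eye(3, dtype=complex)
rho[0, 1] = 0.9 * np.exp(-0.1j)   # theta_01 = 0.1
rho[1, 2] = 0.5 * np.exp(0.2j)    # theta_12 = -0.2
rho[0, 2] = 0.2
rho = np.triu(rho) + np.triu(rho, 1).conj().T   # make it Hermitian
for edge in directed_maximum_spanning_tree(rho):
    print(edge)
```

For a network of a few hundred stocks the quadratic number of pairs is small, so this simple version is probably sufficient. A graph library could be used instead, but the direction rule would still have to be applied separately, since a spanning tree algorithm only sees the undirected magnitudes.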
The less restrictive method is the Planar Maximally Filtered Graph (PMFG) [27], which was often used as a filtering method for stock market dependencies [6, 28]. The PMFG in Fig. 5 panel b), drawn without tickers to make the graph more readable, confirms the observations made on the MST. Furthermore, this filtering method allows cliques, and they are often formed by stocks belonging to the same sectors, similarly to what was shown in [27].

A single outlier to all the characteristics presented above is the stock 9983 - Fast Retailing (Retail). It is connected to different sectors, it has an out-degree higher than its in-degree, despite having many connections, and it has edges with both high and low phase differences. The reason for this stock having such a peculiar and exceptional structure of relations is that it is by far the most influential stock of the Nikkei 225 index. It had the highest index weight, much higher than any other stock analysed here.

As a last remark, we present Fig. 6, which shows the dependence between the complex correlation magnitude and the complex correlation phase difference. As we can see in panel b), the highest correlations are between stocks from the same sector, and they are characterized by small phase differences. The pairs with large phase differences and more significant correlations, on the other hand, are more likely to be from different sectors. This is in line with our findings in complex correlation based filtered graphs.

V. CONCLUSIONS
We presented a novel approach that combines a Fourier transform based method for calculating correlations in high frequency data with a Hilbert transform based method of including lead-lag relations in a correlation measure. This way we propose a unique tool for working with unevenly spaced data. Moreover, the calculation of the complex correlation matrix from the Fourier estimator coefficients is completely analytical and does not require additional numerical approximations. We should point out, however, that like any other method, it is limited by the data resolution and affected by finite size effects.

We further used this approach with TSE intraday data and showed that it gives insightful information, especially when analysing the eigenspace of the complex correlation matrix. We confirmed the dominance of sector relations in the market structure and showed that they also influence the bulk eigenvalues. Furthermore, even though the delays shown by the phase differences are small, they suggest that stocks with higher correlations are leading others. This is particularly observed for groups of stocks from the same sector and it is in line with the intuition that stocks with a higher degree in the filtered correlation network are the leading ones.

We believe that this method may and should be used with other types of data, in particular data characterised by unevenly spaced observations.

(a) Maximum Spanning Tree. (b) Planar Maximally Filtered Graph.
FIG. 5: Filtered complex correlation graphs for TSE intraday data (2014). Nodes represent stocks from Nikkei 225 index and their colors represent sectors. Arrows indicate which stock is leading the other and their color corresponds to the phase difference between these stocks.

FIG. 6: Dependence between correlation magnitude and its phase for all pairs of stocks in Nikkei 225 index in 2014. (a) Pairs of stocks from different sectors. (b) Pairs of stocks from the same sectors. [Axes: Correlation Magnitude; Correlation Phase]
VI. ACKNOWLEDGMENTS
This work was supported by the National Science Centre under Grant 2015/19/N/ST2/02701. Authors would also like to thank Tomasz Raducha and two anonymous referees for insightful discussions and suggestions.

[1] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Physics reports , 175 (2006).
[2] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, Proceedings of the National Academy of Sciences of the United States of America , 3747 (2004).
[3] R. N. Mantegna, The European Physical Journal B-Condensed Matter and Complex Systems , 193 (1999).
[4] M. Kaminski and K. J. Blinowska, Biological cybernetics , 203 (1991).
[5] D. Y. Kenett, M. Tumminello, A. Madi, G. Gur-Gershgoren, R. N. Mantegna, and E. Ben-Jacob, PloS one , e15032 (2010).
[6] P. Fiedor, Physical Review E , 052801 (2014).
[7] N. Huth and F. Abergel, Journal of Empirical Finance , 41 (2014).
[8] L. Kullmann, J. Kertész, and K. Kaski, Physical Review E , 026125 (2002).
[9] P. Malliavin and M. E. Mancino, Finance and Stochastics , 49 (2002).
[10] G. Iori and O. V. Precup, Physical Review E , 036110 (2007).
[11] M. Ø. Nielsen and P. Frederiksen, Journal of Empirical Finance , 265 (2008).
[12] M. E. Mancino and S. Sanfelici, Computational Statistics & data analysis , 2966 (2008).
[13] V. Mattiussi, M. Tumminello, G. Iori, and R. N. Mantegna, papers.ssrn.com (2011).
[14] J. D. Horel, Journal of climate and Applied Meteorology , 1660 (1984).
[15] Y. Arai, T. Yoshikawa, and H. Iyetomi, Procedia Computer Science , 1826 (2015).
[16] I. Vodenska, H. Aoyama, Y. Fujiwara, H. Iyetomi, and Y. Arai, PloS one , e0150994 (2016).
[17] B. Tóth and J. Kertész, Physica A: Statistical Mechanics and its Applications , 505 (2006).
[18] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Physical review letters , 1467 (1999).
[19] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, and H. E. Stanley, Physical Review Letters , 1471 (1999).
[20] A. Utsugi, K. Ino, and M. Oshikawa, Physical Review E , 026110 (2004).
[21] S. Friedman and H. F. Weisberg, Educational and Psychological Measurement , 11 (1981).
[22] F. Lillo and R. Mantegna, Physical Review E , 016219 (2005).
[23] J.-P. Onnela, A. Chakraborti, K. Kaski, J. Kertesz, and A. Kanto, Physical Review E , 056110 (2003).
[24] G. Bonanno, F. Lillo, and R. N. Mantegna, Quantitative Finance , 96 (2001).
[25] M. Wiliński, A. Sienkiewicz, T. Gubiec, R. Kutner, and Z. Struzik, Physica A: Statistical Mechanics and its Applications , 5963 (2013).
[26] C. Curme, M. Tumminello, R. N. Mantegna, H. E. Stanley, and D. Y. Kenett, Quantitative Finance , 1375 (2015).
[27] M. Tumminello, T. Aste, T. Di Matteo, and R. N. Mantegna, Proceedings of the National Academy of Sciences of the United States of America , 10421 (2005).
[28] G. Buccheri, S. Marmi, and R. N. Mantegna, Physical Review E88

TABLE I: Sectors of Nikkei 225 stocks.
Sector: Subsectors
Capital Goods and Others: Machinery; Shipbuilding; Other manufacturing; Construction; Real estate
Consumer Goods: Foods; Fishery; Retail; Services
Financials: Banking; Securities; Insurance; Other financial services
Materials: Textiles and apparel; Pulp and paper; Chemicals; Rubber products; Trading companies; Petroleum; Glass and ceramics; Steel products; Nonferrous metals; Mining
Technology: Electric machinery; Automobiles and Auto parts; Precision instruments; Communications; Pharmaceuticals
Transportation and Utilities: Railway and bus; Other land transport; Marine transport; Air transport; Warehousing; Electric power; Gas