Localization in covariance matrices of coupled heterogenous Ornstein-Uhlenbeck processes
Paolo Barucca
Dipartimento di Fisica, Università La Sapienza, P.le A. Moro 2, I-00185 Roma, Italy
(Dated: November 12, 2018)

We define a random-matrix ensemble given by the infinite-time covariance matrices of Ornstein-Uhlenbeck processes at different temperatures coupled by a Gaussian symmetric matrix. The spectral properties of this ensemble are shown to be in qualitative agreement with some stylized facts of financial markets. Through the presented model, formulas are given for the analysis of heterogeneous time series. Furthermore, evidence is found for a localization transition in the eigenvectors related to small and large eigenvalues in the cross-correlation analysis of this model, and a simple explanation of localization phenomena in financial time series is provided. Finally, we identify both in our model and in real financial data an inverted-bell effect in the correlation between localized components and their local temperature: high and low temperature/volatility components are the most localized ones.
I. INTRODUCTION
Complex systems are hard to analyse since, by definition, the interactions among their components are not easily connected with their behaviours [1]. In these systems the absence of a well-defined general model makes correlation analysis an irreplaceable, if not unique, compass [2, 3]. Furthermore, the presence of noise makes benchmarking important, and random matrix theory (RMT) is fundamental to check the statistical validity of pair correlations.

RMT has mainly focused on the effects of the finite length of time series. In particular, a careful analysis has been carried out on the spectral properties of random matrices in the case where the number of variables $N$ is large and the length of the signal $M$ is comparable, i.e. with a finite ratio $Q = M/N$ [4–8]. In this case the total time is not large enough to make the noise negligible: one needs to disentangle the properties induced by couplings from the ones brought by randomness.

Nevertheless, time series in complex systems are not only noisy and finite but also heterogeneous, which means that the variance of one time series can be very different from the variance of another. More generally, the marginal distribution of one variable may be qualitatively and quantitatively different from that of another variable.

In finance, on which we will focus our considerations, the volatilities of different assets, i.e. the index of the percentage change in stock prices, have a very broad distribution [9], i.e. there is a strong heterogeneity between the returns of different assets. Recent studies have shown that this distribution is similar to a log-normal, which is compatible with a fractal model of the market [10, 11].
This feature has been included in models based on the random-matrix Wishart ensemble to improve the comparison with real matrices [12–14].

Summarising, complex systems are heterogeneous, disordered and noisy, and they have a non-trivial relationship between interactions and correlations: carefully studied benchmarks are needed to gain a more detailed insight. In the following we will see how these different features are interconnected, and we will point out how important it is to consider them together in order to predict their effects on cross-correlation analysis.

The aim of this article is to observe the consequences of heterogeneity in a simple ad hoc model that allows one to compute explicitly the relation between couplings and correlations.

In Section II we start from the basic dynamical model given by a set of independent Ornstein-Uhlenbeck (OU) processes at different temperatures. Then we turn to the interacting case where the OU processes are coupled through a given matrix. The ensemble we consider is the one given by the infinite-time covariance matrices of OU processes at different temperatures coupled by a Gaussian symmetric matrix. We also consider the stationary distribution of the induced time series and show the relation with the known Wishart-Laguerre ensemble of random matrices.

In Section III we show the results of numerical simulations in the asymptotic limit. Varying the heterogeneity, we compute the spectral density of eigenvalues, the inverse participation ratio (IPR), a standard index of eigenvector localization [15], and the component participation ratio (CPR), which measures the contribution of a given component to all the eigenvectors. We check the properties of this ensemble both in averaged and in single-sample eigenvectors.
Moreover, we identify a steep change in eigenvector localization driven by heterogeneity, which might indicate a transition from an extended phase towards a localized phase in the eigenvectors of the cross-correlation matrix of the model. Finally, we discuss the results both with respect to the known spectral properties of random-matrix models and with respect to the real localization properties widely observed in financial data [16, 17], and we give theoretical perspectives.

II. COUPLED HETEROGENEOUS OU PROCESS

A. Independent OU processes
In the following we will consider signals extracted from the equilibrium distribution of a continuous-time stochastic dynamics. The interest of this model for applications relies on the hypothesis that in complex systems observations are samplings from a complicated noisy dynamics; in finance, for instance, daily prices are the result of all the small price adjustments given by all the transactions.

We would like to stress, though, that we do not want to model a particular asset dynamics in detail: each class of assets may require a different dynamics and more complicated non-linear interaction terms, which would not allow explicit formulas for the direct problem (from couplings to correlations) and the inverse problem (from correlations to couplings).

The aim is to construct a null model including a specific parametrization that separates couplings and temperatures, in order to explicitly distinguish their roles in the covariance matrix. We start our analysis from a limit case, the noisy dynamics of $N$ independent variables $x = \{x_1, x_2, \ldots, x_N\}$ following a standard OU process with a set of $N$ temperatures $T = \{T_1, T_2, \ldots, T_N\}$:

$$\dot{x}_i = -x_i + \sqrt{T_i}\,\eta_i(t), \qquad (1)$$

where $\eta_i(t)$ is a delta-correlated Gaussian noise with $\langle \eta_i(t)\,\eta_j(t') \rangle = 2\,\delta_{ij}\,\delta(t - t')$.
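As an illustration of Eq. (1), the sketch below integrates the independent dynamics with a simple Euler-Maruyama step and estimates the stationary variance of each variable across independent replicas. The temperatures, time step, number of steps and number of replicas are illustrative assumptions, not values taken from the text; with the noise normalisation given above, the estimated variances should come out close to the temperatures $T_i$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical temperatures and discretisation parameters (assumptions).
T = np.array([0.5, 1.0, 4.0])
dt, n_steps, n_replicas = 0.01, 1000, 4000

# Each row is an independent replica of the N = 3 uncoupled variables.
x = np.zeros((n_replicas, T.size))
for _ in range(n_steps):
    noise = rng.standard_normal(x.shape)
    # <eta_i(t) eta_j(t')> = 2 delta_ij delta(t - t'), so the stochastic
    # increment over dt has variance 2 * T_i * dt.
    x += -x * dt + np.sqrt(2.0 * T * dt) * noise

# Variance across replicas at the final time (total time 10 >> relaxation time 1).
empirical_var = x.var(axis=0)
print("temperatures      :", T)
print("empirical variance:", empirical_var)
```

Averaging over replicas rather than over one long trajectory is a design choice of this sketch: the replicas are statistically independent, so the error on the variance is easy to control.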
For the independent process (1) the marginal equilibrium distribution of $x_i$ is $P_i(x_i)$:

$$P_i(x_i) = \frac{e^{-x_i^2/(2T_i)}}{\sqrt{2\pi T_i}}. \qquad (2)$$

If we now sample the values of all the $x_i$'s at $M$ times, we can compute the empirical covariance coefficients $C_{ij} = \overline{x_i x_j} - \overline{x_i}\,\overline{x_j}$, where $\overline{\,\cdot\,}$ indicates the average over the $M$ sampled times.

For an infinite value of the ratio $Q = M/N$ the covariance matrix converges towards a diagonal one, $C_{ij} = T_i \delta_{ij}$. For a finite value of $Q$, instead, the off-diagonal elements of $C$ are $N(N-1)/2$ random variables with zero mean and variance $T_i T_j / M$.

In this case the Pearson correlation matrix $c_{ij} = C_{ij}/\sqrt{C_{ii} C_{jj}}$ has exactly the same statistics as a matrix extracted from the widely used Wishart-Laguerre ensemble of random matrices, since its elements are the pair correlations of $N$ normally distributed signals of length $M$. The heterogeneity we have put in the dynamics plays no role in the correlation matrix in this case.

B. Coupled OU processes
The coupled case is a natural generalisation. The dynamics now obeys:

$$\dot{x}_i = -\sum_j J_{ij}\,x_j + \sqrt{T_i}\,\eta_i(t), \qquad (3)$$

where $J_{ij}$ is symmetric and positive-definite in order to ensure a finite limit to the process. In the following we will focus our analysis on the asymptotic limit, since in the present work we are not interested in the consequences of the interplay of finite $Q$ and heterogeneity but solely in the consequences of the latter. In this system there are two different methods [18] to obtain a closed formula for the asymptotic covariance matrix, $C_{ij} = \langle x_i x_j \rangle$, as a function of couplings and temperatures ($\langle\cdot\rangle$ indicates the average over an infinite time). Starting from the dynamics, a few standard steps lead to the implicit formula

$$\{C, J\} = 2\tilde{T}, \qquad (4)$$

where $\tilde{T}_{ij} = T_i \delta_{ij}$ and $\{\cdot,\cdot\}$ denotes the matrix anticommutator. From the spectral decomposition of $J$ it is possible to find a set of explicit formulas for the elements of $C$:

$$C_{ij} = 2 \sum_{a,b} \frac{u^a_i\,u^b_j}{\lambda_a + \lambda_b} \sum_k u^a_k\,u^b_k\,T_k, \qquad (5)$$

where $u^a_i$ is the $i$-th component of the $a$-th eigenvector of $J$ and $\lambda_a$ is the $a$-th eigenvalue. In (4) $C$ and $J$ appear in a symmetric form, and the same symmetry must hold also in (5). This fact implies that (5) can be used to solve the inverse problem for this system, that is, finding the couplings $J$ given the covariances $C$. This symmetry is not surprising since it holds also in the familiar homogeneous case where $C = J^{-1}$, a manifestly symmetric formula.
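Formula (5) translates directly into a few lines of linear algebra. The following is a minimal sketch, assuming NumPy; the function name `covariance_from_couplings`, the matrix size and the particular random $J$ and $T$ are hypothetical choices made only for illustration. It builds $C$ from the eigendecomposition of $J$ and then checks numerically that the result satisfies the anticommutator relation (4) and reduces to $J^{-1}$ for equal unit temperatures.

```python
import numpy as np

def covariance_from_couplings(J, T):
    """Infinite-time covariance C of Eq. (5) for a symmetric positive-definite J."""
    lam, U = np.linalg.eigh(J)              # U[i, a] = u^a_i, lam[a] = lambda_a
    T_rot = U.T @ (T[:, None] * U)          # T_rot[a, b] = sum_k u^a_k u^b_k T_k
    C_rot = 2.0 * T_rot / (lam[:, None] + lam[None, :])
    return U @ C_rot @ U.T                  # back to the original basis

# Small illustrative example (sizes and values are assumptions).
rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
J = np.eye(N) + 0.1 * (A + A.T) / 2.0       # symmetric, close to the identity
T = rng.uniform(0.5, 2.0, size=N)           # heterogeneous temperatures

C = covariance_from_couplings(J, T)

# Residual of the implicit relation {C, J} = 2 * diag(T), Eq. (4).
residual = np.abs(C @ J + J @ C - 2.0 * np.diag(T)).max()

# Homogeneous case T_i = 1: the covariance should equal the inverse of J.
C_homogeneous = covariance_from_couplings(J, np.ones(N))

print("max residual of Eq. (4):", residual)
print("homogeneous case equals inv(J):",
      np.allclose(C_homogeneous, np.linalg.inv(J)))
```

Working in the eigenbasis of $J$ turns the matrix equation (4) into an element-wise division, which is why no linear solver is needed here.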
In Appendix A we examine the consequences on the Pearson correlation matrix $c$ in the case of small couplings.

We have thus defined two different random-matrix ensembles. The first, which we examine in the next Section, is the set of infinite-time covariance matrices defined by formula (5) for coupling matrices $J$ sampled from a given random-matrix ensemble (for instance the Gaussian ensemble) and for sets of temperatures $T$ sampled from a distribution chosen at will. The second (Appendix B) is the set of finite-time empirical covariance matrices between signals sampled from the stationary distribution of the OU dynamics for a given infinite-time $C$.

III. SAMPLING MATRICES
Since we are interested in finding the consequences of heterogeneity, we directly use the infinite-time asymptotic formula (5), so that we avoid simulating the whole stochastic dynamics.

FIG. 1. Spectral density $\rho(\lambda)$ versus $\lambda$ as $D$ increases. For fixed N = 100 and ε = 0. /√N we plot the spectral density of the correlation matrix C for D = 0. , . . samples. Increasing $D$ we see that the lower edge of the spectrum becomes smaller and smaller and, conversely, that the higher edge increases.

Thus we generate a random coupling matrix $J = I + \epsilon K$, where $I$ is the identity matrix, $\epsilon$ is the strength of the coupling among signals and $K$ is a random Gaussian matrix whose elements have variance $N$. $J$ must be positive-definite for any $N$, so we eliminated samples with non-positive eigenvalues, which have vanishing probability as $N$ goes to infinity. In principle it is possible to consider any kind of probability measure for couplings and temperatures; the main idea addressed here is to regard couplings as homogeneous so that temperatures are the only source of heterogeneity. Since in the financial context temperatures represent volatilities, which are typically log-normally distributed [10, 11], we choose to draw them from this kind of distribution:

$$p(T) = \frac{1}{T}\,\frac{e^{-(\log T - \mu)^2/(2D^2)}}{\sqrt{2\pi D^2}}. \qquad (6)$$

Namely, we generate $N$ normally distributed random numbers $\xi_i$ and define $T_i = e^{\mu + D\xi_i}$. Then we fix $\epsilon$ and draw the coupling matrix $J$, diagonalise it and use (5) to obtain $C$. Varying $D$, $\epsilon$ and $N$ we observe some basic features of the matrix $C$. First we compute the eigenvalue distribution changing $N$ at fixed $D = 1$ and ε = 0. /√N, and we notice that, as $N$ increases, the distribution rapidly converges towards an infinite-size spectrum. Once this is verified, we study the eigenvalue distribution varying $D$ alone.
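The sampling procedure just described can be sketched as follows. This is a hedged illustration and not the code used for the figures: the normalisation of $K$ (symmetric, unit-variance off-diagonal entries), the coupling strength `eps0 / sqrt(N)` with `eps0 = 0.3`, the value $\mu = 0$ and all function names are assumptions introduced here. The sketch draws one homogeneous ($D = 0$) and one heterogeneous ($D = 1$) sample, compares the edges of their spectra, and evaluates the participation ratios of Eqs. (7) and (8) below as sums of fourth powers of eigenvector components.

```python
import numpy as np

def covariance_from_couplings(J, T):
    """Infinite-time covariance of Eq. (5)."""
    lam, U = np.linalg.eigh(J)                   # U[i, a] = u^a_i
    T_rot = U.T @ (T[:, None] * U)               # sum_k u^a_k u^b_k T_k
    return U @ (2.0 * T_rot / (lam[:, None] + lam[None, :])) @ U.T

def sample_covariance_matrix(N, eps0, mu, D, rng):
    """Draw one C of the ensemble; eps0 and the scaling of K are assumptions."""
    while True:
        A = rng.standard_normal((N, N))
        K = (A + A.T) / np.sqrt(2.0)             # symmetric Gaussian matrix
        J = np.eye(N) + (eps0 / np.sqrt(N)) * K
        if np.linalg.eigvalsh(J).min() > 0.0:    # reject non-positive-definite J
            break
    T = np.exp(mu + D * rng.standard_normal(N))  # log-normal temperatures, Eq. (6)
    return covariance_from_couplings(J, T), T

def ipr_and_cpr(C):
    """IPR of each eigenvector (Eq. 7) and CPR of each component (Eq. 8)."""
    _, V = np.linalg.eigh(C)                     # V[i, a] = v^a_i, ascending eigenvalues
    V4 = V ** 4
    return V4.sum(axis=0), V4.sum(axis=1)

rng = np.random.default_rng(0)
N = 100
C_hom, _ = sample_covariance_matrix(N, eps0=0.3, mu=0.0, D=0.0, rng=rng)
C_het, T_het = sample_covariance_matrix(N, eps0=0.3, mu=0.0, D=1.0, rng=rng)

ev_hom = np.linalg.eigvalsh(C_hom)
ev_het = np.linalg.eigvalsh(C_het)
ipr_hom, cpr_hom = ipr_and_cpr(C_hom)
ipr_het, cpr_het = ipr_and_cpr(C_het)

print("spectrum edges, D = 0:", ev_hom.min(), ev_hom.max())
print("spectrum edges, D = 1:", ev_het.min(), ev_het.max())
print("mean IPR, D = 0 (compare with 3/N):", ipr_hom.mean(), 3.0 / N)
print("largest IPR, D = 0 and D = 1:", ipr_hom.max(), ipr_het.max())
```

In this single-sample sketch one would expect the heterogeneous spectrum to extend beyond both edges of the homogeneous one and the largest IPR to grow with $D$; averaging over many samples, as done in the text, is left out for brevity.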
The spectrum spreads on both edges, as is often observed in real data analysis (Fig. 1). Thus, introducing heterogeneity, we obtain new eigenvalues, both small and large, so we investigate the related eigenvectors and check whether they are statistically different from the ones in the homogeneous bulk of the spectrum. We characterise the eigenvectors of $C$, $v^a_i$, through the IPR, a standard quantity in matrix analysis, defined by the formula:

$$\mathrm{IPR}_a = \sum_i (v^a_i)^4. \qquad (7)$$

FIG. 2. IPR versus eigenvalue index (null model, real data and $J^{-1}$; the market eigenvector is marked). For each ordered eigenvalue we plot the mean value of the n-th IPR versus the eigenvalue index, averaged over 1000 samples for a system size N = 100 and a value of ε = 0. /√N, for D = .74 and µ = 7.74 (as obtained from real data volatilities). Crosses show the IPR averaged over 10 matrices of daily asset returns from NYSE from the first of June 1987 to the 31 of December 1998. The $J^{-1}$ line is the equal-temperatures case ($D = 0$). We see that the largest eigenvector, representing the market, is extended and falls exactly on the $D = 0$ line.

Obviously, IPR values depend on the sample. Since we want to characterise the typical behaviour, we take for each sample the set of ordered eigenvalues and consider their IPR, then we average the IPRs over samples (Fig. 2). The real data used are a set of 1017 daily asset returns from NYSE from the first of June 1987 to the 31 of December 1998. In order to compare qualitatively with data, we fixed the values of the log-normal distribution by evaluating the mean and standard deviation of the logarithm of the return variances, namely $\mu = \frac{1}{N}\sum_{k=1}^{N} \log(\sigma_k)$ and $D^2 = \frac{1}{N}\sum_{k=1}^{N} (\log(\sigma_k) - \mu)^2$, where the $\sigma_k$'s are the empirical variances. The figure we obtain shows localization at the edges, a common feature observed in real data analysis. In particular, the IPR shows agreement not only in the typical flat region related to the bulk, where its value fluctuates slightly above $3/N$, but also on the edges (see the IPR in [17]), where we observe an increase of the IPR.

We then evaluate level spacings, $\lambda_{n+1} - \lambda_n$, where $\lambda_n$ is the $n$-th eigenvalue of the covariance matrix, and observe a clear left-shift in the spacing distribution [19, 20], meaning that the skewness of the spacings increases with heterogeneity (Fig. 3), approaching real data.

FIG. 3. Distribution $p(s)$ of level spacings normalised by their mean value, $s_n = \frac{\lambda_{n+1} - \lambda_n}{\langle \lambda_{n+1} - \lambda_n \rangle}$, where $\lambda_n$ is the $n$-th eigenvalue of the covariance matrix (real data, null model and $J^{-1}$). Data are presented in a log-log scale. Crosses show the real data, averaged over 10 matrices of daily asset returns from NYSE from the first of June 1987 to the 31 of December 1998. The $J^{-1}$ line is the equal-temperatures case, $D = 0$. Null-model data are averaged over 10 samples for a system of size N = 100 with the µ and D parameters obtained from real data.

To observe the heterogeneity effect we also need to consider a matrix observable that depends not on the eigenvector, as the IPR does, but on the component, so we study the component participation ratio, which we define by the formula:

$$\mathrm{CPR}_i = \sum_a (v^a_i)^4, \qquad (8)$$

that is just the equivalent of the IPR for the transposed change-of-basis matrix. We investigate the relation between CPR and heterogeneity by evaluating the correlations between CPR and both $T$ and $1/T$, constructing the scatter plot $(\log(T_i), \mathrm{CPR}_i)$. For real data we decided to approximate the different temperatures with the diffusion terms [21], so we plot $(\log(D^{(e)}_i), \mathrm{CPR}_i)$ (Fig. 4), where $D^{(e)}_i = \frac{1}{T}\sum_{t=1}^{T-1} \left(r_i(t+1) - r_i(t)\right)^2$, $r_i(t)$ being the return of asset $i$ at time $t$. The effect also holds when considering variances versus CPR.

The inverted-bell shape indicates that high and low temperature/volatility components are the most localized ones. This result depends both on the presence of couplings and on heterogeneous temperatures/volatilities: with no couplings the covariance matrix would be diagonal, so all the eigenvectors would be localized, while with too low heterogeneity the differences between diffusion terms would be negligible and would not affect localization so clearly.

An explanation for this effect can be obtained by considering the uncoupled case, where every eigenvector is sharply localized since the matrix is diagonal. If we now introduce a coupling between the components, the ones in the bulk, with closer eigenvalues, are likely to interact and spread, while the ones on the edges are related to more isolated eigenvalues, so they are less likely to mix with others and will stay more localized. This picture should hold until the couplings are large enough to counteract the differences in temperature.

FIG. 4. Scatter plot of the components in the plane $(\log(T_i), \mathrm{CPR}_i)$ (null model and real data). We can see an inverted-bell shape that is absent in GOE matrices, i.e. with no heterogeneity. Real and null-model data are from 10 matrices of size N = 100. For null-model data we used the values for µ and D obtained from real data.

IV. CONCLUSION
We have analysed a simple model of complex systems that provides a method for sampling random matrices. We have shown that our method gives results in agreement with the eigenvector localization ubiquitous in real data. This model suggests that heterogeneity among signals is likely to cause localization, as also indicated by known random band models [22, 23]. The analysis showed the peculiar characteristic that localization involves both the noisiest signals and the most deterministic ones: the inverted-bell effect. Another interesting aspect is the effect of heterogeneity on localization in the proposed model, which shows a non-trivial transition from a coupling-dominated phase, where spectral properties are the same as those of Wishart matrices, towards a heterogeneity-dominated phase, where localization on the edges of the spectrum occurs. A theoretical perspective is to establish whether the effect arises from a simple crossover or from a real phase transition, valid also in the thermodynamic limit, i.e. for infinite $N$, and possibly to characterise the two phases in more detail by examining other matrix properties as well. To improve the comparison with real data, especially in finance, another perspective is to characterise the case of finite time samplings, i.e. finite ratio $Q = M/N$, and to check how the interplay of heterogeneities, couplings and finite time samplings changes the properties of the covariance matrix in a benchmark case.

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement No. 247328.
V. APPENDIX A
We showed in the general case how couplings, covariances and heterogeneities are related. Here we show, in a perturbative limit of small couplings, what happens when passing from the covariance to the correlation matrix.

We write $J = I + \epsilon K^{(1)}$, where $I$ is the identity matrix, $K^{(1)}$ is a random symmetric Gaussian matrix and $\epsilon$ is an arbitrarily small real number. At first order in $\epsilon$ the covariance matrix must satisfy the perturbative expression $C = \tilde{T} + \epsilon\,\Sigma^{(1)}$. Inserting both expansions in (4) and keeping the terms linear in $\epsilon$, $K^{(1)}$ and $\Sigma^{(1)}$ verify:

$$K^{(1)}_{ij}\,(T_i + T_j) = -2\,\Sigma^{(1)}_{ij}. \qquad (9)$$

Furthermore $C_{ii} = T_i + \epsilon\,\Sigma^{(1)}_{ii}$, so for the covariance matrix we have:

$$C_{ij} = T_i\,\delta_{ij} - \frac{\epsilon}{2}\,K^{(1)}_{ij}\,(T_i + T_j), \qquad (10)$$

while the off-diagonal elements of the correlation matrix $c_{ij} = C_{ij}/\sqrt{C_{ii} C_{jj}}$ satisfy, for $i \neq j$:

$$c_{ij} = -\frac{\epsilon}{2}\,K^{(1)}_{ij}\,\frac{T_i + T_j}{\sqrt{T_i}\sqrt{T_j}}, \qquad (11)$$

with $c_{ii} = 1$ by construction. The first-order expansion reveals a symmetry between $T$ and $1/T$ in the correlation matrix, which can be easily verified. This expansion allows us to consider a simplified random-matrix ensemble for the covariance matrices of weakly coupled heterogeneous time series, for which analytical results can be obtained [24]. Moreover, in the case of strong heterogeneity, i.e. $T_i \gg T_j$, one has $c_{ij} = c_{ji} \simeq -\frac{\epsilon}{2}\sqrt{\frac{T_i}{T_j}}\,K^{(1)}_{ij}$, so even if there is a low probability for a large value of $|K^{(1)}_{ij}|$, the elements of the correlation matrix on the rows/columns related to variables with high or low temperature can be significantly bigger than the others. From the theory of Lévy matrices [25] we know that a large value of a specific pair-correlation coefficient, i.e. a large $c_{mn}$, implies the presence of eigenvectors concentrated on the two components involved, e.g. $m$ and $n$. Moreover, if the elements of a whole row are large compared to the rest of the matrix, there will be an eigenvector localized on the related component. A higher-order expansion shows the breaking of this high/low temperature symmetry in favour of the low-temperature components.
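The first-order statements above can be checked numerically. The sketch below is a minimal illustration assuming NumPy; the matrix size, the random $K^{(1)}$ and $T$, the two values of $\epsilon$ and all function names are hypothetical. It compares the exact covariance obtained from formula (5) with the first-order expression $\tilde{T} - \frac{\epsilon}{2} K^{(1)}_{ij}(T_i + T_j)$ that follows from inserting the expansion into (4), and it verifies that the factor $(T_i + T_j)/\sqrt{T_i T_j}$ is unchanged under $T \to 1/T$.

```python
import numpy as np

def covariance_from_couplings(J, T):
    """Exact infinite-time covariance from Eq. (5)."""
    lam, U = np.linalg.eigh(J)
    T_rot = U.T @ (T[:, None] * U)
    return U @ (2.0 * T_rot / (lam[:, None] + lam[None, :])) @ U.T

def first_order_covariance(K, T, eps):
    """First-order expansion: diag(T) - (eps / 2) * K_ij * (T_i + T_j)."""
    return np.diag(T) - 0.5 * eps * K * (T[:, None] + T[None, :])

# Illustrative inputs (assumptions, not values from the text).
rng = np.random.default_rng(2)
N = 6
T = rng.uniform(0.5, 2.0, size=N)
A = rng.standard_normal((N, N))
K = (A + A.T) / 2.0                      # random symmetric Gaussian matrix

def max_error(eps):
    exact = covariance_from_couplings(np.eye(N) + eps * K, T)
    return np.abs(exact - first_order_covariance(K, T, eps)).max()

# If the expansion is correct to first order, the error scales as eps**2.
err_coarse = max_error(1e-2)
err_fine = max_error(1e-3)
ratio = err_coarse / err_fine            # roughly 100 for a second-order remainder

# Symmetry of the first-order correlation under T -> 1/T.
def temperature_factor(temps):
    return (temps[:, None] + temps[None, :]) / np.sqrt(np.outer(temps, temps))

symmetric = np.allclose(temperature_factor(T), temperature_factor(1.0 / T))

print("errors:", err_coarse, err_fine, "ratio:", ratio)
print("T -> 1/T symmetry holds:", symmetric)
```

Reducing $\epsilon$ by a factor of ten and watching the error drop by about a hundred is a simple way to confirm both the sign and the prefactor of the first-order term.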
At second order in $\epsilon$ we can write $C = \tilde{T} + \epsilon\,\Sigma^{(1)} + \epsilon^2\,\Sigma^{(2)}$ and $J = I + \epsilon K^{(1)} + \epsilon^2 K^{(2)}$. Collecting the terms of order $\epsilon^2$ in (4), this higher-order expansion leads to the supplementary equation for $\Sigma^{(2)}$:

$$2\,\Sigma^{(2)}_{ij} = (T_i + T_j)\Big(-K^{(2)}_{ij} + \frac{1}{2}\sum_k K^{(1)}_{ik} K^{(1)}_{kj}\Big) + \sum_k K^{(1)}_{ik}\,T_k\,K^{(1)}_{kj}, \qquad (12)$$

where we substituted $\Sigma^{(1)}$ with the expression found at first order (9). If we now divide by $\sqrt{T_i}\sqrt{T_j}$ we obtain the second-order correction to the correlation matrix $c$, which reads:

$$\frac{T_i + T_j}{\sqrt{T_i}\sqrt{T_j}}\Big(-K^{(2)}_{ij} + \frac{1}{2}\sum_k K^{(1)}_{ik} K^{(1)}_{kj}\Big) + \frac{\sum_k K^{(1)}_{ik}\,T_k\,K^{(1)}_{kj}}{\sqrt{T_i}\sqrt{T_j}}. \qquad (13)$$

The first two terms remain unchanged if we substitute $T_i$ with $1/T_i$, but the third one does not: it breaks the symmetry in favour of elements related to components with low temperatures. We stress that $\epsilon$ is small regardless of the value of the system size $N$. If one performed the expansion for large $N$, then terms at all orders would have to be considered, since at higher orders matrix multiplication would involve sums over an increasing number of elements.

VI. APPENDIX B
For a given coupling matrix $J$ and set of temperatures $T$, the equilibrium distribution of the signals $x_i$ is a multivariate Gaussian, namely:

$$P(\{x_i\}\,|\,J, T) = \frac{\exp\left(-\frac{1}{2}\,x^{\mathsf{T}} C^{-1} x\right)}{\sqrt{(2\pi)^N \det C}}, \qquad (14)$$

where $C$ is the covariance matrix, solution of Eq. (5). The empirical covariance matrix between signals extracted from this distribution defines a correlated Wishart ensemble [13, 26–28], whose peculiarity is the separation of the quenched disorders given by couplings and temperatures.

VII. ACKNOWLEDGEMENTS
I want to thank C. Cammarota, B. Cerruti, A. Decelle, C. Lucibello, G. Parisi, J. Rocchi and B. Seoane for interesting discussions.

[1] R. K. Pan and S. Sinha, Phys. Rev. E, 046116 (2007).
[2] B. Podobnik, D. Wang, D. Horvatic, I. Grosse, and H. E. Stanley, EPL (Europhysics Letters), 68001 (2010).
[3] A. E. Biondo, A. Pluchino, A. Rapisarda, and D. Helbing, PloS one, e68344 (2013).
[4] M. Potters, J.-P. Bouchaud, and L. Laloux, arXiv physics/0507111, Financial applications of random matrix theory: Old laces and new pieces (2005).
[5] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, T. Guhr, and H. E. Stanley, Phys. Rev. E, 066126 (2002).
[6] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Phys. Rev. Lett., 1467 (1999).
[7] Z. Burda and J. Jurkiewicz, Physica A: Stat. Mech., 67 (2004).
[8] A. Utsugi, K. Ino, and M. Oshikawa, Phys. Rev. E, 026110 (2004).
[9] P. Cizeau, Y. Liu, M. Meyer, C.-K. Peng, and H. Eugene Stanley, Physica A: Stat. Mech., 441 (1997).
[10] Y. Liu, P. Gopikrishnan, P. Cizeau, Meyer, and H. E. Stanley, Phys. Rev. E, 1390 (1999).
[11] J.-P. Bouchaud, M. Potters, and M. Meyer, The European Physical J. B (Cond. Matt.), 595 (2000).
[12] Z. Burda, J. Jurkiewicz, M. A. Nowak, G. Papp, and I. Zahed, Physica A: Stat. Mech., 694 (2004).
[13] Z. Burda, A. T. Görlich, and B. Wacław, Phys. Rev. E, 041129 (2006).
[14] G. Akemann, J. Fischmann, and P. Vivo, Physica A: Stat. Mech., 2566 (2010).
[15] J. Edwards and D. Thouless, Journal of Physics C: Solid State Physics, 807 (1972).
[16] A. Chakraborti, I. M. Toke, M. Patriarca, and F. Abergel, Quant. Fin., 991 (2011).
[17] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, and H. E. Stanley, Phys. Rev. Lett., 1471 (1999).
[18] H. Risken, Fokker-Planck Equation (Springer, 1984).
[19] B. Shklovskii, B. Shapiro, B. Sears, P. Lambrianides, and H. Shore, Phys. Rev. B, 11487 (1993).
[20] O. Agam, B. L. Altshuler, and A. V. Andreev, Phys. Rev. Lett., 4389 (1995).
[21] S. Siegert, R. Friedrich, and J. Peinke, Physics Letters A, 275 (1998).
[22] G. Casati, L. Molinari, and F. Izrailev, Phys. Rev. Lett., 1851 (1990).
[23] Y. V. Fyodorov and A. D. Mirlin, Phys. Rev. Lett., 2405 (1991).
[24] P. Barucca, Quenched heterogeneities in disordered systems, Ph.D. thesis, Sapienza University (2014).
[25] P. Cizeau and J.-P. Bouchaud, Phys. Rev. E, 1810 (1994).
[26] V. A. Marčenko and L. A. Pastur, Sbornik: Mathematics, 457 (1967).
[27] Z. Burda, J. Jurkiewicz, and B. Wacław, Phys. Rev. E, 026111 (2005).
[28] S. H. Simon and A. L. Moustakas, Phys. Rev. E 69