Complex market dynamics in the light of random matrix theory

Abstract

We present a brief overview of random matrix theory (RMT) with the objectives of highlighting the computational results and applications in financial markets as complex systems. An oft-encountered problem in computational finance is the choice of an appropriate epoch over which the empirical cross-correlation return matrix is computed. A long epoch would smoothen the fluctuations in the return time series and suffers from non-stationarity, whereas a short epoch results in noisy fluctuations in the return time series and the correlation matrices turn out to be highly singular. An effective method to tackle this issue is the use of the power mapping, where a non-linear distortion is applied to a short epoch correlation matrix. The value of distortion parameter controls the noise-suppression. The distortion also removes the degeneracy of zero eigenvalues. Depending on the correlation structures, interesting properties of the eigenvalue spectra are found. We simulate different correlated Wishart matrices to compare the results with empirical return matrices computed using the S&P 500 (USA) market data for the period 1985-2016. We also briefly review two recent applications of RMT in financial stock markets: (i) Identification of "market states" and long-term precursor to a critical state; (ii) Characterization of catastrophic instabilities (market crashes).



Hirdesh K. Pharasi, Kiran Sharma, Anirban Chakraborti and Thomas H. Seligman


Hirdesh K. Pharasi
Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca-62210, México

Kiran Sharma
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India

Anirban Chakraborti
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India

Thomas H. Seligman
Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca-62210, México, and Centro Internacional de Ciencias, Cuernavaca-62210, México

arXiv preprint [q-fin.ST]

With the advent of the “Big Data” era [10, 14], large data sets have become ubiquitous in numerous fields (image analysis, genomics, epidemiology, engineering, social media, finance, etc.), for which we need new statistical and analytical methods [4, 6, 7, 16, 30]. Empirical correlation matrices are of prime importance in big data analyses, since various statistical methods strongly rely on the validity of such matrices in order to isolate meaningful information contained in the “observational” signals or time series [3]. Often the time series are of finite length, which can lead to spurious correlations and make it difficult to extract the signal from noise [12, 27]. Hence, it is very important to understand the quantitative effects of finite-size time series in the determination of empirical correlations [9, 12, 27, 34].

Random matrix theory (RMT) tries to describe the statistics of eigenvalues of random matrices, often in the limit of large dimensions. The subject came up first in a celebrated paper of J. Wishart [40] in 1929, where he proposed that the correlation matrix of white noise time series was an adequate prior for correlation matrices. E. Cartan proposed the classical random matrix ensembles in an important but little known paper [5]. After that there was increasing interest in the subject; in particular, it is important to mention the work of L.G. Hua, who published the first monographs on the subject in 1952; an English translation is available [13].

Wigner introduced RMT to physics, based on the assumption that the interactions between the nuclear constituents were so complex that they could be modeled as random fluctuations in the framework of his R-matrix scattering theory [37]. This culminated in the presentation of the Hamiltonian $\hat{H}$ as a large random matrix, such that the energy levels of the nuclear system could be approximated by the eigenvalues of this matrix, and indeed the spacings between the energy levels of nuclei could be modeled by the spacings of the eigenvalues of the matrix [38, 39]. The use of RMT has spread over many fields, from molecular physics [15] to quantum chromodynamics [29]. Lately, RMT has become a popular tool for investigating the dynamics of financial markets using cross-correlations of empirical return time series [26, 31].

In this chapter, we present recent techniques of random matrix theory (RMT), mainly focused on computational results and applications of correlations in financial markets viewed as complex systems [2, 11, 31, 32]. A central problem that often arises in computational finance is the choice of the epoch size over which the empirical cross-correlation return matrix needs to be computed. A very long epoch smooths out the fluctuations in the return time series, and the time series also suffers from the problem of non-stationarity [20], whereas a short epoch results in noisy fluctuations in the return time series and the correlation matrix turns out to be highly singular (with many zero eigenvalues) [9]. Among others, an effective method to tackle this issue has been the use of the power mapping [9, 12, 27, 34], where a non-linear distortion is applied to a short-epoch correlation matrix. Here, we demonstrate how the value of the distortion parameter controls the noise suppression. The distortion also removes the degeneracy of the zero eigenvalues (which for very small values of the distortion parameter leads to a well separated “emerging spectrum” near zero). Depending on the correlation structures, interesting properties of the eigenvalue spectra are found. Correlation matrices constructed from white noise were introduced by Wishart, and their eigenvalue spectrum takes the shape of the Marčenko-Pastur distribution [17]; there are significant deviations when a correlation structure is introduced [8]. We simulate different correlated Wishart matrices [19, 40] to compare the results with empirical return matrices computed using S&P 500 (USA) market data for the period 1985-2016 [9]. We also briefly review two recent applications of RMT in financial stock markets: (i) identification of “market states” and a long-term precursor to a critical state [24]; (ii) characterization of catastrophic instabilities (market crashes) [9].

This chapter is organized as follows. Section 2 discusses the data description, methodology and results in detail. Section 3 contains applications of RMT in financial markets. Finally, Section 4 contains concluding remarks.

We have used the Yahoo Finance database [1] for the time series of adjusted closing prices of the S&P 500 (USA) market, for the period 02/01/1985 to 30/12/2016 ($T$ trading days, $N$ stocks). The sector abbreviations are listed in Table 1.

Table 1  Abbreviations of ten different sectors for S&P 500 index

Labels  Sectors                   Labels  Sectors
CD      Consumer Discretionary    ID      Industrials
CS      Consumer Staples          IT      Information Technology
HC      Health Care               MT      Materials
EG      Energy                    TC      Technology
FN      Financials                UT      Utilities

Correlations between different financial assets play fundamental roles in the analyses of portfolio management, risk management, investment strategies, etc. However, one only has finite time series of the asset prices; hence, one cannot calculate the exact correlation among assets, but only an approximation. The quality of the estimation of the true cross-correlation matrix strongly depends on the ratio between the length of the financial price time series $T$ and the number of assets $N$. The larger the ratio $Q = T/N$, the better the estimation is; in practice, though, the ratio can even be smaller than unity. Such correlation matrices are often too noisy and thus need to be filtered.

To build the correlation matrices, we first calculate the return $r_i$ from the daily price $P_i$ of stocks $i = 1, \ldots, N$ at time $t$ (trading day):

$$ r_i(t) = \ln P_i(t) - \ln P_i(t-1), \tag{1} $$

where $P_i(t)$ denotes the price of stock $i$ at time $t$. Since different stocks have varying levels of volatility, we define the equal-time Pearson cross-correlation coefficient as

$$ C_{ij}(\tau) = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sigma_i \sigma_j}, \tag{2} $$

where $\langle \ldots \rangle$ denotes the time average and $\sigma_k$ denotes the standard deviation of the return time series $r_k$, $k = 1, \ldots, N$, computed over an epoch of $M$ trading days ending on day $\tau$. The elements $C_{ij}$ are restricted to the domain $-1 \le C_{ij} \le 1$, where $C_{ij} = 1$ corresponds to perfect correlation, $C_{ij} = -1$ to perfect anti-correlation, and $C_{ij} = 0$ to uncorrelated pairs of stocks. Difficulties in estimating and interpreting the coefficients $C_{ij}$ are due to several reasons, which include the following:

1. Market conditions change with time, and the cross-correlations that exist between any pair of stocks may not be stationary if the chosen epoch is too long.
2. Too short an epoch for the estimation of cross-correlations introduces “noise”, i.e., fluctuations.

For these reasons, the empirical cross-correlation matrix $C(\tau)$ often contains “random” contributions plus a part that is not a result of randomness [25, 23]. Hence, the eigenvalue statistics of $C(\tau)$ are often compared against those of a large random correlation matrix: a correlation matrix constructed from mutually uncorrelated time series (white noise), known as a Wishart matrix.

We first reproduce the basic results of RMT, e.g., the Marčenko-Pastur distribution, or Marčenko-Pastur law, which describes the asymptotic behavior of eigenvalues of square random matrices [17]. Then, we present a study of the time evolution of the empirical cross-correlation structures of return matrices for $N$ stocks and of the eigenvalue spectra over different time epochs, and try to extract some new properties or information about the financial market [9, 24].

Let us construct a large random matrix $B$ arising from $N$ random time series, each of length $T$, where the entries of a time series are real independent random variables drawn from a Gaussian distribution with zero mean and variance $\sigma^2$, such that the resulting matrix $B$ is $N \times T$. Then the Wishart matrix can be constructed as

$$ W = \frac{1}{T} B B'. \tag{3} $$

In RMT, the ensemble of Wishart matrices is known as the Wishart orthogonal ensemble. In the context of a time series, $W$ may be interpreted as the covariance matrix, calculated over $N$ stochastic time series, each with $T$ statistically independent variables.
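The constructions in Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the synthetic random-walk prices and the sizes used below are assumptions made only for the example.

```python
import numpy as np

def log_returns(prices):
    """Eq. (1): r_i(t) = ln P_i(t) - ln P_i(t-1). `prices` has shape (N, T+1)."""
    return np.diff(np.log(prices), axis=1)

def epoch_correlation(returns, end, M):
    """Eq. (2): Pearson correlation over the M columns ending at index `end` (inclusive)."""
    window = returns[:, end - M + 1 : end + 1]
    return np.corrcoef(window)  # rows are treated as variables (stocks)

def wishart(N, T, rng):
    """Eq. (3): W = B B' / T, with B an N x T matrix of i.i.d. standard Gaussians."""
    B = rng.standard_normal((N, T))
    return B @ B.T / T

rng = np.random.default_rng(0)

# Hypothetical price data: 5 geometric random walks (illustration only).
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((5, 501)), axis=1))
r = log_returns(prices)                      # shape (5, 500)
C = epoch_correlation(r, end=499, M=200)     # last 200 trading days
W = wishart(N=50, T=500, rng=rng)            # white-noise reference matrix
```

Note that `np.corrcoef` subtracts the epoch mean and divides by the epoch standard deviations, which is what Eq. (2) prescribes; the Wishart matrix of Eq. (3) is built without mean subtraction.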
Because the underlying time series are mutually independent, $W$ has no cross-correlations on average.

A correlated Wishart matrix can be constructed as

$$ W = \frac{1}{T} G G', \tag{4} $$

where $G = \zeta^{1/2} B$ is an $N \times T$ matrix, $G'$ is the $T \times N$ transpose of $G$, and the $N \times N$ positive definite symmetric matrix $\zeta$ controls the actual correlations. If $\zeta$ is a diagonal matrix with unit diagonal entries and zero off-diagonal entries (i.e., $\zeta = \mathbb{1}$, the identity matrix), then the resulting matrix $W$ reduces to a member of the Wishart orthogonal ensemble introduced above. If the diagonal entries of $\zeta$ are unity and the off-diagonal elements are non-zero and real, then the resulting matrices form the correlated Wishart orthogonal ensemble. For simplicity, in this chapter we have generated and used $\zeta$ for which all the off-diagonal elements are the same (equal to a constant $U$, which lies between zero and unity).

The spectrum of eigenvalues for the Wishart orthogonal ensemble can be calculated analytically. In the limit $N \to \infty$ and $T \to \infty$, with $Q = T/N$ fixed (and bigger than unity), the probability density function of the eigenvalues is given by the Marčenko-Pastur distribution:

$$ \bar{\rho}(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \tag{5} $$

where $\sigma^2$ is the variance of the elements of $G$, while $\lambda_{\min}$ and $\lambda_{\max}$ satisfy the relation:

$$ \lambda_{\min}^{\max} = \sigma^2 \left( 1 \pm \frac{1}{\sqrt{Q}} \right)^2. \tag{6} $$

For $Q \le 1$, the matrices $W$ are positive semi-definite, and the density $\bar{\rho}(\lambda)$ in Eq. 5 is normalized to $Q$ and not to unity. Therefore, taking into account the $(N - T)$ zeros, we have

$$ \bar{\rho}(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda} + (1 - Q)\,\delta(\lambda). \tag{7} $$

First, we have generated a Wishart matrix $W$ (with $\zeta = \mathbb{1}$) of size $N \times N$, constructed from $N$ time series of real independent Gaussian variables, each of finite length $T$, with zero mean and unit variance ($\sigma^2 = 1$). We then study the effect of finite $N$ and $T$ on the probability distributions of the elements $W_{ij}$ of the Wishart ensemble and on the corresponding eigenvalue spectra. Fig. 1 (a) shows the probability distribution of the elements of the Wishart matrix for one choice of $N$ and $T$, and Fig. 1 (d) shows the corresponding density of eigenvalues $\bar{\rho}(\lambda)$, which takes the shape of the theoretical Marčenko-Pastur distribution (red dashed line) [17]. Similarly, Figs. 1 (b) and (c) show the respective probability distributions of the elements of Wishart matrices generated using two further sets of parameters $N$ and $T$. With the increase in $N$ the shape of the distribution becomes narrower, implying that the amount of spurious cross-correlations decreases. Ideally, the distribution should be a Dirac delta at zero, since true cross-correlations do not exist. The eigenvalue spectra are less sensitive to the parameters $N$ and $T$, as can be seen in Figs. 1 (e) and (f), which show the corresponding eigenvalue spectra. For all of the above simulations, we find that the simulated data agree closely with the theoretical Marčenko-Pastur distributions (red dashed lines) with $\lambda_{\max} = 1.732$ and $\lambda_{\min} = 0.468$ (theoretically calculated using Eq. 6 with $\sigma^2 = 1$, which corresponds to $Q = 10$).

Fig. 1 (a)-(c) show the effect of finite size on true correlations, for finite dimensions $N$ and $T$ of $B$ at fixed $Q = T/N$: the probability distribution of the elements $W_{ij}$ of the Wishart ensemble, constructed from $N$ time series, each with real independent Gaussian random variables of length $T$ with zero mean and variance $\sigma^2$. The variance of the distribution of $W_{ij}$ decreases with the increase of $N$ and $T$ and reduces to zero for $N \to \infty$ and $T \to \infty$ with $T/N$ finite. (d)-(f) show the density of eigenvalues $\bar{\rho}(\lambda)$ of the Wishart ensemble, numerically fitted with the Marčenko-Pastur distributions [17] (red dashed lines) for all $N$ and $T$. The numerical values $\lambda_{\max} = 1.732$ and $\lambda_{\min} = 0.468$ of the spectra match the results theoretically calculated from Eq. 6. Numerical results for the probability distributions of the elements $W_{ij}$ and the densities of the eigenvalues $\bar{\rho}(\lambda)$ have been generated using averages over up to 200 ensembles.

To reduce the effect of non-stationarity, a long time series of length $T$ can be divided into $n$ shorter epochs, each of size $M$ (such that $T/M = n$). The assumption of stationarity then improves for each of the shorter epochs. However, if there are $N$ return time series such that $N \gg M$, then the corresponding cross-correlation matrices are highly singular, with $N - M + 1$ zero eigenvalues. The power mapping method addresses this by applying a non-linear distortion to each element $W_{ij}$ of the short-epoch Wishart matrix $W$ (or, later, to each correlation coefficient $C_{ij}$ of the empirical cross-correlation matrix $C$):

$$ W_{ij} \to (\mathrm{sign}\, W_{ij})\, |W_{ij}|^{1+\varepsilon}, \tag{8} $$

where $\varepsilon$ is a noise-suppression parameter. For very small distortions, e.g., $\varepsilon = 0.001$, the degeneracy of the zero eigenvalues is removed and a well separated “emerging spectrum” appears near zero. Later in this chapter, we study different aspects of the power mapping method by varying the distortion $\varepsilon$ and the constant correlation $U$ for $N \gg M$.

Fig. 2 Semi-log plots of the eigenvalue distribution of the Wishart matrix $W$ for two sets of parameters $(N, M)$, shown in (a) and (b), with the shorter epoch $M = 64$ in (b). For short epochs ($N > M$), the eigenvalue spectra have $N - M + 1$ degenerate eigenvalues at zero. (c)-(d) show the emerging spectra obtained with a small distortion $\varepsilon$; their shape depends on $M$, and some of the eigenvalues become negative at smaller values of $M$.

The top row of Fig. 2 shows semi-log plots of the eigenvalue distributions of the ensembles for the two sets of parameters of panels (a) and (b), the latter with $M = 64$. Then small non-linear distortions with $\varepsilon = 0.001$ are given to the ensembles to display the emerging spectra, shown in Figs. 2 (c) and (d). Interestingly, the shape of the emerging spectrum changes from a semi-circle to a strongly distorted one as $M$ becomes shorter. Also, note that the emerging spectrum shifts towards the left as $M$ becomes shorter. For smaller values of $M$, some of the eigenvalues of the emerging spectrum become negative. The number of negative eigenvalues depends on the size of the epoch $M$, the distortion parameter $\varepsilon$ and, in the case of a correlated Wishart ensemble, the mean correlation [22].

Fig. 3 shows the effect of a constant correlation with strength $U$ on the eigenvalue spectra and the emerging spectra of correlated Wishart ensembles with epoch size $M = 64$. Figs. 3 (a)-(c) show the eigenvalue distributions, on semi-log scales, for the correlated Wishart ensembles with correlations $U = 0.1$, $U = 0.3$, and $U = 0.8$, respectively. Insets show the densities of non-zero eigenvalues, which are closely described by the Marčenko-Pastur distributions in all cases. In the bottom row, Figs. 3 (d)-(f) show the densities of the corresponding emerging spectra arising from the non-linear distortion of the degenerate eigenvalues at zero. The shapes of the emerging spectra change from a distorted semi-circle to Lorentzian-like as the constant correlation increases for the correlated Wishart ensembles.

Fig. 3 Eigenvalue spectra of correlated Wishart ensembles with constant correlations (a) $U = 0.1$, (b) $U = 0.3$, and (c) $U = 0.8$. (d)-(f) show the corresponding emerging spectra obtained with a small distortion $\varepsilon$ for the same values of $U$.

Next, we present the effect of the distortion (or noise-suppression) parameter $\varepsilon$ on the eigenvalue spectra in Fig. 4. Figs. 4 (a)-(f) show the distributions of eigenvalues for the correlated Wishart ensembles with epoch size $M = 64$ and varying distortion parameter values $\varepsilon = 0, 0.1, 0.2, 0.4, 0.6$ and $0.8$, keeping a constant correlation ($U = 0.1$) among all off-diagonal elements in $\zeta$. The densities of non-zero eigenvalues are closely described by the Marčenko-Pastur distributions, but the emerging spectra move towards the main spectra as the value of $\varepsilon$ increases. The emerging spectrum is absent at $\varepsilon = 0$, while it merges with the main spectrum at high values of the distortion parameter.

Fig. 4 Semi-log plots of the eigenvalue spectra for the correlated Wishart ensemble $W$ with epoch size $M = 256$ at a constant correlation $U = 0.1$, and distortion parameters of: (a) $\varepsilon = 0$, (b) $\varepsilon = 0.1$, (c) $\varepsilon = 0.2$, (d) $\varepsilon = 0.4$, (e) $\varepsilon = 0.6$, and (f) $\varepsilon = 0.8$. For $\varepsilon = 0.1$, the emerging spectrum is well separated from the non-zero eigenvalues, but with the increase of the distortion parameter $\varepsilon$ the emerging spectrum starts moving towards the remaining non-zero eigenvalue spectrum and eventually merges with it at higher values.

We also analyze $N = 194$ adjusted daily closing price time series of the stocks of the S&P 500 (USA) index from the Yahoo Finance database [1]. As discussed in the methodology subsection, we construct the empirical cross-correlation matrix $C(\tau)$ for an epoch of $M = 200$ trading days, ending on trading day $\tau$. In Fig. 5 (a) and (e), we choose two correlation matrices for the time series from 07/03/2011 to 16/12/2011 (high mean correlation) and 18/04/1995 to 30/01/1996 (low mean correlation), respectively. The color-bar shows the amount of correlation among the stocks. The stocks are arranged according to their industrial groups (abbreviations are given in Table 1). The blocks along the diagonal show the correlations within the same industrial group. Fig. 5 (b) and (f) show the eigenvalue decomposition of the correlation matrices into the respective market mode, group modes and random modes. From such a segregation/decomposition, it is also possible to reconstruct the contributions of different modes to the aggregate correlation matrix.

The largest eigenvalue of the correlation matrix corresponds to the market mode, which reflects the aggregate dynamics of the market common across all stocks and is strongly correlated with the mean market correlation. The group modes capture the sectoral behavior of the market; they are the 15 eigenvalues subsequent to the largest eigenvalue for the correlation matrix of Fig. 5 (c), and the 62 subsequent eigenvalues for the correlation matrix of Fig. 5 (g). The remaining eigenvalues capture the random-mode behavior of the market (see Fig. 5 (d) and (h)).

Fig. 5 (a) and (e) show the cross-correlation matrices of 194 stocks of the S&P 500 for epochs of $M = 200$ trading days.

By using the eigenvalue decomposition, we can thus filter the true correlations (coming from the signal) and the spurious correlations (coming from the random noise). For this, we first decompose the aggregate correlation matrix as

$$ C = \sum_{i=1}^{N} \lambda_i\, a_i a_i', \tag{9} $$

where $\lambda_i$ and $a_i$ are the eigenvalues and eigenvectors, respectively, of the correlation matrix $C$.
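As a quick numerical check of Eq. (9), the sketch below diagonalizes a small correlation matrix and rebuilds it from its eigenpairs. The white-noise data and the sizes `N = 6`, `M = 200` are hypothetical choices for illustration; they are not the empirical S&P 500 matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 200                               # hypothetical sizes for illustration
returns = rng.standard_normal((N, M))       # stand-in for an epoch of returns
C = np.corrcoef(returns)

# eigh is appropriate for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]           # indices for descending eigenvalues
lam = eigvals[order]
a = eigvecs[:, order]                       # column a[:, i] pairs with lam[i]

# Eq. (9): C = sum_i lambda_i a_i a_i'
C_rebuilt = sum(lam[i] * np.outer(a[:, i], a[:, i]) for i in range(N))
```

Since the diagonal of a correlation matrix is unity, the eigenvalues sum to $N$, which gives a simple sanity check on any computed spectrum.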
An easy way of handling the reconstruction of the correlation matrix is to sort the eigenvalues in descending order and then rearrange the eigenvectors in the corresponding order. This allows one to decompose the matrix into three separate components, viz., market, group and random:

$$ C = C_M + C_G + C_R, \tag{10} $$

$$ \phantom{C} = \lambda_1 a_1 a_1' + \sum_{i=2}^{N_G} \lambda_i\, a_i a_i' + \sum_{i=N_G+1}^{N} \lambda_i\, a_i a_i', \tag{11} $$

where $N_G$ is taken to be 15 for the matrix with high mean correlation (Fig. 5(a)) and 62 for the matrix with low mean correlation (Fig. 5(e)), i.e., corresponding to the 15 (or 62) eigenvalues after the largest one, for the two chosen correlation matrices. It is worth noting that the result is not extremely sensitive to the exact value of $N_G$. As mentioned above, the eigenvectors from 2 to $N_G$ describe the sectoral dynamics.

Fig. 5(c) and (g) show the correlation matrices after removing the market mode and random modes from the respective correlation matrices, so the matrices show group modes only. We can see the block structures, which exhibit the correlations among the sectors. Figs. 5 (d) and (h) show the correlation matrices after removing the market mode and group modes, so the matrices display the random modes only.

An important observation is that the market mode shifts towards the right with increasing mean correlation. The group modes almost coincide with the random modes, but with higher variance. Thus, the sectoral dynamics are almost absent whereas the market mode is very strong (similar to what was observed in Ref. [28]).

Fig. 6 (a) Average cross-correlation matrix of 194 stocks of the S&P 500 in the 32-year period from 1985 to 2016. The stocks are arranged according to their industrial groups (abbreviations are given in Table 1). The diagonal blocks show the correlations within the same industrial groups and the off-diagonal elements show correlations with other industrial groups. (b) Eigenvalue decomposition of the average correlation matrix into market mode, group modes and random modes. The market mode captures the mean market correlation. The group modes give the sectoral behavior of the market. The random modes of the correlation matrix yield the Marčenko-Pastur distribution. (c) Eigenvalue spectrum of the correlation matrix, evaluated for the long return time series for the entire period of 32 years. The largest eigenvalue $\lambda_{\max}$ is well separated from the “bulk”. The inset shows the random part of the spectrum, down to the smallest eigenvalue $\lambda_{\min}$.

Fig. 6 (a) shows the average cross-correlation matrix of $N = 194$ stocks of the S&P 500 for the entire duration 1985-2016. Here $N \ll T$, so we do not get any zero eigenvalues. The maximum eigenvalue $\lambda_{\max}$ of the spectrum dominates the whole market. The next 19 eigenvalues correspond to the group modes, and the rest behave as random modes.

For comparison, we use surrogate data ($N = 194$ correlated Gaussian noises, each of length $T$). Fig. 7 (a) shows a surrogate cross-correlation matrix with 10 diagonal blocks of different correlations (equal to the average correlation of each sector in Fig. 6 (a)). Fig. 7 (d) shows the same cross-correlation matrix but with one big block and 6 smaller blocks; the mean correlation of the big block is equal to the mean correlation of four sectors (CD, FN, ID and MT of Fig. 6 (a)), which show high inter-sectorial correlation in the S&P 500 market over 32 years. The eigenvalue spectra of the correlation matrices are shown in Figs. 7 (b) and (e), each of which consists of a Marčenko-Pastur distribution (see insets), followed by 10 (and 7) eigenvalues corresponding to the 10 (and 7) blocks (similar to sectors), respectively. Figs. 7 (c) and (f) show the 3D multidimensional scaling (MDS) plots, where the points (representing stocks) are scattered based on the correlations among the 10 and 7 blocks, respectively. In the MDS maps, more correlated stocks are placed nearby and anti-correlated stocks are placed far apart (see also Ref. [24]). The $k$-means clustering performed on the surrogate data matrices, with $k = 10$ and $k = 7$, yields 10 and 7 different clusters (represented in different colors), respectively.
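The surrogate construction and the market/group/random split of Eqs. (10)-(11) can be illustrated together. The sketch below is an assumption-laden toy, not the authors' procedure: the block sizes, block correlations and series length are hypothetical, the helper names are invented for the example, and a Cholesky factor of $\zeta$ is used in place of $\zeta^{1/2}$ (both give Gaussian series with covariance $\zeta$). Here `n_group` counts the modes taken after the largest eigenvalue.

```python
import numpy as np

def block_correlation(sizes, rhos):
    """zeta with unit diagonal and constant correlation rho_k inside diagonal block k."""
    N = sum(sizes)
    zeta = np.eye(N)
    start = 0
    for size, rho in zip(sizes, rhos):
        block = slice(start, start + size)
        zeta[block, block] = rho
        start += size
    np.fill_diagonal(zeta, 1.0)
    return zeta

def surrogate_returns(zeta, T, rng):
    """Correlated Gaussian series L B, with L the Cholesky factor of zeta (L L' = zeta)."""
    L = np.linalg.cholesky(zeta)
    return L @ rng.standard_normal((zeta.shape[0], T))

def split_modes(C, n_group):
    """Return (C_M, C_G, C_R): largest mode, next n_group modes, remaining modes."""
    lam, a = np.linalg.eigh(C)
    lam, a = lam[::-1], a[:, ::-1]          # descending order
    def part(lo, hi):
        return (a[:, lo:hi] * lam[lo:hi]) @ a[:, lo:hi].T
    return part(0, 1), part(1, 1 + n_group), part(1 + n_group, len(lam))

rng = np.random.default_rng(2)
sizes, rhos, T = [30, 20, 10], [0.5, 0.4, 0.3], 2000   # hypothetical settings
zeta = block_correlation(sizes, rhos)
C = np.corrcoef(surrogate_returns(zeta, T, rng))
N = C.shape[0]

# Upper Marchenko-Pastur edge from Eq. (6) with sigma^2 = 1 and Q = T/N.
lam_max_mp = (1.0 + np.sqrt(N / T)) ** 2
n_large = int(np.sum(np.linalg.eigvalsh(C) > lam_max_mp))

C_M, C_G, C_R = split_modes(C, n_group=n_large - 1)
```

With these settings one would expect one eigenvalue per block to lie above the Marčenko-Pastur edge, mirroring the observation that the number of large eigenvalues follows the number of blocks. Using the edge as the cut between group and random modes is a convenience of this sketch; the text above fixes $N_G$ by inspection and notes that the result is not very sensitive to its exact value.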

Next, we study the time evolution of the market correlations computed with thedaily returns of N =

194 stocks of S&P 500 over the period of 32-year (1985-2016,with T = < C i j > ), meanof absolute values of correlation coefﬁcients ( < | C i j | > ) and the difference of themean and the mean of absolute values of correlation coefﬁcients d f = < | C i j | > − < C i j > for short epochs of M =

20 days, with shifts of: ∆ τ = ∆ τ =

10 days (50% overlap), respectively. Shifts toward the positive side omplex market dynamics in the light of random matrix theory 13 of correlations are pointing toward periods of market crashes (with very high meancorrelation values). The values of d f are anti-correlated with the values of the meanof correlation coefﬁcients. During a market crash when mean of correlation coefﬁ-cient is high, there are very little anti-correlations among the stocks, then the valueof d f is extremely small, indeed near to zero (see Ref. [18]). It may act as an indi-cator of a market crash, as we observe that there is a high anti-correlation betweenthe values of d f and < C i j > , with leads of one or two days (ahead of the marketcrashes). Similarly, Fig. 8 (c) and (d) show the plots of variance, skewness, and kur-tosis of the correlation coefﬁcients C i j as functions of time with shifts of ∆ τ = ∆ τ =

10 days, respectively. The mean correlation is anti-correlated to varianceand skewness of C , i.e., when the mean correlation is high then both variance andskewness are low. Kurtosis is highly correlated to the mean correlation. These ob-servations are seen in the dynamical evolution of the market with epochs of M = ∆ τ = ,

10 day(s).The scatter plots between < C i j > and < | C i j | > , and < C i j > and d f (= < | C i j | > − < C i j > ) for different time lags (no-lag, lag-1, lag-2, and lag-3) of empirical ( a ) ( b )( d ) ( e ) ( c )( f ) Fig. 7 (a) Cross-correlation matrices constructed from the correlated Gaussian time series with 10diagonal blocks of different correlations (equal to the average correlation of each sector in Fig. 6(a)). (d) shows the same cross-correlation matrix but with one big block and 6 smaller blocks. Themean correlation of the big block is equal to the mean correlation of four sectors ( CD , FN , ID and MT of Fig. 6 (a)). They have high inter-sectorial correlation over the last 32 years in S&P500 market. (b) and (e) show the eigenvalue spectra of the correlation matrices, which consist ofthe Mar˘cenko-Pastur distributions followed by 10 group modes corresponding to 10 sectors and7 group modes corresponding to 7 sectors, respectively. Insets show the enlarged pictures of therandom part of the spectrum. (c) and (f) show plots of 10 and 7 different clusters, respectively,drawn in different colors using 3-dimensional k -means clustering technique. The clustering wasperformed on 3- D multidimensional scaling (MDS) map of 194 stocks. Each point on the MDSmap represents a stock of the market. The points are scattered in the map, based on the cross-correlations among the stocks – more correlated stocks are placed nearby and less correlated areplaced far apart (see also Ref. [24]).4 Hirdesh K. Pharasi, Kiran Sharma, Anirban Chakraborti and Thomas H. Seligman( a ) ( b )( c ) ( d ) Fig. 8

Plots of mean of correlation coefﬁcients ( < C ij > ), mean of absolute values of correlationcoefﬁcients ( < | C ij | > ) and the difference ( d f = < | C ij | > − < C ij > ) as functions of time, forshort epochs of M =

20 days, and shifts of: (a) ∆τ = ∆τ =

10 days. We ﬁnd thatduring crashes (when mean correlation is very high), the difference d f = < | C ij | > − < C ij > shows a minimum (close to zero) (see Ref. [18]). Plots of variance ( σ ), skewness, and kurtosis ofthe correlation coefﬁcients as functions of time, for short epochs of M =

20 days, and shifts of: (c) ∆τ = ∆τ =

10 days. correlation matrices C ( τ ) , with 194 stocks of S&P 500 and epochs of M =

20 days,and shift of ∆ τ = < C i j > vs. < | C i j | > and < C i j > vs. d f , at differenttime lags. The variances of the scatter plots increase with the increase of time lag,keeping the value of linear correlation coefﬁcient almost similar. The strong linearcorrelation between < C i j > and < | C i j | > may give us an early information abouta crash up to 3 days ahead (from the result of lag-3). Similar linear correlations arealso visible in Fig. 9 (c) and (d), between < C i j > and < | C i j | > , and < C i j > and d f , at different time lags (no-lag, lag-1, lag-2, and lag-3) for a shift of ∆ τ =

10 days.Here, obviously lag-1, lag-2, and lag-3 represent time lags of 10 days, 20 days, and30 days, respectively. The large variances in scatter plots indicate that it is hard todetect and extract information about a crash, e.g., 30 days in advance.Fig. 10 (a) shows the temporal variation of mean correlation ( < C i j > ), max-imum eigenvalue ( λ max ), number of negative eigenvalues ( − ve EV ) and small- omplex market dynamics in the light of random matrix theory 15( a )( b )( c )( d ) Fig. 9

Scatter plots of < C ij > vs. < | C ij | > and < C ij > and d f = < | C ij | > − < C ij > , fordifferent time lags (No lag,1-day, 2-days and 3-days) for the correlation matrix of epoch 20 days,with shifts of: (a)-(b) ∆τ = ∆τ =

10 days. The color-bar shows the time period inyears. est eigenvalue ( λ min ) of the emerging spectra with a shift of ∆ τ = ε = . ε = .

01 is negligibleon non-zero eigenvalues of the spectrum including λ max . We high correlation be-tween < C i j > and λ max . But the other properties of emerging spectrum ( − ve EV and λ min ) are less correlated with mean correlation < C i j > [22]. Fig. 10 (b) showsthe same for the shift of ∆ τ =

10 days. a ) ( b ) Fig. 10

Fig. 10 Plots of the mean of the correlation coefficients (<C_ij>), the maximum eigenvalue (λ_max), the number of negative eigenvalues (−ve EV) and the smallest eigenvalue (λ_min) of the spectrum as a function of time, for an epoch of 20 days at ε = 0.01, with shifts of: (a) ∆τ = 1 day and (b) ∆τ = 10 days. The correlation between <C_ij> and λ_max is high, but the two other properties of the “emerging spectrum” (−ve EV and λ_min) are less correlated with the mean correlation <C_ij>.

The study of the critical dynamics in any complex system is interesting, yet it can be very challenging. Recently, Pharasi et al. [24] presented an analysis based on the correlation structure patterns of S&P 500 (USA) data and Nikkei 225 (JPN) data, with short time epochs during the 32-year period of 1985-2016. They identified “market states” as clusters of similar correlation structures which occurred more frequently than by pure chance (randomness).

They first used the power mapping to reduce the noise of the singular correlation matrices and obtained distinct and denser clusters in a three-dimensional MDS map (as shown in Fig. 11(a)). The effects of noise suppression were found to be prominent not only on a single correlation matrix at one epoch, but also on the similarity matrices computed for different correlation matrices at different short-time epochs, and on their corresponding MDS maps. Using three-dimensional multidimensional scaling (MDS) maps, they applied k-means clustering to divide the clusters of similar correlation patterns into k groups or market states. One major difficulty of this clustering method is that one has to pass the value of k as an input to the algorithm. There are several proposed methods of determining the value of k (often arbitrary). Pharasi et al. [24] showed that, using a new prescription based on the cluster radii and an optional choice of the noise suppression parameter, one could have a fairly robust determination of the “optimal” number of clusters.

In the new prescription, they measured the mean and the standard deviation of the intra-cluster distances using an ensemble of a fairly large number (about 500) of different initial conditions (choices of random coordinates for the k centroids or, equivalently, random initial clusterings of the n objects); each set of initial conditions usually results in a slightly different clustering of the n objects representing different correlation matrices. If the clusters of points are very distinct in the coordinate space, then even for different initial conditions the k-means clustering method yields the same results, producing a small variance of the intra-cluster distance. However, allocating the matrices to the different clusters becomes problematic when the clusters are very close or overlapping, as the initial conditions can then influence the final clustering of the different points; the variance of the intra-cluster distance over the ensemble of initial conditions is then larger. Therefore, a minimum variance or standard deviation for a particular number of clusters implies robustness of the clustering. For optimizing the number of clusters, Pharasi et al. proposed that one should look for the maximum k that has the minimum variance or standard deviation of the intra-cluster distances over different initial conditions. Thus, based on the modified prescription of finding similar clusters of correlation patterns, they characterized the market states for USA and JPN.

Fig. 11 (a) Classification of the US market into four typical market states. k-means clustering is performed on an MDS map constructed from the noise-suppressed (ε = 0.6) similarity matrix [21]. The coordinates assigned in the MDS map are the corresponding correlation matrices constructed from short-time series of M = 20 days and shifted by ∆τ = 10 days. (b) The four different states of the US market, S1, S2, S3 and S4, where S1 corresponds to a calm state with low mean correlation, and S4 corresponds to a critical state (crash) with high mean correlation. (c) Temporal dynamics of the US market in the four different states (S1, S2, S3 and S4) for the period 1985-2016. (d) Transition probabilities between the paired market states.

Here, in Fig. 11(b), we reproduce the results for the US market, showing four typical market states. The evolution of the market can then be viewed as dynamical transitions between market states, as shown in Fig. 11(c). Importantly, this method yields the correlation matrices that correspond to the critical states (or crashes). They correspond to the well-known financial market crashes and are clustered in market state S4. They also analyzed the transition probabilities of the paired market states, and found that (i) the probability of remaining in the same state is much higher than that of a transition to a different state, and (ii) the most probable transitions are nearest-neighbor transitions, while transitions to other, remote states are rare (see Fig. 11(d)). Most significantly, the state adjacent to a critical state (crash) behaved like a long-term “precursor” for a critical state, serving as an early warning for a financial market crash.
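The transition probabilities of paired market states can be estimated from the time-ordered sequence of state labels by counting how often state i at one epoch is followed by state j at the next, and normalizing each row. The sketch below is a minimal illustration of that counting; the function name and the short label sequence are hypothetical and do not reproduce the empirical data.

```python
def transition_matrix(states, n_states):
    """Row-normalized counts of transitions s(t) -> s(t+1).

    `states` holds labels 1..n_states, one per epoch, in time order.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current - 1][following - 1] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # A state that is never left (no outgoing counts) gets a row of zeros.
        probs.append([c / total if total else 0.0 for c in row])
    return probs

# Hypothetical sequence of market states (1 = S1, ..., 4 = S4), one per epoch.
sequence = [1, 1, 1, 2, 2, 1, 1, 2, 3, 3, 4, 3, 2, 2, 1]
P = transition_matrix(sequence, n_states=4)

for i, row in enumerate(P, start=1):
    print(f"S{i}: " + "  ".join(f"{p:.2f}" for p in row))
```

In this toy sequence the calm state S1 never jumps directly to S4, and S4 is reached only from its neighbor S3, which mimics the nearest-neighbor structure described above.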

Market crashes, floods, earthquakes, and other catastrophic events, though rarely occurring, can have devastating effects with long-term repercussions. Therefore, it is of prime importance to study the complexity of the underlying dynamics and the signatures of catastrophic events. Recently, Sharma et al. [9] studied the evolution of cross-correlation structures of stock return matrices and their eigenspectra over different short-time epochs for the US market and the Japanese market. By using the power mapping method, they applied the non-linear distortion with a small value of the distortion parameter, ε = 0.01, to correlation matrices computed for any epoch, leading to the emerging spectrum of eigenvalues.

Here, we reproduce some of the significant findings of the paper [9]. Interestingly, it is found that the statistical properties of the emerging spectrum display the following features: (i) the shape of the emerging spectrum reflects the market instability (see Fig. 12(a) and (b)), (ii) the smallest eigenvalue (in a similar way as the maximum eigenvalue, which captured the mean correlation of the market) indicated that the financial market had become more turbulent, especially from 2001 onwards (see Fig. 12(c)), and (iii) the smallest eigenvalue is able to statistically distinguish the nature of a market turbulence or crisis: internal instability or external shock (see Fig. 12(c)). In certain instabilities the smallest eigenvalue of the emerging spectrum was positively correlated with the largest eigenvalue (and thus with the mean market correlation), while in other cases there were trivial anti-correlations. They proposed that this behavioral change could be associated with the question of whether a crash is caused by intrinsic market conditions (e.g., a bubble) or by external events (e.g., the Fukushima meltdown). A lead-lag effect of the crashes was also observed through the behavior of λ_min and the mean correlation, which could be examined further.

Fig. 12 (a) Non-critical (normal) period of the correlation matrix and its eigenvalue spectrum, evaluated for the short return time series for an epoch of M = 20 days ending on 08-07-1985, with the maximum eigenvalue of the normal spectrum λ_max = [...].63. Inset: the emerging spectrum using the power map technique (ε = 0.01) is a deformed semi-circle, with the smallest eigenvalue of the emerging spectrum λ_min = −[...]. (b) The same for a critical (crash) period, for an epoch of M = 20 days ending on 15-09-2008, with the maximum eigenvalue of the normal spectrum λ_max = [...].49. Inset: the emerging spectrum using the power map technique (ε = 0.01) is Lorentzian, with the smallest eigenvalue of the emerging spectrum λ_min = −[...]. (c) (i) Market return r(t), (ii) mean market correlation µ(t), (iii) smallest eigenvalue of the emerging spectrum (λ_min), and (iv) t-value of the t-test, which tests the statistical effect of the lag-1 smallest eigenvalue λ_min(t − 1) on the mean market correlation µ(t). The mean of the correlation coefficients and the smallest eigenvalue in the emerging spectra are correlated to a large extent. Notably, the smallest eigenvalue behaves differently (sharply rising or falling) at the times when the mean market correlation is very high (crash). The vertical dashed lines correspond to the major crashes, which brewed due to internal market reactions. Note that the smallest eigenvalue of the US market indicates that the financial market has become more turbulent from 2001 onward. Figure adapted from Ref. [9].

We have presented a brief overview of the Wishart and correlated Wishart ensembles in the context of financial time series analysis. We displayed the dependence of the eigenspectra of the Wishart ensemble on the length of the time series. The eigenspectra of large random matrices are not very sensitive to Q = T/N; however, the amount of spurious correlations depends on it. To avoid the problem of non-stationarity and to suppress the noise in the correlation matrices computed over short epochs, we applied the power mapping method to the correlation matrices. We showed that the shape of the emerging spectrum depends on the amount of correlation U of the correlated Wishart ensemble. We also studied the effect of the non-linear distortion parameter ε on the emerging spectrum.

Then we demonstrated the eigenvalue decomposition of the empirical cross-correlation matrix into market mode, group modes and random modes, using the return time series of 194 stocks of the S&P 500 index during the period 1985-2016. The bulk of the eigenvalues behave as random modes and give rise to the Marčenko-Pastur distribution.
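For reference, the separation into bulk and deviating eigenvalues is often made with the Marčenko-Pastur bounds. The sketch below uses their standard form for uncorrelated, unit-variance series, λ± = (1 ± 1/√Q)², with Q = T/N as above. The function names, the sample eigenvalues and the simple rule that labels the largest eigenvalue as the market mode and everything else above λ+ as group modes are illustrative assumptions, not the procedure used for the empirical data.

```python
import math

def marchenko_pastur_bounds(q, sigma2=1.0):
    """Edges of the Marchenko-Pastur bulk for Q = T/N and variance sigma2."""
    if q <= 0:
        raise ValueError("Q = T/N must be positive")
    root = math.sqrt(1.0 / q)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2

def classify_modes(eigenvalues, q):
    """Split eigenvalues into market, group and random modes (illustrative rule)."""
    _, lam_plus = marchenko_pastur_bounds(q)
    ordered = sorted(eigenvalues, reverse=True)
    market, rest = ordered[0], ordered[1:]
    return {
        "market": market,                                    # largest eigenvalue
        "group": [lam for lam in rest if lam > lam_plus],    # above the bulk edge
        "random": [lam for lam in rest if lam <= lam_plus],  # at or below the upper edge
    }

# Hypothetical values: Q = 4 and a handful of made-up eigenvalues.
lam_minus, lam_plus = marchenko_pastur_bounds(q=4.0)
modes = classify_modes([30.0, 5.0, 3.0, 1.2, 0.8, 0.3], q=4.0)
print(lam_minus, lam_plus)
print(modes)
```

For Q < 1 the correlation matrix is singular, so the bulk is accompanied by exactly zero eigenvalues that these bounds do not describe.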
We also created surrogate correlation matrices to understand the effect of the sectoral correlations. Then we studied the eigenvalue distribution of those matrices as well as k-means clustering on the MDS maps generated from the correlation matrices. Evidently, if we have 10 diagonal blocks (representing sectors), then we get 10 clusters on an MDS map. Similarly, when we merged four of the blocks into one and had 7 diagonal blocks, we again got 7 clusters on the MDS map.

Further, we studied the dynamical evolution of the statistical properties of the correlation coefficients using the returns of the S&P 500 stock market. We computed the mean, the absolute mean, the difference between the absolute mean and the mean, the variance, the skewness and the kurtosis of the correlation coefficients C_ij, for short epochs of M = 20 days and shifts of ∆τ = 1 day and ∆τ = 10 days. We also showed the evolution of the mean of the correlation coefficients, the maximum eigenvalue of the correlation matrix, as well as the number of negative eigenvalues and the smallest eigenvalue of the emerging spectrum, for the same epoch and shifts.

Finally, we discussed the applications of RMT in financial markets. In the first application, we demonstrated the use of RMT and correlation patterns in identifying possible “market states” and long-term precursors to market crashes. In the second application, we presented the characterization of catastrophic instabilities, i.e., market crashes, using the smallest eigenvalue of the emerging spectra arising from correlation matrices computed over short epochs.
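As a compact illustration of the power mapping summarized here, the sketch below applies the element-wise map C_ij → sign(C_ij)|C_ij|^(1+ε), the form in which the power map is usually written, to a singular correlation matrix built from synthetic Gaussian "returns". The array sizes, the random seed, the tolerance and the variable names are assumptions chosen for illustration; they are not the S&P 500 data.

```python
import numpy as np

def power_map(corr, eps):
    """Element-wise power map: C_ij -> sign(C_ij) * |C_ij|**(1 + eps)."""
    return np.sign(corr) * np.abs(corr) ** (1.0 + eps)

rng = np.random.default_rng(0)
n_stocks, epoch = 50, 20          # N > M, so the correlation matrix is singular
returns = rng.standard_normal((epoch, n_stocks))   # synthetic stand-in for returns

corr = np.corrcoef(returns, rowvar=False)          # N x N Pearson correlation matrix
mapped = power_map(corr, eps=0.01)

eig_before = np.linalg.eigvalsh(corr)              # eigenvalues in ascending order
eig_after = np.linalg.eigvalsh(mapped)

tol = 1e-10                                        # numerical threshold for "zero"
zeros_before = int(np.sum(np.abs(eig_before) < tol))
zeros_after = int(np.sum(np.abs(eig_after) < tol))

# The eigenvalues that were degenerate at zero form the emerging spectrum.
emerging = eig_after[:zeros_before]
print("zero eigenvalues before / after:", zeros_before, zeros_after)
print("smallest eigenvalue of the emerging spectrum:", emerging[0])
print("negative eigenvalues after the map:", int(np.sum(eig_after < 0)))
```

The diagonal entries are unchanged by the map, so the trace of the matrix is preserved, while the N − M + 1 zero eigenvalues of the short-epoch matrix are lifted to small non-zero values.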

Acknowledgements

The authors thank R. Chatterjee, S. Das and F. Leyvraz for the joint works presented here. A.C. and K.S. acknowledge the support by grant number BT/BI/03/004/2003(C) of Govt. of India, Ministry of Science and Technology, Department of Biotechnology, Bioinformatics division, University of Potential Excellence-II grant (Project ID-47) of JNU, New Delhi, and the DST-PURSE grant given to JNU by the Department of Science and Technology, Government of India. K.S. acknowledges the University Grants Commission (Ministry of Human Resource Development, Govt. of India) for her senior research fellowship. H.K.P. is grateful for the postdoctoral fellowship provided by UNAM-DGAPA. A.C., H.K.P., K.S. and T.H.S. acknowledge support by Project CONACyT Fronteras 201, and also support from the project UNAM-DGAPA-PAPIIT IG 100616.

References

1. Yahoo finance database. https://finance.yahoo.co.jp/ (2017). Accessed on 7th July, 2017, using the R open source programming language and software environment for statistical computing and graphics.
2. Bar-Yam, Y.: General features of complex systems. Encyclopedia of Life Support Systems (EOLSS), UNESCO, EOLSS Publishers, Oxford, UK (2002)
3. Bendat, J.S., Piersol, A.G.: Engineering Applications of Correlation and Spectral Analysis. Wiley-Interscience, New York, 315 p. (1980)
4. Bouchaud, J.P., Potters, M.: Theory of Financial Risk and Derivative Pricing: from Statistical Physics to Risk Management. Cambridge University Press (2003)
5. Cartan, É.: Sur les domaines bornés homogènes de l'espace de n variables complexes. In: Abhandlungen aus dem mathematischen Seminar der Universität Hamburg, vol. 11, pp. 116–162. Springer (1935)
6. Chakraborti, A., Muni Toke, I., Patriarca, M., Abergel, F.: Econophysics review: I. Empirical facts. Quantitative Finance (7), 991–1012 (2011)
7. Chakraborti, A., Muni Toke, I., Patriarca, M., Abergel, F.: Econophysics review: II. Agent-based models. Quantitative Finance (7), 1013–1041 (2011)
8. Chakraborti, A., Patriarca, M., Santhanam, M.: Financial time-series analysis: A brief overview. In: Econophysics of Markets and Business Networks, pp. 51–67. Springer (2007)
9. Chakraborti, A., Sharma, K., Pharasi, H.K., Das, S., Chatterjee, R., Seligman, T.H.: Characterization of catastrophic instabilities: Market crashes as paradigm. arXiv preprint arXiv:1801.07213 (2018)
10. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: A survey on big data. Information Sciences, 314–347 (2014)
11. Gell-Mann, M.: What is complexity? Complexity, 16–19 (1995)
12. Guhr, T., Kälber, B.: A new method to estimate the noise in financial correlation matrices. Journal of Physics A: Mathematical and General (12), 3009 (2003)
13. Hua, L.: Harmonic analysis of functions of several complex variables in the classical domains. 6. American Mathematical Soc. (1963)
14. Jin, X., Wah, B.W., Cheng, X., Wang, Y.: Significance and challenges of big data research. Big Data Research (2), 59–64 (2015)
15. Leviandier, L., Lombardi, M., Jost, R., Pique, J.P.: Fourier transform: A tool to measure statistical level properties in very complex spectra. Physical Review Letters (23), 2449 (1986)
16. Mantegna, R.N., Stanley, H.E.: An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, Cambridge (2007)
17. Marčenko, V.A., Pastur, L.A.: Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik (4), 457 (1967)
18. Martinez, M.M.R.: Caracterización estadística de mercados europeos. Master's thesis, UNAM (2018)
19. Mehta, M.L.: Random Matrices. Academic Press (2004)
20. Mikosch, T., Stărică, C.: Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. Review of Economics and Statistics (1), 378–390 (2004)
21. Münnix, M.C., Shimada, T., Schäfer, R., Leyvraz, F., Seligman, T.H., Guhr, T., Stanley, H.E.: Identifying states of a financial market. Scientific Reports, 644 (2012)
22. Ochoa, S.: Mapeo de Guhr-Kaelber aplicado a matrices de correlación singulares de dos mercados financieros. Master's thesis, UNAM (2018)
23. Pandey, A., et al.: Correlated Wishart ensembles and chaotic time series. Physical Review E (3), 036202 (2010)
24. Pharasi, H.K., Sharma, K., Chatterjee, R., Chakraborti, A., Leyvraz, F., Seligman, T.H.: Identifying long-term precursors of financial market crashes using correlation patterns. ArXiv e-prints: 1809.00885 (2018)
25. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Guhr, T., Stanley, H.E.: Random matrix approach to cross correlations in financial data. Physical Review E (6), 066126 (2002)
26. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Stanley, H.E.: Universal and nonuniversal properties of cross correlations in financial time series. Physical Review Letters (7), 1471 (1999)
27. Schäfer, R., Nilsson, N.F., Guhr, T.: Power mapping with dynamical adjustment for improved portfolio optimization. Quantitative Finance (1), 107–119 (2010)
28. Sharma, K., Shah, S., Chakrabarti, A.S., Chakraborti, A.: Sectoral co-movements in the Indian stock market: A mesoscopic network analysis, pp. 211–238 (2017)
29. Shuryak, E.V., Verbaarschot, J.: Random matrix theory and spectral sum rules for the Dirac operator in QCD. Nuclear Physics A (1), 306–320 (1993)
30. Sinha, S., Chatterjee, A., Chakraborti, A., Chakrabarti, B.K.: Econophysics: An Introduction. John Wiley & Sons (2010)
31. Utsugi, A., Ino, K., Oshikawa, M.: Random matrix theory analysis of cross correlations in financial markets. Physical Review E (2), 026110 (2004)
32. Vemuri, V.: Modeling of Complex Systems: An Introduction. Academic Press, New York (1978)
33. Vinayak, Prosen, T., Buča, B., Seligman, T.H.: Spectral analysis of finite-time correlation matrices near equilibrium phase transitions. Europhysics Letters (2), 20006 (2014)
34. Vinayak, Schäfer, R., Seligman, T.H.: Emerging spectra of singular correlation matrices under small power-map deformations. Physical Review E (3), 032115 (2013)
35. Vinayak, Seligman, T.H.: Time series, correlation matrices and random matrix models. In: AIP Conference Proceedings, vol. 1575, pp. 196–217. AIP (2014)
36. Vyas, M., Guhr, T., Seligman, T.H.: Multivariate analysis of short time series in terms of ensembles of correlation matrices. ArXiv e-prints: 1801.07790 (2018)
37. Wigner, E.P.: Ann. Math. 53, 36 (1951)
38. Wigner, E.P.: On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, pp. 325–327 (1958)
39. Wigner, E.P.: Random matrices in physics. SIAM Review 9