Multivariate analysis of short time series in terms of ensembles of correlation matrices
Manan Vyas,1,∗ T. Guhr,2,3,† and T. H. Seligman1,3,‡

1 Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, 62210 Cuernavaca, México
2 Fakultät für Physik, Universität Duisburg-Essen, Lotharstraße 1, D-47048 Duisburg, Germany
3 Centro Internacional de Ciencias, 62210 Cuernavaca, México
Abstract
When dealing with non-stationary systems for which many time series are available, it is common to divide time into epochs, i.e. smaller time intervals, and to deal with short time series in the hope of having some form of approximate stationarity on that time scale. We can then study time evolution by looking at properties as a function of the epochs. This leads to singular correlation matrices and thus poor statistics. In the present paper, we propose an ensemble technique to deal with a large set of short time series without any consideration of non-stationarity. We randomly select subsets of time series and thus create an ensemble of non-singular correlation matrices. As the selection possibilities are binomially large, we obtain good statistics for the eigenvalues of correlation matrices, which are typically not independent. Once we have defined the ensemble, we analyze its behavior for constant and block-diagonal correlations and compare numerics with analytic results for the corresponding correlated Wishart ensembles. We discuss differences resulting from spurious correlations due to repetitive use of time series. The usefulness of this technique should extend beyond the stationary case if, on the time scale of the epochs, we have quasi-stationarity at least for most epochs.

∗ [email protected]
† [email protected]
‡ [email protected]

I. INTRODUCTION

Non-equilibrium stationary states (NESS) have attracted a large amount of attention in recent years [1–6], but more recently increasing attention has been given to non-stationary situations, as they actually cover a wide range of observational as well as experimental data. Such data come from many fields, including astronomy, financial markets, meteorology, chemical engineering, fractures and colloids, as well as numerical results for models of such systems and for dynamical systems, among many others.
Among such systems, the ones that have several near-stationary states with more or less abrupt transitions are of particular interest. Such systems are widespread and of relevance. They include bi-stable and multi-stable systems with smooth transitions as well as systems that might run into catastrophic instability. We can think of both types occurring as first-order phase transitions under temperature change, depending on conditions. Beyond that, we may hope that non-stationary systems are quasi-stationary over sufficiently short time periods. Yet abrupt non-stationarities may occur, and we may hope to obtain either warnings or at least post-event learning from a correlation analysis of known facts over a short time period before the abrupt events.

For the sake of illustration, let us think of a chemical reactor that should produce certain end products in a stationary fashion, but whose state is in fact only quasi-stationary. This reactor may have other states that produce less of the desired and more of the undesirable products, and a transition might prove costly. Yet things might get much worse if breaking stationarity leads to explosions with release of toxic substances, which may in addition cause great cost in lives and health, as in Bhopal 1984 [7]. Using a Wishart model as a model for non-stationarity was first put forward in [8] and used for credit risk analysis in [9, 10].

Our interest was triggered by studies of financial markets, where the very attempt to define states of quasi-stationary evolution is relatively new [11]. In that paper, the correlation matrix of short time series was found to be a good basis to specify the states, and clustering techniques were used to identify these states. An attempt to detect conditions under which change may occur was not made and may also be futile in this context, as the clustering technique by definition assigns each correlation matrix to a state and thus borders become unclear.
One could use different clustering algorithms to detect larger differences in clusterings, but this might depend very much on the definition of distance we use [12].

We believe that in some sense eigenvalues do indicate very relevant aspects of dynamics, and recently it was shown that this is also true for correlation matrices. Using Metropolis dynamics, the larger eigenvalues of the correlation matrix of a 2-D Ising model at critical temperature can display a power law that can be directly derived from the power law of spatial correlations in this system [13]; it was further shown that such a power law will survive if a sufficiently large random subset of time series is used. Yet long time series are essential to see this effect, because the number of large eigenvalues rapidly becomes too small as the correlation matrix becomes more and more singular with shorter time series. The use of the power map, originally introduced for noise reduction [14, 15], was suggested in [16] as an appropriate tool to detect correlations if powers very near to identity are used; indeed, in [13] this can be explicitly seen. Yet while the power map does detect correlations efficiently, information as to the nature of the correlations has so far not been given.

The power map is not transparent due to its nonlinearity, and therefore we propose a different path: we shall choose, at random, subsets of time series that yield non-singular correlation matrices and treat these as an ensemble. The number of non-zero eigenvalues we have can now be increased dramatically, as the number of combinatorial choices is very large.
Keeping in mind that finally there is no more information available than there is in the original data matrix, we limit the actual number of selections from a single data matrix resulting from many short time series.

We propose to use uncorrelated and correlated Wishart ensembles as solvable or approximately solvable examples to exemplify our method and see the results; the results can also be used as prior distributions to which experimental data can be compared to obtain clarification of the data.

In the next section, we shall briefly present the characteristics of data ensembles and the correlation matrices we can obtain. This will also serve to fix notation. Next we will present basic results obtained from supersymmetric calculations to derive the formula for correlated Wishart ensembles with arbitrary correlations. We analyze the special case of block-wise correlated subsets of time series and then compare these results with numerical calculations, where we restrict the block situation to two blocks. Note that the bulk of the spectra will be well described by the analytics, while the outliers will only be approximated, as the analytic result we present is for large N, T with a fixed ratio κ = N/T and independent time series. Finally, we give some conclusions and an outlook to applications that are under way.
II. CONSTRUCTION OF ENSEMBLE
Multivariate analysis of time series is an old problem, but its systematic application to quasi-stationary systems has generated an increased interest in large numbers of short time series. Examples result from financial markets [11, 16], but other fields, including traffic, chemical reactors, astronomical data, dynamical systems, pixel-by-pixel analysis of experimental registers by CCD camera, etc., provide a wide range of data. Time series can also be obtained numerically from dynamical systems such as Ising models and the TASEP [6, 16]. The correlation matrix (or the covariance matrix) is the preferred object of analysis if long time series are available. The matrices themselves, their eigenvalues and their eigenfunctions may be analyzed to extract meaningful information. But difficulties appear for short time series.

The building blocks for the correlation (covariance) matrices are rectangular N × T data matrices A = [A_ij], with i = 1, 2, ..., N and j = 1, 2, ..., T. Each row in the data matrix A is a time series of length T, measured at usually equidistant times. It can be obtained from observations or experimental measurements of observables like stock prices, temperature, intensity, astronomical observations and so on. The matrix C = AA^t/T, with A^t denoting the transpose of A, is the N × N covariance matrix. Wishart matrices are random matrix models used to describe universal features of covariance matrices [17]. We consider the case of real entries A_ij ∈ ℝ, known in the literature as the Wishart orthogonal ensemble (WOE). For the WOE, the matrix elements of A are real independent Gaussian variables with fixed mean µ and variance σ, i.e. A_ij ∈ N_ℝ(µ, σ). In order to arrive at correlation matrices, one normalizes to µ = 0 and σ = 1. In the context of time series, C may be interpreted as the correlation matrix, calculated over stochastic time series of time horizon T for N statistically independent variables.
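As a minimal illustration of this construction (a numpy sketch with parameters of our own choosing, not taken from any data set in the paper), one can build C from normalized white-noise time series and observe directly the singularity that arises when T < N:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 400, 40                      # many more time series than time steps

# White-noise data matrix; each row is one time series of length T.
A = rng.standard_normal((N, T))

# Normalize each row to mean 0 and standard deviation 1 so that
# C can be interpreted as a correlation matrix.
A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

C = A @ A.T / T                     # N x N correlation matrix
eigvals = np.linalg.eigvalsh(C)

# After row normalization the rank is at most T - 1, so N - T + 1
# eigenvalues vanish (up to numerical precision).
n_zero = int(np.sum(eigvals < 1e-8))
print(n_zero)                       # N - T + 1 = 361
```

Since each normalized row has unit variance, the diagonal of C is 1 and the trace, i.e. the sum of all eigenvalues, equals N.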
By construction, C is a real symmetric positive semidefinite matrix. For T < N, C is singular and has exactly N − T + 1 zero eigenvalues. Note that the assumption of stationarity improves when short time series are used. In real applications, one needs to understand the role of correlations, and thus correlated WOE (CWOE) models provide the null hypothesis. The CWOE is defined by the real symmetric matrices C̄ = ĀĀ^t/T, with Ā = χ^{1/2}A. Here, χ is a real symmetric positive definite non-random N × N matrix that accounts for the correlations in the time series (rows) of the data matrix Ā, and A_ij ∈ N_ℝ(0, 1), so that the ensemble average of C̄ is χ.

We analyze highly singular correlation matrices (N ≫ T) by constructing ensembles of correlation matrices from a given correlation matrix by randomly selecting short observational time series. By randomly choosing m rows out of the N given rows of A (or Ā) such that m = aT, with a a real number close to but smaller than unity, we construct an ensemble of m × T dimensional matrices. While making the selections, we ensure that no two rows are the same in a given matrix and no two matrices are the same in the ensemble. From these, we obtain an ensemble of m × m non-singular correlation matrices and analyze the eigenvalue distribution.

If the number N of time series available is large compared to the number of entries T in each time series, the discussion of eigenvalues becomes statistically unsatisfactory. A typical example would be financial time series of increments of 40 consecutive closing prices for a selection of N = 400 shares from some index (say a selection from Standard and Poor's 500). In this case we would obtain but 39 non-zero eigenvalues (40 for covariance matrices) from the 400 ×
400 correlation matrix, which might be all over the place. We propose to select at random m time series (experimental, observational or computational), with m smaller than the length T of the time series. If we allowed all different choices, we would end up with a very large ensemble of correlation matrices (in our example, we might choose m = 36, leading to (400 choose 36) possible selections, an impractically large number). So we choose a random subset of a few thousand and get excellent statistics for the eigenvalues. Having more members in the ensemble would increase the amount of spurious information, which enters unavoidably if we allow repeated time series in different members of the ensemble. If, on the other hand, we do not allow repetitions, the results would depend very much on the selection we make and the statistics would be less adequate. An alternative may be to make an ensemble of ensembles with different but totally independent choices, and to calculate averages and variances of specific statistical quantities obtained for a lower-level ensemble. We choose not to go this more complicated route.

The question arises how stable and informative the corresponding results are. The purpose of the present paper is to take this simple idea and compare it to cases where analytic results can be derived from well-known results [18, 19]. We start by analyzing white-noise time series and the resulting correlation matrices, known as the Wishart ensemble [17, 20], as well as correlated Wishart ensembles with constant correlations [21]. Here, the level densities are known analytically and the n-point correlation function converges to the universal result [20]. Because the case of constant correlations will mimic real situations only very roughly, we shall study in more detail situations where subsets of time series are more correlated among each other than with the time series of other subsets. This will be the typical case of market sectors of stock exchanges.
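The selection procedure just described can be sketched in a few lines (again a hedged numpy illustration; the ensemble size L and subset size m are example values, and the bookkeeping with a set of sorted row-index tuples is simply our way of enforcing that no two members coincide):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 40
m, L = 36, 2000                       # m < T yields non-singular members

A = rng.standard_normal((N, T))
A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

members = set()
eigenvalues = []
while len(members) < L:
    # Distinct rows within one member; distinct members within the ensemble.
    rows = tuple(sorted(rng.choice(N, size=m, replace=False).tolist()))
    if rows in members:
        continue
    members.add(rows)
    B = A[list(rows)]                 # m x T sub-matrix of time series
    C = B @ B.T / T                   # m x m non-singular correlation matrix
    eigenvalues.extend(np.linalg.eigvalsh(C))

eigenvalues = np.asarray(eigenvalues)
print(eigenvalues.size)               # L * m eigenvalues for the statistics
```

Repetition of a given time series across different members is allowed here; this is the source of the spurious correlations between ensemble members discussed in the text.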
To emphasize the characteristics of such a block structure, we shall restrict ourselves in the graphical displays to two blocks in this paper.

We shall see that clear signatures of the correlations (or lack thereof) can be obtained with very good statistics. This distinguishes the present linear method both from the clustering techniques [11] and from the power-map technique [16], which are inherently non-linear. The first is a transparent standard technique but requires considerable previous insight into the problem at hand, while the second turns out to be quite stable but its interpretation is an open problem.

III. SUPERSYMMETRY APPROACH
Time series analysis is an imperative tool to study the dynamics of a variety of complex systems. Wishart correlation matrices are standard models employed for the statistical analysis of ensembles of time series. We provide here a brief sketch of the derivation using standard supersymmetric steps; for further details refer to [18, 19, 22, 23].

In multivariate analysis, it is desirable to derive a "null hypothesis" from a statistical ensemble to understand the measured eigenvalue density of a given correlation matrix. The random matrix ensemble we consider is the CWOE with arbitrary correlations, which gives the 'empirical' (population) correlation matrix C upon averaging over the probability density function P(A|C) (normalized to unity),

P(A|C) = det(2πC)^{−T/2} exp[ −(1/2) tr(A^t C^{−1} A) ].  (1)

By construction, ∫ d[A] P(A|C) AA^t/T = C, with the measure d[A] = ∏_{i=1}^{N} ∏_{j=1}^{T} dA_ij being the product of the differentials of all independent elements of A. It is important to mention that, in the supersymmetric approach, one assumes T ≥ N to ensure the invertibility of C. In order to derive the ensemble averaged eigenvalue density (one-point function), we may replace C by the diagonal matrix Λ of its eigenvalues {Λ_1, Λ_2, ..., Λ_N}, since the domain ℝ^{N×T} of A is orthogonally invariant.

The ensemble averaged eigenvalue density for the correlation matrix AA^t is defined by

S(x) = (1/N) ∫ d[A] P(A|Λ) Σ_{i=1}^{N} δ(x − x_i),  (2)

where the x_i are the eigenvalues of AA^t. In terms of the resolvent, Equation (2) reads

S(x) = (1/Nπ) lim_{ε→0} Im [ ∫ d[A] P(A|Λ) tr ( 1_N / ((x − iε) 1_N − AA^t) ) ].  (3)

For the CWOE (and also the WOE), the eigenvalue density of the correlation matrices is derived using the supersymmetry technique [18, 19]. In the supersymmetric approach, the eigenvalue density is written as the derivative of a generating function. The generating function in turn is mapped onto a suitable superspace, which leads to a drastic reduction in the number of degrees of freedom. Then the eigenvalue density is derived by introducing eigenvalue coordinates for the supermatrix and integrating over the anticommuting Grassmann variables.

The generating function Z as a function of the source variable J is the starting point of the supersymmetry approach,

Z(J) = ∫ d[A] P(A|Λ) det(x⁺ 1_N + J 1_N − AA^t) / det(x⁺ 1_N − AA^t);  x⁺ ∈ ℂ.  (4)

Note that x⁺ = x + iε and Z(0) = 1. The one-point function is then computed via the derivative,

S(x) = −(1/πN) Im ∂Z(J)/∂J |_{J=0}.  (5)

The generalized Hubbard-Stratonovich transformation [24, 25] and the superbosonization formula [26] have been used to express the generating function as an integral over a suitable superspace; in fact, the two are equivalent [27]. The determinant in the denominator of Equation (4) can be expressed as a Gaussian integral over a vector of ordinary commuting variables. Similarly, the determinant in the numerator can be expressed as a Gaussian integral over a vector of anticommuting variables. Combining these expressions, we obtain a Gaussian integral over a rectangular supermatrix B which is N × (2 |
The determinant in the denominator ofEquation (4) can be expressed as a Gaussian integral over a vector in ordinary commutingvariables. Similarly, the determinant in the numerator can be expressed as a Gaussian inte-gral over a vector in anticommunting variables. Combining these expressions, we obtain aGaussian integral over a rectangular supermatrix B which is n × (2 |
2) dimensional, B = [ u a , v a , ζ ∗ a , ζ a ] , B t = u b v b ζ b − ζ ∗ b ≤ b ≤ n . (6)7ere, ζ i , ζ ∗ i are Grassmann variables. Using this and d [ B ] = (2 π ) − N P Ni =1 du i dv i ∂ζ ∗ i ∂ζ i , u i , v i ∈ R in Equation (4) and performing the Gaussian integral over A , we apply the dualityrelation between ordinary spaces and superspaces det( N + i BB t Λ) = sdet( + i B t Λ B ), onecan then rewrite the determinant as a superdeterminant. Importantly, the supermatrix B t Λ B is 4 × BB t is N × N dimensional. This dimensionalreduction is the advantage of the supersymmetry technique. The left upper block (boson-boson block) of supermatrix B t Λ B is a Hermitian matrix. We now use the generalizedHubbard-Stratonovich transformation to replace the supermatrix B t Λ B by a supermatrix σ with independent matrix elements. For the required power of superdeterminant in theexpression for the generating function, we write a super-Fourier representationsdet − T/ ( + i B t Λ B ) = Z d [ ρ ] I ( ρ ) exp( − i B t Λ B ρ )) . (7)The Fourier transform gives a supersymmetric Ingham-Siegel distribution, I ( ρ ) = Z d [ σ ]sdet − T/ ( + iσ ) exp( i σρ )) . (8)Here, σ = σ ττ ∗ iσ and ρ = ρ ωω ∗ iρ are supermatrices of dimension 4 × × τ = η η ∗ ξ ξ ∗ and τ ∗ = η ∗ ξ ∗ − η − ξ (similarly, for ω ). The super-integration measure is d [ σ ] = (2 π ) − dσ aa dσ ab dσ bb dσ ∂η∂η ∗ ∂ξ∂ξ ∗ ,where σ aa , σ bb are diagonal and σ ab is the off-diagonal elements of σ . The measure d [ ρ ]is defined in a similar fashion. Using these and integrating over the supermatrix B , thegenerating function is a supermatrix integral, Z ( J ) = Z d [ ρ ] I ( ρ ) N Y i =1 sdet − / ( x + − J γ − ρ Λ i ) . (9)Here, the matrix γ = diag(0 , − ) is diagonal. 
For arbitrary small J, using Equations (8) and (9), we have Z(J) = ∫ d[σ] ∫ d[ρ] e^L with a Lagrangian L given by

L = −(1/2) Σ_{i=1}^{N} str ln(x⁺ − ρ Λ_i) + (T/2) str ln ρ − (1/2) str ρ,  (10)

and we end up with a scalar polynomial equation resulting from the saddle point equation that can be solved numerically,

(1/2) Σ_{i=1}^{N} Λ_i/(x⁺ − ρ Λ_i) + T/(2ρ) − 1/2 = 0.  (11)

This is the main analytic result of the paper, which we test against numerics for different WOE models in the following section. The one-point function is then given in terms of the complex solution, say ρ(x), of this saddle point equation,

S(x) = −Im ρ(x)/(πNx),  (12)

in the limit N, T → ∞ with fixed ratio κ = N/T. Note that the eigenvalue density is normalized to unity. Equation (11) is valid for the CWOE with arbitrary correlations, and the structure of the correlation matrix enters via its eigenvalues Λ_i (1 ≤ i ≤ N). Equation (11) is another version of a classical result [28–31].

IV. NUMERICAL RESULTS
For the random selections, we have two choices: (a) the 'Non-Singular Random Selection Ensemble' (NSRSE), in which a given time series can appear many times in the ensemble, but at most once in the construction of any correlation matrix, to avoid singularities. As mentioned above, we then have binomially many choices, but the members of the ensemble are not entirely independent. We will usually not have N and T very large, but even so we will find that the behavior of the bulk is not significantly affected, although the outliers are. (b) Alternatively, for sufficiently large N and T, we could use a random matrix model, the 'Exclusive Random Selection Ensemble', that excludes any repetition of time series in its construction. We could use this ensemble to calculate the expectation values of the quantities we are interested in and average those over all or a subset of possible selections. In this case, we expect to a large extent coincidence with correlated Wishart ensembles, but the procedure is rather unwieldy and we shall rather focus on the first choice, namely the NSRSE.

We now proceed to analyze three special cases. We start with the case of uncorrelated time series, where we should reproduce the Marčenko-Pastur distribution [28, 32]. Next, we pass to the case of constant correlations, where we, in addition, have an outlier that should be described. Finally, we proceed to the block structure, which we illustrate using two blocks of time series which have constant internal correlation and relatively small correlation between the two blocks. Note that our results need not agree with the theory for Wishart matrices because, starting with a single representative of this ensemble in the large space, we select the smaller matrices from that space, and repetitions in the selection will turn out to be important.

A. Uncorrelated Non-Singular Random Selection Ensemble
As a first example, we study the uncorrelated NSRSE, whose correlation matrices are obtained from a data set A of white-noise time series by selecting m time series in L samples from the (N choose m) possible selections. The corresponding eigenvalues are obtained numerically below and compared to the solution of the polynomial equation. This corresponds to eigenvalues Λ_i = 1 for i = 1, ..., N in Equation (11), which results in a quadratic equation that can be solved analytically to obtain

S(y) = [(y₊ − y)(y − y₋)]^{1/2}/(2πNy),  y_± = T(1 ± √κ)².  (13)

Here, κ = N/T, and the result holds in the limit N, T → ∞ with fixed κ. Hence, in order to compare with the Marčenko-Pastur distribution and the numerics, one needs to rescale the variables as x⁺ → x′T and ρ(x⁺) → ρ(x′)/T in Equation (11).

We compare numerical NSRSE eigenvalue densities with the analytical result given by Equation (13) in Figure 1. For the Monte-Carlo simulations, we start with a singular data matrix of dimension 1000 ×
100 (κ = 10). One can normalize these 1000 time series in two ways: (1) by rescaling each time series by its respective mean and standard deviation (microcanonical normalization) and (2) by rescaling all the time series by their average mean and average standard deviation (canonical normalization). Then, by randomly selecting rows of this data matrix as explained above, we construct a 5000 member ensemble of 90 × 100 (a = 0.9) data matrices (κ = 0.9). We thus obtain L = 5000 members of the NSRSE and diagonalize them to obtain the eigenvalues. In Figure 1(a), we show the numerical histogram of the 1000 eigenvalues of the correlation matrix corresponding to the initial 1000 × 100 data matrix obtained using microcanonical normalization, and similarly for canonical normalization in Figure 1(b). The spectral bounds are in agreement with the solid curve obtained using Equation (13). However, as we have a single copy of the correlation matrix, there are large fluctuations in the numerics. We do not find any significant differences between microcanonical and canonical normalizations for the NSRSE. Then we apply the ensemble technique; the eigenvalue histograms for microcanonical and canonical normalizations are shown in Figures 1(c) and 1(d), respectively. The agreement with the solid curves obtained using Equation (13) is excellent. Again, we do not observe any significant differences between the microcanonical and canonical normalizations for the NSRSE using the ensemble technique.
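The solid curves follow from Equation (13); in the rescaled variables this is the standard Marčenko-Pastur density, which can be evaluated and sanity-checked directly (a sketch; the grid is our choice, and κ mirrors the ratio m/T = 0.9 of the ensemble members):

```python
import numpy as np

def mp_density(y, kappa):
    """Marcenko-Pastur density for eigenvalues of C = AA^t/T, ratio kappa < 1."""
    y_minus = (1.0 - np.sqrt(kappa)) ** 2
    y_plus = (1.0 + np.sqrt(kappa)) ** 2
    out = np.zeros_like(y)
    inside = (y > y_minus) & (y < y_plus)
    out[inside] = np.sqrt((y_plus - y[inside]) * (y[inside] - y_minus)) / (
        2.0 * np.pi * kappa * y[inside])
    return out

kappa = 0.9                      # kappa = m/T for the 90 x 100 ensemble members
y = np.linspace(1e-6, 5.0, 200001)
p = mp_density(y, kappa)

# The density should integrate to one (trapezoidal rule on the grid).
norm = float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(y)))
print(round(norm, 3))            # 1.0
```

The same function, evaluated on the histogram bin centers, gives the solid reference curve against which the ensemble eigenvalues can be plotted.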
B. Correlated Non-Singular Random Selection Ensemble With Constant Linear Correlations
Going further, we consider the correlated NSRSE Ā = χ^{1/2}A with constant linear correlations defined by χ_jk = δ_jk + υ(1 − δ_jk), υ being the correlation coefficient. The numerical histograms obtained for the correlated NSRSE with constant linear correlations defined by υ = 0.1, 0.5 and 0.9 are shown in Figure 2, for L = 5000, N = 1000, κ = 10 and a = 0.9. The solid histograms correspond to microcanonical normalization and the empty histograms to canonical normalization. The solid curves are obtained by numerically solving Equation (11) with Λ_i = 1 − υ for i = 1, ..., N − 1 and Λ_N = Nυ + 1 − υ (a third order polynomial equation). Insets in each of these pictures show the distribution of the outlier Λ_N. The agreement of the polynomial equation solution in the bulk of the spectrum is excellent, except for small deviations in the tails with increasing correlation coefficient υ. Notice the increasing separation between the bulk and the outlier, along with the shrinking of the spectral bounds of the bulk distribution, with increasing υ. The histograms for microcanonical and canonical normalizations are similar for the bulk distribution, while differences in the outliers become noticeable with increasing υ.

The shape of the farthest peak (outlier) is Gaussian for the numerical histograms, whereas it resembles a semicircle for the respective solutions of the polynomial equation. The saddle point approximation must be good where many peaks overlap. It must be worse for individual peaks (outliers). But as seen from Figure 2, the saddle point approximation
FIG. 1. Density of non-zero eigenvalues of a singular correlation matrix C obtained from a data matrix A of dimension 1000 × 100, κ = 10, with (a) micro-canonical and (c) canonical normalization. Ensemble averaged eigenvalue density for a 5000 member NSRSE of correlation matrices constructed using 0.9T × T (κ = 0.9) dimensional A matrices with (b) micro-canonical and (d) canonical normalization. Numerical results are histograms and solid curves are obtained from Equation (13).

works remarkably well even for the outliers: it reproduces their position and gives a width not too far from reality. However, it cannot reproduce the shape of the peaks. The exact problem is highly complex and thus one cannot expect to get all the features from a simple polynomial equation.
FIG. 2. Ensemble averaged eigenvalue density for a 5000 member ensemble of 90 ×
90 dimensional correlated NSRSE matrices with constant linear correlations defined by (a) υ = 0.1, (b) υ = 0.5 and (c) υ = 0.9. Here, κ = 10. Numerical results are histograms and solid curves are obtained from Equation (11). Insets give the distribution of the outlier.

C. Block Non-Singular Random Selection Ensemble

FIG. 3. Ensemble averaged non-singular block correlation matrices constructed using 90 × 100 dimensional Ā with constant block correlations: (a) υ₁ = υ₂ = 0.
1; (b) υ₁ = 0.1, υ₂ = 0.5; (c) υ₁ = 0.5, υ₂ = 0.1; (d) υ₁ = υ₂ = 0.5.

As is usual in financial market analysis, one deals with block matrices where each block represents a sector. For instance, energy, utilities and technology are a few sectors in stocks. Inspired by this, we consider a simple 2 × 2 block structure Ā^t = (Ā₁^t Ā₂^t), with Ā₁ and Ā₂ representing the

FIG. 4. Ensemble averaged eigenvalue density for a 5000 member block NSRSE of correlation matrices constructed using 0.9T × T (κ = 0.
9) dimensional Ā matrices with constant block correlations defined by (a) υ₁ = υ₂ = 0.1; (b) υ₁ = 0.1, υ₂ = 0.5; (c) υ₁ = 0.5, υ₂ = 0.1; (d) υ₁ = υ₂ = 0.5.

data matrix in each sector, with respective dimensions N₁ × T and N₂ × T; N = N₁ + N₂. In each sector, we consider constant linear correlations with correlation coefficients υ₁ and υ₂. For the numerics, we construct an L = 5000 member block NSRSE with N = 1000, κ = 10, a = 0.9, and the sector sizes N₁ and N₂ chosen as fixed fractions of N. To generate the ensemble, for each member, random selections of time series from the given Ā matrix are made according to the weights p₁ = N₁m/N and p₂ = N₂m/N; these are the numbers of time series randomly chosen from each sector. Figure 3 shows the structure of the ensemble averaged correlation matrices constructed using canonical normalization with (a) υ₁ = υ₂ = 0.
1; (b) υ₁ = 0.1, υ₂ = 0.5; (c) υ₁ = 0.5, υ₂ = 0.1; (d) υ₁ = υ₂ = 0.
5. The block structure remains intact under the weighted random selections. This is obvious, as the χ matrix is invariant under permutations.

In Figure 4, we compare the eigenvalue histograms (solid ones corresponding to microcanonical normalization and empty ones corresponding to canonical normalization) of the block NSRSE for (a) υ₁ = υ₂ = 0.1; (b) υ₁ = 0.1, υ₂ = 0.5; (c) υ₁ = 0.5, υ₂ = 0.1; (d) υ₁ = υ₂ = 0.5. The solid curves are obtained by numerically solving Equation (11) with Λ_i = 1 − υ₁ for i = 1, ..., N₁ − 1, Λ_{N₁} = N₁υ₁ + 1 − υ₁, and Λ_i = 1 − υ₂ for i = N₁ + 1, ..., N −
1, Λ_N = N₂υ₂ + 1 − υ₂ (a fifth order polynomial equation). We find good agreement for the bulk distributions, with deviations in the tails for larger correlation coefficients υ₁ and υ₂. Insets show the distributions of the two outliers (Λ_{N₁} and Λ_N). The distribution can be single peaked, show overlapping peaks, or be double peaked, as the positions depend on the correlation coefficients υ₁ and υ₂. The choice of normalization generates differences in the distribution of the outliers. The saddle point approximation gives the approximate positions and widths of the peaks but not their shape. In conclusion, the repetition of time series in the construction of the NSRSE strongly affects the outliers. In the saddle point approximation, the bulk of the spectrum gives a contribution of order N, while an outlier far away from the bulk gives only an order 1 contribution; the latter is thus a very small perturbative term.

V. CONCLUSIONS AND OUTLOOK
We have presented an entirely new way to treat large numbers of time series pertaining to the same system and therefore likely to display some correlation. Basically, the proposition consists in dividing the entire set of time series in different ways, thus obtaining the Non-Singular Random Selection Ensemble from the data. This allows one to obtain a spectral distribution for an ensemble of correlation or covariance matrices and also to get distributions of particular eigenvalues, say the largest or the smallest one. The same will hold for eigenfunctions. This opens a new avenue for investigations of systems that are not stationary on longer time scales but quasi-stationary on a short time scale, as defined by the length of the epochs we choose. We can then study the temporal evolution of such an ensemble. This in turn may give hints to instabilities emerging in the system, which might be sufficiently strong to be used to give an early warning.

The next step will be to show how such an ensemble behaves at or near a critical transition. At this point we are studying this in financial markets and in two dynamical systems, namely the TASEP [6] and the 2-D Ising model near criticality [16]. The range of potential applications is very wide, and in the present paper we have performed the first tests using correlated random matrices as a model where analytic results are available. The case we generically discuss is a set of time series which are strongly correlated within each of two subsets, leading to a block structure in the correlation matrix. This is a toy model for financial markets with their traditional division into market sectors. Preliminary results on financial markets can be viewed in a master's thesis [33], and further work in this direction is in progress.
ACKNOWLEDGEMENTS
We thank F. Leyvraz and M. Kieburg for useful discussions. The authors acknowledge financial support from UNAM/DGAPA/PAPIIT research grant IA104617 and CONACyT FRONTERAS 201.

[1] S. Katz, J. L. Lebowitz, and H. Spohn, J. Stat. Phys., 497 (1984).
[2] B. Derrida, J. Stat. Mech., P07023 (2007).
[3] T. Prosen, Phys. Rev. Lett., 217206 (2011).
[4] B. Li, G. Casati, J. Wang, and T. Prosen, Phys. Rev. Lett., 254301 (2004).
[5] T. Stegmann and N. Szpak, New J. Phys., 053016 (2016).
[6] S. Biswas, F. Leyvraz, P. M. Castillero, and T. H. Seligman, Nat. Sci. Rep., 40506 (2017).
[7] E. Broughton, Environmental Health: A Global Access Science Source (2005), doi:10.1186/1476-069X-4-6.
[8] T. A. Schmitt, D. Chetalova, R. Schäfer, and T. Guhr, Europhys. Lett., 58003 (2013).
[9] T. A. Schmitt, D. Chetalova, R. Schäfer, and T. Guhr, Europhys. Lett., 38004 (2014).
[10] T. A. Schmitt, R. Schäfer, and T. Guhr, J. Credit Risk, 73 (2015).
[11] M. C. Münnix, T. Shimada, R. Schäfer, F. Leyvraz, T. H. Seligman, T. Guhr, and H. E. Stanley, Nat. Sci. Rep., 644 (2012).
[12] M. Fattore and R. Bruggemann (eds.), Partial Order Concepts in Applied Sciences (Springer, Heidelberg, 2016).
[13] Vinayak, T. Prosen, B. Buča, and T. H. Seligman, Europhys. Lett., 200006 (2014).
[14] T. Guhr and B. Kaelber, J. Phys. A, 3009 (2002).
[15] R. Schäfer and T. Guhr, Physica A, 3856 (2010).
[16] Vinayak, R. Schäfer, and T. H. Seligman, Phys. Rev. E, 032115 (2013).
[17] J. Wishart, Biometrika, 32 (1928).
[18] C. Recher, M. Kieburg, and T. Guhr, Phys. Rev. Lett., 244101 (2010).
[19] C. Recher, M. Kieburg, T. Guhr, and M. R. Zirnbauer, J. Stat. Phys., 981 (2012).
[20] M. L. Mehta, Random Matrices (Elsevier, Amsterdam, 2004).
[21] Vinayak and A. Pandey, Phys. Rev. E, 036202 (2010).
[22] F. A. Berezin, Introduction to Superanalysis (D. Reidel Publishing Company, Dordrecht, 1987).
[23] K. Efetov, Supersymmetry in Disorder and Chaos (Cambridge University Press, Cambridge, 1997).
[24] T. Guhr, J. Phys. A, 13191 (2006).
[25] M. Kieburg, J. Grönqvist, and T. Guhr, J. Phys. A, 275205 (2009).
[26] P. Littelmann, H.-J. Sommers, and M. R. Zirnbauer, Commun. Math. Phys., 343 (2008).
[27] M. Kieburg, H.-J. Sommers, and T. Guhr, J. Phys. A, 275206 (2009).
[28] V. A. Marčenko and L. A. Pastur, Math. USSR Sb. 1, 457 (1967).
[29] J. Silverstein and S. Choi, J. Multivariate Anal., 295 (1995).
[30] Z. D. Bai and J. W. Silverstein, Ann. Prob., 316 (1998).
[31] T. Wirtz, M. Kieburg, and T. Guhr, J. Phys. A, 235203 (2017).
[32] L. A. Pastur, Theoret. and Math. Phys., 67 (1972).
[33] J. Morales, Técnicas nuevas en el análisis de mercados de valores, Master's thesis, UNAM (2016).