COINTEGRATION IN LARGE VARS
ANNA BYKHOVSKAYA AND VADIM GORIN
Abstract.
The paper analyses cointegration in vector autoregressive processes (VARs) for the cases when both the number of coordinates, N, and the number of time periods, T, are large and of the same order. We propose a way to examine a VAR for the presence of cointegration based on a modification of the Johansen likelihood ratio test. The advantage of our procedure over the original Johansen test and its finite sample corrections is that our test does not suffer from over-rejection. This is achieved through novel asymptotic theorems for eigenvalues of matrices in the test statistic in the regime of proportionally growing N and T. Our theoretical findings are supported by Monte Carlo simulations and an empirical illustration. Moreover, we find a surprising connection with multivariate analysis of variance (MANOVA) and explain why it emerges.

Date: June 26, 2020. The authors are grateful to Alexei Borodin, Bruce Hansen, and Grigori Olshanski for valuable comments and suggestions. V.G. acknowledges support from the NSF Grants DMS-1664619 and DMS-1949820.
1. Introduction

1.1. Motivation.
The importance of cointegration in economics stems from the seminal papers Granger (1981) and Engle and Granger (1987). For example, as they show, monthly rates on 1-month and 20-year treasury bonds are cointegrated, which means that they are both non-stationary, but they have a stationary linear combination. A lot of other variables in macroeconomics and finance such as price level, consumption, output, trade flows, interest rates and so on are non-stationary, and, thus, are potentially subject to cointegration. When dealing with non-stationary time series, it is always a question whether one should work with levels or with differences. For multivariate settings such as vector autoregression (VAR), the choice of model would depend on whether the series is cointegrated or not.

There are several ways to test for the presence of cointegration (see, e.g., Maddala and Kim (1998) for a detailed description of various methods). One popular approach relies on checking whether the residuals from regressing one of the coordinates on the remaining ones are stationary. It is based on Engle and Granger (1987) and was later extended in Phillips and Ouliaris (1990). Another widely used technique is due to Søren Johansen (Johansen, 1988, 1991). This approach assumes a VAR structure and relies on the likelihood ratio. It tests a null hypothesis of at most r cointegrating relationships versus an alternative of at most r′ > r cointegrating relationships. The Johansen test turns out to be related to the eigenvalues of some random matrix and has a non-standard asymptotic distribution.

Neither of the approaches is commonly used in the analysis of large systems. However, in many situations the data turns out to have both cross-sectional and time dimensions being large. Natural examples are given, e.g., by financial data (stock prices, exchange rates, etc.) or by monthly data on trade and investments between countries (where the number of pairs of countries is large).
The main difficulty with applying the above cointegration testing approaches to high-dimensional settings is that both of them require N, the cross-sectional dimension of a time series, to be fixed and small. For the former approach, the larger N is, the more regressions we need to run, and interpreting their results becomes ambiguous. For the latter approach, it turns out, the asymptotic theory stops providing a good approximation for moderate values of T. The test starts to over-reject the null of fewer cointegrations in favor of the alternative (see, for example, Ho and Sørensen (1996), Gonzalo and Pitarakis (1999)). For instance, the simulations reported in Table 1 of Gonzalo and Pitarakis (1999) indicate that the empirical rejection rate based on the 95% asymptotic critical value for the no-cointegration hypothesis (constructed using the asymptotics of Johansen (1988, 1991)) is 20% at N = 5, T = 30. It eventually decreases to the desired 5% as T grows, but even at T = 400 the empirical size is still 6%.

A related approach was also proposed in Stock and Watson (1988).
After size distortions became clear, econometricians developed various procedures for correcting over-rejection. Popular methods include finite sample correction (e.g., Reinsel and Ahn (1992), Johansen (2002)) and bootstrap (e.g., Swensen (2006), Cavaliere et al. (2012)). Such modifications help to restore correct size for moderate values of T (e.g., T = 50), when N is of a smaller order (e.g., N = 5). Yet, the question of larger N remained open.

A recent ground-breaking paper Onatski and Wang (2018) shows that the over-rejection can be mathematically explained by considering an alternative asymptotics when both N and T go to infinity jointly and proportionally. When N is large, such an asymptotic regime turns out to better suit the finite sample properties of the data. While Onatski and Wang (2018) point this out, they do not provide alternative testing procedures.

The observation that large VARs might have analyzable joint limits opens a new area of research. That is, one can try to develop various sophisticated joint asymptotics and derive, as a result, appropriate ways to test for cointegration in settings where both N and T are large. This, however, requires new tools rooted in the random matrix theory. In our paper we propose such cointegration tests and introduce asymptotic theorems, which make testing possible. Let us describe our main results.

By slightly modifying the Johansen likelihood ratio (LR) test, we come up with a way to examine the presence of cointegration in a time series when the cross-sectional dimension, N, and the time dimension, T, are of the same order. We consider a vector autoregression (VAR) of order 1 in the error correction representation

(1)    ΔX_t = μ + ΠX_{t−1} + ε_t,    t = 1, . . . , T,

where errors {ε_t} are Gaussian with N × N covariance matrix Λ, and Π, μ are unknown parameters. We do not restrict Λ to be diagonal, which means that any cross-sectional correlation structure is allowed.
Such a heteroscedasticity assumption may be important in applications, where one expects variables, e.g., countries, to be correlated.

We are interested in whether the N-dimensional non-stationary process X_t is cointegrated or not. That is, whether there is a non-zero vector β such that β′X_t is trend stationary. Our main contributions lie in the construction of the appropriate test, analysis of its asymptotic distribution, and computation of critical values (some of them are reported in Table 1 of Section 3).

Our procedure for cointegration testing relies on the following steps. First, we de-trend X_t by subtracting (t/T)(X_T − X_0) from X_t. Then we follow Johansen's procedure and regress both first differences, ΔX_t, and lagged de-trended X_t on a constant. That is, we de-mean the data. Then we calculate the squared sample canonical correlations between the residuals from those regressions. This corresponds to the eigenvalues from the modification of the Johansen LR statistic obtained by using the lagged de-trended version of X_t and non-de-trended first differences ΔX_t.

The de-trending procedure can be interpreted in the following way: we take the first differences of the observed variable, de-mean them, and re-sum back. This is equivalent (up to a constant X_0, which disappears after we further de-mean the sums) to our de-trending. Such an interpretation is related to Elliott et al. (1996), which analyzes unit root testing in the presence of a trend d_t. Under the null hypothesis of a unit root and when the trend d_t in Elliott et al. (1996) is linear, the paper suggests de-meaning the first differences.

The de-trending step is crucial to our procedure. First, as discussed in Xiao and Phillips (1999), de-trending plays an important role in improving the performance of cointegration tests.

An accompanying paper, Bykhovskaya and Gorin (2020), investigates possible generalizations of our approach to VAR(k) for general k.
Second, in our case de-trending leads to an unexpected connection with the Jacobi ensemble, a familiar object in random matrix theory, but a novel one in econometrics. Eventually, this connection is the main technical tool leading to our asymptotic results and construction of the test.

We show that after proper rescaling our test under the null hypothesis of no cointegration converges to the sum of the first r elements of the Airy point process (we formally define that process in Section 3.1). The Airy point process is a known object in random matrix theory, and its marginal distributions can be computed in various ways. We also present in Section 5 a simulation study of the speed of convergence, which supports the limiting results even for moderate values of N and T. In Table 2 of that section we demonstrate the significant improvement over the finite sample behavior of the Johansen test and its corrections reported in Gonzalo and Pitarakis (1999).

Along with the size we also report power simulations in Section 5. The power depends on the choice of the alternative, and we perform several experiments with a random rank 1 matrix Π and a random initial condition X_0 of varying magnitudes. In many cases the power is very close to 100%. In particular, we found high power even for moderate N, T.

As an empirical illustration, in Section 6 we apply our test to log-prices of S&P100 stocks, for which T/N ≈ 5, corresponding to high power and close to 5% sample rejection rate in simulations. Log-prices are known to have a unit root, and we find strong evidence that they are cointegrated.

We remark that de-trending is implicitly present in the proofs of Onatski and Wang (2018, 2019), yet in our work it becomes a central ingredient, rather than a technical detail.
Let us briefly indicate the technical aspects of our proofs and their mathematical novelty. The key observation that we make is that a small perturbation of the matrix arising in the modified Johansen test has an explicit distribution of its eigenvalues. This distribution is called the Jacobi ensemble, and its usual appearances in statistics include sample canonical correlations for two sets of independent data (as opposed to highly dependent in our case, see the discussion in Section 4.1) and multivariate analysis of variance (MANOVA), see, e.g., Muirhead (2009). Our method has two main components, which are new, as far as we are aware. First, our perturbation of the model is based on a replacement of a certain permutation in the matrix formulation of the modified Johansen test by a uniformly random orthogonal matrix. The second ingredient is a challenging computation of matrix integrals leading to the identity of the law of the perturbed matrix with the Jacobi ensemble.

Put otherwise, we discover an exactly-solvable or integrable model in a small neighborhood of the (random) matrix of the Johansen cointegration test. Let us center our attention on this integrable model. It can be used as an initial point for perturbative arguments leading to asymptotic theorems for our and possibly other modifications of the Johansen test. In addition, our exactly-solvable model is not isolated, but rather it is a representative of a whole class of similar cases. We expect that our approach works in several other situations in which various test statistics in the VAR(k) context can be understood through (other) instances of the Jacobi ensemble. Justifications of this point of view are contained in Sections 7.2, 9.5, and in Bykhovskaya and Gorin (2020).

Section 2 describes the setting and the main objects of interest. Section 3 presents asymptotic results, while Section 4 gives a sketch of their proofs.
Section 5 shows supporting Monte Carlo simulations, and Section 6 applies our test to the S&P100. Additional results and extensions are presented in Section 7. Finally, Section 8 concludes. All proofs, unless otherwise noted, are in the Appendix.
2. Setting

2.1. Building block.
We consider an N-dimensional vector autoregressive process of order k, VAR(k), based on a sequence of i.i.d. mean zero Gaussian errors {ε_t} with non-degenerate covariance matrix Λ. That is,

(2)    ΔX_t = Σ_{i=1}^{k−1} Γ_i ΔX_{t−i} + ΠX_{t−k} + μ + ε_t,    t = 1, . . . , T,

where Γ_1, . . . , Γ_{k−1}, Π, and μ are unknown parameters. We do not impose any restrictions on Λ, thus, we allow for arbitrary correlations across coordinates of X_t. The process is initialized at fixed X_{1−k}, . . . , X_0.

Our arguments have some similarities with proofs in Hua (1963).
We are interested in analyzing whether X_t is cointegrated. That is, whether there exists a non-zero N × r matrix β such that β′X_t is (trend) stationary. If there exists an N × r matrix β of rank r such that β′X_t is (trend) stationary, but there does not exist an N × (r + 1) matrix β̃ of rank r + 1 such that β̃′X_t is (trend) stationary, then we say that X_t is cointegrated of order r. As shown in Engle and Granger (1987, Granger representation theorem), X_t is cointegrated of order r if and only if rank(Π) = r and, thus, there exist two N × r matrices α and β of rank r such that Π = αβ′.

Our testing procedure is a modification of the Johansen likelihood ratio (LR) test (Johansen, 1988, 1991). In this paper we focus on the VAR(1) setting:

(3)    ΔX_t = ΠX_{t−1} + μ + ε_t,    t = 1, . . . , T.

We discuss extensions to VAR(k), k > 1, in Section 7.2. We test the null hypothesis of no cointegration versus at most r cointegrations, where r is a fixed finite number. That is, using the Granger representation theorem (Engle and Granger, 1987), we can rewrite our hypotheses as

(4)    H_0 : Π = 0    vs.    H_1 : rank(Π) ≤ r.

Our approach consists of several steps:
Step 1.
De-trend the data and define

(5)    X̃_t = X_{t−1} − ((t − 1)/T)(X_T − X_0).

Note that we also do a time shift. This shift is in line with the Johansen test, where lags and first differences are regressed on the observables.
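As noted in the introduction, this de-trending is equivalent, up to the additive constant X_0, to de-meaning the first differences and re-summing them. A small numerical check of this equivalence (N = 1 for simplicity; all variable names here are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
X = np.cumsum(rng.standard_normal(T + 1))   # levels X_0, ..., X_T

t = np.arange(1, T + 1)
# Step 1 as in (5): de-trended (and time-shifted) lags
Xtilde = X[:-1] - (t - 1) / T * (X[-1] - X[0])

# Alternative route: de-mean the first differences and sum them back
dX = np.diff(X)                              # Delta X_1, ..., Delta X_T
partial_sums = np.concatenate(([0.0], np.cumsum(dX - dX.mean())))[:-1]

# The two routes agree up to the constant X_0
assert np.allclose(Xtilde, partial_sums + X[0])
```

The constant X_0 disappears after the further de-meaning of Step 2, which is why the two constructions lead to the same residuals.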
Step 2.
De-mean the data. That is, regress de-trended lags, X̃_t, and first differences, ΔX_t, on a constant. Define the residuals from those regressions as

(6)    R_{1t} = X̃_t − (1/T) Σ_{τ=1}^T X̃_τ,    R_{0t} = ΔX_t − (1/T) Σ_{τ=1}^T ΔX_τ.

Step 3.
Calculate the squared sample canonical correlations between R_0 and R_1. That is, define

(7)    S_{ij} = Σ_{t=1}^T R_{it} R*_{jt},    i, j = 0, 1,

where here and below the notation X* means the transpose of the matrix X (transpose-conjugate whenever complex matrices appear). Then calculate the N eigenvalues λ_1 ≥ . . . ≥ λ_N of the matrix S_{10} S_{00}^{−1} S_{01} S_{11}^{−1}. The eigenvalues solve the equation

(8)    det( S_{10} S_{00}^{−1} S_{01} − λ S_{11} ) = 0.

We discuss the importance of de-trending in Section 3.2.

Step 4.
Form the test statistic

(9)    LR_{N,T}(r) = Σ_{i=1}^r ln(1 − λ_i).

The subscript N, T in (9) indicates that we modify the Johansen LR test in accordance with the large N, T asymptotics. This statistic, after centering and rescaling, will be compared with appropriate critical values to decide whether one can reject H_0 or not (see Theorem 2).

Throughout the proofs and extensions, we will be using an alternative way to write the residuals and matrices S_ij. For this let us define the de-meaning operator P. It is a linear operator in a T-dimensional space defined by its matrix

(10)    P = I_T − (1/T) 𝟙𝟙*,    𝟙 = (1, 1, . . . , 1)*,

so that the diagonal entries of P equal 1 − 1/T and the off-diagonal entries equal −1/T. P is an orthogonal projector on the space orthogonal to the vector (1, 1, . . . , 1). In this notation,

R_{1,it} = [X̃P]_{it} = X̃_{it} − (1/T) Σ_{τ=1}^T X̃_{iτ},    R_{0,it} = [ΔX P]_{it} = ΔX_{it} − (1/T) Σ_{τ=1}^T ΔX_{iτ}.

Using the fact that P² = P,

(11)    S_{00} = ΔX P ΔX*,    S_{01} = ΔX P X̃*,    S_{10} = S_{01}* = X̃ P ΔX*,    S_{11} = X̃ P X̃*.

Let us emphasize that our test differs from the original Johansen test in the fact that we use X̃_t instead of X_{t−1}. Note that X̃_t can be viewed as a rank 1 perturbation of X_{t−1}. Hence, our test statistic is a finite rank perturbation of the original Johansen procedure.
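Steps 1 through 4 can be sketched in code. The following is a minimal illustration; the function name and the data layout (an N × (T + 1) array whose columns are the levels X_0, . . . , X_T) are our own conventions, not the paper's:

```python
import numpy as np

def lr_statistic(X, r=1):
    """Steps 1-4: de-trend, de-mean, squared sample canonical correlations,
    and the statistic LR_{N,T}(r) = sum_{i<=r} ln(1 - lambda_i) of (9).

    X : (N, T+1) array whose columns are the levels X_0, ..., X_T.
    """
    N, cols = X.shape
    T = cols - 1
    dX = np.diff(X, axis=1)                       # Delta X_t, t = 1, ..., T
    t = np.arange(1, T + 1)
    # Step 1: de-trended lags, Eq. (5)
    Xtilde = X[:, :-1] - np.outer(X[:, -1] - X[:, 0], (t - 1) / T)
    # Step 2: de-mean across time (residuals of a regression on a constant)
    R0 = dX - dX.mean(axis=1, keepdims=True)
    R1 = Xtilde - Xtilde.mean(axis=1, keepdims=True)
    # Step 3: the matrices S_ij of (7) and the eigenvalues solving (8)
    S00, S01, S11 = R0 @ R0.T, R0 @ R1.T, R1 @ R1.T
    A = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    lam = np.sort(np.linalg.eigvals(A).real)[::-1]
    # Step 4: the statistic (9)
    return np.sum(np.log(1 - lam[:r])), lam
```

For a pure random walk (the null H_0) the eigenvalues λ_i stay bounded away from 1, so the statistic is finite and negative; under cointegration the largest λ_i approach 1 and the statistic diverges.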
3. Asymptotic results
In this section we formulate our main asymptotic results and discuss the importance of de-trending (Step 1 in Section 2.2). We provide a sketch of the proof of our asymptotic theorem in Section 4.
3.1. Large (N, T) limit of the test. Our results use the Airy point process. Thus, let us introduce it before we formulate the main theorems. The Airy point process is a random infinite sequence of reals

a_1 > a_2 > a_3 > . . . ,

which can be defined through the following proposition.
Proposition 1 (Forrester (1993), Tracy and Widom (1996)). Let Y be an N × N matrix of i.i.d. N(0, 1) Gaussian random variables and let μ_{1;N} ≥ μ_{2;N} ≥ . . . ≥ μ_{N;N} be the eigenvalues of (Y + Y*)/√2. Then, in the sense of convergence of finite-dimensional distributions,

(12)    lim_{N→∞} { N^{1/6} ( μ_{i;N} − 2√N ) }_{i=1}^∞ = { a_i }_{i=1}^∞.

The law of the first coordinate a_1 is known as the Tracy-Widom F_1 distribution; its distribution function can be written in terms of a solution of the Painlevé II differential equation.

We remark that from the computational point of view (12) gives an efficient way to access the distribution of various functions of {a_i}_{i=1}^∞. From the theoretical perspective, one would like to have a more structural definition, which can be used for the analysis. Such definitions exist, yet, unfortunately, none of them is particularly simple.
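Proposition 1 also gives a practical simulation recipe: sample a large Gaussian matrix, symmetrize, and rescale the top eigenvalues. A minimal sketch (the function name and the GOE normalization (Y + Y*)/√2 follow our reading of the proposition):

```python
import numpy as np

def airy_points(N, k=3, rng=None):
    """Approximate the first k points a_1 > ... > a_k of the Airy point
    process via Proposition 1: rescaled top eigenvalues of (Y + Y*)/sqrt(2),
    where Y is an N x N matrix of i.i.d. N(0,1) entries."""
    rng = np.random.default_rng(rng)
    Y = rng.standard_normal((N, N))
    mu = np.linalg.eigvalsh((Y + Y.T) / np.sqrt(2))[::-1]   # descending order
    return N ** (1 / 6) * (mu[:k] - 2 * np.sqrt(N))
```

Repeating such draws many times (Figure 1 uses 100,000 samples with N = 200) produces histograms of Σ_{i≤r} a_i; the marginal law of a_1 approximates the Tracy-Widom F_1 distribution.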
Suppose that
T, N → ∞ in such a way that
T > N and the ratio T /N remains bounded. Suppose that H holds, that is, we have (3) with Π = 0 . Then for eachfinite r = 1 , , . . . , we have convergence in distribution for the largest eigenvalues defined inEq. (8) : (13) (cid:80) ri =1 ln(1 − λ i ) − r · c ( N, T ) N − / c ( N, T ) d −−−−−→ T,N →∞ r (cid:88) i =1 a i , where (14) c ( N, T ) = ln (1 − λ + ) , c ( N, T ) = − / λ / (1 − λ + ) / ( λ + − λ − ) / ( p + q ) − / < , (15) p = 2 − N , q = TN − − N , λ ± = 1( p + q ) (cid:104)(cid:112) p ( p + q − ± √ q (cid:105) . Figure 1 shows densities for the random variables (cid:80) ri =1 a i for r = 1 , ,
3. As r grows, theskewness of (cid:80) ri =1 a i decreases, the expectations of (cid:80) ri =1 a i tend to −∞ and the variancestend to + ∞ .Quantiles of the distribution of r (cid:80) i =1 a i serve as critical values for testing the hypothesis H of no cointegrations against the alternative of at most r cointegrations. We report thequantiles for r = 1 in Table 1.Note the shifts by N in the definition (15) of p and q . Theorem 2 remains valid withoutthese shifts, i.e., for the simplified p = 2, q = TN −
1. However, we found that for
T /N < An even faster way uses tridiagonal matrix models (Dumitriu and Edelman, 2002). There are several equivalent ways to define Airy point process: through Pfaffian formulas for the correla-tion functions (Forrester, 1993; Tracy and Widom, 1996), through combinatorial formulas for the Laplacetransform (Sodin, 2014), through eigenvalues of the Stochastic Airy Operator (Ramirez et al., 2011). OINTEGRATION IN LARGE VARS 9 -25 -20 -15 -10 -5 0 5 1000.050.10.150.20.250.30.35 r=1r=2r=3
Figure 1.
The probability density for the random variables Σ_{i=1}^r a_i. The histogram plots are obtained from 100,000 samples of 200 × 200 Gaussian matrices following Proposition 1.

r \ α    90%     95%     97.5%    99%
1        0.45    0.98    1.45     2.02
Table 1.
Quantiles of a_1 from Bejan (2005, Table 2).

the shifts improve the speed of convergence and, thus, the finite sample behavior of the test. We discuss the shifts in more detail in Sections 5 and 9.3.

We sketch the proof of Theorem 2 in Section 4 and give full details in the appendix. The two main ideas that we use are:

(I) We show that a small (vanishing as N, T → ∞) perturbation of the eigenvalues λ_1 ≥ λ_2 ≥ · · · ≥ λ_N leads to an explicit probability distribution known as the Jacobi ensemble (see Definition 3 below). While this distribution appears in many problems of random matrix theory and multivariate statistics, its connection to the law of the Johansen test statistic remained previously unknown.

(II) Further, we rely on the universality phenomenon from random matrix theory, which says that the particular choice ((Y + Y*)/√2) for the law of the random matrix made in Proposition 1 is not of central importance. Instead, the Airy point process is a universal scaling limit for the largest eigenvalues in various ensembles of symmetric random matrices of growing sizes, see, e.g., Erdős and Yau (2012) and Tao and Vu (2012). In particular, an asymptotic statement similar to Proposition 1 is known for the Jacobi ensemble (see Section 9.3) and, by combining it with the first idea, we eventually arrive at Theorem 2.

It is natural to ask about an extension of Theorem 2 to the case of r growing together with N. We distinguish two subcases here:

• Slow growth: r = ⌊N^θ⌋ for some 0 < θ < 1.
• Linear growth: r = ⌊ρN⌋ for some 0 < ρ ≤ 1.

We expect that in both subcases (with appropriately modified c_1 and c_2) the limit in (13) becomes Gaussian. Although we are not going to pursue this direction here, for the slow growth case the proof of asymptotic normality can probably be obtained by the same methods as we use in Theorem 2: we can again use the Jacobi ensemble for the computation. For the Jacobi ensemble individual eigenvalues λ_i (in the regime of growing i and N − i) become asymptotically Gaussian, hence, also their sums.
For the linear growth case the situation is more complicated. Our present tools only allow us to prove an asymptotic upper bound of the form

Σ_{i=1}^{⌊ρN⌋} ln(1 − λ_i) − N · c(T/N, ρ) = o(N^ε),    N → ∞,

for any ε > 0 (for ε = 1 this also follows from the results of Onatski and Wang (2018, 2019)), while we expect the expression to be O(1). Yet, there exist very general statements on asymptotic Gaussianity of linear statistics of functions of random matrices (see, e.g., Guionnet and Novak (2015), Mingo and Popa (2013)), and, therefore, there is little doubt in the fact that the asymptotic distribution is Gaussian. Note, however, that these methods usually give very limited information about the asymptotic variance, and it is unclear at this moment how to find a reasonably simple explicit formula for it.

O'Rourke (2010) proves asymptotic Gaussianity for eigenvalues of Gaussian random matrices, and we expect the same proof to work for the Jacobi ensemble case; see also Bao et al. (2013) for more details in the complex setting.

The value of the constant c can be computed through certain integrals involving the equilibrium measure of the Jacobi ensemble, see (69) and (72) for more details.

3.2. Importance of de-trending. De-trending is a necessary feature which allows our present construction of the cointegration test. Let us discuss its meaning and some implications.

First, the key property of the Johansen statistic, which guarantees its relevance to the identification of cointegration, is not affected by de-trending. This can already be seen in the N = 1 case, where the presence of cointegration means stationarity of the data. In this case the original Johansen statistic becomes the sample correlation coefficient of the T-dimensional vectors {ΔX_t}_{t=1}^T and {X_{t−1}}_{t=1}^T. Under the unit root assumption, Π = 0, one can see that the correlation coefficient decays to 0 as T → ∞. On the other hand, when Π ≠ 0 the correlation coefficient does not vanish. A generalization of this feature to N > 1 is the appearance of eigenvalues close to 1 in the matrix S_{10} S_{00}^{−1} S_{01} S_{11}^{−1} whenever cointegration is present, and, hence, explains why a statistic of the form Σ_{i=1}^r ln(1 − λ_i) can be used to identify such a scenario. De-trending does not change these observations and, thus, there are no a priori reasons why the de-trended statistic should be any worse than the original statistic of Johansen.

Second, from a more global perspective, the ultimate aim of the cointegration analysis is to figure out whether there exists a linear combination of cross-sectional components of X_t which does not grow with t. But if such a linear combination exists, then the same linear combination for the de-trended version of X_t also does not grow with t. In the opposite direction this becomes trickier. If the de-trended version of X_t is cointegrated, then we can only guarantee that there exists a linear combination of components of X_t which is a sum of a non-growing process and a linear trend. But if we are not really interested in linear trends, then cointegration for X_t and de-trended X_t become equivalent.

Finally, note also that an equivalent point of view is that we can deal with the modified VAR(1) model

ΔY_t = Π ( Y_{t−1} − ((t − 1)/T) Y_T ) + μ + ε_t,    t = 1, . . . , T,    Y_0 = 0,

and apply the original Johansen approach to this model, see the discussion in Onatski and Wang (2019).
4. Outline of the proofs
The proof of Theorem 2 rests on the notion of the Jacobi ensemble. Let us first define it and then provide a sketch of the proof of Theorem 2.
Definition 3. A (real) Jacobi ensemble J(N; p, q) is a distribution on N × N symmetric matrices M with density proportional to

(16)    det(M)^{p−1} det(I_N − M)^{q−1},    0 < M < I_N,

with respect to the Lebesgue measure, where p, q > 0 are two parameters and 0 < M < I_N means that both M and I_N − M are positive definite.

When N = 1, (16) is the Beta distribution. For general N, the eigenvectors of a random Jacobi-distributed M are uniformly distributed, while the N eigenvalues x_1 ≥ · · · ≥ x_N admit an explicit density with respect to the Lebesgue measure given by

(17)    (1 / Z(N; p, q)) Π_{1≤i<j≤N} |x_i − x_j| Π_{i=1}^N x_i^{p−1} (1 − x_i)^{q−1},

where Z(N; p, q) is a normalizing constant. One classical appearance of this ensemble is in the analysis of canonical correlations: for two independent Gaussian datasets of dimensions N and K observed over T periods, the squared sample canonical correlations r_1 > r_2 > · · · > r_N have the law of the eigenvalues of the Jacobi ensemble J(N; (K − N + 1)/2, (T − N − K + 1)/2).

At this point we observe both a similarity and a difference between the above instance of the Jacobi ensemble and the matrix appearing in the Johansen test. On one hand, the latter also deals with sample canonical correlations of two data sets. On the other hand, the data sets are no longer independent, instead one is obtained from another by a deterministic linear transformation. So are the roots of Eq. (8) related to the Jacobi ensemble? There is evidence in both directions. First, computer simulations quickly reveal that in the one-dimensional case the distribution of the single eigenvalue in the Johansen test is not the Beta distribution. Yet, second, recent results of Onatski and Wang (2018, 2019) show that the Law of Large Numbers for the empirical distribution of the eigenvalues appearing in the Johansen test (with roots of Eq. (8) being a particular case) matches the one for the Jacobi ensemble (with shifted dimension parameters) in the limit as N, T → ∞. Those articles were asking for an explanation.
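The classical independent-data case is easy to simulate. For N = K = 1 the formula above gives J(1; 1/2, (T − 1)/2), i.e., the squared sample correlation of two independent Gaussian T-vectors is Beta(1/2, (T − 1)/2), with mean 1/T. A quick sanity check (no de-meaning here, so the raw parameters apply; the setup and names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
T, draws = 20, 20000
x = rng.standard_normal((draws, T))
y = rng.standard_normal((draws, T))

# Squared sample correlation of independent T-vectors (without de-meaning)
r2 = (np.sum(x * y, axis=1) ** 2
      / (np.sum(x * x, axis=1) * np.sum(y * y, axis=1)))

# J(1; 1/2, (T-1)/2) = Beta(1/2, (T-1)/2) has mean (1/2)/(T/2) = 1/T
assert abs(r2.mean() - 1 / T) < 0.005
```

The analogous check for the Johansen-type matrix of Eq. (8) fails at N = 1, which is exactly the "difference" noted above: the two data sets entering the test are dependent.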
To prove Theorem 2, we first need to establish the following central statement: a small perturbation of the Johansen test matrix, obtained by replacing the deterministic summation matrix in its definition by a random analogue, exactly matches the Jacobi ensemble. We further show that the perturbation vanishes in the limit as N, T → ∞, thus allowing us to obtain the asymptotics of the variants of the Johansen test from the known asymptotic results for the Jacobi ensemble.

Theorem 4. Suppose that T, N → ∞ in such a way that T > N and the ratio T/N remains bounded. Under the hypothesis Π = 0 for (3), one can couple (i.e., define on the same probability space) the eigenvalues λ_1 ≥ λ_2 ≥ · · · ≥ λ_N of the matrix S_{10} S_{00}^{−1} S_{01} S_{11}^{−1} and the eigenvalues x_1 ≥ · · · ≥ x_N of the Jacobi ensemble J(N; N, (T − N)/2) in such a way that for each ε > 0 we have

lim_{T,N→∞} Prob ( max_{1≤i≤N} |λ_i − x_i| < N^{ε−2} ) = 1.

The proof of Theorem 4 is based on the following idea. Looking at Eq. (3) when Π = 0, one can notice that the matrices entering into the test and given by Eq. (11) can be expressed in terms of the matrix of data X and the lag operator mapping X_t → X_{t−1}. A computation shows that since we deal with the de-trended and de-meaned data, we can replace the lag operator with its cyclic version, which maps X_1 to X_T rather than X_0. The latter is an orthogonal operator whose eigenvalues are roots of unity of order T. Then, the idea is to replace this operator by a uniformly random orthogonal operator. From that we proceed in two steps:

(I) We show that when the lag operator is replaced by its random counterpart, the eigenvalues of S_{10} S_{00}^{−1} S_{01} S_{11}^{−1} have the distribution J(N; N, (T − N)/2). We remark that this is a new appearance of the Jacobi ensemble, which was not present in the literature before.

(II) We show that the replacement of the lag operator by its random counterpart introduces an error, which can be upper-bounded by N^{ε−2} for any ε > 0.
This part is based on the rigidity results from random matrix theory, which say that the eigenvalues of a uniformly random orthogonal matrix can be very closely approximated by equally spaced roots of unity.

The full proof of Theorem 4 is given in the Appendix. In addition to the real case, we also simultaneously prove a similar statement for complex matrices, encountering the Jacobi ensemble of Hermitian matrices. While the complex case is a rare guest in economics, it is important for applications in other areas, such as signal processing or high energy physics.

By combining Theorem 4 with known asymptotic results for the Jacobi ensemble we can obtain the asymptotics of the test statistic (9) in various regimes.

5. Monte Carlo simulations

In this section we illustrate the performance of our test via Monte Carlo simulations. We consider both size (rejection rate) and power.

We compare the finite sample performance of our approach versus Johansen's LR test and one of its corrected versions. The finite sample correction takes the form (T − N)/T and was suggested in Reinsel and Ahn (1992). Table 2 summarizes the results (numbers closer to 5 mean better performance). The LR_{N,T} column is our test and the last two columns are from Gonzalo and Pitarakis (1999). They correspond to the same null of no cointegration, but use an alternative hypothesis different from ours. Our alternative is exactly r cointegrating relations (in Table 2 we use r = 1), while the LR and RALR tests consider "at most N" cointegrations as H_1. We can see that in finite samples when both N and T are of a similar magnitude, our approach significantly outperforms the alternatives based on small N, large T asymptotics.

To illustrate the performance of our test as the sample size increases, we fix the ratio T/N ≡ c and plot the empirical size as a function of N. This is shown in Figure 2, where the target is 5% rejection rate. The picture suggests that the test has rejection rate close to 5%.
The green solid line corresponding to c = 4 achieves the 5% rejection rate very fast. The other three curves overshoot 5% by a couple of percentage points (e.g., both c = 5 and c = 6 stay below 7%). Moreover, the larger c is, the higher the corresponding rejection curve lies. E.g., the c = 10 curve (blue, short dash) is strictly above the other curves.

To improve the finite-sample behavior for large c, we suggest ignoring the finite sample corrections in p and q in Theorem 2, Eq. (15), when T/N is large. That is, to use the simplified formulas instead: p = 2, q = T/N − 1. We recalculate the empirical rejection rate under the simplified formulas for p, q in Figure 3. Under the simplified formulas, the larger c is, the smaller the over-rejection and the closer the curve is to the 5% line. As can be seen by comparing Figures 2 and 3, for small c the formula with the correction leads to better rejection rates, while for large c it is the opposite, and we do not gain from the finite sample correction. The conclusion is to use the simplified formula when c = T/N is at least 6.

N     LR_{N,T}    LR       RALR
5     6.60        20.75    3.59
6     5.45        31.66    2.68
7     4.52        47.44    1.98
8     3.80        67.42    2.00
9     3.16        85.00    1.32
10    2.60        96.69    0.96

Table 2. Empirical size under no cointegration hypothesis (5% nominal level). Data generating process: ΔX_it = ε_it, ε_it ∼ i.i.d. N(0, 1), T = 30, MC = 1,000,000 replications for LR_{N,T} and MC = 10,000 for LR and RALR.

Figure 2. Empirical size when T/N is fixed and N increases.

When one analyzes the power of a test, the question is which data generating process (dgp) to use under the alternative H_1. In our setting the space of alternative dgps has dimension growing with N. Thus, there is no clear choice of the proxy alternative hypothesis to use in simulations. Hence, instead we proceed with various random alternatives.
Figure 3. Empirical size when T/N is fixed and N increases. Test based on the simplified formulas p = 2, q = T/N − 1.

In the first experiment, the errors ε_it are generated as i.i.d. N(0, 1), and we randomly generate two N-dimensional unit vectors, u and v, such that the non-zero eigenvalue of uv* is negative. Then we set Π = uv*. By construction, Π has rank 1, singular value 1, and one eigenvalue between −1 and 0, so that X_t is non-stationary, while ∆X_t is stationary. (As N goes to infinity this non-zero eigenvalue is of order N^{−1/2}.) The average power constructed from such a random alternative is shown in Figure 4. As in the previous subsection, we fix the ratio T/N and plot the average power as a function of N. Following our recommendation, we use the simplified formulas for p and q when T/N ≥ 6. We can see that the average power quickly reaches 100% for all ratios T/N. The larger is that ratio, the faster we reach 100%. This is due to the fact that a higher ratio means a longer time span, which gives the process more chances to accumulate the effects of the presence of cointegration.

In the second experiment, we randomly generate a symmetric matrix Π of rank 1. We do this by generating a uniformly random N-dimensional unit vector v. Then we set Π = −λvv*, where λ goes from 0 to 2. The coefficient λ equals the non-zero eigenvalue of −Π. The fact that it lies between 0 and 2 guarantees that X_t is non-stationary, while ∆X_t is stationary. Figure 5 shows the power as a function of λ for N = 100 and various values of T.

Figure 4. Power against a random alternative of 1 cointegrating relationship when T/N is fixed and N increases (curves for T/N = 4, 5, and, with simplified formulas, T/N = 6, 10).
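A minimal sketch of how the first random alternative can be drawn (Python/numpy; the function name is ours, and the sign flip enforces negativity of the non-zero eigenvalue, which equals v*u):

```python
import numpy as np

def random_rank1_pi(N, rng):
    """Pi = u v* with unit vectors u, v: rank 1, unit singular value,
    and non-zero eigenvalue v*u flipped to be negative (it lies in (-1, 0))."""
    u = rng.standard_normal(N)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)
    if v @ u > 0:
        u = -u                       # make the non-zero eigenvalue negative
    return np.outer(u, v)

rng = np.random.default_rng(1)
Pi = random_rank1_pi(100, rng)
# Under the alternative the dgp is then X_t = X_{t-1} + Pi @ X_{t-1} + eps_t.
```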
Figure 5. Power against a random alternative of 1 cointegrating relationship when Π is symmetric. To ensure stationarity of ∆X, the non-zero eigenvalue of −Π lies between 0 and 2 (curves for N = 100 with T = 400, 500, and, with simplified formulas, T = 600, 1000).

Figure 6. Power against a random alternative of 1 cointegrating relationship when Π is symmetric. The initial condition is chosen as X_0 = std · N(0, I_N), N = 100 (curves for T/N = 5 and T/N = 10 with std = 0, 10, 20).

The values of T correspond to T/N = {4, 5, 6, 10}, as in the previous experiments. The larger is λ, the larger is the power, which eventually reaches 100%. When λ = 0, the dgp corresponds to the null H_0; thus, all curves start from ≈ 5%. The larger is T/N, the faster we reach 100%. The reason is the same as in the previous simulation.

Drawing intuition from the unit root testing literature (e.g., Müller and Elliott (2003), Harvey et al. (2009)), we also analyze the sensitivity of the power to the choice of the initial condition, X_0. In the two previous experiments we set X_0 = 0. Figure 6 shows how the power against a random alternative of 1 cointegrating relationship for symmetric Π changes for various magnitudes of X_0. To be more specific, we redo the same simulations as in the previous paragraph, but start with X_{0,i} ∼ i.i.d. std · N(0, 1) for each Monte Carlo draw. We consider T/N = 5 and T/N = 10. Curves with std = 0 are the same as in Figure 5 (they are also represented with the same style and color in both pictures).
We can see that the larger is the magnitude of X_0, as measured by its standard deviation std, the slower the power reaches 100%.

In contrast to the previous paragraph, the power against a random alternative of 1 cointegrating relationship when Π is asymmetric (as in Figure 4) does not exhibit any substantial changes with respect to the magnitude of X_0. Hence, we do not redraw the analogue of Figure 4 for non-zero X_0.

Overall, the simulations suggest good asymptotic performance of our test procedure both under H_0 and H_1. The theoretical analysis of the power remains an open and challenging question.

6. Empirical illustration

In this section we illustrate our testing strategy by analyzing cointegration in log prices of various stocks. The search for cointegrations is a part of a stock market strategy called pairs trading, see, e.g., Krauss (2017) and references therein. For us this is a convenient testing ground, as both N and T are large. We use logarithms of weekly S&P100 prices over ten years, which gives N = 92, T = 521, and T/N ≈ 5.7.

Figure 7 compares the resulting eigenvalues with the density of the Wachter distribution. Bykhovskaya and Gorin (2020) analyze the general VAR(k), defined in Eq. (2), when the ranks of the matrices Γ_i and Π grow sub-linearly in N. They show that when N and T jointly go to infinity the Wachter distribution appears as a limit for the empirical distribution of the eigenvalues λ_1, . . . , λ_N, which solve Eq. (8). So we see in Figure 7 a close match between the theoretical predictions for a general VAR(k) and real data.

On the other hand, the match with the Wachter distribution becomes worse for the largest eigenvalues, as some of them are slightly larger than the predictions. In Theorem 2 we have shown that under the hypothesis H_0 of no cointegration these eigenvalues should be at distance of order N^{−2/3} from λ_+, which is the right edge of the support of the Wachter distribution. Thus, Theorem 2 shows that Figure 7 is not consistent with H_0. In more detail, our cointegration test statistic in this case is 7.47, which is way above the 1% critical value from Table 1.
Thus, we can reject the null of no cointegration in favor of the presence of cointegration.

In principle, the detected cointegration could have been due to the fact that one of the companies' log price series is trend-stationary per se. In order to rule this out, we also redo the test for the subset of 80 prices for which the ADF test does not reject the unit root hypothesis. In this case the statistic equals 4.10 and we again reject the null of no cointegration. We remark that the fact that the ADF test rejected the null for the remaining components cannot be treated as a true justification of the presence of trend-stationary components, since when one repeats the same test for 92 components of the time series, there is a significant chance of sporadic rejections, which should be taken into account. Thus, we are being on the safe side here in order to strengthen our belief in the presence of non-trivial cointegration in S&P100 prices.

Figure 7. Eigenvalues from S&P data and the Wachter distribution.

7. Extensions

Let us describe various possible extensions of our results. In Subsection 7.1 we consider non-Gaussian errors ε_t. In Subsection 7.2 we discuss testing the hypothesis Π = −I_N using the same approach as for Π = 0. We also refer to Bykhovskaya and Gorin (2020) for the discussion of the general VAR(k), k > 1, setting.

7.1. Non-Gaussian errors. The result of Theorem 2 is obtained under the assumption that the errors ε_t in Eq. (3) are Gaussian. However, we believe that it should be possible to remove this restriction, and it is reasonable to expect that the very same statement should hold for any (independent and identically distributed across time t) distribution of ε_t, as long as it has sufficiently many moments. The underlying reason for this belief is the so-called universality phenomenon in random matrix theory: asymptotic local spectral characteristics of a random matrix are almost independent of the distributions of the matrix elements, see, e.g., Erdős and Yau (2012), Tao and Vu (2012).
We use the 5% critical value and allow for autoregression of order 3 or less.

In particular, for Wigner matrices (Y + Y*, where Y is a square matrix with real i.i.d. entries), it is known that the asymptotic behavior of the largest eigenvalues depends only on the first two moments (expectation and variance) of the distribution of an individual matrix element. Note, however, that our asymptotic result in Theorem 2 holds for any choice of the first two moments of the Gaussian noise ε_t: in Eq. (3) the covariance matrix Λ is arbitrary and any shift in expectation can be absorbed into the parameter µ. Hence, we conjecture that the result of Theorem 2 would hold for any distribution of ε_t, as long as it is sufficiently well-behaved.

In order to test this conjecture we made simulations for the case when the elements of ε_t are non-Gaussian, but i.i.d. across both i and t (corresponding to a diagonal covariance matrix Λ). We ran Monte Carlo simulations for three different distributions of ε_it: uniform on the interval [0, 1], a discrete uniform distribution, and the product of two independent N(0, 1) random variables. In each case, for T = 900, N = 300, and small values of r, we do not see any significant changes in the distribution of Σ_{i=1}^r ln(1 − λ_i) from the limit in Theorem 2. However, things go differently when the distribution has heavy tails. In the fourth experiment we chose ε_it to be Cauchy-distributed, and then the distribution of Σ_{i=1}^r ln(1 − λ_i) changed dramatically from what we saw in the Gaussian case. Hence, we conclude that the existence of at least some number of moments of the errors {ε_t} should be necessary for the validity of the conjecture. A rigorous proof of the conjecture remains an important problem for future research.

7.2. Testing Π = −I_N in the VAR(1) setting. The main result, Theorem 2, is a development of a test for the hypothesis Π = 0 in the VAR(1) model (3). One could also try to understand for which other Π testing is possible.
The asymptotic distribution depends on the choice of Π, and it is not possible to estimate Π consistently in our regime of T and N growing to infinity proportionally. Thus, for a general Π the problem seems infeasible at this point. However, there is another particular choice of Π for which an approach very similar to Theorem 4 still works: Π = −I_N. Denoting this hypothesis H_{w.n.}, where w.n. stands for white noise, the data generating process (Eq. (3)) becomes:

(20) H_{w.n.}: X_t = µ + ε_t,  t = 1, . . . , T.

In other words, we are now testing the hypothesis that the time series X_t is independent across time t against various VAR(1) alternatives. Here is one setup where such testing can be relevant. Suppose that we want to forecast some variable Y and we chose some model for it. After estimating the parameters of the model we obtain residuals X_t. If we know that the X_t are independent, then they are unforecastable and we cannot further improve our forecasting model. To check the above we can take the residuals X_t and then apply our white noise hypothesis testing procedure.

Let us introduce an adaptation of the Johansen statistic to H_{w.n.}. As in Eq. (10), we use the notation P for the de-meaning operator projecting on the hyperplane orthogonal to (1, 1, . . . , 1), so that the levels enter the statistic through XP. For the increments ∆X_t, in addition to the conventional de-meaning we use an extra modification: we deal with cyclic increments ∆^c X_t defined as:

∆^c X_t = X_{t+1} − X_t,  t = 1, 2, . . . , T − 1,
∆^c X_T = X_1 − X_T.

While it might seem bizarre to subtract the last observation from the first one, if we recall that our current hypothesis of interest (20) ignores the time ordering, then this becomes less controversial. Note also a shift of index by 1, as compared to the conventional ∆X_t, which is compensated by the lack of the shift t → t − 1 in X_t, as compared to Eq. (5). Our choice of the definition of ∆^c is important for the following precise asymptotic results.
As in Section 2, the conventional Johansen statistic should be thought of as a finite rank perturbation of the modified version that we now introduce:

(21) S^{w.n.}_{00} = ∆^c X P (∆^c X)*,  S^{w.n.}_{01} = ∆^c X P X*,  S^{w.n.}_{10} = X P (∆^c X)*,  S^{w.n.}_{11} = X P X*.

We further define N numbers λ_1 ≥ λ_2 ≥ · · · ≥ λ_N as the N roots of the equation

(22) det( S^{w.n.}_{01} (S^{w.n.}_{11})^{−1} S^{w.n.}_{10} − λ S^{w.n.}_{00} ) = 0.

Equivalently, {λ_i} are the eigenvalues of S^{w.n.}_{01} (S^{w.n.}_{11})^{−1} S^{w.n.}_{10} (S^{w.n.}_{00})^{−1}.

Theorem 5. Suppose that T, N → ∞ in such a way that T > N and the ratio T/N remains bounded. Under the hypothesis H_{w.n.} one can couple (i.e. define on the same probability space) the eigenvalues λ_1 ≥ λ_2 ≥ · · · ≥ λ_N of the matrix S^{w.n.}_{01} (S^{w.n.}_{11})^{−1} S^{w.n.}_{10} (S^{w.n.}_{00})^{−1} and the eigenvalues x_1 ≥ · · · ≥ x_N of the Jacobi ensemble J(N; T − N − 1, T − N) in such a way that for each ε > 0 we have

lim_{T,N→∞} Prob( max_{1≤i≤N} |λ_i − x_i| < N^{−ε} ) = 1.

The proof of Theorem 5 follows a strategy similar to that of Theorem 4 and we refer to Section 9.5 in the Appendix for details; in particular, the proofs rely on yet another novel appearance of the Jacobi ensemble. The remaining straightforward step to obtain the asymptotics of various statistics built on the eigenvalues {λ_i} is to combine Theorem 5 with the asymptotic results for the Jacobi ensemble presented in Section 9.3. This is in the spirit of Theorem 2.

Note that the hypothesis H_{w.n.} implies the maximal amount of cointegrating relationships: each of the N components of X_t is already stationary. A reasonable alternative hypothesis H_1 is the presence of N − r cointegrating relationships. For simplicity, let us concentrate on the case r = 1. Then the alternative can also be interpreted as the presence of a single growing factor. In this situation we expect the smallest eigenvalue λ_N to be a good test statistic. We see in numerical simulations that λ_N is bounded away from 0 under H_{w.n.}.
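A numpy sketch of the statistic defined by (21)-(22) (the function name is ours; the cyclic roll implements ∆^c, including the wrap-around term X_1 − X_T):

```python
import numpy as np

def whitenoise_eigs(X):
    """Eigenvalues of S01 S11^{-1} S10 S00^{-1} built from the de-meaned
    series X P and its cyclic increments Delta^c X, as in Eqs. (21)-(22)."""
    N, T = X.shape
    dcX = np.roll(X, -1, axis=1) - X        # Delta^c X_t = X_{t+1} - X_t, cyclically
    P = np.eye(T) - np.ones((T, T)) / T     # de-meaning projector
    S00 = dcX @ P @ dcX.T
    S01 = dcX @ P @ X.T
    S11 = X @ P @ X.T
    M = S01 @ np.linalg.solve(S11, S01.T) @ np.linalg.inv(S00)
    return np.sort(np.linalg.eigvals(M).real)[::-1]

rng = np.random.default_rng(2)
N, T = 20, 120
X = rng.standard_normal((N, T))             # H_w.n.: X_t = mu + eps_t with mu = 0
lam = whitenoise_eigs(X)
```

Consistent with the discussion of the alternative, under H_{w.n.} the smallest of these eigenvalues stays away from 0 in such simulations.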
It can also be formally proved by combining Theorem 5 with the asymptotics of the Jacobi ensemble from Section 9.3. Thus, if λ_N is close to 0, then we are able to reject H_{w.n.}. The same simulations indicate that λ_N starts to be close to 0 when there are at most N − 1 cointegrating relationships, so the test based on λ_N should have good asymptotic power. We leave rigorous justifications of this observation till future research, and for now only mention the following heuristics: the stationary linear combinations of X_t are strongly correlated with the same linear combinations of ∆^c X_t; on the other hand, the growing linear combinations of X_t have very weak correlation with the same linear combinations of ∆^c X_t (cf. the correlations of a one-dimensional random walk with its increments). Hence, if the latter are present, the smallest canonical correlations of XP and ∆^c XP, which coincide with the eigenvalues of the matrix S^{w.n.}_{01} (S^{w.n.}_{11})^{−1} S^{w.n.}_{10} (S^{w.n.}_{00})^{−1}, should become small, leading to a close to 0 value of λ_N.

8. Conclusion

The paper presents a cointegration test which has desirable empirical size when N and T are of the same magnitude. To our knowledge, this is the first paper which constructs and analyzes the asymptotic properties of a test that does not suffer from significant distortions (such as over-rejection) for comparable N and T. The test is based on the Johansen LR test and incorporates some additional steps. First, our procedure reinforces the importance of de-trending in cointegration testing. It turns out that de-trending is crucial for deriving desirable asymptotic properties. (E.g., only after de-trending can one rewrite the lagged process as a linear function of its first differences.)
Second, our asymptotic results reveal and explain an unexpected connection between the Johansen cointegration test and the Jacobi ensemble, a classical ensemble of random matrix theory whose previous appearances in statistics include multivariate analysis of variance (MANOVA) and sample canonical correlations for independent sets of data.

On the theoretical side, the next step would be to go from the null hypothesis of zero cointegration to analyzing the behaviour of our test under r cointegrations. This will allow us to calculate the power of the test, reinforcing our simulational findings in Section 5.2, as well as to perform tests of r versus r + 1 cointegrations.

On the empirical side, it would be interesting to apply our test to other data sets beyond what is presented in Section 6. Annual cross-country data provides a natural example of our setting where the number of years and the number of countries are comparable. Another example arises if one considers network-type settings which evolve over time (e.g., as in Bykhovskaya (2020)). Data on trade or on foreign direct investment can potentially be non-stationary, especially if we focus on the largest and most active countries. Moreover, although such monthly data is available, for many countries it only covers 20 years. Thus, we have T ≈ 240 time periods and N = 90 cross-section units, which fits ideally in our setting. Classical cointegration tests are known to perform poorly in the above settings. However, the asymptotic results of our paper open up a possibility of detecting the presence of cointegration in such time-series data.

9. Appendix

9.1. New matrix models for the Jacobi ensemble. Recall that our testing procedure relies on the squared sample canonical correlations between two correlated data sets.
As we later show in the proofs of Propositions 11 and 16, an equivalent point of view is that we deal with the eigenvalues of a product of two orthogonal projectors P_1 and P_2, where P_1 is a projection on a random N-dimensional subspace V of a T-dimensional space and P_2 is a projection on U V, where U is a certain deterministic linear operator.

We randomize this problem by replacing U with a random operator whose spectrum is close to that of U. The randomized problem turns out to be exactly solvable: the eigenvalues of the new P_1 P_2 coincide with the classical Jacobi ensemble. The goal of this section is to prove this fact. The choices of U and V depend on the hypothesis that we are testing and, hence, we need several theorems. Throughout this section we are going to deal both with real and complex matrices. According to the customary random matrix theory notation, they are referred to as the β = 1 and β = 2 cases, respectively.

In what follows I_n means the identity matrix in n-dimensional space. Sometimes we omit n and write simply I when the dimension is clear from the context. For a matrix X we let [X]_{NN} be its top-left N × N corner. The parameter T used in this section equals T − 1 in the notation of the main text. Throughout this section Q stands for an n × n matrix, and we repeatedly use the following Jacobians:

• The map Z ↦ QZ on n × m matrices has the Jacobian

(23) |∂(QZ)/∂Z| = |det Q|^{βm}.

• The map Z ↦ QZQ* from the space of n × n symmetric (Hermitian if β = 2) matrices to itself has the Jacobian

(24) |∂(QZQ*)/∂Z| = |det Q|^{βn+2−β} = { |det Q|^{2n}, β = 2;  |det Q|^{n+1}, β = 1 }.

• The map Z ↦ QZQ* from the space of n × n skew-symmetric (skew-Hermitian if β = 2) matrices to itself has the Jacobian

(25) |∂(QZQ*)/∂Z| = |det Q|^{βn+β−2} = { |det Q|^{2n}, β = 2;  |det Q|^{n−1}, β = 1 }.
The first identity (23) follows from the observation that each column of Z is transformed by the linear map Q and there are m such columns. The second and third identities are similar and we refer to Forrester (2010, (1.35)) for details. The first theorem of this section is relevant for the setting of Theorem 4 for VAR(1).

Theorem 6. Assume T ≥ 2N. Let O be a random T × T matrix chosen from the uniform measure on the group of real orthogonal matrices of determinant 1 if β = 1, or of complex unitary matrices if β = 2. Define U = (I_T + O)^{−1}. Then the matrix

(26) M = [U]_{NN} ([U*U]_{NN})^{−1} [U*]_{NN}

is distributed as the N × N real symmetric (if β = 1) or complex Hermitian (if β = 2) matrix of density

(27) det(M)^{βN/2+β−2} det(I_N − M)^{β(T−2N+1)/2−1},  0 ≤ M ≤ I_N,  β = 1, 2,

with respect to the Lebesgue measure on symmetric/Hermitian matrices.

Remark 1. Define a T × T matrix M̂ by putting M in the top-left N × N corner of M̂ and filling the rest with zeros. The non-zero eigenvalues of M and M̂ are the same. Simultaneously, M̂ can be identified with P_1 P_2 P_1, where P_1 is the orthogonal projector on the space V spanned by the first N coordinate vectors and P_2 is the projector on U V.

Remark 2. If T < 2N, then the spaces V and U V necessarily intersect, and therefore M has deterministic eigenvalues which equal 1. This should be taken into account when extending (27) to this case and we will not pursue this direction here.

Proof of Theorem 6. We parameterize the orthogonal (or unitary if β = 2) group by means of the Cayley transform. Namely, we set

R = (I_T − O)(I_T + O)^{−1},  so that  O = (I_T − R)(I_T + R)^{−1}.

Since O^{−1} = O*, R is a skew-symmetric T × T matrix, i.e., R* = −R. The uniform (Haar) measure on O leads to the following distribution on R, in which we omit the normalization constant, which is irrelevant for us:

(28) det(I_T − R²)^{(−βT+2−β)/2} dR,  β = 1, 2,

see, e.g., Forrester (2010, (2.55)).
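The Cayley correspondence between orthogonal and skew-symmetric matrices is easy to check numerically (a numpy sketch; we draw a Haar-distributed O via the QR factorization of a Gaussian matrix and force det O = 1, so that I + O is invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 8
# Haar-distributed orthogonal matrix via QR of a Gaussian matrix
Z = rng.standard_normal((T, T))
Q, Rq = np.linalg.qr(Z)
O = Q * np.sign(np.diag(Rq))          # sign fix gives the Haar measure on O(T)
if np.linalg.det(O) < 0:              # force det O = 1 (a matrix with det = -1
    O[:, 0] *= -1                     # always has eigenvalue -1)
I = np.eye(T)
R = (I - O) @ np.linalg.inv(I + O)    # Cayley transform
assert np.allclose(R, -R.T)                            # R is skew-symmetric
assert np.allclose(O, (I - R) @ np.linalg.inv(I + R))  # the transform is an involution
```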
Further, we have U = (I_T + R)/2, so that the scalar prefactors cancel in (26) and

(29) M = [I_T + R]_{NN} ([(I_T − R)(I_T + R)]_{NN})^{−1} [I_T − R]_{NN}.

We partition T = N + (T − N) and write I_T + R in a block form according to this split:

I_T + R = ( I_N + A,  B;  −B*,  I_{T−N} + C ),

where A is an N × N skew-symmetric matrix, C is a (T − N) × (T − N) skew-symmetric matrix, and B is an arbitrary N × (T − N) matrix. We make a change of variables by introducing B̃ so that

B = (I_N − A)B̃,  and  B* = B̃*(I_N + A).

Using (23) we compute the Jacobian of the transformation:

(30) |∂B/∂B̃| = |det(I_N − A)|^{β(T−N)}.

Note that

(31) M = (I_N + A)(I_N − A² + BB*)^{−1}(I_N − A) = (I_N + A)( (I_N − A)(I_N + B̃B̃*)(I_N + A) )^{−1}(I_N − A) = (I_N + B̃B̃*)^{−1}.

Using the formula for the determinant of a block matrix,

det( A, B; C, D ) = det A · det(D − C A^{−1} B),

we also have

(32) det(I_T + R) = det( I_N + A, (I_N − A)B̃; −B̃*(I_N + A), I_{T−N} + C )
= det(I_N + A) det( I_N, (I_N − A)B̃; −B̃*, I_{T−N} + C )
= det(I_N + A) det( I_{T−N} + C + B̃*(I_N − A)B̃ )
= det(I_N + A) det(I_{T−N} + B̃*B̃) det( I_{T−N} + (I_{T−N} + B̃*B̃)^{−1/2}(C − B̃*AB̃)(I_{T−N} + B̃*B̃)^{−1/2} ).

Next, we introduce

C̃ = (I_{T−N} + B̃*B̃)^{−1/2}(C − B̃*AB̃)(I_{T−N} + B̃*B̃)^{−1/2}

and notice that the map C ↦ C̃ preserves the space of all skew-symmetric (T − N) × (T − N) matrices.
Using (25) the map has the Jacobian

(33) |∂C/∂C̃| = det(I_{T−N} + B̃*B̃)^{(β(T−N)+β−2)/2}.

Combining (30), (32), and (33), we rewrite the measure (28) as follows:

(34) det(I_T − R²)^{(−βT+2−β)/2} dA dB dC = |det(I_T + R)|^{−βT+2−β} dA dB dC
= |det(I_N + A) det(I_{T−N} + B̃*B̃) det(I_{T−N} + C̃)|^{−βT+2−β} · |det(I_N − A)|^{β(T−N)} · det(I_{T−N} + B̃*B̃)^{(β(T−N)+β−2)/2} dA dB̃ dC̃.

The key property of (34) is that the measure has factorized and projecting onto the B̃-component is straightforward. We conclude that B̃ is distributed according to the measure

(35) det(I_{T−N} + B̃*B̃)^{−β(T+N)/2+1−β/2} dB̃ = det(I_N + B̃B̃*)^{−β(T+N)/2+1−β/2} dB̃,

where we used det(I + UW) = det(I + WU) in the last equality. According to Forrester (2010, Exercise 3.2, q.6), (35) implies that the symmetric (or Hermitian if β = 2) non-negative definite N × N matrix F = B̃B̃* has the law

(36) det(F)^{β(T−2N+1)/2−1} det(I_N + F)^{−β(T+N)/2+1−β/2} dF.

We have by (31)

M = (I_N + F)^{−1},  F = (I_N − M)M^{−1},  dM = −(I_N + F)^{−1} dF (I_N + F)^{−1}.

Using (24) we have

(37) |∂M/∂F| = |det( (I_N + F)^{−1} )|^{βN−β+2} = |det M|^{βN−β+2}.

Formulas (36) and (37) imply that the matrix M has the distribution

det(I_N − M)^{β(T−2N+1)/2−1} det(M)^{−β(T−2N+1)/2+1} · det(M)^{β(T+N)/2−1+β/2} · det(M)^{−βN+β−2} dM = det(I_N − M)^{β(T−2N+1)/2−1} det(M)^{βN/2+β−2} dM,  0 ≤ M ≤ I_N. ∎

Our next theorem gives another realisation of the Jacobi ensemble, which is relevant to testing Π = −I_N in the VAR(1) setting.

Theorem 7. Assume T ≥ 2N. Let O be a random T × T real matrix chosen from the uniform measure on the group of orthogonal matrices with determinant 1 if β = 1, or of complex unitary matrices if β = 2. Let A be the N × N top-left corner of O.
Then the matrix

M = (I_N + A)(2I_N + A + A*)^{−1}(I_N + A*)

is distributed as the N × N real symmetric (if β = 1) or complex Hermitian (if β = 2) matrix of density

(38) det(M)^{β(T−N)/2+β−2} det(I_N − M)^{β(T−2N+1)/2−1},  0 ≤ M ≤ I_N,  β = 1, 2,

with respect to the Lebesgue measure on symmetric/Hermitian matrices.

Proof. The computation of the law of A is well-known and Forrester (2010, (3.113)) provides a formula for the density with respect to the Lebesgue measure on all real/complex N × N matrices. It is

(39) det(I_N − A*A)^{β(T−2N+1)/2−1} dA,  0 ≤ A*A ≤ I_N,

where here and below we omit the normalization constant, which makes the total mass of the measure equal to 1. The matrix M is a function of A, and in the rest of the proof we transform the measure (39) to make the distribution of this function explicit.

Our first step is to rewrite (39) in terms of M. We claim that

(40) det(I_N − A*A) = det(I_N − M) det(2I_N + A + A*).

Indeed, we first define M₀ = (I_N + A*)(I_N + A)(2I_N + A + A*)^{−1} and notice that

I_N − A*A = 2I_N + A + A* − M₀(2I_N + A + A*) = (I_N − M₀)(2I_N + A + A*).

Hence,

(41) det(I_N − A*A) = det(I_N − M₀) det(2I_N + A + A*),

and (40) is obtained from (41) by using the identity det(I − CD) = det(I − DC).

We conclude that the density (39) has the form

(42) det(I_N − M)^{β(T−2N+1)/2−1} det(2I_N + A + A*)^{β(T−2N+1)/2−1} dA.

Note that the first factor has the desired form from the statement of the theorem. We now split the Euclidean space of all N × N real (or complex) matrices into the symmetric (or Hermitian) and skew-symmetric (or skew-Hermitian) parts. For that we define

X = I_N + (A + A*)/2,  Y = (A − A*)/2.

Since A belongs to the unit matrix ball (because AA* + BB* = I_N, where B is the top-right N × (T − N) corner of O), X is a positive semi-definite symmetric (or Hermitian) matrix, i.e., X ≥ 0.
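The key identity (40) is easy to sanity-check numerically (a numpy sketch in the real case β = 1; A is the top-left corner of a Haar-distributed orthogonal matrix, as in the theorem):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 10, 4
Z = rng.standard_normal((T, T))
Q, Rq = np.linalg.qr(Z)
O = Q * np.sign(np.diag(Rq))       # Haar-distributed T x T orthogonal matrix
A = O[:N, :N]                      # its top-left N x N corner
I = np.eye(N)
M = (I + A) @ np.linalg.inv(2 * I + A + A.T) @ (I + A.T)
lhs = np.linalg.det(I - A.T @ A)
rhs = np.linalg.det(I - M) * np.linalg.det(2 * I + A + A.T)
assert np.isclose(lhs, rhs)        # identity (40)
```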
The relation (I_N + A)(2I_N + A + A*)^{−1}(I_N + A*) = M is now rewritten as

(43) (X + Y)(2X)^{−1}(X − Y) = M,  or  X = 2M + Y X^{−1} Y.

Note that X^{−1} is positive definite, while Y is skew-Hermitian. This implies that Y X^{−1} Y ≤ 0, i.e., Y X^{−1} Y is a negative semi-definite Hermitian matrix. Hence, (43) implies X ≤ 2M, which means that 2M − X is a non-negative definite matrix.

Using (43) we make a change of coordinates in the space of matrices

(X, Y) → (M, Y).

The Jacobian of this change of coordinates is

det( ∂M/∂X, ∂Y/∂X; ∂M/∂Y, ∂Y/∂Y ) = det( ∂M/∂X, 0; ∂M/∂Y, I ) = det( ∂M/∂X ).

Therefore, we need to compute the Jacobian of the map

X ↦ M = (1/2)( X − Y X^{−1} Y ),

which maps the Euclidean space of symmetric (Hermitian if β = 2) matrices to itself. Differentiating, and using dX^{−1} = −X^{−1} dX X^{−1}, we see the matrix identity

(44) 2 dM = dX + Y X^{−1} dX X^{−1} Y = dX − (Y X^{−1}) dX (Y X^{−1})*,

where we used X = X*, Y = −Y* in the last identity. Hence, the Jacobian of the map X ↦ M is a function of Y X^{−1} and we denote this function J(Y X^{−1}). Therefore, (42) becomes

(45) ∼ det(I_N − M)^{β(T−2N+1)/2−1} det(X)^{β(T−2N+1)/2−1} J^{−1}(Y X^{−1}) dM dY.

We now make another change of variables by setting

Ỹ = M^{−1/2} Y M^{−1/2},  Y = M^{1/2} Ỹ M^{1/2}.

The Jacobian of the map (M, Y) → (M, Ỹ) is the same as the Jacobian of the map Y → Ỹ, which is computed by (25) as

(46) |∂Ỹ/∂Y| = |det M|^{−(βN+β−2)/2} = { (det M)^{−N}, β = 2;  (det M)^{−(N−1)/2}, β = 1 }.

This converts (45) into

(47) ∼ det(I_N − M)^{β(T−2N+1)/2−1} (det M)^{(βN+β−2)/2} det(X)^{β(T−2N+1)/2−1} J^{−1}(Y X^{−1}) dM dỸ.

It remains to figure out the factors involving det X and Y X^{−1} in the last formula. For that let us introduce the notation X̃ = M^{−1/2} X M^{−1/2}, and notice that (43) implies

(48) X̃ = 2 I_N + Ỹ X̃^{−1} Ỹ,  0 ≤ X̃ ≤ 2 I_N.
The solution to (48) gives us a function X̃ = X̃(Ỹ), such that X = M^{1/2} X̃ M^{1/2}. Hence, we can transform

(49) det(X)^{β(T−2N+1)/2−1} = det(M)^{β(T−2N+1)/2−1} det(X̃)^{β(T−2N+1)/2−1}.

For J(Y X^{−1}) we notice two properties. First,

Y X^{−1} = M^{1/2} Ỹ X̃^{−1} M^{−1/2}.

In addition, we have the following statement, which we prove later:

Claim. For the above Jacobian J(·), we have

(50) J(A) = J(B^{−1} A B), whenever B = B*.

Thus, J(Y X^{−1}) = J(Ỹ X̃^{−1}) = J(f(Ỹ)) for a certain function f, and using (49) and (50) we rewrite (47) as

(51) det(I_N − M)^{β(T−2N+1)/2−1} (det M)^{(βN+β−2)/2} det(M)^{β(T−2N+1)/2−1} g(Ỹ) dM dỸ

for a certain function g(Ỹ). Since the parts involving Ỹ and M are now decoupled, we can integrate out Ỹ, arriving at the desired expression for the distribution of M:

(52) det(I_N − M)^{β(T−2N+1)/2−1} det(M)^{β(T−N)/2+β−2} dM.

It remains to prove the claim (50). By definition (44), J(B^{−1}AB) is the Jacobian of the linear map

dX ↦ dX − (B^{−1}AB) dX (B A* B^{−1}).

We split this map into a composition of three:

dX ↦ B dX B ↦ (B dX B) − A (B dX B) A* ↦ B^{−1}[ (B dX B) − A (B dX B) A* ] B^{−1}.

J(B^{−1}AB) is the product of the Jacobians of these three maps. The first and the last maps are inverse to each other (above we used the explicit Jacobian of these maps, but this is not even needed) and their Jacobians cancel. The Jacobian of the middle map is the desired J(A). ∎

9.2. Proof of Theorem 4. The proof of Theorem 4 is split into two steps. First, Proposition 11 uses the rotation invariance of the Gaussian law to rewrite the matrix S_01 S_11^{−1} S_10 S_00^{−1} as a product of corners of certain matrices, reminiscent of (26).
Next, we show in Proposition 12 that the replacement of a certain deterministic matrix (in that product) by its random perturbation leads precisely to (26) and simultaneously has a controlled effect on the change of the eigenvalues.

We need to introduce some deterministic T × T matrices. The summation matrix Φ has 1's below the diagonal and 0's on the diagonal and everywhere above the diagonal:

Φ = ( 0 0 0 · · · 0
      1 0 0 · · · 0
      1 1 0 · · · 0
      · · ·
      1 1 · · · 1 0 ).

Definition 8. V is the (T − 1)-dimensional hyperplane orthogonal to (1, 1, . . . , 1). We let P be the orthogonal projector on V (see Eq. (10)). Finally, we set Φ̃ = PΦP.

We also need the cyclic version of the lead operator F mapping (x_1, x_2, . . . , x_T) to (x_2, x_3, . . . , x_T, x_1). V is an invariant subspace for F and we denote through F_V the restriction of F onto V.

Lemma 9. The operator Φ̃ preserves the space V. Its restriction onto V coincides with −(I_V − F_V)^{−1}, where I_V is the identity operator acting in V.

Proof of Lemma 9. Let e_i, i = 1, . . . , T, denote the i-th coordinate vector. Then (e_i − e_{i+1})_{i=1}^{T−1} gives a linear basis of V. Since Φ e_i = e_{i+1} + e_{i+2} + · · · + e_T, we have

Φ̃(e_i − e_{i+1}) = PΦ(e_i − e_{i+1}) = P e_{i+1} = e_{i+1} − (1/T) Σ_{j=1}^T e_j.

Applying I − F to the last vector and noticing that F e_{i+1} = e_i, we conclude that

(I − F) Φ̃ (e_i − e_{i+1}) = e_{i+1} − e_i = −(e_i − e_{i+1}). ∎

Let us introduce one more notation. Choose some orthonormal basis ẽ_1, ẽ_2, . . . , ẽ_{T−1} of V; e.g., this can be an orthogonalization of e_1 − e_2, e_2 − e_3, . . . , e_{T−1} − e_T (but the exact choice is irrelevant for the following theorems).

Definition 10. For an operator A acting in V we set [A]^V_{NN} to be the N × N corner of the operator A, taken in the basis {ẽ_i}.

Next, we take a uniformly-random orthogonal (or unitary if β = 2) operator Õ acting in the (T − 1)-dimensional space V and define an operator Ũ acting in V:

Ũ = (I_V − Õ F_V Õ*)^{−1}.
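Lemma 9 is easy to verify numerically on a small example (a numpy sketch; matrices act on column vectors and variable names are ours):

```python
import numpy as np

T = 6
I = np.eye(T)
Phi = np.tril(np.ones((T, T)), k=-1)   # summation matrix: 1's strictly below diagonal
P = I - np.ones((T, T)) / T            # projector onto V, the orthocomplement of (1,...,1)
F = np.roll(I, -1, axis=0)             # cyclic lead: maps (x_1,...,x_T) to (x_2,...,x_T,x_1)
Phi_tilde = P @ Phi @ P
# On V, (I - F) Phi_tilde acts as minus the identity, i.e. Phi_tilde|_V = -(I_V - F_V)^{-1}
for i in range(T - 1):
    v = I[:, i] - I[:, i + 1]          # basis vectors of V
    assert np.allclose((I - F) @ Phi_tilde @ v, -v)
```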
Let us introduce a symmetric (or Hermitian if β = 2) N × N matrix M̃,

(53) M̃ = [Ũ]^V_{NN} ([Ũ*Ũ]^V_{NN})^{−1} [Ũ*]^V_{NN}.

Proposition 11. Choose an arbitrary positive definite covariance matrix Λ. Let ε be an N × T matrix of random variables (real if β = 1 and complex if β = 2), such that the T columns of ε are i.i.d., and each of them is an N-dimensional mean zero Gaussian vector with covariance Λ. Fix arbitrary N-dimensional vectors X_0 and µ. Define the N × T matrix X = (X_1, X_2, . . . , X_T) as in Eq. (3) via the recurrence

X_t = X_{t−1} + µ + ε_t,  t = 1, . . . , T.

Further set ∆X_t = X_t − X_{t−1} and X̃_t = X_{t−1} − ((t − 1)/T)(X_T − X_0), t = 1, . . . , T. Define the N × N matrices

S_00 = ∆X P ∆X*,  S_01 = ∆X P X̃*,  S_10 = S*_01 = X̃ P ∆X*,  S_11 = X̃ P X̃*.

Then the eigenvalues of the matrix S_01 S_11^{−1} S_10 S_00^{−1} have the same distribution as those of M̃ in (53).

Proof. We start by expressing the matrices S_ij via ε. Clearly, ∆X_t = µ + ε_t. Further, the t-th column of ∆X P equals

µ + ε_t − (1/T) Σ_{i=1}^T (µ + ε_i) = ε_t − (1/T) Σ_{i=1}^T ε_i,

so that ∆X P = εP. Hence, S_00 = εPε*. Next,

X̃_t = X_0 + (t − 1)µ + Σ_{i=1}^{t−1} ε_i − ((t − 1)/T)( Tµ + Σ_{i=1}^T ε_i ) = X_0 + Σ_{i=1}^{t−1} ε_i − ((t − 1)/T) Σ_{i=1}^T ε_i.

We claim that X̃P = εΦ̃*. Indeed, X̃P coincides with X̆P, where

X̆_t = X̃_t − X_0 = Σ_{i=1}^{t−1} ε_i − ((t − 1)/T) Σ_{i=1}^T ε_i = Σ_{i=1}^{t−1} ( ε_i − (1/T) Σ_{j=1}^T ε_j ).

Since Φ is the summation operator, we have X̆ = (Φ(εP)*)* = εPΦ* and the claim is proven, as Φ̃* = PΦ*P. We conclude that

(54) S_00 = εPε*,  S_01 = εΦ̃ε*,  S_10 = S*_01 = εΦ̃*ε*,  S_11 = εΦ̃*Φ̃ε*.

Next, note that we can (and will) assume without loss of generality that the covariance matrix Λ is the identity, i.e., all matrix elements of ε are i.i.d. random variables: N(0, 1) if β = 1 and N(0, 1) + i·N(0, 1) if β = 2.
Indeed, using (54), if we take any non-degenerate $N\times N$ matrix $A$ and replace $\varepsilon$ by $A\varepsilon$, then the matrix $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ is conjugated by $A$, which keeps the eigenvalues unchanged. By choosing an appropriate $A$, we can guarantee that the covariance of the columns of $A\varepsilon$ is the identity, and then rename $A\varepsilon$ as $\varepsilon$.

In the remaining proof we explicitly couple $\varepsilon$ with $\widetilde O$ (arising in the definition of $\widetilde M$) by constructing $\widetilde O$ using the randomness coming from $\varepsilon$.

Since for any two rectangular matrices $A$ and $B$ of the same sizes, $AB^*$ and $B^*A$ have the same non-zero eigenvalues, the eigenvalues of $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ can be identified (using also $P\widetilde\Phi=\widetilde\Phi$) with those of
(55) $\bigl(P\varepsilon^*S_{00}^{-1}\varepsilon P\bigr)\bigl(\widetilde\Phi\varepsilon^*S_{11}^{-1}\varepsilon\widetilde\Phi^*\bigr)=P_1P_2,$
where $P_1$ is the orthogonal projector onto the space spanned by the $N$ columns of $P\varepsilon^*$ and $P_2$ is the orthogonal projector onto the $N$ columns of $\widetilde\Phi\varepsilon^*$. Since the eigenvalues of $P_1P_2$ are the same as those of $P_1P_2P_1$, and the latter operator acts as $0$ on the orthogonal complement of $V$, we can restrict all the operators to $V$. Due to the invariance of the Gaussian law with respect to rotations by orthogonal (or unitary if $\beta=2$) matrices, the columns of $P\varepsilon^*$ span a rotationally invariant $N$-dimensional subspace in $V$. The columns of $\widetilde\Phi\varepsilon^*=\widetilde\Phi P\varepsilon^*$ then span the $\widetilde\Phi$-image of this subspace.

The eigenvalues of $P_1P_2P_1$ are preserved when we conjugate each of the projectors with an orthogonal transformation $O$, i.e., replace $P_1P_2P_1$ with $OP_1O^*\,OP_2O^*\,OP_1O^*=OP_1P_2P_1O^*$. In this transformation the spaces onto which $P_1$ and $P_2$ project are replaced by their $O$-images. We take $O$ to be a random operator satisfying two conditions:

• $O$ maps the span of the columns of $P\varepsilon^*$ to the subspace spanned by the first $N$ basis vectors $\tilde e_1,\dots,\tilde e_N$ in $V$;

• $O$ is distributed as a uniformly random orthogonal operator acting in $V$, i.e., $O\overset{d}{=}\widetilde O$.

The existence of such $O$ follows from the rotational invariance of the Gaussian law. Hence, $OP_1O^*$ is a projector onto $\langle\tilde e_1,\dots,\tilde e_N\rangle$, and we conclude that the eigenvalues of $P_1P_2P_1$ coincide with those of $[OP_2O^*]^V_{NN}$.

Since $P_2$ is the projector onto the span of the columns of $\widetilde\Phi\varepsilon^*=O^*(O\widetilde\Phi O^*)O\varepsilon^*$, the matrix $OP_2O^*$ is the projector onto the $O$-image of this span, i.e., onto the columns of $(O\widetilde\Phi O^*)O\varepsilon^*$. Since the columns of $OP\varepsilon^*$ span $\langle\tilde e_1,\dots,\tilde e_N\rangle$ and, by Lemma 9, $O\widetilde\Phi O^*=O\bigl(-(I_V-F_V)^{-1}\bigr)O^*=-\widetilde U$ on $V$ (with $\widetilde O=O$), we reach the formula for $\widetilde M$ as the $N\times N$ corner of the projector onto the first $N$ columns of $\widetilde U$. $\square$

Set $\bar T=T-1$ and let $O$ be a uniformly random real orthogonal matrix of determinant 1 (if $\beta=1$) or a complex unitary (if $\beta=2$) $\bar T\times\bar T$ matrix. Define $U=(I_{\bar T}+O)^{-1}$. Then we set, as in (26),
(56) $M=[U]_{NN}\bigl([U^*U]_{NN}\bigr)^{-1}[U^*]_{NN}.$

For a matrix $A$ with real spectrum, we set $\lambda_i(A)$ to be the $i$th largest eigenvalue of $A$.

Proposition 12. Assume $\bar T=T-1\ge 3N$. One can couple $M$ from (56) with $\widetilde M$ from (53) in such a way that for each $\epsilon>0$ we have
$$\mathrm{Prob}\Bigl(\max_{1\le i\le N}\bigl|\lambda_i(M)-\lambda_i(\widetilde M)\bigr|<N^{\epsilon-1}\Bigr)\to 1,$$
as $T,N\to\infty$ in such a way that the ratio $\bar T/N$ remains bounded.

Remark 3. We believe that the condition $\bar T\ge 3N$ can be weakened and replaced by $\bar T/N$ bounded away from 1, but we stick to it, since that is the situation in which Theorem 6 applies.

Let us compare the definitions of $M$ and $\widetilde M$. There is only one difference between them: the eigenvalues of $-\widetilde OF_V\widetilde O^*$ in the definition of $\widetilde M$ are deterministic, while the eigenvalues of $O$ are random. Note that both matrices $-\widetilde OF_V\widetilde O^*$ and $O$ have uniformly random eigenvectors.
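The identification (55) of the eigenvalues of $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ with those of a product of two projectors is again an exact algebraic fact, which can be sanity-checked numerically (our own sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 30
eps = rng.standard_normal((N, T))
P = np.eye(T) - np.ones((T, T)) / T
Phit = P @ np.tril(np.ones((T, T)), k=-1) @ P   # \tilde Phi

S00 = eps @ P @ eps.T
S01 = eps @ Phit @ eps.T
S10 = S01.T
S11 = eps @ Phit.T @ Phit @ eps.T

def proj(A):
    """Orthogonal projector onto the column span of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

P1 = proj(P @ eps.T)        # span of the N columns of P eps^*
P2 = proj(Phit @ eps.T)     # span of the N columns of \tilde Phi eps^*

stat = S01 @ np.linalg.inv(S11) @ S10 @ np.linalg.inv(S00)
lam1 = np.sort(np.linalg.eigvals(stat).real)
# P1 P2 has the same N non-zero eigenvalues, plus T - N zeros.
lam2 = np.sort(np.linalg.eigvals(P1 @ P2).real)[-N:]
assert np.allclose(lam1, lam2, atol=1e-8)
print("ok")
```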
Then the idea of the proof is to rely on the phenomenon of rigidity for random matrix eigenvalues, which says that as $\bar T\to\infty$, the eigenvalues of $O$ are very close to their deterministic expected positions, which, in turn, essentially coincide with the eigenvalues of $-F_V$. One technical difficulty is that we need to deal with $(I+O)^{-1}$, whose eigenvalues can be large (the absolute value of the largest eigenvalue grows linearly in $\bar T$); however, as we will see, one can study $(I+O)$ instead of $(I+O)^{-1}$, and the problem of unbounded operators disappears. We proceed to the detailed proof.

One way to see it is by noticing that the span of the columns of $P\varepsilon^*$ and the span of $N$ columns of $O$ (for uniformly random $O$) have the same distribution, given by the uniform measure on all $N$-dimensional subspaces of the $(T-1)$-dimensional space.

Proof of Proposition 12. We start by explicitly constructing the desired coupling. The eigenvalues of $F_V$ are all roots of unity of order $T$ different from 1. If $\beta=2$, then we can diagonalize $F_V$ to turn it into a $\bar T\times\bar T=(T-1)\times(T-1)$ diagonal matrix with the roots of unity on the diagonal. If $\beta=1$, the matrix $F_V$ should be block-diagonalized (with blocks of size 2 and one additional block of size 1 corresponding to the eigenvalue $-1$ if $T$ is even): the pair of complex conjugate roots of unity $\omega$ and $\bar\omega$ gives rise to the $2\times2$ rotation matrix by the angle $|\arg(\omega)|$. Let us denote through $D$ the resulting (block) diagonal matrix multiplied by $-1$, with the blocks ordered by increasing $|\arg(-\omega)|$, i.e., the top-left $2\times2$ block of $D$ corresponds to the pair of eigenvalues of $D$ closest to 1.

The eigenvalues of $O$ also lie on the unit circle, and if $\beta=1$ they come in complex conjugate pairs. Hence, $O$ can be similarly block-diagonalized (we do not need to multiply by $-1$ this time), and we denote through $D^{\mathrm{rand}}$ the result. The distinction with $F_V$ is that the eigenvalues are now random, and so is $D^{\mathrm{rand}}$. The law of the eigenvalues of $O$ is explicitly known in the random matrix literature.
Both for $\beta=1$ and $\beta=2$ they form a determinantal point process on the unit circle with an explicit kernel. The repulsion between the eigenvalues leads to them being very close to evenly spaced as $\bar T\to\infty$. We summarize this property in the following statement (which is a manifestation of a much more general rigidity of eigenvalues, see, e.g., Erdos and Yau (2012)), whose proof can be found in Meckes and Meckes (2013, Lemma 10 with $m=1$ and $u=\bar T^\delta$, and Section 5).

Claim. There exist constants $c_1(\beta),c_2(\beta)>0$, such that for $\beta=1,2$ and every $\delta>0$, there exists $T_0(\delta)$ such that for every $\bar T>T_0(\delta)$ we have
(57) $\mathrm{Prob}\Bigl(\max_{1\le i,j\le\bar T}\bigl|D-D^{\mathrm{rand}}\bigr|_{ij}>\bar T^{\delta-1}\Bigr)<c_1(\beta)\cdot\bar T\cdot\exp\Bigl(-c_2(\beta)\,\frac{\bar T^\delta}{\log\bar T}\Bigr).$

We remark that since $D$ and $D^{\mathrm{rand}}$ are block-diagonal, the bound on the maximum matrix element of their difference is equivalent to a similar bound for any other norm, e.g., for the maximum absolute value among the eigenvalues of $D-D^{\mathrm{rand}}$.

We now choose another $\bar T\times\bar T$ uniformly random orthogonal (or unitary if $\beta=2$) matrix $O'$ (independent from the rest), replace $-\widetilde OF_V\widetilde O^*$ with $O'D(O')^*$ and replace $O$ with $O'D^{\mathrm{rand}}(O')^*$. The invariance of the uniform measure on the orthogonal group $SO(\bar T)$ (or on the unitary group $U(\bar T)$ if $\beta=2$) with respect to right/left multiplications implies the distributional identities
$$-\widetilde OF_V\widetilde O^*\overset{d}{=}O'D(O')^*,\qquad O\overset{d}{=}O'D^{\mathrm{rand}}(O')^*.$$

All the constants can be made explicit, following Meckes and Meckes (2013).

The right-hand sides of the identities provide the desired coupling, and (57) implies that these two random matrices are close to each other as $\bar T\to\infty$.
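The rigidity phenomenon behind (57) is easy to observe in a simulation. The following sketch (ours, for illustration only; the unitary case $\beta=2$ is used since its eigenvalues need no block-diagonalization) samples a Haar unitary matrix and measures how far its sorted eigenangles are from an evenly spaced grid after removing the random global rotation:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
# Haar-distributed unitary matrix via QR with the standard phase correction.
A = rng.standard_normal((T, T)) + 1j * rng.standard_normal((T, T))
Q, R = np.linalg.qr(A)
Q = Q * (np.diag(R) / np.abs(np.diag(R)))

theta = np.sort(np.angle(np.linalg.eigvals(Q)))     # eigenangles in (-pi, pi]
grid = -np.pi + 2 * np.pi * (np.arange(T) + 0.5) / T
# Rigidity: after subtracting the small residual rotation, each eigenangle
# is within roughly O(log T / T) of the corresponding grid point.
shift = np.mean(theta - grid)
dev = np.max(np.abs(theta - grid - shift))
print(f"max deviation = {dev:.4f}")                 # typically of order log(T)/T
assert dev < 0.2
```

For comparison, $T$ i.i.d. uniform angles would deviate from the grid at the much larger scale $T^{-1/2}$; the repulsion is what makes the coupling of Proposition 12 possible.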
Both matrices $M$ and $\widetilde M$ are obtained from the above two matrices by the following mechanism.

Map $M(Z)$: Given an orthogonal (or unitary if $\beta=2$) matrix $Z$ in the $\bar T$-dimensional space, we define $U(Z)=(I_{\bar T}+Z)^{-1}$ and an $N\times N$ symmetric (or Hermitian) matrix $M(Z)$ through
(58) $M(Z)=[U(Z)]_{NN}\bigl([U^*(Z)U(Z)]_{NN}\bigr)^{-1}[U^*(Z)]_{NN}.$

Clearly, under our coupling,
$$M=M\bigl(O'D^{\mathrm{rand}}(O')^*\bigr),\qquad \widetilde M=M\bigl(O'D(O')^*\bigr),$$
and it remains to prove a form of uniform continuity of the map $Z\mapsto M(Z)$ as $N,\bar T\to\infty$.

We note that
$$2U(Z)-I_{\bar T}=(I_{\bar T}-Z)(I_{\bar T}+Z)^{-1}$$
is a skew-symmetric matrix, and we use it to write $2U(Z)$ in the block form according to the splitting $\bar T=N+(\bar T-N)$:
$$2U(Z)=\begin{pmatrix}I_N+A(Z)&B(Z)\\ -B^*(Z)&I_{\bar T-N}+C(Z)\end{pmatrix},\qquad A^*(Z)=-A(Z),\quad C^*(Z)=-C(Z).$$
Then, as in the proof of Theorem 6, we have
(59) $M(Z)=\bigl(I_N+\widetilde B(Z)\widetilde B^*(Z)\bigr)^{-1},\qquad \widetilde B(Z)=\bigl(I_N-A(Z)\bigr)^{-1}B(Z),$
and our statement boils down to the uniform continuity of the map $Z\mapsto\widetilde B(Z)$. We remark that in our limit regime the matrix $B(Z)B^*(Z)$ might have large eigenvalues (in fact, of order $\bar T^2$), and therefore $B(Z)$ is potentially an exploding factor in the definition of $\widetilde B(Z)$. However, the exploding parts precisely cancel with similarly growing parts of $(I_N-A(Z))^{-1}$. In order to see that, we use the formula for the inverse of a block matrix (with generic blocks $A$, $B$, $C$, $D$), which reads
$$\begin{pmatrix}A&B\\ C&D\end{pmatrix}^{-1}=\begin{pmatrix}A^{-1}+A^{-1}BQCA^{-1}&-A^{-1}BQ\\ -QCA^{-1}&Q\end{pmatrix},\qquad Q=(D-CA^{-1}B)^{-1}.$$
The advantage of this formula is that when it is used for $2U(Z)$, the ratio
$$-CA^{-1}=B^*(Z)\bigl(I_N+A(Z)\bigr)^{-1}=\widetilde B^*(Z)$$
gets expressed as $Q^{-1}\bigl(-QCA^{-1}\bigr)$, which is the ratio of two blocks in $U(Z)^{-1}=I_{\bar T}+Z$.
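Identity (59) holds deterministically for every orthogonal $Z$ with $I+Z$ invertible, so it can be verified numerically; the sketch below (our own, with hypothetical variable names) draws a random special orthogonal $Z$ and compares (58) with (59):

```python
import numpy as np

rng = np.random.default_rng(3)
Tb, N = 40, 6                       # \bar T and N
A0 = rng.standard_normal((Tb, Tb))
Q, R = np.linalg.qr(A0)
Z = Q * np.sign(np.diag(R))
if np.linalg.det(Z) < 0:            # force determinant +1, so I + Z is invertible
    Z[:, [0, 1]] = Z[:, [1, 0]]

U = np.linalg.inv(np.eye(Tb) + Z)
M = U[:N, :N] @ np.linalg.inv((U.T @ U)[:N, :N]) @ U.T[:N, :N]   # Eq. (58)

X = 2 * U - np.eye(Tb)              # Cayley transform: skew-symmetric
assert np.allclose(X, -X.T)
A = X[:N, :N]
B = X[:N, N:]
Btil = np.linalg.solve(np.eye(N) - A, B)
M59 = np.linalg.inv(np.eye(N) + Btil @ Btil.T)                   # Eq. (59)
assert np.allclose(M, M59)
print("ok")
```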
Thus, if we write $I_{\bar T}+Z$ itself in the block form according to the $\bar T=N+(\bar T-N)$ splitting,
$$I_{\bar T}+Z=\begin{pmatrix}I_N+Z_{11}&Z_{12}\\ Z_{21}&I_{\bar T-N}+Z_{22}\end{pmatrix},$$
then
$$\widetilde B^*(Z)=Q^{-1}\bigl(-QCA^{-1}\bigr)=(I_{\bar T-N}+Z_{22})^{-1}Z_{21},\qquad M(Z)=\Bigl(I_N+Z_{21}^*\bigl((I_{\bar T-N}+Z_{22})(I_{\bar T-N}+Z_{22}^*)\bigr)^{-1}Z_{21}\Bigr)^{-1}.$$

(The same observation can be used to link Theorems 6 and 7 to each other.)

Hence, the matrix $M(Z)$ has the same (other than 1) eigenvalues as
$$M'(Z)=\Bigl(I_{\bar T-N}+\bigl((I_{\bar T-N}+Z_{22})(I_{\bar T-N}+Z_{22}^*)\bigr)^{-1}Z_{21}Z_{21}^*\Bigr)^{-1}.$$
The latter can be simplified, since $Z$ is orthogonal, implying $Z_{21}Z_{21}^*+Z_{22}Z_{22}^*=I_{\bar T-N}$:
(60) $M'(Z)=\Bigl(I_{\bar T-N}+\bigl(I_{\bar T-N}+Z_{22}+Z_{22}^*+Z_{22}Z_{22}^*\bigr)^{-1}\bigl(I_{\bar T-N}-Z_{22}Z_{22}^*\bigr)\Bigr)^{-1}$
$\phantom{(60) M'(Z)}=\Bigl(\bigl(I_{\bar T-N}+Z_{22}+Z_{22}^*+Z_{22}Z_{22}^*\bigr)^{-1}\bigl(2\,I_{\bar T-N}+Z_{22}+Z_{22}^*\bigr)\Bigr)^{-1}$
$\phantom{(60) M'(Z)}=\bigl(2\,I_{\bar T-N}+Z_{22}+Z_{22}^*\bigr)^{-1}\bigl(I_{\bar T-N}+Z_{22}+Z_{22}^*+Z_{22}Z_{22}^*\bigr).$

At this point we rely on the following lemma, which is proven below.

Lemma 13. Fix any small $\upsilon>0$. If $Z$ is a uniformly random element of the group $SO(\bar T)$ of orthogonal matrices of determinant 1 (or of $U(\bar T)$ if $\beta=2$), then there exists $r>0$ such that, with probability tending to 1 (uniformly) as $\bar T,N\to\infty$ in such a way that $1+\upsilon<\bar T/N<\upsilon^{-1}$, the smallest eigenvalue of $2\,I_{\bar T-N}+Z_{22}+Z_{22}^*$ is larger than $r$.

Now we are ready to compare $M'(Z)$ with $M'(\widetilde Z)$, where $Z=O'D^{\mathrm{rand}}(O')^*$ and $\widetilde Z=O'D(O')^*$. Fix $\delta>0$. By (57), with probability tending to 1 as $N,\bar T\to\infty$, $\|Z-\widetilde Z\|<\bar T^{\delta-1}$, where $\|\cdot\|$ is the spectral norm of a matrix (i.e., its largest singular value). Hence, since cutting corners of a matrix only decreases the spectral norm, with probability tending to 1,
(61) $\bigl\|\bigl(I_{\bar T-N}+Z_{22}+Z_{22}^*+Z_{22}Z_{22}^*\bigr)-\bigl(I_{\bar T-N}+\widetilde Z_{22}+\widetilde Z_{22}^*+\widetilde Z_{22}\widetilde Z_{22}^*\bigr)\bigr\|<4\,\bar T^{\delta-1}.$
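The passage from $M(Z)$ to $M'(Z)$ is again a deterministic identity: $M'(Z)$ carries the $N$ eigenvalues of $M(Z)$ together with $\bar T-2N$ extra eigenvalues equal to 1. A numerical sketch (ours, not from the paper) confirming (60):

```python
import numpy as np

rng = np.random.default_rng(4)
Tb, N = 30, 5
A0 = rng.standard_normal((Tb, Tb))
Q, R = np.linalg.qr(A0)
Z = Q * np.sign(np.diag(R))
if np.linalg.det(Z) < 0:                 # force determinant +1
    Z[:, [0, 1]] = Z[:, [1, 0]]

U = np.linalg.inv(np.eye(Tb) + Z)
M = U[:N, :N] @ np.linalg.inv((U.T @ U)[:N, :N]) @ U.T[:N, :N]   # Eq. (58)

Z21, Z22 = Z[N:, :N], Z[N:, N:]
I2 = np.eye(Tb - N)
G = I2 + Z22 + Z22.T + Z22 @ Z22.T       # = (I + Z22)(I + Z22^T)
Mp = np.linalg.solve(2 * I2 + Z22 + Z22.T, G)                    # Eq. (60)

lamM = np.sort(np.linalg.eigvalsh(M))
lamMp = np.sort(np.linalg.eigvals(Mp).real)
# M'(Z): the N eigenvalues of M(Z) plus (Tb - 2N) eigenvalues equal to 1.
assert np.allclose(lamMp[:N], lamM, atol=1e-8)
assert np.allclose(lamMp[N:], 1.0, atol=1e-8)
print("ok")
```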
Simultaneously, by Lemma 13, the spectral norm of $(2\,I_{\bar T-N}+Z_{22}+Z_{22}^*)^{-1}$ is bounded from above by $r^{-1}$ (and, hence, a similar bound holds with $Z$ replaced by $\widetilde Z$). We conclude that the part of the increment $M'(Z)-M'(\widetilde Z)$ due to the change of the numerator in the right-hand side of (60) is bounded from above (in spectral norm) by
(62) $\mathrm{const}\cdot\bar T^{\delta-1}.$
For the denominator in (60), we use the matrix identity
(63) $(G+\Delta)^{-1}-G^{-1}=-(G+\Delta)^{-1}\Delta G^{-1}$
with $G=2\,I_{\bar T-N}+Z_{22}+Z_{22}^*$ and $\Delta=\widetilde Z_{22}+\widetilde Z_{22}^*-Z_{22}-Z_{22}^*$. By (57), $\|\Delta\|<2\,\bar T^{\delta-1}$ with probability tending to 1. Hence, using Lemma 13, we upper bound the spectral norm of (63) by $\mathrm{const}\cdot r^{-2}\,\bar T^{\delta-1}$. Since the spectral norm of the numerator in the right-hand side of (60) is at most 4, we conclude that the part of the increment $M'(Z)-M'(\widetilde Z)$ due to the change of the denominator also admits a bound of the form (62). Summing up, we have shown that with probability tending to 1,
$$\bigl\|M'\bigl(O'D^{\mathrm{rand}}(O')^*\bigr)-M'\bigl(O'D(O')^*\bigr)\bigr\|<\mathrm{const}\cdot\bar T^{\delta-1}.$$
This finishes the proof of Proposition 12. $\square$

Proof of Lemma 13. We would like to show that for some deterministic $r_1>0$, with probability tending to 1 as $N,\bar T\to\infty$,
(64) $\Bigl\|\frac{Z_{22}+Z_{22}^*}{2}\Bigr\|<1-r_1$
(which yields the lemma with $r=2r_1$). For that we denote $H=\frac{Z+Z^*}{2}$ and notice that $H$ is a Hermitian matrix with spectrum between $-1$ and $1$, whose eigenvalue distribution is again a Jacobi ensemble (up to an affine change of variables). The law of large numbers for the latter (reviewed in Theorem 14) implies that as $\bar T\to\infty$, the empirical measure of the eigenvalues, $\frac1{\bar T}\sum_{i=1}^{\bar T}\delta_{\lambda_i(H)}$, converges (weakly in probability) to a certain explicit deterministic measure on $[-1,1]$ with a density $p(x)$. The matrix $\frac{Z_{22}+Z_{22}^*}{2}$ is a $(\bar T-N)\times(\bar T-N)$ corner of $H$.
The largest eigenvalue of the matrix $\frac{Z_{22}+Z_{22}^*}{2}$ has the following representation:
(65) $\lambda_1\Bigl(\frac{Z_{22}+Z_{22}^*}{2}\Bigr)=\max_{u\in\langle e_{N+1},\dots,e_{\bar T}\rangle,\ |u|=1}\Bigl(\sum_{i=1}^{\bar T}\lambda_i(H)\bigl|(u,v_i(H))\bigr|^2\Bigr),$
where $\lambda_i(H)$ is the $i$th largest eigenvalue of $H$, $v_i(H)$ is the corresponding eigenvector, and $(u,v_i(H))$ is the scalar product with this eigenvector. Since the vectors $v_i$ are orthonormal, $\sum_{i=1}^{\bar T}|(u,v_i(H))|^2=1$. On the other hand, all $\lambda_i(H)$ are not greater than 1, and only few of them are close to 1. Hence, (64) would follow if we manage to show that the sum in (65) is not concentrated on the few largest $\lambda_i(H)$. To show that, it suffices to upper-bound $\sum_{i\le k}|(u,v_i(H))|^2$, which we now do.

Let us fix $k$ with $k<\min(N,\bar T-N)$ and upper bound (65) by
(66) $\max_{u\in\langle e_{N+1},\dots,e_{\bar T}\rangle,\ |u|=1}\Bigl(\sum_{i=1}^k|(u,v_i(H))|^2+\lambda_{k+1}(H)\sum_{i=k+1}^{\bar T}|(u,v_i(H))|^2\Bigr)$
$\phantom{(66)}=\lambda_{k+1}(H)+\bigl(1-\lambda_{k+1}(H)\bigr)\max_{u\in\langle e_{N+1},\dots,e_{\bar T}\rangle,\ |u|=1}\Bigl(\sum_{i=1}^k|(u,v_i(H))|^2\Bigr).$

A sequence of random probability measures $\mu_n$ converges weakly in probability to a measure $\mu$ if for each bounded continuous function $f(x)$, the random variables $\int f(x)\mu_n(dx)$ converge in probability to $\int f(x)\mu(dx)$.

Since the law of the matrix $H$ is invariant under conjugations with orthogonal (or unitary) matrices, the eigenvectors $v_1(H),v_2(H),\dots,v_{\bar T}(H)$ are uniformly distributed, i.e., the $\bar T\times\bar T$ matrix formed by them is a uniformly random orthogonal (or unitary) matrix. Hence,
$$\max_{u\in\langle e_{N+1},\dots,e_{\bar T}\rangle,\ |u|=1}\Bigl(\sum_{i=1}^k|(u,v_i(H))|^2\Bigr)$$
is the maximal eigenvalue of the $(\bar T-N)\times(\bar T-N)$ corner of the projector in the $\bar T$-dimensional space onto a uniformly random $k$-dimensional subspace. If we denote through $P_k$ and $P_{\bar T-N}$ the projectors onto the first $k$ and the last $\bar T-N$ basis vectors, respectively, and take a uniformly random orthogonal (or unitary if $\beta=2$) $\bar T\times\bar T$ matrix $O$, then we deal with the maximal eigenvalue of $P_{\bar T-N}OP_kO^*P_{\bar T-N}$. The maximal eigenvalue of this product is the same as that of $OP_kO^*P_{\bar T-N}$, or of $P_kO^*P_{\bar T-N}O$, or of $P_kO^*P_{\bar T-N}OP_k$. Equivalently, this is the maximal eigenvalue of the $k\times k$ corner $Y$ of a projector onto a uniformly random $(\bar T-N)$-dimensional subspace. Such a $k\times k$ matrix $Y$ is distributed as the Jacobi ensemble with density proportional to
(67) $\det(Y)^{\frac\beta2(\bar T-N-k+1)-1}\det(1-Y)^{\frac\beta2(N-k+1)-1}\,dY,$
see Forrester (2010, (3.113) and the following formula). We now choose $k=\lfloor N/2\rfloor$. Then the largest eigenvalue of the ensemble (67) converges as $N,\bar T\to\infty$ in probability (see, e.g., Johnstone (2008), or Anderson et al. (2010, Section 2.6.2), or Holcomb and Moreno Flores (2012)) to a deterministic number $0<c<1$, depending on the ratio $\bar T/N$. Simultaneously, $\lambda_{k+1}(H)$ converges as $\bar T,N\to\infty$ to another deterministic number $0<c'<1$, due to the aforementioned convergence of the empirical measure of $H$. Hence, (66) implies that the largest eigenvalue of $\frac{Z_{22}+Z_{22}^*}{2}$ is bounded away from 1; the smallest eigenvalue is bounded away from $-1$ by the symmetric argument, and (64) follows. $\square$

Remark 4. Although we are not pursuing this direction, it is possible to upgrade the above argument to the case when $N,\bar T\to\infty$ in such a way that $\bar T/N\to\infty$. Then $r$ tends to 0, but we can find the speed of convergence (and then propagate it to the precise bound in Proposition 12).
For that we would need to analyze the largest eigenvalue of (67) more precisely (when $\bar T/N\to\infty$, the Jacobi ensemble concentrates near 1 and degenerates into the Wishart ensemble after a proper rescaling), and also to use exact asymptotics of $\lambda_{k+1}(H)$ (which can be obtained from the local law of the Jacobi ensemble for $H$ near 1).

We now have all the ingredients for Theorem 4 (as well as for its $\beta=2$ version).

Proof of Theorem 4. By Theorem 6, the eigenvalues $x_1\ge\dots\ge x_N$ of the Jacobi ensemble $J\bigl(N;\frac{3N-1}2,\frac{\bar T-N}2\bigr)$ have the same distribution as those of $M$ defined in (56). On the other hand, by Proposition 11 the eigenvalues $\lambda_1\ge\dots\ge\lambda_N$ of the matrix $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ have the same distribution as those of $\widetilde M$ defined in (53). Hence, Proposition 12 gives the desired statement. $\square$

In this section we review the asymptotic results for the Jacobi ensemble $J(N;p,q)$ introduced in Definition 3 as $N\to\infty$. We assume that as $N\to\infty$, also $p,q\to\infty$, in such a way that
(68) $p-N=\tfrac12(\mathsf p-1)N,\quad \mathsf p\ge1,\qquad q-N=\tfrac12(\mathsf q-1)N,\quad \mathsf q\ge1,$
where $\mathsf p$ and $\mathsf q$ are two parameters which stay bounded away from 1 and from $\infty$ as $N\to\infty$. We further define the equilibrium measure $\mu_{\mathsf p,\mathsf q}$ of the Jacobi ensemble through
(69) $\mu_{\mathsf p,\mathsf q}(x)\,dx=\frac{\mathsf p+\mathsf q}{2\pi}\cdot\frac{\sqrt{(x-\lambda_-)(\lambda_+-x)}}{x(1-x)}\,\mathbf 1_{[\lambda_-,\lambda_+]}\,dx,$
where the support $[\lambda_-,\lambda_+]$ of the measure is defined via
(70) $\lambda_\pm=\frac{1}{(\mathsf p+\mathsf q)^2}\Bigl(\sqrt{\mathsf p(\mathsf p+\mathsf q-1)}\pm\sqrt{\mathsf q}\Bigr)^2.$
One can check that $0<\lambda_-<\lambda_+<1$ when $\mathsf p,\mathsf q>1$. The law (69) is known as the Wachter distribution. Further, define
(71) $c_\pm=\frac{\mathsf p+\mathsf q}{2}\cdot\frac{\sqrt{\lambda_+-\lambda_-}}{\lambda_\pm(1-\lambda_\pm)},$
and note that
$$\mu_{\mathsf p,\mathsf q}(x)\approx\frac{c_\pm}{\pi}\sqrt{|x-\lambda_\pm|},\qquad\text{as }x\to\lambda_\pm\text{ inside }[\lambda_-,\lambda_+],$$
where the normalization $\frac{c_\pm}{\pi}\sqrt{|x-\lambda_\pm|}$ was chosen to match the behavior of the Wigner semicircle law $\frac1{2\pi}\sqrt{4-x^2}$ near the edges $\pm2$.

Theorem 14. Suppose that $N,p,q\to\infty$ in such a way that $\mathsf p\ge1$ and $\mathsf q\ge1$ in (68) stay bounded.
For the second conclusion we additionally assume that $\mathsf q$ is bounded away from 1, and for the third conclusion we additionally require $\mathsf p$ to be bounded away from 1. Let $x_1\ge x_2\ge\dots\ge x_N$ be the $N$ random eigenvalues of the Jacobi ensemble $J(N;p,q)$. Then:

(I) $\displaystyle\lim_{N\to\infty}\Bigl|\frac1N\sum_{i=1}^N\delta_{x_i}-\mu_{\mathsf p,\mathsf q}\Bigr|=0$, weakly in probability. This means that for any continuous function $f(x)$ we have convergence in probability:
(72) $\displaystyle\lim_{N\to\infty}\Bigl|\frac1N\sum_{i=1}^Nf(x_i)-\int f(x)\,\mu_{\mathsf p,\mathsf q}(x)\,dx\Bigr|=0.$

(II) For $\{a_i\}_{i=1}^\infty$ as in Proposition 1, we have convergence in finite-dimensional distributions for the largest eigenvalues:
(73) $\displaystyle\lim_{N\to\infty}\bigl\{N^{2/3}c_+^{2/3}(x_i-\lambda_+)\bigr\}_{i=1}^\infty\to\{a_i\}_{i=1}^\infty.$
In particular, $N^{2/3}c_+^{2/3}(x_1-\lambda_+)$ converges to the Tracy–Widom distribution $F_\beta$.

(III) We also have convergence in distribution for the smallest eigenvalues:
(74) $\displaystyle\lim_{N\to\infty}\bigl\{N^{2/3}c_-^{2/3}(\lambda_--x_{N+1-i})\bigr\}_{i=1}^\infty\to\{a_i\}_{i=1}^\infty.$

The precise shifts in (68) relating $p,q$ to $\mathsf p,\mathsf q$ are tied to the choice of notations in (16). The following theorems would work without this shift as well; however, the shift leads to a faster speed of convergence, cf. Johnstone (2008, discussion before Theorem 1).

The first proof of (72) appeared in Wachter (1980); for other proofs see Bai and Silverstein (2010) and Dumitriu and Paquette (2012); we follow the notations of the last reference. The asymptotics (73) and (74) can be found in Johnstone (2008), and they are a manifestation of the universality of the distributions of the largest/smallest eigenvalues of random matrices, which holds in much wider generality; see, e.g., Deift and Gioev (2007), Erdos and Yau (2012), Tao and Vu (2012).

As an intermediate step we prove the following theorem.

Theorem 15.
Suppose that $T,N\to\infty$ in such a way that $T/N>3$ remains bounded away from 3 and from $\infty$. Let $\lambda_1\ge\dots\ge\lambda_N$ be the eigenvalues of the matrix $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ defined as in Theorem 2 and Proposition 11. Then for $\{a_i\}_{i=1}^\infty$ as in Proposition 1, we have convergence in finite-dimensional distributions for the largest eigenvalues:
(75) $\displaystyle\lim_{N\to\infty}\bigl\{N^{2/3}c_+^{2/3}(\lambda_i-\lambda_+)\bigr\}_{i=1}^\infty\to\{a_i\}_{i=1}^\infty,$
where
(76) $\lambda_\pm=\frac{1}{(\mathsf p+\mathsf q)^2}\Bigl[\sqrt{\mathsf p(\mathsf p+\mathsf q-1)}\pm\sqrt{\mathsf q}\Bigr]^2,\qquad c_+=\frac{(\mathsf p+\mathsf q)\sqrt{\lambda_+-\lambda_-}}{2\,\lambda_+(1-\lambda_+)},\qquad \mathsf p=2-\frac1N,\quad \mathsf q=\frac TN-2-\frac1N.$

Proof. Theorem 4 implies that the asymptotics of the largest eigenvalues $\lambda_1,\lambda_2,\dots$ is the same as that of the largest eigenvalues of $J\bigl(N;\frac{3N-1}2,\frac{\bar T-N}2\bigr)$. For the latter we use Theorem 14 with $\mathsf p=2-\frac1N$ and $\mathsf q=\frac TN-2-\frac1N$. $\square$

The limiting processes $\{a_i\}_{i=1}^\infty$ arising for the largest and for the smallest eigenvalues are independent.

Proof of Theorem 2. We use Theorem 15 and the fact that for small $x$,
$$\ln\bigl(1-(\lambda_++x)\bigr)=\ln(1-\lambda_+)-\frac{x}{1-\lambda_+}+o(x).\qquad\square$$

In this section we explain how the proof of Theorem 5 is obtained. We follow the notations of Section 9.2 for the deterministic operators and spaces. We take a uniformly random orthogonal (or unitary if $\beta=2$) operator $\widetilde O$ acting in the $(T-1)$-dimensional space $V$ and define an operator $\widetilde W$ acting in $V$:
$$\widetilde W=I_V-\widetilde O(F_V)^{-1}\widetilde O^*.$$
Let us introduce a symmetric (or Hermitian if $\beta=2$) $N\times N$ matrix $\widehat M$ through
(77) $\widehat M=[\widetilde W]^V_{NN}\bigl([\widetilde W^*\widetilde W]^V_{NN}\bigr)^{-1}[\widetilde W^*]^V_{NN}.$

Proposition 16. Choose an arbitrary positive definite covariance matrix $\Lambda$. Let $\varepsilon$ be an $N\times T$ matrix of random variables (real if $\beta=1$ and complex if $\beta=2$), such that the $T$ columns of $\varepsilon$ are i.i.d., and each of them is an $N$-dimensional mean zero Gaussian vector with covariance $\Lambda$. Fix an arbitrary $N$-dimensional vector $\mu$. Define the $N\times T$ matrix $X=(X_1,X_2,\dots,X_T)$ as in Eq. (20) via
$$X_t=\mu+\varepsilon_t,\qquad t=1,\dots,T.$$
Further set $\Delta^cX_t=X_{t+1}-X_t$ for $t=1,\dots,T-1$, and $\Delta^cX_T=X_1-X_T$. Define $N\times N$ matrices
(78) $S^{\mathrm{w.n.}}_{00}=\Delta^cX\,P\,(\Delta^cX)^*,\quad S^{\mathrm{w.n.}}_{01}=\Delta^cX\,P\,X^*,\quad S^{\mathrm{w.n.}}_{10}=X\,P\,(\Delta^cX)^*,\quad S^{\mathrm{w.n.}}_{11}=X\,P\,X^*.$
Then the eigenvalues of the matrix $S^{\mathrm{w.n.}}_{01}\bigl(S^{\mathrm{w.n.}}_{11}\bigr)^{-1}S^{\mathrm{w.n.}}_{10}\bigl(S^{\mathrm{w.n.}}_{00}\bigr)^{-1}$ have the same distribution as those of $\widehat M$ in (77).

Proof. The proof is similar to that of Proposition 11 and we omit many details. First, note that $\mu$ cancels out both in $XP$ and in $\Delta^cX$. Hence, without loss of generality we can assume $\mu=0$.

Next, we define $G=P(F^{-1}-I_T)P$, so that $\Delta^cX\,P=\varepsilon G$. Note that $P$ and $F^{-1}-I_T$ commute with each other. Using additionally $P^2=P$, we get
$$S^{\mathrm{w.n.}}_{00}=\varepsilon GG^*\varepsilon^*,\qquad S^{\mathrm{w.n.}}_{01}=\varepsilon G\varepsilon^*,\qquad S^{\mathrm{w.n.}}_{10}=\varepsilon G^*\varepsilon^*,\qquad S^{\mathrm{w.n.}}_{11}=\varepsilon P\varepsilon^*.$$
At this point, arguing as in the proof of Proposition 11, we can assume that the covariance matrix $\Lambda$ is the identity, since any other covariance matrix can be obtained at the cost of a conjugation of $S^{\mathrm{w.n.}}_{01}(S^{\mathrm{w.n.}}_{11})^{-1}S^{\mathrm{w.n.}}_{10}(S^{\mathrm{w.n.}}_{00})^{-1}$, which leaves the eigenvalues unchanged. We can further identify the non-zero eigenvalues of $S^{\mathrm{w.n.}}_{01}(S^{\mathrm{w.n.}}_{11})^{-1}S^{\mathrm{w.n.}}_{10}(S^{\mathrm{w.n.}}_{00})^{-1}$ with those of the product of two projectors $P_1P_2$ (similarly to (55)), where $P_1$ is the orthogonal projector onto the space spanned by the $N$ columns of $G\varepsilon^*$ and $P_2$ is the orthogonal projector onto the space spanned by the $N$ columns of $P\varepsilon^*$. Next, we rotate the space $V$, so that the span of the $N$ columns of $P\varepsilon^*$ turns into the span of the first $N$ basis vectors $\tilde e_1,\dots,\tilde e_N$ in $V$. At this point we arrive at (77) with $\widetilde W$ replaced by $-\widetilde W$. It remains to notice that the introduction of the prefactor $-1$ in front of $\widetilde W$ does not change the matrix (77). $\square$

Let $\bar T=T-1$ and let $O$ be a uniformly random real orthogonal matrix of determinant 1 (if $\beta=1$) or a complex unitary (if $\beta=2$) $\bar T\times\bar T$ matrix. Define $W=I_{\bar T}+O$. Then we set
(79) $M=[W]_{NN}\bigl([W^*W]_{NN}\bigr)^{-1}[W^*]_{NN}.$
Recall that for a matrix $A$ with real spectrum, $\lambda_i(A)$ is the $i$th largest eigenvalue of $A$.

Proposition 17. Assume $\bar T=T-1\ge 3N$.
One can couple $M$ from (79) with $\widehat M$ from (77) in such a way that for each $\epsilon>0$ we have
$$\mathrm{Prob}\Bigl(\max_{1\le i\le N}\bigl|\lambda_i(M)-\lambda_i(\widehat M)\bigr|<N^{\epsilon-1}\Bigr)\to1,$$
as $T,N\to\infty$ in such a way that the ratio $\bar T/N$ remains bounded.

The proof is almost identical to that of Proposition 12 and we omit it: the key idea is to notice that the only difference between $M$ and $\widehat M$ is in the replacement of $O$ by $-\widetilde O(F_V)^{-1}\widetilde O^*$; however, by the results of Meckes and Meckes (2013), the latter two matrices can be coupled so that their eigenvectors are the same, while their eigenvalues are very close to each other. Combining Proposition 16, Proposition 17, and Theorem 7, we arrive at the statement of Theorem 5.

As of May 12, 2020, the S&P100 consists of the following companies: Apple Inc., AbbVie Inc., Abbott Laboratories, Accenture, Adobe Inc., American International Group, Allstate, Amgen Inc., American Tower, Amazon.com, American Express, Boeing Co., Bank of America Corp, Biogen, The Bank of New York Mellon, Booking Holdings, BlackRock Inc, Bristol-Myers Squibb, Berkshire Hathaway, Citigroup Inc, Caterpillar Inc., Charter Communications, Colgate-Palmolive, Comcast Corp., Capital One Financial Corp., ConocoPhillips, Costco Wholesale Corp., salesforce.com, Cisco Systems, CVS Health, Chevron Corporation, DuPont de Nemours Inc., Danaher Corporation, The Walt Disney Company, Dow Inc., Duke Energy, Emerson Electric Co., Exelon, Ford Motor Company, Facebook Inc., FedEx, General Dynamics, General Electric, Gilead Sciences, General Motors, Alphabet Inc. (Class C), Alphabet Inc.
(Class A), Goldman Sachs, Home Depot, Honeywell, International Business Machines, Intel Corp., Johnson & Johnson, JPMorgan Chase & Co., Kraft Heinz, Kinder Morgan, The Coca-Cola Company, Eli Lilly and Company, Lockheed Martin, Lowe's, MasterCard Inc, McDonald's Corp, Mondelēz International, Medtronic plc, MetLife Inc., 3M Company, Altria Group, Merck & Co., Morgan Stanley, Microsoft, NextEra Energy, Netflix, Nike Inc., NVIDIA Corp., Oracle Corporation, Occidental Petroleum Corp., PepsiCo, Pfizer Inc, Procter & Gamble Co, Philip Morris International, PayPal Holdings, Qualcomm Inc., Raytheon Technologies, Starbucks Corp., Schlumberger, Southern Company, Simon Property Group, Inc., AT&T Inc, Target Corporation, Thermo Fisher Scientific, Texas Instruments, UnitedHealth Group, Union Pacific Corporation, United Parcel Service, U.S. Bancorp, Visa Inc., Verizon Communications, Walgreens Boots Alliance, Wells Fargo, Walmart, Exxon Mobil Corp.

Eight of the above companies are not available for the entire period under consideration.

References

Anderson, G., A. Guionnet, and O. Zeitouni, An Introduction to Random Matrices, Cambridge University Press, 2010.

Bai, Z. and J.W. Silverstein, Spectral analysis of large dimensional random matrices, Vol. 20, New York: Springer, 2010.

Bao, Z., G. Pan, and W. Zhou, "Central limit theorem for partial linear eigenvalue statistics of Wigner matrices," Journal of Statistical Physics, 2013, 88–129.

Bejan, A., "Largest eigenvalues and sample covariance matrices. Tracy–Widom and Painlevé II: computational aspects and realization in S-Plus with applications," 2005.

Ben Arous, G. and A. Guionnet, "Large deviations for Wigner's law and Voiculescu's non-commutative entropy," Probability Theory and Related Fields, 1997, (4), 517–542.

Borodin, A. and V. Gorin, "General β-Jacobi Corners Process and the Gaussian Free Field," Communications on Pure and Applied Mathematics, 2015, (10), 1774–1844.

Bykhovskaya, A.
, "Time Series Approach to the Evolution of Networks: Prediction and Estimation," Working Paper, 2020.

Bykhovskaya, A. and V. Gorin, "Jacobi Ensemble and Cointegration in VAR(k)," Working Paper, 2020.

Cavaliere, G., A. Rahbek, and A.R. Taylor, "Bootstrap determination of the co-integration rank in vector autoregressive models," Econometrica, 2012, (4), 1721–1740.

Deift, P. and D. Gioev, "Universality at the edge of the spectrum for unitary, orthogonal, and symplectic ensembles of random matrices," Communications on Pure and Applied Mathematics, 2007, (6), 867–910.

Dumitriu, I. and A. Edelman, "Matrix models for beta ensembles," Journal of Mathematical Physics, 2002, (11), 5830–5847.

Dumitriu, I. and E. Paquette, "Global fluctuations for linear statistics of β-Jacobi ensembles," Random Matrices: Theory and Applications, 2012, (04), 1250013.

Elliott, G., T.J. Rothenberg, and J.H. Stock, "Efficient tests for an autoregressive unit root," Econometrica, 1996, (4), 813–836.

Engle, R.F. and C.W. Granger, "Co-integration and error correction: representation, estimation, and testing," Econometrica, 1987, (2), 251–276.

Erdos, L. and H.T. Yau, "Universality of local spectral statistics of random matrices," Bulletin of the American Mathematical Society, 2012, (3), 377–414.

Forrester, P.J., "The spectrum edge of random matrix ensembles," Nuclear Physics B, 1993, (3), 709–728.

Forrester, P.J., Log-gases and random matrices, Princeton University Press, 2010.

Gonzalo, J. and J.Y. Pitarakis, "Dimensionality effect in cointegration analysis," in "Cointegration, Causality, and Forecasting. A Festschrift in Honour of Clive W.J. Granger," Oxford: Oxford University Press, 1999, chapter 9, pp. 212–229.

Granger, C.W., "Some properties of time series data and their use in econometric model specification," Journal of Econometrics, 1981, (1), 121–130.

Guionnet, A. and J.
Novak, "Asymptotics of unitary multimatrix models: The Schwinger–Dyson lattice and topological recursion," Journal of Functional Analysis, 2015, (10), 2851–2905.

Harvey, D.I., S.J. Leybourne, and A.R. Taylor, "Unit root testing in practice: dealing with uncertainty over the trend and initial condition," Econometric Theory, 2009, (3), 587–636.

Ho, M.S. and B.E. Sørensen, "Finding cointegration rank in high dimensional systems using the Johansen test: an illustration using data based Monte Carlo simulations," The Review of Economics and Statistics, 1996, (4), 726–732.

Holcomb, D. and G.R. Moreno Flores, "Edge scaling of the β-Jacobi ensemble," Journal of Statistical Physics, 2012, (9), 1136–1160.

Hua, L., Harmonic analysis of functions of several complex variables in the classical domains, Vol. 6, American Mathematical Society, 1963.

Johansen, S., "Statistical Analysis of Cointegrating Vectors," Journal of Economic Dynamics and Control, 1988, (2–3), 231–254.

Johansen, S., "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models," Econometrica, 1991, 1551–1580.

Johansen, S., "A small sample correction for the test of cointegrating rank in the vector autoregressive model," Econometrica, 2002, (5), 1929–1961.

Johansson, K., "On fluctuations of eigenvalues of random Hermitian matrices," Duke Mathematical Journal, 1998, (1), 151–204.

Johnstone, I., "Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy–Widom limits and rates of convergence," Annals of Statistics, 2008, (6), 2638–2716.

Kadell, K.W., "The Selberg–Jack symmetric functions," Advances in Mathematics, 1997, (1), 33–102.

Killip, R. and I. Nenciu, "Matrix models for circular ensembles," International Mathematics Research Notices, 2004, (50), 2665–2701.

Krauss, C., "Statistical arbitrage pairs trading strategies: Review and outlook," Journal of Economic Surveys, 2017, (2), 513–545.

Maddala, G.S.
and In-Moo Kim, Unit Roots, Cointegration, and Structural Change, Cambridge University Press, 1998.

Meckes, E. and M. Meckes, "Spectral measures of powers of random matrices," Electronic Communications in Probability, 2013.

Mehta, M.L., Random matrices, Elsevier, 2004.

Mingo, J.A. and M. Popa, "Real second order freeness and Haar orthogonal matrices," Journal of Mathematical Physics, 2013, (5), 051701.

Muirhead, R.J., Aspects of multivariate statistical theory, John Wiley & Sons, 2009.

Müller, U.K. and G. Elliott, "Tests for unit roots and the initial condition," Econometrica, 2003, (4), 1269–1286.

Onatski, A. and C. Wang, "Alternative asymptotics for cointegration tests in large VARs," Econometrica, 2018, (4), 1465–1478.

Onatski, A. and C. Wang, "Extreme canonical correlations and high-dimensional cointegration analysis," Journal of Econometrics, 2019.

O'Rourke, S., "Gaussian Fluctuations of Eigenvalues in Wigner Random Matrices," Journal of Statistical Physics, 2010, (6), 1045–1066.

Phillips, P.C.B. and S. Ouliaris, "Asymptotic properties of residual based tests for cointegration," Econometrica, 1990, (1), 165–193.

Ramirez, J., B. Rider, and B. Virág, "Beta ensembles, stochastic Airy spectrum, and a diffusion," Journal of the American Mathematical Society, 2011, (4), 919–944.

Reinsel, G.C. and S.K. Ahn, "Vector autoregressive models with unit roots and reduced rank structure: Estimation, likelihood ratio test, and forecasting," Journal of Time Series Analysis, 1992, (4), 353–375.

Sodin, S., "Several applications of the moment method in random matrix theory," Proceedings of the International Congress of Mathematicians, 2014.

Stock, J.H. and M.W. Watson, "Testing for common trends," Journal of the American Statistical Association, 1988, (404), 1097–1107.

Swensen, A.R., "Bootstrap Algorithms for Testing and Determining the Cointegration Rank in VAR Models," Econometrica, 2006, (6), 1699–1714.

Tao, T. and V.
Vu, "Random matrices: the universality phenomenon for Wigner ensembles," Modern Aspects of Random Matrix Theory, 2012, 121–172.

Tracy, C.A. and H. Widom, "On orthogonal and symplectic matrix ensembles," Communications in Mathematical Physics, 1996, (3), 727–754.

Wachter, K.W., "The limiting empirical measure of multiple discriminant ratios," Annals of Statistics, 1980, (5), 937–957.

Xiao, Z. and P.C.B. Phillips, "Efficient detrending in cointegrating regression," Econometric Theory, 1999, (4), 519–548.

University of Wisconsin–Madison

E-mail address, A.B.: [email protected]

E-mail address, V.G.: