Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations
arXiv preprint [math.PR]

Zongming Ma∗ and Fan Yang†
Department of Statistics, University of Pennsylvania
February 9, 2021
Abstract
Consider two random vectors $\tilde{x} \in \mathbb{R}^p$ and $\tilde{y} \in \mathbb{R}^q$ of the forms $\tilde{x} = Az + C_1^{1/2}x$ and $\tilde{y} = Bz + C_2^{1/2}y$, where $x \in \mathbb{R}^p$, $y \in \mathbb{R}^q$ and $z \in \mathbb{R}^r$ are independent random vectors with i.i.d. entries of zero mean and unit variance, $C_1$ and $C_2$ are $p \times p$ and $q \times q$ deterministic population covariance matrices, and $A$ and $B$ are $p \times r$ and $q \times r$ deterministic factor loading matrices. With $n$ independent observations of $(\tilde{x}, \tilde{y})$, we study the sample canonical correlations between $\tilde{x}$ and $\tilde{y}$. We consider the high-dimensional setting with finite rank correlations, that is, $p/n \to c_1$ and $q/n \to c_2$ as $n \to \infty$ for some constants $c_1 \in (0,1)$ and $c_2 \in (0, 1-c_1)$, and $r$ is a fixed integer. Let $t_1 \ge t_2 \ge \cdots \ge t_r \ge 0$ denote the squares of the population canonical correlation coefficients between $\tilde{x}$ and $\tilde{y}$, and let $\tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_{p \wedge q} \ge 0$ denote the squares of the sample canonical correlation coefficients. If $x$, $y$ and $z$ are i.i.d. Gaussian, then the following dichotomy has been shown in [7] for a fixed threshold $t_c \in (0,1)$: for any $1 \le i \le r$, if $t_i < t_c$, then $\tilde\lambda_i$ converges to the right edge $\lambda_+$ of the limiting eigenvalue spectrum of the sample canonical correlation matrix, and moreover, $n^{2/3}(\tilde\lambda_i - \lambda_+)$ converges weakly to the Tracy–Widom law; if $t_i > t_c$, then $\tilde\lambda_i$ converges to a deterministic limit $\theta_i \in (\lambda_+, 1)$ that is determined by $c_1$, $c_2$ and $t_i$. In this paper, we prove that these results hold universally under sharp fourth moment conditions on the entries of $x$, $y$ and $z$. Moreover, we prove the results in full generality, in the sense that they also hold for near-degenerate $t_i$'s and for $t_i$'s that are close to the threshold $t_c$. Finally, we also provide almost sharp convergence rates for the sample canonical correlation coefficients under a general $a$-th moment assumption.

Since the seminal work of Hotelling [32], canonical correlation analysis (CCA) has been one of the most classical methods to study the correlations between two random vectors.
Given two random vectors $\tilde{x} \in \mathbb{R}^p$ and $\tilde{y} \in \mathbb{R}^q$, we denote the population covariance and cross-covariance matrices by
$$\tilde\Sigma_{xx} := \operatorname{Cov}(\tilde{x},\tilde{x}), \quad \tilde\Sigma_{yy} := \operatorname{Cov}(\tilde{y},\tilde{y}), \quad \tilde\Sigma_{xy} = \tilde\Sigma_{yx}^\top := \operatorname{Cov}(\tilde{x},\tilde{y}).$$
It is well known that the $i$-th canonical correlation coefficient (CCC) between $\tilde{x}$ and $\tilde{y}$, denoted by $\rho_i$, is the square root of the $i$-th largest eigenvalue $t_i$ of the population canonical correlation (PCC) matrix
$$\tilde\Sigma := \tilde\Sigma_{xx}^{-1/2} \tilde\Sigma_{xy} \tilde\Sigma_{yy}^{-1} \tilde\Sigma_{yx} \tilde\Sigma_{xx}^{-1/2}.$$

∗E-mail: [email protected]. †E-mail: [email protected]. F. Yang was supported by the Wharton Dean's Fund for Postdoctoral Research.

Suppose we observe $n$ independent samples of $(\tilde{x}, \tilde{y})$. Then we can study the population CCCs through their sample counterparts. More precisely, we form data matrices $\tilde{X}$ and $\tilde{Y}$ as
$$\tilde{X} := n^{-1/2}(\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_n), \quad \tilde{Y} := n^{-1/2}(\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_n), \quad (1.1)$$
where $n^{-1/2}$ is a convenient scaling, so that the sample covariance and cross-covariance matrices can be written concisely as
$$\tilde{S}_{xx} := \frac{1}{n}\sum_{i=1}^n \tilde{x}_i\tilde{x}_i^\top = \tilde{X}\tilde{X}^\top, \quad \tilde{S}_{yy} := \frac{1}{n}\sum_{i=1}^n \tilde{y}_i\tilde{y}_i^\top = \tilde{Y}\tilde{Y}^\top, \quad \tilde{S}_{xy} = \tilde{S}_{yx}^\top := \frac{1}{n}\sum_{i=1}^n \tilde{x}_i\tilde{y}_i^\top = \tilde{X}\tilde{Y}^\top.$$
Then the squares of the sample CCCs, $\tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_{p\wedge q} \ge 0$, are defined as the eigenvalues of the sample canonical correlation (SCC) matrix
$$\mathcal{C}_{\tilde{X}\tilde{Y}} := \tilde{S}_{xx}^{-1/2} \tilde{S}_{xy} \tilde{S}_{yy}^{-1} \tilde{S}_{yx} \tilde{S}_{xx}^{-1/2}.$$
If $n \to \infty$ while $p$, $q$ and $r$ are fixed, the SCC matrix converges to the PCC matrix almost surely by the law of large numbers, and hence the sample CCCs can be used as consistent estimators of the population CCCs. However, many modern applications, such as statistical learning, wireless communications, medical imaging, financial economics and population genetics, involve high-dimensional data, where $p$ and $q$ are comparable to $n$ when $n$ is large. In the high-dimensional setting, the behavior of the SCC matrix can deviate greatly from that of the PCC matrix due to the so-called "curse of dimensionality".

There have been several works on the theoretical analysis of high-dimensional CCA. We mention some of them that are most related to this paper. First, we consider the null case where $\tilde{x}$ and $\tilde{y}$ are independent random vectors. When $\tilde{x}$ and $\tilde{y}$ are independent Gaussian vectors, the eigenvalues of the SCC matrix have the same joint distribution as those of a double Wishart matrix [34]. In particular, the joint distribution of the eigenvalues of double Wishart matrices has been studied in the context of the Jacobi ensemble and F-type matrices [31, 34], which implies that the largest few eigenvalues of the SCC matrix satisfy the Tracy–Widom law asymptotically. For generally distributed random vectors $\tilde{x}$ and $\tilde{y}$, the Tracy–Widom fluctuation of the largest eigenvalues of the SCC matrix was proved in [30] under the assumption that the entries of $\tilde{x}$ and $\tilde{y}$ have finite moments up to any order. The moment assumption was later relaxed to a finite fourth moment assumption in [50]. In the Gaussian case, it is shown in [46] that, almost surely, the empirical spectral distribution (ESD) of the SCC matrix converges weakly to a deterministic probability distribution (cf. (2.12)).
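As a quick numerical illustration (a sketch of ours, not part of the paper's argument), the limiting law just mentioned has the explicit density and spectral edges quoted later in (2.12)–(2.13); the following stdlib-only snippet checks numerically that it is indeed a probability density, with the helper names `edges` and `density_mass` being our own:

```python
# Sanity check (our sketch): the limiting ESD of the null SCC matrix has density
#   f(x) = sqrt((lambda_plus - x)(x - lambda_minus)) / (2*pi*c2*x*(1-x))
# supported on [lambda_minus, lambda_plus]; cf. (2.12)-(2.13) in the text.
import math

def edges(c1, c2):
    """Spectral edges lambda_pm = (sqrt(c1(1-c2)) +/- sqrt(c2(1-c1)))^2, cf. (2.13)."""
    a = math.sqrt(c1 * (1 - c2))
    b = math.sqrt(c2 * (1 - c1))
    return (a - b) ** 2, (a + b) ** 2

def density_mass(c1, c2, m=200000):
    """Midpoint-rule integral of the density f over its support."""
    lm, lp = edges(c1, c2)
    h = (lp - lm) / m
    total = 0.0
    for k in range(m):
        x = lm + (k + 0.5) * h
        f = math.sqrt((lp - x) * (x - lm)) / (2 * math.pi * c2 * x * (1 - x))
        total += f * h
    return total

# Requires c1 >= c2 and c1 + c2 < 1, as assumed in the paper.
mass = density_mass(0.3, 0.2)
print(round(mass, 3))  # prints 1.0: the density integrates to one
```

The midpoint rule is adequate here because the square-root factor vanishes at both edges, so the integrand is bounded on the whole support.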
In the general non-Gaussian case, both the convergence and the linear spectral statistics of the ESD of the SCC matrix have been proved in [52, 53].

Next we consider the case where $\tilde{x}$ and $\tilde{y}$ have finite rank correlations. If $\tilde{x}$ and $\tilde{y}$ are Gaussian random vectors, the asymptotic distributions of the sample canonical correlation coefficients have been derived when one of $p$ and $q$ is fixed as $n \to \infty$ [27]. If $p$ and $q$ are both proportional to $n$, the asymptotic distributions of the sample CCCs have been established under the Gaussian assumption in [7]. Under certain sparsity assumptions, the theory of high-dimensional sparse CCA and its applications have been discussed in [28, 29]. In [40], the authors derived asymptotic null and non-null distributions of several test statistics for tests of redundancy in high-dimensional CCA. In [35], the authors studied the asymptotic behaviors of the likelihood ratio processes of CCA under the null hypothesis of no spikes and the alternative hypothesis of a single spike.

In this paper, we consider the following signal-plus-noise model for $\tilde{x} \in \mathbb{R}^p$ and $\tilde{y} \in \mathbb{R}^q$:
$$\tilde{x} = Az + C_1^{1/2}x, \quad \tilde{y} = Bz + C_2^{1/2}y.$$
Here $z \in \mathbb{R}^r$ is a rank-$r$ signal vector with i.i.d. entries of mean zero and variance one, and $A$ and $B$ are $p \times r$ and $q \times r$ deterministic factor loading matrices, respectively. $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$ are two independent noise vectors with i.i.d. entries of mean zero and variance one, and $C_1$ and $C_2$ are $p \times p$ and $q \times q$ deterministic population covariance matrices. Then we can write the data matrices in (1.1) as
$$\tilde{X} := AZ + C_1^{1/2}X, \quad \tilde{Y} := BZ + C_2^{1/2}Y, \quad (1.2)$$
where $X$, $Y$ and $Z$ are respectively $p \times n$, $q \times n$ and $r \times n$ matrices with i.i.d. entries of mean zero and variance $n^{-1}$.
We consider the high-dimensional setting with a low-rank signal, that is, $p/n \to c_1$ and $q/n \to c_2$ as $n \to \infty$ for some constants $c_1 \in (0,1)$ and $c_2 \in (0, 1-c_1)$, and $r$ is a fixed integer that does not depend on $n$. For the model (1.2), the PCC matrix is given by
$$\tilde\Sigma = (C_1 + AA^\top)^{-1/2}\, AB^\top (C_2 + BB^\top)^{-1} BA^\top\, (C_1 + AA^\top)^{-1/2},$$
which is of rank at most $r$. We order the nontrivial eigenvalues of $\tilde\Sigma$ as $t_1 \ge t_2 \ge \cdots \ge t_r \ge 0$.

Under the Gaussian assumption, that is, $X$, $Y$ and $Z$ are independent random matrices with i.i.d. Gaussian entries, Bao et al. [7] proved that for any $1 \le i \le r$, $\tilde\lambda_i$ exhibits very different behaviors depending on whether $t_i$ is below or above the threshold $t_c$, where
$$t_c := \sqrt{\frac{c_1 c_2}{(1-c_1)(1-c_2)}}. \quad (1.3)$$
More precisely, if $t_i < t_c$, then the corresponding sample CCC $\tilde\lambda_i$ sticks to the right edge $\lambda_+$ of the bulk eigenvalue spectrum (cf. (2.13)) of the SCC matrix, and $n^{2/3}(\tilde\lambda_i - \lambda_+)$ converges weakly to a type-1 Tracy–Widom distribution. On the other hand, if $t_i > t_c$, then it gives rise to an outlier $\tilde\lambda_i$ that lies around a fixed location $\theta_i \in (\lambda_+, 1)$ determined by $t_i$, $c_1$ and $c_2$. Furthermore, $n^{1/2}(\tilde\lambda_i - \theta_i)$ converges weakly to a centered Gaussian. Such an abrupt change of the behavior of $\tilde\lambda_i$ when $t_i$ crosses the threshold $t_c$ is generally called a BBP transition, which dates back to the seminal work of Baik, Ben Arous and Péché [5] on spiked sample covariance matrices. The BBP transition has been observed in many random matrix ensembles with finite rank perturbations. Without attempting to be comprehensive, we mention the references [14, 15, 24, 36, 37, 42] on deformed Wigner matrices, [3, 5, 6, 12, 25, 33, 41] on spiked sample covariance matrices, [17, 49, 51] on spiked separable covariance matrices, and [8, 9, 10, 47] on several other deformed random matrix ensembles. In our setting, the SCC matrix $\mathcal{C}_{\tilde{X}\tilde{Y}}$ can be regarded as a finite rank perturbation of the SCC matrix in the null case with $r = 0$. It is then natural to ask whether $\tilde\lambda_i$ satisfies the same properties if we only assume certain moment conditions on the entries of $X$, $Y$ and $Z$.
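The BBP dichotomy above can be illustrated numerically. The following stdlib-only sketch of ours (the helper names are hypothetical, not from the paper) evaluates the threshold (1.3), the right edge (2.13) and the outlier location (2.15), and checks that $\theta(t)$ detaches from the bulk exactly at $t = t_c$:

```python
import math

def t_c(c1, c2):
    """BBP threshold (1.3): t_c = sqrt(c1*c2 / ((1-c1)*(1-c2)))."""
    return math.sqrt(c1 * c2 / ((1 - c1) * (1 - c2)))

def lambda_plus(c1, c2):
    """Right spectral edge (2.13)."""
    return (math.sqrt(c1 * (1 - c2)) + math.sqrt(c2 * (1 - c1))) ** 2

def theta(t, c1, c2):
    """Outlier location (2.15): theta = t*(1 - c1 + c1/t)*(1 - c2 + c2/t)."""
    return t * (1 - c1 + c1 / t) * (1 - c2 + c2 / t)

c1, c2 = 0.3, 0.2
tc, lp = t_c(c1, c2), lambda_plus(c1, c2)
# The outlier location matches the edge exactly at the threshold ...
assert abs(theta(tc, c1, c2) - lp) < 1e-12
# ... and lies strictly above the edge for spikes above the threshold.
assert theta(1.5 * tc, c1, c2) > lp
```

One can verify algebraically that $\theta(t) = [(1-c_1)t + c_1][(1-c_2)t + c_2]/t$ has its minimum over $t > 0$ precisely at $t = t_c$, where it equals $\lambda_+$; this is the analytic content of the two assertions.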
In fact, the proof in [7] depends crucially on the rotational invariance of multivariate Gaussian distributions under orthogonal transforms, and it would be hard (if at all possible) to extend it to data matrices with generally distributed entries. In this paper, we answer the above question in the affirmative and show the universality of the results in [7]. Moreover, we highlight the following improvements over the results in [7].

• Theorem 2.14 shows that the following results hold assuming only a finite fourth moment condition (actually, we require the slightly weaker condition (2.34)): for $1 \le i \le r$, $n^{2/3}(\tilde\lambda_i - \lambda_+)$ converges weakly to the Tracy–Widom distribution if $t_i < t_c$, while $\tilde\lambda_i \to \theta_i$ in probability if $t_i > t_c$.

• We obtain quantitative versions of all the results under general moment assumptions: Theorem 2.9 provides almost sharp convergence rates for the sample CCCs; Theorem 2.11 provides an almost sharp eigenvalue sticking estimate, which shows that the eigenvalues of the SCC matrix stick to those of the null SCC matrix with $r = 0$.

• Our results hold even when some $t_i$'s are close to the BBP transition threshold $t_c$, and when there are groups of near-degenerate $t_i$'s; both of these cases are ruled out by the assumptions of [7].

To complete the theory, we still need to prove the CLT for $\tilde\lambda_i$ when $t_i > t_c$. Due to length constraints, we postpone it to [39], where we will show that $n^{1/2}(\tilde\lambda_i - \theta_i)$ still converges to a centered Gaussian, but with a limiting variance that is different from the one in the Gaussian case. Instead of using the rotational invariance of multivariate Gaussian distributions, the proofs in this paper are based on a linearization method developed in [50], which reduces the problem to the study of a $(p+q+2n) \times (p+q+2n)$ random matrix $H$ that is linear in $X$ and $Y$ (cf. (3.3)). Moreover, an optimal local law was proved for the resolvent $G := H^{-1}$ in [50], which is the basis of all the proofs in this paper. Our approach is relatively more flexible and allows us to obtain precise convergence rates for the eigenvalues of the SCC matrix.

This paper is organized as follows. In Section 2, we define the model and state the main results, Theorem 2.9, Theorem 2.11 and Theorem 2.14. In Section 3, we introduce the linearization method and collect some basic tools that will be used in the proof. Then we give the proofs of Theorem 2.9, Theorem 2.11 and Theorem 2.14 in Sections 4, 5 and 6. Our proofs utilize a result on the eigenvalues of the null SCC matrix, Lemma 2.7, which will be proved in Section 7.

Conventions.
For two quantities $a_n$ and $b_n$ depending on $n$, we use $a_n = O(b_n)$ to mean that $|a_n| \le C|b_n|$ for a constant $C > 0$, and use $a_n = o(b_n)$ to indicate that $|a_n| \le c_n |b_n|$ for a positive sequence of numbers $c_n \downarrow 0$ as $n \to \infty$. We will use the notations $a_n \lesssim b_n$ if $a_n = O(b_n)$, and $a_n \sim b_n$ if $a_n = O(b_n)$ and $b_n = O(a_n)$. For a matrix $A$, we use $\|A\|$ to denote its operator norm. For a vector $v$, we use $\|v\|$ to denote its Euclidean norm. In this paper, we will write an identity matrix as $I$ or $1$ when this causes no confusion.

We consider two independent families of data matrices $X = (x_{ij})$ and $Y = (y_{ij})$, which are of dimensions $p \times n$ and $q \times n$, respectively. We assume that the entries $x_{ij}$, $1 \le i \le p$, $1 \le j \le n$, and $y_{ij}$, $1 \le i \le q$, $1 \le j \le n$, are real independent random variables satisfying
$$\mathbb{E} x_{ij} = \mathbb{E} y_{ij} = 0, \quad \mathbb{E}|x_{ij}|^2 = \mathbb{E}|y_{ij}|^2 = n^{-1}. \quad (2.1)$$
To be more general, we do not assume that these random variables are identically distributed. Then we define the following data model with finite rank correlation:
$$\tilde{X} := C_1^{1/2} X + \tilde{A} Z, \quad \tilde{Y} := C_2^{1/2} Y + \tilde{B} Z,$$
where $C_1$ and $C_2$ are $p \times p$ and $q \times q$ deterministic positive definite symmetric covariance matrices, $\tilde{A}$ and $\tilde{B}$ are $p \times r$ and $q \times r$ deterministic matrices, and $Z = (z_{ij})$ is an $r \times n$ random matrix which leads to the nontrivial correlation between $\tilde{X}$ and $\tilde{Y}$. We assume that $Z$ is independent of $X$ and $Y$, and the entries $z_{ij}$, $1 \le i \le r$, $1 \le j \le n$, are independent random variables satisfying
$$\mathbb{E} z_{ij} = 0, \quad \mathbb{E}|z_{ij}|^2 = n^{-1}. \quad (2.2)$$
In this paper, we study the eigenvalues of the sample canonical correlation (SCC) matrix
$$\mathcal{C}_{\tilde{X}\tilde{Y}} = \big(\tilde{X}\tilde{X}^\top\big)^{-1/2}\big(\tilde{X}\tilde{Y}^\top\big)\big(\tilde{Y}\tilde{Y}^\top\big)^{-1}\big(\tilde{Y}\tilde{X}^\top\big)\big(\tilde{X}\tilde{X}^\top\big)^{-1/2}.$$
In particular, we are interested in the relations between the eigenvalues of $\mathcal{C}_{\tilde{X}\tilde{Y}}$ and the canonical correlation coefficients, that is, the square roots of the eigenvalues of the population canonical correlation (PCC) matrix
$$\tilde\Sigma := \tilde\Sigma_{xx}^{-1/2}\tilde\Sigma_{xy}\tilde\Sigma_{yy}^{-1}\tilde\Sigma_{yx}\tilde\Sigma_{xx}^{-1/2}, \quad \tilde\Sigma_{xx} := C_1 + \tilde{A}\tilde{A}^\top, \quad \tilde\Sigma_{yy} := C_2 + \tilde{B}\tilde{B}^\top, \quad \tilde\Sigma_{xy} = \tilde\Sigma_{yx}^\top := \tilde{A}\tilde{B}^\top.$$
Note that the eigenvalues of both the SCC and PCC matrices are unchanged under the non-singular transformations $\tilde{X} \to \bar{X} := C_1^{-1/2}\tilde{X}$ and $\tilde{Y} \to \bar{Y} := C_2^{-1/2}\tilde{Y}$. Thus it suffices to consider the data matrices
$$\bar{X} := X + AZ, \quad \bar{Y} := Y + BZ, \quad \text{where} \quad A := C_1^{-1/2}\tilde{A}, \quad B := C_2^{-1/2}\tilde{B}. \quad (2.3)$$
We assume that $A$ and $B$ have the following singular value decompositions:
$$A = \sum_{i=1}^r a_i u_i^a (v_i^a)^\top, \quad B = \sum_{i=1}^r b_i u_i^b (v_i^b)^\top, \quad (2.4)$$
where $\{a_i\}$ and $\{b_i\}$ are the singular values, $\{u_i^a\}$ and $\{u_i^b\}$ are the left singular vectors, and $\{v_i^a\}$ and $\{v_i^b\}$ are the right singular vectors. We assume that for some constant $C > 0$,
$$0 \le a_r \le \cdots \le a_2 \le a_1 \le C, \quad 0 \le b_r \le \cdots \le b_2 \le b_1 \le C. \quad (2.5)$$
In this paper, we consider the high-dimensional setting, that is,
$$c_1(n) := \frac{p}{n} \to \hat{c}_1 \in (0,1), \quad c_2(n) := \frac{q}{n} \to \hat{c}_2 \in (0, 1-\hat{c}_1).$$
For simplicity of notation, we will always abbreviate $c_1(n) \equiv c_1$ and $c_2(n) \equiv c_2$ for the rest of the paper. Without loss of generality, we assume that $c_1 \ge c_2$. We now summarize the main assumptions for future reference. For our purpose, we relax the assumptions (2.1) and (2.2) a little bit; one can refer to Corollary 2.13 for the reason for this extension.

Assumption 2.1. Fix a small constant $\tau > 0$.
(i) $X = (x_{ij})$ and $Y = (y_{ij})$ are two real independent $p \times n$ and $q \times n$ random matrices. Their entries are independent random variables that satisfy the following moment conditions:
$$\max_{i,j} |\mathbb{E} x_{ij}| \le n^{-2-\tau}, \quad \max_{i,j} |\mathbb{E} y_{ij}| \le n^{-2-\tau}, \quad (2.6)$$
$$\max_{i,j} \big|\mathbb{E}|x_{ij}|^2 - n^{-1}\big| \le n^{-2-\tau}, \quad \max_{i,j} \big|\mathbb{E}|y_{ij}|^2 - n^{-1}\big| \le n^{-2-\tau}. \quad (2.7)$$
We remark that (2.6) and (2.7) are (slightly) more general than (2.1).
(ii) $Z = (z_{ij})$ is a real $r \times n$ random matrix that is independent of $X$ and $Y$, and its entries are independent random variables that satisfy the following moment conditions:
$$\max_{i,j} |\mathbb{E} z_{ij}| \le n^{-2-\tau}, \quad \max_{i,j} \big|\mathbb{E}|z_{ij}|^2 - n^{-1}\big| \le n^{-2-\tau}. \quad (2.8)$$
(iii) We assume that
$$r \le \tau^{-1}, \quad \tau \le c_2 \le c_1, \quad c_1 + c_2 \le 1 - \tau. \quad (2.9)$$
(iv) We consider the data model in (2.3), where $A$ and $B$ satisfy (2.4) and (2.5).
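For intuition, here is a minimal Monte Carlo sketch of ours (not from the paper) in the scalar case $p = q = r = 1$ of the model (2.3): the PCC matrix then reduces to the scalar $t_1 = a_1^2 b_1^2 / ((1+a_1^2)(1+b_1^2))$, and the SCC matrix reduces to the squared sample correlation, which is consistent for $t_1$ in the classical regime where $n \to \infty$ with the dimensions fixed.

```python
import random

def sample_t1(n, a, b, seed=0):
    """Squared sample canonical correlation for p = q = r = 1:
    the squared empirical correlation of (x_i + a*z_i, y_i + b*z_i)
    with standard Gaussian noises x_i, y_i and signal z_i."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1) + a * z)
        ys.append(rng.gauss(0, 1) + b * z)
    # Sample (cross-)covariances; the entries have mean zero, so no centering.
    sxx = sum(u * u for u in xs) / n
    syy = sum(v * v for v in ys) / n
    sxy = sum(u * v for u, v in zip(xs, ys)) / n
    return sxy ** 2 / (sxx * syy)

a = b = 1.0
t1 = (a * b) ** 2 / ((1 + a * a) * (1 + b * b))  # population value 0.25
est = sample_t1(100000, a, b)
assert abs(est - t1) < 0.02
```

The tolerance is generous relative to the $O(n^{-1/2})$ fluctuation of the estimator, so the assertion is stable; in the high-dimensional regime studied in this paper such naive consistency fails, which is the point of the results below.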
In this paper, we will study the SCC matrix
$$\mathcal{C}_{\bar{X}\bar{Y}} := \big(\bar{X}\bar{X}^\top\big)^{-1/2}\big(\bar{X}\bar{Y}^\top\big)\big(\bar{Y}\bar{Y}^\top\big)^{-1}\big(\bar{Y}\bar{X}^\top\big)\big(\bar{X}\bar{X}^\top\big)^{-1/2},$$
the null SCC matrix
$$\mathcal{C}_{XY} := S_{xx}^{-1/2} S_{xy} S_{yy}^{-1} S_{yx} S_{xx}^{-1/2}, \quad \text{where} \quad S_{xx} := XX^\top, \quad S_{yy} := YY^\top, \quad S_{xy} = S_{yx}^\top := XY^\top, \quad (2.10)$$
and the PCC matrix
$$\Sigma_{\bar{X}\bar{Y}} := \Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\Sigma_{xx}^{-1/2}, \quad \text{where} \quad \Sigma_{xx} = I_p + AA^\top, \quad \Sigma_{yy} = I_q + BB^\top, \quad \Sigma_{xy} = \Sigma_{yx}^\top = AB^\top.$$
Moreover, we will also consider the following matrices:
$$\mathcal{C}_{\bar{Y}\bar{X}} := \big(\bar{Y}\bar{Y}^\top\big)^{-1/2}\big(\bar{Y}\bar{X}^\top\big)\big(\bar{X}\bar{X}^\top\big)^{-1}\big(\bar{X}\bar{Y}^\top\big)\big(\bar{Y}\bar{Y}^\top\big)^{-1/2},$$
and
$$\mathcal{C}_{YX} := S_{yy}^{-1/2} S_{yx} S_{xx}^{-1} S_{xy} S_{yy}^{-1/2}, \quad \Sigma_{\bar{Y}\bar{X}} := \Sigma_{yy}^{-1/2}\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1/2}.$$
Finally, we define another null SCC matrix $\mathcal{C}^b_{\bar{Y}X}$ as
$$\mathcal{C}^b_{\bar{Y}X} := \big(S^b_{yy}\big)^{-1/2} S^b_{yx} S_{xx}^{-1} S^b_{xy} \big(S^b_{yy}\big)^{-1/2}, \quad \text{with} \quad S^b_{yy} := \bar{Y}\bar{Y}^\top, \quad S^b_{xy} = \big(S^b_{yx}\big)^\top := X\bar{Y}^\top. \quad (2.11)$$
The matrix $\mathcal{C}^b_{\bar{X}Y}$ can be defined in the obvious way.

In the null case with $r = 0$, we denote the eigenvalues of $\mathcal{C}_{YX}$ by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q \ge 0$. Then $\mathcal{C}_{XY}$ shares the same eigenvalues with $\mathcal{C}_{YX}$, except that it has $(p-q)$ more trivial zero eigenvalues $\lambda_{q+1} = \cdots = \lambda_p = 0$. We denote the empirical spectral distribution (ESD) of $\mathcal{C}_{YX}$ by
$$F_n(x) := \frac{1}{q}\sum_{i=1}^q \mathbf{1}_{\lambda_i \le x}.$$
It is known [46, 52] that, almost surely, $F_n$ converges weakly to a deterministic probability distribution $F(x)$ with density
$$f(x) = \frac{1}{2\pi c_2}\,\frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{x(1-x)}, \quad \lambda_- \le x \le \lambda_+, \quad (2.12)$$
where
$$\lambda_\pm := \left(\sqrt{c_1(1-c_2)} \pm \sqrt{c_2(1-c_1)}\right)^2. \quad (2.13)$$
For the model (2.3) with finite rank correlations, we denote the eigenvalues of $\mathcal{C}_{\bar{Y}\bar{X}}$ by $\tilde\lambda_1 \ge \tilde\lambda_2 \ge \cdots \ge \tilde\lambda_q \ge 0$, while $\mathcal{C}_{\bar{X}\bar{Y}}$ has $(p-q)$ more trivial zero eigenvalues $\tilde\lambda_{q+1} = \cdots = \tilde\lambda_p = 0$. We denote the eigenvalues of $\Sigma_{\bar{X}\bar{Y}}$ by
$$t_1 \ge t_2 \ge \cdots \ge t_r \ge t_{r+1} = \cdots = t_q = 0. \quad (2.14)$$
Suppose the entries of $X$ and $Y$ are i.i.d. Gaussian. Then it was proved in [7] that, if $t_i > t_c$ (recall (1.3)),
$$\tilde\lambda_i - \theta_i \to 0 \quad \text{almost surely, where} \quad \theta_i := t_i\left(1 - c_1 + c_1 t_i^{-1}\right)\left(1 - c_2 + c_2 t_i^{-1}\right); \quad (2.15)$$
if $t_i \le t_c$, then $\tilde\lambda_i - \lambda_+ \to 0$ almost surely. Note that for $t_i > t_c$ we have $\theta_i > \lambda_+$, that is, $\tilde\lambda_i$ will be an outlier that is detached from the support $[\lambda_-, \lambda_+]$ of the limiting distribution $F(x)$.

Before stating the main results, we first introduce the following notion of stochastic domination. It was first introduced in [19], and subsequently used in many works on random matrix theory, such as [11, 12, 13, 20, 21, 38]. It simplifies the presentation of the results and their proofs by systematizing statements of the form "$\xi$ is bounded by $\zeta$ with high probability up to a small power of $n$".

Definition 2.2 (Stochastic domination and high-probability event). (i) Let
$$\xi = \left(\xi^{(n)}(u) : n \in \mathbb{N},\, u \in U^{(n)}\right), \quad \zeta = \left(\zeta^{(n)}(u) : n \in \mathbb{N},\, u \in U^{(n)}\right)$$
be two families of nonnegative random variables, where $U^{(n)}$ is an $n$-dependent parameter set. We say $\xi$ is stochastically dominated by $\zeta$, uniformly in $u$, if for any small constant $\varepsilon > 0$ and large constant $D > 0$,
$$\sup_{u \in U^{(n)}} \mathbb{P}\left[\xi^{(n)}(u) > n^\varepsilon \zeta^{(n)}(u)\right] \le n^{-D}$$
for large enough $n \ge n_0(\varepsilon, D)$, and we shall use the notation $\xi \prec \zeta$. Throughout this paper, the stochastic domination will always be uniform in all parameters that are not explicitly fixed (such as matrix indices and the spectral parameter $z$). If $\xi$ is complex and we have $|\xi| \prec \zeta$, then we will also write $\xi \prec \zeta$ or $\xi = O_\prec(\zeta)$.
(ii) We extend the definition of $O_\prec(\cdot)$ to matrices in the sense of operator norm as follows. Let $A$ be a family of matrices and $\zeta$ be a family of nonnegative random variables. Then $A = O_\prec(\zeta)$ means that $\|A\| \prec \zeta$.
(iii) We say an event $\Xi$ holds with high probability if for any constant $D > 0$, $\mathbb{P}(\Xi) \ge 1 - n^{-D}$ for large enough $n$.
Moreover, we say $\Xi$ holds with high probability on an event $\Omega$ if for any constant $D > 0$, $\mathbb{P}(\Omega \setminus \Xi) \le n^{-D}$ for large enough $n$.

The following lemma collects basic properties of stochastic domination $\prec$, which will be used tacitly throughout this paper.

Lemma 2.3 (Lemma 3.2 in [11]). Let $\xi$ and $\zeta$ be families of nonnegative random variables, and let $C > 0$ be any (large) constant.
(i) Suppose that $\xi(u,v) \prec \zeta(u,v)$ uniformly in $u \in U$ and $v \in V$. If $|V| \le n^C$, then $\sum_{v \in V}\xi(u,v) \prec \sum_{v \in V}\zeta(u,v)$ uniformly in $u$.
(ii) If $\xi_1(u) \prec \zeta_1(u)$ and $\xi_2(u) \prec \zeta_2(u)$ uniformly in $u \in U$, then $\xi_1(u)\xi_2(u) \prec \zeta_1(u)\zeta_2(u)$ uniformly in $u$.
(iii) Suppose that $\Psi(u) \ge n^{-C}$ is deterministic and $\xi(u)$ satisfies $\mathbb{E}|\xi(u)|^2 \le n^C$ for all $u$. If $\xi(u) \prec \Psi(u)$ uniformly in $u$, then we have $\mathbb{E}\xi(u) \prec \Psi(u)$ uniformly in $u$.

We introduce the following bounded support condition for the random matrices considered in this paper.

Definition 2.4 (Bounded support condition). We say a random matrix $X$ satisfies the bounded support condition with $\varphi_n$ if
$$\max_{i,j} |x_{ij}| \prec \varphi_n. \quad (2.16)$$
Whenever (2.16) holds, we say that $X$ has support $\varphi_n$.

In the rest of this paper, $\varphi_n$ is usually a deterministic parameter satisfying $n^{-1/2} \le \varphi_n \le n^{-c_\varphi}$ for some small constant $c_\varphi > 0$. Note that we allow the case $|t_i - t_c| = o(1)$, i.e., a spike $t_i$ that is very close to the BBP transition threshold. Suppose that $X$ and $Y$ have bounded support $\varphi_n$, and $Z$ has bounded support $\psi_n$. Then we make the following assumption.

Assumption 2.5. We assume that for some integer $0 \le r^+ \le r$, the following statement holds:
$$t_i - t_c \ge n^{-1/3} + \psi_n + \varphi_n \quad \text{if and only if} \quad 1 \le i \le r^+. \quad (2.17)$$
The lower bound is chosen for definiteness, and it can be replaced with any $n$-dependent parameter that is of the same order.

Before stating our main results on the eigenvalues of the SCC matrix $\mathcal{C}_{\bar{Y}\bar{X}}$, we describe the behaviors of the eigenvalues of the null SCC matrix $\mathcal{C}^b_{\bar{Y}X}$ (recall (2.11)). We denote its eigenvalues by $\lambda^b_1 \ge \lambda^b_2 \ge \cdots \ge \lambda^b_q$. Then we define the quantiles of the density (2.12), which give the classical locations of the eigenvalues.

Definition 2.6.
The classical location $\gamma_j$ of the $j$-th eigenvalue is defined as
$$\gamma_j := \sup_x \left\{\int_x^{+\infty} f(x)\,\mathrm{d}x > \frac{j-1}{q}\right\}, \quad (2.18)$$
where $f$ is defined in (2.12). Note that we have $\gamma_1 = \lambda_+$ and $\lambda_+ - \gamma_j \sim (j/n)^{2/3}$ for $j \ge 1$.

We have the following eigenvalue rigidity and edge universality result for $\mathcal{C}^b_{\bar{Y}X}$. If $B = 0$, i.e., there is no $Z$ term, then the same results have been proved in [50] under the same assumptions.

Lemma 2.7. Suppose Assumption 2.1 holds. Suppose $X$ and $Y$ have bounded support $\varphi_n$ such that $n^{-1/2} \le \varphi_n \le n^{-c_\varphi}$ for some constant $c_\varphi > 0$, and $Z$ has bounded support $\psi_n$ such that $n^{-1/2} \le \psi_n \le n^{-c_\psi}$ for some constant $c_\psi > 0$. Assume that
$$\max_{i,j}\mathbb{E}|x_{ij}|^3 = O(n^{-3/2}), \quad \max_{i,j}\mathbb{E}|y_{ij}|^3 = O(n^{-3/2}), \quad \max_{i,j}\mathbb{E}|x_{ij}|^4 \prec n^{-2}, \quad \max_{i,j}\mathbb{E}|y_{ij}|^4 \prec n^{-2}. \quad (2.19)$$
Then the eigenvalues of the null SCC matrix $\mathcal{C}^b_{\bar{Y}X}$ satisfy the following eigenvalue rigidity estimate: for any constant $\delta > 0$ and all $1 \le i \le (1-\delta)q$,
$$|\lambda^b_i - \gamma_i| \prec i^{-1/3} n^{-2/3}. \quad (2.20)$$
Moreover, we have that for any fixed $k \in \mathbb{N}$,
$$\lim_{n\to\infty}\mathbb{P}\left(\bigcap_{1\le i\le k}\left\{n^{2/3}\,\frac{\lambda^b_i - \lambda_+}{c_{\mathrm{TW}}} \le s_i\right\}\right) = \lim_{n\to\infty}\mathbb{P}_{\mathrm{GOE}}\left(\bigcap_{1\le i\le k}\left\{n^{2/3}(\lambda_i - 2) \le s_i\right\}\right) \quad (2.21)$$
for all $(s_1, s_2, \ldots, s_k) \in \mathbb{R}^k$, where
$$c_{\mathrm{TW}} := \left[\frac{\lambda_+^2(1-\lambda_+)^2}{\sqrt{c_1 c_2 (1-c_1)(1-c_2)}}\right]^{1/3},$$
and $\mathbb{P}_{\mathrm{GOE}}$ stands for the law of the GOE (Gaussian orthogonal ensemble), an $n \times n$ symmetric matrix with independent (up to symmetry) Gaussian entries of mean zero and variance $n^{-1}$.

Remark 2.8. Taking $k = 1$, (2.21) gives
$$n^{2/3}\,\frac{\lambda^b_1 - \lambda_+}{c_{\mathrm{TW}}} \Rightarrow F_1,$$
where $F_1$ is the type-1 Tracy–Widom distribution as given by [44, 45]. Moreover, the joint distribution of the largest $k$ eigenvalues of the GOE can be written in terms of the Airy kernel for any fixed $k$ [26]. Hence (2.21) gives a complete description of the finite-dimensional correlation functions of the edge eigenvalues of $\mathcal{C}^b_{\bar{Y}X}$.

Now we are ready to state our main results on the eigenvalues of the SCC matrix $\mathcal{C}_{\bar{X}\bar{Y}}$. We denote
$$\Delta_i := |t_i - t_c|, \quad \alpha_+ := \min_{1 \le i \le r}|t_i - t_c|. \quad (2.22)$$
We first describe the convergence of the outlier eigenvalues and the extreme non-outlier eigenvalues.

Theorem 2.9.
Suppose the assumptions of Lemma 2.7 and Assumption 2.5 hold. Then we have the following estimates: for $1 \le i \le r^+$, we have
$$|\tilde\lambda_i - \theta_i| \prec (\psi_n + \varphi_n)\Delta_i + n^{-1/2}\Delta_i^{1/2}; \quad (2.23)$$
for any $r^+ + 1 \le i \le \varpi$, where $\varpi$ is a fixed integer, and any constant $\varepsilon > 0$, we have
$$-n^{-2/3+\varepsilon} \le \tilde\lambda_i - \lambda_+ \le n^\varepsilon\big(\psi_n + \varphi_n + n^{-1/3}\big)^2 \quad \text{with high probability.} \quad (2.24)$$

Remark 2.10. This theorem gives precise large deviation bounds on the locations of the outliers and the first few extreme non-outlier eigenvalues. Consider a small support case with $\varphi_n + \psi_n \le n^{-1/3}$ (this holds with probability $1 - o(1)$ if we assume the existence of 12th moments). Then (2.23) and (2.24) show that the fluctuation of the $i$-th eigenvalue changes from the order $(\psi_n + \varphi_n)\Delta_i + n^{-1/2}\Delta_i^{1/2}$ to $n^{-2/3}$ when $\Delta_i$ crosses the scale $n^{-1/3}$. This implies the occurrence of the BBP transition.

The non-outlier eigenvalues of $\mathcal{C}_{\bar{X}\bar{Y}}$ stick to the corresponding eigenvalues of $\mathcal{C}^b_{\bar{X}Y}$, as shown by the following theorem.

Theorem 2.11.
Suppose the assumptions of Lemma 2.7 and Assumption 2.5 hold. Assume that $\alpha_+ \ge n^\varepsilon(\psi_n + \varphi_n)$ for some constant $\varepsilon > 0$. Then we have the eigenvalue sticking estimate
$$|\tilde\lambda_{i+r^+} - \lambda^b_i| \prec n^{-1}\alpha_+^{-1} \quad (2.25)$$
for all $1 \le i \le (1-\delta)q$, where $\delta > 0$ is any small constant.

Remark 2.12. Theorem 2.11 establishes large deviation bounds on the non-outlier eigenvalues of $\mathcal{C}_{\bar{X}\bar{Y}}$ with respect to the eigenvalues of $\mathcal{C}^b_{\bar{X}Y}$. In particular, when $\alpha_+ \gg n^{-1/3}$, the right-hand side of (2.25) is much smaller than $n^{-2/3}$ for $i = O(1)$. Together with (2.21) for $\lambda^b_i$, (2.25) implies that the largest non-outlier eigenvalues of $\mathcal{C}_{\bar{X}\bar{Y}}$ also converge to the Tracy–Widom law as long as the population canonical correlation coefficients $t_i$ are away from the transition threshold $t_c$ by at least $\alpha_+ \gg n^{-1/3}$.

Notice that applying (2.25) to $\mathcal{C}^b_{\bar{X}Y}$ and $\mathcal{C}_{XY}$ also gives that the eigenvalues $\lambda^b_i$ stick to $\lambda_i$ for $1 \le i \le (1-\delta)q$. Thus we obtain the following eigenvalue sticking estimate:
$$|\tilde\lambda_{i+r^+} - \lambda_i| \prec n^{-1}\alpha_+^{-1}. \quad (2.26)$$
The reason why we state (2.25) instead of (2.26) in Theorem 2.11 will be explained below (5.2).

Using a simple cutoff argument, it is easy to obtain the following corollary under the finite $a$-th moment assumption for any fixed $a > 4$. Since we did not assume that the entries of $X$ and $Y$ are identically distributed, the means and variances of the truncated entries may be different. This is why we have assumed the slightly more general mean and variance conditions (2.6)–(2.8).

Corollary 2.13.
Assume that $X = (x_{ij})$, $Y = (y_{ij})$ and $Z = (z_{ij})$ are respectively $p \times n$, $q \times n$ and $r \times n$ real matrices, whose entries are independent random variables satisfying (2.1), (2.2) and
$$\max_{i,j}\mathbb{E}|\sqrt{n}\,x_{ij}|^a \le C, \quad \max_{i,j}\mathbb{E}|\sqrt{n}\,y_{ij}|^a \le C, \quad \max_{i,j}\mathbb{E}|\sqrt{n}\,z_{ij}|^b \le C, \quad (2.27)$$
for some constants $a > 4$, $b > 4$ and $C > 0$. Suppose Assumption 2.1 (iii) and Assumption 2.5 hold with
$$\varphi_n = n^{-1/2+2/a}, \quad \psi_n = n^{-1/2+2/b}. \quad (2.28)$$
Then we have that for $1 \le i \le r^+$,
$$|\tilde\lambda_i - \theta_i| \le n^\varepsilon\left[(\psi_n + \varphi_n)\Delta_i + n^{-1/2}\Delta_i^{1/2}\right] \quad \text{with probability } 1 - o(1), \quad (2.29)$$
for any small constant $\varepsilon > 0$. Moreover, assume that the eigenvalues of $\Sigma_{\bar{X}\bar{Y}}$ satisfy
$$\alpha_+ \ge n^{\varepsilon_0}(\psi_n + \varphi_n) + n^{-1/3+\varepsilon_0} \quad (2.30)$$
for a constant $\varepsilon_0 > 0$. Then we have that for any fixed $k \in \mathbb{N}$,
$$\lim_{n\to\infty}\mathbb{P}\left(\bigcap_{1\le i\le k}\left\{n^{2/3}\,\frac{\tilde\lambda_{i+r^+} - \lambda_+}{c_{\mathrm{TW}}} \le s_i\right\}\right) = \lim_{n\to\infty}\mathbb{P}_{\mathrm{GOE}}\left(\bigcap_{1\le i\le k}\left\{n^{2/3}(\lambda_i - 2) \le s_i\right\}\right) \quad (2.31)$$
for all $(s_1, s_2, \ldots, s_k) \in \mathbb{R}^k$.

Proof. For $\varphi_n$ and $\psi_n$ in (2.28), we introduce the truncated matrices $\check{X}$, $\check{Y}$ and $\check{Z}$ defined as
$$\check{X}_{ij} := x_{ij}\mathbf{1}_{|x_{ij}| \le \varphi_n n^{\varepsilon_1}}, \quad \check{Y}_{ij} := y_{ij}\mathbf{1}_{|y_{ij}| \le \varphi_n n^{\varepsilon_1}}, \quad \check{Z}_{ij} := z_{ij}\mathbf{1}_{|z_{ij}| \le \psi_n n^{\varepsilon_1}},$$
for a sufficiently small constant $\varepsilon_1 > 0$. Combining the moment conditions in (2.27) with Markov's inequality and a simple union bound, we obtain that
$$\mathbb{P}\big(\check{X} \ne X \text{ or } \check{Y} \ne Y \text{ or } \check{Z} \ne Z\big) = O\big(n^{-a\varepsilon_1} + n^{-b\varepsilon_1}\big). \quad (2.32)$$
Using (2.27) and integration by parts, we can also verify that
$$\mathbb{E}\big[|x_{ij}|\mathbf{1}_{|x_{ij}| > \varphi_n n^{\varepsilon_1}}\big] \le n^{-2-\varepsilon_1}, \quad \mathbb{E}\big[|x_{ij}|^2\mathbf{1}_{|x_{ij}| > \varphi_n n^{\varepsilon_1}}\big] \le n^{-2-\varepsilon_1}. \quad (2.33)$$
For example, for the first estimate in (2.33), we have that
$$\mathbb{E}\big[|x_{ij}|\mathbf{1}_{|x_{ij}| > \varphi_n n^{\varepsilon_1}}\big] = \int_0^\infty \mathbb{P}\big(|x_{ij}|\mathbf{1}_{|x_{ij}| > \varphi_n n^{\varepsilon_1}} > s\big)\,\mathrm{d}s = \int_0^{\varphi_n n^{\varepsilon_1}} \mathbb{P}\big(|x_{ij}| > \varphi_n n^{\varepsilon_1}\big)\,\mathrm{d}s + \int_{\varphi_n n^{\varepsilon_1}}^\infty \mathbb{P}\big(|x_{ij}| > s\big)\,\mathrm{d}s$$
$$\lesssim \int_0^{\varphi_n n^{\varepsilon_1}} \big(n^{1/2+\varepsilon_1}\varphi_n\big)^{-a}\,\mathrm{d}s + \int_{\varphi_n n^{\varepsilon_1}}^\infty \big(\sqrt{n}\,s\big)^{-a}\,\mathrm{d}s \lesssim n^{-5/2+2/a-(a-1)\varepsilon_1} \le n^{-2-\varepsilon_1},$$
where in the third step we used (2.27) and Markov's inequality, and in the last step we used $a > 4$. The second estimate of (2.33) can be proved in a similar way. Note that (2.33) implies
$$|\mathbb{E}\check{x}_{ij}| \le n^{-2-\varepsilon_1}, \quad \mathbb{E}|\check{x}_{ij}|^2 = n^{-1} + O\big(n^{-2-\varepsilon_1}\big).$$
Moreover, we trivially have
$$\mathbb{E}|\check{x}_{ij}|^3 \le \mathbb{E}|x_{ij}|^3 = O\big(n^{-3/2}\big), \quad \mathbb{E}|\check{x}_{ij}|^4 \le \mathbb{E}|x_{ij}|^4 = O\big(n^{-2}\big).$$
Similar estimates also hold for the entries of $\check{Y}$. Hence $\check{X}$ and $\check{Y}$ are random matrices satisfying Assumption 2.1 (i) and condition (2.19). For $\check{Z}$, using (2.27) and a similar argument, we can check that
$$|\mathbb{E}\check{z}_{ij}| \le n^{-2-\varepsilon_1}, \quad \mathbb{E}|\check{z}_{ij}|^2 = n^{-1} + O\big(n^{-2-(b-1)\varepsilon_1}\big).$$
Hence $\check{Z}$ is a random matrix satisfying Assumption 2.1 (ii). Now combining (2.32) with Theorem 2.9, we conclude (2.29). Next, combining (2.32) with Theorem 2.11, we obtain that
$$|\tilde\lambda_{r^+ + i} - \lambda^b_i| \prec n^{-1}\alpha_+^{-1} \le n^{-2/3-\varepsilon_0}, \quad 1 \le i \le k.$$
Together with Lemma 2.7, this concludes (2.31).

If the entries of $X$, $Y$ and $Z$ are identically distributed, then we can obtain the following result under the weaker tail condition (2.34), which we believe to be sharp.

Theorem 2.14.
Suppose Assumption 2.1 (iii) and Assumption 2.5 hold. Assume that $x_{ij}=n^{-1/2}\hat x_{ij}$, $y_{ij}=n^{-1/2}\hat y_{ij}$ and $z_{ij}=n^{-1/2}\hat z_{ij}$, where $\{\hat x_{ij}\}$, $\{\hat y_{ij}\}$ and $\{\hat z_{ij}\}$ are three independent families of i.i.d. random variables with mean zero and variance one. Moreover, we assume the tail condition
$$\lim_{t\to\infty} t^4\big[\mathbb P(|\hat x|\ge t)+\mathbb P(|\hat y|\ge t)\big]=0. \qquad (2.34)$$
We assume that the eigenvalues of $\Sigma^{xy}$ converge as $n\to\infty$ with
$$\lim_n t_{r^+} > t_c > \lim_n t_{r^++1}. \qquad (2.35)$$
Then both (2.31) and the following convergence hold:
$$\widetilde\lambda_i-\theta_i\to 0 \quad\text{in probability},\qquad 1\le i\le r^+. \qquad (2.36)$$

Finally, for an outlier eigenvalue, $n^{1/2}(\widetilde\lambda_i-\theta_i)$ actually converges to a normal distribution, which has been proved in [7] for the Gaussian case and for well-separated outliers, i.e., outliers that are either exactly degenerate or separated from each other by a distance of order 1. The proof for the general distribution case and for near-degenerate outliers is quite involved, and, considering the length of this paper, we postpone it to another paper [39].

The self-adjoint linearization method has proved to be useful in studying the local laws of random matrices of Gram type [1, 2, 16, 18, 38, 48, 49]. We now introduce a generalization of this method, which was introduced in [50] to study the null SCC matrix $\mathcal C_{XY}$. For the discussion below, we assume that $\widetilde X\widetilde X^\top$, $\widetilde Y\widetilde Y^\top$, $XX^\top$ and $YY^\top$ are all non-singular almost surely. (This is trivially true if, say, the entries of $X$, $Y$ and $Z$ have continuous densities.) Then given $\lambda>0$, it is an eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$ if and only if the following equation holds:
$$\det\Big((\widetilde X\widetilde Y^\top)(\widetilde Y\widetilde Y^\top)^{-1}(\widetilde Y\widetilde X^\top)-\lambda\,\widetilde X\widetilde X^\top\Big)=0. \qquad(3.1)$$
Using the Schur complement, we can easily check that equation (3.1) is equivalent to
$$\det\begin{pmatrix} \lambda\,\widetilde X\widetilde X^\top & \lambda^{1/2}\widetilde X\widetilde Y^\top\\ \lambda^{1/2}\widetilde Y\widetilde X^\top & \lambda\,\widetilde Y\widetilde Y^\top\end{pmatrix}=0.$$
Using the Schur complement again, the above equation is equivalent to
$$\det\begin{pmatrix} 0 & \begin{pmatrix}\widetilde X & 0\\ 0 & \widetilde Y\end{pmatrix}\\ \begin{pmatrix}\widetilde X^\top & 0\\ 0& \widetilde Y^\top\end{pmatrix} & \begin{pmatrix}\lambda I_n & \lambda^{1/2} I_n\\ \lambda^{1/2} I_n & \lambda I_n\end{pmatrix}^{-1}\end{pmatrix}=0,\qquad \lambda\in\mathbb R\setminus\{0,1\}. \qquad(3.2)$$
Inspired by equation (3.2), we define the following $(p+q+2n)\times(p+q+2n)$ symmetric block matrix
$$H(\lambda):=\begin{pmatrix} 0 & \begin{pmatrix}X & 0\\ 0 & Y\end{pmatrix}\\ \begin{pmatrix}X^\top &0\\ 0& Y^\top\end{pmatrix} & \begin{pmatrix}\lambda I_n & \lambda^{1/2} I_n\\ \lambda^{1/2} I_n & \lambda I_n\end{pmatrix}^{-1}\end{pmatrix}. \qquad(3.3)$$
In general, we can extend the argument $\lambda$ to $z\in\mathbb C_+:=\{z\in\mathbb C:\operatorname{Im}z>0\}$ and call it $H(z)$, where we take $z^{1/2}$ to be the branch with positive imaginary part. Then using (2.3) and (2.4) we can write equation (3.2) as
$$\det\left[H(\lambda)+\begin{pmatrix}U&\\&E\end{pmatrix}\begin{pmatrix}0&D\\ D&0\end{pmatrix}\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}\right]=0,\qquad(3.4)$$
where $D$ is a $2r\times 2r$ matrix with
$$D:=\begin{pmatrix}\Sigma_a&0\\0&\Sigma_b\end{pmatrix},\qquad \Sigma_a:=\operatorname{diag}(a_1,\cdots,a_r),\qquad \Sigma_b:=\operatorname{diag}(b_1,\cdots,b_r),\qquad(3.5)$$
and $U$ and $E$ are $(p+q)\times 2r$ and $2n\times 2r$ matrices, respectively, with
$$U:=\begin{pmatrix}(u_{a1},\cdots,u_{ar})&0\\ 0&(u_{b1},\cdots,u_{br})\end{pmatrix},\qquad E:=\begin{pmatrix}(Z^\top v_{a1},\cdots,Z^\top v_{ar})&0\\ 0&(Z^\top v_{b1},\cdots,Z^\top v_{br})\end{pmatrix}.\qquad(3.6)$$
If $\lambda$ is not an eigenvalue of $\mathcal C_{XY}$, then $H(\lambda)$ is non-singular by the Schur complement, and (3.4) is equivalent to
$$\det\left[1+\begin{pmatrix}0&D\\D&0\end{pmatrix}\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}G(\lambda)\begin{pmatrix}U&\\&E\end{pmatrix}\right]=0,\qquad(3.7)$$
where we used the identity $\det(1+M_1M_2)=\det(1+M_2M_1)$ for any matrices $M_1$ and $M_2$ of conformable dimensions. Inspired by the discussion above, we define the resolvent (or Green's function) as
$$G(z):=[H(z)]^{-1},\qquad z\in\mathbb C_+\cup\mathbb R,\qquad(3.8)$$
whenever the inverse exists. Note that although $H(\lambda)$ is not well-defined for $\lambda=1$, we can still define $G(1)=\lim_{z\to1}G(z)$ using the Schur complement; see (3.14) and (3.15) below. In order to study the eigenvalues of $\mathcal C_{\widetilde X\widetilde Y}$, we need to obtain estimates on the $4r\times4r$ matrices
$$\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}G(\lambda)\begin{pmatrix}U&\\&E\end{pmatrix}.$$
These are provided by the anisotropic local law on $G(z)$, which is one of the main results in [50]. We will state it in Theorem 3.7 below.

For the proof of Theorem 2.11, we will also use a different representation of (3.7): if $\lambda$ is not an eigenvalue of $\mathcal C^b\equiv\mathcal C_{X\widetilde Y}$, then $\lambda$ is an eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$ if and only if
$$\det\left[1+\begin{pmatrix}0&D_a\\D_a&0\end{pmatrix}\begin{pmatrix}U_a^\top&\\&E_a^\top\end{pmatrix}G^b(\lambda)\begin{pmatrix}U_a&\\&E_a\end{pmatrix}\right]=0,\qquad(3.9)$$
where
$$G^b(z):=\big[H^b(z)\big]^{-1},\qquad H^b(z):=\begin{pmatrix}0&\begin{pmatrix}X&0\\0&\widetilde Y\end{pmatrix}\\ \begin{pmatrix}X^\top&0\\0&\widetilde Y^\top\end{pmatrix}&\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix}^{-1}\end{pmatrix},\qquad(3.10)$$
and
$$D_a:=\begin{pmatrix}\Sigma_a&0\\0&0\end{pmatrix},\qquad U_a:=\begin{pmatrix}(u_{a1},\cdots,u_{ar})&0\\0&0\end{pmatrix},\qquad E_a:=\begin{pmatrix}(Z^\top v_{a1},\cdots,Z^\top v_{ar})&0\\0&0\end{pmatrix}.$$
For simplicity of notations, we introduce the following index sets for our linearized matrices.
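The equivalence between the eigenvalue problem for the SCC matrix and the determinant equation (3.1) can be sanity-checked numerically. The following sketch is not from the paper; the sizes and the Gaussian null setting are chosen purely for illustration. It verifies that the largest squared sample canonical correlation makes the matrix in (3.1) singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 300, 40, 30
X = rng.standard_normal((p, n)) / np.sqrt(n)
Y = rng.standard_normal((q, n)) / np.sqrt(n)
Sxx, Syy, Sxy = X @ X.T, Y @ Y.T, X @ Y.T

# symmetric inverse square root of Sxx
w, V = np.linalg.eigh(Sxx)
Sxx_isqrt = V @ np.diag(w ** -0.5) @ V.T

# sample canonical correlation matrix C_XY = Sxx^{-1/2} Sxy Syy^{-1} Syx Sxx^{-1/2}
C = Sxx_isqrt @ Sxy @ np.linalg.solve(Syy, Sxy.T) @ Sxx_isqrt
lam = np.linalg.eigvalsh(C).max()   # largest squared sample canonical correlation

# (3.1): Sxy Syy^{-1} Syx - lam * Sxx must be singular at an eigenvalue lam
M = Sxy @ np.linalg.solve(Syy, Sxy.T) - lam * Sxx
sv = np.linalg.svd(M, compute_uv=False)
print(0.0 < lam < 1.0, sv[-1] / sv[0] < 1e-10)
```

The check uses the fact that $\det(S_{xx}^{-1/2}S_{xy}S_{yy}^{-1}S_{yx}S_{xx}^{-1/2}-\lambda I)=0$ iff $\det(S_{xy}S_{yy}^{-1}S_{yx}-\lambda S_{xx})=0$, which is exactly the first Schur-complement step above.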
Definition 3.1 (Index sets). We define the index sets
$$\mathcal I_1:=[\![1,p]\!],\qquad \mathcal I_2:=[\![p+1,p+q]\!],\qquad \mathcal I_3:=[\![p+q+1,p+q+n]\!],\qquad \mathcal I_4:=[\![p+q+n+1,p+q+2n]\!].$$
We will consistently use the latin letters $i,j\in\mathcal I_1\cup\mathcal I_2$ and the greek letters $\mu,\nu\in\mathcal I_3\cup\mathcal I_4$. Moreover, we will use the notations $a,b\in\mathcal I:=\cup_{\alpha=1}^4\mathcal I_\alpha$. Then we define the following forms of resolvents that will be used in the proof.
Definition 3.2 (Resolvents). We denote the $(\mathcal I_1\cup\mathcal I_2)\times(\mathcal I_1\cup\mathcal I_2)$ block of $G(z)$ by $G_L(z)$, the $(\mathcal I_1\cup\mathcal I_2)\times(\mathcal I_3\cup\mathcal I_4)$ block by $G_{LR}(z)$, the $(\mathcal I_3\cup\mathcal I_4)\times(\mathcal I_1\cup\mathcal I_2)$ block by $G_{RL}(z)$, and the $(\mathcal I_3\cup\mathcal I_4)\times(\mathcal I_3\cup\mathcal I_4)$ block by $G_R(z)$. We denote the $\mathcal I_\alpha\times\mathcal I_\alpha$ block of $G(z)$ by $G_\alpha(z)$ for $\alpha=1,2,3,4$. Then we define the partial traces
$$m_\alpha(z):=\frac1n\operatorname{Tr}G_\alpha(z)=\frac1n\sum_{a\in\mathcal I_\alpha}G_{aa}(z),\qquad \alpha=1,2,3,4.$$
Recalling the notations in (2.10), we define
$$H:=S_{xx}^{-1/2}S_{xy}S_{yy}^{-1/2},\quad R_1(z):=(HH^\top-z)^{-1},\quad R_2(z):=(H^\top H-z)^{-1},\quad m(z):=\frac1q\operatorname{Tr}R_2(z). \qquad(3.11)$$
Note that we have $R_1H=HR_2$, $H^\top R_1=R_2H^\top$, and
$$\operatorname{Tr}R_1=\operatorname{Tr}R_2-\frac{p-q}{z}=qm(z)-\frac{p-q}{z},\qquad(3.12)$$
since $\mathcal C_{XY}=HH^\top$ has $(p-q)$ more zero eigenvalues than $\mathcal C_{YX}=H^\top H$. Moreover, we define
$$R(z):=\begin{pmatrix}-z&-z^{1/2}H\\ -z^{1/2}H^\top&-z\end{pmatrix}^{-1}.$$
Finally, we can define $G^b_L(z)$, $G^b_R(z)$, $m^b_\alpha(z)$, $H^b$, $R^b_1$ etc. in the obvious way by replacing $Y$ with $\widetilde Y$.

By the Schur complement formula, we can check that
$$R(z)=\begin{pmatrix}R_1&-z^{-1/2}R_1H\\ -z^{-1/2}H^\top R_1&R_2\end{pmatrix}.$$
Let $H=\sum_{k=1}^q\sqrt{\lambda_k}\,\xi_k\zeta_k^\top$ be a singular value decomposition of $H$, where $\lambda_1\ge\cdots\ge\lambda_q\ge0=\lambda_{q+1}=\cdots=\lambda_p$, $\{\xi_k\}_{k=1}^p$ are the left-singular vectors, and $\{\zeta_k\}_{k=1}^q$ are the right-singular vectors. Then we have
$$R(z)=\sum_{k=1}^q\frac1{\lambda_k-z}\begin{pmatrix}\xi_k\xi_k^\top&-z^{-1/2}\sqrt{\lambda_k}\,\xi_k\zeta_k^\top\\ -z^{-1/2}\sqrt{\lambda_k}\,\zeta_k\xi_k^\top&\zeta_k\zeta_k^\top\end{pmatrix}-\frac1z\begin{pmatrix}\sum_{k=q+1}^p\xi_k\xi_k^\top&0\\0&0\end{pmatrix}. \qquad(3.13)$$
On the other hand, applying the Schur complement formula to $G(z)$, it is easy to get that
$$G_L=\begin{pmatrix}S_{xx}^{-1/2}&\\&S_{yy}^{-1/2}\end{pmatrix}R(z)\begin{pmatrix}S_{xx}^{-1/2}&\\&S_{yy}^{-1/2}\end{pmatrix}. \qquad(3.14)$$
Moreover, the other blocks take the forms
$$G_R=\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix}+\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix}\begin{pmatrix}X^\top&\\&Y^\top\end{pmatrix}G_L\begin{pmatrix}X&\\&Y\end{pmatrix}\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix},\qquad(3.15)$$
and
$$G_{LR}(z)=-G_L(z)\begin{pmatrix}X&\\&Y\end{pmatrix}\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix},\qquad G_{RL}(z)=-\begin{pmatrix}zI_n&z^{1/2}I_n\\ z^{1/2}I_n&zI_n\end{pmatrix}\begin{pmatrix}X^\top&\\&Y^\top\end{pmatrix}G_L(z).\qquad(3.16)$$
Expanding the product in (3.15) using (3.14), and calculating the partial traces, one can verify directly that
$$m_3(z)=z+\frac1n\big(-2zp-z^2\operatorname{Tr}R_1+z\operatorname{Tr}R_2\big)=c_2z(1-z)m(z)+(1-c_1-c_2)z,\qquad(3.17)$$
and
$$m_4(z)=z+\frac1n\big(-2zq+z\operatorname{Tr}R_1-z^2\operatorname{Tr}R_2\big)=c_2z(1-z)m(z)-(c_1-c_2)+(1-2c_2)z,\qquad(3.18)$$
where we also used (3.12) in the derivations. In particular, we have the identity
$$m_3(z)-m_4(z)=(1-z)(c_1-c_2).\qquad(3.19)$$
We remark that all the above identities also hold for $G^b$, $G^b_L(z)$, $G^b_R(z)$, $m^b_\alpha(z)$ etc. with some obvious changes of notations.

Since $S_{xx}$ and $S_{yy}$ are standard sample covariance matrices, it is well-known that their eigenvalues all lie inside the supports of the Marchenko--Pastur laws---$[(1-\sqrt{c_1})^2,(1+\sqrt{c_1})^2]$ and $[(1-\sqrt{c_2})^2,(1+\sqrt{c_2})^2]$---with probability $1-o(1)$ [4]. We denote the extreme eigenvalues of $S_{xx}$ and $S_{yy}$ by $\lambda_1(S_{xx})\ge\lambda_p(S_{xx})$ and $\lambda_1(S_{yy})\ge\lambda_q(S_{yy})$. We shall need estimates on them with stronger probability bounds, as given by the following lemma. Lemma 3.3.
Suppose Assumption 2.1 holds. Suppose $X$ and $Y$ have bounded support $\phi_n$ such that $n^{-1/2}\le\phi_n\le n^{-c_\phi}$ for some constant $c_\phi>0$, and $Z$ has bounded support $\psi_n$ such that $n^{-1/2}\le\psi_n\le n^{-c_\psi}$ for some constant $c_\psi>0$. Then for any constant $\varepsilon>0$, we have that with high probability,
$$(1-\sqrt{c_1})^2-\varepsilon\le\lambda_p(S_{xx})\le\lambda_1(S_{xx})\le(1+\sqrt{c_1})^2+\varepsilon,\qquad(3.20)$$
and
$$(1-\sqrt{c_2})^2-\varepsilon\le\lambda_q(S_{yy})\le\lambda_1(S_{yy})\le(1+\sqrt{c_2})^2+\varepsilon.\qquad(3.21)$$
Moreover, there exists a constant $c_0>0$ such that with high probability,
$$c_0\le\lambda_q(S^b_{yy})\le\lambda_1(S^b_{yy})\le c_0^{-1},\qquad(3.22)$$
where $\lambda_1(S^b_{yy})$ and $\lambda_q(S^b_{yy})$ are respectively the largest and smallest eigenvalues of $S^b_{yy}$.

Proof. The estimates (3.20) and (3.21) have been proved in Lemma 3.3 of [50]. To get (3.22), we write
$$S^b_{yy}=\begin{pmatrix}I_q,&B\end{pmatrix}WW^\top\begin{pmatrix}I_q\\ B^\top\end{pmatrix},\qquad W:=\begin{pmatrix}Y\\ Z\end{pmatrix}.$$
Since $r/n\to0$, the estimate (3.21) applied to $WW^\top$ gives that with high probability,
$$(1-\sqrt{c_2})^2-\varepsilon\le\lambda_{q+r}(WW^\top)\le\lambda_1(WW^\top)\le(1+\sqrt{c_2})^2+\varepsilon.$$
Then using that for any unit vector $v\in\mathbb R^q$ we have $v^\top S^b_{yy}v=w^\top WW^\top w$ with $w:=(I_q,B)^\top v$ satisfying $\|w\|\asymp1$, we conclude (3.22).

Let $m_{\alpha c}$ be the asymptotic limits of $m_\alpha$ for $\alpha=1,2,3,4$. In [50], we have obtained that
$$m_{1c}(z)=\frac{-z+c_1+c_2+\sqrt{(z-\lambda_-)(z-\lambda_+)}}{2(1-c_1)z(1-z)}-\frac{c_1}{(1-c_1)z},\qquad(3.23)$$
$$m_{2c}(z)=\frac{-z+c_1+c_2+\sqrt{(z-\lambda_-)(z-\lambda_+)}}{2(1-c_2)z(1-z)}-\frac{c_2}{(1-c_2)z},\qquad(3.24)$$
$$m_{3c}(z)=\frac12\Big[(1-2c_1)z+c_1-c_2+\sqrt{(z-\lambda_-)(z-\lambda_+)}\Big],\qquad(3.25)$$
$$m_{4c}(z)=\frac12\Big[(1-2c_2)z+c_2-c_1+\sqrt{(z-\lambda_-)(z-\lambda_+)}\Big],\qquad(3.26)$$
where $\lambda_\pm$ are defined in (2.13). One can check that as $z\to1$, both $m_{1c}(z)$ and $m_{2c}(z)$ have finite limits, and without loss of generality, we still denote them by $m_{1c}(1)$ and $m_{2c}(1)$. By (3.17), we can easily obtain the asymptotic limit of $m(z)$ as
$$m_c(z)=\frac{m_{3c}(z)+(c_1+c_2-1)z}{c_2z(1-z)}.\qquad(3.27)$$
Through direct calculation, one can check easily that the $m_{\alpha c}$ satisfy the following equations:
$$m_{1c}(z)m_{3c}(z)=-c_1,\qquad m_{2c}(z)m_{4c}(z)=-c_2,\qquad m_{3c}(z)-m_{4c}(z)=(1-z)(c_1-c_2).\qquad(3.28)$$
Finally, we introduce the function
$$h(z):=z^{1/2}\big[m_{3c}(z)+(1-c_1)(1-z)\big]=z^{1/2}\big[m_{4c}(z)+(1-c_2)(1-z)\big]=\frac{z^{1/2}}2\Big[-z+(2-c_1-c_2)+\sqrt{(z-\lambda_-)(z-\lambda_+)}\Big].\qquad(3.29)$$
Now with the functions $m_{\alpha c}$ and $h$, we can define the matrix limit of $G(z)$ as
$$\Pi(z):=\begin{pmatrix}\begin{pmatrix}c_1^{-1}m_{1c}(z)I_p&\\&c_2^{-1}m_{2c}(z)I_q\end{pmatrix}&\\&\begin{pmatrix}m_{3c}(z)I_n&h(z)I_n\\h(z)I_n&m_{4c}(z)I_n\end{pmatrix}\end{pmatrix}.\qquad(3.30)$$
Given $z=E+i\eta$, we define its distance (along the real axis) to the two edges as
$$\kappa\equiv\kappa_E:=\min\{|E-\lambda_-|,|E-\lambda_+|\}.\qquad(3.31)$$
We have the following lemma, which can be proved through direct calculations using (3.23)--(3.26). Lemma 3.4.
Fix any constants $c,C>0$. If (2.9) holds, then we have the following estimates.

(1) For $z\in\mathbb C_+\cap\{z:c\le|z|\le C\}$, we have
$$|m_c(z)|\sim1,\qquad 0\le\operatorname{Im}m_c(z)\sim\begin{cases}\eta/\sqrt{\kappa+\eta},&\text{if }E\notin[\lambda_-,\lambda_+],\\ \sqrt{\kappa+\eta},&\text{if }E\in[\lambda_-,\lambda_+].\end{cases}\qquad(3.32)$$

(2) For $z,z_1,z_2\in\mathbb C_+\cap\{z:c\le|z|\le C\}\cap\{z:\operatorname{Re}z>\lambda_+\}$, we have
$$|m_c(z)-m_c(\lambda_+)|\sim|z-\lambda_+|^{1/2},\qquad |m_c'(z)|\sim|z-\lambda_+|^{-1/2},\qquad(3.33)$$
and
$$|m_c(z_1)-m_c(z_2)|\sim\frac{|z_1-z_2|}{\max_{i=1,2}|z_i-\lambda_+|^{1/2}}.\qquad(3.34)$$
The above estimates also hold for $m_{1c}$, $m_{2c}$, $m_{3c}$ and $m_{4c}$. Finally, (3.33), (3.34) and the first estimate in (3.32) hold for $h(z)$.

For simplicity of notations, we introduce the following notion of generalized entries.
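The exact algebraic identities above are easy to verify numerically: (3.12) and the two representations in (3.17)--(3.18) (hence (3.19)) hold at every finite $n$, and the relations (3.28) hold for the limiting functions at any point outside the support. A minimal sketch (illustrative sizes and spectral parameters, not from the paper; for $c_1=0.2$, $c_2=0.1$ the edges are exactly $\lambda_+=0.5$, $\lambda_-=0.02$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 50, 30
c1, c2 = p / n, q / n
X = rng.standard_normal((p, n)) / np.sqrt(n)
Y = rng.standard_normal((q, n)) / np.sqrt(n)
Sxx, Syy, Sxy = X @ X.T, Y @ Y.T, X @ Y.T

w, V = np.linalg.eigh(Sxx); Sxx_is = V @ np.diag(w ** -0.5) @ V.T
w, V = np.linalg.eigh(Syy); Syy_is = V @ np.diag(w ** -0.5) @ V.T
H = Sxx_is @ Sxy @ Syy_is                         # p x q

z = 0.66 + 0.3j
R1 = np.linalg.inv(H @ H.T - z * np.eye(p))
R2 = np.linalg.inv(H.T @ H - z * np.eye(q))
m = np.trace(R2) / q

# (3.12): Tr R1 = Tr R2 - (p - q)/z   (HH^T has p - q extra zero eigenvalues)
err312 = abs(np.trace(R1) - np.trace(R2) + (p - q) / z)

# (3.17)-(3.18): both representations of m3, m4 agree; (3.19) follows
m3a = z + (-2 * z * p - z**2 * np.trace(R1) + z * np.trace(R2)) / n
m3b = c2 * z * (1 - z) * m + (1 - c1 - c2) * z
m4a = z + (-2 * z * q + z * np.trace(R1) - z**2 * np.trace(R2)) / n
m4b = c2 * z * (1 - z) * m - (c1 - c2) + (1 - 2 * c2) * z
err319 = abs((m3a - m4a) - (1 - z) * (c1 - c2))

# (3.23)-(3.28) for the limiting functions, at a real point z0 > lambda_+
c1, c2 = 0.2, 0.1
lam_p, lam_m = 0.5, 0.02
z0 = 0.66
sq = np.sqrt((z0 - lam_m) * (z0 - lam_p))
m1c = (-z0 + c1 + c2 + sq) / (2 * (1 - c1) * z0 * (1 - z0)) - c1 / ((1 - c1) * z0)
m2c = (-z0 + c1 + c2 + sq) / (2 * (1 - c2) * z0 * (1 - z0)) - c2 / ((1 - c2) * z0)
m3c = 0.5 * ((1 - 2 * c1) * z0 + c1 - c2 + sq)
m4c = 0.5 * ((1 - 2 * c2) * z0 + c2 - c1 + sq)
print(err312, abs(m3a - m3b), abs(m4a - m4b), err319)
print(abs(m1c * m3c + c1), abs(m2c * m4c + c2))
```

All printed errors are at machine precision, confirming that the finite-$n$ identities and the limiting relations are exact, not asymptotic, statements.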
Definition 3.5 (Generalized entries). For $\mathbf v,\mathbf w\in\mathbb C^{\mathcal I}$, $a\in\mathcal I$ and an $\mathcal I\times\mathcal I$ matrix $A$, we denote
$$A_{\mathbf v\mathbf w}:=\langle\mathbf v,A\mathbf w\rangle,\qquad A_{\mathbf va}:=\langle\mathbf v,Ae_a\rangle,\qquad A_{a\mathbf w}:=\langle e_a,A\mathbf w\rangle,\qquad(3.35)$$
where $e_a$ is the standard unit vector along the $a$-th coordinate axis, and the inner product is defined as $\langle\mathbf v,\mathbf w\rangle:=\mathbf v^*\mathbf w$ with $\mathbf v^*$ being the conjugate transpose of $\mathbf v$. Given a vector $\mathbf v\in\mathbb C^{\mathcal I_\alpha}$, $\alpha=1,2,3,4$, we always identify it with its natural embedding in $\mathbb C^{\mathcal I}$. For example, we shall identify $\mathbf v\in\mathbb C^{\mathcal I_1}$ with $\binom{\mathbf v}{0_{q+2n}}\in\mathbb C^{\mathcal I}$.

We define the following spectral domains for the local laws of $G(z)$.

Definition 3.6 (Spectral domains). For any constant $\varepsilon>0$, we define the domains
$$S(\varepsilon):=\big\{z=E+i\eta:\ \varepsilon\le E\le1,\ n^{-1+\varepsilon}\le\eta\le\varepsilon^{-1}\big\}\qquad(3.36)$$
and
$$S_{out}(\varepsilon):=S(\varepsilon)\cap\big\{z=E+i\eta:\ E\notin[\lambda_-,\lambda_+],\ n\eta\sqrt{\kappa+\eta}\ge n^\varepsilon\big\}.\qquad(3.37)$$
Correspondingly, we shall define the following two domains that are away from $z=1$: for any fixed $\widetilde\varepsilon>0$,
$$\widetilde S(\varepsilon,\widetilde\varepsilon):=\big\{z=E+i\eta:\ \varepsilon\le E\le1-\widetilde\varepsilon,\ n^{-1+\varepsilon}\le\eta\le\varepsilon^{-1}\big\},\qquad \widetilde S_{out}(\varepsilon,\widetilde\varepsilon):=\widetilde S(\varepsilon,\widetilde\varepsilon)\cap S_{out}(\varepsilon).$$
Now we are ready to state the local laws for $G(z)$. For $z=E+i\eta$, we define the control parameter
$$\Psi(z):=\sqrt{\frac{\operatorname{Im}m_c(z)}{n\eta}}+\frac1{n\eta}.\qquad(3.38)$$

Theorem 3.7 (Theorem 2.11 and Theorem 2.12 of [50]). Suppose the assumptions of Lemma 2.7 hold. Then for any fixed $\widetilde\varepsilon,\varepsilon>0$, the following estimates hold.

(1) Anisotropic local law: for any $z\in S(\varepsilon)$ and deterministic unit vectors $\mathbf u,\mathbf v\in\mathbb C^{\mathcal I}$, we have
$$|G_{\mathbf u\mathbf v}(z)-\Pi_{\mathbf u\mathbf v}(z)|\prec\phi_n+\Psi(z).\qquad(3.39)$$

(2) Averaged local law: for any $z\in\widetilde S(\varepsilon,\widetilde\varepsilon)$, we have
$$|m(z)-m_c(z)|\prec(n\eta)^{-1}.\qquad(3.40)$$
Moreover, outside of the spectrum we have the following stronger estimate
$$|m(z)-m_c(z)|\prec\frac1{n(\kappa+\eta)}+\frac1{(n\eta)^2\sqrt{\kappa+\eta}}\qquad(3.41)$$
for any $z\in\widetilde S_{out}(\varepsilon,\widetilde\varepsilon)$. The estimates (3.40) and (3.41) also hold for $m_\alpha(z)-m_{\alpha c}(z)$, $\alpha=1,2,3,4$.

All the above estimates are uniform in the spectral parameter $z$ and in any set of deterministic unit vectors of cardinality $n^{O(1)}$.
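As a numerical illustration of the deterministic edges $\lambda_\pm$ around which the local laws are built, one can simulate the null case and compare the extreme nonzero eigenvalues of $\mathcal C_{XY}$ with $\lambda_\pm$ from (2.13). A quick sketch with arbitrarily chosen dimensions; the tolerance is deliberately loose since edge fluctuations are of order $n^{-2/3}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 2000, 400, 200
c1, c2 = p / n, q / n
root = 2 * np.sqrt(c1 * c2 * (1 - c1) * (1 - c2))
lam_p = c1 + c2 - 2 * c1 * c2 + root
lam_m = c1 + c2 - 2 * c1 * c2 - root

X = rng.standard_normal((p, n)) / np.sqrt(n)
Y = rng.standard_normal((q, n)) / np.sqrt(n)
Sxx, Syy, Sxy = X @ X.T, Y @ Y.T, X @ Y.T
w, V = np.linalg.eigh(Sxx)
Sxx_is = V @ np.diag(w ** -0.5) @ V.T
C = Sxx_is @ Sxy @ np.linalg.solve(Syy, Sxy.T) @ Sxx_is

ev = np.sort(np.linalg.eigvalsh(C))
print(abs(ev[-1] - lam_p))      # largest eigenvalue sticks to lambda_+
print(abs(ev[p - q] - lam_m))   # smallest nonzero eigenvalue (C has p - q zeros)
```

With these sizes the two printed deviations are a few times $10^{-3}$, consistent with the $n^{-2/3}$ rigidity scale.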
The averaged local law leads to the following rigidity of eigenvalues.
Theorem 3.8 (Theorem 2.5 of [50]). Suppose the assumptions of Lemma 2.7 hold. For any fixed $\delta>0$, the following rigidity estimate holds for all $1\le i\le(1-\delta)q$:
$$|\lambda_i-\gamma_i|\prec i^{-1/3}n^{-2/3}.\qquad(3.42)$$
The anisotropic local law (3.39) and the rigidity estimate (3.42) together give the following delocalization estimates for eigenvectors.

Lemma 3.9 (Lemma 3.9 of [50]). Suppose (3.39) and (3.42) hold. Then for any small constant $\delta>0$ and deterministic unit vectors $\mathbf u_\alpha\in\mathbb C^{\mathcal I_\alpha}$, $\alpha=1,2,3,4$, the following estimates hold:
$$\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_1,S_{xx}^{-1/2}\xi_k\rangle\big|+\big|\langle\mathbf u_2,S_{yy}^{-1/2}\zeta_k\rangle\big|\Big\}\prec n^{-1/2},\qquad(3.43)$$
and
$$\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_3,X^\top S_{xx}^{-1/2}\xi_k\rangle\big|+\big|\langle\mathbf u_4,Y^\top S_{yy}^{-1/2}\zeta_k\rangle\big|\Big\}\prec n^{-1/2}.\qquad(3.44)$$
Away from the support $[\lambda_-,\lambda_+]$, the anisotropic local law can be strengthened as follows.

Theorem 3.10 (Anisotropic local law outside of the spectrum). Suppose the assumptions of Lemma 2.7 hold. Fix any constant $\varepsilon>0$. Then for any
$$z\in D_{out}(\varepsilon):=\big\{E+i\eta:\ \lambda_++n^{-2/3+\varepsilon}\le E\le1,\ 0\le\eta\le1\big\}\qquad(3.45)$$
and deterministic unit vectors $\mathbf u,\mathbf v\in\mathbb C^{\mathcal I}$, the following anisotropic local law holds:
$$|G_{\mathbf u\mathbf v}(z)-\Pi_{\mathbf u\mathbf v}(z)|\prec\phi_n+\sqrt{\frac{\operatorname{Im}m_c(z)}{n\eta}}\asymp\phi_n+n^{-1/2}(\kappa+\eta)^{-1/4}.\qquad(3.46)$$

Proof. The second step of (3.46) follows from (3.32). Using (3.39) and $\kappa\ge n^{-2/3+\varepsilon}$, we see that (3.46) holds for $z\in S(\varepsilon)\cap D_{out}(\varepsilon)$ with $\eta\ge\eta_0:=n^{-1/2}\kappa^{1/4}$. Hence it remains to prove that for $z\in D_{out}(\varepsilon)$ with $0\le\eta\le\eta_0$, we have
$$|G_{\mathbf v\mathbf v}(z)-\Pi_{\mathbf v\mathbf v}(z)|\prec\phi_n+n^{-1/2}\kappa^{-1/4}\qquad(3.47)$$
for any deterministic unit vector $\mathbf v\in\mathbb C^{\mathcal I}$. Note that (3.47) implies (3.46) by the polarization identity
$$\langle\mathbf u,M\mathbf v\rangle=\frac14\langle\mathbf u+\mathbf v,M(\mathbf u+\mathbf v)\rangle-\frac14\langle\mathbf u-\mathbf v,M(\mathbf u-\mathbf v)\rangle+\frac{i}4\langle i\mathbf u+\mathbf v,M(i\mathbf u+\mathbf v)\rangle-\frac{i}4\langle i\mathbf u-\mathbf v,M(i\mathbf u-\mathbf v)\rangle$$
for any $\mathcal I\times\mathcal I$ matrix $M$. Now fix any $z=E+i\eta\in D_{out}(\varepsilon)$ with $\eta\le\eta_0$, and denote $z_0:=E+i\eta_0$. Since (3.47) holds at $z_0$, it suffices to prove that
$$|\Pi_{\mathbf v\mathbf v}(z)-\Pi_{\mathbf v\mathbf v}(z_0)|\prec n^{-1/2}\kappa^{-1/4}\qquad(3.48)$$
and
$$|G_{\mathbf v\mathbf v}(z)-G_{\mathbf v\mathbf v}(z_0)|\prec n^{-1/2}\kappa^{-1/4}.\qquad(3.49)$$
The estimate (3.48) follows immediately from (3.34). It remains to show (3.49).

We write $\mathbf v=(\mathbf v_1^\top,\mathbf v_2^\top,\mathbf v_3^\top,\mathbf v_4^\top)^\top$, where $\mathbf v_\alpha\in\mathbb C^{\mathcal I_\alpha}$, $\alpha=1,2,3,4$. We claim that
$$\Big|(\mathbf v_1^*,\mathbf v_2^*)\,[G_L(z)-G_L(z_0)]\,\binom{\mathbf v_1}{\mathbf v_2}\Big|\prec n^{-1/2}\kappa^{-1/4}.\qquad(3.50)$$
Using (3.13) and (3.14), and recalling that with high probability $E-\lambda_k\gtrsim1$ for $k\ge(1-\delta)q$ by the rigidity estimate (3.42), we obtain that
$$|\langle\mathbf v_1,(G_1(z)-G_1(z_0))\mathbf v_1\rangle|\prec\sum_{k\le(1-\delta)q}\frac{\eta_0\,|\langle\mathbf v_1,S_{xx}^{-1/2}\xi_k\rangle|^2}{[(E-\lambda_k)^2+\eta^2]^{1/2}[(E-\lambda_k)^2+\eta_0^2]^{1/2}}+\eta_0\sum_{k>(1-\delta)q}|\langle\mathbf v_1,S_{xx}^{-1/2}\xi_k\rangle|^2.\qquad(3.51)$$
By (3.42), we have that for any $k$, $E-\lambda_k\gtrsim\kappa\gg\eta_0$ with high probability. Then using (3.43) and (3.20), we can bound (3.51) by
$$|\langle\mathbf v_1,(G_1(z)-G_1(z_0))\mathbf v_1\rangle|\prec\eta_0+\frac1n\sum_{k=1}^q\frac{\eta_0}{(E-\lambda_k)^2+\eta_0^2}=\eta_0+c_2\operatorname{Im}m(z_0)\prec\eta_0+\frac1{n\kappa}+\frac1{(n\eta_0)^2\sqrt\kappa}+\operatorname{Im}m_c(z_0)\lesssim\frac1{n\kappa}+\frac1{(n\eta_0)^2\sqrt\kappa}+\frac{\eta_0}{\sqrt{\kappa+\eta_0}}\lesssim n^{-1/2}\kappa^{-1/4},$$
where we used the spectral decomposition of $m(z_0)$ in the second step, (3.41) in the third step, and (3.32) in the fourth step. Similarly, we have
$$|\langle\mathbf v_1,(G_{12}(z)-G_{12}(z_0))\mathbf v_2\rangle|\prec\big|1-(z_0z^{-1})^{1/2}\big|\,|\langle\mathbf v_1,G_{12}(z_0)\mathbf v_2\rangle|+\sum_{k=1}^q\frac{\eta_0\,|\langle\mathbf v_1,S_{xx}^{-1/2}\xi_k\rangle||\langle\mathbf v_2,S_{yy}^{-1/2}\zeta_k\rangle|}{|\lambda_k-z||\lambda_k-z_0|}\prec\eta_0+\operatorname{Im}m(z_0)\prec n^{-1/2}\kappa^{-1/4}.$$
With similar arguments for $\langle\mathbf v_2,(G_2(z)-G_2(z_0))\mathbf v_2\rangle$ and $\langle\mathbf v_2,(G_{21}(z)-G_{21}(z_0))\mathbf v_1\rangle$, we conclude (3.50). Finally, using (3.50), (3.15), (3.16) and Lemma 3.9, we can prove (3.49). We omit the details.

The second moment of $\langle\mathbf u,(G(z)-\Pi(z))\mathbf v\rangle$ in fact satisfies a stronger bound, which will be used in the proof of Theorem 2.14. Lemma 3.11.
Suppose the assumptions of Lemma 2.7 hold. Fix any constant $\varepsilon>0$. For all $z\in S(\varepsilon)$ (recall (3.36)), we have that
$$\mathbb E\,|G_{\mathbf u\mathbf v}(z)-\Pi_{\mathbf u\mathbf v}(z)|^2\prec\Psi^2(z)\qquad(3.52)$$
for any deterministic unit vectors $\mathbf u,\mathbf v\in\mathbb C^{\mathcal I}$. Moreover, for all $z\in D_{out}(\varepsilon)$ we have that
$$\mathbb E\,|G_{\mathbf u\mathbf v}(z)-\Pi_{\mathbf u\mathbf v}(z)|^2\prec\frac1{n\sqrt{\kappa+\eta}}.\qquad(3.53)$$

Proof. The estimate (3.52) has been proved in Lemma 3.10 of [50]. The estimate (3.53) can be proved using almost the same argument, where the only difference is that we replace the anisotropic local law (3.39) with the stronger estimate (3.46) in the proof. We omit the details.

Finally, we state the local law for $G^b(z)$, which can be derived easily from the local law for $G(z)$ with the following Woodbury matrix identity: for $A,S,B,T$ of conformable dimensions,
$$(A+SBT)^{-1}=A^{-1}-A^{-1}S\big(B^{-1}+TA^{-1}S\big)^{-1}TA^{-1},\qquad(3.54)$$
and the following approximate isometry condition on $Z$:
$$\|ZZ^\top-I_r\|\prec\psi_n.\qquad(3.55)$$
The estimate (3.55) can be proved using a standard large deviation estimate (cf. Lemma 3.8 of [22]). We define $\Pi^b(z)$ as the matrix obtained by replacing $G$ with $\Pi$ on the right-hand side of (3.61) below; using (3.55), its entries can be computed explicitly in terms of $m_{2c}$, $m_{4c}$, $h$ and $\Sigma_b$, where
$$U_b:=\begin{pmatrix}0&0\\0&(u_{b1},\cdots,u_{br})\end{pmatrix},\qquad E_b:=\begin{pmatrix}0&0\\0&(Z^\top v_{b1},\cdots,Z^\top v_{br})\end{pmatrix}.\qquad(3.56)$$

Lemma 3.12 (Local laws for $G^b$). Suppose the assumptions of Lemma 2.7 hold. Fix any constant $\varepsilon>0$ and unit vectors $\mathbf u,\mathbf v\in\mathbb C^{\mathcal I}$ that are independent of $X$ and $Y$. Then we have that for all $z\in S(\varepsilon)$,
$$\big|G^b_{\mathbf u\mathbf v}(z)-\Pi^b_{\mathbf u\mathbf v}(z)\big|\prec\psi_n+\phi_n+\Psi(z),\qquad(3.57)$$
and for all $z\in D_{out}(\varepsilon)$,
$$\big|G^b_{\mathbf u\mathbf v}(z)-\Pi^b_{\mathbf u\mathbf v}(z)\big|\prec\psi_n+\phi_n+n^{-1/2}(\kappa+\eta)^{-1/4}.\qquad(3.58)$$
Moreover, (3.57) and (2.20) together imply that for any constant $\delta>0$,
$$\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_1,S_{xx}^{-1/2}\xi^b_k\rangle\big|+\big|\langle\mathbf u_2,(S^b_{yy})^{-1/2}\zeta^b_k\rangle\big|\Big\}\prec n^{-1/2},\qquad(3.59)$$
and
$$\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_3,X^\top S_{xx}^{-1/2}\xi^b_k\rangle\big|+\big|\langle\mathbf u_4,\widetilde Y^\top(S^b_{yy})^{-1/2}\zeta^b_k\rangle\big|\Big\}\prec n^{-1/2},\qquad(3.60)$$
where $\{\xi^b_k\}_{k=1}^p$ and $\{\zeta^b_k\}_{k=1}^q$ are the left and right singular vectors of $H^b$, respectively, and the $\mathbf u_\alpha\in\mathbb C^{\mathcal I_\alpha}$ are unit vectors independent of $X$ and $Y$.

Proof. Using (3.54), we can write $G^b(z)$ in (3.10) as
$$G^b=G-G\begin{pmatrix}U_b&\\&E_b\end{pmatrix}\left[\begin{pmatrix}0&D_b\\D_b&0\end{pmatrix}^{-1}+\begin{pmatrix}U_b^\top&\\&E_b^\top\end{pmatrix}G\begin{pmatrix}U_b&\\&E_b\end{pmatrix}\right]^{-1}\begin{pmatrix}U_b^\top&\\&E_b^\top\end{pmatrix}G,\qquad(3.61)$$
where $D_b:=\begin{pmatrix}0&0\\0&\Sigma_b\end{pmatrix}$. Since $D_b^{-1}$ is not well-defined, the above expression should be understood through
$$\left[\begin{pmatrix}0&D_b\\D_b&0\end{pmatrix}^{-1}+\begin{pmatrix}U_b^\top&\\&E_b^\top\end{pmatrix}G\begin{pmatrix}U_b&\\&E_b\end{pmatrix}\right]^{-1}:=\left[1+\begin{pmatrix}0&D_b\\D_b&0\end{pmatrix}\begin{pmatrix}U_b^\top&\\&E_b^\top\end{pmatrix}G\begin{pmatrix}U_b&\\&E_b\end{pmatrix}\right]^{-1}\begin{pmatrix}0&D_b\\D_b&0\end{pmatrix}.$$
Combining (3.61) with Theorem 3.7, Theorem 3.10 and (3.55), we can conclude (3.57) and (3.58). The estimates (3.59) and (3.60) follow from (3.57) and (2.20) as in Lemma 3.9, where the details can be found in the proof of Lemma 3.9 of [50].
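The Woodbury identity (3.54), on which the derivation of $G^b$ from $G$ rests, can be checked directly on random matrices. A generic sketch, not specific to the matrices of this paper; the diagonal shifts are only there to make all inverses safely well-conditioned:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 8, 3
A = rng.standard_normal((N, N)) + N * np.eye(N)   # shifted to be safely invertible
S = rng.standard_normal((N, k))
B = rng.standard_normal((k, k)) + k * np.eye(k)
T = rng.standard_normal((k, N))

lhs = np.linalg.inv(A + S @ B @ T)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ S @ np.linalg.inv(np.linalg.inv(B) + T @ Ai @ S) @ T @ Ai
print(np.max(np.abs(lhs - rhs)))   # machine precision
```

The regularized interpretation used below for the singular middle matrix follows the same pattern: one multiplies through by the middle matrix before inverting, so that no inverse of a singular block is ever formed.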
Proof of Theorem 2.9
In this section, we prove Theorem 2.9 using the local laws, Theorems 3.7 and 3.10, and the eigenvalue rigidity estimate, Theorem 3.8. During the proof, in order to avoid some non-generic events, we assume that
$$\text{the entries }x_{ij},\ y_{ij}\ \text{and}\ z_{ij}\ \text{have continuous densities}.\qquad(4.1)$$
This can be achieved by adding a small perturbation to $X$, $Y$ and $Z$. For example, we can add to each matrix a small Gaussian matrix:
$$X\to X+\delta e^{-n}X^G,\qquad Y\to Y+\delta e^{-n}Y^G,\qquad Z\to Z+\delta e^{-n}Z^G.$$
These Gaussian components are negligible for our results and can easily be removed by taking $\delta\to0$. Under (4.1), the matrices $\widetilde X\widetilde X^\top$, $\widetilde Y\widetilde Y^\top$, $XX^\top$ and $YY^\top$ are all non-singular almost surely. Moreover, almost surely, no $\lambda\in(0,1)$ is an eigenvalue of both $\mathcal C_{XY}$ and $\mathcal C_{\widetilde X\widetilde Y}$. Hence by (3.7), $\lambda\in(0,1)$ is an eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$ if and only if
$$\det\left[1+\begin{pmatrix}0&D\\D&0\end{pmatrix}\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}G(\lambda)\begin{pmatrix}U&\\&E\end{pmatrix}\right]=0.\qquad(4.2)$$
Now for $\lambda\in D_{out}(\varepsilon)$ (recall (3.45)), using Theorem 3.10 and (3.55), we can write (4.2) as
$$0=\det\left[1+\begin{pmatrix}0&D\\D&0\end{pmatrix}\big(\Pi_r(\lambda)+\mathcal E_r\big)\right]=\det\left[\begin{pmatrix}I_{2r}&D\begin{pmatrix}m_{3c}(\lambda)I_r&h(\lambda)M_r\\h(\lambda)M_r^\top&m_{4c}(\lambda)I_r\end{pmatrix}\\ D\begin{pmatrix}c_1^{-1}m_{1c}(\lambda)I_r&\\&c_2^{-1}m_{2c}(\lambda)I_r\end{pmatrix}&I_{2r}\end{pmatrix}+\begin{pmatrix}0&D\\D&0\end{pmatrix}\mathcal E_r\right].\qquad(4.3)$$
Here $\mathcal E_r$ is a $4r\times4r$ random matrix satisfying
$$\|\mathcal E_r\|\prec\psi_n+\phi_n+n^{-1/2}\kappa_\lambda^{-1/4},\qquad\text{with}\quad \kappa_\lambda:=\min\{|\lambda-\lambda_-|,|\lambda-\lambda_+|\},\qquad(4.4)$$
$M_r$ is an $r\times r$ orthogonal matrix with entries $(M_r)_{ij}:=(v_{ai})^\top v_{bj}$, $1\le i,j\le r$, and $\Pi_r(\lambda)$ is defined as
$$\Pi_r(\lambda):=\begin{pmatrix}\begin{pmatrix}c_1^{-1}m_{1c}(\lambda)I_r&\\&c_2^{-1}m_{2c}(\lambda)I_r\end{pmatrix}&\\&\begin{pmatrix}m_{3c}(\lambda)I_r&h(\lambda)M_r\\h(\lambda)M_r^\top&m_{4c}(\lambda)I_r\end{pmatrix}\end{pmatrix}.$$
Applying the Schur complement formula and using (3.28), we obtain that (4.3) is equivalent to
$$\det\left[\begin{pmatrix}I_r+\Sigma_a^2&h(\lambda)m_{4c}^{-1}(\lambda)\Sigma_aM_r\Sigma_b\\ h(\lambda)m_{3c}^{-1}(\lambda)\Sigma_bM_r^\top\Sigma_a&I_r+\Sigma_b^2\end{pmatrix}+D\,\mathcal E_r^{(1)}\right]=0$$
$$\Longleftrightarrow\quad\det\left(\frac{m_{3c}(\lambda)m_{4c}(\lambda)}{h^2(\lambda)}I_r-(1+\Sigma_a^2)^{-1/2}\Sigma_aM_r\Sigma_b^2(1+\Sigma_b^2)^{-1}M_r^\top\Sigma_a(1+\Sigma_a^2)^{-1/2}+\mathcal E_r^{(2)}\right)=0,\qquad(4.5)$$
where $\mathcal E_r^{(1)}$ and $\mathcal E_r^{(2)}$ are $2r\times2r$ and $r\times r$ random matrices, both of which satisfy the same bound as in (4.4). Note that the matrix
$$(1+\Sigma_a^2)^{-1/2}\Sigma_aM_r\Sigma_b^2(1+\Sigma_b^2)^{-1}M_r^\top\Sigma_a(1+\Sigma_a^2)^{-1/2}$$
is the PCC matrix $(1+AA^\top)^{-1/2}AB^\top(1+BB^\top)^{-1}BA^\top(1+AA^\top)^{-1/2}$ written in the basis of the $u_{ai}$, $1\le i\le r$. Thus its eigenvalues are exactly the squares of the PCCs, $t_1,t_2,\cdots,t_r$ (recall (2.14)). Hence, after a change of basis, (4.5) reduces to
$$\det\Big(\frac{m_{3c}(\lambda)m_{4c}(\lambda)}{h^2(\lambda)}I_r-\operatorname{diag}(t_1,\cdots,t_r)+\mathcal E_r(\lambda)\Big)=0,\qquad(4.6)$$
where $\mathcal E_r$ again satisfies the bound in (4.4).

Next we show that if $\mathcal E_r=0$, then solving equation (4.6) gives the classical locations $\theta_i$ defined in (2.15). Using (3.25), (3.26) and (3.29), we can calculate that
$$f_c(z):=\frac{m_{3c}(z)m_{4c}(z)}{h^2(z)}=\frac{z-(c_1+c_2-2c_1c_2)+\sqrt{(z-\lambda_-)(z-\lambda_+)}}{2(1-c_1)(1-c_2)}.$$
We can find the inverse function of $f_c(z)$ for $z\notin[\lambda_-,\lambda_+]$ as
$$g_c(\xi):=\xi\big(1-c_1+c_1\xi^{-1}\big)\big(1-c_2+c_2\xi^{-1}\big).$$
Note that $f_c(\lambda)$ is monotonically increasing in $\lambda$ for $\lambda>\lambda_+$, so the equation $f_c(\lambda)-t_i=0$ has a solution $\lambda>\lambda_+$ if and only if (recall (1.3))
$$f_c(\lambda_+)<t_i\quad\Longleftrightarrow\quad t_c<t_i.\qquad(4.7)$$
If (4.7) holds, the classical location of the outlier corresponding to $t_i$ is $\theta_i=g_c(t_i)$, which explains (2.15). With direct calculation, one can verify the following simple estimates on $f_c$ and $g_c$.

Lemma 4.1. Fix a large constant $C>0$. Let $z,z_1,z_2\in\mathbf D:=\{z\in\mathbb C:\lambda_+<\operatorname{Re}z\le C,\ 0\le\operatorname{Im}z\le C\}$ and $\xi,\xi_1,\xi_2\in f_c(\mathbf D)$. Then the following estimates hold:
$$|f_c(z)-f_c(\lambda_+)|\sim|z-\lambda_+|^{1/2},\qquad |f_c'(z)|\sim|z-\lambda_+|^{-1/2},\qquad(4.8)$$
$$|g_c(\xi)-\lambda_+|\sim|\xi-t_c|^2,\qquad |g_c'(\xi)|\sim|\xi-t_c|,\qquad(4.9)$$
and
$$|f_c(z_1)-f_c(z_2)|\sim\frac{|z_1-z_2|}{\max_{i=1,2}|z_i-\lambda_+|^{1/2}},\qquad |g_c(\xi_1)-g_c(\xi_2)|\sim|\xi_1-\xi_2|\cdot\max_{i=1,2}|\xi_i-t_c|.\qquad(4.10)$$
The estimate (4.8) also holds for $z$ with $\lambda_-+c\le\operatorname{Re}z\le\lambda_+$ and $0<\operatorname{Im}z\le c^{-1}$ for any constant $c>0$.

For the proof of Theorem 2.9, we record the following eigenvalue interlacing result:
$$\widetilde\lambda_i\in[\lambda_{i+2r},\lambda_{i-2r}],\qquad(4.11)$$
where we adopt the convention that $\lambda_i=1$ if $i<1$ and $\lambda_i=0$ if $i>q$. For the reader's convenience, we briefly describe why (4.11) holds. We first consider a $1$-dimensional perturbation: $X_1:=X+u_1v_1^\top$ with $u_1\in\mathbb R^p$, $v_1\in\mathbb R^n$. Note that $P_X:=X^\top(XX^\top)^{-1}X$ is the projection onto the subspace $W$ spanned by the rows of $X$. Similarly, $P_{X_1}:=X_1^\top(X_1X_1^\top)^{-1}X_1$ is the projection onto the subspace $W_1$ spanned by the rows of $X_1$. Moreover, $W$ and $W_1$ differ at most by a $1$-dimensional subspace. Hence by Cauchy interlacing, we have
$$\lambda_i(P_{X_1}P_YP_{X_1})\in\big[\lambda_{i+1}(P_XP_YP_X),\ \lambda_{i-1}(P_XP_YP_X)\big],\qquad\text{where}\quad P_Y:=Y^\top(YY^\top)^{-1}Y.$$
Notice that $P_XP_YP_X$ (resp. $P_{X_1}P_YP_{X_1}$) has the same nonzero eigenvalues as $\mathcal C_{XY}$ (resp. $\mathcal C_{X_1Y}$): if $u$ is an eigenvector of $\mathcal C_{XY}$ with eigenvalue $\lambda\ne0$, then $X^\top(XX^\top)^{-1/2}u$ is an eigenvector of $P_XP_YP_X$ with the same eigenvalue. Thus we get
$$\lambda_i(\mathcal C_{X_1Y})\in\big[\lambda_{i+1}(\mathcal C_{XY}),\ \lambda_{i-1}(\mathcal C_{XY})\big].$$
Repeating this estimate $r$ times for the rank-$r$ perturbation $\widetilde X$, we get
$$\lambda_i(\mathcal C_{\widetilde XY})\in\big[\lambda_{i+r}(\mathcal C_{XY}),\ \lambda_{i-r}(\mathcal C_{XY})\big],$$
where $\mathcal C_{\widetilde XY}$ is defined by replacing $X$ with $\widetilde X$ in $\mathcal C_{XY}$.
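The pair $(f_c,g_c)$ and the threshold $t_c=f_c(\lambda_+)$ can be checked numerically. The sketch below uses the arbitrarily chosen values $c_1=0.2$, $c_2=0.1$, for which $\lambda_+$, $\lambda_-$ and $t_c$ have the exact values $0.5$, $0.02$ and $1/6$; it confirms that $g_c$ inverts $f_c$ to the right of the edge and maps supercritical $t>t_c$ to outlier locations $\theta>\lambda_+$:

```python
import numpy as np

c1, c2 = 0.2, 0.1
root = 2 * np.sqrt(c1 * c2 * (1 - c1) * (1 - c2))
lam_p = c1 + c2 - 2 * c1 * c2 + root            # = 0.5
lam_m = c1 + c2 - 2 * c1 * c2 - root            # = 0.02
t_c = np.sqrt(c1 * c2 / ((1 - c1) * (1 - c2)))  # = 1/6

def f_c(z):
    return (z - (c1 + c2 - 2 * c1 * c2)
            + np.sqrt((z - lam_m) * (z - lam_p))) / (2 * (1 - c1) * (1 - c2))

def g_c(xi):
    return xi * (1 - c1 + c1 / xi) * (1 - c2 + c2 / xi)

print(abs(f_c(lam_p) - t_c))              # threshold: f_c(lambda_+) = t_c
for t in [0.3, 0.5, 0.8]:                 # supercritical values t > t_c
    theta = g_c(t)                        # classical outlier location
    print(theta > lam_p, abs(f_c(theta) - t))
```

For instance $g_c(0.5)=0.66$ and $f_c(0.66)=0.5$ exactly, matching the BBP-type dichotomy described in the abstract.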
Obviously, the same argument works for the rank-$r$ perturbation of $Y$, which leads to (4.11). With (4.6) and (4.11), the rest of the proof of Theorem 2.9 is similar to the arguments in [12, Section 4] and [36, Section 6], but these references only considered the case of small support $\phi_n\lesssim n^{-1/2}$. We need to adapt their proofs to our setting with larger $\phi_n$ and $\psi_n$. Proof of Theorem 2.9.
For simplicity of presentation, in this proof we abbreviate $\phi_n+\psi_n$ as $\phi_n$, because these two factors always appear together. By Theorems 3.7, 3.8 and 3.10, for any fixed $\varepsilon>0$, we can choose a high-probability event $\Xi$ on which the following estimates hold:
$$\mathbf 1(\Xi)\left\|\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}G(z)\begin{pmatrix}U&\\&E\end{pmatrix}-\Pi_r(z)\right\|\le n^{\varepsilon/2}\big(\phi_n+\Psi(z)\big),\qquad\text{for }z\in S(\varepsilon),\qquad(4.12)$$
$$\mathbf 1(\Xi)\left\|\begin{pmatrix}U^\top&\\&E^\top\end{pmatrix}G(z)\begin{pmatrix}U&\\&E\end{pmatrix}-\Pi_r(z)\right\|\le n^{\varepsilon/2}\big(\phi_n+n^{-1/2}\kappa^{-1/4}\big),\qquad\text{for }z\in D_{out}(\varepsilon),\qquad(4.13)$$
and, for a fixed large integer $\varpi$,
$$\mathbf 1(\Xi)\,|\lambda_i-\lambda_+|\le n^{-2/3+\varepsilon},\qquad\text{for }1\le i\le\varpi+2r.\qquad(4.14)$$
We remark that the randomness of $X$ and $Y$ only comes into play to ensure that $\Xi$ holds with high probability. The rest of the proof will be entirely deterministic once restricted to $\Xi$. In the following proof, we assume that $\varepsilon$ is a sufficiently small constant.

We now define the index set
$$O_\varepsilon:=\big\{i:\ t_i-t_c\ge n^\varepsilon\phi_n+n^{-1/3+\varepsilon}\big\}.\qquad(4.15)$$
Since the constant $\varepsilon$ is arbitrary, in order to prove (2.23) and (2.24), it suffices to show that for some constant $C>0$,
$$\mathbf 1(\Xi)\,\big|\widetilde\lambda_i-\theta_i\big|\le Cn^\varepsilon\big(\phi_n\Delta_i+n^{-1/2}\Delta_i^{1/2}\big)\qquad(4.16)$$
for all $i\in O_\varepsilon$, and
$$-n^{-2/3+\varepsilon}\le\mathbf 1(\Xi)\big(\widetilde\lambda_i-\lambda_+\big)\le Cn^\varepsilon\phi_n^2+Cn^{-2/3+\varepsilon}\qquad(4.17)$$
for all $i\in\{1,\cdots,\varpi\}\setminus O_\varepsilon$.

Step 1: Our first step is to prove that on $\Xi$, there are no eigenvalues outside small neighborhoods of the $\theta_i$'s. For each $1\le i\le r^+$, we define the permissible intervals
$$I_i\equiv I_i(t):=\Big[\theta_i-n^\varepsilon\big(\phi_n\Delta_i+n^{-1/2}\Delta_i^{1/2}\big),\ \theta_i+n^\varepsilon\big(\phi_n\Delta_i+n^{-1/2}\Delta_i^{1/2}\big)\Big],\qquad(4.18)$$
where $t$ denotes the vector of canonical correlation coefficients $t:=(t_1,t_2,\cdots,t_r)$. We then define
$$I\equiv I(t):=I_0\cup\Big(\bigcup_{i\in O_\varepsilon}I_i(t)\Big),\qquad I_0:=\Big[0,\ \lambda_++n^\varepsilon\phi_n^2+n^{-2/3+\varepsilon}\Big].\qquad(4.19)$$
We claim the following result. Lemma 4.2.
The complement of $I(t)$ contains no eigenvalues of $\mathcal C_{\widetilde X\widetilde Y}$.

Proof. The main idea is similar to that of [36, Proposition 6.5] and [17, Lemma S.4.2]. It suffices to show that for any $1\le i\le r$, if $x\notin I(t)$, then
$$|f_c(x)-t_i|\ge c_0\big(n^\varepsilon\phi_n+n^{-1/2+\varepsilon}\kappa_x^{-1/4}\big)\qquad(4.20)$$
for some constant $c_0>0$. Then (4.6) cannot hold on $\Xi$ by (4.13).

For $x\notin I_0$, using (4.8) we get
$$f_c(x)-t_c=f_c(x)-f_c(\lambda_+)\ge c_1\kappa_x^{1/2}\ge c_2\big(n^\varepsilon\phi_n+n^{-1/2+\varepsilon}\kappa_x^{-1/4}\big)$$
for some constants $c_1,c_2>0$. This concludes (4.20) for $i\ge r^++1$, using $t_i\le t_c+n^{-1/3}+\phi_n$ for such $i$.

Next, for the case $1\le i\le r^+$, we take any $x\notin I_0\cup I_i(t)$. We first assume that there exists a constant $\widetilde c>0$ such that $\theta_i\notin[x-\widetilde c\kappa_x,\ x+\widetilde c\kappa_x]$. Then, since $f_c$ is monotonically increasing on $(\lambda_+,+\infty)$, we have that
$$|f_c(x)-t_i|=|f_c(x)-f_c(\theta_i)|\ge|f_c(x)-f_c(x\pm\widetilde c\kappa_x)|\ge c_1\kappa_x^{1/2}\ge c_2\big(n^\varepsilon\phi_n+n^{-1/2+\varepsilon}\kappa_x^{-1/4}\big)$$
for some constants $c_1,c_2>0$, where we used (4.10) in the third step. On the other hand, suppose $\theta_i\in[x-\widetilde c\kappa_x,\ x+\widetilde c\kappa_x]$, in which case we have $\theta_i-\lambda_+\sim\kappa_x$. With (4.9), we get $\kappa_x\sim\theta_i-\lambda_+\sim\Delta_i^2$. Then using (4.10) and the definition of $I_i(t)$, we get that for $x\notin I_i(t)$,
$$|f_c(x)-t_i|=|f_c(x)-f_c(\theta_i)|\ge c_1\Delta_i^{-1}\big(n^\varepsilon\phi_n\Delta_i+n^{-1/2+\varepsilon}\Delta_i^{1/2}\big)\ge c_2\big(n^\varepsilon\phi_n+n^{-1/2+\varepsilon}\kappa_x^{-1/4}\big)$$
for some constants $c_1,c_2>0$. This concludes (4.20), and hence Lemma 4.2.
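The square-root behavior of $f_c$ near the edge, which drives the estimate (4.8) used repeatedly above, is easy to observe numerically (same illustrative parameters $c_1=0.2$, $c_2=0.1$ as before; nothing here is specific to the paper):

```python
import numpy as np

c1, c2 = 0.2, 0.1
lam_p, lam_m = 0.5, 0.02       # edges for these c1, c2
t_c = 1.0 / 6.0                # = f_c(lam_p)

def f_c(z):
    return (z - (c1 + c2 - 2 * c1 * c2)
            + np.sqrt((z - lam_m) * (z - lam_p))) / (2 * (1 - c1) * (1 - c2))

# |f_c(lam_p + kappa) - t_c| ~ kappa^{1/2}: rescaled increments are nearly constant
r1 = (f_c(lam_p + 1e-4) - t_c) / 1e-2
r2 = (f_c(lam_p + 1e-6) - t_c) / 1e-3
print(r1, r2, r1 / r2)         # ratio close to 1
```

Both rescaled increments are close to $\sqrt{\lambda_+-\lambda_-}/(2(1-c_1)(1-c_2))\approx0.48$, the coefficient of the square-root singularity.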
Step 2:
Before giving the general proof, we first consider, as a heuristic, an easy case where the $t_i$'s are independent of $n$ and satisfy
$$t_1>t_2>\cdots>t_{r^+}>t_c.\qquad(4.21)$$
We claim that each $I_i(t)$, $1\le i\le r^+$, contains precisely one eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$. Fix any $1\le i\le r^+$ and choose a small $n$-independent positively oriented closed contour $\Gamma\subset\mathbb C\setminus[0,\lambda_+]$ that encloses $\theta_i$ but no other point of the set $\{\theta_j:1\le j\le r^+\}$. Define the two functions
$$f_1(z):=\det\big(f_c(z)I_r-\operatorname{diag}(t_1,\cdots,t_r)\big),\qquad f_2(z):=\det\big(f_c(z)I_r-\operatorname{diag}(t_1,\cdots,t_r)+\mathcal E_r(z)\big).\qquad(4.22)$$
The functions $f_1,f_2$ are holomorphic on and inside $\Gamma$ when $n$ is sufficiently large, because $\Gamma$ does not enclose any pole of $G(z)$ by (4.14). Moreover, by the construction of $\Gamma$, the function $f_1$ has precisely one zero inside $\Gamma$, located at $\theta_i$. By (4.13), we have
$$\min_{z\in\Gamma}|f_1(z)|\gtrsim1,\qquad \max_{z\in\Gamma}|f_1(z)-f_2(z)|=o(1).$$
Hence by Rouch\'e's theorem, $f_2$ also has precisely one zero inside $\Gamma$; that is, exactly one eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$ lies inside $\Gamma$. Step 3:
In order to extend the argument in Step 2 to an arbitrary $n$-dependent configuration of the $t_i$'s, we need to deal with the case where some of the intervals $I_i$ and $I_j$, $i\ne j$, have non-empty overlaps. For any $\varepsilon>0$, let $r_\varepsilon:=|O_\varepsilon|$. In this step, we prove the following claim for the first $r_\varepsilon$ eigenvalues. Claim 4.3.

On the event $\Xi$, the estimate (4.16) holds for $i\in O_\varepsilon$.

Proof. Let $\mathcal B$ denote the finest partition of $\{1,\cdots,r^+\}$ such that $i$ and $j$ belong to the same block of $\mathcal B$ whenever $I_i\cap I_j\ne\emptyset$. We now fix any $1\le i\le r_\varepsilon$, and denote by $\mathcal B_i$ the block of $\mathcal B$ that contains $i$. Our first task is to estimate $\theta_{j-1}-\theta_j$ for $j,j-1\in\mathcal B_i$. We claim that there exists a constant $C>0$ such that
$$\theta_{j-1}-\theta_j\le C\big(n^\varepsilon\phi_n\Delta_j+n^{-1/2+\varepsilon}\Delta_j^{1/2}\big),\qquad\text{if }j\in\mathcal B_i\text{ and }j-1\in\mathcal B_i.\qquad(4.23)$$
First we assume that $j\in O_\varepsilon$. We pick any $x\in I_j\cap I_{j-1}$ such that $\theta_j\le x\le\theta_{j-1}$. Then using (4.8) and (4.10) we obtain that
$$|f_c(x)-t_j|=|f_c(x)-f_c(\theta_j)|\le C\big(n^\varepsilon\phi_n+n^{-1/2+\varepsilon}\Delta_j^{-1/2}\big)\ll\Delta_j,$$
using $\Delta_j\ge n^\varepsilon\phi_n+n^{-1/3+\varepsilon}$ for $j\in O_\varepsilon$. Thus we get $|f_c(x)-t_c|=(1+o(1))\Delta_j$. Similarly, we can show that $|f_c(x)-t_c|=(1+o(1))\Delta_{j-1}$. This gives (4.23) by the choice of $x$ and the definitions of $I_j$ and $I_{j-1}$. In addition, we also get that
$$\Delta_j=(1+o(1))\Delta_{j-1},\qquad\text{if }j\in\mathcal B_i\text{ and }j-1\in\mathcal B_i.\qquad(4.24)$$
It remains to verify that $j\in O_\varepsilon$ for all $j\in\mathcal B_i$. Suppose not, and let $j_0$ be the smallest index with $j_0\in\mathcal B_i$ but $j_0\notin O_\varepsilon$. Since $|\mathcal B_i|\le r$, by (4.23) we have that
$$\theta_{j_0-1}>\theta_i-C'\big(n^\varepsilon\phi_n\Delta_i+n^{-1/2+\varepsilon}\Delta_i^{1/2}\big)$$
for some constant $C'>0$. Then using $i\in O_\varepsilon$, $j_0\notin O_\varepsilon$ and (4.9), we can check that
$$\theta_{j_0-1}-\theta_{j_0}\gg\big(n^\varepsilon\phi_n\Delta_{j_0-1}+n^{-1/2+\varepsilon}\Delta_{j_0-1}^{1/2}\big)+\big(n^\varepsilon\phi_n\Delta_{j_0}+n^{-1/2+\varepsilon}\Delta_{j_0}^{1/2}\big),$$
which contradicts the definition of $\mathcal B_i$. This concludes (4.23).

Now with (4.23), (4.24) and $|\mathcal B_i|\le r$, we obtain that
$$d_i:=\operatorname{diam}\Big(\bigcup_{j\in\mathcal B_i}I_j\Big)\le C_r\big(n^\varepsilon\phi_n\Delta_i+n^{-1/2+\varepsilon}\Delta_i^{1/2}\big)\qquad(4.25)$$
for some constant $C_r>0$ depending on $r$ and $C$ only. On the other hand, by (4.9) we have that
$$\theta_i-\lambda_+-d_i\ge c\Delta_i^2-C_r\big(n^\varepsilon\phi_n\Delta_i+n^{-1/2+\varepsilon}\Delta_i^{1/2}\big)\gg n^\varepsilon\phi_n^2+n^{-2/3+\varepsilon},$$
where we used $\Delta_i\ge n^\varepsilon\phi_n+n^{-1/3+\varepsilon}$ for $i\in O_\varepsilon$ in the second step. Hence there is a gap between the right edge of $I_0$ and the left edge of $\bigcup_{j\in\mathcal B_i}I_j$.

Let $x_i$ and $y_i$ be the left and right end points of the interval $\bigcup_{j\in\mathcal B_i}I_j$. Then we pick the contour
$$\Gamma_i:=\{z=x_i+i\eta:-d_i\le\eta\le d_i\}\cup\{z=y_i+i\eta:-d_i\le\eta\le d_i\}\cup\{z=E\pm id_i:x_i\le E\le y_i\},$$
which encloses the $\theta_j$'s with $j\in\mathcal B_i$, but no other points of the set $\{\theta_j:1\le j\le r^+\}$. We again consider the functions in (4.22). We know that $f_1(z)$ has exactly $|\mathcal B_i|$ zeros inside $\Gamma_i$, located at the $\theta_j$, $j\in\mathcal B_i$. Moreover, with the arguments in Lemma 4.2, one can show that
$$\|\mathcal E(z)\|=o(1)\quad\text{for }z\in\Gamma_i,\qquad \mathcal E(z):=\big[f_c(z)I_r-\operatorname{diag}(t_1,\cdots,t_r)\big]^{-1}\mathcal E_r(z).$$
Thus we have
$$|f_1(z)-f_2(z)|=|f_1(z)|\,\big|\det\big(1+\mathcal E(z)\big)-1\big|<|f_1(z)|\qquad\text{for }z\in\Gamma_i.$$
Then by Rouch\'e's theorem, $f_2(z)$ has exactly $|\mathcal B_i|$ zeros inside $\Gamma_i$. Together with Lemma 4.2 and a simple eigenvalue counting argument, we get that $\widetilde\lambda_i\in\bigcup_{j\in\mathcal B_i}I_j$, and hence $|\widetilde\lambda_i-\theta_i|\le d_i$ for $i\in O_\varepsilon$. This concludes Claim 4.3 by (4.25).
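The zero-counting step via Rouch\'e's theorem used in Steps 2 and 3 can be illustrated on a toy example: counting the zeros of a holomorphic function inside a contour via the argument principle, before and after a small perturbation. This is only a stand-in for the pair $f_1,f_2$ of (4.22); the polynomial and contour are arbitrary:

```python
import numpy as np

def count_zeros(f, center, radius, m=4000):
    """Winding number of f along a circle = number of enclosed zeros."""
    theta = 2 * np.pi * np.arange(m) / m
    z = center + radius * np.exp(1j * theta)
    w = f(z)
    inc = np.angle(w[np.r_[1:m, 0]] / w)   # angle increments along the contour
    return int(round(inc.sum() / (2 * np.pi)))

f1 = lambda z: (z - 0.5) * (z - 2.0)       # exactly one zero inside |z - 0.5| = 1
f2 = lambda z: f1(z) + 0.01                # perturbation, |0.01| < min |f1| on contour
print(count_zeros(f1, 0.5, 1.0), count_zeros(f2, 0.5, 1.0))
```

Since $|f_1|\ge0.5$ on the contour while the perturbation has size $0.01$, Rouch\'e's theorem guarantees the two counts agree, exactly as in the eigenvalue-counting argument above.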
Step 4:
Finally, we consider the eigenvalues $\widetilde\lambda_i$ with $i\notin O_\varepsilon$. First, by (4.14) and (4.11), we have that
$$\widetilde\lambda_i\ge\lambda_+-n^{-2/3+\varepsilon},\qquad i\le\varpi.\qquad(4.26)$$
For the upper bound, we consider the intervals $I_i$ as in (4.18) and
$$\widehat I_0:=\Big[0,\ \lambda_++\widetilde C\big(n^\varepsilon\phi_n^2+n^{-2/3+\varepsilon}\big)\Big]$$
for a large constant $\widetilde C>0$. Then we define a partition $\widehat{\mathcal B}$ as in Step 3, and denote by $\widehat{\mathcal B}_0$ the block of indices $j$ whose intervals $I_j$ are connected to $\widehat I_0$ through a chain of overlapping intervals. With the same arguments as in the proof of Claim 4.3, we can prove that
$$\widehat I_0\cup\Big(\bigcup_{j\in\widehat{\mathcal B}_0}I_j\Big)\subset\Big[0,\ \lambda_++C\big(n^\varepsilon\phi_n^2+n^{-2/3+\varepsilon}\big)\Big]\qquad(4.27)$$
for some constant $C>0$. Moreover, for any $j\notin\widehat{\mathcal B}_0$, we have $j\in O_\varepsilon$ by (4.9), as long as $\widetilde C$ is chosen large enough. Thus with Lemma 4.2, the result of Step 3 and a simple eigenvalue counting argument, we get that
$$\widetilde\lambda_i\in\widehat I_0\cup\Big(\bigcup_{j\in\widehat{\mathcal B}_0}I_j\Big),\qquad i\notin O_\varepsilon.$$
This concludes (4.17) by (4.27), and hence completes the proof of Theorem 2.9.
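The eigenvalue interlacing used throughout—(4.11) and its one-sided analogue (5.2)—can be demonstrated numerically with the projection representation $P_XP_YP_X$ from Section 4. A small sketch with generic Gaussian data and illustrative sizes; the convention $\lambda_i=1$ for $i<1$ and $\lambda_i=0$ for large $i$ is implemented by padding:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q, r = 120, 20, 15, 2

def proj(M):
    """Orthogonal projection onto the row space of M."""
    return M.T @ np.linalg.solve(M @ M.T, M)

X = rng.standard_normal((p, n))
Y = rng.standard_normal((q, n))
Xt = X + rng.standard_normal((p, r)) @ rng.standard_normal((r, n))  # rank-r perturbation

PY = proj(Y)
ev  = np.sort(np.linalg.eigvalsh(proj(X)  @ PY @ proj(X)))[::-1]
evt = np.sort(np.linalg.eigvalsh(proj(Xt) @ PY @ proj(Xt)))[::-1]

# interlacing with offset r: lambda~_i in [lambda_{i+r}, lambda_{i-r}]
pad = np.concatenate([np.ones(r), ev, np.zeros(r)])   # pad[i] = lambda_{i+1-r}
ok = all(pad[i + 2 * r] - 1e-9 <= evt[i] <= pad[i] + 1e-9 for i in range(p))
print(ok)
```

Each rank-one change of the row space moves every eigenvalue by at most one index in each direction, so a rank-$r$ perturbation gives offset $r$; perturbing both $X$ and $Y$ doubles the offset, which is exactly the rank-$(2r)$ weakness of (4.11) that motivates working with $\mathcal C^b$ here.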
As in Section 4, from equation (3.9) we can derive an equation similar to (4.6). More precisely, suppose $\lambda$ is not an eigenvalue of $\mathcal C^b\equiv\mathcal C_{X\widetilde Y}$ and the following local law holds for $G^b(\lambda)$:
$$\begin{pmatrix}U_a^\top&\\&E_a^\top\end{pmatrix}G^b(\lambda)\begin{pmatrix}U_a&\\&E_a\end{pmatrix}-\Pi^b_r(\lambda)=O(\Phi_n)\qquad\text{with high probability},$$
where $\Phi_n$ is a deterministic parameter satisfying $0<\Phi_n\le n^{-\varepsilon}$ for a constant $\varepsilon>0$, and
$$\Pi^b_r(\lambda):=\begin{pmatrix}\begin{pmatrix}c_1^{-1}m_{1c}(\lambda)I_r&0\\0&0\end{pmatrix}&\\&\begin{pmatrix}m_{3c}(\lambda)I_r-h^2(\lambda)m_{4c}^{-1}(\lambda)M_r\Sigma_b^2(1+\Sigma_b^2)^{-1}M_r^\top&0\\0&0\end{pmatrix}\end{pmatrix}.$$
Then $\lambda$ is an eigenvalue of $\mathcal C_{\widetilde X\widetilde Y}$ if and only if
$$\det\big(f_c(\lambda)I_r-\operatorname{diag}(t_1,\cdots,t_r)+\mathcal E_r(\lambda)\big)=0,\qquad(5.1)$$
with $\mathcal E_r$ satisfying $\|\mathcal E_r\|\lesssim\psi_n+\Phi_n$ with high probability. Moreover, similar to (4.11), we have the following eigenvalue interlacing:
$$\widetilde\lambda_i\in\big[\lambda^b_{i+r},\ \lambda^b_{i-r}\big],\qquad(5.2)$$
where we adopt the convention that $\lambda^b_i=1$ if $i<1$ and $\lambda^b_i=0$ if $i>q$. This is the main reason why we use $\mathcal C^b$ and $G^b(z)$ instead of $\mathcal C_{XY}$ and $G(z)$ in the proof of Theorem 2.11---the interlacing result (4.11), which comes from a rank-$(2r)$ perturbation, is not strong enough. Proof of Theorem 2.11.
Again, in this proof we abbreviate $\phi_n+\psi_n$ as $\phi_n$. By (2.20), Theorem 2.9, Lemma 3.3, (3.55) and Lemma 3.12, for any small constant $\varepsilon>0$ and fixed large integer $\varpi\in\mathbb N$, we can choose a high-probability event $\Xi$ on which the following estimates hold:
$$\mathbf 1(\Xi)\,\big|\lambda^b_i-\lambda_+\big|\le n^{-2/3+\varepsilon/2},\qquad\text{for }1\le i\le\varpi;\qquad(5.3)$$
$$\mathbf 1(\Xi)\,\big|\lambda^b_i-\gamma_i\big|\le i^{-1/3}n^{-2/3+\varepsilon/2},\qquad\text{for }1\le i\le(1-\delta)q;\qquad(5.4)$$
$$\mathbf 1(\Xi)\,\big|\widetilde\lambda_i-\theta_i\big|\le n^\varepsilon\phi_n\Delta_i+n^{-1/2+\varepsilon}\Delta_i^{1/2},\qquad\text{for }i\le r^+;\qquad(5.5)$$
$$-\mathbf 1(\Xi)\,n^{-2/3+\varepsilon/2}\le\mathbf 1(\Xi)\big(\widetilde\lambda_i-\lambda_+\big)\le n^{\varepsilon/2}\phi_n^2+n^{-2/3+\varepsilon/2},\qquad\text{for }r^++1\le i\le\varpi;\qquad(5.6)$$
$$c_0\le\min\{\lambda_p(S_{xx}),\lambda_q(S^b_{yy})\}\le\max\{\lambda_1(S_{xx}),\lambda_1(S^b_{yy})\}\le c_0^{-1};\qquad(5.7)$$
$$\mathbf 1(\Xi)\,\|ZZ^\top-I_r\|\le n^{\varepsilon/2}\phi_n;\qquad(5.8)$$
$$\mathbf 1(\Xi)\,|m^b(z)-m_c(z)|\le n^{\varepsilon/2}\big(\phi_n+\Psi(z)\big),\qquad\text{for }z\in S(\varepsilon);\qquad(5.9)$$
$$\mathbf 1(\Xi)\left\|\begin{pmatrix}U_a^\top&\\&E_a^\top\end{pmatrix}G^b(z)\begin{pmatrix}U_a&\\&E_a\end{pmatrix}-\Pi^b_r(z)\right\|\le n^{\varepsilon/2}\big(\phi_n+\Psi(z)\big),\qquad\text{for }z\in S(\varepsilon);\qquad(5.10)$$
$$\mathbf 1(\Xi)\left\|\begin{pmatrix}U_a^\top&\\&E_a^\top\end{pmatrix}G^b(z)\begin{pmatrix}U_a&\\&E_a\end{pmatrix}-\Pi^b_r(z)\right\|\le n^{\varepsilon/2}\big(\phi_n+n^{-1/2}\kappa^{-1/4}\big),\qquad\text{for }z\in D_{out}(\varepsilon);\qquad(5.11)$$
$$\mathbf 1(\Xi)\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_1,S_{xx}^{-1/2}\xi^b_k\rangle\big|+\big|\langle\mathbf u_2,(S^b_{yy})^{-1/2}\zeta^b_k\rangle\big|\Big\}\le n^{-1/2+\varepsilon/2},\qquad \mathbf u_1\in\mathbb C^{\mathcal I_1},\ \mathbf u_2\in\mathbb C^{\mathcal I_2};\qquad(5.12)$$
$$\mathbf 1(\Xi)\max_{1\le k\le(1-\delta)q}\Big\{\big|\langle\mathbf u_3,X^\top S_{xx}^{-1/2}\xi^b_k\rangle\big|+\big|\langle\mathbf u_4,\widetilde Y^\top(S^b_{yy})^{-1/2}\zeta^b_k\rangle\big|\Big\}\le n^{-1/2+\varepsilon/2},\qquad \mathbf u_3\in\mathbb C^{\mathcal I_3},\ \mathbf u_4\in\mathbb C^{\mathcal I_4}.\qquad(5.13)$$
Here $c_0$ is a small enough constant, and the vectors $\mathbf u_\alpha$, $\alpha=1,2,3,4$, belong to a set of vectors $\Gamma$ that is independent of $X$ and $Y$, has cardinality $n^{O(1)}$, and includes all the unit vectors that will be used in the proof. Again, the randomness of $X$, $Y$ and $Z$ only comes into play to ensure that $\Xi$ holds with high probability, and the rest of the proof will be entirely deterministic. Step 1:
As in the proof of Theorem 2.9, we first find a permissible region. For any i , we define the set Ω i : “ ! x P r λ bi ` r ` , λ ` ` n ε φ n ` n ´ { ` ε s : dist ´ x, Spec p C bX Y q ¯ ą n ´ ` ε α ´ ` ) , (5.14) where Spec p C bX Y q stands for the eigenvalue spectrum of C bX Y . Lemma 5.1. There exists a constant C ą such that for α ` ě C ` n ε φ n ` n ´ { ` ε ˘ and i ď n ´ ε α ` , the set Ω i contains no eigenvalue of C X Y . Proof.
In the proof, we always use the following spectral parameters η x : “ n ´ ` ε α ´ ` , z x “ x ` i η x . (5.15)Suppose x P Ω i . We first claim that for any deterministic unit vectors u , v P Γ, we have | G b u v p z x q ´ G b u v p x q| ď Cn ε { Im m b p z x q ` Cn ε { η x , x P Ω i . (5.16)We use a similar argument as in the proof of Theorem 3.10. To illustrate the idea, for v “ ` v J , v J , v J , v J ˘ J and u “ ` u J , u J , u J , u J ˘ J with u α , v α P C I α , we calculate G b u v p z x q ´ G b u v p x q as an example. As in(3.51), we have ˇˇ G b u v p z x q ´ G b u v p x q ˇˇ À ÿ k ďp ´ δ q q η x |x v , S ´ { xx ξ bk y||x u , S ´ { xx ξ bk y|| λ bk ´ x | “ p λ bk ´ x q ` η x ‰ { ` η x ÿ k ąp ´ δ q q |x v , S ´ { xx ξ bk y||x u , S ´ { xx ξ bk y|À n ´ ` ε { q ÿ k “ η x p λ bk ´ x q ` η x ` η x À n ε { Im m b p z x q ` η x , where in the second step we used (5.7), (5.12) and | λ bk ´ x | ě η x for x P Ω i , and in the last step we used thespectral decomposition of m b p z x q . The proofs for the rest of the cases G b u α v β p z x q´ G b u α v β p x q , α, β “ , , , x P Ω i is an eigenvalue of C X Y if and only if (5.1) holds, where E r satisfy the following boundby (5.16), (5.9) and (5.10): } E r p x q} ď C ´ n ε { Im m c p z x q ` n ε { η x ` n ε { φ n ` n ε { Ψ p z x q ¯ for some constant C ą
0. With (3.32) and the definition of Ψ p z x q in (3.38), we can further bound that } E r p x q} ď C ˆ n ε { φ n ` n ε { Im m c p z x q ` n ε { nη x ˙ for some constant C ą
0. Now to prove the lemma, it suffices to show that for any 1 ď j ď r , | f c p x q ´ t j | ą C ˆ n ε { φ n ` n ε { Im m c p z x q ` n ε { nη x ˙ , x P Ω i . (5.17)Since i ď n ´ ε α ` , by (5.4) we have ´ ´ n ε φ n ` n ´ { ` ε ¯ ď λ ` ´ x À p i { n q { ` i ´ { n ´ { ` ε { À n ´ ε { α ` , x P Ω i , (5.18)where we also used γ i „ p i { n q { and α ` ě n ´ { ` ε . Then by (4.8), we have | f c p x q ´ t c | “ | f c p x q ´ f c p λ ` q| ď Cn ´ ε { α ` , x P Ω i X t x : x ď λ ` u . and | f c p x q ´ t c | “ | f c p x q ´ f c p λ ` q| ď C ´ n ε φ n ` n ´ { ` ε ¯ , x P Ω i X t x : x ą λ ` u , C ą C . Hence as long as C is chosen large enough, we have | f c p x q ´ t c | ď α ` ñ | f c p x q ´ t j | ě α ` , (5.19)where we used the definition of α ` in (2.22). On the other hand, with (3.32), (5.15) and (5.18) we can verifythat C ˆ n ε { φ n ` n ε { Im m c p z x q ` n ε { nη x ˙ ď C ´ n ε { φ n ` n ε { ? κ x ` η x ` n ´ ε { α ` ¯ ! α ` for x P Ω i X t x : x ď λ ` u , and C ˆ n ε { φ n ` n ε { Im m c p z x q ` n ε { nη x ˙ ď C ˆ n ε { φ n ` n ε { η x ? κ x ` η x ` n ´ ε { α ` ˙ ! α ` for x P Ω i X t x : x ą λ ` u . Together with (5.19), we see that (5.17) holds. This concludes the proof. Step 2:
In this step, we perform a counting argument for a special case as in the following lemma. We postpone its proof until we finish the proof of Theorem 2.11.
Lemma 5.2.
Given ď r ` ď r , we choose a matrix A ” A p q of rank r ` such that the eigenvalue configuration t ” t p q : “ p t , t , ¨ ¨ ¨ , t r q of the PCC matrix satisfies that p t r ` ´ t c q ^ p t c ´ t r ` ` q ^ min ď i ď r ` ´ p t i ´ t i ` q Á . (5.20) Then for i ď n ´ ε α ` p q , we have | r λ i ` r ` ´ λ bi | ď n ´ ` ε α ´ ` p q , (5.21) where α ` p q is defined as in (2.22) for t p q . (The meaning of the argument 0 will be clear in Step 3 below.) Step 3:
In this step we employ a continuity argument as in [36, Section 6.5] and [17, Section S.4.2]. We choose a continuous ( n -dependent) path A p s q for 0 ď s ď
1, such that A p q “ A is the matrix in Theorem2.11, and A p q gives an eigenvalues configuration t p q satisfying (5.20). Correspondingly, we have continuouspaths of the configurations t p s q and the sample eigenvalues t r λ i p s qu ni “ . We can choose A p s q such thatinf s Pr , s α ` p s q Á α ` ” α ` p q , where α ` p s q is defined as in (2.22) for the eigenvalues configuration t p s q .In this step we consider the case where α ` ě C ` n ε φ n ` n ´ { ` ε ˘ and i ď n ´ ε α ` . Without loss ofgenerality, we rename α ` : “ inf s Pr , s α ` p s q . Define r I : “ ! x P r , λ ` ` n ε φ n ` n ´ { ` ε s : dist ` x, Spec p C bX Y q ˘ ď n ´ ` ε α ´ ` ) . Note that r I is a union of connected intervals. Due to the interlacing (5.2), we have λ bi ` r ď r λ i p s q ď λ bi ´ r , s P r , s . (5.22)By Lemma 5.1 and Lemma 5.2, we know | r λ i ` r ` p q ´ λ bi | ď n ´ ` ε α ´ ` , ´r λ i ` r ` p s q , Spec p C bX Y q ¯ ď n ´ ` ε α ´ ` , s P r , s . (5.23)In addition, by continuity of eigenvalues with respect to s , we know that r λ i ` r ` p s q is in the same connectedcomponent of r I as r λ i ` r ` p q . For any i , let B i be the set of j such that λ bi and λ bj are in the same connectedcomponent of r I . Then we conclude that r λ i ` r ` p s q P ď j P B i : | i ` r ` ´ j |ď r “ λ bj ´ n ´ ` ε α ´ ` , λ bj ` n ´ ` ε α ´ ` ‰ . This gives that ˇˇˇr λ i ` r ` p s q ´ λ bi ˇˇˇ ď rn ´ ` ε α ´ ` , s P r , s . (5.24) Step 4:
Finally we consider the cases α ` ă C ` n ε φ n ` n ´ { ` ε ˘ , or i ą n ´ ε α ` . Suppose first that α ` ă C ` n ε φ n ` n ´ { ` ε ˘ . Then by the assumption of Theorem 2.11, if ε is small enough such that ε ă ε , we must have φ n ď n ´ { , and α ` À n ´ { ` ε . (5.25) Now using (5.25), (5.2), (5.4) and (5.6), we find that | r λ i ` r ` ´ λ bi | À n ´ { ` ε À n ´ ` ε α ´ ` . On the other hand, suppose i ą n ´ ε α ` . If i ď r , then we have α ` À n ´ { ` ε { , and with the same argument as above, we get | r λ i ` r ` ´ λ bi | ď Cn ´ { ` ε ď n ´ ` ε α ´ ` . Otherwise, using (5.2) and (5.4) we get | r λ i ` r ` ´ λ bi | ď Ci ´ { n ´ { ` ε { ď n ´ ` ε α ´ ` . Combining the above three estimates with (5.24), we conclude (2.25), since ε ą i ď ̟ for some fixed integer ̟ . Proof of Lemma 5.2.
Note that in this lemma, we have α ` ” α ` p q „
1. In the first step, we group togetherthe eigenvalues λ i that are close to each other. More precisely, let B “ t B k u be the finest partition of t , ¨ ¨ ¨ , q u such that i ă j belong to the same block of B if | λ bi ´ λ bj | ď n ´ ` ε { α ´ ` . Note that each block B k of B consists of a sequence of consecutive integers. We order the blocks in thedescending order, that is, if k ă l then λ bi k ą λ bi l for all i k P B k and i l P B l .We first derive a bound on the sizes of the blocks. We define k ˚ such that n : “ r n ´ ε α ` s P B k ˚ . Forany k ď k ˚ , we take i ă j such that i and j both belong to the block B k . Then by (5.2) and (5.4), we havethat for some constants c, C ą c «ˆ jn ˙ { ´ ˆ in ˙ { ff ´ Ci ´ { n ´ { ` ε { ď λ bi ´ λ bj ď C p j ´ i q n ´ ` ε { α ´ ` . j { ´ i { ě j ´ { p j ´ i q , we obtain that ´ j ´ { ´ Cn ´ { ` ε { α ´ ` ¯ p j ´ i q ď Ci ´ { n ε { . From this estimate we conclude that if i and j satisfy1 ď i ď j ď n ´ ε { , (5.26)then j ´ i ď C p j { i q { n ε { . (5.27)Now we claim that | B k | ď Cn ε { for k “ , ¨ ¨ ¨ , k ˚ , (5.28)and for any given i k P B k , | λ bi ´ γ i k | ď i ´ { n ´ { ` ε for all i P B k . (5.29)To prove (5.28) and (5.29), we denote α k : “ max i P B k i and β k : “ min i P B k i. If i P B k satisfies i ě α k {
2, then(5.27) gives that α k ´ i ď Cn ε { , with which we obtain that | γ i ´ γ α k | ď Ci ´ { n ´ { ` ε { . On the other hand, if i P B k satisfies i ď α k {
2, then (5.27) gives that α k ´ i ď α k ď Cn ε { . Thus we get | γ i ´ γ α k | ď | γ ´ γ α k | ď Cn ´ { ` ε { ď Ci ´ { n ´ { ` ε { . Together with (5.4), we obtain that | λ bi ´ γ i k | ď | λ bi ´ γ i | ` | γ i ´ γ α k | ` | γ α k ´ γ i k | ď Ci ´ { n ´ { ` ε { ď i ´ { n ´ { ` ε . From the above proof, we see that (5.28) and (5.29) as long as (5.26) holds. We still need to prove (5.26)for i, j P B k ˚ . In fact, if there is j P B k ˚ such that j ě n ´ ε { , then we can find j P B k ˚ such that n ε ď j ´ n ď n ε , which contradicts (5.27) and (5.28).We are now ready to complete the proof. For any 1 ď k ď k ˚ , we denote a k : “ min i P B k λ bi “ λ bα k , b k : “ max i P B k λ bi “ λ bβ k . (5.30)We introduce a continuous path as x ks : “ p ´ s q p a k ´ δ n { q ` s p b k ` δ n { q , s P r , s , (5.31)where δ n : “ n ´ ` ε { α ´ ` . The interval r x k , x k s contains precisely the eigenvalues of C bX Y that are in B k ,and the endpoints x k and x k are at distances at least δ n { C bX Y . Then we have thefollowing proposition. We postpone its proof until we finish the proof of Lemma 5.2. Proposition 5.3.
Almost surely, there are at least | B k | eigenvalues of C X Y in r x k , x k s . Here “almost surely” in the statement is due to the assumption (4.1): in the proof we discard a measure zero non-generic event. We postpone its proof until we complete the proof of Lemma 5.2. We now use a standard interlacing argument to show that C X Y has at most | B k | eigenvalues in r x k , x k s . By (5.2), there are at most | B | ` r ` eigenvalues of C X Y in r x , (recall that the rank of A p q is r ` ). Moreover, with the argument in Section 4, we can prove that (5.5) holds in the case A ” A p q , i.e. there are exactly r ` outliers. Then together with Proposition 5.3, we obtain that there are exactly | B | eigenvalues of C X Y in r x , x s . Repeating this argument, we can show that C X Y has exactly | B k | eigenvalues in r x k , x k s for all k “ , ¨ ¨ ¨ , k ˚ . Moreover, using (5.28) we find that for any i P B k , sup ! | x ´ λ bi | : x P r x k , x k s ) ď Cn ε { ´ n ´ ` ε { α ´ ` ¯ ď n ´ ` ε α ´ ` , which concludes Lemma 5.2. Finally we give the proof of Proposition 5.3. Proof of Proposition 5.3.
For the spectral decomposition of R b p z q (which takes a similar form as (3.13)), wedefine P B k R b p z q : “ ÿ l P B k λ bl ´ z ˆ ξ bl p ξ bl q J ´ z ´ { p λ bl q { ξ bl p ζ bl q J ´ z ´ { p λ bl q { ζ bl p ξ bl q J ζ bl p ζ bl q J ˙ , (5.32)and P B ck R b p z q : “ R b p z q ´ P B k R b p z q . We define P B ck G b by replacing R with P B ck R b , and Y with Y in (3.14),(3.15) and (3.16). Then we define P B k G b p z q : “ G b p z q ´ P B ck G b p z q . Let x P r x k , x k s and denote z x : “ x ` i η x with η x : “ n ´ ` ε { α ´ ` . We claim that ››››ˆ U J a E J a ˙ “ P B ck G b p z x q ´ P B ck G b p x q ‰ ˆ U a E a ˙›››› À Cn ε { Im m b p z x q ` Cn ε { η x . (5.33)The proof is very similar to the one for (5.16). For example, for deterministic unit vectors u , v P I , using(3.14), (5.7) and (5.12) we get ˇˇ P B ck G b u v p z x q ´ P B ck G b u v p x q ˇˇ À ÿ l R B k ,l ďp ´ δ q q η x |x v , S ´ { xx ξ bl y||x u , S ´ { xx ξ bl y|| λ bl ´ x | “ p λ bl ´ x q ` η x ‰ { ` η x ÿ l ąp ´ δ q q |x v , S ´ { xx ξ bl y||x u , S ´ { xx ξ bl y|À n ´ ` ε { q ÿ l “ η x p λ bl ´ x q ` η x ` η x À n ε { Im m b p z x q ` η x , where in the second step we used | λ bl ´ x | Á η x for l R B k . The proofs for the rest of the cases p G b u α v β p z x q ´ G b u α v β p x qq , α, β “ , , ,
4, are similar, so we omit the details.Then we claim that ˇˇ P B k G b u v p z x q ˇˇ ` ˇˇ P B k G b u v p x k q ˇˇ ď n ´ ε { . (5.34)For example, for the z x term we have ˇˇˇˇˇ ÿ l P B k x u , S ´ { xx ξ bl yx ξ bl S ´ { xx , v y λ bl ´ z x ˇˇˇˇˇ ď Cn ε { η ´ x n ´ ` ε { ! n ´ ε { , where we used (5.12) and (5.28). The proofs for the rest of the blocks P B k G b u α v β p z x q , α, β “ , , ,
4, aresimilar. For z “ x k , the proof is the same except that we need to use | λ bl ´ x k | Á n ´ ` ε { α ´ ` for l P B k .We remove the zero singular values of A and redefine thatΣ a : “ diag p a , ¨ ¨ ¨ , a r ` q , U a “ ` u a , ¨ ¨ ¨ , u ar ` ˘ , E a “ ` Z J v a , ¨ ¨ ¨ , Z J v ar ` ˘ . Then inspired by (3.9), for x R spec p C bX Y q we define M p x q : “ ˆ ´ a Σ ´ a ˙ ` ˆ U J a E J a ˙ ˆ G b p x q G b p x q G b p x q G b p x q ˙ ˆ U a E a ˙ , G bα is the I α ˆ I α block of G b (cf. Definition 3.2), and we use G bαβ to denote the I α ˆ I β block of G b . Then we know that almost surely, x P Spec p C X Y qz Spec p C bX Y q if and only if M p x q is singular.To simplify the notation, we shall denote “ G b p z q ‰ , : “ ˆ G b p z q G b p z q G b p z q G b p z q ˙ . Now using (5.9), (5.10), (5.33) and (5.34), we obtain that M p x q “ ˆ ´ a Σ ´ a ˙ ` ˆ U J a E J a ˙ “ P B k G b p x q ` P B ck ` G b p x q ´ G b p z x q ˘ ` G b p z x q ´ P B k G b p z x q ‰ , ˆ U a E a ˙ “ ˆ ´ a Σ ´ a ˙ ` ˆ U J a E J a ˙ “ P B k G b p x q ‰ , ˆ U a E a ˙ ` “ Π br p z x q ‰ , ` R p x q“ ˆ ´ a Σ ´ a ˙ ` ˆ U J a E J a ˙ “ P B k G b p x q ‰ , ˆ U a E a ˙ ` “ Π br p λ ` q ‰ , ` R p x q , (5.35)where “ Π br p z q ‰ , : “ ˜ c ´ m c p z q I r m c p z q I r ´ h p z q m c p z q M r Σ b ` Σ b M J r ¸ , and R and R are two matrices satisfying that } R p x q} “ O ´ n ε { η x ` n ε { Im m c p z x q ` n ε { Ψ p z x q ` n ε { φ n ` n ´ ε { ¯ “ O ´ n ´ ε { ¯ , and } R p x q} “ ›› R p x q ` O p? κ x ` η x q ›› “ O ´ n ´ ε { ¯ . In bounding the } R p x q} and } R p x q} , we also used Lemma 3.4, (3.38) and that κ x ď max | λ ` ´ x k | , | λ ` ´ x k | ( À p n ´ ε { { n q { ` n ´ { ` ε ` n ´ ` ε { α ´ ` ! n ´ ε { , where in the second step we used (5.26), (5.29) and the definitions in (5.31). 
Moreover, R p x q is real symmetric (because all the other terms in (5.35) are real symmetric), and continuous in x on the extended real line R . The rest of the proof follows from a continuity argument, which is exactly the same as the proof in [36, Section 6.4]. Instead of writing down all the details, we shall give an almost rigorous argument to show how equation (5.35) implies Proposition 5.3. First, we claim that M p x q has some negative singular values when x “ x k . By (5.34), (5.35) gives that M p x k q “ ˆ ´ a Σ ´ a ˙ ` “ Π br p λ ` q ‰ , ` O p n ´ ε { q . Let v i be an eigenvector of Σ a p ` Σ a q { M r Σ b ` Σ b M J r Σ a p ` Σ a q { with eigenvalue t i . Then for u i “ : ˆ m c p λ ` qp ` Σ a q ´ { v i Σ a p ` Σ a q ´ { v i ˙ , we can verify that u J i M p x k q u i “ h p λ ` q m c p λ ` q p f c p λ ` q ´ t i q } v i } ` O p n ´ e { q} v i } ă , m c p λ ` q ą t i ą t c “ f c p λ ` q and t i ´ t c „ ď i ď r ` . Next we claim that for l P B k , almost surely, M p x q is positive definite when x Ñ λ bl ´ and negative definite when x Ñ λ bl ` . To see why this holds, we pick any unit vector v “ ` v J , v J ˘ J , v , v P R r ` , and denote r v “ ` v J , r ` , v J , r ` ˘ J . Then v J M p x q v “ O p q ` r v J ˆ U J a E J a ˙ P B k G b p x q ˆ U a E a ˙ r v “ O p q ` r w J ˆ P B k G bL p x q ´ P B k G bL p x q´ P B k G bL p x q P B k G bL p x q ˙ r w “ O p q ` w J P B k R b p x q w , (5.36) where in the second step we used similar identities for G b as in (3.15) and (3.16) with r w “ ˆ w w ˙ : “ ¨˝ I p ` q ˆ X Y ˙ ˆ xI n x { I n x { I n xI n ˙˛‚ˆ U a E a ˙ r v , w , w P R p ` q , and in the third step we used (3.14) with w : “ ˜ S ´ { xx p S byy q ´ { ¸ p w ´ w q . Using the spectral decomposition (5.32), we can write P B k R b p x q “ ÿ l P B k „ x ´ { p λ bl q { ´ x { ˆ ξ bl ´ ζ bl ˙ ` p ξ bl q J , ´p ζ bl q J ˘ ´ x ´ { p λ bl q { ` x { ˆ ξ bl ζ bl ˙ ` p ξ bl q J , p ζ bl q J ˘ . (5.37) In particular, it has poles at x “ λ bl for l P B k .
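The sign-change mechanism used here has a simple finite-dimensional prototype: for a symmetric matrix H with a finite-rank additive perturbation V D V^T, the perturbed eigenvalues lying outside Spec(H) are exactly the zeros of an r-by-r "secular" determinant built from the resolvent of H. The following Python sketch is a generic illustration under our own toy sizes, not the matrix M(x) of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 12, 2

# Unperturbed symmetric matrix H and a rank-2 perturbation V D V^T.
H = rng.standard_normal((n, n))
H = (H + H.T) / 2
hnorm = np.linalg.norm(H, 2)
V = np.linalg.qr(rng.standard_normal((n, r)))[0]   # orthonormal columns
D = np.diag([3.0 * hnorm, 2.0 * hnorm])            # strong enough to create an outlier

# Secular function: for x outside Spec(H),
#   det(xI - H - V D V^T) = det(xI - H) * det(I_r - D V^T (xI - H)^{-1} V),
# so such an x is an eigenvalue of the perturbed matrix iff the
# r x r determinant on the right vanishes.
def secular(x):
    R = np.linalg.solve(x * np.eye(n) - H, V)      # (xI - H)^{-1} V
    return np.linalg.det(np.eye(r) - D @ V.T @ R)

evals_H = np.linalg.eigvalsh(H)
x = np.linalg.eigvalsh(H + V @ D @ V.T)[-1]        # largest perturbed eigenvalue

assert x > evals_H[-1] + 0.5 * hnorm               # a genuine outlier of Spec(H)
assert abs(secular(x)) < 1e-8                      # secular determinant vanishes there
```

Between consecutive poles of the resolvent the secular determinant changes sign, which is the positivity/negativity claim exploited above.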
Combining (5.36) and (5.37), we conclude the claim. With the above two claims and a simple continuity argument, we see that there exists x P p x k , λ bα k q (recall (5.30)) such that M p x q is singular. Moreover, for any l, l ´ P B k , there exists x P p λ bl , λ bl ´ q such that M p x q is singular. This gives at least | B k | eigenvalues of C X Y inside r x k , x k s and hence completes the proof. Writing down a rigorous continuity argument involves discussion on some non-generic measure zero events, and we refer the reader to [36, Section 6.4] for more details. For the proof of Theorem 2.14, we adopt a similar argument as the one for Theorem 2.7 in [50]. However, our setting here is much more complicated. First, we introduce a cutoff on the matrix entries of X and Y at the level n ´ ε for a sufficiently small constant ε ą 0 . Define α p q n : “ P ´ | p x | ą n { ´ ε ¯ , β p q n : “ E ” ´ | p x | ą n { ´ ε ¯p x ı . Using (2.34), we can check with integration by parts that for any small constant δ ą 0 , α p q n ď δn ´ ` ε , | β p q n | ď δn ´ { ` ε . (6.1) Now we define independent random variables p x sij , p x lij , c p q ij , 1 ď i ď p, ď j ď n , as follows. Definition 6.1. We define p x sij as a random variable that has law ρ p q s defined through ρ p q s p Ω q “ ´ α p q n ż ˜ x ` β p q n ´ α p q n P Ω ¸ ´ | x | ď n { ´ ε ¯ ρ p q p d x q for any event Ω , where ρ p q p d x q is the law of p x ij . We define p x lij as a random variable that has law ρ p q l defined through ρ p q l p Ω q “ α p q n ż ˜ x ` β p q n ´ α p q n P Ω ¸ ´ | x | ą n { ´ ε ¯ ρ p q p d x q for any event Ω . Finally, c p q ij is a Bernoulli 0-1 random variable with P p c p q ij “ q “ α p q n and P p c p q ij “ q “ ´ α p q n . In the above definition, ρ p q s and ρ p q l are defined in a way such that p x sij and p x lij are both centered.
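The centering in Definition 6.1 can be sanity-checked on a toy discrete law. The values, probabilities, and cutoff below are hypothetical, and we assume the standard shift convention (shift by beta/(1-alpha) on the truncated part and by -beta/alpha on the tail part) that makes both parts centered.

```python
from fractions import Fraction as F

# Hypothetical centered discrete law rho with cutoff K (all numbers are ours).
vals  = [F(-4), F(-1), F(2), F(6)]
probs = [F(1, 10), F(3, 5), F(1, 5), F(1, 10)]
K = F(3)

assert sum(probs) == 1
assert sum(v * p for v, p in zip(vals, probs)) == 0      # E[x] = 0

alpha = sum(p for v, p in zip(vals, probs) if abs(v) > K)        # P(|x| > K)
beta  = sum(v * p for v, p in zip(vals, probs) if abs(v) > K)    # E[x 1(|x| > K)]

# Small part: law of x + beta/(1-alpha) on {|x| <= K}, normalized by (1-alpha).
mean_s = sum((v + beta / (1 - alpha)) * p / (1 - alpha)
             for v, p in zip(vals, probs) if abs(v) <= K)
# Large part: law of x - beta/alpha on {|x| > K}, normalized by alpha.
mean_l = sum((v - beta / alpha) * p / alpha
             for v, p in zip(vals, probs) if abs(v) > K)

assert mean_s == 0 and mean_l == 0                       # both parts are centered
```

Exact rational arithmetic (fractions) makes the cancellation E[x 1(|x| <= K)] = -beta visible without floating-point noise.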
Now let X s , X l and X c be independent random matrices such that x sij “ n ´ { p x sij , x lij “ n ´ { p x lij and x cij “ c p q ij . Then we can easily check that x ij d “ x sij ` ´ x cij ˘ ` x lij x cij ´ ? n β p q n ´ α p q n , (6.2) where d “ means that the two random variables have the same distribution. Similarly, we decompose Y as y ij d “ y sij ` ´ y cij ˘ ` y lij y cij ´ ? n β p q n ´ α p q n , (6.3) where the entries y sij , y lij and y cij of the independent random matrices Y s , Y l and Y c are defined in similar ways using α p q n : “ P ´ | p y | ą n { ´ ε ¯ , β p q n : “ E ” ´ | p y | ą n { ´ ε ¯p y ı . Notice that the deterministic matrix M with p M q ij “ ´ ? n β p q n ´ α p q n , ď i ď p, ď j ď n, has operator norm O p n ´ ` ε q , which, by Weyl’s inequality, perturbs the singular values of X at most by O p n ´ ` ε q . Such a small error is always negligible for our result, so we will omit the constant term in (6.2) throughout the proof. Similarly, we will also omit the constant term in (6.3). Finally, we decompose Z “ Z s ` Z l , where Z sij “ p| Z ij | ď n ´ ε q Z ij ` β p q n , Z lij “ p| Z ij | ą n ´ ε q Z ij ´ β p q n , β p q n : “ E r p| Z ij | ą n ´ ε q Z ij s . Using (2.2) and integration by parts, one can verify that β p q n “ O p n ´ ` ε q . The deterministic vector p β p q n , ¨ ¨ ¨ , β p q n q J P R n has Euclidean norm O p n ´ { ` ε q , and we can easily check that it is also negligible for the following arguments. Hence for simplicity of notation, we will omit it throughout the proof. Remark. The purpose of the above decomposition (in distribution) is to write p X, Y, Z q into well-behaved random matrices p X s , Y s , Z s q with bounded support q “ O p n ´ ε q plus a perturbation matrix. For example, for X , the perturbation is of the form p X l ´ X s q ˝ X c up to a negligible deterministic term. Here the matrix X c gives the locations of the nonzero entries, and its rank is at most n ε with high probability; see (6.8) below.
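Two ingredients behind this decomposition, the low rank of the masked perturbation and the stability of eigenvalues under a small-norm symmetric perturbation (Weyl's inequality), can be checked numerically; the sizes and nonzero positions below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Sparse 0-1 mask playing the role of X^c: its rank is at most the
# number of nonzero entries, so the masked perturbation has low rank.
mask = np.zeros((n, n))
for i, j in [(3, 7), (11, 2), (40, 40)]:   # 3 nonzero locations
    mask[i, j] = 1.0
Delta = rng.standard_normal((n, n))
E = Delta * mask                            # entrywise product, rank <= 3
assert np.linalg.matrix_rank(E) <= 3

# Weyl's inequality: a symmetric perturbation of small operator norm
# moves every eigenvalue of a symmetric matrix by at most that norm.
H = rng.standard_normal((n, n))
H = (H + H.T) / 2
Es = (E + E.T) / 2
shift = np.abs(np.linalg.eigvalsh(H + Es) - np.linalg.eigvalsh(H)).max()
assert shift <= np.linalg.norm(Es, 2) + 1e-10
```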
The matrix X l contains the large entries above the cutoff, but the tail condition (2.34) guarantees that the sizes of these entries are of order o p q in probability; see (6.13). Hence the perturbation is of low rank and has small signal strengths. We expect that, as in the famous BBP transition [5], the effect of this weak perturbation on the largest few eigenvalues is negligible. With (2.34) and integration by parts, we can obtain that E p x s “ , E | p x s | “ ´ O p n ´ ` ε q , E | p x s | “ O p q , E | p x s | “ O p log n q . (6.4) Similar estimates hold for the p y s variable. Hence X : “ p E | p x s | q ´ { X s and Y : “ p E | p y s | q ´ { Y s are random matrices that satisfy the assumptions for X and Y in Lemma 2.7, Theorem 2.9 and Theorem 2.11 with φ n “ ψ n “ O p n ´ ε q . Moreover, the small error O p n ´ ` ε q in E | p x s | and E | p y s | can be neglected for our purpose. For Z , using lim t Ñ8 E “ | p z | p| p z | ą t q ‰ “
0, we get that E | z s | “ ´ o p q , E | z l | “ o p q , where we denote p z s : “ ? nZ s and p z l : “ ? nZ l . Then Z : “ p E | p z s | q ´ { Z s satisfy the assumptions for Z in Lemma 2.7, Theorem 2.9 and Theorem 2.11. Note that the scaling of Z s amounts to a rescaling of A and B : A Ñ A “ p E | p z s | q { A and B Ñ B “ p E | p z s | q { B so that A Z “ AZ s and B Z “ BZ s . Inparticular, we have that the t i ’s in (2.14) are only perturbed by an amount of o p q . (6.5)We denote by C s X Y and C sXY the SCC matrices obtained by replacing p X, Y, Z q with p X s , Y s , Z s q inthe definitions. Let r λ si and λ si be their eigenvalues, respectively. Then by Theorem 2.9 and (6.5), for any1 ď i ď r ` we have that | r λ si ´ θ i | “ o p q with high probability , (6.6)and by Lemma 2.7, we have thatlim n Ñ8 P ˆ n { λ s ´ λ ` c T W ď s ˙ “ lim n Ñ8 P GOE ´ n { p λ ´ q ď s ¯ . (6.7)Throughout the following proof, we only consider the largest non-outlier eigenvalue. The extension to thecase with multiple largest non-outlier eigenvalues is simple. We write the right-hand sides of (6.2) and (6.3)as x sij ` ´ x cij ˘ ` x lij x cij “ x sij ` ∆ p q ij x cij , ∆ p q ij : “ x lij ´ x sij , and y sij ` ´ y cij ˘ ` y lij y cij “ y sij ` ∆ p q ij y cij , ∆ p q ij : “ y lij ´ y sij . We define the matrices E p q : “ p ∆ p q ij x cij : 1 ď i ď p, ď j ď n q and E p q : “ p ∆ p q ij y cij : 1 ď i ď q, ď j ď n q .It suffices to show that the effect of E p q , E p q and Z l on the eigenvalues r λ i , 1 ď i ď r ` and r λ r ` ` is negligible.Define the event A : “ tp i, j q : x cij “ u ď n ε ( X x cij “ x ckl “ ñt i, j u “ t k, l u or t i, j u X t k, l u “ H ( .
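The first requirement in the event A, that the Bernoulli mask has few nonzero entries, is a Binomial tail estimate. A minimal Python check of the Chernoff-type bound P(X >= k) <= (e*mu/k)^k, valid for k > mu = Nq, against the exact Binomial(N, q) tail; the sizes are illustrative stand-ins for p*n trials with success probability of order n^{-2+2*eps}.

```python
import math

# The nonzero entries of the Bernoulli mask form a Binomial(N, q) count.
N, q = 10_000, 2e-4           # illustrative sizes, mean mu = 2
mu = N * q
k = 20                        # threshold playing the role of n^eps

def log_pmf(j):               # log of the Binomial(N, q) pmf at j
    return (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
            + j * math.log(q) + (N - j) * math.log1p(-q))

exact_tail = sum(math.exp(log_pmf(j)) for j in range(k, N + 1))
chernoff = (math.e * mu / k) ** k     # Chernoff bound on P(X >= k) for k > mu

assert exact_tail <= chernoff
assert chernoff < 1e-10               # exceeding the threshold is extremely unlikely
```

Working with log-probabilities (lgamma) avoids the overflow of binomial coefficients for N = 10,000.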
By a Chernoff bound, we get that P ` tp i, j q : x cij “ u ď n ε (˘ ě ´ exp p´ n ε q . (6.8) If the number n of the nonzero elements in X c satisfies n ď n ε , then we can check that P ` D i “ k, j ‰ l or i ‰ k, j “ l such that x cij “ x ckl “ ˇˇ tp i, j q : x cij “ u “ n ˘ “ O p n n ´ q . (6.9) Combining the estimates (6.8) and (6.9), we get that P p A q ě ´ O p n ´ ` ε q . (6.10) Similarly, for the event B : “ tp i, j q : y cij “ u ď n ε ( X y cij “ y ckl “ ñt i, j u “ t k, l u or t i, j u X t k, l u “ H ( , we have P p B q ě ´ O p n ´ ` ε q , (6.11) if the number of nonzero elements in Y c is at most n ε . On the other hand, using condition (2.34) and Markov’s inequality, we get P ´ | E p q ij | ě ω ¯ ` P ´ | E p q ij | ě ω ¯ ď P ´ | p x ij | ě ω n { ¯ ` P ´ | p y ij | ě ω n { ¯ “ o p n ´ q , for any fixed constant ω ą
0. With a simple union bound, we get P ˆ max i,j | E p q ij | ě ω ˙ ` P ˆ max i,j | E p q ij | ě ω ˙ “ o p q . (6.12)Define the event C : “ " max i,j | E p q ij | ď ω * X " max i,j | E p q ij | ď ω * . Combining (6.10), (6.11) and (6.12), we get P p A X B X C q “ ´ o p q . (6.13)We also define the event C : “ }p Z s q J Z s ´ I r } ď w, }p Z l q J Z l } ď w , }p Z s q J Z l } ď w ( . (6.14)By strong law of large number, we have P p C q “ ´ o p q . Recalling (3.2), we only need to study the zeros of det r r H p λ qs on event A X B X C X C . Here we define r H t p λ q , t P r , s , as r H t p λ q : “ r H s p λ q ` t ¨˚˚˝ ˆ E p q ` AZ l E p q ` BZ l ˙ˆ p E p q ` AZ l q J p E p q ` BZ l q J ˙ ˛‹‹‚ , where r H s p λ q : “ H s p λ q ` ¨˚˚˝ ˆ AZ s BZ s ˙ˆ p AZ s q J p BZ s q J ˙ ˛‹‹‚ , H s p λ q : “ ¨˚˚˝ ˆ X s Y s ˙ˆ p X s q J p Y s q J ˙ ˆ λI n λ { I n λ { I n λI n ˙ ´ ˛‹‹‚ . We would like to extend (6.6) and (6.7) at t “ t “ t P r , s , we define the PCC matrix C X Y p t q for X p t q : “ X s ` t r E p q ` A p Z s ` tZ l q and Y p t q : “ Y s ` t r E p q ` B p Z s ` tZ l q , and denote its eigenvalues as r λ i p t q . Note that r λ i “ r λ i p q are theeigenvalues we are interested in, and the eigenvalues r λ si “ r λ i p q satisfy (6.6) and (6.7). Moreover, r λ i p t q iscontinuous with respect to t on the extended real line R . Proof of (2.36) . Fix any 1 ď i ď r ` , we pick a sufficiently small constant δ ą n : (i) the interval J i : “ r θ i ´ δ, θ i ` δ s only contains θ j ’s that converge tothe same limit as θ i when n Ñ 8 , (ii) J i is away from all the other θ j ’s at least by δ , and (iii) J i is awayfrom λ ` at least by δ . By (6.6), we know r λ i p q P J i with high probability. Now for µ : “ θ i ˘ δ , we claimthat P ´ det r H t p µ q ‰ ď t ď ¯ “ ´ o p q . (6.15)If (6.15) holds, then µ is not an eigenvalue of C X Y p t q for all t P r , s with probability 1 ´ o p q . 
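The deterministic principle behind this step can be illustrated in finite dimensions: along a continuous symmetric matrix path, if a given point is never an eigenvalue, then the number of eigenvalues on either side of it cannot change. A hedged Python sketch with synthetic matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Symmetric A0 with a known, well separated top eigenvalue at 10.
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A0 = Q @ np.diag([0., 1, 2, 3, 4, 5, 6, 10]) @ Q.T
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E = 0.1 * E / np.linalg.norm(E, 2)      # perturbation of operator norm exactly 0.1
A1 = A0 + E

# Window J around the top eigenvalue; by Weyl's inequality, no eigenvalue
# of A(t) = (1-t)A0 + tA1 ever touches the endpoints of J.
mu_minus, mu_plus = 9.7, 10.3

def count_in_J(t):
    ev = np.linalg.eigvalsh((1 - t) * A0 + t * A1)
    assert (np.abs(ev - mu_minus) > 1e-12).all() and (np.abs(ev - mu_plus) > 1e-12).all()
    return int(((ev > mu_minus) & (ev < mu_plus)).sum())

# Eigenvalues depend continuously on t; since none crosses an endpoint,
# the count inside J is constant along the whole path.
assert {count_in_J(t) for t in np.linspace(0, 1, 101)} == {1}
```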
By continuity of r λ i p t q with respect to t , we have r λ i “ r λ i p q P J i with probability 1 ´ o p q , that is, P p| r λ i ´ θ i | ď δ q “ ´ o p q . This concludes (2.36) since δ can be arbitrarily small. For the proof of (6.15), we will condition on A X B and the event C n x n y that X c and Y c have n x and n y nonzero entries with max t n x , n y u ď n ε . Moreover, we assume that the positions of the n x nonzero entries of X c are p σ x p q , π x p qq , p σ x p q , π x p qq , ¨ ¨ ¨ , p σ x p n x q , π x p n x qq , and the positions of the n y nonzero entries of Y c are p σ y p q , π y p qq , p σ y p q , π y p qq , ¨ ¨ ¨ , p σ y p n y q , π y p n x qq . Here σ x : t , ¨ ¨ ¨ , n x u Ñ t , ¨ ¨ ¨ , p u , π x : t , ¨ ¨ ¨ , n x u Ñ t , ¨ ¨ ¨ , n u , σ y : t , ¨ ¨ ¨ , n y u Ñ t , ¨ ¨ ¨ , q u and π y : t , ¨ ¨ ¨ , n y u Ñ t , ¨ ¨ ¨ , n u are uniform random injective functions. Then we can rewrite that r H t p µ q “ H s p µ q ` O t ¨˚˚˝ ˆ D t D e ˙ˆ D t D e ˙ ˛‹‹‚ O J t , O t : “ ˆ` U , F ˘ ` E t , F ˘˙ , where D and U have been defined in (3.5) and (3.6); D e : “ ˜ Σ p q e
00 Σ p q e ¸ withΣ p q e : “ diag ´ E p q σ x p q π x p q , ¨ ¨ ¨ , E p q σ x p n x q π x p n x q ¯ , Σ p q e : “ diag ´ E p q σ y p q π y p q , ¨ ¨ ¨ , E p q σ y p n y q π y p n y q ¯ ; E t : “ ˆ` Z J t v a , ¨ ¨ ¨ , Z J t v ar ˘ ` Z J t v b , ¨ ¨ ¨ , Z J t v br ˘˙ , with Z t : “ Z s ` tZ l ; F : “ ¨˝´ e p p q σ x p q , ¨ ¨ ¨ , e p p q σ x p n x q ¯ ´ e p q q σ y p q , ¨ ¨ ¨ , e p q q σ y p n y q ¯˛‚ ;37 : “ ¨˝´ e p n q π x p q , ¨ ¨ ¨ , e p n q π x p n x q ¯ ´ e p n q π y p q , ¨ ¨ ¨ , e p n q π y p n y q ¯˛‚ . Here we use e p l q i to denote the standard unit vector along i -th coordinate in R l .Applying the identity det p ` AB q “ det p ` BA q , we obtain thatdet r H t p µ q “ det r G s p µ qs ¨ det ” ` r F t p µ q ` E t p µ q ı , (6.16)where r F t p µ q : “ ¨˚˚˝ ˆ D t D e ˙ˆ D t D e ˙ ˛‹‹‚ O J t Π p µ q O t , and E t p µ q : “ ¨˚˚˝ ˆ D t D e ˙ˆ D t D e ˙ ˛‹‹‚ O J t r G s p µ q ´ Π p µ qs O t . Note that O t is deterministic conditioning on Z . Hence by Lemma 3.11, we have that (recall (6.14)) E „ ˇˇˇ“ O J t p G s p µ q ´ Π p µ qq O t ‰ ij ˇˇˇ ˇˇˇˇ C n x n y , Z, C ă n ´ , ď i, j ď r ` n x ` n y . Applying Markov’s inequality to this estimate and using a simple union bound, we get thatmax ď i,j ď r ` n x ` n y ˇˇˇ“ O J t p G s p µ q ´ Π p µ qq O t ‰ ij ˇˇˇ ď n ´ { with probability 1 ´ O p n ´ { ` ε q , (6.17)conditioning on C n x n y , Z and C . Next we claim that on C X C ,sup ď t ď ››› r F t p µ q ´ r F p µ q ››› ď Cω, (6.18)for some constant C ą ω . In fact, expanding r F t p µ q and using that } Π p µ q} “ O p q , } t Σ p q e } ď ω , } t Σ p q e } ď ω and } E t ´ E } “ O p ω q on C X C , we can easily obtain (6.18). Then combining(6.17) and (6.18) we get that on event A X B X C X C ,det ´ ` r F t p µ q ` E t p µ q ¯ “ det ´ ` r F p µ q ` O p ω q ¯ for all t P r , s , (6.19)with probability 1 ´ o p q . When t “
0, the discussion at the beginning of Section 4 (i.e. the argument leadingto (4.6)) gives that at µ “ θ i ˘ δ , }p ` r F p µ qq ´ } ď C δ with high probability for some constant C δ ą
0. Thusby (6.19), as long as ω is sufficiently small, we have that with probability 1 ´ o p q , det p ` r F t p µ q ` E t p µ qq ‰ t P r , s . This concludes (6.15), which further concludes (2.36). Proof of (2.31) for Theorem 2.14.
Similar to (6.15), we claim that P ´ det r H t p µ q ‰ ď t ď ¯ “ ´ o p q , for µ “ λ p q ˘ n ´ { ” λ s ˘ n ´ { . (6.20) Recall that at t “
0, by Theorem 2.11, we have | r λ ` r ` p q ´ λ b p q| ă n ´ . Applying Theorem 2.11 again gives | λ b p q ´ λ p q| ă n ´ . Thus we have that r λ ` r ` p q P r λ s ´ n ´ { , λ s ` n ´ { s with high probability. If (6.20) holds, then by continuity of r λ ` r ` p t q with respect to t , we get r λ ` r ` ” r λ ` r ` p q P r λ s ´ n ´ { , λ s ` n ´ { s with probability 1 ´ o p q , which concludes the proof together with (6.7). In the following proof, we choose z “ λ ` ` i n ´ { . As in (6.16), we need to study det »——– ` r F t p z q ` E t p z q ` ¨˚˚˝ ˆ D t D e ˙ˆ D t D e ˙ ˛‹‹‚ O J t r G s p µ q ´ G s p z qs O t fiffiffifl , where we used the simple identity O J t G s p µ q O t “ O J t r G s p µ q ´ G s p z qs O t ` O J t G s p z q O t . Repeating the proof below (6.16), we can show that with probability 1 ´ o p q , 1 ` r F t p z q ` E t p z q “ ` r F p z q ` O p ω q for all t P r , s , (6.21) and }p ` r F p z qq ´ } ď C with high probability for some constant C ą ω . Moreover, we have that } O J t r G s p µ q ´ G s p z qs O t } ď n ´ { with probability 1 ´ o p q , (6.22) which is proved as (5.16) in [50]. Combining (6.21) and (6.22), we get that with probability 1 ´ o p q , det ´ ` r F t p µ q ` E t p µ q ¯ “ det ´ ` r F p z q ` O p ω q ¯ ‰ t P r , s , as long as ω is sufficiently small. This concludes (6.20), which completes the proof of (2.31) for the k “ k ą case. Finally, in this section, we present the proof of Lemma 2.7. It has been proved in [50] when B “
0, and we need to show that adding the BZ term to Y does not affect the results. By Theorem 2.5 of [50], (2.21) holds for λ i , the eigenvalues of C XY . On the other hand, by Theorem 2.11 we have | λ bi ´ λ i | ă n ´ α ´ ` À n ´ , where in the second step we used that t i “ ď i ď r and hence α ` “ t c „
1. This shows that (2.21) also holds for λ bi . However, since we need to use (2.20) in the proof of Theorem 2.11, we cannot use (2.25) and (3.42) to conclude (2.20). Instead, we need a separate argument. We first prove an averaged local law for G b p z q as in (3.40) and (3.41), using the following resolvent estimates. Lemma 7.1 (Lemma 3.8 of [50]) . For any deterministic unit v β P C I β , β “ , , we have that ÿ a P I ˇˇ G a v β ˇˇ ă ` ˇˇ Im p UG R q v β v β ˇˇ η , ÿ a P I ˇˇ G v β a ˇˇ ă ` ˇˇ Im p G R U J q v β v β ˇˇ η , (7.1) where U : “ z { ˆ zI n z { I n z { I n zI n ˙ ˆ zI n z { I n z { I n zI n ˙ ´ . Now we calculate m b p z q “ n ´ ř µ P I G bµµ p z q using (3.61). By the anisotropic local law (3.57), we have that with high probability, ›››››„ ` ˆ D b D b ˙ ˆ U J b E J b ˙ G p z q ˆ U b E b ˙ ´ ˆ D b D b ˙››››› “ O p q . Hence by (3.61), we obtain that (recall (3.56)) | m b p z q ´ m p z q| ă max ď k ď r ÿ µ P I ´ | G µ u bk p z q| ` | G µ r v bk p z q| ¯ , where we abbreviated that r v bk : “ Z J v bk . Note that r v bk are approximately orthonormal vectors by (3.55). Then using (7.1), we obtain that for z P r S p ε, r ε q , | m b p z q ´ m p z q| ă n ` max ď k ď r | Im p UG R q u bk u bk | ` | Im p UG R q r v bk r v bk | nη ă n ` max ď k ď r η ` Im m c p z q ` Ψ p z q ` ψ n ` φ n nη À Ψ p z q ` ψ n ` φ n nη , (7.2) where in the second step we used the local law (3.57) and that ˇˇˇ Im ` U Π bR p z q ˘ u bk u bk ˇˇˇ ` ˇˇˇ Im ` U Π bR ˘ r v bk r v bk ˇˇˇ À Im m c p z q ` η. Here Π bR p z q denotes the p I Y I q ˆ p I Y I q block of Π b . Combining (7.2) with the averaged local laws (3.40)–(3.41) for m p z q , and equation (3.17) for m b p z q and m b p z q , we obtain the following local laws: for any fixed ε, r ε ą | m b p z q ´ m c p z q| ă p nη q ´ (7.3) uniformly in z P r S p ε, r ε q , and | m b p z q ´ m c p z q| ă ψ n ` φ n nη ` n p κ ` η q ` p nη q ? κ ` η (7.4) uniformly in z P r S out p ε, r ε q . Next we introduce the following regularized resolvents.
Definition 7.2 (Regularized resolvents) . For z “ E ` i η P C ` , we define the regularized resolvent p G p z q as p G p z q : “ „ H p z q ´ zn ´ ˆ I p ` q
00 0 ˙ ´ . Moreover, we define p H : “ p S ´ { xx S xy p S ´ { yy , p S xx : “ S xx ` n ´ , p S yy : “ S yy ` n ´ . Then the resolvents p R p z q , p G b p z q and p R b p z q etc. can be defined in the obvious way as in Definition 3.2.
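The point of the n^{-10}-regularization in Definition 7.2 is to make the relevant Gram matrices invertible with a deterministic bound on the inverse, at the price of an error far below every scale in the proof. A minimal numerical sketch of this mechanism, with toy sizes of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 30                          # p > n, so S_xx = X X^T / n is singular
X = rng.standard_normal((p, n))
S_xx = X @ X.T / n

# S_xx has rank at most n < p, hence no inverse ...
assert np.linalg.matrix_rank(S_xx) <= n

# ... but the regularized version S_xx + n^{-10} I is invertible, with an
# inverse bounded deterministically by about n^{10}, since S_xx >= 0.
reg = S_xx + n ** -10 * np.eye(p)
inv_norm = np.linalg.norm(np.linalg.inv(reg), 2)
assert inv_norm <= 1.5 * n ** 10       # tolerance for floating-point error
```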
By the Schur complement formula, we can obtain similar expressions for $\widehat G_L$, $\widehat G_R$ and $\widehat G_{LR}$ as in (3.14)–(3.16). The main reason for introducing the regularized resolvents is that they satisfy the following deterministic bounds: for some constant $C > 0$,
$$\big\| \widehat G(z) \big\| \le \frac{C n^{10}}{\eta}, \qquad \big\| \widehat G^b(z) \big\| \le \frac{C n^{10}}{\eta}. \qquad (7.5)$$
This estimate has been proved in Lemma 3.6 of [50]. With a standard perturbation argument, we can control the difference between $\widehat G(z)$ and $G(z)$, as in the following claim.

Claim 7.3.
Suppose there exists a high probability event $\Xi$ on which $\|G(z)\|_{\max} = \mathrm O(1)$ for $z$ in some subset, where $\|G\|_{\max} := \max_{i,j} |G_{ij}|$ denotes the max norm. Then we have that
$$\|G(z) - \widehat G(z)\|_{\max} \le n^{-8} \quad \text{on } \Xi. \qquad (7.6)$$
The same bound also holds for $\|G^b(z) - \widehat G^b(z)\|_{\max}$ on the event $\{\|G^b(z)\|_{\max} = \mathrm O(1)\}$ or $\{\|\widehat G^b(z)\|_{\max} = \mathrm O(1)\}$.

Proof. For $t \in [0,1]$, we define
$$G_t(z) := \left[ H(z) - t z n^{-10} \begin{pmatrix} I_{p+q} & 0 \\ 0 & 0 \end{pmatrix} \right]^{-1},$$
with $G_0(z) = G(z)$ and $G_1(z) = \widehat G(z)$. Taking the derivative with respect to $t$, we immediately get that
$$\partial_t G_t(z) = z n^{-10} G_t(z) \begin{pmatrix} I_{p+q} & 0 \\ 0 & 0 \end{pmatrix} G_t(z). \qquad (7.7)$$
Thus applying Gronwall's inequality to
$$\|G_t(z)\|_{\max} \le \|G_0(z)\|_{\max} + C n^{-9} \int_0^t \|G_s(z)\|_{\max}^2 \, \mathrm d s,$$
we obtain that $\|G_t(z)\|_{\max} \le C$ for all $0 \le t \le 1$. Then using (7.7) again, we get (7.6).

Note that the bound (7.6) is purely deterministic on $\Xi$, so we do not lose any probability here. Moreover, such a small error $n^{-8}$ will not affect any of our results.

Proof of (2.20). With the same arguments as the ones for [22, Theorems 2.12 and 2.13], [23, Theorem 2.2] and [43, Theorem 3.3], from the averaged local law (7.3) we can derive that for any small constants $\delta, \varepsilon > 0$, the rigidity estimate (2.20) holds for all $n^{\varepsilon} \le i \le (1-\delta)q$. To conclude (2.20) for the first $n^{\varepsilon}$ eigenvalues, we still need to prove an upper bound on them. More precisely, it suffices to show that for any small constant $\varepsilon > 0$,
$$\lambda^b_1 \le \lambda_+ + n^{-2/3+\varepsilon}, \quad \text{w.h.p.} \qquad (7.8)$$
Combining this estimate with the rigidity estimate for $\lambda^b_{n^{\varepsilon}}$, we can conclude that (2.20) holds for all $1 \le i \le (1-\delta)q$, since $\varepsilon$ can be arbitrarily small.

First, using the averaged local law (7.4), we can obtain that for any small constants $c, \varepsilon > 0$,
$$\#\{i : \lambda^b_i \in [\lambda_+ + n^{-2/3+\varepsilon}, 1-c]\} = 0, \quad \text{w.h.p.} \qquad (7.9)$$
The proof is standard and similar to the one for (4.7) of [50], so we omit the details. It remains to prove that
$$\#\{i : \lambda^b_i \in [1-c, 1]\} = 0, \quad \text{w.h.p.}, \qquad (7.10)$$
for a sufficiently small constant $c > 0$. For $t \in [0,1]$, we define a continuous path of interpolated random matrices between $Y$ and $Y + BZ$ as
$$Y_t := Y + tBZ, \quad t \in [0,1].$$
By replacing $Y$ with $Y_t$ in (3.10) and Definition 7.2, we can define $H^b_t(z)$, $G^b_t(z)$, $\widehat H^b_t(z)$ and $\widehat G^b_t(z)$ correspondingly. First, we claim the following result.

Claim 7.4.
With high probability, we have that
$$\|G^b_t(1-c)\|_{\max} < \infty \quad \text{for all } t \in [0,1]. \qquad (7.11)$$
We postpone the proof of this claim until we complete the proof of (2.20). Let $\lambda^b_1(t) \ge \lambda^b_2(t) \ge \cdots \ge \lambda^b_q(t)$ be the eigenvalues of $\mathcal C_{X Y_t}$. For any $1 \le i \le q$, $\lambda^b_i(t) : [0,1] \to \overline{\mathbb R}$ is a continuous function of $t$ on the extended real line $\overline{\mathbb R}$. By (3.42), the eigenvalues $\lambda^b_i(0)$ of $\mathcal C_{XY}$ are all inside $[0, \lambda_+ + n^{-2/3+\varepsilon}]$ with high probability. If (7.11) holds, then
$$m^b_t(1-c) = \frac 1 q \sum_{i=1}^{q} \frac{1}{\lambda^b_i(t) - (1-c)}$$
is finite for all $t \in [0,1]$. This means that no eigenvalue $\lambda^b_i(t)$ crosses the point $E = 1-c$ as $t$ ranges over $[0,1]$. Thus we conclude (7.10), which, together with (7.9), further concludes (7.8).

Finally, it remains to prove Claim 7.4.

Proof of Claim 7.4.
Take a discrete net of $t$, $t_k = k n^{-50}$, for $0 \le k \le n^{50}$. First, we claim that there exists a high probability event $\Xi_1$ so that
$$\mathbf 1(\Xi_1) \max_{0 \le k \le n^{50}} \big\|\widehat G^b_{t_k}(E + \mathrm i n^{-10})\big\|_{\max} \le C \quad \text{for } E := 1-c, \qquad (7.12)$$
for some large constant $C > 0$. In fact, notice that $Y_t$ also satisfies the assumptions for $Y$ in Lemma 2.7. Hence using (7.9), we obtain that for any $t_k$, the eigenvalues $\lambda^b_i(t_k)$ are inside $[0, \lambda_+ + n^{-2/3+\varepsilon}] \cup [1-c/2, 1]$ with high probability. By taking a union bound, we get that
$$\min_{0 \le k \le n^{50}} \min_{1 \le i \le q} |E - \lambda^b_i(t_k)| \gtrsim 1, \quad \text{w.h.p.} \qquad (7.13)$$
Applying the spectral decomposition (3.13) to $R^b$, we obtain from (7.13) that
$$\max_{0 \le k \le n^{50}} \big\| R^b_{t_k}(z) \big\| \le C, \quad \text{for } z = E + \mathrm i n^{-10}.$$
Combining this bound with (3.14)–(3.16), and using Lemma 3.3, we get that
$$\max_{0 \le k \le n^{50}} \big\| G^b_{t_k}(z) \big\| \le C, \quad \text{w.h.p.}$$
Finally, applying Claim 7.3, we get (7.12) for $\widehat G^b$. Now given (7.12), using the deterministic bound (7.5) for $\widehat G^b$, we get that on $\Xi_1$,
$$\big\| \widehat G^b_t(E + \mathrm i n^{-10}) - \widehat G^b_{t_k}(E + \mathrm i n^{-10}) \big\|_{\max} \lesssim n^{-50} \big\|\widehat G^b_t(E + \mathrm i n^{-10})\big\| \cdot \|Z\| \cdot \big\|\widehat G^b_{t_k}(E + \mathrm i n^{-10})\big\| \lesssim n^{-50} \cdot \big(n^{20}\big)^2 \cdot \|Z\| \lesssim n^{-10} \|Z\|, \quad t_{k-1} \le t \le t_k.$$
By the bounded support condition of $Z$, we have that $\|Z\| = \mathrm O(\sqrt n)$ on a high probability event $\Xi_2$. Thus we have that on the high probability event $\Xi_1 \cap \Xi_2$,
$$\big\| \widehat G^b_t(E + \mathrm i n^{-10}) - \widehat G^b_{t_k}(E + \mathrm i n^{-10}) \big\|_{\max} \lesssim n^{-10} \cdot \sqrt n \le n^{-9},$$
which gives that
$$\mathbf 1(\Xi_1 \cap \Xi_2) \max_{0 \le t \le 1} \big\|\widehat G^b_t(E + \mathrm i n^{-10})\big\|_{\max} \le C.$$
Finally, using the same perturbation argument as in the proof of Claim 7.3, we can remove both the $\mathrm i n^{-10}$ and the regularization in $\widehat G$, which gives (7.11) on $\Xi_1 \cap \Xi_2$.

References

[1] J. Alt. Singularities of the density of states of random Gram matrices.
Electron. Commun. Probab., 22:13 pp., 2017.

[2] J. Alt, L. Erdős, and T. Krüger. Local law for random Gram matrices. Electron. J. Probab., 22:41 pp., 2017.

[3] Z. Bai and J. Yao. Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. H. Poincaré Probab. Statist., 44(3):447–474, 2008.

[4] Z. D. Bai and J. W. Silverstein. No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices. Ann. Probab., 26(1):316–345, 1998.

[5] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33(5):1643–1697, 2005.

[6] J. Baik and J. W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.

[7] Z. Bao, J. Hu, G. Pan, and W. Zhou. Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case. Ann. Statist., 47(1):612–640, 2019.

[8] S. T. Belinschi, H. Bercovici, M. Capitaine, and M. Février. Outliers in the spectrum of large deformed unitarily invariant models. Ann. Probab., 45(6A):3571–3625, 2017.

[9] F. Benaych-Georges, A. Guionnet, and M. Maida. Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices. Electron. J. Probab., 16:1621–1662, 2011.

[10] F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.

[11] A. Bloemendal, L. Erdős, A. Knowles, H.-T. Yau, and J. Yin. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19(33):1–53, 2014.

[12] A. Bloemendal, A. Knowles, H.-T. Yau, and J. Yin. On the principal components of sample covariance matrices. Prob. Theor. Rel. Fields, 164(1):459–552, 2016.

[13] P. Bourgade, H.-T. Yau, and J. Yin. Local circular law for random matrices. Probab. Theory Relat. Fields, 159:545–595, 2014.

[14] M. Capitaine, C. Donati-Martin, and D. Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. Probab., 37(1):1–47, 2009.

[15] M. Capitaine, C. Donati-Martin, and D. Féral. Central limit theorems for eigenvalues of deformations of Wigner matrices. Ann. Inst. H. Poincaré Probab. Statist., 48(1):107–133, 2012.

[16] X. Ding and F. Yang. Edge statistics of large dimensional deformed rectangular matrices. arXiv:2009.00389, 2020.

[17] X. Ding and F. Yang. Spiked separable covariance matrices and principal components. Annals of Statistics (in press), 2020.

[18] X. Ding and F. Yang. Tracy-Widom distribution for the edge eigenvalues of Gram type random matrices. arXiv:2008.04166, 2020.

[19] L. Erdős, A. Knowles, and H.-T. Yau. Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincaré, 14:1837–1926, 2013.

[20] L. Erdős, A. Knowles, H.-T. Yau, and J. Yin. Delocalization and diffusion profile for random band matrices. Commun. Math. Phys., 323:367–416, 2013.

[21] L. Erdős, A. Knowles, H.-T. Yau, and J. Yin. The local semicircle law for a general class of random matrices. Electron. J. Probab., 18:1–58, 2013.

[22] L. Erdős, A. Knowles, H.-T. Yau, and J. Yin. Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Ann. Probab., 41(3B):2279–2375, 2013.

[23] L. Erdős, H.-T. Yau, and J. Yin. Rigidity of eigenvalues of generalized Wigner matrices. Advances in Mathematics, 229:1435–1515, 2012.

[24] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228, 2007.

[25] D. Féral and S. Péché. The largest eigenvalues of sample covariance matrices for a spiked population: Diagonal case. Journal of Mathematical Physics, 50(7):073302, 2009.

[26] P. Forrester. The spectrum edge of random matrix ensembles. Nucl. Phys. B, 402(3):709–728, 1993.

[27] Y. Fujikoshi. High-dimensional asymptotic distributions of characteristic roots in multivariate linear models and canonical correlation analysis. Hiroshima Math. J., 47(3):249–271, 2017.

[28] C. Gao, Z. Ma, Z. Ren, and H. H. Zhou. Minimax estimation in sparse canonical correlation analysis. Ann. Statist., 43(5):2168–2197, 2015.

[29] C. Gao, Z. Ma, and H. H. Zhou. Sparse CCA: Adaptive estimation and computational barriers. Ann. Statist., 45(5):2074–2101, 2017.

[30] X. Han, G. Pan, and Q. Yang. A unified matrix model including both CCA and F matrices in multivariate analysis: The largest eigenvalue and its applications. Bernoulli, 24(4B):3447–3468, 2018.

[31] X. Han, G. Pan, and B. Zhang. The Tracy-Widom law for the largest eigenvalue of F type matrices. Ann. Statist., 44(4):1564–1592, 2016.

[32] H. Hotelling. Relations between two sets of variates. Biometrika, 28(3-4):321–377, 1936.

[33] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29:295–327, 2001.

[34] I. M. Johnstone. Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy-Widom limits and rates of convergence. Ann. Statist., 36(6):2638–2716, 2008.

[35] I. M. Johnstone and A. Onatski. Testing in high-dimensional spiked models. Ann. Statist., 48(3):1231–1254, 2020.

[36] A. Knowles and J. Yin. The isotropic semicircle law and deformation of Wigner matrices. Comm. Pure Appl. Math., 66:1663–1749, 2013.

[37] A. Knowles and J. Yin. The outliers of a deformed Wigner matrix. Ann. Probab., 42(5):1980–2031, 2014.

[38] A. Knowles and J. Yin. Anisotropic local laws for random matrices. Probability Theory and Related Fields, pages 1–96, 2016.

[39] Z. Ma and F. Yang. Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors. In preparation, 2021.

[40] R. Oda, H. Yanagihara, and Y. Fujikoshi. Asymptotic null and non-null distributions of test statistics for redundancy in high-dimensional canonical correlation analysis. Random Matrices: Theory and Applications, 08(01):1950001, 2019.

[41] D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17(4):1617–1642, 2007.

[42] S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):174–174, 2006.

[43] N. S. Pillai and J. Yin. Universality of covariance matrices. Ann. Appl. Probab., 24:935–1001, 2014.

[44] C. A. Tracy and H. Widom. Level-spacing distributions and the Airy kernel. Comm. Math. Phys., 159:151–174, 1994.

[45] C. A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Comm. Math. Phys., 177:727–754, 1996.

[46] K. W. Wachter. The limiting empirical measure of multiple discriminant ratios. Ann. Statist., 8(5):937–957, 1980.

[47] Q. Wang and J. Yao. Extreme eigenvalues of large-dimensional spiked Fisher matrices with application. Ann. Statist., 45(1):415–460, 2017.

[48] H. Xi, F. Yang, and J. Yin. Local circular law for the product of a deterministic matrix with a random matrix. Electron. J. Probab., 22:77 pp., 2017.

[49] F. Yang. Edge universality of separable covariance matrices. Electron. J. Probab., 24:57 pp., 2019.

[50] F. Yang. Sample canonical correlation coefficients of high-dimensional random vectors: local law and Tracy-Widom limit. arXiv:2002.09643, 2020.

[51] F. Yang, S. Liu, E. Dobriban, and D. P. Woodruff. How to reduce dimension with PCA and random projections? arXiv:2005.00511.

[52] Y. Yang and G. Pan. The convergence of the empirical distribution of canonical correlation coefficients. Electron. J. Probab., 17:13 pp., 2012.

[53] Y. Yang and G. Pan. Independence test for high dimensional data based on regularized canonical correlation coefficients.