Freeness over the diagonal and outliers detection in deformed random matrices with a variance profile
Jérémie Bigot & Camille Male
Institut de Mathématiques de Bordeaux et CNRS (UMR 5251)
Université de Bordeaux
May 19, 2020
Abstract
We study the eigenvalue distribution of a GUE matrix with a variance profile that is perturbed by an additive random matrix that may possess spikes. Our approach is guided by Voiculescu's notion of freeness with amalgamation over the diagonal and by the notion of deterministic equivalent. This allows us to derive a fixed point equation to approximate the spectral distribution of certain deformed GUE matrices with a variance profile, and to characterize the location of potential outliers in such models in a non-asymptotic setting. We also consider the singular value distribution of a rectangular Gaussian random matrix with a variance profile in a similar setting of additive perturbation. We discuss the application of this approach to the study of low-rank matrix denoising models in the presence of heteroscedastic noise, that is, when the amount of variance in the observed data matrix may change from entry to entry. Numerical experiments are used to illustrate our results.
Keywords:
Deformed random matrix, Variance profile, Outlier detection, Free probability, Freeness with amalgamation, Operator-valued Stieltjes transform, Gaussian spiked model, Low-rank model.
We first introduce some questions related to the estimation of the eigenvalue distribution of the sum of a GUE matrix with a variance profile and a deterministic matrix that may possess spikes. Then, we present the main theoretical contributions of the paper and its organization.
We recall that, for any integer $N \ge 1$, an $N \times N$ GUE matrix is a Hermitian random matrix $\big(\frac{x_{i,j}}{\sqrt{N}}\big)_{i,j=1,\dots,N}$ such that the sub-diagonal entries are independent, the variables $x_{i,i}$, $i = 1,\dots,N$, are centered real Gaussian variables with variance one, and the off-diagonal entries $x_{i,j}$ (with $i \ne j$) are centered complex Gaussian variables with variance one. Let $\Gamma_N = \big(\gamma_N(i,j)\big)_{i,j=1,\dots,N}$ be a symmetric matrix with non-negative entries. Then, the random matrix
$$X_N := \Big(\gamma_N(i,j)\,\frac{x_{i,j}}{\sqrt{N}}\Big)_{i,j=1,\dots,N} \qquad (1.1)$$
is called a GUE matrix of size $N$ with variance profile $\Gamma_N$. Furthermore, we consider a deterministic Hermitian matrix $Y_N$. The random matrix
$$H_N := X_N + Y_N \qquad (1.2)$$
is then an (additively) deformed random matrix. We shall also study the situation where $Y_N$ has spikes that may generate a finite number of eigenvalues (called outliers) that detach from the rest of the spectrum.

There are two dual ways to interpret model (1.2) in RMT. From the point of view of mathematical physics, the random matrix $X_N$ models the Hamiltonian of a system, and the deformation $Y_N$ is a perturbation coming from an external source. A variance profile for $X_N$ can model impurities of the system. From the signal processing point of view, $Y_N$ models a signal and $X_N$ an additive noise which deforms the data.

In this paper, we present results and heuristics to approximate the eigenvalue distribution of a deformed random matrix by a deterministic function, classically called a deterministic equivalent. Our approach involves tools from RMT and free probability, some of which were developed quite recently. They may have applications to the numerical analysis of the spectral distribution of data organized in matrix form, such as noisy images.

The substance of our mathematical arguments is a legacy of the investigations of Haagerup and Thorbjørnsen, who developed a strategy in [37] for the study of block GUE matrices, involving classical techniques of Gaussian calculus and complex analysis.
In their seminal paper, such a strategy served as an ingredient to reinforce the link between RMT and the abstract theory of $C^*$-algebras of free groups. The analysis in [37] and in our work are both based on Voiculescu's equations of subordination with amalgamation. Adapting Haagerup and Thorbjørnsen's technique, we want to extend their method to more applied contexts that we believe to be of interest in high-dimensional statistics and signal processing. We notify the reader that the aim of this work is not to obtain sharp estimates (that is, with optimal rates of convergence) of the spectral distribution of deformed random matrices with a variance profile, and we refer to [30] for recent results on this topic.

If model (1.2) is quite natural from the mathematical point of view, it is more interesting for applications in statistics to study rectangular matrices with no symmetry. We define a standard Gaussian matrix as an $N \times M$ rectangular random matrix $\big(\frac{x_{i,j}}{\sqrt{M}}\big)_{i,j}$ with $x_{i,j}$ independent centered complex Gaussian entries with variance one for all $i,j$. Let $\Gamma_{N,M} = \big(\gamma_{N,M}(i,j)\big)_{i,j}$ be an $N \times M$ matrix with non-negative entries. Then, the random matrix
$$X_{N,M} = \Big(\gamma_{N,M}(i,j)\,\frac{x_{i,j}}{\sqrt{M}}\Big)_{\substack{i=1,\dots,N \\ j=1,\dots,M}}$$
is called a Gaussian matrix with variance profile $\Gamma_{N,M}$. Let $Y_{N,M}$ be a deterministic matrix of size $N \times M$. The random matrix
$$H_{N,M} = X_{N,M} + Y_{N,M} \qquad (1.3)$$
is called an information plus noise model with a variance profile.

When $Y_{N,M}$ is a finite rank matrix, the rectangular information plus noise model (1.3) corresponds to the low-rank matrix denoising problem which arises in various applications, where it is of interest to estimate an $N \times M$ signal matrix $Y_{N,M}$ from noisy data. When the variance profile $\Gamma_{N,M}$ is a matrix with equal entries, the problem of estimating the low-rank matrix $Y_{N,M}$ has been extensively studied in statistics and machine learning [19, 26, 47, 55] using spectral estimators constructed from the singular value decomposition of $H_{N,M}$. These works build upon well-understood results on the asymptotic behavior (as $\min(N,M) \to \infty$) of the singular values of $H_{N,M}$ in the Gaussian spiked population model [14, 27].

Hence, low-rank matrix estimation is well understood when the additive noise is Gaussian with homoscedastic variance (that is, a constant variance profile). However, in many applications the noise can be highly heteroscedastic, meaning that the amount of variance in the observed data matrix may significantly change from entry to entry. Examples can be found in photon imaging [53], network traffic analysis [11] or genomics for microbiome studies [20]. In such applications, the observations are count data that are modeled by Poisson or multinomial distributions, which leads to heteroscedasticity. The literature on statistical inference from high-dimensional matrices with heteroscedastic noise has thus recently been growing [16, 32, 43, 58, 62].

As discussed in [62], the Gaussian model (1.3) with a variance profile can serve as a prototype for various applications involving low-rank matrix denoising in the presence of heteroscedasticity. The analysis of the outliers in model (1.3) with a non-constant variance profile is also of interest in neural networks to model synaptic matrices [52] and in signal processing for radio communications [24]. Nevertheless, to the best of our knowledge, the characterization of potential outliers in model (1.3), when the noise matrix $X_{N,M}$ has a variance profile that is not constant, has not received much attention so far.

We now address the two questions on deformed random matrices that we investigate.
The first issue is the global behavior of the eigenvalues of $H_N$. In particular, under general assumptions on $\Gamma_N$ and $Y_N$, we shall consider the question of how to approximate the empirical spectral distribution (e.s.d.) of $H_N$:
$$\mu_{H_N} := \frac{1}{N}\sum_{i=1}^{N} \delta_{\lambda_i(H_N)},$$
where $\delta$ is the Dirac mass and $\lambda_i(H_N)$ the $i$-th eigenvalue of $H_N$. As commonly done in random matrix theory (RMT), we use the Stieltjes transform $g_{H_N}$ of the e.s.d. of $H_N$, namely the map defined by the normalized trace of the resolvent $(\lambda I_N - H_N)^{-1}$ of $H_N$, that is
$$g_{H_N}(\lambda) := \frac{1}{N}\operatorname{Tr}\big[(\lambda I_N - H_N)^{-1}\big]. \qquad (1.4)$$
In the above formula, $\lambda$ belongs to the set $\mathbb{C}^+$ of complex numbers with positive imaginary part. When the context is clear, we write $\lambda - H_N$ for the matrix $\lambda I_N - H_N$. This transform has numerous properties and applications; we refer to the lecture notes of Benaych-Georges and Knowles [13, Section 2.2] for an introduction. Note that, for $\lambda = t + i\eta$ (with $t \in \mathbb{R}$ and $\eta > 0$), one has $-\frac{1}{\pi}\Im\, g_{H_N}(\lambda) = (\mu_{H_N} * \rho_\eta)(t)$, where $*$ denotes the convolution product and $\rho_\eta(t) = \frac{1}{\pi}\frac{\eta}{t^2 + \eta^2}$ is an approximate delta function (Cauchy kernel) as $\eta \to 0^+$. Hence, for $\eta$ chosen sufficiently small, the function
$$t \mapsto -\frac{1}{\pi}\Im\big(g_{H_N}(t + i\eta)\big) \qquad (1.5)$$
is a good approximation of $\mu_{H_N}$ over $\mathbb{R}$ by a random analytic function. In what follows, the smooth density defined by (1.5) will be referred to as the inverse Stieltjes transform of $g_{H_N}$.

We thus first address the question of the construction of a deterministic analytic function $g^{\circ}_{H_N} : \mathbb{C}^+ \to \mathbb{C}^-$, which depends only on $N$, on the variance profile $\Gamma_N$ and on $Y_N$, and which approximates $g_{H_N}$ with high probability.

The second issue is the effect of a low rank additive perturbation. For notational convenience, we introduce another deterministic Hermitian matrix $Z_N$ of finite rank $k$ (not depending on $N$). We can thus write $Z_N = U_{N,k}\,\Theta_k\,U_{N,k}^*$, where $\Theta_k$ is a $k \times k$ diagonal matrix with nonzero diagonal entries and $U_{N,k}$ is an $N \times k$ matrix whose columns are orthonormal vectors. The eigenvalues of $Z_N$ are commonly called the spikes. The matrix
$$\widetilde{H}_N = X_N + Y_N + Z_N = H_N + U_{N,k}\,\Theta_k\,U_{N,k}^* \qquad (1.6)$$
is a finite rank deformation of $H_N$. For such models, we consider the problem of locating, in the spectrum of $\widetilde{H}_N$, the eigenvalues coming from $Z_N$ that detach from the spectrum of $H_N$.

To formulate this problem more precisely, we say that, for a given level of precision $\varepsilon > 0$, an outlier is a real eigenvalue $t$ of $\widetilde{H}_N$ which is not in an $\varepsilon$-neighborhood of the spectrum of $H_N$. Following [14], let us first factorize the spectrum of $H_N$ in $\widetilde{H}_N$, in the following sense. For any complex $\lambda$ which is not in the spectrum of $H_N$, we consider the decomposition
$$(\lambda - \widetilde{H}_N) = (\lambda - H_N)\,\alpha_N(\lambda), \quad \text{where} \quad \alpha_N(\lambda) = I_N - (\lambda - H_N)^{-1} Z_N.$$
An outlier is then a real number $t \in \mathbb{R}$ away from the spectrum of $H_N$ such that $\det\big(\alpha_N(t)\big) = 0$. Therefore, noticing that the determinant of $\alpha_N(\lambda)$ is equal to the determinant of the $k \times k$ matrix
$$\beta_k(\lambda) := I_k - U_{N,k}^*\,(\lambda - X_N - Y_N)^{-1}\,U_{N,k}\,\Theta_k, \qquad (1.7)$$
it follows that, to localize the outliers of $\widetilde{H}_N$, it is sufficient to compute the real numbers $t \in \mathbb{R}$ such that
$$\det\big(\beta_k(t)\big) = 0. \qquad (1.8)$$
The second question that we address in this paper is then the construction of a deterministic matrix-valued function $\beta^{\circ}_k$, which depends on $N$, on the variance profile $\Gamma_N$, and on the matrices $Y_N$, $Z_N$, and which approximates $\beta_k$. In this manner, the zeros of $\beta^{\circ}_k$ that are away from the spectrum of $H_N$ shall indeed be close to the outliers of $\widetilde{H}_N$.

We now provide the formal statements of our main results and the method used to answer the two questions raised above. We let $\mathcal{D}_N(\mathbb{C}^+)$ (resp. $\mathcal{D}_N(\mathbb{C}^-)$) denote the set of diagonal matrices $\Lambda = \big(\Lambda(i,j)\big)_{i,j}$ of size $N$ with diagonal entries having positive (resp. negative) imaginary parts. For any matrix $A_N = (a_{i,j})_{i,j}$, we denote by $\Delta(A_N)$ the diagonal matrix whose diagonal entries are those of $A_N$.

The operator-valued Stieltjes transform $G_{A_N}$ of a Hermitian matrix $A_N$ is the map
$$G_{A_N} : \mathcal{D}_N(\mathbb{C}^+) \to \mathcal{D}_N(\mathbb{C}^-), \qquad \Lambda \mapsto \Delta\big[(\Lambda - A_N)^{-1}\big]. \qquad (1.9)$$
For any $\Lambda \in \mathcal{D}_N(\mathbb{C}^+)$, we also introduce the mapping
$$R_N(\Lambda) = \operatorname{diag}_{i=1,\dots,N}\Big(\sum_{j=1}^{N} \frac{\gamma_N^2(i,j)}{N}\,\Lambda(j,j)\Big), \qquad (1.10)$$
which is a key tool in our analysis. The map $R_N$ may also be written as
$$R_N(\Lambda) = \operatorname{deg}\Big(\frac{\Gamma_N^{\circ 2}}{N}\,\Lambda\Big) = \mathbb{E}\big[X_N \Lambda X_N\big], \qquad (1.11)$$
where $\Gamma_N^{\circ 2}$ denotes the entrywise square of $\Gamma_N$ and $\operatorname{deg}(A)$, for a matrix $A$, is the diagonal matrix whose $k$-th diagonal element is the sum of the entries of the $k$-th row of $A$. We now state our main result on the construction of a deterministic equivalent of the operator-valued Stieltjes transform of deformed random matrices and its finite sample approximation properties. Theorem 1.1.
There exists a unique function $G^{\circ}_{H_N} : \mathcal{D}_N(\mathbb{C}^+) \to \mathcal{D}_N(\mathbb{C}^-)$, analytic in each variable, that solves the following fixed point equation:
$$G^{\circ}_{H_N}(\Lambda) = \Delta\Big[\big(\Lambda - R_N\big(G^{\circ}_{H_N}(\Lambda)\big) - Y_N\big)^{-1}\Big], \qquad (1.12)$$
for any $\Lambda \in \mathcal{D}_N(\mathbb{C}^+)$. Let $\gamma_{\max} = \max_{i,j}\gamma_N(i,j)$, let $0 < \delta < 1$, and consider $\Lambda \in \mathcal{D}_N(\mathbb{C}^+)$ satisfying
$$\Im\Lambda \ \ge\ \gamma_{\max}\, N^{-(1-\delta)/2}\, I_N. \qquad (1.13)$$
Then, for any $d > 0$, setting
$$\varepsilon_N(d) := \sqrt{2}\,\gamma_{\max}\sqrt{\frac{d\log N}{N}}\,\big\|(\Im\Lambda)^{-1}\big\|^{2} + \Big(1 + \frac{\gamma_{\max}}{\delta}\big\|(\Im\Lambda)^{-1}\big\|\Big)\frac{\gamma_{\max}^{2}\big\|(\Im\Lambda)^{-1}\big\|^{2}}{N},$$
we have, for $N \ge 2$,
$$\mathbb{P}\Big(\big\|G_{H_N}(\Lambda) - G^{\circ}_{H_N}(\Lambda)\big\| \ge \varepsilon_N(d)\Big) \le N^{-d}, \qquad (1.14)$$
where $\|\cdot\|$ denotes the operator norm of a matrix. If moreover $Y_N$ is diagonal, then for any $\Lambda \in \mathcal{D}_N(\mathbb{C}^+)$ satisfying the weaker condition
$$\Im\Lambda \ \ge\ \gamma_{\max}\, N^{-(1-\delta)/2 - 1/2}\, I_N, \qquad (1.15)$$
inequality (1.14) holds with $\varepsilon_N(d)$ of the same form.

The solution $G^{\circ}_{H_N}$ of the fixed point equation is referred to as the deterministic equivalent of the operator-valued Stieltjes transform of $H_N$. It admits another description, given in Section 6.2.2, as the limit of a functional on large random matrices. One may remark that the results of Theorem 1.1 hold without any assumption on the Hermitian matrix $Y_N$ and the variance profile $\Gamma_N$. In particular, no lower bound on the entries of $\Gamma_N$ is needed to derive the concentration inequality (1.14). If we are given sequences of matrices with growing dimension $N$, this estimate also remains meaningful when $\gamma_{\max}$ grows slowly with $N$.

The proof of Theorem 1.1 is divided into three steps that are detailed in Section 5, Section 6 and Section 7. We then deduce the following methods to answer the two issues raised in Section 1.1.3 and Section 1.1.4.

Approximation of the global behavior of the e.s.d. of $H_N$ by a smooth density. The proof of Theorem 1.1, with a slightly different concentration argument, implies the following approximation for the Stieltjes transform of $H_N$ (recall that $\gamma_{\max} = \max_{i,j}\gamma_N(i,j)$). Corollary 1.2.
Let $0 < \delta < 1$. For any $\lambda \in \mathbb{C}^+$ such that
$$\Im\lambda \ \ge\ \gamma_{\max}\, N^{-(1-\delta)/2}, \qquad (1.16)$$
denoting, for $d > 0$,
$$\tilde{\varepsilon}_N(d) := \sqrt{2}\,\gamma_{\max}\sqrt{\frac{d\log N}{N}}\,\frac{1}{|\Im\lambda|^{2}} + \Big(1 + \frac{\gamma_{\max}}{\delta\,|\Im\lambda|}\Big)\frac{\gamma_{\max}^{2}}{N\,(\Im\lambda)^{2}},$$
and $g^{\circ}_{H_N}(\lambda) := \frac{1}{N}\operatorname{Tr}\,G^{\circ}_{H_N}(\lambda I_N)$, it follows that
$$\mathbb{P}\Big(\big|g_{H_N}(\lambda) - g^{\circ}_{H_N}(\lambda)\big| \ge \tilde{\varepsilon}_N(d)\Big) \le N^{-d}, \qquad (1.17)$$
where $G^{\circ}_{H_N}(\lambda I_N)$ is the unique solution of (1.12) for the scalar matrix $\Lambda = \lambda I_N$. If moreover $Y_N$ is diagonal and $\Im\lambda \ge \gamma_{\max} N^{-(1-\delta)/2 - 1/2}$, then (1.17) holds with $\tilde{\varepsilon}_N(d)$ of the same form.

The concentration inequality (1.17) means that the deterministic equivalent $g^{\circ}_{H_N}(\lambda) = \frac{1}{N}\operatorname{Tr}\,G^{\circ}_{H_N}(\lambda I_N)$ is a good approximation of $g_{H_N}(\lambda)$ when the imaginary part of $\lambda$ is allowed to decay to zero, as the dimension $N$ grows, at the rate given by the right-hand side of inequality (1.16). Hence, by the inverse Stieltjes transform formula (1.5), the map $t \mapsto -\frac{1}{\pi}\Im\big(g^{\circ}_{H_N}(t + i\eta)\big)$ is a good approximation of the e.s.d. of $H_N$ when $\eta$ is small. However, compared to existing results in the RMT literature, the lower bound (1.16) on $\Im\lambda$ is not optimal, and thus Corollary 1.2 can only be interpreted as a weak local law. More generally, the lower bound condition on $\Im\Lambda$ in Theorem 1.1 means that all diagonal entries of $\Lambda$ must have a sufficiently large imaginary part, as shown by conditions (1.13) and (1.15). Therefore, Theorem 1.1 may be interpreted as an operator-valued weak local law for the operator-valued Stieltjes transform of $H_N$. In Section 2.4, we discuss more precisely the connections between our work and existing results on local laws for random matrices with a variance profile.

Outlier localization in the case where $Y_N$ is diagonal. By its definition, the natural way to construct a deterministic equivalent of $\beta_k(\lambda)$ is to replace $(\lambda I_N - X_N - Y_N)^{-1}$ in expression (1.7) by an appropriate deterministic estimate.
When $Y_N$ is diagonal, then for any $\Lambda \in \mathcal{D}_N(\mathbb{C}^+)$, the matrix $\mathbb{E}\big[(\Lambda - X_N - Y_N)^{-1}\big]$ is also diagonal (thanks to Corollary 7.2). Using a concentration inequality, one obtains that $(\Lambda - X_N - Y_N)^{-1}$ is thus close to a diagonal matrix, so that we can replace this generalized resolvent with the deterministic equivalent $G^{\circ}_{X_N+Y_N}(\Lambda)$ of the operator-valued Stieltjes transform.

Corollary 1.3. We assume that $Y_N$ is diagonal and define the deterministic matrix-valued function
$$\beta^{\circ}_k(\lambda) = I_k - U^*_{N,k}\,G^{\circ}_{H_N}(\lambda I_N)\,U_{N,k}\,\Theta_k, \quad \text{for } \lambda \in \mathbb{C}^+. \qquad (1.18)$$
Then, for any $\lambda$ such that $\Im\lambda \ge \gamma_{\max} N^{-(1-\delta)/2 - 1/2}$, the matrix $\beta^{\circ}_k(\lambda)$ is a deterministic equivalent of $\beta_k(\lambda)$, in the sense that
$$\mathbb{P}\big(\|\beta_k(\lambda) - \beta^{\circ}_k(\lambda)\| \ge \|\Theta_k\|\,\varepsilon_N(d)\big) \le k\,N^{-d}, \qquad (1.19)$$
where, for any $d > 0$,
$$\varepsilon_N(d) := \sqrt{k}\,\gamma_{\max}\sqrt{\frac{d\log N}{N}}\,\frac{1}{|\Im\lambda|^{2}} + \Big(1 + \frac{\gamma_{\max}}{\delta\,|\Im\lambda|}\Big)\frac{\gamma_{\max}^{2}}{N\,(\Im\lambda)^{2}}.$$
The proof of Inequality (1.19) can be found in Section 7.
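To make the diagonal case concrete, the fixed point equation (1.12) with scalar argument $\Lambda = \lambda I_N$ and diagonal $Y_N$ can be iterated entrywise. The following minimal NumPy sketch (the function names are ours; plain iteration is only guaranteed to converge when $\Im\lambda$ is not too small) computes the deterministic equivalent and the smooth density approximation (1.5):

```python
import numpy as np

def G_circ(lam, Gamma, y_diag, n_iter=500):
    """Fixed-point iteration for Eq. (1.12) at Lambda = lam * I_N with diagonal Y_N.
    Since Lambda, R_N(G) and Y_N are all diagonal, the inverse is entrywise."""
    N = Gamma.shape[0]
    G = np.full(N, 1.0 / lam, dtype=complex)           # initial guess ~ 1/lambda
    for _ in range(n_iter):
        # R_N(G) has entries sum_j gamma(i,j)^2 G_j / N, cf. Eq. (1.10)
        G = 1.0 / (lam - (Gamma**2 @ G) / N - y_diag)
    return G

def density(t, eta, Gamma, y_diag):
    """Smooth approximation of the e.s.d. of H_N via Eq. (1.5)."""
    g = np.mean(G_circ(t + 1j * eta, Gamma, y_diag))   # g°(lambda) = Tr G°(lambda I)/N
    return -g.imag / np.pi
```

For a constant variance profile (all $\gamma_N(i,j) = 1$) and $Y_N = 0$, the computed $g^{\circ}(\lambda)$ reduces to the semicircular Stieltjes transform, consistent with Wigner's theorem.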
Outlier localization: the general case.
In general, if $Y_N$ is not diagonal, then the expectation of the generalized resolvent, that is $\mathbb{E}\big[(\Lambda - X_N - Y_N)^{-1}\big]$, is no longer a diagonal matrix. Hence it is no longer correct to approximate $\beta_k(\lambda)$ by replacing $(\Lambda - X_N - Y_N)^{-1}$ with $G^{\circ}_{H_N}(\lambda I_N)$ in Equation (1.7). Yet, it is proved in Section 5 that the generalized resolvent $(\Lambda - X_N - Y_N)^{-1}$ can be approximated by the following deterministic matrix:
$$(\Lambda - X_N - Y_N)^{-1} \approx \big(\Omega_{X_N,Y_N}(\Lambda) - Y_N\big)^{-1}, \qquad (1.20)$$
with $\Omega_{X_N,Y_N}(\Lambda) := \Lambda - R_N\big(\mathbb{E}\big[G_{X_N+Y_N}(\Lambda)\big]\big)$. It appears that the matrix $\Omega_{X_N,Y_N}(\Lambda)$ can be interpreted as an approximate operator-valued subordination function, and we refer to Sections 2.2 and 2.3 for further details and discussion of this heuristic. Corollary 1.4.
We define the deterministic matrix-valued function
$$\tilde{\beta}^{\circ}_k(\lambda) = I_k - U^*_{N,k}\,\big(\Omega^{\circ}_{H_N}(\lambda I_N) - Y_N\big)^{-1}\,U_{N,k}\,\Theta_k, \quad \text{for } \lambda \in \mathbb{C}^+, \qquad (1.21)$$
where $\Omega^{\circ}_{H_N}(\Lambda) := \Lambda - R_N\big(G^{\circ}_{H_N}(\Lambda)\big)$, with $G^{\circ}_{H_N}(\Lambda)$ the solution of the fixed point Equation (1.12). Then, for any $\lambda$ such that $\Im\lambda \ge \gamma_{\max} N^{-(1-\delta)/2}$, the matrix $\tilde{\beta}^{\circ}_k(\lambda)$ is a deterministic equivalent of $\beta_k(\lambda)$, in the sense that
$$\mathbb{P}\big(\|\beta_k(\lambda) - \tilde{\beta}^{\circ}_k(\lambda)\| \ge \|\Theta_k\|\,\bar{\varepsilon}_N(d)\big) \le k\,N^{-d}, \qquad (1.22)$$
where, for any $d > 0$,
$$\bar{\varepsilon}_N(d) := \sqrt{k}\,\gamma_{\max}\sqrt{\frac{d\log N}{N}}\,\frac{1}{|\Im\lambda|^{2}} + \frac{\gamma_{\max}^{2}}{N\,(\Im\lambda)^{2}} + \Big(1 + \frac{\gamma_{\max}}{\delta\,|\Im\lambda|}\Big)\frac{\gamma_{\max}^{2}}{N\,(\Im\lambda)^{2}}.$$

Remark. A key argument developed to obtain the proof of Inequality (1.22) is the derivation of an upper bound on the difference, in operator norm, between $\mathbb{E}\big[(\lambda I_N - X_N - Y_N)^{-1}\big]$ and its approximation $\big(\Omega^{\circ}_{H_N}(\lambda I_N) - Y_N\big)^{-1}$. In the general case, when $\lambda$ is held fixed, this error term decays at the rate $N^{-1}$, whereas, in the case where $Y_N$ is diagonal, the operator norm of the difference between $\mathbb{E}\big[(\lambda I_N - X_N - Y_N)^{-1}\big]$ and $G^{\circ}_{H_N}(\lambda I_N)$ decays at the rate $N^{-3/2}$. Note that an integrable decay, such as $N^{-(1+\eta)}$ with $\eta > 0$, implies the convergence of the spectrum, also called strong convergence [46]: almost surely, for $N$ large enough, the spectrum of $X_N + Y_N$ belongs to a small neighborhood of the support of the measure whose Stieltjes transform is $g^{\circ}_{H_N}$. Under mild assumptions, the strong convergence of $H_N$ is proved in [30, Corollary 2.3], with $Y_N$ possibly non-diagonal.

Remark. In general, even when $Y_N = 0$, the locations of the outliers depend on the eigenvectors of $Z_N$, and there is no longer a canonical way to associate an outlier to a specific spike, as is the case when the variance profile has constant entries. We shall discuss this specific case in the literature review proposed in Section 2, and this property will be illustrated by numerical experiments in Section 3.

In Section 2, we relate the approach followed in this paper to the existing literature in RMT and free probability on deformed models and outlier detection. Then, we report the results of numerical experiments in Section 3 to shed some light on the benefits of our approach for localizing potential outliers in information plus noise models with a variance profile. We present the strategy of the proof of Theorem 1.1 and its organization in Section 4. The details of the main steps of the proof are then gathered in Section 5, Section 6 and Section 7.
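The outlier-localization program above can be sketched numerically: scan the real axis (shifted slightly into $\mathbb{C}^+$) for approximate zeros of $\det\beta^{\circ}_k(t)$. This is only an illustrative sketch under our own naming, with an ad hoc detection threshold; in the constant-profile case with one spike it recovers the classical BBP location $\theta + \theta^{-1}$ of the outlier created by a spike $\theta > 1$.

```python
import numpy as np

def beta_circ(lam, U, theta, Gamma, y_diag, n_iter=500):
    """beta°_k(lam) = I_k - U* G°(lam I_N) U Theta_k, cf. Eq. (1.18), diagonal Y_N."""
    N = Gamma.shape[0]
    G = np.full(N, 1.0 / lam, dtype=complex)
    for _ in range(n_iter):                          # fixed-point iteration (1.12)
        G = 1.0 / (lam - (Gamma**2 @ G) / N - y_diag)
    k = len(theta)
    # (U.conj().T * G) @ U computes U* diag(G) U
    return np.eye(k) - (U.conj().T * G) @ U @ np.diag(theta)

def find_outliers(U, theta, Gamma, y_diag, t_grid, eps=1e-2, tol=0.1):
    """Candidate outliers: local minima of |det beta°_k(t + i*eps)| close to zero,
    i.e. approximate real solutions of Eq. (1.8)."""
    mags = np.abs([np.linalg.det(beta_circ(t + 1j * eps, U, theta, Gamma, y_diag))
                   for t in t_grid])
    idx = [i for i in range(1, len(mags) - 1)
           if mags[i] < mags[i - 1] and mags[i] < mags[i + 1] and mags[i] < tol]
    return t_grid[idx]
```

The grid, the shift `eps` and the threshold `tol` are tuning choices; outside the bulk the fixed-point iteration converges even for a small imaginary shift.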
Acknowledgements.
Jérémie Bigot is a member of the Institut Universitaire de France (IUF), and this work has been carried out with financial support from the IUF. Camille Male received the support of the Simons CRM Scholar-in-Residence Program. We would also like to thank László Erdős for his remarks on a previous version of this paper and for pointing out recent references.
For the sake of reproducible research, Python scripts available at the following address allow one to reproduce the numerical experiments carried out in this paper.
We first discuss the case when all entries of the variance profile equal one, that is, $X_N$ is a standard GUE matrix. The celebrated Wigner theorem [61] states that the empirical spectral distribution of $X_N$ converges to the semicircular distribution $\mu_x := \frac{1}{2\pi}\sqrt{4 - t^2}\,\mathbf{1}_{[-2,2]}(t)\,dt$. Pastur studied in [49] the global behavior of a GUE matrix with additive perturbation. Assuming that the e.s.d. $\mu_{Y_N}$ of $Y_N$ converges to a measure $\mu_y$, then almost surely the e.s.d. of $X_N + Y_N$ converges to a measure denoted $\mu_{x+y}$. In general, there is no explicit description of $\mu_{x+y}$, but the limiting Stieltjes transform $g_{x+y}$ of $X_N + Y_N$ satisfies the so-called Pastur equation, which is expressed in terms of the Stieltjes transform $g_y$ of $\mu_y$ as follows: for all $\lambda$ in $\mathbb{C}^+$, we have
$$g_{x+y}(\lambda) = g_y\big(\lambda - g_{x+y}(\lambda)\big). \qquad (2.1)$$
Note that $Y_N = 0$ gives $\mu_y = \delta_0$, and the above equation reads $g_x(\lambda) = \big(\lambda - g_x(\lambda)\big)^{-1}$.

In practical applications, we are given a matrix $Y_N$ of fixed size, and not a sequence whose e.s.d. converges to a certain distribution $\mu_y$. The deterministic equivalent method, for Pastur's equation (2.1), consists in replacing $\mu_y$ by the true e.s.d. $\mu_{Y_N}$. In this setting, there is a unique analytic map $g^{\circ}_{X_N+Y_N} : \mathbb{C}^+ \to \mathbb{C}^-$ solving the fixed-point equation
$$g(\lambda) = g_{Y_N}\big(\lambda - g(\lambda)\big), \quad \forall \lambda \in \mathbb{C}^+, \qquad (2.2)$$
among maps $g : \mathbb{C}^+ \to \mathbb{C}^-$. Note that there is a convenient abuse of notation, in the sense that $g^{\circ}_{X_N+Y_N}$ depends only on $Y_N$. We say that $g^{\circ}_{X_N+Y_N}$ is a deterministic equivalent of $g_{X_N+Y_N}$. The interest is that $g^{\circ}_{X_N+Y_N}$ is a good approximation of the Stieltjes transform $g_{x+y}$, and that it can be computed numerically thanks to the fixed-point equation (2.2).

A fundamental result of phase transition for finite rank deformed random matrices was discovered by Ben Arous, Baik and Péché [10] in the slightly different model of Gaussian matrices without symmetry.
Called the BBP transition for short, the analogous result for GUE matrices [51] states that, in the large $N$ limit, a spike $\theta$ of $Z_N$ creates an outlier in $X_N + Z_N$ only if $\theta > 1$, in which case the outlier is located at $\sigma = \theta + \theta^{-1}$. For related results, we also refer to [22, 23, 33, 34].

Remark. When all entries of the variance profile equal one, we emphasize that the existence and the position of an outlier are independent of the eigenvector of $Z_N$ associated to the spike $\theta$.

More generally, the issues discussed above have also been considered when $X_N$ is a unitarily invariant random matrix. In this context, Voiculescu's notion of asymptotic freeness [60] implies a generalization of Pastur's equation. Assume that $\mu_{X_N}$ and $\mu_{Y_N}$ have limiting e.s.d. $\mu_x$ and $\mu_y$ respectively. Recall that, denoting by $g_x$ the Stieltjes transform of $\mu_x$, the $R$-transform $R_x$ of $x$ is the analytic map satisfying
$$g_x(\lambda) = \Big(\lambda - R_x\big(g_x(\lambda)\big)\Big)^{-1}. \qquad (2.3)$$
Then $\mu_{X_N+Y_N}$ converges to a measure $\mu_{x+y} = \mu_x \boxplus \mu_y$, called the free convolution of $\mu_x$ and $\mu_y$. This limit is characterized by the so-called subordination property:
$$\forall \lambda \in \mathbb{C}^+, \quad g_{x+y}(\lambda) = g_y\Big(\lambda - R_x\big(g_{x+y}(\lambda)\big)\Big). \qquad (2.4)$$
The map $R_x$ is linear if and only if $\mu_x$ is a centered semicircular distribution. The method of deterministic equivalents can be extended to this case using Voiculescu's equation (2.4) instead of Pastur's. The difficulty in general is to compute the $R$-transform $R_x$, or to replace it with a good approximation.

For finite rank deformations, an important discovery was made by Capitaine [21]. We recall the heuristic presented in [12] and refer to this paper for the mathematical arguments, without defining the notions of free probability. In the context of Voiculescu's problem, the limits $x$ of $X_N$ and $y$ of $Y_N$ are modeled in the von Neumann algebra generated by two self-adjoint variables $x$ and $y$ with distributions $\mu_x$ and $\mu_y$ respectively. Let $E_y$ be the projection onto the von Neumann algebra generated by $y$.
Then Biane proved [15] that there exists an analytic map $\omega_{x,y}$, defined outside the spectrum of $x + y$ and called the subordination function, such that
$$E_y\big[(\lambda - x - y)^{-1}\big] = \big(\omega_{x,y}(\lambda) - y\big)^{-1}, \quad \forall \lambda \in \mathbb{C}^+. \qquad (2.5)$$
The above equality means that the projection of the resolvent of $x + y$ equals the resolvent of $y$ evaluated at the subordination function. Taking the trace in this identity yields the relation for Stieltjes transforms $g_{x+y}(\lambda) = g_y\big(\omega_{x,y}(\lambda)\big)$, for all $\lambda \in \mathbb{C}^+$.

The reader not familiar with the language of free probability can still translate this result into a heuristic for unitarily invariant matrices by replacing the conditional expectation $E_y$ by the classical expectation $\mathbb{E}$ and the variables by the matrices. One method to introduce an approximate subordination function consists in setting
$$\omega_{X_N,Y_N}(\lambda) := \Big(\mathbb{E}\big[(\lambda - X_N - Y_N)^{-1}\big]\Big)^{-1} + Y_N,$$
so that we have $\mathbb{E}\big[(\lambda - X_N - Y_N)^{-1}\big] = \big(\omega_{X_N,Y_N}(\lambda) - Y_N\big)^{-1}$, in analogy with (2.5).

The matrix $\omega_{X_N,Y_N}(\lambda)$ is not a scalar matrix, but this property holds for the compression involved in the outlier detection problem, in the following sense. If $X_N$ is unitarily invariant, then $\mathbb{E}\big[(\lambda - X_N - Y_N)^{-1}\big]$ belongs to the unital algebra generated by $Y_N$. Assume now that $Z_N = U_{N,k}\,\Theta_k\,U^*_{N,k}$ has eigenvectors orthogonal to those of $Y_N$. This property ensures that the spikes are eigenvalues of $Y_N + Z_N$. It implies that, for any matrix $A$ in the algebra generated by $Y_N$, the compression $U^*_{N,k} A\, U_{N,k}$ is a scalar matrix. Hence, in particular,
$$U^*_{N,k}\,\omega_{X_N,Y_N}(\lambda)\,U_{N,k} = \Big(U^*_{N,k}\,\mathbb{E}\big[(\lambda - X_N - Y_N)^{-1}\big]\,U_{N,k}\Big)^{-1}$$
is a scalar matrix, and it is actually a good approximation of $\omega_{x,y}(\lambda)\,I_k$.

Concentration properties of unitarily invariant matrices imply that the function $\beta_k : \lambda \mapsto I_k - U^*_{N,k}(\lambda - X_N - Y_N)^{-1} U_{N,k}\,\Theta_k$ defined in (1.7) is close to its expectation, and so $\det(\beta_k(\lambda))$ is well approximated by
$$\prod_{i=1}^{k}\Big(1 - \frac{\theta_i}{\omega_{X_N,Y_N}(\lambda)}\Big),$$
where $\theta_1,\dots,\theta_k$ denote the eigenvalues of $Z_N$. Hence, we retrieve Capitaine's fundamental relation [21] between spikes and outliers: in the large $N$ limit, the locations of the outliers belong to the pre-image by $\omega_{x,y}$ of the spikes.

Remark. When $X_N$ is a unitarily invariant random matrix and the eigenvectors of $Z_N$ are orthogonal to those of $Y_N$, the locations of the outliers of $X_N + Y_N + Z_N$ depend individually on the eigenvalues of $Z_N$. They do not depend on its eigenvectors, and the outliers generated by a spike $\theta$ do not depend on the other spikes.

The asymptotics of GUE matrices with a variance profile were characterized by Shlyakhtenko in [56], in the multi-matrix setting of operator-valued free probability over the diagonal. Assuming the variance profiles are of the form $\gamma_N(i,j) = \gamma\big(\frac{i}{N}, \frac{j}{N}\big)$, for some bounded real-valued function $\gamma : [0,1]^2 \to \mathbb{R}_+$, Shlyakhtenko proved that independent GUE matrices with variance profiles are asymptotically free with amalgamation over the diagonal. In particular, this implies that the e.s.d. of $X_N$ converges almost surely, and that the limit is characterized by an integral operator with kernel $\gamma$. This approach is in the lineage of previous works on band matrices, see [17, 41, 48], for which it was observed that, to derive the limiting e.s.d. of a random matrix, one can derive a system of linear equations for the diagonal of the resolvent $(\lambda I_N - X_N)^{-1}$ of the matrix. An interest of Shlyakhtenko's approach is the use of the notion of operator-valued free probability, which (in particular) nicely generalizes Pastur's equation.

More formally, Shlyakhtenko considers in [56] the operator-valued Stieltjes transform $G_{X_N}$ of $X_N$, which is the map between sets of diagonal matrices defined by (1.9). For any bounded function $\Lambda : [0,1] \to \mathbb{C}^+$, let $\Lambda_N \in \mathcal{D}_N(\mathbb{C}^+)$ be defined by $\Lambda_N(i,i) = \Lambda\big(\frac{i}{N}\big)$. Then, for any such $\Lambda$, the diagonal matrix $G_{X_N}(\Lambda_N)$, seen as a piecewise constant function on $[0,1]$, converges to a function $G_x(\Lambda)$ in $L^\infty\big([0,1], \mathbb{C}^-\big)$.
The functional map $G_x$ is characterized by the identity
$$G_x(\Lambda) = \Big(\Lambda - R_x\big(G_x(\Lambda)\big)\Big)^{-1}, \qquad (2.6)$$
where, for any $\Lambda$ in $L^\infty\big([0,1], \mathbb{C}\big)$,
$$R_x(\Lambda) := \int_0^1 \gamma^2(\cdot, y)\,\Lambda(y)\,dy. \qquad (2.7)$$
Since $R_x$ is linear, we say that the abstract limit $x$ of $X_N$ (which lives in a von Neumann algebra) is a semicircular variable with amalgamation over the diagonal. Note that the mapping $R_N$ defined by (1.10) is a discretization of the functional map $R_x$.

Recently, the traffic method led to the observation that asymptotic freeness over the diagonal is a generic rule for large permutation invariant random matrices with a variance profile [7]. We mention briefly a consequence of this result, referring to [45] for definitions. Let $X_N = X^0_N \circ (\gamma_{i,j})_{i,j}$ be the entrywise product of a permutation invariant random matrix $X^0_N$ that converges in traffic distribution and of a deterministic matrix $(\gamma_{i,j})_{i,j=1,\dots,N}$, and let $Y_N$ be a matrix bounded in operator norm that converges in traffic distribution. Then, under the additional assumption that $Y_N$ is permutation invariant, $X_N$ and $Y_N$ are asymptotically free over the diagonal. They converge to elements $x$ and $y$ of a von Neumann algebra endowed with a conditional expectation $\Delta$, and their operator-valued Stieltjes transforms satisfy
$$G_{x+y}(\Lambda) = G_y\Big(\Lambda - R_x\big(G_{x+y}(\Lambda)\big)\Big), \qquad (2.8)$$
where $R_x$ is such that the above equation is valid for $y = 0$. For more details about the operator-valued subordination property, we refer to [59].

A motivation of our work is to specify this statement, in a comprehensive way, when $X_N$ is a GUE matrix with a variance profile, and to give an estimate for its operator-valued Stieltjes transform. We then make explicit the associated deterministic equivalent and show how to adapt Capitaine's approach for outlier detection.

In the RMT literature, Hermitian random matrices with centered but non-identically distributed entries are referred to as generalized Wigner matrices, for which many asymptotic properties are now well understood in a precise sense. For example, under the assumption that the variance profile is bi-stochastic (that is, its row and column sums equal one), bulk universality at optimal spectral resolution for local spectral statistics has been established in [31], and the local statistics are shown to converge to those of the GUE. The case of a Wigner matrix with a variance profile that is not necessarily bi-stochastic has been studied in [2], and non-Hermitian random matrices with a variance profile have been considered in [25, 38, 39] using the notion of deterministic equivalent.

Under mild assumptions on the variance profile $\Gamma_N$, the e.s.d. of a generalized Wigner matrix converges to a limiting spectral measure for which there is generally no explicit formula. Currently, a classical method to approximate an asymptotic spectral measure is to solve a nonlinear system of deterministic equations referred to as the Dyson equation [2-5]. For each fixed dimension $N$, the solution of this equation yields a deterministic equivalent of the resolvent of $H_N = X_N + Y_N$. These equations are also equivalent to the operator-valued equations of the subordination functions in which the parameter is scalar, instead of being functional. For example, the vector Dyson equation studied in detail in [2] corresponds to Equation (1.12) with $\Lambda = \lambda I_N$ and $Y_N$ a diagonal matrix.
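This correspondence is easy to check numerically: for a diagonal $Y_N$, the entrywise (vector Dyson) iteration and a full-matrix iteration of the same fixed point agree on the diagonal of the solution. A minimal NumPy sketch (the function names are ours; plain iteration assumes $\Im\lambda$ bounded away from zero):

```python
import numpy as np

def vector_dyson(lam, Gamma, y_diag, n_iter=400):
    """Entrywise fixed point: v_i = 1 / (lam - sum_j gamma(i,j)^2 v_j / N - y_i)."""
    N = Gamma.shape[0]
    v = np.full(N, 1.0 / lam, dtype=complex)
    for _ in range(n_iter):
        v = 1.0 / (lam - (Gamma**2 @ v) / N - y_diag)
    return v

def matrix_dyson(lam, Gamma, Y, n_iter=400):
    """Full-matrix form of the same fixed point (matrix Dyson equation):
    A = (lam I - R_N(Delta(A)) - Y)^{-1}; also applies to non-diagonal Hermitian Y."""
    N = Gamma.shape[0]
    A = np.eye(N, dtype=complex) / lam
    for _ in range(n_iter):
        S = np.diag((Gamma**2 @ np.diag(A)) / N)   # S_N(A) = R_N(Delta(A))
        A = np.linalg.inv(lam * np.eye(N) - S - Y)
    return A
```

For diagonal $Y_N$ both iterations produce the same diagonal solution; the matrix version is the one needed when $Y_N$ is not diagonal.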
The matrix Dyson equation, introduced in [3] to study Hermitian random matrices with correlated entries and nonzero expectation, is the following nonlinear matrix equation, formulated for an unknown matrix-valued function $A_N : \mathbb{C}^+ \to \mathbb{C}^{N\times N}$:
$$I_N - \big(\lambda I_N - S_N(A_N(\lambda)) - Y_N\big)\,A_N(\lambda) = 0, \quad \lambda \in \mathbb{C}^+, \qquad (2.9)$$
where $S_N$ is the mapping (see Equation (1.3) in [3])
$$S_N(A) = \mathbb{E}\big[X_N A X_N\big] = R_N\big(\Delta(A)\big), \quad \text{for } A \in \mathbb{C}^{N\times N}. \qquad (2.10)$$
Thus, provided that $A_N(\lambda)$ is invertible, Equation (2.9) may be written as
$$A_N(\lambda) = \big(\lambda I_N - R_N(\Delta(A_N(\lambda))) - Y_N\big)^{-1}.$$
Hence, applying the operator $\Delta$ to both sides of the above equality yields the fixed point Equation (1.12) with scalar parameter $\Lambda = \lambda I_N$.

The existence and stability of the solutions of the vector and matrix Dyson equations are studied in detail in [2] and [3] respectively. These deterministic vector- or matrix-valued functions (parametrized by $\lambda \in \mathbb{C}^+$) are used to prove local laws for the resolvent of $H_N$. In RMT, the derivation of local laws refers to results controlling the difference between the Stieltjes transform $g_{H_N}(\lambda)$ (or the resolvent $(\lambda I_N - H_N)^{-1}$) and a deterministic function when $\Im\lambda$ is allowed to decay to zero at a rate depending on $N$. Deriving a local law for the Stieltjes transform is generally a delicate problem that is more involved than proving a global law, which refers to the convergence (e.g. in probability) of $g_{H_N}(\lambda)$ for a fixed value of $\lambda \in \mathbb{C}^+$. The notion of local semicircular law for Wigner matrices, which constituted the central open question known as the Wigner-Dyson-Mehta conjecture, was solved in 2011 independently by [28] and [57].
For detailed lecture notes on this notion, we refer to [13]. In the case where $Y_N = 0$ and $H_N = X_N$ is a standard Wigner matrix satisfying mild assumptions (with a constant variance profile), the optimal local law for the Stieltjes transform and the resolvent of $H_N = X_N$ reads as the following concentration inequalities [13, Theorem 2.6]: for a fixed $\tau > 0$, define the complex domain
$$\mathcal{C}_N(\tau) = \big\{ \lambda \in \mathbb{C}^* : |\lambda| \le \tau^{-1}, \; N^{-1+\tau} \le \Im\lambda \big\}.$$
Then, denoting by $g_{sc}$ the Stieltjes transform of the semicircular distribution $\mu_{sc}$, for any $\varepsilon > 0$ and $D > 0$, one has that
$$\mathbb{P}\Big( |g_{H_N}(\lambda) - g_{sc}(\lambda)| \ge N^\varepsilon \psi^{(1)}_N(\lambda) \Big) \le N^{-D}, \quad \text{with } \psi^{(1)}_N(\lambda) = \frac{1}{N \Im\lambda},$$
and, uniformly for $i, j = 1, \ldots, N$,
$$\mathbb{P}\Big( |(\lambda I_N - H_N)^{-1}(i,j) - g_{sc}(\lambda)\delta_{ij}| \ge N^\varepsilon \psi^{(2)}_N(\lambda) \Big) \le N^{-D}, \quad \text{with } \psi^{(2)}_N(\lambda) = \sqrt{\frac{\Im(g_{sc}(\lambda))}{N \Im\lambda}} + \frac{1}{N \Im\lambda},$$
for all $\lambda \in \mathcal{C}_N(\tau)$ and all sufficiently large $N \ge N_0(\varepsilon, D)$. The above deviation inequalities are called optimal local laws, as the rates of decay of the error terms $\psi^{(1)}_N(\lambda)$ and $\psi^{(2)}_N(\lambda)$, when $\Im\lambda$ tends to zero slower than $N^{-1}$, are known to be optimal for $t = \Re(\lambda)$ in $[-2, 2]$, the support of $\mu_{sc}$ (see e.g. [13, Section 2]). Note that when $t$ is outside $[-2, 2]$, faster error rates (as $\Im\lambda \to 0$) may be obtained (see e.g. [13, Theorem 10.3]).

The derivation of optimal local laws for Hermitian random matrices with an arbitrary variance profile, using the deterministic solution of a Dyson equation, has been largely investigated by László Erdös and his collaborators over the last decade; for a recent overview we refer to Erdös's lecture notes [29]. In [2, 3], the authors make several assumptions to derive local laws; in particular, they suppose that the entries of the variance profile $\Gamma_N$ are bounded away from zero. In [30] this assumption is relaxed: a weak local law is obtained for the model $H_N = X_N + Y_N$ without a lower bound assumption on the variance profile. The generalized Wigner matrix $X_N$ can possibly have correlated entries that are not necessarily Gaussian random variables. More precisely, from [30, Theorem 2.1] (see also the preliminary result [6, Lemma B.1]) the following deviation inequalities hold: for a fixed $\delta > 0$, define
$$\mathcal{C}^{out}_N(\delta) = \big\{ \lambda \in \mathbb{C} : |\lambda| \le N^C, \; \mathrm{dist}(\lambda, \mathrm{supp}\,\mu^\circ) \ge N^{-\delta} \big\}$$
for some (large) fixed constant $C$, where $\mu^\circ$ is the probability measure associated to the Stieltjes transform $\frac{1}{N}\mathrm{Tr}\big[A^\circ_N(\lambda)\big]$ and $A^\circ_N$ is the solution of the matrix Dyson equation (2.9) (see [30, Section 2]). Then, from [30, Theorem 2.1] it follows that, for any $\varepsilon > 0$, $\delta > 0$ and $D > 0$,
$$\mathbb{P}\bigg( |\langle u, \big((\lambda I_N - H_N)^{-1} - A^\circ_N(\lambda)\big) v \rangle| \ge \frac{N^\varepsilon (1 + |\lambda|)}{\sqrt{N}} \text{ in } \mathcal{C}^{out}_N(\delta) \bigg) \le C N^{-D}, \qquad (2.11)$$
for all unit vectors $u, v \in \mathbb{C}^N$, and
$$\mathbb{P}\bigg( \Big|g_{H_N}(\lambda) - \frac{1}{N}\mathrm{Tr}\big[A^\circ_N(\lambda)\big]\Big| \ge \frac{N^\varepsilon (1 + |\lambda|)}{N} \text{ in } \mathcal{C}^{out}_N(\delta) \bigg) \le C N^{-D}, \qquad (2.12)$$
where $C$ is a constant depending on $\varepsilon$, $D$ and other quantities not depending on $N$. Inequalities (2.11) and (2.12) hold under general assumptions on the distributions of the entries of $X_N$ and the additive deformation $Y_N$ (we refer to [30, Section 2.2] for further details), but without assuming that the entries of $\Gamma_N$ are bounded from below.
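The $1/(N\,\Im\lambda)$ error scale in these local laws is easy to observe numerically. The short Monte Carlo check below is our own illustration (the GUE sampling convention and the chosen spectral parameter are our assumptions): it compares the empirical Stieltjes transform of one sampled GUE matrix with $g_{sc}$ at a point of small imaginary part inside the bulk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gue(N, rng):
    # Hermitian matrix with E|H_ij|^2 = 1/N (standard GUE normalization)
    A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (A + A.conj().T) / np.sqrt(2 * N)

N = 500
lam = 0.3 + 0.05j                        # bulk point, small imaginary part
eigs = np.linalg.eigvalsh(sample_gue(N, rng))
g_N = np.mean(1.0 / (lam - eigs))        # empirical Stieltjes transform (C+ -> C-)
g_sc = (lam - np.sqrt(lam**2 - 4)) / 2   # semicircle transform, same convention
err = abs(g_N - g_sc)
print(f"|g_N - g_sc| = {err:.4f}")
```

For this $\lambda$, the heuristic fluctuation scale $1/(N\,\Im\lambda) = 0.04$ already dominates the observed error, in line with the optimal local law.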
In [30] this result is referred to as a local law outside the support of $\mu^\circ$, as the error terms in Inequalities (2.11) and (2.12) do not decay at an optimal rate when $t = \Re(\lambda)$ belongs to $\mathrm{supp}\,\mu^\circ$. To obtain an optimal local law when $t \in \mathrm{supp}\,\mu^\circ$ (that is, in the bulk of the spectrum), a flatness condition on the variance profile is added in [30, Section 2.2], which implies that the entries of $\Gamma_N$ are bounded from below by a positive constant. Under this supplementary condition, the following deviation inequalities hold: for fixed $\delta, \gamma > 0$, define
$$\mathcal{C}^{out}_N(\delta, \gamma) = \big\{ \lambda = t + i\eta : |\lambda| \le N^C, \; |\eta| \ge N^{-1+\gamma}, \; \rho(t) + \mathrm{dist}(t, \mathrm{supp}\,\mu^\circ) \ge N^{-\delta} \big\},$$
where $\rho$ is the density of $\mu^\circ$. Then, from [30, Theorem 2.2] it follows that, for any $\gamma, \varepsilon > 0$, $\delta > 0$ and $D > 0$,
$$\mathbb{P}\bigg( |\langle u, \big((\lambda I_N - H_N)^{-1} - A^\circ_N(\lambda)\big) v \rangle| \ge \frac{N^\varepsilon}{\sqrt{N \eta}} \text{ in } \mathcal{C}^{out}_N(\delta, \gamma) \bigg) \le C N^{-D}, \qquad (2.13)$$
for all unit vectors $u, v \in \mathbb{C}^N$, and
$$\mathbb{P}\bigg( \Big|g_{H_N}(\lambda) - \frac{1}{N}\mathrm{Tr}\big[A^\circ_N(\lambda)\big]\Big| \ge \frac{N^\varepsilon}{N \eta} \text{ in } \mathcal{C}^{out}_N(\delta, \gamma) \bigg) \le C N^{-D}. \qquad (2.14)$$
The error terms in Inequalities (2.13) and (2.14) decay at the optimal rate when $t \in \mathrm{supp}\,\mu^\circ$ and $\eta = \Im(\lambda)$ becomes small, and this result is thus an optimal local law within the bulk of the spectrum.

Comparing Inequalities (2.11)-(2.14) with the results stated in Theorem 1.1 (when $\Lambda = \lambda I_N$ is scalar) and Corollary 1.2, it follows that we have only obtained a weak local law on the convergence of the operator-valued Stieltjes transform $G_{H_N}$ and the scalar Stieltjes transform $g_{H_N}$ to their deterministic equivalents, since the decay of the error terms in Inequalities (1.14) and (1.17) as $\Im(\lambda) \to 0$ is not optimal. When $\Lambda = \lambda I_N$ is scalar, the results of Theorem 1.1 are clearly sub-optimal, as better concentration bounds already exist in the literature that also go beyond the Gaussian assumption for the entries of $X_N$. Nevertheless, to the best of our knowledge, obtaining a weak local law for the operator-valued Stieltjes transform $G_{H_N}$ is a novel result.
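The deformed model with a variance profile, to which these laws apply, can also be simulated directly and compared against its deterministic equivalent. The sketch below is our own small experiment (the two-block profile, the diagonal deformation and all parameter values are our choices): it samples one realization of $H_N = X_N + Y_N$ and checks that the empirical Stieltjes transform agrees with the solution of the vector Dyson equation.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400

# two-block variance profile (entries gamma(i, j)) and a diagonal deformation
gamma = np.ones((N, N))
gamma[: N // 2, N // 2:] = 2.0
gamma[N // 2:, : N // 2] = 2.0
y = np.concatenate([np.full(N // 2, -1.0), np.full(N // 2, 1.0)])

# H_N = X_N + Y_N, with X_N a GUE matrix with variance profile gamma
A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
X = gamma * (A + A.conj().T) / np.sqrt(2 * N)
H = X + np.diag(y)

lam = 0.5 + 0.5j
g_emp = np.mean(1.0 / (lam - np.linalg.eigvalsh(H)))

# deterministic equivalent via the (damped) vector Dyson iteration
S = gamma**2 / N
m = np.zeros(N, dtype=complex)
for _ in range(2000):
    m = 0.5 * m + 0.5 / (lam - S @ m - y)
print(f"|g_emp - g_det| = {abs(g_emp - m.mean()):.4f}")
```

With $N = 400$ the discrepancy is already far below the bulk scale, illustrating the deterministic-equivalent approximation at a fixed, moderately small $\Im\lambda$.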
Hence, it would be interesting to obtain a generalization of the optimal local laws of [2, 3, 30] in an operator-valued sense, that is, for $(\Lambda - H_N)^{-1}$ when $\Lambda$ is not a scalar matrix, as we consider in the current paper.

Finally, it should be mentioned that the results from Corollary 1.3 and Corollary 1.4 on the convergence of the deterministic equivalents $\beta^\circ_k$ and $\tilde\beta^\circ_k$ to the matrix-valued function $\beta_k$ are novel. Indeed, we are not aware of other works dealing with the issue of localizing outliers in the singular values distribution of the matrix $H_N = X_N + Y_N + Z_N$ when $X_N$ has an arbitrary variance profile. The question of the optimality of the deviation inequalities (1.19) and (1.22) is left open.

In this section, we report results of numerical experiments on the localization of outliers in the rectangular information plus noise model (1.3), for various variance profiles and additive perturbations that may possess spikes generating outliers. We introduce the model
$$H_{N,M} = X_{N,M} + Y_{N,M} + Z_{N,M}, \qquad (3.1)$$
where $Z_{N,M} = U_{N,k} \Theta_k V^*_{M,k}$ is a low-rank $N \times M$ matrix with $k$ spikes that are equal to the singular values of $Z_{N,M}$; here, $\Theta_k$ is a $k \times k$ diagonal matrix with positive diagonal entries, and $U_{N,k}$ (resp. $V_{M,k}$) is the matrix whose columns are the left (resp. right) singular vectors of $Z_{N,M}$.

The question of locating potential outliers in model (3.1) can be answered using the approach developed in this paper for the Hermitian setting. To this end, we use the principle of
Hermitian dilation [50], which embeds any rectangular matrix $A_{N,M}$ (with complex entries) of size $N \times M$ into a larger Hermitian block matrix by defining
$$\mathcal{D}(A_{N,M}) = \begin{bmatrix} 0 & A_{N,M} \\ A^*_{N,M} & 0 \end{bmatrix}. \qquad (3.2)$$
Note that if one denotes by $\sigma_1 \ge \ldots \ge \sigma_r > 0$ the nonzero singular values of $A_{N,M}$, assumed to be of rank $r$, then the spectrum of the Hermitian matrix $\mathcal{D}(A_{N,M})$ is $\{-\sigma_1 \le \ldots \le -\sigma_r \le 0 \le \sigma_r \le \ldots \le \sigma_1\}$, where the eigenvalue $0$ has multiplicity $M + N - 2r$. By applying the Hermitian dilation to Equation (1.3), we obtain that
$$\mathcal{D}(H_{N,M}) = \mathcal{D}(X_{N,M}) + \mathcal{D}(Y_{N,M}) + \mathcal{D}(Z_{N,M}),$$
which is a finite rank deformation of the model $\mathcal{D}(X_{N,M}) + \mathcal{D}(Y_{N,M})$. For $i \le N$ and $j \ge N + 1$, the entry $(i,j)$ of $\mathcal{D}(X_{N,M})$ is a centered complex Gaussian variable with variance $\frac{\gamma_{N,M}(i, j-N)}{M}$, satisfying $\frac{x_{ij}}{\sqrt M} = \sqrt{\frac{N+M}{M}} \big( \frac{x_{ij}}{\sqrt{N+M}} \big)$. Therefore, $\mathcal{D}(X_{N,M})$ is a GUE matrix of size $N + M$ with variance profile $\frac{N+M}{M}\mathcal{D}(\Gamma_{N,M})$, where the zero entries of $\mathcal{D}(X_{N,M})$ are considered as centered Gaussian variables with variance equal to zero. The additive perturbation $\mathcal{D}(Z_{N,M})$ is a matrix of rank $2k$ that can be written as
$$\mathcal{D}(Z_{N,M}) = W_{N+M,2k} \begin{bmatrix} \Theta_k & 0 \\ 0 & -\Theta_k \end{bmatrix} W^*_{N+M,2k}, \quad \text{where} \quad W_{N+M,2k} = \frac{1}{\sqrt 2}\begin{bmatrix} U_{N,k} & U_{N,k} \\ V_{M,k} & -V_{M,k} \end{bmatrix}$$
is an $(N+M) \times 2k$ matrix whose columns are orthonormal vectors. After introducing the Hermitian dilation of model (1.3), one may thus consider the function $G^\circ_{\mathcal{D}(H)} : \mathbb{D}_{N+M}(\mathbb{C})^+ \to \mathbb{D}_{N+M}(\mathbb{C})^-$, analytic in each variable, that is the unique solution of the fixed point equation
$$G^\circ_{\mathcal{D}(H)}(\Lambda) = \Delta\Big[ \big( \Lambda - R_{N+M}\big(G^\circ_{\mathcal{D}(H)}(\Lambda)\big) - \mathcal{D}(Y_{N,M}) \big)^{-1} \Big], \qquad (3.3)$$
which holds for any $\Lambda \in \mathbb{D}_{N+M}(\mathbb{C})^+$. Following Equation (1.11), $R_{N+M}$ is the map defined on $\mathbb{D}_{N+M}(\mathbb{C})^+$ by
$$R_{N+M}(\Lambda) = \mathrm{deg}_{N+M}\Big( \frac{1}{N+M} \cdot \frac{N+M}{M}\, \mathcal{D}(\Gamma)\, \Lambda \Big) = \mathrm{deg}\Big( \frac{\mathcal{D}(\Gamma)}{M}\, \Lambda \Big). \qquad (3.4)$$
Hence, after Hermitian dilation, one may follow the approach described in Section 1.2 to approximate the global behavior of the empirical distribution of the singular values of $H_{N,M}$ and to localize potential outliers generated by the spikes of $Z_{N,M}$.

Remark. In the RMT literature, the Hermitian dilation (3.2) of a rectangular random matrix is classically referred to as Girko's Hermitization trick [35, 36]. As our study of a GUE matrix with a variance profile allows the setting where large blocks of $\mathcal{D}(\Gamma_{N,M})$ are equal to zero, treating the information plus noise model via Girko's Hermitization is an immediate application of our results in the Hermitian case. As explained e.g. in [4, 5], this Hermitization trick may also be used beyond the Gaussian case. However, a direct application of the results in [2, 3] on the vector and matrix Dyson equations for the study of generalized Wigner matrices is not possible, as a key assumption in these papers is that the entries of the variance profile must be bounded from below. Hence, beyond the Gaussian case, the use of Girko's Hermitization requires the specific treatment of [4, 5] and the introduction of a system of quadratic vector equations, extensively studied in [1], that relates the resolvent of $X_{N,M} X^*_{N,M}$ to the resolvent of $\mathcal{D}(X_{N,M})$ via the equality $(\lambda^2 I_N - X_N X^*_N)^{-1} = A_{1,1}(\lambda)/\lambda$, where $A_{1,1}(\lambda)$ denotes the upper left $N \times N$ block of $(\lambda I_{N+M} - \mathcal{D}(X_{N,M}))^{-1}$.

Let us first consider the rectangular model (3.1) under the simplified setting $Y_{N,M} = 0$ and $k =$
1, that is,
$$H_{N,M} = X_{N,M} + Z_{N,M}, \quad \text{where} \quad Z_{N,M} = \theta\, u_N v^*_M, \qquad (3.5)$$
where $u_N \in \mathbb{R}^N$, $v_M \in \mathbb{R}^M$ are unit vectors, $\theta > 0$, and $X_{N,M}$ is a rectangular Gaussian matrix with a variance profile $\Gamma_{N,M} = \big(\gamma_{N,M}(i,j)\big)_{i,j}$. We assume that the variance profile satisfies
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\gamma^2_{N,M}(i,j)}{M} = 1, \qquad (3.6)$$
which ensures the normalization condition $\mathbb{E}\big[ \frac{1}{N} \mathrm{Tr}\, X_{N,M} X^*_{N,M} \big] = 1$. In all the numerical experiments we took $N = 360$ and $M = 400$. To locate potential outliers, we search for the real values $\lambda$ such that
$$\det\big( \beta^\circ(\lambda) \big) = 0, \qquad (3.7)$$
where
$$\beta^\circ(\lambda) = I_2 - \frac{1}{2}\begin{bmatrix} u^*_N & v^*_M \\ u^*_N & -v^*_M \end{bmatrix} G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M}) \begin{bmatrix} u_N & u_N \\ v_M & -v_M \end{bmatrix} \begin{bmatrix} \theta & 0 \\ 0 & -\theta \end{bmatrix}, \qquad (3.8)$$
and $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M})$ is the solution of the fixed point equation (3.3), with $\mathcal{D}(Y_{N,M}) = 0$.

Remark. We choose to directly search for potential outliers by minimizing $\det \beta^\circ$ over the set of reals $\lambda \in \mathbb{R}$ rather than over the set of complex values $(\lambda = t + i\eta)_{t \in \mathbb{R}}$ with a small and fixed value $\eta > 0$. The solution $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M})$ is easily obtained by the following iterative procedure
$$G^\circ_{n+1}(\lambda) = \Delta\Big[ \big( \lambda I_{N+M} - R_{N+M}\big(G^\circ_n(\lambda)\big) \big)^{-1} \Big], \qquad (3.9)$$
which is stopped for $n$ sufficiently large or when the difference between two successive iterations is sufficiently small. Since $\lambda I_{N+M} - R_{N+M}(G^\circ_n(\lambda))$ is a diagonal matrix, the fixed point iteration corresponds to the numerical evaluation of the vector Dyson equation [2], and it simplifies to the vector equation
$$G^\circ_{n+1}(\lambda) = \frac{1}{\lambda \mathbb{1}_{N+M} - \frac{\mathcal{D}(\Gamma_{N,M})}{M}\, G^\circ_n(\lambda)}. \qquad (3.10)$$
In practice, to find a potential solution to equation (3.7), we use a numerical optimization procedure to obtain a minimizer $\lambda = \bar\lambda(\theta)$ of the function $\lambda \mapsto \det\big(\beta^\circ(\lambda)\big)$ over $\mathbb{R}^+$. To this end, we have used the Nelder-Mead simplex method (the Python analogue of MATLAB's fminsearch command). Then, if the value $\det\big(\beta^\circ(\bar\lambda(\theta))\big)$ is sufficiently close to zero, we conclude that $\bar\lambda(\theta)$ is an outlier.

Finally, a smooth approximation of the singular values distribution (s.v.d.) of $H_{N,M} = X_{N,M}$ at location $t \in \mathbb{R}$ may be obtained from the inverse Stieltjes transform formula (1.5). Letting $g^\circ_n(\lambda) = \frac{1}{N+M}\mathrm{Tr}\, G^\circ_n(\lambda)$, we define $f_n(t) = -\frac{1}{\pi} \Im\big( g^\circ_n(t + i\eta) \big)$ for a small $\eta > 0$. Recall that $g^\circ_n(\lambda)$ approximates the Stieltjes transform of the spectral distribution of the dilation matrix $\mathcal{D}(H_{N,M})$, whose eigenvalues are $0$ (with multiplicity $M - N$) and $\{-\sigma_1, \ldots, -\sigma_N, \sigma_N, \ldots, \sigma_1\}$, where $\sigma_N \le \ldots \le \sigma_1$ are the singular values of $H_{N,M}$, and that the inverse Stieltjes transform (1.5) amounts to approximating a measure by its convolution with the Cauchy kernel $t \mapsto \frac{1}{\pi}\frac{\eta}{t^2 + \eta^2}$. Therefore, an approximation of the s.v.d. of $H_{N,M}$ is given by the density
$$\tilde f_n(t) = \frac{2}{1 - \frac{M-N}{M+N}} \bigg( f_n(t) - \frac{M-N}{M+N} \cdot \frac{1}{\pi}\frac{\eta}{t^2 + \eta^2} \bigg), \quad t \ge 0. \qquad (3.11)$$

We propose to validate this way of localizing outliers by first considering the standard case where the variance profile
$\Gamma_{N,M}$ has constant entries equal to one. This setting corresponds to the so-called Gaussian spike population model, for which the asymptotic behavior (as $\min(N,M) \to +\infty$) of the singular values of $H_{N,M}$ is well understood [14, 27, 44, 55] when the rank $k$ of the additive deformation $Z_{N,M}$ is held fixed. In the asymptotic framework where the sequence $M = M_N \ge N$ is such that $\lim_{N \to +\infty} \frac{N}{M} = c$ with $0 < c \le 1$, it is well known [9] that the empirical distribution of the singular values of $X_{N,M}$ converges, as $N \to +\infty$, to the quarter circle distribution if $c = 1$, or to its generalized version if $c < 1$, called the Marchenko-Pastur distribution, which is supported on the compact interval $[c_-, c_+]$ with $c_\pm = 1 \pm \sqrt{c}$, where $c_+$ is the so-called (right) bulk edge. Now, if one denotes by $\sigma_1 \ge \ldots \ge \sigma_N > 0$ (with $N \le M$) the singular values of $H_{N,M}$, then the following result holds (see e.g. Theorem 2.8 in [14] and Proposition 9 in [55]). Almost surely, one has that
$$\lim_{N \to +\infty} \sigma_1 = \begin{cases} \lambda_c(\theta) & \text{if } \theta > c^{1/4}, \\ c_+ & \text{otherwise}, \end{cases} \qquad \text{and} \qquad \lim_{N \to +\infty} \sigma_N = c_-,$$
where
$$\lambda_c(\theta) = \frac{\sqrt{(1 + \theta^2)(c + \theta^2)}}{\theta} \quad \text{for any } \theta > c^{1/4}. \qquad (3.12)$$
The interpretation of this result, called the BBP transition after [10], is as follows. If the spike $\theta$ in model (3.5) is larger than $c^{1/4}$, then an outlier exists and it is asymptotically located at $\lambda_c(\theta) > c_+$. To the contrary, if $\theta \le c^{1/4}$, then there is no outlier, as the largest singular value of $H_{N,M}$ converges to the bulk edge $c_+$.

For a constant variance profile, an explicit solution of the equation $\det\big(\beta^\circ(\lambda)\big) = 0$ is given in the following lemma.

Lemma 3.3. Assume that $\Gamma_{N,M}(i,j) = 1$ for any $i, j$. Then, Equation (3.7) admits a solution given by
$$\lambda_{N/M}(\theta) = \frac{\sqrt{(1 + \theta^2)\big(\frac{N}{M} + \theta^2\big)}}{\theta} \quad \text{provided that} \quad \theta > \Big( \frac{N}{M} \Big)^{1/4}. \qquad (3.13)$$

Proof.
Let us first determine the solution $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M})$ of the fixed point equation (3.3) for a constant variance profile, namely $\Gamma_{N,M}(i,j) = 1$ for all $i, j$. Using the particular structure of the variance profile $\mathcal{D}(\Gamma_{N,M})$ and the expression (3.4) of $R_{N+M}$, one obtains by simple calculations that $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M}) = \begin{bmatrix} g^\circ_N(\lambda) I_N & 0 \\ 0 & g^\circ_M(\lambda) I_M \end{bmatrix}$, where $g^\circ_N, g^\circ_M$ are complex-valued functions satisfying $g^\circ_N(\lambda) = \big(\lambda - g^\circ_M(\lambda)\big)^{-1}$ and
$$g^\circ_M(\lambda) = \frac{1 + \lambda^2 - \frac{N}{M} - \sqrt{\big(1 + \lambda^2 - \frac{N}{M}\big)^2 - 4\lambda^2}}{2\lambda}.$$
Inserting this expression for $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M})$ into (3.8), one obtains that
$$\det\big( \beta^\circ(\lambda) \big) = 1 - \theta^2\, g^\circ_N(\lambda)\, g^\circ_M(\lambda).$$
Then, by simple calculations, it can be shown that the equation $\det\big(\beta^\circ(\lambda)\big) = 0$ admits the solution (3.13) provided that $\theta > \big(\frac{N}{M}\big)^{1/4}$.

Note that the condition $\theta > \big(\frac{N}{M}\big)^{1/4}$ guarantees that $\lambda_{N/M}(\theta) > 1 + \sqrt{\frac{N}{M}}$. Hence, we retrieve the expression (3.12) of the asymptotic location of an outlier in the Gaussian spike population model, where the asymptotic ratio $c = \lim_{N \to +\infty} \frac{N}{M}$ is replaced with its non-asymptotic approximation $\frac{N}{M}$. As expected, we also remark that the localization $\lambda_{N/M}(\theta)$ of an outlier does not depend on the singular vectors $u_N$ and $v_M$ of the additive perturbation $Z_{N,M}$.

Figure 1: Constant variance profile. (a) Histogram of the singular values of one realization of $H_{N,M}$ for $\theta = 2$. The black curve is the smooth approximation by $\tilde f_n$ of the singular values distribution of $X_{N,M}$. The red vertical line denotes the value $\lambda_{N/M}(2) \approx 2.48$, which approximates the location of the outlier, while the blue vertical dashed line denotes the location of the singular value of $H_{N,M}$ that is closest to $\lambda_{N/M}(2)$. (b) The red line is the curve $\theta \mapsto \max\big(1 + \sqrt{N/M},\, \lambda_{N/M}(\theta)\big)$, and the blue dots are the points $(\theta, \bar\lambda(\theta))$, where $\bar\lambda(\theta)$ is found by numerical minimization of $\lambda \mapsto \det\big(\beta^\circ(\lambda)\big)$ for $\theta$ ranging over a grid of 50 regularly spaced values. The black vertical line is located at $(N/M)^{1/4}$ and its height is $1 + \sqrt{N/M}$. (c) The blue dashed line is the curve $\theta \mapsto \det\big(\beta^\circ(\bar\lambda(\theta))\big)$.

In Figure 1(a), we display the histogram of the singular values of one realization of $H_{N,M} = X_{N,M} + \theta u_N v^*_M$ with $\theta =$
2, where $u_N$ and $v_M$ are chosen to be unit vectors with constant entries. There is clearly an outlier outside the interval $\big[1 - \sqrt{N/M},\, 1 + \sqrt{N/M}\big] \approx [0.05,\, 1.95]$. In Figure 1(a), we also plot the curve $x \mapsto \tilde f_n(x)$, which shows that the density defined by (3.11) is a very satisfactory approximation of the distribution of the singular values of $X_{N,M}$. In Figure 1(b), we plot the curve $\theta \mapsto \max\big(1 + \sqrt{N/M},\, \lambda_{N/M}(\theta)\big)$, which gives the location of outliers for any $\theta > (N/M)^{1/4} \approx 0.97$. For $\theta$ ranging over a grid of values, we also report the results of the numerical procedure used to compute a minimizer $\bar\lambda(\theta)$ of the function $\lambda \mapsto \det\big(\beta^\circ(\lambda)\big)$ over $\mathbb{R}^+$. In Figure 1(c), we display the curve $\theta \mapsto \det\big(\beta^\circ(\bar\lambda(\theta))\big)$. It can be seen that this curve is close to zero for $\theta > (N/M)^{1/4}$, and that it does not vanish for smaller values of $\theta$, in agreement with the fact that there is no outlier for $\theta \le (N/M)^{1/4}$. In Figure 1(b), we also display the curve $\theta \mapsto \bar\lambda(\theta)$ found by numerical minimization, which coincides with $\theta \mapsto \lambda_{N/M}(\theta)$ for $\theta > (N/M)^{1/4}$. Interestingly, for all values of $\theta$ smaller than $(N/M)^{1/4}$, it appears that $\bar\lambda(\theta) = 1 + \sqrt{N/M}$, suggesting that $\lambda \mapsto \det\big(\beta^\circ(\lambda)\big)$ admits a minimizer at the bulk edge.

We now consider the following example of a piecewise constant variance profile:
$$\Gamma_{N,M} = \begin{bmatrix} \gamma_1 \mathbb{1}_{N/2}\mathbb{1}^*_{M/2} & \gamma_2 \mathbb{1}_{N/2}\mathbb{1}^*_{M/2} \\ \gamma_2 \mathbb{1}_{N/2}\mathbb{1}^*_{M/2} & \gamma_1 \mathbb{1}_{N/2}\mathbb{1}^*_{M/2} \end{bmatrix}, \qquad (3.14)$$
where $\mathbb{1}_q$ denotes the vector of length $q$ with all entries equal to one, and $\gamma_1, \gamma_2$ are positive constants with $\gamma_2$ a fixed multiple of $\gamma_1$. Then, we compare two settings where either $u_N$ and $v_M$ are unit vectors with constant entries, or $u_N$ (resp. $v_M$) is equal to the first vector $e^N_1$ (resp. $e^M_1$) of the canonical basis of $\mathbb{R}^N$ (resp. $\mathbb{R}^M$).

In Figure 2, we display the histogram of the singular values of one realization of $H_{N,M} = X_{N,M} + \theta u_N v^*_M$ for different values of $\theta < 1$, with $u_N = \mathbb{1}_N/\sqrt N$ and $v_M = \mathbb{1}_M/\sqrt M$. When $u_N$ and $v_M$ are unit vectors with constant entries, each of the displayed spikes $\theta$ clearly generates an outlier, while in the case $u_N = e^N_1$ and $v_M = e^M_1$ there is no outlier for such values of the spike. For each setting, we also display in Figure 3 the curves $\theta \mapsto \bar\lambda(\theta)$ and $\theta \mapsto \det\big(\beta^\circ(\bar\lambda(\theta))\big)$, where $\bar\lambda(\theta)$ is found by numerical minimization of $\lambda \mapsto \det\big(\beta^\circ(\lambda)\big)$ over $\mathbb{R}^+$. In the case where $u_N$ and $v_M$ have constant entries, the value of $\det\big(\beta^\circ(\bar\lambda(\theta))\big)$ is close to zero over a whole range of spikes $\theta$, which confirms the existence of outliers for spikes within this range. When $u_N = e^N_1$ and $v_M = e^M_1$, one clearly has $\det\big(\beta^\circ(\bar\lambda(\theta))\big) \ne 0$ over the same range, and thus, for such spikes, there is no outlier. In both settings, when $\theta$ is sufficiently large, an outlier exists for both choices of $u_N$ and $v_M$.

Figure 2: Piecewise constant variance profile. (a-d) Histograms of the singular values of one realization of $H_{N,M} = X_{N,M} + \theta u_N v^*_M$, for four values of $\theta$, and smooth approximation of the singular values distribution of $X_{N,M}$ (black curve) for $u_N = \mathbb{1}_N/\sqrt N$ and $v_M = \mathbb{1}_M/\sqrt M$. The red (resp. green) vertical line denotes the value $\bar\lambda(\theta)$ when $u_N = \mathbb{1}_N/\sqrt N$ and $v_M = \mathbb{1}_M/\sqrt M$ (resp. $u_N = e^N_1$ and $v_M = e^M_1$), while the blue vertical dashed line denotes the location of the singular value of $H_{N,M}$ that is closest to $\bar\lambda(\theta)$.

We now consider variance profiles whose entries may be equal to zero and are chosen randomly (and independently) as follows. Each entry of $\Gamma_{N,M}$ takes either the value zero, with probability $1 - p$ (for some $0 < p < 1$), or a fixed positive value $\gamma > 0$, with probability $p$. After randomly fixing the entries of $\Gamma_{N,M}$ in this way, the value of $\gamma$ is chosen such that the normalisation condition (3.6) is satisfied. In Figure 4, we display the histogram of the singular values of one realization of $H_{N,M} = X_{N,M} + \theta u_N v^*_M$ for $\theta = 2$, with $u_N = \mathbb{1}_N/\sqrt N$, $v_M = \mathbb{1}_M/\sqrt M$, and for various Bernoulli variance profiles obtained by letting the sampling probability $p$ range over several values proportional to $1/M$. For the larger values of $p$, the value $\bar\lambda(2)$ is an accurate approximation of the location of an outlier in the singular values distribution of $H_{N,M}$. The shape of the smooth approximation by $\tilde f_n$ of the singular values distribution of $X_{N,M}$ clearly depends on the value of $p$.

Figure 3: Piecewise constant variance profile. (a) The dashed black line is the curve $\theta \mapsto \max\big(1 + \sqrt{N/M},\, \lambda_{N/M}(\theta)\big)$, and the red (resp. green) dots are the points $(\theta, \bar\lambda(\theta))$ when $u_N = \mathbb{1}_N/\sqrt N$ and $v_M = \mathbb{1}_M/\sqrt M$ (resp. $u_N = e^N_1$ and $v_M = e^M_1$). When $u_N$ and $v_M$ are unit vectors with constant entries, outliers are generated over a whole range of spikes $\theta$. (b) The dashed lines are the curves $\theta \mapsto \det\big(\beta^\circ(\bar\lambda(\theta))\big)$, depending on the choice of $(u_N, v_M)$.

We finally assume that $N = M$ and that $\frac{\Gamma_{N,M}}{M}$ is a doubly stochastic matrix. Under such an assumption, an explicit solution of the equation $\det\big(\beta^\circ(\lambda)\big) = 0$ is given in the following lemma.

Lemma 3.4.
Assume that $\frac{\Gamma_{N,M}}{M}$ is a doubly stochastic matrix. Then, Equation (3.7) admits a solution given by
$$\lambda(\theta) = \frac{1 + \theta^2}{\theta} = \theta^{-1} + \theta \quad \text{provided that} \quad \theta > 1. \qquad (3.15)$$

Proof. Under the assumption that the variance profile is doubly stochastic, it can easily be shown that the solution to (3.3) is a scalar matrix $G^\circ_{\mathcal{D}(H)}(\lambda I_{N+M}) = g^\circ_N(\lambda) I_{N+M}$, where $g^\circ_N$ is a complex-valued function satisfying $g^\circ_N(\lambda) = \big(\lambda - g^\circ_N(\lambda)\big)^{-1}$ for $\lambda \in \mathbb{C}^+$. Therefore, one has that $g^\circ_N(\lambda) = \frac{\lambda - \sqrt{\lambda^2 - 4}}{2}$, which is the Stieltjes transform of the semicircular law. Hence, using exactly the same calculations as those made for a constant variance profile to derive Lemma 3.3, one obtains that, for any unit vectors $u_N$ and $v_M$, the equation
$$\det\big(\beta^\circ(\lambda)\big) = 1 - \theta^2 \big(g^\circ_N(\lambda)\big)^2 = 0$$
admits a solution given by (3.15) provided that $\theta > 1$, which corresponds to (3.13) in the square case $c = N/M = 1$.

Figure 4: Bernoulli variance profile. Histogram of the singular values of one realization of $H_{N,M}$ for $\theta = 2$ and several values of the sampling probability $p$. In each figure, the black curve is the smooth approximation by $\tilde f_n$ of the singular values distribution of $X_{N,M}$, the red vertical line denotes the value $\bar\lambda(2)$, which approximates the location of an outlier, and the blue vertical dashed line denotes the location of the singular value of $H_{N,M}$ that is closest to $\bar\lambda(2)$.

In Figure 5, we display the histogram of the singular values of one realization of $H_{N,M} = X_{N,M} + \theta u_N v^*_M$ with $\theta = 2$ and $u_N$, $v_M$ chosen to be unit vectors with constant entries. The normalized variance profile $\frac{\Gamma_{N,M}}{M}$ of $X_{N,M}$ is chosen as
$$\frac{\Gamma_{N,M}}{M} = \frac{1}{K} \sum_{k=1}^{K} P_k, \qquad (3.16)$$
where $P_1, \ldots, P_K$ are permutation matrices (obtained by random permutations of the columns of the identity matrix $I_N$). For small values of $K$, such variance profiles have many entries equal to zero. For $K \ge 4$, there is clearly an outlier located approximately at $\lambda(\theta) = 2.5$. The quality of the smooth approximation by $\tilde f_n$ of the singular values distribution of $X_{N,M}$ also clearly depends on the value of $\gamma$ (the maximum value of the entries of the variance profile $\Gamma_{N,M}$), which is consistent with our theoretical results in Theorem 1.1 on the control of the deviation between the deterministic equivalent $g^\circ_{H_N}$ and the Stieltjes transform $g_{H_N}$.

Figure 5: Doubly stochastic variance profile. Histogram of the singular values of one realization of $H_{N,M}$ for $\theta = 2$ and several values of $K$. In each figure, the black curve is the smooth approximation by $\tilde f_n$ of the singular values distribution of $X_{N,M}$, the red vertical line denotes the value $\lambda(2) = 2.5$, and the blue vertical dashed line denotes the location of the singular value of $H_{N,M}$ that is closest to $\lambda(2)$.

Let us now consider the general setting of the rectangular information plus noise model (3.1) with $Y_{N,M} \ne$
0.

Simulated model

Taking again $N = 360$ and $M = 400$, we generate one realization from the model $H_{N,M} = X_{N,M} + Y_{N,M} + Z_{N,M}$ as follows. The matrix $X_{N,M}$ is a Gaussian matrix with the piecewise constant variance profile given by (3.14). Then, we generate an $N \times M$ matrix $W_{N,M}$ with i.i.d. real entries sampled from a Gaussian distribution with zero mean and variance $\tau^2 = 1/M$, which we write using the singular value decomposition (SVD) as $W_{N,M} = U \Sigma V^*$. Denoting by $(U_j)_{1 \le j \le 3}$ (resp. $(V_j)_{1 \le j \le 3}$) the left (resp. right) singular vectors associated to the three largest singular values $(\sigma_j)_{1 \le j \le 3}$ of $W_{N,M}$, we finally define
$$Y_{N,M} = W_{N,M} - \sum_{j=1}^{3} \sigma_j U_j V^*_j \quad \text{and} \quad Z_{N,M} = \sum_{j=1}^{3} \theta_j U_j V^*_j, \quad \text{with } \theta_1 = \theta_2 = \theta_3 \text{ set to a common value}.$$
The histograms of the s.v.d. of $X_{N,M}$, $Y_{N,M}$ and $H_{N,M}$ are displayed in Figure 6. It can be seen that the $k = 3$ spikes of $Z_{N,M}$ clearly generate 3 outliers. In Figure 6(c), we also display the smooth approximation of the s.v.d. of the random matrix $X_{N,M} + Y_{N,M}$ using the inverse Stieltjes transform (1.5) of $g^\circ_n(\lambda) = \frac{1}{N+M} \mathrm{Tr}\, G^\circ_n(\lambda)$. Since $Y_{N,M}$ is not a diagonal matrix, the deterministic equivalent $G^\circ_n(\lambda)$ of the operator-valued Stieltjes transform is obtained by iterating the following matrix equation
$$G^\circ_{n+1}(\lambda) = \Delta\Big[ \big( \lambda I_{N+M} - R_{N+M}\big(G^\circ_n(\lambda)\big) - \mathcal{D}(Y_{N,M}) \big)^{-1} \Big]. \qquad (3.17)$$
In Figure 6(d), we also report the curves of the mapping $\lambda \mapsto \log\big(\det\beta^\circ_k(\lambda)\big)$, where $\beta^\circ_k$ is the deterministic equivalent defined by (1.18), and of the mapping $\lambda \mapsto \log\big(\det\tilde\beta^\circ_k(\lambda)\big)$ defined by (1.21). As $Y_{N,M}$ is non-diagonal, the second deterministic equivalent $\tilde\beta^\circ_k$ gives the right prediction of the locations of the outliers, which is not the case for the first one, $\beta^\circ_k$. This is confirmed by the numerical simulations reported in Figure 6(c), where it can be seen that the zeros of the mapping $\lambda \mapsto \log\big(\det\tilde\beta^\circ_k(\lambda)\big)$ correspond to the locations of the true outliers in the s.v.d. of $H_{N,M}$.
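The matrix iteration (3.17) is straightforward to implement. The sketch below is a minimal illustration of ours (not the paper's code): it stores the diagonal of $G^\circ_n$ as a vector, evaluates $R_{N+M}$ via (3.4), and takes a full matrix inverse at each step so that a non-diagonal deformation can be handled. As a sanity check we use a constant profile with no deformation, for which the solution must reduce to the semicircle transform.

```python
import numpy as np

def fixed_point_G(lam, Gam, Y, n_iter=400, damping=0.5):
    """Damped iteration of (3.17). Gam is the N x M variance profile,
    Y the (N+M) x (N+M) Hermitian dilation of the deformation (may be
    non-diagonal). Returns the diagonal of the limiting G (a vector)."""
    N, M = Gam.shape
    D = np.zeros((N + M, N + M))     # dilation D(Gam) of the profile
    D[:N, N:] = Gam
    D[N:, :N] = Gam.T
    g = np.zeros(N + M, dtype=complex)
    I = np.eye(N + M)
    for _ in range(n_iter):
        R = np.diag(D @ g / M)                            # R_{N+M}(G), diagonal
        g_new = np.diag(np.linalg.inv(lam * I - R - Y))   # Delta[(lam - R - Y)^{-1}]
        g = damping * g + (1 - damping) * g_new
    return g

# sanity check: constant profile, no deformation -> semicircle transform
N = M = 40
g = fixed_point_G(2j, np.ones((N, M)), np.zeros((2 * N, 2 * N)))
print(np.allclose(g.mean(), (2j - np.sqrt((2j) ** 2 - 4)) / 2))
```

In the square, constant-profile case the symmetrized singular value distribution is the semicircle law, so every diagonal entry of $G^\circ$ converges to $g_{sc}(\lambda)$; the damping factor makes the iteration robust closer to the real axis.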
To conclude this section on numerical experiments, we study an example inspired by the problem of low-rank matrix denoising in image processing in the presence of Poisson noise. In this setting, one observes an $N \times M$ data matrix such that each $(i,j)$-th entry is independently sampled from a Poisson distribution with parameter $\kappa_{i,j} > 0$. Under such an assumption, the expectation and the variance of each entry are thus equal to $\kappa_{i,j}$. Therefore, the following rectangular information plus noise model
$$H_{N,M} = X_{N,M} + \frac{\Gamma_{N,M}}{M}, \qquad (3.18)$$
where $X_{N,M}$ is a rectangular Gaussian matrix with a variance profile $\Gamma_{N,M} = \big(\gamma_{N,M}(i,j)\big)_{i,j}$, may be viewed as a prototype for studying low-rank matrix denoising in the presence of Poisson noise (with $\kappa_{i,j} = \gamma_{N,M}(i,j)/M$), as considered e.g. in [62]. We shall refer to model (3.18) as the Gaussian setting with equal mean and variance.

Figure 6: Deformed model with a piecewise constant variance profile. Histogram of the s.v.d. of (a) $X_{N,M}$ and (b) $Y_{N,M}$. (c) Histogram of the singular values of $H_{N,M} = X_{N,M} + Y_{N,M} + Z_{N,M}$ with $k = 3$ spikes, together with the smooth approximation of the s.v.d. of $X_{N,M} + Y_{N,M}$. The red (resp. blue) vertical lines denote the values $(\tilde\lambda_j)_{1 \le j \le 3}$ (resp. $(\lambda_j)_{1 \le j \le 3}$), which approximate the locations of outliers given by the zeros of $\tilde\beta^\circ_k$ (resp. $\beta^\circ_k$). (d) Graph of $\lambda \mapsto \log\big(\det\beta^\circ_k(\lambda)\big)$ (blue curves) and $\lambda \mapsto \log\big(\det\tilde\beta^\circ_k(\lambda)\big)$ (red curves). The blue vertical dashed lines denote the locations of the true outliers in the s.v.d. of $H_{N,M}$.

In Figure 7, we display the image made of the entries of the normalized variance profile $\frac{\Gamma_{N,M}}{M}$ that is considered in these numerical experiments, as well as the histogram of the singular
values of this matrix, which is scaled so that it satisfies the normalization condition
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{\gamma^2_{N,M}(i,j)}{M} = 1. \qquad (3.19)$$

Figure 7: Gaussian setting with equal mean and variance. (a) Image of the values (in grayscale) of the entries of the $N \times M$ normalized variance profile $\frac{\Gamma_{N,M}}{M}$ with $N = 360$ and $M = 510$. (b) Image of the modulus of the entries of the matrix $H_{N,M}$. (c) Histogram of the singular values of the matrix $\Gamma_{N,M}$.

Now, let us consider the SVD of the normalized variance profile $\frac{\Gamma_{N,M}}{M} = U \Theta V^*$. For any $1 \le k \le \min(N, M)$, model (3.18) can be written as
$$H_{N,M} = X_{N,M} + Y^{(k)}_{N,M} + Z^{(k)}_{N,M}, \quad \text{with} \quad Z^{(k)}_{N,M} = U_{N,k} \Theta_k V^*_{M,k}, \qquad (3.20)$$
where $\Theta_k$ is the $k \times k$ diagonal matrix whose elements are the $k$ largest singular values of $\frac{\Gamma_{N,M}}{M}$, $U_{N,k}$ (resp. $V_{M,k}$) is the matrix made of the associated left (resp. right) singular vectors, and $Y^{(k)}_{N,M} = \frac{\Gamma_{N,M}}{M} - U_{N,k} \Theta_k V^*_{M,k}$.
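The split (3.20) and the dilation (3.2) are simple to form numerically. The sketch below is our own illustration (with an arbitrary random stand-in for the normalized profile $\Gamma_{N,M}/M$): it builds $Z^{(k)}$ from the top-$k$ part of the SVD, the remainder $Y^{(k)}$, and checks that the Hermitian dilation of $Z^{(k)}$ has spectrum $\{\pm$ the $k$ largest singular values, together with $0\}$, as stated after (3.2).

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, k = 30, 40, 3
P = rng.random((N, M))                 # stand-in for the normalized profile Gamma/M

U, s, Vh = np.linalg.svd(P, full_matrices=False)
Z = (U[:, :k] * s[:k]) @ Vh[:k, :]     # Z^(k): rank-k part carrying the spikes
Yk = P - Z                             # Y^(k): remaining smallest singular values

# Hermitian dilation D(Z): eigenvalues are {+-s_1, ..., +-s_k} and 0
Dz = np.block([[np.zeros((N, N)), Z], [Z.conj().T, np.zeros((M, M))]])
eigs = np.sort(np.linalg.eigvalsh(Dz))
print(np.allclose(eigs[-k:], np.sort(s[:k])))
```

The eigenvalue $0$ of $\mathcal{D}(Z^{(k)})$ appears with multiplicity $M + N - 2k$, matching the rank-$2k$ structure used in Section 3.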
Figure 8: Gaussian setting with equal mean and variance. First row: histogram of the s.v.d. of $H_{N,M} = X_{N,M} + Y^{(k)}_{N,M} + Z^{(k)}_{N,M}$ for $k$ spikes with $k = 1, 2,$
3. The black curves are the smooth approximations of the s.v.d. of $H^{(k)}_{N,M} = X_{N,M} + Y^{(k)}_{N,M}$ for the different values of $k$, while the green curve is the smooth approximation of the s.v.d. of $X_{N,M}$. The red (resp. blue) vertical lines denote the values which approximate the locations of outliers given by the zeros of $\tilde\beta^\circ_k$ (resp. $\beta^\circ_k$). Second row: graph of $\lambda \mapsto \log\big(\det\beta^\circ_k(\lambda)\big)$ (blue curves) and $\lambda \mapsto \log\big(\det\tilde\beta^\circ_k(\lambda)\big)$ (red curves) for the different values of $k$. The blue vertical dashed lines denote the locations of the true outliers in the s.v.d. of $H_{N,M}$.

Here, $Y^{(k)}_{N,M}$ is the matrix obtained by keeping only the remaining smallest singular values in the SVD of the normalized variance profile. In Figure 8, we display the histogram of the s.v.d. of $H_{N,M}$ sampled from model (3.18). We also report the smooth approximation of the singular value distributions of the random matrices $X_{N,M}$ and $H^{(k)}_{N,M} := X_{N,M} + Y^{(k)}_{N,M}$ for $k = 1, 2, 3$, obtained from the inverse Stieltjes transform (1.5) of $g^\circ_n(\lambda) = \frac{1}{N+M}\mathrm{Tr}\, G^\circ_n(\lambda)$. As described previously, for the matrix $X_{N,M}$, the deterministic equivalent $G^\circ_n(\lambda)$ of its operator-valued Stieltjes transform is obtained by iterating the vector Dyson equation (3.10). For the matrix $H^{(k)}_{N,M}$, such a deterministic equivalent is obtained by iterating the matrix equation (3.17). In Figure 8, we also report, for $k = 1, 2, 3$, the values of the mappings $\lambda \mapsto \det\big(\beta^\circ_k(\lambda)\big)$ and $\lambda \mapsto \det\big(\tilde\beta^\circ_k(\lambda)\big)$ for $15 \le \lambda \le 35$. Again, this illustrates the benefits of using $\tilde\beta_k$ instead of $\beta_k$ for outlier detection. Moreover, for all these values of $k$, we found the outliers using the mapping $\lambda \mapsto \det\big(\tilde\beta^\circ_k(\lambda)\big)$.

Organization of the proofs
We first describe the main steps of the proof of Theorem 1.1 using the notation of Section 1.2. In Section 5, following the method and terminology of Haagerup and Thorbjørnsen in [37], we start by proving a Master equality (see Lemma 5.7) involving the expectation of the generalized resolvent $(\Lambda - X_N - Y_N)^{-1}$, which can be decomposed as follows:
$$\mathbb{E}\big[ (\Lambda - X_N - Y_N)^{-1} \big] = \big( \Omega_{X_N,Y_N}(\Lambda) - Y_N \big)^{-1} + F_N(\Lambda), \qquad (4.1)$$
with
$$\Omega_{X_N,Y_N}(\Lambda) = \Lambda - R_N\big( \mathbb{E}\big[ G_{X_N+Y_N}(\Lambda) \big] \big), \quad \text{and} \quad F_N(\Lambda) = \big( \Omega_{H_N}(\Lambda) - Y_N \big)^{-1} E_N,$$
where $E_N$ is the matrix of covariances between $(\Lambda - X_N - Y_N)^{-1}$ and $R_N\big(\mathbb{E}[G_{X_N+Y_N}(\Lambda)]\big)$, given explicitly in (5.11). We deduce this result from an identity on the resolvent $(\Lambda - H_N)^{-1}$ that is a consequence of the Gaussian integration by parts formula (see Lemma 5.5 and Lemma 5.6 below).

Following the heuristic of Section 2.2, in particular (2.5), the matrix $\Omega_{X_N,Y_N}(\Lambda)$ is a good candidate to approximate an operator-valued subordination function. Applying the operator $\Delta$ on both sides of equality (4.1) provides the following approximate equation for operator-valued Stieltjes transforms:
$$\mathbb{E}\big[ G_{X_N+Y_N}(\Lambda) \big] = G_{Y_N}\Big( \Lambda - R_N\big( \mathbb{E}\big[ G_{X_N+Y_N}(\Lambda) \big] \big) \Big) + \Delta\big[ F_N(\Lambda) \big]. \qquad (4.2)$$
Equation (4.2) tells us that the expectation of $G_{X_N+Y_N}$ satisfies, up to an additive error term, the fixed point equation (1.12) that defines its deterministic equivalent. Then, using the Gaussian Poincaré inequality (see Lemma 5.9), we obtain upper bounds on $\|E_N\|$ and $\|\Delta(E_N)\|$ that we call Master inequalities (see Lemma 5.8 below), following again the terminology of Haagerup and Thorbjørnsen [37]. It will finally be shown that $\|\Delta[F_N(\Lambda)]\|$ admits an upper bound, polynomial in $\gamma_{\max}$ and $\|(\Im\,\Lambda)^{-1}\|$ and decaying with $N$, of the form
$$\big\| \Delta[F_N(\Lambda)] \big\| \le C_N, \qquad (4.3)$$
and that an improved bound of the same form
$$\big\| \Delta[F_N(\Lambda)] \big\| \le \tilde C_N \qquad (4.4)$$
holds if $Y_N$ is diagonal.

In Section 6 we prove the existence of the deterministic equivalent $G^\circ_{H_N}$, the solution of the equation (4.2) taken with $\Delta[F_N(\Lambda)] =$
0. We also prove that the upper bound (4.3) for $\|\Delta[F_N(\Lambda)]\|$ implies a bound for the difference $\big\|\mathbb E[G_{H_N}(\Lambda)] - G^{\circ}_{H_N}(\Lambda)\big\|$, thanks to an analysis of the regularity of the fixed point problem (1.12). More precisely, $G^{\circ}_{H_N}$ is a good approximation of $G_{H_N}$ outside a small strip around the real axis, as stated below.

Lemma 4.1.
For all δ P p , q and all Λ such that ℑ m Λ ě γ max ´ N p ´ δ q ¯ { I N we have ››› E “ G H N p Λ q ‰ ´ G ˝ H N p Λ q ››› ď ` ` γ { δ }p ℑ m Λ q ´ } ˘ C N , here C N is the bound in the r.h.s. of (4.3) . Moreover, if Y N is diagonal and ℑ m Λ ě γ max ´ N p ´ δ q ¯ { I N , then the above estimate holds with C N as in the r.h.s. of (4.4) . The rest of the proof of Theorem 1.1 is finally based on the control of the differ-ence between the generalized resolvent p Λ ´ H N q ´ and the expectation of G H N p Λ q . Asexplained in Section 7, the comparison between the random quantity G H N p Λ q and thedeterministic equivalent G ˝ H N p Λ q follows from the combination of Lemma 4.1 and Gaus-sian concentration inequality of Lipschitz functions allowing to show that the entries ofthe generalized resolvent p Λ ´ H N q ´ are close to their expectation with high probabil-ity. Finally, Section 7 ends with a mathematical justification of the convergence of thenumerical method used to approximate the solution of the fixed point equation (1.12). We let r N s be the set of integers between 1 and N . Then, we recall some basic propertiesof matrices with complex entries that we repeatedly use in the proof. For any matrix A in M N p C q we denote by } A } its operator norm, namely } A } “ sup x P C N s . t . } x } “ } Ax } , where } x } “ ` x x, x y ˘ , x x, y y “ ÿ i ¯ x i y i , we x ¨ y stands for the standard scalar product on C N . We recall if A is a Hermitian matrixthen } A } is the spectral radius of A , and in general it is the square-root of the spectralradius of AA ˚ . In particular it satisfies the C ˚ -norm condition } A } “ } A ˚ } “ } AA ˚ } .We denote by ℜ e A and ℑ m A the real and imaginary parts of A , which are Hermitianmatrices defined by ℜ e A “ p A ` A ˚ q , ℑ m A “ i p A ´ A ˚ q , i “ ´ . We write A ě A ą
0) whenever the matrix A is Hermitian and positive (resp.positive definite), and A ď A ă
0) if ´ A satisfies this property. Lemma 5.1.
Let $A \in M_N(\mathbb C)$ be such that $\Im m\, A > 0$. Then $A$ is invertible and
$$\|A^{-1}\| \le \|(\Im m\, A)^{-1}\|. \qquad (5.1)$$
Proof.
We follow the proof of Lemma 3.1 in [37]. For any unit vector $x$ in $\mathbb C^N$, since $\langle \Re e(A)x, x\rangle$ and $\langle \Im m(A)x, x\rangle$ are real, we have
$$\|Ax\| = \|Ax\|\,\|x\| \ge \big|\langle Ax, x\rangle\big| = \big|\langle \Re e(A)x, x\rangle + i\langle \Im m(A)x, x\rangle\big| \ge \langle \Im m(A)x, x\rangle \ge \|(\Im m\, A)^{-1}\|^{-1}\|x\|^2 = \|(\Im m\, A)^{-1}\|^{-1}.$$
Hence $A$ is injective, so it is invertible. Since $1 = \|x\| = \|A^{-1}Ax\| \le \|A^{-1}\| \times \|Ax\|$, we get from the previous lower bound that $\|A^{-1}\| \le \|(\Im m\, A)^{-1}\|$.

For a diagonal matrix $\Lambda$, note that $\|\Lambda\| = \max_{i \in [N]} |\Lambda(i,i)|$, and that $\Lambda \ge 0$ if and only if $\Lambda(i,i) \ge 0$ for $i = 1, \dots, N$. Recall that we defined $\Delta(A) = \mathrm{diag}_{i \in [N]}\big(A(i,i)\big)$ for any matrix $A$.

Lemma 5.2.
For any $A \in M_N(\mathbb C)$, one has
$$\|\Delta(A)\| \le \|A\|. \qquad (5.2)$$
Moreover, $A \le 0$ implies $\Delta(A) \le 0$, and $A < 0$ implies $\Delta(A) < 0$.

Proof. We have $\|\Delta(A)\| = \max_{i \in [N]} |A(i,i)| = \max_{i \in [N]} |\langle e_i, A e_i\rangle| \le \|A\|$, where $(e_i)_{i \in [N]}$ denotes the canonical basis of $\mathbb C^N$. Moreover, if $A \le 0$ then $A = -BB^*$ for some Hermitian matrix $B$. Hence for any $i = 1, \dots, N$, $\Delta(A)(i,i) = -\sum_{j=1}^N |B(i,j)|^2 \le 0$, and so we get $\Delta(A) \le 0$. If moreover $A < 0$ then $A$ is invertible, so the $i$-th row of $B$ is nonzero for any $i = 1, \dots, N$, and hence $\Delta(A) < 0$.

For a diagonal invertible matrix $\Lambda$, note that $\|\Lambda^{-1}\| = \big(\min_{i \in [N]} |\Lambda(i,i)|\big)^{-1}$. We use several times the following direct consequence of Lemmas 5.1 and 5.2.
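As a quick numerical sanity check (an illustration, not part of the argument), one can sample a matrix with positive-definite imaginary part and verify the bounds of Lemmas 5.1 and 5.2. The sketch below uses NumPy; the size and the distributions of the entries are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# A = B + iC with B Hermitian and C positive definite, so Im A = C > 0.
B0 = rng.standard_normal((N, N))
B = (B0 + B0.T) / 2                       # real symmetric, hence Hermitian
C0 = rng.standard_normal((N, N))
C = C0 @ C0.T + np.eye(N)                 # symmetric positive definite
A = B + 1j * C

op = lambda M: np.linalg.norm(M, 2)       # operator norm
Delta = lambda M: np.diag(np.diag(M))     # the map A -> Delta(A)

# Lemma 5.1: A is invertible and ||A^{-1}|| <= ||(Im A)^{-1}||.
lhs_51 = op(np.linalg.inv(A))
rhs_51 = op(np.linalg.inv(C))

# Lemma 5.2: ||Delta(A)|| <= ||A||.
lhs_52 = op(Delta(A))
rhs_52 = op(A)
```

Both inequalities hold deterministically, so any such draw must satisfy them up to floating-point error.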
Lemma 5.3.
For any Hermitian matrix $A$ and any diagonal matrix $\Lambda$ such that $\Im m\,\Lambda > 0$, the matrix $\Lambda - A$ is invertible and
$$\big\|(\Lambda - A)^{-1}\big\| \le \|(\Im m\,\Lambda)^{-1}\|. \qquad (5.3)$$
Hence the map $G_A : \Lambda \mapsto \Delta\big[(\Lambda - A)^{-1}\big]$ is well defined on $D_N(\mathbb C^+)$, and it satisfies $\|G_A(\Lambda)\| \le \|(\Im m\,\Lambda)^{-1}\|$ and $\Im m\, G_A(\Lambda) < 0$ for all $\Lambda$.

Recall that we defined $R_N(G) = \mathrm{diag}_{i \in [N]}\big( \sum_{j=1}^N \frac{\gamma_{i,j}}{N} G_{j,j} \big)$ for any diagonal matrix $G$. Throughout the paper, we denote $\gamma_{\max} = \max_{i,j} \gamma_{i,j}$.

Lemma 5.4.
For any diagonal matrix $G$ such that $\Im m\, G < 0$, one has $\Im m\, R_N(G) \le 0$. Moreover, for any diagonal matrix $G$, the following operator norm bound holds:
$$\|R_N(G)\| \le \gamma_{\max} \|G\|.$$
Proof.
For any $G$ such that $\Im m\, G < 0$, since the matrix $\Gamma_N$ has nonnegative entries, we have
$$\Im m\big(R_N(G)\big) = \sum_{i \in [N]} \frac{1}{2i}\Big[ \sum_{j=1}^N \frac{\gamma_{i,j}}{N} G_{j,j} E_{i,i} - \Big( \sum_{j=1}^N \frac{\gamma_{i,j}}{N} G_{j,j} E_{i,i}\Big)^{*} \Big] = \sum_{i \in [N]} \sum_{j=1}^N \frac{\gamma_{i,j}}{N}\, \Im m(G_{j,j})\, E_{i,i},$$
where $E_{i,i} = e_i \otimes e_i$ denotes the matrix with all entries equal to zero except the $(i,i)$-th one, which is equal to one. Hence the diagonal entries of $\Im m\, R_N(G)$ are indeed nonpositive. Moreover, for any diagonal matrix $G$ we have
$$\|R_N(G)\| = \max_{i \in [N]} \big| R_N(G)(i,i) \big| \le \frac{\gamma_{\max}}{N} \sum_{j=1}^N |G(j,j)|.$$
Since $|G(j,j)| \le \|G\|$, we obtain the expected inequality.

The notation is as in Section 1.2; in particular we denote $H_N = X_N + Y_N$. We start with the following observation.

Lemma 5.5.
For any $\Lambda \in D_N(\mathbb C^+)$, i.e. any diagonal matrix with positive imaginary part, recalling that $G_{H_N}(\Lambda) = \Delta\big[(\Lambda - H_N)^{-1}\big]$, we have the following equality between $N \times N$ matrices:
$$\mathbb E\big[ X_N (\Lambda - H_N)^{-1} \big] = \mathbb E\Big[ R_N\big(G_{H_N}(\Lambda)\big)\, (\Lambda - H_N)^{-1} \Big]. \qquad (5.4)$$
Lemma 5.5 is a consequence of the well-known Gaussian integration by parts formula (see e.g. Lemma 3.3 in [37]) that we recall below.

Lemma 5.6.
Let $f : \mathbb R^q \to \mathbb C$ be a continuously differentiable function, and $X_1, \dots, X_q$ a sequence of independent centered real Gaussian variables with possibly different variances $\mathrm{Var}(X_k) = \gamma_k$ for $1 \le k \le q$. Then, under the condition that $f$ and its first order derivatives $\frac{\partial}{\partial x_1} f, \dots, \frac{\partial}{\partial x_q} f$ are polynomially bounded, one has
$$\mathbb E\big[ X_k f(X_1, \dots, X_q) \big] = \gamma_k\, \mathbb E\Big[ \frac{\partial}{\partial x_k} f(X_1, \dots, X_q) \Big]. \qquad (5.5)$$
Proof of Lemma 5.5.
Let us write the random matrix X N in an appropriate orthonormalbasis to make its dependency on only real Gaussian variables more explicit. Let E i,j “ e i b e j be the canonical basis of N ˆ N matrices, and define F i,j “ $’&’% E i,j if i “ j ? p E i,j ` E j,i q if i ą j i ? p E i,j ´ E j,i q if i ă j (5.6)Then, X N can be decomposed as X N “ N ÿ i,j “ x i,j F i,j , (5.7)35here the x i,j are i.i.d. real Gaussian random variables, centered and such that x i,j hasvariance γ i,j N . This implies that E “ X N p Λ ´ H N q ´ ‰ “ N ÿ i,j “ F i,j E “ x i,j p Λ ´ H N q ´ ‰ . By the Gaussian integration by part (5.5), we have E “ x i,j p Λ ´ H N q ´ ‰ “ γ i,j N E ” ddǫ ˇˇˇˇ ǫ “ p Λ ´ H N ´ ǫF i,j q ´ ı . Now, recalling that H N “ X N ` Y N , we can compute ddǫ ˇˇˇˇ ǫ “ p Λ ´ H N ´ ǫA q ´ “ p Λ ´ H N q ´ A p Λ ´ H N q ´ (5.8)for any direction A P C N ˆ N , and get the following relation E “ X N p Λ ´ H N q ´ ‰ “ N ÿ i,j “ γ i,j N F i,j E “ p Λ ´ H N q ´ F i,j p Λ ´ H N q ´ ‰ . (5.9)Note that (5.3) ensures that the function and its derivatives are bounded, so we cancorrectly apply (5.5). Moreover, since γ ij “ γ ji , we have for any matrix A N ÿ i,j “ γ i,j N F i,j AF i,j “ N ÿ i “ γ i,i N E i,i AE i,i ` ÿ i ą j γ i,j N p E i,j ` E j,i q A p E i,j ` E j,i q´ ÿ i ă j γ i,j N p E i,j ´ E j,i q A p E i,j ´ E j,i q“ N ÿ i “ γ i,i N E i,i AE i,i ` ÿ i ą j γ i,j N p E i,j AE j,i ` E j,i AE i,j q“ N ÿ i,j “ γ i,j N A jj E ii “ R N ` ∆ p A q ˘ . (5.10)Combined with (5.9), the equality implies the expected result E “ X N p Λ ´ H N q ´ ‰ “ E “ R N ` ∆ rp Λ ´ H N q ´ s ˘ p Λ ´ H N q ´ ‰ . Let us now introduce E N “ E N p Λ q defined as the covariance between the matrices R N ` G H N p Λ q ˘ and p Λ ´ H N q ´ , namely E N “ E ” R N ` G H N p Λ q ˘ p Λ ´ H N q ´ ı ´ E ” R N ` G H N p Λ q ˘ı ˆ E ” p Λ ´ H N q ´ ı . (5.11)We can now state and prove the so-called Master equality introduced in Equation (4.1).36 emma 5.7 (Master equality) . 
For any $\Lambda \in D_N(\mathbb C)^+$, we define the diagonal matrix
$$\Omega_{H_N}(\Lambda) = \Lambda - R_N\Big( \mathbb E\big[ G_{H_N}(\Lambda) \big] \Big). \qquad (5.12)$$
Then we have $\Im m\, \Omega_{H_N}(\Lambda) \ge \Im m\, \Lambda$, and the following equality holds for all $\Lambda \in D_N(\mathbb C)^+$:
$$\mathbb E\big[ (\Lambda - X_N - Y_N)^{-1} \big] = \big( \Omega_{H_N}(\Lambda) - Y_N \big)^{-1} \times \big( I_N + E_N \big). \qquad (5.13)$$
Proof.
Starting with the left-hand side $X_N (\Lambda - H_N)^{-1}$ in (5.4), we want to obtain an expression involving only generalized resolvents. Recall that $H_N = X_N + Y_N$. If we were solely considering the random matrix $X_N$, assuming $Y_N =$
0, we should write X N p Λ ´ X N q ´ “ ` ´ p Λ ´ X N q ` Λ ˘ p Λ ´ X N q ´ “ ´ I N ` Λ p Λ ´ H N q ´ . When Y N is non zero, we first fix a deterministic matrix Ω P D N p C ` q whose choice isdetermined later on, and multiplying on the left by p Ω ´ Y N q ´ our expression underconsideration: we have p Ω ´ Y N q ´ X N p Λ ´ H N q ´ “ p Ω ´ Y N q ´ “ ´ p Λ ´ X N ´ Y N q ` p Ω ´ Y N q ` p Λ ´ Ω q ‰ p Λ ´ H N q ´ “ ´p Ω ´ Y N q ´ ` p Λ ´ H N q ´ ` p Ω ´ Y N q ´ p Λ ´ Ω qp Λ ´ H N q ´ . Moreover, since Y N is deterministic, the master equality (5.4) is equivalent to E “ p Ω ´ Y N q ´ X N p Λ ´ H N q ´ ‰ “ E “ p Ω ´ Y N q ´ R N ` G H N p Λ q ˘ p Λ ´ H N q ´ ‰ . Introducing the map f : A ÞÑ E “ p Ω ´ Y N q ´ A p Λ ´ H N q ´ ‰ for any random matrix A ,we obtain E “ p Λ ´ X N ´ Y N q ´ ‰ “ ` Ω ´ Y N ˘ ´ ` f ´ R N ` G H N p Λ q ˘¯ ´ f p Λ ´ Ω q . Since f is linear, we have f ´ R N ` G H N p Λ q ˘¯ ´ f p Λ ´ Ω q“ f ´ E “ R N ` G H N p Λ q ‰ ´ Λ ` Ω ˘¯ ` f ´ R N ` G H N p Λ q ˘ ´ E “ R N ` G H N p Λ q ˘‰¯ . We set Ω “ Ω H N p Λ q “ Λ ´ R N ` E “ G H N p Λ q ‰˘ , so that the first term in the above equalityvanishes. By Lemma 5.4, we have that ℑ m ` R N ` G H N p Λ q ˘˘ ď
0, so Ω H N belongs toD N p C ` q and satisfies ℑ m Ω H N p Λ q ą ℑ m Λ. Hence we obtain the following expression E “ p Λ ´ X N ´ Y N q ´ ‰ “ ` Ω H N p Λ q ´ Y N ˘ ´ ` f ´ R N ` G H N p Λ q ˘ ´ E “ R N ` G H N p Λ q ˘‰¯ . Since Y N is deterministic, the above identity completes the proof of Lemma 5.7.37 .3 The Master inequality We prove the following estimates on E N “ E N p Λ q defined in (5.11) as the covariancebetween the matrices R N ` G H N p Λ q ˘ and p Λ ´ H N q ´ Lemma 5.8 (Master inequality) . Recall that we denote γ max “ max i,j γ i,j . For any Λ belonging to D N p C ` q , we have ›› E N ›› ď γ N ´ ˆ }p ℑ m Λ q ´ } , (5.14) ›› ∆ “ E N ‰›› ď γ N ´ ˆ }p ℑ m Λ q ´ } . (5.15)By Lemma 5.7 one has that } ℑ m Ω ´ H N p Λ q} ď }p ℑ m Λ q ´ } . Hence as announced inthe presentation of the proof of Section 4, Lemma 5.8 implies that the operator-valuedsubordination property (4.2) approximatively holds up to an error term ∆ r F N p Λ qs “ ∆ “ p Ω H N ´ Y N q ´ E N ‰ satisfying ›› ∆ r F N p Λ qs ›› ď ›› p Ω H N ´ Y N q ´ E N ›› ď γ N ´ ˆ }p ℑ m Λ q ´ } . Moreover, if Y N is diagonal then so is p Ω H N ´ Y N q ´ and we have ›› ∆ r F N p Λ qs ›› “ ›› p Ω H N ´ Y N q ´ ∆ “ E N ‰›› ď γ N ´ ˆ }p ℑ m Λ q ´ } . The rest of the section is devoted to the proof of Lemma 5.8. To abbreviate thenotation, we define the random matrices (and functions of the diagonal parameter Λ) A N “ p Λ ´ H N q ´ , ˝ A N “ A N ´ E r A N s ,D N “ R N ` ∆ p A N q ˘ , ˝ D N “ D N ´ E r D N s , so that E N “ E “ ˝ D N ˝ A N ‰ . We first use the Cauchy-Schwarz inequality for the norm ›› E “ ˝ D N ˝ A ‰›› ď ›› E “ ˝ D N ˝ D ˚ N ‰›› ›› E “ ˝ A ˚ N ˝ A N ‰›› . (5.16)We recall the proof of (5.16) from [40, Lemma 3.1 (a1)]. By the singular value decom-position, } E r ˝ D N ˝ A N s} is the supremum of ˇˇ@ E r ˝ D N ˝ A N s x, y Dˇˇ over all unit vectors x and y in C N . 
Moreover, we have ˇˇˇ@ E r ˝ D N ˝ A N s x, y Dˇˇˇ ď E “ˇˇ x ˝ D N ˝ A N x, y y ˇˇ‰ ď E “ } ˝ D ˚ N y } ˆ } ˝ A N x } ‰ ď ` E r} ˝ D ˚ N y } s ˆ E r} ˝ A N x } s ˘ “ ` x E r ˝ D N ˝ D ˚ N s y, y y ˆ x E r ˝ A ˚ N ˝ A N s x, x y ˘ ď ›› E “ ˝ D N ˝ D ˚ N ‰›› ›› E “ ˝ A ˚ N ˝ A N ‰›› . ›› E N ›› ď sup k Pr N s ´ V ar D N p k, k q ¯ ˆ ›› E “ ˝ A ˚ N ˝ A N ‰›› (5.17)For the diagonal of E N we shall use the classical Cauchy-Schwarz inequality: ›› ∆ “ E N ‰›› “ ›› ˝ D N ∆ “ ˝ A N ‰›› “ max k Pr N s ´ˇˇˇ E “ ˝ D N p k, k q ˆ ˝ A N p k, k q ˇˇˇ¯ ď max k Pr N s ˆ V ar ` D N p k, k q ˘ V ar ` A N p k, k q ˘˙ . (5.18)We now state the Gaussian Poincaré inequality (see e.g. Proposition 4.1 in [37]),which allows use to estimate the variances V ar ` A N p k, k q ˘ and V ar ` D N p k, k q ˘ that appearin Inequalities (5.17) and (5.18). Proposition 5.9 (Gaussian Poincaré inequality) . Let f : R q Ñ C be a continuouslydifferentiable function, and X , . . . , X q a sequence of independent centered real Gaussianvariables with possibly different variances V ar p X k q “ γ k for ď k ď q . Then, under thecondition that f and its first order derivatives are polynomially bounded, one has that V ar p f p X , . . . , X q qq ď E ´ } Γ { ∇ f p X , . . . , X q q} ¯ where Γ “ diag p γ , . . . , γ q q , ∇ f is the gradient of f , and } ¨ } is the standard Euclideannorm of a vector with complex entries. We write the matrices A N and D N as functions of the independent real Gaussianrandom variables defined in (5.7), namely: recalling H N “ X N ` Y N , we define for anyreal matrix M “ p m i,j q i,j the diagonal matrices f p M q “ ∆ ”` Λ ´ ÿ i,j m i,j F i,j ´ Y N ˘ ´ ı ,f p M q “ R N ´ ∆ ”´ Λ ´ ÿ i,j m i,j F i,j ´ Y N ¯ ´ ı¯ , where p F i,j q i,j is the basis of Hermitian matrices defined by (5.6), so that we have∆ r A N s “ f ` p x i,j q i,j ˘ and D N “ f ` p x i,j q i,j ˘ for the coordinates p x i,j q i,j of X N in thebasis p F i,j q i,j , see (5.7). 
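The Gaussian Poincaré inequality of Proposition 5.9 is easy to probe numerically. The sketch below (an illustration only; the test function and the variances $\gamma_k$ are arbitrary choices) checks by Monte Carlo that the variance of $f(X_1,\dots,X_q)$ is dominated by $\mathbb E\big[\|\Gamma^{1/2}\nabla f\|^2\big]$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 200_000, 3
gammas = np.array([0.5, 1.0, 2.0])            # Var(X_k) = gamma_k

# independent centered Gaussians with prescribed variances
X = rng.standard_normal((n, q)) * np.sqrt(gammas)

# smooth, polynomially bounded test function f(x) = sin(x_1) + x_2 * x_3
f_vals = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]
# ||Gamma^{1/2} grad f||^2 = gamma_1 cos(x_1)^2 + gamma_2 x_3^2 + gamma_3 x_2^2
grad_sq = (gammas[0] * np.cos(X[:, 0])**2
           + gammas[1] * X[:, 2]**2
           + gammas[2] * X[:, 1]**2)

var_f = f_vals.var()                          # Monte Carlo Var(f)
poincare_bound = grad_sq.mean()               # Monte Carlo E||Gamma^{1/2} grad f||^2
```

For this choice of $f$, the exact values are roughly $\mathrm{Var}(f) \approx 2.32$ against a bound $\approx 4.34$, so the inequality holds with a comfortable margin.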
By the same computation as for the derivative in (5.8), and bythe linearity of the maps R N and ∆, we obtain BB x i,j f ` p x i,j q i,j ˘ “ ∆ “ p Λ ´ H N q ´ F i,j p Λ ´ H N q ´ ‰ “ ∆ r A N F i,j A N s , BB x i,j f ` p x i,j q i,j ˘ “ R N ´ ∆ “ p Λ ´ H N q ´ F i,j p Λ ´ H N q ´ ‰¯ “ R N ´ ∆ “ A N F i,j A N q ‰¯ . ℓ P r N s , we apply Proposition 5.9 (Gaussian Poincaré inequality) to the ℓ -thdiagonal entry of A N and of D N , and hence get V ar ` A N p ℓ, ℓ q ˘ ď E „ ÿ i,j γ i,j N ˇˇˇ“ A N F i,j A N ‰ p ℓ, ℓ q ˇˇˇ ď γ N E „ ÿ i,j ˇˇˇ“ A N F i,j A N ‰ p ℓ, ℓ q ˇˇˇ , (5.19) V ar ` D N p ℓ, ℓ q ˘ ď E „ ÿ i,j γ i,j N ˇˇˇ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q ˇˇˇ ď γ N E „ ÿ i,j ˇˇˇ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q ˇˇˇ . (5.20) Hence to obtain the required estimates for (5.17) and (5.18), one should find upperbounds for (5.19) and (5.20). This is the purpose of this section, where we derive belowthe two estimates (5.22) and (5.24), which allow to complete the proof of Lemma 5.8.We first write in a different way the term ř i,j ˇˇ“ A N F i,j A N ‰ p ℓ, ℓ q ˇˇ . Recall that p e k q k Pr N s denotes the canonical basis of C N . Recalling that the entry p ℓ, ℓ q of a ma-trix M is equal to e ˚ ℓ M e ℓ , that the elementary matrix E ℓ,k equals e ℓ e ˚ k . Since F ij is aHermitian matrix, we have ˇˇˇ“ A N F i,j A N ‰ p ℓ, ℓ q ˇˇˇ “ “ A N F i,j A N ‰ p ℓ, ℓ q ˆ “ A ˚ N F i,j A ˚ N ‰ p ℓ, ℓ q“ e ˚ ℓ A N F i,j A N e ℓ ˆ e ˚ ℓ A ˚ N F i,j A ˚ N e ℓ , Note also that (5.10) implies the following equality, valid for any matrix M : N ÿ i,j “ F ij M F ij “ N ÿ i,j “ M jj E ii “ N ÿ i,j “ e i e ˚ j M e j e ˚ i . (5.21)Using (5.21) for M “ A N e ℓ e ˚ ℓ A ˚ N , we get ÿ i,j ˇˇˇ“ A N F i,j A N ‰ p ℓ, ℓ q ˇˇˇ “ e ˚ ℓ A N ÿ i,j ` F i,j M F i,j ˘ A ˚ N e ℓ “ e ˚ ℓ A N ÿ i,j ` e i e ˚ j M e j e ˚ i ˘ A ˚ N e ℓ “ ÿ i,j e ˚ ℓ A N e i e ˚ j A N e ℓ e ˚ ℓ A ˚ N e j e ˚ i A ˚ N e ℓ “ ÿ i,j | A N p ℓ, i q| | A N p j, ℓ q| “ “ A N A ˚ N sp ℓ, ℓ q . 
V ar ` A N p ℓ, ℓ q ˘ ď γ N ´ E ”“ A N A ˚ N sp ℓ, ℓ q ı ď γ N ´ E “ } A N } ‰ ď γ N ´ }p ℑ m Λ q ´ } . (5.22)Similarly, we re-write the term ř i,j ˇˇ R N ` ∆ “ A N F i,j A N ‰˘ p ℓ, ℓ q ˇˇ . Since for any matrix M , R N ` ∆ M ˘ p ℓ, ℓ q “ ÿ ℓ γ ℓℓ N e ˚ ℓ M e ℓ , we obtain ˇˇˇ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q ˇˇˇ “ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q R N ´ ∆ “ A ˚ N F i,j A ˚ N ‰¯ p ℓ, ℓ q“ ÿ ℓ ,ℓ γ ℓ,ℓ γ ℓ,ℓ N ´ e ˚ ℓ A N ´ F i,j A N e ℓ e ˚ ℓ A ˚ N F i,j ¯ A ˚ N e ℓ . Using (5.21) for M “ A N e ℓ e ˚ ℓ A ˚ N , we get the expressions and estimates ÿ i,j ˇˇˇ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q ˇˇˇ “ ÿ i,j,ℓ ,ℓ γ ℓ,ℓ γ ℓ,ℓ N ´ e ˚ ℓ A N e i e ˚ j A N e ℓ e ˚ ℓ A ˚ N e j e ˚ i A ˚ N e ℓ “ ÿ i,j,ℓ ,ℓ γ ℓ,ℓ γ ℓ,ℓ N ´ A N p ℓ , i q A N p j, ℓ q A ˚ N p ℓ , j q A ˚ N p i, ℓ qď γ N ´ ÿ ℓ ,ℓ ˇˇˇ ÿ i,j A N p ℓ , i q A N p j, ℓ q A ˚ N p ℓ , j q A ˚ N p i, ℓ q ˇˇˇ “ γ N ´ ÿ ℓ ,ℓ ˇˇˇ p A N A ˚ N qp ℓ , ℓ q ˆ p A ˚ A qp ℓ , ℓ q ˇˇˇ . The Cauchy-Schwarz inequality for ř ℓ ,ℓ implies ÿ i,j ˇˇˇ R N ´ ∆ “ A N F i,j A N ‰¯ p ℓ, ℓ q ˇˇˇ ď γ N ´ ˆ ´ ÿ ℓ ,ℓ ˇˇˇ p A N A ˚ N qp ℓ , ℓ q ˇˇˇ ¯ ˆ ´ ÿ ℓ ,ℓ ˇˇˇ p A ˚ A qp ℓ , ℓ q ˇˇˇ ¯ “ γ N ´ ˆ Tr “ p A N A ˚ N q ‰ ď γ N ´ }p ℑ m Λ q ´ } . Hence with (5.20), the above inequality gives V ar ` D N p ℓ, ℓ q ˘ ď γ N ´ ˆ γ N ´ }p ℑ m Λ q ´ } (5.23) “ γ N ´ }p ℑ m Λ q ´ } . (5.24)41nfortunately, the Poincaré inequality does not conclude to an interesting estimatefor ›› E “ ˝ A ˚ N ˝ A N ‰›› that will be roughly bounded by 4 }p ℑ m Λ q ´ } . Inserting the estimates(5.24) in (5.17) give ›› E N ›› ď ´ γ N ´ }p ℑ m Λ q ´ } ˆ }p ℑ m Λ q ´ } ¯ “ γ N ´ }p ℑ m Λ q} . Moreover, (5.22) and (5.17) implies ››› ∆ “ E N ‰››› ď ´ γ N ´ }p ℑ m Λ q ´ } ˆ γ N ´ }p ℑ m Λ q ´ } ¯ “ γ N ´ }p ℑ m Λ q ´ } , which proves Inequality (5.14) and complete the proof of Lemma 5.8. In this section, we prove the existence and uniqueness of the deterministic equivalent G ˝ H N p Λ q , and we derive an estimate for the difference G ˝ H N ´ E “ G H N ‰ . 
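The deterministic equivalent studied in this section is characterized by the fixed point equation $G = G_{Y_N}\big(\Lambda - R_N(G)\big)$, which suggests computing it by plain Picard iteration whenever the map is a contraction, i.e. far enough from the real axis (cf. Lemma 6.1 below). A minimal numerical sketch, assuming a diagonal $Y_N$ so that $G_{Y_N}$ has the closed form $1/(\lambda_i - y_i)$; the profile, the size and the spectral parameter are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30

Gamma = rng.uniform(0.2, 1.0, size=(N, N))
Gamma = (Gamma + Gamma.T) / 2                 # symmetric variance profile
y = rng.standard_normal(N)                    # diagonal deformation Y_N
lam = rng.uniform(-2.0, 2.0, N) + 2.5j        # diagonal Lambda, Im large enough

def R_N(g):
    # (R_N(G))_ii = sum_j Gamma[i, j]/N * G_jj
    return Gamma @ g / N

def psi(g):
    # psi_Lambda(G) = G_{Y_N}(Lambda - R_N(G)), diagonal Y_N case
    return 1.0 / (lam - R_N(g) - y)

g = np.zeros(N, dtype=complex)
for _ in range(500):                          # Picard iteration
    g_next = psi(g)
    if np.max(np.abs(g_next - g)) < 1e-13:
        g = g_next
        break
    g = g_next

residual = np.max(np.abs(g - psi(g)))         # fixed point defect
```

Here $\gamma_{\max}\|(\Im m\,\Lambda)^{-1}\|^2 \le 1/2.5^2 < 1$, so the iteration is a contraction and the limit is the unique fixed point; its imaginary part stays negative, consistent with $G^{\circ}_{H_N}(\Lambda) \in D_N(\mathbb C)^-$.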
Recall that for any Λ in D N p C ` q , we denote G Y N p Λ q “ ∆ “ p Λ ´ Y N q ´ ‰ . For any Λ inD N p C ` q we consider the function ψ Λ : D N p C q ´ Ñ D N p C q ´ G ÞÑ G Y N ` Λ ´ R N p G q ˘ . (6.1)So G ˝ H N p Λ q is solution of Equation (1.12) of Theorem 1.1 if and only if G ˝ H N p Λ q is a fixedpoint of ψ Λ for any Λ. Note that by Lemma 5.4, when G P D N p C q ´ then R N p G q ď ´ R N p G q P D N p C ` q . Hence we can correctly evaluate the function G Y N inthis diagonal matrix and the expression defining ψ Λ makes sense. Moreover, by thelast statement of Lemma 5.3 we indeed have ψ Λ p G q P D N p C q ´ . We shall use the nextstatement later. Lemma 6.1.
For any $\Lambda \in D_N(\mathbb C)^+$ the function $\psi_\Lambda$ is bounded and Lipschitz for the operator norm: for any $G_1, G_2 \in D_N(\mathbb C)^-$,
$$\|\psi_\Lambda(G_1)\| \le \|(\Im m\,\Lambda)^{-1}\|, \qquad \|\psi_\Lambda(G_1) - \psi_\Lambda(G_2)\| \le \gamma_{\max} \big\|(\Im m\,\Lambda)^{-1}\big\|^2 \times \|G_1 - G_2\|.$$
Proof.
We have, by Lemmas 5.3 and 5.4,
$$\|\psi_\Lambda(G)\| = \Big\| G_{Y_N}\big( \Lambda - R_N(G) \big) \Big\| \le \Big\| \Im m\big( \Lambda - R_N(G) \big)^{-1} \Big\| \le \|(\Im m\,\Lambda)^{-1}\|.$$
Since $\|\cdot\|$ is an algebra norm, for any $G_1, G_2$ in $D_N(\mathbb C)^-$ we have
$$\|\psi_\Lambda(G_1) - \psi_\Lambda(G_2)\| = \Big\| \Delta\Big[ \big( \Lambda - R_N(G_1) - Y_N \big)^{-1}\, R_N(G_1 - G_2)\, \big( \Lambda - R_N(G_2) - Y_N \big)^{-1} \Big] \Big\| \le \big\| \big( \Lambda - R_N(G_1) - Y_N \big)^{-1} \big\| \times \big\| R_N(G_1 - G_2) \big\| \times \big\| \big( \Lambda - R_N(G_2) - Y_N \big)^{-1} \big\|.$$
By (5.3) we have $\|( \Lambda - R_N(G_1) - Y_N )^{-1}\| \le \| \Im m( \Lambda - R_N(G_1) )^{-1} \|$. But by Lemma 5.4 we have $\Im m\big( \Lambda - R_N(G_1) \big) \ge \Im m\,\Lambda$, and so $\|( \Lambda - R_N(G_1) - Y_N )^{-1}\| \le \|(\Im m\,\Lambda)^{-1}\|$. The same inequality holds for $G_2$ instead of $G_1$. These inequalities, together with the estimate $\| R_N(G_1 - G_2) \| \le \gamma_{\max} \|G_1 - G_2\|$ of Lemma 5.4, yield
$$\|\psi_\Lambda(G_1) - \psi_\Lambda(G_2)\| \le \gamma_{\max} \big\|(\Im m\,\Lambda)^{-1}\big\|^2 \times \|G_1 - G_2\|.$$
By the Banach fixed-point theorem and Lemma 6.1, we get the a priori existence and uniqueness of the deterministic equivalent on a region of $D_N(\mathbb C)^+$.

Corollary 6.2.
For any $\Lambda$ such that $\Im m\,\Lambda > \gamma_{\max}^{1/2} I_N$, there exists a unique deterministic diagonal matrix $G^{\circ}_{H_N}(\Lambda) \in D_N(\mathbb C)^-$ such that $G^{\circ}_{H_N}(\Lambda) = \psi_\Lambda\big( G^{\circ}_{H_N}(\Lambda) \big)$.

We want to extend the previous corollary to any $\Lambda$ in $D_N(\mathbb C)^+$. The difficulty is that $\psi_\Lambda$ is not contractive in general. Fortunately, it will be enough for our problem to consider uniqueness in the class of analytic functions in several variables. Recall from [54] that for any open set $\Omega$ of $D_N(\mathbb C)$, we say that a function $G : \Omega \to M_N(\mathbb C)$ is analytic on $\Omega$ whenever, for any $k, \ell = 1, \dots, N$, the functions $(\lambda_1, \dots, \lambda_N) \mapsto G\big( \mathrm{diag}(\lambda_1, \dots, \lambda_N) \big)(k,\ell)$ are analytic in each variable $\lambda_i$.

Lemma 6.3.
There exists a unique deterministic analytic map $G^{\circ}_{H_N} : D_N(\mathbb C)^+ \to D_N(\mathbb C)^-$ such that $G^{\circ}_{H_N}(\Lambda) = \psi_\Lambda\big( G^{\circ}_{H_N}(\Lambda) \big)$ for any $\Lambda \in D_N(\mathbb C)^+$. Moreover, for any $\Lambda_1, \Lambda_2 \in D_N(\mathbb C)^+$,
$$\big\| G^{\circ}_{H_N}(\Lambda_1) - G^{\circ}_{H_N}(\Lambda_2) \big\| \le \|(\Im m\,\Lambda_1)^{-1}\| \, \|(\Im m\,\Lambda_2)^{-1}\| \times \|\Lambda_1 - \Lambda_2\|. \qquad (6.2)$$
The remainder of the subsection is devoted to the proof of Lemma 6.3. By [54, Conclusion 1.2.1.2], the analytic continuation principle holds for analytic maps in several variables. Hence, by Corollary 6.2, we know that there exists at most one analytic map $G$ solving the fixed point problem on $D_N(\mathbb C)^+$, since all solutions must coincide on $\{\Lambda \in D_N(\mathbb C) : \Im m\,\Lambda > \gamma_{\max}^{1/2} I_N\}$. In Section 6.2.2 we prove the existence of such a function $G^{\circ}_{H_N}$ and show that it satisfies the estimate (6.2). Then, Section 6.2.3 will be dedicated to the proof of analyticity.

6.2.2 Large random matrix model

Given $N$, $\Gamma_N$ and $Y_N$ fixed, we consider an intermediate sequence of Hermitian random matrices $H_{N,M}$ of size
N M by N M . Seeing a generic element A of M NM p C q as anelement of M N p C q b M M p C q , we denote A “ ` A i,i ,j,j ˘ i,j Pr N s i ,j Pr M s “ ÿ i,j Pr N s i ,j Pr M s A i,i ,j,j E i,j b E i ,j . Then, we consider two new random and deterministic matrices. - Let X N,M be a G.U.E. matrix with variance profile Γ
N,M “ Γ N b M,M , where M,M is the matrix whose all entries are one. Hence the variance profile is constant onblocks of size M ˆ M and we can write X N,M “ ÿ i,j Pr N s i ,j Pr M s γ i,j N ´ M ´ x i,j,i ,j E i,j b E i ,j , where the x i,j,i ,j are complex Gaussian random variables, i.i.d. up to the Hermitiansymmetry, centered and such that x i,j,i ,j has variance 1. - We denote by Y N,M “ Y N b I M , where I M is the M ˆ M identity matrix. Itis a deterministic matrix diagonal by blocks of size M ˆ M , so that Y N,M “ ř i,i ,j Y N p i, j q E i,j b E i ,i .We set H N,M “ X N,M ` Y N,M . To avoid ambiguity, we denote by Λ N a generic elementof D N p C q ` and Λ M,N a generic element of D NM p C q ` . We consider the diagonal of thegeneralized resolvent G H N,M : D NM p C q ` Ñ D NM p C q ´ Λ N,M ÞÑ ∆ ”` Λ N,M ´ H N,M ˘ ´ ı , and the deterministic function G ˝ ,MH N : D N p C q ` Ñ D N p C q ´ Λ N ÞÑ p id b M Tr q ” E “ G H N,M p Λ N b I M q ‰ı . (6.3)Note first that since } G ˝ ,MH N p Λ N q} ď }p ℑ m Λ q ´ } , we know that up to a subsequence G ˝ ,MH N p Λ N q has a limit G ˝ H N p Λ N q as M goes to infinity for any Λ N P D N p C q ´ . Moreover,the same computation as Lemma 6.1 yields } G ˝ ,MH N p Λ N q ´ G ˝ ,MH N p Λ N q} ď }p ℑ m Λ N q ´ }}p ℑ m Λ N q ´ } ˆ } Λ N ´ Λ N } . (6.4)Letting M going to infinity along a subsequence, this implies that the estimate (6.2) isvalid for any accumulation point G ˝ H N p Λ N q of G ˝ ,MH N p Λ N q .44e shall now prove that G ˝ ,MH N converges when M goes to infinity to a solution of thefixed point problem. Thanks to Lemma 5.8, we may now apply Equality (4.2) to therandom matrix H N,M : for any Λ
N,M P D NM p C q , we have E “ G H N,M p Λ N,M q ‰ “ ψ Λ N,M ´ E “ G H N,M p Λ N,M q ‰¯ ` Θ M,N , (6.5)where ψ Λ M,N p G q “ G Y M,N ` Λ M,N ´ R N,M p G q ˘ ´ , for any G P D NM p C q , and we have the estimates } Θ M,N } ď γ N ´ ˆ }p ℑ m Λ q ´ } , and } Θ M,N } ď γ N ´ ˆ }p ℑ m Λ q ´ } if Y N is diagonal. Note that γ is indeed themaximum of the variances in the profile Γ N,M . Moreover, because of the definition ofΓ
N,M the map R N,M is given by: for any G P D NM p C q , for any i P r N s , i P r M s R N,M p G qp i, i , i, i q “ ÿ j,j ´ Γ N p i, j q ˆ M,M p i , j q N M ¯ ˆ G p j, j , j, j q“ ÿ j Pr N s ´ Γ N p i, j q N ¯ M ÿ j Pr M s G p j, j , j, j q . Since the above expression does not depends on i , this proves that R N,M p G q “ R N ´` id b M Tr ˘ p G q ¯ b I M . Moreover note that if Λ
N,M “ Λ N b I M then G Y N,M p Λ N,M q “ G Y N p Λ N q b I M . Hencewe get for any G P D NM p C q ψ Λ N b I M p G q “ ψ Λ N ´` id b M Tr ˘ p G q ¯ b I M . Hence, applying id b M Tr in (6.5) yields the following formula: G ˝ ,MH N p Λ N q “ ψ Λ N ` G ˝ ,MH N p Λ N q ˘ ` p id b M Tr q ` Θ M,N ˘ . Since } Θ M,N } goes to zero as M goes to infinity and by the continuity of ψ Λ N , letting M go to infinity proves that any accumulation point G ˝ H N of G ˝ ,MH N is solution of the fixedpoint problem that satisfies (6.2) thanks to Inequality (6.4). Let us justify that the quantities under consideration up to now are analytic functions.
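The block lift used above, a variance profile that is constant on $M \times M$ blocks together with a block-diagonal deformation, is just a pair of Kronecker products. A minimal sketch of the construction (arbitrary small sizes; only the algebraic identities are checked):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 4, 3

Gamma_N = rng.uniform(0.5, 1.5, size=(N, N))
Gamma_N = (Gamma_N + Gamma_N.T) / 2           # symmetric variance profile

Y_N = np.diag(rng.standard_normal(N))         # deterministic deformation

# Gamma_{N,M} = Gamma_N (tensor) 1_{M,M}: profile constant on M x M blocks.
Gamma_NM = np.kron(Gamma_N, np.ones((M, M)))
# Y_{N,M} = Y_N (tensor) I_M: block-diagonal lift of the deformation.
Y_NM = np.kron(Y_N, np.eye(M))
```

Note in particular that the maximal variance $\gamma_{\max}$ is unchanged by the lift, which is used when the Master inequality is applied to $H_{N,M}$.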
Lemma 6.4.
For any Hermitian random matrix $M$, the function $\Lambda \mapsto \mathbb E\big[ (\Lambda - M)^{-1} \big]$ is analytic on $D_N(\mathbb C)^+$.

Proof. Let us first assume that the matrix is deterministic. We prove the lemma by induction on the size $N$ of the matrices. Since for any $m \in \mathbb R$ the map $\lambda \mapsto (\lambda - m)^{-1}$ is analytic on $\mathbb C^+$, the lemma is true for $N =$
1. From now we fix N ě N ´ “ diag p λ , . . . , λ N q and recall that we denote A N “ p Λ ´ M q ´ . For any k P r N s , let A p k q N ´ be the inverse of the N ´ N ´ p Λ ´ M q by removing the k -th line and column. Let m p k q be the vector of size N ´ k -th column of M by removing the k -th entry. Recall the Schur complementformula [8, Appendix A.1.4]: p Λ ´ M q ´ p k, k q “ A N p k, k q “ ´ λ k ´ M p k, k q ´ m p k q˚ A p k q N ´ m p k q ¯ ´ , @ k P r N sp Λ ´ M q ´ p k, ℓ q “ ´ ` A p k q N ´ m p k q˚ ˘ p ℓ q ˆ A N p k, k q , @ k ą ℓ P r N sp Λ ´ M q ´ p k, ℓ q “ ´ ` m p k q A p k q N ´ ˘ p ℓ q ˆ A N p k, k q , @ k ă ℓ P r N s . By induction hypothesis, Λ p k q ÞÑ A p k q N ´ is analytic on D N ´ p C q ` . By Lemma 5.3 wehave ℑ m ´ λ k ´ M p k, k q ´ m p k q˚ A p k q N ´ m p k q ¯ “ ℑ m λ k ´ m p k q˚ ` ℑ m A p k q N ´ ˘ m p k q ě ℑ m λ k Hence the maps Λ
ÞÑ p Λ ´ M q ´ p k, k q are analytic in each variable for each k “ , . . . , N ,and hence so are the maps Λ ÞÑ p Λ ´ M q ´ p k, ℓ q for any k, ℓ .Let now assume that M is random. Each realization Λ ÞÑ p Λ ´ M q ´ is analytic andthe map is bounded. Hence Λ ÞÑ E “ p Λ ´ M q ´ ‰ is also analytic.Hence the map G ˝ ,MH N defined by (6.3) is indeed analytic. Since it is Lipschitz byInequality (6.4), it follows that every accumulation point G ˝ H N of the sequence is alsoanalytic. This finishes the proof of Lemma 6.3. Let G ˝ H N : D N p C ` q Ñ D N p C q ´ be the deterministic equivalent, unique analytic solutionof the fixed point problem G ˝ H N p Λ q “ G Y N ´ Λ ´ R N ` G ˝ H N p Λ q ˘¯ . For reading convenience, we recall that E “ G H N p Λ q ‰ “ E ” ∆ “ p Λ ´ H N q ´ ‰ı satisfies theapproximate subordination property, namely E “ G H N p Λ q ‰ “ G Y N ´ Λ ´ R N ` E “ G H N p Λ q ‰˘¯ ` Θ N p Λ q , (6.6)where Θ N p Λ q “ ∆ “ p Ω H N p Λ q ´ Y N q ´ E N p Λ q ‰ . The operator norm of Θ N p Λ q satisfies byLemma 5.8 } Θ N p Λ q} ď C N where C N “ γ N ´ ˆ }p ℑ m Λ q ´ } (6.7)46n general, and C N “ γ N ´ ˆ }p ℑ m Λ q ´ } (6.8)if Y N is diagonal. The purpose of this section is to prove Lemma 4.1, giving an estimatefor the norm of the difference ›› G ˝ H N p Λ q ´ E “ G H N p Λ q ‰›› .We define two diagonal matrices by˜ G N p Λ q “ E “ G H N p Λ q ‰ ´ Θ N p Λ q , ˜Λ “ Λ ´ R N ` Θ N p Λ q ˘ “ Λ ´ R N ` E “ G H N p Λ q ‰ ´ ˜ G N p Λ q ˘ . Provided we can justify that ˜Λ belongs to D N p C q ` , we have˜ G N p Λ q “ G Y N ´ Λ ´ R N ` E “ G H N p Λ q ‰˘¯ “ G Y N ´ ˜Λ ´ R N ` ˜ G N p Λ q ˘¯ “ ψ ˜Λ ` ˜ G N p Λ q ˘ . (6.9)With I N denoting the identity matrix and using Lemma 5.4 and the bound (6.7) for } Θ N p Λ q} , we have ℑ m ˜Λ “ ℑ m Λ ´ ℑ m R N ` Θ N p Λ q ˘ ě ℑ m Λ ´ } R N ` Θ N p Λ q ˘ } ˆ I N ě ℑ m Λ ´ γ C N ˆ I N , (6.10)where C N is as in (6.7), or as in (6.8) when Y N is diagonal. 
Assuming that2 γ N ´ }p ℑ m Λ ´ q} ă , (6.11)we indeed have ˜Λ P D N p C q ` and so ˜ G N p Λ q is solution of the fixed point problem for ψ ˜Λ .If Y N is diagonal, the same conclusion holds whenever γ N ´ }p ℑ m Λ q ´ } ă , . (6.12)Hence, by Lemma 6.3 we obtain the equality ˜ G N p Λ q “ G ˝ H N p ˜Λ q and so, by Equalities(6.6) and (6.9), we obtain E “ G H N p Λ q ‰ “ G ˝ H N p ˜Λ q ` Θ N p Λ q . (6.13)By Lemma 6.4, the map is Λ ÞÑ E “ G H N p Λ q ‰ is analytic on D N p C q ` . Recall that Θ N p Λ q “ ∆ “ p Ω H N p Λ q ´ Y N q ´ E N ‰ where E N is defined in (5.11) and Ω H N p Λ q is defined in (5.12).One checks easily that the map Λ ÞÑ Θ N p Λ q is analytic on D N p C q ` , which implies thatso are Λ ÞÑ ˜ G N p Λ q and Λ ÞÑ ˜Λ. Hence Equality (6.13) extends by analyticity for allΛ ą ą } E “ G H N p Λ q ‰ ´ G ˝ H N p Λ q} ď } G ˝ H N p ˜Λ q ´ G ˝ H N p Λ q} ` } Θ N p Λ q} . (6.14)Moreover, with the same proof as for (6.2), we have the estimate } G ˝ H N p ˜Λ q ´ G ˝ H N p Λ q} ď }p ℑ m Λ q ´ } }p ℑ m ˜Λ q ´ } } Λ ´ ˜Λ } . (6.15)47e have by Lemma 5.4, } Λ ´ ˜Λ } “ } R N p Θ N p Λ qq} ď γ ax } Θ N p Λ q} . Moreover, under theassumption γ C N ď p ´ δ q ℑ m Λ , for some 0 ă δ ă , (6.16)we obtain by (6.10) that ℑ m ˜Λ ě δ }p ℑ m Λ q ´ } ´ I N . Hence provided that Condition (6.16) holds, we have }p ℑ m ˜Λ q ´ } ď }p ℑ m Λ q ´ }{ δ . Com-bining (6.14) with (6.15) and (6.7), the previous estimates on } Λ ´ ˜Λ } and }p ℑ m ˜Λ q ´ } give } E “ G H N p Λ q ‰ ´ G ˝ H N p Λ q} ď }p ℑ m Λ q ´ } { δ γ } Θ N p Λ q} ` } Θ N p Λ q}ď ` ` }p ℑ m Λ q ´ } { δ γ ˘ C N , which completes the proof of Lemma 4.1. We now have all the ingredients to control the difference between the resolvent p λ I N ´ H N q ´ and the deterministic equivalent G ˝ H N p λ I N q which will finally complete the proofof Theorem 1.1. We first establish results allowing to show that the expectation of the resolvent is adiagonal matrix. 
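The diagonality phenomenon discussed next can be observed directly by Monte Carlo: averaging powers of $\Sigma \circ A$ over draws of a unitarily invariant $A$ (here a GUE matrix, a standard example of a unitarily invariant ensemble) drives the off-diagonal entries of the average to zero. A minimal sketch with arbitrary sizes and an arbitrary deterministic $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_samples = 4, 20_000

Sigma = rng.uniform(0.5, 1.5, size=(N, N))    # arbitrary deterministic Sigma

acc = np.zeros((N, N), dtype=complex)
for _ in range(n_samples):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    A = (Z + Z.conj().T) / np.sqrt(2)         # GUE draw: unitarily invariant
    H = Sigma * A                             # Hadamard product Sigma o A
    acc += H @ H                              # accumulate (Sigma o A)^2
E_H2 = acc / n_samples                        # Monte Carlo E[(Sigma o A)^2]

off_diag = np.abs(E_H2 - np.diag(np.diag(E_H2))).max()
on_diag = np.abs(np.diag(E_H2)).max()
```

The off-diagonal entries of `E_H2` are pure sampling noise of order $n_{\text{samples}}^{-1/2}$, while the diagonal entries stay of order one, as predicted by the exact diagonality of the expectation.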
Recall that a random matrix A is unitarily invariant whenever U AU ˚ has the same law as A for any unitary matrix U . Lemma 7.1.
Let $A$ be an $N \times N$ unitarily invariant random matrix whose entries have finite moments of all orders, and let $\Sigma \in M_N(\mathbb C)$. Then for any $n \ge 1$, the matrix $\mathbb E\big[ (\Sigma \circ A)^n \big]$ is diagonal.

Proof. For any $i_1, i_{n+1} \in [N]$ we have
$$\mathbb E\big[ (\Sigma \circ A)^n \big](i_1, i_{n+1}) = \sum_{i_2, \dots, i_n = 1}^N \Big( \prod_{k=1}^n \sigma(i_k, i_{k+1}) \Big) \times \mathbb E\Big[ \prod_{k=1}^n A(i_k, i_{k+1}) \Big], \qquad (7.1)$$
where $\sigma(i_k, i_{k+1})$ denotes the $(i_k, i_{k+1})$-th entry of the matrix $\Sigma$.

We shall prove that for any $i_1, \dots, i_n \in [N]$, $\mathbb E\big[ \prod_{k=1}^n A(i_k, i_{k+1}) \big] = 0$ whenever $i_1 \ne i_{n+1}$. For this purpose we introduce a matrix function $A_t$ that depends on an implicit parameter $G$: for any anti-Hermitian matrix $G$, i.e. such that $G^* = -G$, and for any $t \in \mathbb R$, we denote $A_t = e^{tG} A e^{-tG}$. Note that $A_0 = A$ and the derivative of $A_t$ with respect to $t$ is $\partial_t A_t = G A_t - A_t G$. Moreover, the unitary invariance of $A$ implies that $A$ and $A_t$ have the same law. In particular, for any $i_1, \dots, i_{n+1} \in [N]$ and any $t \in \mathbb R$ we have
$$\mathbb E\Big[ \prod_{k=1}^n A(i_k, i_{k+1}) \Big] = \mathbb E\Big[ \prod_{k=1}^n A_t(i_k, i_{k+1}) \Big].$$
We now differentiate the above equality with respect to $t$ and take $t =$
0: using Leibnizformula,0 “ E ” B t ` n ź k “ A t p i k , i k ` q ˘ | t “ ı (7.2) “ n ÿ k “ E ” A p i , i q ¨ ¨ ¨ A p i k ´ , i k q ` GA ´ AG ˘ p i k , i k ` q A p i k ` , i k ` q ¨ ¨ ¨ A p i n , i n ` q ı . Recall that the above equality is a priori valid under the assumption that G is anti-Hermitian. But the relation is linear in G , and the set of anti-Hermitian matricesspans M N p C q as a vector space (any matrix A can be written as a linear combinationof Hermitian matrices A “ ℜ e A ` i ℑ m A and a matrix G is Hermitian whenever i G isanti-Hermitian). We can hence specify the equality for the elementary matrix G “ E i ,i .Note that for any k P r N s , we have ` GA ´ AG ˘ p i k , i k ` q “ ` δ i ,i k ´ δ i ,i k ` ˘ A p i k , i k ` q . Hence we obtain from Equation (7.2) a telescopic sum0 “ E “ A p i , i q ¨ ¨ ¨ A p i n , i n ` q ‰ ˆ n ÿ k “ ` δ i ,i k ´ δ i ,i k ` ˘ “ E “ A p i , i q ¨ ¨ ¨ A p i n , i n ` q ‰ p ´ δ i ,i n ` q . If i ‰ i n ` then we get E “ ś nk “ A p i k , i k ` q ‰ “ i , . . . , i n and hence by Equation(7.1) we deduce that E “ p Σ ˝ A q n ‰ is a diagonal matrix. Corollary 7.2.
Let $A$ and $\Sigma$ be as in Lemma 7.1. Assume moreover that $\Sigma \circ A$ is Hermitian. Then for any $\Lambda \in D_N(\mathbb C)^+$, the expectation of the generalized resolvent $\mathbb E\big[ (\Lambda - \Sigma \circ A)^{-1} \big]$ is a diagonal matrix.

Proof. We first assume that there is a constant $B > 0$ such that $\|A\| \le B$. For any matrix $M$ we denote by $\|M\|_F$ its Frobenius norm $\|M\|_F = \big( \mathrm{Tr}[M^* M] \big)^{1/2}$, and we recall that $\|M\| \le \|M\|_F \le \sqrt N \|M\|$. Hence we have, for $M = \Sigma \circ A$,
$$\|\Sigma \circ A\| \le \Big( \mathrm{Tr}\big[ (\Sigma \circ A)^* (\Sigma \circ A) \big] \Big)^{1/2} = \Big( \sum_{i,j} |\sigma(i,j)|^2 \, |A(i,j)|^2 \Big)^{1/2} \le \sigma_{\max} \times \|A\|_F \le \sigma_{\max} \sqrt N \|A\|,$$
where $\sigma(i,j)$ denotes the $(i,j)$-th entry of the matrix $\Sigma$, and $\sigma_{\max} = \max_{i,j} |\sigma(i,j)|$. For any $\Lambda \in D_N(\mathbb C)^+$ such that $\Im m\,\Lambda > \sigma_{\max} \sqrt N B\, I_N$, we have that $\|(\Sigma \circ A)\Lambda^{-1}\| <$
1, andthus the following identity holds p Λ ´ Σ ˝ A q ´ “ ÿ n ě Λ ´ ` p Σ ˝ A q Λ ´ ˘ n , where the convergence of the sum is normal. In particular we can interchange summationand expectation, namely E “ p Λ ´ Σ ˝ A q ´ ‰ “ ÿ n ě Λ ´ E ”` p Σ ˝ A q Λ ´ ˘ n ı . Moreover, we have the equality p Σ ˝ A q Λ ´ “ Σ ˝ A where Σ “ ΣΛ ´ . Hence by Lemma7.1 for any Λ P D N p C q ` and any n ě E “` p Σ ˝ A q Λ ´ ˘ n ‰ is diagonal. So forany Λ such that ℑ m Λ ą σ max ? N B I N , the matrix E “ G X N p Λ q ‰ is also diagonal. Thisfact extends for any Λ P D N p C q ` by analytic continuation thanks to Lemma 6.4.To treat the general case, we use a classical spectral truncation argument. Let usdenote by λ , . . . , λ N and u , . . . , u N the eigenvalues and the associated eigenvectorsof A , so that we have A “ ř Ni “ λ j u ˚ i u i . For any B ą
0, we denote by A p B q thematrix A p B q “ ř Ni “ λ j ` | λ j | ď B ˘ u ˚ i u i , where denote the indicator function. Hence A p B q is uniformly bounded in operator norm by B . By the previous case, the matrix E “ p Λ ´ Σ ˝ A p B q q ´ ‰ is diagonal. Moreover, we have ››› p Λ ´ Σ ˝ A q ´ ´ p Λ ´ Σ ˝ A p B q q ´ ››› ď }p ℑ m Λ ´ q} ›› Σ ˝ ` A p B q ´ A ˘›› ď }p ℑ m Λ ´ q} σ max ? N } A p B q ´ A } . With N fixed, we get that almost surely as B tends to infinity the matrix p Λ ´ Σ ˝ A p B q q ´ converges to p Λ ´ Σ ˝ A q ´ . Since the matrices are bounded in operator norm, theconvergence holds in expectation. In particular, E “ p Λ ´ Σ ˝ A q ´ ‰ is the limit of adiagonal matrix so it is diagonal. We now complete the proof of Theorem 1.1 by combining Lemma 4.1 with a concentrationargument for the resolvent p Λ ´ X N ´ Y N q ´ towards its expectation E “ p Λ ´ H N q ´ ‰ . Lemma 7.3.
Let $\Lambda \in D_N(\mathbb{C})^+$. Then, for any pair of unit vectors $v, w$ (that is, $\|v\| = \|w\| = 1$), one has that, for all $t > 0$,
\[
\mathbb{P}\Big( \big| v^* \big( (\Lambda - H_N)^{-1} - \mathbb{E}\big[ (\Lambda - H_N)^{-1} \big] \big) w \big| \geq t \Big) \leq 4 \exp\Big( - \frac{N t^2}{4\, \gamma_{\max}^2\, \|(\Im m\, \Lambda)^{-1}\|^4} \Big), \tag{7.3}
\]
where $\gamma_{\max} = \max_{i,j} \gamma_N(i,j)$ denotes the maximal entry of the variance profile $\Gamma_N$. Moreover, for any $\lambda \in \mathbb{C}^+$ and all $t > 0$,
\[
\mathbb{P}\Big( \big| g_{H_N}(\lambda) - \mathbb{E}\big[ g_{H_N}(\lambda) \big] \big| \geq t \Big) \leq 4 \exp\Big( - \frac{N^2 t^2\, |\Im m\, \lambda|^4}{16\, \gamma_{\max}^2} \Big), \tag{7.4}
\]
where $g_{H_N}$ is the Stieltjes transform of $H_N$.

To prove this result, we use the following Gaussian concentration inequality for Lipschitz functions (see e.g. [18, Theorem 5.6]).
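Before turning to the proof, the concentration phenomenon quantified by Lemma 7.3 can be checked by a short simulation: the fluctuations of the Stieltjes transform $g_{H_N}(\lambda)$ around its mean are at most of order $\gamma_{\max}/(N |\Im m\, \lambda|^2)$. The profile, deformation and dimensions below are illustrative choices for this sketch only, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 120, 200
lam = 0.4 + 0.9j  # fixed spectral parameter in C^+

# Illustrative symmetric profile Gamma_N and diagonal deformation Y_N
# (chosen arbitrarily for this demo; they are not taken from the paper).
idx = np.arange(N)
gamma = 1.0 + 0.5 * np.cos(np.pi * (idx[:, None] + idx[None, :]) / N)
y = np.linspace(-1.0, 1.0, N)

def stieltjes_sample():
    """g_{H_N}(lam) for one draw of H_N = X_N + Y_N with profile Gamma_N."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    W = (Z + Z.conj().T) / np.sqrt(2)        # GUE normalization: E|W(i,j)|^2 = 1
    H = gamma * W / np.sqrt(N) + np.diag(y)  # H_N = Gamma_N o W / sqrt(N) + Y_N
    return np.mean(1.0 / (lam - np.linalg.eigvalsh(H)))

samples = np.array([stieltjes_sample() for _ in range(reps)])
emp_std = np.sqrt(np.mean(np.abs(samples - samples.mean()) ** 2))
scale = gamma.max() / (N * lam.imag ** 2)  # predicted order gamma_max / (N |Im lam|^2)
print(emp_std, scale)  # the empirical spread stays below a small multiple of `scale`
```

In the simulation the empirical standard deviation of $g_{H_N}(\lambda)$ is of the same order as, and typically well below, the Lipschitz-based scale $\gamma_{\max}/(N |\Im m\, \lambda|^2)$, in line with the sub-Gaussian tail of Inequality (7.4).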
Theorem 7.4.

Let $X = (X_1, \dots, X_n)$ be a vector of $n$ independent standard normal random variables. Let $f : \mathbb{R}^n \to \mathbb{R}$ be an $L$-Lipschitz function for the Euclidean norm of $\mathbb{R}^n$. Then for all $t > 0$ we have
\[
\mathbb{P}\Big( f(X) - \mathbb{E}\big[ f(X) \big] \geq t \Big) \leq e^{-t^2/(2L^2)}. \tag{7.5}
\]

Proof of Lemma 7.3. We let $H_N(\mathbb{C}) \subset M_N(\mathbb{C})$ be the subset of Hermitian matrices. For two unit vectors $v, w$ and any $\Lambda \in D_N(\mathbb{C})^+$ we consider the function
\[
\varphi^{v,w}_\Lambda : H_N(\mathbb{C}) \to \mathbb{C}, \qquad A \mapsto v^* (\Lambda - A - Y_N)^{-1} w.
\]
In order to use Theorem 7.4 for the real and imaginary parts of $\varphi^{v,w}_\Lambda$, we shall estimate its Lipschitz constant. We denote by $\|A\|_F = \big( \mathrm{Tr}[A^* A] \big)^{1/2}$ the Frobenius norm of a matrix $A$, and we use the isomorphism between $H_N(\mathbb{C})$ endowed with $\|\cdot\|_F$ and $\mathbb{R}^{N^2}$ endowed with the Euclidean norm. Using the Cauchy-Schwarz inequality and Lemma 5.3, it follows that, for any $A, A'$ in $H_N(\mathbb{C})$,
\begin{align*}
\big| \varphi^{v,w}_\Lambda(A) - \varphi^{v,w}_\Lambda(A') \big| &= \Big| v^* (\Lambda - A - Y_N)^{-1} \big( A - A' \big) (\Lambda - A' - Y_N)^{-1} w \Big| \\
&\leq \big\| v^* (\Lambda - A - Y_N)^{-1} \big\| \, \big\| (A - A') (\Lambda - A' - Y_N)^{-1} w \big\| \\
&\leq \big\| (\Lambda - A - Y_N)^{-1} \big\| \, \big\| A - A' \big\| \, \big\| (\Lambda - A' - Y_N)^{-1} \big\| \\
&\leq \big\| (\Im m\, \Lambda)^{-1} \big\|^2 \, \big\| A - A' \big\| \leq \big\| (\Im m\, \Lambda)^{-1} \big\|^2 \, \big\| A - A' \big\|_F. \tag{7.6}
\end{align*}
Since $A$ can be decomposed in the basis (5.6) of Hermitian matrices as $A = \sum_{i,j=1}^N \tilde a_{i,j} F_{i,j}$, where the $\tilde a_{i,j}$ are real, we have that $\varphi^{v,w}_\Lambda$ is a Lipschitz function (with Lipschitz constant $\|(\Im m\, \Lambda)^{-1}\|^2$) of the $N \times N$ matrix $\tilde A = (\tilde a_{i,j})$ having real entries. Now, recall the decomposition (5.7) of $X_N$ into the basis (5.6),
\[
X_N = \sum_{i,j=1}^N \frac{\gamma_{i,j}}{\sqrt{N}}\, \tilde x_{i,j} F_{i,j}, \tag{7.7}
\]
where the $\tilde x_{i,j}$ are i.i.d. real Gaussian random variables that are centered and of variance one. By Inequality (7.6) and after a change of variables, we get that the function $(\tilde x_{i,j})_{i,j} \mapsto \varphi^{v,w}_\Lambda(X_N)$ is a Lipschitz function (that is complex-valued) with Lipschitz constant $L = \|(\Im m\, \Lambda)^{-1}\|^2\, \gamma_{\max}/\sqrt{N}$. Applying Theorem 7.4 to the real and imaginary parts (each time to the function and its opposite, at level $t/\sqrt{2}$) then yields Inequality (7.3).
Let us now fix $\lambda \in \mathbb{C}^+$ and recall that the Stieltjes transform of $H_N = X_N + Y_N$ is the map
\[
g_{H_N}(\lambda) = \frac{1}{N} \mathrm{Tr}\big[ G_{H_N}(\lambda I_N) \big] = \frac{1}{N} \mathrm{Tr}\big[ (\lambda I_N - H_N)^{-1} \big],
\]
and let us consider the mapping $A \mapsto \varphi_\lambda(A) := \mathrm{Tr}\big[ (\lambda I_N - A - Y_N)^{-1} \big]$ for $A \in H_N(\mathbb{C})$. For any $A, A'$ in $H_N(\mathbb{C})$, we remark that
\[
\big| \varphi_\lambda(A) - \varphi_\lambda(A') \big| = \Big| \mathrm{Tr}\big[ (\lambda I_N - A - Y_N)^{-1} (A - A') (\lambda I_N - A' - Y_N)^{-1} \big] \Big| = \Big| \mathrm{Tr}\big[ (\lambda I_N - A' - Y_N)^{-1} (\lambda I_N - A - Y_N)^{-1} (A - A') \big] \Big|. \tag{7.8}
\]
To obtain an appropriate upper bound for (7.8), we use the following inequalities (see e.g. Lemma II.2 in [42]): denoting by $\sigma_1(A) \leq \cdots \leq \sigma_N(A)$ the singular values of $A$,
\[
\sum_{i=1}^N \sigma_{N-i+1}(\Re e\, B)\, \sigma_i(C) \leq \Re e\big( \mathrm{Tr}[BC] \big) \leq \sum_{i=1}^N \sigma_i(\Re e\, B)\, \sigma_i(C), \tag{7.9}
\]
which hold for any $B \in M_N(\mathbb{C})$ and $C \in H_N(\mathbb{C})$. For two such matrices, Inequality (7.9) combined with the Cauchy-Schwarz inequality implies that
\[
\big| \Re e\big( \mathrm{Tr}[BC] \big) \big| \leq \sigma_N(\Re e\, B) \sum_{i=1}^N |\sigma_i(C)| \leq \|\Re e\, B\|\, \sqrt{N}\, \sqrt{ \sum_{i=1}^N |\sigma_i(C)|^2 }.
\]
Since $\|C\|_F^2 = \sum_{i=1}^N |\sigma_i(C)|^2$ and $\|\Re e\, B\| \leq \|B\|$ (by the same argument as for the imaginary part in the proof of Lemma 5.1), one finally obtains that $|\Re e( \mathrm{Tr}[BC] )| \leq \sqrt{N}\, \|B\|\, \|C\|_F$. Using the fact that $\Im m\, A = -\Re e(iA)$, one obtains by similar arguments that $|\Im m( \mathrm{Tr}[BC] )| \leq \sqrt{N}\, \|B\|\, \|C\|_F$, which finally yields
\[
\big| \mathrm{Tr}[BC] \big| \leq \big| \Re e( \mathrm{Tr}[BC] ) \big| + \big| \Im m( \mathrm{Tr}[BC] ) \big| \leq 2 \sqrt{N}\, \|B\|\, \|C\|_F.
\]
Hence, combining the above inequality with (7.8) and Lemma 5.3, it follows that
\[
\big| \varphi_\lambda(A) - \varphi_\lambda(A') \big| \leq 2 \sqrt{N}\, |\Im m\, \lambda|^{-2}\, \|A - A'\|_F.
\]
Therefore, thanks to the decomposition (7.7) of $X_N$, the mapping
\[
(\tilde x_{i,j})_{i,j} \mapsto \frac{1}{N} \mathrm{Tr}\big[ (\lambda I_N - X_N - Y_N)^{-1} \big] = \frac{1}{N} \varphi_\lambda(X_N)
\]
is a Lipschitz function with Lipschitz constant $L = 2\, |\Im m\, \lambda|^{-2}\, \gamma_{\max}/N$.
Therefore, using again Gaussian concentration for Lipschitz functions, one obtains Inequality (7.4), which completes the proof of Lemma 7.3.

Now, let us fix $0 < \delta < 1$, and consider $\Lambda \in D_N(\mathbb{C})^+$ satisfying
\[
\frac{2 \gamma_{\max}^2}{N} \big\| (\Im m\, \Lambda)^{-1} \big\|^2 \leq 1 - \delta,
\]
so that $\big\| \mathbb{E}[ G_{H_N}(\Lambda) ] - G^\circ_{H_N}(\Lambda) \big\| \leq t^{(1)}_N$ by Lemma 4.1, where
\[
t^{(1)}_N := \Big( 1 + \frac{\gamma_{\max}^2}{\delta} \big\| (\Im m\, \Lambda)^{-1} \big\|^2 \Big) \frac{\gamma_{\max}^2 \| (\Im m\, \Lambda)^{-1} \|^3}{N}. \tag{7.10}
\]
Since $G_{H_N}(\Lambda)$ and its expectation are diagonal matrices, one has that the operator norm of their difference satisfies
\[
\big\| G_{H_N}(\Lambda) - \mathbb{E}[ G_{H_N}(\Lambda) ] \big\| = \max_{1 \leq i \leq N} \Big| (\Lambda - H_N)^{-1}[i,i] - \mathbb{E}\big[ (\Lambda - H_N)^{-1} \big][i,i] \Big|.
\]
Thus, combining Inequality (7.3) with a union bound yields the following concentration inequality: for all $t > 0$,
\[
\mathbb{P}\Big( \big\| G_{H_N}(\Lambda) - \mathbb{E}[ G_{H_N}(\Lambda) ] \big\| \geq t \Big) \leq 4 N \exp\Big( - \frac{N t^2}{4\, \gamma_{\max}^2 \| (\Im m\, \Lambda)^{-1} \|^4} \Big).
\]
Hence, taking $t = t^{(2)}_N := 2 \gamma_{\max} \sqrt{(d+1) \log(N)}\, \| (\Im m\, \Lambda)^{-1} \|^2\, N^{-1/2}$ (for some $d > 0$),
\[
\mathbb{P}\Big( \big\| G_{H_N}(\Lambda) - G^\circ_{H_N}(\Lambda) \big\| \geq t^{(1)}_N + t^{(2)}_N \Big) \leq 4 N^{-d},
\]
which proves Inequality (1.14), and completes the proof of Theorem 1.1 in the general case. The case where $Y_N$ is diagonal follows similarly.

Then, to derive the proof of Corollary 1.2, we use the concentration inequality (7.4) for the Stieltjes transform $g_{H_N}(\lambda)$. Since $g^\circ_{H_N}(\lambda) = \frac{1}{N} \mathrm{Tr}[ G^\circ_{H_N}(\lambda I_N) ]$ and given that $| \frac{1}{N} \mathrm{Tr}[A] | \leq \|A\|$ for any diagonal matrix $A \in M_N(\mathbb{C})$, we obtain from Lemma 4.1 that, for $\lambda$ satisfying Condition (1.16),
\[
\big| \mathbb{E}[ g_{H_N}(\lambda) ] - g^\circ_{H_N}(\lambda) \big| \leq \tilde t^{(1)}_N, \quad \text{where} \quad \tilde t^{(1)}_N := \Big( 1 + \frac{\gamma_{\max}^2}{\delta\, |\Im m\, \lambda|^2} \Big) \frac{\gamma_{\max}^2}{N (\Im m\, \lambda)^3}. \tag{7.11}
\]
Therefore, taking $t = \tilde t^{(2)}_N := 4 \gamma_{\max} \sqrt{d \log(N)}\, |\Im m\, \lambda|^{-2}\, N^{-1}$ (for some $d > 0$), one obtains, by combining Inequalities (7.4) and (7.11), that
\[
\mathbb{P}\Big( \big| g_{H_N}(\lambda) - g^\circ_{H_N}(\lambda) \big| \geq \tilde t^{(1)}_N + \tilde t^{(2)}_N \Big) \leq 4 N^{-d},
\]
which proves Inequality (1.17). The case where $Y_N$ is diagonal also follows similarly, and this completes the proof of Corollary 1.2.

We now prove Corollary 1.3, assuming hence that $Y_N$ is a diagonal matrix. We take $\Lambda = \lambda I_N$ with $\lambda$ satisfying $\Im m\, \lambda \geq \gamma_{\max} \big( \tfrac{2}{N(1-\delta)} \big)^{1/2}$. Since $Y_N$ is supposed to be Hermitian, it is a diagonal matrix with real entries. Therefore, by Corollary 7.2, one has that $\mathbb{E}\big[ (\lambda I_N - H_N)^{-1} \big]$ is diagonal. Then, we remark that
\[
\| \beta_k(\lambda) - \beta^\circ_k(\lambda) \| \leq \big\| U_{N,k}^* \big( (\lambda I_N - H_N)^{-1} - G^\circ_{H_N}(\lambda I_N) \big) U_{N,k} \big\| \, \| \Theta_k \|. \tag{7.12}
\]
Now, with
\[
\hat t^{(1)}_N := \Big( 1 + \frac{\gamma_{\max}^2}{\delta\, |\Im m\, \lambda|^2} \Big) \frac{\gamma_{\max}^2}{N (\Im m\, \lambda)^3},
\]
we have
\[
\big\| U_{N,k}^* \big( \mathbb{E}[ G_{H_N}(\lambda I_N) ] - G^\circ_{H_N}(\lambda I_N) \big) U_{N,k} \big\| \leq \big\| \mathbb{E}[ G_{H_N}(\lambda I_N) ] - G^\circ_{H_N}(\lambda I_N) \big\| \leq \hat t^{(1)}_N. \tag{7.13}
\]
Moreover, if we denote by $u_1, \dots, u_k$ the columns of the matrix $U_{N,k}$, we have
\[
\big\| U_{N,k}^* \big( (\lambda I_N - H_N)^{-1} - \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] \big) U_{N,k} \big\| \leq k \max_{1 \leq \ell, \ell' \leq k} \Big| u_\ell^* \big( (\lambda I_N - H_N)^{-1} - \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] \big) u_{\ell'} \Big|.
\]
Thus, by combining Inequality (7.3) with the fact that $\mathbb{E}[ (\lambda I_N - H_N)^{-1} ] = \mathbb{E}[ G_{H_N}(\lambda I_N) ]$ (since $\mathbb{E}[ (\lambda I_N - H_N)^{-1} ]$ is diagonal), it follows by a union bound argument that, for all $t > 0$,
\[
\mathbb{P}\Big( \tfrac{1}{k} \big\| U_{N,k}^* \big( (\lambda I_N - H_N)^{-1} - \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big) U_{N,k} \big\| \geq t \Big) \leq 4 k^2 \exp\Big( - \frac{N t^2\, |\Im m\, \lambda|^4}{4\, \gamma_{\max}^2} \Big). \tag{7.14}
\]
Therefore, combining Inequalities (7.12), (7.13) and (7.14), we obtain that
\[
\mathbb{P}\Big( \| \beta_k(\lambda) - \beta^\circ_k(\lambda) \| \geq \| \Theta_k \| \big( k\, \bar t^{(2)}_N + \hat t^{(1)}_N \big) \Big) \leq 4 k^2 N^{-d}, \quad \text{with} \quad \bar t^{(2)}_N := 2 \gamma_{\max} \sqrt{(d+1) \log(N)}\, |\Im m\, \lambda|^{-2}\, N^{-1/2},
\]
which finally yields Inequality (1.19).

Let us now prove Corollary 1.4, where $Y_N$ is not necessarily diagonal. In particular, $\mathbb{E}\big[ (\lambda I_N - H_N)^{-1} \big]$ is not necessarily a diagonal matrix. Thus, we shall use the deterministic equivalent $\tilde \beta^\circ_k(\lambda)$ defined by (1.21) to approximate $\beta_k(\lambda)$.
First, as previously, we remark that
\[
\| \beta_k(\lambda) - \tilde \beta^\circ_k(\lambda) \| \leq \Big\| U_{N,k}^* \Big( (\lambda I_N - H_N)^{-1} - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \Big) U_{N,k} \Big\| \, \| \Theta_k \|, \tag{7.15}
\]
where $\Omega^\circ_{H_N}(\lambda I_N) = \lambda I_N - R_N\big( G^\circ_{H_N}(\lambda I_N) \big)$. By the same arguments used to derive Inequality (7.14), it follows that
\[
\mathbb{P}\Big( \tfrac{1}{k} \big\| U_{N,k}^* \big( (\lambda I_N - H_N)^{-1} - \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] \big) U_{N,k} \big\| \geq t \Big) \leq 4 k^2 \exp\Big( - \frac{N t^2\, |\Im m\, \lambda|^4}{4\, \gamma_{\max}^2} \Big). \tag{7.16}
\]
Hence, the only difference with the setting where $Y_N$ is diagonal is the control of the term $\| U_{N,k}^* ( \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - ( \Omega^\circ_{H_N}(\lambda I_N) - Y_N )^{-1} ) U_{N,k} \|$, which is obviously bounded by $\| \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - ( \Omega^\circ_{H_N}(\lambda I_N) - Y_N )^{-1} \|$. We now consider the decomposition
\begin{align*}
\big\| \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\|
&\leq \big\| \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \\
&\quad + \big\| \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\|,
\end{align*}
where $\Omega_{H_N}(\lambda I_N) = \lambda I_N - R_N\big( \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big)$. By combining Lemma 5.3, Lemma 5.7 and Lemma 5.8, we obtain that
\[
\big\| \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \leq \frac{\gamma_{\max}^2}{N (\Im m\, \lambda)^3}. \tag{7.17}
\]
Now, using the equality
\[
\big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} = \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big( \Omega^\circ_{H_N}(\lambda I_N) - \Omega_{H_N}(\lambda I_N) \big) \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1},
\]
one has that
\begin{align*}
\big\| \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\|
&\leq \big\| \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \, \big\| \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \\
&\quad \times \big\| R_N\big( G^\circ_{H_N}(\lambda I_N) \big) - R_N\big( \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big) \big\|.
\end{align*}
By Lemma 5.3 and Lemma 5.7, one obtains that $\| ( \Omega_{H_N}(\lambda I_N) - Y_N )^{-1} \| \leq (\Im m\, \lambda)^{-1}$. Using again Lemma 5.3, one has that $\| ( \Omega^\circ_{H_N}(\lambda I_N) - Y_N )^{-1} \| \leq \| ( \Im m\, \Omega^\circ_{H_N}(\lambda I_N) )^{-1} \|$. Since $\Omega^\circ_{H_N}(\lambda I_N) = \lambda I_N - R_N( G^\circ_{H_N}(\lambda I_N) )$, and given that $\Im m\, G^\circ_{H_N}(\lambda I_N) < 0$, one has $\Im m\big( R_N( G^\circ_{H_N}(\lambda I_N) ) \big) \leq 0$. Consequently, one has that $\Im m\, \Omega^\circ_{H_N}(\lambda I_N) \geq \Im m\, \lambda\, I_N$, and this finally yields $\| ( \Omega^\circ_{H_N}(\lambda I_N) - Y_N )^{-1} \| \leq (\Im m\, \lambda)^{-1}$.

Then, using Lemma 5.4, we remark that
\[
\big\| R_N\big( G^\circ_{H_N}(\lambda I_N) \big) - R_N\big( \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big) \big\| \leq \gamma_{\max}^2\, \big\| G^\circ_{H_N}(\lambda I_N) - \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big\|,
\]
and therefore, if $\Im m\, \lambda \geq \gamma_{\max} \big( \tfrac{2}{N(1-\delta)} \big)^{1/2}$, then Lemma 4.1 implies that
\[
\big\| R_N\big( G^\circ_{H_N}(\lambda I_N) \big) - R_N\big( \mathbb{E}[ G_{H_N}(\lambda I_N) ] \big) \big\| \leq \Big( 1 + \frac{\gamma_{\max}^2}{\delta\, |\Im m\, \lambda|^2} \Big) \frac{\gamma_{\max}^4}{N (\Im m\, \lambda)^3}.
\]
Therefore, we obtain that
\[
\big\| \big( \Omega_{H_N}(\lambda I_N) - Y_N \big)^{-1} - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \leq \Big( 1 + \frac{\gamma_{\max}^2}{\delta\, |\Im m\, \lambda|^2} \Big) \frac{\gamma_{\max}^4}{N (\Im m\, \lambda)^5}. \tag{7.18}
\]
Hence, the use of Inequalities (7.17) and (7.18) yields that
\[
\big\| \mathbb{E}[ (\lambda I_N - H_N)^{-1} ] - \big( \Omega^\circ_{H_N}(\lambda I_N) - Y_N \big)^{-1} \big\| \leq \frac{\gamma_{\max}^2}{N (\Im m\, \lambda)^3} + \Big( 1 + \frac{\gamma_{\max}^2}{\delta\, |\Im m\, \lambda|^2} \Big) \frac{\gamma_{\max}^4}{N (\Im m\, \lambda)^5}. \tag{7.19}
\]
Finally, combining Inequalities (7.15), (7.16) and (7.19) yields Inequality (1.22), which completes the proofs of the results stated in Section 1.2.

7.3 Convergence of the fixed point algorithm

A key step in these numerical experiments is the numerical approximation of the solution of the fixed point equation (1.12) through an iterative algorithm, whose convergence is first briefly discussed.
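As a concrete illustration, the fixed-point iteration can be sketched in a few lines for a diagonal $Y_N$ and $\Lambda = \lambda I_N$. The explicit form assumed here for the map $\psi_\Lambda$ is the Schwinger-Dyson (quadratic vector equation) form $g_i \mapsto 1/\big(\lambda - y_i - \frac{1}{N}\sum_j \gamma_N(i,j)^2 g_j\big)$; this form, the damping, the profile and the dimensions are assumptions of the sketch, not quotations of Equation (1.12).

```python
import numpy as np

def psi(g, lam, y, gamma2):
    """One step of the (assumed) fixed-point map: g_i <- 1/(lam - y_i - (S g)_i),
    where (S g)_i = (1/N) * sum_j gamma2[i, j] * g_j."""
    N = len(y)
    return 1.0 / (lam - y - gamma2 @ g / N)

def deterministic_equivalent(lam, y, gamma2, n_iter=2000, tol=1e-12):
    """Iterate psi from the diagonal initial guess g = 1/lam, with mild
    damping to stabilize the iteration for moderate Im(lam)."""
    g = np.full(len(y), 1.0 / lam, dtype=complex)
    for _ in range(n_iter):
        g_new = 0.5 * g + 0.5 * psi(g, lam, y, gamma2)
        if np.max(np.abs(g_new - g)) < tol:
            return g_new
        g = g_new
    return g

rng = np.random.default_rng(0)
N = 400
idx = np.arange(N)
gamma2 = (1.0 + 0.5 * np.cos(np.pi * (idx[:, None] + idx[None, :]) / N)) ** 2  # variances gamma(i,j)^2
y = np.linspace(-1.0, 1.0, N)  # diagonal deformation Y_N
lam = 0.3 + 0.8j               # spectral parameter in C^+

g_fp = deterministic_equivalent(lam, y, gamma2)

# Compare (1/N) Tr of the fixed point with the empirical Stieltjes transform
# of one sample of H_N = X_N + Y_N with the same profile.
Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
W = (Z + Z.conj().T) / np.sqrt(2)
H = np.sqrt(gamma2) * W / np.sqrt(N) + np.diag(y)
g_emp = np.mean(1.0 / (lam - np.linalg.eigvalsh(H)))

print(abs(np.mean(g_fp) - g_emp))  # small: the two quantities agree up to terms vanishing with N
```

The iterates stay in $D_N(\mathbb{C})^-$ (each coordinate keeps a negative imaginary part), which is the invariance underlying the convergence statement of Lemma 7.5; the normalized trace of the fixed point matches the empirical Stieltjes transform of a sampled matrix up to small corrections.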
Lemma 7.5.

For any diagonal function $G^{(0)}_N : D_N(\mathbb{C})^+ \to D_N(\mathbb{C})^-$, we consider the sequence of diagonal functions $\big( G^{(n)}_N \big)_{n \geq 0}$ given by $G^{(n+1)}_N(\Lambda) = \psi_\Lambda\big( G^{(n)}_N(\Lambda) \big)$ for any $\Lambda \in D_N(\mathbb{C})^+$. Then, as $n$ goes to infinity, the sequence $G^{(n)}_N(\Lambda)$ converges to $G^\circ_{H_N}(\Lambda)$, where $G^\circ_{H_N}$ denotes the deterministic equivalent characterized in Lemma 6.3.

Proof. Recall that by Lemma 6.1 the sequence $\big( G^{(n)}_N(\Lambda) \big)_{n \geq 0}$ is bounded, and so, up to a subsequence, it converges to some diagonal matrix $G_N(\Lambda)$. If $\Im m\, \Lambda > \gamma_{\max} I_N$, then by Corollary 6.2 we have $G_N(\Lambda) = G^\circ_{H_N}(\Lambda)$, by contractivity of the fixed point problem. Moreover, for each $n$, the function $G^{(n)}_N$ is an analytic function of the variable $\Lambda$, uniformly bounded on the set of $\Lambda$ such that $\Im m\, \Lambda > \varepsilon I_N$, for any $\varepsilon > 0$. By dominated convergence, any subsequential limit $G_N$ is analytic. By analytic continuation, all subsequential limits coincide and are equal to $G^\circ_{H_N}$.

References

[1]
O. Ajanki, T. Krüger, and L. Erdős, Singularities of solutions to quadratic vector equations on the complex upper half-plane, Communications on Pure and Applied Mathematics, 70 (2017), pp. 1672-1705.
[2] O. H. Ajanki, L. Erdős, and T. Krüger, Universality for general Wigner-type matrices, Probability Theory and Related Fields, 169 (2017), pp. 667-727.
[3] O. H. Ajanki, L. Erdős, and T. Krüger, Stability of the matrix Dyson equation and random matrices with correlations, Probability Theory and Related Fields, 173 (2019), pp. 293-373.
[4] J. Alt, L. Erdős, and T. Krüger, Local law for random Gram matrices, Electron. J. Probab., 22 (2017), 41 pp.
[5] J. Alt, L. Erdős, and T. Krüger, Local inhomogeneous circular law, Ann. Appl. Probab., 28 (2018), pp. 148-203.
[6] J. Alt, L. Erdős, T. Krüger, and Y. Nemish, Location of the spectrum of Kronecker random matrices, Ann. Inst. H. Poincaré Probab. Statist., 55 (2019), pp. 661-696.
[7] B. Au, G. Cébron, A. Dahlqvist, F. Gabriel, and C. Male, Large permutation invariant random matrices are asymptotically free over the diagonal, arXiv e-prints, (2018), arXiv:1805.07045.
[8] Z. Bai and J. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer Series in Statistics, Springer New York, 2012.
[9] Z. Bai and J. W. Silverstein, Spectral analysis of large dimensional random matrices, Springer Series in Statistics, Springer, New York, second ed., 2010.
[10] J. Baik, G. Ben Arous, and S. Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab., 33 (2005), pp. 1643-1697.
[11] J. Bazerque, G. Mateos, and G. Giannakis, Inference of Poisson count processes using low-rank tensor data, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2013, pp. 5989-5993.
[12] S. T. Belinschi, H. Bercovici, M. Capitaine, and M. Février, Outliers in the spectrum of large deformed unitarily invariant models, Ann. Probab., 45 (2017), pp. 3571-3625.
[13] F. Benaych-Georges and A. Knowles, Lectures on the local semicircle law for Wigner matrices, arXiv e-prints, (2016), arXiv:1601.04055.
[14] F. Benaych-Georges and R. R. Nadakuditi, The singular values and vectors of low rank perturbations of large rectangular random matrices, J. Multivariate Analysis, 111 (2012), pp. 120-135.
[15] P. Biane, Processes with free increments, Math. Z., 227 (1998), pp. 143-174.
[16] J. Bigot, C. Deledalle, and D. Féral, Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising, Journal of Machine Learning Research, 18 (2017), pp. 1-50.
[17] L. V. Bogachev, S. A. Molchanov, and L. A. Pastur, On the density of states of random band matrices, Mat. Zametki, 50 (1991), pp. 31-42, 157.
[18] S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities, Oxford University Press, Oxford, 2013. A nonasymptotic theory of independence, with a foreword by Michel Ledoux.
[19] E. J. Candès, C. A. Sing-Long, and J. D. Trzasko, Unbiased risk estimates for singular value thresholding and spectral estimators, IEEE Trans. Signal Process., 61 (2013), pp. 4643-4657.
[20] Y. Cao, A. Zhang, and H. Li, Multi-sample estimation of bacterial composition matrix in metagenomics data, Preprint, arXiv:1706.02380, (2017).
[21] M. Capitaine, Additive/multiplicative free subordination property and limiting eigenvectors of spiked additive deformations of Wigner matrices and spiked sample covariance matrices, J. Theoret. Probab., 26 (2013), pp. 595-648.
[22] M. Capitaine, C. Donati-Martin, and D. Féral, The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations, Ann. Probab., 37 (2009), pp. 1-47.
[23] M. Capitaine, C. Donati-Martin, D. Féral, and M. Février, Free convolution with a semicircular distribution and eigenvalues of spiked deformations of Wigner matrices, Electron. J. Probab., 16 (2011), no. 64, pp. 1750-1792.
[24] F. Chapon, R. Couillet, W. Hachem, and X. Mestre, The outliers among the singular values of large rectangular random matrices with additive fixed rank deformation, Markov Processes and Related Fields, 20 (2014), pp. 183-228.
[25] N. Cook, W. Hachem, J. Najim, and D. Renfrew, Non-Hermitian random matrices with a variance profile (I): deterministic equivalents and limiting ESDs, Electron. J. Probab., 23 (2018), 61 pp.
[26] D. Donoho and M. Gavish, Minimax risk of matrix denoising by singular value thresholding, Ann. Statist., 42 (2014), pp. 2413-2440.
[27] R. B. Dozier and J. W. Silverstein, On the empirical distribution of eigenvalues of large dimensional information-plus-noise-type matrices, J. Multivariate Anal., 98 (2007), pp. 678-694.
[28] L. Erdős, B. Schlein, and H.-T. Yau, Universality of random matrices and local relaxation flow, Invent. Math., 185 (2011), pp. 75-119.
[29] L. Erdős, The matrix Dyson equation and its applications for random matrices, Preprint, arXiv:1903.10060, (2019).
[30] L. Erdős, T. Krüger, and D. Schröder, Random matrices with slow correlation decay, Forum of Mathematics, Sigma, 7 (2019), p. e8.
[31] L. Erdős, H.-T. Yau, and J. Yin, Bulk universality for generalized Wigner matrices, Probability Theory and Related Fields, 154 (2012), pp. 341-407.
[32] J. Fan, Y. Fan, X. Han, and J. Lv, Asymptotic theory of eigenvectors for large random matrices, Preprint, arXiv:1902.06846, (2019).
[33] D. Féral and S. Péché, The largest eigenvalue of rank one deformation of large Wigner matrices, Comm. Math. Phys., 272 (2007), pp. 185-228.
[34] Z. Füredi and J. Komlós, The eigenvalues of random symmetric matrices, Combinatorica, 1 (1981), pp. 233-241.
[35] V. L. Girko, Theory of stochastic canonical equations. Vol. I, vol. 535 of Mathematics and its Applications, Kluwer Academic Publishers, Dordrecht, 2001.
[36] V. L. Girko, Theory of stochastic canonical equations. Vol. II, vol. 535 of Mathematics and its Applications, Kluwer Academic Publishers, Dordrecht, 2001.
[37] U. Haagerup and S. Thorbjørnsen, A new application of random matrices: Ext(C*_red(F_2)) is not a group, Ann. of Math. (2), 162 (2005), pp. 711-775.
[38] W. Hachem, P. Loubaton, and J. Najim, The empirical distribution of the eigenvalues of a Gram matrix with a given variance profile, Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 42 (2006), pp. 649-670.
[39] W. Hachem, P. Loubaton, and J. Najim, Deterministic equivalents for certain functionals of large random matrices, Ann. Appl. Probab., 17 (2007), pp. 875-930.
[40] D. R. Jocic, Cauchy-Schwarz norm inequalities for weak*-integrals of operator valued functions, J. Funct. Anal., 218 (2005), pp. 318-346.
[41] M. Kuś, M. Lewenstein, and F. Haake, Density of eigenvalues of random band matrices, Phys. Rev. A (3), 44 (1991), pp. 2800-2808.
[42] J. B. Lasserre, A trace inequality for matrix product, IEEE Transactions on Automatic Control, 40 (1995), pp. 1500-1501.
[43] L. T. Liu, E. Dobriban, and A. Singer, ePCA: high dimensional exponential family PCA, Ann. Appl. Stat., 12 (2018), pp. 2121-2150.
[44] P. Loubaton and P. Vallet, Almost sure localization of the eigenvalues in a Gaussian information plus noise model. Application to the spiked models, Electron. J. Probab., 16 (2011), pp. 1934-1959.
[45] C. Male, Traffic distributions and independence: permutation invariant random matrices and the three notions of independence, arXiv e-prints, (2011), arXiv:1111.4662.
[46] C. Male, The norm of polynomials in large random and deterministic matrices, Probability Theory and Related Fields, (June 2011), pp. 1-56.
[47] R. R. Nadakuditi, OptShrink: an algorithm for improved low-rank signal matrix denoising by optimal, data-driven singular value shrinkage, IEEE Trans. Inform. Theory, 60 (2014), pp. 3002-3018.
[48] L. Pastur, Eigenvalue distribution of random operators and matrices, Astérisque, (1992), Exp. No. 758, pp. 445-461. Séminaire Bourbaki, Vol. 1991/92.
[49] L. A. Pastur, The spectrum of random matrices, Teoret. Mat. Fiz., 10 (1972), pp. 102-112.
[50] V. Paulsen, Completely Bounded Maps and Operator Algebras, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2002.
[51] S. Péché, The largest eigenvalue of small rank perturbations of Hermitian random matrices, Probab. Theory Related Fields, 134 (2006), pp. 127-173.
[52] K. Rajan and L. F. Abbott, Eigenvalue spectra of random matrices for neural networks, Physical Review Letters, 97 (2006).
[53] J. Salmon, Z. T. Harmany, C. Deledalle, and R. Willett, Poisson noise reduction with non-local PCA, Journal of Mathematical Imaging and Vision, 48 (2014), pp. 279-294.
[54] V. Scheidemann, Introduction to Complex Analysis in Several Variables, Birkhäuser Basel, 2005.
[55] A. A. Shabalin and A. B. Nobel, Reconstruction of a low-rank matrix in the presence of Gaussian noise, J. Multivariate Anal., 118 (2013), pp. 67-76.
[56] D. Shlyakhtenko, Random Gaussian band matrices and freeness with amalgamation, Internat. Math. Res. Notices, (1996), pp. 1013-1025.
[57] T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics, Acta Math., 206 (2011), pp. 127-204.
[58] M. Udell, C. Horn, R. Zadeh, and S. Boyd, Generalized low rank models, Foundations and Trends in Machine Learning, 9 (2016), pp. 1-118.
[59] D. Voiculescu, The coalgebra of the free difference quotient and free probability, Internat. Math. Res. Notices, (2000), pp. 79-106.
[60] D. V. Voiculescu, K. J. Dykema, and A. Nica, Free random variables, vol. 1 of CRM Monograph Series, American Mathematical Society, Providence, RI, 1992. A noncommutative probability approach to free products with applications to random matrices, operator algebras and harmonic analysis on free groups.
[61] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math. (2), 67 (1958), pp. 325-327.
[62]