Local Stability of the Free Additive Convolution
Zhigang Bao † IST Austria [email protected]
László Erdős ∗ IST Austria [email protected]
Kevin Schnelli † IST Austria [email protected]
We prove that the system of subordination equations, defining the free additive convolution of two probability measures, is stable away from the edges of the support and blow-up singularities by showing that the recent smoothness condition of Kargin is always satisfied. As an application, we consider the local spectral statistics of the random matrix ensemble A + UBU*, where U is a Haar distributed random unitary or orthogonal matrix, and A and B are deterministic matrices. In the bulk regime, we prove that the empirical spectral distribution of A + UBU* concentrates around the free additive convolution of the spectral distributions of A and B on scales down to N^{−2/3}.

Keywords: Free convolution, subordination, local eigenvalue density
AMS Subject Classification (2010): 46L54, 60B20

1. Introduction
One of the basic concepts of free probability theory is the free additive convolution of two probability laws in a non-commutative probability space; it describes the law of the sum of two free random variables. In the case of a bounded self-adjoint random variable, its law can be identified with a probability measure of compact support on the real line. Hence the free additive convolution of two probability measures is a well-defined concept and it is characteristically different from the classical convolution.

In this paper, we prove a local stability result for the free additive convolution. A direct consequence is the continuity of the free additive convolution in a much stronger topology than established earlier by Bercovici and Voiculescu [10]. A second application of our stability result is to establish a local law on a very small scale for the eigenvalue density of the random matrix ensemble A + UBU*, where U is a Haar distributed unitary or orthogonal matrix and A, B are deterministic N by N Hermitian matrices.

The free additive convolution was originally introduced by Voiculescu [36] for the sum of free bounded noncommutative random variables in an algebraic setup (see Maassen [32] and Bercovici and Voiculescu [10] for extensions to the unbounded case). The Stieltjes transform of the free additive convolution is related to the Cauchy–Stieltjes transforms of the original measures by an elegant analytic change of variables. This subordination phenomenon was first observed by Voiculescu [38] in a generic situation and extended to full generality by Biane [14]. In fact, the subordination equations, see (2.5)-(2.6) below, may directly be used to define the free additive convolution.

∗ Partially supported by ERC Advanced Grant RANMAT No. 338804.
† Supported by ERC Advanced Grant RANMAT No. 338804.
This analytic definition was given independently by Belinschi and Bercovici [4] and by Chistyakov and Götze [18]; for further details we refer to, e.g., [39, 27, 2].

Kargin [30] pointed out that the analytic approach to the subordination equations, in contrast to the algebraic one, allows one to effectively study how the free additive convolution is affected by small perturbations; this is especially useful to treat various error terms in the random matrix problem [31]. The basic tool is a local stability analysis of the subordination equations. In [30], Kargin assumed a lower bound on the imaginary part of the subordination functions and a certain non-degeneracy condition on the Jacobian that holds for generic values of the spectral parameter. While these so-called smoothness conditions hold in many examples, a general characterization was lacking. Our first result, Theorem 2.5, shows that the smoothness conditions hold wherever the absolutely continuous part of the free convolution measure is finite and nonzero. In particular, local stability holds unconditionally (Corollary 2.6) and, following Kargin's argument [30], we immediately obtain the continuity of the free additive convolution in a stronger sense; see Theorem 2.7.

The random matrix application of this stability result, however, goes well beyond Kargin's analysis [31] since our proof is valid on a much smaller scale. To explain the new elements, we recall how free probability connects to random matrices.

The following fundamental observation was made by Voiculescu [37] (later extended by Dykema [20] and Speicher [35]): if A = A(N) and B = B(N) are two sequences of Hermitian matrices that are asymptotically free with eigenvalue distributions converging to probability measures µ_α and µ_β, then the eigenvalue density of A + B is asymptotically given by the free additive convolution µ_α ⊞ µ_β. One of the most natural ways to ensure asymptotic freeness is to consider conjugation by independent unitary matrices.
Indeed, if A and B are deterministic (they may even be chosen diagonal) with limit laws µ_α and µ_β, then A and UBU* are asymptotically free if U = U(N) is a Haar distributed matrix; see [37] and many subsequent works, e.g., [34, 40, 15, 33, 19]. In particular, the limiting spectral density of the eigenvalues of H = A + UBU* is given by µ_α ⊞ µ_β.

The conventional setup of free probability operates with moment calculations. An alternative approach [33] proves the convergence of the resolvent at any fixed spectral parameter z ∈ C+. Both approaches give rise to weak convergence of measures; in particular, they identify the limiting spectral density on the macroscopic scale.

Armed with these macroscopic results, it is natural to ask for a local law, i.e., for the smallest possible (N-dependent) scale so that the local eigenvalue density on that scale still converges as N tends to infinity. Local laws have been somewhat outside of the focus of free probability before Kargin's recent works. After having improved a concentration result for the Haar measure by Chatterjee [17] by using the Gromov–Milman concentration inequality, Kargin obtained a local law for the ensemble H = A + UBU* on scale η ≫ (log N)^{−1/2} [29], i.e., slightly below the macroscopic scale. Recently in [31], he improved this result down to scale η ≫ N^{−1/7} under the above mentioned smoothness condition. In Theorem 2.8 we prove the local law down to scale η = Im z ≫ N^{−2/3} without any additional assumption. To achieve this short scale, we effectively use the positivity of the imaginary parts of the subordination functions by localizing the Gromov–Milman concentration inequality within the spectrum. Since the subordination functions are obtained as the solution of a system of self-consistent equations whose derivation itself requires bounds on the subordination functions, the reasoning seems circular.
We break this circularity by a continuity argument (similarly as in [23]) in which we reduce the imaginary part of the spectral parameter in very small steps, use the previous step as an a priori bound, and show that the bound does not deteriorate by using the local stability result, Corollary 2.6. Finally, we remark that the local stability result is also a key ingredient in [3], where we were able to prove a local law down to the smallest possible scale η ≫ N^{−1}, but with a weaker error bound than in Theorem 2.8; see Remark 2.4 for details.

1.1. Notation.
We use the symbols O(·) and o(·) for the standard big-O and little-o notation. We use c and C to denote positive numerical constants; their values may change from line to line. For a, b > 0, we write a ≲ b, a ≳ b if there is C ≥ 1 such that a ≤ Cb, a ≥ C^{−1}b, respectively. We write a ∼ b if a ≲ b and a ≳ b both hold. We denote by ‖v‖ the Euclidean norm of v ∈ C^N. For an N × N matrix A ∈ M_N(C), we denote by ‖A‖ its operator norm and by ‖A‖_2 := √⟨A, A⟩ its Hilbert–Schmidt norm, where ⟨A, B⟩ := Trace(AB*), for A, B ∈ M_N(C). Finally, we denote by tr A the normalized trace of A, i.e., tr A = (1/N) Trace A.

Acknowledgment.
We thank an anonymous referee for many useful comments and remarks, and for bringing references [7, 12] to our attention.

2. Main results
2.1. Free additive convolution.
In this subsection, we recall the definition of the free additive convolution. Given a probability measure ∗ µ on R, its Stieltjes transform, m_µ, on the complex upper half-plane C+ := { z ∈ C : Im z > 0 } is defined by

  m_µ(z) := ∫_R dµ(x)/(x − z),  z ∈ C+. (2.1)

We denote by F_µ the negative reciprocal Stieltjes transform of µ, i.e.,

  F_µ(z) := −1/m_µ(z),  z ∈ C+. (2.2)

Observe that

  lim_{η↑∞} F_µ(iη)/(iη) = 1, (2.3)

as follows easily from (2.1). Note, moreover, that F_µ is an analytic function on C+ with non-negative imaginary part. Conversely, if F : C+ → C+ is an analytic function such that lim_{η↑∞} F(iη)/(iη) = 1, then F is the negative reciprocal Stieltjes transform of a probability measure µ, i.e., F(z) = F_µ(z), for all z ∈ C+; see, e.g., [1].

The free additive convolution is the binary operation on probability measures on R characterized by the following result.

Proposition 2.1 (Theorem 4.1 in [4], Theorem 2.1 in [18]). Given two probability measures µ1 and µ2 on R, there exist unique analytic functions ω1, ω2 : C+ → C+ such that

(i) for all z ∈ C+, Im ω1(z), Im ω2(z) ≥ Im z, and

  lim_{η↑∞} ω1(iη)/(iη) = lim_{η↑∞} ω2(iη)/(iη) = 1; (2.4)

(ii) for all z ∈ C+,

  F_{µ1}(ω2(z)) − ω1(z) − ω2(z) + z = 0,
  F_{µ2}(ω1(z)) − ω1(z) − ω2(z) + z = 0. (2.5)

∗ All probability measures considered will be assumed to be Borel.
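The system (2.5) also lends itself to numerical solution. The following sketch (ours, not part of the original text; measure data and function names are our choices) solves (2.5) for two atomic measures by iterating the map w ↦ z + h1(z + h2(w)), where h_j(w) := F_{µj}(w) − w. Since Im F_{µj}(w) ≥ Im w, this map sends C+ into { Im w ≥ Im z }, and its iterates converge to ω1(z) by a Denjoy–Wolff argument in the spirit of [4, 18].

```python
import numpy as np

def m_mu(atoms, weights, w):
    # Stieltjes transform (2.1) of an atomic measure: sum_i p_i / (a_i - w)
    return np.sum(weights / (atoms - w))

def F_mu(atoms, weights, w):
    # negative reciprocal Stieltjes transform (2.2)
    return -1.0 / m_mu(atoms, weights, w)

def subordination(mu1, mu2, z, n_iter=500):
    # Solve (2.5) by fixed-point iteration: w -> z + h1(z + h2(w)),
    # with h_j(w) = F_{mu_j}(w) - w.
    (a1, p1), (a2, p2) = mu1, mu2
    h1 = lambda w: F_mu(a1, p1, w) - w
    h2 = lambda w: F_mu(a2, p2, w) - w
    om1 = z
    for _ in range(n_iter):
        om1 = z + h1(z + h2(om1))
    om2 = z + h2(om1)
    return om1, om2

# Example: mu1 = mu2 = Bernoulli(1/2) on {0, 1}.  Their free additive
# convolution is the arcsine law on [0, 2], whose Stieltjes transform is
# -1/sqrt(z(z-2)) (branch with positive imaginary part).
mu = (np.array([0.0, 1.0]), np.array([0.5, 0.5]))
z = 1.0 + 1.0j
om1, om2 = subordination(mu, mu, z)
m = m_mu(*mu, om2)   # m_{mu1 boxplus mu2}(z) = m_{mu1}(omega_2(z)), cf. (2.6)
```

At z = 1 + i this returns m ≈ i/√2, matching −1/√(z(z − 2)), and one checks Im ω_j(z) ≥ Im z, as required by (2.4).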
It follows from (2.4) that the analytic function F : C+ → C+ defined by

  F(z) := F_{µ1}(ω2(z)) = F_{µ2}(ω1(z)), (2.6)

satisfies (2.3). Thus F is the negative reciprocal Stieltjes transform of a probability measure µ, called the free additive convolution of µ1 and µ2, usually denoted by µ ≡ µ1 ⊞ µ2. Note that (2.6) shows that the rôles of µ1 and µ2 are symmetric and thus µ1 ⊞ µ2 = µ2 ⊞ µ1. The functions ω1 and ω2 of Proposition 2.1 are called subordination functions, and F is said to be subordinated to F_{µ1}, respectively to F_{µ2}.

We mention that Voiculescu [36] originally introduced the free additive convolution in a different, algebraic manner. The equivalent analytic definition based on the existence of subordination functions (taken up in Proposition 2.1 above) was introduced in [4, 18].

We next recall some basic examples. Choosing µ1 arbitrary and µ2 as a single point mass at b ∈ R, it is easy to check that µ1 ⊞ µ2 simply is µ1 shifted by b. We exclude this uninteresting case by henceforth assuming that µ1 and µ2 are both supported at more than one point. Choosing µ1 = µ2 = µ as the Bernoulli distribution µ = (1 − ξ)δ_0 + ξδ_1, ξ ∈ (0, 1), the free additive convolution is explicitly given by (see, e.g., (5.5) of [33])

  (µ ⊞ µ)(x) = √((ℓ_+ − x)_+ (x − ℓ_−)_+) / (π x (2 − x)) + (1 − 2ξ)_+ δ_0(x) + (2ξ − 1)_+ δ_2(x),  x ∈ R, (2.7)

where ℓ_± := 1 ± 2√(ξ(1 − ξ)) and where (·)_+ denotes the positive part. Observe that µ ⊞ µ has a nonzero absolutely continuous part and, depending on the choice of ξ, a point mass. Another important choice for µ2 is Wigner's semicircle law µ_sc. For arbitrary µ1, the measure µ1 ⊞ µ_sc is purely absolutely continuous with a bounded density † that is real analytic wherever positive [13].

Returning to the generic setting, the atoms of µ1 ⊞ µ2 are identified as follows. A point c ∈ R is an atom of µ1 ⊞ µ2 if and only if there exist a, b ∈ R such that c = a + b and µ1({a}) + µ2({b}) > 1; see Theorem 7.4 in [11]. For other interesting properties of the atoms of µ1 ⊞ µ2 we refer the reader to [12]. The boundary behavior of the functions F_{µ1⊞µ2}, ω1 and ω2 has been studied by Belinschi [5, 6, 7], who proved the next two results. For simplicity, we restrict the discussion to compactly supported probability measures.

Proposition 2.2 (Theorem 2.3 in [5], Theorem 3.3 in [6]). Let µ1 and µ2 be compactly supported probability measures on R, neither of them being a single point mass. Then the functions F_{µ1⊞µ2}, ω1, ω2 : C+ → C+ extend continuously to R.

Belinschi further showed in Theorem 4.1 in [6] that the singular continuous part of µ1 ⊞ µ2 is always zero and that the absolutely continuous part, (µ1 ⊞ µ2)^ac, of µ1 ⊞ µ2 is always nonzero. We denote the density function of (µ1 ⊞ µ2)^ac by f_{µ1⊞µ2}.

We are now ready to introduce our notion of regular bulk, B_{µ1⊞µ2}, of µ1 ⊞ µ2. Informally, we let B_{µ1⊞µ2} be the open set on which µ1 ⊞ µ2 admits a continuous density that is strictly positive and bounded from above. For a formal definition we first introduce the set

  U_{µ1⊞µ2} := int( supp (µ1 ⊞ µ2)^ac \ { x ∈ R : lim_{η↓0} F_{µ1⊞µ2}(x + iη) = 0 } ). (2.8)

Note that U_{µ1⊞µ2} does not contain any atoms of µ1 ⊞ µ2. By Privalov's theorem, the set { x ∈ R : lim_{η↓0} F_{µ1⊞µ2}(x + iη) = 0 } has Lebesgue measure zero. In fact, an even stronger statement applies in the case at hand. Belinschi [7] showed that if x ∈ R is such that lim_{η↓0} F_{µ1⊞µ2}(x + iη) = 0, then it must be of the form x = a + b with µ1({a}) + µ2({b}) ≥ 1, a, b ∈ R. There can only be finitely many such x, thus U_{µ1⊞µ2} must contain an open non-empty interval.

Proposition 2.3 (Theorem 3.3 in [6]). Let µ1 and µ2 be as above and fix any x ∈ U_{µ1⊞µ2}. Then F_{µ1⊞µ2}, ω1, ω2 : C+ → C+ extend analytically around x. In particular, the density function f_{µ1⊞µ2} is real analytic in U_{µ1⊞µ2} wherever positive.

† All densities are with respect to Lebesgue measure on R.
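The explicit formula (2.7) can be sanity-checked numerically: the absolutely continuous part must carry mass 1 − (1 − 2ξ)_+ − (2ξ − 1)_+, and for ξ = 1/2 the density reduces to the arcsine density 1/(π√(x(2 − x))) on [0, 2]. A small sketch (ours; the value ξ = 0.3 is an arbitrary test choice):

```python
import numpy as np

def f_ac(x, xi):
    # a.c. density in (2.7): sqrt((l+ - x)_+ (x - l-)_+) / (pi x (2 - x))
    s = 2.0 * np.sqrt(xi * (1.0 - xi))
    lp, lm = 1.0 + s, 1.0 - s
    q = np.clip((lp - x) * (x - lm), 0.0, None)
    return np.sqrt(q) / (np.pi * x * (2.0 - x))

def total_mass(xi, n=200_001):
    # trapezoid rule over [l-, l+], plus the point masses at x = 0 and x = 2
    s = 2.0 * np.sqrt(xi * (1.0 - xi))
    x = np.linspace(1.0 - s, 1.0 + s, n)
    y = f_ac(x, xi)
    ac = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return ac + max(1.0 - 2.0 * xi, 0.0) + max(2.0 * xi - 1.0, 0.0)

print(total_mass(0.3))          # close to 1: a.c. mass 0.6 plus atom 0.4 at 0
print(f_ac(1.0, 0.5) * np.pi)   # close to 1: arcsine density at x = 1 is 1/pi
```

For ξ = 0.3 the atom sits at 0 with mass 0.4, and the absolutely continuous part integrates to 0.6, so the total mass is 1, as it must be for a probability measure.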
The regular bulk is obtained from U_{µ1⊞µ2} by removing the zeros of f_{µ1⊞µ2} inside U_{µ1⊞µ2}.

Definition 2.4.
The regular bulk of the measure µ1 ⊞ µ2 is defined as the set

  B_{µ1⊞µ2} := U_{µ1⊞µ2} \ { x ∈ U_{µ1⊞µ2} : f_{µ1⊞µ2}(x) = 0 }. (2.9)

Note that B_{µ1⊞µ2} is an open non-empty set on which µ1 ⊞ µ2 admits the density f_{µ1⊞µ2}. The density is strictly positive and thus (by Proposition 2.3) real analytic on B_{µ1⊞µ2}.

2.2. Stability Result.
To present our results it is convenient to recast (2.5) in a compact form: For generic probability measures µ1, µ2 as above, let the function Φ_{µ1,µ2} : (C+)² × C+ → C² be given by

  Φ_{µ1,µ2}(ω1, ω2, z) := ( F_{µ1}(ω2) − ω1 − ω2 + z , F_{µ2}(ω1) − ω1 − ω2 + z )^⊤. (2.10)

Considering µ1, µ2 as fixed, the equation

  Φ_{µ1,µ2}(ω1, ω2, z) = 0, (2.11)

is equivalent to (2.5) and, by Proposition 2.1, there are unique analytic functions ω1, ω2 : C+ → C+, z ↦ ω1(z), ω2(z), satisfying (2.4) that solve (2.11) in terms of z. We use the following conventions: We denote by ω1 and ω2 generic variables on C+ and we denote, with a slight abuse of notation, by ω1(z) and ω2(z) the subordination functions solving (2.11) in terms of z. When no confusion can arise, we simply write Φ for Φ_{µ1,µ2}.

We call the system (2.11) linearly S-stable at (ω1, ω2) if

  Γ_{µ1,µ2}(ω1, ω2) := ‖ [ −1 , F'_{µ1}(ω2) − 1 ; F'_{µ2}(ω1) − 1 , −1 ]^{−1} ‖ ≤ S, (2.12)

for some constant S (rows of the 2 × 2 matrix separated by semicolons). Especially, the partial Jacobian matrix, DΦ(ω1, ω2), of (2.10), given by

  DΦ(ω1, ω2) := ( ∂Φ/∂ω1 (ω1, ω2, z) , ∂Φ/∂ω2 (ω1, ω2, z) ) = [ −1 , F'_{µ1}(ω2) − 1 ; F'_{µ2}(ω1) − 1 , −1 ],

admits a bounded inverse at (ω1, ω2). Note that DΦ(ω1, ω2) is independent of z.

Our first main result shows that the system (2.11) is linearly stable and that the imaginary parts of the subordination functions are bounded below in the regular bulk. We require some more notation: For a, b ≥ 0 with b ≥ a, and an interval I ⊂ R, we introduce the domain

  S_I(a, b) := { z = E + iη ∈ C+ : E ∈ I, a ≤ η ≤ b }. (2.13)

Theorem 2.5.
Let µ1 and µ2 be compactly supported probability measures on R, and assume that neither is supported at a single point and that at least one of them is supported at more than two points. Let I ⊂ B_{µ1⊞µ2} be a compact non-empty interval and fix some 0 < η_M < ∞. Then there are two constants k > 0 and S < ∞, both depending on the measures µ1 and µ2, on the interval I as well as on the constant η_M, such that the following statements hold.

(i) The imaginary parts Im ω1 and Im ω2 of the subordination functions associated with µ1 and µ2 satisfy

  min_{z ∈ S_I(0,η_M)} Im ω1(z) ≥ k,  min_{z ∈ S_I(0,η_M)} Im ω2(z) ≥ k. (2.14)

(ii) The system Φ_{µ1,µ2}(ω1, ω2, z) = 0 is linearly S-stable at (ω1(z), ω2(z)) uniformly in S_I(0, η_M), i.e.,

  max_{z ∈ S_I(0,η_M)} Γ_{µ1,µ2}(ω1(z), ω2(z)) ≤ S. (2.15)
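The quantity Γ_{µ1,µ2} in (2.12) is the operator norm of an explicit 2 × 2 matrix inverse, so for atomic measures it can be evaluated directly using F'_µ = m'_µ/m_µ². The sketch below is ours; the closed-form value of ω at the chosen point comes from the fact that, for µ1 = µ2 = (δ_0 + δ_1)/2, the system (2.5) reduces by symmetry to the quadratic equation ω² − zω + z/2 = 0, which can be verified by direct substitution.

```python
import numpy as np

def m_and_deriv(atoms, weights, w):
    d = atoms - w
    return np.sum(weights / d), np.sum(weights / d**2)  # m_mu(w), m_mu'(w)

def F_prime(atoms, weights, w):
    # F_mu = -1/m_mu, hence F_mu' = m_mu' / m_mu^2
    m, mp = m_and_deriv(atoms, weights, w)
    return mp / m**2

def Gamma(mu1, mu2, om1, om2):
    # operator norm of the inverse of DPhi(om1, om2), cf. (2.12)
    (a1, p1), (a2, p2) = mu1, mu2
    DPhi = np.array([[-1.0 + 0j, F_prime(a1, p1, om2) - 1.0],
                     [F_prime(a2, p2, om1) - 1.0, -1.0 + 0j]])
    return np.linalg.norm(np.linalg.inv(DPhi), 2)

# mu1 = mu2 = Bernoulli(1/2) on {0, 1}; at z = 1 + i the subordination
# functions coincide and solve om^2 - z*om + z/2 = 0
mu = (np.array([0.0, 1.0]), np.array([0.5, 0.5]))
z = 1.0 + 1.0j
om = (z + 1j * np.sqrt(2.0)) / 2.0   # root with Im om >= Im z
print(Gamma(mu, mu, om, om))          # finite: the system is stable here
```

The printed value is finite and of order one, illustrating (2.15) at a single bulk point of this example.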
Remark 2.1. The assumption that neither of µ1, µ2 is a point mass guarantees that the free additive convolution is not a simple translate. The case when both µ1 and µ2 are combinations of two point masses is special and its discussion is postponed to Section 7.

Theorem 2.5 has the following local stability result as a corollary.

Corollary 2.6.
Let µ1, µ2 and S_I(0, η_M) be as in Theorem 2.5. Fix z ∈ C+. Assume that the functions ω̃1, ω̃2, r̃1, r̃2 : C+ → C satisfy Im ω̃1(z) > 0, Im ω̃2(z) > 0 and

  Φ_{µ1,µ2}(ω̃1(z), ω̃2(z), z) = r̃(z), (2.16)

with r̃(z) := (r̃1(z), r̃2(z))^⊤. Let ω1, ω2 be the subordination functions solving the system Φ_{µ1,µ2}(ω1(z), ω2(z), z) = 0, z ∈ C+. Then there exists a (small) constant δ > 0 such that whenever we have

  |ω̃1(z) − ω1(z)| ≤ δ,  |ω̃2(z) − ω2(z)| ≤ δ, (2.17)

we also have

  |ω̃1(z) − ω1(z)| ≤ S ‖r̃(z)‖,  |ω̃2(z) − ω2(z)| ≤ S ‖r̃(z)‖. (2.18)

The constant δ > 0 depends on µ1 and µ2, on the interval I as well as on η_M.

We omit the proof of Corollary 2.6 from Theorem 2.5, since it follows directly from Proposition 4.1 in Section 4 below.

2.3. Applications.

We next explain two main applications of the stability estimates obtained in Theorem 2.5.

2.3.1. Continuity of the free additive convolution.
Our first application shows that the free additive convolution is a continuous operation when the image is equipped with the topology of local uniform convergence of the density in the regular bulk; see (2.23). Bercovici and Voiculescu (Proposition 4.13 of [10]) showed that the free additive convolution is continuous with respect to weak convergence of measures. More precisely, given two pairs of probability measures µ_A, µ_B and µ_α, µ_β on R, the measures µ_A ⊞ µ_B and µ_α ⊞ µ_β satisfy

  d_L(µ_A ⊞ µ_B, µ_α ⊞ µ_β) ≤ d_L(µ_A, µ_α) + d_L(µ_B, µ_β), (2.19)

where d_L denotes the Lévy distance. In particular, weak convergence of µ_A to µ_α and weak convergence of µ_B to µ_β imply weak convergence of µ_A ⊞ µ_B to µ_α ⊞ µ_β.

Using the Stieltjes transform, we can easily link (2.19) to the systems of equations in (2.5), respectively in (2.10). Using integration by parts and the definition of the Stieltjes transform, a direct computation reveals that there is a numerical constant C such that

  |m_{µ_A⊞µ_B}(z) − m_{µ_α⊞µ_β}(z)| ≤ (C/η)(1 + 1/η) d_L(µ_A ⊞ µ_B, µ_α ⊞ µ_β)
    ≤ (C/η)(1 + 1/η) (d_L(µ_A, µ_α) + d_L(µ_B, µ_β)),  η = Im z, (2.20)

for all z ∈ C+, where we used (2.19) to get the second line. Note that the estimate in (2.20) deteriorates as η approaches the real line. Our next result strengthens (2.20) as follows. We consider the measure µ_α ⊞ µ_β as "reference" measure (in the sense that it locates the regular bulk) while µ_A, µ_B are arbitrary probability measures, and we show that the Lévy distances bound |m_{µ_A⊞µ_B}(E + iη) − m_{µ_α⊞µ_β}(E + iη)| uniformly in η, for all E inside the regular bulk of µ_α ⊞ µ_β.

Theorem 2.7.
Let µ_α and µ_β be compactly supported probability measures on R, and assume that neither is supported at a single point and that at least one of them is supported at more than two points. Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval and fix some 0 < η_M < ∞. Let µ_A and µ_B be two arbitrary probability measures on R. Then there are constants b > 0 and Z < ∞, both depending on the measures µ_α and µ_β, on the interval I as well as on the constant η_M, such that whenever

  d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b (2.21)

holds, then

  max_{z ∈ S_I(0,η_M)} |m_{µ_A⊞µ_B}(z) − m_{µ_α⊞µ_β}(z)| ≤ Z (d_L(µ_A, µ_α) + d_L(µ_B, µ_β)), (2.22)

holds, too.

Note that max_{z ∈ S_I(0,η_M)} |m_{µ_α⊞µ_β}(z)| < ∞ by compactness of I and analyticity of m_{µ_α⊞µ_β} in I. Thus the Stieltjes–Perron inversion formula directly implies that (µ_A ⊞ µ_B)^ac has a density, f_{µ_A⊞µ_B}, inside I and that

  max_{x ∈ I} |f_{µ_A⊞µ_B}(x) − f_{µ_α⊞µ_β}(x)| ≤ Z (d_L(µ_A, µ_α) + d_L(µ_B, µ_β)), (2.23)

provided that (2.21) holds, where f_{µ_α⊞µ_β} is the density of (µ_α ⊞ µ_β)^ac.

Remark 2.2. The estimate (2.22) was recently given by Kargin [30] under the assumption that (2.14) and (2.15) hold for all z ∈ S_I(0, η_M), i.e., under the assumption that the conclusions of our Theorem 2.5 hold. It is quite surprising that one can directly set Im z = 0 in (2.22). As first noted by Kargin, this is due to the regularizing effect of ω_α, ω_β and to the global uniqueness of solutions to (2.5) for arbitrary probability measures.

2.3.2. Application to random matrix theory.
We now turn to an application of Theorem 2.5 in random matrix theory. Let A ≡ A(N) and B ≡ B(N) be two sequences of N × N deterministic real diagonal matrices, whose empirical spectral distributions are denoted by µ_A and µ_B respectively, i.e.,

  µ_A := (1/N) Σ_{i=1}^N δ_{a_i},  µ_B := (1/N) Σ_{i=1}^N δ_{b_i}, (2.24)

where A = diag(a_i), B = diag(b_i). The matrices A and B depend on N, but we omit this fact from the notation. Let ω_A and ω_B denote the subordination functions associated with µ_A and µ_B by Proposition 2.1.

We assume that there are deterministic probability measures µ_α and µ_β on R, neither of them being a single point mass, such that the empirical spectral distributions µ_A, µ_B converge weakly to µ_α, µ_β, as N → ∞. More precisely, we assume that

  d_L(µ_A, µ_α) + d_L(µ_B, µ_β) → 0, (2.25)

as N → ∞. Let ω_α, ω_β denote the subordination functions associated with µ_α and µ_β.

Let U be an independent N × N Haar distributed unitary matrix (in short Haar unitary) and consider the random matrix

  H ≡ H(N) := A + UBU*. (2.26)

We introduce the Green function, G_H, of H and its normalized trace, m_H, by setting

  G_H(z) := 1/(H − z),  m_H(z) := tr G_H(z), (2.27)

z ∈ C+. We refer to z as the spectral parameter and we often write z = E + iη, E ∈ R, η > 0. Recall the definition of S_I(a, b) in (2.13). We have the following local law for m_H.

Theorem 2.8.
Let µ_α and µ_β be two compactly supported probability measures on R, and assume that neither is supported at a single point and that at least one of them is supported at more than two points. Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval and fix some 0 < η_M < ∞. Assume that the sequences of matrices A and B in (2.26) are such that their empirical eigenvalue distributions µ_A and µ_B satisfy (2.25). Fix any small γ > 0 and set η_m := N^{−2/3+γ}.

Then we have the following uniform estimate: For any (small) ε > 0 and any (large) D,

  P( ⋃_{z ∈ S_I(η_m,η_M)} { |m_H(z) − m_{µ_A⊞µ_B}(z)| > N^ε/(N η^{3/2}) } ) ≤ N^{−D}, (2.28)

holds for N ≥ N_0, with some N_0 sufficiently large, where we write z = E + iη.

Using standard techniques of random matrix theory, we can translate the estimate (2.28) on the Green function into an estimate on the empirical spectral distribution of the matrix H. Let λ_1, ..., λ_N denote the ordered eigenvalues of H and denote by

  µ_H := (1/N) Σ_{i=1}^N δ_{λ_i} (2.29)

its empirical spectral distribution. Our result on the rate of convergence of µ_H is as follows.

Corollary 2.9. Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval. Then, for any E_1 < E_2 in I, we have the following estimate. For any (small) ε > 0 and any (large) D we have

  P( |µ_H([E_1, E_2)) − (µ_A ⊞ µ_B)([E_1, E_2))| > N^ε/N^{2/3} ) ≤ N^{−D}, (2.30)

for N ≥ N_0, with some N_0 sufficiently large.

We omit the proof of Corollary 2.9 from Theorem 2.8, but mention that the normalized trace m_H of the Green function and the empirical spectral distribution µ_H of H are linked by

  m_H(z) = tr G_H(z) = (1/N) Σ_{i=1}^N 1/(λ_i − z) = ∫_R dµ_H(x)/(x − z),  z ∈ C+.

Corollary 2.9 then follows from a standard application of the Helffer–Sjöstrand functional calculus; see, e.g., Section 7.1 of [22] for a similar argument.

Note that assumption (2.25) does not exclude that the matrix H has outliers in the large N limit. In fact, the model H = A + UBU* shows a rich phenomenology when, say, A has a finite number of large spikes; we refer to the recent works in [8, 9, 16, 31].

Remark 2.3. Our results in Theorem 2.8 and Corollary 2.9 are stated for U Haar distributed on the unitary group U(N). However, they also hold true (with the same proofs) when U is Haar distributed on the orthogonal group O(N).

Remark 2.4. In [3], we derive, with a different approach, the estimate (with the notation of (2.28))

  P( ⋃_{z ∈ S_I(η_m,η_M)} { |m_H(z) − m_{µ_A⊞µ_B}(z)| > N^ε/√(Nη) } ) ≤ N^{−D}, (2.31)

for N ≥ N_0, with some N_0 sufficiently large, and with η_m = N^{−1+γ}. In fact, we obtain estimates for individual matrix elements of the resolvent G_H as well. Comparing with (2.28), we see that we can choose η in (2.31) almost as small as N^{−1}, at the price of losing a factor √N η in the error bound. The stability and perturbation analysis in [3] rely on the optimal results in Theorem 2.5 and Theorem 2.7, as well as on Sections 3-5 of the present paper.

2.4. Organization of the paper.
In Section 3, we consider the stability of the system (2.5) when at least one of the measures µ1 and µ2 is supported at more than two points, and we give the proof of Theorem 2.5. In Section 4, we consider perturbations of the system (2.5) and derive results that will be used in the proof of Theorem 2.8 and also in [3]. In Section 5, we prove Theorem 2.7. In Section 6 we consider the random matrix setup and prove Theorem 2.8. In the final Section 7, we separately settle the special case when both µ1 and µ2 are combinations of two point masses and give the results analogous to Theorem 2.5, Theorem 2.7 and Theorem 2.8 for that case.

3. Stability of the system (2.11) and proof of Theorem 2.5
In this section, we discuss stability properties of the system (2.11), with µ1, µ2 two compactly supported probability measures satisfying the assumptions of Theorem 2.5.

Lemma 3.1.
Let µ1, µ2 be two probability measures on R, neither of them being supported at a single point. Then there is, for any compact set K ⊂ C+, a strictly positive constant 0 < σ_1(µ1, K) < 1 such that the reciprocal Stieltjes transform F_{µ1} (see (2.2)) satisfies

  Im z ≤ (1 − σ_1(µ1, K)) Im F_{µ1}(z),  ∀ z ∈ K. (3.1)

Similarly, there is 0 < σ_2(µ2, K) < 1 such that (3.1) holds with µ2 and F_{µ2}, respectively. Assume in addition that µ1 is supported at more than two points. Then there is, for any compact set K ⊂ C+, a strictly positive constant 0 < σ̃_1(µ1, K) < 1 such that

  |F'_{µ1}(z) − 1| ≤ (1 − σ̃_1(µ1, K)) (Im F_{µ1}(z) − Im z)/Im z,  ∀ z ∈ K, (3.2)

where F'_{µ1}(z) ≡ ∂_z F_{µ1}(z).

Proof of Lemma 3.1. Assuming by contradiction that inequality (3.1) saturates (with vanishing constant σ_1(µ1, K) = 0, for some z ∈ K ⊂ C+), we have Im F_{µ1}(z) = Im z for some z, thus F_{µ1}(z) = z − a, a ∈ R, i.e., µ1 = δ_a. This shows (3.1).

To establish (3.2), we first note that the analytic functions F_{µ_j} : C+ → C+, j = 1, 2, admit the Nevanlinna representations

  F_{µ_j}(z) = a_{F_{µ_j}} + z + ∫_R (1 + xz)/(x − z) dρ_{F_{µ_j}}(x),  j = 1, 2,  z ∈ C+, (3.3)

where a_{F_{µ_j}} ∈ R and ρ_{F_{µ_j}} are finite Borel measures on R. Note that the coefficients of z on the right-hand side are determined by (2.3). From (3.3) we see that

  |F'_{µ1}(z) − 1| = | ∫_R (1 + x²)/(x − z)² dρ_{F_{µ1}}(x) |,  z ∈ C+, (3.4)

as well as

  (Im F_{µ1}(z) − Im z)/Im z = ∫_R (1 + x²)/|x − z|² dρ_{F_{µ1}}(x),  z ∈ C+. (3.5)

Hence, assuming by contradiction that inequality (3.2) saturates (with σ̃_1(µ1, K) = 0, for some z ∈ K), we must have

  ∫_R (1 + x²)/|x − z|² dρ_{F_{µ1}}(x) = | ∫_R (1 + x²)/(x − z)² dρ_{F_{µ1}}(x) |, (3.6)

for some z ∈ K, implying that ρ_{F_{µ1}} is either a single point mass or ρ_{F_{µ1}} = 0. In the latter case, we have F_{µ1}(z) = a_{F_{µ1}} + z and we conclude that µ1 must be a single point measure, but this is excluded by assumption.
Thus ρ_{F_{µ1}} is a single point mass, i.e., there are constants d_{µ1} ∈ R and t := ρ_{F_{µ1}}({d_{µ1}}) > 0 such that F_{µ1}(z) = a_{F_{µ1}} + z + t (1 + z d_{µ1})/(d_{µ1} − z), z ∈ K. It follows that µ1 is a convex combination of two point measures, yielding a contradiction. This shows (3.2). □

Bounds on the subordination functions.
Let µ1, µ2 be as above and let ω1(z), ω2(z) be the associated subordination functions. Recall that we rewrite the defining equations (2.5) for ω1 and ω2 in the compact form Φ_{µ1,µ2}(ω1, ω2, z) = 0 introduced in (2.11). We first provide upper bounds on the subordination functions ω1(z), ω2(z). Our proof relies on the assumption that µ1, µ2 are compactly supported, i.e., that there is a constant L < ∞ such that

  supp µ1 ⊂ [−L, L],  supp µ2 ⊂ [−L, L]. (3.7)

Recall from Theorem 2.5 that we fixed a compact non-empty interval I ⊂ B_{µ1⊞µ2}. Since the density f_{µ1⊞µ2} is real analytic inside the regular bulk by Proposition 2.3 and since I is compact, there exists a constant κ > 0 such that

  0 < κ ≤ min_{x ∈ I} f_{µ1⊞µ2}(x). (3.8)

Fixing a constant 0 < η_M < ∞, it further follows that there is a constant M < ∞ such that

  max_{z ∈ S_I(0,η_M)} |m_{µ1⊞µ2}(z)| ≤ M. (3.9)
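For the Bernoulli example (2.7) with ξ = 1/2, the constants in (3.8) and (3.9) can be made concrete: µ ⊞ µ is the arcsine law on [0, 2], with Stieltjes transform −1/√(z(z − 2)) (branch with positive imaginary part) and density 1/(π√(x(2 − x))). A numerical illustration (ours; the choices I = [0.5, 1.5] and η_M = 1 are arbitrary):

```python
import numpy as np

def m_arcsine(z):
    # Stieltjes transform of the arcsine law on [0, 2]; choose the square-root
    # branch with positive imaginary part so that Im m > 0 on C+
    s = np.sqrt(z * (z - 2.0))
    s = np.where(s.imag > 0, s, -s)
    return -1.0 / s

E = np.linspace(0.5, 1.5, 201)        # I = [0.5, 1.5], inside the regular bulk
eta = np.linspace(1e-3, 1.0, 200)     # 0 < eta <= eta_M = 1
Z = E[:, None] + 1j * eta[None, :]

kappa = np.min(1.0 / (np.pi * np.sqrt(E * (2.0 - E))))   # cf. (3.8)
M = np.max(np.abs(m_arcsine(Z)))                          # cf. (3.9)
print(kappa, M)   # kappa is about 0.32, M about 1.15
```

Here the minimum of the density over I is attained at E = 1 (value 1/π), while |m| is largest near the endpoints of I at small η; both constants stay of order one uniformly down to the real axis, as Theorem 2.5 asserts in general.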
Lemma 3.2.
Let µ1, µ2 be two compactly supported probability measures on R satisfying (3.7), for some L < ∞, and assume that both are supported at more than one point. Let I ⊂ B_{µ1⊞µ2} be a compact non-empty interval. Then there is a constant K < ∞ such that

  max_{z ∈ S_I(0,η_M)} |ω1(z)| ≤ K,  max_{z ∈ S_I(0,η_M)} |ω2(z)| ≤ K. (3.10)

The constant K depends on the constant η_M and on the interval I, as well as on the measures µ1, µ2 through the constant κ in (3.8) and the constant L in (3.7).

Proof. We start by noticing that there is a constant κ_0 > 0 such that

  Im m_{µ1⊞µ2}(z) = ∫_R η d(µ1 ⊞ µ2)(x)/((x − E)² + η²) ≥ ∫_I η f_{µ1⊞µ2}(x) dx/((x − E)² + η²) ≥ κ_0, (3.11)

uniformly in z = E + iη ∈ S_I(0, η_M), where we used (3.8). Thus by subordination we have

  min_{z ∈ S_I(0,η_M)} |m_{µ1⊞µ2}(z)| = min_{z ∈ S_I(0,η_M)} | ∫_R dµ1(a)/(a − ω2(z)) | ≥ κ_0, (3.12)

since m_{µ1⊞µ2}(z) = −1/F_{µ1⊞µ2}(z) by (2.6). On the other hand, µ1 is supported in the interval [−L, L]; see (3.7). Hence, using (3.12), |ω2(z)| must be bounded from above on S_I(0, η_M). Interchanging the rôles of the indices 1 and 2, we also get that |ω1(z)| is bounded from above on S_I(0, η_M). □

Having established upper bounds on the subordination functions, we show that their imaginary parts are uniformly bounded from below on the domain S_I(0, η_M). The proof relies on inequality (3.1).

Lemma 3.3.
Let µ1, µ2 be two probability measures on R satisfying (3.7), for some L < ∞, and assume that neither of them is supported at a single point. Let I ⊂ B_{µ1⊞µ2} be a compact non-empty interval. Then there is a strictly positive constant k > 0 such that

  min_{z ∈ S_I(0,η_M)} Im ω1(z) ≥ k,  min_{z ∈ S_I(0,η_M)} Im ω2(z) ≥ k. (3.13)

Remark 3.1. The constant k in (3.13) depends on the interval I through the constants κ in (3.8) and M in (3.9). It further depends on η_M, as well as on σ_1(µ1, K_2) and σ_2(µ2, K_1) in (3.1), with

  K_i = { u ∈ C+ : u = ω_i(z), z ∈ S_I(0, η_M) },  i = 1, 2. (3.14)

Proof of Lemma 3.3.
First note that there is κ_0 > 0 such that Im m_{µ1⊞µ2}(z) ≥ κ_0 for all z ∈ S_I(0, η_M); c.f. (3.11). Moreover, there is M < ∞ such that |m_{µ1⊞µ2}(z)| ≤ M for all z ∈ S_I(0, η_M); c.f. (3.9). Recall from (2.5) and (2.6) that

  ω1(z) + ω2(z) = z − 1/m_{µ1⊞µ2}(z),  z ∈ C+. (3.15)

Hence, considering the imaginary part, we notice from (3.9) that there is κ_1 > 0 such that

  min_{z ∈ S_I(0,η_M)} (Im ω1(z) + Im ω2(z)) ≥ κ_1. (3.16)

It remains to show that Im ω1 and Im ω2 are separately bounded from below. To do so we invoke (3.1) and assume by contradiction that Im ω2(z) ≤ ε, for some small 0 ≤ ε < κ_1/2; then Im ω1(z) ≥ κ_1/2. Since µ2 is assumed not to be a single point mass, Lemma 3.1 assures that

  Im F_{µ2}(ω1(z)) ≥ Im ω1(z)/(1 − σ_2(µ2, K_1)),  z ∈ S_I(0, η_M), (3.17)

with 0 < σ_2(µ2, K_1) < 1, where K_1 denotes the image of S_I(0, η_M) under the map ω1 (which is necessarily compact by Lemma 3.2). On the other hand, (2.11) implies

  Im F_{µ2}(ω1(z)) = Im ω1(z) + Im ω2(z) − Im z,  z ∈ C+. (3.18)

Since Im ω2(z) ≥ Im z, by Proposition 2.1, we get, by comparing (3.18) and (3.17), a contradiction with the assumption that Im ω2(z) ≤ ε, for sufficiently small ε. Repeating the argument with the rôles of the indices 1 and 2 interchanged, we get (3.13). □

Linear stability of (2.11). Having established lower and upper bounds on the subordination functions ω1, ω2, we now turn to the stability of the system Φ_{µ1,µ2}(ω1, ω2, z) = 0. Remember that we call the system linearly S-stable at (ω1, ω2) if Γ_{µ1,µ2}(ω1, ω2) ≤ S, where Γ_{µ1,µ2} is defined in (2.12).

Lemma 3.4.
Let µ₁, µ₂ be two probability measures on ℝ satisfying (3.7) for some L < ∞. Assume that neither of them is a single point mass and that at least one of them is supported at more than two points. Let I ⊂ B_{µ₁⊞µ₂} be a compact non-empty interval. Then, there is a finite constant S such that

  max_{z∈S_I(0,η_M)} Γ_{µ₁,µ₂}(ω₁(z), ω₂(z)) ≤ S ,   (3.19)

and

  max_{z∈S_I(0,η_M)} |ω₁′(z)| ≤ S ,   max_{z∈S_I(0,η_M)} |ω₂′(z)| ≤ S ,   (3.20)

where ω₁(z), ω₂(z) are the solutions to Φ_{µ₁,µ₂}(ω₁, ω₂, z) = 0.

Remark 3.2. Lemma 3.4 is the first instance where we use that at least one of µ₁ and µ₂ is supported at more than two points. For definiteness, we assume that µ₁ is supported at more than two points. The constant S in (3.19) depends on the interval I through the constant κ in (3.8), on the constants η_M, L in (3.7), σ(µ₁, K₂) and σ(µ₂, K₁), as well as on σ̃(µ₁, K₂) of (3.2), with K₂ defined in (3.14).

Proof of Lemma 3.4.
Using (2.12) and Cramer's rule, Γ ≡ Γ_{µ₁,µ₂}(ω₁, ω₂, z) equals

  Γ = 1/|1 − (F′_{µ₁}(ω₂) − 1)(F′_{µ₂}(ω₁) − 1)| · ‖ ( −1 , −(F′_{µ₁}(ω₂(z)) − 1) ; −(F′_{µ₂}(ω₁(z)) − 1) , −1 ) ‖ .   (3.21)

As above, we assume for definiteness that µ₁ is supported at more than two points. We first focus on F′_{µ₁}(ω₂). Recalling the definition of K₂ from Remark 3.1 and invoking (3.2), we obtain

  |F′_{µ₁}(ω₂(z)) − 1| ≤ (1 − σ̃²(µ₁, K₂)) ( Im F_{µ₁}(ω₂(z)) − Im ω₂(z) )/Im ω₂(z) ,   (3.22)

for all z ∈ S_I(0,η_M), where 0 < σ̃(µ₁, K₂) < 1. Abbreviating σ̃ ≡ σ̃(µ₁, K₂) and using Φ_{µ₁,µ₂}(ω₁(z), ω₂(z), z) = 0, we thus have

  |F′_{µ₁}(ω₂(z)) − 1| ≤ (1 − σ̃²) Im ω₁(z)/Im ω₂(z) ,   z ∈ S_I(0,η_M) .   (3.23)

Reasoning in a similar way (c.f., (4.9)), we also obtain

  |F′_{µ₂}(ω₁(z)) − 1| ≤ Im ω₂(z)/Im ω₁(z) ,   z ∈ ℂ⁺ ,   (3.24)

where the inequality may saturate here since we do not exclude µ₂ being supported at two points only. Multiplying (3.23) and (3.24), we get

  max_{z∈S_I(0,η_M)} |F′_{µ₁}(ω₂(z)) − 1| |F′_{µ₂}(ω₁(z)) − 1| ≤ 1 − σ̃² .   (3.25)

Using Lemma 3.2 and Lemma 3.3, we also have from (3.23) and (3.24) that

  max_{z∈S_I(0,η_M)} |F′_{µ_i}(ω_j(z)) − 1| ≤ K/k ,   {i, j} = {1, 2} .   (3.26)

Hence, bounding the operator norm by the Hilbert–Schmidt norm in (3.21), we obtain by (3.26) and (3.25) that

  max_{z∈S_I(0,η_M)} Γ_{µ₁,µ₂}(ω₁(z), ω₂(z)) ≤ (√2/σ̃²) ( 1 + (K/k)² )^{1/2} =: S ,   (3.27)

with a finite constant S. This proves (3.19). The estimates in (3.20) follow by differentiating the equation Φ_{µ₁,µ₂}(ω₁(z), ω₂(z), z) = 0 with respect to z. We get

  ( −1 , F′_{µ₁}(ω₂(z)) − 1 ; F′_{µ₂}(ω₁(z)) − 1 , −1 ) (ω₁′(z), ω₂′(z))ᵀ = (−1, −1)ᵀ .   (3.28)

From (3.19) we know that Φ is uniformly S-stable and we get (3.20) by inverting (3.28). □

Remark 3.3. The crucial estimate in the proof above is (3.25). An alternative proof of (3.25), under the assumption that both µ₁ and µ₂ are supported at more than two points, was pointed out by an anonymous referee. From (2.5) we observe that the subordination function ω₁(z) appears, for fixed z ∈ ℂ⁺, as the fixed point of the map

  F_z : ℂ⁺ → ℂ⁺ ,   u ↦ F_z(u) := F_{µ₁}( F_{µ₂}(u) − u + z ) − ( F_{µ₂}(u) − u + z ) + z .   (3.29)

Indeed, assuming that ω₁(z) ∈ ℂ⁺ and that µ₁, µ₂ are supported at least at three points (so that F_{µ₁}(z) − z and F_{µ₂}(z) − z are not Möbius transformations), the fixed point ω₁(z) is attracting, as was shown in [4]. Thus, for any fixed k > 0, the Schwarz–Pick theorem and (2.5) imply that for any compact subset K̂ of {z ∈ ℂ⁺ ∪ ℝ : Im ω₁(z), Im ω₂(z) ≥ k} there is a constant σ̂(K̂) < 1 such that

  |F′_{µ₁}(ω₂(z)) − 1| |F′_{µ₂}(ω₁(z)) − 1| ≤ σ̂(K̂) < 1 ,   for any z ∈ K̂ .

Thus, under the assumption that µ₁ and µ₂ are both supported at least at three points, (3.25) follows from Lemma 3.2 and Lemma 3.3.

Collecting the results of this section, we obtain the proof of Theorem 2.5.

Proof of Theorem 2.5.
Lemma 3.3 proves (2.14). Lemma 3.4 proves (2.15). □

Perturbations of the system (2.11)

In this section, we study perturbations of the system Φ_{µ₁,µ₂}(ω₁, ω₂, z) = 0, where µ₁, µ₂ denote general compactly supported probability measures on ℝ. The main result of this section, Proposition 4.1 below, is used repeatedly in the continuity argument to prove Theorem 2.8. Yet, as noted in Corollary 2.6, it is of interest in itself and it is also used in [3].

Proposition 4.1.
Fix z₀ ∈ ℂ⁺. Assume that the functions ω̃₁, ω̃₂, r̃₁, r̃₂ : ℂ⁺ → ℂ satisfy Im ω̃₁(z₀) > 0, Im ω̃₂(z₀) > 0 and

  Φ_{µ₁,µ₂}(ω̃₁(z₀), ω̃₂(z₀), z₀) = r̃(z₀) ,   (4.1)

where r̃(z₀) := (r̃₁(z₀), r̃₂(z₀))ᵀ. Assume moreover that there is δ ∈ [0, 1) such that

  |ω̃₁(z₀) − ω₁(z₀)| ≤ δ ,   |ω̃₂(z₀) − ω₂(z₀)| ≤ δ ,   (4.2)

where ω₁(z₀), ω₂(z₀) solve the unperturbed system Φ_{µ₁,µ₂}(ω₁, ω₂, z₀) = 0. Assume that there is a constant S such that Φ is linearly S-stable at (ω₁(z₀), ω₂(z₀)), and assume in addition that there are strictly positive constants K and k, with k > 2δ and k² > 16δKS, such that

  0 < k ≤ Im ω₁(z₀) ≤ K ,   0 < k ≤ Im ω₂(z₀) ≤ K .   (4.3)

Then we have the bounds

  |ω̃₁(z₀) − ω₁(z₀)| ≤ 2S ‖r̃(z₀)‖ ,   |ω̃₂(z₀) − ω₂(z₀)| ≤ 2S ‖r̃(z₀)‖ .   (4.4)

Proof.
Combining (4.3) and (4.2) with δ < k/2, we get

  Im ω̃₁(z₀) ≥ k/2 ,   Im ω̃₂(z₀) ≥ k/2 .   (4.5)

Next, we bound higher derivatives of F_i ≡ F_{µ_i}, i = 1, 2. We first note that by the Nevanlinna representation (3.3) we have

  Im F_i(ω)/Im ω = 1 + ∫_ℝ (1 + x²)/|x − ω|² dρ_{F_i}(x) ,   ω ∈ ℂ⁺ ,  i = 1, 2 .   (4.6)

On the other hand, we also have from (3.3) that

  |F_i′(ω) − 1| ≤ ∫_ℝ (1 + x²)/|x − ω|² dρ_{F_i}(x) ,   ω ∈ ℂ⁺ ,  i = 1, 2 ,   (4.7)

and analogously for higher derivatives, n ≥ 2,

  |F_i^{(n)}(ω)| ≤ n! ∫_ℝ (1 + x²)/|x − ω|^{n+1} dρ_{F_i}(x) ≤ n!/(Im ω)^{n−1} · ∫_ℝ (1 + x²)/|x − ω|² dρ_{F_i}(x) ,   (4.8)

ω ∈ ℂ⁺, i = 1, 2. Thus, combining (4.7), (4.6) and (2.11) we get

  |F_i′(ω_j(z)) − 1| ≤ ( Im F_i(ω_j(z)) − Im ω_j(z) )/Im ω_j(z) = ( Im ω_i(z) − Im z )/Im ω_j(z) ,   {i, j} = {1, 2} ,   (4.9)

z ∈ ℂ⁺, and similarly, starting from (4.8),

  |F_i^{(n)}(ω_j(z))| ≤ n! ( Im ω_i(z) − Im z )/(Im ω_j(z))ⁿ ,   z ∈ ℂ⁺ ,  {i, j} = {1, 2} .   (4.10)

Let Ω_i(z) := ω̃_i(z) − ω_i(z), i = 1, 2, and Ω := (Ω₁, Ω₂)ᵀ. Fixing z = z₀ and Taylor expanding F₁(ω̃₂(z₀)) around ω₂(z₀) we get

  F₁′(ω₂(z₀))Ω₂(z₀) − Ω₁(z₀) − Ω₂(z₀) = r̃₁(z₀) − Σ_{n≥2} (1/n!) F₁^{(n)}(ω₂(z₀)) Ω₂(z₀)ⁿ .   (4.11)

Recalling that ‖Ω(z₀)‖/k ≤ 2δ/k < 1, we get from (4.10) and (4.11) that

  |F₁′(ω₂(z₀))Ω₂(z₀) − Ω₁(z₀) − Ω₂(z₀)| ≤ ‖r̃(z₀)‖ + (2K/k²) ‖Ω(z₀)‖² ,   (4.12)

and the analogous expansion with the rôles of the indices 1 and 2 interchanged. We therefore obtain from (2.12) and from solving the linearized equation that

  ‖Ω(z₀)‖ ≤ S ‖r̃(z₀)‖ + (4KS/k²) ‖Ω(z₀)‖² .   (4.13)

Thus, we have the dichotomy that either ‖Ω(z₀)‖ ≤ 2S‖r̃(z₀)‖ or k²/(8KS) ≤ ‖Ω(z₀)‖. Since k² > 16δKS by assumption, the second alternative contradicts ‖Ω(z₀)‖ ≤ 2δ. This proves the estimates in (4.4). □

In Proposition 4.1 we assumed the a priori bound |ω̃_i − ω_i| ≤ δ; see (4.2). The next lemma shows that we may drop this assumption, for spectral parameters z with sufficiently large imaginary part, at the price of assuming effective lower bounds on Im ω̃_i. This statement will be used as an initial input to start the continuity argument in Section 6.

Lemma 4.2.
Assume there is a (large) η̃ > 0 such that for any z ∈ ℂ⁺ with Im z ≥ η̃ the analytic functions ω̃₁, ω̃₂, r̃₁, r̃₂ : ℂ⁺ → ℂ satisfy

  Im ω̃₁(z) − Im z ≥ 4‖r̃(z)‖ ,   Im ω̃₂(z) − Im z ≥ 4‖r̃(z)‖ ,   (4.14)

and

  Φ_{µ₁,µ₂}(ω̃₁(z), ω̃₂(z), z) = r̃(z) ,   (4.15)

where r̃(z) := (r̃₁(z), r̃₂(z))ᵀ. Then there is a constant η₀ > 0, with η₀ ≥ η̃, such that

  |ω̃₁(z) − ω₁(z)| ≤ 4‖r̃(z)‖ ,   |ω̃₂(z) − ω₂(z)| ≤ 4‖r̃(z)‖ ,   (4.16)

on the domain {z ∈ ℂ⁺ : Im z ≥ η₀}, where ω₁ and ω₂ are the subordination functions associated with µ₁ and µ₂. The constant η₀ depends on the measures µ₁ and µ₂, and on the function r̃ through the constant η̃ > 0.

Proof.
Recall the Nevanlinna representation (3.3) for F_{µ₁} and F_{µ₂}. Since µ₁ and µ₂ are compactly supported, we have, as Im ω ↗ ∞,

  F_{µ₁}(ω) − ω = a₁ + O(|ω|⁻¹) ,   F_{µ₂}(ω) − ω = a₂ + O(|ω|⁻¹) ,   (4.17)

with a₁ ≡ a_{F_{µ₁}} and a₂ ≡ a_{F_{µ₂}}. There are thus s̃₁, s̃₂ : ℂ⁺ → ℂ such that

  Φ_{µ₁,µ₂}(ω̃₁(z), ω̃₂(z), z) = ( a₁ + s̃₁(z) − ω̃₁(z) + z ; a₂ + s̃₂(z) − ω̃₂(z) + z ) = ( r̃₁(z) ; r̃₂(z) ) ,   (4.18)

with

  s̃₁(z) = O(|ω̃₂(z)|⁻¹) ,   s̃₂(z) = O(|ω̃₁(z)|⁻¹) ,   (4.19)

as Im z ↗ ∞. It follows immediately that ω̃₁(z) = O(Im z) and ω̃₂(z) = O(Im z), as Im z ↗ ∞. Thus, recalling the definition of Γ_{µ₁,µ₂} in (2.12), we get

  Γ_{µ₁,µ₂}(ω̃₁(z), ω̃₂(z)) = 1 + O(η⁻²) ,   (4.20)

as η = Im z ↗ ∞. In particular, we obtain

  ‖((DΦ)⁻¹Φ)(ω̃₁(z), ω̃₂(z), z)‖ ≤ Γ_{µ₁,µ₂}(ω̃₁(z), ω̃₂(z)) ‖Φ(ω̃₁(z), ω̃₂(z), z)‖ ≤ 2‖r̃(z)‖ ,   (4.21)

for Im z sufficiently large. From (4.8) and (4.17), we also get

  |F^{(2)}_{µ_i}(ω)| ≤ 2 ( Im F_{µ_i}(ω) − Im ω )/(Im ω)² = O((Im ω)⁻³) ,   ω ∈ ℂ⁺ ,  i = 1, 2 ,   (4.22)

as Im ω ↗ ∞. Thus the matrix of second derivatives of Φ, given by

  D²Φ(ω₁, ω₂) := ( ∂²Φ/∂ω₁²(ω₁, ω₂, z) , ∂²Φ/∂ω₂²(ω₁, ω₂, z) ) = ( 0 , F^{(2)}_{µ₁}(ω₂) ; F^{(2)}_{µ₂}(ω₁) , 0 ) ,

satisfies ‖D²Φ(ω̃₁(z), ω̃₂(z))‖ = O((Im z)⁻¹), as Im z ↗ ∞. Hence, choosing η₀ > 0 sufficiently large, we can achieve that

  s := 4 ‖r̃(z)‖ ‖D²Φ(ω̃₁(z), ω̃₂(z))‖ < 1/2 ,

on the domain {z ∈ ℂ⁺ : Im z ≥ η₀}. Thus, by the Newton–Kantorovich theorem (see, e.g., Theorem 1 in [25]), there are for every such z unique ω̂₁(z), ω̂₂(z) such that Φ_{µ₁,µ₂}(ω̂₁(z), ω̂₂(z), z) = 0, with

  |ω̃_i(z) − ω̂_i(z)| ≤ (1 − √(1 − 2s))/s · 2‖r̃(z)‖ ≤ 4‖r̃(z)‖ ,   i = 1, 2 .   (4.23)

Finally, we note that Im ω̂₁(z) = Im ω̂₁(z) − Im ω̃₁(z) + Im ω̃₁(z) ≥ Im z, by (4.14) and (4.23), for Im z ≥ η₀. Similarly, Im ω̂₂(z) ≥ Im z, for Im z ≥ η₀.
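As an aside, the Newton scheme behind the Newton–Kantorovich argument is easy to run numerically. The sketch below is our own illustration, not part of the paper: the two atomic measures, the spectral parameter z = 1 + 5i, and the starting point ω₁ = ω₂ = z are arbitrary test choices in the large-Im z regime of Lemma 4.2.

```python
# Newton's method for the 2x2 subordination system
#   Phi_1 = F_{mu1}(omega_2) - omega_1 - omega_2 + z = 0
#   Phi_2 = F_{mu2}(omega_1) - omega_1 - omega_2 + z = 0
# for two illustrative atomic measures (hypothetical example data).

atoms1, atoms2 = (-1.0, 0.0, 1.0), (-2.0, 2.0)

def m(u, atoms):
    # Stieltjes transform of the uniform measure on the given atoms
    return sum(1.0 / (a - u) for a in atoms) / len(atoms)

def mp(u, atoms):
    # derivative of the Stieltjes transform
    return sum(1.0 / (a - u) ** 2 for a in atoms) / len(atoms)

def F(u, atoms):
    # negative reciprocal Stieltjes transform F_mu = -1/m_mu
    return -1.0 / m(u, atoms)

def Fp(u, atoms):
    # F'(u) = m'(u)/m(u)^2
    return mp(u, atoms) / m(u, atoms) ** 2

z = 1.0 + 5.0j
w1 = w2 = z  # starting point with large imaginary part
for _ in range(20):
    p1 = F(w2, atoms1) - w1 - w2 + z          # Phi_1
    p2 = F(w1, atoms2) - w1 - w2 + z          # Phi_2
    b, c = Fp(w2, atoms1) - 1.0, Fp(w1, atoms2) - 1.0
    det = 1.0 - b * c                          # det of DPhi (up to sign), nonzero here
    w1 += (p1 + b * p2) / det                  # Newton step: omega <- omega - (DPhi)^{-1} Phi
    w2 += (c * p1 + p2) / det

res1 = F(w2, atoms1) - w1 - w2 + z
res2 = F(w1, atoms2) - w1 - w2 + z
```

At large Im z the Jacobian DΦ is close to minus the identity, c.f. (4.20), so the iteration settles in a few steps, and the computed solution indeed satisfies Im ω_i ≥ Im z.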
It further follows that Γ_{µ₁,µ₂}(ω̂₁(z), ω̂₂(z)) < ∞, for all z ∈ ℂ⁺ with Im z ≥ η₀; thus ω̂₁(z) and ω̂₂(z) are analytic on {z ∈ ℂ : Im z > η₀} since F_{µ₁} and F_{µ₂} are. Finally, using (4.17) with ω = ω̂₁, ω = ω̂₂ respectively, we see that

  lim_{η↗∞} ω̂₁(iη)/(iη) = lim_{η↗∞} ω̂₂(iη)/(iη) = 1 .

Thus, by the uniqueness claim in Proposition 2.1, ω̂₁(z), ω̂₂(z) agree with ω₁(z), ω₂(z) on the domain {z ∈ ℂ⁺ : Im z ≥ η₀}. This proves (4.16) from (4.23). □

Proof of Theorem 2.7
In the setup of Theorem 2.7 we have two pairs of probability measures on ℝ, (µ_α, µ_β) and (µ_A, µ_B), where we consider µ_α, µ_β as "reference" measures (in the sense that they satisfy the assumptions of Theorem 2.7), while µ_A, µ_B are arbitrary. Under the assumptions of Theorem 2.7 we can apply Theorem 2.5 with the choices µ₁ = µ_α and µ₂ = µ_β. Recall from (2.13) the definition of the domain S_I(a, b), a ≤ b.

Lemma 5.1.
Let µ_A, µ_B and µ_α, µ_β be the probability measures from (2.24) and (2.25) satisfying the assumptions of Theorem 2.7. Let ω_A, ω_B and ω_α, ω_β denote the associated subordination functions given by Proposition 2.1. Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval. Fix 0 < η_M < ∞. Then there are a (small) constant b > 0 and a (large) constant K₀ < ∞, both depending on the measures µ_α and µ_β, on the interval I and on the constant η_M, such that whenever

  d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b   (5.1)

holds, then

  |ω_A(z) − ω_α(z)| ≤ K₀ d_L(µ_A, µ_α)/(Im ω_β(z))² + K₀ d_L(µ_B, µ_β)/(Im ω_α(z))² ,
  |ω_B(z) − ω_β(z)| ≤ K₀ d_L(µ_A, µ_α)/(Im ω_β(z))² + K₀ d_L(µ_B, µ_β)/(Im ω_α(z))² ,   (5.2)

hold uniformly on S_I(0, η_M). In particular, choosing b₀ ≤ b sufficiently small and assuming that d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b₀, we have

  max_{z∈S_I(0,η_M)} |ω_A(z)| ≤ 2K ,   max_{z∈S_I(0,η_M)} |ω_B(z)| ≤ 2K ,   (5.3)
  min_{z∈S_I(0,η_M)} Im ω_A(z) ≥ k/2 ,   min_{z∈S_I(0,η_M)} Im ω_B(z) ≥ k/2 ,   (5.4)

where K and k are the constants from Lemma 3.2 and Lemma 3.3, respectively.

Remark 5.1. Armed with the conclusions of Theorem 2.5, our proof follows closely the arguments of [30]. We further remark that the main argument in the proof of Lemma 5.1 is different from the ones given in Section 4: it crucially relies on the global uniqueness of solutions on the upper half-plane for both systems, Φ_{µ_α,µ_β}(ω_α, ω_β, z) = 0 and Φ_{µ_A,µ_B}(ω_A, ω_B, z) = 0, asserted by Proposition 2.1.

Proof of Lemma 5.1.
We first write the system Φ_{µ_α,µ_β}(ω_α(z), ω_β(z), z) = 0 as

  Φ_{µ_A,µ_B}(ω_α(z), ω_β(z), z) = r(z) ,   z ∈ ℂ⁺ ,

with

  r(z) ≡ ( r_A(z) ; r_B(z) ) := ( F_{µ_A}(ω_β(z)) − F_{µ_α}(ω_β(z)) ; F_{µ_B}(ω_α(z)) − F_{µ_β}(ω_α(z)) ) .   (5.5)

From Lemma 3.3, we know that the imaginary parts of the subordination functions ω_α, ω_β are uniformly bounded from below on S_I(0, η_M). Next, integration by parts reveals that for any probability measures µ₁ and µ₂,

  |m_{µ₁}(z) − m_{µ₂}(z)| ≤ c d_L(µ₁, µ₂)/Im z · ( 1 + 1/Im z ) ,   z ∈ ℂ⁺ ,   (5.6)

with some numerical constant c; see, e.g., [31]. Thus,

  |F_{µ_A}(ω_β(z)) − F_{µ_α}(ω_β(z))| = |m_{µ_A}(ω_β(z)) − m_{µ_α}(ω_β(z))| / |m_{µ_A}(ω_β(z)) m_{µ_α}(ω_β(z))| ≤ C d_L(µ_A, µ_α)/(Im ω_β(z))² ,   (5.7)

with a new constant C that depends on the lower bound on Im m_{µ_α}(ω_β(z)) = Im m_{µ_α⊞µ_β}(z), which is strictly positive on S_I(0, η_M); c.f., (3.12). Here we used

  Im m_{µ_A}(ω_β(z)) ≥ Im m_{µ_α}(ω_β(z)) − |Im m_{µ_A}(ω_β(z)) − Im m_{µ_α}(ω_β(z))| ≥ ½ Im m_{µ_α}(ω_β(z)) ,

as follows from (5.6) for small enough d_L(µ_A, µ_α) ≤ b. Repeating the argument with the rôles of A and B interchanged, we arrive at

  |r_A(z)| ≤ C d_L(µ_A, µ_α)/(Im ω_β(z))² ,   |r_B(z)| ≤ C d_L(µ_B, µ_β)/(Im ω_α(z))² ,   z ∈ S_I(0, η_M) ,   (5.8)

for some constant C. Recalling the definition of Γ in (2.12), we get, for sufficiently small b,

  Γ_{µ_A,µ_B}(ω_α, ω_β) ≤ 2 Γ_{µ_α,µ_β}(ω_α, ω_β) ≤ 2S ,   (5.9)

where S is from Lemma 3.4, and where we also use Lemma 3.3 and the assumption d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b. The Newton–Kantorovich theorem then implies (c.f., the proof of Lemma 4.2 for a similar application) that there are ω̂_A(z), ω̂_B(z) satisfying

  Φ_{µ_A,µ_B}(ω̂_A(z), ω̂_B(z), z) = 0 ,   z ∈ S_I(0, η_M) ,   (5.10)

and

  |ω_α(z) − ω̂_A(z)| ≤ 4S ‖r(z)‖ ,   |ω_β(z) − ω̂_B(z)| ≤ 4S ‖r(z)‖ ,   (5.11)

z ∈ S_I(0, η_M). Invoking (5.8), (5.11) and Lemma 3.3, we see that ω̂_A(z) ∈ ℂ⁺ and ω̂_B(z) ∈ ℂ⁺, for any z ∈ S_I(0, η_M), if b is sufficiently small. Yet, by the global uniqueness of solutions asserted in Proposition 2.1, we must have ω̂_A(z) = ω_A(z), ω̂_B(z) = ω_B(z), z ∈ ℂ⁺. Together with (5.11) and (5.8) this implies (5.2) and concludes the proof. Then, choosing b₀ sufficiently small, (5.3) and (5.4) are direct consequences of (5.2), Lemma 3.2 and Lemma 3.3. □

With the aid of Lemma 3.4, we prove the stability of the system Φ_{µ_A,µ_B}(ω_A, ω_B, z) = 0.

Corollary 5.2.
Under the assumptions of Lemma 5.1, there is a (small) constant b > 0, depending on the measures µ_α and µ_β, on the interval I and on the constant η_M, such that

  d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b   (5.12)

implies

  max_{z∈S_I(0,η_M)} Γ_{µ_A,µ_B}(ω_A(z), ω_B(z)) ≤ 2S   (5.13)

and

  max_{z∈S_I(0,η_M)} |ω_A′(z)| ≤ 2S ,   max_{z∈S_I(0,η_M)} |ω_B′(z)| ≤ 2S ,   (5.14)

where ω_A(z), ω_B(z) satisfy Φ_{µ_A,µ_B}(ω_A(z), ω_B(z), z) = 0 and S is the constant in Lemma 3.4.

Proof. Let Γ ≡ Γ_{µ_A,µ_B}(ω_A(z), ω_B(z)). Analogously to (3.21), we have

  Γ = 1/|1 − (F′_{µ_A}(ω_B) − 1)(F′_{µ_B}(ω_A) − 1)| · ‖ ( −1 , −(F′_{µ_A}(ω_B(z)) − 1) ; −(F′_{µ_B}(ω_A(z)) − 1) , −1 ) ‖ .   (5.15)

Using the bounds (5.3) and (5.4) for sufficiently small b, we follow, mutatis mutandis, the proof of Lemma 3.4 to get (5.13). The estimates in (5.14) then follow as in Lemma 3.4. □

We are now ready to complete the proof of Theorem 2.7.

Proof of Theorem 2.7.
Recall that m_{µ_A⊞µ_B}(z) = m_{µ_A}(ω_B(z)), z ∈ ℂ⁺. We first note that

  |m_{µ_A}(ω_B(z)) − m_{µ_α}(ω_B(z))| ≤ C d_L(µ_A, µ_α)/Im ω_B(z) · ( 1 + 1/Im ω_B(z) ) ,

for some numerical constant C; c.f., (5.6). Thus using (5.4) we get

  |m_{µ_A}(ω_B(z)) − m_{µ_α}(ω_B(z))| ≤ K₁ k⁻² d_L(µ_A, µ_α) ,   z ∈ S_I(0, η_M) ,

for some numerical constant K₁. Choosing b₀ as in Lemma 5.1 and assuming that d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b₀, we get from (5.2) that

  |m_{µ_α}(ω_B(z)) − m_{µ_α}(ω_β(z))| ≤ K₂ k⁻⁴ ( d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ) ,   z ∈ S_I(0, η_M) .

Setting Z := K₁ k⁻² + K₂ k⁻⁴, we thus obtain (2.22). □

Remark 5.2. Note that under the assumptions of Theorem 2.7, we have, for d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b₀, the bounds

  κ/2 ≤ |m_{µ_A⊞µ_B}(z)| ≤ 2/k ,   (5.16)

uniformly on S_I(0, η_M), with κ > 0 the constant from (3.11) and k > 0 the constant from Lemma 3.3.

Proof of Theorem 2.8
Before we immerse into the details of the proof of Theorem 2.8, we outline how Theorem 2.5 and the local stability results of Section 4, in combination with concentration estimates for the unitary groups, lead to the local law in (2.28).

6.1. Outline of proof. We briefly outline our proof when U is Haar distributed on U(N). Since we are interested in the tracial quantity m_H of H = A + UBU*, we may replace H by the matrix

  H̃ := VAV* + UBU* ,   (6.1)

where V is another Haar unitary independent of U. By cyclicity of the trace we have m_H = m_{H̃}, and we study m_{H̃} below. We emphasize that this replacement is a convenient technicality which is not essential to our proof. Using the shorthand

  Ã := VAV* ,   B̃ := UBU* ,   (6.2)

we introduce the Green functions

  G_Ã(z) := (Ã − z)⁻¹ ,   G_B̃(z) := (B̃ − z)⁻¹ ,   z ∈ ℂ⁺ .   (6.3)

For a given N × N matrix Q, we introduce the function

  f_Q(z) := tr Q G_{H̃}(z) ,   z ∈ ℂ⁺ ,   (6.4)

where G_{H̃} = (H̃ − z)⁻¹ is the Green function of H̃. We define the approximate subordination functions, ω^c_A and ω^c_B, by setting

  ω^c_A(z) := z − E f_Ã(z)/E m_{H̃}(z) ,   ω^c_B(z) := z − E f_B̃(z)/E m_{H̃}(z) ,   z ∈ ℂ⁺ ,   (6.5)

where the expectation E is with respect to both Haar unitaries U and V. From the identity (H̃ − z)G_{H̃}(z) = 1, z ∈ ℂ⁺, we then obtain the relation

  ω^c_A(z) + ω^c_B(z) − z = −1/E m_{H̃}(z) ,   z ∈ ℂ⁺ ,   (6.6)

reminiscent of (c.f., (2.5)–(2.6))

  ω_A(z) + ω_B(z) − z = −1/m_{A⊞B}(z) ,   z ∈ ℂ⁺ .

For the proof of Theorem 2.8, we decompose

  m_{H̃}(z) − m_{A⊞B}(z) = ( m_{H̃}(z) − E m_{H̃}(z) ) + ( E m_{H̃}(z) − m_{A⊞B}(z) ) ,   (6.7)

where we abbreviate m_{A⊞B} ≡ m_{µ_A⊞µ_B}. To control the fluctuation part, m_{H̃}(z) − E m_{H̃}(z), we rely on the Gromov–Milman concentration inequality [26] for the unitary group; see (6.22) below. To control the deterministic part, we first note that, by (6.6) and m_{A⊞B}(z) = m_A(ω_B(z)), bounding |E m_{H̃}(z) − m_{A⊞B}(z)| amounts to bounding |ω^c_A(z) − ω_A(z)| and |ω^c_B(z) − ω_B(z)|.
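For orientation, the approximate subordination functions in (6.5) can be probed numerically. The sketch below is our own illustration (not part of the paper) and assumes NumPy; the matrix size, sample count and tolerances are arbitrary test choices. We take A and B with spectral distribution ½(δ₋₁ + δ₊₁), so that µ_A ⊞ µ_B is the arcsine law on [−2, 2], with ω_A(i) = ω_B(i) = i(1+√5)/2 and m_{A⊞B}(i) = i/√5, and we replace the Haar expectations by a small Monte Carlo average:

```python
import numpy as np

rng = np.random.default_rng(0)
N, samples, z = 200, 20, 1j   # illustrative choices

def haar_unitary(n):
    # QR of a complex Ginibre matrix, with the phases of R fixed, is Haar distributed
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

a = np.diag(np.tile([1.0, -1.0], N // 2))   # spectral measure (delta_{-1}+delta_{+1})/2
m_avg, fA_avg = 0.0, 0.0
for _ in range(samples):
    u, v = haar_unitary(N), haar_unitary(N)
    At, Bt = v @ a @ v.conj().T, u @ a @ u.conj().T   # A~ and B~ of (6.2)
    g = np.linalg.inv(At + Bt - z * np.eye(N))        # Green function of H~
    m_avg += np.trace(g) / (N * samples)              # Monte Carlo E m_H~(z)
    fA_avg += np.trace(At @ g) / (N * samples)        # Monte Carlo E f_A~(z)

omega_cA = z - fA_avg / m_avg                # approximate subordination function (6.5)
omega_exact = 1j * (1 + np.sqrt(5)) / 2      # omega_A(i) for the arcsine limit
m_exact = 1j / np.sqrt(5)                    # m_{mu_A ⊞ mu_B}(i)
```

Already at these modest sizes ω^c_A(i) agrees with the limiting subordination function to a few percent, consistent with the heuristic that the error in (6.8) is small.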
We then show that ω^c_A(z) and ω^c_B(z) are both in the upper half-plane and satisfy

  Φ_{µ_A,µ_B}(ω^c_A(z), ω^c_B(z), z) = r(z) ,   z ∈ S_I(η_m, η_M) ,   (6.8)

for some small error r(z), i.e., we consider (6.8) as a perturbation of the system Φ_{µ_A,µ_B}(ω_A(z), ω_B(z), z) = 0; c.f., (2.10). The formal derivation of (6.8) goes back to Pastur and Vasilchuk [33]. Using Proposition 4.1 (with rough a priori estimates on |ω^c_A(z) − ω_A(z)| and |ω^c_B(z) − ω_B(z)| obtained from the continuity argument below) and the stability results of Theorem 2.5 and of Section 5, we then bound |ω^c_A(z) − ω_A(z)| and |ω^c_B(z) − ω_B(z)| in terms of r(z).

In sum, for fixed z ∈ ℂ⁺, our proof includes two parts: (i) estimation of the error r(z) in (6.8), and (ii) concentration of m_{H̃}(z) around E m_{H̃}(z). Both parts rely on the estimates

  E m_{H̃}(z), ω^c_A(z), ω^c_B(z) ∼ 1 ,   Im ω^c_A(z), Im ω^c_B(z) ≳ 1 ,   z ∈ S_I(η_m, η_M) .   (6.9)

Note that the quantities in (6.9) are obtained from the Green function of H̃ by averaging with respect to the Haar measure. Similar bounds for m_{A⊞B}, ω_A and ω_B were obtained in Section 5; these latter quantities are defined directly from µ_A and µ_B via Proposition 2.1. To establish (6.9), we use a similar continuity argument as was used for Wigner matrices in [24]: For Im z = η_M sufficiently large, the estimates in (6.9) directly follow from the definitions. For z = E + iη, with E ∈ I fixed, we decrease η from η_M down to η = η_m in steps of size O(N⁻¹), where, at each step, we invoke parts (i) and (ii). However, a direct application of the Gromov–Milman concentration inequality for part (ii) does not allow to push η below the mesoscopic scale η = N^{−1/3}. Indeed, the Gromov–Milman inequality is effective if L²/N = o(1), where L is the Lipschitz constant of m_{H̃}(z) with respect to the Haar unitary V. It is roughly bounded by (tr |G_{H̃}(z)|⁴/N)^{1/2}, which in turn is trivially bounded by 1/(Nη⁴)^{1/2}, giving the η ≥ N^{−1/3+γ}, γ > 0, threshold. However, in reality, the random quantity (tr |G_{H̃}(z)|⁴/N)^{1/2} is typically of order 1/(Nη³)^{1/2}, as follows by combining the deterministic estimate tr |G_{H̃}(z)|² = η⁻¹ Im m_{H̃}(z) with a probabilistic order one bound for Im m_{H̃}(z). Our key novelty here is to capitalize on this latter information. We introduce a smooth cutoff that regularizes m_{H̃}(z) and then apply the Gromov–Milman inequality to this regularized quantity. With the bound 1/(Nη³)^{1/2} for the Lipschitz constant, we get concentration estimates down to scales η ≥ N^{−2/3+γ}, γ > 0.

Notation.
The following notation for high-probability estimates is suited for our purposes. A slightly different form was first used in [21].
Definition 6.1.
Let

  X = ( X^{(N)}(v) : N ∈ ℕ, v ∈ V^{(N)} ) ,   Y = ( Y^{(N)}(v) : N ∈ ℕ, v ∈ V^{(N)} )   (6.10)

be two families of nonnegative random variables, where V^{(N)} is a possibly N-dependent parameter set. We say that Y stochastically dominates X, uniformly in v, if for all (small) ε > 0 and (large) D > 0,

  P( ⋃_{v∈V^{(N)}} { X^{(N)}(v) > N^ε Y^{(N)}(v) } ) ≤ N^{−D} ,   (6.11)

for sufficiently large N ≥ N₀(ε, D). If Y stochastically dominates X, uniformly in v, we write X ≺ Y. If we wish to indicate the set V^{(N)} explicitly, we write that X(v) ≺ Y(v) for all v ∈ V^{(N)}.

Localized Gromov–Milman concentration estimate.
In this subsection, we derive concentration bounds for some key tracial quantities. They are tailored for the continuity argument of Subsection 6.3 used to complete the proof of Theorem 2.8. The argument works with U, V independent and both Haar distributed on U(N) or on O(N). Below, E denotes the expectation with respect to the Haar measure. In the rest of this section, we let I ⊂ B_{µ_α⊞µ_β} denote the compact non-empty subset fixed in Theorem 2.8. Also recall from Theorem 2.8 that we set η_m = N^{−2/3+γ}, γ > 0. Below we choose the constant η_M, 0 < η_M < ∞, of order one but otherwise arbitrary. Recall from (6.4) the notation f_Q, where Q is an arbitrary N × N matrix.

Proposition 6.2.
Let Q be a given N × N deterministic matrix with ‖Q‖ ≲ 1. Fix E ∈ I and η̂ ∈ [η_m, η_M]. Then

  Im m_{H̃}(E + iη) ≺ 1 ,   ∀η ∈ [η̂, η_M] ,   (6.12)

implies the concentration bound

  | f_{VQV*}(E + iη̂) − E f_{VQV*}(E + iη̂) | ≺ 1/(N η̂^{3/2}) .   (6.13)

The same concentration holds with VQV* replaced by UQU*.

Proof. For fixed E ∈ I, we consider z = E + iη ∈ ℂ⁺ as a varying spectral parameter and use ẑ = E + iη̂ for the specific choice in the lemma. By the definition of f_{(·)} and cyclicity of the trace, we have

  f_{VQV*}(z) = tr VQV* ( VAV* + UBU* − z )⁻¹ = tr Q ( A + V*UBU*V − z )⁻¹ ,   (6.14)

where tr(·) stands for the normalized trace. For simplicity, we denote

  W := V*U ,   H := A + WBW* ,   G_H(z) := (H − z)⁻¹ .   (6.15)

Observe that W is Haar distributed on U(N), respectively O(N), too. By cyclicity of the trace we have tr G_{H̃}(z) = tr G_H(z) = m_H(z). According to (6.14) and (6.15), we may regard in the sequel f_{VQV*} as a function of the Haar unitary matrix W by writing

  h(z) = h_W(z) := f_{VQV*}(z) .

For any fixed (small) ε > 0, let χ̂ be a smooth cutoff supported on [0, 2N^ε], with χ̂(x) = 1, x ∈ [0, N^ε], and with bounded derivatives. Since m_H(z) = tr G_H(z), we can regard m_H(z) as a function of W and write

  χ(z) = χ_W(z) := χ̂( Im m_H(z) ) .   (6.16)

We then introduce a regularization, h̃_W, of h_W by setting

  h̃(z) = h̃_W(z) := h_W(z) ∏_{n=0}^{⌈−log₂ η⌉} χ_W(E + i2ⁿη) .   (6.17)

We will often drop the W subscript from the notations h_W(z), h̃_W(z) and χ_W(z), but remember that these are random variables depending on the Haar unitary W. We will use assumption (6.12) at dyadic points, i.e., that

  Im m_H(E + i2^l η̂) ≺ 1 ,   0 ≤ l ≤ ⌈−log₂ η̂⌉ ,   (6.18)

(recall that m_H(z) = m_{H̃}(z), so we may drop the tilde in the subscript of m). Hence, by (6.16) and (6.18) we see that, for arbitrary large D > 0,

  ∏_{l=0}^{⌈−log₂ η̂⌉} χ(E + i2^l η̂) = 1 ,   i.e.,   h̃(ẑ) = h(ẑ) ,   (6.19)

with probability larger than 1 − N^{−D}, for N sufficiently large (depending on ε and D). Taking the trivial bound ‖Q‖/η̂ for h(ẑ) and for h̃(ẑ) into account, we also have

  E h̃(ẑ) − E h(ẑ) = O( N^{−D+1} ) .   (6.20)

To prove (6.13), it therefore suffices to establish the concentration estimate

  | h̃(ẑ) − E h̃(ẑ) | ≺ 1/(N η̂^{3/2}) ,   (6.21)

for the regularized quantity h̃(ẑ). To verify (6.21), we use the Gromov–Milman concentration inequality [26] (see Theorem 4.4.27 in [2] for similar applications), which states the following. Let M(N) = SO(N) or SU(N), endowed with the Riemannian metric ‖ds‖₂ inherited from M_N(ℂ) (equipped with the Hilbert–Schmidt norm).
If g : (M(N), ‖ds‖₂) → ℝ is an L-Lipschitz function satisfying E g = 0, then

  P( |g| > δ ) ≤ e^{−c N δ²/L²} ,   ∀δ > 0 ,   (6.22)

with some numerical constant c > 0 (independent of N). Here P and E are with respect to the Haar measure on M(N). In order to apply (6.22) to the function W ↦ h̃_W(ẑ) = h̃(ẑ), we need to control its Lipschitz constant. To that end, we define the event

  Ω(η̂) ≡ Ω_E(η̂) := { Im m_H(E + i2ⁿη̂) ≤ N^ε : ∀n ∈ ℕ } .   (6.23)

To bound the Lipschitz constant, we need to bound quantities of the form tr |G_H(ẑ)|^k restricted to the event Ω(η̂). Let (λ_i(H)) denote the eigenvalues of H and introduce

  I_n := [E − 2ⁿη̂, E + 2ⁿη̂] ,   N_n := |{ i : λ_i(H) ∈ I_n }| ,   n ∈ ℕ .

Since H and H̃ are unitarily equivalent, their empirical eigenvalue distributions are the same, µ_H; c.f., (2.29). Using the definition of the Stieltjes transform we have, for all n ∈ ℕ, the estimate

  N_n = N ∫_{I_n} dµ_H ≤ N · 2^{n+1}η̂ ∫_{E−2ⁿη̂}^{E+2ⁿη̂} (2ⁿη̂) dµ_H(x)/((x − E)² + (2ⁿη̂)²) ≤ N · 2^{n+1}η̂ Im m_H(E + i2ⁿη̂) .

Thus we have on the event Ω(η̂) that

  N_n ≲ 2ⁿ N^{1+ε} η̂ ,   ∀n ∈ ℕ .   (6.24)

By the spectral theorem, we can bound

  tr |G_H(ẑ)| ≲ (1/N) Σ_{i=1}^N 1/( |λ_i(H) − E| + η̂ ) .   (6.25)

Then we observe (with the convention I_{−1} = ∅) that

  (1/N) Σ_{i=1}^N 1/(|λ_i(H) − E| + η̂) = (1/N) Σ_{i=1}^N Σ_{n=0}^∞ 𝟙(λ_i ∈ I_n \ I_{n−1})/(|λ_i(H) − E| + η̂) = (1/N) Σ_{n=0}^{⌈c log N⌉} Σ_{λ_i∈I_n\I_{n−1}} 1/(|λ_i(H) − E| + η̂) ,

where we used ‖H‖ ≤ C to truncate the sum over n at ⌈c log N⌉. We then bound

  𝟙(Ω(η̂)) (1/N) Σ_{n=0}^{⌈c log N⌉} Σ_{λ_i∈I_n\I_{n−1}} 1/(|λ_i(H) − E| + η̂) ≤ 𝟙(Ω(η̂)) (1/N) Σ_{n=0}^{⌈c log N⌉} N_n/(2^{n−1}η̂) ≲ N^ε log N ,

where we used (6.24), i.e., with (6.25) we arrive at

  𝟙(Ω(η̂)) tr |G_H(ẑ)| ≲ N^ε log N .   (6.26)

Using the spectral decomposition of H we see that

  tr |G_H(ẑ)|² = (1/N) Σ_{i=1}^N 1/((λ_i(H) − E)² + η̂²) = (1/(N η̂)) Σ_{i=1}^N η̂/((λ_i(H) − E)² + η̂²) = Im m_H(ẑ)/η̂ ,   (6.27)

where we also used that tr G_H(ẑ) G_H*(ẑ) = tr |G_H(ẑ)|² and Im tr G_H(ẑ) = Im m_H(ẑ). Thus, we bound

  𝟙(Ω(η̂)) tr |G_H(ẑ)|^k ≤ 𝟙(Ω(η̂)) η̂^{−k+1} Im m_H(ẑ) ≲ N^ε η̂^{−k+1} ,   ∀k ≥ 2 .   (6.28)

Having established (6.26) and (6.28), we proceed to estimate the Lipschitz constant of h̃(ẑ) as a function of W. Let su(N) and so(N) denote the (fundamental representations in M_N(ℂ) of the) Lie algebras of SU(N) and SO(N), respectively. Let m stand for either su(N) or so(N). Note that X ∈ m satisfies X* = −X. Since SU(N) and SO(N) are matrix groups, the Lie bracket on su(N), respectively so(N), is given by the commutator in the matrix algebra. For fixed X ∈ M_N(ℂ), we let ad_X : M_N(ℂ) → M_N(ℂ), Y ↦ ad_X(Y) := XY − YX. For X ∈ m and t ∈ ℝ, we may write e^{t ad_X}(WBW*) = (e^{tX}W) B (e^{tX}W)*, where we used that X* = −X. Further note that

  d/dt e^{t ad_X}(WBW*) = e^{t ad_X} ad_X(WBW*) .   (6.29)

For X ∈ m with ‖X‖₂ = 1, we let

  G_H(z, tX) := ( A + e^{t ad_X}(WBW*) − z )⁻¹ ,   t ∈ ℝ ,

and denote accordingly

  m_H(z, tX) := tr G_H(z, tX) ,   χ(z, tX) := χ̂( Im m_H(z, tX) ) ,   h̃(z, tX) := tr Q G_H(z, tX) ∏_{l=0}^{⌈−log₂ η̂⌉} χ(E + i2^l η̂, tX) ,

with χ(z, 0) ≡ χ(z), h̃(z, 0) ≡ h̃(z), etc.
Evaluating the derivative of e h ( b z, tX ) with respect to t at t = 0 we get ∂∂t e h ( b z, tX ) (cid:12)(cid:12)(cid:12) t =0 = − tr (cid:16) QG H ( b z )ad X ( W BW ∗ ) G H ( b z ) (cid:17) ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) − tr (cid:0) QG H ( b z ) (cid:1)(cid:18) ⌈− log b η ⌉ X j =0 h ⌈− log b η ⌉ Y l =0 l = j χ ( E + i2 l b η ) i · φ ( E + i2 j b η ) × Im tr (cid:16) G H ( E + i2 j b η ) ad X ( W BW ∗ ) G H ( E + i2 j b η ) (cid:17)(cid:19) , (6.30)where we used (6.29) and where we introduced φ ( z ) := b χ ′ (Im m H ( z )), with b χ ′ the derivativeof b χ . Recalling (6.16) and the definition of the cutoff b χ , we note the bounds ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) ≤ , ⌈− log b η ⌉ X j =0 h Y l = j χ ( E + i2 l b η ) i · φ ( E + i2 j b η ) = O (log N ) . (6.31)On the event Ω c ( b η ), the complementary event to Ω( b η ), we further have the identities ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) = 0 , ⌈− log b η ⌉ X j =0 h ⌈− log b η ⌉ Y l =0 l = j χ ( E + i2 l b η ) i · φ ( E + i2 j b η ) = 0 . (6.32)It thus suffices to bound (6.30) on the event Ω( b η ). We bound the first term on the right sideof (6.30) as (Ω( b η )) (cid:12)(cid:12)(cid:12) tr (cid:16) QG H ( b z )ad X ( W BW ∗ ) G H ( b z ) (cid:17) ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) (cid:12)(cid:12)(cid:12) ≤ (Ω( b η )) 1 N k ad X ( W BW ∗ ) k k G H ( b z ) QG H ( b z ) k ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) , (6.33)where we used cyclicity of the trace and Cauchy–Schwarz inequality. Next, note that k ad X ( W BW ∗ ) k ≤ k B kk X k ≤ k B k , where we used the definition of ad X , k W k ≤ k X k = 1. Similarly, we have k G H ( b z ) QG H ( b z ) k ≤ k Q kk G H ( b z ) G ∗H ( b z ) k . Thus from (6.33), (Ω( b η )) (cid:12)(cid:12)(cid:12) tr (cid:16) QG H ( b z )ad X ( W BW ∗ ) G H ( b z ) (cid:17) ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) (cid:12)(cid:12)(cid:12) ≤ k B kk Q k (Ω( b η )) (cid:18) tr | G H ( b z ) | N (cid:19) ⌈− log b η ⌉ Y l =0 χ ( E + i2 l b η ) . 
  ≲ ( N^ε / (Nη̂³) )^{1/2} ,   (6.34)

where we used (6.28) with k = 4 in the last step. To handle the second term on the right side of (6.30), we use (6.31) and (6.26) to get

  1_{Ω(η̂)} | tr( QG_H(ẑ) ) Σ_{j=0}^{⌈−log₂ η̂⌉} [ ∏_{l≠j} χ(E + i2^l η̂) ] · φ(E + i2^j η̂) × Im tr( G_H(E + i2^j η̂) ad_X(WBW*) G_H(E + i2^j η̂) ) |
   ≤ 1_{Ω(η̂)} ‖Q‖ tr |G_H(ẑ)| Σ_{j=0}^{⌈−log₂ η̂⌉} [ ∏_{l≠j} χ(E + i2^l η̂) ] |φ(E + i2^j η̂)| × 2‖B‖ ( tr |G_H(E + i2^j η̂)|⁴ / N )^{1/2} ≲ ( N^ε / (Nη̂³) )^{1/2} .   (6.35)

Combining (6.35) and (6.34) we obtain, for any X ∈ su(N) or so(N) with ‖X‖₂ = 1, that

  | ∂/∂t h̃(ẑ, tX)|_{t=0} | ≲ ( N^ε / (Nη̂³) )^{1/2} ,   (6.36)

i.e., the Lipschitz constant of h̃(ẑ) as a function of W is bounded by C ( N^ε / (Nη̂³) )^{1/2}, for some constant C depending only on ‖B‖ and ‖Q‖. Thus, taking

  g = h̃(ẑ) − E h̃(ẑ) ,  L = C ( N^ε / (Nη̂³) )^{1/2} ,  δ = N^{ε/2} / √(N²η̂³) ,

in (6.22), and choosing ε > 0 sufficiently small, the claim follows. □

Continuity argument.
In this subsection, we often omit z ∈ C⁺ from the notation. Let U and V be independent and both Haar distributed on either U(N) or O(N). Recalling the notation in Section 6.1, we set

  ∆_A(z) := − ( IE[m_H(z)] ) G_H̃(z) − ( IE[f_B̃(z)] ) G_Ã(z) G_H̃(z) ,
  ∆_B(z) := − ( IE[m_H(z)] ) G_H̃(z) − ( IE[f_Ã(z)] ) G_B̃(z) G_H̃(z) ,  z ∈ C⁺ ,   (6.37)

where we introduced IE X := X − E X, for any random variable X. Using the left-invariance of Haar measure, one derives the identities

  E[ G_H̃ ⊗ ÃG_H̃ ] = E[ ÃG_H̃ ⊗ G_H̃ ] ,  E[ G_H̃ ⊗ B̃G_H̃ ] = E[ B̃G_H̃ ⊗ G_H̃ ] ;

see Theorem 7 in [33] or Appendix A of [31] for proofs. Taking the partial trace in the first component of the tensor products, we get

  E G_H̃(z) = E G_Ã(ω^c_B(z)) + δ^c_A(z) ,  δ^c_A(z) := (1 / E m_H(z)) E[ G_Ã(ω^c_B(z)) (Ã − z) ∆_A(z) ] ,   (6.38)

where ω^c_B(z) is defined in (6.5), we used (6.6), and where we implicitly assumed that Im ω^c_B(z) >
0. This last assumption will be verified along the continuity argument. Then, we set

  r^c_A(z) := − tr δ^c_A(z) / [ tr G_Ã(ω^c_B(z)) ( tr G_Ã(ω^c_B(z)) + tr δ^c_A(z) ) ] ,   (6.39)

and define δ^c_B(z) and r^c_B(z) in the same way by swapping the roles of A and B. Using (6.38), (6.6), we eventually obtain, under the assumption that Im ω^c_A(z) > 0 and Im ω^c_B(z) > 0,

  Φ_{µ_A,µ_B}( ω^c_A(z), ω^c_B(z), z ) = r^c(z) ,  z ∈ C⁺ ,   (6.40)

with r^c(z) = ( r^c_A(z), r^c_B(z) )^⊤ .

Lemma 6.3.
Fix E ∈ I and any η̂ ∈ [η_m, η_M]. Set the notation z = E + iη and ẑ = E + iη̂. Suppose that

  |ω^c_A(z) − ω_A(z)| + |ω^c_B(z) − ω_B(z)| ≤ N^{−γ} ,  ∀ η = Im z ∈ [η̂, η_M] .   (6.41)

Moreover, assume that for the event

  Ξ(η̂) ≡ Ξ_E(η̂) := { |m_H(z) − m_{A⊞B}(z)| ≤ N^{−γ} : z = E + iη, ∀ η ∈ [η̂, η_M] }

we have

  P( Ξ(η̂) ) ≥ 1 − N^{−D} ( N³(η_M − η̂) ) ,   (6.42)

for any D > 0, if N ≥ N₀(D). Then, for any ǫ > 0, the estimates

  |r^c_A(ẑ)| + |r^c_B(ẑ)| ≤ N^ǫ / (N²η̂³) ,   (6.43)

  |ω^c_A(ẑ) − ω_A(ẑ)| + |ω^c_B(ẑ) − ω_B(ẑ)| ≤ N^ǫ / (N²η̂³) ,   (6.44)

  |E m_H(ẑ) − m_{A⊞B}(ẑ)| ≤ N^ǫ / (N²η̂³) ,   (6.45)

hold for any N ≥ N₁(ǫ). Moreover, for any ǫ, D > 0, the event

  Θ(η̂) ≡ Θ_E(η̂) := Ξ_E(η̂) ∩ { |m_H(ẑ) − m_{A⊞B}(ẑ)| ≥ N^ǫ / √(N²η̂³) }   (6.46)

satisfies

  P( Θ(η̂) ) ≤ N^{−D} ,   (6.47)

if N ≥ N₂(ǫ, D). The threshold functions N₀, N₁, N₂ depend only on µ_α, µ_β, the speed of convergence in (2.25), and they are uniform in η̂ ∈ [η_m, η_M] and E ∈ I.

We postpone the proof of Lemma 6.3 and prove Theorem 2.8 first.
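Before turning to the proofs, it may help to see the quantity controlled here in a toy computation. The sketch below (our own illustration, not part of the argument; all parameter choices are ours) takes µ_A = µ_B = ½δ₀ + ½δ₁, for which µ_A ⊞ µ_B is the arcsine law on (0, 2) with Stieltjes transform m(z) = −1/√(z(z − 2)) (branch chosen so that Im m > 0 on C⁺; c.f. (7.7) below with ξ = ½), and compares it with the empirical m_H of a single sample H = A + UBU* with Haar-distributed U.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400
# A, B diagonal with spectral distribution (1/2) delta_0 + (1/2) delta_1
a = np.zeros(N)
a[: N // 2] = 1.0
A = np.diag(a)
B = np.diag(a)

def haar_unitary(n, rng):
    """Haar-distributed unitary: QR of a complex Ginibre matrix with phase fix."""
    X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(X)
    d = np.diag(R)
    return Q * (d / np.abs(d))   # multiply column j by the phase of R_jj

z = 1.0 + 0.5j
# Stieltjes transform of the arcsine law on (0, 2); pick the branch with Im m > 0
m_th = -1.0 / np.sqrt(z * (z - 2.0))
if m_th.imag < 0:
    m_th = -m_th

U = haar_unitary(N, rng)
H = A + U @ B @ U.conj().T
m_emp = np.mean(1.0 / (np.linalg.eigvalsh(H) - z))

assert abs(m_emp - m_th) < 0.05   # deviation shrinks as N grows
```

Already at this moderate N and at a macroscopic Im z the two values agree closely; the lemma quantifies how far down in Im z such agreement persists.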
Proof of Theorem 2.8.
We start by observing that it suffices to prove a version of (2.28) where the real part of the spectral parameter, E, is fixed. This version asserts that there is a large (N-independent) η_M, to be fixed below, such that for any (small) ǫ > 0, any D > 0, and any fixed E ∈ I,

  P( ⋃_{z ∈ S_E(η_m,η_M)} { |m_H(z) − m_{µ_A⊞µ_B}(z)| > N^ǫ / (N(Im z)^{3/2}) } ) ≤ N^{−D} ,   (6.48)

holds for N ≥ N₃, i.e., the set S_I(η_m, η_M) in (2.28) is replaced with S_E(η_m, η_M) := { E + iη : η ∈ [η_m, η_M] }. The threshold N₃ depends on ǫ, D, µ_α, µ_β, I and on the speed of convergence in (2.25). Indeed, by introducing the discretized lattice version

  Ŝ_I(a, b) := S_I(a, b) ∩ N^{−3}{ Z × iZ }

of the spectral domain S_I(a, b) (c.f., (2.13)) and by taking a union bound, we see that (6.48) implies

  P( ⋃_{z ∈ Ŝ_I(η_m,η_M)} { |m_H(z) − m_{µ_A⊞µ_B}(z)| > N^ǫ / (N(Im z)^{3/2}) } ) ≤ CN^{−D+6} .   (6.49)

Thanks to the Lipschitz continuity of the Stieltjes transforms m_H(z) and m_{µ_A⊞µ_B}(z), with Lipschitz constant η^{−2} = (Im z)^{−2} ≤ N², for any Im z ≥ η_m, we see that (2.28) follows from (6.49) after a small adjustment of ǫ and D, which were anyway arbitrary.

From now on we fix E ∈ I and our goal is to prove (6.48). We will use Lemma 6.3. In the first step we verify that the assumptions of this lemma hold for η̂ = η_M, i.e., that (6.41) and (6.42) hold for z = E + iη_M. In the second step, we successively use Lemma 6.3 to reduce η̂, step by step, in decrements of size N^{−3}, until we have verified (6.41)–(6.42) down to η̂ = η_m. Then (6.48) will follow from a final application of Lemma 6.3, combined with a discretization argument similar to the one above, but this time in the η variable instead of the E variable.

Step 1. Initial bound.
First we note that, since µ_A and µ_B are compactly supported, ‖H‖ is deterministically bounded; we thus have Im m_H(E + iη_M) ≤ (η_M)^{−1} ≤ 1 for η_M ≥ 1. Hence the concentration estimate (6.13) yields

  | f_{VQV*}(E + iη_M) − E f_{VQV*}(E + iη_M) | ≺ 1 / (Nη_M^{3/2}) ,   (6.50)

uniformly for any deterministic Q with ‖Q‖ ≲ 1. The analogous concentration holds with V replaced by U. Using (6.50) with Q = I (I the identity matrix), we have |IE[m_H(E + iη_M)]| ≺ N^{−1}. Hence, it suffices to show that

  | E m_H(E + iη_M) − m_{A⊞B}(E + iη_M) | ≺ N^{−1} .   (6.51)

Recalling the definitions of ω^c_A and ω^c_B in (6.5), we have, with z = E + iη_M, the expansion

  ω^c_A(z) = z − E tr ÃG_H̃(z) / E tr G_H̃(z) = z − tr A − ( E tr Ã(Ã + B̃) − tr A tr H̃ ) z^{−1} + O(z^{−2}) ,

as η_M ↗ ∞. Thus, using the assumption tr A = 0, we get

  Im ω^c_A(E + iη_M) − η_M = ( tr A² + E tr ÃB̃ ) η_M / |E + iη_M|² + O( η_M^{−2} ) ,

as η_M ↗ ∞. Next, since V and U are independent, we have

  E tr VAV* UBU* = tr E[VAV*] E[UBU*] = tr A tr B = 0 ,

since tr A = tr B = 0 by assumption. Thus

  Im ω^c_A(E + iη_M) − η_M = tr A² η_M / |E + iη_M|² + O( η_M^{−2} ) ,   (6.52)

as η_M ↗ ∞. Since tr A² > 0, we achieve, by choosing η_M sufficiently large (but independent of N), that

  Im ω^c_A(E + iη_M) − η_M ≥ (1/2) tr A² η_M / |E + iη_M|² ,   (6.53)

and the analogous estimate holds with A replaced by B. In particular, we have, for such η_M,

  Im ω^c_A(E + iη_M) ≳ 1 ,  Im ω^c_B(E + iη_M) ≳ 1 ,   (6.54)

and ω^c_A(E + iη_M) ∼ ω^c_B(E + iη_M) ∼ iη_M. To show (6.51), we apply Lemma 4.2 to the system (6.40). Having established (6.53), it suffices to show that

  | r^c_A(E + iη_M) | ≺ N^{−1} ,  | r^c_B(E + iη_M) | ≺ N^{−1} ,   (6.55)

since then we have, for N sufficiently large and η_M as above, that, for any fixed ε ∈ (0, 1),

  N^ε / N ≤ (1/2) tr A² η_M / |E + iη_M|² ≤ Im ω^c_A(E + iη_M) − η_M ,   (6.56)

and similarly with B replacing A. In particular, combining (6.55) and (6.56), we see that assumption (4.14) of Lemma 4.2 (with the choice r̃ = r^c) is satisfied for N sufficiently large (with high probability). Consequently, we see that (6.41) (even with N^{−1+ǫ} instead of N^{−γ} in the latter) holds for z = E + iη_M. Finally, the equations

  E m_H(z) = 1 / ( z − ω^c_A(z) − ω^c_B(z) ) ,  m_{A⊞B}(z) = 1 / ( z − ω_A(z) − ω_B(z) ) ,   (6.57)

together with the concentration estimate (6.50), yield (6.42). It remains to justify (6.55). Since iη_M m_H(E + iη_M) = −1 + O(η_M^{−1}), we have iη_M E m_H(E + iη_M) ∼ 1. In addition, from (6.54) it follows that m_A(ω^c_B(E + iη_M)) ∼ m_B(ω^c_A(E + iη_M)) ∼ η_M^{−1}. Thus it suffices to show

  | tr δ^c_A(E + iη_M) | ≺ N^{−1} ,  | tr δ^c_B(E + iη_M) | ≺ N^{−1} .   (6.58)

By the definitions of δ^c_A, δ^c_B in (6.38), and ∆_A, ∆_B in (6.37), it is easy to obtain (6.58) by using (6.50) and Cauchy–Schwarz. This completes Step 1, i.e., the verification of (6.41)–(6.42) for η̂ = η_M.

Step 2. Induction.
Recall that ω_A, ω_B and m_{A⊞B} (see Lemma 5.1) are uniformly bounded, and that ω^c_A(z), ω^c_B(z), m_H(z), ω_A(z), ω_B(z) and m_{A⊞B}(z) are Lipschitz continuous with a Lipschitz constant bounded by (Im z)^{−2} ≤ N², for any Im z ≥ η_m. Applying Lemma 6.3 to conclude (6.44) with the choice ǫ = γ/10, we see that if (6.41) and (6.42) hold for some η̂, then (6.41) also holds for η̂ replaced with η̂ − N^{−3}, as long as η̂ ≥ η_m. Moreover, by the Lipschitz continuity of m_H and m_{A⊞B}, notice that

  Ξ(η̂ − N^{−3}) ⊃ Ξ(η̂) \ Θ(η̂) .   (6.59)

Thus, if (6.42) holds for some η̂, then (6.59) and (6.47) imply that (6.42) also holds for η̂ replaced with η̂ − N^{−3}. Using Step 1 as an initial input with the choice η̂ = η_M, and applying the above induction argument O(N³) times, reducing η̂ in steps of size N^{−3}, we see that (6.41) and (6.42) hold for all η̂_k ∈ [η_m, η_M] of the form η̂_k = η_M − k·N^{−3} with some integer k. Applying Lemma 6.3 once more for these η̂_k, but now with an arbitrary ǫ > 0, we get

  | m_H(E + iη̂_k) − m_{A⊞B}(E + iη̂_k) | ≺ 1 / √(N²η̂_k³) ,  k = 0, 1, . . . , k₀ ,   (6.60)

where k₀ is the largest integer with η̂_{k₀} ≥ η_m. The uniformity of (6.60) in k follows from the fact that the threshold functions N_j in Lemma 6.3 are independent of η̂. Clearly k₀ = O(N³), so taking a union bound of (6.60), compensating the combinatorial factor by replacing D with D − 5, and slightly adjusting ǫ to extend the control from the set { z = E + iη̂_k : k ≤ k₀ } to all z ∈ S_E(η_m, η_M), we obtain (6.48). □

It remains to prove Lemma 6.3.

Proof of Lemma 6.3.
First we notice that E ∈ I and (2.25) imply that, for all sufficiently large N, the bounds (5.3)-(5.4) hold. Together with (6.41) they imply that

  ω^c_A(ẑ), ω^c_B(ẑ) ∼ 1 ,  Im ω^c_A(ẑ), Im ω^c_B(ẑ) ≳ 1 ;   (6.61)

moreover, using (6.6), we also get

  1 / |E m_H(ẑ)| ≲ 1 .   (6.62)

We start with (6.43). Thanks to symmetry, we only need to estimate |r^c_A(ẑ)|. By (6.61) we have

  ‖G_Ã(ω^c_B(ẑ))‖ = ‖G_A(ω^c_B(ẑ))‖ ≲ 1 .   (6.63)

Furthermore, ω^c_B(ẑ) ∼ 1 and Im ω^c_B(ẑ) ≳ 1 imply m_A(ω^c_B(ẑ)) ∼ 1. Hence it suffices to show that

  | E[ tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) ∆_A(ẑ) ) ] | ≤ N^ǫ / (N²η̂³) ,   (6.64)

for any ǫ > 0, provided N ≥ N₁(ǫ) is large enough, uniformly for η̂ ∈ [η_m, η_M]. Assuming (6.64) and recalling the definitions of δ^c_A and r^c_A in (6.38)-(6.39), from (6.62) we get the first estimate in (6.43). Next, we prove (6.64). By the definitions in (6.37), we have

  E[ tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) ∆_A(ẑ) ) ] = − E[ IE[m_H(ẑ)] tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) G_H̃(ẑ) ) ] − E[ IE[f_B̃(ẑ)] tr( G_Ã(ω^c_B(ẑ)) G_H̃(ẑ) ) ] .   (6.65)

We rewrite the two terms on the right side separately as covariances,

  E[ IE[m_H(ẑ)] tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) G_H̃(ẑ) ) ] = Cov( m_H(ẑ), tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) G_H̃(ẑ) ) ) ,

respectively,

  E[ IE[f_B̃(ẑ)] tr( G_Ã(ω^c_B(ẑ)) G_H̃(ẑ) ) ] = Cov( f_B̃(ẑ), tr( G_Ã(ω^c_B(ẑ)) G_H̃(ẑ) ) ) ,

where Cov(X, Y) := E( IE[X] · IE[Y] ), for arbitrary random variables X and Y. Given (6.42) and the uniform boundedness of m_{A⊞B}(z) from (5.16), we see that (6.12) is satisfied and we can apply Proposition 6.2 using different choices for Q. Together with the Cauchy–Schwarz inequality |Cov(X, Y)| ≤ ( E|IE[X]|² )^{1/2} ( E|IE[Y]|² )^{1/2}, we get

  | Cov( m_H(ẑ), tr( G_Ã(ω^c_B(ẑ)) (Ã − ẑ) G_H̃(ẑ) ) ) | ≺ 1 / (N²η̂³) ,
  | Cov( f_B̃(ẑ), tr( G_Ã(ω^c_B(ẑ)) G_H̃(ẑ) ) ) | ≺ 1 / (N²η̂³) .   (6.66)

More specifically, for the first line of (6.66), we chose Q = I and Q = G_A(ω^c_B(ẑ))(A − ẑ); for the second line we chose Q = B and Q = G_A(ω^c_B(ẑ)), where we also used the facts Ã = VAV* and B̃ = UBU*. Here, we also implicitly used (6.63). Then, (6.64) follows from (6.66), which in turn proves (6.43). Next, using Proposition 4.1, (6.40) and (6.43), we immediately get (6.44). Moreover, since Im ω_A(z), Im ω_B(z) ≥ Im z, we have |z − ω_A(z) − ω_B(z)| ≥ Im ω_B(z) ≳ 1. Together with (6.44) and (6.57), this implies (6.45). Notice that (6.42) together with the uniform bound on m_{A⊞B} implies the condition (6.12) in Proposition 6.2. Thus, finally, (6.46) and (6.47) follow from (6.45) and the concentration inequality (6.13). This completes the proof of Lemma 6.3. □

Two point mass case
In this section, we discuss stability properties of the free additive convolution µ_α ⊞ µ_β when both µ_α and µ_β are convex combinations of two point masses. The analogous result to Theorem 2.5 is given in Proposition 7.2 below. Applications of that result in the spirit of Theorems 2.7 and 2.8 are then stated in Proposition 7.3 and Proposition 7.4. When we refer to the results in Sections 2-4, we will henceforth regard µ₁ and µ₂ as µ_α and µ_β, respectively, unless specified otherwise.

7.1. Stability in the two point mass case.
Without loss of generality (up to shifting and scaling), we assume that

  µ_α = ξδ₁ + (1 − ξ)δ₀ ,  µ_β = ζδ_θ + (1 − ζ)δ₀ ,  θ ≠ 0 ,  ξ, ζ ∈ (0, 1/2] ,  ξ ≤ ζ ,  (θ, ξ, ζ) ≠ (−1, 1/2, 1/2) .   (7.1)

Here we exclude the case (θ, ξ, ζ) = (−1, 1/2, 1/2) since it is equivalent to (θ, ξ, ζ) = (1, 1/2, 1/2) under a shift. Note that the latter is a special case of µ_α = µ_β. Set

  ℓ₁ := min{ (1/2)( 1 + θ − √((1 − θ)² + 4θr₊) ) , (1/2)( 1 + θ − √((1 − θ)² + 4θr₋) ) } ,
  ℓ₂ := max{ (1/2)( 1 + θ − √((1 − θ)² + 4θr₊) ) , (1/2)( 1 + θ − √((1 − θ)² + 4θr₋) ) } ,
  ℓ₃ := min{ (1/2)( 1 + θ + √((1 − θ)² + 4θr₊) ) , (1/2)( 1 + θ + √((1 − θ)² + 4θr₋) ) } ,
  ℓ₄ := max{ (1/2)( 1 + θ + √((1 − θ)² + 4θr₊) ) , (1/2)( 1 + θ + √((1 − θ)² + 4θr₋) ) } ,

where we introduced

  r± := ξ + ζ − 2ξζ ± 2√( ξζ(1 − ξ)(1 − ζ) ) .   (7.2)

Note that ℓ₁ < ℓ₂ ≤ ℓ₃ < ℓ₄. The following result, taken from [28], describes the regular bulk of µ_α ⊞ µ_β in the setting of (7.1). Recall that f_{µ_α⊞µ_β} denotes the density of (µ_α ⊞ µ_β)_ac.

Lemma 7.1.
Let µ_α and µ_β be as in (7.1). Then the regular bulk is given by

  B_{µ_α⊞µ_β} = (ℓ₁, ℓ₂) ∪ (ℓ₃, ℓ₄) ,   (7.3)

in case µ_α ≠ µ_β, while in case µ_α = µ_β it is given by

  B_{µ_α⊞µ_α} = (ℓ₁, ℓ₄) .   (7.4)

Proof.
Choose the diagonal matrices A and B with spectral distributions µ_A = ξ_N δ₁ + (1 − ξ_N)δ₀ and µ_B = ζ_N δ_θ + (1 − ζ_N)δ₀, respectively, with ξ_N := ⌊ξN⌋/N and ζ_N := ⌊ζN⌋/N, where ⌊·⌋ denotes the integer part. Recall from (7.1) that ξ ≤ ζ and ξ + ζ ≤ 1. From Theorem 1.1 of [28], we first observe that θ and 0 are eigenvalues of the matrix H = A + UBU*, U a Haar unitary, with multiplicities N(ζ_N − ξ_N) and N(1 − ζ_N − ξ_N), respectively. The remaining 2ξ_N N eigenvalues of H may be obtained via a two-fold transformation from the eigenvalues, (t_j), of a ξ_N N-dimensional Jacobi ensemble as

  τ±_j := (1/2)( 1 + θ ± √((1 − θ)² + 4θ t_j) ) ,  j = 1, . . . , ξ_N N ,   (7.5)

and then identifying the spectrum of H as the set {τ⁺_j} ∪ {τ⁻_j} ∪ {0, θ}. In addition, the weak limit of (1/(ξ_N N)) Σ_j δ_{t_j}, as N → ∞, admits a density given by

  f(x) = (1/(2πξ)) √((r₊ − x)(x − r₋)) / (x(1 − x)) · 1_{[r₋, r₊]}(x) ,  x ∈ R ,   (7.6)

where r₊ and r₋ are defined in (7.2). Since the limiting spectral distribution of H is given by µ_α ⊞ µ_β, we see that (µ_α ⊞ µ_β)_ac agrees with the weak limit of the measure (1/N) Σ_j (δ_{τ⁺_j} + δ_{τ⁻_j}), as N → ∞. Using this information together with (7.5) and (7.6), one deduces that supp(µ_α ⊞ µ_β)_ac = [ℓ₁, ℓ₂] ∪ [ℓ₃, ℓ₄]. It then follows from the explicit form of the limiting distribution of the Jacobi ensemble that f_{µ_α⊞µ_β} is bounded and strictly positive inside its support. This proves (7.3).

In the special case µ_α = µ_β, we have ℓ₂ = ℓ₃ = 1 and thus supp(µ_α ⊞ µ_α)_ac = [ℓ₁, ℓ₄], with ℓ₁ = 1 − 2√(ξ(1 − ξ)) and ℓ₄ = 1 + 2√(ξ(1 − ξ)). In fact, the density of (µ_α ⊞ µ_α)_ac equals

  f_{µ_α⊞µ_α}(x) = (1/π) √((ℓ₄ − x)(x − ℓ₁)) / (x(2 − x)) ,  x ∈ (ℓ₁, ℓ₄) ;   (7.7)

see (5.5) of [33] for instance. Then (7.4) follows directly. □
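The formulas above are easy to check numerically. The following sketch (ours, for illustration; the parameter values are arbitrary) computes r± and ℓ₁, …, ℓ₄, verifies the ordering ℓ₁ < ℓ₂ ≤ ℓ₃ < ℓ₄, the merging ℓ₂ = ℓ₃ = 1 in the case µ_α = µ_β, and that the density (7.7) carries total mass 2ξ, the atoms accounting for the remaining 1 − 2ξ.

```python
import numpy as np

def r_pm(xi, zeta):
    # r_± from (7.2)
    s = 2 * np.sqrt(xi * zeta * (1 - xi) * (1 - zeta))
    return xi + zeta - 2 * xi * zeta + s, xi + zeta - 2 * xi * zeta - s

def edges(theta, xi, zeta):
    # l1 <= l2 <= l3 <= l4, with supp (mu_a ⊞ mu_b)_ac = [l1, l2] ∪ [l3, l4]
    rp, rm = r_pm(xi, zeta)
    lo = [0.5 * (1 + theta - np.sqrt((1 - theta) ** 2 + 4 * theta * r)) for r in (rp, rm)]
    hi = [0.5 * (1 + theta + np.sqrt((1 - theta) ** 2 + 4 * theta * r)) for r in (rp, rm)]
    return min(lo), max(lo), min(hi), max(hi)

# generic two-band case
l1, l2, l3, l4 = edges(theta=0.5, xi=0.3, zeta=0.4)
assert l1 < l2 <= l3 < l4

# mu_a = mu_b (theta = 1, xi = zeta): the bands merge at l2 = l3 = 1
xi = 0.3
l1, l2, l3, l4 = edges(1.0, xi, xi)
assert abs(l2 - 1.0) < 1e-12 and abs(l3 - 1.0) < 1e-12
assert abs(l1 - (1 - 2 * np.sqrt(xi * (1 - xi)))) < 1e-12

# total mass of the density (7.7) is 2*xi (trapezoid rule)
x = np.linspace(l1 + 1e-9, l4 - 1e-9, 400_001)
f = np.sqrt((l4 - x) * (x - l1)) / (np.pi * x * (2.0 - x))
mass = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
assert abs(mass - 2 * xi) < 1e-3
```

For ξ = 1/2 the density (7.7) reduces to the arcsine law 1/(π√(x(2 − x))) on (0, 2), which has full mass 1.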
Let µ_α and µ_β be as in (7.1). Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval. Fix 0 < η_M < ∞. Then there are constants k > 0, K < ∞ and S < ∞, depending on the constants ξ, ζ, θ, η_M and on the interval I, such that the subordination functions possess the following bounds:

  min_{z∈S_I(0,η_M)} Im ω_α(z) ≥ k ,  min_{z∈S_I(0,η_M)} Im ω_β(z) ≥ k ,   (7.8)

  max_{z∈S_I(0,η_M)} |ω_α(z)| ≤ K ,  max_{z∈S_I(0,η_M)} |ω_β(z)| ≤ K .   (7.9)

Moreover, we have the following bounds:

(i) If µ_α ≠ µ_β,

  max_{z∈S_I(0,η_M)} Γ_{µ_α,µ_β}(ω_α(z), ω_β(z)) ≤ S .   (7.10)

(ii) If µ_α = µ_β,

  Γ_{µ_α,µ_α}(ω_α(z), ω_α(z)) ≤ S / |z − 1| ,   (7.11)

holds uniformly on S_I(0, η_M).

Remark 7.1. As an immediate consequence of Proposition 7.2 and (3.28), we obtain for µ_α ≠ µ_β the bounds max_{z∈S_I(0,η_M)} |ω′_α(z)| ≤ S, max_{z∈S_I(0,η_M)} |ω′_β(z)| ≤ S, with I as in (7.10). For µ_α = µ_β, we get |ω′_α(z)| ≤ S/|z − 1|, uniformly on S_I(0, η_M) as in (7.11).

Remark 7.2. In the case µ_α = µ_β, we note from Lemma 7.1 (c.f., (7.7)) that the point E = 1 is in the regular bulk B_{µ_α⊞µ_α}. However, m_{µ_α⊞µ_β}(1 + i0) is unstable under small perturbations. For instance, let µ_A = µ_α = ξδ₁ + (1 − ξ)δ₀, µ_B = (ξ − ε)δ₁ + (1 − ξ + ε)δ₀, for some small ε > 0. Then, according to Theorem 7.4 of [11], µ_A ⊞ µ_B has a point mass εδ₁. Hence, even though (2.25) (i.e., d_L(µ_B, µ_β) → 0, as ε → 0) is satisfied, m_{µ_A⊞µ_B}(z) contains a singular part ε/(1 − z), which blows up as |z − 1| = o(ε). This explains, on a heuristic level, the bound in (7.11), and shows why the µ_α = µ_β case at energy E = 1 is special even though the density f_{µ_α⊞µ_α} is real analytic in a neighborhood of E = 1.

Remark 7.3. Consider a more general setup with µ_α = ξδ_a + µ̃_α and µ_β = (1 − ξ)δ_b + µ̃_β, for some constants ξ ∈ (0, 1), a, b ∈ R, and for some Borel measures µ̃_α and µ̃_β with µ̃_α(R) = 1 − ξ and µ̃_β(R) = ξ. Analogously to the discussion in Remark 7.2, we note that m_{µ_α⊞µ_β}(a + b + i0) is unstable under small perturbations. However, from Lemma 3.4, we know that the system Φ_{µ_α,µ_β}(ω_α, ω_β, z) = 0 is linearly S-stable in the regular bulk under the assumptions of Theorem 2.5. That means, if neither µ_α nor µ_β is supported at a single point and at least one of them is supported at more than two points, then the point E = a + b cannot lie in the regular bulk B_{µ_α⊞µ_β}. Thus, only in the special case µ_α = µ_β with µ_α as in (7.1) is there an unstable point, up to scaling and shifting given by E = 1, inside the regular bulk B_{µ_α⊞µ_α}.

Proof of Proposition 7.2.
Estimates (7.8) and (7.9) follow from Lemma 3.2 and Lemma 3.3. To show statement (i), we recall from the proof of Lemma 3.4 that Φ_{µ_α,µ_β}(ω_α, ω_β, z) = 0 is linearly S-stable at (ω_α, ω_β) if

  | 1 − (F′_{µ_α}(ω_β) − 1)(F′_{µ_β}(ω_α) − 1) | ≥ c ,   (7.12)

for some strictly positive constant c. We now show that (7.12) holds in the case µ_α ≠ µ_β in the setup of (7.1). Using henceforth the shorthand F_α ≡ F_{µ_α}, F_β ≡ F_{µ_β}, we compute

  F_α(z) = z(1 − z)/(1 − ξ − z) ,  F_β(z) = z(θ − z)/(θ − θζ − z) ,  z ∈ C⁺ .   (7.13)

Then it is easy to obtain

  F′_α(z) − 1 = (ξ − ξ²)/(1 − ξ − z)² ,  F′_β(z) − 1 = θ²(ζ − ζ²)/(θ − θζ − z)² ,   (7.14)

and

  |F′_α(z) − 1| = (Im F_α(z) − Im z)/Im z ,  |F′_β(z) − 1| = (Im F_β(z) − Im z)/Im z .

Consequently, we have (c.f., (3.18))

  | (F′_α(ω_β(z)) − 1)(F′_β(ω_α(z)) − 1) | = (Im ω_α(z) − Im z)(Im ω_β(z) − Im z) / ( Im ω_α(z) Im ω_β(z) )   (7.15)

for any z ∈ C⁺. Hence, by (7.8) and (7.9), the right side of (7.15) is bounded away from 1, and (7.12) holds, for z ∈ S_I(η₀, η_M) with some small but fixed η₀ > 0. It remains to consider z ∈ S_I(0, η₀). Then (7.13), together with (2.5), implies that

  ω_β(1 − ω_β)/(1 − ξ − ω_β) = ω_α(θ − ω_α)/(θ − θζ − ω_α) ,  ω_β(1 − ω_β)/(1 − ξ − ω_β) = ω_α + ω_β − z .   (7.16)

Denote s := 1 − ξ − ω_β and t := θ − θζ − ω_α. From (7.14) we then have

  (F′_α(ω_β) − 1)(F′_β(ω_α) − 1) = (ξ − ξ²)(θ²ζ − (θζ)²)/(st)² .   (7.17)

Using (7.16), some algebra reveals that

  1/(st) = −1/(ξ − ξ²) + (ξ + θ − θζ − z)/((ξ − ξ²)t) ,  1/(st) = −1/(θ²(ζ − ζ²)) + (θζ + 1 − ξ − z)/(θ²(ζ − ζ²)s) .   (7.18)

Owing to (7.15) and (ξ − ξ²)(θ²ζ − (θζ)²) > 0 (since ξ, ζ ∈ (0, 1/2] and θ ≠ 0), it suffices to show that

  | Im (1/(st)) | ≥ c ,   (7.19)

in order to prove (7.12). Note that, from the definitions of s and t, together with (7.8) and (7.9), we have

  |Im s|, |Im t| ≥ c ,  |s|, |t| ≤ C .   (7.20)

Since µ_α ≠ µ_β, there exists a positive constant d such that max{|ξ − ζ|, |θ − 1|} ≥ d. It is then elementary to work out that

  max{ |(ξ − ξ²) − θ²(ζ − ζ²)| , |2ξ − 2θζ + θ − 1| } ≥ d₁ ,   (7.21)

for some positive constant d₁ ≡ d₁(ξ, ζ, θ) > 0, since the special case (θ, ξ, ζ) = (−1, 1/2, 1/2) is also excluded in the setting (7.1). For brevity, we adopt the notation

  φ := (θζ + 1 − ξ − z)/(θ²(ζ − ζ²)s) ,  ψ := (ξ + θ − θζ − z)/((ξ − ξ²)t) .

Then, according to (7.18), we have

  Re(1/(st)) = Re ψ − 1/(ξ − ξ²) = Re φ − 1/(θ²(ζ − ζ²)) ,  Im(1/(st)) = Im ψ = Im φ .   (7.22)

If |(ξ − ξ²) − θ²(ζ − ζ²)| ≥ d₁ holds in (7.21), then (7.22) implies that

  |Re ψ − Re φ| ≥ d₂ ,   (7.23)

for some positive constant d₂ ≡ d₂(ξ, ζ, θ). For small enough η₀ = η₀(ξ, ζ, θ), we then get

  Re ψ − Re φ = ( (ξ + θ − θζ − E) Re t + O(η₀) )/((ξ − ξ²)|t|²) − ( (θζ + 1 − ξ − E) Re s + O(η₀) )/(θ²(ζ − ζ²)|s|²) ,

which, together with (7.20) and (7.23), implies that

  max{ |θζ + 1 − ξ − E| , |ξ + θ − θζ − E| } ≥ d₃ ,   (7.24)

for some positive constant d₃ ≡ d₃(ξ, ζ, θ). If, on the other hand, |2ξ − 2θζ + θ − 1| ≥ d₁ holds in (7.21), we get (7.24) by the triangle inequality. Either way, (7.24) follows from (7.21), for sufficiently small, but fixed, η₀ > 0. Hence, there is c₁ > 0 such that, for all z ∈ S_I(0, η₀), we have max{|Im φ|, |Im ψ|} ≥ c₁. Since Im φ = Im ψ by (7.22), (7.19) holds on S_I(0, η₀), and hence (7.12) holds on all of S_I(0, η_M). So, if µ_α ≠ µ_β, the system Φ_{µ_α,µ_β}(ω_α, ω_β, z) = 0 is linearly S-stable with some finite S.

We next prove statement (ii), where µ_α = µ_β and thus θ = 1, ξ = ζ. From (7.16), we see that ω_α = ω_β satisfies the equation

  ω_α(1 − ω_α)/(1 − ξ − ω_α) = 2ω_α − z .   (7.25)

Solving (7.25) for ω_α(z), we get

  ω_α(z) = ω_β(z) = (1/2)( z + 1 − 2ξ + √((z − 1)² − 4ξ(1 − ξ)) ) ,   (7.26)

where the branch of the square root is chosen such that ω_β(z) → 1 − ξ + i√(ξ(1 − ξ)), as z → 1. Substituting (7.26) into (7.17), together with θ = 1, ζ = ξ, s = t = 1 − ξ − ω_α, yields

  (F′_α(ω_β(z)) − 1)(F′_β(ω_α(z)) − 1) = 16(ξ − ξ²)² / ( z − 1 + √((z − 1)² − 4(ξ − ξ²)) )⁴ .

Then it is elementary to check that

  | 1 − (F′_α(ω_β(z)) − 1)(F′_β(ω_α(z)) − 1) | ≳ |z − 1| ,  z ∈ S_I(0, η_M) ,

which further implies Γ_{µ_α,µ_β}(ω_α(z), ω_β(z)) ≲ 1/|z − 1|. Hence (7.11) is proved. □

Applications of Proposition 7.2.
Analogously to Theorem 2.5, we have two main applications of Proposition 7.2. The first one is the following modification of Theorem 2.7. Let µ_α, µ_β be as in (7.1) and let µ_A, µ_B be arbitrary probability measures on R. Recall the domain S_I(a, b) introduced in (2.13). For given (small) ς > 0, we set

  S^ς_I(a, b) := { z ∈ S_I(a, b) : ς|z − 1| ≥ max{ √(d_L(µ_A, µ_α)), √(d_L(µ_B, µ_β)) } } .   (7.27)

Proposition 7.3.
Let µ_α, µ_β be as in (7.1). Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval. Let µ_A, µ_B be two probability measures on R. Fix 0 < η_M < ∞. Then there are constants b > 0 and Z < ∞ such that the condition

  d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ≤ b   (7.28)

implies

  max_{z∈S_I(0,η_M)} | m_{µ_A⊞µ_B}(z) − m_{µ_α⊞µ_β}(z) | ≤ Z ( d_L(µ_A, µ_α) + d_L(µ_B, µ_β) ) ,   (7.29)

in case µ_α ≠ µ_β, respectively

  | m_{µ_A⊞µ_B}(z) − m_{µ_α⊞µ_α}(z) | ≤ (Z/|z − 1|) ( d_L(µ_A, µ_α) + d_L(µ_B, µ_α) ) ,   (7.30)

uniformly on S^ς_I(0, η_M) with ς ≤ ς₀, for some ς₀ > 0, in case µ_α = µ_β. The constants b and Z depend only on the constants ξ, ζ, θ and on the interval I, while ς₀ also depends on b.

Proof. Having established Proposition 7.2, the proof of (7.29) is the same as that of Theorem 2.7. To establish (7.30), we mimic the proof of Theorem 2.7 with S replaced by S/|z − 1|. We only give a sketch here. Similarly to (5.9), using (7.8) and (7.11), we have, with b in (7.28) sufficiently small, that

  Γ_{µ_A,µ_B}(ω_α(z), ω_α(z)) ≲ 1/|z − 1| ,  z ∈ S^ς_I(0, η_M) .   (7.31)

As in the proof of Lemma 5.1, we rewrite the system Φ_{µ_α,µ_α}(ω_α(z), ω_α(z), z) = 0 as Φ_{µ_A,µ_B}(ω_α(z), ω_α(z), z) = r(z), with ‖r(z)‖ satisfying the bound (5.8). From the uniqueness of the solution to Φ_{µ_A,µ_B}(ω_A, ω_B, z) = 0 and (7.31), we get

  |ω_A(z) − ω_α(z)| ≲ ‖r(z)‖/|z − 1| ,  |ω_B(z) − ω_α(z)| ≲ ‖r(z)‖/|z − 1| ,  z ∈ S^ς_I(0, η_M) ,   (7.32)

via the Newton-Kantorovich theorem. Note that the inequality ‖r(z)‖ ≲ ς²|z − 1|² is needed to guarantee that the first order term dominates the higher order terms in the Taylor expansion of Φ_{µ_A,µ_B}(ω_A, ω_B, z) around Φ_{µ_A,µ_B}(ω_α, ω_β, z). This is the reason why we restrict our discussion to the set S^ς_I(0, η_M). In addition, thanks to (7.32), we see that (5.3) and (5.4) still hold with S_I(0, η_M) replaced by S^ς_I(0, η_M). Then the remaining parts of the proof of (7.30) are the same as the counterparts in the proof of Theorem 2.7. □

The second application of Proposition 7.2 gives the following local law for the Green function in the random matrix setup from Subsection 2.3.2. Fix any γ >
0. We introduce a sub-domain of S^ς_I(a, b) by setting

  S̃^ς_I(a, b) := S^ς_I(a, b) ∩ { z ∈ C : |z − 1| ≥ N^γ / √(N(Im z)^{3/2}) } .   (7.33)

Proposition 7.4. Let µ_α, µ_β be as in (7.1). Assume that the empirical eigenvalue distributions µ_A, µ_B of the sequences of matrices A, B satisfy (2.25). Fix any 0 < η_M < ∞, any small γ > 0, and set η_m = N^{−2/3+γ}. Let I ⊂ B_{µ_α⊞µ_β} be a compact non-empty interval. Then we have the following conclusions.

(i) If µ_α ≠ µ_β, then

  | m_H(z) − m_{A⊞B}(z) | ≺ 1 / (N(Im z)^{3/2}) ,

uniformly on S_I(η_m, η_M).

(ii) If µ_α = µ_β, then, for any fixed (small) ς > 0,

  | m_H(z) − m_{A⊞B}(z) | ≺ 1 / ( |z − 1| N(Im z)^{3/2} ) ,

uniformly on S̃^ς_I(η_m, η_M).

Proof of Proposition 7.4. Note that, in the proof of Theorem 2.8, the only place where we use the assumption that at least one of µ_α and µ_β is supported at more than two points is Lemma 3.4; in particular in (3.25). Hence, it suffices to mimic the proof of Theorem 2.8 with Lemma 3.4 replaced by Proposition 7.2. The proof in the case µ_α ≠ µ_β is then exactly the same as that of Theorem 2.8. It suffices to discuss the case µ_α = µ_β below. Analogously to Corollary 5.2, with the aid of (7.31) and (7.32), we show that

  Γ_{µ_A,µ_B}(ω_A(z), ω_B(z)) ≲ 1/|z − 1| ,  z ∈ S^ς_I(0, η_M) .   (7.34)

Then, we use a continuity argument, based on Lemma 4.2 and Proposition 4.1, with S replaced by S/|z − 1| therein, to deduce from (6.40) that |ω^c_i(z) − ω_i(z)| ≺ ‖r^c(z)‖/|z − 1|, i = A, B, on S̃^ς_I(η_m, η_M). The remaining parts of the proof are the same as in Theorem 2.8. This completes the proof of part (ii) of Proposition 7.4. □

References

[1] Akhieser, N. I.:
The classical moment problem and some related questions in analysis, Hafner Publishing Co., New York, 1965. [2] Anderson, G., Guionnet, A., Zeitouni, O.:
An introduction to random matrices , Cambridge Stud. Adv.Math. , Cambridge Univ. Press, Cambridge, 2010.[3] Bao Z. G., Erd˝os, L., Schnelli K.:
Local law of addition of random matrices on optimal scale ,arXiv:1509.07080 (2015).[4] Belinschi, S., Bercovici, H.:
A new approach to subordination results in free probability , J. Anal. Math. , 357-365 (2007).[5] Belinschi, S.:
A note on regularity for free convolutions , Ann. Inst. Henri Poincar´e Probab. Stat. ,635-648 (2006).[6] Belinschi, S.:
The Lebesgue decomposition of the free additive convolution of two probability distribu-tions , Probab. Theory Related Fields , 125-150 (2008).[7] Belinschi, S.: L ∞ -boundedness of density for free additive convolutions , Rev. Roumaine Math. PuresAppl. , 173-184 (2014).[8] Belinschi, S., Bercovici, H., Capitaine, M., F´evrier, M.: Outliers in the spectrum of large deformedunitarily invariant models , arXiv:1412.4916 (2014).[9] Benaych-Georges, F., Nadakuditi, R. R.:
The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, Adv. Math., 494-521 (2011). [10] Bercovici, H., Voiculescu, D.: Free convolution of measures with unbounded support, Indiana Univ. Math. J., 733-773 (1993). [11] Bercovici, H., Voiculescu, D.: Regularity questions for free convolution, nonselfadjoint operator algebras, operator theory, and related topics, Oper. Theory Adv. Appl., 37-47 (1998). [12] Bercovici, H., Wang, J.-C.:
On freely indecomposable measures , Indiana Univ. Math. J. , 2601-2610 (2008).[13] Biane, P.:
On the free convolution with a semi-circular distribution , Indiana Univ. Math. J. , 705-718(1997).[14] Biane, P.: Processes with free increments , Math. Z. , 143-174 (1998).[15] Biane, P.:
Representations of symmetric groups and free probability , Adv. Math. , 126-181(1998).[16] Capitaine, M.:
Additive/multiplicative free subordination property and limiting eigenvectors of spikedadditive deformations of Wigner matrices and spiked sample covariance matrices , J. Theoret. Probab. , 595-648 (2013).[17] Chatterjee, S.:
Concentration of Haar measures, with an application to random matrices , J. Funct.Anal. , 379-389 (2007).[18] Chistyakov, G. P., G¨otze, F.:
The arithmetic of distributions in free probability theory, Cent. Eur. J. Math., 997-1050 (2011). [19] Collins, B.: Moments and cumulants of polynomial random variables on unitary groups, the Itzykson-Zuber integral, and free probability, Int. Math. Res. Not., 953-982 (2003). [20] Dykema, K.:
On certain free product factors via an extended matrix model , J. Funct. Anal. ,31-60 (1993).[21] Erd˝os, L., Knowles, A., Yau, H.-T.:
Averaging fluctuations in resolvents of random band matrices ,Ann. Henri Poincar´e , 1837-1926 (2013).[22] Erd˝os, L., Knowles, A., Yau, H.-T., Yin, J.: The local semicircle law for a general class of randommatrices . Electron. J. Probab. , 1-58 (2013).[23] Erd˝os, L., Schlein, B., Yau, H.-T.:
Local semicircle law and complete delocalization for Wigner randommatrices , Ann. Probab. , 815-852 (2009).[24] Erd˝os, L., Yau, H.-T., Yin, J.:
Bulk universality for generalized Wigner matrices , Probab. TheoryRelated Fields , 341-407 (2012).[25] Ferreira, O. P., Svaiter, B. F.:
Kantorovich’s theorem on Newton’s method , arXiv:1209.5704 (2012).[26] Gromov, M., Milman V. D.:
A topological application of the isoperimetric inequality , Amer. J. Math. , 843-854 (1983).[27] Hiai, F., Petz, D.:
The semicircle law, free random variables and entropy , Math. Surveys Monogr. ,Amer. Math. Soc., Providence RI, 2000.[28] Kargin, V.: On eigenvalues of the sum of two random projections , J. Stat. Phys. , 246-258(2012).[29] Kargin, V.:
A concentration inequality and a local law for the sum of two random matrices , Probab.Theory Related Fields , 677-702 (2012).[30] Kargin, V.:
An inequality for the distance between densities of free convolutions , Ann. Probab. ,3241-3260 (2013).[31] Kargin, V.:
Subordination for the sum of two random matrices , Ann. Probab. , 2119-2150 (2015).[32] Maassen, H.:
Addition of freely independent random variables, J. Funct. Anal., 409-438 (1992). [33] Pastur, L., Vasilchuk, V.:
On the law of addition of random matrices , Comm. Math. Phys. ,249-286 (2000).[34] Speicher, R.:
Free convolution and the random sum of matrices, Publ. Res. Inst. Math. Sci., 731-744 (1993). [35] Speicher, R.:
Multiplicative functions on the lattice of non-crossing partitions and free convolution ,Math. Ann. , 611-628 (1994).[36] Voiculescu, D.:
Addition of certain non-commuting random variables , J. Funct. Anal. , 323-346(1986).[37] Voiculescu, D.:
Limit laws for random matrices and free products , Invent. Math. , 201-220(1991).[38] Voiculescu, D.:
The analogues of entropy and of Fisher’s information measure in free probabilitytheory I , Comm. Math. Phys. , 71-92 (1993).[39] Voiculescu, D., Dykema, K. J., Nica, A.:
Free random variables, CRM Monogr. Ser., Amer. Math. Soc., Providence, RI, 1992. [40] Xu, F.: A random matrix model from two-dimensional Yang-Mills theory, Comm. Math. Phys. 190