The middle-scale asymptotics of Wishart matrices
Didier Chételat and Martin T. Wells
Department of Applied Mathematics and Industrial Engineering, École Polytechnique, Montréal, Québec H3T 1J4, Canada. e-mail: [email protected]
Department of Statistical Science, Cornell University, 1198 Comstock Hall, Ithaca, NY 14853-3801, USA. e-mail: [email protected]
Abstract: We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n, p \to \infty$ but $p/n \to 0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K+1)/(K+3)}$ for every $K \in \mathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the G-transform of a distribution, which is closely related to the characteristic function. We also derive an extension of the $t$-distribution to the real symmetric matrices, which naturally appears as the conjugate distribution to the Wishart under a G-transformation, and show its empirical spectral distribution obeys a semicircle law when $p/n \to 0$. Finally, we discuss how the phase transitions of the Wishart distribution might originate from changes in rates of convergence of symmetric $t$ statistics.

MSC 2010 subject classifications: Primary 60B20, 60B10; secondary 60E10.
1. Introduction
The roots of random matrix theory lie in statistics, with the work of Wishart [1928] and Bartlett [1933], and in numerical analysis, with the work of Von Neumann and Goldstine [1947]. In this early period, many well-known matrix distributions were introduced. These include the real Gaussian matrix ensemble $G(p, q)$, a $p \times q$ matrix with independent standard Gaussian entries; the Gaussian orthogonal ensemble $\mathrm{GOE}(p)$, the distribution of the symmetric matrix $(X + X^t)/\sqrt{2}$ with $X \sim G(p, p)$; and the Wishart (also known as Laguerre) distribution $W_p(n, I_p/n)$, the distribution of the symmetric matrix $XX^t/n$ with $X \sim G(p, n)$. During that time, the main concern was to derive properties of these distributions for a fixed dimension. Some asymptotics of the Wishart distribution were considered, but only as $n \to \infty$ for fixed $p$.

arXiv [math.PR]. Chételat and Wells/Mid-scale Wishart asymptotics

Starting with the pioneering work of Wigner [1951, 1955, 1957], Porter and Rosenzweig [1960], Gaudin [1961] and Mehta [1960a,b], researchers began investigating the asymptotics of Gaussian ensembles as their dimension grew to infinity. As a result of decades of work, the behavior of a $\mathrm{GOE}(p)$ matrix is now well understood both in the classical setting where $p$ is fixed, and in the setting where $p \to \infty$.

However, the asymptotics of the Wishart distribution are more complicated, as they depend on two parameters, $n$ and $p$, and initial progress was slow. The work of Marchenko and Pastur [1967] clearly established that the analogue of a Gaussian orthogonal ensemble matrix whose dimension $p$ grows to infinity is a Wishart matrix whose degrees of freedom $n$ and dimension $p$ jointly grow to infinity in such a way that $p/n \to c \in (0, 1)$. Since then, we have gained a very good understanding of the behavior of Wishart matrices in this regime.

But this body of work left open the question as to what happens to a Wishart matrix when $n, p \to \infty$ with $p/n \to 0$. Since such asymptotics lie between the classical regime where $p$ is fixed as $n \to \infty$ and the high-dimensional regime where $p/n \to c \in (0, 1)$, we might refer to them as middle-scale regimes. Hence, we might ask: what is the asymptotic behavior of a Wishart matrix $W_p(n, I_p/n)$ in the middle-scale regimes? This question is addressed in this article.

To gain some intuition, it is instructive to look at the eigenvalues $\lambda_1 > \cdots > \lambda_p > 0$ of a $W_p(n, I_p/n)$ Wishart matrix. In the classical regime where $p$ is fixed as $n \to \infty$, the eigenvalues must all almost surely tend to 1 by the strong law of large numbers. In contrast, in the high-dimensional regime where both $n, p \to \infty$ with $p/n \to c \in (0, 1)$, the Marchenko-Pastur law states that for any bounded, continuous $f$,
$$\frac{1}{p} \sum_{i=1}^{p} f(\lambda_i) \to \int_{c_-}^{c_+} f(l)\, \frac{\sqrt{(c_+ - l)(l - c_-)}}{2\pi c\, l}\, dl \quad \text{a.s.},$$
where $c_\pm = (1 \pm \sqrt{c})^2$. Thus the eigenvalues do not all tend to 1, but rather distribute themselves in the shape of a Marchenko-Pastur law with parameter $c$. What happens between these two extremes? When $c \to 0$, the Marchenko-Pastur law converges weakly to a Dirac measure with mass at 1. This suggests that whenever $n, p \to \infty$ with $p/n \to 0$,
$$\frac{1}{p} \sum_{i=1}^{p} f(\lambda_i) \to f(1) \quad \text{a.s.},$$
or in other words that the eigenvalues converge almost surely to 1, as in the classical case.

This motivates a binary view of Wishart asymptotics. It appears that the behavior of a Wishart matrix in the middle-scale regimes is the same as in the classical regime, and therefore that there really are only two regimes: low-dimensional where $p/n \to 0$, and high-dimensional where $p/n \to c \in (0, 1)$.

This binary view has very concrete repercussions. For example, in statistics, many covariance matrix estimators have been developed that leverage high-dimensional Wishart asymptotics (see Pourahmadi [2013] for a review). When faced with a problem where $p$ is large with respect to $n$, it has been argued that the high-dimensional asymptotics, rather than the classical, constitute the correct model. The binary view provides a useful rule of thumb: small $p$'s call for classical covariance estimators, while large $p$'s call for high-dimensional covariance estimators.

Unfortunately, recent results establish that this binary view is incorrect. In the classical regime where $p$ is fixed, the central limit theorem implies that
$$\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big] \Rightarrow \mathrm{GOE}(p)$$
as $n \to \infty$, where the arrow stands for weak convergence. In fact, something better is known: recent work has extended this result to the case where $p$ tends to infinity. Recall that the total variation distance between two absolutely continuous distributions $F_1$ and $F_2$ with densities $f_1$ and $f_2$ is given by $d_{TV}(F_1, F_2) = d_{TV}(f_1, f_2) = \int |f_1(x) - f_2(x)|\, dx$. With different approaches, Jiang and Li [2015] and Bubeck et al. [2016] independently established that
$$d_{TV}\Big(\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big],\ \mathrm{GOE}(p)\Big) \to 0$$
as $p^3/n \to 0$. Thus, when $p^3/n \to 0$, the same asymptotics hold as in the $p$ fixed case, and we might regard these regimes as rightfully belonging to the classical setting.

The surprising part is that the converse is true! When $p^3/n \not\to 0$, results of Bubeck et al. [2016] and Rácz and Richey [2016] show that
$$d_{TV}\Big(\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big],\ \mathrm{GOE}(p)\Big) \not\to 0.$$
Thus a phase transition occurs when $p$ is of the order $\sqrt[3]{n}$. This raises the question: if a normal approximation fails to hold when $p$ grows faster than $\sqrt[3]{n}$, what asymptotics hold? Is there a uniform asymptotic behavior that holds whenever $p/n \to 0$ but $p^3/n \not\to 0$, or are there further phase transitions as the growth rate of $p$ is increased?

The results of this paper offer a mostly complete answer to this question. Namely, we establish that when $p^3/n \not\to 0$ but $p^4/n^2 \to 0$,
$$d_{TV}\Big(\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big],\ F_1\Big) \to 0,$$
where $F_1$ is a continuous distribution on the space of real symmetric matrices whose density is given when $n \ge p - 2$ by
$$f_1(X) \propto \bigg| E\bigg[\exp\bigg\{ \frac{i}{\sqrt{8}} \operatorname{tr}(XZ) - \frac{i}{3\sqrt{2n}} \operatorname{tr} Z^3 + \frac{1}{4n} \operatorname{tr} Z^4 + \frac{\sqrt{2}\, i}{5 n^{3/2}} \operatorname{tr} Z^5 - \frac{1}{3n^2} \operatorname{tr} Z^6 + \frac{i(p+1)}{2\sqrt{2n}} \operatorname{tr} Z - \frac{p+1}{4n} \operatorname{tr} Z^2 - \frac{\sqrt{2}\, i (p+1)}{6 n^{3/2}} \operatorname{tr} Z^3 \bigg\}\bigg] \bigg|^2 \tag{1.1}$$
for a $Z \sim \mathrm{GOE}(p)$.

Fig 1. Correct picture of Wishart asymptotics. This contrasts with the binary view, where no phase transitions occur between $p$ held constant and $p$ growing like $n$.

When $p$ grows like $\sqrt{n}$, another phase transition occurs. Namely, we establish that when $p^4/n^2 \not\to 0$ but $p^5/n^3 \to 0$,
$$d_{TV}\Big(\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big],\ F_2\Big) \to 0,$$
where $F_2$ is a continuous distribution on the space of real symmetric matrices whose density is given when $n \ge p - 2$ by
$$f_2(X) \propto \bigg| E\bigg[\exp\bigg\{ \frac{i}{\sqrt{8}} \operatorname{tr}(XZ) - \frac{i}{3\sqrt{2n}} \operatorname{tr} Z^3 + \frac{1}{4n} \operatorname{tr} Z^4 + \frac{\sqrt{2}\, i}{5 n^{3/2}} \operatorname{tr} Z^5 - \frac{1}{3n^2} \operatorname{tr} Z^6 - \frac{2\sqrt{2}\, i}{7 n^{5/2}} \operatorname{tr} Z^7 + \frac{i(p+1)}{2\sqrt{2n}} \operatorname{tr} Z - \frac{p+1}{4n} \operatorname{tr} Z^2 - \frac{\sqrt{2}\, i (p+1)}{6 n^{3/2}} \operatorname{tr} Z^3 + \frac{p+1}{4n^2} \operatorname{tr} Z^4 + \frac{\sqrt{2}\, i (p+1)}{5 n^{5/2}} \operatorname{tr} Z^5 - \frac{p+1}{3n^3} \operatorname{tr} Z^6 \bigg\}\bigg] \bigg|^2, \tag{1.2}$$
again for a $Z \sim \mathrm{GOE}(p)$.

In general, for every $K \in \mathbb{N}$ we find a continuous distribution $F_K$ on the space of real symmetric matrices, with density given when $n \ge p - 2$ by
$$f_K(X) \propto \Bigg| E\Bigg[\exp\Bigg\{ \frac{i}{\sqrt{8}} \operatorname{tr}(XZ) + \frac{n}{4} \sum_{k=3}^{2K+3+\mathbb{1}[K \text{ odd}]} i^k \Big(\frac{2}{n}\Big)^{k/2} \frac{\operatorname{tr} Z^k}{k} + \frac{p+1}{4} \sum_{k=1}^{2K+2-\mathbb{1}[K \text{ odd}]} i^k \Big(\frac{2}{n}\Big)^{k/2} \frac{\operatorname{tr} Z^k}{k} \Bigg\}\Bigg] \Bigg|^2 \tag{1.3}$$
for a $Z \sim \mathrm{GOE}(p)$, which approximates the normalized Wishart distribution in some (but not all) middle-scale regimes. Namely, we prove the following, which can be regarded as the main result of this paper.

Theorem 1.
For any $K \in \mathbb{N}$, the total variation distance between the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ and the $K$th degree density $f_K$ satisfies
$$d_{TV}\Big(\sqrt{n}\,\big[W_p(n, I_p/n) - I_p\big],\ F_K\Big) \to 0$$
as $n \to \infty$ with $p^{K+3}/n^{K+1} \to 0$.

The definition of $f_K$ and proof of Theorem 1 are found in Section 6, and follow from definitions and results from Sections 3, 4 and 5 that constitute the bulk of this paper.

The main consequence of this theorem is the existence of a countably infinite number of phase transitions, occurring when $p$ grows like $n^{(K+1)/(K+3)}$ for $K \in \mathbb{N}$. A diagram is provided in Figure 1. This naturally groups the middle-scale regimes satisfying $\lim_{n\to\infty} \frac{\log p}{\log n} < 1$ according to which interval $\big[\frac{K}{K+2}, \frac{K+1}{K+3}\big)$ their limit $\lim_{n\to\infty} \frac{\log p}{\log n}$ belongs to. We might refer to this grouping as the degree of the regime. In other words, we will say a middle-scale regime satisfying $\lim_{n\to\infty} \frac{\log p}{\log n} < 1$ has degree $K$ when $\lim_{n\to\infty} \frac{\log p}{\log n} \in \big[\frac{K}{K+2}, \frac{K+1}{K+3}\big)$.

The main result of this paper, Theorem 1, may then be summarized as saying that the normalized Wishart distribution can be approximated by the distribution with density $f_K$ in every middle-scale regime of degree $K$ or less. The 0th degree case corresponds to the classical setting, while the higher degrees correspond to previously unknown behavior. In fact, we show that our 0th degree approximation $F_0$ is asymptotically equivalent to the Gaussian orthogonal ensemble. The results of this paper can therefore be regarded as a wide generalization of the Wishart asymptotics results of Jiang and Li [2015], Bubeck et al. [2016], Bubeck and Ganguly [2016] and Rácz and Richey [2016].

Our approach relies on a novel technical tool we call the G-transform. It turns out that to understand middle-scale regime behavior of Wishart matrices, densities are less convenient than characteristic functions (that is, Fourier transforms of densities).
Unfortunately, characteristic functions are difficult to relate to metrics like the total variation distance. To remedy this problem, we develop the G-transform and some associated theory in Section 3. An interesting aspect of G-transform theory is that to every distribution we can associate a closely related distribution called its G-conjugate. In fact, the G-conjugate of a Wishart matrix is essentially a generalization of the $t$ distribution to real symmetric matrices. In Section 4, we define and derive several results concerning this new distribution, including a semicircle law. From these results, we derive in Section 5 approximations to the Wishart distribution for middle-scale regimes of every degree. Since these approximations are given using the language of G-transforms, we derive in Section 6 density approximations, from which Theorem 1 follows. We briefly discuss what concrete effects the phase transitions might have on Wishart asymptotics in Section 7. Finally, we compile auxiliary results in Section 8, while we discuss in Section 9 open questions that arise from these results.

Although the results of this paper explain a large part of the behavior of Wishart matrices when $p/n \to 0$, there exist regimes for which $p/n \to 0$ yet $p \notin O(n^{(K+1)/(K+3)})$ for all $K \in \mathbb{N}$, or in other words for which $\lim_{n\to\infty} \frac{\log p}{\log n} = 1$. For example, this is the case when $p$ grows at the order $n^{1 - 1/\sqrt{\log n}}$. The results of our paper characterize almost all middle-scale regimes, in the sense that among those regimes satisfying $\lim_{n\to\infty} \frac{\log p}{\log n} \le 1$, those such that $\lim_{n\to\infty} \frac{\log p}{\log n} = 1$ form a negligible set; such regimes nonetheless exist. One might regard them as having infinite degree. Beyond this, however, it is difficult to say anything about the behavior of Wishart matrices in these regimes. More work in that direction is clearly needed.
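The grouping of regimes by degree can be illustrated with a small sketch. The helper names `transition_exponent` and `regime_degree` are hypothetical and not part of the paper. The sketch assumes that the phase transitions sit at $p$ of order $n^{(K+1)/(K+3)}$, so that a regime with $\lim \log p / \log n = \alpha < 1$ has degree $K$ when $K/(K+2) \le \alpha < (K+1)/(K+3)$. Exact rational arithmetic is used so that boundary values such as $1/3$ are classified correctly.

```python
from fractions import Fraction
import math


def transition_exponent(K: int) -> Fraction:
    """Exponent (K+1)/(K+3): the K-th transition is assumed to occur at p ~ n^this."""
    return Fraction(K + 1, K + 3)


def regime_degree(alpha) -> int:
    """Degree K of a regime with lim log p / log n = alpha, for 0 <= alpha < 1.

    K is the unique integer with K/(K+2) <= alpha < (K+1)/(K+3),
    which rearranges to K = floor(2*alpha / (1 - alpha)).
    """
    alpha = Fraction(alpha)
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    return math.floor(2 * alpha / (1 - alpha))


if __name__ == "__main__":
    # Print the degree for a few illustrative growth exponents.
    for a in (Fraction(1, 4), Fraction(1, 3), Fraction(2, 5), Fraction(1, 2)):
        print(a, regime_degree(a))
```

Under these assumptions the degree grows without bound as $\alpha$ approaches 1, which matches the remark that regimes with limit equal to 1 behave as if they had infinite degree.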
2. Notation and definitions
The transpose of a matrix $X$ is denoted $X^t$, and the identity matrix of dimension $p$ is $I_p$. As is standard, we take the trace operator to have lower priority than the power operator: thus for a matrix $X$, $\operatorname{tr} X^k$ means the trace of $X^k$. We will write $\operatorname{tr}^k X$ when we mean the $k$th power of the trace of $X$. The Kronecker delta is the symbol $\delta_{kl} = \mathbb{1}[k = l]$.

The space of all real-valued symmetric matrices is denoted $S_p(\mathbb{R}) = \{X \in M_p(\mathbb{R}) \mid X = X^t\}$. For a symmetric matrix $X$, we define the symmetric differentiation operator $\partial_s/\partial_s X_{kl}$ by
$$\frac{\partial_s}{\partial_s X_{kl}} = \frac{1 + \delta_{kl}}{2} \frac{\partial}{\partial X_{kl}} = \begin{cases} \dfrac{1}{2} \dfrac{\partial}{\partial X_{kl}} & \text{for } k \ne l, \\[1ex] \dfrac{\partial}{\partial X_{kk}} & \text{for } k = l. \end{cases}$$
This operator has the elegant property that $\frac{\partial_s}{\partial_s X_{kl}} \operatorname{tr}(XY) = Y_{kl}$ for any two symmetric matrices $X$, $Y$.

The space of symmetric matrices $S_p(\mathbb{R})$ can be identified with $\mathbb{R}^{p(p+1)/2}$ by mapping a symmetric matrix to its upper triangle. By integration over $S_p(\mathbb{R})$, we mean integration with respect to the pullback Lebesgue measure under this isomorphism, that is,
$$\int_{S_p(\mathbb{R})} f(X)\, dX = \int_{\mathbb{R}^{p(p+1)/2}} f(X) \prod_{i \le j}^{p} dX_{ij}.$$

We say a real symmetric matrix $X$ follows the Gaussian orthogonal ensemble $\mathrm{GOE}(p)$ distribution if the $X_{kl}$, $k \le l$, are all independent, with diagonal elements $X_{kk} \sim N(0, 2)$ and off-diagonal elements $X_{kl} \sim N(0, 1)$.

Let $X$ be an $n \times p$ matrix of i.i.d. $N(0, 1)$ random variables, and let $\Sigma$ be a $p \times p$ positive-definite matrix. The Wishart distribution $W_p(n, \Sigma)$ is the distribution of the random matrix $\Sigma^{1/2} X^t X \Sigma^{1/2}$. This is a special case of the matrix gamma distribution. Following Gupta and Nagar [1999, Section 3.6], we say a positive-definite matrix $X$ has a matrix gamma distribution $G_p(\alpha, \Sigma)$ with shape parameter $\alpha > (p-1)/2$ and scale parameter $\Sigma$ if it has a density on $S_p(\mathbb{R})$ given by
$$f(X) = \frac{|\Sigma|^{-\alpha}}{\Gamma_p(\alpha)}\, |X|^{\alpha - \frac{p+1}{2}} \exp\big\{ -\operatorname{tr}(\Sigma^{-1} X) \big\}\, \mathbb{1}[X > 0],$$
where $\Gamma_p$ is the multivariate gamma function. With this definition, the Wishart distribution $W_p(n, \Sigma)$ is a matrix gamma with shape $n/2$ and scale $2\Sigma$.

While studying the Wishart distribution, the expression $n - p - 1$ arises frequently, so we abbreviate it as $m = n - p - 1$.

The Hellinger distance is a metric between absolutely continuous probability measures. For two distributions $F_1$ and $F_2$ with densities $f_1$ and $f_2$, their Hellinger distance is defined as
$$H(F_1, F_2) = H(f_1, f_2) = \Big( \int \big| f_1^{1/2}(x) - f_2^{1/2}(x) \big|^2\, dx \Big)^{1/2}.$$
The Hellinger distance is closely related to the total variation distance by the inequalities
$$\frac{1}{2}\, d_{TV}(f_1, f_2) \le H(f_1, f_2) \le d_{TV}^{1/2}(f_1, f_2). \tag{2.1}$$
In particular, $H(f_1, f_2) \to 0$ if and only if $d_{TV}(f_1, f_2) \to 0$. Thus they can be seen as inducing the same topology on absolutely continuous probability measures, called the strong topology, in contrast to the topology induced by weak convergence of measures, called the weak topology. One can show that if a sequence of measures converges in the strong sense (i.e. in the $d_{TV}$ or $H$ metrics), then it converges weakly.
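The relationship between the two distances can be checked numerically in one dimension. The following is a minimal sketch, not taken from the paper: it assumes the conventions $d_{TV}(f_1, f_2) = \int |f_1 - f_2|\,dx$ (no factor $1/2$) and $H(f_1, f_2) = (\int (f_1^{1/2} - f_2^{1/2})^2\,dx)^{1/2}$, and the helper names `normal_pdf` and `tv_and_hellinger` are hypothetical. The two densities are $N(0,1)$ and $N(1,1)$, integrated by a midpoint rule.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate N(mean, sd^2) distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def tv_and_hellinger(f1, f2, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximations of d_TV = int |f1 - f2| dx
    and H = (int (sqrt(f1) - sqrt(f2))^2 dx)^(1/2) on [lo, hi]."""
    dx = (hi - lo) / steps
    tv = 0.0
    h2 = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th cell
        a, b = f1(x), f2(x)
        tv += abs(a - b) * dx
        h2 += (math.sqrt(a) - math.sqrt(b)) ** 2 * dx
    return tv, math.sqrt(h2)


tv, h = tv_and_hellinger(lambda x: normal_pdf(x, 0.0, 1.0),
                         lambda x: normal_pdf(x, 1.0, 1.0))
# Under the stated conventions, both inequalities relating H and d_TV should hold.
print(0.5 * tv <= h <= math.sqrt(tv))
```

For this pair of Gaussians both quantities also have closed forms, $d_{TV} = 2\,\mathrm{erf}(1/(2\sqrt{2}))$ and $H^2 = 2(1 - e^{-1/8})$, which gives an independent check on the quadrature.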
3. G-transforms
Our analysis of Wishart matrices relies heavily on a tool we call the G-transform of a probability measure. To define it, we first need to define the Fourier transform over symmetric matrices.

In Section 2, we clarified what we meant by integration over $S_p(\mathbb{R})$. For a function $f: S_p(\mathbb{R}) \to \mathbb{C}$ in $L^1(S_p(\mathbb{R}))$, we define its Fourier transform to be
$$\mathcal{F}\{f\}(T) = \frac{1}{2^{p/2} \pi^{p(p+1)/4}} \int_{S_p(\mathbb{R})} e^{-i \operatorname{tr}(TX)} f(X)\, dX. \tag{3.1}$$
It is more common to define the Fourier transform on symmetric matrices with the integrand $\exp\{-i \sum_{k \le l} T_{kl} X_{kl}\}$, but choosing $\exp\{-i \operatorname{tr}(TX)\}$ considerably simplifies our computations.

We extend this definition to $f \in L^r(S_p(\mathbb{R}))$, $1 < r \le 2$, in the usual way. With this normalization, the Plancherel identity reads
$$\int_{S_p(\mathbb{R})} f(X)\, \overline{g(X)}\, dX = \int_{S_p(\mathbb{R})} \mathcal{F}\{f\}(T)\, \overline{\mathcal{F}\{g\}(T)}\, dT.$$

We now define the G-transform. In itself, the definition has nothing to do with symmetric matrices and could have been perfectly well defined on any other space endowed with a Fourier transform.
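The normalization of the Fourier transform can be sanity-checked in the scalar case $p = 1$, where $\operatorname{tr}(TX) = tx$. The following sketch assumes the constant in (3.1) equals $2^{-p/2}\pi^{-p(p+1)/4}$, which reduces to $(2\pi)^{-1/2}$ for $p = 1$, and verifies the Plancherel identity by quadrature for two Gaussian-type test functions. The helper names `fourier_1d` and `inner` and the test functions are illustrative choices, not from the paper.

```python
import cmath
import math


def fourier_1d(f, t, lo=-10.0, hi=10.0, steps=400):
    """Midpoint-rule approximation of (2*pi)^(-1/2) * int exp(-i t x) f(x) dx,
    the assumed p = 1 case of the transform (3.1)."""
    dx = (hi - lo) / steps
    total = 0j
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += cmath.exp(-1j * t * x) * f(x) * dx
    return total / math.sqrt(2 * math.pi)


def inner(u, v, lo=-10.0, hi=10.0, steps=400):
    """Midpoint-rule approximation of int u(s) * conj(v(s)) ds."""
    dx = (hi - lo) / steps
    total = 0j
    for i in range(steps):
        s = lo + (i + 0.5) * dx
        total += complex(u(s)) * complex(v(s)).conjugate() * dx
    return total


def f(x):
    return math.exp(-x * x / 2)


def g(x):
    return math.exp(-(x - 1) ** 2)


# Left side: inner product of the functions; right side: inner product of their transforms.
lhs = inner(f, g)
rhs = inner(lambda t: fourier_1d(f, t), lambda t: fourier_1d(g, t))
print(abs(lhs - rhs))
```

Both sides should agree with the closed form $\sqrt{2\pi/3}\, e^{-1/3}$ up to quadrature error, which is what unitarity of the transform on $L^2$ predicts for this normalization.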
Definition 1 (G-transform of a density). Let $f$ be an integrable function $S_p(\mathbb{R}) \to \mathbb{C}$. Its G-transform is the complex-valued function $\mathcal{G}\{f\}: S_p(\mathbb{R}) \to \mathbb{C}$ defined by
$$\mathcal{G}\{f\} = \mathcal{F}\{f^{1/2}\}^2, \tag{3.2}$$
where $z^{1/2}$ stands for the principal branch of the complex square root.

In the same way that the Fourier transform maps $L^2(S_p(\mathbb{R}))$ to itself, the G-transform maps $L^1(S_p(\mathbb{R}))$ to itself.

By extension, for an absolutely continuous distribution on $S_p(\mathbb{R})$ with density $f$, we will define its G-transform to be the G-transform of its density. (This usage mirrors other transforms, such as the Stieltjes transform.) We will usually denote the G-transform of $f$ by $\psi$. Since a density is integrable, this is always well-defined. Moreover, $f = \mathcal{F}^{-1}\{\psi^{1/2}\}^2$, so the density can be recovered from the G-transform, and therefore to understand a distribution it is equivalent to study its density or its G-transform.

Two comments are in order. First, for many densities, $f^{1/2} \in L^1(S_p(\mathbb{R}))$. In this case, the G-transform can be written explicitly as
$$\psi(T) = \mathcal{G}\{f\}(T) = \frac{1}{2^{p} \pi^{p(p+1)/2}} \bigg( \int_{S_p(\mathbb{R})} e^{-i \operatorname{tr}(TX)} f^{1/2}(X)\, dX \bigg)^2. \tag{3.3}$$
Second, throughout this article we will often talk about "the" square root of a G-transform. To be clear, by $\psi^{1/2}$ we will always mean $\mathcal{F}\{f^{1/2}\}$.

Now, in many ways, the G-transform behaves similarly to the characteristic function (Fourier transform of a density), but it has unique features. First, Plancherel's theorem yields that
$$\int_{S_p(\mathbb{R})} |\psi(T)|\, dT = \int_{S_p(\mathbb{R})} |\psi^{1/2}(T)|^2\, dT = \int_{S_p(\mathbb{R})} |f^{1/2}(X)|^2\, dX = \int_{S_p(\mathbb{R})} |f(X)|\, dX = 1. \tag{3.4}$$
Thus $|\psi|$ is itself a density, which we will call the G-conjugate of $f$. (In particular, $\psi^{1/2}$ is much like a quantum-mechanical wavefunction.) We will also use an asterisk notation, so that the G-conjugate of a $N(0, 1)$ distribution will be denoted $N(0, 1)^*$. For example, straightforward computations yield that $N(0, 1)^* = N(0, 1/4)$, $\chi^{2*}_n = \frac{1}{\sqrt{8n}}\, t_{n/2}$ (where $\chi^2_\nu$ and $t_\nu$ are the univariate $\chi^2$ and $t$ distributions with $\nu$ degrees of freedom, respectively) and $(aF + b)^* = a^{-1} F^*$ for any distribution $F$ and scalars $a \ne 0$, $b \in \mathbb{R}$. Studying the G-conjugate of the Wishart distribution will play a key part in deriving results about the Wishart distribution itself. We should note that, in general, the double G-conjugate $F^{**}$ is not the same as $F$. For example, $\chi^{2**}_n$ is a density involving modified Bessel functions of the first kind, not a $\chi^2_n$.

A second feature that distinguishes G-transforms from characteristic functions is that they are easy to relate to the Hellinger distance between probability measures. Consider two densities $f_1$, $f_2$ with G-transforms $\psi_1$, $\psi_2$. By analogy, we could define the "total variation" and "Hellinger" distances of $\psi_1$ and $\psi_2$ by
$$d_{TV}(\psi_1, \psi_2) = \int_{S_p(\mathbb{R})} |\psi_1(T) - \psi_2(T)|\, dT \tag{3.5}$$
and
$$H(\psi_1, \psi_2) = \Big( \int_{S_p(\mathbb{R})} |\psi_1^{1/2}(T) - \psi_2^{1/2}(T)|^2\, dT \Big)^{1/2}. \tag{3.6}$$
Since the moduli of the G-transforms integrate to one, their total variation and Hellinger distances are related to each other in the same way as in Equation (2.1) for densities, namely
$$\frac{1}{2}\, d_{TV}(\psi_1, \psi_2) \le H(\psi_1, \psi_2) \le d_{TV}^{1/2}(\psi_1, \psi_2). \tag{3.7}$$
Thus $d_{TV}(\psi_1, \psi_2) \to 0$ if and only if $H(\psi_1, \psi_2) \to 0$. But the Hellinger distance between G-transforms is much more useful. Indeed, by the Plancherel theorem, for any two densities $f_1$, $f_2$ with G-transforms $\psi_1$, $\psi_2$, their Hellinger distance satisfies
$$H^2(f_1, f_2) = \int_{S_p(\mathbb{R})} |f_1^{1/2}(X) - f_2^{1/2}(X)|^2\, dX = \int_{S_p(\mathbb{R})} |\psi_1^{1/2}(T) - \psi_2^{1/2}(T)|^2\, dT = H^2(\psi_1, \psi_2). \tag{3.8}$$
Thus to compute the Hellinger distance $H(f_1, f_2)$ between two densities, we can instead compute the Hellinger distance $H(\psi_1, \psi_2)$ of their G-transforms. In contrast, there is no explicit way to express the Hellinger distance in terms of characteristic functions. And no such connection exists between the total variation distances of densities and G-transforms.

The G-transform does have some disadvantages compared to the Fourier transform. It is a non-linear transformation (and therefore not a true transform), and it does not behave well with respect to convolution. For our purposes, however, the advantages listed above outweigh these problems.

In practice, it is not always easy to control the Hellinger distance directly, and one often focuses on the Kullback-Leibler divergence instead. The two quantities are related through the well known inequality
$$H^2(f_1, f_2) \le E\Big[ \log \frac{f_1(X)}{f_2(X)} \Big] \quad \text{for } X \sim f_1.$$
For G-transforms, the following analog holds, which clarifies our interest in G-conjugates:
$$H^2(\psi_1, \psi_2) \le E\Big[ \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big] + 2\, E^{1/2}\Big[ \Big| \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big| \Big] \quad \text{for } T \sim F_1^*,$$
where $\operatorname{Log}$ stands for the principal branch of the complex logarithm. In fact, in this article we will need a further generalization, where $\psi_2$ does not need to be a G-transform of a density.

Proposition 1 (Kullback-Leibler inequality for G-transforms). Let $\psi_1$ be the G-transform of an absolutely continuous distribution $F_1$ on $S_p(\mathbb{R})$, and let $\psi_2$ be an integrable function $S_p(\mathbb{R}) \to \mathbb{C}$. Then
$$H^2(\psi_1, \psi_2) \le \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + E\Big[ \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big] + 2 \Big( \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT \Big)^{1/2} \cdot E^{1/2}\Big[ \Big| \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big| \Big]$$
for $T \sim F_1^*$, where $\operatorname{Log}$ stands for the principal branch of the complex logarithm.

Proof.
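We can write
$$\begin{aligned}
H^2(\psi_1, \psi_2) &= \int_{S_p(\mathbb{R})} |\psi_1|(T) + |\psi_2|(T) - \bar\psi_1^{1/2} \psi_2^{1/2} - \psi_1^{1/2} \bar\psi_2^{1/2}\, dT \\
&= \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + 2 - \int_{S_p(\mathbb{R})} \bigg[ \frac{\psi_2^{1/2}(T)}{\psi_1^{1/2}(T)} + \frac{\bar\psi_2^{1/2}(T)}{\bar\psi_1^{1/2}(T)} \bigg] |\psi_1|(T)\, dT \\
&= \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + 2 \Big[ 1 - \int_{S_p(\mathbb{R})} \Re \bigg\{ \frac{\psi_2^{1/2}(T)}{\psi_1^{1/2}(T)} \bigg\} |\psi_1|(T)\, dT \Big] \\
&= \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + 2 \Big[ 1 - \int_{S_p(\mathbb{R})} \exp\Big\{ -\tfrac{1}{2} \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big\} \cos\Big( \tfrac{1}{2} \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big) |\psi_1|(T)\, dT \Big].
\end{aligned}$$
Now use the inequality $-\cos(x) \le -1 + \sqrt{2|x|}$, which holds for any $x \in \mathbb{R}$. The last quantity is bounded as
$$\begin{aligned}
\le{} & \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + 2 \Big[ 1 - \int_{S_p(\mathbb{R})} \exp\Big\{ -\tfrac{1}{2} \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big\} |\psi_1|(T)\, dT \Big] \\
& + 2 \int_{S_p(\mathbb{R})} \exp\Big\{ -\tfrac{1}{2} \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big\} \sqrt{\Big| \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big|}\; |\psi_1|(T)\, dT.
\end{aligned}$$
In the second term, use $1 - x \le -\log(x)$ for $x > 0$, while in the third term, use the Cauchy-Schwarz inequality to obtain
$$\begin{aligned}
\le{} & \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] - 2 \log \int_{S_p(\mathbb{R})} \exp\Big\{ -\tfrac{1}{2} \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big\} |\psi_1|(T)\, dT \\
& + 2 \Big( \int_{S_p(\mathbb{R})} \exp\Big\{ -\Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big\} |\psi_1|(T)\, dT \Big)^{1/2} \Big( \int_{S_p(\mathbb{R})} \Big| \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big|\, |\psi_1|(T)\, dT \Big)^{1/2}.
\end{aligned}$$
Now use Jensen's inequality in the second term and the algebraic identity $\exp\{-\Re \operatorname{Log} \psi_1(T)/\psi_2(T)\} = |\psi_2|(T)/|\psi_1|(T)$ in the third term to obtain
$$\le \Big[ \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT - 1 \Big] + \int_{S_p(\mathbb{R})} \Re \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)}\, |\psi_1|(T)\, dT + 2 \Big( \int_{S_p(\mathbb{R})} |\psi_2|(T)\, dT \Big)^{1/2} \Big( \int_{S_p(\mathbb{R})} \Big| \Im \operatorname{Log} \frac{\psi_1(T)}{\psi_2(T)} \Big|\, |\psi_1|(T)\, dT \Big)^{1/2},$$
as desired.

Let us now compute the G-transform of the Gaussian orthogonal ensemble and the normalized Wishart distribution, which will be needed in our proofs. The density of a $\mathrm{GOE}(p)$ matrix over $S_p(\mathbb{R})$ is
$$f_{GOE}(X) = \frac{1}{2^{p(p+3)/4} \pi^{p(p+1)/4}} \exp\Big\{ -\frac{1}{4} \operatorname{tr} X^2 \Big\}. \tag{3.9}$$
To compute its G-transform, we will make use of the fact that the elements of a $\mathrm{GOE}(p)$ matrix are independent to reduce the expression to a product of characteristic functions.

Proposition 2.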
The G-transform of the Gaussian orthogonal ensemble density on $S_p(\mathbb{R})$ is
$$\psi_{GOE}(T) = \frac{2^{p(3p+1)/4}}{\pi^{p(p+1)/4}} \exp\big\{ -4 \operatorname{tr} T^2 \big\}.$$

Proof.
From Equation (3.9), $f_{GOE}^{1/2}$ is proportional to the density of the $\sqrt{2}\,\mathrm{GOE}(p)$ distribution, so it is integrable. Therefore, we can apply Equation (3.3) to find that
$$\begin{aligned}
\psi_{GOE}^{1/2}(T) &= \frac{1}{2^{p/2} \pi^{p(p+1)/4}} \int_{S_p(\mathbb{R})} \exp\big\{ -i \operatorname{tr}(TX) \big\} f_{GOE}^{1/2}(X)\, dX \\
&= \frac{1}{2^{p(p+7)/8} \pi^{3p(p+1)/8}} \int_{S_p(\mathbb{R})} \exp\Big\{ -i \operatorname{tr}(TX) - \frac{1}{8} \operatorname{tr} X^2 \Big\}\, dX \\
&= \frac{1}{2^{p(p+7)/8} \pi^{3p(p+1)/8}} \int_{\mathbb{R}^{p(p+1)/2}} \exp\Big\{ -2i \sum_{k<l}^{p} T_{kl} X_{kl} - \frac{1}{4} \sum_{k<l}^{p} X_{kl}^2 - i \sum_{k=1}^{p} T_{kk} X_{kk} - \frac{1}{8} \sum_{k=1}^{p} X_{kk}^2 \Big\} \prod_{k \le l}^{p} dX_{kl} \\
&= \frac{2^{p(3p+1)/8}}{\pi^{p(p+1)/8}} \prod_{k<l}^{p} \int_{\mathbb{R}} \exp\big\{ -2i T_{kl} X_{kl} \big\} \frac{\exp\{ -X_{kl}^2/4 \}}{2\sqrt{\pi}}\, dX_{kl} \cdot \prod_{k=1}^{p} \int_{\mathbb{R}} \exp\big\{ -i T_{kk} X_{kk} \big\} \frac{\exp\{ -X_{kk}^2/8 \}}{2\sqrt{2\pi}}\, dX_{kk} \\
&= \frac{2^{p(3p+1)/8}}{\pi^{p(p+1)/8}} \prod_{k<l}^{p} E\Big[ \exp\big\{ -2\sqrt{2}\, i T_{kl} Z \big\} \Big] \prod_{k=1}^{p} E\Big[ \exp\big\{ -2i T_{kk} Z \big\} \Big]
\end{aligned}$$
for $Z \sim N(0, 1)$. The characteristic function of a $N(0, 1)$ is $\exp(-t^2/2)$, so
$$= \frac{2^{p(3p+1)/8}}{\pi^{p(p+1)/8}} \prod_{k<l}^{p} \exp\big\{ -4 T_{kl}^2 \big\} \prod_{k=1}^{p} \exp\big\{ -2 T_{kk}^2 \big\} = \frac{2^{p(3p+1)/8}}{\pi^{p(p+1)/8}} \exp\big\{ -2 \operatorname{tr} T^2 \big\}.$$
Squaring this result yields the desired expression for $\psi_{GOE}$.

In particular, we see that $|\psi_{GOE}|$ is the density of a $\mathrm{GOE}(p)/4$ distribution, so $\mathrm{GOE}(p)^* = \mathrm{GOE}(p)/4$. Thus the Gaussian orthogonal ensemble is its own G-conjugate, up to a constant factor.

Let us now compute the G-transform of the normalized Wishart distribution. Unlike the $\mathrm{GOE}(p)$ case, the elements of the matrix are not independent, but the elements of its Cholesky decomposition are. By being careful about complex changes of variables, we can reduce the computation of the G-transform to the computation of characteristic functions of the Cholesky elements.

Proposition 3.
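Let $n \ge p - 2$. Then the G-transform of the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ density on $S_p(\mathbb{R})$ is given by
$$\psi_{NW}(T) = C_{n,p} \exp\big\{ 2i \sqrt{n} \operatorname{tr} T \big\} \Big| I_p + i \frac{4T}{\sqrt{n}} \Big|^{-\frac{n+p+1}{2}}$$
with
$$C_{n,p} = \frac{2^{p(n+2p)/2}}{\pi^{p(p+1)/2}\, n^{p(p+1)/4}} \frac{\Gamma_p^2\big( \frac{n+p+1}{4} \big)}{\Gamma_p\big( \frac{n}{2} \big)}.$$

Proof.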
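Recall the notation $m = n - p - 1$. The density of a $Y \sim W_p(n, I_p/n)$ distribution is
$$f_W(Y) = \frac{n^{np/2}\, \mathbb{1}[Y > 0]}{2^{np/2}\, \Gamma_p\big( \frac{n}{2} \big)} \exp\Big\{ -\frac{n}{2} \operatorname{tr} Y \Big\} |Y|^{m/2}.$$
If we do a change of variables $X = \sqrt{n}(Y - I_p)$, so that $Y = I_p + X/\sqrt{n}$ and $\prod_{i \le j}^{p} dY_{ij} = n^{-p(p+1)/4} \prod_{i \le j}^{p} dX_{ij}$, we see that the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ has density
$$f_{NW}(X) = \frac{n^{p(n+m)/4}}{2^{np/2}\, \Gamma_p\big( \frac{n}{2} \big)}\, \mathbb{1}\Big[ I_p + \frac{X}{\sqrt{n}} > 0 \Big] \exp\Big\{ -\frac{n}{2} \operatorname{tr}\Big[ I_p + \frac{X}{\sqrt{n}} \Big] \Big\} \Big| I_p + \frac{X}{\sqrt{n}} \Big|^{m/2}. \tag{3.10}$$
Notice that $f_W^{1/2}$ is proportional to $\exp\{ -\operatorname{tr}([\frac{4}{n} I_p]^{-1} Y) \} |Y|^{\frac{n+p+1}{4} - \frac{p+1}{2}}$, so it must be proportional to the density of a matrix gamma distribution $G_p\big( \frac{n+p+1}{4}, \frac{4}{n} I_p \big)$ when $\frac{n+p+1}{4} > \frac{p-1}{2}$, i.e. $n \ge p - 2$. In particular, it must be integrable. As $f_{NW}$ was obtained by a linear change of variables from $f_W$, $f_{NW}^{1/2}$ must be integrable too, that is
$$\int_{S_p(\mathbb{R})} f_{NW}^{1/2}(X)\, dX < \infty. \tag{3.11}$$
Therefore, we can apply Equation (3.3) to obtain
$$\begin{aligned}
\psi_{NW}^{1/2}(T) &= \frac{1}{2^{p/2} \pi^{p(p+1)/4}} \int_{S_p(\mathbb{R})} \exp\big\{ -i \operatorname{tr}(TX) \big\} f_{NW}^{1/2}(X)\, dX \\
&= \frac{1}{2^{p/2} \pi^{p(p+1)/4}} E\Big[ \exp\big\{ -i \operatorname{tr}(TX) \big\} f_{NW}^{-1/2}(X) \Big] \\
&= \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} E\bigg[ \exp\Big\{ -i \operatorname{tr}(TX) + \frac{n}{4} \operatorname{tr}\Big[ I_p + \frac{X}{\sqrt{n}} \Big] \Big\} \Big| I_p + \frac{X}{\sqrt{n}} \Big|^{-m/4} \bigg].
\end{aligned}$$
If we rewrite the expectation in terms of $Y = I_p + X/\sqrt{n}$, this last expression equals
$$= \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} \exp\big( i \sqrt{n} \operatorname{tr} T \big)\, E\bigg[ \operatorname{etr}\Big\{ \Big( -i \sqrt{n}\, T + \frac{n}{4} I_p \Big) Y \Big\} |Y|^{-m/4} \bigg]. \tag{3.12}$$
Since $T$ is real symmetric, there must be a spectral decomposition $T = O D O^t$ with $O$ real orthogonal and $D$ real diagonal. As $O^t Y O$ has the same distribution as $Y$, namely $W_p(n, I_p/n)$, we can rewrite Equation (3.12) as
$$= \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} \exp\big( i \sqrt{n} \operatorname{tr} T \big)\, E\bigg[ \operatorname{etr}\Big\{ \Big( -i \sqrt{n}\, D + \frac{n}{4} I_p \Big) Y \Big\} |Y|^{-m/4} \bigg]. \tag{3.13}$$
Now, since $Y$ is positive-definite it has a Cholesky decomposition $Y = U^t U$ with $U$ upper-triangular. According to Bartlett's theorem [see Muirhead, 1982, Theorem 3.2.14], all the elements of $U$ are independent, the diagonal elements have the distribution $U_{kk}^2 \sim \chi^2_{n-k+1}/n$ and the above-diagonal elements have $U_{kl} \sim N(0, 1/n)$ for $k < l$. Since
$$\operatorname{tr}\Big[ \Big( -i \sqrt{n}\, D + \frac{n}{4} I_p \Big) Y \Big] = \sum_{j,k,l}^{p} \Big( -i \sqrt{n}\, D + \frac{n}{4} I_p \Big)_{jk} U_{lk} U_{lj} = \sum_{l \le k}^{p} \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) U_{lk}^2$$
and $|Y| = \prod_{k=1}^{p} U_{kk}^2$, we have by independence and Equation (3.13) that
$$\psi_{NW}^{1/2}(T) = \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} \exp\big\{ i \sqrt{n} \operatorname{tr} T \big\} \prod_{l<k}^{p} E\bigg[ \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) U_{lk}^2 \Big\} \bigg] \cdot \prod_{k=1}^{p} E\bigg[ \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) U_{kk}^2 \Big\} (U_{kk}^2)^{-m/4} \bigg]. \tag{3.14}$$
We will now compute these expected values in several steps. For a given $1 \le k \le p$, let
$$A = E\bigg[ \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) U_{kk}^2 \Big\} (U_{kk}^2)^{-m/4} \bigg]. \tag{3.15}$$
Since $U_{kk}^2 \sim \chi^2_{n-k+1}/n$ and $m = n - p - 1$, we have
$$\begin{aligned}
A &= \frac{n^{\frac{n-k+1}{2}}}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \int_0^\infty \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) x \Big\} x^{-m/4} \cdot x^{\frac{n-k+1}{2} - 1} \exp\Big\{ -\frac{n}{2} x \Big\}\, dx \\
&= \frac{n^{\frac{n-k+1}{2}}}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \int_0^\infty \exp\Big\{ -\Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big) x \Big\} x^{\frac{n-2k+p+3}{4} - 1}\, dx.
\end{aligned} \tag{3.16}$$
Consider the truncated integrands
$$h_M(x) = \exp\Big\{ -\Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big) x \Big\} x^{\frac{n-2k+p+3}{4} - 1}\, \mathbb{1}[0 < x < M].$$
Clearly this sequence is dominated by the integrable positive function $h$,
$$|h_M(x)| \le h(x) = \exp\Big\{ -\frac{n}{4} x \Big\} x^{\frac{n-2k+p+3}{4} - 1}, \qquad \int_0^\infty h(x)\, dx = \Big( \frac{4}{n} \Big)^{\frac{n-2k+p+3}{4}} \Gamma\Big( \frac{n-2k+p+3}{4} \Big) < \infty.$$
Therefore, by the Dominated Convergence Theorem and Equation (3.16),
$$A = \frac{n^{\frac{n-k+1}{2}}}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \lim_{M \to \infty} \int_0^M \exp\Big\{ -\Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big) x \Big\} x^{\frac{n-2k+p+3}{4} - 1}\, dx.$$
By the change of variables $z = \big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \big) x$, this can be rewritten
$$= \frac{n^{\frac{n-k+1}{2}} \big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \big)^{-\frac{n-2k+p+3}{4}}}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \lim_{M \to \infty} \int_0^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} e^{-z} z^{\frac{n-2k+p+3}{4} - 1}\, dz. \tag{3.17}$$
To compute this integral, we use a contour argument. Consider the closed path $C = C_1 + C_2 + C_3$ given by $C_1$ a path from $0$ to $\frac{nM}{4}$, $C_2$ a path from $\frac{nM}{4}$ to $\frac{nM}{4} + \sqrt{n}\, D_{kk} M i$ and finally $C_3$ a path from $\frac{nM}{4} + \sqrt{n}\, D_{kk} M i$ to $0$. A diagram is provided as Figure 2.

Fig 2. Contour $C = C_1 + C_2 + C_3$ when $D_{kk} \ge 0$. The diagram is mirrored around the $x$ axis when $D_{kk} < 0$.

As $k \le p$ and $n \ge p - 2$, the exponent $\frac{n-2k+p+3}{4} - 1$ exceeds $-1$, so $z \mapsto e^{-z} z^{\frac{n-2k+p+3}{4} - 1}$ is holomorphic on the open right half-plane with an integrable singularity at the origin, and its integral over $C$ must be zero. Therefore
$$\begin{aligned}
\bigg| \lim_{M \to \infty} \int_0^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} e^{-z} z^{\frac{n-2k+p+3}{4} - 1}\, dz - \Gamma\Big( \frac{n-2k+p+3}{4} \Big) \bigg|
&= \lim_{M \to \infty} \bigg| \int_0^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} e^{-z} z^{\frac{n-2k+p+3}{4} - 1}\, dz - \int_0^{\frac{nM}{4}} e^{-x} x^{\frac{n-2k+p+3}{4} - 1}\, dx \bigg| \\
&= \lim_{M \to \infty} \bigg| \int_{C_2} e^{-z} z^{\frac{n-2k+p+3}{4} - 1}\, dz \bigg| \\
&= \lim_{M \to \infty} \bigg| \int_{\frac{nM}{4}}^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} e^{-z} z^{\frac{n-2k+p+3}{4} - 1}\, dz \bigg|.
\end{aligned}$$
Do a change of variables $z = \frac{nM}{4} + y \sqrt{n}\, D_{kk} M i$, so that $y = \frac{z - nM/4}{\sqrt{n} D_{kk} M i}$ is real on the path. It yields
$$\begin{aligned}
&= \sqrt{n}\, |D_{kk}| \lim_{M \to \infty} \frac{M}{e^{nM/4}} \bigg| \int_0^1 e^{-y \sqrt{n} D_{kk} M i} \Big[ \frac{nM}{4} + y \sqrt{n}\, D_{kk} M i \Big]^{\frac{n-2k+p+3}{4} - 1}\, dy \bigg| \\
&\le \sqrt{n}\, |D_{kk}| \int_0^1 \Big| \frac{n}{4} + y \sqrt{n}\, D_{kk}\, i \Big|^{\frac{n-2k+p+3}{4} - 1}\, dy \cdot \lim_{M \to \infty} \frac{M^{\frac{n-2k+p+3}{4}}}{e^{nM/4}}.
\end{aligned}$$
This last integral is finite, since the integrand is continuous on a bounded interval. Therefore the limit is zero and by Equation (3.17) and the previous expression,
$$A = \frac{n^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-2k+p+3}{4} \big)}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big)^{-\frac{n-2k+p+3}{4}}. \tag{3.18}$$
Going back to (3.14), let us now consider the remaining expectations, those in the product over $l < k$. For fixed $1 \le l < k \le p$, let
$$B = E\bigg[ \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) U_{lk}^2 \Big\} \bigg]. \tag{3.19}$$
Since $U_{lk}^2 \sim \chi^2_1/n$,
$$B = \sqrt{\frac{n}{2\pi}} \int_0^\infty \exp\Big\{ \Big( -i \sqrt{n}\, D_{kk} + \frac{n}{4} \Big) x \Big\} \frac{e^{-\frac{n}{2} x}}{\sqrt{x}}\, dx = \sqrt{\frac{n}{2\pi}} \int_0^\infty \frac{\exp\big\{ -\big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \big) x \big\}}{\sqrt{x}}\, dx. \tag{3.20}$$
Consider the truncated integrands
$$h_M(x) = \frac{\exp\big\{ -\big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \big) x \big\}}{\sqrt{x}}\, \mathbb{1}\Big[ \frac{1}{M} < x < M \Big]. \tag{3.21}$$
We see that they are dominated by a positive, integrable function $h(x)$,
$$|h_M(x)| \le h(x) = \frac{e^{-\frac{n}{4} x}}{\sqrt{x}}, \qquad \int_0^\infty h(x)\, dx = \sqrt{\frac{4\pi}{n}} < \infty.$$
Therefore, by the Dominated Convergence Theorem and Equation (3.20), we conclude that
$$B = \sqrt{\frac{n}{2\pi}} \lim_{M \to \infty} \int_{1/M}^{M} \frac{\exp\big\{ -\big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \big) x \big\}}{\sqrt{x}}\, dx.$$
A complex change of variables $z = \big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \big) x$ yields
$$= \sqrt{\frac{n}{2\pi}} \Big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \Big)^{-1/2} \lim_{M \to \infty} \int_{\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i}^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} \frac{e^{-z}}{\sqrt{z}}\, dz. \tag{3.22}$$
Let us compute this integral again using a contour integration argument. Consider the contour $C = C_1 + C_2 + C_3 + C_4$ given by $C_1$ a line from $\frac{n}{4M}$ to $\frac{nM}{4}$, $C_2$ a line from $\frac{nM}{4}$ to $\frac{nM}{4} + \sqrt{n}\, D_{kk} M i$, $C_3$ a line from $\frac{nM}{4} + \sqrt{n}\, D_{kk} M i$ to $\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i$ and $C_4$ a line from $\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i$ to $\frac{n}{4M}$. A diagram is provided as Figure 3.

Fig 3. Contour $C = C_1 + C_2 + C_3 + C_4$ when $D_{kk} \ge 0$. The diagram is mirrored around the $x$ axis when $D_{kk} < 0$.

Since $z \mapsto e^{-z}/\sqrt{z}$ is holomorphic away from zero,
$$\begin{aligned}
\bigg| \lim_{M \to \infty} \int_{\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i}^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} \frac{e^{-z}}{\sqrt{z}}\, dz - \sqrt{\pi} \bigg|
&= \lim_{M \to \infty} \bigg| \int_{\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i}^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} \frac{e^{-z}}{\sqrt{z}}\, dz - \int_{\frac{n}{4M}}^{\frac{nM}{4}} \frac{e^{-x}}{\sqrt{x}}\, dx \bigg| \\
&= \lim_{M \to \infty} \bigg| \int_{C_2} \frac{e^{-z}}{\sqrt{z}}\, dz + \int_{C_4} \frac{e^{-z}}{\sqrt{z}}\, dz \bigg| \\
&\le \lim_{M \to \infty} \bigg| \int_{\frac{nM}{4}}^{\frac{nM}{4} + \sqrt{n} D_{kk} M i} \frac{e^{-z}}{\sqrt{z}}\, dz \bigg| + \bigg| \int_{\frac{n}{4M} + \frac{\sqrt{n} D_{kk}}{M} i}^{\frac{n}{4M}} \frac{e^{-z}}{\sqrt{z}}\, dz \bigg|.
\end{aligned}$$
By changes of variables $z = \frac{nM}{4} + y \sqrt{n}\, D_{kk} M i$ and $z = \frac{n}{4M} + y \frac{\sqrt{n} D_{kk}}{M} i$ on the two respective integrals, we get
$$\begin{aligned}
={} & \sqrt{n}\, |D_{kk}| \lim_{M \to \infty} \frac{M}{e^{nM/4}} \bigg| \int_0^1 e^{-y \sqrt{n} D_{kk} M i} \Big[ \frac{nM}{4} + y \sqrt{n}\, D_{kk} M i \Big]^{-1/2} dy \bigg| \\
& + \sqrt{n}\, |D_{kk}| \lim_{M \to \infty} \frac{1}{M e^{n/(4M)}} \bigg| \int_0^1 e^{-y \frac{\sqrt{n} D_{kk}}{M} i} \Big[ \frac{n}{4M} + y \frac{\sqrt{n}\, D_{kk}}{M} i \Big]^{-1/2} dy \bigg| \\
\le{} & \sqrt{n}\, |D_{kk}| \int_0^1 \Big| \frac{n}{4} + y \sqrt{n}\, D_{kk}\, i \Big|^{-1/2} dy \cdot \lim_{M \to \infty} \frac{\sqrt{M}}{e^{nM/4}} + \sqrt{n}\, |D_{kk}| \int_0^1 \Big| \frac{n}{4} + y \sqrt{n}\, D_{kk}\, i \Big|^{-1/2} dy \cdot \lim_{M \to \infty} \frac{1}{\sqrt{M}\, e^{n/(4M)}}.
\end{aligned}$$
Since $\big| \frac{n}{4} + y \sqrt{n}\, D_{kk}\, i \big|^{-1/2} = \big( \frac{n^2}{16} + y^2 n D_{kk}^2 \big)^{-1/4}$ is continuous on $[0, 1]$, a bounded interval, we conclude that the integrals are finite and that the limits are zero. Therefore, by Equation (3.22),
$$B = \sqrt{\frac{n}{2}} \Big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \Big)^{-1/2}. \tag{3.23}$$
Recall the definitions of $A$ and $B$ at Equations (3.15) and (3.19). Combining both Equations (3.18) and (3.23) into the expression for $\psi_{NW}^{1/2}$ at Equation (3.14) provides
$$\begin{aligned}
\psi_{NW}^{1/2}(T) ={} & \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} \exp\big\{ i \sqrt{n} \operatorname{tr} T \big\} \prod_{l<k}^{p} \sqrt{\frac{n}{2}} \Big( \frac{n}{4} + i \sqrt{n}\, D_{kk} \Big)^{-1/2} \cdot \prod_{k=1}^{p} \frac{n^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-2k+p+3}{4} \big)}{2^{\frac{n-k+1}{2}}\, \Gamma\big( \frac{n-k+1}{2} \big)} \Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big)^{-\frac{n-2k+p+3}{4}} \\
={} & \frac{2^{p(n-2)/4}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{\pi^{p(p+1)/4}\, n^{p(n+m)/8}} \prod_{k=1}^{p} \frac{n^{\frac{n-k+1}{2} + \frac{k-1}{2}}}{2^{\frac{n-k+1}{2} + \frac{k-1}{2}}} \prod_{k=1}^{p} \frac{\Gamma\big( \frac{n-2k+p+3}{4} \big)}{\Gamma\big( \frac{n-k+1}{2} \big)} \exp\big\{ i \sqrt{n} \operatorname{tr} T \big\} \cdot \prod_{k=1}^{p} \Big( \frac{n}{4} + \sqrt{n}\, D_{kk}\, i \Big)^{-\frac{n-2k+p+3}{4} - \frac{k-1}{2}} \\
={} & \frac{n^{p(2n+p+1)/8}\, \Gamma_p^{1/2}\big( \frac{n}{2} \big)}{2^{p(n+2)/4}\, \pi^{p(p+1)/4}} \prod_{k=1}^{p} \frac{\Gamma\big( \frac{n-2k+p+3}{4} \big)}{\Gamma\big( \frac{n-k+1}{2} \big)} \exp\big\{ i \sqrt{n} \operatorname{tr} T \big\} \Big| \frac{n}{4} I_p + i \sqrt{n}\, T \Big|^{-\frac{n+p+1}{4}}.
\end{aligned}$$
But by Muirhead [1982, Theorem 2.1.12],
$$\prod_{k=1}^{p} \frac{\Gamma\big( \frac{n-2k+p+3}{4} \big)}{\Gamma\big( \frac{n-k+1}{2} \big)} = \frac{\pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\big( \frac{n+p+1}{4} + \frac{1-k}{2} \big)}{\pi^{p(p-1)/4} \prod_{k=1}^{p} \Gamma\big( \frac{n}{2} + \frac{1-k}{2} \big)} = \frac{\Gamma_p\big( \frac{n+p+1}{4} \big)}{\Gamma_p\big( \frac{n}{2} \big)},$$
so by taking a factor $n/4$ out of the determinant, we obtain
$$\psi_{NW}^{1/2}(T) = \frac{2^{p(n+2p)/4}}{\pi^{p(p+1)/4}\, n^{p(p+1)/8}} \frac{\Gamma_p\big( \frac{n+p+1}{4} \big)}{\Gamma_p^{1/2}\big( \frac{n}{2} \big)} \exp\big\{ i \sqrt{n} \operatorname{tr} T \big\} \Big| I_p + i \frac{4T}{\sqrt{n}} \Big|^{-\frac{n+p+1}{4}}.$$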
Squaring this result yields the desired expression for $\psi_{NW}$.

By Proposition 3, when $n \geq p-2$, the G-conjugate of the normalized Wishart distribution has a density on $S_p(\mathbb{R})$ given by
\[
|\psi_{NW}|(T) = \frac{2^{p(n+2p)/2}}{\pi^{p(p+1)/2}\, n^{p(p+1)/4}}\, \frac{\Gamma_p^2\!\left(\frac{n+p+1}{4}\right)}{\Gamma_p\!\left(\frac{n}{2}\right)}\, \left| I_p + \frac{16\,T^2}{n} \right|^{-\frac{n+p+1}{4}}. \tag{3.24}
\]
As mentioned in the paragraph following Equation (3.4), the G-conjugate of a $\chi^2_n/n$ distribution is a scaled $t_{n/2}$. Thus, by analogy, Equation (3.24) should represent some kind of generalization of the $t$ distribution to the real symmetric matrices. Matrix-variate generalizations of the $t$ distribution have been investigated in the past, but not for symmetric matrices. Hence it appears the concept is new.

This motivates us to propose in Section 4 a candidate for a symmetric matrix variate $t$ distribution. Using that definition, the G-conjugate to the normalized Wishart could then be regarded as the $t$ distribution with $n/2$ degrees of freedom and scale matrix $I_p/8$, which we denote $T_{n/2}(I_p/8)$. But regardless of its name, this distribution will play a key role in our results about the middle-scale regime asymptotics of Wishart matrices, and will be investigated in depth in Section 4.
4. The symmetric matrix variate t distribution

In Section 3, Equation (3.24), we proved that when $n \geq p-2$, the G-conjugate of the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ has density on $S_p(\mathbb{R})$ given by
\[
|\psi_{NW}|(T) = \frac{2^{p(n+2p)/2}}{\pi^{p(p+1)/2}\, n^{p(p+1)/4}}\, \frac{\Gamma_p^2\!\left(\frac{n+p+1}{4}\right)}{\Gamma_p\!\left(\frac{n}{2}\right)}\, \left| I_p + \frac{16\,T^2}{n} \right|^{-\frac{n+p+1}{4}}. \tag{4.1}
\]
Two remarks are in order. First, we are unaware of any matrix calculus tools that could let us integrate this expression directly. Thus, the mere fact that this expression integrates to unity, a consequence of being the G-conjugate of another distribution, seems remarkable.

Second, when $p = 1$, this is the $t_{n/2}/\sqrt{8}$ distribution. Equation (4.1) can therefore be viewed as an extension of the $t$ distribution to $S_p(\mathbb{R})$, the space of real-valued symmetric matrices. The purpose of this section is to propose a candidate definition for such a generalization, as well as to prove several results concerning the normalized Wishart G-conjugate.

To the best of our knowledge, no extension of the $t$ distribution to symmetric matrices has ever been proposed. However, a non-symmetric matrix variate $t$ distribution has been thoroughly investigated in the literature; see Gupta and Nagar [1999, Chapter 4] for a detailed summary. Several definitions exist. For our purposes, we say that a $p \times q$ real-valued random matrix $T$ has the matrix variate $t$ distribution with $\nu$ degrees of freedom and $q \times q$ positive-definite scale matrix $\Omega$ if it has density
\[
\frac{\Gamma_p\!\left(\frac{\nu+p+q-1}{2}\right)}{\nu^{pq/2}\, \pi^{pq/2}\, \Gamma_p\!\left(\frac{\nu+p-1}{2}\right)}\, |\Omega|^{-\frac{p}{2}} \left| I_p + \frac{T\,\Omega^{-1} T^t}{\nu} \right|^{-\frac{\nu+p+q-1}{2}}.
\]
It is not exactly clear what the proper analog of this distribution for symmetric matrices should be. But it would be elegant if the degrees of freedom of Equation (4.1) were to be exactly $n/2$, as in the univariate case. Thus, the following definition seems natural.
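The $p = 1$ remark above lends itself to a quick numerical sanity check. The sketch below is an illustration only, not part of the original argument: it takes the constant of Equation (4.1) to be $2^{p(n+2p)/2}\,\pi^{-p(p+1)/2}\,n^{-p(p+1)/4}\,\Gamma_p^2((n+p+1)/4)/\Gamma_p(n/2)$, and the helper names (`log_multigamma`, `log_C`, `integrate`) are hypothetical. It checks that, for $p = 1$, the density integrates to one and coincides with that of a Student $t_{n/2}$ variable divided by $\sqrt{8}$.

```python
import math

def log_multigamma(p, x):
    """log Gamma_p(x) = p(p-1)/4 * log(pi) + sum_{i=1}^p log Gamma(x - (i-1)/2)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(x - (i - 1) / 2) for i in range(1, p + 1)
    )

def log_C(n, p):
    """log of the normalization constant of Equation (4.1), under the reading stated above."""
    return (
        (p * (n + 2 * p) / 2) * math.log(2)
        - (p * (p + 1) / 2) * math.log(math.pi)
        - (p * (p + 1) / 4) * math.log(n)
        + 2 * log_multigamma(p, (n + p + 1) / 4)
        - log_multigamma(p, n / 2)
    )

def density_p1(t, n):
    """Equation (4.1) specialised to p = 1."""
    return math.exp(log_C(n, 1)) * (1 + 16 * t * t / n) ** (-(n + 2) / 4)

def scaled_t_density(t, n):
    """Density of X / sqrt(8), where X is Student t with n/2 degrees of freedom."""
    nu = n / 2
    x = math.sqrt(8) * t
    log_const = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.sqrt(8) * math.exp(log_const) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def integrate(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps)))

n = 20
total_mass = integrate(lambda t: density_p1(t, n), -50.0, 50.0, 100_000)
max_gap = max(abs(density_p1(t, n) - scaled_t_density(t, n)) for t in (0.0, 0.1, 0.5, 1.0, 2.0))
print(total_mass, max_gap)
```

The agreement for $p = 1$ follows from the Legendre duplication formula for the gamma function; for $p > 1$ no such direct check is available, which is the point of the first remark.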
Definition 2 (Symmetric matrix variate t distribution) . We say a real sym-metric p ˆ p matrix T has the symmetric matrix variate t distribution with ν ě p { ´ p ˆ p positive-definite scale matrix Ω, h´etelat and Wells/Mid-scale Wishart asymptotics denoted T ν p Ω q , if it has density f T n p Ω q p T q 9 ˇˇˇˇ I p ` T Ω ´ Tν ˇˇˇˇ ´ ν `p p ` q{ . With this definition, the G-conjugate to the normalized Wishart distribution,whose density is given by Equation (4.1), is the T n { p I p { q distribution on S p p R q .In fact, since Equation (4.1) integrates to one, we can deduce the normaliza-tion constant of Definition 2. For an arbitrary degrees of freedom parameter ν ,imagine the density | ψ NW | of the G-conjugate of a normalized Wishart distribu-tion with n “ ν ě p ´
2. By virtue of being a G-conjugate, it must integrate tounity. Then from the change of variables T “ Ω ´ S Ω ´ {? dT “ ´ p p p ` q | Ω | ´ p ` dS , we see that1 “ ż S p p R q | ψ NW |p T q dT “ p p p ` q | Ω | p ` ż S p p R q | ψ NW | ˆ Ω ´ S Ω ´ ? ˙ dS “ p p ν ´ q Γ p ´ ν `p p ` q{ ¯ π p p p ` q ν p p p ` q Γ p p ν q | Ω | ´ p ` ż S p p R q ˇˇˇˇ I p ` S Ω ´ Sν ˇˇˇˇ ´ ν `p p ` q{ dS. Thus we must have f T n p Ω q p T q “ p p ν ´ q Γ p ´ ν `p p ` q{ ¯ π p p p ` q ν p p p ` q Γ p p ν q | Ω | ´ p ` ˇˇˇˇ I p ` T Ω ´ Tν ˇˇˇˇ ´ ν `p p ` q{ . (4.2)It would be interesting to see if this distribution satisfies the properties we wouldexpect of a t distribution, to ensure our guess is the “correct” one. However, thiswould take us too far away from the topic of this article. Instead, we will focusin the rest of this section on proving results about T n { p I p { q , the G-conjugateto the normalized Wishart distribution.Our first result will concern the asymptotic expansion of its normalizationconstant. We mention that this constant is the same as the C n,p term appearingin the expression of the G-transform of the normalized Wishart in Proposition3. Lemma 1.
The normalization constant of the T n { p I p { q distribution C n,p “ p p n ` p q π p p p ` q n p p p ` q Γ p ` n ` p ` ˘ Γ p ` n ˘ (4.3) has, for every K P N , the asymptotic expansion C n,p “ p p p ` q π p p p ` q exp " ´ K ` ÿ k “ r k even s k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ ` r k even s k p k ` q p k ` n k ` o ´ p K ` n K ` ¯* . as n Ñ 8 with p { n Ñ . h´etelat and Wells/Mid-scale Wishart asymptotics Proof.
By Stirling’s approximation applied to log Γ, as well as Muirhead [1982,Theorem 2.1.12], we find thatlog Γ p p x q “ p p p ´ q π ` p ÿ i “ log Γ ˆ x ´ i ´ ˙ “ p p p ´ q π ` p ÿ i “ „ˆ x ´ i ´ ´ ˙ log ˆ x ´ i ´ ˙ ´ ˆ x ´ i ´ ˙ `
12 log 2 π ` O ˆ x ˙ “ p p p ` q π ` p ´ px ` p p p ´ q ` p ÿ i “ ˆ x ´ i ˙ log ˆ x ´ i ´ ˙ ` O ´ px ¯ as x Ñ 8 . Thus2 log Γ p ˆ n ` p ` ˙ ´ log Γ p ´ n ¯ “ p p p ` q π ` p ´ p p p ` q ` p ÿ i “ ˆ n ` p ` ´ i ˙ log ˆ n ` p ` ´ i ´ ˙ ´ p ÿ i “ ˆ n ´ i ˙ log ˆ n ´ i ´ ˙ ` o p q“ p p p ` q π ´ p p n ` p ´ q ` p p p ` q n ´ p p p ` q ` p ÿ i “ ´ n ´ r i ´ p ´ s ¯ log ˆ ´ i ´ p ´ n ˙ ´ p ÿ i “ p n ´ i q log ˆ ´ i ´ n ˙ ` o p q , as n Ñ 8 with p { n Ñ
0, and so by Equation (4.3),log C n,p “ p p p ` q ´ p p p ` q π ´ p p p ` q ` p ÿ i “ ´ n ´ r i ´ p ´ s ¯ log ˆ ´ i ´ p ´ n ˙ ´ p ÿ i “ p n ´ i q log ˆ ´ i ´ n ˙ ` o p q . (4.4)Let us now focus on the two sums in this expression. Recall that for any k ě ´ log p ´ x q “ x ` x ` x ` ¨ ¨ ¨ ` x k k ` O p x k ` q as x Ñ , (4.5)even for negative x . Therefore, ´ p ÿ i “ p n ´ i q log ˆ ´ i ´ n ˙ h´etelat and Wells/Mid-scale Wishart asymptotics “ p ÿ i “ n K ` ÿ k “ p i ´ q k k n k ´ p ÿ i “ i K ` ÿ k “ p i ´ q k k n k ` O ˆ p K ` n K ` ˙ “ p ÿ i “ p i ´ q ` p ÿ i “ K ` ÿ k “ ˆ i ´ k ´ ´ ik ˙ p i ´ q k n k ` O ˆ p K ` n K ` pn ˙ “ p p p ´ q ´ p ÿ i “ K ` ÿ k “ ˆ i ´ k p k ` q ` k ˙ p i ´ q k n k ` o ˆ p K ` n K ` ˙ “ p p p ´ q ´ K ` ÿ k “ k p k ` q n k p ´ ÿ i “ i k ` ´ K ` ÿ k “ k n k p ´ ÿ i “ i k ` o ˆ p K ` n K ` ˙ . Now let B k denote the Bernoulli numbers, with the convention B “ . Faul-haber’s formula provides “ p p p ´ q ´ K ` ÿ k “ k p k ` q n k ¨ k ` k ` ÿ l “ B k ` ´ l p p ´ q l ´ K ` ÿ k “ k n k ¨ k ` k ` ÿ l “ B k ` ´ l p p ´ q l ` o ˆ p K ` n K ` ˙ . But by the binomial theorem, p p ´ q k ` n k “ p k ` n k ´ p k ` q p k ` n k ` o p q , p p ´ q k ` n k “ p k ` n ` o p q and p p ´ q l n k “ o p q for any 1 ď l ď k . Thus “ p p p ´ q ´ K ` ÿ k “ „ˆ k ` k ` ˙ B k p k ` qp k ` q p k ` ´ p k ` q p k ` n k ` ˆ k ` k ` ˙ B k p k ` qp k ` q p k ` n k ` o p q ´ K ` ÿ k “ „ˆ k ` k ` ˙ B k p k ` q p k ` n k ` o p q ` o ˆ p K ` n K ` ˙ . Using that B “ B “ , we obtain “ p p p ´ q ´ K ` ÿ k “ k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ k p k ` q p k ` n k ` o ˆ p K ` n K ` ˙ . (4.6)The analysis of the other sum of Equation (4.4) is similar but more involved,as we must distinguish the cases where p is even and where p is odd. 
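The Faulhaber step used for the first sum can be checked in exact arithmetic. The following is a minimal sketch, assuming the convention $B_1 = +1/2$ (the one under which $\sum_{i=1}^{N} i^k$ is a polynomial in $N$ with the usual signs); the function names are hypothetical and only the standard library is used.

```python
from fractions import Fraction
from math import comb

def bernoulli_plus(n_max):
    """Bernoulli numbers B_0, ..., B_{n_max} with the convention B_1 = +1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # Standard recurrence sum_{j=0}^{m} C(m+1, j) B_j = 0, which yields B_1 = -1/2.
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    if n_max >= 1:
        B[1] = Fraction(1, 2)  # switch to the B_1 = +1/2 convention
    return B

def faulhaber(N, k):
    """sum_{i=1}^{N} i^k = (1/(k+1)) * sum_{j=0}^{k} C(k+1, j) B_j N^(k+1-j)."""
    B = bernoulli_plus(k)
    return sum(comb(k + 1, j) * B[j] * Fraction(N) ** (k + 1 - j) for j in range(k + 1)) / (k + 1)

# In the proof the sums run up to p - 1, so N = p - 1.
p = 7
print([faulhaber(p - 1, k) for k in (1, 2, 3)])
```

Only the two highest powers of $p - 1$ matter in the expansion, which is why the derivation keeps just the terms involving $B_0$ and $B_1$.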
We find,from Equation (4.5) again, that12 p ÿ i “ ´ n ´ r i ´ p ´ s ¯ log ˆ ´ i ´ p ´ n ˙ “ ´ p ÿ i “ p i ´ p ´ q log ˆ ´ i ´ p ´ n ˙ ` p ÿ i “ n log ˆ ´ i ´ p ´ n ˙ h´etelat and Wells/Mid-scale Wishart asymptotics “ p ÿ i “ p i ´ p ´ q K ` ÿ k “ p i ´ p ´ q k k n k ´ p ÿ i “ n K ` ÿ k “ p i ´ p ´ q k k n k ` O ˆ p K ` n K ` ˙ “ p ÿ i “ p i ´ p ´ q ` p ÿ k “ K ` ÿ k “ ˆ i ´ p ´ k ´ i ´ p ´ k ` ˙ p i ´ p ´ q k n k ` O ˆ p K ` n K ` pn ˙ “ p ` p ÿ i “ K ` ÿ k “ ˆ i ´ p ´ k p k ` q ` k ˙ p i ´ p ´ q k n k ` o ˆ p K ` n K ` ˙ “ p ` K ` ÿ k “ k p k ` q n k p ÿ i “ p i ´ p ´ q k ` ` K ` ÿ k “ kn k p ÿ i “ p i ´ p ´ q k ` o ˆ p K ` n K ` ˙ “ p ` K ` ÿ k “ k p k ` q n k „ p´ p ´ q k ` ` p´ p ` q k ` ` p ÿ i “ p i ´ p ´ q k ` ` K ` ÿ k “ kn k „ p´ p ´ q k ` p´ p ` q k ` p ÿ i “ p i ´ p ´ q k ` o ˆ p K ` n K ` ˙ . (4.7)At this point, it is simpler to analyze the cases where p is even and odd sepa-rately. If p is odd, define q “ p p ´ q{ p ÿ i “ p i ´ p ´ q l “ ” `p´ q l ı q ÿ i “ p i q l “ r l even s l ` l ` ÿ s “ ˆ l ` s ˙ B l ` ´ s l ` ´ s p q q s . By the binomial theorem, p q q k ` n k “ p k ` n k ´ p k ` q p k ` n k ` o p q , p q q k ` n k “ p k ` n k ` o p q and p q q l n k “ o p q for 1 ď l ď k . Moreover, p´ p ´ q k ` n k “ p´ p ` q k ` n k “ p´ p q k ` n k ` o p q (4.8)and p´ p ´ q k n k “ p´ p ` q k n k “ o p q . (4.9)Thus, for odd p , Equation 4.7 equals “ p ` K ` ÿ k “ k p k ` q „ ` r k odd s ˆ k ` k ` ˙ B k ` p k ` ´ p k ` q p k ` n k ` r k odd s ˆ k ` k ` ˙ B k ` p k ` n k ` p´ p q k ` n k ` o p q h´etelat and Wells/Mid-scale Wishart asymptotics ` K ` ÿ k “ k „ r k even s ˆ k ` k ` ˙ B k ` p k ` n k ` o p q ` o ˆ p K ` n K ` ˙ . Moreover,2 p´ p q k ` ´ r k odd s p k ` ` r k odd s p k ` ` r k even s p k ` “ ´ r k even s p k ` . Thus, “ p ` K ` ÿ k “ r k odd s k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ r k even s k p k ` q p k ` n k ` o ˆ p K ` n K ` ˙ . 
(4.10)When p is even, let q “ p p ´ q{ p ÿ i “ p i ´ p ´ q l “ ” `p´ q l ı q ÿ i “ p i ´ q l “ r l even s ˆ q ÿ i “ i l ´ q ÿ i “ p i q l ˙ “ r l even s l ` l ` ÿ s “ ˆ l ` s ˙ B l ` ´ s p ´ l ` ´ s q p q q s . But by the binomial theorem, p q q k ` n k “ p k ` n k ´ p k ` q p k ` n k ` o p q , p q q k ` n k “ p k ` n k ` o p q and p q q l n k “ o p q for 1 ď l ď k . If we apply Equations (4.8)–(4.9),then Equation (4.7) becomes “ p ` K ` ÿ k “ k p k ` q „ r k odd s ˆ k ` k ` ˙ p ´ q B k ` p k ` ´ p k ` q p k ` n k ` r k odd s ˆ k ` k ` ˙ p ´ q B k ` p k ` n k ` o p q ` K ÿ k “ k „ r k even s ˆ k ` k ` ˙ p ´ q B k ` p k ` n k ` o p q ` o ˆ p K ` n K ` ˙ . Moreover,2 p´ p q k ` ´ r k odd s p k ` ` r k even s p k ` “ ´ r k even s p k ` . Thus again, “ p ` K ` ÿ k “ r k odd s k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ r k even s k p k ` q p k ` n k ` o ˆ p K ` n K ` ˙ . (4.11)which is the exact same result as in the odd p case (see Equation 4.10). PluggingEquations (4.6) and (4.10)–(4.11) in Equation (4.4), we obtainlog C n,p “ p p p ` q ´ p p p ` q π ´ K ` ÿ k “ r k even s k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ ` r k even s k p k ` q p k ` n k ` o ˆ p K ` n K ` ˙ , as desired. h´etelat and Wells/Mid-scale Wishart asymptotics Thus the constant C n,p is closely related to the normalization constant of theGOE p p q distribution 2 p p p ` q{ { π p p p ` q{ .We now turn our attention to the study of the asymptotic moments of a T n { p I p { q distribution. We first remind the reader of some classic results. For aGaussian Orthogonal Ensemble matrix Z „ GOE p p q , a moment-based approachto Wigner’s theorem states that for any k P N , its k th moment satisfylim p Ñ8 E „ p tr ´ Z ? p ¯ k “ C k { r k even s , where C k “ k ` ` kk ˘ is the k th Catalan number. In fact, Anderson et al.[2010, section 2.1.4 on p.17] show that the variance of the k th moment satisfieslim p Ñ8 Var “ p tr p Z {? p q k ‰ “
0, so we really have
\[
\frac{1}{p}\,\mathrm{tr}\left(\frac{Z}{\sqrt{p}}\right)^{k} \xrightarrow{\;L^2\;} C_{k/2}\,\mathbf{1}[k \text{ even}]
\]
as $p \to \infty$.

Now, what do we know about the moments of $T_{n/2}(I_p/8)$? By symmetry, $\mathrm{E}\left[\mathrm{tr}\,T^k\right] = 0$ for odd $k$, but it is much less clear what happens for even $k$. It turns out that in many ways, if $T \sim T_{n/2}(I_p/8)$ then $4T \sim T_{n/2}(2I_p)$ mimics the Gaussian Orthogonal Ensemble results outlined above, especially when $p/n \to 0$ as $n \to \infty$. We have the following result.
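The moment form of Wigner's theorem recalled above is easy to observe by simulation. The sketch below is illustrative only: it draws a single GOE$(p)$ matrix as $(X + X^t)/\sqrt{2}$ with $X$ having independent standard Gaussian entries, and compares $\frac{1}{p}\mathrm{tr}(Z/\sqrt{p})^k$ with $C_{k/2}\mathbf{1}[k \text{ even}]$. The dimension, seed and helper names are arbitrary choices, and pure-Python matrix products are used to keep the example self-contained.

```python
import math
import random

def sample_goe(p, rng):
    """GOE(p) as (X + X^T)/sqrt(2), X with i.i.d. standard Gaussian entries."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
    return [[(X[i][j] + X[j][i]) / math.sqrt(2) for j in range(p)] for i in range(p)]

def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def normalized_moment(Z, k):
    """(1/p) tr (Z / sqrt(p))^k for k >= 1."""
    p = len(Z)
    W = [[z / math.sqrt(p) for z in row] for row in Z]
    P = W
    for _ in range(k - 1):
        P = matmul(P, W)
    return sum(P[i][i] for i in range(p)) / p

def catalan(k):
    """k-th Catalan number, C_k = binom(2k, k) / (k + 1)."""
    return math.comb(2 * k, k) // (k + 1)

rng = random.Random(0)
Z = sample_goe(60, rng)
moments = {k: normalized_moment(Z, k) for k in (1, 2, 3, 4)}
for k, value in moments.items():
    target = catalan(k // 2) if k % 2 == 0 else 0
    print(k, round(value, 3), target)
```

At this modest dimension the even moments already sit near $C_1 = 1$ and $C_2 = 2$, with finite-$p$ corrections of order $1/p$.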
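Theorem 2. Let $k \in \mathbb{N}$ and $T \sim T_{n/2}(I_p/8)$. If $p/n \to c \in [0,1)$, the moments of $T$ satisfy the asymptotic bounds
\[
\mathrm{E}\left[\mathrm{tr}\,T^{2k}\right] = O\!\left(p^{k+1}\right) \qquad\text{and}\qquad \mathrm{E}\left[\mathrm{tr}^2\,T^{2k}\right] = O\!\left(p^{2k+2}\right)
\]
as $n \to \infty$. In fact, for any $k \in \mathbb{N}$,
\[
\frac{1}{p}\,\mathrm{tr}\left(\frac{4T}{\sqrt{p}}\right)^{k} \xrightarrow{\;L^2\;} C_{k/2}\,\mathbf{1}[k \text{ even}]
\]
as $n, p \to \infty$ with $p/n \to 0$, where $C_k = \frac{1}{k+1}\binom{2k}{k}$ is the $k$th Catalan number.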
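For the second moment, the statement of Theorem 2 can be checked against the closed form obtained later in this section. The sketch below is a hedged illustration in exact rational arithmetic. It assumes the standard inverse Wishart moments for $Y \sim W_p(n, I_p/n)$ with $m = n - p - 1$, the combination of Equation (4.34) read as $-\frac{m(m-2)}{16n}\mathrm{E}[\mathrm{tr}\,Y^{-2}] + \frac{m}{8n}\mathrm{E}[\mathrm{tr}^2 Y^{-1}] + \frac{m}{8}\mathrm{E}[\mathrm{tr}\,Y^{-1}] - \frac{np}{16}$, and Equation (4.38) read as $\mathrm{E}[\mathrm{tr}\,T^2] = \frac{np(mp+m+2)}{16(m-2)(m+1)}$. The function names are hypothetical.

```python
from fractions import Fraction

def inverse_wishart_moments(n, p):
    """E[tr Y^-1], E[tr Y^-2], E[tr^2 Y^-1] for Y ~ W_p(n, I_p/n); requires m = n - p - 1 > 2."""
    m = n - p - 1
    denom = m * (m - 2) * (m + 1)
    e1 = Fraction(n * p, m)
    e2 = Fraction(n * n * p * (m + p), denom)
    e11 = Fraction(n * n * p * (m * p - p + 2), denom)
    return e1, e2, e11

def second_moment_via_lemma(n, p):
    """E[tr T^2] assembled from inverse Wishart moments (assumed reading of Equation (4.34))."""
    m = n - p - 1
    e1, e2, e11 = inverse_wishart_moments(n, p)
    return (-Fraction(m * (m - 2), 16 * n) * e2 + Fraction(m, 8 * n) * e11
            + Fraction(m, 8) * e1 - Fraction(n * p, 16))

def second_moment_closed_form(n, p):
    """E[tr T^2] in closed form (assumed reading of Equation (4.38))."""
    m = n - p - 1
    return Fraction(n * p * (m * p + m + 2), 16 * (m - 2) * (m + 1))

# p = 1: T is t_{n/2}/sqrt(8), whose variance is n / (8 (n - 4)).
print(second_moment_closed_form(20, 1), Fraction(20, 8 * 16))
# p/n small: (1/p) E tr (4T/sqrt(p))^2 = 16 E[tr T^2] / p^2 should be close to C_1 = 1.
print(float(16 * second_moment_closed_form(10**8, 1000) / 1000**2))
```

The $p = 1$ case recovers the variance of a scaled Student $t$ variable, and for large $p$ with $p/n$ small the normalized second moment approaches the first Catalan number, as the theorem predicts.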
Although our proof will rely on the close relationship between the Wishart and the $t$ distribution, it is worthwhile to step back and think about why a $T_{n/2}(2I_p)$ should behave like a GOE$(p)$ when $p/n \to 0$. One good reason might be the classic result that as $n \to \infty$, the density of a $t$ distribution converges pointwise to a standard normal density. Thus, we might think that as long as $p$ does not grow too fast, in some aspects the symmetric $t$ distribution should behave like a GOE$(p)$.

In the context of the proof, it will prove useful to use the notion of power sum symmetric polynomials. For any integer partition $\kappa = (\kappa_1, \dots, \kappa_q)$ in decreasing order $\kappa_1 \geq \dots \geq \kappa_q > 0$, define its associated power sum polynomial to be
\[
r_\kappa(Z) = \prod_{i=1}^{q} \mathrm{tr}\,Z^{\kappa_i}. \tag{4.12}
\]
The norm of the partition $\kappa$ is $|\kappa| = \kappa_1 + \dots + \kappa_q > 0$, which should not be confused with its length $q(\kappa) = q$ (number of elements). By convention, we will assume there also exists an empty partition $\emptyset = ()$ with length $q(\emptyset) = 0$, norm $|\emptyset| = 0$ and power sum polynomial $r_\emptyset(Z) = 1$.

Let's now turn to the proof of the theorem. The odd moments of the $T_{n/2}(I_p/8)$ distribution are zero by symmetry, so it makes sense to focus on the even moments $\mathrm{E}\left[\mathrm{tr}\,T^{2k}\right]$ and the square moments $\mathrm{E}\left[\mathrm{tr}^2\,T^{2k}\right]$. Our first step in the proof is to express these in terms of expectations of power sum polynomials of an inverse Wishart $Y^{-1} \sim W_p^{-1}(n, I_p/n)$, where by power sum polynomials we mean expressions like the one at Equation (4.12). Recall the useful shorthand $m = n - p - 1$.

Lemma 2.
Let T „ T n { p I p { q . Then for any k P N , whenever n is large enoughso that n ě p ` k ` , we can compute the k th moment of T by E “ tr T k ‰ “ p´ q k n k ż Y ą n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m ¨ p ÿ i ,...,i k B s B s X i i k . . . B s B s X i i B s B s X i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m ˇˇˇˇ X “ dY “ p´ q k n k ÿ | κ |ď k b p q κ p n, m, p q E “ r κ p Y ´ q ‰ (4.13) and its squared k th moment by E “ tr T k ‰ “ p´ q k n k ż Y ą n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m ¨ p ÿ i ,...,i k j ,...,j k B s B s X j j k . . . B s B s X j j B s B s X j j B s B s X i i k . . . B s B s X i i B s B s X i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m ˇˇˇˇ X “ dY “ p´ q k n k ÿ | κ |ď k ` b p q κ p n, m, p q E “ r κ p Y ´ q ‰ , (4.14) for Y ´ „ W ´ p p n, I p { n q and some b p q κ , b p q κ . These b p q κ , b p q κ are polynomials in n, m, p , indexed by integer partitions κ , whose degrees satisfy deg b p q κ ď k ` ´ q p κ q and deg b p q κ ď k ` ´ q p κ q . The sums are taken over all partitions of theintegers κ satisfying | κ | ď k and | κ | ď k ` respectively, including the emptypartition.Proof. Let f NW and ψ NW stand for the density and the G-transform of a nor-malized Wishart matrix ? n r W p p n, I p { n q ´ I p s . In the proof of Proposition 3, weconcluded at Equation (3.11) that f { had to be integrable when n ě p ´
2, asits integral was proportional to a multivariate gamma function. Let R p X q “ ´ X be the flip operator. Since f { is integrable, f { ˝ R must be integrable as well, h´etelat and Wells/Mid-scale Wishart asymptotics and so their convolution f { ‹ ` f { ˝ R ˘ is well-defined and integrable.At Equation (3.1) at the start of Section 3, we defined our notion of Fouriertransform for integrable functions on S p p R q . Define the map ι : S p p R q Ñ R p p p ` q{ that maps a symmetric matrix to its vectorized upper triangle, andlet τ : S p p R q Ñ S p p R q be the map τ p X q ij “ X ij if i ‰ jX jj if i “ . Then in terms of the usual Fourier transform on R p p p ` q{ , F t f up T q “ p p p ´ q F (cid:32) f ˝ ι ´ (` ι ˝ τ p T q ˘ . This close relationship transfer properties to our Fourier transform on S p p R q .We will need three.1. For any integrable function f , we have F (cid:32) f ˝ R ( “ F t f u .2. ( Convolution ) For any two integrable functions f and f , we have F t f ‹ f u “ p π p p p ` q F t f u F t f u .3. ( Fourier inversion ) For any continuous integrable f with integrable Fouriertransform φ , we have f p X q “ p π p p p ` q ż S p p R q e i tr p T X q φ p T q dT “ F t φ up´ X q , for all X P S p p R q .These properties are important for the following. Since f NW is real-valued, prop-erties 1 and 2 provide2 ´ p π ´ p p p ` q F ! f { ‹ ` f { ˝ R ˘) “ F (cid:32) f { ) F (cid:32) f { ˝ R ( “ F (cid:32) f { ( F (cid:32) f { ( “ ψ { ψ { “ ˇˇ ψ { ˇˇ . But then, since | ψ NW | is integrable (in fact, to unity), the Fourier inversionformula yields that f { ‹ ` f { ˝ R ˘ p X q “ ż S p p R q e i tr p T X q ˇˇ ψ NW p T q ˇˇ dT. (4.15)Thus we might say the characteristic function of the T n { p I p { q distribution isgiven by f { ‹ ` f { ˝ R ˘ . It is well known that the derivatives of the character-istic function of a distribution evaluated at zero provide its moments, up to aconstant. 
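The link between derivatives of a characteristic function at zero and moments can be illustrated in the simplest case $p = 1$, where $T_{n/2}(I_p/8)$ is the $t_{n/2}/\sqrt{8}$ distribution. The sketch below is an illustration under that assumption, with hypothetical helper names: it evaluates the characteristic function by numerical integration and compares a central second difference at zero with minus the variance $n/(8(n-4))$.

```python
import math

def density(t, n):
    """Density of t_{n/2} / sqrt(8), i.e. the p = 1 case of T_{n/2}(I_p/8)."""
    nu = n / 2
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi) + 0.5 * math.log(8))
    return math.exp(log_const) * (1 + 16 * t * t / n) ** (-(n + 2) / 4)

def char_fn(s, n, half_width=10.0, steps=20_000):
    """E[cos(s T)] by the trapezoidal rule; the sine part vanishes by symmetry."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.cos(s * t) * density(t, n)
    return total * h

n, h = 40, 0.01
# Central second difference approximates the second derivative at zero, which equals -E[T^2].
second_derivative = (char_fn(h, n) - 2 * char_fn(0.0, n) + char_fn(-h, n)) / h**2
variance = n / (8 * (n - 4))
print(second_derivative, -variance)
```

In the matrix case the same idea is applied to $f^{1/2}_{NW} \star (f^{1/2}_{NW} \circ R)$, with symmetric partial derivatives in the entries of $X$ playing the role of the ordinary derivative.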
This suggests we should try to repeatedly differentiate f { ‹ ` f { ˝ R ˘ at zero to compute E “ tr T k ‰ and E “ tr T k ‰ , our ultimate goal.Unfortunately, the convolution is given by an integral whose domain makesit difficult to directly interchange the differentiation and integration symbols.Because the integrand is orthogonally invariant, we found it easier to computethe derivatives at zero by taking a limit over a sequence of decreasing positive- h´etelat and Wells/Mid-scale Wishart asymptotics definite matrices at both sides instead. In this spirit, define on the open set t ă X ă I p u Ă S p p R q the real-valued functions H p X q “ p´ q k n k p ÿ i ,...,i k B s B s X i i k . . . B s B s X i i B s B s X i i f { ‹ ` f { ˝ R ˘ p? nX q and H p X q “ p´ q k n k p ÿ i ,...,i k j ,...,j k B s B s X j j k . . . B s B s X j j B s B s X j j B s B s X i i k . . . B s B s X i i B s B s X i i f { ‹ ` f { ˝ R ˘ p? nX q for fixed k , p and n . Here B s B s X ij stands for the symmetric differentiation operator B s B s X ij “ ` δ ij BB X ij , as defined in Section 2. The ? n scaling in the argumenthelps link the convolution to an expectation with respect to an inverse Wishartdistribution.Let’s first relate these functions to the moments of the T n { p I p { q distri-bution. The symmetric differentation operator has the pleasant property that B s B s X ij tr p XT q “ T ij for any two symmetric matrices X , T . Thus, for any 1 ď l ď k and indices 1 ď i , . . . , i l ď p , we find that ˇˇˇˇ B s B s X i l i l ´ . . . B s B s X i i B s B s X i i e i ? n tr p T X q ˇˇ ψ NW ˇˇ p T q ˇˇˇˇ “ n l ˇˇˇ T i l i l ´ ¨ ¨ ¨ T i i T i i ˇˇˇˇˇ ψ NW ˇˇ p T q (4.16)for all X P S p p R q .We now show that the right hand side (4.15) is integrable. This is not a mereformality: when p “
1, asking if this expression is integrable is the same asasking if the t distribution with n { l th moment, andit is well-known that the t distribution only possesses moments of order smallerthan its degrees of freedom. So the answer is most likely to be positive, but onlyfor n large enough.Let us see why. For any symmetric matrix T , | T ij | ď a λ p T q ď gffe p ź i “ ´ ` λ i p T q ¯ “ bˇˇ I p ` T ˇˇ , where λ p T q ě ¨ ¨ ¨ ě λ p p T q ě T . Thus ż S p p R q n l ˇˇˇ T i l i l ´ ¨ ¨ ¨ T i i ˇˇˇˇˇ ψ NW ˇˇ p T q dT “ n l l C n,p ż S p p R q ˇˇˇ T i l i l ´ ? n ¨ ¨ ¨ T i i ? n ˇˇˇˇˇˇˇ I p ` T n ˇˇˇˇ ´ n ` p ` dT h´etelat and Wells/Mid-scale Wishart asymptotics ď n l l C n,p ż S p p R q ˇˇˇˇ I p ` T n ˇˇˇˇ ´ p n ´ l q` p ` dT ď n l C n,p l C n ´ l,p ˆ nn ´ l ˙ p p p ` q ż S p p R q C n ´ l,p ˇˇˇˇ I p ` T n ´ l ˇˇˇˇ ´ p n ´ l q` p ` dT. (4.17)When n ´ l ě p ´
2, the last integrand is the density of a T n { ´ l p I p { q distribution, so integrates to unity. Thus, when n ě p ` k ´
2, the righthand side of Equation (4.16) is an integrable function for all 1 ď l ď k and1 ď i , . . . , i l ď p . By Equation (4.15), and repeated differentiation under theintegral sign justified by the integrability bounds given by Equations (4.16) and(4.17), we find that H p X q “ p´ q k n k p ÿ i ,...,i k B s B s X i i k . . . B s B s X i i B s B s X i i ż S p p R q e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT “ ż S p p R q tr T k e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT (4.18)and H p X q “ p´ q k n k p ÿ i ,...,i k j ,...,j k B s B s X j j k . . . B s B s X j j B s B s X i i k . . . B s B s X i i B s B s X i i ż S p p R q e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT “ ż S p p R q tr T k e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT. (4.19)for any X P S p p R q and any n ě p ` k ´ H and H to the definition of f { ‹ ` f { ˝ R ˘ as a convolution.This is where restricting H and H to small positive-definite matrices becomesuseful. By Equation (3.10), the expression is f { ‹ ` f { ˝ R ˘ p? nX q “ ż S p p R q f { p Z q f { p Z ´ ? nX q dZ “ n np np Γ p ` n ˘ ż Y ` X ą , Y ą exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m exp ! ´ n Y )ˇˇ Y ˇˇ m dY using the change of variables Y “ I p ` Z {? n ´ X with dZ “ n p p p ` q dY . For h´etelat and Wells/Mid-scale Wishart asymptotics X ą
0, we have r Y ` X ą , Y ą s “ r Y ą s , and thus H , H satisfy H p X q “ p´ q k n k p ÿ i ,...,i k B s B s X i i k . . . B s B s X i i B s B s X i i ż Y ą exp ! ´ n p Y ` X q ) ¨ ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY (4.20)and H p X q “ p´ q k n k p ÿ i ,...,i k j ,...,j k B s B s X j j k . . . B s B s X j j B s B s X j j B s B s X i i k . . . B s B s X i i B s B s X i i ż Y ą exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m ¨ n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY. (4.21)We would now like to interchange the integral and differentiation signs. To doso, we must understand what the repeated derivatives of exp t´ n tr Y ) | Y | m look like. Differentiating once, we see that: B s B s X i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m “ „ m p Y ` X q ´ i i ´ n p I p q i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m . Differentiating twice, we see that: B s B s X i i B s B s X i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m “ „ ´ m p Y ` X q ´ i i p Y ` X q ´ i i ´ m p Y ` X q ´ i i p Y ` X q ´ i i ` m p Y ` X q ´ i i p Y ` X q ´ i i ´ mn p Y ` X q ´ i i p I p q ´ i i ´ mn p I p q ´ i i p Y ` X q ´ i i ` n p I p q ´ i i p I p q ´ i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m . So in general, it is clear that the repeated derivatives are given by some poly-nomial in entries of p Y ` X q ´ , times exp t´ n tr p Y ` X q ) | Y ` X | m . We won’tinvestigate further the nature of these polynomials beyond remarking that forany indices 1 ď l ď k and 1 ď i , . . . , i l ď p , and any symmetric matrices X, Y P S p p R q , we must have some crude bound ˇˇˇˇ B s B s X i l i l ´ . . . B s B s X i i B s B s X i i exp " ´ n p Y ` X q *ˇˇ Y ` X ˇˇ m ˇˇˇˇ h´etelat and Wells/Mid-scale Wishart asymptotics ď l ÿ s “ ÿ J Pt ,...,p u l ˇˇˇ a J,s p n, m q ˇˇˇ l ź t “ s ` ˇˇ p I p q j t j t ´ ˇˇ s ź t “ ˇˇ p Y ` X q ´ j t j t ´ ˇˇ ¨ exp " ´ n p Y ` X q *ˇˇ Y ` X ˇˇ m for some polynomials a J,s that do not depend on X or Y . We relegate a proofof this result as Lemma 3 in Section 8. 
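The first-derivative identity used above can be verified by finite differences in the $2 \times 2$ case. This is a minimal sketch under two assumptions: the function being differentiated is $\exp\{-\frac{n}{4}\mathrm{tr}\,Z\}\,|Z|^{m/4}$ with $Z = Y + X$, and the symmetric differentiation operator is $\frac{1+\delta_{ij}}{2}\frac{\partial}{\partial X_{ij}}$, where an off-diagonal entry and its mirror image move together. All names and numerical values are hypothetical.

```python
import math

def f(Z, n, m):
    """exp{-(n/4) tr Z} |Z|^(m/4) for a 2x2 symmetric positive-definite Z."""
    det = Z[0][0] * Z[1][1] - Z[0][1] * Z[1][0]
    return math.exp(-n / 4 * (Z[0][0] + Z[1][1])) * det ** (m / 4)

def inverse(Z):
    """Inverse of a 2x2 matrix."""
    det = Z[0][0] * Z[1][1] - Z[0][1] * Z[1][0]
    return [[Z[1][1] / det, -Z[0][1] / det], [-Z[1][0] / det, Z[0][0] / det]]

def symmetric_derivative(Z, i, j, n, m, h=1e-6):
    """(1 + delta_ij)/2 * d/dX_ij by central differences, keeping the matrix symmetric."""
    def shifted(step):
        W = [row[:] for row in Z]
        W[i][j] += step
        if i != j:
            W[j][i] += step
        return W
    raw = (f(shifted(h), n, m) - f(shifted(-h), n, m)) / (2 * h)
    return raw * (1.0 if i == j else 0.5)

def formula(Z, i, j, n, m):
    """[(m/4) (Z^-1)_ij - (n/4) delta_ij] * f(Z)."""
    return (m / 4 * inverse(Z)[i][j] - (n / 4 if i == j else 0.0)) * f(Z, n, m)

n, p = 10, 2
m = n - p - 1
Z = [[1.5, 0.3], [0.3, 0.8]]  # plays the role of Y + X
gaps = [abs(symmetric_derivative(Z, i, j, n, m) - formula(Z, i, j, n, m))
        for i in range(2) for j in range(2)]
print(max(gaps))
```

Higher derivatives follow the same pattern, each differentiation producing further entries of $(Y+X)^{-1}$, which is why the repeated derivatives are polynomials in those entries times the original function.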
This can be uniformly bounded for all0 ď X ď I p by ď C p n, m, p q l ÿ s “ tr s p Y ´ q exp ! ´ n Y )“ ` tr Y ‰ mp (4.22)for some constant C p n, m, p q that does not depend on X or Y . But for any n ě p ´ l ě ż Y ą C p n, m, p q l ÿ s “ tr s p Y ´ q exp ! ´ n Y )“ ` tr Y ‰ mp ¨ n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY “ n np C p n, m, p q np Γ p ` n ˘ l ÿ s “ ż Y ą “ ` tr Y ‰ mp tr s p Y ´ q exp ! ´ n Y )ˇˇ Y ˇˇ n ` p ` ´ p ` dY “ ´ n ¯ mp Γ p ` n ` p ` ˘ Γ p ` n ˘ E ” p ` tr Y q mp tr s p Y ´ q ı for a Y with a matrix gamma distribution G p ` n ` p ` , n I p ˘ . The Cauchy-Schwarzinequality then entails the bound ď ´ n ¯ mp Γ p ` n ` p ` ˘ Γ p ` n ˘ E ” p ` tr Y q mp ı E ” tr s p Y ´ q ı . (4.23)The first expectation is always finite when n ě p ´
2. Since tr s p Y ´ q can bewritten as a sum of zonal polynomials indexed by partitions of the integer 2 s , theresults of Muirhead [1982, Theorem 7.2.13] imply that the second expectationis finite whenever n ` p ` ą s ` p ´ ô n ě p ` s ´
2. Thus, in Equation (4.20)with l ď k and (4.21) with l ď k , whenever n ě p ` k ´ H p X q “ p´ q k n k ż Y ą p ÿ i ,...,i k B s B s X i i k . . . B s B s X i i B s B s X i i (4.24) h´etelat and Wells/Mid-scale Wishart asymptotics exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY (4.25)and H p X q “ p´ q k n k ż Y ą p ÿ i ,...,i k j ,...,j k B s B s X j j k . . . B s B s X j j B s B s X j j B s B s X i i k . . . B s B s X i i B s B s X i i exp ! ´ n p Y ` X q )ˇˇ Y ` X ˇˇ m ¨ n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY. (4.26)Let us now look at how H p X q and H p X q behave as X Ñ
0. On one hand,for any symmetric matrix T we have | tr T k | ď a p tr T k ď ? p | I p ` T | k { , sowe must have the bounds ˇˇˇˇ tr T k e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇˇˇˇˇ ď n k C n,p k C n ´ k,p C n ´ k,p ˇˇˇˇ I p ` T n ˇˇˇˇ ´ p n ´ k q` p ` and ˇˇˇˇ tr T k e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇˇˇˇˇ ď pn k C n,p k C n ´ k,p C n ´ k,p ˇˇˇˇ I p ` T n ˇˇˇˇ ´ p n ´ k q` p ` . holding uniformly in X . When n ´ k ě p ´ ô n ě p ` k ´
2, the right handsides are proportional to the density of the G-conjugates of the normalizedWishart distributions with n ´ k degrees of freedom, so are integrable. Thus,by the dominated convergence theorem and Equations (4.18) and (4.19),lim X Ñ ă X ă I p H p X q “ ż S p p R q tr T k lim X Ñ ă X ă I p e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT “ E “ tr T k ‰ (4.27)andlim X Ñ ă X ă I p H p X q “ ż S p p R q tr T k lim X Ñ ă X ă I p e i ? n tr p T X q ˇˇ ψ NW p T q ˇˇ dT “ E “ tr T k ‰ (4.28)for T „ T n { p I { q .On the other hand, the integrands at Equations (4.25) and (4.26) take aparticularly simple form. Lemma 4 establishes by induction that there must bepolynomials b p q κ and b p q κ in n , m and p with degrees deg b p q κ ď k ` ´ q p κ q h´etelat and Wells/Mid-scale Wishart asymptotics and deg b p q κ ď k ` ´ q p κ q such that H p X q “ p´ q k n k ż Y ą ÿ | κ |ď k b p q κ p n, m, p q r κ pr Y ` X s ´ q exp " ´ n p Y ` X q * ¨ ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY (4.29)and H p X q “ p´ q k n k ż Y ą ÿ | κ |ď k ` b p q κ p n, m, p q r κ pr Y ` X s ´ q exp " ´ n p Y ` X q * ¨ ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY (4.30)for any 0 ă X ă I p and n ě p ` k ´
2. The sums are taken over all partitionsof the integers κ satisfying | κ | ď k and | κ | ď k ` κ , the bound r κ pr Y ` X s ´ q e ´ n tr p Y ` X q ˇˇ Y ` X ˇˇ m ď tr | κ | p Y ´ q e ´ n tr Y “ ` tr Y ‰ mp holds uniformly in 0 ď X ď I p . Thus for | κ | ď k `
1, the right hand side isintegrable for n ě p ` k `
6, by the same argument as for Equation (4.23).Thus for such n , by the dominated convergence theorem and Equations (4.29)and (4.30), we obtain thatlim X Ñ ă X ă I p H p X q “ p´ q k n k ż Y ą lim X Ñ ă X ă I p ÿ | κ |ď k b p q κ p n, m, p q r κ pr Y ` X s ´ q¨ exp " ´ n p Y ` X q *ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY “ p´ q k n k ÿ | κ |ď k b p q κ p n, m, p q E “ r κ p Y ´ q ‰ (4.31)andlim X Ñ ă X ă I p H p X q “ p´ q k n k ż Y ą lim X Ñ ă X ă I p ÿ | κ |ď k ` b p q κ p n, m, p q r κ pr Y ` X s ´ q¨ exp " ´ n p Y ` X q *ˇˇ Y ` X ˇˇ m n np np Γ p ` n ˘ exp ! ´ n Y )ˇˇ Y ˇˇ m dY “ p´ q k n k ÿ | κ |ď k ` b p q κ p n, m, p q E “ r κ p Y ´ q ‰ , (4.32) h´etelat and Wells/Mid-scale Wishart asymptotics where Y follows a W p p n, I p { n q distribution. Combining Equations (4.27)–(4.28)with Equations (4.31)–(4.32) and Lemma 4 concludes the proof.Something remarkable about Lemma 2 is that it provides us with an algo-rithm to compute the moments of a symmetric t distribution in terms of themoments of an inverse Wishart matrix. For example, when k “
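1, repeated differentiation yields that
\[
\sum_{i_1,i_2=1}^{p} \frac{\partial_s}{\partial_s X_{i_1 i_2}} \frac{\partial_s}{\partial_s X_{i_2 i_1}} \exp\left\{-\frac{n}{4}\,\mathrm{tr}(Y+X)\right\} \left|Y+X\right|^{\frac{m}{4}}
= \left[ \frac{m(m-2)}{16}\,\mathrm{tr}\,(Y+X)^{-2} - \frac{mn}{8}\,\mathrm{tr}\,(Y+X)^{-1} + \frac{n^2 p}{16} - \frac{m}{8}\,\mathrm{tr}^2 (Y+X)^{-1} \right] \exp\left\{-\frac{n}{4}\,\mathrm{tr}(Y+X)\right\} \left|Y+X\right|^{\frac{m}{4}}. \tag{4.33}
\]
We can recognize $\mathrm{tr}\,(Y+X)^{-2}$ and $\mathrm{tr}^2 (Y+X)^{-1}$ as the power sum polynomials $r_{(2)}([Y+X]^{-1})$ and $r_{(1,1)}([Y+X]^{-1})$ in the sense of Equation (4.12), so we must have $b^{(1)}_{(2)} = m(m-2)/16$, $b^{(1)}_{(1,1)} = -m/8$, $b^{(1)}_{(1)} = -mn/8$ and $b^{(1)}_{\emptyset} = n^2 p/16$ in the result of Lemma 4. Hence Lemma 2 really tells us that, whenever $n$ is large enough for the lemma to apply, $\mathrm{E}\left[\mathrm{tr}\,T^2\right]$ for $T \sim T_{n/2}(I_p/8)$ can be expressed as
\[
\mathrm{E}\left[\mathrm{tr}\,T^2\right] = -\frac{m(m-2)}{16\,n}\,\mathrm{E}\left[\mathrm{tr}\,Y^{-2}\right] + \frac{m}{8\,n}\,\mathrm{E}\left[\mathrm{tr}^2\,Y^{-1}\right] + \frac{m}{8}\,\mathrm{E}\left[\mathrm{tr}\,Y^{-1}\right] - \frac{np}{16} \tag{4.34}
\]
where $Y \sim W_p(n, I_p/n)$.

Of course, this also works with square moments and higher $k$. For example, the same strategy for, say, square moments with $k = 1$ yields that whenever n ≥ p +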
38, E “ tr T ‰ for T „ T n { p I p { q can be expressed asE “ tr T ‰ “ ´ m ` m ´ m ` ˘ n E “ tr Y ´ ‰ ` m p m ´ q n E “ tr Y ´ tr Y ´ ‰ ` m ` m ´ m ` m ´ ˘ n E “ tr Y ´ ‰ ` m n E “ tr Y ´ ‰ ´ m ` m ´ m ` ˘ n E “ tr Y ´ tr Y ´ ‰ ` m p m ´ q n E “ tr Y ´ ‰ ´ m ` m ´ m ` ˘ n E “ tr Y ´ tr Y ´ ‰ ` m n E “ tr Y ´ ‰ ` m p mp ´ p ´ q
128 E “ tr Y ´ ‰ ´ m p´ m ` p q
64 E “ tr Y ´ ‰ ´ mnp
64 E “ tr Y ´ ‰ ` n p
256 (4.35)again where Y „ W p p n, I p { n q .Unfortunately, as we consider larger orders, the repeated differentiation ofexp t´ n tr Z u| Z | m { quickly becomes too cumbersome to perform by hand. But h´etelat and Wells/Mid-scale Wishart asymptotics at least in theory, we can compute expressions like (4.34) and (4.35) for anyE “ tr T k ‰ and E “ tr T k ‰ , and Lemma 2 summarize that fact. That is, using theFourier inversion theorem we have reduced the problem of computing momentsof the t -distribution T n { p I p { q to that of computing expected power sum poly-nomials of the inverse Wishart distribution W ´ p p n, I p { n q , for large enough n .How can we compute expected power sum polynomials of an inverse Wishart?There are two approaches in the literature. Letac and Massam [2004] found anexpression in terms of a different basis, the zonal polynomials, which behaveparticularly nicely with respect to the inverse Wishart distribution, and whoseexpectations have a simple closed form. From this, they provided an algorithmfor computing expected power sum polynomials to arbitrary order. Matsumoto[2012] found expressions of coordinate-wise moments in terms of modified Wien-garten orthogonal functions, from which expectations of power sum polynomialscan be computed. We follow the idea of Letac and Massam [2004] in our asymp-totic analysis.For any integer partition κ , there exist coefficients c κ,λ (which depend solelyon κ and λ ) such that r κ p Y ´ q “ ÿ | λ |“| κ | c λ C λ p Y ´ q for C λ the so-called zonal polynomials. For an overview of the topic with a focuson random matrix theory, see Muirhead [1982, Chapter 7]. The coefficients c κ,λ are explicitly computable. If we follow the normalization of zonal polynomialsof Muirhead [1982], for example, we find that “ r ∅ ‰ “ “ C ∅ ‰ , and „ r p q r p , q “ „ ´ „ C p q C p , q . 
“ r p q ‰ “ “ C p q ‰ , (4.36)As mentioned, expectations of zonal polynomials with respect to a Wishart orinverse Wishart distribution take a particularly simple form. From Muirhead[1982, Theorem 7.2.13 and Equation (18) on p.237], the expected zonal polyno-mials for Y ´ „ W ´ p p n, I p { n q areE “ C λ p Y ´ q ‰ “ n | λ | | λ | ś q p λ q i “ m ´ i ` C λ p I p q“ | λ | | λ | ! ś q p λ q i ă j p λ i ´ λ j ´ i ` j q ś q p λ q i “ p λ i ` q p λ q ´ i q ! n | λ | q p λ q ź i “ λ i ´ ź l “ p ` p ´ i ` l q m ´ p ´ i ` l q (4.37)for λ ‰ ∅ , and E “ C ∅ p Y ´ q ‰ “
1. For example, the first few expected zonalpolynomials areE “ C ∅ p Y ´ q ‰ “ , E “ C p q p Y ´ q ‰ “ npm , E “ C p , q p Y ´ q ‰ “ n p p p ´ q m p m ` q , E “ C p q p Y ´ q ‰ “ n p p p ` q m p m ´ q . From this, we can exactly compute E “ r κ p Y ´ q ‰ and thus E “ tr T k ‰ and E “ tr T k ‰ , h´etelat and Wells/Mid-scale Wishart asymptotics as a function of p and n (or m ). For example, by Equation (4.36) we find thatE “ r ∅ p Y ´ q ‰ “ , E “ r p q p Y ´ q ‰ “ npm , E “ r p , q p Y ´ q ‰ “ n p p mp ´ p ` q m p m ´ q p m ` q , E “ r p q p Y ´ q ‰ “ n p p m ` p q m p m ´ q p m ` q . and thus, by Equation (4.34), whenever n ě p ` “ tr T ‰ “ np p mp ` m ` q p m ´ q p m ` q . (4.38)In a similar way, we can compute the expected zonal polynomials and hence,the expected power sum polynomials of Y ´ for | κ | “ ,
4. So from Equation(4.35), we obtain for n ě p `
38 thatE “ tr T ‰ “ n p p m ´ q p m ´ q p m ´ q p m ` q p m ` q ´ m p ` m p ` m p ` m ´ m p ` m p ` m p ` m ´ mp ´ mp ` mp ´ p ¯ . (4.39)Of course, this reasoning also works for other k ’s. In particular, we essen-tially derived a (potentially inefficient) algorithm to compute the moments of a T n { p I p { q distribution to arbitrary order on our path to proving this theorem.At this point, it is worthwhile to realize that Equations (4.38) and (4.39) arealready enough to prove the theorem for small moments. For example, when n Ñ 8 such that p { n Ñ c P r , q , then m „ p ´ c q n andE “ tr T ‰ “ m p m ´ q p m ` q ˆ np m ` np m ` n m ˙ „ p ´ c q p , which proves that E “ tr T ‰ “ O p p q . In fact,E „ p tr ´ T ? p ¯ ´ “ m p m ´ q p m ` q ˆ p ` pm ` m ` mp ` m ` m p ˙ „ p ` pm Ñ . (4.40)Moreover, when n, p Ñ 8 such that p { n Ñ
0, then m „ n andE „ˆ p tr ´ T ? p ¯ ´ ˙ “ p E “ tr T ‰ ´ p E “ tr T ‰ ` “ m p m ´ q p m ´ q p m ´ q p m ` q p m ` q ˆ p ` p ` m ` mp ` mp ` mp ` p m ` pm ` m ` m p ` m p ` m p ´ p m ` pm ` m ` m p ` m p ` m p ´ p m ´ pm h´etelat and Wells/Mid-scale Wishart asymptotics ´ m ´ m p ´ m p ´ m ´ m p ´ m p ˙ „ p ` m ` p m Ñ . (4.41)Thus 1 p tr ´ T ? p ¯ ÝÑ “ C and the theorem is proven for the second moment.In theory, we could proceed in the same way for any moment of interest, butnaturally we could never conclude that the theorem holds for all moments thatway. Nonetheless, the calculations give us some hints about how to argue in thegeneral case.The idea is to express the moments of the symmetric t distribution as poly-nomials of p and p { m . There are two regimes where random matrix theory iswell understood: the classical regime where p is held fixed as n Ñ 8 , and thelinear, high-dimensional regime where p grows linearly with n . From this, wecan therefore conclude a few facts regarding the behavior of symmetric t mo-ments in these regimes. But these moments are polynomials, and a polynomialis a very rigid object: results from the two extreme cases where p is fixed and p grows linearly will be enough to prove results for every regime in between,yielding the first part of the theorem. Proving the second part will then be thesimple matter of applying the GOE approximation of Jiang and Li [2015] andBubeck et al. [2016] to the specific shape found for the symmetric t momentswhile proving the first part, namely Equations (4.54) and (4.55). Proof of Theorem 2.
Recall the expected zonal polynomial of an inverse WishartW ´ p p n, I p { n q is given by Equation (4.37). Based on the previous calculations,it is tempting to define c λ “ | λ | | λ | ! ś q p λ q i ă j p λ i ´ λ j ´ i ` j q ś q p λ q i “ p λ i ` q p λ q ´ i q ! , R λ p m q “ q p λ q ź i “ λ i ´ ź l “ mm ´ p ´ i ` l q and P λ p m, p q “ q p λ q ź i “ λ i ´ ź l “ ˆ pm ` ´ i ` lm ˙ (4.42)so that E “ C λ p Y ´ q ‰ “ c λ n | λ | R λ p m q P λ p m, p q . With these expressions the expected power sum polynomials can be written asE “ r κ p Y ´ q ‰ “ p ÿ | λ |“| κ | c κ,λ c λ n | κ | ˜ ź | µ |“| κ | R µ p m q ź | µ |“| κ | µ “ λ R ´ µ p m q ¸ P λ p m, p q“ n | κ | m | κ | ¨ ź | µ |“| κ | R µ p m q ¨ m | κ | ÿ | λ |“| κ | c κ,λ c λ ź | µ |“| κ | µ “ λ R ´ µ p m q P λ p m, p q . h´etelat and Wells/Mid-scale Wishart asymptotics In other words, if we define R µ | “ ź | µ |“| κ | R µ p m q ,P λ | p m, p q “ m | κ | ÿ | λ |“| κ | c κ,λ c λ ź | µ |“| κ | µ “ λ R ´ µ p m q P λ p m, p q (4.43)then E “ r κ p Y ´ q ‰ “ n | κ | m | κ | R κ | p m q P κ | p m, p q . (4.44)But R ´ µ p m q “ ś q p λ q i “ ś λ i ´ l “ ` ´ ´ i ` lm ˘ is a polynomial in 1 { m , while P λ p m, p q “ ś q p λ q i “ ś λ i ´ i “ ` pm ` ´ i ` lm ˘ is a polynomial in p { m and 1 { m , both of degree atmost | µ | “ | λ | “ | κ | . Thus P κ | p m, p q ” m | κ | ÿ | λ |“| κ | c κ,λ c λ ź | µ |“| κ | µ “ λ R ´ µ p m q P λ p m, p q“ m | κ | | κ | ÿ i “ | κ | ÿ j “ b ij ´ pm ¯ i m j (4.45)for some coefficients b ij that don’t depend on m , p (or n ). Define the polynomials f j p α q “ ř | κ | i “ b ij α i , so that P κ | p m, p q “ m | κ | | κ | ÿ j “ f ´ pm ¯ m ´ j . (4.46)Let us show that for all 0 ď j ă | κ |´ q p κ q , the polynomial f j must be identicallyzero over the interval α P ` , { max p| κ | ´ , q ˘ . Indeed, say this was not thecase, and let 0 ď j ă | κ | ´ q p κ q be the smallest j with the property that f j p α q ‰ α P ` , p| κ |´ , q ˘ . 
As f j is a polynomial, by continuityit must be non-zero in a neighborhood of α , so we may as well assume α isrational without loss of generality. Now look at what happens to E “ r κ p Y ´ q ‰ as p grows to infinity at the very specific linear rate p “ t α ` α p n ´ q u . Since α isrational, there must be a subsequence n l such that p l is exactly an integer (forexample, if α “ a { b with a , b integers, we can take n l “ p a ` b q l ` p l “ α ` α p n l ´ q , we have exactly p l “ α m l .Since α ă p| κ |´ , q , then | κ | ă ` ` α ` α ˘ ´ . Thus by H¨older’s inequalityand Lemma 5,lim l Ñ8 m | κ |´ q p κ q´ j l ¨ p q p κ q l E “ r κ p Y ´ q ‰ ď ¨ lim l Ñ8 p l E ” tr Y ´| κ | ı “ . On the other hand, by Equations (4.44) and (4.46), the definition of j and thefact that R | κ | p m q Ñ m Ñ 8 ,lim l Ñ8 m | κ |´ q p κ q´ j l ¨ p q p κ q l E “ r κ p Y ´ q ‰ h´etelat and Wells/Mid-scale Wishart asymptotics “ lim l Ñ8 ˆ n l m l ˙ | κ | R κ | p m l q ˆ m l p l ˙ q p κ q | κ | ÿ j “ j f j p α q m j ´ jl “ p ` α q | κ | α ´ q p κ q f j p α q . As α ą f j p α q must therefore equal zero, a contradiction. Hence, as claimed,the polynomials f j p α q for 0 ď j ă | κ | ´ q p κ q all vanish over the interval ` , p| κ |´ , q ˘ .But a polynomial can have an infinite number of zeros only if all its coefficientsare zero, so we conclude that b ij “ ď j ă | κ | ´ q p κ q . Thus, from Equations (4.44) and (4.45) we haveE “ r κ p Y ´ q ‰ “ ´ nm ¯ | κ | m q p κ q R κ | p m q P κ p m, p q where P κ p m, p q “ | κ | ÿ i “ | κ | ÿ j “| κ |´ q p κ q b ij ´ pm ¯ i m j ´| κ |` q p κ q . 
Going back to equations (4.13) and (4.14) and plugging in the above yieldsthat as long as n ě p ` k ` “ tr T k ‰ “ m k ` n k R k p m q Q p q κ p m, p q , (4.47)E “ tr T k ‰ “ m k ` n k R k ` p m q Q p q κ p m, p q (4.48)where Q p q κ p m, p q “ p´ q k ÿ | κ |ď k ´ ` pm ` m ¯ | κ | R κ | p m q R k p m q b p q κ p n, m, p q m k ` ´ q p κ q P κ p m, p q ,Q p q κ p m, p q “ p´ q k ÿ | κ |ď k ` ´ ` pm ` m ¯ | κ | R κ | p m q R k ` p m q b p q κ p n, m, p q m k ` ´ q p κ q P κ p m, p q . Now, for any a ď b , we can associate a partition µ of norm | µ | “ a with thepartition µ ˚ “ p µ ` b ´ a, µ ` , . . . , µ q p µ q q of norm | µ ˚ | “ b , which satisfies q p µ ˚ q ź i “ µ ˚ i ´ ź j “ ˆ ´ ´ i ` jm ˙ “ q p µ q ź i “ µ i ´ ź j “ ˆ ´ ´ i ` jm ˙ µ ` b ´ a ´ ź j “ µ ˆ ´ jm ˙ . By definition for the R µ p m q ’s at Equation (4.42), this means that every factorthat appears in R ´ µ p m q appears in R ´ µ ˚ p m q , so by definition of the R | µ | p m q ’sat Equation (4.43), R a p m q R ´ p m q is a polynomial in m . Moreover, as b p q κ and b p q κ are polynomials of degrees d p κ q ” k ` ´ q p κ q and d p κ q ” k ` ´ q p κ q h´etelat and Wells/Mid-scale Wishart asymptotics respectively, there exists coefficients c p q ijl and c p q ijl such that b p q κ p n, m, p q m k ` ´ q p κ q “ m d p κ q d p κ q ÿ i “ d p κ q´ i ÿ j “ d p κ q´ i ´ j ÿ l “ c p q ijl m i n j p l “ d p κ q ÿ i “ d p κ q´ i ÿ j “ d p κ q´ i ´ j ÿ l “ c p q ijl m d p κ q´ i ´ j ´ l ´ ` pm ` m ¯ j ´ pm ¯ l and b p q κ p n, m, p q m k ` ´ q p κ q “ m d p κ q d p κ q ÿ i “ d p κ q´ i ÿ j “ d p κ q´ i ´ j ÿ l “ c p q ijl m i n j p l “ d p κ q ÿ i “ d p κ q´ i ÿ j “ d p κ q´ i ´ j ÿ l “ c p q ijl m d p κ q´ i ´ j ´ l ´ ` pm ` m ¯ j ´ pm ¯ l . As d p´ i ´ j ´ l ě q , j, l ě d p κ q ´ i ´ j ´ l, j, l ě pm and m . Therefore, looking back at (4.47) and (4.48), we conclude thatthe Q p q κ p m, p q ’s and Q p q κ p m, p q ’s are polynomials in pm and m . 
Therefore, if n ě p ` k ` a p q ij , a p q ij and large enough integers D , D such thatE “ tr T k ‰ “ m k ` n k R k p m q D ÿ i “ i ÿ j “ a p q ij p j m i “ ´ mn ¯ k R k p m q D ÿ i “ g p q i p p q m k ` ´ i (4.49)E “ tr T k ‰ “ m k ` n k R k ` p m q D ÿ i “ i ÿ j “ a p q ij p j m i “ ´ mn ¯ k R k ` p m q D ÿ i “ g p q i p p q m k ` ´ i (4.50)for polynomials g p q i p p q “ i ř j “ a p q ij p j and g p q i p p q “ i ř j “ a p q ij p j .We will now proceed to show that g p q i and g p q i must vanish on N for 0 ď i ă k ` ď i ă k ` T in the classical regime where p is held fixed while n grows to infinity.Observe first that E “ tr T k ‰ and E “ tr T k ‰ must have a finite limit as n Ñ 8 with p held fixed. Indeed, since 16 T { n is positive definite, | I p ` T { n | is h´etelat and Wells/Mid-scale Wishart asymptotics greater than one and so we have the boundE “ tr T k ‰ “ C n,p ż S p p R q tr T k ˇˇˇˇ I p ` T n ˇˇˇˇ ´ n ` p ` dT ď C n,p ż S p p R q tr T k ˇˇˇˇ I p ` T n ˇˇˇˇ ´ n dT. When p is held fixed, lim n Ñ8 C n,p “ p p p ` q { π p p p ` q by Lemma 1. Moreover, ˇˇˇˇ I p ` T n ˇˇˇˇ ´ n “ p ź i “ ˆ ` λ i p T q n { ˙ ´ n for λ p T q ě ¨ ¨ ¨ ě λ p p T q ě T , and p ` x { n q ´ n ismonotone decreasing towards exp p x q . Therefore, for a fixed dimension p we canapply the monotone convergence theorem to obtain thatlim n Ñ8 E “ tr T k ‰ ď p p p ` q π p p p ` q ż S p p R q tr T k e ´ T dT “ E “ tr Z k ‰ ă 8 (4.51)for Z „ GOE p p q{
4. Repeating the argument with tr T k yields similarlylim n Ñ8 E “ tr T k ‰ ď p p p ` q π p p p ` q ż S p p R q tr T k e ´ T dT “ E “ tr Z k ‰ ă 8 . (4.52)Thus indeed E “ tr T k ‰ and E “ tr T k ‰ have finite limits when p is held fixed.We can use this to show that g p q i and g p q i must vanish on N for 0 ď i ă k ` ď i ă k ` ď i ă k ` i such that for some p P N , g p q i p p q ‰
0. Then by Equation (4.49) and the definition of $i_0$, the limit of $E[\operatorname{tr} T^{2k}]$ as $n \to \infty$ with $p$ fixed at $p_0$ satisfies
$$\lim_{n\to\infty} \frac{E[\operatorname{tr} T^{2k}]}{m^{k+1-i_0}} = 1^k \cdot 1 \cdot \lim_{n\to\infty} \sum_{i=i_0}^{D_1} \frac{g^{(1)}_i(p_0)}{m^{i-i_0}} = g^{(1)}_{i_0}(p_0).$$
But $m = n - p_0 - 1$ tends to infinity as $n$ tends to infinity, and since $k+1-i_0 > 0$ while the limit in Equation (4.51) is finite, $E[\operatorname{tr} T^{2k}]/m^{k+1-i_0}$ must tend to zero. Thus $g^{(1)}_{i_0}(p_0)$ has to equal zero, which contradicts our assumption. Thus for every $0 \le i < k+1$, the polynomial $g^{(1)}_i$ must vanish on $\mathbb{N}$.

Similarly, for $0 \le i < k+2$, the polynomial $g^{(2)}_i$ must vanish on $\mathbb{N}$, because if this were not the case, we could take $0 \le i_0 < k+2$ to be the smallest $i$ with the property that for some $p_0 \in \mathbb{N}$, $g^{(2)}_{i_0}(p_0) \neq 0$, and then by Equation (4.50) with $p$ fixed at $p_0$ as $n \to \infty$ we would get
$$\lim_{n\to\infty} \frac{E[\operatorname{tr}^2 T^{k}]}{m^{k+2-i_0}} = 1^k \cdot 1 \cdot \lim_{n\to\infty} \sum_{i=i_0}^{D_2} \frac{g^{(2)}_i(p_0)}{m^{i-i_0}} = g^{(2)}_{i_0}(p_0).$$
But then by Equation (4.52), as $m$ tends to infinity and $k+2-i_0 \ge 1$, $E[\operatorname{tr}^2 T^{k}]/m^{k+2-i_0}$ must tend to zero. Thus we must have $g^{(2)}_{i_0}(p_0) = 0$, a contradiction. Hence indeed for $0 \le i < k+2$, the polynomial $g^{(2)}_i$ must vanish on $\mathbb{N}$.

But of course, a polynomial can only have an infinite number of zeroes if its coefficients are all zero, so we must have
$$a^{(1)}_{ij} = 0 \ \text{ for } 0 \le i < k+1, \qquad a^{(2)}_{ij} = 0 \ \text{ for } 0 \le i < k+2. \tag{4.53}$$
Now say that $p$ varies with $n$ in such a way that $\lim_{n\to\infty} p/n = \alpha <$
1. Thenfor large enough n , n ě p ` k ` n Ñ8 p k ` E “ tr T k ‰ “ lim n Ñ8 ´ mn ¯ k R k p m q D ÿ i “ k ` i ÿ j “ a p q ij p j ´p k ` q m i ´p k ` q “ p ´ α q k ¨ ¨ lim n Ñ8 « D ÿ i “ k ` k ÿ j “ a p q ij m i ´p k ` q p p k ` q´ j ` D ÿ i “ k ` i ÿ j “ k ` a p q ij ´ pm ¯ j ´p k ` q m i ´ j ff “ p ´ α q k « k ÿ j “ a p q k ` ` lim p Ñ8 p ˘ p k ` q´ j ` D ÿ i “ k ` a p q ii ˆ α ´ α ˙ i ´p k ` q ff (4.54)and by Equations (4.50) and (4.53),lim n Ñ8 p k ` E “ tr T k ‰ “ lim n Ñ8 ´ mn ¯ k R k ` p m q D ÿ i “ k ` i ÿ j “ a p q ij p j ´p k ` q m i ´p k ` q “ p ´ α q k ¨ ¨ lim n Ñ8 « D ÿ i “ k ` k ` ÿ j “ a p q ij m i ´p k ` q p p k ` q´ j ` D ÿ i “ k ` i ÿ j “ k ` a p q ij ´ pm ¯ j ´p k ` q m i ´ j ff “ p ´ α q k « k ` ÿ j “ a p q k ` ` lim p Ñ8 p ˘ p k ` q´ j ` D ÿ i “ k ` a p q ii ˆ α ´ α ˙ i ´p k ` q ff . (4.55)Although we might not know what a ij coefficients are, this shows at least thatthe limits are finite. In particular, from Equations (4.54) and (4.55) we canconclude that E “ tr T k ‰ “ O p p k ` q and E “ tr T k ‰ “ O p p k ` q , which shows thefirst claim of the theorem.For the second claim, let n, p Ñ 8 with p { n Ñ α “
0. Then Equations (4.54) and (4.55) specialize to
$$\lim_{n\to\infty} \frac{1}{p^{k+1}} E[\operatorname{tr} T^{2k}] = a^{(1)}_{(k+1)(k+1)}, \qquad \lim_{n\to\infty} \frac{1}{p^{k+2}} E[\operatorname{tr}^2 T^{k}] = a^{(2)}_{(k+2)(k+2)}. \tag{4.56}$$
What is interesting about this result is that these limits must be the same regardless of the way $p$ grows! As long as $p \to \infty$ with $p/n \to 0$, the limits are $a^{(1)}_{(k+1)(k+1)}$ and $a^{(2)}_{(k+2)(k+2)}$, regardless of whether $p \sim \log n$ or $p \sim \sqrt{n}$ or some other growth.

Now, Bubeck et al. [2016, Theorem 7] and Jiang and Li [2015, Theorem 1] have shown that when $p \to \infty$ with $p^3/n \to$
0, the total variation distancebetween a normalized Wishart ? n p W p p n, I p { n q ´ I p q matrix and a GaussianOrthogonal Ensemble GOE p p q matrix tends to zero as n Ñ 8 . Therefore, theHellinger distance satisfies also H p ψ NW , ψ GOE q “ H p f NW , f GOE q Ñ n Ñ8 .But convergence in Hellinger distance has strong implications for real-valuedstatistics. Indeed, for T „ T n { p I p { q “ | ψ NW | , T „ GOE p p q{ “ | ψ GOE | andany function g : S p p R q Ñ R such that g p T q , g p T q are square-integrable, ˇˇˇ E r g p T qs ´ E r g p T qs ˇˇˇ “ ˇˇˇˇ ż S p p R q g p T q ” | ψ NW |p T q ´ | ψ GOE |p T q ı dT ˇˇˇˇ ď ˇˇˇˇ ż S p p R q g p T q ψ { p T q ” ψ { p T q ´ ψ { p T q ı dT ˇˇˇˇ ` ˇˇˇˇ ż S p p R q g p T q ψ { p T q ” ψ { p T q ´ ψ { p T q ı dT ˇˇˇˇ ď „ ż S p p R q g p T q | ψ NW |p T q dT ` ż S p p R q g p T q | ψ GOE |p T q dT ¨ ż S p p R q ˇˇˇ ψ { p T q ´ ψ { p T q ˇˇˇ dT “ „ E “ g p T q ‰ ` E “ g p T q ‰ H p ψ NW , ψ GOE q (4.57)by the Cauchy-Schwarz inequality.Let’s consider applying this result to g p T q “ tr T k { p k ` and g p T q “ tr T k { p k ` .What do we know about these statistics? In the case where T „ GOE p p q{ n Ñ8 E „ tr T k p k ` “ C k k h´etelat and Wells/Mid-scale Wishart asymptotics lim n Ñ8 E „ tr T k p k ` “ lim n Ñ8 ´ E „ tr T k p k { ` ¯ “ ˆ C k { k r k even s ˙ lim n Ñ8 E „´ tr T k p k ` ¯ “ lim n Ñ8 ´ E „ tr T k p k ` ¯ “ ˆ C k k ˙ ă 8 lim n Ñ8 E „´ tr T k p k ` ¯ ď lim n Ñ8 E „ tr T k p k ` “ C k k ă 8 because these expressions only depend on p , and since p Ñ 8 as n Ñ 8 , takinga limit as n Ñ 8 is the same as taking a limit as p Ñ 8 . Moreover, in the T „ T n { p I p { q case, using Jensen’s inequality and Equations (4.54)–(4.55) wecan at least see thatlim n Ñ8 E „´ tr T k p k ` ¯ ď lim n Ñ8 E „ tr T k p k ` ă 8 lim n Ñ8 E „´ tr T k p k ` ¯ ď lim n Ñ8 E „ tr T k p k ` ă 8 . 
Therefore, using Equation (4.57) with g p T q “ tr T k { p k ` and g p T q “ tr T k { p k ` we find that when n, p Ñ 8 with p { n Ñ ˇˇˇˇ lim n Ñ8 E „ tr T k p k ` ´ C k k ˇˇˇˇ ď ˆ lim n Ñ8 E „´ tr T k p k ` ¯ ` C k k ˙ ¨ “ , ˇˇˇˇ lim n Ñ8 E „ tr T k p k ` ´ ´ C k { k r k even s ¯ ˇˇˇˇ ď ˆ lim n Ñ8 E „´ tr T k p k ` ¯ ` C { k k ˙ ¨ “ . Since p { n Ñ p { n Ñ
0, we conclude from Equation (4.56) that a p qp k ` qp k ` q “ C k k , a p qp k ` qp k ` q “ C k { k r k even s . But then, by that same equation, we conclude that when n, p
$\to \infty$, not only when $p^3/n \to 0$ but for any $p$ such that $p/n \to$
0, we havelim n Ñ8 p k ` E “ tr T k ‰ “ C k k , lim n Ñ8 p k ` E “ tr T k ‰ “ C k { k r k even s (4.58)for T „ T n { p I p { q . To finish the proof, use Equation (4.58) with the fact thatE “ tr T k ‰ “ k to find thatlim n Ñ8 Var „ tr T k p k { ` “ lim n Ñ8 E „ tr T k p k ` ´ E „ tr T k p k { ` “ C k { k r k even s ´ ´ C k { k r k even s ¯ “ . Thus tr T k { p k ` L ÝÑ C k { k , as desired. This proves the second claim andconcludes the proof. h´etelat and Wells/Mid-scale Wishart asymptotics A pleasant consequence of this result is that when n, p
$\to \infty$ with $p/n \to$
0, wecan conclude a version of the semicircle law holds for the T n { p I p q distribution.This is interesting because the T n { p I p q distribution has dependent entries withheavy tails, whose distribution varies with n, p .Let 4 T {? p „ n { p I p { q{? p , with eigenvalues λ p T {? p q ě ¨ ¨ ¨ ě λ p p T {? p q .Then define its empirical spectral measure to be L T {? p p A q “ p p ÿ i “ r λ i p T {? p q P A s . Since L T {? p depends on the random matrix T , it is a random measure on R . Corollary 1 (Semicircle law for the t distribution) . The empirical spectralmeasure L T {? p of a T n { p I p { q{? p random matrix converges weakly, in squaremean, to the semicircle distribution L p A q “ ż A ? ´ x π r| x | ď s dx. Proof.
Let f be any continuous function R Ñ R that vanishes at infinity. Bythe Stone-Weierstrass theorem, there exists a sequence f , f , . . . of polynomialssuch that for any (cid:15) ą
0, sup x P R | f p x q ´ f l p x q| ă (cid:15) . To fix some notation, write f l p x q “ deg f l ÿ k “ a lk x k . Then since L T {? p and L are both probability measures,E «ˆ ż R f p x q dL T {? p p x q ´ ż R f p x q dL p x q ˙ ff ď E «ˆ ż R “ f p x q ´ f l p x q ‰ dL T {? p p x q ˙ ff ` E «ˆ ż R f l p x q dL T {? p p x q ´ ż R f l p x q dL p x q ˙ ff ` E «ˆ ż R “ f l p x q ´ f p x q ‰ dL p x q ˙ ff ď (cid:15) ` deg f l ÿ k “ | a lk | E «ˆ p tr ´ T ? p ¯ k ´ C k { r k even s ˙ ff ` (cid:15). By Theorem 2, the expectation tends to zero as n, p
$\to \infty$ with $p/n \to$
0. Thuslim n Ñ8 E «ˆ ż R f p x q dL T {? p p x q ´ ż R f p x q dL p x q ˙ ff ď (cid:15). But this is true for every (cid:15) ą
0, so the limit must be zero. Hence for every con-tinuous f that vanishes at infinity, the integral ş f dL T {? p converges in square h´etelat and Wells/Mid-scale Wishart asymptotics mean to ş f dL . By Chung [2001, Theorem 4.4.1 and 4.4.2], this implies that forevery bounded continuous f , the integral ş f dL T {? p converges in square meanto ş f dL . Thus the empirical spectral distribution L T {? p converges weakly, insquare mean, to the semicircle distribution L , as desired.
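As an informal numerical illustration of the moment argument used in this proof, the following Python sketch checks that the normalized trace moments of a large symmetric Gaussian matrix are close to the semicircle moments (Catalan numbers for even orders, zero for odd orders). It is only a sketch under stated assumptions: it samples a GOE(p) matrix as $(X + X^t)/\sqrt{2}$ instead of a symmetric $t$ matrix, since no sampler for the latter is described here, and the helper names `catalan`, `sample_goe` and `normalized_trace_moments` are hypothetical.

```python
import numpy as np
from math import comb


def catalan(k):
    """k-th Catalan number, the 2k-th moment of the semicircle law on [-2, 2]."""
    return comb(2 * k, k) // (k + 1)


def sample_goe(p, rng):
    """Assumed GOE(p) sampler: (X + X^t)/sqrt(2) with X a p x p standard Gaussian matrix."""
    x = rng.standard_normal((p, p))
    return (x + x.T) / np.sqrt(2.0)


def normalized_trace_moments(p, max_k, rng):
    """Return [(1/p) tr (Z/sqrt(p))^k for k = 1..max_k] for one GOE(p) draw Z."""
    eigenvalues = np.linalg.eigvalsh(sample_goe(p, rng) / np.sqrt(p))
    return [float(np.mean(eigenvalues ** k)) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    moments = normalized_trace_moments(300, 4, rng)
    for k, value in enumerate(moments, start=1):
        target = catalan(k // 2) if k % 2 == 0 else 0
        print(f"k={k}: empirical moment {value:.4f}, semicircle moment {target}")
```

Working with eigenvalues avoids forming matrix powers explicitly; the same comparison could be run on draws from any ensemble for which a sampler is available.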
5. Wishart asymptotics: the G-transform point-of-view
We now turn our attention to the main objective of this paper, namely studying the behavior of Wishart matrices in the various middle-scale regimes. To do this, we exploit the close connection between the Wishart and the symmetric $t$ distributions and make use of the results found in Section 4. The main result of this section, Theorem 3, states that for every middle-scale regime we can approximate the G-transform $\psi_{NW}$ of a normalized Wishart by a degree-specific function $\psi_K$. This can be seen as an analogue of Theorem 1 in the G-transform domain.

The reasoning behind the approximations is as follows. We could imagine writing $\psi_{NW}$ from Proposition 3 in exponential form; expanding the terms as a Taylor series would then yield ψ NW p T q “ C n,p exp " i ? n tr T ´ n ` p `
12 log ˇˇˇˇ I p ` i T ? n ˇˇˇˇ* “ C n,p exp " i ? n tr T ` n ` p ` ÿ k “ p´ i q k k ´ pn ¯ k tr ˆ T ? p ˙ k * . Now imagine that the T ’s appearing in the expression follow a T n { p I p { q dis-tribution. By Theorem 2, we know that tr ` T ? p ˘ k “ Θ p p q when k is even, inan L sense. When k is odd, the theorem merely proves that tr ` T ? p ˘ k “ o p p q ,but for a GOE p p q matrix Z , we know that tr Z ? p k is asymptotically normalfor odd k by Anderson et al. [2010, Theorem 2.1.31]. This would suggest thattr ` T ? p ˘ k “ Θ p q when k is odd. Thus we would have, in some sense, ψ NW p T q “ C n,p exp " ÿ k “ Θ ˆ p k { ` n k { ´ ˙ ` ÿ k “ Θ ˆ p k { n k { ´ ˙ * “ C n,p exp " ÿ k “ Θ ˆ p k ` n k ˙ ` ÿ k “ Θ ˜c p p k ´ q` n k ´ ¸ * . In other words, terms in the power series would be associated with some degree K , such that they would be non-negligible in any middle-scale regime of degreeup to K , and negligible in higher degrees. In fact, a similar phenomenon occurswith C n,p , by Lemma 1. This suggests we should try truncating these powerseries to derive degree-specific approximations. Definition 3 (G-transform approximations) . For any K P N , define the K th h´etelat and Wells/Mid-scale Wishart asymptotics degree approximation ψ K : S p p R q Ñ C as ψ K p T q “ C p K q n,p exp $’&’% n K ` ` r K odd s ÿ k “ ´ i ? n ¯ k tr T k k ` p ` K ` ´ r K odd s ÿ k “ ´ i ? n ¯ k tr T k k ,/./- with C p K q n,p “ p p p ` q π p p p ` q exp " ´ K ` ÿ k “ r k even s k p k ` qp k ` q p k ` n k ´ K ` ÿ k “ ` r k even s k p k ` q p k ` n k * . Just like the G-transform of a normalized Wishart matrix, these functionsimplicitly depend on n . The first three are ψ p T q “ p p p ` q π p p p ` q exp " ´ T ´ i ? n tr T ` i p ` ? n tr T ´ p ` n tr T * ,ψ p T q “ p p p ` q π p p p ` q exp " ´ p n ´ T ´ i ? n tr T ` n tr T ` i n { tr T ´ n tr T ` i p ` ? n tr T ´ p ` n tr T ´ i p p ` q n { tr T * and ψ p T q “ p p p ` q π p p p ` q exp " ´ p n ´ p n ´ T ´ i ? 
n tr T ` n tr T ` i n { tr T ´ n tr T ´ i n { tr T ` i p ` ? n tr T ´ p ` n tr T ´ i p p ` q n { tr T ` p ` n tr T ` i p p ` q n { tr T ´ p p ` q n tr T * . These functions have the pleasant property that their modulus is bounded, upto a constant, by the G-conjugate density | ψ K | . Indeed, on one hand we canrewrite Definition 3 into ψ K p T q “ exp $’&’% log C p K q n,p ` n K ` ` r K odd s ÿ k “ p´ q k tr p T {? n q k k ` p ` K ` ´ r K odd s ÿ k “ p´ q k tr p T {? n q k k ´ i n K ` ÿ k “ p´ q k tr p T {? n q k ` k ` h´etelat and Wells/Mid-scale Wishart asymptotics ´ i p ` K ÿ k “ p´ q k tr p T {? n q k ` k ` ,/./- . (5.1)On the other hand, for any x P R , we can write 1 ` ix “ ? ` x exp ` i atan p x q ˘ with the arctangent function taking values in p´ π { , π { q . Thus by Proposition3 we can rewrite ψ NW as ψ NW p T q “ exp " log C n,p ´ n ` p `
14 log ˇˇˇˇ I p ` T n ˇˇˇˇ ´ i n ` p `
12 tr atan ˆ T ? n ˙ ` i ? n tr T * , (5.2)with the understanding that the matrix-variate arctangent function operates oneigenvalues by functional calculus. Now, for any x P R and odd integer L , thereis an elementary inequality L ÿ l “ p´ q l x l l ď ´
12 log p ` x q . Notice that K ` ˘ r K odd s is always an odd integer. Thus, from the aboveinequality and Equations (5.1) and (5.2), we can derive the bound ˇˇ ψ K ˇˇ p T q ď C p K q n,p exp ! ´ ´ n ` p ` ¯ log ˇˇˇ I p ` T n ˇˇˇ) “ C p K q n,p C n,p ˇˇ ψ NW ˇˇ p T q . (5.3)In particular, since ψ NW is integrable whenever n ě p ´ ψ K must also be integrable whenever n ě p ´ n it makes sense to talk about the asymptotictotal variation or Hellinger distance between ψ NW and ψ K .We now state the main result, which is that each function ψ K approximatesthe G-transform of a normalized Wishart for all middle-scale regimes of degree K or lower, but no other. Theorem 3.
Let lim n Ñ8 log p log n ă as n Ñ 8 . For any K P N , the total varia-tion distance between the G-transform of the normalized Wishart distribution ? n r W p p n, I p { n q ´ I p s and the K th approximating function ψ K satisfies d TV ` ψ NW , ψ K ˘ “ ż S p p R q ˇˇ ψ NW p T q ´ ψ K p T q ˇˇ dT Ñ as n Ñ 8 if and only if p K ` { n K ` Ñ .Proof. If statement. For the first part of the theorem, remark that by Equation(3.7) it is equivalent to show that the Hellinger distance tends to zero, i.e. thatH ` ψ NW , ψ K ˘ “ ż S p p R q ˇˇˇ ψ { p T q ´ ψ { K p T q ˇˇˇ dT Ñ n Ñ 8 h´etelat and Wells/Mid-scale Wishart asymptotics when p K ` { n K ` Ñ
0. To control this quantity, we use the Kullback-Leiblerinequality for G-transforms. Notice that for any x P R and L P N , ˇˇˇˇˇ ´
12 log p ` x q ´ L ´ ÿ l “ p´ q l x l l ˇˇˇˇˇ ď x L L , (5.4) ˇˇˇˇˇ atan p x q ´ L ´ ÿ l “ p´ q l x l ` l ` ˇˇˇˇˇ ď x L L . (5.5)Let Log stand for the principal branch of the complex logarithm, and let usstudy Log ψ NW { ψ K . Its real part can be bounded by ˇˇˇˇ (cid:60) Log ψ NW p T q ψ K p T q ˇˇˇˇ “ ˇˇˇˇˇˇˇ log C n,p ´ n ` p `
14 log ˇˇˇˇ I p ` T n ˇˇˇˇ ´ log C p K q n,p ´ n K ` ` r K odd s ÿ k “ p´ q k tr p T {? n q k k ´ p ` K ` ´ r K odd s ÿ k “ p´ q k tr p T {? n q k k ˇˇˇˇˇˇˇ ď ˇˇˇˇ log C n,p ´ log C p K q n,p ˇˇˇˇ ` n ˇˇˇˇˇˇˇ ´
12 log ˇˇˇˇ I p ` T n ˇˇˇˇ ´ K ` ` r K odd s ÿ k “ p´ q k tr p T {? n q k k ˇˇˇˇˇˇˇ ` p ` ˇˇˇˇˇˇˇ ´
12 log ˇˇˇˇ I p ` T n ˇˇˇˇ ´ K ` ´ r K odd s ÿ k “ p´ q k tr p T {? n q k k ˇˇˇˇˇˇˇ . By Equation (5.4), this can be bounded by ď ˇˇˇˇ log C n,p ´ log C p K q n,p ˇˇˇˇ ` n p T {? n q K ` ` r K odd s K ` ` r K odd s` p `
12 tr p T {? n q K ` ´ r K odd s K ` ´ r K odd s“ ˇˇˇˇ log C n,p ´ log C p K q n,p ˇˇˇˇ ` K ` ` r K odd s K ` ` r K odd s tr T K ` ` r K odd s n K ` ` r K odd s ` K ` ´ r K odd s K ` ´ r K odd s p p ` q tr T K ` ´ r K odd s n K ` ´ r K odd s . (5.6)We can bound the imaginary part of Log ψ NW { ψ K in a similar way. Define P p´ π,π s : R Ñ p´ π, π s to be the projection mapping P p´ π,π s x “ x ´ π r x π ´ s .A plot is given as Figure 4.It satisfies (cid:61) Log z “ P p´ π,π s (cid:61) log z for all branches of log z , as well as theinequality | P p´ π,π s x | ď | x | . Using this mapping, we can see that the imaginary h´etelat and Wells/Mid-scale Wishart asymptotics x -5 π -3 π - π π π πyπ - π Fig 4 . Plot of P p´ π,π q on p´ π, π s . part of Log ψ NW { ψ K can be bounded as ˇˇˇˇ (cid:61) Log ψ NW p T q ψ K p T q ˇˇˇˇ “ ˇˇˇˇˇˇˇ P p´ π,π s »—– ´ n ` p `
12 tr atan ˆ T ? n ˙ ` ? n tr T ` n K ` ÿ k “ p´ q k tr p T {? n q k ` k ` ` p ` K ÿ k “ p´ q k tr p T {? n q k ` k ` fiffiflˇˇˇˇˇˇˇ ď n ˇˇˇˇˇˇˇ tr atan ˆ T ? n ˙ ´ ? n tr Tn { ´ K ` ÿ k “ p´ q k tr p T {? n q k ` k ` ˇˇˇˇˇˇˇ ` p ` ˇˇˇˇˇˇˇ tr atan ˆ T ? n ˙ ´ K ÿ k “ p´ q k tr p T {? n q k ` k ` ˇˇˇˇˇˇˇ . By Equation (5.5), this can be bounded by ď n p T {? n q K ` K ` ` p `
12 tr p T {? n q K ` K ` ď K ` K ` T K ` n K ` ` K ` K ` p p ` q tr T K ` n K ` . (5.7)Recall that the G-conjugate of the normalized Wishart distribution is the t dis-tribution with n { I p {
8, denoted T n { p I p { q – see Equation (3.24) and Section 4 for details. Let us bound the expectations ofthese absolute real and imaginary parts under this distribution. By Equations(5.10), (5.6), (5.7) and Theorem 2, we find that for T „ | ψ NW | “ T n { p I p { q ,E „ˇˇˇˇ (cid:60) Log ψ NW p T q ψ K p T q ˇˇˇˇ ď K ` ` r K odd s K ` ` r K odd s E “ tr T p K ` ` r K odd sq ‰ n K ` ` r K odd s ` K ` ´ r K odd s K ` ´ r K odd s p p ` q E “ tr T p K ` ´ r K odd sq ‰ n K ` ´ r K odd s ` o ´ p K ` n K ` ¯ ď O ˆ p K ` ` r K odd s` n K ` ` r K odd s ˙ ` O ˆ p ¨ p K ` ´ r K odd s` n K ` ´ r K odd s ˙ ` o ´ p K ` n K ` ¯ h´etelat and Wells/Mid-scale Wishart asymptotics “ O ´ p K ` n K ` ¯ (5.8)andE „ˇˇˇˇ (cid:61) Log ψ NW p T q ψ K p T q ˇˇˇˇ ď K ` K ` “ tr T p K ` q ‰ n K ` ` K ` K ` p p ` q E “ tr T p K ` q ‰ n K ` ď O ˆ p K ` ` n K ` ˙ ` O ˆ p ¨ p K ` ` n K ` ˙ “ O ´ p K ` n K ` ¯ (5.9)as n Ñ 8 with p { n Ñ C n,p “ C p K q n,p exp " o ˆ p K ` n K ` ˙* as n Ñ 8 with pn Ñ . (5.10)Thus, from Equations (5.3) and (5.10), we see that when p K ` { n K ` Ñ
0, theasymptotic L norm of ψ K is bounded bylim n Ñ8 ż S p p R q ˇˇ ψ K p T q ˇˇ dT ď lim n Ñ8 exp ! ´ o ´ p K ` n K ` ¯) ż S p p R q ˇˇ ψ NW p T q ˇˇ dT “ . (5.11)In fact, at the end of this proof we will see that this bound is sharp and thelimit must be exactly 1.Using Equations (5.8), (5.9) and (5.11) with Proposition (1) implies thatwhen p K ` { n K ` Ñ ď lim n Ñ8 H ´ ψ NW , ψ K ¯ ď lim n Ñ8 „ ż S p p R q | ψ K |p T q dT ´ ` ` n Ñ8 ż S p p R q | ψ K |p T q dT ¨ ď r ´ s ` ` ¨ ¨ “ . Thus H ` ψ NW , ψ K ˘ Ñ
0, hence by Equation (3.7) we must have the limit $d_{TV}(\psi_{NW}, \psi_K) \to 0$, as desired.
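The truncation step in this part of the proof rests on an elementary Taylor remainder bound. The following Python sketch checks that bound numerically on a grid, assuming Equation (5.4) reads $\left| -\tfrac{1}{2}\log(1+x^2) - \sum_{l=1}^{L-1} (-1)^l x^{2l}/(2l) \right| \le x^{2L}/(2L)$. That reading is a reconstruction, and the function names `partial_sum`, `remainder`, `bound` and `max_violation` are hypothetical.

```python
import math


def partial_sum(x, L):
    """Truncated series: sum over l = 1..L-1 of (-1)^l x^(2l) / (2l)."""
    return sum((-1) ** l * x ** (2 * l) / (2 * l) for l in range(1, L))


def remainder(x, L):
    """Absolute error of approximating -(1/2) log(1 + x^2) by the truncated series."""
    return abs(-0.5 * math.log1p(x * x) - partial_sum(x, L))


def bound(x, L):
    """Assumed right-hand side of the inequality: x^(2L) / (2L)."""
    return x ** (2 * L) / (2 * L)


def max_violation(xs, Ls):
    """Largest value of remainder - bound over the grid; non-positive if the bound holds."""
    return max(remainder(x, L) - bound(x, L) for x in xs for L in Ls)


if __name__ == "__main__":
    grid = [i / 10 for i in range(-30, 31)]
    print("largest remainder minus bound:", max_violation(grid, range(1, 6)))
```

A grid check of this kind is no substitute for the proof; it is only a quick way to catch a mis-transcribed index or sign in the series.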
Only if statement.
For the second part of the theorem, assume that the total variation distance satisfies $d_{TV}(\psi_{NW}, \psi_K) \to 0$, hence $H(\psi_{NW}, \psi_K) \to 0$ as $n \to \infty$. We will show by contradiction that this implies $p^{K+3}/n^{K+1} \to 0$. Suppose instead that $p^{K+3}/n^{K+1} \not\to 0$. Since $\lim_{n\to\infty} \frac{\log p}{\log n} < 1$, there must be an $L \in \mathbb{N}$ such that $p^{L+3}/n^{L+1} \to 0$, and since $p^{K+3}/n^{K+1} \not\to$
0, we must have K ă L .By Equation (5.8), we must have for T „ | ψ NW | “ T n { p I p { q thatlim n Ñ8 E „ˇˇˇˇ (cid:60) Log ψ NW p T q ψ L p T q ˇˇˇˇ ď lim n Ñ8 O ´ p L ` n L ` ¯ “ , so 12 (cid:60) Log ψ NW p T q ψ L p T q L ÝÑ . (5.12) h´etelat and Wells/Mid-scale Wishart asymptotics Now write, by Equation (5.1) and Definition 3, R p T q ” (cid:60) Log ψ L p T q ψ K p T q “
12 log C p L q n,p C p K q n,p ` n L ` ` r L odd s ÿ k “ K ` ` r K odd s p´ q k k tr ˆ T ? n ˙ k ` p ` L ` ´ r L odd s ÿ k “ K ` ´ r K odd s p´ q k k tr ˆ T ? n ˙ k “ ´ L ` ÿ k “ K ` r k even s k p k ` qp k ` q p k ` n k ´ L ` ÿ k “ K ` ` r k even s k p k ` q p k ` n k ` n L ` ` r L odd s ÿ k “ K ` ` r K odd s p´ q k k ´ pn ¯ k tr ˆ T ? p ˙ k ` p ` L ` ´ r L odd s ÿ k “ K ` ´ r K odd s p´ q k k ´ pn ¯ k tr ˆ T ? p ˙ k . But as p L ` { n L ` , we must have p { n Ñ
0, so by Theorem 2, we have p tr p T ? p q k L Ñ C k . Moreover, as we assumed that p K ` { n K ` Û
0, we must have p Ñ 8 . Thus n K ` p K ` R p T q “ ´ L ` ÿ k “ K ` r k even s k p k ` qp k ` q p k ´ K ´ n k ´ K ´ ´ L ` ÿ k “ K ` ` r k even s k p k ` q p p k ´ K ´ n k ´ K ´ ` L ` ` r L odd s ÿ k “ K ` ` r K odd s p´ q k k p k ´ K ´ n k ´ K ´ p tr ˆ T ? p ˙ k ` ˆ ` p ˙ L ` ´ r L odd s ÿ k “ K ` ´ r K odd s p´ q k k p k ´ K ´ n k ´ K ´ p tr ˆ T ? p ˙ kL ÝÑ ` ` r K even s p K ` q C K ` ` r K odd s p K ` q C K ` “ C K ` ` r K even s p K ` ` r K even sq ą . (5.13)Then by the reverse triangle inequality,0 “ lim n Ñ8 H ` ψ NW , ψ K ˘ “ lim n Ñ8 ż S p p R q ˇˇˇ ψ { p T q ´ ψ { K p T q ˇˇˇ dT h´etelat and Wells/Mid-scale Wishart asymptotics ď lim n Ñ8 ż S p p R q ˇˇˇ | ψ NW | { p T q ´ | ψ K | { p T q ˇˇˇ dT “ lim n Ñ8 E «ˇˇˇˇ exp " (cid:60) Log ψ K p T q ψ NW p T q * ´ ˇˇˇˇ ff for a T „ | ψ NW | “ T n { p I p { q , that isexp " (cid:60) Log ψ K p T q ψ NW p T q * L ÝÑ . Since L p convergence implies convergence in probability, by the continuous map-ping theorem we must have12 (cid:60) Log ψ K p T q ψ NW p T q P ÝÑ n Ñ 8 , so by Equation (5.12) R p T q “ ´ (cid:60) Log ψ K p T q ψ NW p T q ´ (cid:60) Log ψ NW p T q ψ L p T q P ÝÑ ´ ´ “ . But then, from Equation (5.13) and Slutsky’s lemma [van der Vaart, 2000,Lemma 2.8 (iii)], p K ` n K ` “ ˆ n K ` p K ` R p T q ˙ ´ R p T q P ÝÑ p K ` ` r K even sq C K ` ` r K even s ¨ “ . as n Ñ 8 . As p K ` { n K ` is deterministic, this implies that p K ` { n K ` Ñ n Ñ 8 , a contradiction. Thus whenever H ` ψ NW , ψ K ˘ Ñ n Ñ 8 withlim n Ñ8 log p log n ă
1, we must have $p^{K+3}/n^{K+1} \to$
0, as desired. This concludes the proof.

Although Theorem 3 states that the functions $\psi_K$ approximate $\psi_{\mathrm{NW}}$, there is no guarantee that they are G-transforms of a probability density. In other words, nothing guarantees that their inverse G-transforms $\tilde f_K = \mathcal{G}^{-1}\{\psi_K\}$ are real-valued, non-negative and integrate to unity. However, the reverse triangle inequality applied to the $L^2$-norm provides that
$$\left| \Big( \int_{S_p(\mathbb{R})} |\psi_{\mathrm{NW}}(T)| \, dT \Big)^{1/2} - \Big( \int_{S_p(\mathbb{R})} |\psi_K(T)| \, dT \Big)^{1/2} \right| \le H\big(\psi_{\mathrm{NW}}, \psi_K\big),$$
so Theorem 3 and the Plancherel theorem imply that
$$\lim_{n\to\infty} \int_{S_p(\mathbb{R})} |\tilde f_K(X)| \, dX = \lim_{n\to\infty} \int_{S_p(\mathbb{R})} |\psi_K(T)| \, dT = 1 \tag{5.14}$$
as $n \to \infty$ with $p^{K+3}/n^{K+1} \to 0$. That is, the theorem at least guarantees that $|\tilde f_K|$ is asymptotically a density in its associated regime. We discuss this further in Section 6.

We independently know, by the results of Jiang and Li [2015] and Bubeck et al. [2016], that a Gaussian orthogonal ensemble approximation holds in the classical regime. Although $\psi_0$ is not the G-transform of a $\mathrm{GOE}(p)$, a simple Kullback-Leibler argument is sufficient to prove that it approximates $\psi_{\mathrm{GOE}}$ for 0th degree regimes.

Proposition 4.
The total variation distance between the 0th degree G-transform approximation $\psi_0$ and the Gaussian orthogonal ensemble G-transform $\psi_{\mathrm{GOE}}$ satisfies $d_{\mathrm{TV}}(\psi_0, \psi_{\mathrm{GOE}}) \to 0$ as $n \to \infty$ with $p^3/n \to 0$.

Proof. We use a similar strategy to Theorem 3: namely, by Equation (3.7) it is equivalent to prove that $H(\psi_0, \psi_{\mathrm{GOE}}) \to 0$ as $n \to \infty$ with $p^3/n \to 0$, and to control that quantity we can use the Kullback-Leibler inequality for G-transforms. Let $T \sim |\psi_{\mathrm{GOE}}| = \mathrm{GOE}(p)/$
4. Since the Gaussian orthogonal ensemble hasbeen extensively studied, we understand well its empirical moments. For exam-ple, according to Anderson et al. [2010, Lemma 2.2.2], we have E “ tr T ‰ “ O p p q ,while from Equation (2.1.45) of the same book we have E r| tr T |s “ O p p { q andE “ | tr T | ‰ “ O p p { q . Then from Definition 3 and Proposition 2, and using theprojection map P p´ π,π q as in the proof of Theorem 3, we find thatE „ (cid:60) Log ψ GOE p T q ψ p T q “ ´ p ` n E “ tr T ‰ “ O ˆ p n ˙ and E „ˇˇˇˇ (cid:61) Log ψ GOE p T q ψ p T q ˇˇˇˇ “ E „ˇˇˇˇ P p´ π,π q „ ´ ? n tr T ` p ` ? n tr T ˇˇˇˇ ď ? n E “ | tr T | ‰ ` i p ` ? n E r| tr T |s “ O ˆc p n ˙ . Since ş S p p R q | ψ |p T q dT Ñ n Ñ 8 with p { n Ñ n Ñ8 H p ψ , ψ GOE q ď ` ` ¨ ¨ “ p { n Ñ
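0. By Equation (3.7), this concludes the proof.

As a consequence, $H(f_{\mathrm{NW}}, f_{\mathrm{GOE}}) = H(\psi_{\mathrm{NW}}, \psi_{\mathrm{GOE}}) \le H(\psi_{\mathrm{NW}}, \psi_0) + H(\psi_0, \psi_{\mathrm{GOE}}) \to 0$ as $n \to \infty$ with $p^3/n \to 0$, so that $d_{\mathrm{TV}}(f_{\mathrm{NW}}, f_{\mathrm{GOE}}) \to 0$.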
6. Wishart asymptotics: the density point-of-view
In Section 5, we studied the asymptotic behavior of the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ using its G-transform $\psi_{\mathrm{NW}}$. In particular, we derived an approximation to $\psi_{\mathrm{NW}}$ for every middle-scale regime of a given degree. But although it is equivalent to study a probability distribution from a density or a G-transform point of view, it is still natural to wonder if we can find approximations to the density of a normalized Wishart for every middle-scale regime of a given degree.

Recall from Theorem 3 that $d_{\mathrm{TV}}(\psi_{\mathrm{NW}}, \psi_K) \to 0$ as $n \to \infty$ with $p^{K+3}/n^{K+1} \to 0$. Consider the inverse G-transforms $\tilde f_K = \mathcal{G}^{-1}\{\psi_K\}$. In general, there is no guarantee that these should be real-valued. On the other hand, we know from Equation (5.3) that whenever $n \ge p - 2$, $\psi_K$ must be integrable, and since the G-transform maps integrable functions to integrable functions, $\tilde f_K$ must also be integrable. In fact, according to Equation (5.14), we know $|\tilde f_K|$ must be asymptotically a density when $p^{K+3}/n^{K+1} \to 0$. This suggests we define the following densities.
Definition 4 (Density approximations). For any $K \in \mathbb{N}$ and $n \ge p - 2$, define the $K$th degree density approximation as
$$f_K(X) = \frac{|\tilde f_K|(X)}{\int_{S_p(\mathbb{R})} |\tilde f_K(Y)| \, dY},$$
where $\tilde f_K = \mathcal{G}^{-1}\{\psi_K\}$ and $\psi_K$ is as in Definition 3. The distribution on the real symmetric matrices with density $f_K$ will be denoted $F_K$.

The main interest is that we can asymptotically approximate the density $f_{\mathrm{NW}}$ of a normalized Wishart by the bona fide densities $f_K$. This was the content of Theorem 1 from Section 1, which we now prove as a simple corollary of its G-transform analogue, Theorem 3 from Section 5.

Proof of Theorem 1.
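As in the rest of this paper, we denote the density of the normalized Wishart distribution $\sqrt{n}\,[W_p(n, I_p/n) - I_p]$ by $f_{\mathrm{NW}}$, and by Definition 4 the density of $F_K$ is $f_K$. Notice that by Equation (2.1), to prove $d_{\mathrm{TV}}(f_{\mathrm{NW}}, f_K) \to 0$ it is enough to prove $H(f_{\mathrm{NW}}, f_K) \to 0$. From the triangle inequality, the reverse triangle inequality, Theorem 3 and Equation (5.14),
$$\begin{aligned}
\lim_{n\to\infty} H\big(f_{\mathrm{NW}}, f_K\big)
&\le \lim_{n\to\infty} H\big(f_{\mathrm{NW}}, |\tilde f_K|\big) + \lim_{n\to\infty} H\big(|\tilde f_K|, f_K\big) \\
&= \lim_{n\to\infty} \Big( \int_{S_p(\mathbb{R})} \Big| f_{\mathrm{NW}}^{1/2}(X) - |\tilde f_K|^{1/2}(X) \Big|^2 dX \Big)^{1/2} \\
&\quad + \lim_{n\to\infty} \Big( \int_{S_p(\mathbb{R})} \Big| |\tilde f_K|^{1/2}(X) - \frac{|\tilde f_K|^{1/2}(X)}{\big(\int_{S_p(\mathbb{R})} |\tilde f_K|(Y)\, dY\big)^{1/2}} \Big|^2 dX \Big)^{1/2} \\
&\le \lim_{n\to\infty} \Big( \int_{S_p(\mathbb{R})} \Big| f_{\mathrm{NW}}^{1/2}(X) - \tilde f_K^{1/2}(X) \Big|^2 dX \Big)^{1/2} \\
&\quad + \lim_{n\to\infty} \Big| 1 - \frac{1}{\big(\int_{S_p(\mathbb{R})} |\tilde f_K|(Y)\, dY\big)^{1/2}} \Big| \Big( \int_{S_p(\mathbb{R})} |\tilde f_K|(X)\, dX \Big)^{1/2} \\
&= 0^{1/2} + \Big| 1 - \frac{1}{1^{1/2}} \Big| \cdot 1^{1/2} = 0
\end{aligned}$$
when $p^{K+3}/n^{K+1} \to 0$. Thus $H(f_{\mathrm{NW}}, f_K)$, hence $d_{\mathrm{TV}}(f_{\mathrm{NW}}, f_K)$, tends to zero when $n \to \infty$ with $p^{K+3}/n^{K+1} \to$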
0, as desired.We defined f K in terms of the inverse G-transform of the ψ K functions givenby Definition 3. How can we express this explicitely? By Equation (5.3), we seethat | ψ K |p T q is asymptotically bounded by the T n { p I p { q density | ψ NW |p T q ,which is integrable whenever n ´ p ` ě
0. But | ψ NW | { p T q is proportionalto a T m { p n m I p q density in the sense of Definition 2, which is integrable for m { ě p { ´
1, that is whenever n ´ p ` ě
0. Thus | ψ NW | { p T q and therefore | ψ K | { p T q is integrable whenever n ´ p ` ě
0. Hence we can use the Fourerinversion theorem to conclude that f K is proportional to the integral f K p X q 9 ˇˇˇ G ´ t ψ K u ˇˇˇ p X q9 ˇˇˇˇˇ ż S p p R q exp i tr p XT q ` n K ` ` r K odd s ÿ k “ ´ i ? n ¯ k tr T k k ` p ` K ` ´ r K odd s ÿ k “ ´ i ? n ¯ k tr T k k + dT ˇˇˇˇˇ (6.1)whenever n ´ p ` ě
0. In particular, if we do a change of variables Z “ ? T ,we obtain Equation (1.3) from Section 1 whenever n ě p ´
3, from which we can derive Equations (1.1) and (1.2).

It would be quite pleasant if there were a way to solve the integral in Equation (6.1) or (1.3) and obtain a (potentially quite complicated) closed-form expression for $f_K$ up to its normalization constant. So far, our efforts have been unfruitful.

We close our discussion with a final remark. At the end of Section 5, we showed that $\psi_0$ approximates $\psi_{\mathrm{GOE}}$ in 0th degree middle-scale regimes, from which the classical asymptotic normality follows. It is natural to wonder if $f_0$ approximates $f_{\mathrm{GOE}}$ in the same context. An argument similar to that of Theorem 1 shows this is the case.
Proposition 5. The total variation distance between the 0th degree density approximation $f_0$ and the Gaussian orthogonal ensemble density $f_{\mathrm{GOE}}$ satisfies $d_{\mathrm{TV}}(f_0, f_{\mathrm{GOE}}) \to 0$ as $n \to \infty$ with $p^3/n \to 0$.

Proof. The Hellinger distance between $f_0$ and $f_{\mathrm{GOE}}$ satisfies
$$\begin{aligned}
\lim_{n\to\infty} H\big(f_0, f_{\mathrm{GOE}}\big)
&\le \lim_{n\to\infty} H\big(f_0, |\tilde f_0|\big) + \lim_{n\to\infty} H\big(|\tilde f_0|, f_{\mathrm{GOE}}\big) \\
&\le \lim_{n\to\infty} \Big| 1 - \frac{1}{\big(\int_{S_p(\mathbb{R})} |\tilde f_0|(Y)\, dY\big)^{1/2}} \Big| \Big( \int_{S_p(\mathbb{R})} |\tilde f_0|(X)\, dX \Big)^{1/2} + \lim_{n\to\infty} H\big(\tilde f_0, f_{\mathrm{GOE}}\big) \\
&= \Big| 1 - \frac{1}{1^{1/2}} \Big| \cdot 1^{1/2} + \lim_{n\to\infty} H\big(\psi_0, \psi_{\mathrm{GOE}}\big) = 0.
\end{aligned}$$
By Equation (2.1), the result follows.

Of course, we could conclude from this that $H(f_{\mathrm{NW}}, f_{\mathrm{GOE}}) \le H(f_{\mathrm{NW}}, f_0) + H(f_0, f_{\mathrm{GOE}}) \to 0$ as $n \to \infty$ with $p^3/n \to 0$, offering yet another proof that a Gaussian orthogonal ensemble approximation holds in the classical setting.
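The proofs in this section repeatedly pass from Hellinger convergence to total variation convergence. As a hedged illustration of why that step is harmless, the sketch below checks the standard Le Cam comparison $H^2/2 \le d_{\mathrm{TV}} \le H\sqrt{1 - H^2/4}$ on a pair of unit-variance Gaussians. The normalizations $H^2 = \int (\sqrt f - \sqrt g)^2$ and $d_{\mathrm{TV}} = \frac12 \int |f - g|$ are assumptions made for this example and may differ by constants from the paper's Equation (2.1); the function names are hypothetical.

```python
import math


def hellinger_sq_shifted_normals(mu):
    """Squared Hellinger distance H^2 = int (sqrt f - sqrt g)^2 between
    N(0, 1) and N(mu, 1); the Bhattacharyya coefficient is exp(-mu^2 / 8)."""
    return 2.0 * (1.0 - math.exp(-mu * mu / 8.0))


def total_variation_shifted_normals(mu):
    """Total variation (1/2) int |f - g| between N(0, 1) and N(mu, 1),
    which equals 2 * Phi(|mu| / 2) - 1 = erf(|mu| / (2 * sqrt(2)))."""
    return math.erf(abs(mu) / (2.0 * math.sqrt(2.0)))


def le_cam_bounds(h_sq):
    """Lower and upper bounds on total variation implied by H^2 (assumed
    normalization: H^2 ranges over [0, 2])."""
    h = math.sqrt(h_sq)
    return 0.5 * h_sq, h * math.sqrt(1.0 - h_sq / 4.0)


if __name__ == "__main__":
    # As mu shrinks, H -> 0 forces the total variation to 0 as well.
    for mu in (2.0, 1.0, 0.1, 0.01):
        h_sq = hellinger_sq_shifted_normals(mu)
        tv = total_variation_shifted_normals(mu)
        lower, upper = le_cam_bounds(h_sq)
        print(f"mu={mu:5.2f}  H={math.sqrt(h_sq):.6f}  "
              f"{lower:.6f} <= TV={tv:.6f} <= {upper:.6f}")
```

Since both bounds vanish together with $H$, the two distances induce the same notion of convergence, which is all the arguments above require.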
7. The effect of phase transitions
Although we have established the existence of phase transitions, it does not shedmuch light on how the behavior of a normalized Wishart distribution might differacross phase transitions. To do this, it can be very illuminating to study theasymptotics of some of its statistics. For example, we could study its empiricalmoments.For a normalized Wishart matrix X „ ? n r W p p n, I p { q ´ I p s , a direct com-putation yieldsE „ˆ p tr ´ X ? p ¯ ´ ˙ “ p ` p ` np ` np ` np “ p ` o ˆ p ˙ so in every middle-scale regime, that is whenever n, p Ñ 8 with p { n Ñ ›››› p tr ´ X ? p ¯ ´ ›››› L „ ? p . ´ all middle-scaleregimes ¯ Thus we have L convergence of the second empirical moment to 1, but other-wise nothing very interesting. There doesn’t seem to be any change of behavioracross the different middle-scale regimes. In contrast, the situation with thesymmetric t distribution is striking, and illustrates yet again that middle-scaleregime behavior becomes clearer under a G-transform. Indeed, we know fromTheorem 2 that for a T „ T n { p I p { q , the quantity p tr p T ? p q also converges to1, but we know more. At Equation (4.41), we computed the exact L distancebetween p tr p T ? p q and 1, and found thatE „ˆ p tr ´ T ? p ¯ ´ ˙ “ m p m ´ q p m ´ q p m ´ q p m ` q p m ` q¨ „ p ` m ` p m ` o ˆ p ` m ` p m ˙ . Thus the L distance must have middle-scale asymptotics ›››› p tr ´ T ? p ¯ ´ ›››› L h´etelat and Wells/Mid-scale Wishart asymptotics „ $’’’’’’’’’’’’’’&’’’’’’’’’’’’’’% ? p for p n Ñ ´ classical or firstdegree ¯ ? ` α ` α p “ ? α ´ ` ` α ? n “ ? α ´ ` α ´ ` pn for p n Ñ α ` second degree ˘ pn for p n Ñ 8 ´ second or higherdegree ¯ (7.1)as n, p Ñ 8 with p { n Ñ
0. Thus there is a sharp change in behavior of p tr p T ? p q when p grows like ? n , and despite the symmetric t distribution satisfying asemicircle law according to Corollary 1, it must ultimately behave differentlythan a Gaussian orthogonal ensemble matrix. The first-order asymptotics lookthe same: it is rather in the rate of this convergence that they differ.This matters for both the symmetric t and the Wishart distribution becauserates of convergence can be distinguished in the strong topology. As a simpleexample, consider the sequence of one-dimensional distributions F p “ N ` , { p ˘ , and G p “ N ` , { p ˘ . In the weak topology, these are asymptotically the same, since they converge tothe same distribution – namely F p , G p ñ δ as p Ñ 8 , for δ the Dirac measureat 0. In other words, in a metric that induces the weak topology such as theL´evy metric, d L´evy p F p , G p q Ñ . Yet, by a direct computation of the Hellinger distance, which induces the strongtopology, d Hellinger p F p , G p q “ H p F p , G p q “ ? d ´ ´ pp ` p ` ¯ { Ñ ? ą p Ñ 8 . Thus it is clear that the strong topology captures rates of con-vergence in a way that the weak topology can’t. But then, we should expecta phase transition when p grows like ? n for the T n { p I p { q distribution. Andsince the symmetric t is the G-conjugate of the Wishart, this should imply aphase transition when p grows like ? n for the Wishart distribution as well. Thisis consistent with Theorem 3, and provides an alternative explanation for theexistence of the second phase transition.A natural question then is to ask whether we can find symmetric t statisticsthat exemplify all the middle-scale regime phase transitions. It is tempting tolook at the L error of the other empirical moments of the symmetric t distri-bution, because we can use the methodology developed in Section 4 to computetheir asymptotics to arbitrary order. 
As a reference, we compiled a table of the first few moments as Table 1.

Table 1. Asymptotics of small normalized empirical moments of the symmetric $t$ matrix $T$: $L^2$ error of $\frac{1}{p} \operatorname{tr}\big(T/\sqrt{p}\big)^k$ for the first few $k$.

As can be seen from the table, the odd moments seem to have uniform behavior across all middle-scale regimes. In contrast, the even moments all seem to change asymptotics at the second phase transition $p = \Theta(\sqrt{n})$, but nowhere else. Hence finding statistics that "flag" the other phase transitions remains an open question.
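The one-dimensional Gaussian illustration used earlier in this section can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the two sequences to be $N(0, 1/p)$ and $N(0, 1/p^2)$ (the exact variances are an assumption of this example), uses the closed-form Bhattacharyya coefficient of two centered normals for the Hellinger distance, and substitutes the Wasserstein-1 distance for the Lévy metric as a convenient weak-type metric with a closed form. All function names are hypothetical.

```python
import math


def hellinger_centered_normals(var_f, var_g):
    """Hellinger distance H = (int (sqrt f - sqrt g)^2)^(1/2) between
    N(0, var_f) and N(0, var_g), via the Bhattacharyya coefficient
    BC = sqrt(2 * s_f * s_g / (var_f + var_g))."""
    s_f, s_g = math.sqrt(var_f), math.sqrt(var_g)
    bc = math.sqrt(2.0 * s_f * s_g / (var_f + var_g))
    return math.sqrt(max(0.0, 2.0 * (1.0 - bc)))


def wasserstein1_centered_normals(var_f, var_g):
    """Wasserstein-1 distance between N(0, var_f) and N(0, var_g):
    |s_f - s_g| * E|Z| with E|Z| = sqrt(2 / pi). Used here as a stand-in
    for the Levy metric (an assumption of this sketch)."""
    return abs(math.sqrt(var_f) - math.sqrt(var_g)) * math.sqrt(2.0 / math.pi)


def compare(p):
    """Return (Hellinger, Wasserstein-1) for the assumed pair
    F_p = N(0, 1/p) and G_p = N(0, 1/p^2)."""
    var_f, var_g = 1.0 / p, 1.0 / p ** 2
    return (hellinger_centered_normals(var_f, var_g),
            wasserstein1_centered_normals(var_f, var_g))


if __name__ == "__main__":
    for p in (1e1, 1e3, 1e6, 1e12):
        h, w = compare(p)
        print(f"p={p:.0e}  Hellinger={h:.6f}  Wasserstein-1={w:.3e}")
```

Under these assumptions the weak-type distance shrinks to zero while the Hellinger distance climbs toward $\sqrt{2}$, which is the sense in which the strong topology separates two sequences that differ only in their rate of convergence.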
8. Auxiliary results
This section compiles several lemmas used elsewhere in the article.
Lemma 3 (First derivatives lemma) . For any indices ď i , . . . , i l ď p and realsymmetric matrix Z , there exist polynomials a J,s p n, m q in n and m “ n ´ p ´ ,indexed by ď s ď l and J “ p j , . . . , j l q , such that B s B s Z i l i l ´ . . . B s B s Z i i B s B s Z i i exp " ´ n Z *ˇˇ Z ˇˇ m “ l ÿ s “ ÿ J Pt ,...,p u l a J,s p n, m q l ź t “ s ` p I p q j t j t ´ s ź t “ Z ´ j t j t ´ exp " ´ n Z *ˇˇ Z ˇˇ m . Proof.
To simplify notation, let M J,s p Z q “ l ź t “ s ` p I p q j t j t ´ s ź t “ Z ´ j t j t ´ exp " ´ n Z *ˇˇ Z ˇˇ m , and let M l “ t M J,s | J P t , . . . , p u l , s ď l u be the set of all such terms “on 2 l indices”. Let x M l y denote the linear span of M l , that is, the space of all linearcombinations of elements of M l , with as coefficients real polynomials in n and m . Then we are really claiming that B s B s Z i l i l ´ . . . B s B s Z i i B s B s Z i i exp " ´ n Z *ˇˇ Z ˇˇ m P x M l y . (8.1) h´etelat and Wells/Mid-scale Wishart asymptotics To see this, let J “ p j , . . . , j l ´ q P t , . . . , p u l ´ and define the extension J qa,b “ p j , . . . , j q ´ , a, b, j q ` , . . . , j l ´ q P t , . . . , p u l to be J with indices a , b inserted (in this order) at the q th position. Then using that B s B s Z i l i l ´ Z ´ ab “ ´ ” Z ´ ai l Z ´ i l ´ b ` Z ´ ai l ´ Z ´ i l b ı and B s B s Z i l i l ´ exp " ´ n Z *ˇˇ Z ˇˇ m “ ” m Z i l i l ´ ´ n p I p q i l i l ´ ı exp " ´ n Z *ˇˇ Z ˇˇ m , we conclude that B s B s Z i l i l ´ M J,s p Z q “ ´ s ÿ r “ M J ri li l ´ ,s ` ´ s ÿ r “ M J ri l ´ i l ,s ` ` m M J s ` i li l ´ ,s ` ´ n M J s ` i li l ´ ,s P x M l y . Thus, by linearity, B s {B s Z i l i l ´ maps x M l ´ y to x M l y . But naturally we haveexp t´ n tr Z u| Z | m { P x M y , so by induction Equation (8.1) must then hold, asdesired. Lemma 4 (Second derivatives lemma) . For any k P N and any Z P S p p R q , p ÿ i ,...,i k B s B s Z i i k . . . B s B s Z i i B s B s Z i i e ´ n tr Z ˇˇ Z ˇˇ m “ e ´ n tr Z ˇˇ Z ˇˇ m ÿ | κ |ď k b p q κ p n, m, p q r κ p Z ´ q and p ÿ i ,...,i k j ,...,j k B s B s Z j j k . . . B s B s Z j j B s B s Z j j B s B s Z i i k . . . B s B s Z i i B s B s Z i i e ´ n tr Z ˇˇ Z ˇˇ m “ e ´ n tr Z ˇˇ Z ˇˇ m ÿ | κ |ď k ` b p q κ p n, m, p q r κ p Z ´ q for some polynomials b p q κ p n, m, p q and b p q κ p n, m, p q with degrees deg b p q κ ď k ` ´ q p κ q and deg b p q κ ď k ` ´ q p κ q . 
The sums at the right hand sides are takenover all integer partitions κ of norm at most k and k ` , including the emptypartition.Proof. We give a spectral proof. Let
OLO t be the spectral decomposition of Z ,with eigenvalues λ ě ¨ ¨ ¨ ě λ p , and notice that B s O hl B s Z ij “ p ÿ a ‰ l O ha O tai λ l ´ λ a O jl ` ÿ a ‰ l O ha O taj λ l ´ λ a O il , B s λ h B s Z ij “ O ih O jh for any 1 ď i, j, h, l ď p . As a consequence, for any differentiable real-valued h´etelat and Wells/Mid-scale Wishart asymptotics functions F p L q , . . . , F p p L q , we have p ÿ j “ B s B s Z hj ˆ p ÿ a “ O ja F a O tai ˙ “ p ÿ a,bb ‰ a O ha F b ´ F a λ b ´ λ a O tai ` p ÿ a “ O ha B F a B λ a O tai . This suggests we define a new operator D L that would map the space of diagonalmatrices F p L q “ diag p F p L q , . . . , F p p L qq that differentially depends on L , toitself, by D L t F u a “ p ÿ b ‰ a F b ´ F a λ b ´ λ a ` B F a B λ a so that p ÿ j “ B s B s Z hj OF O tji “ OF O tki . In particular, p ÿ i ,...,i k B s B s Z i i k . . . B s B s Z i i B s B s Z i i e ´ n tr Z ˇˇ Z ˇˇ m “ p ÿ i ,...,i k B s B s Z i i k . . . B s B s Z i i B s B s Z i i ” e ´ n tr Z ˇˇ Z ˇˇ m I p ı i i “ tr D kL ! e ´ n tr Z ˇˇ Z ˇˇ m I p ) , (8.2)and similarly p ÿ i ,...,i k j ,...,j k B s B s Z j j k . . . B s B s Z j j B s B s Z j j B s B s Z i i k . . . B s B s Z i i B s B s Z i i e ´ n tr Z ˇˇ Z ˇˇ m “ tr D kL ! tr D kL ! e ´ n tr Z ˇˇ Z ˇˇ m I p )) . (8.3)Let us look more closely at this operator D L . It satisfies the following.(i) D L is linear, in the sense that for diagonals F p L q , G p L q and constants a , b with respect to L , D L t aF ` bG u “ aD L t F u ` bD L t G u . (ii) D L satisfies a restricted product rule, in the sense that for a diagonal F p L q of the form F p L q “ f p L q I p for some function f p L q , and any diagonal G p L q , D L t F G u “ D L t F u G ` F D L t G u . Moreover, from the definition of D L , D L (cid:32) e ´ n tr L I p ( “ ´ n e ´ n tr L I p , D L (cid:32) | L | m I p ( “ m | L | m I p ,D L (cid:32) tr p L ´ s q I p ( “ ´ sL ´p s ` q , and D L (cid:32) L ´ s ( “ ´ s L ´p s ` q ´ s ÿ t “ tr p L ´r s ` ´ t s q L ´ t . 
h´etelat and Wells/Mid-scale Wishart asymptotics Now define the spaces M l “ b p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q L ´ s ˇˇˇˇˇ b p n, m, p q is a polynomial withdegree at most l ´ q p κ q , and κ and s satisfy | κ | ď l ´ s . + for l “ , . . . , k , and let x M l y denote the linear span of M l , i.e. the space ofall real linear combinations of elements of M l . Moreover, for a partition κ , let κ ˘ i denote κ with the integer i added or removed, respectively. For example, p , , , q` “ p , , , , q and p , , , , q´ “ p , , , q . Note that | κ ˘ i | “| κ | ˘ i . Then, for any F P M l , D L t F u “ D L ! b p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q L ´ s ) “ b p n, m, p q D L ! e ´ n tr L I p )ˇˇ L ˇˇ m r κ p L ´ q L ´ s ` b p n, m, p q e ´ n tr L D L !ˇˇ L ˇˇ m I p ) r κ p L ´ q L ´ s ` b p n, m, p q e ´ n tr L ˇˇ L ˇˇ m D L ! r κ p L ´ q I p ) L ´ s ` b p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q D L ! L ´ s ) “ ” ´ n b p n, m, p q ı e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q L ´ s ` ” m b p n, m, p q ı e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q L ´ s ` q p κ q ÿ i “ ” ´ κ i b p n, m, p q ı e ´ n tr L ˇˇ L ˇˇ m r κ ´ κ i p L ´ q L ´ s ` ” ´ s b p n, m, p q ı e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q L ´p s ` q ` s ÿ t “ ” ´ b p n, m, p q ı e ´ n tr L ˇˇ L ˇˇ m r κ `p s ` ´ t q p L ´ q L ´ t Thus D L t F u P x M l ` y . It follows by linearity that D L maps x M l y to x M l ` y .Now, e ´ n tr L ˇˇ L ˇˇ m I p P M , so by induction D kL t e ´ n tr L ˇˇ L ˇˇ m I p u P x M k y .Hence, for some polynomials b p q κ,s p n, m, p q of degree at most 2 k ´ q p κ q ,tr D kL ! e ´ n tr L ˇˇ L ˇˇ m I p ) “ ÿ | κ |` s ď k b p q κ,s p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q tr p L ´ s q“ ÿ | κ |ď k b p q κ p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q (8.4)for κ “ κ ` s , b p q κ “ b p q κ,s when s ‰
0, while κ “ κ , b p q κ “ pb p q κ,s when s “
0. Notice that when s ‰
0, the degree of the b κ ’s is at most 2 k ´ q p κ q “ k ´ p q p κ q ´ q , while when s “ k ´ q p κ q ` “ k ´ q p κ q ` b p q κ ď k ´ q p κ q `
1, which by Equation (8.2) showsthe first statement of the lemma.For the second statement of the lemma, by an argument analoguous to Equa- h´etelat and Wells/Mid-scale Wishart asymptotics tion (8.4) we find that tr D kL t e ´ n tr L ˇˇ L ˇˇ m I p u P x M k ` y . Thus by inductionagain, we must have D kL t tr D kL t e ´ n tr L ˇˇ L ˇˇ m I p uu P x M k ` y . Hence for somepolynomials b p q κ,s p n, m, p q of degree at most 2 k ` ´ q p κ q , D kL ! tr D kL ! e ´ n tr L ˇˇ L ˇˇ m I p )) “ ÿ | κ |` s ď k ` b p q κ,s p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q tr p L ´ s q“ ÿ | κ |ď k ` b p q κ p n, m, p q e ´ n tr L ˇˇ L ˇˇ m r κ p L ´ q for again κ “ κ ` s , b p q κ “ b p q κ,s when s ‰
0, while κ “ κ , b p q κ “ pb p q κ,s when s “
0. By the same argument as before, deg b p q κ ď k ´ q p κ q `
2, which byEquation (8.3) shows the second statement of the lemma. This concludes theproof.We will also need in our proof a result about the asymptotics of inversemoments of the Wishart distribution. Because we couldn’t find anything like itin the literature, we think it is worthwhile to provide some context.Let f : p , q Ñ R be the restriction to the positive reals of a complexfunction analytic in a neighborhood of p , q . We are often interested in thelinear spectral statistic p tr f p Y q for Y „ W p p n, I p { n q . Much is known aboutits distributional properties in the high-dimensional regime where p Ñ 8 suchthat lim n Ñ8 pn “ α ă
1. For example, if 0 ă α ă
1, there must be an (cid:15) ą p { n P r (cid:15), ´ (cid:15) s for all n large enough, so Bai and Silverstein [2010, Theorem9.10] and the dominated convergence theorem yield that1 p tr f p Y q P ÝÑ lim n Ñ8 ż r p { n s ` r p { n s ´ f p t q a pr p { n s ` ´ t qp t ´ r p { n s ´ q π r p { n s t dt “ ż α ` α ´ f p t q a p α ` ´ t qp t ´ α ´ q παt dt (8.5)as n Ñ 8 . Here, P Ñ stands for convergence in probability and x ˘ for p ˘ ? x q .In fact, the theorem states more, namely a central limit theorem, but what wewant to draw to attention is the class of functions for which this result wasproven.This is sometimes enough, but often we would like to understand the ex-pectation of this linear spectral statistic. If f is bounded, then Equation (8.5)implies thatlim n Ñ8 p E ” tr f p Y q ı “ ż α ` α ´ f p t q a p α ` ´ t qp t ´ α ´ q παt dt. (8.6)This is nice for a function f p z q like e z or sin z that happens to be bounded h´etelat and Wells/Mid-scale Wishart asymptotics on a neighborhood of p , q , but it unfortunately excludes many interestingunbounded functions, such as log z or 1 { z . In fact, for unbounded f , it is ingeneral not even clear if lim n Ñ8 p E r tr f p Y qs will be finite!The following result shows that, at least in the case f p z q “ { z s for s P N ,we can use Stein’s lemma to obtain Equation (8.6) and its α “ Lemma 5.
Let for Y „ W p p n, I p { n q and s be any integer s ě . Then as longas n ě p ` s ` , the s th inverse moment satisfies the recursive bound ˆ ´ p p ` q sn ˙ E ” tr Y ´ s ı ď E ” tr Y ´p s ´ q ı . In particular, as p Ñ 8 such that lim n Ñ8 pn “ α ă , if s ă α ´ ´ then lim n Ñ8 p E ” tr Y ´ s ı “ $’’’&’’’% ż α ` α ´ a p α ` ´ t qp t ´ α ´ q παt s ` dt if ă α ă , if α “ . for α ˘ “ p ˘ ? α q .Proof. The classical Stein’s lemma states that for any differentiable function f : R Ñ R such that E “ˇˇ` BB Z ´ Z ˘ f p Z q ˇˇ‰ ă 8 for Z „ N p , q and lim z Ñ˘8 f p z q e ´ z { “
0, we must haveE „´ BB Z ´ Z ¯ f p Z q “ . Let Z „ N n ˆ p p , I n b I p q be an n ˆ p matrix of i.i.d. standard normal randomvariables, and let Y “ n Z t Z „ W p p n, I p { n q . For any 1 ď α ď n and 1 ď β, i, j ď p , BB Z αβ “ n p ÿ i “ Z αi B s B s Y iβ and B s Y ´ sβj B s Y iβ “ ´ s ÿ l “ ” Y ´ lβi Y ´p s ´ l ` q βj ` Y ´ lββ Y ´p s ´ l ` q ij ı , so for δ the Kronecker delta, ˆ BB Z αβ ´ Z αβ ˙` ZY ´ s ˘ αβ “ p ÿ j “ „ δ βj Y ´ sβj ` n p ÿ i “ Z αj Z αi B s B s Y iβ Y ´ sβj ´ Z αβ Z αj Y ´ sβj “ Y ´ sββ ´ n s ÿ l “ ` ZY l ˘ αβ ` ZY ´p s ´ l ` q ˘ αβ ´ n s ÿ l “ Y ´ lββ ` ZY ´p s ´ l ` q Z t ˘ αα ´ Z αβ ` ZY ´ s ˘ αβ (8.7)Let us first show that this expression is integrable. For any matrix X , | X ij | ď} X } “ } X t X } { . Thus by Equation (8.7),E „ˇˇˇˇˆ BB Z αβ ´ Z αβ ˙` ZY ´ s ˘ αβ ˇˇˇˇ h´etelat and Wells/Mid-scale Wishart asymptotics ď E « } Y ´ s } ` s ÿ l “ } Y l ` } } Y s ` l ´ } ` s ÿ l “ } Y ´ l } } Y ´ s ` l } ` n } Y } } Y ´ s ` } ff As Y is positive definite, } Y ˘ a } ď tr Y ˘ a for any a P N , so by the Cauchy-Schwarz inequality, ď E „ tr Y ´ s ` s ÿ l “ E ” tr Y ´ l ` ı E ” tr Y ´ s ` l ´ ı ` s ÿ l “ E ” tr Y ´ l ı E ” tr Y ´ s ` l ı ` n E ” tr Y ı E ” tr Y ´ s ` ı , which is finite for n ě p ` s ` p ZY ´ s q αβ can be expressed using minors and determinants as arational function of the entries of Z , solim Z αβ Ñ˘8 p ZY ´ s q αβ e ´ Z αβ { “ . So all conditions are fulfilled to apply Stein’s lemma to Equation 8.7 and obtain0 “ E « n n ÿ α “ p ÿ β “ ˆ BB Z αβ ´ Z αβ ˙` ZY ´ s ˘ αβ ff “ E « tr Y ´ s ´ sn tr Y ´ s ´ n s ÿ l “ tr p Y ´ l q tr p Y ´p s ´ l q q ´ tr Y ´p s ´ q ff As tr p Y ´ l q tr p Y ´p s ´ l q q ď p tr Y ´ s for any 1 ď l ď s , and every term is integrableas n ě p ` s `
2, this means that ˆ ´ p p ` q sn ˙ E ” tr Y ´ s ı ď E ” tr Y ´p s ´ q ı . (8.8)This shows the first part of the proof.For the second part, let S P N . If we let n Ñ 8 such that lim n Ñ8 pn “ α ă S ă α ´ we will have n ě p ` S ` n ě p p ` q S for n largeenough. So by repeatedly applying Equation (8.8) for s “ S, . . . , p , we obtain S ź l “ ˆ ´ p p ` q ln ˙ ¨ p E ” tr Y ´ S ı ď p E ” tr Y ´ ı “ . Taking a limit in the above yields S ź l “ p ´ αl q lim n Ñ8 p E ” tr Y ´ S ı ď . Thus for any S ă α ´ , we havelim n Ñ8 p E ” tr Y ´ S ı ď S ź l “ ´ αl ă 8 . (8.9) h´etelat and Wells/Mid-scale Wishart asymptotics In the case 0 ă α ă
1, if s ` ă α ´ then by Jensen’s inequality and Equation(8.9) applied to S “ s `
1, we havelim n Ñ8 p E ”´ tr Y ´ s ˘ ` s ı ď lim n Ñ8 p E ” tr Y ´p s ` q ı ă 8 . Thus p tr Y ´ s is uniformly integrable, and by Equation (8.5),lim n Ñ8 p E ” tr T ´ s ı “ ż α ` α ´ a p α ` ´ t qp t ´ α ´ q παt s ` dt for α ˘ “ p ˘ ? α q .In contrast, by applying Jensen’s inequality twice,1 “ E „ p tr Y ´ s ď E „´ p tr Y ¯ ´ s ď E „ p tr Y ´ s so when α “
0, by applying Equation (8.9) with S “ s , we obtain thatlim n Ñ8 p E r tr Y ´ s s “
1, as desired.
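As a sanity check on Lemma 5 in the simplest case $s = 1$, the sketch below integrates $t^{-s}$ against the Marchenko-Pastur density $\sqrt{(\alpha_+ - t)(t - \alpha_-)}/(2\pi\alpha t)$ with $\alpha_\pm = (1 \pm \sqrt{\alpha})^2$ and compares the result with the classical inverse-Wishart mean, which gives $\frac{1}{p} E[\operatorname{tr} Y^{-1}] = n/(n - p - 1)$ for $Y \sim W_p(n, I_p/n)$ and hence the limit $1/(1 - \alpha)$. The substitution $t = 1 + \alpha + 2\sqrt{\alpha}\cos\theta$, the midpoint rule and the function names are choices made for this illustration, not part of the paper.

```python
import math


def mp_inverse_moment(alpha, s, steps=2000):
    """Integral of t^(-s) against the Marchenko-Pastur density with ratio
    0 < alpha < 1. With t = c + r*cos(theta), c = 1 + alpha and
    r = 2*sqrt(alpha), the square root in the density becomes r*sin(theta)
    and dt contributes another r*sin(theta), so the integrand is smooth."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    c = 1.0 + alpha
    r = 2.0 * math.sqrt(alpha)
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h  # midpoint rule on [0, pi]
        t = c + r * math.cos(theta)
        total += (r * math.sin(theta)) ** 2 / (2.0 * math.pi * alpha * t ** (s + 1))
    return total * h


def exact_first_inverse_moment(n, p):
    """(1/p) E[tr Y^{-1}] for Y ~ W_p(n, I_p/n), from the inverse-Wishart
    mean E[W^{-1}] = I_p / (n - p - 1) for W ~ W_p(n, I_p); needs n > p + 1."""
    return n / (n - p - 1.0)


if __name__ == "__main__":
    alpha = 0.25
    print("MP integral, s=1:", mp_inverse_moment(alpha, 1))
    print("1 / (1 - alpha): ", 1.0 / (1.0 - alpha))
    for n in (100, 10_000, 1_000_000):
        p = int(alpha * n)
        print(n, exact_first_inverse_moment(n, p))
```

For $s = 1$ the recursive bound of the lemma holds with equality, since $(1 - (p+1)/n) \cdot n/(n - p - 1) = 1$, which is a convenient consistency check on the constant $(p+1)$.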
9. Conclusion
The results of this paper raise more questions than they answer. We enumerate some that we found particularly interesting.

(1) The univariate $t$ distribution with $\nu$ degrees of freedom is often defined as the distribution of $Z/\sqrt{s}$, for $Z \sim N(0, 1)$ and $s \sim \chi^2_\nu/\nu$ independent. In the real symmetric matrix case, we could imagine studying the distribution of $S^{1/2} Z S^{1/2}$, for $Z \sim \mathrm{GOE}(p)$ and $S \sim W_p(\nu, I_p/\nu)$ independent. Is this the $T_\nu(I_p)$ distribution in the sense of Section 4?

(2) By Theorem 2 and Corollary 1, it is clear the empirical moments of a symmetric $t$ distribution are quite similar to those of a Gaussian orthogonal ensemble matrix, except perhaps in their rates of convergence. From Anderson et al. [2010, Theorem 2.1.31], we know the empirical moments of a Gaussian orthogonal ensemble are asymptotically normal. Are the empirical moments of the symmetric $t$ distribution also asymptotically normal?

(3) In Section 4, we showed that the rate of convergence of the even normalized empirical moments of a symmetric $t$ distribution changes when $p$ grows like $\sqrt{n}$. Can we find analogous symmetric $t$ statistics that change their rates of convergence when $p$ grows like $n^{(K+1)/(K+3)}$ for every $K \in \mathbb{N}$? This would establish phase transitions for the symmetric $t$ distribution. If so, can we find approximating densities between every two transitions, just like in the Wishart case?

(4) As a counterpart of Theorem 1, could we prove that $d_{\mathrm{TV}}(f_{\mathrm{NW}}, f_K) \not\to 0$ when $p^{K+3}/n^{K+1} \not\to 0$ as $n \to \infty$? This is delicate because we have no guarantee that the $L^1$ norm of $\psi_K$ is asymptotically bounded for regimes of degree $K+1$.

(5) Can we find the normalization constant or, better, solve the expectation of Equation (1.3) in closed form?

(6) What asymptotics hold for the symmetric $t$ or the Wishart distribution in a middle-scale regime of infinite degree? How do these asymptotics differ from the other middle-scale regimes, or the high-dimensional regime?

(7) The symmetric $t$ distribution was discovered as the G-conjugate of the Wishart distribution. What other distributions can be realized as the G-conjugate of some well-known distribution?

(8) In Lemma 2, we expressed the characteristic function of the G-conjugate $F^*$ of a distribution $F$ as $f^{1/2} \star (f^{1/2} \circ R)$, for $f$ the density of $F$ and $R$ the flip operator. To obtain the moments, we then repeatedly differentiated under the convolution integral at zero, and obtained an expression of the moments as an expectation with respect to $f$. The argument worked when $F^*$ was the symmetric $t$ distribution. Can this argument be generalized to other $F^*$? If $F^*$ is a well-known distribution, does this give rise to novel and nontrivial expressions for its moments?

(9) The G-transform of a distribution encodes all the information relative to that distribution. However, taking a modulus removes some information, and so in some sense the G-conjugate distribution is "less informative" than the original distribution. What happens when we repeatedly apply the G-conjugation operator, destroying information every time? For example, is there an attractor distribution $G$ that is the limit of this process regardless of the initial distribution?

(10) Can we find distinct random operators which can be regarded, in some sense, as the total variation limit of a normalized Wishart matrix between every two phase transitions?

It appears to us that some of these questions might be very difficult to answer. We would be pleased if future work were able to shed light on any of them.

References
Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An Introduction to Random Matrices. Cambridge University Press, 2010.

Zhidong Bai and Jack W. Silverstein. Spectral Analysis of Large Dimensional Random Matrices, volume 20. Springer, 2010.

Maurice S. Bartlett. On the theory of statistical regression. Proceedings of the Royal Society of Edinburgh, 53:260–283, 1933.

Sébastien Bubeck and Shirshendu Ganguly. Entropic CLT and phase transition in high-dimensional Wishart matrices. International Mathematics Research Notices, pages 243–258, 2016.

Sébastien Bubeck, Jian Ding, Ronen Eldan, and Miklós Z. Rácz. Testing for high-dimensional geometry in random graphs. Random Structures & Algorithms, 49:503–532, 2016.

Kai-Lai Chung. A Course in Probability Theory. Academic Press, 2001.

Michel Gaudin. Sur la loi limite de l'espacement des valeurs propres d'une matrice aléatoire. Nuclear Physics, 25:447–458, 1961.

Arjun K. Gupta and Daya K. Nagar. Matrix Variate Distributions. CRC Press, 1999.

Tiefeng Jiang and Danning Li. Approximation of rectangular beta-Laguerre ensembles and large deviations. Journal of Theoretical Probability, 28:804–847, 2015.

Gérard Letac and Hélène Massam. All invariant moments of the Wishart distribution. Scandinavian Journal of Statistics, 31:295–318, 2004.

Vladimir A. Marchenko and Leonid A. Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114:507–536, 1967.

Sho Matsumoto. General moments of the inverse real Wishart distribution and orthogonal Weingarten functions. Journal of Theoretical Probability, 25:798–822, 2012.

Madan L. Mehta. On the statistical properties of the level-spacings in nuclear spectra. Nuclear Physics, 18:395–419, 1960a.

Madan L. Mehta. L'emploi des polynômes orthogonaux pour calculer certain déterminants. Rapport S PH (Saclay), 658, 1960b.

Robb J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley Series in Probability and Mathematical Statistics. Wiley, 1982.

Charles E. Porter and Norbert Rosenzweig. Statistical properties of atomic and nuclear spectra. Technical report, Univ. of Minnesota, Minneapolis, 1960.

Mohsen Pourahmadi. High-dimensional Covariance Estimation: with High-Dimensional Data. John Wiley & Sons, 2013.

Miklós Z. Rácz and Jacob Richey. A smooth transition from Wishart to GOE. arXiv:1611.05838, 2016.

Aad W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.

John Von Neumann and Herman H. Goldstine. Numerical inverting of matrices of high order. Bulletin of the American Mathematical Society, 53:1021–1099, 1947.

Eugene P. Wigner. On the statistical distribution of the widths and spacings of nuclear resonance levels. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 47, pages 790–798, 1951.

Eugene P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62:548–564, 1955.

Eugene P. Wigner. Statistical properties of real symmetric matrices with many dimensions. In Proceedings of the Fourth Canadian Mathematical Congress. University of Toronto Press, 1957.

John Wishart. The generalised product moment distribution in samples from a normal multivariate population.