Gaussian distributions on Riemannian symmetric spaces in the large N limit
GGaussian distributions on Riemannian symmetricspaces in the large N limit (cid:63) Simon Heuveline , , Salem Said , and Cyrus Mostajeran Centre for Mathematical Sciences, University of Cambridge, United Kingdom [email protected] Mathematical Institute, University of Heidelberg, Germany CNRS, University of Bordeaux, France Department of Engineering, University of Cambridge, United Kingdom
Abstract.
We consider Gaussian distributions on certain Riemanniansymmetric spaces. In contrast to the Euclidean case, it is challenging tocompute the normalization factors of such distributions, which we referto as partition functions. In some cases, such as the space of Hermitianpositive definite matrices or hyperbolic space, it is possible to computethem exactly using techniques from random matrix theory. However, inmost cases which are important to applications, such as the space ofsymmetric positive definite (SPD) matrices or the Siegel domain, thisis only possible numerically. Moreover, when we consider, for instance,high-dimensional SPD matrices, the known algorithms for computingpartition functions can become exceedingly slow. Motivated by notionsfrom theoretical physics, we will discuss how to approximate the partitionfunctions by their large N limit: an approximation that gets increasinglybetter as the dimension of the underlying symmetric space (more pre-cisely, its rank) gets larger. We will give formulas for leading order termsin the case of SPD matrices and related spaces. Furthermore, we willcharacterize the large N limit of the Siegel domain through a singularintegral equation arising as a saddle-point equation. Keywords:
Gaussian distributions · Riemannian symmetric spaces ·Random matrix theory · SPD matrices · Partition functions · High-dimensional data
It is widely known that an isotropic Gaussian distribution on R N takes theform p ( x ; ¯ x, σ ) = (2 πσ ) − n/ exp( − σ ( x − ¯ x ) ) , where ¯ x and σ denote the mean (cid:63) This work benefited from partial support by the European Research Council underthe Advanced ERC Grant Agreement Switchlet n.670645. S.H. also received partialfunding from the Cambridge Mathematics Placement (CMP) Programme. C.M. wassupported by Fitzwilliam College and a Henslow Fellowship from the CambridgePhilosophical Society. a r X i v : . [ m a t h . S T ] F e b S. Heuveline et al. and standard deviation, respectively. Gaussian distributions can naturally begeneralized to Riemannian manifolds ( M, g ) with the property Z M (¯ x, σ ) := (cid:90) M exp( − σ d g ( x, ¯ x ) ) d vol g ( x ) < ∞ , (1)where d vol g ( x ) denotes the Riemannian volume measure on M and d g : M × M → M denotes the induced Riemannian distance function. We will refer to Z M as the partition function . In this case, p M ( x ; ¯ x, σ ) := 1 Z M (¯ x, σ ) exp( − σ d g ( x, ¯ x ) ) , (2)is a well-defined probability distribution on M .In numerous applications requiring the statistical analysis of manifold-valueddata, it is important to be able to compute the partition function Z M . A keydifference between p R N and p M in general is that Z M is typically a complicatedintegral while Z R N (¯ x, σ ) = (2 πσ ) n/ is readily available. On a general Rieman-nian manifold, Z M will depend in a highly non-linear and intractable way on themean ¯ x as the dominant contribution to Z M will come from local data such asthe curvature at ¯ x . This complex dependence on local data is known from theheat-kernel approach to the Atiyah-Singer Index Theorem (see [3] chapters 2 and4), for instance. Hence, to have a chance of actually computing the full parti-tion function analytically, we should restrict to spaces that look the same locallyat any point. Riemannian symmetric spaces formalize this intuition and we arefortunate that precisely such spaces appear in most applications of interest (see,for example, [2,5,8,13,14,15,21]). Definition 1. A Riemannian symmetric space is a Riemannian manifold ( M, g ) such that for each point p ∈ M there is a global isometry s p : M → M such that s p ( p ) = p and d p s p = − id T p M , where d p s p denotes the differential of s p at p . This definition emphasizes the geometric property that there are a lot ofisometries on M . In fact, in the general theory of symmetric spaces [4,7], it canbe shown that any symmetric space is isomorphic to a quotient of Lie groups M ∼ = G/H and that integrals over M can be reduced to integrals over therespective Lie groups and their Lie algebras (more precisely, certain subspaces)by the following proposition, which is discussed in [15] and references therein: Proposition 2.
Given an integrable function on a non-compact symmetric space ( M = G/H, g ) , we can integrate in the following way: (cid:90) M f ( x ) d vol g ( x ) = C (cid:90) H (cid:90) a f ( a, h ) D ( a ) dadh (3) where dh is the normalised Haar measure on H and da is the Lebesgue measureon a certain subspace a ⊂ g = Lie( G ) [7,15]. The function D : a → R + is givenby the following product over roots λ : a → R + (with m λ -dimensional root space) D ( a ) = (cid:81) λ> sinh m λ ( | λ ( a ) | ) . aussian distributions on Riemannian symmetric spaces in the large N limit 3 Proposition 2 and the fact that its isometry group acts transitively on a sym-metric space allows us to prove that in fact, the partition function of a symmetricspace does not depend on ¯ x : Z G/H (¯ x, σ ) = Z G/H ( σ ) (see [15]). Moreover, withProposition 2 at hand, the problem of determining Z G/H reduces to calculatingan integral over a , which is a linear space. Computing the resulting integralscan now be achieved numerically using a specifically designed Monte-Carlo algo-rithm [14]. Even though this works well for small dimensions, a ∼ = R N is typicallystill a high-dimensional vector space when G/H is high-dimensional. The knownalgorithms start to break down for N ≈ [15], even though cases involving N > are of relevance to applications such as electroencephalogram (EEG)based brain-computer interfaces [2,18]. It should be noted as an aside, however,that for some spaces, N does not depend on the dimension of the underlyingsymmetric space. For instance, in hyperbolic d -space, N = 1 independently of d [8]. Fortunately, for the space of positive definite Hermitian matrices, the result-ing integrals have previously been studied in the context of Chern-Simons theory[11,20] and, in fact, they fit within the general theory of random matrices [12]. Inthis paper, we will extend such approaches to study the corresponding integralsfor a broader range of symmetric spaces of interest, including the space of sym-metric positive definite (SPD) matrices and the Siegel domain. We should liketo note that we have recently discovered that ideas similar to the ones exploredin this paper were developed independently by [17] from a slightly different per-spective, which we will draw attention to throughout this work. However, weexpect that our work will complement [17] by providing a stronger emphasis onthe large N limit, including an analysis of the saddle-point equation. N and random matrices Motivated by applications, we will study the following spaces:1. The spaces of symmetric, Hermitian and quaternionic Hermitian positivedefinite matrices will be denoted by P F ( N ) ∼ = GL( N, F ) /K where K ∈{ U( N ) , O ( N ) , Sp( N ) } for F ∈ { R , C , H } , respectively. In each case, a ⊂ Lie(GL( N, F )) ∼ = End( N, F ) is the N -dimensional space of diagonal matri-ces. We will denote the partition functions by Z β ( σ ) where β ∈ { , , } ,respectively. P ( N ) := P R ( N ) is commonly referred to as the space of SPDmatrices .2. The Siegel domain D ( N ) ∼ = Sp(2 N, R ) / U( N ) is the space of complex sym-metric N × N matrices z such that the imaginary part Im ( z ) is a positivedefinite matrix. We will denote its partition function by Z S .Using Proposition 2, it can be seen [8] that their respective partition functionsare given by Z β ( σ ) = C N,β ( σ )(2 π ) N N ! (cid:90) R N + N (cid:89) i =1 (cid:16) exp (cid:0) − log ( u i )2 σ (cid:1)(cid:17) | ∆ ( u ) | β N (cid:89) i =1 du i (4) S. Heuveline et al. where ∆ ( u ) := (cid:81) i 1) + 1 . This is in the well-knownform of a random matrix partition function: Z ( σ ) = (cid:90) ( a,b ) N N (cid:89) i =1 (cid:16) exp (cid:0) − V ( u i ; σ )) (cid:17) | ∆ ( u ) | β N (cid:89) i =1 du i , (5)even though our potential takes the non-standard form V SW ( x ; σ ) := σ log ( x ) .For instance, with the quadratic potential V Q ( x ; σ ) := σ x , the random ma-trix partition function from Equation (5) corresponds to the famous orthogonal,unitary and symplectic random matrix ensembles for β ∈ { , , } , respectively[12], on which we will comment further in Example 9. In fact, matrix modelswith the potential V SW also appear in the physics literature as the partitionfunctions of U( N ) Chern-Simons theory on S ([11,20]). This relation is notdirectly of use here, but it is worth mentioning that it provided our originalinspiration for probing the structure behind the normalizing factors. Moreover,the large N limit of U( N ) Chern-Simons theory is of physical interest and hasbeen well studied in the theoretical and mathematical physics literature such as[10] Chapter 36.2 and [1], which motivates Section 3. Z S also turns out to be ofthe form (5) with β = 1 and the potential V S ( u ; σ ) := log ( u + √ u − σ : Z S ( σ ) = vol(U( N ))2 N ( N +1)2 N ! (cid:90) (1 , ∞ ) N N (cid:89) i =1 exp (cid:0) − V S ( u i ; σ ) (cid:1) ∆ ( u ) N (cid:89) i =1 du i . (6)Integrals of the form (5) can generally be calculated, by bringing the Vander-monde determinant to a suitable form involving orthogonal or skew-orthogonalpolynomials. Definition 3. Let V : ( a, b ) → R be a given potential. A set of polynomials { R i : i = 1 , . . . , N } , with R j ( x ) = a j x j + . . . being of degree j is called1. orthogonal with potential V if they form an orthonormal basis for thespace of degree N polynomials with respect to the inner product (cid:104) f, g (cid:105) = (cid:90) ba exp (cid:0) − V ( x ) (cid:1) f ( x ) g ( x ) dx. (7) skew-orthogonal with potential V if they bring the skew-symmetric prod-uct (cid:104) f, g (cid:105) = (cid:90) ba (cid:90) ba f ( x ) g ( y ) sign ( x − y ) exp( − V ( x )) exp( − V ( y )) dxdy (8) to the standard form, meaning (cid:40) (cid:104) R k , R l (cid:105) = (cid:104) R k +1 , R l +1 (cid:105) = 0 (cid:104) R k , R l +1 (cid:105) = −(cid:104) R l +1 , R k (cid:105) = δ kl . (9) aussian distributions on Riemannian symmetric spaces in the large N limit 5 If we are given orthogonal or skew-orthogonal polynomials for a given poten-tial V , then its corresponding partition function with β = 1 or β = 2 , respec-tively, can be calculated in terms of the polynomials’ leading order coefficients[12]. A similar story, which we will not go into also works for the quaternioniccase β = 4 . Fortunately, orthogonal polynomials for the potential V SW havepreviously appeared in the literature as Stieltjes-Wigert polynomials [19],which can be used to arrive at the following result [8]. Proposition 4. The partition function Z for P C ( N ) is given by Z ( σ ) = ω ( N )2 N (2 πσ ) N exp (cid:16) ( N − N ) σ (cid:17) N − (cid:89) k =1 (1 − e − kσ ) N − k . Remark 5. It has come to our attention that very recently, there has beenfurther progress in the case β = 2 . In the beautiful paper [6], higher momentsof the probability density are provided alongside further physical interpretationsthat go beyond the ones presented here. Moreover, the eigenvalue density at finite N is computed in [17]. Unfortunately, even though [17] provides explicit calculations for the first fewskew-orthogonal polynomials of V SW , general expressions for such polynomialshave yet to be found as noted in [8,17]. However, if we let a i and b i denote theleading order coefficients of skew orthogonal polynomials { P i = a i x i + . . . | i =1 , . . . , N } for V SW and skew orthogonal polynomials { Q i = b i x i + . . . | i =1 , . . . , N } for V S , respectively, then we can obtain exact formulas for Z and Z S at finite N in terms of a i and b i [8]. For technical convenience, we will fo-cus on the even-dimensional cases ( N = 2 m ) in this work. The odd-dimensionalcases can be treated in a similar manner subject to a number of technical mod-ifications. See [17] for details on the treatment of the odd-dimensional case. Proposition 6. If N = 2 m , the partition functions Z and Z S are given by Z ( σ ) = ω ( N )2 Nm exp (cid:0) − N (( N − / ( σ / (cid:1)(cid:18) N (cid:89) l =1 a l (cid:19) − (10) Z S ( σ ) = vol(U( N ))2 m ( N +1) (cid:18) N (cid:89) l =1 b l (cid:19) − . (11)The coefficients a i and b i can be found by bringing the inner product (8) intothe standard form (9), which can be achieved numerically via a symplectic Gram-Schmidt algorithm [16]. Nonetheless, such a computation becomes exceedinglychallenging for large N , which motivates the consideration of the large N limitprovided in the following section. N limit and thesaddle-point equation We will now change our point of view and compute the large N limit of thepartition function Z , rather than trying to compute it at finite N . Here, the S. Heuveline et al. large N limit specifically refers to the limit where N → ∞ while the ‘t Hooftparameter t := σ N is kept fixed. In this limit, Z has an asymptotic expansionin N − , known as the genus expansion [8,11]: log( Z ( σ )) ∼ ∞ (cid:88) g =0 f g ( t ) N − g . (12)The first term in this series becomes an increasingly good approximation as N gets larger, which is very useful since we are able to compute it.Since fixing t implies that σ → , the large N limit is also referred to asthe double scaling limit [17]. Moreover, in the physics literature, it is alsoknown as the planar limit due to its interpretation in terms of planar Feynmandiagrams [8] originally dating back to the 1970s where it has been studied in thecontext of quantum chromodynamics [9]. Interestingly, one may also considerother limits such as σ → and σ → ∞ while N is fixed, which we will notconsider in this work (see [17] (II, 3. a, b) for further details). In the case of Z ,the large N limit can be directly computed from Proposition 4 and is found tobe N log( Z ( σ )) ∼ − 12 log (cid:18) Nπ (cid:19) + 34 + t − Li ( e − t ) − ζ (3) t (13)where Li ( x ) := ∞ (cid:80) k =1 x k k (for | x | < ) is the trilogarithm.Equation (13) crucially relies on having a closed form expression for thepartition function Z for any finite N in the first place (Proposition 4). This isnot the case for Z and Z S , since we do not know the asymptotics of a i and b i from Proposition 6. Therefore, we will now discuss a different, more powerfulapproach to the large N limit inspired by ideas from theoretical physics: a saddle-point approximation. This will enable us to directly obtain the large N limit of Z in Proposition 10 by solving a certain singular integral equation, the saddle-point equation .As above, we split Z β ( σ ) = C N,β ( σ ) ˜ Z β ( σ ) , where ˜ Z β is in an appropriateform for the use of saddle point methods as discussed in [11] (see Equation(1.46) therein): ˜ Z β ( σ ) = 1 N ! (cid:90) R N + exp (cid:0) N ˜ V SW ( (cid:126)λ, β ) (cid:1) N (cid:89) k =1 dλ k π , (14)where we have introduced the effective potential : ˜ V SW (cid:16) (cid:126)λ, β ; t (cid:17) := − tN N (cid:88) i =1 log ( λ i ) + βN (cid:88) i The effective potential ˜ V SW gives rise to a physical interpretationof the theory and its large N limit: the N eigenvalues can be seen as static par-ticles in the potential V SW interacting through a logarithmic Coulomb repulsion(the second term of the effective potential). As σ = tN decreases, the repulsionbecomes weak and all particles can sit next to each other close to the minimumof the potential ( λ = 1 ), while the particles tend to spread out for large σ asobserved in Figure 1. Now, the large N limit can be seen to correspond to theaddition of more and more particles (i.e. N → ∞ ) while letting their repulsionbecome increasingly weak (fixing t = N σ ). Finding the limiting distribution ρ t (that still depends on t ) turns out to characterize the large N limit of the parti-tion function. It can be obtained by solving the saddle-point equation for ˜ V in acontinuum limit as discussed below. Motivated by the physics literature, we willrefer to this ρ t as the master field . In the random matrix theory literature, thisapproach is known as the Coulomb gas method [12]. Remark 8. A further observation, from the effective action (15) is that up toan overall factor of ˜ V (which leaves the saddle-point equation invariant), we canabsorb the β by rescaling t (cid:55)→ t/β . So, in the large N limit, ˜ Z β ( σ ) ∼ ˜ Z ( (cid:112) β/ σ ) irrespective of the choice of β ∈ { , , } , which is remarkable considering thequite distinct geometric origins associated with the different values of β . Werefer to this phenomenon as universality : the three cases K ∈ { R , C , H } startout differently, but "flow" to a universal limit (characterized by the master field)as N → ∞ . This is specially useful for the particularly important case of β = 1 ,which can now be related to the large N limit of Z derived in Equation (13) : N log( Z β ( σ )) ∼ N log (cid:18) Z (cid:18)(cid:113) β σ (cid:19)(cid:19) − log C ∞ ,β ( σ ) C ∞ , ( (cid:113) β σ ) (17) where C ∞ ,β are the large N limits of the prefactors, which are discussed in [8].Below, we will give another formula for the large N limit of Z β , by solving thesaddle-point equation explicitly. Example 9 (Wigner’s semi-circle law) . For simplicity, since V Q ( λ ; σ ) = σ λ has a simpler form than V SW or V S yet illustrates all the necessary ideas, we willbegin by considering its saddle-point equation. As motivated in Remark 7 and fol-lowing [11] ((1.47)-(1.53)), we are interested in the saddle points of the effectivepotential, which in this case reads ˜ V Q ( (cid:126)λ ; t ) := − tN (cid:80) i λ i + βN (cid:80) i The master field ρ Qβt/ of Wigner’s semicircle law (left) and ρ SWβt/ (right) forfixed t = 1 / and different choices of β . The red, blue and green curves correspond tothe real, complex and quaternionic cases, respectively ( β = 1 , , ). In fact, the steps of Example 9 that lead to the saddle point equation (19)can be performed analogously for the potentials V SW and V S . Moreover, thesaddle-point equation for V SW can even be solved analytically using resolventmethods [8,11], which leads to the following result. Proposition 10. Let β > . If N → ∞ while the ‘t Hooft parameter t = N σ is fixed, we get N log( ˜ Z β ( σ )) ∼ F uni (cid:18) β t (cid:19) + O ( N − ) (20) aussian distributions on Riemannian symmetric spaces in the large N limit 9 where F uni ( t ) = − t (cid:90) C ( t ) ρ SWt ( λ ) log λ dλ + (cid:90) C ( t ) ρ SWt ( λ ) ρ SWt ( λ (cid:48) ) log( | λ − λ (cid:48) | ) dλdλ (cid:48) (21) and the master field ρ SWt is given by ρ SWt ( λ ) = 1 πtλ tan − (cid:34) (cid:112) λ − (1 + e − t λ ) e − t λ (cid:35) χ C ( t ) (22) where C ( t ) = [2 e t − e t + 2 e t √ e t − , e t − e t − e t √ e t − . In Figure 1, Wigner’s semicircle distribution ρ Qt is plotted alongside ρ SWt for different choices of β ∈ { , , } . It can be observed that as t decreases(or equivalently, β decreases), the distibutions tend to concentrate around theclassical minima λ = 0 (for V Q ) and λ = 1 (for V SW ). Conversely, they tend tospread out as t increases.Finally, we note a similar equation that characterizes the master field in thecase of the Siegel domain: log (cid:0) λ + √ λ − (cid:1) t √ λ − P (cid:18)(cid:90) ∞ ρ St ( λ (cid:48) ) dλ (cid:48) λ − λ (cid:48) (cid:19) . (23)At present, we are unaware of a solution to this equation and finding one ei-ther numerically or using resolvent methods is to be carried out in future work.Moreover, it is worth noting that we have certainly not exhausted the list of all(irreducible) symmetric spaces and have rather just focused on a few exampleshere. Even though further spaces are commented on in [8], the development of afull description of the partition functions of all irreducible symmetric spaces andtheir interconnections remains to be done. Universalities and duality betweencompact and non-compact symmetric spaces will be guiding principles towardsthis goal. References 1. Auckly, D., Koshkin, S.: Introduction to the Gopakumar-Vafa large N duality. TheInteraction of Finite-Type and Gromov-Witten Invariants (BIRS 2003) , 195–456(2006)2. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multiclass brain–computerinterface classification by Riemannian geometry. IEEE Transactions on BiomedicalEngineering (4), 920–928 (2012)3. Berline, N., Getzler, E., Vergne, M.: Heat kernels and Dirac operators. SpringerScience & Business Media (2003)4. Eberlein, P.: Geometry of nonpositively curved manifolds. University of ChicagoPress (1996)0 S. Heuveline et al.5. Fletcher, P.T., Joshi, S.: Principal geodesic analysis on symmetric spaces: Statisticsof diffusion tensors. In: Sonka, M., Kakadiaris, I.A., Kybic, J. (eds.) ComputerVision and Mathematical Methods in Medical and Biomedical Image Analysis. pp.87–98. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)6. Forrester, P.J.: Global and local scaling limits for the β = 2 Stieltjes–Wigert ran-dom matrix ensemble. arXiv preprint arXiv:2011.11783 (2020)7. Helgason, S.: Differential geometry, Lie groups, and symmetric spaces. Academicpress (1979)8. Heuveline, S., Said, S., Mostajeran, C.: Gaussian distributions on Riemannian sym-metric spaces, random matrices and planar Feynman diagrams. To appear (2021)9. Hooft, G.: A planar diagram theory for strong interactions. Nuclear Physics B (3), 461–473 (1974). https://doi.org/10.1016/0550-3213(74)90154-010. Hori, K., Thomas, R., Katz, S., Vafa, C., Pandharipande, R., Klemm, A., Vakil,R., Zaslow, E.: Mirror symmetry, vol. 1. American Mathematical Soc. (2003)11. Marino, M., et al.: Chern-Simons theory, matrix models, and topological strings.No. 131, Oxford University Press (2005)12. Mehta, M.L.: Random matrices. Elsevier (2004)13. Pennec, X., Sommer, S., Fletcher, T.: Riemannian Geometric Statistics in MedicalImage Analysis. Academic Press (2019)14. Said, S., Bombrun, L., Berthoumieu, Y., Manton, J.H.: Riemannian Gaus-sian distributions on the space of symmetric positive definite matri-ces. IEEE Transactions on Information Theory (4), 2153–2170 (2017).https://doi.org/10.1109/TIT.2017.265380315. Said, S., Hajri, H., Bombrun, L., Vemuri, B.C.: Gaussian distributions onriemannian symmetric spaces: Statistical learning with structured covariancematrices. IEEE Transactions on Information Theory (2), 752–772 (2018).https://doi.org/10.1109/TIT.2017.271382916. Salam, A.: On theoretical and numerical aspects of symplectic Gram–Schmidt-like algorithms. Numerical Algorithms (4), 437–462 (2005).https://doi.org/10.1007/s11075-005-0963-217. Santilli, L., Tierz, M.: Riemannian Gaussian distributions, random matrix ensem-bles and diffusion kernels. arXiv preprint arXiv:2011.13680 (2020)18. Seeck, M., Koessler, L., Bast, T., Leijten, F., Michel, C., Baumgart-ner, C., He, B., Beniczky, S.: The standardized EEG electrode ar-ray of the IFCN. Clinical Neurophysiology (10), 2070–2077 (2017).https://doi.org/10.1016/j.clinph.2017.06.25419. Szegő, G.: Orthogonal polynomials, vol. 23. American Mathematical Soc. (1939)20. Tierz, M.: Soft matrix models and Chern–Simons partition func-tions. Modern Physics Letters A (18), 1365–1378 (2004).https://doi.org/10.1142/S021773230401410021. Tupker, Q., Said, S., Mostajeran, C.: Online learning of Riemannian hidden Markovmodels in homogeneous Hadamard spaces. To appear (2021)22. Wigner, E.P.: Characteristic vectors of bordered matrices with infinite dimensions.Annals of Mathematics (3), 548–564 (1955),(3), 548–564 (1955),