[PDF] Tensor hypercontraction: A universal technique for the resolution of matrix elements of local, finite-range N -body potentials in many-body quantum problems

Abstract

Configuration-space matrix elements of N-body potentials arise naturally and ubiquitously in the Ritz-Galerkin solution of many-body quantum problems. For the common specialization of local, finite-range potentials, we develop the eXact Tensor HyperContraction (X-THC) method, which provides a quantized renormalization of the coordinate-space form of the N-body potential, allowing for a highly separable tensor factorization of the configuration-space matrix elements. This representation allows for substantial computational savings in chemical, atomic, and nuclear physics simulations, particularly with respect to difficult "exchange-like" contractions.

Full PDF

EExact tensor hypercontraction: A universal technique for the resolution of matrixelements of local, ﬁnite-range N -body potentials in many-body quantum problems Robert M. Parrish, Edward G. Hohenstein,

2, 3

Nicolas F. Schunck, ∗ C. David Sherrill, † and Todd J. Mart´ınez

2, 3, ‡ Center for Computational Molecular Science and Technology,School of Chemistry and Biochemistry,and School of Computational Science and Engineering,Georgia Institute of Technology, Atlanta, GA 30332-0400, United States Department of Chemistry and the PULSE Institute, Stanford University, Stanford, CA 94305 SLAC National Accelerator Laboratory, Menlo Park, CA 94025 Lawrence Livermore National Laboratory, Livermore, CA 94551 (Dated: April 11, 2018)Conﬁguration-space matrix elements of N -body potentials arise naturally and ubiquitously inthe Ritz-Galerkin solution of many-body quantum problems. For the common specialization oflocal, ﬁnite-range potentials, we develop the eXact Tensor HyperContraction (X-THC) method,which provides a quantized renormalization of the coordinate-space form of the N -body potential,allowing for a highly separable tensor factorization of the conﬁguration-space matrix elements. Thisrepresentation allows for substantial computational savings in chemical, atomic, and nuclear physicssimulations, particularly with respect to diﬃcult “exchange-like” contractions. PACS numbers: 21.30.Fe,21.60.Jz,31.15.-p,31.10.+z

The physics of many-body quantum systems is of-ten captured by local, ﬁnite-range N -body potentialsˆ V ( x , . . . , x N ), where x is any convenient parameteri-zation of the physical space, e.g., position space ( x ≡ r )or momentum space ( x ≡ k ). Given some real, ﬁ-nite, one-particle Ritz-Galerkin basis set { ψ i ( x ) } , theconﬁguration-space representation of ˆ V is the integraltensor, (cid:104) i . . . n | ˆ V | i (cid:48) . . . n (cid:48) (cid:105) = (cid:90) d x . . . (cid:90) d x N ψ i ( x ) . . . ψ n ( x N ) ˆ V ( x , . . . , x N ) ψ i (cid:48) ( x ) . . . ψ n (cid:48) ( x N ) . (1)The generation, manipulation, and storage of this tensoris a major hurdle in many-body quantum simulations. Inorder to overcome the computational diﬃculties inherentto such high order tensors, it is common to introducesimplifying approximations. For example, the Slater ap-proximation [1] has been applied to reduce the numericalexpense of treating exchange terms involving the local,two-body Coulomb potential. Unfortunately, such ap-proximations can fail, as exempliﬁed by the often spec-tacular self-interaction errors induced by local approxi-mations to exchange interactions [2]. Another canoni-cal example is nuclear density functional theory (DFT),where the need for computational savings is the maindriver for the continued usage of energy density func-tionals (EDF) derived from the zero-range Skyrme-likepseudopotential [3], in spite of severe problems at bothtwo- and three-body levels [4, 5]. At the two-body level,even EDF derived from the ﬁnite-range Gogny pseudopo-tential [6], which allows to avoid some of the limitationsof Skyrme functionals [7], contain the same phenomeno-logical density-dependent terms recently shown to cause the collapse of all beyond mean-ﬁeld methods [8–11]. Re-moving density-dependences in the EDF, however, wouldprobably require introducing explicit ﬁnite-range N -potentials (with N ≥

2) would be highly desir-able .In this Letter, we show that an exact and separable decomposition exists for any local potential in a ﬁnitebasis set built from polynomial functions in any desiredparameterization of the physical space. This decompo-sition is motivated by our recently introduced TensorHyperContraction (THC) method for electronic struc-ture [12–14], which provided a phenomenological approx-imation for the electron repulsion integrals involving theCoulomb potential in non-polynomial basis sets. Thenew eXact Tensor HyperContraction (X-THC) represen-tation reveals two points of great importance for bothelectronic and nuclear structure problems. First, THCapproximation of the Coulomb interaction is exact for ba-sis sets which can be expressed in polynomial form (andthus the approximation in electronic structure arises onlybecause the basis functions used were of non-polynomialform). Secondly, THC approximation is applicable notonly to the two-body Coulomb interaction but also toarbitrary local potentials commonly encountered in nu-clear structure (such as the Coulomb, Gogny, local formsof realistic three-body potentials, etc.). Since the nu-clear problem is already commonly formulated in termsof polynomial basis sets, this implies that many problemsin nuclear structure can now be treated exactly with the lossless scaling reduction aﬀorded by the X-THC repre-sentation. The ﬁrst of these points may aid markedly a r X i v : . [ nu c l - t h ] S e p in the search for more eﬃcient THC approximations inelectronic structure, while the second may yield unprece-dented physical ﬁdelity in nuclear structure computations(especially within the context of nuclear DFT).Below, we ﬁrst demonstrate the key features of the X-THC representation through the representative exampleof a one-dimensional, two-body problem in Cartesian co-ordinates using Hermite functions. The D -dimensional, N -body generalization of X-THC is then presented. Fi-nally, we present an example implementation of X-THCfor the ﬁnite-range Gaussian potential in a basis of Her-mite functions, demonstrating that X-THC is both loss-less and markedly eﬃcient in practice. X-THC Example -

Consider a one-dimensional ( D =1) problem in Cartesian coordinates, involving a ﬁnitebasis of M + 1 Hermite functions { ψ i ( x ) } (labeled from0 to M ) with a local two-body ( N = 2) potential ˆ V ≡ ˆ V ( x , x ). The potential matrix elements are, (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) ≡ (cid:90) (cid:90) d x d x ψ i ( x ) ψ j ( x ) ˆ V ( x , x ) ψ i (cid:48) ( x ) ψ j (cid:48) ( x ) . (2)The ﬁrst stage in X-THC is to note that all ( M +1) prod-ucts ψ i ( x ) ψ i (cid:48) ( x ) are exactly spanned by an orthonor-mal “auxiliary” basis { χ A ( x ) } consisting of 2 M + 1Hermite functions with a slightly modiﬁed spatial range, χ A ( x ) ≡ ψ A ( √ x ), ψ i ( x ) ψ i (cid:48) ( x ) = (cid:80) A [ ii (cid:48) A ] χ A ( x ) , (3)where, [ ii (cid:48) A ] ≡ (cid:90) R d x ψ i ( x ) ψ i (cid:48) ( x ) χ A ( x ) . (4)This resolution is well known in the context of nuclearphysics [15–17], and is analogous to the popular Den-sity Fitting (DF) procedure of electronic structure the-ory [18–20]. In this context, the decomposition is exactthanks to the closure properties of the polynomial-basedHermite functions. The integrals are now given as, (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) = (cid:88) AB [ ii (cid:48) A ][ jj (cid:48) B ] G AB , (5)where, G AB ≡ (cid:90) (cid:90) R d x d x χ A ( x ) χ B ( x ) ˆ V ( x , x ) . (6)Thus, the fourth-order integral tensor is expressed as aproduct of second- and third-order tensors. Even thoughwe have compressed the fourth-order tensor, this repre-sentation still precludes scaling reduction in “exchange-like” terms. A canonical example of such a term is thepairing ﬁeld in Hartree-Fock-Bogoliubov theory,∆ ij ≡ (cid:88) i (cid:48) j (cid:48) (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) κ i (cid:48) j (cid:48) = (cid:88) ABi (cid:48) j (cid:48) [ ii (cid:48) A ][ jj (cid:48) B ] G AB κ i (cid:48) j (cid:48) , (7) where κ is the pairing tensor. Despite the factorization,computing this term still scales as O ( M ) = O ( M ND ).The critical step in THC is to resolve the three-indexoverlap integral [ ii (cid:48) A ] to “unpin” the indices i and i (cid:48) across some additional linear-scaling index P . That is, weseek a decomposition of the form [ ii (cid:48) A ] = (cid:80) P X Pi X Pi (cid:48) Y PA ,where the range of P is O ( M ). Thanks to the choice ofa polynomial basis, the overlap integral is exactly inte-grated by a 2 M + 1-node Gaussian quadrature (in thiscase, Gauss-Hermite) deﬁned by the nodes and weights { < x P , w P > } [21]. Therefore, the quadrature grid indexprovides a natural decomposition of the overlap integral,[ ii (cid:48) A ] = (cid:88) P w P ψ i ( x P ) ψ i (cid:48) ( x P ) χ A ( x P ) = (cid:88) P X Pi X Pi (cid:48) Y PA , (8)where X Pi ≡ ψ i ( x P ) and Y PA ≡ w P χ A ( x p ). This isreminiscent of the discrete variable representation [22–25] or pseudospectral [26] techniques of chemical physics.Deﬁning the intermediate Z P Q = (cid:80) AB Y PA G AB Y QB , thefull integral (2) is thus expressed as, (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) = (cid:88) P Q X Pi X Qj Z P Q X Pi (cid:48) X Qj (cid:48) . (9)This X-THC representation of the integral tensor is thekey for the exact O ( M ) = O ( M ND +1 ) treatment ofthe pairing term, via several intermediate summations,indicated here by brackets for clarity,∆ ij = (cid:88) P Qi (cid:48) j (cid:48) X Pi X Qj Z P Q X Pi (cid:48) X Qj (cid:48) κ i (cid:48) j (cid:48) = (cid:88) P X Pi (cid:88) Q X Qj  Z P Q (cid:88) i (cid:48) X Pi (cid:48) (cid:88) j (cid:48) X Qj (cid:48) κ i (cid:48) j (cid:48)  . (10) Interpretation -

At ﬁrst glance, the Z operator is amere mathematical intermediate, but there exists a muchricher interpretation: it is a quantized renormalizationof the coordinate-space representation of the potentialoperator ˆ V . To see this, we ﬁrst consider the continuous,renormalized potential operator ¯ V , deﬁned as,¯ V ( x , x ) ≡ (cid:88) AB χ A ( x ) χ B ( x ) G AB . (11)This operator is not equivalent to the original in physicalspace, i.e., ˆ V ( x , x ) (cid:54) = ¯ V ( x , x ), yet the matrix ele-ments of both operators are identical, i.e., (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) = (cid:104) ij | ¯ V | i (cid:48) j (cid:48) (cid:105) . The renormalized operator is simply the rawoperator ˆ V with all components outside of the ﬁnite prod-uct space { ψ i ( x ) ψ i (cid:48) ( x ) } ⇔ { χ A ( x ) } projected out ineach coordinate. This projection is serendipitous: thecoordinate-space integrand involving ¯ V and the productsof basis functions are exactly resolved by the Gaussianquadrature for the auxiliary basis, while the correspond-ing integrand for ˆ V is not exact under any ﬁnite quadra-ture due to the presence of “alias” components outsideof { ψ i ( x ) ψ i (cid:48) ( x ) } . Applying the Gaussian quadrature,we can quantize the renormalized operator ¯ V to producethe discrete operator ˜ V , adding quadrature weights toaccount for the spatial contribution of each point,˜ V ( x , x ) ≡ w P w Q δ ( x − x P ) δ ( x − x Q ) ¯ V ( x , x ) . (12)As with ¯ V , the matrix elements of ˜ V are identical tothose of ˆ V . Integrating ˜ V instead of ˆ V naturally exposesthe X-THC factorization, (cid:104) ij | ˆ V | i (cid:48) j (cid:48) (cid:105) = (cid:104) ij | ˜ V | i (cid:48) j (cid:48) (cid:105) = (cid:90) (cid:90) d x d x ψ i ( x ) ψ j ( x ) ˜ V ( x , x ) ψ i (cid:48) ( x ) ψ j (cid:48) ( x )= (cid:88) P Q X Pi X Qj Z P Q X Pi (cid:48) X Qj (cid:48) . (13)Here, the elements Z P Q are simply the quantized valuesof the renormalized potential, with the weights rolled in,i.e., Z P Q = w P w Q ¯ V ( x P , x Q ). An example involving aGaussian potential in Hermite functions is shown in Fig-ure 1. The renormalized potential (right) clearly showsthe eﬀects of projection from the raw potential (left).The locations of the quantization to Z P Q (the positionsat which ¯ V can be discretized in a lossless manner) areindicated with small white x’s on the right. −4 −2 0 2 4 −4 −2 0 2 4−4−2024 x x FIG. 1: (color online) Example of the X-THC process fora one-dimensional, two-body Gaussian potential ˆ V ( x , x ) =exp( − x ) in Hermite functions { ψ i ( x ) } up to M = 5. Left:raw ˆ V ( x , x ). Right: renormalized, quantizable ¯ V ( x , x ).White x’s indicate the collocation locations of the Gauss-Hermite quadrature to the quantized operator ˜ V ( x , x ). This understanding of the Z operator reveals thatwhile X-THC is built from DF and DVR techniques, theresultant supersedes both of the originals. In the contextof local potentials and polynomial basis sets, DF is alwaysexact, but does not provide separability of the i and i (cid:48) in-dices, precluding scaling reductions. DVR techniques doprovide separability, but are only exact when an inﬁnitequadrature is used, for an arbitrary choice of local poten-tial. By contrast, X-THC’s particular merger of DF and DVR yields a perfect dealiasing renormalization withina ﬁnite quadrature, providing a decomposition that isboth exact and separable for an arbitrary choice of localpotential. Generalized X-THC -

The generalization of the one-dimensional, two-body, Hermite function example aboveto N -body potentials in D -dimensions and other choicesof polynomial direct-product bases is straightforward.For X-THC to hold, the one-particle basis must be ofthe D -dimensional direct-product polynomial type, i.e., ψ i ( r ) ≡ (cid:81) Dµ =1 P i µ ( r µ ) v µ ( r µ ). In each dimension µ , P i µ is a polynomial of up to degree i µ , and v µ is an arbitraryweight function (analogous to the Gaussian term in theHermite functions above). Such basis sets are widely usedin atomic and nuclear many-body physics in various coor-dinate systems. Use of a direct-product polynomial basisautomatically guarantees closure: for the M µ + 1 func-tions in the µ th dimension, the span < ψ i µ ( r µ ) ψ i (cid:48) µ ( r µ ) > lies wholly inside a 2 M µ + 1-function auxiliary basis, de-ﬁned by a set of polynomials orthogonal with respect tothe weight | v µ ( r µ ) | . Additionally, all quadratic productsof auxiliary functions are exactly integrated by a 2 M µ +1-node Gaussian quadrature { < r P µ , w P µ > } which can al-ways be found, e.g., by the Golub-Welsch algorithm [27].These properties allow for the X-THC factorization, (cid:104) i . . . n | ˆ V | i (cid:48) . . . n (cid:48) (cid:105) = (cid:88) P ...W X Pi . . . X Wn Z P ...W X Pi (cid:48) . . . X Wn (cid:48) , (14)with each X Pi being the direct product of the D under-lying X P µ i µ . Z P ...W is the generalization of Z P Q to thecase with N -body auxiliary integrals G A...N .Within the X-THC representation, the represen-tative generalization of the pairing term, ∆ i...n ≡(cid:104) i . . . n | ˆ V | i (cid:48) . . . n (cid:48) (cid:105) κ i (cid:48) ...n (cid:48) , now scales as O ( M ND +1 µ ),rather than O ( M NDµ ), with no approximation or restric-tion on the form of the local, ﬁnite-range potential ˆ V .It is worth noting that common techniques to reducethe cost of treating exchange-like terms involve approxi-mating the potential to be direct-product separable over N w terms, e.g., by approximating the Coulomb opera-tor as a sum of separable Gaussians [28, 29]. This re-duces the conventional or DF cost of forming the gen-eralized pairing tensor to O ( M ND + Nµ ). X-THC can beapplied to this approximate separable potential, produc-ing an O ( M ND +1 µ ) implementation. However, the sepa-rable form gives no particular scaling advantage in theX-THC formalism, and can only reduce the prefactorand memory requirements. A more severe approxima-tion is the invocation of a zero-range potential. This istypically formulated as a DVR-type quadrature in coor-dinate space, which can be exact depending on the formof the zero-range operator [30]. The asymptotic scal-ing of a pairing term involving a zero-range potential is O ( M ND +1 µ ), due to the ﬁrst or last transformation intoor out of the grid index. Remarkably, this is the sameasymptotic scaling as X-THC. The zero-range potentialwill generally have lower prefactor than X-THC (as thereis only one grid coordinate in the zero-range potential),but the asymptotic scalings are identical, and thus thetractability limits should be comparable. A summary ofthe scaling reductions aﬀorded with various factorizationapproaches and local potentials is shown in Table I. TABLE I: Computational scalings for the pairing term of anarbitrary local potential in several approaches. M µ is theorder of the polynomial basis in the µ th degree of freedom,and the potential is N -body in D dimensions. For simplicity,we consider the isotropic case where M µ is the same in alldimensions in this comparison. N w is the number of termsretained in a separable approximation to the potential.Approach General Local Separable Local Zero-RangeConventional O ( M NDµ ) O ( N w M ND + Nµ ) O ( M ND +1 µ )X-THC O ( M ND +1 µ ) O ( N w M ND +1 µ ) O ( M ND +1 µ ) Practical Demonstration -

To illustrate the numeri-cal equivalence and practical utility of the X-THC ap-proach, a hybrid MATLAB/C++ code was developed toproduce generalized pairing ﬁelds for D -dimensional, N -body forces in Hermite functions. A complete descriptionof the code is presented in the supplemental material.We have veriﬁed that the X-THC generalized pair-ing ﬁelds are exact within machine precision (as ex-pected mathematically). Figure 2 shows the compu-tational gains which can be achieved from the X-THCfactorization using a representative example of N = 2and D = 1 , ,

3. For a general local potential, X-THCis several orders of magnitude faster than conventionalapproaches for the largest M µ studied here. When thepotential is written in separable form, the X-THC scal-ing advantage is less dramatic, but X-THC becomes lesscostly for the largest M µ used in Figure 2. The X-THCapproach allows one to retain the general local poten-tial and calculate the exact pairing tensor in similar (oreven less) computational eﬀort as with an approximate separable potential. Summary and Outlook -

In this Letter, we havedemonstrated that an exactly quantized renormalizationof any local, ﬁnite-range N -body potential exists in anysituation where the primary basis set may be composedof polynomial-based functions. This X-THC represen-tation provides for substantial computational scaling re-duction of contractions involving the local potential in-tegral tensor, for instance by reducing the formation ofa representative generalization of the pairing tensor from O ( M NDµ ) to O ( M ND +1 µ ).In electronic structure, the concept of X-THC helpsto codify and rationalize our existing Least-Squares Ten-sor HyperContraction (LS-THC) approximation for non-polynomial basis sets [13]. The least squares procedure µ Log P a i r i ng T i m e [ s ] w−Separable Potential D=1D=2D=3General Potential D=1D=2D=3Conventional X−THC FIG. 2: (color online) Wall times for pairing tensor formationas a function of M µ for N = 2 (log-log scale). N w = 8 for theseparable potential. introduced in that work actually performs an implicitrenormalization of the potential. Since the basis setsused in our previous applications of LS-THC were not di-rect products of polynomials (but rather atom-centeredGaussian functions), the decomposition was necessarilyan approximation. As the X-THC limit is approached,the ﬁdelity of the approximation will depend on both thebasis set resemblance to a set of polynomial-based func-tions and the eﬃciency of the quadrature. The phys-ical picture provided by X-THC’s explicit renormaliza-tion process will almost certainly aid in the search forenhanced approximate THC recipes for non-polynomialbasis sets.In nuclear structure, the potential applications for X-THC are immediate and substantial. A crucial ﬁndingof this work is that the formal scaling of operations in-volving arbitrary local potential operators is identical tothat of zero-range operators, without any loss in accu-racy. This implies that the ﬁnite-range two-body Gognypotentials of nuclear DFT can immediately be appliedwith the same computational complexity as the moreapproximate zero-range Skyrme potentials. Tractabilitygains should be even more marked for three-body poten-tials, paving the way for Hamiltonian-based nuclear en-ergy densities derived from eﬀective, local, ﬁnite-range,density-independent, two- and three-nucleon pseudopo-tentials, which, by construction, would be free of thecurrent artifacts of nuclear DFT. Acknowledgments -

R.M.P. is supported by a DOEComputational Science Graduate Fellowship (Grant DE-FG02-97ER25308), particularly including a practicumrotation at Lawrence Livermore National Laboratory.This material is based on work supported by the NationalScience Foundation through grants to C.D.S. (GrantNo. CHE-1011360) and T.J.M. (Grant No. CHE-1047577), by the Department of Defense through a grantto T.J.M. (Oﬃce of the Director of Defense Research andEngineering). It was performed under the auspices of theUS Department of Energy by the Lawrence LivermoreNational Laboratory (Contract DE-AC52-07NA27344)through the Scientiﬁc Discovery through Advanced Com-puting (SciDAC) program funded by U.S. Department ofEnergy, Oﬃce of Science, Advanced Scientiﬁc Comput-ing Research and Nuclear Physics. The Center for Com-putational Molecular Science and Technology is fundedthrough an NSF CRIF award (Grant No. CHE-0946869)and by Georgia Tech. ∗ Electronic address: [email protected] † Electronic address: [email protected] ‡ Electronic address: [email protected][1] J. C. Slater, Phys. Rev. , 385 (1951).[2] J. P. Perdew, R. G. Parr, M. Levy, and J. L. Balduz,Phys. Rev. Lett. , 1691 (1982).[3] T. Skyrme, Nuclear Physics , 615 (1958).[4] P. Nozi`eres, Theory of interacting Fermi systems , Ad-vanced book classics, Addison-Wesley, Reading, MA,1964.[5] J. Blaizot, Physics Letters B , 435 (1976).[6] J. Decharg´e and D. Gogny, Phys. Rev. C , 1568 (1980).[7] M. anguiano, J. Egido, , and L. Robledo, Nucl. Phys. A , 467 (2001).[8] J. Dobaczewski, M. V. Stoitsov, W. Nazarewicz, and P.-G. Reinhard, Phys. Rev. C , 054315 (2007).[9] T. Duguet, M. Bender, K. Bennaceur, D. Lacroix, and T. Lesinski, Phys. Rev. C , 044320 (2009).[10] D. Lacroix, T. Duguet, and M. Bender, Phys. Rev. C ,044318 (2009).[11] M. Bender, T. Duguet, and D. Lacroix, Phys. Rev. C ,044319 (2009).[12] E. G. Hohenstein, R. M. Parrish, and T. J. Mart´ınez, J.Chem. Phys. , 044103 (2012).[13] R. M. Parrish, E. G. Hohenstein, T. J. Mart´ınez, andC. D. Sherrill, J. Chem. Phys. , 224106 (2012).[14] E. G. Hohenstein, R. M. Parrish, C. D. Sherill, and T. J.Mart´ınez, J. Chem. Phys. , 221101 (2012).[15] T. Brody and M. Moshinsky, Tables of transformationbrackets for nuclear shell-model calculations , Gordon andBreach Science Publishers, 1967.[16] J. D. Talman, Nuclear Physics A , 273 (1970).[17] D. Gogny, Nucl. Phys. A , 399 (1975).[18] J. L. Whitten, J. Chem. Phys. , 4496 (1973).[19] B. I. Dunlap, J. W. D. Connolly, and J. R. Sabin, Int. J.Quantum Chem. Symp. , 81 (1977).[20] O. Vahtras, J. Alml¨of, and M. W. Feyereisen, Chem.Phys. Lett. , 514 (1993).[21] M. Abramowitz and I. A. Stegun, Handbook of Math-ematical Functions with Formulas, Graphs, and Mathe-matical Tables , Dover Publications, New York, 1964.[22] D. O. Harris, G. G. Engerholm, and W. D. Gwinn, J.Chem. Phys. , 1515 (1965).[23] A. S. Dickinson and P. R. Certain, J. Chem. Phys. ,4209 (1968).[24] J. Lill, G. Parker, and J. Light, Chem. Phys. Lett. ,483 (1982).[25] D. Baye and P. H. Heenen, Journal of Physics A: Math-ematical and General , 2041 (1986).[26] R. A. Friesner, Chem. Phys. Lett. , 39 (1985).[27] G. H. Golub and J. H. Welsch, Math. Comp. , 221(1969).[28] J. Dobaczewski et al., Comp. Phys. Comm. , 2361(2009).[29] L. M. Robledo, Phys. Rev. C , 044312 (2010).[30] J. Dobaczewski and J. Dudek, Computer Physics Com-munications , 166 (1997). upplemental material for: Exact tensor hypercontraction: A universal technique forthe resolution of matrix elements of local, ﬁnite-range N -body potentials inmany-body quantum problems Robert M. Parrish, Edward G. Hohenstein,

2, 3

Nicolas F. Schunck, ∗ C. David Sherrill, † and Todd J. Mart´ınez

PACS numbers: 21.30.Fe,21.60.Jz,31.15.-p,31.10.+z

NOTATION

A rather large number of indices appear in the primarymanuscript, so we summarize them here for clarity.

Particle Number:

The particle number is denoted im-plicitly by the presence of ellipses in relevant equa-tions. This index ranges from 1 to N . Dimension:

Each degree of freedom is indexed by µ andranges from 1 to D for each particle. In direct-product bases (e.g., X-THC), dimensionality playsa major role, as the total number of primary basisfunctions, auxiliary basis functions, and quadra-ture grid points scale as O ( M Dµ ) In non-direct-product bases (e.g., LS-THC), dimensionality haslittle meaning [2], so D is usually assumed to be1. In a non-direct-product basis, the number ofprimary basis functions, auxiliary basis functions,and quadrature grid points all generally scale as O ( M ). Primary Basis:

The primary single-particle basis is de-noted by the indices i to n (bra), and i to n (ket).In a direct-product basis, i is a composite indexcorresponding to the underlying direct-product of1-dimensional primary basis functions, e.g., | i i ≡| i x i| i y i| i z i . In a polynomial direct-product basis,the 1-dimensional primary basis functions for the µ th degree of freedom range from 0 to M µ (thezero is a consequence of the polynomial deﬁnitionof the basis). In a non-direct-product basis, thebasis function indices range from 1 to M . Auxiliary Basis:

Auxiliary basis indices are denoted bythe indices

A, B, . . . . In a direct-product basis, A is a composite index corresponding to the underly-ing direct-product of 1-dimensional auxiliary basisfunctions, e.g., | A i ≡ | A x i| A y i| A z i . In a polyno-mial direct-product basis, the 1-dimensional auxil-iary basis functions in dimension µ range from 0 to 2 M µ (the zero is a consequence of the polynomialdeﬁnition of the basis). In a non-direct-product ba-sis, the full auxiliary basis functions range from 1to M A , where increasing M A increases the ﬁdelityof the approximate density ﬁtting procedure. Quadrature Grid:

Quadrature grid indices are de-noted by the indices

P, Q, . . . . In a direct-productbasis, P is a composite index corresponding to theunion of the underlying one-dimensional quadra-tures, e.g., r P ≡ ( x P x , y P y , z P z ). In a polyno-mial direct-product basis, the indices for an ex-act quadrature for the µ th degree of freedom rangefrom 0 to 2 M µ (the zero is a consequence of thepolynomial deﬁnition of the basis). In a non-direct-product basis, the LS-THC quadrature grid con-tains M P points, where increasing M P increasesthe ﬁdelity of the approximate LS-THC procedure.For cases where two classes of indices are required to re-solve a tensor element, a double subscript is used, e.g.,the P -th grid point for the µ th degree of freedom is de-noted r P µ . Also note that we use the generalized Einsteinconvention in this work: a repeated index on the rightside of an equation is contracted over if it appears twice,or hypercontracted over if it appears more than twice, solong as the same index is not present on the left side ofthe same equation.Note below that X-THC holds for any case in which thebasis set is polynomial (in any representation of physicalspace, e.g., x = r ), so long as the auxiliary potential in-tegrals exist. This implies that a potential operator thatis formulated in momentum space can be used in con-cert with a polynomial basis in position space, so long asthe auxiliary integrals can be computed by any meansavailable. An alternative way to picture this is thatthe momentum-space potential operator could be trans-formed to position-space via analytical Fourier transfor-mation, without loss of locality. Having made this clari-ﬁcation, we will work entirely in the position space limitbelow. FULL N -BODY D -DIMENSIONAL X-THC For clarity, we provide explicit deﬁnition of the full N -body D -dimensional X-THC representation here.A direct-product basis founded on polynomials has theform, ψ i ( r ) ≡ D Y µ =1 P i µ ( r µ ) v µ ( r µ ) . (1)In each dimension µ , P i µ is a polynomial of up to degree i µ , and v µ is a weight function, typically chosen to bringthe polynomial into L ( D ), where D is the domain of theproblem, and also to provide qualitative conformationto some a priori knowledge of the future shape of thewavefunction. The i µ index ranges from 0 to M µ , so thepolynomials range up to a maximum degree of M µ . Notethat these polynomials do not have to be orthogonal,though they are often deﬁned to be so. The productof polynomials being itself a polynomial of up to order2 M µ , the local product ψ ∗ i µ ( r µ ) ψ i µ ( r µ ) in dimension µ lies inside the span of the 2 M µ + 1 auxiliary functions, χ A µ ( r µ ) = ˜ P A µ ( r µ ) | v µ ( r µ ) | , (2)where ˜ P A µ is a polynomial of up to order 2 M µ , oftendiﬀerent from the primary basis polynomials P i µ . Wewill choose these auxiliary functions to be orthonormalfor convenience (this avoids DF metric matrices). Notethat for a (generally complex) polynomial-based primarybasis { ψ i µ ( r µ ) } , we can always choose an exact, whollyreal, orthonormal auxiliary basis { χ A µ ( r µ ) } with 2 M µ +1 functions. Thus, most of the complex conjugates onthe auxiliary basis functions below are included only forconvenience in the case that this work should need to begeneralized to complex auxiliary bases.The auxiliary functions χ A µ ( r µ ) yield an exact DF rep-resentation for this basis, h i . . . n | ˆ V | i . . . n i = [ ii A ] . . . [ nn N ] G A...N . (3)The overlap integrals are separable in coordinates,[ ii A ] = D Y µ =1 [ i µ i µ A µ ] . (4)However, in general, the auxiliary potential integrals are not separable, G A...N ≡ Z d r . . . Z d r N χ ∗ A ( r ) . . . χ ∗ N ( r N ) ˆ V ( r , . . . , r N ) . (5)To produce the THC representation, it remains to ﬁndan exact quadrature for the three-index overlap integrals.Owing to the closure in the product i µ i µ , any quadrature which can exactly integrate the auxiliary overlap metric[ B µ A µ ] = δ B µ A µ can exactly integrate the three-indexoverlap integrals [ i µ i µ A µ ], as the spans are identical inboth cases. A quadrature which can exactly integrateall quadratic products of functions based on orthogonalpolynomials of up to degree 2 M µ is precisely the deﬁni-tion of the 2 M µ + 1-node Gaussian quadrature. Regard-less of the choice of weight v µ ( r µ ), the nodes and roots ofthis Gaussian quadrature can always be determined eﬃ-ciently by the Golub-Welsch algorithm, giving the nodesand weights { < r P µ , w P µ > } The overlap integral is thenexactly resolved as,[ i µ i µ A µ ] = ψ ∗ i µ ( r P µ ) | {z } X ∗ Pµiµ ψ i µ ( r P µ ) | {z } X Pµi µ χ A µ ( r P µ ) w P µ | {z } Y PµAµ . (6)Note that the contraction index P µ occurs not twice (as ina standard contraction operation) but three times (whatwe refer to as a “hypercontraction”).In the full direct-product basis, we will use the col-lapsed notation for the collocations of this Gaussianquadrature, X Pi = D Y µ =1 X P µ i µ , Y PA = D Y µ =1 Y P µ A µ , (7)to save space. However, in real implementation, thedirect-product separability of these two quantities is veryimportant to achieve near-optimal scaling in formation ofthe X-THC factorization and subsequent utilization.With these deﬁnitions, the X-THC Z operator reads Z P ...W = Y PA . . . Y WN G A...N , (8)which yields the the full N -body D -dimensional X-THC, h i . . . n | ˆ V | i . . . n i = X ∗ Pi . . . X ∗ Wn Z P ...W X Pi . . . X Wn . (9)The Z operator is not, in general, direct-product sepa-rable, but the factors X and Y are. In forming Z , thecontraction of the Y factors with the auxiliary poten-tial integrals would na¨ıvely scale as O ( M ND + Dµ ), bearingin mind that the number of auxiliary functions and thenumber of quadrature points are both proportional to O ( M Dµ ). However, for each Y , we can perform the trans-formation in one dimension at a time (e.g., replacing A x with P x , etc), reducing the formal scaling to O ( M ND +1 µ ).Similarly, the separability of the X and Y factors is crit-ically important to reduce the scaling of the generalizedpairing term,∆ i...n = h i . . . n | ˆ V | i . . . n i κ i ...n . (10)This contraction of a rank- N D tensor with the integraltensor is in fact the worst possible scenario, as far asthe DF representation is concerned, since the compoundcontraction index involves all N DF coeﬃcient tensors,∆ i...n = d Aii . . . d Nnn G A...N κ i ...n . (11)In general, this term will always scale as O ( M NDµ ) withboth conventional and DF approaches, though the com-putational pre-factor is markedly higher in the lattercase. When using X-THC, the generalized pairing termnow reads,∆ i...n = h i . . . n | ˆ V | i . . . n i κ i ...n = X ∗ Pi . . . (cid:2) X ∗ Wn (cid:2) Z P ...W (cid:2) X Pi . . . (cid:2) X Wn κ i ...n (cid:3)(cid:3)(cid:3)(cid:3) . (12)Any contraction involving X here would na¨ıvely scaleas O ( M ND + Dµ ). However, for each X , we can performthe transformation in one dimension at a time (e.g., re-placing i x with P x , etc), reducing the formal scaling to O ( M ND +1 µ ).A common technique (usually an approximation) is toassert a w -separable form for the potential,ˆ V ( r , . . . , r N ) ≡ N w X w =1 D Y µ =1 ˆ V wµ ( r µ , . . . , r µ N ) . (13)In this case, the integral tensor factors as, h i . . . n | ˆ V | i . . . n i = N w X w =1 D Y µ =1 h i µ . . . n µ | ˆ V wµ | i µ . . . n µ i . (14)This allows the pairing tensor to be computed in O ( N w M ND + Nµ ) via several intermediates. For X-THC,a w -separable potential allows for separability of the Z operator, i.e., h i . . . n | ˆ V | i . . . n i = N w X w =1 D Y µ =1 X ∗ P µ i µ . . . X ∗ W µ n µ Z P µ ...W µ w X P µ i µ . . . X W µ n µ . (15)This allows the pairing tensor to be computed in O ( N w M ND +1 µ ). Note that this is the same formal scalingas X-THC in a general local potential, but the memoryusage for Z is lower, and the prefactor may or may notbe lower, depending on how large N w is. DEMONSTRATION CODEOverview

To support the mathematical demonstration of exactX-THC resolution of the potential integral tensor, andto provide a practical example of the scaling gains pro-vided by X-THC, a mixed MATLAB/C++ code was de-veloped for the case of D -dimensional Hermite functions with N -body w -contracted Gaussian forces. In this code,the required potential integrals (in the primary or aux-iliary basis) and possible X-THC factors are generatedin MATLAB and written to disk, for the given basis size M µ , dimensionality D , number of bodies N , and numberof w -contraction points N w . For each integral technologyand w -separable vs. w -nonseparable case, the generalizedpairing ﬁeld is computed for a randomly generated pair-ing tensor in a standalone C++ code for the particular w -separability and integrals technology case. In each C++code, the integrals and factors are read in, the w indicescontracted over ﬁrst if simulating a non-separable force,and then the generalized pairing tensor is computed ac-cording to the algorithms discussed below.MATLAB was chosen for the integral and factor gen-eration routines due to ease of implementation, and par-ticular strength in treatment of arbitrary rank tensors.As this portion of the total procedure would typically beperformed as a single-use overhead step (e.g., before HFBiterations or before application of the integrals in corre-lated methods), we do not include this step in the tim-ings for the generalized pairing tensor, and thus there isno penalty for using the interpreted and rather memory-na¨ıve MATLAB language for this stage. For the heavylinear algebra work of pairing tensor formation, C++ wasselected for its “close to the metal” properties, particu-larly including explicit control of memory allocation andability to swap pointers without performing explicit deepcopy operations. Wherever possible (and for absolutelyall contraction operations), BLAS calls are used, withthe algorithms designed so as to allow for permutation ofmemory to be hidden in BLAS3 or BLAS2 operations asmuch as possible. The algorithms were formulated so asto rely preferentially on the BLAS3 DGEMM operation,followed by the BLAS2 DGEMV, followed by variousBLAS1 operations. Every eﬀort was expended to pro-duce well-optimized algorithms, with the same amountof optimization present in both the X-THC and conven-tional methods. With these considerations, we believethat the timings reported are entirely representative ofa practical application of the various integrals technolo-gies, and show a wholly fair comparison of X-THC andconventional methods.The C++ codes are compiled with the Intel icpc12.0.1 compiler, using -O3 optimization. The BLAScalls are handled with Intel’s very eﬃcient MKL 7.0.1 library, with threading disabled. Performance measure-ments are performed using the

PAPI 5.0.1 library, whichfeatures a wall timer accurate to ∼ w -separableforces with N >

1, the optimal pathway is to contractthe DF intermediates to the conventional integrals, andthen form the pairing tensor in the conventional man-ner. As a result, the DF timings results for w -separablepotentials are essentially indistinguishable from the con-ventional case. For non- w -separable potentials, the DFalgorithms require the same O ( M NDµ ) scaling as the con-ventional case, but the pre-factors are many orders ofmagnitude larger, due to the larger auxiliary basis sizesinvolved. Moreover, in the non- w -separable case, forma-tion of the conventional integrals from the DF integralsexhibits a higher scaling of O ( M ND +1 µ ), and is thereforenot a viable alternative. In any case, the conventionalalgorithm always outperforms the DF algorithm for thegeneralized pairing tensor, so we have elected to not in-clude the DF results here. This failure of DF methodsfor “exchange-like” contractions is well known, and wasthe primary motivation for the development of the THCrepresentation. Basis/Potential Choice

Hermite Function Primary Basis

The primary basis functions chosen for this demonstra-tion are direct products of generalized Hermite functions, ψ i ( r ) = D Y µ ψ i µ ( r µ ) . (16)Below, we will drop the µ labels and work in 1D ( r ≡ x ) unless otherwise noted. The 1D generalized Hermitefunction is, ψ i ( x ) = ( b µ ) / ( √ π i i !) − / H i ( z ) exp( − z / | {z } φ i ( z ) , (17)where H i is the i -th Hermite polynomial and the nondi-mensional coordinate is, z = b µ x. (18) i runs from 0 to M µ . Note that we reserve φ i ( z ) for thetrue non-dimensional Hermite function, where b µ = 1 .The auxiliary functions for this problem are also gen-eralized Hermite functions, χ A ( x ) = ( √ b µ ) / ( √ π A A !) − / H A ( z ) exp( − z / , (19) where now, z = √ b µ x, (20)and A runs from 0 to 2 M µ . The THC quadrature forthis problem is thus the 2 M µ + 1-node Gauss-Hermitequadrature with a spatial length scale of √ b µ . Gaussian Potential

For ﬂexibility, we use for the potential a linear combi-nation of Gaussians. Given the set { < α w , β w > } , theform of the potential is,ˆ V ( x , . . . , x N ) = N w X w N − X η =1 N X ξ = η +1 α /Dw exp( − β w x ηξ ) . (21)For a 2-body potential in 1 dimension, with N w = 1 thisreduces to the usual Gaussian force,ˆ V ( x , x ) = α /Dw exp( − β w x ) . (22)For higher N , this is simply a sum of two-body Gaus-sian forces over all possible pairs of two-body coordinates, x ηξ = x η − x ξ .We use a set of { < α w , β w > } with N w = 8 whichapproximates the Coulomb operator 1 /r for all compu-tations shown in this work. This allows us to show sepa-rate accuracy and timings results for the w -separable andnon- w -separable cases, depending on whether we chooseto sum over the w index ﬁrst or last. Potential Integrals (MATLAB)

All of the D -dimensional N -body potential integralsabove can be constructed from primitive 1-dimensional2-body potential integrals. For conventional integral ap-proaches, four-index integrals in the primary basis arerequired. Within X-THC, two-center integrals in theauxiliary basis are required. Integrals of this type arediscussed extensively in [1], where Moshinski transforma-tions and length-scale transformations are used to pro-vide analytical conventional integrals (the generalizationto auxiliary integrals is straightforward). Conventional Integrals

Following [1], the 1-dimensional 2-body primary inte-grals are computed as, h ij | ˆ V w | i j i = α /Dw Z Z d x d x ψ i ( x ) ψ j ( x ) × exp( − β w x ) ψ i ( x ) ψ j ( x )= α /Dw D Aamn D Abm n D aa ( η w ) D ba ( η w ) , (23)where all summations run from 0 to 2 M µ . D Aamn are theMoshinski transformation coeﬃcients, and D aa ( η w ) arethe length-scale transformation coeﬃcients (see below).The length-scale change parameter is η w = 1 q β w /b µ . (24) Auxiliary Integrals

By extension, the 1-dimensional 2-body auxiliary inte-grals are computed as, G ABw = α /Dw Z Z d x d x χ A ( x ) exp( − β w x ) χ B ( x )= 1 √ b µ α /Dw q β w /b µ D NnAB D nn ( η w ) I N I n , (25)where again all summations run from 0 to 2 M µ . The I N quantities are primitive integrals over single Hermitefunctions of unit length, I N = Z d z ψ N ( z ) = ( √ π √ ( N − √ N ! N even0 N odd (26) Moshinski Transformation Coeﬃcients D Aamn

The Moshinski transformation coeﬃcients relate prod-ucts of Hermite functions in Eulerian coordinates x and x to corresponding Hermite functions in the Lagrangiancoordinates X and x , where, X = 1 √ x + x ] , x = 1 √ x − x ] , (27)The transformation is ψ n ( x ) ψ n ( x ) = D N,nn ,n ψ N ( X ) ψ n ( x ) . (28)As the Gaussian potential is central (i.e., depends onlyon x , not X ), invoking the Moshinksi transformation re-duces the two-coordinate potential integrals to a sepa-rable product of one-coordinate integrals in X and x .If n and n each range from 0 to M µ , N and n eachrange from 0 to 2 M µ . The well-known selection rule is n + n = N + n . A simple, explicit formula for thesecoeﬃcients is [1] D N,nn ,n = δ n + n ,N + n (cid:18) n ! n ! N ! n ! (cid:19) (cid:18) √ (cid:19) N + n × i n , and that the parity agrees, F n,n = 12 [1 + ( − n + n ] = (cid:26) , n + n even0 , n + n odd (35)Because η ≤

1, this formula appears to be univer-sally stable. However, the numerators and denomina-tors above both involve divergent values, so a log-spaceformalism is required for explicit evaluation to preventoverﬂow. This may lose a few digits of precision, so wehave elected to use a recurrence relation for these coef-ﬁcients, which is easily derived. The recurrence relationis, D n +1 ,n = η r n + 1 n + 1 D n,n +1 + η r n n + 1 D n,n − − r nn + 1 D n − ,n . (36)where D , ( η ) = √ η. (37)For implementation purposes, any length-scale transfor-mation coeﬃcient with a “negative” index can be takento be zero. D -Dimensional, N -Body Integrals The generalization to N -body integrals as describedabove is carried out at the 1-dimensional stage (beforethe product over µ is carried out), e.g., for primary-basisintegrals in the 3-body case, h ijk | ˆ V w | i j k i = h ij | ˆ V w | j j i δ kk + h ik | ˆ V w | i k i δ jj + h jk | ˆ V w | j k i δ ii . (38)For auxiliary basis integrals, there is no delta functionin the third coordinate, but rather a normalized primi-tive Hermite integral, i.e., the quantity I N deﬁned in theauxiliary potential integrals. Thus, the auxiliary poten-tial integral is, G ABCw = G ABw I C + G ACw I B + G BCw I A . (39)In practice, the generalization to 3-body integrals is per-formed in MATLAB on the 1-dimensional integrals, be-fore integrals are written to disk.The generalization to non- w -separable D -dimensionalpotentials is carried out by summing over w , e.g., in the2-dimensional, 2-body case, h ij | ˆ V | i j i = N w X w h i x j x | ˆ V w | i x j x ih i y j y | ˆ V w | i y j y i , (40)or, G AB = N w X w G A x B x w G A y B y w . (41)In practice, the production of the nonseparable Z fac-tors or conventional integrals is carried out in blocks inC++, to save memory. The formation of these integralsis not counted in the pairing timings, as inﬁnite memoryis assumed. The overhead for this is many orders of mag-nitude larger for the conventional case than the X-THCcase, due to the larger size of the rank-2 N D conventionalintegral tensor.

X-THC Factors (MATLAB)

To complete the X-THC factorization, the Gauss-Hermite quadrature nodes/weights and collocation ma-trices X and Y are required. Quadratures

The THC grid for this problem is the 2 M µ + 1 nodeGauss-Hermite quadrature with the spatial range param-eter √ b µ .First, the non-dimensional quadrature is generated.The orthonormal-basis position operator X P Q = h P | ˆ x | Q i in the auxiliary Hermite functions is, X P Q = h P | ˆ x | Q i = r P + 12 ( δ P +1 ,Q + δ P,Q +1 ) . (42)This operator is symmetric tridiagonal, with zero diago-nal. The eigendecomposition is formed, X P Q = Q P P x P Q QP . (43)The eigenvalues are the nondimensional quadraturenodes. The weights are determined by ﬁrst generatingthe moment vector, v P = √ πδ P , (44)and applying the diagonalizing transformation, v P = Q P P v P . (45)The full weights are then given by w P = v P exp( x P ) . (46)The nodes and weights are then transformed to the prob-lem domain, by x P = x P √ b µ , w P = w P √ b µ . (47) Collocation

The X and Y X-THC factors require collocations atthe quadrature nodes. This is accomplished by eﬃcient,stable recurrence relations for the Hermite functions.The non-dimensional coordinate is ﬁrst computed, z = b µ x. (48)The ﬁrst two non-dimensional Hermite functions arecomputed explicitly, ψ ( z ) = π − / exp( − z / , (49)and ψ ( z ) = √ zπ − / exp( − z /

2) = √ zψ ( z ) . (50)The recurrence relation is then applied iteratively, ψ i +1 ( z ) = r i + 1 " zψ i ( z ) − r i ψ i − ( z ) = r i + 1 zψ i ( z ) − r ii + 1 ψ i − ( z ) . (51)And ﬁnally the length-scale normalization is added, ψ i ( z ) = ( b µ ) / ψ i ( z ) . (52)The X factor is, X Pi = ψ i ( x P ) = ( b µ ) / ψ i ( z = b µ x P ) , (53)and the Y factor is, Y PA = w P χ A ( x P ) = w P ( √ b µ ) / ψ A ( z = √ b µ x P ) . (54)If desired, the DF 3-index overlap integrals can be gen-erated exactly (within numerical precision) from the X-THC factors, [ ii A ] = X Pi X Pi Y PA . (55)The X-THC Z operators are immediately formed fromthe Y factors and corresponding G auxiliary potentialintegrals, Z P Qw = Y PA Y QB G ABw . (56) Generalized Pairing Algorithms (C++/BLAS)

Algorithms for the generalized pairing tensor with con-ventional or X-THC integrals, and general or w -separablepotentials, are described below and depicted in Algo-rithms 1-4. In these algorithms, we use ellipses to denotearbitrary rank in D or N , and we follow the conventionthat dimensions are striped as the slow superindex, andparticles are striped as the fast superindex [3]. The ten-sors used in these algorithms are stored in practice as acollapsed single-dimensional array (a double* ), regularlystriped so that the left-most index is the fastest dimen-sion and the right-most index is the slowest dimension(Fortran order, which allows for convenient applicationof BLAS operations). For key contraction operations,we group sets of neighboring indices to form the threesuperindices i (result row or fast index), j (result col-umn or slow index) and k (contraction index) for use inGEMM, C ij = A ik B kj . (57)Note the use of color to emphasize the choice of su-perindices. By changing the order of A and B and al-tering the GEMM transposition arguments, we can take i , j , and k in any order in the factor tensors A or B (e.g., the contraction index could actually be the row di-mension in A ). However, striding is not permitted withBLAS3 operations, e.g., a tensor contraction of the form, C ijk = A il B jlk . (58)would require explicit transposition to B ljk or B jkl to beable to group the compound jk index. Our algorithms are designed to eliminate such explicit transposition asmuch as possible. However, no transposition-free algo-rithm exists for X-THC w -separable potentials in arbi-trary N , see the discussion below.Note that our discussions and codes all assumeisotropic basis sets for simplicity (e.g., M x = M y ), butX-THC is certainly not restricted to this. In general, X-THC can handle diﬀering values of M x and M y , and evendiﬀerent classes of basis functions in each dimension (e.g.cylindrical coordinates). Conventional Integrals, General Potential

See Algorithm 1. This algorithm is quite simple, butextraordinarily expensive (this is why non- w -separableforces are rarely used in high D or N cases). The physi-cists’ integral tensor is used in a simple matrix-vectorproduct with the generalized pairing tensor, which is car-ried out by GEMV in M NDµ ≡ O ( M NDµ ) operations.In practice, such an algorithm will almost certainly beorders of magnitude more expensive than reported here.In our demonstration, we simulate inﬁnite memory byforming blocks of the integrals prior to GEMM (i.e., anintegral direct procedure). The integral generation costs O ( M NDµ ) (in this case by contracting out the w index)with a much larger prefactor than the GEMV itself. Ina fully generic potential, the integrals would have to begenerated explicitly, at possibly even higher cost. Thisintegral formation is not counted in the wall times re-ported here, but would increase the practical wall timeconsiderably in the usual case that the rank-2 N D inte-gral tensor does not ﬁt in core memory.

X-THC Integrals, General Potential

See Algorithm 2. This algorithm works by cycli-cally transforming from conﬁguration space i µ to thequantized coordinate space P µ by GEMM, scaling thecoordinate-space pairing tensor by Z , and then perform-ing another cyclic transformation to bring P µ back to i µ via GEMM. The cyclic transformations are written sothat the fast index in the current buﬀer is contracted oﬀ,and the replacement index is placed at the slow index ofthe result buﬀer. The pointers for the result and currentbuﬀer are then swapped, and the next contraction indexautomatically appears in the fast index of the currentbuﬀer, without any need for explicit transposition.Formally, O ( M ND +1 µ ) operations are required for thecyclic permutations. The prefactor arises from the ∼ M µ size (per dimension) of the quadrature index, andfrom the fact that two cyclic permutations are required.This algorithm currently requires essentially threebuﬀers of size 2 ND M NDµ ( T , U , and Z ). An eﬃcientblocked or disk-based algorithm could be developed if thisbecomes a bottleneck, with considerable performancegains expected over conventional disk-based or integral-direct algorithms with general potentials. Conventional Integrals, w -Separable Potential See Algorithm 3. This algorithm works by cyclicallyapplying the integrals for dimension µ via GEMM, foreach w point. Explicit transposition is avoided by plac-ing the replacement index ( i . . . n ) µ as the slow index ofthe result, allowing ( i . . . n ) µ +1 to be exposed as thenew fast index. The formal scaling of this algorithm is O ( N w M ND + Nµ ) operations, with remarkably small exter-nal overhead. The memory requirement is essentially 2 M NDµ buﬀers ( T and U ). The small overhead, combinedwith the eﬃciency of the long ( i . . . n ) µ contraction in-dex and small memory footprint explains much of thecurrent success of w -separable potentials. X-THC Integrals, w -Separable Potential See Algorithm 4. This algorithm works by, for each w point and dimension µ , cyclically forward transformingfrom i µ to P µ , applying the Z ( P ...W ) µ w operation, andthen cyclically backtransforming from P µ to i µ . Theformal scaling is O ( N w M ND +1 µ ) FMA (fused multiply-add) operations. The required memory is essentially two2 N M NDµ buﬀers T and U , a factor of 2 N more than theconventional w -separable algorithm.Unfortunately, an explicit transposition is required forthis algorithm for general N . The genesis of this require-ment is that the cyclic permutation is not carried throughall N D coordinates, but only N coordinates at a time.In the 2-body case, a specialized algorithm can be ap-plied to avoid the explicit transposition. The transfor-mation in each dimension reads, U j LP ← T i j L X Pi U QLP ← T j LP X Qj U QLP ← T QLP Z P Qw U QLi ← T QLP X Pi U Lij ← T QLi X Qj (59)Here L is the compound index corresponding to the otherdimensions, and pointer swap is assumed between eachstep.One additional modiﬁcation can attenuate the prefac-tor somewhat, at the cost of doubling the buﬀer space.The forward transformation of the ﬁrst dimension from κ ( i j ) L to U L ( P Q ) is the same for all w points, as κ is w -independent. Moreover, the back transformation ofthe last dimension from U L ( P Q ) D to ∆ L ( ij ) D does not de-pend on the individual w points, but only on their sum.Thus the buﬀers U L ( P Q ) and U L ( P Q ) D can be used in aprelude/epilogue construct. In the limit that N w → ∞ ,the savings are 100%, 50%, and 33%, for D = 1, 2 , , and3, respectively. In the limit that N w = 1 or D → ∞ , thesavings approach 0%, so the deployment of this modiﬁ-cation would depend greatly on the context and memorycapacity. Computational Results

Timings and accuracy results for generalized pairingtensor formation are shown in Figures 1 and 2, respec-tively, for various integral technologies, dimensions, num-ber of bodies, and problem sizes. To help crystallize theinformation in the timing data, asymptotic scaling andpredicted crossover metrics are presented in Tables I andII, respectively.The accuracy results of Figure 2 depict the relativemaximum residual R in the pairing tensor, deﬁned as, R Method = (cid:13)(cid:13) ∆ Method i...n − ∆ Reference i...n (cid:13)(cid:13) ∞ (cid:13)(cid:13) ∆ Reference i...n (cid:13)(cid:13) ∞ . (60)Here the w -separable conventional integral technologywas selected as a reference for the accuracy [4]. It isapparent that all methods (conventional and THC) areexact to within a reasonable growth factor against themachine epsilon. In fact, the worst relative maximumresidual seen here is less than 10 − (compared to thedouble-precision machine epsilon of 2 . × − ), and doesnot seem to be grow markedly with respect to problemsize. This provides strong numerical evidence that X-THC is a lossless compression of the potential integraltensor. Further agreement could doubtless be obtainedif the Moshinksi coeﬃcients were evaluated with higherprecision [5].The timing results of Figure 1 are quite clean as tim-ings go, with very smooth increase with respect to prob-lem size, indicating that the accumulation procedure(smaller M µ ) or sheer size of the problem (larger M µ )are suﬃcient to eliminate the bulk of the noise that of-ten plagues timing studies. On a log-log scale, all timingcurves exhibit a slight upward concavity which quicklytends toward linearity for larger M µ as the rate-limitingDGEMM-based steps become dominant relative to thelower-scaling operations. Two noticeable “jumps” exist:the last few points of the 2-body, 1-dimensional conven-tional separable case and the last few points of the 2-body, 3-dimensional THC separable case. These jumpsare likely due to working set size limits exceeding somediscrete performance threshold in hardware, for instance,cache or memory bank limits, respectively. From theseplots, power-law regression of the form, t Pairing = αM βµ , (61)is performed, using points selected above an M µ whichappears to be visually free of noise in each case. The β from these regressions are depicted in Table I. The pre-dicted THC crossover points from these regressions (thecritical M µ at which X-THC becomes practically superiorto conventional integrals) are shown in Table II. For allcases in which an explicit crossover occurs, the predictedcrossover point from the power-law model is within 1 M µ of the observed value. Note that the observed asymptoticscaling is often less than the theoretical value. There aretwo geneses for this: residual contributions from lowerscaling operations (which drags the scaling down at thecost of prefactor) and better GEMM/GEMV eﬃciencyfor larger matrix sizes (which makes GEMM/GEMV ap-pear to scale better than cubic/quadratic as the matrixsize increases). However, the relative scaling relation-ships between all integral technologies is retained.From this point forward, it is useful to consider generaland w -separable potentials separately.For all cases of N and D in general local potentials,X-THC is markedly more eﬃcient than conventional ap-proaches. In all such cases, X-THC crosses over conven-tional (often at very small M µ for D >

D >

1. This is strong evidence for the immedi-ate application of X-THC to problems involving generalpotentials.For w -separable cases, X-THC always exhibits lowerasymptotic scaling than conventional, and providescrossovers and sometimes signiﬁcant speedups for D = 1and D = 2. However, the narrower asymptotic sepa-ration between conventional and X-THC ( β Conv − Sep − β THC − Sep = N − β Conv − Gen − β THC − Gen = N D − D . This prefactor has two geneses: the formal FMAprefactor due to successive substitution of primary ba-sis indices for larger quadrature indices and the lowereﬃciency of X-THC DGEMM operations compared toconventional DGEMM operations. For a 3-body, 3-dimensional w -separable potential, the crossover seemslikely to occur just outside of the memory-limited prob-lem size explicitly shown in Figure 1. In fact, the pre-dicted crossover point for this case is M µ = 10 .

3. The2-body, 3-dimensional w -separable potential is somewhatmore sinister: a “jump” in the timings curve caused bysome hardware boundary causes the expected crossoverpoint to increase to a practically unreachable value of M µ = 1789.The inability of X-THC to provide a practical crossoverfor the 2-body, 3-dimensional w -separable potential is anindication that this technique is not a panacea. How-ever, we point out that the utilization of a w -separablepotential throughout the literature seems to stem from the lack of an X-THC representation for a general poten-tial: the approximation of the potential as w -separablewas required to provide a tractable numerical recipe. Inthis light, X-THC treatment of general potentials pro-vides a practical alternative pathway that avoids approx-imation of the potential as w -separable. For all of thecases studied here, the general X-THC curve is withinroughly an order of magnitude of the conventional sep-arable curve (in some cases even faster!). Moreover, X-THC does provide some gains in w -separable potentials,particularly for larger N . On more formal grounds, thedemonstrated lossless asymptotic reduction of a general-ized pairing term in any local potential from O ( M NDµ ) to O ( M ND +1 µ ) is a useful result in and of itself, consideringthat simply storing the pairing tensor requires O ( M NDµ ). ∗ Electronic address: [email protected] † Electronic address: [email protected] ‡ Electronic address: [email protected] [1] L. M. Robledo, Phys. Rev. C , 044312 (2010).[2] An example is molecular physics in an atom-centered ba-sis: the number of basis functions is always proportionalto the number of particles in the system, and has no de-pendency on the 1-, 2-, or 3-dimensional nature of themolecular geometry.[3] By “superindex” we indicate a generalized index which iscomposed as a concatenation of several quantities whichwould normally be considered to be indices in their ownright[4] Note that there is some ambiguity here, as some round-oﬀ error is intrinsic to the w -separable conventional refer-ence itself. A particularly marked source of roundoﬀ erroris the Moshinski relations for the potentials. Since boththe w -separable and general conventional pairing routinesshare the same underlying Moshinski relations, superioragreement between these two conventional methods andthe THC methods does not necessarily imply that one ismore accurate than the other, against a hypothetical ex-act precision result. In any case, the agreement betweenall methods is suﬃcient that the point is moot.[5] This is the most numerically susceptible portion of theprocedure, even when using recurrence relations. Algorithm 1

Generalized pairing algorithm: conventional integrals, general potential. procedure Pairing Conv Gen ( h ( i . . . n ) . . . ( i . . . n ) D | ˆ V | ( i . . . n ) . . . ( i . . . n ) D i , κ ( i ...n ) ... ( i ...n ) D ) Allocate ∆ ( i...n ) ... ( i...n ) D . Target (Typically Preallocated) ∆ ( i...n ) ... ( i...n ) D = h ( i . . . n ) . . . ( i . . . n ) D | ˆ V | ( i . . . n ) . . . ( i . . . n ) D i κ ( i ...n ) ... ( i ...n ) D . GEMV, O ( M NDµ ) return ∆ ( i...n ) ... ( i...n ) D end procedure Algorithm 2

Generalized pairing algorithm: X-THC integrals, general potential. procedure Pairing THC Gen ( X P µ i µ , Z ( P...W ) ... ( P...W ) D , κ ( i ...n ) ... ( i ...n ) D ) Allocate ∆ ( i ...n ) ... ( i ...n ) D . Target (Typically Preallocated) Allocate T ( P...W ) ... ( P...W ) D . Scratch Array (Typically Preallocated) Allocate U ( P...W ) ... ( P...W ) D . Scratch Array (Typically Preallocated) T ( i ...n ) ... ( i ...n ) D = κ ( i ...n ) ... ( i ...n ) D . Deep Copy for all µ ∈ [1 , D ] do . Start Cyclic Permutation in µ and η for all η ∈ [1 , N ] do U ( ...n ) µ ... ( P...W ) µ − ( P ) µ = X i µ P µ T ( i ) µ ( ...n ) µ ... ( P...W ) µ − . GEMM, O ( M ND +1 µ ) swap( T, U ) . Pointer Swap end for end for . End Cyclic Permutation in µ and η T ( P...W ) ... ( P...W ) D ∗ = Z ( P...W ) ... ( P...W ) D . Hadamard Product, O ( M NDµ ) for all µ ∈ [1 , D ] do . Start Cyclic Permutation in µ and η for all η ∈ [1 , N ] do U ( ...W ) µ ... ( i...n ) µ − ( i ) µ = X i µ P µ T ( P ) µ ( ...W ) µ ... ( i...n ) µ − . GEMM, O ( M ND +1 µ ) swap( T, U ) . Pointer Swap end for end for . End Cyclic Permutation in µ and η ∆ ( i...n ) ... ( i...n ) D = T ( i...n ) ... ( i...n ) D . Deep Copy return ∆ ( i...n ) ... ( i...n ) D end procedure Algorithm 3

Generalized pairing algorithm: conventional integrals, w -separable potential. procedure Pairing Conv Sep ( h i . . . n | ˆ V | i . . . n i wµ , κ ( i ...n ) ... ( i ...n ) D ) Allocate ∆ ( i...n ) ... ( i...n ) D = 0 . Target (Typically Preallocated) Allocate T ( i ...n ) ... ( i ...n ) D . Scratch Array (Typically Preallocated) Allocate U ( i ...n ) ... ( i ...n ) D . Scratch Array (Typically Preallocated) for all w ∈ [1 , N w ] do T w ( i ...n ) ... ( i ...n ) D = κ ( i ...n ) ... ( i ...n ) D . Deep Copy for all µ ∈ [1 , D ] do . Start Cyclic Permutation in µ U w... ( i...n ) µ − ( i...n ) µ = h i . . . n | ˆ V | i . . . n i wµ T w ( i ...n ) µ ... ( i...n ) µ − . GEMM, O ( N w M ND + Nµ ) swap( T w , U w ) . Pointer Swap end for . End Cyclic Permutation in µ ∆ ( i...n ) ... ( i...n ) D + = T w ( i...n ) ... ( i...n ) D . Contribution from w end for return ∆ ( i...n ) ... ( i...n ) D end procedure Algorithm 4

Generalized pairing algorithm: X-THC integrals, w -separable potential. procedure Pairing THC Sep ( X P µ i µ , Z ( P...W ) µ w , κ ( i ...n ) ... ( i ...n ) D ) Allocate ∆ ( i ...n ) ... ( i ...n ) D . Target (Typically Preallocated) Allocate T ( P...W ) ... ( i...n ) D . Scratch Array (Typically Preallocated) Allocate U ( P...W ) ... ( i...n ) D . Scratch Array (Typically Preallocated) for all w ∈ [1 , N w ] do T ( i ...n ) ... ( i ...n ) D = κ ( i ...n ) ... ( i ...n ) D . Deep Copy for all µ ∈ [1 , D ] do . Start Cyclic Permutation in µ for all η ∈ [1 , N ] do U ( ...n ) µ ... ( i...n ) µ − ( P ) µ = X i µ P µ T ( i ) µ ( ...n ) µ ... ( i...n ) µ − . GEMM, O ( N w M ND +1 µ ) swap( T, U ) . Pointer Swap end for T ... ( i...n ) µ − ( P...W ) µ ∗ = Z ( P...W ) µ w . SCAL, O ( N w M NDµ ) for all η ∈ [ N, do U ( n ) µ ... ( i...n ) µ − ( P... ) µ = X n µ W µ T ... ( i...n ) µ − ( P... ) µ ( W ) µ . GEMM, O ( N w M ND +1 µ ) swap( T, U ) . Pointer Swap end for U ... ( i...n ) µ − ( i...n ) µ = T ( i...n ) µ ... ( i...n ) µ − . Explicit Transposition swap(

T, U ) . Pointer Swap end for . End Cyclic Permutation in µ ∆ ( i...n ) ... ( i...n ) D + = T ( i...n ) ... ( i...n ) D . Contribution from w end for return ∆ ( i...n ) ... ( i...n ) D end procedure −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] −7 −6 −5 −4 −3 −2 −1 T i m e [ s ] Conv. Sep.THC Sep.Conv. Gen.THC Gen.

FIG. 1: Timings for generalized pairing tensor formation vs. problem size for various integral technologies, numbers of bodies N , and number of dimensions D . −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l −16 −15 −14 −13 −12 R e l a t i v e M a x i m u m R e s i dua l THC Sep.Conv. Gen.THC Gen.

FIG. 2: Relative maximum residual for generalized pairing tensor formation vs. problem size for various integral technologies,numbers of bodies N , and number of dimensions D . TABLE I: Asymptotic scalings of generalized pairing tensor formation with various integral technologies. Shown are observedscalings based on power-law ﬁt to timing data and corresponding theoretical scaling (in parentheses).

D N β

Conv − Sep β THC − Sep β Conv − Gen β THC − Gen

D N M

Gen µ M Sep µ1 2 1.224E+01 1.172E+011 3 3.429E+00 5.170E+002 2 2.194E+00 1.657E+012 3 2.304E+00 7.928E+003 2 2.024E+00 1.789E+033 3 1.650E+00 1.035E+01