Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange
CCubic-scaling algorithm and self-consistent field for the random-phaseapproximation with second-order screened exchange
Jonathan E. Moussa a) Sandia National Laboratories, Albuquerque, NM 87185, USA
The random-phase approximation with second-order screened exchange (RPA + SOSEX) is a modelof electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of meanfield, and it scales as O ( n ) operations and O ( n ) memory for n electrons. We derive a new algorithmthat reduces its scaling to O ( n ) operations and O ( n ) memory using controlled approximations anda new self-consistent field that approximates Brueckner coupled-cluster doubles (BCCD) theory withRPA + SOSEX, referred to as Brueckner RPA (BRPA) theory. The algorithm comparably reduces thescaling of second-order Møller-Plesset (MP2) perturbation theory with smaller cost prefactors thanRPA + SOSEX. Within a semiempirical model, we study H dissociation to test accuracy and H n ringsto verify scaling. I. INTRODUCTION
Density functional theory (DFT) and coupled-cluster (CC)theory are the two dominant paradigms for the computation ofmany-electron ground states with complementary capabilities.Both theories are built upon the independent-orbital and self-consistent field (SCF) structure of Hartree-Fock (HF) theory .DFT proves the existence of an exact density functional forwhich there are approximations enabling routine simulationof thousands of electrons that scale as O ( n ) operations and O ( n ) memory for n electrons. CC theory is a systematicallyimprovable hierarchy of methods indexed by p ≥ O ( n p + ) operations and O ( n p ) memory. In practice, DFTis limited by accuracy and CC theory is limited by cost.The random phase approximation (RPA) is a natural pointof convergence in the ongoing development of CC theory andDFT. RPA emerges from truncated versions of CC methods asreduced-complexity methods that retain a significant fractionof the CC correlation energy . RPA also occurs when DFT isused to approximate the polarization function in the adiabatic-connection fluctuation-dissipation formula for the correlationenergy . This formula might become a part of more accuratedensity functionals, but it has a higher cost and complexity than conventional density functionals. In both cases, a balanceof cost and accuracy is being sought with RPA.The electron correlation energy model that we consider inthis paper is the RPA correlation energy plus the second-orderscreened exchange (SOSEX) energy . RPA + SOSEX is exactfor one electron and to second order in perturbation theory. Itagrees with quantum Monte Carlo benchmarks of the uniformelectron gas to within 0.002 Ha / electron . Benchmarks oninhomogeneous systems show mean absolute errors of 0.002Ha / atom for cohesive energies of solids in a 5-solid test set ,0.008 Ha / molecule for atomization energies of molecules inthe G2-1 test set , and 0.010 Ha / atom for correlation energiesof first and second row atoms . These atomic and molecularRPA + SOSEX results approach but fail to surpass the accuracyof second-order Møller-Plesset (MP2) perturbation theory and the B3LYP density functional . Also, any RPA + SOSEXenergy depends on a mean field choice and is not unique . a) Electronic mail: [email protected]
The goal of this paper is to further develop RPA + SOSEXas a compromise between the cost of DFT and the accuracy ofCC theory. Known algorithms for RPA + SOSEX and similarmodels require O ( n ) operations and either O ( n ) or O ( n )memory. By utilizing an auxiliary basis set , fast interactionkernel summation , and a low-rank approximation of energydenominators , we design a new algorithm to reduce the costto O ( n ) operations and O ( n ) memory. We define a preciseRPA + SOSEX total energy with a unique choice of mean fieldbased on Brueckner orbitals that is designed to approximateBrueckner coupled-cluster doubles (BCCD) theory . This isreferred to as Brueckner RPA (BRPA) theory.It is important to be pragmatic about the short-term valueof BRPA theory and a fast RPA + SOSEX algorithm. Despiteits construction, BRPA is not a good approximation of BCCD.The approximations used to retain the RPA + SOSEX form aretoo crude. For a given basis, a fast RPA + SOSEX calculationwill have a large cost relative to conventional DFT. With thecontinued reduction of average errors in density functionals and a correlation of DFT and RPA error outliers , it is unclearwhether the cost is warranted without additional research.The paper proceeds as follows. BRPA theory is derived inSec. II as a truncation of BCCD theory. The SCF structure ofBRPA theory is emphasized. The three main components of afast RPA + SOSEX algorithm are introduced in Sec. III. Thisincludes a nonstandard choice of primary and auxiliary basis,which simplifies the cost accounting and algorithm design. InSec. IV, the tensor structure in existing RPA algorithms isreviewed and extended to RPA + SOSEX. The novel structureof RPA + SOSEX theory enables a significant reduction in thenumber of its variables. Fast and conventional RPA + SOSEXalgorithms are designed in Sec. V as pseudocode. A detailedleading-order cost analysis is given instead of a crude scalinganalysis. We study applications to a semiempirical H n modelin Sec. VI. The accuracy of BRPA theory is tested on H ,and the scaling of RPA + SOSEX algorithms is confirmed withH n calculations for large n . The implications for future workare discussed in Sec. VII. This includes implementation inestablished electronic structure codes, basis set convergenceproblems, and further development of correlation models. Weconclude in Sec. VIII with a brief consideration of RPA as adistinct electronic structure paradigm. a r X i v : . [ c ond - m a t . m t r l - s c i ] J a n II. BRUECKNER RPA (BRPA) THEORY
The common origin of all post-HF methods is the many-electron Hamiltonian in second quantization notation ,ˆ H = E + h pq ˆ c † p ˆ c q + (cid:101) V pqrs ˆ c † p ˆ c † q ˆ c s ˆ c r , (1)where ˆ c p are the fermion lowering operators of spin-orbitals.Spin-orbital indices are labeled by { p , q , r , s } . Any tensor orproduct of tensors with repeated indices has an implicit sumover those indices unless they appear unrepeated in a tensor orproduct of tensors or in an explicit sum in the same equation.This notation is related to the standard bracket notation as h pq = (cid:104) p | h | q (cid:105) = (cid:90) d x d y φ ∗ p ( x ) h ( x , y ) φ q ( y ) , V pqrs = (cid:104) pq | rs (cid:105) = (cid:90) d x d y φ ∗ p ( x ) φ ∗ q ( y ) V ( x , y ) φ r ( x ) φ s ( y ) , (cid:101) V pqrs = (cid:104) pq || rs (cid:105) = (cid:104) pq | rs (cid:105) − (cid:104) pq | sr (cid:105) , (2)where x and y are spin-space coordinates, h ( x , y ) is the single-electron Hamiltonian, V ( x , y ) is the inter-electron interaction, φ p ( x ) are the spin-orbital wavefunctions, and (cid:101) X pqrs = X pqrs − X pqsr is a tensor antisymmetrization operation. The operators havesymmetries h ( x , y ) = h ( y , x ) ∗ and V ( x , y ) = V ( y , x ).We define a reference Slater determinant, | Φ (cid:105) , by splittingthe set of orbitals into virtual orbitals labeled by { a , b , c , d } and n occupied orbitals labeled by { i , j , k , l } . | Φ (cid:105) is defined byˆ c † i | Φ (cid:105) = , ˆ c a | Φ (cid:105) = , (cid:104) Φ | Φ (cid:105) = , (3)uniquely up to a global phase. When combined with fermionanticommutation, ˆ c † p ˆ c q = δ pq − ˆ c q ˆ c † p and ˆ c p ˆ c q = − ˆ c q ˆ c p , Eq. (3)is su ffi cient to calculate all expectation values of | Φ (cid:105) .HF theory posits the minimization of E HF = (cid:104) Φ | ˆ H | Φ (cid:105) anddiagonalization of particle and hole subspaces, (cid:104) Φ | ˆ c a ˆ H ˆ c † b | Φ (cid:105) and (cid:104) Φ | ˆ c † i ˆ H ˆ c j | Φ (cid:105) . For local minima in E HF , this is equivalentto diagonalization of a Fock matrix, f p q = h pq + (cid:101) V ipiq = (cid:15) p δ pq , (4)or equivalently a Fock operator in spin-space, f ( x , y ) = h ( x , y ) + v ( x ) δ ( x − y ) − ρ ( x , y ) V ( x , y ) ,ρ ( x , y ) = φ i ( x ) φ ∗ i ( y ) , v ( x ) = (cid:90) d y V ( x , y ) ρ ( y , y ) , (cid:90) d y f ( x , y ) φ p ( y ) = (cid:15) p φ p ( x ) , (5)with the canonical SCF structure between φ p ( x ) and f ( x , y ).The total energy consistent with the Fock matrix is E HF = E + h ii + (cid:101) V i ji j (6a) = E + ( h ii + f i i ) . (6b)The orbital energies are approximate excitation energies, (cid:15) a = (cid:104) Φ | ˆ c a ˆ H ˆ c † a | Φ (cid:105) − E HF ,(cid:15) i = E HF − (cid:104) Φ | ˆ c † i ˆ H ˆ c i | Φ (cid:105) , (7)which is known as Koopmans’ theorem . A. Brueckner coupled-cluster doubles (BCCD) theory
The basic structure of CC theory is to solve ˆ H | Ψ (cid:105) = E | Ψ (cid:105) approximately as a projected eigenvalue problem . It definesa cluster operator, ˆ T , from the linear span of a set of operators, S , omitting identity, ˆ I , which only alters the normalization, (cid:104) Φ | ˆ X † exp( − ˆ T ) ˆ H | Ψ (cid:105) = E (cid:104) Φ | ˆ X † exp( − ˆ T ) | Ψ (cid:105) , ˆ X ∈ S , | Ψ (cid:105) = exp( ˆ T ) | Φ (cid:105) , ˆ T ∈ span( S \ { ˆ I } ) . (8)At the singles-and-doubles level of theory (CCSD), | Φ (cid:105) is theHF ground state and S = { ˆ I , ˆ c † a ˆ c i , ˆ c † a ˆ c † b ˆ c j ˆ c i } . In its Bruecknervariant (BCCD), | Φ (cid:105) is varied and ˆ T ∈ span( S \ { ˆ I , ˆ c † a ˆ c i } ).We use nonstandard notation to simplify the tensor formof the BCCD equations. The first is ˆ T = T abi j ˆ c † a ˆ c † b ˆ c j ˆ c i withpartial symmetry constraints: T abi j = T baji but not T abi j = − T abji .The second is a non-Hermitian Brueckner matrix , b pq = f p q + ( δ pa δ iq f j b + δ iq (cid:101) V p jab − δ pa (cid:101) V i jqb ) (cid:101) T abi j . (9)This reduces the total energy to a form similar to Eq. (6b), E = E + h ii + (cid:101) V i ji j + (cid:101) V i jab (cid:101) T abi j (10a) = E + ( h ii + b ii ) . (10b)Also, the single-excitation equations simplify to b ai = R abi j = (cid:101) V akic (cid:101) T cbk j + (cid:101) T acik (cid:101) V kbc j + (cid:101) T acik (cid:101) V klcd (cid:101) T dbl j , (11)which simplifies the remaining equations to (cid:101) V abi j + b ac (cid:101) T cbi j − b ki (cid:101) T abk j + b bc (cid:101) T aci j − b kj (cid:101) T abik + (cid:101) R abi j + (cid:101) V abcd (cid:101) T cdi j + (cid:101) T abkl (cid:101) V kli j + (cid:101) T abkl (cid:101) V klcd (cid:101) T cdi j = . (12)Only (cid:101) T abi j appears in the BCCD tensor equations. Changes in T abi j that do not alter (cid:101) T abi j are a redundancy in the theory. B. BRPA as a truncation of BCCD
We construct BRPA theory by truncating the BCCD tensorequations to extract the RPA + SOSEX correlation model. Wecrudely attempt to minimize errors by minimizing the numberof truncations. Only Eq. (12) is truncated by (1) removing the“ladder” terms on the second line of the equation, (2) reducingthe ring tensor to a “direct-ring” tensor, (cid:101) R abi j ⇒ (cid:101) D abi j , for D abi j = V akic T cbk j + T acik V kbc j + T acik V klcd T dbl j , (13)and (3) reducing b pq to its Hermitian part, ˜ b pq = ( b pq + b q ∗ p ), toguarantee orthogonal orbitals with real energies. The resultingequations depend on T abi j but only define (cid:101) T abi j . To define T abi j ,we expand the set of equations by removing all ‘ (cid:101) ’, V abi j + ˜ b ac T cbi j − ˜ b ki T abk j + ˜ b bc T aci j − ˜ b kj T abik + D abi j = . (14)These are known as the RPA Riccati equations . They have aunique, positive-definite “stabilizing” solution that evolvesover V abi j ⇒ λ V abi j from the unique λ → λ = C. Self-consistent field structure of BRPA
In conjunction with Eqs. (13) and (14), it is convenient todescribe BRPA theory with SCF structure. We expand Eq. (4)to the diagonalization of a generalized Fock matrix, f pq = f p q + Σ pq = (cid:15) p δ pq , (15)with a static and Hermitian self-energy matrix, Σ pq , in analogyto many-body Green’s function theory . With the choice f ab = ˜ b ab , f ij = ˜ b ij , f ai = b ai , f ia = b a ∗ i , (16)Eq. (15) contains b ai =
0, Eqs. (6b) and (10b) are analogous,and f pq diagonalization enables a rearrangement of Eq. (14), T abi j = − ( δ ap δ ri + T acik δ kp δ rc ) V pqrs ( δ lq δ sd T dbl j + δ bq δ sj ) (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j . (17)It is convenient to write Σ pq using a non-Hermitian matrix, Σ pq = ( σ pq + σ q ∗ p + δ pa δ iq σ ai + δ ip δ qa σ a ∗ i ) ,σ pq = ( δ pa δ iq h jb + δ iq (cid:101) V p jab − δ pa (cid:101) V i jqb ) (cid:101) T abi j , (18)whereas Σ pq ⇒ { (cid:15) p , φ p ( x ) } , we solve Eq. (17) iteratively for T abi j asan inner SCF cycle. We then calculate σ pq from T abi j , which isused to recalculate { (cid:15) p , φ p ( x ) } in the outer SCF cycle. This isappropriate for reducing the number of σ pq calculations whenthere is a large relative cost to calculate σ pq over T abi j . The SCFcycle is summarized as σ pq (cid:55)→ { (cid:15) p , φ p ( x ) } (cid:55)→ T abi j (cid:55)→ σ pq .The inner SCF cycle can be avoided by using a first-orderapproximation for T abi j instead of solving Eq. (17), T ab i j = − V abi j (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j . (19) T abi j ⇒ T ab i j in Eq. (18) defines σ pq ⇒ σ p q , which is similarto a canonical transformation MP2 (CT-MP2) theory . Othertheories use σ p q to calculate E but calculate { (cid:15) p , φ p ( x ) } usingan inconsistent Σ pq . Brueckner CC2 (BCC2) theory uses Σ pq ⇒ δ pa δ iq σ a i + δ pi δ aq σ a ∗ i , (20)and conventional MP2 theory uses Σ pq ⇒ applies when E and { (cid:15) p , φ p ( x ) } are calculated from the same T abi j and Σ pq , (cid:15) a = (cid:104) Φ | ˆ c a exp( − ˆ T ) ˆ H exp( ˆ T )ˆ c † a | Φ (cid:105) − E ,(cid:15) i = E − (cid:104) Φ | ˆ c † i exp( − ˆ T ) ˆ H exp( ˆ T )ˆ c i | Φ (cid:105) , (21)which generalizes Eq. (7). We assume b pq to be real for p = q .This partially corrects for the absence of electron correlationin Koopmans’ theorem, but it still has an “orbital relaxation”error. Because solutions to the BRPA equations depend on thechoice of orbital occupations, (cid:15) p is not exactly the di ff erencebetween a pair of converged BRPA total energies. In contrast,many-body Green’s function theory is formally exact . III. FAST ALGORITHM COMPONENTS
None of the three components of the fast BRPA algorithmare fundamentally new. However, some details of their use arenonstandard. We review these details and how they comparewith modern standards in the Gaussian-orbital and planewave-pseudopotential electronic structure methodologies.
A. Primary and auxiliary local basis sets
In this paper, we use a grid in x with α n points as both theprimary and auxiliary basis, labeled by { x , y , z , w } . α is a basisset e ffi ciency factor. The relation between primary and orbitalfermion operators is ˆ a x = φ px ˆ c p with orthonormal structure, φ px φ ∗ py = δ xy . Single-electron operators in the primary basisare X xy = φ px X pq φ ∗ qy . This notation transparently interchangeswith basis-free notation (e.g. φ p ( x ) ⇔ φ px ). In terms of ˆ a x ,ˆ H = E + h xy ˆ a † x ˆ a y + V xy ˆ n x ˆ n y − V xx ˆ n x , (22)with number operators, ˆ n x = ˆ a † x ˆ a x , and a “kernel”, V xy , from V pqrs = S p rx V xy S q sy , S p qx = φ ∗ px φ qx . (23)This is similar to a tensor hypercontraction (THC) form for V pqrs . However, here we use it as a theoretical basis to simplifyaccounting and not necessarily as a computational basis.A computational basis must enable decomposition of V pqrs similar to Eq. (23) to be suitable for a fast BRPA calculation.Its kernel, V xy , must be amenable to fast summation methodsdiscussed in Sec. III C. A general form, S p qz = φ ∗ px S xyz φ qy , isacceptable if the “vertex”, S xyz , is sparse. In S xyz , { x , y } areprimary basis indices, and z is an auxiliary basis index. Theseprimary and auxiliary bases need not be orthogonal or equal,but we assume both to simplify method development.In Gaussian-orbital electronic structure, both primary andauxiliary basis functions are atom-centered polynomials timesGaussians. The resolution-of-identity (RI) form of V pqrs is S xyz = (cid:90) d x d y f ∗ x ( x ) f y ( x ) V ( x , y ) g z ( y ) , (24a) V − zw = (cid:90) d x d y g z ( x ) V ( x , y ) g ∗ w ( y ) , (24b)for primary, f x ( x ), and auxiliary, g z ( x ), basis functions and amatrix inverse, ‘ − ’. The RI vertex is not sparse, which is alsothe case in Cholesky and pseudospectral decompositions.The THC vertex is sparse but requires O ( α n ) operations tobe calculated at present . In a standard RI-RPA calculation , α = . α = . of a frozen-core C atom ( n = . No augmentationof the auxiliary basis is a source of errors when core-valencepolarization is important . Planewaves have a sparse vertexbecause the fast Fourier transform (FFT) enables e ffi cient gridoperations. Also, the Coulomb kernel, V xy , is diagonal in theplanewave basis. In a standard planewave RPA calculation , α = . α = . B. Low-rank energy denominator approximation
In this paper, we build low-rank approximations to energydenominators using numerical quadratures of three integrals.The first integral is over the imaginary frequency axis,1 ω ai + ω b j = (cid:90) i ∞− i ∞ d Ω π i ω ai − Ω )( ω b j + Ω ) ≈ Ω e ( ω ai − ω e )( ω b j + ω e ) , (25)with ω ai = (cid:15) a − (cid:15) i > β points, ω e , andweights, Ω e , indexed by { e , f , g } . We use a permuted index, e ,to write a required quadrature symmetry: ω e = ω ∗ e = − ω e and Ω e = Ω ∗ e = Ω e . The remaining two integrals are δ pa δ qi ω ai + ω e = (cid:73) Γ v d Ω π i (cid:73) Γ o d Ω (cid:48) π i / ( Ω − Ω (cid:48) + ω e )( (cid:15) p − Ω )( (cid:15) q − Ω (cid:48) ) ≈ Ω eai ( (cid:15) p − ω a )( (cid:15) q − ω i ) , (26)with closed counterclockwise contours, Γ v separating (cid:15) a from (cid:15) i − ω e , and Γ o separating (cid:15) i from (cid:15) a + ω e . Their quadraturesare β points, ω a , indexed by { a , b } and β points, ω i , indexedby { i , j } with di ff erent weights, Ω eai , for each ω e . We combine a and i into one index, p , for notational convenience.With a 4-parameter model of the orbital energy spectrum, (cid:15) a ∈ [ (cid:15) minv , (cid:15) maxv ] and (cid:15) i ∈ [ (cid:15) mino , (cid:15) maxo ], and a target error, ε Q , weanalytically construct numerical quadratures in Appendix A.Comparable to other results , the quadrature sizes are β ≈ π ln (cid:32) (cid:15) maxv − (cid:15) mino (cid:15) minv − (cid:15) maxo (cid:33) ln ε − ,β ≈ π ln (cid:32) (cid:15) maxv − (cid:15) maxo (cid:15) minv − (cid:15) maxo (cid:33) ln ε − ,β ≈ π ln (cid:32) (cid:15) minv − (cid:15) mino (cid:15) minv − (cid:15) maxo (cid:33) ln ε − , (27)without an explicit n -dependence. When (cid:15) minv − (cid:15) maxo is zero orvery small, a low-energy cuto ff must be introduced.In conjunction with Eq. (25), the fast RPA algorithm alsorequires an approximate product closure relation,1( ω ai + ω e )( ω ai + ω f ) ≈ ∆ e fg ω ai + ω g . (28)For e (cid:44) f , this equation is closed by partial fractions, ∆ e fg = − δ ef ∇ fg − ( δ eg − δ fg ) / ( ω e − ω f + δ ef ) . (29)For e = f , this requires coe ffi cients, ∇ ef , of a finite di ff erenceapproximation to − dd ω ( ω ai + ω ) − for ω ∈ { ω e } . It is a standardlinear approximation problem that we solve by minimizing aroot-mean-square (RMS) error metric, ε FD = min ∇ ef (cid:118)(cid:117)(cid:116) α − β n (cid:88) a , i , e (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + ∇ ef ( ω ai + ω e ) ω ai + ω f (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . (30)In practice, we find that ε FD is proportional to ε Q . Numerical quadratures of the Laplace transform are analternative low-rank energy denominator approximation,1 (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j = (cid:90) ∞ dse − s (cid:15) a e s (cid:15) i e − s (cid:15) b e s (cid:15) j ≈ w e e − s e (cid:15) a e s e (cid:15) i e − s e (cid:15) b e s e (cid:15) j . (31)Numerical methods for calculating quadratures are known .The main advantage of Eq. (31) over Eqs. (25) and (26) is thesingle quadrature summation rather than nested summations.However, Eqs. (25) and (26) have better numerical behaviorin the case of complex orbital energies and lead to equationswith connections to many-body Green’s function theory .Kernel interpolation (i.e. skeleton decomposition ) is yetanother low-rank approximation of energy denominators,1 ω ai + ω b j ≈ ω ai + ω e [ K − ] e f ω b j + ω f , [ K ] e f = ω e + ω f . (32)It is not based on reduction of an integral to quadrature, thus itdoes not have a simple exact limiting expression. However, itis exact if ω ai or ω b j is in { ω e } and may be e ffi cient for treatingisolated spectral features and large interior energy gaps. C. Fast summation of the interaction kernel
In this paper, fast V xy summation methods are accountedfor by assigning O ( αγ n ) operations to ρ x (cid:55)→ V xy ρ y . γ is ane ffi ciency factor with either weak or no n -dependence for fastmethods. Further details are not important for cost analysis.Fast V xy summation methods are ubiquitous in electronicstructure. FFTs are used in planewave-pseudopotential codesto solve the Poisson equation and apply the Hamiltonian to anorbital. The fast multiple method (FMM) is used in Gaussian-orbital codes to e ffi ciently calculate the matrix elements of theHartree potential . Except for planewave calculations of theFock exchange , these fast methods are not a bottleneck andtheir γ values do not contribute to the leading-order cost.The leading-order cost of the fast BRPA algorithm has adependence on γ . This is commonly the case for classical n -body simulations of electrostatics in molecular dynamics andgravity in astrophysics. As a result, the method developmentin those fields has been driven to develop new fast summationtechniques. Such modern innovations include the acceleratedcartesian expansion (ACE), multilevel summation method (MSM), and hierarchical matrix decompositions . However,electronic structure applications have special requirements forsummation methods such as the ability to treat point nuclei,smooth valence electron charge, and the multiple length scalesof core electron charge in a unified framework. Applicationssuch as BRPA might motivate this type of development.Fast summation methods exist for other V xy besides theCoulomb kernel. The fast BRPA algorithm applies to any V xy for which γ is small for any reason. Such generality might beappealing, but further improvements to accuracy or e ffi ciencycould result from better use of structure within the Coulombkernel and elementary operations beyond just ρ x (cid:55)→ V xy ρ y . Anexample is the low o ff -diagonal rank of the Coulomb kernelshared by a general class of structured matrices . IV. RPA TENSOR STRUCTURE
There are three distinct approaches to calculating the RPAcorrelation energy. The first is calculating T abi j by solving theRPA Riccati equation in Eq. (14) and evaluating E RPAc = V i jab T abi j . (33)The RPA Riccati equation is equivalent to the RPA symplecticeigenvalue problem that determines electron-hole excitationenergies. E RPAc is also half the sum of the di ff erence betweenthese excitation energies and the corresponding energies in theTamm-Danco ff approximation . These energies are the polesof frequency-dependent polarization functions and this sum isextracted from them by the residue theorem when used on theadiabatic-connection fluctuation-dissipation (ACFD) formulafor E RPAc . This formula can be rearranged as E RPAc = V rspq C pqrs , (34)where C pqrs is the correlated part of the RPA two-body densitymatrix averaged over interaction strength . We use the well-studied structure of C pqrs as a reference point for the discussionof structure in T abi j , which has yet to be elucidated. A. Adiabatic-connection fluctuation-dissipation RPA
In an auxiliary basis, the ACFD formula for E RPAc is E RPAc = − (cid:90) d λ (cid:90) i ∞− i ∞ d Ω π i V xy [ P λ xy ( Ω ) − P xy ( Ω )] , (35)where P λ xy ( ω ) is the RPA polarization function for interactionstrength λ . P λ xy ( ω ) is defined in relation to its λ = P xy ( ω ) = − S a ix S i ay ω ai + ω − S i ax S a iy ω ai − ω , P λ xy ( ω ) = P xy ( ω ) + λ P λ xz ( ω ) V zw P wy ( ω ) . (36)The screened Coulomb interaction is related to P λ xy ( ω ) as W λ xy ( ω ) = λ V xy + λ V xz P λ zw ( ω ) V wy , (37a) P λ xy ( ω ) = P xy ( ω ) + P xz ( ω ) W λ zw ( ω ) P wy ( ω ) . (37b)Using Eq. (37b), we encapsulate λ in Eq. (35) with W xy ( ω ) = (cid:90) d λ W λ xy ( ω ) , [ E ( ω )] xy = P xz ( ω ) V zy , W xy ( ω ) = − V xz [ E ( ω ) − + E ( ω ) − ln( I − E ( ω ))] zy , (38)and extract C pqrs by splitting P xy ( ω ) over its internal indices, C pqrs = − δ pa δ qb δ ir δ js (cid:90) i ∞− i ∞ d Ω π i S a ix W xy ( Ω ) S b jy ( ω ai − Ω )( ω b j + Ω ) − δ pi δ qj δ ar δ bs (cid:90) i ∞− i ∞ d Ω π i S i ax W xy ( Ω ) S j by ( ω ai + Ω )( ω b j − Ω ) . (39)Other forms of C pqrs combine the ‘ abi j ’ and ‘ i jab ’ sectors anduse di ff erent but equivalent energy denominators. B. Structure of T abij in BRPA From observing Eqs. (17), (25), and (39), it is reasonableto expect similar tensor structure in C pqrs and T abi j . To that end,we expand V pqrs in Eq. (17) using Eq. (23) and regroup terms, T abi j = − S aix V xy S bjy ω ai + ω b j , S aix = S a ix + T abi j S j bx . (40)This can be rewritten to define S aix without reference to T abi j , S aix = S a ix + S aiy V yz B zx ( ω ai ) , (41a) B xy ( ω ) = − S aix S i ay ω ai + ω . (41b)The iteration of Eq. (41a) starting from S aix = S a ix elucidates afactorization ansatz that further simplifies S aix , S aix = S a ix + S a iy V yz A zx ( ω ai ) . (42)It enables a reduction of Eq. (41) to Dyson-like equations, A xy ( ω ) = − S a ix S i ay ω ai + ω , (43a) A xy ( ω ) = B xy ( ω ) + A xz ( ω ) V zw B wy ( ω ) , (43b) B xy ( ω ) = A xy ( ω ) − (cid:73) Γ d Ω π i A zx ( − Ω ) V zw A wy [ Ω , ω ] . (43c)This uses a divided di ff erence, f [ x , y ] = [ f ( x ) − f ( y )] / ( x − y ),and an analytic reconstruction of A xy ( ω ai ) with the form A xy ( ω ai ) = (cid:73) Γ d Ω π i A xy ( − Ω ) ω ai + Ω (44)for a closed counterclockwise contour Γ separating − ω ai fromthe poles of A xy ( − ω ), which are disjoint if A xy ( ω ai ) is finite.We maximize the superficial similarity of C pqrs and T abi j byagain regrouping terms in T abi j and applying Eq. (25), T abi j = − (cid:90) i ∞− i ∞ d Ω π i S a ix U xy ( ω ai , ω b j ) S b jy ( ω ai − Ω )( ω b j + Ω ) , (45a) U xy ( ω, ω (cid:48) ) = V xy + V xz Q zw ( ω, ω (cid:48) ) V wy , (45b) Q xy ( ω, ω (cid:48) ) = A xy ( ω ) + A yx ( ω (cid:48) ) + A xz ( ω ) V zw A yw ( ω (cid:48) ) . (45c)While both C pqrs and T abi j produce the same value of E RPAc , theircorresponding SOSEX-like values, E SOSEXc = − V i jba T abi j and E AC − SOSEXc = − V rsqp C pqrs , are not equal and begin to di ff er atthird order in perturbation theory . However, E AC − SOSEXc and E SOSEXc are numerically similar for small molecules .In a BRPA calculation based on Sec. II C, T abi j will appearin its own calculation and also in the calculation of σ pq . Thereare O ( α n ) variables in T abi j , which is the memory bottleneck.Direct calculation of S aix in Eq. (41) and of σ pq from S aix willreduce the number of variables to O ( α n ) if the full storageof B xy ( ω ai ) is avoided. A xy ( ω ai ) is not directly useful becauseit contains O ( α n ) variables. Nevertheless, the interpolationof A xy ( ω ai ) from an e ff ective quadrature of Eq. (44) reduces itto O ( α β n ) variables and enables fast algorithms. V. FAST ALGORITHM DESIGN
Here we design fast and conventional algorithms for MP2and RPA + SOSEX calculations. In each of these four designs,we begin by identifying the tensor equations to be evaluated.Many of these equations are su ffi ciently complicated that analgorithm for e ffi cient evaluation is not immediately obvious.We use simple pseudocode to decompose these equations intointermediate variables and elementary operations. Algorithmcosts are then easy to account for in the pseudocode. Withoutthe simple primary and auxiliary basis sets in Sec. III A, thisdesign process becomes significantly more di ffi cult. Even so,the results here demonstrate what is possible with a practicalbasis set if su ffi cient e ff ort is given to algorithm design.The pseudocode is grouped into functions. Memory usageis delineated at the beginning of the function declaration withinputs then outputs separated by a semicolon in the functionargument and a workspace statement that lists all temporaryvariables. The body of each function contains ‘for’ loops andelementary ‘ A : = B ’ operations that denote the calculation ofexpression B and its storage in variable A . Each bottleneck iscommented with its leading-order operation count in the costpolynomial of { α, β , β , β , γ, n } . We count as one operationeach addition, multiplication, and division. Real and complexare not distinguished in operation and variable counts.The basic design strategies for minimizing cost are reuseof calculations and avoidance of concurrent data storage. Allcomputational costs that are leading order in n are arranged aseither matrix-matrix multiplications with the minimum matrixdimension maximized or as fast V xy summations. This choicehides the cost of data movement in e ffi cient implementationsof linear algebra and V xy summation. These strategies su ffi cefor the design of serial algorithms, but parallel algorithms willneed to explicitly account for communication costs.A common element of the four algorithms is calculationof σ p q or σ pq directly in the primary basis. The SCF equationsfrom Sec. II C in the primary basis are analogous to Eq. (5), E = E + ρ xy ( h yx + f yx ) , (46a) f xy = h xy + v x δ xy − ρ xy V xy + Σ xy , (46b) f xy φ py = (cid:15) p φ px , (46c) ρ xy = φ ix φ ∗ iy , (46d) v x = V xy ρ yy , (46e) Σ xy = ( σ xy + σ ∗ yx + σ xz ρ zy + ρ xz σ ∗ yz − ρ xz σ zw ρ wy − ρ xz σ ∗ wz ρ wy ) , (46f) σ xy = φ ax φ ∗ iy (cid:101) T abi j ( f j b + V xz S j bz − S j bz V zy ) , (46g)with E and Σ pq also calculated in the primary basis. A. Conventional MP2 algorithm
Algorithm 1 calculates σ xy using Eqs. (19) and (46g) bysubstituting σ xy ⇒ σ xy and T abi j ⇒ T ab i j . It is comparable toan RI-MP2 algorithm when Eq. (46) is used to calculate E from σ xy . Both require O ( α n ) operations, but the O ( α n )memory cost of RI-MP2 reduces to an O ( α n ) memory costby avoiding storage of the dense RI vertex in Eq. (24a). Theintermediate calculation of σ xy instead of a direct calculationof E enables the self-consistent MP2 methods in Sec. II C. ALGORITHM 1. Conventional MP2 function MP2c ( f i a , V xy , (cid:15) p , φ px ; σ xy ) workspace : { X , X x , X ai , X ix , Y ix } σ xy : = for each a do X ix : = φ ix φ ∗ ax X ix : = V xy X iy for each i do Y jx : = X jx φ ix − X ix φ jx X bj : = φ ∗ bx Y jx (cid:46) α n X bj : = X bj / ( (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j ) X : = X bj f j b Y jx : = X bj φ bx (cid:46) α n X x : = φ ∗ jx Y jx X x : = V xy X y σ xy : = σ xy + φ ax φ ∗ iy ( X + X x − X y ) end for end for end function B. Conventional BRPA algorithm
Because of the T abi j structure found in Sec. IV B, the SCFinner loop proposed in Sec. II C only needs to determine S aix rather than T abi j . We rearrange Eq. (41) into a residual tensor, R aix = S aix − S a ix + S aiy V yz S bjz S j bx ω ai + ω b j , (47)and recast the SCF inner loop as solving R aix =
0. Algorithm 2calculates R aix with O ( α n ) operations and O ( α n ) memory.As in the MP2 case, the operation count is the same as otherRPA + SOSEX algorithms , but avoiding the storage of T abi j reduces the O ( α n ) memory cost.Algorithm 3 calculates σ xy using Eqs. (40) and (46g). It iscomparable in cost with the R aix inner loop calculations, whichviolates the assumption in Sec. II C of an inexpensive innerloop. As a result, it is more e ffi cient to calculate R aix and σ xy concurrently and solve the orbital self-consistency and R aix = ALGORITHM 2. Conventional BRPA inner loop function inBRPAc ( V xy , (cid:15) p , φ px , S aix ; R aix ) workspace : { X ai , X ix , Y ix } R aix : = S aix − φ ∗ ax φ ix for each a do X ix : = V xy S aiy for each i do X bj : = S bix X jx (cid:46) α n X bj : = X bj / ( (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j ) Y jx : = X bj φ bx (cid:46) α n R ajx : = R ajx + Y jx φ ∗ ix end for end for end function ALGORITHM 3. Conventional BRPA outer loop function outBRPAc ( f i a , V xy , (cid:15) p , φ px , S aix ; σ xy ) workspace : { X ai , Y ai , X ix , Y ix , X aix } X ai : = X aix : = for each a do X ix : = V xy S aiy for each i do Y bj : = S bix X jx (cid:46) α n Y bj : = Y bj / ( (cid:15) a − (cid:15) i + (cid:15) b − (cid:15) j ) X aj : = X aj − Y bj f i b X ai : = X ai + Y bj f j b Y jx : = Y bj φ bx (cid:46) α n X ajx : = X ajx − Y jx φ ∗ ix X aix : = X aix + Y jx φ ∗ jx end for end for X aix : = V xy X aiy σ xy : = φ ax φ ∗ iy ( X ai + X aix − X aiy ) end function C. Fast MP2 algorithm
MP2 corresponds to A xy ( ω ) = A xy ( ω ai ) is still an elementof fast MP2 calculations. To e ffi ciently calculate A xy ( ω ai ), weintroduce a mean-field Green’s function, G xy ( ω ) = φ px φ ∗ py ω − (cid:15) p , G pxy = G xy ( ω p ) . (48)With Eqs. (25) and (26), we reduce A xy ( ω ai ) in Eq. (43a) to A xy ( ω ai ) ≈ Ω e A e xy ω ai − ω e , A e xy = A xy ( ω e ) , (49a) A xy ( ω e ) ≈ − Ω eai G ixy G ayx . (49b)Given G pxy , we need O ( α β β β n ) operations to form A e xy .Repeated application of Eqs. (25) and (26) to σ xy enablesits decomposition into G pxy and rearrangement into σ xy ≈ F i xz G izy − G axz F a zy , F i xy = Ω eai G axy W e yx + Ω eai G axz V zy ( Ξ e zyx + J e zy ) , F a xy = Ω eai G ixy ( W e xy + v e x ) + Ω eai G izy V xz Ξ e xzy , Ξ e xyz = Ω e Ω eai G ixw G awy V wz , J e xy = Ω e Ω eai G ixz f zw G awy , W e xy = Ω e V xz A e zw V wy , v e x = V xy J e yy . (50) σ xy is calculated by Algorithm 4 in O ( α β ( β + β + γ ) n )operations and O ( α ( β + β + β ) n ) memory. This improveson the O ( α n ) operations used by THC-MP2 and O ( α n )operations used by atomic-orbital Laplace MP2 with integralscreening . Neither method uses fast V xy summation, whichenables the extra factor of n speedup in the new algorithm. ALGORITHM 4. Fast MP2 function MP2f ( Ω e , Ω eai , f xy , V xy , G pxy ; σ xy ) workspace : { v e x , X epx , J e xy , W exy , X exy , F p xy } J e xy : = W e xy : = F i xy : = G ixz f zy for each i do X exy : = Ω e Ω eai G axy (cid:46) β β β α n W e xy : = W e xy − G ixy X eyx J e xy : = J e xy + F i xz X ezy (cid:46) β β α n end for W e xy : = V xz W e zy W e xy : = V yz W e xz v e x : = V xy J e yy F p xy : = for each x do X eay : = Ω eai G ixy (cid:46) β β β α n X eiy : = Ω eai G ayx (cid:46) β β β α n F a xy : = F a xy + X eay ( W e xy + v e x ) F i yx : = F i yx + X eiy W e xy X ezy : = X eiy G izy (cid:46) β β α n X ezy : = V yw X ezw (cid:46) β γα n X ezy : = Ω e X ezy V xz F a zy : = F a zy + X eay X ezy (cid:46) β β α n X ezy : = X eaz G azy (cid:46) β β α n X ezy : = V zw X ewy (cid:46) β γα n X ezy : = ( Ω e X ezy + J e xy ) V xy F i zy : = F i zy + X eiz X ezy (cid:46) β β α n end for σ xy : = F i xz G izy − G axz F a zy end function D. Fast BRPA algorithm
As in Eq. (49a), we use Eq. (25) to approximate B xy ( ω ai )with B exy = B xy ( ω e ) and postulate a similar form for A xy ( ω ai ), B xy ( ω ai ) ≈ Ω e B exy ω ai − ω e , A xy ( ω ai ) ≈ Ω e A exy ω ai − ω e , (51)with A exy (cid:44) A xy ( ω e ). Using Eq. (28), this reduces Eq. (43) to A e xy = − Ω eai G ixy G ayx , (52a) A exy = B exy + A exz V zw B ewy − Ω f ∆ e fg A fxz V zw B gwy , (52b) B exy = A e xy + Ω f ∆ e fg A fzx V zw A g wy , (52c)for ω ∈ { ω ai } . These equations are equivalent to R exy = R exy = A exy − B exy − A exz V zw B ewy + Ω f ∆ e fg A fxz V zw B gwy . (53) R exy is calculated by Algorithm 5 in O ( α β n ) operations and O ( α β n ) memory. Compared to the conventional algorithmin Sec. V B, this is more e ffi cient in operations when β (cid:28) n and in memory when β (cid:28) n . The calculation of W xy ( ω ) atquadrature points in Eq. (39) is noniterative and has the samecost, but we lack e ffi cient formulas relating W xy ( ω ) to σ xy . ALGORITHM 5. Fast BRPA inner loop function inBRPAf ( ω e , Ω e , ∇ ef , V xy , A e xy , A exy ; R exy ) workspace : { X ef , X exy , Y exy } X ef : = Ω f (1 − δ ef ) / ( ω e − ω f + δ ef ) X exy : = V xz A e zy (cid:46) β γα n Y exy : = Ω e ∇ ef X fxy (cid:46) β α n R exy : = A e xy − A ezx Y ezy (cid:46) β α n Y exy : = A ezx X ezy (cid:46) β α n R exy : = R exy + X ef Y fxy (cid:46) β α n Y exy : = X ef A fxy (cid:46) β α n R exy : = R exy + Y ezx X ezy (cid:46) β α n X exy : = V xz R ezy (cid:46) β γα n R exy : = A exy − R exy − Y exz X ezy (cid:46) β α n Y exy : = A exz X ezy (cid:46) β α n R exy : = R exy + X ef Y fxy (cid:46) β α n Y exy : = X exy + Ω e ∇ ef X fxy (cid:46) β α n R exy : = R exy − A exz Y ezy (cid:46) β α n end function In terms of half-transformed Green’s functions, G aax = φ ∗ ax ω a − (cid:15) a , G ixi = φ ix ω i − (cid:15) i , (54)Eqs. (25), (26), (28), (40), and (42) reduce T abi j to T abi j ≈ − G aax G ixi Ω eai U e fxy Ω fb j G bby G jy j , U e fxy = L egxz Ω g V zw L f gyw , L e fxy = δ ef δ xy + Ω g ∆ f ge V xz A gzy . (55)This bears some resemblance to the approximate THC-CCSDform of T abi j . U e fxy appears in the BRPA analog of Eq. (50), σ xy ≈ F ixz G izy − G axz F azy , F ixy = Ω eai G axy W eyx + Ω ea j Ω fbi G axz U e fzy ( Ξ jbzyx + J jbzy ) , F axy = Ω eai G ixy ( W exy + v ex ) + Ω ea j Ω fbi G izy U e fxz Ξ jbxzy , Ξ iaxyz = G ixw G awy V wz , J iaxy = G ixz f zw G awy , W exy = Ω e V xz A ezw V wy , v ex = U e fxy Ω fai J iayy . (56) σ xy is calculated by Algorithm 6 using O ( α β β ( β + γ ) n )operations and O ( α ( β + β + β ) n ) memory. A fast BRPAcalculation of σ xy is a factor of O ( β − β β ) more expensivethan a fast MP2 calculation of σ xy in the γ (cid:29) β m regime. Thecalculations in Eq. (56) are especially di ffi cult when memorylimitations prohibit concurrent storage of U e fxy , J iaxy , and Ξ iaxyz .Full evaluation of Ξ iaxyz requires O ( α β β γ n ) operations, andwithout concurrent storage it must be evaluated twice overall.Without the simple primary and auxiliary basis structure fromSec. III A, pseudocode that is equivalent to Algorithm 6 willbecome significantly longer and more di ffi cult to design. ALGORITHM 6. Fast BRPA outer loop function outBRPAf ( ω e , Ω e , ∇ ef , Ω eai , f xy , V xy , A exy , G pxy ; σ xy ) workspace : { U ef , W ef , X ef , Y ef , Z ef , X eai , v ex , X efx , X aix , X epx , Y epx , W exy , X exy , X ixy , F pxy } F pxy : = X ef : = (1 − δ ef ) / ( ω e − ω f + δ ef ) Y ef : = Ω g X ge X gf X ixy : = G ixz f zy X exy : = Ω e V xz A ezy W exy : = V yz X exz X aix : = X ixy G ayx v ex : = Ω eai X aix X efx : = X eyx v fy v ex : = v ex + X ef ( X f fx − X fex ) − ∇ ef X efx v ex : = Ω e V xy v ey X efx : = X exy v fy v ex : = v ex + X fe ( X efx + X fex ) − ∇ fe X f fx for each x do X eay : = Ω eai G ixy X eiy : = Ω eai G ayx F axy : = F axy + X eay ( W exy + v ex ) F iyx : = F iyx + X eiy W exy for each y do Z ef : = X exz W fyz U ef : = Y ef Z ef + Ω g ( ∇ ge ∇ gf Z gg − X ge ∇ gf Z eg − ∇ ge X gf Z gf ) W ef : = W fxy − X eg Z fg U ef : = U ef + Ω f ( X fe W fe − ∇ fe W ff ) W ef : = W fyx − X eg Z gf U ef : = U ef + Ω e ( X ef W ef − ∇ ef W ee ) U ee : = U ee + Ω e ( V xy − X ef W fxy − X ef W ef ) X eai : = U ef Ω fai (cid:46) β β β α n X aiz : = G azx G iyz X aiz : = V zw X aiw (cid:46) β β γα n Y eiz : = X eaz X aiz F ayz : = F ayz + X eai Y eiz (cid:46) β β β α n X aiz : = G azy G ixz X aiz : = V zw X aiw (cid:46) β β γα n X aiz : = X aiz + X ixw G awy Y eaz : = X eiz X aiz F izy : = F izy + X eai Y eaz (cid:46) β β β α n end for end for σ xy : = F ixz G izy − G axz F azy end function VI. APPLICATIONS
Before committing further to its development, it is prudentto test the accuracy of BRPA theory against other popular totalenergy methods and performance of the fast MP2 and BRPAalgorithms against their conventional counterparts. Direct useof the algorithms in Sec. V is possible for Hamiltonians thatare limited to a zero-di ff erential-overlap (ZDO) form . Therehave been proposals to calculate electron correlation within asemiempirical framework , but they remains undeveloped. E ( H a ) (a) 1.151.101.051.000.950.900.850.800.75 E ( H a ) (b)HFBRPAMP2exactB3LYPsymmetry-broken gapnoninteracting gap (2 t ) optimized gap1 2 3 4 5 6 7 8 9 10 11 12 R (Bohr)0.00.10.20.30.40.50.60.70.80.9 ∆ ( H a ) (c) 1 2 3 4 5 6 7 8 9 10 11 12 R (Bohr) 0.010.000.010.020.030.040.05 ∆ E ( H a ) (d) FIG. 1. Total energy, E , with symmetry (a) preserved and (b) broken, (c) virtual-occupied energy gap, ∆ , and (d) error in the symmetry-brokentotal energy, ∆ E , for H as a function of bond length, R . The RPA + SOSEX correlation energy is exact for an optimized gap when R < . We consider H n because it gives us fine control over thenumber of electrons and also because H is the simplest two-electron molecule with an internal coordinate. The ZDO formof H is an extended Hubbard model with α = h = µ − t − t µ µ − t − t µ , V = U V U VV U V UU V U VV U V U , (57)in the notation of Eq. (22). It has a natural pairwise extensionto the n > and H n Hamiltonians are fit inAppendix B. The H version is fit to reproduce exact, HF, andMP2 total energies. The H n version is fit to a simple distance-dependent form that gives proper asymptotic behavior. Thesemodels have limited transferability, which is a standard caveatof semiempirical modeling with simple Hamiltonian forms. A. Accuracy of BRPA on H Symmetric dissociation of H is used as a critical test ofnew electron correlation methods . The model in Eq. (57)has simple formulas for HF, MP2, BRPA, and exact energies, E HF = E + µ − t ) + ( U + V ) , E MP2 = E HF − ( U − V ) t + V , E BRPA = E HF − ( U − V ) t + U , E = E HF + t − (cid:113) t + ( U − V ) . (58)Dissociation curves are shown in Fig. 1a. BRPA is a uniformimprovement over HF but has three times the error of MP2 orB3LYP at equilibrium. These methods fail at dissociation, butBRPA is stable at twice the B3LYP error as MP2 diverges. RPA + SOSEX total energies are strongly modulated by thechoice of mean field. As shown in Fig. 1c, errors in H nearequilibrium can be corrected by significantly reducing the gap.This is typical behavior, and the smaller orbital gaps in DFTreference mean fields systematically improve total energies over HF-based calculations . Reducing gaps to reduce errorsconflicts with the generalized Koopmans theorem in Eq. (21)that requires the BRPA gap to approximate the physical gap.On H , BRPA increases the HF gap towards the exact gap, butsome methods for calculating excitations such as GW theorydecrease the HF gap . One possible resolution is to derive anRPA + SOSEX model where orbital energies are not requiredto approximate electron addition and removal energies, suchas by modifying P λ xy ( ω ) in Eq. (35) with exchange terms .Dissociation failures can be fixed by enabling symmetrybreaking in the reference state. As shown in Fig. 1d, the errornow peaks at around twice the equilibrium bond length. MP2now has uniformly less error than BRPA, but BRPA fixes thederivative discontinuity in the MP2 energy surface. Symmetrybreaking is needed for size consistency at this level of theorywhen considering the dissociation of a closed-shell moleculeinto open-shell fragments. It also serves as a simple model ofstatic correlation. Broken symmetries can be restored with aprojection , but the resulting total energy corrections are notsize consistent. An accurate, size-consistent model of electroncorrelation should be able to repair symmetries broken by thereference state accurately but not necessarily exactly.A single example such as H is not enough to determineerror statistics and compare the average errors of total energymethods. However, while errors on a system as important andsimple as H remain large, it su ffi ces to discount methods.0 TABLE I. Leading-order, per-iteration floating-point costs and memory footprints of MP2 and the inner and outer loops of BRPA.
Algorithm Operations Memory1. Conventional MP2 4 α n . α n
2. Conventional Inner BRPA 4 α n α n
3. Conventional Outer BRPA 4 α n α n
4. Fast MP2 6 β β β α n + β (2 β + β + γ ) α n β β β + ( β β + β β ) α n + (3 β + β + β ) α n
5. Fast Inner BRPA 2 β (5 β + γ ) α n + β α n β + . β α n
6. Fast Outer BRPA 2 β β β α n + β + γ ) β β α n β β β + ( β + β β + β β + β β ) α n + (3 β + β + β ) α n B. Scaling of BRPA on H n We benchmark the algorithms in Sec. V on H n rings, withthe intent to produce representative scaling behavior and notnecessarily to optimize performance. To this end, we consideronly n = m + ∆ ≈ . n − . . Using thequadrature formulas in Appendix A with errors that are set to ε Q = − , the quadrature sizes are observed to increase as β ≈ . + . n ,β ≈ − . + . n ,β ≈ . + . n . (59)These are not asymptotic scalings. When the orbital gap fallsbelow a numerical low-energy cuto ff , we will add an artificialgap comparable to this cuto ff that limits quadrature size.Benchmarks are measured on a c implementation . TheHF Hamiltonian matrix is diagonalized with the dsyev routinein lapack . Tensors are contracted with the real dgemm andcomplex zgemm routines in blas . A conversion factor fromruntimes to operation counts is determined by timing the 2 n operations of n -by- n matrix-matrix multiplications in dgemm calls for large n . Conventional algorithms use real arithmetic,while fast algorithms use complex arithmetic. The increasedcost of complex arithmetic is mitigated by using symmetry . V xy ρ y is summed with fftw . For regular behavior of γ , wefurther restrict H n to n = p − V xy ρ y in a cyclicconvolution of size 2 p + . We observe an fftw scaling of γ ≈ +
13 ln n . (60)Other fast summation methods avoid a ln n prefactor.Benchmark results are shown in Fig. 2. The runtimes arecompared to the model estimates summarized in Table I. Theagreement is good, and all discrepancies can be rationalized.The inBRPAf cost is increased 50% by complex arithmetic .The excess cost of MP2f is caused by zgemm calls with small O ( β ) matrix dimensions. The excess cost of outBRPAf iscaused by subleading-order terms with large prefactors. Fastand conventional calculations agree to 10 − Ha in σ xy .In the cost model, the crossover between conventional andfast runtimes occurs at n =
62 for MP2 and n =
422 for BRPA.As all algorithms have an O ( α ) scaling, larger basis sets willnot shift the crossover. Other prefactors will but are indicativeof realistic values as quadrature sizes are reported from 16to 40 and the FFT is a standard fast summation method. -4 -3 -2 -1 ti m e p e r it e r a ti on ( s ) (a) RPA outconv . MP2 conv . RPA inconv . RPA outfast
MP2 fast
RPA infast HF β β β γ -1 m e m o r y ( M i B ) (b)10 n p r e f ac t o r (c) FIG. 2. The (a) operation, (b) memory, and (c) prefactor costs of H n on a single core of an Intel Xeon X5650. Points are measured costs,and the equivalently colored lines are the model costs . The modelconversion factor between operation count and runtime is 10 Gflops. VII. DISCUSSION
A natural continuation of the research in this paper is theextension of the algorithms in Sec. V for implementation inestablished electronic structure codes. For a Gaussian-orbitalcode, this means nontrivial overlap matrices between primarybasis functions and between pairs of primary basis functionsand auxiliary basis functions. It needs an implementation ofTHC or some other decomposition of V pqrs with a sparse vertextensor and a Gaussian-compatible FMM . This combinationof features is not yet available in any code. For a planewave-pseudopotential code, algorithms for periodic systems need tobe developed. Suitable FFT and grid operations for e ffi cient V pqrs decomposition are widely available in these codes.An orthogonal direction for continued research is furtherdevelopment of basic algorithms and numerical analysis thatimprove cubic-scaling correlation models. One such directionis analysis of the slow basis set convergence that is a universaldetriment to non-DFT electron correlation models. Another isthe development of more accurate correlation models that arerestricted to O ( α n ) operations and O ( α n ) memory. Thesetwo directions are discussed in more detail below. A. The operator approximation problem
The α prefactor depends on the choice of basis set. Basisconstruction is widely considered as a function approximationproblem of electron orbitals. It is usually focused on occupiedorbitals, their response to perturbations, and low-lying virtualorbitals. Larger α values improve their approximation, whilesmaller α values reduce costs. The number of virtual orbitalsdefined by the basis set is also determined by α .Slow basis set convergence manifests in the virtual orbitalsummations in MP2 and related calculations. Convergence isoften accelerated with basis set extrapolation to avoid large α values . In wavefunction-based methods, slow convergenceis attributed to electron-electron cusps in the wavefunction that are corrected by R12 / F12 or Jastrow methods. In thefast algorithms in Sec. V, there are only operators and theirfinite-basis errors instead of summations over virtual orbitalsor wavefunction cusps. In the case of a spinless nonrelativisticHamiltonian, the BRPA operator variables asymptote to G ( (cid:126) x , (cid:126) y , ω ) → − π | (cid:126) x − (cid:126) y | , A ( (cid:126) x , (cid:126) y , ω ) → − ρ ( (cid:126) x , (cid:126) y )2 π | (cid:126) x − (cid:126) y | , (61)as | (cid:126) x − (cid:126) y | →
0. These singularities converges slowly with basisset size and are better approximated by functions of | (cid:126) x − (cid:126) y | .The e ffi cient and accurate representation of A ( x , y , ω ) and G ( x , y , ω ) is an operator approximation problem. G ( x , y , ω ) isthe simplest case where we seek to satisfy (cid:90) d z [ ωδ ( x − z ) − f ( x , z )] G ( z , y , ω ) ≈ δ ( x − y ) . (62)The conventional approach is to approximate G ( x , y , ω ) with asum of pairwise products of basis functions . The possibilitiesbeyond this include basis operators that are added linearly oras pairwise products and the direct modeling of singularities.An operator-based approach can also use spatial localization that does not manifest in molecular orbitals. These ideas forma distinct alternative to orbital function approximation. B. The cost-restricted electron correlation problem
The BRPA model developed in this paper is in the class ofRPA models, but it is derived from BCCD theory. Historically,the mathematical structure of RPA has emerged from multipletheories: resummation of diagrammatic perturbation theory ,boson approximation of electron-hole excitation operators ,equations-of-motion for ground state annihilation operators ,integration of response functions over interaction strength ,integration of Slater determinants over generator variables ,1 / N expansion in N interacting copies of the Hilbert space ,total energy functionals of the many-body Green’s function ,and an adaptation of interaction strength integration to DFT .This diverse set of theoretical ideas is responsible for the largenumber of modern RPA variants . Much like DFT, thereis no rigorous delineation on what can be incorporated into anRPA model of electron correlation energy.The results of this paper suggest a delineation of modelsbased on computational complexity: a cost strictly bound bya scaling of O ( n ) operations and O ( n ) memory with a limiton the form of allowed approximations. Other approximationssuch as orbital localization and stochastic sampling canfurther reduce the scaling of MP2-like methods but introducelocalization length and number of samples as complications.A critical question is whether or not these cost restrictions aresu ffi cient to enable method development based on the directapproximation of a quantum state as in CC theory or if it is tobe confined to the indirect and often empirical developmentthat is characteristic of modern DFT . VIII. CONCLUSIONS
The results of this paper cross an important threshold. Forthe first time, the computational cost of an electron correlationmodel with a direct connection to an underlying wavefunctionhas been reduced to the canonical cost of an SCF calculation, O ( n ) operations and O ( n ) memory, without localization orstochastic approximations. This includes MP2 theory and anRPA + SOSEX approximation of BCCD theory. The new ideasused to achieve this result may also contribute to reducing thecost of more accurate electron correlation models.Electron correlation models that require O ( n ) operationsand O ( n ) memory could constitute a distinct paradigm withcontinued development. They are a compromise between thelow cost of DFT and the high accuracy of CC theory. For thiscompromise to be worthwhile, we need to explore the balancebetween cost and accuracy to identify its fundamental limits. ACKNOWLEDGMENTS
I thank Jay Sau, Norm Tubman, Je ff Hammond, AndrewBaczewski, Rick Muller, and Toby Jacobson for discussions.I thank Andrew Baczewski for checking the mathematics andproofreading the manuscript. This work was supported by theLaboratory Directed Research and Development program atSandia National Laboratories. Sandia National Laboratories isa multi-program laboratory managed and operated by SandiaCorporation, a wholly owned subsidiary of Lockheed MartinCorporation, for the U.S. Department of Energy’s NationalNuclear Security Administration under contract DE-AC04-94AL85000.2
Appendix A: Numerical quadrature
An implementation of the fast MP2 and BRPA algorithmsrequires numerical quadratures for the integrals appearing inEqs. (25) and (26). We restate these integrals and quadraturesin a more general notation with explicit summations,1 a + a (cid:48) = (cid:90) i ∞− i ∞ dx π i a − x )( a (cid:48) + x ) ≈ n x (cid:88) i = w i ( a − x i )( a (cid:48) + x i ) , a , a (cid:48) ∈ A ,θ ( b , B ) θ ( c , C ) b − c + x = (cid:73) Y dy π i (cid:73) Z dz π i b − y )( c − z )( y − z + x ) ≈ n y (cid:88) i = n z (cid:88) j = W i j ( x )( b − y i )( c − z j ) , b , c ∈ B ∪ C ,θ ( s , S ) = ∀ s ∈ S , θ ( s , S ) = ∀ s (cid:60) S . (A1) A is the set of orbital transition energies ( (cid:15) a − (cid:15) i ), B is the setof virtual orbital energies ( (cid:15) a ), C is the set of occupied orbitalenergies ( (cid:15) i ), X is the set of imaginary reals, and Y and Z areclosed counterclockwise contours that satisfy B ⊂ in( Y ) , ( B + X ) ∩ in( Z ) = ∅ , C ⊂ in( Z ) , ( C − X ) ∩ in( Y ) = ∅ , (A2)for interior, in( S ) = { ts + (1 − t ) s (cid:48) : t ∈ [0 , , s , s (cid:48) ∈ S } , andelemental arithmetic, S ± S (cid:48) = { s ± s (cid:48) : s ∈ S , s (cid:48) ∈ S (cid:48) } , setoperations. We assume A = B − C and a > ∀ a ∈ A .Eq. (A1) is assembled from more elementary quadratures.We split the first integral into two terms with partial fractions,1 a + a (cid:48) = a + a (cid:48) (cid:90) i ∞− i ∞ dx π i (cid:34) a − x − − a (cid:48) − x (cid:35) , (A3)and fit quadrature to its two terms on a merged domain, θ ( a , A ) − = (cid:90) i ∞− i ∞ dx π i a − x ≈ n x (cid:88) i = w i a − x i , a ∈ A (cid:48) , (A4)for A (cid:48) = A ∪ {− a : a ∈ A } . For the second integral, we splitwith partial fractions after performing one of the integrations, θ ( b , B ) θ ( c , C ) b − c + x = θ ( c , C ) b − c + x (cid:73) Y dy π i (cid:34) b − y − c − x − y (cid:35) , = θ ( b , B ) b − c + x (cid:73) Z dz π i (cid:34) c − z − b + x − z (cid:35) , and again fit quadratures to each contour on a merged domain, θ ( b , B ) = (cid:73) Y dy π i b − y ≈ n y (cid:88) i = u i b − y i , b ∈ B (cid:48) ,θ ( c , C ) = (cid:73) Z dz π i c − z ≈ n z (cid:88) i = v i c − z i , c ∈ C (cid:48) , (A5)for B (cid:48) = B ∪ ( C − X ) and C (cid:48) = C ∪ ( B + X ). The fits combineto produce the quadrature weight, W i j ( x ) = u i v j / ( y i − z j + x ).We assume that the reassembly of these quadratures is stableand leave further error analysis to future work. We approximate Eq. (A4) with a 1-parameter quadrature,12 sgn( a (cid:48) ) ≈ n + (cid:88) i = w (cid:48) i a (cid:48) − x (cid:48) i , a (cid:48) ∈ [ − , − k ] ∪ [ k , , (A6)for 0 < k < (cid:34) w i x i a (cid:35) = max( A ) (cid:34) w (cid:48) i x (cid:48) i a (cid:48) (cid:35) , k = min( A )max( A ) , n x = n + . (A7)Zolotarev’s rational approximation of the sign function usingJacobi elliptic functions minimizes the maximum error , ε = r ( k ) − r ( κ ) r ( k ) + r ( κ ) , r ( x ) = x n (cid:89) m = x + f m − x + f m , f m = k sn( m θ, k (cid:48) )cn( m θ, k (cid:48) ) , w (cid:48) = r ( k ) + r ( κ ) n (cid:89) m = f m − f m , w (cid:48) m = w (cid:48) m + =
12 1 r ( k ) + r ( κ ) n (cid:89) p = f m − f p − f m − f p (1 − δ pm ) , x (cid:48) = , x (cid:48) m = − x (cid:48) m + = i f m , k (cid:48) = √ − k , κ = k dn( θ, k (cid:48) ) , θ = K (cid:48) ( k )2 n + . (A8) ε is the maximum error, which decays exponentially in n withan exponent of 2 π K ( k ) / K (cid:48) ( k ) that asymptotes to π / ln(4 / k )in the k → . sn( u , k ), cn( u , k ), and dn( u , k ) are Jacobielliptic functions. K ( k ) and iK (cid:48) ( k ) are their quarter periods.We approximate Eq. (A5) with a 1-parameter quadrature,sgn(Re( b (cid:48) )) ≈ n + (cid:88) i = u (cid:48) i b (cid:48) − y (cid:48) i , b (cid:48) ∈ X ∪ [ λ, , (A9)for 0 < λ < (cid:34) u i y i − max( C ) b − max( C ) (cid:35) = [max( B ) − max( C )] (cid:34) u (cid:48) i y (cid:48) i b (cid:48) (cid:35) ,λ = min( B ) − max( C )max( B ) − max( C ) , n y = n + , or (cid:34) v i z i − min( B ) c − min( B ) (cid:35) = [min( C ) − min( B )] (cid:34) u (cid:48) i y (cid:48) i b (cid:48) (cid:35) ,λ = max( C ) − min( B )min( C ) − min( B ) , n z = n + . (A10)The interiors of B + X and C − X are omitted from Eq. (A9) bythe maximum modulus principle. We map from Eq. (A6) toEq. (A9) with a shift of sgn( a (cid:48) ) by 1 / a (cid:48) = (1 + k )( b (cid:48) ) − k − (1 + k )( b (cid:48) ) , k = λ − λ + √ − λ . (A11)The rational transformation creates a reflection of the originalcontour about the imaginary axis and doubles the number ofpoles. We remove the new contour and its poles leaving u (cid:48) i = w (cid:48) i y (cid:48) i − k (1 + k )(1 + x (cid:48) i ) , y (cid:48) i = (cid:114) k + x (cid:48) i (1 + k )(1 + x (cid:48) i ) . (A12)Empirically, we observe that the maximum error is twice ε inEq. (A8). This approximant does not minimize the maximumerror exactly, but we conjecture that the exponential decay rateof the error with n is optimal.3 Appendix B: Semiempirical H n model To solve the semiempirical H model in Eq. (57), we usea family of reference orbitals parameterized by θ , φ a = sin θ − cos θ , φ b = − cos θ sin θ , φ i = cos θ sin θ , φ j = θ cos θ , (B1)where { i , j } label the two occupied orbitals and { a , b } label thetwo virtual orbitals. The nonzero matrix elements are h ii = h jj = µ − t sin 2 θ, h aa = h bb = µ + t sin 2 θ, h ia = h jb = t cos 2 θ, V pppp = V a ja j = V bibi = U − ( U − V )(sin 2 θ ) , V aiai = V b jb j = V abab = V i ji j = V + ( U − V )(sin 2 θ ) , V iiaa = V j jbb = − V i jab = ( U − V )(sin 2 θ ) , V ibab = V iiai = V jaba = V j jb j = − V iaaa = − V i ja j = − V jbbb = − V jibi = ( U − V ) sin 4 θ, (B2)with symmetric extensions for qp ⇔ pq and pqrs ⇔ qpsr .The BCCD total energy in terms of T = T abi j is E = E + µ − t sin 2 θ ) + V + ( U − V )(1 − T )(sin 2 θ ) . (B3)The mean field equations are an orbital energy di ff erence, ∆ ,and a coupling between occupied and virtual orbitals, ∆ = t sin 2 θ + U − ( U − V )(1 − T )(sin 2 θ ) , (B4a)0 = [ t (1 + T ) − ( U − V )(1 − T ) sin 2 θ ] cos 2 θ. (B4b)These are the complete HF equations if T =
0. There are upto two solutions: a nonsymmetric state and a symmetric statefor sin 2 θ =
1. In BCCD theory, T is the solution to Eq. (12),0 = ∆ − U ) T − ( U − V )(1 − T + T )(sin 2 θ ) . (B5)The RPA Riccati equation in Eq. (14) similarly reduces to0 = ∆ T − ( U − V )(1 − T + T )(sin 2 θ ) , (B6)which simplifies the 4-by-4 matrix equation for T abi j to a directcondition on T by exploiting matrix symmetries.For θ = π , the quadratic formula is used to solve for T , T (exact) = (cid:114)(cid:104) tU − V (cid:105) + − tU − V ≤ , T (RPA) = (cid:104) + ∆ U − V (cid:105) − (cid:114)(cid:104) + ∆ U − V (cid:105) − ≤ , T (BRPA) = U − VU + t ≤ . (B7)The bounds on T are achieved at dissociation and for ∆ = ∆ . Theapproximations change the fundamental range of T values. We determine one set of parameters for the semiempiricalmodel by solving for { t , µ, V } given { E HF , E MP2 , E } at θ = π , E = E HF − E MP2 )( E HF − E ) / ( E MP2 − E ) , V = U + E (cid:104) − (cid:112) + ( E − E HF + U ) / E (cid:105) , t = ( U − V ) / ( E HF − E ) − ( E HF − E ) ,µ = t − ( U + V ) − ( E − E HF ) . (B8)where E = / R for interatomic separation R and U is set toits value at the dissociation limit, U = . data isgenerated in gaussian09 using the cc-pV6Z basis . The dataand parameters are compiled in Table II.We consider an extension to H n having a pairwise form, E = (cid:88) x (cid:44) y R xy , x , y ∈ { , · · · , n } , h ( x σ )( y σ (cid:48) ) = − δ σσ (cid:48) δ xy − δ σσ (cid:48) (1 − δ xy ) t ( R xy ) + δ σσ (cid:48) δ xy (cid:88) z (cid:44) x ∆ µ ( R xz ) , V ( x σ )( y σ (cid:48) ) = δ xy . + (1 − δ xy ) V ( R xy ) , (B9)with spin indices, σ ∈ {↑ , ↓} , and interatomic distances, R xy .Such a simple model will have limited transferability and careis required in fitting the parameter functions.For simplicity, we restrict the scaling study to uniform H n rings that obey H¨uckel’s rule, n = m +
2. Here, R xy is R xy = R (cid:113)(cid:104) − cos (cid:16) π ( x − y ) n (cid:17)(cid:105) (cid:46) (cid:104) − cos (cid:16) π n (cid:17)(cid:105) , (B10)for nearest-neighbor distance R . Spin and spatial symmetriessimplify the normalized orbitals to φ (1 σ )( x σ (cid:48) ) = δ σσ (cid:48) (cid:113) n , φ ( n σ )( x σ (cid:48) ) = δ σσ (cid:48) (cid:113) n ( − x ,φ (2 m σ )( x σ (cid:48) ) = δ σσ (cid:48) (cid:113) n sin (cid:16) π mxn (cid:17) , ≤ m ≤ n − ,φ (2 m + σ )( x σ (cid:48) ) = δ σσ (cid:48) (cid:113) n cos (cid:16) π mxn (cid:17) , (B11)The density matrix sums to form the Dirichlet kernel, ρ ( x σ )( y σ (cid:48) ) = n δ σσ (cid:48) sin( π ( x − y ) / π ( x − y ) / n ) . (B12)We use the HF energy of H , E , for fitting to these H n rings. R is optimized by minimizing the HF total energy.We fit Eq. (B9) to { E HF , E , E } with exponential forms for t ( R ), ∆ µ ( R ), and V ( R ) plus long-range asymptotic correctionsfor ∆ µ ( R ) and V ( R ) with the form of an electrostatic potentialbetween a 1s Slater-type orbital and point charge. With 6 freeparameters, the result of this nonlinear least-squares fit is V ( R ) = .
304 exp( − . R ) + [1 − exp( − . R )] / R , ∆ µ ( R ) = .
000 exp( − . R ) − [1 − exp( − . R )] / R , t ( R ) = .
206 exp( − . R ) , (B13)with a root-mean-square error of 0 .
008 Ha / atom. The primedparameters calculated from Eq. (B13) are given in Table II. Inthis model, H n rings have optimal R values between 1 .
36 and1 .
89 and their HF energy asymptotes to − .
527 Ha / atom.4 T A B LE II . T o t a l e n e r g i e s ( o f H un l e ss o t h e r w i s e s t a t e d ) a nd m od e l p a r a m e t e r s ( i n H a r t r ee s ) a t n ea r e s t - n e i ghbo r i n t e r a t o m i c s e p a r a ti on R ( i n B oh r s ) . R H F ( H ) H F B L Y P U B L Y P M P e x ac t µµ (cid:48) tt (cid:48) U & U (cid:48) VV (cid:48) . . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . . . . . . . − . − . − . − . − . − . − . − . − . . . . . . − . − . − . − . − . − . − . − . − . . . . . . − . − . − . − . − . − . − . − . − . . . . . . − . − . − . − . − . − . − . − . − . . . . . . − . − . − . − . − . − . − . − . − . . . . . ∞ − . − . − . − . − ∞ − . − . − . . . . . . A. Szabo and N. S. Ostlund,
Modern Quantum Chemistry: Introduction toAdvanced Electronic Structure Theory (Dover, New York, 1996). P. Hohenberg and W. Kohn, Phys. Rev. , B864 (1964). W. Kohn and L. J. Sham, Phys. Rev. , A1133 (1965). J. P. Perdew and A. Zunger, Phys. Rev. B , 5048 (1981). P. R. Briddon and M. J. Rayson, Phys. Status Solidi B , 1309 (2011). R. J. Bartlett and M. Musiał, Rev. Mod. Phys. , 291 (2007). G. E. Scuseria, T. M. Henderson, and D. C. Sorensen, J. Chem. Phys. ,231101 (2008). F. Furche, J. Chem. Phys. , 114105 (2008). H. Eshuis, J. Yarkony, and F. Furche, J. Chem. Phys. , 234114 (2010). A. Gr¨uneis, M. Marsman, J. Harl, L. Schimka, and G. Kresse, J. Chem.Phys. , 154115 (2009). D. M. Ceperley and B. J. Alder, Phys. Rev. Lett. , 566 (1980). D. L. Freeman, Phys. Rev. B , 5512 (1977). J. Paier, B. G. Janesko, T. M. Henderson, G. E. Scuseria, A. Gr¨uneis, andG. Kresse, J. Chem. Phys. , 094103 (2010); , 179902 (2010). A. Heßelmann, J. Chem. Phys. , 204107 (2011). X. Ren, P. Rinke, G. E. Scuseria, and M. Sche ffl er, Phys. Rev. B , 035120(2013). J. Paier, X. Ren, P. Rinke, G. E. Scuseria, A. Gr¨uneis, G. Kresse, and M.Sche ffl er, New J. Phys. , 043002 (2012). C. Van Alsenoy, J. Comput. Chem. , 620 (1988). L. Greengard and V. Rokhlin, J. Comput. Phys. , 325 (1987). J. Alml¨of, Chem. Phys. Lett. , 319 (1991). R. K. Nesbet, Phys. Rev. , 1632 (1958). L. Z. Stolarczyk and H. J. Monkhorst, Int. J. Quantum Chem., Symp. ,267 (1984). L. Goerigkab and S. Grimme, Phys. Chem. Chem. Phys. , 6670 (2011). A. J. Cohen, P. Mori-S´anchez, and W. Yang, Chem. Rev. , 289 (2012). T. Koopmans, Physica , 104 (1934). (in German) B. P. Molinari, SIAM J. Control , 262 (1973). T. N. Lan and T. Yanai, J. Chem. Phys. , 224108 (2013). Y. Akinaga, Y. Kawashima, and S. Ten-no, Chem. Phys. Lett. , 276(2011). I. Lindgren and S. Salomonson, Int. J. Quantum Chem. , 294 (2002). L. Hedin, Phys. Rev. , A796 (1965). R. M. Parrish, E. G. Hohenstein, T. J. Mart´ınez, and C. D. Sherrill, J. Chem.Phys. , 224106 (2012). H. Koch, A. S´anchez de Mer´as, and T. B. Pedersen, J. Chem. Phys. ,9481 (2003). T. J. Mart´ınez and E. A. Carter, J. Chem. Phys. , 3631 (1994). F. Weigend, A. K¨ohn, and C. H¨attig, J. Chem. Phys. , 3175 (2002). P. E. Bl¨ochl, Phys. Rev. B , 17953 (1994). J. Harl, L. Schimka, and G. Kresse, Phys. Rev. B , 115126 (2010). The α values are generated by vasp using the reported geometries and planewavecuto ff s. For an energy cuto ff E in Ha and a volume per electron V in Bohr, α ≈ V (2 E ) / / (3 π ) using basic accounting of planewaves inside a cubicunit cell in real space and a spherical Fermi sea in momentum space. L. Lin, J. Lu, L. Ying, and W. E, Chinese Ann. Math. B , 729 (2009). A. Takatsuka, S. Ten-no, and W. Hackbusch, J. Chem. Phys. , 044112(2008). S. A. Goreinov, E. E. Tyrtyshnikov , and N. L. Zamarashkin, Linear AlgebraAppl. , 1 (1997). C. A. White, B. G. Johnson, P. M.W. Gill, and M. Head-Gordon, Chem.Phys. Lett. , 8 (1994). J. Paier, R. Hirschl, M. Marsman, and G. Kresse, J. Chem. Phys. ,234102 (2005). B. Shanker and H. Huang, J. Comput. Phys. , 732 (2007). R. D. Skeel, I. Tezcan, and D. J. Hardy, J. Comput. Chem. , 673 (2002). S. B¨orm, L. Grasedyck, and W. Hackbusch, Eng. Anal. Bound. Elem. ,405 (2003). G. Jansen, R.-F. Liu, and J. G. ´Angy´an, J. Chem. Phys. , 154106 (2010). M. Feyereisen, G. Fitzgerald, and A. Komornicki, Chem. Phys. Lett. ,359 (1993). E. G. Hohenstein, R. M. Parrish, and T. J. Mart´ınez, J. Chem. Phys. ,044103 (2012). M. H¨aser, Theor. Chim. Acta , 147 (1993). E. G. Hohenstein, R. M. Parrish, C. D. Sherrill, and T. J. Mart´ınez, J. Chem.Phys. , 221101 (2012). I. Fischer-Hjalmars, J. Chem. Phys. , 1962 (1965). W. Thiel, J. Am. Chem. Soc. , 1413 (1981). T. M. Henderson and G. E. Scuseria, Mol. Phys. , 2511 (2010). P. Mori-S´anchez, A. J. Cohen, and W. Yang, Phys. Rev. A , 042507(2012). F. Caruso, D. R. Rohr, M. Hellgren, X. Ren, P. Rinke, A. Rubio, and M.Sche ffl er, Phys. Rev. Lett. , 146403 (2013). A. Heßelmann and A. G¨orling, Mol. Phys. , 2473 (2011). J. Hubbard, Proc. R. Soc. A , 336 (1957). C. A. Jim´enez-Hoyos, T. M. Henderson, T. Tsuchimochi, and G. E. Scuse-ria, J. Chem. Phys. , 164109 (2012). See the supplementary material at http: // arxiv.org / e-print / c implementation of the algorithms in Sec. V. E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J.Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen,
LAPACK Users’ Guide, Third Edition (SIAM, Philadelphia, 1999). J. J. Dongarra, J. Du Croz, I. S. Du ff , and S. Hammarling, ACM Trans.Math. Soft. , 1 (1990). In inBRPAf , tensors of the form Z ef and Z exy have symmetries Z ef = ( Z ef ) ∗ and Z exy = ( Z exy ) ∗ . MP2f contains these symmetries and adds Z epx = (cid:18) Z ep ∗ x (cid:19) ∗ , Z p ∗ xy = (cid:18) Z p xy (cid:19) ∗ , and Z eai = (cid:18) Z ea ∗ i ∗ (cid:19) ∗ . outBRPAf contains these symmetriesand adds Z a ∗ ix = (cid:18) Z ai ∗ x (cid:19) ∗ . Only half the entries of these symmetric tensorsneed to be stored and calculated, which halves the memory and operationcounts. Complex arithmetic doubles memory usage and FFT runtimes andtriples matrix multiplication runtimes over real arithmetic. The additionalcost of complex arithmetic is mitigated for all the operations except matrixmultiplication ( zgemm ), which is reduced to 1.5 times the cost model. M. Frigo and S. G. Johnson, Proc. IEEE , 216 (2005). The leading-order cost model assumes that all parameters are much largerthan one, which is false for α =
2. We correct the conventional algorithmcosts with α → α ( α − for the operation counts, 3 α → α + α − α → α ( α −
1) for both BRPA memory costs. D. G. Truhlar, Chem. Phys. Lett. , 45 (1998). T. Kato, Commun. Pure Appl. Math. , 151 (1957). L. Kong , F. A. Bischo ff , and E. F. Valeev, Chem. Rev. , 75 (2012). N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. B , 235119(2004). S. Goedecker, Rev. Mod. Phys. , 1085 (1999). M. Gell-Mann and K. A. Brueckner, Phys. Rev. , 364 (1957). K. Sawada, Phys. Rev. , 372 (1957). M. Baranger, Phys. Rev. , 957 (1960). A. D. McLachlan and M. A. Ball, Rev. Mod. Phys. , 844 (1964). B. Jancovici and D. H. Schi ff , Nucl. Phys. , 678 (1964). A. J. Glick, H. J. Lipkin, and N. Meshkov, Nucl. Phys. , 211 (1965). J. Harris and R. O. Jones, J. Phys. F , 1170 (1974). H. Eshuis, J. E. Bates, and F. Furche, Theor. Chem. Acc. , 1084 (2012). X. Ren, P. Rinke, C. Joas, and M. Sche ffl er, J. Mater. Sci. , 7447 (2012). P. Y. Ayala and G. E. Scuseria, J. Chem. Phys. , 3660 (1999). S. Y. Willow, K. S. Kim, and S. Hirata, J. Chem. Phys. , 204122 (2012). D. Neuhauser, E. Rabani, and R. Baer, J. Chem. Theory Comput. , 24(2013). J. P. Perdew and K. Schmidt, AIP Conf. Proc. , 1 (2001). E. I. Zolotarev, Zap. Imp. Akad. Nauk, St. Petersburg , 5 (1877);reprinted in Collected works (Izdat. Akad. Nauk SSSR, Leningrad, 1932),Vol. 2, pp. 1-59. (in Russian) A. A. Gonˇcar, Math. USSR Sb. , 623 (1969). M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J.R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H.Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino,G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda,J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T.Vreven, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. Bearpark, J.J. Heyd, E. Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J.Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S. Iyengar, J.Tomasi, M. Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B.Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann,O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Mar-tin, K. Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dan-nenberg, S. Dapprich, A. D. Daniels, ¨O. Farkas, J. B. Foresman, J. V. Ortiz,J. Cioslowski, and D. J. Fox, gaussiangaussian