[PDF] Fast periodic Gaussian density fitting by range separation

Abstract

We present an efficient implementation of periodic Gaussian density fitting (GDF) using the Coulomb metric. The three-center integrals are divided into two parts by range-separating the Coulomb kernel, with the short-range part evaluated in real space and the long-range part in reciprocal space. With a few algorithmic optimizations, we show that this new method -- which we call range-separated GDF (RSGDF) -- scales sublinearly to linearly with the number of k-points for small to medium-sized k-point meshes that are commonly used in periodic calculations with electron correlation. Numerical results on a few three-dimensional solids show about 10-fold speedups over the previously developed GDF with little precision loss. The error introduced by RSGDF is about 10^{-5}~E_{\textrm{h}} in the converged Hartree-Fock energy with default auxiliary basis sets and can be systematically reduced by increasing the size of the auxiliary basis with little extra work. [The article has been accepted by The Journal of Chemical Physics.]

Full PDF

FFast periodic Gaussian density ﬁtting by range separation

Hong-Zhou Ye and Timothy C. Berkelbach

1, 2, a) Department of Chemistry, Columbia University, New York, New York 10027, USA Center for Computational Quantum Physics, Flatiron Institute, New York, New York 10010,USA

We present an e ﬃ cient implementation of periodic Gaussian density ﬁtting (GDF) using the Coulomb metric. The three-center integrals are divided into two parts by range-separating the Coulomb kernel, with the short-range part evaluatedin real space and the long-range part in reciprocal space. With a few algorithmic optimizations, we show that this newmethod – which we call range-separated GDF (RSGDF) – scales sublinearly to linearly with the number of k -pointsfor small to medium-sized k -point meshes that are commonly used in periodic calculations with electron correlation.Numerical results on a few three-dimensional solids show about 10-fold speedups over the previously developed GDFwith little precision loss. The error introduced by RSGDF is about 10 − E h in the converged Hartree-Fock energy withdefault auxiliary basis sets and can be systematically reduced by increasing the size of the auxiliary basis with littleextra work. Introduction . For one-electron basis sets in periodic elec-tronic structure calculations, translational symmetry-adaptedatom-centered Gaussian functions are an alternative to thehistorically prevalent plane waves.

Using Gaussian ba-sis functions provides a more compact representation of or-bitals, allows natural access to all-electron calculations with-out pseudopotentials, and facilitates the adaptation of accu-rate quantum chemistry methods for solids.

The down-side of atom-centered orbitals is the introduction of four-indexelectron repulsion integrals (ERIs), with O ( N k n ) storageand O ( N k n ) CPU costs for Hartree-Fock (HF) calculations,where N k is the number of k -points sampled in the Brillouinzone and n AO is the number of atomic orbitals in the unit cell.Moreover, the direct real-space evaluation of ERIs requires anexpensive triple lattice summation. The Gaussian and planewave (GPW) method reduces the scaling of the storage to O ( N k n N PW ) and the HF cost to O ( N k n N PW ln N PW ) byevaluating the ERIs entirely in reciprocal space using an aux-iliary PW basis of size N PW . However, doing so necessitates apseudopotential and hence precludes all-electron calculations.In addition, a large PW basis may be needed if the basis setcontains relatively compact orbitals.Another way to reduce the cost of manipulating the ERIsis with Gaussian density ﬁtting (GDF). In GDF, the or-bital pair densities used to evaluate the ERIs are expanded ina second, auxiliary Gaussian basis of size n aux , from which thefour-center ERIs can be approximated using two- and three-center integrals evaluated with some metric function. Thenumber of the latter integrals scales as O ( N k n n aux ), whichis much lower than that of the ERIs if n aux is not too big.For molecules, highly optimized auxiliary basis sets with n aux ≈ n AO have made GDF a great success in both mean-ﬁeld and correlated calculations. We note that theGPW treatment of ERIs can also be understood as a PW den-sity ﬁtting where n PW (cid:29) n AO .The application of GDF to periodic systems has been arelatively recent e ﬀ ort. The main challenge is the highcomputational cost of evaluating the three-center integrals in a) Electronic mail: [email protected] real space if the Coulomb metric is used. There are twoclasses of periodic GDF schemes. The ﬁrst class exploitslocality to limit the auxiliary expansion based on the prox-imity to the target pair density.

The locality couldarise from the system itself, an explicit use of a local met-ric other than the Coulomb operator, or a combination ofboth. The other class insists on a global, Coulomb metric-based GDF and accelerates the integral evaluation by calcu-lating the slowly convergent, long-range part separately, e.g.,in reciprocal space using a PW basis or in real space us-ing a multipole expansion.

The global GDF withthe Coulomb metric is generally considered more accurate butless computationally e ﬃ cient than the local one. Here, we introduce an e ﬃ cient implementation of a global,Coulomb metric-based GDF for periodic systems. We usethe error function to range-separate the Coulomb metric in-tegrals, evaluating the short-range part in real space andthe long-range part in reciprocal space, similar in spirit toRefs. 45 and 53. With a few algorithmic developments, weshow that the new scheme – which we call range-separatedGaussian density ﬁtting (RSGDF) – scales sublinearly to lin-early with N k for small to medium-sized k -point meshes thatare commonly used in periodic calculations with electroncorrelation. Numerical tests on three simple three-dimensional solids demonstrate that RSGDF accelerates pre-vious implementations of GDF by an order of magnitude withnegligible precision loss in the computed energies. We alsoshow that the accuracy of the HF energy computed usingRSGDF can be systematically improved with little extra com-putational e ﬀ ort by increasing the size of the auxiliary basis;we achieve accuracies on the order of 10 − E h per atom withspeedups of one to two orders of magnitude compared to ref-erence GPW calculations.While ﬁnalizing this work, a preprint by Sun reported asimilar range-separation idea to accelerate the direct computa-tion of the four-center Coulomb and the exchange integrals forperiodic HF calculations, i.e. without density ﬁtting. There-fore, we will also compare our RSGDF to this new method(referred to as RSJK henceforth) in terms of accuracy andcomputational cost. Theory . We begin with a brief review of periodic GDF us- a r X i v : . [ phy s i c s . c h e m - ph ] F e b ing a basis of n AO symmetry-adapted atomic orbitals (AOs) φ k µ ( r ) = X m e i k · m φ m µ ( r ) (1)where k is a crystal momentum in the ﬁrst Brillouin zone, m is a lattice translation vector, and φ m µ ( r ) = φ µ ( r − m ). Ananalogous equation holds for the n aux auxiliary atom-centeredGaussian basis χ k P ( r ). The ERIs are the Coulomb repulsionbetween pair densities( ρ k k µν | ρ k k λσ ) = Z Ω d r Z d r ρ k k µν ( r ) ρ k k λσ ( r ) r (2)where Ω is the unit cell volume, ρ k k µν ( r ) = φ k ∗ µ ( r ) φ k ν ( r ),and the four crystal momenta satisfy ( k − k + k − k ) · m = m . In GDF, the pair densities are approximated by anauxiliary expansion ρ k k µν ( r ) ≈ n aux X Q d k k Q µν χ k Q ( r ) , (3)with k = − k + k . Minimizing the ﬁtting error in somemetric w ( r ) leads to a linear equation for d k k , n aux X Q J k PQ d k k Q µν = V k k P µν , (4)where the two- and three-center metric integrals are J k PQ = ( χ k ∗ P | w | χ k Q ) , (5) V k k P µν = ( χ k ∗ P | w | ρ k k µν ) . (6)Once { d k k } are determined, the ERIs can be easily recov-ered, ( ρ k k µν | ρ k k λσ ) ≈ n aux X P , Q d k k P µν ( χ k P | χ k Q ) d k k Q λσ . (7)The ﬁxed size of the auxiliary Gaussian basis is responsiblefor a DF error compared to a calculation without DF (through-out, we will call this the accuracy , to be contrasted with the precision with which the two- and three-center integrals areevaluated for a ﬁxed auxiliary basis). Although in principlethe same is true of GPW, the auxiliary PW basis is typicallygrown to achieve arbitrarily accurate results that are free ofDF error.The computational bottleneck of periodic GDF is due tothe three-center integrals in Eq. (6), which, when using thelong-ranged Coulomb metric w ( r ) = r − , are expensive toevaluate in real space or reciprocal space. The current imple-mentation of periodic GDF in PySCF aims to address thischallenge by introducing a Gaussian charge basis { ξ k P } to re-move the charge and multipoles of the auxiliary basis. Thissplits Eq. (6) into two parts V k k P µν = ( χ k P − ξ k P | w | ρ k k µν ) + ( ξ k P | w | ρ k k µν ) . (8) The Gaussian exponents of { ξ k P } are optimized so that the ﬁrstterm in Eq. (8) can be evaluated in real space using a lat-tice summation and the second term in reciprocal space us-ing Eq. (15). Although this yields an improvement over anyattempt to evaluate the three-center integrals entirely in realor reciprocal space, the two separate summations can both berelatively slow to converge. Our new periodic RSGDF takesa di ﬀ erent approach to evaluate the three-center integrals inEq. (6).In RSGDF, we range-separate the Coulomb operator usingthe error function r − = w SR ( r ; ω ) + w LR ( r ; ω ), w SR ( r ; ω ) = erfc( ω r ) r (9a) w LR ( r ; ω ) = erf( ω r ) r (9b)so that Eq. (6) is split into a short-range (SR) part and a long-range (LR) part with ω controlling their relative weights. Weevaluate the LR integrals in reciprocal space,( V k k P µν ) LR ω = π N PW X G e −| G + k | / ω | G + k | ˜ χ k P ( − G ) ˜ ρ k k µν ( G ) , (10)where ˜ χ and ˜ ρ are the Fourier transform of χ and ρ and theprimed summation indicates G , for k = k ; the G = component is canceled by similar terms arising from theelectron-nuclear and nuclear-nuclear energy. A relativelysmall number of PWs are necessary for convergence due tothe presence of the Gaussian damping factor. The analyti-cal Fourier transform (AFT) is needed for compact auxiliaryorbitals and pair densities, while the fast Fourier transform(FFT) can be used for di ﬀ use ones; we will return to thispoint later. The cost of this step is therefore dominated by theAFT of the orbital pair densities˜ ρ k k µν ( G ) = N AFTcell X m e − i k · m Z d r φ µ ( r ) φ m ν ( r )e − i( k + G ) · r . (11)This AFT has two separate steps: the evaluation of the real-space integrals and the subsequent contraction of these inte-grals with phase factors, which scale as O ( N k N AFTcell N PW n )and O ( N k N AFTcell N PW n ), respectively. Note that the numberof unique crystal momentum pair di ﬀ erences grows linearlywith N k and N AFTcell can be estimated from the orbital overlap.The SR part can be easily evaluated in real space by latticesummation( V k k P µν ) SR ω = N cell X mn e − i k · m e i k · n ( V , mn P µν ) SR ω − π Ω ω S k P S k k µν δ k , k (12)where( V , mn P µν ) SR ω = Z Ω d r Z d r χ P ( r ) w SR ( r ; ω ) × φ m µ ( r ) φ n ν ( r ) , (13a) S k P = Z Ω d r χ k P ( r ) , (13b) S k k µν = Z Ω d r ρ k k µν ( r ) , (13c)and the summation range N cell scales as O ( ω − ) for three-dimensional solids because ω − is the decay length of the SRpotential. The second term in Eq. (12) cancels the G = component of the ﬁrst term. With proper integral screening,only O ( N cell ) terms contribute signiﬁcantly to the double lat-tice summation to achieve a ﬁnite precision (cid:15) (see Supplemen-tary Material for a detailed derivation). Therefore, the costsscale as O ( N cell n aux n ) for the evaluation of real-space inte-grals in Eq. (13a) and O (( N k N cell + N k N ) n aux n ) for thedouble phase factor contraction in Eq. (12).Since N AFTcell and N cell can be as large as 10 , the phase fac-tor contractions in both Eqs. (11) and (12) would account formost of the computational cost (except for very small k -pointmeshes). However, if the k -points are sampled from a uniform(e.g. Monkhorst-Pack ) mesh that includes the Γ point, thenthe phase factors satisfy e i k · m = e i k · ( ˜ m + M ) = e i k · ˜ m , where ˜ m is inside the Born-von Karman supercell and M is a lat-tice translation vector of the Born-von Karman supercell. Forexample, the summation in Eq. (12) can then be rewritten as N k X ˜ m ˜ n e − i k · ˜ m e i k · ˜ n X m → ˜ m , n → ˜ n ( V , mn P µν ) SR ω , (14)where the phase factor contraction now costs O ( N k n aux n ); asimilar treatment for Eq. (11) gives O ( N k N PW n ) cost forthe phase factor contraction. This process signiﬁcantly re-duces the total cost of these contractions so that they aresubdominant (at least for moderately sized k -point mesheswhere N k is much smaller than N cell or N AFTcell ). The remain- Fig 0 CD cc cd dd(C|cc) (C|cd) (C|dd)(D|cc) (D|cd) (D|dd) A u x ili a r y b a s i s ( ! ) Working basis pair ( " ) Lat. sum (SR) + FT (LR) FT ("|$%)

FIG. 1. Schematic illustration of how di ﬀ erent components of thethree-center integrals [Eq. (6)] are evaluated in RSGDF. Both theauxiliary and the AO bases are split into a compact (“C / c”) set anddi ﬀ use (“D / d”) set. The compact integrals (shaded in red) are rangeseparated to yield short-range (SR) and long-range (LR) contribu-tions that are evaluated in real space (by lattice summation) and inreciprocal space (by Fourier transforms), respectively; the di ﬀ use in-tegrals (shaded in blue) are evaluated entirely in reciprocal space us-ing Fourier transforms. Note that the column corresponding to “dc”-type pair densities is similar to the “cd”-column and hence omittedfor simplicity. ing cost-determining steps are the real-space integral evalu-ations in Eqs. (11) and (13a), which as a reminder scale as O ( N k N AFTcell N PW n ) and O ( N cell n aux n ), respectively. Notethat the cost of these expensive steps is no worse than linearin the number of auxiliary Gaussian basis functions.The algorithm described so far yields signiﬁcant perfor-mance improvements over existing periodic GDF schemes.We have identiﬁed an additional minor improvement, moti-vated by the observation that the real-space lattice summationsneeded for the SR part are slow to converge because of di ﬀ useorbitals, i.e. those with small Gaussian exponents (recall thatthe FFT can be used in the LR part for di ﬀ use orbitals). There-fore, we split both the auxiliary and the AO bases into a com-pact (“C / c”, upper case for auxiliary) and a di ﬀ use (“D / d”)set based on a cuto ﬀ α cut for the primitive Gaussian expo-nents. Similar ideas of compact and di ﬀ use basis splittinghave also been explored by Pulay and co-workers for molec-ular calculations. This leads us to six types of three-centerintegrals as shown in Fig. 1. Four of the integral types haveeither a di ﬀ use bra or a di ﬀ use ket (shaded in blue; note thatboth “cc” and “cd” are compact) and can thus be readily eval-uated in reciprocal space using a relatively small PW basis without range separation , V k k P µν = π N PW X G ˜ χ k P ( − G ) ˜ ρ k k µν ( G ) | G + k | . (15)To summarize, in RSGDF, we evaluate these four integraltypes directly in reciprocal space and the remaining two in-tegral types with compact bra and ket using the range separa-tion scheme deﬁned above. Note that the expensive AFTs ofthe compact auxiliary orbitals and pair densities can be calcu-lated once and used in the evaluation of both integral types.In practice, a large majority of orbitals are deﬁned as compactand so we ﬁnd that this separation of orbitals speeds up ourcalculations by a factor of two or less compared to a directapplication of RSGDF for all orbitals.While the above presentation suggests a computationalscaling that is linear in N k for typical mesh densities, the actualscaling is complicated by the choice of the parameters, N PW , ω , and α cut , which we discuss more below. Empirically weﬁnd that the optimal choice of N PW scales as O ( N − / k ) (Fig.S3), and the overall cost of RSGDF scales roughly as O ( N . k )for all the systems tested in this work (Fig. S6). However,this sublinear scaling only holds for small N k , because N PW eventually reaches a minimum value. Beyond that point, theAFTs needed for the LR part dominate the cost and we expecta linear scaling of RSGDF with N k , at least for medium-sized k -point meshes. Computational details . We implemented RSGDF as pre-sented above in a local version of the PySCF softwarepackage. We test its performance in terms of precision, ac-curacy, and computational e ﬃ ciency using three simple three-dimensional solids: diamond, MgO, and LiF. For diamond, weperform all-electron calculations using the cc-pVDZ basis; for MgO and LiF, we use GTH pseudopotentials and thecorresponding GTH-DZVP basis. We use the cc-pVDZ-jkﬁtbasis and the even-tempered basis (ETB) generated with a Fig 1: Choice of Npw & scaling (C/cc-pVDZ) a b c fit: ! !".$ fit: " % " &' (.( FIG. 2. Timing of RSGDF for diamond / cc-pVDZ using N k = to 5 . (a) Total DF initialization CPU time as a function of the number ofPWs used to compute the LR integrals [Eqs. (10) and (15)]. (b) CPU time for computing the SR integrals [Eq. (12)] as a function of ω for all k -point meshes except for N k = . (c) CPU time for computing the LR integrals [Eqs. (10) and (15)] as a function of N k N PW for all k -pointmeshes except for N k = . For (b) and (c), the black lines are power law ﬁts to all data points leading to exponents as shown. progression factor β = . GPW (called FFTDF inPySCF), and RSJK, as implemented in PySCF. For RSJK,we use eq. (23) in ref. 60 to determine an appropriate ω fora given system and k -point mesh. For RSGDF, we manuallytest a range of N PW and for each we determine the maximum ω and α cut that guarantee precision (cid:15) in all integrals; future workwill focus on the automated selection of these parameters. Asa general trend, using a larger PW basis slows down the LRpart by increasing the number of expensive real-space integra-tions to be performed in Eq. (11) but accelerates the SR partby allowing larger values for ω and α cut ; a smaller PW basishas the opposite e ﬀ ect. Unless otherwise mentioned, all cal-culations are run with a target precision of (cid:15) = − a.u. forintegral evaluation, which is the default setting for production-level periodic calculations in PySCF. All timing data reportedbelow are the CPU time recorded using a single CPU core (In-tel Xeon Gold 6126 2.6 GHz) with 16 GB of memory and 100GB of disk space except for GPW which requires larger mem-ory for N k ≥ . The current implementations of GDF andRSGDF are not integral-direct, meaning that we solve Eq. (4)only once and save the coe ﬃ cients { d k k } to disk for lateruse; this step is called “DF initialization” below and requires O ( N k n aux n ) disk space which limits our calculations to amaximum k -point mesh of N k = for all three systems. Theother two methods, GPW and RSJK, are both implemented inan integral-direct manner and hence require little disk space.An integral-direct implementation of RSGDF tailored for spe-ciﬁc applications will be presented in future work. Results and discussion . We ﬁrst verify our scaling analy-sis of the CPU cost of RSGDF. In Fig. 2a, we show the RS-GDF initialization time as a function of N PW for diamond us-ing N k = to 5 k -points (recall that all calculations achievethe same target precision). The optimal N PW – identiﬁed asthe minimum on each curve – indeed decreases with N k as O ( N − / k ) (the ﬁtted exponent is about − .

44; see Fig. S3).The inverse cubic dependence of the SR time on ω is veriﬁedin Fig. 2b, and the linear scaling of the LR time with N k N PW is veriﬁed in Fig. 2c. Similar results are observed for the othertwo systems (Figs. S1 and S2), although the optimal valuesof N PW for a given N k vary slightly from system to system.We leave the automatic determination of the optimal N PW to afuture work as it requires a more careful calibration. In whatfollows, we will simply use the manually optimized valuesfrom Fig. 2a and Figs. S1a and S2a (summarized in Tab. S1).The di ﬀ erent choices of N PW (and hence ω and α cut ) in RS-GDF do not cause any inconsistency in the computed ener-gies. As shown in Fig. S4, the converged RSGDF HF ener-gies di ﬀ er from the GDF results by less than 10 − E h for dia-mond and MgO and about 10 − E h for LiF for all data pointsshown in Fig. 2 and Figs. S1 and S2. We attribute the largerdeviation observed for LiF to the linear dependency foundin the auxiliary basis. Nonetheless, these deviations are ac-ceptable as they are at least one order of magnitude smallerthan the error introduced by DF itself ( vide infra ). Beyondthe HF energy, we have also veriﬁed that the electron corre-lation energy of diamond computed with RSGDF using thesecond order Møller-Plesset perturbation theory agrees withthe GDF results to better than 10 − E h for all k -point meshestested (Tab. S2). These observations conﬁrm that the algorith-mic developments in RSGDF cause negligible precision losscompared to the original implementation of GDF.Next, we study the computational e ﬃ ciency of RSGDF. InFig. 3, we plot the per-SCF-cycle time as a function of N k forcomputing the Coulomb and the exchange integrals in a HFcalculation for all three systems using four di ﬀ erent methodsto handle the ERIs: GPW (green), RSJK (red), GDF (grey),and RSGDF (blue). Since the ﬁrst two are implemented in anintegral-direct fashion, we include the DF initialization timefor GDF and RSGDF to enable a fair comparison.We ﬁrst compare GDF and RSGDF, which both computethe three-center integrals through a SR part in real space anda LR part in reciprocal space. Although both methods ex-hibit a similar sublinear scaling with N k at large N k , only theGDF timings plateau at small N k . This di ﬀ erence arises fromthe adjustment of the PW basis size for each N k and for eachsystem (Tab. S1), which ultimately balances the SR-LR cost Fig 3: DF vs noDF a b c FIG. 3. CPU time (per SCF cycle) for computing the Coulomb and the exchange integrals in a HF calculation for (a) diamond / cc-pVDZ, (b)MgO / GTH-DZVP, and (c) LiF / GTH-DZVP using GPW (green), RSJK (red), GDF (grey), and RSGDF (blue) to handle the ERIs. For GDF andRSGDF, the DF initialization time is included. For RSGDF, results using a larger auxiliary basis are also included (RSGDF*, white triangles).Insets show the deviation of the RSGDF HF energies from RSJK for diamond (a) and deviations of the RSGDF and the RSJK HF energiesfrom GPW for MgO (b) and LiF (c), where the x-axis is N k and the y-axis is in E h . in RSGDF as analyzed above (Fig. S5). More importantly,the algorithmic optimizations developed for RSGDF in thiswork signiﬁcantly reduce its computational cost and lead tospeedups of one to two orders of magnitude over the previousGDF for all three systems studied here.We next compare RSGDF with the two other methods with-out DF error, i.e. GPW and RSJK. The GPW timing showsthe characteristic O ( N k ) scaling of computing exact exchangestarting from N k = and is 40 to 400 times slower than RS-GDF for the largest N k tested here. The very high cost ofGPW for MgO is caused by the compact primitive Gaussiansin the 2 s and 2 p orbitals of Mg, which require 83 PWs toreach the target precision of 10 − (cf. 51 for LiF). We empha-size that lowering the precision requirement for GPW (hencelowering N PW ) only moderately reduces the cost due to the O ( N PW ln N PW ) scaling of FFT. For example, using (cid:15) = − and 10 − requires 75 and 66 PWs for MgO and reduces thecost by only factors of 1 . .

1, respectively.The RSJK timings are similar to those of GPW; althoughthey show a slightly weaker dependence on N k , the precisescaling is unclear. This peculiar N k -dependence of RSJKarises from a signiﬁcant SR-LR cost unbalance (Fig. S5) andsuggests a breakdown of eq. (23) in ref. 60 for determining theoptimal ω for RSJK. Despite this, RSJK still achieves com-putational e ﬃ ciency similar to GDF and much higher thanGPW for moderately sized k -point meshes, which is remark-able given that RSJK does not use DF.Finally, we examine the accuracy of the HF energies com-puted by RSGDF, which is a combination of the DF error dueto the auxiliary basis set and the precision error when calculat-ing the matrices in Eqs. (5) and (6). The RSJK results are usedas the benchmark for diamond and the GPW results are usedas the benchmark for MgO and LiF. To probe the possibility ofachieving higher accuracy with DF, we also include results forRSGDF using a larger auxiliary basis (denoted by “RSGDF*”in Fig. 3); we use the cc-pVTZ-jkﬁt basis for diamond and anETB whose size is about 1.25 times larger (obtained by us-ing a smaller β ) for MgO and LiF. The per-atom errors of the converged HF energies are plotted in the insets of Fig. 3.With the default auxiliary basis, the error introduced byRSGDF is about 10 − E h for diamond and MgO and about10 − E h for LiF; these errors are typical for DF-based HFcalculations. Using the slightly larger auxiliary basis (RS-GDF*) reduces the error by a factor of three or more and, mostremarkably, requires little extra work as can be seen from thenearly identical timings of RSGDF and RSGDF* in Fig. 3.This is because the LR part, which makes up at least half ofthe total cost of RSGDF, is dominated by the AFTs of theorbital pair densities, Eq. (11), whose cost is independent of n aux . By contrast, the accuracy of RSJK is in general very high(10 − E h or less), but relatively large errors of about 10 − E h are also observed for certain k -point meshes (inset of Fig. 3c);the accuracy loss in the latter cases is likely due to an inac-curate integral screening, as tightening the precision to 10 − reduces the error to about 10 − E h for the calculation of LiFusing 4 k -points. These results demonstrate that RSGDF pro-vides an extremely cost-e ﬀ ective approach to calculating ac-curate HF energies in periodic systems. Conclusion . To summarize, we have presented an e ﬃ cientscheme that uses range separation for Gaussian density ﬁt-ting (RSGDF) for periodic systems. The computational scal-ing is analyzed to be sublinear with N k for small k -pointmeshes and linear for medium-sized ones. With all-electronand pseudopotential-based numerical results on a few three-dimensional solids, we veriﬁed the scaling of RSGDF andshowed that it achieves about 10-fold speedups over the pre-viously developed GDF with little precision loss. The errorintroduced by RSGDF is about 10 − E h with default auxiliarybasis sets and can be systematically reduced by increasing thesize of the auxiliary basis with little extra work.The primary purpose of the current integral-indirect imple-mentation of RSGDF is to speed up Hartree-Fock (and hybriddensity functional theory) calculations for a given Gaussianbasis and auxiliary basis. Motivated by the excellent perfor-mance seen in these preliminary calculations, we are currentlyworking on the automatic determination of the optimal N PW , ω , and α cut . Looking forward, the fast integral constructionenabled by RSGDF encourages the development of integral-direct algorithms tailored to speciﬁc tasks such as the evalua-tion of exact exchange, the ERI orbital transformation, and post-HF calculations. Such integral-direct methodswould signiﬁcantly reduce the high memory footprint cur-rently required by post-HF calculations on periodic systemswith Gaussian basis sets.

SUPPLEMENTARY MATERIAL

See the supplementary material for (i) RSGDF initializa-tion time for di ﬀ erent choices of N PW and ω for MgO andLiF; (ii) optimal choices of N PW for N k from 1 to 5 and dif-ferent systems; (iii) di ﬀ erence of the HF energies computedusing RSGDF and GDF; (iv) SR and LR component time ofRSGDF, GDF, and RSJK for varying size of k -point meshes;(v) N k -scaling of RSGDF and GDF for the timing data shownin Fig. 3; (vi) values of the parameters needed by RSGDF,GDF, RSJK, and GPW; (vii) comparison of the MP2 correla-tion energies computed using RSGDF and GDF for diamond;(viii) derivations of the conditions for prescreening the doublelattice summation in Eq. (12). ACKNOWLEDGEMENTS

HY thanks Dr. Qiming Sun and Dr. Xiao Wang for helpfuldiscussions. This work was supported by the National ScienceFoundation under Grant No. OAC-1931321. We acknowl-edge computing resources from Columbia University’s SharedResearch Computing Facility project, which is supported byNIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State EmpireState Development, Division of Science Technology and In-novation (NYSTAR) Contract C090171, both awarded April15, 2010. The Flatiron Institute is a division of the SimonsFoundation.

DATA AVAILABILITY STATEMENT

The data that support the ﬁndings of this study are availablefrom the corresponding author upon reasonable request. R. Dovesi, A. Erba, R. Orlando, C. M. Zicovich-Wilson, B. Civalleri,L. Maschio, M. Rérat, S. Casassa, J. Baima, S. Salustro, and B. Kirtman,Wiley Interdiscip. Rev. Comput. Mol. Sci , e1360 (2018). Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu,J. D. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, and G. K.-L.Chan, Wiley Interdiscip. Rev. Comput. Mol. Sci , e1340 (2018). T. D. Kühne, M. Iannuzzi, M. Del Ben, V. V. Rybkin, P. Seewald, F. Stein,T. Laino, R. Z. Khaliullin, O. Schütt, F. Schi ﬀ mann, D. Golze, J. Wilhelm,S. Chulkov, M. H. Bani-Hashemian, V. Weber, U. Borštnik, M. Taille-fumier, A. S. Jakobovits, A. Lazzaro, H. Pabst, T. Müller, R. Schade,M. Guidon, S. Andermatt, N. Holmberg, G. K. Schenter, A. Hehn,A. Bussy, F. Belleﬂamme, G. Tabacchi, A. Glö β , M. Lass, I. Bethune, C. J.Mundy, C. Plessl, M. Watkins, J. VandeVondele, M. Krack, and J. Hutter,J. Chem. Phys. , 194103 (2020). S. G. Balasubramani, G. P. Chen, S. Coriani, M. Diedenhofen, M. S.Frank, Y. J. Franzke, F. Furche, R. Grotjahn, M. E. Harding, C. Hättig,A. Hellweg, B. Helmich-Paris, C. Holzer, U. Huniar, M. Kaupp, A. Mare-fat Khah, S. Karbalaei Khani, T. Müller, F. Mack, B. D. Nguyen, S. M.Parker, E. Perlt, D. Rappoport, K. Reiter, S. Roy, M. Rückert, G. Schmitz,M. Sierka, E. Tapavicza, D. P. Tew, C. van Wüllen, V. K. Voora, F. Weigend,A. Wody´nski, and J. M. Yu, J. Chem. Phys. , 184107 (2020). J. Ihm, A. Zunger, and M. L. Cohen, J. Phys. C: Solid State Phys. , 4409(1979). R. Car and M. Parrinello, Phys. Rev. Lett. , 2471 (1985). J. L. Martins and M. L. Cohen, Phys. Rev. B , 6134 (1988). W. E. Pickett, Comput. Phys. Rep. , 115 (1989). G. Kresse and J. Furthmüller, Comput. Mater. Sci. , 15 (1996). A. F. Izmaylov and G. E. Scuseria, Phys. Chem. Chem. Phys. , 3421(2008). S. Hirata and T. Shimazaki, Phys. Rev. B , 085118 (2009). L. Maschio, D. Usvyat, M. Schütz, and B. Civalleri, J. Chem. Phys. ,134706 (2010). J. McClain, Q. Sun, G. K.-L. Chan, and T. C. Berkelbach, J. Chem. TheoryComput. , 1209 (2017). Q. Sun, T. C. Berkelbach, J. D. McClain, and G. K.-L. Chan, J. Chem.Phys. , 164119 (2017). X. Wang and T. C. Berkelbach, J. Chem. Theory Comput. , 3095 (2020). H. K. Buchholz and M. Stein, J. Comput. Chem. , 1335 (2018). J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, andJ. Hutter, Comput. Phys. Commun. , 103 (2005). J. L. Whitten, J. Chem. Phys. , 4496 (1973). B. I. Dunlap, J. W. D. Connolly, and J. R. Sabin, J. Chem. Phys. , 3396(1979). J. W. Mintmire and B. I. Dunlap, Phys. Rev. A , 88 (1982). E. Baerends, D. Ellis, and P. Ros, Chem. Phys. , 41 (1973). O. Vahtras, J. Almlöf, and M. Feyereisen, Chem. Phys. Lett. , 514(1993). Y. Jung, A. Sodt, P. M. W. Gill, and M. Head-Gordon, Proc. Natl. Acad.Sci. , 6692 (2005). S. Reine, E. Tellgren, A. Krapp, T. Kjærgaard, T. Helgaker, B. Jansik,S. Høst, and P. Salek, J. Chem. Phys. , 104101 (2008). J. G. Hill, Int. J. Quantum Chem. , 21 (2013). A. Sodt, J. E. Subotnik, and M. Head-Gordon, J. Chem. Phys. , 194109(2006). A. Sodt and M. Head-Gordon, J. Chem. Phys. , 104106 (2008). S. Manzer, P. R. Horn, N. Mardirossian, and M. Head-Gordon, J. Chem.Phys. , 024113 (2015). H.-J. Werner, F. R. Manby, and P. J. Knowles, J. Chem. Phys. , 8149(2003). H. Eshuis, J. Yarkony, and F. Furche, J. Chem. Phys. , 234114 (2010). H.-J. Werner and M. Schütz, J. Chem. Phys. , 144116 (2011). C. Riplinger and F. Neese, J. Chem. Phys. , 034106 (2013). W. Györ ﬀ y, T. Shiozaki, G. Knizia, and H.-J. Werner, J. Chem. Phys. ,104104 (2013). v. Varga, Phys. Rev. B , 073103 (2005). v. Varga, M. Milko, and J. Noga, J. Chem. Phys. , 034106 (2006). L. Maschio, D. Usvyat, F. R. Manby, S. Casassa, C. Pisani, and M. Schütz,Phys. Rev. B , 075101 (2007). D. Usvyat, L. Maschio, F. R. Manby, S. Casassa, M. Schütz, and C. Pisani,Phys. Rev. B , 075102 (2007). C. Pisani, L. Maschio, S. Casassa, M. Halo, M. Schütz, and D. Usvyat, J.Comput. Chem. , 2113 (2008). A. M. Burow, M. Sierka, and F. Mohamed, J. Chem. Phys. , 214101(2009). R. Lazarski, A. M. Burow, and M. Sierka, J. Chem. Theory Comput. ,3029 (2015). A. Luenser, H. F. Schurkus, and C. Ochsenfeld, J. Chem. Theory Comput. , 1647 (2017). M. M. J. Grundei and A. M. Burow, J. Chem. Theory Comput. , 1159(2017). X. Wang, C. A. Lewis, and E. F. Valeev, J. Chem. Phys. , 124116(2020). D. Usvyat, L. Maschio, and M. Schütz, “Periodic local møller-plesset per-turbation theory of second order for solids,” in

Handbook of Solid State

Chemistry (American Cancer Society, 2017) Chap. 3, pp. 59–86. C. H. Patterson, J. Chem. Phys. , 064107 (2020). R. Lazarski, A. M. Burow, L. Grajciar, and M. Sierka, J. Comput. Chem. , 2518 (2016). M. Becker and M. Sierka, J. Comput. Chem. , 2563 (2019). v. Varga, Int. J. Quantum Chem. , 1518 (2008). v. Varga, J. Math. Chem. , 1 (2011). P. Merlot, T. Kjærgaard, T. Helgaker, R. Lindh, F. Aquilante, S. Reine, andT. B. Pedersen, J. Comput. Chem. , 1486 (2013). G. Schmitz and O. Christiansen, Chem. Phys. Lett. , 7 (2018). L. N. Wirz, S. S. Reine, and T. B. Pedersen, J. Chem. Theory Comput. ,4897 (2017). T. Shimazaki, T. Kosugi, and T. Nakajima, J. Phys. Soc. Japan , 054702(2014). A. Grüneis, M. Marsman, and G. Kresse, J. Chem. Phys. , 074107(2010). A. Grüneis, J. Chem. Phys. , 102817 (2015). F. Hummel, T. Gruber, and A. Grüneis, Eur. Phys. J. B , 235 (2016). W. Mayr-Schmölzer, J. Planer, J. Redinger, A. Grüneis, and F. Mittendor-fer, Phys. Rev. Research , 043361 (2020). C. Pisani and R. Dovesi, Int. J. Quantum Chem. , 501 (1980). R. Dovesi, R. Orlando, C. Roetti, C. Pisani, and V. Saunders, phys. stat.sol. (b) , 63 (2000). Q. Sun, “Exact exchange matrix of periodic hartree-fock theory for all-electron simulations,” (2020), arXiv:2012.07929. P. M. Gill and R. D. Adamson, Chem. Phys. Lett. , 105 (1996). L. Füsti-Molnar and P. Pulay, J. Chem. Phys. , 7795 (2002). L. Füsti-Molnar and P. Pulay, J. Chem. Phys. , 7827 (2002). H. J. Monkhorst and J. D. Pack, Phys. Rev. B , 5188 (1976). T. H. Dunning, J. Chem. Phys. , 1007 (1989). S. Goedecker, M. Teter, and J. Hutter, Phys. Rev. B , 1703 (1996). C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B , 3641 (1998). F. Weigend, Phys. Chem. Chem. Phys. , 4285 (2002). C. Møller and M. S. Plesset, Phys. Rev. , 618 (1934). upplementary Material: Fast Periodic GaussianDensity Fitting By Range Separation Hong-Zhou Ye † and Timothy C. Berkelbach ∗ , † † Department of Chemistry, Columbia University, New York, New York 10027, USA ‡ Center for Computational Quantum Physics, Flatiron Institute, New York, New York 10010, USA

E-mail: [email protected]

Contents

S1 Supplementary ﬁgures 2S2 Supplementary tables 5S3 Conditions for prescreening the double lattice summation 7

S3.1 General considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7S3.2 A bound for R c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9S3.3 A bound for R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10S3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Note: ﬁgures and equations appearing in the main text will be referred as “Fig. Mxxx”and “Eq. Mxxx” in this Supplementary Material document.1 a r X i v : . [ phy s i c s . c h e m - ph ] F e b Fig S1: Choice of Npw & scaling (MgO/GTH-DZVP) a b c fit: ! !".) fit: " % " &' (.( Figure S1: Same plot as Fig. M1 for MgO / GTH-DZVP. Fig S2: Choice of Npw & scaling (LiF/GTH-DZVP) a b c fit: ! !*.+ fit: " % " &' (.( Figure S2: Same plot as Fig. M1 for LiF / GTH-DZVP.2 N k O p t i m a l N P W CMgOLiF

Figure S3: Optimal choice of N PW (obtained from Fig. M1 and Figs. S1 and S2) plotted as afunction of N k . The estimated exponents are − .

44 (C), − .

38 (MgO), and − .

43 (LiF). Fig S3: Error against old GDF a b c Figure S4: Absolute deviation between the converged HF energies computed using RSGDF andGDF for (a) diamond / cc-pVDZ, (b) MgO / GTH-DZVP, and (c) LiF / GTH-DZVP using 1 to 5 k -points. N k C P U t i m e [ s e c ] RSGDFCMgOLiF N k C P U t i m e [ s e c ] GDF N k C P U t i m e [ s e c ] RSJK

Figure S5: Component timing data (SR: darker; LR: lighter) plotted as a function of N k for RSGDF(left), GDF (middle), and RSJK (right). 3 Fig S7 C MgO LiF0.810.82 0.79 0.810.79 0.55

Figure S6: Same data for GDF and RSGDF as plotted in Fig. M3 with power-law ﬁtting ( t = aN γ k )where the exponent γ is displayed in the ﬁgure. For RSGDF, the last four points are used, whilefor GDF, the last three points are used. 4 Table S1: Parameters for each method used in this work. RSGDF / RSJK: ω and N PW . GDF: chargebasis exponent α chg and N PW . GPW: N PW . RSGDF N k N k Diamond MgO LiF ω N PW ω N PW ω N PW . . .

14 0 . . .

63 0 . . .

172 0 . . .

365 0 . . . RSJK N k Diamond MgO LiF ω N PW ω N PW ω N PW . . . . . . . . . . . . . . . . . . . . . . . . . . GDF N k Diamond MgO LiF α chg N PW α chg N PW α chg N PW All 0 . . . GPW N k C MgO LiF N PW N PW N PW All 31 / cc-pVDZ computed with the HF solution and ERIscalculated using RSGDF and GDF. For N k ≤ , all bands in the cc-pVDZ basis are used, whilefor N k = , only the valence occupied bands and the ﬁrst 10 conduction bands are correlated to ﬁtthe memory limit. N k E (2)RSGDF / E h E (2)GDF / E h E (2)RSGDF − E (2)GDF / n E h − . − . − . − . − . . − . − . . − . − . . % " ! " " ! !" ! ! ! ! &",$ Figure S7: Schematic illustration of the various vectors deﬁned in the text. Note that without lossof generality, the auxiliary orbital χ P is assumed to be located at the origin.Here, we derive conditions for prescreening the double lattice summation in Eq. (M12), i.e.,the short-range (SR) three-center Coulomb integrals. The basic idea is to analyze how each termdecays in real space and then derive truncation conditions to achieve a desired precision (cid:15) in thecomputed integrals. S3.1 General considerations

Eq. (M12) can be rewritten as (ignoring the phase factors) V P µν = X m , n v P µν ( R m , R n ; ω ) (S1)where v P µν ( R m , R n ; ω ) = Z Ω d r Z d r χ P ( r ) w SR ( r ; ω ) φ R m µ ( r ) φ R n ν ( r ) , (S2) R m = τ + m and R n = τ + n , with τ and τ the position of the two AOs in the reference unitcell (Fig. S7). For the simple case of all 1 s -type primitive Gaussians (with exponents α P , α µ , and α ν and coe ﬃ cients C P , C µ , and C ν ), the integral in Eq. (S2) can be easily worked out. The result is v P µν ( R m , R n ; ω ) = Ah ( R m , n ) f ( R m , n c ; ω ) (S3)7here A = (cid:18) π (cid:19) C P C µ C ν α / P ( α µ + α ν ) / , (S4) h ( R m , n ) = e − β ( R m , n ) (S5)with β = α µ α ν α µ + α ν and R m , n = R m − R n , and f ( R m , n c ; ω ) = R m , n c [erf( η R m , n c ) − erf( η R m , n c )] (S6)with η = [ α − P + ( α µ + α ν ) − ] − / , (S7) η = ( η − + ω − ) − / (S8)and R m , n c = α µ R m + α ν R n α µ + α ν . (S9)It can be shown that both h and f are positive-valued, monotonically decreasing functions.Deﬁne two new lattice vectors, n = n − m (S10) m = R m , m + n c − τ c , n (S11)where τ c , n lies in the reference unit cell. Equation (S1) can be rewritten as V P µν = A X n h ( R , n ) X m f ( R m , m + n c ; ω ) = A X n h ( R , n ) X m f ( | m + τ c , n | ; ω ) (S12)Equation (S12) changes the double lattice summation in Eq. (S1) from over the bare lattice vectors( m and n ) to over the center ( m ) and the di ﬀ erence ( n ) vectors. From now on, we will drop theprime in the vectors m and n . 8 R c We ﬁrst estimate a bound for truncating m , or equivalently R c . Let S h = P n h ( R , n ). The errormade by truncating the m -summation is δ ≈ AS h X m > m cut f ( m ; ω ) ≈ AS h π Ω Z ∞ R cut c d R R [erf( η R ) − erf( η R )] ≈ AS h π Ω √ π Z ∞ R cut c d R ( η − e − η R − η − e − η R ) ≤ AS h π Ω √ π R cut c Z ∞ R cut c d R R ( η − e − η R − η − e − η R ) = AS h π Ω √ π R cut c

12 ( η − e − ( η R cut c ) − η − e − ( η R cut c ) ) ≈ AS h π Ω √ π η R cut c e − ( η R cut c ) (S13)where we have assumed that the ﬁnal cuto ﬀ R cut c is not too small so that1. in line 1, f ( | m + τ c , n | ; ω ) ≈ f ( m ),2. in line 2, the summation can be turned into an integral with (4 π/ Ω ) R being the density ofstates,3. in line 3, the leading-order asymptotic expansion of the integrand is valid (error ∼ O ( R − )),4. in line 6, the η -related term can be dropped.These assumptions could break down if both the AOs and the auxiliary orbitals are very compactso that R cut c is small. However, for virtually all AO bases, even the most compact AOs have non-compact primitive Gaussians that control the long-range behavior and hence the magnitude of R cut c .For this reason, the assumptions made above virtually hold for all AO bases. A bound for R c ishence obtained by requiring AS h π Ω √ π η R cut c e − ( η R cut c ) < (cid:15). (S14)9ote that for estimating this bound, S h does not need to be evaluated very accurately. In practice,we fond that performing a small lattice summation S h ≈ X n < n ( (cid:15) ) h ( R , n ) (S15)su ﬃ ces, where n ( (cid:15) ) is determined such that h ( R , n ( (cid:15) )12 ) < (cid:15) . We found that (cid:15) = − already leadsto convergent results. S3.3 A bound for R We then determine a bound for truncating n , or equivalently R . First, we note that since the ﬁnalintegral depends on the product of h and f , a simple proposal like h ( R ) < (cid:15) (S16)is not appropriate: it is too tight when f ( R c ; ω ) (cid:28) R c ) and too loose when f ( R c ; ω ) (cid:29) R c ).To that end, we determine here a bound for R that is R c -dependent. We ﬁrst decouple the twosummations V P µν ≤ A X m f max m ( ω ) X n h ( R , n ) (S17)where f max m ( ω ) = max τ ∈ Ω f ( | m + τ | ; ω ) (S18)( τ ∈ Ω denotes all vectors in the reference unit cell). Now consider the error introduced bytruncating the n -summation for m in a ﬁnite shell R ∈ R + ∆ R δ R ∼ R +∆ R ≤ AN cell ( R , R + ∆ R ) f ( R ; ω ) X n > n cut h ( R , n ) (S19)where N cell ( R , R + ∆ R ) is the number of cells in the range R ∼ R + ∆ R , and the " ≤ " comes from the10act that f is a monotonically decreasing function. We found that ∆ R = N cell ( R , R + ∆ R ) =  π Ω × R + ∆ R ) ∆ R , R > R N NFcell , R ≤ R (S20)works well, where N NFcell is the number of cells within a sphere with radius 3 Ω / , and R is deter-mined as the crossing point of the two pieces of Eq. (S20)4 π Ω × R + ∆ R ) ∆ R = N NFcell . (S21)The motivation behind Eq. (S20) is that the continuous approximation of density of states workswell for large R but not for small R . The summation in Eq. (S19) can be approximated by anintegral X n > n cut h ( R , n ) ≈ π Ω Z ∞ R cut12 d R R e − β R ≈ π Ω R cut12 β e − β ( R cut12 ) . (S22)Combining Eqs. (S19), (S20) and (S22), a bound of R for R < R c < R + ∆ R is found by AN cell ( R , R + ∆ R ) f ( R ; ω ) 4 π Ω R cut12 β e − β ( R cut12 ) < (cid:15). (S23)However, when R cut12 found by Eq. (S23) is too small, the integral in Eq. (S22) is not a good approx-imation to the lattice summation in Eq. (S19). Thus, we use R cut12 = max { Eq. (S23) , R cut , min12 } (S24)where R cut , min12 is determined by N cell ( R cut , min12 , R cut , min12 + ∆ R ) R cut , min12 β e − β ( R cut , min12 ) = X R , n > R cut , min12 h ( R , n ) , (S25)i.e., the minimal R where the integral in Eq. (S22) remains a good approximation to the summa-11ion. Similar to S h in Eq. (S14), the summation on the right-hand side of Eq. (S25) need not beevaluated very accurately and can be safely truncated using the n ( (cid:15) ) determined for Eq. (S15). S3.4 Summary

To summarize, if both the AOs and the auxiliary orbitals are 1 s -type primitive Gaussians,1. we ﬁrst use Eq. (S14) to derive a bound for R c , denoted by R cut c ,2. we then make a mesh for R c with increment ∆ R , i.e.,0 , ∆ R , ∆ R , · · · , R cut c , (S26)and use Eq. (S24) to determine a series of bounds for R , denoted by { R cut12 ( R c ) } , each for awindow of R c .With these two bounds, a term v P µν ( R m , R n ; ω ) is not discarded only if R m , n c ≤ R cut c R m , n ≤ R cut12 ( R m , n c ) (S27)The bounds discussed above are determined for every AO shell pair ( µν ) and auxiliary shell( P ). For primitive Gaussians with higher angular momentum, we found that treating them as 1 s works very well (note that the normalization constant needs to be recalculated using the 1 s -orbitalnormalization condition). For a contracted Gaussian orbital, the long-range behavior is determinedsolely by the most di ﬀﬀ