Large-scale relativistic complete active space self-consistent field with robust convergence
LLarge-scale relativistic complete active space self-consistent field with robust convergence
Ryan D. Reynolds, Takeshi Yanai,
2, 3 and Toru Shiozaki Department of Chemistry, Northwestern University, 2145 Sheridan Rd., Evanston, Illinois 60208, USA. Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, Chikusa, Nagoya 464-8602, Japan JST PRESTO, 4-1-8 Honcho, Kawaguchi-shi, Saitama 332-0012, Japan (Dated: June 21, 2018)We report an e ffi cient algorithm using density fitting for the relativistic complete active space self-consistentfield (CASSCF) method, which is significantly more stable than the algorithm previously reported by one ofthe authors [J. E. Bates and T. Shiozaki, J. Chem. Phys. , 044112 (2015)]. Our algorithm is based on thesecond-order orbital update scheme with an iterative augmented Hessian procedure, in which the density-fittedorbital Hessian is directly contracted to the trial vectors. Using this scheme, each microiteration is made lesstime consuming than one Dirac–Hartree–Fock iteration, and macroiterations converge quadratically. In addition,we show that the CASSCF calculations with the Gaunt and full Breit interactions can be e ffi ciently performedby means of approximate orbital Hessians computed with the Dirac–Coulomb Hamiltonian. It is demonstratedthat our algorithm can also be applied to systems under an external magnetic field, for which all of the molecularintegrals are computed using gauge-including atomic orbitals. I. INTRODUCTION
It has long been recognized that special relativity plays animportant role in chemistry, especially when heavy elementsare present.
The many-body relativistic quantum mechan-ics can be accurately captured using the Dirac–Coulomb–Breit Hamiltonianˆ H Dirac = (cid:88) i (cid:104) c ( β − I ) + c ( α · ˆ p i ) + ˆ V i (cid:105) + (cid:88) i < j ˆ V i j , (1)ˆ V i j = r i j − α i · α j r i j − (cid:104) ( α i · ∇ i )( α j · ∇ j ) r i j (cid:105) , (2)where α and β are Dirac’s matrices, ˆ p is the momentum oper-ator, and c is the speed of light. The second and third termsin Eq. 2 are called the Gaunt and gauge terms, respectively;by neglecting either the gauge term or both we can obtainthe Dirac–Coulomb–Gaunt and Dirac–Coulomb Hamiltoni-ans. In practice, since we are interested in the electronic states,the Hamiltonian will be projected to the electronic manifold:ˆ H ele ≈ ˆ P ˆ H Dirac ˆ P . (3)There have been many studies to develop such a projec-tor. For example, the exact two-component (X2C) Hamil-tonian can be obtained by a transformation that block-diagonalizes the one-electron Dirac Hamiltonian; the trans-formation is usually applied in an approximate way to thetwo-electron operator. A number of other two compo-nent approaches have also been developed.
The Dirac–Hartree–Fock method and subsequent no-pair projection im-plies a projector that block-diagonalizes the mean-field Fockoperator.
Likewise, the no-pair projection based on the complete ac-tive space self-consistent field (CASSCF) method decou-ples the Dirac equation at the CASSCF level, which we advo-cate in this work. The advantage of using the CASSCF-basedprojector is that it allows for unambiguous treatment of thetwo-electron operator, including the Gaunt and Breit terms ifnecessary. When one performs multireference electron corre-lation methods for heavy-element complexes, it is natural to use this projector because the cost of Dirac CASSCF is usu-ally marginal compared to the multireference electron corre-lation treatment (such as second-order perturbation and con-figuration interaction ) in terms of both operation counts andmemory requirements. We note in passing, however, that ourformulation and programs reported herein are equally appli-cable to any relativistic Hamiltonians.Prior to this work, a second-order four-component rel-ativistic CASSCF algorithm without explicit constructionof the entire Hessian matrix has been studied by Jensenand co-workers. An e ffi cient algorithm for quasi-second-order relativistic CASSCF based on density fitting hasbeen reported by one of the authors. RelativisticCASSCF algorithms have also been developed with a two-component projection or only scalar relativistic inter-actions included.
The density fitting approximation andsimilar techniques have also been used in non-relativisticCASSCF algorithms.
This work combines these develop-ments and reports an e ffi cient algorithm for second-order four-component relativistic CASSCF using density fitting. Our al-gorithm is optimized such that neither Hessian elements norfour-center integrals (except for those with four active in-dices) are constructed. The resulting code is also applicable tomolecules under a magnetic field, for which we use the gauge-including atomic orbitals to remove the gauge origin de-pendence of the results. All of the programs have been imple-mented in the BAGEL package, which is openly availableunder the GNU General Public License. II. THEORY
We hereafter use the following orbital index notations: i and j label closed orbitals; r , s , t , and u label active orbitals; a and b label virtual orbitals; and x and y are general molecularorbitals (MOs). µ and ν are scalar atomic orbitals.The CASSCF wavefunction is parametrized using the a r X i v : . [ phy s i c s . c h e m - ph ] J un multi-configurational ansatz | Ψ ( C ) (cid:105) = (cid:88) I c I | I ( C ) (cid:105) , (4)where each | I ( C ) (cid:105) is a Slater determinant. The determina-tion of the c I -coe ffi cients has been described elsewhere; thiswork is concerned with the optimization of the MO coe ffi cientmatrix C . A. Augmented Hessian for complex minimax problems
In this section, the augmented Hessian approach (with scal-ing) for orbital updates is reviewed. The details on the aug-mented Hessian algorithm for real variables have been de-scribed in depth, for instance, in Ref. 40. They have alsobeen studied in a slightly di ff erent form in Ref. 23. There areextensions of this approach for non-relativistic CASSCF, which can similarly be translated to relativistic counterparts.The MO coe ffi cients are parameterized using an anti-Hermitian unitary generator X , C = C ref exp ( X ) (5) X xy = (cid:40) κ xy x > y − κ ∗ yx x < y (6)The collection of κ xy and κ ∗ yx appearing in the expression for X is denoted in the following as column vectors κ and κ ∗ . TheHessian matrix for the CASSCF energy with respect to therotation parameters κ and κ ∗ is H = ∂ E ∂ κ ∗ ∂ κ ∂ E ∂ κ ∗ ∂ κ ∗ ∂ E ∂ κ ∂ κ ∂ E ∂ κ ∂ κ ∗ , (7)where the Hessian elements are evaluated at κ = κ ∗ =
0. TheHessian matrix is complex and Hermitian. Note that we intro-duced a matrix notation for the subblocks of the Hessian, (cid:32) ∂ E ∂ κ ∗ ∂ κ (cid:33) xy , x (cid:48) y (cid:48) = ∂ E ∂κ ∗ xy ∂κ x (cid:48) y (cid:48) . (8)For a multi-state orbital optimization, we substitute the state-averaged CASSCF energy for E throughout this derivation.Using a pair of trial vectors for a set of κ and κ ∗ , namely s I and s J with s I = ( t TI t † I ) T , its subspace representation can beshown to be H IJ ≡ s † I Hs J = t † I σ J ] , (9)where we defined the σ vector as σ I = ∂ E ∂ κ ∗ ∂ κ t I + ∂ E ∂ κ ∗ ∂ κ ∗ t ∗ I . (10)The right-hand side of Eq. (10) should be understood asmatrix–vector multiplications. It is important to recognizethat the subspace Hessian matrix is real and symmetric. Given this expression, the subspace representation of theaugmented Hessian matrix can be easily obtained. The aug-mented Hessian is defined as H (cid:48) = (cid:32) ∂ E ∂ κ (cid:33) T (cid:32) ∂ E ∂ κ ∗ (cid:33) T ∂ E ∂ κ ∗ λ ∂ E ∂ κ ∗ ∂ κ λ ∂ E ∂ κ ∗ ∂ κ ∗ ∂ E ∂ κ λ ∂ E ∂ κ ∂ κ λ ∂ E ∂ κ ∂ κ ∗ (11)where λ is an appropriately chosen scaling factor to control thestep size (see below). We iteratively diagonalize this matrix todetermine the correction vector ∆ κ : H (cid:48) λ ∆ κ λ ∆ κ ∗ = ε λ ∆ κ λ ∆ κ ∗ (12)The first trial vector is s (cid:48) = (1 0 0) T . The other trial vectorsare written as s (cid:48) I = (0 t TI t † I ) T . Using the above definition of σ I , one can show that the subspace representation of H (cid:48) (i.e., H (cid:48) IJ = s (cid:48)† I H (cid:48) s (cid:48) J ) can be written as H (cid:48) IJ = I = J = (cid:34) t † I ∂ E ∂ κ ∗ (cid:35) I (cid:44) J = (cid:34) t † J ∂ E ∂ κ ∗ (cid:35) I = J (cid:44) λ Re (cid:104) t † I σ J (cid:105) otherwise (13)which is a real symmetric matrix. The lowest eigenvalue ofthe subspace matrix, ε , is related to the denominator shiftin the determination of the correction vector. The associatedeigenvector elements { c I } define the optimal linear combina-tion of the trial vectors, ∆ κ ≈ λ (cid:80) I ≥ ( c I / c ) t I , from which theresidual vector is computed. A new trial vector is then gener-ated from the residual vector. This procedure is repeated untilconvergence is achieved.The value of λ in Eq. (11) is chosen such that the step size s = | ∆ κ ( λ ) | is maximized within the acceptable maximum stepsize s max . In each microiteration, we first diagonalize the aug-mented Hessian matrix [Eq. (13)] and calculate the step size s . If the step size is larger than s max , we iteratively find theoptimal value for λ . When s is larger than s max , we set λ to be s / s max ; otherwise λ is linearly interpolated using twoprevious values. When s ≤ s max and | s − s max | < . s max are achieved, we consider the value of λ to be optimal. Aspointed out in Ref. 31, as long as the product λε is larger thanthe negative eigenvalues of the Hessian associated with therotations between electronic and positronic orbitals (which ispractically always the case), the shifted Hessian, H − λε , hasthe right eigenvalue structure. Therefore, this procedure canbe applied to minimax optimization in relativistic CASSCFwithout problems.The eigenvalue problem for the augmented Hessian doesnot have to be solved very accurately, especially when the or-bital gradients are large; we typically converge the microit-eration till the root mean square (RMS) of the residual vec-tor becomes either four orders of magnitude smaller than thestep size or half of the convergence threshold specified for themacroiterations, whichever is greater. B. Three-index integral transformation using density fitting
Here, we recapitulate the density fitting algorithms for four-component relativistic methods that have been reported inRef. 20. Density fitting approximates the scalar four-indextwo-electron integrals as( µ w ν w | µ (cid:48) w (cid:48)(cid:48) ν (cid:48) w (cid:48)(cid:48)(cid:48) ) ≈ (cid:88) γδ ( µ w ν w (cid:48) | γ )( J − ) γδ ( δ | µ (cid:48) w (cid:48)(cid:48) ν (cid:48) w (cid:48)(cid:48)(cid:48) ) . (14)in which γ and δ are the auxiliary functions, and w ( w = l , x , y , and z ) denotes the basis component, φ r , w ( r ) = φ r ( r ) w = l ,∂φ r ( r ) ∂ w w = x , y , z . (15)The scalar integrals ( γ | µ w ν w (cid:48) ) are real. For the Coulomb inter-action, the four-component forms of the three-index integralscan be calculated from these scalar integrals as( γ | µ X ν Y ) = (cid:88) ww (cid:48) k ww (cid:48) XY ( γ | µ w ν w (cid:48) ) , (16)in which X and Y label L + , L − , S + , and S − . The prefactor k ww (cid:48) XY is due to the use of the so-called restricted kinetic balance andis determined by k ww (cid:48) XY = ξ † X η w † η w (cid:48) ξ Y . (17)where ξ X is a unit vector, ξ X = ( δ X , L + , δ X , L − , δ X , S + , δ X , S − ) T ,and η w is a 4 × η l = (cid:32) I (cid:33) , η x = − i c (cid:32) σ x (cid:33) ,η y = − i c (cid:32) σ y (cid:33) , η z = − i c (cid:32) σ z (cid:33) . (18)The details on these expressions as well as their extensions in-clude the Gaunt or full Breit operator can be found in Ref. 20.In practice, the half index transformation of the three-electron integrals is e ffi ciently performed as( γ | i ν Y ) = (cid:88) X (cid:88) ww (cid:48) (cid:88) µ k ww (cid:48) XY ( γ | µ w ν w (cid:48) ) C X ∗ µ i . (19)where C X ν i is the X block of the molecular coe ffi cient matrix( X = L + , L − , S + , or S − ). Note that the scalar three-indexintegrals ( γ | µ w ν w (cid:48) ) are computed at the beginning of the cal-culation and stored in memory; therefore, this step consists merely of matrix–matrix multiplications and appropriate post-processing. The second index transformation is simply( γ | ia ) = (cid:88) Y (cid:88) ν ( γ | i ν Y ) C Y ν a . (20)These transformed integrals are extensively used in the rela-tivistic CASSCF algorithm. In the following, we also use thenotation ( γ J | i ν Y ) = (cid:88) δ ( J − ) γδ ( δ | i ν Y ) (21)for three index integrals that are multiplied by the metric in-verse. All of the programs for index transformation are paral-lelized in our code. C. Working equations using density fitting
In this section, we explicitly note all of the working equa-tions for contracting the Hessian matrix and a trial vector[Eq. (10)] using the density fitting approximation. Each sub-block of the Hessian matrix can be derived by a double com-mutator, ∂ E ∂ κ ∗ ∂ κ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) κ = κ ∗ = = ∂ ∂ κ ∗ ∂ κ (cid:104) | [[ ˆ H , ˆ X ] , ˆ X ] | (cid:105) (22a) ∂ E ∂ κ ∗ ∂ κ ∗ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) κ = κ ∗ = = ∂ ∂ κ ∗ ∂ κ ∗ (cid:104) | [[ ˆ H , ˆ X ] , ˆ X ] | (cid:105) (22b)where ˆ H is the chosen relativistic Hamiltonian and ˆ X is theorbital rotation generator,ˆ X = (cid:88) x > y (cid:104) κ xy ˆ E xy − κ ∗ xy ˆ E yx (cid:105) . (23)To determine the sigma vector, we then contract the trial vec-tor and its conjugate, t xy and t ∗ xy , with the appropriate blocksof the Hessian matrix as indicated in Eq. 10.The optimized working equation is written in the matrixform so that it maps naturally to the level-3 BLAS functions.It reads2 σ VA = f VV t VA d AA − t VA ( d AA f AA + f AA d AA ) − t VC F CA + F VC t CA − t VC f CA d AA − f VC t CA d AA − t VA ( Q AA + Q AA † ) − t VC Q CA + R VA + L VA d AA (24a)2 σ CA = t CA F AA + t VC † F VA − t CA ( d AA f AA + f AA d AA ) − F CC t CA + F CV t VA + f CC t CA d AA − f VC † t VA d AA − t VC † f VA d AA − t VC † Q VA − t CA ( Q AA + Q AA † ) − R CA + L CA − L CA d AA + M CA (24b)2 σ VC = − t VC F CC − t VA F AC + F VV t VC + F VA t CA † − f VA d AA t CA † − t VA d AA f AC − t VA Q CA † − Q VA t CA † + L VC + M VC (24c)in which the submatrices of the trial and σ vectors are definedas ( t VC ) ai = t ai , ( t CA ) ir = t ∗ ri , ( t VA ) ar = t ar , (25a)( σ VC ) ai = σ ai , ( σ CA ) ir = σ ∗ ri , ( σ VA ) ar = σ ar . (25b)Note the closed–active blocks are defined with their complexconjugates. The superscripts V , A , and C denote virtual, ac-tive, and closed orbitals. The matrix notation of the densitymatrix within the active space is( d AA ) rs = γ ∗ rs . (26)where use of the complex conjugate simplifies the workingequations. The state-averaged density matrix is substitutedfor multi-state calculations.We also introduced the closed and generalized Fock opera-tor ( f and F , respectively) in the MO basis, f = h + g ( γ closed ) , (27a) F = f + g ( γ act ) , (27b)in which h is the one-electron Hamiltonian and the g ( γ ) matrixis the Coulomb and exchange part of the Fock operator associ-ated with the density matrix γ , which can be calculated usinga standard Fock builder for the Dirac–Hartree–Fock method.The submatrices of the Fock operators are also defined simi-larly as ( f CC ) i j = f i j , ( F CC ) i j = F i j , and so on.The Q vectors in Eq. (24) are defined as( Q CA ) ir = (cid:88) stu ( is | tu ) Γ rs , tu (28a)( Q AA ) r (cid:48) r = (cid:88) stu ( r (cid:48) s | tu ) Γ rs , tu (28b)( Q VA ) ar = (cid:88) stu ( as | tu ) Γ rs , tu (28c)in which Γ is the two-particle density matrix within the activespace. We further define the R vector, whose matrix form is( R VA ) ar = (cid:88) stu [( as | t ¯ u ) + ( as | ¯ tu ) + ( a ¯ s | tu )] Γ rs , tu (29a)( R CA ) ir = (cid:88) stu [( is | t ¯ u ) + ( is | ¯ tu ) + ( i ¯ s | tu )] Γ rs , tu , (29b)using the t -weighted coe ffi cient matrix that defines ¯ r , C µ ¯ r = (cid:88) a C µ a t ar − (cid:88) i C µ i t ∗ ri . (30)All of these quantities can be e ffi ciently computed fromdensity-fitted molecular integrals. The L and M terms in Eq. (24) are defined using two setsof t -transformed coe ffi cients, ¯ r and ˜ i , i.e., C µ ¯ r above and C µ ˜ i = (cid:88) a C µ a t ai + (cid:88) r C µ r t ri . (31) Using these t -transformed coe ffi cients, we evaluate the fol-lowing half-transformed integrals,( γ | ˜ i ν Y ) = (cid:88) X (cid:88) ww (cid:48) (cid:88) µ k ww (cid:48) XY ( γ | µ w ν w (cid:48) ) C X ∗ µ ˜ i , (32a)( γ | ¯ r ν Y ) = (cid:88) X (cid:88) ww (cid:48) (cid:88) µ k ww (cid:48) XY ( γ | µ w ν w (cid:48) ) C X ∗ µ ¯ r , (32b)( γ | ˇ r ν Y ) = (cid:88) r ( γ | ¯ s ν Y ) γ sr . (32c)The L and M matrices are computed as Fock-like contribu-tions, L XY µν = (cid:88) γ ( γ | µ X ν Y ) × (cid:88) δ ( J − ) γδ (cid:88) i µ Y (cid:48) ( δ | ˜ i µ Y (cid:48) ) C Y (cid:48) µ i + (cid:88) i µ Y (cid:48) ( γ J | i µ Y (cid:48) ) C Y (cid:48) µ ˜ i − (cid:88) γ i (cid:104) ( γ J | µ X i )( γ | ˜ i ν Y ) + ( γ | µ X ˜ i )( γ J | i ν Y ) (cid:105) (33a) M XY µν = (cid:88) γ ( γ | µ X ν Y ) × (cid:88) δ ( J − ) γδ (cid:88) r µ Y (cid:48) ( δ | ˇ r µ Y (cid:48) ) C Y (cid:48) µ r + (cid:88) r µ Y (cid:48) ( γ J | r µ Y (cid:48) ) C Y (cid:48) µ ˇ r − (cid:88) γ r (cid:2) ( γ J | µ X r )( γ | ˇ r ν Y ) + ( γ | µ X ˇ r )( γ J | r ν Y ) (cid:3) (33b)These matrices are subsequently transformed to the MO basis,which yield L CA , L VA , L VC , M CA , and M VC .It is important to note that, in practice, the half-transformedintegrals ( γ J | i ν Y ) and ( γ J | a ν Y ) can be computed outside themicroiteration loop, because they do not depend on the trialvector. The most expensive steps in each microiteration arethen the half transformation step with the t -weighted closedorbitals [Eq. (32a)] and the Fock build using these integrals[Eq. (33a)]. The cost of the other transformations that involveactive indices is almost negligible for large molecules. Sincethe multiplication of the metric inverse to half-transformed in-tegrals can be avoided, the cost of each microiteration is typ-ically cheaper than one Dirac–Hartree–Fock iteration in ourprogram. D. Approximate Hessian when the Gaunt and full Breitinteractions are used
In second-order CASSCF algorithms, the microiterationdoes not have to be solved very accurately (i.e., the Hessianelements can be slightly approximated), as long as the con-vergence behavior of the macroiteration is not a ff ected. Thisis because the final CASSCF energies, which are defined asthe points where the orbital gradients vanish, remain the sameeven when the Hessian is approximated. For instance, we maytake advantage of this by using loose thresholds for microiter-ations as described above (Sec. II A). This has also been real-ized in the context of non-relativistic density fitted CASSCFby Ten-no, who suggested using smaller fitting basis sets forthe Hessian elements used in the microiteration.We introduce in this work an e ffi cient scheme that is usefulfor relativistic CASSCF calculations with the Gaunt or fullBreit interaction [i.e., calculations including the second andthird terms of Eq. (2)]. While many relativistic simulationshave been performed using the standard Coulomb operator,the Gaunt and Breit terms have to be included in some of thehigh-accuracy calculations since they introduce new types ofphysical interactions (for instance, the direct spin–spin andspin–other-orbit interactions). Though it has been shown byone of the authors that the Gaunt and Breit terms can bereadily included in large-scale relativistic computations, they do increase the computational (and memory) cost sig-nificantly, because one has to transform the three-index inte-grals with large and small components. Therefore, it is pro-posed that we calculate the Hessian elements using the Dirac–Coulomb Hamiltonian, while the orbital gradients are calcu-lated with the Gaunt or Breit terms included; in other words,we replace Eq. (11) with H (cid:48) = (cid:32) ∂ E ∂ κ (cid:33) T (cid:32) ∂ E ∂ κ ∗ (cid:33) T ∂ E ∂ κ ∗ λ ∂ E DC ∂ κ ∗ ∂ κ λ ∂ E DC ∂ κ ∗ ∂ κ ∗ ∂ E ∂ κ λ ∂ E DC ∂ κ ∂ κ λ ∂ E DC ∂ κ ∂ κ ∗ (34)in which E DC is the CASSCF energy expression with theDirac–Coulomb interaction. This procedure allows us to re-duce the cost of microiterations considerably. It will be shownlater that this procedure does not deteriorate the quadratic con-vergence behavior. E. Relativistic CASSCF with a magnetic field
When molecules are placed under an external magneticfield, the interaction between molecules and the magnetic fieldcan be described by replacing the momentum operator p with π = p + A ( r ) where A ( r ) is the vector potential generatedby the external magnetic field; for a uniform field B in theCoulomb gauge, A ( r ) = B × ( r − r G ) , (35)in which r G is the gauge origin. The interaction with a mag-netic field breaks the time-reversal symmetry. Furthermore, toremove the gauge origin dependence of the results, one has touse the so-called gauge-including atomic orbitals (GIAOs) asa basis: φ GIAO x ( r ) = φ x ( r ) exp ( − i A ( R ) · r ) , (36)in which R is the position of the basis function. The smallcomponent basis functions are constructed using the restrictedmagnetic balance relation, rather than the standard re-stricted kinetic balance. The evaluation of molecular inte-grals in relativistic calculations using the GIAOs have been FIG. 1. Geometries of molecules used for benchmarking. (a) UCl (b) Dy(Acac) (H O) (c) U(H BPz ) reported by Reynolds and Shiozaki, in which the compu-tational cost of performing Dirac–Hartree–Fock calculationswith GIAOs has been shown to be almost identical to the onewith the standard basis sets, because the equations in the rela-tivistic framework (even in the absence of magnetic fields) arecomplex.Here, we extended the relativistic CASSCF program de-scribed above so that it can handle molecules placed under amagnetic field. Since we do not utilize the time-reversal sym-metry in the active FCI code and in the integral transformationprograms, this extension has been trivial; the only di ff erence isthat we disable the time-reversal adaptation of the molecularcoe ffi cient matrix when molecular coe ffi cients are updated inthe macroiteration. It will be shown in the section below thatthe rapid convergence is observed when starting from the or-bitals obtained from the calculation without a magnetic field. III. RESULTSA. Convergence Benchmarks
To show the robustness of our algorithm, we presentthe convergence behavior for a challenging system, the 49-atom Dysprosium acetylacetone complex, Dy(Acac) (H O) ,shown in Figure 1(b). The calculations used an active spaceof 9 electrons in the 7 f -orbitals. For the Dirac–CoulombCASSCF calculation, initial guess orbitals were obtained froma closed-shell Dirac–Hartree–Fock calculation performed onthe + − a.u.The energy minimized in the orbital optimization was the av-erage of the 16 states in the J = / was used for these benchmarks: theuncontracted basis set was applied to the dysprosium atom,the TZP contraction to all oxygen atoms, and the DZP con-traction to the carbon and hydrogen atoms. The fitting basisset used for light atoms was uncontracted TZVPP-JKFIT forlight atoms when the Dirac–Coulomb Hamiltonian is used.For Breit calculations, one additional p , d , and f function wasadded to the auxiliary basis set for C and O by multiplying theexponent of the previous tightest function by 2.5. The fittingbasis set for Dy was generated by decontracting ANO-RCC,generating an i -shell by copying the exponents from the ex-isting h -shell, and then adding 4 s , 4 p , 7 d , 8 f , 6 g , 6 h , and 5 i functions, each tighter than the last by a factor of 2.5. The aux- FIG. 2. Convergence of Dy(Acac) (H O) energy and residual with each Dirac CASSCF macroiteration. (a) Dirac–Coulomb Hamiltonian withzero magnetic field, starting from loosely converged Dirac–Hartree–Fock orbitals. (b) Dirac–Coulomb–Breit Hamiltonian with zero magneticfield, starting from the converged orbitals of (a). (c) Dirac–Coulomb Hamiltonian with a 20 T axial magnetic field, starting from the convergedorbitals of (a). iliary basis for H and Dy was not changed between Coulomband Breit Hamiltonians. All calculations used a Gaussian dis-tribution to represent the finite nuclear charge, and eitherrestricted kinetic or restricted magnetic balance for the smallcomponent basis functions, as appropriate. All calculationswith magnetic field used a GIAO basis. The convergencebehavior was tested using tight convergence thresholds: theresidual norm was converged to 10 − for the active CI part,and the total convergence threshold was set to 10 − . The mi-croiterations were converged to a residual norm of 10 − multi-plied by the step size, except for part (c) where this factor wasloosened to 5 × − .The convergence behavior of the new CASSCF algorithmfor the Dysprosium acetylacetone complex is illustrated inFigure 2. The relative state-averaged energy and resid-ual norm are plotted against the number of macroiterationselapsed. Note that the initial CASSCF energy was higher bymany Hartrees than the converged energy due to the use ofpoor initial guess orbitals. Even for this challenging case,our program smoothly lowers the energy towards the solution;once it reaches the quadratic region (at around 22nd iteration),we observe the second-order convergence as expected. Fig-ure 2(b) and (c) show the convergence behavior when eitherthe Breit operator or a 20 Tesla axial magnetic field is added,using the converged orbitals from part (a) as the starting guess.Both of these results show very rapid convergence to the solu-tions, since the initial guess orbitals from the Dirac–Coulombcalculation without a magnetic field were within the quadraticregion in each case. B. Timing Benchmarks
Timing benchmarks have been performed on severalmolecules containing heavy elements. The structures of thethree molecules used are shown in Figure 1. The small-est molecule tested is the octahedral complex UCl with U-Cl bond lengths of 2.42 Å for the singlet ground state. The point group symmetry was not used in the calcula-tions. Two larger organometallic single-molecule magnets,Dy(Acac) (H O) and U(H BPz ) , were also included in thetiming benchmarks. Both of these single-molecule magnetswere tested at their experimental structures, with hydro-gen atom positions optimized using density functional the-ory (DFT) with the PBE functional. The DFT optimizationwas performed by the Turbomole package, and used thedef2-TZVPP basis set except for Dy and U atoms, which useddef-TZVPP with e ff ective core potential. Resolution ofthe identity DFT (RI-DFT) was used with the correspond-ing RI-J auxiliary basis sets.
The resulting molecular co-ordinates are included in the supplementary material. Thecalculations performed here are for the lowest 16 states inDy(Acac) (H O) and lowest 10 states in U(H BPz ), whichcorrespond to the ground J = / J = / f -block atomand contracted ANO-RCC basis sets for the ligands; theTZP contraction was used for oxygen and chlorine atoms, andthe DZP contraction was used for hydrogen, carbon, boron, TABLE I. Average wall times, in seconds, for the molecules tested in this work. All timing benchmarks used two MPI processes per 16-corecompute node, with 8 threads per process.Molecule Hamiltonian Active Closed Virtual Nodes a Micro b Active FCI MacroUCl Coulomb (6,6) 94 453 4 83 . .
8) 96 . . . .
1) 508 . . . .
5) 142 . . (H O) Coulomb (9,7) 118 751 24 100 . .
7) 113 . . + Gaunt (9,7) 118 751 24 111 . .
3) 316 . . + Breit (9,7) 118 751 24 122 . .
3) 831 . . + Magnetic (9,7) 118 751 24 124 . .
0) 165 . . BPz ) Coulomb (3,7) 160 764 48 122 . .
1) 162 . . + Gaunt (3,7) 160 764 48 128 . .
3) 389 . . + Breit (3,7) 160 764 48 143 . .
0) 1129 . . a Each node consists of two Xeon E5-2650 2.00GHz CPUs (Sandy Bridge, 16 CPU cores per node; purchased in 2012) with InfiniBand QDR. b The numbers in parentheses are the average number of microiterations per macro iteration. and nitrogen.
The auxiliary basis sets for density fit-ting were the same as described in the previous section: un-contracted TZVPP-JKFIT where possible and the large cus-tomized fitting basis set for f -block elements. Whenever theGaunt or Breit interaction was included, the auxiliary basis forC, B, N, O, and Cl atoms was supplemented by adding onetighter p , d , and f function with an exponent 2.5 times higherthan the previous tightest function in the angular shell. Anoverall convergence threshold for the residual norm of 10 − was used, while the convergence threshold for active FCI wasset to 10 − . As in the convergence benchmarks in Sec. III A,the convergence threshold for microiterations was normallyset to 10 − multiplied by the step size, but this was loosenedto 5 × − when a magnetic field was applied. For calcula-tions with the Breit operator, we reduced the memory cost bysplitting up the contributions of closed orbitals to the Fock op-erator into smaller batches. (The default in BAGEL is to treatup to 250 spin-orbitals simultaneously; this parameter was re-duced to 150 for the Breit calculations.)Each calculation was run to convergence, and the averagetimings for each macroiteration are shown in Table I. The walltimes required for the average microiteration, macroiteration,and active FCI part are shown for each calculation. The to-tal cost of a macroiteration is slightly greater than the sum ofthese contributions. The active FCI timing includes calcula-tion of the closed Fock matrix with updated orbitals, iterativesolution for CI amplitudes, and computation of reduced den-sity matrices. The total cost is generally dominated by the mi-croiterations, with the active FCI part accounting for at least60% of the remaining cost, followed by smaller costs asso-ciated with the integral transformations and computation ofthe Q vectors and orbitals gradients. The number of microit-erations needed to obtain the Hessian varies substantially be-tween calculations and sometimes from one macroiteration tothe next, while the other costs per iteration vary only slightlythroughout a calculation.For UCl , we tested two active spaces. The smaller ac-tive space is that consisting of six electrons in six orbitals,which correspond to t u σ -bonding and antibonding orbitals(although symmetry is not enforced). The larger active space adds six more to incorporate π -donation into the f -orbitals.It was observed that the increase in the size of the activespaces raised the per-iteration timing by more than a factor ofthree, largely due to slower convergence in the iterative Hes-sian solver. We also measured the timing for the smaller activespace with a large basis set generated by decontracting ligandbasis orbitals. The use of the large basis set increased the costof each step, but it did not lead to the same increase in thenumber of microiterations.Using the two larger molecules, we assessed the cost as-sociated with the use of the Gaunt and Breit interactions.The computational cost associated with each microiterationincreased by 5-12% for Dirac–Coulomb–Gaunt and approx-imately 20% for Dirac–Coulomb–Breit. The higher cost ofthe Breit operator is partly due to the sub-optimal divisionof closed orbitals, which was necessary to reduce the mem-ory cost so that the calculations could be run using the samenumber of compute nodes. The cost of the active FCI part in-creased substantially, by a factor of 2-3 for Gaunt and about7 for Breit, largely due to the expense of computing the MOHamiltonians in the active space. When a magnetic field wasapplied to Dy(Acac) (H O) , we observed only a small in-crease in cost owing to the use of complex atomic orbital ba-sis functions. The numbers of microiterations for these testcases are reported to be smaller than for the Dirac–Coulombcalculations, but this is due to the fact that the starting orbitals(which are the solutions of the Dirac–Coulomb calculations)are of higher quality. IV. CONCLUSIONS
We have developed an e ffi cient algorithm for four-component relativistic CASSCF, which is applicable to siz-able systems and is significantly more stable than the previ-ously reported density-fitted algorithm. The new algorithmis based on the augmented Hessian updates using density fit-ting. Each microiteration has been made slightly cheaper thanone Dirac–Hartree–Fock iteration. The working equation isreported in Eq. (24), which is the main result of this work.We have also introduced a scheme to speed up the relativis-tic CASSCF calculations with the Gaunt and full Breit termsincluded in the Hamiltonian, in which the Hessian matrix el-ements are computed with the Dirac–Coulomb operator. Fur-thermore, we have shown that the code can be applied to sys-tems under a magnetic field, in which gauge-including atomicorbitals are used. All of the programs are parallelized usingboth threads and MPI processes and are part of the BAGELprogram package distributed under the GNU General PublicLicense.
V. SUPPLEMENTARY MATERIAL
See supplementary material for the coordinates of the twolarge molecules used for timing analysis, as well as for the converged total energies obtained from the timing bench-marks.
VI. ACKNOWLEDGMENTS
RDR has been supported by the DOD National DefenseScience and Engineering Graduate (NDSEG) Fellowship, 32CFR 168a. TS has been supported by the National ScienceFoundation CAREER Award (CHE-1351598). The develop-ment of the program infrastructure in BAGEL has been in partsupported by National Science Foundation (ACI-1550481).TY has been supported by JSPS KAKENHI (JP16H04101 andJP15H01097) and JST PRESTO (JP17937609). M. Reiher and A. Wolf,
Relativistic Quantum Chemistry (Wiley-VCH, Germany, 2009). W. Liu, Mol. Phys. , 1679 (2010). T. Saue, ChemPhysChem , 3077 (2011). J. Autschbach, J. Chem. Phys. , 150902 (2012). W. Kutzelnigg and W. Liu, J. Chem. Phys. , 241102 (2005). M. Ilias and T. Saue, J. Chem. Phys. , 064102 (2007). D. Peng, N. Middendorf, F. Weigend, and M. Reiher, J. Chem.Phys. , 184105 (2013). E. van Lenthe, E. J. Baerends, and J. G. Snijders, J. Chem. Phys. , 4597 (1993). M. Barysz and A. J. Sadlej, J. Mol. Struct. (Theochem) , 181(2001). T. Nakajima and K. Hirao, Chem. Rev. , 385 (2012). J. Liu and L. Cheng, J. Chem. Phys. , 144108 (2018). J. Seino and H. Nakai, J. Chem. Phys. , 144101 (2012). L. Visscher, O. Visser, H. Aerts, H. Merenga, and W. C. Nieuw-poort, Comput. Phys. Comm. , 120 (1994). T. Saue, K. Fægri, T. Helgaker, and O. Gropen, Mol. Phys. ,937 (1997). T. Saue and H. J. Aa. Jensen, J. Chem. Phys. , 6211 (1999). H. M. Quiney, H. Skaane, and I. P. Grant, Adv. Quantum Chem. , 1 (1998). I. P. Grant and H. M. Quiney, Int. J. Quantum Chem. , 283(2000). T. Yanai, T. Nakajima, Y. Ishikawa, and K. Hirao, J. Chem. Phys. , 6526 (2001). T. Yanai, T. Nakajima, Y. Ishikawa, and K. Hirao, J. Chem. Phys. , 10122 (2002). M. S. Kelley and T. Shiozaki, J. Chem. Phys. , 204113 (2013). M. Repisky, L. Konecny, M. Kadek, S. Komorovsky, O. L.Malkin, V. G. Malkin, and K. Ruud, J. Chem. Theory Comput. , 980 (2015). H. J. Aa. Jensen, K. G. Dyall, T. Saue, and K. Fægri, Jr., J. Chem.Phys. , 4083 (1996). J. Thyssen, T. Fleig, and H. J. Aa. Jensen, J. Chem. Phys. ,034109 (2008). J. E. Bates and T. Shiozaki, J. Chem. Phys. , 044112 (2015). T. Shiozaki and W. Mizukami, J. Chem. Theory Comput. , 4733(2015). T. Fleig, C. M. Marian, and J. Olsen, Theor. Chem. Acc. , 125(1997). W. Kutzelnigg and W. Liu, J. Chem. Phys. , 3540 (2000). D. Ganyushin and F. Neese, J. Chem. Phys. , 104113 (2013). I. Kim and Y. S. Lee, J. Chem. Phys. , 134115 (2013). B. O. Roos and P.-Å. Malmqvist, Phys. Chem. Chem. Phys. ,2919 (2004). F. Lipparini and J. Gauss, J. Chem. Theory Comput. , 4284(2016). S. Ten-no and S. Iwata, J. Chem. Phys. , 3604 (1996). F. A. Aquilante, T. B. Pedersen, R. Lindh, B. O. Roos, A. S.de Mer´as, and H. Koch, J. Chem. Phys. , 024113 (2008). F. London, J. Phys. Radium , 397 (1937). K. Wolinski, J. F. Hinton, and P. Pulay, J. Am. Chem. Soc. ,8251 (1990). E. I. Tellgren, A. Soncini, and T. Helgaker, J. Chem. Phys. ,154114 (2008). R. D. Reynolds and T. Shiozaki, Phys. Chem. Chem. Phys. ,14280 (2015). bagel , Brilliantly Advanced General Electronic-structure Library.http: // T. Shiozaki, WIREs Comput. Mol. Sci. , e1331 (2018). H.-J. Werner, Adv. Chem. Phys. , 1 (1987). H.-J. Werner and W. Meyer, J. Chem. Phys. , 5794 (1981). H.-J. Werner and P. J. Knowles, J. Chem. Phys. , 5053 (1985). Q. Sun, J. Yang, and G. K.-L. Chan, Chem. Phys. Lett. , 291(2017). Y. Ma, S. Knecht, S. Keller, and M. Reiher, J. Chem. TheoryComput. , 2533 (2017). T. Shiozaki, J. Chem. Phys. , 111101 (2013). G. A. Aucar, T. Saue, L. Visscher, and H. J. Aa Jensen, J. Chem.Phys. , 6208 (1999). W. Kutzelnigg, Phys. Rev. A , 032109 (2003). S. Komorovsk´y, M. Repisk´y, O. L. Malkina, V. G. Malkin, I. M.Ond´ık, and M. Kaupp, J. Chem. Phys. , 104101 (2008). P.-O. Widmark, P. Å. Malmqvist, and B. O. Roos., Theor. Chim.Acta , 291 (1990). B. O. Roos, R. Lindh, P.-Å. Malmqvist, V. Veryazov, and P.-O.Widmark, J. Phys. Chem. A , 2851 (2004). B. O. Roos, R. Lindh, P.-Å. Malmqvist, V. Veryazov, and P.-O.Widmark, J. Phys. Chem. A , 11431 (2008). F. Weigend, J. Comput. Chem. , 167 (2008). L. Visscher and K. G. Dyall, At. Data Nucl. Data Tables , 207(1997). W. H. Zachariasen, Acta Cryst. , 285 (1948). S.-D. Jiang, B.-W. Wang, G. Su, Z.-M. Wang, and S. Gao, Angew.Chem. Int. Ed. , 7448 (2010). J. D. Rinehart, K. R. Meihaus, and J. R. Long, J. Am. Chem. Soc. , 7572 (2010). J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. ,3865 (1996). R. Ahlrichs, M. B¨ar, M. H¨aser, H. Horn, and C. K¨olmel, Chem.Phys. Lett. , 165 (1989). F. Furche, R. Ahlrichs, C. H¨attig, W. Klopper, M. Sierka, andF. Weigend, WIREs Comput. Mol. Sci. , 91 (2014). F. Weigend, M. H¨aser, H. Patzelt, and R. Ahlrichs, Chem. Phys.Lett. , 143 (1998). M. Dolg, H. Stoll, and H. Preuss, J. Chem. Phys. , 1730 (1989). W. K¨uchle, M. Dolg, H. Stoll, and H. Preuss, J. Chem. Phys. ,7535 (1994). X. Cao, M. Dolg, and H. Stoll, J. Chem. Phys. , 487 (2003). K. Eichkorn, F. Weigend, O. Treutler, and R. Ahlrichs, Theor.Chem. Acc. , 119 (1997). F. Weigend, Phys. Chem. Chem. Phys. , 1057 (2006). B. O. Roos, R. Lindh, P.-Å. Malmqvist, V. Veryazov, and P.-O.Widmark, Chem. Phys. Lett.409