Direct Energy Minimization Based on Exponential Transformation in Density Functional Calculations of Finite and Extended Systems
Aleksei V. Ivanov, Elvar Ö. Jónsson, Tejs Vegge, Hannes Jónsson
Aleksei V. Ivanov a,b, Elvar Ö. Jónsson a, Tejs Vegge c, Hannes Jónsson a
a Science Institute and Faculty of Physical Sciences, University of Iceland, VR-III, 107 Reykjavík, Iceland
b St. Petersburg State University, 199034, St. Petersburg, Russia
c Department of Energy Conversion and Storage, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
Abstract
The energy minimization involved in density functional calculations of electronic systems can be carried out using an exponential transformation that preserves the orthonormality of the orbitals. The energy of the system is then represented as a function of the elements of a skew-Hermitian matrix that can be optimized directly using unconstrained minimization methods. An implementation based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno approach with inexact line search and a preconditioner is presented and the performance compared with that of the commonly used self-consistent field approach. Results are presented for the G2 set of 148 molecules, liquid water configurations with up to 576 molecules and some insulating crystals. A general preconditioner is presented that is applicable to systems with fractional orbital occupation as is, for example, needed in the k-point sampling for periodic systems. This exponential transformation direct minimization approach is found to outperform the standard implementation of the self-consistent field approach in that all the calculations converge with the same set of parameter values and it requires less computational effort on average. The formulation of the exponential transformation and the gradients of the energy presented here are quite general and can be applied to energy functionals that are not unitary invariant, such as self-interaction corrected functionals.
1. Introduction
Email addresses: [email protected] (Aleksei V. Ivanov), [email protected] (Hannes Jónsson)

Preprint submitted to Computer Physics Communications, February 1, 2021

There are several different approaches for finding optimal orbitals corresponding to the minimum of an energy functional in the context of Kohn-Sham density functional theory (KS-DFT) [1, 2]. The most commonly used method is based on a self-consistent field (SCF) algorithm consisting of two steps. In the first step, and for a given density, one finds eigenvalues and eigenfunctions using an iterative algorithm such as the Davidson algorithm [3], or even direct diagonalization of the full Hamiltonian matrix when the size of the basis set is not too large. In the second step, the electron density or Hamiltonian matrix is updated using, for example, the direct inversion in the iterative subspace (DIIS) method [4, 5]. The SCF approach is widely used and has proven to be efficient for both finite (molecules/clusters) and extended systems, but can, nevertheless, suffer from convergence problems. Various density and Hamiltonian mixing schemes have been introduced to address such cases [6, 7]. As a result, the user of typical software developed for KS-DFT calculations is often presented with the task of choosing values of various parameters and selecting between various types of eigensolvers. Systems with similar chemical and physical properties may even call for different choices. A further problem of the SCF method in calculations of ground electronic states is that it may converge on a saddle point of the energy surface rather than a minimum [8].

Another approach to this optimization problem is based on direct minimization of the energy with respect to the electronic degrees of freedom [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. The challenge then is to incorporate the constraint of orthonormality of the orbitals (the single electron wave functions). One way to approach this is to follow the energy gradient projected on the subspace tangent to the orbitals [10, 11]. After such an adjustment of the orbitals within this tangent space, the orthonormality constraints will be violated and, therefore, an explicit orthonormalization of the orbitals needs to be applied after each iteration. This approach is often used in calculations with a plane wave basis set.
Alternatively, when the basis set is compact, as in calculations using a linear combination of atomic orbitals, a unitary transformation can be applied to a set of orthonormal reference orbitals that includes all occupied and virtual orbitals, and the energy is then minimized by optimizing the elements of the transformation matrix. The orthonormality constraints will then be satisfied, but, due to the constraints imposed by the unitary matrix, the energy is defined on a curved space. As a result, minimization algorithms need to be modified to take the curvature into account. This can be achieved by performing a line search along geodesics [19]. Alternatively, the unitary matrix can be parameterized using an exponential transformation [9, 12, 14], in which case the energy becomes a function of the elements of a skew-Hermitian matrix in a linear space. Well-established, unconstrained minimization strategies can then be applied, including inexact line searches that can give robust convergence. We will refer to this approach as exponential transformation direct minimization (ETDM). It has been used in calculations of molecules using KS-DFT [15] and previously in the context of Hartree-Fock theory [9, 20, 21, 22]. There, the occupation numbers for the orbitals have been restricted to integers so that unitary invariance with respect to rotation within the space of occupied orbitals is ensured. Preconditioners to accelerate convergence have been presented for such systems and found to be important in order to achieve good performance [9, 15].

In this article, a generalization and efficient implementation of the ETDM approach is presented, as well as applications to both finite and extended systems. The method can be applied to systems with fractional occupation, for example where k-point sampling of the Brillouin zone (BZ) is carried out.
The formulation presented here is also applicable to energy functionals that are not unitary invariant, such as self-interaction corrected functionals [23]. Tests of the performance of this ETDM implementation, and comparison with the SCF method including density mixing, are carried out for the G2 set (a total of 148 molecules), liquid configurations consisting of up to 576 water molecules, and several insulating crystals.

The article is organised as follows. In section 2, the ETDM method is formulated in a general way and equations are provided for the derivative of the energy with respect to the matrix elements in the exponential transformation. In section 3, an efficient preconditioner is presented, applicable to systems with non-integer occupation numbers, as well as methods for evaluating the gradient of the energy and ways of choosing the search direction and step length in an inexact line-search procedure. In section 4, performance tests are presented with comparison to conventional SCF calculations. Finally, discussion and conclusions are presented in section 5.
2. General formulation
In KS-DFT, the energy functional is

    E = \sum_{i,k} f_i(k) \frac{1}{2} \int d\mathbf{r}\, |\nabla \phi_{ik}(\mathbf{r})|^2 + \int d\mathbf{r}\, \rho(\mathbf{r}) v_{\mathrm{ext}}(\mathbf{r}) + \frac{1}{2} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} + E_{xc}[\rho(\mathbf{r})]   (1)

where the \phi_{ik} are orbitals of the non-interacting electron system that has total electron density

    \rho(\mathbf{r}) = \sum_{i,k} f_i(k)\, |\phi_{ik}(\mathbf{r})|^2,   (3)

equal to that of the interacting electron system; the f_i(k) are orbital occupation numbers for the k-th point of the BZ with 0 \le f_i(k) \le 1, v_{\mathrm{ext}}(\mathbf{r}) is the external potential corresponding to the electron-nuclei interaction, and E_{xc} is the exchange-correlation energy. The orbitals are expanded in terms of a possibly non-orthogonal basis set consisting of M basis functions,

    \phi_{ik}(\mathbf{r}) = \sum_{\mu=1}^{M} O_{\mu i}(k)\, \chi_{\mu k}(\mathbf{r}),   (4)

and the task is to find optimal values of the coefficients O_{\mu i}(k) that minimize the energy E[\{O(k)\}_k] subject to the orthonormality constraints

    O^\dagger(k)\, S(k)\, O(k) = I, \quad k \in \mathrm{BZ},   (5)

with S_{\mu\nu}(k) = \int \chi^*_{\mu k}(\mathbf{r}) \chi_{\nu k}(\mathbf{r})\, d\mathbf{r} being the overlap matrix.

The basis functions for periodic systems are Bloch states and in a localised basis set approach they can be written as

    \chi_{\mu k}(\mathbf{r}) = \frac{1}{\sqrt{N}} \sum_{\mathbf{R}} \exp(i k \cdot \mathbf{R})\, \eta_\mu(\mathbf{r} - \mathbf{R} - \mathbf{d}_\mu)   (6)

where \eta_\mu(\mathbf{r} - \mathbf{R} - \mathbf{d}_\mu) is an atomic orbital centered on an atom in the simulated cell. The subscript \mu enumerates the atomic orbitals and \mathbf{R} belongs to the Bravais lattice. An initial guess for the orbitals is expressed as a linear combination of the basis functions,

    \psi_{mk}(\mathbf{r}) = \sum_{\mu=1}^{M} C_{\mu m}(k)\, \chi_{\mu k}(\mathbf{r}).   (7)

Given an initial guess for the orbitals, C_{\mu m}(k), which we will refer to as the reference orbitals, the optimal orbital coefficients O_{\mu m}(k) that provide minimal energy can be found through a unitary transformation as

    O(k) = C(k)\, e^{A(k)}   (8)

where A(k) is a skew-Hermitian matrix, A(k)^\dagger = -A(k). For a set of N_k vectors used to represent the BZ, a set of matrices \{A(k)\}_k is needed.
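The key property of Eq. (8) is that it preserves the orthonormality constraint of Eq. (5) exactly, with no explicit re-orthogonalization. The sketch below verifies this numerically; it is illustrative only and not part of the implementation described in the paper (the overlap matrix is random and the reference coefficients are obtained by Löwdin orthogonalization):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
M = 5

# Random Hermitian, positive-definite overlap matrix S
Y = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
S = Y.conj().T @ Y + M * np.eye(M)

# Reference coefficients C chosen S-orthonormal via Loewdin: C = S^(-1/2)
w, V = np.linalg.eigh(S)
C = V @ np.diag(w**-0.5) @ V.conj().T

# A skew-Hermitian A gives a unitary exp(A); Eq. (8): O = C exp(A)
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = (X - X.conj().T) / 2
O = C @ expm(A)

# The orthonormality constraint of Eq. (5) still holds: O^dag S O = I
deviation = np.linalg.norm(O.conj().T @ S @ O - np.eye(M))
```

Since exp(A) is unitary for any skew-Hermitian A, the deviation is at the level of round-off, which is the reason unconstrained minimization over the elements of A is possible.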
For a given set of reference orbitals, a set of unitary matrices U(k) = \exp(A(k)) exists such that the reference orbitals are transformed to the optimal orbitals. Thus, the ground-state energy of the system is a function of the upper triangular elements of the set of matrices A(k),

    E = E[\{a_{11}, \ldots, a_{1M}, a_{22}, \ldots, a_{2M}, \ldots, a_{MM}\}_k]   (9)

where a_{ij} = (A)_{ij} and k runs over the set of N_k vectors. The real parts of the diagonal elements of the matrices are zero and therefore the energy is a function of N_k M^2 real variables: for each k-point there are M(M-1)/2 complex elements above the diagonal and M purely imaginary diagonal elements, i.e. M(M+1)/2 independent matrix elements \{a_{ij}(k)\}_{i \le j}.

Introducing the derivative

    \frac{\partial}{\partial a_{ij}(k)} = \frac{1}{2} \left( \frac{\partial}{\partial\, \mathrm{Re}(a_{ij}(k))} - i \frac{\partial}{\partial\, \mathrm{Im}(a_{ij}(k))} \right)   (10)

the gradient of the energy can be evaluated as

    \frac{\partial E}{\partial a_{ij}(k)} = \sum_{\mu\nu} H_{\mu\nu}(k)\, \frac{\partial \rho_{\mu\nu}(k)}{\partial a_{ij}(k)}   (11)

where the Hamiltonian matrix is

    H_{\mu\nu}(k) = \int d\mathbf{r}\, \chi^*_{\mu k}(\mathbf{r}) \left( -\frac{\nabla^2}{2} + v(\mathbf{r}) \right) \chi_{\nu k}(\mathbf{r}).   (12)

Here, v(\mathbf{r}) is the single electron Kohn-Sham potential, and the density matrix is given in terms of the optimal coefficient matrix as

    \rho_{\mu\nu}(k) = \sum_m f_m(k)\, O_{\mu m}(k)\, O^*_{\nu m}(k).   (13)

By defining the commutator

    L_{mn}(k) = [F(k), H(k)]_{mn},   (14)

where H(k) is the Hamiltonian matrix represented in terms of the optimal orbitals,

    H_{mn}(k) = \sum_{\mu\nu} O^*_{\mu m}(k)\, H_{\mu\nu}(k)\, O_{\nu n}(k),

and F(k) is a diagonal matrix with the occupation numbers f_m(k) as diagonal elements, the derivatives in Eq. (11) can be written as

    \frac{\partial E}{\partial a_{ij}(k)} = (2 - \delta_{ij}) \left( \int_0^1 e^{tA(k)}\, L(k)\, e^{-tA(k)}\, dt \right)_{ji}.   (15)

For the optimal orbitals, the gradient \partial E / \partial a_{ij}(k) must be zero, so

    \int_0^1 e^{tA(k)}\, L(k)\, e^{-tA(k)}\, dt = 0, \quad k \in \mathrm{BZ}.   (16)

These non-linear equations can be used to find the skew-Hermitian matrix that provides the energy minimum. For the remainder of this article, the k-point index k is omitted for simplicity. Eq.
(15) is general and can be applied to an objective function that depends explicitly on the orbitals as well as the total density, but then the definition of L needs to be changed accordingly. For example, for the Perdew-Zunger self-interaction correction (PZ-SIC) [23], the matrix L for a single k-point calculation is

    L_{mn} = [F, H]_{mn} + f_n V_{nm} - f_m V_{mn},   (17)

where V_{mn} is a matrix element of the SIC potential,

    V_{mn} = \sum_{\mu\nu} O^*_{\mu m}\, V^n_{\mu\nu}\, O_{\nu n},   (18)

    V^n_{\mu\nu} = \int \chi^*_\mu(\mathbf{r}) \left[ \int d\mathbf{r}'\, \frac{\rho_n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} + v_{xc}(\rho_n(\mathbf{r})) \right] \chi_\nu(\mathbf{r})\, d\mathbf{r}.   (19)

Equation (16) can be expanded in a series as

    \int_0^1 e^{tA}\, L\, e^{-tA}\, dt = L + \frac{1}{2!}[A, L] + \frac{1}{3!}[A, [A, L]] + \ldots   (20)

If \|L\| \gg \|[A, L]\|, which corresponds to \|A\| \ll 1 since \|[A, L]\| \le 2 \|A\| \|L\|, then the first term on the right hand side can be used to estimate the gradient. This limit of 'small rotations' corresponds to the geometric approach used by Van Voorhis and Head-Gordon [15] and has also been used in the context of orbital-density dependent functionals [24, 25, 26]. The higher order terms can also be included to increase the accuracy of the gradient estimate, but each iteration then requires more computational effort.

The minimization procedure is performed with respect to the real and imaginary parts of the matrix elements using the energy gradient given by Eq. (15),

    \frac{\partial E}{\partial\, \mathrm{Re}(a_{ij})} = 2\, \mathrm{Re} \left( \frac{\partial E}{\partial a_{ij}} \right)   (21)

and

    \frac{\partial E}{\partial\, \mathrm{Im}(a_{ij})} = -2\, \mathrm{Im} \left( \frac{\partial E}{\partial a_{ij}} \right).   (22)

Computational algorithms for the evaluation of the matrix exponential and the gradient of the energy are presented in Sec. 3.4.
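The accuracy of the truncated series in Eq. (20) can be probed numerically. The sketch below is illustrative only: the matrices are random, the norm of A is chosen small to mimic the 'small rotation' regime, and a midpoint-rule quadrature serves as the reference value of the integral:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = 6

# Random skew-Hermitian A with a small norm, and a random matrix for L
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = 0.02 * (X - X.conj().T)
L = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

def comm(P, Q):
    return P @ Q - Q @ P

# Truncated series of Eq. (20): L + [A,L]/2! + [A,[A,L]]/3!
G_series = L + comm(A, L) / 2.0 + comm(A, comm(A, L)) / 6.0

# Reference: midpoint-rule quadrature of the integral over t in [0, 1]
n = 400
ts = (np.arange(n) + 0.5) / n
G_quad = sum(expm(t * A) @ L @ expm(-t * A) for t in ts) / n

# Zeroth-order estimate (G ~ L) versus the series including commutators
err0 = np.linalg.norm(L - G_quad)
err2 = np.linalg.norm(G_series - G_quad)
```

For small ‖A‖, err2 is much smaller than err0, illustrating that each added commutator term improves the gradient estimate at the cost of extra matrix multiplications.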
3. Algorithms and Computational Parameters
In order to find the optimal orbitals, O, corresponding to minimal energy, the appropriate exponential transformation of the reference orbitals, C,

    O = C e^{A},   (23)

needs to be determined. The reference orbitals can be chosen to be any set of orthonormal orbitals spanned by the basis set, and they are held fixed during the minimization of the energy for a given number of steps while only the matrix A is varied. The closer the reference orbitals are to the optimal orbitals, the faster the iterative procedure will converge.

A line search method has been implemented where the (k+1)-th iteration step is

    \vec{a}^{(k+1)} = \vec{a}^{(k)} + \alpha^{(k)} \vec{p}^{(k)}.   (24)

Here, \vec{a}^{(k)} is a vector consisting of the real and imaginary parts of the upper triangular elements of the matrix A at the k-th step of the minimization algorithm,

    \vec{a} = (\mathrm{Re}(a_{12}), \ldots, \mathrm{Re}(a_{1M}), \mathrm{Re}(a_{23}), \ldots, \mathrm{Re}(a_{2M}), \ldots, \mathrm{Re}(a_{M-1,M}), \mathrm{Im}(a_{11}), \mathrm{Im}(a_{12}), \ldots, \mathrm{Im}(a_{1M}), \mathrm{Im}(a_{22}), \ldots, \mathrm{Im}(a_{MM}))^T,   (25)

and \vec{p}^{(k)} is the search direction while \alpha^{(k)} is the step length.

3.1. Choice of search direction

The search direction can be chosen according to the steepest descent method, various quasi-Newton methods, or nonlinear conjugate gradient (CG) methods. The calculation of the search direction involves algebraic operations associated with the particular method plus the evaluation of the energy and gradient for the given energy functional. The dimensionality of the minimization problem scales as
NM, where N is the number of occupied orbitals and M is the number of basis functions. While quasi-Newton methods such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm require fewer iterations than limited-memory BFGS (L-BFGS) or CG, the algebraic operations become a bottleneck even for systems of moderate size (the BFGS algorithm scales as O(N^2 M^2) [27]). However, every iteration of the L-BFGS algorithm, in which the approximate inverse Hessian matrix is updated, can be computed at a cost of O(mNM) operations, where m is the number of previous steps stored in memory. In the present implementation, the L-BFGS algorithm as described in Ref. [27] is used and m = 3 in the benchmark calculations.

3.2. Choice of step length

The step length \alpha^{(k)} is chosen in such a way that it satisfies the strong Wolfe conditions [28, 29, 27],

    E(\vec{a}^{(k)} + \alpha^{(k)} \vec{p}^{(k)}) \le E(\vec{a}^{(k)}) + c_1 \alpha^{(k)} \nabla_{\vec{a}} E(\vec{a}^{(k)}) \cdot \vec{p}^{(k)}   (26)

and

    |\nabla_{\vec{a}} E(\vec{a}^{(k)} + \alpha^{(k)} \vec{p}^{(k)}) \cdot \vec{p}^{(k)}| \le c_2\, |\nabla_{\vec{a}} E(\vec{a}^{(k)}) \cdot \vec{p}^{(k)}|   (27)

with 0 < c_1 < c_2 <
1. A trial step of \alpha^{(k)} = 1 is always used first to test the conditions. After several iterations, a step length of 1 is guaranteed to satisfy the strong Wolfe conditions in the L-BFGS algorithm [27]. This is appealing since it reduces the number of energy and gradient calculations, which are computationally the most intensive part of KS-DFT calculations. If \alpha^{(k)} = 1 does not satisfy the strong Wolfe conditions, then an inexact line search based on interpolation of the energy along the search direction is used [27]. When the energy of the system is evaluated, the KS-DFT potential needs to be obtained and, as a result, there is little additional effort involved in evaluating the gradient. Therefore, the energy along the search direction is always interpolated by a cubic function using the energy values and gradients at the boundaries of the search interval [a, b]. Alongside the strong Wolfe conditions, approximate Wolfe conditions [30] are also checked at the minimum of the interpolated cubic function,

    (2\delta - 1)\, \nabla_{\vec{a}} E(\vec{a}^{(k)})^T \vec{p}^{(k)} \ge \nabla_{\vec{a}} E(\vec{a}^{(k)} + \alpha^{(k)} \vec{p}^{(k)}) \cdot \vec{p}^{(k)} \ge \sigma\, \nabla_{\vec{a}} E(\vec{a}^{(k)}) \cdot \vec{p}^{(k)},   (28)

together with the condition

    E(\vec{a}^{(k)} + \alpha^{(k)} \vec{p}^{(k)}) \le E(\vec{a}^{(k)}) + \epsilon\, |E(\vec{a}^{(k)})|,   (29)

where 0 < \delta < \min\{0.5, \sigma\}, 0 < \sigma < 1, and \epsilon is a small fixed number. Thus, the line search algorithm is terminated when either the strong Wolfe conditions of Eqs. (26)-(27) or the approximate Wolfe conditions of Eq. (28) along with the condition in Eq. (29) hold. The parameter values are set to [27, 30]

    c_1 = 10^{-4}, \quad c_2 = 0.9, \quad \delta = 0.1, \quad \sigma = 0.9, \quad \epsilon = 10^{-6}.   (30)

3.3. Preconditioner

A preconditioner speeds up convergence of this iterative algorithm. It is constructed as the inverse of an approximate Hessian matrix that can be obtained by taking the derivative of a linear expansion of the gradient (Eq. (15)) with respect to the skew-Hermitian matrix, and neglecting first order derivatives of the effective potential.
For the real valued case, the Hessian can be approximated as

    \frac{\partial^2 E}{\partial a_{ij}\, \partial a_{lm}} \approx \delta_{il} H_{jm} (f_l + f_i - f_j - f_m) + \delta_{jl} H_{im} (f_m + f_i - f_l - f_j) + \delta_{jm} H_{li} (f_m - f_i - f_l + f_j) + \delta_{im} H_{lj} (f_l - f_m - f_i + f_j) + \beta_{ij}\, \delta_{il} \delta_{jm}   (31)

where the matrix \beta_{ij} must be chosen according to the following two principles: (1) the approximate Hessian must be positive definite, and (2) it must provide a good estimate of the true Hessian along the search direction such that a step size of 1 satisfies the strong Wolfe conditions.

If the orbitals are chosen as eigenvectors of the Hamiltonian, then the approximate Hessian is diagonal,

    \frac{\partial^2 E}{\partial a_{ij}^2} = -2(\epsilon_{ii} - \epsilon_{jj})(f_i - f_j) + \beta_{ij}.   (36)

The first term on the right hand side coincides with the preconditioner that has previously been used for molecular systems with integer occupation numbers [9]. There, an extra term was added in cases of degeneracy, \epsilon_{ii} = \epsilon_{jj}, but here the initial approximation \beta_{ij} of the Hessian in the L-BFGS algorithm [27] is used. Since the approximate Hessian is diagonal, the preconditioner is simply

    P_{ij} = \frac{1}{-2(\epsilon_{ii} - \epsilon_{jj})(f_i - f_j) + \beta_{ij}}.   (37)

In the present implementation, the preconditioner is updated iteratively and for iteration k it is

    P^{(k)}_{ij} = \frac{1}{-2(1 - \gamma)(\epsilon_{ii} - \epsilon_{jj})(f_i - f_j) + \gamma \beta^{(k)}_{ij}},   (38)

where

    \beta^{(k)}_{ij} = \frac{\| \nabla_{\vec{a}} E(\vec{a}^{(k)}) - \nabla_{\vec{a}} E(\vec{a}^{(k-1)}) \|^2}{(\vec{a}^{(k)} - \vec{a}^{(k-1)}) \cdot (\nabla_{\vec{a}} E(\vec{a}^{(k)}) - \nabla_{\vec{a}} E(\vec{a}^{(k-1)}))}.   (39)

The parameter \gamma in Eq. (38) determines the mixing of the two approximate Hessians: the one obtained from the linear expansion of the gradient and the one based on the L-BFGS estimate, Eq. (39). In the calculations presented here, \gamma = 0.
25 was found empirically to give a good compromise between the rate of convergence and robustness. When k-point sampling is included for periodic systems, \beta^{(k)}_{ij} needs to be multiplied by the numerical weight of the corresponding k-point.

With this preconditioner, a step length of 1 is almost always accepted, and it works well for both finite and extended systems. It is used for both the real and imaginary parts of the skew-Hermitian matrix. We note that the eigenvalues in Eq. (38) are not updated at every iteration of the minimization algorithm but only at the beginning, thereby avoiding the costly diagonalization of the Hamiltonian matrix at each step.

3.4. Evaluation of the matrix exponential and the energy gradient

The evaluation of the exponential of the skew-Hermitian matrix, \exp(A), is carried out using an eigendecomposition of iA. Let \Omega be a diagonal, real-valued matrix with elements corresponding to the eigenvalues of the matrix iA, and let U be a column matrix of the eigenvectors of iA. Then the matrix exponential of A is

    \exp(A) = U \exp(-i\Omega)\, U^\dagger.   (40)

This computation requires diagonalization of an M \times M matrix and becomes a computational bottleneck for large systems. However, for unitary invariant energy functionals (such as Kohn-Sham functionals), Hutter et al. [12] have shown that A can be parametrised without loss of generality as

    A = \begin{pmatrix} 0 & A_{ov} \\ -A_{ov}^\dagger & 0 \end{pmatrix},   (41)

where A_{ov} is an N \times (M - N) matrix (N is the number of occupied states), and the matrix exponential can be calculated as

    \exp(A) = \begin{pmatrix} \cos(P^{1/2}) & P^{-1/2} \sin(P^{1/2})\, A_{ov} \\ -A_{ov}^\dagger\, P^{-1/2} \sin(P^{1/2}) & I_{M-N} + A_{ov}^\dagger \left( \cos(P^{1/2}) - I_N \right) P^{-1} A_{ov} \end{pmatrix},   (42)

where P = A_{ov} A_{ov}^\dagger. In this case the computational effort scales as O(NM^2).

An alternative and more general approach is provided by the scaling and squaring algorithm, based on the equation

    \exp(A) = \left( \exp(A/2^s) \right)^{2^s}   (43)

and on a [q/q] Padé approximant to the matrix \exp(A/2^s), where s and q are positive integer constants [31]. The algorithm of Al-Mohy and Higham is used here [32, 33].
The two approaches are compared in the benchmark calculations presented below.

If the matrix exponential is evaluated using the eigendecomposition of iA, then one can calculate the gradient of the energy using the matrices U and \Omega as

    G^T = U \left( (U^\dagger L U) \odot D \right) U^\dagger,   (44)

where \odot denotes elementwise multiplication, the matrix D is

    D_{ij} = \frac{e^{-i(\Omega_{ii} - \Omega_{jj})} - 1}{-i(\Omega_{ii} - \Omega_{jj})},   (45)

and the matrix G is

    G_{ij} = \frac{1}{2 - \delta_{ij}} \frac{\partial E}{\partial a_{ij}}.   (46)

However, if the norm of A is small, \|A\| \ll 1, then due to the sparsity of the matrix A one can use

    G \approx L^T.   (47)

If the norm of the matrix A grows too large, the reference orbitals can be updated: after the step O^{(k)} = C \exp(A^{(k)}), one checks whether \|A^{(k)}\| exceeds a threshold; if so, the reference orbitals are replaced by C' = C \exp(A^{(k)}), the matrix is reset to A^{(k)} = 0, and the iteration continues with O^{(k+1)} = C' \exp(A^{(k+1)}).

For small systems, the performance is similar for the various methods of evaluating the matrix exponential and energy gradient, since the calculation of the effective Kohn-Sham potential and the total energy then dominates the computational effort. For larger systems, a difference in performance becomes evident, as illustrated below for configurations of liquid water with up to 576 molecules.
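The two routes to \exp(A) described in this section can be cross-checked numerically. In the sketch below, SciPy's `expm` (a Padé scaling-and-squaring implementation in the spirit of Refs. [31, 32, 33]) stands in for the scaling-and-squaring route, while Eq. (40) is evaluated directly; the matrix size and test matrix are arbitrary choices for illustration:

```python
import numpy as np
from scipy.linalg import expm, eigh

rng = np.random.default_rng(1)
M = 8
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = (X - X.conj().T) / 2          # skew-Hermitian: A^dag = -A

# Eq. (40): iA is Hermitian, so iA = U Omega U^dag with real Omega,
# and exp(A) = U exp(-i Omega) U^dag
omega, U = eigh(1j * A)
expA_eig = U @ np.diag(np.exp(-1j * omega)) @ U.conj().T

# Scaling-and-squaring / Pade evaluation (scipy's expm)
expA_ss = expm(A)

diff = np.linalg.norm(expA_eig - expA_ss)                          # ~ round-off
unit = np.linalg.norm(expA_eig.conj().T @ expA_eig - np.eye(M))    # unitarity
```

Both evaluations agree to round-off and produce a unitary matrix, consistent with the use of either route depending on whether the matrix is dense or sparse.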
We have implemented the ETDM algorithm using a numerical localized atomic basis set and the projector augmented-wave (PAW) formalism [34], which takes into account the frozen inner electrons of the atoms, within the open-source GPAW software [35]. An SCF algorithm based on the eigendecomposition of the Hamiltonian in a localised atomic basis set representation is already available there and is frequently used in KS-DFT calculations [36]. To compare the efficiency of the two approaches, single-point ground-state energy calculations are performed for the G2 [37] data set of small molecules, five ionic solids, as well as liquid water configurations including 32, 64, 128, 256, 384 and 576 molecules subject to periodic boundary conditions. The double-zeta polarized basis set (which is the default basis set in GPAW) and the generalized gradient approximation (GGA) functional parametrized by Perdew-Burke-Ernzerhof [38] are used. An initial guess for the orbitals is taken to be the eigenvectors of the Hamiltonian obtained from a superposition of atomic densities.

Convergence is considered achieved for both the SCF and the ETDM methods when the inequality

    \frac{1}{N_e} \sum_{i=1}^{N_b} \int d\mathbf{r}\, f_i \left| \hat{H}_{KS} \psi_i(\mathbf{r}) - \sum_{j=1}^{N_b} \lambda_{ij} \psi_j(\mathbf{r}) \right|^2 < \varepsilon   (48)

is satisfied, where \varepsilon is a small threshold (in eV units). In the equation above, the \lambda_{ij} are Lagrange multipliers; for an SCF algorithm this is a diagonal matrix. N_b is the number of occupied orbitals. Default values in GPAW are used, for example for the Pulay density mixing parameters. We note that in cases where the SCF method fails to converge, it could in principle be made to converge by using, for example, other, non-default values of the density mixing parameters. Failure to reach convergence here means that convergence is not obtained within the default maximum number of iteration steps, which is 333.
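As an illustration of the preconditioner of Sec. 3, Eq. (38) reduces to an elementwise division of the gradient. The sketch below is a simplified, hypothetical example (the eigenvalues, occupation numbers and \beta value are made up, and \beta is taken as the scalar estimate of Eq. (39)):

```python
import numpy as np

def precondition(grad, eps, f, beta, gamma=0.25):
    """Elementwise application of the diagonal preconditioner, Eq. (38).

    grad  -- gradient with respect to the elements a_ij (M x M array)
    eps   -- orbital eigenvalues (diagonal of H in the reference orbitals)
    f     -- occupation numbers
    beta  -- scalar L-BFGS initial-Hessian estimate, Eq. (39)
    gamma -- mixing parameter between the two Hessian estimates
    """
    de = eps[:, None] - eps[None, :]   # eps_ii - eps_jj
    df = f[:, None] - f[None, :]       # f_i - f_j
    hess = -2.0 * (1.0 - gamma) * de * df + gamma * beta
    return grad / hess

# Hypothetical system: two occupied and two virtual orbitals
eps = np.array([-1.0, -0.5, 0.3, 0.8])
f = np.array([1.0, 1.0, 0.0, 0.0])
g = np.ones((4, 4))
pg = precondition(g, eps, f, beta=1.0)
# Occupied-virtual rotations are damped by the (positive) gap term,
# while degenerate/same-occupation pairs fall back on the beta term.
```

Note that for occupied-virtual pairs the product (\epsilon_{ii} - \epsilon_{jj})(f_i - f_j) is negative, so the first term in the denominator is positive; together with \beta > 0 this keeps the approximate Hessian positive definite, as required.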
4. Results
The average number of energy and gradient evaluations for the ETDM method, and the average number of energy and diagonalization calculations for the SCF method, are presented in Table 1 and Fig. 1. The ETDM method converges for all 148 molecules in the G2 set using the parameter values specified in Sec. 3. The SCF method, however, fails to converge for five of the molecules: CH, SH, ClO, NO, and OH. These five molecules are also challenging for the ETDM method, as it requires more iterations to reach convergence for them than the average for the whole G2 set (see Fig. 1). For the molecules where SCF converges, it requires a similar number of iterations as ETDM: on average, 18 and 17 iterations are required by the SCF and ETDM methods, respectively.

The reason for the lack of convergence of SCF and the slow convergence of ETDM in the five problematic cases could be the presence of nearby saddle points or near-degenerate higher energy states. In the SCF calculations, the orbitals obtained from the diagonalization of the Hamiltonian matrix at subsequent iterations can 'jump' between different energy surfaces or oscillate around a saddle point. Analogous convergence issues for the DIIS method have been reported for the G2 molecular set and transition metal complexes [15].

Figure 1: (a) Number of SCF iterations and energy/gradient evaluations in the exponential transform direct minimization needed to reach convergence according to the criterion of Eq. (48) for a representative set of 10 molecules from the G2 set.
(b) Energy/gradient evaluations in the exponential transform direct minimization for the molecules for which the SCF method failed to converge.

Table 1: Comparison of the performance of the exponential transform direct minimization, ETDM, and self-consistent field, SCF, methods for the G2 set of molecules (a total of 148 molecules). The average number of energy and gradient evaluations is reported for the former method, but the average number of energy and diagonalization calculations for the latter (in both cases denoted e/g(d)). In the column labeled ETDM*, the five molecules for which the SCF calculations did not converge are excluded.

                        SCF    ETDM    ETDM*
    average e/g(d)       18      17       16
    min e/g(d)           12       6        6
    max e/g(d)           26      72       25
    did not converge      5       -        -

For these small molecules, the evaluation of the matrix exponential and energy gradient, i.e. the diagonalization of the Hamiltonian matrix, is not the dominant computational effort. The various algorithms presented in Sec. 3.4 therefore involve similar computational effort.

As examples of extended systems subject to periodic boundary conditions, calculations have been carried out for five crystalline solids: NaCl, NaF, LiCl, LiF and MgO. A cubic unit cell consisting of 8 atoms is chosen, along with a Γ-centered 3 × 3 × 3 sampling of the Brillouin zone.
Figure 2: Number of SCF iterations and energy/gradient evaluations in the exponential transform direct minimization needed to reach convergence according to the criterion of Eq. (48) for NaCl, NaF, LiCl, LiF and MgO crystals.
The number of iterations required to reach convergence is presented in Fig. 2. The results show that the ETDM and the SCF algorithms have a similar rate of convergence for these systems. This is an important test of the preconditioner given in Eq. (38) and shows that it is suitable for solids as well as molecules.

Tests were also carried out for another set of extended systems representing snapshots of liquid water. The systems contain 32, 64, 128, 256, 384 and 576 water molecules subject to periodic boundary conditions. The efficiency of the two approaches for evaluating the matrix exponential in the ETDM method discussed in Sec. 3.4 is compared, also in relation to SCF, and reported in Fig. 3. One of the approaches is based on Eq. (42) and makes use of the fact that the energy is invariant with respect to unitary rotations of the occupied orbitals. In this case, the computation of the matrix exponential requires diagonalization of an N × N matrix and involves less computational time than the SCF algorithm, where the first N eigenvectors of an M × M Hamiltonian matrix need to be calculated. The other approach, the scaling and squaring algorithm of Eq. (43), is more general and does not rely on the parameterization of the skew-Hermitian matrix given in Eq. (41). For dense matrices, this approach is generally slower than the one based on the eigendecomposition of the skew-Hermitian matrix, Eq. (40), but for sparse matrices it can outperform the eigendecomposition approach. The energy gradient is calculated according to Eq. (47).

The ratio of the CPU time required by calculations using the SCF method to that of the ETDM method is shown as a function of the number of water molecules in Fig. 3. When the matrix exponential is evaluated using Eq. (42), the ETDM method outperforms SCF by a factor of two if more than 200 water molecules are included in the system. Also, the more general implementation of ETDM based on the scaling and squaring algorithm outperforms SCF, by about 20% for the largest system.
Figure 3: Ratio of the CPU time used by the SCF method to that of the exponential transform direct minimization, ETDM, method based on either the scaling and squaring algorithm, Eq. (43) (ss, red curve), or the evaluation of the matrix exponential by diagonalization, Eq. (40) (uinv, blue curve), as a function of the number of water molecules in liquid configurations subject to periodic boundary conditions. For the largest system, the direct minimization based on matrix diagonalization outperforms the SCF method by a factor of two, while the implementation based on the scaling and squaring algorithm is 20% faster than SCF.
5. Discussion and Conclusion
The main advantage of the ETDM implementation presented here, based on a general preconditioner, the L-BFGS algorithm and an inexact line search, is robustness. For small molecules the computational effort is similar to the standard SCF approach when the latter converges, but ETDM is found to converge for all the molecules in the G2 set with the same set of parameter values, a set that also works for extended liquid configurations and insulating solids. This demonstrates the transferability of the ETDM algorithm as implemented here. For the large systems considered here, liquid water configurations with 200 and up to 576 molecules, ETDM outperforms the SCF method by up to a factor of two when the special parametrization of the skew-Hermitian matrix is used, and by around 20% when the more general scaling and squaring method is used. The latter is more general and can be applied to any type of orbital dependent energy functional, such as self-interaction corrected functionals [23].

The ETDM method involves minimization of the energy with respect to the elements of a skew-Hermitian matrix and, therefore, the number of degrees of freedom scales as M^2, where M is the number of basis functions. However, for energy functionals that are unitary invariant with respect to the occupied orbitals, the skew-Hermitian matrices can be parametrized using N × (M − N) degrees of freedom [12], where N is the number of occupied orbitals. Therefore, taking into account the sparsity of the matrices, the algorithm can be implemented in such a way that the computational effort scales as O(NM^2). The scaling and squaring algorithm for evaluating the matrix exponential is not as efficient but is more generally applicable and can still outperform the SCF method, as was found for the large liquid water configurations.

Future work will involve generalization of the ETDM method to finite temperature KS-DFT, i.e.
thermal smearing, where an additional inner loop forvariational optimization of the occupation numbers is included, analogous tothe direct minimization method used in ensemble DFT [13]. This is neededfor calculations of metallic systems. A more efficient preconditioner could alsolikely be developed, especially for orbital density dependent functionals. Finally,we point out that the ETDM method is also useful in other types of electronicstructure calculations, such as studies of excited states [40, 41].
6. Acknowledgement
The authors thank Gianluca Levi for fruitful discussions and valuable comments on the manuscript. This work was supported by the University of Iceland Research Fund and the Icelandic Research Fund (grant no. 174082-053). AVI is supported by a doctoral fellowship from the University of Iceland. HJ, AVI and EÖJ thank the Department of Energy Conversion and Storage at the Technical University of Denmark for hospitality during an extended visit and access to computational resources.