On the Nature of Self-Consistency in Density Functional Theory

Nick Woods
Clare College

Theory of Condensed Matter, Cavendish Laboratory
The University of Cambridge

A thesis submitted in fulfilment of the requirements for the degree of Master of Philosophy in the Centre for Scientific Computing

Supervisors: Dr. Phil Hasnip, Prof. Mike Payne

This dissertation is substantially my own work and conforms to the University of Cambridge's guidelines on plagiarism. Where reference has been made to other research this is acknowledged in the text and bibliography.

Abstract

Theory of Condensed Matter, Cavendish Laboratory
On the Nature of Self-Consistency in Density Functional Theory, by Nick Woods
Density functional theory (DFT) in the Kohn-Sham (KS) framework presents a non-linear eigenvalue problem whereby, in order to solve the system, the particle density one calculates from the KS equations must be equal to the particle density one uses to construct the KS equations. Such a solution defines what it means to attain self-consistency. This work studies the methods used to achieve and accelerate convergence toward a self-consistent solution of KS DFT. That is, the extent to which one can utilise prior information about an input (and the framework itself) to minimise the time spent in going from an initial guess solution to a self-consistent solution. The ideas and techniques presented here are general and (mostly) implementation independent. However, this work will focus specifically on self-consistency within the castep software – a planewave, pseudopotential implementation. Density (potential) mixing is the name formally given to the process one uses to combine iterative particle densities (potentials) in order to produce an optimal estimate for the subsequent particle density (potential) such that self-consistency is achieved. Density mixing is studied from two perspectives. First, from a mathematical perspective, where KS DFT is simply treated as a featureless non-linear system – a 'black-box'. Second, from a physical perspective, where the linear response of KS DFT is studied in an attempt to augment the above-mentioned purely mathematical treatment; this defines preconditioning. The former perspective leads to the implementation of two recently proposed mathematical methods within castep: Marks & Luke (2011) and Banerjee et al. (2015). Both of these methods are examined utilising a test suite of over fifteen representative input systems, and neither was found to provide a significant, systematic improvement over the default methods in castep. The latter perspective involves the implementation of an adaptable framework allowing one to posit a model for the inhomogeneous KS dielectric in either real or Fourier space, which is in turn used to precondition the self-consistent cycles. The scope for improvement with such a framework in place is discussed, and simple models utilising the framework are analysed with the aforementioned test suite.

Acknowledgements

I would first like to thank Phil Hasnip, my supervisor, for countless hours of invaluable help and guidance, despite the difficulties caused by remote supervising. Matt Smith, for many fun discussions in our (now 1% completed) quest to understand castep, particularly in helping me understand castep's parallelism, and for proofreading the thesis. Andrew Morris and his group, for kindly letting me sit in on (and occasionally contribute to) their group meetings, and for providing me with an application project to run alongside the development. Finally, Mike Payne and Michael Rutter, for help throughout the year with castep, computing, and logistical issues.
Contents

Acknowledgements
1 Introduction
A Derivation of Newton's Method
B Test Suite
References

1 Introduction
The term ab initio , meaning ‘from the beginning’, is used in physics to refer to thecurrent established level of physical theory – that is, quantum mechanics . Within theframework of quantum mechanics, the natural world is characterised completely by aquantum state obeying a certain mathematical structure. However, predicting all facetsof the natural world from this mathematical structure is a near-impossible task, definingso-called emergence . Analytic solutions for the quantum state are rare, and numericalsolutions come up against the (in)famous exponential wall of computational complexity[1]. Finding a computationally tractable method for extracting exact quantum mechan-ical predictions has been the source of much work across many scientific disciplines,e.g . quantum chemistry [2], materials science [3], nuclear physics [4], and particilarlyfor the following work, solid state physics [5]. Density functional theory (DFT) withinthe Kohn-Sham (KS) framework has emerged as the most widely used method for per-forming ab initio calculations [6, 7]. The success of KS DFT is, in part, due to its vastdomain of applicability, particularly in predicting ground state properties of matter suchas stable phases and cell parameters [8]. Moreover, the construction and implementationof KS DFT allows these properties, and more, to be accessed in a feasible timeframe forsystems containing over one thousand constituents [9]. Efficient numerical implementa-tion of KS DFT is therefore of paramount importance, and is also the focus of the workto follow.The equations required to solve KS DFT define a non-linear eigenvalue problem . Thatis, one first solves the so-called KS equations,ˆ H ks [ ρ ( r )] φ i ( r ) = (cid:15) i φ i ( r ) , (1.1) (cid:90) d r φ ∗ i ( r ) φ j ( r ) = δ ij , (1.2)1. Introduction linear eigenvalue problem, where { φ i ( r ) } are the single particlewavefunctions , and the spectrum of the KS Hamiltonian defines the corresponding singleparticle energies. For a system containing N constituents, only the N lowest energyeigenfunctions and eigenvalues of the KS Hamiltonian need be calculated, leading to a particle density defined via ρ ( r ) = N (cid:88) i =1 | φ i ( r ) | . (1.3)This particle density, however, determines the form of the KS Hamiltonian in Eq . (1.1),the eigenfunctions of which are in turn used to define the particle density. A set of singleparticle wavefunctions is thus required that satisfies both Eq . (1.1) and Eq . (1.3) simulta-neously, at which point the solution is said to be self-consistent . This work will focus onthe algorithms and strategies employed to determine this self-consistent solution, oftenreferred to in literature as the self-consistent field (SCF) procedure [10]. Starting froman initial estimate of the particle density (or equivalently, the single particle wavefunc-tions), the SCF process defines an iterative sequence { ρ j | j ∈ N , j ∈ [1 , n ] } such thatthis sequence converges to the self-consistent solution, ρ n = ρ ∗ . The optimal approachto generating this sequence as robustly and efficiently as possible for an arbitrary inputto KS DFT is unknown. Contemporary strategies to achieve convergence do so fromtwo perspectives. First, by utilising various methodologies from the theory of non-linearsystems, where no explicit reference is made to the underlying mathematical structureof KS DFT. Second, by analysing the linear response of KS DFT to precondition thesemethodologies. 
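For reference, the self-consistency requirement just described can be summarised compactly (a restatement of Eqs. (1.1)–(1.3) in the notation used throughout, not additional material):

Ĥ_ks[ρ*] φ_i(r) = ε_i φ_i(r),    ρ*(r) = Σ_{i=1}^{N} |φ_i(r)|²,

i.e. the density used to construct the KS Hamiltonian must coincide with the density constructed from its N lowest-energy eigenfunctions.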
The purpose of this thesis is to provide a comprehensive review of thedifficulties involved in attaining a self-consistent solution to KS DFT. Following this,various extensions to the current methods utilised in many KS DFT codes for achievingself-consistency will be considered and tested in the castep software [11]. The majority of time spent on the work presented in this thesis was in understandingthe mathematical and conceptual foundations of both the numerical analysis and linearresponse theory involved in obtaining a self-consistent solution to KS DFT. As such, thethesis is constructed to reflect this, where an outline of the contents is as follows.Chapter 2 is a pedagogical introduction to the fundamental concepts required to followthe theory used in the remainder of the thesis. First, § . the exponential increase in complexity inherent within the standard,first-quantisation formulation of quantum mechanics. Following this, DFT within the. Introduction § castep will be briefly reviewed in such a way thatdemonstrates the scope for improved methodology in achieving self-consistency. Then, § density mixing . Furthermore, the strategies used to precon-dition density mixing schemes utilised in state-of-the-art electronic structure codes willbe presented. This involves an investigation into the linear response of KS DFT, andhow this linear response can be used to stabilise and accelerate the SCF iterations.Chapter 3 will begin by describing the technical considerations on the implementationof density mixing within castep . Then, the conceptual and mathematical foundationsbehind three distinct attempts at improving density mixing will be detailed. First, adensity mixing scheme based on the work Marks & Luke in Ref . [12] will be derived.Crucially, this derivation deviates somewhat from the original work of Marks & Lukedue to inherent differences between castep and the native KS DFT software of Marks& Luke, wien2k [13]. Second, the Periodic Pulay scheme presented by Banarjee et al .in Ref . [14] will be described. Finally, the linear response theory presented in § inhomogeneous model of the dielectric response of a general KS system. Thisframework will take an inhomogeneous and local model of the KS susceptibility as input,and subsequently calculate the dielectric response in such a way to precondition the SCFcycles. Two preliminary, physically-motivated model forms of the KS susceptibility willthen be presented in order showcase this framework.Chapter 4 will present the outcome of testing on the above-mentioned three proposedimprovements to density mixing. First, the chapter will begin by defining what anexactly an ‘improvement’ constitutes in the present context, and moreover the degreeof improvement that can be realistically expected will be discussed. Following this, atest suite of representative KS DFT input systems will be presented and motivated.This test suite is designed such that definitive conclusions can be drawn regarding theeffectiveness of a given proposed improvement. Furthermore, individual systems fromthe test suite will be isolated in order to study the convergence patterns of the presentedmethods in more detail. This individual analysis will highlight the key advantages and. Introduction heory
The problem of achieving self-consistency in KS DFT is a multifaceted one, which spanssubdisciplines such as numerical analysis of linear and non-linear systems, linear responsetheory, and electronic structure theory (not just limited to KS systems). This chapterwill detail why these are all necessary ingredients in solving KS DFT. To do this, onemust start with a statement of the overarching problem, which is calculating the elec-tronic structure of a system using quantum theory. The necessity of a framework like KSDFT will then become apparent, and following this, the background of (KS) DFT willbe reviewed. This review will end by isolating the specific subsection of KS DFT whichthe following work pertains to – self-consistency and density mixing. Here, a clear anddetailed statement of the problem to be solved will be provided. Finally, a discussion ofthe numerical methods employed within DFT software to achieve self-consistency, andhow one can precondition these methods, will follow.
The principles of quantum mechanics, developed over the course of approximately twodecades in the early twentieth century by some of history’s greatest physicists, providesthe physical theory underpinning the behaviour of fundamental particles. This includessystems containing the particles that make up everyday matter – protons, neutrons, andelectrons. Phenomena arising from the collective behaviour of these three constituentsis dubbed condensed matter physics, and this emergent behaviour makes up much ofthe rich and varied natural world around us. Within quantum mechanics, the completedescription of a condensed matter system is given by the quantum state | Ψ (cid:105) , obeyingthe dynamical equation of motion, ˆ H | Ψ (cid:105) = − i∂ t | Ψ (cid:105) , (2.1)5. Theory . Us-ing eigenstates of the position operator as a basis leads to a Hamiltonian of the formˆ H = − m ∇ + v ( r ) , (2.2)now operating on a wavefunction ψ ( r ), the position basis expansion of the quantum state | Ψ (cid:105) . Given the representation in Eq . (2.2), one can now begin to construct a Hamil-tonian operator which describes a many-body interacting system of charged nuclei andelectrons. The potential function will simply describe the Coulomb interactions betweenall constituents of the system. The Coulomb interaction decays proportional to inverseof distance, therefore a system of interacting nuclei (Greek indices) and electrons (Latinindices) with masses, charges and positions { M µ , Z µ , R µ } and { m e , e, r i } respectively isdescribed by the Hamiltonian,ˆ H = − (cid:88) i ∇ i − (cid:88) µ M µ ∇ µ − (cid:88) i,µ Z µ | r i − R µ | + 12 (cid:88) µ (cid:54) = ν Z µ Z ν | R ν − R µ | + 12 (cid:88) i (cid:54) = j | r i − r j | , (2.3)where now, and hereafter, atomic units are used, m e = e = (cid:126) = 4 π(cid:15) = 1. Thisexpression includes (from left to right): the kinetic energy operators of the electrons andnuclei, the electron-nuclear, nuclear-nuclear, and electron-electron Coulomb interactions.The first simplification in finding a solution to the Schr¨odinger equation is to notice thatit is linear in time, meaning the solution is separable in the following way,Ψ( { r i } , { R µ } ; t ) = ˜Ψ( { r i } , { R µ } )Θ( t ) , (2.4)Θ( t ) = e − iEt . (2.5)This allows time to be included explicitly, and one can focus on finding a stationarystate solution to the time-independent Schr¨odinger equation,ˆ H | Ψ (cid:105) = E | Ψ (cid:105) . (2.6)One is left now with a quantum state, | Ψ (cid:105) , describing a composite system as an eigen-state of the Hamiltonian operator, living in a Hilbert space, H , whose dimension ismultiplicative with respect to the number of constituents. This is an important fact,and a quick example can show why the non-separable nature of the Hamiltonian via theColumb interaction vastly increases the complexity of the problem. Imagine two single This equation in fact does apply relativistically, but the form of the Hamiltonian operator necessarilychanges in accordance with relativity. . Theory | φ (cid:105) , | ψ (cid:105) ∈ H (2.7)representing identical particles. Each can be expressed in an infinite (complete) ba-sis {| n (cid:105) | n ∈ [1 , ∞ ] } that spans H . Computationally, one can truncate the basis, {| n (cid:105) | n ∈ [1 , b ] } , in such a way that it monotonically approaches the true Hilbert spacewith increasing b (for example, using planewaves). 
Here, b is the total number of ba-sis states used in the finite representation, and thus parametrises the accuracy of therepresentation. In an interacting system of two identical particles, | Ψ (cid:105) ∈ H composite = H ⊗ H , (2.8)the dimension of the truncated composite Hilbert space will be b . This reflects the factthat all possible combinations of the individual single particle bases are now requiredto span the composite space. Hence, if a computist finds b = 10 basis states providesa reasonable accuracy for resolving the individual single particle states, the solution oftheir composite system to the same accuracy will have an upper bound requirementof 10 basis states. In reality, indistinguishability (i.e . the same basis is used for eachconstituent) and the exclusion principle mean the complexity increases combinatoriallyin number of constituents as b C N . If the above { N = 1 , b = 10 } calculation took, forexample, 60 seconds, one might expect a { N = 20 , b = C
20 100 } calculation would takesomething of the order of 10 seconds, which is a number perhaps best expressed inunits of universe ages (see Fig . (2.1)).
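To make the order-of-magnitude argument concrete, the short Python sketch below evaluates the combinatorial estimate. The extrapolation model (wall-clock time scaling linearly with the combinatorial basis size, normalised to the 60-second one-particle reference) and the value taken for the age of the universe are illustrative assumptions of this sketch, not figures from the text.

from math import comb

# Toy extrapolation (assumptions mine): time scales linearly with the combinatorial
# basis size bCN, normalised so that the {N = 1, b = 10} calculation takes 60 seconds.
t_ref, dim_ref = 60.0, comb(10, 1)       # one-particle reference calculation
dim_big = comb(100, 20)                  # composite basis size for {N = 20, b = 100}
t_big = t_ref * dim_big / dim_ref        # extrapolated wall-clock time in seconds

universe_age_s = 4.35e17                 # approximate age of the universe in seconds
print(f"100C20 = {dim_big:.2e}")
print(f"estimated time = {t_big:.2e} s  (~{t_big / universe_age_s:.1e} universe ages)")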
Figure 2.1: [Plot: time to complete calculation in seconds (y-axis, marked with reference durations ranging from "make a cup of tea" and "a game of Starcraft" through "done by the end of an MPhil" and "done by the end of a PhD" to "finished now if started at the advent of the Bronze Age") against the number of particles N = 1–10 (x-axis).]
Scaling of the many-body problem: the y-axis represents the combinato-rial scaling t ∼ O ( b C N ) with a prefactor for the one electron case of about 60 seconds. . Theory { r i } ) = N (cid:89) i =1 φ i ( r i ) , (2.9)where each φ i ( r i ) solves its own separate Hamiltonian, requiring a basis resolution b . Thetotal composite solution to a given single particle accuracy now only requires N b basisfunctions – the effective dimension of the composite Hilbert space has become additivewith respect to particle number. This defines a system of so-called ‘non-interacting’particles, which is a core concept behind KS DFT, and many other electronic structuremethods.The first simplifying approximation usually made to Eq . (2.3) in an attempt to reducethe complexity is called the Born-Oppenheimer approximation. This approximationprincipally functions by noticing that 1 /M µ in the nuclear kinetic energy is small relativeto 1 /m e = 1 in the electronic kinetic energy. This leads to a weak coupling betweenthe nuclear and electronic states. To elaborate, under certain conditions, the electronicstate can be assumed to evolve adiabatically to sufficiently small changes in the nuclearcoordinates . Here, ‘sufficiently small changes’ refers to the aforementioned observationthat the kinetic energy of the nuclei is far lower than that of the electrons, where boththe nuclei and electrons are subject to a Coulomb force of similar magnitude. Hence,changes in the nuclear state will be small relative to the evolution of the electronic state.That is, the electronic state will equilibrate to the electronic ground state for a givennuclear state before the nuclear state can mathematically couple to this equilibration.This defines adiabaticity, i.e . to a good approximation the complete state of the systemis determined as a weighted combination of continuously connected electronic groundstates over all sets of nuclear coordinates – the potential energy surface . As Ref . [1]details, the Born-Oppenheimer approximation breaks down when these potential energysurfaces are not sufficiently separate. That is, the potential energy surface correspondingto the electronic ground state should be sufficiently separate to the potential energysurface of the first excited state, which should be sufficiently separate to the potentialenergy surface of the second excited state, and so on. This is often the case, and leadsto a complete decoupling of the electronic and nuclear states as their contributions tothe Hamiltonian under this approximation are entirely additive with respect to eachother. Thus, the electron-nuclear interaction term is replaced by an electron moving in Moreover, the nuclear coordinates can be, and often are, taken as the coordinates of classical pointcharges. This is because the nuclear state is extremely localised, allowing the nuclei to be treatedclassically by placing a point charge at the centre of the localisation. . Theory
9a background potential parametrised by the nuclear coordinates. The full wavefunctionnow takes the form, Ψ( { r i } , { R µ } ) = Ψ elec ( { r i } )Ψ nuc ( { R µ } ) , (2.10)where the electronic wavefunction now solves its own electronic Hamiltonian parametrisedby the set of nuclear coordinates { R µ } ,ˆ H elec = − (cid:88) i ∇ i + 12 (cid:88) i (cid:54) = j | r i − r j | + (cid:88) i v ext ( r i ) , (2.11) v ext being the parametrised term, the external potential. Solving the Schr¨odinger equa-tion using this Hamiltonian defines what it means to determine the electronic structureof a system. In this sense, the Born-Oppenheimer approximation is responsible for thename of the subdiscipline electronic structure.A ‘system’ is now entirely specified by a given external potential and electron number.However, as it stands, one interaction term remains – the electron-electron Coulombinteraction – and dealing with this term has been the source of much work within the fieldof many-body physics. Some of the most popular methods to do this utilise the concept ofa mean-field in which independent (or non-interacting) particles respond. The mean-fieldpotential is constructed such that it attempts to mimic the behaviour of the interactionterm; KS DFT and Hartree-Fock (HF) theory are mean-field theories. This regime hasalso seen success as a post-DFT method for treating electron correlation with a techniquecalled dynamical mean-field theory (DMFT) [15]. Mean-field theories have historicallybeen most successful in describing ground state structural properties of matter, e.g. cellparameters [5]. For excited state (optical) properties however, many-body perturbationtheory has been shown to provide a better treatment of the relevant physics [16]. Forexample, the GW approximation using Hedin’s equations [17], and the Bethe-Salpeterequation (BSE) [18]. To conclude, there are many methods available that attemptto treat the true interacting nature of a system within a computationally tractableframework, each finding success in its own domain of applicability [19–21]. However, dueto its relative accuracy and transferability for the computational cost required to solve it,KS DFT has emerged as the most widely used method in electronic structure theory. Italso provides the starting point for many of the post-DFT perturbative methods quoted,such as DFT+GW and DFT+GW+BSE [16].. Theory The core concepts behind DFT are quite separate from the approximations one makesin order to implement it. In isolation from these approximations, DFT is an exactreformulation of quantum mechanics based on two astonishing theorems. These theoremsstate that, in principle, no information about a quantum system in the lowest energystate of a Hamiltonian is lost by operating on the state such that one retrieves theelectron density ρ ( r ) = N (cid:90) N (cid:89) i =2 d r i | Ψ( r , r , . . . , r N ) | . (2.12)This quantity is a measure of the probability of any electron being found at a position r ,where here, and throughout the remainder of the thesis, spin will be ignored (althougha note on spin in relation to the following work will be given in § N degrees of freedom in the complex-valued wavefunction. HK Theorem I.
The ground state electron density for a system of interacting electrons is uniquely determined by the external potential.
HK Theorem II.
As a result of HK Theorem I, a universal (non-system-dependent) functional of the electron density can be defined such that this functional is minimised for the exact ground state electron density.
These proofs are formulated under the requirement that the wavefunctions be non-degenerate, and the electron density is so-called v -representable. This is a problem,as it is not known if an arbitrary function f ( r ) can be represented as a ground statesolution of the Schr¨odinger equation with an external potential v ext . Hence, it may bepossible that an (appropriately defined) energy functional can be minimised by a spu-rious non- v -representable electron density. Fortunately, the v -representability problem,and the non-degeneracy requirement, were solved by the alternate proof of Levy andLieb [23]. Instead, a functional is defined such that a constrained search over all ‘ N -representable’ electron densities yields a unique minimum (for the ground state density).Unlike v -representibility, N -representibility is not a problem as it has been proven thatan arbitrary (differentiable) function f ( r ) can be represented by some wavefunction us-ing Eq . (2.12) [24] (so spurious minima in the energy functional are avoided). Neitherproof will be presented here, but a vast literature discussing both is available [25].. Theory
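For concreteness, the Levy–Lieb constrained search referred to above can be written in its standard form (paraphrased here, not quoted from the cited references) as

F_LL[ρ] = min_{Ψ→ρ} ⟨Ψ| T̂ + V̂_ee |Ψ⟩,    E_0 = min_ρ ( F_LL[ρ] + ∫ dr v_ext(r) ρ(r) ),

where the inner minimisation runs over all antisymmetric wavefunctions yielding the density ρ via Eq. (2.12), and the outer minimisation runs over all N-representable densities.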
In order to develop a practical approach to DFT, one needs to define a total energyfunctional E [ ρ ( r )] such that the minimum of this functional provides the exact (or a goodapproximation to the exact) interacting ground state energy. The ansatz of Kohn andSham was formulated by envisioning a non-interacting (and therefore computationallysoluble) system of electrons in an effective potential v eff . The question now is whethersuch a v eff exists that can uniquely reproduce the exact interacting ground state energyand density. Kohn and Sham showed this to be the case, and the mapping betweenthe non-interacting and interacting systems defines the famed auxiliary KS system ofelectrons.An outline of the KS framework is as follows. Under the assumption that electronsare non-interacting, the many-body wavefunction takes the form of a product state:Ψ( { r i } ) = N (cid:81) i =1 φ i ( r i ) (or alternatively a Slater determinant). The wavefunctions { φ i } are often referred to as ‘single particle orbitals’. As a result of the exclusion principleallowing only one fermion per single particle eigenstate of the Hamiltonian, there exista total of N single particle orbitals (foregoing for now partial occupancies). The KSsystem is thus defined as (cid:16) − ∇ + v eff ( r ) (cid:17) φ i ( r ) = (cid:15) i φ i ( r ) , (2.13) ρ ks ( r ) = (cid:88) i ∈ occupied | φ i ( r ) | , (2.14) E ks ∼ (cid:88) i ∈ occupied (cid:15) i , (2.15). Theory . (2.13)represents the kinetic energy of a non-interacting particle. This defines the most generalnon-interacting form of the time-independent Schr¨odinger equation. One now seeks toprove that a v eff exists such that ρ ks ( r ) = ρ int ( r ) = N (cid:90) N (cid:89) i =2 d r i | Ψ( r , r , . . . , r N ) | , (2.16) E ks = E int = (cid:90) N (cid:89) i =1 d r i Ψ ∗ ( { r i } ) ˆ H int Ψ( { r i } ) . (2.17)To do this, one defines the corresponding energy functional of the system in Eqs . (2.13)- (2.15) as E ks [ ρ ] = T [ ρ ] + (cid:90) d r v eff ( r ) ρ ( r ) , (2.18)where T [ ρ ] is the kinetic energy functional of independent particles. The energy func-tional of the fully interacting system is E int [ ρ ] = F [ ρ ] + (cid:90) d r v ext ( r ) ρ ( r ) , (2.19) F [ ρ ] = (cid:104) Ψ gs | T + v ee | Ψ gs (cid:105) . (2.20)Here, T is the kinetic energy operating on the fully interacting system, and v ee is theCoulomb potential operator. To prove that such a v eff exists, one can check thereexists a density which is the stationary point of both functionals, Eq . (2.18) and Eq . (2.19), simultaneously under some definition of v eff , as will now be shown. Note that indetermining the stationary points, a constrained variation is needed – the ground stateelectron density is required to satisfy (cid:90) d r ρ ( r ) = N. (2.21)Therefore, the variation is performed using Lagrange multipliers, which leads to δE ks [ ρ ] δρ = δT [ ρ ] δρ + v eff + λ NI = 0 , (2.22) δE int [ ρ ] δρ = δF [ ρ ] δρ + v ext + λ I = 0 , (2.23)with λ I/NI being the interacting and non-interacting Lagrange multipliers. These caneffectively be dropped here, as the behaviour of the variation can be defined up to aconstant without loss of generality. Clearly, equating Eqs . (2.22) and (2.23) yields an. Theory v eff = δF [ ρ ] δρ + v ext − δT [ ρ ] δρ . 
(2.24)The ansatz of Kohn and Sham has been proven: it is possible to define an auxiliary non-interacting system which exactly reproduces the behaviour of the interacting system .The universal functional F [ ρ ] remains unknown, and to utilise this mapping in practicea computationally tractable approximate form is required. Kohn and Sham proposed anapproximation analogous to Hartree theory. The electron-electron interaction is replacedwith a mean-field ‘Hartree potential’, and kinetic energy operations are performed onthe non-interacting orbitals, rather than the many-body wavefunction , F [ ρ ] = − (cid:90) d r φ ∗ i ( r ) ∇ i φ i ( r ) + 12 (cid:90) d r d r (cid:48) ρ ( r ) ρ ( r (cid:48) ) | r − r (cid:48) | + E xc [ ρ ] . (2.25)The final term here – the exchange-correlation energy – is defined as the difference be-tween the ground state energy of the Hartree system and that of the exact interactingsystem. It is named as such because, so far, no explicit reference has been made to thefermionic nature of the system in relation to the statistics it obeys. That is, the many-body wavefunction changes sign upon exchange of any two electrons, and is thereforean antisymmetric function with respect to its arguments. This has the important con-sequence of lowering the total energy by tending to separate electrons of like spin, andthereby lowering the total energy by an amount E x , the exchange energy . The exchange-correlation energy is simply the sum of the exchange energy and the ‘correlation energy’.Therefore, correlation (an accurate, computationally tractable treatment of which is theholy grail of electronic structure theory) is defined as the difference in energy betweenHartree-Fock theory (Hartree theory with an exact treatment of exchange) and the fullyinteracting theory. Correlation then necessarily includes a contribution from the factthat the independent particle kinetic energy, obtained from the kinetic energy operatoron the independent particle states, is not the same as the kinetic operator on the inter-acting wavefunction, which one no longer has access to. This difference is a significantcontributor to correlation, as the Hartree potential, by virtue of being a mean-field,provides an exact description of the Coulomb potential felt by a test charge due to itssurroundings . The problem of v -representability has resurfaced, however. It is not possible to say that any electrondensity resulting from a given external potential in the interacting system will be expressible as a densityfrom some external potential in the non-interacting system (and vice versa). This is a problem yet tobe solved, but is mostly benign [27]. Pre-KS, the kinetic energy of the interacting system was often approximated using Thomas-Fermitheory, defining the first practical density functional [28]. Although this is true for a test charge , it is not true for constituents of the KS system. That is, theHartree mean-field is constructed from the total electron density , meaning KS constituents interact withthemselves through this Hartree term. . Theory . (2.25) into Eq . (2.24) finally provides a systematic method of approach-ing the electronic structure problem within a computationally feasible framework, v eff ( r ) = (cid:90) d r ρ ( r ) | r − r (cid:48) | + v ext ( r ) + δE xc [ ρ ( r )] δρ ( r ) , = v h + v ext + v xc , (2.26)which is used to construct the KS Hamiltonian, solving the KS equations Eq . (2.13). 
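As an illustration of how Eq. (2.26) is assembled in practice, the sketch below builds v_eff on a periodic cubic real-space grid: the Hartree term is obtained by solving Poisson's equation in Fourier space, and an exchange-only LDA stands in for the full exchange-correlation potential. The function name and grid conventions are illustrative choices, not castep's internals.

import numpy as np

def effective_potential(rho, v_ext, cell_length):
    """Assemble v_eff = v_H + v_ext + v_xc on a periodic cubic grid (cf. Eq. 2.26).

    Sketch only: Hartree term from Poisson's equation in Fourier space, and an
    exchange-only LDA in place of a full exchange-correlation functional.
    """
    n = rho.shape[0]
    g = 2.0 * np.pi * np.fft.fftfreq(n, d=cell_length / n)      # reciprocal-space grid (a.u.)
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2

    rho_g = np.fft.fftn(rho)
    g2[0, 0, 0] = 1.0     # avoid dividing by zero at G = 0 ...
    rho_g[0, 0, 0] = 0.0  # ... whose component is cancelled by the neutralising background
    v_h = np.fft.ifftn(4.0 * np.pi * rho_g / g2).real            # v_H(G) = 4*pi*rho(G)/G^2

    v_x = -(3.0 * rho / np.pi) ** (1.0 / 3.0)                    # exchange-only LDA potential
    return v_h + v_ext + v_x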
Theexchange-correlation potential is the only term left requiring approximation, acting toparametrise the degree of ignorance about the correlated nature of electrons. It is worthnoting here that there is a certain arbitrary nature with which the exchange-correlationpotential was constructed. That is, as a functional explicitly defined to satisfy a (non-physical ) mathematical mapping between interacting and non-interacting systems. Assuch, one loses a degree of physical meaning within the non-interacting KS system. Adirect consequence of this is that all but one of the individual energy eigenvalues ofthe KS system carry no formal physical meaning (other than their sum being the totalenergy, minus double counting). The eigenvalues have meaning within the frameworkitself (see, Janak’s theorem [29]), but the only eigenvalue with true physical meaning (infinite systems) is the highest energy occupied one, equal to the ionisation energy [30].Despite these problems, it is an astonishing fact that for many materials the exchange-correlation term accounts for a quite a low percentage of the overall ground state energy.Consequently, ground state properties of matter, such as stable phases and cell param-eters, are predicted by KS DFT to within 1% of experiment using only a rudimentarytreatment of exchange and correlation [7]. For example, by using the exact exchange-correlation energy functional of the homogeneous electron gas – the local density approx-imation (LDA) [31]. Increasing the accuracy of KS DFT therefore amounts to finding,in some sense, a ‘better’ exchange-correlation functional to model the system at hand.It could be argued therefore that KS DFT is not an ab initio theory, because it doesnot approach the exact solution in some well defined limit (as, for example, a pertur-bative method would). In this context, the problem of increasing accuracy is explainedwell in Ref . [32], which conceptualises a ‘Jacob’s ladder’ of increasing accuracy, startingfrom no exchange-correlation to exact exchange-correlation – passing through the LDA,gradient-based extensions of the LDA, exact (non-local) exchange, and more. Non-physical, here, is intended to refer to the fact that a potential energy is being constructed inorder to account for a fundamental misrepresentation of the kinetic energy . . Theory The KS equations are restated here as, (cid:16) − ∇ + v ext ( r ) + v h [ ρ in ( r )]( r ) + v xc [ ρ in ( r )]( r ) (cid:17) φ i ( r ) = (cid:15) i φ i ( r ) , (2.27) ρ out ( r ) = (cid:88) i ∈ occupied | φ i ( r ) | , (2.28)assuming a local exchange-correlation potential. In constructing the non-interacting KSsystem of electrons, the fundamental structure of the quantum mechanical equationshas changed due to the introduction of the Hartree and exchange-correlation potentials.That is, the input to the KS equations now depends explicitly on the output non-linearlyvia the electron density. Therefore, in order to even begin solving the above system ofequations, one must estimate an initial electron density to use as input, ρ in . Once aninitial input density has been specified, the (linear) KS equations can be solved Eq . (2.27) – this involves diagonalisation of the KS Hamiltonian in some basis to find theoccupied single particle orbitals. From these single particle orbitals, the output electrondensity, ρ out , is constructed using Eq . (2.28). The input and output densities are (ingeneral) only equal if one has found the ground state electron density that solves theKS system. 
The computational journey one takes, starting from an initial guess electron density and arriving at an electron density that solves the KS system, is precisely what it means to achieve self-consistency. This process is summarised in Fig. (2.2). The above set of equations defines a non-linear system. That is, a map that takes an input ρ^{in} and generates an output ρ^{out} that is non-linearly related to the input,

F[ρ^{in}] = ρ^{out}, (2.29)

where F hereafter defines the KS map. Solving this system amounts to finding a fixed point of the non-linear KS map, ρ* = ρ^{in} = ρ^{out}, which is typically done using an iterative procedure. This iterative procedure acts to define a sequence, {ρ^{in}_1, ..., ρ^{in}_n}, such that ρ* = ρ^{in}_n within some defined tolerance. Ideally, this sequence is generated as robustly and efficiently as possible, which is to say, the sequence will eventually converge, and it does so such that n is minimised. In the literature, this is often referred to as the self-consistent field (SCF) process, where here the self-consistent field is the density – a three-dimensional real scalar field. It should be noted that self-consistency can be equivalently treated by defining a converging sequence of potentials, or wavefunctions; the density has simply been chosen here. The reason is that, within castep, the default method for achieving self-consistency explicitly operates by updating the electron density. This is not required, and indeed a reliable fallback method within castep updates the single particle orbitals directly in such a way that minimisation of the error is guaranteed between iterations. This method is called ensemble density functional theory (EDFT) [33], and is extremely robust due to its variational nature, but not nearly as efficient as rivalling techniques. The default method for achieving self-consistency within castep is called density mixing, and combines the input and output densities at each iteration to estimate a new input density. This will be detailed in a later section. The next section outlines the aspects of in silico DFT that will be utilised in the following work, such as the planewave basis set.

Figure 2.2: Flow chart depicting a self-consistent solution to the KS equations. [Boxes: start from an initial guess ρ^{in}(r); construct the KS Hamiltonian Ĥ_ks[ρ^{in}(r)] = −½∇² + v_eff[ρ^{in}(r)](r); solve the KS equations to obtain the occupied single particle orbitals {φ_i}; calculate the output electron density ρ^{out}(r); does ρ^{out}(r) = ρ^{in}(r)? If no, construct a new ρ^{in}(r) and repeat; if yes, stop: {φ_i} and ρ*(r) = ρ^{out}(r) = ρ^{in}(r) is the solution.]
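The logic of Fig. 2.2 translates directly into a short driver loop. The sketch below is schematic: `ks_map` is a stand-in for the expensive step (building Ĥ_ks[ρ^{in}], diagonalising it, and forming ρ^{out}), and the new input density is formed with the simplest possible update, the damped linear step discussed later in this chapter.

import numpy as np

def scf_cycle(ks_map, rho_init, alpha=0.3, tol=1e-8, max_iter=100):
    """Iterate the KS map F until self-consistency (Fig. 2.2).

    `ks_map(rho_in) -> rho_out` is assumed to solve the linear KS equations for the
    occupied orbitals and return the resulting density; it is not implemented here.
    """
    rho_in = np.asarray(rho_init, dtype=float)
    for n in range(1, max_iter + 1):
        rho_out = ks_map(rho_in)               # solve Eqs. (2.27)-(2.28) for fixed rho_in
        residual = rho_out - rho_in            # R[rho_in] = F[rho_in] - rho_in
        if np.linalg.norm(residual) < tol:
            return rho_in, n                   # self-consistent: rho* = rho_in = rho_out
        rho_in = rho_in + alpha * residual     # construct a new input density (linear mixing)
    raise RuntimeError("SCF cycle did not converge")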
Theory In order to implement a solution to the KS equations, one must first characterise the typeof system one wishes to solve – here, that is systems in the solid state . The concept of acrystal is therefore introduced as an ordered lattice of atoms, invariant under translationby primitive lattice vectors . The set of all integer multiples of the primitive lattice vectorsform the infinite, periodic Bravais lattice,Λ = { n a + n a + n a | n i ∈ Z } . (2.30)One can therefore construct the full Bravais lattice out of two constituents: a primitiveunit cell , and a translation of this unit cell by the set of all integer multiples of theprimitive lattice vectors (the basis). Here, the elements of the Bravais lattice are atomicpositions, and the lattice itself is an approximation to the bulk crystal structure – amethod of reducing an ∼ O (10 ) size system to that of just a primitive cell and threetranslation vectors. In order to fully exploit this symmetry, one can invoke Bloch’stheorem , which vastly simplifies calculations of the wavefunction in periodic systems.
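The duality between the Bravais lattice and the reciprocal lattice exploited by Bloch's theorem (introduced next) can be made concrete in a few lines; the face-centred cubic cell used below is an arbitrary illustrative choice.

import numpy as np

# Primitive lattice vectors as the rows of A (an fcc cell, in units of the lattice constant).
A = 0.5 * np.array([[0.0, 1.0, 1.0],
                    [1.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])

# Reciprocal lattice vectors b_j (rows of B) satisfying a_i . b_j = 2*pi*delta_ij,
# obtained compactly from B = 2*pi*(A^{-1})^T.
B = 2.0 * np.pi * np.linalg.inv(A).T

print(np.round(A @ B.T / (2.0 * np.pi), 12))   # prints the identity matrix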
Bloch's Theorem. If a Hamiltonian is invariant under a translation by R, then the resulting wavefunction takes the Bloch form,

ψ_k(r) = e^{ik·r} u_k(r), (2.31)

where u_k(r) is a function with the periodicity of the Hamiltonian, u_k(r + R) = u_k(r).

The quantity k labels the crystal momentum – a conserved quantity under translation by a 'reciprocal lattice vector'. Like the Bravais lattice, the reciprocal lattice is defined by the set of all integer translations by three reciprocal lattice vectors on a reciprocal primitive cell, and these two lattices are related via a Fourier transform. Therefore, there exists a range of k on the reciprocal lattice in which all information is contained – a Brillouin zone. The first Brillouin zone is a particular type of primitive unit cell on the reciprocal lattice, and is conventionally chosen to be the range of k for which the full wavefunction is calculated. The wavefunction of the infinity of electrons in the Bravais lattice can now be calculated as the wavefunction within the unit cell, multiplied by a phase labelled by k spanning the first Brillouin zone. This infinity has been transferred from an infinity of electrons to an infinite set of continuous values of k that need to be sampled within the first Brillouin zone. Fortunately, the energy (and therefore the wavefunctions) varies smoothly with k [34], allowing a rather coarse sampling of k within the first Brillouin zone to accurately represent the wavefunction of an infinite crystal. In the language of the KS equations, each single particle orbital in a periodic system is a
Bloch function , which possesses an inherent periodicity defined by the primitive lat-tice vectors. Hence, a natural basis to consider would be a complete set of functionswith the same inherent periodicity as the unit cell. This can be done using planewaves ,which is the most popular basis set for electronic structure calculations, defined as u k ( r ) = (cid:88) G C k , G e i G . r . (2.32)The set of allowed planewaves is defined by the constraint G . R = 2 πm for m ∈ N , i.e . the planewaves must have the same periodicity as the unit cell. The full wavefunctionexpanded in the planewave basis now takes the form ψ k ( r ) = (cid:88) G C k + G e i ( k + G ) . r (2.33)for k in the first Brillouin zone. Other than inherent periodiciy and orthogonality, thereare a few distinct advantages of a planewave basis. First, the space of vectors { G } defining the planewaves lies on the reciprocal lattice, thus unlocking the fast Fouriertransform (FFT) as a powerful computational tool to switch between spaces when con-venient. Moreover, there is no bias in the basis set toward a particular input system.Therefore, the accuracy of the representation converges monotonically with increasingnumber of planewaves ordered by increasing magnitude of the allowed G -vectors . But,in being unbiased, the size of the Hamiltonian can become prohibitively large in orderto achieve a certain degree of accuracy . In an attempt to remedy this, one can makethe observation that core electrons – electrons in orbitals that are extremely localisedto the nuclei – contribute little to the observable properties and behaviour of a system[35]. The contribution of core electrons is mostly contained in how they screen the nu-clear potential from the valence electrons . Therefore, one might think of constructingan effective potential that provides the exact behaviour for valence electrons, withoutexplicitly considering core electrons as degrees of freedom in the system. This is called This fact also makes including time-dependence particularly easy compared to system-tailored basissets. Moreover, it becomes difficult to distinguish how much charge is associated with particular nucleiin a bonded system. . Theory pseudopotential approximation . It is particularly useful when utilising a planewavebasis set, as orbitals corresponding to core electrons become increasingly oscillatory,owing to the fact that the core states are constrained to be orthogonal to the valencestates. Oscillatory behaviour in the wavefunction requires many planewaves to representaccurately, so the smoothing as a result of the pseudopotential approximation allows oneto use a much reduced number of planewaves. There are many popular computer pro-grams implementing the above theory: castep [11], vasp [36], abinit [37], quantumespresso [38], and many more.The ensuing discussion will now restrict specifically to the methodology employed by castep . The KS equations expanded in terms of the planewave basis set take the form, (cid:88) G (cid:48) (cid:34) | k+G | δ GG (cid:48) + v ext ( G − G (cid:48) )+ v h ( G − G (cid:48) ) + v xc ( G − G (cid:48) ) (cid:35) C k+G (cid:48) ,b = (cid:15) k+G ,b C k+G ,b , (2.34)where b here labels the distinct single particle orbitals – the band index . The naiveway to solve these equations would be to construct the full matrix, perform an exactdiagonalisation, and use the N lowest energy eigenvectors and eigenvalues to calculatethe electron density and total energy. 
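The structure of Eq. (2.34), and why its 'naive' exact diagonalisation quickly becomes impractical, can be illustrated with a one-dimensional toy model. The potential, cell length and basis size below are arbitrary choices, and no Hartree or exchange-correlation terms are included; a sketch only.

import numpy as np

def planewave_hamiltonian(v_of_x, n_g, cell_length, k=0.0):
    """Naive construction of a KS-type Hamiltonian in a planewave basis (cf. Eq. 2.34).

    One-dimensional toy model: H_{GG'} = 0.5*|k+G|^2 delta_{GG'} + v(G-G'), with v(G)
    taken from the FFT of a local potential sampled on a real-space grid.
    """
    n_grid = 4 * n_g + 2                                  # dense grid to resolve v(G - G')
    x = np.linspace(0.0, cell_length, n_grid, endpoint=False)
    v_g = np.fft.fft(v_of_x(x)) / n_grid                  # Fourier components of the potential

    g_index = np.arange(-n_g, n_g + 1)                    # allowed G = 2*pi*m/L
    g = 2.0 * np.pi * g_index / cell_length
    h = np.diag(0.5 * (k + g) ** 2).astype(complex)       # kinetic term, diagonal in G
    for i, mi in enumerate(g_index):
        for j, mj in enumerate(g_index):
            h[i, j] += v_g[(mi - mj) % n_grid]            # v(G_i - G_j) by periodic indexing
    return h

# Exact diagonalisation is O(N_G^3): fine for a toy basis, impractical when N_G ~ 1e5.
eps, c = np.linalg.eigh(
    planewave_hamiltonian(lambda x: -2.0 * np.cos(2.0 * np.pi * x / 10.0),
                          n_g=16, cell_length=10.0))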
This is impractical, as exact diagonalsation hereis an O ( N G ) operation, where N G is the number of planewaves included in the truncatedbasis set. N G can become as large as 10 , therefore the matrix cannot be diagonalised,or even stored in RAM. Instead, an iterative diagonalisation is implemented in order tocompute only the lowest N b << N G energy eigenvectors and eigenvalues, where N b is thenumber of bands calculated (typically of the order of the number of electrons). Withoutgetting into details, this involves an algorithm (e.g . the Davidson method [39]) whichacts to isolate and compute the N b lowest energy eigenvectors. However, unconstrained,this algorithm would find N b copies of the lowest energy eigenvector, so a constrainedversion is needed. That is, the algorithm is performed subject to the constraint that thewavefunctions remain orthogonal. Therefore, each iterative diagonalisation step requiresan orthogonalisation step in the subspace of the N b lowest energy eigenvectors. Thisis the source of the oft-quoted ‘cubic-scaling’ nature in KS DFT implementations. Inreality, the scaling of KS DFT implementations is not quite that simple . From Ref . [41], the dominant scaling is indeed due to the orthogonalisation, but instead scales as O ( N k N G N b ); where N k is the number of k points one uses to sample the Brillouin And, in fact, not cubic at all until the system size becomes large. There is indeed an O ( N k N b )scaling component to the orthogonalisation procedure (in inverting the band overlap matrix), but thisdoes not become dominant over the scaling quoted until N b ∼ [40]. . Theory .Iterative diagonalisation of the KS Hamiltonian is one of the two major time sinks ina typical KS DFT calculation. Referring back to the flowchart in Fig . (2.2), everytime the input density fails to match the output density, the iterative diagonalisationis executed again for the updated Hamiltonian toward self-consistency. Hence, the wallclock time becomes multiplied by the number of SCF cycles . Accelerating KS DFTtherefore amounts to either improving the strategies one uses to solve the KS equations,or reducing the amount that one needs to solve the KS equations in order to find aself-consistent solution. This work focuses on the latter. The need for a self-consistent density (or equivalently, potential) was outlined in § Density mixing is defined as follows: one seeks a fixed point of the KS map, F [ ρ ∗ ] = ρ ∗ , (2.35)or, equivalently, a root of the residual , R [ ρ ∗ ] = F [ ρ ∗ ] − ρ ∗ = 0 . (2.36) This scaling is dominant over the scaling of the application of the KS Hamiltonian on the trialvectors in the iterative diagonaliser. With a series of clever operations and FFTs, the scaling becomes O ( N k N b N G log N G ). For example, one would apply the kinetic energy operator in Fourier space, and anylocal potential operators in real space, as they are both diagonal in their respective spaces and FFTscan move between them. Assuming a negligible time spent doing density mixing, also excluding initialisation, finalisation,and other sundries. . Theory ρ ∗ by combining the inputand output density at the current iteration in such a way that the subsequent iterativedensity is closer to convergence. In fact, one can use the entire history of densities cycledthrough in the iterative procedure to construct the subsequent density. This leads to a density mixing scheme being the name given to the function, ρ in n +1 = f ( { ρ in i , ρ out i } ) ∀ i ∈ [1 , n ] . 
(2.37)The following work seeks to find the optimal form of f specifically for the KS map F .A vast mathematical literature of numerical methods to solve non-linear systems exists[42], but, there is no such thing as a free lunch, and there is no one particular methodin the literature which universally performs better than others across all applications.Therefore, f will be tailored by certain facts about the KS system. For example, in KSDFT one is often privileged to reasonably accurate initial guess of the electron density(using a sum of appropriately placed atomic orbitals), leading naturally to algorithmsthat perform well with an accurate initial guess. Moreover, as discussed, the number of G -vectors, i.e . the number of elements required to represent ρ , is typically quite large,meaning the algorithm must take particular care to avoid storing prohibitively largeobjects in RAM. The function f should therefore define a robust and efficient scheme,executable with limited memory requirements. With this in mind, the difficulties inapplying some of the ‘naive’ approaches to the SCF process can be outlined. The simplest update to the density one could imagine is to treat the SCF process as afixed point iteration, feeding the output density back in as the input density by com-puting ρ in n +1 = F [ ρ in n ] = ρ out n . This turns out to be unsuitable for a variety of reasons.To elaborate, first note that the initial guess of density is not far from convergence.For analysis’ sake, this allows the operation F to be linearised about ρ ∗ . If one writes ρ in/out n = ρ ∗ + δρ in/out n , the operation F can be Taylor expanded about ρ ∗ as such, ρ ∗ + δρ out n = F [ ρ ∗ + δρ in n ] ≈ F [ ρ ∗ ] + δF [ ρ in n ] δρ in n (cid:12)(cid:12)(cid:12)(cid:12) ρ ∗ δρ in n = ρ ∗ + δF [ ρ in n ] δρ in n (cid:12)(cid:12)(cid:12)(cid:12) ρ ∗ δρ in n . (2.38). Theory δρ in n +1 = δρ out n = δF [ ρ in n ] δρ in n (cid:12)(cid:12)(cid:12)(cid:12) ρ ∗ δρ in n . (2.39)For compactness, define, M = δF [ ρ in n ] δρ in n (cid:12)(cid:12)(cid:12)(cid:12) ρ ∗ = δρ out n δρ in n ∈ C N G × N G . (2.40)This matrix holds special meaning outside of fixed point iterations, as it is closely re-lated to the KS dielectric , and can be used to precondition density mixing schemes (see § M are, with-out reference to its physical meaning, to guarantee convergence within this framework.Convergence is defined as δρ in n +1 → n → ∞ , or δρ in n +1 = M δρ in n = ( M ) n δρ in1 → . (2.41)It will be assumed here (although it can be shown [43]) that M is of full rank, and canbe diagonalised using { e i } as an orthonormal eigenbasis – M ij = δ ij λ i | e i (cid:105)(cid:104) e j | with { λ i } the spectrum of M . Since (cid:104) e i | e j (cid:105) = δ ij ,( M n ) ii = ( λ i ) n | e i (cid:105)(cid:104) e i | , (2.42)= ⇒ M δρ in n → | λ i | < ∀ i. (2.43)The question now is whether, for typical DFT input systems, the condition | λ i | < . [43] | λ i | ∼ ρ ∗ (see also § | λ i | from Eq . (2.43) with a parameter α to guarantee the convergence property of α | λ i | <
1. This leads to the simplest density mixing scheme able to converge a (not-so-wide) variety of input systems – linear mixing. If M_d is defined as the transformed diagonal version of M, then it shares some crucial properties with M; namely, M^n → 0 if and only if (M_d)^n → 0, by nature of diagonalisation being a similarity transformation.
1. As α can be arbitrarily tuned such that this is true, the problemhas been (superficially) solved. Supposing A is positive definite, α must be chosen suchthat the spectral radius of A is less than unity, | α ( λ max − | <
1, meaning ρ n isguaranteed to converge for all eigenvalues (assuming linearity). However, the speed ofconvergence is related to how close | α ( λ max − | is to unity. If this quantity is onlyslightly less than unity, | α ( λ max − | n will take large n to reach zero within tolerance.Moreover, if α is too low, the change in the ρ in n +1 becomes minuscule per iteration foreigenvalues at or close to λ min , leading to slow convergence . Therefore, an optimalvalue of α must be deduced, but how efficacious this choice is clearly depends on theratio λ max λ min – an important quantity in numerical analysis, the condition number of A .The aim of sophisticated preconditioning is to compress the eigenspectrum of A as muchas possible, leading to faster and more robust convergence.In general A is not well conditioned, e.g. for metallic materials the condition number of A turns out to be divergent proportional to the size of the unit cell [43], hence certainsystems can take large amounts of time to converge (to the point where a user wouldsay the calculation doesn’t converge in a practical sense). The condition of the KSsystem is therefore intrinsically linked to the condition of M . This matrix encodes thedensity-density dielectric response of the KS system. That is, a linear measure of how aperturbation in the density at r affects the density at r (cid:48) . An optimal method to removeill-conditioning in the KS system would be to know this dielectric response exactly for ageneral input system. An analysis of the difficulty in calculating this response, and the Slow convergence for the eigenvalue λ min means the convergence will be slow for the density in thedirection of the eigenvector corresponding to λ min . . Theory § Broyden’s class of methods are often referred to as being quasi-Newton, meaning theyare based on the Newton-Raphson update equation, hereafter referred to as taking theNewton step. Translated into the language of KS DFT, this update becomes, ρ in n +1 ( r ) = ρ in n ( r ) − J − R [ ρ in n ( r )] R [ ρ in n ( r )] , (2.46) J R [ ρ in n ] = ∂R [ ρ in n ] ∂ρ in n ∈ C N G × N G , (2.47)where J is the Jacobian . The mathematics in deriving the Newton step is detailed inAppendix. A as the update possesses many important and desirable properties, namely,its order of convergence. It can be shown that if there exists an interval I where ρ in1 is‘sufficiently close’ to ρ ∗ , R (cid:48) is bounded away from zero and R (cid:48)(cid:48) exists (and is continuous),the rate of convergence of the Newton method is quadratic . Hence, quadratic convergenceis held as a ‘golden standard’ for any algorithms to follow. Unfortunately, Newton’smethod as it appears here is unsuitable for KS DFT as the Jacobian is costly to computeand store – it involves a derivative of the KS map (calculated, for example, with anumerical derivative), and is an N G × N G size object. The prohibitive computationalcost in computing J can be greatly alleviated by considering approximate updates towardthe exact J at each iteration, instead of an explicit construction of J (ideally withoutmuch loss of convergence properties) – Broyden’s methods.Broyden’s methods for non-linear root finding were first published in 1965, yet the con-cepts and machinery he proposed remain foundational to many methods in practicaluse today. 
Broyden supposes that instead of computing and storing the exact Jaco-bian matrix, one can construct (or guess) an approximate Jacobian at only the initialiteration, and update it at every subsequent iteration in such a way that it approachesthe exact Jacobian, and maintains convergence properties similar to that of Newton’smethod. Since density mixing only induces relatively small changes in ρ in n , the changesin the Jacobian will be (in some sense) small per iteration. This fact lends itself nat-urally to asking the question: what is this ‘small’ update such that one can keep thenear quadratic convergence of Newton by utilising data from previous iterations, but. Theory then inverts it as isrequired in Newton’s formula; whereas the second (‘bad’) method updates the inverseof the Jacobian explicitly under similar assumptions. The first method will be detailedbelow, whereas the second will simply be stated as it follows a similar methodology tothe first.Supposing one knows the Jacobian J R,n − [ ρ in n − ], an update is required to find J R,n [ ρ in n ]such that J R,n [ ρ in n ] = J R,n − [ ρ in n − ] + C n (2.48)can be computed. The Jacobian J n should approach, in some sense, the exact Jacobian J ∗ . This update C n can be cleverly obtained by first enforcing a ‘secant’ condition,i.e. force J R,n [ ρ in n ] to obey a finite-difference equation that is approximately true, whichmanifestly uses known data. This secant condition is defined as follows: for a simpleone dimensional system, the definition of the derivative of f at x n is obtained by takingthe following limit, f (cid:48) ( x n ) = lim x n − → x n f ( x n ) − f ( x n − ) x n − x n − . (2.49)Meaning if x n and x n − are sufficiently close, the secant condition is the finite-differenceapproximation, f (cid:48) ( x n ) ≈ f ( x n ) − f ( x n − ) x n − x n − . (2.50)Importantly, in this one dimensional case, f (cid:48) ( x n ) is uniquely defined by Eq . (2.50). Thisis not the case in > N G -dimensional system of KS DFT, the secant condition becomes, J R,n [ ρ in n ]( ρ in n − ρ in n − ) = R [ ρ in n ] − R [ ρ in n − ] . (2.51)If J n is an N G × N G matrix of full rank, it has N G linearly independent basis vectorsforming a N G -dimensional vector space, V . Enforcing the secant condition in Eq . (2.51)specifies how J n should act on all vectors parallel to ρ in n − ρ in n − ∈ C N G , corresponding toone basis vector in an appropriately rotated basis. Therefore, there exists a ( N G − V (cid:48) ⊂ V consisting of all elements of the vector space V orthogonal. Theory ρ in n − ρ in n − : V (cid:48) = { z | z ∈ C N G , ( ρ in n − ρ in n − ) · z = 0 } . (2.52)Importantly, Eq . (2.51) does not specify how J n acts on elements of V (cid:48) , of which thereare N G − N G simultaneousequations to fix J n , but N G degrees of freedom in J n . The Broyden update in itscurrent state is therefore underdetermined with information provided only by secant.By enforcing the secant condition, one rank of information has been gained, leaving N G − class of Broyden methods, i.e . all rank one updates to the Jacobian thatsatisfy the most recent iterative secant condition (of which there are infinitely many).A particular method within this class will now be outlined, known as ‘Broyden 1 (B1)’or ‘Broyden’s good method’.To recover the missing information, Broyden supposes J n acts on elements of V (cid:48) thesame way as J n − does J R,n [ ρ in n ] z = J R,n − [ ρ in n − ] z ∀ z ∈ V (cid:48) . 
(2.53)This is a ‘minimal change’ condition on J n , motivated by the fact that changes in ρ in n aresmall per iteration, and so changes in J n will follow suit. These two conditions, minimalchange and secant, provide a unique solution to the update matrix C n which is cruciallyof rank one (from the secant information). To condense the above, B1 seeks the updatematrix C n under two conditions, 1 . J R,n ∆ ρ in n = ∆ R n , (2.54)2 . J R,n z = J R,n − z. introducing the notation, ∆ ρ in n := ρ in n − ρ in n − , ∆ R n := R [ ρ in n ] − R [ ρ in n − ] . One can show the following ansatz satisfies these conditions, J R,n = J R,n − + ∆ R n − J R,n − ∆ ρ in n | ∆ ρ in n | (∆ ρ in n ) † , (2.55)and thus provides a unique solution for C n . The notation uv † defines the outer product of u, v ∈ C N G , meaning Eq . (2.55) defines a manifestly rank one update. This update is. Theory J R,n ∈ S || J R,n − J R,n − || f (2.56)where || . || f is the Frobenius norm of a matrix, || A || f = (cid:118)(cid:117)(cid:117)(cid:116) m (cid:88) i =1 n (cid:88) j =1 | a ij | , (2.57)and the set S = { A | A ∈ C N G × N G , A ∆ ρ in n = ∆ R n } imposes the secant condition.In order to finally compute the Newton step, the last operation needed is a method ofinverting the expression in Eq . (2.55). An explicit formula for the update of the inverseof a rank one matrix is given by the Sherman-Morrison-Woodbury formula , resulting in J − R,n = J − R,n − + ∆ ρ in n − J − R,n − ∆ R n (∆ ρ in n ) T J − R,n − ∆ R n (∆ ρ in n ) † . (2.58)The simplicity of this expression owes to the fact that the update of J R,n − is of rank one– higher rank updates add algebraic, and hence computational, complexity to Eq . (2.58).Bryoden’s second method follows this exact same methodology, but updates the inverseof the Jacobian directly (and therefore does not use the Sherman-Morrison-Woodburyformula), i.e . solve minimise J − R,n ∈ S || J − R,n − J − R,n − || f (2.59)for S defining the inverse secant condition. These methods have been shown to displaysuperlinear convergence over a large sample of KS systems [46], thus becoming estab-lished as core density mixing methods and motivating the subsequent work by Marks &Luke.The problem with B1 and B2 as presented here is that, while the difficulty of havingto construct the full Jacobian at every iteration has been solved, the method still re-quires storing a prohibitively large matrix. However, since Broyden first proposed thesemethods, much work has gone into improving them to the point where they can beimplemented efficiently. A good review of these contributions can be found in Ref . [47].To highlight a few in particular, Srivastava [48] in 1984 was one of the first to propose aformulation of B1 and B2 such that the Jacobian in its full matrix form need never bestored, instead using a series of vector-vector products to compute the update. More-over, Vanderbilt & Louie [49] and Eyert [50] formulated these methods in such a waythat the entire history of iterates could be utilised to assist convergence rather than. Theory castep currently. Therelevant theory has now been outlined such that the development of the Marks & Lukescheme can be understood.A rivalling method to the above is a mixing scheme proposed by Pulay in 1980 specifi-cally to accelerate convergence in Hartree-Fock calculations, which will now be detailed.Importantly, the following discussion will introduce the idea of a subspace spanned bythe history of iterates, and how one can utilise such a subspace to accelerate convergence.
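Before moving on, the two updates above can be summarised in a short sketch (Python/NumPy, real vectors and dense matrices purely for clarity; a planewave code would store neither the Jacobian nor its inverse explicitly, and the complex case replaces transposes with conjugate transposes). The function below performs the rank-one 'good Broyden' update of Eq. (2.55) together with the Sherman-Morrison update of the inverse, Eq. (2.58).

```python
import numpy as np

def broyden1_update(J, J_inv, d_rho, d_res):
    """Rank-one Broyden 'good method' (B1) update of the approximate Jacobian
    and, via Sherman-Morrison, of its inverse.

    d_rho : rho_n - rho_{n-1}
    d_res : R[rho_n] - R[rho_{n-1}]
    """
    # Secant + minimal-change conditions give the rank-one update, Eq. (2.55)
    J_new = J + np.outer(d_res - J @ d_rho, d_rho) / np.dot(d_rho, d_rho)
    # Sherman-Morrison update of the inverse, Eq. (2.58)
    Jinv_dres = J_inv @ d_res
    J_inv_new = J_inv + np.outer(d_rho - Jinv_dres, d_rho @ J_inv) / np.dot(d_rho, Jinv_dres)
    return J_new, J_inv_new

def quasi_newton_step(rho, residual, J_inv):
    """Take the quasi-Newton step with the current approximate inverse Jacobian."""
    return rho - J_inv @ residual
```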
Pulay mixing is also known as a direct inversion in the iterative subspace (DIIS) method.The essence of DIIS is to store a history of residual norms which is then used to determine(via a least squares fit) what the optimal (minimal) subsequent residual will be. Theargument of this residual can then be extrapolated to give an optimal ρ inopt which isused to construct ρ in n +1 . The full details of the DIIS method are not important here, asthey provide little further mathematical insight to the above analysis. Nonetheless, it isworth briefly assessing the core concepts and mathematics of Pulay’s method due to itssuccess and widespread use (and it will be used as a method of comparison in § R [ ρ in n +1 ] = 0. To solve this, Pulay considers predicting a new residual as a linear combi-nation of all the residuals that have been iterated through, R [ ρ inopt ] = n (cid:88) i =1 α i R [ ρ in i ] . (2.60)The coefficients α i are now determined by enforcing that the subsequent norm of theresidual is a minimum (i.e . as close to zero as the history of information allows it) –solve min( (cid:104) R [ ρ inopt ] | R [ ρ inopt ] (cid:105) ) s.t. n (cid:88) i =1 α i = 1 . (2.61)This is a constrained optimisation problem with a single constraint, and hence can besolved by the method of Lagrange multipliers. That is, one constructs a Lagrangian as. Theory L = (cid:104) R [ ρ inopt ] | R [ ρ inopt ] (cid:105) − λ ( (cid:88) i α i − , (2.62)where λ is the Lagrange multiplier. Minimising the residual norm is now a matter ofsubstituting in Eq . (2.60), and solving, ∂ L ∂α i , ∂ L ∂λ = 0 . (2.63)Upon making this substitution, the Lagrangian takes the form, L = n (cid:88) i n (cid:88) j α i α j (cid:104) R [ ρ in i ] | R [ ρ in j ] (cid:105) − λ ( (cid:88) i α i − . (2.64)Performing the derivatives in Eq . (2.64) requires some amount of algebraic manipulation(index gymnastics using the Kronecker delta and such), and the resultant linear systemof equations one solves in order to find these coefficients (which is the least squaresfitting step) is, n (cid:88) i (cid:104) R [ ρ in i ] | R [ ρ in j ] (cid:105) α j − λ = 0 , n (cid:88) i α i = 1 . (2.65)This corresponds to the following ( n +1)-dimensional linear system in matrix notation, R , R , . . . R ,n R , . . . 1... ... R n, . . . . . . α α ... α n λ = where, R i,j := (cid:104) R [ ρ in i ] | R [ ρ in j ] (cid:105) . (2.66)Solving this system defines the direct inversion in the DIIS; the iterative subspace is thespace spanned by the history of densities, S n = span { ρ in1 , . . . , ρ in n } , (2.67). Theory S n − ⊂ S n . Oneis therefore implicitly solving a limited memory problem if one restricts to a maximumhistory length m ∈ [1 , n ] such that the dimension of the iterative subspace is capped at m < n . In castep , m is an input parameter which is set to m = 20 by default. Solvingthe above system now produces the { α i } that minimises the residual. A final step isthen required to obtain ρ inopt . This is an extrapolation step, and assuming R is linear,the input argument producing this optimised residual is, ρ inopt = n (cid:88) n − m +1 α i ρ in i . (2.68)This is easy to check by substituting Eq . (2.68) into the definition of the residual thencomparing to Eq . (2.60). Initially, one might think of setting the subsequent iterativedensity equal to the optimal density as such, ρ in n +1 = ρ inopt . (2.69)However, with an update of this form, the dimensionality of the iterative subspace, Eq . 
(2.67), remains the same as one has simply added a linear combination of vectors inthe subspace to the set. In order to resolve this issue, the subsequent iterative densitycould, for example, take the form ρ in n +1 = ρ inopt − J R [ ρ inopt ] (2.70)for some initial guess Jacobian J , as is done in Ref . [36].This method looks, aesthetically, quite different to Broyden’s methods from the previoussection. However, Kresse et al . [46] were able to show that Pulay’s DIIS is equivalent toa quasi-Newton step with an appropriately constructed Jacobian. An advantage of thismapping is that preconditioning becomes more transparent, as preconditioning seeksto compress the eigenspectrum of the Jacobian. Interestingly however, this mappingalso reveals a noteworthy potential drawback of Pulay’s method, which is that the DIISupdated Jacobian does not obey the underlying physical symmetries of the KS system.That is, the iterative Jacobian is an approximation to the KS charge dielectric, which isa symmetric object by construction. The DIIS updated Jacobian does not preserve thisproperty, and neither does many variants of Broyden’s Jacobian updates. The extent towhich this observation affects the mixing schemes will be revealed in § castep . That is, the DIIS. Theory . (2.66) can become singular in one of two scenarios: either the iteratesare not strongly linearly independent, or the residual becomes extremely small (closeto convergence). The latter problem seems to surface more frequently in practice, asolution to which could be to include some regularisation. A performance analysis ofboth Broyden’s and Pulay’s methods as implemented in castep will be given in §
4, andcompared to the proposed improvements of § All relevant algorithms implemented in castep have now been presented. Before movingon to preconditioning, it is worth giving an honourable mention to a few other numericalmethods that one might expect to be of utility in the present context. First is a methodquite similar to Pulay’s in that it searches over a subspace generated by a history ofiterates – the
Krylov-Newton method [52]. This methods functions by noticing that,in order to take a Newton step, one simply requires a solution to the following linearsystem of equations, J n δρ in n = R n , (2.71)where, ρ in n +1 = ρ in n + δρ in n . (2.72)Once J n and R n have been specified, the Krylov-Newton method uses the generalisedminimal residual method (GMRES) [53] to iteratively compute δρ in n such that Eq . (2.71)is satisfied . This involves a search over a subspace generated by past iterates, namelythe Krylov subspace of J n and R n , K m ( J n , R n ) = span( { R n , J n R n , . . . , J m − n R n } ) . (2.73)An extrapolation over this iterative subspace is performed such that an optimised sub-sequent step ( ρ in n ) m ∈ K m minimises the residual, r m = || J n δ ( ρ in n ) m − R n || . Theoreti-cally, then, this method has access to the quadratic convergence properties of Newton’smethod. In practice however, the approximation of J n involves a simple, first ordernumerical derivative of the KS map. This leads to an extra evaluation of the KS mapper SCF cycle, meaning the advantage of ‘exact Newton’ is negated by doubling thecomputational effort per SCF cycle. Note that the Jacobian here does not need to be explicitly stored in RAM, as only its applicationon the iterative solution is required. . Theory J R,n = J R, + ∆ R n − J R, ∆ ρ in n | ∆ ρ in n | (∆ ρ in n ) † , (2.74)which is the Broyden update under the substitution J n − → J . As expected, thisalgorithm performs well if one is in possession of a good initial guess Jacobian, i.e . if oneknew the dielectric of the input system quite accurately.This now covers the mathematical methodology employed by the vast majority of KSDFT implementations to reach self-consistency. The mathematics presented here is theproduct of over half a century of effort on the part of mathematicians and physicists,and thus there appears little scope for significant improvement short of a paradigmshift. However, the focus can now be switched to how one preconditions these meth-ods by utilising prior knowledge of the KS system. The scope for improvement usingpreconditioning is far greater. It is a less researched, more specialised topic, and it willbe shown that a perfect preconditioner can obtain the ground state solution in just oneiteration. From a mathematical perspective, preconditioning refers to making a transformationto a system in order to reduce its condition number without changing the underlyingmethod. If one restricts analysis to the linear response regime, it was shown in § . [46] notes that the convergence of both Broyden’s and Pulay’s methods are, toa good approximation, proportional to (cid:113) λ max ( J ) λ min ( J ) . Compressing the eigenspectrum of theJacobian therefore improves the speed of convergence through this proportionality. Inorder to find a preconditioning matrix P such that the eigenspectrum of J is minimisedupon the operation P − J , one derives the optimal first order change in the iterativeelectron density. First, one seeks the matrix P such that the Newton update, ρ in n +1 = ρ in n + P ( ρ in n − F [ ρ in n ]) , (2.75). Theory ρ in n,n +1 = ρ ∗ + δρ in n,n +1 can bemade, and the KS map can be linearised about ρ ∗ to give δρ in n +1 ≈ δρ in n (cid:16) I − P (cid:16) − δρ out n δρ in n (cid:17)(cid:17) . (2.76)As convergence is defined as δρ in n +1 →
0, or δρ in1 (cid:16) I − P (cid:16) − δρ out n δρ in n (cid:17)(cid:17) n → , (2.77)the optimal definition of P is given by, I − P (cid:16) − δρ out n δρ in n (cid:17) = 0 (2.78)= ⇒ P = − (cid:16) − δρ out n δρ in n (cid:17) − = − (cid:15) − . (2.79)This is, in fact, the inverse of the density-density linear response or the static, indepen-dent electron, charge dielectric of the KS system of electrons. If this object were knownexactly, convergence would be achieved in just one iteration, and thus accelerated den-sity mixing schemes would be rendered redundant. That is to say, if one knew exactlyhow the KS electrons were going to respond to a perturbation about the ground statedensity, then, from any starting point within the linear regime, one could invert thisbehaviour to obtain self-consistency immediately. In general, the KS dielectric is notknown, and is difficult to compute (as § P that interfaces with the accelerated mixing schemes such that, P J − n ≈ (cid:16) − δρ out n δρ in n (cid:17) − . (2.80)Ill-conditioning of the KS system is therefore mostly determined by the KS dielectric.It is worth then exploring the nature of this object in order to identify the sources ofcharge sloshing such that one can suppress them and, in turn, better condition the KSsystem. Dropping the discretised (matrix) representation and dealing now with functional forms,the density-density linear response function is defined by, δρ out ( r ) = (cid:90) d r (cid:48) (1 − (cid:15) ( r , r (cid:48) )) δρ in ( r (cid:48) ) . (2.81). Theory r , what is the response of the KS density across all r (cid:48) such that it is linearly relatedto the initial perturbation? The exact quantum mechanical dielectric (or susceptibility ,defined shortly) is ubiquitous to electronic structure theory across all levels of accuracy,containing within it the excitation properties of a system. In general, it is a time-dependent quantity, meaning a perturbation in the density is applied at time t , andthe system responds causally for all t (cid:48) > t . Moreover, this exact quantum mechanicaldielectric is built from the interacting susceptibility . That is, the object describing howan interacting system responds to a perturbation in the external potential . Fortunately,the optimal preconditioning matrix is built from the static (time-independent), non-interacting KS susceptibility, which parametrises how the KS system responds to aperturbation in the effective potential : χ ( t − t (cid:48) , r , r (cid:48) ) → χ ( r , r (cid:48) ). The former can berecovered from the latter by means of an appropriately constructed Dyson equation(see Ref . [55]). The restriction to static susceptibilities is crucial, as time is not beingconsidered a variable of the set-up here. Instead, the act of cycling through iterativedensities is a form of explicit time dependence, rather than the addition of time as animplicit variable.The source of ill-conditioning can therefore be studied by unpacking the static KS di-electric. First, note that the entire density dependence of the KS Hamiltonian is throughboth the exchange-correlation and Hartree potentials. Utilising the chain rule, the fol-lowing expansion can be made, (cid:15) ( r , r (cid:48) ) = 1 − δρ out ( r ) δρ in ( r (cid:48) ) = 1 − (cid:90) d r (cid:48)(cid:48) δv inhxc ( r (cid:48)(cid:48) ) δρ in ( r (cid:48) ) δρ out ( r ) δv inhxc ( r (cid:48)(cid:48) ) . 
(2.82)Assuming a local exchange-correlation potential, the first and second term respectivelyin the chain rule expansion are calculated through, δρ out ( r ) = (cid:90) d r (cid:48)(cid:48) χ ( r , r (cid:48)(cid:48) ) δv inhxc ( r (cid:48)(cid:48) ) , (2.83) δv inhxc ( r (cid:48)(cid:48) ) = (cid:90) d r (cid:48) (cid:0) K c ( r (cid:48) , r (cid:48)(cid:48) ) + K xc ( r (cid:48) , r (cid:48)(cid:48) ) (cid:1) δρ in ( r (cid:48) ) , (2.84)where K c and K xc are the Coulomb and exchange-correlation kernels, and χ now for-mally defines the static, independent particle susceptibility. The KS dielectric now takesthe form, (cid:15) ( r , r (cid:48) ) = 1 − (cid:90) d r (cid:48)(cid:48) (cid:0) K c ( r (cid:48) , r (cid:48)(cid:48) ) + K xc ( r (cid:48) , r (cid:48)(cid:48) ) (cid:1) χ ( r , r (cid:48)(cid:48) ) . (2.85)Therefore, the condition of (cid:15) becomes divergent if either K hxc or χ become divergent(ignoring cancellations). The most prominent cause of ill-conditioning in the KS system. Theory . (2.84) is expressed in Fourier space as follows , δ ˜ v inh ( G ) = 4 π Ω | G | δ ˜ ρ in ( G ) , (2.86)ignoring, for now, the exchange-correlation contribution. It is now easy to observe thatfinite changes in the density for long wavelength Fourier modes will be amplified by afactor of | G | − . Therefore, any error in the δ ˜ ρ in also gets amplified by this divergentfactor, leading often to divergence of the scheme if one leaves this untreated. Thisdefines Coulomb sloshing , and an effective preconditioner to account for this behaviourwould be one that suppresses low G -vector components of the change in the density withrespect to the high G -vector components. In castep , the number of G -vectors increaseslinearly with volume of the unit cell. Thus, Coulomb sloshing is typically a problemwhen an input system has charge distributed across a large unit cell. For example,Figs . (2.3)(b)-(f) depict the descent of the electron density of a graphene nanoribbon(10˚A gap in both aperiodic directions) into divergence as an unpreconditioned, linearmixing scheme is used to converge the electronic structure. This example illustratessome key characteristics of charge sloshing in any form. That is, in Fig . (2.3)(b), thecharge density is initialised with only a small error with respect to the correct answer,Fig . (2.3)(a). This error then becomes amplified, as too much charge is moved to thecentre of the unit cell in Fig . (2.3)(c). The SCF cycle then overreacts again by movingfar too much charge to the edge of the unit cell, and so on. Note that, in the aboveanalysis, the effect of the exchange-correlation kernel on the condition of the KS systemwas ignored. This is typically referred to as working in the random phase approximation (RPA) . In fact, the exchange-correlation kernel in, for example, the LDA, is not asource of ill-conditioning as it is a smooth, parametrised polynomial of the electrondensity only. An optimal preconditioner would include exchange-correlation effects, butit is not necessary.Interestingly, insulators (a finite KS energy gap at the Fermi energy), and metals(overlapping bands across the Fermi energy) possess crucial differences relating to self-consistency. The following argument is intended to highlight the result of this differencein relation to charge sloshing, and therefore will be heuristic in nature. First, in asimple, homogeneous metal, the susceptibility becomes constant . 
That is, χ(G, G′) = γ δ(G − G′), meaning the electrons are mobile across the metal in response to a perturbation in the potential, parametrised by this constant. The eigenvalues of the dielectric, Eq. (2.85), for this example now become ∼ 1 − γ/|G|².
[Footnote: As will be detailed in § 3, density mixing in castep is done in Fourier space.]
[Footnote: Although, a proper definition of the RPA within DFT is in how one recovers the interacting susceptibility from the KS susceptibility via a Dyson equation.]
[Figure 2.3, panels (a)-(f): electron density plotted against position along the width of the ribbon (unit cell size 5-25 Å). (a) Fully converged density of a ZGNR; (b) initial guess input density; (c)-(f) input density at successive iterations of the SCF cycle (iterations 1-5).]
Figure 2.3: (a) The fully converged, correct electron density for an eight-atom graphene nanoribbon (top right) sliced across the width of the ribbon, obtained using preconditioned Pulay mixing. (b)-(f) Progression of the electron density through the SCF process using unpreconditioned linear mixing.

Hence, a constant susceptibility leads to Coulomb sloshing being maximally pronounced in simple metals. On the other hand, insulators possess an effective 'restoring force' for electrons, i.e. the electrons are no longer arbitrarily mobile, which manifests in the susceptibility in the following way: χ(G, G′) ∼ η|G|² [47, 56]. As a result, insulators tend to precondition themselves against Coulomb sloshing, as the eigenvalues of the dielectric become ∼ 1 − η. This is constant for all Fourier modes; hence, an optimal preconditioner for simple insulators is an appropriately defined linear scaling parameter, α.

Ill-conditioning in the KS system now manifests in two additional cases: highly susceptible metals, and inhomogeneous (part metal, part insulator) systems. The problematic nature of the former follows directly from the above arguments. The latter is problematic because the behaviour of an optimal preconditioner needs to be fundamentally different in the metallic and insulating regions, and therefore implicitly requires inhomogeneity. This would be the case if one were performing a surface or slab calculation, which involves a region of material and a region of vacuum.

A final source of ill-conditioning worth mentioning is that of band sloshing. It is fundamentally quite different to the other forms of sloshing discussed, and it occurs for systems possessing a large density of states about the Fermi level. The reason for band sloshing is as follows: the orbital occupancy is decided by identifying which bands (eigenvectors) of the KS Hamiltonian are lowest in energy. So far, occupancy has been treated as a binary variable; orbitals are either completely occupied or not. Therefore, when many orbitals are close in energy about the Fermi level, it becomes difficult to assign such a binary occupancy. The reason for this is that the mere act of occupying an orbital results in an energy change for the bands of the subsequent KS Hamiltonian. It is possible, therefore, that a previously unoccupied band becomes lower in energy than an occupied band (by nature of the bands being so close about the Fermi level). This band will then become occupied for the subsequent SCF cycle, but in doing so, the now unoccupied band could lower in energy again as a result of this change. Hence, certain systems become impossible to converge with binary occupancies, due to a constant re-shifting of band energies upon discontinuous occupancy. This is fixed in part by introducing the notion of partial occupancies of KS orbitals. The density, for example, now takes the form,

ρ(r) = Σ_{i=1}^{N_b} f_i |ψ_i(r)|²,   (2.87)

where f_i ∈ R is the occupancy of orbital i, now lying in the interval [0, 1]. It suffices to note that one can effectively smear this occupancy across the Fermi level according to some (not necessarily physical) smearing function in order to better condition the self-consistency. For example, using the Fermi-Dirac distribution,

f_i = 1 / (e^{(ε_i − E_f)/T} + 1),   (2.88)

for some electronic temperature, T.
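As a simple illustration of this smearing step, the sketch below (Python/NumPy; the function names and the bisection search are illustrative, not castep routines) assigns Fermi-Dirac occupancies, Eq. (2.88), by locating the Fermi level such that the occupancies sum to the electron count, and then assembles the density of Eq. (2.87).

```python
import numpy as np

def fermi_dirac_occupancies(eigenvalues, n_electrons, temperature):
    """Partial occupancies f_i = 1/(exp((eps_i - E_f)/T) + 1), Eq. (2.88).

    The Fermi level E_f is located by bisection so that sum_i f_i = N.
    Spin degeneracy is ignored for brevity.
    """
    def occ(e_f):
        arg = np.clip((eigenvalues - e_f) / temperature, -500.0, 500.0)
        return 1.0 / (np.exp(arg) + 1.0)

    lo = eigenvalues.min() - 10.0 * temperature
    hi = eigenvalues.max() + 10.0 * temperature
    for _ in range(100):                     # bisection on the Fermi level
        mid = 0.5 * (lo + hi)
        if occ(mid).sum() < n_electrons:
            lo = mid
        else:
            hi = mid
    return occ(0.5 * (lo + hi))

def density_from_orbitals(orbitals, occupancies):
    """rho(r) = sum_i f_i |psi_i(r)|^2, Eq. (2.87); orbitals has shape (N_b, N_r)."""
    return np.einsum('i,ir->r', occupancies, np.abs(orbitals) ** 2)
```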
The entropic contribution to the energy as a result ofthis effective finite temperature can be subtracted post-convergence, leading to the zerotemperature solution. This is not a concern from the point of view of preconditioningdensity mixing schemes as defined above, but was worth mentioning in a section devotedto ill-conditioning of the KS system. This section will be dedicated to reviewing attempts at preconditioning the self-consistentcycles from literature, which will ultimately serve to motivate and put into context thework done in §
4. For reasons already discussed, most preconditioners typically restrictto the RPA. Hence, the remainder of this work will set K xc = 0. It was shown in theprevious section that constructing the optimal preconditioner amounts to calculatingthe KS susceptibility exactly, and in turn calculating the exact inverse KS dielectric.This is typically done in Fourier space, where the inverse dielectric takes the form (cid:15) − ( G , G (cid:48) ) = (cid:34) (cid:90) d G (cid:48)(cid:48) (cid:16) δ ( G − G (cid:48) ) − K c ( G (cid:48) , G (cid:48)(cid:48) ) χ ( G , G (cid:48)(cid:48) ) (cid:17)(cid:35) − . (2.89)There is a subtlety introduced here: if K c ( G (cid:48) , G (cid:48)(cid:48) ) is defined to be the Fourier transformof the Coulomb kernel, K c ( G (cid:48) , G (cid:48)(cid:48) ) = 4 π | G (cid:48)(cid:48) | δ ( G (cid:48) − G (cid:48)(cid:48) ) , (2.90)then χ ( G , G (cid:48)(cid:48) ) is not the Fourier transform of the real space susceptibility as definedabove due to the convolution required in Eq . (2.83). This is a key concern of the workto follow, and the relationship between the real and Fourier space susceptibilities will bedefined more precisely in § δρ in n = (cid:16) I − K c χ (cid:17) − R [ ρ in n ] , (2.91) For example, the Levy-Lieb constrained optimisation now requires not only a search over all N -representable densities, but also over all occupancies such that (cid:80) i f i = N for f i ∈ [0 , . Theory . (2.91) would be to consider an exact expression forthe KS susceptibility. Following the mathematics of linear response theory applied toindependent particle systems, Adler [57] and Wiser [58] separately derived the exactexpression for the KS susceptibility, χ ( r , r (cid:48) ) = N e (cid:88) n =1 N b (cid:88) m = N e +1 ψ n ( r ) ψ ∗ m ( r ) ψ ∗ n ( r (cid:48) ) ψ m ( r (cid:48) ) (cid:15) n − (cid:15) m , (2.92)known as the Adler-Wiser equation. This equation can be expanded in a planewavebasis for Bloch wavefunctions, yielding χ ( G , G (cid:48) ) = (cid:88) k ∈ N b (cid:88) n,m =1 ( f n,k − f m,k ) (cid:104) n, k | e − i G . r | m, k (cid:105)(cid:104) m, k | e i G . r | n, k (cid:105) (cid:15) m,k − (cid:15) n,k . (2.93)Importantly, this expression involves a sum over all unoccupied bands of the KS Hamil-tonian. Therefore, preconditioning with the exact susceptibility in this form is not apractical approach in a planewave basis, as it involves at least an ∼ O ( N G ) compu-tation (see § . [47] notes, constructing and applyingthis susceptibility is in fact an O ( N G ) operation with a large prefactor. As such, itis not used in state-of-the-art electronic structure calculations. However, this methodserves as a useful proof of concept for the following work. That is, if the exact linearresponse is known, convergence to a self-consistent solution can be achieved in O (1)iterations, even for the complex interfaces studied in Ref . [59] . Since 1982, multipleattempts have been made at simplifying this expression aiming to either suppress theprefactor, or the scaling [10, 60, 61]. To highlight one in particular, Gonze (of abinit )and Anglade in 2008 were able to devise a method whereby all unoccupied bands neednot be calculated – the ‘extrapolar’ method [10]. Instead, the susceptibility is calculatedexactly for n, m ≤ N e , and the unoccupied orbitals are approximated by planewaves,allowing a ‘closure relation’ in the ‘infinite’ sum to be utilised. 
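For reference, the structure of Eq. (2.92) can be made concrete with a toy construction (Python/NumPy; the orbitals and eigenvalues are assumed to be supplied on a small grid, so this is purely illustrative and not how a planewave code would proceed). The double sum over occupied-unoccupied pairs, each contributing an outer product over grid points, is what makes the exact construction impractical for large systems.

```python
import numpy as np

def adler_wiser_chi(orbitals, eigenvalues, n_occ):
    """Toy real-space Adler-Wiser susceptibility, Eq. (2.92).

    orbitals    : complex array of shape (N_b, N_r); rows are KS orbitals on a grid
    eigenvalues : array of length N_b
    n_occ       : number of occupied orbitals
    """
    n_b, n_r = orbitals.shape
    chi = np.zeros((n_r, n_r), dtype=complex)
    for n in range(n_occ):
        for m in range(n_occ, n_b):
            pair = orbitals[n] * orbitals[m].conj()      # psi_n(r) psi_m*(r)
            chi += np.outer(pair, pair.conj()) / (eigenvalues[n] - eigenvalues[m])
    return chi
```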
The details are notimportant here, it being sufficient to note that the computational scaling (now O ( N e ))and prefactor remain impractical, especially for large systems. Preconditioning using anexplicit, exact expression for the susceptibility has thus achieved limited success so far.The Adler-Wiser equation is not the only technique to compute the susceptibility exactly.The susceptibility can also be calculated with a series of numerical derivatives of theKS map, which is perhaps a more conceptually straight forward technique. That is, the The exact linear response here implicitly includes all inhomogeneities of the system at hand, alsoknow as the ‘local field effects’. . Theory single planewave G , then the response of the KS systemis computed across all G (cid:48) . This would produce one column of the susceptibility matrix,and a full construction χ would require N G of these numerical derivatives. Again, this isclearly impractical, but the idea itself can be utilised by noticing that one simply needsthe application of the dielectric on a vector (the residual), not the full matrix. Thisapplication can be performed by taking only one numerical derivative of the KS map, orby use of the Sternheimer equation [62]. This concept will be elaborated further whendiscussing potential future work in § model in an attempt to capture the characteristicbehaviour of charge screening in KS DFT without making explicit reference to the inputsystem at hand. In principle, this model would be derived analytically, producing littleadditional computational overhead. By far the most successful dielectric model forpreconditioning SCF cycles has been the so-called Kerker preconditioner [63]. Basedon the work of Manninen et al . [64], an analytic expression is derived for the inversedielectric of the homogeneous electron gas (HEG). The Kerker form can thus be obtainedby first considering the linear response in real space of a homogeneous and isotropicsystem, χ ( r , r (cid:48) ) → χ ( | r − r (cid:48) | ), satisfying δρ out ( r ) = (cid:90) d r (cid:48) χ ( | r − r (cid:48) | ) δv inh ( r (cid:48) ) . (2.94)This expression is a convolution over the input potential, and therefore takes the formof a product in Fourier space, δ ˜ ρ out ( G ) = ˜ χ ( | G | ) δ ˜ v inh ( G ) . (2.95)This susceptibility is local and homogeneous in Fourier space, and the correspondinginverse KS dielectric is (cid:15) − ( | G | ) = (cid:16) I − π | G | ˜ χ ( | G | ) (cid:17) − . (2.96)The final step in obtaining the Kerker form is to identify the susceptibility of Eq . (2.95)with the Thomas-Fermi screening wavevector resulting from Thomas-Fermi (TF) theoryapplied to the HEG [28] . Within this framework, the susceptibility becomes constantfor all Fourier modes, leading to χ = k tf ∼ constant. The magnitude of the TFwavevector, and hence the susceptibility, is controlled entirely by the HEG density pa-rameter, r s . The Kerker preconditioner is now given in its fully parametrised form Thomas-Fermi theory applied to the HEG comes with a certain set of approximations. For example,the induced potential from an external perturbation is slowly varying in space (see Ref . [28]). . Theory (cid:15) − ( | G | ) = α (cid:16) | G | | G | (cid:17) − = α | G | | G | + | G | , (2.97)where | G | = 4 πk tf is now a parameter of the scheme, along with the linear mixingparameter α from § . In part due to its simplicity, the Kerker preconditionerhas seen astonishing success in stabilising and accelerating self-consistency in KS DFTcalculations. 
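The action of the Kerker form is simple enough to state in a few lines. The sketch below (Python/NumPy; the arrays of Fourier components and |G| values are assumed to be supplied, and any parameter values used in a call would be placeholders rather than castep defaults) scales each Fourier component of the residual by ε⁻¹(|G|) = α|G|²/(|G|² + |G|₀²), leaving the |G| = 0 component untouched so that the net charge is conserved.

```python
import numpy as np

def kerker_precondition(residual_g, g_norms, alpha, g0):
    """Apply the Kerker inverse dielectric, Eq. (2.97), to a residual held in
    Fourier space: each component is scaled by alpha*|G|^2/(|G|^2 + G0^2).

    residual_g : complex array of residual Fourier components
    g_norms    : |G| for each component (same shape)
    """
    weights = np.zeros_like(g_norms)
    mask = g_norms > 0.0                      # leave the G = 0 component unmixed
    weights[mask] = alpha * g_norms[mask]**2 / (g_norms[mask]**2 + g0**2)
    return weights * residual_g
```

A Kerker-preconditioned linear-mixing step is then simply the update ρ_in,n+1 = ρ_in,n + kerker_precondition(R_n, ...), with α and |G|₀ chosen as discussed below.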
The mixing parameter α serves the same purpose as in § | G | nowdescribes the degree of ‘metallicity’ of an input system. That is, in the limit of | G | = 0,the preconditioner reduces to the linear mixing parameter for all Fourier modes, whichwas shown to be the optimal preconditioner for simple insulating systems. Increasing | G | acts to further suppress low magnitude G -vectors components of the density update.The more susceptible an input, the more it is prone to Coulomb sloshing, and a higher | G | is required. Kresse et al . [46] suggest the parameters α = 0 . | G | = 1 . − perform adequately across a range of input systems, and are therefore also the defaultparameters in castep . A plot demonstrating how the Kerker preconditioner weightscomponents of the density to counteract Coulomb sloshing for three characteristic valuesof | G | is given in Fig . (2.4). | G | ( − ) I n v e r s e d i e l e c t r i c G =1 . − G =0 . − G =2 . − Figure 2.4:
The Kerker form, Eq. (2.97), for three characteristic values of |G|₀ at a fixed value of α. [Footnote: This is, in fact, the static limit of the Lindhard dielectric [65] for the case of TF theory applied to the HEG.]
42A key reason for the success of the Kerker preconditioner is that it interfaces well withaccelerated mixing schemes. That is, the Kerker form often successfully compresses theeigenspectrum of the iterative Jacobian when using Broyden or Pulay mixing at eachiteration. As such, the Kerker form is used almost exclusively to augment acceleratedmixing schemes, rather than as a stand-alone mixing scheme. This is unlike exact calcu-lations of the susceptibility above, which typically do not interface well with acceleratedmixing schemes. Instead, direct dielectric mixing is used due to the inherent accuracyand system-dependence in the construction of the exact susceptibility. The major draw-back of Kerker preconditioning is that it involves an entirely homogeneous model ofthe dielectric, meaning self-consistency can be hindered for highly inhomogeneous inputsystems. Moreover, there is no scope for systematically identifying input dependent fea-tures (e.g . vacuum gaps), and adapting parameters on-the-fly. This is in part due to thecomplexity of the HEG dielectric in real space, involving an integral over a Yukawa-typescreening kernel (see Ref . [64] and § | G | -dependence in χ in such a waythat better treats insulating and semiconducting systems [66, 67]. For example, Zhou etal . [68] utilise an extension of TF theory to insulators, originally derived by Resta [69].This leads to a dielectric model of the form (cid:15) − ( | G | ) = | G | | G | + f ( | G | ) , (2.98) f ( | G | ) = α sinc( | G | β ) γ , (2.99)where α , β , and γ are parameters of the model. While this may provide a better treat-ment of screening in semiconductors and insulators from a physical perspective, the coredrawbacks of the Kerker preconditioner remain. That is, the model is still homogeneous,and improvements pertaining to convergence are generally minimal and entirely param-eter dependent. Therefore, the work presented in § § . [47]and [70], which provide systematics for constructing the Kerker preconditioner in realspace. ethodology This chapter will detail the methodology employed by the following work in an attempt toimprove density mixing. First, a note on the numerical implementation of density mixingwithin castep will be given. Following this, the Marks & Luke and Periodic Pulaymixing schemes will be presented and discussed, including implementation specific detailspertaining to castep in particular. Lastly, a framework for including imhomogeneityin the preconditioner will given. The scope for improvement with such a framework inplace will be considered using two preliminary models.
Density mixing within castep is done in Fourier space. This means the output density from the KS orbitals is first Fourier transformed, then the mixing scheme is applied, and the subsequent iterative input density is inverse Fourier transformed in order to construct the corresponding KS Hamiltonian. Density mixing in Fourier space has a number of advantages. First, one can make the observation that the exact KS dielectric in Fourier space for both insulators and metals tends to unity in the limit of increasing |G| [46, 65]. Therefore, the optimal update in the density for these Fourier modes is in fact a fixed point update, i.e. no mixing needs to be done for |G| > |G|cut-off. This allows one to define a 'mixing sphere' in Fourier space whereby all density components inside the sphere are fed into the mixing scheme, and the remainder are left unchanged. The computational and memory overhead of density mixing is thus drastically reduced at no loss to the mixing scheme, the only caveat being that two extra FFTs on the full grid are required. In addition to this, the |G| = 0 component of the density is explicitly not mixed, as fixing this component amounts to conserving net charge. A direct result of mixing in Fourier space is that all densities are now complex objects, therefore care must be taken to appropriately mix both the imaginary and real components of the density correctly.

As discussed in § , low G-vector components of the density update should be suppressed accordingly with a preconditioner. In this vein, Kresse et al. [46] also define a mixing metric with which to evaluate scalar products of the density,

⟨ρ_in,n | ρ_out,n⟩ = Σ_{i,j} g_ij (ρ_in,n)_i (ρ_out,n)_j ,   (3.1)

g_ij = ((|G_i|² + |G_1|²) / |G_i|²) δ_ij .   (3.2)

The parameter |G_1| describes the extent to which low G-vector components of the density are suppressed in the scalar product. This is typically defined such that the longest wavelength Fourier mode is weighted approximately twenty times less than the shortest wavelength Fourier mode. The introduction of the mixing metric therefore provides a systematic method of dealing with Coulomb sloshing within a mixing scheme that becomes inherent to the scheme.

Finally, it is worth noting the effect of the initial random seed on density mixing. The purpose of the random seed in castep is to initialise the wavefunctions for the iterative diagonaliser. One would expect therefore that the iterative diagonaliser would converge appropriately, and the output density would have no dependence on the random seed. In practice however, the wavefunctions do not need to be fully converged at each SCF cycle, as the majority of SCF cycles solve the KS equations with an unconverged (non-physical) density. Hence, convergence of the iterative diagonaliser is defined such that the general behaviour of the KS system is captured, but with a certain liberty taken in the tolerances as the interim solutions will not be used to determine physics. It is possible that this error could affect some schemes more than others, and it indeed produces some variance in the number of SCF cycles taken to converge for otherwise identical input systems. However, this difference is typically small, and it is not expected that the nature of the random seed will alter any of the conclusions drawn from the testing in § .

Marks & Luke
The Marks & and Luke scheme of Ref . [12] is an extension to Broyden’s class of methodswhereby the iterative Jacobian is determined by solving a set of secant equations utilisingthe entire iterative subspace, rather than just the most recent iterations’ data. Hence,the Marks & Luke scheme is a multisecant Broyden method . This extension is relatively. Methodology . (2.56),min J n ∈ S || J n − J n − || f , (3.3)but the set S now defines the secant condition over the entire history of densities andresiduals. That is, S := { J n | J n ∈ C N G × N G , J n S n = Y n } (3.4)where the objects S n , Y n ∈ C m × N G now define a column of density and residual differ-ences respectively over the m -dimensional subspace of iterates, S n := [∆ ρ in n − m , . . . , ∆ ρ in n ] , (3.5) Y n := [∆ R n − m , . . . , ∆ R n ] . (3.6)The solution to this constrained optimisation follows the same procedure as in § ρ in n → S n and ∆ R n → Y n , yielding J n = J n − + ( Y n − J n − S n )( S † n S n ) − S † n , (3.7) J − n = J − n − + ( S n − J − n − Y n )( Y † n Y n ) − Y † n . (3.8)These define the multisecant variant of Broyden’s first and second methods respectively,hereafter MSB1 and MSB2. Substituting Eqs . (3.7) and (3.8) into the Newton updateformula gives ρ in n +1 = ρ in n + P ( I − Y n A n ) R n − S n A n R n , (3.9)for some preconditioning matrix P , where A n defines MSB1 and MSB2 via A MSB1 n := ( S † n Y n ) − S † n , (3.10) A MSB2 n := ( Y † n Y n ) − Y † n . (3.11)There is in fact an error in the Marks & and Luke paper here, where Eq . (3.9) has anincorrect sign that is propagated throughout the paper.The update in Eq . (3.9) now needs to be adapted specifically for numerical implemen-tation within KS DFT. First, note that there is a possibility of divergence due to theinversion in the definition of A n . Similar to Pulay’s DIIS, divergence occurs if the it-erative densities are not strongly linearly independent, or when the residual becomestoo small close to convergence. The latter can be treated by including some form ofregularisation in the inversion. This is done using Tikhonov regularisation , i.e . a scalar. Methodology γ is added to the diagonal of the argument of the inversion . Furthermore,the scheme is normalised by introducing the vectorΨ n = diag( || ∆ R m − n || − , . . . , || ∆ R n || − ) , (3.12)which simply acts to scale the scheme, and will not change the underlying behaviour.With these numerical considerations in place, the update matrices take the form A MSB1 n := Ψ n (Ψ n S † n Y n Ψ n + γI ) − Ψ n S † n , (3.13) A MSB2 n := Ψ n (Ψ n Y † n Y n Ψ n + γI ) − Ψ n Y † n . (3.14)The final contribution of Marks & Luke to the scheme was to redefine the density andresidual history matrices in the following way. First, notice that the current definitionsof S n and Y n in Eqs . (3.5) and (3.6) populate the i th row with the density and residualdifferences defined as ∆ ρ in i = ρ in i − ∆ ρ in i − , (3.15)∆ R i = R i − R i − . (3.16)That is, the iterative Jacobian is required to satisfy the set of secant conditions as theywere defined in the past iterations , see Fig . (3.1) (left). There is no a priori reason tosuggest that this set of secant conditions provides better information than a set of secantconditions centred around the current iteration, i.e . redefine∆ ρ in n,i := ρ in n − ∆ ρ in i , (3.17)∆ R n,i := R n − R i , (3.18)at every iteration for all i ∈ [ n − m − , n − . (3.1) (right). 
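A compact sketch of one step of the scheme just described is given below (Python/NumPy, dense linear algebra for clarity; the preconditioner is passed in as a callable, the history is assumed to contain at least two entries, and none of the names correspond to castep routines). It builds the re-centred difference matrices of Eqs. (3.17)-(3.18), forms the regularised MSB2 coefficient matrix of Eq. (3.14), and takes the update step of Eq. (3.9).

```python
import numpy as np

def msb2_step(rho_hist, res_hist, precondition, gamma=1.0e-4):
    """One multisecant Broyden (MSB2) step, Eqs. (3.9), (3.14), (3.17)-(3.18).

    rho_hist, res_hist : lists of density / residual vectors, most recent last
                         (at least two entries are assumed)
    precondition       : callable applying the preconditioner P to a vector
    gamma              : Tikhonov regularisation parameter (value illustrative)
    """
    rho_n, R_n = rho_hist[-1], res_hist[-1]
    # Re-centred difference matrices: columns rho_n - rho_i and R_n - R_i
    S = np.stack([rho_n - r for r in rho_hist[:-1]], axis=1)
    Y = np.stack([R_n - R for R in res_hist[:-1]], axis=1)
    psi = np.diag(1.0 / np.linalg.norm(Y, axis=0))        # normalisation, Eq. (3.12)
    m = psi.shape[0]
    # Regularised MSB2 coefficient matrix, Eq. (3.14)
    A = psi @ np.linalg.solve(psi @ (Y.conj().T @ Y) @ psi + gamma * np.eye(m),
                              psi @ Y.conj().T)
    AR = A @ R_n
    # Update step, Eq. (3.9)
    return rho_n + precondition(R_n - Y @ AR) - S @ AR
```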
This re-centring of the secant condition at every iteration amounts to treating the history of data as sample points in a phase space, rather than the contiguous path of all secant conditions cycled through in the history toward convergence.

[Figure 3.1: Snapshot on the sixth iteration of a decreasing residual (y-axis) as ρ_in,i (x-axis) moves through phase space toward convergence. Each node denotes the information {R_i, ρ_in,i}. A secant condition is thus defined as the finite-difference equation between any two nodes in the phase space to be satisfied by the Jacobian, denoted by a black arrow. (Left) The secant conditions are fixed and defined contiguously with increasing i. (Right) The re-centring of the Marks & Luke scheme now defines all secant conditions with respect to data on the most recent iteration, i = 6.]

Lastly, a note will be given on the differences between the above scheme and the scheme originally presented by Marks & Luke in Ref. [12]. Namely, the Marks & Luke scheme was implemented in wien2k, an all-electron, augmented planewave approach [13]. [Footnote: Augmented planewave (APW) implementations involve some mix of atomic orbital basis functions and planewaves.] As this implementation involves a partially real-space basis set, density mixing is also done in real space. Thus, the Marks & Luke methodology was re-derived with complex densities in mind, resulting in the conjugate transposes defined above. [Footnote: Marks & Luke suggest a value of γ = 10− for the regularisation parameter. This value will also be used in the remainder of this work.] Moreover, the treatment of core electrons in wien2k leads to an entirely different preconditioning approach that is not applicable to density mixing in castep. As such, the dielectric preconditioning of §

Periodic Pulay

A generalisation of Pulay's DIIS has been proposed by Banerjee et al. in Ref. [14], hereafter referred to as the Periodic Pulay mixing scheme. The concept behind Periodic Pulay is as follows. Traditional Pulay mixing involves an extrapolation over the subspace of iterative densities, thus finding the best (least squares) fit to the converged density that this subspace allows. If one element of the subspace were particularly 'close' to the self-consistent solution, the DIIS would weight this element in the basis higher than other elements. Indeed, if the converged density ρ* were a member of the subspace, the DIIS would weight this element with unity, and zero the rest. Naturally, the efficacy of the DIIS depends entirely on the sample of information in the history. One can notice therefore that linear mixing will contribute a different type of information to the history than a DIIS step. That is, a linear step tends to be efficient for either low or high wavelength Fourier components of the density, but not both, depending on the linear mixing parameter. Ill-conditioning of the exact dielectric prevents the linear step from being efficient in and of itself. However, linear mixing has the potential to add information to the history that perhaps would not have been explored in DIIS steps, by nature of the step itself not being efficient. Thus, the idea of the Periodic Pulay method is to intersperse periodic DIIS steps within a sequence of otherwise linear mixing steps, as summarised in Algorithm 1.
Algorithm 1: Periodic Pulay Mixing

Input: ρ_in,1, σ, tol, k
Output: ρ_in,n = ρ_out,n = ρ*
• i ← 1
• while ||R_i|| > tol
      R_i = ρ_out,i − ρ_in,i
      Add {ρ_in,i, ρ_out,i} to history
      if (i + 1)/k ∈ N
          ρ_in,i+1 ← DIIS step over the history
      else
          ρ_in,i+1 ← ρ_in,i + σ R_i
      end if
      i ← i + 1
• end while

In words, one takes k − 1 linear mixing steps with mixing parameter σ, followed by a single Pulay (DIIS) step over the accumulated history. In the limit k → ∞ the scheme becomes linear mixing, whereas k = 1 defines Pulay mixing. Ref. [14] suggests the algorithm performs best with k ∼ σ ∼ .
1; results for this scheme are given in § Dielectric models such as Kerker form, or simple extensions to the Kerker form, wereshown to be inadequate for treating inhomogeneous systems. This is largely due to aninability of the model, and implementations of the model, to include a systematic input-dependence. Any input-dependence is required to be included explicitly by appropriatelyadjusting input parameters. Even with a perfect parameter set, there is no possibilityfor treating inhomogeneity. This aim of this section two-fold. First, the complicationsin constructing the Kerker preconditioner in real space will be discussed. The difficultyof including an inhomogeneous r -dependence in the susceptibility will thus become ap-parent. To remedy this, a framework will be detailed whereby one can propose an r -dependent model of the susceptibility, and use this to precondition the self-consistentcycles in a computationally efficient manner. This will allow one to implicitly include. Methodology r -dependence, while circumventing the computational expense ofconstructing the real space Kerker preconditioner. Finally, two preliminary, physicallymotivated models to be tested in § . [47, 64, 70]. To give a brief overview, one seeks the dielectric response of the HEGwhen subject to a perturbation in the charge density, δρ induced ( r ) (caused, for example,by placement of a test charge in the system). Under the assumptions of TF theory,this perturbation in the density results in a constant and local perturbation to theelectrostatic field in the medium, δρ induced ( r ) = k tf π δv ( r ) , (3.19)which is to say the susceptibility is constant. Here, δv ( r ) is introduced as the change inthe potential that will propagate throughout the medium as a result of the perturbationin the density. One can thus solve the relevant Poission equation to determine how themedium will respond to this perturbation, ∇ δv ( r ) = − πδρ induced ( r ) = − k tf δv ( r ) , (3.20)= ⇒ δv ( r ) ∼ e − k tf | r | | r | . (3.21)Therefore, a test body of charge q placed at r will not, as one might expect, result inthe medium experiencing the Coulomb potential for all r (cid:48) (cid:54) = r . Instead, the Coulombpotential is screened by a factor which decays exponentially away from r over somecharacteristic length scale k − tf . This defines the Yukawa potential , Eq . (3.21). To capturethe dielectric response of the HEG in real space hence requires a non-local integral overthe Yukawa kernel . This is, in fact, the form in which the Kerker preconditioner wasoriginally proposed by Manninen et al . in Ref . [64]. The non-local nature of the dielectricrenders real space Kerker preconditioning inefficient. However, if the mixing were to bedone in real space, one can still apply the operators in Fourier space. A computationallyefficient application of the Kerker preconditioner to a real space object is as follows.First, an accelerated mixing scheme is used to determine the unpreconditioned updatein real space, J − n R [ ρ in n ( r )] := b ( r ). Then, the preconditioned update, δρ n ( r ), solves thelinear system, (cid:16) I − K c χ (cid:17) δρ n ( r ) = b ( r ) . (3.22)Since χ is constant and K c is diagonal in Fourier space, the solution to this linearsystem becomes straight forward. One simply performs a Fourier transform to obtain Alternatively, one can solve a modified Poission equation on a real space grid, as is done in Ref . [70]. . 
Methodology b ( G ), inverts the dielectric, applies it to ˜ b ( G ), and then inverse Fourier transforms toobtain δρ n ( r ). This procedure thus defines Kerker preconditioning when mixing is donein real space.The difficulty of including inhomogeneity now becomes apparent. Instead of the suscep-tibility being constant across the entire unit cell, leading to χ ( r , r (cid:48) ) = γδ ( r − r (cid:48) ), thefollowing work now seeks to include a local r -dependence to capture the inhomogeneity, χ ( r , r (cid:48) ) = ψ ( r ) δ ( r − r (cid:48) ). The object ψ ( r ) now defines some model for the susceptibility,yet to be derived. Therefore, within this framework, the susceptibility is not diagonal inFourier space. Hence, the dielectric is diagonal in neither space, meaning the solutionto the linear system in Eq . (3.22) has become non-trivial.In order to solve Eq . (3.22) with an r -dependent susceptibility, one could start by con-structing ˜ χ ( G , G (cid:48) ) directly from ψ ( r ). This turns out to be inefficient for variousreasons, but will serve to highlight key features of the framework, and will hence bedetailed. If the linear response functions are defined in the following way, δρ out ( r ) = (cid:90) d r (cid:48) χ ( r , r (cid:48) ) δv h ( r (cid:48) ) , (3.23) δ ˜ ρ out ( G ) = (cid:90) d G (cid:48) ˜ χ ( G , G (cid:48) ) δ ˜ v h ( G (cid:48) ) , (3.24)then it is clear that ˜ χ ( G , G (cid:48) ) (cid:54) = F [ χ ( r , r (cid:48) )] as a two-dimensional convolution is re-quired. However, one can notice that if the real space response is local, then the Fourierspace response becomes a Toeplitz matrix . That is, for some local susceptibility ψ ( r ),the corresponding Fourier space response is˜ χ ( G , G (cid:48) ) = ˜ ψ ( G − G (cid:48) ) (3.25)where ˜ ψ ( G ) = F [ ψ ( r )], which defines the functional form of a Toeplitz matrix. Once theToeplitz matrix of ˜ ψ ( G ) has been constructed, the linear system in Eq . (3.22) now definesa Toeplitz system , which can be solved in O ( N G log N G ) operations [71]. However, thismethod remains impractical for large systems, both in terms of memory (constructingthe Toeplitz matrix) and computational (solving the resultant linear system) overhead .An efficient method to solve Eq . (3.22) is thus required, suitable for N G ∼ O (10 )calculations. This was done by utilising an iterative diagonaliser, and the algorithm isoutlined as follows. Initially, this was the method used to construct the preconditioner and the difficulties discussedbecame apparent in testing. . Methodology Algorithm 2 : Solve (cid:16) I − K c χ (cid:17) ˜ x ( G ) = ˜ b ( G ) Input : The vector to be preconditioned J − n R [ ρ in n ( G )] := ˜ b ( G ) Output : The preconditioned vector ˜ x ( G ) • Initial guess: ˜ x ( G ) • Store the initial guess: ˜ x (cid:48) ( G ) ← ˜ x ( G ) • Transform to real space: x ( r ) ← F − [˜ x ( G )] • Apply the susceptibility: x ( r ) ← (cid:82) d r (cid:48) ψ ( r (cid:48) ) δ ( r − r (cid:48) ) x ( r (cid:48) ) • Transform to Fourier space: ˜ x ( G ) ← F [ x ( r )] • Apply the Coulomb kernel: ˜ x ( G ) ← K c ( G )˜ x ( G ) • Compute and store the final result: ˜ x ( G ) ← ˜ x (cid:48) ( G ) − ˜ x ( G ) • Construct the residual: r = ˜ x ( G ) − ˜ b ( G ) • if || r || < tol exit: ˜ x (cid:48) ( G ) is the solution • else update via an iterative algorithm: ˜ x ( G ) = ˜ x (cid:48) ( G ) + . . . 
• Repeat until convergence of the iterative solverThis algorithm requires two FFTs and 2 N G floating point multiplications per step in theiterative solver (ignoring solver specific constructions). If the iterative solver converges in O (10) iterations, there is an additional computational overhead of approximately twentyFFTs and ∼ N G floating point multiplications. This is negligible as castep executes O (10 ) FFTs in a typical calculation [72]. Moreover, the storage requirements are min-imal as only O (1) N G -length vectors are needed. Therefore, the methodology presenteddefines a computationally and memory efficient procedure for applying an inhomoge-neous preconditioner in Fourier space. The preconditioner is now entirely specified bythe susceptibility diagonal, ψ ( r ). Before determining some preliminary models for ψ ( r ),a note on the implementation of the above algorithm can be given. Namely, an iterativesolver must be chosen that is appropriate for the above scheme. Initially, the (diago-nally preconditioned) Jacobi and Gauss-Siedel solvers were tested [73]. However, thesewere eventually found to be insufficient. The reason for this is that both the Jacobi. Methodology diagonally dominant . Theoptimal value for the susceptibility in vacuo is χ = ψ = 0 (detailed shortly). Therefore,for input systems containing above some threshold amount of vacuum, the Jacobi andGauss-Siedel solvers tended to diverge. This was remedied by implementing a conjugategradient algorithm, the method for which can be found in Ref . [73]. Note, however, thatthis is still a preliminary treatment designed for testing purposes only. A truly optimalsolver here would require a more careful consideration of the linear system at hand. Forexample, to improve robustness, one could consider implementing a Krylov-subspacebased method such as the GMRES detailed in § § ψ considered in this work is derivedby noticing that the exact susceptibility for the vacuum is zero. That is, the dielectricresponse of the vacuum is unity. Therefore, one can define ψ in the following way, ψ ( r ) =0 for ρ out n ( r ) in vacuum (3.26) ψ ( r ) = γ for ρ out n ( r ) in bulk , such that the bulk is treated with the Kerker model, and the vacuum is treated exactly.It is expected that this model will perform well for surface or slab input systems, andperform approximately equivalently to Kerker preconditioning for bulk systems. Thismodel will hereafter be referred to as the vacuum scaled (VS) susceptibility model.The second model, inspired in part by the work of Scherlis et al . [74], parametrises thesusceptibility with a smooth density dependence of the following form ψ ( r ) = α (cid:32) ρ out n ( r ) ρ (cid:33) β , (3.27)for some α , β , and normalisation ρ . Ideally, the parameters α and β are derived fromphysical theory to give a well-motivated, unique model. This is done in the followingwork by considering an inhomogeneous variant of TF theory. That is, the TF screeningwavevector is typically derived in terms of a constant HEG density ρ heg , given by k tf = 4 (cid:32) ρ heg π (cid:33) / , (3.28) In fact, Gauss-Siedel requires the system to be diagonally dominant or symmetric and positivedefinite. However, while the KS dielectric presented here is manifestly symmetric, it is not necessarilypositive definite. Positive-definiteness depends on the model of ψ . . Methodology ρ heg is determined by the HEG density parameter, r s (see Ref . [28]). 
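A compact sketch of this application step is given below (Python/NumPy on a uniform FFT grid; the grid set-up, the Coulomb-kernel convention and the choice of a GMRES solver are illustrative assumptions rather than the castep implementation, which uses a conjugate gradient solver). It solves (I − K_c χ)x = b for a local model susceptibility χ(r, r′) = ψ(r)δ(r − r′), applying ψ in real space and the Coulomb kernel in Fourier space, as in Algorithm 2.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def apply_inhomogeneous_preconditioner(b_g, psi_r, g_sq, cell_volume):
    """Solve (I - K_c chi) x = b in Fourier space for a local model
    susceptibility chi(r, r') = psi(r) delta(r - r'), following Algorithm 2.

    b_g         : unpreconditioned update on the Fourier-space grid
    psi_r       : model susceptibility on the matching real-space grid
    g_sq        : |G|^2 on the Fourier-space grid (the G = 0 entry is skipped)
    cell_volume : unit cell volume in the Coulomb kernel 4*pi/(Omega*|G|^2)
    """
    shape = b_g.shape
    kc = np.zeros_like(g_sq, dtype=float)
    mask = g_sq > 0.0
    kc[mask] = 4.0 * np.pi / (cell_volume * g_sq[mask])

    def matvec(x_flat):
        x_g = x_flat.reshape(shape)
        chi_x_r = psi_r * np.fft.ifftn(x_g)      # apply psi(r) in real space
        kc_chi_x = kc * np.fft.fftn(chi_x_r)     # apply the Coulomb kernel in G-space
        return (x_g - kc_chi_x).ravel()          # (I - K_c chi) x

    operator = LinearOperator((b_g.size, b_g.size), matvec=matvec, dtype=complex)
    x_flat, info = gmres(operator, b_g.ravel())  # two FFTs per solver iteration
    return x_flat.reshape(shape)
```

A vacuum-scaled model of the kind introduced in Eq. (3.26) would, for instance, amount to passing psi_r = np.where(rho_out_r > threshold, gamma, 0.0), where the threshold and bulk value γ are illustrative parameters.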
One can nowtake a similar approach to § ψ ( r ) = 4 (cid:32) ρ out n ( r ) πρ (cid:33) / , (3.29)hereafter referred to as the inhomogeneous Thomas-Fermi (ITF) model. Again, it isexpected that this model will provide an improved treatment for slab and surface systemsdue to it possessing the correct vacuum limit, ψ → ρ out n →
0. Furthermore, a density dependence of this form is expected to provide a more nuanced treatment of the charge screening resulting from 'rapid' (on the scale of the unit cell) variations in the density – so-called 'local field effects' [65]. However, this preconditioner is not expected to improve ill-conditioning due to material interfaces that include both metallic and insulating regions. This is because appropriate preconditioning would require determination of the electronic behaviour of each region.

Results and Discussion
Results and Discussion

Density mixing schemes will be judged and compared on two criteria: efficiency and reliability. Efficiency will be measured using iteration count, rather than wall clock time, as the two are equivalent in the present context. That is, the time expended in the mixing step itself is assumed to be negligible compared to that of diagonalising the Hamiltonian, for all schemes presented, as discussed previously. Reliability, on the other hand, refers to whether a scheme is able to converge a given input system at all, not considering the number of iterations it takes to achieve this convergence. Clearly, then, one needs to quantify how one weights reliability against efficiency for density mixing. For a default method, reliability is, in some sense, more important than efficiency. Thus, any of the schemes to follow that are able to converge a wide range of systems without failure will be compared favourably against less reliable, but perhaps more efficient, schemes. That is, of course, as long as this decrease in efficiency is no more than an O(1) factor (at the risk of density mixing otherwise becoming a worse option than EDFT).

With a rubric in place to judge competing methods, one can now question what improvements are realistically achievable, in order to set a standard for the following schemes. The absolute best-case scenario for density mixing methods would be to match the efficiency and reliability of a scheme utilising an exact susceptibility computation. Even such a scheme requires a few iterations for systems involving significant charge transfer (for example, toward oxygen anions in ionic oxides), since the initialisation of the charge density will not reflect this bonded structure, instead initialising the lattice with a more covalent character. Accounting for this, the best-case scenario would be convergence across all input systems in 3-7 iterations, as concluded by Ref. [59]. Increasingly sophisticated and tuned density mixing schemes should thus approach the limit of 3-7 iterations across all input systems.

In order to properly assess the efficiency and robustness of any changes made to castep, a suite of representative test systems is constructed and motivated. In creating this test suite, a particular emphasis was put on sloshing-prone systems; the full suite is listed in Appendix B. When a representative comparison is done utilising the full test suite, the following DFT parameters are used.

– Exchange-correlation functional: Perdew, Burke & Ernzerhof (PBE) [76]
– Cut-off energy: converged to approximately 0.03 eV/atom
– Spin unpolarised
– Monkhorst-Pack k-point spacing: 0.05 Å⁻¹

Any analysis on individual systems will also use these parameters, unless stated otherwise. As an exemplar, the test suite can be used to compare the two standard density mixing methods in castep: Broyden's method (Johnson's variant [51]) and Pulay's method, both Kerker preconditioned. From Fig. (4.1) it can be seen that Pulay mixing results in a significant improvement over Broyden mixing. For one system, a far-from-equilibrium phase of TiK, Broyden mixing even diverges where Pulay mixing is able to converge. Indeed, after some preliminary analysis on AIRSS output, it was often found that Pulay mixing was significantly more robust than Broyden mixing. The total number of SCF cycles taken to converge the test suite can also be compared: by this measure, Pulay mixing was found to be approximately 14% more efficient than Broyden mixing. Moreover, Pulay mixing appears to be more robust, although a methodical study of robustness here is yet to be done. The caveat, however, is that Pulay's method requires the inversion of a potentially singular matrix when close to convergence. This can be fixed with relative ease by including some form of regularisation that does not interfere with the scheme. Therefore, it is suggested that Pulay mixing perhaps be made the default density mixer in castep.
Figure 4.1:
Comparison of the SCF cycles to converge for Broyden's and Pulay's methods using the test suite. Data points in the lower triangle signify an improvement for Pulay's method, whereas data points in the upper triangle signify an improvement for Broyden's method, as labelled. Data points lying on the edge of the plot have failed to converge. The 'net winner', by total number of SCF cycles over the whole test suite, is coloured green.
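For reference, the Pulay (DIIS) extrapolation and the regularisation discussed above can be sketched as follows. This is an illustrative numpy routine, not castep's implementation; the Tikhonov shift reg, the Kerker parameter values, and the function names are assumptions of the sketch.

    import numpy as np

    def kerker(res_G, G2, alpha=0.8, G0=1.5):
        # Kerker-preconditioned residual, P(G) = alpha * G^2 / (G^2 + G0^2).
        # alpha and G0 (in inverse Angstroms) are illustrative values.
        return alpha * G2 / (G2 + G0 ** 2) * res_G

    def pulay_step(rho_in_hist, res_hist, G2, reg=1e-8):
        # One Pulay (DIIS) update from histories of input densities and residuals.
        # A small Tikhonov shift `reg` keeps the residual overlap matrix invertible
        # when the residuals become nearly linearly dependent close to convergence.
        m = len(res_hist)
        B = np.empty((m + 1, m + 1), dtype=complex)
        for i in range(m):
            for j in range(m):
                B[i, j] = np.vdot(res_hist[i], res_hist[j])
        B[:m, :m] += reg * np.eye(m)       # regularisation
        B[m, :m] = -1.0                    # Lagrange multiplier row ...
        B[:m, m] = -1.0                    # ... and column enforcing sum(c) = 1
        B[m, m] = 0.0
        rhs = np.zeros(m + 1, dtype=complex)
        rhs[m] = -1.0
        c = np.linalg.solve(B, rhs)[:m]    # extrapolation coefficients
        return sum(ci * (r_in + kerker(r, G2))
                   for ci, r_in, r in zip(c, rho_in_hist, res_hist))

The regularisation simply shifts the eigenvalues of the residual overlap matrix away from zero, so the constrained linear system remains solvable even when the stored residuals become nearly parallel.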
Marks & Luke

The results of Fig. (4.1) demonstrate that the most effective default scheme in castep for achieving self-consistency is Kerker preconditioned Pulay mixing. Hence, Kerker preconditioned Pulay mixing will be taken as the reference method of comparison for all subsequent testing. First, the Marks & Luke scheme is tested, with a history length of m = 30. Furthermore, unless stated otherwise, the default castep Kerker preconditioner parameters will be used: the default mixing amplitude α, and |G| = 1.5 Å⁻¹.

In the original work of Marks & Luke, the scheme was preconditioned using a (safeguarded) linear scaling parameter. Therefore, an initial test of both MSB1 and MSB2 is provided, preconditioned with a simple linear scaling parameter, in an attempt to match the results of Marks & Luke and thus verify the correctness of the implementation. This amounts to setting the preconditioning matrix P in Eq. (3.9) equal to a scalar constant, σ. After a preliminary parameter analysis, a value of σ was found that gave the best overall performance, and this value is used throughout. (Unfortunately, there is no clear map between the value of the linear scaling parameter used in wien2k and the value used in the following work, due to inherent implementation-specific differences between wien2k and castep.) As expected, Fig. (4.2) demonstrates that neither MSB1 nor MSB2 in this form is able to provide a systematic improvement over dielectric preconditioned Pulay mixing. However, given the nature of linear preconditioning discussed earlier, this is unsurprising; the results of Fig. (4.2) confirm the correctness of the implementation, and an attempt at dielectric preconditioning can now be made.

It is not obvious in the initial presentation of the Marks & Luke scheme how the update should be preconditioned for optimal performance. Hence, the scheme was reassessed taking particular care to track the position of the initial guess Jacobian. As dielectric preconditioning should iteratively pre-multiply the initial guess Jacobian, this analysis highlighted the correct placement for the preconditioning matrix P in the Marks & Luke update, Eq. (3.9). As such, MSB1 and MSB2 are tested using the diagonal Kerker form for P, the results for which are displayed in Fig. (4.3). As expected, dielectric preconditioning leads to a performance improvement across the whole test suite, and MSB2 remains superior to MSB1. However, Kerker preconditioned MSB2 mixing is 9% less efficient across the whole test suite than Pulay mixing, and still fails to converge one system, a far-from-equilibrium phase of TiK. This is perhaps a shortcoming of the preconditioner, rather than the underlying method, as Fig. (4.4) demonstrates. That is, the TiK phase was found to be prone to band sloshing, requiring the calculation of many additional, unoccupied bands to reach convergence. This suggests the material has a large density of states at the Fermi level, and is thus also prone to Coulomb sloshing. As such, MSB2 and Pulay mixing were performed on TiK with an appropriately adjusted Kerker parameter of |G| = 3.5 Å⁻¹ to better account for Coulomb sloshing in the preconditioner. In doing so, MSB2 and Pulay mixing were found to converge at the same rate (Fig. (4.4)), suggesting that, with an optimised set of Kerker parameters, MSB2 could rival the efficiency and robustness of Pulay mixing. Furthermore, it can be noted that MSB2 is in fact an improvement over Broyden mixing as implemented in castep, Fig. (4.5).
Figure 4.2:
Comparison of linear preconditioned MSB1 (top) and MSB2 (bottom) to Kerker preconditioned Pulay mixing over the full test suite.
Figure 4.3:
Comparison of Kerker preconditioned MSB1 (top) and MSB2 (bottom) to Kerker preconditioned Pulay mixing over the full test suite.
Figure 4.4:
Convergence of a far-from-equilibrium TiK phase using MSB2 and Pulay mixing (residual versus SCF cycle). Pulay mixing is shown with |G| = 1.5 Å⁻¹ and |G| = 3.5 Å⁻¹, and MSB2 with |G| = 3.5 Å⁻¹; the schemes were made competitive by an appropriate adjustment of |G|.

However, the results presented here demonstrate that no meaningful improvement can be made using the Marks & Luke scheme over Pulay mixing as implemented in castep. Lastly, it is worth also comparing the schemes for an increasingly ill-conditioned system whereby no new physics is introduced as a result of this ill-conditioning. To do this, an Aluminium surface is converged with an increasing vacuum gap of between 5 Å and 23 Å, Fig. (4.6). Any additional SCF cycles required once the total energy of the structure no longer changes with vacuum gap are purely a numerical artefact, rather than a result of the physical system at hand. In principle, then, a sophisticated scheme should be able to remove this numerical artefact completely, resulting in little-to-no additional iterations with increasing vacuum (as, for example, in Ref. [77]). Both schemes suffer from a near linear relationship between iterations to converge and vacuum gap, suggesting ill-conditioning of this form is best dealt with in the preconditioner.
Figure 4.5:
Comparison of Kerker preconditioned MSB2 to Kerker preconditioned Broyden mixing over the full test suite.
Figure 4.6:
SCF cycles to converge with increasing vacuum distance for an Aluminium surface using MSB2 and Pulay mixing.

Periodic Pulay mixing is tested in a similar fashion to the Marks & Luke scheme, by first comparing the method against Kerker preconditioned Pulay mixing across the test suite. The parameters suggested by Banerjee et al. were found to give approximately the best compromise between DIIS steps and linear steps. That is, two linear steps per DIIS step, k = 3, with a linear step length σ = 0.1, Fig. (4.7).
Figure 4.7:
Comparison of Periodic Pulay (two linear steps per DIIS step) and Pulay mixing.
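The Periodic Pulay loop itself can be summarised schematically as below. Here scf_map stands for one construction and diagonalisation of the KS Hamiltonian, pulay_step for a DIIS extrapolation of the kind sketched earlier (with any preconditioner arguments bound), and the convergence test and default parameters are illustrative choices of the sketch.

    import numpy as np

    def periodic_pulay(rho0, scf_map, pulay_step, k=3, sigma=0.1, tol=1e-7, max_iter=100):
        # Periodic Pulay mixing: every iterate is added to the history, but the DIIS
        # extrapolation is performed only on every k-th step; the remaining steps are
        # plain linear (fixed-point) steps of length sigma.
        rho_in, rho_hist, res_hist = rho0, [], []
        for it in range(1, max_iter + 1):
            rho_out = scf_map(rho_in)          # one Hamiltonian build + diagonalisation
            res = rho_out - rho_in
            if np.linalg.norm(res) < tol:
                return rho_in, it
            rho_hist.append(rho_in)
            res_hist.append(res)
            if it % k == 0:
                rho_in = pulay_step(rho_hist, res_hist)   # DIIS extrapolation
            else:
                rho_in = rho_in + sigma * res             # linear step
        return rho_in, max_iter

With k = 3 this performs two linear steps per DIIS step, matching the parameters quoted above; every step, linear or DIIS, still contributes its input density and residual to the history.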
Counter to the conclusions of Banerjee et al., Periodic Pulay mixing is found to be far less robust than Pulay mixing, resulting in a failure to converge for three systems. However, for simple systems (particularly insulating systems), Periodic Pulay mixing is found to accelerate convergence over standard Pulay mixing. This is not a proof of concept for the Periodic Pulay scheme per se, as simple insulating systems are best converged with linear mixing anyway. Over the whole test suite, the improvement in efficiency is minor compared to the drastic decrease in reliability. Hence, Periodic Pulay mixing is not in a position to replace Pulay mixing in castep, as the gain from an improved DIIS extrapolation is not enough to overcome the inefficiency of the linear steps. Interestingly, however, the concept behind Periodic Pulay is displayed well by studying the convergence patterns of individual systems. First, Fig. (4.8) displays convergence of an Aluminium surface.
Figure 4.8:
Comparison of Periodic Pulay (two linear steps per DIIS step) and Pulay mixing for an Aluminium surface (residual versus SCF cycle; the two linear steps in each period are labelled).
As predicted, the linear steps (labelled) contribute little toward reducing the residual themselves, but the DIIS extrapolation utilises this information extremely well. As such, the DIIS extrapolation is able to take a far more efficient step toward convergence than in regular Pulay mixing. This results in slightly accelerated convergence for the Aluminium surface. However, the drawback of Periodic Pulay is highlighted by studying the convergence behaviour of a more complex system, a far-from-equilibrium phase of LiCu, Fig. (4.9).
Figure 4.9:
Comparison of Periodic Pulay (two linear steps per DIIS step) and Pulay mixing for a far-from-equilibrium phase of LiCu (residual versus SCF cycle).

For this system the linear steps hinder rather than help, and Periodic Pulay mixing converges more slowly than standard Pulay mixing. One might hope that preconditioning the linear steps with the Kerker dielectric, rather than a scalar, would remedy this. However, as Fig. (4.10) shows, this in fact results in further deceleration, and is less efficacious than using scalar preconditioned linear steps. The reason for this is that the scalar preconditioned linear steps were designed to provide the history with a certain type of information that a sophisticated mixing scheme would not contribute. Conversely, dielectric preconditioned linear steps explicitly attempt to predict the behaviour of a sophisticated mixing scheme. Hence, the advantage of adding 'non-sophisticated' information to the history is negated. This argument is heuristic in nature, and as Ref. [14] notes, the mathematics behind Periodic Pulay mixing is poorly understood. Perhaps a more rigorous study of the mathematics would reveal how best to utilise this concept, which clearly works (to an extent) from an empirical standpoint.
Figure 4.10:
Comparison of Periodic Pulay (two Kerker steps per DIIS step) and Pulay mixing.

Analysing the effectiveness of a dielectric model to precondition the SCF cycles requires some additional considerations beyond the previous testing. First, density mixing is now performed on the full reciprocal space grid. That is, the mixing sphere defined earlier is extended to the full G-vector grid, meaning density mixing is done for all Fourier components. Moreover, the mixing metric is also disabled, as it can interfere with conclusions drawn regarding how effective a given dielectric model is at suppressing instabilities. In order to ensure the implementation of the framework is correct, it was first verified against the Kerker preconditioner by setting ψ(r) = γ. The conjugate gradient solver was converged to near machine precision, and the framework indeed exactly reproduced the behaviour of the Kerker preconditioner.

The VS susceptibility model is analysed first. To do this, one can test the accuracy of just the dielectric model across the test suite, without reference yet to accelerated mixing schemes. That is, compare VS preconditioned linear mixing to Kerker preconditioned linear mixing across the test suite, Fig. (4.11).
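As an aside, the constant-ψ verification described above can be made explicit. Adopting, purely for illustration, the convention in which the dielectric acts as ε = 1 + (v_C/4π) diag(ψ) with v_C(G) = 4π/G² (the precise scaling in the implementation may differ), a constant ψ(r) = γ makes the operator diagonal in Fourier space,

ε(G) = 1 + γ/G²,

so the preconditioned residual becomes

ε⁻¹(G) R(G) = [ G² / (G² + γ) ] R(G),

which is exactly the Kerker form, with γ playing the role of the squared screening wavevector; γ → 0 recovers the bare residual, consistent with the exact treatment of vacuum in the VS model.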
Figure 4.11:
Comparison of Kerker preconditioned linear mixing to VS preconditioned linear mixing over the test suite.
As expected, simple dielectric (linear) mixing is far less efficient and robust than the accelerated mixing schemes tested above: iteration counts reach O(100), with multiple failures from both dielectric models. Perhaps unexpectedly, the VS susceptibility model is unable to demonstrate an improved efficiency for any system in the test suite. The reason can be seen from the preconditioned linear mixing update,

ρ_{n+1}^in = ρ_n^in + ε⁻¹ ( ρ_n^out − ρ_n^in ).    (4.1)

For the input vacuum region, defined as ρ_n^in(r) = 0, the KS map will return ρ_n^out(r) = 0 for the vast majority of the vacuum (this is not true close to the interface), leading to a zero residual. That is to say, the initial guess is exactly correct for most of the vacuum region, and therefore treating the vacuum with an exact dielectric will contribute little to the scheme, as seen clearly in Eq. (4.1). Ill-conditioning due to the vacuum region is therefore equivalent to ill-conditioning due to an increased unit cell size. (This highlights the difference between numerical ill-conditioning and 'physically sourced' ill-conditioning: the exact dielectrics of a primitive cell and a supercell of some material are formally equivalent, yet the supercell still requires more iterations to converge as a result of what is referred to here as numerical ill-conditioning.) In order for the preconditioner to suppress this form of ill-conditioning, a different approach would be required to that presented here.

It is clear from the above analysis that including a density dependence in the susceptibility will not remove ill-conditioning due to increased unit cell size. However, this density dependence is expected to provide a better treatment of the dielectric for regions where the initial guess is incorrect and an accurate dielectric is therefore required. The parametrisation of the density dependence studied is that of the ITF susceptibility model; Fig. (4.12) displays ITF preconditioned linear mixing against Kerker preconditioned linear mixing. As expected, there is much more variance in the iterations to converge than in the VS study. In particular, the two largest improvements for the ITF model in Fig. (4.12) (the data points toward the upper right of the ITF-winning region) are graphene and an Aluminium surface, both containing a vacuum gap. Over the entire test suite, however, the ITF susceptibility does not provide a better model treatment of the exact dielectric than the Kerker form. This conclusion remains the same when the ITF susceptibility model is used to precondition accelerated mixing schemes such as Pulay mixing, Fig. (4.13). When utilising an accelerated scheme, the preconditioners perform comparably, but the ITF parametrisation clearly leads to a systematic, net decrease in efficiency. This is likely due to the heuristic nature with which Thomas-Fermi theory was extended for inhomogeneous systems.
Figure 4.12:
Comparison of Kerker preconditioned linear mixing to ITF preconditioned linear mixing over the test suite.

However, as discussed, both models for ψ presented here are preliminary in nature, and were intended to demonstrate the framework rather than give a nuanced model treatment of the dielectric. As such, an immediate avenue for further work will be to derive a rigorous, well-motivated model for the real space susceptibility (see the conclusion). In the meantime, the peak performance available from the framework can be gauged by manually optimising the parameters of Eq. (3.27), i.e. optimising α and β by hand. Fig. (4.14) illustrates the result of this parameter analysis for an Aluminium surface, whereby setting α = 0.1 and β = 0.5 (with the Kerker parameters likewise optimised for this system) markedly improves convergence over the default preconditioners. (A note regarding on-the-fly determined parameters is given in the conclusion.) To conclude, both the VS and ITF models presented here lacked the depth required to produce an improvement over Kerker preconditioning. However, the framework was demonstrated to work, and the potential of the framework to provide an improved peak performance when both preconditioners are optimised was demonstrated.
Figure 4.13:
Comparison of Kerker preconditioned Pulay mixing to ITF preconditioned Pulay mixing over the test suite.
Figure 4.14:
Comparison of the convergence behaviour for an Aluminium surface using Pulay mixing with various preconditioners (residual versus SCF cycle). First, Pulay mixing is preconditioned with the default Kerker form and the ITF form. Second, both the Kerker and the density scaled susceptibility models have optimised parameters for the input system (density scaled: α = 0.1, β = 0.5; Kerker: α = 0.4, |G| = 0.6).

Conclusion
This thesis has described and presented three separate approaches that attempt to improve the efficiency and reliability with which self-consistency can be attained in KS DFT implementations. These approaches were implemented within the planewave pseudopotential electronic structure software castep. The default methods of castep were used as a benchmark for the proposed improvements of this thesis.

First, multisecant variants of Broyden's method for non-linear root finding were considered, referred to as MSB1 and MSB2. These schemes, based on the work of Marks & Luke, were made suitable for numerical implementation by the introduction of an appropriate parametrisation (including regularisation and normalisation). Most importantly, the scheme was adapted for complex electron densities and dielectric preconditioning. With these augmentations in place, the methods were revealed to be insufficient in providing a systematic improvement over the best-performing default method in castep, Pulay mixing. However, MSB2 mixing was only slightly less efficient than Pulay mixing, and notably outperformed Broyden mixing as implemented in castep. This is a proof of concept for the work of Marks & Luke, but the methods remain unable to improve on the current standard without further adaptation. Second, the Periodic Pulay mixing scheme of Ref. [14] was implemented. This scheme embodies quite a general concept in numerical analysis, namely, how to construct the iterative subspace such that the DIIS is best utilised [78]. It was revealed that Periodic Pulay mixing could provide an improvement in efficiency over Pulay mixing when two linear steps are added to the history per DIIS step. However, this resulted in a far less robust scheme than Pulay mixing, as the hindrance from taking the linear steps caused divergence for increasingly ill-conditioned input systems. Nonetheless, there is promise for Periodic Pulay mixing if a rigorous study of how best to utilise the concept can be done, as discussed in Ref. [14].
The real space susceptibility models presented in this thesis were preliminary and lacking the sophistication required to improve the efficiency of density mixing. As such, immediate future work will be to derive a rigorous, well-motivated model for the real space susceptibility that can be used to precondition the self-consistent cycles. Furthermore, Kresse et al. in Ref. [46] detail how an on-the-fly optimisation of the linear scaling parameter, α, can be done utilising the iterative subspace of densities. This will be implemented within castep, and on-the-fly updates of other parameters will be studied in a similar fashion.

In an effort to implement the GW approximation in castep, exact methods for calculating the KS susceptibility have also been implemented [79]. In this implementation, the KS susceptibility is constructed either using numerical derivatives, or alternatively by use of the Sternheimer equation. Therefore, these exact computations of the KS susceptibility will be adapted to precondition the SCF cycles of density mixing. In doing so, an up-to-date analysis of the effectiveness of exact dielectric preconditioning can be performed.

Throughout this work spin was ignored. Density mixing for spin-polarised input systems is dealt with far more poorly than charge density mixing. The reason for this is that the linear response function for spin densities is entirely defined by the exchange-correlation contribution to the KS Hamiltonian. However, the current preconditioning strategy is simply to apply the Kerker preconditioner, constructed with no reference to this spin response.

Derivation of Newton's Method
Newton’s method for finding x ∗ such that f ( x ∗ ) = 0 starting from an initial guess x willbe derived. Firstly, the assumption that x is close to the root will be made, such thatthe addition of a small perturbation vector e ∈ R n yields the root, f ( x + e ) = 0. Thegoal is to therefore find the e that satisfies this equation, or in practice, an e that willdrive subsequent values of x closer to satisfying it. To find this e , the Taylor expansionof f i ( x + e ) about x is computed, f i ( x + e ) = f i ( x ) + (cid:88) j ∂f i ( x ) ∂x j h j + O ( || e || ) , (A.1) f ( x + e ) = f ( x ) + J f ( x ) e + O ( || e || ) . (A.2)where J f ( x ) is the Jacobian matrix of f at x – J f ( x ) ij = ∂f i ∂x j . The larger || e || is, thefurther away the Taylor expansion will deviate from the exact evaluation of the function.Ideally, the initial e that is calculated will bring x much closer to the root (provided itwas ‘close’ to the root to begin with), but will over or undershoot due to the first ordernature of the Taylor expansion. Taylor expansions are then repeatedly done as e (theabsolute error in x − x ∗ ) tends to zero. This defines the self-consistent process to find x ∗ from a suitable initial value, x n +1 = x n + e . (A.3)Where e is obtained from Eq . (A.2) via f ( x + e ) = = ⇒ e = − J − f ( x ) f ( x ) , (A.4)producing x n +1 = x n − J − f ( x ) f ( x ) . (A.5)This the famous Newton-Raphson update formula.72 est Suite Many of the materials presented here have been collated from literature on ill-conditionedKS systems. Hence, further motivation for the set of systems to follow can be found in[10, 12, 14, 47, 70, 77, 80]. However, key features of interest and potential sources ofill-conditioning are listed below each input.1.
Test Suite

Many of the materials presented here have been collated from literature on ill-conditioned KS systems. Hence, further motivation for the set of systems to follow can be found in [10, 12, 14, 47, 70, 77, 80]. However, key features of interest and potential sources of ill-conditioning are listed below each input.

1. Aluminium
   • Metallic
2. Aluminium Surface
   • Inhomogeneous (10 Å vacuum gap)
   • Interface of metallic and insulating regions
3. Aluminium Supercell
   • Large, asymmetric unit cell
4. Argon
   • Wide-gap insulator
5. Gallium Arsenide
   • Inhomogeneous
   • Semiconductor
6. Graphene
   • Semimetal
   • Inhomogeneous (10 Å vacuum gap)
7. Isolated Oxygen
   • Localised
   • Inhomogeneous (10 Å vacuum gap in all dimensions)
8. AIRSS Output: Potassium, Phosphorous, and Tin
   • Atomic configuration far-from-equilibrium
9. AIRSS Output: Lithium and Copper
   • Atomic configuration far-from-equilibrium
   • Contains a transition metal
10. AIRSS Output: Titanium and Potassium
    • Atomic configuration far-from-equilibrium
    • Contains a transition metal
11. Magnesium Oxide
    • Prototypical polar oxide
    • Wide-gap insulator
12. Palladium
    • Transition metal
13. Silicon
    • Prototypical simple semiconductor
14. Strontium
    • Metal
15. Titanium Oxide
    • Transition metal oxide
16.
    • Metallic
    • Inhomogeneous (10 Å vacuum gap in two dimensions)

References

[1] R. M. Martin. Electronic Structure: Basic Theory and Practical Methods.
Cambridge University Press; 1st edition, (2008). [2] R. A. Friesner. Ab initio quantum chemistry: Methodology and applications.
Proc.Nat. Academy. Sci. 102, 19 , (2005).[3] M. Sob et al . The role of ab initio electronic structure calculations in studies of thestrength of materials.
Mat. Sci. Eng. A, 387, 148-157 , (2004).[4] G. Hagen et al . Coupled-cluster computations of atomic nuclei.
Reports on Progress in Physics, 77, 9, (2014). [5] P. Hasnip et al. Density functional theory in the solid state.
Phil. Trans. R. Soc.A 372: 20130270 , (2014).[6] D. R. Bowler. Density functional theory: a tale of success in three codes.
J. Phys.Cond. Mat., 28, 42 , (2016).[7] K. Burke. Perspective on density functional theory.
J. Chem. Phys. 136, 150901 ,(2012).[8] K. Lejaeghere et al . Reproducibility in density functional theory calculations ofsolids.
Science, 351, 6280 , (2016).[9] C.-K. Skylaris et al . Introducing ONETEP: Linear-scaling density functional sim-ulations on parallel computers.
J. Chem. Phys., 122, 084119 , (2005).[10] P. M. Anglade and X. Gonze. Preconditioning of self-consistent-field cycles indensity-functional theory: The extrapolar method.
Phys. Rev. B 78, 045126 , (2008).[11] S. J. Clark et al . First principles methods using CASTEP.
Z. Kristallogr. 220,567–570 , (2005).[12] L. D. Marks and D. R. Luke. Robust mixing for ab initio quantum mechanicalcalculations.
Phys. Rev. B 78, 075114, (2008). [13] P. Blaha et al. WIEN2k, An Augmented Plane Wave + Local Orbitals Program for Calculating Crystal Properties.
Technische Universität Wien, Wien, (2001). [14] A. S. Banerjee et al. Periodic Pulay method for robust and efficient convergence acceleration of self-consistent field iterations.
Chem. Phys. Lett 647, 31–35 , (2016).[15] A. Georges et al . Dynamical mean-field theory of strongly correlated fermion sys-tems and the limit of infinite dimensions.
Rev. Mod. Phys. 68, 13 , (1996).[16] R. M. Martin et al . Interacting Electrons: Theory and Computational Approaches.
Cambridge University Press , (2016).[17] L. Hedin. New Method for Calculating the One-Particle Green’s Function withApplication to the Electron-Gas Problem.
Phys. Rev. 139, A796 , (1965).[18] E. E. Salpeter and H. A. Bethe. A Relativistic Equation for Bound-State Problems.
Phys. Rev. 84, 1232 , (1951).[19] D. Cremer. From configuration interaction to coupled cluster theory: The quadraticconfiguration interaction approach.
WIREs Comput. Mol. Sci., 3: 482-503 , (2013).[20] P. H. Acioli. Review of quantum Monte Carlo methods and their applications.
Journal of Molecular Structure, 394 3, 75-85 , (1998).[21] D. Cremer. Møller–Plesset perturbation theory: from small molecule methods tomethods for thousands of atoms.
WIREs Comput. Mol. Sci. 1 509–530 , (2011).[22] P. Hohenberg and W. Kohn. Inhomogeneous Electron Gas.
Phys. Rev. 136, B864 ,(1964).[23] M. Levy et al . Universal variational functionals of electron densities, first-orderdensity matrices, and natural spin-orbitals and solution of the v-representabilityproblem.
PNAS, 76, 12 , (1979).[24] T. L. Gilbert. Hohenberg-Kohn theorem for nonlocal external potentials.
Phys.Rev. B 12, 2111 , (1975).[25] K. Capele. A bird’s-eye view of density-functional theory. arXiv:cond-mat/0211443 ,(2006).[26] W. Kohn and L. J. Sham. Self-Consistent Equations Including Exchange and Cor-relation Effects.
Phys. Rev. 140, A1133 , (1965).[27] A. Schindlmayr and R. W. Godby. Density-functional theory and the v-representability problem for model strongly correlated electron systems.
Phys. Rev. B 51, 10427, (1995).
[28] Mol. Phys. 114, 7-8, (1986). [29] J. F. Janak. Proof that dE/dnᵢ = εᵢ in density-functional theory. Phys. Rev. B 18, 7165, (1978). [30] J. P. Perdew, M. Levy and V. Sahni. Exact differential equation for the density and ionization energy of a many-particle system.
Phys. Rev. A 30, 2745 , (1984).[31] M. T. Entwistle et al . Local density approximations from finite systems.
Phys. Rev.B 94, 205134 , (2016).[32] J. P. Perdew and K. Schmidt. Jacob’s ladder of density functional approximationsfor the exchange-correlation energy.
AIP Conf. Proc. 577, 1 , (2001).[33] D. Vanderbilt N. Marzari and M. C. Payne. Ensemble Density-Functional Theoryfor Ab Initio Molecular Dynamics of Metals and Finite-Temperature Insulators.
Phys. Rev. Lett. 79 7 , (1997).[34] R. Smoluchowski L. P. Bouckaert and E. Wigner. Theory of Brillouin zones andsymmetry properties of wave functions in crystals.
Phys. Rev. 50, 58 , (1936).[35] V. Heine. The Pseudopotential Concept.
Solid State Physics 24, 1-36 , (1970).[36] G. Kresse and J. Furthmuller. Efficiency of ab-initio total energy calculations formetals and semiconductors using a plane-wave basis set.
Comp. Mat. Sci. 6, 1 ,(1996).[37] X. Gonze et al . First-principles computation of material properties : the ABINITsoftware project.
Comp. Mat. Sci. 25, 478-492 , (2002).[38] P. Giannozzi et al . QUANTUM ESPRESSO: a modular and open-source softwareproject for quantum simulations of materials.
J. Phys. Cond. Mat., 21, 39 , (2009).[39] E. R. Davidson. The iterative calculation of a few of the lowest eigenvalues andcorresponding eigenvectors of large real symmetric matrices.
J. Comput. Phys. 17,87-94 , (1975).[40] Private communication with Stewart Clark. (2017).[41] M. Smith. Accelerating first principles electronic structure calculations.
First Year PhD Report (Unpublished), (2017). [42] A. C. Faul. A Concise Introduction to Numerical Analysis.
Chapman and Hall/CRC, (2016). [43] P. H. Dederichs and R. Zeller. Self-consistency iterations in electronic-structure calculations.
Phys. Rev. B 28, 5462, (1983).
[44] C. G. Broyden. A class of methods for solving nonlinear simultaneous equations. Math. Comp. 19, 577-593, (1965). [45] P. Pulay. Convergence acceleration of iterative sequences: the case of SCF iteration.
Chem. Phys. Lett. 73, 2 , (1980).[46] G. Kresse and J. Furthm¨uller. Efficient iterative schemes for ab initio total-energycalculations using a plane-wave basis set.
Phys. Rev. B, 54, 11169 , (1996).[47] L. Lin and C. Yang. Elliptic preconditioner for accelerating the self consistent fielditeration in Kohn-Sham density functional theory. arXiv:1206.2225 , (2012).[48] G. P. Srivastava. Broyden’s method for self-consistent field convergence acceleration.
J. Phys. A. 17, 6 , (1984).[49] D. Vanderbilt and S. G. Louie. Total energies of diamond (111) surface recon-structions by a linear combination of atomic orbitals method.
Phys. Rev. B. 30,6118-6130 , (1984).[50] V. Eyert. A comparative study on methods for convergence acceleration of iterativevector sequences.
J. Comp. Phys. 124, 271-285 , (1996).[51] D. D. Johnson. Modified Broyden’s method for accelerating convergence in self-consistent calculations.
Phys. Rev. B, 38, 12807–12813 , (1988).[52] D. A. Knoll et al . Jacobian-free Newton–Krylov methods: a survey of approachesand applications.
J. Comp. Phys. 193, 2, 357-397 , (2004).[53] Y. Saad and M.H. Schultz. GMRES: A generalized minimal residual algorithm forsolving nonsymmetric linear systems.
J. Sci. Stat. Comput., 7, 856-869 , (1986).[54] D. G. Anderson. Iterative Procedures for Nonlinear Integral Equations.
Journal ofthe ACM 12, 4 , (1965).[55] J. Harl. The linear response function in density functional theory.
PhD Thesis(unpublished) , (2008).[56] P. Ghosez et al . Long-wavelength behavior of the exchange-correlation kernel in theKohn-Sham theory of periodic systems.
Phys. Rev. B. 56, 12811-12817 , (1997).[57] S. L. Adler. Quantum theory of the dielectric constant in real solids.
Phys. Rev.126, 413-420 , (1962).[58] N. Wiser. Dielectric constant with local field effects included.
Phys. Rev. 46, 1002-1011, (1963).
[59] Phys. Rev. B 25, 4260-4262, (1982). [60] A. Sawamura and M. Kohyama. A Second-Variational Prediction Operator for Fast Convergence in Self-Consistent Electronic Structure Calculations.
Mater. Trans. 45,1422 , (2004).[61] J. Auer and E. Krotscheck. A rapidly converging algorithm for solving the Kohn-Sham and related equations in electronic structure theory.
Comput. Phys. Com.118, 139 , (1999).[62] L. Lin et al.
Adaptively Compressed Polarizability Operator for Accelerating Large Scale Ab Initio Phonon Calculations.
Multis. Mod. Sim. 15, 29-55 , (2017).[63] G. P. Kerker. Efficient iteration scheme for self-consistent pseudopotential calcula-tions.
Phys. Rev. B. 23, 6 , (1981).[64] M. Mannien et al . Electrons and positrons in metal vacancies.
Phys. Rev. B 12,4012 , (1975).[65] N. W. Ashcroft and N. D. Mermin. Solid State Physics.
Thomson Learning,Toronto , (1976).[66] Z. H. Levine and S. G. Louie. New model dielectric function and exchange-correlation potential for semiconductors and insulators.
Phys. Rev. B. 25, 10 ,(1982).[67] D. R. Penn. New model dielectric function and exchange-correlation potential forsemiconductors and insulators.
Phys. Rev. 128, 2093 , (1962).[68] Y. Zhou et al . On the applicability of Kerker precoditioning scheme to theself-consistent density functional theory calculations of inhomogeneous systems. arXiv:1707.00848 , (2017).[69] R. Resta. Thomas-Fermi dielectric screening in semiconductors.
Phys. Rev. B 16,2717 , (1977).[70] Y. Shiihara et al . Real-space Kerker method for self-consistent calculation usingnon-orthogonal basis functions.
Modelling and Simulation in Materials Science and Engineering, 16, 3, (2008). [71] H. Khalil et al. Superfast solution of Toeplitz systems based on syzygy reduction. arXiv:1301.5798, (2013). [72] et al. Band Parallelism in CASTEP: Scaling to More Than 1000 Cores.
Cray User Group 2009 Proceedings, 1-5 , (2009).[73] Y. Saad. Iterative Methods for Sparse Linear Systems.
Society for Industrial and Applied Mathematics; 2nd edition, (2003). [74] D. A. Scherlis et al. A unified electrostatic and cavitation model for first-principles molecular dynamics in solution.
J. Chem. Phys 124, 074103 , (2006).[75] C. J. Pickard and R. J. Needs. Ab initio random structure searching.
Journal of Physics: Condensed Matter, 23, 5, (2011). [76] J. P. Perdew et al. Generalized Gradient Approximation Made Simple.
Phys. Rev.Lett. 77, 3865 , (1996).[77] P. Hasnip and M. I. J. Probert. Auxiliary density functionals: a new class of meth-ods for efficient, stable density functional theory calculations. arXiv:1503.01420 ,(2015).[78] P. P. Pratapa et al . Anderson acceleration of the Jacobi iterative method: Anefficient alternative to Krylov methods for large, sparse linear systems.
J. Comp. Phys., 306, 43-54, (2016). [79] Private communication with Vincent Sacksteder. (2017). [80] D. Raczkowski, A. Canning, and L. W. Wang. Thomas-Fermi charge mixing for obtaining self-consistency in density functional calculations.