[PDF] Complexity Reduction in Large Quantum Systems: Reliable Electrostatic Embedding for Multiscale Approaches via Optimized Minimal Basis Functions

Abstract

Given a partition of a large system into an active quantum mechanical (QM) region and its environment, we present a simple way of embedding the QM region into an effective electrostatic potential representing the environment. This potential is generated by partitioning the environment into well defined fragments, and assigning each one a set of electrostatic multipoles, which can then be used to build up the electrostatic potential. We show that, providing the fragments and the projection scheme for the multipoles are chosen properly, this leads to an effective electrostatic embedding of the active QM region which is of equal quality as a full QM calculation. We coupled our formalism to the DFT code BigDFT, which uses a minimal set of localized in-situ optimized basis functions; this property eases the fragment definition while still describing the electronic structure with great precision. Thanks to the linear scaling capabilities of BigDFT, we can compare the modeling of the electrostatic embedding with results coming from unbiased full QM calculations of the entire system. This enables a reliable and controllable setup of an effective coarse-graining approach, coupling together different levels of description, which yields a considerable reduction in the degrees of freedom and thus paves the way towards efficient QM/QM and QM/MM methods for the treatment of very large systems.

Full PDF

aa r X i v : . [ phy s i c s . c h e m - ph ] J u l Complexity Reduction in Large Quantum Systems: Reliable Electrostatic Embeddingfor Multiscale Approaches via Optimized Minimal Basis Functions

Stephan Mohr, Michel Masella, Laura E. Ratcliﬀ,

3, 4 and Luigi Genovese

5, 6 Barcelona Supercomputing Center (BSC) ∗ Laboratoire de Biologie Structurale et Radiologie,Service de Bio´energ´etique, Biologie Structurale et M´ecanisme,Institut de Biologie et de Technologie de Saclay,CEA Saclay, F-91191 Gif-sur-Yvette Cedex, France Argonne Leadership Computing Facility, Argonne National Laboratory, Illinois 60439, USA Now at Department of Materials, Imperial College London, SW7 2AZ, United Kingdom Univ. Grenoble Alpes, INAC-MEM, L Sim, F-38000 Grenoble, France CEA, INAC-MEM, L Sim, F-38000 Grenoble, France † (Dated: September 14, 2018)Given a partition of a large system into an active quantum mechanical (QM) region and itsenvironment, we present a simple way of embedding the QM region into an eﬀective electrostaticpotential representing the environment. This potential is generated by partitioning the environmentinto well deﬁned fragments, and assigning each one a set of electrostatic multipoles, which canthen be used to build up the electrostatic potential. We show that, providing the fragments andthe projection scheme for the multipoles are chosen properly, this leads to an eﬀective electrostaticembedding of the active QM region which is of equal quality as a full QM calculation. We coupled ourformalism to the DFT code BigDFT , which uses a minimal set of localized in-situ optimized basisfunctions; this property eases the fragment deﬁnition while still describing the electronic structurewith great precision. Thanks to the linear scaling capabilities of

BigDFT , we can compare themodeling of the electrostatic embedding with results coming from unbiased full QM calculationsof the entire system. This enables a reliable and controllable setup of an eﬀective coarse-grainingapproach, coupling together diﬀerent levels of description, which yields a considerable reduction inthe degrees of freedom and thus paves the way towards eﬃcient QM/QM and QM/MM methodsfor the treatment of very large systems.

I. INTRODUCTION

Density Functional Theory (DFT) calculations basedon the Kohn-Sham formalism allow the treatment ofconsiderably larger systems than other ﬁrst principlesapproaches. However the intrinsic cubic scaling of thismethod still limits the size of the systems which can bemodeled to typically some hundred atoms. With the in-troduction of linear scaling algorithms this limit canbe considerably extended, and calculations up to severalthousand atoms can nowadays be done in a routine way .Nevertheless there are situations where it is desirable —either due to performance issues or for conceptual con-siderations — to apply an eﬀective complexity reduction(ECR), meaning that parts of the system are treated ata coarser level of theory. In this context a natural ap-proach is to “split” the system into an active quantummechanical (QM) region, which is treated at a full QMlevel of theory, and an environment, which is treated atlower computational cost. From now on, we will denotesuch a setup as an embedding of the QM region into theenvironment.Similar in spirit, but still conceptually diﬀerent arethe various ECR methods based on a fragmentation ofa big QM system. Here the system is not only split intoan active region and an environment, but rather parti-tioned into several fragments, which are each individuallytreated on a strict ab-initio level, but which are mutually interacting in a simpliﬁed way. The various ECR meth-ods based on such a fragmentation usually only diﬀer inthe way in which they deal with the problem of describingthe mutual interactions between the fragments.One of the most popular methods for such a simpliﬁedinteraction between the fragments of a large QM systemis the Fragment Molecular Orbital (FMO) approach ,which assigns the electrons to some user-deﬁned frag-ments and then solves the electronic structure problemfor each fragment, taking into account the electrostaticpotential generated by the other fragments. Very simi-lar in spirit is the X-Pol method , where each one ofthe fragments is again treated with electronic structuretheory, whilst the interactions among them are handledby a combined quantum mechanics/molecular mechanics(QM/MM) approach. Another method is the MolecularTailoring Approach (MTA) , where a large systemis decomposed — either manually or automatically, de-pending on geometrical criteria and chemical intuition— into small overlapping fragments, for which individ-ual QM calculations are performed and eventually com-bined to get the desired quantities for the entire system.Finally we also mention subsystem DFT , which isconceptually diﬀerent as it is based on a fragmentationof the electronic charge density and not of the atoms ofthe system. Moreover, within this approach an eﬀectiveembedding potential — which is exact under certain con-ditions, i.e. it is equivalent to a traditional Kohn-ShamDFT calculation of the entire system — arises in a nat-ural way. All of these fragmentation approaches allowquasi-ﬁrst-principles calculations up to the order of onemillion atoms , and therefore oﬀer interesting advan-tages for the quantum mechanical treatment of large-scale systems.Nevertheless, these fragmentation methods are con-ceived to simplify the full ab-initio calculation of a bigQM system, i.e. they aim at treating the entire systemat the same level of theory. This is in contrast to em-bedding approaches, which use various levels of theorywithin a single calculation, thus paving the way towardscoarse grain models which can be used within multiscaleQM/MM simulations, able to tackle systems that are or-ders of magnitude larger than those accessible with tra-ditional quantum approaches . Among others, we quotehere the QM/MM methods detailed in Refs. 23–26.Obviously the introduction of an ECR raises the ques-tion of the pertinence of this approximation, and it wouldbe desirable to dispose of an eﬀective measurement ofhow the given ECR aﬀects the ab-initio character of thecalculation. In the context of ECRs based on embed-ding, the accuracy and reliability depends on two factors:Firstly on how the splitting into the QM region and en-vironment is done, and secondly on the way in which theenvironment and the interactions with the QM region aretreated.In this paper we will focus on the quality of the em-bedding potential, proposing a simple framework for aneﬀective electrostatic embedding of the active QM regioninto an environment. Since it is an embedding and nota fragmentation method, we clearly distinguish betweenthe active QM region and the embedding (environmen-tal) region, and thus use diﬀerent levels of theory for theregions. This clear distinction allows us to focus on theessential parts of the system and thus to use the availablecomputational resources in an eﬃcient way.Our embedding potential is based on a multipole ex-pansion of the electrostatic potential of the environment.To derive these multipoles, we come back to the idea ofpartitioning the entire systems into fragments. To eachof those well deﬁned fragments we assign a set of par-tial and static electrostatic charges, like those playing akey role in standard MM force ﬁelds (see among oth-ers Ref. 27), which can also be used to quantify chargetransfer eﬀects among important moieties of a molecu-lar system and even to get inputs for particular kindsof quantum mechanical methods, like constrained DFTcalculations . These electrostatic charges — possiblyenhanced by higher multipoles — of each subsystem canthen be employed to deﬁne an electrostatic environmentwhere the QM active region can be embedded.Obviously the quality of the electrostatic embeddingis directly related to the quality of the fragment mul-tipoles. Deriving a general and reliable way of parti-tioning an arbitrary system into a set of fragments is avery complex task. However, in a previous publicationwe derived a simple way of determining in a quantita- tive way whether a chosen fragmentation is reasonable,and whether the associated multipoles can be consid-ered as meaningful “pseudo-observables” with an inter-pretable physico-chemical meaning . In this paper wenow demonstrate that fragments which fulﬁll this condi-tion lead to an accurate embedding potential that reachesthe same quality as a full QM treatment of the entire sys-tem.The determination of the correct fragmentation andcalculation of the associated multipoles is based on a fullQM calculation of the entire system and thus does notdirectly provide any speedup compared to a straightfor-ward full QM approach. However, our method allows usto reliably address the problem of deﬁning an electro-static embedding which is consistent with the unbiasedfull QM calculation. In other terms, we can use the fullQM calculations to validate a posteriori whether an em-bedding based on environment fragments is meaningfulor not. Consequently, the existence of a linear scalingapproach which is capable of treating large systems iscrucial for this validation procedure. Fortunately, the BigDFT code into which we implemented our embed-ding procedure exhibits such a linear scaling mode ,thus enabling us to perform calculations involving thou-sands of atoms in relatively little time. Once this val-idation is done, we can then use the embedding frame-work to perform computationally more demanding simu-lations, in this way accessing new quantities which wouldotherwise be out of reach.The outline of this paper is as follows. In Sec. II weshow how we can create from a set of fragment multi-poles an electrostatic potential into which a QM systemcan be embedded. In Sec. III we then show various ex-amples to validate and apply this embedding approach,which can then be used for forthcoming applications. InSec. III A we demonstrate that the behavior of a largebulk system can be exactly reproduced, and show howthe embedding allows one to use expensive hybrid func-tionals which would be out of reach for a straightforwardfull QM calculation; and in Sec. III B we then discussin detail the correct setup for a solvation embedding ofa large complex system, demonstrating the necessity ofincluding a small shell of explicit solvent. II. SUBSYSTEM REPRESENTATION VIA THEELECTROSTATIC POTENTIALA. Fragmentation identiﬁcation and multipoleassignment

The goal of this work is to show how a set of (pseudo-)observable fragment multipoles can be used for an elec-trostatic embedding of a QM region and thus lead to anECR. We presented in Ref. 32 a detailed discussion ofhow the fragments can be identiﬁed and the set of asso-ciated multipoles calculated; we provide here for consis-tency a brief summary of the most important results.A complete description of a QM system is given by thedensity matrix operator ˆ F , as the measurement of anyobservable ˆ O can be written as Tr( ˆ F ˆ O ). For KS-DFT,where the many-body wave function | Ψ i giving rise to ˆ F is given by a single Slater determinant, the density matrixis idempotent, i.e. ˆ F = ˆ F . For a QM system which isseparable into reasonably “independent” fragments, wewant to deﬁne, in an analogous way, a fragment densitymatrix ˆ F F , such that a fragment observable can then beevaluated as Tr( ˆ F F ˆ O ). In such a case, it is reasonable toassume that the fragment density matrix can be writtenas ˆ F F = ˆ F ˆ W F , (1)where ˆ W F is a projection operator onto the fragment.To proceed further, we assume that the density matrixand the fragment projector can both be written in termsof a set of localized basis functions | φ α i :ˆ F = X α,β | φ α i K αβ h φ β | , (2)ˆ W F = X µ,ν | φ µ i R F µν h φ ν | , (3)where the | φ α i are from now on called support functions ,the matrix K the kernel , and the matrix R deﬁnes thecharacter of the projection. In this way the above men-tioned idempotency translates in a purity condition forthe kernel, i.e. KSK = K , with S αβ = h φ α | φ β i .If the fragment were indeed an independent subsystem,the associated density matrix should also be idempotent,i.e. ( ˆ F F ) = ˆ F F . For improper fragment deﬁnitions thiscondition will however be violated. In order to measurethe “fragment property” of a subsystem, we thus considerthe so-called purity indicator , deﬁned byΠ = 1 q Tr (cid:18)(cid:16) ˆ F F (cid:17) − ˆ F F (cid:19) = 1 q Tr (cid:18)(cid:16) KS F (cid:17) − KS F (cid:19) , (4)where q is the total number of electrons of the isolatedfragment in gas phase and S F ≡ SR F S .In what follows, we are interested in characterizingthe fragments via their electrostatic multipole moments.Given a charge density ρ ( r ), its multipole moments aregiven by the expression Q Rℓm ≡ r π ℓ + 1 Z S ℓm ( r − r R ) ρ ( r ) d r = r π ℓ + 1 Tr (cid:16) ˆ F ˆ S Rℓm (cid:17) = Tr (cid:0) KP Rℓm (cid:1) , (5)where we have deﬁned the multipole matrices P Rℓm as P Rℓm ; αβ = r π ℓ + 1 h φ α | ˆ S Rℓm | φ β i , (6) and ˆ S Rℓm ( r ) ≡ S ℓm ( r − r R ) are the solid harmonic op-erators centered on the reference position r R . As is ex-plained in more detail in Ref. 32, the electrostatic multi-poles of a fragment can be written in a similar way as Q F ℓm ≡ Tr(

KSR F P F ℓm ) . (7)A popular choice for the fragments are the individualatoms, leading to atomic multipoles Q Aℓm . In this casethe above deﬁnition can be seen as a generalization ofthe well-known Mulliken and L¨owdin charge populationanalyses, and we can associate speciﬁc projector matricesto these traditional approaches, as is demonstrated inRef. 32.If we are interested in the multipole moments for afragment which is composed of various sub-fragments,the projector ˆ W F onto the large fragment can simply bedeﬁned as the sum over the projectors onto the smallfragments. Typically these sub-fragments are the indi-vidual atoms, and we thus get ˆ W F = P A ∈ F ˆ W A . Wemay then combine the atomic multipoles to obtain thecorresponding quantities for the fragments, as is shownin more detail in Ref. 32. For the important case of themonopole and dipole of a fragment the result is very sim-ple and given by Q F = X A Q A , (8a) Q F m = X A r π S m ( r F − r A ) Q A + Q A m . (8b) B. Charge analysis schemes

Suppose that a fragment F has been identiﬁed in a QMcalculation. If we are not interested in the QM informa-tion of this fragment (which is the typical case for solventor environment molecules), we might reduce the complex-ity of the calculation by expressing only the electrostaticpotential generated by this fragment, thereby loweringthe number of atoms within the QM region and conse-quently also the computational cost. To do so we chooseto represent the electrostatic potential via a set of mul-tipoles. Before describing this approach in more detail,we want to give a quick overview over popular schemesfor the determination of the set of multipoles.Loosely speaking the various (atomic) population anal-yses can be divided into three main classes: i) approachesbased on a set of localized atom-centered orbitals, ii)grid-based methods working directly with the electroniccharge density, and iii) methods which determine theatomic charges by ﬁtting them to the electrostatic po-tential.With respect to the ﬁrst class, the most renowned ex-amples are the Mulliken charge population analysis , theL¨owdin population analysis , and approaches like thenatural population analysis (NPA) . The convenienceof all the methods belonging to this ﬁrst class is thatthey work in the subspace of the atomic orbitals wherethe density matrix of the system is represented.Methods belonging to the second class work directlywith the charge density represented on a numerical gridand try to partition it into disjunctive regions, whichare then assigned to the individual atoms. One of themost popular is Bader’s atoms-in-molecule approach ,which deﬁnes the boundary between two atoms as the2D surface through which the charge density has “zeroﬂux”, meaning that it exhibits a local minimum there.We also quote here the Voronoi deformation density ap-proach that assigns the charge of each grid point to theclosest atom, taking into account only geometrical infor-mation and neglecting the nature of the atoms. This isin contrast to the Hirshfeld approach , which partitionsthe charge density on each grid point to the surroundingatoms according to a weight function which is based onthe atomic charge densities, in this way taking into ac-count the nature of the atoms. The shortcoming of theseapproaches is that they can be very sensitive with respectto the resolution of the grid, and moreover they requirea large amount of data, making them potentially veryexpensive. An even more important drawback is thatthey are based on purely geometrical criteria to partitionthe system. Such a partitioning scheme may be doubtfulfrom a QM perspective as only the electrostatic chargedistribution is taken into account.The methods belonging to the third class diﬀer mainlyin the way the ﬁtting procedure is performed to repro-duce the electrostatic properties of a molecular systemon the nodes of a surrounding grid. Popular approachesare the CHELP method , CHELPG and the Merz-Singh-Kollman scheme . We also mention here theapproach derived by Bayly et al. , where the charges arerestrained using a penalty function to avoid unreasonablylarge charges resulting from the ﬁt, as well as the ESPFmethod . In these methods, the original QM informa-tion is encoded in a local function (the electrostatic po-tential), which is identiﬁed beforehand. By constructionthe QM partitioning is thus imposed by the procedure,thereby leaving the appropriateness of the choice of thefragments to the chemical intuition of the user. As a sidenote, we mention that there exist also approaches whichassign charges to locations other than atomic centers, e.g.chemical bonds , to get a more accurate description.In our approach we use a method based on the den-sity matrix of the full system, as brieﬂy summarized inSec. II A. As is shown in Ref. 32, a minimal set of in-situoptimized basis functions allows the use of conceptuallysimple methods such as Mulliken or L¨owdin while stillobtaining reliable and unbiased results. In addition ourmultipoles are evaluated a posteriori , based on a full QMcalculation of the entire system, in this way eliminatingany bias potentially caused by a inappropriate choice ofthe fragments. Moreover the purity indicator of Eq. (4)allows one to determine whether the multipoles can be in-terpreted as physically meaningful “pseudo-observables”. C. Representation of the electrostatic potential

Given the deﬁnition of the fragment F and the associ-ated set of multipoles, we now detail the representationof the electrostatic potential. We choose to express thefragment’s charge density by a sum of localized functionsbased on atom-centered Gaussians: ρ F ( r ) = X A ∈ F ρ A ( r ) ,ρ A ( r ) ≃ e − | r − r A | σ X ℓ,m a ℓ Q Aℓm S Aℓm ( r ) , (9)where the Q Aℓm are the atomic multipoles of the atomsforming the fragment F , and the coeﬃcients a ℓ are de-ﬁned in Appendix A. This expression guarantees thepreservation of the original values of the fragment multi-poles Q F ℓm . The spread σ of each Gaussian is an arbitraryquantity that is associated with the “atomic density”within the fragment. In order to represent the long-rangeelectrostatic properties of the fragment, it is enough toset this value close to the characteristic extension of thecore electron density, in our case speciﬁed by the em-ployed pseudopotentials . The potential V ( r ) A canthen be expressed by solving Poisson’s equation, ∇ V A ( r ) = − πρ A ( r ) ; (10)using our Poisson solver based on interpolating scalingfunctions this calculation is possible for any kind ofboundary conditions, making this approach very ﬂexi-ble. When the Gaussian functions can be considered asa point-charge distribution (for example for fragments lo-cated far from the simulation domain), the potential canbe directly calculated using the analytic formula for themultipole expansion of an electrostatic potential, whichreads, up to quadrupoles, V A ( r ) = − Q A | r − R A | − r T · p A | r − R A | − r T · D A · r | r − R A | , (11)where the relations between p and D on one hand andthe Q ℓm ( ℓ = 1 ,

2) on the other hand are detailed in Ap-pendix B.From the above expressions the electrostatic potentialof the fragment can be calculated in terms of the decom-position of its atom-centered potentials, i.e. V F ( r ) = X A ∈ F V A ( r − r A ) . (12)In a QM/MM approach using an electrostatic or even po-larized embedding , the potential V F ( r ) can thenbe added to the QM Hamiltonian as an additional exter-nal potential. D. Choice of the diﬀusive centers

As the reference quantities in our method are the frag-ment (and not the atomic) multipoles, there might be H O HF H – atomic H – bondthreshold volume error volume error volume error volume error10 − − − − − we report two results, namely ﬁrstly the standard setup where the atomic multipoles are placed on the atoms, andsecondly a modiﬁed setup where the multipoles are placed on the covalent bond. other representations of ρ F diﬀerent from the above whichwould provide the same fragment multipoles, but a dif-ferent electrostatic potential. In other terms, as the frag-ment is in principle not immediately separable in termsof its atomic contributions, other centers might be cho-sen instead of the atomic positions. This fact has to betaken into account if one is interested, for instance, in us-ing such a method to express accurate short-range repre-sentations of V F , like for example electrostatic potentialﬁtting methods.However, we nevertheless also want to assess the cor-rectness of this approximation closer to the atoms. Eventhough we do not perform potential ﬁtting or use othersimilar techniques and thus only expect a qualitativeagreement, we show in Tab. I a quantitative compari-son for three small molecules (H O, HF, H ), namely thevalues of∆ V = R ( V exact ( r ) − V multipoles ( r )) d r R V exact ( r ) d r , (13)where V exact ( r ) is the exact electrostatic potential and V multipoles ( r ) its approximation using the multipole ex-pansion. Since the multipoles are only expected to yielda reasonable representation of the electrostatic potentialfar away from the atoms, we report this diﬀerence forvarious threshold values, meaning that only grid pointswhere the charge density is below this given threshold(and thus far away from the molecule) contribute to theintegrals.As can be seen, for H O and HF the agreement be-tween the exact and approximate potentials is alreadygood rather close to the atoms (relative deviations of afew percent) and indeed becomes even better the smallerthe aforementioned threshold is. However, for H theresults are considerably worse, which is a consequenceof the fact that we place our atomic multipoles exclu-sively on the atoms and not, for instance, on covalentbonds. Both H O and HF have a strong dipole, resultingin considerable atomic partial charges, and the electro-static potential can thus be well represented by placingthe multipoles onto the atoms.For H , on the other hand, the molecular dipole is zerodue to symmetry, and the atomic partial charges are thus r e l a t i v e e rr o r o f t he po t en t i a l i n % distance between multipole centers (Bohr) ∆ V FIG. 1. Relative error of the multipole potential for H (ac-cording to Eq. (13)) as a function of the distance between themultipole centers. The centers are always symmetric with re-spect to the bond center, i.e. a distance of 0 corresponds to asetup where both multipoles are located at the center. zero as well — the charge is rather localized between theatoms. However, since we force the multipoles to be lo-cated on the atoms, the approximation of the electro-static potential shows a considerable deviation from theexact one.This problem can be solved by providing additionalﬂexibility with respect to the choice of the multipolecenters. In Fig. 1 we show the error of Eq. (13) forH as a function of the distance between the two mul-tipole centers, keeping them always symmetric with re-spect to the bond center. It is important to stress thatwe only move the multipoles corresponding to electroniccharge, whereas the monopoles corresponding to the pos-itive counter charge of the nuclei remain at their originalpositions. As can be seen the error decreases as the elec-tronic centers come closer to each other, and becomesvirtually zero as soon as they are overlapping each otherin the middle of the bond. The exact values for this lastcase are also noted in Tab. I. FIG. 2. Visualization of the assembled pentacene molecules. III. ILLUSTRATIVE APPLICATIONS

In the following we show various applications of ourapproach for the eﬀective electrostatic embedding of anactive QM region into the potential generated by a setof fragment multipoles. Using the purity indicator ofEq. (4) we make sure that the fragments are properlychosen and the associated multipoles can thus be consid-ered as reliable pseudo-observables. With respect to theprojector matrix R F , we always use the Mulliken method.In all examples, the density matrix is described by a minimal set of localized in-situ optimized basis functions,as implemented in the DFT code BigDFT . Theuse of a minimal set helps in the proper identiﬁcationof the fragment, whereas the in-situ optimization nev-ertheless guarantees an accurate description of the elec-tronic structure . However, as discussed in detail inRef. 32, by choosing a suitable population analysis —i.e. yielding a low value for the so-called purity indicator— our methodology can also be applied to calculationswith other localized functions, e.g. ﬁxed atomic orbitalsor Gaussians. Unless otherwise stated, the PBE func-tional was used. A. Ordered system replica — the case of bulkpentacene

As a ﬁrst example we take (a portion of) bulk pen-tacene (PEN), as shown in Fig. 2. A single pentacenemolecule has neither a net monopole nor dipole, so thatthe ﬁrst non-zero contribution to the electrostatic poten-tial should come from the rather short-range quadrupoleterm. There is thus no ambiguity in the identiﬁca-tion of the fragment, as the presence of the neighboring molecules introduces only band eﬀects and negligible hy-bridization among the PEN KS orbitals. This systemtherefore represents an interesting playground for a ba-sic QM/QM description; we use here the term QM/QM(instead of QM/MM) since all quantities are extractedfrom QM calculations. a. Validity of the fragment approximation

First weperformed a full QM calculation of the entire assemblyof pentacenes, consisting in total of 191 molecules (6876atoms), which will serve as a reference. The binding en-ergy of the pentacene molecules is found to be 282 meVper fragment unit, conﬁrming our assumption that thereis little direct interaction between the pentacenes, andthat one pentacene can be considered a well deﬁned frag-ment of the entire system. We observe in the bulk system,thanks to the formation of bands, a relatively small re-duction (20 meV) in the HOMO-LUMO energy gap (seeTab. II), expressed as the peak-to-peak distance of thedensity of states (DoS).To analyze such a subsystem identiﬁcation in a morequantitative way, we looked at the purity indicator Πas deﬁned in Eq. (4). Its value for each of the singlepentacene molecules is of the order of 3 · − , whichconﬁrms that it is more than reasonable to consider asingle pentacene molecule as a well-deﬁned fragment ofthe system. b. Embedding of central pentacene molecules To val-idate a QM/QM approach in this system, we then chosea few central molecules and embedded them in the mul-tipole potential generated by the other PEN molecules,by averaging the molecular multipoles found in the refer-ence full QM calculation. In order to generate the properbulk environment, surface PEN have not been consid-ered for the averaging. In this way we were able to re-produce the electronic structure of the full QM results,as shown in Fig. 3, for both one and three pentacenesin the QM region. This similarity between the full QMcalculation for the bulk on the one hand and the gasphase and embedded ones for the individual PENs onthe other hand conﬁrms our initial assumption that suchan electrostatic embedding should work for this system,as the fragments can be easily deﬁned and there is onlylittle interaction between them. Clearly the HOMO andLUMO levels split up when going from one to three pen-tacenes; this eﬀect would continue for even more pen-tacenes, eventually forming the bands that we see forthe bulk calculation. To take into account band ef-fects we considered for the calculation of the gaps the“average HOMO/LUMO energies”; for one pentacenethis corresponds to the ordinary HOMO and LUMOstates, whereas for 3 pentacenes it corresponds to themean of the values { HOMO − , HOMO − , HOMO } and { LUMO , LUMO + 1 , LUMO + 2 } , respectively. Likefor the complete bulk calculations, we observe that alsothese values are slightly lower in embedded phase thanthe corresponding quantities in gas phase. c. Embedding with diﬀerent functionals Now thatwe have validated the electrostatic embedding for this embedded gas phaseLDA PBE PBE0 B3LYP LDA PBE PBE0 B3LYP1 PEN E HOMO -4.75 -4.59 -5.24 -5.02 -4.63 -4.47 -5.12 -4.90 E LUMO -3.52 -3.33 -2.72 -2.76 -3.39 -3.20 -2.59 -2.63 E gap h E HOMO i − . − . − . − . − . − . − . − . h E LUMO i − . − . − . − . − . − . − . − . h E gap i . . . . . . . . h E HOMO i — — — — − . − .

88 — — h E LUMO i — — — — − . − .

64 — — h E gap i — — — — 1 .

21 1 .

25 — —TABLE II. HOMO level, LUMO level and HOMO-LUMO gap in eV for one and three pentacenes, both embedded andisolated, for various functionals. For three pentacenes, we consider, instead of the HOMO and LUMO levels, the averagevalue of { E HOMO − , E HOMO − , E HOMO } and { E LUMO , E LUMO+1 , E LUMO+2 } , respectively, in order to account for band eﬀects;in parentheses we also give the standard deviation. Additionally we also show the values for the full QM bulk calculation, wherethe HOMO and LUMO energies correspond to the top of the two smeared out HOMO and LUMO peaks in Fig. 3, respectively. -7 -6 -5 -4 -3 -2 den s i t y o f s t a t e s energy (eV)bulk1 PEN, gas phase1 PEN, embedded E HOMO E LUMO den s i t y o f s t a t e s energy (eV)bulk3 PEN, gas phase3 PEN, embedded E HOMO E HOMO-1 E HOMO-2 E LUMO+2 E LUMO+1 E LUMO

FIG. 3. Comparison of the density of states for 1 (upperpanel) and 3 (lower panel) pentacenes (calculated with thecubic version of

BigDFT ), together with the density of statesof the entire system (calculated with the linear version of

BigDFT ). For easier comparison, all curves have been nor-malized by dividing by the number of atoms and shifted by∆ E , where ∆ E is deﬁned such that the HOMO level of 1PEN gas phase coincides with the HOMO peak of the full QMbulk calculation. The overall electronic structure for all threesetups is very similar. Additionally we also show the HOMO-LUMO gap of the full QM calculation, which is 1.25 eV. Forall curves a Gaussian smearing with σ = 0 .

05 eV was applied. system — by showing that the electronic structure of thelarge system can be represented well by taking only onemolecule and using the multipoles to generate the exter-nal potential — we can use this technique to calculatequantities which are more easily – if not exclusively – ac-cessible within such a reduced complexity setup. As anexample, we can use more expensive functionals for theQM active region, and in this way perform calculationsusing hybrid functionals, which would be computation-ally extremely expensive — or even out of reach — forthe full QM system containing several thousand atoms.We show in Tab. II the values of the HOMO-LUMO gapfor one and three pentacenes, taking as illustrations thepopular LDA , PBE , PBE0 and B3LYP func-tionals, as implemented in the Libxc library .As can be seen, there is only a small diﬀerence — stillcorresponding to a lowering of the gap by 20 meV —between the gas phase and the embedded calculations,again conﬁrming that the PEN molecules represent idealfragments with only little interaction among each other.However, and more importantly, the hybrid functionalsyield considerably larger values for the gap than LDAand PBE and thus correct the well-known gap under-estimation of these (semi-)local functionals . Such anembedded approach can thus pave a way towards theroutine usage of expensive QM descriptions also for largesystems, where a straightforward calculation would beout of reach. Moreover such an embedding can also beused to couple DFT to even more accurate wave functionmethods . solvent shell 0 ˚A 2 ˚A 4 ˚A 6 ˚A 8 ˚Anumber of shell atoms 0 45 1164 2210 3390purity indicator × . . . . . ✔ ✔ ✔ ✔ ✔ TABLE III. Purity indicator according to Eq. (4), withinthe Mulliken population analysis scheme, for the DNA plusvarious shells of solvent, from 0 ˚A (i.e. the pure DNA) upto 8 ˚A. The values have been multiplied by a factor of onethousand.

B. Disordered and heterogeneous systems — thecase of DNA in water

We have shown in Sec. III A the validation and ben-eﬁts of the electrostatic embedding for a homogeneoussystem where the fragments can easily be identiﬁed bychemical intuition. Now we want to apply this formal-ism to a more complicated system, namely the case ofsolvated DNA. In this setup the interesting part — theDNA — is rather heterogeneous, but the solvent partmight be a good candidate for our fragment multipolescheme. As a speciﬁc example we take a snapshot froman MD simulation — run with Amber 11 and theﬀ99SB force ﬁeld — for a 11 base pair DNA fragment(made only of Guanine and Cytosine nucleotides) beingembedded into a sodium-water solution, giving in total15,613 atoms. The system is depicted in Fig. 4.

1. Necessity of explicit solvation

It is known that important quantities of biological sys-tems – such as the HOMO-LUMO gap — are very sensi-tive to a proper setup of the electrostatic environment .However, as is shown in Ref. 32, the DNA environmentconstituents — i.e. the Na atoms and the water molecules— exhibit quite low values for the purity indicator ofEq. (4) and can thus be considered as well-deﬁned frag-ments. Consequently we might nevertheless ask howmuch the electrostatic environment they generate actu-ally inﬂuences the properties of the DNA and to whatextent it is important to include many explicit solventatoms in the calculation, i.e. how big the active QM re-gion should be . To shed light on this, we present inTab. III the purity indicator Π from Eq. (4) for the DNAplus various solvent shells from 0 ˚A to 8 ˚A. All valuesare very small, indicating that all setups represent sensi-ble fragment choices. Unsurprisingly, the values becomeeven smaller the larger the shell is; however, there is aclear drop when going from 2 ˚A to 4 ˚A, indicating thatthe interaction of the solvated DNA with its environmentdecreases considerably at that size. a. References from complete QM calculations In or-der to further investigate the inﬂuence of the explicit sol-vent on the internal QM region, we analyzed the partialcharges and dipole moments of the DNA nucleotides as a function of the quantity of surrounding solvent, doingcalculations for solvation shells of 0 ˚A, 2 ˚A, 4 ˚A and 6 ˚A.As a reference we took a full QM calculation of the en-tire system, performed with the linear scaling approachof

BigDFT ; from this calculation we also extractedthe set of atomic multipoles which will be used in thesubsequent steps. The values of the charges for each ofthe nucleotides of the DNA in solution are depicted inFig. 4. This charge analysis also allows us to determinehow much of the Na charge has gone to the DNA. The 20Na atoms have lost 19.2 electrons (corresponding to anaverage ionization of 0.96), out of which 3.6 have gone tothe water and the other 15.6 to the DNA. b. Test of non-embedded setups

To test the actualinﬂuence that the environmental molecules have on theDNA region we ﬁrst removed the solvent molecules out-side of the explicit solvent shell and conducted calcula-tions with two diﬀerent setups. i) In the “neutral” setupwe performed a neutral calculation of the system as-is,knowing that this can be problematic — in particular forsmall solvent shells — as the DNA attracts charge fromits environment. ii) In the “charged” setup we explicitlycharged the system with the negative countercharge ofthose atoms which are neglected with respect to the fullQM calculation, ranging from 15.6 electrons for the nakedDNA up to 8.9 electrons for a shell of 6 ˚A. In Fig. 5 weshow the mean value and standard deviation for the nu-cleotide charges and dipole moments, with Fig. 5a show-ing the neutral setup and Fig. 5b the charged setup. Theresults for the second setup are considerably more accu-rate, in particular for small solvent shells. Neverthelesswe also need in this case a shell of at least 4 ˚A to get rea-sonably close to the reference calculation where the entiresolvent is explicitly taken into account. This is again inremarkable correlation with the results of Tab. III.In Fig. 6 we show the density of states of these twosimpliﬁed QM setups, together with the partial densityof states (PDoS) f F ( ǫ ) for the DNA subsystem comingfrom the full QM calculation, which is given by f F ( ǫ ) = X j Tr (cid:16) F ˆ W j ˆ W F (cid:17) δ ( ǫ − ǫ j ) , (14)where ǫ j is the j th Kohn-Sham eigenvalue of the system,ˆ W j is the projector selecting the j th KS orbital,ˆ W j = | ψ j i h ψ j | , (15)and ˆ W F is the Mulliken deﬁnition of the fragment. Againwe can see a very good agreement between the referenceresult and the charged QM setup with the 4 ˚A explicitsolvent shell.However a big diﬀerence becomes visible if we comparethe absolute energy eigenvalues. In Tab. IV we show theHOMO energies and the value of the HOMO-LUMO gapfor the diﬀerent setups. The gap for the neutral setup isbasically zero — a fact which is not visible from Fig. 6 dueto the applied smearing. This wrong description is due -0.80 -0.75 -0.70 -0.65 -0.60 -0.55nucleotide net charge Cytosine-0.80 -0.75 -0.70 -0.65 -0.60 -0.55nucleotide net charge Guanine FIG. 4. Visualization of the solvated DNA. The color code represents the charges of the DNA residues as found from thefull QM calculation with the method described in Ref. 32. The terminal nucleotides do not represent perfect Cyt and Guanucleotides any more and are thus neither listed in the histogram nor colored in the ﬁgure. The solvent water molecules andNa ions are represented in faded pink. −0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Gua nucleotidesfull QMshell 0 Åshell 2 Å shell 4 Åshell 6 Å−0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Cyt nucleotides4.05.06.07.08.09.010.0 d i po l e ( D eb y e ) d i po l e ( D eb y e ) (a) neutral −0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Gua nucleotidesfull QMshell 0 Åshell 2 Å shell 4 Åshell 6 Å−0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Cyt nucleotides4.05.06.07.08.09.010.0 d i po l e ( D eb y e ) d i po l e ( D eb y e ) (b) charged −0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Gua nucleotidesfull QMshell 0 Åshell 2 Å shell 4 Åshell 6 Å−0.8−0.6−0.4−0.20.00.2 ne t c ha r ge Cyt nucleotides4.05.06.07.08.09.010.0 d i po l e ( D eb y e ) d i po l e ( D eb y e ) (c) QM/QM FIG. 5. Mean value and standard deviation of the nucleotide charges and dipole moments for the neutral (5a), charged (5b)and QM/QM (5c) setup. The standard deviation must not be interpreted as an error bar, but rather as the natural variation ofthe charges and dipoles among the nucleotides. Thus not only the mean value, but also the standard deviation should coincidewith the reference full QM calculation. to the missing electrons which should be added to popu-late the states ﬁlled by the electrons that the nucleotidesattract from the environment, and indeed the chargedsetup exhibits a more reasonable value for the gap. Onthe other hand, the HOMO energy of the charged setup is largely positive, showing that this state is not bound,and therefore it has no actual physical meaning for gasphase boundary conditions. Additionally the total en-ergy of the charged setup is slightly higher (1.4 eV) thanthat of the neutral setup, also demonstrating that charg-0 −30 −25 −20 −15 −10 −5 0 5 10 15E

HOMO full QM den s i t y o f s t a t e s ( a r b i t r a r y un i t s ) energy (eV)full QMneutral, shell 4 Åcharged, shell 4 Å FIG. 6. Partial density of states for the DNA subsystem, forthe diﬀerent setups described in the text. The curves havebeen shifted such that the HOMO energies coincide, and aGaussian smearing with σ = 0 .

27 eV was applied. ing the system without a proper conﬁning environmentis not ideal.

2. QM/QM embedding

The comparisons with results obtained from a full QMcalculation indicate that we have to somehow considerthe environment in order to identify the ideal setup fora reduced complexity QM/QM calculation. Therefore,we now want to add the embedding potential, calculatedfrom the set of QM multipoles, to see whether this givesus a description which is of equal quality as the full QMcalculation.In Fig. 5c we show again the mean value and standarddeviation of the nucleotide charges and dipoles, but thistime using this embedded QM/QM approach; again wecharged the DNA with the negative countercharge of theatoms outside of the QM region. The results are similarto the “charged” setup, and once more conﬁrm the needfor an explicit inclusion of an explicit solvent shell ofat least 4 ˚A for this system. For smaller radii of theexplicit solvent shell, the well-known phenomenon of“overpolarization” occurs: Some nucleotides acquire anextra dipole as the KS orbitals are partially attractedby the environmental molecules. In the literature thereexist various approaches to avoid this problem, a popularone being the use of a delocalized charge distribution forthe environment atoms — a property which is indeedfulﬁlled using our ansatz. In our case the overpolarizationphenomenon might simply be ascribed to the presence oftoo few explicit solvent molecules. Indeed, as the numberof solvent molecules becomes slightly larger (above 4 ˚A),the problem completely disappears.Also with respect to Tab. IV, the situation usingthis QM/QM approach is considerably improved. TheHOMO value is now negative, showing that the associ-ated KS orbital is conﬁned by the environment, and thegap is large and slightly overestimated, which is soundconsidering that some molecules have been excluded fromthe QM calculation, thereby reducing the smearing dueto disorder. full QM neutral charged QM/QMHOMO -3.52 -6.41 ( ✔ ) 7.98 ( ✘ ) -4.81 ( ✔ )gap 2.76 0.006 ( ✘ ) 1.89 ( ✔ ) 3.26 ( ✔ )TABLE IV. HOMO energies and HOMO-LUMO gaps for thesolvated DNA. For the neutral, charged and QM/QM setupwe used a solvent shell of 4 ˚A. All values are given in eV. Overall, the QM/QM setup is thus the only one whichshows in all aspects a behavior which is similar to thatof the full QM setup and thus validates this approach.This demonstrates that we can eﬃciently represent theelectrostatic environment via the atomic multipoles andthus achieve a considerable reduction in the complexityof a DFT calculation on a system of several thousandatoms.

IV. CONCLUSIONS AND OUTLOOK

In this paper we presented a simple method for reduc-ing the complexity of large QM systems. To do so, wesplit the system into an active QM part and an environ-ment part, and — upon partitioning the latter into welldeﬁned fragments — represent the environment by a setof fragment multipoles, resulting in an eﬀective electro-static embedding of the QM part. The approach worksbest for fragment multipoles that can be considered as“pseudo-observables”, which means that they are quan-tities with an interpretable physico-chemical meaning.This property is closely related to a proper fragment def-inition, which can be easily veriﬁed in a quantitative wayusing our so-called purity indicator. In order to get a reli-able set of fragment multipoles it is advantageous to usea set of minimal and in-situ optimized basis functions;in such a favorable situation the embedding exhibits avery high quality, and there is no need to perform anypotential ﬁtting or similar techniques, nor to deal withapproaches describing explicitly the interface between theQM and the embedding region.Since our approach is based on an a posteriori iden-tiﬁcation of the fragments and the associated multipolesout of full QM calculations, it can easily be validated bycomparing it with unbiased results stemming from suchfull QM setups. Thanks to the fact that we have imple-mented both functionalities — i.e. a linear scaling fullQM calculation and the embedding approach — in thesame code, namely

BigDFT , this veriﬁcation of the com-plexity reduction can be done in a straightforward way.Once this validation is done, the embedding can then beused for subsequent calculations within a reduced com-plexity scheme, giving access to quantities which wouldnot be reachable otherwise, as for example demonstratedby the use of hybrid functionals which would considerablyincrease the computational cost for a straightforward fullQM calculation.In this way this approach of fragment identiﬁcation1and representation allows one to considerably lower thenumber of degrees of freedom and thus to reduce thecomplexity of QM treatments of large systems. Whenproperly employed this simpliﬁcation does not notablyaﬀect the accuracy of the description of a quantum me-chanical system and keeps the essential physico-chemicalproperties unaﬀected. In this sense our ansatz allows tocouple various levels of theory, and the eﬃcient coarsegraining of the QM description that we obtain allows inparticular ab-initio calculations to be coupled with clas-sical approaches. This scheme paves the way towardspowerful QM/QM or QM/MM calculations for very largesystems and is thus an important link within a multiscaleapproach aiming at the bridging of so-called time andlengthscale gaps . V. ACKNOWLEDGMENTS

We would like to thank Thierry Deutsch for valuablediscussions and F´atima Lucas for providing various testsystems and helpful discussions. This research used re-sources of the Argonne Leadership Computing Facility,which is a DOE Oﬃce of Science User Facility supportedunder Contract DE-AC02-06CH11357. SM acknowledgessupport from the European Centre of Excellence MaX(project ID 676598). LG acknowledges support from theEU ExtMOS project (project ID 646176) and the Euro-pean Centre of Excellence EoCoE (project ID 676629).

Appendix A: Approximating the electrostaticdensity from a set of multipoles

Suppose we have extracted the multipole coeﬃcients Q ℓm , deﬁned in Eq. (5), of a charge density ρ ( r ). Clearly,the Q ℓm alone do not suﬃce to completely determinethe original function ρ . It can be easily seen that any function of the form f ( r ) = √ π X ℓ,m √ ℓ + 1 Q ℓm φ ℓ ( r ) S ℓm ( r ) r ℓ (A1)yields the same multipoles as the original function ρ aslong as the radial functions φ ℓ ( r ) fulﬁll, ∀ ℓ , the normal-ization condition Z ∞ d rφ ℓ ( r ) = 14 π . (A2)When approximating a charge density we therefore haveto choose the family of functions φ ℓ ( r ) that express theirradial behaviour. It is convenient to employ functions ofthe form φ ℓ ( r ) = a ℓ p π (2 ℓ + 1) r ℓ φ ( r ) , (A3)where φ ( r ) is some regular radial function and a ℓ a nor-malization coeﬃcient ensuring the validity of Eq. (A2). In this way we get for the reconstructed function f ( r ) = φ ( r ) X ℓ,m Q ℓm a ℓ S ℓm ( r ) . (A4)In our approach we use Gaussians for φ ( r ): φ ( r ) = e − r σ , (A5)and the normalization coeﬃcient a ℓ is then given by1 /a ℓ = σ ℓ +3 ℓ +2 √ π p π (2 ℓ + 1) Γ (cid:18)

32 + ℓ (cid:19) . (A6) Appendix B: Deﬁnitions and relations between themultipole moments

There are two possible ways to deﬁne the monopole,dipole and quadrupole moments: Either using the basicformula of Eq. (5), or directly as q = Z ρ ( r ) d r ,p i = Z r i ρ ( r ) d r ,D ij = Z (3 r i r j − r δ ij ) ρ ( r ) d r . (B1)By comparing them, it follows that the monopole anddipole terms are identical up to a reordering: q = Q , p =  Q Q − Q  . (B2)For the quadrupoles we get for the oﬀ-diagonal elements D = D = √ Q − ,D = D = √ Q ,D = D = √ Q − . (B3)and for the diagonal elements (using the fact that D istraceless) D = 2 Q ,D − D = r Q ,D + D + D = 0 . (B4)This gives rise to the linear system  −   D D D  = r  √ Q Q  , (B5)2whose solution leads to ﬁnal result D = 1 √ ×  −√ Q + Q Q − Q Q − −√ Q − Q Q − Q Q − √ Q  . (B6) ∗ [email protected] † [email protected] P. Hohenberg and W. Kohn, “Inhomogeneous electrongas,” Phys. Rev. , B864 (1964). W. Kohn and L. J. Sham, “Self-consistent equa-tions including exchange and correlation eﬀects,”Phys. Rev. , A1133 (1965). Stefan Goedecker, “Linear scaling electronic structuremethods,” Rev. Mod. Phys. , 1085 (1999). D R Bowler and T Miyazaki, “O(N) meth-ods in electronic structure calculations.”Rep. Prog. Phys. , 036503 (2012). Laura E. Ratcliﬀ, Stephan Mohr, Georg Huhs, ThierryDeutsch, Michel Masella, and Luigi Genovese, “Chal-lenges in large scale quantum mechanical calculations,”Wiley Interdiscip. Rev.-Comput. Mol. Sci. , e1290 (2017),e1290. Kazuo Kitaura, Eiji Ikeo, Toshio Asada, Tatsuya Nakano,and Masami Uebayasi, “Fragment molecular orbitalmethod: an approximate computational method for largemolecules,” Chem. Phys. Lett. , 701 (1999). Dmitri G. Fedorov and Kazuo Kitaura, “Extend-ing the power of quantum chemistry to large sys-tems with the fragment molecular orbital method,”J. Phys. Chem. A , 6904 (2007), pMID: 17511437,http://dx.doi.org/10.1021/jp0716740. Jiali Gao, “Toward a molecular orbital de-rived empirical potential for liquid simu-lations,” J. Phys. Chem. B , 657 (1997),http://dx.doi.org/10.1021/jp962833a. Jiali Gao, “A molecular-orbital derived polarization poten-tial for liquid water,” J. Chem. Phys. , 2346 (1998),http://dx.doi.org/10.1063/1.476802. Scott J. Wierzchowski, David A. Kofke, andJiali Gao, “Hydrogen ﬂuoride phase behaviorand molecular structure: A qm/mm potentialmodel approach,” J. Chem. Phys. , 7365 (2003),http://dx.doi.org/10.1063/1.1607919. Wangshen Xie and Jiali Gao, “Design of anext generation force ﬁeld: the x-pol potential,”J. Chem. Theory Comput. , 1890 (2007), pMID:18985172, http://dx.doi.org/10.1021/ct700167b. Wangshen Xie, Lingchun Song, Donald G. Truh-lar, and Jiali Gao, “The variational explicit po-larization potential and analytical ﬁrst deriva-tive of energy: Towards a next generationforce ﬁeld,” J. Chem. Phys. , 234108 (2008),http://dx.doi.org/10.1063/1.2936122. Wangshen Xie, Lingchun Song, Donald G. Truhlar,and Jiali Gao, “Incorporation of a qm/mm buﬀer zonein the variational double self-consistent ﬁeld method,” J. Phys. Chem. B , 14124 (2008), pMID: 18937511,http://dx.doi.org/10.1021/jp804512f. Yingjie Wang, Carlos P. Sosa, Alessandro Cem-bran, Donald G. Truhlar, and Jiali Gao, “Multi-level x-pol: A fragment-based method with mixedquantum mechanical representations of diﬀerent frag-ments,” J. Phys. Chem. B , 6781–6788 (2012), pMID:22428657, http://dx.doi.org/10.1021/jp212399g. Jiali Gao and Yingjie Wang, “Communication: Varia-tional many-body expansion: Accounting for exchangerepulsion, charge delocalization, and dispersion inthe fragment-based explicit polarization method,”The Journal of Chemical Physics , 071101 (2012),http://dx.doi.org/10.1063/1.3688232. Shridhar R. Gadre, Rajendra N. Shirsat, and Ajay C.Limaye, “Molecular tailoring approach for simulation ofelectrostatic properties,” J. Phys. Chem. , 9165 (1994),http://dx.doi.org/10.1021/j100088a013. V. Ganesh, Rameshwar K. Dongare, P. Bala-narayan, and Shridhar R. Gadre, “Moleculartailoring approach for geometry optimization oflarge molecules: Energy evaluation and paralleliza-tion strategies,” J. Chem. Phys. , 104109 (2006),http://dx.doi.org/10.1063/1.2339019. Shridhar R. Gadre and V. Ganesh, “Molecular tailoringapproach: Towards pc-based ab initio treatment of largemolecules,” J. Theor. Comput. Chem. , 835 (2006). Nityananda Sahu and Shridhar R. Gadre, “Moleculartailoring approach: A route for ab initio treatment oflarge clusters,” Acc. Chem. Res. , 2739 (2014), pMID:24798296, http://dx.doi.org/10.1021/ar500079b. Christoph R. Jacob and Johannes Neuge-bauer, “Subsystem density-functional theory,”Wiley Interdisciplinary Reviews: Computational Molecular Science , 325–362 (2014). Alisa Krishtal, Debalina Sinha, Alessandro Genova, andMichele Pavanello, “Subsystem density-functional the-ory as an eﬀective tool for modeling ground and ex-cited states, their dynamics and many-body interactions,”J. Phys.: Condens. Matter , 183202 (2015). Yoshio Nishimoto, Dmitri G. Fedorov, and StephanIrle, “Density-functional tight-binding combinedwith the fragment molecular orbital method,”J. Chem. Theory Comput. , 4801–4812 (2014), pMID:26584367, http://dx.doi.org/10.1021/ct500489d. Dirk Bakowies and Walter Thiel, “Hybrid Models for Com-bined Quantum Mechanical and Molecular Mechanical Ap-proaches,” J. Phys. Chem. , 10580 (1996). Nicolas Ferr´e and J´anos G. ´Angy´an, “Approximate elec-trostatic interaction operator for qm/mm calculations,”Chem. Phys. Lett. , 331 (2002). Hai Lin and Donald G. Truhlar, “QM/MM: What have welearned, where are we, and where do we go from here?”Theor Chem. Acc. , 185 (2007). Hans Martin Senn and Walter Thiel,“QM/MM methods for biomolecular systems,”Angew. Chem. Int. Ed. , 1198 (2009). Alexander D. Mackerell, “Empirical force ﬁelds forbiological macromolecules: Overview and issues,”J. Comput. Chem. , 1584 (2004). P. H. Dederichs, S. Bl¨ugel, R. Zeller, and H. Akai, “Groundstates of constrained systems: Application to cerium im-purities,” Phys. Rev. Lett. , 2512 (1984). Qin Wu and Troy Van Voorhis, “Direct optimizationmethod to study constrained systems within density-functional theory,” Phys. Rev. A , 024502 (2005). Benjamin Kaduk, Tim Kowalczyk, and TroyVan Voorhis, “Constrained density functional the-ory,” Chem. Rev. , 321 (2012), pMID: 22077560,http://dx.doi.org/10.1021/cr200148b. Laura E. Ratcliﬀ, Luigi Genovese, Stephan Mohr, andThierry Deutsch, “Fragment approach to constraineddensity functional theory calculations using Daubechieswavelets,” J. Chem. Phys. , 234105 (2015). Stephan Mohr, Michel Masella, Laura E. Ratcliﬀ, andLuigi Genovese, “Complexity reduction in large quantumsystems: Fragment identiﬁcation and population analysisvia a local optimized minimal basis,” J. Chem. TheoryComput. (submitted), https://arxiv.org/abs/1612.02386. Luigi Genovese, Alexey Neelov, Stefan Goedecker, ThierryDeutsch, Seyed Alireza Ghasemi, Alexander Willand,Damien Caliste, Oded Zilberberg, Mark Rayson, AndersBergman, and Reinhold Schneider, “Daubechies waveletsas a basis set for density functional pseudopotential calcu-lations.” J. Chem. Phys. , 014109 (2008). Stephan Mohr, Laura E. Ratcliﬀ, Paul Boulanger, LuigiGenovese, Damien Caliste, Thierry Deutsch, and StefanGoedecker, “Daubechies wavelets for linear scaling densityfunctional theory,” J. Chem. Phys. , 204110 (2014). Stephan Mohr, Laura E. Ratcliﬀ, Luigi Genovese,Damien Caliste, Paul Boulanger, Stefan Goedecker,and Thierry Deutsch, “Accurate and eﬃcient linearscaling dft calculations with universal applicability,”Phys. Chem. Chem. Phys. , 31360 (2015). R. S. Mulliken, “Electronic population anal-ysis on lcao-mo molecular wave func-tions. i,” J. Chem. Phys. , 1833 (1955),http://dx.doi.org/10.1063/1.1740588. Per-Olov L¨owdin, “On the non-orthogonality problem con-nected with the use of atomic wave functions in the theoryof molecules and crystals,” J. Chem. Phys. , 365 (1950). Per-Olov L¨owdin, “On the nonorthogonality problem,”Adv. Quantum Chem. , 185 (1970). Alan E. Reed, Robert B. Weinstock, andFrank Weinhold, “Natural population analysis,”J. Chem. Phys. , 735 (1985). R.F.W. Bader and T.T. Nguyen-Dang, “Quantum theoryof atoms in molecules–dalton revisited,” (Academic Press,1981) p. 63. Richard F. W. Bader, “A quantum theoryof molecular structure and its applications,”Chem. Rev. , 893 (1991). C´elia Fonseca Guerra, Jan-Willem Handgraaf, Evert JanBaerends, and F. Matthias Bickelhaupt, “Voronoi defor-mation density (vdd) charges: Assessment of the mulliken, bader, hirshfeld, weinhold, and vdd methods for chargeanalysis,” J. Comput. Chem. , 189 (2004). F. L. Hirshfeld, “Bonded-atom fragmentsfor describing molecular charge densities,”Theor. Chim. Acta , 129 (1977). Emma Sigfridsson and Ulf Ryde, “Comparison of methodsfor deriving atomic charges from the electrostatic potentialand moments,” J. Comput. Chem. , 377 (1998). Lisa Emily Chirlian and Michelle Miller Francl, “Atomiccharges determined from electrostatic potentials: A de-tailed study,” J. Comput. Chem. , 894 (1987). Curt M Breneman and Kenneth B Wiberg, “De-termining atom-centered monopoles from molecularelectrostatic potentials. The need for high sam-pling density in formamide conformational analysis,”J. Comput. Chem. , 361 (1990). U. Chandra Singh and Peter A. Kollman, “An ap-proach to computing electrostatic charges for molecules,”J. Comput. Chem. , 129 (1984). Brent H. Besler, Kenneth M. Merz, and Peter A. Koll-man, “Atomic charges derived from semiempirical meth-ods,” J. Comput. Chem. , 431 (1990). Christopher I. Bayly, Piotr Cieplak, Wendy Cor-nell, and Peter a Kollman, “A well-behaved elec-trostatic potential based method using charge re-straints for deriving atomic charges: the RESP model,”J. Phys. Chem. , 10269 (1993). C. Alem´an, Modesto Orozco, and F.J. Luque, “Multicen-tric charges for the accurate representation of electrostaticinteractions in force-ﬁeld calculations for small molecules,”Chem. Phys. , 573 (1994). C. Hartwigsen, S. Goedecker, and J. Hutter, “Relativisticseparable dual-space gaussian pseudopotentials from h torn,” Phys. Rev. B , 3641 (1998). Alex Willand, Yaroslav O Kvashnin, Luigi Gen-ovese, ´Alvaro V´azquez-Mayagoitia, Arpan KrishnaDeb, Ali Sadeghi, Thierry Deutsch, and Ste-fan Goedecker, “Norm-conserving pseudopotentials withchemical accuracy compared to all-electron calculations,”J. Chem. Phys. , 104109 (2013). Luigi Genovese, Thierry Deutsch, Alexey Neelov,Stefan Goedecker, and Gregory Beylkin, “Eﬃ-cient solution of poissons equation with free bound-ary conditions,” J. Chem. Phys. , 074105 (2006),http://dx.doi.org/10.1063/1.2335442. Luigi Genovese, Thierry Deutsch, and Ste-fan Goedecker, “Eﬃcient and accurate three-dimensional poisson solver for surface prob-lems,” J. Chem. Phys. , 054704 (2007),http://dx.doi.org/10.1063/1.2754685. Alessandro Cerioni, Luigi Genovese, Alessandro Mirone,and Vicente Armando Sole, “Eﬃcient and accu-rate solver of the three-dimensional screened andunscreened poisson’s equation with generic bound-ary conditions,” J. Chem. Phys. , 134108 (2012),http://dx.doi.org/10.1063/1.4755349. G. Fisicaro, L. Genovese, O. Andreussi, N. Marzari,and S. Goedecker, “A generalized poisson andpoisson-boltzmann solver for electrostatic envi-ronments,” J. Chem. Phys. , 014103 (2016),http://dx.doi.org/10.1063/1.4939125. Koichi Momma and Fujio Izumi, “VESTA 3 for three-dimensional visualization of crystal, volumetric and mor-phology data,” J. Appl. Crystallogr. , 1272 (2011). S. Goedecker, M. Teter, and J. Hutter, “Sep-arable dual-space gaussian pseudopotentials,”Phys. Rev. B , 1703 (1996). John P. Perdew, Kieron Burke, and Matthias Ernzer-hof, “Generalized gradient approximation made simple,”Phys. Rev. Lett. , 3865 (1996). Carlo Adamo and Vincenzo Barone, “Toward reliable den-sity functional methods without adjustable parameters:The pbe0 model,” J. Chem. Phys. (1999). Matthias Ernzerhof and Gustavo E. Scuseria, “Assessmentof the perdew-burke-ernzerhof exchange-correlation func-tional,” J. Chem. Phys. (1999). P. J. Stephens, F. J. Devlin, C. F. Chabalowski, andM. J. Frisch, “Ab initio calculation of vibrational ab-sorption and circular dichroism spectra using densityfunctional force ﬁelds,” J. Phys. Chem. , 11623 (1994),http://dx.doi.org/10.1021/j100096a001. Miguel A.L. Marques, Micael J.T. Oliveira, andTobias Burnus, “Libxc: A library of exchange andcorrelation functionals for density functional theory,”Comput. Phys. Commun. , 2272 (2012). John P. Perdew, “Density functional theory and the bandgap problem,” Int. J. Quantum Chem. , 497 (1985). Simon J. Bennie, Marc W. van der Kamp,Robert C. R. Pennifold, Martina Stella, Freder-ick R. Manby, and Adrian J. Mulholland, “Aprojector-embedding approach for multiscale coupled-cluster calculations applied to citrate synthase,”J. Chem. Theory Comput. , 2689 (2016), pMID:27159381, http://dx.doi.org/10.1021/acs.jctc.6b00285. David A. Case, Thomas E. Cheatham, Tom Darden,Holger Gohlke, Ray Luo, Kenneth M. Merz, AlexeyOnufriev, Carlos Simmerling, Bing Wang, and Robert J.Woods, “The Amber biomolecular simulation programs,”J. Comput. Chem. , 1668 (2005), arXiv:NIHMS150003. David A. Case, T. A. Darden, T. E. Cheatham, Car-los L. Simmerling, J. Wang, Robert E. Duke, Ray Luo,Michael Crowley, Ross C. Walker, W. Zhang, K. M.Merz, B. Wang, S. Hayik, Adrian Roitberg, GustavoSeabra, I. Kolossv´ary, K. F. Wong, F. Paesani, J. Van- icek, X. Wu, Scott R. Brozell, Tom Steinbrecher, HolgerGohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui,D. H. Mathews, M. G. Seetin, C. Sagui, V. Babin, andPeter A. Kollman,

Amber 11 , University of California, SanFrancisco. Viktor Hornak, Robert Abel, Asim Okur, BentleyStrockbine, Adrian Roitberg, and Carlos Simmerling,“Comparison of multiple amber force ﬁelds and de-velopment of improved protein backbone parameters,”Proteins Struct. Funct. Bioinf. , 712 (2006). Greg Lever, Daniel J Cole, Nicholas D M Hine, Peter DHaynes, and Mike C Payne, “Electrostatic considera-tions aﬀecting the calculated homolumo gap in proteinmolecules,” J. Phys.: Condens. Matter , 152101 (2013). Heather J. Kulik, Jianyu Zhang, Judith P.Klinman, and Todd J. Mart´ınez, “How largeshould the qm region be in qm/mm calcula-tions? the case of catechol o-methyltransferase,”J. Phys. Chem. B , 11381 (2016), pMID: 27704827,http://dx.doi.org/10.1021/acs.jpcb.6b07814. Johannes Neugebauer, “Subsystem-based theoretical spec-troscopy of biomolecules and biomolecular assemblies,”ChemPhysChem , 3148 (2009). M. Eichinger, P. Tavan, J. Hutter, and M. Parrinello, “Ahybrid method for solutes in complex solvents: Densityfunctional theory combined with empirical force ﬁelds,”J. Chem. Phys. , 10452 (1999). Debananda Das, Kirsten P. Eurenius, Eric M. Billings,Paul Sherwood, David C. Chatﬁeld, Milan Hodoˇsˇcek, andBernard R. Brooks, “Optimization of quantum mechanicalmolecular mechanical partitioning schemes: Gaussian delo-calization of molecular mechanical charges and the doublelink atom method,” J. Chem. Phys. , 10534 (2002),http://dx.doi.org/10.1063/1.1520134. P. K. Biswas and V. Gogonea, “A regularized andrenormalized electrostatic coupling hamiltonian forhybrid quantum-mechanicalmolecular-mechanicalcalculations,” J. Chem. Phys.123