Automated construction of quantum-classical hybrid models
AAutomated construction of quantum–classical hybrid models
Christoph Brunken and Markus Reiher ∗ ETH Z¨urich, Laboratorium f¨ur Physikalische Chemie, Vladimir-Prelog-Weg 2,8093 Z¨urich, SwitzerlandFebruary 18, 2021
Abstract
We present a protocol for the fully automated construction of quantum mechanical-(QM)–classical hybrid models by extending our previously reported approach on self-parametri-zing system-focused atomistic models (SFAM) [
J. Chem. Theory Comput. , (3),1646–1665]. In this QM/SFAM approach, the size and composition of the QM regionis evaluated in an automated manner based on first principles so that the hybrid modeldescribes the atomic forces in the center of the QM region accurately. This entails the au-tomated construction and evaluation of differently sized QM regions with a bearable com-putational overhead that needs to be paid for automated validation procedures. ApplyingSFAM for the classical part of the model eliminates any dependence on pre-existing pa-rameters due to its system-focused quantum mechanically derived parametrization. Hence,QM/SFAM is capable of delivering a high fidelity and complete automation. Furthermore,since SFAM parameters are generated for the whole system, our ansatz allows for a con-venient re-definition of the QM region during a molecular exploration. For this purpose, alocal re-parametrization scheme is introduced, which efficiently generates additional clas-sical parameters on the fly when new covalent bonds are formed (or broken) and movedto the classical region. ∗ Corresponding author; e-mail: [email protected] a r X i v : . [ phy s i c s . c h e m - ph ] F e b Introduction
In contrast to most protocols of computational quantum chemistry that considerisolated molecules, chemical processes can take place in a vast variety of complexenvironments. Studying chemical reactions in proteins, in nanostructures, and onsurfaces, requires a theoretical approach that must provide an accurate quantum me-chanical description of the reaction center and at the same time an efficient model tocope with the enormous system size of the structured, heterogeneous environment.For these requirements to coalesce, one is forced to apply a hybrid method, which di-vides the system into several regions treated at different levels of approximation. [1,2]
Typically, the reaction center is modeled with a quantum mechanical method, whichallows one to describe the formation and cleavage of covalent bonds in a natural way.The environment may efficiently be treated by a force field [3] rooted in classicalmechanics. Such a quantum-mechanical/molecular-mechanical hybrid (QM/MM)model [4,5] typically requires significant manual work involved in the model construc-tion process. It is plagued by a lack of standardization and comparability, and rigor-ous uncertainty quantification is not available (as for most computational chemistrymethods), which is particularly critical for such complex composite models. Typi-cally, no standardized procedures exist regarding, for instance, the parametrizationof the force field (especially for metal-containing regions), the QM region determina-tion, the choice of boundary scheme, the initial structure generation, conformationalsampling, and the extent to which parts of the macromolecule are constrained dur-ing structure optimizations in order to confine the complexity of the system as wellas to limit the error of the MM energy contribution.One of the most prominent challenges for QM/MM modeling is the constructionof an efficient and at the same time accurate molecular mechanics model, which istypically only available for a pre-defined subset of chemical elements with standardbonding patterns. Its degree of transferability to a new system is not obvious at all.We addressed this issue in our recent work [6] by introducing the self-parametrizingsystem-focused atomistic model (SFAM), which allows for an automated construc-tion of a molecular mechanics model for a system of arbitrary size and elementalcomposition. The reference data for the model are obtained in a fully automatedway by an autonomous fragmentation algorithm and subsequent quantum chemicalreference calculations for the molecular fragments. Furthermore, SFAM includesa model refinement step based on ∆-machine learning [7] when additional referencedata become available during a molecular exploration.In this work, we extend SFAM toward quantum–classical hybrid models (QM/SFAM)with an automated set-up. The application of SFAM as the classical part of themodel eliminates any issues arising from an incomplete set of parameters. Further-more, SFAM guarantees that the reference data, from which the MM parametersare derived, can be provided by the same quantum chemical method as is appliedin the QM part of the hybrid calculation making the MM model as consistent as2ossible with the QM part.Furthermore, we present a scheme to determine the choice of the QM region whichresults in an accurate QM/SFAM model. The selection of atoms to include in thequantum mechanical part of the calculation is a highly non-trivial task and thedecision often relies on chemical intuition alone. However, there have been recentefforts to select QM regions systematically on a first-principles basis by Kulik andcoworkers. [8]
Their systematic QM region determination scheme relies on evaluatinghow specific residues of a protein affect the electronic structure (charge distribu-tion, frontier orbitals) of the core residues of the protein. In this work, we providean alternative approach solely based on the fundamental quantities of a molecularsystem, i.e., its electronic energy and its derivatives with respect to the nuclear co-ordinates, which are available from computationally cheap and approximate as wellas expensive and accurate electronic structure models.The nuclear derivatives play an essential role in obtaining reliable structures (bymolecular dynamics (MD) sampling or structure optimization) and we therefore fo-cus on the accurate description of the atomic forces when determining which atomsmust be included in the quantum mechanical part of the model. Defining an op-timal set of atoms as the QM region is essential, especially because it has beendemonstrated by Ochsenfeld, [9–11]
Martinez [12] and others [13–15] that for many sys-tems QM/MM models safely converge only with large QM region sizes of severalhundred atoms. Therefore, in those cases where such large QM regions are not fea-sible (e.g., vast reaction network explorations or MD simulations), it is inevitable tocarefully select the QM region systematically in order to guarantee that the resultingmodel is an accurate approximation to a full quantum mechanical model.In the following, we first describe our QM/SFAM model and then introduce analgorithm to systematically determine the composition of the QM region in an auto-mated way. This is demonstrated with the examples of (i) a medium-sized peptidethat also allows for full-quantum reference calculations and of (ii) a larger systemto resemble a typical case of application. Although these examples are taken frombiochemistry, we emphasize that our model is agnostic with respect to the elementalcomposition due to its first-principles core. Hence, any nanoscale atomistic systemcan be subjected to our hybrid model construction process, even one for which amolecular structure first needs to be constructed (by virtue of the SFAM approachthat early on in the model generation provides an approximate force field for iterativestructure refinement [6] ). 3
Theory
We briefly review the SFAM approach [6] as it will be the classical part of our hybridmodel. Similar to QMDFF [16] and QuickFF [17] molecular mechanics models, alsoSFAM is generated automatically for a specific molecular system from quantummechanical reference data, which yields accurate force fields without being limitedby the elemental composition of the molecular system. SFAM is distinct from thetwo aforementioned models [16,17] in two crucial aspects. First, SFAM force constantsare parametrized by a partial Hessian fit algorithm [18,19] as introduced by Hirao andcoworkers in 2016, i.e., the parameters are fitted solely to local information in theHessian, which allows us to generate the model for very large molecular systemsby calculating reference data for fragments cut out of the whole structure. Wealso introduced an autonomous fragmentation algorithm for this purpose. [6]
Second,SFAM includes an (optional) improvement step based on ∆-machine learning (∆-ML). [7]
While the MM base model of SFAM provides an accurate description of the potentialenergy surface (PES) close to the local energy minimum taken as a reference forparametrization, its parameters are not guaranteed to be transferable across allregions of the PES. The base model can be applied in an exploration of additionalstructures (e.g., in molecular dynamics simulations), for which additional referencedata can be calculated on the fly. The MM/ML ansatz of SFAM can then graduallyincrease its accuracy across the PES as an increasing amount of reference datais collected to train the ML model. Many ML-only models have been reportedas replacements for classical force fields. [20–24]
However, our MM/ML approach forSFAM has several advantages. On the one hand, the MM base model providesphysical insight into the properties of the system in contrast to an approach solelybased on ML. On the other hand, it requires only a limited and well controllableamount of reference data, as it is parametrized on single-point data obtained forfragments (optimized structures, atomic charges and Hessians).The SFAM energy E SFAM can be written as the sum of the MM and ML contribu-tions, E SFAM = E MM + E ML . (1) E ML is zero at this level, because we choose the hybrid QM/SFAM model as the newbase model, which can then be refined in a later step by ∆-ML. The SFAM energyexpression is divided into a covalent (cov) part E cov and nonbonding (nb) potentialenergy contributions E nb , E SFAM = E cov + E nb , (2)as is common in MM models. [3] The covalent energy contribution is calculated fromthe displacements of the internal degrees of freedom out of their equilibrium positionsand can therefore be divided into terms for bonds r , bond angles α , dihedral angles4 , and improper dihedral angles ϕ , E cov = E r + E α + E θ + E ϕ = (cid:88) ( A,B ) E ABr + (cid:88) ( A,B,C ) E ABCα + (cid:88) ( A,B,C,D ) E ABCDθ + (cid:88) ( X,B,C,D ) E XBCDϕ , (3)where ( A, B ) denotes a group of two bonded atoms A and B , ( A, B, C ) a bondedtriplet of atoms, (
A, B, C, D ) a bonded quadruplet of atoms, and (
X, B, C, D ) theatoms of an improper dihedral angle (with X representing the center atom). Thenonbonding interactions comprise an electrostatic part (estat), dispersive (disp) andPauli repulsion (rep) interactions, and hydrogen bonds, E nb = E estat + E disp + E rep + E hb = (cid:88) ( A,B ) (cid:0) E AB estat + E AB disp + E AB rep (cid:1) + (cid:88) ( D,H,A ) E DHA hb , (4)where ( A, B ) represents a pair of atoms, in which the atoms A and B are neitherbonded to one another nor both bonded to another atom C , and ( D, H, A ) is ahydrogen bond. For details on the potential energy expressions for each of the MMcontributions, as well as an explanation of the parametrization procedure, we referto our previous paper. [6]
We emphasize that combining the classical model in SFAM with a quantum chemicalmethod creates an opportunity to apply our ∆-ML improvement step to the hybridmodel. During a molecular exploration with the QM/SFAM method, quantumchemical reference data are collected without any additional effort. Our machine-learned model corrections can be trained with these data and will be valuable (i)if the QM focus is moved to a different section of the whole system and (ii) if thedata can be transferred to improve the description of similar molecular substructureslocated in the MM region. We also note that related efforts to combine machinelearning with quantum − classical hybrid methods have been reported recently. [25,26] The energy expression of any hybrid model with two distinct regions, a quantumcore Q and an environment E , can be approximately separated into a quantum-core-only contribution E Q , an analogous contribution of the environment E E , andan interaction energy E Q−E , E hybrid = E Q + E E + E Q−E . (5)When choosing two different methods for the two regions Q and E , there must not beany electromagnetic interactions included twice or be missing completely. As notedbefore, we apply a quantum chemical method for the core region and SFAM for the5nvironment. Furthermore, we distinguish two schemes for the interaction energy E Q−E , (a) one in which the electrostatic interaction is treated by SFAM and (b) onein which it is described quantum mechanically. The former is known as mechanicalembedding (ME) and the latter as electrostatic embedding (EE). Although it hasbeen demonstrated that EE provides more accurate results than ME, in particularfor small QM regions, [11] we implemented both embedding schemes because EEmay not always be available for the QM method of choice. In the case of ME, theQM/SFAM energy expression reads, E MEQM/SFAM = E Q + E SFAM + E Q QM − (cid:88) ( A,B ) ∈Q E ABr − (cid:88) ( A,B,C ) ∈Q E ABCα − (cid:88) ( A,B,C,D ) ∈Q E ABCDθ − (cid:88) ( X,B,C,D ) ∈Q E XBCDϕ − (cid:88) A ∈Q B ∈Q A (cid:54) = B (cid:0) E AB estat + E AB disp + E AB rep (cid:1) − (cid:88) ( D,H,A ) ∈Q E DHA hb . (6)In Eq. (6), the energy E Q + E SFAM refers to the SFAM energy of the full system (i.e., Q and E combined) and E Q QM is the electronic energy obtained in a QM calculation for Q , E Q QM = (cid:104) (cid:98) H Q el,ME (cid:105) (7)with the electronic Hamiltonian in atomic units (cid:98) H Q el,ME = − N Q el (cid:88) i = 1 m e ∆ i − N Q nuc (cid:88) α = 1 M α ∆ α − N Q el (cid:88) i = 1 N Q nuc (cid:88) α = 1 Z α | r i − R α | + N Q el (cid:88) i = 1 N Q el (cid:88) j = i +1 | r i − r j | + N Q nuc (cid:88) α = 1 N Q nuc (cid:88) β = α +1 Z α Z β | R α − R β | , (8)where N Q el is the number of electrons, N Q nuc is the number of atomic nuclei, Z α isthe nuclear charge of nucleus α , and r i and R α are the Cartesian coordinates ofelectrons and nuclei in Q , respectively.The latter part of Eq. (6) subtracts all energy contributions in the MM force fieldthat are covered by E Q QM , i.e., all bond terms E ABr with the atoms A and B bothin the QM region, all angle terms with at least two, dihedral terms with at leastthree, and all improper dihedral terms with all four of their corresponding atomsin the QM region. All pairwise noncovalent interaction terms are subtracted foreach pair of atoms in Q . Hence, all noncovalent interactions between Q and E aredescribed by SFAM, which is a consistent approach if the electronic structure modeldoes not acount for dispersive interactions (as in many standard density functionals)so that they can be treated semi-classically everywhere in the system. For EE, we6pply a quantum mechanical description of the electrostatic interaction by defining E EEQM/SFAM as, E EEQM/SFAM = E MEQM/SFAM + E Q QM, estat − (cid:88) A ∈Q B ∈E E AB estat , (9)with E Q QM, estat = E Q QM,EE − E Q QM,ME . (10)Eq. (9) includes an additional energy contribution obtained in the QM calculation of Q , namely E Q QM, estat . This is the interaction energy of the elementary particles (elec-trons and nuclei) in Q with the electrostatic potential generated by SFAM’s atomicpartial charges located at atomic positions in E as point charges. Naturally, theclassical equivalent of this interaction must be subtracted to avoid double counting.The electronic Hamiltonian operator (in Hartree atomic units) is therefore differentfor EE compared to ME, (cid:98) H Q el,EE = (cid:98) H Q el,ME − N Q el (cid:88) i = 1 (cid:88) A ∈E q A | r i − r A | + N Q nuc (cid:88) α = 1 (cid:88) A ∈E Z α q A | R α − r A | . (11)Here, q A is the partial charge of atom A in E , and as before, Z α is the nuclear chargeof nucleus α in Q , and r A , r i , and R α are the Cartesian coordinates of the atoms in E and the electrons and nuclei in Q , respectively. The van der Waals interactionsare treated at the SFAM level (based on semi-classical dispersion corrections [27–30] of Grimme) in both embedding schemes. Within Q , the QM method must take careof dispersive interactions. The challenge of describing a single molecular system with two different physicaltheories becomes most apparent at the boundary of the two regions Q and E , par-ticularly if the boundary intersects a covalent chemical bond. [31] Various strategieshave been developed for modeling this QM–MM boundary. [1,2]
The by far mostcommon one is the link-atom approach, [31–37] in which the covalent bond at theborder of the QM region is valence saturated by a hydrogen atom or some otherprototypical residue (e.g., a methyl group). The most prominent alternative is togenerate localized bond orbitals from a slightly larger QM calculation and includethese doubly occupied orbitals in the QM calculation of the hybrid method. Duringthe self-consistent field (SCF) optimization of the orbitals, these artificial orbitalsare kept frozen. This approach known as the local-SCF method [38–41] was introducedby Rivail and coworkers and later extended by Gao and coworkers. [42–44]
Further-more, advanced embedding approaches may be applied to separate the QM regionfrom an environment, such as projector-based embedding and embedded mean-fieldtheory, [45–49] frozen density embedding, [50–61] or the subsystem separation by unitaryblock-diagonalization approach (SSUB). [62]
7n our QM/SFAM implementation, we focus on the link-atom approach, but empha-size that our implementation can be extended to include the more advanced bound-ary schemes mentioned above. It is crucial for the link-atom approach to carefullyselect the bonds which indicate the boundary of the QM region, because replacingheavy atoms with hydrogen link atoms introduces artificial effects on neighboringmolecular entities by distorting the electronic structure.Our automated strategy for placing link atoms is described in section 2.7. Alongthe vector of a single bond i , r E i Q i = r E i − r Q i , (12)which is defined by the coordinates of atom E i in the environment and Q i of thequantum region bonded to E i , a hydrogen link atom is positioned at r H i = r Q i + ( R Q i , cov + R H,cov ) r E i Q i | r E i Q i | , (13)with the covalent radii [63,64] R Q i , cov and R H, cov of Q i and hydrogen. This approachallows us to cut through single bonds only (see section 2.7), which is, however, not asevere restriction, especially considering the fact that the QM region can be enlargedto eventually meet it.To exploit the force on an artificial link atom H i , which is located between E i and Q i , its energy gradient g H i calculated in the QM calculation must be distributed to E i and Q i . The gradient contributions ˜ g E i and ˜ g Q i in direction µ ( µ = x, y, z ) are,following Ref. 65,˜ g Q i ,µ = g TH i · (cid:18)(cid:18) − | r Q i − r H i || r E i Q i | (cid:19) u µ + | r Q i − r H i | d µ | r E i Q i | r E i Q i (cid:19) , (14)˜ g E i ,µ = g TH i · (cid:18) | r Q i − r H i || r E i Q i | u µ − | r Q i − r H i | d µ | r E i Q i | r E i Q i (cid:19) . (15)where d µ is the absolute value of the difference in the µ -th component of r E i and r Q i ,and u µ is a unit vector in direction µ , e.g., u z = (0 , , T . With these equations,it is straightforward to calculate analytic gradients for the QM/SFAM energy aslong as analytic gradients are available for the QM method (including gradientson the external point charges). These gradients are essential for efficient structureoptimizations and for molecular dynamics simulations with our QM/SFAM model.In electrostatic embedding, another issue arises at the Q – E boundary. Since the par-tial charge q E i of some atom E i at the boundary is included into the QM calculationas an external point charge, the link atom may suffer from overpolarization effectscaused by the close proximity of that charge. To counteract this artificial effect,many strategies have been proposed such as deleting the charge, [32,66,67] redistribut-ing it, [68–72] or smearing it out [36,73,74] by replacing the point charge by a Gaussiancharge distribution centered at E i . We apply a charge redistribution scheme that8as shown to produce accurate results compared to full QM calculations. [72] Weimplemented two variants of the charge redistribution: (i) one in which the totalcharge is conserved (redistribution of charge, denoted RC) and (ii) one in whichalso the bond dipoles of the first shell of bonds in E are conserved (redistribution ofcharge and dipoles, denoted RCD). Both schemes are illustrated in Fig. 1. R OHCH R 𝜀 i 𝜀 i ,1 𝜀 i ,2 𝜀 i ,3 𝓠 i + q + q + q (a) R OHCH R 𝜀 i 𝜀 i ,1 𝜀 i ,2 𝜀 i ,3 𝓠 i + + + − q − q − q (b) Figure 1:
Illustration of (a) the redistribution of charge (RC) scheme and (b) the redis-tribution of charge and dipoles (RCD) scheme to prevent overpolarization of a link atom(not shown) by the charge on E i at the QM/SFAM boundary i (wiggly line). Green colorhighlights the redistribution of partial charges toward the positions to which the arrowsare pointing, whereas the color red indicates a subtraction of charge. We define q = q E i /n with n being the number of non-QM neighbors of E i . In this example, n = 3 as the neigh-boring atoms are E i, , E i, , and E i, . R represents the remainder of the QM region and R denotes the rest of the environment. In both schemes, the charge on E i vanishes; i.e., the new charge is ˜ q E i = 0. In the RCscheme, the charge is shifted equally to the n neighbors E i,k (with k ∈ { , . . . , n } )of E i , which are also in E ,˜ q E i,k = q E i n for k = 1 , . . . , n . (16)In the RCD scheme, to conserve the bond dipoles of the bonds E i − E i,k , the chargeon E i is shifted to the positions half-way in between the E i − E i,k bond vectors anddoubled in magnitude, resulting in auxiliary charges at positions r aux ,k ,˜ q r aux ,k = 2 q E i n , (17)with r aux ,k = 12 (cid:0) r E i + r E i,k (cid:1) . (18)The factor of two, i.e., the doubling of the shifted charge, is introduced to preservethe magnitude of the bond dipole as the distance between the charges is halved.Consequently, also the charges on the neighboring atoms must be adjusted so thatthe new charges on atoms E i,k become˜ q E i,k = q E i,k − q E i n . (19)9 .4 QM/SFAM structure optimization As targets of QM/SFAM are large systems with many degrees of freedom, structureoptimizations tend to require many iterations to reach convergence. Most of thedegrees of freedom to optimize can be attributed to the environment E and there-fore the necessity to perform a QM calculation in every optimization step can beavoided by a microiteration-based structure optimization. Several variants of suchan algorithm exist and have been implemented in QM/MM programs to acceler-ate structure optimizations of large systems. [75–81] The aim of these approaches is toreach the same local energy minimum structure as in a regular optimization withoutexpensive QM calculations in every step.In our variant of the algorithm, the Cartesian coordinates of all atoms in Q aswell as those of atoms in E within a distance R to any atom in Q are frozen,while all remaining MM degrees of freedom are relaxed (either until convergenceor until a maximum of N steps is reached). As all of the atoms in Q are fixed(i.e., their gradients are treated as zero), no QM calculation is necessary duringthese microiterations. We note that, despite the system-focused parametrization ofSFAM that can be tailored to the QM model in QM/SFAM, our attempts to utilizethe MM gradient for the whole system in this step were fruitless. In fact, it is thisremaining mismatch of SFAM and QM forces that requires the environment atomsat the QM boundary to be kept frozen.Then, the complete system is relaxed according to the full QM/SFAM gradients. Nonuclear positions are constrained and therefore a QM calculation is needed for eachevaluation of the complete gradient. Once convergence has been reached the opti-mization terminates. If convergence cannot be reached after N steps, one macroi-teration step will be completed and the procedure will iterate again starting withthe first MM-only step.For the parameters of this algorithm, we found values of R = 4 ˚A, N = 1000and N = 15 to perform well in all examples studied in this work, but they maybe adjusted if needed. For the individual structure optimizations, the algorithmsimplemented in the SCINE Utilities library [82] are applied. A crucial issue of molecular mechanics models is that their error in the total energyof the system is expected to scale unfavorably with system size, e.g., measured interms of the number of atoms N at . While for small systems, energies obtained withmolecular mechanics have been shown to be accurate, especially with system-focusedmodels, [6,16,17] this is not expected for large systems, which we illustrate by a simplestatistical model. [83] Consider the covalent terms in the total MM energy expressionwhich is a sum of approximately 3 N at (mostly) independent terms, each with an10ncertainty of ± ∆ ε . We can model the estimated error under the ideal assumptionof equal probabilities to either underestimate or to overestimate the energy of asingle potential term by ∆ ε . For large N at , the corresponding binomial distributioncan be approximated by the normal distribution. [84] As a result, we estimate a totalerror of at least ∆ E ≈ . (cid:112) p (1 − p ) N at (20)to occur with a probability of 50% applying these simple assumptions with p beingthe probability of overestimating an individual potential energy by ∆ ε instead ofunderestimating it. For example, for a system with 1000 atoms it is expected thatwith a probability of 50%, a total error of at least 18 . ε will be observed, whichscales with √ N at . Moreover, the MM model may exhibit a systematic over- orunderestimation of the potential energies to some (minor) extent, resulting in anexpected error that scales linearly with system size. Furthermore, we note that theMM noncovalent pair interaction terms, which outnumber the covalent terms, aremore difficult to assess with respect to their error contribution [85,86] because of theirdistance and hence structure dependence.Regardless of the simplified assumptions inherent to Eq. (20) such as neglecting addi-tional uncertainties introduced by the noncovalent interactions (see also Refs. [85,86] ),it is apparent that for large systems the energies of classical models may exhibit largeuncertainties (even with system-focused approaches). By contrast, atomic forces arelocal quantities evaluated as partial first-order derivatives at a given reference struc-ture for each atomic nucleus.In view of these considerations, common practice in QM/MM studies is to gener-ate and sample structures, either by molecular dynamics simulations or structureoptimizations. [87–93] To obtain accurate energies of local minima on the PES, it iscommon to freeze all MM atoms beyond a given distance from the active site duringstructure optimizations [90,94–97] to obtain a converged structure at smaller computa-tional cost and with larger resemblance of a reference structure such as a structuremeasured by X-ray diffraction, [98] and to eliminate the contribution of most of theMM region to the total energy. The advantage of this strategy, compared to neglect-ing all MM contributions to the total energy, is that effects of structural changesclose to the active site are captured. However, there exist no standardized guidelinesfor the choice of this additional cutoff parameter, which may have a significant effecton calculated energies.Considering all of the aforementioned factors, we introduce a reduced QM/SFAMenergy E redQM/SFAM to counteract the possibly large uncertainties induced by the clas-sical description of a large environment E , E redQM/SFAM = E Q QM + E Q−E , (21)in which any covalent SFAM contributions as well as the noncovalent interactionswithin the environment are neglected. The QM calculation is embedded into theenvironment by including the Q – E interaction E Q−E either through mechanical orelectrostatic embedding (see section 2.2).11e propose that during a molecular exploration, relevant structures should be iden-tified with the complete QM/SFAM model (e.g., by molecular dynamics simulationsor structure optimizations), while for the energy differences of intermediate struc-tures on the PES, the difference between the two strategies for computing the energy,∆ = ∆ E QM/SFAM − ∆ E redQM/SFAM , (22)should be monitored closely. For ∆ E redQM/SFAM , it is crucial that energy contributionsfrom structural changes in close proximity to the active site are picked up by theQM calculation. With SFAM as the classical part of the hybrid approach, parameters are always gen-erated for the whole system automatically before starting a molecular exploration.Therefore, one is not restricted in the selection of the QM region and may freelyre-define the QM region. It can be valuable to have this flexibility in automatedreaction network explorations [99,100] as well as in reactive molecular dynamics simula-tions because reactive centers can shift during a multi-step mechanism. This featureis also a requirement for applying QM/SFAM in an interactive quantum chemistryframework, [101,102] as the ability of the operator to choose a region of interest in alarge system should not be limited by missing parameters.Naturally, QM/SFAM does not require parameters for the covalent terms in the QMregion. This means that a reaction that takes place in the QM region and modifiesthe local connectivity of the atoms (and hence the SFAM topology), does not resultin the model to become unapplicable. Note that parameters for van der Waalsinteractions, namely the dispersion coefficients, may be required in the QM region.These can be either quickly re-evaluated or, as an approximation, kept constant evenafter the modification of the topology, because the dispersion coefficients for the sametypes of elements are expected to be similar (we note that the dispersion coefficientsof the predecessor of D3, i.e., D2, [103] are fixed for each pair of elements). Partialcharges are expected to be less transferable after a chemical reaction; however, theseare not needed for atoms in the QM region.Even if no bond breaking and bond formation processes are possible in the classicalregion, a re-definition of the QM region can move atoms affected by such processesfrom the QM region to the environment. In this case, the connectivity of the atomsis modified in the classical region and therefore SFAM must be re-parametrized ifthe newly required SFAM parameters are not available due to the existence of thesame bonding pattern somewhere else in the system. To cope with such events, wehere extend our SFAM parametrization procedure [6] by the option to re-parametrizelocally, for which QM data from the QM-region calculation may be exploited. In12eneral, the missing parameters must be obtained in an efficient way at a smallfraction of the cost of the full-system parametrization.At the start, we identify all parameters which are not covered by the existing setof SFAM parameters. To calculate the required reference data (i.e., optimized localgeometries, Hessian matrix, atomic partial charges, bond orders), we fragment thewhole system as explained in our original work on SFAM, [6] but perform calculationsonly on those fragments that were generated around the atoms involved in the bonds,angles, or dihedral angles with missing parameters. Subsequently, the parametersare optimized based on the calculated Hessians and local equilibrium geometries.Partial charges and connectivity information (obtained from Mayer [104,105] covalentbond orders) are updated for all atoms and bonds for which new information isavailable (see our original work on SFAM [6] for details). Moreover, the dispersioncoefficients are re-evaluated for the whole system due to the negligible additionalcomputational effort associated with it; see Fig. 2 for an overview of the wholeprocedure. chemicalreactionre-parametri-zationIdentify missingparameters Calculate reference datafor relevant fragments Optimize missingparameters onlyUpdate atomic charges,bond orders anddispersion coe ffi cients1234 𝓠 𝜀 movedto classicalregionno extraparametersneeded 𝓠 𝜀 𝜀 missingparameters 𝓠 Figure 2:
Local re-parametrization of the SFAM model if a region in which new covalentbonds are formed or broken is moved out of the QM region Q to become part of theenvironment E . Shown is an esterification reaction, which is embedded in a large environ-ment (not shown for the sake of clarity). The box provides a short summary of the stepsinvolved in a local model reparametrization. Finally, we note that this strategy can be combined with a second approach towardflexible QM regions, i.e., with adaptive QM/MM schemes for molecular dynam-ics. [106–110]
These have been developed in recent years to allow for moving smallmolecules (e.g., solvent molecules) from the MM to the QM region (and vice versa)during an MD simulation while preserving a smooth description of the total energy.13 .7 Automated selection of the QM region
In this section, we introduce our algorithm for the selection of atoms for the QMregion. Once a location of the QM region is provided, either based on structural char-acteristics that indicate chemical reactivity or by explicit manual determination, thislocation allows us to identify an atom around which the QM region is constructed(called “the center atom”). Our aim is to define an automated, universal, anddata-driven procedure to find an accurate QM/MM model for the description of thereactive center when compared to a full QM calculation on the system or, if this iscomputationally not feasible, to the best possible estimate of that. First, one needsto define a descriptor to measure the accuracy of a given model. As mentioned inthe Introduction, previous work by Karelina and Kulik [8] applied descriptors basedon charge distribution. However, we focus on the forces on atoms in the proximityto the center atom, because these relate directly to reasonable structures either instructure optimizations or molecular dynamics simulations.For long-time molecular dynamics simulations (possibly with large basis sets), onecannot afford as large of a QM region as, for instance, in structure optimizationsthat require less than 100 single-point gradient calculations. Moreover, for smallQM regions it is also important to have a systematic approach toward a reliablesolution. We therefore emphasize the importance of automation required to carryout a large number of exploratory calculations on candidate models of different sizewhich need to be automatically set up, carried out, and then analyzed (includingalso the construction of the models) for the reliable and autonomous QM regiondetermination to be applicable in a routine fashion.In the following, we first explain how we construct QM regions around a given centeratom automatically. Then, the selection criteria for the QM region are discussed.Finally, we clarify how to obtain reference data for systems where a full QM calcu-lation is not feasible.The construction of a QM region with a user-defined center atom represents a taskanalogous to the fragmentation step in our SFAM model generation. [6]
In the lattercase, we construct one molecular fragment around each atom of the system undersubsequent valence saturation with hydrogen atoms. This is achieved by first defin-ing a sphere with radius r around a selected atom and adding all atoms within itto the fragment. Second, all covalent bonds which were cut by the sphere’s edge areidentified and followed outwards recursively until a covalent bond is reached at whichthe system can be divided and valence saturated by a residue (currently, hydrogen-atom saturation has been implemented). Which bonds are considered cleavable ispre-determined, but can be adapted for a given system. For biochemical systems,C sp − X bonds (with X = C, N) can be considered a suitable choice because oftheir abundance in biological macromolecules. We emphasize that advanced em-bedding schemes, [45–49] frozen density embedding [47–62] may have the potential toreplace this rule-based saturation approach. Combined with an initial radius be-14ween r = 5 . r = 7 . [6] which means that the re-quired reference data can easily be calculated for these fragments with contemporarydensity functional theory.To sample several model sizes and boundaries, we introduce a stochastic elementto our automated QM/MM model construction. If a cleavable bond is reached, thesystem may be chosen to be split into QM and MM parts at that bond with someprobability p . Naturally, the resulting set of QM regions will contain duplicatesbecause each QM region is constructed independently. Hence, the set of QM regionsmust be deduplicated. This straightforward approach is chosen over a systematicgeneration of all possible QM regions, because of its simple implementation in thecurrent fragmentation framework and the otherwise exploding number of possibleQM regions of varying size, for which an exhaustive generation and selection processbecomes unfeasible. Nevertheless, we point out that for small QM regions, thestochastic approach is also capable of generating all possible QM regions exhaustivelyup to a given size due to the efficiency of the fragmentation algorithm. Hence, theparameters r and p allow for adjusting the QM region size and the variation ofsizes.To filter and categorize the generated QM regions, we introduce two additional de-scriptors. The first one is the number of covalent bonds m link cut in the processof defining the QM region, which is equal to the number of link atoms in the re-sulting fragment. As a second feature of the generated QM regions, we introduce asymmetry measure m sym , m sym = r LDM r MDQ , (23)where r LDM is the mean distance of the central atom to the three least distant MMatoms (LDM) and r MDQ is the mean distance of the central atom to the three mostdistant QM atoms (MDQ). This descriptor can be applied to assess the extent towhich atoms are arranged aspherically around the reactive center.To measure the reliability of some automatically produced QM/MM model m interms of how accurately the atomic forces in close proximity to the center atom aredescribed compared to a reference (’ref’), we first select a set of N repr representativeatoms that are closer than a cutoff of r repr to the center atom. The mean absoluteerror ε mk of the force components ( f x,m,k , f y,m,k and f z,m,k ) for a given atom k in thisset is given by ε mk = 13 (cid:16) | f x, ref , k − f x, m, k | + | f y, ref , k − f y, m, k | + | f z, ref , k − f z, m, k | (cid:17) . (24)The overall accuracy can then be measured by the mean of these errors, ε m mean = 1 N repr N repr (cid:88) k = 1 ε mk . (25)15he reference forces f ref , k = ( f x, ref , k , f y, ref , k , f z, ref , k ) T in Eq. (24) may be obtainedfrom a QM calculation on a significantly larger system. As only one single-pointgradient evaluation is necessary for this purpose, this will be feasible for systemsof several hundred atoms. Alternatively, we may obtain a reliable estimate for thereference in the case of larger systems by averaging the force vectors obtained from asample of N ref QM/SFAM models with large QM regions by choosing a radius r thatis as large as possible for a single-point calculation in order to be still feasible in areasonable amount of time. It is important that this estimate is not based on a singlereference model, but on many different ones, because it has been shown that onecannot be certain that molecular properties are converged even with QM regionsof up to several hundred atoms. [9–12] The comparison of several QM/MM modelswith different QM/MM boundaries allows to detect whether the atomic forces areconverged with the QM region size that was chosen for the reference. If significantdeviations exist between reference calculations, it can be detected and flagged bythe algorithm automatically.Finally, we note that it has been demonstrated that one can deploy the domain-basedlocal pair natural orbital coupled cluster methods [111–113] as the QM part (allowingfor large QM regions) to obtain accurate reaction barrier heights, [87,114] which iscrucial in mechanistic explorations for the subsequent kinetic analysis. [115]
RunningQM/SFAM calculations with these QM methods is enabled through an interface tothe quantum chemistry software ORCA. [116,117]
We demonstrate our automated QM/SFAM set-up algorithm with two examples.For the first example, chain A of the peptide hormone insulin [118] was chosen becauseits size of slightly more than 300 atoms is large enough to test the effects of differentQM regions on the QM/SFAM results while at the same time full QM calculationsare still feasible so that a well-defined full-QM reference is available. The initialstructure was taken from the Protein Data Bank [119] (PDB ID: 1AI0).
To study a chemical reaction in this system, we added a 1-propanol molecule inclose proximity to the carboxylic acid group of the C-terminus of the chain (as-paragine, A21), which serves as the initial structure of an esterification reaction.The product structure therefore contains a free water molecule and the propyl estercompound. This reaction is depicted in Fig. 3. With the added alcohol as reac-tant, the system consists of 328 atoms. We fully pre-optimized the reactant andproduct structures with the PM3 semi-empirical method [120,121] applying the
ORCA4.2 quantum chemistry software. [116,117]
The subsequent DFT optimization with16I-PBE-D3(BJ)/def2-SVP [28,122–126] was limited to ten optimization steps in orderto obtain forces on all atoms that neither vanish nor acquire artificially large nu-merical values. The coordinates of these structures can be found in the SupportingInformation.For the reactant structure, a SFAM molecular mechanics model was parametrizedin a fully automated fashion. [6]
The reference data, i.e., optimized structures, Hes-sians, Mayer bond orders, and atomic partial charges, were obtained for the RI-PBE-D3(BJ)/def2-SVP electronic structure model with
ORCA 4.2 driven by oursoftware.
ORCA
Hirshfeld charges, [127] were converted to Charge Model 5 charges byour implementation of the published algorithm [128] in our SCINE software. [129] NH H O NHO NH O O OHS
A19Tyr AsnA18 A17-A2... GlyA1
OH NHO NH HO O OHS
A19Tyr AsnA18 A17-A2... GlyA1 NH Figure 3:
Reaction of 1-propanol with chain A of the peptide hormone insulin, which con-tains 21 amino acids, labeled A1-A21. The residues A20 (cysteine) and A21 (asparagine)are represented as Lewis structures. Note that this representation refrains from showingthe disulfide bridge connecting two cysteine residues at positions A7 and A11.
For the generation of the second example, we solvated the (dry) insulin peptidestructure with water molecules. We applied the solvation tool of the
ADF 2016.107 graphical user interface [130] with which 418 water molecules were added (16 ˚A sphere,solute factor of 2.0), resulting in the second system that now comprises 1582 atomsin total. After the water molecules were added, the system was not relaxed. Again,we note that this is done deliberately to work with non-zero forces. The coordinatesof the solvated structures can also be found in the Supporting Information (see Fig. 4for a ball-and-stick representation of both systems). The SFAM parametrization wascarried out analogously to the unsolvated insulin system.17e note that for the parametrization, the bottleneck with respect to computingtime is the reference data generation step with all other tasks being completed inless than five minutes for the dry insulin peptide structure and less than one hourfor the solvated structure on a modern computing architecture with a single core.However, the time needed for the reference data generation can vary significantlydepending on the number of cores chosen for parallel execution. In our set-up, allreference calculations for the solvated structure were completed within a few daysin 200 parallel calculations on 4 cores each.For QM/SFAM calculations, the RI-PBE-D3(BJ)/def2-SVP [28,122–126] combinationof density functional and basis set was applied for the QM region through an interfaceto the
ORCA 4.2 software [116,117] in both examples.Figure 4:
Dry (left) and microsolvated (right; 16 ˚A water-molecule sphere) insulin withpropanol as reactants (cf. Fig. 3) Atom coloring: carbon in gray, hydrogen in white,oxygen in red, nitrogen in blue, and sulfur in yellow.
We first apply the QM-region generation algorithm of section 2.7 to the structuralmodel of dry insulin. The carbon atom of the carboxylic acid group of residueA21 (see Fig. 3) was chosen as the center atom around which the QM region isconstructed. We study whether the accuracy of the atomic forces close to the centerof the QM region can be exploited to ensure reproducibility of the QM reference bythe QM/SFAM model. For this purpose, a large variety of different QM regions wasgenerated, including aspherical ones with a large value of m sym (see Eq. (23)). We18pplied a small initial radius r = 5 . m sym obtained by this algorithm can be found in the Supporting Information.
50 100 150 200 250 300Number of QM atoms0.000.250.50 M e a n e rr o r o n f o r c e s / k c a l m o l b o h r with thiol groupwithout thiol group Figure 5:
Mean error ε m mean of atomic forces of all non-hydrogen atoms within 4.0 ˚A ofthe central carbon atom in the generated QM/SFAM models for the unsolvated initial structure of insulin. Each point corresponds to a different QM/MM model m . The pointsare colored such that for the orange ones the thiol group SH . is part of the QM region,whereas for the blue points it is part of the environment. The point farthest to the rightcorresponds to the model representing the full QM calculation, which is taken as thereference for the error evaluation. Fig. 5 presents the results of evaluating the mean error ε m mean on the atomic forces ofall non-hydrogen atoms within r repr = 4 . N repr = 10),according to Eq. (25). We observe that only four models exhibit a large error ofmore than 0.75 kcal mol − bohr − . All of these models consist of less than 60 QMatoms. Despite the obvious rationale that more QM atoms should result in a higheraccuracy, the mean error on the forces does not strictly decrease with the size of theQM region. This can be attributed to our choice to generate models with a largevariance in asphericity (achieved by applying a small value for p and measured bythe symmetry score m sym ). Large QM regions with more than about 200 QM atomsmay be lacking a residue in close proximity to the center of the QM region, whichwould then result in a large error ε m mean . However, for the models with the smallest19rror for a given size of the QM region, the trend of a decreasing error is observedfor models with more than 150 QM atoms. QM regions that are very asphericalmay be regarded as unsuitable and should not be considered in an application. Thiscan be achieved by choosing a larger value for p as well as by directly rejectingaspherical models with an improper value for m sym . We demonstrate this with oursecond example, the solvated insulin in section 3.3, for which the expected trend ofdecreasing ε m mean with increasing QM region size can be observed (see below).Moreover, it can be easily seen in Fig. 5 that the data split into two groups, whichare separated by about 0.3 kcal mol − bohr − . It follows from this observation that ε m mean allows us to easily eliminate one of these groups from our consideration. Thecause for this effect can be attributed to including or excluding the thiol group ofthe cysteine residue A20 from the QM atoms. The distance of the sulfur atom ofthis group to the central atom was 5.7 ˚A, which implies that it was not alwaysincluded in the QM region. However, it was close enough to affect the esterificationreaction significantly. We call this group SH . . The coloring in Fig. 5 highlights thisobservation. It demonstrates that our descriptor ε m mean is able to clearly distinguishQM/MM models in which SH . is part of the QM region from those where it isnot. 98.4% of all models m that contain the SH . in their QM region were able toreproduce the reference forces of the full QM calculation with a mean error ε m mean less than 0.3 kcal mol − bohr − , while all of the other models produced a larger errorthan 0.27 kcal mol − bohr − . With this example, in which the crucial functionalgroup can be easily identified, we understand that it is possible to automatically andreliably sort out QM/SFAM models where an unreliable choice of QM region leads tolarge errors in atomic forces. Furthermore, we identified a second functional group(a carboxylic acid moiety (COOH . ) at a distance of 9.5 ˚A to the central carbonatom), for which an effect on the forces is observed. As this residue has a largerdistance to the reaction center, its influence on the latter is smaller, which resultsin the observation that the corresponding groups of data are not well separated.Due to the small size of this effect, we refer to the Supporting Information for itsvisualization (differently colored version of Fig. 5).If a QM/SFAM model is able to accurately describe the forces in the reactant struc-tures, but poorly for intermediates, transition states, or products, it will not besensible to rely on this descriptor for the evaluation of the QM region selectionprotocol. Therefore, we evaluated the atomic forces on the same atoms as beforefor the final (product) structure of the reaction shown in Fig. 3 by applying thesame 2000 automatically selected QM/MM models to assess whether the modelsthat led to a small error for the initial structure also performed well for the finalstructure. The results of this comparison are presented in Fig. 6, in which we encodethe accuracy of the models on the initial structure forces by their color. We observean almost perfect agreement of force deviations measured in terms of ε m mean for theinitial structure and those for the final structure, indicating that our measure forthe reliability of a selected QM region is likely to be transferable across a PES, atleast for close-to-minimum energy structures.20aturally, the size of the QM region may be different for other physical quantitiesand our assumption that the forces are most crucial for making a decision on the size,albeit reasonable from a structural point of view, needs to be scrutinized. Therefore,we now discuss whether those QM/SFAM models that most accurately reproducedforces also deliver reliable energies. For the esterification reaction in dry insulin,the reaction energy calculated as the difference of reduced QM/SFAM energies (seesection 2.5, Eq. (21)) obtained for the product and reactant structures are presentedin Fig. 7. where the same coloring scheme used for the forces of the product structurein Fig. 6 is applied.
50 100 150 200 250 300Number of QM atoms0.00.20.40.60.81.0 M e a n e rr o r o n f o r c e s / k c a l m o l b o h r M e a n e rr o r f o r i n i t i a l s t r u c t u r e / k c a l m o l b o h r Figure 6:
Mean error ε m mean on atomic forces of all non-hydrogen atoms within 4.0 ˚A ofthe central carbon atom for the generated QM/SFAM model for the unsolvated insulin final (product) structure. The model with the smallest QM region exhibits a significantlylarger error than 1 kcal mol − bohr − (as in the case of the initial reactant structure, seeFig. 5) and therefore has been omitted here. The colors encode the mean error thatwas obtained for the initial reactant structure with that model (i.e., the vertical axis inFig. 5). Models with errors larger than 0.3 kcal mol − bohr − on the initial structure arenot further distinguished in color as these are not considered reasonable model candidates.This representation highlights that the accuracy found for the initial structure is matchedby the accuracy obtained at a different location on the PES (here, the final productstructure of the esterification reaction). First, we observe that the average energy error is decreasing continuously withgrowing QM region size. Second, the models that exhibited a large error on theforces (larger than 0.3 kcal mol − bohr − ) are separated from the well-performingQM/SFAM models with only a small number of exceptions. This shows that models21hat were discarded after evaluating their accuracy on the forces are also expectedto generate large errors in energy, confirming the reliability of our selection strategy.However, we also observe that for the well-performing models for which the reactionenergy only fluctuates by less than 2 kcal mol − for a given QM region size, theaccuracy of the forces does not map perfectly to the accuracy of the reaction energies(compared to Fig. 6). Models with differences of less than 0.2 kcal mol − bohr − onthe forces are not distinguished in terms of the energies. However, we still reduce theerror (compared to the most accurate QM/SFAM model for a given QM region size)significantly by excluding the models with large errors on the forces; for instance,it is reduced from roughly 4 kcal mol − to 2 kcal mol − considering QM region sizesbetween 100 and 200 atoms.
50 100 150 200 250 300Number of QM atoms02468101214 A b s o l u t e e rr o r o f r e a c t i o n e n e r g y / k c a l m o l M e a n e rr o r o n f o r c e s f o r i n i t i a l s t r u c t u r e / k c a l m o l b o h r Figure 7:
Absolute error of the esterification reaction energy for each of the generatedQM/SFAM models of the unsolvated chain A of insulin. The colors correspond to themean error on the atomic forces close to the reaction center obtained for the initial struc-ture with that model (i.e., the vertical axis in Fig. 5). Models with errors larger than0.3 kcal mol − bohr − on the initial structure are not further distinguished by color. Fig. 7 shows that the model evaluation based on atomic forces can predict whichmodels exhibit large errors of reaction energy. However, we observe that it cannot beguaranteed that the models with the smallest values of ε m mean also exhibit the small-est energy errors. Considering these observations, we conclude that the descriptor ε m mean can reliably eliminate choices of QM regions, which are lacking residues thatsignificantly affect key physical quantities of the reaction. However, the fact thatthis descriptor is based on a single-point property (the atomic forces), results in botha crucial advantage and a drawback of the method. On the one hand, it allows us22o efficiently test many model candidates in an automated fashion against referencedata that is calculated with models with very large QM regions. On the other hand,we have shown that it cannot be guaranteed that the models with the smallest ε m mean provide reaction energies that are within 1 kcal mol − of the QM reference. Fig. 7demonstrates that in our example, this accuracy can only be achieved by applyingvery large QM regions with more than 250 atoms. During an exploration of a molec-ular reaction, we therefore stress the importance of applying models of several QMregion sizes in single-point energy calculations to probe for convergence in order toclosely monitor the uncertainties by which the QM region selections are affected.Respective algorithmic procedures can be included in automated workflows.The energy errors discussed so far are, of course, given with respect to a DFTreference and therefore affected with some unknown uncertainty. We stress, however,that accurate quantum chemical methods, such as coupled cluster approaches [87,114] with sufficiently high excitation rank and decent one-particle bases combined withbasis-set exploration or explicit correlation factors, can be applied to obtain moreaccurate energies. Finally, we point out that it may be beneficial to extend ourQM region selection process by explicitly adding energies differences of two or morestructures to our descriptor. Hence, we designed our implementation in a modularfashion to allow for such extensions easily. At the example of solvated insulin, we demonstrate that it is possible to obtain reli-able reference forces even for large systems for which one cannot routinely perform afull QM calculation on the whole system. As described in section 3.1, the structuresof the previous example after solvation with water are applied for this purpose.For this solvated insulin structure, we do not want to generate a variety of differentQM regions that is as large in number as the one in section 3.2, because (i) thisincreases the total number of candidate models that needs to be tested (whichwill not be practical in a routine QM/SFAM application) due to the larger totalsystem size, and (ii) the first example already demonstrated that very asphericalQM regions (e.g., large QM regions without the nearby SH . group) do not provideaccurate results. Hence, we set a larger probability of p = 0 . r while increasing this value in steps of 0.1 ˚A starting at 5.5 ˚A and terminating at11 ˚A, yielding 11000 models in total. After the deduplication process, 673 uniqueQM/SFAM models were created. The smallest QM region comprised 68 atoms, thelargest 410 atoms. The obtained distribution of QM region sizes is provided in theSupporting Information.As described in section 2.7, we take the mean of N ref models with very large QMregions to obtain the reference forces. In this case, we assign all models with QM23egions of more than 390 atoms to this set, resulting in N ref = 18. For the assessmentof the models, the atomic forces of all non-hydrogen atoms within 4.0 ˚A of the centralcarbon atom were considered ( N repr = 11). The analysis was performed on the initial(reactant) structure of the esterification reaction and the results are presented inFig. 8.Choosing a larger radius r and cutting probability p to generate the models witha larger QM region (in contrast to the setting in section 3.2), resulted in the obser-vation that all QM/SFAM models with more than 100 QM atoms include the thiolgroup SH . into the QM region. We also observed that generating the QM regions ina systematic way leads to the continuous increase of accuracy with growing QM re-gion size. The models taken as the reference (red points in Fig. 8) do not show largefluctuations in accuracy. A mean deviation ε m mean of 0.05 kcal mol − bohr − for thisset of models was obtained and the maximum deviation ε m max was 0.1 kcal mol − bohr − .From this we can deduce that the functional groups that were present in some ofthe QM regions of these models, but not in all of them, do not have a significanteffect on the forces. Hence, we can reliably apply the reference values obtained bythis strategy. We stress that this approach is unavoidably limited by the maximumQM region size that is still computationally feasible. Therefore, sampling of sev-eral QM/SFAM models with QM regions of such size is crucial in order to obtain areliable reference. 24 M e a n e rr o r o n f o r c e s / k c a l m o l b o h r with thiol groupwithout thiol groupreference models Figure 8:
Mean error ε m mean on the atomic forces of all non-hydrogen atoms within 4.0 ˚A ofthe central carbon atom for the generated QM/SFAM models for the initial structure ofsolvated insulin. Each point corresponds to a different QM/SFAM model. The pointsare colored such that for the orange ones the thiol group SH . is part of the QM region,while for the blue points it is part of the environment. The red points correspond to themodels for which the mean of the forces was taken as the reference (QM regions of atleast 390 atoms). With the fragmentation settings applied here, which differ from thoseof unsolvated insulin, all models with more than 100 QM atoms include the thiol groupinto the QM region. To select a QM region from these data, we consider the models with the smallesterror on the forces for a given range of QM region sizes (constrained by the typeof calculation and the available computational resources). To pick an example, weselect a model with a QM region of 125 ±
10 atoms, which may be consideredcomputationally feasible and of reasonable size for a variety of applications. Withinthe data presented in Fig. 8, this requirement is satisfied by 38 of the generatedmodels. Note that one would typically generate only those models fulfilling thedesired size requirement and the number of candidates can be increased if deemednecessary and feasible (by varying r and p ).We apply a tolerance of 0.05 kcal mol − bohr − (based on the mean deviation in thereference models), which yields seven models with highest accuracy of the forces asthe remaining candidates. We stress that our approach is not able to discriminatebetween these models reliably, as the differences in performance of these models aresmaller than the mean deviation in the reference models. An option to overcome thisissue is to perform additional reference calculations on other structures on the PES25nd deploy these reference data to determine the optimal QM region. Calculatingdata for more than one point on the PES also facilitates the use of energy differencesas a selection criterion.A simpler alternative is to base the final selection on heuristic rules. We chose toapply the following: (i) models with fewer cuts at covalent bonds (i.e., a smallernumber of link atoms m link ) are always preferred within this pre-selection and (ii)models with symmetry scores m sym (see Eq. (23)) that are 50% larger than theminimal m sym are discarded to prevent an unphysical QM region to be selected dueto error compensation. In our current implementation, we employ these rules toguide the final selection, which represents a systematic and reproducible process.However, we plan to extend this process in future work, as outlined above, basedon additional reference data generation to obtain a final selection based purely onfirst-principles data.Figure 9: Molecular structure of the QM region of the automatically selected QM/SFAMmodel. As an example, we limited the acceptable size of this QM region to 125 ± Applying this selection strategy, a QM/SFAM model with a QM region of 127 atoms(excluding 3 link atoms) was selected in our example. The molecular structure ofthis QM region is depicted in Fig. 9. The symmetry score of this QM region is m sym = 1 .
24 and the reaction energy error with this model was 1.96 kcal mol − .Finally, we emphasize that the construction of this QM/SFAM model as well as ofits selection over the other model candidates is fully automated in our implementa-tion. This includes the reference calculation management, which is automatized andparallelized in the same way as has been implemented for the SFAM parametrizationprocess. We write the information about all the necessary reference calculations intoa MongoDB database, [131] which is subsequently processed by n instances of anotherprogram carrying out the calculations and storing the results back into the database.With this set-up, n -fold parallelization of the data generation is enabled and there-26ore many QM/SFAM model candidates can be considered and tested efficiently.This is particularly important because most of the computing time needed by thequantum region selection algorithm can be attributed to the reference calculations.In our set-up, each of the reference calculations (QM/SFAM models with very largequantum regions) was completed in less than one hour applying 8 cores per QMcalculation. In this work, we reported a new QM/MM hybrid model for atomistic simulationswhich features, a system-focused force field for minimized errors. We developeda fully automated set-up of this QM/SFAM model, which by construction (i.e.,by virtue of the salient features of SFAM) is not plagued with typical limitationsof standard force fields (such as missing parameters for specific metal atoms inrelevant valence states). However, if required, the methodology reported can becombined with such a standard force field (we implemented the general AMBERforce field (GAFF) [132] ). Our implementation will be available within the open-source SCINE platform. [129]
As a result, the cumbersome manual set-up of QM/MM models has been deci-sively alleviated, up to the point where it can be driven in a fully automated way,which opens up new avenues for QM/MM approaches; e.g., (i) in interactive ap-proaches, [101,102] where operator-defined abrupt changes of focus occur, (ii) in situa-tions of quickly changing reactive sites because of highly mobile or volatile reactants,or (iii) in studies of complex chemical systems with varying environments such asenzymes generated by high-throughput directed evolution.If, during a molecular exploration, new covalent bonds are formed and then shiftedto the MM region (e.g., because the QM region is moved to a different local regionof the full structure), molecular-mechanics parameters may be missing for this newchemical environment in the classical region. However, the SFAM ansatz allows ourimplementation to quickly re-parametrize this new local situation with only minimalcomputational effort. Furthermore, our implementation is flexible enough to allowfor two or more (unconnected) QM regions in the model.Our automated model construction process also allows for the generation and appli-cation of several models with differently sized QM regions in parallel. This enablesus to estimate and control the uncertainty of the model constantly, even in fullyautomated exploration set-ups. [99,100]
As was demonstrated in section 3.2, this willbe of great importance when calculating physical quantities (e.g., reaction energies)that are not directly related to the atomic forces on which we based our modelselection criterion. However, the modular nature of our implementation allows forextending the selection criteria to include additional quantities if necessary.27 cknowledgments
C. B. gratefully acknowledges support by a Kekul´e Ph.D. fellowship of the Fondsder Chemischen Industrie. The authors thank the Schweizerischer Nationalfonds forgenerous support (Projects 200021 182400 to M. R. and 200021 172950-1 (C. B.) toPD Dr. Thomas Hofstetter).
References [1] Senn, H. M.; Thiel, W. QM/MM Methods for Biological Systems.
Top. Curr.Chem. , , 173–290.[2] Groenhof, G. Biomolecular Simulations ; Methods in Molecular Biology; Hu-mana Press, Totowa, NJ, 2013; pp 43–66.[3] Riniker, S. Fixed-charge atomistic force fields for molecular dynamics simula-tions in the condensed phase: An overview.
J. Chem. Inf. Model. , ,565–578.[4] Warshel, A.; Levitt, M. Theoretical studies of enzymic reactions: Dielectric,electrostatic and steric stabilization of the carbonium ion in the reaction oflysozyme. J. Mol. Biol. , , 227–249.[5] Thiel, W.; Hummer, G. Nobel 2013 Chemistry: Methods for computationalchemistry. Nature , , 96.[6] Brunken, C.; Reiher, M. Self-Parametrizing System-Focused Atomistic Mod-els. J. Chem. Theory Comput. , , 1646–1665.[7] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Big data meetsquantum chemistry approximations: The ∆-machine learning approach. J.Chem. Theory Comput. , , 2087–2096.[8] Karelina, M.; Kulik, H. J. Systematic quantum mechanical region determina-tion in QM/MM simulation. J. Chem. Theory Comput. , , 563–576.[9] Sumowski, C. V.; Ochsenfeld, C. A convergence study of QM/MM isomeriza-tion energies with the selected size of the QM region for peptidic systems. J.Phys. Chem. A , , 11734–11741.[10] Flaig, D.; Beer, M.; Ochsenfeld, C. Convergence of electronic structure withthe size of the QM region: example of QM/MM NMR shieldings. J. Chem.Theory Comput. , , 2260–2271.[11] Roßbach, S.; Ochsenfeld, C. Influence of Coupling and Embedding Schemes onQM Size Convergence in QM/MM Approaches for the Example of a ProtonTransfer in DNA. J. Chem. Theory Comput. , , 1102–1107.2812] Kulik, H. J.; Zhang, J.; Klinman, J. P.; Martinez, T. J. How large shouldthe QM region be in QM/MM calculations? The case of catechol O-methyltransferase. J. Phys. Chem. B , , 11381–11394.[13] Hu, L.; S¨oderhjelm, P.; Ryde, U. On the convergence of QM/MM energies. J.Chem. Theory Comput. , , 761–777.[14] Liao, R.-Z.; Thiel, W. Convergence in the QM-only and QM/MM modeling ofenzymatic reactions: A case study for acetylene hydratase. J. Comput. Chem. , , 2389–2397.[15] Retegan, M.; Neese, F.; Pantazis, D. A. Convergence of QM/MM and clustermodels for the spectroscopic properties of the oxygen-evolving complex inphotosystem II. J. Chem. Theory Comput. , , 3832–3842.[16] Grimme, S. A General Quantum Mechanically Derived Force Field (QMDFF)for Molecules and Condensed Phase Simulations. J. Chem. Theory Comput. , , 4497–4514.[17] Vanduyfhuys, L.; Vandenbrande, S.; Verstraelen, T.; Schmid, R.; Waro-quier, M.; Van Speybroeck, V. QuickFF: A program for a quick and easyderivation of force fields for metal-organic frameworks from ab initio input. J.Comput. Chem. , , 1015–1027.[18] Wang, R.; Ozhgibesov, M.; Hirao, H. Partial hessian fitting for determiningforce constant parameters in molecular mechanics. J. Comput. Chem. , , 2349–2359.[19] Wang, R.; Ozhgibesov, M.; Hirao, H. Analytical hessian fitting schemes forefficient determination of force-constant parameters in molecular mechanics. J. Comput. Chem. , , 307–318.[20] Li, Z.; Kermode, J. R.; De Vita, A. Molecular dynamics with on-the-fly ma-chine learning of quantum-mechanical forces. Phys. Rev. Lett. , ,096405.[21] Chmiela, S.; Tkatchenko, A.; Sauceda, H. E.; Poltavsky, I.; Sch¨utt, K. T.;M¨uller, K.-R. Machine learning of accurate energy-conserving molecular forcefields. Sci. Adv. , , e1603015.[22] Glielmo, A.; Sollich, P.; De Vita, A. Accurate interatomic force fields viamachine learning with covariant kernels. Phys. Rev. B , , 214302.[23] Chmiela, S.; Sauceda, H. E.; M¨uller, K.-R.; Tkatchenko, A. Towards exactmolecular dynamics simulations with machine-learned force fields. Nat. Com-mun. , , 3887.[24] Amabilino, S.; Bratholm, L. A.; Bennie, S. J.; O’Connor, M. B.;Glowacki, D. R. Training atomic neural networks using fragment-based datagenerated in virtual reality. arXiv preprint arXiv:2007.02824 ,2925] Zhang, Y.-J.; Khorshidi, A.; Kastlunger, G.; Peterson, A. A. The potential formachine learning in hybrid QM/MM calculations. J. Chem. Phys. , ,241740.[26] B¨oselt, L.; Th¨urlemann, M.; Riniker, S. Machine Learning in QM/MM Molec-ular Dynamics Simulations of Condensed-Phase Systems. , arXiv preprintarXiv:2010.11610.[27] Johnson, E. R.; Mackie, I. D.; DiLabio, G. A. Dispersion interactions indensity-functional theory. J. Phys. Org. Chem. , , 1127–1135.[28] Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate abinitio parametrization of density functional dispersion correction (DFT-D) forthe 94 elements H-Pu. J. Chem. Phys. , , 154104.[29] Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the damping function in disper-sion corrected density functional theory. J. Comput. Chem. , , 1456–1465.[30] Grimme, S. Density functional theory with London dispersion corrections. WIREs Comput. Mol. Sci. , , 211–228.[31] Field, M. J.; Bash, P. A.; Karplus, M. A combined quantum mechanical andmolecular mechanical potential for molecular dynamics simulations. J. Com-put. Chem. , , 700–733.[32] Singh, U. C.; Kollman, P. A. A combined ab initio quantum mechanical andmolecular mechanical method for carrying out simulations on complex molecu-lar systems: Applications to the CH3Cl+ Cl- exchange reaction and gas phaseprotonation of polyethers. J. Comput. Chem. , , 718–730.[33] Maseras, F.; Morokuma, K. IMOMM: A new integrated ab initio+ molec-ular mechanics geometry optimization scheme of equilibrium structures andtransition states. J. Comput. Chem. , , 1170–1179.[34] Eichler, U.; K¨olmel, C. M.; Sauer, J. Combining ab initio techniques with an-alytical potential functions for structure predictions of large systems: Methodand application to crystalline silica polymorphs. J. Comput. Chem. , ,463–477.[35] Antes, I.; Thiel, W. Combined Quantum Mechanical and Molecular MechanicalMethods ; 1998; Chapter 4, pp 50–65.[36] Das, D.; Eurenius, K. P.; Billings, E. M.; Sherwood, P.; Chatfield, D. C.;Hodoˇsˇcek, M.; Brooks, B. R. Optimization of quantum mechanical molecularmechanical partitioning schemes: Gaussian delocalization of molecular me-chanical charges and the double link atom method.
J. Chem. Phys. , , 10534–10547. 3037] Swart, M. AddRemove: A new link model for use in QM/MM studies. Int. J.Quantum Chem. , , 177–183.[38] Th´ery, V.; Rinaldi, D.; Rivail, J.-L.; Maigret, B.; Ferenczy, G. G. Quantummechanical computations on very large molecular systems: The local self-consistent field method. J. Comput. Chem. , , 269–282.[39] Monard, G.; Loos, M.; Th´ery, V.; Baka, K.; Rivail, J.-L. Hybrid classicalquantum force field for modeling very large molecules. Int. J. Quantum Chem. , , 153–159.[40] Assfeld, X.; Rivail, J.-L. Quantum chemical computations on parts of largemolecules: the ab initio local self consistent field method. Chem. Phys. Lett. , , 100–106.[41] Ferr´e, N.; Assfeld, X.; Rivail, J.-L. Specific force field parameters determina-tion for the hybrid ab initio QM/MM LSCF method. J. Comput. Chem. , , 610–624.[42] Gao, J.; Amara, P.; Alhambra, C.; Field, M. J. A generalized hybrid orbital(GHO) method for the treatment of boundary atoms in combined QM/MMcalculations. J. Phys. Chem. A , , 4714–4721.[43] Amara, P.; Field, M. J.; Alhambra, C.; Gao, J. The generalized hybrid orbitalmethod for combined quantum mechanical/molecular mechanical calculations:formulation and tests of the analytical derivatives. Theor. Chem. Acc. , , 336–343.[44] Garcia-Viloca, M.; Gao, J. Generalized hybrid orbital for the treatment ofboundary atoms in combined quantum mechanical and molecular mechanicalcalculations using the semiempirical parameterized model 3 method. Theor.Chem. Acc. , , 280–286.[45] Huzinaga, S.; Cantu, A. Theory of separability of many-electron systems. J.Chem. Phys. , , 5543–5549.[46] Manby, F. R.; Stella, M.; Goodpaster, J. D.; Miller, T. F. A Simple, Ex-act Density-Functional-Theory Embedding Scheme. J. Chem. Theory Comput. , , 2564–2568.[47] Fornace, M. E.; Lee, J.; Miyamoto, K.; Manby, F. R.; Miller III, T. F. Em-bedded mean-field theory. J. Chem. Theory Comput. , , 568–580.[48] H´egely, B.; Nagy, P. R.; Ferenczy, G. G.; K´allay, M. Exact density functionaland wave function embedding schemes based on orbital localization. J. Chem.Phys. , , 064107.[49] Lee, S. J.; Welborn, M.; Manby, F. R.; Miller III, T. F. Projection-BasedWavefunction-in-DFT Embedding. Acc. Chem. Res. , , 1359–1368.3150] Wesolowski, T. A.; Warshel, A. Frozen density functional approach for abinitio calculations of solvated molecules. J. Phys. Chem. , , 8050–8053.[51] Neugebauer, J.; Louwerse, M. J.; Baerends, E. J.; Wesolowski, T. A. Themerits of the frozen-density embedding scheme to model solvatochromic shifts. J. Chem. Phys. , , 094115.[52] Neugebauer, J.; Jacob, C. R.; Wesolowski, T. A.; Baerends, E. J. An ExplicitQuantum Chemical Method for Modeling Large Solvation Shells Applied toAminocoumarin C151. J. Phys. Chem. A , , 7805–7814.[53] Neugebauer, J.; Louwerse, M. J.; Belanzoni, P.; Wesolowski, T. A.;Baerends, E. J. Modeling solvent effects on electron-spin-resonance hyperfinecouplings by frozen-density embedding. J. Chem. Phys. , , 114101.[54] Neugebauer, J.; Baerends, E. J. Exploring the Ability of Frozen-Density Em-bedding to Model Induced Circular Dichroism. J. Phys. Chem. A , ,8786–8796.[55] Jacob, C. R.; Neugebauer, J.; Jensen, L.; Visscher, L. Comparison of frozen-density embedding and discrete reaction field solvent models for molecularproperties. Phys. Chem. Chem. Phys. , , 2349–2359.[56] Weso(cid:32)lowski, T. A. Embedding a multideterminantal wave function in anorbital-free environment. Phys. Rev. A , , 012504.[57] Jacob, C. R.; Neugebauer, J.; Visscher, L. A flexible implementation of frozen-density embedding for use in multilevel simulations. J. Comput. Chem. , , 1011–1018.[58] Pernal, K.; Wesolowski, T. A. Orbital-free effective embedding potential:Density-matrix functional theory case. Int. J. Quantum Chem. , ,2520–2525.[59] Fux, S.; Jacob, C. R.; Neugebauer, J.; Visscher, L.; Reiher, M. Accurate frozen-density embedding potentials as a first step towards a subsystem descriptionof covalent bonds. J. Chem. Phys. , , 164101.[60] Jacob, C. R.; Neugebauer, J. Subsystem density-functional theory. WIREsComput. Mol. Sci. , , 325–362.[61] Wesolowski, T. A.; Shedge, S.; Zhou, X. Frozen-density embedding strategyfor multilevel simulations of electronic structure. Chem. Rev. , , 5891–5928.[62] M¨uhlbach, A. H.; Reiher, M. Quantum system partitioning at the single-particle level. J. Chem. Phys. , , 184104.[63] Huheey, J. E.; Keiter, E. A.; Keiter, R. L.; Medhi, O. K. Inorganic chemistry:principles of structure and reactivity ; Pearson Education India, 2006.3264] Greenwood, N. N.; Earnshaw, A.
Chemistry of the Elements ; Elsevier, 2012.[65] Walker, R. C.; Crowley, M. F.; Case, D. A. The implementation of a fast andaccurate QM/MM potential method in Amber.
J. Comput. Chem. , ,1019–1031.[66] Eurenius, K. P.; Chatfield, D. C.; Brooks, B. R.; Hodoscek, M. Enzyme mech-anisms with hybrid quantum and molecular mechanical potentials. I. Theoret-ical considerations. Int. J. Quantum Chem. , , 1189–1200.[67] Ryde, U. The coordination of the catalytic zinc ion in alcohol dehydrogenasestudied by combined quantum-chemical and molecular mechanics calculations. J. Comput.-Aided Mol. Des. , , 153–164.[68] Sherwood, P.; de Vries, A. H.; Collins, S. J.; Greatbanks, S. P.; Burton, N. A.;Vincent, M. A.; Hillier, I. H. Computer simulation of zeolite structure andreactivity using embedded cluster methods. Faraday Discuss. , , 79–92.[69] De Vries, A. H.; Sherwood, P.; Collins, S. J.; Rigby, A. M.; Rigutto, M.;Kramer, G. J. Zeolite structure and reactivity by combined quantum-chemical-classical calculations. J. Phys. Chem. B , , 6133–6141.[70] Sherwood, P.; de Vries, A. H.; Guest, M. F.; Schreckenbach, G.; Catlow, C. A.;French, S. A.; Sokol, A. A.; Bromley, S. T.; Thiel, W.; Turner, A. J.; Bil-leter, S.; Terstegen, F.; Thiel, S.; Kendrick, J.; Rogers, S. C.; Casci, J.; Wat-son, M.; King, F.; Karlsen, E.; Sjøvoll, M.; Fahmi, A.; Sch¨afer, A.; Lennartz, C.QUASI: A general purpose implementation of the QM/MM approach and itsapplication to problems in catalysis. J. Mol. Struct.: THEOCHEM , ,1–28.[71] K¨onig, P.; Hoffmann, M.; Frauenheim, T.; Cui, Q. A critical evaluation ofdifferent QM/MM frontier treatments with SCC-DFTB as the QM method. J. Phys. Chem. B , , 9082–9095.[72] Lin, H.; Truhlar, D. G. Redistributed charge and dipole schemes for combinedquantum mechanical and molecular mechanical calculations. J. Phys. Chem.A , , 3991–4004.[73] Eichinger, M.; Tavan, P.; Hutter, J.; Parrinello, M. A hybrid method for solutesin complex solvents: Density functional theory combined with empirical forcefields. J. Chem. Phys. , , 10452–10467.[74] Amara, P.; Field, M. J. Evaluation of an ab initio quantum mechani-cal/molecular mechanical hybrid-potential link-atom method. Theor. Chem.Acc. , , 43–52. 3375] Hall, R. J.; Hindle, S. A.; Burton, N. A.; Hillier, I. H. Aspects of hybridQM/MM calculations: the treatment of the QM/MM interface region andgeometry optimization with an application to chorismate mutase. J. Comput.Chem. , , 1433–1441.[76] Vreven, T.; Morokuma, K.; Farkas, ¨O.; Schlegel, H. B.; Frisch, M. J. Geom-etry optimization with QM/MM, ONIOM, and other combined methods. I.Microiterations and constraints. J. Comput. Chem. , , 760–769.[77] Vreven, T.; Morokuma, K. Hybrid Methods: ONIOM(QM:MM) andQM/MM. Annu. Rep. Comput. Chem. , , 35–51.[78] K¨astner, J.; Thiel, S.; Senn, H. M.; Sherwood, P.; Thiel, W. ExploitingQM/MM capabilities in geometry optimization: A microiterative approachusing electrostatic embedding. J. Chem. Theory Comput. , , 1064–1072.[79] Melaccio, F.; Olivucci, M.; Lindh, R.; Ferr´e, N. Unique QM/MM potential en-ergy surface exploration using microiterations. Int. J. Quantum Chem. , , 3339–3346.[80] Metz, S.; K¨astner, J.; Sokol, A. A.; Keal, T. W.; Sherwood, P. Chem Shell—amodular software package for QM/MM simulations. Wiley Interdiscip. Rev.:Comput. Mol. Sci. , , 101–110.[81] Caprasecca, S.; Jurinovich, S.; Viani, L.; Curutchet, C.; Mennucci, B. Geom-etry optimization in polarizable QM/MM models: the induced dipole formu-lation. J. Chem. Theory Comput. [82] Bosia, F.; Brunken, C.; Grimmel, S. A.; Haag, M. P.; Heuer, M. A.;Simm, G. N.; Sobez, J.-G.; Steiner, M.; Unsleber, J. P.; Vaucher, A. C.;Weymuth, T.; Reiher, M. SCINE Utilities: Release 2.0.0. 2020; https://doi.org/10.5281/zenodo.3828692 .[83] Bryan, J. G.; Wadsworth, G. P.
Introduction to probability and random vari-ables ; New York: McGraw-Hill, 1960.[84] Feller, W.
An introduction to probability theory and its applications ; JohnWiley & Sons, 2008; Vol. 2.[85] Weymuth, T.; Proppe, J.; Reiher, M. Statistical Analysis of SemiclassicalDispersion Corrections.
J. Chem. Theory Comput. , , 2480–2494.[86] Proppe, J.; Gugler, S.; Reiher, M. Gaussian Process-Based Refinement of Dis-persion Corrections. J. Chem. Theory Comput. , , 6046–6060.[87] Claeyssens, F.; Harvey, J. N.; Manby, F. R.; Mata, R. A.; Mulholland, A. J.;Ranaghan, K. E.; Sch¨utz, M.; Thiel, S.; Thiel, W.; Werner, H.-J. High-Accuracy Computation of Reaction Barriers in Enzymes. Angew. Chem. Int.Ed. , , 6856–6859. 3488] Walker, R. C.; Mercer, I. P.; Gould, I. R.; Klug, D. R. Comparison of basisset effects and the performance of ab initio and DFT methods for probingequilibrium fluctuations. J. Comput. Chem. , , 478–490.[89] Mart´ı, M. A.; Capece, L.; Bikiel, D. E.; Falcone, B.; Estrin, D. A. Oxygen affin-ity controlled by dynamical distal conformations: the soybean leghemoglobinand the Paramecium caudatum hemoglobin cases. Proteins: Struct., Funct.,Bioinf. , , 480–487.[90] Mata, R. A.; Werner, H.-J.; Thiel, S.; Thiel, W. Toward accurate barriers forenzymatic reactions: QM/MM case study on p-hydroxybenzoate hydroxylase. J. Chem. Phys. , , 01B610.[91] Liao, R.-Z.; Thiel, W. Comparison of QM-only and QM/MM models for themechanism of tungsten-dependent acetylene hydratase. J. Chem. Theory Com-put. , , 3793–3803.[92] Polyak, I.; Reetz, M. T.; Thiel, W. Quantum mechanical/molecular mechanicalstudy on the mechanism of the enzymatic Baeyer–Villiger reaction. J. Am.Chem. Soc. , , 2732–2741.[93] Berraud-Pache, R.; Navizet, I. QM/MM calculations on a newly synthesisedoxyluciferin substrate: new insights into the conformational effect. Phys.Chem. Chem. Phys. , , 27460–27467.[94] Bathelt, C. M.; Zurek, J.; Mulholland, A. J.; Harvey, J. N. Electronic struc-ture of compound I in human isoforms of cytochrome P450 from QM/MMmodeling. J. Am. Chem. Soc. , , 12900–12908.[95] Geronimo, I.; Paneth, P. A DFT and ONIOM study of C–H hydroxylationcatalyzed by nitrobenzene 1, 2-dioxygenase. Phys. Chem. Chem. Phys. , , 13889–13899.[96] Cooper, A. M.; K¨astner, J. Averaging Techniques for Reaction Barriers inQM/MM Simulations. ChemPhysChem , , 3264–3269.[97] Finkelmann, A. R.; Senn, H. M.; Reiher, M. Hydrogen-activation mechanismof [Fe] hydrogenase revealed by multi-scale modeling. Chem. Sci. , ,4474–4482.[98] Ke, Z.; Abe, S.; Ueno, T.; Morokuma, K. Catalytic Mechanism in Artifi-cial Metalloenzyme: QM/MM Study of Phenylacetylene Polymerization byRhodium Complex Encapsulated in apo-Ferritin. J. Am. Chem. Soc. , , 15418–15429.[99] Simm, G. N.; Vaucher, A. C.; Reiher, M. Exploration of Reaction Pathwaysand Chemical Transformation Networks. J. Phys. Chem. A , , 385–399. 35100] Unsleber, J. P.; Reiher, M. The Exploration of Chemical Reaction Networks. Annu. Rev. Phys. Chem. , , 121–142.[101] Haag, M. P.; Vaucher, A. C.; Bosson, M.; Redon, S.; Reiher, M. Interactivechemical reactivity exploration. ChemPhysChem , , 3301–3319.[102] Vaucher, A. C.; Haag, M. P.; Reiher, M. Real-time feedback from iterativeelectronic structure calculations. J. Comput. Chem. , , 805–812.[103] Grimme, S. Semiempirical GGA-type density functional constructed with along-range dispersion correction. J. Comput. Chem. , , 1787–1799.[104] Mayer, I. Charge, bond order and valence in the AB initio SCF theory. Chem.Phys. Lett. , , 270–274.[105] Mayer, I. On bond orders and valences in the ab initio quantum chemicaltheory. Int. J. Quantum Chem. , , 73–84.[106] Heyden, A.; Lin, H.; Truhlar, D. G. Adaptive partitioning in combined quan-tum mechanical and molecular mechanical calculations of potential energyfunctions for multiscale simulations. J. Phys. Chem. B , , 2231–2241.[107] Bulo, R. E.; Ensing, B.; Sikkema, J.; Visscher, L. Toward a practical methodfor adaptive QM/MM simulations. J. Chem. Theory Comput. , , 2212–2221.[108] Mones, L.; Jones, A.; G¨otz, A. W.; Laino, T.; Walker, R. C.; Leimkuhler, B.;Cs´anyi, G.; Bernstein, N. The adaptive buffered force QM/MM method in theCP2K and AMBER software packages. J. Comput. Chem. , , 633–648.[109] Zheng, M.; Waller, M. P. Adaptive quantum mechanics/molecular mechanicsmethods. WIREs Comput. Mol. Sci. , , 369–385.[110] Duster, A. W.; Wang, C.-H.; Garza, C. M.; Miller, D. E.; Lin, H. Adaptivequantum/molecular mechanics: What have we learned, where are we, andwhere do we go from here? WIREs Comput. Mol. Sci. , , e1310.[111] Neese, F.; Wennmohs, F.; Hansen, A. Efficient and accurate local approxi-mations to coupled-electron pair approaches: An attempt to revive the pairnatural orbital method. J. Chem. Phys. , , 114108.[112] Neese, F.; Hansen, A.; Liakos, D. G. Efficient and accurate approximations tothe local coupled cluster singles doubles method using a truncated pair naturalorbital basis. J. Chem. Phys. , , 064103.[113] Hansen, A.; Liakos, D. G.; Neese, F. Efficient and accurate local single refer-ence correlation methods for high-spin open-shell molecules using pair naturalorbitals. J. Chem. Phys. , , 214102.36114] Bistoni, G.; Polyak, I.; Sparta, M.; Thiel, W.; Neese, F. Toward accurateQM/MM reaction barriers with large QM regions using domain based pairnatural orbital coupled cluster theory. J. Chem. Theory Comput. , ,3524–3531.[115] Proppe, J.; Reiher, M. Mechanism deduction from noisy chemical reactionnetworks. J. Chem. Theory Comput. , , 357–370.[116] Neese, F. The ORCA program system. WIREs Comput. Mol. Sci. , ,73–78.[117] Neese, F. Software update: the ORCA program system, version 4.0. WIREsComput. Mol. Sci. , , e1327.[118] Sonksen, P.; Sonksen, J. Insulin: understanding its action in health and dis-ease. Br. J. Anaesth. , , 69–79.[119] Bernstein, F. C.; Koetzle, T. F.; Williams, G. J.; Meyer Jr, E. F.; Brice, M. D.;Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein DataBank: a computer-based archival file for macromolecular structures. J. Mol.Biol. , , 535–542.[120] Stewart, J. J. Optimization of parameters for semiempirical methods I.Method. J. Comput. Chem. , , 209–220.[121] Stewart, J. J. Optimization of parameters for semiempirical methods II. Ap-plications. J. Comput. Chem. , , 221–264.[122] Whitten, J. L. Coulombic potential energy integrals and approximations. J.Chem. Phys. , , 4496–4501.[123] Dunlap, B. I.; Connolly, J.; Sabin, J. On some approximations in applicationsof X α theory. J. Chem. Phys. , , 3396–3402.[124] Vahtras, O.; Alml¨of, J.; Feyereisen, M. Integral approximations for LCAO-SCFcalculations. Chem. Phys. Lett. , , 514–518.[125] Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient ApproximationMade Simple. Phys. Rev. Lett. , , 3865–3868.[126] Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta va-lence and quadruple zeta valence quality for H to Rn: Design and assessmentof accuracy. Phys. Chem. Chem. Phys. , , 3297–3305.[127] Hirshfeld, F. L. Bonded-atom fragments for describing molecular charge den-sities. Theor. Chim. Acta , , 129–138.[128] Marenich, A. V.; Jerome, S. V.; Cramer, C. J.; Truhlar, D. G. Charge Model 5:An Extension of Hirshfeld Population Analysis for the Accurate Descriptionof Molecular Interactions in Gaseous and Condensed Phases. J. Chem. TheoryComput. , , 527–541. 37129] SCINE − Software for Chemical Interaction Networks. http://scine.ethz.ch/ , visited on 2020-07-07.[130] te Velde, G.; Bickelhaupt, F. M.; Baerends, E. J.; Fonseca Guerra, C.; van Gis-bergen, S. J. A.; Snijders, J. G.; Ziegler, T. Chemistry with ADF.
J. Comput.Chem. , , 931–967.[131] MongoDB Inc., MongoDB 3.2. , visited on 2020-08-08.[132] Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Develop-ment and testing of a general amber force field. J. Comput. Chem. ,25