[PDF] Efficient Formulation of Polarizable Gaussian Multipole Electrostatics for Biomolecular Simulations

Abstract

Molecular dynamics simulations of biomolecules have been widely adopted in biomedical studies. As classical point-charge models continue to be used in routine biomolecular applications, there have been growing demands on developing polarizable force fields for handling more complicated biomolecular processes. Here we focus on a recently proposed polarizable Gaussian Multipole (pGM) model for biomolecular simulations. A key benefit of pGM is its screening of all short-range electrostatic interactions in a physically consistent manner, which is critical for stable charge-fitting and is needed to reproduce molecular anisotropy. Another advantage of pGM is that each atom's multipoles are represented by a single Gaussian function or its derivatives, allowing for more efficient electrostatics than other Gaussian-based models. In this study we present an efficient formulation for the pGM model defined with respect to a local frame formed with a set of covalent basis vectors. The covalent basis vectors are chosen to be along each atom's covalent bonding directions. The new local frame allows molecular flexibility during molecular simulations and facilitates an efficient formulation of analytical electrostatic forces without explicit torque computation. Subsequent numerical tests show that analytical atomic forces agree excellently with numerical finite-difference forces for the tested system. Finally, the new pGM electrostatics algorithm is interfaced with the PME implementation in Amber for molecular simulations under the periodic boundary conditions. To validate the overall pGM/PME electrostatics, we conducted an NVE simulation for a small water box of 512 water molecules. Our results show that, to achieve energy conservation in the polarizable model, it is important to ensure enough accuracy on both PME and induction iteration.

Full PDF

1 Efficient Formulation of Polarizable Gaussian Multipole Electrostatics for Biomolecular Simulations

Haixin Wei † , Ruxi Qi † , Junmei Wang ‡ , Piotr Cieplak ^ , Yong Duan § , and Ray Luo †,* † Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California, Irvine, Irvine, California 92697, United States ‡ Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States ^ SBP Medical Discovery Institute, 10901 North Torrey Pines Road, La Jolla, California 92037, United States § UC Davis Genome Center and Department of Biomedical Engineering, University of California, Davis, One Shields Avenue, Davis, California 95616, United States Corresponding authors: [email protected]

Abstract

Molecular dynamics simulations of biomolecules have been widely adopted in biomedical studies. As classical point-charge models continue to be used in routine biomolecular applications, there have been growing demands on developing polarizable force fields for handling more complicated biomolecular processes. Here we focus on a recently proposed polarizable Gaussian Multipole (pGM) model for biomolecular simulations. A key benefit of pGM is its screening of all short-range electrostatic interactions in a physically consistent manner, which is critical for stable charge-fitting and is needed to reproduce molecular anisotropy. Another advantage of pGM is that each atom’s multipoles are represented by a single Gaussian function or its derivatives, allowing for more efficient electrostatics than other Gaussian-based models. In this study we present an efficient formulation for the pGM model defined with respect to a local frame formed with a set of covalent basis vectors. The covalent basis vectors are chosen to be along each atom’s covalent bonding directions. The new local frame allows molecular flexibility during molecular simulations and facilitates an efficient formulation of analytical electrostatic forces without explicit torque computation. Subsequent numerical tests show that analytical atomic forces agree excellently with numerical finite-difference forces for the tested system. Finally, the new pGM electrostatics algorithm is interfaced with the PME implementation in Amber for molecular simulations under the periodic boundary conditions. To validate the overall pGM/PME electrostatics, we conducted an NVE simulation for a small water box of 512 water molecules. Our results show that, to achieve energy conservation in the polarizable model, it is important to ensure enough accuracy on both PME and induction iteration. It is hoped that the reformulated pGM model will facilitate the development of future force fields based on the pGM electrostatics for applications in biomolecular systems and processes where polarization plays crucial roles.

1. Introduction

Atomistic simulations of biomolecules have been applied in a wide range of biological systems. While additive nonpolarizable models will continue to play important roles, nonadditive polarizable models are expected to extend our ability to study more complex biomolecular systems and processes. Nonpolarizable models typically use fixed atom-centered partial charges to model electrostatics and include polarization response to the environment (mostly in water) only in an averaged, mean-field manner. Subsequently, nonpolarizable models that provide excellent descriptions of the homogeneous bulk phase are poor models for gas-phase clusters or in nonpolar solvents. The importance of modeling nonadditive effects is well known. For example, the gas-phase water dimer interaction energy is overestimated by more than 30% in the TIP5P model. Similarly for large biomolecular systems, there are concerns that such models cannot correctly account for situations where the same nonpolarizable moiety is exposed to different electrostatic environments/solvents, either within a single large structure or during a simulation process. In addition, there is an inherent inconsistency in most nonpolarizable models related to their static inclusion of average bulk polarization within the potential. This results in internal energies and other properties that are derived against a gas-phase reference state, which is already “pre-polarized” for the liquid phase. These limitations lead to issues in modeling multiple important problems such as pH-dependent processes, ion-dependent interactions, order-disorder transition, enzyme reactions, and so on. In response to the above concerns, much effort has been invested on the inclusion of explicit polarization within the molecular mechanics potentials.

Several methods are available to explicitly model polarization in molecular simulations, such as the Drude oscillator,

10, 11 fluctuating charges, and induced dipoles.

6, 13, 14

The use of polarizable point dipoles is a classical approach with a long history in molecular simulation. The original induced dipole model of Applequist places induced point dipoles on atom centers. However, this model suffers from the so-called “polarization catastrophe”: when interaction between two mutually interacting induced dipoles with atomic polarizabilities diverge at a finite distance. Thole proposed a solution by applying a damping function to induced dipole – induced dipole interactions. However, a drawback to this model is that it does not prescribe how induced dipoles and permanent charges interact. A great deal of effort has been devoted to developing modern polarizable models, including the fluctuating charge models

18, 19 in the context of OPLS-AA, the fluctuating charge model and the Drude oscillator model in the context of CHARMM, and detailed multipole expansions and more complicated MM potentials in the context of Amoeba. In Amber, polarization was implemented with induced dipoles. In Amber ff12pol, the induced dipoles are calculated using Thole models to avoid “polarization catastrophe”.

Another limitation of widely adopted nonpolarizable models is their use of partial atomic charges in the electrostatic models, which often lack sufficient mathematical flexibility to describe the electrostatic potential around molecules. Williams showed that optimal least-squares fitting of atom-centered partial charges resulted in relative root-mean-square errors of 3-10% over a set of grid points in a shell outside the surface of a series of small polar molecules. These errors were reduced by 2-3 orders of magnitude via the use of higher atomic multipoles. In Amoeba force fields, multipoles are placed on each atom, allowing better capture of electrostatic potential distribution around molecules.

31, 32

Gaussian electrostatic model (GEM) is a force field based on density fitting, which can extend to arbitrary angular momentums (multipoles).

Of course there are many other proposals to model electronic polarization in the literature.

12, 36-38

Recently, Elking et al. proposed to a polarizable multipole model with Gaussian charge densities. A key benefit of the polarizable Gaussian Model (pGM) is its screening of all short-range electrostatic interactions in a physically consistent manner. This is critical for stable charge-fitting in polarizable force fields when the polarization of 1-2 and 1-3 charges are included and are needed to reproduce molecular anisotropy, as we discussed before. An advantage of pGM is that each atom’s multipoles are represented by a single Gaussian function and its derivatives with different amplitudes. Therefore, pGM is a minimalist Gaussian polarizable model. In comparison, the GEM model treats nuclear charges explicitly and uses Hermite Gaussian auxiliary basis sets to reproduce atomic electron density, so it has the potential to represent the short-range interactions more faithfully than the pGM model. However, because the computational cost of the nonbonded electrostatic calculation scales as the squared number of functions on each atom, the multiple functions used to represent each atom in GEM can notably increase the simulation cost. The increased number of parameters associated with the functions may also pose additional challenges in parameterization. Another major difference is that GEM, like several other efforts, such as X-Pol and Amoeba,

31, 32 uses electronic densities to model molecular polarization and other effects. In comparison, our pGM model follows the Amber tradition and uses the ab initio electrostatic potential (ESP) to fit the parameters of atomic partial charges and dipoles. Most macromolecular simulations with long-range electrostatic interactions are performed using periodic boundary conditions. A rigorous treatment of electrostatic interactions in periodic boundary conditions requires a careful treatment of the associated lattice sums. Thus, the widely used lattice sum methods, like particle mesh Ewald (PME), need to be extended to handle multipolar related summations. Fortunately, efficient implementations of PME of dipoles and higher multipoles are already available in widely used software package such as Amber.

42, 43

This greatly simplifies the integration of pGM with PME for molecular simulations. In the following sections we first describe the detailed pGM electrostatics scheme with a focus on how to define the atomic Gaussian multipoles and associated analytical algorithms for force computation. This is followed by algorithmic details of interfacing pGM and PME. We then present the validation of the analytical force formulation and accuracy discussion of pGM in PME simulations. Finally, we conclude the manuscript with a brief discussion of the next steps in our development.

2. Theory

The Gaussian multipole model represents the charge distribution on each atom as a Gaussian-shaped multipole expansion. So, an n th order Gaussian multipole with the radius of , located at position 𝑅%⃑ , is 𝜌 ()) +𝑟⃑; 𝑅%⃑. = 𝛩 ()) ∙ 𝛻 ( ) 𝑒𝑥𝑝 (−𝛽 = |𝑟⃑ − 𝑅%⃑| = ). (1) Here Θ ()) is the n th rank momentum tensor and ∇ ()) is the n th rank gradient operator (Appendix A.1). In our current polarizable Gaussian multipole (pGM) model, only monopoles and dipoles are retained, so only the first two terms are needed at each atom as shown below 𝜌 (B) +𝑟⃑; 𝑅%⃑. = 𝑞( ) exp (−𝛽 = |𝑟⃑ − 𝑅%⃑| = ) 𝜌 (G) +𝑟⃑; 𝑅%⃑. = 𝜇⃑ ∙ 𝛻 ( ) exp (−𝛽 = |𝑟⃑ − 𝑅%⃑| = ) (2) where the zeroth-order term represents a monopole, and the first-order term represents a dipole. Once the charge densities are defined as in Eqn (2), the pairwise Coulombic interaction energy expressions needed for the current pGM model are as follows (1) Monopole-Monopole: 𝑞 G 𝑞 = IJK (4 LM LM )3 LM (3) (2) Monopole-Dipole: 𝑞 G 𝜇⃑ = ∙ ∇ = IJK+4 LM LM .3 LM (4) (3) Dipole-Dipole: (𝜇⃑ G ∙ ∇ G )(𝜇⃑ = ∙ ∇ = ) IJK+4 LM LM .3 LM (5) where erf () is the error function and 𝛽 G= = L M N4 LM O4 MM and 𝑅 G= = |𝑅 G %%%%⃑ − 𝑅 = %%%%⃑| (6) Finally it is often convenient to introduce the dipole-dipole interaction tensor 𝑇%⃑%⃑ G= =∇ G ∇ = IJK+4 LM LM .3 LM so that Eqn (5) can be simplified as 𝜇⃑ G ∙ 𝑇%⃑%⃑ G= ∙ 𝜇⃑ = . Here it is worth pointing out an important convention used throughout this manuscript. All gradient operators paired with a dipole only operate on coordinates. For example, the gradient operator in 𝜇⃑ G ∙ ∇ G only operates on the atomic coordinates that follow. On the other hand, all other gradient operators that are not paired with a dipole are used in the normal sense. This convention is adopted throughout this manuscript. It can be shown that an effective potential and corresponding effective field at atomic center 𝑅 G %%%%⃑ can be defined as 𝜙 IKKIRSTUI = (𝑞 = + 𝜇⃑ = ∙ ∇ = ) IJK (4 LM LM )3 LM 𝐸 IKKIRSTUI = −(𝑞 = + 𝜇⃑ = ∙ ∇ = )∇ G XYZ+4 LM LM .3 LM (7) due to a charge distribution at atomic center 𝑅 = %%%%⃑, so that the pairwise Coulomb energies in Eqn. (3) through (5) can be reproduced when an effective point charge of 𝑞 G and an effective point dipole 𝜇⃑ G are placed at atomic center 𝑅 G %%%%⃑ . The use of effective potential and field simplifies the derivation of pairwise Coulombic force calculations as to be shown below. Note that these are different from the real Coulombic potential and field due to Gaussian charges and dipoles. For example, the real potential at any location 𝑅 G %%%%⃑ due to atom 2 at 𝑅 = %%%%⃑ is 𝜙 JI\] = (𝑞 = + 𝜇⃑ = ∙ ∇ = ) IJK (4 M LM )3 LM (8) In our current pGM model, interactions are modeled with both permanent and induced atomic multipoles at atomic centers, both of which are truncated at the dipole level. The framework can be easily extended to higher-order multipoles, if needed in future developments.

Permanent Multipoles

Permanent multipoles are the first part of the pGM model and are defined with respect to a local frame overlapped with an atom’s covalent bonds. This choice is based on the fact that atomic moments result from atomic covalent bonding interactions. This is also because covalent bonding interactions are along the stiffest degrees of freedom of a molecule. Thus our design follows the logic that induced moments are meant to be responsible for changes in molecular moments due to changes in soft degrees of freedom in molecular simulations. Of course, the partition between permanent and induced moments is somewhat artificial in a moment fitting procedure. Therefore we refer permanent multipoles in our pGM model as covalent multipoles in the following discussion. The zeroth-order covalent multipoles, i.e. covalent monopoles, are simply the atomic partial charges as in other polarizable or nonpolarizable force fields. The first-order multipoles, i.e. covalent dipoles, are expressed in linear combinations of certain basis vectors. We define the basis vectors to be along the bonding directions, or more precisely, covalent interaction directions. Thus, there may be more covalent interactions than the number of bonds needed to fully define all covalent dipoles on an atom. For example, hydrogen atoms in water are with covalent dipole moments not 100% along the H-O bonds, so virtual H-H bonds may be needed to define covalent dipoles more accurately. An illustration of basis vectors is shown in Figure 1 for O and H atoms of water, and we refer these as the covalent basis vectors (CBVs). The local frame formed by CBVs on an atom is termed its CBV frame. Another representative case is the alpha carbon atom in proteins, which has four bonds, so there can be four basis vectors in its CBV frame to define the covalent dipoles. (a) (b)

Figure 1.

Definition of covalent basis vectors for atoms in the water molecule. (a) For covalent dipoles centered at A (oxygen), the two basis vectors are 𝑒⃑ ^_ and 𝑒⃑ ‘_ , which are defined as unit vectors along its two O-H bonds. (b) For covalent dipoles centered at C (hydrogen), the two basis vectors are 𝑒⃑ _‘ , the unit vector along the H-O bond, and 𝑒⃑ ^‘ , the unit vector along the H-H virtual bond. The covalent dipoles centered on the other hydrogen atom B can be defined similarly. The CBV frame is also chosen for the sake of simplifying force calculations because the basis vectors are directly dependent on the positions of atoms. For example, the gradient of a covalent dipole vector used extensively in force calculations can be obtained easily within the CBV frame as ∇(𝑢%⃑) = ∇ b∑ 𝑢

T 3%⃑ d e3%⃑ d eT f = ∑ 𝑢 T b g⃑⃑e3%⃑ d e − d d e3%⃑ d e h f T , (9) where 𝑢%⃑ is the permanent dipole of an atom, 𝑅%⃑ T is the vector pointing from the atom to its i th bonded atom (including virtually), 𝐼⃑⃑ is the identity tensor, and the summation is over all covalent interactions of the atom. Even if quadrupoles are not used in the current pGM model, it is instructive to outline how they are defined in the CBV frame. Given the covalent basis vectors defined above, covalent basis tensors are constructed as dyadic tensors, with each of which formed as a dyadic product of two covalent basis vectors. For example in the case of oxygen atom with two covalent basis vectors A B C A C B ( 𝑒⃑ ^_ , 𝑒⃑ ‘_ ) in Figure 1a, there are up to four dyadic tensors ( 𝑒⃑ ^_ 𝑒⃑ ^_ , 𝑒⃑ ‘_ 𝑒⃑ ‘_ , 𝑒⃑ ^_ 𝑒⃑ ‘_ , 𝑒⃑ ‘_ 𝑒⃑ ^_ ) available to define its quadrupole. Induced Multipoles

Induced multipoles are the second part of the pGM model. Only first-order terms, i.e. induced dipoles, are used. The pGM polarization scheme can naturally avoid the well-known polarization catastrophe in point polarizable models without employing any artificial screening factors, because distributed dipole densities instead of point dipoles are induced at atomic centers. In the current pGM model, the linear polarization relation is retained as follows 𝑝⃑ T = 𝛼 T 𝐸%⃑

T IKKIRSTUI = 𝛼 T (𝐸%⃑ TRkU\]I)S,IKKIRSTUI − ∑ 𝑇%⃑%⃑ Tl ∙ 𝑝⃑ l lmT ) 𝐸%⃑

TRkU\]I)S,IKKIRSTUI = − ∑ (𝑞 l + 𝜇⃑ l lmT ∙ ∇ l )∇ T XYZ+4 dn dn .3 dn 𝑇%⃑%⃑ Tl = ∇ T ∇ l XYZ+4 dn dn .3 dn (10) where 𝑝⃑ T is the induced dipole and 𝛼 T is the polarizability coefficient of atom i , 𝐸%⃑

T IKKIRSTUI is the total effective electric field at atom i , which contains two parts, (1) the effective field of covalent multipoles, 𝐸%⃑

TRkU\]I)S,IKKIRSTUI (Eqn (7)), and (2) that of induced dipoles, − ∑ 𝑇%⃑%⃑ Tl ∙ 𝑝⃑ l lmT . Note that we have used the effective electric field instead of the real electric field to define the induced dipoles in the pGM model. One reason is to ensure the symmetry of 𝑇%⃑%⃑ Tl , which greatly reduces the complexity of force calculation later. To simplify the following discussion, we drop the effective superscript as we plan to use effective electric fields in all subsequent discussions of energy and force calculations in the pGM model. Another issue worth pointing out about induced dipoles is their self energies. The linear polarization itself implies a self-energy term of the form

𝑈 =

G= p⃑ M q . (11) The derivation can be found in many publications. However, the pGM model, due to its use of Gaussian distributions of multipoles, posts extra difficulty. For example, a Gaussian charge distribution itself has self energy, or assembly energy, of the form as derived in Appendix A.2 as

𝑈 = r M √=6 + h (s%%⃑ Op⃑ ) M (12) Clearly, the self energy is different from Eqn. (11) and it does not lead to a linear polarization behavior. In fact, it is difficult to assess the physical meaning for the nonlinear assembly energy, just like it is hard to discuss the physical meaning of the infinitely large assembly/self energy for a point charge. Thus, for the current model development, we do not consider self energies beyond Eqn. (11).

From the introduction of the pGM model in previous sections, it is clear that the electrostatic potential energy of the system can be divided into two parts: (1)

Covalent Dipole-Covalent Dipole Interaction Energy: 𝑈 RkU\]I)StRkU\]I)S = G= ∑ ∑ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l ) XYZ (4 dn dn )3 dn ulmTuT (13) (2) Induced Energy: 𝑈 T)vwRIv = 𝑈

T)vwRIvtRkU\]I)S + 𝑈

T)vwRIvtT)vwRIv + 𝑈 xI]K = − ∑ 𝑝⃑

TuT ∙ 𝐸%⃑

TRkU\]I)S + G= ∑ ∑ (𝑝⃑ T ∙ ulmTuT 𝑇%⃑%⃑ Tl ∙ 𝑝⃑ l ) + G= ∑ 𝑝⃑ T ∙ (𝐸%⃑ TRkU\]I)S − ∑ 𝑇%⃑%⃑ Tl ∙ 𝑝⃑ lulmTuT ) = − G= ∑ 𝑝⃑ TuT ∙ 𝐸%⃑

TRkU\]I)S (14) where N denotes the number of atoms in the system. Thus, atomic electrostatic forces can be derived as negative gradients of the above two potential energy terms, respectively. When computing gradients for the covalent-dipole interaction energy, it is very important to know which quantities are the variables of the virtual displacement of atom i . There are two types of variables: (1) pairwise distances between atom i and all other atoms, and (2) covalent dipoles on atom i and covalent dipoles on atoms covalently interacting with atom i . Given this classification of variables, we can group the terms in Eqn. (13) into four different parts and discuss their gradients with respect to 𝑅%⃑ T , separately. The detailed derivations are presented in Appendix A.3, and the final force expression for the covalent-dipole interaction energy is 𝐹⃑ RkU\]I)StRkU\]I)ST = − ∑ ∇

Tul +𝑢%⃑ l . ∙ ∑ ∇ luzml (𝑞 z + 𝑢%⃑ z ∙ ∇ z ) XYZ+4 n{ n{ .3 n{ −(𝑞 T + 𝑢%⃑ T ∙ ∇ T ) ∑ +𝑞 l + 𝑢%⃑ l ∙ ∇ l .∇ TulmT XYZ+4 dn dn .3 dn (15) where N denotes the number of atoms in the system. Briefly, the first term is from the derivatives over the covalent dipoles, and the second term is from derivatives over the pairwise distances. The derivation of forces for the induced energy in Eqn (14) is not that straightforward. We first need to express the induced dipole, 𝑝⃑ T , in terms of fields from covalent dipoles only, not as their definition in Eqn (10). This is because 𝑝⃑ T appears on both sides of Eqn (10), i.e. induced dipoles mutually influence each other, so it is difficult to take their derivatives. Instead we proceed by expressing 𝑝⃑ T as shown in Appendix A.3 as 𝑝 Tx = 𝐴 tGTlxS 𝐸 lRkU\]I)S,S (16) where 𝐴 tGTlxS is an

3𝑁 × 3𝑁 matrix which we do not know the expression of, and s t and i j are coordinate component indices and atom indices, respectively. Next Eqn (14) can be rewritten as

𝑈 = − G= 𝐴 tGlzxS 𝐸 lRkU\]I)S,x 𝐸 zRkU\]I)S,S (17) where the Einstein’s index notation is employed for j k s t , so that a repeated index implies a summation over all possible values of the index, i.e. Eqn (17) is a quadruple summation. Even if we do not know the expression of matrix 𝐴 tG , we can still obtain its gradient with respect to 𝑅%⃑ T , or the virtual displacement of atom i (cid:128)_ (cid:129)Ldn(cid:130)(cid:131) (cid:128)(cid:132) {(cid:133) = −𝐴 tGTT (cid:134) xx (cid:134) (cid:128)(cid:135) d(cid:134)n(cid:134)(cid:130)(cid:134)(cid:131)(cid:134) (cid:128)(cid:132) {(cid:133) 𝐴 tGl (cid:134) lS (cid:134) S (18) where all primed indices follow the Einstein’s index notation, and 𝑇 is the dipole-dipole interaction tensor. Given the above preparations, the induced part of the force can be obtained as 𝐹⃑ T)vwRIvT = −𝑝⃑ T ∙ ∇ T ∑ (𝑝⃑ l ∙ ∇ l )∇ TulmT XYZ+4 dn dn .3 dn + ∑ ∇ T (𝐸%⃑ lRkU\]I)S ) ∙ 𝑝⃑ lul (19) where N denotes the number of atoms in the system. Details are presented in Appendix A.3. Briefly, the first term is obtained from the derivative of matrix 𝐴 tG and the second term is calculated as derivative of the covalent field 𝐸%⃑ lRkU\]I)S as follows ∇ T (𝐸%⃑ lRkU\]I)S ) = (cid:136) − ∑ ∇ T (𝑢%⃑ z ) ∙ )zml ∇ z ∇ l XYZ (4 n{ n{ )4 n{ n{ − (𝑞 T + 𝑢%⃑ T ∙ ∇ T )∇ T ∇ l XYZ+4 dn dn .4 dn dn 𝑖𝑓 𝑗 ≠ 𝑖− ∑ ∇ T (𝑢%⃑ z ) ∙ )zmT ∇ z ∇ T XYZ (4 d{ d{ )4 d{ d{ − ∑ (𝑞 z + 𝑢%⃑ z ∙ ∇ z ) uzmT ∇ T ∇ T XYZ+4 d{ d{ .4 d{ d{ 𝑖𝑓 𝑗 = 𝑖 (20) where n denotes all atoms that are covalently (including virtually) bonded with atom i , and atom i itself. In practice, force calculations must be combined with an Ewald summation or particle mesh Ewald (PME) technique to handle long-range under periodic boundary condition. This is to be discussed in detail in section 2.5. The Ewald summation was introduced to compute the electrostatic energy of an infinite lattice under periodic boundary conditions. The basic idea is to put a mask Gaussian charge distribution on the real charge on each atom. Then a direct-space pairwise summation is conducted to compute the electric field due to real charges masked by the Gaussian charges. This step can be executed with a reasonably short cutoff distance due to the very fast decay after applying the mask Gaussian charges. Next the field generated by the mask Gaussians can be computed efficiently by a reciprocal-space summation to bring back the original electric field due to the real charges. Finally, a correction step is used to remove interactions not needed in the original electrostatic model. Similar to its use in point charge/dipole models, a mask Gaussian distribution is also used on each moment of each atom in the pGM model. 𝜌 T(cid:141)\xz +𝑟⃑; 𝑅%⃑ T . = 𝑞 T (cid:142) (cid:143)M (cid:144) hM exp (cid:142)−𝛽 B= e𝑟⃑ − 𝑅%⃑ T e = (cid:144) +(𝜇⃑ T + 𝑝⃑ T ) ∙ 𝛻 d (cid:142) (cid:143)M (cid:144) hM exp (cid:142)−𝛽 B= e𝑟⃑ − 𝑅%⃑ T e = (cid:144) (21) where 𝛽 B is an adjustable parameter usually in the range of about G(cid:145) ~ G= Å tG , universal for all atoms. Direct Summation

Given the mask distribution is also a Gaussian function, it is straightforward to compute electrostatic potential, field, and the gradient of field of a masked pGM charge distribution, as follows 𝜙 T = ∑ [ 𝑞 l + (𝜇⃑ l + 𝑝⃑ l ) ∙ ∇ l ] XYZ+4 dn dn .tXYZ+4 (cid:143) dn .3 dn ulmT (22) 𝐸%⃑ T = − ∑ [ 𝑞 l + (𝜇⃑ l + 𝑝⃑ l ) ∙ ∇ l ]∇ T XYZ+4 dn dn .tXYZ+4 (cid:143) dn .3 dn ulmT (23) 𝐸%⃑%⃑ T = − ∑ [ 𝑞 l + (𝜇⃑ l + 𝑝⃑ l ) ∙ ∇ l ]∇ T ∇ T XYZ+4 dn dn .tXYZ+4 (cid:143) dn .3 dn ulmT (24) For the current pGM model, no higher-order field is needed. Here, N represents all the atoms including those in the periodic boxes, but their influence would decay to zero very quickly due to the masking effect. Another point worth pointing out is that the real field of the mask Gaussian multipoles is used, i.e. 𝛽 B is used instead of 𝛽 TB = d (cid:143) N4 dM O4 (cid:143)M in the above expressions. The mixed use is not an issue because mask multipoles do not really exist, their role is just a mathematical treatment in the Ewald summation as long as the effect is exactly cancelled out in the later step. Reciprocal Summation

The reciprocal summation of the pGM model follows the same procedure as a traditional point polarizable model. Thus the electrostatic potential, field, and gradient of field can be shown as 𝜙 T = G6(cid:149) ∑ X(cid:150)(cid:151)(cid:152)t (cid:153)M(cid:154)%%%%⃑M(cid:155)(cid:143)M (cid:156)(cid:141)%%%⃑ M (cid:141)%%%⃑mB exp+−2𝜋𝑖𝑚%%⃑ ∙ 𝑅%⃑ T . 𝑆(𝑚%%⃑) (25) 𝐸%⃑ T = =T(cid:149) ∑ 𝑚%%⃑ X(cid:150)(cid:151)(cid:152)t (cid:153)M(cid:154)%%%%⃑M(cid:155)(cid:143)M (cid:156)(cid:141)%%%⃑ M (cid:141)%%%⃑mB exp+−2𝜋𝑖𝑚%%⃑ ∙ 𝑅%⃑ T . 𝑆(𝑚%%⃑) (26) 𝐸%⃑%⃑ T = − ¡6(cid:149) ∑ 𝑚%%⃑𝑚%%⃑ X(cid:150)(cid:151)(cid:152)t (cid:153)M(cid:154)%%%%⃑M(cid:155)(cid:143)M (cid:156)(cid:141)%%%⃑ M (cid:141)%%%⃑mB exp+−2𝜋𝑖𝑚%%⃑ ∙ 𝑅%⃑ T . 𝑆(𝑚%%⃑) (27) where the i ’s that are not subscripts but the imaginary unit, V is the volume of the unit cell, and 𝑚%%⃑ is the reciprocal space vector. 𝑆(𝑚%%⃑) is the structure factor

𝑆(𝑚%%⃑) = ∑ 𝐿£ l (𝑚%%⃑) exp+2𝜋𝑖𝑚%%⃑ ∙ 𝑅%⃑ l . ul⁄G 𝐿£ l (𝑚%%⃑) = 𝑞 l + 2𝜋𝑖(𝜇⃑ l + 𝑝⃑ l ) ∙ 𝑚%%⃑ (28) Here N is the number of atoms in the primary simulation box only. Correction

The PME correction term is used to handle various specific situations in a force field. For example, most force fields have masked bonded (1-2 and 1-3) atom pairs, which result in no electrostatic interactions between these pairwise atoms. Thus, the interactions among these masked pairs must be removed. However, in the current pGM model we do not have any masked pairs, so there is no need for such correction. Another correction that needs paying attention to is the self-interaction correction. This is the only correction in the pGM model. The self-potential, self-field, gradient of self-field can be shown as

42, 43 𝜙 T = =r d (cid:143) √6 (29) 𝐸%⃑ T = − ¡(s%%⃑ d Op⃑ d )4 (cid:143)h (30) 𝐸%⃑%⃑ T = ¡r d (cid:143)h 𝐼⃑⃑ (31) These terms need to be properly subtracted to obtain correct potential, field, and field gradient, respectively In summary, the similarity between the Ewald summation in pGM model and that in the polarizable point charge/dipole model shows that the PME MD engine for the point charge/dipole model can be easily transplanted over for pGM applications with little revision. There are excellent literature discussing the details of PME for polarizable point dipole models, and can be safely omitted in this work.

42, 43

As pointed out at the end of Section 2.3, analytical force expressions, Eqn (15) and (19), cannot be used directly in typical MD simulations since all summations are over infinite numbers of atoms with periodic boundary conditions. They must be combined with an Ewald summation or a PME technique to facilitate solvated-phase simulations. To bypass the infinite summations, the force expressions are reformulated in terms of fields and its derivatives, which are also the quantities that an Ewald or PME procedure would return. The key to express Eqn (15) and (19) with fields and gradients of fields is to consider the following quantities together

𝐸%⃑

TRkU\]I)S = − ∑ ( 𝑞 l + 𝜇⃑ l ∙ ∇ l )∇ T XYZ+4 dn dn .3 dn ulmT 𝐸%⃑%⃑

TRkU\]I)S = − ∑ ( 𝑞 l + 𝜇⃑ l ∙ ∇ l )∇ T ∇ T XYZ+4 dn dn .3 dn ulmT 𝐸%⃑

TT)vwRIv = − ∑ 𝑝⃑ l ∙ ∇ l ∇ T XYZ+4 dn dn .3 dn ulmT 𝐸%⃑%⃑

TT)vwRIv = − ∑ 𝑝⃑ l ∙ ∇ l ∇ T ∇ T XYZ+4 dn dn .3 dn ulmT (32) where N denotes all atoms in the system, including those of the periodic boxes. Thus, these are all infinite summations. A key step is in the computation of 𝐹⃑ T)vwRIvT , where term ∑ ∇ T (𝐸%⃑ lRkU\]I)S ) ∙ 𝑝⃑ lul also has to be reformulated accordingly. Given that Eqn (20) for ∇ T +𝐸%⃑ lRkU\]I)S . lists two separate terms for 𝑗 =𝑖 and 𝑗 ≠ 𝑖 , we can rewrite ∑ ∇ T (𝐸%⃑ lRkU\]I)S ) ∙ 𝑝⃑ lul as follows ∑ ∇ T +𝐸%⃑ lRkU\]I)S . ∙ 𝑝⃑ lul = ∑ ulmT ∑ (−∇ T (𝑢%⃑ z )) ∙ )zml ∇ z +𝑝⃑ l ∙ ∇ l . XYZ (4 n{ n{ )4 n{ n{ − ∑ (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T (𝑝⃑ l ∙ ∇ l ) XYZ+4 dn dn .4 dn dn ulmT − ∑ ∇ T (𝑢%⃑ z ) ∙ )zmT ∇ z (𝑝⃑ T ∙ ∇ T ) XYZ+4 d{ d{ .4 d{ d{ − ∑ (𝑞 z + 𝜇⃑ z ∙ ∇ z ) uzmT ∇ T (𝑝⃑ T ∙ ∇ T ) XYZ+4 d{ d{ .4 d{ d{ (33) where both 𝑗 = 𝑖 and 𝑗 ≠ 𝑖 terms in Eqn (20) are needed due to the outermost summation over j . Combining the first and third terms of Eqn (33) gives = ∑ ul ∑ (−∇ T (𝑢%⃑ z )) ∙ )zml ∇ z +𝑝⃑ l ∙ ∇ l . XYZ+4 n{ n{ .4 n{ n{ − ∑ (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T (𝑝⃑ l ∙ ∇ l ) XYZ+4 dn dn .4 dn dn ulmT − ∑ (𝑞 z + 𝜇⃑ z ∙ ∇ z ) uzmT ∇ T (𝑝⃑ T ∙ ∇ T ) XYZ+4 d{ d{ .4 d{ d{ (34) Exchanging the summation order for the first term leads to = ∑ )z ∑ (−∇ T (𝑢%⃑ z )) ∙ ulmz ∇ z +𝑝⃑ l ∙ ∇ l . XYZ+4 n{ n{ .4 n{ n{ − ∑ ulmT (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T +𝑝⃑ l ∙ ∇ l . XYZ+4 dn dn .4 dn dn −∑ (𝑞 z + 𝜇⃑ z ∙ ∇ z ) uzmT ∇ T (𝑝⃑ T ∙ ∇ T ) XYZ+4 d{ d{ .4 d{ d{ (35) Substitution of the expressions of electric fields and derivatives in Eqn (32) gives ¥ ∇ T +𝐸%⃑ lRkU\]I)S . ∙ 𝑝⃑ lul = ∑ ∇ T)z (𝑢%⃑ z ) ∙ 𝐸%⃑ zT)vwRIv + 𝑞 T 𝐸%⃑

TT)vwRIv + 𝑢%⃑ T ∙ 𝐸%⃑%⃑ TT)vwRIv + 𝑝⃑ T ∙ 𝐸%⃑%⃑ TRkU\]I)S (36) Here, n means those atoms that covalently interact with atom i . Given the above preparations, Eqn (15) and (19) can finally be expressed as follows after substitution of Eqns (32) and (36) 𝐹⃑ RkU\]I)StRkU\]I)ST = ∑ ∇

T)l +𝑢%⃑ l . ∙ 𝐸%⃑ lRkU\]I)S + 𝑞 T 𝐸%⃑

TRkU\]I)S + 𝑢%⃑ T ∙ 𝐸%⃑%⃑ TRkU\]I)S (37) 𝐹⃑ T)vwRIvT = ∑ ∇

T)l +𝑢%⃑ l . ∙ 𝐸%⃑ lT)vwRIv + 𝑞 T 𝐸%⃑

TT)vwRIv +𝑝⃑ T ∙ 𝐸%⃑%⃑ TT)vwRIv + 𝑢%⃑ T ∙ 𝐸%⃑%⃑ TT)vwRIv + 𝑝⃑ T ∙ 𝐸%⃑%⃑ TRkU\]I)S (38) Adding these two terms together, the final force expression is, 𝐹⃑ T = ∑ ∇ T)l +𝑢%⃑ l . ∙ 𝐸%⃑ l + 𝑞 T 𝐸%⃑ T + (𝑢%⃑ T + 𝑝⃑ T ) ∙ 𝐸%⃑%⃑ T (39) Eqn (39) shows that a key step in this algorithm is to accumulate atomic electric potential and its first and second derivatives from various components, including both reciprocal and direct summations.

3. Results and Discussion

To validate the pGM force expressions, e.g. Eqns (15)/(19) above, we constructed a small toy system of two water molecules in free space. The detailed water pGM parameters are listed in Table 1. These parameters were derived with an iterative RESP procedure for the pGM model with quantum mechanical ESP data from a B3LYP/aug-cc-pvtz calculation of the water dimer.

Tested atoms water-1 O water-1 H1 water-1 H2 water-2 O water-2 H1 water-2 H2 Charge ( 𝑒 ) -1.797045 0.898523 0.898523 -1.797045 0.898523 0.898523 Dipole moment ( 𝑒 ∙ Å ) (1.317332, -0.120867, 1.711988) (0.262849, -0.484213, 0.323876) (0.276182, 0.436999, 0.376728) (0.927376, 0.134058, -1.967337) (-0.236591, 0.041394, -0.650790) (0.590269, 0.013966, -0.164059) Polarizability ( Å ) 1.448980 0.427350 0.427350 1.448980 0.427350 0.427350 Gaussian radius ( Å ) 0.8066249 0.7147597 0.7147597 0.8066249 0.7147597 0.7147597 Coordinates ( Å ) (-1.387669, -0.006775, 0.110728) (-1.734110, 0.790147, -0.310036) (-1.756164, -0.740122, -0.397708) (1.514536, 0.007522, -0.121554) (1.919419, -0.047517, 0.751251) (0.555921, -0.008485, 0.043101) Table 1. Two water molecules in free space. The permanent dipole moments are expressed in the lab frame. The moments in the CBV frame can be obtained once the CBV’s are defined according to Figure 1.

Two methods were used to calculate the atomic forces. The first method is to use the force expressions to calculate forces analytically. The second method is to calculate forces numerically via the finite-difference method, based on the fact that each force is the negative gradient of potential energy. Here the potential energy was computed with Eqn (13) and (14). The finite-difference coordinate displacement was set to be 0.001 Å . The two sets of atomic forces are listed in Table 2. It is clear that the differences between the two sets of atomic forces appear only on the fourth digit after the decimal point, which is consistent with the accuracy of the finite-difference displacement used (0.001 Å ). Tested atoms water-1 O water-1 H1 water-1 H2 water-2 O water-2 H1 water-2 H2 Analytical forces (0.094496, -0.008624, 0.1254230) (-0.048543, -0.237284, -0.076339) (-0.041431, 0.245927, -0.048632) (0.047721, 0.009654, -0.143042) (-0.245399, 0.000915, -0.033889) (0.193156, -0.010589, 0.176479) Finite-difference forces (0.094410, -0.008797, 0.125186) (-0.048657, -0.237474, -0.076493) (-0.041549, 0.245762, -0.048810) (0.047509, 0.009778, -0.143437) (-0.245485, 0.000871, -0.034221) (0.192871, -0.010637, 0.176359) Deviations (0.000085, 0.000173, 0.000236) (0.000114, 0.000190, 0.000154) (0.000118, 0.000164, 0.000178) (0.000211, -0.000124, 0.000395) (0.000086, 0.000043, 0.000332) (0.000284, 0.000048, 0.000119)

Table 2. Atomic forces (e / Å = ) computed via the analytical expression and the finite difference procedure, and their differences. There is also an indirect way to confirm the correctness of the force expression, which is to utilize the fact that the total force of the system should always be zero in any direction. If we add up all atomic forces, the system net force (in e / Å = ) is −2 × 10 tG§ , −1 × 10 t¤ and , for x , y and z directions, respectively. The overall error here is consistent with the induction tolerance used in the testing, t¤ . To achieve aqueous-phase simulations, an Ewald summation or PME technique is essential for any electrostatic model. Although there are various publications discussing the accuracy of PME

42, 43, 47, 48 , we have to acknowledge the fact that the pGM model has a higher accuracy requirement than classical point-charge force fields due to the presence of dipoles. In general, higher moments would require higher PME accuracy. This can be appreciated from the perspective of two considerations. Firstly, the potentials of dipoles scale with distance more nonlinearly ( ⁄ ) than those of charges ( ). Secondly, the second derivatives of potential are needed to compute forces on dipoles (Eqn (39)), whereas only first derivatives, e.g. electric fields, are needed to compute forces on charges. Due to these differences, we have to carefully examine the accuracy requirement of PME methods used in our model. To test the accuracy, we only look at the most difficult pairwise interactions, so that the errors reported below are the maximum errors in the tested water system. For the reciprocal part, we focus on the electrostatic field between the bonded O atom and H atoms, whose interactions are the most nonlinear and thus the most difficult in PME. For the direct summation part, we focus on the electrostatic field between two H atoms. Because they have the smallest Gaussian radii, their interactions converge slowest in the direct summation. Thus, to guarantee a given accuracy level for forces in the pGM model, we need to consider both PME components. In the following analysis we set the grid spacing to 1 Å for PME as in most biomolecular simulations and varied other parameters to see how accuracy changes in both the reciprocal and the direct summation components. Also, because our model contains both charges and dipoles, we analyzed their field separately to assess the impact of different setups on their accuracy of electric field. The test results are shown below in Table 3, 4 and 5. Interpolation order 5 6 7 8 9 Ewald coefficient 𝛽 B = 0.3Å tG potential t(cid:145) t(cid:145) t§ t¤ t¤ first derivative t¡ t(cid:145) t¤ t¤ t¤ second derivative t7 t¡ t(cid:145) t(cid:145) t¤ Ewald coefficient 𝛽 B = 0.4Å tG potential t¡ t(cid:145) t¤ t(cid:176) t¤ first derivative t7 t¡ t¡ t(cid:145) t(cid:145) second derivative t7 t7 t¡ t¡ t(cid:145) Ewald coefficient 𝛽 B = 0.5Å tG potential t¡ t¡ t¡ t(cid:145) t(cid:145) first derivative t7 t7 t7 t¡ t¡ second derivative t7 t7 t7 t7 t¡ Table 3. Errors of reciprocal potentials and derivatives generated by the dipoles of water-1 H1 atom on water-1 O atom at different PME setups. The analytical values (Eqns (25) – (28)) were calculated with MATLAB. Interpolation order refers to the rank of the Lagrangian interpolation method used in PME. See Ref for details. Interpolation order 5 6 7 8 9 Ewald coefficient 𝛽 B = 0.3Å tG potential t¤ t(cid:176) t(cid:176) t(cid:176) t– first derivative t¡ t(cid:145) t¤ t¤ t§ second derivative t¡ t¡ t(cid:145) t¤ t¤ Ewald coefficient 𝛽 B = 0.4Å tG potential t(cid:145) t¤ t¤ t§ t§ first derivative t¡ t¡ t(cid:145) t(cid:145) t(cid:145) second derivative t7 t¡ t¡ t(cid:145) t(cid:145) Ewald coefficient 𝛽 B = 0.5Å tG potential t(cid:145) t(cid:145) t(cid:145) t(cid:145) t¤ first derivative t7 t¡ t¡ t¡ t¡ second derivative t7 t7 t7 t¡ t¡ Table 4. Errors of reciprocal potentials and derivatives generated by the charges of water-1 H1 atom on water-1 O atom at different PME setups. The analytical values (Eqns (25) – (28)) were calculated with MATLAB. Interpolation order refers to the rank of the Lagrangian interpolation method used in PME. See Ref for details. Cutoff distance ( Å ) 7 8 9 10 11 𝛽 B = 0.3Å tG t7 t¡ t¡ t(cid:145) t¤ 𝛽 B = 0.4Å tG t(cid:145) t¤ t§ t(cid:176) tGB 𝛽 B = 0.5Å tG t§ t(cid:176) tGB tG= tG(cid:145) Table 5. Errors of direct summation potentials for Gaussian potentials between two H atoms at different PME setups. These values are the difference between two error functions, erf(𝛽𝑅 R ) − erf (𝛽 B 𝑅 R ) . Here, 𝑅 R is the direct summation cutoff distance, and 𝛽 = 1/(0.7147597 ∙ √2)Å tG for the H atom pairs. It is clear from the above analyses that the pGM model demands a higher accuracy level than classical point-charge models. This is as expected for any electrostatic model with dipoles or higher moments. For the reciprocal part, the field generated by dipoles are more difficult to handle than that of charges in PME. Comparing Tables 3 and 4, we can see that the errors of dipole fields are about twice larger than that of charge fields. Furthermore, Table 3 and 4 show that the second derivatives are the most difficult in PME. Thus, to ensure the accuracy of the reciprocal summation of PME, we need to make sure the second derivatives reach a specified accuracy level. For example, if we use t(cid:145) as the accuracy threshold, which is a common choice, we have to set Ewald 𝛽 B = 0.3Å tG and the interpolation order 7 or higher in the PME setup (Table 3). Situations are similar for the direct summation part. To reach a common accuracy threshold of t(cid:145) , if we set 𝛽 B = 0.3Å tG as in the reciprocal part, the direct space cutoff should be set to a relatively longer cutoff distance of 10 Å , as shown in Table 5. Of course, a choice of larger 𝛽 B (i.e. tG ) would allow a commonly used cutoff distance of 9 Å. However, this would require a higher interpolation order to achieve the similar level of accuracy.

Given all the accuracy consideration in Section 3.2, we performed a pure water simulation to test the energy conservation behavior in an NVE run with the PME treatment. The electrostatic parameters were derived from those in Table 1 and transplanted onto the TIP3P water model. We used 512 water molecules in a truncated octahedron box of 27.5 Å . The dimension of the particle mesh grid is 30 , so the grid spacing is a bit less than 1 Å. The PME 𝛽 B = 0.35Å tG , the real space cutoff was set as 9 Å , and the interpolation order was 8 so that the overall PME error was less than t(cid:145) . We first tested a range of induction tolerance criteria, ranging from t7 to t¤ . Our experiments show that t7 and t¡ are clearly not sufficient for the induction iteration, leading to decreasing energy throughout the MD simulations. This is consistent with previous findings in the developments of polarizable point dipole models. Specifically, the energy in the NVE run of t7 drifts too fast, so that it is already out of the plotting range at the 100 th step, the very first data point. The rest of the energy plots over the simulation time are shown below in Figure 2. The initial testing shows that the energy convergence became much better after we tighten the iteration tolerance to t(cid:145) . Though the total energy still drifts down a little, but much slower than before. Finally, after we tighten it to t¤ , the total energy is basically conserved. Of course, the total energy fluctuation does exist. Figure 2. Total energy versus simulation time for the 512-water box simulation. Here, all the simulations are performed with a t(cid:145)

PME accuracy, but with different induction iteration tolerance ( t¡ ~ 1 × 10 t¤ ). Next we also studied the influences of the PME setup on the energy conservation. To compare with the NVE run with the high PME accuracy above. We collected a comparable NVE run with the same induction tolerance of t¤ , but with a somewhat lower PME setting. The real space cutoff was set as 8 Å and the interpolation order was set as 6 but others remained to be the same, which leads to a lower PME accuracy of ~5 × 10 t¡ . The total energy is more positive because there are fewer van der Waals pairs. As shown in Figure 3, the total energy also drifts noticeably, though it becomes more positive over time. In summary, our experiment shows that high enough accuracy in both the PME calculation and the induction iteration is necessary for a polarizable dipole model with permanent dipoles to achieve energy conservation. Furthermore, we expect even higher accuracy is necessary if higher moments, i.e. quadrupoles, are used in future pGM developments as higher derivatives are needed from the PME calculation.

4. Conclusion

In this work, we proposed an efficient formulation for the polarizable Gaussian Multipole (pGM) model for biomolecular simulations. Firstly, a local frame based on the covalent basis vectors (CBV) /tensors was used to set up the permanent (covalent) multipoles on all atoms. The CBV frame nicely allows the intrinsic molecular flexibility during simulations and facilitates an efficient formulation of analytical electrostatic forces. Based on the new CBV local frame, we then derived the analytical force expressions for the pGM model. Finally, we outlined how to interface the pGM electrostatics seamlessly with the PME implementation for molecular simulations under the periodic boundary conditions. Figure 3. Total energy versus simulation time for the 512-water box simulation. Here, both simulations were performed with a t¤ induction iteration tolerance, but with different PME accuracy, low for t¡ and high for t(cid:145) . To validate the analytical force expression for the pGM model defined on the CBV frame, we studied the accuracy of the analytical atomic forces with a finite-different force analysis for a water dimer. The analysis shows a very good consistency between the analytical and numerical forces, with an error comparable to the finite difference uncertainty. In addition, total analytical and numerical forces of the water dimer are very close to zero with an error consistent with the induction iteration tolerance. Next, we analyzed the PME setups necessary for accurate pGM energy and force calculations. It was found that the pGM model requires higher accuracy than the classical point-charge models due to the presence of dipoles. This is because the electrostatic field generated by dipoles are much more difficult to interpolate than that of charges in PME, and the error of the dipole field is about twice that of the charge field. In addition, the second derivative of potential is needed, which is the more difficult to compute accurately in PME to ensure accurate pGM forces. To validate the overall electrostatic framework for the reformulated pGM model, we conducted an NVE simulation for a small water box of 512 water molecules. Our results show that, to achieve energy conservation, it is important to ensure enough accuracy on both PME and induced dipoles. With a t(cid:145) accuracy on PME and a t¤ tolerance for induced dipoles, the tested NVE water simulation in the pGM model was shown to conserve energy reasonably well. Future development will be necessary to improve the efficiency of the pGM model in both PME setup and induction iteration to bring out the potential of the pGM model. Data Availability

The algorithms developed in this study and the validation data are deposited in the Amber repository and will be made publicly available in the next Amber/AmberTools release on http://ambermd.org/.

Acknowledgements

This work was supported by NIH GM79383, GM093040, and GM130367.

References U. Burkert, and N. Allinger, Washington, DC (1982) W. L. Jorgensen et al. , The Journal of chemical physics (1983) 926. H. J. Berendsen et al. , in

Intermolecular forces (Springer, 1981), pp. 331. M. W. Mahoney, and W. L. Jorgensen, The Journal of Chemical Physics (2000) 8910. E. Clementi et al. , Theor. Chim. Acta (1980) 257. P. Ren, and J. W. Ponder, The Journal of Physical Chemistry B (2003) 5933. T. A. Halgren, and W. Damm, Current opinion in structural biology (2001) 236. S. W. Rick, and S. J. Stuart, Reviews in computational chemistry (2002) 89. J. W. Ponder, and D. A. Case, in

Advances in Protein Chemistry (Academic Press, 2003), pp. 27. G. Lamoureux, A. D. MacKerell Jr, and B. Roux, The Journal of chemical physics (2003) 5185. H. Yu, T. Hansson, and W. F. van Gunsteren, The Journal of chemical physics (2003) 221. S. W. Rick, S. J. Stuart, and B. J. Berne, The Journal of chemical physics (1994) 6141. J. Caldwell, L. X. Dang, and P. A. Kollman, Journal of the American Chemical Society (1990) 9144. C. J. Burnham et al. , The Journal of chemical physics (1999) 4566. F. J. Vesely, Journal of Computational Physics (1977) 361. J. Applequist, J. R. Carl, and K.-K. Fung, Journal of the American Chemical Society (1972) 2952. B. T. Thole, Chemical Physics (1981) 341. G. A. Kaminski et al. , Journal of Computational Chemistry (2002) 1515. R. A. Friesner, in

Advances in Protein Chemistry (Academic Press, 2005), pp. 79. S. Patel, A. D. Mackerell, and C. L. Brooks, Journal of Computational Chemistry (2004) 1504. G. Lamoureux et al. , Chemical Physics Letters (2006) 245. P. E. M. Lopes et al. , Journal of Physical Chemistry B (2007) 2873. W. Jiang et al. , Journal of Physical Chemistry Letters (2011) 87. J. W. Ponder et al. , Journal of Physical Chemistry B (2010) 2549. P. Cieplak, J. Caldwell, and P. A. Kollman, J Comput Chem (2001) 1048. J. Wang et al. , Journal of Physical Chemistry B (2011) 3091. J. Wang et al. , Journal of Physical Chemistry B (2011) 3100. J. Wang et al. , Journal of Physical Chemistry B (2012) 7999. J. M. Wang et al. , Journal of Physical Chemistry B (2012) 7088. D. E. Williams, Journal of computational chemistry (1988) 745. J. W. Ponder et al. , J Phys Chem B (2010) 2549. Y. Shi et al. , J Chem Theory Comput (2013) 4046. G. A. Cisneros, J.-P. Piquemal, and T. A. Darden, The Journal of chemical physics (2006) 184101. J. P. Piquemal et al. , Journal of Chemical Physics (2006) R. E. Duke et al. , Journal of Chemical Theory and Computation (2014) 1361. A. K. Rappe, and W. A. Goddard III, The Journal of Physical Chemistry (1991) 3358. H. A. Stern et al. , The Journal of chemical physics (2001) 2237. B. Guillot, and Y. Guissani, The Journal of Chemical Physics (2001) 6720. D. Elking, T. Darden, and R. J. Woods, Journal of Computational Chemistry (2007) 1261. J. Wang et al. , Journal of Chemical Theory and Computation (2019) 1146. J. Gao et al. , Acc. Chem. Res. (2014) 2837. A. Toukmaji et al. , The Journal of chemical physics (2000) 10913. C. Sagui, L. G. Pedersen, and T. A. Darden, The Journal of chemical physics (2004) 73. D. Elking, (University of Georgia, 2007). C. J. F. Böttcher et al. , Theory of electric polarization (Elsevier Science Ltd, 1978), Vol. 2, P. P. Ewald, Annalen der Physik (1921) 253. T. Darden, D. York, and L. Pedersen, The Journal of chemical physics (1993) 10089. U. Essmann et al. , The Journal of chemical physics (1995) 8577. W. Smith, CCP5 Inf. Q (1982) 13. Appendices

A.1 Tensor Format of Boys Serial

In this study, Boys functions up to rank three were used and are listed below as reference. Higher ranked tensors and Boys functions can be found in the literatures.

44, 49

Boys functions up to rank three are 𝐵 B (𝑥) = XYZ ((cid:132))(cid:132) 𝐵 G (𝑥) = XYZ ((cid:132))(cid:132) h − =√6 𝑒 t(cid:132) M G(cid:132) M 𝐵 = (𝑥) = „ − =√6 𝑒 t(cid:132) M G(cid:132) ” (3 + 2𝑥 = ) 𝐵 (𝑥) = G(cid:145)XYZ ((cid:132))(cid:132) » − =√6 𝑒 t(cid:132) M G(cid:132) … (15 + 10𝑥 = + 4𝑥 ¡ ) The associated tensors are ∇ XYZ (43)3 = −𝑅%⃑𝛽 𝐵 G (𝛽𝑅) ∇∇ XYZ (43)3 = 𝑥‰ p 𝑥‰ r (𝑅 p 𝑅 r 𝛽 (cid:145) 𝐵 = (𝛽𝑅) − 𝛿 pr 𝛽 𝐵 G (𝛽𝑅)) ∇∇∇ XYZ (43)3 = 𝑥‰ p 𝑥‰ r 𝑥‰ J (+𝛿 pr 𝑅 J + 𝛿 pJ 𝑅 r + 𝛿 Jr 𝑅 p .𝛽 (cid:145) 𝐵 = (𝛽𝑅) − 𝑅 p 𝑅 r 𝑅 J 𝛽 § 𝐵 (𝛽𝑅)) A.2 Self-energy derivation

We can derive a general expression of self-energy/assemble energy of Gaussian multipoles in pGM as follows

𝑈 = ∫ 𝑑𝑟⃑ G= 𝜌𝜑 = ∫ 𝑑𝑟⃑ G= +𝜌 (cid:141)k)kpk]I + 𝜌 vTpk]I .+𝜑 (cid:141)k)kpk]I + 𝜑 vTpk]I . =∫ 𝑑𝑟⃑ G= 𝜌 (cid:141)k)kpk]I 𝜑 (cid:141)k)kpk]I + ∫ 𝑑𝑟⃑ G= 𝜌 vTpk]I 𝜑 (cid:141)k)kpk]I + ∫ 𝑑𝑟⃑ G= 𝜌 (cid:141)k)kpk]I 𝜑 vTpk]I +∫ 𝑑𝑟⃑ G= 𝜌 vTpk]I 𝜑 vTpk]I The first term, monopole self-energy, ∫ 𝑑𝑟⃑ G= 𝜌 (cid:141)k)kpk]I 𝜑 (cid:141)k)kpk]I = lim = IJK ( (cid:155)√M )3 = lim = 4√= IJK ( (cid:155)√M ) (cid:155)√M = lim (cid:132)→B r∙r = 4√= 𝐵 B (𝑥) = r∙r = 4√= =√6 = r M √=6 The second and the third terms, on the other hand, are always zero, because the symmetry of a monopolar and a dipolar distribution is even and odd, respectively, which causes the integral always cancels itself. The last term is the dipole self-energy. In a similar fashion as above for the monopole, it can be shown as ∫ 𝑑𝑟⃑ G= 𝜌 vTpk]I 𝜑 vTpk]I = h (s%%⃑ M Op⃑ M )7√=6 A.3 Force derivation

We precede in two steps, covalent-covalent interactions and induced interactions as shown in section 2.3. First, we consider interaction energies due to covalent multipoles interacting with covalent multipoles. The system can be split into two groups of atoms, bonded atoms and nonbonded atoms. Bonded group are those atoms bonded to the atom that are being considered (including itself), and the nonbonded group are the rest. In the bonded group, the atoms can be further split into two subgroups: the atom that is currently under consideration, termed the bonded-moving atom below; the other atoms in the bonded group are termed bonded-non-moving atoms. A total of three groups of atoms can be classified. Thus, we can rewrite the covalent-covalent interaction energy as the following four parts.

1) Nonbonded atoms interacting with bonded-non-moving atoms,

𝑈 = ¥ ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 Tl 𝑅 Tl )𝑅 Tl)k)˘k)vIvl˘k)vIvt)k)t(cid:141)kUT)˙T

2) Nonbonded atoms interacting with bonded-moving atom i , 𝑈 = ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 Tl 𝑅 Tl )𝑅 Tl)k)˘k)vIvl

3) Bonded-non-moving atoms interacting with bonded-non-moving atoms,

𝑈 = 12 ¥ ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 Tl 𝑅 Tl )𝑅 Tl˘k)vIvt)k)t(cid:141)kUT)˙lmT˘k)vIvt)k)t(cid:141)kUT)˙T

4) Bonded-non-moving atoms interacting with bonded-moving atom i , 𝑈 = ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 Tl 𝑅 Tl )𝑅 Tl˘k)vIvt)k)t(cid:141)kUT)˙l

Apparently, there should be a fifth part of interaction energy, nonbonded atoms interacting with nonbonded atoms. However, this part of energy does not change in the force calculation, so we omit its expression here. Of course, atom i ’s self-interaction is also ignored as discussed in the text. Next force on bonded-moving atom ( i ) can be derived as the negative gradient of the above energy terms. When computing the gradient, we should keep in mind that nothing varies on the nonbonded atoms, only the dipole directions vary on the bonded-non-moving atoms, and both dipole directions and positions of the bonded-moving atoms vary. The above four energy parts thus lead to the following four force components, respectively, (𝐹⃑ T ) G = − ¥ ¥ ∇ T (𝜇⃑ z ) ∙ ∇ z (𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 zl 𝑅 zl )𝑅 zl)k)˘k)vIvl˘k)vIvt)k)t(cid:141)kUT)˙z (𝐹⃑ T ) = = − ¥ ∇ T (𝜇⃑ T ) ∙ ∇ T (𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)k)˘k)vIvl − ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )+𝑞 l + 𝜇⃑ l ∙ ∇ l .∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)k)˘k)vIvl (𝐹⃑ T ) = − ¥ ¥ ∇ T (𝜇⃑ z ) ∙ ∇ z (𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf (𝛽 zl 𝑅 zl )𝑅 zl˘k)vIvt)k)t(cid:141)kUT)˙lmz˘k)vIvt)k)t(cid:141)kUT)˙z (𝐹⃑ T ) ¡ = − ¥ ∇ T (𝜇⃑ T ) ∙ ∇ T (𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf+𝛽 Tl 𝑅 Tl .𝑅 Tl˘k)vIvt)k)t(cid:141)kUT)˙l − ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T (𝜇⃑ l ) ∙ ∇ l erf+𝛽 Tl 𝑅 Tl .𝑅 Tl˘k)vIvt)k)t(cid:141)kUT)˙l − ¥ (𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l )∇ T erf (𝛽 Tl 𝑅 Tl )𝑅 Tl˘k)vIvt)k)t(cid:141)kUT)˙l

Summing up all four components, the final force expression is, 𝐹⃑ T = − ¥ ¥ ∇ T (𝜇⃑ z ) ∙ ∇ z (𝑞 l + 𝜇⃑ l ∙ ∇ l ) erf+𝛽 zl 𝑅 zl .𝑅 zlulmz)z − ¥(𝑞 T + 𝜇⃑ T ∙ ∇ T )(𝑞 l + 𝜇⃑ l ∙ ∇ l )∇ T erf (𝛽 Tl 𝑅 Tl )𝑅 TlulmT

Here, n and N follows the same notation as section 2.3, number of atoms in the bonded group and the system, respectively. Second, we consider energies caused by induced dipoles. As stated before, the induced energy contains three parts, induced dipoles interacting with covalent multipoles, induced dipoles interacting with induced dipoles, and induced dipole self-energy G= 𝑝⃑ ∙ 𝐸%⃑ . From section 2.3, we know that the total induced energy is,

12 ¥ −𝑝⃑

TuT ∙ 𝐸%⃑ TB where 𝐸%⃑ TB is the electric field on atom i only by covalent multipoles. The induced dipoles are determined by the total electric field, 𝑝⃑ T = 𝛼 T 𝐸%⃑ T = 𝛼 T (𝐸%⃑ TB − ¥ 𝑇%⃑%⃑ Tl ∙ 𝑝⃑ lulmT ) 𝑇%⃑%⃑ Tl = ∇ T ∇ l erf (𝛽 Tl 𝑅 Tl )𝑅 Tl Changing the above expressions into the component format, and applying the Einstein’s index notation, we obtain 𝑝 Tx = 𝛼 T (𝐸 TB,x − 𝑇

TlxS 𝑝 lS ) Rearrangement leads to (cid:152) 1𝛼 l 𝛿 TlxS + 𝑇

TlxS (cid:156) 𝑝 lS = 𝐴 TlxS 𝑝 lS = 𝐸 TB,x

If we assume 𝐴 TlxS is inversible, and its inverse matrix is 𝐴 tGTlxS , we have 𝑝 Tx = 𝐴 tGTlxS 𝐸 lB,S 𝐴 tGTlxS is a

3𝑁 × 3𝑁 matrix, symmetrical for both atom index and component index, 𝐴 tGTlxS = 𝐴 tGTlSx = 𝐴 tGlTxS The gradient of 𝐴 tG is, 𝜕𝐴 tGTlxS 𝜕𝑥 z(cid:201) = −𝐴 tGTT (cid:134) xx (cid:134) 𝜕𝐴 T (cid:134) l (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 z(cid:201) 𝐴 tGT (cid:134) lx (cid:134) S = −𝐴 tGTT (cid:134) xx (cid:134) 𝜕𝑇 T (cid:134) l (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 z(cid:201) 𝐴 tGl (cid:134) lS (cid:134) S where w refers to coordinate indices ( x , x , x ). It is obvious that 𝑖 ˚ or 𝑗 ˚ has to be equal to k for T to have a nonzero value. We have 𝜕𝑇 T (cid:134) l (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 z(cid:201) = 𝜕𝑇 zl (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 z(cid:201) + 𝜕𝑇 T (cid:134) zx (cid:134) S (cid:134) 𝜕𝑥 z(cid:201) Based on above relations, the force expressed as the negative gradient of the induced energy is 𝐹 T(cid:201) = 𝜕𝜕𝑥

T(cid:201) b12 𝑝⃑ l ∙ 𝐸%⃑ lB f = 𝜕𝜕𝑥 T(cid:201) b12 𝐴 tGlzxS 𝐸 lB,x 𝐸 zB,S f = 12 𝜕𝐴 tGlzxS 𝜕𝑥 T(cid:201) 𝐸 lB,x 𝐸 zB,S + 𝐴 tGlzxS 𝐸 lB,x 𝜕𝐸 zB,S 𝜕𝑥 T(cid:201) = − 12 𝐴 tGll (cid:134) xx (cid:134) 𝜕𝑇 l (cid:134) z (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 T(cid:201) 𝐴 tGz (cid:134) zS (cid:134) S 𝐸 lB,x 𝐸 zB,S + 𝑝 zS 𝜕𝐸 zB,S 𝜕𝑥 T(cid:201) = − 12 𝑝 l (cid:134) x (cid:134) 𝜕𝑇 l (cid:134) z (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 T(cid:201) 𝑝 z (cid:134) S (cid:134) + 𝑝 zS 𝜕𝐸 zB,S 𝜕𝑥 T(cid:201) = −𝑝 Tx (cid:134) 𝜕𝑇 Tz (cid:134) x (cid:134) S (cid:134) 𝜕𝑥 T(cid:201) 𝑝 z (cid:134) S (cid:134) + 𝑝 zS 𝜕𝐸 zB,S 𝜕𝑥 T(cid:201)

Rewriting the above component format into the vector/tensor format, we have 𝐹⃑ T = − ¥ 𝑝⃑ T ∙ ∇ T +𝑝⃑ l ∙ ∇ l .∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 TlulmT + ¥ ∇ T (𝐸%⃑ lB ) ∙ 𝑝⃑ lul The next step is to evaluate ∇ T (𝐸%⃑ lB ) . Following the similar strategy used in covalent-covalent interactions, we split the system into two groups: non-moving atoms and moving atom (i.e. atom i ). When computing the derivative of field on a non-moving atom j , it is worth pointing out that other nonbonded non-moving atoms are not influenced by the virtual displacement of atom i , so only bonded non-moving atoms are considered below. ∇ T +𝐸%⃑ lB . = ∇ T ¸ ¥ −∇ l (𝑞 z + 𝜇⃑ z ∙ ∇ z ) erf+𝛽 zl 𝑅 zl .𝑅 zl˘k)vIvt)k)t(cid:141)kUT)˙zml − ∇ l (𝑞 T + 𝜇⃑ T ∙ ∇ T ) erf+𝛽 Tl 𝑅 Tl .𝑅 Tl (cid:204)= ¥ −∇ T (𝜇⃑ z ) ∙ ∇ z ∇ l erf+𝛽 zl 𝑅 zl .𝑅 zl˘k)vIvt)k)t(cid:141)kUT)˙zml − ∇ T (𝜇⃑ T ) ∙ ∇ T ∇ l erf+𝛽 Tl 𝑅 Tl .𝑅 Tl − (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T ∇ l erf+𝛽 Tl 𝑅 Tl .𝑅 Tl = ¥ −∇ T (𝜇⃑ z ) ∙ ∇ z ∇ l erf+𝛽 zl 𝑅 zl .𝑅 zl)zml − (𝑞 T + 𝜇⃑ T ∙ ∇ T )∇ T ∇ l erf+𝛽 Tl 𝑅 Tl .𝑅 Tl Here n represents the number of atoms in the bonded group, including atom i . Next we compute the derivative of field on the moving atom, i.e. atom i , as follows ∇ T +𝐸%⃑ TB . = ∇ T ¸¥ −∇ T +𝑞 l + 𝜇⃑ l ∙ ∇ l . erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)lmT − ¥ ∇ T +𝑞 l + 𝜇⃑ l ∙ ∇ l . erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)k)˘k)vIvl (cid:204)= ¥ −∇ T (𝜇⃑ l ) ∙ ∇ l ∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)lmT − ¥+𝑞 l + 𝜇⃑ l ∙ ∇ l .∇ T ∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)lmT − ¥ +𝑞 l + 𝜇⃑ l ∙ ∇ l .∇ T ∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)k)˘k)vIvlmT = − ¥ ∇ T +𝜇⃑ l . ∙ ∇ l ∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 Tl)lmT − ¥+𝑞 l + 𝜇⃑ l ∙ ∇ l .∇ T ∇ T erf+𝛽 Tl 𝑅 Tl .𝑅 TlulmT