[PDF] The Truncated Conjugate Gradient (TCG), a Non-iterative/Fixed-cost Strategy for Computing Polarization in Molecular Dynamics: Fast Evaluation of Analytical Forces

Abstract

Full PDF

aa r X i v : . [ phy s i c s . c h e m - ph ] A ug The Truncated Conjugate Gradient (TCG), aNon-iterative/Fixed-cost Strategy for ComputingPolarization in Molecular Dynamics: FastEvaluation of Analytical Forces

F´elix Aviat, † Louis Lagard`ere, ∗ , † , § and Jean-Philip Piquemal ∗ , † , ‡ , ¶ † Sorbonne Universit´es, UPMC Univ. Paris 06, UMR7616, Laboratoire de ChimieTh´eorique, F-75005, Paris, France ‡ Institut Universitaire de France,Paris Cedex 05, 75231, France ¶ Department of Biomedical Engineering, The University of Texas at Austin,Austin, Texas,78712, United States § Sorbonne Universit´es, UPMC Univ. Paris 06, Institut des Sciences du Calcul et desDonn´ees, F-75005, Paris, France

E-mail: [email protected]; [email protected]

Abstract

In a recent paper (J. Chem. Theory. Comput., 2017, 13, 180-190) we proposed theTruncated Conjugate Gradient (TCG) approach to compute the polarization energyand forces in polarizable molecular simulations. The method consists in truncatingthe Conjugate Gradient algorithm at a ﬁxed predetermined order leading to a ﬁxedcomputational cost and can thus be considered ”non-iterative”. This gives the pos-sibility to derive analytical forces avoiding the usual energy conservation ( i.e. drifts)issues occurring with iterative approaches. A key point concerns the evaluation of the nalytical gradients, which is more complex than with an usual solver. In this paper,after reviewing the present state of the art of polarization solvers, we detail a viablestrategy for the eﬃcient implementation of the TCG gradients calculation. The com-plete cost of the approach is then mesured as it is tested using a multi-timestep schemeand compared to timings using usual iterative approaches. We show that the TCGmethods is more eﬃcient than traditional techniques, making it a method of choice forfuture long molecular dynamics simulations using polarizable force ﬁelds where energyconservation matters. We detail the various steps required for the implementation ofthe complete method by software developers. Introduction

Polarizable force ﬁelds simulations using point dipoles models are not slow anymore. Indeed,in recent years, the computational cost of the explicit evaluation of the many-body polar-ization energy and associated forces has been signiﬁcantly reduced using state of the artmathematical techniques. More precisely, the bottleneck of such approaches is the manda-tory resolution of a large set of linear equations ( i.e. requiring a matrix inversion) whose sizedepends on the number of polarizable sites, which is very large in practice (for example upto several tens of thousand of atoms for medium sized proteins in water). Therefore, directmatrix inversion approaches are unfeasible and one has to resort to iterative methods suchas the Preconditioned Conjugate Gradient (PCG) or the Jacobi/Direct Inversion of the Iter-ative Subspace (JI/DIIS). Both methods have the advantages to ensure convergence and tobe compatible with a massively parallel implementation coupled to Smooth Particle MeshEwald (SPME), enabling the possibility to tackle large systems of interest that range frommaterials to biophysics. However, iterative techniques have to address two aspects simulta-neously: a low computational cost and a high accuracy on both energy and forces. But thestandard way of computing the forces assumes that the dipoles are fully converged and thusthese forces are not the exact opposite of the gradient of the polarization energy. This means2hat to avoid energy drifts, users have to enforce the quality of the non-analytical forces bychoosing a tighter convergence criterion of 10 –5 to 10 –8 Debye for the dipoles, leading to astrong increase of the number of iterations required to reach convergence. This degrades thecomputational eﬃciency of the solvers, limiting the use of molecular dynamics with polariz-able force ﬁelds. In that context, several strategies have been explored to prevent this driftwhile ensuring accurate results and a low computational overhead.In this paper, we review the present status of the polarization solvers before introducingthe Truncated Conjugate Gradient, a method introduced in ref. 4 to propose an eﬃcientsolution to these challenges. We then address the issue of the fast computation of theanalytical gradients for TCG by presenting a general way to formulate the TCG polarizationforces. Analytical formulas are given for the TCG1 and the TCG2 methods, as well as fortheir reﬁnements with the use of a preconditioner and peek steps. Indeed as a preconditionerimproves the convergence of the polarization computation, a peek step allows to perform aadditional but inexpensive Jacobi/Picard pseudo-iteration that does not requires any matrix-vector product as it uses the available residual obtained from the TCG process. Finally,timings to compute these forces in a production context of a Respa integrator are givenand compared to the ones obtained with standard iterative solvers and diﬀerent level ofconvergence as well as diﬀerent predictor guesses for these solvers.

Polarization solvers: present status

Several iterative solvers applied to the polarization equations have been presented and tested,such as the Jacobi Over Relaxation method (JOR), the (Preconditioned) Conjugate Gradi-ent, the Jacobi/DIIS method (see references 1 and 2) or the recently introduced potentiallyfaster Divide and Conquer block-Jacobi/DIIS method. Considering an iterative solver, several techniques can be used to reduce the computa-tional cost to reach convergence by reducing the number of necessary iteration to do so. In3he context of Krylov methods such as the Conjugate Gradient, it is for example possible touse a preconditioner. It consists in choosing a matrix P such that P − is close to T − (where T is the polarization matrix to be inverted, presented in the third section of the paper) andin applying the iterative method to the modiﬁed linear system where the matrix and theright hand side are multiplied by P − . The convergence of the solver is then acceleratedbecause of the clustering of the eigenvalues of the matrix P − T . Eﬃcient preconditionersfor the polarization equations have been designed, such as the ones proposed by Wang andSkeel which provide a reduction in the number of iteration to reach convergence up to 10to 20 percent, depending on the system ( i.e. on the condition number of the matrix thatone needs to invert).Another way to improve convergence of an iterative solver is to chose an initial ”predictor”guess as close as possible to the actual solution of the linear equations. This guess can beconstructed using information from one or a few of the past values of the dipoles. The mostnaive way to do so is to chose the value of the dipoles at the previous timestep (previousguess) but more elaborate and eﬃcient strategies have been designed such as Kolafa’s AlwaysStable Predictor Corrector (ASPC) or Skeel’s Least Square Predictor Corrector (LSPC), that can reduce the number of iterations required to reach convergence up to a factor two in astandard production context . Nevertheless, these two ways to construct initial guesses losetheir eﬃciency when one uses larger timesteps, as it the case with the RESPA (Reversiblereference System Propagator Algorithm) multiple timestep integrator (instabilities occurwhen such predictors are used with time steps larger than 2 fs).Note that the two reﬁnements (preconditioning and choosing the initial guess of the solverwisely) can be coupled without problem.In the same spirit, it is also possible to speed up convergence by introducing an extendedLagrangian scheme to propagate a set of dipoles that are used as initial guess to standarditerative solvers (iEL/SCF or Extended Lagrangian Self-Consistent Field, see ref. 9). Thisapproach, derived from ab initio MD, signiﬁcantly reduces the number of iterations of the4olver (by the same order of magnitude as the ASPC predictor) but requires to use anadditional thermostat in order to prevent energy ﬂows between the degrees of freedom.However, whatever the diﬀerent speedups strategies applied of the popular iterative pro-duction methods such as PCG or JI/DIIS, they still suﬀer from an important drawback in linkto the way the associated forces are computed. Indeed, they do not address the polarizationenergy drifting issues that will be encountered in long simulations of large non-homogeneouscomplexes, such as proteins in water or highly charged ionic liquids. In such case, the math-ematical problem, i.e. the matrix inversion, is costlier to solve as the polarization matrixitself is worse conditioned than in simple bulk water. Therefore, to ensure stability of verylong timescale simulations towards microseconds where errors accumulate, they should allemploy a tighter dipole convergence criterion (10 − to 10 − D) leading to a higher numberof iterations than usually discussed in benchmarks for short simulations, where the 10 − Dstandard is employed, eﬀectively causing really degraded real life performances.Another set of methods address this issue by considering analytical formulas for thepolarization energy.The ﬁrst idea in that direction was introduced by Wang and Skeel, who used Chebyshevpolynomials to get analytical expressions of the polarization energy and its derivatives, whichautomatically ensures that the source of the energy drift previously evoked is removed. Un-fortunately, the approach provided energy surfaces that were too far from the ones obtainedwith tightly converged iterative method and was thus not further investigated. Signiﬁcantprogresses were recently made in the same direction by Simmonett et al. who proposeda revisitation of Wang’s proposal through the ExPT (Extrapolated Perturbation Theory)perturbation approach, which is equivalent to the truncation of the Jacobi iterative methodat a predetermined order combined with the use of a few parameters.If the parametric aspect of their approach initially limited its global applicability toany type of systems, the authors recently improved their method which is now denotedOPT3 (OPT=Orders of Perturbation Theory) by pushing it to higher order of perturbation5nd providing a systematic way for the parametrization, extending the applicability of themethod. One advantage of the approach is its reduced cost compared to the best iterativeapproaches.Alternatively, one can also consider the actual induced dipoles as new degrees of freedomand build an extended Lagrangian deﬁning the way to propagate them during the dynamicswithout any SCF cycles . The ﬁrst results using this strategy are promising and themethod indeed does not require any iteration. On the performance side, one could argue thatusing a production PCG solver with a 10 − D convergence threshold, a RESPA integratorwith a 2 fs time step for the non bonded forces coupled to Kolafa’s ASPC is twice fasterthan the sequential iEL/0-SCF method with a 1 fs time step . Nevertheless, this PCGspeed advantage is only ”apparent” as it does not solve the energy drift issue for long timescales whereas the iEl/0-SCF method has been shown to have improved energy conservationproperties. This nice improvement is due to the use of thermostats and therefore, iEL/0-SCunfortunately suﬀers from the drawbacks of any extended Lagrangian approach that cannot use time steps larger than 1 fs. As we stated before, if iterative methods do not haveany theoretical upper limit to the time step they can be used with, it requires not to useinformation from the past such as predictor-correctors, removing such speed advantage whenusing RESPA.As we see from this discussion, the question of which method to adopt is complex as itappears diﬃcult to combine all possible improvements.In fact, we can state that reducing the computational cost of an iterative method tocompute the polarization energy and forces always come with degraded energy conservation.Energy conservation is tricky as it depends on the chemical nature of the system (chargedor not, homogeneous or not). For example, polarization of bulk water systems requires lessiterations to converge with PCG solvers. On the other hand, the ExPT method behavespoorly for the ionic liquid system that will be studied in section 4 and the Jacobi methoddoes not even converge in that case . 6 major diﬃculty to compute the polarization energy and its gradient for future mi-croseconds simulations is to oﬀer a non-empirical strategy applicable to any kind of systems,embodying the following properties.Indeed, such a method should be systematically improvable in order to allow the userto set the accuracy of the simulation depending on its goal. For example, the simple Ja-cobi method has been shown not to converge in several cases and adding iterations wouldnot improve the results. It should show good conservation of the total energy during amicrocanonical simulation, ensuring good accuracy on the forces driving the dynamics. Itshould also be non parametric to provide a close reproduction of any type of potential en-ergy surfaces, without having to resort to force-ﬁeld models reparametrization. In practice,a polarization scheme should also be aﬀordable with a computational cost as reduced aspossible. It should allow to use larger time steps through multiple timestep schemes suchas RESPA. In the end, the selected criterion to compare computational eﬃciencies of thevarious schemes should be the global cost of computing both energy and derivatives withsimilar energy conservation capabilities for a given trajectory length. TCG : context

To address all these required features we recently introduced a non-empirical and non-iterative strategy denoted the Truncated Conjugate Gradient (TCG). TCG is derived byexplicitly writing down all numerical operations of a ﬁnite number of Conjugate Gradient cy-cles of iteration which can be user-chosen (be TCG-n, n=1,3). As the number of operationsin the TCG approach is ﬁxed once and for all, it is possible to derive an exact analyticalexpression of the gradient of the energy like in ExPT/OPT3, avoiding by construction anyenergy drift in microcanonical simulations and thus ensuring energy conservation in thatcontext. The higher the TCG level is, the higher its accuracy is, as TCG inherits from theproperties of the Conjugate Gradient and beneﬁts from the fact that it is a Krylov method7n which the associated error is monotonically reduced at each iteration. It can be shownin that context that the CG-method is mathematically optimal, meaning that it minimizesexactly the polarization energy on the so-called Krylov subspaces at each iteration and there-fore guarantees that the number of the required matrix-vector products (1 per iteration inany iterative approach) are reduced to a minimum compared to other iterative methods.Moreover, the TCG accuracy can be improved at negligible costs ( i.e. without any addi-tional matrix-vector product): (i) by using preconditioners as presented above leading to theTruncated Preconditioned Conjugate Gradient (TPCG); (ii) by using the residue of the ﬁnalCG step, available without any additional cost, to perform an additional “peek” iteration,equivalent to one step of Jacobi Over Relaxation (JOR) with a relaxation parameter whichcan be found adaptively.Overall, the TCG approach was found to accurately reproduce energy surfaces at a re-duced computational cost providing analytical forces. As it does not rely on history, it doesnot suﬀer from MD perturbations such as the ones arising when predictor guesses, whichbreak the time-reversibility of the simulation, are used in polarization solvers. It is for thesame reasons compatible with the use of large timestep with multi-timesteps integrators.Also, being based on the Conjugate Gradient and thus relying essentially on matrix vectorproducts and computation of electric ﬁelds, it can replace standard solvers in a regular im-plementation including linear scaling ones using Smooth Particle Mesh Ewald. Furthermore,it does not require additional advanced thermostating nor any additional parameter.The purpose of this paper is to address one delicate point which is the main bottleneckof the TCG method: the complex derivation of its gradients. If TCG answers all the desireddiscussed properties for a polarization solver, a naive derivation of the energy gradientscan lead to an undesired additional computational cost, while the method should remainanalytical, accurate but cheap as well. The goal here is to detail a strategy enabling a fastcomputation of the analytical gradients that would allow developers to eﬃciently implementthe TCG approach in the software of their choice. We will ﬁrst present the technical aspect8f TCG and its notations, then we will detail the optimal computation of gradients in a formthat could be implemented by developers. TCG : notations

We will place ourselves in the context of the AMOEBA force ﬁeld and consider a system of N atoms, each embodying a multipole expansion (up to quadrupoles) as permanent chargedensity and a polarizability tensor α i . We will denote E as the 3 N vector gathering allelectric ﬁelds ~E i created by the permanent charge density at atomic position i , and µ is theequivalent 3 N vector gathering the induced dipoles experienced at each atomic site. T is the3 N × N polarization matrix, deﬁned by block as follows. It bears the 3 × α i along its diagonal block, and the interaction between the i th and j th dipole isrepresented as the T ij tensor. T =  α − − T − T . . . − T N − T α − − T . . . − T N − T − T . . .... ... ... − T N − T N . . . α − N  This matrix is symmetric and positive deﬁnite. Thanks to the Thole damping of theelectric ﬁeld at short range, any polarization catastrophe is prevented. Indeed, the Tholedamping acts on the eigenvalues. Without Thole, negative eigenvalues could be found whichis a problem for Conjugate Gradient methods. Using these notations, the total polarization energy can be expressed as follows : E pol = 12 µ T T µ − µ T E (1)where µ T E represents the scalar product of vectors µ and E (also noted h µ , E i ). One can9asily see that the dipole vector µ minimizing (1) veriﬁes the following linear system: T µ = E (2)giving the minimized polarization energy: E pol = − µ T E (3)As explained earlier, the TCG method that we use to solve this equation, derives fromthe Conjugate Gradient algorithm. It uses three vectors upon starting : the guess µ , theinitial residual r = T µ − E , and an initial descent direction p that we set to be equal to r . It reads as follows:  γ i = r T i r i p Ti Tp i µ i +1 = µ i + γ i p i r i +1 = r i − γ i Tp i β i +1 = r Ti +1 r i +1 r Ti r i p i +1 = r i +1 + β i +1 p i (4)Instead of using a convergence criterion as a condition to stop iterating, as this is usuallydone, one can choose to arbitrarily ﬁx the number of iterations and to unfold a ﬁnite num-ber of computational operations that makes it ﬁxed cost and non-iterative, as explainedabove. This deﬁnes our Truncated Conjugate Gradient (TCG) method. Besides the obviousadvantage of drastically reducing the computational cost of each induced polarization cal-culation, it allows one to simulate perfectly stable molecular dynamics, without drift overtime, as explained in Ref. 4. This advantage is not limited to MD and could be exploited inMonte-Carlo simulations. 10he exact, total derivative of the energy with respect to the nuclear position should be:E. pol r. i = ∂E pol ∂ µ ∂ µ ∂r i + ∂E pol ∂r i (5)When using an iterative method, the provided solution µ is inexact (approached only),thus the energy is not perfectly minimized with respect to the dipoles (the term ∂E pol /∂ µ is not zero). One usually still makes this erroneous assumption, giving E. pol / r. i = ∂E pol /∂r i .This leads to computing forces that do not perfectly correspond to the system, and thus toan unavoidable drift in the subsequent simulations.If one ﬁxes the number of iterations, it is however possible to ”unroll” the analyticalformula for the ﬁnal polarization vector, expressed as a function of the starting quantities( µ , r ). Noting µ TCG n this vector, with n the truncation order ( i.e the number of iterationsof the algorithm), one obtains the TCG n family of methods that reads up to order three: µ T CG = µ + t r (6) µ T CG = µ + ( γ t + t ) r − γ t P (7) µ T CG = µ + ( t + γ t + γ + γ β t ) r − ( γ t + γ t + γ β t ) P − γ γ P (8)All quantities used in the previous equations are deﬁned in the Appendix. In practice,we showed that one could stop as the TCG2 level, as it is accurate enough. Fast computation of the gradients

In this section, we ﬁrst explain that computing the gradients of the energy, even though ananalytical expression is at our disposal, is not straightforward. We then show how to passthe diﬀerent hurdles encountered.Having the analytical, exact expression of the dipoles allows one to diﬀerentiate them inan equally exact manner. A formal diﬀerentiation, with a prime ” ′ ” denoting it, would give11or the ﬁrst two orders: µ ′ T CG = µ ′ + t r ′ + t ′ r (9) µ ′ T CG = µ ′ + ( t + γ t ) r ′ + ( t ′ + γ ′ t + γ t ′ ) r + γ ′ t P + γ t ′ P + γ t P ′ (10)However, the diﬀerentiation of a 3 N vector with respect to 3 N spatial coordinates wouldbuild a 3 N × N matrix. This leads to three obstacles that slow down the gradient compu-tation : • ﬁrstly, a scalar product of one such derivative A ′ with another vector B would leadto a (3 N ) operation, which is a non-negligible cost, repeated for all products of this < A ′ , B > form. • Secondly, these products, when using the analytical expressions (equations 9 and 10)”as is”, are repeated an unnecessary number of times, eﬀectively making this slow-down a pure stop. • Thirdly, one can see that there are two types of vectors building µ TCG n : the electricﬁeld E , but also the product of the residue with successive powers of the polarizationmatrix ( r , Tr = P , more generally T m r , with m an integer). Diﬀerentiating T m r exhibits, amongst others, a T p T ′ T q r term (with p and q two integers verifying p + q + 1 = m ); computing such a T . T ′ A product is equivalent to a matrix-matrixproduct, which is also computationally too expensive.This makes a naive implementation of our method eﬀectively unusable. Yet to run a classicalsimulation, one needs the forces , i.e. the gradients of the polarization energy, rather thanthe derivatives of the dipoles themselves. What one really needs is thus the derivative of thefollowing scalar product : E pol = 12 < E , µ TCG n > (11)12hat is, formally, E ′ pol = 12 < E ′ , µ TCG n > + 12 < E , µ ′ TCG n > (12)Firstly, developing eq. (12) shows all scalar products involved involve a diﬀerentiatedquantity : either a diﬀerentiated matrix (like < A , T ′ B > ), or the derivative of the ﬁelditself ( E ′ ). An analogy, or dimensional analysis, allows us to compare these terms to forces,with < A , E ′ > corresponding to a force produced by the interaction of the dipoles A withthe electric ﬁeld, and < B , T ′ C > to a force arising from the interaction between two sets ofdipoles B and C . The expensive part of computing such quantities lies in the calculation ofdistances. All of these forces can be computed in a single double loop (whose cost is a O ( N )for direct calculations, and O ( N log N ) when using SPME) to minimize the computationalcost and compute the said distances only once. This adresses the ﬁrst hurdle evoked earlier.We can also reorganize the gradient computation in order to minimize the number ofthe expensive scalar products involving a vector and a diﬀerentiated vector, by grouping allthese scalar products and performing them all at once (given three vectors A , B and C , ifone needs to compute < A , B ′ > + < C , B ′ > , it is much more eﬃcient to ﬁrst prepare avector D = A + C and then to compute < D , B ′ > ). This optimization, though quite simplein principle, actually requires quite involved expressions (see Annex). It is a simple solutionto the second obstacle we listed.Thirdly, since T is a symmetric matrix, we have < TA , B > = < A , TB > for any twovectors A and B . In particular, for our generic vectors T m r , < T p T ′ T q r , A > = < T ′ T q r , T p A > (13)Considering scalar products thus allows us to get rid of the matrix-matrix ( T . T ′ ) products,our third hurdle.Overall, the solution to overcome our obstacles came from considering the polarization13nergy instead of the induced dipole themselves.To illustrate our solution, one can write the analytical formulas as follows, for the TCGat order one and two respectively : (14) E ′ pol, TCG1 = 12 (cid:16) h r ′ , a (1)1 , E + a (1)1 , r + a (1)1 , Tr i + h T ′ r , a (1)2 , r i (cid:17) (15) E ′ pol, TCG2 = 12 (cid:16) h E ′ , µ TCG2 i + h µ ′ , E i + h r ′ , a (2)1 , E + a (2)1 , − TE + a (2)1 , r + a (2)1 , Tr + a (2)1 , T r + a (2)1 , T r i + h T ′ r , a (2)2 , E + a (2)2 , r + a (2)2 , Tr + a (2)2 , T r i + h T ′ Tr , a (2)3 , r + a (2)3 , Tr i + h T ′ T r , a (2)4 , r i (cid:17) where the coeﬃcients a ( k ) i,j are the result of the cumbersome derivation evoked earlier ; theirexplicit expression can be found in the Annex.As stated earlier in this paper, the so-called peek-step is a supplementary JOR iterationbased on the last obtained residual r n . It simply improves the solution to reach the followingexpression : µ ( peek )TCG n = µ TCG n + ω α r n (16) α is the relaxation parameter mentioned earlier; more precisions on its choice can be foundin ref. 4 . Deﬁning µ peek, TCG n = ω α r n , the supplementary contribution of the peek step canbe also written as follows : (17) E ′ peek, TCG1 = h µ peek, TCG1 , E ′ i + h r ′ , a (1 ,p )1 ,α α E + a (1 ,p )1 , α T α E + a (1 ,p )1 , r + a (1 ,p )1 , Tr i + h T ′ r , a (1 ,p )2 , r + a (1 ,p )2 ,α α E i E ′ peek, TCG2 = h µ peek, TCG2 , E ′ i + h r ′ , a (2 ,p )1 , α α E + a (2 ,p )1 , α T α E + a (2 ,p )1 , α T α E + a (2 ,p )1 , r + a (2 ,p )1 , Tr + a (2 ,p )1 , T r + a (2 ,p )1 , T r i + h T ′ r , a (2 ,p )2 ,α α E + a (2 ,p )2 , α T α E + a (2 ,p )2 , r + a (2 ,p )2 , Tr + a (2 ,p )2 , T r i + h T ′ Tr , a (2 ,p )3 ,α α E + a (2 ,p )3 , r + a (2 ,p )3 , Tr i + h T ′ T r , a (2 ,p )4 , r i (18)14the coeﬃcients a ( k,peek ) i,j , as well as an explicit formula for the µ peek vectors, are repro-duced in the Annex). One should then simply sum the corresponding terms to obtain theﬁnal expression for the polarization energy gradients in a computationally feasible way : forexample, the scalar product h r ′ , r i should now be multiplicated by coeﬃcient a (1)1 , + a (1 ,p )1 , to get the correct gradients for TCG1.All these formulas have been tested and validated against gradients obtained via ﬁnitediﬀerences. Such details could be useful to allow anyone to implement the fast evaluationof the forces necessary to the use of TCG. The source code of this method will be freelyavailable in Tinker-HP version 1.1. To sum up, the implementation of the gradients calculation that we propose here followsthese three steps : ﬁrstly, we compute the successive matrix-vector products to build thesuccessive T m r vectors needed; secondly we perform the various scalar products appearingin our analytical formulas, allowing us to assemble (through weighted sums) a second setof vectors; ﬁnally, we perform simultaneously on all these assembled vectors a ”force-like”calculation. The choice to use – or not – a peek step only changes the assembled vectors onstep two, through an extra set of coeﬃcients as presented above. Numerical results

In this section, we report the timings of the implementation presented above for diﬀerentsystems as it has been added to the software Tinker-HP . More precisely, we report thecost of the calculation of the polarization energy and the associated forces with diﬀerentmethods: a standard diagonally preconditioned conjugate gradient (PCG) with a 10 − Dconvergence threshold, the same method with a tighter 10 − D convergence threshold (thatensures energy conservation as explained above) and the TPCG1 and the TPCG2 methodswith the ”direct ﬁeld” α E as guess µ with a Jacobi peek step ( ω =1). For the two PCGsolver settings the average number of iterations is also reported in parenthesis. Note that15he computational cost of these two methods would be the same with any other kind ofpeek steps whose cost is negligible, as described in ref. 4 . For the PCG solvers, we reporttimings using the simple ”direct ﬁeld” as a guess (noted ”PCG (10 − x D )” in the table) andalso timings using the ASPC predictor (noted ”PCG (10 − x D , ASPC)” ). These methodsare timed in the nowadays standard context of the RESPA integrator used with a 2 fs timestep for the non bonded forces.The systems that are tested here are the same than in our previous work: three solvatedprotein droplets (the HIV nucleocapsid ncp7 made of 18518 atoms, the ubiquitin made of9737 atoms and the dihydrofolate reductase dhfr with 23558 atoms) and an ionic liquid,the dimethyl-imidazolium [dmim+][Cl-] (3672 atoms). No boundary conditions are used inthese tests, therefore, each matrix-vector product and force computation involved in the PCGsolvers and in the TCG formulas has a O ( N ) computational cost. However, these matrix-vector products can be easily re-expressed following the possible choices for the boundaryconditions that will give rise to slightly diﬀerent forms of the polarization matrix. For exam-ple, TCG being really close to PCG, it can either be applied in the context of the ParticleMesh Ewald method with a O ( N lnN ) cost, or using the Fast Multipole summation tech-nique with a O ( N ) cost. These operations are by far the costliest in the computation ofthe dipoles and of the polarization forces. This is why we report the timings as their pro-portional cost compared to the PCG solver with a convergence threshold of 10 − D and thedirect ﬁeld as a guess, as these proportions would be the same when using other boundaryconditions. We chose these settings to be our reference.All these (sequential) timings were obtained on an HP 620 Workstation made of IntelXeon E5-2665 CPUs at 2.4 Ghz and were averaged over 100 ps of NVT trajectories at 300 Kfor the protein droplets and at 425 K for the ionic liquid.We observe that both the TPCG methods are signiﬁcantly faster compared to standardproduction settings (10-5D). Compared to more strict settings using a convergence criterionof 10 − D for the PCG solver, which guarantees energy conservation during the MD simula-16able 1: Average time for the computation of the polarization energy and the associ-ated forces for diﬀerent methods, using the PCG converged at 10 − D as reference, for aRESPA(2fs) timestep. In parenthesis, mean number of iterations needed.ubiquitin ncp7 dhfr [dmim+][Cl-]PCG (10 − D) 100% (8) 100% (8) 100% (8) 100% (8)PCG (10 − D, ASPC) 88% (6) 85% (6) 88% (6) 84% (5)PCG (10 − D) 136% (15) 138% (15) 143% (16) 138% (15)PCG (10 − D, ASPC) 125% (13) 127% (13) 125% (13) 117% (12)TPCG1 43% 43% 44% 44%TPCG2 61% 62% 63% 63%tion, diﬀerences are even more striking because the computational cost of the TPCG1 andTPCG2 methods are found to be respectively more than three times faster and more thantwice faster respectively.This means that using these methods with the implementation described in this paperenables not only to guarantee energy conservation but also to save a considerable amount oftime during the computation of the polarization energy and the associated forces.Concerning the use of ASPC, a striking result at a timestep of 2 fs is the smaller reductionof iterations necessary to reach convergence compared to the reduction observed at 1 fs where a 50% gain was observed for a 10 − D threshold. In other words, ASPC guess is lesseﬃcient when using a bigger timestep. Following intuition, the shorter the timestep, themore eﬃcient the ASPC. Moreover, in line with our previous study , we also observed thatthe proportional gain in that regard is even smaller for tighter dipole convergence criterion(such as 10 − D), making very long simulations a daunting challenge.Another remark concerns the use of even larger timesteps with the RESPA integrator. Ithas been indeed shown that one can use a 3 fs timestep for the non-bonded forces, providedthat masses of the hydrogen atoms of the system are appropriately redistributed amongheavy atom carriers. But such large timesteps limit the use of predictor such as the ASPCand no gain in the number of iteration can be obtained with these methods. On the contrary,the computational cost of the T(P)CG family of methods does not suﬀer from such a changeas no history is taken into account. The computational cost at 3 fs would remain the same17hat in the 2 fs context, oﬀering an automatic 1.5 acceleration for the same trajectory lengthat no cost, increasing the global speedup oﬀered by the use of T(P)CG.

Conclusion

As we have seen, one can reformulate the analytical expressions for the gradients of theTruncated Conjugate Gradient using a clear strategy. We detailed for interested developersthe various steps required for the implementation of the complete TCG method includingfast forces computations.This strategy allows the implementation of these gradients to be fast enough for thecomputational cost of an evaluation of the polarization energy and the associated forcesto be greatly reduced compared to standard production settings using iterative methods.The TPCG2 method is more than 1.6 times faster than the PCG solver with a 10 − Dconvergence criterion and the direct ﬁeld as a guess using a RESPA integrator with a 2 fstime step (1.4 when ASPC is used). Moreover, it is more than 2 times faster than a PCGwith a convergence criterion of 10 − D and the same predictor guess, such settings beingmandatory to guarantee energy conservation with standard PCG for long simulations. Asthe number of operations in the TCG method is ﬁxed and does not rely on history (i.e. noprevious dipole guess nor predictor guess), it can be applied with larger time-steps for thesame ﬁxed computational cost.The TCG approach provides an accurate reproduction of energy surfaces at a reducedcomputational cost, providing analytical forces that avoid by construction the drift issueswithout relying on complex parametrization, nor adding extra degrees of freedom limitingthe settings than one can use to integrate MD trajectories. That is why it should be a methodof choice for long timescale and stable simulations using polarizable force ﬁelds. Since allTCG’s analytical formulas involve the expressions of electric ﬁelds as well as matrix-vectorproducts, these latter are easily and directly transposable in diﬀerent boundary conditions.18n particular, the extension to Smooth Particle Mesh Ewald is straightforward. For thesame reasons, the parallel implementation of these methods within the context of spatialdecomposition follow any PCG one and will be described in a future paper dedicated to themassively parallel Tinker-HP package. In that context, capabilities of the AMOEBA forceﬁeld using a TCG/SPME coupling will be tested by comparing various properties obtainedwith these methods. Acknowledgement

This work was supported in part by French state funds managed by CalSimLab and theANR within the Investissements d Avenir program under reference ANR-11-IDEX-0004-02.Jean-Philip Piquemal and Louis Lagard`ere are grateful for support by the Direction G´en´eralede l Armement (DGA) Maitrise NRBC of the French Ministry of Defense.

Annex

We introduce the following notations to express the analytical formulas of the induceddipoles, as well as their derivatives. Each term can be expressed using the starting vec-tors ( r and µ ) and the polarization matrix T .Vectors : • r = E − T µ • p = r • P = Tr • P = t P − t T r • P = (1 + β t ) Tr − ( t + β t ) TP − γ TP Scalars : • n = r T r • t = r T P • t = n || P || t t = t P T P • t = n t • t = P T P • t = t = t || P || − t t • t = r T T r • t = t − n || P || • γ = t − n || P || t • sp = r T E • sp = P T E = E T Tr • b = sp − γ sp • b = sp t − t sp • spp = h α E , E i• spp = h α Tr , E i• β = n + t || P || + γ || P || − t t − γ t || P || +2 γ t t ( t − n • γ = n + t || P || + γ || P || − t t − γ t || P || +2 γ t t (1 + β t ) r T P − ( t + β t ) P T P + γ P T P Peek-step formulas µ peek, TCG1 = ω α r − ωt α P (19) µ peek, TCG2 = ω α r − ωt α P − ω α γ t P − ω α γ t T r (20) Coeﬃcients for the analytical expressions

The superscript number, between parenthesis, indicates the truncation number (1 or 2). A p indicates that the coeﬃcient corresponds to the peek-step derivative, and needs to be addedto the energy derivative coeﬃcient itself.Derivation of E pol, TCG1 : • a (1)1 , = t • a (1)1 , = sp t + t • a (1)1 , = − sp n t • a (1)2 , = − sp n t Peek-step for TCG1 : 20 a (1 ,p )1 ,α = ω • a (1 ,p )1 , α = − t ω • a (1 ,p )1 , = − spp ωt • a (1 ,p )1 , = n spp ωt • a (1 ,p )2 ,α = − t ω • a (1 ,p )2 , = n spp ωt TCG2 : • a (2)1 , = t + γ t • a (2)1 , − = − γ t • a (2)1 , = b t − np b t − np t b t t + 2 t t b t +2 np sp γ t • a (2)1 , = − n b t + 4 t b t − n t t b t t +4 t np t b t − t t b t − n np sp γ t • a (2)1 , = − t t t b t − n b t + 2 n sp γ t • a (2)1 , = 2 t t t b t • a (2)2 , = − γ t • a (2)2 , = − n b t + 2 t b t − n t t b t t +2 t np t b t − t t b t − n np sp γ t • a (2)2 , = − n b t − t t t b t + n sp γ t • a (2)2 , = t t t b t • a (2)3 , = − n b t − t t t b t + n γ sp t • a (2)4 , = t t t b t • a (2)3 , = t t t b t Peek-step for TCG2 : • a (2 ,p )1 , α = ω • a (2 ,p )1 , α = − ω ( t γ + t ) • a (2 ,p )1 , α = − ωt γ • a (2 ,p )1 , = − np t ωγ spp + ( ωt spp + ωt spp ) (cid:16) np t + np t t t − t t t (cid:17) − t ( ωγ spp + ωspp ) • a (2 ,p )1 , = n np t ωγ spp +( ωt spp + ωt spp ) (cid:16) − t t + n t t t t − np t t t + t t t (cid:17) + n t ( ωγ spp + ωspp ) • a (2 ,p )1 , = − n t γ ωspp + ( ωt spp + ωt spp ) (cid:16) t t t t + n t (cid:17) a (2 ,p )1 , = − ( ωt spp + ωt spp ) t t t t • a (2 ,p )2 ,α = − ω ( γ t + t ) • a (2 ,p )2 , α = − ωt γ • a (2 ,p )2 , = n np t ωγ spp +( ωt spp + ωt spp ) (cid:16) − t t + n t t t t − np t t t + t t t (cid:17) + n t ( ωγ spp + ωspp ) • a (2 ,p )2 , = − n t ωγ spp + ( ωt spp + ωt spp ) (cid:16) n t + t t t t (cid:17) • a (2 ,p )2 , = − ( ωt spp + ωt spp ) t t t t • a (2 ,p )3 ,α = − ωγ t • a (2 ,p )3 , = − n t ωγ spp + ( ωt spp + ωt spp ) (cid:16) n t + t t t t (cid:17) • a (2 ,p )3 , = − ( ωt spp + ωt spp ) t t t t • a (2 ,p )4 , = − ( ωt spp + ωt spp ) t t t t References (1) Lipparini, F.; Lagard`ere, L.; Stamm, B.; Canc`es, E.; Schnieders, M.; Ren, P.; Maday, Y.;Piquemal, J.-P.

Journal of Chemical Theory and Computation , , 1638–1651.(2) Lagard`ere, L.; Lipparini, F.; Polack, E.; Stamm, B.; Canc`es, E.; Schnieders, M.; Ren, P.;Maday, Y.; Piquemal, J. P. Journal of Chemical Theory and Computation , ,2589–2599.(3) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. J.Chem. Phys , , 8577–8593.(4) Aviat, F.; Levitt, A.; Stamm, B.; Maday, Y.; Ren, P.; Ponder, J. W.; Lagard`ere, L.;Piquemal, J.-P. Journal of Chemical Theory and Computation , 180––190.(5) Nocito, D.; Beran, G. J. O.

J. Chem. Phys. , , 114103.(6) Wang, W.; Skeel, R. D. J. Chem. Phys. , , 164107.227) Kolafa, J. J. Comput. Chem. , , 335–342.(8) Tuckerman, M.; Berne, B. J.; Martyna, G. J. J. Chem. Phys. , , 1990–2001.(9) Albaugh, A.; Demerdash, O.; Head-Gordon, T. J. Chem. Phys. , , 174104.(10) Wang, W. Fast Polarizable Force Field Computation in Biomolecular Simulations.Ph.D. thesis, University of Illinois at Urbana-Champaign, 2013.(11) Simmonett, A. C.; Pickard IV, F. C.; Shao, Y.; Cheatham III, T. E.; Brooks, B. R. J.Chem. Phys. , , 074115.(12) Simmonett, A. C.; Pickard IV, F. C.; Ponder, J. W.; Brooks, B. R. J. Chem. Phys. , , 164101.(13) Albaugh, A.; Niklasson, A. M.; Head-Gordon, T. The Journal of Physical ChemistryLetters , , 1714–1723.(14) Ponder, J. W.; Wu, C.; Ren, P.; Pande, V. S.; Chodera, J. D.; Schnieders, M. J.;Haque, I.; Mobley, D. L.; Lambrecht, D. S.; DiStasio, R. A. J.; Head-Gordon, M.;Clark, G. N. I.; Johnson, M. E.; Head-Gordon, T. J. Phys. Chem. B , , 2549–64.(15) Tinker-HP. , Accessed:2017-05-31.(16) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. , , 10089–10092.(17) Greengard, L.; Rokhlin, V. J. Comput. Phys. , , 325–348.(18) Hopkins, C. W.; Le Grand, S.; Walker, R. C.; Roitberg, A. E. Journal of ChemicalTheory and Computation ,11