[PDF] Alchemical Transformations for Concerted Hydration Free Energy Estimation with Explicit Solvation

Abstract

We present a family of alchemical perturbation potentials that enable the calculation of hydration free energies of small to medium-sized molecules in a concerted single alchemical coupling step instead of the commonly used sequence of two distinct coupling steps for Lennard-Jones and electrostatic interactions. The perturbation potentials are based on the softplus function of the solute-solvent interaction energy designed to focus sampling near entropic bottlenecks along the alchemical pathway. We present a general framework to optimize the parameters of alchemical perturbation potentials of this kind. The optimization procedure is based on the λ -function formalism and the maximum-likelihood parameter estimation procedure we developed earlier to avoid the occurrence of multi-modal distributions of the coupling energy along the alchemical path. A novel soft-core function applied to the overall solute-solvent interaction energy rather than individual interatomic pair potentials critical for this result is also presented. Because it does not require modifications of core force and energy routines, the soft-core formulation can be easily deployed in molecular dynamics simulation codes. We illustrate the method by applying it to the estimation of the hydration free energy in water droplets of compounds of varying size and complexity. In each case, we show that convergence of the hydration free energy is achieved rapidly. This work paves the way for the ongoing development of more streamlined algorithms to estimate free energies of molecular binding with explicit solvation.

Full PDF

SSingle Step Hydration Free Energy

Alchemical Transformations for Single-Step Hydration Free EnergyCalculations

Emilio Gallicchio [email protected] of Chemistry, Brooklyn College of the City University of New York, New York,NY; Ph.D. Program in Chemistry, The Graduate Center of the City University of New York, New York,NY; Ph.D. Program in Biochemistry, The Graduate Center of the City University of New York, New York,NY (Dated: 15 May 2020) We present a family of alchemical perturbation potentials that allow the calculation of hydration free energy of small tomedium-sized molecules in a single perturbation step. We also present a general framework to optimize the parametersof the alchemical perturbation potentials based on avoiding ﬁrst order pseudo phase transitions along the alchemicalpath. We illustrate the method for two compounds of increasing size and complexity: ethanol and 1-naphtol. In eachcase we show that convergence of the hydration free energy is achieved rapidly when conventional approaches fail.

I. INTRODUCTION

The hydration free energy of a compound is deﬁned as thethe reversible work for transferring one molecule of the com-pound from the gas phase to the water phase. The hydra-tion free energy is an important characteristic of a substance.For instance it is one of the determinant factors of the wa-ter solubility of drug formulations and of the binding afﬁn-ity of an inhibitor towards a protein receptor. Hydration freeenergies are most commonly derived from Henry’s constantmeasurements. Free energies of hydration can also be estimatedcomputationally. Most commonly this is accomplished bymolecular simulations of alchemical transformations in whichsolute-solvent interactions are progressively turned on. Thenature of the alchemical transformation is critical to obtainingreliable results. For compounds other than the smallest so-lutes, a single direct alchemical process in which the couplingbetween the solute and the solvent is increased in a simplelinear fashion has been found to be problematic for anythingother than the smallest solutes (monoatomic atoms and ions,water, methane, etc.). Some of the problems are due to thesingularity of the derivative of the alchemical potential withrespect to the charging parameter λ near the decoupled state. Simple approaches of this kind can also be found to convergeslowly due to bottlenecks along the alchemical path caused bypoorly sampled conformational equilibria. These issues have been the subject of intense studies. Thiscollective effort resulted in a set of recommended best prac-tices for alchemical calculations that are commonly employedby the computational chemistry community and that are nowa-days implemented in popular molecular simulation softwarepackages.

For example, it is generally recommended to splitthe coupling of the solute and the solvent in two phases, theﬁrst in which the volume-exclusion core repulsion and disper-sion interactions are turned on, and a second phase in whichelectrostatic interactions are turned on. Soft-core interatomicpotentials are a useful expedient to avoid end-point singular-ities, especially when volume-exclusion core repulsion termsare introduced. Moreover for large solutes it is also often nec-essary to introduce one small group of atoms at a time or to resort to bond-growing processes in which solute atoms arepushed out from a central point rather than directly created inthe solvent.While successful, these strategies add layers of complex-ity to hydration free energy calculations. Splitting the cou-pling process in two phases requires specifying two alchemi-cal schedules which are often very different from each other.In addition, this practice relies on simulating a nonphysicalintermediate state consisting of the uncharged solute in wa-ter which could have very different conformational propen-sities than the actual solute. It could, for example, undergohydrophobic collapse. Soft-core potential function implemen-tations are difﬁcult and cumbersome to maintain. In addition,they require additional parameters that need to be tuned ac-cording to the nature of the interaction potential.It would be therefore beneﬁcial to develop more stream-lined alchemical protocols that (i) compute the hydration freeenergy in one continuous transformation process, (ii) that arebased on the standard form of interatomic potentials, and (iii)that can be parameterized using a linear progression of thecharging parameter λ , while (iv) preserving or improving therate of equilibration and convergence of free energy estimatesrelative to mainstream protocols. We present here a methodthat has all of these characteristics and that has the potentialto be widely applicable. The approach is based on the workwe carried out recently to optimize binding free energy calcu-lations with implicit solvation. This work resulted into thedevelopment of a family of single-step alchemical potentialenergy functions as well as a general framework to optimizethem to accelerate conformational sampling and convergence.In the present work we illustrate how the same approach canbe employed to evaluate the hydration free energies of smallto medium-sized molecule with explicit solvation. As in ouroriginal work, the approach is founded on the realization thatslow equilibration and convergence is caused by entropic bot-tlenecks akin to ﬁrst-order phase transitions that can be cir-cumvented by an appropriate choice of the alchemical poten-tial energy function.This work is organized as follows. We ﬁrst review thetheoretical and computational protocol developed earlier andthen we apply to a model system of hydration in which two a r X i v : . [ q - b i o . Q M ] M a y ingle Step Hydration Free Energy 2solutes (ethanol and 1-naphtol) are transferred from the gasphase to a water droplet. We show that the optimized proto-cols yield results consistent with conventional approaches forthe small solute ethanol for which they can reach convergence.For the larger solute 1-naphtol, only the optimized protocolcan achieve rapid convergence. These promising outcomespave the way for a novel generation of more efﬁcient andmore streamlined single-step alchemical free energy method-ologies. II. THEORY AND METHODSA. Alchemical Transformations for the Estimation ofHydration Free Energies

Alchemical transformations are based on an alchemical po-tential energy function that interpolates between the potentialenergy function U of the starting state and that of the ﬁnalstate U as the progress parameter λ goes, conventionally,from 0 to 1. The most straightforward approach is a linearinterpolating function of the form: U λ ( x ) = U ( x ) + λ [ U ( x ) − U ( x )] = U ( x ) + λ u ( x ) (1)where x represents the degrees of freedom of the system andwe have introduced the perturbation function u ( x ) = U ( x ) − U ( x ) (2)which in the case of hydration corresponds to the solute-solvent interaction energy and, critical to the following de-velopments, does not depend on λ .While straightforward, this simple alchemical potential en-ergy function leads to instabilities and poor convergence espe-cially when volume-exclusions terms are introduced. To ad-dress these issues while maintaining as much as possible thesame simple framework, we consider the family of alchemicalpotential energy functions of the form: U λ ( x ) = U ( x ) + W λ [ u sc ( x )] (3)where u sc ( x ) , deﬁned below, is a soft-core C ( ) -smooth, andone-to-one map of the solute-solvent interaction energy u , and W λ ( u sc ) is an alchemical perturbation function deﬁned suchthat W ( u sc ) = W ( u sc ) = u sc at λ = λ =

1, re-spectively. We will consider in this work the linear function W λ ( u ) = λ u sc and the integrated logistic function W λ ( u sc ) = λ − λ α ln (cid:104) + e − α ( u sc − u ) (cid:105) + λ u sc + w (4)where the parameters λ , λ , α , u , and w are functions of λ .The name of this function comes from the fact that its deriva-tive is the logistic function (also know as Fermi’s function): ∂ W λ ( u sc ) ∂ u sc = λ − λ + e − α ( u sc − u ) + λ . (5)The parameters of the integrated logistic function are opti-mized using the procedure described in reference 11 and sibrieﬂy described below. In this work, we employ the following deﬁnition of the soft-core solute-solvent interaction energy u sc ( u ) = (cid:40) u u ≤ u max f sc ( u / u max ) u > f sc ( y ) = z a − z a + , (7)where u max > z = + y / a + ( y / a ) and a is an adjustable dimensionless exponent. Here u max = a = / u evaluated with the standard form of the Coulomb andLennard-Jones interatomic potentials without soft-core modi-ﬁcations.To obtain the hydration free energy, a set of samples of thesoft-core solute-solvent interaction energies, u sc ( i ) , are col-lected during molecular dynamics simulations performed ata sequence of λ values between 0 and 1. The free energyproﬁle as a function of λ , ∆ G ( λ ) , is obtained by multi-statereweighting using the UWHAM method. The hydrationfree energy in the Ben-Naim standard state is by deﬁnitionthe value of free energy proﬁle at λ = ∆ G ∗ h = ∆ G ( ) . B. Analytic Theory of Alchemical Transformations

The hydration process is analyzed and optimized using thetheory we recently developed for alchemical potential energyfunction of the form of Eq. (3). The key quantity of thistheory is in this case p ( u sc ) , the probability density of thesoft-core solute-solvent interaction energy u sc in the ensem-ble in which solute and solvent are uncoupled ( λ = p ( u sc ) . In particular, given p ( u sc ) ,the probability density for the binding energy u sc for the statewith perturbation potential W λ ( u sc ) is p λ ( u sc ) = K ( λ ) p ( u sc ) exp [ − β W λ ( u sc )] (8)where β = / k B T , K ( λ ) = (cid:90) + ∞ − ∞ p ( u sc ) exp [ − β W λ ( u sc )] du sc = (cid:104) exp [ − β W λ ( u sc )] (cid:105) λ = (9)is the excess component of the equilibrium constant for bind-ing and ∆ G ( λ ) = − β ln K ( λ ) (10)is the corresponding free energy proﬁle. Note that for a lin-ear perturbation potential, W λ ( u sc ) = λ u sc , Eqs. (9) and (10)ingle Step Hydration Free Energy 3state that the free energy proﬁle is related to the double-sidedLaplace transform of p λ ( u sc ) .An analytical description of p ( u sc ) , and thus of all thequantities derived from it, is available. Brieﬂy (see reference15 for the full derivation), the theory is based on the assump-tion that the statistics of, in this case, the solute-solvent inter-action energy u is the superposition of two processes, one thatdescribes the sum of many “soft” background solute-solventinteractions and that follows central limit statistics, and an-other process that describes “hard” atomic collisions and thatfollows max statistics. The probability density p ( u ) is ex-pressed as the superposition of probability densities of a smallnumber of modes p ( u ) = ∑ i c i p i ( u ) (11)where c i are adjustable weights summing to 1 and p i ( u ) isthe probability density corresponding to mode i described an-alytically as (see reference 15 and appendix A of reference 11for the derivation): p i ( u ) = p bi g ( u ; ¯ u bi , σ bi )+( − p bi ) (cid:90) + ∞ p WCA ( u (cid:48) ; n li , ε i , ˜ u i ) g ( u − u (cid:48) ; ¯ u bi , σ bi ) du (cid:48) (12)where g ( u ; ¯ u , σ ) is the normalized Gaussian density functionof mean ¯ u and standard deviation σ and p WCA ( u ; n l , ε , ˜ u ) = n l (cid:34) − ( + x C ) / ( + x ) / (cid:35) n l − H ( u ) ε LJ ( + x C ) / x ( + x ) / , (13)where x = (cid:112) u / ε + ˜ u / ε + x C = (cid:112) ˜ u / ε +

1. The modelfor each mode i depends on a number of adjustable parameterscorresponding to the following physical quantities : • c i : relative population of mode i • p bi : probability that no atomic clashes occur while inmode i • ¯ u bi : the average background interaction energy of mode i • σ bi : the standard deviation of background interactionenergy of mode i • n li : the effective number of statistical uncorrelatedatoms of the solute in mode i • ε i : the effective ε parameter of an hypotheticalLennard-Jones interaction energy potential describingthe solute-solvent interaction energy in mode i • ˜ u i : the solute-solvent interaction energy value abovewhich the collisional energy is not zero in mode i The parameters above, together with the weights c i , are var-ied to ﬁt, using the maximum likelihood criterion, the distri-butions of the soft-core solute-solvent interaction energy ob-tained from numerical simulations. The distribution of thesoft-core solute-solvent interaction energy p ( u sc ) is obtained from p ( u ) using the standard formula for the change of ran-dom variable. In this work we use the analytical theory above and the re-sults of trial alchemical simulations to optimize the parame-ters of the integrated logistic alchemical perturbation poten-tial in Eq. (4). Brieﬂy, the procedure consists of running atrial alchemical calculations using the the linear alchemicalpotential W λ ( u sc ) = λ u sc . The set of samples of the soft-corebinding energies as a function of λ obtained from the trialsimulation are then used to derive a maximum likelihood pa-rameterization of the analytical model for p ( u sc ) [Eqs. (11) to(13)]. We developed an application based on the Tensorﬂow to conduct the maximum likelihood optimization. The analytical form of p ( u sc ) so obtained is then analyti-cally differentiated with respect to u sc to obtain the so-called λ -function: λ ( u sc ) ≡ β ∂ ln p ( u sc ) ∂ u sc . (14)Because, minima and maxima of p λ ( u sc ) occur when the λ -function and ∂ W λ ( u sc ) / ∂ u sc intersect λ ( u sc ) = ∂ W λ ( u sc ) ∂ u sc , (15)the λ -function can be used as a guide to design alchemicalpotentials that avoid the occurrence of bimodal distributionsthat are difﬁcult to converge. The linear alchemical poten-tial W λ ( u sc ) = λ u sc leads to ∂ W λ ( u sc ) / ∂ u sc = λ = constant,which in many cases intersects the λ -function at multiplepoints. Here we use the integrated logistic alchemical po-tential and we vary the parameters λ , λ , α , u , and w asa function of λ such that the derivative of the integrated lo-gistic function in Eq. (5) intersects λ ( u sc ) at a single pointat each λ or, when this is not easily achievable, such that itintersects it at nearby points. As thoroughly discussed in ref-erence 11 this procedure removes or reduces the severity ofentropic sampling bottlenecks during the alchemical couplingprocess and enhances conformational sampling efﬁciency andconvergence of the binding free energy estimates. C. Computational Details

In this work we employ a model of hydration in which so-lutes are transferred from vacuum to near the center of massof a water droplet of about 27 Å in diameter composed of 357TIP3P water molecules (Figure 1). The droplet is conﬁned ina spherical region deﬁned by a ﬂat-bottom harmonic restrain-ing potential centered at the origin and acting on each TIP3Pwater oxygen atom. The tolerance of the ﬂat-bottom potentialwas set to 24 Å (to ensure a conﬁnement region of approxi-mately twice the volume of the droplet) and a force constant of5 kcal/(mol Å ) beyond this tolerance. This model was cho-sen to illustrate and validate the novel methodology presentedhere to avoid potential issues with an implementation of thealchemical transfer with periodic boundary conditions. ingle Step Hydration Free Energy 4 FIG. 1. Illustration of 1-naphtol (green carbon atoms) inserted intothe center of the droplet of water employed in this work. Watermolecules obscuring the solute are not shown for clarity.

The solutes (ethanol and 1-naphtol) were preparedwith Maestro and GAFF/AM1-BCC force ﬁeld param-eters were assigned using the Antechamber program. Single Decoupling alchemical calculations were pre-pared using the SDM workﬂow ( github.com/-egallicc/openmm_sdm_worflow.git ) using 16 λ steps.The MD calculations employed the OpenMM MD enginethe SDM integrator plugins ( github.com/rajatkrpal/-openmm_sdm_plugin.git ) using the OpenCL platform. TheASyncRE software, customized for OpenMM and SDM( github.com/egallicc/async_re-openmm.git ), wasused for the Hamiltonian Replica Exchange in λ space withan uniform λ schedule between 0 and 1. The λ -dependentparameters used with the integrated logistic potential arelisted in Tables III and IV. Molecular dynamics runs wereconducted for a minimum of 2 ns per replica with a 1 fstime-step at 300 K, exchanging λ values approximately every5 ps. A Langevin thermostat at 300 K with a relaxation timeconstant of 20 ps was used. Binding energy samples andtrajectory frames were recorded every 5 ps. Calculationswere performed on the XSEDE Comet GPU HPC cluster atthe San Diego Supercomputing Center.In this work, we employ the Hamiltonian Replica Exchangealgorithm in alchemical space to accelerate conforma-tional sampling. For the calculations reported here, wehave employed the asynchronous implementation of ReplicaExchange (ASyncRE) with the Gibbs Independence Sam-pling algorithm for state reassignments. III. RESULTSA. Probability Distributions of the Solute-SolventInteraction Energy

As discussed in Theory and Methods, the free energy pro-ﬁle is determined by the probability distributions of the pertur-bation energy (the soft-core solute-solvent interaction energy, u sc , in this case) along the alchemical path. The probabil-ity distributions p λ ( u sc ) for some values of λ obtained fromthe replica exchange simulations of the hydration of ethanoland 1-naphtol with the linear alchemical potential are shownin Figure 2 (dots). At small λ s the distributions are centeredaround large and unfavorable values of the solute-solvent in-teraction energy reﬂecting the fact that atomic clashes are fre-quent when the solute and solvent are weakly coupled. Con-versely, at large λ s solute and solvent are strongly coupledand the solute-solvent interaction energies tend to be favor-able. However, the distributions do not uniformly move to-wards lower values of u sc as λ is progressively increased.Rather, at intermediate values of λ they become bimodal witha trough near u sc = λ = . λ = . λ = .

4. Presumably, at some λ value between these two the two modes are equally probable.However, it would be very difﬁcult to establish accurately thetransition point because interconversions from one state to theother are extremely rare (see Figure 5). The reason for this isthat the probability of visiting an intermediate state betweenthe weakly coupled and the strongly coupled states is immea-surably small (Figure 2B). In particular, the analytical modeltrained on this data (see below) predicts a small population ofstrongly coupled states at λ = .

333 and, conversely, a smallpopulation of weakly coupled states at λ = . λ . Because the free energy is affected by the relative popu-lations of the coupled and uncoupled states as a function of λ , it is expected (and conﬁrmed, see below) that the hydra-tion free energy estimate for 1-naphtol would be substantiallybiased in this case.The data shown in Figure 2 conﬁrms the observed proba-bility distributions of the soft-core solute-solvent interactionenergies are accurately reproduced by the analytical modelfor p ( u sc ) parameterized to each dataset. The maximumingle Step Hydration Free Energy 5 (A) (B) −

20 0 20 40 u sc [kcal/mol]0 . . . . p λ ( u s c ) [ k c a l / m o l] − Ethanol - Linear perturbation − −

20 0 20 40 u sc [kcal/mol]0 . . . . p λ ( u s c ) [ k c a l / m o l] − FIG. 2. The predicted (continuous lines) and observed (dots) probability densities of the soft-core solute-solvent interaction energy, p λ ( u sc ) , forthe alchemical hydration of with the linear perturbation function W λ ( u sc ) = λ u sc for (A) ethanol with (from right to left ) λ = λ = .

267 (green), and λ = λ = λ = .

333 (gold), λ = . λ = p ( u sc ) , Eqs. (11) to (13)], with the parameters listed inTable II. likelihood-optimized parameters of the model are listed in Ta-ble II. The model reproduces accurately the positions of thedistributions and their variations as a function of λ , includingthe points where transitions occur.The parameterized analytical functions for p ( u sc ) are ana-lytically differentiated with respect to u sc to obtain the corre-sponding λ -functions λ ( u sc ) [Eq. (14)] shown in Figure 3for the hydration of ethanol and 1-naphtol. These functionsare useful because they can be used to locate, graphically,the maxima and minima of the probability distributions of thesoft-core solute-solvent interaction energy as λ is varied. Theprocedure consists of ﬁnding the intersections between the λ -function and, in the case of a linear perturbation potential, thehorizontal line drawn at the level of the desired value of λ .Indeed, the predictions from the λ -functions in Figure 3are quantitatively consistent with the observations in Figure2. In each case, near λ = u sc . As λ is increased(that is as the horizontal line is shifted up), a back-bendingregion of the λ -function is encountered in which typicallythree intersections occur with the ﬁrst and the last correspond-ing to maxima of the distributions and the middle one to aminimum. More complex patterns can arise when there aremultiple back-bending regions. In the case of ethanol, abovea critical value of λ (approximately 0 .

4) the distributions re-turn to be unimodal at low values of the solute-solvent interac-tion energies. In the case of 1-naphtol the back-bending of the λ -function is predicted to be so extreme to prevent unimodaldistributions for large values of λ less than 1.While the λ -functions identify the locations of maxima andminima of the distributions, they alone do not predict the rel-ative populations of competing modes. Thus, for example, at λ = As shown in Figure 3(A) the parame-ters of the integrated logistic potential can be tuned to avoidmultiple intersections with the λ -function thereby avoidingthe occurrence of bimodal distributions (see below). Whenthis is not possible, as in Figure 3(B), the logistic potentialis still useful to reduce the energy gap between competingmodes and to favor one mode over another.The alchemical simulations with the integrated logistic po-tential with optimized parameters (Table II) designed basedon the predicted λ -functions (Figure 3) indeed show the ab-sence of a gap of the solute-solvent interaction energy and agradual shift of the probability distributions as the solute iscoupled to the solvent (see Figure 4, to be compared with Fig-ure 2 with the linear alchemical potential). The distributionswith the integrated logistic alchemical potential are also gen-erally unimodal reﬂecting either lack of multiple modes or thegradual shift from one mode to another as λ is varied. Theeffect of the integrated logistic potential is smaller in the caseof ethanol which does not exhibit a strong hydration transitionwith the linear potentials. It has, however, a very substantialeffect on 1-naphtol where an interaction energy gap of morethan 30 kcal/mol (Figure 2B) is virtually eliminated (Figure4B) thereby facilitating interconversions between the weaklycoupled and strongly coupled states (see below).ingle Step Hydration Free Energy 6 (A) (B) −

20 0 20 40 u sc [kcal/mol]0 . . . λ ( u s c ) Ethanol − −

20 0 20 40 u sc [kcal/mol]0 . . . λ ( u s c ) FIG. 3. The predicted λ -functions for the hydration of (A) ethanol and (B) 1-naphtol obtained from the analytical model for p ( u sc ) [Eqs. (11)to (13)], and Eq. (14) with the parameters listed in Table II. The dashed horizontal lines correspond to the values of λ where the probabilitydistributions of the soft-core solute solvent energy are predicted to be bimodal [ λ = .

267 in (A) and λ = .

333 and λ = . The solid sigmoidcurve is the derivative, ∂ W λ ( u sc ) / ∂ u sc , of the integrated logistic function for a representative value of λ [ λ = . λ = . (A) (B) −

20 0 20 40 u sc [kcal/mol] . . . . p λ ( u s c ) [ k c a l / m o l] − Ethanol - Logistic perturbation − −

20 0 20 40 u sc [kcal/mol] . . . . p λ ( u s c ) [ k c a l / m o l] − FIG. 4. The probability densities of the soft-core solute-solvent interaction energy from the alchemical simulations with the integrated logisticperturbation potential [Eq. (4)] with the parameters in Table II for the hydration of (A) ethanol and (B) 1-naphtol. The distributions shift fromright to left with alternating colors as the λ progress variable progressively varies from 0 (vacuum state) to 1 (hydrated state). B. Analysis of Replica Exchange Eﬃciency

Hamiltonian replica exchange efﬁciency has been moni-tored here in terms of the extent of diffusion of replicas insolute-solvent interaction energy space. We are particularlyinterested in the rate of transitions between coupled states withfavorable solute-solvent interaction energies and uncoupledstates with large and repulsive interaction energy. The timetrajectories of the soft-core solute-solvent interaction energiessampled by the replicas are shown in Figure 5. The number oftransitions from uncoupled to coupled states (hydration) andvice versa (dehydration) is presented in Table I. The data indicates that ethanol undergoes frequent hydra-tion/dehydration events with either the linear and integratedlogistic perturbation potentials. 1-naphtol however undergoesvery few hydration/dehydration transitions and none at all inthe latest three quarters of the simulation with the linear per-turbation potential. This is due to the wide and strong inter-action energy gap between the distributions of solute-solventinteraction energies of states near the coupled state and thosenear the decoupled state (Figure 2). The bulky nature of thissolute makes it very improbable for a decoupled 1-naphtolmolecule to ﬁnd a suitable cavity in the solvent to make favor-able interactions with the solvent. Conversely, 1-naphtol iningle Step Hydration Free Energy 7 ethanol . . . . . t [ns] − − u s c [ k c a l / m o l] Linear perturbation . . . . . t [ns] Logistic perturbation . . . . . t [ns] − − − u s c [ k c a l / m o l] Linear perturbation . . . . . t [ns] Logistic perturbation

FIG. 5. Soft-core solute-solvent interaction energy trajectories for selected replicas of the alchemical replica exchange simulations of thehydration of ethanol (top) and 1-naphtol (bottom) as a function of simulation time with the linear (left) and integrated logistic (right) alchemicalperturbation potentials. Ethanol undergoes multiple hydration and dehydration events with either alchemical perturbation. The bulkier 1-naphtol undergoes frequent hydration and dehydration events only with the integrated logistic perturbation potential.TABLE I. Number of hydration and dehydration transitionsProtocol n hydr n dehydr ethanol a Linear 58 60Logistic 48 491-naphtol b Linear 1 3Logistic 21 23 a u lower = − u upper =

25 kcal/mol b u lower = − u upper =

25 kcal/mol a coupled state is unlikely to overcome the energetic penaltynecessary to become decoupled from the solvent. The cou-pled and decoupled states of 1-naphtol are separated by es-sentially a pseudo ﬁrst-order phase transition which is entrop-ically frustrated in the hydration direction and energeticallyfrustrated in the dehydration direction.The integrated logistic perturbation potential on the otherhand is very effective at circumventing the phase transition.As shown in Figure 5 and Table I, with the integrated logisticpotential 1-naphtol undergoes many hydration and dehydra- tion transitions. This makes it possible to rapidly equilibrateand converge the hydration free energy estimate for this solute(see below).

C. Equilibration of Hydration Free Energy Estimates

The computed hydration free energy estimates of ethanoland 1-naphtol are plotted in Figure 6 as a function of theamount of data discarded from the start of the simulation(equilibration time). These plots, referred to as reverse cu-mulative equilibration proﬁles, are used to determine theequilibration time after which the time series of data gener-ated by the simulation becomes stationary and not biased bythe starting conditions. It is not obvious to pin-point such atime because, while the accuracy of the binding free energypresumably improves as more non-equilibrated samples arediscarded, the precision of the estimate (illustrated by the er-ror bars in Figure 6) worsens as more initial samples are dis-carded. Here we take the approach of choosing the hydrationfree energy estimate with the best compromise between biasand precision as the one corresponding to the smallest equi-libration time that gives a value statistically indistinguishableingle Step Hydration Free Energy 8from those at all subsequent equilibration times.Based on this criterion, we conclude that the hydration freeenergy of ethanol equilibrates almost immediately after thestart of the simulation with either the linear or integrated logis-tic alchemical potentials. The two estimates are − . ± . − . ± .

16 kcal/mol for the linear and inte-grated logistic potentials, respectively, are in statistical agree-ment. Given the very different nature of the alchemical pathsin the two simulations, it must be concluded that equilibra-tion and convergence of the hydration free energy has beenachieved in this case. This positive outcome is consistent withthe high rate of hydration and dehydration events observed forethanol (Figure 5 and Table I). It also conﬁrms the validity ofthe alchemical protocol and the correctness of the simulationalgorithms in producing a canonical ensemble of conforma-tions.The reverse cumulative equilibration proﬁles paint a dif-ferent picture for the bulkier 1-naphtol (Figure 6). With thelinear perturbation potential the hydration free energy of 1-naphtol appears to equilibrate after approximately 0 .

75 ns ata value of − . ± .

25 kcal/mol. With the integrated lo-gistic potential instead, the hydration free energy appears toequilibrate immediately after the start of the simulation at avalue of − . ± .

22 kcal/mol. The two values are most cer-tainly statistical inconsistent (the p -value of being equivalentis 8 × − based on a simple t-test) and, based on the very fewoccurrences of hydration/dehydration events (Table I) and theslow equilibration (Figure 6), it can be concluded that the es-timate with the linear alchemical potential is the least reliable. IV. DISCUSSION

Alchemical hydration free energy calculations are widelyemployed to predict the water solubilites of substances, test force ﬁeld and free energy protocols, and to esti-mate absolute and relative binding free energies of molecularcomplexes.

While early alternatives have been proposed, the generally accepted guidelines to avoid slow convergenceand numerical instabilities are to split the alchemical hydra-tion process in two steps. In the ﬁrst step volume-exclusionand dispersion solute-solvent interactions are turned on, fol-lowed by a second step in which electrostatic interactions areestablished. In addition, especially during the ﬁrst phase, ithas been found necessary to employ customized soft-core in-teratomic pair potentials.

These free energy practices are widely implemented in pop-ular molecular simulation packages, and, while gen-erally successful and robust, their equilibration and conver-gence rates are likely not optimal. In addition their deploy-ment and software implementations can be cumbersome anddifﬁcult to integrate and maintain alongside other molecu-lar simulation algorithms. A recent study by Lee et al. discussed the downsides of multi-step free energy meth-ods and called for a streamlined single-step approach thatwould more easily be integrated with extended ensemble andself-adjusting conformational sampling algorithms and non-equilibrium approaches. Lee et al. furthermore pro- posed a family of soft-core pair potentials and non-linear al-chemical hybrid potentials that enable the calculation of hy-dration free energies in a single step in which Lennard-Jonesand Coulomb interactions are varied in concert rather thanseparately.In a recent study we identiﬁed pseudo ﬁrst-order phasetransitions along the alchemical path as the critical and funda-mental obstacles to the application of single-step alchemicalprocesses in which two chemical species are coupled to eachother. In the same study, we applied Hamiltonian-shapingtechniques inspired by non-Boltzmann sampling methodolo-gies developed by Straub and collaborators for the studyof temperature-driven ﬁrst order phase transitions, to choosean alchemical path that avoids or soften biphasic behavior. Inthis study we show that the same techniques that we devel-oped earlier in the context of an implicit description of thesolvent can be also applied to the estimation of hydration freeenergies with explicit solvation. This approach not only in-troduces a new family of alchemical perturbation functionsof potentially wide applicability, but it also proposes a theo-retical framework and a numerical and graphical procedure tooptimize them to systematically improve conformational sam-pling and the rate of equilibration and convergence of free en-ergy estimates.The theoretical, algorithmic, and numerical simulationstrategies presented and the promising results illustrated heresuggest a possible roadmap to streamline and improve al-chemical free energy protocols as currently implemented inpopular molecular simulation packages. The ﬁrst suggestionis to replace soft-core pair potentials with, as done here, asoft-core function applied to the raw overall solute-solvent in-teraction energy (without soft-core modiﬁcations) [e.g. usingEqs. (6) and (7)] rather than to each individual atomic pair ascurrently done. This would effectively replace a O ( N ) op-eration with a O ( ) operation that can be easily executed bya single subroutine external to the complex procedures usedto perform pairwise calculations. The processing of the gra-dients of the alchemical potential would require a O ( N ) op-eration but it would still external and separate from the mainpairwise loop.The second suggestion emerging from this study is the useof a non-linear alchemical perturbation that, like the inte-grated logistic potential we propose [Eq. 4], judiciously warpsthe alchemical path in critical regions to avoid phase transi-tions, while retaining a linear character away from the prob-lematic regions. Furthermore, we suggest the use of the λ -function formalism [Eqs. (14) and (15)] and adaptive maxi-mum likelihood-based approaches to systematically optimizethe alchemical schedule and the alchemical perturbation func-tion to enhance equilibration and convergence. V. CONCLUSIONS

We employ a statistical mechanics theory and an analytictheory of molecular association used previously to optimizea single-decoupling alchemical approach to binding free en-ergy estimation to the computation of hydration free en-ingle Step Hydration Free Energy 9 ethanol . . . . t eq [ns] − − − ∆ G h [ k c a l / m o l] Linear perturbation . . . . t eq [ns] Logistic perturbation . . . . t eq [ns] − − − − ∆ G h [ k c a l / m o l] Linear perturbation . . . . t eq [ns] Logistic perturbation

FIG. 6. Reverse cumulative equilibration proﬁles of the hydration free energies of ethanol (top) and 1-naphtol (bottom) as a function ofequilibration time with the linear (left) and integrated logistic (right) alchemical perturbation potentials. The horizontal line corresponds to thelast value. ergies of small to medium-sized molecules. The approach andits beneﬁts are illustrated for a model system involving the hy-dration of two solutes in a water droplet. This proof of prin-ciple study paves the way for a new generation of streamlinedand improved single-step free energy estimation algorithms ofwide applicability.

ACKNOWLEDGMENTS

We acknowledge support from the National Science Foun-dation (NSF CAREER 1750511 to E.G.). Molecular sim-ulations were conducted on the Comet GPU supercomputercluster at the San Diego Supercomputing Center supported byNSF XSEDE award TG-MCB150001.

Appendix A: Appendix1. Parameters of the Analytical Model for p ( u sc ) See Table II.

2. Alchemical Schedule and Parameters of the IntegratedLogistic Perturbation Function

See Tables III and IV. A. Ben Naim.

Water and Aqueous Solutions . Plenum, New York, 1974. Christel AS Bergström and Per Larsson. Computational prediction of drugsolubility in water-based systems: qualitative and quantitative approachesused in the current drug discovery and development setting.

Inter. J.Pharm. , 540(1-2):185–193, 2018. M. K. Gilson, J. A. Given, B. L. Bush, and J. A. McCammon. Thestatistical-thermodynamic basis for computation of binding afﬁnities: Acritical review.

Biophys. J. , 72:1047–1069, 1997. David L Mobley and J Peter Guthrie. Freesolv: a database of experimentaland calculated hydration free energies, with input ﬁles.

J. Comp. Aid. Mol.Des. , 28(7):711–720, 2014. Chipot and Pohorille (Eds.).

Free Energy Calculations. Theory and Appli-cations in Chemistry and Biology . Springer Series in Chemical Physics.Springer, Berlin Heidelberg, Berlin Heidelberg, 2007. Anita de Ruiter and Chris Oostenbrink. Free energy calculations of protein–ligand interactions.

Current opinion in chemical biology , 15(4):547–552,2011. Thomas Simonson. Free energy of particle insertion: an exact analysis ofthe origin singularity for simple liquids.

Molecular Physics , 80(2):441–447, 1993. David L Mobley. Lets get honest about sampling.

J. Comp. Aided Mol.Des. , 26:93–95, 2012. ingle Step Hydration Free Energy 10 Pavel V Klimovich, Michael R Shirts, and David L Mobley. Guidelinesfor the analysis of free energy calculations.

J. Comp. Aid. Mol. Des. ,29(5):397–411, 2015. Tai-Sung Lee, Zhixiong Lin, Bryce K Allen, Charles Lin, Brian K Radak,Yujun Tao, Hsu-Chun Tsai, Woody Sherman, and Darrin M York. Improvedalchemical free energy calculations with optimized smoothstep softcore po-tentials.

J. Chem. Theory Comput. , 2020. Rajat K Pal and Emilio Gallicchio. Perturbation potentials to overcomeorder/disorder transitions in alchemical binding free energy calculations.

J.Chem. Phys. , 151(12):124116, 2019. Michael R Shirts and John D Chodera. Statistically optimal analysis ofsamples from multiple equilibrium states.

J. Chem. Phys. , 129:124105,2008. Zhiqiang Tan, Emilio Gallicchio, Mauro Lapelosa, and Ronald M. Levy.Theory of binless multi-state free energy estimation with applications toprotein-ligand binding.

J. Chem. Phys. , 136:144102, 2012. E. Gallicchio, M. M. Kubo, and R. M. Levy. Entropy-enthalpy compensa-tion in solvation and ligand binding revisited.

J. Am. Chem. Soc. , 120:4526–27, 1998. Denise Kilburg and Emilio Gallicchio. Analytical model of the free energyof alchemical molecular binding.

J. Chem. Theory Comput. , 14(12):6183–6196, 2018. Emilio Gallicchio, Mauro Lapelosa, and Ronald M. Levy. Binding energydistribution analysis method (BEDAM) for estimation of protein-ligandbinding afﬁnities.

J. Chem. Theory Comput. , 6:2961–2977, 2010. Emilio Gallicchio and Ronald M Levy. Recent theoretical and computa-tional advances for modeling protein-ligand binding afﬁnities.

Adv. Prot.Chem. Struct. Biol. , 85:27–80, 2011. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, ZhifengChen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, MatthieuDevin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving,Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, ManjunathKudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore,Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner,Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Va-sudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg,Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scalemachine learning on heterogeneous systems, 2015. Software available fromtensorﬂow.org. Tai-Sung Lee, Brian K Radak, Anna Pabis, and Darrin M York. A new max-imum likelihood approach for free energy proﬁle construction from molec-ular simulations.

J. Chem. Theory Comput. , 9(1):153–164, 2012. Albert C Pan, Huafeng Xu, Timothy Palpant, and David E Shaw. Quan-titative characterization of the binding and unbinding of millimolar drugfragments with molecular dynamics simulations.

J. Chem. Theory Com-put. , 13(7):3372–3377, 2017. Junmei Wang, Romain M Wolf, James W Caldwell, Peter A Kollman, andDavid A Case. Development and testing of a general amber force ﬁeld.

J.Comp. Chem. , 25(9):1157–1174, 2004. Peter Eastman, Jason Swails, John D Chodera, Robert T McGibbon, Yu-tong Zhao, Kyle A Beauchamp, Lee-Ping Wang, Andrew C Simmonett,Matthew P Harrigan, Chaya D Stern, et al. Openmm 7: Rapid developmentof high performance algorithms for molecular dynamics.

PLoS Comp. Bio. ,13(7):e1005659, 2017. Emilio Gallicchio, Junchao Xia, William F Flynn, Baofeng Zhang, SadeSamlalsingh, Ahmet Mentes, and Ronald M Levy. Asynchronous replicaexchange software for grid and heterogeneous computing.

ComputerPhysics Communications , 196:236–246, 2015. Yuji Sugita, Akio Kitao, and Yuko Okamoto. Multidimensional replica-exchange method for free-energy calculations.

J. Chem. Phys. , 113:6042–6051, 2000. A. K. Felts, Y. Harano, E. Gallicchio, and R. M. Levy. Free energy surfacesof beta-hairpin and alpha-helical peptides generated by replica exchangemolecular dynamics with the AGBNP implicit solvent model.

Proteins:Struct. Funct. Bioinf. , 56:310–321, 2004. K.P. Ravindranathan, E. Gallicchio, R. A. Friesner, A. E. McDermott, andR. M. Levy. Conformational equilibrium of cytochrome P450 BM-3 com-plexed with N-palmitoylglycine: A replica exchange molecular dynamicsstudy.

J. Am. Chem. Soc. , 128:5786–5791, 2006. Hisashi Okumura, Emilio Gallicchio, and Ronald M Levy. Conformationalpopulations of ligand-sized molecules by replica exchange molecular dy-namics and temperature reweighting.

J. Comput. Chem. , 31:1357–1367,2010. Christopher J. Woods, Jonathan W. Essex, and Michael A. King. The de-velopment of replica-exchange-based free-energy methods.

J. Phys. Chem.B , 107:13703–13710, 2003. Steven W. Rick. Increasing the efﬁciency of free energy calculations usingparallel tempering and histogram reweighting.

J. Chem. Theory Comput. ,2:939–946, 2006. Jaegil Kim and John E Straub. Generalized simulated tempering for explor-ing strong phase transitions.

J. Chem. Phys. , 133(15):154101, 2010. Wei Yang, Ryan Bitetti-Putzer, and Martin Karplus. Free energy simula-tions: use of the reverse cumulative averaging to determine the equilibratedregion and the time required for convergence.

J. Chem. Phys. , 120(6):2618–2628, 2003. John D Chodera. A simple method for automated equilibration detection inmolecular simulations.

J. Chem. Theory Comput. , 12(4):1799–1805, 2016. Denise Kilburg and Emilio Gallicchio. Assessment of a single decouplingalchemical approach for the calculation of the absolute binding free en-ergies of protein-peptide complexes.

Frontiers in Molecular Biosciences ,5:22, 2018. Tang-Qing Yu, Pei-Yang Chen, Ming Chen, Amit Samanta, Eric Vanden-Eijnden, and Mark Tuckerman. Order-parameter-aided temperature-accelerated sampling for the exploration of crystal polymorphism and solid-liquid phase transitions.

J. Chem. Phys. , 140(21):06B603_1, 2014. David L Mobley, Shaui Liu, David S Cerutti, William C Swope, and Julia ERice. Alchemical prediction of hydration free energies for sampl.

J. Comp.Aid. Mol. Des. , 26(5):551–562, 2012. David L Mobley, Elise Dumont, John D Chodera, and Ken A Dill. Com-parison of charge models for ﬁxed-charge force ﬁelds: small-molecule hy-dration free energies in explicit solvent.

J Phys Chem B , 111:2242–2254,2007. Devleena Shivakumar, Joshua Williams, Yujie Wu, Wolfgang Damm, JohnShelley, and Woody Sherman. Prediction of absolute solvation free energiesusing molecular dynamics free energy perturbation and the opls force ﬁeld.

J. Chem. Theory Comput. , 6(5):1509–1519, 2010. John D. Chodera, David L. Mobley, Michael R. Shirts, Richard W. Dixon,Kim Branson, and Vijay S. Pande. Alchemical free energy methods for drugdiscovery: Progress and challenges.

Curr. Opin. Struct. Biol. , 21:150–160,2011. Clara D Christ and Wilfred F van Gunsteren. Multiple free energiesfrom a single simulation: Extending enveloping distribution sampling tononoverlapping phase-space distributions.

The Journal of chemical physics ,128(17):174112, 2008. Jed W Pitera and Wilfred F van Gunsteren. A comparison of non-bondedscaling approaches for free energy calculations.

Molecular Simulation ,28(1-2):45–65, 2002. Thomas Steinbrecher, InSuk Joung, and David A. Case. Soft-core poten-tials in thermodynamic integration: Comparing one- and two-step transfor-mations.

J. Comput. Chem. , 32:3253–3263, 2011. Bernard R Brooks, Charles L Brooks III, Alexander D Mackerell Jr, LennartNilsson, Robert J Petrella, Benoît Roux, Youngdo Won, Georgios Archon-tis, Christian Bartels, Stefan Boresch, et al. Charmm: the biomolecular sim-ulation program.

Journal of computational chemistry , 30(10):1545–1614,2009. James C Phillips, Rosemary Braun, Wei Wang, James Gumbart,Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel,Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics withnamd.

J. Comp. Chem. , 26(16):1781–1802, 2005. Sander Pronk, Szilárd Páll, Roland Schulz, Per Larsson, Pär Bjelkmar,Rossen Apostolov, Michael R Shirts, Jeremy C Smith, Peter M Kasson,David van der Spoel, Berk Hess, and Erik Lindahl. Gromacs 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.

Bioinformatics , 29:845–854, 2013. Xibing He, Shuhan Liu, Tai-Sung Lee, Beihong Ji, Viet H Man, Darrin MYork, and Junmei Wang. Fast, accurate, and reliable protocols for routinecalculations of protein–ligand binding afﬁnities in drug design projects us-ing AMBER GPU-TI with ff14SB/GAFF.

ACS Omega , 5(9):4611–4619,2020. ingle Step Hydration Free Energy 11 Eric Darve, David Rodríguez-Gómez, and Andrew Pohorille. Adaptive bi-asing force method for scalar and vector free energy calculations.

The Jour-nal of chemical physics , 128(14):144120, 2008. Piero Procacci. Solvation free energies via alchemical simulations: let’s gethonest about sampling, once more.

Phys. Chem. Chem. Phys. , 2019. Qing Lu, Jaegil Kim, and John E Straub. Order parameter free enhancedsampling of the vapor-liquid transition using the generalized replica ex-change method.

J. Chem. Phys. , 138(10):104119, 2013. ingle Step Hydration Free Energy 12

TABLE II. Optimized parameters for the analytical model of molecular association for the two systems studied in this work. Uncertainties areimplied by the number of reported signiﬁcant ﬁgures.weight p b ¯ u b a σ a b ε a ˜ u a n l ethanol hydrationmode 1 3 . × − . × − − .

30 2 .

65 3 . − .

69 4 . . × − . × − − .

51 2 .

65 22 . . . . × − . × − . . . . × − . × − − .

35 3 .

24 4 .

93 2 .

52 4 . . × − . × − − .

70 3 .

22 7 .

91 15 . . .

00 1 × − . . . a In kcal/mol

TABLE III. Alchemical schedule of the integrated logistic perturba-tion function for the hydration of ethanol λ λ λ α a u w a kcal/mol − kcal/mol c kcal/mol ingle Step Hydration Free Energy 13 TABLE IV. Alchemical schedule of the integrated logistic perturba-tion function for the hydration of 1-naphtol λ λ λ α a u w a kcal/mol − kcal/mol cc