[PDF] Best Practices for Alchemical Free Energy Calculations

Abstract

Alchemical free energy calculations are a useful tool for predicting free energy differences associated with the transfer of molecules from one environment to another. The hallmark of these methods is the use of "bridging" potential energy functions representing \emph{alchemical} intermediate states that cannot exist as real chemical species. The data collected from these bridging alchemical thermodynamic states allows the efficient computation of transfer free energies (or differences in transfer free energies) with orders of magnitude less simulation time than simulating the transfer process directly. While these methods are highly flexible, care must be taken in avoiding common pitfalls to ensure that computed free energy differences can be robust and reproducible for the chosen force field, and that appropriate corrections are included to permit direct comparison with experimental data. In this paper, we review current best practices for several popular application domains of alchemical free energy calculations, including relative and absolute small molecule binding free energy calculations to biomolecular targets.

Full PDF

AA LiveCoMS Best Practices Guide

Best Practices for Alchemical FreeEnergy Calculations [Article v 1.0]

Antonia S. J. S. Mey , Bryce K. Allen , Hannah E. Bruce Macdonald , John D.Chodera , Maximilian Kuhn , Julien Michel , David L. Mobley , Levi N.Naden , Samarjeet Prasad , Andrea Rizzi , Jenke Scheen , Michael R. Shirts ,Gary Tresadern , Huafeng Xu EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’sBuildings, Edinburgh, EH9 3FJ, UK; Computational and Systems Biology Program, SloanKettering Institute, Memorial Sloan Kettering Cancer Center, New York NY, USA; Departments of Pharmaceutical Sciences and Chemistry, University of California, Irvine,USA; National Institutes of Health, Bethesda, MD, USA; University of Colorado Boulder,Boulder, CO, USA; Silicon Therapeutics, Boston, MA, USA; Tri-Institutional TrainingProgram in Computational Biology and Medicine, New York, NY, USA; ComputationalChemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340,Belgium; Cresset, Cambridgeshire, UK; Molecular Sciences Software Institute, Blacksburg VA,USA

This LiveCoMS document ismaintained online onGitHub at https://github.com/michellab/alchemical-best-practices;to provide feedback,suggestions, or helpimprove it, please visit theGitHub repository andparticipate via the issuetracker.This version dated August24, 2020

Abstract

Alchemical free energy calculations are a useful tool for predicting free energy diﬀer-ences associated with the transfer of molecules from one environment to another. The hallmarkof these methods is the use of "bridging" potential energy functions representing alchemical inter-mediate states that cannot exist as real chemical species. The data collected from these bridgingalchemical thermodynamic states allows the eﬃcient computation of transfer free energies (ordiﬀerences in transfer free energies) with orders of magnitude less simulation time than simulatingthe transfer process directly.While these methods are highly ﬂexible, care must be taken in avoiding common pitfalls to ensurethat computed free energy diﬀerences can be robust and reproducible for the chosen force ﬁeld,and that appropriate corrections are included to permit direct comparison with experimental data.In this paper, we review current best practices for several popular application domains of alchem-ical free energy calculations, including relative and absolute small molecule binding free energycalculations to biomolecular targets. *For correspondence: [email protected] (ASJSM); [email protected] (JDC); [email protected](DLM); [email protected] (MRS)

Alchemical free energy calculations compute free energy dif-ferences associated with transfer processes, such as the bind- a r X i v : . [ q - b i o . B M ] A ug LiveCoMS Best Practices Guideing of a small molecule to a receptor, the transfer of a smallmolecule from an aqueous to apolar phase [1], or the ef-fects of protein side chain mutations on binding aﬃnities orthermostabilities. These calculations use non-physical inter-mediate states in which the chemical identity of some portionof the system (such as a small molecule ligand or proteinsidechain) is changed by modifying the potential governingthe interactions with the environment for the atoms beingmodiﬁed, inserted, or deleted.Fig. 1 illustrates common free energy changes that maybe diﬃcult to compute with unbiased molecular dynamicsmethods, but are more tractable with alchemical methods.In alchemical simulations, the introduction of intermediate alchemical states that bridge the high-probability regions ofconﬁguration space between two physical endstates of inter-est, permits the robust computation of free energy for largetransformations. Alchemical calculations can be used in avariety of scenarios, such as: • computing the free energy of a conformational changefor a molecule with a high barrier to interconversion(Fig. 1 A ); • computing partition (log P ) or distribution (log D ) coeﬃ-cients between environments (Fig. 1 B ) [3, 4] • determining partitioning between compartments intomembranes (Fig. 1 C ) [5].Furthermore, alchemical calculations are frequently usedto estimate changes in free energies upon modifying a ligandor protein: • a protein residue can be alchemically mutated to probethe impact on binding aﬃnity (Fig. 1 D )[6, 7] or changesin protein thermostability [8–11]; • the entire ligand can be alchemically transferred fromprotein to solvent in an absolute binding free energycalculation (Fig. 1 E ) [12–14]; • small alchemical modiﬁcations can be made betweenchemically related ligands to estimate relative diﬀer-ences in binding free energies (Fig. 1 F ) [15–19].After an alchemical calculation is performed (which gen-erally involves multiple simulations at a variety of alchemicalstates), the data must be analyzed to compute an estimateof the free energy for the transformation of interest. Earlywork used simple but statistically suboptimal estimators forthis: free energy perturbation (FEP) used a simple (but highlybiased) estimator based on the Zwanzig relation [1] or numer-ical quadrature via thermodynamic integration (TI), for whichthe theory dates back the better part of a century but withthe ﬁrst computational applications emerging in the 1980’s Here, the non-physical nature of the transformation is referred to as "al-chemical", a term coined by Tembre and McCammon in Ref. [2]. and 90’s [20–24]. More recent developments have seen new,highly eﬃcient statistical estimators that make better useof all the data, often building on the more eﬃcient and lessbiased Bennett acceptance ratio (BAR) [25], producing mul-tistate generalizations [26] or removing the need for globalequilibrium [27–29].Subsequent work in the 2000s led to improved implemen-tations of alchemical methods in popular biomolecular sim-ulation packages [15, 30–34]. This foundational work, com-bined with the methodological, technological, and hardwareimprovements of the last 5–10 years, has led to an explo-sion of interest and direct commercial application of thesetechnologies [15, 19, 35–38].As the ﬁeld of molecular simulation can now routinelyaccess microsecond timescales with the aid of GPUs [39], andmillisecond timescales appear to soon be within reach, accu-rate alchemical calculations on even more challenging prob-lems will become reasonable to perform. In the meantime,today’s users may ﬁnd it diﬃcult to get started with thesecomplex calculations whilst also keeping up with the fast paceof change. This Best Practices guide provides current recom-mendations and tips for users of all experience. Updates andsuggestions are welcomed via our GitHub repository.

This Best Practices guide focuses on providing a good start-ing point for new practitioners and a reference for experi-enced practitioners. For this propose we provide a convenientchecklist (Sec. 12) to help ensure all calculations comply withcurrently-understood best practices for alchemical simulationand analysis. Where the best practices are currently not cer-tain, we highlight areas where further research is needed toidentify an unambiguous recommendation. It can also serveas a set of best practices to ensure simulation robustness andreproducibility which reviewers may wish to consider as theyevaluate papers.We assume that novice practitioners have at least mod-erate experience with molecular simulation concepts anduse of simulation packages. Furthermore, basic familiaritywith the principles of molecular mechanics, molecular dynam-ics simulations, statistical mechanics, and the biophysics ofprotein-ligand association are essential. If you feel unfamiliarwith some of these concepts, good starting points can befound in these references [40–43].While reading this Best Practices guide, it is important tobear in mind this is not a review of all free energy calculationmethods at the cutting edge of current research. Instead thisguide aims to answer the following questions: • Is my problem suitable for an alchemical calculation? • How do I select an appropriate alchemical protocol?

LiveCoMS Best Practices Guide

MembraneHostHostHostHostHost

ABCDEF

Figure 1. Illustration of common types of free energies diﬀer-ences that can be calculated using alchemical free energy meth-ods. A : Change in free energy due to a conformational change of themolecule across a high barrier. B : Partition coeﬃcient such as log P or log D depend on a change in free energy between diﬀerent phases;here, as an example the partition coeﬃcient between methanol andwater is shown. C : Free energy diﬀerence associated with the inser-tion of a molecule into a membrane. D : Eﬀect of mutations of proteinor host residues on free energies of binding. E : Absolute free ener-gies of binding of a small molecule to a host (e.g. protein), F : Relativefree energy of binding of one molecule with respect to another, heretoluene and benzyl alcohol. • What software tools are available to perform alchemicalcalculations? • How should I analyze my data and report uncertainties?Some other background information may be needed de-pending on the nature of the alchemical project. For example,often, if binding poses are not known, docking calculationscan be used to generate an initial small molecule binding poseto start alchemical simulations. This will require some basicfamiliarity on how to perform docking to generate reasonablesimulation starting points [44].As some of the theoretical background can seem daunt-ing, we do, however, provide a guide to the essential theorybehind alchemical free energy calculations in Sec. 3. In theremainder of this paper, we will cover topics that are key tothe preparation( Sec. 6), choice and use of correct protocols(Sec. 7), and ﬁnally the best practices that should be usedin the analysis of alchemical calculations (Sec. 8). Particularfocus will be given to aspects of the molecular simulationswhich are unique to alchemical calculations—these includethe calculation of transfer free energies (hydration free en-ergies, partition coeﬃcients, etc.), and binding free energies(absolute and relative).While we try to address as many methods and practicesas possible, the ﬁeld of free energy calculations is broad, andthere are many advanced topics that are left to future BestPractices documents focusing on speciﬁc issues. Below, weprovide a non-exhaustive list of topics we have not addressed,along with some references to provide starting points onthese more advanced topics: • covalent inhibition [45] • free energies of mutation of protein side chains [7, 10] • nonspeciﬁc binding or multiple binding sites [46] • approximate and often less accurate endpoint free en-ergy methods such as MM-PBSA [47] and LIE [48] • Free energy methods that extract the ligand using geo-metric order parameters and potential of mean forcemethods [49] • forceﬁeld dependence for protein, ligand, ions, co-solvents, and co-factors. A number of diﬀerent studieshave looked at the inﬂuence of force ﬁelds and it isassumed the user has made an adequate choice for thesystem under study [50–52].For convenience we have also compiled a list of commonacronyms and common symbols used throughout this paper. LiveCoMS Best Practices GuideAcronyms

CPU — Central Processing Unit

BAR — Bennett Acceptance Ratio

FEP — Free Energy Perturbation

GPCR — G-Protein Coupled Receptor

GPU — Graphics Processing Unit

MBAR — Multistate Bennett Acceptance Ratio

MCSS — Maximum Common Substructure MD — Molecular Dynamics RMSE — Root Mean Square Error

MUE — Mean unsigned error

SAR — Structure-Activity Relationships TI — Thermodynamic IntegrationList of Symbols L and R — generic names for ligand and receptor K ◦ b — binding constant c ◦ — standard state concentration U — potential energy u — reduced potential describing a thermodynamicstate ∆ G — Gibbs free energy (isothermal isobaric ensem-ble) ∆ A — Helmholtz free energy (canonical ensemble) ∆ f — reduced (dimensionless) free energy ∆ˆ f — estimate from an estimator for the reduced freeenergy Γ — conformation space accessible by simulations (cid:126) q — vector of a single conﬁguration, i.e. x , y , z coordi-nates of the simulation system k B — Boltzmann constant Z — partition function p — pressure µ — chemical potential (grand canonical ensemble) T — temperature β ≡ ( k B T ) –1 — inverse thermal energy (cid:126)λ — alchemical progress parameter, which may bemultidimensional g — statistical ineﬃciency O — overlap matrix C t — discrete-time-normalized ﬂuctuation auto-correlation function τ eq — integrated auto-correlation time t — equilibration time Why would you want to run an alchemical free energy cal-culation and why do they work? In this section, we use theexample of relative free energy calculations to sketch the the-ory of alchemical simulations and illustrate their utility. Theemphasis here is placed on bridging theoretical foundationsand intuition. A rigorous derivation of the standard (abso-lute) free energy of binding using the principles of statisticalmechanics can be found in Gilson’s classic work [53].

Suppose you want to compute the binding aﬃnity, or freeenergy of binding, of a ligand L to a receptor R , given by: R + L (cid:11) RL . (1)The binding constant ( K ◦ b ) is given by the law of mass actionas the ratio of concentrations of product [ RL ] and reactants[ R ], [ L ]: K ◦ b = c ◦ [ RL ][ L ][ R ] . (2)The standard state concentration c ◦ depends on the refer-ence state, but it is usually set to 1 mol/L assuming a constantpressure of 1 atm (see also Sec. 7.1.2). Thus, the Gibbs freeenergy of binding ∆ G bind is given by: ∆ G bind, L = – k B T ln K ◦ b , (3)where k B is the Boltzmann constant and T the temperatureof the system. The free energy of binding can be expressed as a ratioof partition functions

A natural, though generally very computationally expensive,way to estimate the equilibrium constant is by directly simu-lating several binding and unbinding events and computingthe probability of ﬁnding the receptor-ligand system in thebound state, P ( RL ), or the unbound state, P ( R + L ). Assumingthe volume change upon binding to be negligible, which isoften the case at 1 atm due to the incompressibility of water,then the Gibbs free energy ∆ G bind, L is approximately equal tothe Helmholtz free energy ∆ A bind, L , and we can simulate thesystem in a box of volume V to obtain [54] ∆ G bind, L ≈ ∆ A bind, L = – k B T (cid:18) ln P ( RL ) P ( R + L ) + ln (cid:0) c ◦ N Av V (cid:1)(cid:19) , (4)where N Av is the Avogadro number, and the last term cor-rects for the simulated concentration being diﬀerent than the LiveCoMS Best Practices Guidestandard concentration. Let Γ bound and Γ unbound be the setof receptor-ligand conformations (cid:126) q that we consider boundand unbound respectively. The probability of a conformation (cid:126) q is given by the Boltzmann probability density function P ( (cid:126) q ) = exp (cid:0) β U ( (cid:126) q ) (cid:1)(cid:82) Γ exp (cid:0) β U ( (cid:126) q ) (cid:1) d (cid:126) q , (5)where β = ( k B T ) –1 is the inverse temperature, U ( (cid:126) q ) is the poten-tial energy of conformation (cid:126) q , and the integration is over theset of all possible conformations accessible in the simulationbox volume Γ , with Γ bound , Γ unbound ⊂ Γ . If the simulationis long enough, we expect the fraction of conformations (cid:126) q found in the bound state to converge to P ( RL ) = (cid:90) Γ bound P ( (cid:126) q ) d (cid:126) q = (cid:82) Γ bound exp (cid:0) β U ( (cid:126) q ) (cid:1) d (cid:126) q (cid:82) Γ exp (cid:0) β U ( (cid:126) q ) (cid:1) d (cid:126) q . (6)After similar considerations for P ( R + L ), we ﬁnd that the ratioof visited bound and unbound conformations, in the limit oflong simulations, should converge to P ( RL ) P ( R + L ) = (cid:82) Γ bound exp (cid:0) – β U ( (cid:126) q ) (cid:1) d (cid:126) q (cid:82) Γ unbound exp (cid:0) – β U ( (cid:126) q ) (cid:1) d (cid:126) q = Z ( RL ) Z ( R + L ) , (7)where we have deﬁned the conﬁgurational integral or conﬁgu-rational partition function as Z (state) ≡ (cid:82) Γ state exp (cid:0) – β U ( (cid:126) q ) (cid:1) d (cid:126) q . Simulating binding events is computationally expensive

While simulating binding events has been used to estimatebinding aﬃnities [54, 55] or to get insights into the bindingpathways and kinetics of receptor-ligand systems [56–60],the computational cost of these calculations is usually dom-inated by the rate of dissociation, which can be on the mi-crosecond timescale even for millimolar binders [55] andreaches the micosecond to second timescale for a typicaldrug [61, 62]. Depending on system size and simulation set-tings, common molecular dynamics software packages canreach a few hundreds of ns/day using currently availablehigh-end GPUs [63, 64], making these type of calculationsunappealing and irrelevant on a pharmaceutical drug discov-ery timescale. Other methods compute the free energy ofbinding by building potential of mean force proﬁles along areaction coordinate [49, 65–67], but these methods requireprior knowledge of a high-probability binding pathway, whichis not easily available, especially in the prospective scenariostypical of the drug development process.

In many cases, the quantity of interest is the change in bindingaﬃnity between a compound A and a related compound B (e.g., by modifying one the drug scaﬀold’s substituents, see(Fig. 1 F ), which, by using Eq. 4 and 7 is given by ∆∆ G bind, AB = ∆ G bind, B – ∆ G bind, A ≈ – k B T (cid:18) ln Z ( RB ) Z ( R + B ) – ln Z ( RA ) Z ( R + A ) (cid:19) . (8)Note that the terms involving the standard concentration can-cel out when we assume that the volume is identical for A and B . Predictions of ∆∆ G bind, AB with non-alchemical methodsgenerally require long simulations of both ligands, possiblythrough diﬀerent binding pathways. Alchemical relative freeenergy calculations avoid the need to simulate binding andunbinding events by making use of the fact that the free en-ergy is a state function and exploiting the thermodynamiccycle illustrated in Fig. 2. This is apparent after rewriting Eq. 8as ∆∆ G bind, AB ≈ – k B T (cid:18) ln Z ( RB ) Z ( RA ) – ln Z ( R + B ) Z ( R + A ) (cid:19) = – k B T (cid:18) ln Z ( RB ) Z ( RA ) – ln Z ( B ) Z ( A ) (cid:19) = ∆ G bound – ∆ G unbound , (9)where ∆ G bound/unbound is the free energy of mutating A to B in the bound/unbound state. Eq. 9 and Fig. 2 tell us thatthe diﬀerence in free energy of binding between toluene ( A )and benzyl alcohol ( B ) can be computed by running two in-dependent calculations estimating the free energy cost ofmutating A into B in the binding pocket ( ∆ G bound ) and in sol-vent ( ∆ G unbound ), saving us the need to simulate the physicalbinding process of the two compounds. In particular, thesecond line of Eq. 9 is a consequence of ∆ G unbound being in-dependent of the presence of the receptor in the simulationbox as the deﬁnition of the unbound state assumes receptorand ligand to be at a suﬃcient distance for them to have noenergetic interactions. Note that, when A and B have diﬀer-ent number of atoms, Eq. 9 implies the presence of a factorhaving units of volume entering both logarithms, which re-quire unitless arguments. However, the value of this factor isinconsequential as it cancels out in practice. How are alchemical transformations performed inpractice?

In practice, the mutation of A to B is carried out by introducingone or more parameters (cid:126)λ controlling the potential energyfunction U ( (cid:126) q ; (cid:126)λ ) such that the potential of compounds A and B is recovered at two particular values (cid:126)λ A and (cid:126)λ B . Brieﬂy, thisis achieved by simulating a “chimeric” molecule composedof enough atoms to represent both A and B . A subset of theenergetic terms in U ( (cid:126) q ; (cid:126)λ ) is then modulated by (cid:126)λ so that at (cid:126)λ A , the atoms that form molecule A are activated and thosebelonging exclusively to B are non-interacting “dummy atoms”,while the opposite occur at (cid:126)λ B (see Sec. 7.1.1 for details). LiveCoMS Best Practices Guide

HostHost

Figure 2. Thermodynamic cycle for computing the relative freeenergy of binding ( ∆∆ G ) between two related small moleculesto a supramolecular host or a rigid receptor. The relative bindingfree energy diﬀerence between two small molecules, ∆∆ G bind, A → B ≡ ∆ G bind, B – ∆ G bind, A —here benzyl alcohol (top) to toluene (bottom)—can be computed as a diﬀerence between two alchemical transfor-mations, ∆ G bound – ∆ G solvated , where ∆ G bound represents the freeenergy change of transforming A → B in complex, i.e. bound to ahost molecule, and ∆ G solvated the free energy change of transforming A → B in solvent, typically water. We can rigorously account for ﬂuctuations in other ther-modynamic parameters such as changes in volume V whensimulating at constant pressure p or changes in number ofmolecules N i of species i at constant chemical potential µ i (e.g., number of waters or ions) by introducing the reducedpotential [26] u ( (cid:126) q ; (cid:126)λ ) ≡ β (cid:34) U ( (cid:126) q ; (cid:126)λ ) + p V ( (cid:126) q ) + (cid:88) i µ i N i ( (cid:126) q ) + · · · (cid:35) . (10)Here, the collection of thermodynamic and alchemical param-eters { β , (cid:126)λ , p , µ , . . . } deﬁnes a thermodynamic state . In the con-text of alchemical calculations, in which the thermodynamicstates vary only in their value of (cid:126)λ , these are also referredto as alchemical states . The free energy of mutating A to B in any environment (e.g., binding site, solvent) can then becomputed as ∆ G env = – k B T ln Z ( (cid:126)λ B ) Z ( (cid:126)λ A ) = – k B T ln (cid:82) Γ env exp (cid:16) u ( (cid:126) q ; (cid:126)λ B ) (cid:17) d (cid:126) q (cid:82) Γ env exp (cid:16) u ( (cid:126) q ; (cid:126)λ A ) (cid:17) d (cid:126) q .(11)While it is generally not feasible to compute the two partitionfunctions Z ( (cid:126)λ ), several estimators have been devised to ro-bustly estimate the ratio of partition functions in Eq. 11 (seeSec. 8.3) from a set of conformations usually collected withMD simulations from the thermodynamic states deﬁned at (cid:126)λ A and (cid:126)λ B and intermediates thereof. Why do alchemical calculations need unphysicalintermediate states?

While it is theoretically possible to estimate the ratio of parti-tion functions from samples collected only at states (cid:126)λ A and (cid:126)λ B ,the eﬃciency of the free energy estimators rapidly decreasesas the phase-space overlap between the two states also de-creases [68, 69]. Roughly, the phase-space overlap betweentwo thermodynamic states measures the degree to whichhigh-probability conformations (i.e., those with very nega-tive potential energy) in one state are also high-probabilityconformations in the other state (see Sec. 8.5 and Fig. 7).To solve the problem of having poor overlap between thestates of interest, multiple intermediate alchemical states areintroduced at values (cid:126)λ A = (cid:126)λ , (cid:126)λ , · · · , (cid:126)λ K = (cid:126)λ B so that eachpair of consecutive states (cid:126)λ k , (cid:126)λ k +1 share good overlap. Eachintermediate state models a ligand that is neither A nor B but a mix of the two. Many estimators (e.g., exponentialreweighting (EXP) [1] and Bennet’s acceptance ratio (BAR) [25,70]) can then be used to compute the free energy as ∆ G env = k B T K –1 (cid:88) k =0 ∆ f ( (cid:126)λ k , (cid:126)λ k +1 ) (12)from samples collected at all the alchemical states { (cid:126)λ k }, where ∆ f is the unitless free energy diﬀerence ∆ f ( (cid:126)λ k , (cid:126)λ k +1 ) = f ( (cid:126)λ k +1 ) – f ( (cid:126)λ k ) = – ln Z ( (cid:126)λ k +1 ) Z ( (cid:126)λ k ) . (13)While this strategy usually results in sampling thermodynamicstates whose Boltzmann distributions are very similar, thuscollecting information that is to some degree redundant,some estimators such as the Multistate Bennett acceptanceratio (MBAR) [26] can exploit similarities between states toimprove the precision of the estimates. This is achieved byusing the conformations sampled at all alchemical states { (cid:126)λ k }to compute the free energy diﬀerence ∆ f ( (cid:126)λ i , (cid:126)λ j ) between anypair of states i , j (see Sec. 8.3). How do absolute free energy calculations diﬀer fromrelative?

While absolute and relative free energy calculations havesubtle diﬀerences in their practical applications (e.g., use ofrestraints, handling of the standard state), the fundamen-tal ideas and concepts of relative free energy approachesremain unaltered in other types of alchemical calculations.Absolute binding, hydration, and partition free energies stilluse thermodynamic cycles that enable computing transferfree energies without actually simulating the physical transferfrom one environment to another.The main diﬀerence in these approaches lies instead inthe thermodynamic cycle to which this strategy is applied.For example, a typical thermodynamic cycle for an alchemical

LiveCoMS Best Practices Guideabsolute binding free energy calculation is represented inFig. 6. In this case, two independent calculations computethe free energy of removing the interactions between theligand and its environment in solvent or in the binding siterespectively through a series of intermediate states in whichthe energy terms are only partially deactivated.

When starting an alchemical free energy project, a key ﬁrststep is to decide whether free energy calculations are reallythe right tool. Particularly, count the cost of your project: Canyou even hope to tackle the problem with available resourcesand, if successful, will it be worth it in terms of human andcomputational cost?

Current alchemical free energy calculations involving smallmolecules seem to achieve, in favorable cases, root meansquare (RMS) errors around 1-2 kcal/mol depending on forceﬁeld, system, and a variety of other factors such as simulationtime, sampling method, and whether the calculations em-ployed are absolute or relative. A small selection of exampledatasets and case studies can be found in Sec. 11 at the endof this document. However, the domain of applicability is asigniﬁcant concern [37, 38], especially for relative calculations,which typically require a high quality and usually experimen-tal bound structure of a closely related ligand as a startingpoint. Additional factors such as slow protein or ligand rear-rangements, uncertainties in ligand binding mode, or chargedligands can make these calculations far less reliable and moreof a research eﬀort.It is worth noting that the accuracy of free energy cal-culations is highly variable across diﬀerent protein targets,and likely across diﬀerent ligand chemotypes as well. Forinstance, FEP+ with OPLS3 achieves an RMSE of 0.62 kcal/-mol for a set of 21 compounds binding to JNK1 kinase, butan RMSE of 1.05 kcal/mol for a set of 34 compounds bind-ing to P38 α kinase [71]. Furthermore, perturbations for thesame chemotype in diﬀerent pockets of the BACE enzymegave varied errors [72]. Here the errors refer to the diﬀer-ence in ∆ G derived by ﬁtting the ∆∆ G ’s to the known ex-perimental binding free energies [15]. This prompts us toconsider another important aspect. It is important to be clearon what error to report: ∆ G after shifting by a constant tominimize the RMSE, unshifted ∆ G , ∆∆ G of computed edges,or ∆∆ G of all edges. (See recommendations for reportingbest practices, Sec. 8.7.) Additionally, as it is possible to per-form calculations on a set of ligands using diﬀerent pairwise comparisons of molecules, the performance of the methodmay be biased based on which pairs of comparisons are per-formed. Additionally, it is possible that the error associatedto the relative free energy between a two ligands that wasnot directly computed (however can be deduced using ther-modynamic paths involving other ligands) will likely be moreuncertain https://github.com/jchodera/jacs-dataset-analysis.Given the need to understand the performance of the systemwith alchemical free energy calculations, we recommend thatretrospective studies for a particular target and a particularchemical series be performed for each application case. Finite computing resources necessarily limit the generatednumber of uncorrelated samples of potential energy surfaces,and therefore alchemical free energy calculations only givefree energy estimates to within ﬁnite precision. An importantconsideration is how reproducible alchemical free energy cal-culations are in practice. In simple cases such as absolutehydration free energies of small organic molecules, or relativehydration free energy calculations between structurally simi-lar small organic molecules, it should be possible to obtainhighly precise estimates with a given software package (witha sample standard deviation under 0.01 kcal/mol) [73]. Formore complex use cases such as protein-ligand binding freeenergies the repeatability is often substantially worse [73]. Agood practice is to perform two or three runs of the sameperturbation to assess precision with a given protocol. Thesample standard deviation will give a crude estimate of thereliability of the estimates, and whether the precision is suﬃ-cient for the problem at hand. When practical, a more strin-gent test is to use diﬀerent input coordinates for each repeatrun.Note that these issues concern calculations carried outwith a single software package, but simulation package vari-ations can introduce additional issues. Such issues of repro-ducibility of free energy calculations across diﬀerent simula-tion packages have attracted attention recently. Greater vari-ability is expected due to methodological diﬀerences such asintegrators, thermostats, barostats, treatment of long-rangeelectrostatics, and potentially other factors. For absolute andrelative hydration free energies of small organic moleculesa variability of ca. 0.2 kcal/mol between popular simulationpackages has been reported [50]. In the recent SAMPL6 SAM-PLing challenge a larger variability of 0.3 to 1.0 kcal/mol wasnoted in the computed absolute binding free energies ofhost/guest systems even though the study sought to useidentical input and simulation parameters [73] and, in manycases, single-point energies were identical or nearly so. Fur-

LiveCoMS Best Practices Guidether work is needed to ensure reproducibility of alchemicalfree energy calculations across diﬀerent software implemen-tations to guarantee that force-ﬁeld development eﬀorts leadto transferable potential energy functions.

Before even planning free energy calculations to study bind-ing to a particular target, it is important to assess what isknown about the system and its timescales and its suitabilityfor free energy calculations, as well as the purpose of the cal-culations and the amount of available computer resources.In some cases, predicting accurate binding free energies fora particular target might be more challenging than simplymeasuring them! This is often the case when dealing withdatabase screening problems, where compounds might beeasily and quickly available commercially for testing and freeenergy calculations could consume far more resources. Freeenergy calculations thus typically only appeal when (slow orcostly) synthesis would be required or experiments are other-wise cost-prohibitive.Sometimes free energy calculations can provide answersthat are not readily available from experiments. For example,type II kinase inhibitors selectively bind to diﬀerent kinases inthe so-called DFG-out conformations [74]. The selectivity ofsuch inhibitors may be attributed either to their diﬀerentialbinding to diﬀerent kinases in the DFG-out conformations, orto diﬀerent stability of the DFG-out conformations of diﬀerentkinases.Let K C be the equilibrium constant between DFG-in andDFG-out conformations of one kinase, and K ∗ D be the disso-ciation constant of a type II inhibitor against this kinase, theapparent binding constant of this inhibitor against this kinaseis then K D = K ∗ D K C K C (14)Since binding experiments cannot resolve K ∗ D and K C indi-vidually, such experiments cannot address the basis of selec-tivity of the type II inhibitors. Absolute binding free energycalculations, in contrast, can take advantage of the slow kinet-ics of DFG-in/out conversion, and estimate the conformation-speciﬁc binding constant K ∗ D , thus yielding clues as to thesource of selectivity. The requisite level of accuracy is another important considera-tion. If the goal is to guide lead optimization when many com-pounds will be synthesized, free energy calculations can beappealing even with accuracies in the 1–2 kcal/mol range [75],but if the number of compounds to be synthesized is very small, this accuracy may not be enough to provide muchvalue.Here we provide a simple estimate of the value providedby alchemical free energy calculations in lead optimization.Let P ( ∆∆ G ) be the probability distribution of the changes inthe binding free energies of a new set of molecules duringone round of lead optimization, and let P ( ∆∆ G † | ∆∆ G ) bethe conditional probability of the binding free energy changecomputed by the free energy calculations, ∆∆ G † , given theactual change ∆∆ G . The latter conditional probability can bemodeled by a normal distribution P ( ∆∆ G † | ∆∆ G ) = 1 √ πσ exp (cid:32) – ( ∆∆ G † – ∆∆ G ) σ (cid:33) , (15)where σ signiﬁes the accuracy of free energy calculations.Here we assume that there is no systematic bias in the freeenergy calculations, i.e., on average, the free energy changecomputed by free energy calculations agrees with the actualfree energy change.In lead optimization guided by free energy calculations, wewill likely only synthesize and experimentally test moleculesthat are predicted to have favorable free energy changes. Weare thus interested in how often that a molecule predictedto bind stronger actually turns out to bind stronger. In otherwords, we are interested in the conditional probability: P ( ∆∆ G < 0| ∆∆ G † < 0). (16)For illustrative purposes, consider a proposed set of newmolecules, and assume that the changes proposed in thesemolecules yield a set of relative binding free energies that fol-low a normal distribution. That is, assume that the standarddeviation in the relative binding free enrgies for the changesrepresented is RT ln 5 (corresponding to a 5-fold change inthe binding aﬃnities), and that 1 in 10 new molecules have in-creased binding aﬃnity ( ∆∆ G ≤ σ = 1 kcal/-mol, P ( ∆∆ G < 0| ∆∆ G † < 0) = 0.35, which means that outof every 10 molecules selected for predicted favorable freeenergy change, on average 3.5 molecules will have actualfavorable free energy change. In other words, selection byfree energy calculations yields 3.5 times more molecules ofimproved aﬃnities than selection without free energy calcula-tions under these assumptions.Available computational resources and timescales of mo-tion also factor into this initial analysis. An individual freeenergy calculation involves simulations at many diﬀerent in-termediate states (perhaps 20-40 or more) and each of thesemust typically be long enough to capture the relevant motionsin the system. If such motions are microsecond events orlonger, the computational cost of running 20-40 microsecond LiveCoMS Best Practices Guideor longer simulations for each of N ligands will likely be pro-hibitive for most users with today’s hardware. Alternatively,if key motions are fast and minimal (as is often assumed inpractice), only much shorter simulations may be necessary. Furthermore, are available computational resources suﬃcientthat throughput will be reasonable compared to needs of ex-perimental collaborators working on this system? How manyligands ( N ) can you aﬀord to handle given your computationalresources? As cloud computing becomes more available, in-house GPU clusters may not be necessary if calculations arenot run on a regular basis. This analysis should be done upfront as part of “counting the cost” of involvement in a partic-ular project. In some cases, the analysis may conclude thatfree energy calculations will not be feasible for the proposedproblem. Here, by “cost”, we refer not just to ﬁnancial cost ofthe calculations relative to experiments, but also time – canthe calculations be run faster than experiments are done?How will the relevant resource and opportunity costs factorin? Both computation and experiment require human time,supplies (of diﬀerent sorts), and equipment. In the extremelimit, for example, it would not make sense to spend a monthrunning a binding free energy calculation if the equivalentexperiment could be done in a day with resources already onhand. Such issues should be considered before deciding toconduct binding free energy calculations. An additional consideration is how much is known aboutyour particular target, ligand binding modes in the target,and any relevant motions – essentially, has it been studiedenough to know whether it might be suitable for free energycalculations? It is important to know if the system has hardlybeen studied, because should the initial calculations performpoorly, the eﬀort may turn into an attempt to understand therelevant sampling, force ﬁeld, or system preparation prob-lems.If you are unsure whether your project is feasible, as men-tioned above, one recommended option is to conduct a shortexploratory study to assess tractability for a small number ofligands. This can be suﬃcient to get an initial idea of feasibilityand accuracy of the calculations for the proposed target [36].

Many practitioners expect alchemical methods to providevaluable guidance for drug discovery, and to exhibit accu-racy superior to most alternative approaches for suitabletargets [76]. Successful application in industry may require considerable knowledge of the “domain of applicability” offree energy calculations – where they work well and wherethey will not [38]. Successful application also requires robustprotocols for preparing, submitting and analysing alchemi-cal calculations. In this regard, the issues mentioned in theprevious section such as understanding the suitability andtimescales to capture the structure activity relationships (SAR),and performing up-front tests of performance are all relevantto drug discovery applications. Without venturing too far intodetails of system setup, which is beyond the scope of thisarticle, we highlight some critical factors aﬀecting accuracyand successful application.

The calculations aim to capture the alchemical change fromone ligand to another as accurately as possible. Therefore,it is necessary to consider details of the experimental setup,such as pH. Biological assays are usually run at neutral pHbut this is not always the case. For example, some enzymesexhibit pH-dependent activity and assays may thus be donein conditions other from neutral pH. Therefore, computa-tional protein and ligand preparation protocols should reﬂectexperimental pH.The formal charge and/or tautomeric state of the smallmolecules can change within a series of analogs, necessi-tating care in treatment. Additionally, medicinal chemistryeﬀorts might deliberately modify the pKa of a series to modifydrug properties, requiring explicit eﬀorts to incorporate thesechanges into alchemical calculations.To ensure modeling matches experiment, we also needto accurately prepare and simulate the same system – whichrequires understanding what protein construct is used inthe bioassay. For instance, does the X-ray structure that isto be used for the calculations match the construct usedfor screening (i.e. only the catalytic domain vs. full length,monomer vs. dimer, etc.) [77]? Also, were certain co-factorsor partner proteins required in the bioassay?

As also mentioned, good performance of alchemical calcula-tions requires an accurate representation of the ligand bind-ing mode, usually from a high quality X-ray crystal structure.If more than one structure is available, the modeler shouldpay attention to choose the most suitable. The quality of thestructure can be a concern, and the reader is referred to workof Warren et al. for a detailed discussion of choosing optimalstructures for structure-based modeling [78].It is also useful to study the structure activity relationshipand understand the expected impact of any mutations onthe binding site, such as whether side chain movement inthe protein will be required, and whether there is evidence

LiveCoMS Best Practices Guideof this in any alternative X-ray structures of the same protein.Often, only one protein and water conﬁguration is used for aseries of alchemical calculations, so this needs to be capableof accommodating the smallest through to largest ligands ina way that allows stable and well behaved simulations. Thiscan provide a practical limit on the alchemical changes thatare feasible, though a simple work-around can be to separatecompounds into sub-series for diﬀerent calculations.If multiple structures are available there is some evidencethe higher aﬃnity complex can give better performance [79],at least in some cases. However, ligands and proteins can alsoundergo unexpected changes in binding mode for relatedligands, which can make these issues more complex to dealwith [16].

In a drug discovery setting it is normal to consider dozens(or more) of ligands and it is necessary to align them in thebinding site. There is no detailed study of how diﬀerent align-ment approaches may aﬀect results, but the user should beaware of some practical considerations. Tools are available tocompare the ligands and build the hybrid topologies that de-ﬁne the changes between one ligand and another [34, 80, 81].In simple terms, providing poor alignment to these tools willmake this job harder. Docking with restraints is often beneﬁ-cial in this regard. Particularly, ﬁxing the 3D spatial positionof the scaﬀold using maximal common substructure (MCSS)restrained docking can help provide well aligned input forthe topology generation. Nevertheless, in this case carefulattention is still needed to ensure consistency of alignmentfor identical substituents. Another alternative is to manu-ally edit the same core and add/modify the changing sub-stituents. This provides assurances that coordinates for thenon-perturbed portion of the structure remain identical andaromatic substituents, for instance, have consistent dihedralangles. However, it is not feasible for many compounds andtherefore automation is desirable.Finally, the role of water in ligand binding is not alwayswell understood and it can be crucial to capture the changesin binding site solvation during ligand binding. Can crystal-lographic waters be retained? Do they clash with some ofthe larger ligands used in the alchemical perturbation? SeeSec. 6.1 for diﬀerent strategies that can be applied to dealingwith waters. Generally, before launching large numbers ofalchemical free energy calculations it is always recommendedto test the system using classical MD simulations and limitednumbers of alchemical perturbations. Metrics such as ligandand protein RMSD and RMSF can be inspected, along with vi-sual inspection of simulations, to ensure the system is stableand likely to be suitable for alchemical calculations.Running binding free energy calculations in a drug discov- ery application will typically require the use of software ortools to facilitate the large number of calculations. Commer-cial implementations such as FEP+, OpenEye Tools, or Flareallow for a fast setup and deployment to GPU hardware inminutes, but may have limited ability to customize calcula-tions [15, 19]. Commercial tools can be expensive in somecases, but non-commercial tools are becoming more straightforward to use to run alchemical free energy calculations [17–19, 34, 80–82].For relative free energy calculations, various graph topolo-gies or maps of calculations are possible, and choices maydepend on the target application. For instance, if the goal is toaccurately assess the relative binding energy of a small num-ber of compounds, possibly with challenging synthesis, themap of perturbations should contain as many connectionsbetween compounds as aﬀordable. However, when runningcalculations on hundreds of compounds a so called star-map (see Fig. 5 A ) can be used that just contains one connectionper compound: perturbing every compound to a central lig-and, typically the crystal structure ligand [83]. In this way thetop-ranking examples can be readily identiﬁed and submittedto additional calculations in a second round. Alternatively, ifthe goal is to achieve the smallest possible error with mini-mal computational expense, certain graph topologies providebeneﬁts [84, 85] For prospective drug discovery applications there are severalother considerations including understanding likely errorsand taking selection bias into account.It is crucial when proposing compounds for synthesis tohave some idea of the underlying error or uncertainty in thepredictions. A retrospective assessment can give an indica-tion of prospective performance for similar molecules [86].Beyond this, several parameters provide useful indicatorsof performance. For example error estimates provided byfree energy estimators that are too large can highlight poorlyconverged simulations [79]. Hysteresis, within cycles in theperturbation network or between forward and backward per-turbations can be checked [87] to indicate problematic per-turbations involved in cycles connecting many compounds(See also Secs. 7.1.1 and 8.5). Once synthesis and testing ofcompounds is complete a standard strategy is to look back athow the calculations performed. In this regard it is importantto consider the issue of selection bias upfront. It is temptingto only synthesize the compounds predicted to be most ac-tive, thus a narrow range of calculated activity is tested thatimposes limits on the statistical assessment of performance,ideally example molecules from across the range of predicted

10 of 48

LiveCoMS Best Practices Guideactivity can be assessed or corrections can be applied basedon previous recommendations [88]. For a more detailed dis-cussion on checking the robustness of your alchemcial freeenergy calculation see also Sec. 8.5.In summary, the successful use of alchemical calculationsnot only in, but particularly for drug discovery requires work-ing in the domain of applicability, using a high quality X-raystructure of the target bound to compounds in the series,and testing the approach retrospectively to ensure the sys-tem setup is well-behaved. Always assess your conﬁdencein the resulting predictions and communicate this when dis-cussing with experimentalists. Consider performing repeatcalculations for at least some of the perturbations in thestudy. There are many accounts of success of alchemical cal-culations, the methods show good performance towards thegoal of binding energy prediction. However, it is important tohave realistic expectations.Structure based drug design projects are often capable ofimproving potency relatively quickly, even with only limitedapplication of computational approaches and the range ofactivity narrows to just two-to-three log units. It may seemhard to have impact with substantially diﬀerent, more potent,stand-out compounds in this scenario, but binding energyprediction can still be extremely useful for ensuring activity ismaintained as other properties are optimized. An interestingcost beneﬁt analysis has shown the value of activity prediction,see discussion above and articles such as [75]. From a drugdiscovery point of view, alchemical calculations are expandingtheir domain of applicability, and there are reports of successusing homology models [89] and GPCRs [90, 91] for instance,as well as enabling charge change and scaﬀold hopping [92,93], but these systems are undoubtedly more diﬃcult. Inthe meantime, the use cases are expanding to resistanceprediction, selectivity prediction , solubility prediction – anexciting future for alchemical calculations [6, 94, 95].

Alchemical free energy protocols as discussed below (Sec. 7)are deﬁned for a speciﬁc type of free energy calculation, i.e. afree energy of binding or a free energy of hydration. Diﬀer-ent types of simulations require diﬀerent choices for ligands,solvent, and host molecules (in the case of the estimation offree energies of binding).

In principle, in the limit of suﬃcient conformational sampling,the free energy changes estimated from an alchemical free en-ergy calculation should be independent of the system’s initialcoordinates. However, in practice, because simulations are ofﬁnite duration (typically 1-100 ns per state at present), this is only true for certain classes of alchemical free energy calcula-tions such as relative or absolute free energies of hydrationof small and relatively rigid organic molecules. Protein-ligandcomplexes typically exhibit slowly relaxing degrees of free-dom that signiﬁcantly exceed the duration of an alchemicalfree energy calculation, and host-guest calculations can besusceptible to these issues as well, depending on timescaleand system. It is therefore generally important to carefullyselect input coordinates to obtain satisfactory results. Thefollowing questions may be relevant before diving into thesimulation setup. • Do I have one or multiple good receptor structures? (e.g.a good resolution X-ray crystal of the protein target) • Do I have information on one or all of the ligand bindingsites (e.g. a X-ray structure) • Should I include buried waters, or other small moleculesthat can be found in an X-ray structure? • Are my ligands part of a congeneric series? (i.e. simpleR group substitutions around the same scaﬀold)

Are there good X-ray structures available?

As with any simulation, care should be taken in selecting avail-able X-ray structures in the Protein DataBank [96]. In somecases it may be wise to choose multiple starting structures toaccount for variability in receptor conformations as well asthe accuracy of available X-ray structures. Typically, clusteringof receptor structures can be used to identify diﬀerent recep-tor conformations near the binding site, as well as assessingrelevant side chain placements from the X-ray structure, seefor example [16]. In terms of set up and other choices, follow-ing general best practice guidelines is advisable [40].Many free energy calculations focus on a congeneric seriesof ligands, which can make these calculations suitable for rela-tive free energy protocols (see Sec. 7). For relative calculations,some care has to be taken selecting binding poses for theseligands. Generally, a common assumption for a congenericseries is that the binding mode is conserved. Therefore, if anX-ray structure of one of the ligands is available, this shouldbe used to position the ligands in the putative binding sitein an energetically reasonable conformation without stericor electrostatic mismatch with the receptor. Checking theX-ray structure versus the experimental electron densitiesis important, as the position of part of the ligand or impor-tant sidechains may be based on the interpretation of thecrystallographer rather than the available electron density,especially in cases of missing density. For example, lookingat a cyclohexane ring density, a chair conformation is vastlymore likely than that of a boat and, if a boat conformationis present in the structure, it may be worth inspecting thedensity to ensure it adequately supports this choice.

11 of 48

LiveCoMS Best Practices Guide

Are you prepared to deal with any binding modechallenges?

Generally, binding modes within congeneric series are con-served [97], however, exceptions exist [98, 99], as discussedin more detail in Sec. 7.2.6. Certain functional groups may beparticularly prone to this due to symmetries or near symme-tries. One such issue involves a 180 degree ﬂip in the dihedralangle of an aromatic ring, or ﬁve-membered ring leading toa diﬀerent spatial position of ortho- or meta- substituentsthat otherwise should overlap within a series. The 180 de-gree ﬂip of the ring may not occur enough during simulations(due to steric obstructions) to overcome bias due to the start-ing conﬁguration. Another scenario may be equatorial andaxially substituted saturated rings (e.g. cyclohexane deriva-tives). This situation may be addressed by explicitly modellingdiﬀerent binding modes of the same ligand and combininglater computed free energy diﬀerences for diﬀerent bindingmodes into a relative free energies of binding [100].

Have you considered sterioisomers and enantiomers?

Congeneric series can contain stereoisomers or enantiomerswhich can bind very diﬀerently, resulting in large errors iftreated incorrectly. For racemates, the relative abundanceof each stereoisomer is normally not known. Therefore,the experimental activity associated with just one stereoiso-mer/enantiomer is more uncertain. However, the modelingtypically uses just the bioactive conformation that best ﬁts theactive site. Clearly this introduces potential for larger errorscompared to experiment. Nevertheless, if all compounds inthe congeneric series are racemic, originating from similarsynthetic procedures with an expected similar abundanceof stereoisomers, then the diﬀerences may cancel and thetrend in calculated and observed binding energies may be ro-bust. Despite this, we can see that care and further testing isneeded in this scenario, and the quality of the predictions maysuﬀer. Additionally, unexpected changes in what stereoiso-mer binds experimentally, if they occur, could pose signiﬁcantchallenges for modelling eﬀorts.

Conserved binding site waters can play an importantrole in binding free energies

Binding site water molecules may form water mediatedprotein-ligand interactions which can pose challengeswhenever exchange with bulk water is slow compared tosimulation timescales. This happens typically in buriedbinding sites. Overlaying multiple protein X-ray structurescan identify conserved or additional water molecules thatcan be useful to include in calculations. In cases wherewater molecules are known to play an important role in thebinding, software implementations that use water samplingfacilitated by Grand Canonical Monte Carlo methods may be useful, i.e. consider pre-solvation methods such as GCMCsteps [101]. Other tools such as WaterMap or open sourceequivalents (SSTMap, GIST, and others) can be used to deﬁnewater structure for systems with no experimental evidenceof water sites [102]. Well known protein systems with watermediated ligand interactions are for example: HSP90 whichformed part of the D3R grand challenge 2015 [16], A2A [103],MUP [104], [90], and others [105].

Protonation states depend on the pH of theexperimental assay

Care should be taken when preparing ligands and proteinsto match the pH of the experimental assay, if known. Asmentioned above in Sec. 5.1, the pH of the assay can diﬀerfrom neutral pH and will determine the protonation states ofthe proteins and ligands. Since the pKa of reference aminoacid sidechain residues is known, but can vary in the pro-tein environment, many diﬀerent tools have emerged forpredicting sidechain pKa in proteins, such as the H++ server,ProPKa, APBS, and Maestro [106–109]. Strongly acidic (Glu,Asp) or basic (Arg, Lys) sidechains can reliably predicted tobe ionised, but care is still needed as the local environmentcan modify expected ionization states (for instance the cat-alytic Asp dyad in proteases). Histidine is notoriously morediﬃcult to predict as its pKa suggests it ionizes closer to theexperimental pH range. For ligands often the pKa needsto be determined, if not known experimentally. There aremany diﬀerent available tools for this purpose, but com-mon choices may be propKa [107, 110], Chemicalize (https://chemicalize.com/welcome), or Maestro [109]. Still, accuratepKa prediction for small molecules remains a challengingproblem, even with dedicated tools [111]. While often it canbe assumed that the protonation state of a ligand and proteinwill remain the same as a ligand binds, some care needs to betaken with systems where the protonation state may changeupon binding [112]. BACE, for example, famously undergoesa protonation state change on ligand binding.

Congeneric series often need alignment

Input coordinates for a congeneric series may be generatedby docking calculations, or by ligand alignment using MCSSalgorithms. The latter tends to produce alignments that aremore conserved and more consistent free energy changesacross a dataset, but will struggle to yield reasonable resultsfor relative binding free energy calculations that involve a sig-niﬁcant binding mode rearrangement. This may also lead tosteric clashes with the receptor coordinates of the referenceligand if structural rearrangements are needed to accommo-date diﬀerent members of the congeneric series. Small stericclashes may be resolved during subsequent simulation equi-libration prior to data collection, but there is a risk that the

12 of 48

LiveCoMS Best Practices Guidecomplex relaxes to an alternative metastable state.An additional consideration arises for single topology rela-tive free energy calculations. In this class of alchemical freeenergy calculations it is necessary to generate a moleculartopology that may describe the initial and ﬁnal states of theperturbation (see Fig. 3). In cases where the end-states havehigh topological similarity and high structural overlap thisis relatively straightforward and typically handled by use ofMCSS calculations. In situations where the end-state topolo-gies diﬀer signiﬁcantly, or there is relatively little spatial over-lap between the two end-states some user intervention maybe necessary to produce a satisfactory input topology.If the binding site location is uncertain but the structureof the receptor is well deﬁned and plausible binding sites areidentiﬁed, it may be more useful to choose an absolute freeenergy protocol to compute the standard free energy of bind-ing of the ligand to a set of binding sites. This requires theuser to prepare input ﬁles describing the bound conformationin diﬀerent putative binding sites [113]. The apparent bind-ing free energy of the ligand may be obtained by combiningthe individual binding site free energies, which also indicatewhere the ligand is more likely to bind. In this case a dockingprogram can generate initial structures. Diﬀerent commer-cial and none commercial tools are available, such as rDock,Autodock Vina, Glide, or Flare, to name a few [19, 114–116].If the putative binding sites are not apparent, for instancedue to signiﬁcant induced-ﬁt eﬀects, it may be challenging toobtain meaningful free energies of binding. One may have toaccount for the free energy cost of forming a binding site inthe target receptor which may not be feasible on alchemicalsimulation timescales.

Preliminary considerations necessary for using free energymethods to compute partition coeﬃcients are generally morestraight forward. For example, a 3D minimised structureof the solute can be generated with a simple tool such asopenBabel and solvated to prepare the input to computea free energy of hydration [117]. However, in these casesa careful choice of forceﬁeld, as well as water models ororganic solvents is essential. See for example [3, 4] for a gooddiscussion of these choices. And, while sampling problemsmight seem to be a non-issue for small molecules, this is notalways the case; e.g. even the hydroxyl orientation on neutralcarboxylic acids can occasionally pose a challenge [118, 119].

Alchemical free energy calculations can be grouped into twomain categories, “absolute” (see Fig. 6) and “relative” (seeFig. 2), which diﬀer in whether they compute properties for asingle molecule (absolute) or compare properties of diﬀerent,usually closely related, molecules (relative). To use binding asa concrete example, in absolute binding free energy calcula-tions, we compute the binding free energy of a ligand to anindividual receptor relative to a standard reference concen-tration.In contrast, in relative binding free energy calculations, wecompare the binding free energy of two related inhibitors todetermine the potency diﬀerence. Many of the issues around simulation setup and protocolchoice for alchemical calculations are common, but thereare some diﬀerences between absolute and relative calcula-tions. We will consider protocol diﬀerences before treatingthe common elements.

A critical ﬁrst step in relative calculations is to select an ap-proach to these calculations, determining whether to use a dual topology , single topology , or hybrid topology approach torelative calculations.The distinction between these can be illustrated by con-sidering a hypothetical transformation from molecule A tomolecule B, where both atoms share a common substructurebut diﬀer in their substituents; e.g. consider a transformationof benzene to benzyl alcohol Fig. 3. In this case the commonsubstructure is the benzene ring, though the substructuremay be larger depending on how it is deﬁned, as we discussbelow.In single topology calculations, the overall transformationis set up to involve as few additional atoms as possible, sobenzene would be typically changed into benzyl alcohol bychanging one of the hydrogens into a carbon. This site willalso be the future home of two additional hydrogen atomsbound to the new carbon, so these must initially be presentas non-interacting atoms called “dummy atoms”, which retaintheir bonded interactions but do not interact with the restof the system. Bond parameters as well as partial charges The distinction is a bit of a misnomer, since both compute ratios of partitionfunctions relative to another state and in that sense are relative, while neithercomputes an absolute free energy.

13 of 48

LiveCoMS Best Practices Guidebetween the changing atoms are adjusted accordingly be-tween the initial and ﬁnal state. Thus, in a single topologycalculation, atoms may change their type, ensuring minimaldummy atoms are created. This is illustrated in the left armof Fig. 3.In contrast, in a dual topology alchemical free energy cal-culation, no atoms are allowed to change type [43]. Thismeans that the benzene to benzyl alcohol transformation in-volves starting with benzene plus the non-interacting dummyatoms making up the hydroxy methyl group, then passingthrough an intermediate state where some atoms are partiallyinteracting— particularly, those atoms which are becomingdummy atoms or ceasing to be dummy atoms [120]. Thetransformation ﬁnally culminates in a state where benzyl alco-hol is present along with the additional dummy atom whichwas previously a corresponding hydrogen of the benzene.Fig. 3’s right branch depicts how such a dual topology works.Hybrid topology calculations have seen much less use butessentially consist of two absolute free energy calculations inopposite directions at the same time, turning one molecule’sinteractions with the environment oﬀ, while turning the othermolecule’s interaction on [121, 122].Most existing software implementations currently use sin-gle or dual topology approaches; for example, AMBER TI usesa dual topology approach, while BioSimSpace uses a singletopology approach. Please make sure to check what approachis used with your software package of choice, or whether itsupports your choice of approach (GROMACS, for example,supports both). To our knowledge eﬃciency diﬀerences havenot been thoroughly explored, though conventional wisdomsuggests that fewer dummy atoms are better, as introducingor removing atomic sites is usually more diﬃcult, requiringmore intermediate steps [75, 123].

Atom mapping

Once a particular approach to the topology is selected, a cru-cial next step is to identify the common atoms which willnot be perturbed. Rigorously, this process typically com-prises a MCSS search of the molecules involved to identifythe common substructure—though the parameters of theMCSS search will diﬀer depending on whether single or dualtopology calculations are planned. Speciﬁcally, with a singletopology approach in mind, atom types are allowed to change,so a permissive MCSS search can be done, whereas with dualtopology a more strict search is required.There are diﬀerent tools that allow the generation of MCSSmatches as well as single topology input. A large number ofsoftware tools can compute MCSS matches using diﬀerentcheminformatics packages. Some rely on RDKit [124], suchas LOMAP [123], FESetup [80] and partially BioSimSpace [34], HM H DuMDu DuDu Du Du DuDu DuHM M M M M MM MM Du

Figure 3. Two common topologies for alchemical calculations:single and dual topology. Left : A single topology uses softcore po-tentials to convert from one type of atom to an other. Dummy atoms(Du) are used when there is no corresponding maximum commonsubstructure match between the two molecules for certain atoms.

Right

LiveCoMS Best Practices Guide

Figure 4. Illustration of maximum common substructurematches

MCSS is shown in green for when ( A ) a restrictive MCSSmatch is used and in ( B ) ring breaking is allowed. while others such as fkckombu [125] are standalone tools.Schrödinger’s FEP+ planning tool was originally based on aversion of LOMAP, and it also uses MCSS matching as wellas 3D considerations to plan the network of single topologycalculations between molecules [15].MCSS searches can be relatively time consuming, so ifthe goal is to assess a library of ligands to identify promisingpairs for relative calculations, it can be helpful to use fasterapproaches such as shape similarity to perform an initialsimilarity assessment and then use MCSS only to identifyﬁnal mappings for relative calculations [126–128]. The MCSSapproach, though relatively standard, takes into account onlytopological similarity. It is possible that changes in bindingmode could actually require a diﬀerent choice of mapping, soin some cases mappings may need to be planned diﬀerentlydepending on 3D positioning of atoms in space.Single topology relative calculations, and calculationsbased on substructure searches, only work if in fact theligands share a common substructure, e.g. are part of acongeneric series, see Fig. 4. If no common substructure isshared, then alternative dual or hybrid topology free energycalculations are needed, where one would co-localize a pairof compounds in a binding site, exclude their interactionswith one another, and compute the relative binding freeenergy by turning one molecule on from being dummy atomswhile turning the other oﬀ. To our knowledge no generalpipeline for such calculations yet exists and this would likelyremain a research problem. Using an absolute free energyapproach instead seems more promising in such a case. Ring breaking and forming.

Relative free energy calculations for ring breaking and form-ing are particularly challenging/problematic (see Fig. 4 B ), inpart because relative calculations rely on the free energy con- tributions of dummy atoms canceling between diﬀerent legsof the thermodynamic cycle, which may not be true wheneverdummy atoms are involved in rings [129]. Some approacheshave attempted to address this [130] but a general solutionis not yet in mainstream use, though FEP+ implements onesolution. Perturbation maps

Based on the input ligand series, a perturbation map or net-work can be planned. Recent heuristics have shown the moreconnected the perturbation network the better, however,there is a way to optimize network structure while minimizingthe number of perturbations that need to be computed re-ducing the resulting computational cost [84, 85]. Sometimesthe introduction of intermediates that are not part of the orig-inal congeneric series are essential to avoid ring breaking, ordeal with perturbations that would otherwise result in largenumbers of atoms being inserted or deleted. Some commer-cial tools have good underlying heuristics but may fail withcomplicated input, needing user validation in particular whendealing with chiral compounds.In some cases, during the lead optimization stage, or forvery large datasets that would beneﬁt from rougher initialfree energy ranking, or in cases where perturbations wouldbe rather large a star shaped network as seen in Fig. 5 A isused. However, adding redundancy into the network meansthat a better error analysis can be carried out, by looking atcycle closure errors as discussed in sec. 8.5, with an examplegiven in Fig. 5 B .Methods in experimental design have been applied tothe construction of the perturbation maps. Yang et al. [84]optimized the perturbation map by selecting a ﬁxed num-ber of calculations from the pairwise perturbations so thatthe resulting set of calculations minimize the total variance.Xu [85] optimized the perturbation map by allocating diﬀerentamounts of simulation time to diﬀerent pairwise perturba-tions so as to minimize the total variance, given the totalsimulation time of all the perturbation calculations. Both ap-proaches lead to substantial reduction in the statistical errorof the estimated free energies. Constraints and relative free energy calculations

One issue which requires particular care is the use ofconstraints. Commonly, bonds involving hydrogen areconstrained to a ﬁxed length using algorithms such asSHAKE or LINCS, allowing the use of longer timesteps [131].However, in single topology relative free energy calculations,the atoms involved might be mutated to other atom types– for example, in a mutation of methane to methanol, onehydrogen might become an oxygen atom. Typical moleculardynamics engines are not set up to recognize this change,

15 of 48

LiveCoMS Best Practices Guide AB Figure 5. Examples of perturbation networks ( A ) Star shaped net-work with the crystal structure in the center. ( B ) Network with cycleclosures (see more on this in Sec. 8.5). Arrows indicate the directionof the perturbation. Fully converged binding free energy calculationsyield binding free energy changes which sum to zero around anyclosed cycle. However, in practice errors may not sum to zero aroundclosed cycles, providing a way to look for potential sampling prob-lems. Here in (B), green cycles indicate cycles with hypotheticallygood cycle closure, red those with poor cycle closure. The red arrowindicates a poorly converged simulation that would give rise to badcycle closures. The diamond indicates the use of a crystallographicbinding mode. Figure 6. Thermodynamic cycle required for an absolute freeenergy calculation – absolute free energy of binding example

The fully interacting ligand in water ( A ), has its charges turned oﬀ topass to ( B ) followed by turning of van der Waals terms, resulting in anon-interacting ligand in water in ( C ). Restraints are used on the fullyinteracting ligand in the binding site of a protein or host molecule ( D ).The next step is to turn oﬀ the charges again ( E ) followed by the vander Waals interactions resulting in a non-interacting complex state( F ). Free energyes can be computed as ∆ G bind = ( ∆ G elecsolv + ∆ G VdWsolv ) –( ∆ G elecbound + ∆ G VdWbound ). or at least not to correctly include contributions to the freeenergy from changing constraints/constraint length, soresults for a transformation would usually be erroneous. Atpresent the most general solution to this problem is simply toavoid the use of constraints (and thus use a smaller timestepif necessary, usually of around 1 fs) in any relative free energycalculation involving a transformation of a constrained bond,as done by GROMACS. Absolute free energy calculations involve completely remov-ing the interactions between the ligand or solute and its envi-ronment, taking it to a non-interacting state that may or maynot retain intramolecular non-bonded interactions. This non-

16 of 48

LiveCoMS Best Practices Guideinteracting state can then be shifted between environments(from the protein to water, or from one solution to another)without changing its free energy other than that due to thechanging volume of the simulations, and then interactionscan be restored in the new environment.Absolute free energies are by deﬁnition reported withrespect to a speciﬁc reference or standard state, which eﬀec-tively determines the arbitrary point at which the free energyis 0. The role of the standard state is particularly evident fromthe expression of the binding free energy between a receptor R and ligand L ∆ G = – k B T ln (cid:0) c ◦ K b (cid:1) = – k B T ln (cid:18) c ◦ [ RL ][ L ][ R ] (cid:19) . (17)Here, the reference state concentration c ◦ converts the bind-ing constant K b into a dimensionless quantity expressed inreference concentration units. It should be noted that ignor-ing the term c ◦ is equivalent to assuming a reference concen-tration of 1 D –1 , where D are the units used to express K b ,and would thus cause the value of ∆ G to vary with the choiceof the units. It is convenient to deﬁne a standard state at aconstant pressure of 1 atm and where each chemical species(i.e., A, B, and AB) in the reaction solvent has a concentrationof c ◦ = 1 M = 1 molecule/1660 Å but do not interact withother molecules of R , B , or RL . Handling the standard state in absolute free energycalculations.

For solvation free energy calculations, handling the standardstate is typically straightforward, and treating it correctly sim-ply means ensuring that the non-interacting solute is takento the same (or equivalent) ﬁnal reference state in both en-vironments, e.g. that the transformation involves a 1 M to 1M equivalent transfer free energy (where the non-interactingsolute still occupies essentially the same volume as the solutein the interacting system). So typically in such cases no specialcare is required to ensure the correct standard state, as longas the experimental data being analyzed uses the same stan-dard state. If this is not the case, a simple entropic correctionis needed.For binding, however, the situation is more complex andrequires special care. Because the simulations are typicallyperformed using restraints and at concentrations that arediﬀerent from 1 M, the expression of the free energy requiresthe following correction [53] (see an example of such a ther-modynamic cycle in Fig. 6) ∆ G ◦ restr = – k B T ln (cid:18) c ◦ V L (cid:19) – k B T ln (cid:18) ξ L π (cid:19) , (18)where V L and ξ L are respectively the volume of thetranslational and rotational degrees of freedom of the non-interacting ligand in the simulation box. When no restraints are used, the non-interacting ligand is free to translate androtate in the simulation box (i.e., V L = V box and ξ L = 8 π ),and the rotational term is zero. A suﬃciently thoroughexploration of the simulation box by the non-interactingligand is, however, required for the formula to be valid.This is typically hard to achieve as the exploration processis governed by diﬀusion. The addition of a restraint limitsthe volume available to the non-interacting ligand, thusspeeding the convergence of the sampling. In addition,when enhanced sampling methods such as Hamiltonian λ exchange are used (see Sec. 7.2.4), the use of a restraint istypically necessary as it keeps the ligand in the binding sitein the interacting state (see also Sec. 7.2.1) and generallyreduce the round-trip time of replicas. When restraint areemployed, the values of V L and ξ L are restraint-dependent,but for commonly employed restraints, these can be usuallyeasily computed analytically or numerically by solving therelevant integral. Several choices of restraints are possible.

In practice, a variety of types of restraints are common, fromsimple harmonic distance restraints between the ligand andthe protein [132], to ﬂat-bottom restraints which work sim-ilarly but only exert a force if the ligand leaves a speciﬁcregion [133]. Because these restraints do not limit the rota-tional degrees of freedom of the ligand, the rotational termentering the correction in Eq. 18 is zero.Alternatively, a set of restraints proposed by Boresch havealso commonly been employed, where all six rigid-body de-grees of freedom governing the orientation of the ligandrelative to the receptor are restrained [134, 135]. Further re-straints, such as on the overall ligand RMSD have also beenused [65].In principle, all of these forms will yield correct bindingfree energies in the limit of adequate sampling if their eﬀectsand connection to the standard state are correctly handled,but they have diﬀerent strengths and weaknesses. For exam-ple, with more involved restraints, sampling at intermediate (cid:126)λ values will usually not need to be as extensive but morecomputational eﬀort must go to computing the restrainingfree energy. Additionally, such restraints would typically keepthe ligand from exploring alternative binding modes. Thisrestriction may be undesirable when using Hamiltonian λ ex-change or expanded ensemble techniques where allowing theligand to exchange binding modes when it is non-interactingcould provide sampling beneﬁts [136]. More speciﬁcally, ﬂat-bottom restraints might allow a ligand to explore multiplebinding sites, harmonic restraints multiple binding modeswithin a site, while Boresch restraints a single binding modewithin a single site. See additional discussion of the possibilityof multiple binding modes in Sec. 7.2.6 below.

17 of 48

LiveCoMS Best Practices GuideMany choices of restraints involve selecting referenceatoms. Again, in principle this choice is unimportant givenadequate simulation time but practical considerations maybe important. The choice is likely especially important withBoresch-style restraints, where some relative placements ofreference atoms are likely to be numerically unstable; ad-ditionally, ligand reference atoms should likely be in a partof the molecule which deﬁnes the binding orientation well,rather than in a ﬂoppy solvent-exposed tail, for example.

In binding free energy calculations, only the conformations inwhich the receptor and ligand form a bound complex shouldbe sampled from the bound states (Sec. 3). Determining whatthe bound states actually are can be challenging for weaklybound ligands. For tightly bound ligands, virtually all reason-able deﬁnitions of the bound state will lead to be equivalentfree energies, since the partition function will be dominatedby a relatively small number of low-energy poses. For weakbinders, this simpliﬁcation breaks down. In fact, the correctbound state may depend on the type of experiment per-formed. For example, isothermal titration calorimetry (ITC) orsurface plasmon resonance (SPR) measurements eﬀectivelydeﬁne a binding state that includes all ligand comformationsthat are complexed with the protein, regardless of where onthe protein they bind. In contrast, ﬂuorescence polarizationcompetition assays measure binding to only a single location,where the ligand of interest displaces a competing binder.Therefore, care must be taken to ensure that a reasonabledeﬁnition of the binding site is used [136]. In absolute calcula-tions, this applies to the fully interacting state in the complexleg of the thermodynamic cycle (top-right state in Fig. 6), whilein relative calculations this must be true at both end statesof the complex leg (top- and bottom-right states in Fig. 2).In principle, this requires deﬁning which conformations areconsidered to be bound before running the calculation, butit is common practice to start the simulation with the ligandalready placed in the binding site and rely on kinetic trappingto maintain the bound complex. This strategy, however, canfail when the dissociation rate of the ligand has the sameor smaller order of magnitude than the length of the sim-ulation. This is typical of weak binders such as fragmentsbinding shallow pockets with µ M-mM aﬃnities [55, 137]. Inthis case, using a ﬂat-bottom or harmonic restraint betweenreceptor and ligand in the bound state(s) can prevent dissoci-ations [73, 137]. We stress that this is normally avoided as itgenerally introduces bias in the free energy estimate, which is why the restraint is usually activated only in the intermediatestates in absolute calculations. The bias can be correctedthrough reweighting schemes [73], but this post-processingstep can be avoided if a ﬂat-bottom restraint is used andthe ligand never hits the potential wall during the simulationin the bound state. It is important to note that the springconstant and/or radius parameters of the restraint eﬀectivelydetermine which conformations are considered to be bound.As a consequence, these parameters must be tuned to thesystem so that only the binding site is accessible to the ligand.Again, this step is particularly important for weak binders astheir free energy of binding is known to be more sensitive tothe deﬁnition of the binding site [53].In absolute calculations, this restraint can substitute orbe added to the restraint used to handle the standard statecorrection (Sec. 7.1.2). In the latter case, to compute the stan-dard state correction analytically, the bound-state restraintmust be turned oﬀ in the decoupled state. Alternatively,a ﬂat-bottom restraint can be activated also in the decou-pled state as long as the second restraint (e.g., a harmonicor Boresch restraint) prevents the ligand to hit the wall ofthe ﬂat-bottom potential [73]. Finally, even for tight binders,dissociation events can be enhanced by methods such asHamiltonian replica exchange [136, 138, 139] and expandedensemble [140, 141], especially in absolute free energy calcu-lations using harmonic or ﬂat-bottom restraints. In the lattercase, dissociations can be averted simply by increasing thespring constant and/or reducing the radius of the restraintpotential to prevent the exploration of ligand conformationsoutside the binding site in the decoupled state (bottom-rightstate in Fig. 6) that could be propagated to the bound state.

If the net charge of the system changes as the alchemicalvariable changes during the calculation, this can pose majorchallenges. Speciﬁcally, ﬁnite-size eﬀects can introduce sig-niﬁcant charge-dependent artifacts into computed bindingfree energies, in part because typical schemes for long-rangeelectrostatics (including PME and reaction ﬁeld) do not handlefree energy contributions from such changes eﬀectively oras they would be handled in a hypothetical macroscopic bulksolution [142–144].There are two main potential solutions to avoid artifactsdue to changes in net charge: Correcting for the introducedartifacts, or avoiding changing the net charge.Many relative free energy planning tools have been set upto avoid changing the net charge of the systems considered,including LOMAP [123] and Schrödinger’s FEP+ [15]. Absolutefree energy calculations can also potentially avoid changingthe charge of the system by making a charge perturbation

18 of 48

LiveCoMS Best Practices Guide B ... λ =0 λ =0.2 λ =0.8 λ =1 λ A B ABC D P ( p o t e n t i a l e n e r g y ) λ Figure 7.

Alchemical intermediates are created by making the po-tential energy depend on an additional variable (cid:126)λ that interpolatesbetween the chemical endpoints. In ( A ), at (cid:126)λ = 0 the molecule is afully interacting phenol and at (cid:126)λ = 1, a fully interacting benzene. ( B )shows an illustration of the probability distribution of the potentialenergies as the switching function takes values of (cid:126)λ = 0 to (cid:126)λ = 1.Intermediates states are required for a suﬃcient overlap in potentialenergies to estimate a free energy diﬀerence between (cid:126)λ = 0 and (cid:126)λ = 1. Soft-core potentials provide one of the most eﬃcient familiesof intermediate pathways, with a (cid:126)λ dependence. In ( C ) the poten-tial energy surface is coloured according to (cid:126)λ with blue being (cid:126)λ = 0and (cid:126)λ = 1 orange. In ( D ) the potential is coloured according to thepotential energy. Note how as (cid:126)λ approaches 0, the energy smoothlyapproaches zero at all r , a necessary requirement for eﬃcient andstable calculations. of equal and opposite sign elsewhere in the system; for ex-ample, as a charged ligand is removed, a charged counterionof opposite sign could also be removed, or one of the samesign could be inserted. This is sometimes referred to as an"alchemical ion" approach for dealing with the needed chargechange, and is also employed by the Yank free energy pack-age [136]. Charge corrections have also been explored, andare potentially a viable solution to this problem [145] whereartifacts introduced by ﬁnite-size eﬀects are corrected nu-merically [92, 143]. However, application of such correctionstypically remains less common than the use of a co-alchemicalion.When free energy calculations do need to change thecharge of a ligand or solute, the literature does not yet seemto indicate what approach should be preferable, so consider-able care should be taken. We are not yet aware of a carefulcomparison of charge corrections versus other approachessuch as decoupling an ion at the same time, so in our view theissue of proper handling of charge mutations in the contextof alchemical calculations remains a research problem. Both absolute and relative calculations must choose an al-chemical pathway connecting initial and ﬁnal states. In prin-ciple, because of the path independence of the free energy,any arbitrary pathway will give the correct free energy change,but the choice of pathway will greatly aﬀect the eﬃciency ofthe calculations.Some choices are particularly crucial—for example, trans-formations involving insertions or deletions of atoms shouldemploy a soft-core potential path for Lennard-Jones or otherinteractions with repulsive interactions that go to inﬁnite en-ergy at small radius [146–148].The key consideration for choosing alchemical pathwaysis that the intermediate states that a given pathway producesshould sample conﬁgurational ensembles that change asslowly as possible as (cid:126)λ changes, while still managing to gofrom the initial state to the ﬁnal state as (cid:126)λ goes from 0 to 1.Another way of stating this is that intermediate statesshould sample molecular conﬁgurations that are as similaras possible to their neighboring states. The more similar theconﬁgurations are between intermediate states, the lowerthe statistical uncertainty is in the estimate of free energy be-tween intervals. This can be proven directly from the BAR andMBAR formulas [25, 42], though the exact same principlesapply for TI. For a ’good’ path to work and give a sequence ofstates with maximally similar conﬁgurations, suﬃcient simi-larity in potential energies is required. Fig. 7 A and B illustratethis. Fig. 7 A shows in a pictorial way a soft-core potential canbe applied across diﬀerent (cid:126)λ s. Fig. 7 B illustrates the potentialenergy distributions at the diﬀerent (cid:126)λ intermediates, withsuﬃcient overlap between neighboring (cid:126)λ states to ensurethat reweighting estimators such as MBAR can be used foranalysis (see Sec. 8.3). The actual transformation is best han-dled with soft-core potentials of the form shown in Fig. 7 C and B , with more details given below.So what are the options to adjust the potentials betweenthe two end states based on (cid:126)λ ? The simplest possible alchem-ical pathway is a linear pathway: U ( (cid:126) q , (cid:126)λ ) = (cid:126)λ U ( (cid:126) q ) + (1 – (cid:126)λ ) U ( (cid:126) q ) (19)so-called because the dependence on (cid:126)λ is linear. Thisclearly satisﬁes the basic requirement that it gives the initialendpoint potential energy U ( (cid:126) q ) when (cid:126)λ = 0 and ﬁnal endpointenergy U ( (cid:126) q ) when (cid:126)λ = 1.For many energy terms, as long as a repulsive core remainson , this is a very good approach. For example, it can be shownthat if van der Waals repulsions are left on, then the linear ap-proach is very nearly the optimal path possible for changing,removing, or inserting the electrostatic energy terms, withthe path being within about 10–20% of the minimum possibleuncertainty [149] for a ﬁxed amount of simulation, as well as

19 of 48

LiveCoMS Best Practices Guidebeing nearly optimally eﬃcient for van der Waals attractiveterms with repulsion terms turned on [150]. Although weare not aware of any quantitative tests for dipolar or highermultipole terms, theoretically it should behave equally wellfor those systems.However, this approach ends up being terrible for remov-ing or adding repulsive potentials that go to inﬁnity quicklyat or near the origin. One way to look at this is to examinehow low (cid:126)λ values must go to reduce the energy at 0.5 σ (theatomic size parameter) down to 1 k B T , where thermal ﬂuctu-ations make it possible for other atomic sites to penetrateroutinely that deep. If we are trying to go from a particlebeing present, and desire to make it disappear alchemically,then if the repulsive terms are of the form (cid:15) σ r , then if (cid:15) was1 k B T at the temperature of interest, and we start with theparticle present, then solving for (1 – (cid:126)λ )(1 kB T ) (cid:18) σ σ (cid:19) = 1 k B T we get (cid:126)λ = 1 – 2 –12 ∼ (cid:126)λ = 1. How-ever, various analyses have shown that this is not a very goodstrategy [146, 148, 151–153].What we need instead is a function that smoothly getsrid of this inﬁnity. A large number of schemes have beentried [146, 150–155], but the most common strategy thatappears to be the best practice is to use a "soft-core" potential,of the form: U ( (cid:126) r ij , (cid:126)λ ) = 4 (cid:15) ij (cid:126)λ (cid:32) α (1 – (cid:126)λ ) + ( r ij / σ ij ) ) – 1 α (1 – (cid:126)λ ) + ( r ij / σ ij ) (cid:33) ,(20)where r ij is the distance between two particles i and j , (cid:15) ij and σ ij are the Lennard-Jones parameters corresponding tothe interaction between particles i and j , and α is a constant(0.5 is optimal for the speciﬁc functional form shown above).This functional form has exactly the property we are look-ing for: it recovers the Lennard-Jones potential when (cid:126)λ = 1,and the other endpoint ( (cid:126)λ = 0), it is exactly zero for all r ij everywhere, and as (cid:126)λ goes to zero, the α (1 – (cid:126)λ ) term lowersthe inﬁnite energy in the core. There are several diﬀerentvariants of the same functional form [146, 151, 152], but theone given in eq. 20 is easy to understand and implement andfairly numerically stable. This functional form is shown in C and D of Fig. 7.It has been shown that more complicated forms are notsigniﬁcantly more eﬃcient than eq. 20 [154]. We thereforerecommend using the softcore potential given in eq. 20, un- less there is a compelling reason otherwise. Using a similarequation to eq. 20 may be acceptable in most circumstancesif that is what is supported in your chosen software. However,if you are inserting or removing entire atomic sites, we heavilyrecommend against using the linear approach; it will be verydiﬃcult to get correct and converged results.So far in this section, we have discussed optimal ways ofdisappearing or appearing Lennard-Jones interaction sitesand turning on and oﬀ electrostatics terms. What aboutperforming both transformations at the same time? We cannot turn oﬀ the electrostatics linearly at the same time weturn oﬀ the Lennard-Jones terms, as it would leave inﬁnitelylarge attractive and repulsive electrostatic terms "bare" atsmall (cid:126)λ , resulting in the simulation crashing. It is possible toapply the same soft core approach to the Coulomb interactionand this is indeed done in a number of implementations, inwhich case it is important that the Coulomb interaction issoftened more rapidly than the Lennard-Jones interaction toavoid charge penetration issues, which can be tricky to ensurefor all types of perturbations [156].A safe but potentially more computationally expensiveapproach is to perform the transformations in sequence;ﬁrst, turning oﬀ all electrostatics for atoms that must beremoved, inserting and removing Lennard-Jones sites (boththe insertion and removal can be done simultaneously), andthen turning electrostatics for the introduced particles on.Again, If there are no removals or introduction to atomic sites,then it is reasonable to change the interactions in the ﬁrstand third steps linearly.Other issues, such as whether absolute calculationsshould retain or remove intramolecular non-bonded in-teractions through either annihilation [132, 134, 157–159]or decoupling [132, 160] must be considered. Reasonableeﬃciency can be often obtained with either choice even ifsome are somewhat better or worse than others, and there isno consensus on which is better in most given situations. Ourrecommendation is to leave the intramolecular interactionson during the transformation for simplicity if there are noother known issues with this approach. The key thing towatch out for is whether the total potential energy, andtherefore the intermediate ensembles sampled, changesmoothly from beginning to end. These problems can bediagnosed by noticing lack of conﬁguration space overlapbetween diﬀerent simulations (see Sec. 8.5).Relative calculations introduce additional choices, suchas the order in which to modify nonbonded interactions. Acommon process in single topology relative calculations isto ﬁrst remove electrostatic interactions of any atoms whichwill be deleted, then modify other non-bonded interactions,then restore electrostatic interactions of any atoms whichare being inserted. Although this is a simpler path to un-

20 of 48

LiveCoMS Best Practices Guidederstand cognitively and can take advantage of the soft-corepotential from eq 20, this can lead to more intermediatesteps and thus be more computationally expensive. Otherschemes, such as simultaneously changing electrostatic andLennard-Jones interactions with electrostatic “soft-core” po-tentials [161] may be implemented with fewer intermediatebut could require ﬁne-tuning of electrostatic and Lennard-Jones softcore parameters to avoid numerical instabilities. Atthe time of writing, there has not been conclusive evidenceto suggest one approach is better in general than the other,so discretion should be left up to the user as to what is vi-able from both hardware resources, and what the simulationsoftware supports.A key additional consideration in choosing the alchemi-cal pathway is the choice of spacing of intermediate states.The spacing depends to some extent on the choice of anal-ysis method, though states should essentially be spacedequidistant in the relevant thermodynamic length [162, 163].For BAR/MBAR techniques this means that states should bespaced so that the statistical uncertainties between neighbor-ing states be equal; [154, 164] Some schemes to adaptivelyoptimize the spacing of intermediate states based on ini-tial exploratory simulations have been proposed [165]. Formolecules changing in dense solvent, then the best path isroughly independent of molecule size and shape, so whatworks for one molecular transformation is likely to be rela-tively eﬃcient for another [166].

Though all alchemical simulations must sample from multiple (cid:126)λ states, diﬀerent approaches can be used to achieve this.Fig. 8 illustrates the four most common schemes. The sim-plest approach involves running an independent simulationat each of the predeﬁned (cid:126)λ values (see Fig. 8 A ). This typeof scheme is currently used for AMBER TI calculations [17]and for Sire as implemented in BioSimSpace [34]. However,if these simulations can be run simultaneously with commu-nication between them, a simple extension allows mixingbetween these replicas. In this approach, the simulation ateach (cid:126)λ can undergo periodic exchanges with neighboring (cid:126)λ values. This form of replica exchange (Hamiltonian replicaexchange) is based on ideas developed from Monte Carlo sim-ulations of spin glasses by Swendsen and Wang [167]. Withthe Metropolis-Hastings acceptance criterion for exchanges,the generated ensemble of all replicas still samples from theBoltzmann distribution, thus this approach has been used inmany diﬀerent contexts for molecular simulations [138, 168–170]. The basic idea of the replica exchange scheme is shownin Fig. 8 B . It is supported in various software packages thatprovide alchemical implementations, such as GROMACS [13], FEP+ [15], and NAMD [121]. A third approach borrows ideasfrom simulated tempering [171]. In this scheme a singlereplica rapidly explores all of (cid:126)λ space by working out optimalweights that allow switching between diﬀerent intermediate (cid:126)λ values, as seen in Fig. 8 C . This approach is also referredto as self adjusted mixture sampling [140, 141, 172] andwhile promising, has so far only been supported in OpenMMTools [173]. The last approach makes use of non-equilibriumsimulations [7]. In this approach, only end state (cid:126)λ replicas( (cid:126)λ =0, (cid:126)λ = 1) are simulated at equilibrium; intermediate infor-mation is generated from non-equilibrium simulations thatrapidly transition between end-states. This approach is avail-able in GROMACS and appears to be coming online in severalother packages. A schematic of this approach is shown inFig. 8 D .Currently, we recommend using Hamiltonian replica ex-change type sampling schemes (Fig. 8 B ). If these are notavailable in the code of choice, running independent sim-ulations at diﬀerent (cid:126)λ values can be acceptable, especiallywhen conformational sampling is fast (Fig. 8 A ). Single replicaschemes and non-equilibrium schemes are not as establishedyet, but are very promising. Before launching alchemical free energy calculations it iswise to consider how convergence and completion will beassessed. Diﬀerent conditions on when to stop alchemicalfree energy calculations should be determined, and this mayrequire several iterative checks and therefore modiﬁcationsto the calculation protocol. One useful metric to use for termi-nation is the expected or desired uncertainty of a desired freeenergy estimate, though care must be exercised should theuncertainty estimate prove unreliable. In particular, if the rateof change in the free energy estimate is signiﬁcant when thiscondition is met, the simulation may not be locally converged,and more sampling may be necessary to determine a stablefree energy estimate which is no longer changing signiﬁcantlyover time. However, this is not the only metric which shouldbe used, as the uncertainty only captures the informationabout the sampled phase space, not necessarily the entiretyof the phase space. For example, convergence of relative freeenergy calculations in predictive simulations where the en-tire phase space is not known in advance, requires samplingthe diﬀerent kinetically stable states [75]. This highlights theimportance of choosing the correct thermodynamic path toensure you sample the required thermodynamic states asdiscussed in Sec. 7.2.3.The condition of minimizing the statistical uncertainty ofdiﬀerent free energy estimators below a suﬃcient thresholdshould be one metric monitored over the simulation. This

21 of 48

LiveCoMS Best Practices Guide

ABCD … λ λ λ Ν … λ N-1 λ λ λ Ν λ N-1 λ λ λ Ν λ N-1 ……… … …… n e q M D ns/µs https://link.springer.com/protocol/10.1007%2F978-1-4939-8736-8_2 λ λ Ν … ns P(-W)P(W)

Figure 8. Four most common sampling strategies. ( A ): Multiplereplicas in parallel at diﬀerent lambda states. Each arrow symbol-ises an independent (cid:126)λ simulation. ( B ): Hamiltonian replica exchangescheme. Each arrow represents a short simulation interval beforean exchange through Metropolis Hastings acceptance (dice) is at-tempted. A tick means an accepted exchange a cross a rejectedexchange. ( C ): Single replica scheme sampling from all (cid:126)λ states. Aftera short simulation time symbolised by the arrow, the lambda-stateis attempted to change until all N lambda states will be sampled.( D ): Non-equilibrium sampling scheme, where two equilibrium sim-ulations at the end-states are run as indicated by the blue and pinkarrow. Non-equilibrium simulations are attempted at intervals toswitch between the two end-states. can be done through the uncertainty estimator built into cer-tain analysis tools such as MBAR, or can be done thoughmore general statistical tools like bootstrap sampling. A tar-get statistical uncertainty should be chosen at the onset ofthe simulation to avoid excessively long simulations, or fallinginto the trap of running until the free energy estimate is "goodenough," which is subjective and has no deﬁned criteria. Thiscould be a ﬁxed value such as 0.20kcal/mol, or a functionalquantity such as "below 0.5kcal/mol and 10% of the free en-ergy estimate." The user does not need to monitor this in-formation in real-time and can choose to run simulations forﬁxed duration (either time or number of samples) and runanalysis on the data collected thus far. If more samples areneeded, the simulations can be resumed, or, started again indiﬀerent initial conditions.Convergence in other alchemical observables should alsobe monitored to determine if the deﬁned phase space hasbeen suﬃciently sampled and enough decorrelated sampleshave been drawn. These additional observables include, butare not limited to, the variance in dUd (cid:126)λ across all (cid:126)λ values, cal-culating the variance in free energy using bootstrap analysis,and comparing diﬀerences in free energies calculated usingdiﬀerent percentages of the simulation in both the forwardand reverse directions ( see Fig. 9).Each of these metrics have demonstrated promising re-sults for diagnosing when a simulation has a convergenceissue beyond simple convergence of uncertainty estimate.Results obtained from calculations with convergence issuesshould be checked for errors or run for longer before anyconﬁdence should be placed in conclusions drawn from theiranalysis. In relative calculations that share similar bindingmodes, for example, and do not induce large conformationalchanges when in complex with protein, the need to sampleexhaustively to converge estimates in free energy diﬀerencesis often not necessary due to the locality of sampling changesin the molecular topology and shared phase space of thecore atoms. However, even subtly induced changes in proteinbinding conﬁguration will require more sampling or cause lo-cal convergence to a free energy estimate that has high error.The conﬁdence a user should have in a free energy estimateis signiﬁcantly improved when both the uncertainty of thefree energy estimate is low, and when other observables havereached a convergence.The uncertainty in the free energy, for example, hasmultiple ways to be estimated, e.g. through standard errorpropagation methods (including MBAR’s estimator, which isbased on the same principles as standard error propagation),through bootstrap methods, through multiple independentruns, etc. Independent of how the property is estimated,its important to remember that they are estimations of theproperty , not the true underlying property itself. These

22 of 48

LiveCoMS Best Practices Guide

A BC DE F

Figure 9. Free energy (in k B T) for two diﬀerent relative bindingfree energy perturbations.

Each plot shows the estimated freeenergy change using a varying fraction of total simulation time (upto 5 ns total). Subplots ( A ), ( C ), and ( E ) show a three step protocolfor a perturbation involving 3 perturbed atoms, while ( B ), ( D ), and ( F )shows the same protocol for a perturbation involving 10 perturbedatoms. The ﬁrst step of the protocol is the decharging then removingvan der Waals interactions and then recharging. The diﬀerence inenergy between the forward (blue) and reverse (red) free energycalculations at the midpoint of the simulation time gives an indicationof the overall convergence of the simulation, with diﬀerences over 1k B T indicating poor convergence.

Figure 10. Potential of mean force with respect to (cid:126)λ for TI andMBAR

The estimated PMF for a bound calculation of a Tyk2 ligandpair of the Wang et al. [15] with respect to (cid:126)λ estimated from TI andMBAR and showing agreement within errorbars. estimators are usually consistent estimators, meaning theywill converge to the true answer in the limit of suﬃcientsampling, not necessarily unbiased ones though. As such, it isa good idea to subject diﬀerent estimators to the same datato see if they yield either the same estimate (within error andbias), or if they ﬂuctuate wildly. See for example the potentialof mean force with respect to (cid:126)λ estimated from a boundsimulation of a Tyk2 ligand pair of Wang et al. [15] for boththe MBAR and TI estimators, as seen in Fig. 10. This is nota perfect method as some estimators, such as exponentialaveraging, will converge signiﬁcantly more slowly, relative tomore accurate estimators like MBAR. Therefore, it is a goodidea to apply the estimators to diﬀerent fractions of the datato see if the main estimator of free energy you have chosenis stable.Each method requires diﬀerent data from the simulationbe collected. If, for instance, the free energy estimator se-lected is thermodynamic integration, then values of dUd (cid:126)λ atuncorrelated data points must be collected. Once a combina-tion of knowing what type of simulation you will run, whichalchemical topology you will simulate, what alchemical pathyou will simulate along, and what your stopping conditionsare, then you are ready to enumerate the information youshould capture. Below is a sample of the minimal informationyou need for a set of common estimators (discussed in moredetail in Sec. 8.3): • Thermodynamic Integration (TI) requires ∂ u ( (cid:126) q ) ∂(cid:126)λ . • Exponential Averaging (EXP) needs either ∆ u k , k +1 ( (cid:126) q ) or ∆ u k , k –1 ( (cid:126) q ), depending on the direction its being evalu-ated in.

23 of 48

LiveCoMS Best Practices Guide • Bennett Acceptance Ratio (BAR) needs both ∆ u k , k +1 ( (cid:126) q )and ∆ u k , k –1 ( (cid:126) q ). • Weighted Histogram Analysis Method (WHAM) and Mul-tistate Bennett Acceptance Ratio (MBAR) both need thecomplete set of ∆ u k , j ∀ j = {1... K }. WHAM must have thisinformation binned.The potential derivative required for TI should generallybe calculated during the simulation; only under very rare cir-cumstances [149] can it post-processed by a code that doesnot evaluate the derivatives. Many codes already have op-tions for doing this. If that option is unavailable, you canestimate it through ﬁnite diﬀerence (if suﬃcient informationis collected), but this will introduce signiﬁcant error, and isgenerally not a best practice. The BAR estimator may be abetter, and simpler choice at that point as you will have atleast the same level of information. The potential energydiﬀerences required for EXP, BAR, MBAR, and WHAM can becalculated either during the simulation or in post-processing.It is recommended to calculate the potential diﬀerences incode when possible to avoid extra overhead and possibleerrors produced by running the simulation twice, and to re-duce the amount of stored information. Although TI mustusually be calculated in code, as it requires the derivative,there is one condition under which it actually has the fastestcomputation time. If the alchemical path you have chosen isa linear alchemical path, then you get dud (cid:126)λ = u ( (cid:126) q ) – u ( (cid:126) q ), whichis the diﬀerence between the initial and ﬁnal states. However,because of the problems with linear paths already discussedin this paper, this simpliﬁcation is rarely that useful.Free energy information should generally be saved morefrequently than coordinate data, approximately at the ratethat uncorrelated samples are produced. The on-disk sizeof the data for free energy estimation is often signiﬁcantlysmaller than full atomic coordinates, so the informationshould be collected frequently. However, the informationshould not be collected every time step, as most free energytechniques are operated at equilibrium, and need equili-brated and decorrelated samples for an unbiased estimate.A sample collected every time step will likely result in mostsamples being discarded due to decorrelation routines in theanalysis. However, if it is computationally cheap and diskspace is plentiful, do save often. One may safely assumethat the correlation time is greater than 100-200 fs even forrelatively simple systems such as small molecules in solvent,so saving no more frequently than every 50-100 steps isrecommended. How decorrelation impacts calculations, andhow to compute it is discussed in Sec. 8.2. In a discovery setting, new ligands can have unknown or atleast uncertain binding modes [100, 174–176], complicatingbinding free energy estimation. This uncertainty is becauseit is usually not desirable to estimate a binding aﬃnity for aligand which already has an available bound structure, sincesuch a compound has already been tested. To deal withprospective ligands with unknown binding modes, discoveryprojects commonly assume that modiﬁcations of functionalgroups on a common scaﬀold result in a consistent bindingmode across all members of a series. This is not necessarilyalways the case [100], as reviewed elsewhere [175] and insome cases unexpected binding mode changes can be theorigin of apparent non-additivity in structure-activity relation-ships [176]. Binding modes also tend to be particularly vari-able in the case of fragments, which often may have multiplerelevant binding modes [177].Absolute free energy calculations for dissimilar ligands canhave particular challenges because the (potentially incorrect)assumption of consistent binding modes across a series ofsimilar ligands is likely to be even less robust than the in thecase of relative calculations. This means that researchers per-forming absolute binding free energy calculations will haveto pay particular attention to generating reasonable putativebinding modes.In some cases, it is tempting to simply use docking tech-niques to generate initial bound structures for starting molec-ular dynamics simulations. However, timescales for bindingmode interconversion are usually slow compared to MD/freeenergy timescales, meaning that simulations started fromdiﬀerent potential binding modes are likely to yield disparatecomputed binding free energies [46, 75, 132, 178] . And dock-ing techniques are good at identifying sterically reasonablepotential binding modes, but still perform relatively poorly atidentifying a single dominant binding mode a priori .It is worth highlighting a recent SAMPL blind challenge onHIV integrase as an illustration of this. Many submissions,using state-of-the-art methods, had diﬃculty even predictingwhich binding site ligands would bind in (most submissionsplaced more than half of the ligands into the incorrect bindingsite), and even given correct binding sites, the binding modewithin each site was also quite diﬃcult to predict [120]. Thebest performing submission for predicting binding modesactually ended up being a human expert (aided by computa-tional tools) with more than 10 years of experience on the par-ticular target [179], rather than a fully automated approach.While free energy calculations on this set had some success,many of the failures actually ended up being cases wherethe binding mode selected as input for free energy calcula-

24 of 48

LiveCoMS Best Practices Guidetions was later found to be incorrect [180], highlighting theimportance of these issues.One approach which has shown some success is to re-tain diverse potential binding modes from docking, performshort MD simulations of these to identify distinct stable bind-ing modes, and then consider these in subsequent calcula-tions [12, 132, 180–182].Routes to handle multiple potential binding modes arediﬀerent depending on whether absolute or relative calcula-tions are selected, unless a method is available to estimatethe relative populations of diﬀerent stable binding modes inadvance (e.g. such as the BLUES approach currently in devel-opment [46]), in which case this approach could be appliedto assist both types of calculations.

Handling multiple potential binding modes withinabsolute calculations.

Within absolute binding free energy calculations, multiplepotential binding modes can be handled by two main strate-gies: Consider each binding mode separately (a separationof states strategy) or sample all binding modes within a sin-gle simulation [75]. This couples to the choice of restraintsselected, as some restraints will allow transitions betweenbinding modes and even binding sites (Sec. 7.1.2), and othersdo not.Sampling all potential lignad binding modes within a sin-gle free energy calculation is usually impractical without someform of enhanced sampling or at least Hamiltonian replicaexchange [136] because barriers for binding mode intercon-version result in kinetics which are too slow compared tosimulation timescales [46, 75, 132, 178]. Hamiltonian ex-change, coupled with appropriate restraints, can allow theligand to relatively rapidly exchange between potential bind-ing modes when non-interacting, accelerating sampling ofbinding modes [136]. However, it is not always clear that thisis desirable, since this also increases the size of the conﬁgura-tion space which must be sampled even if the binding modeis known.Separation of states provides a simple though potentiallyexpensive alternative, where each stable binding mode isconsidered separately with a binding free energy calculationrestricted to that binding mode, and then (as long as the bind-ing modes are non-overlapping) the resulting componentbinding free energies can be combined into a total [75, 132].This approach necessitates a separate binding free energy cal-culation for each potential binding mode, however, so it canbe computationally quite costly. If relative populations of dif-ferent stable binding modes were available from some othertechnique, it could make this separation of states approachconsiderably more eﬃcient [46, 75].

Handling multiple potential binding modes withinrelative calculations.

Multiple potential binding modes pose particular problemsfor relative free energy calculations, as having multiple start-ing structures for these calculations could yield substantiallydiﬀerent calculated relative binding free energies for thesame transformation due to kinetic trapping, and, withoutadditional information (speciﬁcally, the free energy of bindingmode interconversion or, equivalently, the relative popula-tions of diﬀerent binding modes) it becomes impossible tosort out which of the multiple answers is in fact the correctrelative binding free energy.To deal with this, some practitioners have actuallycomputed relative binding free energies of diﬀerent bindingmodes of the same ligand [178]. For example, a perturbationwhich adds a methyl to an aromatic ring of a larger ligandmight yield one result if the methyl points in one direction,and a diﬀerent value if it points in the other due to slowring motions [183, 184]. One could compute the free energyof turning oﬀ the methyl group in one orientation andturning it back on in the other orientation to obtain the freeenergy diﬀerence between the two potential binding modes.While this approach has precedent, it is relatively diﬃcult toautomate at present and requires considerable care.Overall, this likely means that relative free energy calcu-lations will be susceptible to problems resulting from uncer-tainty in ligand binding modes until more robust approachesare available to determine dominant binding modes, or therelative populations of diﬀerent potential binding modes, inadvance.

Once data has been collected from alchemical intermediates,it must be analyzed to produce an estimate of the free energychange (and its associated statistical uncertainty) for eachleg of the thermodynamic cycle. While a number of diﬀerentestimators are available that will give consistent results underoptimal circumstances, some approaches are recommendedover others due to their robustness and ability to provideinformation on poor convergence.

Much of the infrastructure for analyzing alchemical free en-ergy calculations relies on the concept of asymptotically un-biased estimators, which produce unbiased estimates of thefree energy when fed very long simulations [185]. In real-ity, free energy calculations are often initiated from highlyatypical initial conditions (such as a protein-ligand geometryobtained from docking and subjected to a heuristic solvent

25 of 48

LiveCoMS Best Practices Guideplacement scheme), and simulations are of a ﬁnite length dic-tated by available computational resources and computingdemands. As a result, these estimators can produce signiﬁ-cantly biased estimates if fed the entirety of simulation datagenerated without further processing [186].To minimize this eﬀect, an initial portion of the simulationis often discarded to equilibration [40], with the idea of remov-ing the most heavily biased initial portion of simulation databut retaining the unbiased production region that representsa stationary Markov chain process sampling from the desiredequilibrium target distribution. Because the simulation timerequired for the atypical initial sampler state to relax towardequilibrium is a property of the speciﬁc system being simu-lated and the speciﬁc initial conditions selected, it is simplestto collect data for the whole process and use an automatedalgorithm to select how much data should be discarded toequilibration in a post-processing step.A simple approach to automatically partitioning simula-tion data into equilibration and production regions is de-scribed in [186] (illustrated in Fig. 11). Suppose we have asimulation of length T consisting of correlated data. Here,the goal of the post-processing step is to select the equilibra-tion boundary t ∈ [0, T ] so as to maximize the number ofeﬀectively uncorrelated samples remaining in the productionregion N [ t , T ] , which is deﬁned as N [ t , T ] = T – t g [ t , T ] (21)where g [ t , T ] is the statistical ineﬃciency of a timeseries a t ,described in more detail below. Conveniently, this procedurealso produces the information necessary to decorrelate thesimulation data for estimating the free energy diﬀerences, arequisite next step in analysis. This approach is implementedwithin the MBAR [187] and alchemlyb [188] packages, and ishighly recommended for standard practice.For additional discussion of working with correlated dataand autocorrelation analysis, please refer to the work onBest Practices for Quantiﬁcation of Uncertainty and SamplingQuality in Molecular Simulations [41]. Computing the timeseries for equilibration detection

Typically, the timeseries of note a t analyzed in automatedequilibration detection is the negative logarithm of the proba-bility density ( π ( x t ; (cid:126)λ )) sampled by the MCMC algorithm (up toan irrelevant additive constant). For simple independent sim-ulations that sample x t ∼ π ( x ; (cid:126)λ ), this is given by the reducedpotential a t ≡ – ln π ( x t ; (cid:126)λ ) + c = u ( x t ; (cid:126)λ ). (22)Note that the use of the eﬀective reduced potential isnot guaranteed to pick up on all slow relaxation processes ABCDE

Figure 11. Automatic partitioning into equilibration and produc-tion regions. ( A ) The average (black line) standard deviation (shadedregion) of the reduced potential u ∗ over many independent replicatesimulations started from the same initial conditions show a signiﬁcantinitial transient change before relaxing to the true average potentialenergy ( B ). A cumulative average (red) of the entire simulation datademonstrates simulation bias not seen when initial simulation data isomitted (blue). Using an automated approach to detect equilibrationof the boundary t using statistical ineﬃciency g ( C ) for an eﬀectivesimulation interval ( D ). ( E ) The optimal equilibration boundary t isselected to maximize the number of uncorrelated samples. Figureadapted from [186].

26 of 48

LiveCoMS Best Practices Guidethat may be coupled to the alchemical free energy, but thesimplicity of its computation means it is generally appropriatefor most cases.

Cautions in automating equilibration detection

For simulations that are simply not long enough to containa large number of samples from true equilibrium either be-cause they are very short or contain slow processes, thisprocedure cannot completely remove the bias. In such cases,this approach simply selects the ﬁnal portion of the the sim-ulation, which may be contained in a single substate of con-formational space, and may itself lead to biased estimates.This situation can be detected if the equilibration boundary t is a signiﬁcant fraction of the total simulation length T , witha good rule of thumb being that T (cid:38) t . If this is not pos-sible, advanced analysis techniques that assume only localequilibrium (rather than global equilibrium) such as the TRAMestimators [27, 28, 189] may be more appropriate, but arebeyond the scope of this paper. Computing the statistical ineﬃciency

Most estimators require an uncorrelated set of samples fromthe equilibrium distribution to produce (relatively) unbiasedestimates of the free energy diﬀerence and its statistical un-certainty. To do this, the production region of the simulationis generally subsampled with an interval approximately equalto or greater than the statistical ineﬃciency g ≥ g ≡ τ eq (23)where τ eq is the integrated autocorrelation time, formallydeﬁned as τ eq ≡ T –1 (cid:88) t =1 (cid:18) tT (cid:19) C t , (24)with the discrete-time normalized ﬂuctuation autocorrelationfunction C t deﬁned as C t ≡ (cid:104) a n a n + t (cid:105) – (cid:104) a n (cid:105) (cid:10) a n (cid:11) – (cid:104) a n (cid:105) . (25)The basic concept is that τ eq corresponds to the single-exponential decay time for the autocorrelation processthat generates samples, so the statistical ineﬃciency g measures the approximate temporal separation betweentwo eﬀectively uncorrelated samples (where two exponentialrelaxation times are presumed to be suﬃcient).Robust estimation of C t for t ∼ T is diﬃcult due to growthin statistical error, so common estimators of g make use ofseveral additional properties of C t to provide useful estimates (see Practical Computation of Statistical Ineﬃciencies in [186]for a detailed discussion).We recommend using the robust statistical ineﬃciencycomputation routines available within the MBAR [187] andalchemlyb [188] packages.

Subsampling data to generate uncorrelated samples

Once the statistical ineﬃciency g has been estimated, it isstraightforward to subsample the correlated timeseries sim-ulation data to produce eﬀectively uncorrelated data thatcan be fed to the free energy estimators. Suppose the cor-related timeseries is { a t } Tt =1 ; we can form a new timeseriesof N eﬀ ≈ T / g eﬀectively uncorrelated samples by selecting asubset of indices { t = round(( n – 1) g ) | n ∈ range(1, . . . , N ) }where round(x) denotes rounding to the nearest integer.If independent simulations are used, the alchemical state (cid:126)λ may have a signiﬁcant impact on the correlation time, andthese simulations should be subsampled independently us-ing a separate estimate of the statistical ineﬃcinecy g for eachalchemical state. If coupled simulations are used (such as aHamiltonian replica exchange simulation), the replicas shouldundergo equivalent random walks in alchemical space, andthe replicas can be can be subsampled with the same g togenerate an equal number of uncorrelated samples at eachalchemical state. Conveniently, the approach described abovefor automated equilibration detection produces an appropri-ate estimate of g over the production region for automatingthis process. Cautions and considerations

Reliable estimation of the statistical ineﬃciency is diﬃcult,and estimates will not generally be as precise (in a relativeerror sense) as averages. To ensure there is suﬃcient dataavailable for reliable decorrelation and estimation of free en-ergy diﬀerences, it is recommended that the eﬀective numberof uncorrelated samples N eﬀ ≥≈

50 if the BAR or MBAR es-timators (discussed below in sec 8.3) are used; the numbermay need to be much higher with alternate estimators.

Free energy diﬀerences between two diﬀerent states diﬀer-ing in the energy function are directly related to the ratio ofprobabilities of those states. As can be noted, the partitionfunctions in Eq. 5 are simply the total accumulated probabili-ties for all possible conﬁgurations of the system. Virtually allof the ways to estimate this free energy are based in convert-ing this ratio of integrals to something that can be measuredin one (or several) simulations.

The Zwanzig relationship (EXP)

The simplest method for calculating free energy diﬀerencesfrom simulations is the so-called

Zwanzig relationship [1], also

27 of 48

LiveCoMS Best Practices Guidecalled one-sided exponential re-weighting (EXP), or simply freeenergy perturbation, though this ﬁnal term is sometimes usedto encompass all ways of calculating free energy diﬀerences.The (reduced) free energy diﬀerence ∆ f between aninitial state 0 and a ﬁnal state 1 deﬁned by two diﬀerentpotential energy functions u ( (cid:126) q ) and u ( (cid:126) q ) over coordinatespace (cid:126) q can be calculated as: ∆ f = ln (cid:68) e –( u ( (cid:126) q )– u ( (cid:126) q )) (cid:69) = ln (cid:68) e – ∆ u ( (cid:126) q ) (cid:69) (26)and the average is over all samples from the simulation per-formed with u . In the case of NVT (canonical) sampling andassuming the masses do not change, then u is simply U / k B T ,and f is F / k B T , but it can be generalized to other ensembleswith the proper deﬁnition of f and u . Described in words, wetake the samples generated during our run with the potentialenergy function u ( (cid:126) q ) and calculate what the diﬀerence inenergy would be if we switched to potential energy function u ( (cid:126) q ), and average the exponential of the negative energydiﬀerence to get the negative of the exponential of the freeenergy diﬀerence. The original distributions, P( u ) as gener-ated at (cid:126)λ = 0 and P( u ) would look like those seen in Fig. 8 A - C on the left hand side. Reevaluating requires almost noextra code functionality to perform; one need only to savea full precision trajectory, and run an unmodiﬁed molecularsimulation code using the u in order to calculate the newenergies of stored snapshots. The analysis can be writtenin a line of code. We note that this method is even moregeneral, in that the instantaneous work to change the po-tential energy function from u to u can be replaced by thenon-reversible work W to make the same change under thesame equilibrium conditions at either end state [190–192],though we do not go into all of the details of non-equilibriumtransformations here, and refer the reader to more advancedtreatments [70, 193–196].Although the Zwanzig equation is formally correct (as longas the two states considered sample the same phase spacevolume, which is true for standard molecular models), it hassome very important numerical issues that mean that it oftenperforms badly for standard free energy calculations, evenfor small molecules [185, 197]. One can show that if thestandard deviation of the diﬀerence ∆ u ( (cid:126) q ) = u ( (cid:126) q ) – u ( (cid:126) q ) overall sampled (cid:126) q is large (which in this case, means only severaltimes k B T ), then very few samples contribute to the average,and the answer will be both biased and extremely noisy [198].Essentially, the method is dominated by contributions of raresnapshots [68, 69, 199]. The Bennett Acceptance Ratio (BAR)

If we have the diﬀerences in the potential energy sampledfrom the distribution deﬁned by u to the state deﬁned by u , and we also have the diﬀerences in potential energies from the distribution sampled by u to the state deﬁned by u , we can obtain a signiﬁcantly improved estimate of thefree energy diﬀerence compared to that obtained by EXP. Thisestimate was ﬁrst derived by Bennett and is hence generallycalled the Bennett Acceptance Ratio (BAR). It is solved byﬁnding the reduced free energy f ij that satisﬁed the followingimplicit equation: n i (cid:88) i =1

11 + exp[ln( n i n j ) + u ij ( (cid:126) q ) – f ij )]= n j (cid:88) i =1

11 + exp[ln( n i n j ) – u ij ( (cid:126) q ) + f i j )] , (27)where n i and n j are the number of samples from each state.More recent derivations show that this formula is the maxi-mum likelihood estimate of the free energy diﬀerence givensets of samples from the two states [70].Many studies have demonstrated both the theoretical andpractical superiority of BAR over EXP in molecular simula-tions [185, 197], and BAR converges to EXP in the limit that allsamples are from a single state [25, 25, 70]. BAR also requiressigniﬁcantly less overlap between the conﬁgurational spaceof each state to converge than EXP, though some overlapmust still exist.The Bennett acceptance ratio is only deﬁned between twostates. Usually, the endpoints of interest in a free energycalculation are suﬃciently diﬀerent that we will need a chainof states that gradually change the potential energy functionfrom u to u , as discussed in Sec. 7.2.3. You can simply carryout BAR between each pair of states ∆ f → N = ∆ f → + ∆ f → + . . . + ∆ f N –1 → N .There is one important thing to note about the uncertaintyestimates when summing multiple free energies together tocalculate an overall free energy estimate. Although BAR itselfgives a free energy estimate that is asymptotically correct inand is much less biased than the uncertainty estimate for EXP,the uncertainties in ∆ f i –1 → i and ∆ f i → i +1 are not uncorrelated,because they both involve the energies u i ( (cid:126) q ). The variancesof each of the free energies will not propagate as variancesusually do (in quadrature) into the variance of the overallfree energy. Instead, some other method for propagating theuncertainty, such as bootstrapping [41] must be used. Thermodynamic integration (TI)

By taking the derivative of the free energy with respect to thevariable (cid:126)λ , we ﬁnd that: dfd (cid:126)λ = dd (cid:126)λ (cid:34) – ln (cid:90) exp – u ( (cid:126)λ , (cid:126) q ) Z ( (cid:126)λ ) d (cid:126) q (cid:35) = (cid:42) du ( (cid:126)λ , (cid:126) q ) d (cid:126)λ (cid:43) (cid:126)λ . (28)And than we can then numerically integrate df / d (cid:126)λ over analchemical transformation, using a range of diﬀerent well-

28 of 48

LiveCoMS Best Practices Guideestablished techniques, to obtain: ∆ f = (cid:90) (cid:42) du ( (cid:126)λ , (cid:126) q ) d (cid:126)λ (cid:43) (cid:126)λ d (cid:126)λ . (29)This approach to calculating the free energy is called ther-modynamic integration (TI). Averaging over (cid:68) dud (cid:126)λ (cid:69) requiresfewer uncorrelated samples to reach a given level of relativeerror than averaging e – u ( (cid:126) q ) , as the distribution of values isusually narrower, with a more Gaussian shape to the distri-bution. Rather than being limited by overlap, as in the caseof BAR and MBAR (see below), we are instead limited by thebias in the numerical quadrature, which must be minimizedsuﬃciently to be beneath the level of statistical noise.Various numerical integration schemes are possible, butthe trapezoid rule provides a simple and robust scheme. Alltypes of numerical integration can be written as: ∆ f ≈ K (cid:88) k =1 w k (cid:42) du ( (cid:126)λ , (cid:126) q ) d (cid:126)λ (cid:43) k ,where the weights w k correspond to a particular choice ofnumerical integration. Researchers have tried a large num-ber of diﬀerent integration schemes [200–202]. However,many other integration choices require speciﬁc choices of (cid:126)λ to minimize bias, which makes them unsuitable when theintermediates have widely-varying levels of uncertainty. Forexample, integrating a cubic spline interpolation providednegligible beneﬁts over a simple trapezoid rule [203]. Forstarting researchers, we therefore recommend the simpletrapezoid rule scheme, as it allows for maximal ﬂexibility inwhich values of (cid:126)λ are simulated. As ﬁtting to higher orderpolynomials can have numerical instabilities for some energyfunctions, and because alternate functional forms might onlybe appropriate with some types of transformations, expertiseand experience is required to perform such numerical integra-tion modiﬁcations. In practice, adding 2-3 more intermediatestates is typically suﬃcient to match the performance of thesemore complicated numerical quadrature schemes.One drawback of TI is that it requires derivatives with re-spect to (cid:126)λ to be calculated directly in the code. Unfortunately,many problems of interest require using pathways (such asthe soft-core pathways, for removing repulsive interactions)that are not linear, as we discuss, making this more complex.Still, if the code of interest does compute dud (cid:126)λ , then TI is per-haps the simplest method to use, as it involves a very littlepost-processing eﬀort. The multistate Bennett acceptance ratio (MBAR)

One can generalize Bennett’s logic from two states to multiplestates to obtain a free energy estimator that uses energydiﬀerences between conﬁgurations at all intermediate states to compute free energy diﬀerences between all states. MBARgives a system of implicit equations for the free energies f i : f i = – ln N (cid:88) n =1 exp(– u i ( (cid:126) q n )) (cid:80) Kk =1 N k exp( f k – u k ( (cid:126) q n )) , (30)where there are N k samples from each of K states, with (cid:80) k N k = N the total number of samples. Thus, we needto evaluate the energy function u i for all samples obtained atall states in the transformation. The equations can be solvedby a number of diﬀerent standard routines. We note thatthere are only K – 1 independent equations, so only K – 1 ofthe free energies are independent variables, and one of the f i must be speciﬁed (usually, without loss of generality, settingit to zero).MBAR is provably the lowest variance asymptotically unbi-ased estimator of the free energy given the energies of thesamples [204], which means that BAR is also the lowest vari-ance estimator for the free energy diﬀerence between only2 states, as it is mathematically exactly the same as MBAR inthis case. MBAR also provides an uncertainty estimate, de-rived from standard error propagation methods for implicitfunctions, which has been shown to be highly accurate aslong as there are suﬃcient samples at each state [203].MBAR can also be thought of as the Zwanzig estimatorof the free energy to state i where the sampled distributionis the mixture distribution of all the other samples thrown to-gether in one “pot”, deﬁned by p m ( (cid:126) q ) = N –1 (cid:80) k N k exp( f k – u k (cid:126) q ),which is the weighted average of all the individual normal-ized probability distributions from the simulations that areperformed. [205]. Recommendations • We recommend MBAR if all energy diﬀerences are avail-able. It is the lowest variance unbiased free energyestimate given samples from multiple states. • BAR is essentially just as good as MBAR for highly opti-mized (cid:126)λ intermediates. Speciﬁcally, if the (cid:126)λ s are chosensuch that intermediate states have moderate overlapwith their neighbors (i.e. between i and i + 1 and be-tween i and i – 1, they will not have signiﬁcant overlapwith their next nearest neighbors i + 2 and i – 2. ThusMBAR does not actually get signiﬁcant information fromthese energy diﬀerences, so one might as well not evencalculate them, and just perform BAR between nearestneighbors. [203] • TI usually gives similar values as MBAR implementedwith suﬃcient numbers of intermediates, but quadra-ture errors are hard to estimate beforehand can occurif one is not careful. [203] • WHAM is an approximation to MBAR, and there are nocompelling reasons it should be used. If careful, it is not

29 of 48

LiveCoMS Best Practices Guidenecessarily much worse than the other methods, but italways introduces some degree if binning error. • Other variants, especially ones that adaptively deter-mine the free energies can be useful in certain circum-stances but beyond the scope of a Best Practices article.

It is important to consider the variation in your computedfree energies from your equilibrium simulations, in order toobtain an estimate of uncertainty of the obtained value forthe free energies of interest. A recent best practices paper byGrossﬁeld et al., [41] provides substantial detail on how to esti-mate uncertainties from molecular simulations and is a goodstarting point for this topic. In general, the quantiﬁcation ofdiﬀerent error metrics depends on both data generation andanalysis methods used from the ones discussed above.The computation of free energies using TI (Sec.n 8.3)is straightforward and the trapezoidal rule is often recom-mended since it allows unequal spacing of (cid:126)λ states, which isrequired to minimize the variance in the free energy estimate,but in principle any good numerical integration method canbe used. The determination of regions of high curvaturewhen estimating the integral is helpful to determine regionsof phase space where more sampling and/or more (cid:126)λ statesare necessary to obtain the best approximation of theintegral. Plotting (cid:126)λ with respect to the gradients at each ofthe (cid:126)λ values can be be a helpful diagnostic. Additionally,computation of the overall variance of TI requires thecalculation of the overall variance of integration, ratherthan each individual ∆ G i , i +1 and assuming variances addindependently. Therefore, var( ∆ f) = (cid:80) Ki =1 w k var( dud (cid:126)λ ) k .For alchemical changes that result in smooth, low curva-ture sets of (cid:68) dUd (cid:126)λ (cid:69) , a relatively small number of (cid:126)λ states isnecessary for suﬃcient accuracy and low variance in the freeenergy estimate. Depending on the diﬃculty of the pertur-bation, the bias introduced by discretization of the integralcan become large due to increased curvature, and more (cid:126)λ intermediate states become necessary to reduce error. It isrecommended that researchers verify that a suﬃcient num-ber of states are included such that the free energy is essen-tially invariant to the number of lambda intermediate stateschosen. Good heuristics or measures to assess the ’diﬃculty’of a given perturbation is still an ongoing research topic.Compared with TI, the MBAR method (Sec. 8.3) discussedabove provides uncertainty estimation directly from solving aset of linear equations to compute the variances between allstates. The number of states and amount of sampling shouldbe optimized to minimize the uncertainty in the MBAR freeenergy estimate, while balancing other key considerationssuch as computational expense.If possible, it is advisable to analyze the same set of sim- ulations with diﬀerent estimators, providing an opportunityfor synergy. If diﬀerent estimators agree the free energy esti-mate is more reliable than if there are diﬀerences betweenmethods that are larger than 1 kcal/mol and would indicatepoor convergence.Uncertainty can also be assessed for a particular pertur-bation by repeating calculations with slight changes in initialconﬁgurations, forceﬁeld parameters, and diﬀerent randomseeds in the MD engine. The assessment of variability in freeenergy calculations due to repeating simulations has beenpreviously reported [11, 16, 145, 203], and large variance infree energies estimated from simulations with diﬀerent ran-dom seeds should be ﬂagged as issues with convergence.For relative binding free energy calculations, additionalsensitivity analysis can be performed by changing the initialconﬁgurations of non-core regions of the perturbation topol-ogy and determining if this change in conﬁgurations resultsin a large diﬀerences in the computed relative free energy,indicating poor sampling of ligand conﬁguration. The pro-posed changes in conﬁguration are increasingly relevant if noexperimental evidence is available to reduce uncertainty inwhere the changing atoms should be positioned.In addition to statistical uncertainty and sampling, a va-riety of other factors can impact results from binding freeenergy calculations. In addition to the choice of initial con-ﬁguration, results can depend on the choice of force ﬁeldfor the protein/receptor, water, and small molecule(s), so re-running calculations with diﬀerent choices of force ﬁeld canalso be used to assess how sensitive results and conclusionsare to these particular choices. Other factors, like systempreparation (choice of protonation state, tautomer, counte-rion presence, salt concentration, etc.) can also substantiallyimpact results [206, 207], so unless modelers are conﬁdentthey have these factors correct, sensitivity to these choicesmay also need to be examined. There are diﬀerent easily measurable indicators that can testhow well converged simulations are, and if all alchemicalstates have been suﬃciently sampled for a rigorous analysis.Furthermore, once you have established that individual per-turbations are well behaved, there are some tricks to ensurethe overall perturbation network gives reliable results.

Convergence of simulations

Fig. 12 illustrates how looking at the convergence of your datamay be important. In this example, the guest G3 shows diﬀer-ent convergence behaviour for two diﬀerent hosts. The CB8host with guest G3 has a longer correlation time than the octaacid (OA) host. In some cases, slow correlation time may notbe expected and therefore not a feature known in advance.

30 of 48

LiveCoMS Best Practices Guide total simulation time [ns] G [ kc a l / m o l ] CB8-G3 OA-G6

Figure 12.

Average binding free energy of 5 replicate Hamilto-nian replica exchange calculations as a function of total simulationtime (i.e. the sum of the simulation time of all replicas) for thetwo host-guest systems CB8-G3 and OA-G3. Shaded areas repre-sent 95% conﬁdence intervals around the mean computed fromthe 5 replicates data. The horizontal dashed lines show the ﬁnalbinding free energy prediction of the two calculations after a to-tal of 5230 ns for OA-G3 and 6650 ns for CB8-G3. Longer cor-relation times in CB8-G3 cause the calculation to converge moreslowly. The original data used to generate the plot can be foundat https://github.com/samplchallenges/SAMPL6/blob/master/host_guest/Analysis/SAMPLing/Data/reference_free_energies.csv.

To this end, you should always look at all simulation dataavailable check convergence behaviour for each free energyestimate and if need be extend existing simulations or try anapproach that requires simulations in two separate bindingmodes where they interconvert at very slow timescales.

Overlap matrix

One way of assessing reliability of the calculations is checkingthe phase space overlap between neighboring (cid:126)λ -windows [68,69]. For this purpose, a so-called overlap matrix O can beused. O is a K × K matrix, with K being the number of simu-lated states, i.e. values of (cid:126)λ . Suﬃcient overlap is importantfor reweighting estimators such as BAR or MBAR, but can-not help assess reliability of estimates when using TI. Thesematrices are graphical representations of the phase spaceoverlap, i.e. the average probability that a sample generatedat state (cid:126)λ j can be observed at state (cid:126)λ i . As this probability iscomputed considering the samples from all states (and notjust the adjacent states), the values in each row and columnadd up to 1. In this analysis, the goal is to ensure every statehas overlap with its neighbors in both directions – so thatoﬀ-diagonal elements are suﬃciently larger than zero. For ac-curate calculations, the matrix should be at least tridiagonal.Details on the calculation and properties of these matri-ces can be found elsewhere [42]. In an overlap matrix O , theoﬀ-diagonal values ( O i , j , i (cid:54) = j ) are negatively correlated with thevariance of the free energy diﬀerence. Accordingly, the uncer-tainty of the free energy diﬀerence between the states i and j will be smaller when O i , j , i (cid:54) = j is larger (and thus the values in themain diagonal ( O i , j , i = j ) are smaller). In order to obtain a reli-able estimate of the free energy all neighbouring states mustbe connected, i.e. there must be suﬃcient overlap betweenthe samples of these states (general description: O i , j , i (cid:54) = j ≥ threshold). However, due to the mathematical derivation it isdiﬃcult to explicitly describe the relation of the overlap matrixand the variance by formulae. Consequently, the thresholdhas to be derived empirically. It has been proposed that thevalues of the ﬁrst oﬀ-diagonals (i.e. the diagonals above andbelow the main diagonal) should at least be 0.03 to obtain areliable free energy estimate [42]. Smaller values should beconsidered as a warning sign (see Fig. 13 C ), as the variancetends to be underestimated in case of poor overlap.Fig. 13 A , B , and C shows examples of good, mediocre,and poor overlap respectively. For Fig. 13 A , the probability toﬁnd a sample from state i in its neighbouring state j is about0.2 for all states adjacent to the main diagonal, and hence theoverall connectivity is good. In the case of Fig. 13 B , the over-lap is strongly diminishing in the lower right corner, raisingconcerns regarding the reliability of the free energy estimateobtained. For Fig. 13 C , the state at (cid:126)λ index = 6 is connectedto neither of its neighbouring states. While this does not nec-essarily imply that the result for this perturbation is wrong,the energy estimate must at least be considered as highlyunreliable. In order to overcome the issue of poor overlap inthis example, additional sampling should be performed byintroducing additional states, i.e. (cid:126)λ values.Interestingly, as the variance is inversely correlated withthe number of states [42], it can in principle be reduced be-low any arbitrary threshold with enough simulation time anda large enough number of (cid:126)λ windows. However, decreas-ing the variance to a value close to 0 is not feasible, as thisapproach would signiﬁcantly increase the calculation time.While variance can be decreased by increasing simulationlength, if the overlap between states is known to be poor,increasing the number of (cid:126)λ values, or adjusting the spacing ofthose values to better cover regions of poor overlap will likelyprovide a larger immediate impact. Diﬀerent approachesare described in Sec. 7 and more details can be found in theliterature [208, 209]. Cycle closure error

Relative free energy calculations, which compute the changein free energy on making a change to a molecule (e.g. addinga functional group to a ligand) may provide an additionalopportunity for error/consistency checking. Particularly, suchcalculations are often done to span a graph or tree of freeenergy calculations [85, 87, 123]. In some cases the freeenergy change to go between molecules A and B can beobtained via multiple transformation pathways. This allows a

31 of 48

LiveCoMS Best Practices Guide

ABC

Figure 13. Overlap matrices:

Visualising overlap matrices can helpwith assessing the quality of simulation data. ( A ) shows good over-lap with all ﬁrst oﬀ-diagonal entries well above 0.03, the suggestedthreshold, ( B ) is an example of mediocre overlap with good overlapat lower (cid:126)λ values and poor overlap at high (cid:126)λ values. ( C ) shows pooroverlap resulting in disconnected simulations with unreliable MBARestimates. type of consistency checking where we assess how much thefree energy change for that transformation in practice diﬀersfrom equivalence.Signiﬁcant deviations from this typically indicate insuﬃ-cient conﬁgurational sampling along the lambda schedule ofone or more of the transformations involved. This approachmay be generalised to sets of connected transformationsgiven the requirement that the sum of free energy changesalong edges of a closed cycle should be zero. This analysisis called “cycle closure”. In practice, such thermodynamiccycles do not actually sum to zero, and deviations becomeincreasingly large as the size of the cycle increases owingto propagation of error. Though no ﬁrm guidelines haveemerged, it may be judicious to perform additional conﬁgu-rational sampling along edges of a network that are involvedin cycles closing poorly. This may be done by extending theduration of simulations, or by averaging free energy changesover multiple repeats. The latter approach may yield morereproducible free energy changes, but at the expense of astronger bias on the estimated free energies due to repeateduse of the same input coordinates.A scheme to reduce cycle closure errors is used in FEP+whereby calculated free energy changes along the nodes ofthe network are re-sampled assuming estimates of the calcu-lated free energy change along a node may be obtained froma Gaussian distribution centered on the estimated free energychange and with a standard deviation equal to the estimatedstandard deviation of the free energy change. The procedurethen uses a maximum likelihood method to ﬁnd new sets offree energy changes that minimize cycle closure errors [87].An alternative approach computes the free energy changebetween a target and reference compound as a weighted av-erage over all unique paths in the network, with the weightsderived from the propagated uncertainties of each node [16].Approaches as illustrated by Yang et al. for perturbation mapdesign can also be used to compute relative free energiesbetween target and reference compounds [84]. Reversible binding simulations

An even more stringent test of the correctness of binding freeenergy calculations is to compare the results to the equilib-rium binding constants derived from long timescale reversiblebinding simulations [55]. For small ligands with millimolaraﬃnities, repeated binding to and unbinding from the proteincan occur for a large number of times in a suﬃciently long un-biased MD simulation (10-100 µ s), and the equilibrium bind-ing constants can be computed from the ratio of bound tounbound fractions of the simulation time. The agreement be-tween the binding free energy calculations and the reversiblebinding simulations—given the same system preparation andthe same force ﬁeld parameters—will strongly support the

32 of 48

LiveCoMS Best Practices Guidecorrectness of both calculations, as the same results are ar-rived at by two independent methods, and any discrepancywill suggest some systematic error in one, or both, of thetwo methods. As part of validation testing of alchemical freeenergy codes a benchmark set to compare alchemical anddirect computation of equilibrium binding constant shouldbecome standard in future.

It is important to carefully examine output data for commonproblems. Some of the most important things to check forare: • Sampling of the binding site by the ligand:

Makesure the ligand samples the binding site reasonablytightly for its expected potency and ﬁt, and that it doesnot depart out of binding site in the coupled end stateif it is a moderate to strong binder. • Consistency of free energy estimates across diﬀer-ent estimators

Signiﬁcant discrepancies (further out-side the error estimates than would be plausible) be-tween free energies calculated with diﬀerent free en-ergy estimators such as TI, BAR, and MBAR. All of theseestimators converge to the same results with suﬃcientsampling. Diﬀerences between them indicate poor over-lap or errors in processing. • Have replicas mixed well?

Poor replica mixing (forreplica-exchange) or λ -space sampling for single-replicamethods. If the system is not mixing between states,then the states are insuﬃciently close for mixing, or elsethere are bottlenecks in the conﬁgurational samplingthat limit the accuracy. • Behaviour of correlation times:

Correlation time thatdoes not vary relatively smoothly as a function of (cid:126)λ . Dis-continuities indicate that the system is sampling signiﬁ-cantly diﬀerent conﬁgurations with only small changesto the Hamiltonian changes. This usually indicates sam-pling problems. • Dependence of the free energy on initial confor-mation of the system.

Ensemble average propertiesshould not depend on the starting point. • Torsional sampling

Torsions with multiple low-energyminima where some of these minima are that are visitedrarely or not at all. Which torsions have low energy min-ima can best be found by comparing to the simulationin the solvent. There should be clear physical reasonsthat simulation in the complex has diﬀerent torsionaldistributions that the ligand in the solvent. • Free energy dependence on (cid:126)λ

The free energy shouldvary relatively smoothly with (cid:126)λ . If it varies drastically, then either there need to be ﬁner sampling in (cid:126)λ in thisregion, or there are sampling problems there. • Convergence of free energy

The free energy shouldclearly converge as a function or of simulation time (Fig.9). • If using nonequilbirum methods, is the result indepen-dent of the speed at which the nonequilibrium changeis performed? Nonequilibrium methods are in theoryindependent of the switching time in the limit of goodsampling unless the switching time is simply too short. • Visualization of data

In general, inspect output datasuch as energies and visualize the simulation trajecto-ries and assess if they match your expectations. Manyissues can be spotted by a straight forward visualiztion.

Following best practices for data generation and their analysisdoes not mean that data is reported in the optimal way. Asa practitioner of alchemical free energy simulations you alsoshould use best practices for reporting and plotting yourresults. We encourage the following standard set of analysesand ways to represent data.

Statistics to include

As with any modelling technique, misuse of statistical analysiscan skew the perception of how well models perform in freeenergy predictions. First, error estimates should always beincluded on your predictions in whatever form you presentyour data (scatterplots, barplots, etc; see next paragraph).We recommend performing triplicates of your predictionsat minimum, with starting points that are expected to beuncorrelated, to ensure some measure of reliability in yourdata. This replication may seem excessive, but uncertaintyestimates often underestimate the true statistical uncertainty.Where performing multiple replicas of the simulation is notpossible, an error estimate from e.g. MBAR can be used,though bearing in mind this is likely an underestimated error.As alchemical free energy methods are used in drug dis-covery to quantify and rationalise structure activity relation-ships (SAR), the models ability to (a) correlate well with ex-periment and (b) rank-order the molecules by aﬃnity, shouldboth be computed. Conventionally, this means including anR (or Pearson’s R), where R = +1 means high correlation, R = 0 means no correlation, and R = –1 means high anti-correlation) and a Kendall τ (with perfect ranking agreementwhen τ=1 and perfect disagreement when τ=-1) metric in yourresults. Additionally, practitioners may choose to include aSpearman ρ as well. Brown et al. [210] have provided a use-ful analysis in terms of upper bounds of expected possiblecorrelations between experiment and computation with agiven potency range for the compounds. For example, for

33 of 48

LiveCoMS Best Practices Guidepotency ranges of 2 log units it would be impossible to geta higher correlation in R than 0.8 because of experimentaluncertainties [210]. What often is neglected to include is anerror analysis on correlation statistics that arise from the er-rors of both experimental and computed data. One way toinclude such error analysis for correlation metrics is usingbootstrapping on the datasets. The D3R community chal-lenges follows best practices on their data evaluation withreadily available python scripts online [211], based on workby Pat Walters [212]. Other analysis software also providesimilar functionality for bootstrapping datasets [213].Mean unsigned error (MUE, also called mean absoluteerror/MAE) is another key statistic to include in your results.Even though some models’ near-perfect correlation and rank-ing statistics might suggest excellent accuracy, MUE valuescan still have errors of show multiple kcal/mol of error, provid-ing important additional insight into performance. Further-more, MUE allows for unbiased comparisons between predic-tive models as it is less sensitive to dataset size. Other metricssuch as Gaussian Random Aﬃnity Model (GRAM) [214], Pre-dictive Interval (PI) and Relative Absolute Error (RAE), attemptto correct for the inherent potency range of a dataset, whichcan aid in comparing success between diﬀerent targets. Werecommend further reading on evaluation of computationalmodels [210, 212, 215, 216].Reporting the results of relative free energy calculationsrequires care. As shown in Fig. 5, relative free energies canbe performed arbitrarily as a forward or a reverse process,and thus relative free energies may be reported as eitherpositively or negatively valued. The consequence of the twopossible signs for relative free energies is that correlationstatistics (such as Pearson’s R and Kendall τ) can be skeweddepending on which sign is analysed. The issue of this incon-sistency can be circumvented by either plotting all datapointswithin a consistent quadrant [79], or by avoiding the use ofcorrelation statistics for assessment of relative free energycalculations and instead measuring accuracy using RMSE andMUE which are unaﬀected by choice of sign.

Presenting your data

As essentially all alchemical free energy prediction schemesare regression problems, the preferred type of plot is a scat-ter plot (see Fig. 14). Most alchemical free energy projectswill look at 10-50 ligands; any study with <10 ligands is moresuitable for bar plots (with inclusion of error bars), and willunlikely provide meaningful statistics. Any study with >50ligands often contains multiple protein targets to which al-chemical free energies may perform better on some targetsthan others. Because of this, it is bad practice to place multi-ple datasets on the same plot as this can suggest high modelaccuracy even though the individual models perform less

Figure 14. An example of recommended practices for graphingalchemical free energy predictions.

This ﬁgure shows the relationbetween predicted and experimentally-determined Gibbs free en-ergy in kcal/mol with standard errors as error bars. The dark andlight-orange regions depict the 1- and 2-kcal/mol conﬁdence bounds.Statistical metrics for the data are reported, with 95% conﬁdenceintervals determined by bootstrapping analysis. Extra care should betaken when investigating potential outliers further. well [216].As we are interested mainly in the linear relationship be-tween the alchemical free energy predictions and the experi-mentally -determined aﬃnity values, plots should be depictedwith the same range on both axes (i.e. x = y ) with a 1:1 as-pect ratio, with units for both experiment and simulationconverted to be the same. If this skews the plot to a pointwhere it is diﬃcult to read of information, using the samedimensions, such that e.g. 1 cm is 1 kcal/mol is acceptable.Furthermore, bounds should be depicted for the 1- and 2-kcal/mol conﬁdence regions. These regions can serve as toolsto communicate your model performance: any predictionsinside the 1 kcal/mol region can be seen as highly reliable,any predictions inside the 2 kcal/mol region should be seenas somewhat reliable, and any predictions outside the conﬁ-dence regions should be expected to be unreliable (and han-dled as outliers). In a drug discovery context, this type of datadepiction may suggest the reliability of alchemical FE predic-tions in the project, and can give an idea of how trustworthypredictions can be for synthesis ideas. It is also recommendedto included experimental error bars in all plots.An example of a best practice scatter comparison betweencomputed and experimental values is shown in Fig. 14, high-lighting outliers, error bars and conﬁdence intervals. The data

34 of 48

LiveCoMS Best Practices Guidefor this plot is artiﬁcially generated for illustration purposes.

Alchemical free energy calculations have seen a vast increasein popularity both in academic research as well as pharma-ceutical industry applications in structure based drug discov-ery [36, 38, 217]. Commercial products such as FEP+ andFlare, which provide a convenient user interface make thesetup and use of these methods a lot easier [15, 19], but thisconvenience comes with less ﬂexibility in terms of choice ofsimulation protocols. It is also important to understand thecurrent limitations of the methodology to recognise whenautomated workﬂow tools can be used eﬀectively for a givenprotein target and when they are likely to fail still. Prospectiveprediction challenges such as the Drug Design Data resourcegrand challenges provide a community driven platform toevaluate diﬀerent free energy protocols against each other onblinded targets [218, 219]. Such eﬀorts have highlighted thatselection of seemingly identical or similar potential energyfunction or simulation package does not guarantee produc-tion of similar free energies owing to diﬀerences in simulationprotocols. We hope that the best practice guide provides aset of tools that allow a better understanding of how to setup,run, and reliably interpret alchemical free energy calculations.

10 Selection of available softwarepackages

There are many diﬀerent software solutions available forthe setup, running, and analysis of alchemcial free energycalculations. These will vary in customizability and ways inwhich they are ran, e.g. graphical user interface versus com-mand line tool or python script. The following provides anon-exhaustive list of commercial and noncommercial toolsavailable for conducting alchemical free energy calculations.

Simulation software: Commercial–

FEP+ is a tool oﬀered by Schrödinger Inc. under acommercial license. It has an intuitive GUI whichmakes it easier for non-experts to run alchemicalfree energy calculations and analyze the results. Itruns the DESMOND MD package under the hoodand hence parallelizes well on GPUs [15]. – Flare is a commercial structure-based drug designsoftware oﬀered by Cresset. Similar to FEP+ ithas an easily accessible graphical user interfaceand strives to facilitate free energy calculations fornon-experts while oﬀering advanced users full con-trol via a Python API. It only runs on GPUs, usingCUDA or OpenCL [19]. It is build on top of the opensource software packages Sire and BioSimSpace(cf. below). – The molecular operating environment (MOE) of-fered by the Chemical Computing Group (CCG) hasa tool for performing free energy calculations. It isbuilt on AMBER-TI (cf. below).All the above tools also provide a convenient setup andanalysis suite and are really a one in all product.

Simulation software: Free/low cost academic andCommercial–

CHARMM has a variety of tools developed overthe years. The PERT module can be used to de-ﬁne initial and ﬁnal states and deﬁne the inter-mediate lambda points. FREN and BAR modulescan be used to analyze the data after the MD run.Lambda-dynamics-based free energy calculationcan be carried out using the BLOCK module. – AMBER, including its new pmemd.cuda version sup-ports free energy calculations [220].

Simulation software: Open Source–

PLUMED is an open source tool which enablesthe usage of a variety of MD engines. It is de-signed as a plugin for MD packages such that itanalyzes the trajectory on the ﬂy. It also oﬀers aVMD based plugin for the computation of collectivevariables [221]. – BioSimSpace is a free, open source, multiscalemolecular simulation framework, written to allowcomputational modellers to quickly prototype anddevelop new algorithms for molecular simulationand molecular design [34]. – Sire is a multiscale, molecular simulation frame-work that provides several applications, includingSOMD, an MD/MC code for performing FEP calcu-lations via an interface to OpenMM. – YANK is a tool developed by John Chodera andgroup on the top of OpenMM MD package. It al-lows the users to write their inputs in easy-to-useYAML format. – GROMACS is a molecular simulation package witha signiﬁcant number of free energy methods im-plementations. The LiveCOMS GROMACS tutorialincludes an example free energy calculation [222]. – PMX, an add-on to GROMACS, oﬀers a mutationfree energy calculation module[223]. – Q is MD code for performing FEP calculations usinga variety of force ﬁelds [224].

Setup tools:–

PMX: GROMACS https://github.com/deGrootLab/pmx.

35 of 48

LiveCoMS Best Practices Guide – Lomap/Lomap2 : Relative alchemical transforma-tion graph planning for setting up perturbationnetworks [123]. – CHARMM-GUI is a web-based tool for setting up avariety of MD simulations. It can be used to gen-erate CHARMM scripts for solvation and ligand-binding free energy calculations [225]. – QligFEP oﬀers robust and fast setup of FEP calcula-tions for the software package Q [82]. – ProtoCaller, a setup tool for the automation of Gro-macs free energy calculations. ProtoCaller [226] – FESetup has been developed primarily to setupcalculations in AMBER, GROMACS and SIRE [80]

Analysis tools:–

Alchemlyb: Multipackage free energy analysishttps://github.com/alchemistry/alchemlyb [227]. – pymbar: MBAR implementation, but haveto roll your own analysis wrapper https://github.com/choderalab/pymbar [26]. – Arsenic: Standardising alchemical free energy anal-ysis https://github.com/openforceﬁeld/Arsenic – Free Energy Workﬂows: Sire-speciﬁc free energymap analysis using weighted path averages https://github.com/michellab/freenrgworkﬂows.Generally, commercial software will oﬀer more com-plete pipelines in which standalone analysis applica-tions are not necessarily needed; free and open sourcepackages often require manual analysis.

11 Alchemical free energy datasets: anoverview

The following contains a non-exhaustive summary of alchemi-cal free energy datasets that can serve as a starting pointto review approaches or test new implementations. Theﬁeld is moving towards a more standardised way of gener-ating protein-ligand benchmark datasets and the progressof these eﬀorts can be tracked here: https://github.com/openforceﬁeld/FE-Benchmarks-Best-Practices. Currently lack-ing an exhaustive set of benchmark datasets, the review byWilliams-Noonan et al. [228] contains an overview of recentlypublished alchemical free energy studies. For comparison ofFEP+ and Gromacs (using the AMBER99SB-ILDN and GAFF2force ﬁeld), cf. the recently published study by Pérez-Benitoet al. [79]. An overview of further suggested benchmark setscan be found in the review by Mobley and Gilson [206] oron alchemistry.org [229]. These include cyclodextrins, theCytochrome C peroxidase (CCP) protein model binding site,thrombin and bromodomains as well as solvation benchmarksets [203]. Please refer to table 1, for a small overview of

Table 1.

Selection of example datasets

Publication Targets Ligands Force Field

D3R Grand Challenges [230]GC3 [219] 6 266 variousGC2 [218] 1 102 variousGC2015 [231] 2 215 variousSAMPL Challenges [232]SAMPL6 [233] 3 21 variousSAMPL5 [234] 3 22 variousSAMPL4 [235] 2 23 variousSchrödinger DatasetsFEP+ Dataset [15] 8 199 OPLS2.1FEP+ Dataset [71] 8 199 OPLS3FEP+ Dataset [236] 8 199 OPLS3eFEP+ Dataset [17] 8 199 GAFF 1.8FEP+ Dataset [18] 8 199 variousFEP+ Dataset [19] 8 199 GAFF2.1Fragments [177] 8 96 OPLS2.1Scaﬀold Hopping [93] 6 21 OPLS3Scaﬀold Hopping [19] 6 21 GAFF2.1Macrocycles [237] 7 33 OPLS3Further Suggested DatasetsCucurbit[7]uril (CB7) [206] 1 15 NADeep cavity cavitand [206] 2 19 NAT4 Lysozyme [206] 2 20 NAMerck set [238] 5 169 OPSL3datasets, what forceﬁelds they used, and what the originalstudy was it came from.

12 Checklist

36 of 48

LiveCoMS Best Practices Guide

KNOW WHAT YOU WANT TO SIMULATE

Initial questions you should ask before you set up an alchemical free energy calculation using molecular dynam-ics simulations (cid:3)

Do I understand the biology, chemistry and physics of my system? (cid:3)

Have I properly prepared my protein and ligand systems? (cid:3)

Does my system contain any structures that require custom parameters? (cid:3)

What simulation protocol will provide the most evidence to answer my hypothesis? (cid:3)

Are the projected computational expense and runtime realistic for my goals? (cid:3)

Will my protocol be reproducible? (cid:3)

Will my statistics be reliable? If not, would more replicates solve the problem? (cid:3)

Can I open-source my data?

PREPARING YOUR SIMULATIONS

Steps to getting started setting up your alchemical free energy calculation (cid:3)

Make sure you know why you have picked your (combination of) force ﬁeld(s) (cid:3)

Minimize your system (cid:3)

Equilibrate your system with your choice of thermodynamic ensemble (cid:3)

Check the stability of your system and whether it behaves the way you believe it should

RUNNING ABSOLUTE SIMULATIONS

Steps to running your absolute alchemical free energy calculations (cid:3)

Check your ligands have the same, biologically correct binding pose (cid:3)

Make sure your λ-scheduling is set appropriately (cid:3)

Check if your ligands are discharging and decoupling correctly (cid:3)

Set up your restraints correctly (cid:3)

Make sure you subsample the data in your free energy estimation protocol (cid:3)

Apply the appropriate correction terms

RUNNING RELATIVE SIMULATIONS

Steps to running your relative alchemical free energy calculations (cid:3)

Check your ligands have the same, biologically correct binding pose (cid:3)

Make sure your λ-scheduling is set correctly (cid:3)

Make sure your molecular transformations are realistic (1-5 heavy atoms for reliable computations) (cid:3)

Generate a perturbation network by your method of choice; check whether you have enough cycle closures to checkconsistency in the results (cid:3)

Check whether dummy atoms were assigned correctly (cid:3)

Consider subsampling the data in your free energy estimation protocol (cid:3)

Apply the appropriate correction terms

37 of 48

LiveCoMS Best Practices Guide

HOW DO I KNOW WHICH SIMULATIONS ARE UNRELIABLE?

Situations suggesting your relative alchemical free energy calculations have not run properly (assuming absenceof experimental aﬃnities) (cid:3)

Standard error (σ) should not be >1 kcal · mol –1 (cid:3) Simulated systems have not converged - trajectories should be manually checked for consistency; other methods suchas generating RMSD plots are also recommended

Relative: (cid:3)

If you observe hysteresis in perturbations and incorrect cycle closures (cid:3)

Energy diﬀerences > ∼

15 kcal · mol –1 are likely unreliable Absolute: (cid:3)

Energies < ∼ -15 kcal · mol –1 are likely unreliable (cid:3) The ligand has not sampled most of the intended region after the decoupling step (cid:3)

The ligand is drifting out of the intended region after the decoupling step

WHY ARE THEY NOT RELIABLE?

Suggestions for ﬁnding out why your alchemical free energy calculations may not be reliable (cid:3)

Check again whether dummy atoms were assigned correctly (cid:3)

Inspect the trajectories across the λ-schedule (particularly the endpoints) for problems described in the text (cid:3)

Inspect the overlap matrices for lack of overlap

DATA ANALYSIS

Steps to analyzing your output data correctly (cid:3)

Make sure you have run enough replicates to ensure statistical reliability (>3) (cid:3)

Compute both correlation and ranking coeﬃcients and ranking statistics (e.g. r, ρ, MUE and τ) (cid:3)

Include error bars in all your visual analyses

38 of 48

LiveCoMS Best Practices Guide

Author Contributions

ASJSM : Coordinated the document, contributed to mostsections, and co-designed Figs. 2, 3, 4, 5, 6, 14,and createdFigs. 7, 1, 13, 10 and replotted 11 and 5. JS : Created Figs. 1, 2, 3, 4, 6, 14, and an initial draft of 5.Wrote Sec. 8.7, the checklist Sec. 12, and contributed togeneral formatting discussions and editing. MK : Contributed to Sec. 8, provided the data for ﬁgure 13,compiled the dataset for Sec. 11 and helped edit the paper. DLM : Contributed to the outline, drafted some of thesections, gave ideas on ﬁgures, and helped edit the paper. GT : Contributed to Sec. 1 and 5, and helped edit the paper. AR : Created ﬁgure 12, contributed to sections 3 and 7, andhelped edit the paper. MRS : Helped create ﬁgure 7, wrote Sec. 7.2.3 describingchoices for alchemical pathways and parts of 8 on theanalysis for free energy calculations. Reviewed and editedtext throughout.

LNN : Helped write the simulation length, stopping condi-tions, and information saving section. Edited and reviewedalchemical path section. BA : Helped write the uncertainty estimation, stoppingconditions, and output analysis sections and created ﬁgure9. JM : Contributed to Sec. 4.2, 6, 7.2.3, 8.3, 8.4, and 9 JDC : Wrote Sec. 8.2 and 8.1 discussed structure and design ofthe whole document, suggested Figs. 1 and 8 HX : Contributed Sec. 4.4, to Sec. 7.1.1, and to Sec. 8.5. HBM

Contributed to Sec. 8.7 and Fig. 14 and helped edit thepaper.For a more detailed description of author contribu-tions, see the GitHub issue tracking and changelog athttps://github.com/michellab/alchemical-best-practices.

Other Contributions

Julia E. Rice participated in the original discussion of the docu-ment at the Best Practices in Molecular Simulation WorkshopHosted by at NIST, Gaithersburg, MD, August 24th-25th, 2017For a more detailed description of contributionsfrom the community and others, see the GitHub issuetracking and changelog at https://github.com/michellab/alchemical-best-practices.

Potentially Conﬂicting Interests

JM is a current member of the Scientiﬁc Advisory Board ofCresset. MK is employed by Cresset who commercially dis-tribute a software for performing alchemical free energy cal-culations. MRS is a Open Science Fellow and consultant forSilicon Therapeutics.

Funding Information

ASJSM and JM acknowledge funding through an EPSRCﬂagship software grant: EP/P022138/1 MK and JM acknowl-edge funding through Innovate UK by KTP partnership011120. AR acknowledges partial support from the Tri-Institutional Program in Computational Biology and Medicineand the Sloan Kettering Institute. HEBM acknowledgessupport from a Molecular Sciences Software InstituteInvestment Fellowship and Relay Therapeutics. JDC isa current member of the Scientiﬁc Advisory Board ofOpenEye Scientiﬁc Software and a consultant to ForesiteLaboratories. A complete funding history for the Choderalab can be found at http://choderalab.org/funding. DLMappreciates ﬁnancial support from the National Institutesof Health (1R01GM108889-01, 1R01GM124270-01A1 andR01GM132386), and the National Science Foundation (CHE1352608).

Author Information

ORCID:

Antonia S. J. S. Mey: 0000-0001-7512-5252Bryce Allen: 0000-0002-0804-8127Hannah E. Bruce Macdonald: 0000-0002-5562-6866John D. Chodera: 0000-0003-0542-119XMaximilian Kuhn: 0000-0002-2811-3934Julien Michel: 0000-0003-0360-1760David L. Mobley: 0000-0002-1083-5533Levi N. Naden: 0000-0002-3692-5027Samarjeet Prasad: 0000-0001-8320-6482Andrea Rizzi: 0000-0001-7693-2013Jenke Scheen: 0000-0001-9781-0445Michael R. Shirts: 0000-0003-3249-1097Gary Tresadern: 0000-0002-4801-1644Huafeng Xu: 0000-0001-5447-0452

References [1]

Zwanzig RW . High-Temperature Equation of State by a Per-turbation Method. I. Nonpolar Gases. J Chem Phys. 1954;22(8):1420–1426.[2]

Tembre BL , Mc Cammon JA. Ligand-Receptor Interactions.Comput Chem. 1984; 8(4):281–283. doi: 10.1016/0097-8485(84)85020-2.[3]

Rustenburg AS , Dancer J, Lin B, Feng JA, Ortwine DF, MobleyDL, Chodera JD. Measuring Experimental Cyclohexane-WaterDistribution Coeﬃcients for the SAMPL5 Challenge. J ComputAided Mol Des. 2016; 30(11):945–958. doi: 10.1007/s10822-016-9971-7.[4]

Bosisio S , Mey ASJS, Michel J. Blinded Predictions of Distribu-tion Coeﬃcients in the SAMPL5 Challenge. J Comput Aided MolDes. 2016; 30(11):1101–1114. doi: 10.1007/s10822-016-9969-1.39 of 48

LiveCoMS Best Practices Guide [5]

Corey RA , Vickery ON, Sansom MSP, Stansfeld PJ. Insightsinto Membrane Protein-Lipid Interactions from Free EnergyCalculations. J Chem Theory Comput. 2019; 15(10):5727–5736.doi: 10.1021/acs.jctc.9b00548.[6]

Hauser K , Negron C, Albanese SK, Ray S, Steinbrecher T, AbelR, Chodera JD, Wang L. Predicting Resistance of Clinical AblMutations to Targeted Kinase Inhibitors Using Alchemical Free-Energy Calculations. Commun Biol. 2018; 1(1):70.[7]

Aldeghi M , Gapsys V, de Groot BL. Accurate Estimation ofLigand Binding Aﬃnity Changes upon Protein Mutation. ACSCent Sci. 2018; 4(12):1708–1718.[8]

Seeliger D , De Groot BL. Protein Thermostability CalculationsUsing Alchemical Free Energy Simulations. Biophys J. 2010;98(10):2309–2316.[9]

Gapsys V , Michielssens S, Seeliger D, de Groot BL. Insightsfrom the First Principles Based Large Scale Protein Thermosta-bility Calculations. Biophys J. 2016; 110(3):368a.[10]

Gapsys V , Michielssens S, Seeliger D, de Groot BL. Accurateand Rigorous Prediction of the Changes in Protein Free En-ergies in a Large-Scale Mutation Scan. Angewandte ChemieInternational Edition. 2016; 55(26):7364–7368.[11]

Aldeghi M , de Groot BL, Gapsys V. Accurate Calculation of FreeEnergy Changes upon Amino Acid Mutation. In:

ComputationalMethods in Protein Evolution

Springer; 2019.p. 19–47.[12]

Mobley DL , Graves AP, Chodera JD, McReynolds AC, ShoichetBK, Dill KA. Predicting Absolute Ligand Binding Free Energiesto a Simple Model Site. J Mol Biol. 2007; 371(4):1118–1134.[13]

Aldeghi M , Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Ac-curate Calculation of the Absolute Free Energy of Bindingfor Drug Molecules. Chem Sci. 2015; 7(1):207–218. doi:10.1039/C5SC02678D.[14]

Aldeghi M , Heifetz A, Bodkin MJ, Knapp S, Biggin PC. Pre-dictions of Ligand Selectivity from Absolute Binding Free En-ergy Calculations. J Am Chem Soc. 2017; 139(2):946–957. doi:10.1021/jacs.6b11467.[15]

Wang L , Wu Y, Deng Y, Kim B, Pierce L, Krilov G, Lupyan D,Robinson S, Dahlgren MK, Greenwood J, Romero DL, MasseC, Knight JL, Steinbrecher T, Beuming T, Damm W, Harder E,Sherman W, Brewer M, Wester R, et al. Accurate and ReliablePrediction of Relative Ligand Binding Potency in ProspectiveDrug Discovery by Way of a Modern Free-Energy CalculationProtocol and Force Field. J Am Chem Soc. 2015; 137(7):2695–2703. doi: 10.1021/ja512751q.[16]

Mey ASJS , Juárez-Jiménez J, Hennessy A, Michel J. Blinded Pre-dictions of Binding Modes and Energies of HSP90- α Ligands forthe 2015 D3R Grand Challenge. Bioorganic & Medicinal Chem-istry. 2016; 24(20):4890–4899. doi: 10.1016/j.bmc.2016.07.044.[17]

Song LF , Lee TS, Zhu C, York DM, Merz KM. Using AMBER18for Relative Free Energy Calculations. J Chem Inf Model. 2019;59(7):3128–3135. doi: 10.1021/acs.jcim.9b00105. [18]

Gapsys V , Pérez-Benito L, Aldeghi M, Seeliger D, van Vlijmen H,Tresadern G, de Groot BL. Large Scale Relative Protein LigandBinding Aﬃnities Using Non-Equilibrium Alchemy. Chem Sci.2020; doi: 10.1039/C9SC03754C.[19]

Kuhn M , Firth-Clark S, Tosco P, Mey ASJS, Mackey M, MichelJ. Assessment of Binding Aﬃnity via Alchemical Free-EnergyCalculations. J Chem Inf Model. 2020; 60(6):3120–3130. doi:10.1021/acs.jcim.0c00165.[20]

Kirkwood JG . Statistical Mechanics of Fluid Mixtures. J ChemPhys. 1935; 3:300–313.[21]

Jorgensen WL , Ravimohan C. Monte Carlo Simulation of Dif-ferences in Free Energies of Hydration. J Chem Phys. 1985;83(6):3050–3054.[22]

Kollman PA . Free Energy Calculations: Applications to Chem-ical and Biochemical Phenomena. Chem Rev. 1993; 7:2395–2417.[23]

Wong CF , McCammon JA. Dynamics and Design of Enzymesand Inhibitors. J Am Chem Soc. 1986; 108(13):3830–3832. doi:10.1021/ja00273a048.[24]

Merz KM , Kollman PA. Free Energy Perturbation Simulations ofthe Inhibition of Thermolysin: Prediction of the Free Energy ofBinding of a New Inhibitor. J Am Chem Soc. 1989; 111(15):5649–5658. doi: 10.1021/ja00197a022.[25]

Bennett CH . Eﬃcient Estimation of Free Energy Diﬀerencesfrom Monte Carlo Data. J Comput Phys. 1976; 22:245–268.[26]

Shirts MR , Chodera JD. Statistically Optimal Analysis of Sam-ples from Multiple Equilibrium States. J Chem Phys. 2008;129(12):124105. doi: 10.1063/1.2978177.[27]

Wu H , Paul F, Wehmeyer C, Noé F. Multiensemble MarkovModels of Molecular Thermodynamics and Kinetics. ProcNatl Acad Sci. 2016; 113(23):E3221–E3230. doi: 10.1073/p-nas.1525092113.[28]

Mey ASJS , Wu H, Noé F. xTRAM: Estimating Equilibrium Ex-pectations from Time-Correlated Simulation Data at MultipleThermodynamic States. Phys Rev X. 2014; 4(4):041018. doi:10.1103/PhysRevX.4.041018.[29]

Wu H , Mey ASJS, Rosta E, Noé F. Statistically Optimal Anal-ysis of State-Discretized Trajectory Data from Multiple Ther-modynamic States. J Chem Phys. 2014; 141(21):214106. doi:10.1063/1.4902240.[30]

Shirts MR , Pitera JW, Swope WC, Pande VS. Extremely PreciseFree Energy Calculations of Amino Acid Side Chain Analogs:Comparison of Common Molecular Mechanics Force Fields forProteins. J Chem Phys. 2003; 119(11):5740–5761.[31]

Shirts MR , Pande VS. Solvation Free Energies of Amino AcidSide Chains for Common Molecular Mechanics Water Models.J Chem Phys. 2005; 122:134508.[32] van der Spoel D , Lindahl E, Hess B, Groenhof G, Mark AE,Berendsen HJC. GROMACS: Fast, Flexible, and Free. J ComputChem. 2005; 26:1701–1718.40 of 48

LiveCoMS Best Practices Guide [33]

Mermelstein DJ , Lin C, Nelson G, Kretsch R, McCammon JA,Walker RC. Fast and Flexible Gpu Accelerated Binding FreeEnergy Calculations within the Amber Molecular DynamicsPackage. J Comput Chem. 2018; 39(19):1354–1358. doi:10.1002/jcc.25187.[34]

Hedges L , Mey A, Laughton C, Gervasio F, Mulholland A, WoodsC, Michel J. BioSimSpace: An Interoperable Python Frame-work for Biomolecular Simulation. J Open Source Softw. 2019;4(43):1831. doi: 10.21105/joss.01831.[35]

Fratev F , Sirimulla S. An Improved Free Energy PerturbationFEP+ Sampling Protocol for Flexible Ligand-Binding Domains. .2019; doi: 10.26434/chemrxiv.6204167.v2.[36]

Schindler C , Baumann H, Blum A, Böse D, Buchstaller HP,Burgdorf L, Cappel D, Chekler E, Czodrowski P, Dorsch D, EguidaM, Follows B, Fuchß T, Grädler U, Gunera J, Johnson T, Jorand Le-brun C, Karra S, Klein M, Kötzner L, et al. Large-Scale Assess-ment of Binding Free Energy Calculations in Active Drug Dis-covery Projects. . 2020; doi: 10.26434/chemrxiv.11364884.v1.[37]

Cournia Z , Allen B, Sherman W. Relative Binding Free EnergyCalculations in Drug Discovery: Recent Advances and PracticalConsiderations. J Chem Inf Model. 2017; 57(12):2911–2937.doi: 10.1021/acs.jcim.7b00564.[38]

Sherborne B , Shanmugasundaram V, Cheng AC, Christ CD,DesJarlais RL, Duca JS, Lewis RA, Loughney DA, Manas ES, Mc-Gaughey GB, Peishoﬀ CE, van Vlijmen H. Collaborating to Im-prove the Use of Free-Energy and Other Quantitative Methodsin Drug Discovery. J Comput Aided Mol Des. 2016; 30(12):1139–1141. doi: 10.1007/s10822-016-9996-y.[39]

Salomon-Ferrer R , Götz AW, Poole D, Le Grand S, WalkerRC. Routine Microsecond Molecular Dynamics Simulationswith AMBER on GPUs. 2. Explicit Solvent Particle MeshEwald. J Chem Theory Comput. 2013; 9(9):3878–3888. doi:10.1021/ct400314y.[40]

Braun E , Gilmer J, Mayes HB, Mobley DL, Monroe JI, PrasadS, Zuckerman DM. Best Practices for Foundations in Molec-ular Simulations [Article v1.0]. LiveCoMS. 2019; 1(1). doi:10.33011/livecoms.1.1.5957.[41]

Grossﬁeld A , Patrone PN, Roe DR, Schultz A J, Siderius D, Zuck-erman DM. Best Practices for Quantiﬁcation of Uncertaintyand Sampling Quality in Molecular Simulations [Article v1.0].LiveCoMS. 2018; 1(1):5067. doi: 10.33011/livecoms.1.1.5067.[42]

Klimovich PV , Shirts MR, Mobley DL. Guidelines for the Analy-sis of Free Energy Calculations. J Comput Aided Mol Des. 2015;29(5):397–411. doi: 10.1007/s10822-015-9840-9.[43]

Shirts MR . Best Practices in Free Energy Calculations for DrugDesign. In: Baron R, editor.

Computational Drug Discovery andDesign

Methods in Molecular Biology, New York, NY: Springer;2012.p. 425–467. doi: 10.1007/978-1-61779-465-0_26.[44]

Grinter SZ , Zou X. Challenges, Applications, and Re-cent Advances of Protein-Ligand Docking in Structure-BasedDrug Design. Molecules. 2014; 19(7):10150–10176. doi:10.3390/molecules190710150. [45]

Lameira J , Bonatto V, Cianni L, Rocho FdR, Leitão A, Monta-nari CA. Predicting the Aﬃnity of Halogenated ReversibleCovalent Inhibitors through Relative Binding Free Energy.Phys Chem Chem Phys. 2019; 21(44):24723–24730. doi:10.1039/C9CP04820K.[46]

Gill SC , Lim NM, Grinaway PB, Rustenburg AS, Fass J, RossGA, Chodera JD, Mobley DL. Binding Modes of Ligands UsingEnhanced Sampling (BLUES): Rapid Decorrelation of LigandBinding Modes via Nonequilibrium Candidate Monte Carlo. JPhys Chem B. 2018; doi: 10.1021/acs.jpcb.7b11820.[47]

Genheden S , Ryde U. The MM/PBSA and MM/GBSA Methodsto Estimate Ligand-Binding Aﬃnities. Expert Opin Drug Dis.2015; 10(5):449–461. doi: 10.1517/17460441.2015.1032936.[48]

Gutiérrez-de-Terán H , Åqvist J. Linear Interaction Energy:Method and Applications in Drug Design. In: Baron R, editor.

Computational Drug Discovery and Design

Methods in MolecularBiology, New York, NY: Springer New York; 2012.p. 305–323.doi: 10.1007/978-1-61779-465-0_20.[49]

Heinzelmann G , Henriksen NM, Gilson MK. Attach-Pull-Release Calculations of Ligand Binding and ConformationalChanges on the First BRD4 Bromodomain. J Chem Theory Com-put. 2017; 13(7):3260–3275. doi: 10.1021/acs.jctc.7b00275.[50]

Loeﬄer HH , Bosisio S, Duarte Ramos Matos G, Suh D, RouxB, Mobley DL, Michel J. Reproducibility of Free Energy Calcu-lations across Diﬀerent Molecular Simulation Software Pack-ages. J Chem Theory Comput. 2018; 14(11):5567–5582. doi:10.1021/acs.jctc.8b00544.[51]

Vassetti D , Pagliai M, Procacci P. Assessment of GAFF2 andOPLS-AA General Force Fields in Combination with the WaterModels TIP3P, SPCE, and OPC3 for the Solvation Free Energyof Druglike Organic Molecules. J Chem Theory Comput. 2019;15(3):1983–1995. doi: 10.1021/acs.jctc.8b01039.[52]

Lopes PEM , Guvench O, MacKerell AD. Current Status of Pro-tein Force Fields for Molecular Dynamics. Methods Mol Biol.2015; 1215:47–71. doi: 10.1007/978-1-4939-1465-4_3.[53]

Gilson MK , Given JA, Bush BL, McCammon JA. A Statistical-Thermodynamic Basis for Computation of Binding Aﬃnities: ACritical Review. Biophys J. 1997; 72(3):1047–1069.[54]

Jong DHD , Schäfer LV, Vries AHD, Marrink SJ, Berendsen HJC,Grubmüller H. Determining Equilibrium Constants for Dimer-ization Reactions from Molecular Dynamics Simulations. JComput Chem. 2011; 32(9):1919–1928. doi: 10.1002/jcc.21776.[55]

Pan AC , Xu H, Palpant T, Shaw DE. Quantitative Char-acterization of the Binding and Unbinding of Millimo-lar Drug Fragments with Molecular Dynamics Simulations.J Chem Theory Comput. 2017; 13(7):3372–3377. doi:10.1021/acs.jctc.7b00172.[56]

Teo I , Mayne CG, Schulten K, Lelièvre T. Adaptive Multi-level Splitting Method for Molecular Dynamics Calculation ofBenzamidine-Trypsin Dissociation Time. J Chem Theory Com-put. 2016; 12(6):2983–2989. doi: 10.1021/acs.jctc.6b00277.41 of 48

LiveCoMS Best Practices Guide [57]

Votapka LW , Jagger BR, Heyneman AL, Amaro RE. SEEKR:Simulation Enabled Estimation of Kinetic Rates, A Compu-tational Tool to Estimate Molecular Kinetics and Its Applica-tion to Trypsin–Benzamidine Binding. J Phys Chem B. 2017;121(15):3597–3606. doi: 10.1021/acs.jpcb.6b09388.[58]

Doerr S , De Fabritiis G. On-the-Fly Learning and Samplingof Ligand Binding by High-Throughput Molecular Simula-tions. J Chem Theory Comput. 2014; 10(5):2064–2069. doi:10.1021/ct400919u.[59]

Plattner N , Noé F. Protein Conformational Plasticity and Com-plex Ligand-Binding Kinetics Explored by Atomistic Simulationsand Markov Models. Nat Commun. 2015; 6(1):1–10. doi:10.1038/ncomms8653.[60]

Dixon T , Lotz SD, Dickson A. Predicting Ligand Binding Aﬃn-ity Using On- and oﬀ-Rates for the SAMPL6 SAMPLing Chal-lenge. J Comput Aided Mol Des. 2018; 32(10):1001–1012. doi:10.1007/s10822-018-0149-3.[61]

Basavapathruni A , Jin L, Daigle SR, Majer CRA, Therkelsen CA,Wigle TJ, Kuntz KW, Chesworth R, Pollock RM, Scott MP, MoyerMP, Richon VM, Copeland RA, Olhava EJ. Conformational Adap-tation Drives Potent, Selective and Durable Inhibition of theHuman Protein Methyltransferase DOT1L. Chemical Biology &Drug Design. 2012; 80(6):971–980. doi: 10.1111/cbdd.12050.[62]

Hyre DE , Trong IL, Merritt EA, Eccleston JF, Green NM,Stenkamp RE, Stayton PS. Cooperative Hydrogen Bond In-teractions in the Streptavidin–Biotin System. Protein Sci. 2006;15(3):459–467. doi: 10.1110/ps.051970306.[63]

Eastman P , Swails J, Chodera JD, McGibbon RT, Zhao Y,Beauchamp KA, Wang LP, Simmonett AC, Harrigan MP, Stern CD,Wiewiora RP, Brooks BR, Pande VS. OpenMM 7: Rapid Develop-ment of High Performance Algorithms for Molecular Dynamics.PLoS Comput Biol. 2017; 13(7):e1005659. doi: 10.1371/jour-nal.pcbi.1005659.[64]

Kutzner C , Páll S, Fechner M, Esztermann A, de Groot BL, Grub-müller H. More Bang for Your Buck: Improved Use of GPUNodes for GROMACS 2018. J Comput Chem. 2019; 40(27):2418–2431. doi: 10.1002/jcc.26011.[65]

Woo HJ , Roux B. Calculation of Absolute Protein–Ligand Bind-ing Free Energy from Computer Simulations. Proc Natl AcadSci. 2005; 102(19):6825–6830. doi: 10.1073/pnas.0409005102.[66]

Velez-Vega C , Gilson MK. Overcoming Dissipation in theCalculation of Standard Binding Free Energies by Ligand Ex-traction. J Comput Chem. 2013; 34(27):2360–2371. doi:10.1002/jcc.23398.[67]

Limongelli V , Bonomi M, Parrinello M. Funnel Metadynamicsas Accurate Binding Free-Energy Method. Proc Natl Acad Sci.2013; 110(16):6358–6363. doi: 10.1073/pnas.1303186110.[68]

Wu D , Kofke DA. Phase-Space Overlap Measures. I. Fail-Safe Bias Detection in Free Energies Calculated by Molec-ular Simulation. J Chem Phys. 2005; 123(5):054103. doi:10.1063/1.1992483. [69]

Wu D , Kofke DA. Phase-Space Overlap Measures. II. De-sign and Implementation of Staging Methods for Free-EnergyCalculations. J Chem Phys. 2005; 123(8):084109. doi:10.1063/1.2011391.[70]

Shirts MR , Bair E, Hooker G, Pande VS. Equilibrium Free En-ergies from Nonequilibrium Measurements Using Maximum-Likelihood Methods. Phys Rev Lett. 2003; 91(14):140601. doi:10.1103/PhysRevLett.91.140601.[71]

Harder E , Damm W, Maple J, Wu C, Reboul M, Xiang JY, Wang L,Lupyan D, Dahlgren MK, Knight JL, Kaus JW, Cerutti DS, KrilovG, Jorgensen WL, Abel R, Friesner RA. OPLS3: A Force FieldProviding Broad Coverage of Drug-like Small Molecules andProteins. J Chem Theory Comput. 2016; 12(1):281–296. doi:10.1021/acs.jctc.5b00864.[72]

Keränen H , Pérez-Benito L, Ciordia M, Delgado F, Stein-brecher TB, Oehlrich D, van Vlijmen HWT, Trabanco AA,Tresadern G. Acylguanidine Beta Secretase 1 Inhibitors:A Combined Experimental and Free Energy PerturbationStudy. J Chem Theory Comput. 2017; 13(3):1439–1453. doi:10.1021/acs.jctc.6b01141.[73]

Rizzi A , Jensen T, Slochower DR, Aldeghi M, Gapsys V, Ntek-oumes D, Bosisio S, Papadourakis M, Henriksen NM, De GrootBL, et al. The SAMPL6 SAMPLing challenge: Assessing the relia-bility and eﬃciency of binding free energy calculations. BioRxiv.2019; p. 795005.[74]

Schindler T , Bornmann W, Pellicena P, Miller WT, ClarksonB, Kuriyan J. Structural Mechanism for STI-571 Inhibition ofAbelson Tyrosine Kinase. Science. 2000; 289(5486):1938–1942.doi: 10.1126/science.289.5486.1938.[75]

Mobley DL , Klimovich PV. Perspective: Alchemical Free EnergyCalculations for Drug Discovery. J Chem Phys. 2012; 137(23).doi: 10.1063/1.4769292.[76]

Kuhn B , Tichý M, Wang L, Robinson S, Martin RE, KuglstatterA, Benz J, Giroud M, Schirmeister T, Abel R, Diederich F, HertJ. Prospective Evaluation of Free Energy Calculations for thePrioritization of Cathepsin L Inhibitors. J Med Chem. 2017;60(6):2485–2497. doi: 10.1021/acs.jmedchem.6b01881.[77]

Pérez-Benito L , Keränen H, van Vlijmen H, Tresadern G. Pre-dicting Binding Free Energies of PDE2 Inhibitors. The Diﬃcul-ties of Protein Conformation. Sci Rep. 2018; 8(1):1–10. doi:10.1038/s41598-018-23039-5.[78]

Warren GL , Do TD, Kelley BP, Nicholls A, Warren SD. Es-sential Considerations for Using Protein–Ligand Structures inDrug Discovery. Drug Discov. 2012; 17(23):1270–1281. doi:10.1016/j.drudis.2012.06.011.[79]

Pérez-Benito L , Casajuana-Martin N, Jiménez-Rosés M, vanVlijmen H, Tresadern G. Predicting Activity Cliﬀs with Free-Energy Perturbation. J Chem Theory Comput. 2019; 15(3):1884–1895. doi: 10.1021/acs.jctc.8b01290.[80]

Loeﬄer HH , Michel J, Woods C. FESetup: Automating Setupfor Alchemical Free Energy Simulations. J Chem Inf Model.2015; 55(12):2485–2490. doi: 10.1021/acs.jcim.5b00368.42 of 48

LiveCoMS Best Practices Guide [81]

Gapsys V , Michielssens S, Seeliger D, de Groot BL. Pmx: Auto-mated Protein Structure and Topology Generation for Alchemi-cal Perturbations. J Comput Chem. 2015; 36(5):348–354. doi:10.1002/jcc.23804.[82]

Jespers W , Esguerra M, Åqvist J, Gutiérrez-de-Terán H. QligFEP:An Automated Workﬂow for Small Molecule Free Energy Cal-culations in Q. J Cheminformatics. 2019; 11(1):26. doi:10.1186/s13321-019-0348-5.[83]

Konze KD , Bos PH, Dahlgren MK, Leswing K, Tubert-BrohmanI, Bortolato A, Robbason B, Abel R, Bhat S. Reaction-BasedEnumeration, Active Learning, and Free Energy CalculationsTo Rapidly Explore Synthetically Tractable Chemical Spaceand Optimize Potency of Cyclin-Dependent Kinase 2 In-hibitors. J Chem Inf Model. 2019; 59(9):3782–3793. doi:10.1021/acs.jcim.9b00367.[84]

Yang Q , Burchett W, Steeno GS, Liu S, Yang M, Mobley DL, HouX. Optimal Designs for Pairwise Calculation: An Application toFree Energy Perturbation in Minimizing Prediction Variability. JComput Chem. 2020; 41(3):247–257. doi: 10.1002/jcc.26095.[85]

Xu H . Optimal Measurement Network of Pairwise Diﬀer-ences. J Chem Inf Model. 2019; 59(11):4720–4728. doi:10.1021/acs.jcim.9b00528.[86]

Ciordia M , Pérez-Benito L, Delgado F, Trabanco AA, TresadernG. Application of Free Energy Perturbation for the Design ofBACE1 Inhibitors. J Chem Inf Model. 2016; 56(9):1856–1871.doi: 10.1021/acs.jcim.6b00220.[87]

Wang L , Deng Y, Knight JL, Wu Y, Kim B, Sherman W, Shelley JC,Lin T, Abel R. Modeling Local Structural Rearrangements UsingFEP/REST: Application to Relative Binding Aﬃnity Predictions ofCDK2 Inhibitors. J Chem Theory Comput. 2013; 9(2):1282–1293.doi: 10.1021/ct300911a.[88]

Abel R

Cappel D , Hall ML, Lenselink EB, Beuming T, Qi J, Bradner J,Sherman W. Relative Binding Free Energy Calculations Ap-plied to Protein Homology Models. J Chem Inf Model. 2016;56(12):2388–2400. doi: 10.1021/acs.jcim.6b00362.[90]

Deﬂorian F , Perez-Benito L, Lenselink EB, Congreve M, vanVlijmen HWT, Mason JS, de Graaf C, Tresadern G. Accurate Pre-diction of GPCR Ligand Binding Aﬃnity with Free Energy Pertur-bation. J Chem Inf Model. 2020; doi: 10.1021/acs.jcim.0c00449.[91]

Lenselink EB , Louvel J, Forti AF, van Veldhoven JPD, de VriesH, Mulder-Krieger T, McRobb FM, Negri A, Goose J, Abel R,van Vlijmen HWT, Wang L, Harder E, Sherman W, IJzerman AP,Beuming T. Predicting Binding Aﬃnities for GPCR Ligands UsingFree-Energy Perturbation. ACS Omega. 2016; 1(2):293–304. doi:10.1021/acsomega.6b00086.[92]

Chen W , Deng Y, Russell E, Wu Y, Abel R, Wang L. AccurateCalculation of Relative Binding Free Energies between Ligandswith Diﬀerent Net Charges. J Chem Theory Comput. 2018;14(12):6346–6358. doi: 10.1021/acs.jctc.8b00825. [93]

Wang L , Deng Y, Wu Y, Kim B, LeBard DN, Wandschneider D,Beachy M, Friesner RA, Abel R. Accurate Modeling of ScaﬀoldHopping Transformations in Drug Discovery. J Chem TheoryComput. 2017; 13(1):42–54. doi: 10.1021/acs.jctc.6b00991.[94]

Albanese SK , Chodera JD, Volkamer A, Keng S, Abel R, WangL. Is Structure Based Drug Design Ready for SelectivityOptimization? bioRxiv. 2020; p. 2020.07.02.185132. doi:10.1101/2020.07.02.185132.[95]

Mondal S , Tresadern G, Greenwood J, Kim B, Kaus J, Wirtala M,Steinbrecher T, Wang L, Masse C, Farid R, Abel R. A Free EnergyPerturbation Approach to Estimate the Intrinsic Solubilitiesof Drug-like Small Molecules. . 2019; doi: 10.26434/chem-rxiv.10263077.v1.[96]

Berman H , Henrick K, Nakamura H. Announcing the World-wide Protein Data Bank. Nat Struct Mol Biol. 2003; 10(12):980–980. doi: 10.1038/nsb1203-980.[97]

Wacker D , Fenalti G, Brown MA, Katritch V, Abagyan R, Chere-zov V, Stevens RC. Conserved Binding Mode of Human B Brandt T , Holzmann N, Muley L, Khayat M, Wegscheid-GerlachC, Baum B, Heine A, Hangauer D, Klebe G. Congeneric but StillDistinct: How Closely Related Trypsin Ligands Exhibit DiﬀerentThermodynamic and Structural Properties. J Mol Biol. 2011;405(5):1170–1187. doi: 10.1016/j.jmb.2010.11.038.[99]

Nazaré M , Will DW, Matter H, Schreuder H, Ritter K, UrmannM, Essrich M, Bauer A, Wagner M, Czech J, Lorenz M, LauxV, Wehner V. Probing the Subpockets of Factor Xa RevealsTwo Binding Modes for Inhibitors Based on a 2-CarboxyindoleScaﬀold: A Study Combining Structure-Activity Relationshipand X-Ray Crystallography. J Med Chem. 2005; 48(14):4511–4525. doi: 10.1021/jm0490540.[100]

Kaus JW , Harder E, Lin T, Abel R, McCammon JA, Wang L. HowTo Deal with Multiple Binding Poses in Alchemical RelativeProtein-Ligand Binding Free Energy Calculations. J Chem The-ory Comput. 2015; 11:2670.[101]

Michel J , Essex JW. Prediction of Protein–Ligand Binding Aﬃn-ity by Free Energy Simulations: Assumptions, Pitfalls and Ex-pectations. J Comput Aided Mol Des. 2010; 24(8):639–658. doi:10.1007/s10822-010-9363-3.[102]

Wang L , Berne BJ, Friesner RA. Ligand Binding to Protein-Binding Pockets with Wet and Dry Regions. Proc Natl Acad Sci.2011; 108(4):1326–1330. doi: 10.1073/pnas.1016793108.[103]

Bruce Macdonald HE , Cave-Ayland C, Ross GA, Essex JW. Lig-and Binding Free Energies with Adaptive Water Networks:Two-Dimensional Grand Canonical Alchemical Perturbations.J Chem Theory Comput. 2018; 14(12):6586–6597. doi:10.1021/acs.jctc.8b00614.[104]

Ross GA , Bodnarchuk MS, Essex JW. Water Sites, Networks, AndFree Energies with Grand Canonical Monte Carlo. J Am ChemSoc. 2015; 137(47):14930–14943. doi: 10.1021/jacs.5b07940.43 of 48

LiveCoMS Best Practices Guide [105]

Michel J , Tirado-Rives J, Jorgensen WL. Energetics of DisplacingWater Molecules from Protein Binding Sites: Consequencesfor Ligand Optimization. J Am Chem Soc. 2009; 131(42):15403–15411. doi: 10.1021/ja906058w.[106]

Anandakrishnan R , Aguilar B, Onufriev AV. H++ 3.0: Automat-ing pK Prediction and the Preparation of Biomolecular Struc-tures for Atomistic Molecular Modeling and Simulations. Nu-cleic Acids Res. 2012; 40(Web Server issue):W537–W541. doi:10.1093/nar/gks375.[107]

Søndergaard CR , Olsson MHM, Rostkowski M, Jensen JH. Im-proved Treatment of Ligands and Coupling Eﬀects in EmpiricalCalculation and Rationalization of pKa Values. J Chem TheoryComput. 2011; 7(7):2284–2295. doi: 10.1021/ct200133y.[108]

Jurrus E , Engel D, Star K, Monson K, Brandi J, Felberg LE,Brookes DH, Wilson L, Chen J, Liles K, Chun M, Li P, GoharaDW, Dolinsky T, Konecny R, Koes DR, Nielsen JE, Head-GordonT, Geng W, Krasny R, et al. Improvements to the APBS Biomolec-ular Solvation Software Suite. Protein Sci. 2018; 27(1):112–128.doi: 10.1002/pro.3280.[109] Schrödinger Release 2020-2: Maestro,; 2020. Schrödinger, LLC,New York, NY,.[110]

Olsson MHM , Søndergaard CR, Rostkowski M, Jensen JH.PROPKA3: Consistent Treatment of Internal and SurfaceResidues in Empirical pKa Predictions. J Chem Theory Comput.2011; 7(2):525–537. doi: 10.1021/ct100578z.[111]

Işık M , Levorse D, Rustenburg AS, Ndukwe IE, Wang H, WangX, Reibarkh M, Martin GE, Makarov AA, Mobley DL, Rhodes T,Chodera JD. pKa Measurements for the SAMPL6 PredictionChallenge for a Set of Kinase Inhibitor-like Fragments. J ComputAided Mol Des. 2018; 32(10):1117–1138. doi: 10.1007/s10822-018-0168-0.[112]

Onufriev AV , Alexov E. Protonation and pK Changes in Protein-Ligand Binding. Q Rev Biophys. 2013; 46(2):181–209. doi:10.1017/S0033583513000024.[113]

Evoli S , Mobley DL, Guzzi R, Rizzuti B. Multiple Binding Modesof Ibuprofen in Human Serum Albumin Identiﬁed by AbsoluteBinding Free Energy Calculations. Phys Chem Chem Phys. 2016;18(47):32358–32368. doi: 10.1039/C6CP05680F.[114]

Ruiz-Carmona S , Alvarez-Garcia D, Foloppe N, Garmendia-Doval AB, Juhos S, Schmidtke P, Barril X, Hubbard RE, Morley SD.rDock: A Fast, Versatile and Open Source Program for DockingLigands to Proteins and Nucleic Acids. PLoS Comput Biol. 2014;10(4):e1003571. doi: 10.1371/journal.pcbi.1003571.[115]

Trott O , Olson A J. AutoDock Vina: Improving the Speed andAccuracy of Docking with a New Scoring Function, Eﬃcient Opti-mization, and Multithreading. J Comput Chem. 2010; 31(2):455–461. doi: 10.1002/jcc.21334.[116]

Friesner RA , Banks JL, Murphy RB, Halgren TA, Klicic JJ, MainzDT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Fran-cis P, Shenkin PS. Glide: A New Approach for Rapid, Ac-curate Docking and Scoring. 1. Method and Assessment ofDocking Accuracy. J Med Chem. 2004; 47(7):1739–1749. doi:10.1021/jm0306430. [117]

O’Boyle NM , Banck M, James CA, Morley C, VandermeerschT, Hutchison GR. Open Babel: An Open Chemical Toolbox. JCheminformatics. 2011; 3(1):33. doi: 10.1186/1758-2946-3-33.[118]

Klimovich PV , Mobley DL. Predicting Hydration Free EnergiesUsing All-Atom Molecular Dynamics Simulations and Multi-ple Starting Conformations. J Comput Aided Mol Des. 2010;24(4):307–316. doi: 10.1007/s10822-010-9343-7.[119]

Lim VT , Bayly CI, Fusti-Molnar L, Mobley DL. Assessingthe Conformational Equilibrium of Carboxylic Acid via Quan-tum Mechanical and Molecular Dynamics Studies on AceticAcid. J Chem Inf Model. 2019; 59(5):1957–1964. doi:10.1021/acs.jcim.8b00835.[120]

Mobley DL , Liu S, Lim NM, Wymer KL, Perryman AL, Forli S,Deng N, Su J, Branson K, Olson A J. Blind Prediction of HIVIntegrase Binding from the SAMPL4 Challenge. J Comput AidedMol Des. 2014; 28(4):327–345. doi: 10.1007/s10822-014-9723-5.[121]

Jiang W , Chipot C, Roux B. Computing Relative Bind-ing Aﬃnity of Ligands to Receptor: An Eﬀective HybridSingle-Dual-Topology Free-Energy Perturbation Approach inNAMD. J Chem Inf Model. 2019; 59(9):3794–3802. doi:10.1021/acs.jcim.9b00362.[122]

Rocklin GJ , Mobley DL, Dill KA. Separated Topologies—AMethod for Relative Binding Free Energy Calculations UsingOrientational Restraints. J Chem Phys. 2013; 138(8):085104.doi: 10.1063/1.4792251.[123]

Liu S , Wu Y, Lin T, Abel R, Redmann JP, Summa CM, Jaber VR, LimNM, Mobley DL. Lead Optimization Mapper: Automating FreeEnergy Calculations for Lead Optimization. J Comput Aided MolDes. 2013; 27(9):755–770. doi: 10.1007/s10822-013-9678-y.[124]

RDkit

Kawabata T , Nakamura H. 3D Flexible Alignment Using 2DMaximum Common Substructure: Dependence of PredictionAccuracy on Target-Reference Chemical Similarity. J Chem InfModel. 2014; 54(7):1850–1863. doi: 10.1021/ci500006d.[126]

Raymond JW , Willett P. Maximum Common Subgraph Iso-morphism Algorithms for the Matching of Chemical Struc-tures. J Comput Aided Mol Des. 2002; 16(7):521–533. doi:10.1023/A:1021271615909.[127]

Klabunde T , Giegerich C, Evers A. MARS: Computing Three-Dimensional Alignments for Multiple Ligands Using PairwiseSimilarities. J Chem Inf Model. 2012; 52(8):2022–2030. doi:10.1021/ci3000369.[128]

Jones G , Gao Y, Sage CR. Elucidating Molecular Overlays fromPairwise Alignments Using a Genetic Algorithm. J Chem InfModel. 2009; 49(7):1847–1855. doi: 10.1021/ci900109n.[129]

Liu S , Wang L, Mobley DL. Is Ring Breaking Feasible in RelativeBinding Free Energy Calculations? J Chem Inf Model. 2015;55(4):727–735. doi: 10.1021/acs.jcim.5b00057.44 of 48

LiveCoMS Best Practices Guide [130]

Clark A J , Negron C, Hauser K, Sun M, Wang L, Abel R, FriesnerRA. Relative Binding Aﬃnity Prediction of Charge-Changing Se-quence Mutations with FEP in Protein–Protein Interfaces. J MolBiol. 2019; 431(7):1481–1493. doi: 10.1016/j.jmb.2019.02.003.[131]

Kräutler V , van Gunsteren WF, Hünenberger PH. A Fast SHAKEAlgorithm to Solve Distance Constraint Equations for SmallMolecules in Molecular Dynamics Simulations. Journal of Com-putational Chemistry. 2001; 22(5):501–508. doi: 10.1002/1096-987X(20010415)22:5<501::AID-JCC1021>3.0.CO;2-V.[132]

Mobley DL , Chodera JD, Dill KA. On the Use of OrientationalRestraints and Symmetry Corrections in Alchemical Free EnergyCalculations. J Chem Phys. 2006; 125:084902.[133]

Chen J , Brooks CL. Can Molecular Dynamics SimulationsProvide High-Resolution Reﬁnement of Protein Structure?Proteins: Struct, Funct, Bioinf. 2007; 67(4):922–930. doi:10.1002/prot.21345.[134]

Boresch S , Tettinger F, Leitgeb M, Karplus M. Absolute Bind-ing Free Energies: A Quantitative Approach for Their Cal-culation. J Phys Chem B. 2003; 107(35):9535–9551. doi:10.1021/jp0217839.[135]

Leitgeb M , Schröder C, Boresch S. Alchemical Free EnergyCalculations and Multiple Conformational Substates. J ChemPhys. 2005; 122(8):084109. doi: 10.1063/1.1850900.[136]

Wang K , Chodera JD, Yang Y, Shirts MR. Identifying LigandBinding Sites and Poses Using GPU-Accelerated HamiltonianReplica Exchange Molecular Dynamics. J Comput Aided MolDes. 2013; 27(12):989–1007. doi: 10.1007/s10822-013-9689-8.[137]

Georgiou C , McNae I, Wear M, Ioannidis H, Michel J, Walkin-shaw M. Pushing the Limits of Detection of Weak BindingUsing Fragment-Based Drug Discovery: Identiﬁcation of NewCyclophilin Binders. J Mol Biol. 2017; 429(16):2556–2570. doi:10.1016/j.jmb.2017.06.016.[138]

Sugita Y , Kitao A, Okamoto Y. Multidimensional Replica-Exchange Method for Free-Energy Calculations. J Chem Phys.2000; 113(15):6042–6051. doi: 10.1063/1.1308516.[139]

Chodera JD , Shirts MR. Replica Exchange and Expanded En-semble Simulations as Gibbs Sampling: Simple Improvementsfor Enhanced Mixing. J Chem Phys. 2011; 135(19):194110. doi:10.1063/1.3660669.[140]

Lyubartsev AP , Martsinovski AA, Shevkunov SV, Vorontsov-Velyaminov PN. New Approach to Monte Carlo Calculation ofthe Free Energy: Method of Expanded Ensembles. J Chem Phys.1992; 96(3):1776–1783. doi: 10.1063/1.462133.[141]

Li H , Fajer M, Yang W. Simulated Scaling Method for LocalizedEnhanced Sampling and Simultaneous “Alchemical” Free En-ergy Simulations: A General Method for Molecular Mechanical,Quantum Mechanical, and Quantum Mechanical/MolecularMechanical Simulations. J Chem Phys. 2007; 126(2):024106.doi: 10.1063/1.2424700.[142]

Lin YL , Aleksandrov A, Simonson T, Roux B. An Overview ofElectrostatic Free Energy Computations for Solutions and Pro-teins. J Chem Theory Comput. 2014; 10(7):2690–2709. doi:10.1021/ct500195p. [143]

Öhlknecht C , Lier B, Petrov D, Fuchs J, Oostenbrink C. Correct-ing Electrostatic Artifacts Due to Net-Charge Changes in theCalculation of Ligand Binding Free Energies. J Comput Chem.2020; 41(10):986–999. doi: 10.1002/jcc.26143.[144]

Rocklin GJ , Mobley DL, Dill KA, Hünenberger PH. Calculatingthe Binding Free Energies of Charged Species Based on Explicit-Solvent Simulations Employing Lattice-Sum Methods: An Accu-rate Correction Scheme for Electrostatic Finite-Size Eﬀects. JChem Phys. 2013; 139(18):184103. doi: 10.1063/1.4826261.[145]

Mey ASJS , Jiménez JJ, Michel J. Impact of Domain Knowledgeon Blinded Predictions of Binding Energies by Alchemical FreeEnergy Calculations. J Comput Aided Mol Des. 2018; 32(1):199–210. doi: 10.1007/s10822-017-0083-9.[146]

Beutler TC , Mark AE, van Schaik RC, Gerber PR, van GunsterenWF. Avoiding Singularities and Numerical Instabilities in FreeEnergy Calculations Based on Molecular Simulations. ChemPhys Lett. 1994; 222:529–539.[147]

Beutler TC , van Gunsteren WF. Molecular Dynamics FreeEnergy Calculation in Four Dimensions. J Chem Phys. 1994;101(2):1417–1422.[148]

Gapsys V , Seeliger D, de Groot BL. New Soft-Core PotentialFunction for Molecular Dynamics Based Alchemical Free EnergyCalculations. J Chem Theory Comput. 2012; 8(7):2373–2382.doi: 10.1021/ct300220p.[149]

Naden LN , Shirts MR. Linear Basis Function Approach to Ef-ﬁcient Alchemical Free Energy Calculations. 2. Inserting andDeleting Particles with Coulombic Interactions. J Chem TheoryComput. 2015; 11(6):2536–2549. doi: 10.1021/ct501047e.[150]

Naden LN , Pham TT, Shirts MR. Linear Basis Function Ap-proach to Eﬃcient Alchemical Free Energy Calculations. 1. Re-moval of Uncharged Atomic Sites. J Chem Theory Comput.2014; 10(3):1128–1149. doi: 10.1021/ct4009188.[151]

Pham TT , Shirts MR. Identifying Low Variance Pathwaysfor Free Energy Calculations of Molecular Transformationsin Solution Phase. J Chem Phys. 2011; 135(3):034114. doi:10.1063/1.3607597.[152]

Zacharias M , Straatsma TP, McCammon JA. Separation-ShiftedScaling, a New Scaling Method for Lennard-Jones Interac-tions in Thermodynamic Integration. J Phys Chem. 1994;100(12):9025–9031.[153]

Blondel A . Ensemble Variance in Free Energy Calculations byThermodynamic Integration: Theory, Optimal Alchemical Path,and Practical Solutions. J Comput Chem. 2004; 25(7):985–993.[154]

Pham TT , Shirts MR. Optimal Pairwise and Non-Pairwise Al-chemical Pathways for Free Energy Calculations of Molecu-lar Transformation in Solution Phase. J Chem Phys. 2012;136(12):124120. doi: 10.1063/1.3697833.[155]

Donnini S , Mark AE, Juﬀer AH, Villa A. Molecular DynamicsFree Energy Calculation in Four Dimensions. J Comput Chem.2005; 26(2):115–122.45 of 48

LiveCoMS Best Practices Guide [156]

Steinbrecher T , Joung I, Case DA. Soft-Core Potentials in Ther-modynamic Integration: Comparing One- and Two-Step Trans-formations. J Comput Chem. 2011; 32(15):3253–3263. doi:10.1002/jcc.21909.[157]

Hermans J , Wang L. Inclusion of Loss of Translational andRotational Freedom in Theoretical Estimates of Free Energiesof Binding. Application to a Complex of Benzene and MutantT4 Lysozyme. J Am Chem Soc. 1997; 119(11):2707–2714. doi:10.1021/ja963568+.[158]

Mann G , Hermans J. Modeling Protein-Small Molecule Inter-actions: Structure and Thermodynamics of Noble Gases Bind-ing in a Cavity in Mutant Phage T4 Lysozyme L99A11Editedby B. Honig. J Mol Biol. 2000; 302(4):979–989. doi:10.1006/jmbi.2000.4064.[159]

Wang J , Deng Y, Roux B. Absolute Binding Free Energy Calcula-tions Using Molecular Dynamics Simulations with RestrainingPotentials. Biophysical Journal. 2006; 91(8):2798–2814. doi:10.1529/biophysj.106.084301.[160]

Fujitani H , Tanida Y, Ito M, Jayachandran G, Snow CD, ShirtsMR, Sorin EJ, Pande VS. Direct Calculation of the Binding FreeEnergies of FKBP Ligands. J Chem Phys. 2005; 123(8):084108.doi: 10.1063/1.1999637.[161]

Steinbrecher T , Mobley DL, Case DA. Nonlinear ScalingSchemes for Lennard-Jones Interactions in Free Energy Cal-culations. J Chem Phys. 2007; 127(21).[162]

Crooks GE . Measuring Thermodynamic Length. Phys Rev Lett.2007; 99(10):100602. doi: 10.1103/PhysRevLett.99.100602.[163]

Sivak DA , Crooks GE. Thermodynamic Metrics and OptimalPaths. Phys Rev Lett. 2012; 108(19):190602. doi: 10.1103/Phys-RevLett.108.190602.[164]

Shenfeld DK , Xu H, Eastwood MP, Dror RO, Shaw DE. Mini-mizing Thermodynamic Length to Select Intermediate Statesfor Free-Energy Calculations and Replica-Exchange Simula-tions. Phys Rev E. 2009; 80(4):046705. doi: 10.1103/Phys-RevE.80.046705.[165]

Hayes RL , Armacost KA, Vilseck JZ, Brooks CL. Adaptive Land-scape Flattening Accelerates Sampling of Alchemical Space inMultisite λ Dynamics. J Phys Chem B. 2017; 121(15):3626–3635.doi: 10.1021/acs.jpcb.6b09656.[166]

Monroe JI , Shirts MR. Converging Free Energies of Bindingin Cucurbit[7]Uril and Octa-Acid Host–Guest Systems fromSAMPL4 Using Expanded Ensemble Simulations. J ComputAided Mol Des. 2014; 28(4):401–415. doi: 10.1007/s10822-014-9716-4.[167]

Swendsen RH , Wang JS. Replica Monte Carlo Simulation ofSpin-Glasses. Phys Rev Lett. 1986; 57(21):2607–2609. doi:10.1103/PhysRevLett.57.2607.[168]

Sugita Y , Okamoto Y. Replica-Exchange Molecular DynamicsMethod for Protein Folding. Chem Phys Lett. 1999; 314(1):141–151. doi: 10.1016/S0009-2614(99)01123-9.[169]

Woods CJ , Essex JW, King MA. The Development of Replica-Exchange-Based Free-Energy Methods. J Phys Chem B. 2003;107(49):13703–13710. doi: 10.1021/jp0356620. [170]

Jiang W , Roux B. Free Energy Perturbation Hamiltonian Replica-Exchange Molecular Dynamics (FEP/H-REMD) for Absolute Lig-and Binding Free Energy Calculations. J Chem Theory Comput.2010; 6(9):2559–2565. doi: 10.1021/ct1001768.[171]

Marinari E , Parisi G. Simulated Tempering: A New MonteCarlo Scheme. Europhys Lett. 1992; 19(6):451–458. doi:10.1209/0295-5075/19/6/002.[172]

Tan Z . Optimally Adjusted Mixture Sampling and LocallyWeighted Histogram Analysis. J Comput Graph Stat. 2017;26(1):54–65. doi: 10.1080/10618600.2015.1113975.[173]

Rizzi A , Chodera J, Naden L, Beauchamp K, Grinaway P, Fass J,adw62, Rustenburg B, Ross GA, Krämer A, Macdonald HB, Swen-son DWH, Simmonett A, hb0402, ajsilveira, Choderalab/Open-mmtools: 0.19.0 - Multiple Alchemical Regions; 2019. doi:10.5281/zenodo.3532826. Zenodo.[174]

Plount Price ML , Jorgensen WL. Analysis of Binding Aﬃnitiesfor Celecoxib Analogues with COX-1 and COX-2 from CombinedDocking and Monte Carlo Simulations and Insight into the COX-2/COX-1 Selectivity. J Am Chem Soc. 2000; 122(39):9455–9466.doi: 10.1021/ja001018c.[175]

Mobley DL , Dill KA. Binding of Small-Molecule Ligands to Pro-teins: “What You See” Is Not Always “What You Get”. Structure.2009; 17(4):489–498. doi: 10.1016/j.str.2009.02.010.[176]

Calabrò G , Woods CJ, Powlesland F, Mey ASJS, Mulholland A J,Michel J. Elucidation of Nonadditive Eﬀects in Protein–LigandBinding Energies: Thrombin as a Case Study. J Phys Chem B.2016; 120(24):5340–5350. doi: 10.1021/acs.jpcb.6b03296.[177]

Steinbrecher TB , Dahlgren M, Cappel D, Lin T, Wang L, KrilovG, Abel R, Friesner R, Sherman W. Accurate Binding Free EnergyPredictions in Fragment Optimization. J Chem Inf Model. 2015;55(11):2411–2420. doi: 10.1021/acs.jcim.5b00538.[178]

Palma PN , Bonifácio MJ, Loureiro AI, Soares-da-Silva P. Compu-tation of the Binding Aﬃnities of Catechol-O-MethyltransferaseInhibitors: Multisubstate Relative Free Energy Calculations. JComput Chem. 2012; 33(9):970–986. doi: 10.1002/jcc.22926.[179]

Voet ARD , Kumar A, Berenger F, Zhang KYJ. Combining inSilico and in Cerebro Approaches for Virtual Screening andPose Prediction in SAMPL4. J Comput Aided Mol Des. 2014;28(4):363–373. doi: 10.1007/s10822-013-9702-2.[180]

Gallicchio E , Deng N, He P, Wickstrom L, Perryman AL, SantiagoDN, Forli S, Olson A J, Levy RM. Virtual Screening of IntegraseInhibitors by Large Scale Binding Free Energy Calculations: TheSAMPL4 Challenge. J Comput Aided Mol Des. 2014; 28(4):475–490. doi: 10.1007/s10822-014-9711-9.[181]

Rocklin GJ , Boyce SE, Fischer M, Fish I, Mobley DL, Shoichet BK,Dill KA. Blind Prediction of Charged Ligand Binding Aﬃnities ina Model Binding Site. J Mol Biol. 2013; 425(22):4569–4583. doi:10.1016/j.jmb.2013.07.030.[182]

Boyce SE , Mobley DL, Rocklin GJ, Graves AP, Dill KA, ShoichetBK. Predicting Ligand Binding Aﬃnity with Alchemical FreeEnergy Methods in a Polar Model Binding Site. J Mol Biol. 2009;394(4):747–763. doi: 10.1016/j.jmb.2009.09.049.46 of 48

LiveCoMS Best Practices Guide [183]

Lincoﬀ J , Sasmal S, Head-Gordon T. Comparing GeneralizedEnsemble Methods for Sampling of Systems with Many De-grees of Freedom. J Chem Phys. 2016; 145(17):174107. doi:10.1063/1.4965439.[184]

Sasmal S , Gill SC, Lim NM, Mobley DL. Sampling Con-formational Changes of Bound Ligands Using Nonequi-librium Candidate Monte Carlo and Molecular Dynamics.J Chem Theory Comput. 2020; 16(3):1854–1865. doi:10.1021/acs.jctc.9b01066.[185]

Shirts MR , Pande VS. Comparison of Eﬃciency and Bias ofFree Energies Computed by Exponential Averaging, the BennettAcceptance Ratio, and Thermodynamic Integration. J ChemPhys. 2005; 122:144107.[186]

Chodera JD . A Simple Method for Automated EquilibrationDetection in Molecular Simulations. J Chem Theory Comput.2016; 12(4):1799–1805. doi: 10.1021/acs.jctc.5b00784.[187]

Beauchamp K , Chodera J, Naden L, Shirts M, MartinianiS, Stern C, McGibbon RT, Gowers R, Dotson D, Choderal-ab/Pymbar: Critical Bugﬁx Release; 2019. doi: 10.5281/zen-odo.3559263. Zenodo.[188]

Dotson D , Beckstein O, Wille D, Kenney I, shuail, trje3733, LeeH, Lim V, brycestx, Barhaghi MS, Alchemistry/Alchemlyb: 0.3.0;2019. doi: 10.5281/zenodo.3361016. Zenodo.[189]

Nüske F , Wu H, Prinz JH, Wehmeyer C, Clementi C, NoéF. Markov state models from short non-equilibrium simula-tions—Analysis and correction of estimation bias. The Journalof Chemical Physics. 2017; 146(9):094104.[190]

Jarzynski C . Nonequilibrium Equality for Free Energy Diﬀer-ences. Phys Rev Lett. 1997; 78(14):2690–2693.[191]

Jarzynski C . Equilibrium Free Energies from NonequilibriumProcesses. Act Phys Pol B. 1998; 29(6):1609–1622.[192]

Crooks GE . Path-Ensemble Averages in Systems Driven Farfrom Equilibrium. Phys Rev E. 2000; 61(3):2361–2366.[193]

Maragakis P , Ritort F, Bustamante C, Karplus M, Crooks GE.Bayesian Estimates of Free Energies from NonequilibriumWork Data in the Presence of Instrument Noise. J Chem Phys.2008; 129(2):024102–8. doi: 10.1063/1.2937892.[194]

Oberhofer H , Dellago C, Geissler PL. Biased Sampling ofNonequilibrium Trajectories: Can Fast Switching SimulationsOutperform Conventional Free Energy Calculation Methods? JPhys Chem B. 2005; 109:6902–6915.[195]

Procacci P . Unbiased Free Energy Estimates in Fast Nonequi-librium Transformations Using Gaussian Mixtures. J ChemPhys. 2015; 142(15):154117. doi: 10.1063/1.4918558.[196]

Ytreberg FM , Zuckerman DM. Single-Ensemble Nonequilib-rium Path-Sampling Estimates of Free Energy Diﬀerences. JChem Phys. 2004; 120(23):10876–9.[197]

Lu ND , Singh JK, Kofke DA. Appropriate Methods to CombineForward and Reverse Free-Energy Perturbation Averages. JChem Phys. 2003; 118(7):2977–2984. [198]

Lelièvre T , Rousset M, Stoltz G. Free Energy Computations.IMPERIAL COLLEGE PRESS; 2010. doi: 10.1142/p579.[199]

Jarzynski C . Rare Events and the Convergence of ExponentiallyAveraged Work Values. Phys Rev E. 2006; 73:046105.[200]

Resat H , Mezei M. Studies on Free Energy Calculations. I. Ther-modynamic Integration Using a Polynomial Path. J Chem Phys.1993; 99(8):6052–6061.[201]

Jorge M , Garrido N, Queimada A, Economou I, Macedo E. Eﬀectof the Integration Method on the Accuracy and ComputationalEﬃciency of Free Energy Calculations Using ThermodynamicIntegration. J Chem Theory Comput. 2010; 6(4):1018–1027. doi:10.1021/ct900661c.[202]

Shyu C , Ytreberg FM. Reducing the Bias and Uncertainty ofFree Energy Estimates by Using Regression to Fit Thermody-namic Integration Data. J Comput Chem. 2009; 30(14):2297–2304. doi: 10.1002/jcc.21231.[203]

Paliwal H , Shirts MR. A Benchmark Test Set for AlchemicalFree Energy Transformations and Its Use to Quantify Error inCommon Free Energy Methods. J Chem Theory Comput. 2011;7(12):4115–4134. doi: 10.1021/ct2003995.[204]

Tan Z . On a Likelihood Approach for Monte Carlo Inte-gration. J Am Stat Soc. 2004; 99(468):1027–1036. doi:10.1198/016214504000001664.[205]

Shirts MR . Reweighting from the Mixture Distribution as aBetter Way to Describe the Multistate Bennett AcceptanceRatio. arXiv:170400891 [cond-mat]. 2017; .[206]

Mobley DL , Gilson MK. Predicting Binding Free Energies: Fron-tiers and Benchmarks. Annu Rev Biophys. 2017; 46(1):531–558.doi: 10.1146/annurev-biophys-070816-033654.[207]

Mobley DL , Heinzelmann G, Henriksen NM, Gilson MK. Pre-dicting Binding Free Energies: Frontiers and Benchmarks (aPerpetual Review). . 2017; .[208]

Dakka J , Farkas-Pall K, Turilli M, Wright DW, Coveney PV, Jha S.Concurrent and Adaptive Extreme Scale Binding Free EnergyCalculations. arXiv:180101174 [cs]. 2018; .[209]

Hahn DF , Hünenberger PH. Alchemical Free-Energy Calcu-lations by Multiple-Replica λ -Dynamics: The Conveyor BeltThermodynamic Integration Scheme. J Chem Theory Comput.2019; 15(4):2392–2419. doi: 10.1021/acs.jctc.8b00782.[210] Brown SP , Muchmore SW, Hajduk PJ. Healthy Skepticism:Assessing Realistic Model Performance. Drug Discov. 2009;14(7):420–427. doi: 10.1016/j.drudis.2009.01.012.[211] Drugdata/Metk; 2018. Drug Design Data Resource.[212]

Walters WP . What Are Our Models Really Telling Us?A Practical Tutorial on Avoiding Common Mistakes WhenBuilding Predictive Models. In:

Chemoinformatics for DrugDiscovery

John Wiley & Sons, Ltd; 2013.p. 1–31. doi:10.1002/9781118742785.ch1.[213]

Antonia M , Michellab/Freenrgworkﬂows; 2019. michellab.47 of 48

LiveCoMS Best Practices Guide [214]

Cui G , Graves AP, Manas ES. GRAM: A True Null Model forRelative Binding Aﬃnity Predictions. J Chem Inf Model. 2020;60(1):11–16. doi: 10.1021/acs.jcim.9b00939.[215]

Jain AN , Nicholls A. Recommendations for Evaluation of Com-putational Methods. J Comput Aided Mol Des. 2008; 22(3):133–139. doi: 10.1007/s10822-008-9196-5.[216]

Walter P , Some Thoughts on Evaluating Predictive Models;.[217]

Wagner V , Jantz L, Briem H, Sommer K, Rarey M, Christ CD.Computational Macrocyclization: From de Novo MacrocycleGeneration to Binding Aﬃnity Estimation. ChemMedChem.2017; 12(22):1866–1872. doi: 10.1002/cmdc.201700478.[218]

Gaieb Z , Liu S, Gathiaka S, Chiu M, Yang H, Shao C, Feher VA,Walters WP, Kuhn B, Rudolph MG, Burley SK, Gilson MK, AmaroRE. D3R Grand Challenge 2: Blind Prediction of Protein–LigandPoses, Aﬃnity Rankings, and Relative Binding Free Energies. JComput Aided Mol Des. 2018; 32(1):1–20. doi: 10.1007/s10822-017-0088-4.[219]

Gaieb Z , Parks CD, Chiu M, Yang H, Shao C, Walters WP, Lam-bert MH, Nevins N, Bembenek SD, Ameriks MK, Mirzadegan T,Burley SK, Amaro RE, Gilson MK. D3R Grand Challenge 3: BlindPrediction of Protein–Ligand Poses and Aﬃnity Rankings. JComput Aided Mol Des. 2019; 33(1):1–18. doi: 10.1007/s10822-018-0180-4.[220]

Salomon-Ferrer R , Case DA, Walker RC. An Overview of theAmber Biomolecular Simulation Package. WIREs Comput MolSci. 2013; 3(2):198–210. doi: 10.1002/wcms.1121.[221]

Bonomi M , Bussi G, Camilloni C, Tribello GA, Banáš P, BarducciA, Bernetti M, Bolhuis PG, Bottaro S, Branduardi D, Capelli R,Carloni P, Ceriotti M, Cesari A, Chen H, Chen W, Colizzi F, DeS, De La Pierre M, Donadio D, et al. Promoting Transparencyand Reproducibility in Enhanced Molecular Simulations. NatMethods. 2019; 16(8):670–673. doi: 10.1038/s41592-019-0506-8.[222]

Lemkul J . From Proteins to Perturbed Hamiltonians: A Suite ofTutorials for the GROMACS-2018 Molecular Simulation Package[Article v1.0]. LiveCoMS. 2018; 1(1):5068–. doi: 10.33011/live-coms.1.1.5068.[223]

Abraham MJ , Murtola T, Schulz R, Páll S, Smith JC, HessB, Lindahl E. GROMACS: High Performance MolecularSimulations through Multi-Level Parallelism from Laptopsto Supercomputers. SoftwareX. 2015; 1-2:19–25. doi:10.1016/j.softx.2015.06.001.[224]

Åqvist Johan MJ Kamerlin Shina Caroline Lynn Bauer Paul ,Q6: A Comprehensive Toolkit for Empirical Valence Bond andRelated Free Energy Calculations; 2017. doi: 10.5281/zen-odo.1002739. Zenodo.[225]

Jo S , Kim T, Iyer VG, Im W. CHARMM-GUI: A Web-BasedGraphical User Interface for CHARMM. J Comput Chem. 2008;29(11):1859–1865. doi: 10.1002/jcc.20945.[226]

Suruzhon M , Senapathi T, Bodnarchuk MS, Viner R, Wall ID,Barnett CB, Naidoo KJ, Essex JW. ProtoCaller: Robust Automa-tion of Binding Free Energy Calculations. J Chem Inf Model.2020; 60(4):1917–1921. doi: 10.1021/acs.jcim.9b01158. [227]

Dotson D , Beckstein O, Wille D, Kenney I, shuail, trje3733, LeeH, Lim V, Allen B, Barhaghi MS, Alchemistry/Alchemlyb: 0.3.1;2020. doi: 10.5281/zenodo.3610564. Zenodo.[228]

Williams-Noonan BJ

Gathiaka S , Liu S, Chiu M, Yang H, Stuckey JA, Kang YN, Delpro-posto J, Kubish G, Dunbar JB, Carlson HA, Burley SK, WaltersWP, Amaro RE, Feher VA, Gilson MK. D3R Grand Challenge2015: Evaluation of Protein–Ligand Pose and Aﬃnity Predic-tions. J Comput Aided Mol Des. 2016; 30(9):651–668. doi:10.1007/s10822-016-9946-8.[232] SAMPL Challenges;. Accessed: 2019-08-01. https://samplchallenges.github.io/.[233]

Rizzi A , Murkli S, McNeill JN, Yao W, Sullivan M, Gilson MK,Chiu MW, Isaacs L, Gibb BC, Mobley DL, Chodera JD. Overviewof the SAMPL6 Host–Guest Binding Aﬃnity Prediction Chal-lenge. J Comput Aided Mol Des. 2018; 32(10):937–963. doi:10.1007/s10822-018-0170-6.[234]

Yin J , Henriksen NM, Slochower DR, Shirts MR, Chiu MW, Mob-ley DL, Gilson MK. Overview of the SAMPL5 Host–Guest Chal-lenge: Are We Doing Better? J Comput Aided Mol Des. 2017;31(1):1–19. doi: 10.1007/s10822-016-9974-4.[235]

Muddana HS , Fenley AT, Mobley DL, Gilson MK. The SAMPL4Host–Guest Blind Prediction Challenge: An Overview. J ComputAided Mol Des. 2014; 28(4):305–317. doi: 10.1007/s10822-014-9735-1.[236]

Roos K , Wu C, Damm W, Reboul M, Stevenson JM, Lu C,Dahlgren MK, Mondal S, Chen W, Wang L, Abel R, FriesnerRA, Harder ED. OPLS3e: Extending Force Field Coverage forDrug-Like Small Molecules. J Chem Theory Comput. 2019;15(3):1863–1874. doi: 10.1021/acs.jctc.8b01026.[237]

Yu HS , Deng Y, Wu Y, Sindhikara D, Rask AR, Kimura T,Abel R, Wang L. Accurate and Reliable Prediction of theBinding Aﬃnities of Macrocycles to Their Protein Targets.J Chem Theory Comput. 2017; 13(12):6290–6300. doi:10.1021/acs.jctc.7b00885.[238]