[PDF] Ligand-protein interactions in lysozyme investigated through a dual-resolution model

Abstract

A fully atomistic modelling of biological macromolecules at relevant length- and time-scales is often cumbersome or not even desirable, both in terms of computational effort required and it a posteriori analysis. This difficulty can be overcome with the use of multi-resolution models, in which different regions of the same system are concurrently described at different levels of detail. In enzymes, computationally expensive atomistic detail is crucial in the modelling of the active site in order to capture e.g. the chemically subtle process of ligand binding. In contrast, important yet more collective properties of the remainder of the protein can be reproduced with a coarser description. In the present work, we demonstrate the effectiveness of this approach through the calculation of the binding free energy of hen egg white lysozyme (HEWL) with the inhibitor di-N-acetylchitotriose. Particular attention is posed to the impact of the mapping, i.e. the selection of atomistic and coarse-grained residues, on the binding free energy. It is shown that, in spite of small variations of the binding free energy with respect to the active site resolution, the separate contributions coming from different energetic terms (such as electrostatic and van der Waals interactions) manifest a stronger dependence on the mapping, thus pointing to the existence of an optimal level of intermediate resolution.

Full PDF

LLigand-protein interactions in lysozyme investigated through a dual-resolution model

Raﬀaele Fiorentini, Kurt Kremer, and Raﬀaello Potestio

2, 3, ∗ Max Planck Institute for Polymer Research, Mainz, Germany Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy (Dated: February 19, 2020)A fully atomistic modelling of biological macromolecules at relevant length- and time-scales is oftencumbersome or not even desirable, both in terms of computational eﬀort required and a posteriori analysis. This diﬃculty can be overcome with the use of multi-resolution models, in which diﬀerentregions of the same system are concurrently described at diﬀerent levels of detail. In enzymes,computationally expensive atomistic detail is crucial in the modelling of the active site in orderto capture e.g. the chemically subtle process of ligand binding. In contrast, important yet morecollective properties of the remainder of the protein can be reproduced with a coarser description.In the present work, we demonstrate the eﬀectiveness of this approach through the calculation of thebinding free energy of hen egg white lysozyme (HEWL) with the inhibitor di-N-acetylchitotriose.Particular attention is posed to the impact of the mapping, i.e. the selection of atomistic and coarse-grained residues, on the binding free energy. It is shown that, in spite of small variations of thebinding free energy with respect to the active site resolution, the separate contributions coming fromdiﬀerent energetic terms (such as electrostatic and van der Waals interactions) manifest a strongerdependence on the mapping, thus pointing to the existence of an optimal level of intermediateresolution.

I. INTRODUCTION

One of the most relevant challenges of computationalbiochemistry and biophysics is the accurate calculationof binding free energies [1–3], which represents one of thekey steps in the identiﬁcation of pharmacological targetsas well as in the development of new drugs [4–6]. How-ever, the large sizes of the molecules under examination(often above the hundred of residues), as well as the ne-cessity to screen through large datasets of potential can-didate molecules, make this eﬀort onerous in terms oftime and computational resources.A promising way to mitigate these limitations is theuse of multiple-resolution models of the protein, that is,representations in which diﬀerent parts of the moleculeare concurrently described at diﬀerent levels of resolution[7–15]. The chemically relevant part of the protein, e.g.the active site, is modelled at level of detail, typicallyatomistic. For the remainder, on the contrary, a sim-pliﬁed representation is used, where several atoms arelumped together in eﬀective interaction sites. The work-ing hypothesis underlying these methods is that only arelatively small part of the molecule requires an explic-itly atomistic treatment; the remainder, in fact, is mainlyresponsible for large-scale, collective ﬂuctuations whosefunction-oriented role is well recognised and prominent[15–19], however also prone to be accurately reproducedby lower-resolution representations [20–25]. Hence, theresulting model favourably joins the accuracy of an atom-istic (AT) description where needed and the computa-tional eﬃciency of a coarse-grained (CG) one where pos-sible. ∗ raﬀ[email protected] In order to take full advantage of the dual-resolutionapproach to protein modelling, though, one has to solvea few key open issues: ﬁrst, the deﬁnition of the ap-propriate coarse-grained model to employ in the low-resolution part [25–33]; second, the coupling betweenhigh- and low-resolution models, which has to be per-formed so as to guarantee that the appropriate observ-ables are reproduced with respect to the reference pro-vided for example by a fully atomistic simulation. Thisissue entails a further one, namely the identiﬁcation ofthe correct observables apt to quantify the ﬁdelity withwhich the behaviour of the system is reproduced by thedual-resolution model; third, the selection of the subpartof the molecule that requires a high-resolution modelling.In the present work we will focus speciﬁcally on this thirdaspect.Various methods and approaches have been developedin the past few years to describe proteins in dual res-olution [7, 10–14]. In general, the high-resolution partis modelled at the all-atom level, making use of one ofthe several atomistic force ﬁelds available. The coarse-grained representations range from simple bead-springelastic networks [15, 20, 23] to more sophisticated G¯o-type models [11]. Recently, we have proposed a dual-resolution model [15] where, in the CG part, only the C α carbons of the protein chain are retained and connectedone with the other by harmonic bonds. This model hasbeen employed in the present work with the aim of as-sessing the accuracy of a hybrid atomistic/coarse-graineddescription of a protein for binding free energy calcula-tions. The system under examination is hen egg-whitelysozyme in explicit water, bound to a sugar substrate,di-N-acetylchitotriose. We carried out calculations of thebinding free energy of the ligand in the active site, with atwofold objective. In fact, not only we aimed at verifying a r X i v : . [ q - b i o . B M ] F e b that the computed quantity in the dual-resolution modelmatches a reference, all-atom calculation; but rather wealso investigated the impact of diﬀerent choices in thedeﬁnition of the high-resolution subdomain. This aspectbears the highest prominence, as it is becoming increas-ingly more evident that a crucial component in the con-struction of accurate and eﬀective low-resolution modelsfor biological and soft matter systems is represented bythe mapping [15, 32, 33], that is, the particular selectionof collective variables employed to describe the system.Here, we provide novel evidence of this general propertyin the context of a dual-resolution model of a biomolecule,and describe a transferable strategy to tackle this issue. II. METHODS

The system under examination in the present work ishen egg-white lysozyme (HEWL) in aqueous solution. Inthis model, the binding site of the enzyme and the sub-strate molecule, the inhibitor di-N-acetylchitotriose, arerepresented with atomistic detail. The protein model em-ployed is not adaptive, that is, the resolution of a givenresidue is ﬁxed –either atomistic or coarse-grained– anddoes not change throughout a simulation. However, atdiﬀerence with other works [8, 9, 11], several values ofthe number of protein residues treated at high resolutionhave been explored and employed in independent calcula-tions. The impact of choosing diﬀerent numbers of activesite residues to model at the atomistic level is a centralaspect of this study. The coarse-grained model employedto describe the low-resolution part of the protein is a sim-ple bead-spring representation where the selected sites(namely the C α atoms) are connected by elastic bondspenalising the deviations from the distances that inter-acting atoms have in the reference conformation. Twovalues of elastic constants employed, one for C α ’s alongthe chain, and one for all other bonds. Water moleculesare described in atomistic detail throughout the wholesimulation box: the interaction with the high-resolutionpart of the protein takes place through the standard all-atom force ﬁeld, while the interaction with the coarse-grained beads is mediated by a purely repulsive potentialacting on the sole oxygen atom.Hereafter we provide a detailed description of themodel. We ﬁrst discuss the calculation of the bindingfree energy ∆ G bind , then we outline the dual-resolutionmodel and its coupling to the atomistic part, and ﬁnallyreport information about the simulation setup. Furtherdetails are made available in the Supporting Information . A. Binding Free Energy calculation

One of the key points of this work is the calculationof the protein-ligand binding free energy ∆ G bind , whichquantiﬁes the aﬃnity of a molecule towards a protein [1–3]. As such, it plays a prominent role in the investigation of the biochemical function and activity of enzymes andsimilar biomolecules, and in the development of eﬀectivedrugs.∆ G bind is deﬁned as the diﬀerence between the freeenergy of the system in the conﬁguration in which theligand is bound to the active site ( G b ) and the corre-sponding value when the ligand is absent ( G ub ):∆ G bind = G b − G ub (1)This value, in the speciﬁc case under examination,varies according to the number of active site residuesmodelled with atomistic resolution, as we will see in Sect.III.The free energy diﬀerence between two states is herecomputed by means of thermodynamic integration (TI)[34]. Speciﬁcally, a scalar λ ∈ [0 ,

1] is deﬁned whichparametrises the potential energy of the system as U λ ( r ) = λU A ( r ) + (1 − λ ) U B ( r ) connecting the statesA and B. The sought quantity is given by:∆ G = (cid:90) (cid:28) ∂U ( λ ) ∂λ (cid:29) λ dλ (2)Since the free energy is a state function, the natureof the path is unimportant, and one can choose a ther-modynamic cycle that connects the bound and unboundstates through several intermediate ones, as illustratedin Fig.1. In particular, we can identify two main terms:the insertion of the ligand from vacuum to water ∆ G lig ,and the decoupling from the protein ∆ G compl . A furtherstep is the removal of the restraints that keep the ligandin proximity of the protein during the damping of theligand-protein interactions, ∆ G r off ; this latter calcula-tion can be carried out analytically without the need torun simulations. Hence, ∆ G bind is the algebraic sum ofthe previous three terms:∆ G bind = ∆ G compl + ∆ G lig + ∆ G r off (3)According to the previous deﬁnitions of each term, nei-ther ∆ G lig nor ∆ G r off changes with the protein reso-lution: indeed, the former corresponds to the solvationfree energy of the ligand, which is always treated at theatomistic level; likewise, the calculation of the restraintremoval free energy is analytic [3]. The unique term thatvaries depending on the number of active site residuesmodelled in high resolution is the free energy change ofthe protein-ligand complex between the bound state andthe state where the ligand is removed, that is, the varia-tion of ∆ G bind is equal to the variation of ∆ G compl .The alchemical change in the calculation of ∆ G compl is performed in three steps (in the following, the sub-scripts c and (cid:96) stand for complex and ligand, respec-tively). First, one adds a set of restraints between protein G r o↵ G lig G coul ,` G LJ ,` + G bind G co m p l G r on G coul,c = ++ FIG. 1: pictorial representation of thermodynamiccycle. Starting from the top-right corner of the ﬁgure,we decouple the ligand from the protein (∆ G compl ,which also includes a set of restraints between ligandand protein) and subsequently introduce it in water(∆ G lig ). A further step is the restraints removal(∆ G r off ) whose calculation is analytical.and ligand (∆ G r on ) in order to avoid the problem of theligand leaving the binding pocket when interactions arebeing removed. The presence of restraints is indicatedin the cycle scheme of Fig.1 with a red circle: it rep-resents the fact that the ligand is conﬁned in a certainvolume. For this work we use the set of restraints de-scribed by Boresch [3]. Second, Coulomb interactions areswitched oﬀ (∆ G coul,c ); third, the Lennard-Jones poten-tials modelling van der Waals interactions are removed(∆ G LJ,c ). Likewise, the alchemical change in the lig-and free energy ∆ G lig is performed in two steps: ﬁrstswitching on Coulomb interaction (∆ G coul,(cid:96) ), and thenLennard-Jones (∆ G LJ,(cid:96) ). The last contribution to thebinding free energy, ∆ G r off , derives from restraint re-moval: its calculation is analytical and therefore it doesnot require alchemical changes. These transformationsare summarised in Fig. 1 and Tab. I. Further details canbe found in the Supporting Information in the sectionrelative to the thermodynamic cycle.The calculation of ∆ G compl can be carried out in twodiﬀerent ways, namely decoupling and annihilation. De-coupling refers to turning oﬀ the interaction between themolecule and its environment, while maintaining the po-tentials among atoms constituting the molecule; annihi-lation, on the other hand, implies turning oﬀ the interac- TABLE I: Summary of the alchemical changes and theprotein resolution dependence for each contribute ofBinding free energy ∆ G bind . prot. res.alchemical changes dependence Δ G compl ∆ G coul,c + ∆ G LJ,c + ∆ G r on YES Δ G lig ∆ G coul ,(cid:96) + ∆ G LJ ,(cid:96) NO Δ G r oﬀ Analytical NO tion between the molecule and the environment as wellas the intramolecular interaction. Here we consider thevalues of ∆ G obtained through ligand decoupling, sincethis process is more intuitive with respect to annihilation;furthermore, the ligand is always treated at fully atom-istic detail, therefore it is not involved in the change offree energy while varying the protein resolution. In Tab.III and Fig. 6 (and with greater detail in the Support-ing Information , annihilation section) we provide datashowing that the values of binding free energy obtainedusing decoupling and annihilation are consistent withinthe error bars.

B. Dual-Resolution protein model

In this work the solvent is treated with all-atom de-tail, while the protein has a ﬁxed (i.e. position- andtime-independent) dual-resolution. The binding site ismodelled with atomistic resolution, whereas the rest ofthe protein is coarse-grained. To describe the lower-resolution part we employ an elastic network model(ENM) [15, 20], in which each residue is mapped ontoa bead whose position corresponds to the C α atom inthe atomistic description. These beads are connected byharmonic springs as shown in Fig. 2.The potential energy is given by: E = (cid:88) i (cid:88) j k ij (cid:0) r ij − r ij (cid:1) θ ( r c − r ij ) (4)with spring constants k ij , equilibrium distance r ij , a cut-oﬀ distance r c , i and j are the node index, and θ ( r ) isa Heaviside theta function taking value 1 if r > k b ) for consecutivebeads, represented in blue in Fig. 2; and a weaker spring k nb for not consecutive beads whose distance in the ref-erence (native) conformation lies below a ﬁxed cutoﬀ (ingreen).The ENM used here is parametrised to reproducethe conformational ﬂuctuations of the reference all-atommodel, these being quantiﬁed by the root mean squareﬂuctuations (RMSF) of the all C α atoms of the system[15]. The residues in direct contact (H-bonding or hy-drophobic contact) with the substrate are modelled withall-atom detail; in order to select the other binding siteFIG. 2: Visualisation of the dual-resolution protein [15].The residues included in atomistic detail are shown inred, blue, cyan and white (O, N, C and H atoms). Thegrey spheres are ENM nodes, the stiﬀ backbone springsare shown as dark blue lines and all others (weaker)springs are shown in green.residues to be described at the atomistic level, we sortedthem by increasing distance of their the center of massfrom the closest ligand atom.The water-CG protein interaction consists in a sim-ple excluded volume, modelled via a Weeks-Chandler-Anderson (WCA) potential [35]. The details about theprocedure followed to determine the ENM elastic con-stants and the excluded volume interaction are providedin the Supporting Information , while the numerical val-ues of the resulting parameters are reported hereafter.

C. Simulation details

The reference model is given by the 2 ns equili-brated PDB structure 1HEW in the NPT ensemble (theParrinello-Rahman barostat [36] with a time constant of2.0 ps and 1 bar was used). Both fully atomistic anddual-resolution models of HEWL are solvated in waterand placed in a cubic simulation box of 7.06 nm side.The force ﬁeld employed is Amber99SB [37], whereas thewater model is TIP3P [38]. The inhibitor, which wasalways atomistic, had GLYCAM forceﬁeld parametersconsistent with Amber99SB [39]. The TI binding freeenergy calculation consists of 3 diﬀerent steps: ∆ G compl ,∆ G r off , ∆ G lig :1. The protein-ligand complex free energy ( Δ G compl )calculation uses 11 λ values per ∆ G restr on,c , 5evenly spaced λ values per ∆ G LJ,c (with separa-tion 0.20) and 15 λ values per ∆ G coul,c , with 600ps of simulation per λ in the fully atomistic case,and 4000 ps in the dual-resolution case to improvethe statistics. 2. The restraint removal free energy ( Δ G r oﬀ ) calcu-lation is analytical (details on Supporting Informa-tion ).3. The ligand solvation free energy ( Δ G lig ) calculationuses 5 evenly spaced λ values per ∆ G coul,(cid:96) (withseparation 0.20) and 16 λ values per ∆ G LJ,(cid:96) , with600 ps of simulation of each λ -value.In the thermodynamic integration we employ the soft-core potential of Ref. [40] with parameters α = 0 . p = 1 . γ = 15 ps − . The integration step is 1 fs. The calculationof electrostatic interaction is performed using the reac-tion ﬁeld method with a dielectric constant (cid:15) = 80 anda cutoﬀ of 1.2 nm. These parameters are a good com-promise between speed and accuracy, as veriﬁed in Ref.[41]. The SETTLE [42] and RATTLE [43] algorithmsfor rigid water and rigid bonds to hydrogen have beenused. Each system is prepared using fully atomistic min-imisation with steepest descent and 6 ns of equilibrationin NVT (for both ligand-free and ligand-bound systems).All simulations (both fully atomistic and dual-resolution)are carried out with the ESPResSo++ simulation pack-age [44, 45], in which we have implemented TI (except incase of annihilation, for which all steps are performed inboth ESPResSo++ and GROMACS [46]). Some prelim-inary fully atomistic equilibration simulations use GRO-MACS. The error bars shown are calculated using theStudent t at 95% conﬁdence limit [47], via standard de-viations obtained using block averaging in which all tra-jectories are divided into four blocks of equal length.The parametrization of the dual-resolution model isconsistent with the work in Ref.[15]: the spring constantbetween consecutive C α nodes along the backbone ( k b )has a stiﬀ value of 5 · kJ · mol − · nm − , whilst all theother ones ( k nb ) have a value of 160 kJ · mol − · nm − ,until 1.2 nm as cutoﬀ, parametrised by minimising theaverage root mean square error in the C α RMSF. More-over, a WCA interaction is applied between C α nodesand all solvent molecules center of mass. In the WCApotential, (cid:15) has a value of 0 . kJ · mol − arbitrarilychosen as the value for carbon in the atomistic forceﬁeld,and σ i = R g,i · c where R g,i is the radius of gyration ofa given residue i where c is the same for all amino acids.The value of c is tuned to give the correct bulk waterdensity of reference for a protein-water system. The c value found is 0.658. Further explanations about c canbe found in the Supporting Information . III. RESULTS AND DISCUSSION

We performed the calculation of ∆ G b of lysozyme mod-elled in dual-resolution, varying the number of atomisticresidues constituting the binding site and comparing theresults with a fully atomistic reference simulation. Recallthat the binding free energy calculation consists of threesteps: restraint removal, ligand ∆ G , and ligand-complex∆ G ; of these, only the latter depends on protein resolu-tion, that is, only ∆ G compl assumes diﬀerent values fordiﬀerent numbers of active site residues described at theall-atom level.As explained in the previous section, the contributioncoming from the restraints can be analytically computedand amounts to ∆ G r off = − . kJ · mol − . Likewise,the Coulomb and Lennard-Jones contributions to the lig-and free energy ∆ G lig are the following:∆ G coul,(cid:96) = − . ± . kJ · mol − ∆ G LJ,(cid:96) = − . ± . kJ · mol − Hence: ∆ G lig = − . ± . kJ · mol − The ﬁnal step is the calculation of ∆ G compl , whose re-sults, including the comparison between dual-resolutionmodel and fully atomistic reference, are shown in Tab. IIand illustrated in Fig. 3.The ﬁrst three columns of the table describe theCoulomb, Lennard-Jones, Restraints contributions tofree energy, respectively, while the last one correspondsto the value of the total ligand-protein complex free en-ergy. All the values are expressed in kJ · mol − . In Fig. 3,the atomistic reference is represented with a dash blackline with its error bar. In particular, panels (a), (b) and(c) show the three components that contribute to the to-tal complex free energy, reported in panel (d). Lookingat these values as a function of the number of all-atomactive site residues, we notice that there are importantdeviations of the free energy from the reference, espe-cially in the case of 3 and 4 atomistic residues. On thecontrary, the total value of the binding free energy agreeswith the reference within the error bar in all cases.Furthermore, we observe that the trend of free energyvalues, in comparison to the reference, is essentially thesame: starting from 3 amino acids it approaches the ref-erence until reaching 6, both in its components and intotal. In contrast, going from 6 to 8 atomistic residuesthe value deviates from the reference, even though thetotal remains close to it. Finally, from 8 to 10, ∆ G con-verges again. Hence, increasing the number of atomisticresidues does not introduce necessarily an improvementof the computed free energy, at least as long as the vari-ous free energy components are considered separately.In order to gain further, quantitative insight into theseresults, we computed the the quadratic deviation fromthe reference, δ , deﬁned as: δ i = δ i − Coul + δ i − LJ + δ i − Restr == (∆ G Coul i − ∆ G Coul − at ) + (∆ G LJ i − ∆ G LJ − at ) + (∆ G Restr i − ∆ G Restr − at ) (5) where the index i = 3 ...

10 runs over atomistic residues.Fig. 4 reports δ as a function of the number of activesite amino acids modelled with atomistic detail.The plot shows that the binding free energy computedin the dual-res model approaches the reference as thenumber of atomistic active site residues increases, andmost importantly this approach takes place for each com-ponent up 6 residues. Beyond this value, though, thetrend stops and the deviation becomes larger, peaking at8 residues and decreasing when further atomistic aminoacids are added. These results highlight a non-monotonicdependence of the free energy on the mapping, that is,the number of retained atomistic residues. If, on theone hand, the overall value of the binding free energy(Fig. 3 panel d) levels to the reference with as few all-atom residues as 4, the separate components oscillate andreach the plateau only for larger numbers. The existenceof a minimum in the standard deviation of all three con-tributions pinpoints a particular number of atomistic ac-tive site residues for which the accuracy of the computedfree energy is the highest and the economy of the high-resolution subpart the largest. Including more than 6atomistic residues counterintuitively worsens the result–when the various contributions are looked at– and theprevious accuracy is only recovered when more residuesare included. This behaviour suggests that the total freeenergy undergoes an error cancellation which hides thedeviations of the separate terms.A possible explanation for this nontrivial behaviour isthat when 6 active site residues are modelled with all-atom accuracy (Fig. 5b) the ligand is stable in the cat-alytic site, namely it is surrounded by a complete shellof atomistic residues. The addition or deletion of otherresidues (Figs. 5c and 5a respectively) leads to a wors-ening of ∆ G : in the ﬁrst case, the two added residues(in pink and grey) are located behind the ﬁrst shell ofamino acids (far away from the ligand) and start to forma second, incomplete shell; in the second case, only threeatomistic amino acids take part in the direct interactionwith the ligand: therefore, the ﬁrst layer is still incom-plete and important interactions are missing; in orderto improve the free energy value one has to add furtheramino acids in order to complete the second shell. Weemphasise that the impact on the deviation from the ref-erence is inversely proportional to the distance of theadded/removed amino acid. Thus, the farther the atom-istic amino acid is from the ligand, the more negligibleits eﬀect is. In the Supporting Information we providedetail about the other numbers of all-atom residues notreported here. Finally, the values of binding free energy(also for the case of annihilation whose calculations arereported in the

Supporting Information ) are summarisedin Tab. III and illustrated in Fig. 6.TABLE II: In this table are reported the resulting values of free energy of Complex Free Energy (4th column) andits components (Coulomb, Lennard Jones and Restraints respectively in the ﬁrst three columns) in fully atomisticsystem and varying the number of atomistic residues. All the values are in kJ · mol − and performed withThermodynamic Integration. Moreover, all simulations are carried out in ESPResSo++. In particular, for eachvalue of λ , the dual-resolution simulations with diﬀerent number of atomistic residues last 4 nsec; the atomisticsimulation, instead, lasts 0.6 ns (600 ps) at res Δ G Coul,c Δ G LJ,c Δ G Restr on,c Δ G compl fully-at 145 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . IV. CONCLUSIONS

In this work we have shown how the dual resolutionmodel employed, constituted by an all-atom subregioncoupled to an elastic network model remainder, can beused to calculate the binding free energy of an enzyme-substrate complex with atomistic accuracy. Furthermore,and most importantly, we have highlighted the impactthat diﬀerent choices of the model resolution can have.Speciﬁcally, we have computed the total value of thebinding free energy as well as that of its various en-ergetic components, and quantitatively inspected howthese change when diﬀerent selections are performed forthe subgroup of amino acids, ranging from 3 to 10 intotal, to be modelled at the fully atomistic level.At ﬁrst sight, one can appreciate that the binding freeenergy value rapidly converges to the atomistic refer-ence when as few as 4 amino acids constituting the ac-tive site are described all-atom. This comforting result,however, unveils a greater complexity when the diﬀerentterms constituting the free energy are looked at sepa-rately. These show an oscillating behaviour as the num-ber of all-atom residues in the active site is increased,with a decreasing diﬀerence from the reference followedby a sudden jump to larger values, which dampens uponfurther addition of atomistic amino acids. The rationalein this behaviour is identiﬁed in the structure of the ac-tive site, which is constituted by a ﬁrst shell of the sixresidues exposed to the solvent and closest to the ligand; when further amino acids beyond these are modelled withatomistic resolution, they interact with the substrate af-fecting the binding free energy components and shiftingthem away from the reference, with a steadily loweringimpact as the model’s resolution is increased - as onecan expect. Surprisingly, very little if no signal of thisbehaviour is observed in the value of the binding freeenergy as a whole, rather it becomes visible only uponinspection of its separate contributions.The results of this work thus highlight the importanceof mapping in the construction of multi-scale and multi-resolution models, as a higher degree of detail does notnecessarily correlate with a higher accuracy of the quan-tities of interest. The implications of these observationsshould serve as a warning in the realisation of coarse-grained models concurrently employing various levels ofdetail for diﬀerent regions of the same system, whoserange of application spans from fundamental understat-ing of a molecule’s properties to real-life pharmaceuticalapplications.

ACKNOWLEDGMENTS

The authors are grateful to Robinson Cortes-Huertoand Thomas Tarenzi for a critical reading of themanuscript. This project has received funding from theEuropean Research Council (ERC) under the EuropeanUnions Horizon 2020 research and innovation programme(grant agreement no. 758588-VARIAMOLS and grantagreement no. 340906-MOLPROCOMP). [1] Boyce SE, Mobley DL, Rocklin GJ, Graves AP, Dill KA,and Shoichet BK. Predicting ligand binding aﬃnity withalchemical free energy methods in a polar model bindingsite.

J Mol Bio. , page 747763, 2009.[2] Aldeghi M., Bluck J.P., and Biggin P.C. Absolute al- chemical free energy calculations for ligand binding: Abeginners guide.

Computational Drug Discovery and De-sign , 1762:199–232, 2018.[3] Stefan Boresch, Franz Tettinger, Martin Leitgeb, andMartin Karplus. Absolute binding free energies: A quan- proteins's residues number D G C ou l / kJ m o l - D G Coulomb with error bar (a) protein's residues number D G L J K J m o l - D G Van der Waals with error bars (b) protein's residues number D G R e s t r K J m o l - D G Restraints with error bars (c) protein's residues number D G t o t / K J m o l - D G Total with error bars (d)

FIG. 3: (a) Coulomb, (b) Lennard-Jones, (c) restraint and (d) total free energies in the protein-ligand complex, as afunction of protein’s residues number included in atomistic detail in the multi-resolution set-up. The heavy dashedblack horizontal lines are the reference values from fully atomistic simulations, and the lighter dotted blackhorizontal lines are the error bars for those values. These simulations use decoupling, not annihilation. y-axes do notcover the same energy range. titative approach for their calculation.

J. Phys. Chem.B , 107(35):9535–9551, 2003.[4] Zoe Cournia, Bryce Allen, and Woody Sherman. Rel-ative binding free energy calculations in drug discovery:Recent advances and practical considerations.

Journal ofChemical Information and Modeling , 57(12):2911–2937,2017. PMID: 29243483.[5] Robert Abel, Lingle Wang, Edward D. Harder, B. J.Berne, and Richard A. Friesner. Advancing drug discov-ery through enhanced free energy calculations.

Accountsof Chemical Research , 50(7):1625–1632, 2017. PMID:28677954.[6] B. N. Dominy. Molecular recognition and binding freeenergy calculations in drug development.

Current Phar- maceutical Biotechnology , 9(2):87–95, 2008.[7] Cameron F. Abrams, Luigi Delle Site, and Kurt Kre-mer. Dual-resolution coarse-grained simulation of thebisphenol- a -polycarbonate/nickel interface. Phys. Rev.E , 67:021807, Feb 2003.[8] Raﬀaello Potestio, Christine Peter, and Kurt Kremer.Computer simulations of soft matter: Linking the scales.

Entropy , 16(8):4199–4245, 2014.[9] Raﬀaele Fiorentini, Kurt Kremer, Raﬀaello Potestio, andAoife Fogarty. Using force-based adaptive resolution sim-ulations to calculate solvation free energies of amino acidsidechain analogues.

The Journal of Chemical Physics ,146:244113, 06 2017.[10] Richard J. Gowers and Paola Carbone. A mul-

FIG. 4: Square root of quadratic deviation δ vs thenumber of atomistic residues chosen. The plot showsthat in the case of 6 atomistic residues, the value ofquadratic deviation is the lowest one and hence itmeans that such a number leads the best result of freeenergy. Moreover the black line shows the trend of FEvalues as discussed in the section III . TABLE III: Representation of Free Energies valuescomputed in ESPResSo++ and GROMACS(respectively espp and grom using a short notation onthe table) in case of annihilation and decoupling. Thetable is divided in three column: from left to right arerepresented the ligand, protein-ligand complex andbinding FE. The latter is the algebraic sum of ∆ G compl ,∆ G r off and ∆ G lig . The results are in kJ · mol − . Ligand Complex Bindingannihilation atom, espp − . ± . . ± . . ± . atom, grom − . ± . . ± . . ± . decoupling atom, espp − . ± . . ± . . ± . aa-3, espp − . ± . . ± . . ± . aa-4, espp − . ± . . ± . . ± . aa-5, espp − . ± . . ± . . ± . aa-6, espp − . ± . . ± . . ± . aa-7, espp − . ± . . ± . . ± . aa-8, espp − . ± . . ± . . ± . aa-9, espp − . ± . . ± . . ± . aa-10, espp − . ± . . ± . . ± . The Journal of Chemical Physics ,142(22):224907, 2015.[11] Marilisa Neri, Claudio Anselmi, Michele Cascella, AmosMaritan, and Paolo Carloni. Coarse-grained model ofproteins incorporating atomistic detail of the active site.

Phys. Rev. Lett. , 95:218102, Nov 2005.[12] Marilisa Neri, Marc Baaden, Vincenzo Carnevale, Clau-dio Anselmi, Amos Maritan, and Paolo Carloni. Mi- croseconds dynamics simulations of the outer-membraneprotease t.

Biophysical Journal , 94(1):71 – 78, 2008.[13] Matas Rodrigo Machado, Pablo Daniel Dans, and SergioPantano. A hybrid all-atom/coarse grain model for mul-tiscale simulations of dna.

Phys. Chem. Chem. Phys. ,13:18134–18144, 2011.[14] Matias R. Machado and Sergio Pantano. Exploring LacI–DNA dynamics by multiscale simulations using the sirahforce ﬁeld.

Journal of Chemical Theory and Computa-tion , 11(10):5012–5023, 2015. PMID: 26574286.[15] Aoife C. Fogarty, Raﬀaello Potestio, and Kurt Kremer.A multi-resolution model to capture both global ﬂuctu-ations of an enzyme and molecular recognition in theligand-binding site.

Proteins: Struct., Func., and Bioinf. ,84(12):1902–1913, 2016.[16] Andrea Amadei, Antonius B. M. Linssen, and Her-man J. C. Berendsen. Essential dynamics of pro-teins.

Proteins: Structure, Function, and Bioinformatics ,17(4):412–425, 1993.[17] Vincenzo Carnevale, Simone Raugei, Cristian Micheletti,and Paolo Carloni. Convergent dynamics in the proteaseenzymatic superfamily.

J. Am. Chem. Soc. , 2:173–181,2006.[18] Andrea Zen, Vincenzo Carnevale, Arthur M. Lesk,and Cristian Micheletti. Correspondences between low-energy modes in enzymes: Dynamics-based alignment ofenzymatic functional families.

Protein Sci. , 17:918–929,2008.[19] F. Pontiggia, A. Zen, and C. Micheletti. Small andlarge scale conformational changes of adenylate kinase:a molecular dynamics study of the subdomain motionand mechanics.

Biophys J , 95(12):5901–5912, Dec 2008.[20] Monique M. Tirion. Large amplitude elastic motions inproteins from a single-parameter, atomic analysis.

Phys.Rev. Lett. , 77:1905–1908, Aug 1996.[21] K. Hinsen. Analysis of domain motions by approximatenormal mode calculations.

Proteins , 33:417–429, 1998.[22] M Delarue and Y H Sanejouand. Simpliﬁed normal modeanalysis of conformational transitions in dna-dependentpolymerases: the elastic network model.

J Mol Biol ,320(5):1011–1024, 2002.[23] C. Micheletti, P. Carloni, and A. Maritan. Accurateand eﬃcient description of protein vibrational dynam-ics: comparing molecular dynamics and gaussian models.

Proteins , 55(3):635–645, May 2004.[24] Tod D. Romo and Alan Grossﬁeld. Validating and im-proving elastic network models with molecular dynamicssimulations.

Proteins: Structure, Function, and Bioin-formatics , 79(1):23–34, 2011.[25] R. Potestio, F. Pontiggia, and C. Micheletti. Coarse-grained description of proteins’ internal dynamics: anoptimal strategy for decomposing proteins in rigid sub-units.

Biophys J , 96, 2009.[26] H. Golhlke and M. F. Thorpe. A natural coarse grain-ing for simulating large biomolecular motion.

BiophysicalJournal , 91:2115–2120, 2006.[27] Zhiyong Zhang, Lanyuan Lu, Will G. Noid, Vinod Kr-ishna, Jim Pfaendtner, and Gregory A. Voth. A sys-tematic methodology for deﬁning coarse-grained sites inlarge biomolecules.

Biophysical Journal , 95(11):5073 –5083, 2008.[28] Zhiyong Zhang, Jim Pfaendtner, Andrea Grafmller, andGregory A. Voth. Deﬁning coarse-grained representationsof large biomolecules and biomolecular complexes from (a) (b)(c) (d)

FIG. 5: VMD representation of lysozyme and ligand in diﬀerent resolution: (a) three, (b) six, (c) eight, (d) tenatomistic residues. The complete set can be found in

Supporting information

The ligand is always atomistic and it isrepresented in Licorice. In green are represented the ENM beads. With the other colors are represented, instead, thevarious atomistic residues which surround the ligand elastic network models.

Biophysical Journal , 97(8):2327– 2337, 2009.[29] Zhiyong Zhang and Gregory A. Voth. Coarse-grainedrepresentations of large biomolecular complexes fromlow-resolution structural data.

Journal of Chemical The-ory and Computation , 6(9):2990–3002, 2010.[30] Anton V. Sinitskiy, Marissa G. Saunders, and Gregory A.Voth. Optimal number of coarse-grained sites in diﬀerentcomponents of large biomolecular complexes.

The Jour-nal of Physical Chemistry B , 116(29):8363–8374, 2012.PMID: 22276676.[31] Guido Polles, Giuliana Indelicato, Raﬀaello Potes-tio, Paolo Cermelli, Reidun Twarock, and CristianMicheletti. Mechanical and assembly units of viralcapsids identiﬁed via quasi-rigid domain decomposition.

PLOS Computational Biology , 9(11):1–13, 11 2013.[32] Thomas T. Foley, M. Scott Shell, and W. G. Noid. Theimpact of resolution upon entropy and information incoarse-grained models.

The Journal of Chemical Physics , 143(24):243104, 2015.[33] Patrick Diggins, Changjiang Liu, Markus Deserno, andRaﬀaello Potestio. Optimal coarse-grained site selectionin elastic network models of biomolecules.

Journal ofChemical Theory and Computation , 0(0):null, 0.[34] John G. Kirkwood. Statistical mechanics of ﬂuid mix-tures.

The Journal of Chemical Physics , 3(5):300–313,1935.[35] John D. Weeks, David Chandler, and Hans C. Ander-sen. Role of repulsive forces in determining the equilib-rium structure of simple liquids.

The Journal of ChemicalPhysics , 54(12):5237–5247, 1971.[36] M. Parrinello and A. Rahman. Polymorphic transitionsin single crystals: A new molecular dynamics method.

Journal of Applied Physics , 52(12):7182–7190, 1981.[37] Viktor Hornak, Robert Abel, Asim Okur, Bentley Strock-bine, Adrian Roitberg, and Carlos Simmerling. Compar-ison of multiple amber force ﬁelds and development ofimproved protein backbone parameters.

Proteins: Struc- ture, Function, and Bioinformatics , 65(3):712–725, 2006.[38] William L. Jorgensen, Jayaraman Chandrasekhar, Jef-fry D. Madura, Roger W. Impey, and Michael L. Klein.Comparison of simple potential functions for simulat-ing liquid water.

The Journal of Chemical Physics , 79(2):926–935, 1983.[39] Karl N. Kirschner, Austin B. Yongye, Sarah M.Tschampel, Jorge Gonzlez-Outeirio, Charlisa R. Daniels,B. Lachele Foley, and Robert J. Woods. Glycam06:A generalizable biomolecular force ﬁeld. carbohydrates.

Journal of Computational Chemistry , 29(4):622–655,2008.[40] Mark Abraham, Berk Hess, David van der Spoel, andErik Lindahl. The gromacs development team, gromacsuser manual version 5.0.4. 2014.[41] Michael R. Shirts, Jed W. Pitera, William C. Swope,and Vijay S. Pande. Extremely precise free energy cal-culations of amino acid side chain analogs: Compari-son of common molecular mechanics force ﬁelds for pro-teins.

The Journal of Chemical Physics , 119(11):5740–5761, 2003.[42] Shuichi Miyamoto and Peter A. Kollman. Settle: Ananalytical version of the shake and rattle algorithm forrigid water models.

Journal of Computational Chemistry ,13(8):952–962, 1992.[43] Hans C Andersen. Rattle: A “velocity” version ofthe shake algorithm for molecular dynamics calculations.

Journal of Computational Physics , 52(1):24 – 34, 1983.[44] J.D.Halverson, T.Brandes, O.Lenz, A.Arnold, S.Bevc,V.Starchenko, K.Kremer, T.Stuehn, and D.Reith.Espresso++: A modern multiscale simulation packagefor soft matter systems.

Computer Physics Communica-tions , 184:1129–1149, 2013.[45] Horacio V. Guzman, Nikita Tretyakov, HidekiKobayashi, Aoife C. Fogarty, Karsten Kreis, JakubKrajniak, Christoph Junghans, Kurt Kremer, andTorsten Stuehn. Espresso++ 2.0: Advanced methodsfor multiscale molecular simulation.

Computer PhysicsCommunications , 238:66 – 76, 2019.[46] Berk Hess, Carsten Kutzner, David van der Spoel, andErik Lindahl. Gromacs 4: Algorithms for highly eﬃcient,load-balanced, and scalable molecular simulation.

Jour-nal of Chemical Theory and Computation , 4(3):435–447,2008. PMID: 26620784.[47] Student. The probable error of a mean.