Conformational Entropy as Collective Variable for Proteins
CConformational Entropy as Collective Variable for Proteins
Ferruccio Palazzesi, Omar Valsson,
1, 2 and Michele Parrinello
1, 2, ∗ Department of Chemistry and Applied Biosciences, ETH Zurich and Facolt`a di Informatica,Instituto di Scienze Computationali, Universit`a della Svizzera italiana,Via Giuseppe Buffi 13, CH-6900, Lugano, Switzerland. National Center for Computational Design and Discovery of Novel Materials MARVEL. (Dated: April 12, 2017)Many enhanced sampling methods, such as Umbrella Sampling, Metadynamics or VariationallyEnhanced Sampling, rely on the identification of appropriate collective variables. For proteins, evensmall ones, finding appropriate collective variables has proven challenging. Here we suggest thatthe NMR S order parameter can be used to this effect. We trace the validity of this statementto the suggested relation between S and entropy. Using the S order parameter and a surrogatefor the protein enthalpy in conjunction with Metadynamics or Variationally Enhanced Sampling weare able to reversibly fold and unfold a small protein and draw its free energy at a fraction of thetime that is needed in unbiased simulations. From a more conceptual point of view this impliesdescribing folding as a resulting from a trade off between entropy and enthalpy. We also use S incombination with the free energy flooding method to compute the unfolding rate of this peptide.We repeat this calculation at different temperatures to obtain the unfolding activation energy. PACS numbers: 05.10.-a, 02.70.Ns, 87.15.H-
Enhanced sampling method have received great atten-tion since they offer the promise of overcoming the lim-ited time scale that direct simulations can afford. Amongthe plethora of methods proposed in the literature [1–4],Metadynamics (MetaD) [5, 6] and more recently Vari-ationally Enhanced Sampling (VES) [7] are finding ap-plication in a vast array of problems [8–14]. Like othersimilar approaches they rely on the identification of ap-propriate collective variables (CVs) or order parameters.In MetaD or VES the fluctuations of the CVs are en-hanced such that transition between different metastablebasins are favored [15, 16].A vast number of CVs have been suggested and areeasily accessible in open source codes [17]. However thequest for new order parameters continues in the hopeof finding CVs that are efficient and yet generic enoughsuch that they can be applied to a larger class of prob-lems, without prejudging the final results. Identifyingappropriate CVs is not only a technical issue needed toaccelerate sampling but offers a key to the understandingof the underlying physical processes. The need for suchCVs is particularly pressing in the field of bio-molecularsystems, such as proteins, whose conformational changesare defined by a large number of degrees of freedom likethe arrangement of backbone atoms, side-chains and sol-vent molecules.At first this appears like a very demanding request.The purpose of this paper is to show that this is not nec-essarily so, at least for small proteins or selected regionsof larger bio-systems. To this effect we introduce a con-ceptually new CV and illustrate its efficiency in non triv-ial examples. Our guiding principle is that in the behav-ior of proteins, and of many other systems [18], entropyplays an important role and if we are able to identify aCV that measures entropy even if in an approximate way, this could go some way towards being able to sample thecomplex landscape of proteins [19].In this search we shall be helped by the NMR litera-ture in which several attempts have been made at con-verting the NMR observable into a measure of conforma-tional entropy [20–26]. Without going into the complexdetail of the NMR technique, it suffices to say that innuclear relaxation experiments, it is possible to measurethe dynamical behavior of selected bonds, like N-H orC-H, and, within some approximations, extract the so-called order parameter S . This parameter, that canvary between 0 and 1, can provide useful informationon the degree of spatial motion of the system [27, 28].From the knowledge of this parameter several empiricalrelation between S and the conformational entropy havebeen proposed and their validity assessed in comparisonwith either experiments or molecular dynamics simula-tions [20, 23–25]. Despite being rather empirical [22],these relationships have been used to study several bio-macromolecular processes, and to calculate protein heatcapacity [21, 26, 29, 30].It is the existence of these relations and the notionthat entropy plays an important role in proteins thathave inspired us to use S as CV. However, to orderto proceed one needs to express S as a function of theatomic coordinates. Luckily such relations are availablefor both the NH and CH groups [31–34]. Since here weshall only bias the N-H bonds related order parameter, wesolely report the relevant expression proposed by Zhangand Br¨uschweiler [35]: S = (cid:88) n S n (1)where S n is the order parameter for the n -th amino acid a r X i v : . [ phy s i c s . c h e m - ph ] A p r residue defined as: S n = tanh (cid:32) . (cid:88) k [exp( − r On − ,k ) + exp( − r H N n,k )] (cid:33) − . , (2)where k runs over all the heavy atoms with the excep-tion of the residues n and n − r H N n,k and r On − ,k arethe distances from the heavy atom k from the amide hy-drogen in residue n and the carboxyl oxygen in residue n −
1. In Eq. (2) the distances are to be expressed in˚Angstroms.In this paper we assume that this relation is valid forany atomic configuration and we use it as a CV. From apractical point of view the merit of this choice is in theend to be judged by the results. Once the CV is identi-fied, the fluctuations of S can be amplified by using anenhanced sampling technique, such as MetaD or VES.Before starting our calculations we indeed checked that S is able to distinguish between the folded and unfoldedprotein conformations. For this reason we calculated thefree energy surface (FES) along this CV utilizing thechignolin (CLN025) [36] trajectory provided to us by theD.E. Shaw Research Group [37]. As can be seen in Fig. 1,the FES exhibits two well-defined basins correspondingto the folded and unfolded state. This in itself is an in-teresting result that suggests the attempt at using S ascollective variable is not totally devoid of merits. FIG. 1. FES along the s S CV calculated from long unbiasedmolecular dynamics simulation of Ref. 37, with the ribbonrepresentation of chignolin in two representative snapshots ofthe folded and unfolded conformations.
We tried to drive the folding transition using only S (hereafter denoted s S ) as CV but failed. Thus in thespirit of this work in which we describe folding as a trade off between entropy and enthalpy we introduced a sec-ond CV that is meant to be a surrogate for enthalpy.This could have been accomplished by separating fromthe total energy of the systems those components thatdescribe the protein-protein interactions and use theseas CV. However this is somewhat expensive and we pre-ferred to use as surrogate for the enthalpy the native H-bonds contact map (hereafter s H ). At the end we shallreweight the FES [38] and obtain its projection onto theprotein enthalpy defined as the sum of the protein-proteincontributions, E pp , to the total energy.We performed the simulation at T = 340 Kas in Ref. 37 and we use the same potential, theCharmm22* [39] protein force field and the TIP3P [40]water model. We use GROMACS 5.1.4 MD package [41]patched with the PLUMED 2 plug-in [17] in which wehave implemented the s S CV. In the MetaD calculationswe integrate the equation of motion using a time step of2.0 fs and the temperature is controlled by the stochasticrescaling thermostat [42]. Gaussians of initial height 2.82kJ/mol and width 0.05 for s S and for s H were depositedevery picosecond. The value of the bias factor γ was setequal to 8 [6]. To speed up the calculation and make useof parallelism we used 4 multiple walkers [43]. We eval-uate the convergence using the error metric previouslyused in Ref. [6, 44, 45] and using the unbiased data ofRef. 37 as reference.It can be seen from Fig. 2 that the convergence in theFES is reached in about 1.0 µ s, that is a much shortertime relative to ∼ µ s reported in Ref. 37. This re-flects the fact that in the MetaD run the rate of transitionbetween folded and unfolded states is accelerated aboutone hundred times.In this plot we also express the FES as a function ofentropy and enthalpy. The entropy is extimated from s S using the relation given by Wand and co-workers [23],while as a measure of the protein enthalpy we use E pp .This calculation clearly proves the usefulness of using en-tropy and enthalpy to drive reversible transitions betweenfolded and unfolded states. We have repeated the calcula-tions using VES obtaining statistically indistinguishableresults. The VES simulations are performed using theVES code [46] module for PLUMED 2. The results forthis calculation are reported in the SM.An analysis of the structures lying at the bottom of thefolded basin in Fig. 1 shows that they deviate from theexperimental structure [36] on average by around 0.8 ˚A.This suggests that if we are only interested in studyingthe unfolding process, s S alone could be used to pro-mote unfolding transition and calculate unfolding ratesusing the approach introduced in Ref. 47. This is theVES analogous of the infrequent metadynamics methodof Tiwary and Parrinello [48], that in turn is based on FIG. 2. a) Reweighted FES calculated from the MetaD sim-ulation using entropy and enthalpy as CVs. b) and c) Mono-dimensional FESs for entropy and enthalpy, respectively. Inred the data from the unbiased simulation of Ref. 37, while ingreen the one from MetaD simulation. d) Free energy errorof the 2D FES along the simulation time. the ideas of Grubmuller [3] and Voter [1]. In all theseapproaches one relates the rates of a rare event, as calcu-lated in a biased system, to the physical unbiased ratesby a simple relation. The requirement that the bias doesnot act in the transition region is crucial for this rela-tion to hold. The methods mentioned earlier differ inthe way this is achieved. In Ref. 47 the variational ap-proach is used to truncate the bias up to a preassignedfree energy level. If this cutoff value is smaller than thefree energy barrier this latter region remains free of biasand the condition put forward by Grubmuller [3] andVoter [1] applies. Computational details can be found inthe SM. The results of our calculation are shown in Fig. 3.The value obtained at T = 340 K is in good agreementwith both the unbiased estimation of Lindorff-Larsen etal. [37] and the infrequent MetaD simulations performedby Tung and Pfaendtner on a mutated form of our sys-tem, using the root-mean square deviation with respectto the folded structure as biased CV [49].Contrary to these previous estimations we push ourcalculation to lower temperatures (320 and 300 K), find-ing for the unfolding of this simple peptide an Arrheniusbehavior, with an activation energy of around 50 kJ/mol.This value is in the right ballpark when compared to theestimation based on a similar β -peptide [50].In conclusion we have shown that the idea of usingentropy and enthalpy as collective variables, or rathersurrogate expressions for them, is a useful one. One of thesecret of the success in using these CVs is not only that FIG. 3. Arrenhius plot of the unfolding process of the chig-nolin. The green square is the unfolding time calculated fromthe unbiased data of Ref. 37. The black line is the linear re-gression used to calculated the activation energy ( R is equalto 0.97). The acceleration factors are 10, 50 and 100 for tem-perature equal to 340, 320 and 300, respectively. it is founded on physical ideas and concepts but also onthe fact that is non-local and thus sensitive to the wholestructure of the protein. We would like to add that the S defined in Eq. 2 reflects the backbone structure. Ifone were interested in the role of the side-chains the useof S order parameter based on the methyl groups mightprove useful.We believe that here we have introduced a powerfulnew concept in the simulation of protein. How far itcan be pushed when going to larger systems it remainsto be seen. Already at this stage, it can be safely saidthat, without any modification, the folding of small pro-teins, the study of intrinsically disordered proteins andthe conformational flexibility of proteins segments can behandled as described above [51]. We are confident thatwith an appropriate adaptation much larger proteins canbe similarly handled.We acknowledge funding from the European UnionGrant No. ERC-2014-AdG-670227/VARMET and by theNCCR MARVEL, funded by the Swiss National ScienceFoundation. This work was supported by a grant fromthe Swiss National Supercomputing Centre (CSCS) un-der project ID s721 and u1. The authors thank D. E.Shaw Research for sharing the simulation data for chig-nolin. ∗ [email protected][1] A. F. Voter, Phys. Rev. Lett. , 3908 (1997).[2] T. Huber, A. E. Torda, and W. F. Gunsteren, J Computer-Aided Mol Des , 695 (1994).[3] H. Grubm¨uller, Phys. Rev. E , 2893 (1995).[4] G. M. Torrie and J. P. Valleau, J. Comp. Phys. , 187(1977).[5] A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. U.S.A. , 12562 (2002).[6] A. Barducci, G. Bussi, and M. Parrinello, Phys. Rev.Lett. , 020603 (2008).[7] O. Valsson and M. Parrinello, Phys. Rev. Lett. ,090601 (2014).[8] F. Palazzesi, A. Barducci, M. Tollinger, and M. Par-rinello, Proc. Natl. Acad. Sci. U.S.A. , 14237 (2013).[9] R. Martoˇn´ak, D. Donadio, A. R. Oganov, and M. Par-rinello, Nat. Mat. , 623 (2006).[10] G. M. Pavan, A. Barducci, L. Albertazzi, and M. Par-rinello, Soft Mat. , 2593 (2013).[11] D. Branduardi, F. L. Gervasio, and M. Parrinello, J.Chem. Phys. , 054103 (2007).[12] P. Tiwary, V. Limongelli, M. Salvalaglio, and M. Par-rinello, Proc. Natl. Acad. Sci. U.S.A. , 201424461 (2015).[13] S. Bottaro, P. Ban´aˇs, J. Sponer, and G. Bussi, J. Chem.Phys. Lett. (2016).[14] P. Shaffer, O. Valsson, and M. Parrinello, Proc. Natl.Acad. Sci. U.S.A. , 1150 (2016).[15] A. Barducci, M. Bonomi, and M. Parrinello, WIREsComput. Mol. Sci. , 826 (2011).[16] O. Valsson, P. Tiwary, and M. Parrinello, Annual reviewof physical chemistry , 159 (2016).[17] G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni,and G. Bussi, Comput. Phys. Commun. , 604 (2014).[18] P. M. Piaggi, O. Valsson, and M. Parrinello, arXivpreprint arXiv:1612.03235 (2016).[19] L. L. Chavez, J. N. Onuchic, and C. Clementi, J. Am.Chem. Soc. , 8426 (2004).[20] M. Akke, R. Brueschweiler, and A. G. Palmer III, Jour-nal of the American Chemical Society , 9832 (1993).[21] D. Yang, Y.-K. Mok, J. D. Forman-Kay, N. A. Farrow,and L. E. Kay, J. Mol. Bio. , 790 (1997).[22] M. J. Stone, Acc. Chem. Res. , 379 (2001).[23] K. A. Sharp, E. O’Brien, V. Kasinath, and A. J. Wand,Proteins: Struct., Funct., Bioinf. Proteins , 922 (2015).[24] D. Yang and L. E. Kay, J. Mol. Bio. , 369 (1996).[25] Z. Li, S. Raychaudhuri, and A. J. Wand, Protein Sci. ,2647 (1996).[26] M. S. Marlow, J. Dogan, K. K. Frederick, K. G. Valen-tine, and A. J. Wand, Nat. Chem. Bio. , 352 (2010).[27] G. Lipari and A. Szabo, J. Am. Chem. Soc. , 4546(1982).[28] G. Lipari and A. Szabo, J. Am. Chem. Soc. , 4559(1982).[29] L. Spyracopoulos and B. D. Sykes, Curr. Opin. Struct.Bio. , 555 (2001).[30] C. Bracken, J. Mol. Graph. Model. , 3 (2001). [31] G. R. Bowman, J. Comp. Chem. , 558 (2016).[32] D. C. Chatfield, A. Szabo, and B. R. Brooks, J. Am.Chem. Soc. , 5301 (1998).[33] C. Peter, X. Daura, and W. F. Van Gunsteren, J. Bio.NMR , 297 (2001).[34] P. Maragakis, K. Lindorff-Larsen, M. P. Eastwood, R. O.Dror, J. L. Klepeis, I. T. Arkin, M. Ø. Jensen, H. Xu,N. Trbovic, R. A. Friesner, A. G. Palmer III, and D. E.Shaw, J. Phys. Chem. B , 6155 (2008).[35] F. Zhang and R. Br¨uschweiler, J. Am. Chem. Soc. ,12654 (2002).[36] S. Honda, T. Akiba, Y. S. Kato, Y. Sawada, M. Sekijima,M. Ishimura, A. Ooishi, H. Watanabe, T. Odahara, andK. Harata, J. Am. Chem. Soc. , 15327 (2008).[37] K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E.Shaw, Science , 517 (2011).[38] P. Tiwary and M. Parrinello, J. Phys. Chem. B , 736(2014).[39] S. Piana, K. Lindorff-Larsen, and D. E. Shaw, Biophys.J. , L47 (2011).[40] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W.Impey, and M. L. Klein, J. Chem. Phys. , 926 (1983).[41] M. J. Abraham, T. Murtola, R. Schulz, S. P´all, J. C.Smith, B. Hess, and E. Lindahl, SoftwareX , 19(2015).[42] G. Bussi, D. Donadio, and M. Parrinello, J. Chem. Phys. , 014101 (2007).[43] P. Raiteri, A. Laio, F. L. Gervasio, C. Micheletti, andM. Parrinello, J. Phys. Chem. B , 3533 (2006).[44] D. Branduardi, G. Bussi, and M. Parrinello, J. Chem.Theory Comput. , 2247 (2012).[45] O. Valsson and M. Parrinello, J. Chem. Theory Comput. , 1996 (2015).[46] VES Code , 070601 (2015).[48] P. Tiwary and M. Parrinello, Phys. Rev. Lett. ,230602 (2013).[49] H.-J. Tung and J. Pfaendtner, Mol. Sys. Des. Eng. , 382(2016).[50] M. Scian, I. Shu, K. A. Olsen, K. Hassam, and N. H.Andersen, Biochemistry , 2556 (2013).[51] D. B. Kokh, P. Czodrowski, F. Rippmann, and R. C.Wade, J. Chem. Theory Comput. , 4100 (2016).[52] F. Bach and E. Moulines, in Advances in Neural Informa-tion Processing Systems 26 , edited by C. Burges, L. Bot-tou, M. Welling, Z. Ghahramani, and K. Weinberger(Curran Associates, Inc., Red Hook, NY, 2013) pp. 773–781.[53] M. Salvalaglio, P. Tiwary, and M. Parrinello, J. Chem.Theory Comput.10