[PDF] Generalization of the Elastic Network model for the study of large conformational changes in biomolecules

Abstract

The elastic network (EN) is a prime model that describes the long-time dynamics of biomolecules. However, the use of harmonic potentials renders this model insufficient for studying large conformational changes of proteins (e.g. stretching of proteins, folding and thermal unfolding). Here, we extend the capabilities of the EN model by using a harmonic approximation described by Lennard-Jones (LJ) interactions for far contacts and native contacts obtained from the standard overlap criterion as in the case of Go-like models. While our model is validated against the EN model by reproducing the equilibrium properties for a number of proteins, we also show that the model is suitable for the study of large conformation changes by providing various examples. In particular, this is illustrated on the basis of pulling simulations that predict with high accuracy the experimental data on the rupture force of the studied proteins. Furthermore, in the case of DDFLN4 protein, our pulling simulations highlight the advantages of our model with respect to Go-like approaches, where the latter fail to reproduce previous results obtained by all-atom simulations that predict an additional characteristic peak for this protein. In addition, folding simulations of small peptides yield different folding times for alpha-helix and beta-hairpin, in agreement with experiment, in this way providing further opportunities for the application of our model in studying large conformational changes of proteins. In contrast to the EN model, our model is suitable for both normal mode analysis and molecular dynamics simulation. We anticipate that the proposed model will find applications in a broad range of problems in biology, including, among others, protein folding and thermal unfolding.

Full PDF

GGeneralization of the Elastic Network model forthe study of large conformational changes inbiomolecules † Adolfo B. Poma, ∗ a Mai Suan Li, a and Panagiotis E. Theodorakis ∗ a The Elastic Network (EN) is a prime model that describes the long-time dynamics of biomolecules.However, the use of harmonic potentials renders this model insufﬁcient for studying large confor-mational changes of proteins ( e.g. stretching of proteins, folding and thermal unfolding). Here, weextend the capabilities of the EN model by using a harmonic approximation described by Lennard–Jones (LJ) interactions for far contacts and native contacts obtained from the standard overlapcriterion as in the case of G¯o-like models. While our model is validated against the EN model byreproducing the equilibrium properties for a number of proteins, we also show that the model issuitable for the study of large conformation changes by providing various examples. In particular,this is illustrated on the basis of pulling simulations that predict with high accuracy the experimen-tal data on the rupture force of the studied proteins. Furthermore, in the case of DDFLN4 protein,our pulling simulations highlight the advantages of our model with respect to G¯o-like approaches,where the latter fail to reproduce previous results obtained by all-atom simulations that predict anadditional characteristic peak for this protein. In addition, folding simulations of small peptidesyield different folding times for α -helix and β -hairpin, in agreement with experiment, in this wayproviding further opportunities for the application of our model in studying large conformationalchanges of proteins. In contrast to the EN model, our model is suitable for both normal modeanalysis and molecular dynamics simulation. We anticipate that the proposed model will ﬁnd ap-plications in a broad range of problems in biology, including, among others, protein folding andthermal unfolding. One of the main goals in the computer simulation arena ofbiomolecules is to build the simplest yet computationally mostefﬁcient models able to reproduce accurately and predict faith-fully dynamic and structural properties of proteins. A most primeexample of this is the elastic network (EN) , which reproduceswell the low-frequency motion (long-time dynamics) of proteins.The EN has been also employed for modelling other importantbiomolecules such as DNA , RNA , graphene sheet , and cellu-lose ﬁbers , providing information on their equilibrium dynam-ics, the inﬂuence of the native-structure topology on their sta-bility, the localization properties of protein ﬂuctuations or thedeﬁnition of protein domains . Although a number of similarmodels have subsequently appeared in the literature and vari- a Institute of Physics, Polish Academy of Sciences, Al. Lotników 32/46, 02-668 Warsaw,Poland; E-mail: [email protected], [email protected] † Electronic Supplementary Information (ESI) available: Harmonic approximationfor the Lennard–Jones potential. Pulling results for titin and sequence of transitionstates during pulling simulation. See DOI: 10.1039/cXCP00000x/ ous improvements have been suggested , the EN still remainsthe standard model having attracted particular interest due to itssimplicity and ability to provide realistic frequency data . Theuse of EN model for studying processes that involve large con-formational changes of proteins is a current challenge though,in practice due to the required numerical complexity. Therefore,several methodologies have been developed to tackle this prob-lem. For example, certain approaches are based on the update ofthe connectivity or Kirchoff matrix during a linear interpolationbetween two known protein states . However, this approx-imation fails when the two states are unknown or when one orboth of these states are represented by an ensemble of equivalentconﬁgurations ( e.g. unfolded state).The EN model is based only on a single-parameter harmonicpotential between residues that are represented in the model bythe C α atoms. In this model, the harmonic interaction is intro-duced when two residues overlap, i.e. the van der Waals radiiaugmented by a cutoff distance of any pairs of atoms belongingto different residues overlap (see Table 1). Here, the harmonicapproximation of EN contacts models the interaction between C α Journal Name, [year], [vol.] ,, a r X i v : . [ phy s i c s . b i o - ph ] N ov ig. 1 Left panel shows the EN model for a tryptophan-cage motif (PDBID: 1L2Y), the “unbreakable” harmonic EN contacts are shown in bluecolor. Right panel shows the GEN model, where a subset of the ENcontacts are formulated as G¯o contacts (in red) and far contacts by aneffective harmonic term based on the LJ potential (in green). The EN andGEN models do not assume a priori C α -backbone connectivity. Hence,the tube representation of the C α backbone in grey serves only as a guidefor the eye. Still, consecutive C α atoms are connected by EN harmonicbonds, because they are within the cutoff distance of the EN model (seemain text for further details). atoms in the native state, such as the electrostatic and van derWaals interactions, as well as the covalent bonds along the back-bone of C α atoms (Fig. 1). On the one hand, the harmonic ap-proximation is incompatible with the dissociation of native con-tacts for certain processes involving large conformational changesin biomolecules. On the other hand, some important advantagesof the EN from the modeling point of view are avoiding: theuse of computationally expensive simulations based on all-atomforce-ﬁelds and the necessity of including ad hoc backbone stiff-ness in the model. The latter is generally implemented in coarse-grained models based on C α atoms to mimic the all-atomdescription of proteins, which is based on harmonic interactionsused to maintain bonds and bond and dihedral angles. Techni-cally, the EN model is suitable for Normal Mode Analysis (NMA),which requires the calculation of the second-derivative (or Hes-sian) matrix. However, this step requires substantial computermemory and processing power to perform the matrix diagonaliza-tion, which becomes a severe bottleneck in the study of very largecomplexes. Moreover, one typically ignores molecular-interactiondetails at this coarse-grain level of description, which is an ap-proach that has been shown to work better in cases where the pro-tein is packed uniformly . While an initial energy-minimizationstep is needed to satisfy the harmonic pairs ( e.g. steepest de-scent and conjugate gradient methods), this is not necessaryfor small systems .Despite the aforementioned advantages of the EN, this modelcannot be currently used for studying certain processes that in-volve large conformational changes of proteins apart from a fewprevious attempts that require a priori knowledge of the proteinstates , due to the presence of the harmonic bonds. In thefollowing, we overcome this barrier and enhance the numberof possible applications of the EN, for example, protein stretch-ing , prediction of elastic properties without the assump- tion of continuum theory , characterization of folding path-ways denaturation due to high temperature , pressure andsurfactants , as well as denaturation processes at interfaces ( e.g. air–water and oil–water interfaces) based on simple CG poten-tials . Here, we propose a model where harmonic interactionsbeyond the nearest neighbors (distance in sequence larger thanthree) are substituted by harmonic effective terms approximatedby the Lennard–Jones (LJ) potential . In addition, our modelassumes LJ contacts obtained from a contact map based on thestandard overlap criterion (Fig. 1) , namely, we determinenative contacts based on the overlap (OV) of vdW radii of heavyatoms (N, C, C α and O). For the determination of the contact mapone considers each residue as a cluster of spheres. In this case,the radii of the spheres are equal to the van der Waals radii en-hanced by a factor of about 25% (see Table 1). In addition, ourmodel does not require bending- and dihedral-angle parametersalong the C α backbone in the native state. Our model is basedonly on a single parameter as the EN model does, but it enablesthe study of large-conformational changes in biomolecules due tothe use of the contact map. In this respect, we anticipate that ourwork opens the door for the use of EN in a wider range of appli-cations, for example, folding, thermal unfolding, denaturation ofproteins at interfaces, etc .In the following sections, we provide details about our modeland methodologies. After validating the proposed model with theEN model for a number of proteins, we present various studiesthat involve large conformational changes of proteins highlight-ing model’s advantages in comparison with the EN and G¯o-likemodels. In the case of EN, if any two heavy atoms belonging to differentresidues are within a distance r c = vdW i + vdW j + R c , then the tworesidues form an EN contact and a harmonic potential that con-nects the C α positions of these residues is applied. R c is simply acutoff distance and vdW i and vdW j are the van der Waals radii ofheavy atoms belonging to residues i and j (see Table 1). In thiscase, the harmonic potential has the form V h = C ( r ij − r ) , where r is the distance between C α atoms in the native structure ofthe protein and C is a constant indicating the strength of the har-monic potential. The energy scale associated to the EN model isgiven by ε EN = CR . In the case of our model, we use the sameguideline for deﬁning contacts due to EN. They contribute both tobonded and nonbonded energy of the Hamiltonian. The energycan be written as, H = ∑ | i − j | < C ( r ij − r ) + ∑ | i − j | > U nb ij (1)The ﬁrst term is the standard harmonic potential associatedwith the EN model while the second term is the nonbondedcontribution, which enables large conformational changesin the protein. This second term is deﬁned as follows: Journal Name, [year], [vol.] , able 1 List of vdW radii for heavy atoms used to determine the presence of the native contacts in proteins, sugars, and the sugar–protein complex.The third column refers to proteins. The values are taken from ref. . The radii of the enlarged spheres are deﬁned as the vdW radii multiplied by 1.24to account for attraction . No. Atomic Group vdW radius [Å] enlarged radius [Å]1 C H H H H H H H H H H H H H U nb ij = (cid:40) U LJ ( ε cm ) if a native contact forms according to the OV criterion as in G¯o-like models U LJ ( ε harm ) If a non-native contact forms according to the cutoff distance as in the EN model (2)The native contacts found by the OV criterion are describedby a G¯o-like model with LJ interactions ( U LJ ( ε cm ) ). In addi-tion, we use an effective-harmonic term based on the LJ potential[ U LJ ( ε harm ) ] for contacts between residues with a distance in se-quence larger than three. The LJ potential reads U LJ ( ε ij ) = ε ij (cid:34)(cid:18) σ ij r ij (cid:19) − (cid:18) σ ij r ij (cid:19) (cid:35) , (3)where r ij is the distance between any pair of i and j C α atoms inthe system. The relation between the effective harmonic termand the strength of the LJ potential ε ij is simply described bythe formula : ε ij = ε harm , where ε harm = C σ − ( / ) (see SI), σ ij = − / r . Hence, one infers about the ε harm from C and r . In addition, we include contacts by using a contact mapbased on the standard overlap criterion . The latter contactsare represented by LJ potentials U LJ ( ε cm ) . In this case, thestrength of interaction is independent of the distance and equalto ε ij = ε cm for any pair of residues in contact, where ε cm is theunit of energy, and σ ij = − / r . Here, the subindex “cm” de-notes the “contact map" obtained by the overlap criterion. More-over, the latter contacts apply only for residues at a sequentialdistance larger than three. If a native contact coincides with aharmonic effective contact, then the native contact [ U LJ ( ε cm ) ] re-mains and the harmonic contact [ U LJ ( ε harm ) ] is removed. An im-portant characteristic of the model is the proper balance of theenergy scales assigned between contacts for the C α backbone andthose that represent the non-bonded contributions. In particular,there are three energy scales: the EN-type interactions ( ε EN ), thenative ( ε cm ) and the effective harmonic-type interactions ( ε harm ). ε EN = CR , which corresponds to 12.250 kJmol − for R c = . nm, and ε cm = . kJmol − , which reﬂects the strength of hy-drogen bond in proteins . ε harm is about 2.0 kJmol − . Hence- forth, we will simply refer to our model as Generalized ElasticNetwork (GEN) model. We have also investigated other pos-sibilities, such as excluding the effective harmonic interactions U LJ ( ε harm ) and substituting all effective harmonic terms with stan-dard LJ potentials ( ε ij = ε cm , “M1” model) or eliminating all con-tacts beyond 1–4 [ U LJ ( ε harm ) ] and keeping only the native con-tacts [ U LJ ( ε cm ) ] based on the overlap criterion (“M2” model) .Finally, we have also considered the case that we have all ENharmonic bonds irrespective of the sequential distance betweenresidues as in the standard EN model, but those contacts that co-incide with the native contacts derived from the overlap criterionwill be substituted by LJ potentials U LJ ( ε cm ) resulting in what wewill simply refer to as the “M3” model. In the following, we willdiscuss these models for the sake of comparison with the GENand EN models on the basis of the different number of harmonic,effective harmonic and G¯o contacts (see Table 2). We used the GROMACS package to perform standard Nor-mal Mode Analysis (NMA) . The output of NMA is independent(normal) modes (harmonic motions) characterized by an eigen-value (characteristic frequency). Each normal mode acts as asimple harmonic oscillator of a concerted motion of atoms with-out moving the center of mass with all atoms passing throughtheir equilibrium position at the same time. Moreover, normalmodes resonate independently and can be obtained directly bydata obtained from vibrational spectroscopy. In practice, the nor-mal modes are the eigenvectors of the Hessian matrix, which rep-resents the force constants between every possible pair of residuesin the system in all directions of the Cartesian coordinate system.The mean square ﬂuctuations of the C α atoms are calculated Journal Name, [year], [vol.] ,,

To validate our model, we have calculated the B-factors for sev-eral target proteins by using NMA , which are proportional tothe mean square ﬂuctuations of atom positions. We juxtaposedour results with those of Tirion , which were also obtained byusing the same EN approach (Fig. 2). The results shown in Fig. 2were obtained for a cutoff distance of R c = . nm, but, for othermodels considered in this study, we have also investigated differ-ent cutoffs, namely, R c = R c = providing a the-oretical validation of the GEN model in the case of the chosen setof proteins. Another EN model, the so-called Gaussian Networkmodel (GNM) , is able to reproduce closer the experimental re-sults of B-factor related to larger protein complexes. However,this may not be the case for atomic ﬂuctuations derived fromall-atom simulation . Moreover, the GNM cannot be used in simulation and, therefore, the one-to-one correspondence withshort-time atomic ﬂuctuations is not conceived in this formalism.In addition, this model introduces additional concepts from theelastic theory of random polymer network and it is found to bemore appropriate to reproduce available experimental ﬂuctuationdata reported by the B-factors in the case of G-actine. Yet, evenmodels based on this assumption are not reliable for describinglarge conformational changes as they rely by construction on the“unbreakable” harmonic bonds. (a) Residue B ( Å ) ENGENM1M2M3 ω ( c m - ) Mode Number

Residue(b) B ( Å ) ENGENM1M2M3 ω ( c m - ) Mode Number (c)

Residue B ( Å ) ENGENM1M2M3 ω ( c m - ) Mode Number

Fig. 2

The B-factors for the lowest 30 modes for the proteins with PDB ID:5RSA (Ribonuclease-A, a) , 5PTI (Bovine Pacreatic Trypsin Inhibitor, b),and 1ATN (G-actine, c). The results obtained from different models areillustrated as indicated. Here, R c = ω (cm − ) for the lowest 30 modes. To further validate the GEN model, we have compared it withthe standard EN and other possible versions of a breakable ENmodel (M1, M2 and M3) for a number of different protein struc-tures determined by X-Ray diffraction and within 1-2 Å of resolu-tion. For this purpose, we performed extensive tests on the follow-ing proteins: The ﬁrst one is about 124 residues (PDB ID: 5RSA)and corresponds to ribonuclease A , the second one is obtainedfrom the bovine pancreatic trypsin inhibitor with a length of 58residues (PDB ID: 5PTI), and the last one corresponds to a muscle Journal Name, [year], [vol.] , able 2 Total number of bonds/contacts for different models as indicated. Here, the columns are as follows: “EN” indicates the number of EN bonds,“cm” the number of native contacts by using the contact map (overlap criterion), and the “Eff. Harm. (LJ)” indicates the number of effective-harmoniccontacts. R c = PDB ID:1AOH PDB ID:1TITModel Nr. of bonds/contacts EN cm Eff. Harm. (LJ) EN cm Eff. Harm. (LJ)Elastic Network 1131 – – 632 – –GEN 352 349 430 203 156 273M1 352 779 – 203 429 –M2 352 349 – 203 156 –M3 782 349 – 476 156 –protein (G-actine) with 373 residues ((PDB ID: 1ATN). The ﬁrsttwo protein chains are made by helices and beta-strands while thelast chain is folded so as to form two large domains joined by anarrow neck region. These two domains are partly held togetherby salt bridges and hydrogen bonds provided by a nucleotide thatstabilizes the two domains. Our NMA results are presented inFig. 2 along with their corresponding frequency data. Clearly, theGEN model exhibits the best agreement with the EN, again indi-cating the very good approximation of our harmonic terms withappropriate effective LJ interactions and the small inﬂuence of thenative LJ contacts in the model. We have also checked a numberof additional proteins and we have found consistent results anda similar agreement between the GEN and the EN models. More-over, the M3 model, which has undergone a small modiﬁcationby including the native LJ contacts in the EN, exhibits obviouslyalmost absolute agreement with the EN, whereas the M1 and M2models show considerable deviation from the EN, due to the lackof a large number of harmonic or effective-harmonic interactions[ U LJ ( ε harm ) ] (see Table 2). This shows that the U LJ ( ε cm ) termsare not enough to preserve the structure and properties of thetargeted proteins without assuming extra terms that contributeto backbone stiffness ( e.g. bond and dihedral angles). Moreover,the latter terms require tuning, as in the case of G¯o-like models.For a comparison of 64 different G¯o models, see Ref. . As our aim here is to propose an as simple and accurate as possi-ble model for studying mechanical unfolding of proteins, we havecarried out pulling simulations in the same manner as in the caseof single molecule studies performed with AFM . We usedimplicit solvent conditions similar to ref. . Overall, the correctredistribution of contacts between the above three categories (seeTable 2) results in the excellent agreement of our simulations re-sults (GEN model) with the experimental maximum pulling force, F max , in pulling simulations as is shown here for two examples,cohesin (PDB ID: 1AOH) (Fig. 3) and I27 domain of titin (PDBID: 1TIT) (see Fig. S1 in SI). The temperature during pullingsimulations is 0.3 ε cm / k B . The early unfolding scenario at the ex-perimental pulling speed, which gives rise to the assessment ofthe mechanical properties of proteins, is difﬁcult to achieve byusing all-atom simulations, because the time-scale involved is tooshort for stretching proteins in the case of all-atom methods. Inparticular, the typical speed used to stretch proteins in all-atom simulations is nowadays of the order of − nm/ps and theexperimental cantilever speed is around − nm/ps . In thisregard, the CG nature of our approach can be used to study arange of speeds much closer to the experimental conditions incomparison with all-atom models.Multiple proteins are linked sequentially and one can typicallyobserve a number of corresponding peaks, which signal the fullunfolding of individual protein modules. Due to the space res-olution, intermediate unfolding states are not detected in AFMexperiments . However, by using CG models one can usually ac-cess these intermediate states with a better resolution and assignto each of them a force peak . The largest of these force peaks,F max , deﬁnes the characteristic unfolding force for the whole pro-tein domain.The GEN model is the best to reproduce the experimental rup-ture force for cohesin and the I27 domain of titin (Fig. 3). Inparticular, the maximum force is 480 ±

14 and 204 ±

30 pN for1AOH, and 1TIT, respectively (see SI). The M2 model provideda much lower force peak, while a much higher one was observedin the case of the M1 model. This can be explained in terms ofthe number of contacts associated with the LJ interactions (seeFig. 3, bottom panel), which in the case of M2 model appears tobe smaller, but in the case of M1 is much larger (Table 2). In ad-dition, in the case of M3 and EN models there is no peak due tothe presence of the harmonic bonds between the C α atoms thatprevent the unfolding even at very large stretching forces. Theunfolding pathway for 1AOH protein has been previously charac-terized by a G¯o-like model and experiment . It is known thatthe detachment of β ( − ) from β ( − ) domains occursat the same position of the maximum force in the F - d plot. Wehave carried out the analysis of native contacts between pairs of β -strands which are responsible for stabilizing the protein (seeSI). Our results capture the sequential detachment of the sec-ondary structures that give rise to the largest force peak. Ourobservation agrees well with the breaking of native contacts be-tween β and β strands. The characterization of the unfoldingpathway for titin also agrees with the experimental results and is shown in SI. Moreover, proteins do not show any spuri-ous effects with respect to their structure ( e.g. local aggregation)during the stretching due to the presence of harmonic bonds inthe structure for our GEN model. We have further conﬁrmed ourconclusions by investigating a number of different proteins.The mechanostability of the DDFLN4 protein (with PDB ID:1KSR) has been studied experimentally and theoretically Journal Name, [year], [vol.] ,,

480 pN

ENGENM1M2M3

Fig. 3

Top panel shows the plot of force vs. cantilever displacement, d ,for the type I cohesin domain (PDB ID: 1AOH). Also, the experimentalvalue for the maximum force, which is 480 ±

14 pN in the case of 1AOHis indicated by a horizontal line. The bottom panel shows two snapshotsat d = 0 nm and d = 10 nm. In the middle, we highlight the contactsresponsible for the ﬁrst peak (in blue) in GEN model and the additionalyellow contacts that are present in the M1 model. We have performed asimilar analysis for the I27 domain of titin (see SI). with all-atom simulation by Kouza et al. . . In experiment, twopeaks were observed in the force–displacement curve: one waslocated at d = nm and another one at nm. Here, we testedthe performance of the GEN model against a standard G¯o-likemodel for this protein. The G¯o-like model captures the ﬁrst ex-perimental peak approximately at d = nm, but it misses thesecond peak (see Figure 4). This is due to the lack of additionalfar distance contacts, which are only included by the GEN model(in GEN model these are treated by the harmonic approximation).In this regard, the GEN model performs better than the G¯o-likemodel as it reproduces both experimental peaks. Moreover, theGEN model is as good as the all-atom model in predicting me-chanical unfolding intermediates because both models providethree peaks at nearly the same positions. Note that in our CGsimulations we obtained an earlier force peak at d = nm, whichis consistent with all-atom simulation . However, this peak hasnot been detected in experiment. F o r c e [ p N ] d [nm]

1 KSR

200 pN

ENGEN F o r c e [ p N ] d [nm] Go-like

Fig. 4

Top panel shows the force–displacement proﬁles for domain 4 ofthe DDFLN4 protein. The results were obtained by using the GEN model(blue line), the EN model (black line) and the G¯o model (inset) for pullingspeed v= × − nm/ τ , where τ is approximately 1ns. Results were av-eraged over 40 trajectories. Arrows refer to positions of the second peakat d = nm, which is expected to be the same as in experiments .Bottom panel illustrates a simulation snapshot for d = nm, where thecontacts that stabilize the structures are in blue (G¯o-like) and in red (har-monic). The N- and C- termini beads are shown in green and the pullingdirection is denoted by arrows. The data points for the G¯o-like modelwere extracted by using g3data software from ref. . We are presenting here the folding process of two, well docu-mented in the literature, small peptides, namely, an α -helix com-prising the sequence segment 70–83 of the protein HPr from Es-cherichia coli (PDB ID: 1HDN with 85 residues in total) anda β -hairpin (residues 41–56 of the immunoglobulin binding do-main of streptococcal protein G with PDB ID: 1GB1 and 56residues in total). Initial conﬁgurations for MD simulation are un-folded conformations without any G¯o-like or effective harmoniccontacts. According to a standard criterion that is commonly usedin the case of G¯o-like models, the G¯o-like contacts are present inthe structure when the actual distance between two C α atomsis smaller than 1.5 σ ij , where r is the distance between two C α atoms that form a contact in the native conformation. Unfoldedstructures were obtained by heating up the system at 500 K with-out water and making sure that no native contacts are presentin the protein structure by using the above criterion based onthe distance between the C α atoms in the native structure. Inthis way, we produced initial conﬁgurations for 50 statisticallyindependent MD trajectories of length 200 τ at 300 K. Figure 5shows the convergence of the total number of contacts towardsthe folded structure.The GEN model and two of its variants (M1 and M2) allowfor the study of protein folding, whereas the EN model appar-ently is not suitable for folding studies, due to the presence ofthe harmonic bonds. Our results and typical snapshots of un-folded and well-folded (native) structures are presented in Fig-ure 5. In the case of the α -helix, the folding did occur in allindependent trajectories, whereas about 5% of the trajectories inthe case of β -hairpin did not reach the native structure within Journal Name, [year], [vol.] , N C [ % ] GEN β -hairpin α -helix N C [ % ] M1 N C [ % ] M2 N C [ % ] M3 t [100 τ ] Fig. 5

Folding of two small peptides, a β -hairpin and an α -helix. Plotsshow the percentage of G¯o-like and harmonic contacts present at a cer-tain time during the folding process for each case and breakable model.Top panel for GEN model, middle panels for M1 model and M2 and bot-tom panel for M3 model, as indicated. A typical folding time for the α -helixis about 10 τ , while for the β -hairpin it is about 70 τ in the GEN model.Snapshots indicate examples of an unfolded state at the beginning of thesimulation and a ﬁnal folded (native) structure for each peptide. The C α atoms are represented by grey color. Native and harmonic contacts aredescribed by solid blue lines. the time scale of the simulation. Assuming that the time unit τ =1 ns, we obtained the folding times for β -hairpin and α -helixequal to t β f old = 70 ns and t α f old = 10 ns , respectively. Accord-ing to the temperature-jump ﬂuorescence experiment by Munoz et al. , t β f old ≈ µ s. Since the folding time of α -helix is about0.7 µ s the experimental ratio t β f old / t α f old ≈

9, which is close tothe value of 7, which was obtained from our simulations. In thisregard, the other models underestimate the ratio. For instance,M1 and M2 give approximately 2 and 1.3, respectively. Moreover,the M3 model does not capture any distinction between both pep-tides. This is due to the presence of the harmonic contacts thatplace the unfolded state in a very high energy state. This inducesa rapid process towards the folded state just after energy mini- mization. Still, the absolute folding time in GEN is much shorterthan the real time, due to the coarse-graining, while all-atom sim-ulations from different groups have predicted values in the range1–7 µ s . The fact that folding of the α -helix is about seventimes faster than the folding of the β -hairpin in the GEN modelis also in agreement with our previous study , whereas M1 andM2 lead to a shorter time scale separation. In conclusion, this work presents a simple and apparently accu-rate model based on the EN approach for studying large confor-mational changes in proteins. Here, we have shown that the GENmodel, despite its simplicity, maintains a close match with theEN, while it reproduces with high accuracy the maximum forcein AFM-pulling experiments and the folding time scales of pep-tides. On the one hand, there are several limitations in mod-eling proteins by CG approaches in particular due to the lackof details ( e.g., solvent effect, amino acid speciﬁcity, etc. ) andthus we do not expect to capture all possible effects that stabi-lize protein complexes. However, our model handles native inter-actions with simplicity (G¯o-like potentials), which is crucial forenabling conformational changes. Moreover, the effective har-monic interactions described by LJ potentials and the native G¯o-like potentials prevent the steric clashes during the studies carriedout in the present work. However, more sophisticated functionalforms of non-native interactions could be included a posteriori and their effect may be relevant in other applications. The GENmodel has enabled the study of protein folding conﬁrming thetimescale separation (about seven-fold difference in folding timebetween the α -helix and the β -hairpin). On the other hand, ourmodel uses a reduced number of parameters in comparison withany structured-based CG model that enables large conformationalstudies, while its foundation is based on the simple EN modelwith no assumptions about backbone connectivity. As we haveshown, the GEN model provides the same number of peaks inthe force–displacement proﬁle as observed in the case of the all-atom models for the DDFLN4 protein. This result highlights theadvantage of our model over standard Go-like models. In per-spective, one can interface the GEN model with knowledge-basedand free-energy derived potentials for the study of protein aggre-gation phenomena. It could also be used to study denaturationphenomena, for example, due to large changes in temperature orpressure. Such and other phenomena could possibly be describedby our simple EN-type model and it would be interesting to checkin the future the prediction power of GEN for different proteinsystems. Acknowledgements

This research has been supported by the National Sci-ence Centre, Poland, under grant No. 2015/19/P/ST3/03541,No. 2015/19/B/ST4/02721, and No. 2017/26/D/NZ1/00466.This project has received funding from the European Union’sHorizon 2020 research and innovation programme under theMarie Skłodowska–Curie grant agreement No. 665778. This re-search was supported in part by PLGrid Infrastructure.

Journal Name, [year], [vol.] ,,

Journal Name, [year], [vol.] ,, eferences Phys. Rev. Lett. , 1996, , 1905–1908.2 P. Setny and M. Zacharias, J. Chem. Theory Comput. , 2013, ,5460–5470.3 M. T. Zimmermann and R. L. Jernigan, RNA , 2014, , 792–804.4 G. Pinamonti, S. Bottaro, C. Micheletti and G. Bussi, NucleicAcids Res. , 2015, , 7260–7269.5 M. H. Kim, D. Kim, J. B. Choi and M. K. Kim, Phys. Chem.Chem. Phys. , 2014, , 15263–15271.6 D. C. Glass, K. Moritsugu, X. Cheng and J. C. Smith, Biomacro-molecules , 2012, , 2634–2644.7 Q. Cui and I. Bahar, Normal Mode Analysis. Theory and ap-plications to biological and chemical systems. , Chapman &Hall/CRC, 2006.8 I. Bahar, A. R. Atilgan and B. Erman,

Fold. Design , 1997, ,173–181.9 T. Haliloglu, I. Bahar and B. Erman, Phys. Rev. Lett. , 1997, ,3090.10 K. Hinsen, Proteins , 1998, , 417–429.11 A. Hinsen, K. Thomas and M. Field, Proteins , 1999, , 369–382.12 A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Ke-skin and I. Bahar, Biophys. J. , 2001, , 505–515.13 F. Tama and Y.-H. Sanejouand, Protein Eng. , 2001, , 1.14 M. K. Kim, R. L. Jernigan and G. S. Chirikjian, Biophys. J. ,2002, , 1620–1630.15 Y. Feng, L. Yang, A. Kloczkowski and R. L. Jernigan, Proteins:Struct., Funct., Bioinf. , 2009, , 551–558.16 A. Das, M. Gur, M. H. Cheng, S. Jo, I. Bahar and B. Roux, PLOS comput. Biol. , 2014, , e1003521.17 M. Tekpinar and W. Zheng, Proteins: Struct., Funct., Bioinf. ,2010, , 2469–2481.18 C. Clementi, H. Nymeyer and J. N. Onuchic, J. Mol. Biol. ,2000, , 937–953.19 J. Karanicolas and C. L. Brooks,

Protein Sci. , 2002, , 2351–2361.20 A. B. Poma, M. Cieplak and P. E. Theodorakis, J. Chem. TheoryComput. , 2017, , 1366–1374.21 A. Van Wynsberghe, G. Li and Q. Cui, Biochemistry , 2004, ,13083–13096.22 Z. Bagci, A. Kloczkowski, R. L. Jernigan and I. Bahar, Proteins:Struct., Funct., Bioinf. , 2003, , 56–67.23 R. Fletcher and M. J. Powell, Comput. J. , 1963, , 163–168.24 D. S. Kershaw, J. Comput. Phys. , 1978, , 43–65.25 X. Periole, M. Cavalli, S.-J. Marrink and M. A. Ceruso, J. Chem.Theory Comput. , 2009, , 2531–2543.26 M. Rief, M. Gautel, F. Oesterhelt, J. M. Fernandez and H. E.Gaub, Science , 1997, , 1109–1112.27 M. S. Kellermayer, S. B. Smith, H. L. Granzier and C. Busta-mante,

Science , 1997, , 1112–1116.28 J. Sułkowska, A. Kloczkowski, T. Sen, M. Cieplak and R. Jerni-gan,

Proteins , 2008, , 45–60. 29 S. Kumar and M. S. Li, Phys. Rep. , 2010, , 1 – 74.30 N. Becker, E. Oroudjev, S. Mutz, J. P. Cleveland, P. K. Hansma,C. Y. Hayashi, D. E. Makarov and H. G. Hansma,

Nat. Mater. ,2003, , 278–283.31 L. D. Landau and E. Lifshitz, Course of Theoretical Physics ,1986, , 109.32 S. E. Jackson, Fold. Des. , 1998, , R81–R91.33 S. Benjwal, S. Verma, K.-H. Röhm and O. Gursky, Protein Sci. ,2006, , 635–639.34 N. Hillson, J. N. Onuchic and A. E. García, Proc. Natl. Acad.Sci. U.S.A. , 1999, , 14848–14853.35 D. E. Otzen, Biophys. J. , 2002, , 2219–2230.36 Y. Zhao and M. Cieplak, Phys. Chem. Chem. Phys. , 2017.37 A. Poma, M. Chwastyk and M. Cieplak,

J. Phys. Chem. B , 2015, , 12028–12041.38 M. Cieplak and T. X. Hoang,

Biophys. J. , 2003, , 475–488.39 J. Tsai, R. Taylor, C. Chothia and M. Gerstein, J. Mol. Biol. ,1999, , 253–66.40 J. I. Sulkowska and M. Cieplak,

Biophys. J. , 2008, , 3174–91.41 N. G¯o and H. Taketomi, Proc. Natl. Acad. Sci. U.S.A. , 1978, ,559–563.42 N. G¯o and H. Abe, Biopolymers , 1981, , 991–1011.43 T. X. Hoang and M. Cieplak, J. Chem. Phys. , 2000, , 8319–8328.44 J. I. Sułkowska and M. Cieplak,

J. Phys. Condens. Matter ,2007, , year.45 H. Berendsen, D. van der Spoel and R. van Drunen, Comput.Phys. Commun. , 1995, , 43–56.46 D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. Markand H. Berendsen, J. Comput. Chem. , 2005, , 1701–1718.47 B. Hess, C. Kutzner, D. van der Spoel and E. Lindahl, J. Chem.Theory Comput. , 2008, , 435–447.48 S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apos-tolov, M. Shirts, J. Smith, P. Kasson, D. van der Spoel, B. Hessand E. Lindahl, Bioinformatics , 2013, , 845–854.49 M. Carrion-Vazquez, A. F. Oberhauser, S. B. Fowler, P. E.Marszalek, S. E. Broedel, J. Clarke and J. M. Fernandez, Proc.Natl. Acad. Sci. U.S.A. , 1999, , 3694–3699.50 P. E. Marszalek, H. Lu, H. Li, M. Carrion-Vazquez, A. F. Ober-hauser, K. Schulten and J. M. Fernandez, Nature , 1999, ,100–103.51 I. Bahar and A. J. Rader,

Curr. Opin. Struct. Biol. , 2005, ,586–592.52 E. Fuglebakk, N. Reuter and H. K, J. Chem. Theory Comput. ,2013, , 5618–5628.53 P. Flory, Proc. R. Soc. Lond. , 1976, , 351–380.54 A. Valbuena, J. Oroz, R. Hervás, A. M. Vera, D. Rodríguez,M. Menéndez, J. I. Sulkowska, M. Cieplak and M. Carrión-Vázquez,

Proc. Natl. Acad. Sci. U.S.A. , 2009, , 13791–13796.55 A. Wlodawer, N. Borkakoti, D. Moss and B. Howlin,

Acta Crys-tallogr. B , 1986, , 379–387.56 A. Wlodawer, J. Walter, R. Huber and L. Sjölin, J. Mol. Biol. , Journal Name, [year], [vol.] , , 301–329.57 W. Kabsch, H. G. Mannherz, D. Suck, E. F. Pai and K. C.Holmes, Nature , 1990, , 37.58 M. Cieplak, T. X. Hoang and M. O. Robbins,

Proteins: Struct.,Funct., Genet. , 2002, , 114–124.59 M. Sotomayor and K. Schulten, Science , 2007, , 1144–1148.60 P. E. Marszalek, H. Lu, H. Li, M. Carrion-Vazquez, A. F. Ober-hauser, K. Schulten and J. M. Fernandez,

Nature , 1999, ,100–103.61 M. Wojciechowski, P. Szymczak, M. Carrión-Vázquez andM. Cieplak,

Biophys. J. , 2014, , 1661–1668.62 A. Valbuena, J. Oroz, R. Hervás, A. M. Vera, D. Rodríguez,M. Menéndez, J. I. Sulkowska, M. Cieplak and M. Carrión-Vázquez,

Proc. Natl. Acad. Sci. U.S.A. , 2009, , 13791–13796.63 S. B. Fowler, R. B. Best, J. L. T. Herrera, T. J. Rutherford,A. Steward, E. Paci, M. Karplus and J. Clarke,

J. Mol. Biol. ,2002, , 841–849.64 R. B. Best, S. B. Fowler, J. L. T. Herrera, A. Steward, E. Paciand J. Clarke,

J. Mol. Biol. , 2003, , 867–877.65 I. Schwaiger, M. Schleicher, A. A. Noegel and M. Rief,

EMBOreports , 2005, , 46–51.66 I. Schwaiger, A. Kardinal, M. Schleicher, A. A. Noegel and M. Rief, Nat. Struct. Mol. Biol. , 2004, , 81.67 J. Frantz, , 2009.68 M. Sikora, J. I. Sułkowska and M. Cieplak, PLOS Comput.Biol. , 2009, , e1000547.69 M. Kouza, C.-K. Hu, H. Zung and M. S. Li, J. Chem. Phys. ,2009, , 12B608.70 N. A. van Nuland, I. W. Hangyi, R. C. van Schaik, H. J. Berend-sen, W. F. van Gunsteren, R. M. Scheek and G. T. Robillard,

J.Mol. Biol. , 1994, , 544–559.71 A. M. Gronenborn, D. R. Filpula, N. Z. Essig, A. Achari,M. Whitlow, P. T. Wingﬁeld and G. M. Clore,

Science , 1991, , 657–661.72 V. Munoz, P. A. Thompson, J. Hofrichter and W. A. Eaton,

Nature , 1997, , 196–199.73 J. Kubelka, J. Hofrichter and W. A. Eaton,

Curr. Opin. Struct.Biol. , 2004, , 76–88.74 P. H. Nguyen, G. Stock, E. Mittag, C. K. Hu and M. S. Li, Proteins , 2005, , 795–808.75 B. Zagrovic, E. J. Sorin and V. Pande, J. Mol. Biol. , 2001, ,151–169.76 A. E. Garcia and K. Y. Sanbonmatsu,

Proteins , 2001, , 345–354. Journal Name, [year], [vol.] ,,