[PDF] Effective harmonic potentials: insights into the internal cooperativity and sequence-specificity of protein dynamics

Abstract

The proper biological functioning of proteins often relies on the occurrence of coordinated fluctuations around their native structure, or of wider and sometimes highly elaborated motions. Coarse-grained elastic-network descriptions are known to capture essential aspects of conformational dynamics in proteins, but have so far remained mostly phenomenological, and unable to account for the chemical specificities of amino acids. Here, we propose a method to derive residue- and distance-specific effective harmonic potentials from the statistical analysis of an extensive dataset of NMR conformational ensembles. These potentials constitute dynamical counterparts to the mean-force statistical potentials commonly used for static analyses of protein structures. In the context of the elastic network model, they yield a strongly improved description of the cooperative aspects of residue motions, and give the opportunity to systematically explore the influence of sequence details on protein dynamics.

Full PDF

EEﬀective harmonic potentials: insights into the internalcooperativity and sequence-speciﬁcity of protein dynamics

Yves Dehouck , and Alexander S. Mikhailov Department of Physical Chemistry, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg4-6, 11495 Berlin, Germany. Department of BioModelling, BioInformatics and BioProcesses, Universit´e Libre de Bruxelles(ULB), CP165/61, Av. Fr. Roosevelt 50, 1050 Brussels, Belgium. [email protected] [email protected]

Abstract

The proper biological functioning of proteins often relies on the occurrence of coordinated ﬂuctuationsaround their native structure, or of wider and sometimes highly elaborated motions. Coarse-grained elastic-network descriptions are known to capture essential aspects of conformational dynamics in proteins, but haveso far remained mostly phenomenological, and unable to account for the chemical speciﬁcities of amino acids.Here, we propose a method to derive residue- and distance-speciﬁc eﬀective harmonic potentials from thestatistical analysis of an extensive dataset of NMR conformational ensembles. These potentials constitutedynamical counterparts to the mean-force statistical potentials commonly used for static analyses of proteinstructures. In the context of the elastic network model, they yield a strongly improved description of thecooperative aspects of residue motions, and give the opportunity to systematically explore the inﬂuence ofsequence details on protein dynamics.

Introduction

Deciphering the motions that underlie many aspects ofprotein function is a major current challenge in molecu-lar biology, with the potential to generate numerous ap-plications in biomedical research and biotechnology. Al-though molecular dynamics (MD) hold a prominent po-sition among computational approaches, considerable ef-forts have been devoted to the development of coarse-grained models of protein dynamics [1]. Besides their abil-ity to follow motions on time scales that are usually notaccessible to MD simulations, these models also give thepossibility to better understand the general principles thatrule the dynamical properties of proteins. 13 ˚A.The elegant simplicity of the elastic network models(ENM) certainly contributed to their popularity, and theyhave been successfully exploited in a wide range of ap- plications [2, 3]. In these models, the residues are usu-ally represented as single particles and connected to theirneighbors by Hookean springs [4, 5]. The input structureis assumed to be the equilibrium state, i.e. the globalenergy minimum of the system. Common variants in-clude the homogeneous ENM, in which springs of equalstiﬀness connect pairs of residues separated by a distancesmaller than a predeﬁned cutoﬀ, and other versions inwhich the spring stiﬀness decays as the interresidue dis-tance increases [6–8]. In all cases, the equations of motioncan be either linearized around equilibrium, to perform anormal mode analysis of the system [9–11], or integratedto obtain time-resolved relaxation trajectories [12, 13].Despite their many achievements, purely structuralENM also come with severe limitations. Notably, model-ing the possible eﬀects of mutations within this frameworkusually requires random local perturbations of the spring1 a r X i v : . [ q - b i o . B M ] A p r onstants [14], or a more drastic removal of links fromthe network [15]. A few attempts have been made to in-clude sequence-speciﬁcity in the ENM by setting the springconstants proportional to the depth of the energy min-ima, as estimated by statistical contact potentials [16, 17].However, this approach cannot be extended to distance-dependent potentials, for they are not consistent with theground hypothesis of the ENM, i.e. that all pairwise in-teraction potentials are at their minimum in the nativestructure. Other studies have led to the conclusion thatthe ENM behave as entropic models dominated by struc-tural features, and that the level of coarse-graining is prob-ably too high to incorporate sequence details [5, 18]. Still,the chemical nature of residues at key positions can havesigniﬁcant eﬀects on the main dynamical properties of aprotein. Hinge motions [19], for instance, obviously re-quire some architectural conditions to be fulﬁlled, such asthe presence of two domains capable of moving relativelyindependently. But the amplitude and preferred directionof the motion are most likely determined by ﬁne tuningof speciﬁc interactions in the hinge region. In proteinssubject to domain swapping, the hinge loops have indeedbeen shown to frequently include residues that are not op-timal for stability [20]. The importance of the amino acidsequence has also been repeatedly emphasized by experi-mental studies of the impact of mutations on the confor-mational dynamics of proteins [21–23].A major obstacle to the deﬁnition of accurate coarse-grained descriptions of protein dynamics lies in the highlycooperative nature of protein motions, which makes it dif-ﬁcult to identify the properties of the individual buildingblocks independently of the overall architecture of eachfold. By condensing the information contained in a mul-titude of NMR ensembles, we build here a mean proteinenvironment, in which the behavior of residue pairs canbe tracked independently of each protein’s speciﬁc struc-ture. This methodology brings an eﬃcient way of assessingcoarse-grained models of protein dynamics and of derivingeﬀective energy functions adapted to these models. In thecontext of the ENM, we identify a set of spring constantsthat depend on both the interresidue distances and thechemical nature of amino acids, and that markedly im-prove the performances of the model. Results

Dynamical properties of proteins from the per-spective of an average pair of residues

The mean-square ﬂuctuations of individual residues(MSRF) have been extensively relied on to characterizeprotein ﬂexibility and to evaluate coarse-grained models ofprotein dynamics [24], in part because of their widespreadavailability as crystallographic B-factors. However, since

A BC D E

Individual fluctuationsPairwise fluctuations< Δ R A2 > = 1.00< Δ R B2 > = 1.22< Δ R C2 > = 0.57< Δ R D2 > = 1.01< Δ R E2 > = 0.89 γ AB = 1.05 γ CD = 1.18 γ AE = 0.48 γ AB = 0.97 γ CD = 1.09 γ AE = 0.61°°° Figure 1:

Schematic illustration of the apparent stiﬀness γ . Asimple model containing 8 beads connected by elastic springswas subjected to 10 integration steps under Gaussian noise.Selected values of < (∆ R i ) > , γ and γ ◦ are given in arbitraryunits. Individually, the pairs A-B and C-D would be identi-cal, but they experience diﬀerently the inﬂuence of the otherbeads. As a result, the C-D pair is eﬀectively more rigid thanA-B ( γ AB < γ CD ). In both cases, the motions are somewhatcorrelated, as the apparent stiﬀness γ is larger than what isexpected from the knowledge of their individual motions ( γ ◦ ).Beads A and E do not interact directly but the eﬀect of thenetwork on their relative motions is captured by the values of γ AE and γ ◦ AE . the MSRF carry little information about the cooperativenature of residue motions, we propose to examine thedynamical behavior of proteins from the perspective ofresidue pairs rather than individual residues. Informa-tion about the ﬂuctuations of interresidue distances is con-tained in the data of NMR experiments for numerous pro-teins, and will be exploited here. We deﬁne the apparentstiﬀness of a pair of residues i , j in a protein p : γ pij = 2 k B T /σ r pij (1)where k B is the Boltzmann constant, T the temperature,and σ r pij the variance of the distance r between residues i and j , in a structural ensemble representative of the equi-librium state. γ pij is deﬁned up to a multiplicative factor,which corresponds to the temperature. We also introducethe uncorrelated apparent stiﬀness γ ◦ pij , to quantify theimpact of the individual ﬂuctuations of residues i and j on the ﬂuctuations of the distance that separates them.This is achieved by using σ ◦ r pij instead of σ r pij in eq. ,where σ ◦ r pij is computed after exclusion of all correlationsbetween the motions of residues i and j ( Methods ).As illustrated in Figure 1, γ can be quite diﬀerent fromone residue pair to another. Indeed, besides the impactof direct interactions, γ is also strongly dependent on theoverall fold of the protein, and on the position of the pairwithin the structure. To remove the speciﬁc inﬂuence ofeach protein’s architecture, we deﬁne the apparent stiﬀnessin a mean protein environment γ ( s, d ):2 ( s, d ) = 2 k B Tσ r ( s, d ) , with σ r ( s, d ) = (cid:80) Pp =1 (cid:80) N p ( s,d ) ij M p σ r pij (cid:80) Pp =1 N p ( s, d ) M p (2)where s is one of 210 amino acid pairs, d the discretizedequilibrium distance between pairs of residues ( d ≤ r pij

Comparison of the experimental and predicted val-ues of the apparent stiﬀness γ ( d ). Experimental values of γ ( d )(continuous black) and γ ◦ ( d ) (dashed black), extracted fromthe dataset of 1500 NMR ensembles. Values of γ ( d ) predictedon the same dataset by the ENM (dashed red); ENM (con-tinuous red); ENM (dashed blue); ENM (continuous blue). (e.g. hydrophobic core vs. exposed surface loops). How-ever, there is only a limited agreement between γ ( s ) and γ ◦ ( s ) (Fig. 3A-B): the correlation coeﬃcient is equal to0.71, and γ ( s ) spans a much wider range of values. Be-yond individual amino acid preferences, the speciﬁcs ofresidue-residue interactions play thus a signiﬁcant role indetermining the extent of cooperativity in residue motions. Accuracy of elastic network models in reproducingthe dynamical properties of proteins

The computation of the apparent stiﬀness of residue pairsin a mean protein environment provides an interesting toolto probe the dynamical properties of proteins. It also gen-erates a very straightforward approach to assess the abilityof coarse-grained models to reproduce accurately this gen-eral behavior.We focus here on four common variants of the residue-based ENM [25, 26], which diﬀer only by the functionalform of the spring constants κ . The dependence of κ onthe interresidue distance r pij is deﬁned by two parame-ters: the cutoﬀ distance l d , above which residues i and j are considered disconnected, and the exponent α thatdetermines how fast κ decreases with increasing distances: κ pij (ENM αl d ) = a p H ( l d − r pij ) r − αpij (3)where H is the Heaviside function. The value of thetemperature-related factor a p is obtained, for each pro-tein independently, by ﬁtting the predicted MSRF withthe experimental ones. This ensures that the amplitudeof the individual ﬂuctuations of the beads in the networkis on average equal to that observed in the corresponding3 γ° (s) B experimental γ (s) C predicted by the ENM γ (s) A experimental Figure 3:

Comparison of the experimental and predicted values of the apparent stiﬀness γ ( s ), in the dataset of 1500 NMRensembles. For each amino acid, the median value of γ ( s ) over the 20 possible partners is given in units of k B T , along withthe maximal, minimal, 1 st and 3 rd quartile values. Outliers from these distributions are depicted as circles. (A) Experimentalvalues of γ ( s ). (B) Experimental values of γ ◦ ( s ). (C) Values of γ ( s ) predicted by the ENM . NMR ensemble, and that the predicted γ ( s, d ) values canthus be directly compared with those extracted from theNMR data. We consider the following models: ENM ,ENM , ENM , ENM . These ENM variants were usedto estimate the value of σ r pij for each pair of residues inthe 1500 proteins of our NMR dataset ( Methods ), and tosubsequently compute γ ( d ) and γ ( s ) from eq. .Strikingly, all ENM variants systematically predict γ ( d )values to be lower than the experimental ones, at least upto interresidue distances of 20-30 ˚A(Fig. 2). These modelsoverestimate thus the amplitude of pairwise ﬂuctuations,relatively to the amplitude of individual ﬂuctuations. Forexample, if two residues in a protein undergo highly cor-related motions, the amount of thermal energy necessaryto induce a moderate variance on the distance betweenthem will generate high variances on their individual co-ordinates. Consequently, if the motions of the beads ofthe ENM are less coordinated, adjusting the scale of thespring constants to reproduce the amplitude of individualﬂuctuations leads to an overestimated variance on the in-terresidue distances, and thus to lower γ ( d ) values. Thisproblem is particularly apparent when κ is assumed todecrease proportionally to the square of the interresiduedistance, in the ENM . Although this model was shownto perform well in predicting MSRF values [8], our resultssuggest that it negates almost completely the coordinatedaspect of residue motions. Indeed, as shown in Figure 2,the γ ( d ) values predicted by this model are very close tothose obtained from the experimental data after removalof the correlations between the motions of the diﬀerentresidues ( γ ◦ ( d )). This observation is consistent with theextremely short atom-atom correlation length character-istic of the ENM , recently estimated on the basis of anX-ray structure of Staphylococcal nuclease [25]. The ENM is often considered as an entropic model, notdetailed enough to include sequence information in a rele-vant way [5,18]. It is therefore hardly surprising that com-mon ENM variants produce a poor description of the se-quence speciﬁcities of protein dynamics. Individual aminoacid preferences for more or less densely connected regionsare responsible for some variety in the predicted values of γ ( s ) (Fig. 3C). However, this variety is far from matchingthe one observed in the experimental data, as shown bya much narrower range of γ ( s ) values, and a limited cor-relation coeﬃcient with the experimental γ ( s ) values, e.g.0.62 for the ENM (Supplementary Fig. 3). Derivation of eﬀective harmonic potentials

Mean-force statistical potentials are commonly used toperform energetic evaluations of static protein structures[27–29]. These potentials do not describe explicitly the”true” physical interactions, but provide eﬀective energiesof interaction in a mean protein environment, in the con-text of a more or less simpliﬁed structural representation.Similarly, within the ENM framework, κ ( s, d ) deﬁnes foreach pair of residues an harmonic interaction potential.This potential is also eﬀective in nature, accounting im-plicitly for everything that is not included in the model(e.g. the surrounding water). Hence, we seek to identifythe value of κ yielding the most accurate reproductionof the dynamical behavior of each type of pair ( s, d ) in amean protein environment, which is conveniently capturedby the apparent stiﬀness γ ( s, d ).For that purpose, let us deﬁne E bond ( s, d ) as the energyof the elastic spring connecting two residues of type ( s, d ),in a mean protein environment:4

10 20

Interresidue distance d (Å) -3 -2 -1 S p r i ng c on s t an t s o f t he d E N M : κ ( d ) Interresidue distance d (Å) -3 -2 -1 S p r i ng c on s t an t s o f t he s d E N M : κ ( s , d ) I-ID-RG-G

Spring constants of the sENM : κ (s) A0.22.3 C D E F G H I K L M N P Q R S T V W YACDEFGHIKLMNPQRSTVWY

A B C

Figure 4:

Eﬀective harmonic potentials. (A) Spring constants of the sENM , for the 210 amino acid pairs. (B) Springconstants of the dENM. The dashed line corresponds to κ ∼ r − . (C) Spring constants of the sdENM for 3 amino-acid pairs.All κ values are given in Supplementary Tables 2-5. E bond ( s, d ) = 12 κ ( s, d ) σ r ( s, d ) = k B T κ ( s, d ) γ ( s, d ) (4)where γ ( s, d ) is the apparent stiﬀness extracted from theexperimental data. E bond ( s, d ) is unknown and is expectedto be diﬀerent for diﬀerent pair types ( s, d ). The knowl-edge of γ ( s, d ) is thus not suﬃcient to estimate directly κ ( s, d ). However, from any approximate set of spring con-stants κ (cid:48) ( s, d ), we may build the ENM for all proteins inour dataset, to reproduce the mean protein environment,and compute for each pair type an estimated value of theapparent stiﬀness, γ (cid:48) ( s, d ), and bond energy, E (cid:48) bond ( s, d ).Since the behavior of a given residue pair is highly de-pendent on its environment, we can make the assump-tion that E (cid:48) bond ( s, d ) is a relatively good approximationof E bond ( s, d ), even if κ (cid:48) ( s, d ) (cid:54) = κ ( s, d ): E (cid:48) bond ( s, d ) = k B T κ (cid:48) ( s, d ) γ (cid:48) ( s, d ) (cid:39) E bond ( s, d ) (5)Indeed, if the spring stiﬀness of a residue pair is underes-timated ( κ (cid:48) < κ ), it will also appear as less rigid in theENM than in the experimental data ( γ (cid:48) < γ ). A more de-tailed discussion is given in Supplementary Note

1. Fromeqs. and , we devise thus an iterative procedure inwhich κ ( s, d ) is updated at each step k by confronting thepredicted values of the apparent stiﬀness, γ k ( s, d ), withthe experimental ones, γ ( s, d ). It is expected to convergewhen γ k ( s, d ) → γ ( s, d ), that is, when the predictions ofthe model agree with the experimental data: κ k +1 ( s, d ) = κ k ( s, d ) γ ( s, d ) γ k ( s, d ) (6)We used this approach to derive, from the NMR data,four novel ENM variants: the distance-dependent dENM ; the sequence-dependent sENM and sENM , with a dis-tance cutoﬀ of 10 and 13 ˚A, respectively, and the sequence-and distance-dependent sdENM ( Methods ). Interestingly,the κ values for the 210 amino acid pairs in the sENM arerelatively well correlated with the corresponding contactpotentials [28], even though they result from totally dif-ferent approaches (Supplementary Fig. 4). Some commongeneral trends can be identiﬁed, e.g. hydrophobic contactstend to be associated with both favorable interaction en-ergies and large κ values (Fig. 4A). However, the overallcorrespondence remains limited, indicating that the deter-minants of protein rigidity and stability are related, butdistinct. The distance dependence of κ in the dENM isremarkably similar to the r − power law that was previ-ously obtained by ﬁtting against a 1.5ns MD trajectoryof a C-phycocyanin dimer [6] (Fig. 4B), although our newmodel presents more detailed features. Notably, κ remainsapproximately constant up to interresidue distances of 5-6˚A, and then drops by about two orders of magnitude toreach a second plateau between 7 and 12 ˚A. The κ valuesof the sdENM are shown in Figure 4C, for a few aminoacid pairs. This model not only combines the strengths ofthe sENM and the dENM, but also reveals the sequencespeciﬁcity of the κ distance dependence. The D-R pair,for example, is almost as rigid as I-I at short distancesconsistent with the formation of a salt bridge, but almostas ﬂexible as G-G at larger distances. Performances of the new ENM

The sdENM yields a much more accurate reproduction ofthe dynamical behavior of residue pairs in a mean proteinenvironment than the common ENM variants, as demon-strated by the good agreement between experimental andpredicted values of γ ( s ) (Fig. 5A, Supplementary Fig. 5),and γ ( d ) (Fig. 5B).5 γ ( s ) p r ed i c t ed b y t he s d E N M γ (s) experimental Correlation coefficient = 0.95 A ppa r en t s t i ff ne ss γ ( d ) ( k B T ) Interresidue distance d (Å) a b c

0% 50%-50% ε (sdENM) - (ENM ) σ ε σ Figure 5:

Performances of the sdENM. (A) Experimental and predicted values of γ ( s ), in the dataset of 1500 NMR ensembles.See also Supplementary Figures 3 and 5. (B) Experimental (continuous) and predicted (dashed) values of γ ( d ), in the dataset of1500 NMR ensembles (black), and in a single protein (PDB 1xqq) (green). See also Supplementary Figure 2. (C) Comparisonof the ability of the sdENM and the ENM to correctly reproduce the ﬂuctuations of a single protein, on the basis of a highquality structural ensemble of human ubiquitin (PDB 1xqq) [30]. 20 randomly selected residue pairs are connected by solidlines. Shades of blue indicate a better performance of the sdENM, while shades of red indicate a better performance of theENM . See also Supplementary Figure 6. Beyond its performances in a mean protein environment,our new model also brings highly notable improvementswith respect to previously described ENM variants whenit is applied to the speciﬁc architecture of a given protein.This is illustrated by two examples, on Figure 5C andSupplementary Figure 6. A more thorough assessment ofthe ability of the diﬀerent ENM variants to capture themotions of individual proteins was performed on an in-dependent dataset of 349 proteins. The correlation coeﬃ-cient between predicted and observed MSRF ( r B ) has beenwidely used in the past but ignores the cooperativity inher-ent to protein dynamics, and presents other shortcomings.Therefore, we introduce a new measure ( (cid:15) σ ) that quanti-ﬁes the relative error on the estimation of the variabilityof the distance between residue pairs, and is thus focusedon the cooperative aspects of residue motions ( Methods ). Table 1:

Performances of diﬀerent ENM variants. ( a ) Aver-age correlation coeﬃcient between experimental and measuredMSRF. ( b ) Average relative error on the ﬂuctuations of inter-residue distances. r ( a ) B (cid:15) ( b ) σ (cid:15) SR σ (cid:15) MR σ (cid:15) LR σ ENM is better at predicting the individual residue ﬂuc-tuations (Table 1). Interestingly, the ENM , with its sim-ple cutoﬀ distance, appears superior when it comes to thereproduction of cooperative motions ( (cid:15) σ = 0 . for individual ﬂuctua-tions ( r B = 0 . for thedescription of cooperativity ( (cid:15) σ = 0 . / with ENM / ,and sdENM with dENM. It consists in a slight improve-ment of the correlation coeﬃcient r B , and a pronounceddecrease of the error (cid:15) σ , especially at short- (0-15 ˚A) andmid- (15-30 ˚A) range. Discussion

For the last decades, statistical potentials extracted fromdatasets of known protein structures [27–29] have playeda critical role in static analyses of protein structures, withmajor applications including structure prediction, protein-protein docking, or rational mutant design. Our studydemonstrates that a similar approach can be taken to de-rive eﬀective energy functions that are speciﬁcally adaptedto the coarse-grained modeling of protein dynamics.More precisely, in the context of the ENM, we exploiteda dataset of 1500 NMR ensembles to determine the valuesof the spring constants that describe best the behavior ofpairs of residues, as a function of both their chemical na-6ure and the distance separating them. The success of ourapproach is attested by a drastic enhancement of the abil-ity to accurately describe the cooperative nature of residuemotions, with respect to previously described ENM vari-ants. Moreover, a deﬁnite advantage of our method is thatthe eﬀective parameters characterizing the strength of thevirtual bonds are directly extracted from the experimen-tal data without any a priori conception of their functionalform. The fact that the distance dependence of the springconstants that we retrieve is quite similar to the r − powerlaw, which was considered so far as underlying one of thebest performing ENM variants [6, 25], also constitutes amajor support to our approach.In our derivation scheme, the virtual bonds areparametrized so as to reproduce the behavior of aminoacid pairs in a mean protein environment. The analysisof the ability of diﬀerent models of protein dynamics todescribe the motions of residues within this environmentsheds an interesting new light on the properties of thesemodels. In particular, our results indicate that previousENM variants underestimate, sometimes dramatically, therigidity of amino acid pairs at short- and mid-range. Ournew model does however provide a much more accuratereproduction of the balance between short-range and long-range coordinated motions. This is arguably a crucial as-pect when considering, for example, the consequences oflocalized alterations induced by ligand binding on signaltransduction or global conformational changes, such as inATP-powered molecular motors.Importantly, our results also demonstrate that the ENMdoes not have to be exclusively structural, and that se-quence details may be allowed to play a major role incoarse-grained descriptions of protein dynamics. Thereby,this study paves the way towards comparative analysesof motions in proteins that share a similar structure butpresent diﬀerences in sequence. Such investigations willprove particularly interesting in the context of the ratio-nal design of (modiﬁed) proteins with controlled dynami-cal properties. Although we focused here on residue-basedelastic network models, our approach is not limited to thisparticular family, and can be readily implemented to eval-uate and optimize other coarse-grained models of proteindynamics. Notably, the impact of chemical speciﬁcity onthe dynamical behavior of residues should be even moreaccurately rendered by eﬀective potentials based on a moredetailed structural description. Methods

NMR Dataset

We retrieved, from the Protein Data Bank [31], ensemblesof at least 20 models from solution NMR experiments, cor-responding to monomeric proteins of at least 50 residues that present at most 30% sequence identity with one an-other. Entries under the SCOP classiﬁcations ”Peptides”or ”Membrane and cell surface proteins” were not consid-ered. The presence of ligands, DNA or RNA molecules,chain breaks, non-natural amino acids, and diﬀerences inthe number of residues per model were also grounds for re-jection. These criteria led to the selection of 1849 distinctstructural ensembles. A subset of 1500 ensembles was ran-domly selected for the main analysis, and the remaining349 were used to assess the performances of the diﬀerentENM variants. Unfolded C- or N-terminal tails were au-tomatically identiﬁed (MSRF values larger than twice theaverage for all residues in the protein) and removed fromconsideration. In each ensemble, the structure with thelowest root mean square deviation from the mean struc-ture, after superposition, is chosen as representative andused to build the ENM.

Elastic network model

The network is built by considering each residue as a sin-gle bead, placed at the position of the corresponding C α atom in the input structure, and connecting neighboringbeads with Hookean springs [4,5]. The ENM variants con-sidered here diﬀer only by the form of the spring constant κ as a function of interresidue distance and of amino acidtypes. In all variants, bonded interactions are describedby a larger value of κ , deﬁned as ten times the value of κ for non-bonded interactions at a separation of 3.5 ˚A,averaged over all amino acid types. The potential energyof the network is given by: U = (cid:80) i < ∆ x i ∆ y j > < ∆ x i ∆ z j >< ∆ y i ∆ x j > < ∆ y i ∆ y j > < ∆ y i ∆ z j >< ∆ z i ∆ x j > < ∆ z i ∆ y j > < ∆ z i ∆ z j >  (8)where ∆ x i , ∆ y i , and ∆ z i correspond to the displacementsof residue i from its equilibrium position, along the three7artesian coordinates. The predicted MSRF of residue i is given by the trace of submatrix C ii : < (∆ R i ) > = < (∆ x i ) > + < (∆ y i ) > + < (∆ z i ) > . Variance of the interresidue distance

For each pair of residues in a given protein p , the experi-mental value of this variance is readily computed from theNMR data: σ r pij = 1 M p M p (cid:88) m =1 ( r pijm − r pij ) (9)where M p is the number of structures in the NMR ensem-ble, r pijm the distance between the C α atoms of residues i and j in structure m of protein p , and r pij the average dis-tance over all M p structures. In the context of the ENM, σ r pij values are estimated from the covariance matrix ofthe spatial coordinates, by standard statistical propaga-tion of uncertainty: σ r pij ≈ J (cid:20) C ii C ij C ji C jj (cid:21) J (cid:62) (10)where J is the Jacobian of the distance r pij as a functionof the six coordinates ( x i , y i , z i , x j , y j , z j ). This estima-tion of σ r pij relies on the validity of the ﬁrst order Taylorexpansion of the distance as function of the coordinatesin the vicinity of the average distance. We ensured thatno systematic bias arose from this approximation (Supple-mentary Fig. 7). To quantify the impact of the individ-ual motions of residues on their relative positions, we useeq. to compute ( σ ◦ r pij ) in an artiﬁcial construct whereresidue motions are not correlated. This is achieved byextracting the covariance matrix from the NMR data, andsetting to zero all submatrices C ij where i (cid:54) = j . Iterative procedure

The values of the spring constants of the new ENM vari-ants were derived from the dataset of 1500 NMR ensem-bles using eq . For the dENM, sENM and sENM , theinitial values of the spring constants were set equal to theexperimental values of the apparent stiﬀness: κ ( d ) = γ ( d )or κ ( s ) = γ ( s ). Note that the γ ( s ) values were computedby considering only residue pairs separated by a distancelower than the cutoﬀ of 10 or 13 ˚A. For the sdENM, the κ ( s, d ) values were set equal to the ﬁnal values of thespring constants in the dENM, κ ( d ), for all amino acidtypes. A correction for sparse data was devised to ensurethat κ ( s, d ) tends to κ ( d ) when the number of residue pairsof type ( s, d ) is too small to obtain relevant estimations of σ r ( s, d ). Instead of eq. , we used the following deﬁnitionto compute both the experimental and predicted apparentstiﬀness: γ ( s, d ) = 2 k B Tσ r ( s, d ) (cid:16) N sd N sd + S (cid:17) + σ r ( d ) (cid:16) S N sd + S (cid:17) (11)where N sd = (cid:80) Pp =1 N p ( s, d ) M p , N p ( s, d ) is the numberof pairs of type ( s, d ) in protein p , M p is the number ofstructures in the NMR ensemble of protein p , and S is anadjustable parameter set to 500.The κ values were rescaled after each iteration step, sothat the average value of κ over all amino acid types isequal to 1 for pairs separated by a distance of 6 ˚A. Residuepairs of a given type ( s, d ) for which κ ( s, d ) < .

001 (afterrescaling), were considered to establish no direct interac-tion: κ ( s, d ) was set to 0, and they were no longer con-sidered in the iterative procedure. The performances ofthe new ENM variants after the ﬁrst nine iteration stepsare reported in Supplementary Table 1. The procedureconverged rapidly for the dENM and the sdENM, and theﬁnal models were selected after 5 and 3 iteration steps,respectively. The sENM variants did not improve signiﬁ-cantly with respect to the initial models ( k = 0), indicat-ing that κ ( s ) = γ ( s ) is a good approximation, contrary to κ ( d ) = γ ( d ). The procedure was thus stopped after oneiteration step, for both the sENM and the sENM . Performance measures

The ability of coarse-grained models to accurately describeprotein dynamics is commonly evaluated by computingthe Pearson correlation coeﬃcient between predicted andexperimental MSRF, < (∆ R i ) > , over all i = 1 , ..., n residues of a given protein: r B = (cid:80) ni =1 ( B expi − B )( B prei − B ) (cid:113)(cid:80) ni =1 ( B expi − B ) (cid:113)(cid:80) ni =1 ( B predi − B ) (12)where, for simplicity, B i was used instead of < (∆ R i ) > .There is indeed a direct relationship between the MSRFand the cristallographic B-factors: B i = (8 π / < (∆ R i ) > . B expi and B prei correspond thus here to theMSRF of residue i extracted from the NMR data and pre-dicted by the ENM, respectively. The scale of the pre-dicted MSRF values depends on the scale of the springconstants, which are only deﬁned up to a constant fac-tor. This factor was determined, for each protein indepen-dently, by ﬁtting the scales of the predicted and experi-mental MSRF, i.e. to ensure that: B = 1 n n (cid:88) i =1 B expi = 1 n n (cid:88) i =1 B prei (13)Although it has been widely used in previous studies, r B is probably not the most adequate measure to evaluate8he performances of coarse-grained models of protein dy-namics. As pointed out previously [24, 25], it does indeedpresent several shortcomings: e.g. it is strongly aﬀectedby the presence of highly ﬂexible regions, and does notaccount for possible ﬂaws leading to an intercept of theregression line diﬀerent from zero. Most importantly, theMSRF describe individual ﬂuctuations but provide no in-formation about the cooperative aspects of residue mo-tions. The quality of the MSRF predictions gives thus noguarantee about the ability of the model to describe thecooperativity of protein dynamics. The ENM providesan interesting example, for it performs quite well in pre-dicting the MSRF but basically negates all cooperativity(Fig. 2, Table 1).Therefore, we introduce a new measure that exploitsthe information contained in the correlation matrix C , toquantify the error on the estimation of the ﬂuctuations ofthe interresidue distances: (cid:15) σ = (cid:118)(cid:117)(cid:117)(cid:116) N p N p (cid:88) ij (cid:32) σ (exp) r pij − σ (pre) r pij σ ◦ r pij (cid:33) (14)where N p is the number of non-bonded residue pairs inprotein p , σ (exp) r pij and σ (pre) r pij are the experimental (eq. )and predicted (eq. ) values of σ r pij , respectively. σ (pre) r pij is obtained after ﬁtting the experimental MSRF with thepredicted ones (eq. ). The error is normalized by σ ◦ r pij ,which is the expected value of σ r pij given the individual,anisotropic, ﬂuctuations of both residues extracted fromthe NMR data, but neglecting all correlations betweentheir respective motions. This normalization ensures thatthe contributions of the diﬀerent pairs of residues areequivalent, and that the measure is not dominated byhighly ﬂexible regions.Both r B and (cid:15) σ are computed independently for each ofthe 349 proteins of our test set, and the average values arereported. We also report the short- ( (cid:15) SR σ ), mid- ( (cid:15) MR σ ), andlong-range ( (cid:15) LR σ ) contributions to (cid:15) σ , obtained by consid-ering only pairs separated by 0-15 ˚A, 15-30 ˚A, and morethan 30 ˚A, respectively. Acknowledgments

The authors thank H. Flechsig and M. D¨uttmann for valu-able discussions. Y.D. is Postdoctoral Researcher at theBelgian Fund for Scientiﬁc Research (F.R.S.-FNRS), andacknowledges support from the Walloon region through aWBI grant.

Author Contributions

Y.D. designed and performed study. Y.D. and A.S.M an-alyzed data and wrote the paper.

References [1] Takada, S. Coarse-grained simulations of largebiomolecules.

Curr. Opin. Struct. Biol. , 130–137(2012).[2] Bahar, I., Lezon, T.R., Yang, L.-W. & Eyal, E. Globaldynamics of proteins: bridging between structure andfunction. Annu. Rev. Biophys. , 23–42 (2010).[3] Bahar, I., Lezon, T.R., Bakan, A. & Shrivastava,I.H. Normal mode analysis of biomolecular structures:functional mechanisms of membrane proteins. Chem.Rev. , 1463–1497 (2010).[4] Thirion, M.M. Large amplitude elastic motions inproteins from a single-parameter, atomic analysis.

Phys. Rev. Lett. , 1905–1908 (1996).[5] Atligan, A.R. et al. Anisotropy of ﬂuctuation dynam-ics of proteins with an elastic network model.

Biophys.J.

80, 505–515 (2001).[6] Hinsen, K., Petrescu, A.-J., Dellerue, S., Bellisent-Funel, M.-C. & Kneller, G.R. Harmonicity in slowprotein dynamics.

Chem. Phys. , 25–37 (2000).[7] Moritsugu, K. & Smith, J.C. Coarse-grainedbiomolecular simulation with REACH: Realistic ex-tension algorithm via covariance hessian.

Biophys. J. , 3460–3469 (2007).[8] Yang, L., Song, G. & Jernigan, R.L. Protein elasticnetwork models and the ranges of cooperativity. Proc.Natl. Acad. Sci. USA , 12347–12352 (2009).[9] Bahar, I. & Rader, A.J. Coarse-grained normal modeanalysis in structural biology.

Curr. Opin. Struct.Biol. , 586–592 (2005).[10] Van Wynsberghe, A.W. & Cui, Q. Interpreting corre-lated motions using normal mode analysis. Structure , 1647–1653 (2006).[11] Dykeman, E.C. & Sankey, O.F. Normal mode anal-ysis and applications in biological physics. J. Phys.Condens. Matter , 423202 (2010).[12] Flechsig, H. & Mikhailov, A.S. Tracing entire opera-tion cycles of molecular motor hepatitis C virus he-licase in structurally resolved dynamical simulations. Proc. Natl. Acad. Sci. USA , 20875–20880 (2010).913] D¨uttmann, M., Togashi, Y., Yanagida, T. &Mikhailov, A.S. Myosin-V as a mechanical sensor:an elastic network study.

Biophys. J. , 542–551(2012).[14] Zheng, W., Brooks, B.R., Doniach, S. & Thirumalai,D. Network of dynamically important residues in theopen/closed transition in polymerases is strongly con-served.

Structure , 565–577 (2005).[15] Hamacher, K. Relating sequence evolution of HIV1-protease to its underlying molecular mechanics. Gene , 30–36 (2008).[16] Hamacher, K. & McCammon, J.A. Computing theamino acid speciﬁcity of ﬂuctuations in biomolecularsystems.

J. Chem. Theor. Comput. , 873–878 (2006).[17] Gerek, Z.N., Keskin, O. & Ozkan, S.B. Identiﬁcationof speciﬁcity and promiscuity of PDZ domain inter-actions through their dynamic behavior. Proteins ,796–811 (2009).[18] Lezon, R.L. & Bahar, I. Using entropy maximizationto understand the determinants of structural dynam-ics beyond native contact topology. PLoS Comput.Biol. , e1000816 (2010).[19] Gerstein, M. & Krebs, W. A database of macro-molecular motions. Nucleic Acids Res. , 4280–4290(1998).[20] Dehouck, Y., Biot, C., Gilis, D., Kwasigroch, J.M. &Rooman, M. Sequence-structure signals of 3D domainswapping in proteins. J. Mol. Biol. , 1215–1225(2003).[21] Siggers, K., Soto, C. & Palmer, A.G.3rd. Conforma-tional dynamics in loop swap mutants of homologousﬁbronectin type II domains.

Biophys. J. , 2447–2456 (2007).[22] Trivedi, D.V., David, C., Jacobs, D.J. & Yengo,C.M. Switch II mutants reveal coupling between thenucleotide- and actin-binding regions in myosin V. Biophys. J. , 2545–2555 (2012).[23] Adhikary, R., Yu, W., Oda, M., Zimmerman, J. &Romesberg, F.E. Protein dynamics and the diversityof an antibody response.

J. Biol. Chem. , 27139–27147 (2012).[24] Fuglebakk, E., Echave, J. & Reuter, N. Measur-ing and comparing structural ﬂuctuation patterns inlarge protein datasets.

Bioinformatics , 2431–2440(2012). [25] Riccardi, D., Cui, Q. & Phillips, G.N.Jr. Evaluat-ing elastic network models of crystalline biologicalmolecules with temperature factors, correlated mo-tions, and diﬀuse X-ray scattering. Biophys. J. ,2616–2625 (2010).[26] Leoiatts, N., Romo, T.D. & Grossﬁeld, A. Elastic net-work models are robust to variations in formalism. J.Chem. Theor. Comput. , 2424–2434 (2012).[27] Sippl, M. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. , 229–235 (1995).[28] Miyazawa, S. & Jernigan, R.L. Residue-residue po-tentials with a favorable contact pair term and anunfavorable high packing density term, for simulationand threading. J. Mol. Biol. , 623–644 (1996).[29] Dehouck, Y., Gilis, D. & Rooman, M. A new genera-tion of statistical potentials for proteins.

Biophys. J. , 4010–4017 (2006).[30] Lindorﬀ-Larsen, K., Best, R.B., Depristo, M.A., Dob-son, C.M. & Vendruscolo, M. Simultaneous deter-mination of protein structure and dynamics. Nature , 128–132 (2005).[31] Berman, H.M. et al.

The protein data bank.

NucleicAcids Res.28