Effective harmonic potentials: insights into the internal cooperativity and sequence-specificity of protein dynamics
EEffective harmonic potentials: insights into the internalcooperativity and sequence-specificity of protein dynamics
Yves Dehouck , and Alexander S. Mikhailov Department of Physical Chemistry, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg4-6, 11495 Berlin, Germany. Department of BioModelling, BioInformatics and BioProcesses, Universit´e Libre de Bruxelles(ULB), CP165/61, Av. Fr. Roosevelt 50, 1050 Brussels, Belgium. [email protected] [email protected]
Abstract
The proper biological functioning of proteins often relies on the occurrence of coordinated fluctuationsaround their native structure, or of wider and sometimes highly elaborated motions. Coarse-grained elastic-network descriptions are known to capture essential aspects of conformational dynamics in proteins, but haveso far remained mostly phenomenological, and unable to account for the chemical specificities of amino acids.Here, we propose a method to derive residue- and distance-specific effective harmonic potentials from thestatistical analysis of an extensive dataset of NMR conformational ensembles. These potentials constitutedynamical counterparts to the mean-force statistical potentials commonly used for static analyses of proteinstructures. In the context of the elastic network model, they yield a strongly improved description of thecooperative aspects of residue motions, and give the opportunity to systematically explore the influence ofsequence details on protein dynamics.
Introduction
Deciphering the motions that underlie many aspects ofprotein function is a major current challenge in molecu-lar biology, with the potential to generate numerous ap-plications in biomedical research and biotechnology. Al-though molecular dynamics (MD) hold a prominent po-sition among computational approaches, considerable ef-forts have been devoted to the development of coarse-grained models of protein dynamics [1]. Besides their abil-ity to follow motions on time scales that are usually notaccessible to MD simulations, these models also give thepossibility to better understand the general principles thatrule the dynamical properties of proteins. 13 ˚A.The elegant simplicity of the elastic network models(ENM) certainly contributed to their popularity, and theyhave been successfully exploited in a wide range of ap- plications [2, 3]. In these models, the residues are usu-ally represented as single particles and connected to theirneighbors by Hookean springs [4, 5]. The input structureis assumed to be the equilibrium state, i.e. the globalenergy minimum of the system. Common variants in-clude the homogeneous ENM, in which springs of equalstiffness connect pairs of residues separated by a distancesmaller than a predefined cutoff, and other versions inwhich the spring stiffness decays as the interresidue dis-tance increases [6–8]. In all cases, the equations of motioncan be either linearized around equilibrium, to perform anormal mode analysis of the system [9–11], or integratedto obtain time-resolved relaxation trajectories [12, 13].Despite their many achievements, purely structuralENM also come with severe limitations. Notably, model-ing the possible effects of mutations within this frameworkusually requires random local perturbations of the spring1 a r X i v : . [ q - b i o . B M ] A p r onstants [14], or a more drastic removal of links fromthe network [15]. A few attempts have been made to in-clude sequence-specificity in the ENM by setting the springconstants proportional to the depth of the energy min-ima, as estimated by statistical contact potentials [16, 17].However, this approach cannot be extended to distance-dependent potentials, for they are not consistent with theground hypothesis of the ENM, i.e. that all pairwise in-teraction potentials are at their minimum in the nativestructure. Other studies have led to the conclusion thatthe ENM behave as entropic models dominated by struc-tural features, and that the level of coarse-graining is prob-ably too high to incorporate sequence details [5, 18]. Still,the chemical nature of residues at key positions can havesignificant effects on the main dynamical properties of aprotein. Hinge motions [19], for instance, obviously re-quire some architectural conditions to be fulfilled, such asthe presence of two domains capable of moving relativelyindependently. But the amplitude and preferred directionof the motion are most likely determined by fine tuningof specific interactions in the hinge region. In proteinssubject to domain swapping, the hinge loops have indeedbeen shown to frequently include residues that are not op-timal for stability [20]. The importance of the amino acidsequence has also been repeatedly emphasized by experi-mental studies of the impact of mutations on the confor-mational dynamics of proteins [21–23].A major obstacle to the definition of accurate coarse-grained descriptions of protein dynamics lies in the highlycooperative nature of protein motions, which makes it dif-ficult to identify the properties of the individual buildingblocks independently of the overall architecture of eachfold. By condensing the information contained in a mul-titude of NMR ensembles, we build here a mean proteinenvironment, in which the behavior of residue pairs canbe tracked independently of each protein’s specific struc-ture. This methodology brings an efficient way of assessingcoarse-grained models of protein dynamics and of derivingeffective energy functions adapted to these models. In thecontext of the ENM, we identify a set of spring constantsthat depend on both the interresidue distances and thechemical nature of amino acids, and that markedly im-prove the performances of the model. Results
Dynamical properties of proteins from the per-spective of an average pair of residues
The mean-square fluctuations of individual residues(MSRF) have been extensively relied on to characterizeprotein flexibility and to evaluate coarse-grained models ofprotein dynamics [24], in part because of their widespreadavailability as crystallographic B-factors. However, since
A BC D E
Individual fluctuationsPairwise fluctuations< Δ R A2 > = 1.00< Δ R B2 > = 1.22< Δ R C2 > = 0.57< Δ R D2 > = 1.01< Δ R E2 > = 0.89 γ AB = 1.05 γ CD = 1.18 γ AE = 0.48 γ AB = 0.97 γ CD = 1.09 γ AE = 0.61°°° Figure 1:
Schematic illustration of the apparent stiffness γ . Asimple model containing 8 beads connected by elastic springswas subjected to 10 integration steps under Gaussian noise.Selected values of < (∆ R i ) > , γ and γ ◦ are given in arbitraryunits. Individually, the pairs A-B and C-D would be identi-cal, but they experience differently the influence of the otherbeads. As a result, the C-D pair is effectively more rigid thanA-B ( γ AB < γ CD ). In both cases, the motions are somewhatcorrelated, as the apparent stiffness γ is larger than what isexpected from the knowledge of their individual motions ( γ ◦ ).Beads A and E do not interact directly but the effect of thenetwork on their relative motions is captured by the values of γ AE and γ ◦ AE . the MSRF carry little information about the cooperativenature of residue motions, we propose to examine thedynamical behavior of proteins from the perspective ofresidue pairs rather than individual residues. Informa-tion about the fluctuations of interresidue distances is con-tained in the data of NMR experiments for numerous pro-teins, and will be exploited here. We define the apparentstiffness of a pair of residues i , j in a protein p : γ pij = 2 k B T /σ r pij (1)where k B is the Boltzmann constant, T the temperature,and σ r pij the variance of the distance r between residues i and j , in a structural ensemble representative of the equi-librium state. γ pij is defined up to a multiplicative factor,which corresponds to the temperature. We also introducethe uncorrelated apparent stiffness γ ◦ pij , to quantify theimpact of the individual fluctuations of residues i and j on the fluctuations of the distance that separates them.This is achieved by using σ ◦ r pij instead of σ r pij in eq. ,where σ ◦ r pij is computed after exclusion of all correlationsbetween the motions of residues i and j ( Methods ).As illustrated in Figure 1, γ can be quite different fromone residue pair to another. Indeed, besides the impactof direct interactions, γ is also strongly dependent on theoverall fold of the protein, and on the position of the pairwithin the structure. To remove the specific influence ofeach protein’s architecture, we define the apparent stiffnessin a mean protein environment γ ( s, d ):2 ( s, d ) = 2 k B Tσ r ( s, d ) , with σ r ( s, d ) = (cid:80) Pp =1 (cid:80) N p ( s,d ) ij M p σ r pij (cid:80) Pp =1 N p ( s, d ) M p (2)where s is one of 210 amino acid pairs, d the discretizedequilibrium distance between pairs of residues ( d ≤ r pij Comparison of the experimental and predicted val-ues of the apparent stiffness γ ( d ). Experimental values of γ ( d )(continuous black) and γ ◦ ( d ) (dashed black), extracted fromthe dataset of 1500 NMR ensembles. Values of γ ( d ) predictedon the same dataset by the ENM (dashed red); ENM (con-tinuous red); ENM (dashed blue); ENM (continuous blue). (e.g. hydrophobic core vs. exposed surface loops). How-ever, there is only a limited agreement between γ ( s ) and γ ◦ ( s ) (Fig. 3A-B): the correlation coefficient is equal to0.71, and γ ( s ) spans a much wider range of values. Be-yond individual amino acid preferences, the specifics ofresidue-residue interactions play thus a significant role indetermining the extent of cooperativity in residue motions. Accuracy of elastic network models in reproducingthe dynamical properties of proteins The computation of the apparent stiffness of residue pairsin a mean protein environment provides an interesting toolto probe the dynamical properties of proteins. It also gen-erates a very straightforward approach to assess the abilityof coarse-grained models to reproduce accurately this gen-eral behavior.We focus here on four common variants of the residue-based ENM [25, 26], which differ only by the functionalform of the spring constants κ . The dependence of κ onthe interresidue distance r pij is defined by two parame-ters: the cutoff distance l d , above which residues i and j are considered disconnected, and the exponent α thatdetermines how fast κ decreases with increasing distances: κ pij (ENM αl d ) = a p H ( l d − r pij ) r − αpij (3)where H is the Heaviside function. The value of thetemperature-related factor a p is obtained, for each pro-tein independently, by fitting the predicted MSRF withthe experimental ones. This ensures that the amplitudeof the individual fluctuations of the beads in the networkis on average equal to that observed in the corresponding3 γ° (s) B experimental γ (s) C predicted by the ENM γ (s) A experimental Figure 3: Comparison of the experimental and predicted values of the apparent stiffness γ ( s ), in the dataset of 1500 NMRensembles. For each amino acid, the median value of γ ( s ) over the 20 possible partners is given in units of k B T , along withthe maximal, minimal, 1 st and 3 rd quartile values. Outliers from these distributions are depicted as circles. (A) Experimentalvalues of γ ( s ). (B) Experimental values of γ ◦ ( s ). (C) Values of γ ( s ) predicted by the ENM . NMR ensemble, and that the predicted γ ( s, d ) values canthus be directly compared with those extracted from theNMR data. We consider the following models: ENM ,ENM , ENM , ENM . These ENM variants were usedto estimate the value of σ r pij for each pair of residues inthe 1500 proteins of our NMR dataset ( Methods ), and tosubsequently compute γ ( d ) and γ ( s ) from eq. .Strikingly, all ENM variants systematically predict γ ( d )values to be lower than the experimental ones, at least upto interresidue distances of 20-30 ˚A(Fig. 2). These modelsoverestimate thus the amplitude of pairwise fluctuations,relatively to the amplitude of individual fluctuations. Forexample, if two residues in a protein undergo highly cor-related motions, the amount of thermal energy necessaryto induce a moderate variance on the distance betweenthem will generate high variances on their individual co-ordinates. Consequently, if the motions of the beads ofthe ENM are less coordinated, adjusting the scale of thespring constants to reproduce the amplitude of individualfluctuations leads to an overestimated variance on the in-terresidue distances, and thus to lower γ ( d ) values. Thisproblem is particularly apparent when κ is assumed todecrease proportionally to the square of the interresiduedistance, in the ENM . Although this model was shownto perform well in predicting MSRF values [8], our resultssuggest that it negates almost completely the coordinatedaspect of residue motions. Indeed, as shown in Figure 2,the γ ( d ) values predicted by this model are very close tothose obtained from the experimental data after removalof the correlations between the motions of the differentresidues ( γ ◦ ( d )). This observation is consistent with theextremely short atom-atom correlation length character-istic of the ENM , recently estimated on the basis of anX-ray structure of Staphylococcal nuclease [25]. The ENM is often considered as an entropic model, notdetailed enough to include sequence information in a rele-vant way [5,18]. It is therefore hardly surprising that com-mon ENM variants produce a poor description of the se-quence specificities of protein dynamics. Individual aminoacid preferences for more or less densely connected regionsare responsible for some variety in the predicted values of γ ( s ) (Fig. 3C). However, this variety is far from matchingthe one observed in the experimental data, as shown bya much narrower range of γ ( s ) values, and a limited cor-relation coefficient with the experimental γ ( s ) values, e.g.0.62 for the ENM (Supplementary Fig. 3). Derivation of effective harmonic potentials Mean-force statistical potentials are commonly used toperform energetic evaluations of static protein structures[27–29]. These potentials do not describe explicitly the”true” physical interactions, but provide effective energiesof interaction in a mean protein environment, in the con-text of a more or less simplified structural representation.Similarly, within the ENM framework, κ ( s, d ) defines foreach pair of residues an harmonic interaction potential.This potential is also effective in nature, accounting im-plicitly for everything that is not included in the model(e.g. the surrounding water). Hence, we seek to identifythe value of κ yielding the most accurate reproductionof the dynamical behavior of each type of pair ( s, d ) in amean protein environment, which is conveniently capturedby the apparent stiffness γ ( s, d ).For that purpose, let us define E bond ( s, d ) as the energyof the elastic spring connecting two residues of type ( s, d ),in a mean protein environment:4 10 20 Interresidue distance d (Å) -3 -2 -1 S p r i ng c on s t an t s o f t he d E N M : κ ( d ) Interresidue distance d (Å) -3 -2 -1 S p r i ng c on s t an t s o f t he s d E N M : κ ( s , d ) I-ID-RG-G Spring constants of the sENM : κ (s) A0.22.3 C D E F G H I K L M N P Q R S T V W YACDEFGHIKLMNPQRSTVWY A B C Figure 4: Effective harmonic potentials. (A) Spring constants of the sENM , for the 210 amino acid pairs. (B) Springconstants of the dENM. The dashed line corresponds to κ ∼ r − . (C) Spring constants of the sdENM for 3 amino-acid pairs.All κ values are given in Supplementary Tables 2-5. E bond ( s, d ) = 12 κ ( s, d ) σ r ( s, d ) = k B T κ ( s, d ) γ ( s, d ) (4)where γ ( s, d ) is the apparent stiffness extracted from theexperimental data. E bond ( s, d ) is unknown and is expectedto be different for different pair types ( s, d ). The knowl-edge of γ ( s, d ) is thus not sufficient to estimate directly κ ( s, d ). However, from any approximate set of spring con-stants κ (cid:48) ( s, d ), we may build the ENM for all proteins inour dataset, to reproduce the mean protein environment,and compute for each pair type an estimated value of theapparent stiffness, γ (cid:48) ( s, d ), and bond energy, E (cid:48) bond ( s, d ).Since the behavior of a given residue pair is highly de-pendent on its environment, we can make the assump-tion that E (cid:48) bond ( s, d ) is a relatively good approximationof E bond ( s, d ), even if κ (cid:48) ( s, d ) (cid:54) = κ ( s, d ): E (cid:48) bond ( s, d ) = k B T κ (cid:48) ( s, d ) γ (cid:48) ( s, d ) (cid:39) E bond ( s, d ) (5)Indeed, if the spring stiffness of a residue pair is underes-timated ( κ (cid:48) < κ ), it will also appear as less rigid in theENM than in the experimental data ( γ (cid:48) < γ ). A more de-tailed discussion is given in Supplementary Note 1. Fromeqs. and , we devise thus an iterative procedure inwhich κ ( s, d ) is updated at each step k by confronting thepredicted values of the apparent stiffness, γ k ( s, d ), withthe experimental ones, γ ( s, d ). It is expected to convergewhen γ k ( s, d ) → γ ( s, d ), that is, when the predictions ofthe model agree with the experimental data: κ k +1 ( s, d ) = κ k ( s, d ) γ ( s, d ) γ k ( s, d ) (6)We used this approach to derive, from the NMR data,four novel ENM variants: the distance-dependent dENM ; the sequence-dependent sENM and sENM , with a dis-tance cutoff of 10 and 13 ˚A, respectively, and the sequence-and distance-dependent sdENM ( Methods ). Interestingly,the κ values for the 210 amino acid pairs in the sENM arerelatively well correlated with the corresponding contactpotentials [28], even though they result from totally dif-ferent approaches (Supplementary Fig. 4). Some commongeneral trends can be identified, e.g. hydrophobic contactstend to be associated with both favorable interaction en-ergies and large κ values (Fig. 4A). However, the overallcorrespondence remains limited, indicating that the deter-minants of protein rigidity and stability are related, butdistinct. The distance dependence of κ in the dENM isremarkably similar to the r − power law that was previ-ously obtained by fitting against a 1.5ns MD trajectoryof a C-phycocyanin dimer [6] (Fig. 4B), although our newmodel presents more detailed features. Notably, κ remainsapproximately constant up to interresidue distances of 5-6˚A, and then drops by about two orders of magnitude toreach a second plateau between 7 and 12 ˚A. The κ valuesof the sdENM are shown in Figure 4C, for a few aminoacid pairs. This model not only combines the strengths ofthe sENM and the dENM, but also reveals the sequencespecificity of the κ distance dependence. The D-R pair,for example, is almost as rigid as I-I at short distancesconsistent with the formation of a salt bridge, but almostas flexible as G-G at larger distances. Performances of the new ENM The sdENM yields a much more accurate reproduction ofthe dynamical behavior of residue pairs in a mean proteinenvironment than the common ENM variants, as demon-strated by the good agreement between experimental andpredicted values of γ ( s ) (Fig. 5A, Supplementary Fig. 5),and γ ( d ) (Fig. 5B).5 γ ( s ) p r ed i c t ed b y t he s d E N M γ (s) experimental Correlation coefficient = 0.95 A ppa r en t s t i ff ne ss γ ( d ) ( k B T ) Interresidue distance d (Å) a b c 0% 50%-50% ε (sdENM) - (ENM ) σ ε σ Figure 5: Performances of the sdENM. (A) Experimental and predicted values of γ ( s ), in the dataset of 1500 NMR ensembles.See also Supplementary Figures 3 and 5. (B) Experimental (continuous) and predicted (dashed) values of γ ( d ), in the dataset of1500 NMR ensembles (black), and in a single protein (PDB 1xqq) (green). See also Supplementary Figure 2. (C) Comparisonof the ability of the sdENM and the ENM to correctly reproduce the fluctuations of a single protein, on the basis of a highquality structural ensemble of human ubiquitin (PDB 1xqq) [30]. 20 randomly selected residue pairs are connected by solidlines. Shades of blue indicate a better performance of the sdENM, while shades of red indicate a better performance of theENM . See also Supplementary Figure 6. Beyond its performances in a mean protein environment,our new model also brings highly notable improvementswith respect to previously described ENM variants whenit is applied to the specific architecture of a given protein.This is illustrated by two examples, on Figure 5C andSupplementary Figure 6. A more thorough assessment ofthe ability of the different ENM variants to capture themotions of individual proteins was performed on an in-dependent dataset of 349 proteins. The correlation coeffi-cient between predicted and observed MSRF ( r B ) has beenwidely used in the past but ignores the cooperativity inher-ent to protein dynamics, and presents other shortcomings.Therefore, we introduce a new measure ( (cid:15) σ ) that quanti-fies the relative error on the estimation of the variabilityof the distance between residue pairs, and is thus focusedon the cooperative aspects of residue motions ( Methods ). Table 1: Performances of different ENM variants. ( a ) Aver-age correlation coefficient between experimental and measuredMSRF. ( b ) Average relative error on the fluctuations of inter-residue distances. r ( a ) B (cid:15) ( b ) σ (cid:15) SR σ (cid:15) MR σ (cid:15) LR σ ENM is better at predicting the individual residue fluc-tuations (Table 1). Interestingly, the ENM , with its sim-ple cutoff distance, appears superior when it comes to thereproduction of cooperative motions ( (cid:15) σ = 0 . for individual fluctua-tions ( r B = 0 . for thedescription of cooperativity ( (cid:15) σ = 0 . / with ENM / ,and sdENM with dENM. It consists in a slight improve-ment of the correlation coefficient r B , and a pronounceddecrease of the error (cid:15) σ , especially at short- (0-15 ˚A) andmid- (15-30 ˚A) range. Discussion For the last decades, statistical potentials extracted fromdatasets of known protein structures [27–29] have playeda critical role in static analyses of protein structures, withmajor applications including structure prediction, protein-protein docking, or rational mutant design. Our studydemonstrates that a similar approach can be taken to de-rive effective energy functions that are specifically adaptedto the coarse-grained modeling of protein dynamics.More precisely, in the context of the ENM, we exploiteda dataset of 1500 NMR ensembles to determine the valuesof the spring constants that describe best the behavior ofpairs of residues, as a function of both their chemical na-6ure and the distance separating them. The success of ourapproach is attested by a drastic enhancement of the abil-ity to accurately describe the cooperative nature of residuemotions, with respect to previously described ENM vari-ants. Moreover, a definite advantage of our method is thatthe effective parameters characterizing the strength of thevirtual bonds are directly extracted from the experimen-tal data without any a priori conception of their functionalform. The fact that the distance dependence of the springconstants that we retrieve is quite similar to the r − powerlaw, which was considered so far as underlying one of thebest performing ENM variants [6, 25], also constitutes amajor support to our approach.In our derivation scheme, the virtual bonds areparametrized so as to reproduce the behavior of aminoacid pairs in a mean protein environment. The analysisof the ability of different models of protein dynamics todescribe the motions of residues within this environmentsheds an interesting new light on the properties of thesemodels. In particular, our results indicate that previousENM variants underestimate, sometimes dramatically, therigidity of amino acid pairs at short- and mid-range. Ournew model does however provide a much more accuratereproduction of the balance between short-range and long-range coordinated motions. This is arguably a crucial as-pect when considering, for example, the consequences oflocalized alterations induced by ligand binding on signaltransduction or global conformational changes, such as inATP-powered molecular motors.Importantly, our results also demonstrate that the ENMdoes not have to be exclusively structural, and that se-quence details may be allowed to play a major role incoarse-grained descriptions of protein dynamics. Thereby,this study paves the way towards comparative analysesof motions in proteins that share a similar structure butpresent differences in sequence. Such investigations willprove particularly interesting in the context of the ratio-nal design of (modified) proteins with controlled dynami-cal properties. Although we focused here on residue-basedelastic network models, our approach is not limited to thisparticular family, and can be readily implemented to eval-uate and optimize other coarse-grained models of proteindynamics. Notably, the impact of chemical specificity onthe dynamical behavior of residues should be even moreaccurately rendered by effective potentials based on a moredetailed structural description. Methods NMR Dataset We retrieved, from the Protein Data Bank [31], ensemblesof at least 20 models from solution NMR experiments, cor-responding to monomeric proteins of at least 50 residues that present at most 30% sequence identity with one an-other. Entries under the SCOP classifications ”Peptides”or ”Membrane and cell surface proteins” were not consid-ered. The presence of ligands, DNA or RNA molecules,chain breaks, non-natural amino acids, and differences inthe number of residues per model were also grounds for re-jection. These criteria led to the selection of 1849 distinctstructural ensembles. A subset of 1500 ensembles was ran-domly selected for the main analysis, and the remaining349 were used to assess the performances of the differentENM variants. Unfolded C- or N-terminal tails were au-tomatically identified (MSRF values larger than twice theaverage for all residues in the protein) and removed fromconsideration. In each ensemble, the structure with thelowest root mean square deviation from the mean struc-ture, after superposition, is chosen as representative andused to build the ENM. Elastic network model The network is built by considering each residue as a sin-gle bead, placed at the position of the corresponding C α atom in the input structure, and connecting neighboringbeads with Hookean springs [4,5]. The ENM variants con-sidered here differ only by the form of the spring constant κ as a function of interresidue distance and of amino acidtypes. In all variants, bonded interactions are describedby a larger value of κ , defined as ten times the value of κ for non-bonded interactions at a separation of 3.5 ˚A,averaged over all amino acid types. The potential energyof the network is given by: U = (cid:80) i For each pair of residues in a given protein p , the experi-mental value of this variance is readily computed from theNMR data: σ r pij = 1 M p M p (cid:88) m =1 ( r pijm − r pij ) (9)where M p is the number of structures in the NMR ensem-ble, r pijm the distance between the C α atoms of residues i and j in structure m of protein p , and r pij the average dis-tance over all M p structures. In the context of the ENM, σ r pij values are estimated from the covariance matrix ofthe spatial coordinates, by standard statistical propaga-tion of uncertainty: σ r pij ≈ J (cid:20) C ii C ij C ji C jj (cid:21) J (cid:62) (10)where J is the Jacobian of the distance r pij as a functionof the six coordinates ( x i , y i , z i , x j , y j , z j ). This estima-tion of σ r pij relies on the validity of the first order Taylorexpansion of the distance as function of the coordinatesin the vicinity of the average distance. We ensured thatno systematic bias arose from this approximation (Supple-mentary Fig. 7). To quantify the impact of the individ-ual motions of residues on their relative positions, we useeq. to compute ( σ ◦ r pij ) in an artificial construct whereresidue motions are not correlated. This is achieved byextracting the covariance matrix from the NMR data, andsetting to zero all submatrices C ij where i (cid:54) = j . Iterative procedure The values of the spring constants of the new ENM vari-ants were derived from the dataset of 1500 NMR ensem-bles using eq . For the dENM, sENM and sENM , theinitial values of the spring constants were set equal to theexperimental values of the apparent stiffness: κ ( d ) = γ ( d )or κ ( s ) = γ ( s ). Note that the γ ( s ) values were computedby considering only residue pairs separated by a distancelower than the cutoff of 10 or 13 ˚A. For the sdENM, the κ ( s, d ) values were set equal to the final values of thespring constants in the dENM, κ ( d ), for all amino acidtypes. A correction for sparse data was devised to ensurethat κ ( s, d ) tends to κ ( d ) when the number of residue pairsof type ( s, d ) is too small to obtain relevant estimations of σ r ( s, d ). Instead of eq. , we used the following definitionto compute both the experimental and predicted apparentstiffness: γ ( s, d ) = 2 k B Tσ r ( s, d ) (cid:16) N sd N sd + S (cid:17) + σ r ( d ) (cid:16) S N sd + S (cid:17) (11)where N sd = (cid:80) Pp =1 N p ( s, d ) M p , N p ( s, d ) is the numberof pairs of type ( s, d ) in protein p , M p is the number ofstructures in the NMR ensemble of protein p , and S is anadjustable parameter set to 500.The κ values were rescaled after each iteration step, sothat the average value of κ over all amino acid types isequal to 1 for pairs separated by a distance of 6 ˚A. Residuepairs of a given type ( s, d ) for which κ ( s, d ) < . 001 (afterrescaling), were considered to establish no direct interac-tion: κ ( s, d ) was set to 0, and they were no longer con-sidered in the iterative procedure. The performances ofthe new ENM variants after the first nine iteration stepsare reported in Supplementary Table 1. The procedureconverged rapidly for the dENM and the sdENM, and thefinal models were selected after 5 and 3 iteration steps,respectively. The sENM variants did not improve signifi-cantly with respect to the initial models ( k = 0), indicat-ing that κ ( s ) = γ ( s ) is a good approximation, contrary to κ ( d ) = γ ( d ). The procedure was thus stopped after oneiteration step, for both the sENM and the sENM . Performance measures The ability of coarse-grained models to accurately describeprotein dynamics is commonly evaluated by computingthe Pearson correlation coefficient between predicted andexperimental MSRF, < (∆ R i ) > , over all i = 1 , ..., n residues of a given protein: r B = (cid:80) ni =1 ( B expi − B )( B prei − B ) (cid:113)(cid:80) ni =1 ( B expi − B ) (cid:113)(cid:80) ni =1 ( B predi − B ) (12)where, for simplicity, B i was used instead of < (∆ R i ) > .There is indeed a direct relationship between the MSRFand the cristallographic B-factors: B i = (8 π / < (∆ R i ) > . B expi and B prei correspond thus here to theMSRF of residue i extracted from the NMR data and pre-dicted by the ENM, respectively. The scale of the pre-dicted MSRF values depends on the scale of the springconstants, which are only defined up to a constant fac-tor. This factor was determined, for each protein indepen-dently, by fitting the scales of the predicted and experi-mental MSRF, i.e. to ensure that: B = 1 n n (cid:88) i =1 B expi = 1 n n (cid:88) i =1 B prei (13)Although it has been widely used in previous studies, r B is probably not the most adequate measure to evaluate8he performances of coarse-grained models of protein dy-namics. As pointed out previously [24, 25], it does indeedpresent several shortcomings: e.g. it is strongly affectedby the presence of highly flexible regions, and does notaccount for possible flaws leading to an intercept of theregression line different from zero. Most importantly, theMSRF describe individual fluctuations but provide no in-formation about the cooperative aspects of residue mo-tions. The quality of the MSRF predictions gives thus noguarantee about the ability of the model to describe thecooperativity of protein dynamics. The ENM providesan interesting example, for it performs quite well in pre-dicting the MSRF but basically negates all cooperativity(Fig. 2, Table 1).Therefore, we introduce a new measure that exploitsthe information contained in the correlation matrix C , toquantify the error on the estimation of the fluctuations ofthe interresidue distances: (cid:15) σ = (cid:118)(cid:117)(cid:117)(cid:116) N p N p (cid:88) ij (cid:32) σ (exp) r pij − σ (pre) r pij σ ◦ r pij (cid:33) (14)where N p is the number of non-bonded residue pairs inprotein p , σ (exp) r pij and σ (pre) r pij are the experimental (eq. )and predicted (eq. ) values of σ r pij , respectively. σ (pre) r pij is obtained after fitting the experimental MSRF with thepredicted ones (eq. ). The error is normalized by σ ◦ r pij ,which is the expected value of σ r pij given the individual,anisotropic, fluctuations of both residues extracted fromthe NMR data, but neglecting all correlations betweentheir respective motions. This normalization ensures thatthe contributions of the different pairs of residues areequivalent, and that the measure is not dominated byhighly flexible regions.Both r B and (cid:15) σ are computed independently for each ofthe 349 proteins of our test set, and the average values arereported. We also report the short- ( (cid:15) SR σ ), mid- ( (cid:15) MR σ ), andlong-range ( (cid:15) LR σ ) contributions to (cid:15) σ , obtained by consid-ering only pairs separated by 0-15 ˚A, 15-30 ˚A, and morethan 30 ˚A, respectively. Acknowledgments The authors thank H. Flechsig and M. D¨uttmann for valu-able discussions. Y.D. is Postdoctoral Researcher at theBelgian Fund for Scientific Research (F.R.S.-FNRS), andacknowledges support from the Walloon region through aWBI grant. Author Contributions Y.D. designed and performed study. Y.D. and A.S.M an-alyzed data and wrote the paper. References [1] Takada, S. Coarse-grained simulations of largebiomolecules. Curr. Opin. Struct. Biol. , 130–137(2012).[2] Bahar, I., Lezon, T.R., Yang, L.-W. & Eyal, E. Globaldynamics of proteins: bridging between structure andfunction. Annu. Rev. Biophys. , 23–42 (2010).[3] Bahar, I., Lezon, T.R., Bakan, A. & Shrivastava,I.H. Normal mode analysis of biomolecular structures:functional mechanisms of membrane proteins. Chem.Rev. , 1463–1497 (2010).[4] Thirion, M.M. Large amplitude elastic motions inproteins from a single-parameter, atomic analysis. Phys. Rev. Lett. , 1905–1908 (1996).[5] Atligan, A.R. et al. Anisotropy of fluctuation dynam-ics of proteins with an elastic network model. Biophys.J. 80, 505–515 (2001).[6] Hinsen, K., Petrescu, A.-J., Dellerue, S., Bellisent-Funel, M.-C. & Kneller, G.R. Harmonicity in slowprotein dynamics. Chem. Phys. , 25–37 (2000).[7] Moritsugu, K. & Smith, J.C. Coarse-grainedbiomolecular simulation with REACH: Realistic ex-tension algorithm via covariance hessian. Biophys. J. , 3460–3469 (2007).[8] Yang, L., Song, G. & Jernigan, R.L. Protein elasticnetwork models and the ranges of cooperativity. Proc.Natl. Acad. Sci. USA , 12347–12352 (2009).[9] Bahar, I. & Rader, A.J. Coarse-grained normal modeanalysis in structural biology. Curr. Opin. Struct.Biol. , 586–592 (2005).[10] Van Wynsberghe, A.W. & Cui, Q. Interpreting corre-lated motions using normal mode analysis. Structure , 1647–1653 (2006).[11] Dykeman, E.C. & Sankey, O.F. Normal mode anal-ysis and applications in biological physics. J. Phys.Condens. Matter , 423202 (2010).[12] Flechsig, H. & Mikhailov, A.S. Tracing entire opera-tion cycles of molecular motor hepatitis C virus he-licase in structurally resolved dynamical simulations. Proc. Natl. Acad. Sci. USA , 20875–20880 (2010).913] D¨uttmann, M., Togashi, Y., Yanagida, T. &Mikhailov, A.S. Myosin-V as a mechanical sensor:an elastic network study. Biophys. J. , 542–551(2012).[14] Zheng, W., Brooks, B.R., Doniach, S. & Thirumalai,D. Network of dynamically important residues in theopen/closed transition in polymerases is strongly con-served. Structure , 565–577 (2005).[15] Hamacher, K. Relating sequence evolution of HIV1-protease to its underlying molecular mechanics. Gene , 30–36 (2008).[16] Hamacher, K. & McCammon, J.A. Computing theamino acid specificity of fluctuations in biomolecularsystems. J. Chem. Theor. Comput. , 873–878 (2006).[17] Gerek, Z.N., Keskin, O. & Ozkan, S.B. Identificationof specificity and promiscuity of PDZ domain inter-actions through their dynamic behavior. Proteins ,796–811 (2009).[18] Lezon, R.L. & Bahar, I. Using entropy maximizationto understand the determinants of structural dynam-ics beyond native contact topology. PLoS Comput.Biol. , e1000816 (2010).[19] Gerstein, M. & Krebs, W. A database of macro-molecular motions. Nucleic Acids Res. , 4280–4290(1998).[20] Dehouck, Y., Biot, C., Gilis, D., Kwasigroch, J.M. &Rooman, M. Sequence-structure signals of 3D domainswapping in proteins. J. Mol. Biol. , 1215–1225(2003).[21] Siggers, K., Soto, C. & Palmer, A.G.3rd. Conforma-tional dynamics in loop swap mutants of homologousfibronectin type II domains. Biophys. J. , 2447–2456 (2007).[22] Trivedi, D.V., David, C., Jacobs, D.J. & Yengo,C.M. Switch II mutants reveal coupling between thenucleotide- and actin-binding regions in myosin V. Biophys. J. , 2545–2555 (2012).[23] Adhikary, R., Yu, W., Oda, M., Zimmerman, J. &Romesberg, F.E. Protein dynamics and the diversityof an antibody response. J. Biol. Chem. , 27139–27147 (2012).[24] Fuglebakk, E., Echave, J. & Reuter, N. Measur-ing and comparing structural fluctuation patterns inlarge protein datasets. Bioinformatics , 2431–2440(2012). [25] Riccardi, D., Cui, Q. & Phillips, G.N.Jr. Evaluat-ing elastic network models of crystalline biologicalmolecules with temperature factors, correlated mo-tions, and diffuse X-ray scattering. Biophys. J. ,2616–2625 (2010).[26] Leoiatts, N., Romo, T.D. & Grossfield, A. Elastic net-work models are robust to variations in formalism. J.Chem. Theor. Comput. , 2424–2434 (2012).[27] Sippl, M. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. , 229–235 (1995).[28] Miyazawa, S. & Jernigan, R.L. Residue-residue po-tentials with a favorable contact pair term and anunfavorable high packing density term, for simulationand threading. J. Mol. Biol. , 623–644 (1996).[29] Dehouck, Y., Gilis, D. & Rooman, M. A new genera-tion of statistical potentials for proteins. Biophys. J. , 4010–4017 (2006).[30] Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dob-son, C.M. & Vendruscolo, M. Simultaneous deter-mination of protein structure and dynamics. Nature , 128–132 (2005).[31] Berman, H.M. et al. The protein data bank. NucleicAcids Res.28