[PDF] Orientational Order Governs Collectivity of Folded Proteins

Abstract

The past decade has witnessed the development and success of coarse-grained network models of proteins for predicting many equilibrium properties related to collective modes of motion. Curiously, the results are usually robust towards the different cutoff distances used for constructing the residue networks from the knowledge of the experimental coordinates. In this study, we present a systematical study of network construction, and their effect on the predicted properties. Probing bond orientational order around each residue, we propose a natural partitioning of the interactions into an essential and a residual set. In this picture the robustness originates from the way with which new contacts are added so that an unusual local orientational order builds up. These residual interactions have a vanishingly small effect on the force vectors on each residue. The stability of the overall force balance then translates into the Hessian as small shifts in the slow modes of motion and an invariance of the corresponding eigenvectors. We introduce a rescaled version of the Hessian matrix and point out a link between the matrix Frobenius norm based on spectral stability arguments. A recipe for the optimal choice of partitioning the interactions into essential and residual components is prescribed. Implications for the study of biologically relevant properties of proteins are discussed with specific examples.

Full PDF

11 Orientational Order Governs Collectivity of Folded Proteins*

Canan Atilgan, Osman Burak Okan, Ali Rana Atilgan

Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul Turkey

ABSTRACT

AUTHOR SUMMARY

Network models of proteins have opened up previously unexplored areas of study, since the level of coarse graining adopted has been shown to reveal several important characteristic of these systems. The findings are mainly based on the observation that a simplified, harmonic potential is capable of describing the collective modes of motion, which are also associated with the basic functioning of these molecular machines. The level of success of these studies depends on the quality with which we describe the interactions between pairs of residues in the protein. Here, we perform for the first time, a systematical study on how the predictions are affected by network construction. We first track the local orientational order of residues as new contacts are added with increasing cut-off distance. We also study in detail the spectral properties of the Hessian. We show why the network construction is free of the cut-off distance problem beyond a threshold value, if one is interested in the collective motions. We discuss the implications on the limitations and capabilities of the network models with regard to functionality-related predictions based on the most global motions. *All computer programs used in the analyses are available upon request.

INTRODUCTION

Globular proteins show diversified structures and sizes, yet, it has been claimed that they display a nearly random packing of amino acids with strong local symmetry on the one hand [1], and that they are regular structures that occupy specific lattice sites, on the other [2]. It was later shown that this classification depends on the property one investigates, and that (cid:83)(cid:85)(cid:82)(cid:87)(cid:72)(cid:76)(cid:81)(cid:86)(cid:3) (cid:71)(cid:76)(cid:86)(cid:83)(cid:79)(cid:68)(cid:92)(cid:3) (cid:179)(cid:86)(cid:80)(cid:68)(cid:79)(cid:79) - (cid:90)(cid:82)(cid:85)(cid:79)(cid:71)(cid:180)(cid:3) (cid:83)(cid:85)(cid:82)(cid:83)(cid:72)(cid:85) ties, where highly ordered structures are altered with few additional links [3]. Furthermore, packing density of proteins scales uniformly with their size [4,5] which causes them to show similar vibrational spectral characteristics to those of solids [6]. Dynamical studies of folded proteins draw much attention to their importance in relating the structure of the proteins to their specific function and collective behavior. Protein dynamics is generally both anisotropic and collective. Internal motional anisotropy is a consequence of the low symmetry local atomic environment, while the collectivity is mainly caused by the dense packing of proteins [7]. Theoretical studies on fluctuations and collective motions of proteins are based on either molecular dynamics (MD) simulations or normal mode analysis (NMA). Since, in molecular simulations with conventional atomic models and potentials, computational effort is demanding for proteins with more than a few hundreds of residues, coarse grained models with simplified governing potentials have been employed. The latter have shown a great success in the description of the residue fluctuations and the collective behavior of proteins [8]. One of these simplified models, NMA using a single parameter harmonic potential [9] following the uniform harmonic potential introduced originally by Tirion [10], successfully predicts the large amplitude motions of proteins in the native state [11]. Within the framework of this model, proteins are modeled as elastic networks whose nodes are residues linked by inter-residue potentials that stabilize the folded conformation. The residues are assumed to undergo Gaussian-distributed fluctuations about their native positions. The springs connecting each node to all other neighboring nodes are of equal strength, and only the atom pairs within a cut-off distance are considered without making a distinction between different types of residues. This model, with its simplicity, speed of calculation and relying mostly on geometry and mass distribution of the protein, demonstrates that a single-parameter model can reproduce complex vibrational properties of macromolecular systems. Elastic models based on the force balance around each node [12] led to the development of the so called Anisotropic Network Model (ANM) [13]. In the past few years, variant methods have been introduced; e.g. [14,15]. Applications of these models on many proteins show successful results in terms of predicting the collective behavior of proteins. By separating different components of normal modes, e.g. collective (low-frequency) motions, the nature of a conformational change, for example due to the binding of a ligand, may also be analyzed [16,17]. Despite numerous applications comparing the theoretical and experimental findings on a case-by-case basis, e.g. [18,19,20], only a few attempted a statistical assessment of the models. A methodology that evaluates the number of modes necessary to map a given conformational change from the degree of accuracy obtained by the inclusion of a given number of modes, showed the results to be protein dependent [21]. In another study where 170 pairs of structures were systematically analysed, it was shown that the success of coarse-grained elastic network models may be improved by recognizing the rigidity of some residue clusters [22,23]. To date, the structures that form the basis of the network models have been generated from certain rules of thumb. A distance threshold between the C or C atoms of the residues is used as the rule for the connectedness of a given pair of residues. Values in the range of 8 (cid:177) (cid:99)(cid:3) (cid:68)(cid:85)(cid:72)(cid:3) found in the literature based on the argument that (i) the eigenvalue distributions obtained from the modal decomposition are similar to those obtained from the full-atom NMA description of proteins, or (ii) these provide atomic fluctuation profiles that display the largest correlation with the experimental B-factors. Recently, distance weighted interaction schemes of various functional forms were proposed to circumvent the cut-off problem [23]. Voronoi tessalation of the space defined by the central (usually C or C ) atom into non-intersecting polyhedra constitute another route that frees one from defining a cut-off distance [24]. Atom-based network construction approaches have also been used [see reference [25] for a review of the variety of network construction methods in literature.] In this study, we use a systematic approach on a large set of globular proteins with varying architectures and sizes to find a basis for why the network models work well to define certain properties of the system. This enables us to assess residue-based approaches used in the construction of the networks. We track the local orientational order of residues as new contacts are added with increasing cut-off distance. We show that the network construction is free of the cut-off distance problem once a certain baseline threshold is accessed, if one is interested in the collective motions and the fluctuation patterns of the residues. Implications for the limitations and capabilities of the ANM methodology are discussed due to functionality-related predictions based on the most global motions. COMPUTATIONAL DETAILS Network construction.

A protein of N residues is treated as a residue-based structure, where the C atom of each amino acid is considered as a node, and the coordinates of the protein are obtained from the protein data bank (PDB) [26]. The network information is contained in the N (cid:238)(cid:3) N adjacency matrix, A, of inter-residue contacts, whose elements A ij are taken to be 1 for contacting pairs of nodes i and j , and zero otherwise. The criterion for contact is that the two nodes are within a cut-off distance r c of each other. Bond Orientational Order Parameter . We use the bond-orientational order parameter defined by Steinhard et al., a well established metric in the study of packed spheres [27], to analyze local connectivity around each residue: (1) are the spherical harmonic functions for a bond vector from residue i to n , (cid:537) and (cid:307) are the polar angles of this bond. is the total number of such contacts of i . This form of the order parameter is invariant under reorientations of the external coordinate system. In defining a mean protein environment, we further average over all residues; (2) Among different choices for l , is commonly employed as the bond orientational parameter, since it concurrently yields non-zero values for hexagonal close packed, cubic (simple, body centered and face centered) and icosahedral configurations [27,28,29]. Anisotropic network model.

In ANM , the networks are formed as described under the subsection Network Construction and the interactions between nodes is considered to be due to harmonic potentials [13].

Nodes within the predetermined cut-off distance r c are coupled by elastic springs having a uniform force constant (cid:534) . Thus, the overall potential of the molecule is given by the sum of all harmonic potentials among interacting nodes such that (3) is the average distance between residues i and j . For a network of N nodes, the Hessian is a 3 N x N matrix formed by a number of N super elements . The off-diagonal super elements of ( i (cid:143) j ), obtained from the second derivative of the total potential with respect to node positions, are given by (4) where X ij , Y ij , and Z ij are the Cartesian components of the distance vector . The diagonal super elements are given by . The elements of the inverse of the Hessian, G = H -1 , may be used to predict the auto- and cross-correlations of residues. G may be viewed as an N (cid:238)(cid:3) N matrix whose ij th element is the 3 (cid:238)(cid:22)(cid:3)(cid:80)(cid:68)(cid:87)(cid:85)(cid:76)(cid:91)(cid:3)(cid:82)(cid:73)(cid:3)(cid:70)(cid:82)(cid:85)(cid:85)(cid:72)(cid:79)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:3)(cid:69)(cid:72)(cid:87)(cid:90)(cid:72)(cid:72)(cid:81)(cid:3)(cid:87)(cid:75)(cid:72)(cid:3) x -, y -, and z -components of the fluctuations and of residues i and j ; i.e., (5) The correlations between residue pairs are obtained from the trace of its components, . In particular, the auto-correlations are the average residue fluctuations in space, and are directly proportional to the experimentally measured B-factors: (6) RESULTS Structural heterogeneity of amino acid distributions in proteins.

The local environment of a residue is imposed by the spatial organization of the other residues. Studies exploring possible correspondence between various forms of local order and the amino acid packing have been long explored [1,2,30,31,32]. Obviously, there is no single matching structure; instead, one finds traces of various well-known ordered states like the icosahedral and face centered cubic arrangements. The basic premise of the ANM is the local orientational heterogeneity which establishes unique force balance constraints around each node [13]. In the cut-off based models, local anisotropy is a natural function of the coordination radius. For a better understanding of this spatial dependence, we use the bond orientational parameter Q (equation 2). In a fully extended chain conformer, this parameter is exactly one and for a fully random distribution of bonds, it is zero. Q has also been reported for common regular packing arrangements; e.g. it is 0.57, 0.51 and 0.35 for 13-atom Face Centered Cubic (FCC), 15-atom Body Centered Cubic (BCC) and 7-atom Simple Cubic (SC) cluster [27], respectively, within their first coordination shell. In figure 1, we present Q for three different proteins, as well as the average for 8940 residues pertaining to 30 different proteins. As expected, at lower cut-off distances, we find lower symmetry (higher anisotropy) in the average residue environment. At smaller cut-off choices, a better correspondence is established between the environment of an average residue and that of common cubic crystallographic arrangements (FCC, BCC, SC). For example, assuming packing of spherical particles with a diameter of (cid:22)(cid:17)(cid:26)(cid:3)(cid:99)(cid:15)(cid:3) equal to average nearest neighbor C (cid:302) -C (cid:302)(cid:3) atom distances, Q is (cid:19)(cid:17)(cid:21)(cid:26)(cid:3)(cid:73)(cid:82)(cid:85)(cid:3)(cid:41)(cid:38)(cid:38)(cid:3)(cid:68)(cid:81)(cid:71)(cid:3)(cid:54)(cid:38)(cid:3)(cid:79)(cid:68)(cid:87)(cid:87)(cid:76)(cid:70)(cid:72)(cid:86)(cid:3)(cid:68)(cid:87)(cid:3)(cid:25)(cid:3)(cid:99)(cid:15)(cid:3)(cid:90)(cid:75)(cid:76)(cid:79)(cid:72)(cid:3)(cid:76)(cid:87)(cid:3)(cid:76)(cid:86)(cid:3)(cid:19)(cid:17)(cid:22)(cid:22)(cid:3)(cid:73)(cid:82)(cid:85)(cid:3)(cid:87)(cid:75)(cid:72)(cid:3)(cid:68)(cid:89)(cid:72)(cid:85)(cid:68)(cid:74)(cid:72)(cid:3) residues. With the addition of bonds in a larger coordination space, local anisotropy gradually decreases. This intuition is strongly validated for crystallographic configurations when further coordination shells are taken into account, where Q (cid:167)(cid:3) (cid:19)(cid:3) (cid:68)(cid:87)(cid:3) (cid:71)(cid:76)(cid:86)(cid:87)(cid:68)(cid:81)(cid:70)(cid:72)(cid:86)(cid:3) (cid:79)(cid:68)(cid:85)(cid:74)(cid:72)(cid:85)(cid:3) (cid:87)(cid:75)(cid:68)(cid:81)(cid:3) (cid:28)(cid:3) (cid:99)(cid:17)(cid:3) However, for proteins, the newly formed orientational order is nontrivial and noticeably persistent above r c =10 (cid:99) converging to around 0.13. The plateau which is reached by Q hints a non-negligible residual anisotropy and is comparable in value to what has been computed for a supercooled Lennard-Jones system [27]. The non-vanishing and uniform value of the bond orientational order at larger cut-off settings suggests that the network structure is well-preserved even if we introduce new contacts to the system. The smallest distance at which this particular bond orientational order builds up can be seen as a core structure which contains the essence of structural dynamics exhibited by the molecule. Motivated by this observation, we shall partition the Hessian matrix into two, as the essential and residual parts. This simplifies the picture and brings in valuable insight for the cut-off problem as we show next. Partitioning the Hessian into its essential and residual components.

The distributions studied in the previous subsection lead us to further examine the effect of the geometrical features brought into the system for the different choices of contacts. The system properties of interest studied by network models all rely on the construction of the Hessian matrix. Contacts within a chosen cut-off distance are all assumed to interact identically (i.e. uniform as in equation 4). Therefore, we study how the eigenvalue and eigenvector structure of the Hessian is modified by these choices. We first partition the Hessian into two parts: (8) We postulate that contains information due to the essential contacts of the nodes, whereas is the residual part where the interactions are added in a spherically symmetrical manner around the nodes. We assume that there is a total of interactions between pairs of nodes and r of these are of the latter type. Here m i is the number of interactions of the i th node. Equation 8 may be expanded as (9) where the eigenvalue decomposition and used in the latter term. Inasmuch as the eigenvectors of the and matrices are uncorrelated, the product and therefore will vanish and the Hessian will be . In practice, a complete independence of the two matrices is not expected, mainly due to the finite size, and hence the surface effects, of the protein systems studied. Then these interactions are not perfectly uncorrelated, and the Hessian is approximated by . In the Appendix, we outline the conditions under which this approximation may be achieved. The changes that are brought about by the term will have the result of modifying the eigenvalues of the matrix so that their order might be affected, and the corresponding eigenvectors will be perturbed. The eigenvectors of the lowest eigenvalues will be the least sensitive to these perturbative effects as we outline below [also see Chapter 8 in [33]]. First, we map all the eigenvalues of H into the interval , to alleviate the size effects while comparing the spectral properties obtained with different cut-offs, and for proteins of different sizes. This is achieved by rescaling the super-elements as, (10) The spectral stability theory of non-singular matrices provides upper bounds for the perturbation of eigenvectors under matrix perturbations. Let p i be an eigenvector and q i its perturbed counterpart. An upper bound for the angle between the two vectors is [34] (11) The gap function gap( i , A ) for a matrix A is defined as the smallest difference between a given eigenvalue and the remaining elements of the spectrum; . is the Frobenius norm. Due to the natural singularity of the Hessian owing to rigid body motions, equation 11 is not directly applicable to compute an upper bound for (cid:537) . Moreover, the value of the gap function for a given mode number i depends on the cut-off distance at which the H r matrix is formed. Nevertheless, the analysis pointed out by equation 11 is insightful in that the shift in a given eigenvector is expected to get lower when the Frobenius norm is minimized. We probe the dependence of the Frobenius norm of H (cid:130) on r c as shown in Figure 2 for six proteins of various sizes. For each case, we note minima by the arrow. Typically, the curves decrease from a high value at low r c to a minimum, r c * . During this initial decrease phase, the additional contacts that are included in the calculation of H (cid:130) bring in non-trivial structural information that contributes to the lowering of the Frobenius norm. At r c values higher than r c * , on the other hand, the new contacts do not modify the overall orientational structure Q , hence the force balance around the nodes, of the system (figure 1). However, the monotonic addition of these uniform structural elements increases their overall weight in the Frobenius norm, resulting in an increase in the calculated values. Invariant eigenvectors.

To quantify the effect of the perturbation introduced by H r on the eigenvectors of , we first compute the eigenvector, p i ( r c ), that belongs to a selected mode i at a series values spanning r c = 8 (cid:177) (cid:99) . Each of these eigenvectors has 3 N components. We then evaluate the projection of every pair of these eigenvectors obtained at different r c values by the dot product, p i ( r c1 (cid:12)(cid:194) p i ( r c2 ). For eigenvectors that are slightly perturbed, this value is close to 1 and for orthogonal eigenvectors it is zero. In figure 3, we exemplify the modifications in the lowest two non-trivial modes (i.e. modes 7 and 8) for the A chain of succinyl-CoA synthase (PDB code 1scu) by contour maps corresponding to these calculations. In this figure, the lower diagonal represents the projection of the eigenvectors of the slowest mode, and the upper diagonal is that for the second slowest mode. As is evident, both of these modes preserve their directionality for all r c (cid:33)(cid:3) (cid:20)(cid:20)(cid:3) (cid:99) . For example, the eigenvectors representing the two slowest modes are identical for the matrix formed at r c (cid:32)(cid:3) (cid:20)(cid:21)(cid:3) (cid:99)(cid:15)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) (cid:76)(cid:87)(cid:86)(cid:3) (cid:83)(cid:72)(cid:85)(cid:87)(cid:88)(cid:85)(cid:69)(cid:72)(cid:71)(cid:3) (cid:73)(cid:82)(cid:85)(cid:80)(cid:3) (cid:82)(cid:69)(cid:87)(cid:68)(cid:76)(cid:81)(cid:72)(cid:71)(cid:3) (cid:68)(cid:87)(cid:3) r c (cid:32)(cid:3) (cid:21)(cid:24)(cid:3) (cid:99) (these two points are marked with a star sign on the figure). The two most collective modes are found to yield similar results to those in Figure 3 for all proteins studied. However, it has been customary to follow the behavior of up to the first 20 slowest modes in protein calculations. It is therefore of interest to see the extent to which the eigenvectors, whose shapes are related to protein function in many studies, are sensitive to the choice of network construction. In Table 1, we report our calculations based on a set of 25 non-homolog proteins of various sizes. We find that, out of the 20 eigenvectors corresponding to the slowest eigenvalues obtained at r c1 (cid:32)(cid:3)(cid:20)(cid:21)(cid:3)(cid:99) , from 8 (cid:177)

20 eigenvectors are approximately invariant to the choice of larger r c2 , exemplified here by r c2 (cid:32)(cid:3) (cid:20)(cid:24)(cid:3) (cid:99)(cid:3)(cid:11)(cid:70)(cid:68)(cid:86)(cid:72)(cid:3) (cid:36)(cid:12)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) r c2 (cid:32)(cid:3) (cid:20)(cid:27)(cid:3) (cid:99)(cid:3) (case B). Our criterion for invariance is that the dot product p i ( r c1 (cid:12)(cid:194) p i ( r c2 (cid:12)(cid:3)(cid:149)(cid:3)(cid:19)(cid:17)(cid:26) . Thus, in all the cases we examine, the two slowest non-trivial modes of motion are approximately invariant to matrix construction strategy. In contrast, the invariance of the remaining modes is highly dependent on the particular protein structure. There is a statistically significant dependence of the number of invariant modes on protein size (Pearson correlation is 0.57 and 0.58 for the respective cases). This reflects the fact that the increased number of redundant interactions contribute to the conditions outlined in the Appendix, leading to the separability of the essential and redundant parts of the Hessian, despite the increase in the number of surface residues. Note that the bond orientational parameter Q is essentially the same for the core and surface residues (data not shown). DISCUSSION

In recent years, network models of proteins, RNA and their complexes have opened up previously unexplored areas of study, since the level of coarse graining adopted has been shown to describe several important phenomena unique to these self-assembled systems. The findings are mainly based on the observation that a simplified harmonic potential (equation 4) is capable of describing the collective modes of motion [7], which also are associated with the basic functioning of these molecular machines [35]. First, it was demonstrated that the Debye-Waller factors obtained from X-ray crystallography correlate with the fluctuations predicted by the theory [11]. This led to the study of the cross-correlations between the different parts of the system with confidence, leading to information not directly accessible by experiments; in particular, the coupled motions in the low frequency regions were found to shed light on many experimental findings and were utilized to uncover some mechanistic features; see, e.g. [36]. It was later shown that the eigenvectors associated with the lowest frequencies of motion also described the conformational changes accompanying binding [16,35,37,38,39,40]. The level of success in the latter work depends on the degree of collectivity displayed by the particular protein [22]. The number of modes that describe the essential motions is highly specific to the protein, or even to the different ligand bound forms of the same protein [21]. In yet other studies, the mode that best describes the conformational change was monitored to see if mutations of certain residues (carried out by modifying the force constant of contacting pairs, (cid:534) i in equation 4) affected the dominance of that particular mode [41,42]. Further, monitoring the response of the protein to local structural deformations [43] leads to valuable information on function and allosteric response [16,44]. The level of success of these studies, which all depend on the quality of the constructed Hessian, in relation to the method of network construction has not been addressed systematically. One exception is the work by da Silveira et al, which focuses on the residue contact properties for different network construction strategies [45]. Our analysis in the previous section uncovers spectral properties of the Hessian; in particular, the robustness of the most collective modes is based on properly including the local structure of proteins in the Hessian, whereas the longer range interactions build a redundant set. We discuss the implications of these findings by specific examples. Mean-square fluctuations of residues.

The residue-by-residue mean square fluctuations in a given protein are frequently exploited as a first step while constructing residue networks. The predictions of equation 6 are compared with the values obtained either experimentally or from MD simulations, and the r c value that best-represents the fluctuation profiles are selected to further study the system properties. We find for a set of 50 proteins that the correlation between the mean-square fluctuations of C (cid:302) atoms and the theoretical predictions of equation 6 improve as the cut-off distance is increased. This curious observation is valid up to r c values of at least (cid:21)(cid:24)(cid:3)(cid:99) . Two examples are displayed in figure 4: In figure 4a, we compare the mean square fluctuations calculated via MD simulations at 300 K of hen egg white lysozyme (HEWL, PDB code 6lyz, 129 residues) with the predictions from selected theoretical models of r c = 8, (cid:20)(cid:25)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) (cid:21)(cid:24)(cid:3) (cid:99) . The details of the MD simulations are given in reference [46]. A lower correlation coefficient of 0.52 between the predicted and MD-calculated values is obtained at r c (cid:32)(cid:3) (cid:27)(cid:3) (cid:99)(cid:17)(cid:3) (cid:55)(cid:75)(cid:76)(cid:86)(cid:3) (cid:76)(cid:86)(cid:3) (cid:71)(cid:88)(cid:72)(cid:3) (cid:87)(cid:82)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) (cid:79)(cid:68)(cid:85)(cid:74)(cid:72)(cid:3) (cid:73)(cid:79)(cid:88)(cid:70)(cid:87)(cid:88)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:3) of residues 47-48 and 68-70 belonging to the (cid:79)(cid:82)(cid:82)(cid:83)(cid:86)(cid:3)(cid:90)(cid:76)(cid:87)(cid:75)(cid:76)(cid:81)(cid:3)(cid:87)(cid:75)(cid:72)(cid:3)(cid:533)(cid:3)(cid:86)(cid:75)(cid:72)(cid:72)(cid:87)(cid:3) region defined by residues (43-45, 51-53, 58-59, 64-65, 78-79). The important interactions of these loops with the rest of the protein are omitted at low r c values, leading to much larger fluctuation patterns than is actually present. At the higher r c values, exemplified by r c = 16 (cid:99)(cid:3) in figure 4a, these local interactions are restored, and the correlations of 0.90 is obtained. At r c (cid:33)(cid:3)(cid:20)(cid:27)(cid:3)(cid:99)(cid:3)(cid:87)(cid:75)(cid:72)(cid:3)(cid:70)(cid:82)(cid:85)(cid:85)(cid:72)(cid:79)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3)(cid:76)(cid:86)(cid:3)(cid:86)(cid:82)(cid:80)(cid:72)(cid:90)(cid:75)(cid:68)(cid:87)(cid:3)(cid:79)(cid:82)(cid:90)(cid:72)(cid:85)(cid:72)(cid:71)(cid:3)(cid:71)(cid:88)(cid:72)(cid:3)(cid:80)(cid:68)(cid:76)(cid:81)(cid:79)(cid:92)(cid:3) to the suppressed fluctuations of the loop containing residues 47 and 48 when they are connected to long range residues (e.g. at r c (cid:32)(cid:3)(cid:21)(cid:24)(cid:3)(cid:99)(cid:3)(cid:76)(cid:81)(cid:3)(cid:73)(cid:76)(cid:74)(cid:88)(cid:85)(cid:72)(cid:3)(cid:23) a, it is 0.76). Nevertheless, the mean square fluctuation profiles are correctly captured at all r c (cid:33)(cid:3) (cid:20)(cid:19)(cid:3) (cid:99)(cid:15)(cid:3) (cid:90)(cid:76)(cid:87)(cid:75)(cid:3) (cid:70)(cid:82)(cid:85)(cid:85)(cid:72)(cid:79)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:3) (cid:82)(cid:73)(cid:3) r c (cid:88)(cid:83)(cid:3)(cid:87)(cid:82)(cid:3)(cid:20)(cid:27)(cid:3)(cid:99)(cid:3) for 6lyz (Table 1). In figure 4b, we present another example for a 263 residue -class protein (PDB code: 1arb), where the residue-by-residue experimental B-factors (bottom curve in gray) are compared with selected theoretical models: A relatively low correlation is obtained at r c = (cid:27)(cid:3) (cid:99) ; in particular, the fluctuations of surface loop residues 15 (cid:177)

20 and 135 (cid:177)

145 are overestimated due to the absence of important core-region contacts that are not taken into account at this cut-off distance. The r c = 16 (cid:99)(cid:3)(cid:80)(cid:82)(cid:71)(cid:72)(cid:79)(cid:3)(cid:70)(cid:68)(cid:83)(cid:87)(cid:88)(cid:85)(cid:72) s the experimentally determined fluctuation patterns, which remains unaltered at higher cut-offs. The Pearson correlation coefficients are 0.57, 0.81 and 0.82, at the respective cases displayed in figure 4b. We emphasize that the behavior exemplified by figure 4 is not unique to these two proteins, but is rather a common property of all proteins. Thus, one may conveniently partition the Hessian into two (equation 8), where contains information due to the essential contacts of the matrix whereas is the residual part where the interactions are added in a symmetrical manner around the nodes beyond the adopted r c value (see the orientational order Q in figure 1). The lowest eigenvalues are modified in a small window based on this partitioning, and the corresponding eigenvectors remain unchanged (figure 3 and Table 1). As a result, the inverse of the Hessian is nominally modified, accompanied by slight changes in the predicted C fluctuations (equation 6). A case study on predicting conformational change upon ligand binding.

Inasmuch as the global motions are dominated by one or the superposition of a few collective modes, the results for properties based-on these motions will not change appreciably with the different choices of r c . Due to the invariance of the eigenvectors under a perturbation to the essential part of the Hessian, the mode based predictions on the direction of motion between the unbound and bound conformations of the protein are also expected to converge. An example is shown in figure 5 for the protein adenylate kinase (ADK), for which the eigenvector that belongs to the lowest eigenvalue is known to describe the conformational change with high accuracy due to the collective behavior of the hinge motion between its NMP binding (residues 30-67) and LID domains (residues118-167) [47]. The eigenvectors corresponding to the two slowest modes of motion of apo ADK are plotted on the PDB structure in figure 5a. The eigenvectors obtained at r c (cid:32)(cid:3)(cid:27)(cid:3)(cid:99)(cid:3)(cid:11)(cid:82)(cid:85)(cid:68)(cid:81)(cid:74)(cid:72)(cid:12)(cid:3)(cid:68)(cid:81)(cid:71)(cid:3) r c =18 (cid:99)(cid:3)(cid:11)(cid:74)(cid:85)(cid:72)(cid:72)(cid:81)(cid:12)(cid:3)(cid:68)(cid:85)(cid:72)(cid:3)(cid:86)(cid:75)(cid:82)(cid:90)(cid:81)(cid:3)(cid:73)(cid:82)(cid:85)(cid:3)(cid:69)(cid:82)(cid:87)(cid:75)(cid:3)(cid:80)(cid:82)(cid:71)(cid:72)(cid:86)(cid:17)(cid:3)(cid:55)(cid:75)(cid:72)(cid:3)(cid:80)(cid:82)(cid:86)(cid:87)(cid:3)(cid:70)(cid:82)(cid:79)(cid:79)(cid:72)(cid:70)(cid:87)(cid:76)(cid:89)(cid:72)(cid:3)(cid:80)(cid:82)(cid:71) e is that of hinge bending of the LID (top) and NMP-binding (bottom) domains, and the next one is a twisting mode of these two domains. These modes are well separated at all r c values. In fact, mode 7 suffices to describe the conformational change upon ligand binding as shown in figure 5b. Here, we display in gray the residue-by-residue displacements of the apo and Ap5-bound structures of ADK obtained from the difference of their superposed experimental x-ray structures (PDB codes 4ake and 1ake, respectively). Also shown on the figure are the magnitudes of the eigenvector components acting on the residues from mode 7. The Pearson correlation between the experimental and theoretical curves is 0.9 for all cut-off distances at and above r c (cid:32)(cid:3) (cid:27)(cid:3) (cid:99)(cid:17)(cid:3) (cid:55)(cid:75)(cid:72)(cid:3) (cid:79)(cid:68)(cid:85)(cid:74)(cid:72)(cid:86)(cid:87)(cid:3) (cid:71) iscrepancy between theory and experiment is observed in the region spanning NMP binding domain residues; the conformational change of the LID domain on the opposite side is faithfully reproduced irrespective of the choice of r c . Moreover, the prediction of the overall conformational change does not change with r c , although, it may be improved by the inclusion of additional modes. Shifts in eigenvalue ordering and its relation to physical interpretations.

The increase in the correlation coefficient with r c as well as its persistence to very high r c implies that the main ingredients that contribute to the fluctuation predictions are present in the Hessian obtained at a relatively low r c , and the additional contacts act as a perturbation to this (cid:179)(cid:72)(cid:86)(cid:86)(cid:72)(cid:81)(cid:87)(cid:76)(cid:68)(cid:79)(cid:180)(cid:3) (cid:83) art of the matrix. However, one has to take extreme care in interpreting dynamical properties of proteins based-on the collective eigenvectors, as the ordering of global modes may shift with the selected cut-off distance as we exemplify next. We begin by noting that, when matrices are perturbed by the addition of a diagonal matrix (as is approximated by the current systems and outlined in the Appendix), the eigenvalues of the perturbed matrix are interlaced; i.e., ; i =2: n [see Section 8.1 in reference [33]]. In some cases, the gap between consecutive eigenvalues may be small enough so that their ordering changes. However, the associated eigenvectors are robust to the change in the eigenvalue order in the region where (cid:540) i is small. In figure 6, we exemplify one case where such a swapping of eigenvalue ordering occurs between the two slowest eigenvalues of HEWL. We first examine in figure 6a, the projection of the eigenvectors of the two most collective modes at different r c in the order they appear; e.g. p ( r c1 (cid:12)(cid:194) p ( r c2 ) and p ( r c1 (cid:12)(cid:194) p ( r c2 ). As in figure 3, the lower diagonal represents the projection of the eigenvectors of the slowest mode, and the upper diagonal is that for the second slowest mode. We find that the eigenvectors of each mode are parallel to each other in the cut-off ranges 8 < r c1 , r c2 < 16 and 16 < r c1 , r c2 (cid:31)(cid:3) (cid:21)(cid:22)(cid:3) (cid:99)(cid:17)(cid:3) (cid:43)(cid:82)(cid:90)(cid:72)(cid:89)(cid:72)(cid:85)(cid:15)(cid:3) (cid:76)(cid:81)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) (cid:85)(cid:68)(cid:81)(cid:74)(cid:72)(cid:3) (cid:27)(cid:3) (cid:31)(cid:3) r c1 < 16 and 16 < r c2 (cid:31)(cid:3) (cid:21)(cid:22)(cid:3) (cid:99)(cid:15)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) eigenvectors are nearly orthogonal for both modes. Evidently, at r c (cid:32)(cid:3)(cid:20)(cid:25)(cid:3)(cid:99)(cid:15)(cid:3)(cid:87)(cid:75)(cid:72)(cid:85)(cid:72)(cid:3)(cid:76)(cid:86)(cid:3)(cid:68)(cid:3)(cid:70)(cid:75)(cid:68)(cid:81)(cid:74)(cid:72)(cid:3) in the projection profile of vectors p ( r c > 16 (cid:99)(cid:12)(cid:3) (cid:82)(cid:81)(cid:3) p ( r c (cid:31)(cid:3) (cid:20)(cid:25)(cid:3) (cid:99)(cid:12)(cid:17)(cid:3) We next recalculate the (cid:83)(cid:85)(cid:82)(cid:77)(cid:72)(cid:70)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:15)(cid:3)(cid:68)(cid:86)(cid:86)(cid:88)(cid:80)(cid:76)(cid:81)(cid:74)(cid:3)(cid:87)(cid:75)(cid:68)(cid:87)(cid:3)(cid:87)(cid:75)(cid:72)(cid:3)(cid:72)(cid:76)(cid:74)(cid:72)(cid:81)(cid:89)(cid:68)(cid:79)(cid:88)(cid:72)(cid:3)(cid:82)(cid:85)(cid:71)(cid:72)(cid:85)(cid:76)(cid:81)(cid:74)(cid:3)(cid:76)(cid:86)(cid:3)(cid:86)(cid:90)(cid:68)(cid:83)(cid:83)(cid:72)(cid:71)(cid:3)(cid:68)(cid:87)(cid:3)(cid:20)(cid:25)(cid:3)(cid:99)(cid:3)(cid:71)(cid:88)(cid:72)(cid:3)(cid:87)(cid:82)(cid:3)(cid:76)(cid:81)(cid:87)(cid:72)(cid:85)(cid:79)(cid:68)(cid:70)(cid:76)(cid:81)(cid:74)(cid:30)(cid:3)(cid:76)(cid:17)(cid:72)(cid:17)(cid:3) we calculate the projection p ( r c1 (cid:12)(cid:194) p ( r c2 ) for the latter range of cut-off values. The (cid:179)(cid:70)(cid:82)(cid:85)(cid:85)(cid:72)(cid:70)(cid:87)(cid:72)(cid:71)(cid:180)(cid:3)(cid:83)(cid:85)(cid:82)(cid:77)(cid:72)(cid:70)(cid:87)(cid:76)(cid:82)(cid:81)(cid:86)(cid:3) are shown in figure 6b. In this case, the effect of the perturbation on the slowest eigenvectors of the base Hessian is negligible (the dot product is 0.75 or better in all cases). We track the corresponding eigenvalues as a function of r c in figure 6c, where the swapping is evident. We plot an eigenvector on the protein structure in figure 6d to visualize one of the major modes of action, and its robustness towards the choice of r c . It corresponds to that labeled mode i in figures 6b and 6c obtained at r c = 12 (cid:68)(cid:81)(cid:71)(cid:3) (cid:20)(cid:27)(cid:3) (cid:99)(cid:3) (cid:11)(cid:82)(cid:85)(cid:68)(cid:81)(cid:74)(cid:72)(cid:3) (cid:68) nd green arrows, respectively). HEWL displays complex motions during its dynamics around the equilibrium state. Both of the major modes contributing to these motions act on the regions marked in red on the figure. They span the C-terminus residues and two (cid:79)(cid:82)(cid:82)(cid:83)(cid:86)(cid:3)(cid:77)(cid:82)(cid:76)(cid:81)(cid:76)(cid:81)(cid:74)(cid:3)(cid:533)(cid:3)(cid:86)(cid:87)(cid:85)(cid:68)(cid:81)(cid:71)(cid:86)(cid:3)(cid:11) (cid:533)(cid:3) (cid:86) heet region is marked in yellow. The motion shown is a twisting of the loops; the motion not displayed also corresponds to the twisting of these loops, 0 albeit in an orthogonal direction. Mode swapping and lack of dominance of a single mode exemplified here for HEWL suggest that, the complex motions in certain proteins may only be understood by studying the superposition of several such motions. Is there a recipe for residue network construction in protein models?

In this study, the degree of success of network models constructed from the PDB coordinates of proteins is shown to converge if the cut-off distance used is larger than a threshold value. We find that this value incorporates all the local essential interactions. A choice of r c in the vicinity of 16 (cid:99) covers the neighborhood structure of an arbitrary protein and its eigenvalue spectra [13]. However, for large proteins, this will introduce a large number of interactions which will be computationally demanding during the matrix inversion procedure. In such cases, one may resort to compute bond orientational order parameters and choose the optimal value when the structural descriptor Q converges. For large proteins the number of nodes will be high enough to obtain statistics for smooth curves where the peaks may be discerned, a problem that cannot be circumvented for small system sizes. However, this approach is computationally demanding for large systems and higher cut-off values; e.g. to calculate Q in the r c range of 5 (cid:177) (cid:21)(cid:27)(cid:3)(cid:99)(cid:3)(cid:82)(cid:81)(cid:3)(cid:68)(cid:3)(cid:70)(cid:82)(cid:80)(cid:83)(cid:88)(cid:87)(cid:72)(cid:85)(cid:3)(cid:90)(cid:76)(cid:87)(cid:75)(cid:3)(cid:21)(cid:17)(cid:24)(cid:3)(cid:42)(cid:43)(cid:93)(cid:3)(cid:44)(cid:81)(cid:87)(cid:72)(cid:79)(cid:3)(cid:52)(cid:88)(cid:68)(cid:71)(cid:3)(cid:70)(cid:82)(cid:85)(cid:72)(cid:3)(cid:38)(cid:51)(cid:56)(cid:3)(cid:68)(cid:81)(cid:71)(cid:3)(cid:27)(cid:42)(cid:37)(cid:3)(cid:53)(cid:36)(cid:48)(cid:3)(cid:73)(cid:82)(cid:85)(cid:3) the protein PDB codes 6lyz, 1ad2 and 1scu takes 740, 1700, and 3000 seconds, respectively. The matrix Frobenius norm of the scaled Hessian, especially for large systems constitutes an attractive alternative route due to its low computational cost and straightforward implementation. Thus, for any given protein, monitoring any of these three measurables, i.e. the eigenvalue spectra, orientational order, or Frobenius norm, will give an idea on the range of values that may be used for r c . Nevertheless, it is advisable to track the inner products of eigenvectors to ascertain that the modes of interest are robust to this selection. Furthermore, if the properties of interest rely on the predictions based on a few selected modes, one must also track the eigenvectors in case interlacing of the eigenvalues occurs. A quick scan of the projections of a given mode obtained at different network constructions will reveal such anomalies. In summary, we have shown that the slow modes are robust to the details of network construction once the essential contacts in the first few coordination shells are included. Therefore, the properties that depend on the most collective modes may be studied independent of this choice. This is in contrast to the modes that affect the medium to high frequency motions, since interlacing and mode shifts will lead to unpredictable changes in the eigenvectors. Therefore, in studies deriving information by relying on the superposition of a large number of modes, developing a sound network construction strategy is essential. APPENDIX

We have previously shown [12,13] that the Hessian may be decomposed into the product, (A1) where B is the 3 N (cid:238)(cid:3) M direction cosine matrix. The overall interactions are also written as the sum of the and matrices, containing the essential and the residual interactions, respectively. By substitution into equation 8, the Hessian may thus be expressed as (A2) may also be viewed as an N (cid:238)(cid:3) N supermatrix (equation 4) whose ij th element is the 3 (cid:238)(cid:22)(cid:3) matrix .

1 The cross terms in equation A2, and are each zero. To see why, consider the matrix with dimension 3 N (cid:238)(cid:3) M whose last ( r ) columns have zero elements and the matrix with dimension M (cid:238)(cid:3)(cid:22) N whose first ( M (cid:177) r ) rows have zero elements. Then, = (A3) Here, if bead i participates in the m th interaction, then the terms (3 i , m ), (3 i+ m ), and (3 i+ m ) are non-zero and consist of the direction cosine of bead i along interaction m , in the x -, y -, and z -directions, respectively. They are zero otherwise. A similar argument holds for . We now denote the elements of the and matrices by and , respectively, where p is the x -, y -, or the z -direction. In addition, the direction cosines for the interaction between beads i and j have the relationship . Then, the elements of the remaining terms in equation A2 may be explicitly written. For the essential component we have the terms, (A4) (A5) For the residual component, we have similar matrix elements for and with replacing terms. We now assume that the elements of the matrix are dependent on each other due to the underlying local structure of the protein, whereas those of the matrix are independent. For a sufficiently large number ( r ) of residual interactions appearing in such independent directions, we have the following approximate values for the diagonal super-elements of the residual component: (A6) with r i being the number of redundant interactions of each residue and . In addition, there is the constraint due to the law of cosines, . Substituting these limits, the off-diagonal and diagonal elements of the super-matrix elements of H r are in the bounds (A7) 2 (A8) For the cut-off distances considered in this study, r i > 10. For example, for the protein with PDB code 1sca which has 287 nodes, if the essential part of the Hessian is constructed at r c = (cid:20)(cid:21)(cid:3) (cid:99)(cid:15)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) (cid:83)(cid:72)(cid:85)(cid:87)(cid:88)(cid:85)(cid:69)(cid:72)(cid:71)(cid:3) (cid:43)(cid:72)(cid:86)(cid:86)(cid:76)(cid:68)(cid:81)(cid:3) (cid:76)(cid:86)(cid:3) (cid:70)(cid:82)(cid:81)(cid:86)(cid:87)(cid:85)(cid:88)(cid:70)(cid:87)(cid:72)(cid:71)(cid:3) (cid:68)(cid:87)(cid:3) (cid:20) (cid:99)(cid:15)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) (cid:68)(cid:89)(cid:72)(cid:85)(cid:68)(cid:74)(cid:72)(cid:3) (cid:81)(cid:88)(cid:80)(cid:69)(cid:72)(cid:85)(cid:3) (cid:82)(cid:73)(cid:3) (cid:85)(cid:72)(cid:86)(cid:76)(cid:71)(cid:88)(cid:68)(cid:79)(cid:3) interactions per node is 33.Thus, the leading terms of are along the diagonal and it may be approximated by a diagonal matrix. The effect of the perturbations brought about by the matrix with such approximate bounds, on the eigenvalues and the corresponding eigenvectors of the are thoroughly investigated in the text with accompanied examples. ACKNOWLEDGEMENTS. We thank Ibrahim Inanc for useful discussions. REFERENCES

1. Soyer A, Chomilier J, Mornon JP, Jullien R, Sadoc JF (2000) Voronoi tessellation reveals the condensed matter character of folded proteins. Physical Review Letters 85: 3532-3535. 2. Raghunathan G, Jernigan RL (1997) Ideal architecture of residue packing and its observation in protein structures. Protein Science 6: 2072-2083. 3. Atilgan AR, Akan P, Baysal C (2004) Small-world communication of residues and significance for protein dynamics. Biophysical Journal 86: 85-91. 4. Liang J, Dill KA (2001) Are Proteins Well Packed? Biophys J 81: 751-766. 5. Zhang JF, Chen R, Tang C, Liang J (2003) Origin of scaling behavior of protein packing density: A sequential Monte Carlo study of compact long chain polymers. Journal of Chemical Physics 118: 6102-6109. 6. ben-Avraham D (1993) Vibrational Normal-Mode Spectrum of Globular-Proteins. Physical Review B 47: 14559-14560. 7. Bahar I, Atilgan AR, Demirel MC, Erman B (1998) Vibrational dynamics of folded proteins: Significance of slow and fast motions in relation to function and stability. Physical Review Letters 80: 2733-2736. 8. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology 15: 586-592. 9. Cui Q (2006) Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems; Raton B, editor. FL, USA: Chapman & Hall/CRC. 10. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters 77: 1905-1908. 11. Bahar I, Atilgan AR, Erman B (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design 2: 173-181. 12. Yilmaz LS, Atilgan AR (2000) Identifying the adaptive mechanism in globular proteins: Fluctuations in densely packed regions manipulate flexible parts. Journal of Chemical Physics 113: 4454-4464. 13. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, et al. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophysical Journal 80: 505-515. 14. Gerek ZN, Keskin O, Ozkan SB (2009) Identification of specificity and promiscuity of PDZ domain interactions through their dynamic behavior. Proteins-Structure Function and Bioinformatics 77: 796-811. 15. Song G, Jernigan RL (2007) vGNM: A better model for understanding the dynamics of proteins in crystals. Journal of Molecular Biology 369: 880-893. 16. Atilgan C, Ozkan SB, Gerek ZN, Atilgan AR (2010) Manipulation of conformational change in proteins by single residue perturbations. Biophys J submitted. 17. Delarue M, Sanejouand YH (2002) Simplified normal mode analysis of conformational transitions in DNA-dependent polymerases: the Elastic Network Model. Journal of Molecular Biology 320: 1011-1024. 18. Doruker P, Atilgan AR, Bahar I (2000) Dynamics of proteins predicted by molecular dynamics simulations and analytical approaches: Application to alpha-amylase inhibitor. Proteins-Structure Function and Genetics 40: 512-524. 19. Haliloglu T, Bahar I (1999) Structure-based analysis of protein dynamics: Comparison of theoretical results for hen lysozyme with X-ray diffraction and NMR relaxation data. Proteins-Structure Function and Genetics 37: 654-667. 20. Temiz NA, Meirovitch E, Bahar I (2004) Escherichia coli adenylate kinase dynamics: Comparison of elastic network model modes with mode-coupling N-15-NMR relaxation data. Proteins-Structure Function and Bioinformatics 57: 468-480. 4 21. Petrone P, Pande VS (2006) Can conformational change be described by only a few normal modes? Biophysical Journal 90: 1583-1593. 22. Yang L, Song G, Jernigan RL (2007) How well can we understand large-scale protein motions using normal modes of elastic network models? Biophysical Journal 93: 920-929. 23. Yang L, Song G, Jernigan RL (2009) Protein elastic network models and the ranges of cooperativity. Proc Natl Acad Sci U S A 106: 12347-12352. 24. Dupuis F, Sadoc JF, Jullien R, Angelov B, Mornon JP (2005) Voro3D: 3D Voronoi tessellations applied to protein structures. Bioinformatics 21: 1715-1716. 25. Bode C, Kovacs IA, Szalay MS, Palotai R, Korcsmaros T, et al. (2007) Network analysis of protein dynamics. Febs Letters 581: 2776-2782. 26. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Research 28: 235-242. 27. Steinhardt PJ, Nelson DR, Ronchetti M (1983) Bond-orientational order in liquids and glasses. Physical Review B 28: 784. 28. Torquato S (2002) Random Heterogenous Materials: Microstructure and Macroscopic Properties: Springer-Verlag. 29. Truskett TM, Torquato S, Debenedetti PG (2000) Towards a quantification of disorder in materials: Distinguishing equilibrium and glassy sphere packings. Physical Review E 62: 993. 30. Covell DG, Jernigan RL (1990) Conformations of folded proteins in restricted spaces. Biochemistry 29: 3287-3294. 31. Bagci Z, Jernigan RL, Bahar I (2002) Residue packing in proteins: Uniform distribution on a coarse-grained scale. Journal of Chemical Physics 116: 2269-2276. 32. Andrzej K, Jeffrey S (1994) Monte carlo simulations of protein folding. I. Lattice model and interaction scheme. Proteins: Structure, Function, and Bioinformatics 18: 338-352. 33. Golub GH, van Loan CF (1996) Matrix Computations. Baltimore: Johns Hopkins University Press. 34. Demmel JW (1997) Applied Numerical Linear Algebra: SIAM. 35. Hinsen K (1998) Analysis of domain motions by approximate normal mode calculations. Proteins-Structure Function and Genetics 33: 417-429. 36. Baysal C, Atilgan AR (2001) Elucidating the structural mechanisms for biological activity of the chemokine family. Proteins-Structure Function and Genetics 43: 150-160. 37. Keskin O (2007) Binding induced conformational changes of proteins correlate with their intrinsic fluctuations: a case study of antibodies. Bmc Structural Biology 7: 31. 38. Tama F, Sanejouand YH (2001) Conformational change of proteins arising from normal mode calculations. Protein Engineering 14: 1-6. 39. Zen A, Carnevale V, Lesk AM, Micheletti C (2008) Correspondences between low-energy modes in enzymes: Dynamics-based alignment of enzymatic functional families. Protein Science 17: 918-929. 40. Zheng WJ, Brooks BR, Thirumalai D (2009) Allosteric Transitions in Biological Nanomachines are Described by Robust Normal Modes of Elastic Networks. Current Protein & Peptide Science 10: 128-132. 41. Zheng W, Brooks BR, Thirumalai D (2007) Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations. Biophysical Journal 93: 2289-2299. 42. Zheng WJ, Brooks B (2005) Identification of dynamical correlations within the myosin motor domain by the normal mode analysis of an elastic network model. Journal of Molecular Biology 346: 745-759. 5 43. Sacquin-Mora S, Lavery R (2009) Modeling the Mechanical Response of Proteins to Anisotropic Deformation. Chemphyschem 10: 115-118. 44. Atilgan C, Atilgan AR (2009) Perturbation-Response Scanning Reveals Ligand Entry-Exit Mechanisms of Ferric Binding Protein. PloS Computational Biology 5: 14. 45. da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, et al. (2009) Protein cutoff scanning: A comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins-Structure Function and Bioinformatics 74: 727-743. 46. Okan OB, Atilgan AR, Atilgan C (2009) Nanosecond Motions in Proteins Impose Bounds on the Timescale Distributions of Local Dynamics. Biophysical Journal 97: 2080-2088. 47. Miyashita O, Onuchic JN, Wolynes PG (2003) Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proceedings of the National Academy of Sciences of the United States of America 100: 12570-12575. 6

FIGURE LEGENDS Figure 1.

Bond orientational order parameter, Q , computed in the cut-off distance range r c = 5 (cid:177) (cid:21)(cid:27)(cid:3) (cid:99)(cid:3) (cid:90)(cid:76) t (cid:75)(cid:3) (cid:20)(cid:3) (cid:99)(cid:3) (cid:76)(cid:81)(cid:70)(cid:85)(cid:72)(cid:80)(cid:72)(cid:81)(cid:87)(cid:86) for three example proteins. The sizes and PDB codes are indicated on the figures; Q averaged over 30 non-homolog proteins of varying sizes is also displayed (gray line). We observe a decrease from Q = 0.5 at low r c towards a convergence to Q = 0.1 (cid:177) r c is increased. Q is a geometrical analysis which follows from the local force balance conditions due to the anisotropy of contacts in ANM. Figure 2.

The matrix Frobenius norm of the scaled Hessians (equation 11) computed for six example proteins, scanned at r c (cid:89)(cid:68)(cid:79)(cid:88)(cid:72)(cid:86)(cid:3) (cid:90)(cid:76)(cid:87)(cid:75)(cid:3) (cid:20)(cid:3) (cid:99)(cid:3) (cid:76)(cid:81)(cid:70)(cid:85)(cid:72)(cid:80)(cid:72)(cid:81)(cid:87)(cid:86)(cid:17)(cid:3) The sizes and PDB codes are indicated on the figures. The minima are marked by the arrows.

Figure 3.

Inner product maps for the eigenvectors p and p corresponding to the slowest two modes of the protein with PDB code 1scu. These are labeled modes 7 and 8, since the first six modes correspond to translational and rotational motions. At each point we compute the inner product of the eigenvectors of mode i at a pair of r c values; i.e. p i ( r c1 (cid:12)(cid:194) p i ( r c2 ). Lower triangle is for mode 7 and upper one is for mode 8. We find that above r c = 9 (cid:99) global modes remain unaltered as more contacts are added with larger cut-off radii. For example, the value of the projection p of the computed from networks formed at r c = 12 and 22 (cid:99) is 0.99 and that for p is 0.96 (these points are marked by star signs on the figure). Figure 4. (cid:41)(cid:79)(cid:88)(cid:70)(cid:87)(cid:88)(cid:68)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3) (cid:83)(cid:85)(cid:82)(cid:73)(cid:76)(cid:79)(cid:72)(cid:86)(cid:3) (cid:11)(cid:507) R i (cid:194)(cid:507) R i ); calculated by equation 6 and using three different residue network models obtained at r c (cid:32)(cid:3) (cid:27)(cid:15)(cid:3) (cid:20)(cid:25)(cid:15)(cid:3) (cid:21)(cid:24)(cid:3) (cid:99)(cid:3) (cid:68)(cid:85)(cid:72)(cid:3) (cid:70)(cid:82)(cid:80)(cid:83)(cid:68) red with; (a) residue fluctuations of HEWL calculated from MD simulations (lower gray curve). (b) X-ray B-factors (lower gray curve) of the 268 residue achromobacter lyticus protease (PDB code: 1arb).

Figure 5. (a)

The two most collective modes of apo ADK mapped onto the structure. For this protein, eigenvectors obtained at r c (cid:32)(cid:3)(cid:27)(cid:3)(cid:68)(cid:81)(cid:71)(cid:3)(cid:20)(cid:27)(cid:3)(cid:99)(cid:3)(cid:11)(cid:82)(cid:85)(cid:68)(cid:81)(cid:74)(cid:72)(cid:3)(cid:68)(cid:81)(cid:71)(cid:3)(cid:74)(cid:85)(cid:72)(cid:72)(cid:81)(cid:3)(cid:68)(cid:85)(cid:85)(cid:82)(cid:90)(cid:86)(cid:15)(cid:3)(cid:85)(cid:72)(cid:86)(cid:83)(cid:72)(cid:70)(cid:87)(cid:76)(cid:89)(cid:72)(cid:79)(cid:92)(cid:12)(cid:3)(cid:68)(cid:85)(cid:72)(cid:3) nearly indistinguishable. Mode 7, which describes hinge bending, is dominant and accounts for most of the conformational change upon binding as shown in (b) where the displacement profiles between the unbound and bound forms (PDB codes: 4ake and 1ake, respectively) are explored. The experimental displacements are shown in gray. Predictions from the relative magnitudes of the eigenvector corresponding to the slowest mode obtained at r c = 8, 1 (cid:19)(cid:15)(cid:3)(cid:20)(cid:24)(cid:3)(cid:99)(cid:3) are shown in black. The latter curves are displaced to guide the eye, and their zero baselines are marked by the dotted curves. The Pearson correlation between the experimental curve and each of the predictions is 0.9. Figure 6.

Inner product maps p i ( r c1 (cid:12)(cid:194) p i ( r c2 ) for eigenvectors p and p of HEWL displayed in (a) show that there is possible swapping in the mode order at r c (cid:32)(cid:3) (cid:20)(cid:25)(cid:3) (cid:99)(cid:30)(cid:3) (cid:87)(cid:75)(cid:76)(cid:86)(cid:3) (cid:83)(cid:85)(cid:72)(cid:71)(cid:76)(cid:70)(cid:87)(cid:76)(cid:82)(cid:81)(cid:3) (cid:76)(cid:86)(cid:3) justified in (b) where the dot product p i ( r c1 (cid:12)(cid:194) p j ( r c2 ) for the regions that are swapped restores the picture with the slow modes staying unaltered with the choice of r c beyond a lower threshold. The swapping of the modes may also be tracked in the eigenvalues as shown (c) . In (d) , the mode labeled i in parts b and c obtained is mapped onto the protein structure for r c = (cid:20)(cid:21)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) (cid:20)(cid:27)(cid:3) (cid:99)(cid:3) (cid:11)(cid:82)(cid:85)(cid:68)(cid:81)(cid:74)(cid:72)(cid:3) (cid:68)(cid:81)(cid:71)(cid:3) (cid:74)(cid:85)(cid:72)(cid:72)(cid:81)(cid:3) (cid:68)(cid:85)(cid:85)(cid:82)(cid:90)(cid:86)(cid:15)(cid:3) (cid:85)(cid:72)(cid:86)(cid:83)(cid:72)(cid:70)(cid:87)(cid:76)(cid:89)(cid:72)(cid:79)(cid:92)(cid:12) . The mode is most effective in the (cid:85)(cid:72)(cid:74)(cid:76)(cid:82)(cid:81)(cid:86)(cid:3) (cid:70)(cid:82)(cid:79)(cid:82)(cid:85)(cid:72)(cid:71)(cid:3) (cid:85)(cid:72)(cid:71)(cid:29)(cid:3) (cid:55)(cid:75)(cid:72)(cid:3) (cid:79)(cid:82)(cid:82)(cid:83)(cid:86)(cid:3) (cid:77)(cid:82)(cid:76)(cid:81)(cid:76)(cid:81)(cid:74)(cid:3) (cid:87)(cid:75)(cid:72)(cid:3) (cid:92)(cid:72)(cid:79)(cid:79)(cid:82)(cid:90)(cid:3) (cid:70)(cid:82)(cid:79)(cid:82)(cid:85)(cid:72)(cid:71)(cid:3) (cid:533) -sheet structure and the C-terminus. Mode j (not shown) also acts on the same regions, albeit in orthogonal directions. 7 Table 1.

Number of approximately invariant eigenvectors in matrices and PDB id no. of residues

Case B Excluding the first six eigenvectors corresponding to translation and rotation of the whole molecule We report the number of eigenvectors with the lowest 20 eigenvalues in the matrix whose projection on is larger than the set threshold of 0.7. is formed at r c (cid:32)(cid:3)(cid:20)(cid:21)(cid:3)(cid:99)(cid:3) is formed r c (cid:32)(cid:3)(cid:20)(cid:24)(cid:3)(cid:99) is formed r c (cid:32)(cid:3)(cid:20)(cid:27)(cid:3)(cid:99) Not computed average of 30 proteins cut-off distance, r c (Å) Q Figure 1 cut-off distance, r c (Å) F r o b e n i u s N o r m Figure 2 Figure 3

25 50 75 100 125 MD r c (cid:32)(cid:20)(cid:25)(cid:3)(cid:99) r c (cid:32)(cid:21)(cid:24)(cid:3)(cid:99) r c (cid:32)(cid:27)(cid:3)(cid:99) Residue Index < R > ( a r b i t r a r y un i t s ) X-ray r c (cid:32)(cid:20)(cid:25)(cid:3)(cid:99) r c (cid:32)(cid:21)(cid:24)(cid:3)(cid:99) r c (cid:32)(cid:27)(cid:3)(cid:99) Residue Index < R > ( a r b i t r a r y un i t s ) Figure 4 mode mode r c (cid:32)(cid:27)(cid:3)(cid:99) r c (cid:32)(cid:20)(cid:19)(cid:3)(cid:99) r c (cid:32)(cid:20)(cid:24)(cid:3)(cid:99) X-ray

Residue Index D i s p l a c e m en t ( (cid:99) (cid:12)(cid:12)