Can all-atom protein dynamics be reconstructed from the knowledge of C-alpha time evolution?
Jiaojiao Liu, Jin Dai, Jianfeng He, Xubiao Peng, Antti J. Niemi
CCan all-atom protein dynamics be reconstructedfrom the knowledge of C α time evolution? Jiaojiao Liu, ∗ Jin Dai, † Jianfeng He, Xubiao Peng, ‡ and Antti J. Niemi
2, 4, 5, 1, § School of Physics, Beijing Institute of Technology, Beijing 100081, P.R. China Nordita, Stockholm University, Roslagstullsbacken 23, SE-106 91 Stockholm, Sweden Center for Quantum Technology Research, School of Physics,Beijing Institute of Technology, Beijing 100081, P. R. China Department of Physics and Astronomy, Uppsala University, P.O. Box 516, S-75120, Uppsala, Sweden Laboratory of Physics of Living Matter, School of Biomedicine, Far Eastern Federal University, Vladivostok, Russia
We inquire to what extent protein peptide plane and side chain dynamics can be reconstructed fromknowledge of C α dynamics. Due to lack of experimental data we analyze all atom molecular dy-namics trajectories from Anton supercomputer, and for clarity we limit our attention to the peptideplane O atoms and side chain C β atoms. We try and reconstruct their dynamics using four di ff erentapproaches. Three of these are the publicly available reconstruction programs Pulchra , Remo and
Scwrl4 . The fourth,
Statistical Method , builds entirely on statistical analysis of Protein Data Bank(PDB) structures. All four methods place the O and C β atoms accurately along the Anton trajectories.However, the
Statistical Method performs best. The results suggest that under physiological condi-tions, the all atom dynamics is slaved to that of C α atoms. The results can help improve all atomforce fields, and advance reconstruction and refinement methods for reduced protein structures. Theresults provide impetus for development of e ff ective coarse grained force fields in terms of reducedcoordinates. I. INTRODUCTION
The structure of a protein is commonly character-ized in terms of the C α atoms. They are located atthe branch points between the backbone and the sidechains, and as such their positions are subject to rela-tively stringent stereochemical constraints. For examplethe model building in a crystallographic structure de-termination experiment commonly starts with an initialC α skeletonization [1]. The central role of the C α atomsis also exploited widely in various structural classifica-tion schemes [2, 3], in threading [4], homology [5] andother modeling techniques [6], and de novo approaches[7]. The development of coarse grained energy func-tions for folding prediction also frequently points outthe special role of C α atoms [8, 9], and the aim of theso-called C α trace problem is to construct an accuratemain chain and/or all atom model of a crystallographicfolded protein, solely from the knowledge of the posi-tions of the C α atoms [10–15].In the dynamical case, knowledge of the all atomstructure is pivotal to the understanding how biologi-cally active proteins function. However, in the case of adynamical protein it remains very hard to come by withhigh precision structural information, and as a conse-quence our understanding of protein dynamics remainsvery limited [16–19]. Here we test a widely suggested ∗ [email protected] † [email protected] ‡ [email protected] § proposal that the dynamics of backbone and side chainatoms could be strongly slaved to the dynamics of theC α atoms, under physiological conditions. For this weaddress a dynamical variant of the static C α -trace prob-lem: We inquire to what extent can the motions of thepeptide plane and the side chain atoms be estimatedfrom the knowledge of the C α atom positions, in a dy-namical protein that moves under physiological condi-tions. Any systematic correlation between the dynamicsof C α atoms and other heavy atoms could be most valu-able, for our understanding of many important biolog-ical processes. A slaving of the peptide plane and sidechain heavy atom motions to the C α backbone dynam-ics would mean that many aspects of protein dynam-ics can be described by e ff ective coarse grained energyfunctions that are formulated in terms of reduced setsof coordinates that relate to the C α atoms only.Unfortunately, high precision experimental data ondynamical proteins under physiological conditions issparse, indeed almost non-existent. At the moment allatom molecular dynamics simulations remain the pri-mary source of dynamical information. These simula-tions are best exemplified by the very long Anton trajec-tories [20], that use the CHARMM22 (cid:63) force field [21].Accordingly we analyze the all atom trajectories thatwere produced in these simulations; specifically we con-sider the α -helical villin and the β -stranded ww-domaintrajectories reported in [20]. From the Anton trajecto-ries, we extract the C α dynamics. We then try to recon-struct the motions of other atoms: For clarity, we limitour attention to peptide plane O and the side chain C β atoms only. We reckon this is a limitation, but thesetwo atoms are common to all amino acids except forglycine that lacks the C β atom. Moreover, the O atom a r X i v : . [ q - b i o . B M ] J a n does not share a covalent bond, either with the C α or theC β . In the static case, the knowledge of the C α , O andC β atoms is often considered su ffi cient to determine thepositions of the remaining atoms, reliably and often atvery high precision e.g. with the help of stereochemicalconstraints and rotamer libraries [22–27]. Furthermore,there are highly predictive coarse grained and associa-tive memory Hamiltonians for protein structure deter-mination that employ exactly the reduced coordinate setof C α − C β − O atoms [28, 29].We compare four di ff erent reconstruction methods.These include the three publicly available programs Pulchra [13]
Remo [14] and
Scwrl [15]; note that
Scwrl does not predict the peptide plane atom positions, in-stead it is commonly used in combination with
Remo topredict the side chain atom positions. The fourth ap-proach we introduce here. It is a pure
Statistical Method ,it is based entirely on information that we extract fromhigh resolution crystallographic PDB structures.We find that all four methods have a very high successrate in predicting the positions of the dynamic O and C β atoms along the Anton C α backbone, both in the case ofvillin and ww-domain. Surprisingly, we find that ourstraightforward Statistical Method performs even bet-ter than the three other much more elaborate methods.Since our
Statistical Method introduces no stereochemi-cal fine tuning, nor force field refinement, it is also supe-rior to the other three in terms of computational speed.
II. METHODSA. Anton data
For data on protein dynamics, we use the all atomCHARMM22 (cid:63) force field trajectories simulated with
Anton supercomputer and reported in [20]. The data hasbeen provided to us by the authors. We present detailedresults for two trajectories that we have chosen for struc-tural diversity: We have selected the α -helical villin(based on PDB structure 2F4K) and the β -stranded ww-domain (based on PDB structure 2F21). The length ofthe villin trajectory is 120 µ s, and we have selected ev-ery 20 th simulated structure for our prediction analysis,for a total of 31395 structures. The length of the ww-domain trajectory is 651 µ s and we have chosen every40 th simulated structure for our prediction analysis, fora total of 60814 structures. The combination of thesetwo trajectories covers all the major regular secondarystructures, with all the biologically relevant amino acidsappearing, except CYS with its unique potential to formsulphur bridges. Furthermore, the villin in [20] involvesa NLE mutant and the HIS in [20] is protonated. Thusour analysis includes, at least to some extent, the e ff ectsof mutations and pH variations. In both villin and ww-domain, the Anton simulation observes several transi-tions between structures that are unfolded and that are(apparently) folded. This ensures that there is a good diversity of dynamical details for us to analyze.
B. Discrete Frenet frames
Our basic tool of analysis is the discrete Frenet fram-ing of the C α backbone [30] that we construct as follows:We take r i ( i = 1 , ..., N ) to be the (time dependent) coor-dinate of the i th C α atom along the backbone. The r i then form the vertices of the virtual C α backbone, thatwe visualize as a piecewise linear polygonal chain. Ateach vertex we introduce a right-handed orthonormalset of Frenet frames ( nbt ) that we define as follows t i = r i +1 − r i | r i +1 − r i | (1) b i = t i − × t i t i − × t i (2) n i = b i × t i (3)The framing is shown in Figure 1. For details of dis-crete Frenet frames and other framings of the C α back-bone, we refer to [30], and for a comparison with theRamachandran angle description we refer to [31].FIG. 1: Color online:
Definition of Frenet frames (1)-(3) at theposition of a C α atom. C. Pulchra , Remo and
Scwrl4 frames
The framing used in
Pulchra [13] is obtained fromfour successive C α coordinates. We start by defining e i,j = r i − r j | r i − r j | (4)The Pulchra frames ( ν ix , ν iy , ν iz ) are then ν ix = e i − ,i +1 (5) ν iy = e i,i +1 × e i − ,i | e i,i +1 × e i − ,i | (6) ν iz = ν ix × ν iy (7)The Remo [14] reconstruction program uses also framesthat are obtained from four successive C α coordinates:We again start with (4) and we setThe Remo frames are then x i = e i − ,i +2 + e i,i +1 | e i − ,i +2 + e i,i +1 | (8) y i = e i − ,i +2 − e i,i +1 | e i − ,i +2 − e i,i +1 | (9) z i = x i × y i (10)Both Pulchra and
Remo frames are very di ff erent fromthe discrete Frenet frames. In particular, both of theseframings exploit four consecutive C α atoms, while thediscrete Frenet frames use only three.Finally, in the case of Scwrl4 the backbone is describedin terms of the Ramachandran angles [15].FIG. 2:
Color online:
Definition of bond ( κ i ) and torsion ( τ i )angles in relation to the i th C α atom. D. Visualization
For protein structure visualization and our data anal-ysis, we use exclusively the Frenet frames ( n , b , t ); foreach C α atom we associate a (virtual) C α backbone bond( κ ) and torsion ( τ ) angle as follows, κ i = arccos( t i +1 · t i ) τ i = sign( b i +1 · n i ) arccos( b i +1 · b i ) (11)These angles are shown in Figure 2. To visualize thevarious atoms in a protein, we identify the bond andtorsion angles ( κ, τ ) as the canonical latitude and lon-gitude angles on the surface of a unit radius (Frenet)sphere S α : The center of the sphere is located at theC α atom, the north-pole of S α is the point where thelatitude angle κ = 0, the vector t points to the directionof the north-pole and this direction coincides with the canonical z -direction in a C α centered Cartesian coordi-nate system. The great circle τ = 0 passes through thenorth-pole and the tip of the normal vector n that lies atthe equator, the longitude a.k.a. torsion angle takes val-ues τ ∈ [ − π, π ) and it increases in the counterclockwisedirection around the positive z -axis i.e. around vector t .In the sequel, whenever we introduce a Frenet sphere S we shall use the convention that the triplet ( n , b , t )corresponds to the right-handed Cartesian ( xyz ) ∼ ( rgb )coordinate system, with the convention that n ∼ x ∼ red( r ), b ∼ y ∼ green ( g ) and t ∼ z ∼ blue ( b ). The color cod-ing of distributions on the spheres are always relativebut with equal MatLab setting, in all the cases that wedisplay, and intensity increases from no-entry white tolow density blue, and towards high density red.We plot the directions of the peptide plane O and sidechain C β atoms on the surface of S α in exactly the wayhow they are seen from the position of a miniature ob-server, standing at the center of the sphere S α and withhead up towards the north-pole. For the O atoms weuse spherical coordinates that we denote ( θ, φ ) for thelatitude and longitude, for the C β atoms we denote thespherical coordinates ( ϑ, ϕ ) for the latitude and longi-tude on S α . Both sets of coordinates are in direct corre-spondence with ( κ, τ ).In Figure 3 we visualize the statistical reference distri-butions of the O and C β atoms, in crystallographic Pro-tein Data Bank structures that have been measured withbetter than 1.0 ˚A resolution. We choose these, since wetrust that such ultra high resolution structures are rela-tively void of refinement.The O distribution is concentrated on a very narrowcircle-like annulus. It forms the base of a right circularcone with axis that coincides with the t vector and withconical apex at the center of the Frenet sphere; the lati-tude of the annulus is very close to θ = π/ π/
2. The regionsof α -helical and β -stranded regular structures are con-nected by a region of left-handed α -helical structuresand by a region of loops. There are very few entries out-side this circular region; notably the cis -peptide planeregion is located on a short strip under the β -strandedregion with a latitude angle close to θ ≈ π/ β distribution shown in Figure 3 is a little likea horse-shoe, forming a slightly distorted annulus thatis somewhat wider than in the case of the O distribu-tion. The regions of α -helical and β -stranded regularstructures are connected by a region of loops but thedistribution of left-handed α -helical structures is nowdisjoint from the main distribution.In Figures 4 and 5 we show the corresponding Anton distributions for the O an C β atoms, separately in thecase of villin and ww-domain. When we compare the Anton distributions and the ultra high resolution PDBdata of Figure 3, we observe that the overall structureof the statistical distributions are very similar. We alsonote that as expected, in the villin trajectory there is aclear predominance of the α -helical region, while in theFIG. 3: Color online:
Top: The ( θ,φ ) distribution of peptideplane O atoms in the below 1.0 ˚A resolution PDB structureson the surface of the Frenet sphere S α . Bottom: The ( ϑ,ϕ )distribution of C β atoms in the below 1.0 ˚A resolution PDBstructures on S α . In both Fgures we have identified the majorregions α -helices ( α ), β -strands ( β ), left-handed α -helices( α L ) and cis -peptide planes ( cis ). case of ww-domain the β -stranded region dominates.The PDB and Anton distributions are otherwise remark-ably similar, superficially the only di ff erence appears tobe due to thermal fluctuations in the latter: The crystal-lographic data is often taken at liquid nitrogen temper-atures below 77 K while the simulation temperature inthe Anton data is around 360K for both villin and ww-domain.The strong similarity between the distributions in Fig-ures 3-5 motivates us to make the following bold pro-posal: Even in the case of a dynamical protein undernear-physiological conditions, the relative positions ofthe O and C β atoms can be reconstructed with high ac-curacy from the knowledge of the C α atoms motion - atleast when the CHARMM22 (cid:63) force field is used to de- FIG. 4: Color online:
Top: The distribution of peptide planeO atoms in the villin trajectory of
Anton . Bottom: Thedistribution of peptide plane O atoms in the ww-domaintrajectory of
Anton . scribe the dynamics. E. Reconstruction
We proceed to try and reproduce the individual O andC β atom positions in the Anton trajectories of villin andww-domain [20], solely from the knowledge of the C α atoms. We employ four di ff erent reconstruction meth-ods:
1. Pulchra
For
Pulchra [13] we use the version 3.04. We start withthe C α coordinates that we obtain from Anton . We thenuse
Pulchra to reconstruct the other heavy atom posi-tions, with only the instantaneous C α coordinates of theFIG. 5: Color online:
Top: The distribution of side chain C β atoms in the villin trajectory of Anton . Bottom: Thedistribution of side chain C β atoms in the ww-domaintrajectory of Anton . Anton trajectory as an input. From the
Pulchra struc-tures we read the coordinates of the peptide plane Oand side chain C β atoms, and compare with the origi-nal Anton data.
2. Remo
For
Remo [14] we use the version 3.0, and we proceedwith reconstruction as in the case of
Pulchra . We notethat
Remo employs
Scwrl for the side chains.
3. Scwrl4
For stand-alone
Scwrl [15] we use the version 4.0.Since
Scwrl can not reconstruct the peptide planes, weuse it only for the C β comparison. For this we first con-struct the peptide planes using Pulchra , since
Remo isalready based on
Scwrl .
4. Statistical Method
Unlike the previous three methods, our
StatisticalMethod approach to reconstruction does not employany force field refinement, stereochemical constraints,or any other kind of data curation. It only uses sta-tistical analysis of PDB data to predict the positions ofthe peptide plane O and side chain C β atoms from theknowledge of the C α atom positions: Once the C α co-ordinates of an amino acid are given, a search algorithmfits it with a PDB structure and identifies the ensuingO and C β atom coordinates as the reconstructed coordi-nates.As already stated, the PDB pool consist of all thosecrystallographic structures that have been measuredwith better than 1.0 ˚A resolution. We start with a vi-sual presentation of the C α bond and torsion angle den-sity distribution of these PDB structures, in terms of astereographically projected Frenet sphere.Let S α ( i ) be centered at the i th C α atom of a givenPDB structure. The vector t i has its tail at the centerof S α ( i ) and its head lies at the north pole, this vectorpoints from the i th C α atom towards the ( i + 1) st C α atom. The ( i + 2) nd C α is then located similarly, in thedirection of the Frenet vector t i +1 that points from thecenter of S α ( i + 1) towards its north pole.We parallel transport the vector t i +1 without any ro-tation until its tail becomes located at the i th C α atomposition i.e. at the origin of S α ( i ). Let ( κ i , τ i ) be theFrenet frame coordinates of the head of the paralleltransported t i +1 on the surface of S α ( i ). These coor-dinates depict how a miniature Frenet frame observer,standing at the position of the i th C α atom and head to-wards the north pole of S α ( i ), sees the backbone twist-ing and bending when she proceeds along the chain tothe position of the ( i + 1) st C α atom.We repeat the construction for all the C α atoms alongall the chains in our pool. This yields us a statistical dis-tribution in terms of the coordinates ( κ, τ ) of the headsof the parallel transported vectors t . We visualize thedistribution by projecting the sphere S α stereograph-ically onto the complex plane from the south pole asshown in Figure 6. The relation between the sphericalcoordinates ( κ, τ ) and the z = x + iy coordinates on theplane is x + iy = tan (cid:18) κ (cid:19) e iτ FIG. 6:
Color online:
Stereographic projection of two sphereonto plane, with the projection taken from the south-pole.
In Figure 7 we show the distribution of all the C α atom coordinates in our PDB data set, on the stereo-graphically projected Frenet sphere. The distribution isFIG. 7: Color online:
Statistical distribution of all C α atomsin our 1.0 ˚A pool of PDB structures, on stereographicallyprojected sphere shown in Figure 6. largely concentrated inside an annulus, with inner cir-cle κ in ≈ κ out ≈ π/
2. The various reg-ular secondary structures such as α -helices, β -strandsand left-handed α -helices are clearly identifiable andmarked in this Figure.We proceed to describe how we predict the positionsof the O and C β atoms from the knowledge of the C α atoms coordinates, along a given (dynamical) proteinstructure: We first use the torsion angle τ ∈ [ − π, π ) to di-vide the statistical distribution of Figure 7 into 60 equalsize sectors sized ∆ τ = π/
30 radians. We then divideeach of these sectors into two sets, one with bond an-gle κ < . κ ≥ . κ = 1 . α -helix-like and β -strand-like (secondary) structures.Now suppose that we have a C α atom along a proteinbackbone, with coordinates ( κ i , τ i ). We determine thecoordinates ( θ i , φ i ) of the corresponding O atom and thecoordinates ( ϑ i , ϕ i ) of the corresponding C β atom usingthe following algorithm: Step 1:
We first use the τ i value of the C α atom to selectthe pertinent sector τ i ∈ ∆ τ . We then use its κ i valuetogether with κ i +1 of the following C α atom along thechain, to assign with the given C α atom one of the four sets Set ∆ κ : κ i < . κ i +1 < . ∆ κ : κ i < . κ i +1 ≥ . ∆ κ : κ i ≥ . κ i +1 < . ∆ κ : κ i ≥ . κ i +1 ≥ . ∆ τ sectors this divides our statisti-cal C α distribution into 4 ×
60 subsets [ ∆ κ ; ∆ τ ], and wechoose the one that corresponds to the ( κ i , κ i +1 , τ i ) val-ues of the C α atom we consider. Step 2:
We search a protein structure in our pool, inthe subset [ ∆ κ, ∆ τ ] of the i th C α , for which two consec-utive amino acids are also identical to the i th and ( i +1) st amino acids of the protein structure that we consider. • If we find only one matching pair of amino acids in thesubset [ ∆ κ, ∆ τ ], we use the coordinates of its O and C β atoms as the predicted coordinates of the O, C β atomsof the i th C α atom. • If there are two or more matching pairs, we use the av-erage value of their O and C β coordinates to determinethose of the O and C β around the C α of interest. • If there are no pairs of identical amino acids in thesubset [ ∆ κ, ∆ τ ], we use the average value of all PDBstructures in this subset to determine the O and C β co-ordinates. • Finally, if the subset [ ∆ κ, ∆ τ ] is empty we search forone from a neighboring subset, first from preceding ∆ τ ,then from following ∆ τ , then from neighboring ∆ κ ; butsuch cases are highly exceptional. Step 3:
We repeat the process for all C α atoms alongthe chain. At the end of the chain there is no κ i +1 , thusat the end of the chain we use only the κ i value in oursearch.Our reconstruction algorithm is extremely simple andproceeds very fast computationally, much faster thanany of the other three reconstruction programs we con-sider, even though we have not optimized the search al-gorithm but use a straightforward MatLab code.The sector size ∆ τ can be changed and optimised;here we have chosen 60 sectors, only to exemplify themethod.The reason why we divide the original annulus intotwo using κ = 1 . α atoms, in thecase of a bond angle only three C α are needed; see Fig-ure 2. Thus, by engaging the two neighboring bond an-gle values, we employ the full information in all four C α atoms in our search algorithm.In Step 2, we calculate the average values of the anglesas follows: For the average latitude θ ave (similarly for ϑ ave ) we simply use θ ave = 1 N N (cid:88) i = k θ k where the summation is over all elements in the givensubset [ ∆ κ, ∆ τ ]. For the average longitude φ ave (simi-larly for ϕ ave ) we proceed as follows: We first define X = 1 N N (cid:88) i = k cos φ k & Y = 1 N N (cid:88) i = k sin φ k and R = √ X + Y The average value is then obtained fromcos φ ave = XR & sin φ ave = YR F. Algorithm comparisons
1. Direction comparison
In order to compare the di ff erent reconstructionmethods we introduce two unit length vectors −−−−→ CαO and −−−−→
Cαβ ; the former points from the C α atom to the follow-ing peptide plane O atom ( X = O in the sequel), and thelatter points from the C α atom to its side chain C β atom( X = β in the sequel). We evaluate these vectors for allresidues i = 1 , ..., N and for every structure k = 1 , ..., K in the Anton data. We also evaluate these vectors inall of the four reconstruction methods and in the sequely = P , R , S , M stands for
Pulchra (P),
Remo (R),
Scwrl4 (S)and the
Statistical Method (M) respectively.We define Θ yX [ i, k ] to be the angle between a vector −−−−→ CαX evaluated from the
Anton data, and the corre-sponding vector obtained from the corresponding re-construction method. The statistical distribution func-tion for all the values Θ yX [ i, k ] measures the overall accu-racy of a given method, for reconstructing the individ-ual O and C β atom positions.For each of the k = 1 , ..., K Anton chain structure, weevaluate the root-mean-square (RMS) value of the angles Θ yX [ i, k ] by summing over the N individual residues ofthe chain, RMS [ Θ yX ( k )] = (cid:118)(cid:117)(cid:116) N N (cid:88) i =1 (cid:16) Θ yX [ i, k ] (cid:17) (12)The distribution density (12) then measures the overallaccuracy of a method, in the reconstruction of the C α -O-C β chains.
2. Distance comparison
We also compare the algorithms by evaluating theRMS distance between di ff erent atomic positions in the Anton trajectory and in the reconstructed trajectory. TheRMS distance between two chain structures is evaluatedfrom RMSD( X ; k ) = (cid:118)(cid:117)(cid:116) N N (cid:88) i =1 | r A,ki,X − r y,ki,X | (13)Here r A,ki,X is the
Anton data distance between the C α atom and the X = O , β atom at residue i in structure k , and r y,ki,X is the corresponding quantity in the y =P , R , S , M reconstructed structure. The probability dis-tribution of (13) is a complement of (12), as a measureof the overall accuracy of a method in the reconstructionof the entire chain in a statistical sense.For reference, in Figure 8 we present the combineddistribution of the Debye-Waller fluctuation distances (cid:112) < | x | > = (cid:114) B π for the O and C β atoms, that we have evaluated usingthe B -factors in our 1.0 ˚A PDB pool; note the logarith-mic scale. The fluctuation distances are strongly peakedat around 0.3 ˚A, and there are practically no structureswith a fluctuation distance less than 0.12 ˚A.FIG. 8: Color online:
Combined B-factor fluctuation distancesfor the O and C β atoms Similarly, Figure 9 shows the statistical distributionsin the individual C α -O and C α -C β distances that wecalculate directly from the coordinates in our PDB datapool and Anton data, respectively; only results for villinare shown as the results for ww-domain are very simi-lar. In the following we shall not refine the C α -O andthe C α -C β distances, in our statistical method, the dif-ferences are in any case minor. Instead we simply usethe average PDB distance values ∆ r = 2 .
40 ˚A For C α − O distance ∆ r = 1 .
53 ˚A For C α − C β distancein our Statistical Method reconstruction. It is natural toallocate the larger variance in the case of
Anton data inFigures 9 to temperature fluctuations.FIG. 9:
Color online: distribution of distances in ˚Angstr¨om,calculated directly from the coordinates in our PDB data and
Anton data for villin. Top: C α -O Bottom: C α -C β . III. RESULTSA. Peptide plane O atoms
We start with the peptide plane O atom distributionsin the
Anton data. We compare the
Anton results shownin Figure 4, with results from
Pulchra , Remo and
Statis-tical Method .
1. Frenet Spheres
In Figures 10 we present the reconstructed O atomdistributions on a Frenet sphere S , in the case of villin.The Figures display all the reconstructed O atom coor-dinates ( θ, φ ) for all the C α atoms of all Anton trajec-tories that we obtain using
Pulchra , Remo and
StatisticalMethod respectively.Overall, all three distributions reproduce well the Oatom villin distribution of the
Anton trajectory in Fig-ure 4 (top). In particular, the regular α -helical and β -stranded regions are clearly identifiable. We observethat both Pulchra and
Remo distributions are slightlywider than the PDB distribution, we propose that thisreflects mainly the presence of thermal e ff ects in the An-ton data, as captured by these two methods. On theother hand, the
Statistical Method distribution is moreconcentrated. This is expected since the data pool is asubset of the PDB data shown in Figure 4 (top), whichhave all been measured at very low temperature values. FIG. 10:
Color online:
Top: Reconstructed peptide plane Odistribution for villin on Frenet sphere S . Top: Pulchra
Middle:
Remo
Bottom:
Statistical Method . We remind that the color coding is always relative butequal, in all the cases that we display, and intensity in-creases from no-entry white to low density blue, and to-wards high density red.In Figures 11 we present the reconstructed O distri-butions in the case of ww-domain. Again, all three dis-tributions are very much in line with the
Anton data ofFigure 4 (bottom). We observe some fragmentation andexcess (thermal) data spreading in the case of
Pulchra .The
Remo distribution also displays thermal spreadingwhile the
Statistical Method distribution is again moreconcentrated, as expected since it forms a subset of thelow temperature PDB data.We now analyze the reconstruction results for thepeptide plane O atoms in more detail, using the meth-ods of Section II F.
2. Individual angular probability densities for peptide planes
In Figures 12 and 13 we show the normalized proba-bility density distributions for all the individual angles Θ yO [ i, k ] along the Anton trajectories for
Pulchra , Remo and the
Statistical method in the case of villin and ww-domain O atoms, respectively.In both
Pulchra and
Remo the individual Θ yO [ i, k ] val-ues peak near the small value Θ max ≈ . Statistical Method the peak is located at the even smallervalue Θ max ≈ .
06 (rad), both in the case of villin andww-domain.From Figure 9 we learn that the average PDB distancebetween C α and O is around 2.4 ˚A. An angular devia-tion of Θ max ≈ .
06 then corresponds to a distance devi-ation ∼ .
14 ˚A which is smaller than the B-factor fluc-tuation distances in Figure 8 and more in line with the(apparently purely thermal) distance deviations we ob-serve in Figures 9.We conclude that each of the three methods can re-construct the individual angular positions of the dy-namical
Anton
O atoms with very high precision. More-over, despite its simplicity the
Statistical Method per-forms even better than both
Pulchra and
Remo .In each of the probability distributions of Figures12, 13 we observe enhanced accumulation of data near Θ ≈ π/
2; the inserts show the probability densities for Θ -values above 1.0 (rad). We recall our interpretationof the Frenet sphere O distribution as the base of a cone,with the apex at the origin; the vertex angle has a valuevery close to π/
2. Thus the Θ ≈ π/ φ ∼
180 degree rotation around the (blue) t -vector in the Figures 10, 11. We observe that a rotationof the longitude φ by ∼ o exchanges the α -helicaland β -stranded regions according to top Figure 3.
3. Angular probability densities for entire chains
In Figures 14, 15 we show the probability density dis-tributions (12) for
Pulchra , Remo and
Statistical Method ,in the case of villin and ww-domain O atoms, respec-tively. We remind that the value of (12) is a measurefor the accuracy of the reconstruction, in the case of theentire chain. FIG. 11:
Color online:
Top: Reconstruction of peptide planeO distribution on the sphere S α for ww-domain. Top: Pulchra
Middle:
Remo
Bottom:
Statistical Method . In each case, the reconstructed chains appear to bevery close to the original
Anton chains, both in the caseof villin and ww-domain. The
Statistical Method per-forms best and the results for
Pulchra are quite similar,but for
Remo we observe a clear deviation from the
Sta-
Color online:
Probability density distribution for allindividual angles Θ yO [ i,k ] in the case of Anton villintrajectories. Top:
Pulchra
Middle:
Remo
Bottom:
StatisticalMethod . tistical method ; the results from the latter are visibly bet-ter.Note that in the case of both Pulchra and
StatisticalMethod , both the villin and the ww-domain profiles ap-pear to resemble a combination of two distinct Gaussiandistributions. On the other hand, in the case of
Remo thedistribution is like a single Gaussian (thermal spread),in both cases.
4. RMSD probability densities for entire chains
In Figures 16, 17 we show the probability density dis-tributions for the RMS distances (13), evaluated for theentire C α -O reconstructed chains that we obtain using Pulchra , Remo and
Statistical Method in the case of villinand ww-domain
Anton trajectories.Again, the reconstruction by the
Statistical Method isclosest to the original
Anton result. The di ff erence to Pulchra is very small but there is a more visible di ff er- FIG. 13: Color online:
Probability density distribution for allangles Θ yO [ i,k ] along the Anton ww-domain trajectories. Top:
Pulchra
Middle:
Remo
Bottom:
Statistical Method . ence to Remo .Even though all the RMS distances between the O-chains shown in Figures 16 and 17 are small, they areslightly larger than what we can expect on the basis ofthe individual (O and C β ) atom B-factor fluctuation dis-tances in Figure 8. In line with Figures 14, 15, both Pul-chra and
Statistical Method distributions again exhibit adouble Gaussian profile; In the case of
Remo the distri-butions form a single Gaussian which is more in linewith a thermal spread, except that the peak is close to1.0 ˚A which is a somewhat large value in comparison tothe
Statistical Method where the two peaks are slightlybelow values 0.6 ˚A and 0.8 ˚A i.e. close to the cova-lent radius ∼ .
66 ˚A of O atom. But even in the caseof
Remo , the deviation distances can be considered to bequite small and we consider the results to be good.1FIG. 14:
Color online:
The probability density distributions(12) for villin. Top:
Pulchra
Middle:
Remo
Bottom:
StatisticalMethod .
5. Peptide plane flip
In Figures 12 and 13 we have noted an accumulationof entries near Θ ∼ π/
2, corresponding to a ∼
180 degreerotation of the entire probability distribution aroundthe t vector. Consequently these entries contribute max-imally to the deviations between the Anton chain andthe reconstructed chains, in terms of statistical distribu-tions. We note that Θ ∼ π/ ∼ . Anton entries for which Θ > . all of the three reconstructionmethods; the spatial distance for two entries that are Θ ∼ . ∼ Θ > . An-ton data from PDB structures. There are a total ofaround 6.000 such entries, this corresponds to a mere0.6 per cent of the total entries that we have analyzed.We evaluate the average values of Θ for all these entries.The probability distribution for these average values isconcentrated very close to Θ = π/ Color online:
The probability density distributions(12) for ww-domain. Top:
Pulchra
Middle:
Remo
Bottom:
Statistical Method .
18 (top). The entries are also distributed quite evenlyaround the O-circle on the Frenet sphere (Figure 18)(bottom), they seem to appear quite randomly duringthe
Anton time evolution, and correspond to very sud-den and short-lived 180 degree back-and-forth rotations(flips) of the entire peptide plane, around the virtualbond that connect two consecutive C α atoms.From the available Anton data, we are unable toconclude whether these peptide plane flips are gen-uine physical e ff ects with an important role in pro-tein folding, or whether they are mere simulation ar-tifacts, or whether they are e ff ects that are specific tothe CHARMM22 (cid:63) force field [21]. For example, it ap-pears that in the Anton simulations the peptide plane N-H covalent bond lengths are constrained to have a fixedvalue. That may a ff ect the stability of peptide planes,causing them to flip by 180 degrees.To properly understand the character of these pep-tide plane flips one needs to perform more detailed allatom simulations, presumably with shorter time stepsthan used in Anton simulations and with no length con-straints on the N-H covalent bonds.2FIG. 16:
Color online:
The probability density distributionsfor the RMS distance (13) over the O atoms in the case ofvillin. Top:
Pulchra
Middle:
Remo
Bottom:
Statistical Method .
6. Double Gaussians
In Figures 14-17 we have observed the presence of adouble Gaussian peak structures, in the
Pulchra and
Sta-tistical Method distributions. We now proceed to try andidentify the cause. We present a detailed analysis of thedouble Gaussian structures in the case of the
StatisticalMethod
RMSD distributions in Figures 16 and 17; theanalysis in the other cases is similar, with similar con-clusions.In [20] the following quantity Q ( t ) = (cid:80) N res i =1 (cid:80) N i j =1 (cid:20) e d ij ( t ) − ( d ij +1)) (cid:21) − (cid:80) N res i =1 N i (14)has been introduced, to characterize the distance of adynamical chain at time t , to an experimentally mea-sured folded state. Here N i is the number of contactsof residue i along the chain as defined in [20], d ij ( t ) isthe distance in ˚A between the C α atoms of residues i and j at time t and d ij is the same distance in the na- FIG. 17: Color online:
The probability density distributionsfor the RMS distance (13) over the O atoms in the case ofww-domain. Top:
Pulchra
Middle:
Remo
Bottom:
StatisticalMethod . tively folded, crystallographic structure. According to[20] a structure with Q > . Q < . Statistical Method peaks in thevillin distribution Figure 16 and in the ww-domain dis-tribution Figure 17. In both cases, we analyze separatelythe low RMSD subsets with values below 0.54 ˚A invillin and below 0.6 ˚A in ww-domain, and the highRMSD subsets with RMSD values above 0.77 ˚A in villinand above 0.8 ˚A in ww-domain. These four subsets areidentified by the dashed lines in the Figures. In Figures19 we show the probability density distributions for thevalues of (14) that we evaluate for these subsets. Both inthe case of villin and ww-domain the low-RMSD peakcorresponds to large values of Q which is characteristicto near-folded state, while the large-RMSD peak corre-sponds to predominantly small values of Q which arecharacteristic to unfolded states.3FIG. 18: Color online:
Top: The distribution of average valueof angular deviations for the reconstructed entries with Θ > . FIG. 19:
Color online: Statistical Method probabilitydistributions for the Q -values (14), corresponding to theGaussian peaks in Figures 16 and 17 a) low-RMSD villin, b)high-RMSD villin, c) low-RMSD ww-domain, d) high-RMSDww-domain. B. Side chain C β atoms We proceed to describe results from the C β atom re-construction. Due to the high accuracy in all the meth-ods we use, the presentation is less detailed than in thecase of peptide plane O atoms: The di ff erences to Anton simulation results are minuscule.We investigate reconstruction results in four ap-proaches:
Pulchra , Remo , Pulchra+Scwrl4 , and our
Sta-tistical method . We note that
Remo commonly constructsthe side chain atoms using
Scwrl while
Pulchra employsits own side chain reconstruction. Thus we have added a
Pulchra+Scwrl4 combination, where the peptide planesare first constructed with
Pulchra and then side chainsare constructed with
Scwrl4 . This combination shouldbe of interest, since we have found that
Pulchra performsslightly better than
Remo in the reconstruction of the
Anton peptide plane O atom positions.
1. Frenet spheres
In Figures 20 and 21 we have the Frenet sphere prob-ability distributions for the four methods, in the case ofvillin and ww-domain respectively.FIG. 20:
Color online:
Reconstructed side chain C β distribution for villin on the Frenet sphere S α . Figure a) Pulchra , Figure b)
Remo , Figure c)
Pulchra+Scwrl4 , Figure d)
Statistical Method
Pulchra reconstructs quite accurately the
Anton distri-butions of the C β atoms, shown in Figure 5. In partic-ular, it reconstructs the region of left-handed α -helices. Pulchra also broadens the overall shape of the distribu-tion in a manner which, at least superficially, appears toaccount for the thermal fluctuations that are present inthe
Anton distributions.
Pulchra and
Scwrlt4 combination increases the spreadof the C β distributions, there is now an even betterresemblance between the Anton distributions with the(perceived) thermal fluctuations.
Remo reconstruct the C β distributions in terms of veryconcentrated distributions that are highly peaked at the4FIG. 21: Color online:
Reconstructed side chain C β distribution for ww-domain on the Frenet sphere S α . Figurea) Pulchra , Figure b)
Remo , Figure c)
Pulchra+Scwrl4 , Figured)
Statistical Method C α and C β regions, with no observable thermal spread-ing: Remo appears to reconstruct the C β positions in avery straightforward two-stage fashion. In particular,there is a marked di ff erence between the Pulchra+Scwrl4 and
Remo distributions even though
Remo apparentlyuses
Scwrl for side chain reconstruction [14].
Statistical Method selects a subset of the full PDB dis-tribution in Figure 3 (bottom), as expected. Since thePDB data has been mostly measured at liquid nitrogentemperatures below ∼
77 K, the distributions do not dis-play thermal spreading.
2. Individual angular probability densities for side chains
In Figure 22 we have combined the probability dis-tribution functions for the individual angles Θ y β [ i, k ] inthe case of C β , for all the four reconstruction methodswe consider. We observe very little di ff erence betweenthe four methods, all distributions are strongly peakednear Θ ≈ .
15 (rad). According to Figure 9 (bottom) thePDB average for the C α -C β bond length is around 1.54˚A. Thus a Θ ≈ .
15 (rad) angular deviation correspondsto an average distance deviation of around 0.2 ˚A whichis well within the the limits of the individual B-factorfluctuation distances in Figure 8. FIG. 22:
Color online:
Probability distribution for theindividual angular deviations Θ yβ [ i ; k ] in the fourreconstruction methods.Top: villin and Bottom: ww-domain.
3. RMS Probability densities for reconstructed chains
Figures 23 show the C β probability distributions of(12) and Figures 24 show the C β probability distribu-tions of (13), for the reconstructed chains. Generallyspeaking, all four methods are able to reconstruct theC β positions with high accuracy. The Statistical Method performs best but the di ff erence to Remo is tiny. For
Pul-chra the results are slightly worse: Its combination with
Scwrl4 performs a bit better than
Pulchra alone in thecase of villin, but in the case of ww-domain the resultsare opposite. However, the di ff erences between all thefour methods are quite small, and the RMS distancesare all in line with the experimental B-factor fluctuationdistances shown in Figure 8. IV. CONCLUSIONS
To comprehend protein dynamics is a prerequisite forthe ability to understand how biologically active pro-teins function. However, despite the importance of pro-tein dynamics, our knowledge remains very limited.High quality experimental data on dynamical proteinsat near-physiological conditions is sparse and very dif-ficult to come by, the primary source of information istheoretical considerations in combination with all atommolecular dynamics simulations; the latter are best ex-emplified by the very long
Anton trajectories [20]Here we have searched for systematics in the dynam-5FIG. 23:
Color online:
Probability distribution for the angularRMS values (12) in the four reconstruction methods. Top:villin and Bottom: ww-domain.
FIG. 24:
Color online:
Probability distribution for the RMSdistances (13) in the four reconstruction methods. Top: villinand Bottom: ww-domain. ics of proteins at near-physiological conditions, by ana-lyzing the
Anton trajectories for villin and ww-domain.We have inquired to what extent can the dynamicsof C α atoms determine that of the peptide plane O atoms and the side chain C β atoms. For this we havecompared the original simulation results with trajecto-ries that we have reconstructed solely from the knowl-edge of the Anton C α trace. We have analyzed the re-sults from the reconstruction approaches Pulchra , Remo , Scwrl and a direct
Statistical Method that we developedhere. All these methods exploit crystallographic ProteinData Bank structures which have been measured mostlyat the very low temperatures of liquid nitrogen i.e. be-low 77 K. On the other hand, the
Anton simulations havebeen performed at around 360K. Thus we expect thatbesides e ff ects with a purely dynamical origin, thereshould be systematic di ff erences that can be allocatedto thermal fluctuations. Nevertheless, we have foundthat the positions of both O and C β atoms in the Anton trajectories can be determined with very high precisionsimply by using the knowledge of the static, crystallo-graphic PDB structures; both dynamical and thermal ef-fects are surprisingly small. The results propose that thepeptide plane and side chain dynamics is very stronglyslaved to the C α atom motions, and subject to only verysmall thermal fluctuation deviations.Our results can be explained in di ff erent ways: Itwould be truly remarkable if in a dynamical proteinat near-physiological conditions, the O and C β motionscan indeed be determined, and with a very high preci-sion, solely from a knowledge of the C α atom dynamics.Such a strong slaving to C α dynamics would be a verystrong impetus for the development of e ff ective energymodels for protein dynamics, in terms of reduced sets ofcoordinates at various levels of coarse graining. Alterna-tively, it can also be that the force field CHARMM22 (cid:63) that was used in the Anton simulations [20], simplylacks the resiliency of actual proteins. In that case ourresults can shed light for ways to improve the accuracyof existing force field, and help to determine more strin-gent standards for simulations. Indeed, we have ob-served the presence of very short-lived but systematicpeptide plane flips along the
Anton trajectories. Theseflips could be true physical e ff ects that are important toprotein folding and dynamics. But they could as wellbe a consequence of too harsh simulation obstructions,such as the use of too long elemental time steps and/orexclusion of all fluctuations in the hydrogen covalentbond lengths. We have also observed, in the case of both Pulchra and
Statistical Method , the presence of an appar-ent two-state structure in the O atom distributions, thatseems to correlate with the distance between the dy-namical structure and the natively folded state. In thecase of
Remo no such two-stage structure is observed.Quite unexpectedly to us, the purely PDB based
Sta-tistical Method appears to perform best in reconstruct-ing the O and C β atom positions. We suspect that thisis partly due to the choice of C α framing: The framingsthat are used in the case of Pulchra and
Remo are math-ematically correct, but might not account to the C α ge-ometry as well as the Frenet framing does. It is possi-ble that variants of Pulchra and
Remo that are based on6Frenet framing, could bring about even higher precisionthan any of the methods we have analysed. Thus, ourresults should help the future development of increas-ingly precise reconstruction algorithms, for a wide spec-trum of refinement and structure determination pur-poses. It should also be of interest to extend the
Sta-tistical Method for the analysis of all other heavy atomsalong a protein structure, possibly following [27].Finally, we note that the visual analysis methodologythat we have developed is very versatile. It can be ap-plied to analyze protein structure and dynamics, widely.
V. ACKNOWLEDGEMENTS
We thank N. Ilieva for communications and discus-sions, AJN also thanks A. Liwo for discussions. Thework of AJN has been supported by the Qian Renprogram at BIT, by a grant 2013-05288 and 2018-04411from Vetenskapsr˚adet, by Carl Trygger Stiftelseand and by Henrik Granholms stiftelse. [1] T.A. Jones, J.Y. Zou, S.W. Cowan, M. Kjeldgaard, ActaCryst.
A47
110 (1991)[2] I. Sillitoe, A.L. Cu ff , B.H. Dessailly, N.L. Dawson, N.Furnham. D. Lee, J.G. Lees, T.E. Lewis, R.A. Studer, R.Rentzsch, C. Yeats, J.M. Thornton, C.A. Orengo, NucleicAcids Res. , D490 (2013)[3] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, J.Mol. Biol.
536 (1995)[4] A. Roy, A. Kucukural, Y. Zhang, Nature Protocols
145 (2009)[7] K. Dill, S.B. Ozkan, T.R. Weikl, J.D. Chodera, V.A. Voelz,Curr. Op. Struct. Biol.
342 (2007)[8] H.A. Scheraga, M. Khalili, A. Liwo, Ann. Rev. Phys.Chem.
57 (2007)[9] S. Kmiecik, D. Gront, M. Kolinski, L. Wieteska, A.E.Dawid, A. Kolinski, Chem. Rev. , 7898 (2016)[10] L. Holm, C. Sander, Journ. Mol. Biol.
183 (1991)[11] M.A. DePristo, P.I.W. de Bakker, R.P. Shetty, T.L. Blundell,Prot. Sci.
437 (2003)[13] P. Rotkiewicz, J. Skolnick, Journ. Comp. Chem.
665 (2009)[15] G.G. Krivov, M.V Shapovalov, R.L. Dunbrack Jr., Proteins
964 (2007) [17] H. Frauenfelder, G. Chen, J. Berendzen, P.W. Fenimore,H. Jansson, B.H. McMahon, I.R. Stroe, J. Swenson, R.D.Young, PNAS
163 (2011)[19] S. Khodadadi, A.P. Sokolov, Soft Matter ff -Larsen, S. Piana, R. O. Dror, D.E. Shaw. Sci-ence ff -Larsen, D.E. Shaw, Biophys. J. L47(2011)[22] J. Janin, S. Wodak, M. Levitt, B. Maigret, J. Mol. Biol.
357 (1978)[23] S.C. Lovell, J. Word, J.S. Richardson, D.C. Richardson,Proteins
389 (2000)[24] H. Schrauber, F. Eisenhaber, P. Argos J. Mol. Biol.
27 (2014)[28] M. Sasai, P.G. Wolynes, Phys. Rev. Lett. E83139