[PDF] Can all-atom protein dynamics be reconstructed from the knowledge of C-alpha time evolution?

Abstract

We inquire to what extent protein peptide plane and side chain dynamics can be reconstructed from knowledge of C-alpha dynamics. Due to lack of experimental data we analyze all atom molecular dynamics trajectories from Anton supercomputer, and for clarity we limit our attention to the peptide plane O atoms and side chain C-beta atoms. We try and reconstruct their dynamics using four different approaches. Three of these are the publicly available reconstruction programs Pulchra, Remo Scwrl4. The fourth, Statistical Method, builds entirely on statistical analysis of Protein Data Bank (PDB) structures. All four methods place the O and C-beta atoms accurately along the Anton trajectories. However, the Statistical Method performs best. The results suggest that under physiological conditions, the all atom dynamics is slaved to that of C-alpha atoms. The results can help improve all atom force fields, and advance reconstruction and refinement methods for reduced protein structures. The results provide impetus for development of effective coarse grained force fields in terms of reduced coordinates.

Full PDF

CCan all-atom protein dynamics be reconstructedfrom the knowledge of C α time evolution? Jiaojiao Liu, ∗ Jin Dai, † Jianfeng He, Xubiao Peng, ‡ and Antti J. Niemi

2, 4, 5, 1, § School of Physics, Beijing Institute of Technology, Beijing 100081, P.R. China Nordita, Stockholm University, Roslagstullsbacken 23, SE-106 91 Stockholm, Sweden Center for Quantum Technology Research, School of Physics,Beijing Institute of Technology, Beijing 100081, P. R. China Department of Physics and Astronomy, Uppsala University, P.O. Box 516, S-75120, Uppsala, Sweden Laboratory of Physics of Living Matter, School of Biomedicine, Far Eastern Federal University, Vladivostok, Russia

We inquire to what extent protein peptide plane and side chain dynamics can be reconstructed fromknowledge of C α dynamics. Due to lack of experimental data we analyze all atom molecular dy-namics trajectories from Anton supercomputer, and for clarity we limit our attention to the peptideplane O atoms and side chain C β atoms. We try and reconstruct their dynamics using four di ﬀ erentapproaches. Three of these are the publicly available reconstruction programs Pulchra , Remo and

Scwrl4 . The fourth,

Statistical Method , builds entirely on statistical analysis of Protein Data Bank(PDB) structures. All four methods place the O and C β atoms accurately along the Anton trajectories.However, the

Statistical Method performs best. The results suggest that under physiological condi-tions, the all atom dynamics is slaved to that of C α atoms. The results can help improve all atomforce ﬁelds, and advance reconstruction and reﬁnement methods for reduced protein structures. Theresults provide impetus for development of e ﬀ ective coarse grained force ﬁelds in terms of reducedcoordinates. I. INTRODUCTION

The structure of a protein is commonly character-ized in terms of the C α atoms. They are located atthe branch points between the backbone and the sidechains, and as such their positions are subject to rela-tively stringent stereochemical constraints. For examplethe model building in a crystallographic structure de-termination experiment commonly starts with an initialC α skeletonization [1]. The central role of the C α atomsis also exploited widely in various structural classiﬁca-tion schemes [2, 3], in threading [4], homology [5] andother modeling techniques [6], and de novo approaches[7]. The development of coarse grained energy func-tions for folding prediction also frequently points outthe special role of C α atoms [8, 9], and the aim of theso-called C α trace problem is to construct an accuratemain chain and/or all atom model of a crystallographicfolded protein, solely from the knowledge of the posi-tions of the C α atoms [10–15].In the dynamical case, knowledge of the all atomstructure is pivotal to the understanding how biologi-cally active proteins function. However, in the case of adynamical protein it remains very hard to come by withhigh precision structural information, and as a conse-quence our understanding of protein dynamics remainsvery limited [16–19]. Here we test a widely suggested ∗ [email protected] † [email protected] ‡ [email protected] § proposal that the dynamics of backbone and side chainatoms could be strongly slaved to the dynamics of theC α atoms, under physiological conditions. For this weaddress a dynamical variant of the static C α -trace prob-lem: We inquire to what extent can the motions of thepeptide plane and the side chain atoms be estimatedfrom the knowledge of the C α atom positions, in a dy-namical protein that moves under physiological condi-tions. Any systematic correlation between the dynamicsof C α atoms and other heavy atoms could be most valu-able, for our understanding of many important biolog-ical processes. A slaving of the peptide plane and sidechain heavy atom motions to the C α backbone dynam-ics would mean that many aspects of protein dynam-ics can be described by e ﬀ ective coarse grained energyfunctions that are formulated in terms of reduced setsof coordinates that relate to the C α atoms only.Unfortunately, high precision experimental data ondynamical proteins under physiological conditions issparse, indeed almost non-existent. At the moment allatom molecular dynamics simulations remain the pri-mary source of dynamical information. These simula-tions are best exempliﬁed by the very long Anton trajec-tories [20], that use the CHARMM22 (cid:63) force ﬁeld [21].Accordingly we analyze the all atom trajectories thatwere produced in these simulations; speciﬁcally we con-sider the α -helical villin and the β -stranded ww-domaintrajectories reported in [20]. From the Anton trajecto-ries, we extract the C α dynamics. We then try to recon-struct the motions of other atoms: For clarity, we limitour attention to peptide plane O and the side chain C β atoms only. We reckon this is a limitation, but thesetwo atoms are common to all amino acids except forglycine that lacks the C β atom. Moreover, the O atom a r X i v : . [ q - b i o . B M ] J a n does not share a covalent bond, either with the C α or theC β . In the static case, the knowledge of the C α , O andC β atoms is often considered su ﬃ cient to determine thepositions of the remaining atoms, reliably and often atvery high precision e.g. with the help of stereochemicalconstraints and rotamer libraries [22–27]. Furthermore,there are highly predictive coarse grained and associa-tive memory Hamiltonians for protein structure deter-mination that employ exactly the reduced coordinate setof C α − C β − O atoms [28, 29].We compare four di ﬀ erent reconstruction methods.These include the three publicly available programs Pulchra [13]

Remo [14] and

Scwrl [15]; note that

Scwrl does not predict the peptide plane atom positions, in-stead it is commonly used in combination with

Remo topredict the side chain atom positions. The fourth ap-proach we introduce here. It is a pure

Statistical Method ,it is based entirely on information that we extract fromhigh resolution crystallographic PDB structures.We ﬁnd that all four methods have a very high successrate in predicting the positions of the dynamic O and C β atoms along the Anton C α backbone, both in the case ofvillin and ww-domain. Surprisingly, we ﬁnd that ourstraightforward Statistical Method performs even bet-ter than the three other much more elaborate methods.Since our

Statistical Method introduces no stereochemi-cal ﬁne tuning, nor force ﬁeld reﬁnement, it is also supe-rior to the other three in terms of computational speed.

II. METHODSA. Anton data

For data on protein dynamics, we use the all atomCHARMM22 (cid:63) force ﬁeld trajectories simulated with

Anton supercomputer and reported in [20]. The data hasbeen provided to us by the authors. We present detailedresults for two trajectories that we have chosen for struc-tural diversity: We have selected the α -helical villin(based on PDB structure 2F4K) and the β -stranded ww-domain (based on PDB structure 2F21). The length ofthe villin trajectory is 120 µ s, and we have selected ev-ery 20 th simulated structure for our prediction analysis,for a total of 31395 structures. The length of the ww-domain trajectory is 651 µ s and we have chosen every40 th simulated structure for our prediction analysis, fora total of 60814 structures. The combination of thesetwo trajectories covers all the major regular secondarystructures, with all the biologically relevant amino acidsappearing, except CYS with its unique potential to formsulphur bridges. Furthermore, the villin in [20] involvesa NLE mutant and the HIS in [20] is protonated. Thusour analysis includes, at least to some extent, the e ﬀ ectsof mutations and pH variations. In both villin and ww-domain, the Anton simulation observes several transi-tions between structures that are unfolded and that are(apparently) folded. This ensures that there is a good diversity of dynamical details for us to analyze.

B. Discrete Frenet frames

Our basic tool of analysis is the discrete Frenet fram-ing of the C α backbone [30] that we construct as follows:We take r i ( i = 1 , ..., N ) to be the (time dependent) coor-dinate of the i th C α atom along the backbone. The r i then form the vertices of the virtual C α backbone, thatwe visualize as a piecewise linear polygonal chain. Ateach vertex we introduce a right-handed orthonormalset of Frenet frames ( nbt ) that we deﬁne as follows t i = r i +1 − r i | r i +1 − r i | (1) b i = t i − × t i t i − × t i (2) n i = b i × t i (3)The framing is shown in Figure 1. For details of dis-crete Frenet frames and other framings of the C α back-bone, we refer to [30], and for a comparison with theRamachandran angle description we refer to [31].FIG. 1: Color online:

Deﬁnition of Frenet frames (1)-(3) at theposition of a C α atom. C. Pulchra , Remo and

Scwrl4 frames

The framing used in

Pulchra [13] is obtained fromfour successive C α coordinates. We start by deﬁning e i,j = r i − r j | r i − r j | (4)The Pulchra frames ( ν ix , ν iy , ν iz ) are then ν ix = e i − ,i +1 (5) ν iy = e i,i +1 × e i − ,i | e i,i +1 × e i − ,i | (6) ν iz = ν ix × ν iy (7)The Remo [14] reconstruction program uses also framesthat are obtained from four successive C α coordinates:We again start with (4) and we setThe Remo frames are then x i = e i − ,i +2 + e i,i +1 | e i − ,i +2 + e i,i +1 | (8) y i = e i − ,i +2 − e i,i +1 | e i − ,i +2 − e i,i +1 | (9) z i = x i × y i (10)Both Pulchra and

Remo frames are very di ﬀ erent fromthe discrete Frenet frames. In particular, both of theseframings exploit four consecutive C α atoms, while thediscrete Frenet frames use only three.Finally, in the case of Scwrl4 the backbone is describedin terms of the Ramachandran angles [15].FIG. 2:

Color online:

Deﬁnition of bond ( κ i ) and torsion ( τ i )angles in relation to the i th C α atom. D. Visualization

For protein structure visualization and our data anal-ysis, we use exclusively the Frenet frames ( n , b , t ); foreach C α atom we associate a (virtual) C α backbone bond( κ ) and torsion ( τ ) angle as follows, κ i = arccos( t i +1 · t i ) τ i = sign( b i +1 · n i ) arccos( b i +1 · b i ) (11)These angles are shown in Figure 2. To visualize thevarious atoms in a protein, we identify the bond andtorsion angles ( κ, τ ) as the canonical latitude and lon-gitude angles on the surface of a unit radius (Frenet)sphere S α : The center of the sphere is located at theC α atom, the north-pole of S α is the point where thelatitude angle κ = 0, the vector t points to the directionof the north-pole and this direction coincides with the canonical z -direction in a C α centered Cartesian coordi-nate system. The great circle τ = 0 passes through thenorth-pole and the tip of the normal vector n that lies atthe equator, the longitude a.k.a. torsion angle takes val-ues τ ∈ [ − π, π ) and it increases in the counterclockwisedirection around the positive z -axis i.e. around vector t .In the sequel, whenever we introduce a Frenet sphere S we shall use the convention that the triplet ( n , b , t )corresponds to the right-handed Cartesian ( xyz ) ∼ ( rgb )coordinate system, with the convention that n ∼ x ∼ red( r ), b ∼ y ∼ green ( g ) and t ∼ z ∼ blue ( b ). The color cod-ing of distributions on the spheres are always relativebut with equal MatLab setting, in all the cases that wedisplay, and intensity increases from no-entry white tolow density blue, and towards high density red.We plot the directions of the peptide plane O and sidechain C β atoms on the surface of S α in exactly the wayhow they are seen from the position of a miniature ob-server, standing at the center of the sphere S α and withhead up towards the north-pole. For the O atoms weuse spherical coordinates that we denote ( θ, φ ) for thelatitude and longitude, for the C β atoms we denote thespherical coordinates ( ϑ, ϕ ) for the latitude and longi-tude on S α . Both sets of coordinates are in direct corre-spondence with ( κ, τ ).In Figure 3 we visualize the statistical reference distri-butions of the O and C β atoms, in crystallographic Pro-tein Data Bank structures that have been measured withbetter than 1.0 ˚A resolution. We choose these, since wetrust that such ultra high resolution structures are rela-tively void of reﬁnement.The O distribution is concentrated on a very narrowcircle-like annulus. It forms the base of a right circularcone with axis that coincides with the t vector and withconical apex at the center of the Frenet sphere; the lati-tude of the annulus is very close to θ = π/ π/

2. The regionsof α -helical and β -stranded regular structures are con-nected by a region of left-handed α -helical structuresand by a region of loops. There are very few entries out-side this circular region; notably the cis -peptide planeregion is located on a short strip under the β -strandedregion with a latitude angle close to θ ≈ π/ β distribution shown in Figure 3 is a little likea horse-shoe, forming a slightly distorted annulus thatis somewhat wider than in the case of the O distribu-tion. The regions of α -helical and β -stranded regularstructures are connected by a region of loops but thedistribution of left-handed α -helical structures is nowdisjoint from the main distribution.In Figures 4 and 5 we show the corresponding Anton distributions for the O an C β atoms, separately in thecase of villin and ww-domain. When we compare the Anton distributions and the ultra high resolution PDBdata of Figure 3, we observe that the overall structureof the statistical distributions are very similar. We alsonote that as expected, in the villin trajectory there is aclear predominance of the α -helical region, while in theFIG. 3: Color online:

Top: The ( θ,φ ) distribution of peptideplane O atoms in the below 1.0 ˚A resolution PDB structureson the surface of the Frenet sphere S α . Bottom: The ( ϑ,ϕ )distribution of C β atoms in the below 1.0 ˚A resolution PDBstructures on S α . In both Fgures we have identiﬁed the majorregions α -helices ( α ), β -strands ( β ), left-handed α -helices( α L ) and cis -peptide planes ( cis ). case of ww-domain the β -stranded region dominates.The PDB and Anton distributions are otherwise remark-ably similar, superﬁcially the only di ﬀ erence appears tobe due to thermal ﬂuctuations in the latter: The crystal-lographic data is often taken at liquid nitrogen temper-atures below 77 K while the simulation temperature inthe Anton data is around 360K for both villin and ww-domain.The strong similarity between the distributions in Fig-ures 3-5 motivates us to make the following bold pro-posal: Even in the case of a dynamical protein undernear-physiological conditions, the relative positions ofthe O and C β atoms can be reconstructed with high ac-curacy from the knowledge of the C α atoms motion - atleast when the CHARMM22 (cid:63) force ﬁeld is used to de- FIG. 4: Color online:

Top: The distribution of peptide planeO atoms in the villin trajectory of

Anton . Bottom: Thedistribution of peptide plane O atoms in the ww-domaintrajectory of

Anton . scribe the dynamics. E. Reconstruction

We proceed to try and reproduce the individual O andC β atom positions in the Anton trajectories of villin andww-domain [20], solely from the knowledge of the C α atoms. We employ four di ﬀ erent reconstruction meth-ods:

1. Pulchra

For

Pulchra [13] we use the version 3.04. We start withthe C α coordinates that we obtain from Anton . We thenuse

Pulchra to reconstruct the other heavy atom posi-tions, with only the instantaneous C α coordinates of theFIG. 5: Color online:

Top: The distribution of side chain C β atoms in the villin trajectory of Anton . Bottom: Thedistribution of side chain C β atoms in the ww-domaintrajectory of Anton . Anton trajectory as an input. From the

Pulchra struc-tures we read the coordinates of the peptide plane Oand side chain C β atoms, and compare with the origi-nal Anton data.

2. Remo

For

Remo [14] we use the version 3.0, and we proceedwith reconstruction as in the case of

Pulchra . We notethat

Remo employs

Scwrl for the side chains.

3. Scwrl4

For stand-alone

Scwrl [15] we use the version 4.0.Since

Scwrl can not reconstruct the peptide planes, weuse it only for the C β comparison. For this we ﬁrst con-struct the peptide planes using Pulchra , since

Remo isalready based on

Scwrl .

4. Statistical Method

Unlike the previous three methods, our

StatisticalMethod approach to reconstruction does not employany force ﬁeld reﬁnement, stereochemical constraints,or any other kind of data curation. It only uses sta-tistical analysis of PDB data to predict the positions ofthe peptide plane O and side chain C β atoms from theknowledge of the C α atom positions: Once the C α co-ordinates of an amino acid are given, a search algorithmﬁts it with a PDB structure and identiﬁes the ensuingO and C β atom coordinates as the reconstructed coordi-nates.As already stated, the PDB pool consist of all thosecrystallographic structures that have been measuredwith better than 1.0 ˚A resolution. We start with a vi-sual presentation of the C α bond and torsion angle den-sity distribution of these PDB structures, in terms of astereographically projected Frenet sphere.Let S α ( i ) be centered at the i th C α atom of a givenPDB structure. The vector t i has its tail at the centerof S α ( i ) and its head lies at the north pole, this vectorpoints from the i th C α atom towards the ( i + 1) st C α atom. The ( i + 2) nd C α is then located similarly, in thedirection of the Frenet vector t i +1 that points from thecenter of S α ( i + 1) towards its north pole.We parallel transport the vector t i +1 without any ro-tation until its tail becomes located at the i th C α atomposition i.e. at the origin of S α ( i ). Let ( κ i , τ i ) be theFrenet frame coordinates of the head of the paralleltransported t i +1 on the surface of S α ( i ). These coor-dinates depict how a miniature Frenet frame observer,standing at the position of the i th C α atom and head to-wards the north pole of S α ( i ), sees the backbone twist-ing and bending when she proceeds along the chain tothe position of the ( i + 1) st C α atom.We repeat the construction for all the C α atoms alongall the chains in our pool. This yields us a statistical dis-tribution in terms of the coordinates ( κ, τ ) of the headsof the parallel transported vectors t . We visualize thedistribution by projecting the sphere S α stereograph-ically onto the complex plane from the south pole asshown in Figure 6. The relation between the sphericalcoordinates ( κ, τ ) and the z = x + iy coordinates on theplane is x + iy = tan (cid:18) κ (cid:19) e iτ FIG. 6:

Color online:

Stereographic projection of two sphereonto plane, with the projection taken from the south-pole.

In Figure 7 we show the distribution of all the C α atom coordinates in our PDB data set, on the stereo-graphically projected Frenet sphere. The distribution isFIG. 7: Color online:

Statistical distribution of all C α atomsin our 1.0 ˚A pool of PDB structures, on stereographicallyprojected sphere shown in Figure 6. largely concentrated inside an annulus, with inner cir-cle κ in ≈ κ out ≈ π/

2. The various reg-ular secondary structures such as α -helices, β -strandsand left-handed α -helices are clearly identiﬁable andmarked in this Figure.We proceed to describe how we predict the positionsof the O and C β atoms from the knowledge of the C α atoms coordinates, along a given (dynamical) proteinstructure: We ﬁrst use the torsion angle τ ∈ [ − π, π ) to di-vide the statistical distribution of Figure 7 into 60 equalsize sectors sized ∆ τ = π/

30 radians. We then divideeach of these sectors into two sets, one with bond an-gle κ < . κ ≥ . κ = 1 . α -helix-like and β -strand-like (secondary) structures.Now suppose that we have a C α atom along a proteinbackbone, with coordinates ( κ i , τ i ). We determine thecoordinates ( θ i , φ i ) of the corresponding O atom and thecoordinates ( ϑ i , ϕ i ) of the corresponding C β atom usingthe following algorithm: Step 1:

We ﬁrst use the τ i value of the C α atom to selectthe pertinent sector τ i ∈ ∆ τ . We then use its κ i valuetogether with κ i +1 of the following C α atom along thechain, to assign with the given C α atom one of the four sets Set ∆ κ : κ i < . κ i +1 < . ∆ κ : κ i < . κ i +1 ≥ . ∆ κ : κ i ≥ . κ i +1 < . ∆ κ : κ i ≥ . κ i +1 ≥ . ∆ τ sectors this divides our statisti-cal C α distribution into 4 ×

60 subsets [ ∆ κ ; ∆ τ ], and wechoose the one that corresponds to the ( κ i , κ i +1 , τ i ) val-ues of the C α atom we consider. Step 2:

We search a protein structure in our pool, inthe subset [ ∆ κ, ∆ τ ] of the i th C α , for which two consec-utive amino acids are also identical to the i th and ( i +1) st amino acids of the protein structure that we consider. • If we ﬁnd only one matching pair of amino acids in thesubset [ ∆ κ, ∆ τ ], we use the coordinates of its O and C β atoms as the predicted coordinates of the O, C β atomsof the i th C α atom. • If there are two or more matching pairs, we use the av-erage value of their O and C β coordinates to determinethose of the O and C β around the C α of interest. • If there are no pairs of identical amino acids in thesubset [ ∆ κ, ∆ τ ], we use the average value of all PDBstructures in this subset to determine the O and C β co-ordinates. • Finally, if the subset [ ∆ κ, ∆ τ ] is empty we search forone from a neighboring subset, ﬁrst from preceding ∆ τ ,then from following ∆ τ , then from neighboring ∆ κ ; butsuch cases are highly exceptional. Step 3:

We repeat the process for all C α atoms alongthe chain. At the end of the chain there is no κ i +1 , thusat the end of the chain we use only the κ i value in oursearch.Our reconstruction algorithm is extremely simple andproceeds very fast computationally, much faster thanany of the other three reconstruction programs we con-sider, even though we have not optimized the search al-gorithm but use a straightforward MatLab code.The sector size ∆ τ can be changed and optimised;here we have chosen 60 sectors, only to exemplify themethod.The reason why we divide the original annulus intotwo using κ = 1 . α atoms, in thecase of a bond angle only three C α are needed; see Fig-ure 2. Thus, by engaging the two neighboring bond an-gle values, we employ the full information in all four C α atoms in our search algorithm.In Step 2, we calculate the average values of the anglesas follows: For the average latitude θ ave (similarly for ϑ ave ) we simply use θ ave = 1 N N (cid:88) i = k θ k where the summation is over all elements in the givensubset [ ∆ κ, ∆ τ ]. For the average longitude φ ave (simi-larly for ϕ ave ) we proceed as follows: We ﬁrst deﬁne X = 1 N N (cid:88) i = k cos φ k & Y = 1 N N (cid:88) i = k sin φ k and R = √ X + Y The average value is then obtained fromcos φ ave = XR & sin φ ave = YR F. Algorithm comparisons

1. Direction comparison

In order to compare the di ﬀ erent reconstructionmethods we introduce two unit length vectors −−−−→ CαO and −−−−→

Cαβ ; the former points from the C α atom to the follow-ing peptide plane O atom ( X = O in the sequel), and thelatter points from the C α atom to its side chain C β atom( X = β in the sequel). We evaluate these vectors for allresidues i = 1 , ..., N and for every structure k = 1 , ..., K in the Anton data. We also evaluate these vectors inall of the four reconstruction methods and in the sequely = P , R , S , M stands for

Pulchra (P),

Remo (R),

Scwrl4 (S)and the

Statistical Method (M) respectively.We deﬁne Θ yX [ i, k ] to be the angle between a vector −−−−→ CαX evaluated from the

Anton data, and the corre-sponding vector obtained from the corresponding re-construction method. The statistical distribution func-tion for all the values Θ yX [ i, k ] measures the overall accu-racy of a given method, for reconstructing the individ-ual O and C β atom positions.For each of the k = 1 , ..., K Anton chain structure, weevaluate the root-mean-square (RMS) value of the angles Θ yX [ i, k ] by summing over the N individual residues ofthe chain, RMS [ Θ yX ( k )] = (cid:118)(cid:117)(cid:116) N N (cid:88) i =1 (cid:16) Θ yX [ i, k ] (cid:17) (12)The distribution density (12) then measures the overallaccuracy of a method, in the reconstruction of the C α -O-C β chains.

2. Distance comparison

We also compare the algorithms by evaluating theRMS distance between di ﬀ erent atomic positions in the Anton trajectory and in the reconstructed trajectory. TheRMS distance between two chain structures is evaluatedfrom RMSD( X ; k ) = (cid:118)(cid:117)(cid:116) N N (cid:88) i =1 | r A,ki,X − r y,ki,X | (13)Here r A,ki,X is the

Anton data distance between the C α atom and the X = O , β atom at residue i in structure k , and r y,ki,X is the corresponding quantity in the y =P , R , S , M reconstructed structure. The probability dis-tribution of (13) is a complement of (12), as a measureof the overall accuracy of a method in the reconstructionof the entire chain in a statistical sense.For reference, in Figure 8 we present the combineddistribution of the Debye-Waller ﬂuctuation distances (cid:112) < | x | > = (cid:114) B π for the O and C β atoms, that we have evaluated usingthe B -factors in our 1.0 ˚A PDB pool; note the logarith-mic scale. The ﬂuctuation distances are strongly peakedat around 0.3 ˚A, and there are practically no structureswith a ﬂuctuation distance less than 0.12 ˚A.FIG. 8: Color online:

Combined B-factor ﬂuctuation distancesfor the O and C β atoms Similarly, Figure 9 shows the statistical distributionsin the individual C α -O and C α -C β distances that wecalculate directly from the coordinates in our PDB datapool and Anton data, respectively; only results for villinare shown as the results for ww-domain are very simi-lar. In the following we shall not reﬁne the C α -O andthe C α -C β distances, in our statistical method, the dif-ferences are in any case minor. Instead we simply usethe average PDB distance values ∆ r = 2 .

40 ˚A For C α − O distance ∆ r = 1 .

53 ˚A For C α − C β distancein our Statistical Method reconstruction. It is natural toallocate the larger variance in the case of

Anton data inFigures 9 to temperature ﬂuctuations.FIG. 9:

Color online: distribution of distances in ˚Angstr¨om,calculated directly from the coordinates in our PDB data and

Anton data for villin. Top: C α -O Bottom: C α -C β . III. RESULTSA. Peptide plane O atoms

We start with the peptide plane O atom distributionsin the

Anton data. We compare the

Anton results shownin Figure 4, with results from

Pulchra , Remo and

Statis-tical Method .

1. Frenet Spheres

In Figures 10 we present the reconstructed O atomdistributions on a Frenet sphere S , in the case of villin.The Figures display all the reconstructed O atom coor-dinates ( θ, φ ) for all the C α atoms of all Anton trajec-tories that we obtain using

Pulchra , Remo and

StatisticalMethod respectively.Overall, all three distributions reproduce well the Oatom villin distribution of the

Anton trajectory in Fig-ure 4 (top). In particular, the regular α -helical and β -stranded regions are clearly identiﬁable. We observethat both Pulchra and

Remo distributions are slightlywider than the PDB distribution, we propose that thisreﬂects mainly the presence of thermal e ﬀ ects in the An-ton data, as captured by these two methods. On theother hand, the

Statistical Method distribution is moreconcentrated. This is expected since the data pool is asubset of the PDB data shown in Figure 4 (top), whichhave all been measured at very low temperature values. FIG. 10:

Color online:

Top: Reconstructed peptide plane Odistribution for villin on Frenet sphere S . Top: Pulchra

Middle:

Remo

Bottom:

Statistical Method . We remind that the color coding is always relative butequal, in all the cases that we display, and intensity in-creases from no-entry white to low density blue, and to-wards high density red.In Figures 11 we present the reconstructed O distri-butions in the case of ww-domain. Again, all three dis-tributions are very much in line with the

Anton data ofFigure 4 (bottom). We observe some fragmentation andexcess (thermal) data spreading in the case of

Pulchra .The

Remo distribution also displays thermal spreadingwhile the

Statistical Method distribution is again moreconcentrated, as expected since it forms a subset of thelow temperature PDB data.We now analyze the reconstruction results for thepeptide plane O atoms in more detail, using the meth-ods of Section II F.

2. Individual angular probability densities for peptide planes

In Figures 12 and 13 we show the normalized proba-bility density distributions for all the individual angles Θ yO [ i, k ] along the Anton trajectories for

Pulchra , Remo and the

Statistical method in the case of villin and ww-domain O atoms, respectively.In both

Pulchra and

Remo the individual Θ yO [ i, k ] val-ues peak near the small value Θ max ≈ . Statistical Method the peak is located at the even smallervalue Θ max ≈ .

06 (rad), both in the case of villin andww-domain.From Figure 9 we learn that the average PDB distancebetween C α and O is around 2.4 ˚A. An angular devia-tion of Θ max ≈ .

06 then corresponds to a distance devi-ation ∼ .

14 ˚A which is smaller than the B-factor ﬂuc-tuation distances in Figure 8 and more in line with the(apparently purely thermal) distance deviations we ob-serve in Figures 9.We conclude that each of the three methods can re-construct the individual angular positions of the dy-namical

Anton

O atoms with very high precision. More-over, despite its simplicity the

Statistical Method per-forms even better than both

Pulchra and

Remo .In each of the probability distributions of Figures12, 13 we observe enhanced accumulation of data near Θ ≈ π/

2; the inserts show the probability densities for Θ -values above 1.0 (rad). We recall our interpretationof the Frenet sphere O distribution as the base of a cone,with the apex at the origin; the vertex angle has a valuevery close to π/

2. Thus the Θ ≈ π/ φ ∼

180 degree rotation around the (blue) t -vector in the Figures 10, 11. We observe that a rotationof the longitude φ by ∼ o exchanges the α -helicaland β -stranded regions according to top Figure 3.

3. Angular probability densities for entire chains

In Figures 14, 15 we show the probability density dis-tributions (12) for

Pulchra , Remo and

Statistical Method ,in the case of villin and ww-domain O atoms, respec-tively. We remind that the value of (12) is a measurefor the accuracy of the reconstruction, in the case of theentire chain. FIG. 11:

Color online:

Top: Reconstruction of peptide planeO distribution on the sphere S α for ww-domain. Top: Pulchra

Middle:

Remo

Bottom:

Statistical Method . In each case, the reconstructed chains appear to bevery close to the original

Anton chains, both in the caseof villin and ww-domain. The

Statistical Method per-forms best and the results for

Pulchra are quite similar,but for

Remo we observe a clear deviation from the

Sta-

Color online:

Probability density distribution for allindividual angles Θ yO [ i,k ] in the case of Anton villintrajectories. Top:

Pulchra

Middle:

Remo

Bottom:

StatisticalMethod . tistical method ; the results from the latter are visibly bet-ter.Note that in the case of both Pulchra and

StatisticalMethod , both the villin and the ww-domain proﬁles ap-pear to resemble a combination of two distinct Gaussiandistributions. On the other hand, in the case of

Remo thedistribution is like a single Gaussian (thermal spread),in both cases.

4. RMSD probability densities for entire chains

In Figures 16, 17 we show the probability density dis-tributions for the RMS distances (13), evaluated for theentire C α -O reconstructed chains that we obtain using Pulchra , Remo and

Statistical Method in the case of villinand ww-domain

Anton trajectories.Again, the reconstruction by the

Statistical Method isclosest to the original

Anton result. The di ﬀ erence to Pulchra is very small but there is a more visible di ﬀ er- FIG. 13: Color online:

Probability density distribution for allangles Θ yO [ i,k ] along the Anton ww-domain trajectories. Top:

Pulchra

Middle:

Remo

Bottom:

Statistical Method . ence to Remo .Even though all the RMS distances between the O-chains shown in Figures 16 and 17 are small, they areslightly larger than what we can expect on the basis ofthe individual (O and C β ) atom B-factor ﬂuctuation dis-tances in Figure 8. In line with Figures 14, 15, both Pul-chra and

Statistical Method distributions again exhibit adouble Gaussian proﬁle; In the case of

Remo the distri-butions form a single Gaussian which is more in linewith a thermal spread, except that the peak is close to1.0 ˚A which is a somewhat large value in comparison tothe

Statistical Method where the two peaks are slightlybelow values 0.6 ˚A and 0.8 ˚A i.e. close to the cova-lent radius ∼ .

66 ˚A of O atom. But even in the caseof

Remo , the deviation distances can be considered to bequite small and we consider the results to be good.1FIG. 14:

Color online:

The probability density distributions(12) for villin. Top:

Pulchra

Middle:

Remo

Bottom:

StatisticalMethod .

5. Peptide plane ﬂip

In Figures 12 and 13 we have noted an accumulationof entries near Θ ∼ π/

2, corresponding to a ∼

180 degreerotation of the entire probability distribution aroundthe t vector. Consequently these entries contribute max-imally to the deviations between the Anton chain andthe reconstructed chains, in terms of statistical distribu-tions. We note that Θ ∼ π/ ∼ . Anton entries for which Θ > . all of the three reconstructionmethods; the spatial distance for two entries that are Θ ∼ . ∼ Θ > . An-ton data from PDB structures. There are a total ofaround 6.000 such entries, this corresponds to a mere0.6 per cent of the total entries that we have analyzed.We evaluate the average values of Θ for all these entries.The probability distribution for these average values isconcentrated very close to Θ = π/ Color online:

The probability density distributions(12) for ww-domain. Top:

Pulchra

Middle:

Remo

Bottom:

Statistical Method .

18 (top). The entries are also distributed quite evenlyaround the O-circle on the Frenet sphere (Figure 18)(bottom), they seem to appear quite randomly duringthe

Anton time evolution, and correspond to very sud-den and short-lived 180 degree back-and-forth rotations(ﬂips) of the entire peptide plane, around the virtualbond that connect two consecutive C α atoms.From the available Anton data, we are unable toconclude whether these peptide plane ﬂips are gen-uine physical e ﬀ ects with an important role in pro-tein folding, or whether they are mere simulation ar-tifacts, or whether they are e ﬀ ects that are speciﬁc tothe CHARMM22 (cid:63) force ﬁeld [21]. For example, it ap-pears that in the Anton simulations the peptide plane N-H covalent bond lengths are constrained to have a ﬁxedvalue. That may a ﬀ ect the stability of peptide planes,causing them to ﬂip by 180 degrees.To properly understand the character of these pep-tide plane ﬂips one needs to perform more detailed allatom simulations, presumably with shorter time stepsthan used in Anton simulations and with no length con-straints on the N-H covalent bonds.2FIG. 16:

Color online:

The probability density distributionsfor the RMS distance (13) over the O atoms in the case ofvillin. Top:

Pulchra

Middle:

Remo

Bottom:

Statistical Method .

6. Double Gaussians

In Figures 14-17 we have observed the presence of adouble Gaussian peak structures, in the

Pulchra and

Sta-tistical Method distributions. We now proceed to try andidentify the cause. We present a detailed analysis of thedouble Gaussian structures in the case of the

StatisticalMethod

RMSD distributions in Figures 16 and 17; theanalysis in the other cases is similar, with similar con-clusions.In [20] the following quantity Q ( t ) = (cid:80) N res i =1 (cid:80) N i j =1 (cid:20) e d ij ( t ) − ( d ij +1)) (cid:21) − (cid:80) N res i =1 N i (14)has been introduced, to characterize the distance of adynamical chain at time t , to an experimentally mea-sured folded state. Here N i is the number of contactsof residue i along the chain as deﬁned in [20], d ij ( t ) isthe distance in ˚A between the C α atoms of residues i and j at time t and d ij is the same distance in the na- FIG. 17: Color online:

The probability density distributionsfor the RMS distance (13) over the O atoms in the case ofww-domain. Top:

Pulchra

Middle:

Remo

Bottom:

StatisticalMethod . tively folded, crystallographic structure. According to[20] a structure with Q > . Q < . Statistical Method peaks in thevillin distribution Figure 16 and in the ww-domain dis-tribution Figure 17. In both cases, we analyze separatelythe low RMSD subsets with values below 0.54 ˚A invillin and below 0.6 ˚A in ww-domain, and the highRMSD subsets with RMSD values above 0.77 ˚A in villinand above 0.8 ˚A in ww-domain. These four subsets areidentiﬁed by the dashed lines in the Figures. In Figures19 we show the probability density distributions for thevalues of (14) that we evaluate for these subsets. Both inthe case of villin and ww-domain the low-RMSD peakcorresponds to large values of Q which is characteristicto near-folded state, while the large-RMSD peak corre-sponds to predominantly small values of Q which arecharacteristic to unfolded states.3FIG. 18: Color online:

Top: The distribution of average valueof angular deviations for the reconstructed entries with Θ > . FIG. 19:

Color online: Statistical Method probabilitydistributions for the Q -values (14), corresponding to theGaussian peaks in Figures 16 and 17 a) low-RMSD villin, b)high-RMSD villin, c) low-RMSD ww-domain, d) high-RMSDww-domain. B. Side chain C β atoms We proceed to describe results from the C β atom re-construction. Due to the high accuracy in all the meth-ods we use, the presentation is less detailed than in thecase of peptide plane O atoms: The di ﬀ erences to Anton simulation results are minuscule.We investigate reconstruction results in four ap-proaches:

Pulchra , Remo , Pulchra+Scwrl4 , and our

Sta-tistical method . We note that

Remo commonly constructsthe side chain atoms using

Scwrl while

Pulchra employsits own side chain reconstruction. Thus we have added a

Pulchra+Scwrl4 combination, where the peptide planesare ﬁrst constructed with

Pulchra and then side chainsare constructed with

Scwrl4 . This combination shouldbe of interest, since we have found that

Pulchra performsslightly better than

Remo in the reconstruction of the

Anton peptide plane O atom positions.

1. Frenet spheres

In Figures 20 and 21 we have the Frenet sphere prob-ability distributions for the four methods, in the case ofvillin and ww-domain respectively.FIG. 20:

Color online:

Reconstructed side chain C β distribution for villin on the Frenet sphere S α . Figure a) Pulchra , Figure b)

Remo , Figure c)

Pulchra+Scwrl4 , Figure d)

Statistical Method

Pulchra reconstructs quite accurately the

Anton distri-butions of the C β atoms, shown in Figure 5. In partic-ular, it reconstructs the region of left-handed α -helices. Pulchra also broadens the overall shape of the distribu-tion in a manner which, at least superﬁcially, appears toaccount for the thermal ﬂuctuations that are present inthe

Anton distributions.

Pulchra and

Scwrlt4 combination increases the spreadof the C β distributions, there is now an even betterresemblance between the Anton distributions with the(perceived) thermal ﬂuctuations.

Remo reconstruct the C β distributions in terms of veryconcentrated distributions that are highly peaked at the4FIG. 21: Color online:

Reconstructed side chain C β distribution for ww-domain on the Frenet sphere S α . Figurea) Pulchra , Figure b)

Remo , Figure c)

Pulchra+Scwrl4 , Figured)

Statistical Method C α and C β regions, with no observable thermal spread-ing: Remo appears to reconstruct the C β positions in avery straightforward two-stage fashion. In particular,there is a marked di ﬀ erence between the Pulchra+Scwrl4 and

Remo distributions even though

Remo apparentlyuses

Scwrl for side chain reconstruction [14].

Statistical Method selects a subset of the full PDB dis-tribution in Figure 3 (bottom), as expected. Since thePDB data has been mostly measured at liquid nitrogentemperatures below ∼

77 K, the distributions do not dis-play thermal spreading.

2. Individual angular probability densities for side chains

In Figure 22 we have combined the probability dis-tribution functions for the individual angles Θ y β [ i, k ] inthe case of C β , for all the four reconstruction methodswe consider. We observe very little di ﬀ erence betweenthe four methods, all distributions are strongly peakednear Θ ≈ .

15 (rad). According to Figure 9 (bottom) thePDB average for the C α -C β bond length is around 1.54˚A. Thus a Θ ≈ .

15 (rad) angular deviation correspondsto an average distance deviation of around 0.2 ˚A whichis well within the the limits of the individual B-factorﬂuctuation distances in Figure 8. FIG. 22:

Color online:

Probability distribution for theindividual angular deviations Θ yβ [ i ; k ] in the fourreconstruction methods.Top: villin and Bottom: ww-domain.

3. RMS Probability densities for reconstructed chains

Figures 23 show the C β probability distributions of(12) and Figures 24 show the C β probability distribu-tions of (13), for the reconstructed chains. Generallyspeaking, all four methods are able to reconstruct theC β positions with high accuracy. The Statistical Method performs best but the di ﬀ erence to Remo is tiny. For

Pul-chra the results are slightly worse: Its combination with

Scwrl4 performs a bit better than

Pulchra alone in thecase of villin, but in the case of ww-domain the resultsare opposite. However, the di ﬀ erences between all thefour methods are quite small, and the RMS distancesare all in line with the experimental B-factor ﬂuctuationdistances shown in Figure 8. IV. CONCLUSIONS

To comprehend protein dynamics is a prerequisite forthe ability to understand how biologically active pro-teins function. However, despite the importance of pro-tein dynamics, our knowledge remains very limited.High quality experimental data on dynamical proteinsat near-physiological conditions is sparse and very dif-ﬁcult to come by, the primary source of information istheoretical considerations in combination with all atommolecular dynamics simulations; the latter are best ex-empliﬁed by the very long

Anton trajectories [20]Here we have searched for systematics in the dynam-5FIG. 23:

Color online:

Probability distribution for the angularRMS values (12) in the four reconstruction methods. Top:villin and Bottom: ww-domain.

FIG. 24:

Color online:

Probability distribution for the RMSdistances (13) in the four reconstruction methods. Top: villinand Bottom: ww-domain. ics of proteins at near-physiological conditions, by ana-lyzing the

Anton trajectories for villin and ww-domain.We have inquired to what extent can the dynamicsof C α atoms determine that of the peptide plane O atoms and the side chain C β atoms. For this we havecompared the original simulation results with trajecto-ries that we have reconstructed solely from the knowl-edge of the Anton C α trace. We have analyzed the re-sults from the reconstruction approaches Pulchra , Remo , Scwrl and a direct

Statistical Method that we developedhere. All these methods exploit crystallographic ProteinData Bank structures which have been measured mostlyat the very low temperatures of liquid nitrogen i.e. be-low 77 K. On the other hand, the

Anton simulations havebeen performed at around 360K. Thus we expect thatbesides e ﬀ ects with a purely dynamical origin, thereshould be systematic di ﬀ erences that can be allocatedto thermal ﬂuctuations. Nevertheless, we have foundthat the positions of both O and C β atoms in the Anton trajectories can be determined with very high precisionsimply by using the knowledge of the static, crystallo-graphic PDB structures; both dynamical and thermal ef-fects are surprisingly small. The results propose that thepeptide plane and side chain dynamics is very stronglyslaved to the C α atom motions, and subject to only verysmall thermal ﬂuctuation deviations.Our results can be explained in di ﬀ erent ways: Itwould be truly remarkable if in a dynamical proteinat near-physiological conditions, the O and C β motionscan indeed be determined, and with a very high preci-sion, solely from a knowledge of the C α atom dynamics.Such a strong slaving to C α dynamics would be a verystrong impetus for the development of e ﬀ ective energymodels for protein dynamics, in terms of reduced sets ofcoordinates at various levels of coarse graining. Alterna-tively, it can also be that the force ﬁeld CHARMM22 (cid:63) that was used in the Anton simulations [20], simplylacks the resiliency of actual proteins. In that case ourresults can shed light for ways to improve the accuracyof existing force ﬁeld, and help to determine more strin-gent standards for simulations. Indeed, we have ob-served the presence of very short-lived but systematicpeptide plane ﬂips along the

Anton trajectories. Theseﬂips could be true physical e ﬀ ects that are important toprotein folding and dynamics. But they could as wellbe a consequence of too harsh simulation obstructions,such as the use of too long elemental time steps and/orexclusion of all ﬂuctuations in the hydrogen covalentbond lengths. We have also observed, in the case of both Pulchra and

Statistical Method , the presence of an appar-ent two-state structure in the O atom distributions, thatseems to correlate with the distance between the dy-namical structure and the natively folded state. In thecase of

Remo no such two-stage structure is observed.Quite unexpectedly to us, the purely PDB based

Sta-tistical Method appears to perform best in reconstruct-ing the O and C β atom positions. We suspect that thisis partly due to the choice of C α framing: The framingsthat are used in the case of Pulchra and

Remo are math-ematically correct, but might not account to the C α ge-ometry as well as the Frenet framing does. It is possi-ble that variants of Pulchra and

Remo that are based on6Frenet framing, could bring about even higher precisionthan any of the methods we have analysed. Thus, ourresults should help the future development of increas-ingly precise reconstruction algorithms, for a wide spec-trum of reﬁnement and structure determination pur-poses. It should also be of interest to extend the

Sta-tistical Method for the analysis of all other heavy atomsalong a protein structure, possibly following [27].Finally, we note that the visual analysis methodologythat we have developed is very versatile. It can be ap-plied to analyze protein structure and dynamics, widely.

V. ACKNOWLEDGEMENTS

We thank N. Ilieva for communications and discus-sions, AJN also thanks A. Liwo for discussions. Thework of AJN has been supported by the Qian Renprogram at BIT, by a grant 2013-05288 and 2018-04411from Vetenskapsr˚adet, by Carl Trygger Stiftelseand and by Henrik Granholms stiftelse. [1] T.A. Jones, J.Y. Zou, S.W. Cowan, M. Kjeldgaard, ActaCryst.

A47

110 (1991)[2] I. Sillitoe, A.L. Cu ﬀ , B.H. Dessailly, N.L. Dawson, N.Furnham. D. Lee, J.G. Lees, T.E. Lewis, R.A. Studer, R.Rentzsch, C. Yeats, J.M. Thornton, C.A. Orengo, NucleicAcids Res. , D490 (2013)[3] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, J.Mol. Biol.

536 (1995)[4] A. Roy, A. Kucukural, Y. Zhang, Nature Protocols

145 (2009)[7] K. Dill, S.B. Ozkan, T.R. Weikl, J.D. Chodera, V.A. Voelz,Curr. Op. Struct. Biol.

342 (2007)[8] H.A. Scheraga, M. Khalili, A. Liwo, Ann. Rev. Phys.Chem.

57 (2007)[9] S. Kmiecik, D. Gront, M. Kolinski, L. Wieteska, A.E.Dawid, A. Kolinski, Chem. Rev. , 7898 (2016)[10] L. Holm, C. Sander, Journ. Mol. Biol.

183 (1991)[11] M.A. DePristo, P.I.W. de Bakker, R.P. Shetty, T.L. Blundell,Prot. Sci.

437 (2003)[13] P. Rotkiewicz, J. Skolnick, Journ. Comp. Chem.

665 (2009)[15] G.G. Krivov, M.V Shapovalov, R.L. Dunbrack Jr., Proteins

964 (2007) [17] H. Frauenfelder, G. Chen, J. Berendzen, P.W. Fenimore,H. Jansson, B.H. McMahon, I.R. Stroe, J. Swenson, R.D.Young, PNAS

163 (2011)[19] S. Khodadadi, A.P. Sokolov, Soft Matter ﬀ -Larsen, S. Piana, R. O. Dror, D.E. Shaw. Sci-ence ﬀ -Larsen, D.E. Shaw, Biophys. J. L47(2011)[22] J. Janin, S. Wodak, M. Levitt, B. Maigret, J. Mol. Biol.

357 (1978)[23] S.C. Lovell, J. Word, J.S. Richardson, D.C. Richardson,Proteins

389 (2000)[24] H. Schrauber, F. Eisenhaber, P. Argos J. Mol. Biol.

27 (2014)[28] M. Sasai, P.G. Wolynes, Phys. Rev. Lett. E83139