Conformational Biases of α-Synuclein and Formation of Transient Knots

Abstract

We study local conformational biases in the dynamics of α-synuclein by using all-atom simulations with explicit and implicit solvents. The biases are related to the frequency of the specific contact formation. In both approaches, the protein is intrinsically disordered, and its strongest bias is to make bend and turn local structures. The explicit-solvent conformations can be substantially more extended, which allows for formation of transient trefoil knots, both deep and shallow, that may last for up to 5 µs. The two-chain self-association events, both short- and long-lived, are dominated by formation of contacts in the central part of the sequence. This part tends to form helices when bound to a micelle.

Full PDF

arXiv: [q-bio.BM], Feb

Conformational Biases of α-Synuclein and Formation of Transient Knots

Mateusz Chwastyk* and Marek Cieplak

Institute of Physics, Polish Academy of Sciences, Al. Lotników 32/46, PL-02668 Warsaw, Poland

E-mail: [email protected]

Abstract

We study local conformational biases in the dynamics of α-synuclein by using all-atom simulations with explicit and implicit solvents. The biases are related to the frequency of the specific contact formation. In both approaches, the protein is intrinsically disordered and its strongest bias is to make bend, and then turn, local structures. The explicit-solvent conformations can be substantially more extended, which allows for formation of transient trefoil knots, both deep and shallow, that may last for up to 5 µs. The two-chain self-association events, both short- and long-lived, are dominated by formation of contacts in the central part of the sequence. This part tends to form helices when bound to a micelle.

* Corresponding author

Introduction

α-Synuclein is a 140-residue protein that is found in the mammalian brain as both a soluble and membrane-associated molecule. It is highly expressed in the mitochondria and is active in the presynaptic region, where it is involved in the regulation of synaptic vesicle release, vesicle trafficking, and fatty-acid binding. It is also a chaperone protein that affects the degradation and assembly of the presynaptic SNARE complex. α-Synuclein has a propensity to form self-propagating fibrils that are present in the brains of most Parkinson's disease patients. It is unclear whether toxicity arises as a consequence of the fibers themselves or of the presence of oligomeric intermediates, which appear en route to fiber formation.

The structural properties of the monomeric α-synuclein are thus of great interest and also a matter of uncertainty. When bound to a micelle, the protein is partially structured (PDB code 1XQ8), as evidenced by NMR studies: the segments 2-37 (denoted here as h_1) and 45-92 (h_2) are α-helical. They form a hairpin, with a loop linking the helices. The N-terminal region is charged as it is lysine-rich. This feature enables interactions with a membrane and makes the region disordered. However, in the absence of binding to the micelle, both tryptophan fluorescence and NMR studies indicate the existence of intrinsic disorder throughout the monomeric sequence.

α-Synuclein has also been studied by molecular dynamics simulations. Several of them, done within all-atom explicit-solvent schemes, lead to the conclusion that α-synuclein is indeed disordered, since the simulations reveal rapid temporal changes in the local conformational biases. The biases can be determined through the DSSP scheme. In particular, the occurrence of β structures in single chains is low, which does not favor dimeric aggregation.

Here, we study the free protein by all-atom simulations of two kinds: with explicit and with implicit solvent. We determine the time-averaged probabilities for a residue i to belong to a particular secondary structure and demonstrate that both approaches indicate the presence of disorder along the whole sequence.
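As a rough illustration of how such time-averaged probabilities can be obtained, the sketch below counts, for each residue, the fraction of frames in which a given DSSP code is assigned. It assumes that the per-frame assignments are already available as strings of one-letter DSSP codes (one character per residue); the function name and the toy input are hypothetical and are not taken from the paper.

```python
from collections import Counter

# One-letter DSSP codes: helix, bridge, strand, 3-10 helix, pi helix, turn, bend, coil
DSSP_CODES = "HBEGITSC"

def structure_probabilities(frames):
    """Return {code: [P(i) for each residue i]} from per-frame DSSP strings."""
    if not frames:
        raise ValueError("need at least one frame")
    n_res = len(frames[0])
    counts = [Counter() for _ in range(n_res)]
    for frame in frames:
        if len(frame) != n_res:
            raise ValueError("all frames must have the same length")
        for i, code in enumerate(frame):
            # DSSP leaves a blank where no structure is detected; count it as coil (C)
            counts[i][code if code in DSSP_CODES else "C"] += 1
    n_frames = len(frames)
    return {c: [counts[i][c] / n_frames for i in range(n_res)] for c in DSSP_CODES}

# Toy input (hypothetical): 4 frames of a 5-residue chain
toy = ["CSSTC", "CSTTC", "C STC", "CHHHC"]
P = structure_probabilities(toy)
print(P["S"])  # per-residue bend probabilities
```

Averaging each list in the returned dictionary over the residues gives a single number per structure type, which is the kind of sequence-averaged quantity compared between solvent models later in the text.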
The secondary-structure probabilities are found to be the largest for the bend and turn local conformations, whereas the helical ones rank third. The sequential pattern of the probabilities is found to depend on the nature of the solvent. More importantly, we demonstrate that the interactions with the molecules of water introduce a qualitative difference to the physics of α-synuclein: the explicit-solvent conformations tend to be substantially more extended. Such conformations allow for much more frequent global opening and closing events, resulting in a higher probability of forming knotted backbones. In the implicit-solvent case, we observe no knots. In contrast, in one trajectory calculated with the explicit solvent we observe a deeply knotted trefoil topology that lasts for about 5 µs. In another trajectory, we observe a series of shallow knots that emerge and disappear within a similar time span.

While the presence of knots in many structured proteins is well established by now, this is not so in the disordered case. One difficulty is that disordered conformations are not readily available and have to be generated through computer simulations. Another is that the knotted structures are only transient: in our simulations, they last for of order 5 µs. However, as we discuss at the end of the paper, even then they may impede proteasomal degradation and thus enhance accumulation of the proteins. We expect that other intrinsically disordered proteins may also tangle into knotted forms, but the corresponding characteristic time scales for their stability should be protein dependent.

The propensity to form transient local structures is related to the temporary establishment of inter-residue interactions that can be described, in the coarse-grained sense, as the establishment of inter-residue contacts.
Our definition of a contact is based on the existence of overlaps between effective spheres associated with the heavy atoms belonging to separate residues (see the Methods section). We find that the most likely contacts are those that involve the N-terminus (e.g. by linking to the h_2 region) and those that occur within the h_1 region.

We also study the self-association of two α-synuclein chains and find that the most frequent contacts are those that connect h_2 in one chain with h_2 in another chain. The next in importance are those that connect h_1 with h_2. These findings are consistent with the expectation that the capacity for association is related to the central region 65-90. The absence of this region has been shown experimentally to impede oligomerization and fibrillogenesis. Interestingly, we also find that the dimeric inter-chain contacts are distinct from those that are likely to arise during the single-chain dynamics. The involvement of the h_1 and h_2 regions, which are helical when bound to the micelle, seems to echo the results of studies of tetrameric aggregation, which appear to lead to the emergence of a ring of helices (especially when the partially structured PDB:1XQ8 conformation is used to generate the starting states). Such ring states resist any further aggregation and may serve to store excess α-synuclein in the cell.

Methods

Our implicit-solvent all-atom simulations were conducted using the NAMD code, version CVS-2013-11-07 for Linux-x86_64-MPI, with the CHARMM36m force field, which is refined to be applicable to intrinsically disordered proteins. The snapshots of the conformations were plotted using the VMD package. The simulations were performed with the Generalized Born Implicit Solvent method. In the single-chain simulations, we used free boundary conditions. The cut-off radius for non-bonded interactions was set to 1.4 nm. We generated 5 trajectories, each lasting 30 ns. The first part of the simulations involved energy minimization of the system for 5 000 conjugate-gradient steps. In the second part, the system was heated up to T = 298 K. The temperature was controlled by the standard Langevin algorithm. The time step was 1 fs. The conformations of the protein were captured every 2 ps. The 5 starting conformations were obtained by random selection from 5 different points of the ANTON-derived trajectory.

When studying the association processes of multiple chains, we switch to periodic boundary conditions and use the smooth Particle Mesh Ewald procedure to treat the long-range electrostatics. The procedure is a variant of the Particle Mesh Ewald method as implemented in NAMD. The grid spacing is 0.16 nm. In the simulation of two chains, we used a cubic simulation box with an edge length of 60.0 nm. We prepared 50 starting systems in the following way. We first chose all possible pairs from the set of the 5 initial structures that were simulated as single chains. This yields 10 different pairs. For each of them, we prepared 5 different conformations by changing the mutual orientations of the chains using the PyMOL software. Each starting system was simulated for 30 ns, which yielded 15 000 frames. Altogether, we have considered 750 000 conformations.

Independent of the number of chains, the conformations can be characterized by their instantaneous contact maps.
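A contact map of this kind rests on an overlap test between enlarged van der Waals spheres of heavy atoms, with an enlargement factor of 1.24. The sketch below is only an illustration of that idea, not the authors' code: the radii are placeholder values (the paper takes its radii from a cited reference), the minimum sequence separation `min_sep` is an assumption introduced here to skip bonded neighbors, and all function names are hypothetical. The helper `lj_inflection_over_sigma` evaluates the inflection point of the Lennard-Jones potential 4ε[(σ/r)^12 - (σ/r)^6], which lies at r = (26/7)^(1/6) σ ≈ 1.24 σ.

```python
import math

ENLARGE = 1.24  # enlargement factor quoted in the text
# Placeholder van der Waals radii in angstroms (assumed values, for illustration only)
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def lj_inflection_over_sigma():
    """Inflection point of the Lennard-Jones potential in units of sigma."""
    return (26.0 / 7.0) ** (1.0 / 6.0)

def residues_overlap(res_a, res_b):
    """Each residue is a list of (element, (x, y, z)) entries for its heavy atoms."""
    for el_a, pos_a in res_a:
        for el_b, pos_b in res_b:
            cutoff = ENLARGE * (VDW_RADIUS[el_a] + VDW_RADIUS[el_b])
            if math.dist(pos_a, pos_b) < cutoff:
                return True  # one overlapping pair of spheres is enough
    return False

def contact_map(residues, min_sep=3):
    """Set of residue pairs (i, j), j - i >= min_sep, with overlapping spheres."""
    n = len(residues)
    return {(i, j)
            for i in range(n)
            for j in range(i + min_sep, n)
            if residues_overlap(residues[i], residues[j])}

# Toy chain (hypothetical): one carbon atom per residue, coordinates in angstroms
chain = [
    [("C", (0.0, 0.0, 0.0))],
    [("C", (3.8, 0.0, 0.0))],
    [("C", (7.6, 0.0, 0.0))],
    [("C", (0.0, 4.0, 0.0))],  # comes back close to residue 0
]
print(contact_map(chain))
```

For inter-chain contacts the same overlap test can be applied to pairs of residues drawn from different chains, in which case no sequence-separation filter is needed.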
The contacts may link residues within chains and between chains. We determine their presence by checking if there are atomic overlaps between residues. Specifically, each heavy (non-hydrogen) atom is represented by a sphere of radius equal to the atom's van der Waals radius enlarged by a factor of 1.24 to account for attractive interactions (the factor corresponds to the inflection point of the Lennard-Jones potential). The values of the van der Waals radii are taken from ref. A contact between residues i and j is said to exist if at least one sphere belonging to i overlaps with a sphere belonging to j. In the terminology of ref. (Wołek et al.), this procedure is denoted as OV. We have successfully used such contact maps in the studies of protein stretching.

Results & Discussion

We have obtained the implicit-solvent results through five 30 ns NAMD-based simulations, as described in the Methods section. The explicit-solvent trajectories were derived by Robustelli et al. when developing a set of novel molecular force fields that would be adequate for both structured and disordered proteins. α-Synuclein was one of several proteins considered in the tests of that work. These novel force fields were designed to be used on the special-purpose supercomputer ANTON. For each of the considered force fields, α-synuclein comes out as a fully disordered protein. In particular, the local α-helical content, though non-zero in certain parts of the sequence (e.g. around residue 20), has been found never to exceed 45% for any of the force fields used. Here, we analyze two trajectories, one 30 µs and the other 20 µs long, that were obtained using the C22* force field and the TIP4P-D water model.

An earlier extensive explicit-solvent study by Sethi et al. involved GROMACS, the OPLS-AA force field, and a division of the protein into weakly interacting modules.
Our results on the propensities to form helices and β-bridges are consistent with this study (when the starting conformations are random). However, the novelty of our research in this context, also compared to that of ref., is the demonstration that the dominant structural bias is toward the T and S secondary structures and that the structural biases can be explained in terms of contact maps.

The transient secondary structures

Here, we consider the probabilities, P_λ(i), of forming the secondary structure λ at the sequential residue i. In addition to the turn (T) and bend (S), the secondary structures detected are helix (H), β-bridge (B), and 3_10-helix (G). We find no appreciable probabilities for strands (E) and π-helices (I). When there is no detectable structure at a residue, we refer to it as a coil state (C).

The time-averaged data shown in the left panels of Figure 1 are based on the explicit-solvent 30 µs simulation performed on the special-purpose ANTON supercomputer at D. E. Shaw Research in New York. The P_H(i) function for the C22*/TIP4P-D force field never exceeds 20%. Neither does P_B(i). P_G(i) does not exceed 10%. The dominant tendency is to form the local S and T structures, sometimes in the same segment. The system is clearly disordered because the various structural propensities switch in time and compete at essentially all residues. At about 8% of the residues, P_C(i) is under 50% (the level indicated by the dotted horizontal line), but at all remaining residues the lack of any detectable order is the dominant behavior.

The right panels of Figure 1 show similar data for the all-atom 5 × 30 ns implicit-solvent simulation. They are qualitatively similar in that the strongest structural propensities are for the S and T local arrangements. However, the overall propensities for making secondary structures are stronger than those in the left panels. In particular, at most residues, the propensity to show no structure (C) is smaller than 50%. In most cases, the starting conformations were chosen randomly from the ANTON-derived trajectory. In this way, the two kinds of simulations are expected to be in similar regions of the phase space. Thus we conclude that the larger structural propensities are due to the lack of explicit water molecules. In order to quantify the differences between the explicit- and implicit-solvent results, we average P_λ(i) over the residues of the sequence. For the explicit-solvent case, we obtain 0.026 (H), 0.096 (T), 0.179 (S), 0.028 (B), 0.022 (G), 0.000 (I), 0.033 (E), and 0.614 (C), where the letters in brackets indicate the type of the secondary structure. For the implicit-solvent case, we obtain 0.062 (H), 0.161 (T), 0.243 (S), 0.025 (B), 0.029 (G), 0.000 (I), 0.025 (E), and 0.455 (C). The biggest shift is seen to be for the C-content. Independent of the model, the largest ordering tendency is for S.

The top-right panel of Figure 1 also shows results for P_H(i) for 5 trajectories that start from the PDB:1XQ8 conformation. They are seen to boost P_H(i) substantially, indicating a strong sensitivity to the initial conditions within the simulation times considered.

Even though the implicit-solvent simulations yield results that are qualitatively similar to those of the explicit-solvent ones, other than the general shift toward more ordering, there is a sequential displacement in the patterns.
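The sequence-averaged values quoted above can be compared directly. The short sketch below uses only the numbers given in the text; the variable names are arbitrary. It computes the implicit-minus-explicit shift for every structure type and picks out the largest one.

```python
# Residue-averaged probabilities quoted in the text
explicit = {"H": 0.026, "T": 0.096, "S": 0.179, "B": 0.028,
            "G": 0.022, "I": 0.000, "E": 0.033, "C": 0.614}
implicit = {"H": 0.062, "T": 0.161, "S": 0.243, "B": 0.025,
            "G": 0.029, "I": 0.000, "E": 0.025, "C": 0.455}

# Shift of each probability when going from explicit to implicit solvent
shift = {k: implicit[k] - explicit[k] for k in explicit}
largest = max(shift, key=lambda k: abs(shift[k]))

for k in explicit:
    print(f"{k}: explicit {explicit[k]:.3f}  implicit {implicit[k]:.3f}  shift {shift[k]:+.3f}")
print("largest shift:", largest)
```

The coil content drops by about 0.16 in the implicit solvent, and this loss is taken up mostly by T and S (about +0.065 each) and by H (about +0.036).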
When one plots the ratio of P_λ(i) obtained with the implicit solvent to P_λ(i) obtained with the explicit solvent then, for any λ, there is a significant patterning as a function of i (not shown).

Panels A and B of Figure 2 show the 20 most likely contacts that form during the single-chain evolution. The panels are for the explicit- and implicit-solvent cases, respectively. The two patterns look qualitatively similar, but they differ in their details. Four contacts appear in both panels: 5-8, 19-22, 102-140, and 1-140. Furthermore, contact 121-139 in panel A is almost identical to contact 121-140 in panel B. However, the 15 other contacts are distinct. Despite the differences, all of these top contacts are seen to be related to the attempts to construct the first helix and to connect the N-terminal part with the central region. The increased number of strong contacts in the N-terminal segment extending up to residue 36 explains the larger tendency to form helices in the implicit-solvent case. The bottom-left panel of Figure 2 shows the probability density corresponding to all contacts that have been detected in the single-chain simulations. The contact maps corresponding to the implicit- and explicit-solvent simulations are seen to be distinct.

The geometrical characteristics of conformations

We characterize the conformations by their location on the R_g-L plane, where R_g denotes the radius of gyration and L the end-to-end distance. The regions that are the most frequently visited are expected to correspond to the lowest free energy. The left panels of Figure 3 show such plots for the implicit- and explicit-solvent systems considered (top and bottom, respectively). The landscapes are seen to have substantial regions that overlap, but the highest-frequency regions (marked in red) are shifted significantly between the two descriptions. This is seen especially in the values of R_g: the explicit-solvent conformations are much less globular.
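For reference, the two coordinates of the R_g-L plane can be computed from a single conformation as sketched below. This is a minimal illustration that treats all points with equal weight (for instance one Cα position per residue, which is an assumption made here and not a statement about how the paper evaluates R_g); the function names are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their geometric center."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)

def end_to_end_distance(coords):
    """Distance between the first and the last point of the chain."""
    return math.dist(coords[0], coords[-1])

# Toy conformation (hypothetical): a straight rod of 5 points with unit spacing
rod = [(float(i), 0.0, 0.0) for i in range(5)]
print(radius_of_gyration(rod), end_to_end_distance(rod))
```

Evaluating both functions frame by frame and histogramming the resulting pairs gives a visitation map of the R_g-L plane of the kind described in the text.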
The shift toward larger R_g is also reflected in the distributions of R_g and L shown in the right panels of Figure 3. The distribution of R_g is double-peaked for both systems, but the maxima of the peaks are shifted upward in the explicit-solvent case. The distribution of L is single-peaked, but the peak is also shifted upward.

The top-right panel of Figure 3 allows for making comparisons with experiments. A recent SAXS study of α-synuclein obtained from human red blood cells yields an average R_g between 33.1 and 33.3 Å, depending on the buffer used (pH = 7.4). This is consistent with our explicit-solvent value of 36.2 Å, as compared to 25.8 Å for the implicit solvent, indicating that the explicit-solvent approach is closer to reality. It should be noted that the same SAXS study yields results that depend strongly on the buffer (between 27.2 and 42.7 Å) if one uses a recombinant α-synuclein. The authors attribute this fact to the "harsh" treatment used in the preparation of such proteins.

Dynamics of the knot formation

We test the presence of knots by using the KMT algorithm.
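The KMT approach is understood here as the chain-reduction scheme associated with Koniaris and Muthukumar and with Taylor: a point of the backbone is removed whenever the triangle it forms with its two neighbors is not pierced by any other segment of the chain, and an open chain that shrinks to its two termini is unknotted. The sketch below is a bare-bones illustration under that reading, not the implementation used in the paper. The function names, the segment-triangle test (Moller-Trumbore), and the test curve with long outward tails are choices made for this example.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """Moller-Trumbore test: does segment p-q meet triangle a-b-c?"""
    d, e1, e2 = sub(q, p), sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle plane, or flat triangle
        return False
    s = sub(p, a)
    u = dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    w = cross(s, e1)
    v = dot(d, w) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, w) / det  # position of the hit along the segment
    return 0.0 <= t <= 1.0

def kmt_reduce(points):
    """Remove points whose neighbor triangle is not pierced; termini stay fixed."""
    pts = [tuple(p) for p in points]
    changed = True
    while changed and len(pts) > 2:
        changed = False
        i = 1
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            # skip the four segments that share a vertex with the triangle
            blocked = any(
                segment_hits_triangle(pts[j], pts[j + 1], a, b, c)
                for j in range(len(pts) - 1)
                if j + 1 < i - 1 or j > i + 1
            )
            if blocked:
                i += 1
            else:
                del pts[i]
                changed = True
    return pts

def open_trefoil(n=60, gap=0.1, tail=10.0):
    """Trefoil curve cut at its outermost point, with both ends pulled far outward."""
    ts = [math.pi + gap + k * (2 * math.pi - 2 * gap) / (n - 1) for k in range(n)]
    knot = [(math.sin(t) + 2 * math.sin(2 * t),
             math.cos(t) - 2 * math.cos(2 * t),
             -math.sin(3 * t)) for t in ts]
    far = lambda p: tuple(tail * x for x in p)
    return [far(knot[0])] + knot + [far(knot[-1])]

zigzag = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(len(kmt_reduce(zigzag)))          # unknotted chain: only the termini remain
print(len(kmt_reduce(open_trefoil())))  # knotted chain: more than two points remain
```

Identifying the type of a surviving knot (trefoil or otherwise) requires an additional step, such as evaluating a knot invariant on the reduced chain, which is not attempted here.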

We have not spotted any knots in the implicit-solvent trajectories. However, the much more expansive (on the R_g-L plane) explicit-solvent trajectory is found to have a 3 µs long time segment in which a shallow trefoil knot keeps forming and disappearing, very much like what happens with shallowly knotted proteins at the air-water interface. This can be illustrated by showing the sequential locations of the knot ends n_1 and n_2. The knot ends are determined by truncating the sequence from both termini until coming to a stage at which the knot dissolves. Figure 4 shows that the locations of n_1 and n_2 fluctuate, but their distances from the closest termini (N for n_1 and C for n_2) never exceed 10 residues and sometimes can be as short as 2 residues. It should be noted that for most of the discussed segment of time, L varies little and stays in the upper range of the values shown in the left panels of Figure 3.

Figure 5 shows an example of knot formation by direct threading (the panel corresponding to 3492 ns) or slipknotting (5523-5526 ns) and unknotting by the slipknot mechanism. A slipknot conformation is one in which a bent segment of the backbone pierces through a backbone (knot) loop. It transforms into a knot on straightening the segment.

We have also considered a second available trajectory that is 20 µs long. Figure 6 shows that for nearly a quarter of the duration of this trajectory there is a stable deeply knotted topology.
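The truncation procedure for locating the knot ends can be sketched as follows. The example assumes that some knot detector is available as a predicate `is_knotted(first, last)` acting on the sub-chain between two residues (1-based, inclusive); that predicate, the order in which the two termini are trimmed, and the toy detector used below are all assumptions made for illustration.

```python
def knot_ends(n_res, is_knotted):
    """Return (n1, n2), the tightest sub-chain that is still knotted, or None."""
    if not is_knotted(1, n_res):
        return None
    n1 = 1
    # trim residues from the N-terminus while the remaining chain stays knotted
    while n1 + 1 < n_res and is_knotted(n1 + 1, n_res):
        n1 += 1
    n2 = n_res
    # then trim from the C-terminus in the same way
    while n2 - 1 > n1 and is_knotted(n1, n2 - 1):
        n2 -= 1
    return n1, n2

# Toy detector (hypothetical): the knot survives only if residues 8..131 are kept
toy_detector = lambda first, last: first <= 8 and last >= 131

ends = knot_ends(140, toy_detector)
print(ends)
```

With the toy detector the knot ends come out at residues 8 and 131, that is, 7 and 9 residues away from the N and C termini of a 140-residue chain, which would count as a shallow knot in the sense used in the text.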
The knot ends of these deeply knotted conformations keep sliding to an extent, as shown and explained in Figure 7.

It should be pointed out that the knot formation in both trajectories is kinetically driven and that the knots, though fairly long-lasting, are transient.

In an effort to assess the consistency of the knotted conformations from the a99SB-disp and c22*/TIP4P-D explicit-solvent trajectories with previous experiments, we back-calculated several NMR observables from the knotted conformational ensembles and compared them to a suite of previously reported NMR measurements in Table 1. These measurements include chemical shifts, residual dipolar couplings (RDCs), and backbone scalar coupling constants, which report on local backbone conformations, as well as paramagnetic relaxation enhancements (PREs), which report on transient tertiary contacts. We found that the knotted portions of the c22*/TIP4P-D and a99SB-disp trajectories show slightly worse agreement with experimental measurements than their parent trajectories, but are in substantially better agreement than simulations run with similar force fields and the TIP3P and TIP3P-CHARMM water models. This comparison suggests that a sub-population of knotted conformations similar to those observed in these simulations would not be grossly inconsistent with previous experimental measurements, though the populations may be smaller than those observed in these trajectories, particularly in the c22*/TIP4P-D trajectory.

Self-association in the implicit solvent approach

We have considered self-association of two α-synuclein chains under fairly dilute conditions, as described in the Methods section. A dimer is considered to be formed if there is at least one contact that connects the chains. Panel C of Figure 2 shows that the residue-residue contacts that are the most engaged in coupling two chains together, as assessed throughout the evolution, are not those which drive the formation of transient secondary structures in the individual chains. However, they are consistent with the experimental findings discussed in the Introduction. Figure 8 shows an example of association in which 7 of the 10 most frequent contacts are present. They are indicated by the red lines and they link the central parts of the chains. The bottom-right panel of Figure 2 shows the full contact map; the overall pattern is seen to be distinct from the one obtained for the single-chain implicit-solvent calculation.

The most frequent connecting contacts are shown in panel C of Figure 2. They are seen to be mostly within the h_2 regions of the two chains and then in the parts connecting h_1 with h_2. There are also important contacts between h and residue 111.

Figure 9 shows the distribution of the duration times of the dimers. It is seen that most of the association events are short-lived: their life span does not exceed 0.2 ns. It seems that a power-law decay of the distribution describes the behavior better than an exponential law (see the caption of Figure 9). However, there are also events that last for tens of ns. These correspond to the data points shown in the inset of the figure.
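Duration times of this kind can be extracted from a per-frame record of whether the two chains share at least one contact. The sketch below is an illustration only: the function name and the toy record are hypothetical, and the frame spacing of 0.002 ns corresponds to the 2 ps saving interval quoted in the Methods section.

```python
def association_lifetimes(bound, dt_ns=0.002):
    """Durations (in ns) of uninterrupted runs of True in a per-frame flag list."""
    lifetimes, run = [], 0
    for flag in bound:
        if flag:
            run += 1          # the dimer persists for one more frame
        elif run:
            lifetimes.append(run * dt_ns)
            run = 0
    if run:                   # an event still in progress at the end of the record
        lifetimes.append(run * dt_ns)
    return lifetimes

# Toy record (hypothetical): True means at least one inter-chain contact in that frame
toy_bound = [False, True, True, False, True, False, False, True, True, True]
events = association_lifetimes(toy_bound)
print(events)
```

A histogram of such durations, collected over all trajectories and plotted on logarithmic axes, is one simple way to judge whether the decay looks closer to a power law than to an exponential.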
The most frequent contacts in these long-lived events differ from event to event, indicating the existence of many pathways to association.

It is interesting to point out that one of the most frequent association contacts is 4-67. Thus, if one of the chains is in the knotted state, then association involving residue 4 is expected to extend the duration of this state by making the knot deeper, similar to the effects of the procedures described in ref.

Conclusions

When analyzing the structural propensities, we have found, not surprisingly, that the explicit solvent increases the disorder substantially, which shows as an enhancement in the C-content. The types of the transient secondary structures that are detected are the same, indicating that for most computational purposes the implicit-solvent approach is sufficient.

The nature of the solvent, however, may be important when assessing the topological features. This appears to be the case for α-synuclein. This protein supports formation of knots in the explicit-solvent case but not in the implicit-solvent one, or, at least, there is a reduction in the probability of making a knot. In any event, our results suggest that intrinsically disordered proteins can generally support transient knots. It has already been demonstrated that sufficiently long polyglutamine chains, which are also disordered, can support both deeply and shallowly knotted conformations (in an implicit-solvent calculation; see also a coarse-grained study). The presence of the knots in these chains has been suggested to be responsible for the monomeric-level toxicity leading to Huntington's disease. The toxicity results from the fact that the knots may hinder or even jam degradation by the proteasomes (see a related discussion in ref.) and thus enhance the concentration of the chains in the cytosol. In the case of α-synuclein, the knots should enhance accumulation of the proteins which, in turn, leads to an enhancement of multimeric aggregation. It remains to be elucidated, however, whether the knotted states of α-synuclein may lead to other physiologically relevant effects.

We have already mentioned that the SAXS experiments at physiological pH yield an average R_g of the human-blood-derived α-synuclein that is close to the value obtained in our explicit-solvent calculations.
While about 20% of the duration of the explicit-solvent trajectories shows the presence of knots, the close agreement is not yet a proof of their existence. It would be interesting to perform single-molecule stretching experiments, similar to those done for structured proteins, to determine the fully stretched lengths. We expect the distributions of such data to be two-peaked, with the shorter-L peak corresponding to the knotted conformations.

Acknowledgements

We thank P. Robustelli for his inspiration to study knotting in α-synuclein, for providing us with the all-atom explicit-solvent trajectories, and for making Table 1. We appreciate very useful discussions with E. Gołaś and B. Różycki. MC has received funding from the National Science Centre (NCN), Poland, under grant No. 2018/31/B/NZ1/00047. This project is a part of the European COST Action EUTOPIA.

References

(1) Lashuel, H. A.; Overk, C. R.; Oueslati, A.; Masliah, E. The many faces of alpha-synuclein: from structure and toxicity to therapeutic target. Nat. Rev. Neurosci., 38-48.
(2) Pieri, L.; Madiona, K.; Bousset, L.; Melki, R. Fibrillar α-synuclein and huntingtin exon 1 assemblies are toxic to the cells. Biophys. J., 2894-2905.
(3) Stefanis, L. α-Synuclein in Parkinson's disease. Cold Spring Harb. Perspect. Med., a009399.
(4) Ulmer, T. S.; Bax, A.; Cole, N. B.; Nussbaum, R. L. Structure and dynamics of micelle-bound human α-synuclein. J. Biol. Chem., 9595-9603.
(5) van Rooijen, B. D.; van Leijenhorst-Groener, K. A.; Claessens, M. M.; Subramaniam, V. Tryptophan fluorescence reveals structural features of alpha-synuclein oligomers. J. Mol. Biol., 826-33.
(6) Mantsyzov, A. B.; Maltsev, A. S.; Ying, J.; Shen, Y.; Hummer, G.; Bax, A. A maximum entropy approach to the study of residue-specific backbone angle distributions in α-synuclein, an intrinsically disordered protein. Protein Sci., 1275-1290.
(7) Ball, K. A.; Phillips, A. H.; Nerenberg, P. S.; Fawzi, N. L.; Wemmer, D. E.; Head-Gordon, T. Homogeneous and heterogeneous tertiary structure ensembles of amyloid-β peptides. Biochemistry, 7612-7628.
(8) Cote, Y.; Delarue, P.; Scheraga, H. A.; Senet, P.; Maisuradze, G. G. From a highly disordered to a metastable state: uncovering insights of α-synuclein. ACS Chem. Neurosci., 1051-1065.
(9) Ilie, I. M.; Nayar, D.; den Otter, W. K.; van der Vegt, N. F. A.; Briels, W. J. Intrinsic conformational preferences and interactions in α-synuclein fibrils: insights from molecular dynamics simulations. J. Chem. Theory Comput., 3298-3310.
(10) Kabsch, W.; Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers, 2557-2637.
(11) Faisca, P. F. N. Knotted proteins: A tangled tale of structural biology. Comput. Struct. Biotechnol. J., 459-468.
(12) Virnau, P.; Mallam, A.; Jackson, S. Structures and folding pathways of topologically knotted proteins. J. Phys.: Condens. Matter, 033101.
(13) Sulkowska, J. I.; Rawdon, E. J.; Millett, K. C.; Onuchic, J. N.; Stasiak, A. Conservation of complex knotting and slipknotting patterns in proteins. Proc. Natl. Acad. Sci. USA, E1715-23.
(14) Jamroz, M.; Niemyska, W.; Rawdon, E. J.; Stasiak, A.; Millett, K. C.; Sulkowski, P.; Sulkowska, J. I. KnotProt: a database of proteins with knots and slipknots. Nucl. Acids Res., D306-14.
(15) Xu, L.; Bhattacharya, S.; Thompson, D. Re-designing the α-synuclein tetramer. Chem. Commun., 8080-8083.
(16) Xu, L.; Bhattacharya, S.; Thompson, D. On the ubiquity of helical α-synuclein tetramers. Phys. Chem. Chem. Phys., 12036.
(17) Gurry, T.; Ullman, O.; Fisher, C. K.; Perovic, I.; Pochapsky, T.; Stultz, C. M. The dynamic structure of α-synuclein multimers. J. Am. Chem. Soc., 3865-3872.
(18) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comp. Chem., 1781-1802.
(19) Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmueller, H.; MacKerell Jr., A. D. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods, 71-73.
(20) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graphics, 33-38.
(21) Tanner, D. E.; Chan, K.-Y.; Phillips, J. C.; Schulten, K. Parallel generalized Born implicit solvent calculations with NAMD. J. Chem. Theory Comput., 3635-3642.
(22) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A smooth particle mesh Ewald method. J. Chem. Phys., 8577-8592.
(23) Darden, T.; York, D.; Pedersen, L. G. Particle mesh Ewald: An N log(N) method for Ewald sums in large systems. J. Chem. Phys. …
(24), (25) … J. Mol. Biol., 253-266.
(26) Wołek, K.; Gómez-Sicilia, Á.; Cieplak, M. Determination of contact maps in proteins: a combination of structural and chemical approaches. J. Chem. Phys., 243105.
(27) Sikora, M.; Sułkowska, J. I.; Cieplak, M. Mechanical strength of 17 134 model proteins and cysteine slipknots. PLoS Comp. Biol., e1000547.
(28) Robustelli, P.; Piana, S.; Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA, E4758-E4766.
(29) Sethi, A.; Tian, J.; Vu, D. M.; Gnanakaran, S. Identification of minimally interacting modules in an intrinsically disordered protein. Biophys. J., 748-757.
(30) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput., 435-447.
(31) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B, 6474-6487.
(32) Araki, K.; Yagi, N.; Nakatani, R.; Sekiguchi, H.; So, M.; Yagi, H.; Ohta, N.; Nagai, Y.; Mochizuki, H. A small-angle X-ray scattering study of alpha-synuclein from human red blood cells. Sci. Rep., 30473.
(33) Koniaris, K.; Muthukumar, M. Knottedness in ring polymers. Phys. Rev. Lett., 2211.
(34) Taylor, W. R. A deeply knotted protein structure and how it might fold. Nature, 916-919.
(35) Zhao, Y.; Chwastyk, M.; Cieplak, M. Topological transformations in proteins: effects of heating and proximity of an interface. Sci. Rep., 39851.
(36) Schwalbe, M.; Ozenne, V.; Bibow, S.; Jaremko, M.; Jaremko, L.; Gajda, M.; Jensen, M. R.; Biernat, J.; Becker, S.; Mandelkow, E.; et al. Predictive atomic resolution descriptions of intrinsically disordered hTau40 and α-synuclein in solution from NMR and small angle scattering. Structure, 238-249.
(37) San Martin, A.; Rodriguez-Aliaga, P.; Molina, J. A.; Martin, A.; Bustamante, C.; Baez, M. Knots can impair protein degradation by ATP-dependent proteases. Proc. Natl. Acad. Sci. USA, 9864-9869.
(38) Gomez-Sicilia, A.; Sikora, M.; Cieplak, M.; Carrion-Vazquez, M. An exploration of the universe of polyglutamine structures. PLoS Comp. Biol., e1004541.
(39) Mioduszewski, L.; Cieplak, M. Disordered peptide chains in an α-C-based coarse-grained model. Phys. Chem. Chem. Phys., 19057-19070.
(40) Wojciechowski, M.; Gomez-Sicilia, A.; Carrion-Vazquez, M.; Cieplak, M. Unfolding knots by proteasome-like systems: simulations of the behavior of folded and neurotoxic proteins. Mol. BioSyst., 2700-2712.
(41) Morar, A. S.; Olteanu, A.; Young, G. B.; Pielak, G. J. Solvent-induced collapse of α-synuclein and acid-denatured cytochrome c. Protein Sci., 2195-2199.
(42) Bornschloegl, T.; Anstrom, D. M.; Mey, E.; Dzubiella, J.; Rief, M.; Forest, K. T. Tightening the knot in phytochrome by single-molecule atomic force microscopy. Biophys. J., 1508-1514.

Table 1: Comparison of calculated RMSD from experimental measurements for simulations of α-synuclein. We compare the agreement of 30 µs explicit-solvent trajectories of α-synuclein from Robustelli et al., run with c22*/TIP4P-D, c36m/TIP3P-CHARMM, a99SB-disp, and a99SB*-ILDN/TIP3P, and the knotted portions of the c22*/TIP4P-D and a99SB-disp trajectories with previously reported experimental solution NMR measurements.
R_g penalties were computed using an experimental value of 31.0 ± […]. All classes (Cα, Hα, HN, C', Cβ) of chemical shifts (CS) are reported in ppm; residual dipolar couplings (RDCs) and indirect dipole–dipole couplings (J-couplings) are in Hz; R_g is in Å; paramagnetic relaxation enhancements (PREs) and the scores are unitless. NMR observables and force field (FF) scores were calculated as previously reported, using only the subset of trajectories considered here. The Combined FF Score is defined as

$$\text{Combined FF Score} = \frac{\mathrm{CS}_{\mathrm{Score}} + \mathrm{NMR}_{\mathrm{Score}}}{2} + \mathrm{Rg}_{\mathrm{Penalty}}, \qquad \mathrm{Rg}_{\mathrm{Penalty}} = \frac{\left|\mathrm{Rg}_{\mathrm{Exp}} - \mathrm{Rg}_{\mathrm{Sim}}\right| - \mathrm{Rg}_{\mathrm{Exp\,error}}}{\mathrm{Rg}_{\mathrm{Exp}}},$$

where the CS Score is determined by normalizing the RMSD for each class of chemical shift by the smallest RMSD observed for the seven force fields and taking an average of the normalized RMSDs over all sets of experimental chemical shifts. The NMR Score is computed analogously for all additional classes of NMR measurements, and Rg Exp error is an experimentally estimated error of Rg. We note that the knotted portions of the c22*/TIP4P-D and a99SB-disp trajectories show marginally worse agreement with experimental measurements compared to their parent trajectories, but still show large improvements relative to simulations run with similar force fields and the TIP3P and TIP3P-CHARMM water models.

Cα CS     0.43   0.51   0.61   0.51   0.59   0.88
Hα CS     0.15   0.18   0.20   0.14   0.20   0.31
HN CS     0.90   1.09   1.63   1.46   1.89   3.73
C' CS     0.43   0.44   0.57   0.31   0.44   0.69
Cβ CS     1.06   1.04   1.27   1.04   1.22   1.60
RDC (Q)   0.47   0.59   0.64   0.41   0.52   0.93
Rg        23.34  21.69  18.39  36.76  24.76  15.53
PRE       0.18   0.22   0.32   0.17   0.23   0.39
Backbone J HNHA J CC Score Score
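The normalization and the R_g penalty described in the Table 1 caption can be illustrated with a minimal sketch. The function names are hypothetical, the six columns are used in table order without assigning force-field labels to them, and the normalization here runs over these six columns only (the caption refers to seven force fields). The experimental R_g error of 1.0 Å is a placeholder assumption, since its value is not given in the text.

```python
from statistics import mean

# Chemical-shift RMSDs (ppm) copied from the table rows, six columns in table order.
CS_RMSD = {
    "CA": [0.43, 0.51, 0.61, 0.51, 0.59, 0.88],
    "HA": [0.15, 0.18, 0.20, 0.14, 0.20, 0.31],
    "HN": [0.90, 1.09, 1.63, 1.46, 1.89, 3.73],
    "C":  [0.43, 0.44, 0.57, 0.31, 0.44, 0.69],
    "CB": [1.06, 1.04, 1.27, 1.04, 1.22, 1.60],
}
RG_SIM = [23.34, 21.69, 18.39, 36.76, 24.76, 15.53]  # Angstrom, from the Rg row


def normalized_score(rmsd_by_class):
    """For each column, average the RMSDs after dividing each row by its minimum."""
    rows = list(rmsd_by_class.values())
    n_cols = len(rows[0])
    return [mean(row[k] / min(row) for row in rows) for k in range(n_cols)]


def rg_penalty(rg_exp, rg_sim, rg_exp_error):
    """(|Rg_exp - Rg_sim| - Rg_exp_error) / Rg_exp, as written in the caption."""
    return (abs(rg_exp - rg_sim) - rg_exp_error) / rg_exp


if __name__ == "__main__":
    RG_EXP = 31.0        # Angstrom, quoted in the caption
    RG_EXP_ERROR = 1.0   # ASSUMPTION: placeholder value, not taken from the text
    cs_scores = normalized_score(CS_RMSD)
    for k, (score, rg) in enumerate(zip(cs_scores, RG_SIM), start=1):
        penalty = rg_penalty(RG_EXP, rg, RG_EXP_ERROR)
        print(f"column {k}: CS score {score:.3f}, Rg penalty {penalty:.3f}")
```

A column that has the smallest RMSD in every chemical-shift class would receive a CS score of exactly 1; larger values indicate worse agreement.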

Figure 1: Probability P_λ for the monomer of α-synuclein to adopt the local secondary structure λ at residue i. The structures shown are T and S (the upper panels, green and blue respectively) and H, B, G, and C (the lower panels, black, purple, magenta, and red respectively). The dotted lines (not shown for G and B for clarity of presentation) indicate the size of the error bars. They were obtained by splitting the whole trajectory into two halves. The data points in the left panels were obtained by using the C22*/TIP4P-D force field with the explicit solvent. The panels on the right correspond to the NAMD-derived implicit-solvent simulations. The black broken line shows the helical content when the starting conformation is the PDB:1XQ8 structure.
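A per-residue probability of this kind, together with a split-trajectory error estimate, might be computed as in the sketch below. It assumes each frame is already encoded as a string of one-letter secondary-structure labels (one per residue); the function names are hypothetical, and taking the error as half the absolute difference between the two halves is an assumption, since the caption states only that the trajectory was split into two halves.

```python
def label_probability(frames, label):
    """Fraction of frames in which each residue carries the given label.

    frames: list of equal-length strings, one secondary-structure letter per residue.
    """
    n_res = len(frames[0])
    counts = [0] * n_res
    for frame in frames:
        for i, symbol in enumerate(frame):
            if symbol == label:
                counts[i] += 1
    return [c / len(frames) for c in counts]


def split_half_error(frames, label):
    """ASSUMPTION: error bar = half the absolute difference between the two halves."""
    half = len(frames) // 2
    first = label_probability(frames[:half], label)
    second = label_probability(frames[half:], label)
    return [abs(a - b) / 2 for a, b in zip(first, second)]


if __name__ == "__main__":
    # Toy trajectory: four frames of a two-residue chain.
    toy = ["TS", "TT", "ST", "SS"]
    print(label_probability(toy, "T"))
    print(split_half_error(toy, "T"))
```

With only two blocks the estimate is coarse; it mainly flags residues whose propensities drift between the first and second half of the run.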

Figure 2: The top three panels: The 20 most likely contacts that participate in the α-synuclein dynamics. Panels A, B, and C correspond to the single-chain dynamics with the explicit solvent, the single-chain dynamics with the implicit solvent, and the two-chain dynamics with the implicit solvent, respectively. The horizontal lines indicate locations along the sequence. Contacts link locations in the upper line with locations in the lower line: contacts are either within the same chain (the top part) or with another chain (the bottom part). The higher the ranking, the thicker the line. The occupations are counted in a cumulative fashion throughout the simulation. We show contacts corresponding to locations i and j where i < j. For simplicity, we assume symmetry between i and j. In panel A, the most highly occupied contact is 95-112. The other contacts shown are 102-140 (rank 2), 39-42 (3), 81-140 (4), 12-15 (5), 80-140 (6), 34-37 (7), 45-140 (8), 121-139 (9), 19-22 (10), 94-112 (11), 108-140 (12), 5-8 (13), 128-140 (14), 95-113 (15), 20-23 (16), 80-83 (17), 102-105 (18), 101-136 (19), and 103-106 (20). The total number of contacts identified was 8606. In panel B, the most highly occupied contact is 4-7. The other contacts shown are: 27-30 (rank 2), 102-140 (3), 19-22 (4), 22-25 (5), 76-140 (6), 118-140 (7), 5-8 (8), 128-140 (9), 18-21 (10), 11-14 (11), 96-140 (12), 33-36 (13), 93-140 (14), 53-140 (15), 8-11 (16), 4-8 (17), 69-140 (18), 96-99 (19), and 121-140 (20). The total number of contacts identified was 9454. In panel C, the most highly occupied interchain contact is 72-73. The other contacts shown are: 65-89 (rank 2), 4-67 (3), 51-111 (4), 49-111 (5), 48-111 (6), 64-86 (7), 71-73 (8), 73-73 (9), 66-87 (10), 8-68 (11), 49-110 (12), 78-87 (13), 4-66 (14), 50-111 (15), 65-87 (16), 71-75 (17), 72-74 (18), 71-74 (19), and 66-81 (20). The total number of contacts identified was 8596. The bottom panel: The time-averaged contact maps corresponding to situations A, B, and C. The color code, with the scale on the right, indicates the probability of occurrence of a contact. The white spots correspond to regions in which no contacts were found. The contact maps are symmetric with respect to the diagonal, so only half of each map is shown. The digits indicate the rankings of the top three contacts.

Figure 3: The left panels: The R_g - L cross plot for α-synuclein. The data points have been obtained by summing the values belonging to 1 Å × […] R_g (the top panel) and L (the bottom panel). The histogram in black is for the implicit-solvent data. The histogram in blue, also denoted by A, is for the explicit-solvent case. The bin size is 1 Å. The arrows indicate the corresponding mean values.

Figure 4: A 6-µs fragment of the explicit-solvent trajectory. The upper panel shows the end-to-end distance. The lower panel shows the locations of the knot ends.
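The cumulative contact ranking described in the Figure 2 caption can be sketched as follows. This is only an illustration under stated assumptions: the per-frame contact lists are taken as already computed (the contact criterion itself is not reproduced here), each contact is counted at most once per frame, and the function name `rank_contacts` is hypothetical.

```python
from collections import Counter


def rank_contacts(frames, top_n=20):
    """Rank residue-residue contacts by how many frames they occur in.

    frames: iterable of iterables of (i, j) residue pairs present in each frame.
    Pairs are stored as (min, max), which enforces the i <= j symmetry convention.
    """
    occupancy = Counter()
    for contacts in frames:
        # A set guarantees that a contact listed twice in one frame counts once.
        occupancy.update({(min(i, j), max(i, j)) for i, j in contacts})
    return occupancy.most_common(top_n)


if __name__ == "__main__":
    # Toy data: three frames with a few contacts each.
    toy_frames = [
        [(95, 112), (39, 42)],
        [(112, 95)],
        [(95, 112), (102, 140)],
    ]
    for rank, (pair, count) in enumerate(rank_contacts(toy_frames), start=1):
        print(rank, pair, count)
```

Dividing each count by the number of frames would give the occupation probability of the kind shown in the time-averaged contact maps.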

[Figure 5 panels: 2704 ns, 3492 ns, 5523 ns, 5526 ns, 5532 ns, 5540 ns; the chain termini are labeled N and C.]

Figure 5: Examples of conformations in the fragment of the trajectory shown in Figure 4. The conformation at 2704 ns is the state just before the knotting process begins. The panels for 3492 ns and 5526 ns correspond to situations in which the knot exists. In the remaining panels, there is no knot. In the panel corresponding to 5526 ns, the knotted segment is in green and the purple lines indicate the segments between the knot ends and the nearest termini. In this example, the purple segments are the longest that were found. In the panel corresponding to 3492 ns, the sequence within the knot is longer: it combines the segments in green and blue. The segments in purple are outside of the knot and are shorter than the maximal ones (for 5526 ns). In the remaining (unknotted) panels, the segments in blue are equal in length to the maximal segments in purple shown for 5526 ns. They merely indicate the regions to look at when a knot is about to form. The conformation at 3492 ns is obtained by performing the moves indicated by the red arrows. The last stage here corresponds to direct threading. In the panel for 5523 ns, a slipknot is formed and its further indicated motion leads to knotting (5526 ns). Further unknotting takes place by driving the slipknot out of the knot loop (5532 ns). The conformation at 5540 ns represents the completely unknotted state. The knot was no longer observed during this trajectory after this stage.

Figure 6: A 6-µs fragment of the explicit-solvent trajectory. The upper panel shows the end-to-end distance. The lower panel shows the locations of the knot ends.

Figure 7: A snapshot of a knotted trefoil structure in which the sequential distance between the knot ends is 73, the shortest observed in the second trajectory. The knotted segment is in green and blue and the knot ends are at the boundary of blue and purple. These are residues 52 and 126. In other conformations, the knot ends may move into the core of the knot along the segments shown in blue, but the separation between them will become larger than the minimal value. The motion of the ends along the purple segments eventually leads to the dissolution of the knot.


Figure 8: An example of a two-chain associated state. One chain is shown in shades of blue and the other in shades of green. The interfacial residues are shown as spherical beads, in blue and green correspondingly. Other residues are not shown. There are 40 inter-chain contacts. Seven of these contacts are marked as red lines. They belong to the ten most probable contacts. These are: 65-89 (rank 2), 4-67 (3), 51-111 (4), 49-111 (5), 48-111 (6), 64-86 (7), and 66-87 (10) (see also panel C in Figure 2). The ranking is based on all association events. The event shown is a part of the longest-lasting dimer (see Figure 9). In this particular dimer, the most frequent contact is 4-67 (marked in red), followed by 51-112. In the snapshot shown, 51-112 does form a contact but, unlike 51-111, it is not marked because it does not belong to the list of the top 20 contacts derived from all events, independent of the duration of the corresponding dimer. The figure in the center shows the two full chains. The panel on the right shows an enlargement of the interfacial region.

Figure 9: The distribution of the duration times, t, of dimers (the solid black line). The thinner blue line corresponds to the power law fit ( t /t p ) − / with t pp