First-Passage Time Distributions in Two-State Protein Folding Kinetics: Exploring the Native-Like States vs Overcoming the Free Energy Barrier

Abstract

Using a beta-hairpin protein as a representative example of two-state folders, we studied how the exploration of native-like states affects the folding kinetics. It has been found that the first-passage time (FPT) distributions are essentially single-exponential not only for the times to overcome the free energy barrier that separates unfolded and native-like states but also for the times to find the native state among the native-like ones. If the protein explores native-like states for a time much longer than the time to overcome the free energy barrier, which was found to be characteristic of high temperatures, the resulting FPT distribution to reach the native state remains close to exponential but the mean FPT (MFPT) is determined not by the height of the free energy barrier but by the time to explore native-like states. The mean time to overcome the free energy barrier is found to be in reasonable agreement with the Kramers rate formula and generally far shorter than the MFPT to reach the native state. The time to find the native state among native-like ones increases with temperature, which explains the known U-shape dependence of the MFPTs on temperature.


arXiv [q-bio.BM]

Sergei F. Chekmarev∗,†,‡

† Institute of Thermophysics, SB RAS, 630090 Novosibirsk, Russia
‡ Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia

E-mail: [email protected]

Abstract

Using a coarse-grained protein model, a molecular dynamics (MD) simulation of the folding of one of the benchmark two-state folders, a β-hairpin protein, has been performed. Each MD trajectory was divided into two parts: one to pass from an unfolded protein state to a native-like state by overcoming the free energy barrier that separates these states, and the other to explore the native-like states until the native state is attained. It has been found that the distributions of first-passage times (FPTs) for both segments of the trajectories are essentially single-exponential. If the protein explores the native-like states for a time much longer than the time to overcome the free energy barrier, the resulting FPT distribution may be approximately single-exponential, as is expected for two-state folders, but the mean FPT (MFPT) to reach the native state will be determined by the time to find the native state among the native-like ones. Estimates of the transition times from the unfolded to the native-like states with the Kramers rate formula are in reasonable agreement with the corresponding times obtained in the simulations and may be far shorter than the MFPTs to reach the native state.

Most small, single-domain globular proteins (up to approximately one hundred residues) fold in a two-state manner.

In this case, the process of folding represents a cooperative transition from an unfolded state of the protein to its functional (native) state over a free energy barrier without significant intermediates. The barrier is created by an interplay between energy and entropy: while the energy directs the protein towards the native state, the entropy returns it towards the numerous unfolded states. On a free energy landscape, the unfolded and folded states form basins of attraction separated by the free energy barrier. The resulting first-passage times (FPTs) to reach the native state have a single-exponential distribution with the mean FPT (MFPT) associated with the height of the free energy barrier. If observed, such a distribution suggests that essentially only two states are populated, those of folded and unfolded conformations. Also, based on the calculated height of the free energy barrier, proteins can be classified as fast and slow folders, which is a common practice. Along with these well-documented properties of two-state folders, one issue requires clarification. The folded states are not represented by the unique native conformation; the latter is just one among a variety of native-like states, which may also play a role in protein functionality.

Accordingly, when the protein comes to the basin of native-like conformations, it does not necessarily reach the native state immediately but may dwell in this basin, and typically does, exploring native-like conformations until it finds the native one. Therefore, the MFPT may not be determined solely by the free energy barrier but may also be affected by the protein dynamics in the native-like basin. Why, then, does the exploration of native-like states not change the single-exponential FPT distributions that are observed for two-state folders?

To gain insight into this issue, we perform molecular dynamics (MD) simulations of the folding of a β-hairpin protein, one of the benchmark two-state folders. We show that, along with the single-exponential FPT distribution to arrive at the basin of native-like states by overcoming the free energy barrier, the distribution of times to reach the native state within this basin is also exponential. As a result, the overall FPT distribution, which may remain apparently single-exponential, is determined by the relation between the mean times for those distributions, i.e., the times to overcome the free energy barrier and those to find the native state among the native-like ones. Moreover, at high temperatures, the MFPT is practically determined by the time to find the native state in the basin of native-like states. We also use the Kramers rate formula to estimate the transition times from the basin of unfolded states to the basin of native-like states, and find that these times are in reasonable agreement with the corresponding times obtained in the simulations and may be far below the MFPTs to reach the native state.

The β-hairpin protein we study is a 12-residue protein with the sequence KTWNPATGKWTE (2evq.pdb). Since a large number of folding trajectories was required to obtain well-converged FPT distributions (ten to twenty-five thousand trajectories were run), coarse-grained simulations similar to those in the previous work were employed. They included a Cα-bead protein representation and a Gō-type interaction potential. The Cα-bead representation was constructed on the basis of the NMR structure of the protein. The Gō-type potential consisted of three terms, which accounted for the rigidity of the backbone and the contributions of native and non-native contacts in the form of the Lennard-Jones potential. Two Cα-beads were considered to be in native contact if they were not nearest neighbors along the protein chain and had an interbead distance not longer than $d_{\rm cut} = 7.$[…] […] $N_{\rm nat} = N_{\rm nat}^{\rm NAT} = 27$. The simulations were performed with constant-temperature MD based on a coupled set of Langevin equations. The time step was $\Delta t = 0.$[…]$\tau$, where $\tau$ is the characteristic time. At the length scale $l = 7.$[…] and the energy scale $\epsilon = 2.$[…], $\tau = (M l^2/\epsilon)^{1/2} \approx$ […], where $M = 110$ Da is the average mass of a residue. The friction constant $\gamma$ in the Langevin equations was varied from $\gamma = 3M/\tau$ to $\gamma = 50M/\tau$, where the upper bound corresponds to water solution at room temperature; for these values of $\gamma$, the folding rate decreases approximately as $\sim 1/\gamma$. In what follows, the temperature is measured in units of $\epsilon$, i.e., the Boltzmann constant is set to unity. Folding trajectories were started from a partially folded state of the protein and terminated upon reaching the native state. The native state was considered to be reached if the root-mean-square deviation (RMSD) from the native structure was less than 1.0 Å.
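The native-contact criterion described above can be illustrated with a minimal Python sketch. It is not the author's code: the function names and the toy coordinates are hypothetical, the cutoff of 7.5 is an assumed value (the source gives the cutoff only in truncated form), and treating a native pair as formed when its current distance is within the same cutoff is also an assumption.

```python
import math

def native_contact_pairs(native_coords, d_cut):
    """Return index pairs (i, j) that are in contact in the native structure."""
    pairs = []
    n = len(native_coords)
    for i in range(n):
        # j starts at i + 2 so that nearest neighbors along the chain are skipped
        for j in range(i + 2, n):
            if math.dist(native_coords[i], native_coords[j]) <= d_cut:
                pairs.append((i, j))
    return pairs

def count_native_contacts(coords, pairs, d_cut):
    """Count native pairs that are also within d_cut in the given conformation."""
    return sum(1 for i, j in pairs if math.dist(coords[i], coords[j]) <= d_cut)

# Toy 4-bead chain (hypothetical coordinates, bond length 3.8)
D_CUT = 7.5  # assumed value; the source text truncates the cutoff
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

pairs = native_contact_pairs(native, D_CUT)
n_native = count_native_contacts(native, pairs, D_CUT)      # compact toy structure
n_extended = count_native_contacts(extended, pairs, D_CUT)  # stretched toy structure
```

In this toy case the compact structure retains all of its native pairs while the stretched chain retains none, which is the sense in which $N_{\rm nat}$ serves as a folding coordinate.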

The simulations were performed for five temperatures ranging from $T = 0.1$ to $T = 0.3$ (≈ […] ≈ […] $\epsilon$). Figures 1, 2 and 3 present the results for $T = 0.1$, $T = 0.2$ and $T = 0.3$. The friction constant is $\gamma = 10M/\tau$. Twenty-five thousand folding trajectories were run for each temperature. Figure 1 shows the results for $T = 0.1$. The distribution of protein states is presented in Fig. 1a and Fig. 1b as a free energy surface (FES) and a free energy profile (FEP), respectively. Since the MD trajectories were terminated upon reaching the native state, i.e., “nonequilibrium” conditions were simulated, the present FES and FEP represent the distributions of probabilities of protein states rather than the true free energy landscapes. As a reaction coordinate, the number of native contacts $N_{\rm nat}$ was used. For the FES, the free energy was calculated as $F(N_{\rm nat}, R_g) = -T \ln P(N_{\rm nat}, R_g)$, where $P(N_{\rm nat}, R_g)$ is the probability to find the protein in a state with the given number of native contacts $N_{\rm nat}$ and radius of gyration $R_g$. For the FEP, the free energy was calculated as $F(N_{\rm nat}) = -T \ln P(N_{\rm nat})$, where the probability for the protein to have $N_{\rm nat}$ contacts, $P(N_{\rm nat})$, was calculated by summation of protein states at the current value of $N_{\rm nat}$.

Figure 1: $T = 0.1$. (a) The free energy surface $F(N_{\rm nat}, R_g)$ and (b) the free energy profile $F(N_{\rm nat})$. (c) First-passage time distributions: the U-NL trajectories (blue triangles), the NL-N trajectories (red), and the U-N trajectories (black). (d) The mean-square deviation of the number of native contacts $N_{\rm nat}$ from that at the transition state $N_{\rm nat}^{\rm TS}$ (black curve); the blue and red dashed lines are the linear fits to the curve for short and long times, respectively. [Axes: (a) radius of gyration vs number of native contacts; (b) free energy vs number of native contacts; (c) probability density vs time; (d) $\langle r^2(t)\rangle$ vs time.]

Figure 2: $T = 0.2$. The notations are as in Fig. 1.

Figure 3: $T = 0.3$. The notations are as in Fig. 1.

In agreement with previous studies of β-hairpin folding, the FES and FEP reveal two basins of attraction: one for partially folded (semi-compact) conformations (smaller values of $N_{\rm nat}$), and the other for native-like states (larger values of $N_{\rm nat}$). The basins are separated by a free energy barrier at the transition state (TS) at $N_{\rm nat} = N_{\rm nat}^{\rm TS} \approx$ […]. To divide the U-N (unfolded to native) trajectory into its U-NL (unfolded to native-like) and NL-N (native-like to native) parts, here and in all other cases we studied, we chose the point $N_{\rm nat} = N_{\rm nat}^{\rm TS} + 2$, where the height of the basin of native-like states on the TS side typically was ≈ […]. Figure 1c shows the FPT distributions for the U-NL (blue), NL-N (red) and U-N (black) trajectories. It is seen that not only is the FPT distribution for the U-NL trajectories, which overcome the TS barrier, essentially single-exponential, as is characteristic of two-state kinetics, but the FPT distribution for the NL-N trajectories, which are confined to the basin of native-like states, is also approximately single-exponential (the steep rise of the U-NL FPT distribution at small times reflects the time the protein spends to come to the basin of semi-compact states from an unfolded state). According to the zero-order Poisson law (the waiting time for the first event), the single-exponential distribution for the NL-N trajectories suggests that in the basin of native-like states the protein explores equally probable and accessible states (see also a simple illustration of this in the Supporting Information, Fig. S2). The U-NL and NL-N trajectories contribute to the overall MFPT approximately equally: the MFPTs for the U-NL and NL-N trajectories are $\langle t_{\rm U-NL}\rangle \approx 144$ and $\langle t_{\rm NL-N}\rangle \approx$ […] $\langle t_{\rm U-NL}\rangle \approx$ […]

$$p_{\rm U-N}(t) = \int_0^t p_{\rm NL-N}(t')\, p_{\rm U-NL}(t - t')\, dt'$$

When the U-NL and NL-N distributions are single-exponential, i.e., $p_{\rm U-NL}(t) = (1/\langle t_{\rm U-NL}\rangle)\exp(-t/\langle t_{\rm U-NL}\rangle)$ and $p_{\rm NL-N}(t) = (1/\langle t_{\rm NL-N}\rangle)\exp(-t/\langle t_{\rm NL-N}\rangle)$, this gives

$$p_{\rm U-N}(t) = \frac{1}{\langle t_{\rm U-NL}\rangle - \langle t_{\rm NL-N}\rangle}\left[e^{-t/\langle t_{\rm U-NL}\rangle} - e^{-t/\langle t_{\rm NL-N}\rangle}\right] \qquad (1)$$

In the two limiting cases, $\langle t_{\rm NL-N}\rangle \gg \langle t_{\rm U-NL}\rangle$ and $\langle t_{\rm U-NL}\rangle \gg \langle t_{\rm NL-N}\rangle$, $p_{\rm U-N}(t)$ transforms into the corresponding single-exponential NL-N and U-NL distribution, respectively, while when $\langle t_{\rm NL-N}\rangle$ and $\langle t_{\rm U-NL}\rangle$ are comparable, it remains essentially two-exponential. The deviation of $p_{\rm U-N}(t)$ from a single-exponential distribution is most significant when $\langle t_{\rm NL-N}\rangle \approx \langle t_{\rm U-NL}\rangle$. Then $p_{\rm U-N}(t) \approx (t/\tau^2)\exp(-t/\tau)$, where $\tau = \langle t_{\rm NL-N}\rangle \approx \langle t_{\rm U-NL}\rangle$. In this case, first, the steep rise at small times becomes more pronounced and, second, the distribution approaches a single-exponential one only at $t/\ln t \gg \tau$. Figure 4 shows the evolution of the FPT distributions with the ratio $\alpha = \langle t_{\rm NL-N}\rangle / \langle t_{\rm U-NL}\rangle$ from $\alpha \ll 1$ to $\alpha \gg 1$; at the lower bound of $\alpha$, the MFPT is largely determined by the transition over the free energy barrier, and at the upper bound, by the exploration of native-like states.

Figure 4: The evolution of the theoretical first-passage time distribution: (a) $\langle t_{\rm U-NL}\rangle = 10.$[…], $\langle t_{\rm NL-N}\rangle = 1.0$; (b) $\langle t_{\rm U-NL}\rangle = 5.$[…], $\langle t_{\rm NL-N}\rangle = 10.0$; (c) $\langle t_{\rm U-NL}\rangle = 10.$[…], $\langle t_{\rm NL-N}\rangle = 10.$[…]; (d) $\langle t_{\rm U-NL}\rangle = 1.$[…], $\langle t_{\rm NL-N}\rangle = 10.0$. The distributions for the U-NL trajectories are shown in blue, for the NL-N trajectories in red, and for the U-N trajectories in black. [Axes: probability density vs time.]

As the temperature increases, the TS slightly shifts towards the native state, Figs. 2a,b and 3a,b (see also Fig. 5, where the FEPs are put together, including those for the intermediate temperatures $T = 0.15$ and $T = 0.25$). […] $T = 0.$[…] ([…]c and 3c), i.e., the U-NL and NL-N distributions remain essentially single-exponential whereas the U-N distributions are apparently exponential only at large times. In order to separate the U-NL, NL-N and U-N distributions more clearly, they are shown in Fig. 6 in the form of survival probabilities. It is significant that the contribution of the NL-N trajectories to the U-N FPT distribution becomes dominant as the temperature increases, so that both the U-N distribution and its MFPT are largely determined not by overcoming the free energy barrier but by the protein dwelling in the basin of native-like states. For example, at $T = 0.3$, the MFPTs are $\langle t_{\rm U-NL}\rangle \approx$ […], $\langle t_{\rm NL-N}\rangle \approx$ […] and $\langle t_{\rm U-N}\rangle \approx$ […].

Figure 5: Variation of the free energy profiles with temperature ($T$ = 0.1, 0.15, 0.2, 0.25 and 0.3). [Axes: free energy vs number of native contacts.]

The results of the simulations for the intermediate temperatures, $T = 0.15$ and $T = 0.25$, as well as of the simulations for smaller and larger values of the friction constant, $\gamma = 3M/\tau$ and $\gamma = 50M/\tau$, are completely in line with the present results (Supporting Information, Figs. S3 - S10 and Tables S1 - S4). The main effect is that the MFPTs drastically increase with $\gamma$. Also, it is worth noting that an alternative condition can be used to terminate the MD trajectories, specifically, that $N_{\rm nat}$ is equal to the number of native contacts in the native state, $N_{\rm nat}^{\rm NAT} = 27$. The simulations show that this does not affect the overall picture of folding, i.e., the TSs retain their positions, and the U-NL and NL-N FPT distributions remain essentially single-exponential (Supporting Information, Figs. S11 - S12). The only change is that the time the protein spends in the basin of native-like states increases substantially because the probability to find a state with $N_{\rm nat} = N_{\rm nat}^{\rm NAT}$ turns out to be smaller than that to find a state with the RMSD from the native state less than 1.0 Å (at least, in the present protein model).

Figure 6: The first-passage time distributions as survival probabilities: (a) $T = 0.1$, (b) $T = 0.2$, and (c) $T = 0.3$. The U-NL distributions are shown in blue, the NL-N distributions in red, and the U-N distributions in black. The dashed green lines are the exponential fits to the U-NL distributions. [Axes: survival probability vs time.]

It is interesting to ask what folding times can be predicted using reaction-rate theory for the calculated FEPs. Specifically, we can use the Kramers rate formula in the strong friction limit, which has been previously employed to calculate folding times. Our analysis is somewhat similar to that for the folding of a 27-bead lattice protein. With the $N_{\rm nat}$ axis as a reaction coordinate, the mean time of transitions over the free energy barrier is determined as

$$\langle t_{\rm U-NL}\rangle = \frac{2\pi T}{D_{\rm TS}\,(F''_{\rm U} F''_{\rm TS})^{1/2}\exp(-\Delta F/T)} \qquad (2)$$

where $F''_{\rm U}$ and $F''_{\rm TS}$ are the second-order derivatives of the free energy with respect to $N_{\rm nat}$ at the bottom of the basin for unfolded states and the top of the TS barrier, respectively, $D_{\rm TS}$ is the diffusion coefficient at the TS, and $\Delta F$ is the height of the TS barrier measured from the bottom of the unfolded state basin. The diffusion coefficient was calculated directly, although the autocorrelation time of $N_{\rm nat}$ could also be used for this. Specifically, as the MD trajectory reached the TS ($N_{\rm nat} = N_{\rm nat}^{\rm TS}$), the time-dependent square deviation from the TS, $R^2(t) = [N_{\rm nat}(t) - N_{\rm nat}^{\rm TS}]^2$, was calculated, and the diffusion coefficient was determined as $D = (1/2)\, d\langle R^2\rangle/dt$, where $\langle R^2(t)\rangle$ is the ensemble average of $R^2(t)$. In general, the value of the diffusion coefficient is position-dependent in protein folding. As can be seen from Figs. 1d, 2d and 3d, there are two time intervals where $\langle R^2(t)\rangle$ changes linearly with time, and thus the diffusion coefficient can be considered constant within each of these time intervals. At short times ($t <$ […]) […] $D_{\rm TS} \sim 10$, but at longer times ($t >$ […]d and $t >$ […]d), it is one order of magnitude smaller (by ≈ 30 times). At short times, the deviation from the TS, $\Delta N_{\rm nat} = \langle R^2(t)\rangle^{1/2}$, is 2 units or less, i.e., the protein does not leave the close vicinity of the TS. In contrast, at longer times, $\Delta N_{\rm nat}$ can be as large as 4 units, which indicates that the protein moves away from the TS toward the bottom of one of the basins (see Figs. 1b, 2b and 3b). The simulations show that at large times the protein does not “jump” between the basins of unfolded and native-like states over the TS but rather explores one of the basins, predominantly the basin of native-like states (Supporting Information, Fig. S1). This suggests that the linear behavior of $\langle R^2(t)\rangle$ at large times should be associated with intra-basin diffusion rather than with transitions over the TS barrier, i.e., with inter-basin diffusion. Consequently, the value of the diffusion coefficient at small times was employed as $D_{\rm TS}$ in Eq. (2). Because of the discrete nature of the reaction coordinate, we used two different methods to calculate the derivatives $F''_{\rm U}$ and $F''_{\rm TS}$. In one method, the derivatives were obtained from approximations of $F(N_{\rm nat})$ with second-order polynomials in the vicinity of the points corresponding to the bottom of the unfolded basin ($F''_{\rm U}$) and the top of the TS barrier ($F''_{\rm TS}$). In the other method, the derivatives were calculated directly, as the three-point finite differences of $F(N_{\rm nat})$ at those points. The calculated parameters for Eq. (2) are tabulated in Table 1. It is seen, in particular, that except for $T = 0.1$, the values of $F''_{\rm U}$ and $F''_{\rm TS}$ obtained with the two above methods are very close.

Table 1: Parameters to calculate the U-NL transition time with the Kramers formula. Columns: $T$ = 0.1, 0.15, 0.2, 0.25, 0.3. Rows: $\Delta F$, $F''_{\rm U}$ (a), $F''_{\rm U}$ (b), $F''_{\rm TS}$ (a), $F''_{\rm TS}$ (b), $D_{\rm TS}$. (a) From the polynomial approximation. (b) Calculated as the three-point finite difference. [The numerical entries are missing in the source text.]

The results of the calculations are presented in Table 2 and Fig. 7a. As could be expected, the Kramers formula (blue triangles) gives the U-NL transition times (black triangles) rather than the total U-N folding times (black squares). At the same time, although the qualitative behavior with temperature is similar, the times given by Eq. (2) are considerably shorter than those obtained in the simulations: from approximately 2 times at $T = 0.$[…] […] $T = 0.3$. Also, the degree of agreement with the simulated times depends on the value of the friction constant, as shown in Figs. 7b and 7c for $\gamma = 3M/\tau$ and $\gamma = 50M/\tau$, respectively. Moreover, we found that the calculation of the diffusion coefficient at long times, as has been employed for folding of a lattice protein, can give better agreement with the simulated times, although the diffusion at these times is intra-basin rather than inter-basin. This is the case for $\gamma = 50M/\tau$, where the calculation of the diffusion coefficient at short times would decrease the times from Eq. (2) by a factor of ≈ 10 in comparison with those in Fig. 7c. There may be several reasons why the replacement of “real” protein dynamics in multi-dimensional space by motion along the calculated one-dimensional FEP did not lead to better agreement of the theoretical estimates with the simulated results. For instance, $N_{\rm nat}$ is a good but not an optimal reaction coordinate. Also, one of the basic conditions to derive Eq. (2), $\Delta F \gg T$, is not fully satisfied. Nevertheless, although more accurate estimates would be desirable, for the purposes of the present study it is more important that the times obtained from Eq. (2) reproduce the U-NL transition times rather than the total MFPT.

Table 2: Comparison of folding times. Columns: $T$ = 0.1, 0.15, 0.2, 0.25, 0.3. Rows: $\langle t_{\rm U-NL}\rangle$ (a), $\langle t_{\rm U-NL}\rangle$ (b), $\langle t_{\rm U-NL}\rangle$, $\langle t_{\rm NL-N}\rangle$, $\langle t_{\rm U-N}\rangle$. (a) Calculated from the slope of the simulated U-NL decay curve. (b) Eq. (2) for the average values of $F''_{\rm U}$ and $F''_{\rm TS}$ (Table 1). [The numerical entries are missing in the source text.]

Figure 7: Comparison of the mean first-passage times: (a) $\gamma = 10M/\tau$, (b) $\gamma = 3M/\tau$, and (c) $\gamma = 50M/\tau$. The black squares are the $\langle t_{\rm U-N}\rangle$ times from simulations, the black triangles denote the $\langle t_{\rm U-NL}\rangle$ times calculated from the slopes of the simulated U-NL decay curves, and the blue triangles are the $\langle t_{\rm U-NL}\rangle$ times from Eq. (2) with the average values of $F''_{\rm U}$ and $F''_{\rm TS}$ (the dashed and dash-dotted blue lines show the results for $F''_{\rm U}$ and $F''_{\rm TS}$ obtained by the polynomial approximation of the FEP and calculated by finite differences, respectively). For $\gamma = 10M/\tau$ and $\gamma = 3M/\tau$, the diffusion coefficient was calculated from $\langle R^2(t)\rangle$ at small times where $\langle R^2(t)\rangle \sim t$, and at $\gamma = 50M/\tau$ at longer times where $\langle R^2(t)\rangle \sim t$ (see the text for details). In all cases, the solid lines are to guide the eye. [Axes: time vs temperature.]

Using a coarse-grained protein model, we have performed an extensive MD simulation of the folding of a β-hairpin protein, one of the benchmark two-state folders. Each MD trajectory to reach the protein native state from an unfolded state was divided into two parts: one to pass from the unfolded state to a native-like state by overcoming the free energy barrier that separates these states, and the other to explore the basin of native-like states until the native state is achieved. It has been found that the distributions of first-passage times (FPTs) for both segments of the trajectories are essentially single-exponential. The resulting FPT distribution to reach the native state is generally double-exponential, with a steep rise at small times and an apparent exponential decay at large times. The deviation of this distribution from a single-exponential one is determined by the relation between the mean times for the constituent trajectories, i.e., the smaller one time is in comparison with the other, the closer the resulting FPT distribution is to the exponential distribution for the longer-time trajectories.
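The dependence of the overall FPT distribution on the two mean times can be checked numerically. The following Python sketch is an illustration only, not the author's code: it evaluates the two-exponential form obtained by convolving two single-exponential FPT distributions and verifies by direct integration that it is normalized and that its mean equals the sum of the two mean times. The function names and the mean times 1.0 and 10.0 are assumptions chosen for the example.

```python
import math

def p_single(t, tau):
    """Single-exponential FPT density with mean time tau."""
    return math.exp(-t / tau) / tau

def p_two_step(t, tau_a, tau_b):
    """Density of the sum of two independent exponential times (means tau_a, tau_b)."""
    if math.isclose(tau_a, tau_b):
        # Limit of equal mean times: (t / tau^2) * exp(-t / tau)
        return t / tau_a**2 * math.exp(-t / tau_a)
    return (math.exp(-t / tau_a) - math.exp(-t / tau_b)) / (tau_a - tau_b)

def integrate(f, t_max, n=100_000):
    """Trapezoidal integration of f on [0, t_max] with n equal steps."""
    dt = t_max / n
    total = 0.5 * (f(0.0) + f(t_max))
    for k in range(1, n):
        total += f(k * dt)
    return total * dt

# Illustrative (assumed) mean times: fast barrier crossing, slow search
tau_unl, tau_nln = 1.0, 10.0
t_max = 60.0 * max(tau_unl, tau_nln)

norm = integrate(lambda t: p_two_step(t, tau_unl, tau_nln), t_max)
mean = integrate(lambda t: t * p_two_step(t, tau_unl, tau_nln), t_max)

# At large times the overall density follows the slower exponential
# up to the constant factor tau_nln / (tau_nln - tau_unl).
ratio = p_two_step(50.0, tau_unl, tau_nln) / p_single(50.0, tau_nln)
```

With these assumed values the long-time decay is governed by the slower step, while the mean of the overall distribution is the sum of both mean times, so a nearly exponential tail alone does not reveal which step sets the MFPT.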
Accordingly, if the protein dwells in the basin of native-like states for a long time in comparison with the time to overcome the free energy barrier, the FPT distribution may appear as an exponential distribution, but the mean FPT (MFPT) to reach the native state will be determined not by the time to overcome the barrier but by the time to find the native state among the native-like ones; this is characteristic of high temperatures. Based on the free energy profiles constructed from the simulated MD trajectories, the mean times to pass from the basin of unfolded states to the basin of native-like states have been calculated using the Kramers rate formula. It has been found that these times are in reasonable agreement with the corresponding times obtained in the simulations and are far shorter than the MFPTs to reach the native state.

Acknowledgement

I thank Dmitriy Chekmarev for valuable comments on the manuscript. Support from the Russian Ministry of Education and Science is acknowledged.

Supporting Information for: “First-Passage Time Distributions in Two-State Protein Folding Kinetics: Exploring the Native-Like States vs Overcoming the Free Energy Barrier”

Sergei F. Chekmarev ∗ , † , ‡ † Institute of Thermophysics, SB RAS, 630090 Novosibirsk, Russia ‡ Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia

E-mail: [email protected]

Deviation from the Transition State

[Plot: N_nat(t) - N_nat^TS and <r(t)> versus time]

Figure S1: T = 0. , γ = 10 M/τ, and ten thousand MD trajectories. The mean square (black curve) and mean (blue curve) deviations from the transition state.
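As a minimal sketch of how the two curves in Figure S1 can be computed from an ensemble of trajectories, the Python snippet below evaluates the mean and mean-square deviation of N_nat(t) from its transition-state value. The input data are synthetic (an unbiased random walk with unit steps), the transition-state value `n_ts = 14` is hypothetical, and the conversion of the slope into a diffusion coefficient assumes free one-dimensional diffusion, <r^2(t)> ≈ 2Dt; none of these choices are taken from the paper.

```python
import numpy as np

def deviation_moments(n_nat, n_ts):
    """Ensemble mean and mean-square deviation from the transition state.

    n_nat: array of shape (n_trajectories, n_times) holding N_nat(t)
    n_ts:  number of native contacts at the transition state
    """
    r = n_nat - n_ts
    return r.mean(axis=0), (r ** 2).mean(axis=0)

# Synthetic stand-in for MD data (assumption): unbiased +/-1 steps from the TS
rng = np.random.default_rng(1)
n_ts = 14                                   # hypothetical transition-state value
steps = rng.choice([-1, 1], size=(10_000, 200))
n_nat = n_ts + np.cumsum(steps, axis=1)
times = np.arange(1, 201)

mean_dev, msd = deviation_moments(n_nat, n_ts)

# Linear fit of the mean-square deviation; D follows from <r^2> = 2 D t (assumption)
slope = np.polyfit(times, msd, 1)[0]
D_est = slope / 2.0
print(f"slope = {slope:.3f}, D = {D_est:.3f}")
```

For real trajectories the fit would be restricted to the time window in which the mean-square deviation is linear, since the free energy profile bends the curve at longer times.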

A Simple Model for Single-Exponential First-Passage Time Distribution

[Plot: probability density versus number of steps]

Figure S2: A simulated distribution of first-passage times. A random number generator producing uniformly distributed numbers between 0 and 1 was used. In the ensemble of 10 trajectories, each trajectory was started from a random number and proceeded through the numbers until the value of 0. ± .01 was achieved. The label corresponds to the simulated trajectories, and the blue dashed line shows an exponential fit to the simulated distribution with the decay rate of 50.0.
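The model of Figure S2 can be sketched in a few lines of Python. The snippet draws uniform random numbers until one falls inside a target window of half-width 0.01 and records the number of draws as the first-passage time. The ensemble size (100,000) and the window centre (0.5) are assumptions, because those values are not legible in the caption; for uniform numbers the centre does not affect the statistics as long as the window lies inside [0, 1].

```python
import numpy as np

def first_passage_steps(n_traj, center, half_width, rng):
    """Number of uniform draws each trajectory needs to land in the window."""
    steps = np.zeros(n_traj, dtype=np.int64)
    active = np.arange(n_traj)            # indices of trajectories still running
    while active.size > 0:
        draws = rng.random(active.size)   # one new uniform number per trajectory
        steps[active] += 1
        hit = np.abs(draws - center) <= half_width
        active = active[~hit]             # drop trajectories that reached the target
    return steps

rng = np.random.default_rng(0)
# Assumed values: ensemble size and window centre are not legible in the source.
fpt = first_passage_steps(n_traj=100_000, center=0.5, half_width=0.01, rng=rng)

# Survival probability S(n) = P(FPT > n) and a straight-line fit to log S(n)
n = np.arange(0, 200)
survival = np.array([(fpt > k).mean() for k in n])
slope = np.polyfit(n, np.log(survival), 1)[0]
tau_fit = -1.0 / slope

print(f"mean FPT = {fpt.mean():.1f} steps, fitted decay time = {tau_fit:.1f} steps")
```

Each draw hits the window with probability 0.02, so the first-passage time is geometrically distributed with a mean of about 50 steps. This matches the value of 50.0 quoted in the caption, which is therefore most naturally read as a decay time in steps.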

Friction Constant γ = 10 M/τ

[Four-panel figure: (a) radius of gyration versus number of native contacts; (b) free energy versus number of native contacts; (c) survival probability versus time; (d) <r(t)> versus time]

Figure S3: T = 0.15. (a) The free energy surface F(N_nat, R_g), and (b) free energy profile F(N_nat). (c) First-passage time distributions in the form of survival probabilities: the U-NL trajectories (blue), the NL-N trajectories (red), and the U-N trajectories (black); the dashed green line denotes an exponential fit to the U-NL distribution. (d) The time-dependent mean-square deviation from the transition state in the number of native contacts (black curve); the blue and red dashed lines are the linear fits to the curve for short and long times, respectively.

[Four-panel figure, panels as in Fig. S3]

Figure S4: T = 0.25. The notations are as in Fig. S3.
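The free energy profile in panel (b) and the diffusion behaviour in panel (d) supply the ingredients of a Kramers-type estimate of the U-NL time, which the tables of this Supporting Information compare with the simulated values. The sketch below assumes the standard overdamped (high-friction) form, t = 2π k_B T / (D_TS sqrt(F''_U |F''_TS|)) exp(ΔF / k_B T); the exact expression used in the paper is not reproduced in this section. The curvatures are taken as three-point finite differences, as in footnote b of the parameter tables. The double-well profile, the temperature `kT = 0.2`, and `D_TS = 1.0` are hypothetical illustration values, not data from the paper.

```python
import numpy as np

def second_derivative(F, i, h):
    """Three-point finite difference for F'' at grid index i (spacing h)."""
    return (F[i + 1] - 2.0 * F[i] + F[i - 1]) / h**2

def kramers_time(dF, Fpp_U, Fpp_TS, D_TS, kT):
    """Overdamped Kramers estimate of the mean barrier-crossing time."""
    prefactor = 2.0 * np.pi * kT / (D_TS * np.sqrt(Fpp_U * abs(Fpp_TS)))
    return prefactor * np.exp(dF / kT)

# Hypothetical model profile: F(x) = x^4 - 2 x^2, minima at x = +/-1, barrier at x = 0
x = np.linspace(-1.5, 1.5, 301)
h = x[1] - x[0]
F = x**4 - 2.0 * x**2

i_U = int(np.argmin(F[:150]))              # left minimum plays the role of the U basin
i_TS = 50 + int(np.argmax(F[50:251]))      # barrier top between the two minima

dF = F[i_TS] - F[i_U]                      # barrier height
Fpp_U = second_derivative(F, i_U, h)       # curvature at the minimum (positive)
Fpp_TS = second_derivative(F, i_TS, h)     # curvature at the barrier top (negative)

t_kramers = kramers_time(dF, Fpp_U, Fpp_TS, D_TS=1.0, kT=0.2)
print(f"dF = {dF:.3f}, F''_U = {Fpp_U:.3f}, F''_TS = {Fpp_TS:.3f}, t = {t_kramers:.2f}")
```

The estimate depends exponentially on ΔF / k_B T but only as a square root on the curvatures, so moderate uncertainty in F''_U and F''_TS (for example between a polynomial fit and a finite difference) changes the result far less than an error in the barrier height.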

Friction Constant γ = 3 M/τ

[Four-panel figure: (a) radius of gyration versus number of native contacts; (b) free energy versus number of native contacts; (c) probability density versus time; (d) <r(t)> versus time]

Figure S5: T = 0.1. (a) The free energy surface F(N_nat, R_g), and (b) free energy profile F(N_nat). (c) First-passage time distributions: the U-NL trajectories (blue), the NL-N trajectories (red), and the U-N trajectories (black); the dashed green line denotes an exponential fit to the U-NL distribution. (d) The time-dependent mean-square deviation from the transition state in the number of native contacts (black curve); the blue and red dashed lines are the linear fits to the curve for short and long times, respectively.

[Four-panel figure, panels as in Fig. S5]

Figure S6: T = 0.2. The notations are as in Fig. S5.

[Four-panel figure, panels as in Fig. S5]

Figure S7: T = 0.3. The notations are as in Fig. S5.

Table S1: Parameters to calculate the U-NL transition time with the Kramers rate formula
Columns: T = 0.1, 0.15, 0.2, 0.25, 0.3
Rows: F, F''_U (a), F''_U (b), F''_TS (a), F''_TS (b), D_TS
(a) From the polynomial approximation. (b) Calculated as the three-point finite difference.

Table S2: Comparison of Folding Times
Columns: T = 0.1, 0.15, 0.2, 0.25, 0.3
Rows: ⟨t_U-NL⟩ (a), ⟨t_U-NL⟩ (b), ⟨t_U-NL⟩, ⟨t_NL-N⟩, ⟨t_U-N⟩
(a) Calculated from the slope of the simulated U-NL decay curve. (b) Kramers rate formula for the average values of F''_U and F''_TS (Table S1).

Friction Constant γ = 50 M/τ

[Four-panel figure: (a) radius of gyration versus number of native contacts; (b) free energy versus number of native contacts; (c) survival probability versus time; (d) <r(t)> versus time]

Figure S8: T = 0.1. (a) The free energy surface F(N_nat, R_g), and (b) free energy profile F(N_nat). (c) First-passage time distributions in the form of survival probabilities: the U-NL trajectories (blue), the NL-N trajectories (red), and the U-N trajectories (black); the dashed green line denotes an exponential fit to the U-NL distribution. (d) The time-dependent mean-square deviation from the transition state in the number of native contacts (black curve); the blue and red dashed lines are the linear fits to the curve for short and long times, respectively.

[Four-panel figure, panels as in Fig. S8]

Figure S9: T = 0.2. The notations are as in Fig. S8.

[Four-panel figure, panels as in Fig. S8]

Figure S10: T = 0.3. The notations are as in Fig. S8.

Table S3: Parameters to calculate the U-NL transition time with the Kramers rate formula
Columns: T = 0.1, 0.15, 0.2, 0.25, 0.3
Rows: F, F''_U (a), F''_U (b), F''_TS (a), F''_TS (b), D_TS
(a) From the polynomial approximation. (b) Calculated as the three-point finite difference.

Table S4: Comparison of Folding Times

T                0.1    0.15   0.2    0.25   0.3
⟨t_U-NL⟩ (a)     440    280    285    310    265
⟨t_U-NL⟩ (b)     472    358    151    159    100
⟨t_U-NL⟩         733    514    484    454    406
⟨t_NL-N⟩         476    372    380    524    859
⟨t_U-N⟩

(a) Calculated from the slope of the simulated U-NL decay curve. (b) Kramers rate formula for the average values of F''_U and F''_TS (Table S3).

Different Thresholds to Terminate the MD Trajectories

[Four-panel figure: (a) radius of gyration versus number of native contacts; (b) free energy versus number of native contacts; (c) survival probability versus time; (d) <r(t)> versus time]

Figure S11: The trajectories were terminated when the RMSD from the native state was less than 1.0 Å; T = 0.2. (a) The free energy surface and (b) free energy profile. (c) First-passage time distributions in the form of survival probabilities: the U-NL trajectories (blue), the NL-N trajectories (red), and the U-N trajectories (black); the dashed green line denotes an exponential fit to the U-NL distribution. (d) The time-dependent mean-square deviation from the transition state in the number of native contacts (black curve); the blue and red dashed lines are the linear fits to the curve for short and long times, respectively.

[Four-panel figure, panels as in Fig. S11]

Figure S12: The trajectories were terminated when the number of native contacts N_nat was equal to the number of native contacts in the native state, N_nat^NAT = 27; T = 0 ..