[PDF] Markov modeling of peptide folding in the presence of protein crowders

Abstract

We use Markov state models (MSMs) to analyze the dynamics of a β -hairpin-forming peptide in Monte Carlo (MC) simulations with interacting protein crowders, for two different types of crowder proteins [bovine pancreatic trypsin inhibitor (BPTI) and GB1]. In these systems, at the temperature used, the peptide can be folded or unfolded and bound or unbound to crowder molecules. Four or five major free-energy minima can be identified. To estimate the dominant MC relaxation times of the peptide, we build MSMs using a range of different time resolutions or lag times. We show that stable relaxation-time estimates can be obtained from the MSM eigenfunctions through fits to autocorrelation data. The eigenfunctions remain sufficiently accurate to permit stable relaxation-time estimation down to small lag times, at which point simple estimates based on the corresponding eigenvalues have large systematic uncertainties. The presence of the crowders have a stabilizing effect on the peptide, especially with BPTI crowders, which can be attributed to a reduced unfolding rate k u , while the folding rate k f is left largely unchanged.

Full PDF

MMarkov modeling of peptide folding in the presence of protein crowders

Daniel Nilsson, a) Sandipan Mohanty, b) and Anders Irbäck c)1) Computational Biology and Biological Physics, Department of Astronomyand Theoretical Physics, Lund University, Sölvegatan 14A, SE-223 62 Lund,Sweden Institute for Advanced Simulation, Jülich Supercomputing Centre,Forschungszentrum Jülich, D-52425 Jülich, Germany (Dated: 10 November 2018)

We use Markov state models (MSMs) to analyze the dynamics of a β -hairpin-formingpeptide in Monte Carlo (MC) simulations with interacting protein crowders, for twodiﬀerent types of crowder proteins [bovine pancreatic trypsin inhibitor (BPTI) andGB1]. In these systems, at the temperature used, the peptide can be folded orunfolded and bound or unbound to crowder molecules. Four or ﬁve major free-energy minima can be identiﬁed. To estimate the dominant MC relaxation timesof the peptide, we build MSMs using a range of diﬀerent time resolutions or lagtimes. We show that stable relaxation-time estimates can be obtained from theMSM eigenfunctions through ﬁts to autocorrelation data. The eigenfunctions remainsuﬃciently accurate to permit stable relaxation-time estimation down to small lagtimes, at which point simple estimates based on the corresponding eigenvalues havelarge systematic uncertainties. The presence of the crowders have a stabilizing eﬀecton the peptide, especially with BPTI crowders, which can be attributed to a reducedunfolding rate k u , while the folding rate k f is left largely unchanged.PACS numbers: 87.15.ak, 87.15.Cc, 87.15.hp, 87.15.kmKeywords: macromolecular crowding, Monte Carlo simulation, Markov state models,time-lagged independent component analysis a) Electronic mail: [email protected] b) Electronic mail: [email protected] c) Electronic mail: [email protected] a r X i v : . [ phy s i c s . b i o - ph ] J a n . INTRODUCTION In the crowded interior of living cells, proteins are surrounded by high concentrationsof macromolecules, which leads to a reduction of the volume available to a given protein.Under such conditions, steric interactions would universally favor more compact structures.A growing body of evidence indicates, however, that the eﬀects of macromolecular crowd-ing on properties such as protein stability cannot be explained in terms of steric repulsionalone.

To understand the role of other interactions, in recent years, there have been in-creasing eﬀorts to perform computer simulations with realistic crowder molecules, ratherthan hard-sphere crowders. When analyzing these large systems, a major challenge lies inidentifying the main states and dynamical modes, which may not be easily anticipated. Onepossible approach to this problem is provided by Markov modeling techniques, which inrecent years have found widespread use in studies of biomolecular processes such as foldingand binding.

Most of these studies dealt with data from molecular dynamics simulations,but the methods are general and can be used on Monte Carlo (MC) data as well.In this article, we use Markov modeling, along with time-lagged independent componentanalysis (TICA), to analyze data from MC simulations of a test peptide in the presenceof interacting protein crowders, for two diﬀerent types of crowder proteins. We show that themajor free-energy minima and slow dynamical modes of these high-dimensional systems canbe identiﬁed in a systematic manner using TICA and Markov state models (MSMs). We fur-ther show that the dominant MC relaxation times of the peptide can be robustly estimatedfrom the constructed MSMs, although simple estimates based on the MSM eigenvalues aresubject to well-known systematic uncertainties. Our procedure for relaxation-time estima-tion uses the MSM eigenfunctions and autocorrelation ﬁts, rather than the eigenvalues.As test molecule, we use the β -hairpin-forming GB1m3 peptide. The peptide is sim-ulated in homogeneous crowding environments, where either bovine pancreatic trypsin in-hibitor (BPTI) or the B1 domain of streptococcal protein G (GB1) serves as crowdingagent. Both these proteins are thermally highly stable and therefore modeled using aﬁxed-backbone approximation, whereas the GB1m3 peptide is free to fold and unfold in thesimulations. The simulations are conducted using MC dynamics at constant temperature.Recently, we studied the same systems using MC replica-exchange methods, and found thatboth BPT1 and GB1 have a stabilizing eﬀect on GB1m3. I. METHODSA. Simulated systems

The simulated systems consist of one GB1m3 molecule and eight crowder molecules,enclosed in a periodic box with side length 95 Å. The eight crowder molecules are copies ofa single protein, either BPTI or GB1. This setup yields crowder densities of ∼

100 mg/mL,whereas the macromolecule densities in cells can be ∼ The volume fractionoccupied by the crowders is around 7%. The simulation temperature is set to 332 K, whichis near the melting temperature of the free GB1m3 peptide. For reference, simulations ofthe free peptide are also performed, using the same temperature.The GB1m3 peptide is an optimized variant of the second β -hairpin (residues 41–56)in protein GB1, with enhanced stability. It diﬀers from the original sequence at 7 of 16positions. To our knowledge, no experimental structure is available for GB1m3, but itsnative fold is expected to be similar to the parent β -hairpin in GB1. B. Biophysical model

Our simulations are based on an all-atom protein representation with torsional degreesof freedom, and an implicit solvent force ﬁeld, A detailed description of the interactionpotential can be found elsewhere. In brief, the potential consists of four main terms, E = E loc + E ev + E hb + E sc . One term ( E loc ) represents local interactions between atomsseparated by only a few covalent bonds. The other, non-local terms represent excluded-volume eﬀects ( E ev ), hydrogen bonding ( E hb ), and residue-speciﬁc interactions betweenpairs of side-chains, based on hydrophobicity and charge ( E sc ). In multi-chain simulations,intermolecular interaction terms have the same form and strength as the correspondingintramolecular ones. The potential is an eﬀective energy function, parameterized throughfolding thermodynamics studies for a structurally diverse set of peptides and small proteins. Previous applications of the model include folding/unfolding studies of several proteins with >

90 residues.

Recently, it was used by us to simulate the peptides trp-cage and GB1m3in the presence of protein crowders.

Our simulations use the same fully atomistic representation of both the GB1m3 peptideand the crowder proteins. However, because of their high thermal stability, the crowder3roteins are modeled with a ﬁxed backbone, and thus with side-chain rotations as theironly internal degrees of freedom. The assumed backbone conformations of BPTI and GB1are model approximations of the PDB structures 4PTI and 2GB1, derived by MC withminimization. The structures were selected for both low energy and high similarity to theexperimental structures. The root-mean-square deviations (RMSDs) from the experimentalstructures (calculated over backbone and C β atoms) are (cid:46) C. MC simulations

The systems are simulated using MC dynamics. The simulations are done in the canonicalrather than some generalized ensemble. Also, only “small-step” elementary moves are used,so that the system cannot artiﬁcially jump between free-energy minima, without having toclimb intervening barriers. With these restrictions, the simulations should capture somebasics of the long-time dynamics. Despite the restrictions, the methods are suﬃciently fastto permit the study of the folding and binding thermodynamics of the peptide, throughsimulations containing multiple folding/unfolding and binding/unbinding events.Our move set consists of four diﬀerent updates: (i) the semi-local Biased Gaussian Steps(BGS) method for backbone degrees of freedom in the peptide, (ii) simple single-angleMetropolis updates in side chains, (iii) small rigid-body translations of whole chains, and (iv)small rigid-body rotations of whole chains. The “time” unit of the simulations is MC sweeps,where one MC sweep consists of one attempted update per degree of freedom. Speciﬁcally,each MC sweep consists of 74 attempted moves in the crowder-free system, whereas thecorresponding numbers are 1208 and 1328 with BPTI and GB1 crowders, respectively. Notethat the average number of attempted conformational updates of the peptide per MC sweepis the same in all three cases. In the simulations with crowders, the relative fractionsof BGS moves, side-chain updates, rigid-body translations and rigid-body rotations areapproximately 0.02, 0.94, 0.02 and 0.02, respectively.All simulations are run with the program PROFASI, using both vector and threadparallelization. To gather statistics, a set of independent runs is generated for each system.The number of runs is 16 with BPTI crowders, 62 with GB1 crowders, and 30 for the isolatedpeptide. Each run comprises × MC sweeps if crowders are present, and × MCsweeps without crowders. Compared to the longest relaxation times in the respective systems4see below), the individual runs are a factor >

20 longer.Several diﬀerent properties are recorded during the simulations. As a measure of thenativeness of the peptide, the number of native H bonds present, n hb , is computed, assumingthat the native H bonds are the same as in the full GB1 protein (PDB code 2GB1). Theinteraction of the peptide with surrounding crowder molecules is studied by monitoringintermolecular H bonds and C α -C α contacts. A residue pair is said to be in contact if theirC α atoms are within 8 Å.As input for our TICA and MSM analyses (see below), two sets of parameters are storedat regular intervals during the course of the simulations. The ﬁrst set consists of all (non-constant) intramolecular C α -C α distances within the peptide, called r ij . The second setconsists of intermolecular distances between the peptide and the crowders, called d ij . Specif-ically, d ij denotes the shortest C α -C α distance between peptide residue i and residue j inany of the crowder molecules. D. TICA and MSM analysis

TICA can be used as a dimensionality reduction method. It is somewhat similar toprincipal component analysis, but identiﬁes high-autocorrelation (or slow) rather than high-variance coordinates. Given time trajectories of a set of parameters, { o n } (in our case, thedistances r ij and d ij , see above), one constructs the time-lagged covariance matrix c nm ( τ cm ) = (cid:104) o n ( t ) o m ( t + τ cm ) (cid:105) t − (cid:104) o n ( t ) (cid:105) t (cid:104) o m ( t + τ cm ) (cid:105) t , where τ cm is the lag time and (cid:104)·(cid:105) t denotes anaverage over time t . By solving the (generalized) eigenvalue problem C ( τ cm )ˆ v i = ˆ λ i C (0)ˆ v i ,slow linear combinations of the original parameters can be identiﬁed. A more advancedmethod for identifying slow modes is to construct MSMs.To build an MSM, the state space needs to be discretized. In our calculations, followingRef. 22, the discretization is achieved by clustering the data with the k -means algorithm in a low-dimensional subspace spanned by slow TICA coordinates. By computing the prob-abilities of transition among these clusters in a time τ tm (which, like τ cm , is an adjustableparameter), a transition matrix is obtained. Assuming Markovian dynamics, the eigenvec-tors of this matrix have relaxation times given by ˜ t i = − τ tm / ln ˜ λ i ( τ tm ) (1)where λ > ˜ λ ≥ ˜ λ ≥ ... > are the eigenvalues. The eigenvalue ˜ λ corresponds to5 stationary distribution ( ˜ t = ∞ ), whereas all other eigenvalues correspond to relaxationmodes with ﬁnite timescales ˜ t i . The timescales obtained using Eq. 1 are expected to re-produce the dominant relaxation times of the full system if the discretization is suﬃcientlyﬁne, or if the lag time is suﬃcently large. There are several software packages available for TICA and MSM analysis.

Our cal-culations are done using the pyEMMA software. E. Timescales from autocorrelations of MSM eigenfunctions

Another way of estimating relaxation times from an MSM is to compute autocorrelationsof the eigenfunctions. The (normalized) autocorrelation function of a general property f isgiven by C f ( τ ) = [ (cid:104) f ( t ) f ( t + τ ) (cid:105) t − (cid:104) f ( t ) (cid:105) t (cid:104) f ( t + τ ) (cid:105) t ] /σ f , where σ f is the variance of f . Let ψ MSM i be the i th eigenfunction of a given MSM, and let ψ i be the true i th eigenfunction ofthe system’s time transfer operator. The autocorrelation function of ψ MSM i , C i ( τ ) , may beexpanded as C i ( τ ) = (cid:88) j c j e − τ/t j (2)where t j is the exact j th relaxation time. The coeﬃcients c j are given by c j = |(cid:104) ψ j , ψ MSM i (cid:105)| ,where the overlap (cid:104) ψ j , ψ MSM i (cid:105) can be expressed as an average with respect to the stationarydistribution, µ ( x ) : (cid:104) ψ j , ψ MSM i (cid:105) = (cid:82) dxµ ( x ) ψ j ( x ) ψ MSM i ( x ) . Note that ψ j and ψ MSM i have meanzero and unit norm. In Sect. III, overlaps between pairs of general functions are computedin the same way, after shifting and normalizing the functions to zero mean and unit norm.Now, if ψ MSM i is a good approximation of ψ i , then c j (cid:28) c i for j (cid:54) = i . If this holds, C i ( τ ) decays approximately as e − τ/t i for not too large τ (compared to t i ), so that t i can beestimated through a simple exponential ﬁt.The calculations discussed below use data for C i ( τ ) in the range of τ where . < C i ( τ ) < . . Over this range, C i ( τ ) is to a good approximation single exponential for all MSMeigenfunctions studied. The upper bound on τ is set primarily by statistical uncertainties,rather than by deviations from single-exponential behavior.6 II. RESULTS

Our analysis of the GB1m3 peptide in the three simulated systems (with BPTI crowders,with GB1 crowders, without crowders) can be divided into two parts. First, equilibriumfree-energy surfaces are constructed, using TICA coordinates. Second, the dynamics areinvestigated, using MSM techniques.

A. Free-energy landscapes

It is instructive to begin with the isolated GB1m3 peptide, whose folding thermodynamicshave been studied before using the same model. This study found that the isolated peptidefolds in a cooperative manner, and that the number of native H bonds present, n hb , is a usefulfolding coordinate that has a bimodal distribution at the melting temperature. Figure 1ashows the free energy of the isolated GB1m3, calculated as a function of the two slowestTICA coordinates, TIC0 and TIC1. The free-energy surface exhibits two major minima,labeled I and II, which are well separated in the TIC0 direction. From Fig. 1(b), it can beseen this coordinate is strongly (anti-) correlated with n hb . This correlation implies that thepeptide is native-like in free-energy minimum I, and unfolded in minimum II.We now turn to the system where GB1m3 is surrounded by BPTI crowders. Here, theTICA coordinates are linear combinations of both intra- and intermolecular distances ( r ij and d ij ; see Sec. II C). Calculated as a function of the two slowest TICA coordinates, the freeenergy exhibits four major minima, labeled I–IV (Fig. 2(a)). To characterize the minima,an interpretation of the TIC0 and TIC1 coordinates is needed. As in the previous case,TIC0 is strongly correlated with n hb (Fig. 2(b)), and thus linked to the degree of nativeness.Inspection of the eigenvector corresponding to TIC1 suggests that this coordinate dependsstrongly on certain peptide-crowder distances d ij involving the BPTI residue Pro8, which ispart of a sticky patch on the BPTI surface. Motivated by this observation, Fig. 2(c) showsthe TIC0,TIC1-dependence of a function deﬁned to be unity whenever there is at leastone residue-pair contact between the peptide and a Pro8 BPTI residue, and zero otherwise(smoothing is used). This function is indeed strongly correlated with TIC1. Therefore, themain free-energy minima can be classiﬁed based on whether or not the peptide is native-like,and whether or not the peptide forms any Pro8 BPTI contact. The peptide is native-like and7 − T I C I II (a) . . . . . . . . . . F r eee n e r g y ( k T ) − − T I C (b) o f n a t i v e H b o nd s FIG. 1. (a) Free energy of the isolated GB1m3 peptide, calculated as a function of the twoslowest TICA coordinates, TIC0 and TIC1. Major minima are labeled by Roman numerals. (b)The dependence of the number of native H bonds, n hb , on these coordinates. Here, each storedconformation is represented by a point in the TIC0,TIC1-plane, in a color determined by the valueof n hb . Smoothing is applied to improve readability. The TICA lag time is set to τ cm = 10 MCsweeps. bound in minimum I, which actually can be split into two distinct subminima, correspondingto two preferred orientations of the folded and bound peptide. In the remaining three mainminima, the peptide is either unfolded and bound (minimum II), native-like and unbound(minimum III), or unfolded and unbound (minimum IV).With GB1 crowders, the free energy of GB1m3 exhibits ﬁve well-separated and easilyvisible minima (Fig. 3(a)) when calculated as a function of the slowest and third-slowestTICA coordinates. The TIC0,TIC2-plane is used here because two of the minima (IIIand IV) cannot be distinguished in the TIC0,TIC1-plane (see the supplementary material,Fig. S1). The TIC0 coordinate is again correlated with the degree of nativeness of thepeptide (Fig. 3(b)). Proper interpretation of the TIC2 coordinate requires knowledge of thepreferred peptide-crowder binding modes. It turns out that there are two preferred bindingmodes, called B1 and B2. In both cases, binding occurs through β -sheet extension; the edgestrand β β -hairpin. The binding modes can be described in terms of the H bonds involved(see the supplementary material, Fig. S2). Figures 3(c) and 3(d) show how the presence ofH bonds associated with the respective modes vary with TIC0 and TIC2. Apparently, lowand high TIC2 signal B1 and B2 binding, respectively. A similar analysis of TIC1 shows8 − T I C I IIIII IV (a) F r eee n e r g y ( k T ) − T I C (b) o f n a t i v e H b o nd s − T I C (c) . . . . . . B i nd i n g t o P r o8 i n B P T I FIG. 2. Characterization of the GB1m3 peptide in the presence of BPTI crowders, using the twoslowest TICA coordinates, TIC0 and TIC1. (a) Free energy. Major minima are labeled by Romannumerals. (b) The number of native H bonds present in the peptide, n hb . (c) A function which isunity whenever there is at least one residue-pair C α -C α contact between the peptide and a Pro8BPTI residue, and zero otherwise (drawn using smoothing). The contact cutoﬀ distance is 8 Å.The TICA lag time is set to τ cm = 10 MC sweeps. that this coordinate separates bound and unbound states, but discriminates poorly betweenthe B1 and B2 modes (see the supplementary material, Fig. S1(c,d)). The isolated islandat low TIC0 and intermediate TIC2 stems from simultaneous binding of the peptide viaboth modes, to two crowder molecules. Based on the above observations, the free-energyminima in Fig. 3(a) can be described as follows. In minima I and II, the peptide is unfoldedand native-like, respectively, and neither B1 nor B2 binding occurs. In the remaining threeminima, the peptide is native-like and bound. The mode of binding is either B1 (minimumIII), B2 (minimum IV) or both (minimum V).It is worth noting that the interpretation of the TIC0 coordinates of the BPTI and GB1systems is not necessarily the same, although TIC0 is a strongly correlated with folding in9 − T I C IIIIIIIVV (a) F r eee n e r g y ( k T ) − − T I C (b) o f n a t i v e H b o nd s − − T I C (c) o f H b o nd s o f t y p e B − − T I C (d) o f H b o nd s o f t y p e B FIG. 3. Characterization of the GB1m3 peptide in the presence of GB1 crowders, using the slowestand third-slowest TICA coordinates, TIC0 and TIC2. (a) Free energy. Major minima are labeledby Roman numerals. (b) The number of native H bonds present in the peptide, n hb (c,d) Thenumbers of present H bonds associated with the peptide-crowder binding modes B1 and B2 (seethe supplementary material, Fig. S2), respectively. The TICA lag time is set to τ cm = 20 × MCsweeps. both cases. In the GB1 system, TIC0 is correlated not only with folding but also with doublebinding (Fig. 3(c,d)). By contrast, in the BPTI system, the correlation between TIC0 andthe binding coordinate is weak (Fig. 2(c)).To sum up, the results of this section show that TICA provides useful coordinates fordescribing the free energy of the peptide in the diﬀerent systems. Using a few slow TICAcoordinates, the main free-energy minima can be identiﬁed.10 . Dynamics

TICA provides a ﬁrst approximation of the slow modes. For a more detailed investigationof the dynamics of the peptide in our simulations with crowders, MSMs are constructed asdescribed in Sec. II D, for a range of lag times τ tm . Relaxation times are estimated by twomethods: (i) from MSM eigenvalues (Eq. 1), and (ii) by ﬁts to autocorrelation data for MSMeigenfunctions (Sec. II E). Illustrations of how the main MSM eigenfunctions are related tothe TICA modes discussed above can be found in the supplementary material (Figs. S3–S6).Figure 4(a) shows estimates of the four longest relaxation times in the system with BPTIcrowders, as obtained by the above-mentioned methods. As expected, the eigenvalue-basedestimates have systematic errors for small lag times τ tm . To keep this error low, τ tm hasto be comparable to the timescale in question. The estimates based on autocorrelationanalysis depend, by contrast, only very weakly on τ tm . This behavior suggests that the truerelaxation times can be estimated from the MSM eigenfunctions even if τ tm is relatively small.Consistent with this, a further test shows that the shape of the slowest MSM eigenfunctiondepends only weakly on τ tm . Here, pairwise overlaps (see Sect. II E) were computed betweenvariants of this function obtained for diﬀerent τ tm . The overlap was ≥ τ tm .Figure 4(b) compares the raw autocorrelation functions for the two slowest MSM eigen-functions to those for the folding and binding coordinates studied in Figs. 2(b) and 2(c),respectively. One observation that can be made is that the autocorrelations of the foldingand binding coordinates, not unexpectedly, show clear deviations from single-exponentialbehavior at small τ . The MSM eigenfunctions are, as intended by construction, much closerto single exponential, which facilitates the extraction of relaxation times.Another observation from Fig. 4(b) is that, except at small τ , the autocorrelations ofthe ﬁrst MSM eigenfunction and the folding coordinate decay at very similar rates. A closerelationship between these two functions is indeed suggested from a comparison of Figs. 2(b)and S4(a) (see the supplementary material). This conclusion is further strengthened by theiroverlap (about 0.88). The autocorrelation function for the second MSM eigenfunction some-what resembles that for the binding coordinate (Fig. 4(b)), but the overlap is not very large(about 0.36); the binding coordinate overlaps signiﬁcantly with other MSM eigenfunctions aswell. Thus, while the second eigenfunction probably is related to binding, that relationship11

100 200 300Lag time, τ tm /10 MC sweeps0200400600800 T i m e s c a l e /10 M C s w ee p s (a) AutocorrelationsMSM eigenvalues τ /10 MC sweeps0 . . A u t o c o rr e l a t i o n , C ( τ ) (b) FoldingBindingMSM eigenfunction 1MSM eigenfunction 2

FIG. 4. Long-time dynamics of GB1m3 in the presence of BPTI crowders. Shaded areas indicatestatistical 1 σ errors. (a) Estimates of the four longest relaxation times, as obtained using MSMeigenvalues (Eq. 1; blue curves) and autocorrelation analysis (Sec. II E; red curves). The data areplotted against the lag time τ tm of the MSM transition matrix. The second and third longesttimescales are very similar. In building the MSMs, data were clustered in the space spanned bythe four slowest TICA modes (using τ cm = 10 MC sweeps), into 800 clusters. (b) Autocorrelationfunctions, C ( τ ) , for the two slowest MSM eigenfunctions ( τ tm = 25 × MC sweeps), the foldingvariable n hb (Fig. 2(b)), and the binding variable studied in Fig. 2(c). is not fully captured by the binding coordinate.Figure 5 shows data from our simulations with GB1 crowders. The statistical uncertain-ties are larger for this system. The main reason for this is that transitions to and fromfree-energy minimum V (Fig. 3(a)), where the peptide simultaneously binds two crowdermolecules, occur only rarely in the simulations. Nevertheless, after increasing the number ofruns from 16 for the BPTI system to 62, our total data set contains about 30 independentvisits to this minimum, and some clear trends can be seen. The estimated relaxation timesfollow the same pattern as with BPTI crowders; the estimates based on MSM eigenvaluesconverge only slowly with increasing τ tm , whereas those based on autocorrelation analysisare essentially constant down to small τ tm (Fig. 5(a)). However, in the GB1 system, theﬁrst MSM eigenfunction is more closely linked to binding than to folding. To show this, abinary function sensitive to simultaneous binding of the peptide to two crowder moleculesis calculated. Speciﬁcally, this function is deﬁned as χ b = χ χ , where χ i is unity if atleast three of the H bonds associated with binding mode i (see the supplementary mate-rial, Fig. S2) are present, and χ i = 0 otherwise. Figure 5(b) shows autocorrelation data12

100 200 300 400Lag time, τ tm /10 MC sweeps02505007501000 T i m e s c a l e /10 M C s w ee p s (a) AutocorrelationsMSM eigenvalues τ /10 MC sweeps0 . . A u t o c o rr e l a t i o n , C ( τ ) (b) FoldingDouble binding MSM eigenfunction 1MSM eigenfunction 2

FIG. 5. Long-time dynamics of GB1m3 in the presence of GB1 crowders. Shaded areas indicatestatistical 1 σ errors. (a) Estimates of the four longest relaxation times, as obtained using MSMeigenvalues (Eq. 1; blue curves) and autocorrelation analysis (Sec. II E; red curves). The data areplotted against the lag time τ tm of the MSM transition matrix. In building the MSMs, data wereclustered in the space spanned by the three slowest TICA modes (using τ cm = 20 × MC sweeps),into 1574 clusters. (b) Autocorrelation functions, C ( τ ) , for the two slowest MSM eigenfunctions( τ tm = 25 × MC sweeps), the folding variable n hb (Fig. 3(b)), and the binding variable χ b (seetext). For clarity, statistical errors are shown only for one of the four functions. The statisticaluncertainties are somewhat larger for the binding variable χ b than they are for the other threefunctions. for the two slowest MSM eigenfunctions, the folding coordinate ( n hb ), and the function χ b .The n hb and χ b functions are natural candidates for the slowest modes, since they are bothhighly correlated with TIC0. It turns out that the autocorrelation function of χ b decaysslower than that of n hb , and at a rate comparable to that for the ﬁrst MSM eigenfunction(Fig. 5(b)). Consistent with this, the ﬁrst MSM eigenfunction has a larger overlap with thebinding function χ b (about 0.76) than it has with the folding coordinate (about 0.44).Finally, we compute and compare the folding and unfolding rates of the peptide, k f and k u ,in the three simulated environments. To this end, we determine the native-state probability, P n (with the peptide being deﬁned as native if n hb ≥ ), and the apparent folding/unfoldingrate, k = k f + k u . The rate k is obtained by a ﬁt to autocorrelation data for the foldingcoordinate n hb (Fig. 6). Knowing k and P n and assuming a simple folded/unfolded two-statebehavior, k f and k u can be computed ( k f = kP n , k u = k − k f ). Our data for P n , k , k f and k u are summarized in Table I. The BPTI crowders cause a considerable stabilization of the13

200 400 600 800 1000 τ /10 MC sweeps0 . . A u t o c o rr e l a t i o n , C ( τ ) FreeBPTIGB1

FIG. 6. Autocorrelation function, C ( τ ) , for the folding variable n hb (the number of native Hbonds present in the peptide), as obtained without crowders, with BPTI crowders and with GB1crowders. Table I shows apparent folding rates k obtained by exponential ﬁts to the data. Shadedareas indicate statistical 1 σ errors. peptide (increased P n ), and a marked decrease in k . The decrease in k can be attributed toa lower k u ; no signiﬁcant change in k f is observed. With GB1 crowders, a similar patternis observed, although the stabilization of the peptide is much weaker in this case. Again, amarkedly reduced k u is observed, whereas the change in k f is smaller. Therefore, in both theBPTI and GB1 simulations, the peptide seems to interact more eﬃciently with the crowderswhen folded than when unfolded. At the same time, the peptide-crowder interaction isdiﬀerent in character in the BPTI and GB1 cases (see above). Note therefore that thefolding of the peptide to its native state entails the formation of both β -sheet structure anda hydrophobic side-chain cluster, which may enhance the interaction with GB1 and BPTI,respectively. IV. DISCUSSION AND SUMMARY

In this article, we have analyzed the interplay between peptide folding and peptide-crowder interactions in MC simulations of the GB1m3 peptide with protein crowders, usingTICA and MSM techniques. A common major advantage of these methods is that they canbe used to search for key coordinates of complex systems in an unsupervised manner. Weused the simpler TICA method to explore the free-energy landscape of the peptide. Using afew slow TICA coordinates, it was possible to identify the major free-energy minima of the14

ABLE I. Folding and unfolding rates of the GB1m3 peptide, k f and k u , in our three simulatedsystems. The rates are computed from the apparent rate constant k = k f + k u and the native-stateprobability, P n . The peptide is taken as native if at least three native H bonds are present, and k is obtained by ﬁts to the data in Fig. 6. Rates are in units of ( MC sweeps) − .System P n k k f k u No crowders . ± .

01 3 . ± . . ± . . ± . BPTI crowders . ± .

02 1 . ± . . ± . . ± . GB1 crowders . ± .

01 2 . ± . . ± . . ± . peptide in the presence of the crowders.In order to quantitatively analyze the dynamics of the peptide in the simulations, webuilt MSMs. MSMs oﬀer a convenient method for estimating relaxation times, from theeigenvalues via Eq. 1. However, this method is subject to well-known systematic uncertain-ties. In particular, it assumes eﬀectively Markovian dynamics, which, at a given level ofcoarse graining, need not hold for small lag times τ tm . Unfortunately, in our systems, τ tm had to be comparable to the relaxation time in question to keep the systematic error low.Instead, we therefore estimated relaxation times by a procedure based on ﬁts to autocorrela-tion data for the MSM eigenfunctions. The estimates obtained this way show essentially no τ tm -dependence. This robustness suggests that the calculated MSM eigenfunctions maintainsigniﬁcant overlaps with the respective true eigenfunctions down to the smallest τ tm valuesused.It is, of course, also possible to estimate relaxation times from autocorrelation data forother functions than the MSM eigenfunctions. However, the autocorrelation of a generalfunction is a multi-exponential whose parameters may be statistically challenging to de-termine. The autocorrelation of an MSM eigenfunction should, by contrast, be close tosingle-exponential over a range of τ , if this eigenfunction approximates the true eigenfunc-tion suﬃciently well (at low and high τ , deviations will occur, since the approximation isnot perfect). The autocorrelations of our MSM eigenfunctions showed this behavior, andrelaxation times could therefore be estimated by single-exponential ﬁts in an intermediaterange of τ (where . < C ( τ ) < . ). If general functions rather than the MSM eigenfunc-15ions had been used, our possibilities to estimate relaxation times would have been muchmore limited.Our simulations further suggest that the GB1m3 peptide interacts more eﬃciently withboth BPTI and GB1 when folded than when unfolded. The addition of either of the crowdersled to a reduced unfolding rate k u , while the change in the folding rate k f was smaller,especially with BPTI crowders. SUPPLEMENTARY MATERIAL

See supplementary material for illustrations of (i) the free energy of GB1m3 with GB1crowders as a function of the TIC0 and TIC1 coordinates (Fig. S1), (ii) the preferred GB1m3-GB1 binding modes (Fig. S2), and (iii) the character of the leading MSM eigenfunctions inthe diﬀerent systems (Figs. S3–S6).

ACKNOWLEDGMENTS

This work was in part supported by the Swedish Research Council (Grant no. 621-2014-4522) and the Swedish strategic research program eSSENCE. The simulations were per-formed on resources provided by the Swedish National Infrastructure for Computing (SNIC)at LUNARC, Lund University, Sweden, and Jülich Supercomputing Centre, Forschungszen-trum Jülich, Germany.

REFERENCES I. Guzman, H. Gelman, J. Tai, and M. Gruebele, J. Mol. Biol. , 11 (2014). W. B. Monteith, R. D. Cohen, A. E. Smith, E. Guzman-Cisneros, and G. J. Pielak, Proc.Natl. Acad. Sci. USA , 1739 (2015). J. Danielsson, X. Mu, L. Lang, H. Wang, A. Binolﬁ, F.-X. Theillet, B. Bekei, D. T. Logan,P. Selenko, H. Wennerström, and M. Oliveberg, Proc. Natl. Acad. Sci. USA , 12402(2015). S. R. McGuﬀee and A. H. Elcock, PLoS Comput. Biol. , e1000694 (2010). M. Feig and Y. Sugita, J. Phys. Chem. B , 599 (2012). A. V. Predeus, S. Gul, S. M. Gopal, and M. Feig, J. Phys. Chem. B , 8610 (2012).16

B. Macdonald, S. McCarley, S. Noeen, and A. E. van Giessen, J. Phys. Chem. B ,2956 (2015). A. Bille, B. Linse, S. Mohanty, and A. Irbäck, J. Chem. Phys. , 175102 (2015). I. Yu, T. Mori, T. Ando, R. Harada, J. Jung, Y. Sugita, and M. Feig, eLife , 18457(2016). M. Feig, I. Yu, P.-h. Wang, G. Nawrocki, and Y. Sugita, J. Phys. Chem. B , 8009(2017). S. Qin and H.-X. Zhou, Curr. Opin. Struct. Biol. , 28 (2017). C. Schütte, A. Fischer, W. Huisinga, and P. Deuﬂhard, J. Comput. Phys. , 146 (1999). J. D. Chodera, N. Singhal, V. S. Pande, K. A. Dill, and W. C. Swope, J. Chem. Phys. , 155101 (2007). N.-V. Buchete and G. Hummer, J. Phys. Chem. B , 6057 (2008). G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, J. Chem. Phys. ,124101 (2009). J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte,and F. Noé, J. Chem. Phys. , 174105 (2011). J. D. Chodera and F. Noé, Curr. Opin. Struct. Biol. , 135 (2014). F. Noé and C. Clementi, Curr. Opin. Struct. Biol. , 141 (2017). L. Molgedey and H. G. Schuster, Phys. Rev. Lett. , 3634 (1994). Y. Naritomi and S. Fuchigami, J. Chem. Phys. , 215102 (2013). C. R. Schwantes and V. S. Pande, J. Chem. Theory Comput. , 2000 (2013). G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, J. Chem. Phys. , 015102 (2013). R. M. Fesinmeyer, F. M. Hudson, and N. H. Andersen, J. Am. Chem. Soc. , 7238(2004). E. Moses and H.-J. Hinz, J. Mol. Biol. , 765 (1983). A. M. Gronenborn, D. R. Filpula, N. Z. Essig, A. Achari, M. Whitlow, P. T. Wingﬁeld,and G. M. Clore, Science , 657 (1991). A. Bille, S. Mohanty, and A. Irbäck, J. Chem. Phys. , 175105 (2016). A. Vendeville, D. Larivière, and E. Fourmentin, FEMS Microbiol. Rev. , 395 (2011). A. Irbäck, S. Mitternacht, and S. Mohanty, BMC Biophys. , 2 (2009).17 S. Mitternacht, S. Luccioli, A. Torcini, A. Imparato, and A. Irbäck, Biophys. J. , 429(2009). S. Æ. Jónsson, S. Mohanty, and A. Irbäck, Proteins , 2169 (2012). S. Mohanty, J. H. Meinke, and O. Zimmermann, Proteins , 1446 (2013). A. Bille, S. Æ. Jónsson, M. Akke, and A. Irbäck, J. Phys. Chem. B , 9194 (2013). S. Æ. Jónsson, S. Mitternacht, and A. Irbäck, Biophys. J. , 2725 (2013). J. Petrlova, A. Bhattacherjee, W. Boomsma, S. Wallin, J. O. Lagerstedt, and A. Irbäck,Protein Sci. , 1559 (2014). G. Tiana, L. Sutto, and R. A. Broglia, Physica A , 241 (2007). G. Favrin, A. Irbäck, and F. Sjunnesson, J. Chem. Phys. , 8154 (2001). A. Irbäck and S. Mohanty, J. Comput. Chem. , 1548 (2006). S. Lloyd, IEEE Trans. Inf. Theory , 129 (1982). S. Kube and M. Weber, J. Chem. Phys. , 024103 (2007). N. Djurdjevac, M. Sarich, and C. Schütte, Multiscale Model. Simul. , 61 (2012). J.-H. Prinz, J. D. Chodera, and F. Noé, Phys. Rev. X , 011020 (2014). M. K. Scherer, B. Trendelkamp-Schroer, F. Paul, G. Pérez-Hernández, M. Hoﬀmann,N. Plattner, C. Wehmeyer, J.-H. Prinz, and F. Noé, J. Chem. Theory Comput. , 5525(2015). M. Seeber, A. Felline, F. Raimondi, S. Muﬀ, R. Friedman, F. Rao, A. Caﬂisch, andF. Fanelli, J. Comput. Chem. , 1183 (2010). X. Biarnés, F. Pietrucci, F. Marinelli, and A. Laio, Comput. Phys. Commun. , 203(2012). M. P. Harrigan, M. M. Sultan, C. X. Hernández, B. E. Husic, P. Eastman, C. R. Schwantes,K. A. Beauchamp, R. T. McGibbon, and V. S. Pande, Biophys. J.112