[PDF] A Coarse-Grained Model for Predicting RNA Folding Thermodynamics

Abstract

We present a thermodynamically robust coarse-grained model to simulate folding of RNA in monovalent salt solutions. The model includes stacking, hydrogen bond and electrostatic interactions as fundamental components in describing the stability of RNA structures. The stacking interactions are parametrized using a set of nucleotide-specific parameters, which were calibrated against the thermodynamic measurements for single-base stacks and base-pair stacks. All hydrogen bonds are assumed to have the same strength, regardless of their context in the RNA structure. The ionic buffer is modeled implicitly, using the concept of counterion condensation and the Debye-Hückel theory. The three adjustable parameters in the model were determined by fitting the experimental data for two RNA hairpins and a pseudoknot. A single set of parameters provides good agreement with thermodynamic data for the three RNA molecules over a wide range of temperatures and salt concentrations. In the process of calibrating the model, we establish the extent of counterion condensation onto the single-stranded RNA backbone. The reduced backbone charge is independent of the ionic strength and is 60% of the RNA bare charge at 37 degrees Celsius. Our model can be used to predict the folding thermodynamics for any RNA molecule in the presence of monovalent ions.

Full PDF

aa r X i v : . [ q - b i o . B M ] A p r A Coarse-Grained Model for Predicting RNA FoldingThermodynamics

Natalia A. Denesyuk and D. Thirumalai ∗ Department of Chemistry and Biochemistry and Biophysics Program, Institute for PhysicalScience and Technology, University of Maryland, College Park, Maryland 20742

E-mail: [email protected]

Tel.: 301-405-4803Fax: 301-314-9404 ∗ To whom correspondence should be addressed bstract We present a thermodynamically robust coarse-grained model to simulate folding of RNAin monovalent salt solutions. The model includes stacking, hydrogen bond and electrostaticinteractions as fundamental components in describing the stability of RNA structures. Thestacking interactions are parametrized using a set of nucleotide-speciﬁc parameters, whichwere calibrated against the thermodynamic measurements for single-base stacks and base-pairstacks. All hydrogen bonds are assumed to have the same strength, regardless of their contextin the RNA structure. The ionic buffer is modeled implicitly, using the concept of counterioncondensation and the Debye-Hückel theory. The three adjustable parameters in the modelwere determined by ﬁtting the experimental data for two RNA hairpins and a pseudoknot.A single set of parameters provides good agreement with thermodynamic data for the threeRNA molecules over a wide range of temperatures and salt concentrations. In the processof calibrating the model, we establish the extent of counterion condensation onto the single-stranded RNA backbone. The reduced backbone charge is independent of the ionic strengthand is 60% of the RNA bare charge at 37 ◦ C. Our model can be used to predict the foldingthermodynamics for any RNA molecule in the presence of monovalent ions.

Keywords: Hairpin, pseudoknot, melting temperature, ionic buffer2 ntroduction

Since the landmark discovery that RNA molecules can act as enzymes, an increasing repertoireof cellular functions has been associated with RNA, raising the need to understand how these com-plex molecules fold into elaborate tertiary structures. In response to this challenge, great strideshave been made in describing RNA folding. Single molecule and ensemble experiments usinga variety of biophysical methods, combined with theoretical techniques, have led to a conceptualframework for interpreting the thermodynamics and kinetics of RNA folding.

Despite theseadvances, there are very few reliable structural models with the ability to quantitatively predict thethermodynamic properties of RNA (see, however, refs. 11–17). The development of simple andaccurate models is complicated by the interplay of several energy and length scales, which arisefrom stacking, hydrogen bond and electrostatic interactions. Although multiple interactions con-tribute to the stability of RNA, the most vexing of these are the electrostatic interactions, since thenegatively charged phosphate groups make RNA a strongly charged polyelectrolyte. Because ofthe strong intramolecular Coulomb repulsion, the magnitude of the charge on the phosphate groupshas to be reduced in order for RNA molecules to fold. The softening of repulsion between the phos-phate groups requires the presence of counterions. A number of factors such as the Debye length,the Bjerrum length, the number of nucleotides in RNA, as well as the size, valence and shapeof counterions modulate electrostatic interactions, which further complicates the prediction ofRNA folding thermodynamics.In principle, all-atom simulations of RNA in water provide a straightforward route to com-puting RNA folding thermodynamics. However, uncertainties in nucleic acid force ﬁelds and thedifﬁculty in obtaining adequate conformational sampling have prevented routine use of all-atomsimulations to study the folding of even small RNA molecules. At the same time, the successof using polyelectrolyte theories and simulations in capturing many salient features of RNAfolding justiﬁes the development of coarse-grained (CG) models. None of the existing CG modelsof RNA, which have been remarkably successful in a variety of applications, have been usedto reproduce folding thermodynamics over a wide range of ion concentrations and temperature. In3his paper, we introduce a force ﬁeld based on a CG model in which each nucleotide is representedby three interactions sites (TIS) — a phosphate, a sugar and a base. The TIS force ﬁeld includesstacking, hydrogen bond and electrostatic interactions that are known to contribute signiﬁcantlyto the stability of RNA structures. We obtain the thermodynamic parameters for the stacking andhydrogen bond interactions by matching the simulation and experimental melting data for variousnucleotide dimers and for the pseudoknot from mouse mammary tumor virus (MMTV PK in Fig-ure 1). Our description of the electrostatic interactions in RNA relies on the concept of counterioncondensation, which posits that counterions condense onto the sugar-phosphate backbone and par-tially reduce the charge on each phosphate group. Our simulations provide a way to determine themagnitude of the reduced backbone charge by ﬁtting the experimental data for the ion-dependentstability of RNA hairpins (L8 and L10 in Figure 1). Remarkably, experimental data on foldingthermodynamics of the MMTV PK, L8, and L10 are reproduced well over a wide range of temper-atures and concentrations of monovalent salt using a single set of force ﬁeld parameters. Our CGforce ﬁeld is transferable, and hence can be adopted for other RNA molecules as well.

Methods

Three Interaction Site (TIS) Representation of RNA

In the TIS model, each nucleotide is replaced by three spherical beads P, S and B, representing aphosphate, a sugar and a base (Figure 2). The coarse-grained beads are at the center of mass of thechemical groups. The energy function in the TIS model, U TIS , has the following six components, U TIS = U BL + U BA + U EV + U ST + U HB + U EL , (1)which correspond to bond length and angle constraints, excluded volume repulsions, single strandbase stacking, inter-strand hydrogen bonding and electrostatic interactions. We constrain bondlengths, r , and angles, a , by harmonic potentials, U BL ( r ) = k r ( r − r ) and U BA ( a ) = k a ( a − ) , where the equilibrium values r and a are obtained by coarse-graining an ideal A-formRNA helix. The values of k r , in units of kcal mol − Å − , are: 64 for an S → P bond, 23 for anP → S bond (“ → ” indicates the downstream direction) and 10 for an S − B bond. The values of k a are 5 kcal mol − rad − if the angle involves a base, and 20 kcal mol − rad − otherwise.Excluded volume between the interacting sites is modeled by a Weeks-Chandler-Andersen(WCA) potential, U EV ( r ) = e "(cid:18) D r (cid:19) − (cid:18) D r (cid:19) + , r ≤ D , U EV ( r ) = , r > D , (2)which has been commonly used to study excluded volume effects in ﬂuids. The precise form of U EV ( r ) will not affect the results as long as U EV ( r ) is short-ranged. The WCA potential is com-putationally efﬁcient because it vanishes exactly beyond the contact distance D . To allow closeapproach between two bases that stack ﬂat one on top of another, we assume D = . e = D and e for all interacting sites. We note that the speciﬁc choice of parameters in Eq. (2)has little effect on the results obtained. In our simulations, stable folds are sustained by stack-ing and hydrogen bond interactions, U ST and U HB , which are parameterized using experimentalthermodynamic data and accurate approach distances between various RNA groups (see below). Stacking Interactions

Single strand stacking interactions, U ST , are applied to any two consecutive nucleotides along thechain, U ST = U + k r ( r − r ) + k f ( f − f , ) + k f ( f − f , ) , (3)5here r , f and f are deﬁned in Figure 2. Sixteen distinct nucleotide dimers are modeled withdifferent r , f , , f , and U . The structural parameters r , f , and f , are obtained by coarse-graining an A-form RNA helix. To estimate standard deviations of r and f , f from the corre-sponding values in a A-form helix, we used double helices in the NMR structure of the pseudoknotfrom human telomerase RNA (PDB code 2K96). We chose this pseudoknot because it has twofairly long stems containing six and nine Watson-Crick base pairs. We had previously conductedsimulations of the two stems at 15 ◦ C in the limit of high ionic strength and found that, for k r = . − and k f = − , the time averages of ( r − r ) , ( f − f , ) and ( f − f , ) agreedwell with the standard deviations computed from the NMR structure. The time averages were notvery sensitive to a speciﬁc choice of U . Using k r = . − and k f = − , we derive U from available thermodynamic measurements of single-stranded and double-stranded RNA, as described below. Thermodynamic parameters of dimers from experiments

In the nearest neighbor model of RNA duplexes, the total stability of a duplex is given by a sum ofsuccessive contributions D G (cid:0) x − yw − z (cid:1) , where x − y denotes a base pair stacked over the preceding basepair w − z . The enthalpy, D H , and entropy, D S , components of D G (cid:0) x − yw − z (cid:1) are known experimentallyat 1 M salt concentration. Here, we make the following assumptions: D H (cid:18) x − yw − z (cid:19) = D H (cid:18) xw (cid:19) + D H (cid:18) zy (cid:19) + . D H ( w − z ) + . D H ( x − y ) , D S (cid:18) x − yw − z (cid:19) = D S (cid:18) xw (cid:19) + D S (cid:18) zy (cid:19) , (4)where D H (cid:0) xw (cid:1) and D S (cid:0) xw (cid:1) are the thermodynamic parameters associated with stacking of x over w along 5 ′ → ′ in one strand. Additional enthalpy gain D H ( w − z ) arises from hydrogen bondingbetween w and z in complementary strands. Our goal is to solve Eqs. (4) for D H (cid:0) xw (cid:1) , D S (cid:0) xw (cid:1) and D H ( w − z ) . Since the number of unknowns exceeds the number of equations, we have to makesome additional assumptions. 6e average the thermodynamic parameters on the left-hand side of Eqs. (4) for stacks (cid:0) U − AA − U (cid:1) and (cid:0) A − UU − A (cid:1) , (cid:0) A − UC − G (cid:1) and (cid:0) U − AG − C (cid:1) , and (cid:0) U − AC − G (cid:1) and (cid:0) A − UG − C (cid:1) , because these values are similar withinexperimental uncertainty. This allows us to assign D H (cid:0) xw (cid:1) = D H (cid:0) wx (cid:1) and D S (cid:0) xw (cid:1) = D S (cid:0) wx (cid:1) on theright-hand side of Eqs. (4) for all dimers, except for (cid:0) CG (cid:1) and (cid:0) GC (cid:1) . Additional simpliﬁcations resultfrom the analysis of experimental data on stacking of nucleotide dimers. Experiments indicatethat dimers (cid:0) AA (cid:1) , (cid:0) UA (cid:1) and (cid:0) CA (cid:1) have similar stacking propensities and can therefore be described byone set of thermodynamic parameters. The same holds for (cid:0) CC (cid:1) and (cid:0) UC (cid:1) .The melting temperature of dimer (cid:0) AA (cid:1) is known from experiment, T m = ◦ C. Accordingto the assumptions above, dimer (cid:0) UA (cid:1) has the same melting temperature. Combining Eqs. (4) for (cid:0) U − AA − U (cid:1) and the relationship D H (cid:0) UA (cid:1) = k B T m D S (cid:0) UA (cid:1) , where T m =

299 K and k B is the Boltzmannconstant, we can solve for D H (cid:0) UA (cid:1) , D S (cid:0) UA (cid:1) and D H ( A − U ) . By assigning D H (cid:0) AA (cid:1) = D H (cid:0) UA (cid:1) and D S (cid:0) AA (cid:1) = D S (cid:0) UA (cid:1) in Eqs. (4) for (cid:0) A − UA − U (cid:1) , we solve for D H (cid:0) UU (cid:1) and D S (cid:0) UU (cid:1) . Finally, we assume D H (cid:18) UC (cid:19) = k D H (cid:18) UA (cid:19) + ( − k ) D H (cid:18) UU (cid:19) , D S (cid:18) UC (cid:19) = k D S (cid:18) UA (cid:19) + ( − k ) D S (cid:18) UU (cid:19) , (5)where k is a constant. This assumption is based on the observation that the measured enthalpychanges of duplex formation, D H (cid:0) x − yw − z (cid:1) in Eqs. (4), are approximately in proportion to the corre-sponding entropy changes, D S (cid:0) x − yw − z (cid:1) . Furthermore, from previous assumptions, the melting tem-perature of dimer (cid:0) UC (cid:1) should match the T m of (cid:0) CC (cid:1) , which is known experimentally to be 13 ◦ C. Using this result and Eqs. (5), we obtain D H (cid:0) UC (cid:1) and D S (cid:0) UC (cid:1) .The enthalpies of hydrogen bond formation between Watson-Crick base pairs are related as D H ( G − C ) = / D H ( A − U ) , where D H ( A − U ) = − .

47 kcal/mol is the result of the calculationoutlined above. The remaining thermodynamic parameters follow directly from Eqs. (4) with nofurther approximations. The results are summarized in Table 1. The relative stacking propensitiesof dimers in Table 1 are consistent with experiments. hermodynamic parameters of dimers from simulations To calibrate the model, we simulated stacking of coarse-grained dimers similar to that shown inFigure 2. We used the stacking potential U ST in Eq. (3) with U = − h + k B ( T − T m ) s , where T (K)is the temperature, T m (K) is the melting temperature given in Table 1, and h and s are adjustableparameters. In simulations, we computed the stability D G of stacked dimers at temperature T using D G = − k B T ln p + k B T ln ( − p ) + D G , (6)where p is the fraction of all sampled conﬁgurations for which U ST < − k B T (Figure 3). Thecorrection D G in Eq. (6) is assumed to be constant for all dimers and accounts for any differencesin the deﬁnition of D G between experiments and simulations.Figure 3 shows the simulation values of D G for the dimer (cid:0) GA (cid:1) , as a function of T . At D G = s =

0, the melting temperature T m of (cid:0) GA (cid:1) , computed using D G ( T m ) =

0, increases with h andequals T m in Table 1 when h = .

98 kcal/mol. If s =

0, the entropy loss of stacking, given bythe slope of D G ( T ) over T , is smaller than the value of D S speciﬁed in Table 1. To rectify thisdiscrepancy we take U = − . + k B ( T − T m ) s with s >

0, which does not alter T m but allows usto adjust the slope of D G ( T ) by adjusting the value of s . We ﬁnd that s = .

30 is consistent with D S of (cid:0) GA (cid:1) in Table 1.We carried out the same ﬁtting procedure for all coarse-grained dimers. The resulting param-eters U are summarized in Table 2 for D G = . D G ≈ k B T at room temperature).This value of D G gives the best agreement between simulation and experiment (see also Resultsand Discussion). Note that, although some stacks have equivalent thermodynamic parameters inTable 1, they have somewhat different U due to their geometrical differences.Finally, the parameters U in Table 2 are coupled to the speciﬁc choice of k r and k f in Eq. (3),since these coefﬁcients determine how much entropy is lost upon formation of a model stack. Forany reasonable choice of k r and k f , the coarse-grained simulation model without explicit solventwill require correction factors s to match the experimental D S . If different values of k r and k f U ( h and s ) are alsoreadjusted following the ﬁtting procedure outlined above. Hydrogen Bond Interactions

To model the RNA structures shown in Figure 1, we use coarse-grained hydrogen bond interac-tions U HB which mimic the atomistic hydrogen bonds present in the folded structure. The atom-istic structures of hairpins L8 and L10 have not been determined experimentally. We assume thatthe only hydrogen bonds stabilizing these hairpins come from six Watson-Crick base pairs in thehairpin stem. The NMR structure for the MMTV PK is available (PDB code 1RNK ). For theMMTV PK, we generated an optimal network of hydrogen bonds by submitting the NMR structureto the WHAT IF server at http://swift.cmbi.ru.nl . Each hydrogen bond is modeled bya coarse-grained interaction potential, U HB = U × (cid:2) + ( r − r ) + . ( q − q , ) + . ( q − q , ) + . ( y − y ) + . ( y − y , ) + . ( y − y , ) (cid:3) − , (7)where r , q , q , y , y and y are deﬁned in Figure 4 for different coarse-grained sites. For Watson-Crick base pairs, the equilibrium values r , q , , q , , y , y , and y , are adopted from the coarse-grained structure of an ideal A-form RNA helix. For all other bonds, the equilibrium parametersare obtained by coarse-graining the PDB structure of the RNA molecule. Our approach assumesthat an A-form helix is an equilibrium state for RNA canonical secondary structure. Modeling ofnon-canonical base pairing and of the tertiary interactions is biased to the native structure. Thecoefﬁcients 5, 1.5 and 0.15 in Eq. (7) were determined from the same simulations as k r and k f in Eq. (3). Equation (7) speciﬁes U HB for a single hydrogen bond and it must be multiplied bya factor of 2 or 3 if the same coarse-grained sites are connected by multiple bonds (as in basepairing). The geometry of U HB in Eq. (7) is the minimum necessary to maintain stable helices inthe coarse-grained model. In particular, simulations of the MMTV PK (Figure 1) at 10 ◦ C yield9he RMS deviation from the NMR structure of 1.4 and 2.0 Å for stems 1 and 2, respectively.In the present implementation of the model, the only hydrogen bonds included in simulationare those that are found in the PDB structure of the RNA molecule. However, large RNA moleculesmay have alternative patterns of secondary structure that are sufﬁciently stable to compete with thenative fold. To account for this possibility, we have developed an extended version of the modelwhere we allow the formation of any G − C, A − U or G − U base pair. Although easily implemented,this additional feature makes simulations signiﬁcantly less efﬁcient due to a large number of basepairing possibilities. A description of the extended model and its implementation for large RNAwill be reported separately. For small RNA molecules, similar to the ones considered here, we ﬁndthat the folding thermodynamics is largely unaffected by the inclusion of alternative base pairing.

Electrostatic interactions

To model electrostatic interactions, we employ the Debye-Hückel approximation combined withthe concept of counterion condensation, which has been used previously to determine the re-duced charge on the phosphate groups in RNA. The highly negatively charged RNA attractscounterions, which condense onto the sugar-phosphate backbone. The loss in translational entropyof a bound ion (in the case of spherical counterions) is compensated by an effective binding energybetween the ion and RNA, thus making counterion condensation favorable. Upon condensationof counterions onto the RNA molecule, the charge of each phosphate group decreases from − e to − Qe , where Q < e is the proton charge. The uncondensed mobile ions are described by thelinearized Poisson-Boltzmann (or Debye-Hückel) equation. It can be shown that the electrostaticfree energy of this system is given by G DH = Q e e (cid:229) i , j exp (cid:0) −| r i − r j | / l (cid:1) | r i − r j | , (8)where | r i − r j | is the distance between two phosphates i and j , e is the dielectric constant of waterand l is the Debye-Hückel screening length. The value of the Debye length l must be calculated10ndividually for each buffer solution using l − = pe k B T (cid:229) n q n r n , (9)where q n is the charge of an ion of type n and r n is its number density in the solution. Ifevaluated in units of Å − , the number density r is related to the molar concentration c through r = . × − c . In the simulation model, the free energy G DH is viewed as the effective energyof electrostatic interactions between RNA phosphates, U EL = G DH , and as such it contributes anextra term to the energy function in Eq. (1). This implicit inclusion of the ionic buffer signiﬁcantlyspeeds up simulations, leading to much enhanced sampling of RNA conformations.To complete our description of U EL , we still need to deﬁne the magnitude of the phosphatecharge Q . For rod-like polyelectrolytes in monovalent salt solutions, Manning’s theory of counte-rion condensation predicts Q = Q ∗ ( T ) = bl B ( T ) , (10)where b is the length per unit (bare) charge in the polyelectrolyte and l B is the Bjerrum length, l B = e e k B T . (11)According to Eq. (10), the reduced charge Q ( Q = c of monovalent salt. The dependence of Q on T is nonlinear,since the dielectric constant of water decreases with the temperature, e ( T ) = . − . T + . × − T − . × − T , (12)where T is in ◦ C.We estimate b in Eq. (10) from available folding data for hairpins L8 and L10, which were mea-sured extensively in monovalent salt solutions of different ionic strength. We ﬁnd that b = .

4Å reproduces measured stabilities of these hairpins over a wide range of salt concentrations. As-11uming b = . Q . If the density of the bare backbone chargeis slightly underestimated then Eq. (10) will predict a larger value of Q . Therefore, ﬁtting Q to experimental data allows us to compensate for small scale variations in the backbone chargedensity. Calculation of Stabilities

We are interested in calculating the stability D G of the RNA structures shown in Figure 1 as afunction of temperature T . However, the folded and unfolded states of RNA coexist only in anarrow range of T around the melting temperature. Thus, computing D G by means of directsampling of the folding/unfolding transition at any T is not feasible. Below we derive a formula for D G ( T ) from fundamental thermodynamic relationships that enables us to circumvent this problem.Consider the Gibbs free energy of the folded state, G f = H f − T S f . We can write the followingexact expressions for the enthalpy H f and entropy S f , H f ( T ) = H f ( T ∗ ) + Z TT ∗ ¶ H f ¶ T dT , f ( T ) = S f ( T ∗ ) + Z TT ∗ ¶ S f ¶ T dT , (13)where T ∗ is an arbitrary reference temperature. The derivatives in Eq. (13) can be expressed interms of the heat capacity C f , C f = ¶ H f ¶ T = T ¶ S f ¶ T , (14)so that G f ( T ) = H f ( T ∗ ) − T S f ( T ∗ ) + Z TT ∗ C f dT − T Z TT ∗ C f T dT . (15)If we assume that the heat capacity of the folded state does not change signiﬁcantly over thetemperature range of interest, Eq. (15) simpliﬁes to G f ( T ) = H f ( T ∗ ) − T S f ( T ∗ ) − C f (cid:18) T ∗ − T + T ln TT ∗ (cid:19) . (16)According to Eq. (16), we can deduce the free energy G f of the folded state at temperature T fromthe thermodynamic properties at some other temperature T ∗ . The same result holds for the freeenergy G u ( T ) of the unfolded state.In the analysis of two-state transitions, it is convenient to use the transition (melting) temper-ature T m as the reference temperature for both folded and unfolded states. Then, the free energydifference D G between the folded and unfolded states is given by D G ( T ) = D H ( T m ) (cid:18) − TT m (cid:19) − D C (cid:18) T m − T + T ln TT m (cid:19) , (17)where we have used D G ( T m ) =

0. Equation (17) is commonly used to determine RNA stabil-ity from calorimetry experiments, since it expresses D G ( T ) in terms of measured changes inenthalpy and heat capacity.In simulations, we calculate the stability D G ( T ) of the folded RNA as follows. For each RNAillustrated in Figure 1, we run a series of Langevin dynamics simulations at different tempera-tures T in the range from 0 to 130 ◦ C. Using the weighted histogram technique, we combine the13imulation data from all T to obtain the density of energy states, r ( E ) , which is independent oftemperature. The total free energy of the system, G ( T ) , is then given by G ( T ) = − k B T ln Z r ( E ) exp (cid:18) − Ek B T (cid:19) dE , (18)where the integral, representing the partition function, runs over all energy states. At low T , thepartition function in Eq. (18) is dominated by the folded conformations and therefore, G ( T ) ≈ G f ( T ) . This allows us to rewrite Eq. (16) as G f ( T ) = G ( T ∗ ) + ¶ G ¶ T ( T ∗ ) ( T − T ∗ ) + T ¶ G ¶ T ( T ∗ ) (cid:18) T ∗ − T + T ln TT ∗ (cid:19) , (19)where we take T ∗ = ◦ C to be the reference temperature for the folded state. To obtain Eq. (19),we used S = − ¶ G / ¶ T and C = − T ¶ G / ¶ T . Equation (19) can also be used to compute thefree energy of the unfolded state, G u ( T ) , if the reference temperature T ∗ is chosen such that G ( T ∗ ) ≈ G u ( T ∗ ) — for example, T ∗ = ◦ C. The stability D G ( T ) of the folded RNA is givenby a difference between G f ( T ) and G u ( T ) , as illustrated in Figure 5.The present calculation is an alternative to the commonly used order parameter method todetermine D G and can be applied to any folding/unfolding transition without further adjustments.Furthermore, in contrast to Eq. (17), our approach will still work in systems that do not exhibit atwo-state behavior, since it employs different reference temperatures for the folded and unfoldedstates. The only assumption is that at the reference temperature T ∗ for the folded (unfolded) statethe population of the unfolded (folded) state is negligible. At the reference temperatures chosen inour simulation, 0 ◦ C and 130 ◦ C, this assumption is trivially satisﬁed.The formalism described above, including the weighted histogram technique, assumes thatthe conformational energy E in Eq. (18) does not depend on temperature. However, the stackinginteractions in Eq. (3) and electrostatic interactions in Eq. (8) have T as a parameter. The stackingparameters U in Eq. (3) are linear in T , so we can write U ST = u + k B Tu , where u and u aretemperature independent. The Boltzmann factor of the second term, exp ( − u ) , does not contain14 and cannot affect the temperature dependence of thermodynamic quantities. In the data analysisusing weighted histograms, it is convenient to incorporate this Boltzmann factor into the density ofstates r ( E ) . Effectively, this means that the stacking interactions u can be omitted from the totalenergy E in Eq. (18) and from all ensuing formulas. The electrostatic interactions in Eq. (8) dependon T nonlinearly, since Q , e and l are all functions of T . However, we operate within a relativelynarrow range of temperatures — the thermal energy k B T is between 0.54 and 0.8 kcal/mol. Thisjustiﬁes expanding the electrostatic potential U EL up to the ﬁrst order in T , which then enables usto treat it similarly to U ST . We expand U EL around 55 ◦ C, in the middle of the relevant temperaturerange. We have checked that this linear expansion does not affect the numerical results reportedhere.

Langevin Dynamics Simulations

The RNA dynamics are simulated by solving the Langevin equation, which for bead i is m i ¨ r i = − g i ˙ r i + F i + f i , where m i is the bead mass, g i is the drag coefﬁcient, F i is the conservative force, and f i is the Gaussian random force, (cid:10) f i ( t ) f j ( t ′ ) (cid:11) = k B T g i d i j d ( t − t ′ ) . The bead mass m i is equal to thetotal molecular weight of the chemical group associated with a given bead. The drag coefﬁcient g i is given by the Stokes formula, g i = ph R i , where h is the viscosity of the medium and R i is the bead radius. To enhance conformational sampling, we take h = − Pa · s, which equalsapproximately 1% of the viscosity of water. The values of R i are 2 Å for phosphates, 2.9 Å forsugars, 2.8 Å for adenines, 3 Å for guanines and 2.7 Å for cytosines and uracils. The Langevinequation is integrated using the leap-frog algorithm with a time step D t = . Results and Discussion

There are three adjustable parameters in the model: the corrective constant D G in Eq. (6), thestrength of hydrogen bonds U in Eq. (7), and the length b , which deﬁnes the reduced phosphatecharge Q in Eq. (10). The absolute value of the correction D G should be relatively small, i.e.,15 D G | < D G and U . The physical meaning of the variable Q implies that Q < Q may depend on the speciﬁc RNA structure as well as the buffer properties,since both could determine the extent of counterion condensation. However, we ﬁnd that Q doesnot vary much for different monovalent salt buffers or different RNA. Calibration of D G and U The parameters D G and U were adjusted to match the differential scanning calorimetry meltingcurve (or heat capacity) of the MMTV PK at 1 M Na + (Figure 6). The list of all hydrogen bondsin the MMTV PK structure is given in 3. The secondary structure of the MMTV PK comprisesﬁve base pairs in stem 1 and six base pairs in stem 2 (Figure 1). The tertiary structure is limited tosingular hydrogen bonds and is not stable in the absence of Mg + ions. We ﬁnd that in simulations with c = Q =

1, or if we assume counterioncondensation, Q = Q ∗ ( T ) . This is not unexpected, since the electrostatic interactions are screenedat high salt concentration and do not contribute signiﬁcantly to the RNA stability. Therefore, wecan identify D G and U which are Q -independent.The measured heat capacity at c = D G = . U = .

43 kcal/mol (Figure 6). The model correctly describes the overall shapeof the melting curve, including two peaks that indicate the melting transitions of the two stems.Stem 1 in the MMTV PK is comprised entirely of G − C base pairs (Figure 1) and, despite beingshorter than stem 2, melts at a higher temperature. Although the melting temperature of stem 2 isreproduced very accurately in simulation, the melting temperature of stem 1 is somewhat under-estimated, 89 ◦ C instead of 95 ◦ C. We speculate that the failure to precisely reproduce both peaksis due to inaccurate estimates of the stacking parameters U at high temperatures. In addition toseveral approximations involved in the derivation of U , the experimental data that were used in16his derivation were obtained at 37 ◦ C or below. The approximation that enthalpies and entropiesare constant may not be accurate for large temperature extrapolations. It is therefore expected thatthe agreement between experiment and simulation will be compromised at high temperatures.In the rest of the paper we set D G = . U = .

43 kcal/mol. Since the mag-nitude of the backbone charge could not be determined at high salt concentration, we will analysethe measurements of hairpin stability that cover a wide range of c . In this analysis we assume that Q is given by Eq. (10), with b constant. Determination of b To estimate the reduced phosphate charge Q , we have computed the stabilities D G ( c ) of hairpinsL8 and L10 (Figure 1) for different values of b in Eq. (10). We use these hairpins as bench-marks because their folding enthalpies and entropies have been measured over a wide range of c , from 0.02 M to 1 M Na + . The experimental D G ( c ) of L8 and L10 increase linearly with ln c for c < .

11 M, but the extrapolation of this linear dependence to c > .

11 M does not yield themeasured stabilities at 1 M salt (Figure 7). In addition, the measured stability of L10 at 1 M isdisproportionately larger than that of L8. For these reasons, we use 0.11 M as a reference saltconcentration c , instead of usual 1 M, and compare simulation and experiment in terms of relativestabilities DD G ( c ) = D G ( c ) − D G ( c ) .The simulation model reproduces correctly the linear dependence of DD G on ln c , DD G ( c ) = − k c ln cc (20)for c < .

11 M. It also predicts an upward curvature of DD G ( c ) for c > .

11 M (Figure 7). We ﬁndthat b = . k c . Note that,although the stability of RNA hairpins decreases sharply with temperature, the salt dependenceof D G is mostly insensitive to T (Figure 7). The linear slope in Eq. (20) does not change withtemperature in experiment and simulation. 17n uncertainty in the analysis of the L8 and L10 data comes from the 5’-pppG, which is subjectto hydrolysis in solution. The total number of phosphates N p may vary from 20 to 22 in L8 andfrom 22 to 24 in L10. Panels (c) and (d) in Figure 7 show DD G for the hairpins with charge − Qe , − Qe or − Qe at the 5’-end for t = ◦ C and Q = .

60 ( b = . k c was shown to increase linearly with the total number of phosphates in the duplex, k c = . N p . (21)This formula assumes implicitly that all phosphates contribute equally to the duplex stability. Weﬁnd that for short RNA hairpins, such as L8 and L10, the end effects are signiﬁcantly greater than1 / N p . In Figure 7, k c = . N p for L8 with N p =

22 and k c = . N p for L10 with N p = k c / N p shifts towards its value in Eq. (21) with increasing the hairpin length.Although the experimental scatter in Figure 7 can be attributed to partial hydrolysis of the 5’-pppG, it is hard to establish the precise contribution of this effect. Therefore, we ﬁx b = . c = .

11 M, the simulation model predicts T m = . ◦ C for the melting temperature ofL8 and D G = − . ◦ C. The corresponding experimental values are T m = . ◦ C and D G = − . ◦ C. For L10, we have T m = . ◦ C, D G = − . T m = . ◦ C, D G = − . Melting at low ionic concentration

Figure 8 compares the experimental heat capacity of the MMTV PK at 50 mM K + to the resultobtained in simulation with c = .

05 M in Eq. (9). It is not obvious a priori that hairpins and pseu-doknots should have the same reduced charge Q . Pseudoknot structures consist of three alignedstrands of RNA, rather than two, and the high density of negative charge would be expected topromote counterion condensation. Nonetheless we ﬁnd that the heat capacity of the MMTV PKcomputed using b = . b was sufﬁcient to position correctly both melting peaks, anindication that Eq. (8) is suitable for a description of salt effects on RNA pseudoknots. The modelalso captures the characteristic property of the MMTV PK, that is, stem 2 is more strongly affectedby changes in c than stem 1. In experiment, the difference in the melting temperatures of the twostems increases from 22 ◦ C at c = ◦ C at c = .

05 M, which is related to a signiﬁcant lossof stability for stem 2 in the low salt buffer. Note that neglecting counterion condensation ( Q = ◦ C, in stark contrast to the experimental melting temperature of 48 ◦ C.A considerable difference in the stability of the two stems in the MMTV PK is further illustratedin Figure 9, where we plot the probability that each stem is folded as a function of c . At 37 ◦ C, stem1 is stable for all salt concentrations in the typical experimental range, whereas stem 2 undergoesa folding transition upon increasing c , with the midpoint at approximately 30 mM (Figure 9a).At 80 ◦ C, the folding transition of stem 1 falls within the experimental range of c (Figure 9b).However, at such high temperatures, the population of the unfolded state is non-negligible for allsalt concentrations and, in the case of stem 2, it exceeds 80%.In Figure 9, we have used two different criteria for folding of the stems. For the solid curves astem is considered folded if at least ﬁve base pairs have formed and for the dashed curves a stem is19ssumed to be folded if at least one base pair has formed. Although the curves in Figure 9 dependon the criteria for folding, the numerical differences are small, especially at 37 ◦ C. This is becausethe transition state in the folding of each stem corresponds to the closing of a loop by a single basepair, after which the formation of subsequent base pairs is a highly cooperative process. At 80 ◦ C, individual base pairs have a high probability of opening and closing without affecting the loopregion, which contributes to the quantitative differences between the two deﬁnitions of the foldedstate (Figure 9b).

Conclusions

We have developed a general coarse-grained simulation model that reproduces the folding thermo-dynamics of RNA hairpins and pseudoknots with good accuracy. The model enables us to studythe folding/unfolding transitions with computational efﬁciency, as a function of temperature andionic strength of the buffer. It is interesting that simulations using a single choice of model pa-rameters, D G = . U = .

43 kcal/mol and b = . c < . c > . c extends all the way to 1 M (cf. L8 and L10 in Figure 7). Our simulations predict a substantial cur-vature in DD G vs. ln c in the range c > . c > . b = . b is the mean axial distance per unit bare charge of the polyelectrolyte.Notably, distance 4.4 Å agrees well with the estimates of the counterion condensation theory for b in single-stranded nucleic acids. Therefore, we propose that, in our simulations, b describesthe geometry of the unfolded state, which is similarly ﬂexible for hairpins and pseudoknots. Fur-ther work on this issue and the reduction of RNA charge in the presence of divalent counterionswill be published elsewhere. References (1) Doudna, J. A.; Cech, T. R.

Nature , , 222–228.(2) Treiber, D. K.; Williamson, J. R. Curr. Opin. Struct. Biol. , , 339–345.(3) Thirumalai, D.; Hyeon, C. Biochemistry , , 4957–4970.(4) Chen, S.-J. Annu. Rev. Biophys. , , 197–214.(5) Woodson, S. A. Acc. Chem. Res. , , 1312–1319.(6) Zhuang, X.; Bartley, L.; Babcock, A.; Russell, R.; Ha, T.; Hershlag, D.; Chu, S. Science , , 2048–2051.(7) Russell, R.; Herschlag, D. J. Mol. Biol. , , 839–851.(8) Thirumalai, D.; Woodson, S. A. Acc. Chem. Res. , , 433–439.(9) Tinoco, I., Jr.; Bustamante, C. J. Mol. Biol. , , 271–281.2110) Woodson, S. A. Curr. Opin. Chem. Biol. , , 104–109.(11) Tan, Z.-J.; Chen, S.-J. Biophys. J. , , 1565–1576.(12) Hyeon, C.; Thirumalai, D. Proc. Natl. Acad. Sci. U.S.A , , 6789–6794.(13) Tan, Z.-J.; Chen, S.-J. Biophys. J. , , 176–187.(14) Kirmizialtin, S.; Pabit, S. A.; Meisburger, S. P.; Pollack, L.; Elber, R. Biophys. J. , ,819–828.(15) Tan, Z.-J.; Chen, S.-J. Biophys. J. , , 738–752.(16) Ha, B. Y.; Thirumalai, D. Macromolecules , , 9658–9666.(17) Chen, K.; Eargle, J.; Lai, J.; Kim, H.; Abeysirigunawardena, S.; Mayerle, M.; Woodson, S.;Ha, T.; Luthey-Schulten, Z. J. Phys. Chem. B , , 6819–6831.(18) Thirumalai, D.; Lee, N.; Woodson, S. A.; Klimov, D. K. Annu. Rev. Phys. Chem. , ,751–762.(19) Koculi, E.; Thirumalai, D.; Woodson, S. A. J. Mol. Biol. , , 446–454.(20) Koculi, E.; Hyeon, C.; Thirumalai, D.; Woodson, S. A. J. Am. Chem. Soc. , , 2676–2682.(21) Whitford, P. C.; Schug, A.; Saunders, J.; Hennelly, S. P.; Onuchic, J. N.; Sanbonmatsu, K. Y. Biophys. J. , , L7–L9.(22) Cho, S. S.; Pincus, D. L.; Thirumalai, D. Proc. Natl. Acad. Sci. U.S.A , , 17349–17354.(23) Lin, J.; Thirumalai, D. J. Am. Chem. Soc. , , 14080–14081.(24) Feng, J.; Walter, N. G.; Brooks, C. L., III J. Am. Chem. Soc. , , 4196–4199.2225) Denesyuk, N. A.; Thirumalai, D. J. Am. Chem. Soc. , , 11858–11861.(26) A sample A-form RNA structure can be found at .(27) Chandler, D.; Weeks, J. D.; Andersen, H. C. Science , , 787–794.(28) Theimer, C. A.; Blois, C. A.; Feigon, J. Mol. Cell , , 671–682.(29) Xia, T.; SantaLucia, J., Jr.; Burkand, M. E.; Kierzek, R.; Schroeder, S. J.; Jiao, X.; Cox, C.;Turner, D. H. Biochemistry , , 14719–14735.(30) Bloomﬁeld, V. A.; Crothers, D. M.; Tinoco, I., Jr. Nucleic Acids: Structures, Properties, andFunctions , 1st ed.; University Science Books, 2000.(31) Dima, R. I.; Hyeon, C.; Thirumalai, D.

J. Mol. Biol. , , 53–69.(32) Florián, J.; Šponer, J.; Warshel, A. J. Phys. Chem. B , , 884–892.(33) Shen, L. X.; Tinoco, I., Jr. J. Mol. Biol. , , 963–978.(34) Manning, G. S. J. Chem. Phys. , , 924–933.(35) Heilman-Miller, S. L.; Thirumalai, D.; Woodson, S. A. J. Mol. Biol. , , 1157–1166.(36) Sharp, K. A.; Honig, B. J. Phys. Chem. , , 7684–7692.(37) Hasted, J. B. Liquid water: dielectric properties ; Water, a Comprehensive Treatise; PlenumPress: New York, 1972; Vol. 1; pp 255–309.(38) Williams, D. J.; Hall, K. B.

Biochemistry , , 14665–14670.(39) Mikulecky, P. J.; Feig, A. L. Biopolymers , , 38–58.(40) Honeycutt, J. D.; Thirumalai, D. Biopolymers , , 695–709.(41) Theimer, C. A.; Giedroc, D. P. RNA , , 409–421.2342) Manning, G. S. Biopolymers , , 2385–2390.(43) Record, M. T., Jr.; Woodbury, C. P.; Lohman, T. M. Biopolymers , , 893–915.(44) Bond, J. P.; Anderson, C. F.; Record, M. T., Jr. Biophys. J. , , 825–836.24able 1: Enthalpies D H , entropies D S and melting temperatures T m of single-stranded stacks, de-rived in this work. Enthalpies of hydrogen bond formation in Watson-Crick base pairs are given inthe last two rows. In the ﬁrst column, the 5 ′ to 3 ′ direction is shown by an arrow. ↑ xw D H , kcal mol − D S , cal mol − K − T m , ◦ C UU − . − . − CC − . − . CU ; UC − . − . AA − . − . AU ; UA − . − . AC ; CA − . − . GC − . − . GU ; UG − . − . CG − . − . GA ; AG − . − . GG − . − . D H ( A − U ) = − .

47 kcal/mol D H ( G − C ) = − .

21 kcal/mol25able 2: Temperature-dependent stacking parameters U used in Eq. (3). The values of h corre-spond to D G = . T m are given in Table 1. In theﬁrst column, the 5 ′ to 3 ′ direction is shown by an arrow. U = − h + k B ( T − T m ) s ↑ xw h , kcal mol − s UU − . CC − . CU ; UC − . − . AA − . AU ; UA − . − . AC ; CA − . − . GC GU ; UG CG GA ; AG GG igures (a) stem 1 stem 2 CCGCGAUCGGGUGGCGC AGGGCCCAUCUCA AAAA - -

19 5 - - - - -- -------- _ _ - . MMTV PK (b) - --

CUUCGGGAAGCCA U CGUC - - - --- - _ _ AC L8 (c) - -- CUUCGGGAAGCCAU GU C - - - --- - _ _ CAC UC

L10

Figure 1: Secondary structures of studied RNA. Hairpins L8 and L10 have a 5’-pppG, while theMMTV PK does not have a phosphate group at the 5’-end.28 (cid:73) r P S P S P B B Figure 2: Illustration of the structural parameters in Eq. (3). Sites P, S and B are shown in black,green and red, respectively. The indices refer to different nucleotides. r is the distance betweenbases B and B in angstroms, and f ( P , S , P , S ) and f ( P , S , P , S ) are the dihedral anglesin radians. 29 B T , kcal / mol (cid:39) G , k ca l / m o l h = 5.7; s = 0.0 h = 6.0; s = 0.0 h = 5.98; s = 5.30 (a) (b) -6 -5 -4 -3 -2 -1 00.00.51.01.52.02.53.0 Stack energy, kcal / mol P r ob a b ilit y d e n s it y k B T Figure 3: (a) A sample distribution of stacking energies U ST (Eq. (3)) from simulations of coarse-grained dimers. All dimer conﬁgurations with U ST < − k B T are counted as stacked in Eq. (6).(b) The free energy, D G , of stack formation in dimer (cid:0) GA (cid:1) , calculated using Eq. (6) with D G = D G for different U = − h + k B ( T − T m ) s , where T m is the meltingtemperature of (cid:0) GA (cid:1) in Table 1 and h , s vary. Red solid line shows D G = D H − T D S for D H and D S given in Table 1. Same D G is obtained in simulation with h = .

98 kcal/mol and s = .

30 (closedsymbols). 30 (cid:84) r (cid:92) (cid:92) (cid:92) P S P S P P S P B B B B (cid:84) (cid:84) r (cid:92) (cid:92) (cid:92) P S P S P S P S P P B B B B (a) (b) S P Figure 4: Illustration of the structural parameters in Eq. (7). Angle deﬁnitions depend on the site(P, S or B) which forms a hydrogen bond. Examples show hydrogen bonding between sites B andS (a) and between sites P and S (b). In (a), r (S1, B4) is distance, q (S4, B4, S1) and q (P2,S1, B4) are angles, y (P2, S1, B4, S4), y (S1, B4, S4, P5) and y (B4, S1, P2, S2) are dihedralangles. In (b), r (S1, P4) is distance, q (S4, P4, S1) and q (P2, S1, P4) are angles, y (P2, S1, P4,S4), y (S1, P4, S4, P5) and y (P4, S1, P2, S2) are dihedral angles. All other designations are thesame as in Figure 2. 31 r ee e n e r gy , k ca l/ m o l Temperature, o C G u (cid:39) G G f Figure 5: Geometrical deﬁnition of the stability D G ( T ) of the folded state. The solid curve showsthe total free energy of the system G ( T ) , given by Eq. (18). The free energies of the folded andunfolded states, G f ( T ) and G u ( T ) , are estimated from Eq. (19) using T ∗ = ◦ C and 130 ◦ C,respectively (dashed curves). 32

20 40 60 80 100 120 14001234 (a) (b)

Temperature, o C Temperature, o C MMTV PK + Q = 1 MMTV PK + Q = Q *( T ) C , k ca l/ ( m o l K ) C , k ca l/ ( m o l K ) Figure 6: Measured (black symbols) and computed (red curve) heat capacity C of the MMTVPK in 1 M Na + . The computed C ( T ) is shown with respect to the heat capacity of the unfoldedstate at 130 ◦ C. D G = . U = .

43 kcal/mol. In (a), Q =

1. In (b), Q = Q ∗ ( T ) , b = . C ( T ) from simulation using the solvent viscositywhich is ten times larger than the h speciﬁed in Methods. It can be rigorously established that thethermodynamic properties must be independent of h . In accord with this expectation simulationsat high and low values of h are in good agreement with each other despite the difﬁculty in obtainingadequate sampling for large h . 33 a) (b)(c) (d) L10 c , M (cid:39)(cid:39) G , k ca l/ m o l L10 c , M (cid:39)(cid:39) G , k ca l/ m o l c , M L8 (cid:39)(cid:39) G , k ca l/ m o l (cid:39)(cid:39) G , k ca l/ m o l L8 c , M o C37 o C60 o C 20 o C37 o C60 o C

22 P21 P20 P

24 P23 P22 P

Figure 7: The stability D G of hairpins L8 and L10 as a function of salt concentration c , plotted as DD G ( c ) = D G ( c ) − D G ( c ) , where c = .

11 M. Panels (a) and (b) show comparison of experiment(symbols) and simulation (curves) at different temperatures, assuming no hydrolysis of the 5’-pppG. Panels (c) and (d) illustrate the contribution of the 5’-pppG to DD G ( c ) at 37 ◦ C for differentlevels of hydrolysis: no hydrolysis (solid), partial hydrolysis (dashed) and complete hydrolysis(dash-dotted). Simulation curves are for Q = Q ∗ ( T ) , b = . a) (b) C , k ca l/ ( m o l K ) Temperature, o C Temperature, o C MMTV PK

50 mM K + Q = 1 MMTV PK

50 mM K + Q = Q *( T ) C , k ca l/ ( m o l K ) Figure 8: Same as in Figure 6 but in 50 mM K + .35 a) (b) c , M F r ac ti on f o l d e d F r ac ti on f o l d e d stem 1 c , M o C 80 o C stem 1stem 2 stem 2 Figure 9: The fraction of folded stem 1 (black) and stem 2 (red) in the MMTV PK, as a functionof salt concentration c . A stem is folded if ﬁve base pairs have formed (solid lines) or if one basepair has formed (dashed lines). D G = . U = .

43 kcal/mol, Q = Q ∗ ( T ) , b = . T = ◦ C. In (b), T = ◦◦