[PDF] Calculation of absolute free energy of binding for theophylline and its analogs to RNA aptamer using nonequilibrium work values

Abstract

Full PDF

aa r X i v : . [ c ond - m a t . s o f t ] A ug Calculation of absolute free energy of binding for theophylline andits analogs to RNA aptamer using nonequilibrium work values

Yoshiaki Tanida, ∗ Masakatsu Ito, and Hideaki Fujitani

Fujitsu Laboratories Ltd., 10-1 Morinosato-Wakamiya, Atsugi, Kanagawa, Japan (Dated: October 29, 2018)

Abstract

The massively parallel computation of absolute binding free energy with a well-equilibratedsystem (MP-CAFEE) has been developed [H. Fujitani, Y. Tanida, M. Ito, G. Jayachandran, C.D. Snow, M. R. Shirts, E. J. Sorin, and V. S. Pande, J. Chem. Phys. , 084108 (2005)].As an application, we perform the binding aﬃnity calculations of six theophylline-related ligandswith RNA aptamer. Basically, our method is applicable when using many compute nodes toaccelerate simulations, thus a parallel computing system is also developed. To further reduce thecomputational cost, the adequate non-uniform intervals of coupling constant λ , connecting twoequilibrium states, namely bound and unbound, are determined. The absolute binding energies∆ G thus obtained have eﬀective linear relation between the computed and experimental values.If the results of two other diﬀerent methods are compared, thermodynamic integration (TI) andmolecular mechanics Poisson-Boltzmann surface area (MM-PBSA) by the paper of Gouda et al [H.Gouda, I. D. Kuntz, D. A. Case, and P. A. Kollman, Biopolymers , 16 (2003)], the predictiveaccuracy of the relative values ∆∆ G is almost comparable to that of TI: the correlation coeﬃcients(R) obtained are 0.99 (this work), 0.97 (TI), and 0.78 (MM-PBSA). On absolute binding energiesmeanwhile, a constant energy shift of ∼ -7 kcal/mol against the experimental values is evident. Tosolve this problem, several presumable reasons are investigated. ∗ Electronic address: [email protected] . INTRODUCTION To accurately compute the free energy diﬀerence between two thermal equilibrium states(∆ G ) is of interest in computational science and of importance in terms of drug discovery [1].It can help us obtain a quantitative understanding of molecular complexes from atomic-scale,and also provide valuable information for use in structural reﬁnement for drug design. Gen-erally speaking, to discuss chemical reaction, we have to determine the free energy changewithin so-called chemical accuracy ( ∼ et al . proposed the acceptance ratio (AR) method [20, 21, 22, 23] to estimate thefree energy diﬀerence and also calculated the binding free energies for eight FKBP12-ligandcomplexes [24] using the “folding@home” system. Furthermore, our previous work [25] sug-gested the eﬃciency of the use of general AMBER force ﬁeld (GAFF) [26] for ligands toobtain the predictive binding estimates, and simultaneously indicated the importance ofstarting structures for simulations. In our method the use of many independent compute2odes or a grid computing system is eﬀective to accelerate simulations, whereupon we alsodeveloped a massively parallel computing system, “BioServer”, consisting of 1920 micropro-cessor elements, to eﬃciently perform the binding free energy calculations. Hereafter, ourmethod is referred to as the massively parallel computation of absolute binding free energywith a well-equilibrated system (MP-CAFEE).The principal aim of this study is to explore the practical feasibility of our methodology,based on a nonequilibrium approach. We have performed the calculation of the absolutebinding free energy for theophylline and its analogs to the RNA aptamer. On the samesystem, Gouda et al . reported the results of free energy calculations using two diﬀerentmethods, TI and MM-PBSA [27]. Therefore, this system is a good platform to comparethe ability between two diﬀerent methods, namely the equilibrium and nonequilibrium ap-proach. The paper is organized as follows. We initially describe the theoretical backgroundand simulation protocol of binding free energy estimation used through this work, beforesubsequently explaining the choice of coupling constant λ intervals connecting two equilib-rium states in Sec. II. The results for all RNA-ligands are presented and discussed in Sec.III. After pointing out the key features of this system, we conclude in Sec. IV. II. COMPUTATIONAL DETAILS

In general, the methods used to calculate the free energy diﬀerence in molecular sim-ulations can be classiﬁed as either equilibrium or nonequilibrium approaches. Several ap-proaches based on thermal equilibrium, e.g. FEP, TI and the potential of mean force usingweighted histogram analysis method with umbrella sampling [28], have been attempted tocompute the free energy diﬀerence between two equilibrium states. However, the equilibriumapproaches encounter the diﬃculty of removing the nonequilibrium contributions, known asthe hysteresis problem [29, 30, 31]. In order to overcome this diﬃculty, a long time simu-lation must be performed, to retain the reversibility of quasistatic process associating withthe thermal equilibrium. Recently, Jarzynski’s equality exactly relates the free energy dif-ference of two equilibrium states i and j to the statistics of works via a nonequilibrium,irreversible process that connects them as exp( − ∆ G/k B T ) = h exp( − W/k B T ) i , where W isthe work change of a system from i to j under isothermal isobaric (NPT) conditions [32]. Aremarkable feature of Jarzynski’s equality is the fact that the switching time between two3tates is arbitrary. Subsequently, when the process occurs instantaneously, the work can bedeﬁned by the potential energy diﬀerence as W ≡ U ( λ j , x ) − U ( λ i , x ), where U and x arethe potential function and conﬁgurational coordinates, respectively. A coupling constant λ is imposed on the path connecting two states. The ﬂuctuation theorem (FT) related with P i → j ( W ) and P j → i ( W ) of the work probability distributions along the non-equilibrium for-ward ( i → j direction) and reverse ( j → i direction) processes is derived by Crooks [13] as P i → j (+ W ) /P j → i ( − W ) = exp[( W − ∆ G ) /k B T ]. The ratio between P i → j (+ W ) and P j → i ( − W )depends only the value of the free energy diﬀerence ∆ G . Shirts et al . pointed out thatthe maximum likelihood ∆ G is given by AR analysis, namely, the control parameter inAR analysis must correspond to the free energy diﬀerence [21]. Since two irreversible workdistributions in forward and reverse directions cross at W = ∆ G , both distributions musthave a large overlap to obtain the free energy estimate with high accuracy. Subsequently,after applying the multi-staging method, the intermediate stages are inserted onto the pathbetween two states i and j , with the total free energy change obtained as the sum of thediﬀerences. This type of usage, involving the insertion of many intermediate states on thepath connecting two states, seems to be eﬀective. However, since economical issues also playa role because many λ require signiﬁcant computing resources, we have now addressed theissue concerned with determining the adequate choice of each λ interval. First, potentialenergy is parameterized by λ , followed by the introduction of the λ dependency to the non-bonded interactions. In order to avoid instability near the end-points of the vanishing atomsthrough calculation, the non-bonded interaction energy U ( λ C , λ LJ ) between the ligand andothers, including the so-called soft-core potentials, is used [33]. We also use two kind ofcoupling constant λ for representing two diﬀerent potential energies: the fully bound state( λ C = λ LJ = 0) and the unbound state ( λ C = λ LJ = 1). Hereafter, the free energy compo-nent resulting from turning oﬀ the electrostatic potential ( λ C = 0 → , λ LJ = 0) is referredas the electrostatic contribution. We also denote the free energy component for the processof ( λ C = 1 , λ LJ = 0 →

1) as the van der Waals contribution. Next, we investigate the conver-gence properties of the free energy when the number of equal λ spacings ( ≡ N ) increases. Atrajectory at each intermediate λ i is obtained via the standard MD simulation. As alreadynoted, the work associated with the forward path is obtained to compute the potential energydiﬀerence as W F ( i, i + 1) = U ( λ i +1 , x i ) − U ( λ i , x i ). Using a trajectory at λ i +1 , the work onthe reverse path is also obtained as W R ( i, i + 1) = U ( λ i +1 , x i +1 ) − U ( λ i , x i +1 ), while the free4nergy ∆ G i,i +1 can be estimated using both work distributions of W F ( i, i +1) and W R ( i, i +1)by the AR algorithm, thus the free energy diﬀerence is ∆ G = P N − i =0 ∆ G i,i +1 . In general, ∆ G is being converged as the number of N increases. N is deﬁned as a large enough number toobtain the well-converged value of ∆ G ( ≡ ∆ G ). We also deﬁne ∆ G ( λ n ) = P n − i =0 ∆ G i,i +1 .Finally, to reproduce the feature of ∆ G ( λ n ), the adequate mesh points of λ are determinedto be satisﬁed with the following requirements: 1) the small diﬀerence between ∆ G and∆ G ; 2) the small root mean square (rms) error between ∆ G ( λ n ) and ∆ G ( λ n ). The rmserror used here is deﬁned as h (1 /N ) P N − i =0 | ∆ G ( λ i ) − ∆ G ( λ i ) | i / .Now, we have applied this manner to an RNA-theophylline system. Using the ﬁrst 100ps MD simulation N were determined as 20 for electrostatic contribution and 40 for vander Waals contribution respectively. As can be seen from Table I, we have determined thenon-uniform mesh points of λ with ∆ G diﬀerence ( ≡ δG ) ≤ G ( λ ) ≤ λ points. The λ points obtained were the following: 12 values of λ C points (0, 0.1, 0.25, 0.45,0.55, 0.65, 0.7, 0.75, 0.8, 0.9, 0.95, 1) and 21 values of λ LJ points (0, 0.1, 0.2, 0.275, 0.375,0.45, 0.55, 0.65, 0.675, 0.725, 0.75, 0.775, 0.8, 0.825, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975, 1).Computed ∆ G ( λ i ) as a function of λ i were also shown in Figs. 1(a) and (b). It is clear thatthe feature of the well-converged curves is reproduced well by a small number of non-uniformmesh points.To estimate the free energy diﬀerence, the path from the bound state ( λ = 0) to theunbound state ( λ = 1) is chosen as follows; the ligand electrostatic interactions with theenvironment are turned oﬀ, followed by its van der Waals interactions with the environment,whereupon we obtain the solvation energy of ligand as ∆ G Solv ( ≡ ∆ G CSolv +∆ G LJSolv ). The freeenergy of the annihilation of the ligand from the ligand-receptor complex (complexation freeenergy) is also calculated as ∆ G Complex ( ≡ ∆ G CComplex + ∆ G LJComplex ). We schematically noteas follows; L (sol) → L (gas) on the calculation of the solvation free energy of the ligand (L) and RL (sol) → R (sol) + L (gas) on the calculation of the complexation free energy of the ligand (L) tothe receptor (R). Consequently, we obtain the absolute binding free energy of the receptor-ligand complex as ∆ G ≡ ∆ G Complex − ∆ G Solv = ∆ G CComplex + ∆ G LJComplex − ∆ G CSolv − ∆ G LJSolv using the relation of RL (sol) → R (sol) + L (sol) .In our study the three-dimensional structure of the RNA-theophylline complex proposedby Clore et al . [34] (PDB code 1O15) was used as a modeling. We assumed that the bind-5ng sites for six ligands ﬁt into the structural framework of the original RNA-theophyllinestructure because the structures of other complexes were not experimentally determined.All MD simulations were performed using a modiﬁed version of the GROMACS package(v3.1.4) [35] with single precision to speed up the execution. The Amber force ﬁeld pa-rameters (ﬀ99) were used to describe the RNA aptamer, including 1-4 interactions betweenhydrogen atoms, while the force ﬁeld parameters for Mg ions [36] were also used fromthe Amber package. The atomic structure of the theophylline molecule was optimized in avacuum using Gaussian 98 (Gaussian, Inc.) with the HF/6-31G ∗ single-point energy calcu-lation, while the atomic charges were calculated using the restrained electrostatic potential(RESP) method [37] and the ANTECHAMBER program was used to assign the GAFF foreach atom. The chemical structures of the ligands investigated here are shown in Fig. 2,while the electrostatic potential surface of these molecules, computed using the AdaptivePoisson-Boltzmann Solver (APBS) package [38] after using the RESP charge ﬁtting are alsoshown. It is clear that hypoxanthine has a slightly diﬀerent charge distribution around thelower left-hand side of the molecule as compared to other molecules. The TIP3P model forwater molecules was used to describe the solvent [39].Through this study we obeyed the following parameters for all MD simulations. Integra-tion of the equation of motion was performed using the leap-frog algorithm and all bondswere constrained by LINCS [40] with order 8. We used the time step of 2.0 fs while the non-bonded pair list of 1 nm was updated every 10 steps. All simulations were carried out at T =298 K using the Nose-Hoover [41, 42] temperature control and at 1 atm using the Berend-sen [43] pressure control respectively, with a time constant of 0.5 ps and a compressibility of4.5 × − bar − . The particle mesh Ewald (PME) [44] summation was used to evaluate theelectrostatic interactions with a grid spacing of approximately 0.12 nm, a cubic spline orderof 4 and relative tolerance between long and short range energy of 10 − . The L-J interactionenergy was calculated by a switched cutoﬀ between 0.8 nm and 0.9 nm. We also includedlong range correction to the energy and pressure. We used the truncated octahedron boxas a unit cell with periodic boundary conditions, while for the solvated ligand system, weadopted the unit cell size, maintained no closer than 0.85 nm from each face of the box,and introduced about 238 to 282 TIP3P water molecules into the unit cell. RNA-ligandcomplexes were prepared in which the minimum distance was 0.75 nm between the nearestunit cell wall on either side, while the number of TIP3P water molecules used were either6192 or 7193. We introduced 3 Mg ions into the complex according to the paper of Gouda et alt [27, 45, 46] and also added 26 Na + counter-ions to the complex to neutralize thesystem.One of the diﬃculties for computing the binding free energy is the preparation of start-ing structure sampled from the canonical ensemble at the desired temperature T [25]. Thefollowing procedure was used. We performed energy minimization using the conjugate gra-dient method, followed by the MD simulation of 20 ps relaxation with the solute positionsrestrained. In order to attain the thermal equilibrium state ( λ C = λ LJ = 0), 10/15 ns MDcalculations were typically used for the solvated ligand and the RNA-ligand complexes.Through this process, used to conclude the system to be equilibrated, the total potentialenergy of the system and the local electrostatic and van der Waals potential energies be-tween RNA and ligand were carefully monitored. Via the multi-stage method we adopted anatomic structure at λ = 0 after completing the preconditioning MD simulation as a startingstructure at each intermediate λ state. To decrease the uncertainty from statistical error,we sampled the MD simulation with 15 and 10 kinds of diﬀerent initial velocities for thesolvated ligand system and RNA-ligand complex, respectively. Nonequilibrium works ev-ery 50 steps (0.1 ps) were used to compute the free energies, which were obtained as thearithmetic average of all estimated values. Through this work we simultaneously performed330 (33 independent non-uniform λ points ×

10 samples) independent MD simulations togenerate the energetically possible samples for evaluating the binding free energy on theFujitsu BioServer system, consisting of 1920 FR-V microprocessors.

III. RESULTS AND DISCUSSION

It was found that a Mg ion shielded by water molecules and located at the center ofthe U-turn formed by C22-U24 [46] remained unchanged during all the preconditioning MDsimulations. Conversely however, other Mg ions immediately left their initial positions,and diﬀused around an RNA aptamer. 7 . Theophylline A 15 ns preconditioning MD simulation was used to reach a thermal stable structurein the RNA-theophylline complex. We now consider the energetically favorable structurefrom the perspective of both electrostatic interaction and that of van der Waals. Theobtained structure is consistent as expected, namely, the negative surface region of an RNAstrongly interacts with the positive region on a complementary molecule surface. Besides,there is a match of theophylline and residue of A28 on the substrate. This result can beinterpreted based on the view that the van der Waals interaction simultaneously stabilizesthe theophylline molecule onto the binding pocket. Furthermore, theophylline is bound withan upper as well as lower layer.Let us now move on to the binding free energy calculation to examine the availabilityof our method. The open circles in Fig. 3 show the features of the convergence of (a) thesolvation free energy of theophylline and of (b) the complexation free energy of theophyllineto an RNA as a function of the MD step. Each open circle was obtained by analyzing thegenerated works during each 100 ps MD trajectory. Rough convergence appears visible after200 ps and 700 ps for (a) the solvated ligand and (b) the solvated RNA-ligand complex,respectively. Subsequently, we used work values from 200 ps to 1 ns (8000 samples) toestimate the solvation energy ∆ G Solv and those from 700 ps to 1 ns (3000 samples) toestimate the complexation energy ∆ G Complex , and obtained -14.4 kcal/mol for ∆ G Solv and-30.5 kcal/mol for ∆ G Complex . Consequently, we also obtained the binding free energy ∆ G of -16.1 kcal/mol for the RNA-theophylline system. We have summarized the free energycomponents in Table II and it can be found that the favorable structure of this complex ismainly driven by the van der Waals interaction.We further comment on the computational cost required to obtain the converged ∆ G .Since the ratio of the simulation time between the solvation and complexation is relativelysmall, we only consider the simulation time for obtaining ∆ G Complex . In this system, a totalof 330 ns MD simulations for obtaining ∆ G Complex were used. However, our approach isapplicable for the use of many compute nodes to accelerate calculations, meaning the useof a parallel computing system, including a grid environment, is the key to diminishing thecomputation turnaround time. 8 . Caﬀeine

Caﬀeine diﬀers from theophylline by a single methyl group as shown in Fig. 2. Eachcomponent, once decomposed from estimated binding energies, was tabulated in Table II.It is evident that the signiﬁcant diﬀerence of 5.1 kcal/mol between the theophylline andcaﬀeine bindings is mainly attributable to the electrostatic contribution (4.6 kcal/mol). Wenow address and attempt to explore the reason why the aﬃnity between RNA and caﬀeineis drastically reduced in comparison with that between RNA and theophylline. Due to thetypical snapshots, both ligands preferably bind within the pocket of an RNA, as illustratedin Figs. 4(a) and (b). Hereafter, we discuss the results during 1ns MD simulations afterreaching thermal equilibrium states for both systems. The obtained averaged distances ofTEP(O)-C22(1H4), TEP(hn)-C22(N3), TEP(hn)-C22(O2) and TEP(nc)-U24(H3) are 0.22nm, 0.19 nm, 0.26 nm and 0.21 nm, respectively, while the averaged distance of CAF(nc)-U24(H3) of 0.21 nm was also obtained for the RNA-caﬀeine complex. These values indicatedthe creation of hydrogen bonds as the bond lengths were less than approximately 0.25 nm.This diﬀerence in the number of hydrogen bonds between both RNA-caﬀeine and RNA-theophylline complexes may result in a signiﬁcant diﬀerence in the binding energy, so thenumber of hydrogen bonds for both systems were enumerated. At this point, we determinedthe hydrogen bond based on the cutoﬀ angle of the donor-hydrogen-acceptor and the distanceof the hydrogen-acceptor; values of 60 ◦ for cutoﬀ angle and 0.25 nm for cutoﬀ distance wereused. We obtained hydrogen bonds of 4.3 ( ± ± ± ± ± ± ∼

3) is easily understood as the hydrogen atom TEP(hn)are shared to be bound by both a nitrogen atom (C22(N3)) and an oxygen atom (C22(O2))in the solvated RNA-theophylline complex as shown in Fig. 4(a), while these hydrogen bondsvanish in the solvated RNA-caﬀeine complex in Fig. 4(b). The hydrogen bond contributingto binding energy for the RNA-theophylline system is ∼

0, while the decrease in the number9f hydrogen bonds for the RNA-caﬀeine system is ∼ ∼ ∼ ∼ C. Other molecules

To obtain a thermal equilibrium binding structure, we similarly set the initial locationsof other molecules in close proximity to the binding site of theophylline for preconditioningMD simulations because the electrostatic potential surface of each molecule is almost thesame as theophylline. The obtained stable atomic conﬁgurations of other molecules boundwith RNA were virtually unchanged. The association of RNA with all six ligands is drivenprimarily by attractive van der Waals interactions ∆ G LJ as tabulated in Table II. The resultsof the absolute and relative binding free energies we obtained for six ligands are given inTable III, which also tabulates the results of other methods, TI and MM-PBSA [27] andincludes three important characteristics: 1) There is an eﬀective linear relation (slope= 1)between the computed absolute binding free energies and experimental ones. We obtainedthe following relation between the computed relative energies ∆∆ G calc and experimental ones∆∆ G expt for three diﬀerent methods (the results of TI and MM-PBSA were from Ref. [27]):∆∆ G calc = ∆∆ G expt − . , R = 0 .

99 (This work), ∆∆ G calc = ∆∆ G expt +0 . , R = 0 .

97 (TI),∆∆ G calc = ∆∆ G expt + 1 . , R = 0 .

78 (MM-PBSA), where R is the correlation coeﬃcient.Consequently, the eﬃciencies of both our method and the TI method for estimating therelative binding free energies are virtually comparable. Here, we should point out thatour method directly estimate the absolute binding energies, whereas the TI method onlycompute the relative binding energies. We also point out that the absolute binding freeenergy for the RNA-hypoxanthine system is partly shifted about 1.5 kcal/mol from thelinear relation, which means this result might suggest consideration be taken of anotherbinding site for hypoxanthine binding. 2) There is a constant energy shift of ∼ -7 kcal/molfrom the experimental values ∆ G expt as can be seen in Fig. 5. Detailed discussions for thelatter are given below.In general, since water molecules play an important role in free energy calculation, we have10erformed calculations of ∆ U local as the sum of short-range electrostatic interaction energyand van der Waals interaction energy between RNA and ligand. The sample averaged valuesduring 1 ns MD simulations after reaching the thermal equilibrium state are shown in Fig. 6.In comparison with Fig. 5, our method can reproduce the trend in both 1-methylxanthineand xanthine well, although the large shift ( ≃ U local and ∆ G expt was evident. Consequently, our method evidently takes the eﬀect ofwater molecules into consideration. D. Constant energy shift

The computed absolute binding free energies were approximately -7 kcal/mol smaller thanthose obtained by the experiment. How do we solve the puzzle underlying the diﬀerencebetween them? We brieﬂy comment on the presumed reasons to solve this puzzle. First, it iswell known that nucleic acid, e.g. RNA or DNA, is generally ﬂexible in solution [47]. Thereare a number of metastable states in free energy space and long term residence at a singlevalley, resulting in anomalously slow dynamics. Namely, the ensemble average of severalcapable atomic structures was observed in the experiment. Thus, in the RNA-theophyllinesystem we examined exploration of another metastable structure using a simulated annealingtechnique as follows: We ﬁrst prepared a slightly expanded unit cell that was suﬃcientto treat the RNA relaxing. Theophylline and RNA aptamer was solvated in a truncatedoctahedron box with 3 Mg ions, 26 Na + counter-ions and 9325 TIP3P water molecules.Next, energy minimization and position restrained MD simulation were carried out in thesame manner, as described in II. We also set a simulation temperature of 398 K to disturb thestable complex structure during the simulation time of 100 ps. Consequently, a (meta)stableatomic conﬁguration was obtained by annealing to 298 K. One of the characteristic featureswas that a Mg ion leave from the center of the U-turn of C22-U24. Binding free energy of -3.3 kcal/mol was obtained by AR analysis with simulation of up to 1 ns MD, by discarding theﬁrst 400 ps runs, which is signiﬁcantly small compared with the strongly binding structuredescribed in Sec. IIIA. The feature of convergence of the free energy estimation as a functionof the MD step for this weakly binding structure was also drawn in Fig. 3. We further dividethe binding free energy into electrostatic and van der Waals components. Binding freeenergy of -0.4 kcal/mol is contributed from the electrostatic part, whereas, -3.0 kcal/mol11s contributed from the van der Waals part. Second, we examined the inﬂuence of theatomic charges of the ligand. Here, AM1-BCC charges [48] of the ligand generated fromMOPAC2002 were tested. ∆ G Solv of -17.7 kcal/mol was obtained, while ∆ G Complex of -33.7 kcal/mol was also obtained up to 1 ns MD simulations. Both values increased by ∼ G LJSolv wasreduced by approximately 0.5 kcal/mol as the cutoﬀ radius was increased from 0.9 nm to1 nm. The ∆ G LJComplex was also reduced by 0.2 kcal/mol as the cutoﬀ radius increased from0.9 nm to 1 nm, meaning the correction is substantially only 0.3 kcal/mol. Fourth, wepreviously reported that ∼ G Solv of -14.2 kcal/mol was obtained, while ∆ G Complex of -27.0 kcal/mol was also obtained, so we obtained a free energy diﬀerence ∆ G of -12.8kcal/mol. This value still represents an approximate shift from the experimental values by4.0 kcal/mol. Finally, since an RNA has signiﬁcant negative charges, we believe the factthat the charge polarizability of an atom is inﬂuenced by electrostatic interactions with itssurrounding atoms, becomes important [52, 53, 54]. For this reason the polarizable potentialmodel might be considered, however, as the computational cost rises signiﬁcantly, we havenot yet examined this ability. Consequently, this puzzle remains a major challenge in theRNA-ligand system. IV. CONCLUSION

In conclusion, we have introduced the method (MP-CAFEE) as a highly accurate meansof obtaining the absolute binding free energy. This method has been applied to investi-gate the properties of the RNA-ligand system. Since our method is suitable for the useof many compute nodes, we have also developed a massively parallel computing system to12ﬀectively accelerate simulations. To further reduce the computational cost, we determinedthe adequate non-uniform intervals of two coupling constants and consequently obtainedthe following details: First, our estimated absolute binding free energies correlate well interms of a linear ﬁt to the experimental values when all ligands are assumed to be preferableto bind the same binding pocket. By comparing the results of two other methods, TI andMM-PBSA[27], the accuracy of this method is almost comparable to that of TI in terms ofrelative binding free energy. With this in mind, we believe this method to be a promisingtool to quantitatively predict the absolute binding free energy in the drug design domain.Second, we expect a considerable diﬀerence in the binding aﬃnity to RNA between theo-phylline and caﬀeine to be attributed to the diﬀerence in hydrogen bonds of the relatedenvironment by analyzing the free energy component. Finally, we ﬁnd the constant energyof ∼ -7 kcal/mol shifted from the experimental values for all ligands on absolute binding freeenergy. Though considering several presumable solutions to this problem, this area remainsincompletely understood. Acknowledgments

The authors would like to thank G. Jayachandran, M. R. Shirts, E. J. Sorin, and V. S.Pande for fruitful discussions and also E. Lindahl for help in modifying GROMACS. Thiswork was partly supported by the High-Throughput Biomolecule Analysis System Projectof the NEDO (New Energy and Industrial Technology Development Organization, Japan).Three of the authors (H.F., Y.T., and M.I.) would like to thank all the members of theBioServer project in Fujitsu. [1] W. L. Jorgensen, Science , 1813 (2004).[2] R. W. Zwanzig, J. Chem. Phys. , 1420 (1954).[3] J. G. Kirkwood, J. Chem. Phys. , 300 (1935).[4] B. L. Tembe and J. A. McCammon, Comp. Chem. , 281 (1984).[5] C. F. Wong and J. A. McCammon, J. Am. Chem. Soc. , 3830 (1986).[6] M. K. Gilson, J. A. Given, B. L. Bush, and J. A. McCammon, Biophys. J. , 1047 (1997).[7] S. Boresch, F. Tettinger, M. Leitgeb, and M. Karplus, J. Phys. Chem. , 9535 (2003).

8] D. Hamelberg and J. A. McCammon, J. Am. Chem. Soc. , 7683 (2004).[9] J. Hermans and L. Wang, J. Am. Chem. Soc. , 2707 (1997).[10] I. Massoval and P. A. Kollman, J. Am. Chem. Soc. , 8133 (1999).[11] C. Jarzynski, Phys. Rev. Lett. , 2690 (1997).[12] C. Jarzynski, Phys. Rev. E , 5018 (1997).[13] G. E. Crooks, Phys. Rev. E , 2361 (2000).[14] C. Bustamante, J. Liphardt, and F. Ritort, Physics Today , 43 (2005).[15] G. E. Crooks, Phys. Rev. E , 2721 (1999).[16] D. Collin, F. Ritort, C. Jarzynski, S. B. Smith, I. Tinoco, and C. Bustamante, Nature ,231 (2005).[17] J. Liphardt, S. Dumont, S. B. Smith, I. Tinoco, and C. Bustamante, Science , 1832 (2002).[18] F. Ritort, Poincare Seminar , 195 (2003).[19] D. A. Hendrix and C. Jarzynski, J. Chem. Phys. , 5974 (2001).[20] C. H. Bennett, J. Commput. Phys. , 245 (1976).[21] M. R. Shirts, E. Bair, G. Hooker, and V. S. Pande, Phys. Rev. Lett. , 140601 (2003).[22] M. R. Shirts and V. S. Pande, J. Chem. Phys. , 144107 (2005).[23] D. A. Kofke and P. T. Cummings, Fluid Phase Equilib. , 41 (1998).[24] M. R. Shirts, G. Jayachandran, C. D. Snow, H. Fujitani, E. J. Sorin, and V. S. Pande (un-published).[25] H. Fujitani, Y. Tanida, M. Ito, G. Jayachandran, C. D. Snow, M. R. Shirts, E. J. Sorin, andV. S. Pande, J. Chem. Phys. , 084108 (2005).[26] J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case, J. Comput. Chem. ,1157 (2004).[27] H. Gouda, I. D. Kuntz, D. A. Case, and P. A. Kollman, Biopolymers , 16 (2003).[28] S. Kumar, D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg, J. Comp. Chem. , 1011 (1992).[29] T. P. Straatsma, H. J. C. Berendsen, and J. P. M. Postma, J. Chem. Phys. , 6720 (1986).[30] D. A. Perlman and P. A. Kollman, J. Chem. Phys. , 2460 (1989).[31] D. A. Perlman and P. A. Kollman, J. Chem. Phys. , 7831 (1989).[32] S. R. Williams, D. J. Searles, and D. J. Evans, cond-mat/0611541 (2006).[33] M. R. Shirts, Ph. D. dissertation, Stanford University (2005).

34] G. M. Clore and J. Kuszewski, J. Am. Chem. Soc. , 1518 (2003).[35] E. Lindahl, B. Hess, and D. v. d. Spoel, J. Mol. Model. , 306 (2001).[36] J. ˚Aqvist, J. Phys. Chem. , 8021 (1990).[37] C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman, J. Phys. Chem. , 10269 (1993).[38] N. Baker, D. Sept, S. Joseph, M. Holst, and J. McCammon, Proc. Natl. Acad. Sci. , 10037(2001).[39] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem.Phys. , 926 (1983).[40] B. Hess, H. Bekker, and H. J. C. Berendsen, J. Comput. Chem. , 1463 (1997).[41] S. Nose, Mol. Phys. , 255 (1984).[42] W. G. Hoover, Phys. Rev. A , 1695 (1985).[43] H. J. Berendsen, J. P. Postma, A. DiNola, and J. R. Haak, J. Chem. Phys. , 3684 (1984).[44] U. Essmann, L. Perera, and M. L. Berkowitz, J. Chem. Phys. , 8577 (1995).[45] R. D. Jenison, S. C. Gill, A. Pardi, and B. Polisky, Science , 1425 (1994).[46] G. R. Zimmermann, C. L. Wick, T. P. Shields, R. D. Jenison, and A. Pardi, RNA , 659(2000).[47] E. J. Sorin, B. J. Nakatani, Y. M. Rhee, G. Jayachandran, V. Vishal, and V. S. Pande, J.Mol. Biol. , 789 (2004).[48] A. Jakalian, B. L. Bush, D. B. Jack, and C. I. Bayly, J. Comput. Chem. , 132 (2000).[49] G. Jayachandran, M. R. Shirts, S. Park, and V. S. Pande, J. Chem. Phys. , 084901 (2006).[50] D. L. Mobley, J. D. Chodera, and K. A. Dill, J. Chem. Phys. , 084902 (2006).[51] J. Wang, Y. Deng, and B. Roux, Biophys. J. , 2798 (2006).[52] L. X. Dang and B. M. Pettitt, J. Phys. Chem. , 3349 (1987).[53] J. Caldwell, L. X. Dang, and P. A. Kollman, J. Am. Chem. Soc. , 9144 (1990).[54] M. W. Mahoney and W. L. Jorgensen, J. Chem. Phys. , 10758 (2001). ABLE I: Obtained free energy diﬀerences and rms values comparing with well-converged valuesusing condensed equal λ spacings, 20 uniformed points for λ C , and 40 uniformed points for λ LJ respectively. These values are deduced from the ﬁrst 100 ps MD trajectory.solvation complex N δG a rms δG a rms λ C b λ LJ b a δG ≡ ∆ G − ∆ G , where ∆ G is well-converged value. b These values are obtained using an adaptive λ spacing set (see, in II). TABLE II: Electrostatic contributions and those of van der Waals, decomposed from obtainedbinding free energies for each RNA-ligand complex.∆ G Solv ∆ G Complex ∆ G Ligand ∆ G CSolv ∆ G LJSolv ∆ G CComplex ∆ G LJComplex ∆ G C ∆ G LJ Theophylline -14.2 -0.2 -19.2 -11.3 -5.0 -11.13-Methylxanthine -16.8 -0.5 -22.1 -9.9 -5.3 -9.4Xanthine -19.1 -1.0 -24.8 -10.0 -5.7 -9.01-Methylxanthine -16.7 -0.6 -21.3 -10.2 -4.6 -9.6Hypoxanthine -18.2 -0.3 -22.8 -7.2 -4.6 -6.9Caﬀeine -12.7 0.3 -13.1 -10.3 -0.4 -10.6 ABLE III: Absolute binding free energies obtained for RNA aptamer and theophylline and itsanalogs in kcal/mol. Experiment a TI b MM-PBSA b This workLigand ∆ G exp ∆∆ G exp ∆∆ G bind ∆ G (MM+solv) ∆∆ G (MM+solv) ∆ G calc ∆∆ G calc Theophylline -8.85 0 0 -18.51 0 -16.1 03-Methylxanthine -7.77 1.08 1.36 -15.27 3.25 -14.7 1.4Xanthine -6.91 1.94 1.64 -12.97 5.54 -14.6 1.61-Methylxanthine -6.88 1.97 1.82 -14.24 4.27 -14.2 1.9Hypoxanthine -5.87 2.98 — — — -11.6 4.5Caﬀeine -3.35 5.50 6.63 -12.67 5.84 -11.0 5.1 a These values are obtained using the formula - RT ln K d , where K d is the individual competitor dissociationconstant. [45]. T =298 K. b These values are from Ref. 27. ∆ G C ( λ C ) ( kc a l / m o l ) λ C SolvComplex -20-15-10-50 0 0.2 0.4 0.6 0.8 1 ∆ G L J ( λ L J ) ( kc a l / m o l ) λ LJ SolvComplex (a)(b)

FIG. 1: Free energy convergence as a function of the number of λ spacing for the RNA-Theophyllinesystem; (a) ∆ G C vs. λ C . Open symbols are our reduced 12 λ C points of (0, 0.1, 0.25, 0.45, 0.55,0.65, 0.7, 0.75, 0.8, 0.9, 0.95, 1). The solid lines are generated from 20 uniform mesh points of λ C .(b) ∆ G LJ vs. λ LJ . Open symbols are our reduced 21 λ LJ points of (0, 0.1, 0.2, 0.275, 0.375, 0.45,0.55, 0.65, 0.675, 0.725, 0.75, 0.775, 0.8, 0.825, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975, 1), while thesolid lines are generated from 40 uniform mesh points of λ LJ . IG. 2: Chemical structures of ligands investigated. The dissociation constant K d (in µ M) isincluded in each parenthesis. The electrostatic potential surfaces of molecules are also shown withred representing the negative potential and blue representing the positive potential respectively (-4 k B T /e to 4 k B T /e ). ne r g y ( kc a l / m o l ) -20-18-16-14-12-10 (a)(b) -40-35-30-25-20-150 200 400 600 800 1000 MD step (ps)

Strongly boundWeakly bound

FIG. 3: Free energy of theophylline-RNA aptamer as a function of MD step: (a) ∆ G Solv for solvatedtheophylline, (b) ∆ G Complex for the RNA-theophylline complex. Open circle denotes a stronglybinding structure, while the open square denotes a weakly binding structure (see, in IIID). Errorbars in both graphs are drawn from the maximum value to minimum value. (a) (b) U24(H3)CAF(nc)

FIG. 4: (a) Theophylline molecule and its surroundings in the RNA-theophylline complex. Onlytwo residues related to hydrogen bonds are drawn and candidates for hydrogen bonds are depictedas dashed lines. (b) Caﬀeine molecule and its surroundings in the RNA-caﬀeine complex, whilehydrogen bonds with residue in the same plane are also drawn in the form of a dashed line. ∆ G c a l c ( kc a l / m o l ) ∆ G expt (kcal/mol) CAFTEP XANOMETME HXA

FIG. 5: Computed binding free energies ∆ G calc vs. experimentally measured binding free energies∆ G expt for six ligands. The line of slope= 1 is also drawn as a guide. The abbreviations used hereare TEP as theophylline, TME as 3-methylxanthine, XAN as xanthine, OME as 1-methylxanthine,HXA as hypoxanthine and CAF as caﬀeine. ∆ U l o c a l ( kc a l / m o l ) ∆ G expt (kcal/mol) TEPTME XANOME HXA CAF

FIG. 6: Local potential energies computed ∆ U local vs. experimental binding free energies ∆ G expt for six ligands. The line of slope= 1 is also drawn as a guide, while the abbreviations used are thesame as those in Fig. 5.for six ligands. The line of slope= 1 is also drawn as a guide, while the abbreviations used are thesame as those in Fig. 5.