[PDF] Heterogeneous path ensembles for conformational transitions in semi-atomistic models of adenylate kinase

Abstract

We performed "weighted ensemble" path-sampling simulations of adenylate kinase, using several semi-atomistic protein models. Our study investigated both the biophysics of conformational transitions as well as the possibility of increasing model accuracy without sacrificing good sampling. Biophysically, the path ensembles show significant heterogeneity and the explicit possibility of two principle pathways in the Open-Closed transition. We recently showed, under certain conditions, a "symmetry of hetereogeneity" is expected between the forward and the reverse transitions: the fraction of transitions taking a specific pathway/channel will be the same in both the directions. Our path ensembles are analyzed in the light of the symmetry relation and its conditions. In the realm of modeling, we employed an all-atom backbone with various levels of residue interactions. Because reasonable path sampling required only a few weeks of single-processor computing time with these models, the addition of further chemical detail should be feasible.

Full PDF

aa r X i v : . [ phy s i c s . b i o - ph ] F e b Heterogeneous path ensembles for conformational transitions in semi–atomisticmodels of adenylate kinaseDivesh Bhatt, Daniel M. Zuckerman ∗ Department of Computational Biology, University of Pittsburgh

Abstract

We performed “weighted ensemble” path–sampling simulations of adenylate ki-nase, using several semi–atomistic protein models. Our study investigated both thebiophysics of conformational transitions as well as the possibility of increasing modelaccuracy without sacriﬁcing good sampling. Biophysically, the path ensembles showsigniﬁcant heterogeneity and the explicit possibility of two principle pathways in theOpen ↔ Closed transition. We recently showed, under certain conditions, a “symme-try of hereteogeneity” is expected between the forward and the reverse transitions:the fraction of transitions taking a speciﬁc pathway/channel will be the same in boththe directions. Our path ensembles are analyzed in the light of the symmetry relationand its conditions. In the realm of modeling, we employed an all–atom backbone withvarious levels of residue interactions. Because reasonable path sampling required onlya few weeks of single–processor computing time with these models, the addition offurther chemical detail should be feasible.

Fluctuations and conformational changes are of extreme importance in biomolecules. For example, most enzymes show distinctly diﬀerent conformations in the apo andthe holo forms. Conformational transitions are also typical in non–enzymatic bindingproteins, and of course are intrinsic to motor proteins. ∗ email: [email protected] i.e. , trajectories in conﬁgurational space – deﬁn-ing the transition. Such path ensembles contain the information about the relevant“mechanisms” for transitions, including possible intermediates. In addition, the tran-sition rates can only be calculated accurately from a path ensemble, which implic-itly accounts for all barriers and recrossings. From a computational point of view,such path ensembles are diﬃcult to obtain due to rugged energy landscapes and thetimescales involved.

Multiple local minima and/or channels dramatically increasethe computational eﬀort required. To put the diﬃculty of path sampling in perspec-tive, note that equilibrium sampling of fully atomistic models of large biomoleculesis not typically feasible. Thus, path sampling using detailed atomistic models forall, but the smallest systems, is impractical – even with potentially eﬃcient methodsdeveloped speciﬁcally for path sampling. A number of groups have reported atomisticpath sampling studies for small systems.

Less computationally expensive approaches to determining atomistic paths areavailable, including targeted and steered molecular dynamics, “nudged elasticband”, and related approaches.

However, all these methods yield only a sin-gle path or a handful, and not the ensemble required for a correct thermal/statisticaldescription. Speciﬁcally, ﬂuctuations in pathways, the possibility of multiple path-ways (path heterogeneity), and possible recrossings typically are not accounted for inthese approaches.Coarse–grained (CG) models, on the other hand, permit an alternative strategyfor statistical path sampling.

Although CG models omit chemical detail, they canbe sampled signiﬁcantly faster than fully atomistic models, and, thus, such modelsare quite attractive for path sampling studies. For example, Zhang et al. showedthat a simple alpha–carbon model of calmodulin can be fully path sampled usingthe weighted ensemble path sampling method. Because full path sampling in thismodel required only a few weeks of single–processor computing, it is evident thatbetter models and/or larger systems could be studied. Network models have also2een used to study conformational transitions.

In this manuscript, we report path sampling studies of adenylate kinase whichrepresent improvements over previous work in several ways. (i) At 214 residues,adenylate kinase is triple the triple the size of the calmodulin domain previously pathsampled. (ii) Our models now include signiﬁcant atomic details, as explained below.(iii) We examine a series of models to test the sensitivity of the path ensemble tothe chosen interactions and parameters. (iv) We investigate symmetry, based on ourrecent formal derivation, between forward and reverse transitions.Adenylate kinase (Adk) is an enzyme that catalyzes phosphate transfer betweenAMP and ATP via AMP + ATP Mg −−− ⇀↽ −−− , (1)and thus helps to regulate the relative amounts of cellular energetic units. Thecrystal structure of Adk for

E. coli is available in several conformations. Its nativeapo form (Protein Databank code 4AKE ) is shown in Figure 1 (a). In the ﬁgure, theblue segments represent the core (CORE), the yellow segment represents the AMPbinding domain (BD), and the green segment represents the ﬂexible lid (LID). Uponligand binding, the enzyme closes over the ligands. The crystal structure (1AKE) of the holo form of the enzyme obtained in complex with a ligand that mimics bothAMP and ATP is shown in Figure 1 (b). Clearly, in the apo form, the enzyme showsan Open structure (that we denote as O in this manuscript), and in the holo form, itis Closed (denoted by C throughout).Adk has been studied previously via computational methods using both coarse–grained models and fully atomistic simulations. Coarse–grained models used to studytransition pathways for Adk have, primarily, utilized network models. In thesemethods, the ﬂuctuations in proteins are represented by harmonic potentials, and thedeformations due to these ﬂuctuations are used to estimate the free energy in thebasins (end states and/or multiple basins). Subsequently, a minimum energy path iscalculated to characterize the transition.A few groups have also studied conformational ﬂuctuations in Adk using atomistic3odels. In an interesting amalgamation of coarse and atomistic models, Arora andBrooks performed atomistic (with implicit solvent) umbrella sampling moleculardynamics (MD) simulations along an initial minimum energy path suggested by anetwork model. Kubitzki and de Groot performed replica exchange MD for atom-istic Adk to increase conformational sampling of adenylate kinase – and observedboth O and C conformers; however, a true path ensemble is not obtained from replicaexchange. In other work, fully atomistic MD on the two end structures has beenperformed to observe ﬂuctuations in the two ensembles but direct conformationaltransitions were not observed.In the present study, we use semi–atomistic models to improve chemical accu-racy compared to typical coarse–grained models while still performing high qualitypath sampling. In our models, the backbone is fully atomistic to provide chemicallyrealistic geometry. Inter–residue interactions are modeled at a coarse–grained levelvia the commonly used double–G¯o potentials that (meta)stabilize two crystalstructures. Additionally, one of the models uses residue–speciﬁc interactions to probethe eﬀect of such interactions. We use a library–based Monte Carlo (LBMC) schemeto perform sampling. . LBMC was previously developed in our group and shown tofacilitate the use of semi–atomistic models of the type used here. Transitions between the Open and the Closed states (both directions) are studiedwith the weighted–ensemble (WE) path–sampling method that has been previouslybeen studied to study folding of proteins, protein dimerization, and conforma-tional transitions in an alpha–carbon model of calmodulin. WE was shown to pro-mote eﬃcient path sampling of conformational transitions in purely alpha–carbonmodel of calmodulin. Additionally, WE is statistically exact: it preserves naturalsystem dynamics, resulting in an unbiased path ensemble. Biophysically, we focus on heterogeneity of the path ensemble (multiple pathways)and the forward–reverse “symmetry” of the ensemble. It is possible that evolutionhas favored the ﬁne–tuned precision of a single pathway in some systems, but the“robustness” of alternative pathways in other cases. Although our semi–atomistic4odels preclude biochemically precise conclusions, our path sampling means that wecan provide a complete description in a model system. Further, good path samplingenables us to investigate, perhaps for the ﬁrst time, the issue of symmetry betweenforward and reverse transitions – which has implications for studies of protein unfold-ing.

The goal of this work, in summary, is to probe the biophysics of transitions withthe most detailed models that allow for generating an ensemble of pathways. Themanuscript is organized as follows. First, In Section 2 we discuss the models we useto depict the protein. Section 3 then describes the method to generate the ensembleof pathways. In Section 4, we present results for transitions in both the directions forall the three models we used. We discuss the results, eﬃciency, and future models inSection 5, with conclusions given in Section 6.

We use three semi–atomistic models, expanding on our previous work. In all themodels, the backbone is represented in full atomistic detail, using the three residuesalanine, glycine, and proline. All intraresidue interactions are included explicitly,using the OPLSAA all–atom force ﬁeld. Both the intra–residue interaction energiesand the conﬁgurations are stored in libraries as described previously. In brief, wenote that libraries of the three types of residues are pre–generated according to theBoltzmann distribution at 300 K, and alanine is used to represent the backbone of allresidues besides glycine and proline (a simpliﬁcation motivated by the similarity ofRamachandran maps for the residues). Ligands are not modeled explicitly in thispath sampling study.The diﬀerences in the three models lie in the treatment of inter–residue interac-tions: two of the models use only double–G¯o interactions at backbone alpha carbons,whereas one model uses both double–G¯o and residue–speciﬁc interactions. Completeinformation is given below. 5ll three semi–atomistic models employ double–G¯o interactions. Following Ref ,for each of the two crystal structures, residues pairs with alpha carbons less than 8 ˚Aapart are considered native contacts. In the G¯o energy of an arbitrary conﬁguration,every native contact from the Open form is assigned an energy of − ǫ , whereas thoseexclusively found in the Closed form are scaled to be − e scale ǫ . G¯o interactions do notdistinguish between diﬀerent types of residues except in terms of size. This double–G¯o potential between two residues i and j with alpha carbon distance r ij is givenby u G¯o ( r ij ) =  ∞ , r ij < r X ij (1 − δ ) − ǫ X , r X ij (1 − δ ) ≤ r ij < r X ij (1 + δ )0 . ǫ, r X ij (1 + δ ) ≤ r ij < r Y ij (1 + δ ) − ǫ Y , ( ∗ ) r Y ij (1 − δ ) ≤ r ij < r Y ij (1 + δ )0 , r ij ≥ r Y ij (1 + δ ) (2)where r X ij and r Y ij are the native distances in the two crystal structure ordered suchthat r X ij < r Y ij (X and Y equate to Open or Closed), and δ is a well–width parameterchosen to be 0.05. If X equals Open and Y equals Closed, ǫ X = ǫ and ǫ Y = e scale ǫ ,and vice–versa. In the case of overlapping square wells, r X ij (1 + δ ) > r Y ij (1 − δ ), the0 . ǫ barrier in the middle does not exist and the lower limit of the inequality markedwith ( ∗ ) is replaced by r X ij (1 + δ ).The total G¯o interactions are therefore U G¯o ( e scale ) = X i

05 to ensure that the residue–speciﬁc interactions are a signiﬁcant8erturbation of the double G¯o interactions.To make the crystal structures (meta)stable, we “titrated in” double G¯o interac-tions ( ǫ ), until bistability was observed at 300 K. Because, as described, hydrogenbonding introduces physical units into Model 2, the units of G¯o well depth, ǫ , are alsophysical. We found that at ǫ/k = 400 K, both the structures remained (meta)stable. ¯o without energy symmetry Finally, to facilitate the generation of large path ensembles in both the Open–to–Closed and Closed–to–Open directions, we also constructed a third model. The newmodel is designed to overcome the somewhat artifactual over–stabilization of theClosed states in Models 1 and 2 (see results below in Section 4). In brief, our 8 ˚Acutoﬀ permits signiﬁcantly more contacts in the Closed state, implicitly but artiﬁciallymimicking the presence of ligands in Models 1 and 2. This implicit presence of ligandsinterferes somewhat with our goal of modeling the ligand–free opening and closing ofthe enzyme.In Model 3, therefore, we attempt to make the Open and Closed forms of adeny-late kinase more comparable in stability. We decrease the strength of G¯o interactionsspeciﬁc to the Closed form to half of G¯o interactions ( i.e. , we set e scale = 0 . U = U G¯o ( e scale = 0 . . (6) We follow many precedents ? ,62 and use “dynamical” MC for the dynamics of ourmodels. Such an approximation to physical dynamics is consistent with our use of9impliﬁed models. Speciﬁcally, we use the library–based Monte Carlo (LBMC) algo-rithm, to propagate the system in both brute–force simulations for generating equi-librium ensembles and path sampling simulations (discussed below in more detail).For both equilibrium and path sampling, the systems always evolve via “natural”LBMC dynamics, and no artiﬁcial forces are used to direct conformational transi-tions, as explained below.Our LBMC simulations use the same trial moves described in our earlier work. .Namely, one ﬂexible peptide plane in the current conﬁguration is swapped with onestored in the library, and a Ψ angle is also displaced by a small amount. In systems with rugged energy landscapes, such as proteins, regular brute–force sim-ulations are not eﬃcient for studying transitions. For this reason, we use the sta-tistically rigorous weighted–ensemble (WE) path sampling method to generate pathensembles of conformational transitions of adenylate kinase between the Open andthe Closed states. This method preserves the natural system dynamics and wasused previously to study protein folding, , protein dimerization, , and conforma-tional transitions of calmodulin using an alpha–carbon model. Weighted ensemblestudies the probability evolution of trajectories in the conﬁguration space using anyunderlying system dynamics. In this work, we use the WE method to study transi-tions in more detailed models to evaluate the eﬀect of increasing chemical detail onthe transitions, and to study questions of symmetry in forward and reverse directions.The procedure to use weighted–ensemble path sampling to study conformationaltransitions is described in detail elsewhere, and here we describe our simple im-plementation brieﬂy. Prior to beginning the simulations, we divide a one dimensionalprojection of the conﬁgurational space ( i.e. , the DRMS from the target structure inthe present study) into a number of bins. The DRMS is a “progress coordinate” or“order parameter” – and is not necessarily the reaction coordinate. The progresscoordinate roughly keeps track of the progress to the target state: the DRMS of10tructures close to the target state is necessarily small. It is also possible to usemultidimensional or adaptively changing progress coordinates, but was not foundnecessary here.In the weighted–ensemble method, an evolving set of trajectories and their prob-abilities are tracked. Procedurally, several independent trajectories are started in aninitial conﬁguration and run for a short time interval τ (consisting of multiple sim-ulation steps) with natural dynamics. At the end of each τ interval, the progressof the trajectories along the progress coordinate is noted ( i.e. , into which bin alongthe progress coordinate each trajectory ends). Once bins are tabulated after each τ , trajectories are “split” (replicated with divided probability) and combined. Thiskeeps the same number of trajectories in each occupied bin, prunes low–weight tra-jectories and splits trajectories with high probability. This splitting and combiningof simulations is performed statistically as discussed elsewhere. The probabilityremains normalized and all probability ﬂows can be measured.The full details of our WE simulations are as follows. We employ LBMC todescribe the natural system dynamics. We utilize 25 bins between the two states,with 20 simulations (trajectories) in each occupied bin. The end state is deﬁned asbeing at a DRMS of 1.5 ˚A from the target crystal structure, a deﬁnition used in bothdirections. Using this deﬁnition of the end state, we calculate the probability ﬂux oftrajectories entering the target state at the end of each τ .It should be noted that value of the probability ﬂux into either state – and hencethe rate – depends upon the precise deﬁnitions of the two states. Although probabilityﬂows are good indicators of sampling quality, precise numerical values of the ratesare not of great interest in our study of simple models with Monte Carlo dynamics.In this work, we are interested in the path ensembles and not the rates.11 Results

For reference, we ﬁrst analyze the conformational diﬀerences between the two end–state static crystal structures to quantify the observed diﬀerences in the Open andClosed conﬁgurations of Figures 1. Figure 2 shows the α –carbon distance diﬀerencemap of pairs of residues in the Open and the Closed crystal structures. A largepositive value implies that a pair is farther apart in the Open structure than in theClosed structure, whereas a negative value is the opposite. By construction, the ﬁgureis symmetric about the diagonal. A few features of the two structures easily emergefrom Figure 2. The inter–residue distances for most of the residue pairs are verysimilar in the two crystal structures. The major diﬀerences are that the distances inthe Closed structure between residues labeled LID (114–164) are closer to BD (31–60) and several residues of CORE are smaller than the corresponding distances in theOpen structure. Thus, Figure 2 quantiﬁes Figure 1.From Figures 1 and 2 it is clear that the structural change that characterizes thetransition between the Open and the Closed structure is fairly straightforward: theLID and the BD close, and the rest of the protein remains fairly unchanged. FollowingFigures 1 and 2, for the path sampling studies presented shortly, we monitor inter–residue distances between two pairs of residues: residues 56 (GLY) and 163 (THR),which report on the BD–LID proximity, as well as residues 15 (THR) and 132 (VAL),which report on the CORE–LID proximity. In the Closed structure, d c56 , = 4 . d c15 , = 6 . d o56 , = 23 . d o15 , = 17 . .2 Brute force equilibrium sampling In order to demarcate the native basins in our analysis of transitions, we ﬁrst studyequilibrium ensembles for the Open and the Closed states of adenylate kinase. Putanother way, we want to quantify the size of native–basin ﬂuctuations in our models.Further, we determine whether transition paths can be obtained without the aid ofpath sampling.We quantify ﬂuctuations in the equilibrium ensembles in the two basins by usingDRMS from the respective crystal structures. Figure 3 (a) shows two sets of DRMStraces for Model 1 for a simulation started from the Open structure: DRMS–from–Open (black line) and DRMS–from–Closed (blue line). Similarly, Figure 3 (b) showstwo sets of DRMS traces for Model 1 for a simulation started from the Closed struc-ture: DRMS–from–Closed (black line) and DRMS–from–Open (blue line). Thus, ineach panel, the black line represents DRMS from the starting structure, whereas theblue line represents DRMS from the opposing structure.A comparison of the two panels of Figure 3 shows that the simulation startedfrom the Open structure shows signiﬁcantly more ﬂuctuations than the simulationstarted from the Closed structure. Furthermore, the ﬂuctuations drive the simulationstarted from the Open structure closer to the Closed structure than vice versa. Forexample, Figure 3 (a) shows that the simulation started from the Open structure getsto within 3 ˚A of the Closed structure at approximately 70 million MC steps. On theother hand, the simulation started from the Closed structure (Figure 3 (b)) remainsfarther from the Open structure.Most importantly, neither simulation show a transition to the opposing structure.The DRMS from the opposing structure for a particular simulation is always signiﬁ-cantly larger than DRMS values from the starting structure for the other simulation.To elaborate, let us consider the DRMS–vs–Closed structure for the simulation startedfrom the Closed structure (black line in Figure 3 (b)). The ﬂuctuations in DRMSremain less than 1.5 ˚A in the native basin for the Closed structure. Comparatively,13he largest ﬂuctuations in the simulations started from the Open structure bring itonly within at most 3 ˚A of the Closed structure (blue line in Figure 3 (a)). That is,the opposing native basin is never reached.We mention that all the DRMS values plotted in Figures 3 (a) and (b) are basedon the ﬁrst 200 residues. This is because the 14 tail residues, which form a helicalsegment, are very ﬂexible and the helix unravels in either structure at a much lowertemperature than the stable part of the protein. Thus, Figure 3 focuses on the restof the protein. Additionally, although we show results for T = 0 . Due to the inability of brute–force simulations to show transitions, we use weighted–ensemble path sampling to generate an ensemble of transition pathways with theaim of assessing path heterogeneity. In particular, we examine transitions in bothdirections for all the three models.

We ﬁrst check whether our path sampling is suﬃcient by monitoring the ﬂux into thetarget state. Figure 5 plots the WE results for probability ﬂuxes obtained into the14losed state for both Models 1 and 2. The “time” axis is merely the number of τ intervals (where one interval contains 2000 LBMC steps). In both models, the ﬂuxesreach linear regimes indicating that the observed transitions are not merely due toinitial fast trajectories and the path ensemble is appropriately sampled.The sensitivity to the models is also apparent in the ﬂuxes shown in Figure 5:Model 2 (which includes hydrogen–bonding, Ramachandran propensities, and MJ–type residue speciﬁc interactions) has a smaller ﬂux into the Closed state than Model1. Residue–speciﬁc interactions are expected to roughen the energy landscape, consis-tent with the observed slowing of transition dynamics. However, the possible changein the Open state basin stability due to addition of these interactions is convolutedwith the roughening of the landscape.We further study the path ensemble by examining individual trajectories. Figure 6shows, for Model 1, the DRMS from the Closed structure for four typical transitionsstarted in the Open state as a function of time (total number of LBMC steps) obtainedvia WE path sampling. In contrast with the brute–force simulation in Figure 3 (a),each trajectory in Figure 6 gets to the Closed state (deﬁned to be within a DRMSof 1.5 ˚A from the Closed structure). Although the trajectories arrive at the targetstate with diﬀerent weights, the ones shown in the ﬁgures above are obtained aftera simple resampling procedure, and, thus, represent trajectories that arrive withrelatively large probabilities. Resampling is a statistically rigorous procedure to prunean ensemble. In our resampling scheme, a trajectory arriving at the target withweight w is kept with a probability w/w max .For trajectories that begin transitions at larger times (such as Trajectory 4 inFigure 6), a signiﬁcant amount of time is spent in regions with large DRMS valuesfrom the Closed structure. Thus, in Figure 7 and beyond, we do not show the “dwelltime” in the Open state.To analyze the order of domain closing, and, in particular, to study possible het-erogeneity in the path ensemble, we study the four trajectories of Figure 6 in moredetail. Figure 7 plots the projection of the above four trajectories onto the d , and15 , plane (see Section 4.1) obtained via WE for Open–to–Closed transition. Theﬁlled circle shows the Closed x–ray structure, whereas the ﬁlled diamond is for theOpen x–ray structure. The corresponding open circles and diamonds are represen-tative of ﬂuctuations in the ensembles of Closed and Open structures, respectively.The relatively larger spread of structures in the Open ensemble compared to Closedreﬂects the larger Open–state ﬂuctuations depicted previously in Figure 3. The fourdiﬀerent colored lines show the four trajectories of Figure 6, without the “dwell time”in the Open ensemble.In all the trajectories, the transition through the region with values of d , (BD–LID distance) intermediate between the two ensembles is fairly rapid. The ﬂexible LIDundergoes large ﬂuctuations in the Open state, and the transition to the Closed stateis typically accomplished via the BD snapping closed on a much smaller timescale.Despite the relatively fast closing of the BD for the trajectories in Figure 7, theexact transition paths traced by the four trajectories are signiﬁcantly diﬀerent. ForTrajectory 1, the BD shuts after the ﬂexible LID gets close to the CORE. For thefollowing discussion, we call this pathway as Open–LID–BD–Closed (ﬁrst the LIDrelaxes, and then the BD shuts close). On the other hand, Trajectory 3 shows adramatically diﬀerent behavior: the BD snaps shut before the ﬂexible LID gets closerto the CORE (this pathway is labeled as Open–BD–LID–Closed). The other twotrajectories are somewhere in between the two extremes.To quantify heterogeneity in the path ensembles, we compare the ratio of trajec-tories in the two transition pathways. Speciﬁcally, we deﬁne a trajectory to follow theOpen–LID–BD–Closed (lower right) pathway if it ﬁrst visits the region d , < . d , > . d , > . d , > . i.e. , the CORE region) maintains a stable shape duringthe transformation.To determine the sensitivity of the path ensemble to the model, we similarlyanalyzed results from Model 2 (which includes hydrogen bonding, Ramachandranpropensities, and a level of residue speciﬁcity). A similar qualitative picture is ob-tained for Model 2. Figure 10 plots three of the resampled trajectories from the Opento the Closed structure using Model 2. The “dwell times” in the Open state have beenomitted for clarity. Again, the symbols have the same meaning as in Figure 7 (exceptthat open symbols represent the ﬂuctuations obtained using Model 2). The transitionfrom the Open–to–Closed structure primarily takes place by the BD snapping closedto the LID on a much shorter time scale. Depending upon the relative positions ofthe LID and CORE, the completion of the transition requires further adjustment ofthe LID relative to the CORE. The ratio of paths in the two pathways is the same asthat for Model 1. We also studied “reverse” transitions – from the Closed to the Open state. Figure 11shows the ﬂux into C as a function of “time” for both Models 1 and 2. Compared toFigure 5 for the transition from the Open to the Closed state, the ﬂux into state B isseveral orders of magnitude lower. This observation mirrors the previously describedlarger ﬂuctuations in the Open state ensemble. Flux into the Open state for Model2 with residue speciﬁc chemistry is higher than for Model 1, despite the expectedroughening of the energy landscape. This necessarily reﬂects a free energy shift,17uggesting MJ interactions de–stabilize the Closed state compared to a pure double–G¯o model. Such a shift seems appropriate given that we do not model ligands whichimplicitly lead to more contacts in the Closed state and consequent over–stabilizationin the G¯o model.For Closed–to–Open transition using either model, we obtain pathways which mir-ror the Open–to–Closed transition: the LID ﬂuctuates in the Closed state, and thisis followed by the BD snapping open on a relatively fast time scale. For both Mod-els 1 and 2, successful trajectories appear to follow only the Closed–LID–BD–Openpathway for Closed–to–Open transition (reverse order of the Open–BD–LID–Closedpathway in the Open–to–Closed transition direction). The absence of symmetry issurprising given our recent formal demonstration, and there seem to be two possi-ble reasons. First, the transients for Closed–BD–LID–Open pathway are long–lived.Lengthy transients are consistent with the low reverse reaction rates, shown in Fig-ure 11, for both Models 1 and 2. Second, our state deﬁnitions may be ﬂawed asdiscussed in Section 5.2.To clarify the issue of the symmetry of path ensembles between forward and reversedirections, we constructed and path sampled Model 3. The slow Closed–to–Open transitions indicates that, for Models 1 and 2, the freeenergy of the Closed structure is signiﬁcantly lower than that of the Open structure.As discussed in Section 2.3, this suggested the use of Model 3, which decreases inmagnitude the favorable energy for contacts present only in the Closed state. Thatis, Model 3 reduces the free energy asymmetry between the Open/Closed states.Model 3 thereby facilitates study of the symmetry between forward and reversetransitions. As shown in Figure 12, although the ﬂux in the Open–to–Closed directionin Model 3 is higher than in the Closed–to–Open direction, the diﬀerence between theﬂuxes in the two directions is much less than that for Models 1 and 2. The increasedClosed–to–Open rate implies that the relative stability of the Closed state is reduced18ompared to Models 1 and 2. Importantly, the relatively linear behavior of ﬂuxes inboth the directions implies our path sampling is suﬃcient – well beyond transients.For Model 3, we examine the same classiﬁcation of pathways as above. Bothpaths are frequently observed in both directions. In Figure 13, we show the ratio ofprobabilities of the two paths as a functions of simulation time in the two directions.Values in each window are averaged over 500 τ increments. The results for the Open–to–Closed direction (diamonds) are shown for a single simulation, whereas the Closed–to–Open transitions (circles) are shown for 6 independent simulations. Despite largeﬂuctuations, the ratios of paths in the two directions are similar. We discuss the issueof path symmetry further, below. An important issue in any coarse–grained study is the sensitivity of the results tothe particular model(s) used. To address this point, we used three diﬀerent semi–atomistic models of adenylate kinase. For the models used, we ﬁnd that the transitionpathways are not signiﬁcantly aﬀected by the models we used. In particular, we ﬁndtwo dominant pathways (Open–LID–BD–Closed and Open–BD–LID–Closed) thatoccur in all the models. Although the rates vary considerably among models, we donot expect realistic kinetics in simpliﬁed models.Our choice of models was governed by the basic requirement of obtaining full pathsampling of conformational transitions – in order to study path ensembles, heterogene-ity, and symmetry. Two of the models are based purely on structure (G¯o model) andthe other (Model 2) includes some level of residue speciﬁcity via Miyajawa–Jerniganinteractions, as well as hydrogen bonding energies and Ramachandran propensities.In Model 2, the chemical energy terms are signiﬁcant perturbations to the G¯o inter-actions. (as quantiﬁed by MJ interactions between residues). This model is designed19o be able to capture a minimum level of biochemistry. However, Model 2 still re-quires signiﬁcant G¯o–type interactions to stabilize the two physical states. In thefuture, we plan to utilize more detailed and explicit side chain–side chain and sidechain–backbone interactions to reduce the dependence on G¯o–type interactions.Another limitation is that we did not consider the ligand in our path samplingsimulations. The inclusion of ligand could inﬂuence the observed pathways signiﬁ-cantly. We have plans for modeling ligand via “mixed models” that include all–atomligands and binding sites, with a coarse–grained picture for the rest of the protein.Such an explicit inclusion of ligands, with the corresponding degrees of freedom ofthe unbound ligands in the Open form should reduce the dependence on arbitraryG¯o interactions. A study with explicit ligands could require a higher dimensionalprogress coordinates to use in weighted ensemble simulations: one coordinate for pro-tein structure (as is done in this work), and a second (or further) coordinates forthe distance between ligands and the protein. Note that weighted ensemble can mixreal– and conﬁgurational–space coordinates: it was originally designed for bindingstudies. Recently, we investigate the conditions when there should be symmetry – i.e. , whenpathways in the forward and the reverse directions occur with the same ratio. Weshow that exact symmetry will hold when a speciﬁc (equilibrium–based) steady stateis enforced. Approximate symmetry is expected if the initial and ﬁnal states are well–deﬁned physical basins lacking slow internal timescales, so that trajectories emergingfrom a state “forget” the path by which they entered. Figure 13 suggests that theratio of the two diﬀerent pathways in the two directions is very similar for Model 3,which was fully path sampled in both directions.Such a symmetry is clearly absent from our results (even after accounting forstatistical ﬂuctuations) in Models 1 and 2. Although we observed transitions inboth the directions for all the models, Closed–to–Open transitions in all the models20especially in Models 1 and 2) are harder to obtain. In particular, the Closed–BD–LID–Open pathway is not observed in our simulations for Models 1 and 2. Thisindicates a lack of the correct steady state for these models in the Closed–to–Opendirection and/or insuﬃciently well–deﬁned states. It is unlikely that the highly ﬂexibleOpen state is a good physical basin. We are currently working on developing WE pathsampling methods that allow steady states to be sampled directly and eﬃciently. Related steady–state methods are already available.

One of the basic goals of this work was to determine the level of detail we can includein a model, while still allowing for full sampling of the path ensemble. Thus, we nowdiscuss the computational eﬀort that was required. All simulations were performed onsingle 3 GHz Intel processors. The results shown for Model 1 in the Open–to–Closeddirection took approximately one week of single CPU time. More simulation wasperformed in the Closed–to–Open direction, requiring 3-4 weeks of single CPU time.The results for Model 2 were obtained using approximately the same time as Model 1.For Model 3, the Closed–to–Open transition was not much harder to obtain than theOpen–to–Closed transition, and a simulation in each direction required approximatelytwo weeks of single CPU time. Due to the low CPU usage for obtaining path ensemblesfor the models used here, obtaining path ensembles of better models using WE ispossible. See Section 5.1.It is not hard to estimate the eﬃciency of WE simulation compared to brute–force.The transition rates determined from WE simulations indicate the time required forbrute–force simulations to achieve transitions and hence permit estimates of eﬃ-ciency. For example, the rate obtained for Closed–to–Open transition for Model 3 is2.5 × − /τ . Thus, one brute–force transition can be estimated to require the recip-rocal amount of time. Since 2000 τ require approximately one week of computing, BFis estimated to take approximately 4 years for a single transition. In contrast WEyielded 50 transitions after resemapling ( i.e. , 50 transitions with equal weights), in21bout two weeks of single–processor computing. (Before resampling, there were about3000 WE transitions for each simulation). WE is thus signiﬁcantly more eﬃcient thanBF. For transitions in the other direction and/or other models, a qualitatively similarpicture for eﬃciency emerges. We applied weighted ensemble (WE) path sampling to generate ensembles for confor-mational transition between Open (apo) and Closed (holo) forms of adenylate kinaseusing semi–atomistic models of the protein. No additional driving force was usedto enable the transitions. We showed that conformational transitions in both direc-tions are possible for such models via WE. In contrast, brute–force simulations arevastly ineﬃcient. Given the relatively small computational eﬀort required for observ-ing transitions using WE, more detailed models can be used for full path sampling.In the future, models with further reduced dependence on G¯o–type interactions areneeded, along with ligand modeling, to study the speciﬁc enzyme biochemistry – andpath sampling of such models appears possible.All the models show signiﬁcant hereteogeneity in the transition pathways. Inparticular, two dominant pathways observed are characterized by the order in whichthe ﬂexible lid and the AMP binding domains close. Although the rates obtained(in terms of Monte Carlo steps) varied signiﬁcantly depending upon the model used,similar dominant pathways are obtained across the models. We further showed in theAppendix the formally exact result that the transition paths must be symmetric inthe two directions in the (equilibrium–based) steady states. The model that allowssigniﬁcant transitions in both the directions shows an approximate symmetry whichappears to be consistent with conditions on the symmetry rule.

Acknowledgments:

We thank Dr. Bin Zhang and Prof. David Jasnow forhelpful discussions. This work was supported by the NIH (Grant GM070987) and theNSF (Grant MCB–0643456). 22 eferences [1] Berg, J. M., L. Stryer, and J. L. Tymoczko. 2002. Biochemistry. Freeman, NewYork.[2] Hammes, G. G. 2002. Multiple conformational changes in enzyme catalysis.

Biochem.

J.Chem. Phys.

Annu.Rev. Phys. Chem . 53:291–318.[5] van Erp, T. S., D. Moroni, and P. G. Bolhuis. 2003. A novel path samplingmethod for the calculation of rate constants.

J. Chem. Phys.

J. Comp. Phys.

J. Chem. Phys.

Phys. Rev. Lett . 94:018104.2311] Allen, R. J., D. Frenkel, and P. R. ten Wolde. 2006. Simulating rare events inequilibrium or nonequilibrium stochastic systems.

J. Chem. Phys.

J. Phys.Chem. B . 111:12876–12882.[14] Juraszek, J., and P. G. Bolhuis. 2008. Rate constant and reaction coordinate oftrp–cage folding in explicit water.

Biophys. J.

Phys. Chem. Chem. Phys.

J. Chem. Phys.

Mol. Simul.

Proc. Natl. Acad. Sci.

J. Chem. Phys.

Chem. Phys. Lett.

J. Chem. Phys.

Chem. Phys. Lett.

J. Chem. Phys.

Adv. Chem. Phys.

J. Chem. Phys.

Protein Sci.

Curr. Opin.Struct. Biol.

J. Phys. Chem. B . 108:5127–5137.2532] Zhang, B. W., D. Jasnow, and D. M. Zuckerman. 2007. Eﬃcient and veriﬁedsimulation of a path ensemble for conformational change in a united-residuemodel of calmodulin.

Proc. Natl. Acad. Sci.

Biophys. J.

J. Mol. Biol.

PLoS Comput. Biol. http://arxiv.org/abs/1002.2402 . .[37] Pontiggia, F., A. Zen, and C. Micheletti. 2008. Small- and large-scale con-formational changes of adenylate kinase: A molecular dynamics study of thesubdomain motion and mechanics.

Biophys. J.

Biochem.

Proc. Natl. Acad. Sci.

Structure . 4:147–156. 2641] Muller, C. W., and G. E. Schultz. 1992. Structure of the complex betweenadenylate kinase from escherichia–coli and the inhibitor ap5a reﬁned at 1.9 aresolution – a model for a catalytic transition state.

J. Mol. Biol.

Biophys.J.

Proc.Natl. Acad. Sci.

Structure . 16:1175–1182.[46] Henzler-Wildman, K. A., V. Thai, M. Lei, M. Ott, M. Wolf-Watz, T. Fenn,E. Pozharski, M. A. Wilson, M. Karplus, C. G. Hubner, and D. Kern. 2007.Intrinsic motions along an enzymatic reaction trajectory.

Nat.

Struct.

Proc. Natl. Acad. Sci.

J. Phys. Chem. B . 113:10891–10904.[50] Rojnuckarin, A., S. Kim, and S. Subramanian. 1998. Brownian dynamics simu-lations of protein folding: Access to milliseconds time scale and beyond.

Proc.Natl. Acad. Sci.

J. Molec. Struct.–Themochem . 529:183–191.[52] Zhang, B. W., D. Jasnow, and D. M. Zuckerman. 2010. The “weighted ensem-ble” path sampling method is statistically exact for a broad class of stochasticprocesses and binning procedures.

J. Chem. Phys.

Cell . 108:573–582.[54] Day, R., and V. Daggett. 2005. Sensitivity of the folding/unfolding transitionstate ensemble of chymotrypsin inhibitor 2 to changes in temperature and solvent.

Protein Sci.

Proc. Natl. Acad. Sci.

Proteins: Structure Function and Genetics . 40:389–408.[57] Miyazawa, S., and R. L. Jernigan. 1985. Estimation of eﬀective interresiduecontact energies from protein crystal–structures – quasi–chemical approximation.

Macromolecules . 18:534–552. 2858] Miyazawa, S., and R. L. Jernigan. 1996. Residue-residue potentials with a fa-vorable contact pair term and an unfavorable high packing density term, forsimulation and threading.

J. Mol. Biol.

Curr. Op. Struct. Biol . 6:195–209.[60] Gan, H. H., A. Tropsha, and T. Schlick. 2000.

J. Chem. Phys.

J.Mol. Biol.

Proc. Natl. Acad. Sci. http://arxiv.org/abs/0910.5255 . .[65] Warmﬂash, A., P. Bhimalapuram, and A. R. Dinner. 2007. Umbrella samplingfor nonequilibrium processes.

J. Chem. Phys.

J. Chem. Phys. pen vs. Open Model 1 (a)Open vs. Closed 0 1 2 3 4 5 6 7 8 9 0 50 100 150 200

PSfrag replacements MC steps ( × ) D R M S Closed vs. ClosedClosed vs. Open Model 1 (b) 0 1 2 3 4 5 6 7 8 9 0 25 50 75 100 125 150 175 200

PSfrag replacements MC steps ( × ) D R M S Figure 3: Stability of the native basins in Model 1. The DRMS in Model 1 (pure G¯o)is shown for two simulations (a) starting from the Open structure and (b) startingfrom the Closed structure. For each simulation, we show the DRMS from the startingstructure (black line) and the opposing structure (blue line). Neither simulation showa transition to the opposing structure, but the scales of ﬂuctuations are very diﬀerent.32 pen vs. ClosedOpen vs. Open Model 2 (a) 0 1 2 3 4 5 6 7 8 9 0 50 100 150 200 250 300

PSfrag replacements MC steps ( × ) D R M S Closed vs. ClosedClosed vs. Open Model 2 (b) 0 1 2 3 4 5 6 7 8 9 0 50 100 150 200 250 300

PSfrag replacements MC steps ( × ) D R M S Figure 4: Stability of the native basins in Model 2. The DRMS in Model 2 (G¯o,H-bonding, Ramachandran propensities, and MJ–type residue speciﬁc interactions)is shown for two simulations (a) starting from the Open structure and (b) startingfrom the Closed structure. For each simulation, we show the DRMS from the startingstructure (black line) and the opposing structure (blue line). Neither simulation showa transition to the opposing structure, but the scales of ﬂuctuations are very diﬀerent.33 odel 1Model 2 0 0.04 0.12 0.16 0.2 0 400 600 800 1000 1200 1400 200 0.08

PSfrag replacements F l u x i n t o C l o s e d s t a t e τ Figure 5: Comparison of the probability ﬂuxes into the Closed state for two models.The ﬂuxes from a pure double–G¯o system (Model 1, solid line) and a system withconsiderable chemical speciﬁcity (Model 2, dashed line) are plotted as functions ofWE time intervals (each τ interval is 2000 LBMC steps). In both cases, the ﬂuxesreach approximately linear regimes, suggesting transient eﬀects have attenuated.34 rajectory 2 Trajectory 3Trajectory 1 Trajectory 4Model 1, Open to ClosedClosed state 0 2 4 6 8 0 1 2 3 PSfrag replacements MC steps ( × ) D R M S Figure 6: Open–to–Closed transitions observed for Model 1. Four typical DRMStraces are shown, measured from the Closed crystal structure. The Closed basin isdelimited by a DRMS of 1.5 ˚A from the Closed crystal structure.35 rajectory 1Trajectory 4Trajectory 2Closed Open T r a j ec t o r y Closed (crystal)Open (crystal) Open (ensemble)Closed (ensemble)Open to Closed (Model 1) 0 5 10 15 20 25 0 5 10 15 20 25

PSfrag replacements d , ˚A ( L I D – C O R E ) d , ˚A (BD–LID) Figure 7: Path heterogeneity in Model 1 transitions. Four typical trajectories for theOpen–to–Closed transition (Figure 6 are shown via distances between the CORE andLID (ordinate) and between the BD and the LID (abscissa). The “dwell” times forthe trajectories in the Open state excluded for clarity.36igure 8: Four time–ordered, representative structures along the Open–BD–LID–Closed path obtained for Trajectory 3 in Figure 7 (Model 1). The CORE domainis blue, the BD is yellow, and the LID is green. The conﬁgurations correspond to(296000, 802000, 822000, 1496000) MC steps, respectively, in Trajectory 3 of Figure 6.37igure 9: Four time–ordered, representative structures along the Open–LID–BD–Closed path obtained for Trajectory 1 in Figure 7 (Model 1). The CORE domainis blue, the BD is yellow, and the LID is green. The conﬁgurations correspond to(128000, 206000, 306000, 480000) MC steps, respectively, in Trajectory 1 of Figure 6.38 pen to Closed (Model 2) OpenClosedOpen (crystal)Closed (crystal) Open (ensemble)Closed (ensemble) 0 10 15 20 25 0 5 10 15 20 25 30 5

PSfrag replacements d , ˚A ( L I D – C O R E ) d , ˚A (BD–LID) Figure 10: Path heterogeneity depicted via four typical trajectories for the Open–to–Closed transition obtained using Model 2 shown via distance between the COREand LID (ordinate) versus the distance between the BD and the LID (abscissa). The“dwell” times for the trajectories in the Open state excluded for clarity.39 odel 1Model 2Closed to Open transition−35−30−25−20−15−10−5 0 1000 2000 3000 4000 5000 6000

PSfrag replacements l og ( F l u x i n t o O p e n s t a t e ) τ Figure 11: A comparison of probability ﬂuxes into the Open state for two modelsas a function of WE time increment (each τ is 2000 LBMC steps). The solid line isfor Model 1, whereas the dashed line is for Model 2. The log scale emphasizes thesmall amount of ﬂux in Closed–to–Open direction for both the models, as comparedto ﬂuxes in the reverse direction. 40 odel 3Open to Closed Closed to Open 0 0.01 0.02 0.03 0 500 1000 1500 2000 2500 PSfrag replacements F l u x i n t o t a r g e t s t a t e τ Figure 12: Probability ﬂux in either direction for Model 3 as a function of WE timeincrement (each τ is 2000 LBMC steps). For this model, the Closed–to–Open ﬂux isof the same order of magnitude as in the reverse direction.41 pen to closedClosed to Open Model 3 0 0.2 0.4 0.6 0.8 1 1.2 0 500 1000 1500 2000 2500 PSfrag replacements P ( p a t h = ) / P ( p a t h = ) τ Figure 13: Ratio of probabilities of the two paths, 1 and 2, in the Open–to–Closed (di-amonds) and Closed–to–Open directions (circles). Results from six independent WEsimulations are shown in the Closed–to–Open direction to highlight the ﬂuctuations.All data points are window averaged for 500 ττ