Stochastic Simulation to Visualize Gene Expression and Error Correction in Living Cells
SStochastic Simulation to Visualize Gene Expression andError Correction in Living Cells
Kevin Y. ChenDepartment of Chemistry, University of Cambridge,Lensfield Road, Cambridge CB2 1EW, UKDaniel M. ZuckermanDept of Biomedical Engineering, Oregon Health & Science Univ,Portland, OR 97239,and Philip C. NelsonDepartment of Physics and Astronomy, University of Pennsylvania,Philadelphia PA 19104September 18, 2018
Abstract
Stochastic simulation can make the molecular processes of cellular controlmore vivid than the traditional differential-equation approach by generatingtypical system histories instead of just statistical measures such as the meanand variance of a population. Simple simulations are now easy for studentsto construct from scratch, that is, without recourse to black-box packages. Insome cases, their results can also be compared directly to single-molecule ex-perimental data. After introducing the stochastic simulation algorithm, thisarticle gives two case studies, involving gene expression and error correction,respectively. Code samples and resulting animations showing results are givenin the online supplement.
Physical processes unfold over time. Our minds grasp physical mechanisms largelyvia narrative. So it is not surprising that some of the most vivid physics demon-strations also play out over time. Simulations of physics that unfold over time aresimilarly powerful; interactive simulations are better; and simulations created by thestudent can be best of all. This view is gaining ground in introductory courses [1],but the benefits of animated simulation extend farther than this. Here we wish toshow that the behavior of strongly nonequilibrium statistical systems can be illus-trated via stochastic simulations that are simple enough to serve as undergraduate1 a r X i v : . [ q - b i o . S C ] S e p rojects. Recently developed, free, open-source programming resources sidestep thelaborious coding chores that were once required for such work. In particular, we be-lieve that the error-correction mechanism known as kinetic proofreading can be moreclearly understood when a student views its linear temporal sequence, as opposed tosolving deterministic rate equations. Coding this and other simple processes opensthe door for the student to study other systems, including those too complex for therate-equation approach to yield insight. We tell students that a simple chemical reaction, for example isomerization of amacromolecule, can be regarded as a barrier-passing process. A micrometer-sizebead in a double optical trap serves as a mesoscopic model system with this character[2], and it is well worthwhile for students to watch it undergo a few dozen sharptransitions in between episodes of Brownian motion near its two stable positions(see supplementary video 1). A simple model for this behavior states that thehopping transitions occur at random times drawn from an exponential distribution.That is, many rapid transitions are interspersed with a few long pauses.
With this physical motivation, students can explore how to generate simulated wait-ing times of the sort just described. Any computer math system has a pseudorandomnumber generator that generates floating-point numbers uniformly distributed be-tween 0 and 1. Many students are surprised (and some are intrigued), to learn thatapplying a nonlinear function to samples from a random variable yields sampleswith a different distribution, and in particular that y = − τ ln x is exponentiallydistributed, with mean τ , if x is uniform on (0,1] [3].Starting from that insight, it takes just one line of code to generate a list of simu-lated waiting times for transitions in a symmetric double well; finding the cumulativesums of that list gives the actual transition times (see supplementary computer code1). The freely accessible VPython programming system (or its Web-based versionGlowscript) makes it very easy to create an animation of an object whose spatialposition is supplied as a function of time [4]. The only challenging part is to passfrom a list of irregularly-spaced transition times to particle positions at each of many(regularly-spaced) video frames (see supplementary computer code 1). The payoffis immediate: Visually, the simulated hopping has a very similar character to theactual Brownian hopping of a bead in a double trap (see supplementary video 2).2 .3 Upgrade to 1D random walks It may be of interest to make a small modification of the code: Instead of hoppingbetween two wells (reversing direction on every step), consider one-dimensional diffu-sion on a symmetric many-well potential, for example, one of the form U ( x ) = sin( x ).In such a potential, for each transition the system must also decide whether to in-crease or decrease a “position” coordinate. The resulting random walk will displaythe same long-time scaling behavior as any unbiased 1D walk, but with trajectoriesthat undergo hops at random times, not periodic steps as in the simplest realization[3]. We can now generalize from situations with essentially only one kind of transition(or two symmetric kinds), to the more interesting case where several inequivalentchoices are possible, and where the relevant probabilities depend on the currentstate. This general situation can describe a chemical reaction that requires, anddepletes, molecules of some substrate.Most science students know that living cells synthesize each of their messengerRNAs (mRNAs) from a single copy (or a small fixed number) of the correspondinggene. Even a constitutive (unregulated) gene must wait for the transcription appa-ratus to arrive, bind, and begin transcription [5]. We consider a situation in whichthat apparatus is in short enough supply that this waiting is the primary determi-nant for the initiation of transcription. Once a mRNA transcript has formed, it hasa limited lifetime until it is degraded by other cellular machinery. We assume thatthis process, too, relies on chance encounters with degradation enzymes. Moreover,each of many species of mRNA must all share the attentions of a limited number ofdegradation enzymes, so each mRNA copy has a a fixed probability per unit timeto be removed from the system.The physical hypotheses in the preceding paragraph amount to a model called the“birth-death process,” which has many other applications in physics and elsewhere.As in the 1D walk, we characterize the system’s state by an integer, in this casethe population of the mRNA of interest. Synthesis is a transition that increasesthis number, with a fixed probability per unit time k s (called the “mean rate” ofsynthesis). Degradation is a transition that decreases it, with a probability per unittime that is the current population n times another constant k d (the “rate constant”for degradation). D. Gillespie extended and popularized a simple but powerful method, the “stochas-tic simulation algorithm,” for simulating systems of this sort [6]. In the case just3escribed, the algorithm repeatedly executes the following steps (see supplementarycomputer code 2): • Determine the probability per time k tot for any of the allowed transitionsto occur by summing all the mean rates. In a birth-death process, we have k tot = k s + nk d . • Draw a waiting time from the exponential distribution with mean given by thereciprocal of k tot , via the method in Sect. 2.2. • Determine which of the allowed processes happens at that transition time. Inthe birth-death process, we make a Bernoulli trial with probability p = k s /k tot to increase population n by one, and 1 − p to decrease it. • Update n and repeat.The beauty of this algorithm, besides its correctness [7], is that no computation iswasted on time steps at which nothing happened: By definition, there is a statetransition at every chosen time. Students will probably find it reasonable that, when n is sufficiently large, we mayneglect its discrete character. Students who have been exposed to probability ideasmay also find it reasonable that in this case, the relative fluctuations of n from onerealization to the next will be small, and so n effectively behaves as a continuous,deterministic variable, subject to the differential equation d n/ d t = − k d n + k s . Thatequation predicts exponential relaxation from an initial value n to the steady value n ∗ = k s /k d with e-folding time k d : n ( t ) = n ∗ + ( n − n ∗ )e − k d t . (1)The simulation bears out this expectation (Fig. 1a,b).Actually, mRNA populations in living cells are often not large. Nevertheless,although individual realizations of n ( t ) may differ significantly, the ensemble average of many such trajectories does follow the prediction of the continuous/deterministicidealization (Fig. 1c). Within individual cells, there will be significant deviationaround that mean behavior (Fig. 1c again). In particular, the “steady” state willhave fluctuations of n that follow a Poisson distribution (Fig. 1d). That key result ismore memorable for students when they discover it empirically in a simulation thanit would be if they just watched the instructor prove it with abstract mathematics(by solving a master equation [3]).State fluctuations of the sort just mentioned may suffice to pop a more complexsystem out of one “steady” state and into a very different one. Indeed, even thesimplest living cells do make sudden, random state transitions of this sort. Such4 . . . . × P o pu l a t i o n × × .
05 0 . . . . . × P o pu l a t i o n . a bc d Figure 1:
Behavior of a birth-death process. ( a ) The bumpy traces show two examples ofsimulated time series with k s = 12 min − , k d = 0 .
015 min − , and hence n ∗ = 800. The initialpopulation was n = 3200. The smooth trace shows the exponential relaxation predicted bythe continuous, deterministic approximation (Eq. 1). ( b ) After the system comes to steadystate, there is a tight distribution of n values across 100 runs of the simulation ( bars ). The curve shows the Poisson distribution with mean n ∗ for comparison. ( c,d ) The same butwith k s = 0 .
15 min − and n = 40. Although individual instances (runs of the simulation)deviate strongly from the continuous, deterministic approximation, nevertheless the samplemean of the population n ( t ) over 150 runs does follow that prediction. The distribution ofsteady-state fluctuations is again Poisson ( curve in d). (See also [3].) Bacteria are supposedly simple organisms. The birth-death process is simple, too,and it fits with the cartoons we see in textbooks. So it is interesting to follow the re-cent discovery that the model makes quantitative predictions for mRNA productionthat were experimentally disproven [11, 12, 3].For example, recent advances in single-molecule imaging permit the direct mea-surement of n ( t ) in individual cells, and disproved the model’s prediction that thedistribution of n in the “steady” state should be Poisson (Fig. 2c). Researchersfound, however, that a simple modification of the birth-death model could accom-modate these and other discrepant data. The required extension amounts to as-suming that mRNA transcripts are generated in bursts, that the bursts themselvesare initiated with a fixed probability per unit time, and that once initiated, a burstis also terminated with a fixed probability per unit time. Although this “burstingmodel” has two additional parameters compared to the original birth-death model,nevertheless it was overconstrained by the experimental data, so its success was anontrivial test [12, 3]. Remarkably, detailed biochemical mechanisms for this be-havior were found only some years after its indirect inference [13, 14, 15, 16], animportant lesson for students to appreciate. Ask a student, “What is the big secret of life?” and the answer will probablybe “DNA,” or perhaps “evolution by natural selection.” Indeed, DNA’s high, butnot perfect, degree of stability underlies life’s ability to replicate with occasionalrandom modifications. But it is less well appreciated that the stability of a moleculeof DNA does not guarantee the accuracy of its replication and transcription. Thereis another big secret here, just as essential to life as the well known ones. In fact, awide range of molecular recognition events must have extremely high accuracy forcells and their organisms to function. Think of our immune cells, which must ignorethe vast majority of antigens they encounter (from “self”), yet reliably attack a tinysubpopulation of foreign antigens differing only slightly from the self.6 n ( t ) a log var( n ) ∞ b ln( P (0)) c Experiment − − −
100 50 100 150Time after induction, t [min] − ∞ t [min] − n (t) n Figure 2:
Indirect evidence for transcriptional bursting. ( a ) Symbols:
Time course of thenumber of mRNA transcripts in a cell, n ( t ), averaged over 50 or more cells in each of threeseparate, identical experiments. Data from the three trials are shown with three differentsymbols. All of the cells were induced to begin gene expression at time zero. The dashedcurve shows a fit of the birth-death (BD) process (Eq. 1) to data, determining the apparentsynthesis rate k s ≈ . / min and clearance rate constant k d ≈ . / min. The solid curve shows the corresponding result from a computer simulation of the bursting model discussedin Sect. 3.3 (see also [3]). ( b ) Semilog plot of the fraction of observed cells that have zerocopies of mRNA versus elapsed time. Symbols show data from the same experiments asin (a).
Dashed line:
The birth-death process predicts that initially P n(t) (0) falls with timeas exp( − k s t ), where k s has the value found by fitting the data in (a). (Degradation wasnegligible in this experiment.) The experimental data instead yield initial slope − . / min. Solid line:
Computer simulation of the bursting model. ( c ) Experiments were performed ateach of many different levels of gene induction. For each level, a cross shows the variance ofthe late-time mRNA population n ∞ versus its sample mean. This log-log plot of the datashows that they fall roughly on a line of slope 1, indicating that the Fano factor (var n ) / (cid:104) n (cid:105) is roughly a constant. The simple birth-death process predicts that this constant is equalto 1 ( dashed line ), because mean equals variance for any Poisson distribution, but the datainstead give the value ≈
5. The circle shows the result of the same simulation shown in(a,b). (Figure adapted from [3]; experimental data from [11].) without those huge andexpensive machines, despite the incessant nanoscale thermal motion!Merely intoning that a wonderful molecular machine called the ribosome ac-complishes this feat doesn’t get us over the fundamental problem: At each stepin translation, the triplet codon at the ribosome’s active site fits one of the manyavailable transfer RNA (tRNA) species somewhat better than it fits the other 19 op-tions. But the binding energy difference, which quantifies “somewhat better,” onlyamounts to two or three hydrogen bonds. This translates into a fraction of timespent bound to the wrong tRNAs that is about 1 /
100 times as great as the corre-sponding quantity for the correct amino acid [17]. If the fraction of incorrect aminoacids incorporated into a polypeptide chain were that high, then every protein copylonger than a few hundred amino acids would be defective!In fact, the error rate of amino acid incorporation is more like 10 − . The fact thatthis figure is so much smaller than the one seemingly demanded by thermodynamicsremained puzzling for decades. After all, the ribosome is rather complicated, but itis still a nanoscale machine. Which of its features could confer this vast improvementin accuracy?J. Hopfield and J. Ninio proposed an elegant physical mechanism, based on aknown but seemingly pointless feature of the ribosome [18, 19, 17]. To explore it, webegin by paraphrasing a metaphor due to U. Alon [20]. Imagine that you run an artmuseum and wish to find a mechanism that picks out Picasso lovers from among allyour museum’s visitors. You could open a door from the main hallway into a roomwith a Picasso painting. Visitors would wander in at random, but those who do notlove Picasso would not remain as long as those who do. Thus, the concentrationof Picasso lovers in the room would arrive at a steady value (with fluctuations, ofcourse) that is enriched for the desired subpopulation.To improve the enrichment factor further, you could hire an employee who occa-sionally closes the door to the main hallway, stopping the dilution of your enrichedgroup by random visitors. Then open a new exit doorway onto an empty corridor.Some of the trapped visitors will gratefully escape, but die-hard Picasso lovers willstill remain, leading to a second level of enrichment. After an appropriate time haselapsed, you can then reward everyone still in the room with, say, tickets to visitthe Picasso museum in Paris.The original authors realized that in the ribosome, the initial, reversible bindingof a tRNA was followed by a transformation analogous to closing the door in thepreceding metaphor. This transformation involved hydrolysis of a GTP (guanosine8riphosphate) molecule complexed with the tRNA, and hence it was nearly irre-versible, due to the highly nonequilibrium concentration of GTP, compared to thehydrolysis products GDP and P i (inorganic phosphate). Such hydrolysis reactionswere well known to supply the free energy needed to drive otherwise unfavorablereactions in cells, but here their role is more subtle.Hopfield and Ninio were aware that after the hydrolysis, incorporation of theamino acid was delayed and could still be preempted by unbinding of the tRNA com-plex. The existence of this pathway had previously seemed wasteful: An energy-richGTP had been “spent” without anything “useful” (protein synthesis) being done.On the contrary, however, the authors argued that this second step implementedthe mechanism in the art museum metaphor, giving the ribosome an independentsecond chance to dismiss a wrong tRNA that accidentally stayed bound long enoughto progress to this stage. After all, spending some GTPs may be a modest price topay compared to creating and then having to recycle an entire defective protein.Hopfield coined the name “kinetic proofreading” for this mechanism, but wewill refer to it as the “classic Hopfield–Ninio” (HN) mechanism because the originalterm is somewhat misleading. In chemical reaction contexts, a “kinetic” mechanismgenerally implies bias toward a product with lower activation barrier, even if itis less stable than another product with higher barrier. This preference is mostpronounced at high, far-from-equilibrium catalytic rates [21]. In contrast, the classicHN proofreading model involves two sequential thermodynamic (quasiequilibrium)discriminations. Moreover, these discriminations take place prior to reading even thevery next codon, in contrast to editorial proofreading, which generally happens afteran entire manuscript is written. (Our choice of term also distinguishes the classicscheme from later models that are sometimes also called “kinetic proofreading.”)The qualitative word-model given earlier in this section may seem promising. Butthe corresponding kinetic equations make for difficult reading and understanding.Better intuition could emerge from a presentation that stays closer to the concreteideas of discrete actors randomly arriving, binding, unbinding, and so on, visiblyimplementing the ideas behind the “museum” metaphor. The following sections willargue that stochastic simulation can realize that goal. In a nutshell, An effectively irreversible step, or at least a step far from equilibrium,gives rise to enhanced accuracy. The free energy of GTP hydrolysis isthe price paid for this accuracy.
It would also be valuable to confirm a key result of the analytic approach, whichpredicts that the enhancement of accuracy depends on GTP, GDP, and P i beingheld far from chemical equilibrium, so that the hydrolysis step is nearly irreversible(the “door shuts tightly” in the museum metaphor). In fact, the model predicts no enhancement of accuracy when this chemical driving force is low [18]. Far fromequilibrium, however, the predicted error fraction can be as low as the square of the9 GTP: Ribosome: tRNA: Amino acids: GDP: mRNA: Codon + , DT D Then shiftright and goback to a Hydrolysis (cid:31) P i D DT T
01 42 3 00 b C · GDPC · GTP P i W · GTP · R W · GDP · RR W · GTP W · GDP C · GTP · R C · GDP · R m w m w P i k w k w w w k c k c cc m c m c RRGDP, tRNAGDP, tRNA (cid:31)(cid:31) k add,c k add,w Figure 3:
Two representations of the classic Hopfield–Ninio mechanism. ( a ) Traditionalcartoon expressing the catalytic cycle of the ribosome (after [23]). ( b ) Corresponding kineticdiagram. The large pale arrows indicate the net circulation in each cycle under cellularconditions, where GTP is held out of equilibrium with GDP and P i . The symbol R denotesa ribosome complexed with mRNA; R ∗ is the corresponding complex “activated” by GTPhydrolysis. At far right, R indicates the ribosome with one additional amino acid added tothe nascent polypeptide chain. The classic Hopfield–Ninio proofreading model assumes thatunbinding rates k c and (cid:96) c are smaller than their mismatched counterparts k w and (cid:96) w , butthat other constants are all equal for correct and wrong tRNA. equilibrium value (or even a higher power if multiple rounds of sequential testingare employed). This section’s goal is to formulate the word-model of Sect. 4.1 in the context ofmRNA translation, then set up a stochastic simulation (see also [22]). Later sec-tions will show how students can explore the expectations raised at the end of thepreceding section.We will assume that a single ribosome is complexed with a single mRNA andhas arrived at a particular codon. This complex sits in an infinite bath containingseveral free, dissolved species at fixed concentrations (Fig. 3a): • C denotes correct tRNA (that is, the species that matches the codon currentlybeing read), loaded with the corresponding amino acid. We will neglect thepossibility of a tRNA being incorrectly loaded; accurate loading is the concernof another proofreading mechanism that we are not studying now [24, 25]. • W is similar to C, but refers to the wrong tRNA for the codon under study. • Some reactions form complexes of tRNA with guanosine phosphates: C · GTP,C · GDP, W · GTP, and W · GDP. (For simplicity, we suppress any mention of10longation factors, one of which, “EF-Tu,” is also included in these complexesbut is only implicit in the classic HN mechanism.)Fig. 3b denotes the ribosome-mRNA complex by R. In state , this complex isnot bound to any tRNA. (More precisely, no tRNA is bound at the “A” site of theribosome; a previously bound tRNA, together with the nascent polypeptide chain,is bound at another site (Fig. 3a), which we do not explicitly note.) Surroundingthis state, Fig. 3b shows four other states – in which the ribosome is bound to thecomplexes introduced earlier. The upper part of the figure describes wrong tRNAbinding and possible incorporation; the lower part corresponds to the correct tRNA.Horizontal arrows at the top and bottom denote hydrolysis of GTP, which is coupledto a transformation of the ribosome into an activated state, R ∗ .Although any chemical reaction is fundamentally reversible, under cellular con-ditions the concentration ratio [P i ][C · GDP] / [C · GTP] is far below the equilibriumvalue, so that the reactions in Fig. 3b are predominantly in the direction shown bythe pale arrows. This was one of the conditions in Hopfield’s original proposal.(Sect. 4.6 will explore relaxing it.)Again, we are assuming that a single ribosome bounces around this state diagramin the presence of fixed concentrations of feedstocks either imposed in vitro by theexperimenter or supplied by a cellular milieu. There are two ways to “exit themuseum exhibit by the second door”: After hydrolysis, the ribosome can reject itstRNA-GDP complex with probability per unit time (cid:96) . Or, with probability per unittime k add it can add its amino acid to the nascent polypeptide, translocate the tRNAto the second binding site, and eject any tRNA already bound there. Either way,the main binding site becomes vacant and, for the purposes of this state diagram,the ribosome returns to state .Supplementary computer code 3 implements a Gillespie simulation on the fivestates of the ribosome (Fig. 3b). To keep the project modular, we constructed a simulation code that writes its statetrajectory to a file. A second code then reads that file and creates a visual output.The first of these codes operates similarly to Sect. 3.1, but with a four-way choiceof what transition to make after each waiting interval. The second code can bealmost as simple as the one described in Sect. 2.2. However, students with moretime (perhaps in a capstone project) can make a more informative display with areasonable additional effort, as follows.The supplementary videos not only show the state that is current at the endof each video frame; they also animate the pending arrivals of new complexes thatare about to bind and the departures of old ones that have unbound without incor-poration. By this means, the videos give a rough sense of the “narrative” in thetrajectory being shown. These improvements are not difficult to add once the basic11 aiting timeProbabilitydensity [a.u.] t min Modified0
Figure 4:
Modified waiting times.
Solid line:
Exponential distribution of waiting times.
Dashed line:
Shifted exponential distribution obtained by adding the constant t min to eachsample. code is working. Alternatively, students can construct the basic version, then beshown these videos.The exponential distribution of waiting times implies that there will be episodeswith several events happening rapidly, interspersed with long pauses. For this rea-son, it is useful to view the simulation in two ways: Once with a shorter time stepthat resolves most individual events but covers only a limited time interval (sup-plementary video 3), and then with a coarser time step to see the entire synthesistrajectory (supplementary video 4).We also found it useful (solely for visualization purposes) to alter the distributionof waiting times in a simple way that relieves visual congestion without, we think,too much damage to the realism of the simulation. Our modification, shown in thesupplementary videos, was to add a small fixed delay, for example one half of onevideo frame, to every transition waiting time (Fig. 4). Following Hopfield, we initially assume that the rate constant for incorporation, k add , is the same regardless of whether the tRNA is correct or incorrect. We alsosuppose that the binding rates k (cid:48) c = k (cid:48) w and (cid:96) (cid:48) c = (cid:96) (cid:48) w also have this property; forexample, all of them may be diffusion-limited [17]. Only the unbinding rates differin the classic HN model: k w = φ − k c and (cid:96) w = φ (cid:96) c . Here φ − = φ ≈
100 is the preference factor for unbinding the wrong tRNA (relativeto the correct one). Again following Hopfield, we will also take the hydrolysis rateconstants to be equal: m (cid:48) w = m (cid:48) c (and m w = m c ). To visualize wrong incorporationswithin a reasonable time frame, we raised the probability of incorrect choices: Thepreference ratios φ − and φ were lowered from their realistic value of 100 to just5. Other values in column 4 of Table I were loosely inspired by rate constants es-timated from experimental data in a simplified form (see Appendix A.1). (Sect. 4.512 escription Symbol Name in code Classic HN Realistic model Equilibrium binding GTP complex, s − k (cid:48) c kc on
40 40 40unbinding GTP complex, s − k c kc off .
50 0.5 0.5binding GDP complex, s − (cid:96) (cid:48) c lc on − (cid:96) c lc off .
085 0.085 0.085hydrolysis and P i release, s − m (cid:48) c mhc .
01 25 0.01condensation/P i binding, s − m c msc k (cid:48) w = φ k (cid:48) c kw on φ = 1 0 .
68 1unbinding GTP k w = φ − k c kw off φ − = 5 94 5binding GDP (cid:96) (cid:48) w = φ − (cid:96) (cid:48) c lw on φ − = 1 0.0027 1unbinding GDP (cid:96) w = φ (cid:96) c lw off φ = 5 7.9 5hydrolysis m (cid:48) w = φ m (cid:48) c mhw φ = 1 0.048 1condensation m w = φ − m c msw φ − = 1 1 1incorporation, s − k add , c kaddC .
01 4.14 0.01incorporation k add , w = φ add k add , c kaddW φ add = 1 0 .
017 1
Table 1:
Illustrative values for the rates shown in Fig. 3b. The third column gives variablenames used in supplementary computer code 3. See the Appendix for discussion of thenumerical values. The fifth column follows [23, 26], who refer to our φ i as f i . The lastcolumn uses rates from the HN model, but with hydrolysis and incorporation modified tosatisfy equilibrium. will follow the experimental values more closely.) These effective rate constants areeither a constant probability per unit time (unbinding and hydrolysis) or else a prob-ability per unit time with the substrate concentration already lumped in (bindingand condensation). The values we chose were appropriate for the concentrations ofreactants present in the experiment.Supplementary videos 3–4 show the resulting behavior. Perhaps the most impor-tant impression we get from viewing these animations is that the cell is a busy place. The riot of activity, the constant binding events that end with no “progress” (andoften not even GTP hydrolysis), are hallmarks of chemical dynamics that are hardto appreciate in textbook discussions, yet vividly apparent in the simulation. This isespecially apparent in supplementary video 4, which shows a typical run of 25 aminoacid incorporations. Because there are many unproductive binding and unbindingevents in the simulation, not every event is shown in detail in video 4. However,focusing on the GDP-tRNA rejections shows that more correct tRNAs than incor-rect tRNAs make it past GTP hydrolysis, and that the few incorrect tRNAs that domake it past are quickly rejected in the second proofreading step. In the instanceshown, only one incorrect amino acid was incorporated out of 25 incorporations,much lower than the 1/5 error rate expected from single step equilibrium binding.Supplementary video 3 provides a more detailed look at this process. The videosalso show clearly the jerky, nonuniform progress of synthesis, with some amino acid13ncorporations happening after much longer delays than others. That feature is bynow well documented by single-molecule experiments.A typical run created a chain of 100 amino acids, of which 6 were wrong. Thiserror rate of ≈ (6 / .
06 is far smaller than the naive expectation of 1 /φ =0 .
20. This is the essence of the classic HN mechanism; we see it taking shapein the animation, as many wrong tRNA complexes bind but are rejected, eitherprior to or after GTP hydrolysis. We also see many correct complexes bind andget rejected, before or after GTP hydrolysis. This is the price paid for accuracyin the classic HN proofreading model. The error rate in the simulation is slightlylarger than 1 / ( φ ) = 1 / = 0 .
04, however. The discrepancy is expected, becausethe limiting value 1 / ( φ ) is only achieved in the limit as the incorporation andhydrolysis catalytic rates are sent to zero [26, 18]. Much has been learned about ribosome dynamics after Hopfield’s and Ninio’s originalinsights [27, 28]. We now know that each step in our model consists of substeps. Forexample, GTP hydrolysis is subdivided into GTPase activation followed by actualhydrolysis, the latter step probably depends on a rearrangement of “monitoringbases” in the ribosomal RNA, and so on [29].The model studied in Sect. 4.4 was designed to show the HN mechanism in its“classic,” or pure, form, and how it can enhance fidelity even without help from theeffects just described. For example, we assumed that the only dependence on rightversus wrong tRNA was via unbinding rates. Indeed, such dependence was laterseen at the single-molecule level [30]. But it now appears that some of the forwardrates also depend on the identity of the tRNA [31, 32, 33], an effect sometimescalled “internal discrimination.” In the limit that the ribosome uses only internaldiscrimination (activation barrier heights of correct and incorrect tRNA bindingdiffer and the equilibrium constants are the same), minimum error is obtained atfast catalytic rates [34]. This is in contrast to the HN scheme, which achievesminimum error as catalytic rate tends to zero.In our stochastic simulation model, it is straightforward to add internal discrim-ination effects by altering rate constants (Table I column 5). See Appendix A.2 fordiscussion of the values.Supplemental video 5 shows that the ribosome with experimentally measuredrates can be faster and more efficient than a ribosome with only classic HN proof-reading. We see both a bias for correct tRNA binding/hydrolysis and a bias forrejection of wrong tRNAs before GTP hydrolysis. Of the 26 correct tRNA bindingevents in this run, 25 resulted in successful incorporation. This is compared to thefraction 24 / .
002 of productive correct tRNA binding events in a typicalrun of the classic HN ribosome simulation. In addition, of the 30 incorrect tRNAbinding events on the realistic ribosome, all 30 resulted in rejection. The error frac-14ion of the ribosome with realistic rates is also more accurate than the classic HNribosome. Of 10 000 amino acids simulated, 18 wrong amino acids were incorporated(error fraction of 0.0018), compared to 6/100 for the classic HN ribosome. This sim-ulated error fraction of 0.0018 is consistent with the analytic prediction of 0.0017from first-passage times [26].However, the simulated ribosome with in-vitro measured rates is still not as fastor accurate as the real
E. coli ribosome in vivo, which translates at 15–20 amino acidsper second with an error rate of 1/10 000 [35]. Thus, the realistic ribosome likelyevolved to combine HN proofreading (quasiequilibrium, energetic proofreading) withinternal discrimination (unequal forward rates) to optimize speed, efficiency, andaccuracy [36]. Despite this, simulating the classic HN model of proofreading is stilla valuable exercise for students. By visualizing discrimination via only a differencein unbinding rates, students see the minimal components necessary to attain highaccuracy in a broad class of biological reactions. Also, the classic HN mechanismillustrates an essential part of biological proofreading which fundamentally relies onnon-equilibrium physics.There is also recent evidence pointing to two kinetic proofreading steps, that is,two sequential, nearly irreversible steps each of which can be followed by unbindingof tRNA [37, 38]. Our simulation could be extended to include such effects, whereasanalytic methods would quickly become intractable. Finally, additional interestingsteps arise during “translocation,” in which the previous tRNA, from which thenascent peptide chain has been released, and the current tRNA, now carrying thatchain with an additional amino acid, are both shifted one step inside the ribosome,freeing the binding site so that the entire cycle can begin again (Fig. 3a). Becausethis step is not related to accuracy, we have simplified by omitting it from our model.
For comparison, we return to the classic Hopfield–Ninio model, this time operat-ing at nearly equilibrium concentrations of GTP, GDP, and P i to demonstrate theimportance of the “one-way door” (the GTP hydrolysis step). Table I column 6summarizes the rates for this undriven model (see Appendix A.3).With these rates, the reaction still creates a chain, because we assumed a fixedprobability per time to irreversibly add an amino acid whenever the ribosome visitsits activated state. But this time a typical run gave 17 errors in a chain of length100, illustrating the significance of the thermodynamic driving force in reducing theerror rate. This error rate of about 0.17 is consistent with the Michaelis-Mentenpredicted error of 1 /φ = 0 .
20, which is the lowest the error can be for a classic HNmodel in equilibrium conditions [18].Analysis of the events in the simulation showed that of the 17 wrong amino acidsincorporated, 10 were through direct binding of GDP · tRNA. Thus, for many aminoacids, the first discrimination step was bypassed, resulting in the high error rate15bserved in the simulation.To gain more insight into the role of the irreversible GTP hydrolysis step, somestudents may wish to rerun the simulation with different incorporation and hydrol-ysis rates. For example, a simulation with m (cid:48) c = 25 and k add = 4 .
14 results in asimulation with many tRNAs flipping between GDP and GTP states, another wayin which the two discrimination steps become coupled into one.
The models described here show fairly elementary physical principles that lie at theheart of cell biology. Specifically, gene expression and kinetic proofreading are twoimportant, fundamental topics that are well within reach of undergraduates.A module that introduces stochastic simulation need not dominate a semestercourse: One class week is enough for the first exposure. Indeed, the entire simulationplus visualization in supplementary computer code 1 consists of just seven short lines of code, and yet it creates a valuable educational experience not available in a statictextbook. Moreover, the opening material is not specifically biological in character;it can serve as a stepping stone to more complex simulations relevant for a varietyof courses.
Acknowledgments
We are grateful to Ned Wingreen and Anatoly Kolomeisky for correspondence andto Bruce Sherwood for help with software. This work was partially supported bythe United States National Science Foundation under Grants PHY–1601894 andMCB–1715823. Some of the work was done at the Aspen Center for Physics, whichis supported by NSF grant PHY–1607611, and at the Physical Biology of the Cellsummer school (Marine Biological Laboratory).
A Choice of illustrative parameters
A.1 Classic HN proofreading (Sect. 4.4)
Here we outline our choice of rate parameters in Table I, column 4. Rather than pur-sue these biochemical details, students can simply be told that these are interestingvalues.Zaher and Green measured k add , c ≈ .
14 s − and m (cid:48) c ≈
25 s − [31]. Interestingly,with those forward rates and the classic HN model’s stipulation that only the un-binding rates differ, we found in the stochastic simulation that protein synthesis hada high error rate. For example, using the parameter values shown Table I, column4 but with the two forward rates above, a typical run gave 49 errors out of 1000amino acids simulated. 16n fact, this breakdown of kinetic proofreading was already predicted in Hop-field’s original work, which pointed out that the error rate will only approach 1 /φ if additional conditions are met: m (cid:48) c < ∼ k c and k add , c < ∼ (cid:96) c . This condition gives the two binding/unbinding steps, which are both quasi-equilibrium,time to reject a wrong aa-tRNA before hydrolysis or incorporation; it was not sat-isfied for the parameter values just mentioned. Thus, if the wrong aminoacyl-tRNA(aa-tRNA) bound to the ribosome, there was a low probability it would unbindbefore either GTP hydrolysis or incorporation.Thus, for the purpose of illustrating a pure HN model of proofreading, we mod-ified the experimental values of k add , c and m (cid:48) c to the ones shown in Table I.We also required illustrative values of m c and (cid:96) (cid:48) c , which were not measuredexperimentally by Zaher and Green. However, these values are constrained by ther-modynamic consistency [39], which requires that even if the reaction is run far fromequilibrium, nevertheless the reactions must be capable of creating an equilibriumstate. When a reaction graph contains a closed loop (cycle), as ours does, thiscondition requires that k (cid:48) c , eq m (cid:48) c , eq (cid:96) c , eq = k c , eq m c , eq (cid:96) (cid:48) c , eq . equilibrium (2)Assuming for simplicity that substrate-dependent reactions are first order givesln k (cid:48) c m (cid:48) c (cid:96) c k c m c (cid:96) (cid:48) c = ln [C · GDP] eq [P i ] eq [C · GTP] eq + ln [C · GTP][P i ][C · GDP] . (3)The corresponding equation for wrong tRNAs is of the same form, except with wrongtRNA rate constants used. Dividing Eq. 3 for the wrong tRNA by the equation forthe right tRNA gives the consistency condition that φ φ φ = φ − φ − φ − . Thevalues in Table I column 4 satisfy this condition.Adamcyzk and Warshel calculated the free energy change for GTP hydrolysison free EF-Tu via a molecular dynamics simulation as ∆ G (cid:48) ≈ −
18 kcal / mol [40],which we used to calculate the first term on the right. For the second term on theright side of Eq. 3, [C · GTP] was taken from Zaher and Green since the simulation’srate constants were based on those experiments. In Zaher and Green’s experiments,EF-Tu · GTP complexes were incubated with 0.5 µ M of aa-tRNA for 15 minutes be-fore injection into a stopped-flow instrument. The [C · GDP] and [P i ] can then beestimated by using[C · GDP] = [P i ] = [C · GTP] ini (1 − e − k cat t ) ≈ [C · GTP] ini k cat t, (4)where t is the incubation time of 15 minutes and k cat is the rate of GTP hydrolysison free EF-Tu · aa-tRNA complexes. Fasano et al. calculated k cat ≈ . · − s − [41]. Thus, [C · GDP] = [P i ] = 0 . µ M and [C · GTP] = 0 . µ M .17sing these non-equilibrium concentrations and the assumption that m c = (cid:96) (cid:48) c ,Eq. 3 yields the common value of 2 . · − s − . A similar argument gives the samevalues for m w and (cid:96) (cid:48) w . For the purposes of our simulation, however, these valuesare essentially zero; over the limited duration of our simulation the correspondingtransitions don’t occur. We replaced them all by another small value, 0 .
001 s.As a further confirmation that the rates chosen are appropriate for a pure Hop-field scheme, we also checked the following conditions described by Hopfield [18]: • m (cid:48) c < ∼ k c . • k add < (cid:96) c . • m (cid:48) c k (cid:48) w / ( m (cid:48) c + k (cid:48) w ) > (cid:96) (cid:48) w . • m w < ∼ (cid:96) w , k add < ∼ (cid:96) w . A.2 More realistic model (Sect. 4.5)
To model a ribosome as it might work in the cell, in this simulation we more closelyfollowed the rates measured by Zaher and Green (Table I, column 5). The rateconstants m (cid:48) c and k add , c were set to their experimentally measured values of 25 s − and 4.14 s − , respectively. The φ i values were also modified to match experimentalmeasurements [26, 23]. φ and φ were not measured in Zaher and Green, so we fol-low Banerjee et al., who chose φ = 1 and got φ by the thermodynamic consistencycondition discussed in Appendix A.1. Banerjee et al. also chose φ − = 1, then got φ − by imposing the consistency condition: φ φ φ = φ − φ − φ − . A.3 Model with no chemical driving force (Sect. 4.6)
To illustrate the importance of the GTP hydrolysis step in the HN model, rate valueswere chosen to simulate protein synthesis without GTP, GDP, and P i concentrationsout of equilibrium. The rates chosen for the equilibrium proofreading model weresimilar to those in the classic HN model, except that the equilibrium consistencycondition, Eq. 2, was used to calculate (cid:96) (cid:48) c and m c . In addition, the φ s were keptthe same as in Appendix A.1, that is, satisfying Hopfield’s condition that only theunbinding rates differs between wrong and right tRNAs. Using the same assumptionthat m c = (cid:96) (cid:48) c for simplicity yielded m c = (cid:96) (cid:48) c =0.26 s − , higher than in the classic HNmodel. Supplementary online material
Videos: ChenVideo1-BeadJump.mov at :18ideo micrograph of a micrometer-scale bead in thermal motion, hopping be-tween the minima of a symmetric double-well potential field created by anoptical trap. Courtesy Adam Simon (see [2]).2. ChenVideo2-Flipper.mov at https://vimeo.com/269210861 :Hopping in a symmetric, two-state well. Animation generated by supplemen-tary computer code 1, reminiscent of the behavior seen in video 1.3. ChenVideo3-HNslow.mp4 at https://vimeo.com/283759767 :Proofreading in the classic Hopfield–Ninio model. Animation generated bysupplementary computer code 4, from a simulated trajectory generated by sup-plementary computer code 3 with parameters discussed in Sect. 4.4. In this andthe following videos, the left green number is the frame number and the rightgreen number is the state’s sequence number. Green objects represent correcttransfer-RNA/amino acid complexes; red objects represent incorrect choices.Cylindrical objects represent complexes containing GTP, arriving from solu-tion. A sphere in the center position represents a complex containing GDP.The growing chain of spheres on the right represent amino acids incorporatedinto the nascent polypeptide. The simulation was done assuming cellular con-ditions, that is, GTP far from equilibrium with GDP and inorganic phosphate.Only a short initial time interval is shown (the first 3% of full simulation), sothat tRNA binding, unbinding, hydrolysis, and incorporation events can beseen in detail. The frame rate was chosen to be 15 frames/second, and thetotal duration was chosen to be 10 000 s (150 000 frames). The actual simu-lation time is 26 692 seconds (41 185 states). The value of t min (see Sect. 4.3)was chosen to be (0 . /
15) s = 0 .
033 s.4.
ChenVideo4-HNfast.mp4 at https://vimeo.com/284065985 :The same as in video 3, but speeded up and covering the incorporation of25 amino acids. The error rate for binding was deliberately taken to be un-realistically large, 1 /φ = 0 .
20, in order to generate some errors in a shortsimulation. The frame rate was chosen to be 30 frames/second, and the totalduration was chosen to be 600s (18 000 frames). The actual simulation time is26 692 seconds (41 185 states). The value of t min (see Sect. 4.3) was chosen tobe (0 . /
30) s = 0 .
014 s.5.
ChenVideo5-InternalDisc.mp4 at https://vimeo.com/283760592 :Proofreading in a model with internal discrimination. The same as in video3, but with parameters discussed in appendix A.2. The same chain length of25 was used and no wrong amino acids are incorporated because the realisticribosome has an error rate much lower than 1/25 (Sect. 4.5). The frame ratewas chosen to be 15 frames/second, and the total duration was chosen to be150 s (2250 frames). The actual simulation time is 8.44 seconds (138 states).The value of t min (see Sect. 4.3) was chosen to be (0 . /
15) s = 0 .
033 s.19 omputer codes:
These codes are also available from https://github.com/NelsonUpenn/PNelson code .They were written in Python version 3. The .ipynb files run in the Jupyter Note-book environment and use the VPython 7 package [4]. The .py files may be run inany Python implementation. One way to obtain VPython and Jupyter is from thefree Anaconda distribution ( http://anaconda.com ): After regular installation, issuethe command $ conda install -c vpython vpython to your operating system’s command shell. For more details see [42].1.
ChenCode1-flipper.ipynb : Simulate two-state transitions.2.
ChenCode2-transcrip.py : Simulate birth-death process.3.
ChenCode3-riboProof.py : Simulate tRNA selection in a ribosome.4.
ChenCode4-kproofBackend.ipynb : Display the result of Code 3 as an animation.
References [1] R W Chabay and B A Sherwood.
Matter and interactions . Wiley, New York,4th edition, 2015.[2] A Simon and A Libchaber. Escape and synchronization of a Brownian particle.
Phys. Rev. Lett. , 68(23):3375–3378, 1992.[3] P Nelson.
Physical models of living systems . W. H. Freeman and Co., NewYork, 2015.[4] Glowscript VPython and VPython 7. ,accessed 5/2018.[5] B Alberts, D Bray, K Hopkin, A Johnson, J Lewis, M Raff, K Roberts, andP Walter.
Essential cell biology . Garland Science, New York, 4th edition, 2014.[6] D T Gillespie. A general method for numerically simulating the stochastic timeevolution of coupled chemical reactions.
J. Comput. Phys. , 22(4):403–434, 1976.[7] D T Gillespie. Exact stochastic simulation of coupled chemical reactions.
J.Phys. Chem. , 81(25):2340–2361, 1977.[8] P J Choi, L Cai, K Frieda, and X S Xie. A stochastic single-molecule eventtriggers phenotype switching of a bacterial cell.
Science , 322(5900):442–446,2008.[9] A Eldar and M B Elowitz. Functional roles for noise in genetic circuits.
Nature ,467(7312):167–173, 2010. 2010] M E Lidstrom and M C Konopka. The role of physiological heterogeneity inmicrobial population behavior.
Nat. Chem. Biol. , 6(10):705–712, 2010.[11] I Golding, J Paulsson, S M Zawilski, and E C Cox. Real-time kinetics of geneactivity in individual bacteria.
Cell , 123(6):1025–1036, 2005.[12] I Golding and E C Cox. Chapter 8: Spatiotemporal dynamics in bacterial cells:Real-time studies with single-event resolution.
Meth. Cell Biol. , 89:223–251,2008.[13] S Chong, C Chen, H Ge, and X S Xie. Mechanism of transcriptional burstingin bacteria.
Cell , 158(2):314–326, 2014.[14] S A Sevier, D A Kessler, and H Levine. Mechanical bounds to transcriptionalnoise.
Proc. Natl. Acad. Sci. USA , 113(49):13983–13988, 2016.[15] S A Sevier and H Levine. Properties of gene expression and chromatin structurewith mechanically regulated elongation.
Nucleic Acids Res , 46(12):5924–5934,2018.[16] A Klindziuk and A B Kolomeisky, “Theoretical Investigation of the Transcrip-tional Bursting Mechanism: a Multi-State Approach.” J. Phys. Chem., submit-ted.[17] R Phillips, J Kondev, J Theriot, and H Garcia.
Physical biology of the cell .Garland Science, New York, 2d edition, 2012.[18] J J Hopfield. Kinetic proofreading: A new mechanism for reducing errors inbiosynthetic processes requiring high specificity.
Proc. Natl. Acad. Sci. USA ,71(10):4135–4139, 1974.[19] J Ninio. Kinetic amplification of enzyme discrimination.
Biochimie , 57(5):587–595, 1975.[20] U Alon.
An introduction to systems biology: Design principles of biologicalcircuits . Chapman and Hall/CRC, Boca Raton FL, 2006.[21] P Sartori and S Pigolotti. Kinetic versus energetic discrimination in biologicalcopying.
Phys. Rev. Lett. , 110(18):188101, 2013.[22] D M Zuckerman. Physical lens on the cell: Active (“kinetic”) proofreading. http://physicallensonthecell.org/cell-biology-phenomena/active-kinetic-proofreading ,accessed 5/2018.[23] K Banerjee, A B Kolomeisky, and O A Igoshin. Elucidating interplay ofspeed and accuracy in biological error correction.
Proc. Natl. Acad. Sci. USA ,114(20):5183–5188, 2017. 2124] J J Hopfield, T Yamane, V Yue, and S M Coutts. Direct experimental evidencefor kinetic proofreading in amino acylation of tRNA
Ile . Proc. Natl. Acad. Sci.USA , 73(4):1164–1168, 1976.[25] T Yamane and J J Hopfield. Experimental evidence for kinetic proofreadingin the aminoacylation of tRNA by synthetase.
Proc. Natl. Acad. Sci. USA ,74(6):2246–2250, 1977.[26] K Banerjee, A B Kolomeisky, and O A Igoshin. Accuracy of substrate selec-tion by enzymes Is controlled by kinetic discrimination.
J. Phys. Chem. Lett. ,8(7):1552–1556, 2017.[27] M V Rodnina. Mechanisms of decoding and peptide bond formation. In M VRodnina, W Wintermyer, and R Green, editors,
Ribosomes: Structure, func-tion, and dynamics , chapter 16, pages 199–212. Springer, New York, 2011.[28] I Bahar, R L Jernigan, and K A Dill.
Protein actions: Principles and modeling .Garland Science, New York, 2017.[29] P Satpati, J Sund, and J Aqvist. Structure-based energetics of mRNA decodingon the ribosome.
Biochemistry , 53(10):1714–1722, 2014.[30] S C Blanchard, R L Gonzalez, H D Kim, S Chu, and J D Puglisi. tRNA selectionand kinetic proofreading in translation.
Nat. Struct. Mol. Biol. , 11(10):1008–1014, 2004.[31] H S Zaher and R Green. Hyperaccurate and error-prone ribosomes exploitdistinct mechanisms during tRNA selection.
Mol. Cell , 39(1):110–120, 2010.[32] A Prabhakar, J Choi, J Wang, A Petrov, and J D Puglisi. Dynamic basis offidelity and speed in translation: Coordinated multistep mechanisms of elonga-tion and termination.
Protein Science , 26(7, SI):1352–1362, 2017.[33] M V Rodnina, N Fischer, C Maracci, and H Stark. Ribosome dynamics duringdecoding.
Phil. Trans. R. Soc. Lond. B, Biol. Sci. , 372(1716):20160182, 2017.[34] C H Bennett. Dissipation-error tradeoff in proofreading.
BioSystems , 11(2-3):85–91, 1979.[35] R Milo and R Phillips.
Cell Biology by the numbers . Garland Science, NewYork, 2016.[36] R Rao and L Peliti. Thermodynamics of accuracy in kinetic proofreading:dissipation and efficiency trade-offs.
J. Stat. Mech. , article P06001, 2015.[37] K-W Ieong, U Uzun, M Selmer, and M Ehrenberg. Two proofreading stepsamplify the accuracy of genetic code translation.
Proc. Natl. Acad. Sci. USA ,113(48):13744–13749, 2016. 2238] J Chen, J Choi, S E O’Leary, A Prabhakar, A Petrov, R Grosely, E V Puglisi,and J D Puglisi. The molecular choreography of protein synthesis: translationalcontrol, regulation, and pathways.
Q. Rev. Biophys. , 49:e11, 2016.[39] T L Hill.
Free energy transduction and biochemical cycle kinetics . Springer,New York, 1989.[40] A J Adamczyk and A Warshel. Converting structural information into anallosteric-energy-based picture for elongation factor Tu activation by the ribo-some.
Proc. Natl. Acad. Sci. USA , 108(24):9827–9832, 2011.[41] O Fasano, E De Vendittis, and A Parmeggiani. Hydrolysis of GTP by elongationfactor Tu can be induced by monovalent cations in the absence of other effectors.
J. Biol. Chem. , 257(6):3145–3150, 1982.[42] J M Kinder and P Nelson.