Error-speed correlations in biopolymer synthesis
Davide Chiuchiù, Yuhai Tu, and Simone Pigolotti ∗
Biological Complexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.
Synthesis of biopolymers such as DNA, RNA, and proteins is a biophysical process aided by enzymes. The performance of these enzymes is usually characterized in terms of their average error rate and speed. However, because of thermal fluctuations in these single-molecule processes, both error and speed are inherently stochastic quantities. In this paper, we study fluctuations of error and speed in biopolymer synthesis and show that they are in general correlated. This means that, under equal conditions, polymers that are synthesized faster due to a fluctuation tend to have either smaller or larger errors than the average. The error-correction mechanism implemented by the enzyme determines which of the two cases holds. For example, discrimination in the forward reaction rates tends to grant smaller errors to polymers with faster synthesis. The opposite occurs for discrimination in monomer rejection rates. Our results provide an experimentally feasible way to identify error-correction mechanisms by measuring the error-speed correlations.
Organisms encode genetic information in heteropolymers such as DNA and RNA. Replication of these heteropolymers is a non-equilibrium process catalyzed by enzymes. The crucial observables to characterize these enzymes are their error rate and speed. A low error, defined as the fraction of monomers in the copy that do not match the template, ensures correct transmission of biological information. High processing speed is also crucial to guarantee fast cell growth. Theoretical approaches have been developed to compute the average error and average speed of polymerization processes [1–7]. However, at the single-molecule level, both error and speed can present significant stochastic fluctuations.

In this Letter we address fluctuations in the error and speed of polymer synthesis. In particular, we show that correlations between these quantities exist. These correlations provide a way to identify the error correction mechanism adopted by an enzyme from experimental data. This approach can circumvent the characterization of these enzymes by measuring all kinetic rates of the underlying reaction network [8–15].

We consider an enzyme that replicates an existing template polymer by sequentially incorporating monomers into a copy polymer (Figure 1a). In a given time interval $T$, the enzyme synthesizes a copy made up of a number of monomers $L$. Because of thermal fluctuations, enzymes sometimes incorporate wrong monomers ($w$) that do not match the template, instead of the right ones ($r$). In practical cases, there can be multiple types of wrong monomers; for simplicity, we do not distinguish among them. We denote by $R$ the number of right matches and by $W$ the number of wrong matches in the copy, so that $R + W = L$. The error of the polymer copy can then be expressed as
$$ \eta = \frac{W}{L}. \tag{1} $$
We focus on two possible setups, corresponding to two idealized experiments. In the first, the enzyme replicates a given template polymer for a fixed, long time $T$. In this case, both the copy length $L$ and the error $\eta$ fluctuate.
We denote their variances by $\sigma_L^2 = \langle L^2 \rangle - \langle L \rangle^2$ and $\sigma_\eta^2 = \langle \eta^2 \rangle - \langle \eta \rangle^2$, and their covariance by $\sigma_{\eta L} = \langle \eta L \rangle - \langle \eta \rangle \langle L \rangle$, where $\langle \dots \rangle$ is an average over different realizations of the same process. Since $T$ is fixed, we quantify the correlations between error and speed with the error-length coefficient
$$ r_{\eta L} = \frac{\sigma_{\eta L}}{\sigma_L \sigma_\eta}. \tag{2} $$
In the second setup, each realization terminates when the enzyme has incorporated a fixed, large number $L$ of monomers. In this case $L$ is fixed, whereas the total duration $T$ of the copy process fluctuates. This setup represents the biological scenario where an enzyme copies a polymer of fixed length. In this case, we study the correlation between the polymerization error and speed via the coefficient
$$ r_{\eta T} = \frac{\sigma_{\eta T}}{\sigma_T \sigma_\eta}, \tag{3} $$
where $\sigma_T^2 = \langle T^2 \rangle - \langle T \rangle^2$ is the variance of $T$ and $\sigma_{\eta T} = \langle \eta T \rangle - \langle \eta \rangle \langle T \rangle$.

Our two setups are akin to two conjugate ensembles in equilibrium statistical physics. For large times (and lengths), fluctuations in these two ensembles can be related by means of large deviation theory [16]. Following this approach we obtain
$$ r_{\eta T} = - r_{\eta L} \tag{4} $$
(see SI for details). Eq. (4) implies that the two setups correspond to two equivalent ensembles. Therefore, in the following we will focus on the fixed-time setup only.
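As a minimal sketch of how the coefficient in Eq. (2) could be estimated from data, the Python snippet below computes the sample correlation between error and length from a list of $(R, W)$ counts, one pair per realization at fixed $T$. The function name `error_length_coefficient` and the toy counts are illustrative assumptions and are not taken from the original analysis.

```python
import math

def error_length_coefficient(counts):
    """Sample estimate of r_{eta L} from (R, W) pairs, one per realization.

    Hypothetical helper: uses eta = W / L and L = R + W as in Eq. (1).
    """
    lengths = [r + w for r, w in counts]
    errors = [w / (r + w) for r, w in counts]
    n = len(counts)
    mean_l = sum(lengths) / n
    mean_e = sum(errors) / n
    # Covariance and variances with 1/n normalization; the normalization
    # cancels in the ratio that defines the correlation coefficient.
    cov = sum((l - mean_l) * (e - mean_e) for l, e in zip(lengths, errors)) / n
    var_l = sum((l - mean_l) ** 2 for l in lengths) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    return cov / math.sqrt(var_l * var_e)

# Toy counts (made up): here longer copies carry a larger error fraction,
# so the estimated coefficient is positive.
toy_counts = [(95, 5), (90, 4), (100, 7), (85, 3)]
r_eta_l = error_length_coefficient(toy_counts)
```

The same function applies to the fixed-length setup if the lengths are replaced by completion times.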
FIG. 1. (a) An enzyme reads an existing heteropolymer as a template and sequentially incorporates monomers to copy it. Each incorporated monomer can be either a right ($r$) or a wrong ($w$) match with the template polymer. (b) Due to thermal fluctuations, the polymer length $L$ and error $\eta$ are random quantities at fixed completion time $T$. (c) When an enzyme produces a copy polymer with fixed length, the error $\eta$ and the time $T$ fluctuate. Scatterplots in (b) and (c) represent N = […]

To estimate $r_{\eta L}$ we first observe that the distributions of $R$ and $W$ tend to Gaussians for large $T$ due to the central limit theorem. We can therefore obtain the moments of $L = R + W$ and $\eta = W/(W + R)$ from those of $R$ and $W$. This procedure yields
$$ r_{\eta L} = \frac{(1 - 2\langle \eta \rangle)\,\sigma_{RW} + (1 - \langle \eta \rangle)\,\sigma_W^2 - \langle \eta \rangle\,\sigma_R^2}{\sqrt{\sigma_R^2 \sigma_W^2 - (\sigma_{RW})^2}}. \tag{5} $$
To compute the quantities in Eq. (5), we assume that the final chemical reaction to incorporate an $r$ or a $w$ monomer is irreversible. This assumption is realistic for most practical cases such as DNA polymerization [10, 17] and protein translation [12–14]. Our framework could be generalized to cases where the last reaction is reversible, permitting an interpretation of the results using stochastic thermodynamics [1–5, 18]. For simplicity, we also assume that the probabilities to incorporate right and wrong matches do not depend on the template monomer. Under these assumptions, we describe the polymerization process by means of the probabilities $\eta_0$ and $1 - \eta_0$ to incorporate a wrong ($w$) or a right ($r$) monomer, respectively, and the probability distributions $P(\tau \mid r)$ and $P(\tau \mid w)$ that it takes a time $\tau$ to incorporate an $r$ or a $w$ monomer, respectively. The value of $\eta_0$ and the functions $P(\tau \mid r)$ and $P(\tau \mid w)$ can be computed from the underlying reaction network [6, 19].
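The incorporation model described above can be sampled directly. The following Python sketch draws one realization $(R, W)$ of the fixed-time setup. The exponential form of the incorporation-time distributions is an assumption made only for illustration, and the function name and parameter values are hypothetical.

```python
import random

def simulate_fixed_time(eta0, mean_tau_r, mean_tau_w, total_time, rng):
    """Sample one realization (R, W) of the copy process at fixed time T.

    Assumption (illustration only): P(tau|r) and P(tau|w) are exponential
    with means mean_tau_r and mean_tau_w.
    """
    elapsed = 0.0
    right = wrong = 0
    while True:
        is_wrong = rng.random() < eta0          # wrong with probability eta0
        mean_tau = mean_tau_w if is_wrong else mean_tau_r
        elapsed += rng.expovariate(1.0 / mean_tau)
        if elapsed > total_time:                # this monomer is not completed by T
            return right, wrong
        if is_wrong:
            wrong += 1
        else:
            right += 1

rng = random.Random(0)
# Arbitrary illustrative parameters: eta0 = 0.1, <tau>_r = 1, <tau>_w = 2, T = 200.
samples = [simulate_fixed_time(0.1, 1.0, 2.0, 200.0, rng) for _ in range(500)]
mean_error = sum(w / (r + w) for r, w in samples) / len(samples)
```

For long enough $T$ the average of $W/(R+W)$ over realizations should stay close to the chosen error probability.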
With these quantities we can express the joint probability $P(R, W \mid T)$ for large $T$ as
$$ P(R, W \mid T) \approx \binom{R+W}{W} \eta_0^{W} (1 - \eta_0)^{R} \int_0^{\infty} \prod_{i=1}^{R} \prod_{j=1}^{W} d\tau_i \, d\tau_j \, P(\tau_i \mid r) P(\tau_j \mid w) \, \delta\!\left( \sum_{n=1}^{R} \tau_n + \sum_{m=1}^{W} \tau_m - T \right). \tag{6} $$
In Eq. (6), the binomial term weights the probability of incorporating $R$ right and $W$ wrong monomers. The integral term selects trajectories whose sum of incorporation times is equal to $T$.

Evaluating the average error for large $T$ gives the consistency relation $\langle \eta \rangle = \eta_0$. Computing the covariance matrix of $P(R, W \mid T)$ in the same limit (see SI) and substituting the resulting moments in Eq. (5) gives
$$ r_{\eta L} = \frac{\beta}{\sqrt{1 + \beta^2}} \tag{7} $$
with
$$ \beta = \frac{(\langle \tau \rangle_r - \langle \tau \rangle_w) \sqrt{\eta_0 (1 - \eta_0)}}{\sqrt{(1 - \eta_0)\,\sigma^2_{\tau,r} + \eta_0\,\sigma^2_{\tau,w}}}, \tag{8} $$
where $\langle \tau \rangle_r$, $\langle \tau \rangle_w$, $\sigma^2_{\tau,r}$ and $\sigma^2_{\tau,w}$ are the means and variances of $P(\tau \mid r)$ and $P(\tau \mid w)$, which we assume to be finite. We validated Eqs. (7) and (8) with stochastic simulations (see SI) and we will use them to compute error-speed correlations in the following. Expanding Eq. (8) and Eq. (7) for small $\eta_0$ leads to
$$ r_{\eta L} \approx \frac{\langle \tau \rangle_r - \langle \tau \rangle_w}{\sigma_{\tau,r}} \sqrt{\eta_0}. \tag{9} $$
Eq. (9) is our main result. It predicts that the sign of $r_{\eta L}$ depends on the sign of $(\langle \tau \rangle_r - \langle \tau \rangle_w)$ only. We will show that, in practice, the error correction mechanisms determine this sign.

Kinetic Proofreading.
Hopfield's kinetic proofreading model [20] is an elegant example of an incorporation process implementing error correction. In this model, the enzyme first captures either an $r$ or a $w$ monomer (Figure 2.a). After the initial binding, the enzyme can either reject the monomer or consume ATP to induce a conformational change. Thanks to this change, the enzyme gains a second chance to reject wrong monomers. This second rejection reaction is the kinetic proofreading and it greatly reduces the error probability $\eta_0$. This idea has been generalized to more complex proofreading models [2, 6, 19, 21–24]. Rates of forward reactions in the Hopfield model do not depend on the monomer type, whereas rejection reactions have higher rates for $w$ than for $r$ monomers (Figure 2.a). In the proofreading regime (Figure 2.a), the error probability $\eta_0$ can be estimated with first-passage time techniques [19] as
$$ \eta_0 \approx \left[ 1 + \left( \frac{k^r}{k^w} \frac{k^r_p}{k^w_p} \right)^{-1} \right]^{-1} \approx e^{-2(\Delta E_w - \Delta E_r)/k_B T}, \tag{10} $$
where the ratios $k^r / k^w$ and $k^r_p / k^w_p$ reflect the discrimination in the rejection rates (see [20] and SI), $k_B$ is the Boltzmann constant and $T$ is the temperature. Both these ratios relate to the difference $\Delta E_r - \Delta E_w$ in binding energy of $r$ and $w$ monomers through $k^r / k^w = k^r_p / k^w_p = \exp[(\Delta E_r - \Delta E_w)/k_B T]$. Outside of the error correction regime, the error is always larger than predicted by Eq. (10) [20, 25]. In the proofreading regime of the Hopfield model, error and speed fluctuations are positively correlated. In particular, the error-length coefficient always falls in the range
$$ 0 \le r_{\eta L} \le \eta_0 \left( \sqrt{\frac{1 - \eta_0}{\eta_0}} - 1 \right) \tag{11} $$
for any choice of $\eta_0$, see SI and Fig. 3. This implies that the error-speed correlations become negligible when proofreading ensures very small errors.

FIG. 2. Reaction networks for polymer synthesis. (a) Hopfield model. The kinetic rates satisfy the relations $k^r = k \exp[\Delta E_r / k_B T]$, $k^w = k \exp[\Delta E_w / k_B T]$, $k^r_p = m \exp[\Delta E_r / k_B T]$ and $k^w_p = m \exp[\Delta E_w / k_B T]$ with $k \gg m = n \ll 1$, so that the model operates in the proofreading regime [20]. (b) Protein translation model from [19] with rates extracted from [14]. Same line thickness marks reaction rates of the same order of magnitude.

Protein translation.
A standard model of protein translation is characterized by the same reactions as the Hopfield model (Figure 2.b and [19, 26, 27]). A major difference is that forward reactions discriminate between the $r$ and $w$ monomers (Table S1 and [14, 19]). Within this model we estimate the error probability as
$$ \eta_0 \approx \frac{k^w_f}{k^r_f} \left( 1 + \frac{k^w_f}{k^w_p} \right)^{-1} \tag{12} $$
(see [19] and SI). In this case, the error probability depends on the relative preference to bind $r$ rather than $w$ monomers (term $k^w_f / k^r_f$). Proofreading effectiveness over the incorporation reaction for $w$ monomers (term $k^w_f / k^w_p$) further decreases the error probability. Because of the discrimination in the forward rates, the energy difference $\Delta E_r - \Delta E_w$ does not set a lower bound to the error probability as in the Hopfield model [6]. Similar calculations as in Eq. (11) predict an error-length coefficient
$$ r_{\eta L} \approx -\frac{1}{\sqrt{2}} \left( 1 + \frac{k^w_p}{k^w_f} \right)^{-1/2} \tag{13} $$
(SI and Figure 3). At variance with the Hopfield model, the error-length coefficient is always negative in protein translation. This striking difference arises from the discrimination in the forward rates, as further clarified in the following. Also in this case, the absolute value of the error-speed correlations decreases at increasing proofreading efficiency. Ribosomes with impaired kinetic proofreading should then exhibit stronger error-speed correlations. A computation of error-speed correlations from experimentally measured rates for different E. coli strains supports Eq. (13) (Fig. 3).

FIG. 3. The Hopfield model and the protein translation model have opposite error-speed correlations. (a) Hopfield model. The gray shaded region defines the allowed values of $r_{\eta L}$ for a given error probability $\eta_0$, see Eq. (11). Black crosses are estimates of $\eta_0$ and $r_{\eta L}$ for 60 random sets of reaction rates in the proofreading regime (see caption of Figure 2, SI, and Table S2). (b) Protein translation. To test Eq. (13) (gray line), we computed $r_{\eta L}$ with the kinetic rates in [19] for wild type E. coli (WT), a hypercorrective (HYP) and an error-prone (ERR) mutation (blue crosses). We also evaluated $r_{\eta L}$ for randomly generated sets of the reaction rates in Figure 2.b (black squares). For all points in both panels, correlation coefficients are evaluated by means of Eqs. (7)-(8) upon computing moments of incorporation times with first-passage time techniques [19]. See SI for details of numerical calculations.
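The opposite signs found for the two models follow from the sign of $\langle \tau \rangle_r - \langle \tau \rangle_w$ in Eqs. (7)-(9). The Python sketch below evaluates these equations for given moments of the incorporation times. The function names and all numerical values are assumptions chosen for illustration; they are not derived from measured rates.

```python
import math

def beta(eta0, mean_r, mean_w, var_r, var_w):
    """Eq. (8): beta from eta0 and the moments of P(tau|r) and P(tau|w)."""
    spread = (1.0 - eta0) * var_r + eta0 * var_w
    return (mean_r - mean_w) * math.sqrt(eta0 * (1.0 - eta0)) / math.sqrt(spread)

def r_eta_l(eta0, mean_r, mean_w, var_r, var_w):
    """Eq. (7): error-length coefficient r = beta / sqrt(1 + beta^2)."""
    b = beta(eta0, mean_r, mean_w, var_r, var_w)
    return b / math.sqrt(1.0 + b * b)

def r_eta_l_small_error(eta0, mean_r, mean_w, var_r):
    """Eq. (9): leading-order approximation of r for small eta0."""
    return (mean_r - mean_w) / math.sqrt(var_r) * math.sqrt(eta0)

# Illustrative case 1: wrong monomers take longer on average, so r < 0.
slow_wrong = r_eta_l(1e-4, 1.0, 3.0, 1.0, 9.0)
slow_wrong_approx = r_eta_l_small_error(1e-4, 1.0, 3.0, 1.0)

# Illustrative case 2: right monomers take longer on average, so r > 0.
slow_right = r_eta_l(1e-4, 3.0, 1.0, 9.0, 1.0)
```

In both cases the magnitude of the coefficient scales as $\sqrt{\eta_0}$ for small $\eta_0$, which is why strong error correction suppresses the correlations.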
Core network.
In both models we considered, kinetic proofreading reduces the absolute value of the error-length coefficient without changing its sign. To show this effect in general, we consider an arbitrary reaction network where we identify some of the reaction steps as those implementing kinetic proofreading (Fig. 4.a). For example, in both models of Fig. 2, the proofreading reactions are those with rates $k^r_p$ and $k^w_p$. The complete network has an error probability $\eta_0$ and an error-length coefficient $r_{\eta L}$. We now remove all the proofreading reactions and define the remaining reactions as the "core network". In many practical cases the core network is a simple linear chain of reactions, so that it is easy to compute its error probability $\eta_0^{\rm core}$ and its error-length coefficient $r^{\rm core}_{\eta L}$. To compare $r_{\eta L}$ and $r^{\rm core}_{\eta L}$ we assume that both $\eta_0$ and $\eta_0^{\rm core}$ are small so that Eq. (9) holds. We further assume that proofreading is a relatively rare event that does not significantly influence the incorporation times. Taking the ratio $r_{\eta L} / r^{\rm core}_{\eta L}$ we therefore obtain
$$ r_{\eta L} \approx \sqrt{\frac{\eta_0}{\eta_0^{\rm core}}} \; r^{\rm core}_{\eta L}. \tag{14} $$
Since proofreading reduces the error probability ($\eta_0 < \eta_0^{\rm core}$), it also reduces the absolute value of the error-length coefficient without changing its sign. We tested our prediction by computing $r_{\eta L}$ from experimentally measured kinetic rates in E. coli ribosomes (Table S1, SI, and [14, 19]) and from the T7 DNA polymerase [10] (see Figure S1 for the T7 datum). Eq. (14) qualitatively captures the dependence of the error-length coefficient on the error-correction effectiveness (Fig. 4.b). Quantitative discrepancies arise because the assumption that proofreading does not affect incorporation times partially breaks down.

The core-network approach qualitatively explains why the error-speed correlations have different signs in the Hopfield model (Figure 2.a) and in the protein translation model (Figure 2.b).
Because of the discrimination in the backward rates, $r$ monomers bind to the enzyme for a long time in the core network of the Hopfield model before the final incorporation. On the other hand, $w$ monomers bind to the enzyme for a short time before they are either rejected or incorporated. This implies that $r^{\rm core}_{\eta L} > r_{\eta L} > 0$, as shown in Figure 3. Conversely, the discrimination in the forward rates grants a fast incorporation of $r$ monomers in the core network of the protein translation model. Thus, $r^{\rm core}_{\eta L} < r_{\eta L} < 0$.

FIG. 4. Proofreading suppresses the error-speed correlations. (a) Incorporation with a "core network" complemented by proofreading reactions. Each block in the figure represents an arbitrary sub-network with an average flux in the direction of the arrows. (b) Comparison of Eq. (14) (solid line) with computation of error-speed correlations from measured kinetic rates, see SI. We considered the ribosomes in three strains of E. coli: wild type, hypercorrective, and error-prone [14, 19]. For each strain, we built the core network by removing the proofreading reactions and computed the relative change in $r_{\eta L}$ and $\eta_0$ between the original and core networks. We performed the same analysis for a model of T7 DNA polymerase (blue square, see SI and [8]). The data qualitatively agree with Eq. (14).

[…] correlations could reveal the presence of forward discrimination in replicative enzymes. Cell-free translation systems [29, 30] could provide simple and versatile in vitro assays to perform these measurements for ribosomes. A possible experiment would be to let the system translate for a fixed short time, separate the products into shorter and longer peptides, and then measure using mass spectrometry whether the two categories contain significantly different errors. Similar experiments for DNA polymerases could bring insight into poorly characterized chemical reaction networks, such as those of human mitochondrial DNA pol-γ [31], yeast pol-ε [32] and pol-δ [33, 34].

The magnitude of the error-speed correlations decreases when proofreading effectiveness increases. This implies that proofreading-deficient enzymes [31–34] and in vitro assays that favor mis-incorporation [11, 28] are best suited to test our theory, for two reasons. First, the increased magnitude of error-speed correlations in the absence of error correction makes them easier to measure. Second, the poor precision of proofreading-deficient enzymes [10] reduces the sample size needed to empirically estimate error fluctuations.

Our result may also have consequences for the evolution of genomes. Recent work showed that cells that replicate earlier thanks to environmental fluctuations contribute more to population growth [35]. With significant error-speed correlations, the growth of a population could then be driven by the individuals whose DNA and proteins have significantly different error fractions from the population average.
This phenomenon could have played a role in early stages of life.

We underline the conceptual difference between our results and the speed-error trade-off [5, 6, 19, 25], in particular as observed in protein translation [26, 28, 36]. In translation, tuning the concentration of Mg²⁺ ions provokes an approximately linear trade-off between the average error and the average reaction speed [26]. This kind of trade-off may depend on the choice of a control parameter [6, 19]. In contrast, we have shown that fluctuations of velocity and error are negatively correlated in protein translation for fixed external parameters. It remains to be explored whether the two results can be generally connected, in a similar fashion as equilibrium fluctuations and responses to external forces are related in statistical physics [37, 38].

We thank Michael Baym, Lucas Carey, Antonio Celani, Massimo Cencini, Todd Gingrich, and Jordan Horowitz for discussion. We further thank S. Aird and P. Laurino for comments on the manuscript. This work was supported by JSPS KAKENHI Grant Number JP18K03473 (to DC and SP).

[1] C. H. Bennett, BioSystems , 85 (1979).
[2] D. Andrieux and P. Gaspard, Proceedings of the National Academy of Sciences , 9516 (2008).
[3] F. Cady and H. Qian, Physical Biology , 036011 (2009).
[4] M. Esposito, K. Lindenberg, and C. Van den Broeck, Journal of Statistical Mechanics: Theory and Experiment , P01008 (2010).
[5] P. Sartori and S. Pigolotti, Phys. Rev. X , 041039 (2015).
[6] S. Pigolotti and P. Sartori, Journal of Statistical Physics , 1167 (2016).
[7] P. Gaspard, Physical Review Letters , 238101 (2016).
[8] T. A. Kunkel and K. Bebenek, Annual Review of Biochemistry , 497 (2000), PMID: 10966467.
[9] T. A. Kunkel and D. A. Erie, Annual Review of Genetics , 291 (2015), PMID: 26436461.
[10] M. F. Goodman, S. Creighton, L. B. Bloom, J. Petruska, and D. T. A. Kunkel, Critical Reviews in Biochemistry and Molecular Biology , 83 (1993), PMID: 8485987.
[11] M. V. Rodnina and W.
Wintermeyer, Annual Review of Biochemistry , 415 (2001), PMID: 11395413.
[12] K. B. Gromadski and M. V. Rodnina, Molecular Cell , 191 (2004).
[13] H. S. Zaher and R. Green, Cell , 746 (2009).
[14] T. Pape, W. Wintermeyer, and M. Rodnina, The EMBO Journal , 3800 (1999).
[15] H. L. Gahlon, A. R. Walker, G. A. Cisneros, M. H. Lamers, and D. S. Rueda, Phys. Chem. Chem. Phys. , 26892 (2018).
[16] T. R. Gingrich and J. M. Horowitz, Phys. Rev. Lett. , 170601 (2017).
[17] S. S. Patel, I. Wong, and K. A. Johnson, Biochemistry , 511 (1991).
[18] J. M. Poulton, P. R. ten Wolde, and T. E. Ouldridge, Proceedings of the National Academy of Sciences , 1946 (2019).
[19] K. Banerjee, A. B. Kolomeisky, and O. A. Igoshin, Proceedings of the National Academy of Sciences , 5183 (2017).
[20] J. J. Hopfield, Proceedings of the National Academy of Sciences , 4135 (1974).
[21] P. Gaspard and D. Andrieux, The Journal of Chemical Physics , 044908 (2014).
[22] R. Rao and M. Esposito, New Journal of Physics , 023007 (2018).
[23] P. Sartori and S. Pigolotti, Physical Review Letters , 188101 (2013).
[24] R. Rao and L. Peliti, Journal of Statistical Mechanics: Theory and Experiment , P06001 (2015).
[25] A. Murugan, D. A. Huse, and S. Leibler, Proceedings of the National Academy of Sciences , 12034 (2012).
[26] M. Johansson, M. Lovmar, and M. Ehrenberg, Current Opinion in Microbiology , 141 (2008), cell Regulation.
[27] Y. Savir and T. Tlusty, Cell , 471 (2013).
[28] J. Zhang, K.-W. Ieong, M. Johansson, and M. Ehrenberg, Proceedings of the National Academy of Sciences , 9602 (2015).
[29] Y. Shimizu, T. Kanamori, and T. Ueda, Methods , 299 (2005), engineering Translation.
[30] S. Uemura, R. Iizuka, T. Ueno, Y. Shimizu, H. Taguchi, T. Ueda, J. D. Puglisi, and T. Funatsu, Nucleic Acids Research , e70 (2008).
[31] H. R. Lee and K. A. Johnson, Journal of Biological Chemistry , 36236 (2006).
[32] K. Shimizu, K. Hashimoto, J. M. Kirchner, W. Nakai, H. Nishikawa, M. A. Resnick, and A.
Sugino, Journal of Biological Chemistry , 37422 (2002).
[33] L. M. Dieckman, R. E. Johnson, S. Prakash, and M. T. Washington, Biochemistry , 7344 (2010), PMID: 20666462.
[34] K. Hashimoto, K. Shimizu, N. Nakashima, and A. Sugino, Biochemistry , 14207 (2003), PMID: 14640688.
[35] M. Hashimoto, T. Nozoe, H. Nakaoka, R. Okura, S. Akiyoshi, K. Kaneko, E. Kussell, and Y. Wakamoto, Proceedings of the National Academy of Sciences , 3251 (2016).
[36] M. Johansson, J. Zhang, and M. Ehrenberg, Proceedings of the National Academy of Sciences , 131 (2012).
[37] H. B. Callen and T. A. Welton, Phys. Rev. , 34 (1951).
[38] R. Balescu, Equilibrium and Non-Equilibrium Statistical Mechanics, Wiley-Interscience publication (John Wiley & Sons, 1975).

Supplementary Information for: Error-speed correlations in biopolymer synthesis
This document contains additional material supplementing the manuscript "Error-speed correlations in biopolymer synthesis" (from now on "Main Text").

The document is organized as follows. In Section I we prove the equivalence between fixed-length and fixed-time setups. In Section II we derive our main result on error-speed correlations, i.e. Eqs. (7)-(8) in the Main Text. In Section III we discuss how to compute moments of the incorporation time distributions using first-passage time techniques. In Section IV we discuss numerical computations. In particular, we present simulations to validate Eq. (4) and Eqs. (7)-(8), and give details on the computation of the data in Figures 3 and 4. Section V presents the computation of the error rates for proofreading models. Section VI provides the reaction network for the T7 DNA polymerase together with the measured kinetic rates (Fig. SI.3). It also shows how the datum in Fig. 4 of the Main Text was computed. In Tables SI.2 and SI.3 we provide details on the distribution of the kinetic rates used in Fig. 3 of the Main Text.
I. EQUIVALENCE BETWEEN FIXED-LENGTH AND FIXED-TIME SETUPS
To prove the equivalence of the fixed-length and fixed-time setups, we assume that the distribution of $\{\mathcal{L} = L/T, \mathcal{W} = W/T\}$ at fixed $T$, and that of $\{\mathcal{T} = T/L, \eta = W/L\}$ at fixed $L$, satisfy large deviation principles [9]
$$ P(\mathcal{L}, \mathcal{W} \mid T) \asymp \exp[-T \, I(\mathcal{L}, \mathcal{W})] \tag{SI.1a} $$
$$ P(\mathcal{T}, \eta \mid L) \asymp \exp[-L \, \phi(\mathcal{T}, \eta)] \tag{SI.1b} $$
where $\asymp$ indicates the leading behavior of the distributions for large $T$ and $L$. The rate functions $I$ and $\phi$ attain their minimum at the average values of $\{\mathcal{L}, \mathcal{W}\}$ and $\{\mathcal{T}, \eta\}$, respectively. Their Hessian matrices evaluated at the minimum are proportional to the inverse covariance matrices of $\{\mathcal{L}, \mathcal{W}\}$ and $\{\mathcal{T}, \eta\}$.

To connect $I$ and $\phi$ we use that, for large times $T$, the distribution of a given observable $X$ determines the time distribution to observe a fixed value of $X$ [4]. In particular,
$$ \psi\!\left(\frac{T}{X}\right) = \frac{T}{X} \, J\!\left(\frac{X}{T}\right) \tag{SI.2} $$
where $\psi$ and $J$ are the rate functions of the intensive variables $T/X$ and $X/T$ for large values of $X$ and $T$, respectively. We assume that this result generalizes to joint distributions and apply it to $I$ and $\phi$. This yields
$$ \phi(\mathcal{T}, \eta) = \mathcal{T} \, I\!\left(\frac{1}{\mathcal{T}}, \frac{\eta}{\mathcal{T}}\right). \tag{SI.3} $$
We now perform the change of variable $\mathcal{W} = \eta \mathcal{L}$ in Eq. (SI.1a) to obtain the joint probability $P(\mathcal{L}, \eta \mid T)$ of $\mathcal{L}$ and $\eta$ at a fixed time $T$. At the leading order $P(\mathcal{L}, \eta \mid T) \asymp \exp[-T \, Q(\mathcal{L}, \eta)]$, where
$$ Q(\mathcal{L}, \eta) = I(\mathcal{L}, \eta \mathcal{L}). \tag{SI.4} $$
Combining Eq. (SI.3) and Eq. (SI.4) and expressing variances and covariances using the Hessian matrices of $\phi$ and $Q$, we obtain
$$ r_{\eta \mathcal{T}} = \left. \frac{-\partial_{\mathcal{T}\eta} \phi}{\sqrt{\partial_{\mathcal{T}\mathcal{T}} \phi \; \partial_{\eta\eta} \phi}} \right|_{\min \phi} = \left. \frac{\partial_{\mathcal{L}\eta} Q}{\sqrt{\partial_{\mathcal{L}\mathcal{L}} Q \; \partial_{\eta\eta} Q}} \right|_{\min Q} = - r_{\eta \mathcal{L}}, \tag{SI.5} $$
which is equivalent to
$$ r_{\eta T} = - r_{\eta L} \tag{SI.6} $$
when passing to extensive variables. Equation (SI.6) has been validated using numerical simulations for different incorporation schemes. See Section IV for details.
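As an informal numerical illustration of Eq. (SI.6), the Python sketch below samples both setups for a minimal incorporation model in which each monomer is wrong with probability $\eta_0$ and incorporation times are exponential. This is a simplifying assumption made for illustration and is not the Gillespie simulation of Section IV; all function names and parameter values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def draw_monomer(eta0, mean_r, mean_w, rng):
    """Return (is_wrong, tau) for one incorporation; exponential times assumed."""
    is_wrong = rng.random() < eta0
    tau = rng.expovariate(1.0 / (mean_w if is_wrong else mean_r))
    return is_wrong, tau

def fixed_length_run(length, eta0, mean_r, mean_w, rng):
    """One realization at fixed length L: returns (eta, T)."""
    wrong, elapsed = 0, 0.0
    for _ in range(length):
        is_wrong, tau = draw_monomer(eta0, mean_r, mean_w, rng)
        wrong += is_wrong
        elapsed += tau
    return wrong / length, elapsed

def fixed_time_run(total_time, eta0, mean_r, mean_w, rng):
    """One realization at fixed time T: returns (eta, L)."""
    wrong, length, elapsed = 0, 0, 0.0
    while True:
        is_wrong, tau = draw_monomer(eta0, mean_r, mean_w, rng)
        elapsed += tau
        if elapsed > total_time:        # last monomer not completed by T
            return wrong / length, length
        wrong += is_wrong
        length += 1

rng = random.Random(1)
params = (0.3, 1.0, 4.0)                # eta0, <tau>_r, <tau>_w (arbitrary)
runs_l = [fixed_length_run(150, *params, rng) for _ in range(1000)]
runs_t = [fixed_time_run(285.0, *params, rng) for _ in range(1000)]
r_eta_t = pearson(*zip(*runs_l))        # error-time coefficient at fixed L
r_eta_l = pearson(*zip(*runs_t))        # error-length coefficient at fixed T
```

With wrong monomers slower than right ones, the two estimates are expected to have opposite signs and similar magnitudes, up to sampling noise and finite-size corrections.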
II. DERIVATION OF THE GENERAL FORMULA FOR THE ERROR-SPEED CORRELATIONS
The probability to have produced a copy polymer made up of a number $R$ of right monomers and a number $W$ of wrong ones at a given time $T$ can be approximated as
$$ P(R, W \mid T) \approx \binom{R+W}{W} \eta_0^{W} (1 - \eta_0)^{R} \int_0^{\infty} \delta\!\left( \sum_{n=1}^{R} \tau_n + \sum_{m=1}^{W} \tau_m - T \right) \left[ \prod_{i=1}^{R} P(\tau_i \mid r) \, d\tau_i \right] \left[ \prod_{j=1}^{W} P(\tau_j \mid w) \, d\tau_j \right]. \tag{SI.7} $$
Here, the binomial term counts all the possible orderings of $R$ right monomers and $W$ wrong ones. The integral term with the Dirac delta function $\delta(\cdot)$ isolates the trajectories with the prescribed number of right and wrong monomers at time $T$. We used the approximation sign since Eq. (SI.7) implicitly assumes that the sum of incorporation times is exactly equal to $T$, whereas in practice it can be equal to $T' < T$ if there are no further incorporations in the time interval $[T', T]$. Representing the delta function as
$$ \delta(x) = \int_{-\infty}^{\infty} \frac{e^{isx}}{2\pi} \, ds \tag{SI.8} $$
and swapping the integration order we obtain
$$ P(R, W \mid T) \sim \binom{R+W}{W} \eta_0^{W} (1 - \eta_0)^{R} \int_{-\infty}^{\infty} ds \, \exp\!\left[ R \ln \tilde{P}(s \mid r) + W \ln \tilde{P}(s \mid w) - isT \right] \tag{SI.9} $$
where
$$ \tilde{P}(s \mid x) = \int_0^{\infty} d\tau \, P(\tau \mid x) \, e^{is\tau} \tag{SI.10} $$
is the characteristic function of $\tau$ conditioned on the incorporation of monomer $x$, whose logarithm is the cumulant generating function. We assume that both $P(\tau \mid r)$ and $P(\tau \mid w)$ have a finite mean and variance. Under this hypothesis, the central limit theorem ensures that the sum of random incorporation times in Eq. (SI.7) tends to a Gaussian random variable. This implies that we can truncate the cumulant generating function $\ln \tilde{P}(s \mid x)$ as
$$ \ln \tilde{P}(s \mid x) \sim i \langle \tau \rangle_x s - \frac{\sigma^2_{\tau,x} s^2}{2}. \tag{SI.11} $$
Substituting (SI.11) into (SI.9), using the Stirling approximation, and omitting sub-dominant terms finally gives the expression
$$ P(R, W \mid T) \approx \exp\!\left[ -(R + W) \, D^{\eta,\eta_0}_{\rm KL} - \frac{(\langle \tau \rangle_r R + \langle \tau \rangle_w W - T)^2}{2 (R \sigma^2_{\tau,r} + W \sigma^2_{\tau,w})} \right] \tag{SI.12} $$
where
$$ D^{\eta,\eta_0}_{\rm KL} = -(1 - \eta) \ln\!\left[ \frac{1 - \eta_0}{1 - \eta} \right] - \eta \ln\!\left( \frac{\eta_0}{\eta} \right) \tag{SI.13} $$
is the Kullback-Leibler divergence between the error probability $\eta_0$ and the error $\eta = W/(R + W)$.
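The step from the binomial weight to the Kullback-Leibler term can be made explicit. As a sketch, writing $L = R + W$ and $\eta = W/L$, and using Stirling's formula in the form $\ln n! \approx n \ln n - n$ with sub-dominant terms dropped:

```latex
\begin{aligned}
\ln\left[\binom{L}{W}\,\eta_0^{W}(1-\eta_0)^{R}\right]
 &\approx L\ln L - W\ln W - R\ln R + W\ln\eta_0 + R\ln(1-\eta_0) \\
 &= -L\left[\eta\ln\frac{\eta}{\eta_0} + (1-\eta)\ln\frac{1-\eta}{1-\eta_0}\right]
  = -L\,D^{\eta,\eta_0}_{\rm KL}.
\end{aligned}
```

The second line follows by substituting $W = \eta L$ and $R = (1 - \eta) L$, which makes all terms proportional to $L \ln L$ cancel.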
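To compute $\sigma_R^2$, $\sigma_W^2$ and $\sigma_{RW}$ we approximate Eq. (SI.12) as a bivariate Gaussian distribution around its maximum. This gives
$$ \sigma_R^2 = C \, \frac{\eta_0 \langle \tau \rangle_w^2 + (1 - \eta_0) \left( (1 - \eta_0) \sigma^2_{\tau,r} + \eta_0 \sigma^2_{\tau,w} \right)}{\eta_0} \tag{SI.14a} $$
$$ \sigma_W^2 = C \, \frac{(1 - \eta_0) \langle \tau \rangle_r^2 + \eta_0 \left( (1 - \eta_0) \sigma^2_{\tau,r} + \eta_0 \sigma^2_{\tau,w} \right)}{1 - \eta_0} \tag{SI.14b} $$
$$ \sigma_{RW} = C \left[ (1 - \eta_0) \sigma^2_{\tau,r} + \eta_0 \sigma^2_{\tau,w} - \langle \tau \rangle_r \langle \tau \rangle_w \right] \tag{SI.14c} $$
where $C$ is a multiplicative factor. Substituting these expressions in Eq. (5) of the Main Text finally gives our main result, Eqs. (7)-(8). We also validated Eqs. (7)-(8) numerically, see Section IV.

III. MOMENTS OF THE INCORPORATION TIMES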
Analytical expressions for the first and second moments of the incorporation times are necessary to numerically evaluate Eqs. (7)-(8) in real cases. To derive such quantities, we treat monomer incorporation as a first-passage problem. We consider the probability $P_{x,i}(\tau)$ that incorporation takes a time $\tau$ given that the incorporated monomer is $x \in \{r, w\}$ and the initial state of the network is $i$. In the theory of stochastic processes, $P_{x,i}(\tau)$ represents the first-passage time distribution to reach the absorbing state $x$ from state $i$ [1, 2]. The Laplace transforms $\tilde{P}_{x,i}(s) = \int_0^{\infty} e^{-s\tau} P_{x,i}(\tau) \, d\tau$ evolve according to [1, 2, 8]
$$ s \tilde{P}_{x,i}(s) = \sum_{j \notin \{i, r^*, w^*\}} k_{j,i} \left[ \tilde{P}_{x,j}(s) - \tilde{P}_{x,i}(s) \right] + k_{x^*,i} - (k_{r^*,i} + k_{w^*,i}) \, \tilde{P}_{x,i}(s) \tag{SI.15} $$
where $k_{j,i}$ is the rate of the reaction from state $i$ to $j$, "0" labels the network state where monomer incorporation starts, while $r^*$ and $w^*$ label the states where right and wrong monomers are finally incorporated, respectively. This means that the conditional incorporation time distribution used in the Main Text is proportional to $P_{x,0}(\tau)$, with the normalization $\tilde{P}_{x,0}(0)$ appearing explicitly below. From the solution of Eq. (SI.15) we obtain [1, 2, 8]
$$ \eta_0 = \tilde{P}_{w,0}(0), \qquad \langle \tau \rangle_x = - \frac{(\partial_s \tilde{P}_{x,0})(0)}{\tilde{P}_{x,0}(0)}, \qquad \sigma^2_{\tau,x} = \frac{(\partial_{ss} \tilde{P}_{x,0})(0)}{\tilde{P}_{x,0}(0)} - \langle \tau \rangle_x^2. \tag{SI.16} $$

IV. NUMERICAL COMPUTATIONS
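One way to evaluate Eq. (SI.16) numerically is to solve Eq. (SI.15) and its first two derivatives at $s = 0$ as linear systems. The Python sketch below does this with a small Gaussian-elimination routine. It is only an illustration under stated assumptions: the function names, the data layout (`rates[i][j]` is the rate from state `i` to state `j`) and the three-state example network with its rates are hypothetical and are not the authors' code or data.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def incorporation_moments(rates, exit_r, exit_w):
    """Return eta0 and the (mean, variance) of tau conditioned on r and on w.

    rates[i][j]: rate from transient state i to transient state j (i != j).
    exit_r[i], exit_w[i]: rates from state i to final incorporation of r or w.
    State 0 is the state where each incorporation starts.
    """
    n = len(rates)
    gen = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        gen[i][i] = -(sum(rates[i][j] for j in range(n) if j != i)
                      + exit_r[i] + exit_w[i])
    result = {}
    for label, exits in (("r", exit_r), ("w", exit_w)):
        prob = solve(gen, [-e for e in exits])          # Laplace transform at s = 0
        first = solve(gen, [-p for p in prob])          # minus its first derivative
        second = solve(gen, [-2.0 * m for m in first])  # its second derivative
        mean = first[0] / prob[0]
        result[label] = (prob[0], mean, second[0] / prob[0] - mean ** 2)
    eta0 = result["w"][0]
    return eta0, result["r"][1:], result["w"][1:]

# Hypothetical network: 0 = free enzyme, 1 = r bound, 2 = w bound.
rates = [[0.0, 5.0, 5.0],
         [1.0, 0.0, 0.0],
         [4.0, 0.0, 0.0]]
eta0, (mean_r, var_r), (mean_w, var_w) = incorporation_moments(
    rates, exit_r=[0.0, 2.0, 0.0], exit_w=[0.0, 0.0, 2.0])
```

The returned moments can be inserted into Eqs. (7)-(8) of the Main Text. For networks with many states a dedicated linear-algebra library would be a more natural choice than the hand-written solver used here.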
To validate Eq. (4) we considered two different incorporation processes. In the first, incorporation requires two consecutive irreversible reactions (Network 1 in Figure SI.1.a). In the second process, incorporation follows the Michaelis-Menten enzyme dynamics (Network 2 in Figure SI.1.a). For both processes, we randomly and independently generated the reaction rates in the range […] 36 times for each reaction network, and then computed $r_{\eta L}$ and $r_{\eta T}$ by averaging over 5000 trajectories of the polymerization process for each set of rates. Stochastic trajectories are simulated with the Gillespie algorithm [3]. For the setup at fixed length, we stopped each simulation when the polymer has reached a length $L =$ […]. For the setup at fixed time, $T$ is such that $\langle L \rangle =$ […].

To validate Eqs. (7)-(8), we compared the values of $r_{\eta L}$ from the numerical simulations in Figure SI.1.b with their corresponding predictions obtained by substituting Eqs. (SI.16) in Eqs. (7)-(8). Theoretical predictions agree with the simulated data (see Figure SI.1.c), supporting the validity of Eqs. (7)-(8).

To generate the data points shown in Figures 3 and 4, we use Eq. (SI.16) and Eqs. (7)-(8) to evaluate $\eta_0$ and $r_{\eta L}$ for the two reaction networks in Figure 2 with given kinetic rates. Data for the Hopfield model (Figure 3.a) are obtained from 60 independent random choices of the reaction rates (see Table SI.2 for the generation rules). We also generated 60 random sets of reaction rates for protein translation (black squares in Figure 3.b), see Table SI.3. In the same figure, blue crosses represent actual experimentally measured rates in different strains of E. coli [1]. The datum for the T7 DNA polymerase is obtained in the same way but using a different reaction network. See Section VI for details.

V. ERROR-SPEED CORRELATIONS IN PROOFREADING MODELS
Hopfield model:
We now apply Eq. (SI.15) and Eq. (SI.16) to the Hopfield model of Figure 2a. Combining the expressions for $\eta_0$, $\langle \tau \rangle_x$ and $\sigma^2_{\tau,x}$ together with Eq. (7) and Eq. (8), taking the limit $k \to \infty$, and expanding for small $n$ gives
$$ \eta_0 \approx \left( 1 + e^{2(\Delta E_w - \Delta E_r)/k_B T} \right)^{-1} \tag{SI.17a} $$
$$ r_{\eta L} \approx \frac{\left( e^{\Delta E_w/k_B T} - e^{\Delta E_r/k_B T} \right) n}{e^{2\Delta E_w/k_B T} \left( 1 + e^{\Delta E_r/k_B T} \right) + e^{2\Delta E_r/k_B T} \left( 1 + e^{\Delta E_w/k_B T} \right) + e^{2(\Delta E_w + \Delta E_r)/k_B T}}. \tag{SI.17b} $$
To obtain Eq. (11) of the Main Text we observe that $r_{\eta L} \ge 0$ when $\Delta E_r \le \Delta E_w$. Moreover $n \le \exp[\Delta E_r / k_B T]$ because the probability to incorporate an $r$ monomer must be larger than the probability of proofreading it in useful operating regimes. Considering then the largest possible value for $n$ in Eq. (SI.17b), taking the limit $\Delta E_r \ll -$ […]
FIG. SI.1: Numerical tests of Eq. (4) and Eqs. (7)-(8). (a) Reaction networks for synthesis. Network 1 (left): monomer incorporation takes place after two consecutive irreversible reactions. Network 2 (right): monomer incorporation follows the Michaelis-Menten enzyme dynamics. (b) Equivalence of the fixed time and fixed length setups. Each point corresponds to the numerical value of $r_{\eta L}$ and $r_{\eta T}$ obtained for an independent random choice of the reaction rates upon averaging over 5000 realizations of the polymerization process. For each realization we take $L =$ […] and $T$ such that $\langle L \rangle =$ […]. (c) Comparison of the values of $r_{\eta L}$ used to test Eq. (4) with the corresponding theoretical predictions obtained by substituting Eqs. (SI.16) into Eqs. (7)-(8) for each choice of the reaction rates.

Protein translation model: to apply Eq. (SI.15) and Eq. (SI.16) to protein translation, we consider the model of Figure 2b with rate values as in [1] (see Table SI.1). To reduce the number of parameters, we introduce a minimal model of protein translation in which the rates whose average over the three
E. coli strands is of the same order ofmagnitudes in [1] are assigned the same value (see Figure SI.2 and Table SI.1). We apply Eq. (SI.15) and Eq. (SI.16)to such minimal model to compute η and r ηL . Expanding these quantities for large k + gives η ≈ k (cid:15) k + ( + k (cid:15) k ) − (SI.18a) r ηL ≈ − √ ( + kk (cid:15) ) − . (SI.18b)These equations are equivalent to Eqs. (12) and (13) in the Main Text upon substituting again the rates k + , k k (cid:15) ofthe minimal model with the specific rates they originate from. p p rw rw minimalmodel FIG. SI.2: Minimal model of protein translation where we consider identical all the reaction rates with same orders of magnitudesin the original model of protein translation.rate WT [s − ] HYP [s − ] ERR [s − ] minimal model k r
40 27 37 k + k r − . .
41 0 . k − k r
25 14 31 k + k rp . × − . × − . × − k − k rf .
415 4 .
752 7 . k + k w
27 25 36 k + k w −
47 46 4 k + k w . .
49 3 . kk wp .
67 0 .
50 0 . kk wf . × − . × − . × − k (cid:15) TABLE SI.1: Kinetic rates for the protein translation model measured in three different strands of
E. coli from [1, 7]: wildtype (WT), hypercorrective rpsL141 mutant (HYP) and error prone rps D12 mutant (ERR). In the column minimal model , weassign a single parameter to rates with the same order of magnitude. More specifically, we computed the average value of eachrate for the three
E. coli strands. We then assigned the value k (cid:15) if the average is less than 0 . s − ; k − if the average is in therange [ . , . ] s − ; k if the average is in [ . , ] s − ; and k + if the average is greater than 5 s − . VI. T7 DNA POLYMERASE
We consider the reaction network in Figure SI.3 to model the incorporation process by the T7 DNA polymerase. This network corresponds to the one in [5], where we neglected polymerase detachment. The rates are $k^r_{\mathrm{pol}} = 300$ s^-1, $k^r_{\mathrm{pp}} = 100$ s^-1, $k^r_{\mathrm{next}} = 300$ s^-1, $k^r_{\mathrm{sx}} =$ ….…, $k^r_{\mathrm{sp}} = 700$ s^-1, $k^w_{\mathrm{pol}} =$ ….03 s^-1, $k^w_{\mathrm{pp}} <$ … × 10^−… s^-1, $k^w_{\mathrm{next}} =$ ….01 s^-1, $k^w_{\mathrm{sx}} =$ ….…, $k^w_{\mathrm{sp}} = 700$ s^-1 and $k^w_{\mathrm{exo}} = 900$ s^-1. To obtain the datum shown in Figure 4 of the Main Text, we first computed $\eta$ and $r_{\eta L}$ for the full network with the help of Eqs. (7)-(8) of the Main Text and Eqs. (SI.15)-(SI.16). Substituting the rates for the T7 DNA polymerase gives $\eta =$ … × 10^−… and $r_{\eta L} = -0.06$. We then repeated the procedure for the core network defined upon removing the proofreading reaction (Figure SI.3), and obtained $\eta^{\mathrm{core}}_0 =$ … × 10^−… and $r^{\mathrm{core}}_{\eta L} = -0.71$. Note that $r_{\eta L} <$ […]
FIG. SI.3: Reaction network of the T7 DNA polymerase. Dashed lines represent reactions that are removed from the complete network to obtain the core network used in Figure 4.b.
Distribution of rates for Fig. 3 of the Main Text
Quantity                                    Uniform in
$\log(k)$                                   [2; 6]
$\log(-\Delta E_w / k_B T)$                 [−…, …]
$\log((\Delta E_w - \Delta E_r)/k_B T)$     […, …]
$\log(n) - \Delta E_r / k_B T$              [−….…, …]

TABLE SI.2: Distribution of the random rates for the data in Figure 3.a of the Main Text. In this range of rates, the model always operates in the error-correction regime [6].

Quantity                  Uniform in
$k^w$ ($k^r$)             […, …]
$\log(k^r / k^r)$         [−…, …]
$\log(k^r_f / k^r)$       [−…, …]
$\log(k^w / k^r)$         [−…, …]
$\log(k^w_{-} / k^r)$     [−…, …]
$k^w$                     [….…, ….…]
$k^r_{-}$                 [….…, ….…]
$k^r$                     [….…, ….…]
$\log(k^w_f)$             [−…, −…]

TABLE SI.3: Distribution of the random rates for the data in Figure 3.a of the Main Text. Rates generated in these ranges are always consistent with the minimal model of Figure SI.2.

[1] Kinshuk Banerjee, Anatoly B. Kolomeisky, and Oleg A. Igoshin. Elucidating interplay of speed and accuracy in biological error correction.
Proceedings of the National Academy of Sciences, 114(20):5183–5188, 2017.
[2] C. Gardiner. Stochastic Methods: A Handbook for the Natural and Social Sciences. Springer Series in Synergetics. Springer Berlin Heidelberg, 2009.
[3] Daniel T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976.
[4] Todd R. Gingrich and Jordan M. Horowitz. Fundamental bounds on first passage time fluctuations for currents. Phys. Rev. Lett., 119:170601, Oct 2017.
[5] Myron F. Goodman, Steven Creighton, Linda B. Bloom, John Petruska, and Thomas A. Kunkel. Biochemical basis of DNA replication fidelity. Critical Reviews in Biochemistry and Molecular Biology, 28(2):83–126, 1993. PMID: 8485987.
[6] J. J. Hopfield. Kinetic proofreading: A new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proceedings of the National Academy of Sciences, 71(10):4135–4139, 1974.
[7] Tillmann Pape, Wolfgang Wintermeyer, and Marina Rodnina. Induced fit in initial selection and proofreading of aminoacyl-tRNA on the ribosome. The EMBO Journal, 18(13):3800–3807, 1999.
[8] S. Redner. A Guide to First-Passage Processes. Cambridge University Press, 2001.
[9] Hugo Touchette. The large deviation approach to statistical mechanics.