Error-speed correlations in biopolymer synthesis

Abstract

The synthesis of biopolymers such as DNA, RNA, and proteins is a biophysical process aided by enzymes. The performance of these enzymes is usually characterized in terms of their average error rate and speed. However, because of thermal fluctuations in these single-molecule processes, both error and speed are inherently stochastic quantities. In this paper, we study fluctuations of error and speed in biopolymer synthesis and show that they are in general correlated. This means that, under equal conditions, polymers that are synthesized faster due to a fluctuation tend to have either smaller or larger errors than the average. The error-correction mechanism implemented by the enzyme determines which of the two cases holds. For example, discrimination in the forward reaction rates tends to grant smaller errors to polymers with faster synthesis. The opposite occurs for discrimination in monomer rejection rates. Our results provide an experimentally feasible way to identify error-correction mechanisms by measuring error-speed correlations.


Davide Chiuchiù, Yuhai Tu, and Simone Pigolotti∗
Biological Complexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.


Organisms encode genetic information in heteropolymers such as DNA and RNA. Replication of these heteropolymers is a non-equilibrium process catalyzed by enzymes. The crucial observables that characterize these enzymes are their error rate and speed. A low error ensures correct transmission of biological information. Here the error is defined as the fraction of monomers in the copy that do not match the template. A high processing speed is also crucial to guarantee fast cell growth. Theoretical approaches have been developed to compute the average error and average speed of polymerization processes [1–7]. However, at the single-molecule level, both error and speed can present significant stochastic fluctuations.

In this Letter we address fluctuations in the error and speed of polymer synthesis. In particular, we show that correlations between these quantities exist. These correlations provide a way to identify the error-correction mechanism adopted by an enzyme from experimental data. This approach can circumvent the need to characterize these enzymes by measuring all kinetic rates of the underlying reaction network [8–15].

We consider an enzyme that replicates an existing template polymer by sequentially incorporating monomers into a copy polymer (Figure 1a). In a given time interval $T$, the enzyme synthesizes a copy made up of a number $L$ of monomers. Because of thermal fluctuations, enzymes sometimes incorporate wrong monomers (w) that do not match the template, instead of the right ones (r). In practical cases, there can be multiple types of wrong monomers; for simplicity, we do not distinguish among them. We denote by $R$ the number of right matches and by $W$ the number of wrong matches in the copy, so that $R + W = L$. The error of the polymer copy can then be expressed as

$$\eta = \frac{W}{L}. \tag{1}$$

We focus on two possible setups, corresponding to two idealized experiments. In the first, the enzyme replicates a given template polymer for a long, fixed time $T$, so that the length $L$ and the error $\eta$ fluctuate.
We denote their variances by $\sigma_L^2 = \langle L^2\rangle - \langle L\rangle^2$ and $\sigma_\eta^2 = \langle \eta^2\rangle - \langle \eta\rangle^2$, and their covariance by $\sigma_{\eta L} = \langle \eta L\rangle - \langle \eta\rangle\langle L\rangle$. Here $\langle \cdots \rangle$ denotes an average over different realizations of the same process. Since $T$ is fixed, we quantify the correlations between error and speed with the error-length coefficient

$$r_{\eta L} = \frac{\sigma_{\eta L}}{\sigma_L\,\sigma_\eta}. \tag{2}$$

In the second setup, each realization terminates when the enzyme has incorporated a number $L \gg 1$ of monomers. Thus $L$ is fixed, whereas the total duration $T$ of the copy process fluctuates. This setup represents the biological scenario where an enzyme copies a polymer of fixed length. In this case, we study the correlation between the polymerization error and speed via the coefficient

$$r_{\eta T} = \frac{\sigma_{\eta T}}{\sigma_T\,\sigma_\eta}, \tag{3}$$

where $\sigma_T^2 = \langle T^2\rangle - \langle T\rangle^2$ is the variance of $T$ and $\sigma_{\eta T} = \langle \eta T\rangle - \langle \eta\rangle\langle T\rangle$.

Our two setups are akin to two conjugate ensembles in equilibrium statistical physics. For large times (and lengths), fluctuations in these two ensembles can be related by means of large deviation theory [16]. Following this approach we obtain

$$r_{\eta T} = -r_{\eta L} \tag{4}$$

(see SI for details). Eq. (4) implies that the two setups correspond to two equivalent ensembles. Therefore, in the following we focus on the fixed-time setup only.

[Figure 1: (a) enzyme, template polymer, copy polymer, right match, wrong match, free monomer; (b) fixed time T, error versus length L; (c) fixed length L, error versus time T.]

FIG. 1. (a) An enzyme reads an existing heteropolymer as a template and sequentially incorporates monomers to copy it. Each incorporated monomer can be either a right (r) or a wrong (w) match with the template polymer. (b) Due to thermal fluctuations, the polymer length $L$ and error $\eta$ are random quantities at fixed completion time $T$. (c) When an enzyme produces a copy polymer with fixed length, the error $\eta$ and the time $T$ fluctuate. Scatterplots in (b) and (c) represent N = …

To estimate $r_{\eta L}$, we first observe that the distributions of $R$ and $W$ tend to Gaussians for large $T$ due to the central limit theorem. We can therefore obtain the moments of $L = R + W$ and $\eta = W/(W+R)$ from those of $R$ and $W$. This procedure yields

$$r_{\eta L} = \frac{\mathcal{N}}{\sqrt{\mathcal{N}^2 + \sigma_R^2\,\sigma_W^2 - \sigma_{RW}^2}}, \qquad \mathcal{N} = (1-2\langle\eta\rangle)\,\sigma_{RW} + (1-\langle\eta\rangle)\,\sigma_W^2 - \langle\eta\rangle\,\sigma_R^2, \tag{5}$$

where $\sigma_R^2$ and $\sigma_W^2$ are the variances of $R$ and $W$ and $\sigma_{RW}$ is their covariance. To compute the quantities in Eq. (5), we assume that the final chemical reaction to incorporate an r or a w monomer is irreversible. This assumption is realistic for most practical cases, such as DNA polymerization [10, 17] and protein translation [12–14]. Our framework could be generalized to cases where the last reaction is reversible, permitting an interpretation of the results using stochastic thermodynamics [1–5, 18]. For simplicity, we also assume that the probabilities to incorporate right and wrong matches do not depend on the template monomer. Under these assumptions, we describe the polymerization process by two ingredients. The first is the probabilities $\eta_0$ and $1-\eta_0$ to incorporate a wrong (w) or a right (r) monomer, respectively. The second is the probability distributions $P(\tau|r)$ and $P(\tau|w)$ that it takes a time $\tau$ to incorporate an r or a w monomer, respectively. The value of $\eta_0$ and the functions $P(\tau|r)$ and $P(\tau|w)$ can be computed from the underlying reaction network [6, 19].
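Eq. (5) is a first-order (delta-method) propagation of the Gaussian fluctuations of $R$ and $W$ to $L = R + W$ and $\eta = W/L$. The sketch below is a minimal check, assuming Eq. (5) in the form $\mathcal{N}/\sqrt{\mathcal{N}^2 + \sigma_R^2\sigma_W^2 - \sigma_{RW}^2}$. It compares that closed form with a direct propagation through the gradients of $L$ and $\eta$. The function names and the sample moments are illustrative choices, not values from the paper.

```python
import math


def r_eta_L_eq5(var_R, var_W, cov_RW, eta_mean):
    """Eq. (5): error-length correlation from the (co)variances of R and W."""
    num = (1 - 2 * eta_mean) * cov_RW + (1 - eta_mean) * var_W - eta_mean * var_R
    det = var_R * var_W - cov_RW ** 2  # determinant of the (R, W) covariance matrix
    return num / math.sqrt(num ** 2 + det)


def r_eta_L_delta(var_R, var_W, cov_RW, mean_R, mean_W):
    """Same coefficient by linearizing L = R + W and eta = W / L around the means."""
    L = mean_R + mean_W
    eta = mean_W / L
    grad_L = (1.0, 1.0)                    # (dL/dR, dL/dW)
    grad_eta = (-eta / L, (1 - eta) / L)   # (d eta/dR, d eta/dW)
    cov = ((var_R, cov_RW), (cov_RW, var_W))

    def bilinear(a, b):
        return sum(a[i] * cov[i][j] * b[j] for i in range(2) for j in range(2))

    return bilinear(grad_eta, grad_L) / math.sqrt(
        bilinear(grad_L, grad_L) * bilinear(grad_eta, grad_eta)
    )


# Hypothetical moments: mostly right monomers, slightly negative R-W covariance.
print(r_eta_L_eq5(900.0, 40.0, -30.0, eta_mean=0.05))
print(r_eta_L_delta(900.0, 40.0, -30.0, mean_R=950.0, mean_W=50.0))
```

Both routes agree because the linear map $(R, W) \to (L,\, W - \langle\eta\rangle L)$ has unit Jacobian determinant. That is why the determinant of the $(R, W)$ covariance matrix appears in the denominator.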
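With these quantities we can express the joint probability $P(R,W|T)$ for large $T$ as

$$P(R,W|T) \approx \binom{R+W}{W}\,\eta_0^{W}(1-\eta_0)^{R} \int_0^{\infty}\prod_{i=1}^{R}\prod_{j=1}^{W} d\tau_i\,d\tau_j\,P(\tau_i|r)\,P(\tau_j|w)\;\delta\!\left(\sum_{n=1}^{R}\tau_n + \sum_{m=1}^{W}\tau_m - T\right). \tag{6}$$

In Eq. (6), the binomial term weights the probability of incorporating $R$ right and $W$ wrong monomers, each monomer being wrong with probability $\eta_0$. The integral term selects trajectories whose sum of incorporation times is equal to $T$.

Evaluating the average error for large $T$ gives the consistency relation $\langle\eta\rangle = \eta_0$. Computing the covariance matrix of $P(R,W|T)$ in the same limit (see SI) and substituting the resulting moments in Eq. (5) gives

$$r_{\eta L} = \frac{\beta}{\sqrt{1+\beta^2}} \tag{7}$$

with

$$\beta = \frac{(\langle\tau\rangle_r - \langle\tau\rangle_w)\sqrt{\eta_0(1-\eta_0)}}{\sqrt{(1-\eta_0)\,\sigma_{\tau,r}^2 + \eta_0\,\sigma_{\tau,w}^2}}. \tag{8}$$

Here $\langle\tau\rangle_r$ and $\langle\tau\rangle_w$ are the means, and $\sigma_{\tau,r}^2$ and $\sigma_{\tau,w}^2$ the variances, of $P(\tau|r)$ and $P(\tau|w)$. We assume these moments to be finite. We validated Eqs. (7) and (8) with stochastic simulations (see SI), and we use them to compute error-speed correlations in the following. Expanding Eqs. (8) and (7) for small $\eta_0$ leads to

$$r_{\eta L} \approx \frac{\langle\tau\rangle_r - \langle\tau\rangle_w}{\sigma_{\tau,r}}\,\sqrt{\eta_0}. \tag{9}$$

Eq. (9) is our main result. It predicts that the sign of $r_{\eta L}$ depends only on the sign of $\langle\tau\rangle_r - \langle\tau\rangle_w$. Suppose wrong monomers are incorporated faster than right ones ($\langle\tau\rangle_w < \langle\tau\rangle_r$). Then copies that happen to contain more errors also grow longer within the same time, so that $r_{\eta L} > 0$. The opposite holds when right monomers are faster. We will show that, in practice, the error-correction mechanism determines this sign.

Kinetic proofreading.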

Hopfield's kinetic proofreading model [20] is an elegant example of an incorporation process implementing error correction. In this model, the enzyme first captures either an r or a w monomer (Figure 2a). After the initial binding, the enzyme can either reject the monomer or consume ATP to induce a conformational change. Thanks to this change, the enzyme gains a second chance to reject wrong monomers. This second rejection reaction is the kinetic proofreading, and it greatly reduces the error probability $\eta_0$. This idea has been generalized to more complex proofreading models [2, 6, 19, 21–24]. Rates of forward reactions in the Hopfield model do not depend on the monomer type, whereas rejection reactions have higher rates for w than for r monomers (Figure 2a). In the proofreading regime (Figure 2a), the error probability $\eta_0$ can be estimated with first-passage time techniques [19] as

$$\eta_0 \approx \left(1 + \frac{k_w\,k_{wp}}{k_r\,k_{rp}}\right)^{-1} \approx e^{-2(\Delta E_w - \Delta E_r)/k_B T}. \tag{10}$$

Here the ratios $k_r/k_w$ and $k_{rp}/k_{wp}$ reflect the discrimination in the rejection rates (see [20] and SI). $k_B$ is the Boltzmann constant, and $T$ is the temperature (not to be confused with the duration $T$ of the copy process). Both ratios relate to the difference $\Delta E_r - \Delta E_w$ in binding energy of r and w monomers through $k_r/k_w = k_{rp}/k_{wp} = \exp[(\Delta E_r - \Delta E_w)/k_B T]$. Outside of the error-correction regime, the error is always larger than predicted by Eq. (10) [20, 25]. In the proofreading regime of the Hopfield model, error and speed fluctuations are positively correlated. In particular, for any choice of $\eta_0$, the error-length coefficient always falls in the range

$$0 \le r_{\eta L} \le \eta_0\left(\sqrt{\frac{1-\eta_0}{\eta_0}} - 1\right), \tag{11}$$

see SI and Fig. 3. This implies that the error-speed correlations become negligible when proofreading ensures very small errors.

[Figure 2: (a) Hopfield model; (b) translation model.]

FIG. 2. Reaction networks for polymer synthesis. (a) Hopfield model. The kinetic rates satisfy the relations $k_r = k\exp[\Delta E_r/k_B T]$, $k_w = k\exp[\Delta E_w/k_B T]$, $k_{rp} = m\exp[\Delta E_r/k_B T]$ and $k_{wp} = m\exp[\Delta E_w/k_B T]$. We take $k \gg m$ and $n \ll 1$, so that the model operates in the proofreading regime [20]. (b) Protein translation model from [19] with rates extracted from [14]. Same line thickness marks reaction rates of the same order of magnitude.

Protein translation.

A standard model of protein translation is characterized by the same reactions as the Hopfield model (Figure 2b and [19, 26, 27]). A major difference is that forward reactions discriminate between r and w monomers (Table S1 and [14, 19]). Within this model we estimate the error probability as

$$\eta_0 \approx \frac{k_{wf}}{k_{rf}}\left(1 + \frac{k_{wf}}{k_{wp}}\right)^{-1} \tag{12}$$

(see [19] and SI). In this case, the error probability depends on the relative preference to bind r rather than w monomers (term $k_{wf}/k_{rf}$). Proofreading effectiveness relative to the incorporation reaction for w monomers (term $k_{wf}/k_{wp}$) further decreases the error probability. Because of the discrimination in the forward rates, the energy difference $\Delta E_r - \Delta E_w$ does not set a lower bound to the error probability as it does in the Hopfield model [6]. Calculations similar to those leading to Eq. (11) predict an error-length coefficient

$$r_{\eta L} \approx -\frac{1}{\sqrt{2}}\left(1 + \frac{k_{wp}}{k_{wf}}\right)^{-1} \tag{13}$$

(SI and Figure 3). Unlike in the Hopfield model, the error-length coefficient is always negative in protein translation. This striking difference arises from the discrimination in the forward rates, as further clarified in the following. Also in this case, the absolute value of the error-speed correlations decreases with increasing proofreading efficiency. Ribosomes with impaired kinetic proofreading should then exhibit stronger error-speed correlations. A computation of error-speed correlations from experimentally measured rates for different E. coli strains supports Eq. (13), Fig. 3.

[Figure 3: (a) Hopfield model; (b) protein translation, with data points labeled E. coli WT, HYP, and ERR.]

FIG. 3. The Hopfield model and the protein translation model have opposite error-speed correlations. (a) Hopfield model. The gray shaded region defines the allowed values of $r_{\eta L}$ for a given error probability $\eta_0$, see Eq. (11). Black crosses are estimates of $\eta_0$ and $r_{\eta L}$ for 60 random sets of reaction rates in the proofreading regime (see caption of Figure 2, SI, and Table S2). (b) Protein translation. To test Eq. (13) (gray line), we computed $r_{\eta L}$ with the kinetic rates in [19] (blue crosses). These rates are for wild-type E. coli, a hypercorrective mutant and an error-prone mutant. We also evaluated $r_{\eta L}$ for randomly generated sets of the reaction rates in Figure 2b (black squares). For all points in both panels, correlation coefficients are evaluated by means of Eqs. (7)-(8), upon computing moments of incorporation times with first-passage time techniques [19]. See SI for details of numerical calculations.
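The correlation coefficients in Figure 3 come from Eqs. (7)-(8) once the means and variances of the incorporation times are known. The sketch below evaluates Eqs. (7)-(9) and checks Eq. (7) against a direct fixed-time simulation. It uses a hypothetical renewal model rather than the networks of Figure 2. In this model each step picks a wrong monomer with probability `eta0` and then waits an exponentially distributed time with rate `rate_r` or `rate_w`. The model, its parameter names and its values are illustrative assumptions.

```python
import math
import random


def beta(eta0, mean_r, mean_w, var_r, var_w):
    """Eq. (8); var_r and var_w are the variances of P(tau|r) and P(tau|w)."""
    return ((mean_r - mean_w) * math.sqrt(eta0 * (1 - eta0))
            / math.sqrt((1 - eta0) * var_r + eta0 * var_w))


def r_eta_L(eta0, mean_r, mean_w, var_r, var_w):
    """Eq. (7)."""
    b = beta(eta0, mean_r, mean_w, var_r, var_w)
    return b / math.sqrt(1 + b * b)


def r_eta_L_small_eta(eta0, mean_r, mean_w, var_r):
    """Eq. (9): leading order for small eta0."""
    return (mean_r - mean_w) / math.sqrt(var_r) * math.sqrt(eta0)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_fixed_time(eta0, rate_r, rate_w, T, n_runs, seed=1):
    """Fixed-time setup for the toy renewal model; returns the sample r_eta_L."""
    rng = random.Random(seed)
    etas, lengths = [], []
    for _ in range(n_runs):
        t, R, W = 0.0, 0, 0
        while True:
            wrong = rng.random() < eta0
            t += rng.expovariate(rate_w if wrong else rate_r)
            if t > T:  # this monomer is not completed within the time window
                break
            if wrong:
                W += 1
            else:
                R += 1
        if R + W > 0:
            etas.append(W / (R + W))
            lengths.append(R + W)
    return pearson(etas, lengths)


# Exponential waiting times: mean 1/rate and variance 1/rate**2.
eta0, rate_r, rate_w = 0.2, 1.0, 4.0   # wrong monomers are incorporated faster here
theory = r_eta_L(eta0, 1 / rate_r, 1 / rate_w, 1 / rate_r**2, 1 / rate_w**2)
sim = simulate_fixed_time(eta0, rate_r, rate_w, T=200.0, n_runs=2000)
print(f"Eq. (7): {theory:.3f}, simulation: {sim:.3f}")
```

Making right monomers the faster ones flips the sign of the prediction. This mirrors, qualitatively, the contrast between Figures 3a and 3b.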

Core network.

In both models we considered, kinetic proofreading reduces the absolute value of the error-length coefficient without changing its sign. To show this effect in general, we consider an arbitrary reaction network in which we identify some of the reaction steps as those implementing kinetic proofreading (Fig. 4a). For example, in both models of Fig. 2, the proofreading reactions are those with rates $k_{rp}$ and $k_{wp}$. The complete network has an error probability $\eta_0$ and an error-length coefficient $r_{\eta L}$. We now remove all the proofreading reactions and define the remaining reactions as the "core network". In many practical cases the core network is a simple linear chain of reactions. It is then easy to compute its error probability $\eta_0^{\mathrm{core}}$ and its error-length coefficient $r_{\eta L}^{\mathrm{core}}$. To compare $r_{\eta L}$ and $r_{\eta L}^{\mathrm{core}}$, we assume that both $\eta_0$ and $\eta_0^{\mathrm{core}}$ are small, so that Eq. (9) holds. We further assume that proofreading is a relatively rare event that does not significantly influence the incorporation times. Taking the ratio $r_{\eta L}/r_{\eta L}^{\mathrm{core}}$ of the two expressions given by Eq. (9), we therefore obtain

$$r_{\eta L} \approx \sqrt{\frac{\eta_0}{\eta_0^{\mathrm{core}}}}\; r_{\eta L}^{\mathrm{core}}. \tag{14}$$

Since proofreading reduces the error probability ($\eta_0 < \eta_0^{\mathrm{core}}$), it also reduces the absolute value of the error-length coefficient without changing its sign. We tested our prediction by computing $r_{\eta L}$ from experimentally measured kinetic rates. We used E. coli ribosomes (Table S1, SI, and [14, 19]) and the T7 DNA polymerase [10] (see Figure S1 for the T7 datum). Eq. (14) qualitatively captures the dependence of the error-length coefficient on the error-correction effectiveness (Fig. 4b). Quantitative discrepancies arise because the assumption that proofreading does not affect incorporation times partially breaks down.

The core-network approach qualitatively explains why the error-speed correlations have different signs in the Hopfield model (Figure 2a) and in the protein translation model (Figure 2b).
Because of the discrimination in the backward rates, r monomers remain bound to the enzyme for a long time in the core network of the Hopfield model before their final incorporation. On the other hand, w monomers are bound to the enzyme for only a short time before they are either rejected or incorporated. This implies that $r_{\eta L}^{\mathrm{core}} > r_{\eta L} > 0$, as shown in Figure 3. Conversely, the discrimination in the forward rates grants a fast incorporation of r monomers in the core network of the protein translation model. Thus, $r_{\eta L}^{\mathrm{core}} < r_{\eta L} < 0$.

[Figure 4: (a) core network complemented by proofreading reactions for r and w monomers; (b) data for E. coli ribosomes and the T7 DNA polymerase.]

FIG. 4. Proofreading suppresses the error-speed correlations. (a) Incorporation with a "core network" complemented by proofreading reactions. Each block in the figure represents an arbitrary sub-network with an average flux in the direction of the arrows. (b) Comparison of Eq. (14) (solid line) with computation of error-speed correlations from measured kinetic rates, see SI. We considered the ribosomes in three strains of E. coli: wild type, hypercorrective, and error-prone [14, 19]. For each strain, we built the core network by removing the proofreading reactions. We then computed the relative change in $r_{\eta L}$ and $\eta_0$ between the original and core networks. We performed the same analysis for a model of T7 DNA polymerase (blue square, see SI and [8]). The data qualitatively agree with Eq. (14).

… correlations could reveal the presence of forward discrimination in replicative enzymes. Cell-free translation systems [29, 30] could provide simple and versatile in vitro assays to perform these measurements for ribosomes. A possible experiment would be to let the system translate for a fixed short time and separate the products into shorter and longer peptides. Mass spectrometry would then show whether the two categories contain significantly different errors. Similar experiments for DNA polymerases could bring insight into poorly characterized chemical reaction networks, such as those of human mitochondrial DNA pol-γ [31], yeast pol-ε [32] and pol-δ [33, 34].

The magnitude of the error-speed correlations decreases when proofreading effectiveness increases. This implies that proofreading-deficient enzymes [31–34] and in vitro assays that favor misincorporation [11, 28] are best suited to test our theory, for two reasons. First, the increased magnitude of error-speed correlations in the absence of error correction makes them easier to measure. Second, the poor precision of proofreading-deficient enzymes [10] reduces the sample size needed to empirically estimate error fluctuations.

Our result may also have consequences for the evolution of genomes. Recent work showed that cells that replicate earlier thanks to environmental fluctuations contribute more to population growth [35]. With significant error-speed correlations, the growth of a population could then be driven by individuals whose DNA and proteins have error fractions significantly different from the population average.
This phenomenon could have played a role in the early stages of life.

We underline the conceptual difference between our results and the speed-error trade-off [5, 6, 19, 25], in particular as observed in protein translation [26, 28, 36]. In translation, tuning the concentration of Mg²⁺ ions provokes an approximately linear trade-off between the average error and the average reaction speed [26]. Such trade-offs may depend on the choice of a control parameter [6, 19]. In contrast, we have shown that fluctuations of speed and error are negatively correlated in protein translation for fixed external parameters. It remains to be explored whether the two results can be generally connected. A possible analogy is the way equilibrium fluctuations and responses to external forces are related in statistical physics [37, 38].

We thank Michael Baym, Lucas Carey, Antonio Celani, Massimo Cencini, Todd Gingrich, and Jordan Horowitz for discussion. We further thank S. Aird and P. Laurino for comments on the manuscript. This work was supported by JSPS KAKENHI Grant Number JP18K03473 (to DC and SP).

[1] C. H. Bennett, BioSystems, 85 (1979).
[2] D. Andrieux and P. Gaspard, Proceedings of the National Academy of Sciences, 9516 (2008).
[3] F. Cady and H. Qian, Physical Biology, 036011 (2009).
[4] M. Esposito, K. Lindenberg, and C. Van den Broeck, Journal of Statistical Mechanics: Theory and Experiment, P01008 (2010).
[5] P. Sartori and S. Pigolotti, Phys. Rev. X, 041039 (2015).
[6] S. Pigolotti and P. Sartori, Journal of Statistical Physics, 1167 (2016).
[7] P. Gaspard, Physical Review Letters, 238101 (2016).
[8] T. A. Kunkel and K. Bebenek, Annual Review of Biochemistry, 497 (2000), PMID: 10966467.
[9] T. A. Kunkel and D. A. Erie, Annual Review of Genetics, 291 (2015), PMID: 26436461.
[10] M. F. Goodman, S. Creighton, L. B. Bloom, J. Petruska, and D. T. A. Kunkel, Critical Reviews in Biochemistry and Molecular Biology, 83 (1993), PMID: 8485987.
[11] M. V. Rodnina and W. Wintermeyer, Annual Review of Biochemistry, 415 (2001), PMID: 11395413.
[12] K. B. Gromadski and M. V. Rodnina, Molecular Cell, 191 (2004).
[13] H. S. Zaher and R. Green, Cell, 746 (2009).
[14] T. Pape, W. Wintermeyer, and M. Rodnina, The EMBO Journal, 3800 (1999).
[15] H. L. Gahlon, A. R. Walker, G. A. Cisneros, M. H. Lamers, and D. S. Rueda, Phys. Chem. Chem. Phys., 26892 (2018).
[16] T. R. Gingrich and J. M. Horowitz, Phys. Rev. Lett., 170601 (2017).
[17] S. S. Patel, I. Wong, and K. A. Johnson, Biochemistry, 511 (1991).
[18] J. M. Poulton, P. R. ten Wolde, and T. E. Ouldridge, Proceedings of the National Academy of Sciences, 1946 (2019).
[19] K. Banerjee, A. B. Kolomeisky, and O. A. Igoshin, Proceedings of the National Academy of Sciences, 5183 (2017).
[20] J. J. Hopfield, Proceedings of the National Academy of Sciences, 4135 (1974).
[21] P. Gaspard and D. Andrieux, The Journal of Chemical Physics, 044908 (2014).
[22] R. Rao and M. Esposito, New Journal of Physics, 023007 (2018).
[23] P. Sartori and S. Pigolotti, Physical Review Letters, 188101 (2013).
[24] R. Rao and L. Peliti, Journal of Statistical Mechanics: Theory and Experiment, P06001 (2015).
[25] A. Murugan, D. A. Huse, and S. Leibler, Proceedings of the National Academy of Sciences, 12034 (2012).
[26] M. Johansson, M. Lovmar, and M. Ehrenberg, Current Opinion in Microbiology, 141 (2008), Cell Regulation.
[27] Y. Savir and T. Tlusty, Cell, 471 (2013).
[28] J. Zhang, K.-W. Ieong, M. Johansson, and M. Ehrenberg, Proceedings of the National Academy of Sciences, 9602 (2015).
[29] Y. Shimizu, T. Kanamori, and T. Ueda, Methods, 299 (2005), Engineering Translation.
[30] S. Uemura, R. Iizuka, T. Ueno, Y. Shimizu, H. Taguchi, T. Ueda, J. D. Puglisi, and T. Funatsu, Nucleic Acids Research, e70 (2008).
[31] H. R. Lee and K. A. Johnson, Journal of Biological Chemistry, 36236 (2006).
[32] K. Shimizu, K. Hashimoto, J. M. Kirchner, W. Nakai, H. Nishikawa, M. A. Resnick, and A. Sugino, Journal of Biological Chemistry, 37422 (2002).
[33] L. M. Dieckman, R. E. Johnson, S. Prakash, and M. T. Washington, Biochemistry, 7344 (2010), PMID: 20666462.
[34] K. Hashimoto, K. Shimizu, N. Nakashima, and A. Sugino, Biochemistry, 14207 (2003), PMID: 14640688.
[35] M. Hashimoto, T. Nozoe, H. Nakaoka, R. Okura, S. Akiyoshi, K. Kaneko, E. Kussell, and Y. Wakamoto, Proceedings of the National Academy of Sciences, 3251 (2016).
[36] M. Johansson, J. Zhang, and M. Ehrenberg, Proceedings of the National Academy of Sciences, 131 (2012).
[37] H. B. Callen and T. A. Welton, Phys. Rev., 34 (1951).
[38] R. Balescu, Equilibrium and Non-Equilibrium Statistical Mechanics, Wiley-Interscience publication (John Wiley & Sons, 1975).

Supplementary Information for: Error-speed correlations in biopolymer synthesis

This document contains additional material supplementing the manuscript "Error-speed correlations in biopolymer replication" (from now on "Main Text"). The document is organized as follows. In Section I we prove the equivalence between the fixed-length and fixed-time setups. In Section II we derive our main result on error-speed correlations, i.e., Eqs. (7)-(8) in the Main Text. In Section III we discuss how to compute moments of the incorporation-time distributions using first-passage time techniques. In Section IV we discuss numerical computations. In particular, we present simulations that validate Eq. (4) and Eqs. (7)-(8), and give details of the computation of the data in Figures 3 and 4. Section V presents the computation of the error rates for proofreading models. Section VI provides the reaction network for the T7 DNA polymerase together with the measured kinetic rates (Fig. SI.3). It also shows how the datum in Fig. 4 of the Main Text was computed. In Tables SI.2 and SI.3 we provide details on the distribution of the kinetic rates used in Fig. 3 of the Main Text.

I. EQUIVALENCE BETWEEN FIXED-LENGTH AND FIXED-TIME SETUPS

To prove the equivalence of the fixed-length and fixed-time setups, we make two assumptions. We assume that the distribution of $\{\mathcal{L} = L/T,\ \mathcal{W} = W/T\}$ at fixed $T$ satisfies a large deviation principle [9]. We assume the same for the distribution of $\{\mathcal{T} = T/L,\ \eta = W/L\}$ at fixed $L$:

$$P(\mathcal{L}, \mathcal{W}|T) \asymp \exp[-T\, I(\mathcal{L}, \mathcal{W})], \tag{SI.1a}$$
$$P(\mathcal{T}, \eta|L) \asymp \exp[-L\, \varphi(\mathcal{T}, \eta)], \tag{SI.1b}$$

where $\asymp$ indicates the leading behavior of the distributions for large $T$ and $L$. The rate functions $I$ and $\varphi$ attain their minimum at the average values of $\{\mathcal{L}, \mathcal{W}\}$ and $\{\mathcal{T}, \eta\}$, respectively. Their Hessian matrices evaluated at the minimum are proportional to the inverse covariance matrices of $\{\mathcal{L}, \mathcal{W}\}$ and $\{\mathcal{T}, \eta\}$.

To connect $I$ and $\varphi$, we use the following fact [4]. For large times $T$, the distribution of a given observable $X$ determines the distribution of the time needed to observe a fixed value of $X$. In particular,

$$\psi\!\left(\frac{T}{X}\right) = \frac{T}{X}\, J\!\left(\frac{X}{T}\right), \tag{SI.2}$$

where $\psi$ and $J$ are the rate functions of the intensive variables $T/X$ and $X/T$ for large values of $X$ and $T$, respectively. We assume that this result generalizes to joint distributions and apply it to $I$ and $\varphi$. This yields

$$\varphi(\mathcal{T}, \eta) = \mathcal{T}\, I\!\left(\frac{1}{\mathcal{T}}, \frac{\eta}{\mathcal{T}}\right). \tag{SI.3}$$

We now perform the change of variable $\mathcal{W} = \eta\mathcal{L}$ in Eq. (SI.1a) to obtain the joint probability $P(\mathcal{L}, \eta|T)$ of $\mathcal{L}$ and $\eta$ at a fixed time $T$. At the leading order, $P(\mathcal{L}, \eta|T) \asymp \exp[-T\, Q(\mathcal{L}, \eta)]$, where

$$Q(\mathcal{L}, \eta) = I(\mathcal{L}, \eta\mathcal{L}). \tag{SI.4}$$

Combining Eq. (SI.3) and Eq. (SI.4) and expressing variances and covariances using the Hessian matrices of $I$, $\varphi$ and $Q$, we obtain

$$r_{\eta\mathcal{T}} = \left.\frac{-\partial_{\mathcal{T}\eta}\varphi}{\sqrt{\partial_{\mathcal{T}\mathcal{T}}\varphi\;\partial_{\eta\eta}\varphi}}\right|_{\min\varphi} = \left.\frac{\partial_{\mathcal{L}\eta}Q}{\sqrt{\partial_{\mathcal{L}\mathcal{L}}Q\;\partial_{\eta\eta}Q}}\right|_{\min Q} = -r_{\eta\mathcal{L}}, \tag{SI.5}$$

which is equivalent to

$$r_{\eta T} = -r_{\eta L} \tag{SI.6}$$

when passing to extensive variables. Equation (SI.6) has been validated using numerical simulations for different incorporation schemes; see Section IV for details.
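Eq. (SI.6) can be probed with a short simulation in the spirit of Section IV. The sketch below uses a hypothetical toy renewal model rather than Networks 1 and 2. Each incorporation picks a wrong monomer with probability `eta0` and takes an exponentially distributed time whose rate depends on the monomer type. The sketch estimates $r_{\eta L}$ at fixed $T$ and $r_{\eta T}$ at fixed $L$. $T$ is chosen so that $\langle L\rangle$ is close to the fixed length. All parameter values are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def step(rng, eta0, rate_r, rate_w):
    """One incorporation of the toy model: returns (is_wrong, duration)."""
    wrong = rng.random() < eta0
    return wrong, rng.expovariate(rate_w if wrong else rate_r)


def fixed_time(eta0, rate_r, rate_w, T, n_runs, rng):
    """Fixed-time ensemble: correlation between error and length (r_eta_L)."""
    etas, lengths = [], []
    for _ in range(n_runs):
        t, L, W = 0.0, 0, 0
        while True:
            wrong, dt = step(rng, eta0, rate_r, rate_w)
            t += dt
            if t > T:
                break
            L += 1
            W += int(wrong)
        if L > 0:
            etas.append(W / L)
            lengths.append(L)
    return pearson(etas, lengths)


def fixed_length(eta0, rate_r, rate_w, L, n_runs, rng):
    """Fixed-length ensemble: correlation between error and duration (r_eta_T)."""
    etas, times = [], []
    for _ in range(n_runs):
        t, W = 0.0, 0
        for _ in range(L):
            wrong, dt = step(rng, eta0, rate_r, rate_w)
            t += dt
            W += int(wrong)
        etas.append(W / L)
        times.append(t)
    return pearson(etas, times)


rng = random.Random(7)
eta0, rate_r, rate_w, L_fixed = 0.2, 1.0, 4.0, 200
mean_step = (1 - eta0) / rate_r + eta0 / rate_w   # average time per incorporation
r_L = fixed_time(eta0, rate_r, rate_w, T=L_fixed * mean_step, n_runs=2000, rng=rng)
r_T = fixed_length(eta0, rate_r, rate_w, L=L_fixed, n_runs=2000, rng=rng)
print(f"r_eta_L = {r_L:.3f}, r_eta_T = {r_T:.3f}")
```

In this toy model, $\eta$ and $T$ at fixed $L$ are sums over independent steps. Hence $r_{\eta T}$ equals the single-step correlation between being wrong and the step duration, even at finite $L$. The large-time limit matters only on the fixed-time side.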

II. DERIVATION OF THE GENERAL FORMULA FOR THE ERROR-SPEED CORRELATIONS

The probability to have produced a copy polymer made up of a number $R$ of right monomers and a number $W$ of wrong ones at a given time $T$ can be approximated as

$$P(R,W|T) \approx \binom{R+W}{W}\,\eta_0^{W}(1-\eta_0)^{R}\int_0^\infty \delta\!\left(\sum_{n=1}^{R}\tau_n + \sum_{m=1}^{W}\tau_m - T\right)\left[\prod_{i=1}^{R} P(\tau_i|r)\,d\tau_i\right]\left[\prod_{j=1}^{W} P(\tau_j|w)\,d\tau_j\right]. \tag{SI.7}$$

Here, the binomial term counts all the possible permutations of $R$ right monomers and $W$ wrong ones. The integral term with the Dirac delta function $\delta(\cdot)$ isolates the trajectories with the prescribed number of right and wrong monomers at time $T$. We use the approximation sign because Eq. (SI.7) implicitly assumes that the sum of incorporation times is exactly equal to $T$. In practice it can be equal to $T' < T$ if there are no further incorporations in the time interval $[T', T]$. Representing the delta function as

$$\delta(x) = \int_{-\infty}^{\infty}\frac{e^{isx}}{2\pi}\,ds \tag{SI.8}$$

and swapping the integration order, we obtain

$$P(R,W|T) \sim \binom{R+W}{W}\,\eta_0^{W}(1-\eta_0)^{R}\int_{-\infty}^{\infty}\frac{ds}{2\pi}\,\exp\!\left[R\ln\tilde P(s|r) + W\ln\tilde P(s|w) - isT\right], \tag{SI.9}$$

where

$$\tilde P(s|x) = \int_0^\infty d\tau\, P(\tau|x)\,e^{is\tau} \tag{SI.10}$$

is the characteristic function of $\tau$ conditioned on the incorporation of monomer $x$, so that $\ln\tilde P(s|x)$ is the corresponding cumulant generating function. We assume that both $P(\tau|r)$ and $P(\tau|w)$ have a finite mean and variance. Under this hypothesis, the central limit theorem ensures that the sum of random incorporation times in Eq. (SI.7) tends to a Gaussian random variable. This implies that we can truncate the cumulant generating function $\ln\tilde P(s|x)$ as

$$\ln\tilde P(s|x) \sim i\langle\tau\rangle_x\, s - \frac{\sigma_{\tau,x}^2\, s^2}{2}. \tag{SI.11}$$

Substituting (SI.11) into (SI.9), using the Stirling approximation, and omitting subdominant terms finally gives the expression

$$P(R,W|T) \approx \exp\!\left[-(R+W)\,D_{KL}^{\eta,\eta_0} - \frac{(\langle\tau\rangle_r R + \langle\tau\rangle_w W - T)^2}{2\,(R\,\sigma_{\tau,r}^2 + W\,\sigma_{\tau,w}^2)}\right], \tag{SI.12}$$

where

$$D_{KL}^{\eta,\eta_0} = -(1-\eta)\ln\frac{1-\eta_0}{1-\eta} - \eta\ln\frac{\eta_0}{\eta} \tag{SI.13}$$

is the Kullback-Leibler divergence between the error $\eta = W/(R+W)$ and the error probability $\eta_0$.
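The step from Eq. (SI.12) to Eqs. (SI.14a)-(SI.14c) amounts to inverting minus the Hessian of the exponent at its maximum. The sketch below does this numerically with central finite differences and compares the result with Eqs. (SI.14a)-(SI.14c). It then checks that Eq. (5), taken in the form $\mathcal{N}/\sqrt{\mathcal{N}^2 + \sigma_R^2\sigma_W^2 - \sigma_{RW}^2}$, reproduces Eqs. (7)-(8). It assumes the explicit prefactor $C = T\eta_0(1-\eta_0)/\mu^3$, with $\mu = (1-\eta_0)\langle\tau\rangle_r + \eta_0\langle\tau\rangle_w$. This is our own evaluation of the factor that the text leaves unspecified. Parameter values are illustrative.

```python
import math


def log_weight(R, W, T, eta0, m_r, m_w, v_r, v_w):
    """Exponent of Eq. (SI.12); m_x and v_x are the mean and variance of P(tau|x)."""
    L = R + W
    eta = W / L
    d_kl = eta * math.log(eta / eta0) + (1 - eta) * math.log((1 - eta) / (1 - eta0))  # Eq. (SI.13)
    return -L * d_kl - (m_r * R + m_w * W - T) ** 2 / (2 * (R * v_r + W * v_w))


def covariance_from_hessian(T, eta0, m_r, m_w, v_r, v_w, h=0.5):
    """Invert minus the finite-difference Hessian of the exponent at its maximum."""
    mu = (1 - eta0) * m_r + eta0 * m_w           # mean time per incorporation
    R0, W0 = (1 - eta0) * T / mu, eta0 * T / mu   # the exponent vanishes (its maximum) here

    def f(R, W):
        return log_weight(R, W, T, eta0, m_r, m_w, v_r, v_w)

    f0 = f(R0, W0)
    a = -(f(R0 + h, W0) - 2 * f0 + f(R0 - h, W0)) / h**2
    c = -(f(R0, W0 + h) - 2 * f0 + f(R0, W0 - h)) / h**2
    b = -(f(R0 + h, W0 + h) - f(R0 + h, W0 - h)
          - f(R0 - h, W0 + h) + f(R0 - h, W0 - h)) / (4 * h**2)
    det = a * c - b * b
    return c / det, a / det, -b / det             # var_R, var_W, cov_RW


def covariance_si14(T, eta0, m_r, m_w, v_r, v_w):
    """Eqs. (SI.14a)-(SI.14c) with an assumed explicit prefactor C."""
    mu = (1 - eta0) * m_r + eta0 * m_w
    C = T * eta0 * (1 - eta0) / mu**3             # assumption: not stated in the text
    s2 = (1 - eta0) * v_r + eta0 * v_w
    return (C * (eta0 * m_w**2 + (1 - eta0) * s2) / eta0,
            C * ((1 - eta0) * m_r**2 + eta0 * s2) / (1 - eta0),
            C * (s2 - m_r * m_w))


def r_eq5(var_R, var_W, cov_RW, eta):
    num = (1 - 2 * eta) * cov_RW + (1 - eta) * var_W - eta * var_R
    return num / math.sqrt(num**2 + var_R * var_W - cov_RW**2)


def r_eq7(eta0, m_r, m_w, v_r, v_w):
    b = (m_r - m_w) * math.sqrt(eta0 * (1 - eta0)) / math.sqrt((1 - eta0) * v_r + eta0 * v_w)
    return b / math.sqrt(1 + b * b)


params = dict(T=1000.0, eta0=0.2, m_r=1.0, m_w=0.25, v_r=1.0, v_w=0.0625)
print(covariance_from_hessian(**params))
print(covariance_si14(**params))
print(r_eq5(*covariance_si14(**params), params["eta0"]), r_eq7(0.2, 1.0, 0.25, 1.0, 0.0625))
```

The prefactor $C$ cancels in Eq. (5). The final correlation coefficient therefore does not depend on this assumed form.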
To compute $\sigma_R^2$, $\sigma_W^2$ and $\sigma_{RW}$, we approximate Eq. (SI.12) as a bivariate Gaussian distribution around its maximum. This gives

$$\sigma_R^2 = C\,\frac{\eta_0\langle\tau\rangle_w^2 + (1-\eta_0)\left[(1-\eta_0)\sigma_{\tau,r}^2 + \eta_0\sigma_{\tau,w}^2\right]}{\eta_0}, \tag{SI.14a}$$
$$\sigma_W^2 = C\,\frac{(1-\eta_0)\langle\tau\rangle_r^2 + \eta_0\left[(1-\eta_0)\sigma_{\tau,r}^2 + \eta_0\sigma_{\tau,w}^2\right]}{1-\eta_0}, \tag{SI.14b}$$
$$\sigma_{RW} = C\left[(1-\eta_0)\sigma_{\tau,r}^2 + \eta_0\sigma_{\tau,w}^2 - \langle\tau\rangle_r\langle\tau\rangle_w\right], \tag{SI.14c}$$

where $C$ is a common multiplicative factor. Substituting these expressions in Eq. (5) of the Main Text finally gives our main result, Eqs. (7)-(8). We also validated Eqs. (7)-(8) numerically; see Section IV.

III. MOMENTS OF THE INCORPORATION TIMES

Analytical expressions for the first and second moments of the incorporation times are necessary to numerically evaluate Eqs. (7)-(8) in real cases. To derive such quantities, we treat monomer incorporation as a first-passage problem. We consider the probability density $P_{x,i}(\tau)$ that incorporation takes a time $\tau$. This density is conditioned on the incorporated monomer being $x \in \{r, w\}$ and on the network starting in state $i$. In the theory of stochastic processes, $P_{x,i}(\tau)$ represents the first-passage time distribution to reach the absorbing state $x^*$ from state $i$ [1, 2]. The Laplace transforms $\tilde P_{x,i}(s) = \int_0^\infty e^{-s\tau}P_{x,i}(\tau)\,d\tau$ evolve according to [1, 2, 8]

$$s\,\tilde P_{x,i}(s) = \sum_{j\notin\{i,\,r^*,\,w^*\}} k_{j,i}\left[\tilde P_{x,j}(s) - \tilde P_{x,i}(s)\right] + k_{x^*,i} - \left(k_{r^*,i} + k_{w^*,i}\right)\tilde P_{x,i}(s). \tag{SI.15}$$

Here $k_{j,i}$ is the rate of the reaction from state $i$ to state $j$, and 0 labels the network state where monomer incorporation starts. The labels $r^*$ and $w^*$ denote the states where right and wrong monomers are finally incorporated, respectively. This means that the conditional incorporation-time distribution used in the Main Text is $P(\tau|x) = P_{x,0}(\tau)/\tilde P_{x,0}(0)$. From the solution of Eq. (SI.15) we obtain [1, 2, 8]

$$\eta_0 = \tilde P_{w,0}(0), \qquad \langle\tau\rangle_x = -\frac{(\partial_s\tilde P_{x,0})(0)}{\tilde P_{x,0}(0)}, \qquad \sigma_{\tau,x}^2 = \frac{(\partial_{ss}\tilde P_{x,0})(0)}{\tilde P_{x,0}(0)} - \langle\tau\rangle_x^2. \tag{SI.16}$$

IV. NUMERICAL COMPUTATIONS
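As a numerical illustration of Eqs. (SI.15)-(SI.16) from Section III, the sketch below evaluates $\eta_0$, $\langle\tau\rangle_x$ and $\sigma_{\tau,x}^2$ for a small network. It solves the $s = 0$ version of Eq. (SI.15) and its first two $s$-derivatives. These are linear systems with the same matrix. The three-state network and all rate values are hypothetical: a free enzyme binds an r or a w monomer, which is then either released or incorporated. With no release, it reduces to two exponential steps in series, which gives a closed-form check.

```python
import math


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def first_passage_moments(n, rates, absorb, x, start=0):
    """Splitting probability, mean and variance of the time to reach x* from `start`.

    rates[(i, j)]  = k_{j,i}, rate from transient state i to transient state j
    absorb[(i, y)] = k_{y*,i}, rate from transient state i to the absorbing state y*
    """
    exit_rate = [0.0] * n
    for (i, _), k in list(rates.items()) + list(absorb.items()):
        exit_rate[i] += k
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = exit_rate[i]
    for (i, j), k in rates.items():
        M[i][j] -= k
    b = [absorb.get((i, x), 0.0) for i in range(n)]
    p = solve(M, b)                          # Eq. (SI.15) at s = 0
    dp = solve(M, [-v for v in p])           # first s-derivative at s = 0
    ddp = solve(M, [-2 * v for v in dp])     # second s-derivative at s = 0
    prob = p[start]
    mean = -dp[start] / prob                 # Eq. (SI.16)
    var = ddp[start] / prob - mean ** 2      # Eq. (SI.16)
    return prob, mean, var


def example_network(release_r, release_w):
    """Hypothetical 3-state network: 0 free, 1 r bound, 2 w bound."""
    rates = {(0, 1): 3.0, (0, 2): 1.0, (1, 0): release_r, (2, 0): release_w}
    absorb = {(1, "r"): 2.0, (2, "w"): 0.5}
    return rates, absorb


rates, absorb = example_network(release_r=0.5, release_w=4.0)
eta0, m_w, v_w = first_passage_moments(3, rates, absorb, "w")
_, m_r, v_r = first_passage_moments(3, rates, absorb, "r")
beta = (m_r - m_w) * math.sqrt(eta0 * (1 - eta0)) / math.sqrt((1 - eta0) * v_r + eta0 * v_w)
print(f"eta0 = {eta0:.4f}, r_eta_L from Eqs. (7)-(8) = {beta / math.sqrt(1 + beta**2):.3f}")
```

All three linear systems share the matrix $M$, so a larger implementation would factorize it once. The repeated elimination here only keeps the sketch short.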

To validate Eq. (4), we considered two different incorporation processes. In the first, incorporation requires two consecutive irreversible reactions (Network 1 in Figure SI.1a). In the second, incorporation follows Michaelis-Menten enzyme dynamics (Network 2 in Figure SI.1a). For each reaction network, we generated 36 sets of reaction rates, each rate drawn randomly and independently in the range [… , …]. For each set of rates, we then computed $r_{\eta L}$ and $r_{\eta T}$ by averaging over 5000 trajectories of the polymerization process. Stochastic trajectories are simulated with the Gillespie algorithm [3]. For the setup at fixed length, we stopped each simulation when the polymer reached a length $L = …$. For the setup at fixed time, we chose $T$ such that $\langle L\rangle = …$. We then compared the values of $r_{\eta L}$ from the numerical simulations in Figure SI.1b with their corresponding predictions, obtained by substituting Eqs. (SI.16) in Eqs. (7)-(8). Theoretical predictions agree with the simulated data (see Figure SI.1c), supporting the validity of Eqs. (7)-(8).

To generate the data points shown in Figures 3 and 4, we use Eqs. (SI.16) and (7)-(8) to evaluate $\eta_0$ and $r_{\eta L}$ for the two reaction networks in Figure 2 with given kinetic rates. Data for the Hopfield model (Figure 3a) are obtained from 60 independent random choices of the reaction rates (see Table SI.2 for the generation rules). We also generated 60 random sets of reaction rates for protein translation (black squares in Figure 3b); see Table SI.3. In the same figure, blue crosses represent rates experimentally measured in different strains of E. coli [1]. The datum for the T7 DNA polymerase is obtained in the same way but using a different reaction network. See Section VI for details.

V. ERROR-SPEED CORRELATIONS IN PROOFREADING MODELS

Hopfield model:

We now apply Eq. (SI.15) and Eq. (SI.16) to the Hopfield model of Figure 2a. We combine the expressions for $\eta_0$, $\langle\tau\rangle_x$ and $\sigma_{\tau,x}^2$ with Eq. (7) and Eq. (8). Taking the limit $k \to \infty$ and expanding for small $n$ then gives

$$\eta_0 \approx \left(1 + e^{2(\Delta E_w - \Delta E_r)/k_B T}\right)^{-1}, \tag{SI.17a}$$

r ηL ≈ ( e ∆ EwkBT − e ∆ ErkBT ) ne EwkBT ( + e ∆ ErkBT )+ e ErkBT ( + e ∆ EwkBT )+ e ( ∆ Ew + ∆ Er ) kBT . (SI.17b)

To obtain Eq. (11) of the Main Text, we observe that $r_{\eta L} \ge 0$ whenever $\Delta E_r \le \Delta E_w$. Moreover, $n \le \exp[\Delta E_r/k_B T]$. This holds because, in useful operating regimes, the probability to incorporate an r monomer must be larger than the probability of proofreading it. Considering then the largest possible value for $n$ in Eq. (SI.17b), taking the limit $\Delta E_r \ll$ …

[Figure SI.1: (a) Network 1 and Network 2, each with r and w branches; (b) and (c) scatter plots for networks 1 and 2 on axes from −1 to 1, with (c) comparing theoretical and numerical values.]

FIG. SI.1: Numerical tests of Eq. (4) and Eqs. (7)-(8). (a) Reaction networks for synthesis. Network 1 (left): monomer incorporation takes place after two consecutive irreversible reactions. Network 2 (right): monomer incorporation follows Michaelis-Menten enzyme dynamics. (b) Equivalence of the fixed-time and fixed-length setups. Each point corresponds to the numerical values of $r_{\eta L}$ and $r_{\eta T}$ obtained for an independent random choice of the reaction rates upon averaging over 5000 realizations of the polymerization process. For each realization we take $L = T$ such that $\langle L \rangle = r_{\eta L}$ used to test Eq. (4) with the corresponding theoretical predictions obtained by substituting Eqs. (SI.16) into Eqs. (7)-(8) for each choice of the reaction rates.

Protein translation model: to apply Eq. (SI.15) and Eq. (SI.16) to protein translation, we consider the model of Figure 2a with rate values as in [1] (see Table SI.1). To reduce the number of parameters, we introduce a minimal model of protein translation in which rates whose averages over the three E. coli strains in [1] have the same order of magnitude are assigned the same value (see Figure SI.2 and Table SI.1). We apply Eq. (SI.15) and Eq. (SI.16) to this minimal model to compute $\eta$ and $r_{\eta L}$. Expanding these quantities for large $k_+$ gives

$$\eta \approx \frac{k_\epsilon}{k_+}\left(1 + \frac{k_\epsilon}{k}\right)^{-1} \qquad \text{(SI.18a)}$$

r ηL ≈ − √ ( + k/kϵ ) −   (SI.18b)

These equations are equivalent to Eqs. (12) and (13) in the Main Text upon substituting the rates $k_+$, $k$, $k_\epsilon$ of the minimal model with the specific rates they originate from.

FIG. SI.2: Minimal model of protein translation, in which all reaction rates with the same order of magnitude in the original model of protein translation are set equal.

rate WT [s⁻¹] HYP [s⁻¹] ERR [s⁻¹] minimal model k r

40 27 37 k + k r − . .

41 0 . k − k r

25 14 31 k + k rp . × − . × − . × − k − k rf .

415 4 .

752 7 . k + k w

27 25 36 k + k w −

47 46 4 k + k w . .

49 3 . kk wp .

67 0 .

50 0 . kk wf . × − . × − . × − k ϵ

TABLE SI.1: Kinetic rates for the protein translation model measured in three different strains of E. coli [1, 7]: wild type (WT), hypercorrective rpsL141 mutant (HYP), and error-prone rpsD12 mutant (ERR). In the column "minimal model", we assign a single parameter to rates with the same order of magnitude. More specifically, we computed the average value of each rate over the three E. coli strains. We then assigned the value $k_\epsilon$ if the average is less than 0.… s⁻¹; $k_-$ if the average is in the range […, …] s⁻¹; $k$ if the average is in […, …] s⁻¹; and $k_+$ if the average is greater than 5 s⁻¹.

VI. T7 DNA POLYMERASE

We consider the reaction network in Figure SI.3 to model incorporation by the T7 DNA polymerase. This network corresponds to the one in [5], except that we neglect polymerase detachment. The rates are $k^r_{\mathrm{pol}} = 300$ s⁻¹, $k^r_{\mathrm{pp}} = 100$ s⁻¹, $k^r_{\mathrm{next}} = 300$ s⁻¹, $k^r_{\mathrm{sx}} = …$ s⁻¹, $k^r_{\mathrm{sp}} = 700$ s⁻¹, $k^w_{\mathrm{pol}} = ….03$ s⁻¹, $k^w_{\mathrm{pp}} < … \times 10^{-…}$ s⁻¹, $k^w_{\mathrm{next}} = ….01$ s⁻¹, $k^w_{\mathrm{sx}} = …$ s⁻¹, $k^w_{\mathrm{sp}} = 700$ s⁻¹ and $k^w_{\mathrm{exo}} = 900$ s⁻¹. To obtain the datum shown in Figure 4 of the Main Text, we first computed $\eta$ and $r_{\eta L}$ for the full network using Eqs. (7)-(8) of the Main Text and Eqs. (SI.15)-(SI.16). Substituting the rates for the T7 DNA polymerase gives $\eta = … \times 10^{-…}$ and $r_{\eta L} = -0.06$. We then repeated the procedure for the core network, defined by removing the proofreading reaction (Figure SI.3), and obtained $\eta_0^{\mathrm{core}} = … \times 10^{-…}$ and $r^{\mathrm{core}}_{\eta L} = -0.71$. Note that $r_{\eta L} <$ …

FIG. SI.3: Reaction network of the T7 DNA polymerase. Dashed lines represent reactions that are removed from the complete network to obtain the core network used in Figure 4.b.
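The correlation coefficients above come from Eqs. (7)-(8), while the numerical checks of Figure SI.1 use Gillespie simulations [3]. The following is a minimal Python sketch of such a check, not the authors' code. It assumes a hypothetical toy network in the spirit of Network 1 of Figure SI.1.a: an empty enzyme irreversibly binds a right or wrong monomer (rates `k_bind_r`, `k_bind_w`), and the bound monomer is then irreversibly incorporated (rates `k_inc_r`, `k_inc_w`). All rate names and values are illustrative assumptions. Each trajectory runs up to a fixed time T, and $r_{\eta L}$ is estimated as the Pearson correlation between the error fraction $\eta$ and the length $L$ across trajectories.

```python
import numpy as np


def simulate_fixed_time(k_bind_r, k_bind_w, k_inc_r, k_inc_w, T, rng):
    """Run one Gillespie trajectory of the (hypothetical) toy network up to time T.

    Returns (L, n_wrong): the number of incorporated monomers and how many
    of them are wrong (w) monomers.
    """
    t, L, n_wrong = 0.0, 0, 0
    bound = None  # None: enzyme empty; "r" or "w": type of the bound monomer
    while True:
        # Total escape rate out of the current state
        if bound is None:
            total = k_bind_r + k_bind_w
        elif bound == "r":
            total = k_inc_r
        else:
            total = k_inc_w
        # Exponentially distributed waiting time to the next reaction
        t += rng.exponential(1.0 / total)
        if t > T:
            return L, n_wrong
        if bound is None:
            # Pick the binding reaction with probability proportional to its rate
            bound = "r" if rng.random() < k_bind_r / total else "w"
        else:
            # Irreversible incorporation; the enzyme becomes empty again
            L += 1
            if bound == "w":
                n_wrong += 1
            bound = None


def error_speed_correlation(k_bind_r, k_bind_w, k_inc_r, k_inc_w,
                            T=20.0, n_traj=1000, seed=0):
    """Estimate r_etaL: Pearson correlation of error fraction and length at fixed time."""
    rng = np.random.default_rng(seed)
    eta, length = [], []
    for _ in range(n_traj):
        L, n_wrong = simulate_fixed_time(k_bind_r, k_bind_w, k_inc_r, k_inc_w, T, rng)
        if L > 0:  # the error fraction is undefined for an empty polymer
            eta.append(n_wrong / L)
            length.append(L)
    return np.corrcoef(eta, length)[0, 1]


# Forward discrimination: wrong monomers are incorporated ten times more slowly
r_fwd = error_speed_correlation(10.0, 5.0, 10.0, 1.0)
# Same binding step, but no discrimination in the incorporation step
r_none = error_speed_correlation(10.0, 5.0, 10.0, 10.0)
print(f"forward discrimination: r_etaL = {r_fwd:.2f}")
print(f"no forward discrimination: r_etaL = {r_none:.2f}")
```

With slower incorporation of wrong monomers, the estimate comes out clearly negative, in line with the statement that discrimination in the forward rates grants smaller errors to faster polymers. With `k_inc_w = k_inc_r`, the cycle duration no longer depends on the monomer type, so the estimate only fluctuates around zero. The fixed-length setup ($r_{\eta T}$) is not sketched here.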

Distribution of rates for Fig. 3 of the Main Text

Quantity: uniform in
$\log(k)$: [2; 6]
$\log(-\Delta E_w/k_BT)$: [−…, …]
$\log((\Delta E_w - \Delta E_r)/k_BT)$: […, …]
$\log(n) - \Delta E_r/k_BT$: [−…, …]

TABLE SI.2: Distribution of the random rates for the data in Figure 3.a of the Main Text. In this range of rates, the model always operates in the error-correction regime [6].

Quantity Uniform in k w ( k r ) [ , ] log ( k r / k r ) [− , ] log ( k rf / k r ) [− , ] log ( k w / k r ) [− , ] log ( k w − / k r ) [− , ] k w [ . , . ] k r − [ . , . ] k r [ . , . ] log ( k wf ) [− , − ]

TABLE SI.3: Distribution of the random rates for the data in Figure 3.a of the Main Text. Rates generated in these ranges are always consistent with the minimal model of Figure SI.2.

[1] Kinshuk Banerjee, Anatoly B. Kolomeisky, and Oleg A. Igoshin. Elucidating interplay of speed and accuracy in biological error correction. Proceedings of the National Academy of Sciences, 114(20):5183–5188, 2017.
[2] C. Gardiner. Stochastic Methods: A Handbook for the Natural and Social Sciences. Springer Series in Synergetics. Springer Berlin Heidelberg, 2009.
[3] Daniel T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976.
[4] Todd R. Gingrich and Jordan M. Horowitz. Fundamental bounds on first passage time fluctuations for currents. Phys. Rev. Lett., 119:170601, Oct 2017.
[5] Myron F. Goodman, Steven Creighton, Linda B. Bloom, John Petruska, and Thomas A. Kunkel. Biochemical basis of DNA replication fidelity. Critical Reviews in Biochemistry and Molecular Biology, 28(2):83–126, 1993. PMID: 8485987.
[6] J. J. Hopfield. Kinetic proofreading: A new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proceedings of the National Academy of Sciences, 71(10):4135–4139, 1974.
[7] Tillmann Pape, Wolfgang Wintermeyer, and Marina Rodnina. Induced fit in initial selection and proofreading of aminoacyl-tRNA on the ribosome. The EMBO Journal, 18(13):3800–3807, 1999.
[8] S. Redner. A Guide to First-Passage Processes. Cambridge University Press, 2001.
[9] Hugo Touchette. The large deviation approach to statistical mechanics.