Kinetics and thermodynamics of DNA polymerases with exonuclease proofreading

Abstract

Kinetic theory and thermodynamics are applied to DNA polymerases with exonuclease activity, taking into account the dependence of the rates on the previously incorporated nucleotide. The replication fidelity is shown to increase significantly thanks to this dependence, which is at the basis of the mechanism of exonuclease proofreading. In particular, this dependence can provide up to a hundred-fold lowering of the error probability under physiological conditions. Theory is compared with numerical simulations for the DNA polymerases of T7 viruses and human mitochondria.


arXiv [q-bio.SC]

Pierre Gaspard

Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Code Postal 231, Campus Plaine, B-1050 Brussels, Belgium


I. INTRODUCTION

In the companion paper [1], the kinetic theory and thermodynamics of exonuclease-deficient (exo$^-$) DNA polymerases have been developed on the basis of experimental observations from biochemistry [2–18] and analytical methods to solve the kinetic equations of copolymerization [19–21]. In this way, the error probability has been studied numerically and analytically for the exo$^-$ DNA polymerases of T7 viruses and human mitochondria, showing how replication fidelity is determined by kinetics and related to thermodynamics.

Molecular and thermal fluctuations at the nanoscale induce errors in DNA replication, which are at the origin of possible mutations. Following the discovery and systematic experimental studies of DNA polymerases [22–24], Hopfield, Ninio, and Bennett showed in the seventies how kinetics can reduce the error probability if replication is driven out of equilibrium [25–27]. Thanks to the kinetic amplification of discrimination between correct and incorrect base pairs, exo$^-$ DNA polymerases can already lower their error probability down to values of about $10^{-[\ldots]}$-$10^{-[\ldots]}$. However, the theory of quasispecies by Eigen and Schuster [28–30] implies that the self-replication of a quasispecies requires the mutation probability to be lower than a threshold inversely proportional to its genome size. For genome sizes as large as $10^{[\ldots]}$ nucleotides in higher eukaryotes [31], the mutation probability should thus be as low as $10^{-[\ldots]}$. Therefore, biological evolution towards such organisms would not have been possible without dedicated proofreading mechanisms greatly enhancing the fidelity of DNA replication. Progress in molecular biology has revealed that proofreading is specifically generated by the exonuclease activity of DNA polymerases, which is able to cleave incorrectly incorporated nucleotides one at a time, as well as by postreplication mismatch repair [5].
The latter mechanism is the feature of enzymes other than DNA polymerases, which will not be considered here [32–34].

In the present paper, our goal is to extend the analysis of the companion paper [1] to DNA polymerases with exonuclease proofreading and to investigate the implications of the dependence of the rates on the previously incorporated nucleotide. In the presence of the exonuclease activity, we shall see that this dependence becomes essential to significantly lower the error probability and increase fidelity in the transmission of genetic information.

In Section II, the kinetic scheme is extended to include the reactions of the exonuclease activity. The kinetic equations are explicitly given in Appendix A. As for exo$^-$ polymerases, these equations are reduced for the Michaelis-Menten kinetics, which continues to hold in the presence of the exonuclease activity. The thermodynamics of the enzymatic activities is also presented. The kinetic equations are solved analytically and the thermodynamic quantities are deduced under the assumptions of the Bernoulli-chain model in Section III and Appendix B, and of the Markov-chain model in Section IV and Appendix C. In Sections V and VI, the enzymatic process is numerically simulated for the exo$^+$ DNA polymerases of T7 viruses and human mitochondria and the results are analyzed with the theoretical methods. Conclusions are drawn in Section VII.

II. KINETICS AND THERMODYNAMICS

A. Generalities

Most DNA polymerases have an exonuclease proofreading mechanism besides their polymerase activity. The polymerase and exonuclease activities may be on the same polypeptide of the protein complex forming the enzyme, or on separate subunits. The elongation of DNA is catalyzed by the polymerase domain. If the latter is slowed down by the insertion of an incorrect nucleotide, the growing copy moves to the exonuclease domain, where the incorrect nucleotide is cleaved by hydrolysis. The reactions are thus

polymerase activity:
$$\mathrm{dNTP} + \mathrm{E}\cdot\mathrm{DNA}_l \rightleftharpoons \mathrm{E}\cdot\mathrm{DNA}_{l+1} + \mathrm{PP}_i , \tag{1}$$
exonuclease activity:
$$\mathrm{E}\cdot\mathrm{DNA}_{l+1} + \mathrm{H_2O} \rightleftharpoons \mathrm{E}\cdot\mathrm{DNA}_l + \mathrm{dNMP} , \tag{2}$$
overall reaction:
$$\mathrm{dNTP} + \mathrm{H_2O} \rightleftharpoons \mathrm{dNMP} + \mathrm{PP}_i , \tag{3}$$

where E denotes the enzyme, DNA$_l$ the deoxyribonucleic double helix with the template strand and the growing copy, dNTP deoxyribonucleoside triphosphate, PP$_i$ pyrophosphate, and dNMP deoxyribonucleoside monophosphate. The exonuclease activity proceeds by hydrolysis of the ultimate nucleotide attached to the growing chain, so that a deoxyribonucleoside monophosphate dNMP returns to the surrounding solution. The overall reaction (3) is the hydrolysis of dNTP into dNMP with the release of pyrophosphate by the polymerase activity. The Guldberg-Waage condition for the chemical equilibrium of this overall reaction is given by

$$\frac{[\mathrm{dNTP}]_{\rm eq}\, c^0}{[\mathrm{dNMP}]_{\rm eq}\,[\mathrm{PP}_i]_{\rm eq}} = \exp\left(\frac{\Delta G^0}{RT}\right) \tag{4}$$

in terms of the standard free enthalpy of hydrolysis into pyrophosphate $\Delta G^0 = -[\ldots]$/mol [35–37], the standard concentration $c^0 = 1$ M, the molar gas constant $R = 8.314$ J K$^{-1}$ mol$^{-1}$, and the temperature $T$. Given the physiological concentrations [38, 39]:

$$[\mathrm{dNTP}] \simeq [\ldots]\ \mu\mathrm{M} , \tag{5}$$
$$[\mathrm{dNMP}] \simeq [\ldots]\ \mu\mathrm{M} , \tag{6}$$
$$[\mathrm{PP}_i] \simeq [\ldots] , \tag{7}$$

the enzyme remains very far from chemical equilibrium even if the mean growth velocity vanishes. Indeed, Eq. (4) implies exceedingly small dNTP concentrations for physiological dNMP concentrations, or unphysical dNMP concentrations far above one molar for physiological dNTP concentrations.
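
As an illustration of how strongly Eq. (4) constrains the equilibrium concentrations, the following minimal sketch solves it for $[\mathrm{dNTP}]_{\rm eq}$. The function name and all numerical inputs, including the value used for $\Delta G^0$, are hypothetical placeholders chosen only to show orders of magnitude; they are not the values used in this paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J K^-1 mol^-1


def dntp_equilibrium(dnmp, ppi, delta_g0, temperature, c0=1.0):
    """Solve Eq. (4) for [dNTP]_eq (in M).

    Eq. (4): [dNTP]_eq * c0 / ([dNMP]_eq * [PPi]_eq) = exp(delta_g0 / (R T)),
    with delta_g0 in J/mol and concentrations in M.
    """
    return dnmp * ppi / c0 * math.exp(delta_g0 / (R_GAS * temperature))


# Hypothetical inputs (assumptions, not data from the paper).
example = dntp_equilibrium(dnmp=1e-4, ppi=1e-3, delta_g0=-40e3, temperature=293.15)
print(f"[dNTP]_eq = {example:.3e} M")
```

A strongly negative $\Delta G^0$ makes the exponential factor tiny, which is why the equilibrium dNTP concentration falls far below any physiological value.
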
Contrary to exonuclease-deficient DNA polymerases, where the vanishing of the growth velocity corresponds to thermodynamic equilibrium, the latter is not accessible if the exonuclease is active, so that energy dissipation is always present and the enzyme remains out of equilibrium, unless it dissociates from DNA.

As in the companion paper [1], the dissociation of the enzyme-DNA complex is neglected, which is justified if the processivity is high enough. Moreover, the surrounding solution is assumed to be sufficiently large to keep the concentrations of the different substances constant during the growth of the copy.

In the regime of steady growth, the mean growth velocity $v$ is given by

$$v = r^{\rm p} - r^{\rm x} , \tag{8}$$

where $r^{\rm p} = \langle \dot N_{\mathrm{PP}_i} \rangle$ denotes the polymerase rate, i.e., the rate of pyrophosphate release, while $r^{\rm x} = \langle \dot N_{\mathrm{dNMP}} \rangle$ is the exonuclease rate, i.e., the rate of dNMP release. Consequently, the exonuclease rate is equal to the polymerase rate, $r^{\rm p} = r^{\rm x}$, if the growth velocity vanishes, $v = 0$. Besides, the release of dNTP in the surrounding solution has the rate

$$\langle \dot N_{\mathrm{dNTP}} \rangle = - r^{\rm p} - r^{\rm x} . \tag{9}$$

Consequently, dNTP continues to be consumed if the growth velocity is zero, unless both the polymerase and exonuclease rates vanish, which only happens at the inaccessible chemical equilibrium. We should thus expect that the thermodynamic entropy production remains positive in the presence of the exonuclease activity.

B. Kinetic scheme

Here, we consider the kinetic scheme depicted in Fig. 1. The reaction rates of the elementary steps are determined by the mass action law, giving the appropriate dependence of the rates on the concentrations of nucleotides and pyrophosphate. The template is represented by its sequence $\alpha = n_1 \cdots n_l n_{l+1} \cdots$ and the copy by $\omega = m_1 \cdots m_l$, where the successive nucleotides are denoted $m, n \in \{{\rm A}, {\rm C}, {\rm G}, {\rm T}\}$. In this notation, we can express the dependence of the kinetics on the local environment of the ultimate nucleotide being attached or detached.

FIG. 1: Kinetic scheme of the polymerase and exonuclease activities. $\{m_j\}$ denotes the ssDNA copy, $\{n_j\}$ the ssDNA template, $m_j$P deoxynucleoside triphosphates dNTP, $m_j$ deoxynucleoside monophosphates dNMP, and P pyrophosphates PP$_i$.

The reactions of the polymerase activity are the same as in the companion paper [1]. From a copy ending with the ultimate monomeric unit $m_l$, the binding and dissociation of the deoxyribonucleoside triphosphate $m_{l+1}$P have the rates

dNTP binding:
$$k_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}{\rm P}] , \tag{10}$$
dNTP dissociation:
$$k_{-{m_{l+1} m_l \atop n_{l+1} n_l}} . \tag{11}$$

From the copy ending with $m_{l+1}$P, the release of pyrophosphate PP$_i$ (denoted P) and the reverse reaction of pyrophosphorolysis have the rates

polymerization:
$$k^{\rm p}_{+{m_{l+1} m_l \atop n_{l+1} n_l}} , \tag{12}$$
depolymerization:
$$k^{\rm p}_{-{m_{l+1} m_l \atop n_{l+1} n_l}}\, [{\rm P}] . \tag{13}$$

In the presence of the exonuclease activity, there are two further reactions. In Fig. 1, the binding of the deoxyribonucleoside monophosphate $m_{l+1}$ is depicted on the left-hand side, and its dissociation by hydrolysis on the right-hand side. These reactions have the rates

dNMP binding:
$$k^{\rm x}_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}] , \tag{14}$$
dNMP dissociation:
$$k^{\rm x}_{-{m_{l+1} m_l \atop n_{l+1} n_l}} , \tag{15}$$

the latter defining the rate constant of the exonuclease activity.

The kinetic equations of this scheme are given as Eqs. (A1)-(A2) in Appendix A.

The template sequence is characterized by the probability distributions $\nu_l(\alpha) = \nu_l(n_1 \cdots n_l)$ to find a subsequence $n_1 \cdots n_l$ of length $l$. The known properties of such sequences have been discussed in the companion paper [1]. For simplicity, we suppose in the following that the template sequence is Bernoullian with $\nu_l(\alpha) = 1/4^l$ for $l = 1, 2, 3, \ldots$

C. Michaelis-Menten kinetics

Experimental observations [2–18] show that the binding and dissociation of deoxyribonucleoside triphosphates are faster than the other reactions:

$$k_{+{m\,m' \atop n\,n'}}\,[m{\rm P}] ,\ k_{-{m\,m' \atop n\,n'}} \gg k^{\rm p}_{+{m\,m' \atop n\,n'}} ,\ k^{\rm p}_{-{m\,m' \atop n\,n'}}\,[{\rm P}] ,\ k^{\rm x}_{+{m\,m' \atop n\,n'}}\,[m] ,\ k^{\rm x}_{-{m\,m' \atop n\,n'}} . \tag{16}$$

Accordingly, the sequences $m_1 \cdots m_{l-1} m_l$ and $m_1 \cdots m_{l-1} m_l m_{l+1}{\rm P}$ remain in quasi-equilibrium and the kinetic equations are simplified, as shown in Appendix A, into an equation for the time evolution of the probability

$$P_t(\omega|\alpha) = P_t\left({m_1 \cdots m_l \atop n_1 \cdots n_l n_{l+1} \cdots}\right) , \tag{17}$$

which is the sum of probabilities (A5), as in the companion paper [1]. Now, the transition rates are given by the sums of the rates of polymerase and exonuclease activities:

$$W_{+{m_{l+1} m_l \atop n_{l+1} n_l}} = W^{\rm p}_{+{m_{l+1} m_l \atop n_{l+1} n_l}} + W^{\rm x}_{+{m_{l+1} m_l \atop n_{l+1} n_l}} , \tag{18}$$
$$W_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}} = W^{\rm p}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}} + W^{\rm x}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}} . \tag{19}$$

The rates of polymerase activity have already been presented in the companion paper [1], while those of exonuclease activity are the following. The rate of dNMP binding is equal to

$$W^{\rm x}_{+{m_{l+1} m_l \atop n_{l+1} n_l}} \equiv \frac{k^{\rm x}_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}]}{Q_{{m_l \atop n_{l+1} n_l}}} \tag{20}$$

and the rate of dNMP dissociation by the exonuclease is

$$W^{\rm x}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}} \equiv \frac{k^{\rm x}_{-{m_l m_{l-1} \atop n_l n_{l-1}}}}{Q_{{m_l \atop n_{l+1} n_l}}} \tag{21}$$

with the same denominators

$$Q_{{m_l \atop n_{l+1} n_l}} \equiv 1 + \sum_{m_{l+1}} \frac{[m_{l+1}{\rm P}]}{K_{{m_{l+1} m_l \atop n_{l+1} n_l}}} \tag{22}$$

and Michaelis-Menten dissociation constants

$$K_{{m\,m' \atop n\,n'}} \equiv \frac{k_{-{m\,m' \atop n\,n'}}}{k_{+{m\,m' \atop n\,n'}}} , \tag{23}$$

as for the rates of polymerase activity [1]. The rates (20) and (21) are written for the reactive events occurring to the sequence $m_1 \cdots m_l$ of the copy growing on the sequence $n_1 \cdots n_l n_{l+1} \cdots$ of the template.
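
The structure of Eqs. (20)-(22) can be made concrete with a short sketch that evaluates the Michaelis-Menten denominator and the two exonuclease rates for one fixed local context $(m_l, n_{l+1}, n_l)$. The function names, the dictionary layout, and the numbers are illustrative assumptions, not part of the original treatment.

```python
def mm_denominator(dntp, K):
    """Q = 1 + sum over candidate nucleotides m of [mP] / K_m, as in Eq. (22).

    dntp and K are dicts keyed by the candidate nucleotide m_{l+1};
    the local context (m_l, n_{l+1}, n_l) is assumed fixed by the caller.
    """
    return 1.0 + sum(dntp[m] / K[m] for m in K)


def exonuclease_rates(kx_plus, kx_minus, dnmp, Q):
    """Return (W^x_+, W^x_-): dNMP binding, Eq. (20), and dissociation, Eq. (21)."""
    return kx_plus * dnmp / Q, kx_minus / Q


# Hypothetical values: "C" plays the role of the correct partner (small K),
# the three other nucleotides are incorrect (larger K).
dntp = {"A": 2.0, "C": 2.0, "G": 2.0, "T": 2.0}
K = {"A": 4.0, "C": 1.0, "G": 4.0, "T": 4.0}
Q = mm_denominator(dntp, K)
w_bind, w_cleave = exonuclease_rates(kx_plus=9.0, kx_minus=4.5, dnmp=1.0, Q=Q)
print(Q, w_bind, w_cleave)
```

Both exonuclease rates share the denominator $Q$, so raising the dNTP concentration lowers them together.
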
We notice that the detachment of $m_l$ has a rate that depends not only on the template nucleotides $n_{l-1}$ and $n_l$ forming the base pairs $m_{l-1}$:$n_{l-1}$ and $m_l$:$n_l$, but also on the next template nucleotide $n_{l+1}$ because of the Michaelis-Menten kinetics.

The stochastic process ruled by the rates of polymerase and exonuclease activities can be simulated with Gillespie's algorithm [40, 41].

Following a cycle that closes after an overall reaction (3), we obtain the following Guldberg-Waage conditions for chemical equilibrium:

$$\frac{[m{\rm P}]_{\rm eq}\, c^0}{[m]_{\rm eq}\, [{\rm P}]_{\rm eq}} = c^0\, \frac{k_{-{m\,m' \atop n\,n'}}\, k^{\rm p}_{-{m\,m' \atop n\,n'}}\, k^{\rm x}_{+{m\,m' \atop n\,n'}}}{k_{+{m\,m' \atop n\,n'}}\, k^{\rm p}_{+{m\,m' \atop n\,n'}}\, k^{\rm x}_{-{m\,m' \atop n\,n'}}} = \exp\left(\frac{\Delta G^0}{RT}\right) . \tag{24}$$

Combining this thermodynamic constraint with the same assumption of proportionality between the rates of polymerization and depolymerization as in the companion paper [1],

$$K_{\rm P} \equiv \frac{k^{\rm p}_{+{m\,m' \atop n\,n'}}}{k^{\rm p}_{-{m\,m' \atop n\,n'}}} , \tag{25}$$

we obtain the rate constants of dNMP binding (14) given by

$$k^{\rm x}_{+{m\,m' \atop n\,n'}} = k^{\rm x}_{-{m\,m' \atop n\,n'}}\, \frac{K_{\rm P}}{K_{{m\,m' \atop n\,n'}}\, c^0} \exp\left(\frac{\Delta G^0}{RT}\right) \tag{26}$$

in terms of the exonuclease rate constants (15), which have been measured experimentally [4, 5, 8].

We shall also assume for simplicity that the nucleotide concentrations are all equal:

$$[\mathrm{dNTP}] \equiv [\mathrm{dATP}] = [\mathrm{dCTP}] = [\mathrm{dGTP}] = [\mathrm{dTTP}] , \tag{27}$$
$$[\mathrm{dNMP}] \equiv [\mathrm{dAMP}] = [\mathrm{dCMP}] = [\mathrm{dGMP}] = [\mathrm{dTMP}] . \tag{28}$$

After a long enough time, the copolymerization process reaches a regime of steady growth since there is no termination. In this regime, the mean growth velocity (8) becomes constant and the sequence of the growing copy takes stationary statistical properties described by the probability distribution $\mu_l(\omega|\alpha)$ to find the sequence $\omega = m_1 \cdots m_l$ of length $l$ given that the template has the sequence $\alpha$. This distribution describes in particular the mismatches between the copy and the template, which are generated with the error probability $\eta$ [1].

D. Thermodynamics and sequence disorder

In the regime of steady growth, the entropy production, which is explicitly given by Eq. (A7) in Appendix A, can be written as [19]

$$\Sigma \equiv \frac{1}{R}\frac{d_i S}{dt} = v\, A \geq 0 , \qquad A = \epsilon + D(\omega|\alpha) , \tag{29}$$

in terms of the mean growth velocity $v$, the entropy production per nucleotide or affinity $A$, the free-energy driving force per nucleotide $\epsilon$, and the conditional Shannon disorder $D(\omega|\alpha)$ per nucleotide in the sequence [1]. If replication fidelity is high enough and the substitutions are equiprobable, the conditional disorder per nucleotide can be estimated as

$$D(\omega|\alpha) \simeq \eta \ln\frac{3{\rm e}}{\eta} \ll 1 \quad \text{for} \quad \eta \ll 1 . \tag{30}$$

III. BERNOULLI-CHAIN MODEL

A. Kinetics and error probability

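The simplest model assumes that the rates only depend on the nucleotide that is attached or detached and on whether the pairing is correct or incorrect. Besides the polymerization and depolymerization rates $W^{\rm p}_{\pm{\rm c}}$ and $W^{\rm p}_{\pm{\rm i}}$ already presented in the companion paper [1], the rates of the exonuclease activity (and its reverse) are given by

$$W^{\rm x}_{+{\rm c}} = \frac{k^{\rm x}_{+{\rm c}}\, [\mathrm{dNMP}]}{Q} , \qquad W^{\rm x}_{+{\rm i}} = \frac{k^{\rm x}_{+{\rm i}}\, [\mathrm{dNMP}]}{Q} , \tag{31}$$
$$W^{\rm x}_{-{\rm c}} = \frac{k^{\rm x}_{-{\rm c}}}{Q} , \qquad W^{\rm x}_{-{\rm i}} = \frac{k^{\rm x}_{-{\rm i}}}{Q} , \tag{32}$$

with the Michaelis-Menten denominator

$$Q = 1 + \left(\frac{1}{K_{\rm c}} + \frac{3}{K_{\rm i}}\right) [\mathrm{dNTP}] , \tag{33}$$

and the rate constants of dNMP binding

$$k^{\rm x}_{+{\rm c}} = k^{\rm x}_{-{\rm c}}\, \frac{K_{\rm P}}{K_{\rm c}\, c^0} \exp\left(\frac{\Delta G^0}{RT}\right) , \tag{34}$$
$$k^{\rm x}_{+{\rm i}} = k^{\rm x}_{-{\rm i}}\, \frac{K_{\rm P}}{K_{\rm i}\, c^0} \exp\left(\frac{\Delta G^0}{RT}\right) , \tag{35}$$

introduced with Eq. (26).

Under these assumptions, the probability distribution of the copy sequence factorizes into the probabilities to have correct or incorrect base pairs, which read

$$\mu({\rm c}) = 1 - \eta \qquad \text{and} \qquad \mu({\rm i}) = \frac{\eta}{3} \tag{36}$$

in terms of the error probability $\eta$. Consequently, the growing copy is a Bernoulli chain. The probabilities (36) are here given by

$$\mu({\rm c}) = \frac{W^{\rm p}_{+{\rm c}} + W^{\rm x}_{+{\rm c}}}{W^{\rm p}_{-{\rm c}} + W^{\rm x}_{-{\rm c}} + v} , \tag{37}$$
$$\mu({\rm i}) = \frac{W^{\rm p}_{+{\rm i}} + W^{\rm x}_{+{\rm i}}}{W^{\rm p}_{-{\rm i}} + W^{\rm x}_{-{\rm i}} + v} , \tag{38}$$

where $v$ is the mean growth velocity. The latter can be expressed as

$$v = \frac{W^{\rm p}_{+{\rm c}} + W^{\rm x}_{+{\rm c}}}{1 - \eta} - W^{\rm p}_{-{\rm c}} - W^{\rm x}_{-{\rm c}} = 3\, \frac{W^{\rm p}_{+{\rm i}} + W^{\rm x}_{+{\rm i}}}{\eta} - W^{\rm p}_{-{\rm i}} - W^{\rm x}_{-{\rm i}} \tag{39}$$

by using Eq. (36) [20]. The error probability $\eta$ can thus be obtained as a root of a polynomial of degree two. Besides, the polymerase and exonuclease rates are given by

$$r^{\rho} = \nu_{\rho} \left[ W^{\rho}_{+{\rm c}} - W^{\rho}_{-{\rm c}}\, (1 - \eta) + 3\, W^{\rho}_{+{\rm i}} - W^{\rho}_{-{\rm i}}\, \eta \right] \tag{40}$$

with the stoichiometric coefficients $\nu_{\rm p} = +1$ and $\nu_{\rm x} = -1$ for $\rho = {\rm p}$ and x. We notice that the mean growth velocity (8) is recovered by Eqs. (39).

B. Thermodynamics and sequence disorder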

As shown in Appendix B, the thermodynamic entropy production is indeed given by Eq. (29) in terms of the mean growth velocity $v$, the free-energy driving force $\epsilon$, and the conditional Shannon disorder per nucleotide $D(\omega|\alpha)$. The latter takes the same expression in terms of the error probability as for the Bernoulli-chain model of exo$^-$ DNA polymerases [1], which is approximated by Eq. (30) if $\eta \ll 1$.

C. Low speed regime

As mentioned above, the polymerase and exonuclease activities do not approach thermodynamic equilibrium if the mean growth velocity vanishes, $v = 0$. Instead, the polymerase and exonuclease rates become equal by Eq. (8) and they can be evaluated as

$$r^{\rm p}_0 = r^{\rm x}_0 \simeq k^{\rm x}_{-{\rm c}} . \tag{41}$$

Indeed, the exonuclease rate is given by Eq. (40) with $\rho = {\rm x}$ and only the term with $W^{\rm x}_{-{\rm c}}$ dominates, since the rates of dNMP attachment $W^{\rm x}_{+{\rm c}}$ and $W^{\rm x}_{+{\rm i}}$ are very small by the Guldberg-Waage condition (4), while the term with $W^{\rm x}_{-{\rm i}}\, \eta$ is negligible because the error probability is also very small, $\eta \ll 1$. Moreover, the denominator (33) becomes unity in this regime, where the dNTP concentration is small with respect to the Michaelis-Menten dissociation constants $K_{\rm c}$ and $K_{\rm i}$.

Now, setting the velocity equal to zero in Eqs. (39) and evaluating the different terms, we similarly obtain the critical value of dNTP concentration and the corresponding error probability as

$$[\mathrm{dNTP}]_{0,{\rm B}} \simeq K_{\rm c} \left( \frac{[{\rm P}]}{K_{\rm P}} + \frac{k^{\rm x}_{-{\rm c}}}{k^{\rm p}_{+{\rm c}}} \right) , \tag{42}$$
$$\eta_{0,{\rm B}} = \frac{3\, k^{\rm p}_{+{\rm i}}\, [\mathrm{dNTP}]_{0,{\rm B}}}{k^{\rm x}_{-{\rm i}}\, K_{\rm i}} \simeq \frac{3\, k^{\rm p}_{+{\rm i}}\, K_{\rm c}}{k^{\rm x}_{-{\rm i}}\, K_{\rm i}} \left( \frac{[{\rm P}]}{K_{\rm P}} + \frac{k^{\rm x}_{-{\rm c}}}{k^{\rm p}_{+{\rm c}}} \right) , \tag{43}$$

in the Bernoulli-chain model. If $k^{\rm x}_{-{\rm c}} = 0$, we recover an estimation of the equilibrium dNTP concentration given in Ref. [1] for this model.

The entropy production can also be evaluated from Eq. (B1) when the growth velocity is zero to get

$$\frac{1}{R}\frac{d_i S}{dt}\bigg|_{0,{\rm B}} \simeq k^{\rm x}_{-{\rm c}}\, \ln \frac{K_{\rm c}\, c^0\, {\rm e}^{-\beta \Delta G^0}}{K_{\rm P}\, [\mathrm{dNMP}]} , \tag{44}$$

with $\beta = (RT)^{-1}$. Since the entropy production does not vanish, the polymerase remains out of equilibrium due to the exonuclease activity. We notice that the entropy production (44) would be infinite if the dNMP concentration was zero, because the reverse exonuclease reaction would have zero probability to occur in such a fully irreversible regime.

FIG. 2: Schematic diagram of enzymatic regimes in the plane of dNTP and dNMP concentrations showing the transition between polymerization and depolymerization if the mean growth velocity $v$ is vanishing, and the line where the reaction dNTP + H$_2$O $\rightleftharpoons$ dNMP + PP$_i$ is at chemical equilibrium.

In Fig. 2, the behavior of DNA polymerases is schematically depicted in the plane of dNTP and dNMP concentrations. In this plane, the chemical equilibrium condition (4) is a straight line going up from the origin with a very high slope. Without any approximation, the condition of zero velocity obtained from Eqs. (39) would read $\gamma\, [\mathrm{dNTP}] + \chi\, [\mathrm{dNMP}] = 1$ with two positive coefficients $\gamma$ and $\chi$, which corresponds to the line $v = 0$ in Fig. 2. Typically, the coefficients are ordered as $\gamma \gg \chi$, so that the approximation $[\mathrm{dNTP}]_0 \simeq \gamma^{-1}$ given by Eq. (42) is well satisfied. The DNA copy grows by polymerization at higher dNTP concentrations, $[\mathrm{dNTP}] > [\mathrm{dNTP}]_0$, while it undergoes depolymerization for lower values, $[\mathrm{dNTP}] < [\mathrm{dNTP}]_0$.

The complete thermodynamic equilibrium would be reached if both rates (8) and (9) were vanishing, in which case $r^{\rm p}_{\rm eq} = r^{\rm x}_{\rm eq} = 0$. This would happen at the intersection of both oblique lines in Fig. 2, but this point is at too large a dNMP concentration to be accessible, confirming that the exonuclease activity keeps the enzyme away from equilibrium.

D. Full speed regime

The growth velocity tends to its maximal value if the dNTP concentration is larger than the Michaelis-Menten crossover concentration:

$$[\mathrm{dNTP}] \gg \left( \frac{1}{K_{\rm c}} + \frac{3}{K_{\rm i}} \right)^{-1} . \tag{45}$$

In this regime, the detachment rates are negligible in Eqs. (39), so that the mean growth velocity and the error probability are given by

$$v_{\infty,{\rm B}} \simeq \frac{k^{\rm p}_{+{\rm c}}}{1 + 3\, K_{\rm c}/K_{\rm i}} , \tag{46}$$
$$\eta_{\infty,{\rm B}} \simeq \frac{3\, k^{\rm p}_{+{\rm i}}\, K_{\rm c}}{k^{\rm p}_{+{\rm c}}\, K_{\rm i}} . \tag{47}$$

If we compare with the results for exo$^-$ polymerases [1], we notice that these quantities are not affected by the exonuclease activity in the full speed regime. The reason is that the exonuclease rate rapidly decreases as

$$r^{\rm x}_{\rm B} \simeq \frac{k^{\rm x}_{-{\rm c}}}{[\mathrm{dNTP}]} \left( \frac{1}{K_{\rm c}} + \frac{3}{K_{\rm i}} \right)^{-1} , \tag{48}$$

if the dNTP concentration increases, so that the polymerase activity dominates: $v_{\infty} \simeq r^{\rm p}_{\infty}$. Accordingly, the entropy production, the affinity, and the free-energy driving force increase in this regime as the logarithm of the dNTP concentration, in the same way as for exo$^-$ polymerases [1].

IV. MARKOV-CHAIN MODEL

A. Kinetics and error probability

Experimental observations show that the rates of DNA polymerases depend on both the newly and previously incorporated nucleotides [5]. In the companion paper [1], we have already given the polymerization-depolymerization rates. With the exonuclease activity, we also need to include the corresponding rates

$$W^{\rm x}_{+p|p'} = \frac{k^{\rm x}_{+p|p'}\, [\mathrm{dNMP}]}{Q_{p'}} , \tag{49}$$
$$W^{\rm x}_{-p|p'} = \frac{k^{\rm x}_{-p|p'}}{Q_{p}} , \tag{50}$$

with $p, p' = {\rm c}$ or i, whether the pair is correct or incorrect. Here, we shall assume for simplicity that the exonuclease rates do not depend on the previously incorporated nucleotide, i.e., $k^{\rm x}_{-p|p'} = k^{\rm x}_{-p}$ for all $p'$. However, the dependence on $p$ and $p'$ remains for the polymerase rates, so that the Guldberg-Waage chemical equilibrium conditions (24) give the rate constants

$$k^{\rm x}_{+p|p'} = k^{\rm x}_{-p}\, \frac{K_{\rm P}}{K_{p|p'}\, c^0} \exp\left(\frac{\Delta G^0}{RT}\right) \tag{51}$$

for $p, p' = {\rm c}$ or i. These rates are thus determined in terms of available experimental data [4, 5, 8].

These assumptions imply that a copy growing on a Bernoullian template is a Markov chain with conditional, tip, and bulk probabilities calculated as explained in the companion paper [1] in terms of the transition rates (18)-(19). First, we need to calculate the partial velocities by iterating the self-consistent equations

$$v_{\rm c} = a\, v_{\rm c} + b\, v_{\rm i} , \tag{52}$$
$$v_{\rm i} = c\, v_{\rm c} + d\, v_{\rm i} , \tag{53}$$

with the coefficients

$$a = \frac{W_{+{\rm c}|{\rm c}}}{W_{-{\rm c}|{\rm c}} + v_{\rm c}} , \tag{54}$$
$$b = \frac{3\, W_{+{\rm i}|{\rm c}}}{W_{-{\rm i}|{\rm c}} + v_{\rm i}} , \tag{55}$$
$$c = \frac{W_{+{\rm c}|{\rm i}}}{W_{-{\rm c}|{\rm i}} + v_{\rm c}} , \tag{56}$$
$$d = \frac{3\, W_{+{\rm i}|{\rm i}}}{W_{-{\rm i}|{\rm i}} + v_{\rm i}} , \tag{57}$$

which themselves depend on the partial velocities. Thereafter, the tip probabilities are obtained by solving the following set of linear equations:

$$\mu({\rm c}) = a\, \mu({\rm c}) + 3\, c\, \mu({\rm i}) , \tag{58}$$
$$\mu({\rm i}) = \frac{b}{3}\, \mu({\rm c}) + d\, \mu({\rm i}) , \tag{59}$$

with the same coefficients (54)-(57), which are now determined. The mean growth velocity is then given by

$$v = v_{\rm c}\, \mu({\rm c}) + 3\, v_{\rm i}\, \mu({\rm i}) . \tag{60}$$

The conditional probabilities of the Markov chain can also be obtained [1].

In the Markov-chain model, the polymerase and exonuclease rates can be expressed in terms of the conditional probabilities $\mu(p'|p)$ and the tip probabilities $\mu(p)$ of the Markov chain as

$$r^{\rho} = \nu_{\rho} \sum_{p,p'} \left[ W^{\rho}_{+p|p'}\, \mu(p') - W^{\rho}_{-p|p'}\, \mu(p'|p)\, \mu(p) \right] \tag{61}$$

with $\rho = {\rm p}$ or x, the sum extending to $p, p' \in \{{\rm c}, {\rm i}, {\rm i}, {\rm i}\}$, and the same stoichiometric coefficient $\nu_{\rho}$ as in Eq. (40).

Now, the error probability is defined as $\eta \equiv 3\, \bar\mu({\rm i})$ in terms of the bulk probability of incorrect base pairs. Since the bulk and tip probabilities are proportional to each other according to $\bar\mu({\rm i}) = \mu({\rm i})\, v_{\rm i}/v$, we find that the error probability reads

$$\eta = \frac{1}{1 + \dfrac{v_{\rm c}\, \mu({\rm c})}{3\, v_{\rm i}\, \mu({\rm i})}} . \tag{62}$$

B. Thermodynamics and sequence disorder

The expression for the thermodynamic entropy production of the Markov-chain model in the regime of steady growth is given in Appendix C. Again, this expression can be written in the form (29) in terms of the mean growth velocity $v$, the free-energy driving force $\epsilon$, and the conditional disorder per nucleotide given by

$$D(\omega|\alpha) = - \sum_{p,p'} \mu(p'|p)\, \bar\mu(p)\, \ln \mu(p'|p) \geq 0 , \tag{63}$$

where $\mu(p'|p)$ and $\bar\mu(p)$ are respectively the conditional and bulk probabilities of the Markov chain and the sum extends to $p, p' \in \{{\rm c}, {\rm i}, {\rm i}, {\rm i}\}$.

C. Low speed regime

The critical dNTP concentration where the growth velocity vanishes can be obtained by requiring that Eqs. (58)-(59) admit a non-zero solution. The condition for this can be expressed as $(a - 1)(d - 1) = bc$ in terms of the coefficients (54)-(57) with $v_{\rm c} = v_{\rm i} = 0$, so that the mean growth velocity (60) is indeed zero. We obtain the critical value

$$[\mathrm{dNTP}]_{0,{\rm M}} \simeq K_{{\rm c}|{\rm c}} \left( \frac{[{\rm P}]}{K_{\rm P}} + \frac{k^{\rm x}_{-{\rm c}}}{k^{\rm p}_{+{\rm c}|{\rm c}}} \right) , \tag{64}$$

which is similar to the value (42) obtained for the Bernoulli-chain model. At this critical concentration, the polymerase and exonuclease rates are again given by Eq. (41) as in the Bernoulli-chain model, while the entropy production (C1) takes the value

$$\frac{1}{R}\frac{d_i S}{dt}\bigg|_{0,{\rm M}} \simeq k^{\rm x}_{-{\rm c}}\, \ln \frac{K_{{\rm c}|{\rm c}}\, c^0\, {\rm e}^{-\beta \Delta G^0}}{K_{\rm P}\, [\mathrm{dNMP}]} , \tag{65}$$

with $\beta = (RT)^{-1}$ in the Markov-chain model. This confirms that the polymerase remains away from equilibrium even if the growth velocity is zero.

If the dNTP concentration is larger than the critical value (64) but lower than the Michaelis-Menten dissociation constants for $p = {\rm c}$ and i,

$$[\mathrm{dNTP}]_{0,{\rm M}} \ll [\mathrm{dNTP}] \ll \left( \frac{1}{K_{{\rm c}|p}} + \frac{3}{K_{{\rm i}|p}} \right)^{-1} , \tag{66}$$

the mean growth velocity no longer vanishes and it is important to determine how it increases with the dNTP concentration. Since the error probability is expected to be very small, $\eta \ll 1$, the probability of a correct base pair at the tip of the growing copy is much larger than that of an incorrect base pair, $\mu({\rm c}) \simeq 1 \gg \mu({\rm i})$. In order to satisfy Eq. (58), the coefficient $a$ should be very close to unity: $a \simeq 1$. Consequently, Eq. (54) implies that the corresponding partial velocity is approximated by $v_{\rm c} \simeq W_{+{\rm c}|{\rm c}} - W_{-{\rm c}|{\rm c}}$. This expression is typically dominated by the polymerization rate $W^{\rm p}_{+{\rm c}|{\rm c}}$. At dNTP concentrations lower than the Michaelis-Menten dissociation constants, the denominator is close to the unit value, $Q_{\rm c} \simeq 1$, whereupon the mean growth velocity (60) can be evaluated as

$$v \simeq v_{\rm c} \simeq \frac{k^{\rm p}_{+{\rm c}|{\rm c}}}{K_{{\rm c}|{\rm c}}}\, [\mathrm{dNTP}] , \tag{67}$$

in the range (66).

Now, we turn to the error probability at low but non-vanishing growth velocity. Introducing the ratios of tip probabilities and partial velocities

$$x \equiv \frac{\mu({\rm c})}{3\, \mu({\rm i})} \qquad \text{and} \qquad y \equiv \frac{v_{\rm c}}{v_{\rm i}} , \tag{68}$$

the error probability (62) can be rewritten as $\eta = (1 + xy)^{-1}$. Taking the ratios of Eqs. (52)-(53) and (58)-(59), we obtain quadratic equations for $x$ and $y$ in terms of the coefficients (54)-(57):

$$b\, x = c\, y = \frac{1}{2} \left[ a - d + \sqrt{(a - d)^2 + 4\, bc} \right] . \tag{69}$$

At dNTP concentrations lower than the Michaelis-Menten dissociation constants where $Q_{\rm c} \simeq Q_{\rm i} \simeq 1$ […]

At concentrations satisfying the conditions

$$[\mathrm{dNTP}] \gg \left( \frac{1}{K_{{\rm c}|p}} + \frac{3}{K_{{\rm i}|p}} \right)^{-1} \quad \text{for } p = {\rm c} \text{ and i} , \tag{73}$$

the exonuclease activity characterized by the rate (61) with $\rho = {\rm x}$ decreases as

$$r^{\rm x}_{\rm M} \simeq \frac{K_{{\rm c}|{\rm c}}}{[\mathrm{dNTP}]} \left( k^{\rm x}_{-{\rm c}} + 3\, k^{\rm x}_{-{\rm i}}\, \frac{k^{\rm p}_{+{\rm i}|{\rm c}}\, K_{{\rm c}|{\rm i}}}{k^{\rm p}_{+{\rm c}|{\rm i}}\, K_{{\rm i}|{\rm c}}} \right) \tag{74}$$

for increasing dNTP concentration. Accordingly, the polymerase activity dominates in this regime and we recover the same expressions of the mean growth velocity, the error probability, the entropy production, the affinity, and the free-energy driving force per nucleotide as for exonuclease-deficient polymerases [1]. In particular, the error probability is given by

$$\eta_{\infty,{\rm M}} \simeq \frac{3\, k^{\rm p}_{+{\rm i}|{\rm c}}\, K_{{\rm c}|{\rm c}}}{k^{\rm p}_{+{\rm c}|{\rm c}}\, K_{{\rm i}|{\rm c}}} , \tag{75}$$

which is comparable to the result (47) for the Bernoulli-chain model at full speed.

E. The error probability of exonuclease proofreading

Remarkably, it is possible to obtain an expression for the error probability across the crossover from low to high dNTP concentrations. Instead of approximating the coefficient $b$ by Eq. (70), we go back to its definition (55). Provided that $W^{\rm x}_{+{\rm i}|{\rm c}} \ll W^{\rm p}_{+{\rm i}|{\rm c}}$ and $W^{\rm p}_{-{\rm i}|{\rm c}} \ll W^{\rm x}_{-{\rm i}|{\rm c}}$, we get

$$b \simeq \frac{3\, W^{\rm p}_{+{\rm i}|{\rm c}}}{W^{\rm x}_{-{\rm i}|{\rm c}} + v_{\rm i}} . \tag{76}$$

Now, the partial velocity for incorrect base pair incorporation can be approximated as $v_{\rm i} \simeq c\, v$ using Eq. (56) with $v \simeq v_{\rm c}$, so that

$$v_{\rm i} \simeq \frac{k^{\rm p}_{+{\rm c}|{\rm i}}}{K_{{\rm c}|{\rm i}}}\, [\mathrm{dNTP}] . \tag{77}$$

Substituting into Eq. (76) and supposing that the dNTP concentration is still low enough that $Q_{\rm c} \simeq Q_{\rm i} \simeq 1$, the coefficient is evaluated as

$$b \simeq \frac{3\, k^{\rm p}_{+{\rm i}|{\rm c}}\, K_{{\rm c}|{\rm i}}\, [\mathrm{dNTP}]}{K_{{\rm i}|{\rm c}} \left( k^{\rm x}_{-{\rm i}}\, K_{{\rm c}|{\rm i}} + k^{\rm p}_{+{\rm c}|{\rm i}}\, [\mathrm{dNTP}] \right)} . \tag{78}$$

With the coefficient (71), the error probability is again approximated by $\eta \simeq (xy)^{-1} \simeq bc$, whereupon we obtain

$$\eta_{\rm M} \simeq \eta_{\infty,{\rm M}}\, \frac{[\mathrm{dNTP}]}{[\mathrm{dNTP}] + K^{\rm x}_{\rm M}} \tag{79}$$

with the full speed error probability (75) and the constant

$$K^{\rm x}_{\rm M} \equiv \frac{k^{\rm x}_{-{\rm i}}\, K_{{\rm c}|{\rm i}}}{k^{\rm p}_{+{\rm c}|{\rm i}}} . \tag{80}$$

Equation (79) with the constants (75) and (80) constitutes the main result of this paper. In the low speed regime where $[\mathrm{dNTP}] \ll K^{\rm x}_{\rm M}$, we recover the error probability given by Eq. (72). For $[\mathrm{dNTP}] \gg K^{\rm x}_{\rm M}$, the error probability (75) at full speed is recovered. Therefore, Eq. (79) describes the behavior of the error probability in the crossover. For exonuclease-deficient DNA polymerases, the exonuclease rate constant vanishes, $k^{\rm x}_{-{\rm i}} = 0$, so that $K^{\rm x}_{\rm M} = 0$ and the error probability keeps its maximal value $\eta = \eta_{\infty,{\rm M}}$. Equation (79) shows that the dependence of the error probability on the dNTP concentration has a Michaelis-Menten form.

The key point is that the error probability is able to reach much lower values under the assumptions of the Markov-chain model than under those of the Bernoulli one. Indeed, for the Bernoulli-chain model, the error probability given by Eqs. (79)-(80) would read

$$\eta_{\rm B} \simeq \eta_{\infty,{\rm B}}\, \frac{[\mathrm{dNTP}]}{[\mathrm{dNTP}] + K^{\rm x}_{\rm B}} \tag{81}$$

with the full speed error probability (47) and

$$K^{\rm x}_{\rm B} \equiv \frac{k^{\rm x}_{-{\rm i}}\, K_{\rm c}}{k^{\rm p}_{+{\rm c}}} . \tag{82}$$

Since the polymerization rate slows down after the incorporation of an incorrect base pair, $k^{\rm p}_{+{\rm c}|{\rm i}} \ll k^{\rm p}_{+{\rm c}|{\rm c}}$, and the Michaelis-Menten dissociation constant becomes larger, $K_{{\rm c}|{\rm i}} > K_{{\rm c}|{\rm c}}$ [4, 5, 8], the constant (80) is significantly larger under the assumptions of the Markov-chain model than if the polymerase was insensitive to the previously incorporated nucleotide as in the Bernoulli-chain model, $K^{\rm x}_{\rm M} \gg K^{\rm x}_{\rm B}$. However, the full speed error probability remains comparable under both types of assumptions, $\eta_{\infty,{\rm M}} \simeq \eta_{\infty,{\rm B}}$.
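
The crossover formulas (79)-(82) are simple enough to evaluate directly. The sketch below compares the Markov-chain and Bernoulli-chain predictions at a few dNTP concentrations. The function names and every rate constant in it are hypothetical placeholders, chosen only so that $k^{\rm p}_{+{\rm c}|{\rm i}} \ll k^{\rm p}_{+{\rm c}|{\rm c}}$ and $K_{{\rm c}|{\rm i}} > K_{{\rm c}|{\rm c}}$; they are not the T7 or mitochondrial values.

```python
def crossover_constant(kx_minus_i, K_c, kp_plus_c):
    """K^x = k^x_{-i} * K_c / k^p_{+c}.

    Eq. (82) for the Bernoulli chain; Eq. (80) for the Markov chain when
    K_c and kp_plus_c are taken as K_{c|i} and k^p_{+c|i}.
    """
    return kx_minus_i * K_c / kp_plus_c


def error_probability(dntp, eta_inf, K_x):
    """eta = eta_inf * [dNTP] / ([dNTP] + K^x), Eqs. (79) and (81)."""
    return eta_inf * dntp / (dntp + K_x)


# Hypothetical constants (assumptions for illustration only).
ETA_INF = 1e-5  # full speed error probability, taken equal in both models
K_X_B = crossover_constant(kx_minus_i=2.0, K_c=2e-5, kp_plus_c=250.0)
K_X_M = crossover_constant(kx_minus_i=2.0, K_c=8e-5, kp_plus_c=0.01)

for dntp in (1e-6, 1e-4, 1e-2, 1.0):
    eta_b = error_probability(dntp, ETA_INF, K_X_B)
    eta_m = error_probability(dntp, ETA_INF, K_X_M)
    print(f"[dNTP] = {dntp:.0e} M   eta_B = {eta_b:.3e}   eta_M = {eta_m:.3e}")
```

With these placeholder numbers the Markov-chain constant is several orders of magnitude larger than the Bernoulli one, so the Markov-chain error probability stays well below its full speed value over a much wider range of dNTP concentrations.
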
Thanks to the dependence of the polymerization rates on the previously incorporated nucleotide, as described by the Markov-chain model, the error probability is thus able to reach much lower values than otherwise. This proofreading mechanism can be most significant, as will be illustrated for the DNA polymerases of T7 viruses and human mitochondria in the following sections.

TABLE I: Exo$^+$ T7 DNA polymerase at 20 °C: The rate constants of the exonuclease activity used for the numerical simulations and the Markov-chain model. The rate constants are from Refs. [4, 5]. The other parameters are from the numerical simulations.

parameter                                value                               units
$k^{\rm x}_{-{\rm c}}$                   […]                                 s$^{-1}$
$k^{\rm x}_{-{\rm i}}$                   […]                                 s$^{-1}$
$[\mathrm{dNTP}]_0$                      […] $\times 10^{-[\ldots]}$         M
$r^{\rm p}_0 = r^{\rm x}_0$              […]                                 […]
$\eta_{\infty}$                          […] $\times 10^{-[\ldots]}$         nt$^{-1}$
$D_{\infty}$                             […] $\times 10^{-[\ldots]}$         nt$^{-1}$
$v_{\infty}$                             288                                 nt/s

V. T7 DNA POLYMERASE

A. Phenomenology

The kinetics of the exonuclease activity for the wild-type T7 DNA polymerase has been experimentally investigated [4, 5]. The parameter values of the exonuclease activity inferred from the measured data and used for the present numerical simulations are given in Table I. The parameters of the polymerase activity are the same as in the companion paper [1]. Since there is no complete set of data for every possible pairing, it is the Markov-chain model that is numerically simulated for the T7 DNA polymerase, as in Ref. [1]. The values $[\mathrm{dNMP}] = 10^{-[\ldots]}$ M and $[{\rm P}] = 10^{-[\ldots]}$ M are used respectively for the concentrations of deoxynucleoside monophosphate and pyrophosphate, which correspond to physiological conditions [38, 39].
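
A stochastic simulation of the Markov-chain kinetics can be sketched with a Gillespie-type loop. This is a minimal illustration under stated assumptions, not the simulation code used for the results reported here: the three incorrect nucleotides are lumped into a single class "i" attached with total rate $3\, W_{+{\rm i}|p'}$, the first unit of the chain is never removed, and all rate values are hypothetical. The function names and the dictionary convention `w[(p, p_prev)]` for $W_{\pm p|p'}$ are likewise assumptions.

```python
import random


def gillespie_copy(w_plus, w_minus, n_events, seed=0):
    """Grow a copy as a sequence of 'c' (correct) and 'i' (incorrect) pairs.

    w_plus[(p, prev)]  : attachment rate W_{+p|p'} of unit p on a tip prev
    w_minus[(p, prev)] : detachment rate W_{-p|p'} of tip p sitting on prev
    Returns the elapsed time and the final chain.
    """
    rng = random.Random(seed)
    chain = ["c"]  # start from one correct pair so a previous unit exists
    t = 0.0
    for _ in range(n_events):
        last = chain[-1]
        # Attachment channels: one correct nucleotide, three incorrect ones.
        channels = [("add", "c", w_plus[("c", last)]),
                    ("add", "i", 3.0 * w_plus[("i", last)])]
        if len(chain) >= 2:
            channels.append(("remove", last, w_minus[(last, chain[-2])]))
        total = sum(rate for _, _, rate in channels)
        if total <= 0.0:
            break
        t += rng.expovariate(total)      # exponential waiting time
        u = rng.random() * total         # pick a channel proportionally to its rate
        acc = 0.0
        for kind, unit, rate in channels:
            acc += rate
            if u < acc:
                break
        if kind == "add":
            chain.append(unit)
        else:
            chain.pop()
    return t, chain


def error_fraction(chain):
    """Fraction of incorrect pairs in the chain."""
    return chain.count("i") / len(chain)


# Hypothetical rates (assumptions): slow extension and fast removal after an error.
W_PLUS = {("c", "c"): 100.0, ("i", "c"): 0.5, ("c", "i"): 1.0, ("i", "i"): 0.5}
W_MINUS = {("c", "c"): 1.0, ("i", "c"): 20.0, ("c", "i"): 1.0, ("i", "i"): 20.0}
elapsed, copy = gillespie_copy(W_PLUS, W_MINUS, n_events=2000, seed=1)
print(len(copy), error_fraction(copy), (len(copy) - 1) / elapsed)
```

Averaging the error fraction and the growth velocity over many such chains, for each dNTP concentration entering the rates, would give estimates that can be compared with the analytical results of Sections III and IV.
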

B. Numerical and theoretical results

The kinetics is numerically simulated by using Gillespie's algorithm [1, 40, 41]. The concentrations of the four nucleotides are supposed to be equal according to Eqs. (27) and (28). The template is taken as a Bernoulli chain of equal probabilities $\nu(n) = \frac{1}{4}$ for $n \in \{{\rm A}, {\rm C}, {\rm G}, {\rm T}\}$. For every value of dNTP concentration, the growth of 5 × chains each of length 10 is numerically simulated and the different quantities of interest are computed by statistical averaging over this sample. In the following figures, the dots show the results of the numerical simulations, the solid lines those of the Markov-chain model of Section IV, and the dashed lines those of the Bernoulli-chain model of Section III.

In Fig. 3, we see that the growth velocity vanishes at the critical dNTP concentration given in Table I, which is very well approximated by both Eqs. (42) and (64): $[{\rm dNTP}]_{0,{\rm B}} = [{\rm dNTP}]_{0,{\rm M}} \simeq$ . × − M. According to Eq. (8), the polymerase and exonuclease rates both become equal to the exonuclease rate constant by Eq. (41), as confirmed by the corresponding values in Table I. Since the exonuclease activity goes on in spite of the vanishing of the growth velocity, the thermodynamic entropy production does not vanish and takes the positive value $\left.\frac{d_{\rm i}S}{dt}\right|_0 \simeq$ . $R$ s$^{-1}$, estimated by both Eqs. (44) and (65). Consequently, the affinity, which is the entropy production per incorporated nucleotide, and the free-energy driving force per nucleotide both diverge to infinity as the velocity vanishes. In the full speed regime, the exonuclease rate decreases to very small values. Therefore, the behavior of the exo− polymerase is recovered. The growth velocity saturates at its maximal value, which becomes equal to the polymerization rate, while the entropy production increases logarithmically with [dNTP], as do the affinity and the free-energy driving force.

FIG. 3: Exo+ T7 DNA polymerase: Entropy production $\Sigma$ (crossed squares), mean polymerase rate $r^{\rm p}$ (open circles), mean growth velocity $v$ (filled triangles), affinity $A$ (filled squares), free-energy driving force $\epsilon$ (open squares), and mean exonuclease rate $r^{\rm x}$ (dotted squares) versus nucleotide concentration. The dots are the results of numerical simulations, the solid lines of the Markov-chain model, and the dashed lines of the Bernoulli-chain model.

In Fig. 4, the decrease of the exonuclease rate $r^{\rm x}$ is seen to manifest a small shoulder, which is not the case for the Bernoulli-chain model. The reason is that, as [dNTP] increases, the rate decreases as $r^{\rm x} \simeq$ . × − [dNTP] − by Eq. (74) of the Markov-chain model, while the Bernoulli-chain one would predict a faster decrease as $r^{\rm x} \simeq$ . × − [dNTP] − according to Eq. (48).

The most prominent result of Fig. 4 is that the error probability and the conditional Shannon disorder per nucleotide take drastically lower values in the Markov-chain model (dots and solid lines) than in the Bernoulli one (dashed lines). The error probability takes comparable values $\eta_{\infty,{\rm M}} \simeq \eta_{\infty,{\rm B}} \simeq$ − at full speed in both the Bernoulli- and Markov-chain models, as expected from Eqs. (47) and (75). At lower concentrations, in contrast, the error probability becomes much smaller in the Markov-chain model, whereas it keeps its full speed value down to very small dNTP concentrations in the Bernoulli-chain model. The behavior observed for the numerical simulations and the Markov-chain model is well described by Eq. (72) giving $\eta \simeq$ . × − [dNTP], which is the long-dashed line depicted in Fig. 4. This low speed behavior and the crossover to the full speed regime are well described by Eqs. (79)-(80) of the Markov-chain model. Since the polymerase is slowed down after an incorrect pairing, $k^{\rm p}_{+{\rm c|i}} = 0.01$ s$^{-1}$ $\ll k^{\rm p}_{+{\rm c|c}} = k^{\rm p}_{+{\rm c}} = 300$ s$^{-1}$, the concentration (80) where the crossover happens is much larger in the Markov-chain model than in the Bernoulli one, where it is given by Eq. (82): $K^{\rm x}_{\rm M} \simeq$ . × − M $\gg K^{\rm x}_{\rm B} \simeq$ . × − M. Therefore, the error probability keeps its full speed value down to a much lower dNTP concentration before mildly decreasing in the Bernoulli-chain model (dashed line in Fig. 4). For physiological dNTP concentrations (5), the error probability is thus two decades smaller thanks to the dependence of the kinetics on the previously incorporated nucleotide, which cannot be described by Bernoulli-chain models.

FIG. 4: Exo+ T7 DNA polymerase: Mean exonuclease rate $r^{\rm x}$ (dotted squares), conditional Shannon disorder per nucleotide $D$ (filled squares), and error probability $\eta$ (filled circles) versus nucleotide concentration. The dots are the results of numerical simulations, the solid lines of the Markov-chain model, and the dashed lines of the Bernoulli-chain model. The long-dashed line is the behavior of Eq. (72).
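The structure of Gillespie's algorithm [40, 41] used for the simulations of this section can be illustrated with a deliberately simplified Python sketch. It assumes a toy process with only two constant rates, one for attachment and one for detachment of a nucleotide, and therefore does not reproduce the full reaction network, the template dependence, or the parameters of Table I. The function name, the rate values, and the seed are hypothetical.

```python
import random


def gillespie_growth(k_attach, k_detach, t_max, seed=0):
    """Toy Gillespie simulation of a growing chain.

    Each step draws an exponentially distributed waiting time from the
    total rate, then selects attachment or detachment with probability
    proportional to its rate. Returns the chain length at time t_max.
    """
    rng = random.Random(seed)
    t, length = 0.0, 0
    while True:
        # Detachment is impossible from an empty chain.
        rate_detach = k_detach if length > 0 else 0.0
        total = k_attach + rate_detach
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            return length
        if rng.random() * total < k_attach:
            length += 1  # attachment event
        else:
            length -= 1  # detachment event


# Illustrative rates only (s^-1); the mean velocity should approach
# k_attach - k_detach for long simulation times.
t_max = 2000.0
velocity = gillespie_growth(k_attach=10.0, k_detach=2.0, t_max=t_max) / t_max
print(f"mean growth velocity = {velocity:.2f} nt/s")
```

In the simulations reported here, the two constant rates of this sketch would be replaced by the full set of template-dependent polymerase and exonuclease rates, and the averages would be taken over a large sample of chains.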

TABLE II: Exo+ human mitochondrial DNA polymerase γ at 37 °C: The rate constants of the exonuclease activity used for the numerical simulations. The rate constants are from Ref. [8]. The other parameters are from the numerical simulations.

parameter                          value      units
$k^{\rm x}_{-{\rm c}}$             .05        s$^{-1}$
$k^{\rm x}_{-{\rm i}}$             .          s$^{-1}$
$[{\rm dNTP}]_0$                   . × −      M
$r^{\rm p}_0 = r^{\rm x}_0$        .05        nt/s
$\eta_\infty$                      . × −      nt$^{-1}$
$D_\infty$                         . × −      nt$^{-1}$
$v_\infty$                         34         nt/s

VI. HUMAN MITOCHONDRIAL DNA POLYMERASE

A. Phenomenology

For the wild-type human mitochondrial polymerase, the kinetics of the exonuclease activity has been experimentally investigated and reported in Ref. [8]. Table II gives the values used here for the numerical simulations. The parameters of the polymerase activity are the same as in the companion paper [1]. The same values as in the previous Section V are taken for the concentrations of deoxynucleoside monophosphate and pyrophosphate. The parameters of the polymerase activity for the Bernoulli- and Markov-chain models are given in the companion paper [1].

B. Numerical and theoretical results

Gillespie's algorithm is again used to simulate numerically the stochastic process [1, 40, 41]. The concentrations of the four nucleotides are equal according to Eqs. (27) and (28), while the template is a Bernoulli chain of equal probabilities. The growth is numerically simulated for 5 × chains each of length 10 in order to perform the statistics. In the following figures, the results of the numerical simulations are depicted by dots, those of the Markov-chain model of Section IV by solid lines, and those of the Bernoulli-chain model of Section III by dashed lines.

Figure 5 shows the entropy production, the polymerization rate, the growth velocity, the affinity, the free-energy driving force, and the exonuclease rate as a function of dNTP concentration. The growth velocity vanishes at the critical dNTP concentration given in Table II, which is very well approximated by Eq. (64) giving $[{\rm dNTP}]_{0,{\rm M}} \simeq$ . × − M, while Eq. (42) gives the close value $[{\rm dNTP}]_{0,{\rm B}} \simeq$ . × − M. At this critical concentration, the polymerase and exonuclease rates become equal to the value given in Table II, which is equal to the exonuclease rate constant expected from Eq. (41). Accordingly, the thermodynamic entropy production remains positive at the value $\left.\frac{d_{\rm i}S}{dt}\right|_0 \simeq$ . $R$ s$^{-1}$ estimated by both Eqs. (44) and (65). Hence, the affinity and the free-energy driving force per nucleotide both diverge to infinity if the velocity goes to zero. As for the T7 DNA polymerase, the exonuclease rate decreases to very small values in the full speed regime where exonuclease-deficient behavior is recovered. In this regime, the growth velocity becomes equal to the polymerization rate, reaching the maximal value $v_\infty \simeq r^{\rm p}_\infty \simeq 34$ nt/s, which is smaller than for the T7 DNA polymerase, while the entropy production, the affinity, and the free-energy driving force increase logarithmically with the dNTP concentration.

FIG. 5: Exo+ human mitochondrial DNA polymerase: Entropy production $\Sigma$ (crossed squares), mean polymerase rate $r^{\rm p}$ (open circles), mean growth velocity $v$ (filled triangles), affinity $A$ (filled squares), free-energy driving force $\epsilon$ (open squares), and mean exonuclease rate $r^{\rm x}$ (dotted squares) versus nucleotide concentration. The dots are the results of numerical simulations, the solid lines of the Markov-chain model, and the dashed lines of the Bernoulli-chain model.

Figure 6 shows the exonuclease rate, the conditional Shannon disorder, and the error probability corresponding to the quantities in Fig. 5. For this exo+ polymerase, the Markov- and Bernoulli-chain models are simplifications of the full kinetics simulated by Gillespie's algorithm, which explains why the solid and dashed lines deviate from the dots for the exonuclease rate $r^{\rm x}$ at large values of dNTP concentration in Fig. 6. As [dNTP] increases, the rate decreases as $r^{\rm x} \simeq$ . × − [dNTP] − according to Eq. (74) of the Markov-chain model, while a faster decrease as $r^{\rm x} \simeq$ . × − [dNTP] − is given by Eq. (48) of the Bernoulli-chain model. Since the exonuclease activity decreases, the growth velocity becomes equal to the polymerase rate at large values of dNTP concentration.

FIG. 6: Exo+ human mitochondrial DNA polymerase: Mean exonuclease rate $r^{\rm x}$ (dotted squares), conditional Shannon disorder per nucleotide $D$ (filled squares), and error probability $\eta$ (filled circles) versus nucleotide concentration. The dots are the results of numerical simulations, the solid lines of the Markov-chain model, and the dashed lines of the Bernoulli-chain model. The long-dashed line is the behavior of Eq. (72).

Most remarkably, the error probability and the conditional Shannon disorder per nucleotide are much reduced in the Markov-chain model (dots and solid lines) with respect to the Bernoulli one (dashed lines), as for the exo+ T7 DNA polymerase. At full speed, the error probability saturates at its maximal value $\eta_\infty \simeq$ . × − approximated by Eqs. (47) and (75). However, the error probability is significantly smaller in the Markov- than in the Bernoulli-chain model at low dNTP concentration. Indeed, Eq. (72) of the Markov-chain model predicts that $\eta \simeq$ .28 [dNTP], giving the long-dashed line depicted in Fig. 6 in agreement with the simulations. This observed behavior of the error probability is well described by Eqs. (79)-(80) of the Markov-chain model. The Bernoulli-chain model fails to generate this reduction of the error probability because it is not sensitive to the previously incorporated nucleotide. The Markov-chain model is able to take into account the slowing down of the polymerase after a mismatch thanks to the distinction between the rate constants $k^{\rm p}_{+{\rm c|i}} =$ 0. − $\ll k^{\rm p}_{+{\rm c|c}} =$ 37. − $\simeq k^{\rm p}_{+{\rm c}} =$ 34. −, which is not possible in the Bernoulli-chain model. For the same reason, the crossover to the regime with a much lower error probability happens in the Markov-chain model at a larger dNTP concentration than in the Bernoulli one because $K^{\rm x}_{\rm M} \simeq$ . × − M $\gg K^{\rm x}_{\rm B} \simeq$ . × − M. As seen in Fig. 6, the error probability indeed keeps its full speed value down to the much lower concentration $[{\rm dNTP}] \simeq K^{\rm x}_{\rm B}$ in the Bernoulli-chain model (dashed line in Fig. 6) than in the Markov-chain one, where the drop in the error probability already happens for concentrations below $[{\rm dNTP}] \simeq K^{\rm x}_{\rm M}$. For physiological dNTP concentrations (5), the error probability is thus again two decades smaller thanks to the dependence of the kinetics on the previously incorporated nucleotide, which is the feature of the Markov-chain model.

VII. CONCLUSION

In the present paper, the mechanism of exonuclease proofreading is analyzed in detail using experimental observations from biochemistry [2–18] and theoretical methods already applied to exonuclease-deficient DNA polymerases in the companion paper [1].

An essential aspect of exonuclease proofreading is the sensitivity of the enzymatic kinetics to mismatches in the base pairing of the previously incorporated nucleotide. Such mismatches induce a slowing down of the polymerase activity, allowing the DNA strand to jump to the exonuclease domain of the enzyme where the misincorporated nucleotide is cleaved out [5]. Such a mechanism would not be possible if the enzyme were memoryless and its rates only depended on the currently incorporated nucleotide, in which case the copy growing on a Bernoullian template would itself be a Bernoulli chain. If the rates also depend on the previously incorporated nucleotide, the copy forms a Markov chain even if the template is Bernoullian. A comparison has thus been systematically carried out between the Bernoulli- and Markov-chain models. For exo− DNA polymerases, both types of models behave similarly (except close to equilibrium). In contrast, the difference between both models is drastic in the presence of the exonuclease activity. While the error probability remains constant down to low values of dNTP concentration in the Bernoulli-chain model, it decreases significantly in the Markov-chain model, showing how important the enzymatic memory of previous mismatches is for exonuclease proofreading.

In Fig. 7, the error probability is plotted as a function of dNTP concentration for the different exo− and exo+ DNA polymerases studied in the companion and present papers [1]. The results of numerical simulations are depicted as dots and those of the Markov-chain model as solid lines. The analysis confirms that the replication fidelity is lower for the human mitochondrial DNA polymerase than for the T7 DNA polymerase. Thanks to the dependence of the rates on the previously incorporated nucleotide taken into account in the Markov-chain model, a large amount of proofreading is achieved by the exonuclease activity. For physiological dNTP concentrations, we see in Fig. 7 that the error probability undergoes a hundred-fold lowering with respect to the value provided by the kinetic amplification [25, 26] of the lone polymerase activity at high dNTP concentration, in agreement with experimental observations [3–5, 8, 9]. If the error probability is of the order of $\eta \simeq$ − for an efficient polymerase activity, it will reach values as small as $\eta \simeq$ − with exonuclease proofreading. This hundred-fold reduction of the error probability is also consistent with the drop in mutation rate from RNA viruses having polymerases devoid of exonuclease activity, to DNA viruses equipped with exonuclease activity [31]. The dependence of the error probability on dNTP concentration is explained thanks to Eq. (79), which describes the crossover from its maximal value if $[{\rm dNTP}] \gg K^{\rm x}_{\rm M}$, down to lower values if $[{\rm dNTP}] \ll K^{\rm x}_{\rm M}$. In this range of dNTP concentrations, which contains the physiological conditions, the error probability is proportional to the dNTP concentration and it decreases with the concentration. The crossover concentration $K^{\rm x}_{\rm M}$ given by Eq. (80) takes a large value precisely thanks to the fact that the polymerization rate is slowed down after the incorporation of an incorrect base pair, $k^{\rm p}_{+{\rm c|i}} \ll k^{\rm p}_{+{\rm c|c}}$, while the corresponding Michaelis-Menten dissociation constant is enhanced, $K_{\rm c|i} \gg K_{\rm c|c}$. Accordingly, the dependence of the kinetic constants on both the current and previous pairings in the Markov-chain model allows the error probability to decrease with the dNTP concentration. Consequently, exonuclease proofreading has the advantage over the kinetic amplification of the polymerase activity that the replication fidelity tends to increase if the pool of nucleotides decreases. The thermodynamic cost is that the enzyme remains away from equilibrium even if the growth velocity vanishes due to the increase of dNMP cleavage. The dependence (79) of the error probability on nucleotide concentration is a prediction of theory and could be tested experimentally.

FIG. 7: Error probability versus dNTP concentration for the exo− (open symbols) and exo+ (filled symbols) DNA polymerases of T7 viruses and human mitochondria. The dots depict the results of numerical simulations and the lines those of the Markov-chain model. The long-dashed lines show the behavior of Eq. (72). For the exo− polymerases, the pluses depict the equilibrium values of the error probability.

The present theory and methods can be applied as well to other DNA polymerases in order to understand in detail how fidelity depends on important control parameters such as the substrate concentrations. Furthermore, the analytical results should allow us to calculate the error probability of exonuclease proofreading using modern computational approaches. Indeed, the rate and dissociation constants in Eqs. (75) or (80) can be determined by using Arrhenius' law of kinetics in terms of the free-energy landscape of the enzyme-DNA complex along its reaction pathway and conformational changes thanks to the computational methods [42–44].

An open issue is that an error probability of about 10 − for exonuclease proofreading, as evaluated in the present paper for efficient DNA polymerases under physiological conditions, would limit the genome size to 10 nucleotides according to the theory of quasispecies by Eigen and Schuster [28–30]. The fact is that other proofreading mechanisms such as the postreplication DNA mismatch repair [32–34] are in action to further reduce the error probability in higher eukaryotes having genome sizes as large as 10 nucleotides [31].

Acknowledgments

The author is grateful to D. Andrieux, D. Bensimon, J. England, D. Lacoste, Y. Rondelez, and S. A. Rice for helpful discussions, remarks, and support during the elaboration of this work. This research is financially supported by the Université Libre de Bruxelles, the FNRS-F.R.S., and the Belgian Federal Government under the Interuniversity Attraction Pole project P7/18 “DYGEST”.

Appendix A: Equations for kinetics and thermodynamics

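For the reaction network depicted in Fig. 1, the kinetic equations ruling the time evolution of the probabilities that the growing copy has respectively the sequences $m_1 \cdots m_l$ and $m_1 \cdots m_l\, m_{l+1}{\rm P}$ are given by

$$
\begin{aligned}
\frac{d}{dt} P_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
={}& k^{\rm p}_{+{m_l m_{l-1} \atop n_l n_{l-1}}}\, P_t\begin{pmatrix} m_1 \cdots m_l{\rm P} \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
+ \sum_{m_{l+1}} k_{-{m_{l+1} m_l \atop n_{l+1} n_l}}\, P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1}{\rm P} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix} \\
&+ k^{\rm x}_{+{m_l m_{l-1} \atop n_l n_{l-1}}}\, [m_l]\, P_t\begin{pmatrix} m_1 \cdots m_{l-1} \\ n_1 \cdots n_{l-1}\, n_l \cdots \end{pmatrix}
+ \sum_{m_{l+1}} k^{\rm x}_{-{m_{l+1} m_l \atop n_{l+1} n_l}}\, P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix} \\
&- \left( k^{\rm p}_{-{m_l m_{l-1} \atop n_l n_{l-1}}}\, [{\rm P}] + \sum_{m_{l+1}} k_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}{\rm P}] + k^{\rm x}_{-{m_l m_{l-1} \atop n_l n_{l-1}}} + \sum_{m_{l+1}} k^{\rm x}_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}] \right) P_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
\end{aligned}
\tag{A1}
$$

and

$$
\begin{aligned}
\frac{d}{dt} P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1}{\rm P} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix}
={}& k_{+{m_{l+1} m_l \atop n_{l+1} n_l}}\, [m_{l+1}{\rm P}]\, P_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
+ k^{\rm p}_{-{m_{l+1} m_l \atop n_{l+1} n_l}}\, [{\rm P}]\, P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix} \\
&- \left( k_{-{m_{l+1} m_l \atop n_{l+1} n_l}} + k^{\rm p}_{+{m_{l+1} m_l \atop n_{l+1} n_l}} \right) P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1}{\rm P} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix}
\end{aligned}
\tag{A2}
$$

in terms of the rates (10)-(15) for $l = 1, 2, 3, \ldots$ In Eq. (A1) for the probability of a copy ending with a monophosphate group, the gain terms are due to polymerization, dNTP dissociation, dNMP binding, and dNMP dissociation, while the loss terms are due to depolymerization, dNTP binding, dNMP dissociation, and dNMP binding. In Eq. (A2) for the probability of a copy ending with a triphosphate group, the gain terms are due to dNTP binding and depolymerization, and the loss terms to dNTP dissociation and polymerization. For $l = 1$ in Eq. (A1) and $l = 0$ in Eq. (A2), the symbols $m_0$ and $n_0$ stand for the empty set: $m_0 = n_0 = \emptyset$. For $l = 0$, Eq. (A1) should be replaced by

$$
\begin{aligned}
\frac{d}{dt} P_t\begin{pmatrix} \emptyset \\ n_1\, n_2 \cdots \end{pmatrix}
={}& \sum_{m_1} k_{-{m_1 \emptyset \atop n_1 \emptyset}}\, P_t\begin{pmatrix} m_1{\rm P} \\ n_1\, n_2 \cdots \end{pmatrix}
+ \sum_{m_1} k^{\rm x}_{-{m_1 \emptyset \atop n_1 \emptyset}}\, P_t\begin{pmatrix} m_1 \\ n_1\, n_2 \cdots \end{pmatrix} \\
&- \left( \sum_{m_1} k_{+{m_1 \emptyset \atop n_1 \emptyset}}\, [m_1{\rm P}] + \sum_{m_1} k^{\rm x}_{+{m_1 \emptyset \atop n_1 \emptyset}}\, [m_1] \right) P_t\begin{pmatrix} \emptyset \\ n_1\, n_2 \cdots \end{pmatrix} .
\end{aligned}
\tag{A3}
$$

The equations (A1)-(A3) preserve the total probability. We notice that the sequence probabilities are proportional to the corresponding concentrations in a dilute solution.

Under the assumption (16), the kinetic equations (A1)-(A2) reduce to the following kinetic equation

$$
\begin{aligned}
\frac{d}{dt} \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
={}& W_{+{m_l m_{l-1} \atop n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_{l-1} \\ n_1 \cdots n_{l-1}\, n_l \cdots \end{pmatrix}
+ \sum_{m_{l+1}} W_{-{m_{l+1} m_l \atop n_{l+2} n_{l+1} n_l}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix} \\
&- \left( W_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}} + \sum_{m_{l+1}} W_{+{m_{l+1} m_l \atop n_{l+1} n_l}} \right) \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix} ,
\end{aligned}
\tag{A4}
$$

ruling the sum of probabilities

$$
\mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
\equiv P_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix}
+ \sum_{m_{l+1}} P_t\begin{pmatrix} m_1 \cdots m_l\, m_{l+1}{\rm P} \\ n_1 \cdots n_l\, n_{l+1}\, n_{l+2} \cdots \end{pmatrix} .
\tag{A5}
$$

The rates of Eq. (A4) are given by Eqs. (18)-(19). The total probability is also preserved by the kinetic equations (A4).

The rates appearing in Eqs. (8) and (9) can be expressed in terms of the rates and probabilities ruled by the kinetic equation (A4) as

$$
r^{\rho} = \nu_{\rho} \sum_l \sum_{m_1 \cdots m_l} \left[ W^{\rho}_{+{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_{l-1} \\ n_1 \cdots n_{l-1}\, n_l \cdots \end{pmatrix}
- W^{\rho}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix} \right]
\tag{A6}
$$

with the stoichiometric coefficient $\nu_{\rm p} = +1$ for the polymerase activity $\rho = {\rm p}$, and the stoichiometric coefficient $\nu_{\rm x} = -1$ for the exonuclease activity $\rho = {\rm x}$.

Now, the thermodynamic entropy production is given by [45–50]

$$
\begin{aligned}
\frac{1}{R} \frac{d_{\rm i}S}{dt} = \sum_{\rho = {\rm p}, {\rm x}} \sum_l \sum_{m_1 \cdots m_l}
& \left[ W^{\rho}_{+{m_l m_{l-1} \atop n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_{l-1} \\ n_1 \cdots n_{l-1}\, n_l \cdots \end{pmatrix}
- W^{\rho}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix} \right] \\
& \times \ln \frac{ W^{\rho}_{+{m_l m_{l-1} \atop n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_{l-1} \\ n_1 \cdots n_{l-1}\, n_l \cdots \end{pmatrix} }{ W^{\rho}_{-{m_l m_{l-1} \atop n_{l+1} n_l n_{l-1}}}\, \mathcal{P}_t\begin{pmatrix} m_1 \cdots m_l \\ n_1 \cdots n_l\, n_{l+1} \cdots \end{pmatrix} } \geq 0 ,
\end{aligned}
\tag{A7}
$$

which includes the contributions of polymerase and exonuclease activities.

Appendix B: Thermodynamics of the Bernoulli-chain model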

For the Bernoulli-chain model defined in terms of the rates (31)-(32) besides the polymerization-depolymerization rates, the thermodynamic entropy production (A7) becomes

$$
\frac{1}{R} \frac{d_{\rm i}S}{dt} = \sum_{\rho = {\rm p}, {\rm x}} \left\{ \left[ W^{\rho}_{+{\rm c}} - W^{\rho}_{-{\rm c}} (1 - \eta) \right] \ln \frac{W^{\rho}_{+{\rm c}}}{W^{\rho}_{-{\rm c}} (1 - \eta)} + \left( W^{\rho}_{+{\rm i}} - W^{\rho}_{-{\rm i}}\, \eta \right) \ln \frac{3\, W^{\rho}_{+{\rm i}}}{W^{\rho}_{-{\rm i}}\, \eta} \right\} \geq 0 .
\tag{B1}
$$

Separating the terms with the error probability in the logarithm and using Eq. (39) for the mean growth velocity $v$, we obtain the expression (29) for the thermodynamic entropy production in terms of the free-energy driving force

$$
\epsilon = \frac{1}{v} \sum_{\rho = {\rm p}, {\rm x}} \left\{ \left[ W^{\rho}_{+{\rm c}} - W^{\rho}_{-{\rm c}} (1 - \eta) \right] \ln \frac{W^{\rho}_{+{\rm c}}}{W^{\rho}_{-{\rm c}}} + \left( W^{\rho}_{+{\rm i}} - W^{\rho}_{-{\rm i}}\, \eta \right) \ln \frac{W^{\rho}_{+{\rm i}}}{W^{\rho}_{-{\rm i}}} \right\} \geq 0 ,
\tag{B2}
$$

and the conditional Shannon disorder per nucleotide

$$
D(\omega | \alpha) = -(1 - \eta) \ln (1 - \eta) - \eta \ln \frac{\eta}{3} .
\tag{B3}
$$

Appendix C: Thermodynamics of the Markov-chain model

For the Markov-chain model, the thermodynamic entropy production (A7) is given by

$$
\begin{aligned}
\frac{1}{R} \frac{d_{\rm i}S}{dt} = \sum_{\rho = {\rm p}, {\rm x}} \Big\{
& \left[ W^{\rho}_{+{\rm c|c}}\, \mu({\rm c}) - W^{\rho}_{-{\rm c|c}}\, \mu({\rm c|c})\, \mu({\rm c}) \right] \ln \frac{W^{\rho}_{+{\rm c|c}}}{W^{\rho}_{-{\rm c|c}}\, \mu({\rm c|c})} \\
&+ 3 \left[ W^{\rho}_{+{\rm c|i}}\, \mu({\rm i}) - W^{\rho}_{-{\rm c|i}}\, \mu({\rm i|c})\, \mu({\rm c}) \right] \ln \frac{W^{\rho}_{+{\rm c|i}}\, \mu({\rm i})}{W^{\rho}_{-{\rm c|i}}\, \mu({\rm i|c})\, \mu({\rm c})} \\
&+ 3 \left[ W^{\rho}_{+{\rm i|c}}\, \mu({\rm c}) - W^{\rho}_{-{\rm i|c}}\, \mu({\rm c|i})\, \mu({\rm i}) \right] \ln \frac{W^{\rho}_{+{\rm i|c}}\, \mu({\rm c})}{W^{\rho}_{-{\rm i|c}}\, \mu({\rm c|i})\, \mu({\rm i})} \\
&+ 9 \left[ W^{\rho}_{+{\rm i|i}}\, \mu({\rm i}) - W^{\rho}_{-{\rm i|i}}\, \mu({\rm i|i})\, \mu({\rm i}) \right] \ln \frac{W^{\rho}_{+{\rm i|i}}}{W^{\rho}_{-{\rm i|i}}\, \mu({\rm i|i})} \Big\} \geq 0 ,
\end{aligned}
\tag{C1}
$$

in terms of the conditional probabilities $\mu(p'|p)$ and tip probabilities $\mu(p)$ of the Markov chain with $p, p' = {\rm c}$ or ${\rm i}$.

[1] P. Gaspard, paper I.
[2] S. S. Patel, I. Wong, and K. A. Johnson, Biochem. , 511 (1991).
[3] I. Wong, S. S. Patel, and K. A. Johnson, Biochem. , 526 (1991).
[4] M. J. Donlin, S. S. Patel, and K. A. Johnson, Biochem. , 538 (1991).
[5] K. A. Johnson, Annu. Rev. Biochem. , 685 (1993).
[6] Y.-C. Tsai and K. A. Johnson, Biochem. , 9675 (2006).
[7] A. A. Johnson and K. A. Johnson, J. Biol. Chem. , 38090 (2001).
[8] A. A. Johnson and K. A. Johnson, J. Biol. Chem. , 38097 (2001).
[9] M. J. Longley, D. Nguyen, T. A. Kunkel, and W. C. Copeland, J. Biol. Chem. , 38555 (2001).
[10] H. R. Lee and K. A. Johnson, J. Biol. Chem. , 36236 (2006).
[11] L. A. Loeb and T. A. Kunkel, Annu. Rev. Biochem. , 429 (1982).
[12] H. Echols and M. F. Goodman, Annu. Rev. Biochem. , 477 (1991).
[13] T. A. Kunkel and K. Bebenek, Annu. Rev. Biochem. , 497 (2000).
[14] C. A. Sucato, T. G. Upton, B. A. Kashemirov, J. Osuna, K. Oertell, W. A. Beard, S. H. Wilson, J. Florián, A. Warshel, C. E. McKenna, and M. F. Goodman, Biochem. , 870 (2008).
[15] M. P. Roettger, M. Bakhtina, and M.-D. Tsai, Biochem. , 9718 (2008).
[16] L. Zhang, J. A. Brown, S. A. Newmister, and Z. Suo, Biochem. , 7492 (2009).
[17] L. M. Dieckman, R. E. Johnson, S. Prakash, and M. T. Washington, Biochem. , 7344 (2010).
[18] R. J. Bauer, M. T. Begley, and M. A. Trakselis, Biochem. , 1996 (2012).
[19] D. Andrieux and P. Gaspard, Proc. Natl. Acad. Sci. USA , 9516 (2008).
[20] D. Andrieux and P. Gaspard, J. Chem. Phys. , 014901 (2009).
[21] P. Gaspard and D. Andrieux, J. Chem. Phys. , 044908 (2014).
[22] I. R. Lehman, M. J. Bessman, E. S. Simms, and A. Kornberg, J. Biol. Chem. , 163 (1958).
[23] M. J. Bessman, I. R. Lehman, E. S. Simms, and A. Kornberg, J. Biol. Chem. , 171 (1958).
[24] D. Brutlag and A. Kornberg, J. Biol. Chem. , 241 (1972).
[25] J. J. Hopfield, Proc. Natl. Acad. Sci. USA , 4135 (1974).
[26] J. Ninio, Biochimie , 587 (1975).
[27] C. H. Bennett, Biosystems , 85 (1979).
[28] M. Eigen and P. Schuster, Naturwissenschaften , 541 (1977).
[29] M. Eigen and P. Schuster, Naturwissenschaften , 7 (1978).
[30] M. Eigen and P. Schuster, Naturwissenschaften , 341 (1978).
[31] S. Gago, S. F. Elena, R. Flores, and R. Sanjuán, Science , 1308 (2009).
[32] T. Lindahl, Proc. Natl. Acad. Sci. USA , 3649 (1974).
[33] A. Sancar and W. D. Rupp, Cell , 249 (1983).
[34] R. R. Iyer, A. Pluciennik, V. Burdett, and P. L. Modrich, Chem. Rev. , 302 (2006).
[35] F. Cady and H. Qian, Phys. Biol. , 036011 (2009).
[36] C. A. S. A. Minetti, D. P. Remeta, H. Miller, C. A. Gelfand, G. E. Plum, A. P. Grollman, and K. J. Breslauer, Proc. Natl. Acad. Sci. USA , 14719 (2003).
[37] P. A. Frey and A. Arabshahi, Biochem. , 11307 (1995).
[38] T. W. Traut, Mol. Cell. Biochem. , 1 (1994).
[39] J. K. Heinonen, Biological Role of Inorganic Pyrophosphate (Springer, New York, 2001).
[40] D. T. Gillespie, J. Comput. Phys. , 403 (1976).
[41] D. T. Gillespie, J. Phys. Chem. , 2340 (1977).
[42] J. Florián, M. F. Goodman, and A. Warshel, Proc. Natl. Acad. Sci. USA , 6819 (2005).
[43] Y. Xiang, M. F. Goodman, W. A. Beard, S. H. Wilson, and A. Warshel, Proteins , 231 (2008).
[44] R. Rucker, P. Oelschlaeger, and A. Warshel, Proteins , 671 (2010).
[45] I. Prigogine, Introduction to Thermodynamics of Irreversible Processes (Charles C. Thomas Publishers, Springfield IL, 1955).
[46] J. Schnakenberg, Rev. Mod. Phys. , 571 (1976).
[47] G. Nicolis, Rep. Prog. Phys. , 225 (1979).
[48] Luo Jiu-Li, C. Van den Broeck, and G. Nicolis, Z. Phys. B: Condens. Matter , 165 (1984).
[49] D.-Q. Jiang, M. Qian, and M.-P. Qian, Mathematical Theory of Nonequilibrium Steady States (Springer, Berlin, 2004).
[50] P. Gaspard, J. Chem. Phys. 120