Edge-effects dominate copying thermodynamics for finite-length molecular oligomers

Abstract

Living systems produce copies of information-carrying molecules such as DNA by assembling monomer units into finite-length oligomer (short polymer) copies. We explore the role of initiation and termination of the copy process in the thermodynamics of copying. By splitting the free-energy change of copy formation into informational and chemical terms, we show that copy accuracy plays no direct role in the overall thermodynamics. Instead, it is thermodynamically costly to produce outputs that are more similar to the oligomers in the environment than sequences obtained by randomly sampling monomers. Copy accuracy can be thermodynamically neutral, or even favoured, depending on the surroundings. Oligomer copying mechanisms can thus function as information engines that interconvert chemical and information-based free energy. Hard thermodynamic constraints on accuracy derived for infinite-length polymers instead manifest as kinetic barriers experienced while the copy is template-attached. These barriers are easily surmounted by shorter oligomers.


Jenny M. Poulton and Thomas E. Ouldridge
Department of Bioengineering and Centre for Synthetic Biology, Imperial College London, London, SW7 2AZ, UK.
(Dated: May 25, 2020)

Information transfer is the essence of the central dogma of molecular biology. The biopolymers created during DNA replication, RNA transcription and protein translation have specific monomer sequences that direct their biological function; they are created by copying of sequence information from a template polymer [1]. In order to perform its biological function, the copy must detach from the template that catalyses its growth.

Building synthetic copying systems is difficult even for moderate-length oligomer (short polymer) templates. The most successful examples rely on non-chemical or time-varying conditions; the copy first forms on the template and then separates via mechanical scission [2, 3], heat [4], or a change in chemical conditions [5]. Due to the challenge of separating products from templates, synthetic examples of the biologically-relevant context in which copying is chemically-driven and autonomous have only involved dimers and trimers [6, 7].

This difficulty in recreating a fundamental biological phenomenon in a minimal synthetic context suggests a gap in our understanding. The need to separate the copy and template fundamentally changes both the detailed mechanics and the overall thermodynamics of the process, removing free-energetic biases towards accuracy and limiting sequence discrimination to a purely kinetic phenomenon [8, 9]. Ref. [9] considered step-by-step growth of a copy polymer that separates from behind its growing tip, like a nascent RNA or polypeptide chain [1]. Because each copy-template bond must eventually be disrupted, accurate copies were shown to be out of equilibrium and require an excess polymerisation free energy above that required to grow an equilibrium, unbiased polymer. Furthermore, for finite binding free energies, this excess polymerisation free energy could not be fully transformed into free energy stored in sequence information by the model system. Producing an accurate copy was therefore seen to be necessarily thermodynamically irreversible.
These results were derived for infinite-length polymers, similar to previous analyses that neglect separation entirely [10–17]. Initiation and termination of the process, when the first monomer attaches and the completed copy finally detaches from the template, were ignored. While this approximation might be reasonable for some long biopolymers in vivo, it is a poor approximation for the copying of shorter oligomers and dimers. Given that synthetic systems are currently limited to short oligomers, and that early life is likely to have created short oligomers before the origin of complex enzyme-based copying machinery, it is worth studying initiation and termination in more detail. In this letter we probe the consequences of the "edge-effects" of initial attachment and final detachment on the copying of oligomer sequences.

We first consider the free-energy change for the production of a single finite-length copy under constant external conditions, separating it into chemical and informational terms. Using dimerisation as an example, we show that the thermodynamic constraints on information transfer are fundamentally altered relative to infinite-length polymers: in general there is no minimal cost to accurately reproducing the template sequence. Although this result holds for polymers of arbitrary finite length, we nonetheless observe a gradual kinetic cross-over to the previously predicted constraints on accuracy [9] for longer oligomers.

Model of oligomerisation.

Fig. 1 shows a dimerisation system, a prototype for a broader class of oligomerisation systems. A solvated template dimer carries information in its sequence of monomer units; in this case, the sequence is 1,1. The template is coupled to large baths of monomers and oligomers (in this case dimers) of a distinct second type of molecule, like a DNA template in a bath of RNA nucleotides and oligomers. This second type of molecule also comes in multiple varieties (in this case two) and can interact with the template in a sequence-specific way. In Fig. 1, we propose a particular thermodynamically-consistent model for dimer production. Dimers from the baths can also be broken down into their component monomers by the template.

FIG. 1. Minimal model of oligomerisation illustrated with dimers. A template (squares) interacts with baths of monomers and dimers of a second species (circles). Monomers can bind to the template and dimerize, while dimers binding to the template can be destroyed or interconverted. The standard dimerisation free energy is $\Delta G^{\ominus}_{\rm dim}$ for all sequences, but matching and non-matching varieties of monomer bind to the template with strengths $\Delta G_r$ and $\Delta G_w$, respectively, allowing selectivity. Rate constants $k_{\rm cat}$ and $k_{\rm on}$ define the dynamics.

Thermodynamics of oligomerisation.

The total free-energy change of the baths upon creating a single dimer of sequence $s$ is $\Delta G(s) = \Delta G^{\ominus}_{\rm dim} + (\ln[s] - \ln[s_1][s_2])$, where $\Delta G^{\ominus}_{\rm dim}$ is the sequence-independent free energy for forming a backbone bond under standard conditions. The concentrations are also measured relative to these standard conditions, and all free energies are in units of $k_B T$. Since the template itself acts catalytically [8], its free energy is unchanged and $\Delta G(s)$ is the total free-energy change of the process. For simplicity, we do not allow for kinetic proofreading cycles [18] in our analysis. For a generalized system with oligomers of length $|s|$,

$$\Delta G(s) = (|s| - 1)\,\Delta G^{\ominus}_{\rm dim} + \left( \ln[s] - \ln \prod_{i=1}^{|s|} [s_i] \right). \qquad (1)$$

Let $J(s)$ be the expected net rate at which sequence $s$ is produced by the system in the steady state of the template. The average rate of change of free energy is then $\Delta \dot{G} = \sum_s J(s)\,\Delta G(s)$. We define $q(s) = J(s)/J_{\rm tot}$, and the following probability distributions: $p(s) = [s]/[S_{\rm tot}]$, the probability of picking an oligomer of sequence $s$ from the oligomers with total concentration $[S_{\rm tot}]$; $m(s) = [s]/[M_{\rm tot}]$, the probability of picking a monomer of type $s$ from the monomers with total concentration $[M_{\rm tot}]$; and $t(s) = \prod_i m(s_i)$, which corresponds to the probability of the sequence $s$ occurring by selecting monomers randomly from the monomer pools. In these terms,

$$\Delta \dot{G} = J_{\rm tot} \left( \Big( \sum_s q(s)\,|s| \Big) - 1 \right) \Delta G^{\ominus}_{\rm dim} + J_{\rm tot} \left( \sum_s q(s) \ln \frac{p(s)}{t(s)} + \sum_s q(s) \ln \frac{[S_{\rm tot}]}{[M_{\rm tot}]^{|s|}} \right).$$

This expression can be re-written as

$$\Delta \dot{G} = J_{\rm tot}\,\Delta G_{\rm chem} + J_{\rm tot}\,\Delta G_{\rm inf}, \qquad (2)$$

with

$$\Delta G_{\rm chem} = (|s| - 1)\,\Delta G^{\ominus}_{\rm dim} + \ln \frac{[S_{\rm tot}]}{[M_{\rm tot}]^{|s|}}, \qquad \Delta G_{\rm inf} = \sum_s q(s) \ln \frac{p(s)}{t(s)}, \qquad (3)$$

assuming for simplicity that all oligomers are of the same length. The first term in Eq. 2 is the average chemical free-energy change of oligomerisation ignoring sequence, multiplied by the net rate of oligomer production. The second term is information-theoretic: for positive net production of all oligomers, $J(s) = q(s) J_{\rm tot} > 0$, $q(s)$ is the probability of picking a sequence $s$ from the pool of net products and $\Delta G_{\rm inf} = D(q\|t) - D(q\|p)$, where $D(q\|p) = \sum_s q(s) \ln \frac{q(s)}{p(s)}$ is the Kullback-Leibler divergence between $q(s)$ and $p(s)$. $\Delta G_{\rm inf}$ reflects the sequence statistics of monomer and oligomer baths, and the sequence-dependence of net oligomer production. This splitting into chemical and informational terms holds for arbitrary oligomer lengths and sequence alphabets, and is the first result of this letter.

An explicit model of dimerisation to explore the consequences of the general result in Eq. 2.

We consider a model, shown in Fig. 1, with two varieties of monomer in the template and copy. Here, matching and non-matching monomers bind to the template with the same rate constant $k_{\rm on}$, but since $\Delta G_r < \Delta G_w$ mismatches detach faster. The free energy released by the dimerisation of monomers is used to weaken the bonds between dimer and template, as was found to be optimal in [19]. $\Delta G^{\ominus}_{\rm dim}$ thus appears in the dimer off rate, with the on-template dimerisation/undimerisation having a fixed rate $k_{\rm cat}$. Throughout this letter $k_{\rm cat} = k_{\rm on} = 1$. Since we use an unbiased $m(s)$ in all cases, the symmetry of the problem gives identical physics for all template sequences; we shall use 1,1 for clarity. We assume constant bath concentrations, and calculate the flux $J(s)$ in steady state by analysing a single template as a Markov process (SI).

Is there necessarily a thermodynamic cost to accuracy?
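The question can be explored numerically straight from Eq. 3. The following is a minimal sketch, not the authors' code: the output distribution `q` and the three oligomer-bath distributions are illustrative values chosen here, with sequences ordered 1,1 / 1,2 / 2,1 / 2,2 and an unbiased monomer bath so that t(s) = 0.25. It evaluates the informational term and checks the identity with the two Kullback-Leibler divergences.

```python
import math

def kl(a, b):
    """Kullback-Leibler divergence D(a||b) in nats (units of k_B T)."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def delta_g_inf(q, p, t):
    """Informational term of Eq. 3: sum_s q(s) ln(p(s)/t(s))."""
    return sum(qs * math.log(ps / ts) for qs, ps, ts in zip(q, p, t) if qs > 0)

def delta_g_chem(length, dg_dim, s_tot, m_tot):
    """Chemical term of Eq. 3 for oligomers of a single length."""
    return (length - 1) * dg_dim + math.log(s_tot / m_tot ** length)

# Illustrative (assumed) distributions over the dimers 1,1 / 1,2 / 2,1 / 2,2.
t = [0.25, 0.25, 0.25, 0.25]          # unbiased monomer bath
q = [0.97, 0.01, 0.01, 0.01]          # highly accurate output for template 1,1
p_unbiased = [0.25, 0.25, 0.25, 0.25]
p_like_copies = [0.70, 0.10, 0.10, 0.10]
p_unlike_copies = [0.10, 0.10, 0.10, 0.70]

results = {
    "unbiased": delta_g_inf(q, p_unbiased, t),
    "like_copies": delta_g_inf(q, p_like_copies, t),
    "unlike_copies": delta_g_inf(q, p_unlike_copies, t),
}
for name, value in results.items():
    print(f"{name}: dG_inf = {value:+.3f} k_B T")
```

With these assumed numbers the informational term vanishes for the unbiased bath, is positive when the bath already resembles the accurate copies, and is negative when the bath is biased towards another sequence, whatever the accuracy encoded in `q`.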

The informational term in Eq. 2, and the results for infinite-length copies in Ref. [9], suggest a thermodynamic constraint on the accuracy of copying for a given chemical drive. Let us first consider a template of sequence 1,1 coupled to baths where all oligomers have the same concentration $[S_{\rm tot}]/4$, and all monomers have the same concentration $[M_{\rm tot}]/2$. Without loss of generality we may choose our standard concentration so that $[S_{\rm tot}]/[M_{\rm tot}] = 1$, and $\Delta G_{\rm chem} = \Delta G^{\ominus}_{\rm dim}$.

In Fig. 2a we plot the net production rate of each dimer as a function of $\Delta G_{\rm chem}$ at set $\Delta G_{\rm disc}$. Equilibrium is at $\Delta G_{\rm chem} = 0$, with net creation for all dimers for $\Delta G_{\rm chem} < 0$ and net destruction for $\Delta G_{\rm chem} > 0$. Fig. 2b shows $\epsilon = \big( J_{2,2} + \tfrac{1}{2}( J_{1,2} + J_{2,1} ) \big) / J_{\rm tot}$, the proportional rate at which incorrect monomers are incorporated into dimers. It is noticeable that while $\epsilon$ is undefined at exactly $\Delta G_{\rm chem} = 0$, as $\Delta G_{\rm chem} \to 0^-$ we obtain $\epsilon < 0.5$, implying non-zero accuracy. In the limit where $\Delta G_r$ and $\Delta G_w$ are increased but $\Delta G_{\rm disc}$ is held constant, the graph flattens, with less of an increase in $\epsilon$ close to zero. Perfect accuracy, $\epsilon \to 0$, is obtained in equilibrium for high $\Delta G_{\rm disc}$, as shown in SI2.

This result seems to contradict Ref. [9]. The minimal value of $-\Delta G_{\rm chem}$ required for growth gives non-zero accuracy, and reversible processes can create a low entropy dimer sequence distribution at finite discrimination free energy $\Delta G_{\rm disc}$. So is there no thermodynamic cost to accuracy? Consider $\Delta G_{\rm inf}$ in Eq. 3. In this case, $p(s)$ and $t(s)$ are unbiased and equal, and thus $\Delta G_{\rm inf} = 0$ for any $q(s)$, even if only accurate copies are produced. Whenever the surrounding oligomers and monomers have no (or the same) sequence bias there is no extra thermodynamic cost to producing sequences of arbitrary accuracy.

$\Delta G_{\rm inf}$ is generally non-zero, however, for systems with $p(s) \neq t(s)$. $\Delta G_{\rm inf}$ is positive if the system produces sequences that are common in $p(s)$ and rare in $t(s)$. The alternative representation $\Delta G_{\rm inf} = D(q\|t) - D(q\|p)$ makes this fact particularly clear. Accuracy is therefore not directly limited by thermodynamics in a general description of the full process of oligomer copying. Instead, there is a thermodynamic cost to producing sequence distributions $q(s)$ that are closer to the oligomer baths and further from the monomer baths. This argument is the second main result of this letter.

FIG. 2. A dimer copying model shows finite accuracy in the equilibrium limit. (a) The net production rate $J_s$ against chemical driving $-\Delta G_{\rm chem}$ for each of the four dimers 1,1, 1,2, 2,1 and 2,2 relative to a template of sequence 1,1, with $\Delta G_{\rm disc} = \Delta G_w - \Delta G_r = 2$, $\Delta G_r + \Delta G_w = 0$ and unbiased monomer and polymer baths. All $J_s$ pass through zero at the equilibrium point $\Delta G_{\rm chem} = 0$ but quickly separate when $\Delta G_{\rm chem} \neq 0$ (inset). $J_{1,2}$ and $J_{2,1}$ overlap. At larger driving, accurate copies are preferentially produced. (b) Error fraction at which incorrect type 2 monomers are incorporated into dimers, $\epsilon$, against $\Delta G_{\rm chem}$ for various $\Delta G_{\rm disc}$ and $\Delta G_r + \Delta G_w = 0$. The system has finite accuracy, $\epsilon \neq 0.5$, as $\Delta G_{\rm chem} \to 0^-$, with the accuracy dependent on $\Delta G_{\rm disc}$.

Template copying as an inherently non-equilibrium information engine.
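A useful quantity for this section follows directly from Eq. 1 and the definitions of p(s) and t(s): the free-energy change for a single sequence is ΔG(s) = ΔG_chem + ln(p(s)/t(s)), so a system that produced only sequence s would stall (ΔG(s) = 0) at ΔG_chem = ln(t(s)/p(s)). The sketch below evaluates these per-sequence stall points for the bath concentrations quoted in the caption of Fig. 3(a). The helper name `stall_point` is introduced here for illustration only.

```python
import math

# Bath concentrations from the Fig. 3(a) caption (relative to standard conditions).
monomers = {1: 0.1, 2: 0.1}
dimers = {(1, 1): 0.001, (1, 2): 0.001, (2, 1): 0.001, (2, 2): 0.1}

m_tot = sum(monomers.values())
s_tot = sum(dimers.values())

def stall_point(seq):
    """Delta G_chem at which Delta G(seq) = 0, i.e. ln(t(seq)/p(seq))."""
    p = dimers[seq] / s_tot                           # oligomer-bath probability
    t = math.prod(monomers[x] / m_tot for x in seq)   # random-monomer probability
    return math.log(t / p)

stalls = {seq: stall_point(seq) for seq in dimers}
for seq, value in stalls.items():
    print(seq, f"{value:+.3f}")

# The extremes bound the window of Delta G_chem in which J_tot = 0 can occur.
window = (min(stalls.values()), max(stalls.values()))
```

For these concentrations the rare dimers (including the accurate copy 1,1) stall at a positive ΔG_chem, while the abundant dimer 2,2 stalls at a negative one; the interval between the two is the range within which the zero of the total flux can sit.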

For systems with $p(s) \neq t(s)$, there is no equilibrium point at which all fluxes are zero because the baths are out of equilibrium with each other. There is instead a range of $\Delta G_{\rm chem}$ over which $J_{\rm tot} = 0$ could occur, depending on which sequences best couple to the template. The most positive possible $\Delta G_{\rm chem}$ at which $J_{\rm tot} = 0$ occurs when a system specifically produces the sequence $s_{\rm min}$, where $s_{\rm min}$ minimises $p(s)/t(s)$. The most negative is when the system specifically produces the sequence $s_{\rm max}$, where $s_{\rm max}$ maximises $p(s)/t(s)$. In Fig. 3a, we vary $\Delta G_{\rm disc}$ for a system heavily thermodynamically biased towards creating accurate copies by the baths. When $\Delta G_{\rm disc} > 0$, the system is also kinetically biased towards creating accurate copies and $J_{\rm tot} = 0$ for a more positive $\Delta G_{\rm chem}$ than if $\Delta G_{\rm disc} < 0$. If $q(s)$ is close to $q_{\rm min}(s)$ it is possible for a negative $\Delta G_{\rm inf}$ to overcome a positive $\Delta G_{\rm chem}$. Equally, a more negative $\Delta G_{\rm chem}$ allows for a $q(s) \approx q_{\rm max}(s)$ with positive $\Delta G_{\rm inf}$. We can think of this system as a chemical/information engine, in which chemical and information-based free energy can be traded against each other [20, 21].

The second law implies $\Delta \dot{G}$ is negative. Thus, from Eq. 2, there are three possible regimes for this informational engine, illustrated in Fig. 3(b). If $\Delta G_{\rm chem} < 0$ and $\Delta G_{\rm inf} > 0$, chemical work is used to generate outputs closer to $p(s)$ than $t(s)$, with an efficiency $\eta = \Delta G_{\rm inf} / (-\Delta G_{\rm chem}) \leq 1$. In our case, $\eta$ is largest when $p(s)$ is heavily biased towards accurate copies of the template and $\Delta G_{\rm chem}$ is small and negative. In the case where $\Delta G_{\rm chem} > 0$ and $\Delta G_{\rm inf} < 0$, the system generates outputs closer to $t(s)$ than $p(s)$, expending information to compensate for an unfavourable chemical work term. Here the efficiency $\eta = \Delta G_{\rm chem} / (-\Delta G_{\rm inf}) \leq 1$ reaches $\sim 0.15$ when $p(s)$ is heavily biased against accurate copies of the template and $\Delta G_{\rm chem}$ is small and positive. The final case, in which both $\Delta G_{\rm chem} \leq 0$ and $\Delta G_{\rm inf} \leq 0$, is less interesting as the system both spends chemical free energy and generates outputs close to $q(s) = q_{\rm min}(s)$.

Kinetic convergence on thermodynamic constraints for infinite-length polymers.

Our results apply to polymers of arbitrary finite length, so are the thermodynamic constraints derived neglecting initiation and termination (e.g. that extra chemical work is required for accuracy [9]) invalid? When initiation and termination are neglected, the physics is set by whether the slope of the free-energy profile of template-attached copy growth is favourable (negative). As shown in Fig. 4b, higher accuracy gives a more positive slope, and therefore requires a more negative $\Delta G^{\ominus}_{\rm dim}$ driving. Initiation and termination, however, provide a theoretically unlimited adjustment to the overall $\Delta G$. An arbitrarily unfavourable polymerisation process on the template can be made favourable with the right concentration of products.

If template-attached growth is unfavourable, however, it will be kinetically suppressed by a large free-energy barrier even if oligomer production is favourable overall. Barrier height grows proportionally to oligomer length, suggesting that the thermodynamic constraints derived for infinite-length polymers should become kinetic constraints for sufficiently long oligomers. To probe this hypothesis we consider a kinetic model for the growth and destruction of oligomers of arbitrary fixed length, using the same parameters as the dimerisation model of Fig. 1, by extending the model of Ref. [9]. In this thermodynamically-consistent model, illustrated in Fig. 4a and described in the SI, a copy grows relative to a template, monomer by monomer, in a sequence-specific way. The copy detaches from the template as it grows; at the end of each step only the final monomer is template bound. We now allow monomers, drawn from a distribution $m(s)$, to bind to the first template site to initiate copying, and oligomers to unbind from the final site to terminate it. The reverse is also possible; oligomers drawn from a sequence distribution $p(s)$ can bind to the final site and monomers unbind from the initial site.

FIG. 3. Oligomer copying as an information engine with no equilibrium point. (a) Total flux $J_{\rm tot}$ against driving $-\Delta G_{\rm chem}$ for a range of discrimination free energies $\Delta G_{\rm disc} = \Delta G_w - \Delta G_r$, with $[1] = [2] = 0.1$, $[1,1] = [1,2] = [2,1] = 0.001$ and $[2,2] = 0.1$. $\Delta G_{\rm disc}$ is varied with $\Delta G_r + \Delta G_w = 0$ fixed. The point $J_{\rm tot} = 0$ at which there is no net dimerisation varies within the allowed white range despite the fact that the overall dimerisation free energy is independent of $\Delta G_{\rm disc}$. Specificity for $s_{\rm min} = 1,1$ shifts $J_{\rm tot} = 0$ to the lower limit, and specificity for $s_{\rm max} = 2,2$ shifts it to the upper limit. (b) Regimes of the information engine, showing where $J_{\rm tot}$ changes from negative to positive. Here we fix $\Delta G_{\rm disc} = 5$, $[S_{\rm tot}] = 1$, $[M_{\rm tot}] = 1$, $t(s) = 0.25$ for all $s$ and vary $\Delta G_{\rm chem}$. We further vary $p(s) = 0.25 + d,\ 0.25,\ 0.25,\ 0.25 - d$ by varying $d$. There is a regime in which chemical work is used to specifically produce sequences of high free energy and a regime in which specific production of low free energy sequences is used to drive polymerisation against a chemical load.

FIG. 4. Thermodynamic restrictions in an infinite-length model become kinetic restrictions for longer oligomers. (a) A model of copying of longer oligomers. The oligomer grows sequentially and sequence-specifically on the template, with the copy oligomer separating from the template behind its leading edge [9]. We extend the model by allowing binding and full unbinding for single monomers and full length copies only: (1) $\rightleftharpoons$ (2) and (5) $\rightleftharpoons$ (6). (b) Free-energy profile for creation of both accurate copies and unbiased oligomers for the model in (a). Growth on the template gives a constant slope with a gradient that depends on accuracy; attachment and detachment are steps of arbitrary size. (c) Net flux per unit empty template $\tilde{J}_{\rm tot}$ of oligomer production and (d) the net fraction of error creation for a range of lengths $|s|$. We vary $\Delta G^{\ominus}_{\rm dim}$, with $[M_{\rm tot}] = 1$, $[S_{\rm tot}]$ chosen to give $\Delta G_{\rm chem} = 0$ at $\Delta G^{\ominus}_{\rm dim} = 5$, $p(s) = t(s)$ unbiased and $\Delta G_{\rm disc} = 8$ at $\Delta G_r + \Delta G_w = 0$. Also plotted in (d) is the thermodynamic constraint on accuracy for an infinite-length polymer, set by requiring a non-positive slope of the free-energy profile, $-\epsilon \ln \epsilon - (1 - \epsilon) \ln(1 - \epsilon) \geq \Delta G^{\ominus}_{\rm dim} + \ln 2$, and the actual error rate obtained for an infinite-length copy for these parameters [9]. Short oligomers overcome kinetic barriers to produce copies with $\Delta G^{\ominus}_{\rm dim} > 0$ and $\epsilon$ below the infinite-length limit and thermodynamic constraint.

We perform stochastic simulations for a range of lengths $|s|$, varying $\Delta G^{\ominus}_{\rm dim}$ while keeping all other parameters fixed. We calculate the flux per empty template $\tilde{J}_{\rm tot} = J_{\rm tot} / P_{\rm empty}$ in the steady state.
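As an illustration of how a flux per empty template can be estimated from a stochastic trajectory, the sketch below runs a Gillespie-type simulation on a deliberately small toy system. Everything in it is an assumption made for illustration: the three-state cycle, the rates `f` and `r`, and the choice of state 0 as the "empty template" are hypothetical and are not the oligomer model of Fig. 4, and the source does not specify the simulation scheme in this passage.

```python
import numpy as np

def gillespie_flux_per_empty(f, r, n_steps, seed=0):
    """Simulate a 3-state cycle 0 -> 1 -> 2 -> 0 (forward rate f, backward rate r).

    Returns the net number of 0 -> 1 transitions divided by the time spent in
    state 0, i.e. a net flux J divided by the occupancy P_0 of the 'empty' state.
    """
    rng = np.random.default_rng(seed)
    state = 0
    time_in_empty = 0.0
    net_01 = 0
    total_rate = f + r                      # the same exit rate in every state
    for _ in range(n_steps):
        dwell = rng.exponential(1.0 / total_rate)   # exponential waiting time
        if state == 0:
            time_in_empty += dwell
        forward = rng.random() < f / total_rate     # pick the next reaction
        new_state = (state + 1) % 3 if forward else (state - 1) % 3
        if state == 0 and new_state == 1:
            net_01 += 1
        elif state == 1 and new_state == 0:
            net_01 -= 1
        state = new_state
    return net_01 / time_in_empty

estimate = gillespie_flux_per_empty(f=2.0, r=0.5, n_steps=30000)
print(estimate)
```

For this symmetric cycle the stationary occupancies are all 1/3 and the net flux is (f − r)/3, so the estimate should fluctuate around f − r. Dividing by the time spent in the empty state rather than by the total time is what turns a flux into a flux per empty template.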
We use unbiased $m(s)$ and $p(s)$, set $[M_{\rm tot}] = 1$ and choose $[S_{\rm tot}]$ so that $\Delta G_{\rm chem} = 0$ at $\Delta G^{\ominus}_{\rm dim} = 5$; growth is thermodynamically favourable for all sequences when $\Delta G^{\ominus}_{\rm dim} < 5$. The slope of the free-energy profile for on-template growth, $\Delta G^{\ominus}_{\rm dim} - \ln[M_{\rm tot}]$, is positive for $\Delta G^{\ominus}_{\rm dim} > 0$. For $5 > \Delta G^{\ominus}_{\rm dim} > 0$, on-template polymerisation is thus a kinetic barrier to formation of a thermodynamically favourable product. For short oligomers, large $\tilde{J}_{\rm tot}$ are maintained nonetheless, but as oligomers get longer the kinetics is slowed and both forward and backward contributions to $\tilde{J}_{\rm tot}$ are vanishingly small.

Kinetic barriers not only control overall $\tilde{J}_{\rm tot}$, but also error incorporation. The on-template production of an accurate copy has a more positive slope in its free-energy profile than an unbiased sequence (Fig. 4(b)). For an infinite-length polymer, this fact provides a thermodynamic constraint on accuracy for $0 > \Delta G^{\ominus}_{\rm dim} > -\ln 2$ [9]. We plot the fraction of net incorporated monomers that do not match the template, $\epsilon$, in Fig. 4(d), alongside this thermodynamic constraint for infinite-length polymers and the actual error rate obtained in the absence of attachment and detachment. Short oligomers can overcome kinetic barriers and beat both the thermodynamic bound and the accuracy obtained in the infinite-length limit; longer oligomers approach the limiting behaviour slowly, with significant differences even at length 20.

In this letter we have investigated the copying of finite-length oligomers, with explicit focus on initiation and termination. Copying creates correlations between copy and template sequences, but the mixing of the products with oligomers in the environment means that the information between copy and template sequences is not thermodynamically exploitable [8]. Thus, as we have shown, the overall thermodynamics of the full copy process does not explicitly depend on accuracy. Instead, the surrounding concentrations of oligomers and monomers set the thermodynamic constraints.
Creating outputs that resemble the surrounding oligomers more than the input monomers is costly, and arbitrary accuracy can be free-energetically neutral or even actively favourable if the oligomer baths are biased towards other sequences.

However, accuracy does play a role indirectly. Firstly, mixing with other oligomers is the final step, and therefore its thermodynamic consequences are irrelevant whilst a copy is growing on the template. Whilst attached, the information between copy and template is exploitable, and creating accurate copies has thermodynamic costs [9]. For an infinite-length polymer, these costs set absolute limits on what is possible. For finite-length oligomers, they instead manifest as kinetic barriers; longer oligomers have larger barriers and thus their kinetics converges on the thermodynamic constraints.

Secondly, templates will typically influence their environment. If a template sets its own oligomer environment, $p(s) = q(s)$, $\Delta G_{\rm inf} = D(q\|t)$, which reduces to the entropy difference between $t(s)$ and $q(s)$ if $t(s)$ is unbiased. In this case no information is lost upon mixing and accurate copying incurs a cost; the limits derived in [9] hold exactly. In general, there is no reason to suppose that $p(s) = q(s)$. As in a cell, other templates and differential degradation rates may be relevant in setting $p(s)$. Nonetheless, particularly for longer oligomers, sequences common in $q(s)$ are likely to be over-represented in $p(s)$. If many identical templates are present, then the environmental $p(s)$ will likely be more strongly peaked, and the cost of accuracy higher, than in a system with many distinct templates. Moreover, any template in an environment dominated by the copies of another will experience a relative thermodynamic advantage. This effect would act as a form of "rubber banding" in evolutionary competition among minimal replicators, and favour virus-like templates invading new environments.

[1] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. Watson, Molecular Biology of the Cell, 4th ed. (Garland Science, New York, 2002).
[2] R. Schulman, B. Yurke, and E. Winfree, PROC NAT ACAD SCI USA, 640 (2012).
[3] R. Schulman and E. Winfree, Programmable control of nucleation for algorithmic self-assembly, in International Workshop on DNA-Based Computers (Springer, 2004) p. 319.
[4] D. Braun, MOD PHYS LETT B, 775 (2004).
[5] R. Zhuo, F. Zhou, X. He, R. Sha, N. C. Seeman, and P. M. Chaikin, PROC NAT ACAD SCI USA, 1952 (2019).
[6] D. Sievers and G. Von Kiedrowski, NATURE, 221 (1994).
[7] T. A. Lincoln and G. F. Joyce, SCIENCE, 1229 (2009).
[8] T. E. Ouldridge and P. R. ten Wolde, PHYS REV LETT, 158103 (2017).
[9] J. M. Poulton, P. R. ten Wolde, and T. E. Ouldridge, PROC NAT ACAD SCI USA, 1946 (2019).
[10] C. H. Bennett, BIOSYSTEMS, 85 (1979).
[11] F. Cady and H. Qian, PHYS BIOL, 036011 (2009).
[12] D. Andrieux and P. Gaspard, PROC NAT ACAD SCI USA, 9516 (2008).
[13] P. Sartori and S. Pigolotti, PHYS REV LETT, 188101 (2013).
[14] P. Sartori and S. Pigolotti, PHYS REV X, 041039 (2015).
[15] M. Esposito, K. Lindenberg, and C. Van den Broeck, J STAT MECH-THEORY E, P01008 (2010).
[16] M. Ehrenberg and C. Blomberg, BIOPHYS J, 333 (1980).
[17] M. Johansson, M. Lovmar, and M. Ehrenberg, CURR OPIN MICROBIOL, 141 (2008).
[18] J. J. Hopfield, PROC NAT ACAD SCI USA, 4135 (1974).
[19] A. Deshpande and T. E. Ouldridge, arXiv preprint arXiv:1905.00555 (2019).
[20] T. McGrath, N. S. Jones, P. R. ten Wolde, and T. E. Ouldridge, PHYS REV LETT, 028101 (2017).
[21] J. M. Horowitz, T. Sagawa, and J. M. R. Parrondo, PHYS REV LETT, 010602 (2013).
[22] D. T. Gillespie, J CHEM PHYS, 2340 (1977).

Supplemental Materials: Edge-effects dominate the fundamental thermodynamics of molecular copying for finite-length oligomers

I. FINDING THE FLUXES THROUGH THE NETWORK USING A TRANSITION MATRIX

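Fig. 1 defines a Markov process for the states of a single template of type 1,1. Note that we assume the dimers have a directionality (like biopolymers such as nucleic acids and polypeptides), so that 1,2 is distinct from 2,1. Below, we use "left" to refer to the first site and "right" to the second, for consistency with Fig. 1.

The available states are as follows:

• State 0: the empty template (shown in five different locations in Fig. 1, the four outer edges and the centre).
• State 1: the template with an incorrect monomer (2) on its left side.
• State 2: the template with an incorrect monomer (2) on its right side.
• State 3: the template with a correct monomer (1) on its left side.
• State 4: the template with a correct monomer (1) on its right side.
• State 5: the template with an incorrect monomer (2) on its left side and a correct monomer (1) on its right side.
• State 6: the template with an incorrect monomer (2) on its left side and an incorrect monomer (2) on its right side.
• State 7: the template with a correct monomer (1) on its left side and an incorrect monomer (2) on its right side.
• State 8: the template with a correct monomer (1) on its left side and a correct monomer (1) on its right side.
• State 9: the template with a 2,2 dimer bound.
• State 10: the template with a 1,2 dimer bound.
• State 11: the template with a 1,1 dimer bound.
• State 12: the template with a 2,1 dimer bound.

The dynamics are described by a transition rate matrix $K$, where $K_{xy}$ gives the rate of transitions out of state $x$ and into state $y$ (rows and columns ordered from state 0 to state 12):

$$K = \begin{pmatrix}
-X_0 & [2]k_{\rm on} & [2]k_{\rm on} & [1]k_{\rm on} & [1]k_{\rm on} & 0 & 0 & 0 & 0 & [2,2]k_{\rm on} & [1,2]k_{\rm on} & [1,1]k_{\rm on} & [2,1]k_{\rm on} \\
k_{\rm on}e^{\Delta G_w} & -X_1 & 0 & 0 & 0 & [1]k_{\rm on} & [2]k_{\rm on} & 0 & 0 & 0 & 0 & 0 & 0 \\
k_{\rm on}e^{\Delta G_w} & 0 & -X_2 & 0 & 0 & 0 & [2]k_{\rm on} & [1]k_{\rm on} & 0 & 0 & 0 & 0 & 0 \\
k_{\rm on}e^{\Delta G_r} & 0 & 0 & -X_3 & 0 & 0 & 0 & [2]k_{\rm on} & [1]k_{\rm on} & 0 & 0 & 0 & 0 \\
k_{\rm on}e^{\Delta G_r} & 0 & 0 & 0 & -X_4 & [2]k_{\rm on} & 0 & 0 & [1]k_{\rm on} & 0 & 0 & 0 & 0 \\
0 & k_{\rm on}e^{\Delta G_r} & 0 & 0 & k_{\rm on}e^{\Delta G_w} & -X_5 & 0 & 0 & 0 & 0 & 0 & 0 & k_{\rm cat} \\
0 & k_{\rm on}e^{\Delta G_w} & k_{\rm on}e^{\Delta G_w} & 0 & 0 & 0 & -X_6 & 0 & 0 & k_{\rm cat} & 0 & 0 & 0 \\
0 & 0 & k_{\rm on}e^{\Delta G_r} & k_{\rm on}e^{\Delta G_w} & 0 & 0 & 0 & -X_7 & 0 & 0 & k_{\rm cat} & 0 & 0 \\
0 & 0 & 0 & k_{\rm on}e^{\Delta G_r} & k_{\rm on}e^{\Delta G_r} & 0 & 0 & 0 & -X_8 & 0 & 0 & k_{\rm cat} & 0 \\
k_{\rm on}e^{\ddagger} & 0 & 0 & 0 & 0 & 0 & k_{\rm cat} & 0 & 0 & -X_9 & 0 & 0 & 0 \\
k_{\rm on}e^{\ast} & 0 & 0 & 0 & 0 & 0 & 0 & k_{\rm cat} & 0 & 0 & -X_{10} & 0 & 0 \\
k_{\rm on}e^{\dagger} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & k_{\rm cat} & 0 & 0 & -X_{11} & 0 \\
k_{\rm on}e^{\ast} & 0 & 0 & 0 & 0 & k_{\rm cat} & 0 & 0 & 0 & 0 & 0 & 0 & -X_{12}
\end{pmatrix},$$

where $\ast = \Delta G^{\ominus}_{\rm dim} + \Delta G_r + \Delta G_w$, $\dagger = \Delta G^{\ominus}_{\rm dim} + 2\Delta G_r$ and $\ddagger = \Delta G^{\ominus}_{\rm dim} + 2\Delta G_w$. $X_x$ is the sum over all the other terms in row $x$.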
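Because each diagonal entry is minus the sum of the other entries in its row, every row of $K$ sums to zero, and a stationary distribution is a normalised left null vector of $K$. The following is a minimal sketch on a hypothetical three-state rate matrix. The rates are illustrative only and are not the thirteen-state model above, and the least-squares solve is one convenient option among several.

```python
import numpy as np

def with_diagonal(rates):
    """Return a rate matrix K whose diagonal is minus the off-diagonal row sums."""
    K = np.array(rates, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))
    return K

def steady_state(K):
    """Solve pi K = 0 with sum(pi) = 1 by stacking the normalisation condition."""
    n = K.shape[0]
    A = np.vstack([K.T, np.ones(n)])   # K^T pi^T = 0 together with 1 . pi = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical off-diagonal rates; entry [x][y] is the rate from state x to state y.
K = with_diagonal([[0.0, 2.0, 0.0],
                   [1.0, 0.0, 3.0],
                   [4.0, 0.0, 0.0]])
pi = steady_state(K)
print(pi)
```

The same two helpers would apply unchanged to a larger matrix, provided every state can be reached from every other so that the stationary distribution is unique.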
Now we can solve for the steady state $\pi$ by finding the appropriate left-eigenvector, $\pi K = 0$.

Next we consider the probability of the system creating either a 1,1, 1,2, 2,1 or 2,2 dimer. To do so we treat the possible outcomes as absorbing states and construct a matrix of transition probabilities whose non-zero entries are the rates of $K$ divided by a row normalisation, for example $[1]k_{\rm on}/N_x$, $k_{\rm on}e^{\Delta G_w}/N_x$ and $k_{\rm cat}/N_x$, where $N_x$ normalises each row $x$ to unity.

The upper left quadrant is the sub matrix $M$, quantifying the transitions between non-absorbing states. The bottom right quadrant is the trivial matrix for the absorbing states. The upper right quadrant is the sub matrix $T$, which quantifies transitions into the absorbing states. From here we can find the fundamental matrix $W = (I - M)^{-1}$, which quantifies the number of times the system visits a non-absorbing state on the way to being absorbed, given that it started in a particular state. From here we can calculate $Z = WT$. $Z_{xy}$ gives the probability of eventually reaching absorbing state $12 + y$ given a starting state $x$.

The overall rate in steady state at which the system produces dimers of a given sequence $s$ from monomers is simply $\Phi^{\rm create}_{s} = \pi_0 \sum_{x=1}^{4} K_{0x} Z_{xy}$, with $12 + y$ the absorbing state for that sequence. Other rates for production, degradation and interconversion of dimers can be calculated similarly. For each of the four sequences,

$$\Phi^{\rm create}_{s}/\pi_0 = [2]k_{\rm on} Z_{1,y} + [2]k_{\rm on} Z_{2,y} + [1]k_{\rm on} Z_{3,y} + [1]k_{\rm on} Z_{4,y}, \qquad (S1)$$

$$\Psi^{\rm destroy}_{s}/\pi_0 = 2k_{\rm on} Z_{x,y}, \qquad (S2)$$

$$\Psi^{\rm switch}_{s \to s'}/\pi_0 = 2k_{\rm on} Z_{x,y}, \qquad (S3)$$

where in Eqs. S2 and S3 $x$ is the state reached when the dimer $s$ binds to the empty template and $y$ labels the corresponding absorbing outcome. $\Psi_{x,y}$ is a rate per unit concentration of $x,y$, and $\Phi$ is an absolute rate.

From here we can identify, for example,

$$J(1,1) = \Phi^{\rm create}_{1,1} + [1,2]\Psi^{\rm switch}_{1,2 \to 1,1} + [2,1]\Psi^{\rm switch}_{2,1 \to 1,1} + [2,2]\Psi^{\rm switch}_{2,2 \to 1,1} - [1,1]\left( \Psi^{\rm destroy}_{1,1} + \Psi^{\rm switch}_{1,1 \to 1,2} + \Psi^{\rm switch}_{1,1 \to 2,1} + \Psi^{\rm switch}_{1,1 \to 2,2} \right),$$

and in general, for each of the sequences $s \in \{1,1;\ 1,2;\ 2,1;\ 2,2\}$,

$$J(s) = \Phi^{\rm create}_{s} + \sum_{s' \neq s} [s']\,\Psi^{\rm switch}_{s' \to s} - [s]\left( \Psi^{\rm destroy}_{s} + \sum_{s' \neq s} \Psi^{\rm switch}_{s \to s'} \right). \qquad (S4)$$

II. PERFECT ACCURACY AT EQUILIBRIUM

Fig. S1 shows that when the average template binding free energy of right and wrong monomers is increased at fixed $\Delta G_{\rm disc} = \Delta G_w - \Delta G_r$, the error remains low as the system tends towards equilibrium ($\Delta G_{\rm chem} \to 0^-$ for unbiased $p(s)$, $t(s)$ and $[S_{\rm tot}]/[M_{\rm tot}] = 1$). Indeed, $\epsilon \to 0$ as $\Delta G_{\rm disc} \to \infty$. The error at $\Delta G_{\rm chem} = 0$ is still undefined, but the fluxes separate quickly away from this point (inset of Fig. S1a), which keeps the error low. Because the bonds between copy and template are unstable, the system has a low flux for $\Delta G_{\rm chem} < 0$.

FIG. S1.

The dimer copying model can show perfect accuracy in the equilibrium limit. (a) We plot the net production rate $J_s$ against chemical driving $-\Delta G_{\rm chem}$ for each of the four dimers 1,1, 1,2, 2,1 and 2,2 relative to a template of sequence 1,1, with $\Delta G_{\rm disc} = \Delta G_w - \Delta G_r = 2$ and $\Delta G_r + \Delta G_w = 5$. All oligomers have the same (fixed) concentration $[S_{\rm tot}]/4$, and all monomers have the same (fixed) concentration $[M_{\rm tot}]/2$. All $J_s$ pass through zero at the equilibrium point $\Delta G_{\rm chem} = 0$ but separate quickly (inset) at non-zero driving. Here accurate copies are preferentially produced, but all fluxes are very low. (b) Error fraction $\epsilon$ at which incorrect type 2 monomers are incorporated into dimers, against $-\Delta G_{\rm chem}$, for various $\Delta G_{\rm disc}$ with $\Delta G_r + \Delta G_w = 5$. The system tends towards perfect accuracy, $\epsilon \to 0$, as $\Delta G_{\rm disc} \to \infty$, even as the system tends to the equilibrium point $\Delta G_{\rm chem} \to 0$.

III. MODEL OF COPY PRODUCTION FOR OLIGOMERS OF LENGTH $|s| > 2$

The model, schematically illustrated in Fig. 4(a), is adapted from the temporary thermodynamic discrimination model in Ref. [9]. We first describe the model of Ref. [9], steps (2)-(5) in Fig. 4(a), before outlining the extension considered here.

We consider a copy oligomer $\mathbf{s} = s_1, ..., s_l$, made up of a series of sub-units or monomers $s_x$, growing with respect to a template $\mathbf{n} = n_1, ..., n_L$ ($l \leq L$). Inspired by transcription and translation, we consider a copy that detaches from the template as it grows. We consider whole steps in which a single monomer is added or removed, each encompassing many individual chemical sub-steps. As illustrated in Fig. 4(a), after each step there is only a single inter-polymer bond at position $l$, between $s_l$ and $n_l$. As a new monomer joins the copy at position $l + 1$, the bond at position $l$ is broken.

As in the dimerisation model considered in the main text, we shall consider a template polymer $\mathbf{n}$ made entirely of monomers of type 1. Given the assumed symmetry between the interactions of the two monomer types, and the equal concentrations of the monomer baths used throughout this work, the results apply equally well to any template sequence. Monomers of type 1 can simply be interpreted as correct matches and monomers of type 2 as incorrect matches for any template sequence $\mathbf{n}$.

Having defined the model's state space, we now consider state free energies. By analogy with the dimerisation model of the main text, we define $\Delta G^{\ominus}_{\rm dim}$ as the free-energy change of adding a specific monomer to the end of the copy oligomer at standard concentration. The environment contains baths of monomers; a monomer of type $s$ has a constant concentration $[s]$ relative to the standard concentration.
The chemical free-energy change for the transition between any specific sequence $s_1, ..., s_l$ and any specific sequence $s_1, ..., s_{l+1}$, ignoring any contribution from interactions with the template, is then $\Delta G^{\ominus}_{\rm dim} - \ln[s_{l+1}]$.

We then consider the effect of specific interactions with the template. As in the main text, we define $\Delta G_{r/w}$ as the binding free energies for matched/mismatched monomers and the template. Overall, each forward step makes and breaks one copy-template bond. There are four possibilities for forward steps: either adding 1 or 2 at position $l + 1$ to a copy with $s_l = 1$; or adding 1 or 2 at position $l + 1$ to a copy with $s_l = 2$. The first and last of these options preserve the same interaction with the template, so the total free-energy change for monomer addition is $\Delta G^{\ominus}_{\rm dim} - \ln[s_{l+1}]$. For the second case a correct bond is broken and an incorrect bond added, implying a free-energy change of $\Delta G^{\ominus}_{\rm dim} - \Delta G_r + \Delta G_w - \ln[2]$. Conversely, for the third case, an incorrect bond is broken and a correct bond added, giving a free-energy change of $\Delta G^{\ominus}_{\rm dim} + \Delta G_r - \Delta G_w - \ln[1]$.

These free energies constrain the kinetics of transitions between the various states, but are compatible with a range of kinetic models. In the temporary thermodynamic discrimination model of Ref. [9], all forward steps are assumed to occur with the same rate, and sequence-based discrimination occurs in the backward step. For simplicity, in this case we further assume that each step can be modelled as a single transition with an exponential waiting time, yielding:

\begin{align}
\nu^{+}_{1,1} &= [1]\,k, \tag{S5} \\
\nu^{+}_{1,2} &= [2]\,k, \tag{S6} \\
\nu^{+}_{2,1} &= [1]\,k, \tag{S7} \\
\nu^{+}_{2,2} &= [2]\,k, \tag{S8} \\
\nu^{-}_{1,1} &= k\,e^{\Delta G^{\ominus}_{\rm dim}}, \tag{S9} \\
\nu^{-}_{1,2} &= k\,e^{\Delta G^{\ominus}_{\rm dim} - \Delta G_r + \Delta G_w}, \tag{S10} \\
\nu^{-}_{2,1} &= k\,e^{\Delta G^{\ominus}_{\rm dim} - \Delta G_w + \Delta G_r}, \tag{S11} \\
\nu^{-}_{2,2} &= k\,e^{\Delta G^{\ominus}_{\rm dim}}. \tag{S12}
\end{align}

Here, $k$ is a rate constant that sets the overall timescale (we take $k = 1$ in reduced units without loss of generality).
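As a consistency check on Eqs. (S5) to (S12), the following minimal Python sketch verifies numerically that each forward/backward rate pair satisfies $\nu^{+}_{i,j}/\nu^{-}_{i,j} = e^{-\Delta G}$, where $\Delta G$ is the step free-energy change quoted above. The function names and the numerical parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def step_free_energy(prev, new, dG_dim, dG_r, dG_w, conc):
    """Free-energy change for adding monomer `new` to a copy whose tip is `prev`.

    Types: 1 = match, 2 = mismatch. The bond of `prev` to the template is
    broken and the bond of `new` is formed.
    """
    bind = {1: dG_r, 2: dG_w}  # template binding free energy by monomer type
    return dG_dim - bind[prev] + bind[new] - math.log(conc[new])

def forward_rate(new, conc, k=1.0):
    """nu^+_{prev,new} = [new] k, independent of the tip (Eqs. S5-S8)."""
    return conc[new] * k

def backward_rate(prev, new, dG_dim, dG_r, dG_w, k=1.0):
    """nu^-_{prev,new}: removal of tip monomer `new`, exposing `prev` (Eqs. S9-S12)."""
    bind = {1: dG_r, 2: dG_w}
    return k * math.exp(dG_dim - bind[prev] + bind[new])

if __name__ == "__main__":
    # Illustrative (assumed) parameter values in units of k_B T.
    params = dict(dG_dim=-1.0, dG_r=-4.0, dG_w=-2.0)
    conc = {1: 0.5, 2: 0.5}
    for prev in (1, 2):
        for new in (1, 2):
            ratio = forward_rate(new, conc) / backward_rate(prev, new, **params)
            dG = step_free_energy(prev, new, conc=conc, **params)
            # Detailed balance: rate ratio equals exp(-dG) for every step type.
            assert math.isclose(ratio, math.exp(-dG))
            print(prev, new, ratio)
```

Because all forward rates depend only on the monomer concentration, any sequence dependence of the ratio enters through the backward rate, which is the sense in which discrimination occurs in the backward step.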
In Eqs. (S5) to (S12), $\nu^{+}_{i,j}$ is the rate for adding a monomer of type $j$ to a copy with a monomer of type $i$ at the leading edge, and $\nu^{-}_{i,j}$ is the rate of the reverse process.

To allow for initiation and termination of copying, we extend the model of Ref. [9] with transitions corresponding to steps (1) and (6) in Fig. 4(a). Unbinding transitions, whether of the initial monomer or the final monomer of a complete copy, are parameterised by:

\begin{align}
\nu^{\rm off}_{1} &= k\,e^{\Delta G_r}, \tag{S13} \\
\nu^{\rm off}_{2} &= k\,e^{\Delta G_w}. \tag{S14}
\end{align}

Binding of either a monomer of type $s$ to the initial site, or an oligomer of sequence $\mathbf{s}$ to the final site, is assumed to have a rate

\begin{align}
\nu^{\rm on}_{s} &= k\,[s], \tag{S15} \\
\nu^{\rm on}_{\mathbf{s}} &= k\,[\mathbf{s}]. \tag{S16}
\end{align}

Note that, for simplicity, we ignore the (challenging) question of how partial fragments are prevented from binding to or detaching from the template, and the possibility of multiple copies being bound to the template at once.

We simulate the system repeatedly using a Gillespie simulation [22], with the system initiated with either a monomer $s$ sampled from $t(s)$ seeded at the first site, or an oligomer $\mathbf{s}$ sampled from $p(\mathbf{s})$ attached to the final position $L$ of the template. The simulation is allowed to run, and terminates either when a complete oligomer detaches from the final site of the template, or when a single monomer detaches from the first site of the template.

By running many simulations it is possible to calculate the probability of creating a full-length oligomer given that a single monomer binds to the template, $P_{\rm create}$, and the probability of destroying an oligomer given that one binds to the final site on the template, $P_{\rm destroy}$. Finally, $P_{\rm transform} = 1 - P_{\rm destroy}$ is the probability of a full oligomer detaching from the template given that an oligomer previously attached at the final site. This oligomer will have had some subset of its initial monomers transformed through removal of old and addition of new monomers.

We set the oligomer concentration

$$
[S_{\rm tot}] = [M_{\rm tot}]^{n}\, e^{(\Delta G^{\ominus}_{\rm dim} - Z)(n - 1)}. \tag{S17}
$$

Here $Z$ sets the equilibrium position; for $Z = 0$, the equilibrium is at $\Delta G^{\ominus}_{\rm dim} = 0$, for $Z = 3$, the equilibrium is at $\Delta G^{\ominus}_{\rm dim} = 3$, and so on. In our model, $Z = 5$.

The rate of oligomer creation per unit time in which the template is in an empty state is given by $\tilde{k}_{\rm create} = [M_{\rm tot}]\, k P_{\rm create}$. The rate of oligomer destruction per unit empty template is $\tilde{k}_{\rm destroy} = k [S_{\rm tot}] P_{\rm destroy}$. The net flux per empty template, $\tilde{J}_{\rm tot} = \tilde{k}_{\rm create} - \tilde{k}_{\rm destroy}$, is plotted in Fig. 4c.

We can also calculate the average fraction of the monomers incorporated during a creation event that are mismatches with the template, $\epsilon_{\rm create}$. Similarly, the average fraction of incorrect monomers destroyed during a destruction event, $\epsilon_{\rm destroy}$, can be extracted from simulations. Finally, $\epsilon_{\rm transform}$ is defined as the difference in average error density between the sequences at the end and start of a transformation event: $\epsilon_{\rm transform} = \epsilon_{\rm final} - \epsilon_{\rm initial}$. From these quantities we calculate the overall error rate as the proportion of the net number of monomers added to oligomers that are incorrect matches to the template:

$$
\epsilon = \frac{\tilde{k}_{\rm create}\,\epsilon_{\rm create} - \tilde{k}_{\rm destroy}\,\epsilon_{\rm destroy} + k [S_{\rm tot}] P_{\rm transform}\,\epsilon_{\rm transform}}{\tilde{J}_{\rm tot}},
$$
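The simulation procedure described above can be sketched in Python for creation events only, estimating $P_{\rm create}$ and $\epsilon_{\rm create}$. This is a minimal illustration under stated assumptions, not the authors' code: the function names `simulate_creation` and `estimate` are hypothetical, the initial monomer is assumed to be chosen with weight proportional to its concentration, the template is all type 1 (so type 2 monomers count as errors), and $L \geq 2$. Destruction and transformation events would be obtained analogously by starting from a full-length oligomer bound at site $L$.

```python
import math
import random

def simulate_creation(L, dG_dim, dG_r, dG_w, conc, k=1.0, rng=random):
    """One Gillespie run started from a single monomer bound at template site 1.

    Returns (created, copy, t): whether a full-length copy detached, the copy
    sequence at detachment, and the elapsed time.
    """
    bind = {1: dG_r, 2: dG_w}  # template binding free energy by monomer type
    types = (1, 2)
    # Assumption: initial monomer type drawn with weight [s] (binding rate k[s]).
    copy = rng.choices(types, weights=[conc[s] for s in types])
    t = 0.0
    while True:
        tip = copy[-1]
        events = []  # list of (rate, (action, monomer type or None))
        if len(copy) < L:  # growth, Eqs. (S5)-(S8)
            events += [(conc[s] * k, ("add", s)) for s in types]
        if len(copy) > 1:  # removal of the tip monomer, Eqs. (S9)-(S12)
            prev = copy[-2]
            rate = k * math.exp(dG_dim - bind[prev] + bind[tip])
            events.append((rate, ("remove", None)))
        if len(copy) in (1, L):  # unbinding from first or final site, Eqs. (S13)-(S14)
            events.append((k * math.exp(bind[tip]), ("detach", None)))
        total = sum(rate for rate, _ in events)
        t += rng.expovariate(total)  # exponential waiting time
        action, s = rng.choices([a for _, a in events],
                                weights=[r for r, _ in events])[0]
        if action == "add":
            copy.append(s)
        elif action == "remove":
            copy.pop()
        else:
            return len(copy) == L, copy, t

def estimate(n_runs, L, dG_dim, dG_r, dG_w, conc, seed=0):
    """Estimate P_create and the mismatch fraction of created copies."""
    rng = random.Random(seed)
    n_created = n_wrong = 0
    for _ in range(n_runs):
        created, copy, _t = simulate_creation(L, dG_dim, dG_r, dG_w, conc, rng=rng)
        if created:
            n_created += 1
            n_wrong += sum(1 for s in copy if s == 2)
    p_create = n_created / n_runs
    eps_create = n_wrong / (n_created * L) if n_created else float("nan")
    return p_create, eps_create

if __name__ == "__main__":
    # Illustrative (assumed) parameters in units of k_B T, with k = 1.
    print(estimate(5000, L=5, dG_dim=-1.0, dG_r=-4.0, dG_w=-2.0,
                   conc={1: 0.5, 2: 0.5}))
```

Given such estimates, the creation rate per empty template follows as $\tilde{k}_{\rm create} = [M_{\rm tot}]\, k P_{\rm create}$; the full error expression additionally requires $P_{\rm destroy}$, $\epsilon_{\rm destroy}$ and $\epsilon_{\rm transform}$ from runs initiated with a bound oligomer.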