The Tension on dsDNA Bound to ssDNA/RecA Filaments May Play an Important Role in Driving Efficient and Accurate Homology Recognition and Strand Exchange
Julea Vlassakis, Efraim Feinstein, Darren Yang, Antoine Tilloy, Dominic Weiller, Julian Kates-Harbeck, Vincent Coljee, Mara Prentiss
Harvard University, Department of Physics, Cambridge, MA 02138

Abstract
It is well known that during homology recognition and strand exchange the double-stranded DNA (dsDNA) in DNA/RecA filaments is highly extended, but the functional role of the extension has been unclear. We present an analytical model that calculates the distribution of tension in the extended dsDNA during strand exchange. The model suggests that the binding of additional dsDNA base pairs to the DNA/RecA filament alters the tension in dsDNA that was already bound to the filament, resulting in a non-linear increase in the mechanical energy as a function of the number of bound base pairs. This collective mechanical response may promote homology stringency and underlie unexplained experimental results.
PACS numbers: 87.15.ad, 87.15.La

I. INTRODUCTION

Sexual reproduction and DNA damage repair often include homologous recombination facilitated by RecA family proteins [1, 2]. In homologous recombination, a single-stranded DNA molecule (ssDNA) locates and pairs with a sequence-matched double-stranded DNA molecule (dsDNA). In the first step of the process, the incoming ssDNA binds to site I in RecA monomers, resulting in a helical ssDNA-RecA filament with 3 base pairs/monomer and ∼ in vitro homology recognition and strand exchange can occur without ATP hydrolysis [6–8]. Thus, each step in the homology search/strand exchange process is fully reversible.

During the homology search and strand exchange process, dsDNA bound to RecA is extended significantly beyond the B-form length [9]. Recent theoretical work proposed that the free energy penalty associated with extension may promote rapid unbinding of non-homologous sequences, but that work assumed the free energy penalty to be a linear function of the number of bound triplets and did not consider kinetic trapping due to near homologs [10]. Earlier work had also suggested that the dsDNA extension promotes base flipping [11] and reduces kinetic trapping, since the lattice mismatch between extended dsDNA and B-form dsDNA presents a steric barrier to interactions between unbound dsDNA and bound ssDNA; this implies that the dsDNA must bind to the filament in order to interact with the ssDNA [12]. These studies assumed that the dsDNA in the DNA/RecA filament is uniformly extended; however, the X-ray crystal structures of the dsDNA in the final post-strand exchange state and of the ssDNA in the homology searching state both consist of base pair triplets in a nearly B-form conformation separated by large rises, as illustrated in Figure 1a. The rises occur at the interfaces between adjoining RecA monomers [3], as illustrated in Fig. 2.
The functional role of the non-uniform extension has been unclear.

In this paper we present a simple model that calculates the extension of each base pair triplet in a dsDNA. Using this model, we calculate the free energy changes associated with progression through the homology recognition/strand exchange process. The results of that calculation suggest a resolution to the long-standing question of why strand exchange is free energetically favorable even though the Watson-Crick pairing in the initial and final states is the same and the DNA/protein contacts in the ssDNA-RecA filament and the final post-strand exchange state are nearly the same [3]. The model also makes several significant qualitative predictions, the most important being that the collective behavior of the triplets, which arises from their attachment to the phosphate backbones, leads to a free energy that is a non-linear function of the number of consecutive bound triplets. As a result of this non-linearity, the total binding energy has a minimum as a function of the number of bound triplets in a given conformation. After that minimum is reached, adding more triplets in that conformation becomes free energetically unfavorable.

Such a change in sign of the binding energy as a function of the number of bound triplets can never occur in a theory where the energy is a purely linear function of the number of bound triplets, since the binding of any base pair anywhere in the system is then equally likely regardless of the state of any of the other triplets in the system. In a system with a binding energy that is a linear function of the number of correctly paired bound triplets, if homologous triplets can initially bind to the system, then additional homologous triplets will always continue to bind. Thus, binding will readily progress across a non-homologous triplet. As we will discuss in detail in this work, in a system with a linear energy and more than ∼ ∼ ∼ ∼15 bp that initially bind to the filament are contiguous and homologous. If the initial ∼15 bp do not contain ∼ ∼ ∼15 bp do include ∼ ∼18 contiguous bp are bound to the filament in the metastable intermediate state. If all of the bp are contiguous and homologous, the system can make a transition to the final post-strand exchange state. Otherwise, the long regions of accidental homology will slowly reverse strand exchange and unbind. Given the statistics of bacterial genomes, passing the homology requirement for the second checkpoint would guarantee that the correct match had been found. These predictions are in good agreement with known experimental results that measure the stability of strand exchange products as a function of the number of contiguous bound base pairs [15].

In this work, we will not attempt to optimize the parameters of the model in order to provide rapid and accurate homology recognition. Rather, we will consider why homology recognition systems in which the energies are a linear function of the number of bound base pairs can provide either rapid unbinding of non-homologs or stable binding of complete homologs, but not both, if the number of binding sites in the system is > ∼ >

A. General Issues in Self-Assembly Based on the Pairing of Arrays of Matching Binding Sites
In efficient self-assembly/recognition systems that create correct assemblies by matching linear arrays of binding sites, correctly paired arrays of binding sites must remain stably bound, whereas incorrectly paired arrays must rapidly unbind even if the incorrect pairing contains only a single mismatched binding site. For a system in thermodynamic equilibrium, the populations in different binding configurations are determined simply by their binding energies. Thus, in a system where every array consists of $N$ binding sites, if $U(m,N)$ is the binding energy when $m$ of the binding sites are correctly paired, accurate recognition requires that $\exp[-(U(m,N) - U(N,N))/(kT)] \ll 1$ for all $m < N$, where $k$ is Boltzmann's constant and $T$ is the absolute temperature. Furthermore, in a system at temperature $T$, the requirement that mismatched pairings rapidly unbind implies that $U(m,N) > -kT$ for all $m < N$, whereas if the correctly bound arrays are to remain stably bound, $U(N,N)$ must be $\ll -kT$.

If the energy is a linear function of the number of bound sites and a mismatch contributes zero free energy, then $U(N,N) = -N\epsilon kT$ and $U(N-1,N) = -(N-1)\epsilon kT$, so the condition $\exp[-(U(N-1,N) - U(N,N))/(kT)] = e^{-\epsilon} \ll 1$ requires $\epsilon \gg 1$, which also implies that the homolog will remain stably bound. For simplicity consider $\epsilon = 3$, which implies $U(N,N) = -N\epsilon kT = -3NkT$ and a specificity ratio $e^{-3} \approx 1/20$. Substitution into the requirement for the unbinding of a near homolog requires that $U(N-1,N) = -(3N-3)kT > -kT \Rightarrow N < 4/3$, so only $N = 1$ satisfies both requirements. The requirements become more stringent if the specificity ratio is stricter than 1/20. As we will discuss below, accurate homology recognition in a bacterial genome requires accurate recognition over a length of more than 12 bp, which implies $N >$ … in vitro is known to proceed without an irreversible step [6–8]. In this work, we will consider the non-linearities in the free energy as a function of the number of contiguous bound base pairs that arise from the differential extension of dsDNA bound to RecA, and how those non-linearities can promote rapid and accurate homology recognition without requiring irreversibility, by making transitions between successive bound states contingent on the state and relative position of the bound base pairs.
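The arithmetic of the linear-energy argument can be checked in a few lines. The sketch below is only an illustration of the linear model defined above (energies in units of kT, a mismatch contributing zero); the function name `linear_model_limits` is hypothetical and not part of the original work.

```python
import math


def linear_model_limits(eps):
    """Linear binding model in units of kT: U(m, N) = -m * eps.

    m is the number of correctly paired sites; a mismatch contributes zero.
    Returns (specificity, n_max) where
      specificity = exp[-(U(N-1, N) - U(N, N))] = exp(-eps), the population
                    of a single-mismatch pairing relative to the perfect match;
      n_max       = largest N for which that single-mismatch pairing still
                    unbinds readily, i.e. U(N-1, N) = -(N - 1) * eps > -1.
    """
    specificity = math.exp(-eps)
    # -(N - 1) * eps > -1  <=>  N < 1 + 1/eps; keep the largest such integer.
    n_max = math.ceil(1 + 1 / eps) - 1
    return specificity, n_max


for eps in (0.5, 1.0, 3.0):
    spec, n_max = linear_model_limits(eps)
    print(f"eps = {eps}: specificity = {spec:.3f}, largest N = {n_max}")
```

For ε = 3 the single-mismatch population is about 5% of the perfect match (the 1/20 ratio), yet only N = 1 satisfies the unbinding condition; lowering ε admits more sites only at the cost of specificity.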
B. Homology Recognition and Strand Exchange are Likely to Proceed in Units of Base Pair Triplets
Previous experimental results suggest that homology recognition occurs via base flipping of the complementary strand bases between their initial pairing with the outgoing strand and their final pairing with the incoming strand [5, 17, 18]. Experimental results have already shown that strand exchange does progress in triplets [19]. We propose that within each triplet the nearly B-form structure preserves sufficient stacking to allow homology discrimination to exploit the energy difference between the Watson-Crick pairing of homologs and heterologs, while the large rises between triplets result in mechanical stress that plays several important functional roles in homology recognition and strand exchange. Furthermore, if the stacking makes it free energetically favorable for the triplets to flip as a group, the loss of Watson-Crick pairing due to a single mismatch in a triplet may make strand exchange unfavorable for the entire triplet, as would be the case if Watson-Crick pairing alone determined the free energy of the strand exchanged state. If a single mismatch makes strand exchange of a triplet unfavorable, then testing in triplets implies that in a random sequence the probability of an accidental match is 1/64, whereas if the search were done by comparing single base pairs the probability of an accidental match would be 1/4. Homology stringency and searching speed would be greatly reduced if the search were not done in triplets. Once a sufficient number of base pair triplets have undergone strand exchange, the complementary strand backbone relocates to the position shown in the X-ray structure of the final state [5, 18], as illustrated in Figures 1c and 2.
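The 1/64 versus 1/4 comparison is the chance that random bases agree with a test sequence. The sketch below assumes independent, equally likely bases (real genomes deviate from this) and treats a triplet as passing only if all three of its bases match; the function name is hypothetical.

```python
from fractions import Fraction


def accidental_match_probability(n_bp):
    """Probability that n_bp bases of a random sequence (A, C, G, T independent
    and equally likely) all match a given test sequence: (1/4)**n_bp."""
    return Fraction(1, 4) ** n_bp


single_base = accidental_match_probability(1)   # base-by-base comparison
triplet = accidental_match_probability(3)       # all-or-none triplet test
two_triplets = accidental_match_probability(6)  # two contiguous triplets
print(single_base, triplet, two_triplets)       # 1/4 1/64 1/4096
```

Each additional contiguous triplet that must match divides the accidental-match probability by 64.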
C. General Model for dsDNA Bound to a RecA Filament
Certain underlying assumptions and qualitative features are important to the model proposed in this paper:
1. in the absence of hydrolysis, the extension of the protein filament is unaffected by dsDNA binding [20];
2. homology recognition and strand exchange occur in quantized units of base pair triplets [3, 19];
3. the incoming and outgoing strands consist of nearly B-form triplets separated by large rises, where the strand is bound to the filament by strong charged interactions between the backbone phosphates and positive residues in the protein [3, 18, 21];
4. the complementary strand is bound to the filament dominantly via Watson-Crick pairing, resulting in large stress on the base pairs unless the dsDNA is in the final post-strand exchange state, where the L1 and L2 loops provide significant mechanical support [3, 22].

The model utilizes work presented by deGennes that calculated the force required to shear dsDNA [23]. We extend deGennes' model to the triplet structures in the initial, intermediate, and final dsDNA conformations. In the model the actual three-dimensional helical structure is converted into a one-dimensional system. In this simple one-dimensional model, $L_R$ is the rise between the triplets in the incoming and outgoing strands, and a single variable, $\gamma$, characterizes the equilibrium spacing between phosphates, $\gamma L_R$, when the complementary strand is in a particular state. Thus, for a particular state, the difference between the equilibrium spacing of the complementary strand and the spacing of the strands bound directly to the protein is $(1-\gamma)L_R$.

D. Predicted Extension and Energy

As shown in Fig. 3, the extensions of rises between base pair triplets in a strand bound directly to the protein are given by $v_{N,i}$, and for a system with $N$ triplets bound to RecA,
$$v_{N,i} = i L_R. \tag{1}$$
The $u_{N,i}$ specify the extensions of the rises in the complementary strand.
At equilibrium, the net force on each $u_{N,i}$ must be zero; therefore, for the interior triplets $j = 2, \dots, N-1$,
$$Q\left[(u_{N,j+1} - u_{N,j} - \gamma L_R) - (u_{N,j} - u_{N,j-1} - \gamma L_R)\right] - R\,(u_{N,j} - v_{N,j}) = 0, \tag{2}$$
where $R$ and $Q$ are the spring constants for the base pairs and the backbones, respectively. These values for $R$ and $Q$ may be substantially different from those for individual dsDNA base pairs when the dsDNA is not bound to RecA, because of the interactions between the charged phosphates and the protein and because the dsDNA is grouped in triplets where the stacking between triplets is strongly disrupted; however, it is still likely that $R \ll Q$, as it is in naked dsDNA, since the interaction between the bases on opposite strands is significantly weaker than the interaction between the phosphates in the backbone of the same strand.

The boundary condition on the last triplet, $u_{N,N}$, requires that
$$Q\,(u_{N,N} - u_{N,N-1} - \gamma L_R) + R\,(u_{N,N} - v_{N,N}) = 0, \tag{3}$$
with the corresponding condition at the first triplet by symmetry. The values of $u_{N,i}$ can be found using Equations 1-3. The angle between the base pairs and the DNA helical axis is shown as $\theta_{bp}$ in Fig. 3.

In the continuous limit, where the discrete subscript $i$ is replaced by a continuous variable $x$ measured from the center of the bound region (so that $v_{N,x} = x$), the equations have the analytical solution
$$u_{N,x} = x + A_N \sinh(\chi x / L_R), \tag{4}$$
where $\chi^{-1}$, with $\chi = \sqrt{R/(2Q)}$, is the deGennes length for RecA-bound dsDNA, and the constant $A_N$ is found from the boundary condition at the end of the filament, $x = N L_R/2$, yielding
$$A_N\left[R \sinh(\chi x/L_R) + Q\chi \cosh(\chi x/L_R)\right] = Q(\gamma - 1)L_R. \tag{5}$$
In the limit where $1/(2\chi) \gg 1$,
$$A_N = \frac{(\gamma - 1)L_R}{\chi\cosh(\chi N/2)}. \tag{6}$$

These assumptions and features lead to a nuanced picture of the distribution of tension during strand exchange. The lattice mismatch between the complementary strand and its pairing partners is largest at the ends of the filament, as shown in Fig. 4; therefore, the base pair tension is largest at the ends of the filament. Furthermore, the lattice mismatch at the ends increases significantly with the number of bound triplets, as shown in Figs. 4 and 5.
The angle $\theta_{bp}$ is greatest at the ends of the molecule. When fewer than ∼30 bp are bound to the filament, the tension on the base pairs at the ends increases rapidly as more RecA-bound triplets are added, as shown in Figs. 4 and 5; however, $\theta_{bp}$ and the tension on the base pairs near the center decrease as more triplets are added. In contrast with the tension on the base pairs, which is largest at the ends of the complementary strand, the tension on the rises in the complementary strand is largest at the center and smallest at the ends. These general qualitative features are not sensitive to $R$, $Q$ or $\gamma$ as long as $R \ll Q$.

Given the values of $u_{N,i}$, the mechanical energy of the system, in which each spring contributes a term of the form $\frac{1}{2}k(x - x_0)^2$, may be calculated from:
$$E_{mech}(N) = \frac{1}{2}\left[\sum_{i=1}^{N} R\,(u_{N,i} - v_{N,i})^2 + \sum_{i=2}^{N} Q\,(u_{N,i} - u_{N,i-1} - \gamma L_R)^2\right]. \tag{7}$$
When $R \ll Q$ this expression simplifies to:
$$E_{mech}(2) = 2\left[\frac{1}{2}R\left(\frac{L_R(1-\gamma)}{2}\right)^2\right], \tag{8}$$
$$E_{mech}(3) = 2\left[\frac{1}{2}R\left(L_R(1-\gamma)\right)^2\right]. \tag{9}$$
These energy terms represent the stress on the base pairs at the ends of the molecule due to the lattice mismatch. When $R \ll Q$, for moderate numbers of bound base pairs, this energy is stored primarily in the extended base pairs. This idea is particularly important because it implies that an increase in the extension of the rises in the complementary strand can reduce the free energy by reducing the tension on the base pairs, even if the increase in extension of the rises requires some energy. Similarly, in the RecA structure the transition from the intermediate state to the final state may reduce the tension on the base pairs, because the interactions between the base pairs and the L1 and L2 loops may increase the equilibrium extension of the rises in the complementary strand.

Using the continuous limit allows us to generate scaling laws for the extension as a function of the total number of bound base pairs.
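Before taking the continuous limit, the discrete model can be solved directly for a few triplets, which gives a convenient check on Equations 8 and 9 and on where the base-pair stress concentrates. The sketch below minimizes the energy of Equation 7: each base pair is a spring of stiffness R pulling $u_i$ back toward $v_i = iL_R$, and each complementary-strand rise is a spring of stiffness Q with rest length $\gamma L_R$. Setting the gradient to zero gives a tridiagonal linear system, solved here with the Thomas algorithm (our choice, not necessarily the authors'). The function names are hypothetical, the parameter values are illustrative rather than fitted, and `total_energy` adds the constant per-triplet term $N E_{bind}$ that the text introduces as Equation 13.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def equilibrium(N, R, Q, gamma, L_R=1.0):
    """Complementary-strand positions u minimizing Eq. 7 for N bound triplets.

    v[i] = i * L_R are the fixed positions on the protein-bound strand (Eq. 1).
    """
    v = [i * L_R for i in range(1, N + 1)]
    lower = [0.0] + [-Q] * (N - 1)
    upper = [-Q] * (N - 1) + [0.0]
    # Each triplet couples to its base-pair spring plus one Q spring per
    # backbone neighbour (one at each end, two in the interior).
    diag = [R + Q * ((i > 0) + (i < N - 1)) for i in range(N)]
    # The rest length gamma * L_R cancels in the interior and enters only
    # at the two end triplets.
    rhs = [R * v[i] + Q * gamma * L_R * ((i > 0) - (i < N - 1)) for i in range(N)]
    return solve_tridiagonal(lower, diag, upper, rhs), v


def mechanical_energy(u, v, R, Q, gamma, L_R=1.0):
    """Eq. 7: base-pair spring energy plus backbone (rise) spring energy."""
    bp = sum(0.5 * R * (ui - vi) ** 2 for ui, vi in zip(u, v))
    bb = sum(0.5 * Q * (u[i] - u[i - 1] - gamma * L_R) ** 2 for i in range(1, len(u)))
    return bp + bb


def total_energy(N, R, Q, gamma, E_bind, L_R=1.0):
    """E_mech(N) + N * E_bind, with E_bind < 0 per bound triplet."""
    u, v = equilibrium(N, R, Q, gamma, L_R)
    return mechanical_energy(u, v, R, Q, gamma, L_R) + N * E_bind


# Hypothetical, illustrative parameters (not fitted): R << Q, gamma = 0.5,
# L_R = 1 and a constant binding energy E_bind = -0.2 per triplet.
R, Q, GAMMA, E_BIND = 1.0, 1e6, 0.5, -0.2
for N in range(1, 7):
    print(N, round(total_energy(N, R, Q, GAMMA, E_BIND), 4))
```

With R ≪ Q the computed mechanical energies for N = 2 and N = 3 approach Equations 8 and 9 (0.0625 and 0.25 in these units), the base-pair stretch $|u_i - v_i|$ is largest at the two ends, and with the illustrative $E_{bind} = -0.2$ the total energy is lowest at N = 3 and rises thereafter, the kind of minimum described in the Introduction.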
In the continuous limit, the non-linear contribution to the free energy is given by
$$E_{non\text{-}linear}(N) = \frac{1}{2}\,\frac{R\,(1-\gamma)^2 L_R^2}{\chi^2\cosh^2(\chi N/2)} \sum_{i=-N/2}^{N/2} \sinh^2(\chi i). \tag{10}$$
Thus, when $\chi \ll 1$, so that the sum can be replaced by an integral,
$$E_{non\text{-}linear}(N) \propto (1-\gamma)^2 L_R^2\,\frac{\sinh(\chi N) - \chi N}{\chi^3\cosh^2(\chi N/2)}. \tag{11}$$
In the limit where the number of bound triplets is much less than the deGennes length, so that $\chi N \ll 1$,
$$E_{non\text{-}linear}(N) \propto (1-\gamma)^2 L_R^2\,N^3. \tag{12}$$
Thus, when $N$ is small the energy increases as the cube of the number of bound triplets, consistent with the exact results for the discrete case given in Equations 7, 8 and 9. In contrast, when $N$ is larger than the deGennes length, the non-linear energy term approaches an $N$-independent constant and the mechanical energy increases linearly with increasing $N$. In this limit, the base pairs at the center of the filament are no longer under tension. Thus, adding a triplet to the end of the filament effectively adds another triplet to the unstressed center rather than increasing the stress on all of the bound triplets.

The total energy of the system includes the mechanical energy calculated above and the non-mechanical binding energy per RecA monomer, $E_{bind}$. Assuming that the free energy of an unbound dsDNA is zero and that the free energy gain upon binding a triplet is independent of $N$ and $i$, the non-mechanical contribution to the binding energy for $N$ triplets is $N E_{bind}$. When the first RecA monomer binds, $E_{total}[1] = E_{bind}$, which is a constant negative value. In contrast, when $N > 1$, the stress on the molecule yields a total energy of
$$E_{total}[N] = E_{mech}[N] + N E_{bind}, \tag{13}$$
which changes in sign and magnitude depending on $L_R$, $R$, $Q$ and $\gamma$.

E. Modeling the Homology Search and Strand Exchange Process
Recent experimental results suggest that the homology recognition/strand exchange process uses four major dsDNA conformations:
1. B-form dsDNA, which is the structure of the dsDNA when it is not bound to the protein;
2. an initial sequence-independent searching state, with the dsDNA bound to RecA and the complementary strand bases paired with the outgoing strand;
3. an intermediate sequence-dependent strand exchanged state, in which the complementary strand bases are paired with the incoming strand and the complementary strand backbone is in a position where its bases can flip between pairing with the incoming and outgoing strands;
4. the final state known from the X-ray structure, where the complementary strand bases are paired with the incoming strand bases and the phosphate backbone is in the position shown in the X-ray structure [18].

Consistent with strand exchange proceeding through base flipping of triplets, recent experimental results have suggested that outgoing strand bases are arranged in B-form triplets separated by large rises, such that the complementary strand bases can readily rotate between pairing with the outgoing strand and pairing with the incoming strand [21]. The crystal structure shows that the incoming strand is located near the center of the helical DNA/protein structure, whereas the residues associated with the binding of the outgoing strand are much farther away from the center, as illustrated in Fig. 1; however, if strand exchange occurs via the base flipping of base pair triplets, the spacing within triplets must be approximately the same for all three strands. Thus, given that the total extension of the outgoing strand backbone is much larger than the total extension of the incoming strand backbone, the rises in the outgoing strand must be much larger than the rises in the incoming strand, as illustrated in Fig. 2.

In the one-dimensional model considered here, where γ is the only parameter characterizing each dsDNA conformation, γ is smallest for the initial searching state, where the complementary strand is paired with the very highly extended outgoing strand. This is consistent with experimental results showing that the dsDNA in the initial bound state has a large differential extension between the outgoing and complementary strands that prevents more base pairs from binding to the filament unless the dsDNA undergoes strand exchange [21]. The γ for the intermediate state is slightly larger because the complementary strand is paired with the less extended incoming strand. This value is consistent with experimental results suggesting that the strand exchange of homologous triplets is favorable because it reduces the differential tension on the dsDNA [18]. The γ for the final state is almost 1, because of the support provided for the rises by the L1 and L2 loops of the protein that occupy the rises in the complementary strand, consistent with the known experimental result that the binding of at least ∼80 bp in the final state is free energetically favorable [22]. In order to probe the role of tension during homology recognition and strand exchange, we apply the model to the various stages of strand exchange using the following parameters: γ = 0 . , . , . , . and E_bind equal to 0, − . kT, − . kT and − . kT per homologous triplet in the unbound, initial bound, intermediate and final states, respectively. These γ values are inspired by X-ray structures of DNA/RecA filaments, the known properties of B-form dsDNA, experimental results on the stability of strand exchange products [15], and results of numerical simulations that optimize homology recognition [14].

II. RESULTS AND DISCUSSION

A. Changes in Mechanical Tension Allow Strand Exchange to be Free Energetically Favorable if and only if 6 Contiguous Homologous bp Undergo Strand Exchange, Insensitive to Model Parameters
Early RecA recognition models assumed that strand exchange is always free energetically favorable, just as Watson-Crick pairing of unpaired ssDNA is always favorable, but this assumption is not correct. In self-assembly based on ssDNA/ssDNA pairing, the pairing of matched bases is free energetically favorable and the pairing of mismatched bases is approximately neutral. Thus, in ssDNA/ssDNA pairing systems, such as DNA origami, assembly of matching sites is always free energetically favorable because the correct Watson-Crick pairing reduces the free energy below that for the system where no bases were bound. In contrast, if only Watson-Crick pairing is considered, self-assembly based on strand exchange is free energetically neutral for matched bases and free energetically unfavorable for mismatched bases, because the system begins in a state with all of the bases in the complementary strand correctly paired with the corresponding bases in the outgoing strand. Thus, previous models of strand exchange that considered only Watson-Crick pairing were faced with a paradox: at thermal equilibrium strand exchange is only free energetically neutral, so at best only 50% of correct pairings would end up in the strand exchanged state, and the other 50% would end up paired with their initial partners; however, in vivo strand exchange proceeds to completion. Including protein contacts did not solve the problem, since the protein contacts in the initial ssDNA-RecA filament are almost identical to the contacts in the final post-strand exchange state [3]; however, the model presented in this paper resolves the problem, as discussed below.

In the model considered here, the coupling along the dsDNA backbone makes any change in the dsDNA conformation (such as adding or base-flipping a triplet) alter the positions and extensions of all other bound triplets. This effect is illustrated in Fig. 4, which shows the rises and extensions of the base pairs calculated from the model when an additional triplet binds in site II of the protein while all of the previously bound triplets are bound in site II. The triplet highlighted with the arrow experiences increasing tension as triplets are added to the end of the sequence. Furthermore, as shown in Fig. 5, the lattice mismatch at the ends of the filament increases monotonically as more triplets are added. In contrast, the lattice mismatch for the first triplet out from the center of the sequence decreases as base pairs are added (solid black line), because the added triplets extend the inner rises. As the number of bound triplets approaches the deGennes length for RecA-bound dsDNA, the lattice mismatch at the ends of the filament approaches a constant and the central rises in the complementary strand approach $L_R$.

The mechanical stress model considered here suggests that strand exchange is free energetically favorable if and only if strand exchange transfers a rise, because the reduction in free energy results from a reduction in the stress on the base pairs, which in turn comes from a decrease in the lattice mismatch between the complementary strand and its Watson-Crick pairing partner. Thus, when the dsDNA first binds to the searching filament and all of the triplets are in the initial state, base flipping of a single triplet does not lower the energy of the system, since a rise is not transferred even if that triplet is perfectly homologous; however, once one homologous triplet is strand exchanged, it becomes favorable to transfer a contiguous homologous triplet. Thus, the transfer of the Watson-Crick pairing of the complementary strand from the outgoing strand to the incoming strand is not free energetically favorable except in sequence regions containing at least six base pairs of contiguous homology.
This minimum number of contiguous homologous bases required to pass the first checkpoint is a basic qualitative feature of the model that is extremely insensitive to the model parameters chosen. This first checkpoint can provide very rapid unbinding of almost all initial dsDNA pairings, which is highly advantageous for rapid searching. Though the model specifies that 6 contiguous bases is the minimum number required to pass the checkpoint, independent of the parameters used in the model, the actual number of base pairs required to pass the first checkpoint is sensitive to the parameters.

Experimental results suggest that strand exchange is only marginally stable when ∼ / ∼ × − which represents only ∼50 possible positions for a bacterial genome with a length of 10,000,000 bp. All other sequences will rapidly unbind from the RecA filament because the binding energy for the sequence-independent searching state is very weak when only ∼

B. Calculations of the Free Energy as a Function of the Number of Bound Triplets in Each dsDNA Conformation
In order to understand the progression through homology recognition and strand exchange, it is important to consider the free energies for all of the dsDNA conformations involved, not just the initial bound state and the intermediate state. Of course the process is kinetic, so the energies of the transition states play a vital role; however, for simplicity we will only show the energies of the various bound conformations, using γ values based on numerical simulations [14]. Fig. 6 shows $E_{total}$ as a function of the number of base pairs in the initial bound state, the intermediate state and the final post-strand exchange state when all of the triplets are in the same conformation. Except for the initial binding of ∼

C. Non-linearities in the Free Energy Allow Homologs to Progress to Complete Strand Exchange
While it is energetically favorable to add base pairs to the initial bound state for small numbers of base pairs, the non-linear mechanical contribution to $E_{total}$ from Equation 7 rapidly increases as a function of the number of bound base pairs, making the binding of a large number of base pairs in the initial state unfavorable. This is because of the significant tension due to the lattice mismatch between the complementary strand and the outgoing strand; consequently, for the parameters based on simulation, no more than ∼15 bp can bind to the filament in the initial searching state, because the energy required greatly exceeds $kT$. Thus, once ∼ ∼ < ∼30; consequently, the non-linearity in the free energy makes strand exchange reversal more improbable as more contiguous homologous base pairs are strand exchanged. Furthermore, the non-linearity makes strand exchange reversal at the center of the filament increasingly unfavorable as the number of bound triplets increases, as shown in Fig. 8, while still allowing strand exchange reversal to remain possible at the ends of the filament. Again, these qualitative features are basic properties of the model that are highly insensitive to the model parameters. These qualitative features allow true homologs to progress to complete strand exchange even though non-homologs readily unbind. In contrast, for a system with a linear free energy, the probability that strand exchange will be reversed for a given triplet is independent of the number of other triplets bound and of the position of the particular triplet in the filament. As a result, such systems suffer either rapid unbinding of homologs or strong kinetic trapping in near homologs, as discussed above for the general case of a system with a linear binding energy and more than 4 binding sites.

D. dsDNA Tension Drives the Transition from the Intermediate State to the Final State
The free energy penalty due to binding in the intermediate state is so large that even aperfect homolog cannot bind more than ∼
18 bp in the intermediate state. The only way to continue adding base pairs to the filament is to make a transition to the final post-strand exchange state, where the L1 and L2 loops provide significant mechanical support for the rises. This transition reduces the tension enough to allow more triplets to bind in the initial bound state. If the transition does not occur, strand exchange will reverse because the intermediate state energy is unfavorable. Thus, the collective behavior of the dsDNA can strongly enforce homology stringency at this final step of the strand exchange process by ensuring that: 1. the number of base pairs that can be bound in the intermediate state is limited; 2. transfer of a single triplet from the intermediate state to the final state is never favorable; 3. transfer of non-contiguous triplets from the intermediate state to the final state is unfavorable even if they are homologs; and 4. transfer of pairs of triplets is not favorable until ∼18 bp are bound in the intermediate state. Properties 1-3 are shared with the transition from the initial bound state to the intermediate state, but the fourth property is different. It may arise from some combination of two features: 1. the linear energy in the final state may be less favorable than the linear energy in the intermediate state; 2. there may be a significant boundary penalty associated with the deformation of the backbone that occurs when the system is partially in the intermediate state and partially in the final state. In either case, the transition from the intermediate state to the final state will only become free energetically favorable when the favorable non-linear term dominates the other terms. If the final state only becomes free energetically favorable when ∼15 contiguous base pairs are present in the final state, then the non-linearity in the energies of the intermediate and final states may provide additional discrimination against regions of accidental homology. For a given searching ssDNA sequence, the odds that an exact match of that length occurs by chance at any particular position are extremely small (of order 4^−n per position for an n bp match in random sequence). Thus, even in a 10 million bp genome, such an accidental homology is unlikely to be present. We have considered a few bacterial genomes and found that the sequences are indeed random in this sense, with the exception of repeated genes. Thus, many bacterial genomes contain no accidental homologies consisting of 18 contiguous bp; therefore, no further homology checking is required if 18 contiguous bp match exactly. This statistical property is consistent with experimental results showing that strand exchange products do not become stable until more than ∼18 bp have undergone strand exchange.
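The statistical argument above can be checked with a back-of-the-envelope calculation. The sketch below is a minimal illustration, not the genome analysis referred to in the text. It assumes a random genome with independent positions and equal base frequencies, counts matches on both strands, and uses a Poisson approximation. The function names are hypothetical.

```python
import math


def match_probability(n: int) -> float:
    """Chance that n specified bases match a random sequence at one position.

    Assumes independent positions and equal (1/4) base frequencies.
    """
    return 4.0 ** (-n)


def expected_accidental_matches(n: int, genome_bp: int, both_strands: bool = True) -> float:
    """Expected number of exact n-bp matches to one searching sequence."""
    positions = genome_bp - n + 1        # start positions on one strand
    if both_strands:
        positions *= 2                   # the complementary strand is searched too
    return positions * match_probability(n)


def prob_at_least_one(n: int, genome_bp: int) -> float:
    """Poisson approximation to P(at least one accidental n-bp match)."""
    return 1.0 - math.exp(-expected_accidental_matches(n, genome_bp))


if __name__ == "__main__":
    genome = 10_000_000                  # the 10 million bp genome from the text
    for n in (9, 15, 18):
        print(f"n={n:2d}  per-position p={match_probability(n):.2e}  "
              f"expected={expected_accidental_matches(n, genome):.2e}  "
              f"P(>=1)={prob_at_least_one(n, genome):.2e}")
```

Under these assumptions, a 9 bp exact match is expected by chance dozens of times in a 10 Mbp genome. By contrast, the chance of any 18 bp accidental match is roughly 3 × 10^−4. Real genomes, with repeats and compositional bias, will deviate from these idealized numbers.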
E. For dsDNA in the Final State the Free Energy Has a Minimum as a Function of N

If the final state has a much higher γ than the intermediate state, then adding base pairs will remain free energetically favorable for much larger N. However, as long as the mechanical contribution remains a non-linear function of N, the free energy will reach a minimum at some N, after which adding more base pairs becomes free energetically unfavorable. Eventually, as N approaches the de Gennes length, the mechanical energy becomes a linear function of N, so the energy cost of adding an additional triplet becomes constant, independent of N. If this cost is small in comparison with kT and unbinding is forbidden, the model suggests that the length of the strand exchange product can increase without limit.

F. The Effect of the Non-Linearity on the Strand Exchange of a Mismatched Triplet
It has previously been assumed that the free energy penalty for strand exchange of a mismatched triplet is approximately equal to the loss of Watson-Crick pairing for that triplet. There may also be an additional contribution due to the effect of the mismatch on the pairing of the two neighboring bases [24]. In contrast, for a system with the non-linearity considered here, strand exchange may be significantly more unfavorable if the initially bound base pairs contain a single mismatch. In that case, the free energy penalty due to the mismatch must include not only the Watson-Crick pairing energy for that base pair and its neighbors, but also the increased mechanical stress on the two matched base pairs. This stress makes a direct contribution to the free energy penalty. It can also increase the stacking penalty by distorting the bonds between the two homologous base pairs, which lowers their Watson-Crick pairing energy. A detailed structural calculation would be required to assess all of these factors correctly. In what follows, we will assume that the free energy penalty for the strand exchange of a mismatched base is approximately equal to the Watson-Crick pairing loss as long as fewer than 18 bp are bound to the filament.
G. dsDNA Tension Inhibits Progression of Strand Exchange Past a Mismatch
In a system with a linear free energy as a function of the number of bound homologous triplets, adding more homologous triplets is always favorable even if the last triplet added was non-homologous. This results in enormous kinetic trapping. In contrast, the non-linear energy inhibits binding of additional base pairs after a non-homologous triplet has bound, as illustrated in Fig. 7. The dashed black line shows the free energy penalty for a perfect homolog adding a triplet to the initial bound state when all of the other bound triplets have undergone strand exchange. For up to ∼18 bp, thermal energy is sufficient to bind additional base pairs. The dashed gray line shows the free energy penalty for adding a homologous triplet if the last triplet added was non-homologous. This penalty is only slightly larger than the penalty for a homolog. However, the solid gray line shows that the penalty for adding a second triplet is very large, even though both triplets added after the non-homolog were in fact homologous. For comparison, the solid black line shows the energetic favorability of strand exchange of a homologous triplet from the initial bound state. The graph shows that, once a mismatched triplet has bound, the non-linearity makes adding further triplets to the initial state unfavorable, even when the subsequent base pairs are homologous.
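This section, together with Sections E and H, contrasts a linear free energy with a non-linear mechanical energy. That contrast can be made concrete with a toy one-dimensional ladder in the spirit of de Gennes [23]. The sketch below is hypothetical and is not the model of Fig. 3. It rests on these assumptions:

- One strand's N bound bases are pinned at an extended spacing d.
- The partner strand has backbone springs of stiffness Q and rest spacing b < d.
- Each base pair is a shear spring of stiffness R.

The mismatch s = d − b, the constant gain eps per bound base pair, and all numbers are illustrative assumptions. The script minimizes the mechanical energy exactly with a tridiagonal solve and checks the result against a closed form derived for this toy. It then reports three quantities: the marginal cost of binding one more base pair, the N that minimizes E(N) − eps·N, and how the base-pair stress is distributed along the ladder.

```python
"""Toy de Gennes-style ladder: hypothetical parameters, not the model of Fig. 3."""
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] is entry (i+1, i), upper[i] is entry (i, i+1)."""
    d, r = list(diag), list(rhs)
    for i in range(1, len(d)):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * len(d)
    x[-1] = r[-1] / d[-1]
    for i in range(len(d) - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


def ladder_displacements(n, R, Q, s):
    """Minimize E = sum R/2 delta_k^2 + sum Q/2 (delta_{k+1} - delta_k + s)^2.

    delta_k is the offset of partner-strand base k from its pinned partner;
    s = d - b is the per-step mismatch between the pinned (extended) spacing d
    and the partner strand's relaxed spacing b.
    """
    if n == 1:
        return [0.0]                      # a single base pair feels no mismatch
    diag = [R + 2 * Q] * n
    diag[0] = diag[-1] = R + Q            # end bases have one backbone neighbor
    off = [-Q] * (n - 1)
    rhs = [0.0] * n
    rhs[0], rhs[-1] = Q * s, -Q * s       # the mismatch enters only at the ends
    return solve_tridiagonal(off, diag, off, rhs)


def ladder_energy(n, R, Q, s):
    """Minimized mechanical energy of n bound base pairs."""
    delta = ladder_displacements(n, R, Q, s)
    rungs = sum(0.5 * R * x * x for x in delta)
    backbone = sum(0.5 * Q * (delta[k + 1] - delta[k] + s) ** 2 for k in range(n - 1))
    return rungs + backbone


def ladder_energy_closed_form(n, R, Q, s):
    """Exact energy of this toy, with cosh(kappa) = 1 + R/(2Q)."""
    kappa = math.acosh(1 + R / (2 * Q))
    return 0.5 * Q * s * s * (n - math.tanh(kappa * n / 2) / math.tanh(kappa / 2))


def marginal_costs(n_max, R, Q, s):
    """E(N) - E(N-1) for N = 2..n_max: cost of binding one more base pair."""
    e = [ladder_energy(n, R, Q, s) for n in range(1, n_max + 1)]
    return [e[i] - e[i - 1] for i in range(1, len(e))]


def free_energy_minimum(n_max, R, Q, s, eps):
    """N in 1..n_max minimizing E(N) - eps*N, with eps a constant gain per base pair."""
    free = [ladder_energy(n, R, Q, s) - eps * n for n in range(1, n_max + 1)]
    return 1 + free.index(min(free))


if __name__ == "__main__":
    R, Q, s = 1.0, 100.0, 1.0             # hypothetical: soft base pairs, stiff backbone
    costs = marginal_costs(200, R, Q, s)
    print("cost of 2nd base pair:  ", round(costs[0], 3))
    print("cost of 200th base pair:", round(costs[-1], 3), "(limit Q*s^2/2 =", Q * s * s / 2, ")")
    print("minimum of E - 10*N at N =", free_energy_minimum(200, R, Q, s, eps=10.0))
    print("minimum of E - 60*N at N =", free_energy_minimum(200, R, Q, s, eps=60.0))
    delta = ladder_displacements(200, R, Q, s)
    print("base-pair offset at end / center:", round(delta[0], 3), "/", f"{delta[100]:.1e}")
```

With these illustrative values, the stress penetrates about √(Q/R) = 10 base pairs from each end. The cost of binding one more base pair rises monotonically toward the constant Q·s²/2, so E(N) − 10N has its minimum near N ≈ 10. With a gain of 60 per base pair, which exceeds Q·s²/2, the free energy keeps decreasing across the whole range scanned. Meanwhile, base pairs in the center of a long ladder carry essentially no stress. In this toy, the total energy grows roughly as N³ for small N. That exponent is a property of the toy, not a prediction of the model in the text.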
H. Possible Explanations of Biological Results
We have already discussed the proposal that the energetic non-linearity explains why strand exchange is free energetically favorable. This holds even though the sequences of the incoming and outgoing strands are the same and the protein contacts in the initial searching ssDNA-RecA filament are similar to those in the final post-strand exchange state.

In addition, experimental results have shown that a rapid initial interaction incorporating ∼15 base pairs is followed by a slower progression of strand exchange that occurs in triplets [19]. Figure 6 suggests that the binding of dsDNA to site II is favorable for fewer than 9 bases and requires only a few kT of energy for fewer than 15 bases, whereas for more bases the binding is highly free energetically unfavorable.

Furthermore, FRET-based studies indicate that homology recognition may be accurate for short sequences but inaccurate for longer sequences [17]. A separate study showed that strand exchange pauses at sequence mismatches [25, 26]. We argue that such pauses lead to the unbinding of shorter non-homologous sequences because the dsDNA binds to the filament sequentially. In the model presented here, the pause in strand exchange at a mismatch results from two costs: transferring the non-homologous triplet to the intermediate state, and progressing past the mismatched triplet. Spontaneous unbinding of the entire strand exchange product becomes unlikely as the sequence lengthens because so many free energetically unfavorable transitions are required. If the strand exchange product becomes too long, the unbinding time exceeds the recognition time available to the organism. However, as discussed above, accidental homologies that extend beyond 18 bp rarely exist in vivo. In vivo, strand exchange does progress through regions of non-homology once a sufficiently long strand exchange product is formed, but ATP hydrolysis is required [27, 28].

Finally, it is also well known that in the presence of ATP hydrolysis the size of the strand exchange product increases monotonically until it reaches a limit of M ∼80 bp [22], where M is the number of bound dsDNA base pairs. Strand exchange then continues to progress, but M remains constant. This is because the heteroduplex dsDNA unbinds from the lagging edge of the filament at the same average rate that new dsDNA binds to site II [22, 29]. Since the dsDNA can freely unbind from the filament, free energy minimization implies that M will remain ∼M_min^free, the number of bound base pairs that minimizes the free energy. Additional effects associated with dynamics may explain why the strand exchange window moves along the dsDNA with M ∼M_min^free rather than remaining stationary [30]. In contrast, experimental results obtained in the absence of hydrolysis show that the strand exchange window can grow without bound [22]. In the model presented here, when only a few base pairs are bound, the mechanical energy penalty grows quadratically with the number bound. However, the base pairs redistribute the stress between the backbones, so that eventually the base pairs in the center of the filament are not under stress. In this case, the mechanical energy becomes a linear function of the number bound rather than a quadratic function. Thus, once a sufficient number of base pairs have undergone strand exchange, the energy required to add an additional triplet to the filament is constant, independent of the number bound. Suppose, as this model suggests, that the constant energy decrease due to additional DNA-protein contacts is approximately equal to the constant energy increase associated with the added mechanical stress. Then the filament can extend indefinitely, because both energies are independent of the number of base pairs already bound.

I. Additional Features in Three Dimensions
In the real RecA system, steric factors are associated with the mismatch between the 150 bp persistence length of dsDNA and the strong bending of dsDNA in the 18 bp/turn helical RecA filament. The local rigidity of the dsDNA may play a role in limiting the initial binding length to ∼ γ for the initial bound state, but the additional degrees of freedom would alter the coupling between the initial bound state γ and the γ for the intermediate state.

In addition, in the final post-strand exchange state, steric factors may make interactions with the L1 and L2 loops more favorable for homologous triplets than for non-homologous triplets. Thus, the final state could have a sequence-dependent linear contribution to the free energy. This contribution was not considered in this model but may provide additional homology stringency.

II. CONCLUSION
We have proposed a simple mechanical model for the stress distribution on dsDNA bound to ssDNA-RecA filaments. The model suggests that a change in the conformation of one bound triplet can change the conformation of all of the other bound triplets; consequently, the total energy is a non-linear function of the number of bound base pairs. The model makes several significant qualitative and quantitative predictions. The most important qualitative prediction is that a change in the configuration of one bound triplet changes the configuration of all the other bound triplets. This collective behavior leads to a free energy that is a non-linear function of the number of consecutive bound triplets, in which the binding of additional triplets becomes increasingly unfavorable as the number of bound triplets increases. This non-linearity is important because in systems with more than ∼ ∼ ∼50 out of 10,000,000 possible pairings will rapidly unbind.

In addition, the non-linearity forces triplets to be added to the filament sequentially from the site of initial binding. As a result, adding more than two base pair triplets after a mismatch is highly unlikely, even if the additional base pairs are sequence matched. Furthermore, the model suggests that true homologs can proceed to complete strand exchange because the strand exchange of contiguous homologous base pair triplets reduces the tension on the dsDNA. The tension reduction associated with the strand exchange of the initial ∼ ∼ ∼ < ∼18 bp. These effects could provide exact sequence recognition for bacterial genomes except for repeated genes. The actual speed and accuracy of homology recognition depend on detailed parameter values as well as on additional factors associated with the three-dimensional geometry of the filament, so we do not provide detailed estimates here. Nevertheless, the simple model presented here provides mechanisms for overcoming fundamental limitations encountered in systems with more than 4 binding sites where the binding energy is a linear function of the number of correctly paired binding sites.

In sum, the energy non-linearity produces four crucial advantages that are unavailable to systems with a strictly linear energy: 1. the initial interaction is limited to ∼15 bp, beyond which binding of dsDNA triplets to the filament cannot progress without a sequence-dependent transition of 9 contiguous homologous base pairs to the strand-exchanged state; 2. nearly immediate unbinding of any sequence that does not contain at least 9 contiguous homologous base pairs; 3. a large free energy penalty that prevents strand exchange from progressing past a sequence mismatch even if the mismatch is followed by homologous triplets; 4. a large free energy penalty that makes the transition from the intermediate state to the final state unfavorable until ∼18 contiguous base pairs make the transition to the final state. These features provide much more rapid and accurate homology recognition than systems using linear energies. In systems with linear energies, addition and strand exchange of a homologous triplet is always favorable, so even short regions of accidental homology can produce substantial trapping times, as demonstrated by both analytical modeling and numerical simulations [14]. Qualitative features of the model provide possible explanations for well-known but previously unexplained features of homology recognition and strand exchange. They also suggest that the bond rotations that appear in the overstretching of naked dsDNA may have a role in strand exchange.

Acknowledgments
We would like to thank Douglas Bishop, Yuen-Ling Chan, and Chantal Prévost for helpful conversations about the X-ray structure of DNA bound to RecA. We would also like to thank Chantal Prévost for the PDB file of a complete RecA helix.

[1] A. Roca and M. Cox, Mol. Biol. , 415 (1990).
[2] S. Kowalczykowski and A. Eggleston, Annu. Rev. Biochem. , 991 (1994).
[3] Z. Chen, H. Yang, and N. Pavletich, Nature , 489 (2008).
[4] E. Folta-Stogniew, S. O'Malley, R. Gupta, K. Anderson, and C. Radding, Mol. Cell , 965 (2004).
[5] J. Xiao, A. Lee, and S. Singleton, ChemBioChem , 1265 (2006).
[6] A. Mazin and S. Kowalczykowski, Proc. Natl. Acad. Sci. USA , 10673 (1996).
[7] J. Menetski, D. Bear, and S. Kowalczykowski, Proc. Natl. Acad. Sci. USA , 21 (1990).
[8] W. Rosselli and A. Stasiak, J. Mol. Biol. , 335 (1990).
[9] A. Stasiak, E. Di Capua, and T. Koller, J. Mol. Biol. , 557 (1981).
[10] Y. Savir and T. Tlusty, Mol. Cell , 388 (2010).
[11] A. Mazin and S. Kowalczykowski, Genes Dev. , 2005 (1999).
[12] K. Klapstein, T. Chou, and R. Bruinsma, Biophys. J. , 1466 (2004).
[13] J. Kates-Harbeck, A. Tilloy, and M. Prentiss, to be published.
[14] E. Feinstein and M. Prentiss, to be published.
[15] P. Hsieh, C. Camerini-Otero, and R. Camerini-Otero, Proc. Natl. Acad. Sci. USA , 6492 (1992).
[16] J. Hopfield, Proc. Natl. Acad. Sci. USA , 4135 (1974).
[17] L. Bazemore, E. Folta-Stogniew, M. Takahashi, and C. Radding, Proc. Natl. Acad. Sci. USA , 11863 (1997).
[18] A. Peacock-Villada, D. Yang, C. Danilowicz, E. Feinstein, N. Pollack, S. McShan, V. Coljee, and M. Prentiss, Nucl. Acids Res. , 10441 (2012).
[19] K. Ragunathan, C. Joo, and T. Ha, Structure , 1064 (2011).
[20] M. Karplus, private communication.
[21] C. Danilowicz, E. Feinstein, A. Conover, V. Coljee, J. Vlassakis, Y. Chan, D. Bishop, and M. Prentiss, Nucl. Acids Res. , 1717 (2011).
[22] T. van der Heijden, M. Modesti, S. Hage, R. Kanaar, C. Wyman, and C. Dekker, Mol. Cell , 530 (2008).
[23] P.-G. de Gennes, C. R. Acad. Sci. IV-Phys. , 1505 (2001).
[24] J. SantaLucia, Jr., Proc. Natl. Acad. Sci. USA , 1460 (1998).
[25] A. Lee, J. Xiao, and S. Singleton, J. Mol. Biol. , 343 (2006).
[26] D. Sagi, T. Tlusty, and J. Stavans, Nucl. Acids Res. , 5021 (2006).
[27] W. Rosselli and A. Stasiak, EMBO J. , 4391 (1991).
[28] J. Kim, M. Cox, and R. Inman, J. Biol. Chem. , 16444 (1992).
[29] H.-F. Fan, M. Cox, and H.-W. Li, PLoS ONE , e21359 (2011).
[30] A. Conover, C. Danilowicz, R. Gunaratne, V. Coljee, N. Kleckner, and M. Prentiss, Nucl. Acids Res. , 8833 (2011).

Figure 1: (Color Online) a. Side view of the X-ray structure of the dsDNA in the post-strand exchange (final) state, with the complementary (on the right, at the arrow) and incoming strands shown as green and red stick renderings. The arrow points to a few rises between two triplets in the dsDNA structure. The VMD (32) renderings of RecA crystal structure 3CMX (8) show site II residues Arg226 (pink), Lys227 (cyan), Arg243 (yellow), and Lys245 (magenta) with charged nitrogen atoms (blue). The cyan triangle indicates the approximate position of an outgoing strand phosphate. b. Top view of the same structure, with circles indicating the radii occupied by the incoming (green), outgoing (blue), and complementary strands (red for the final state and gray for the intermediate state). c. Top view showing the base pairing in the homology recognition/strand exchange process superimposed on the actual X-ray structure.
[Figure 2 panel labels: B-form dsDNA; ssDNA in site I, no bound dsDNA; Initial binding state: dsDNA in site II; Intermediate testing state for homolog, 1 exchanged triplet; Intermediate testing state for homolog, 3 exchanged triplets; Final state: complementary strand relocated.]

Figure 2: (Color Online) Schematic of interactions between dsDNA and the ssDNA-RecA filament as strand exchange progresses for a homolog. Side views are shown in the central part of each panel and top views in the right part of each panel. The gray region indicates the space occupied by the protein, and the circles indicate the radii occupied by the three strands in their final post-strand exchange positions. In the schematic, the filament consists of three RecA proteins, showing site I (pale red), site II (pale orange), and the support for the rises provided by the L1 and L2 loops (cyan). Panels: (i) unbound ssDNA; (ii) ssDNA bound in site I and unbound B-form dsDNA; (iii) ssDNA bound in site I and dsDNA bound in site II, where the outgoing strand (far left), complementary strand (bound to the outgoing strand in purple), and incoming strand (bound to the protein on the right) are shown in blue, red, and green, respectively; (iv) the central triplet has undergone strand exchange, resulting in a decrease in lattice mismatch and a decrease in the stress on the bp; (v) all three triplets shown have undergone strand exchange; (vi) ssDNA bound in site II and dsDNA bound in site I in the final assembled state, which has even less bp stress than the strand exchanged state due to increased mechanical support for the rises.

Figure 3: Schematic of dsDNA bound to site II of the protein showing the model parameters.
Figure 4: Effects of adding triplets to the initial bound state as calculated by the model. (A) Three triplets shown in the bound state, with an arrow pointing to the second one. (B) Adding a triplet to the initial state changes the tension in the other triplets. (C) Again, the tension on the triplets changes as a fifth triplet is added to the initial state.