Potential evolutionary advantage of a dissociative search mechanism in DNA mismatch repair

Abstract

Protein complexes involved in DNA mismatch repair appear to diffuse along dsDNA in order to locate a hemimethylated incision site via a dissociative mechanism. Here, we study the probability that these complexes locate a given target site via a semi-analytic, Monte Carlo calculation that tracks the association and dissociation of the complexes. We compare such probabilities to those obtained using a non-dissociative diffusive scan, and determine that for experimentally observed diffusion constants, search distances, and search durations in vitro, there is neither a significant advantage nor disadvantage associated with the dissociative mechanism in terms of probability of successful search, and that both search mechanisms are highly efficient for a majority of hemimethylated site distances. Furthermore, we examine the space of physically realistic diffusion constants, hemimethylated site distances, and association lifetimes and determine the regions in which dissociative searching is more or less efficient than non-dissociative searching. We conclude that the dissociative search mechanism is advantageous in the majority of the physically realistic parameter space.


Kyle Crocker, James London, Andrés Medina, Richard Fishel, and Ralf Bundschuh
Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
Department of Cancer Biology and Genetics, The Ohio State University, Columbus, Ohio 43210, USA
Department of Physics, Department of Chemistry and Biochemistry, Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio 43210, USA
(Dated: December 18, 2020)

I. INTRODUCTION

DNA mismatch repair (MMR) is a molecular process by which errors in DNA sequence indicated by mismatched base pairs are corrected. Failure of this process is the cause of many cancers [1], but a complete mechanistic description of the process does not yet exist [1–3]. The MMR process is evolutionarily conserved from prokaryotes to eukaryotes [4–6], so E. coli MutS, MutL, and MutH proteins may be productively used to study MMR. In E. coli, MMR consists of the following steps. First, MutS recognizes a mismatched site on a DNA strand and associates with the DNA. This MutS then binds MutL from solution, which in turn can bind MutH. MutH then nicks the newly synthesized, erroneous DNA strand. Excision, followed by polymerization and ligation, completes the repair process [5, 6].

Here, we describe a quantitative model of the process by which the MMR proteins determine which strand is newly synthesized. Since E. coli methylates its DNA strands whenever a GATC base sequence appears, a newly synthesized strand differs from existing strands in that it is not yet methylated. A MutL-activated MutH, therefore, nicks the new strand at a hemimethylated site, and the strand containing the nick is excised. In order to create this nick, however, a hemimethylated site must first be recognized. The hemimethylated sites may be thousands of base pairs away from the mismatch (and therefore from the place at which the MutS proteins bind to the DNA), so recognition of a hemimethylated site is not a trivial problem. Through single-molecule probing of the MMR process in vitro, Liu et al. recently found extremely stable toroidal protein clamps diffusing along the DNA strand while transiently associating with and dissociating from each other in order to reach and recognize a hemimethylated site [3]. It is this diffusion mechanism that is the subject of our quantitative model.

Protein searches of DNA for specific sites are common, and searches involving non-toroidal proteins have been studied extensively: Berg et al. derived a complete mathematical model of this search process in terms of association and dissociation rates, as well as geometrical considerations [7]. Givaty et al. developed a molecular simulation based on electrostatic forces of non-toroidal DNA binding proteins searching DNA and tracked non-toroidal protein motion. They found that the most efficient DNA searches consist of ∼20% sliding and ∼ […] et al. in the context of an E. coli polymerase, DNA polymerase III holoenzyme, which is stabilized on the DNA by the β-clamp that encircles the DNA [9]. More recently, Daitchmen et al. have used molecular dynamics simulations to study the diffusion of these protein clamps and report on the way in which the physical properties of the clamps affect the diffusion dynamics [10]. However, all of the previous studies that we are aware of have focused on individual proteins rather than the search process as a whole.

The focus of this paper is to quantitatively model the observed protein clamp association-dissociation mechanism present in MMR protein clamp diffusion. While this is similar to the non-toroidal search mechanism described by Berg in that it is characterized by a transition between a slow searching state and a fast non-searching state, the time distribution of the fast state is different in each case. In particular, the transition from the dissociated fast state into the slow searching state is governed by 3-D diffusion in the non-toroidal proteins discussed by Berg [7], whereas the toroidal structure of the proteins that we consider, while allowing dissociation of the proteins from each other, prevents release from the DNA and thus restricts their motion to a single dimension. This structure also prevents transfer between nearby DNA segments [9–11].

After construction of the quantitative model, we investigate whether the association-dissociation mechanism serves to increase the efficiency with which a hemimethylated site is found, as compared to a more straightforward situation in which the proteins are unable to dissociate from one another (or, equivalently, there is only a single protein). If this were the case, it could provide an evolutionary pressure favoring the association-dissociation mechanism. We find that although the association-dissociation mechanism makes little difference at the observed E. coli parameters, there is a much larger section of parameter space in which the association-dissociation mechanism is beneficial as opposed to detrimental when compared to a case in which the proteins do not dissociate.

This paper begins in section II with a summary of the Liu et al. experiments that led to our model, including a tabulation of experimental parameters relevant to the model. In section III, the model itself is described both physically and mathematically. Section IV presents our approach to calculating the probability of finding the hemimethylated site from the model. The main findings concerning the probability of finding the hemimethylated site are then presented in section V, and finally the implications of those results are discussed in section VI, along with potential future directions of research in this area. Several of the detailed derivations are relegated to various appendices.

II. EXPERIMENTAL OBSERVATION OF DISSOCIATIVE SEARCH MECHANISM

In this section, we briefly summarize the experimental observations by Liu et al. [3] that underlie the model developed in this paper. Additionally, we compile in Table I experimentally determined quantities used to determine values of model parameters, since we refer to these quantities throughout the paper.

In the experiment by Liu et al., interactions of Escherichia coli DNA mismatch repair proteins MutS, MutL, and MutH with dsDNA were imaged via TIRF microscopy. Of particular interest is what will be referred to as the dissociative search mechanism, so called because of the many cycles in which MutS and MutL associate into a single complex and then dissociate into two separate complexes before re-forming a single complex as they diffuse along the DNA in order to locate the hemimethylated site. (We called this mechanism the “association-dissociation mechanism” in the introduction for clarity, but for the remainder of the paper we switch to the less cumbersome “dissociative mechanism.”)

In particular, when MutS binds to a mismatch, it forms a stable clamp in the presence of ATP. It then diffuses along the DNA strand. MutL may then bind to MutS, forming a combined clamp that diffuses more slowly along the DNA. This slower diffusion implies frequent interaction with the DNA backbone, thus indicating that the MutS-MutL clamp is capable of “searching” the DNA for a hemimethylated site [3]. Interestingly, MutL often dissociates from MutS, and the two proteins form two independent, stable, and freely diffusing clamps, each of which diffuses much more quickly than the MutS-MutL complex and is therefore not interacting with the DNA frequently enough to perform a search [3]. If the dissociated clamps diffuse back into a state in which they are adjacent along the DNA, they are able to reassociate and continue to search the DNA together. Finally, MutH associates with MutL in order to cleave the newly synthesized DNA strand at the hemimethylated site. Measured association lifetimes and diffusion constants for the dissociative search are compiled in Table I. Note that the diffusion of the MutS protein alone is ∼10 times faster than the diffusion of the MutS-MutL complex, and that the diffusion of the MutL protein in the absence of MutH is a factor of ∼20 faster than that of the MutS protein alone. In the presence of MutH, MutS and MutL diffuse at similar rates. Furthermore, the addition of MutH does not seem to have a significant effect on the MutS-MutL diffusion constant [3].

The objective of this paper is to quantitatively study the effect of this dissociative mechanism on search efficiency. In particular, there are two competing effects of dissociative diffusion on search efficiency that make its overall effect unclear. Since it makes the overall diffusion faster (compared to a system that always remains in the slow, searching state), it increases the region of the DNA that the protein clamps are able to visit. However, since proteins in the dissociated state are unable to actually search the DNA, the amount of DNA actually searched may decrease if the proteins do not reassociate often enough.

III. MODEL

To determine the effect of the dissociative DNA mismatch repair search mechanism on search probabilities, we propose the microscopic model illustrated in Fig. 1. In the model, the search begins with an associated MutS-MutL protein complex. The complex then diffuses in one dimension along the DNA with diffusion constant $D_{SL,\mu}$. During this time, any portion of the DNA over which the complex passes is considered “searched” and the overall search is considered successful if a hemimethylated site is reached in this state. The MutS-MutL complex dissociates with some average lifetime $\tau_{A,\mu}$ into independent MutS and MutL clamps initially separated by a distance $x_d$ with diffusion constants $D_S$ and $D_L$, respectively. The individual MutS and MutL clamps diffuse along the DNA until they come into contact again. Once in contact, the proteins reassociate with each other with an association probability $p_A$ or continue independent diffusion starting from a distance $x_d$ with probability $1 - p_A$. This dissociation-reassociation cycle continues until one or both of the proteins dissociate from the DNA. However, dissociation of the MutS and MutL clamps from the DNA is not modeled directly; instead, a “cutoff time,” based on the MutS association lifetime $\tau_S = 185$ s (which is shorter than the MutL lifetime and thus provides the more stringent cutoff), is used [3, 6], after which the search is declared unsuccessful. Since MutH binding to MutL changes the diffusion rates, we provide results that correspond to a search with non-MutH rates as well as results that correspond to a search with MutH rates. These two scenarios provide the two limiting cases, since it is not known if MutH tends to associate with MutL early or late in the search.

Quantity | Symbol | SL Value | SLH Value
Search complex diffusion constant | $D_{SL,M}$ | (6 ± […]) × […] bp²/s | (8 ± […]) × […] bp²/s
MutS diffusion constant | $D_S$ | (7 ± […]) × […] bp²/s | NA
MutL diffusion constant | $D_L$ | (1.[…] ± […]) × […] bp²/s | (6 ± […]) × […] bp²/s
MutS-MutL association lifetime | $\tau_{A,M}$ | […] ± […] | […]
MutS-DNA association lifetime | $\tau_S$ | 185 ± 35 s | NA
MutL-DNA association lifetime | $\tau_L$ | 850 ± 150 s | NA

TABLE I. Summary of relevant quantities measured by Liu et al. [3]. The column labelled “SL Value” gives the value of each quantity in the absence of MutH, whereas the column labelled “SLH Value” gives the value of each quantity in the presence of MutH. Note that some values were not measured separately in the presence of MutH (indicated by NA), so we will assume that MutH does not change these values. In Liu’s experiment, 17.[…] µm, so we use 4.[…] µm = 17.[…] […] µm by Liu et al. to units of bp, which is more convenient for our model.

IV. SUCCESSFUL HEMIMETHYLATED SITE SEARCH PROBABILITY

Analysis of the dissociative search mechanism is performed in terms of successful hemimethylated site search probabilities. A successful search is defined as one in which the MutS-MutL complex visits the hemimethylated site before the cutoff time passes. These search probabilities are calculated for different hemimethylated site distances from the original MutS-MutL position.

FIG. 1. (color online) Illustration of the model used to calculate successful hemimethylated site search probabilities in DNA mismatch repair. The DNA is modeled as a one-dimensional track along which the proteins travel. MutS and MutL protein clamps diffuse along the DNA with rates specified by $D_S$ and $D_L$, respectively, until they are directly adjacent to each other. They may then either diffuse away from each other without associating or associate into a combined, searching MutS-MutL. These happen with probabilities $1 - p_A$ and $p_A$, respectively. The combined MutS-MutL can in turn diffuse along the DNA with a new rate specified by $D_{SL}$, and is capable of searching the DNA over which it passes. After an average lifetime $\tau_{A,\mu}$, this complex dissociates into separate MutS and MutL clamps.

A. Successful hemimethylated search probability of non-dissociative search

The baseline against which we compare successful hemimethylated search probabilities obtained with the dissociative search mechanism consists of the equivalent probabilities for searches in which the clamps remain associated with each other for the entire search (pure 1-dimensional diffusion), which will be referred to as “non-dissociative” or “purely diffusive” searches. The probability of a successful non-dissociative search can be derived analytically following Redner [12]. We start with the diffusion equation in the case of a MutS-MutL clamp:

$$\frac{\partial p(x,t|x_0)}{\partial t} = D_{SL}\,\frac{\partial^2 p(x,t|x_0)}{\partial x^2}, \tag{1}$$

where $D_{SL}$ is the diffusion constant associated with the MutS-MutL clamp and $x_0$ is the position along the DNA at which the non-dissociative search begins. $p(x,t|x_0)\,dx$ is the probability that the clamp will be searching a position between $x$ and $x + dx$ at time $t$.

In order to consider the probability that the search reaches the hemimethylated site $x_{meth}$, we first solve this differential equation in the presence of an absorbing boundary condition at $x_{meth}$. Mathematically, this condition is expressed as $p(x_{meth}, t) = 0$, requiring that the MutS-MutL has not arrived at the hemimethylated site. Using the method of images, this solution is given by

$$p(x,t|x_0,x_{meth}) = \frac{1}{\sqrt{4\pi D_{SL} t}}\left[\exp\left(-\frac{(x-x_0)^2}{4 D_{SL} t}\right) - \exp\left(-\frac{\left(x-(2x_{meth}-x_0)\right)^2}{4 D_{SL} t}\right)\right], \tag{2}$$

which represents the spatial probability density of the search at some time $t$ under the assumption that the search has not yet reached $x_{meth}$.

$$P(x < x_{meth}, t) = \int_{-\infty}^{x_{meth}} p(x,t|x_0,x_{meth})\,dx = \mathrm{erf}\left(\frac{x_{meth}-x_0}{\sqrt{4 D_{SL} t}}\right) \tag{3}$$

thus gives the probability at time $t$ that a clamp has only searched positions $x_S$ such that $x_S < x_{meth}$. The probability that $x_{meth}$ has been searched, therefore, is

$$P^{(0)}(t) = 1 - P(x < x_{meth}, t) = 1 - \mathrm{erf}\left(\frac{x_{meth}-x_0}{\sqrt{4 D_{SL} t}}\right), \tag{4}$$

where the superscript (0) indicates that the probability is for a non-dissociative search.
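The non-dissociative probability, $P^{(0)}(t) = 1 - \mathrm{erf}[(x_{meth}-x_0)/\sqrt{4 D_{SL} t}]$, is simple to evaluate numerically. The following is a minimal sketch, not the authors' code: the function name `p_nondissociative` and the numerical values in the usage note are illustrative assumptions, and the site is assumed to lie at or beyond the starting position ($x_{meth} \ge x_0$).

```python
import math


def p_nondissociative(x_meth, t, d_sl, x0=0.0):
    """Probability that a purely diffusive search started at x0 has reached
    x_meth by time t, i.e. 1 - erf((x_meth - x0) / sqrt(4 * D_SL * t)).

    Assumes x_meth >= x0 and d_sl > 0 (hypothetical helper, not from the paper).
    """
    if t <= 0.0:
        # No time has elapsed: only a site at the starting position is "found".
        return 1.0 if x_meth == x0 else 0.0
    # 1 - erf(z) is the complementary error function erfc(z).
    return math.erfc((x_meth - x0) / math.sqrt(4.0 * d_sl * t))
```

For example, `p_nondissociative(500.0, 185.0, d_sl)` gives the success probability for a site 500 bp away at the cutoff time $t_s = 185$ s, once a value for the complex diffusion constant `d_sl` (in bp²/s) has been chosen.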
The successful search probability for a dissociative search will be indicated by the superscript (∗).

B. Association and Dissociation Event Stepping Simulation

In order to use the model described in Sec. III to calculate successful search probabilities, we develop a Monte Carlo approach that samples from analytic one-dimensional diffusion probability distributions. This calculation decomposes the overall successful search probability into the accumulated probabilities that each individual microscopic association identifies the hemimethylated site. Each individual probability can be determined analytically from the association lifetime and associated diffusion distributions if the distance between the position at which the clamps associate and the hemimethylated site is known. For a given initial distance to the hemimethylated site, the subsequent hemimethylated site distances are determined by both the diffusion of the associated clamps and the diffusion of the dissociated clamps. In principle, this conceptual framework produces an analytic expression for the successful search probability involving iterative convolution integrals. In practice, however, this expression is too complex to be used to compute values directly. In particular, we found that the most straightforward way to calculate the many integrals over diffusion position and association lifetime probability distributions was to randomly sample from these distributions many times. Each set of random samples produces a probability of either 1 or 0 that the hemimethylated site was successfully reached, and the average of many of these sets gives the overall successful search probability.

Another way to think of this iterative random sampling is to imagine that each set of random samples represents a path that the protein clamps can take along the DNA strand which results in either a successful or unsuccessful search. Each path occurs with a frequency proportional to its probability, and therefore setting the successful searches to 1 and the unsuccessful searches to 0 and taking the average of many such searches produces the successful search probability.

The following algorithm is used to carry out this numerical experiment, and will be called the association and dissociation event stepping simulation (ADESS):

1. The clamps start immediately adjacent to each other. We set the starting position of the clamps to $x_0 = 0$, the step counting index to $i = 0$, and the elapsed time to $t_e = 0$. Input a position to search for on a 1-D axis (designated the “hemimethylated site” or simply “$x_{meth}$”) representing its distance from the initial MutS-MutL association site on the dsDNA. Also choose a cutoff time $t_s$, representing dissociation of the MutS clamp from the dsDNA.

2. Decide whether the adjacent clamps associate by sampling randomly from a uniform distribution between 0 and 1 and comparing the result to the input probability that adjacent clamps will associate, denoted $p_A$.

3. If the clamps do not associate, go to step 7. If the cutoff time has been reached ($t_e \ge t_s$), mark the search as unsuccessful and go to step 9.

4. Randomly select, using the method of inverse transform, an association lifetime from the probability distribution given by

$$p_{assoc}(t) = \tau_{A,\mu}^{-1} \exp(-t/\tau_{A,\mu}), \tag{5}$$

where $\tau_{A,\mu}$ is the average microscopic association lifetime of the clamps. This represents the time for which the clamps are diffusing together during this association. Denote this time as $t_{assoc}$ and increase the total elapsed time $t_e$ by $t_{assoc}$.

5. Decide whether the hemimethylated site $x_{meth}$ has been reached given the previous association position and lifetime by sampling randomly from a uniform distribution between 0 and 1 and comparing the result to the probability that the site has been reached, given by

$$P_{find}(t_{assoc}) = 1 - \mathrm{erf}\left(\frac{x_{meth}-x_i}{\sqrt{4 D_{SL,\mu} t_{assoc}}}\right), \tag{6}$$

where $x_i$ is the previous association position and $D_{SL,\mu}$ is the diffusion rate of the associated clamps. Note that this is simply Eq. (4) evaluated at $t = t_{assoc}$. If the association time from the previous step brings the total time $t_e$ past the cutoff time, $t_{assoc}$ is taken to be the difference between the cutoff time and the time at which the current association began. This ensures that the final association finds the hemimethylated site with the proper probability.

6. (a) If $x_{meth}$ has been reached, the search is successful, so we proceed to step 9.

(b) If $x_{meth}$ has not been reached, use the previous association position $x_i$ and lifetime to randomly select the next dissociation position $x_{i+1}$ from the probability density function of Eq. (2) at $t = t_{assoc}$, with an additional normalization factor $C$ that ensures that the probability that $x_{meth}$ has not been reached is 1 at time $t = t_{assoc}$. This factor is necessary because we have already determined in the previous step that the hemimethylated site has not been reached:

$$p(x_{i+1}, t_{assoc}|x_i, x_{meth}) = \frac{C}{\sqrt{4\pi D_{SL,\mu} t_{assoc}}}\left[\exp\left(-\frac{(x_{i+1}-x_i)^2}{4 D_{SL,\mu} t_{assoc}}\right) - \exp\left(-\frac{\left(x_{i+1}-(2x_{meth}-x_i)\right)^2}{4 D_{SL,\mu} t_{assoc}}\right)\right], \tag{7}$$

where

$$C = \frac{1}{\mathrm{erf}\left[(x_{meth}-x_i)/\sqrt{4 D_{SL,\mu} t_{assoc}}\right]}. \tag{8}$$

We increase $i$ by one to indicate that the $x_{i+1}$ determined here is the new position of the two newly dissociated clamps.

7. Use the dissociation lifetime distribution

$$p_{dissoc}(t) = \frac{x_d}{\sqrt{4\pi D_{rel} t^3}} \exp\left[-\frac{x_d^2}{4 D_{rel} t}\right] \tag{9}$$

to determine how long the clamps remain dissociated (see Appendix A). Here, $x_d$ is, as before, the initial distance between the clamps following dissociation, and $D_{rel}$ is the diffusion constant associated with the fluctuation of the distance between the clamps. Since each clamp is diffusing independently, the distance between them is also diffusing without bias in a particular direction. Denote this chosen lifetime $t_{dissoc}$ and increment the total elapsed time $t_e$ by $t_{dissoc}$.

8. Using the lifetime $t_{dissoc}$ chosen in the previous step, select the next possible association position $x_{i+1}$ from the distribution of positions at which the relative position of the clamps returns to 0. This distribution is given by the solution to the unbounded diffusion equation with constant $D_{CM}$ associated with the diffusion of the “center of mass” of the dissociated clamps (see Appendix A). In particular,

$$p_{return}(x_{i+1}|x_i, t_{dissoc}) = \frac{1}{\sqrt{4\pi D_{CM} t_{dissoc}}} \exp\left[-\frac{(x_{i+1}-x_i)^2}{4 D_{CM} t_{dissoc}}\right]. \tag{10}$$

Increase $i$ by one and return to step 2.

9. Perform many such searches and assign a value of 1 to all those that are successful and 0 to those in which the cutoff time is reached without success. Take the average value of all of these searches to determine the successful search probability. Divide the trials into 10 independent blocks with equal numbers of trials and calculate the search probability for each block to determine the standard error.

C. Determination of model parameters
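To see where each microscopic parameter enters, the ADESS steps of Sec. IV B can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: the names `adess_trial` and `adess_probability` are hypothetical; the conditional density of Eq. (7) is sampled by rejection from a Gaussian proposal (one of several valid choices); the first-passage lifetime of Eq. (9) is drawn as $x_d^2/(2 D_{rel} Z^2)$ with $Z$ standard normal; and the absolute distance to the site is used so that a site overshot during dissociated diffusion can still be found, which the text does not specify.

```python
import math
import random
import statistics


def adess_trial(x_meth, t_s, p_a, tau_a, d_sl, d_rel, d_cm, x_d, rng):
    """Run one search; return True if x_meth is reached before the cutoff t_s."""
    x, t_e = 0.0, 0.0                                   # step 1
    while t_e < t_s:                                    # step 3: cutoff check
        if rng.random() < p_a:                          # step 2: do the clamps associate?
            t_start = t_e
            t_assoc = rng.expovariate(1.0 / tau_a)      # step 4, Eq. (5)
            t_e += t_assoc
            if t_e >= t_s:                              # final association is cut short
                t_assoc = t_s - t_start
            if t_assoc <= 0.0:
                continue
            # Assumption: work with the absolute distance to the site.
            gap = abs(x_meth - x)
            side = 1.0 if x_meth >= x else -1.0
            sigma = math.sqrt(2.0 * d_sl * t_assoc)     # sqrt(4 D t) = sigma * sqrt(2)
            if rng.random() < math.erfc(gap / (sigma * math.sqrt(2.0))):  # step 5, Eq. (6)
                return True                             # step 6(a)
            if t_e >= t_s:
                return False
            while True:                                 # step 6(b): rejection-sample Eq. (7)
                y = rng.gauss(gap, sigma)               # remaining distance to the site
                accept = -math.expm1(-gap * y / (d_sl * t_assoc)) if y > 0.0 else 0.0
                if rng.random() < accept:
                    break
            x = x_meth - side * y
        z = rng.gauss(0.0, 1.0)                         # step 7, Eq. (9)
        if z == 0.0:
            return False                                # infinitely long excursion
        t_dissoc = x_d ** 2 / (2.0 * d_rel * z * z)
        t_e += t_dissoc
        if t_e >= t_s:
            return False
        x = rng.gauss(x, math.sqrt(2.0 * d_cm * t_dissoc))  # step 8, Eq. (10)
    return False


def adess_probability(x_meth, n_trials=1000, n_blocks=10, seed=0, **params):
    """Step 9: mean success probability and block-based standard error."""
    rng = random.Random(seed)
    per_block = n_trials // n_blocks
    block_means = [
        sum(adess_trial(x_meth, rng=rng, **params) for _ in range(per_block)) / per_block
        for _ in range(n_blocks)
    ]
    return statistics.fmean(block_means), statistics.stdev(block_means) / math.sqrt(n_blocks)
```

A call such as `adess_probability(500.0, t_s=185.0, p_a=..., tau_a=..., d_sl=..., d_rel=..., d_cm=..., x_d=...)` returns the estimated probability and its standard error; the parameter values must be supplied by the user. For very small `tau_a` or `p_a` the pure-Python loop becomes slow, since every microscopic event is stepped explicitly.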

The model described above is written in terms of several microscopic parameters. In this section we will determine the values of these parameters. Some of these parameters can be calculated directly from experimentally measured values and are summarized in Tab. II. For the remainder, we need to make reasonable assumptions about their values, summarized in Tab. III.

The reason that the values of these parameters must be calculated or estimated rather than be measured directly is that the spatial resolution of the experiment is diffraction limited. Since the wavelength of visible light is on the order of hundreds of nm and the protein footprints are on the order of a few nm, the proteins interact on scales below the spatial sensitivity of the experiment. Importantly, this implies that the clamps can appear to be associated with each other in the experiment when they are closer than the spatial resolution of the experiment, even though they may or may not be in actual physical contact. In contrast, in our model we define the associated state as the state in which the diffusion of the clamps is coupled, and the clamps have undergone some conformational change that allows them to interact more closely with the backbone and thus changes their diffusion rate. The dissociated state is the state in which the clamps are diffusing independently of each other. To avoid confusion, when describing the calculation of model parameters from experimental observables we will denote the state in which the clamps are physically associated as “microscopically associated,” the state in which the clamps are physically dissociated but close enough that their positions are indistinguishable within the resolution of the experiment as “proximate,” and the state in which the clamps are physically dissociated and far enough away that their positions are distinguishable as “macroscopically dissociated.” In addition, we will use “macroscopically associated” to describe clamps that could be either “microscopically associated” or “proximate” and “microscopically dissociated” for clamps that could be either “proximate” or “macroscopically dissociated.”

Parameter | Symbol | SL Value | SLH Value
Dissociated clamps relative position diffusion constant | $D_{rel}$ | (1.[…] ± […]) × […] bp²/s | (1.[…] ± […]) × […] bp²/s
Dissociated clamps “center of mass” diffusion constant | $D_{CM}$ | (7 ± […]) × […] bp²/s | (3.[…] ± […]) × […] bp²/s
MutS-MutL diffusion constant | $D_{SL,\mu}$ | (6 ± […]) × […] bp²/s | (8 ± […]) × […] bp²/s
MutS-MutL association lifetime | $\tau_{A,\mu}$ | 0.03 s ≤ $\tau_{A,\mu}$ < 30 s | 0.03 s ≤ $\tau_{A,\mu}$ < 30 s
Distance from hemimethylated site | $x_{meth}$ | […] | […]
Total search time | $t_s$ | 185 ± 35 s | 185 ± 35 s

TABLE II. Model parameter values calculated from experimental observables. The column labelled “SL Value” gives each value in the absence of MutH, whereas the column labelled “SLH Value” gives each value in the presence of MutH.

Parameter | Symbol | Value
Adjacent MutS-MutL association probability | $p_A$ | $10^{-[\ldots]} \le p_A \le$ […]
[…] | $x_d$ | […]
[…] | $x_M$ | […]

1. Diffusion constants of individual clamps

Since diffusion is scale invariant, there is no reason to believe that the microscopic diffusion constants $D_S$ and $D_L$ of the individual clamps are different from their macroscopically measured values given in Tab. I. Rewriting the diffusion of two clamps of different diffusion constants in terms of relative and “center of mass” coordinates yields $D_{rel} = D_S + D_L$ for the diffusion of the relative coordinate and $D_{CM} = \frac{D_S D_L}{D_S + D_L}$ for the diffusion of the “center of mass” coordinate.
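As a small worked illustration of these two combinations, the sketch below evaluates $D_{rel} = D_S + D_L$ and $D_{CM} = D_S D_L/(D_S + D_L)$. The function names are hypothetical, and the inputs in the usage note are illustrative numbers chosen only to reflect the roughly 20-fold ratio between MutL and MutS diffusion noted in Sec. II; they are not the measured values.

```python
def relative_diffusion(d_s, d_l):
    """Diffusion constant of the separation between two independent clamps."""
    return d_s + d_l


def center_of_mass_diffusion(d_s, d_l):
    """Diffusion constant of the 'center of mass' coordinate that is
    statistically independent of the separation."""
    return d_s * d_l / (d_s + d_l)
```

With `d_s = 7.0` and `d_l = 140.0` (arbitrary units), the relative coordinate diffuses at 147 while the “center of mass” diffuses slightly more slowly than the slower clamp, which is the general behavior of the harmonic-type combination $D_S D_L/(D_S + D_L)$.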

2. Association lifetime and complex diffusion constant

The experiment measures the lifetime $\tau_{A,M}$ and diffusion constant $D_{SL,M}$ of macroscopically associated clamps (see Tab. I). Since macroscopically associated clamps could be either microscopically associated or proximal, a macroscopic association event consists of a sequence of transitions between the microscopically associated state and the proximal state, where only after multiple excursions into the proximal state do the clamps finally reach a distance that can be resolved in the experiment and thus reach the macroscopically dissociated state. Thus, the macroscopically measured lifetime $\tau_{A,M}$ is an effective lifetime that integrates over many microscopic dissociation and re-association events, and the macroscopically measured diffusion constant $D_{SL,M}$ is a temporal average of the diffusion constant of microscopically associated clamps $D_{SL,\mu}$ and the diffusion constant of the center of mass of individual clamps $D_{CM}$ during their excursions in the proximal state.

In Appendix B 1, we explicitly calculate how the macroscopically measured lifetime $\tau_{A,M}$ that integrates over multiple microscopic dissociation and re-association events depends on the microscopic parameters of the model. Solving this dependence for the microscopic association time yields

$$\tau_{A,\mu} = \frac{1}{(\langle N_A\rangle - 1)p_A + 1}\left[\tau_{A,M} - \frac{x_M(x_M - x_d)}{2 D_{rel}}\right] \approx \frac{\tau_{A,M}}{(\langle N_A\rangle - 1)p_A + 1}, \tag{11}$$

where $p_A$ is the probability that adjacent MutS and MutL clamps will associate, and $\langle N_A\rangle = x_M/x_d$ is the number of times the clamps are in a microscopically adjacent state (making microscopic association possible) in a single macroscopic association. $x_d$ and $x_M$ are the microscopic and macroscopic association distances, respectively, so $x_d \ll x_M$. The approximation in the second line of Eq. (11) holds for our specific values of the parameters, as $\frac{x_M(x_M - x_d)}{2 D_{rel}} \approx 0.07$ s and $\tau_{A,M} \approx 30$ s. It implies that the time spent in the proximal state makes a negligible contribution to the macroscopic association time due to the speed of the dissociated diffusion, even though the fact that a macroscopic association event consists of multiple microscopic association events is relevant, as evidenced by the prefactor $[(\langle N_A\rangle - 1)p_A + 1]^{-1}$. Accordingly (see Appendix B 2), the excursions into the proximal state do not have a significant impact on the diffusion constant either, due to their short durations. Thus,

$$D_{SL,\mu} \approx D_{SL,M}. \tag{12}$$
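The approximate form of Eq. (11), $\tau_{A,\mu} \approx \tau_{A,M}/[(\langle N_A\rangle - 1)p_A + 1]$ with $\langle N_A\rangle = x_M/x_d$, can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the ratio $x_M/x_d = 1000$ used in the usage note is an assumption, chosen because with $\tau_{A,M} \approx 30$ s and $p_A = 1$ it reproduces the lower bound of 0.03 s quoted for $\tau_{A,\mu}$ in Tab. II.

```python
def microscopic_lifetime(tau_a_macro, p_a, x_macro, x_d):
    """Approximate microscopic association lifetime,
    tau_A,mu ~= tau_A,M / ((<N_A> - 1) * p_A + 1), with <N_A> = x_M / x_d.

    The small proximal-state correction in the exact expression is neglected.
    """
    n_a = x_macro / x_d  # mean number of adjacent encounters per macroscopic association
    return tau_a_macro / ((n_a - 1.0) * p_a + 1.0)
```

For instance, `microscopic_lifetime(30.0, 1.0, 1000.0, 1.0)` evaluates to about 0.03 s, while letting `p_a` approach zero recovers the macroscopic lifetime of 30 s, which brackets the range given in Tab. II.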

3. Distance from the nearest hemimethylated site

In Escherichia coli, hemimethylation occurs at GATC sites [13–15]. Thus, the distance from a random location in the genome to the nearest hemimethylated site is governed by the distance distribution of adjacent GATC sites, shown in Fig. 2 for the genome of Escherichia coli K-12 MG1655, NCBI RefSeq assembly GCF_000005845.2. While in 90% of the cases the distance between neighboring GATC sites is 500 bp or less, the largest distances between adjacent GATC sites reach all the way to 5000 bp. Since the ability to repair mismatches in the genome should depend on being able to identify the closest hemimethylated site even in the worst-case scenario of being right in the middle of the two furthest separated GATC sites, we will report search probabilities over a range of $x_{meth} = 500 -$ […].

FIG. 2. (color online) Distribution of hemimethylated site distances in the Escherichia coli genome. For each separation on the horizontal axis, the vertical axis shows the number of adjacent GATC sites in the Escherichia coli K-12 MG1655, NCBI RefSeq assembly GCF_000005845.2 genome with at least that separation.
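A distribution of this kind can be computed from any genome sequence by locating every GATC occurrence and counting the adjacent separations that are at least a given length. The sketch below is a minimal illustration with hypothetical function names, not the procedure used to produce Fig. 2. It treats the sequence as linear (the wrap-around gap of a circular chromosome is ignored, a simplifying assumption) and scans a single strand, which suffices because GATC is its own reverse complement.

```python
def gatc_positions(sequence):
    """Return the 0-based start index of every GATC occurrence."""
    seq = sequence.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start)
        start = seq.find("GATC", start + 1)
    return positions


def adjacent_separations(positions):
    """Distances (in bp) between neighboring GATC sites."""
    return [b - a for a, b in zip(positions, positions[1:])]


def count_at_least(separations, distance):
    """Number of adjacent-site separations that are >= distance."""
    return sum(1 for s in separations if s >= distance)
```

Applied to a genome string loaded from a FASTA file, `count_at_least(adjacent_separations(gatc_positions(genome)), d)` evaluated over a range of `d` gives the kind of cumulative count described in the caption of Fig. 2.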

4. Total search time

The search continues until either MutS or MutL dissociates from the DNA. Since the experimentally determined MutS association lifetime $\tau_S = 185 \pm 35$ s is much shorter than the experimentally determined MutL association lifetime $\tau_L = 850 \pm 150$ s, the search time is limited by the MutS association lifetime and thus $t_s = \tau_S$.

5. Dissociation distances and association probability

Unlike the microscopic association lifetime, microscopic diffusion constants, and the distance from hemimethylated sites, the dissociation distances $x_d$ and $x_M$ and the association probability $p_A$ are not determined by experimental observables, and thus cannot be calculated directly. Physical arguments, however, allow estimation of $x_d$ and $x_M$. In particular, the microscopic dissociation distance, i.e., the distance at which the clamps can be considered as independent, is on the order of $x_d \sim$ […] $x_M \sim 300$ nm $\sim$ […] $p_A$, but arguments can be made to set limits on this parameter. As a probability, the upper limit on $p_A$ is evidently 1. Approximation of a lower limit is made possible by the assumption that $p_A \ge P_{assoc,soln}$, where $P_{assoc,soln}$ is the probability that a MutL in solution colliding with a DNA-bound MutS will associate. This assumption is plausible since there is only one dimension (namely rotation around the DNA) in which MutS and MutL clamps already associated with the DNA must align in order to associate with each other, rather than the three dimensions that must align when MutL is not already associated with the DNA. This assumption, combined with published experimental results independent of the experiments in [3], suggests that the association probability must be greater than $10^{-[\ldots]}$ (see Appendix B 3):

$$10^{-[\ldots]} \le p_A \le 1.$$

D. Validation of the ADESS approach

In order to validate the ADESS approach and the microscopic parameter calculation, we compare ADESS to a much more time-consuming simulation that explicitly tracks the positions along the DNA and the interactions of MutS, MutL, and MutS-MutL clamps. This simulation uses Gillespie's stochastic simulation algorithm [16] to choose a clamp and a direction in which to move it in every step (see Appendix C for details). Each move has a step size of a single base pair; thus, we will denote this simulation approach as the base pair stepping simulation (BPSS). By counting those positions over which a MutS-MutL complex passed as having been searched, this simulation provides an alternative tool by which the successful search probability can be calculated. Since the BPSS approach follows every single diffusion step of the clamps, it becomes computationally infeasible to obtain sufficient statistics for realistic values of the diffusion constants, and we thus perform this validation for $D_S = 10^{[\ldots]}\ \mathrm{bp^2/s}$, $D_{SL} = 10^{[\ldots]}\ \mathrm{bp^2/s}$, and $D_L = 10^{[\ldots]}\ \mathrm{bp^2/s}$, which are each about two orders of magnitude smaller than the actual experimentally determined diffusion constants. Fig. 3 compares the search probability calculated using the BPSS approach with the search probability calculated using the ADESS approach and finds them to yield identical results within statistical error.

FIG. 3. (color online) Comparison between successful search probabilities calculated using BPSS and ADESS. (a) is calculated with $p_A = 1$ and (b) with $p_A = 10^{-[\ldots]}$. The statistical uncertainty is smaller than the size of the symbols.

Additionally, the BPSS allows us to validate Eq. (11) for the microscopic association lifetime $\tau_{A,\mu}$ empirically. In particular, the BPSS approach lets us keep track of the distance between separate clamps and the times at which these distances occur. Using this feature, we calculate the time $t_{A,M}$ for which the clamps remain within the macroscopic association distance $x_M$ of each other, i.e., the time until they first reach the macroscopically dissociated state. Fig. 4 shows histograms of this time to reach the macroscopically dissociated state calculated from simulations that use the microscopic association lifetime calculated via Eq. (11). We find that these simulated distributions accurately reproduce the experimentally measured macroscopic association lifetime $\tau_{A,M} \approx$ [...] $\pm$ [...]

FIG. 4. (color online) Histogram of simulated macroscopic association times for (a) $p_A = 1$ and (b) $p_A = 10^{-[\ldots]}$. The line is given by $\tau_{A,M}^{-1} \exp\left(-t_{A,M}/\tau_{A,M}\right)$, where $\tau_{A,M}$ is the average of the simulated macroscopic association times. This line therefore demonstrates that the association time probability decays exponentially with a decay constant consistent with the macroscopic association lifetime of $\tau_{A,M} \approx$ [...] $\pm$ [...]

E. Robustness of results to variation in estimated parameters

Since several model parameters can only be estimated (see Tab. III), we next determine how sensitive our model is to variations in these parameters. The parameter with the largest uncertainty is the microscopic association probability $p_A$. In order to gauge the sensitivity of the model to this parameter, we hold all other parameters constant at their values given in Tabs. II and III (both in the presence of, and the absence of, MutH) while varying the microscopic association probability over its entire potential range given in Eq. (13). Then, we numerically calculate the main observable of our model, namely the probability of a successful search, using the ADESS approach described in Sec. IV B. Fig. 5 shows the resulting search probabilities as a function of search distance $x_{\mathrm{meth}}$ for different values of the association probability $p_A$. We note that the successful search probability is largely independent of the microscopic association probability $p_A$ as long as $p_A \geq 0.001$ and then drops significantly for $p_A = 10^{-[\ldots]}$. Since a significantly reduced search probability would be evolutionarily disadvantageous and our lower limit of $p_A \geq 10^{-[\ldots]}$ originated from a fairly generous "worst case" analysis (see Appendix B 3), we from here on focus on the range $0.001 \leq p_A \leq 1$. In this range the search probability is largely insensitive to the value of $p_A$.

FIG. 5. (color online) Search probability as a function of search distance for different values of the association probability $p_A$. (a) was calculated with non-MutH parameters, while (b) was calculated with MutH parameters. The statistical uncertainty is smaller than the size of the symbols.

At first sight it appears unintuitive for the overall search probability to be so insensitive to three orders of magnitude of variation in the probability that two adjacent clamps successfully form a complex. However, the microscopic association probability $p_A$ appears in Eq. (11) for the microscopic association lifetime. Thus, different values of the microscopic association probability $p_A$ yield different values of the microscopic association lifetime $\tau_{A,\mu}$ in order to keep the macroscopic association lifetime $\tau_{A,M}$ consistent with its measured value. The relative insensitivity of the search probability to the value of the microscopic association probability thus indicates that changes to the microscopic association lifetime compensate for the significant variation in microscopic association probabilities over three orders of magnitude. This also explains the change in behavior at $p_A = 0.001$. Since $\langle N_A \rangle = 1000$ for our parameters, the denominator $(\langle N_A \rangle - 1)\,p_A + 1$ in Eq. (11) is larger than one for $p_A \geq 0.001$ and asymptotes to one for $p_A < 0.001$. For $p_A \geq 0.001$ the clamps go through multiple re-association events before final dissociation, the lifetime of which compensates for the change in the microscopic association probability $p_A$. For $p_A < 0.001$, $\tau_{A,\mu}$ is locked to the macroscopic association lifetime $\tau_{A,M}$, and is no longer able to compensate for changes in the association probability $p_A$.

Similar to our analysis of the sensitivity to the association probability $p_A$, we vary the values of the dissociation distances $x_d$ and $x_M$ by a factor of two in each direction to determine the sensitivity of the search probability to changes in these parameters at both limits of $p_A$. Fig. 6 demonstrates that for $p_A = 1$ and $p_A = 10^{-3}$ variation of the dissociation distances $x_d$ and $x_M$ by a factor of two only introduces a relative difference of up to 13%. We thus conclude that the difference between the approximate and exact values of the dissociation distances $x_d$ and $x_M$ will not significantly affect our results.

V. DISSOCIATIVE SEARCH EFFICIENCY

In this section we will systematically compare the efficiency of the dissociative search, involving multiple dissociation-reassociation cycles of the two clamps, with a non-dissociative search, in which the complex of the two clamps searches the DNA via simple diffusion. The goal is to determine whether the dissociative search observed in the experiments by Liu et al. [3] confers an evolutionary advantage of increased success probability over the simpler non-dissociative search. The successful search probability of the dissociative search is calculated numerically using the ADESS approach presented in Sec. IV B, while the successful search probability of the non-dissociative search is given analytically by Eq. (4).

FIG. 6. (color online) Comparison of ADESS results for factor of two variations in the macroscopic and microscopic dissociation distances. (a) is calculated with $p_A = 1$ and (b) with $p_A = 0.001$.

A. Dissociative and non-dissociative searches result in similar single search efficiency for experimental diffusion constants

Fig. 7 shows the successful search probability $P^{(*)}_{t_s,1}$ of the dissociative search and $P^{(0)}_{t_s,1}$ of the non-dissociative search for the experimentally determined values of the diffusion constants as a function of distance $x_{\mathrm{meth}}$ from the hemimethylated site. Here, the subscript $t_s$ indicates the search time in seconds, and the subscript 1 indicates that the probability is the success probability for only a single search. Probabilities are shown for various search times $t_s$ within roughly a factor of two from the experimental value of 185 s in both directions. The figure presents results for diffusion constants corresponding to the case where MutH is not associated with MutL and $p_A = 1$ in (a), and for diffusion constants corresponding to the case where MutH is associated with MutL and $p_A = 0.001$ in (b). These are chosen as the two extremes in terms of the differences between dissociative and non-dissociative searches, as the results for MutH parameters at $p_A = 1$ and for non-MutH parameters at $p_A = 0.001$ are in between the two cases shown.

Surprisingly, the non-dissociative search mechanism somewhat, but systematically, outperforms the dissociative mechanism for this choice of parameters, especially for the case of microscopic association probability $p_A = 0.001$.

B. Dissociative searches confer an advantage across a broad range of diffusion constants

In the crowded in vivo environment, diffusion is likely significantly slower (10- to 100-fold) than in vitro [17]. Additionally, the diffusion constants, hemimethylated site distances, and association lifetimes of mismatch repair proteins may vary across organisms. In light of these observations, we next characterize the relative effect of the dissociative search mechanism across a wide range of possible diffusion rates. Although we only explicitly vary the diffusion rate, this can be seen as variation of the dimensionless combination $\sqrt{Dt}/x$ on which the probability depends (see Eq. (4)). Thus, we effectively study variations in association time $t_s$ and hemimethylated site distance $x_{\mathrm{meth}}$ as well as diffusion rate.

In order to characterize the effect of the dissociative search mechanism across many possible diffusion rates, times, and distances, we systematically vary diffusion rates and measure the relative advantage conferred by the dissociative mechanism. Fig. 8 shows the relative probability $r$, defined as

$$r \equiv P^{(*)}_{t_s,1} / P^{(0)}_{t_s,1} \tag{14}$$

for $t_s = 185$ s. The darkness of the color indicates the magnitude of the relative probability. The squares that are brown and have hatching are those in which the dissociative mechanism lowers the successful search probability ($r < 1$), while the green, unhatched squares are those with $r > 1$, i.e., areas of increased probability due to the dissociative search mechanism. To ensure that smaller differences are visible, relative differences $r > 100$ and $r < 1/100$ are set to $r = 100$ and $r = 1/100$, respectively. The slow diffusion rate $D_{SL}$ is varied along the vertical axis, while the fast diffusion rates $D_S$ and $D_L$ are varied along the horizontal axis. In order to restrict the plot to two dimensions, the ratio between the two fast rates is guided by experiment: either both rates are the same, or they differ by an order of magnitude, roughly corresponding to the situation in the presence and in the absence of MutH, respectively. The framed square corresponds to the in vitro diffusion constants of E. coli and the dashed lines enclose the range of diffusion constants that are smaller than the in vitro diffusion constants, consistent with the in vivo expectation [17]. The solid (blue) line indicates a reduction of the in vitro diffusion constants by two orders of magnitude while maintaining the in vitro ratio between $D_{SL}$ and $D_S$. It is important to note that although in vivo diffusion rates for E. coli are likely to fall within the region enclosed by the dashed lines, this may not necessarily be the case for other organisms.

FIG. 7. (color online) Successful search probability of dissociative and non-dissociative searches as a function of distance $x_{\mathrm{meth}}$ from the hemimethylated site for different search times. Results in (a) use experimental parameters in the case where MutH is not associated with MutL and $p_A = 1$, and (b) when MutH is associated with MutL and $p_A = 0.001$.

FIG. 8. (color online) Relative successful search probability as a function of diffusion constants for $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column). The former corresponds roughly to the case in which MutH is not present, and the latter corresponds roughly to the case in which MutH is present. The color scale indicates the ratio of ADESS dissociative and analytic non-dissociative probabilities, and is cut off at $10^{2}$ and $10^{-2}$ so that smaller variations remain visible. Ratios greater than $10^{2}$ and less than $10^{-2}$ are set to $10^{2}$ and $10^{-2}$, respectively. The ratios less than one are hatched, while the ratios greater than one are solid. The outlined square indicates the order of magnitude of the experimental diffusion constants, the possible in vivo E. coli diffusion constants are enclosed within the dashed lines, and the non-physical ($D_{SL} < D_S$) regions of the coefficient space are blocked out (in red).

We find that differences between the search mechanisms are most significant for the largest distances from the hemimethylated site. Also, as expected, combinations of slow associated diffusion and fast dissociated diffusion are most favored by the dissociative mechanism (green/unhatched regions of the plot), whereas combinations of fast associated diffusion $D_{SL}$ and slow dissociated diffusion $D_S$ and $D_L$ are least favored by the dissociative mechanism (brown/hatched regions of the plot). The latter case is often physically unrealistic since the associated clamps must diffuse more slowly than the individual clamps in order to interact with the DNA backbone and recognize the hemimethylated site. Accordingly, the regions of the plot in which $D_{SL} < D_S$ are blocked out in red, eliminating much of the space that would be disfavored by the dissociative mechanism. This renders the area favored by dissociation broad by comparison.

The area which is most highly favored by the dissociative mechanism (the dark green region), however, occurs at very low single search success probabilities.
While the relative probability increase is quite large ($\geq$ [...]), [...] to a dissociative success probability of $10^{-[\ldots]}$ would cross some threshold probability below which failure of mismatch repair may negatively affect the organism. This point is emphasized by the inclusion of the absolute single non-dissociative search probability on the vertical axis (since this probability only depends on $D_{SL}$, it remains constant as one moves across the plot horizontally).

C. Multiple searches emphasize low probability single search differences

Data published by Acharya et al., Graham et al., and Hombauer et al. [18–20] suggest that the DNA mismatch repair process involves multiple MutS-MutL(-MutH) searches for the hemimethylated site. Thus, the cumulative probability for multiple low probability searches may result in a physiologically relevant success probability for the overall search process. In order to approximate the effect of multiple searches, we need to calculate the probability that at least one search is successful. This quantity will be referred to as the overall successful search probability. Although the proteins involved in separate searches are in principle able to interact with each other, accounting for these interactions is beyond the scope of this study. Instead, we hope to gain at least qualitative insight into the overall search probability under the assumption that the individual searches are independent. Under this assumption,

$$P_{t_s,n_s} = 1 - \left(1 - P_{t_s,1}\right)^{n_s}, \tag{15}$$

where $P_{t_s,n_s}$ is the overall search probability, $P_{t_s,1}$ is the single search probability, and $n_s$ is the total number of searches.

Figs. 9 and 10 show diffusion space scans of

$$\delta P_{t_s,n_s} \equiv P^{(*)}_{t_s,n_s} - P^{(0)}_{t_s,n_s}, \tag{16}$$

indicated by the coloring/hatching for $n_s = 3$ and $n_s = 10$ searches, respectively. Note that in these figures the difference between the two probabilities, rather than their ratio, is chosen to avoid overemphasizing large relative changes between two otherwise small probabilities.

As in Fig. 8, the physically unrealistic regions are blocked out, the probable region in which E. coli diffusion constants reside is enclosed by the dotted lines, and the approximate in vitro E. coli diffusion constants are indicated by the blue square. Since probability differences are shown in the colormap, the probability of the non-dissociative search is omitted from the vertical axis.

FIG. 9. (color online) Diffusion constant space probability difference scan for searches by $n_s = 3$ protein complexes in the cases $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column). The former corresponds roughly to the case in which MutH is not present, and the latter corresponds roughly to the case in which MutH is present. The color scale indicates the absolute difference between the ADESS dissociative and analytic non-dissociative probabilities. Differences less than zero are hatched, while differences greater than zero are solid. The square outlined in blue indicates the order of magnitude of the experimental diffusion constants, the possible in vivo E. coli diffusion constants are enclosed within the dotted lines, and the non-physical ($D_{SL} < D_S$) regions of the coefficient space are blocked out (in red).

Figs. 9 and 10 demonstrate that there is a much broader range of diffusion constants, and therefore hemimethylated site distances and association times, for which the dissociative search mechanism is beneficial for mismatch repair hemimethylated site searches as compared to pure diffusion. For 10 searches, the absolute difference in probability approaches $\delta P_{t_s,10} = 1$ for the cases in which dissociation is most favorable, whereas for 3 searches the maximum difference in probability is more modest, with $\delta P_{t_s,3} \approx 0.5$. The case with 3 searches, however, exhibits a larger regime in which the dissociation mechanism is meaningfully beneficial.
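The way Eqs. (15) and (16) amplify small single-search probabilities can be illustrated with a short Python sketch. The function names and the single-search probabilities used below are hypothetical illustration values, not results from the paper, and the independence of the searches is assumed exactly as in Eq. (15).

```python
import math


def overall_success(p_single: float, n_searches: int) -> float:
    """Eq. (15): probability that at least one of n independent searches succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_searches < 0:
        raise ValueError("n_searches must be non-negative")
    if p_single == 1.0:
        return 1.0 if n_searches > 0 else 0.0
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for very small p.
    return -math.expm1(n_searches * math.log1p(-p_single))


def probability_difference(p_dissoc: float, p_nondissoc: float, n_searches: int) -> float:
    """Eq. (16): dissociative minus non-dissociative overall success probability."""
    return overall_success(p_dissoc, n_searches) - overall_success(p_nondissoc, n_searches)


# Hypothetical single-search probabilities, chosen only for illustration.
P_DISSOC, P_NONDISSOC = 0.2, 0.002
for n in (1, 3, 10):
    print(n, probability_difference(P_DISSOC, P_NONDISSOC, n))
```

With these illustrative inputs the absolute difference grows with the number of searches, which is the qualitative effect described for Figs. 9 and 10.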

FIG. 10. (color online) Diffusion constant space probability difference scan for searches by $n_s = 10$ protein complexes in the cases $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column).

Experiments by Liu et al. [3] observed repeated association and dissociation between MutS and MutL sliding clamps involved in identification of a hemimethylated site during DNA mismatch repair in E. coli. This naturally raises the question of whether locally searching the DNA in the associated state and then quickly diffusing to a different location on the DNA when dissociated actually provides an advantage to the search process. Here, we model the dissociative search process, calculate the probability that searching DNA mismatch repair proteins successfully locate the hemimethylated site, and compare the success rate of this dissociative search to the success rate of a simple diffusive search. We find that both search mechanisms are highly efficient for the majority of observed hemimethylated site distances at measured in vitro diffusion rates. Perhaps somewhat surprisingly, there is a slight disadvantage in terms of single search probability conferred by the dissociative search mechanism for searches at these in vitro rates. We note, however, that there may be variation in diffusion rate, association lifetime, and hemimethylated site distance among different organisms and that it has been shown that in vivo diffusion can be slower than in vitro diffusion by one or two orders of magnitude [17]. Accordingly, we studied the effect of the dissociative search mechanism across a large range of the parameter space of diffusion rates, association lifetimes, and hemimethylated site distances and found that the dissociative mechanism is either neutral or favorable in most cases. We find the most significant advantages of the dissociative search in the parameter regime where the overall search probabilities (of both the dissociative and the non-dissociative searches) are very small. While successful search probabilities in the sub-percent range are probably not physiologically meaningful by themselves, we showed that they do become meaningful when taking into account that DNA mismatch repair includes multiple MutS-initiated searches for the hemimethylated site, resulting in a physiologically relevant advantage of the dissociative search mechanism for large regions of the physically realistic parameter space.

It is important to emphasize that our treatments of multiple searches and in vivo diffusion here are necessarily approximate. A more detailed treatment that accounts for the interactions between proteins that are initially involved in "separate" searches may be a fruitful avenue for future research: in principle the base pair stepping simulation is capable of tracking more than two proteins, but the current computational cost is too high. Additionally, it is likely possible to expand the association and dissociation event stepping simulation to account for more than two proteins and the presence of other molecules on the DNA strand. In particular, the presence of other molecules on the DNA strand may provide a spatial constraint that prevents the occurrence of long-lived dissociation events that decrease the efficiency of the dissociative mechanism. Moreover, we have here assumed that the first encounter of a MutS-MutL complex with a hemimethylated site results in its recognition followed by an incision. If recognition of the hemimethylated site is itself stochastic, this will also reduce the overall search probability. Incorporating this effect into our approach and quantifying its consequences for the search probabilities of the dissociative and non-dissociative searches will be an interesting direction of future research.

Another potential avenue of study is the effect of a more physiological environment on the diffusion constants of the proteins. We note that the in vivo diffusion constants are likely to be smaller than the measured in vitro coefficients, but we are not able to quantitatively predict the magnitude of this decrease. A study that determines the actual in vivo diffusion constants of mismatch repair proteins could therefore be very useful. Similarly, determination of diffusion constants in systems other than E. coli would be interesting.

We note that in addition to its role in the search for a hemimethylated site, MutL acts as a processivity factor for the DNA helicase UvrD, resulting in the excision that is necessary for the progression of the MMR process [21]. It therefore could be the case that the observed dissociative mechanism is evolutionarily preferred because the dissociation steps allow MutS to load multiple MutL proteins onto the strand, aiding in excision. This alternative hypothesis would be strengthened if further work determines that in vivo search efficiency is not increased by the dissociative mechanism, although it is also possible that the dissociative mechanism serves a dual purpose: both increasing search efficiency and loading multiple MutL proteins onto the DNA strand.

Beyond describing the specifics of the MutS-MutL search process, our approach in this paper is likely to be applicable to other diffusive processes along DNA in biology. For instance, Zessin et al. observe a fast and a slow diffusion rate of proliferating cell nuclear antigen (PCNA), which is a eukaryotic protein similar to a β clamp that also forms a clamp structure during association with DNA [22]. Eukaryotes also exhibit three homologs to both MutS and MutL [6], combinations of which are likely to result in a variety of association/dissociation and diffusion parameters. In this case, the broad parameter space characterized by our analysis may provide insight into MMR in many organisms.

Despite the work still necessary to fully understand the diffusive search process in DNA mismatch repair, we provide a broad characterization of the observed dissociative search mechanism along with a robust analytical and computational framework with which to study diffusion and interaction of protein clamps in DNA mismatch repair, which can provide the basis for generalization to other sliding clamp systems in biology.

VII. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant No. DMR-1719316 to RB and by the National Institutes of Health under Grant Nos. GM129764 and CA067007 to RF.

[1] Juana V. Martin-Lopez and Richard Fishel, "The mechanism of mismatch repair and the functional analysis of mismatch repair defects in Lynch syndrome," Fam. Cancer, 159–168 (2013).
[2] Gloria X. Reyes, Tobias T. Schmidt, Richard D. Kolodner, and Hans Hombauer, "New insights into the mechanism of DNA mismatch repair," Chromosoma, 443–462 (2015).
[3] Jiaquan Liu, Jeungphill Hanne, Brooke M. Britton, Jared Bennett, Daehyung Kim, Jong-Bong Lee, and Richard Fishel, "Cascading MutS and MutL sliding clamps control DNA diffusion to activate mismatch repair," Nature (London), 583–587 (2016).
[4] Jean Y.J. Wang and Winfried Edelmann, "Mismatch repair proteins as sensors of alkylation DNA damage," Cancer Cell, 417–418 (2006).
[5] Ravi R Iyer, Anna Pluciennik, Vickers Burdett, and Paul L Modrich, "DNA mismatch repair: functions and mechanisms," Chem. Rev., 302–323 (2006).
[6] Richard Fishel, "Mismatch repair," J. Biol. Chem., 26395–26403 (2015).
[7] Otto G. Berg, Robert B. Winter, and Peter H. Von Hippel, "Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory," Biochemistry, 6929–6948 (1981).
[8] Ohad Givaty and Yaakov Levy, "Protein sliding along DNA: Dynamics and structural characterization," J. Mol. Biol., 1087–1097 (2009).
[9] M O'Donnell, J Kuriyan, X P Kong, P T Stukenberg, and R Onrust, "The sliding clamp of DNA polymerase III holoenzyme encircles DNA," Mol. Biol. Cell, 953–957 (1992).
[10] Dina Daitchman, Harry M Greenblatt, and Yaakov Levy, "Diffusion of ring-shaped proteins along DNA: case study of sliding clamps," Nucleic Acids Res., 5935–5949 (2018).
[11] Xiang-Peng Kong, Rene Onrust, Mike O'Donnell, and John Kuriyan, "Three-dimensional structure of the β subunit of E. coli DNA polymerase III holoenzyme: A sliding DNA clamp," Cell, 425–437 (1992).
[12] Sidney Redner, A guide to first-passage processes (Cambridge University Press, 2001).
[13] Sanford Lacks and Bill Greenberg, "Complementary specificity of restriction endonucleases of Diplococcus pneumoniae with respect to DNA methylation," J. Mol. Biol., 153–168 (1977).
[14] Stanley Hattman, Joan E Brooks, and Malthi Masurekar, "Sequence specificity of the P1 modification methylase (M·Eco P1) and the DNA methylase (m·Eco dam) controlled by the Escherichia coli dam gene," J. Mol. Biol., 367–380 (1978).
[15] Gail E Geier and Paul Modrich, "Recognition sequence of the dam methylase of Escherichia coli K12 and mode of cleavage of Dpn I endonuclease," J. Biol. Chem., 1408–1413 (1979).
[16] Daniel T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," J. Phys. Chem., 2340–2361 (1977).
[17] Michael C Konopka, Irina A Shkel, Scott Cayley, M Thomas Record, and James C Weisshaar, "Crowding and confinement effects on protein diffusion in vivo," J. Bacteriol., 6115–6123 (2006).
[18] Samir Acharya, Patricia L Foster, Peter Brooks, and Richard Fishel, "The coordinated functions of the E. coli MutS and MutL proteins in mismatch repair," Mol. Cell, 233–246 (2003).
[19] William J Graham, Christopher D Putnam, Richard D Kolodner, et al., "The properties of Msh2–Msh6 ATP binding mutants suggest a signal amplification mechanism in DNA mismatch repair," J. Biol. Chem., 18055–18070 (2018).
[20] Hans Hombauer, Christopher S Campbell, Catherine E Smith, Arshad Desai, and Richard D Kolodner, "Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates," Cell, 1040–1053 (2011).
[21] Jiaquan Liu, Ryanggeun Lee, Brooke M Britton, James A London, Keunsang Yang, Jeungphill Hanne, Jong-Bong Lee, and Richard Fishel, "MutL sliding clamps coordinate exonuclease-independent Escherichia coli mismatch repair," Nat. Commun., 1–15 (2019).
[22] Patrick JM Zessin, Anje Sporbert, and Mike Heilemann, "PCNA appears in two populations of slow and fast diffusion with a constant ratio throughout S-phase in replicating mammalian cells," Sci. Rep., 18779 (2016).
[23] Eli Ben-Naim, Paul L. Krapivsky, and Sidney Redner, "Random walk/diffusion," http://physics.bu.edu/~redner/542/book/rw.pdf (2008), accessed: 2020-06-22.
[24] Paul Krapivsky, Sidney Redner, and Eli Ben-Naim, A Kinetic View of Statistical Physics (Cambridge University Press, 2010).
[25] M.V. Smoluchowski, "Versuch einer mathematischen Theorie der Koagulationskinetik kolloider Lösungen," Z. Phys. Chem., 129–168 (1917).
[26] Laura Manelyte, Claus Urbanke, Luis Giron-Monzon, and Peter Friedhoff, "Structural and functional analysis of the MutS C-terminal tetramerization domain," Nucleic Acids Res., 5270–5279 (2006).
[27] Michelle Grilley, Katherine M Welsh, SS Su, and Paul Modrich, "Isolation and characterization of the Escherichia coli MutL gene product," J. Biol. Chem., 1000–1004 (1989).
[28] Samir Acharya, Patricia L Foster, Peter Brooks, and Richard Fishel, "The coordinated functions of the E. coli MutS and MutL proteins in mismatch repair," Mol. Cell, 233–246 (2003).

Appendix A: Time and location of re-association

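In this appendix we derive the probability densities for the time to reassociation and the reassociation location of two clamps once they have dissociated from each other. These distributions are used in the ADESS approach to update the time and position after a microscopic excursion of the clamps.

1. Independent diffusion of two sliding clamps

While the two clamps are diffusing independently, the state of the system is given by the positions $x_S$ and $x_L$ of the MutS and the MutL clamp along the DNA, respectively. The joint probability distribution for the two clamps follows the diffusion equation

$$\frac{\partial p(x_S, x_L | t)}{\partial t} = D_S \frac{\partial^2 p(x_S, x_L | t)}{\partial x_S^2} + D_L \frac{\partial^2 p(x_S, x_L | t)}{\partial x_L^2}. \tag{A1}$$

By analogy to the Schrödinger equation for a two-body quantum mechanical problem, this equation can be rewritten in terms of relative and "center-of-mass" coordinates. Since the diffusion constants play the role of inverse masses, each clamp position is weighted by the diffusion constant of the other clamp, which is the choice for which the mixed derivative cancels. In particular, substituting

$$x_{\mathrm{CM}} \equiv \frac{D_L x_S + D_S x_L}{D_S + D_L}, \tag{A2}$$

$$x_{\mathrm{rel}} \equiv x_S - x_L, \tag{A3}$$

$$D_{\mathrm{CM}} \equiv \frac{D_S D_L}{D_S + D_L}, \quad \text{and} \tag{A4}$$

$$D_{\mathrm{rel}} \equiv D_S + D_L \tag{A5}$$

yields

$$\frac{\partial p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial t} = D_{\mathrm{CM}} \frac{\partial^2 p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial x_{\mathrm{CM}}^2} + D_{\mathrm{rel}} \frac{\partial^2 p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial x_{\mathrm{rel}}^2}, \tag{A6}$$

which describes independent diffusion of the "center of mass" coordinate $x_{\mathrm{CM}}$ with diffusion constant $D_{\mathrm{CM}}$ and the relative coordinate $x_{\mathrm{rel}}$ with diffusion constant $D_{\mathrm{rel}}$.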
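The decoupling into center-of-mass and relative diffusion can be checked numerically. The sketch below assumes the center-of-mass coordinate weights each clamp position by the other clamp's diffusion constant, $x_{\mathrm{CM}} = (D_L x_S + D_S x_L)/(D_S + D_L)$, applies the chain rule to the right-hand side of Eq. (A1), and returns the resulting coefficients of the two second derivatives and of the mixed derivative. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
from typing import Tuple


def transformed_coefficients(d_s: float, d_l: float) -> Tuple[float, float, float]:
    """Coefficients of the diffusion operator in (x_CM, x_rel) coordinates.

    With x_CM = w_s * x_S + w_l * x_L and x_rel = x_S - x_L, the chain rule gives
    d/dx_S = w_s * d/dx_CM + d/dx_rel and d/dx_L = w_l * d/dx_CM - d/dx_rel.
    """
    total = d_s + d_l
    w_s = d_l / total  # weight of x_S: the other clamp's diffusion constant
    w_l = d_s / total  # weight of x_L
    d_cm = d_s * w_s ** 2 + d_l * w_l ** 2   # coefficient of d^2/dx_CM^2
    d_rel = d_s + d_l                        # coefficient of d^2/dx_rel^2
    cross = 2.0 * (d_s * w_s - d_l * w_l)    # coefficient of the mixed derivative
    return d_cm, d_rel, cross


# Hypothetical diffusion constants in arbitrary units (not values from the paper).
D_S, D_L = 1.0, 10.0
d_cm, d_rel, cross = transformed_coefficients(D_S, D_L)
print(d_cm, D_S * D_L / (D_S + D_L), d_rel, cross)
```

The first two printed numbers agree with Eq. (A4), and the mixed-derivative coefficient vanishes up to rounding, which is what allows the two coordinates to be treated as independent diffusers.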

2. Time of reassociation

In our model, the microscopic dissociation of the two clamps results in them being separated by the microscopic dissociation distance $x_d$. Since the relative and center-of-mass positions diffuse independently, the time to reassociation is the time the freely diffusing relative coordinate $x_{\mathrm{rel}}$ takes to reach $x_{\mathrm{rel}} = 0$ when starting at $x_{\mathrm{rel}} = x_d$. This problem is mathematically equivalent to the problem of the associated clamps reaching the hemimethylated site $x_{\mathrm{meth}}$ after starting at some position $x_0$. We can thus mirror image Eq. (3) (since $x_{\mathrm{rel}} = 0$ provides a left boundary for this problem while $x_{\mathrm{meth}}$ provided a right boundary in the context of Eq. (3)) and replace $x_0$ with $x_d$, $x_{\mathrm{meth}}$ with 0, and $D_{SL}$ with $D_{\mathrm{rel}}$ to obtain

$$P(t \,|\, x_{\mathrm{rel}} > 0) = \mathrm{erf}\left( \frac{x_d}{\sqrt{4 D_{\mathrm{rel}} t}} \right) \tag{A7}$$

for the probability that at time $t$ the two clamps starting at an initial distance of $x_d$ have not yet touched. The probability density associated with the return of the distance between the two clamps to 0 from a distance of $x_d$ is therefore given by the negative derivative of this probability, i.e.,

$$p_{\mathrm{dissoc}}(t) = -\frac{\partial P(t \,|\, x_{\mathrm{rel}} > 0)}{\partial t} = \frac{x_d}{\sqrt{4 \pi D_{\mathrm{rel}} t^3}} \exp\left[ -\frac{x_d^2}{4 D_{\mathrm{rel}} t} \right]. \tag{A8}$$
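The relation between the survival probability of Eq. (A7) and the first-passage density of Eq. (A8) can be verified with a few lines of Python. The sketch below integrates the density numerically and compares it with one minus the survival probability. It also draws reassociation times by sampling $x_d^2 / (2 D_{\mathrm{rel}} z^2)$ with $z$ a standard normal variate, which is one standard way to sample this (Lévy-type) distribution; whether the ADESS implementation samples times this way is not stated here, so this is an assumption. All names and numerical values are hypothetical.

```python
import math
import random


def survival(t: float, x_d: float, d_rel: float) -> float:
    """Eq. (A7): probability that the clamps have not yet met at time t."""
    return math.erf(x_d / math.sqrt(4.0 * d_rel * t))


def reassociation_density(t: float, x_d: float, d_rel: float) -> float:
    """Eq. (A8): probability density of the first meeting time."""
    prefactor = x_d / math.sqrt(4.0 * math.pi * d_rel * t ** 3)
    return prefactor * math.exp(-x_d ** 2 / (4.0 * d_rel * t))


def integrated_density(t_end: float, x_d: float, d_rel: float, n_steps: int = 20000) -> float:
    """Midpoint-rule integral of the density from 0 to t_end."""
    h = t_end / n_steps
    return h * sum(reassociation_density((i + 0.5) * h, x_d, d_rel) for i in range(n_steps))


def sample_reassociation_time(x_d: float, d_rel: float, rng: random.Random) -> float:
    """Draw one meeting time as x_d**2 / (2 * d_rel * z**2), z standard normal."""
    z = 0.0
    while z == 0.0:
        z = rng.gauss(0.0, 1.0)
    return x_d ** 2 / (2.0 * d_rel * z ** 2)


# Hypothetical, dimensionless illustration values (not taken from the paper).
X_D, D_REL, T_END = 1.0, 1.0, 5.0
met_by_t_end = integrated_density(T_END, X_D, D_REL)
rng = random.Random(0)
samples = [sample_reassociation_time(X_D, D_REL, rng) for _ in range(20000)]
fraction_unmet = sum(t > T_END for t in samples) / len(samples)
print(met_by_t_end, 1.0 - survival(T_END, X_D, D_REL), fraction_unmet)
```

The integrated density should match one minus the survival probability, and the fraction of sampled times exceeding the cutoff should approximate the survival probability itself.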

3. Location of reassociation

Since at the time of reassociation the two clamps are at the same location, all we have to do to find the location of this event is to follow the motion of the center of mass coordinate $x_{\mathrm{CM}}$ during the excursion. Since this is a free diffusion, the probability density for the location of the meeting point $x$ of the two clamps after a time $t$, given that they dissociated at some location $x_0$, is

$$p_{\mathrm{return}}(x | x_0, t) = \frac{1}{\sqrt{4\pi D_{\mathrm{CM}} t}} \exp\left[-\frac{(x - x_0)^2}{4 D_{\mathrm{CM}} t}\right]. \tag{A9}$$

Appendix B: Microscopic Parameter Calculation

The following are the full calculations used to determine the microscopic protein dynamics from experimental observables. In particular, we calculate the microscopic diffusion constant, $D_{SL,\mu}$, and the microscopic association lifetime, $\tau_{A,\mu}$. The calculations of $P_M$ and $\tau(x)$ closely follow [23], a web-published early draft of [24].

1. MutS-MutL Association Lifetime

First, we calculate the microscopic association lifetime. Consider first the macroscopic association lifetime, which can be written as

$$\tau_{A,M} = \tau_{A,\mu}\left[(\langle N_A \rangle - 1)\, p_A + 1\right] + \tau_R (\langle N_A \rangle - 1) + \tau_M, \tag{B1}$$

where $N_A$ is the number of times the clamps are microscopically adjacent during a single macroscopic association, $p_A$ is the probability of microscopic association given that the clamps are adjacent, $\tau_R$ is the average time to return to the adjacent state, and $\tau_M$ is the average time to reach distance $x_M$ without returning to the adjacent state (i.e., the average time to macroscopic dissociation). Note that removing a single adjacent state from the factor multiplied by $p_A$ and multiplying it directly by $\tau_{A,\mu}$ ensures that there is at least one microscopic association in every macroscopic association. This must be true physically, since different diffusion rates are observed during macroscopic association.

Consider $N_A$ for a complex starting in the aggregate state:

$$\begin{aligned} P(N_A = 1) &= P_M \\ P(N_A = 2) &= (1 - P_M) P_M \\ P(N_A = 3) &= (1 - P_M)^2 P_M \\ P(N_A) &= (1 - P_M)^{N_A - 1} P_M \end{aligned} \tag{B2}$$

where $P_M$ is the probability for a newly microscopically dissociated complex to go to $x_M$. Thus,

$$\langle N_A \rangle = P_M \sum_{N_A = 1}^{\infty} N_A (1 - P_M)^{N_A - 1} = \frac{1}{P_M}. \tag{B3}$$

In order to determine $P_M$ we first consider $P_M$ as a function of the distance between the clamps, which we will denote as $x$ for the remainder of this subsection to avoid the more cumbersome notation of $x_{\mathrm{rel}}$ used in the rest of the manuscript. Evaluation of this function at $x = x_d$ will give $P_M$. ($P_M(x)$ will refer to the probability to go to $x_M$ from some position $x$ without visiting 0, while $P_M \equiv P_M(x_d)$ refers to the probability to go to $x_M$ from $x_d$.) Additionally, since the clamps diffuse with intermittent DNA contact, $P_M(x)$ will be calculated under the assumption that the distance between clamps diffuses continuously. This allows us to write

$$\begin{aligned} P_M(x) &= \tfrac{1}{2} P_M(x + \delta x) + \tfrac{1}{2} P_M(x - \delta x) \\ 0 &= \frac{P_M(x + \delta x) - 2 P_M(x) + P_M(x - \delta x)}{\delta x^2} \end{aligned} \tag{B4}$$

and therefore

$$\frac{\partial^2 P_M(x)}{\partial x^2} = 0 \tag{B5}$$

with the boundary conditions

$$P_M(0) = 0, \qquad P_M(x_M) = 1. \tag{B6}$$

The unique solution of this differential equation is

$$P_M(x) = \frac{x}{x_M} \tag{B7}$$

and thus

$$P_M \equiv P_M(x_d) = \frac{x_d}{x_M}, \tag{B8}$$

where $x_d$ is the separation of the clamps immediately following dissociation. Therefore we conclude that

$$\langle N_A \rangle = x_M / x_d. \tag{B9}$$

In order to compute the microscopic association lifetime $\tau_{A,\mu}$ from Eq. (B1), it is also necessary to compute the average return time $\tau_R$ and the average time $\tau_M$ to reach $x_M$. To this end, consider the average time $\tau(x)$ for the distance between the clamps to reach either 0 or $x_M$ given that the starting distance is $x$:

$$\tau(x) = \sum_{\mathrm{paths}} t_p(x) P_p(x), \tag{B10}$$

where $t_p(x)$ is the time for a path starting at $x$ and $P_p(x)$ is the probability of such a path. Consideration of the effect of a single infinitesimal time step $\delta t$ allows us to write

$$\begin{aligned} \tau(x) &= \sum_{\mathrm{paths}} t_p(x) P_p(x) \\ &= \sum_{\mathrm{paths}} \left[ \tfrac{1}{2} t_p(x + \delta x) P_p(x + \delta x) + \tfrac{1}{2} t_p(x - \delta x) P_p(x - \delta x) \right] + \delta t \\ &= \tfrac{1}{2} \tau(x + \delta x) + \tfrac{1}{2} \tau(x - \delta x) + \delta t. \end{aligned} \tag{B11}$$

Thus, division by the square of some small spatial step $\delta x$ yields

$$-\frac{2 \delta t}{\delta x^2} = \frac{\tau(x + \delta x) + \tau(x - \delta x) - 2 \tau(x)}{\delta x^2}. \tag{B12}$$

Therefore,

$$\frac{\partial^2 \tau(x)}{\partial x^2} = -\frac{2 \delta t}{\delta x^2} = -\frac{1}{D_{\mathrm{rel}}}, \tag{B13}$$

where we write the right hand side in terms of the diffusion constant $D_{\mathrm{rel}} = D_S + D_L$, using $D_{\mathrm{rel}} = \delta x^2 / (2 \delta t)$. The boundary conditions

$$\tau(0) = 0, \qquad \tau(x_M) = 0 \tag{B14}$$

allow us to conclude

$$\tau(x) = \frac{x}{2 D_{\mathrm{rel}}} (x_M - x). \tag{B15}$$

We now write this quantity in terms of $\tau_R$ and $\tau_M$ as follows:

$$\langle N_A \rangle \tau(x_d) = \tau_R (\langle N_A \rangle - 1) + \tau_M. \tag{B16}$$

Thus, substitution into Eq. (B1) yields

$$\tau_{A,M} = \tau_{A,\mu}\left[(\langle N_A \rangle - 1)\, p_A + 1\right] + \langle N_A \rangle \tau(x_d). \tag{B17}$$

Finally, we can conclude

$$\tau_{A,\mu} = \frac{\tau_{A,M} - \langle N_A \rangle \tau(x_d)}{(\langle N_A \rangle - 1)\, p_A + 1}, \tag{B18}$$

where $\langle N_A \rangle = x_M / x_d$.
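The chain from Eq. (B7) to Eq. (B18) can be sketched in code as follows. This is an illustrative example under stated assumptions: all function names are hypothetical, and the parameter values passed to `microscopic_lifetime` are placeholders rather than the experimental values. The helper `lattice_check` relaxes the discrete recursions behind Eqs. (B4) and (B11) on a small lattice with unit spatial and time steps (so that $D_{\mathrm{rel}} = 1/2$), which gives an independent check of the closed forms in Eqs. (B7) and (B15).

```python
def p_reach_xm(x, x_m):
    """Eq. (B7): probability to reach x_m before 0 when starting at x."""
    return x / x_m


def mean_exit_time(x, x_m, d_rel):
    """Eq. (B15): mean time to reach either 0 or x_m when starting at x."""
    return x * (x_m - x) / (2.0 * d_rel)


def microscopic_lifetime(tau_a_macro, x_d, x_m, d_rel, p_a):
    """Eq. (B18), with <N_A> = x_m / x_d taken from Eq. (B9)."""
    n_a = x_m / x_d
    numerator = tau_a_macro - n_a * mean_exit_time(x_d, x_m, d_rel)
    return numerator / ((n_a - 1.0) * p_a + 1.0)


def lattice_check(n_sites, sweeps=2000):
    """Solve the discrete recursions of Eqs. (B4) and (B11) by repeated
    in-place relaxation, with delta_x = delta_t = 1 (so D_rel = 1/2).
    Boundary values: p[0] = 0, p[n_sites] = 1, tau[0] = tau[n_sites] = 0.
    """
    p = [0.0] * (n_sites + 1)
    tau = [0.0] * (n_sites + 1)
    p[n_sites] = 1.0
    for _ in range(sweeps):
        for i in range(1, n_sites):
            p[i] = 0.5 * (p[i + 1] + p[i - 1])
            tau[i] = 0.5 * (tau[i + 1] + tau[i - 1]) + 1.0
    return p, tau


# Placeholder parameters (assumed for illustration, arbitrary units).
tau_a_mu_example = microscopic_lifetime(
    tau_a_macro=30.0, x_d=1.0, x_m=100.0, d_rel=1000.0, p_a=0.5
)
p_lattice, tau_lattice = lattice_check(10)

print("tau_A_mu for the placeholder parameters:", tau_a_mu_example)
print("P_M(3) lattice vs Eq. (B7): ", p_lattice[3], p_reach_xm(3, 10))
print("tau(3) lattice vs Eq. (B15):", tau_lattice[3], mean_exit_time(3, 10, 0.5))
```

The relaxation is chosen only for brevity; a tridiagonal solve would give the same lattice solution directly.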

2. Microscopic Diffusion Constant

Having computed the microscopic association lifetime, we turn our attention to the microscopic diffusion constant. During macroscopic association, the observable quantity, that is, the diffusion of the “center of mass” of the oscillating dissociative complex, is given by

$$D_{SL,M} = P_A D_{SL,\mu} + P_D D_{\mathrm{CM}}, \tag{B19}$$

where $D_{SL,\mu}$ and $D_{\mathrm{CM}}$ are the microscopically associated and dissociated complex diffusion rates, respectively, and $D_{SL,M}$ is the measured, macroscopic diffusion rate of the complex. $P_A$ and $P_D$ are the probabilities that the clamps are associated and dissociated, respectively. As argued in Sec. A 1, $D_{\mathrm{CM}} = \frac{D_S D_L}{D_S + D_L}$. It follows that the quantity needed for the microscopic model, the microscopic diffusion constant, is given by

$$D_{SL,\mu} = \frac{1}{P_A} \left( D_{SL,M} - P_D \frac{D_S D_L}{D_S + D_L} \right). \tag{B20}$$

Since $D_{SL,M}$, $D_S$, and $D_L$ are measured experimentally, we only need to write $P_A$ and $P_D$ in terms of observable quantities to obtain a value for $D_{SL,\mu}$. In order to do this, we observe that the probabilities that the proteins are microscopically associated and dissociated are given by the average time spent in the associated and dissociated state, respectively, divided by the sum of these times:

$$P_A = \frac{p_A \tau_{A,\mu}}{p_A \tau_{A,\mu} + \tau_R}, \tag{B21}$$

$$P_D = \frac{\tau_R}{p_A \tau_{A,\mu} + \tau_R}, \tag{B22}$$

where $\tau_{A,\mu}$ is the microscopic association time, and $\tau_R$ is the average time to return to the adjacent state. $\tau_{A,\mu}$ is multiplied by the association probability, $p_A$, because there are $1/p_A$ returns with time $\tau_R$ for every microscopic association. Note that $\tau_M$ does not enter these equations. This is because the final walk from $x_{\mathrm{rel}} = 0$ to $x_M$ has only a minor influence on the experimentally measured diffusion rate, as $\tau_M$ represents only the last $\sim x_M^2 / D_{\mathrm{rel}}$ of the $\approx 30$ s macroscopic association.

Eq. (B16) gives an expression for $\tau_R$ in terms of $\tau_M$, so in order to determine $\tau_R$ we must first compute $\tau_M$. Fortunately, we can calculate $\tau_M$ in a way that is analogous to the calculation of $\tau(x)$ in the previous section. Going back to a discrete picture, during a random walk that results in a separation distance $x = x_M$ before reaching $x = 0$, the first step after dissociation is from $x = x_d$ to $x = 2 x_d$. Thus,

$$\tau_M = \tau_{\mathrm{step}} + \langle N_{x_d} \rangle\, \tau_{x_d,M}(2 x_d), \tag{B23}$$

where $\tau_{x_d,M}(x)$ is the average time for the distance between the clamps to reach either $x_d$ or $x_M$ and $N_{x_d}$ is the number of times the distance reaches $x_d$ before going to $x_M$. Modifying the calculation of $\tau(x)$ with the appropriate boundary conditions

$$\tau_{x_d,M}(x_d) = 0, \qquad \tau_{x_d,M}(x_M) = 0 \tag{B24}$$

we find

$$\tau_{x_d,M}(x) = \frac{x - x_d}{2 D_{\mathrm{rel}}} (x_M - x), \tag{B25}$$

which yields

$$\tau_M = \frac{x_d^2}{2 D_{\mathrm{rel}}} + \langle N_{x_d} \rangle \frac{x_d}{2 D_{\mathrm{rel}}} (x_M - 2 x_d). \tag{B26}$$

Similarly, $\langle N_{x_d} \rangle$ can be computed in the same way that $\langle N_A \rangle$ was found earlier. In particular,

$$\langle N_{x_d} \rangle = \frac{1}{P_{x_d,M}}, \tag{B27}$$

where $P_{x_d,M}$ is the probability that the distance goes to $x_M$ before $x_d$ from distance $2 x_d$. Using Eqs. (B4) and (B5) with boundary conditions

$$P_{x_d,M}(x_d) = 0, \qquad P_{x_d,M}(x_M) = 1 \tag{B28}$$

we get

$$P_{x_d,M}(x) = \frac{x - x_d}{x_M - x_d}. \tag{B29}$$

Finally, since we assume that the walk starts at $x = 2 x_d$,

$$\langle N_{x_d} \rangle = \frac{x_M - x_d}{x_d}. \tag{B30}$$

Appropriate substitutions and algebraic manipulations yield

$$D_{SL,\mu} = D_{SL,M} - \delta\, (D_{\mathrm{CM}} - D_{SL,M}) \tag{B31}$$

with $\delta = P_D / P_A = \tau_R / (p_A \tau_{A,\mu})$, i.e.,

$$\delta = R_x R_\tau \, \frac{\left( \dfrac{2 - 3 R_x}{1 - R_x} \right) \left( 1 + \dfrac{R_x}{p_A (1 - R_x)} \right)}{\dfrac{2}{1 - R_x} - R_\tau} \tag{B32}$$

$$\approx R_x R_\tau \left( 1 + \frac{R_x}{p_A} \right), \tag{B33}$$

where $R_x \equiv x_d / x_M$ and $R_\tau \equiv x_M^2 / (\tau_{A,M} D_{\mathrm{rel}})$ are both small for the specific values of the parameters, and the approximation in the second line holds since $R_x \ll R_\tau \ll 1$. In the following section we derive a lower limit for $p_A$; its upper limit is $p_A = 1$. For the experimental values of the parameters, the correction $\delta\, (D_{\mathrm{CM}} - D_{SL,M})$ is $\sim 50$ bp$^2$/s at this lower limit of $p_A$ and $\sim 0.01\%$ of $D_{SL,M}$ for $p_A = 1$, in both cases a small fraction of $D_{SL,M}$. Thus,

$$D_{SL,\mu} \approx D_{SL,M}. \tag{B34}$$
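The size of the correction factor $\delta$ can be explored numerically. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical and the parameter values are placeholders chosen only so that $R_x \ll 1$ and $R_\tau \ll 1$. It evaluates $\delta$ in three ways: directly as $\tau_R / (p_A \tau_{A,\mu})$ from Eqs. (B15), (B16), (B18), (B26), and (B30), from the closed form of Eq. (B32), and from the approximation of Eq. (B33).

```python
def delta_direct(x_d, x_m, d_rel, tau_a_macro, p_a):
    """delta = P_D / P_A = tau_R / (p_A * tau_A_mu), from Eqs. (B21), (B22)."""
    n_a = x_m / x_d                                    # Eq. (B9)
    tau_xd = x_d * (x_m - x_d) / (2.0 * d_rel)         # Eq. (B15) at x = x_d
    n_xd = (x_m - x_d) / x_d                           # Eq. (B30)
    tau_m = (x_d**2 / (2.0 * d_rel)
             + n_xd * x_d * (x_m - 2.0 * x_d) / (2.0 * d_rel))  # Eq. (B26)
    tau_r = (n_a * tau_xd - tau_m) / (n_a - 1.0)       # Eq. (B16) solved for tau_R
    tau_a_mu = (tau_a_macro - n_a * tau_xd) / ((n_a - 1.0) * p_a + 1.0)  # Eq. (B18)
    return tau_r / (p_a * tau_a_mu)


def delta_closed_form(r_x, r_tau, p_a):
    """Eq. (B32) in terms of R_x = x_d/x_M and R_tau = x_M^2/(tau_A,M D_rel)."""
    prefactor = r_x * r_tau * (2.0 - 3.0 * r_x) / (1.0 - r_x)
    numerator = 1.0 + r_x / (p_a * (1.0 - r_x))
    denominator = 2.0 / (1.0 - r_x) - r_tau
    return prefactor * numerator / denominator


def delta_approx(r_x, r_tau, p_a):
    """Eq. (B33), valid for R_x << 1 and R_tau << 1."""
    return r_x * r_tau * (1.0 + r_x / p_a)


def microscopic_diffusion_constant(d_sl_macro, d_cm, delta):
    """Eq. (B31)."""
    return d_sl_macro - delta * (d_cm - d_sl_macro)


# Placeholder parameters (assumed, arbitrary but mutually consistent units).
x_d, x_m, d_rel, tau_a_macro, p_a = 1.0, 100.0, 1.0e5, 30.0, 0.01
r_x = x_d / x_m
r_tau = x_m**2 / (tau_a_macro * d_rel)

d_direct = delta_direct(x_d, x_m, d_rel, tau_a_macro, p_a)
d_closed = delta_closed_form(r_x, r_tau, p_a)
d_approx = delta_approx(r_x, r_tau, p_a)
print("delta (direct):     ", d_direct)
print("delta (Eq. B32):    ", d_closed)
print("delta (Eq. B33):    ", d_approx)

# Assumed macroscopic and center-of-mass diffusion constants.
d_sl_macro, d_cm = 1.0e4, 4.0e4
d_sl_micro = microscopic_diffusion_constant(d_sl_macro, d_cm, d_closed)
print("D_SL,mu / D_SL,M:   ", d_sl_micro / d_sl_macro)
```

For placeholder values of this kind the direct evaluation and Eq. (B32) agree to rounding error, Eq. (B33) differs from them at the percent level, and the ratio $D_{SL,\mu} / D_{SL,M}$ stays close to 1, in line with Eq. (B34).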

3. Approximation of association probability lower limit

The lower limit of the association probability can be calculated under the assumption that $p_A \geq P_{\mathrm{assoc,soln}}$, where $P_{\mathrm{assoc,soln}}$ is the probability that a MutL in solution colliding with a DNA-bound MutS will associate. As discussed in the main text of the paper, it should be easier for MutL and MutS to bind when they are both already somewhat aligned by their formation of clamp structures on the DNA.

The association probability $P_{\mathrm{assoc,soln}}$ is given by the ratio

$$P_{\mathrm{assoc,soln}} = k_{\mathrm{on,exp}} / k_{\mathrm{on,max}}, \tag{B35}$$

where $k_{\mathrm{on,exp}}$ is the experimental rate at which MutL associates with MutS on DNA from solution, and $k_{\mathrm{on,max}}$ is the rate at which MutS and MutL collide (e.g., the diffusion-limited rate).

We first focus on the diffusion-limited rate. The Smoluchowski equation yields an expression for the diffusion-limited rate constant for two uniform spheres [25]:

$$k_{\mathrm{on,max}} = 4 \pi D R, \tag{B36}$$

where $D$ is the relative diffusion constant and $R$ is the reaction radius. Literature values for the MutS Stokes radius $R_{S,S}$ (Manelyte et al.) and for the MutL Stokes radius $R_{S,L}$ give $R \approx R_{S,S} + R_{S,L} \sim 10$ nm.

To determine the relative diffusion constant $D$, we compare the measured MutS diffusion constant along the DNA strand, $D_S$, with the Stokes-Einstein diffusion constant of MutL in water at room temperature, $D_{L,\mathrm{soln}} = \frac{k_B T}{6 \pi \eta R_{S,L}} \gg D_S$. Thus $D$ is dominated by $D_{L,\mathrm{soln}}$, and the diffusion-limited on rate $k_{\mathrm{on,max}}$, expressed in M$^{-1}$ s$^{-1}$, follows from Eq. (B36).

We can now turn to the experimental on rate. Liu et al. do not measure this rate directly, but they do find the fraction $F_{SL}$ of an ensemble of DNAs on which MutS-MutL complexes associate in equilibrium to be high enough to perform the experiment, i.e., a significant fraction of their constructs shows association of a MutL at their experimental concentration of MutL [3]. We thus consider both a worst-case value of $F_{SL}$ below 1 and $F_{SL} \approx 1$. Together with the sub-micromolar dissociation constant $K_{d,S}$ [28] and the measured MutL off rate $k_{\mathrm{off},L} \sim 1/\tau_{\mathrm{on},L} \approx 1/850$ s, this fraction can be used to estimate the desired on rate. The fraction of DNAs with MutS-MutL associated is given by

$$F_{SL} = \frac{[\mathrm{SLDNA}]}{[\mathrm{DNA}]} = \frac{k_{\mathrm{on},L} [L] [\mathrm{SDNA}]}{k_{\mathrm{off},L} [\mathrm{DNA}]} \tag{B38}$$

and thus

$$k_{\mathrm{on},L} = \frac{k_{\mathrm{off},L} F_{SL} K_{d,S}}{[L][S]}. \tag{B39}$$

For the reported $[L] \approx 20$ nM and $[S] \approx 10$ nM, Eq. (B39) gives $k_{\mathrm{on},L}$ in M$^{-1}$ s$^{-1}$, both for the worst-case estimate of $F_{SL}$ and for $F_{SL} = 1$. Dividing by $k_{\mathrm{on,max}}$ according to Eq. (B35) then gives $P_{\mathrm{assoc,soln}}$ and therefore a lower limit on the association probability, $P_{\mathrm{assoc,soln}} \leq p_A \leq 1$. Since $k_{\mathrm{on},L}$ is proportional to $F_{SL}$, this lower limit is higher for $F_{SL} = 1$ than for the worst-case estimate.

Appendix C: Base pair stepping simulation