Potential evolutionary advantage of a dissociative search mechanism in DNA mismatch repair
Kyle Crocker, James London, Andrés Medina, Richard Fishel, Ralf Bundschuh
Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
Department of Cancer Biology and Genetics, The Ohio State University, Columbus, Ohio 43210, USA
Department of Physics, Department of Chemistry and Biochemistry, Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio 43210, USA
(Dated: December 18, 2020)

Protein complexes involved in DNA mismatch repair appear to diffuse along dsDNA in order to locate a hemimethylated incision site via a dissociative mechanism. Here, we study the probability that these complexes locate a given target site via a semi-analytic, Monte Carlo calculation that tracks the association and dissociation of the complexes. We compare such probabilities to those obtained using a non-dissociative diffusive scan, and determine that for experimentally observed diffusion constants, search distances, and search durations in vitro, there is neither a significant advantage nor disadvantage associated with the dissociative mechanism in terms of probability of successful search, and that both search mechanisms are highly efficient for a majority of hemimethylated site distances. Furthermore, we examine the space of physically realistic diffusion constants, hemimethylated site distances, and association lifetimes and determine the regions in which dissociative searching is more or less efficient than non-dissociative searching. We conclude that the dissociative search mechanism is advantageous in the majority of the physically realistic parameter space.
I. INTRODUCTION
DNA mismatch repair (MMR) is a molecular process by which errors in DNA sequence indicated by mismatched base pairs are corrected. Failure of this process is the cause of many cancers [1], but a complete mechanistic description of the process does not yet exist [1–3]. The MMR process is evolutionarily conserved from prokaryotes to eukaryotes [4–6], so E. coli MutS, MutL, and MutH proteins may be productively used to study MMR. In E. coli, MMR consists of the following steps. First, MutS recognizes a mismatched site on a DNA strand and associates with the DNA. This MutS then binds MutL from solution, which in turn can bind MutH. MutH then nicks the newly synthesized, erroneous DNA strand. Excision, followed by polymerization and ligation, completes the repair process [5, 6].

Here, we describe a quantitative model of the process by which the MMR proteins determine which strand is newly synthesized. Since E. coli methylates its DNA strands whenever a GATC base sequence appears, a newly synthesized strand differs from existing strands in that it is not yet methylated. A MutL-activated MutH, therefore, nicks the new strand at a hemimethylated site, and the strand containing the nick is excised. In order to create this nick, however, a hemimethylated site must first be recognized. The hemimethylated sites may be thousands of base pairs away from the mismatch (and therefore from the place at which the MutS proteins bind to the DNA), so recognition of a hemimethylated site is not a trivial problem. Through single-molecule probing of the MMR process in vitro, Liu et al. recently found extremely stable toroidal protein clamps diffusing along the DNA strand while transiently associating with and dissociating from each other in order to reach and recognize a hemimethylated site [3]. It is this diffusion mechanism that is the subject of our quantitative model.

Protein searches of DNA for specific sites are common, and searches involving non-toroidal proteins have been studied extensively: Berg et al. derived a complete mathematical model of this search process in terms of association and dissociation rates, as well as geometrical considerations [7]. Givaty et al. developed a molecular simulation based on electrostatic forces of non-toroidal DNA binding proteins searching DNA and tracked non-toroidal protein motion. They found that the most efficient DNA searches consist of ∼20% sliding and ∼ […] et al. in the context of an E. coli polymerase, DNA polymerase III holoenzyme, which is stabilized on the DNA by the β-clamp that encircles the DNA [9]. More recently, Daitchman et al. have used molecular dynamics simulations to study the diffusion of these protein clamps and report on the way in which the physical properties of the clamps affect the diffusion dynamics [10]. However, all of the previous studies that we are aware of have focused on individual proteins rather than the search process as a whole.

The focus of this paper is to quantitatively model the observed protein clamp association-dissociation mechanism present in MMR protein clamp diffusion. While this is similar to the non-toroidal search mechanism described by Berg in that it is characterized by a transition between a slow searching state and a fast non-searching state, the time distribution of the fast state is different in each case. In particular, the transition from the dissociated fast state into the slow searching state is governed by 3-D diffusion in the non-toroidal proteins discussed by Berg [7], whereas the toroidal structure of the proteins that we consider, while allowing dissociation of the proteins from each other, prevents release from the DNA and thus restricts their motion to a single dimension. This structure also prevents transfer between nearby DNA segments [9–11].

After construction of the quantitative model, we investigate whether the association-dissociation mechanism serves to increase the efficiency with which a hemimethylated site is found, as compared to a more straightforward situation in which the proteins are unable to dissociate from one another (or, equivalently, there is only a single protein). If this were the case, it could provide an evolutionary pressure favoring the association-dissociation mechanism. We find that although the association-dissociation mechanism makes little difference at the observed E. coli parameters, there is a much larger section of parameter space in which the association-dissociation mechanism is beneficial as opposed to detrimental when compared to a case in which the proteins do not dissociate.

This paper begins in section II with a summary of the Liu et al. experiments that led to our model, including a tabulation of experimental parameters relevant to the model. In section III, the model itself is described both physically and mathematically. Section IV presents our approach to calculating the probability of finding the hemimethylated site from the model. The main findings concerning the probability of finding the hemimethylated site are then presented in section V, and finally the implications of those results are discussed in section VI, along with potential future directions of research in this area. Several of the detailed derivations are relegated to various appendices.
II. EXPERIMENTAL OBSERVATION OF DISSOCIATIVE SEARCH MECHANISM
In this section, we briefly summarize the experimental observations by Liu et al. [3] that underlie the model developed in this paper. Additionally, we compile in Table I experimentally determined quantities used to determine values of model parameters, since we refer to these quantities throughout the paper.

In the experiment by Liu et al., interactions of Escherichia coli DNA mismatch repair proteins MutS, MutL, and MutH with dsDNA were imaged via TIRF microscopy. Of particular interest is what will be referred to as the dissociative search mechanism, so called because of the many cycles in which MutS and MutL associate into a single complex and then dissociate into two separate complexes before re-forming a single complex as they diffuse along the DNA in order to locate the hemimethylated site. (We called this mechanism the “association-dissociation mechanism” in the introduction for clarity, but for the remainder of the paper we switch to the less cumbersome “dissociative mechanism.”) In particular, when MutS binds to a mismatch, it forms a stable clamp in the presence of ATP. It then diffuses along the DNA strand. MutL may then bind to MutS, forming a new clamp, and the two together diffuse more slowly along the DNA. This slower diffusion implies frequent interaction with the DNA backbone, thus indicating that the MutS-MutL clamp is capable of “searching” the DNA for a hemimethylated site [3]. Interestingly, MutL often dissociates from MutS, and the two proteins form two independent, stable, and freely diffusing clamps, each of which diffuses much more quickly than the MutS-MutL complex and is therefore not interacting with the DNA frequently enough to perform a search [3]. If the dissociated clamps diffuse back into a state in which they are adjacent along the DNA, they are able to reassociate and continue to search the DNA together. Finally, MutH associates with MutL in order to cleave the newly synthesized DNA strand at the hemimethylated site. Measured association lifetimes and diffusion constants for the dissociative search are compiled in Table I. Note that the diffusion of the MutS protein alone is ∼10 times faster than the diffusion of the MutS-MutL complex, and that the diffusion of the MutL protein in the absence of MutH is a factor of ∼20 faster than that of the MutS protein alone. In the presence of MutH, MutS and MutL diffuse at similar rates. Furthermore, the addition of MutH does not seem to have a significant effect on the MutS-MutL diffusion constant [3].

The objective of this paper is to quantitatively study the effect of this dissociative mechanism on search efficiency. In particular, there are two competing effects of dissociative diffusion on search efficiency that make its overall effect unclear. Since it makes the overall diffusion faster (compared to a system that always remains in the slow, searching state), it increases the region of the DNA that the protein clamps are able to visit. However, since proteins in the dissociated state are unable to actually search the DNA, the amount of DNA actually searched may decrease if the proteins do not reassociate often enough.
III. MODEL
To determine the effect of the dissociative DNA mismatch repair search mechanism on search probabilities, we propose the microscopic model illustrated in Fig. 1. In the model, the search begins with an associated MutS-MutL protein complex. The complex then diffuses in one dimension along the DNA with diffusion constant $D_{SL,\mu}$. During this time, any portion of the DNA over which the complex passes is considered “searched” and the overall search is considered successful if a hemimethylated site is reached in this state. The MutS-MutL complex dissociates with some average lifetime $\tau_{A,\mu}$ into independent MutS and MutL clamps initially separated by a distance $x_d$ with diffusion constants $D_S$ and $D_L$, respectively. The individual MutS and MutL clamps diffuse along the DNA until they come into contact again. Once in contact, the proteins reassociate with each other with an association probability $p_A$ or continue independent diffusion starting from a distance $x_d$ with probability $1 - p_A$. This dissociation-reassociation cycle continues until one or both of the proteins dissociate from the DNA. However, dissociation of the MutS and MutL clamps from the DNA is not modeled directly; instead, a “cutoff time,” based on the MutS association lifetime $\tau_S = 185$ s (which is shorter than the MutL lifetime and thus provides the more stringent cutoff), is used [3, 6], after which the search is declared unsuccessful. Since MutH binding to MutL changes the diffusion rates, we provide results that correspond to a search with non-MutH rates as well as results that correspond to a search with MutH rates. These two scenarios provide the two limiting cases, since it is not known if MutH tends to associate with MutL early or late in the search.

Quantity                             Symbol          SL Value                                  SLH Value
Search complex diffusion constant    $D_{SL,M}$      (6 ± […]) × 10^[…] bp$^2$/s               (8 ± […]) × 10^[…] bp$^2$/s
MutS diffusion constant              $D_S$           (7 ± […]) × 10^[…] bp$^2$/s               NA
MutL diffusion constant              $D_L$           (1.[…] ± […].[…]) × 10^[…] bp$^2$/s       (6 ± […]) × 10^[…] bp$^2$/s
MutS-MutL association lifetime       $\tau_{A,M}$    […] ± […] s                               […]
MutS-DNA association lifetime        $\tau_S$        185 ± 35 s                                NA
MutL-DNA association lifetime        $\tau_L$        850 ± 150 s                               NA

TABLE I. Summary of relevant quantities measured by Liu et al. [3]. The column labelled “SL Value” gives the value of each quantity in the absence of MutH, whereas the column labelled “SLH Value” gives the value of each quantity in the presence of MutH. Note that some values were not measured separately in the presence of MutH (indicated by NA), so we will assume that MutH does not change these values. In Liu’s experiment, 17.[…] […] µm, so we use 4.[…] µm = 17.[…] […] µm by Liu et al. to units of bp, which is more convenient for our model.

IV. SUCCESSFUL HEMIMETHYLATED SITE SEARCH PROBABILITY
Analysis of the dissociative search mechanism is performed in terms of successful hemimethylated site search probabilities. A successful search is defined as one in which the MutS-MutL complex visits the hemimethylated site before the cutoff time passes. These search probabilities are calculated for different hemimethylated site distances from the original MutS-MutL position.

FIG. 1. (color online) Illustration of the model used to calculate successful hemimethylated site search probabilities in DNA mismatch repair. The DNA is modeled as a one-dimensional track along which the proteins travel. MutS and MutL protein clamps diffuse along the DNA with rates specified by $D_S$ and $D_L$, respectively, until they are directly adjacent to each other. They may then either diffuse away from each other without associating or associate into a combined, searching MutS-MutL. These happen with probabilities $1 - p_A$ and $p_A$, respectively. The combined MutS-MutL can in turn diffuse along the DNA with a new rate specified by $D_{SL}$, and is capable of searching the DNA over which it passes. After an average lifetime $\tau_{A,\mu}$, this complex dissociates into separate MutS and MutL clamps.

A. Successful hemimethylated search probability of non-dissociative search
The baseline to which we compare successful hemimethylated search probabilities using the dissociative search mechanism is given by the equivalent probabilities for searches in which the clamps remain associated with each other for the entire search (pure 1-dimensional diffusion), which will be referred to as “non-dissociative” or “purely diffusive” searches. The probability of a successful non-dissociative search can be derived analytically following Redner [12]. We start with the diffusion equation in the case of a MutS-MutL clamp:

$$\frac{\partial p(x,t|x_0)}{\partial t} = D_{SL}\,\frac{\partial^2 p(x,t|x_0)}{\partial x^2}, \tag{1}$$

where $D_{SL}$ is the diffusion constant associated with the MutS-MutL clamp and $x_0$ is the position along the DNA at which the non-dissociative search begins. $p(x,t)\,dx$ is the probability that the clamp will be searching position $x$ at time $t$.

In order to consider the probability that the search reaches the hemimethylated site $x_{\mathrm{meth}}$, we first solve this differential equation in the presence of an absorbing boundary condition at $x_{\mathrm{meth}}$. Mathematically, this condition is expressed as $p(x_{\mathrm{meth}}, t) = 0$, requiring that the MutS-MutL has not arrived at the hemimethylated site. Using the method of images, this solution is given by

$$p(x,t|x_0,x_{\mathrm{meth}}) = \frac{1}{\sqrt{4\pi D_{SL} t}}\left[\exp\left(-\frac{(x-x_0)^2}{4 D_{SL} t}\right) - \exp\left(-\frac{\left(x-(2x_{\mathrm{meth}}-x_0)\right)^2}{4 D_{SL} t}\right)\right], \tag{2}$$

which represents the spatial probability density of the search at some time $t$ under the assumption that the search has not yet reached $x_{\mathrm{meth}}$.

$$P(x < x_{\mathrm{meth}}, t) = \int_{-\infty}^{x_{\mathrm{meth}}} p(x,t|x_0,x_{\mathrm{meth}})\,dx = \mathrm{erf}\left(\frac{x_{\mathrm{meth}}-x_0}{\sqrt{4 D_{SL} t}}\right) \tag{3}$$

thus gives the probability at time $t$ that a clamp has only searched positions $x_S$ such that $x_S < x_{\mathrm{meth}}$. The probability that $x_{\mathrm{meth}}$ has been searched, therefore, is

$$P^{(0)}(t) = 1 - P(x < x_{\mathrm{meth}}, t) = 1 - \mathrm{erf}\left(\frac{x_{\mathrm{meth}}-x_0}{\sqrt{4 D_{SL} t}}\right), \tag{4}$$

where the superscript (0) indicates that the probability is for a non-dissociative search.
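As a concrete illustration of Eq. (4), the following is a minimal sketch in Python. The function name and the numerical inputs are hypothetical and chosen only for illustration (they are not the measured values of Table I). The sketch also takes the absolute value of the distance so that the site may lie on either side of the starting position, which is an assumption beyond the $x_{\mathrm{meth}} > x_0$ geometry used in the text.

```python
import math


def nondissociative_search_probability(x_meth, x0, D_SL, t):
    """Eq. (4): probability that pure 1-D diffusion from x0 has reached x_meth by time t."""
    distance = abs(x_meth - x0)  # assumption: the site may lie on either side of x0
    if distance == 0.0:
        return 1.0
    if t <= 0.0:
        return 0.0
    return 1.0 - math.erf(distance / math.sqrt(4.0 * D_SL * t))


# Illustrative (hypothetical) inputs: distances in bp, D_SL in bp^2/s, time in s.
for x_meth in (500.0, 1000.0, 2000.0):
    print(x_meth, nondissociative_search_probability(x_meth, 0.0, 6.0e3, 185.0))
```

The probability depends on the inputs only through the ratio of the site distance to the diffusion length $\sqrt{4 D_{SL} t}$, so it decreases monotonically with distance at fixed search time.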
Successful search probabilities for a dissociative search will be indicated by the superscript (∗).

B. Association and Dissociation Event Stepping Simulation
In order to use the model described in Sec. III to calculate successful search probabilities, we develop a Monte Carlo approach that samples from analytic one-dimensional diffusion probability distributions. This calculation breaks the problem of determining the overall successful search probability into the cumulation of the probabilities that each individual microscopic association identifies the hemimethylated site. Each individual probability can be determined analytically from the association lifetime and associated diffusion distributions if the distance between the position at which the clamps associate and the hemimethylated site is known. For a given initial distance to the hemimethylated site, the subsequent hemimethylated site distances are determined by both the diffusion of the associated clamps and the diffusion of the dissociated clamps. In principle, this conceptual framework produces an analytic expression for the successful search probability involving iterative convolution integrals. In practice, however, this expression is too complex to be used to compute values directly. In particular, we found that the most straightforward way to calculate the many integrals over diffusion position and association lifetime probability distributions was to randomly sample from these distributions many times. Each set of random samples produces a probability of either 1 or 0 that the hemimethylated site was successfully reached, and the average of many of these sets gives the overall successful search probability.

Another way to think of this iterative random sampling is to imagine that each set of random samples represents a path that the protein clamps can take along the DNA strand which results in either a successful or unsuccessful search.
Each path occurs with a frequency proportional to its probability, and therefore setting the successful searches to 1 and the unsuccessful searches to 0 and taking the average of many such searches produces the successful search probability.

The following algorithm is used to carry out this experiment, and will be called the association and dissociation event stepping simulation (ADESS):

1. The clamps start immediately adjacent to each other. We set the starting position of the clamps to $x_0 = 0$, the step counting index to $i = 0$, and the elapsed time to $t_e = 0$. Input a position to search for on a 1-D axis (designated the “hemimethylated site” or simply “$x_{\mathrm{meth}}$”) representing its distance from the initial MutS-MutL association site on the dsDNA. Also choose a cutoff time $t_s$, representing dissociation of the MutS clamp from the dsDNA.

2. Decide whether the adjacent clamps associate by sampling randomly from a uniform distribution between 0 and 1 and comparing the result to the input probability that adjacent clamps will associate, denoted $p_A$.

3. If the clamps do not associate, go to step 7. If the cutoff time has been reached ($t_e \geq t_s$), mark the search as unsuccessful and go to step 9.

4. Randomly select, using the method of inverse transform, an association lifetime from the probability distribution given by

$$p_{\mathrm{assoc}}(t) = \tau_{A,\mu}^{-1} \exp(-t/\tau_{A,\mu}), \tag{5}$$

   where $\tau_{A,\mu}$ is the average microscopic association lifetime of the clamps. This represents the time for which the clamps are diffusing together during this association. Denote this time as $t_{\mathrm{assoc}}$ and increase the total elapsed time $t_e$ by $t_{\mathrm{assoc}}$.

5. Decide whether the hemimethylated site $x_{\mathrm{meth}}$ has been reached given the previous association position and lifetime by sampling randomly from a uniform distribution between 0 and 1 and comparing the result to the probability that the site has been reached, given by

$$P_{\mathrm{find}}(t_{\mathrm{assoc}}) = 1 - \mathrm{erf}\left(\frac{x_{\mathrm{meth}} - x_i}{\sqrt{4 D_{SL,\mu}\, t_{\mathrm{assoc}}}}\right), \tag{6}$$

   where $x_i$ is the previous association position and $D_{SL,\mu}$ is the diffusion rate of the associated clamps. Note that this is simply Eq. (4) evaluated at $t = t_{\mathrm{assoc}}$. If the association time from the previous step brings the total time $t_e$ past the cutoff time, $t_{\mathrm{assoc}}$ is taken to be the difference between the cutoff time and the time at which the current association began. This ensures that the final association finds the hemimethylated site with the proper probability.

6. (a) If $x_{\mathrm{meth}}$ has been reached, the search is successful, so we proceed to step 9.

   (b) If $x_{\mathrm{meth}}$ has not been reached, use the previous association position $x_i$ and lifetime to randomly select the next dissociation position $x_{i+1}$ from the probability density function of Eq. (2) at $t = t_{\mathrm{assoc}}$, with an additional normalization factor $C$ that ensures that the probability that $x_{\mathrm{meth}}$ has not been reached is 1 at time $t = t_{\mathrm{assoc}}$. This factor is necessary because we have already determined in the previous step that the hemimethylated site has not been reached:

$$p(x_{i+1}, t_{\mathrm{assoc}}|x_i, x_{\mathrm{meth}}) = \frac{C}{\sqrt{4\pi D_{SL}\, t_{\mathrm{assoc}}}}\left[\exp\left(-\frac{(x_{i+1}-x_i)^2}{4 D_{SL}\, t_{\mathrm{assoc}}}\right) - \exp\left(-\frac{\left(x_{i+1}-(2x_{\mathrm{meth}}-x_i)\right)^2}{4 D_{SL}\, t_{\mathrm{assoc}}}\right)\right], \tag{7}$$

   where

$$C = \frac{1}{\mathrm{erf}\left[(x_{\mathrm{meth}}-x_i)/\sqrt{4 D_{SL,\mu}\, t_{\mathrm{assoc}}}\right]}. \tag{8}$$

   We increase $i$ by one to indicate that the $x_{i+1}$ determined here is the new position of the two newly dissociated clamps.

7. Use the dissociation lifetime distribution

$$p_{\mathrm{dissoc}}(t) = \frac{x_d}{\sqrt{4\pi D_{\mathrm{rel}}\, t^3}} \exp\left[-\frac{x_d^2}{4 D_{\mathrm{rel}}\, t}\right] \tag{9}$$

   to determine how long the clamps remain dissociated (see Appendix A). Here, $x_d$ is, as before, the initial distance of the clamps following dissociation, and $D_{\mathrm{rel}}$ is the diffusion constant associated with the fluctuation of the distance between the clamps. Since each clamp is diffusing independently, the distance between them is also diffusing without bias in a particular direction. Denote this chosen lifetime $t_{\mathrm{dissoc}}$ and increment the total elapsed time $t_e$ by $t_{\mathrm{dissoc}}$.

8. Using the lifetime $t_{\mathrm{dissoc}}$ chosen in the previous step, select the next possible association position $x_{i+1}$ from the distribution of positions at which the relative position of the clamps returns to 0. This distribution is given by the solution to the unbounded diffusion equation with constant $D_{\mathrm{CM}}$ associated with the diffusion of the “center of mass” of the dissociated clamps (see Appendix A). In particular,

$$p_{\mathrm{return}}(x_{i+1}|x_i, t_{\mathrm{dissoc}}) = \frac{1}{\sqrt{4\pi D_{\mathrm{CM}}\, t_{\mathrm{dissoc}}}} \exp\left[-\frac{(x_{i+1}-x_i)^2}{4 D_{\mathrm{CM}}\, t_{\mathrm{dissoc}}}\right]. \tag{10}$$

   Increase $i$ by one and return to step 2.

9. Perform many such searches and assign a value of 1 to all those that are successful and 0 to those in which the cutoff time is reached without success. Take the average value of all of these searches to determine the successful search probability. Divide the trials into 10 independent blocks of equal number of trials and calculate the search probability for each block to determine the standard error.

C. Determination of model parameters
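To make explicit which microscopic parameters the ADESS steps of Sec. IV B consume, the following is a minimal Python sketch of one possible implementation. It is not the authors' code, and several choices are assumptions made for illustration: all function and argument names are hypothetical; the conditional density of Eqs. (7) and (8) is sampled by rejection from a Gaussian proposal; the dissociation lifetime of Eq. (9) is drawn using the fact that it is a Lévy distribution, so that $t = x_d^2/(2 D_{\mathrm{rel}} Z^2)$ with $Z$ standard normal; the association lifetime uses the standard-library exponential sampler (itself an inverse-transform method); and the absolute distance to the site is used so that the clamps may end up on either side of $x_{\mathrm{meth}}$ after a dissociated excursion.

```python
import math
import random


def p_find(distance, D, t):
    """Eqs. (4)/(6): probability that a site `distance` away is reached within time t."""
    if distance <= 0.0:
        return 1.0
    if t <= 0.0:
        return 0.0
    return 1.0 - math.erf(distance / math.sqrt(4.0 * D * t))


def adess_trial(rng, x_meth, t_s, p_A, tau_A, D_SL, D_rel, D_CM, x_d):
    """Run one ADESS path; return 1 if x_meth is found before the cutoff t_s, else 0."""
    x, t_e = 0.0, 0.0                         # step 1: clamps start adjacent at x_0 = 0
    while t_e < t_s:                          # step 3: stop once the cutoff is reached
        if rng.random() < p_A:                # step 2: do the adjacent clamps associate?
            # Step 4: exponential lifetime, Eq. (5), truncated at the cutoff (step 5).
            t_assoc = min(rng.expovariate(1.0 / tau_A), t_s - t_e)
            t_e += t_assoc
            a = abs(x_meth - x)               # assumption: site may lie on either side
            if rng.random() < p_find(a, D_SL, t_assoc):   # steps 5 and 6(a)
                return 1
            if t_e >= t_s:
                return 0
            if t_assoc > 0.0:
                # Step 6(b): rejection-sample Eqs. (7)-(8). With y the distance to the
                # site, the target is N(y; a, 2 D t) * (1 - exp(-a y / (D t))) for y > 0.
                sigma = math.sqrt(2.0 * D_SL * t_assoc)
                while True:
                    y = rng.gauss(a, sigma)
                    if y > 0.0 and rng.random() < -math.expm1(-a * y / (D_SL * t_assoc)):
                        break
                x = x_meth - math.copysign(y, x_meth - x)
        # Step 7: first-passage time of Eq. (9), drawn as a Levy variate.
        z = rng.gauss(0.0, 1.0)
        t_dissoc = x_d ** 2 / (2.0 * D_rel * max(z * z, 1e-300))
        t_e += t_dissoc
        # Step 8: Gaussian displacement of the "center of mass", Eq. (10).
        x = rng.gauss(x, math.sqrt(2.0 * D_CM * t_dissoc))
    return 0


def search_probability(n_trials, seed=0, n_blocks=10, **params):
    """Step 9: mean success over all trials and a standard error from n_blocks blocks."""
    rng = random.Random(seed)
    per_block = n_trials // n_blocks
    block_means = [
        sum(adess_trial(rng, **params) for _ in range(per_block)) / per_block
        for _ in range(n_blocks)
    ]
    mean = sum(block_means) / n_blocks
    var = sum((b - mean) ** 2 for b in block_means) / (n_blocks - 1)
    return mean, math.sqrt(var / n_blocks)
```

A call such as `search_probability(10000, x_meth=..., t_s=..., p_A=..., tau_A=..., D_SL=..., D_rel=..., D_CM=..., x_d=...)` returns the estimated probability and its block standard error. In the limit where the association lifetime far exceeds the cutoff and $p_A = 1$, the estimate should reduce to the non-dissociative result of Eq. (4), which gives a simple consistency check.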
The model described above is written in terms of several microscopic parameters. In this section we will determine the values of these parameters. Some of these parameters can be calculated directly from experimentally measured values and are summarized in Tab. II. For the remainder, we need to make reasonable assumptions about their values, summarized in Tab. III.

The reason that the values of these parameters must be calculated or estimated rather than be measured directly is that the spatial resolution of the experiment is diffraction limited. Since the wavelength of visible light is on the order of hundreds of nm and the protein footprints are on the order of a few nm, the proteins interact on scales below the spatial sensitivity of the experiment. Importantly, this implies that the clamps can appear to be associated with each other in the experiment, when they are closer than the spatial resolution of the experiment, even though they may or may not be in actual physical contact. In contrast, in our model we define the associated state as the state in which the diffusion of the clamps is coupled, and the clamps have undergone some conformational change that allows them to interact more closely with the backbone and thus changes their diffusion rate. The dissociated state is the state in which the clamps are diffusing independently of each other. To avoid confusion, for the purposes of describing the calculation of model parameters from experimental observables we will denote the state in which the clamps are physically associated as “microscopically associated,” the state in which the clamps are physically dissociated but close enough that their positions are indistinguishable within the resolution of the experiment as “proximate,” and the state in which the clamps are physically dissociated and far enough away that their positions are distinguishable as “macroscopically dissociated.” In addition, we will use “macroscopically associated” to describe clamps that could be either “microscopically associated” or “proximate” and “microscopically dissociated” for clamps that could be either “proximate” or “macroscopically dissociated.”

Parameter                                                      Symbol                   SL Value                                SLH Value
Dissociated clamps relative position diffusion constant        $D_{\mathrm{rel}}$       (1.[…] ± […].[…]) × 10^[…] bp$^2$/s     (1.[…] ± […].[…]) × 10^[…] bp$^2$/s
Dissociated clamps “center of mass” diffusion constant         $D_{\mathrm{CM}}$        (7 ± […]) × 10^[…] bp$^2$/s             (3.[…] ± […].[…]) × 10^[…] bp$^2$/s
MutS-MutL diffusion constant                                   $D_{SL,\mu}$             (6 ± […]) × 10^[…] bp$^2$/s             (8 ± […]) × 10^[…] bp$^2$/s
MutS-MutL association lifetime                                 $\tau_{A,\mu}$           0.03 s ≤ $\tau_{A,\mu}$ < 30 s          0.03 s ≤ $\tau_{A,\mu}$ < 30 s
Distance from hemimethylated site                              $x_{\mathrm{meth}}$      […]                                     […]
Total search time                                              $t_s$                    185 ± 35 s                              185 ± 35 s

TABLE II. Model parameter values calculated from experimental observables. The column labelled “SL Value” gives each value in the absence of MutH, whereas the column labelled “SLH Value” gives each value in the presence of MutH.

Parameter                                        Symbol     Value
Adjacent MutS-MutL association probability       $p_A$      10^(−[…]) ≤ $p_A$ ≤ 1
[…]                                              $x_d$      […]
[…]                                              $x_M$      […]

TABLE III. […]
1. Diffusion constants of individual clamps
Since diffusion is scale invariant, there is no reason to believe that the microscopic diffusion constants $D_S$ and $D_L$ of the individual clamps are different from their macroscopically measured values given in Tab. I. Rewriting the diffusion of two clamps of different diffusion constants in terms of relative and “center of mass” coordinates yields $D_{\mathrm{rel}} = D_S + D_L$ for the diffusion of the relative coordinate and $D_{\mathrm{CM}} = \frac{D_S D_L}{D_S + D_L}$ for the diffusion of the “center of mass” coordinate.
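A short worked derivation may make these two expressions concrete. This is a sketch under the assumption that the clamp positions $x_S$ and $x_L$ perform independent, unbiased diffusion with mean-square displacements $2 D_S t$ and $2 D_L t$; the weighted coordinate $X$ below is one standard choice of “center of mass” and its notation is introduced here for illustration, so it may differ from the treatment in Appendix A.

```latex
\begin{align}
r &= x_L - x_S, & X &= \frac{D_L\, x_S + D_S\, x_L}{D_S + D_L}, \\
\langle \Delta r^2 \rangle &= 2 D_S t + 2 D_L t = 2 D_{\mathrm{rel}}\, t, \\
\langle \Delta X^2 \rangle &= \frac{D_L^2\,(2 D_S t) + D_S^2\,(2 D_L t)}{(D_S + D_L)^2}
  = \frac{2 D_S D_L}{D_S + D_L}\, t = 2 D_{\mathrm{CM}}\, t, \\
\langle \Delta r\, \Delta X \rangle &= \frac{D_S\,(2 D_L t) - D_L\,(2 D_S t)}{D_S + D_L} = 0 .
\end{align}
```

The vanishing cross-correlation in the last line is what allows the relative coordinate and the weighted coordinate to be treated as two independent one-dimensional diffusion processes.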
2. Association lifetime and complex diffusion constant
The experiment measures the lifetime $\tau_{A,M}$ and diffusion constant $D_{SL,M}$ of macroscopically associated clamps (see Tab. I). Since macroscopically associated clamps could be either microscopically associated or proximate, a macroscopic association event consists of a sequence of transitions between the microscopically associated state and the proximate state, where only after multiple excursions into the proximate state do the clamps finally reach a distance that can be resolved in the experiment and thus reach the macroscopically dissociated state. Thus, the macroscopically measured lifetime $\tau_{A,M}$ is an effective lifetime that integrates over many microscopic dissociation and re-association events, and the macroscopically measured diffusion constant $D_{SL,M}$ is a temporal average of the diffusion constant of microscopically associated clamps $D_{SL,\mu}$ and the diffusion constant of the center of mass of individual clamps $D_{\mathrm{CM}}$ during their excursions in the proximate state.

In Appendix B 1, we explicitly calculate how the macroscopically measured lifetime $\tau_{A,M}$ that integrates over multiple microscopic dissociation and re-association events depends on the microscopic parameters of the model. Solving this dependence for the microscopic association time yields

$$\begin{aligned}
\tau_{A,\mu} &= \frac{1}{(\langle N_A \rangle - 1)\,p_A + 1}\left[\tau_{A,M} - \frac{x_M (x_M - x_d)}{D_{\mathrm{rel}}}\right] \\
&\approx \frac{\tau_{A,M}}{(\langle N_A \rangle - 1)\,p_A + 1},
\end{aligned} \tag{11}$$

where $p_A$ is the probability that adjacent MutS and MutL clamps will associate, and $\langle N_A \rangle = x_M / x_d$ is the number of times the clamps are in a microscopically adjacent state (making microscopic association possible) in a single macroscopic association. $x_d$ and $x_M$ are the microscopic and macroscopic association distances, respectively, so $x_d \ll x_M$. The approximation in the second line of Eq. (11) holds for our specific values of the parameters, as $\frac{x_M (x_M - x_d)}{D_{\mathrm{rel}}} \approx 0.07$ s and $\tau_{A,M} \approx 30$ s. It implies that the time spent in the proximate state has a negligible contribution to the macroscopic association time due to the speed of the dissociated diffusion, even though the fact that a macroscopic association event consists of multiple microscopic association events is relevant, as evidenced by the prefactor $[(\langle N_A \rangle - 1)\,p_A + 1]^{-1}$. Accordingly (see Appendix B 2), the excursions into the proximate state do not have a significant impact on the diffusion constant either, due to their short durations. Thus,

$$D_{SL,\mu} \approx D_{SL,M}. \tag{12}$$
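The dependence in Eq. (11) is easy to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical, the expression is taken exactly as written in Eq. (11), and the inputs $x_M = 1000$ bp, $x_d = 1$ bp and $D_{\mathrm{rel}} = 1.4 \times 10^7$ bp$^2$/s are assumed round numbers chosen to be compatible with the quoted $\approx 0.07$ s correction and the 0.03 s to 30 s range of $\tau_{A,\mu}$, not values taken from the tables.

```python
def microscopic_association_lifetime(tau_AM, p_A, x_M, x_d, D_rel):
    """Evaluate Eq. (11): return (first-line value, second-line approximation) in seconds."""
    n_adjacent = x_M / x_d                       # <N_A> = x_M / x_d
    prefactor = (n_adjacent - 1.0) * p_A + 1.0   # (<N_A> - 1) p_A + 1
    proximate_time = x_M * (x_M - x_d) / D_rel   # correction term inside the brackets
    full = (tau_AM - proximate_time) / prefactor
    approx = tau_AM / prefactor
    return full, approx


# Hypothetical inputs (lengths in bp, D_rel in bp^2/s, times in s).
for p_A in (1.0, 1e-2, 0.0):
    full, approx = microscopic_association_lifetime(30.0, p_A, 1000.0, 1.0, 1.4e7)
    print(p_A, full, approx)
```

With these assumed inputs the two lines of Eq. (11) differ by well under one percent for every $p_A$, and the approximate lifetime runs from $\tau_{A,M}\, x_d / x_M$ at $p_A = 1$ up to $\tau_{A,M}$ itself as $p_A \to 0$.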
3. Distance from the nearest hemimethylated site

In Escherichia coli, hemimethylation occurs at GATC sites [13–15]. Thus, the distance from a random location in the genome to the nearest hemimethylated site is governed by the distance distribution of adjacent GATC sites, shown in Fig. 2 for the genome of Escherichia coli K-12 MG1655, NCBI RefSeq assembly GCF_000005845.2. While in 90% of the cases the distance between neighboring GATC sites is 500 bp or less, the largest distances between adjacent GATC sites reach all the way to 5000 bp. Since the ability to repair mismatches in the genome should depend on being able to identify the closest hemimethylated site even in the worst-case scenario of being right in the middle of the two furthest separated GATC sites, we will report search probabilities over a range of $x_{\mathrm{meth}} = 500 - […]$

FIG. 2. (color online) Distribution of hemimethylated site distances in the Escherichia coli genome. For each separation on the horizontal axis, the vertical axis shows the number of adjacent GATC sites in the Escherichia coli K-12 MG1655, NCBI RefSeq assembly GCF_000005845.2 genome with at least that separation.
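The kind of tabulation summarized in Fig. 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis used for the figure: the function names are hypothetical, the input is a toy sequence rather than the genome assembly, and the sequence is treated as linear, so the gap that wraps around a circular chromosome is ignored. Because GATC is its own reverse complement, scanning a single strand is sufficient.

```python
def gatc_gaps(sequence, motif="GATC"):
    """Return distances (in bp) between the start positions of adjacent motif sites."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def count_gaps_at_least(gaps, separation):
    """Number of adjacent-site pairs separated by at least `separation` bp."""
    return sum(1 for g in gaps if g >= separation)


toy = "GATCAAGATCGGGGGATC"   # toy sequence with sites at positions 0, 6 and 14
gaps = gatc_gaps(toy)
print(gaps, count_gaps_at_least(gaps, 7))
```

Evaluating `count_gaps_at_least` over a grid of separations gives a cumulative count of the same form as the vertical axis described in the caption of Fig. 2.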
4. Total search time
The search continues until either MutS or MutL dissociates from the DNA. Since the experimentally determined MutS association lifetime $\tau_S = 185 \pm 35$ s is much shorter than the experimentally determined MutL association lifetime $\tau_L = 850 \pm 150$ s, the search time is limited by the MutS association lifetime and thus $t_s = \tau_S$.
5. Dissociation distances and association probability
Unlike the microscopic association lifetime, microscopic diffusion constants, and the distance from hemimethylated sites, the dissociation distances $x_d$ and $x_M$ and the association probability $p_A$ are not determined by experimental observables, and thus cannot be calculated directly. Physical arguments, however, allow estimation of $x_d$ and $x_M$. In particular, the microscopic dissociation distance, i.e., the distance at which the clamps can be considered as independent, is on the order of $x_d \sim$ […] $x_M \sim 300$ nm $\sim$ […] $p_A$, but arguments can be made to set limits on this parameter. As a probability, the upper limit on $p_A$ is evidently 1. Approximation of a lower limit is made possible by the assumption that $p_A \geq P_{\mathrm{assoc,soln}}$, where $P_{\mathrm{assoc,soln}}$ is the probability that a MutL in solution colliding with a DNA-bound MutS will associate. This assumption is plausible since there is only one dimension (namely rotation around the DNA) in which MutS and MutL clamps already associated with the DNA must align in order to associate with each other, rather than the three dimensions that must align when MutL is not already associated with the DNA. This assumption combined with published experimental results independent of the experiments in [3] suggests that the association probability must be greater than $10^{-[\ldots]}$ (see Appendix B 3):

$$10^{-[\ldots]} \leq p_A \leq 1. \tag{13}$$

D. Validation of the ADESS approach
In order to validate the ADESS approach and the microscopic parameter calculation, we compare ADESS to a much more time-consuming simulation that explicitly tracks the positions along the DNA and the interactions of MutS, MutL, and MutS-MutL clamps. This simulation uses Gillespie's stochastic simulation algorithm [16] to choose a clamp and a direction in which to move it in every step (see Appendix C for details). Each move has a step size of a single base pair; thus, we will denote this simulation approach as the base pair stepping simulation (BPSS). By counting those positions over which a MutS-MutL complex passed as having been searched, this simulation provides an alternative tool by which the successful search probability can be calculated. Since the BPSS approach follows every single diffusion step of the clamps, it becomes computationally infeasible to obtain sufficient statistics for realistic values of the diffusion constants. We thus perform this validation for values of $D_S$, $D_{SL}$, and $D_L$ that are each about two orders of magnitude smaller than the actual experimentally determined diffusion constants. Fig. 3 compares the search probability calculated using the BPSS approach with the search probability calculated using the ADESS approach and finds them to yield identical results within statistical error.

FIG. 3. (color online) Comparison between successful search probabilities calculated using BPSS and ADESS. (a) is calculated with $p_A = 1$ and (b) with a smaller value of $p_A$. The statistical uncertainty is smaller than the size of the symbols.

Additionally, the BPSS allows us to validate Eq. (11) for the microscopic association lifetime $\tau_{A,\mu}$ empirically. In particular, the BPSS approach lets us keep track of the distance between separate clamps and the times at which these distances occur. Using this feature, we calculate the time $t_{A,M}$ for which the clamps remain within the macroscopic association distance $x_M$ of each other, i.e., the time until they first reach the macroscopically dissociated state. Fig. 4 shows histograms of this time to reach the macroscopically dissociated state, calculated from simulations that use the microscopic association lifetime obtained via Eq. (11). We find that these simulated distributions accurately reproduce the experimentally measured macroscopic association lifetime $\tau_{A,M}$.

FIG. 4. (color online) Histogram of simulated macroscopic association times for (a) $p_A = 1$ and (b) a smaller value of $p_A$. The line is given by $\tau_{A,M}^{-1} \exp(-t_{A,M}/\tau_{A,M})$, where $\tau_{A,M}$ is the average of the simulated macroscopic association times. This line therefore demonstrates that the association time probability decays exponentially with a decay constant consistent with the measured macroscopic association lifetime $\tau_{A,M}$.

E. Robustness of results to variation in estimated parameters
Since several model parameters can only be estimated (see Tab. III), we next determine how sensitive our model is to variations in these parameters. The parameter with the largest uncertainty is the microscopic association probability $p_A$. In order to gauge the sensitivity of the model to this parameter, we hold all other parameters constant at their values given in Tabs. II and III (both in the presence and in the absence of MutH) while varying the microscopic association probability over its entire potential range given in Eq. (13). Then, we numerically calculate the main observable of our model, namely the probability of a successful search, using the ADESS approach described in Sec. IV B. Fig. 5 shows the resulting search probabilities as a function of search distance $x_{\mathrm{meth}}$ for different values of the association probability $p_A$. We note that the successful search probability is largely independent of the microscopic association probability $p_A$ as long as $p_A \ge 0.001$ and then drops significantly for smaller values of $p_A$. Since a significantly reduced search probability would be evolutionarily disadvantageous and our lower limit on $p_A$ originated from a fairly generous "worst case" analysis (see Appendix B 3), we from here on focus on the range $0.001 \le p_A \le 1$. In this range the search probability is largely insensitive to the value of $p_A$.

FIG. 5. (color online) Search probability as a function of search distance for different values of the association probability $p_A$. (a) was calculated with non-MutH parameters, while (b) was calculated with MutH parameters. The statistical uncertainty is smaller than the size of the symbols.

We note that naively it appears unintuitive for the overall search probability to be so insensitive to three orders of magnitude of variation in the probability that two adjacent clamps successfully form a complex. However, the microscopic association probability $p_A$ appears in Eq. (11) for the microscopic association lifetime. Thus, different values of $p_A$ yield different values of the microscopic association lifetime $\tau_{A,\mu}$ in order to keep the macroscopic association lifetime $\tau_{A,M}$ consistent with its measured value. The relative insensitivity of the search probability to the value of the microscopic association probability thus indicates that changes to the microscopic association lifetime compensate for the variation in microscopic association probability over three orders of magnitude. This also explains the change in behavior at $p_A = 0.001$: since $\langle N_A \rangle = 1000$ for our parameters, the denominator $(\langle N_A \rangle - 1)\,p_A + 1$ in Eq. (11) is larger than one for $p_A \ge 0.001$ and asymptotes to one for $p_A < 0.001$. That is, for $p_A \ge 0.001$ the clamps go through multiple re-association events before final dissociation, the lifetime of which compensates for the change in the microscopic association probability $p_A$. For $p_A < 0.001$, the microscopic association lifetime $\tau_{A,\mu}$ is locked to the macroscopic association lifetime $\tau_{A,M}$, and is no longer able to compensate for changes in the association probability $p_A$.

Similar to our analysis of the sensitivity to the association probability $p_A$, we vary the values of the dissociation distances $x_d$ and $x_M$ by a factor of two in each direction to determine the sensitivity of the search probability to changes in these parameters at both limits of $p_A$. Fig. 6 demonstrates that for $p_A = 1$ and $p_A = 10^{-3}$, variation of the dissociation distances $x_d$ and $x_M$ by a factor of two only introduces a relative difference of up to 13%. We thus conclude that the difference between the approximate and exact values of the dissociation distances $x_d$ and $x_M$ will not significantly affect our results.

V. DISSOCIATIVE SEARCH EFFICIENCY
In this section we systematically compare the efficiency of the dissociative search, which involves multiple dissociation-reassociation cycles of the two clamps, with that of a non-dissociative search, in which the complex of the two clamps searches the DNA via simple diffusion. The goal is to determine whether the dissociative search observed in the experiments by Liu et al. [3] confers an evolutionary advantage of increased success probability over the simpler non-dissociative search. The successful search probability of the dissociative search is calculated numerically using the ADESS approach presented in Sec. IV B, while the successful search probability of the non-dissociative search is given analytically by Eq. (4).
FIG. 6. (color online) Comparison of ADESS results for factor-of-two variations in the macroscopic and microscopic dissociation distances. (a) is calculated with $p_A = 1$ and (b) with $p_A = 0.001$.

A. Dissociative and non-dissociative searches result in similar single search efficiency for experimental diffusion constants
Fig. 7 shows the successful search probability $P^{(*)}_{t_s,1}$ of the dissociative search and $P^{(0)}_{t_s,1}$ of the non-dissociative search for the experimentally determined values of the diffusion constants as a function of distance $x_{\mathrm{meth}}$ from the hemimethylated site. Here, the subscript $t_s$ indicates the search time in seconds, and the subscript 1 indicates that the probability is the success probability for only a single search. Probabilities are shown for various search times $t_s$ within roughly a factor of two of the experimental value of 185 s in both directions. The figure presents results for diffusion constants corresponding to the case where MutH is not associated with MutL and $p_A = 1$ in (a), and for diffusion constants corresponding to the case where MutH is associated with MutL and $p_A = 0.001$ in (b). These are chosen as the two extremes in terms of the differences between dissociative and non-dissociative searches, as the results for MutH parameters at $p_A = 1$ and for non-MutH parameters at $p_A = 0.001$ lie in between the two cases shown.

Surprisingly, the non-dissociative search mechanism somewhat, but systematically, outperforms the dissociative mechanism for this choice of parameters, especially for the case of microscopic association probability $p_A = 0.001$.

B. Dissociative searches confer an advantage across a broad range of diffusion constants
In the crowded in vivo environment, diffusion is likely significantly slower (10-100 fold) than in vitro [17]. Additionally, the diffusion constants, hemimethylated site distances, and association lifetimes of mismatch repair proteins may vary across organisms. In light of these observations, we next characterize the relative effect of the dissociative search mechanism across a wide range of possible diffusion rates. Although we only explicitly vary the diffusion rate, this can be seen as variation of the dimensionless combination $\sqrt{Dt}/x$ on which the probability depends (see Eq. (4)). Thus, we effectively study variations in association time $t_s$ and hemimethylated site distance $x_{\mathrm{meth}}$ as well as diffusion rate.

In order to characterize the effect of the dissociative search mechanism across many possible diffusion rates, times, and distances, we systematically vary diffusion rates and measure the relative advantage conferred by the dissociative mechanism. Fig. 8 shows the relative probability $r$, defined as

$$r \equiv P^{(*)}_{t_s,1} / P^{(0)}_{t_s,1} \tag{14}$$

for $t_s = 185$ s. The darkness of the color indicates the magnitude of the relative probability. The squares that are brown and hatched are those in which the dissociative mechanism lowers the successful search probability ($r < 1$), while the green, unhatched squares are those with $r > 1$, i.e., areas of increased probability due to the dissociative search mechanism. To ensure that smaller differences are visible, relative differences $r > 100$ and $r < 1/100$ are set to $r = 100$ and $r = 1/100$, respectively. The slow diffusion rate $D_{SL}$ is varied along the vertical axis, while the fast diffusion rates $D_S$ and $D_L$ are varied along the horizontal axis. In order to restrict the plot to two dimensions, the ratio between the two fast rates is guided by experiment: either both rates are the same, or they differ by an order of magnitude, roughly corresponding to the situation in the presence and in the absence of MutH, respectively. The framed square corresponds to the in vitro diffusion constants of E. coli, and the dashed lines enclose the range of diffusion constants that are smaller than the in vitro diffusion constants, consistent with the in vivo expectation [17]. The solid (blue) line indicates a reduction of the in vitro diffusion constants by two orders of magnitude while maintaining the in vitro ratio between $D_{SL}$ and $D_S$. It is important to note that although in vivo diffusion rates for E. coli are likely to fall within the region enclosed by the dashed lines, this may not necessarily be the case for other organisms.

FIG. 7. (color online) Successful search probability of dissociative and non-dissociative searches as a function of distance $x_{\mathrm{meth}}$ from the hemimethylated site for different search times. Results in (a) use experimental parameters in the case where MutH is not associated with MutL and $p_A = 1$, and in (b) the case where MutH is associated with MutL and $p_A = 0.001$.

FIG. 8. (color online) Relative successful search probability as a function of diffusion constants for $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column). The former corresponds roughly to the case in which MutH is not present, and the latter corresponds roughly to the case in which MutH is present. The color scale indicates the ratio of ADESS dissociative and analytic non-dissociative probabilities, and is cut off at $10^{2}$ and $10^{-2}$ so that variations of less than an order of magnitude are visible. Ratios greater than $10^{2}$ and less than $10^{-2}$ are set to $10^{2}$ and $10^{-2}$, respectively. The ratios less than one are hatched, while the ratios greater than one are solid. The outlined square indicates the order of magnitude of the experimental diffusion constants, the possible in vivo E. coli diffusion constants are enclosed within the dashed lines, and the non-physical ($D_{SL} > D_S$) regions of the coefficient space are blocked out (in red).

We find that differences between the search mechanisms are most significant for the largest distances from the hemimethylated site. Also, as expected, combinations of slow associated diffusion and fast dissociated diffusion are most favored by the dissociative mechanism (green/unhatched regions of the plot), whereas combinations of fast associated diffusion $D_{SL}$ and slow dissociated diffusion $D_S$ and $D_L$ are least favored by the dissociative mechanism (brown/hatched regions of the plot). The latter case is often physically unrealistic, since the associated clamps must diffuse more slowly than the individual clamps in order to interact with the DNA backbone and recognize the hemimethylated site. Accordingly, the regions of the plot in which $D_{SL} > D_S$ are blocked out in red, eliminating much of the space that would be disfavored by the dissociative mechanism. This renders the area favored by dissociation broad by comparison.

The area that is most highly favored by the dissociative mechanism (the dark green region), however, occurs at very low single search success probabilities.
While the relative probability increase is quite large, it is unclear whether an increase from a very small non-dissociative success probability to a still small dissociative success probability would cross some threshold probability below which failure of mismatch repair may negatively affect the organism. This point is emphasized by the inclusion of the absolute single non-dissociative search probability on the vertical axis (since this probability only depends on $D_{SL}$, it remains constant as one moves across the plot horizontally).

C. Multiple searches emphasize low probability single search differences
Data published by Acharya et al., Graham et al., and Hombauer et al. [18–20] suggest that the DNA mismatch repair process involves multiple MutS-MutL(-MutH) searches for the hemimethylated site. Thus, the cumulative probability for multiple low probability searches may result in a physiologically relevant success probability for the overall search process. In order to approximate the effect of multiple searches, we need to calculate the probability that at least one search is successful. This quantity will be referred to as the overall successful search probability. Although the proteins involved in separate searches are in principle able to interact with each other, accounting for these interactions is beyond the scope of this study. Instead, we hope to gain at least qualitative insight into the overall search probability under the assumption that the individual searches are independent. Under this assumption,

$$P_{t_s,n_s} = 1 - (1 - P_{t_s,1})^{n_s}, \tag{15}$$

where $P_{t_s,n_s}$ is the overall search probability, $P_{t_s,1}$ is the single search probability, and $n_s$ is the total number of searches.

Figs. 9 and 10 show diffusion space scans of

$$\delta P_{t_s,n_s} \equiv P^{(*)}_{t_s,n_s} - P^{(0)}_{t_s,n_s}, \tag{16}$$

indicated by the coloring/hatching for $n_s = 3$ and $n_s = 10$ searches, respectively. Note that in these figures the difference between the two probabilities, rather than their ratio, is chosen to avoid overemphasizing large relative changes between two otherwise small probabilities.

As in Fig. 8, the physically unrealistic regions are blocked out, the probable region in which E. coli diffusion constants reside is enclosed in the dotted lines, and the approximate in vitro E. coli diffusion constants are indicated by the blue square. Since probability differences are shown in the colormap, the probability of the non-dissociative search is omitted from the vertical axis.

FIG. 9. (color online) Diffusion constant space probability difference scan for searches by $n_s = 3$ protein complexes in the cases $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column). The former corresponds roughly to the case in which MutH is not present, and the latter corresponds roughly to the case in which MutH is present. The color scale indicates the absolute difference between the ADESS dissociative and analytic non-dissociative probabilities. Differences less than zero are hatched, while differences greater than zero are solid. The square outlined in blue indicates the order of magnitude of the experimental diffusion constants, the possible in vivo E. coli diffusion constants are enclosed within the dotted lines, and the non-physical ($D_{SL} > D_S$) regions of the coefficient space are blocked out (in red).

Figs. 9 and 10 demonstrate that there is a much broader range of diffusion constants, and therefore hemimethylated site distances and association times, for which the dissociative search mechanism is beneficial for mismatch repair hemimethylated site searches as compared to pure diffusion. For 10 searches, the absolute difference in probability approaches $\delta P_{t_s,10} = 1$ for the cases in which dissociation is most favorable, whereas for 3 searches the maximum difference in probability is more modest, with $\delta P_{t_s,3} \approx 0.5$. The case with 3 searches, however, exhibits a larger regime in which the dissociation mechanism is meaningfully beneficial.
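Eqs. (15) and (16) are simple enough to evaluate directly. The following minimal Python sketch does so under the same independence assumption; the function names and the single-search probabilities used in the example loop are hypothetical placeholders chosen for illustration, not values from the ADESS calculation.

```python
def overall_success_probability(p_single: float, n_searches: int) -> float:
    """Eq. (15): probability that at least one of n independent searches succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_searches < 0:
        raise ValueError("n_searches must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_searches


def probability_difference(p_dissociative: float, p_non_dissociative: float,
                           n_searches: int) -> float:
    """Eq. (16): dissociative minus non-dissociative overall success probability."""
    return (overall_success_probability(p_dissociative, n_searches)
            - overall_success_probability(p_non_dissociative, n_searches))


# Hypothetical single-search probabilities (illustrative only, not model output).
P_DISSOCIATIVE = 0.1
P_NON_DISSOCIATIVE = 0.001

for n_s in (1, 3, 10):
    delta = probability_difference(P_DISSOCIATIVE, P_NON_DISSOCIATIVE, n_s)
    print(f"n_s = {n_s:2d}: delta P = {delta:.3f}")
```

For these placeholder inputs, a single-search advantage that is small in absolute terms becomes a substantial absolute difference after several independent searches.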
FIG. 10. (color online) Diffusion constant space probability difference scan for searches by $n_s = 10$ protein complexes in the cases $D_S = D_L/10$, $p_A = 1$ (left column) and $D_S = D_L$, $p_A = 0.001$ (right column). The former corresponds roughly to the case in which MutH is not present, and the latter corresponds roughly to the case in which MutH is present. The color scale indicates the absolute difference between the ADESS dissociative and analytic non-dissociative probabilities. Differences less than zero are hatched, while differences greater than zero are solid. The square outlined in blue indicates the order of magnitude of the experimental diffusion constants, the possible in vivo E. coli diffusion constants are enclosed within the dotted lines, and the non-physical ($D_{SL} > D_S$) regions of the coefficient space are blocked out (in red).

VI. CONCLUSIONS
Experiments by Liu et al. [3] observed repeated association and dissociation between MutS and MutL sliding clamps involved in identification of a hemimethylated site during DNA mismatch repair in E. coli. This naturally raises the question of whether locally searching the DNA in the associated state and then quickly diffusing to a different location on the DNA when dissociated actually provides an advantage to the search process. Here, we model the dissociative search process, calculate the probability that searching DNA mismatch repair proteins successfully locate the hemimethylated site, and compare the success rate of this dissociative search to the success rate of a simple diffusive search. We find that both search mechanisms are highly efficient for the majority of observed hemimethylated site distances at measured in vitro diffusion rates. Perhaps somewhat surprisingly, there is a slight disadvantage in terms of single search probability conferred by the dissociative search mechanism for searches at these in vitro rates. We note, however, that there may be variation in diffusion rate, association lifetime, and hemimethylated site distance among different organisms, and that it has been shown that in vivo diffusion can be slower than in vitro diffusion by one or two orders of magnitude [17]. Accordingly, we studied the effect of the dissociative search mechanism across a large range of the parameter space of diffusion rates, association lifetimes, and hemimethylated site distances and found that the dissociative mechanism is either neutral or favorable in most cases. We find the most significant advantages of the dissociative search in the parameter regime where the overall search probabilities (of both the dissociative and the non-dissociative searches) are very small.
While successful search probabilities in the sub-percent range are probably not physiologically meaningful by themselves, we showed that they do become meaningful when taking into account that DNA mismatch repair includes multiple MutS-initiated searches for the hemimethylated site, resulting in a physiologically relevant advantage of the dissociative search mechanism for large regions of the physically realistic parameter space.

It is important to emphasize that our treatments of multiple searches and in vivo diffusion here are necessarily approximate. A more detailed treatment that accounts for the interactions between proteins that are initially involved in "separate" searches may be a fruitful avenue for future research: in principle the base pair stepping simulation is capable of tracking more than two proteins, but the current computational cost is too high. Additionally, it is likely possible to expand the association and dissociation event stepping simulation to account for more than two proteins and the presence of other molecules on the DNA strand. In particular, the presence of other molecules on the DNA strand may provide a spatial constraint that prevents the occurrence of the long-lived dissociation events that decrease the efficiency of the dissociative mechanism. Moreover, we have here assumed that the first encounter of a MutS-MutL complex with a hemimethylated site results in its recognition followed by an incision. If recognition of the hemimethylated site is itself stochastic, this will also reduce the overall search probability. Incorporating this effect into our approach and quantifying its consequences for the search probabilities of the dissociative and non-dissociative searches will be an interesting direction of future research.

Another potential avenue of study is the effect of a more physiological environment on the diffusion constants of the proteins.
We note that the in vivo diffusion constants are likely to be smaller than the measured in vitro coefficients, but we are not able to quantitatively predict the magnitude of this decrease. A study that determines the actual in vivo diffusion constants of mismatch repair proteins could therefore be very useful. Similarly, determination of diffusion constants in systems other than E. coli would be interesting.

We note that in addition to its role in the search for a hemimethylated site, MutL acts as a processivity factor for the DNA helicase UvrD, resulting in the excision that is necessary for the progression of the MMR process [21]. It therefore could be the case that the observed dissociative mechanism is evolutionarily preferred because the dissociation steps allow MutS to load multiple MutL proteins onto the strand, aiding in excision. This alternative hypothesis would be strengthened if further work determines that in vivo search efficiency is not increased by the dissociative mechanism, although it is also possible that the dissociative mechanism serves a dual purpose: both increasing search efficiency and loading multiple MutL proteins onto the DNA strand.

Beyond describing the specifics of the MutS-MutL search process, our approach in this paper is likely to be applicable to other diffusive processes along DNA in biology. For instance, Zessin et al. observe a fast and a slow diffusion rate of proliferating cell nuclear antigen (PCNA), which is a eukaryotic protein similar to a β clamp that also forms a clamp structure during association with DNA [22]. Eukaryotes also exhibit three homologs to both MutS and MutL [6], combinations of which are likely to result in a variety of association/dissociation and diffusion parameters. In this case, the broad parameter space characterized by our analysis may provide insight into MMR in many organisms.

Despite the work still necessary to fully understand the diffusive search process in DNA mismatch repair, we provide a broad characterization of the observed dissociative search mechanism, along with a robust analytical and computational framework with which to study diffusion and interaction of protein clamps in DNA mismatch repair. This framework can provide the basis for generalization to other sliding clamp systems in biology.

VII. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under Grant No. DMR-1719316 to RB and by the National Institutes of Health under Grant Nos. GM129764 and CA067007 to RF.

[1] Juana V. Martin-Lopez and Richard Fishel, "The mechanism of mismatch repair and the functional analysis of mismatch repair defects in Lynch syndrome," Fam. Cancer, 159–168 (2013).
[2] Gloria X. Reyes, Tobias T. Schmidt, Richard D. Kolodner, and Hans Hombauer, "New insights into the mechanism of DNA mismatch repair," Chromosoma, 443–462 (2015).
[3] Jiaquan Liu, Jeungphill Hanne, Brooke M. Britton, Jared Bennett, Daehyung Kim, Jong-Bong Lee, and Richard Fishel, "Cascading MutS and MutL sliding clamps control DNA diffusion to activate mismatch repair," Nature (London), 583–587 (2016).
[4] Jean Y.J. Wang and Winfried Edelmann, "Mismatch repair proteins as sensors of alkylation DNA damage," Cancer Cell, 417–418 (2006).
[5] Ravi R Iyer, Anna Pluciennik, Vickers Burdett, and Paul L Modrich, "DNA mismatch repair: functions and mechanisms," Chem. Rev., 302–323 (2006).
[6] Richard Fishel, "Mismatch repair," J. Biol. Chem., 26395–26403 (2015).
[7] Otto G. Berg, Robert B. Winter, and Peter H. Von Hippel, "Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory," Biochemistry, 6929–6948 (1981).
[8] Ohad Givaty and Yaakov Levy, "Protein sliding along DNA: Dynamics and structural characterization," J. Mol. Biol., 1087–1097 (2009).
[9] M O'Donnell, J Kuriyan, X P Kong, P T Stukenberg, and R Onrust, "The sliding clamp of DNA polymerase III holoenzyme encircles DNA," Mol. Biol. Cell, 953–957 (1992).
[10] Dina Daitchman, Harry M Greenblatt, and Yaakov Levy, "Diffusion of ring-shaped proteins along DNA: case study of sliding clamps," Nucleic Acids Res., 5935–5949 (2018).
[11] Xiang-Peng Kong, Rene Onrust, Mike O'Donnell, and John Kuriyan, "Three-dimensional structure of the β subunit of E. coli DNA polymerase III holoenzyme: A sliding DNA clamp," Cell, 425–437 (1992).
[12] Sidney Redner, A guide to first-passage processes (Cambridge University Press, 2001).
[13] Sanford Lacks and Bill Greenberg, "Complementary specificity of restriction endonucleases of Diplococcus pneumoniae with respect to DNA methylation," J. Mol. Biol., 153–168 (1977).
[14] Stanley Hattman, Joan E Brooks, and Malthi Masurekar, "Sequence specificity of the P1 modification methylase (M·Eco P1) and the DNA methylase (M·Eco dam) controlled by the Escherichia coli dam gene," J. Mol. Biol., 367–380 (1978).
[15] Gail E Geier and Paul Modrich, "Recognition sequence of the dam methylase of Escherichia coli K12 and mode of cleavage of Dpn I endonuclease," J. Biol. Chem., 1408–1413 (1979).
[16] Daniel T. Gillespie, "Exact stochastic simulation of coupled chemical reactions," J. Phys. Chem., 2340–2361 (1977).
[17] Michael C Konopka, Irina A Shkel, Scott Cayley, M Thomas Record, and James C Weisshaar, "Crowding and confinement effects on protein diffusion in vivo," J. Bacteriol., 6115–6123 (2006).
[18] Samir Acharya, Patricia L Foster, Peter Brooks, and Richard Fishel, "The coordinated functions of the E. coli MutS and MutL proteins in mismatch repair," Mol. Cell, 233–246 (2003).
[19] William J Graham, Christopher D Putnam, Richard D Kolodner, et al., "The properties of Msh2–Msh6 ATP binding mutants suggest a signal amplification mechanism in DNA mismatch repair," J. Biol. Chem., 18055–18070 (2018).
[20] Hans Hombauer, Christopher S Campbell, Catherine E Smith, Arshad Desai, and Richard D Kolodner, "Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates," Cell, 1040–1053 (2011).
[21] Jiaquan Liu, Ryanggeun Lee, Brooke M Britton, James A London, Keunsang Yang, Jeungphill Hanne, Jong-Bong Lee, and Richard Fishel, "MutL sliding clamps coordinate exonuclease-independent Escherichia coli mismatch repair," Nat. Commun., 1–15 (2019).
[22] Patrick JM Zessin, Anje Sporbert, and Mike Heilemann, "PCNA appears in two populations of slow and fast diffusion with a constant ratio throughout S-phase in replicating mammalian cells," Sci. Rep., 18779 (2016).
[23] Eli Ben-Naim, Paul L. Krapivsky, and Sidney Redner, "Random walk/diffusion," http://physics.bu.edu/~redner/542/book/rw.pdf (2008), accessed: 2020-06-22.
[24] Paul Krapivsky, Sidney Redner, and Eli Ben-Naim, A Kinetic View of Statistical Physics (Cambridge University Press, 2010).
[25] M.V. Smoluchowski, "Versuch einer mathematischen Theorie der Koagulationskinetik kolloider Lösungen," Z. Phys. Chem., 129–168 (1917).
[26] Laura Manelyte, Claus Urbanke, Luis Giron-Monzon, and Peter Friedhoff, "Structural and functional analysis of the MutS C-terminal tetramerization domain," Nucleic Acids Res., 5270–5279 (2006).
[27] Michelle Grilley, Katherine M Welsh, SS Su, and Paul Modrich, "Isolation and characterization of the Escherichia coli MutL gene product," J. Biol. Chem., 1000–1004 (1989).
[28] Samir Acharya, Patricia L Foster, Peter Brooks, and Richard Fishel, "The coordinated functions of the E. coli MutS and MutL proteins in mismatch repair," Mol. Cell, 233–246 (2003).

Appendix A: Time and location of re-association
In this appendix we derive the probability densities for the time to reassociation and the reassociation location of two clamps once they have dissociated from each other. These distributions are used in the ADESS approach to update the time and position after a microscopic excursion of the clamps.
1. Independent diffusion of two sliding clamps
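While the two clamps are diffusing independently, the state of the system is given by the positions $x_S$ and $x_L$ of the MutS and the MutL clamp along the DNA, respectively. The joint probability distribution for the two clamps follows the diffusion equation

$$\frac{\partial p(x_S, x_L | t)}{\partial t} = D_S \frac{\partial^2 p(x_S, x_L | t)}{\partial x_S^2} + D_L \frac{\partial^2 p(x_S, x_L | t)}{\partial x_L^2}. \tag{A1}$$

By analogy to the Schrödinger equation for a two-body quantum mechanical problem, this equation can be rewritten in terms of relative and "center-of-mass" coordinates. Since the diffusion constants play the role of inverse masses, each position in the "center-of-mass" coordinate is weighted by the diffusion constant of the other clamp, which makes the mixed derivative term cancel. In particular, substituting

$$x_{\mathrm{CM}} \equiv \frac{D_L x_S + D_S x_L}{D_S + D_L}, \tag{A2}$$
$$x_{\mathrm{rel}} \equiv x_S - x_L, \tag{A3}$$
$$D_{\mathrm{CM}} \equiv \frac{D_S D_L}{D_S + D_L}, \text{ and} \tag{A4}$$
$$D_{\mathrm{rel}} \equiv D_S + D_L \tag{A5}$$

yields

$$\frac{\partial p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial t} = D_{\mathrm{CM}} \frac{\partial^2 p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial x_{\mathrm{CM}}^2} + D_{\mathrm{rel}} \frac{\partial^2 p(x_{\mathrm{CM}}, x_{\mathrm{rel}} | t)}{\partial x_{\mathrm{rel}}^2}, \tag{A6}$$

which describes independent diffusion of the "center of mass" coordinate $x_{\mathrm{CM}}$ with diffusion constant $D_{\mathrm{CM}}$ and of the relative coordinate $x_{\mathrm{rel}}$ with diffusion constant $D_{\mathrm{rel}}$.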
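The decoupling into "center-of-mass" and relative coordinates can be checked numerically. The sketch below samples free diffusion of both clamps and estimates the variances and the covariance of the two new coordinates, using the weighting $x_{\mathrm{CM}} = (D_L x_S + D_S x_L)/(D_S + D_L)$ for which the mixed derivative term vanishes. The function names and the diffusion constants are hypothetical and in arbitrary units; this is an illustration, not part of the ADESS code.

```python
import random


def cm_and_relative(x_s: float, x_l: float, d_s: float, d_l: float):
    """Map clamp positions to (x_CM, x_rel); each position is weighted by the
    diffusion constant of the *other* clamp so the two coordinates decouple."""
    x_cm = (d_l * x_s + d_s * x_l) / (d_s + d_l)
    x_rel = x_s - x_l
    return x_cm, x_rel


def cm_and_relative_diffusion_constants(d_s: float, d_l: float):
    """Eqs. (A4) and (A5): returns (D_CM, D_rel)."""
    return d_s * d_l / (d_s + d_l), d_s + d_l


def sample_statistics(d_s, d_l, t, n_samples=50_000, seed=0):
    """Sample free 1D diffusion of both clamps for a time t (both start at 0)
    and return (var(x_CM), var(x_rel), cov(x_CM, x_rel))."""
    rng = random.Random(seed)
    cms, rels = [], []
    for _ in range(n_samples):
        # Free diffusion: the displacement is Gaussian with variance 2 D t.
        x_s = rng.gauss(0.0, (2.0 * d_s * t) ** 0.5)
        x_l = rng.gauss(0.0, (2.0 * d_l * t) ** 0.5)
        x_cm, x_rel = cm_and_relative(x_s, x_l, d_s, d_l)
        cms.append(x_cm)
        rels.append(x_rel)
    mean_cm = sum(cms) / n_samples
    mean_rel = sum(rels) / n_samples
    var_cm = sum((c - mean_cm) ** 2 for c in cms) / n_samples
    var_rel = sum((r - mean_rel) ** 2 for r in rels) / n_samples
    cov = sum((c - mean_cm) * (r - mean_rel)
              for c, r in zip(cms, rels)) / n_samples
    return var_cm, var_rel, cov


# Hypothetical diffusion constants in arbitrary units (not the measured values).
D_S, D_L, T = 1.0, 10.0, 2.0
d_cm, d_rel = cm_and_relative_diffusion_constants(D_S, D_L)
var_cm, var_rel, cov = sample_statistics(D_S, D_L, T)
print(f"var(x_CM)  = {var_cm:.3f}  expected 2 D_CM t  = {2 * d_cm * T:.3f}")
print(f"var(x_rel) = {var_rel:.3f}  expected 2 D_rel t = {2 * d_rel * T:.3f}")
print(f"cov(x_CM, x_rel) = {cov:.3f}  expected 0")
```

Because the two coordinates are jointly Gaussian, a vanishing covariance is equivalent to the independence stated after Eq. (A6).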
2. Time of reassociation
In our model, the microscopic dissociation of the two clamps results in them being separated by the microscopic dissociation distance $x_d$. Since the relative and center-of-mass positions diffuse independently, the time to reassociation is the time the freely diffusing relative coordinate $x_{\mathrm{rel}}$ takes to reach $x_{\mathrm{rel}} = 0$ when starting at $x_{\mathrm{rel}} = x_d$. This problem is mathematically equivalent to the problem of the associated clamps reaching the hemimethylated site $x_{\mathrm{meth}}$ after starting at some position $x_0$. We can thus mirror image Eq. (3) (since $x_{\mathrm{rel}} = 0$ provides a left boundary for this problem while $x_{\mathrm{meth}}$ provided a right boundary in the context of Eq. (3)) and replace $x_0$ with $x_d$, $x_{\mathrm{meth}}$ with 0, and $D_{SL}$ with $D_{\mathrm{rel}}$ to obtain

$$P(t \,|\, x_{\mathrm{rel}} > 0) = \operatorname{erf}\left( \frac{x_d}{\sqrt{4 D_{\mathrm{rel}} t}} \right) \tag{A7}$$

for the probability that at time $t$ the two clamps starting at an initial distance of $x_d$ have not yet touched. The probability density associated with the return of the distance between the two clamps to 0 from a distance of $x_d$ is therefore given by the negative derivative of this probability, i.e.,

$$p_{\mathrm{dissoc}}(t) = -\frac{\partial P(t \,|\, x_{\mathrm{rel}} > 0)}{\partial t} = \frac{x_d}{\sqrt{4 \pi D_{\mathrm{rel}} t^3}} \exp\left[ -\frac{x_d^2}{4 D_{\mathrm{rel}} t} \right]. \tag{A8}$$
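As an illustration of how a reassociation time distributed according to Eq. (A8) could be drawn, the sketch below uses the fact that if $Z$ is a standard normal variable, then $T = x_d^2 / (2 D_{\mathrm{rel}} Z^2)$ has exactly the survival probability of Eq. (A7). This sampling trick is an assumption made for the example; the text does not state how the ADESS implementation draws these times. All names and parameter values are hypothetical and in arbitrary units.

```python
import math
import random


def survival_probability(t: float, x_d: float, d_rel: float) -> float:
    """Eq. (A7): probability that the clamps have not yet met at time t."""
    return math.erf(x_d / math.sqrt(4.0 * d_rel * t))


def reassociation_time_density(t: float, x_d: float, d_rel: float) -> float:
    """Eq. (A8): first-passage time density of x_rel from x_d to 0."""
    return (x_d / math.sqrt(4.0 * math.pi * d_rel * t ** 3)
            * math.exp(-x_d ** 2 / (4.0 * d_rel * t)))


def sample_reassociation_time(x_d: float, d_rel: float,
                              rng: random.Random) -> float:
    """Draw one reassociation time distributed according to Eq. (A8).

    For standard normal Z, T = x_d^2 / (2 D_rel Z^2) satisfies
    P(T > t) = P(|Z| < x_d / sqrt(2 D_rel t)) = erf(x_d / sqrt(4 D_rel t)).
    """
    z = 0.0
    while z == 0.0:  # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    return x_d ** 2 / (2.0 * d_rel * z * z)


# Hypothetical parameters in arbitrary units (not the measured values).
X_D, D_REL, T_CHECK = 1.0, 1.0, 1.0
rng = random.Random(0)
times = [sample_reassociation_time(X_D, D_REL, rng) for _ in range(20_000)]
fraction_unmet = sum(t > T_CHECK for t in times) / len(times)
print(f"sampled P(T > {T_CHECK}) = {fraction_unmet:.3f}")
print(f"Eq. (A7)            = {survival_probability(T_CHECK, X_D, D_REL):.3f}")
```

Because the density in Eq. (A8) decays as $t^{-3/2}$ at long times, the sampled times have no finite mean, so any simulation built on this sampler needs an external cutoff such as the total search time $t_s$.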
3. Location of reassociation
Since at the time of reassociation the two clamps are at the same location, all we have to do to find the location of this event is to follow the motion of the center of mass coordinate $x_{\rm CM}$ during the excursion. Since this is a free diffusion, the probability density for the location of the meeting point $x$ of the two clamps after a time $t$, given that they dissociated at some location $x_0$, is

$$p_{\rm return}(x \mid x_0, t) = \frac{1}{\sqrt{4 \pi D_{\rm CM} t}} \exp\left[ -\frac{(x - x_0)^2}{4 D_{\rm CM} t} \right]. \tag{A9}$$

Appendix B: Microscopic Parameter Calculation
The following are the full calculations used to determine the microscopic protein dynamics from experimental observables. In particular, we calculate the microscopic diffusion constant, $D_{SL,\mu}$, and the microscopic association lifetime, $\tau_{A,\mu}$. The calculations of $P_M$ and $\tau(x)$ closely follow [23], a web-published early draft of [24].
1. MutS-MutL Association Lifetime
First, we calculate the microscopic association lifetime. Consider first the macroscopic association lifetime, which can be written as

$$\tau_{A,M} = \tau_{A,\mu} \left[ (\langle N_A \rangle - 1) p_A + 1 \right] + \tau_R (\langle N_A \rangle - 1) + \tau_M, \tag{B1}$$

where $N_A$ is the number of times the clamps are microscopically adjacent during a single macroscopic association, $p_A$ is the probability of microscopic association given that the clamps are adjacent, $\tau_R$ is the average time to return to the adjacent state, and $\tau_M$ is the average time to reach distance $x_M$ without returning to the adjacent state (i.e. the average time to macroscopic dissociation). Note that removing a single adjacent state from the factor multiplied by $p_A$ and multiplying it directly by $\tau_{A,\mu}$ ensures that there is at least one microscopic association in every macroscopic association. This must be true physically, since different diffusion rates are observed during macroscopic association.

Consider $N_A$ for a complex starting in the aggregate state:

$$\begin{aligned} P(N_A = 1) &= P_M \\ P(N_A = 2) &= (1 - P_M) P_M \\ P(N_A = 3) &= (1 - P_M)^2 P_M \\ P(N_A) &= (1 - P_M)^{N_A - 1} P_M \end{aligned} \tag{B2}$$

where $P_M$ is the probability for a newly microscopically dissociated complex to go to $x_M$. Thus,

$$\langle N_A \rangle = P_M \sum_{N_A = 1}^{\infty} N_A (1 - P_M)^{N_A - 1} = \frac{1}{P_M}. \tag{B3}$$

In order to determine $P_M$ we first consider $P_M$ as a function of the distance between the clamps, which we will denote as $x$ for the remainder of this subsection to avoid the more cumbersome notation of $x_{\rm rel}$ used in the rest of the manuscript. Evaluation of this function at $x = x_d$ will give $P_M$. ($P_M(x)$ will refer to the probability to go to $x_M$ from some position $x$ without visiting 0, while $P_M \equiv P_M(x_d)$ refers to the probability to go to $x_M$ from $x_d$.) Additionally, since the clamps diffuse with intermittent DNA contact, $P_M(x)$ will be calculated under the assumption that the distance between clamps diffuses continuously. This allows us to write

$$\begin{aligned} P_M(x) &= \tfrac{1}{2} P_M(x + \delta x) + \tfrac{1}{2} P_M(x - \delta x) \\ 0 &= \frac{P_M(x + \delta x) - 2 P_M(x) + P_M(x - \delta x)}{\delta x^2} \end{aligned} \tag{B4}$$

and therefore

$$\frac{\partial^2 P_M(x)}{\partial x^2} = 0 \tag{B5}$$

with the boundary conditions

$$P_M(0) = 0, \qquad P_M(x_M) = 1. \tag{B6}$$

The unique solution of this differential equation is

$$P_M(x) = \frac{x}{x_M} \tag{B7}$$

and thus

$$P_M \equiv P_M(x_d) = \frac{x_d}{x_M}, \tag{B8}$$

where $x_d$ is the separation of the clamps immediately following dissociation. Therefore we conclude that

$$\langle N_A \rangle = x_M / x_d. \tag{B9}$$

In order to compute the microscopic association lifetime $\tau_{A,\mu}$ from Eq. (B1), it is also necessary to compute the average return time $\tau_R$ and the average time $\tau_M$ to reach $x_M$. To this end, consider the average time $\tau(x)$ for the distance between the clamps to reach either 0 or $x_M$ given that the starting distance is $x$:

$$\tau(x) = \sum_{\rm paths} t_p(x) P_p(x), \tag{B10}$$

where $t_p(x)$ is the time for a path of length $x$ and $P_p(x)$ is the probability of such a path. Consideration of the effect of a single infinitesimal time step $\delta t$ allows us to write

$$\begin{aligned} \tau(x) &= \sum_{\rm paths} t_p(x) P_p(x) \\ &= \sum_{\rm paths} \left[ \tfrac{1}{2} t_p(x + \delta x) P_p(x + \delta x) + \tfrac{1}{2} t_p(x - \delta x) P_p(x - \delta x) \right] + \delta t \\ &= \tfrac{1}{2} \tau(x + \delta x) + \tfrac{1}{2} \tau(x - \delta x) + \delta t. \end{aligned} \tag{B11}$$

Thus, division by the square of some small spatial step $\delta x$ yields

$$-\frac{2 \delta t}{\delta x^2} = \frac{\tau(x + \delta x) + \tau(x - \delta x) - 2 \tau(x)}{\delta x^2}. \tag{B12}$$

Therefore,

$$\frac{\partial^2 \tau(x)}{\partial x^2} = -\frac{2 \delta t}{\delta x^2} = -\frac{1}{D_{\rm rel}}, \tag{B13}$$

where we write the right hand side in terms of the diffusion constant $D_{\rm rel} = D_S + D_L$. The boundary conditions

$$\tau(0) = 0, \qquad \tau(x_M) = 0 \tag{B14}$$

allow us to conclude

$$\tau(x) = \frac{x}{2 D_{\rm rel}} (x_M - x). \tag{B15}$$

We now write this quantity in terms of $\tau_R$ and $\tau_M$ as follows:

$$\langle N_A \rangle \tau(x_d) = \tau_R (\langle N_A \rangle - 1) + \tau_M. \tag{B16}$$

Thus, substitution into Eq. (B1) yields

$$\tau_{A,M} = \tau_{A,\mu} \left[ (\langle N_A \rangle - 1) p_A + 1 \right] + \langle N_A \rangle \tau(x_d). \tag{B17}$$

Finally, we can conclude

$$\tau_{A,\mu} = \frac{\tau_{A,M} - \langle N_A \rangle \tau(x_d)}{(\langle N_A \rangle - 1) p_A + 1}, \tag{B18}$$

where $\langle N_A \rangle = x_M / x_d$.
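The chain from the exit probability to the microscopic association lifetime can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not the authors' implementation. It takes $\tau(x) = x (x_M - x)/(2 D_{\rm rel})$, which corresponds to the random-walk convention $D_{\rm rel} = \delta x^2/(2 \delta t)$, and evaluates Eqs. (B9), (B15), and (B18). A small lattice random walk with absorbing sites at 0 and $x_M$ is included as an independent check of Eqs. (B8) and (B15). Function names and all numerical values are hypothetical.

```python
import random


def mean_adjacent_visits(x_d, x_M):
    """<N_A> = x_M / x_d, Eq. (B9)."""
    return x_M / x_d


def mean_exit_time(x, x_M, D_rel):
    """Mean time to reach 0 or x_M when starting from x, Eq. (B15)."""
    return x * (x_M - x) / (2.0 * D_rel)


def microscopic_association_lifetime(tau_AM, x_d, x_M, D_rel, p_A):
    """tau_{A,mu} obtained from the macroscopic lifetime tau_{A,M}, Eq. (B18)."""
    n_A = mean_adjacent_visits(x_d, x_M)
    numerator = tau_AM - n_A * mean_exit_time(x_d, x_M, D_rel)
    return numerator / ((n_A - 1.0) * p_A + 1.0)


def simulate_lattice_walk(n_start, n_max, trials, rng):
    """Symmetric lattice walk with absorbing sites 0 and n_max.

    Returns (fraction of walks absorbed at n_max, mean number of steps).
    """
    hits, total_steps = 0, 0
    for _ in range(trials):
        n, steps = n_start, 0
        while 0 < n < n_max:
            n += 1 if rng.random() < 0.5 else -1
            steps += 1
        hits += (n == n_max)
        total_steps += steps
    return hits / trials, total_steps / trials


if __name__ == "__main__":
    # Illustrative values: lattice spacing x_d = 1 and D_rel = 0.5 give one
    # time unit per step, so step counts can be compared with Eq. (B15).
    x_d, x_M, D_rel = 1.0, 10.0, 0.5
    p_hit, mean_steps = simulate_lattice_walk(1, 10, 20000, random.Random(1))
    print("P_M: simulated", p_hit, "analytic", x_d / x_M)
    print("tau(x_d): simulated", mean_steps, "analytic", mean_exit_time(x_d, x_M, D_rel))
    print("tau_A_mu:", microscopic_association_lifetime(100.0, x_d, x_M, D_rel, 1.0))
```

For these illustrative inputs $\langle N_A \rangle = 10$ and $\tau(x_d) = 9$, so a macroscopic lifetime of 100 time units with $p_A = 1$ corresponds to $\tau_{A,\mu} = (100 - 90)/10 = 1$ time unit.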
2. Microscopic Diffusion Constant
Having computed the microscopic association lifetime, we turn our attention to the microscopic diffusion constant. During macroscopic association, the observable quantity, that is, the diffusion of the "center of mass" of the oscillating dissociative complex, is given by

$$D_{SL,M} = P_A D_{SL,\mu} + P_D D_{\rm CM}, \tag{B19}$$

where $D_{SL,\mu}$ and $D_{\rm CM}$ are the microscopically associated and dissociated complex diffusion rates, respectively, and $D_{SL,M}$ is the measured, macroscopic diffusion rate of the complex. $P_A$ and $P_D$ are the probabilities that the clamps are associated and dissociated, respectively. As argued in Appendix A 1, $D_{\rm CM} = \frac{D_S D_L}{D_S + D_L}$. It follows that the quantity needed for the microscopic model, the microscopic diffusion constant, is given by

$$D_{SL,\mu} = \frac{1}{P_A} \left( D_{SL,M} - P_D \frac{D_S D_L}{D_S + D_L} \right). \tag{B20}$$

Since $D_{SL,M}$, $D_S$, and $D_L$ are measured experimentally, we only need to write $P_A$ and $P_D$ in terms of observable quantities to obtain a value for $D_{SL,\mu}$. In order to do this, we observe that the probabilities that the proteins are microscopically associated and dissociated are given by the average times spent in the associated and dissociated state, respectively, divided by the sum of these times:

$$P_A = \frac{p_A \tau_{A,\mu}}{p_A \tau_{A,\mu} + \tau_R}, \tag{B21}$$

$$P_D = \frac{\tau_R}{p_A \tau_{A,\mu} + \tau_R}, \tag{B22}$$

where $\tau_{A,\mu}$ is the microscopic association time, and $\tau_R$ is the average time to return to the adjacent state. $\tau_{A,\mu}$ is multiplied by the association probability, $p_A$, because there are $1/p_A$ returns with time $\tau_R$ for every microscopic association. Note that $\tau_M$ does not enter these equations. This is because the final walk from $x_{\rm rel} = 0$ to $x_M$ has only a minor influence on the experimentally measured diffusion rate, as $\tau_M$ represents only the last $\sim x_M^2 / D_{\rm rel}$ of the $\approx 30$ s macroscopic association.

Eq. (B16) gives an expression for $\tau_R$ in terms of $\tau_M$, so in order to determine $\tau_R$ we must first compute $\tau_M$. Fortunately, we can calculate $\tau_M$ in a way that is analogous to the calculation of $\tau(x)$ in the previous section. Going back to a discrete picture, during a random walk that results in a separation distance $x = x_M$ before reaching $x = 0$, the first step after dissociation is from $x = x_d$ to $x = 2 x_d$. Thus,

$$\tau_M = \tau_{\rm step} + \langle N_{x_d} \rangle \tau_{x_d,M}(2 x_d), \tag{B23}$$

where $\tau_{x_d,M}(x)$ is the average time for the distance between the clamps to reach either $x_d$ or $x_M$ and $N_{x_d}$ is the number of times the distance reaches $x_d$ before going to $x_M$. Modifying the calculation of $\tau(x)$ with the appropriate boundary conditions

$$\tau_{x_d,M}(x_d) = 0, \qquad \tau_{x_d,M}(x_M) = 0 \tag{B24}$$

we find

$$\tau_{x_d,M}(x) = \frac{x - x_d}{2 D_{\rm rel}} (x_M - x), \tag{B25}$$

which yields

$$\tau_M = \frac{x_d^2}{2 D_{\rm rel}} + \langle N_{x_d} \rangle \frac{x_d}{2 D_{\rm rel}} (x_M - 2 x_d). \tag{B26}$$

Similarly, $\langle N_{x_d} \rangle$ can be computed in the same way that $\langle N_A \rangle$ was found earlier. In particular,

$$\langle N_{x_d} \rangle = \frac{1}{P_{x_d,M}}, \tag{B27}$$

where $P_{x_d,M}$ is the probability that the distance goes to $x_M$ before $x_d$ from distance $2 x_d$. Using Eqs. (B4) and (B5) with boundary conditions

$$P_M(x_d) = 0, \qquad P_M(x_M) = 1 \tag{B28}$$

we get

$$P_{x_d,M} = \frac{x - x_d}{x_M - x_d}. \tag{B29}$$

Finally, since we assume that the walk starts at $x = 2 x_d$,

$$\langle N_{x_d} \rangle = \frac{x_M - x_d}{x_d}. \tag{B30}$$

Appropriate substitutions and algebraic manipulations yield

$$D_{SL,\mu} = D_{SL,M} - \delta (D_{\rm CM} - D_{SL,M}) \tag{B31}$$

with

$$\delta = R_x R_\tau \, \frac{ \left( \frac{2 - 3 R_x}{1 - R_x} \right) \left( 1 + \frac{R_x}{p_A (1 - R_x)} \right) }{ \frac{2}{1 - R_x} - R_\tau } \tag{B32}$$

$$\approx R_x R_\tau \left( 1 + \frac{R_x}{p_A} \right), \tag{B33}$$

where $R_x \equiv \frac{x_d}{x_M}$ and $R_\tau \equiv \frac{x_M^2}{\tau_{A,M} D_{\rm rel}}$ are both small for the specific values of the parameters, and the approximation in the second line holds since $R_x \ll R_\tau \ll 1$. In the following section we derive a lower limit on $p_A$, with $p_A \le 1$ as the upper limit. For the experimental values of the parameters and $p_A$ at this lower limit, the correction $\delta (D_{\rm CM} - D_{SL,M})$ is $\sim 50$ bp$^2$/s, a small fraction of $D_{SL,M}$, and for $p_A = 1$ the correction is smaller still. Thus,

$$D_{SL,\mu} \approx D_{SL,M}. \tag{B34}$$
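The "appropriate substitutions" leading to the correction factor $\delta$ can be traced numerically. The sketch below is a hedged illustration, not the authors' code. It uses $P_A + P_D = 1$ in Eq. (B20), so that $\delta = P_D/P_A = \tau_R/(p_A \tau_{A,\mu})$, and builds $\tau_R$ and $\tau_{A,\mu}$ from Eqs. (B16), (B18), (B26), and (B30). The closed form in `delta_closed_form` is a reconstruction under the assumption $\tau_{\rm step} = x_d^2/(2 D_{\rm rel})$ and the convention $D_{\rm rel} = \delta x^2/(2 \delta t)$. All names and parameter values are hypothetical and are not the experimental parameters.

```python
def delta_from_times(tau_AM, x_d, x_M, D_rel, p_A):
    """delta = P_D / P_A = tau_R / (p_A * tau_{A,mu}), built from the mean times."""
    two_D = 2.0 * D_rel
    n_A = x_M / x_d                                   # Eq. (B9)
    tau_xd = x_d * (x_M - x_d) / two_D                # Eq. (B15) at x = x_d
    n_xd = (x_M - x_d) / x_d                          # Eq. (B30)
    tau_M = x_d**2 / two_D + n_xd * x_d * (x_M - 2.0 * x_d) / two_D  # Eq. (B26)
    tau_R = (n_A * tau_xd - tau_M) / (n_A - 1.0)      # Eq. (B16) solved for tau_R
    tau_A_mu = (tau_AM - n_A * tau_xd) / ((n_A - 1.0) * p_A + 1.0)   # Eq. (B18)
    return tau_R / (p_A * tau_A_mu)


def delta_closed_form(R_x, R_tau, p_A):
    """Closed form of delta in terms of R_x and R_tau (assumed form of Eq. (B32))."""
    numerator = (2.0 - 3.0 * R_x) / (1.0 - R_x) * (1.0 + R_x / (p_A * (1.0 - R_x)))
    return R_x * R_tau * numerator / (2.0 / (1.0 - R_x) - R_tau)


def delta_approx(R_x, R_tau, p_A):
    """Leading-order form for R_x << 1 and R_tau << 1, Eq. (B33)."""
    return R_x * R_tau * (1.0 + R_x / p_A)


def microscopic_diffusion_constant(D_SL_M, D_S, D_L, delta):
    """D_{SL,mu} = D_{SL,M} - delta * (D_CM - D_{SL,M}), Eq. (B31)."""
    D_CM = D_S * D_L / (D_S + D_L)
    return D_SL_M - delta * (D_CM - D_SL_M)


if __name__ == "__main__":
    # Illustrative values in arbitrary units.
    x_d, x_M, D_rel, tau_AM, p_A = 1.0, 100.0, 0.5, 1.0e6, 0.1
    R_x, R_tau = x_d / x_M, x_M**2 / (tau_AM * D_rel)
    print("delta from mean times:", delta_from_times(tau_AM, x_d, x_M, D_rel, p_A))
    print("delta closed form    :", delta_closed_form(R_x, R_tau, p_A))
    print("delta approximation  :", delta_approx(R_x, R_tau, p_A))
```

For small $R_x$ and $R_\tau$ the three values agree closely, which is the regime in which $D_{SL,\mu} \approx D_{SL,M}$.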
3. Approximation of association probability lower limit
The lower limit of the association probability can be calculated under the assumption that $p_A \ge P_{\rm assoc,soln}$, where $P_{\rm assoc,soln}$ is the probability that a MutL in solution colliding with a DNA-bound MutS will associate. As discussed in the main text of the paper, it should be easier for MutL and MutS to bind when they are both already somewhat aligned by their formation of clamp structures on the DNA.

The association probability $P_{\rm assoc,soln}$ is given by the ratio

$$P_{\rm assoc,soln} = k_{\rm on,exp} / k_{\rm on,max}, \tag{B35}$$

where $k_{\rm on,exp}$ is the experimental rate at which MutL associates with MutS on DNA from solution, and $k_{\rm on,max}$ is the rate at which MutS and MutL collide (e.g. the diffusion limited rate).

We first focus on the diffusion limited rate. The Smoluchowski equation yields an expression for the diffusion-limited rate constant for two uniform spheres [25]:

$$k_{\rm on,max} = 4 \pi D R, \tag{B36}$$

where $D$ is the relative diffusion constant and $R$ is the reaction radius. Manelyte et al. give the MutS Stokes radius $R_{S,S}$, and the MutL Stokes radius $R_{S,L}$ has likewise been reported, so that $R \approx R_{S,S} + R_{S,L} \sim 10$ nm.

To determine the relative diffusion constant $D$, we use the measured MutS diffusion constant along the DNA strand, $D_S$, and the Stokes-Einstein diffusion constant of MutL in water at room temperature, $D_{L,{\rm soln}} = \frac{k_B T}{6 \pi \eta R_{S,L}} \gg D_S$. Thus $D \approx D_{L,{\rm soln}}$, which together with Eq. (B36) sets the diffusion limited on rate $k_{\rm on,max}$.

We can now turn to the experimental on rate. Liu et al. do not measure this rate directly, but they do find the fraction $F_{SL}$ of an ensemble of DNAs on which MutS-MutL complexes associate in equilibrium to be high enough to perform the experiment, i.e., a significant fraction of their constructs shows association of a MutL at their experimental concentration of MutL [3]. We thus choose a worst case estimate for $F_{SL}$ and also consider $F_{SL} = 1$. Together with the dissociation constant $K_{d,S}$ [28], the measured MutL off rate $k_{{\rm off},L} \sim 1/\tau_{{\rm on},L} \approx 1/(850\ {\rm s})$ can be used to estimate the desired on rate. The fraction of DNAs with MutS-MutL associated is given by

$$F_{SL} = \frac{[{\rm SLDNA}]}{[{\rm DNA}]} = \frac{k_{{\rm on},L} [L] [{\rm SDNA}]}{k_{{\rm off},L} [{\rm DNA}]} \tag{B38}$$

and thus

$$k_{{\rm on},L} = \frac{k_{{\rm off},L} F_{SL} K_{d,S}}{[L][S]}. \tag{B39}$$

For the reported $[L] \approx 20$ nM and $[S] \approx 10$ nM, Eq. (B39) gives one estimate of $k_{{\rm on},L}$ for the worst case value of $F_{SL}$ and a second for $F_{SL} = 1$. Inserting these on rates into Eq. (B35) yields $P_{\rm assoc,soln}$ and therefore a lower limit on $p_A$ in each of the two cases, with $p_A \le 1$ as the upper limit in both.

Appendix C: Base pair stepping simulation