Effects of intersegmental transfers on target location by proteins
arXiv [q-bio.SC]
Michael Sheinman and Yariv Kafri
Department of Physics, Technion-Israel Institute of Technology, 32000 Haifa, Israel
(Dated: November 2, 2018)

We study a model for a protein searching for a target, using facilitated diffusion, on a DNA molecule confined in a finite volume. The model includes three distinct pathways for facilitated diffusion: (a) sliding, in which the protein diffuses along the contour of the DNA; (b) jumping, where the protein travels between two sites along the DNA by three-dimensional diffusion; and (c) intersegmental transfer, which allows the protein to move from one site to another by transiently binding both at the same time. The typical search time is calculated using scaling arguments which are verified numerically. Our results suggest that the inclusion of intersegmental transfer (i) decreases the search time considerably, (ii) makes the search time much more robust to variations in the parameters of the model and (iii) moves the optimal search time to a regime very different from that found for models which ignore intersegmental transfers. The behavior we find is rich and shows surprising dependencies, for example, on the DNA length.
I. INTRODUCTION
Many biological processes depend on the ability of proteins to locate specific DNA sequences on time scales ranging from seconds to minutes. Examples include gene expression and repression, DNA replication and others [1]. Naively, one might expect the protein to search for its target using only three-dimensional diffusion. (In this paper we only consider proteins whose motion is diffusive and not directed; directed motion could result from consumption of, for example, chemical energy and is discussed in [2].) Neglecting interactions of the protein with the environment and the DNA (apart from the target site) one then finds, using results first obtained by Smoluchowski [3], that the average search time, $t_{\rm search}$, is given by

$$t_{\rm search} \sim \frac{\Lambda^3}{D_3 r}. \qquad (1)$$
Here $D_3$ is the three-dimensional diffusion constant of the protein, $r$ is the target size and $\Lambda^3$ is the volume that needs to be searched. Assuming a target size of the order of a base pair, $r \approx$ . nm, a typical nucleus (or bacterium) size of $\Lambda \sim$ nm and using the measured three-dimensional diffusion coefficient for a GFP protein in vivo, $D_3 \sim$ nm /s [4], one finds $t_{\rm search}$ of the order of hundreds of seconds. If $N$ proteins are searching for the same target the search time is given by $t^{N}_{\rm search} \simeq t_{\rm search}/N$. (The relation between the search time $t_{\rm search}$ for one protein and the search time $t^{N}_{\rm search}$ for $N$ proteins remains unchanged throughout the paper.) This suggests that about 10 proteins could find a target in reasonable times for cells to function properly.

In real systems, due to the interactions of proteins with non-specific DNA sequences and the environment [5], the picture is more complex. Indeed, in vitro experiments have suggested that mechanisms other than three-dimensional diffusion are used by many proteins to locate their targets [6, 7]. These strategies have been studied and debated extensively both in the context of in vivo [8, 9, 10, 11] and in vitro systems [8, 10, 12, 13, 14, 15, 16] and are believed, in general, to allow for search times which are shorter than that given by Eq. (1).

Historically, the first strategy that was proposed combines one-dimensional diffusion (sliding) over the DNA with intervals of three-dimensional diffusion (typically called jumping in this context) [8, 17] (see Fig. 1). Each individual search mechanism, when applied alone, has shortcomings and advantages relative to the other. When using only three-dimensional diffusion, the number of new three-dimensional positions probed grows linearly in time but the protein spends much time probing sites where there is no DNA present. In contrast, during one-dimensional diffusion the protein is constantly bound to the DNA but suffers from a slow increase in the number of new positions probed as a function of time ($\sim t^{1/2}$, where $t$ denotes time) [18]. As shown, for example, in Refs. [8, 17], by intertwining one- and three-dimensional search strategies and tuning the properties of both one can in fact decrease the search time significantly. (Clearly, a pure one-dimensional search strategy is not efficient due to the slow diffusive search along the DNA, $t_{\rm search} \sim L^2/D_1 \sim O(\text{hours})$, where $L \sim$ nm is the genome length and $D_1$ is the one-dimensional diffusion coefficient, which was measured indirectly [12] and directly [19, 20] to be much smaller than the three-dimensional diffusion coefficient $D_3 \sim$ nm s [4].)

The combined strategy, while better than the pure search strategies, comes at the cost of being sensitive to changes in the properties of either the three-dimensional or the one-dimensional diffusive process. For example, as we argue below, the typical search time changes exponentially in the square root of the ionic strength. Moreover, given the many constraints on the protein to function it is very restrictive to demand optimization for the search process. Indeed, equilibrium measurements [21] and recent single molecule experiments [19, 20] on the Lac repressor protein suggest that the search process may not be in general optimized for this search strategy.

A third mechanism which was suggested to speed up the search is intersegmental transfer (IT) [22, 23]. During an IT the protein moves from one site to another by transiently binding both at the same time. In principle the new site can be either close along the one-dimensional DNA sequence (or chemical distance) or distant (see Fig. 3). This mechanism is likely to be relevant for proteins that have more than one binding domain, like the Lac repressor [24, 25], GRdbd [26] and the SfiI enzyme [27]. However, it could also occur in proteins with a single binding site in locations where the DNA crosses itself.
To date we are aware of direct evidence for IT only for RNA polymerase [28]. However, measurement of the dissociation rate from a labeled (operator) DNA site of the rat glucocorticoid receptor [26], CAP and Lac repressor [29] revealed a significant dependence on the DNA concentration in the solvent, a possible explanation for which is IT. Some theoretical work has suggested that in vivo, where the DNA concentration is much larger than in in vitro experiments, IT may play a determinative role [8, 11, 16]. These studies focus on the ITs resulting from the DNA dynamics and consider the protein to be point like.

In this paper we present a rather comprehensive study of the effects of ITs on the search process for a DNA molecule confined in a finite volume, similar to the in vivo scenario. Our work complements previous ones by explicitly accounting for the size of the protein and considering two limiting cases: (i) DNA which is completely static during the search process and (ii) DNA whose motion is quicker than that of the protein's motion along the DNA. Using scaling arguments backed by numerics we obtain expressions for $t_{\rm search}$ and the optimal search time (obtained by tuning parameters such as the DNA-protein affinity). A central conclusion of this paper is that the search time is much more robust to variations in parameters when ITs are allowed. (Of course, this fact may be both advantageous and disadvantageous for the cell. In some cases the cell needs transcription factors whose kinetic (and, therefore, equilibrium) properties do depend on the environment and in other cases it does not.) This is to the extent that in some cases any finite jumping rate can have a negative influence on the search time. In particular, the optimal search time is found to occur for parameter regimes very different from the canonical one (see Sec. II) found in models which ignore ITs. Perhaps most important, as we show, our work suggests that ITs could explain recent findings which indicate a much higher affinity of the TF Lac repressor to the DNA than required by an optimal search strategy which uses only sliding and jumping [19, 20, 21].

FIG. 1: Schematic plots illustrating the different mechanisms that can participate in the facilitated diffusion process. Here dashed arrows represent different protein moves, the solid curve represents the DNA and a small circle with two legs indicates a protein with two binding domains. The figure shows (a) sliding, (b) a correlated intersegmental transfer, (c) an uncorrelated intersegmental transfer, (d) jumping. The distinction between (b) and (c) is defined in Sec. III. (e) The dashed (dotted) line represents a one-dimensional (three-dimensional) distance.

The scaling dependence of the search time on different parameters is rich and very different from regular facilitated diffusion (involving only sliding and jumping). Consider, for example, the dependence of the search process on the length of the DNA, $L$, for a DNA confined in a volume $\Lambda^3$. Using only sliding and jumping, the regime typically thought to be relevant to experiment has a linear dependence of the search time on the DNA length $L$. A Smoluchowski-like search time is independent of $L$. In contrast, when ITs are allowed we find different behavior. We estimate that the regime most relevant to in vivo experiments (in prokaryotic organisms) occurs when the dependence on the length of the DNA is weak. For example, when searches are performed using only ITs the search time can be independent of $L$ or scale as $\sqrt{L}$, depending on the DNA's dynamics. These scaling behaviors rely on the confinement of the DNA in a finite volume (shown in detail in Fig. 9) and could be used as experimental probes for the existence of ITs.

The paper is organized as follows: Sec. II briefly reviews the main arguments used to analyze searches that combine only sliding and jumping. In Sec. III the average search time is calculated for the case of a strategy based only on ITs for both quenched and annealed DNA. In Sec. IV a search process that includes ITs and sliding is considered. Sec. V considers the possibility that the protein can unbind from the DNA (jump) and perform ITs. Sec. VI studies a model with all three mechanisms. Finally, in Sec. VII we discuss possible scenarios for the Lac repressor and summarize in Sec. VIII.

II. SLIDING AND JUMPING
To set the stage for a discussion of the effects of IT we consider a search process which uses only sliding and jumping. The discussion follows Refs. [9] and [13] closely. We imagine a single protein searching for a single target located on the DNA. The search is composed of a series of intervals of one-dimensional diffusion along the DNA (sliding) and three-dimensional diffusion in the solution (jumping). The typical time of each is denoted by $\tau_1$ and $\tau_3$ respectively. Following a jump, the protein is assumed to associate at a new randomly chosen location along the DNA. While this approach is somewhat simplistic for jumps occurring in two dimensions and below, for three dimensions, the case we consider, it is well suited [30].

Under these assumptions, during each sliding event the protein covers a typical length $l$, where $l \sim \sqrt{D_1 \tau_1}$ (often called the antenna size) [18]. Since correlations between the locations of the protein before and after the jump are neglected, the search process, completed when roughly all the DNA is scanned, is separated into

$$N_r \sim \frac{L}{l_s} \qquad (2)$$

rounds of sliding and jumping. Here $l_s$ is the typical length scanned by the protein during a round. If during the slide the protein does not skip sites on the DNA, $l_s \sim l$ (the distinction between $l_s$ and $l$ will become apparent when ITs are introduced). The total time needed to find a specific site is then

$$t_{\rm search} = N_r \tau_r, \qquad (3)$$

with $\tau_r = \tau_1 + \tau_3$. Using Eqs. (2) and (3) one obtains

$$t_{\rm search} \sim \frac{L}{l_s}\left(\tau_1 + \tau_3\right) \sim \frac{L}{\sqrt{D_1}}\left(\sqrt{\tau_1} + \frac{\tau_3}{\sqrt{\tau_1}}\right). \qquad (4)$$

Furthermore, it is easy to argue (see Appendix A) that

$$\tau_3 \sim \frac{\Lambda^3}{D_3 L}. \qquad (5)$$

In Fig. 2 the scaling arguments presented above are compared with a numerical simulation of a search that explicitly includes sliding on a DNA with a frozen configuration in a finite volume and three-dimensional diffusion (see Appendix B for details of the numerics). The excellent agreement justifies many of the simplifications made, in particular the neglect of correlations between the initial and final locations of a jump. Throughout the paper we assume this always holds (see Appendix B).

The analysis leads to a richer range of possible behaviors than found in Eq. (1), where the search time depends only on the volume in which the DNA is embedded [10]. Here, in contrast, three regimes are found: (i) For $\tau_1 \ll \tau_3$ there is no dependence on $L$ and the search time is given to a good approximation by Eq. (1). (ii) For $L^2/D_1 \gg \tau_1 \gg \tau_3$ the dependence on the DNA length is linear. This is the regime typically considered relevant for experiments. (iii) For $L^2/D_1 \ll \tau_1$ one finds $t_{\rm search} \propto L^2$.

It is natural to ask which $\tau_1$ optimizes $t_{\rm search}$. Using Eq. (4) it is easy to verify that

$$\left(\tau_1^{\rm opt}\right)_0 = \tau_3, \qquad (6)$$

where the subscript 0 denotes a value obtained with no ITs. Alternatively, one can consider an optimal antenna size $\left(l^{\rm opt}\right)_0 = \sqrt{D_1 \tau_3}$. When this condition is met, the total search time scales as

$$t^{\rm opt}_{\rm search} = 2\sqrt{\frac{\tau_3}{D_1}}\, L \sim \sqrt{\frac{\Lambda^3 L}{D_1 D_3}}. \qquad (7)$$

Note that the $\sqrt{L}$ dependence is obtained by optimizing, say, $\tau_1$ as $L$ is varied.

FIG. 2: The search time $t_{\rm search}$ is shown as a function of the antenna length, $l$ (axes: Log $l$, Log $t_{\rm search}$). The thin line represents the results from numerical simulations while the bold one is given by Eq. (4). Numerics were performed on a DNA embedded in a finite volume with a frozen configuration. The length of the DNA was taken to be 1224000 lattice constants and $D_1 = D_3 = 1$ (see details in Appendix B). Similar results were obtained for different values of $D_1$ and $D_3$.
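The optimization leading to Eqs. (6) and (7) can be checked with a few lines of Python. This is only a minimal sketch: it reads the scaling relations (4) and (5) as equalities with unit prefactors, and the function names and parameter values (arbitrary units) are illustrative assumptions rather than anything taken from the paper's numerics.

```python
import math

def jump_time(Lambda, D3, L):
    """Eq. (5) read as an equality: tau_3 = Lambda^3 / (D_3 L)."""
    return Lambda**3 / (D3 * L)

def search_time(tau1, tau3, L, D1):
    """Eq. (4) read as an equality, with l_s = l = sqrt(D_1 tau_1)."""
    antenna = math.sqrt(D1 * tau1)          # antenna size l
    return (L / antenna) * (tau1 + tau3)    # N_r * (tau_1 + tau_3)

def optimal_tau1(tau3, L, D1, decades=3, points_per_decade=100):
    """Scan tau_1 on a logarithmic grid centred on tau_3; return the minimiser."""
    n = decades * points_per_decade
    grid = [tau3 * 10 ** (k / points_per_decade) for k in range(-n, n + 1)]
    return min(grid, key=lambda t1: search_time(t1, tau3, L, D1))

# Hypothetical parameter values in arbitrary units (not from the paper).
Lambda, D1, D3, L = 100.0, 1.0, 1.0, 1.0e4
tau3 = jump_time(Lambda, D3, L)
tau1_opt = optimal_tau1(tau3, L, D1)
t_opt = search_time(tau1_opt, tau3, L, D1)

print("tau_3 =", tau3, " optimal tau_1 =", tau1_opt)
print("t_opt =", t_opt, " Eq. (7):", 2 * L * math.sqrt(tau3 / D1))
# Penalty for a sliding time 100 times longer than the optimum.
print("t / t_opt at tau_1 = 100 tau_3:", search_time(100 * tau3, tau3, L, D1) / t_opt)
```

The grid search is used instead of the analytic derivative so that the same scan can be reused if Eq. (4) is replaced by a different expression for the round time.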
This model, at the optimal $\tau_1$ and assuming known values for $D_1$, $L$ and $\tau_3$, predicts reasonable search times in vivo and is commonly assumed to give a possible explanation for the two orders of magnitude difference between the experiments in vitro and Eq. (1).

Within the model the optimal search process requires fine tuning of the antenna size, $l$, as a function of the parameters $D_1$ and $\tau_3$. These parameters depend on various cell and environmental conditions such as the size of the cell, the DNA length, the ionic strength etc. The dependence can be quite significant: for example, the parameter $\tau_1/\tau_3$ has an exponential dependence on the square root of the ionic strength [31]. Deviations of this parameter from the optimum value might be crucial to the search time since

$$\frac{t_{\rm search}}{t^{\rm opt}_{\rm search}} = \frac{1}{2}\left(\sqrt{\frac{\tau_1}{\tau_3}} + \sqrt{\frac{\tau_3}{\tau_1}}\right).$$

Indeed, a strong dependence of the search time on the ionic strength was found in in vitro experiments [7]. Interestingly, in vivo, when the DNA is densely packed, no effect of the ionic strength on the efficiency of the Lac repressor was revealed [32]. Other experiments also suggest that $\tau_1$ is not optimized. In particular, equilibrium measurements [21], as well as recent single molecule experiments [19, 20], find a value of $\tau_1$ for dimeric Lac repressor that is much larger than the predicted optimum $\tau_1$ in vivo.

The lack of sensitivity to the ionic strength in vivo and the rapid search times found for the Lac repressor, even with very large values of $\tau_1$, suggest that other processes, apart from jumping and sliding, are involved in the search process. These seem to be more important in vivo than in vitro. In the next section we show that a search process which uses ITs significantly modifies the behavior found for searches which use only sliding and jumping. In particular, the problems encountered above (e.g., high sensitivity to the antenna length, very long and non-optimal measured antennas etc.) are largely eliminated when ITs are included.

III. PURE INTERSEGMENTAL TRANSFER
Before turning to the full problem of a search which uses sliding, jumping and ITs we will consider a series of simplified models. Within the first model, considered in this section, the protein can only perform ITs. We will see that already at this level many of the problems of the search discussed above, which uses only sliding and jumping, are resolved to a large extent.

To model ITs we consider a protein with two binding sites. The protein can either have one site bound to the DNA or perform an IT to a new location by having both binding sites bound to the DNA (see Fig. 1). The DNA is scanned for the target by the binding sites, each checking a length $b$ when bound (note that since the protein has to align with the DNA sequence, $b$ is of the order of the length of a single base pair). A possible motivation for this picture is, for example, the tetrameric structure of the Lac repressor. However, as will be evident, many results also apply to proteins with different shapes.

Motivated by DNA in cells, we consider a DNA molecule which is densely packed in a small volume. In typical systems the DNA has a total length of $L \sim$ nm, a persistence length $L_0 \sim$ nm, a cross section radius $\rho \sim$ nm and is contained in a volume of $\Lambda^3 \sim$ nm. The typical distance between segments of DNA of length $L_0$ is therefore much smaller than $L_0$: $\Lambda/(L/L_0)^{1/3} \ll L_0$. Under these conditions, using $\Lambda \gg L_0$, it is easy to check that the radius of gyration of free DNA, which is of the order of $L_0\sqrt{L/L_0}$, is much larger than the cell size $\Lambda$: the DNA is densely packed even though its fractional volume, $L\rho^2/\Lambda^3$, in the container is small (about one percent). By way of comparison, typical protein sizes are in the range $R \sim$ − nm, much smaller than the DNA's persistence length. Although in vivo the packing has a more complicated structure than we consider, we expect similar behavior to occur also there.

As stated above, the protein moves by first being bound with only one binding site and then with both. The typical time for this, defined by $\delta = \tau_b + \tau_{IT}$, is the sum of the typical time that the protein probes a length $b$ (by being bound with one domain) and the time that the protein is bound with both binding domains to the DNA while performing an IT. (We take $\delta$ independent of parameters such as the cell size $\Lambda$ and the DNA length $L$. This is justified in a regime where most ITs are close along the chemical coordinate of the DNA.) We assume that the protein moves (for example, using both legs of the Lac repressor) to a random position located at a distance smaller than or equal to $R$, the size of the protein, from it (see Fig. 1). (Different scenarios are considered at the end of the section.) Defining a "chemical" coordinate $x$ which runs along the length of the DNA, the protein can either perform moves from its location $x$ to the interval $[x - R, x + R]$ (we refer to these as "correlated ITs" (CITs)) or reach distant sites along the chemical coordinate available through the structure of the packed DNA.

Under the above conditions it is easy to verify (see Appendix C) that almost all ITs performed by the protein are either correlated moves or performed to a coordinate along the DNA whose distance from its previous location is bigger than $\Lambda^2/L_0$ (but smaller than $L$). We call these steps "uncorrelated ITs" (UITs) (see Fig. 1(c)). In other words, one can safely neglect the possibility that the protein will move using ITs to a chemical distance larger than $R$ and smaller than $\Lambda^2/L_0$.

Our main interest is the typical search time. For this purpose it is useful to define $\lambda$, the average length that the protein travels before performing an UIT. On chemical distances larger than $R$ but smaller than $\lambda$ the motion is effectively diffusive in one dimension with a diffusion coefficient $D_{\rm eff} \sim R^2/\delta$. On chemical distance scales larger than $\lambda$ and smaller than $L$ the motion is controlled by UITs. Due to the three-dimensional nature of each UIT one expects correlations between different UITs to be negligible. We verify this assumption later using numerical simulations.

From the discussion and using a language similar to that of Sec. II the search process can be described as a sequence of

$$N_r \sim \frac{L}{l_s} \qquad (8)$$

rounds of correlated ITs, where $l_s$ is the length scanned by the protein during each round, each of duration

$$\tau_r \sim \frac{\lambda^2}{D_{\rm eff}} \sim \left(\frac{\lambda}{R}\right)^2 \delta. \qquad (9)$$

In general, while performing CITs the protein can miss regions of the DNA by skipping over them. Since each segment of size $R$ is visited $\sqrt{\tau_r/\delta} \sim \lambda/R$ times [18], when $\lambda/R \gg R/b$ the walk is recurrent and no sites are skipped so that $l_s \sim \lambda$. In contrast, when $\lambda/R \ll R/b$ the walk is not recurrent and $l_s \sim \lambda\,\frac{\lambda/R}{R/b} \sim \lambda^2 b/R^2$. Therefore the recurrence length,

$$l_R \sim \frac{R^2}{b}, \qquad (10)$$

separates two regimes,

$$l_s \sim \begin{cases} \lambda^2/l_R & \lambda \ll l_R \\ \lambda & \lambda \gg l_R, \end{cases} \qquad (11)$$

the first transient and the second recurrent.

Using Eqs. (3), (8), (9) and (11) the typical search time is obtained:

$$t_{\rm search} \sim \frac{L}{l_s}\left(\frac{\lambda}{R}\right)^2 \delta \sim \begin{cases} \dfrac{L}{b}\,\delta & \lambda \ll l_R \\ \dfrac{L\lambda}{R^2}\,\delta & \lambda \gg l_R. \end{cases} \qquad (12)$$

To complete the expression one needs to evaluate $\lambda$. Its value depends on various parameters and, in particular, on the time scale which characterizes the motion of the DNA. As discussed in the introduction we consider two extreme regimes: quenched DNA and annealed DNA. In both cases $\lambda$ can be evaluated from an intermediate quantity, $p$, the probability that the protein can make an UIT from a specific location $x$ on the DNA. Since this quantity is independent of the DNA's motion we estimate it first before turning to the two regimes.

To do so, we consider a packed DNA as an ideal gas of $L/L_0$ straight rods of length $L_0$ that are distributed randomly in the cell (see Fig. 3). The probability $p_{\rm seg}$ that two given rods cross within a distance of $R$ from each other is given by

$$p_{\rm seg} = A\,\frac{L_0^3}{\Lambda^3}\,\frac{R^2}{L_0^2} = A\,\frac{L_0 R^2}{\Lambda^3}, \qquad (13)$$

where $A$ is a constant of order unity. Here $L_0^3/\Lambda^3$ is the probability that a given segment is located within a distance $L_0$ of a point inside the cell and $R^2/L_0^2$ is proportional to the probability that this segment crosses a sphere of radius $R$ around the point. Under the conditions described above we find that typically $p_{\rm seg} \ll 1$. Finally, to relate $p$ to $p_{\rm seg}$ we note that to make an IT at least one segment should be accessible. This yields

$$p = 1 - \left(1 - p_{\rm seg}\right)^{L/L_0} \simeq 1 - e^{-A L R^2/\Lambda^3}. \qquad (14)$$

FIG. 3: Illustrated schematically is the simplified treatment of the folded DNA. We first represent the DNA as an ideal gas of rods, each with a length of one persistence length. Then we connect the rods randomly to form a small world network (see text for details). Numerically we find the description to work well.

Eq. (14) implies that there are two possible regimes depending on the value of $L$:

$$p = \begin{cases} A\,\dfrac{L R^2}{\Lambda^3} \ll 1 & L \ll L_c \\ 1 & L \gg L_c, \end{cases} \qquad (15)$$

where

$$L_c \sim \frac{\Lambda^3}{R^2}. \qquad (16)$$

In essence, when $L \gg L_c$ (which can occur for example by having a large protein) $p \simeq 1$, while for $L \ll L_c$ we have that $p = A L R^2/\Lambda^3 \ll 1$. $L_c$ for the range of parameters of interest is of the order of 10 nm for very large proteins ($R$ of order of tens of nm, similar to the Lac repressor). Therefore in vivo we expect a relatively large $L_c$, so the regime $L \ll L_c$ should be relevant. (In a eukaryotic cell the concentration of DNA is much higher and this statement may be wrong.)

The search can thus be viewed as a walk on a network whose nodes are sites of length $b$, the scanned length on the DNA during one binding event. Each step on the network takes on average a time $\delta$. During an IT the protein can move from its position, $x$, to a randomly chosen position in the interval $[x - R, x + R]$ along the chemical coordinate (correlated transfer) with probability $1 - p$ or to an uncorrelated site with probability $p$ (uncorrelated transfer). Such networks are commonly referred to as Small World Networks [33] (see Fig. 3(c)).

To find the relation between $\lambda$ and $p$ one has to consider the dynamics of the DNA. Below we consider two extreme cases: (a) a completely quenched DNA configuration and (b) a strongly fluctuating DNA, which we term annealed. A quenched DNA is static throughout the search process. An annealed DNA changes its conformation on a time scale much shorter than that of the motion of the protein.

A. Quenched DNA
In this section we derive the search time for a quenched DNA. In particular we will show that it has a non-trivial behavior as a function of $L$. In the regime that is expected to be relevant in vivo the search time is independent of the DNA's length (see Fig. 4).

For quenched DNA one expects that if an UIT can occur at point $x$ it can also happen in a region of size $R$ around it. Similar considerations apply to sites where an UIT cannot occur. The typical distance traveled by the protein along the DNA's chemical coordinate between two subsequent UITs, $L > \lambda > R$, is of the order of the typical distance between two distinct locations where an UIT can occur. This implies for $p \gg R/L$ a scaling of $\lambda$ of the form

$$\lambda \sim \frac{R}{p}, \qquad (17)$$

where $p$ is defined above (see Eq. (15)), while for $p \ll R/L$ clearly $\lambda = L$ (see Fig. 5).

From the previous discussion one may infer that there are three distinct behaviors as a function of $L$, shown in Fig. 5. The first regime occurs for DNA so short that an UIT cannot occur during the search. This happens when $p \ll R/L$, or equivalently when $L \ll L_Q^{(1)}$, where

$$L_Q^{(1)} = \sqrt{\frac{\Lambda^3}{R}}. \qquad (18)$$

In fact, the estimate for $L_Q^{(1)}$ pushes the limit of our treatment since the DNA is no longer densely packed in this regime. Nonetheless, we find good agreement with numerical simulations.

FIG. 4: The search time, $t_{\rm search}$, is plotted as a function of $L$, the DNA length, for the pure IT case with a quenched DNA configuration (axes: Log[$L$], Log[$t_{\rm search}$]; regime I: $t_{\rm search} \sim \delta L^2/R^2$, regime II: $t_{\rm search} \sim \delta \Lambda^3/R^3$, regime III: $t_{\rm search} \sim \delta L/b$). The circles represent numerical data, while the solid line was obtained using Eqs. (21), (23) and (24). The three visible regimes correspond to the three in Fig. 5 (see also Fig. 9). In this plot $R$ and $b$ were taken to be 3 and 1 lattice constants respectively (the rest of the details are found in Appendix B). The search time is shown in units of $\delta$.

FIG. 5: The schematic behaviors of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) are shown for quenched DNA and $b \ll R$.

The other regimes occur for $L \gg L_Q^{(1)}$, where one has $p \gg R/L$. In this case the protein can use UITs during the search. As discussed above there is a length scale $L_c$ separating two distinct behaviors of $p$, and therefore we have three different behaviors for $\lambda$ which are given by (see Fig. 5):

$$\lambda \sim \begin{cases} L & L \ll \sqrt{\Lambda^3/R} = L_Q^{(1)} \\ \dfrac{\Lambda^3}{LR} & L_Q^{(1)} \ll L \ll L_c \\ R & L \gg \Lambda^3/R^2 = L_c, \end{cases} \qquad (19)$$

where as before $L_c = \Lambda^3/R^2$. Furthermore, as described above, the scan between two subsequent UITs can either be recurrent ($\lambda \gg l_R$) or transient ($\lambda \ll l_R$) with a crossover length $L_Q^{(2)}$. This length scale is determined by the condition $\lambda\left(L = L_Q^{(2)}\right) \sim l_R$. In the recurrent regime the walk between two ITs does not skip locations on the DNA. This is in contrast to the transient regime where many sites are skipped. Thus using Eqs. (15) and (17) one finds

$$L_Q^{(2)} = \frac{\Lambda^3 b}{R^3}. \qquad (20)$$

For $L \gg L_Q^{(2)}$ the search between two subsequent UITs is short and therefore transient, while for $L \ll L_Q^{(2)}$ the search between two subsequent UITs is long and therefore recurrent.

Note that when the search is transient, $t_{\rm search}$ is independent of $\lambda$ (see Eq. (12)). Therefore, the crossover between two distinct scaling behaviors of $t_{\rm search}$ is governed by the smaller of the two length scales $L_c$ and $L_Q^{(2)}$. For proteins performing only ITs one expects $b$ to be smaller than $R$. It is easy to see that in such cases $L_Q^{(2)}$ is smaller than $L_c$. (Other possibilities are discussed in Sec. IV.)

To summarize, there are two length scales, $L_Q^{(1)}$ and $L_Q^{(2)}$, which separate three possible regimes (see Fig. 5).

• Regime I: $L \ll L_Q^{(1)}$
Here $\lambda \sim L$. There are no UITs and Eqs. (11) and (12) give

$$t_{\rm search} \sim \frac{L^2}{D_{\rm eff}} \sim \frac{L^2}{R^2}\,\delta. \qquad (21)$$

This regime is clearly not relevant in vivo (using the typical values, $\Lambda \sim$ µm and $R \sim$ nm, we find $L_Q^{(1)} \sim$ µm, which is much shorter than typical DNA lengths).

• Regime II: $L_Q^{(1)} \ll L \ll L_Q^{(2)}$
Now the motion between two subsequent UITs is recurrent, $l_R \ll \lambda$, and Eq. (17) gives

$$\lambda \sim \frac{\Lambda^3}{LR}. \qquad (22)$$

Using Eqs. (11) and (12) we obtain

$$t_{\rm search} \sim \frac{\Lambda^3}{R^3}\,\delta. \qquad (23)$$

Note that in this regime, as opposed to Sec. II, the search time is independent of the DNA's length. Eq. (23) is equivalent to Eq. (1) with an effective three-dimensional diffusion coefficient $D_3 \sim R^3/(r\delta)$. In contrast to the simple three-dimensional diffusive search, Eq. (23) does not depend on the target size $r$ but rather on the protein size, which may be much larger.

• Regime III: $L \gg L_Q^{(2)}$
Here $\lambda \ll l_R$, and Eqs. (11) and (12) give

$$t_{\rm search} \sim \frac{L\delta}{b}. \qquad (24)$$

The obtained results, compared to numerics, are summarized in Fig. 4 (see also Fig. 9). One can clearly see the three regimes arising for different lengths of DNA, which are separated by $L_Q^{(1)}$ and $L_Q^{(2)}$. The details of the numerical simulation are described in Appendix B. Note that $L_Q^{(1)}$ and $L_Q^{(2)}$ are well predicted by the scaling arguments.

The most relevant regime for in vivo experiments in prokaryotic organisms is likely to be the intermediate regime (II), where the search time is independent of the DNA's length and scales as $\Lambda^3$. Comparing the search time in this regime, Eq. (23), with the minimal search time in the case when sliding and jumping are used, Eq. (7), one sees that if $\delta < R^3\sqrt{\frac{L}{\Lambda^3 D_1 D_3}}$ the search time in the pure IT scenario is in fact smaller than that of Sec. II, which includes only sliding and jumping. This is despite the fact that the protein never unbinds from the DNA.

B. Annealed DNA
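The quenched-DNA scalings of Eqs. (12), (14) and (17) can be compared numerically with the annealed case treated in this subsection, where Eq. (25) replaces Eq. (17). The following is a rough sketch under stated assumptions: all scaling relations are read as equalities with unit prefactors ($A = 1$), the crossovers are interpolated with simple `min`/`max` clamps (a choice made here for illustration, not taken from the paper), and the function name and parameter values are hypothetical.

```python
import math

def pure_it_search_time(L, Lambda, R, b, annealed=False, A=1.0):
    """Pure-IT search time in units of delta (unit prefactors assumed)."""
    # Eq. (14): probability that an uncorrelated IT (UIT) is possible.
    p = -math.expm1(-A * L * R**2 / Lambda**3)
    # Distance between UITs: Eq. (17) for quenched DNA, Eq. (25) for annealed DNA.
    lam_free = R / math.sqrt(p) if annealed else R / p
    # lambda cannot exceed the DNA length or drop below the protein size.
    lam = max(R, min(L, lam_free))
    # Eq. (10): recurrence length, and Eq. (11): length scanned per round.
    l_R = R**2 / b
    l_s = lam if lam >= l_R else lam**2 / l_R
    # Eq. (12): number of rounds times duration of a round.
    return (L / l_s) * (lam / R) ** 2

# Hypothetical parameters in arbitrary length units (not from the paper).
Lambda, R, b = 1000.0, 10.0, 0.3
for L in (2.0e3, 1.0e5, 1.0e6):
    t_q = pure_it_search_time(L, Lambda, R, b)
    t_a = pure_it_search_time(L, Lambda, R, b, annealed=True)
    print(f"L = {L:.0e}: quenched {t_q:.4g} delta, annealed {t_a:.4g} delta")
```

With these illustrative values the three quenched results fall in regimes I, II and III respectively, and the annealed time never exceeds the quenched one, in line with the statement made for the annealed case in this subsection.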
In this section we consider the annealed case. As we show, here the search time also has a non-trivial dependence on $L$, but one that differs from the quenched case. In the regime that is expected to be relevant in vivo the search time scales as $\sqrt{L}$.

In the annealed case the time scale for a rearrangement of the DNA's configuration is assumed to be much smaller than the time of the protein's motion during an IT. As a result of the constant rearrangement of the DNA, UITs now occur with probability $p$ for each IT. The average number of ITs with no UITs performed is therefore of the order of $1/p$ and thus the average time that the protein spends between two subsequent UITs is $\delta/p$ ($\delta$ as before is the typical time between two subsequent ITs). On one-dimensional length scales smaller than $\lambda$ the protein diffuses with a diffusion constant $D_{\rm eff} \sim R^2/\delta$. Therefore, the typical one-dimensional distance between two subsequent UITs, $\lambda$, is

$$\lambda \sim \sqrt{\frac{D_{\rm eff}\,\delta}{p}} \simeq \begin{cases} \sqrt{\Lambda^3/L} & L \ll L_c \\ R & L \gg L_c, \end{cases} \qquad (25)$$

where $L_c$ is defined in Eq. (16). As for the quenched case, we will see that again three distinct behaviors arise with two crossover lengths.

The first occurs when no UITs occur. The crossover length $L_A^{(1)}$ can be extracted using the condition $\lambda\left(L = L_A^{(1)}\right) \sim L$, which under our assumptions on the protein's size can only occur when $L \ll L_c$. This yields $L_A^{(1)} \sim \Lambda$. It is easy to see that $L_A^{(1)} \ll L_Q^{(1)}$. This means that, as expected, in the annealed case the effects of UITs become important at a much smaller DNA concentration than in the quenched case. This happens because fast DNA movements increase the probability to perform an UIT. As for $L_Q^{(1)}$, the estimate for $L_A^{(1)}$ pushes the limit of our treatment since the DNA is no longer densely packed in this regime.

The second crossover length, $L_A^{(2)}$, occurs when the motion between UITs becomes transient. It can therefore be estimated using $\lambda\left(L = L_A^{(2)}\right) \sim l_R$. Taking the regime $L \ll L_c$ in Eq. (25) yields

$$L_A^{(2)} \sim \frac{b^2 \Lambda^3}{R^4}. \qquad (26)$$

For target sizes much smaller than the protein size ($b \ll R$), it is clear that $L_A^{(2)} \ll L_c$ (see Eq. (16)). Hence, using the same arguments as before, only two length scales, $L_A^{(1)}$ and $L_A^{(2)}$, determine three possible regimes (see Fig. 6).

FIG. 6: The schematic behavior of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) is shown for annealed DNA and $b \ll R$.

The three regimes which arise are:

• Regime I: $L \ll L_A^{(1)}$
Here $\lambda \sim L$. There are no UITs and Eqs. (11) and (12) give

$$t_{\rm search} \sim \frac{L^2}{D_{\rm eff}} \sim \frac{L^2}{R^2}\,\delta. \qquad (27)$$

• Regime II: $L_A^{(1)} \ll L \ll L_A^{(2)}$
Here searches between two subsequent UITs are recurrent so that $l_R \ll \lambda$. Eq. (25) gives

$$\lambda \sim \sqrt{\frac{\Lambda^3}{L}}. \qquad (28)$$

Using Eqs. (11) and (12) we obtain

$$t_{\rm search} \sim \frac{\sqrt{L\Lambda^3}}{R^2}\,\delta. \qquad (29)$$

Here, in contrast to the quenched case, the intermediate result scales with the length of the DNA as $L^{1/2}$. Note that the search time is always shorter than that on a quenched DNA. This happens because the DNA's movement destroys the correlation in the motion of the protein and, therefore, increases the efficiency of the search. A similar dependence on $L$ ($t_{\rm search} \propto \sqrt{L}$) was obtained for a different model [11]. There, however, the origin of the dependence is different, and is linked to modeling the DNA's motion as diffusion of an ideal gas of rods.

• Regime III: $L \gg L_A^{(2)}$
Here $\lambda \ll l_R$. Therefore, Eqs. (11) and (12) give

$$t_{\rm search} \sim \frac{L\delta}{b}. \qquad (30)$$

The obtained results are summarized later in Fig. 9.

The most relevant regime for in vivo experiments in prokaryotic organisms is likely to be the intermediate one (II), where the search time scales as $L^{1/2}$ or alternatively as $\Lambda^{3/2}$. Comparing the search time in this regime, Eq. (29), with the minimal search time in the case when sliding and jumping are used, Eq. (7), one sees that if $\delta < R^2/\sqrt{D_1 D_3}$ the search time in the pure IT case is smaller than the one in the sliding and jumping case. This is despite the fact that the protein never unbinds from the DNA.

Numerical simulation of the annealed case requires dynamical moves for the whole DNA molecule. This is a formidable task for DNAs of reasonable length and is beyond the scope of this paper.

IV. INTERSEGMENTAL TRANSFER AND SLIDING
Next we consider a protein that can perform both ITs and sliding. Namely, in addition to ITs the protein can perform one-dimensional diffusion with only one binding domain bound (see Fig. 1(a)). In the language of Sec. III, $b$ is now the typical sliding length between two subsequent ITs. Now each step (distinct from a round defined above), defined as the interval between the ends of two subsequent ITs, takes a typical time $\delta = b^2/D_1 + \tau_{IT}$, where $D_1$ is the one-dimensional diffusion coefficient of the protein with only one binding domain bound and $\tau_{IT}$ is the typical time that the protein is bound to two DNA segments. (Footnote: The one-dimensional diffusion on length scales larger than $b$ has a different effective diffusion coefficient due to the possibility of a CIT. Thus, to measure $D_1$ on large length scales one should not allow for ITs. This may be done, for example, by measuring the motion of the part of the protein that contains only one binding domain [19, 20].)

For $b \ll R$ it is straightforward to see that the results of Sec. III hold with a redefined $\delta$. However, in general the sliding length $b$ might be much larger than the protein's size $R$. This is the regime that we focus on in this section.

Clearly, now the search between two subsequent UITs is always recurrent, so that $l_s \sim \lambda$. Here, as before, $\lambda$ is the typical distance traveled by the protein between two subsequent UITs. However, now $D_{\rm eff} \sim b^2/\delta$, where as above $\delta = b^2/D_1 + \tau_{IT}$. The search time as a function of $\lambda$, similar to Eq. (12), becomes

$$t_{\rm search} \sim \frac{L}{\lambda}\,\frac{\lambda^2}{D_{\rm eff}} \sim \frac{L \lambda}{b^2}\,\delta. \tag{31}$$

The value of $\lambda$, as in the previous section, depends on the dynamics of the DNA molecule. Again we consider two extreme cases: (a) quenched DNA and (b) annealed DNA.

A. Quenched DNA
To obtain $\lambda$ we first introduce a new quantity, $\lambda_0$, defined as the typical chemical distance between two locations in which the protein can perform a UIT. Note that we are interested in the regime $b \gg R$. Therefore the values of $\lambda_0$ and $\lambda$ may be distinct, since a UIT is not necessarily performed at every possible location on the DNA. Clearly, however, the functional behavior of $\lambda_0$ is identical to that of $\lambda$ in the previous section. This yields (see Eq. (19))

$$\lambda_0 \sim \begin{cases} L & L \ll \sqrt{\Lambda^3/R} = L_1^Q \\ \Lambda^3/(L R) & L_1^Q \ll L \ll L_c \\ R & L \gg \Lambda^3/R^2 = L_c, \end{cases} \tag{32}$$

where we have used the definitions of $L_1^Q$ and $L_c$ of the previous section.

Similar to the derivation of Eq. (10), when $\lambda_0/b \gg b/R$, the effective random walk of the protein along a length $\lambda_0$ is recurrent. Here recurrent motion implies that sites where a UIT can occur are visited many times before a neighboring site where a UIT can occur is met (note that this is distinct from the recurrent behavior of Sec. III). In the recurrent regime a location of a possible UIT is visited many times and is therefore not missed. In this case $\lambda \sim \lambda_0$. In the opposite, transient regime (again distinct in meaning from that used in Sec. III), $\lambda_0/b \ll b/R$ and the protein performs a UIT only after it travels a distance $\lambda \gg \lambda_0$. In the latter regime each IT has a probability $R/\lambda_0$ to be a UIT. Therefore, between two subsequent UITs the protein performs $\lambda_0/R$ ITs. Using the diffusive nature of the motion we find $\lambda \sim b\sqrt{\lambda_0/R}$. The value of $\lambda$ as a function of $\lambda_0$ is shown schematically in Fig. 7.

FIG. 7: The schematic behavior of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) is shown for quenched DNA, $L \gg b^2/R$ and $b \gg R$.

Combining the three regimes of $\lambda_0$ with the above-mentioned crossover from $\lambda \sim \lambda_0$ to $\lambda \sim b\sqrt{\lambda_0/R}$ (which occurs at $L = \Lambda^3/b^2$) one finds, using $b/R \gg 1$, four regimes for the search time:

• Regime I occurs for $L \ll L_1^Q$, corresponding to $\lambda \sim \lambda_0 = L$ in Eq. (32). Using Eq. (31) gives

$$t_{\rm search} \sim \frac{L^2}{b^2}\,\delta. \tag{33}$$

• Regime II occurs for $\Lambda^3/b^2 \gg L \gg L_1^Q$, where $\lambda \sim \lambda_0 \sim \Lambda^3/(L R)$. Using Eq. (31) yields

$$t_{\rm search} \sim \frac{\Lambda^3}{b^2 R}\,\delta. \tag{34}$$

• Regime III occurs for $\Lambda^3/b^2 \ll L \ll L_c$. Now $\lambda \sim b\sqrt{\lambda_0/R} \sim b\sqrt{\Lambda^3/(L R^2)}$. Using Eq. (31) we find

$$t_{\rm search} \sim \frac{\sqrt{\Lambda^3 L}}{b R}\,\delta. \tag{35}$$

• Regime IV occurs for $L \gg L_c$. Here $\lambda \sim b\sqrt{\lambda_0/R} \sim b$ and with Eq. (31) one gets

$$t_{\rm search} \sim \frac{L}{b}\,\delta. \tag{36}$$
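The four regimes above follow mechanically from Eqs. (31) and (32) together with the recurrent/transient criterion, so they can be sketched as a single function. The following is only an illustrative sketch: the function names are ours, all prefactors are set to one, the sharp `if` thresholds stand in for smooth crossovers, and it assumes $L \gg b^2/R$ and $b \gg R$ as in the text. The parameter values in the usage note are likewise an assumption, chosen to resemble the lattice units quoted for the simulations.

```python
import math

def lambda0_quenched(L, Lam, R):
    """Eq. (32): chemical distance between sites where a UIT is possible."""
    L1Q = math.sqrt(Lam**3 / R)   # first quenched crossover length
    Lc = Lam**3 / R**2            # crossover above which every IT is a UIT
    if L < L1Q:
        return L
    if L < Lc:
        return Lam**3 / (L * R)
    return R

def search_time_quenched_sliding(L, Lam, R, b, delta=1.0):
    """Scaling estimate of t_search for ITs plus long sliding, quenched DNA."""
    lam0 = lambda0_quenched(L, Lam, R)
    if lam0 / b >= b / R:
        lam = lam0                        # recurrent: no possible UIT site is missed
    else:
        lam = b * math.sqrt(lam0 / R)     # transient: lam0/R ITs between UITs
    return L * lam * delta / b**2         # Eq. (31)
```

For example, with `Lam=800`, `R=1`, `b=20` the function reproduces Eqs. (33) to (36) when `L` is taken as `1e4`, `1e5`, `1e7` and `1e9` respectively, one value inside each regime.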
FIG. 8: $t_{\rm search}$ is plotted as a function of $L$, the DNA length, for a model with IT and sliding on a quenched DNA. The thin line with dots represents numerical data, while the bold solid line was obtained using Eqs. (33), (34), (35) and (36) (see also Fig. 9). In this plot $R$ and $b$ were taken to be 1 and 20 lattice constants respectively (the rest of the details can be found in Appendix B). The search time is shown in units of $\delta$.

Fig. 8 shows a comparison between the four theoretically predicted regimes and the numerical simulation of the model. Three regimes are reproduced by the numerics, while the fourth one was not reproduced due to computational limitations.

For moderate values of $\tau_{IT}$ one may see that long sliding may drastically decrease the efficiency of the search. This occurs because long sliding prevents both UITs, which destroy correlations in the search process, and CITs, which increase the effective one-dimensional diffusion constant.

B. Annealed DNA
Here, using the arguments presented in Sec. III B, the average number of steps performed between two subsequent UITs is of the order of $1/p$, where $p$ is given in Eq. (15). This implies a typical time between subsequent UITs of the order of $\delta/p$. Using the fact that along the DNA the motion of the protein is diffusive with an effective diffusion constant $D_{\rm eff} \sim b^2/\delta$ one finds

$$\lambda \sim \sqrt{\frac{D_{\rm eff}\,\delta}{p}}. \tag{37}$$

Clearly, $\lambda$ can only take values in the range $b \le \lambda \le L$. These bounds, together with the possible values of $p$ (see Eq. (15)), define the borders of the following three regimes:

• Regime I occurs for $\lambda \sim L$. Using Eq. (37) and $p = L R^2/\Lambda^3$ it can be verified that this regime occurs when $L \ll \Lambda\,(b/R)^{2/3}$. In this case no UITs occur during the search and Eq. (31) gives

$$t_{\rm search} \sim \frac{L^2}{b^2}\,\delta. \tag{38}$$

• Regime II occurs when $\Lambda\,(b/R)^{2/3} \ll L \ll L_c$. Using Eq. (37) and $p = L R^2/\Lambda^3$ one finds that in this case $\lambda \sim b\sqrt{\Lambda^3/(L R^2)}$. Using Eq. (31) gives

$$t_{\rm search} \sim \frac{\sqrt{\Lambda^3 L}}{b R}\,\delta. \tag{39}$$

• Regime III occurs where $L \gg L_c$ and almost all ITs are UITs. Here $\lambda \sim b$ and $p \simeq 1$, so that

$$t_{\rm search} \sim \frac{L}{b}\,\delta. \tag{40}$$

The obtained results are summarized in Fig. 9.

One may see that in the case of long sliding, rapid DNA motion cannot decrease the search time as significantly as in the pure IT case. This is because long sliding prevents a fast decay of correlations.
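The three annealed regimes can be recovered from Eq. (37) alone once $\lambda$ is clamped to the allowed range $b \le \lambda \le L$. The sketch below makes this explicit under stated assumptions: prefactors are set to one, the UIT probability is taken as $p = \min(1, L R^2/\Lambda^3)$ so that it saturates at $L_c$, and the function names are ours rather than the paper's.

```python
import math

def uit_probability(L, Lam, R):
    """Probability that a given IT is uncorrelated, capped at one for L >= L_c."""
    return min(1.0, L * R**2 / Lam**3)

def search_time_annealed_sliding(L, Lam, R, b, delta=1.0):
    """Scaling estimate of t_search for ITs plus long sliding, annealed DNA."""
    p = uit_probability(L, Lam, R)
    # Eq. (37) with D_eff = b^2/delta gives lam = b/sqrt(p); lam >= b holds
    # automatically because p <= 1, and lam cannot exceed the DNA length L.
    lam = min(b / math.sqrt(p), L)
    return L * lam * delta / b**2   # Eq. (31)
```

With `Lam=800`, `R=1`, `b=20`, the values `L=1e3`, `L=1e6` and `L=1e9` fall in Regimes I, II and III and give Eqs. (38), (39) and (40) respectively.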
C. Motion with no CIT
Here we consider a case where the structure of the protein causes it to prefer UITs over CITs. This may occur, for example, in cases where the “legs” of the protein are antiparallel and rigid. The motion on length scales smaller than $\lambda$ is then diffusive, involving only sliding with a diffusion coefficient $D_1$. In this case, clearly $l_s = \lambda$ and the time between two subsequent UITs is given by $\lambda^2/D_1 + \tau_{IT}$, where $\tau_{IT}$ is the time of a UIT. One finds, similar to Sec. II,

$$t_{\rm search} \sim \frac{L}{\lambda}\left(\frac{\lambda^2}{D_1} + \tau_{IT}\right). \tag{41}$$

The relationship between this and the picture of Sec. II is given by identifying the antenna's length $l$ with $\lambda$ and the three-dimensional diffusion time $\tau_3$ with $\tau_{IT}$.

FIG. 9: In this figure the schematic behavior of $t_{\rm search}$ as a function of the DNA length $L$ is shown in the absence of jumps, for annealed and quenched DNA. (a) shows short sliding results ($b \ll R$). (b) shows long sliding results ($b \gg R$).

Most of the results of Secs. III and IV are summarized in Fig. 9. The results of this section indicate that ITs may supply reasonable search times if they are quick enough. Combining IT with sliding we see that even rare UIT events may break correlations created by one-dimensional diffusion. In this sense ITs act as jumps without the need for detachment from the DNA. Besides this, CITs may effectively accelerate the one-dimensional diffusion or even replace it altogether.

V. INTERSEGMENTAL TRANSFER AND JUMPING
We now turn to consider the effect of jumping on the results described above. Before addressing the full problem, including ITs, sliding and jumping, we first consider a model in which only ITs and jumps occur, and ignore sliding. To include jumping we assign a probability $dt/\tau_1$ for a protein to detach from the DNA during a time interval $dt$. The unbinding initiates a jump in which the protein uses three-dimensional diffusion to rebind at a new location on the DNA. Note that since there is no sliding it is safe to assume $b \ll R$.

As argued in the previous section, it is reasonable that both UITs and jumps move the protein to a new location which is chosen randomly on the DNA. Therefore, the search process is composed of a series of one-dimensional scans (occurring through CITs) of the DNA interrupted by uncorrelated relocations. The uncorrelated relocations can occur through two independent processes: jumps and UITs. The typical search time can be evaluated using an approach identical to that of the previous sections.

First, we need to estimate the typical time $\tau_1^{\rm eff}$ between two uncorrelated relocations. Combining the previously derived typical time between two subsequent UITs, $\lambda^2/D_{\rm eff}$, and the typical time between jumps, $\tau_1$, we obtain

$$\tau_1^{\rm eff} \simeq \frac{1}{D_{\rm eff}/l^2 + D_{\rm eff}/\lambda^2}, \tag{42}$$

where $\lambda$, defined before, is the typical distance that the protein travels between two subsequent UITs and we define an antenna length $l = \sqrt{D_{\rm eff}\tau_1}$.

Here and in the next section we focus on the search time as a function of $l$. This quantity is influenced by the protein-DNA non-specific binding energy and governs the frequency of jumps. Other parameters that do not depend on $l$, such as $\lambda$, are taken as fixed. The value of $\lambda$ relevant for the discussion here is given in Sec. III, where $b \ll R$. Note that when this value is incorporated in the results below, the resulting behavior is very complicated.
While this is easy to obtain, we skip the enumeration of all the regimes and focus on the important qualitative behavior.

(Footnote to Eq. (42): This expression is exact in the annealed case but it is only an approximation in the quenched regime. However, the error does not exceed 50%; see Appendix D 1 for details.)

To proceed we note that the typical distance between two uncorrelated relocation events is

$$l_{\rm eff} = \sqrt{D_{\rm eff}\,\tau_1^{\rm eff}} \simeq l\,\sqrt{\frac{1}{1 + l^2/\lambda^2}}. \tag{43}$$

As expected, and seen in Eqs. (42) and (43), the relative importance of both mechanisms is controlled by the ratio $l/\lambda$. In the case of $l/\lambda \gg 1$ the uncorrelated relocations are dominated by UITs, while for $l/\lambda \ll 1$ they are dominated by jumps. Next, we need the typical time of an uncorrelated relocation, $\tau_3^{\rm eff}$. It is given by the average of the time of a jump, $\tau_3$, and the time of an IT, weighed with the probability of performing each. This gives

$$\tau_3^{\rm eff} = \tau_3\,\frac{\tau_1^{\rm eff}}{\tau_1} + \delta\left(1 - \frac{\tau_1^{\rm eff}}{\tau_1}\right) = \tau_3\,\frac{l_{\rm eff}^2}{l^2} + \delta\left(1 - \frac{l_{\rm eff}^2}{l^2}\right) \simeq \frac{\tau_3 + \delta\, l^2/\lambda^2}{1 + l^2/\lambda^2}, \tag{44}$$

where $(1/\tau_1)/(1/\tau_1^{\rm eff})$ is the probability of a jump, $1 - (1/\tau_1)/(1/\tau_1^{\rm eff}) = (D_{\rm eff}/\lambda^2)/(1/\tau_1^{\rm eff})$ is the probability of a UIT and $\delta$, defined above, is the time of an IT (see Appendix D 2 for a more detailed derivation).

The total search time, as before, takes the form of Eq. (3). Now, each search round is defined as the interval between two subsequent uncorrelated relocations. The total time of one round is $\tau_r \sim \tau_1^{\rm eff} + \tau_3^{\rm eff}$, and therefore the search time is given by

$$t_{\rm search} \sim N_r \tau_r \sim \frac{L}{l_s}\left(\tau_1^{\rm eff} + \tau_3^{\rm eff}\right). \tag{45}$$

Here $l_s$ is the length scanned between two subsequent uncorrelated relocations. In the case discussed here $b \ll R$, and the value of $l_s$ depends on the properties of the search between two uncorrelated relocations, namely the ratio of $l_{\rm eff}$ and $l_R$, the recurrence length (see Eq. (10) and the relevant discussion). If $l_{\rm eff} \gg l_R$ the search between two subsequent relocations is recurrent and $l_s \sim l_{\rm eff}$. However, in the opposite regime, $l_{\rm eff} \ll l_R$, one has $l_s \sim l_{\rm eff}^2/l_R$.

Therefore, for a given $\lambda$ there are two regimes (see Figs. 10 and 11):

• Regime I ($l_R \ll l_{\rm eff}$): In this regime, using Eq. (45), the total search time is

$$t_{\rm search} \sim N_r \tau_r \sim \frac{L}{l_{\rm eff}}\left(\tau_1^{\rm eff} + \tau_3^{\rm eff}\right) \sim \frac{L}{l}\,\frac{l^2/D_{\rm eff} + \tau_3 + \delta\, l^2/\lambda^2}{\sqrt{1 + l^2/\lambda^2}} \simeq \frac{L}{l}\,\frac{l^2/D_{\rm eff} + \tau_3}{\sqrt{1 + l^2/\lambda^2}}, \tag{46}$$

where we used $\lambda \gg R$.

Comparing with Eq. (3) we note that here we have both an effective diffusion constant and an extra enhancement factor given by $\left(1 + l^2/\lambda^2\right)^{-1/2}$. As we now show, this factor has important consequences.

Consider the value of $\tau_1 = l^2/D_{\rm eff}$ for which a minimal search time is obtained and compare it with the usual paradigm of $\left(\tau_1^{\rm opt}\right)_0 = \tau_3$, where the subscript 0 denotes the absence of ITs.
Due to the enhancement factor we now find

$$\tau_1^{\rm opt} = \frac{\left(\tau_1^{\rm opt}\right)_0}{1 - 2 D_{\rm eff}\,\tau_3/\lambda^2}, \tag{47}$$

where $\left(\tau_1^{\rm opt}\right)_0 = \tau_3$ (see Eq. (6)) is the optimum in the absence of ITs ($\lambda \to \infty$) (see Sec. II). It is interesting to note that $l_{\rm opt}$ approaches infinity when $\tau_3$ is larger than a critical value

$$\tau_3^c = \frac{\lambda^2}{2 D_{\rm eff}}. \tag{48}$$

Hence, the minimal search time for $\tau_3 \ge \tau_3^c$ is identical to that with no jumps (see Sec. III). It is important to note that $\tau_3^c$ depends, as expected, on the time of an IT through $D_{\rm eff}$. In the case when $\tau_3 \le \tau_3^c$, Eqs. (46) and (47) give

$$t_{\rm search}^{\rm opt} \sim L\,\sqrt{\frac{\tau_3}{D_{\rm eff}}}\,\sqrt{1 - \frac{\tau_3}{2\tau_3^c}}. \tag{49}$$

In this regime $t_{\rm search}^{\rm opt}$ is monotonically increasing in $\tau_3$.

In Fig. 12 we show a comparison between the results of numerical simulation and Eq. (46).

• Regime II ($l_{\rm eff} \ll l_R$): In this case $l_s \sim l_{\rm eff}^2/l_R$ and Eq. (45) yields

$$t_{\rm search} \sim \frac{L\, l_R}{l_{\rm eff}^2}\left(\tau_1^{\rm eff} + \tau_3^{\rm eff}\right) \sim \frac{L\, l_R}{l^2}\left(\frac{l^2}{D_{\rm eff}} + \tau_3\right). \tag{50}$$

Interestingly, in this regime the minimal search time is obtained when $\tau_1$ diverges. This means that jumping only makes the search slower in this case. We note that some care needs to be taken with this limit, since if $\lambda > l_R$ and the value of $l$ exceeds $l_R$ the regime $l_{\rm eff} \ll l_R$ transforms into Regime I.

FIG. 10: Possible regimes as a function of $l$ and $\lambda$ are shown in the case of ITs and jumping (or IT, jumping and sliding with $b \ll R$) for $l_R \ll \sqrt{D_{\rm eff}\tau_3}$. The gray (white) area represents regime I (II). The dashed line represents the optimal antenna length. The optimal antenna length in the absence of IT is equal to $\sqrt{D_{\rm eff}\tau_3}$.

The results of this section highlight several interesting features which will also appear in the more general case, where sliding is also allowed. First, we note that in the limit of very strong protein-DNA affinity (large values of $\tau_1$) the search time becomes robust to changes in the value of $\tau_1$. This is very different from a search process with no ITs (see Eqs. (46), (50) and Fig. 12), and may give a possible explanation for the difference between in vitro and in vivo experiments on the Lac repressor. In vitro [7], a strong dependence of the search time on ionic strength (and therefore on the protein-DNA affinity) was found. However, an in vivo experiment [32] found that the efficiency of the repression by the same protein is very robust to changes in the ionic strength.

Furthermore, by examining the optimal search time, we find that beyond some critical value of $\tau_3$ jumps increase the search time (see Fig. 12 for a demonstration).
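The Regime I optimum quoted in Eqs. (47) and (48) can be checked against a brute-force minimization of the scaling form in Eq. (46). The sketch below is illustrative only: it sets all prefactors to one, uses the conventions $\tau_1 = l^2/D_{\rm eff}$ and $\tau_3^c = \lambda^2/(2D_{\rm eff})$ adopted here, and the function names and the logarithmic grid are our own choices.

```python
import math

def search_time_regime_one(l, lam, D_eff, tau3, L=1.0):
    """Eq. (46): search time with jumps and ITs, including the enhancement factor."""
    return (L / l) * (l**2 / D_eff + tau3) / math.sqrt(1.0 + (l / lam)**2)

def tau1_opt_closed_form(lam, D_eff, tau3):
    """Eq. (47): optimal residence time, infinite at or above the critical tau3."""
    tau3_c = lam**2 / (2.0 * D_eff)          # Eq. (48)
    if tau3 >= tau3_c:
        return math.inf
    return tau3 / (1.0 - tau3 / tau3_c)

def tau1_opt_numeric(lam, D_eff, tau3, n=6001, decades=3.0):
    """Minimize Eq. (46) over a log-spaced grid of l spanning lam * 10**(+-decades)."""
    best_l, best_t = None, math.inf
    for i in range(n):
        l = lam * 10.0 ** (-decades + 2.0 * decades * i / (n - 1))
        t = search_time_regime_one(l, lam, D_eff, tau3)
        if t < best_t:
            best_l, best_t = l, t
    return best_l**2 / D_eff
```

For `lam=100`, `D_eff=1` the critical value is `tau3_c = 5000`. Below it, the grid minimum agrees with the closed form to within the grid resolution; above it, the minimum runs to the upper edge of the grid, which is the numerical signature of a diverging optimal antenna length.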
The existence of such a critical value may give a possible explanation of the values of $\tau_1$ obtained in vitro [19] and in vivo [20] for the Lac repressor. These are much larger than the optimal $\tau_1$ predicted by models that do not include ITs.

FIG. 11: Possible regimes as a function of $l$ and $\lambda$ are shown in the case of ITs and jumping (or IT, jumping and sliding with $b \ll R$) for $l_R \gg \sqrt{D_{\rm eff}\tau_3}$. The gray (white) area represents regime I (II). The dashed line represents the optimal antenna length. The optimal antenna length in the absence of IT is equal to $\sqrt{D_{\rm eff}\tau_3}$.

In Fig. 12 a comparison between Eq. (46) and numerical simulation is shown. One may see that increasing the value of $\tau_3$ increases the optimal value of $l$ (or equivalently $\tau_1$) in such a way that above some critical value, predicted by Eq. (48), it becomes infinite.

VI. INTERSEGMENTAL TRANSFER, SLIDING AND JUMPING
With the results of the previous section it is straightforward to consider the general case where ITs, sliding and jumping are allowed. Similar to the previous section, we show that jumping may slow the search process significantly. However, ITs make the search process much more robust to variations in parameters.

First consider the case $b \ll R$, where sliding events are very short. Clearly, in this case the results of the previous section hold with $\delta = b^2/D_1 + \tau_{IT}$. Here, as in Sec. IV, $D_1$ is the one-dimensional diffusion coefficient for sliding and $\tau_{IT}$ is the typical time that the protein is bound to two DNA segments. With this in mind, in this section we discuss only the opposite case of $b \gg R$. Here, as in Sec. V, the parameters that do not depend on $l$, such as $\lambda$, are taken as given. Sec. IV contains the relevant derivation of $\lambda$ for the case discussed here of long sliding, $b \gg R$.

FIG. 12: The influence of ITs on the search time is shown. The search time, $t_{\rm search}$, is plotted as a function of the antenna length, $l$, for different values of $\tau_3$ (140 , , , , , $\delta$ from bottom up). Here only ITs and jumping are allowed. Thin solid lines represent the numerical results. The bold solid lines represent analytic results (Eq. (46)). The black, dashed lines represent the search time in the case with no ITs, obtained by using Eq. (4) with the effective diffusion constant $D_{\rm eff} = R^2/\delta$ instead of $D_1$. Here $L$, $R$ and $b$ were taken to be 1224000, 1 and 1 lattice constants respectively. Since $R = b = 1$, diffusion through sliding is identical to diffusion through CITs. This allows us to directly compare sliding and jumping with ITs and jumping.

As shown in Sec. IV, in this case $D_{\rm eff} \sim b^2/\delta$ with $\delta = b^2/D_1 + \tau_{IT}$. Following Sec. V we first need $\tau_3^{\rm eff}$, the typical time of an uncorrelated relocation. This is given by (see the derivation of Eq. (44) and Appendix D 1)

$$\tau_3^{\rm eff} = \frac{\tau_3 + \tau_{IT}\, l^2/\lambda^2}{1 + l^2/\lambda^2}. \tag{51}$$

Note that here, since $b \gg R$, the search between two subsequent uncorrelated relocations is always recurrent and therefore $l_s \sim l_{\rm eff}$. Therefore, similar to Sec. V, the search time is given by

$$t_{\rm search} \sim \frac{L}{l_{\rm eff}}\left(\tau_1^{\rm eff} + \tau_3^{\rm eff}\right) \sim \frac{L}{l_{\rm eff}}\left(\frac{l_{\rm eff}^2}{D_{\rm eff}} + \frac{\tau_3 + \tau_{IT}\, l^2/\lambda^2}{1 + l^2/\lambda^2}\right). \tag{52}$$

Using Eqs. (43) and (52), the total search time can be written as

$$t_{\rm search} \sim \frac{L}{l\,\sqrt{1 + l^2/\lambda^2}}\left(\frac{l^2}{D_{\rm eff}} + \tau_{IT}\,\frac{l^2}{\lambda^2} + \tau_3\right). \tag{53}$$

Again, it is interesting to consider the optimal value of $\tau_1$,

$$\tau_1^{\rm opt} = \frac{\left(\tau_1^{\rm opt}\right)_0}{1 - \dfrac{2\tau_3 - \tau_{IT}}{\lambda^2/D_{\rm eff}}}, \tag{54}$$

where $\left(\tau_1^{\rm opt}\right)_0 = \tau_3$ (see Eq. (6)) is the optimum in the absence of ITs ($\lambda \to \infty$). Interestingly, Eq. (54) shows that the optimal $\tau_1^{\rm opt}$ may be either smaller or larger than $\left(\tau_1^{\rm opt}\right)_0$, depending on the time of an IT, $\tau_{IT}$. It is also noteworthy that when $2\tau_3 > \lambda^2/D_{\rm eff} + \tau_{IT}$ the optimal $\tau_1$ value becomes infinite. Namely, jumping makes the search process slower. This is similar to the behavior found in Sec. V, and again the critical value of $\tau_3$ depends on microscopic quantities such as the time of an IT.

The minimal search time obtained is

$$t_{\rm search}^{\rm opt} \sim \begin{cases} \dfrac{L}{\lambda}\,\sqrt{\tau_3}\,\sqrt{\lambda^2/D_{\rm eff} + \tau_{IT} - \tau_3} & 2\tau_3 < \lambda^2/D_{\rm eff} + \tau_{IT} \\[2mm] L\left(\dfrac{\lambda}{D_{\rm eff}} + \dfrac{\tau_{IT}}{\lambda}\right) & 2\tau_3 > \lambda^2/D_{\rm eff} + \tau_{IT}. \end{cases} \tag{55}$$

We stress again that jumping may slow the search considerably. Note that again the optimal value of $\tau_1$ is very different from the canonical one discussed in Sec. II.

Fig. 13 shows a comparison between the theoretically predicted search time (Eq. (53)) and numerical simulation.

VII. APPLICATION TO THE LAC REPRESSOR
The above results cover a very wide variety of regimes. For a given protein only several are of interest. To illustrate the use of the results presented above we consider the Lac repressor. The Lac repressor is the most studied DNA-binding protein (see [34] for a review), and its structure is highly suggestive of intersegmental transfers taking place. Despite this, several physical parameters of the protein are as yet unknown. In this section we use the known parameters: R ∼ nm [35], Λ ∼ µ , L ∼ mm , and those measured for the Lac repressor with only one DNA-binding domain, τ ∼ ms , τ ∼ . τ and D ∼ . µ /s [19, 20]. Still unknown are $b$, the sliding length, and $\tau_{IT}$, which we use as free parameters and study the search time as these are varied. It is interesting to note that the Lac repressor is so large that, as we show, essentially all ITs can move the protein at each step to a completely uncorrelated location on the DNA.

FIG. 13: The influence of ITs on the search time is shown. The search time, $t_{\rm search}$, is plotted vs. the antenna length, $l$, for different values of $\tau_3$ (10 , , , , $\delta$ from bottom up). Here ITs, jumping and sliding are allowed. Thin solid lines with dots represent the numerical results. The bold solid lines represent analytic results (Eq. (53)). The black, dashed lines represent the search time in the case with no ITs, obtained by using Eq. (4) with $D_{\rm eff} = R^2/\delta$. Here $L$, $R$ and $b$ were taken to be 1224000, 1 and 20 lattice constants respectively.

Fig. 14 shows the predicted $t_{\rm search}$ from Secs. V and VI as a function of $b$ and $\tau_{IT}$. One may see that for $b \gg R$, ITs do not affect the search time significantly even if $\tau_{IT}$ is small. This results from the small probability of performing a UIT for large values of $b$. However, if $b \ll R$ the search time may be decreased significantly by including ITs. For example, by setting b to be the size of one base pair ∼ . nm the search time decreases by a factor of three when τ IT = τ and if τ IT = τ the search time decreases by a factor of ten. Finally, Fig. 14 shows that for large values of $\tau_{IT}$, ITs may slow down the search process.

FIG. 14: In this figure the analytical prediction of $t_{\rm search}$ is shown as a function of the unknown parameters $b$ and $\tau_{IT}$.

VIII. SUMMARY
In this article we presented a comprehensive study of the influence of ITs on the search process. Using simple scaling arguments we studied a model which includes the protein dynamics and the DNA conformation. Two extreme regimes for the DNA dynamics were studied: completely quenched (frozen) and annealed (rapidly moving) DNA. ITs were assumed to relocate the protein to a randomly chosen DNA position within a range of the order of the protein size. The essence of the description may be understood from Sec. III. The following sections elaborate on it and study search processes based on ITs with sliding and/or jumping. The results for a particular protein of interest may be obtained by selecting the section most relevant for that case.

The obtained results clearly indicate that including IT in the search process may increase the robustness of the search efficiency to different parameters of the model, such as the protein-DNA affinity, the three-dimensional diffusion coefficient etc.

The mechanism of IT may produce a significant increase of the optimal residence time of the protein on the DNA between two subsequent rounds of three-dimensional diffusion, relative to the value predicted by models that do not include IT. Recent experiments indicate that the residence time of proteins on the DNA between two subsequent rounds of three-dimensional diffusion is much larger than the optimum predicted by such models. It is possible that the existence of the IT mechanism may explain the rather quick search times found in in vivo experiments.

One of the most surprising results is that above some critical value of the typical time of a jump the protein has no reason to detach from the DNA. It is more efficient for it to stay bound to the DNA. The value of the critical jump time depends on the time of an IT.

A key ingredient needed for this behavior to occur is the confinement of the DNA in a volume much smaller than its radius of gyration.
The probability to perform a UIT obviously depends on the DNA density. A larger density implies a larger probability for UITs. Therefore the effects of IT are expected to be more important in systems with high DNA density, such as cells or eukaryotic nuclei, than in in vitro experiments.
The dependence on the DNA density mentioned above leads to many possible regimes which depend on the cell size, DNA length etc. In particular, we found non-trivial regimes in which the search time increases as the square root of the DNA length or is completely independent of it. Our estimates indicate that these seem to be the ones most relevant to experiments.

Our results also show that the search on quenched and annealed DNA may have quite different scaling behavior. In general a search that uses ITs is shown to be more rapid on an annealed DNA than on a quenched DNA. This happens due to the rapid decrease in correlations which results from the motion of the DNA molecule.

Similar scaling arguments were used to discuss the effects of IT in [11]. However, there the main mechanism that drives the IT was assumed to be the motion of the DNA molecule. In our study ITs are shown to be important even on completely quenched DNA.
Acknowledgments
We thank R. Voituriez and R. Metzler for discussions and D. Levine for discussions and comments on the manuscript. The Israel Science Foundation is acknowledged for financial support.
APPENDIX A:
In this appendix we argue that the typical time that the protein spends in a jump is given by $\tau_3 \sim \Lambda^3/(D_3 L)$. This quantity is controlled by the average volume which is free from DNA. Consider, first, the probability to find a volume of radius $s$ that is free from DNA. To do so we describe the packed DNA as an ideal gas of $L/L_0$ straight rods of length $L_0$ that are distributed randomly in the cell (see Fig. 3). The probability $p_{\rm seg}$ that a given rod crosses a volume of radius $s$ is of the order of $(L_0^3/\Lambda^3)(s^2/L_0^2) = L_0 s^2/\Lambda^3$. Here $L_0^3/\Lambda^3$ is the probability that a given segment is located within a distance $L_0$ of a point inside the cell and $\sim s^2/L_0^2$ is the probability that this segment crosses a sphere of radius $s$ around the point. The probability that at least one segment crosses the void is

$$1 - \left(1 - p_{\rm seg}\right)^{L/L_0} \simeq 1 - e^{-L s^2/\Lambda^3}. \tag{A1}$$

Therefore the typical free volume radius is $\sim \sqrt{\Lambda^3/L}$. Hence, the typical time to explore this volume is $\tau_3 \sim \Lambda^3/(D_3 L)$.

A second way to get the same expression for $\tau_3$ is based on a comparison between Eqs. (1) and (4). In the limiting case $\tau_1 \ll \tau_3$ and $\sqrt{D_1\tau_1} = r$, the search is based only on three-dimensional diffusion. Hence, in this case Eq. (4) should reduce to Eq. (1). It is easy to see that this happens only when $\tau_3 \sim \Lambda^3/(D_3 L)$.

APPENDIX B:
In this appendix we describe the details of the numerical simulation. The simulations were done on a cubic lattice containing $800 \times 800 \times 800$ sites. Assuming that a real cell has a volume of 1 µm$^3$, each site on the lattice represents a volume of $(dx)^3 = \left(\tfrac{1}{800}\,\mu{\rm m}\right)^3$. Polymers (representing the DNA) with different lengths were embedded in the lattice by using a self-avoiding random walk. The persistence length was accounted for by assigning a probability $p$ of changing direction randomly among the possible directions. Using a persistence length of about 50 nm leads to $p = dx/(50\,{\rm nm}) = 0.025$. [...] O(10) [...] dx b to perform an IT and a probability dx l to perform a jump. ITs were simulated by randomly choosing a DNA site within a distance $R$ from the location of the protein. With the exception of Sec. II, where a complete simulation of the three-dimensional diffusion was carried out by performing moves in the 6 available directions, a jump was simulated by randomly choosing a site on the DNA. The time of the jump was taken as a free constant ($\tau_3$).

(Footnote: In three-dimensional space diffusive exploration is not compact, i.e. the probability to find a finite target (sphere) is less than one. However, the DNA as a target may be described as a set of straight rods. Hence, the search process effectively looks like a two-dimensional search for a finite target (disk), i.e. it is compact, up to logarithmic corrections.)

APPENDIX C:
In this appendix we argue that using ITs the protein can only move along the chemical coordinate to distances smaller than $R$ or larger than $\Lambda^2/L_0$. As mentioned above, we assume that during an IT the protein chooses a new location whose three-dimensional distance from its current location is smaller than $R$. The new location is chosen randomly with a uniform probability. Given the uniform probability we need to estimate the total typical length available at each IT, $G$. We separate this quantity into four types of contributions:

$$G = G_1 + G_2 + G_3 + G_4. \tag{C1}$$

The first, $G_1$, is the contribution from DNA whose distance along the chemical coordinate from a point $x$ is smaller than $R$, the protein size. This is given by

$$G_1(x) \simeq R. \tag{C2}$$

The contribution $G_2$ arises from DNA whose chemical distance from a point $x$ is larger than $R$ but smaller than $L_0$. The probability for the DNA to bend on a scale $l$ is approximately given by $\left(1 - e^{-l/L_0}\right)/L_0$. However, the probability that this bend will connect to $x$ is $\sim R^2/l^2$ (due to the area ratio). Since each connection contributes a length of the order of $R$ to $G_2$ we obtain

$$G_2 \sim R \int_R^{L_0} \frac{1 - e^{-l/L_0}}{L_0}\,\frac{R^2}{l^2}\,dl \sim \frac{R^2}{L_0}\left(1 - e^{-R/L_0}\right) \simeq \frac{R^3}{L_0^2}. \tag{C3}$$

The contribution $G_3$ comes from DNA whose chemical distance from $x$ is larger than $L_0$ but smaller than the length at which the DNA feels the boundaries of the cell, which is $\sim \Lambda^2/L_0$. This value can be overestimated using the fact that a free three-dimensional random walk on a lattice returns to the origin of order one time. A walk with step $L_0$ therefore returns to a region with radius $R$ of the order of $(R/L_0)^2$ times. Each such return contributes a length of about $R$ to $G_3$, leading to

$$G_3 \sim \frac{R^3}{L_0^2}. \tag{C4}$$

Finally, $G_4$ is the contribution from the rest of the DNA (whose chemical distance is larger than $\Lambda^2/L_0$ but smaller than $L$). Using Eq. (13), and since each connected segment contributes a length of the order of $R$ to $G_4$, one obtains

$$G_4 \sim R\,\frac{L}{L_0}\,p_{\rm seg} \sim \frac{L R^3}{\Lambda^3}. \tag{C5}$$

This result can be understood within a mean field approach: if the DNA has a total length $L$ and is assumed to be distributed uniformly in the cell, every volume in the cell contains a part of the total DNA length that is equal to the total DNA length times the fraction of the volume. One can see that in the assumed regime, where $L_0 \gg R$ and $L \gg \Lambda^3/L_0^2$, $G_2$ and $G_3$ are much smaller than $G_1$ and $G_4$. Therefore, we can safely neglect the probability that the protein will move to a location on the DNA whose chemical distance from the protein's actual location is larger than $R$ and smaller than $\Lambda^2/L_0$.

APPENDIX D:
In this appendix the effective times $\tau_1^{\rm eff}$ and $\tau_3^{\rm eff}$ are calculated.
1. The effective time of a correlated movement
We have two independent mechanisms for uncorrelated motion. The first is jumping, with a typical time $\tau_1$ between two subsequent jumps. This process has Poissonian statistics and therefore the probability that the protein does not perform a jump before time $t$ is

$$P_J = \exp\left(-\frac{t}{\tau_1}\right). \tag{D1}$$

The second mechanism for uncorrelated motion is a UIT, with a typical time of order $\lambda^2/D_{\rm eff}$ between two subsequent UITs. In the case of annealed DNA this mechanism has Poissonian statistics and the probability that the protein does not perform a UIT before time $t$ is

$$P_{IT} \sim \exp\left(-\frac{t}{\lambda^2/D_{\rm eff}}\right). \tag{D2}$$

For quenched DNA the probability that the protein has not performed a UIT after traveling a distance $x$ is $\sim e^{-x/\lambda}$. Since the protein performs an effective one-dimensional diffusion, $x \sim \sqrt{D_{\rm eff}\,t}$ and we obtain

$$P_{IT} \sim \exp\left(-\sqrt{\frac{t}{\lambda^2/D_{\rm eff}}}\right).$$

We take the typical time of a one-dimensional effective diffusion that is not interrupted by an uncorrelated relocation to be

$$\tau_1^{\rm eff} = \int_0^\infty P_{IT}\,P_J\,dt \sim \frac{1}{2}\,\frac{1}{\dfrac{D_{\rm eff}}{l^2} + \dfrac{D_{\rm eff}}{\lambda^2}}. \tag{D3}$$

The last expression is exact in the annealed case but is only an approximation in the quenched regime. One can verify that the error does not exceed 50%, which is sufficient for scaling arguments of the type used in the paper.
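The 50% statement can be checked numerically. The sketch below is only an illustration under stated assumptions: it writes the two time scales as `a` (the mean time between jumps) and `b` (the UIT time scale, of order $\lambda^2/D_{\rm eff}$), sets all prefactors of order one to unity, and compares the quenched integral of $\exp(-t/a - \sqrt{t/b})$ with the closed form $1/(1/a + 1/b)$, which is the exact value of the annealed integral. The function names and the scanned range of `a/b` are hypothetical choices, not taken from the paper.

```python
import math


def quenched_integral(a, b, n=20000):
    """Simpson estimate of the integral of exp(-t/a - sqrt(t/b)) over t >= 0.

    a: mean time between jumps; b: UIT time scale (illustrative names).
    The substitution t = u**2 removes the square-root kink at t = 0.
    """
    # Beyond u_max the exponent is below -50, so the tail is negligible.
    u_max = min(math.sqrt(50.0 * a), 50.0 * math.sqrt(b))
    h = u_max / n  # n must be even for Simpson's rule

    def f(u):
        return 2.0 * u * math.exp(-u * u / a - u / math.sqrt(b))

    total = f(0.0) + f(u_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


def annealed_formula(a, b):
    """Exact value of the integral of exp(-t/a - t/b) over t >= 0."""
    return 1.0 / (1.0 / a + 1.0 / b)


def worst_relative_error(ratios):
    """Largest |formula - quenched| / quenched over the given a/b ratios (b = 1)."""
    worst = 0.0
    for r in ratios:
        exact = quenched_integral(r, 1.0)
        worst = max(worst, abs(annealed_formula(r, 1.0) - exact) / exact)
    return worst


if __name__ == "__main__":
    # Scan a/b logarithmically from 1e-3 to 1e3 (an arbitrary illustrative range).
    ratios = [10.0 ** (k / 4.0) for k in range(-12, 13)]
    print(worst_relative_error(ratios))
```

In this scan the relative deviation stays below one half. It is largest when jumps are rare compared with UITs (`a` much larger than `b`), where the quenched integral tends to `2 * b` while the closed form tends to `b`.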
2. The effective time of an uncorrelated movement
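Since there are two mechanisms for uncorrelated movement, a jump with a typical time $\tau_3$ and a UIT with a typical time $\delta$, the typical time of the uncorrelated movement is the average of $\tau_3$ and $\delta$ weighted by the relevant probabilities for each process:

$$\begin{aligned}
\tau_3^{\rm eff} &= \delta \int_0^\infty \left(-\frac{dP_{IT}}{dt}\right) P_J\,dt + \tau_3 \int_0^\infty \left(-\frac{dP_J}{dt}\right) P_{IT}\,dt \\
&= \delta \int_0^\infty \frac{dP_J}{dt}\,P_{IT}\,dt - \tau_3 \int_0^\infty \frac{dP_J}{dt}\,P_{IT}\,dt - \delta \int_0^\infty \frac{d}{dt}\left(P_J P_{IT}\right) dt \\
&= \frac{\tau_3 - \delta}{\tau_1} \int_0^\infty P_J P_{IT}\,dt + \delta = \frac{\tau_3 - \delta}{\tau_1}\,\tau_1^{\rm eff} + \delta \\
&= \tau_3\,\frac{\tau_1^{\rm eff}}{\tau_1} + \delta\left(1 - \frac{\tau_1^{\rm eff}}{\tau_1}\right) = \tau_3\,\frac{l_{\rm eff}^2}{l^2} + \delta\left(1 - \frac{l_{\rm eff}^2}{l^2}\right) = \frac{\tau_3 + \delta\,l^2/\lambda^2}{1 + l^2/\lambda^2}.
\end{aligned} \tag{D4}$$

In the case of sliding $\delta$ is replaced by $\tau_{IT}$.

[1] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. The molecular biology of the cell. Garland, New York, 4th edition, 1994.
[2] C. Loverdo, O. Benichou, M. Moreau, and R. Voituriez. Enhanced reaction kinetics in biological cells.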
Nature Physics, 4:134, 2007.
[3] M. von Smoluchowski. Mathematical theory of the kinetics of the coagulation of colloidal solutions. Z. Phys. Chem., 92:129, 1917.
[4] M. B. Elowitz, M. G. Surette, P. E. Wolf, J. B. Stock, and S. Leibler. Protein mobility in the cytoplasm of Escherichia coli. J. Bacteriol., 181(1):197, 1999.
[5] S. Y. Lin and A. D. Riggs. Lac repressor binding to non-operator DNA: detailed studies and a comparison of equilibrium and rate competition methods. J. Mol. Biol., 72(3):671, 1972.
[6] A. D. Riggs, H. Suzuki, and S. Bourgeois. Lac repressor-operator interaction I. Equilibrium studies. J. Mol. Biol., 48(1):67, 1970.
[7] A. D. Riggs, S. Bourgeois, and M. Cohn. The Lac repressor-operator interaction. 3. Kinetic studies. J. Mol. Biol., 53(3):401, 1970.
[8] O. G. Berg, R. B. Winter, and P. H. von Hippel. Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry, 20(24):6929, 1981.
[9] M. Slutsky and L. A. Mirny. Kinetics of protein-DNA interaction: Facilitated target location in sequence-dependent potential. Biophys. J., 87:4021, 2004.
[10] T. Hu, A. Y. Grosberg, and B. I. Shklovskii. How proteins search for their specific sites on DNA: The role of DNA conformation. Biophys. J., 90:2731, 2006.
[11] T. Hu and B. I. Shklovskii. How proteins search for their specific sites on DNA: The role of intersegment transfer. Phys. Rev. E, 76:051909, 2007.
[12] O. G. Berg and C. Blomberg. Association kinetics with coupled diffusional flows. Special application to the Lac repressor-operator system. Biophys. Chem., 4:367, 1976.
[13] S. E. Halford and J. F. Marko. How do site-specific DNA-binding proteins find their targets? Nucleic Acids Research, 32(10):3040, 2004.
[14] B. P. Belotserkovskii and D. A. Zarling. Analysis of a one-dimensional random walk with irreversible losses at each step: applications for protein movement on DNA. J. Theor. Biol., Biophysical Journal, 88:1608, 2005.
[16] M. A. Lomholt, T. Ambjörnsson, and R. Metzler. Optimal target search on a fast-folding polymer chain with volume exchange. Phys. Rev. Lett., 95:260603, 2005.
[17] G. Adam and M. Delbruck. Reduction of dimensionality in biological diffusion processes. In A. Rich and N. Davidson, editors, Structural Chemistry and Molecular Biology, pages 198–215, San Francisco, CA, 1968. Freeman.
[18] B. D. Hughes. Random walks and random environments, volume 1: Random walks. Clarendon Press, Oxford, UK, 1995.
[19] Y. M. Wang, R. H. Austin, and E. C. Cox. Single molecule measurement of repressor protein 1D diffusion on DNA. Phys. Rev. Lett., 97:048302, 2006.
[20] J. Elf, G.-W. Li, and P. X. Xie. Probing transcription factor dynamics at the simple single-molecule level in a living cell. Science, 316:1191, 2007.
[21] Y. Kao-Huang, A. Revzin, A. P. Butler, P. O'Conner, D. W. Noble, and P. H. von Hippel. Nonspecific DNA binding of genome-regulating proteins as a biological control mechanism: Measurement of DNA-bound Escherichia coli Lac repressor in vivo. PNAS, 74:4228, 1977.
[22] P. H. von Hippel, A. Revzin, C. A. Gross, and A. C. Wang. In H. Sund and G. Blauer, editors, Protein-Ligand Interactions, pages 279–347, Berlin, 1975. Walter de Gruyter.
[23] J. L. Bresloff and D. M. Crothers. DNA-ethidium reaction kinetics: demonstration of direct ligand transfer between DNA binding sites. J. Mol. Biol., 172:263, 1975.
[24] R. Fickert and B. Muller-Hill. How Lac repressor finds Lac operator in vitro. J. Mol. Biol., 226(1):59, 1992.
[25] T. Ruusala and D. M. Crothers. Sliding and intermolecular transfer of the Lac repressor: Kinetic perturbation of a reaction intermediate by a distant DNA sequence. Proc. Natl. Acad. Sci. USA, 89:4903, 1992.
[26] B. A. Lieberman and S. K. Nordeen. DNA intersegment transfer, how steroid receptors search for a target site. J. Biol. Chem., 272(2):1061, 1997.
[27] M. L. Embleton, S. A. Williams, M. A. Watson, and S. E. Halford. Specificity from the synapsis of DNA elements by the SfiI endonuclease. J. Mol. Biol., 289(4):785, 1999.
[28] C. Bustamante, M. Guthold, X. Zhu, and G. Yang. Facilitated target location on DNA by individual Escherichia coli RNA polymerase molecules observed with the scanning force microscope operating in liquid. ASBMB, 274(24):16665, 1999.
[29] M. G. Fried and D. M. Crothers. Kinetics and mechanism in the reaction of gene regulatory proteins with DNA. J. Mol. Biol., 172:263, 1984.
[30] S. Condamin, O. Benichou, V. Tejedor, R. Voituriez, and J. Klafter. First-passage times in complex scale-invariant media. Nature, 450:77, 2007.
[31] S. Y. Lin and A. D. Riggs. The general affinity of Lac repressor for E. coli DNA: implication for gene regulation in procaryotes and eucaryotes. Cell, 4:107, 1975.
[32] B. Richey, D. S. Cayley, M. C. Mossing, C. Kolka, C. F. Anderson, T. C. Farrar, and M. T. Record. Variability of the intracellular ionic environment of Escherichia coli. Differences between in vitro and in vivo effects of ion concentrations on protein-DNA interactions and gene expression. J. Biol. Chem., 262:7157, 1987.
[33] D. J. Watts. Small Worlds: The Dynamics of Networks Between Order and Randomness. Princeton University Press, 1999.
[34] M. Muller-Hill. The Lac operon. A short history of a genetic paradigm. Walter de Gruyter, Berlin, 1996.
[35] G. C. Ruben and T. B. Roos. Conformation of Lac repressor tetramer in solution, bound and unbound to operator DNA.