Effects of intersegmental transfers on target location by proteins

Abstract

We study a model for a protein searching for a target, using facilitated diffusion, on a DNA molecule confined in a finite volume. The model includes three distinct pathways for facilitated diffusion: (a) sliding - in which the protein diffuses along the contour of the DNA (b) jumping - where the protein travels between two sites along the DNA by three-dimensional diffusion, and finally (c) intersegmental transfer - which allows the protein to move from one site to another by transiently binding both at the same time. The typical search time is calculated using scaling arguments which are verified numerically. Our results suggest that the inclusion of intersegmental transfer (i) decreases the search time considerably (ii) makes the search time much more robust to variations in the parameters of the model and (iii) that the optimal search time occurs in a regime very different than that found for models which ignore intersegmental transfers. The behavior we find is rich and shows surprising dependencies, for example, on the DNA length.


arXiv [q-bio.SC]

Michael Sheinman and Yariv Kafri

Department of Physics, Technion-Israel Institute of Technology, 32000 Haifa, Israel (Dated: November 2, 2018)

I. INTRODUCTION

Many biological processes depend on the ability of proteins to locate specific DNA sequences on time scales ranging from seconds to minutes. Examples include gene expression and repression, DNA replication and others [1]. Naively, one might expect the protein to search for its target using only three-dimensional diffusion. (In this paper we only consider proteins whose motion is diffusive and not directed; directed motion could result from consumption of, for example, chemical energy and is discussed in [2].) Neglecting interactions of the protein with the environment and the DNA (apart from the target site), one then finds, using results first obtained by Smoluchowski [3], that the average search time, $t_{\text{search}}$, is given by

$$t_{\text{search}} \sim \frac{\Lambda^3}{D_3 r}. \qquad (1)$$

Here $D_3$ is the three-dimensional diffusion constant of the protein, $r$ is the target size and $\Lambda^3$ is the volume that needs to be searched. Assuming a target size $r$ of the order of a base pair, a typical nucleus (or bacterium) size $\Lambda$, and the three-dimensional diffusion coefficient $D_3$ measured for a GFP protein in vivo [4], one finds $t_{\text{search}}$ of the order of hundreds of seconds. If $N$ proteins are searching for the same target, the search time is given by $t^{N}_{\text{search}} \simeq t_{\text{search}}/N$ (this relation between the search time for one protein and for $N$ proteins remains unchanged throughout the paper). This suggests that about 10 proteins could find a target in reasonable times for cells to function properly.

In real systems, due to the interactions of proteins with non-specific DNA sequences and the environment [5], the picture is more complex. Indeed, in vitro experiments have suggested that mechanisms other than three-dimensional diffusion are used by many proteins to locate their targets [6, 7]. These strategies have been studied and debated extensively both in the context of in vivo [8, 9, 10, 11] and in vitro systems [8, 10, 12, 13, 14, 15, 16] and are believed, in general, to allow for search times which are shorter than that given by Eq. (1).

Historically, the first strategy that was proposed combines one-dimensional diffusion (sliding) along the DNA with intervals of three-dimensional diffusion (typically called jumping in this context) [8, 17] (see Fig. 1). Each individual search mechanism, when applied alone, has shortcomings and advantages relative to the other. When using only three-dimensional diffusion, the number of new three-dimensional positions probed grows linearly in time, but the protein spends much time probing sites where there is no DNA present. In contrast, during one-dimensional diffusion the protein is constantly bound to the DNA but suffers from a slow increase in the number of new positions probed as a function of time ($\sim t^{1/2}$, where $t$ denotes time) [18]. As shown, for example, in Refs. [8, 17], by intertwining one- and three-dimensional search strategies and tuning the properties of both one can in fact decrease the search time significantly. (Clearly, a pure one-dimensional search strategy is not efficient due to the slow diffusive search along the DNA, $t_{\text{search}} \sim L^2/D_1 \sim O(\text{hours})$, where $L$ is the genome length and $D_1$ is the one-dimensional diffusion coefficient, which was measured indirectly [12] and directly [19, 20] to be much smaller than the three-dimensional diffusion coefficient $D_3$ [4].)

The combined strategy, while better than the pure search strategies, comes at the cost of being sensitive to changes in the properties of either the three-dimensional or the one-dimensional diffusive process. For example, as we argue below, the typical search time changes exponentially in the square root of the ionic strength. Moreover, given the many constraints on the protein to function, it is very restrictive to demand optimization for the search process. Indeed, equilibrium measurements [21] and recent single-molecule experiments [19, 20] on the Lac repressor protein suggest that the search process may not in general be optimized for this search strategy.

A third mechanism which was suggested to speed up the search is intersegmental transfer (IT) [22, 23]. During an IT the protein moves from one site to another by transiently binding both at the same time. In principle the new site can be either close along the one-dimensional DNA sequence (or chemical distance) or distant (see Fig. 3). This mechanism is likely to be relevant for proteins that have more than one binding domain, like the Lac repressor [24, 25], GRdbd [26] and the SfiI enzyme [27]. However, it could also occur in proteins with a single binding site at locations where the DNA crosses itself. To date we are aware of direct evidence for IT only for RNA polymerase [28]. However, measurements of the dissociation rate from a labeled (operator) DNA site of the rat glucocorticoid receptor [26], CAP and Lac repressor [29] revealed a significant dependence on the DNA concentration in the solvent, a possible explanation for which is IT. Some theoretical work has suggested that in vivo, where the DNA concentration is much larger than in in vitro experiments, IT may play a decisive role [8, 11, 16]. These studies focus on the ITs resulting from the DNA dynamics and consider the protein to be point-like.

In this paper we present a rather comprehensive study of the effects of ITs on the search process for a DNA molecule confined in a finite volume, similar to the in vivo scenario. Our work complements previous ones by explicitly accounting for the size of the protein and considering two limiting cases: (i) DNA which is completely static during the search process and (ii) DNA whose motion is quicker than the protein's motion along the DNA. Using scaling arguments backed by numerics we obtain expressions for $t_{\text{search}}$ and for the optimal search time (obtained by tuning parameters such as the DNA-protein affinity). A central conclusion of this paper is that the search time is much more robust to variations in parameters when ITs are allowed. This is to the extent that in some cases any finite jumping rate can have a negative influence on the search time. (Of course, this robustness may be both advantageous and disadvantageous for the cell. In some cases the cell needs transcription factors whose kinetic (and, therefore, equilibrium) properties do depend on the environment and in other cases it does not.) In particular, the optimal search time is found to occur for parameter regimes very different from the canonical one (see Sec. II) found in models which ignore ITs. Perhaps most important, as we show, our work suggests that ITs could explain recent findings which indicate a much higher affinity of the TF Lac repressor to the DNA than required by an optimal search strategy which uses only sliding and jumping [19, 20, 21].

FIG. 1: Schematic plots illustrating the different mechanisms that can participate in the facilitated diffusion process. Here dashed arrows represent different protein moves, the solid curve represents the DNA and a small circle with two legs indicates a protein with two binding domains. The figure shows (a) sliding, (b) a correlated intersegmental transfer, (c) an uncorrelated intersegmental transfer, (d) jumping. The distinction between (b) and (c) is defined in Sec. III. (e) The dashed (dotted) line represents a one-dimensional (three-dimensional) distance.

The scaling dependence of the search time on different parameters is rich and very different from regular facilitated diffusion (involving only sliding and jumping). Consider, for example, the dependence of the search process on the length of the DNA, $L$, for a DNA confined in a volume $\Lambda^3$. Using only sliding and jumping, the regime typically thought to be relevant to experiment has a linear dependence of the search time on the DNA length $L$. A Smoluchowski-like search time is independent of $L$. In contrast, when ITs are allowed we find different behavior. We estimate that the regime most relevant to in vivo experiments (in prokaryotic organisms) occurs when the dependence on the length of the DNA is weak. For example, when searches are performed using only ITs the search time can be independent of $L$ or scale as $\sqrt{L}$, depending on the DNA's dynamics. The scaling behaviors rely on the confinement of the DNA in a finite volume (shown in detail in Fig. 9) and could be used as experimental probes for the existence of ITs.

The paper is organized as follows: Sec. II briefly reviews the main arguments used to analyze searches that combine only sliding and jumping. In Sec. III the average search time is calculated for the case of a strategy based only on ITs, for both quenched and annealed DNA. In Sec. IV a search process that includes ITs and sliding is considered. Sec. V considers the possibility that the protein can unbind from the DNA (jump) and perform ITs. Sec. VI studies a model with all three mechanisms. Finally, in Sec. VII we discuss possible scenarios for the Lac repressor and summarize in Sec. VIII.

II. SLIDING AND JUMPING

To set the stage for a discussion of the effects of IT we consider a search process which uses only sliding and jumping. The discussion follows Refs. [9] and [13] closely. We imagine a single protein searching for a single target located on the DNA. The search is composed of a series of intervals of one-dimensional diffusion along the DNA (sliding) and three-dimensional diffusion in the solution (jumping). The typical time of each is denoted by $\tau_1$ and $\tau_3$, respectively. Following a jump, the protein is assumed to associate at a new, randomly chosen location along the DNA. While this approach is somewhat simplistic for jumps occurring in two dimensions and below, for three dimensions, the case we consider, it is well suited [30].

Under these assumptions, during each sliding event the protein covers a typical length $l$, where $l \sim \sqrt{D_1 \tau_1}$ (often called the antenna size) [18]. Since correlations between the locations of the protein before and after the jump are neglected, the search process, completed when roughly all the DNA is scanned, is separated into

$$N_r \sim \frac{L}{l_s} \qquad (2)$$

rounds of sliding and jumping. Here $l_s$ is the typical length scanned by the protein during a round. If during the slide the protein does not skip sites on the DNA, $l_s \sim l$ (the distinction between $l_s$ and $l$ will become apparent when ITs are introduced). The total time needed to find a specific site is then

$$t_{\text{search}} = N_r \tau_r, \qquad (3)$$

with $\tau_r = \tau_1 + \tau_3$. Using Eqs. (2) and (3) one obtains

$$t_{\text{search}} \sim \frac{L}{l_s}\left(\tau_1 + \tau_3\right) \sim \frac{L}{\sqrt{D_1}}\left(\sqrt{\tau_1} + \frac{\tau_3}{\sqrt{\tau_1}}\right). \qquad (4)$$

Furthermore, it is easy to argue (see Appendix A) that

$$\tau_3 \sim \frac{\Lambda^3}{D_3 L}. \qquad (5)$$

In Fig. 2 a comparison between the presented scaling arguments and a numerical simulation of a search that explicitly includes sliding on a DNA with a frozen configuration in a finite volume and three-dimensional diffusion is shown (see Appendix B for details of the numerics). The excellent agreement justifies many of the simplifications made, in particular the neglect of correlations between the initial and final locations of the jump. Throughout the paper we assume this always holds (see Appendix B).

The analysis leads to a richer range of possible behaviors than found in Eq. (1), where the search time depends only on the volume in which the DNA is embedded [10]. Here, in contrast, three regimes are found: (i) For $\tau_1 \ll \tau_3$ there is no dependence on $L$ and the search time is given to a good approximation by Eq. (1). (ii) For $L^2/D_1 \gg \tau_1 \gg \tau_3$ the dependence on the DNA length is linear. This is the regime typically considered relevant for experiments. (iii) For $L^2/D_1 \ll \tau_1$ one finds $t_{\text{search}} \propto L^2$.

It is natural to ask which $\tau_1$ optimizes $t_{\text{search}}$. Using Eq. (4) it is easy to verify that

$$\left(\tau_1^{\text{opt}}\right)_0 = \tau_3, \qquad (6)$$

where 0 denotes a value obtained with no ITs. Alternatively, one can consider an optimal antenna size $\left(l^{\text{opt}}\right)_0 = \sqrt{D_1 \tau_3}$. When this condition is met, the total search time scales as

$$t^{\text{opt}}_{\text{search}} = 2\sqrt{\frac{\tau_3}{D_1}}\, L \sim \sqrt{\frac{\Lambda^3 L}{D_1 D_3}}. \qquad (7)$$

Note that the $\sqrt{L}$ dependence is obtained by optimizing, say, $\tau_1$ as $L$ is varied.

FIG. 2: The search time $t_{\text{search}}$ is shown as a function of the antenna length, $l$ (axes: Log $l$ versus Log $t_{\text{search}}$). The thin line represents the results from numerical simulations while the bold one is given by Eq. (4). Numerics were performed on a DNA embedded in a finite volume with a frozen configuration. The length of the DNA was taken to be 1224000 lattice constants and $D_1 = D_3 = 1$ (see details in Appendix B). Similar results were obtained for different values of $D_1$ and $D_3$.
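As a quick numerical check of the scaling argument, the following Python sketch evaluates Eq. (4), read here as an equality with all order-one prefactors set to 1, and confirms that the search time is minimized at $\tau_1 = \tau_3$. The function names and the parameter values (arbitrary units) are illustrative assumptions and are not taken from the simulations of Appendix B.

```python
import math


def search_time(tau1, tau3, D1, L):
    """Eq. (4) with prefactors set to 1 (assumption):
    t = (L / sqrt(D1)) * (sqrt(tau1) + tau3 / sqrt(tau1))."""
    return (L / math.sqrt(D1)) * (math.sqrt(tau1) + tau3 / math.sqrt(tau1))


def jump_time(Lam, D3, L):
    """Eq. (5) with prefactor 1 (assumption): tau3 = Lam**3 / (D3 * L)."""
    return Lam**3 / (D3 * L)


def optimal_search_time(tau3, D1, L):
    """Eq. (7): search time at the optimum tau1 = tau3."""
    return 2.0 * L * math.sqrt(tau3 / D1)


# Illustrative parameters in arbitrary units (not values from the paper).
D1, D3, L, Lam = 1.0, 1.0, 1.0e4, 1.0e2
tau3 = jump_time(Lam, D3, L)

# Scan tau1 on a logarithmic grid spanning three decades on each side of tau3.
taus = [tau3 * 10 ** (k / 100) for k in range(-300, 301)]
best_tau1 = min(taus, key=lambda t: search_time(t, tau3, D1, L))

# Penalty for missing the optimum by a factor of 4 in tau1.
ratio = search_time(4 * tau3, tau3, D1, L) / optimal_search_time(tau3, D1, L)

print(f"tau3 = {tau3:g}, best tau1 on grid = {best_tau1:g}, penalty = {ratio:g}")
```

On this grid the minimum sits at `best_tau1 == tau3`, and choosing $\tau_1 = 4\tau_3$ costs a factor $\tfrac{1}{2}\left(\sqrt{4} + 1/\sqrt{4}\right) = 1.25$ relative to Eq. (7), which illustrates how slowly the penalty grows near the optimum and how it becomes large only when $\tau_1/\tau_3$ changes by orders of magnitude.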
This model, at the optimal $\tau_1$ and assuming known values for $D_1$, $L$ and $\tau_3$, predicts reasonable search times in vivo and is commonly assumed to give a possible explanation for the two orders of magnitude difference between the experiments in vitro and Eq. (1).

Within the model the optimal search process requires fine tuning of the antenna size, $l$, as a function of the parameters $D_1$ and $\tau_3$. These parameters depend on various cell and environmental conditions such as the size of the cell, the DNA length, the ionic strength, etc. The dependence can be quite significant: for example, the parameter $\tau_1/\tau_3$ has an exponential dependence on the square root of the ionic strength [31]. Deviations of this parameter from the optimum value might be crucial to the search time since

$$\frac{t_{\text{search}}}{t^{\text{opt}}_{\text{search}}} = \frac{1}{2}\left(\sqrt{\frac{\tau_1}{\tau_3}} + \sqrt{\frac{\tau_3}{\tau_1}}\right).$$

Indeed, a strong dependence of the search time on the ionic strength was found in in vitro experiments [7]. Interestingly, in vivo, when the DNA is densely packed, no effect of the ionic strength on the efficiency of the Lac repressor was revealed [32]. Other experiments also suggest that $\tau_1$ is not optimized. In particular, equilibrium measurements [21], as well as recent single-molecule experiments [19, 20], find a value of $\tau_1$ for the dimeric Lac repressor that is much larger than the predicted optimum $\tau_1$ in vivo.

The lack of sensitivity to the ionic strength in vivo and the rapid search times found for the Lac repressor, even with very large values of $\tau_1$, suggest that other processes, apart from jumping and sliding, are involved in the search process. These seem to be more important in vivo than in vitro. In the next section we show that a search process which uses ITs significantly modifies the behavior found for searches which use only sliding and jumping. In particular, the problems encountered above (e.g., high sensitivity to the antenna length, very long and non-optimal measured antennas, etc.) are largely eliminated when ITs are included.

III. PURE INTERSEGMENTAL TRANSFER

Before turning to the full problem of a search which uses sliding, jumping and ITs we will consider a series of simplified models. Within the first model, considered in this section, the protein can only perform ITs. We will see that already at this level many of the problems of the search discussed above, which uses only sliding and jumping, are resolved to a large extent.

To model ITs we consider a protein with two binding sites. The protein can either have one site bound to the DNA or perform an IT to a new location by having both binding sites bound to the DNA (see Fig. 1). The DNA is scanned for the target by the binding sites, each checking a length $b$ when bound (note that since the protein has to align with the DNA sequence, $b$ is of the order of the length of a single base pair). A possible motivation for this picture is, for example, the tetrameric structure of the Lac repressor. However, as will be evident, many results also apply to proteins with different shapes.

Motivated by DNA in cells, we consider a DNA molecule which is densely packed in a small volume. In typical systems the DNA has a total length $L$, a persistence length $L_0$ and a cross-section radius $\rho$, and is contained in a volume $\Lambda^3$. The typical distance between segments of DNA of length $L_0$ is therefore much smaller than $L_0$: $\Lambda/(L/L_0)^{1/3} \ll L_0$. Under these conditions, using $\Lambda \gg L_0$, it is easy to check that the radius of gyration of free DNA, which is of the order of $L_0\sqrt{L/L_0}$, is much larger than the cell size $\Lambda$: the DNA is densely packed even though its fractional volume, $L\rho^2/\Lambda^3$, in the container is small (about one percent). By way of comparison, typical protein sizes $R$ are in the nanometer range, much smaller than the DNA's persistence length. Although in vivo the packing has a more complicated structure than we consider, we expect similar behavior to occur there as well.

As stated above, the protein moves by first being bound with only one binding site and then with both.
The typical time for such a move, defined by $\delta = \tau_b + \tau_{IT}$, is the sum of the typical time that the protein probes a length $b$ (by being bound with one domain) and the time that the protein is bound with both binding domains to the DNA while performing an IT. (We take $\delta$ independent of parameters such as the cell size $\Lambda$ and the DNA length $L$. This is justified in a regime where most ITs are close along the chemical coordinate of the DNA.) We assume that the protein moves (for example, using both legs of the Lac repressor) to a random position located at a distance smaller than or equal to $R$, the size of the protein, from it (see Fig. 1). (Different scenarios are considered at the end of the section.) Defining a "chemical" coordinate $x$ which runs along the length of the DNA, the protein can either perform moves from its location $x$ to the interval $[x - R, x + R]$ (we refer to these as "correlated ITs" (CITs)) or reach distant sites along the chemical coordinate available through the structure of the packed DNA.

Under the above conditions it is easy to verify (see Appendix C) that almost all ITs performed by the protein are either correlated moves or performed to a coordinate along the DNA whose distance from its previous location is bigger than $\Lambda^2/L_0$ (but smaller than $L$). We call these steps "uncorrelated ITs" (UITs) (see Fig. 1(c)). In other words, one can safely neglect the possibility that the protein will move using ITs to a chemical distance larger than $R$ and smaller than $\Lambda^2/L_0$.

Our main interest is the typical search time. For this purpose it is useful to define $\lambda$, the average length that the protein travels before performing a UIT. On chemical distances larger than $R$ but smaller than $\lambda$ the motion is effectively diffusive in one dimension with a diffusion coefficient $D_{\text{eff}} \sim R^2/\delta$. On chemical distance scales larger than $\lambda$ and smaller than $L$ the motion is controlled by UITs. Due to the three-dimensional nature of each UIT one expects correlations between different UITs to be negligible. We verify this assumption later using numerical simulations.

From the discussion, and using a language similar to that of Sec. II, the search process can be described as a sequence of

$$N_r \sim \frac{L}{l_s} \qquad (8)$$

rounds of correlated ITs, where $l_s$ is the length scanned by the protein during each round. Each round lasts a typical time

$$\tau_r \sim \frac{\lambda^2}{D_{\text{eff}}} \sim \left(\frac{\lambda}{R}\right)^2 \delta. \qquad (9)$$

In general, while performing CITs the protein can miss regions of the DNA by skipping over them. Since each segment of size $R$ is visited $\sqrt{\tau_r/\delta} \sim \lambda/R$ times [18], when $\lambda/R \gg R/b$ the walk is recurrent and no sites are skipped, so that $l_s \sim \lambda$. In contrast, when $\lambda/R \ll R/b$ the walk is not recurrent and $l_s \sim \lambda\,\frac{\lambda/R}{R/b} \sim \lambda^2 b/R^2$. Therefore the recurrence length,

$$l_R \sim \frac{R^2}{b}, \qquad (10)$$

separates two regimes,

$$l_s \sim \begin{cases} \lambda^2/l_R & \lambda \ll l_R \\ \lambda & \lambda \gg l_R, \end{cases} \qquad (11)$$

the first transient and the second recurrent.

Using Eqs. (3), (8), (9) and (11) the typical search time is obtained:

$$t_{\text{search}} \sim \frac{L}{l_s}\left(\frac{\lambda}{R}\right)^2 \delta \sim \begin{cases} \dfrac{L\delta}{b} & \lambda \ll l_R \\[2mm] \dfrac{L\lambda\delta}{R^2} & \lambda \gg l_R. \end{cases} \qquad (12)$$

To complete the expression one needs to evaluate $\lambda$. Its value depends on various parameters and, in particular, on the time scale which characterizes the motion of the DNA. As discussed in the introduction we consider two extreme regimes: quenched DNA and annealed DNA. In both cases $\lambda$ can be evaluated from an intermediate quantity, $p$, the probability that the protein can make a UIT from a specific location $x$ on the DNA. Since this quantity is independent of the DNA's motion we estimate it first before turning to the two regimes.

To do so, we consider a packed DNA as an ideal gas of $L/L_0$ straight rods of length $L_0$ that are distributed randomly in the cell (see Fig. 3). The probability $p_{\text{seg}}$ that two given rods cross within a distance of $R$ from each other is given by

$$p_{\text{seg}} = A\,\frac{L_0^3}{\Lambda^3}\,\frac{R^2}{L_0^2} = A\,\frac{L_0 R^2}{\Lambda^3}, \qquad (13)$$

where $A$ is a constant of order unity.
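The bookkeeping behind Eqs. (9)-(12) is compact enough to sketch in code. The Python snippet below is a minimal illustration, assuming the readings $l_R = R^2/b$, $l_s = \lambda^2/l_R$ (transient) or $l_s = \lambda$ (recurrent), and $\tau_r = (\lambda/R)^2\delta$, with every order-one prefactor set to 1 and a sharp switch at $\lambda = l_R$ in place of the smooth crossover. The function names are hypothetical, and the values $R = 3$, $b = 1$ (lattice constants) merely echo those quoted in the caption of Fig. 4.

```python
def recurrence_length(R, b):
    """Eq. (10): l_R = R**2 / b (prefactor set to 1, an assumption)."""
    return R**2 / b


def scanned_length(lam, R, b):
    """Eq. (11): length scanned per round, with a sharp switch at lam = l_R."""
    l_R = recurrence_length(R, b)
    if lam < l_R:
        return lam**2 / l_R   # transient walk: sites are skipped
    return lam                # recurrent walk: every site is visited


def round_time(lam, R, delta):
    """Eq. (9): duration of one round of correlated ITs."""
    return (lam / R) ** 2 * delta


def search_time_pure_it(L, lam, R, b, delta):
    """Eq. (12): number of rounds L / l_s times the duration of a round."""
    return (L / scanned_length(lam, R, b)) * round_time(lam, R, delta)


# Illustrative values in lattice units (assumptions for this sketch only).
L, R, b, delta = 1.0e5, 3.0, 1.0, 1.0

t_transient = search_time_pure_it(L, lam=3.0, R=R, b=b, delta=delta)
t_recurrent = search_time_pure_it(L, lam=90.0, R=R, b=b, delta=delta)

print(t_transient, L * delta / b)            # transient branch of Eq. (12)
print(t_recurrent, L * 90.0 * delta / R**2)  # recurrent branch of Eq. (12)
```

Both branches agree at $\lambda = l_R$, so the sharp switch introduces no discontinuity; the transient branch is independent of $\lambda$, as Eq. (12) requires.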
In Eq. (13), $L_0^3/\Lambda^3$ is the probability that a given segment is located within a distance $L_0$ of a point inside the cell and $R^2/L_0^2$ is proportional to the probability that this segment crosses a sphere of radius $R$ around the point. Under the conditions described above we find that typically $p_{\text{seg}} \ll 1$. Finally, to relate $p$ to $p_{\text{seg}}$ we note that to make an IT at least one segment should be accessible. This yields

$$p = 1 - \left(1 - p_{\text{seg}}\right)^{L/L_0} \simeq 1 - e^{-A L R^2/\Lambda^3}. \qquad (14)$$

Eq. (14) implies that there are two possible regimes depending on the value of $L$:

$$p = \begin{cases} A\,\dfrac{L R^2}{\Lambda^3} \ll 1 & L \ll L_c \\[2mm] 1 & L \gg L_c, \end{cases} \qquad (15)$$

where

$$L_c \sim \frac{\Lambda^3}{R^2}. \qquad (16)$$

In essence, when $L \gg L_c$ (which can occur, for example, by having a large protein) $p \simeq 1$, while for $L \ll L_c$ we have $p = A L R^2/\Lambda^3 \ll 1$. For the range of parameters of interest $L_c$ remains large even for very large proteins ($R$ of the order of tens of nm, similar to the Lac repressor). Therefore in vivo we expect a relatively large $L_c$, so the regime $L \ll L_c$ should be relevant. (In a eukaryotic cell the concentration of DNA is much higher and this statement may be wrong.)

FIG. 3: Illustrated schematically is the simplified treatment of the folded DNA. We first represent the DNA as an ideal gas of rods, each with a length of one persistence length. Then we connect the rods randomly to form a small world network (see text for details). Numerically we find the description to work well.

[...] $b$, the scanned length on the DNA during one binding event. Each step on the network takes on average a time $\delta$. During an IT the protein can move from its position, $x$, to a randomly chosen position in the interval $[x - R, x + R]$ along the chemical coordinate (correlated transfer) with probability $1 - p$ or to an uncorrelated site with probability $p$ (uncorrelated transfer). Such networks are commonly referred to as Small World Networks [33] (see Fig. 3(c)).

To find the relation between $\lambda$ and $p$ one has to consider the dynamics of the DNA. Below we consider two extreme cases: (a) a completely quenched DNA configuration and (b) a strongly fluctuating DNA, which we term annealed. A quenched DNA is static throughout the search process. An annealed DNA changes its conformation on a time scale much shorter than that of the motion of the protein.

A. Quenched DNA

In this section we derive the search time for a quenched DNA. In particular we will show that it has a non-trivial behavior as a function of $L$. In the regime that is expected to be relevant in vivo the search time is independent of the DNA's length (see Fig. 4).

For quenched DNA one expects that if a UIT can occur at point $x$ it can also happen in a region of size $R$ around it. Similar considerations apply to sites where a UIT cannot occur. The typical distance traveled by the protein along the DNA's chemical coordinate between two subsequent UITs, $L > \lambda > R$, is of the order of the typical distance between two distinct locations where a UIT can occur. This implies for $p \gg R/L$ a scaling of $\lambda$ of the form

$$\lambda \sim \frac{R}{p}, \qquad (17)$$

where $p$ is defined above (see Eq. (15)), while for $p \ll R/L$ clearly $\lambda = L$ (see Fig. 5).

FIG. 4: The search time, $t_{\text{search}}$, is plotted as a function of $L$, the DNA length, for the pure IT case with a quenched DNA configuration (axes: Log[$L$] versus Log[$t_{\text{search}}$]). The circles represent numerical data, while the solid line was obtained using Eqs. (21), (23) and (24). The three visible regimes (I: $t_{\text{search}} \sim \delta L^2/R^2$; II: $t_{\text{search}} \sim \delta \Lambda^3/R^3$; III: $t_{\text{search}} \sim \delta L/b$; separated by $L_Q^{(1)}$ and $L_Q^{(2)}$) correspond to the three in Fig. 5 (see also Fig. 9). In this plot $R$ and $b$ were taken to be 3 and 1 lattice constants, respectively (the rest of the details are found in Appendix B). The search time is shown in units of $\delta$.

FIG. 5: The schematic behaviors of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) are shown for quenched DNA and $b \ll R$.

From the previous discussion one may infer that there are three distinct behaviors as a function of $L$, shown in Fig. 5. The first regime occurs for DNA so short that a UIT cannot occur during the search. This happens when $p \ll R/L$, or equivalently when $L \ll L_Q^{(1)}$, where

$$L_Q^{(1)} = \sqrt{\frac{\Lambda^3}{R}}. \qquad (18)$$

In fact, the estimate for $L_Q^{(1)}$ pushes the limit of our treatment since the DNA is no longer densely packed in this regime.
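To make the two limits of $\lambda$ concrete, the following Python sketch combines Eq. (14), read as $p = 1 - e^{-A L R^2/\Lambda^3}$, with Eq. (17) and the short-DNA limit $\lambda = L$ by taking $\lambda = \min(L, R/p)$. This hard minimum, the choice $A = 1$, the function names and the numerical values (lattice units) are all illustrative assumptions rather than part of the original analysis.

```python
import math


def uit_probability(L, R, Lam, A=1.0):
    """Eq. (14) in the form p = 1 - exp(-A * L * R**2 / Lam**3).
    math.expm1 keeps precision when the exponent is tiny."""
    return -math.expm1(-A * L * R**2 / Lam**3)


def lambda_quenched(L, R, Lam, A=1.0):
    """Typical chemical distance between UITs for quenched DNA:
    Eq. (17), lambda ~ R / p, capped by the DNA length L (assumed hard cap)."""
    return min(L, R / uit_probability(L, R, Lam, A))


def first_crossover_quenched(R, Lam):
    """Eq. (18): DNA length below which no UIT occurs during the search."""
    return math.sqrt(Lam**3 / R)


def critical_length(R, Lam):
    """Eq. (16): L_c ~ Lam**3 / R**2, above which p is close to 1."""
    return Lam**3 / R**2


# Illustrative values in lattice units (assumptions for this sketch only).
R, Lam = 3.0, 100.0
L_Q1 = first_crossover_quenched(R, Lam)
L_c = critical_length(R, Lam)

for L in (1.0e1, 1.0e4, 1.0e7):
    print(f"L = {L:g}: p = {uit_probability(L, R, Lam):.3g}, "
          f"lambda = {lambda_quenched(L, R, Lam):.3g}")
```

For these values the three lengths probed fall below $L_Q^{(1)}$, between $L_Q^{(1)}$ and $L_c$, and above $L_c$; the sketch then returns $\lambda = L$, $\lambda \approx \Lambda^3/(LR)$ and $\lambda \approx R$, respectively.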
Even so, we find good agreement with numerical simulations in this regime.

The other regimes occur for $L \gg L_Q^{(1)}$, where one has $p \gg R/L$. In this case the protein can use UITs during the search. As discussed above there is a length scale $L_c$ separating two distinct behaviors of $p$, and therefore we have three different behaviors for $\lambda$, which are given by (see Fig. 5):

$$\lambda \sim \begin{cases} L & L \ll \sqrt{\Lambda^3/R} = L_Q^{(1)} \\[1mm] \dfrac{\Lambda^3}{L R} & L_Q^{(1)} \ll L \ll L_c \\[2mm] R & L \gg \Lambda^3/R^2 = L_c, \end{cases} \qquad (19)$$

where as before $L_c = \Lambda^3/R^2$. Furthermore, as described above, the scan between two subsequent UITs can be either recurrent ($\lambda \gg l_R$) or transient ($\lambda \ll l_R$), with a crossover length $L_Q^{(2)}$. This length scale is determined by the condition $\lambda\left(L = L_Q^{(2)}\right) \sim l_R$. In the recurrent regime the walk between two ITs doesn't skip locations on the DNA. This is in contrast to the transient regime where many sites are skipped. Thus using Eqs. (15) and (17) one finds

$$L_Q^{(2)} = \frac{\Lambda^3 b}{R^3}. \qquad (20)$$

For $L \gg L_Q^{(2)}$ the search between two subsequent UITs is short and therefore transient, while for $L \ll L_Q^{(2)}$ the search between two subsequent UITs is long and therefore recurrent.

Note that when the search is transient, $t_{\text{search}}$ is independent of $\lambda$ (see Eq. (12)). Therefore, the crossover between two distinct scaling behaviors of $t_{\text{search}}$ is governed by the smaller of the two length scales $L_c$ and $L_Q^{(2)}$. For proteins performing only ITs one expects $b$ to be smaller than $R$. It is easy to see that in such cases $L_Q^{(2)}$ is smaller than $L_c$. (Other possibilities are discussed in Sec. IV.)

To summarize, there are two length scales, $L_Q^{(1)}$ and $L_Q^{(2)}$, which separate three possible regimes (see Fig. 5).

• Regime I: $L \ll L_Q^{(1)}$. Here $\lambda \sim L$. There are no UITs and Eqs. (11) and (12) give

$$t_{\text{search}} \sim \frac{L^2}{D_{\text{eff}}} \sim \frac{L^2}{R^2}\,\delta. \qquad (21)$$

This regime is clearly not relevant in vivo (using typical values, with $\Lambda$ on the micrometer scale and $R$ on the nanometer scale, we find $L_Q^{(1)}$ on the micrometer scale, which is much shorter than typical DNA lengths).

• Regime II: $L_Q^{(1)} \ll L \ll L_Q^{(2)}$. Now the motion between two subsequent UITs is recurrent, $l_R \ll \lambda$, and Eq. (17) gives

$$\lambda \sim \frac{\Lambda^3}{L R}. \qquad (22)$$

Using Eqs. (11) and (12) we obtain

$$t_{\text{search}} \sim \frac{\Lambda^3}{R^3}\,\delta. \qquad (23)$$

Note that in this regime, as opposed to Sec. II, the search time is independent of the DNA's length. Eq. (23) is equivalent to Eq. (1) with an effective three-dimensional diffusion coefficient $D_3 \sim R^3/(r\delta)$. In contrast to the simple three-dimensional diffusive search, Eq. (23) does not depend on the target size $r$ but rather on the protein size, which may be much larger.

• Regime III: $L \gg L_Q^{(2)}$. Here $\lambda \ll l_R$, and Eqs. (11) and (12) give

$$t_{\text{search}} \sim \frac{L\delta}{b}. \qquad (24)$$

The obtained results, compared to numerics, are summarized in Fig. 4 (see also Fig. 9). One can clearly see the three regimes arising for different lengths of DNA, which are separated by $L_Q^{(1)}$ and $L_Q^{(2)}$. The details of the numerical simulation are described in Appendix B. Note that $L_Q^{(1)}$ and $L_Q^{(2)}$ are well predicted by the scaling arguments.

The most relevant regime for in vivo experiments in prokaryotic organisms is likely to be the intermediate regime (II), where the search time is independent of the DNA's length and scales as $\Lambda^3$. Comparing the search time in this regime, Eq. (23), with the minimal search time in the case when sliding and jumping are used, Eq. (7), one sees that if $\delta < R^3\sqrt{L/(\Lambda^3 D_1 D_3)}$ the search time in the pure IT scenario is in fact smaller than that of Sec. II, which includes only sliding and jumping. This is despite the fact that the protein never unbinds from the DNA.

B. Annealed DNA

In this section we consider the annealed case. As we show, here the search time also has a non-trivial dependence on $L$, which differs from that of the quenched case. In the regime that is expected to be relevant in vivo the search time scales as $\sqrt{L}$.

In the annealed case the time scale for a rearrangement of the DNA's configuration is assumed to be much smaller than the time of the protein's motion during an IT. As a result of the constant rearrangement of the DNA, UITs now occur with probability $p$ for each IT. The average number of ITs performed with no UIT is therefore of the order of $1/p$, and thus the average time that the protein spends between two subsequent UITs is $\delta/p$ ($\delta$, as before, is the typical time between two subsequent ITs). On one-dimensional length scales smaller than $\lambda$ the protein diffuses with a diffusion constant $D_{\mathrm{eff}} \sim R^2/\delta$. Therefore, the typical one-dimensional distance between two subsequent UITs, $\lambda$, is
$$\lambda \sim \sqrt{\frac{D_{\mathrm{eff}}\,\delta}{p}} \simeq \begin{cases} \sqrt{\Lambda^3/L} & L \ll L_c \\ R & L \gg L_c, \end{cases} \tag{25}$$
where $L_c$ is defined in Eq. (16). As for the quenched case, we will see that again three distinct behaviors arise with two crossover lengths.

The first crossover occurs when no UITs occur. The crossover length $L_1^A$ can be extracted using the condition $\lambda(L = L_1^A) \sim L_1^A$, which under our assumptions on the protein's size can only occur when $L \ll L_c$. This yields $L_1^A \sim \Lambda$. It is easy to see that $L_1^A \ll L_1^Q$. This means that, as expected, in the annealed case the effects of UITs become important at much smaller DNA concentration than in the quenched case. This happens because fast DNA movements increase the probability to perform an UIT. As for $L_1^Q$, the estimate for $L_1^A$ pushes the limit of our treatment since the DNA is no longer densely packed in this regime.

The second crossover length $L_2^A$ occurs when the motion between UITs becomes transient. It can therefore be estimated using $\lambda(L = L_2^A) \sim l_R$. Taking the regime $L \ll L_c$ in Eq. (25) yields
$$L_2^A \sim \frac{b^2\Lambda^3}{R^4}. \tag{26}$$
For target sizes much smaller than the protein size ($b \ll R$), it is clear that $L_2^A \ll L_c$ (see Eq. (16)). Hence, using the same arguments as before, only two length scales, $L_1^A$ and $L_2^A$, determine three possible regimes (see Fig. 6).

FIG. 6: The schematic behavior of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) is shown for annealed DNA and $b \ll R$.

The three regimes which arise are:

• Regime I: $L \ll L_1^A$. Here $\lambda \sim L$. There are no UITs and Eqs. (11) and (12) give
$$t_{\mathrm{search}} \sim \frac{L^2}{D_{\mathrm{eff}}} \sim \frac{L^2}{R^2}\,\delta. \tag{27}$$

• Regime II: $L_1^A \ll L \ll L_2^A$. Here searches between two subsequent UITs are recurrent, so that $l_R \ll \lambda$. Eq. (25) gives
$$\lambda \sim \sqrt{\frac{\Lambda^3}{L}}. \tag{28}$$
Using Eqs. (11) and (12) we obtain
$$t_{\mathrm{search}} \sim \frac{\sqrt{L\Lambda^3}}{R^2}\,\delta. \tag{29}$$
Here, in contrast to the quenched case, the intermediate result scales with the length of the DNA as $L^{1/2}$. Note that the search time is always shorter than that on a quenched DNA. This happens because the DNA's movement destroys the correlation in the motion of the protein and, therefore, increases the efficiency of the search. A similar dependence on $L$ ($t_{\mathrm{search}} \propto \sqrt{L}$) was obtained for a different model [11]. There, however, the origin of the dependence is different, and is linked to modeling the DNA's motion as diffusion of an ideal gas of rods.

• Regime III: $L \gg L_2^A$. Here $\lambda \ll l_R$. Therefore, Eqs. (11) and (12) give
$$t_{\mathrm{search}} \sim \frac{L\delta}{b}. \tag{30}$$

The obtained results are summarized later in Fig. 9.

The most relevant regime for in vivo experiments in prokaryotic organisms is likely to be the intermediate one (II), where the search time scales as $L^{1/2}$ or alternatively as $\Lambda^{3/2}$. Comparing the search time in this regime, Eq. (29), with the minimal search time in the case when sliding and jumping are used, Eq. (7), one may see that if $\delta < R^2/\sqrt{D_1 D_3}$ the search time in the pure IT case is smaller than the one in the sliding and jumping case. This is despite the fact that the protein never unbinds from the DNA.

Numerical simulation of the annealed case requires dynamical moves for the whole DNA molecule. For DNAs of reasonable length this is a formidable task, which is beyond the scope of this paper.

IV. INTERSEGMENTAL TRANSFER AND SLIDING

Next we consider a protein that can perform both ITs and sliding. Namely, in addition to ITs the protein can perform one-dimensional diffusion with only one binding domain bound (see Fig. 1(a)). In the language of Sec. III, $b$ is now the typical sliding length between two subsequent ITs. Now each step (distinct from a round defined above), defined as the interval between the ends of two subsequent ITs, takes a typical time $\delta = b^2/D_1 + \tau_{IT}$, where $D_1$ is the one-dimensional diffusion coefficient of the protein with only one binding domain bound and $\tau_{IT}$ is the typical time that the protein is bound to two DNA segments. (The one-dimensional diffusion on length scales larger than $b$ has a different effective diffusion coefficient due to the possibility of a CIT. Thus, to measure $D_1$ on large length scales one should not allow for ITs. This may be done, for example, by measuring the motion of the part of the protein that contains only one binding domain [19, 20].)

For $b \ll R$ it is straightforward to see that the results of Sec. III hold with a redefined $\delta$. However, in general the sliding length $b$ might be much larger than the protein's size $R$. This is the regime that we focus on in this section.

Clearly, now the search between two subsequent UITs is always recurrent, so that $l_s \sim \lambda$. Here, as before, $\lambda$ is the typical distance traveled by the protein between two subsequent UITs. However, now $D_{\mathrm{eff}} \sim b^2/\delta$, where as above $\delta = b^2/D_1 + \tau_{IT}$. The search time as a function of $\lambda$, similar to Eq. (12), becomes
$$t_{\mathrm{search}} \sim \frac{L}{\lambda}\,\frac{\lambda^2}{D_{\mathrm{eff}}} \sim \frac{L\lambda}{b^2}\,\delta. \tag{31}$$
The value of $\lambda$, as in the previous section, depends on the dynamics of the DNA molecule. Again we consider two extreme cases: (a) quenched DNA and (b) annealed DNA.

A. Quenched DNA

To obtain $\lambda$ we first introduce a new quantity, $\tilde{\lambda}$, defined as the typical chemical distance between two locations in which the protein can perform an UIT. Note that we are interested in the regime $b \gg R$. Therefore the values of $\tilde{\lambda}$ and $\lambda$ may be distinct, since an UIT is not necessarily performed at every possible location on the DNA. Clearly, however, the functional behavior of $\tilde{\lambda}$ is identical to that of $\lambda$ in the previous section. This yields (see Eq. (19))
$$\tilde{\lambda} \sim \begin{cases} L & L \ll \sqrt{\Lambda^3/R} = L_1^Q \\ \Lambda^3/(LR) & L_1^Q \ll L \ll L_c \\ R & L \gg \Lambda^3/R^2 = L_c, \end{cases} \tag{32}$$
where we have used the definitions of $L_1^Q$ and $L_c$ of the previous section.

Similar to the derivation of Eq. (10), when $\tilde{\lambda}/b \gg b/R$, the effective random walk of the protein along a length $\tilde{\lambda}$ is recurrent. Here recurrent motion implies that sites where an UIT can occur are visited many times before a neighboring site where an UIT can occur is met (note that this is distinct from the recurrent behavior of Sec. III). In the recurrent regime a location of a possible UIT is visited many times and therefore not missed. In this case $\lambda \sim \tilde{\lambda}$. In the opposite, transient regime (again distinct in meaning from that used in Sec. III), $\tilde{\lambda}/b \ll b/R$ and the protein performs an UIT only after it travels a distance $\lambda \gg \tilde{\lambda}$. In the latter regime each IT has a probability $R/\tilde{\lambda}$ to be an UIT. Therefore between two subsequent UITs the protein performs $\tilde{\lambda}/R$ ITs. Using the diffusive nature of the motion we find $\lambda \sim b\sqrt{\tilde{\lambda}/R}$. The value of $\lambda$ as a function of $\tilde{\lambda}$ is shown schematically in Fig. 7.

FIG. 7: The schematic behavior of $\lambda$ and $l_s$ as a function of $L$ (on a log-log scale) is shown for quenched DNA, $L_1^Q \gg b^2/R$ and $b \gg R$.

Combining the three regimes of $\tilde{\lambda}$ with the above-mentioned crossover from $\lambda \sim \tilde{\lambda}$ to $\lambda \sim b\sqrt{\tilde{\lambda}/R}$ (which occurs at $L = \Lambda^3/b^2$) one finds, using $b/R \gg 1$, four regimes for the search time:

• Regime I occurs for $L \ll L_1^Q$, corresponding to $\lambda \sim \tilde{\lambda} = L$ in Eq. (32). Using Eq. (31) gives
$$t_{\mathrm{search}} \sim \frac{L^2}{b^2}\,\delta. \tag{33}$$

• Regime II occurs for $\Lambda^3/b^2 \gg L \gg L_1^Q$ and $\lambda \sim \tilde{\lambda} \sim \Lambda^3/(LR)$. Using Eq. (31) yields
$$t_{\mathrm{search}} \sim \frac{\Lambda^3}{b^2 R}\,\delta. \tag{34}$$

• Regime III occurs for $\Lambda^3/b^2 \ll L \ll L_c$. Now $\lambda \sim b\sqrt{\tilde{\lambda}/R} \sim b\sqrt{\Lambda^3/(LR^2)}$. Using Eq. (31) we find
$$t_{\mathrm{search}} \sim \frac{\sqrt{\Lambda^3 L}}{bR}\,\delta. \tag{35}$$

• Regime IV occurs for $L \gg L_c$. Here $\lambda \sim b\sqrt{\tilde{\lambda}/R} \sim b$ and with Eq. (31) one gets
$$t_{\mathrm{search}} \sim \frac{L}{b}\,\delta. \tag{36}$$
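The four regimes above follow mechanically from Eq. (32), the recurrent/transient criterion and Eq. (31). The following is a minimal sketch of that bookkeeping, with all prefactors set to one; it is not the simulation code of Appendix B. The function names, the clamping of $\lambda$ to at most $L$, and the value $\Lambda = 800$ used in the sweep are assumptions made here for illustration ($R = 1$ and $b = 20$ are the lattice values quoted for Fig. 8).

```python
import math

def lambda_tilde(L, Lam, R):
    """Chemical distance between possible UIT sites, Eq. (32), prefactors set to 1."""
    L_q1 = math.sqrt(Lam**3 / R)   # first quenched crossover length
    L_c = Lam**3 / R**2            # above this length almost every IT is a UIT
    if L < L_q1:
        return L
    if L < L_c:
        return Lam**3 / (L * R)
    return R

def search_time_quenched_sliding(L, Lam, R, b, delta):
    """Scaling estimate of the search time for ITs plus long sliding (b >> R)."""
    lt = lambda_tilde(L, Lam, R)
    if lt / b > b / R:
        lam = lt                      # recurrent: every possible UIT site is used
    else:
        lam = b * math.sqrt(lt / R)   # transient: diffusive estimate
    lam = min(lam, L)                 # assumption: lambda cannot exceed the DNA length
    return L * lam * delta / b**2     # Eq. (31)

# Illustrative sweep over DNA lengths (lattice units, delta = 1)
for L in (1e3, 1e5, 1e7, 1e10):
    print(L, search_time_quenched_sliding(L, Lam=800.0, R=1.0, b=20.0, delta=1.0))
```

With these illustrative values the four lengths in the sweep fall in Regimes I to IV respectively, so the printed times can be compared directly with Eqs. (33) to (36).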

FIG. 8: $t_{\mathrm{search}}$ is plotted as a function of $L$, the DNA length, for a model with IT and sliding for a quenched DNA. The thin line with dots represents numerical data, while the bold solid line was obtained using Eqs. (33), (34), (35) and (36) (see also Fig. 9). In this plot $R$ and $b$ were taken to be 1 and 20 lattice constants respectively (the rest of the details can be found in Appendix B). The search time is shown in units of $\delta$.

Fig. 8 shows a comparison between the four theoretically predicted regimes and the numerical simulation of the model. Three regimes are reproduced by the numerics, while the fourth was not reproduced due to computational limitations.

For moderate values of $\tau_{IT}$ one may see that long sliding may drastically decrease the efficiency of the search. This occurs because long sliding prevents both UITs, which destroy correlations in the search process, and CITs, which increase the effective one-dimensional diffusion constant.

B. Annealed DNA

Here, using the arguments presented in Sec. III B, the average number of steps performed between two subsequent UITs is of the order of $1/p$, where $p$ is given in Eq. (15). This implies a typical time between subsequent UITs of the order of $\delta/p$. Using the fact that along the DNA the motion of the protein is diffusive with an effective diffusion constant $D_{\mathrm{eff}} \sim b^2/\delta$ one finds
$$\lambda \sim \sqrt{\frac{D_{\mathrm{eff}}\,\delta}{p}}. \tag{37}$$
Clearly, $\lambda$ can only take values in the range $b \le \lambda \le L$. These bounds, together with the possible values of $p$ (see Eq. (15)), define the borders of the following three regimes:

• Regime I occurs for $\lambda \sim L$. Using Eq. (37) and $p = LR^2/\Lambda^3$ it can be verified that this regime occurs when $L \ll \Lambda\,(b/R)^{2/3}$. In this case no UITs occur during the search and Eq. (31) gives
$$t_{\mathrm{search}} \sim \frac{L^2}{b^2}\,\delta. \tag{38}$$

• Regime II occurs when $\Lambda\,(b/R)^{2/3} \ll L \ll L_c$. Using Eq. (37) and $p = LR^2/\Lambda^3$ one finds that in this case $\lambda \sim b\sqrt{\Lambda^3/(LR^2)}$. Using Eq. (31) gives
$$t_{\mathrm{search}} \sim \frac{\sqrt{\Lambda^3 L}}{bR}\,\delta. \tag{39}$$

• Regime III occurs when $L \gg L_c$ and almost all ITs are UITs. Here $\lambda \sim b$ and $p \simeq 1$, so that Eq. (31) gives
$$t_{\mathrm{search}} \sim \frac{L}{b}\,\delta. \tag{40}$$

The obtained results are summarized in Fig. 9.

One may see that in the case of long sliding, rapid DNA motion cannot decrease the search time as significantly as in the pure IT case. This is because long sliding prevents fast decay of correlations.
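The three annealed regimes can be reproduced from Eq. (37), the bounds $b \le \lambda \le L$ and Eq. (31) alone. The sketch below does this with all prefactors set to one. It assumes, as a crude interpolation of Eq. (15), that $p = \min(1, LR^2/\Lambda^3)$; the function names and the numerical values are illustrative choices made here, not taken from the paper.

```python
import math

def uit_probability(L, Lam, R):
    """Probability that an IT is uncorrelated; assumed form min(1, L R^2 / Lam^3)."""
    return min(1.0, L * R**2 / Lam**3)

def search_time_annealed_sliding(L, Lam, R, b, delta):
    """Scaling estimate for annealed DNA with ITs and long sliding (b >> R)."""
    p = uit_probability(L, Lam, R)
    D_eff = b**2 / delta
    lam = math.sqrt(D_eff * delta / p)   # Eq. (37)
    lam = max(b, min(lam, L))            # enforce b <= lambda <= L
    return L * lam * delta / b**2        # Eq. (31)

# Illustrative sweep (lattice units, delta = 1): one length per regime
for L in (1e3, 1e6, 1e10):
    print(L, search_time_annealed_sliding(L, Lam=800.0, R=1.0, b=20.0, delta=1.0))
```

For these values the crossover lengths are $\Lambda (b/R)^{2/3} \approx 5.9 \times 10^3$ and $L_c \approx 5.1 \times 10^8$, so the three lengths probe Regimes I, II and III in turn.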

C. Motion with no CIT

Here we consider a case where the structure of the protein causes it to prefer UITs over CITs. This may occur, for example, in cases where the "legs" of the protein are antiparallel and rigid. The motion on length scales smaller than $\lambda$ is then diffusive, involving only sliding with a diffusion coefficient $D_1$. In this case, clearly, $l_s = \lambda$ and the time between two subsequent UITs is given by $\lambda^2/D_1 + \tau_{IT}$, where $\tau_{IT}$ is the time of an UIT. One finds, similar to Sec. II,
$$t_{\mathrm{search}} \sim \frac{L}{\lambda}\left(\frac{\lambda^2}{D_1} + \tau_{IT}\right). \tag{41}$$
The relationship between this and the picture of Sec. II is given by identifying the antenna's length $l$ with $\lambda$ and the three-dimensional diffusion time $\tau_3$ with $\tau_{IT}$.

FIG. 9: In this figure the schematic behavior of $t_{\mathrm{search}}$ as a function of the DNA length $L$ is shown in the absence of jumps, for annealed and quenched DNA. (a) shows short sliding results ($b \ll R$). (b) shows long sliding results ($b \gg R$).

Most of the results of Secs. III and IV are summarized in Fig. 9. The results of this section indicate that ITs may supply reasonable search times if they are quick enough. Combining IT with sliding we see that even rare UIT events may break correlations created by one-dimensional diffusion. In this sense ITs act as jumps without the need for detachment from the DNA. Besides this, CITs may effectively accelerate the one-dimensional diffusion or even replace it altogether.

V. INTERSEGMENTAL TRANSFER AND JUMPING

We now turn to consider the effect of jumping on the results described above. Before addressing the full problem, including ITs, sliding and jumping, we first consider a model in which only ITs and jumps occur, and ignore sliding. To include jumping we assign a probability $dt/\tau_1$ for a protein to detach from the DNA during a time interval $dt$. The unbinding initiates a jump in which the protein uses three-dimensional diffusion to rebind at a new location on the DNA. Note that since there is no sliding it is safe to assume $b \ll R$.

As argued in the previous section, it is reasonable that both UITs and jumps move the protein to a new location which is chosen randomly on the DNA. Therefore, the search process is composed of a series of one-dimensional scans (occurring through CITs) of the DNA interrupted by uncorrelated relocations. The uncorrelated relocations can occur through two independent processes: jumps and UITs. The typical search time can be evaluated using an approach identical to that of the previous sections.

First, we need to estimate the typical time $\tau_1^{\mathrm{eff}}$ between two uncorrelated relocations. Combining the previously derived typical time between two subsequent UITs, $\lambda^2/D_{\mathrm{eff}}$, and the typical time between jumps, $\tau_1$, we obtain
$$\tau_1^{\mathrm{eff}} \simeq \frac{1}{\dfrac{D_{\mathrm{eff}}}{l^2} + \dfrac{D_{\mathrm{eff}}}{\lambda^2}}, \tag{42}$$
where $\lambda$, defined before, is the typical distance that the protein travels between two subsequent UITs and we define an antenna length $l = \sqrt{D_{\mathrm{eff}}\tau_1}$. This expression is exact in the annealed case but it is only an approximation in the quenched regime. However, the error does not exceed 50% (see Appendix D 1 for details).

Here and in the next section we focus on the search time as a function of $l$. This quantity is influenced by the protein-DNA non-specific binding energy and governs the frequency of jumps. Other parameters that do not depend on $l$, such as $\lambda$, are taken as fixed. The value of $\lambda$ relevant for the discussion here is given in Sec. III, where $b \ll R$. Note that when it is incorporated in the results below the resulting behavior is very complicated. While this is easy to obtain, we skip all the regimes and focus on the important qualitative behavior.

To proceed we note that the typical distance between two uncorrelated relocation events is
$$l_{\mathrm{eff}} = \sqrt{D_{\mathrm{eff}}\tau_1^{\mathrm{eff}}} \simeq l\,\sqrt{\frac{1}{1 + l^2/\lambda^2}}. \tag{43}$$
As expected, and seen in Eqs. (42) and (43), the relative importance of both mechanisms is controlled by the ratio $l/\lambda$. In the case of $l/\lambda \gg 1$ the relocations are dominated by UITs, while for $l/\lambda \ll 1$ they are dominated by jumps. The typical duration of an uncorrelated relocation, $\tau_3^{\mathrm{eff}}$, is the average of the time of a jump, $\tau_3$, and the time of an IT, weighed with the probability of performing each. This gives
$$\tau_3^{\mathrm{eff}} = \tau_3\,\frac{\tau_1^{\mathrm{eff}}}{\tau_1} + \delta\left(1 - \frac{\tau_1^{\mathrm{eff}}}{\tau_1}\right) = \tau_3\,\frac{l_{\mathrm{eff}}^2}{l^2} + \delta\left(1 - \frac{l_{\mathrm{eff}}^2}{l^2}\right) \simeq \frac{\tau_3 + \delta\, l^2/\lambda^2}{1 + l^2/\lambda^2}, \tag{44}$$
where $\frac{1/\tau_1}{1/\tau_1^{\mathrm{eff}}}$ is the probability of a jump, $1 - \frac{1/\tau_1}{1/\tau_1^{\mathrm{eff}}} = \frac{D_{\mathrm{eff}}/\lambda^2}{1/\tau_1^{\mathrm{eff}}}$ is the probability of an UIT and $\delta$, defined above, is the time of an IT (see Appendix D 2 for a more detailed derivation).

The total search time, as before, takes the form of Eq. (3). Now, each search round is defined as the interval between two subsequent uncorrelated relocations. The total time of one round is $\tau_r \sim \tau_1^{\mathrm{eff}} + \tau_3^{\mathrm{eff}}$, and therefore the search time is given by
$$t_{\mathrm{search}} \sim N_r\tau_r \sim \frac{L}{l_s}\left(\tau_1^{\mathrm{eff}} + \tau_3^{\mathrm{eff}}\right). \tag{45}$$
Here $l_s$ is the length scanned between two subsequent uncorrelated relocations. In the case discussed here $b \ll R$, and the value of $l_s$ depends on the properties of the search between two uncorrelated relocations, namely the ratio of $l_{\mathrm{eff}}$ and $l_R$, the recurrence length (see Eq. (10) and the relevant discussion). If $l_{\mathrm{eff}} \gg l_R$ the search between two subsequent jumps is recurrent and $l_s \sim l_{\mathrm{eff}}$. However, in the opposite regime, $l_{\mathrm{eff}} \ll l_R$, $l_s \sim l_{\mathrm{eff}}^2/l_R$.

Therefore, for a given $\lambda$ there are two regimes (see Figs. 10 and 11):

• Regime I ($l_R \ll l_{\mathrm{eff}}$): In this regime, using Eq. (45), the total search time is
$$t_{\mathrm{search}} \sim N_r\tau_r \sim \frac{L}{l_{\mathrm{eff}}}\left(\tau_1^{\mathrm{eff}} + \tau_3^{\mathrm{eff}}\right) \sim \frac{L}{l}\,\frac{\dfrac{l^2}{D_{\mathrm{eff}}} + \tau_3 + \delta\,\dfrac{l^2}{\lambda^2}}{\sqrt{1 + l^2/\lambda^2}} \simeq \frac{L}{l}\,\frac{\dfrac{l^2}{D_{\mathrm{eff}}} + \tau_3}{\sqrt{1 + l^2/\lambda^2}}, \tag{46}$$
where we used $\lambda \gg R$.

Comparing with Eq. (3) we note that here we have both an effective diffusion constant and an extra enhancement factor given by $\left(1 + l^2/\lambda^2\right)^{-1/2}$. As we now show, this factor has important consequences.

Consider the value of $\tau_1 = l^2/D_{\mathrm{eff}}$ for which a minimal search time is obtained and compare it with the usual paradigm of $\left(\tau_1^{\mathrm{opt}}\right)_0 = \tau_3$.
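The dependence of the Regime I search time on the antenna length can also be explored numerically. The sketch below evaluates Eq. (46), read here as $t_{\mathrm{search}} \sim (L/l)\,(l^2/D_{\mathrm{eff}} + \tau_3)/\sqrt{1 + l^2/\lambda^2}$ with the prefactor set to one, and locates its minimum on a logarithmic grid of $l$. The function names, grid bounds and parameter values are illustrative assumptions, and the condition $l_R \ll l_{\mathrm{eff}}$ that defines Regime I is not checked.

```python
import math

def search_time_it_jumps(l, L, D_eff, tau3, lam):
    """Eq. (46) as read here: (L/l) (l^2/D_eff + tau3) / sqrt(1 + l^2/lam^2)."""
    return (L / l) * (l**2 / D_eff + tau3) / math.sqrt(1.0 + (l / lam) ** 2)

def best_antenna_length(L, D_eff, tau3, lam, l_min=1e-2, l_max=1e6, n=4000):
    """Grid search (log-spaced) for the antenna length that minimizes Eq. (46)."""
    ratio = l_max / l_min
    grid = [l_min * ratio ** (i / (n - 1)) for i in range(n)]
    return min(grid, key=lambda l: search_time_it_jumps(l, L, D_eff, tau3, lam))

# Illustrative values: D_eff = 1, lambda = 100, increasing jump time tau3
for tau3 in (1e2, 1e3, 1e4):
    print(tau3, best_antenna_length(L=1e6, D_eff=1.0, tau3=tau3, lam=100.0))
```

For the two smaller jump times the minimum sits at a finite $l$ that grows with $\tau_3$; for the largest one it runs to the upper end of the grid, meaning that within this estimate the search is fastest when the protein essentially never detaches.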
Due to the enhancement factor we now find
$$\tau_1^{\mathrm{opt}} = \frac{\left(\tau_1^{\mathrm{opt}}\right)_0}{1 - \dfrac{2D_{\mathrm{eff}}\tau_3}{\lambda^2}}, \tag{47}$$
where $\left(\tau_1^{\mathrm{opt}}\right)_0 = \tau_3$ (see Eq. (6)) corresponds to the optimal antenna size in the absence of ITs ($\lambda \to \infty$) (see Sec. II). It is interesting to note that $l^{\mathrm{opt}}$ approaches infinity when $\tau_3$ is larger than a critical value
$$\tau_c = \frac{\lambda^2}{2D_{\mathrm{eff}}}. \tag{48}$$
Hence, the minimal search time for $\tau_3 \ge \tau_c$ is identical to that with no jumps (see Sec. III). It is important to note that $\tau_c$ depends, as expected, on the time of an IT through $D_{\mathrm{eff}}$. In the case when $\tau_3 \le \tau_c$, Eqs. (46) and (47) give
$$t_{\mathrm{search}}^{\mathrm{opt}} \sim 2L\,\sqrt{\frac{\tau_3}{D_{\mathrm{eff}}}}\,\sqrt{1 - \frac{\tau_3}{2\tau_c}}. \tag{49}$$
In this regime $t_{\mathrm{search}}^{\mathrm{opt}}$ is monotonically increasing in $\tau_3$.

In Fig. 12 we show a comparison between the results of numerical simulation and Eq. (46).

• Regime II ($l_{\mathrm{eff}} \ll l_R$): In this case $l_s \sim l_{\mathrm{eff}}^2/l_R$ and Eq. (45) yields
$$t_{\mathrm{search}} \sim \frac{L\,l_R}{l_{\mathrm{eff}}^2}\left(\tau_1^{\mathrm{eff}} + \tau_3^{\mathrm{eff}}\right) \sim \frac{L\,l_R}{l^2}\left(\frac{l^2}{D_{\mathrm{eff}}} + \tau_3\right). \tag{50}$$
Interestingly, in this regime the minimal search time is obtained when $\tau_1$ diverges. This means that jumping only makes the search slower in this case. We note that some care needs to be taken with this limit, since if $\lambda > l_R$ and the value of $l$ exceeds $l_R$ the regime $l_{\mathrm{eff}} \ll l_R$ transforms into Regime I.

FIG. 10: Possible regimes as a function of $l$ and $\lambda$ are shown in the case of ITs and jumping (or IT, jumping and sliding with $b \ll R$) for $l_R \ll \sqrt{D_{\mathrm{eff}}\tau_3}$. The gray (white) area represents regime I (II). The dashed line represents the optimal antenna length. The optimal antenna length in the absence of IT is equal to $\sqrt{D_1\tau_3}$.

FIG. 11: Possible regimes as a function of $l$ and $\lambda$ are shown in the case of ITs and jumping (or IT, jumping and sliding with $b \ll R$) for $l_R \gg \sqrt{D_{\mathrm{eff}}\tau_3}$. The gray (white) area represents regime I (II). The dashed line represents the optimal antenna length. The optimal antenna length in the absence of IT is equal to $\sqrt{D_1\tau_3}$.

The results of this section highlight several interesting features which will also appear in the more general case, where sliding is also allowed. First, we note that in the limit of very strong protein-DNA affinity (large values of $\tau_1$) the search time becomes robust to changes in the value of $\tau_1$. This is very different from a search process with no ITs (see Eqs. (46), (50) and Fig. 12), and may give a possible explanation for the difference between in vitro experiments on the Lac repressor [7], where a strong dependence of the search time on ionic strength (and therefore on the protein-DNA affinity) was found, and an in vivo experiment [32], which found that the efficiency of the repression by the same protein is very robust to changes in the ionic strength.

Furthermore, by examining the optimal search time, we find that beyond some critical value of $\tau_3$ jumps increase the search time (see Fig. 12 for demonstration). This may give a possible explanation of the values of $\tau_1$ obtained in vitro [19] and in vivo [20] for the Lac repressor. These are much larger than the optimal $\tau_1$ predicted by models that do not include ITs.

In Fig. 12 a comparison between Eq. (46) and numerical simulation is shown. One may see that increasing the value of $\tau_3$ increases the optimal value of $l$ (or equivalently $\tau_1$) in such a way that above some critical value, predicted by Eq. (48), it becomes infinite.

VI. INTERSEGMENTAL TRANSFER, SLIDING AND JUMPING

With the results of the previous section it is straightforward to consider the general case where ITs, sliding and jumping are allowed. Similar to the previous section we show that jumping may slow the search process significantly. However, ITs make the search process much more robust to variations in parameters.

First consider the case $b \ll R$, where sliding events are very short. Clearly, in this case the results of the previous section hold with $\delta = b^2/D_1 + \tau_{IT}$. Here, as in Sec. IV, $D_1$ is the one-dimensional diffusion coefficient for sliding and $\tau_{IT}$ is the typical time that the protein is bound to two DNA segments. With this in mind, in this section we discuss only the opposite case of $b \gg R$. Here, as in Sec. V, the parameters that do not depend on $l$, such as $\lambda$, are taken as given. Sec. IV contains the relevant derivation of $\lambda$ for the case discussed here of long sliding, $b \gg R$.

FIG. 12: The influence of ITs on the search time is shown. The search time, $t_{\mathrm{search}}$, is plotted as a function of the antenna length, $l$, for different values of $\tau_3$ (140 , , , , , $\delta$ from bottom up). Here only ITs and jumping are allowed. Thin solid lines represent the numerical results. The bold solid lines represent analytic results (Eq. (46)). The black, dashed lines represent the search time in the case with no ITs, obtained by using Eq. (4) with the effective diffusion constant $D_{\mathrm{eff}} = R^2/\delta$ instead of $D_1$. Here $L$, $R$ and $b$ were taken to be 1224000, 1 and 1 lattice constants respectively. Since $R = b = 1$, diffusion through sliding is identical to diffusion through CITs. This allows us to directly compare sliding and jumping with ITs and jumping.

As shown in Sec. IV, in this case $D_{\mathrm{eff}} \sim b^2/\delta$ with $\delta = b^2/D_1 + \tau_{IT}$. Following Sec. V we first need $\tau_3^{\mathrm{eff}}$, the typical time of an uncorrelated relocation. This is given by (see the derivation of Eq. (44) and Appendix D 1)
$$\tau_3^{\mathrm{eff}} = \frac{\tau_3 + \tau_{IT}\, l^2/\lambda^2}{1 + l^2/\lambda^2}. \tag{51}$$
Note that here, since $b \gg R$, the search between two subsequent uncorrelated relocations is always recurrent and therefore $l_s \sim l_{\mathrm{eff}}$. Therefore, similar to Sec. V, the search time is given by
$$t_{\mathrm{search}} \sim \frac{L}{l_{\mathrm{eff}}}\left(\tau_1^{\mathrm{eff}} + \tau_3^{\mathrm{eff}}\right) \sim \frac{L}{l_{\mathrm{eff}}}\left(\frac{l_{\mathrm{eff}}^2}{D_{\mathrm{eff}}} + \frac{\tau_3 + \tau_{IT}\, l^2/\lambda^2}{1 + l^2/\lambda^2}\right). \tag{52}$$
Using Eqs. (43) and (52), the total search time can be written as
$$t_{\mathrm{search}} \sim \frac{L}{l\,\sqrt{1 + l^2/\lambda^2}}\left(\frac{l^2}{D_{\mathrm{eff}}} + \tau_{IT}\,\frac{l^2}{\lambda^2} + \tau_3\right). \tag{53}$$
Again, it is interesting to consider the optimal value of $\tau_1$,
$$\tau_1^{\mathrm{opt}} = \frac{\left(\tau_1^{\mathrm{opt}}\right)_0}{1 - \dfrac{2\tau_3 - \tau_{IT}}{\lambda^2/D_{\mathrm{eff}}}}, \tag{54}$$
where $\left(\tau_1^{\mathrm{opt}}\right)_0 = \tau_3$ (see Eq. (6)) corresponds to the optimal antenna size in the absence of ITs ($\lambda \to \infty$).

Interestingly, Eq. (54) shows that the optimal $\tau_1^{\mathrm{opt}}$ may be either smaller or larger than $\left(\tau_1^{\mathrm{opt}}\right)_0$, depending on the time of an IT, $\tau_{IT}$. It is also noteworthy that when $2\tau_3 > \lambda^2/D_{\mathrm{eff}} + \tau_{IT}$ the optimal $\tau_1$ value becomes infinite. Namely, jumping makes the search process slower. This is similar to the behavior found in Sec. V, and again the critical value of $\tau_3$ depends on microscopic quantities such as the time of an IT.

The minimal search time obtained is
$$t_{\mathrm{search}}^{\mathrm{opt}} \sim \begin{cases} \dfrac{2L}{\lambda}\sqrt{\tau_3}\,\sqrt{\lambda^2/D_{\mathrm{eff}} + \tau_{IT} - \tau_3} & 2\tau_3 < \lambda^2/D_{\mathrm{eff}} + \tau_{IT} \\[2ex] L\left(\dfrac{\lambda}{D_{\mathrm{eff}}} + \dfrac{\tau_{IT}}{\lambda}\right) & 2\tau_3 > \lambda^2/D_{\mathrm{eff}} + \tau_{IT}. \end{cases} \tag{55}$$
We stress again that jumping may slow the search considerably. Note that again the optimal value of $\tau_1$ is very different from the canonical one discussed in Sec. II.

Fig. 13 shows a comparison between the theoretically predicted search time (Eq. (53)) and numerical simulation.

VII. APPLICATION TO THE LAC REPRESSOR

The above results cover a very wide variety of regimes. For a given protein only several are of interest. To illustrate the use of the results presented above we consider the Lac repressor. The Lac repressor is the most studied DNA-binding protein (see [34] for a review), and its structure is highly suggestive of intersegmental transfers taking place. Despite this, several physical parameters of the protein are still unknown. In this subsection we use the known parameters: R ∼ nm [35], Λ ∼ µ , L ∼ mm , and those measured for the Lac repressor with only one DNA-binding domain τ ∼ ms , τ ∼ . τ and D ∼ . µ /s [19, 20]. Still unknown are $b$, the sliding length, and $\tau_{IT}$, which we use as free parameters, studying the search time as these are varied. It is interesting to note that the Lac repressor is so large that, as we show, essentially all ITs can move the protein at each step to a completely uncorrelated location on the DNA.

FIG. 13: The influence of ITs on the search time is shown. The search time, $t_{\mathrm{search}}$, is plotted vs. the antenna length, $l$, for different values of $\tau_3$ (10 , , , , $\delta$ from bottom up). Here ITs, jumping and sliding are allowed. Thin solid lines with dots represent the numerical results. The bold solid lines represent analytic results (Eq. (53)). The black, dashed lines represent the search time in the case with no ITs, obtained by using Eq. (4) with $D_{\mathrm{eff}} = R^2/\delta$. Here $L$, $R$ and $b$ were taken to be 1224000, 1 and 20 lattice constants respectively.

Fig. 14 shows the predicted $t_{\mathrm{search}}$ from Secs. V and VI as a function of $b$ and $\tau_{IT}$. One may see that for $b \gg R$, ITs do not affect the search time significantly even if $\tau_{IT}$ is small. This results from the small probability of performing an UIT for large values of $b$. However, if $b \ll R$ the search time may be decreased in a significant manner by including ITs. For example, by setting $b$ to be the size of one base pair ∼ . nm the search time decreases by a factor of three when τ IT = τ and if τ IT = τ the search time decreases by a factor of ten. Finally, Fig. 14 shows that for large values of $\tau_{IT}$, ITs may slow down the search process.

FIG. 14: In this figure the analytical prediction of $t_{\mathrm{search}}$ is shown as a function of the unknown parameters $b$ and $\tau_{IT}$.

VIII. SUMMARY

In this article we presented a comprehensive study of the influence of ITs on the search process. Using simple scaling arguments we studied a model which includes the protein dynamics and DNA conformation. Two extreme regimes for the DNA dynamics were studied: completely quenched (frozen) and annealed (rapidly moving) DNA. ITs were assumed to relocate the protein to a randomly chosen DNA position within a range of the order of the protein size. The essence of the description may be understood from Sec. III. The following sections elaborate on it and study search processes based on ITs with sliding and/or jumping. The results for a particular protein of interest may be obtained by selecting the section most relevant for that case.

The obtained results clearly indicate that including IT in the search process may increase the robustness of the search efficiency to different parameters of the model, such as the protein-DNA affinity, the three-dimensional diffusion coefficient, etc.

The mechanism of IT may produce a significant increase of the optimal residence time of the protein on the DNA between two subsequent rounds of three-dimensional diffusion from the value predicted by the models that do not include IT. Recent experiments indicate that the value of the residence time of the proteins on the DNA between two subsequent rounds of three-dimensional diffusion is much larger than the optimum predicted by such models. It is possible that the existence of the IT mechanism may explain the rather quick search times found in in vivo experiments.

One of the most surprising results is that above some critical value of the typical time of a jump the protein has no reason to detach from the DNA. It is more efficient for it to stay bound to the DNA. The value of the critical jump time depends on the time of an IT.

A key ingredient needed for the behavior to occur is the confinement of the DNA in a volume much smaller than its radius of gyration. The probability to perform an UIT obviously depends on the DNA density. Larger density implies a larger probability for UITs. Therefore the effects of IT are expected to be more important in systems with high DNA density, such as cells or eukaryotic nuclei, rather than in in vitro experiments.

The dependency, mentioned above, on the DNA density leads to many possible regimes which depend on the cell size, DNA length, etc. In particular, we found non-trivial regimes in which the search time increases as the square root of the DNA length or is completely independent of it. Our estimates indicate that these seem to be the ones most relevant to experiments.

Our results also show that the search on quenched and annealed DNA may have quite different scaling behavior. In general a search that uses ITs is shown to be more rapid on an annealed DNA than on a quenched DNA. This happens due to the rapid decrease in correlations which results from the motion of the DNA molecule.

Similar scaling arguments were used to discuss the effects of IT in [11]. However, there the main mechanism that drives the IT was assumed to be the motion of the DNA molecule. In our study ITs are shown to be important even on completely quenched DNA.

Acknowledgments

We thank R. Voituriez and R. Metzler for discussions and D. Levine for discussions and comments on the manuscript. The Israel Science Foundation is acknowledged for financial support.

APPENDIX A:

In this appendix we argue that the typical time that the protein spends in a jump is given by $\tau_3 \sim \Lambda^3/(D_3 L)$. This quantity is controlled by the average volume which is free from DNA. Consider, first, the probability to find a volume of radius $s$ which is free from DNA. To do so we describe the packed DNA as an ideal gas of $L/L_0$ straight rods of length $L_0$ that are distributed randomly in the cell (see Fig. 3). The probability $p_{\mathrm{seg}}$ that a given rod crosses a volume of radius $s$ is of the order of $\frac{L_0^3}{\Lambda^3}\,\frac{s^2}{L_0^2} = \frac{L_0 s^2}{\Lambda^3}$. Here $L_0^3/\Lambda^3$ is the probability that a given segment is located within a distance $L_0$ of a point inside the cell and $\sim s^2/L_0^2$ is the probability that this segment crosses a sphere of radius $s$ around the point. The probability that at least one segment crosses the void is
$$1 - \left(1 - p_{\mathrm{seg}}\right)^{L/L_0} \simeq 1 - e^{-Ls^2/\Lambda^3}. \tag{A1}$$
Therefore the typical free volume radius is $\sim\sqrt{\Lambda^3/L}$. Hence, the typical time to explore this volume is $\tau_3 \sim \Lambda^3/(D_3 L)$. (In three-dimensional space diffusive exploration is not compact, i.e. the probability to find a finite target (sphere) is less than one. However, the DNA as a target may be described as a set of straight rods. Hence, the search process effectively looks like the two-dimensional search for a finite target (disk), i.e. it is compact, up to logarithmic corrections.) A second way to get the same expression for $\tau_3$ is based on a comparison between Eqs. (1) and (4). Obviously, in the limiting case $\tau_1 \ll \tau_3$ and $\sqrt{D_1\tau_1} = r$, the search becomes based only on the three-dimensional diffusion. Hence, in this case formula (4) should give (1). It is easy to see that this happens only when $\tau_3 \sim \Lambda^3/(D_3 L)$.

APPENDIX B:

In this appendix we describe the details of the numerical simulation. The simulations were done on a cubic lattice containing 800 × 800 × 800 sites. Assuming that a real cell has a volume of 1 µm³, each site on the lattice represents a volume of $(dx)^3 = \left(\frac{1}{800}\,\mu\mathrm{m}\right)^3$. Polymers (representing the DNA) with different lengths were embedded in the lattice by using a self-avoiding random walk. The persistence length was accounted for by assigning a probability $p$ of changing direction randomly among the possible directions. Using the persistence length of about 50 nm leads to $p = dx/(50\,\mathrm{nm}) = 0.025$. […] $O(10)$ […] a probability $dx/b$ to perform an IT and a probability $dx/l$ to perform a jump. ITs were simulated by randomly choosing a DNA site within a distance $R$ from the location of the protein. With the exception of Sec. II, where a complete simulation of the three-dimensional diffusion was carried out by performing moves to the 6 available directions, a jump was simulated by randomly choosing a site on the DNA. The time of the jump was taken as a free constant ($\tau_3$).

APPENDIX C:

In this appendix we argue that using ITs the protein can only move along the chemical coordinate to distances smaller than $R$ or larger than $\Lambda^2/L_0$. As mentioned above, we assume that during an IT the protein chooses a new location whose three-dimensional distance from its current location is smaller than $R$. The new location is chosen randomly with a uniform probability. Given the uniform probability, we need to estimate the total typical length available at each IT, $G$. We separate this quantity into four types of contributions:

$$G = G_1 + G_2 + G_3 + G_4 . \tag{C1}$$

The first, $G_1$, is the contribution from DNA whose distance along the chemical coordinate from a point $x$ is smaller than $R$, the protein size. This is given by

$$G_1(x) \simeq R . \tag{C2}$$

The contribution $G_2$ arises from DNA whose chemical distance from a point $x$ is larger than $R$ but smaller than $L_0$. The probability for the DNA to bend on a scale $l$ is approximately given by $\left(1 - e^{-l/L_0}\right)/L_0$. However, the probability that this bend will connect to $x$ is $\sim R^2/l^2$ (due to the area ratio). Since each connection contributes a length of the order of $R$ to $G_2$ we obtain

$$G_2 \sim R \int_R^{L_0} \frac{1 - e^{-l/L_0}}{L_0}\,\frac{R^2}{l^2}\,dl \sim \frac{R^2}{L_0}\left(1 - e^{-R/L_0}\right) \simeq \frac{R^3}{L_0^2} . \tag{C3}$$

The contribution $G_3$ comes from DNA whose chemical distance from $x$ is larger than $L_0$ but smaller than the length at which the DNA feels the boundaries of the cell, which is $\sim \Lambda^2/L_0$. This value can be overestimated using the fact that a free three-dimensional random walk on a lattice returns to the origin about 1.5 times; a walk with step $L_0$ therefore returns to a region with radius $R$ of the order of $\left(R/L_0\right)^3$ times. Each such return contributes a length of about $R$ to $G_3$, leading to

$$G_3 \sim \frac{R^4}{L_0^3} . \tag{C4}$$

Finally, $G_4$ is the contribution from the rest of the DNA (whose chemical distance is larger than $\Lambda^2/L_0$ but smaller than $L$). Using Eq. (13), and since each connected segment contributes a length of the order of $R$ to $G_4$, one obtains

$$G_4 \sim R\,\frac{L}{L_0}\,p_{\mathrm{seg}} \sim \frac{L R^3}{\Lambda^3} . \tag{C5}$$

This result can be understood within a mean field approach: if the DNA has a total length $L$ and is assumed to be distributed uniformly in the cell, every volume in the cell contains a part of the total DNA length that is equal to the total DNA length times the fraction of the volume, here $R^3/\Lambda^3$. One can see that in the assumed regime, where $L_0 \gg R$ and $L \gg \Lambda\,(\Lambda/L_0)$, $G_2$ and $G_3$ are much smaller than $G_1$ and $G_4$. Therefore, we can safely neglect the probability that the protein will move to a location on the DNA whose chemical distance from the protein's current location is larger than $R$ and smaller than $\Lambda^2/L_0$.

APPENDIX D:

In this appendix the effective times $\tau_1^{\mathrm{eff}}$ and $\tau_3^{\mathrm{eff}}$ are calculated.

1. The effective time of a correlated movement

We have two independent mechanisms for an uncorrelated motion. The first is jumping, with a typical time of $\tau_1$ between two subsequent jumps. This process has Poissonian statistics and therefore the probability that the protein does not perform a jump before time $t$ is

$$P_J = \exp\left(-\frac{t}{\tau_1}\right) . \tag{D1}$$

The second mechanism for uncorrelated motion is a UIT, with a typical time of order $\lambda^2/D_{\mathrm{eff}}$ between two subsequent UITs. In the case of annealed DNA this mechanism has Poissonian statistics and the probability that the protein does not perform a UIT before time $t$ is

$$P_{IT} \sim \exp\left(-\frac{t}{\lambda^2/D_{\mathrm{eff}}}\right) . \tag{D2}$$

For quenched DNA the probability that the protein has not performed a UIT after traveling a distance $x$ is $\sim e^{-x/\lambda}$. Since the protein performs an effective one-dimensional diffusion, $x \sim \sqrt{D_{\mathrm{eff}}\,t}$ and we obtain $P_{IT} \sim \exp\left(-\sqrt{\frac{t}{\lambda^2/D_{\mathrm{eff}}}}\right)$. We will take the typical time of a non-interrupted (by an uncorrelated relocation) one-dimensional effective diffusion to be

$$\tau_1^{\mathrm{eff}} = \int_0^\infty P_{IT}\,P_J\,dt \sim \frac{1}{2}\,\frac{1}{\dfrac{D_{\mathrm{eff}}}{l^2} + \dfrac{D_{\mathrm{eff}}}{\lambda^2}} . \tag{D3}$$

The last expression is exact in the annealed case but it is only an approximation in the quenched regime. One can verify that the error does not exceed 50%, which is sufficient for scaling arguments of the type used in the paper.
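The integral of the two survival probabilities can also be evaluated numerically. The following is a minimal sketch, not the authors' code: it assumes that every order-one prefactor hidden by the $\sim$ signs is set to one, and the names `survival_integral`, `annealed_closed_form`, `tau_jump` (the time between jumps) and `tau_uit` (standing for $\lambda^2/D_{\mathrm{eff}}$) are hypothetical. It integrates $P_J P_{IT}$ for annealed and quenched DNA and compares the result with the closed form of the annealed case.

```python
import math


def survival_integral(tau_jump, tau_uit, quenched, n=20000):
    """Integrate P_J(t) * P_IT(t) over t >= 0 with Simpson's rule.

    P_J(t)  = exp(-t / tau_jump)
    P_IT(t) = exp(-t / tau_uit)        (annealed DNA)
            = exp(-sqrt(t / tau_uit))  (quenched DNA)
    The substitution t = u**2 keeps the quenched integrand smooth at t = 0.
    All prefactors are set to one (an assumption of this sketch).
    """
    a = 1.0 / tau_jump            # jump rate
    s = 1.0 / math.sqrt(tau_uit)  # square root of the UIT rate

    if quenched:
        def exponent(u):
            return a * u * u + s * u
        # u at which the exponent reaches 40; the neglected tail is ~exp(-40)
        u_max = (-s + math.sqrt(s * s + 160.0 * a)) / (2.0 * a)
    else:
        def exponent(u):
            return (a + s * s) * u * u
        u_max = math.sqrt(40.0 / (a + s * s))

    def integrand(u):
        # dt = 2 u du after the substitution t = u**2
        return 2.0 * u * math.exp(-exponent(u))

    h = u_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def annealed_closed_form(tau_jump, tau_uit):
    """Exact value of the annealed integral: 1 / (1/tau_jump + 1/tau_uit)."""
    return 1.0 / (1.0 / tau_jump + 1.0 / tau_uit)


if __name__ == "__main__":
    tau_jump = 1.0
    for tau_uit in (0.1, 1.0, 10.0):
        exact = annealed_closed_form(tau_jump, tau_uit)
        quenched = survival_integral(tau_jump, tau_uit, quenched=True)
        print(f"tau_uit={tau_uit}: annealed={exact:.4f}, "
              f"quenched={quenched:.4f}, ratio={quenched / exact:.3f}")
```

The printed ratio indicates how far the quenched integral departs from the annealed closed form for a given ratio of the two time scales. Because the prefactors are set to one, the numbers are only indicative of the order-of-magnitude agreement.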

2. The effective time of an uncorrelated movement

Since there are two mechanisms for uncorrelated movement, a jump with a typical time $\tau_3$ and a UIT with a typical time $\delta$, the typical time of the uncorrelated movement is the average of $\tau_3$ and $\delta$ weighted by the relevant probabilities for each process:

$$
\begin{aligned}
\tau_3^{\mathrm{eff}} &= \delta \int_0^\infty \left(-\frac{dP_{IT}}{dt}\right) P_J\,dt + \tau_3 \int_0^\infty \left(-\frac{dP_J}{dt}\right) P_{IT}\,dt \\
&= \delta \int_0^\infty \frac{dP_J}{dt}\,P_{IT}\,dt - \tau_3 \int_0^\infty \frac{dP_J}{dt}\,P_{IT}\,dt - \delta \int_0^\infty \frac{d}{dt}\left(P_J P_{IT}\right) dt \\
&= \frac{\tau_3 - \delta}{\tau_1} \int_0^\infty P_J P_{IT}\,dt + \delta = \frac{\tau_3 - \delta}{\tau_1}\,\tau_1^{\mathrm{eff}} + \delta \\
&= \tau_3\,\frac{\tau_1^{\mathrm{eff}}}{\tau_1} + \delta\left(1 - \frac{\tau_1^{\mathrm{eff}}}{\tau_1}\right) = \tau_3\,\frac{l_{\mathrm{eff}}^2}{l^2} + \delta\left(1 - \frac{l_{\mathrm{eff}}^2}{l^2}\right) = \frac{\tau_3 + \delta\,l^2/\lambda^2}{1 + l^2/\lambda^2} .
\end{aligned}
\tag{D4}
$$

In the case of sliding $\delta$ is replaced by $\tau_{IT}$.