[PDF] The synergy between protein positioning and DNA elasticity: energy minimization of protein-decorated DNA minicircles

Abstract

The binding of proteins onto DNA contributes to the shaping and packaging of genome as well as to the expression of specific genetic messages. With a view to understanding the interplay between the presence of proteins and the deformation of DNA involved in such processes, we developed a new method to minimize the elastic energy of DNA fragments at the mesoscale level. Our method makes it possible to obtain the optimal pathways of protein-decorated DNA molecules for which the terminal base pairs are spatially constrained. We focus in this work on the deformations induced by selected architectural proteins on circular DNA. We report the energy landscapes of DNA minicircles subjected to different levels of torsional stress and containing one or two proteins as functions of the chain length and spacing between the proteins. Our results reveal cooperation between the elasticity of the double helix and the structural distortions of DNA induced by bound proteins. We find that the imposed mechanical stress influences the placement of proteins on DNA and that, the proteins, in turn, modulate the mechanical stress and thereby broadcast their presence along DNA.

Full PDF

aa r X i v : . [ phy s i c s . b i o - ph ] M a y The synergy between protein positioning and DNA elasticity: energyminimization of protein-decorated DNA minicircles

Nicolas Clauvelin a , Wilma K. Olson b a BioMaPS Institute for Quantitative Biology, Rutgers, the State University of New Jersey, Piscataway, NJ, USA b BioMaPS Institute for Quantitative Biology and Department of Chemistry and Chemical Biology, Rutgers, the StateUniversity of New Jersey, Piscataway, NJ, USA

Abstract

The binding of proteins onto DNA contributes to the shaping and packaging of genome as well as to theexpression of speciﬁc genetic messages. With a view to understanding the interplay between the presence ofproteins and the deformation of DNA involved in such processes, we developed a new method to minimize theelastic energy of DNA fragments at the mesoscale level. Our method makes it possible to obtain the optimalpathways of protein-decorated DNA molecules for which the terminal base pairs are spatially constrained.We focus in this work on the deformations induced by selected architectural proteins on circular DNA.We report the energy landscapes of DNA minicircles subjected to diﬀerent levels of torsional stress andcontaining one or two proteins as functions of the chain length and spacing between the proteins. Ourresults reveal cooperation between the elasticity of the double helix and the structural distortions of DNAinduced by bound proteins. We ﬁnd that the imposed mechanical stress inﬂuences the placement of proteinson DNA and that, the proteins, in turn, modulate the mechanical stress and thereby broadcast their presencealong DNA.

Keywords:

DNA, elasticity, numerical optimization

1. Introduction

The organization of long genomes in the conﬁned spaces of a cell requires special facilitating mechanisms.A variety of architectural proteins play key roles in these processes. Some of these proteins help to compactthe DNA by introducing sharp turns along its pathway while others bring distant parts of the chain moleculeinto close proximity. The histone-like HU protein from

Escherichia coli strain U93 and the structurallyrelated Hbb protein from

Borrelia burgdorferi induce some of the largest known deformations of DNA double-helical structure—including global bends in excess of 160 deg, appreciable untwisting, and accompanyingdislocations of the helical axis [1–5]. The degree of DNA deformation reﬂects both the phasing of the protein-induced structural distortions and the number of nucleotides wrapped on the protein surface. Changes inthe spacing between deformation sites on DNA, one such deformation associated with each half of thesetwo dimeric proteins, contribute to chiral bends, while variation in the electrostatic surface of the proteinsperturbs the degree of association with DNA and the extent of duplex bending.By contrast, recent studies have highlighted the important role played by proteins which bridge andscaﬀold distant DNA sites and induce the formation of loops. For example, the

Escherichia coli

H-NS(histone-like nucleoid structuring) protein forms a superhelical scaﬀold [6] to compact DNA and the

Es-cherichia coli terminus-containing factor MatP can bridge two DNA sites to form a loop [7]. The MatPprotein is responsible for the compaction of a speciﬁc macrodomain (Ter) in

Escherichia coli and is believedto be able to bind sites located on either separate chromosomes or within one chromosome [8]. The detailed

Email address: [email protected] (Nicolas Clauvelin)

Preprint Nicolas Clauvelin September 25, 2018 nﬂuence of MatP on chromosome segregation and cellular division is yet to be unraveled [8], but it is ex-pected that the presence of MatP proteins could signiﬁcantly alter the organization and the dynamics ofthe whole Ter macrodomain [7].The composite kinking and wrapping of DNA on the surface of HU resembles, albeit on a much smallerscale, the packaging of DNA around the histone octamer of the nucleosome core particle [9], while MatPmight serve as a bacterial analog of the insulator-binding proteins that divide chromatin into independentfunctional domains [10]. Deciphering how these two very diﬀerent bacterial proteins inﬂuence the structureand properties of DNA provides a ﬁrst step in understanding how architectural proteins contribute to thespatial organization of genomes.With this goal in mind, we have recently developed a new method for the optimization of DNA structureat the base-pair level. Our method takes account of the sequence-dependent elasticity of DNA and can beapplied to chain fragments in which the ﬁrst and last base pairs are spatially constrained. Moreover, ourapproach allows for constraints in intervening parts of the DNA and thus makes it possible to model thepresence of bound proteins on DNA. We can accordingly compute the energy landscape for a wide varietyof protein-DNA systems and herein illustrate the eﬀects of HU and Hbb on the conﬁgurations of circularDNA and provide examples of protein-mediated loops of the type induced by MatP.The advantage of our method resides in the direct control of the positions and orientations of the basepairs at the ends of a DNA chain and in the capability of accounting for the presence of bound proteins.In addition, the diﬀerent constraints due to the boundary conditions and the presence of proteins aredirectly integrated in the minimization process, which makes it possible to rely on unconstrained numericaloptimization methods. Others have derived methods to optimize the energy of elastic deformation for DNAfragments. For example, Coleman et al [11] developed a method similar to ours, but their approach requireexplicit speciﬁcation of the forces and moments acting on the terminal base pairs (that is, there is no directcontrol on the positions and orientations of the terminal base pairs). Zhang and Crothers also presented anoptimization method [12], albeit limited to circular DNA. Their method only considers angular deformationswithin the double helix and takes the boundary conditions into account through Lagrange multipliers.We start with a brief description of our method (Section 2) and then present our results for the eﬀect ofdiﬀerent proteins on circular DNA. The various appendices contain the details of the calculations.

2. Energy minimization of spatially constrained, protein-decorated DNA

We present in this section a novel method for minimizing the elastic energy of a collection of DNA base-pairs. Our method accounts for the sequence-dependent elasticity of DNA and can be applied to a DNAfragment in which the ﬁrst and last base pairs are spatially constrained. Moreover, our approach makes itpossible to constrain parts of the DNA in order to model the presence of bound proteins. We describe ourmethod in a symbolic fashion for readability and present full details of the calculations in the appendices.

We consider a collection of N rigid base pairs and for the i -th base pair we denote x i its origin and d i the matrix containing the axes of the base-pair frame organized as column vectors. The segmented curvedeﬁned by the base-pair origins x i is referred to as the collection centerline and can be interpreted as adiscrete double-helical axis.Here and in the rest of this paper vector symbols are underlined and matrices are represented with boldsymbols. Superscripted Roman letters ( i, j, . . . ) are used to index base pairs and base-pair steps. The geometry of a collection of base pairs is traditionally described in terms of the base-pair stepparameters (or step parameters for short) [13]. These parameters describe the relative arrangement ofsuccessive base pairs and are denoted for the i -th step p i = (cid:0) θ i , θ i , θ i , ρ i , ρ i , ρ i (cid:1) [11] and referred to,respectively, as tilt, roll, twist, shift, slide and rise. The ﬁrst three parameters correspond to three anglesdescribing the relative orientation of base-pair frames d i and d i +1 and the last three parameters are the2omponents of the step joining vector r i = x i +1 − x i expressed in a particular frame referred to as thestep frame (see Appendix A for more details and [11, 14] for explicit calculation methods). The set of stepparameters for the base-pair collection is a 6( N −

1) vector and is denoted P .We shall see later that the energy of elastic deformation for a base-pair step can be written as a quadraticform with respect to the step parameters. The step parameters, however, are not a convenient representationto express constraints on the positions and orientations of base-pairs within the collection. This is due tothe fact that the components of the step joining vector r i depend on the relative orientation of the twobase pairs forming the step. In order to circumvent this issue we introduce a diﬀerent representation of thegeometry of a base-pair collection. We now deﬁne a new set of variables for each step of the base-pair collection. These variables are denoted φ i = (cid:0) ψ i , ψ i , ψ i , r i , r i , r i (cid:1) for the i -th step and referred to as the step degrees of freedom or step dofs forshort. Although the variables ψ i are identical to the angular step parameters θ i , we use a diﬀerent symbolfor clarity. The variables r i are the components of the step joining vector r i = x i +1 − x i expressed withrespect to the global reference frame (a convenient choice for the global reference frame is the ﬁrst base-pairframe). The set of step dofs for the base-pair collection is a 6( N −

1) vector and is denoted Φ.The main advantage of this choice of variables is to separate the representation of the centerline ofthe base-pair collection from the orientation of the base-pair frames. This is somewhat equivalent to thecenterline/spin representation of Langer and Singer [15] for continuous elastic rods. We shall see that thisrepresentation is particularly convenient to deal with the end conditions applied to a collection of base pairs.

The energy of elastic deformation for a DNA base-pair collection is deﬁned as the sum of the energy ofelastic deformation for each step. For the i -th step, this energy is given by the following quadratic form: E i = 12 (cid:0) p i − p i (cid:1) ⊺ F i (cid:0) p i − p i (cid:1) , (1)where p i contains the intrinsic step parameters and describes the reference conﬁguration of the step ( i.e. ,the conﬁguration of zero energy, also called the rest state), and F i is a 6 × E = N − X i =1 E i . (2)The purpose of our method is to minimize the elastic energy of a base-pair collection given by Eq. (2)under the constraints detailed below. In other words, we need to calculate the derivatives of the elasticenergy. The diﬃculty stems from the fact that this gradient has to be calculated with respect to a setof independent variables. This set of independent variables depends on the end conditions applied to thecollection of base pairs and also on the presence of bound proteins. As mentioned above, the step parametersare not convenient for dealing with the end conditions. We therefore calculate the gradient with respectto the step dofs. We ﬁrst consider the trivial case of an unconstrained collection ( i.e. , a collection with noimposed end conditions and no bound proteins) and we then show how to account for speciﬁc end conditionsand the presence of bound proteins.Note that, our method does not require the elastic energy of a step to be a quadratic form with respectto the step parameters. It is also possible to include coupling between successive steps in the expression ofthe total elastic energy of the collection of base pairs.3 .3. Elastic energy gradient for a free base-pair collection The case of a base-pair collection free of any end conditions and without bound proteins is trivial inthe sense that the optimization leads to the reference conﬁguration. We will use this gradient for a freecollection as the starting point of our method.The gradient of the elastic energy for a single step is obtained directly from Eq. (1): ∂E i ∂p i = F is (cid:0) p i − p i (cid:1) , (3)where we introduce the symmetrized force constant matrix F is deﬁned as F is = ( F i + F i ⊺ ) /

2. It follows thatthe variation of the elastic energy for the complete collection is given by: δ E = ∂ E ∂P ⊺ δP = N − X i =1 ∂E i ∂p i ⊺ δp i = N − X i =1 (cid:0) F is (cid:0) p i − p i (cid:1)(cid:1) ⊺ δp i . (4)In order to obtain the gradient with respect to the step dofs we introduce the Jacobian matrix J Φ deﬁnedas: δP = ∂P∂ Φ δ Φ = J Φ δ Φ . (5)It follows that: δ E = (cid:18) J ⊺ Φ ∂ E ∂P (cid:19) ⊺ δ Φ , (6)which leads to the following expression for the free-collection gradient with respect to the step dofs: ∂ E ∂ Φ = J Φ ⊺ ∂ E ∂P . (7)The details of the calculation for the matrix J Φ are given in Appendix B.2. We now consider the case of a base-pair collection with end conditions, that is, both the end-to-end vectorand the orientation between the ﬁrst and last base pairs are ﬁxed. Our results can easily be generalized toother types of end conditions.The end-to-end vector for the collection of base pairs is given by: x N − x = N − X i =1 r i . (8)If the end-to-end vector is imposed, it follows that: N − X i =1 δr i = 0 , (9)and we can therefore write: δr N − = − N − X i =1 δr i . (10)The end-to-end rotation corresponds to the orientation between the ﬁrst and last base pairs and is givenby: d ⊺ d N = D (1 ,N ) = N − Y i =1 D i ( ψ i ) , (11)4here D ( i,j ) denotes the rotation matrix between the i -th and j -th base pairs, and D i is the rotationmatrix for the i -th step which is completely parametrized by the angular step dofs ψ i = (cid:0) ψ i , ψ i , ψ i (cid:1) (referto Appendix A.1, Appendix B.1 and the reference [11] for more details). If the end-to-end rotation isimposed we have the constraint: δ D (1 ,N ) = δ N − Y i =1 D i ( ψ i ) ! = . (12)This constraint can be written as a condition on δψ N − , that is: δψ N − = − N − X i =1 K i δψ i , (13)where the details of the matrix K i are given in Appendix C (see Eq. (C.7)).It follows from the conditions given by Eq. (10) and Eq. (13) that the variation of the step dofs for thelast step can be expressed as: δφ N − = − N − X i =1 B i δφ i , (14)where, once again, the details of the matrix B i are given in Appendix C (see Eq. (C.9)). This result meansthat the step dofs φ N − are not independent variables. In other words, the end conditions reduce the numberof independent step dofs. Note that we choose to express the last step dofs as the non-independent variablesbut any other step could have been chosen. We now denote ˆΦ the set of independent step dofs and we havethe relation ˆΦ ⊂ Φ. We then deﬁne the Jacobian matrix ˆ J such that: δ Φ = ˆ J δ ˆΦ . (15)The dimensions of this matrix depend on the precise details of the end conditions (although the number ofrows is always 6( N − δ E = (cid:18)(cid:16) J Φ ˆ J (cid:17) ⊺ ∂ E ∂P (cid:19) ⊺ δ ˆΦ . (16)It follows that the gradient is: ∂ E ∂ ˆΦ = (cid:16) J Φ ˆ J (cid:17) ⊺ ∂ E ∂P . (17)Note that this gradient corresponds to the gradient with respect to the independent step dofs and accountsfor the end conditions. This is one advantage of our method: the end conditions are directly accountedfor. Hence, we transform a constrained optimization problem into an unconstrained one, which makes thenumerical implementation simpler and more robust. In order to study protein-decorated DNA we need to account for the presence of bound proteins on thedouble helix. We model the binding of proteins on the DNA by considering the step parameters of thebinding domain as frozen , that is, these step parameters are imposed and cannot change. For example, thefrozen step parameters can be extracted from high-resolution crystal structures of protein-DNA complexeswith the help of the 3DNA software [16].We now consider that the k -th step in the base-pair collection is frozen, that is, the step parameters p k are imposed (the following results can be easily generalized to an arbitrary number of frozen steps). Itfollows directly that the angular step dofs are constant and we therefore have δψ k = 0. The translationalstep dofs, however, are not constant: any changes in the angular steps dofs ψ j with j < k will change the5rientation of the joining vector r k of the frozen step. Indeed, we show in Appendix D that the variationsof the translational step dofs of the frozen step can be written as: δr k = k − X j =1 W j,k δψ j . (18)We therefore obtain for the variation of the frozen step dofs: δφ k = k − X j =1 C j,k δφ j , (19)where the details of the matrices W j,k and C j,k are given in Appendix D (see Eq. (D.4)).Similar to the set of independent step dofs ˆΦ introduced for the treatment of the end conditions, weintroduce a new set of independent step dofs ˜Φ which corresponds to the set of non-frozen step dofs amongthe set ˆΦ, that is, we have ˜Φ ⊂ ˆΦ ⊂ Φ. We deﬁne the matrix ˜ J as the Jacobian (see Appendix D fordetails): δ ˆΦ = ˜ J δ ˜Φ . (20)We obtain for the variation of the energy: δ E = (cid:18)(cid:16) ˜ J Φ ˆ J ˜ J (cid:17) ⊺ ∂ E ∂ ˜ P (cid:19) ⊺ δ ˜Φ , (21)and we have for the gradient: ∂ E ∂ ˜Φ = (cid:16) ˜ J Φ ˆ J ˜ J (cid:17) ⊺ ∂ E ∂ ˜ P . (22)In the above expression, ˜ P denotes the set of non-frozen step parameters and ˜ J Φ is the Jacobian matrixdeﬁned as: ˜ J Φ = ∂ ˜ P∂ Φ . (23)This matrix is directly obtained from J Φ by removing the rows associated with the frozen step parameters.The gradient obtained with Eq. (22) corresponds to the gradient of the elastic energy of the base-paircollection (Eq. (2)) with respect to the set of non-frozen independent step dofs. We introduced in the expression of the elastic energy of a base-pair step (Eq. (1)) two quantities whichdepend on the sequence of the DNA base-pair collection: the intrinsic step parameters p i and the forceconstants matrix F i . These quantities constitute the force ﬁeld of our theory. The intrinsic step parametersdescribe the rest states of the base-pair steps, while the force constants are the stiﬀnesses of the steps.In this work, we use a uniform ideal force ﬁeld ( i.e. , not depending on the details of the DNA sequence).The intrinsic step parameters for this force ﬁeld are given by p = (cid:0) , , . , , , . (cid:1) ,which correspond to a B-DNA like straight rest state and a helical repeat of 10 . (cid:0) F ideal (cid:1) = (0 . , . , . , , , . (24)The units for the the ﬁrst three diagonal entries are k B T. deg − and k B T. ˚A − for the last three entries.This force ﬁeld can be considered as quasi-inextensible due to the high force constants associated with thetranslational step parameters. In the limit of a inﬁnitely long base-pair collection we can calculate thepersistence lengths corresponding to the force ﬁeld and we ﬁnd that the bending and twisting persistencelengths are 47 . . .7. Protein binding procedure As explained earlier, we model the presence of proteins on DNA by setting the base-pair step parametersof the binding domain to imposed values. These imposed values can be extracted from high-resolutioncrystal structures of protein-DNA complexes or can be chosen to represent an ideal protein-DNA bindingsystem.In this study, we use an iterative procedure to bind a protein onto DNA progressively. For example, weconsider a protein with a binding domain of n p steps to be bound starting at the k -th step, that is, thesteps k to k + n p − linear ramp function: p j ( λ ) = (1 − λ ) p j free + λp j protein , ∀ j ∈ [ k, k + n p − , (25)where p j free denote the step parameters of the protein-free DNA (in our case they will correspond to the stepparameters of a portion of a naked-DNA minicircle) and p j protein are the step parameters of the DNA found inthe protein-DNA complex. The parameter λ ∈ [0 ,

1] is the ramp parameter. Our procedure starts with λ = 0and we minimize the DNA base-pair collection while gradually increasing the value of the ramp parameteruntil λ = 1. As an outcome we obtain an optimized DNA base-pair collection in which a part of DNA is shaped as if the protein were bound to it. This numerical procedure is not meant to convey the physicalprocess of proteins binding to DNA. It is designed to set regions of DNA to the states found in protein-DNAcomplexes. We also would like to point out that the binding ramp can be tuned in order to improve therobustness of the method. For example, the linear terms can be replaced by quadratic expressions. Suchmodiﬁcations might be needed to avoid instabilities or to control changes in the total twist while rampingthe step parameters in the case of binding domains with large deformations. The topology of a DNA minicircle is characterized by its linking number Lk [17], writhing number Wr [18,19], and total twist Tw [18]. The linking number of a minicircle, an integer, describes the entanglement ofthe minicircle centerline and the curve traced out by one of the edges of the base-pair frames (see [20] fordetailed explanations and computational methods). In particular, for a planar minicircle the linking numbercorresponds to the number of turns the double helix makes. For a minicircle of N bp, the relaxed linkingnumber Lk is given by the integer nearest to N/ .

5, where 10 . − Lk .Because a DNA minicircle is covalently closed, its linking number, and hence ∆Lk, are constant and are notaltered by the deformation of the double helix.The linking number of a minicircle is always equal to the sum of its writhing number and total twist,Lk = Tw + Wr. The writhing number characterizes the global folding and non-chiral distortions of theminicircle centerline, while the total twist measures the twisting or twist density of the base pairs aroundthe centerline. Like the linking number, the writhing number and total twist can be directly obtained fromthe minicircle base-pair frames as explained in [20]. The invariance of the linking number implies that whenthe minicircle is deformed, the total twist and the writhing number are redistributed. The total twist isdirectly related to the torsional stress within the double helix, while the writhing number depends on thecurvature of the centerline, albeit in a non-trivial way. In order to study protein-decorated DNA minicircles we ﬁrst need to generate protein-free minicircles.For a minicircle of N bp, we ﬁrst build a set of step parameters using the following formula for the i -th step: p i = (∆ sin ( θ ⋆ i ) , ∆ cos ( θ ⋆ i ) , θ ⋆ , , , ρ ) , (26)where ∆ = 2 π/N is the bending angle between the planes of successive base pairs and ρ denotes theintrinsic value of the rise step parameter within our force ﬁeld. The parameter θ ⋆ corresponds to the local7wist density in the minicircle and can be adjusted to generate over- or undertwisted conﬁgurations. Inorder to enforce the covalent closure of the minicircle, we add at the end of the base-pair collection anadditional base-pair identical to the ﬁrst one. We then minimize the energy of the base-pair collection underthe imposed end-to-end vector and rotation (which are both null in the present case). The outcome of theminimization is an optimized DNA minicircle with an imposed ∆Lk.Here we initially focus on planar, protein-free minicircles of lengths ranging from 63 bp to 105 bp ( i.e. ,from 6 to 10 helical repeats). Planar minicircles always have a writhing number equal to zero, which meansthat the total twist is equal to the linking number. Because we require the naked minicircles to be planarthere is an upper bound on the value of | Tw | (and, hence, of | ∆Lk | ). This limiting value is related to theMichell-Zajac instability [21, 22], and for larger values of | Tw | the planar circular conﬁgurations are no longerstable. With our ideal force ﬁeld, for a given chain length we can always generate optimized minicircleswith ∆Lk = 0 and, depending on the chain length, ∆Lk = − −

3. Results

We ﬁrst apply our minimization method to the optimization of DNA minicircles on which one or twoproteins are bound. We focus on two distinct proteins: the abundant histone-like HU protein and the struc-turally related Hbb protein. These two proteins have a common fold and share some structural similarities,such as the fact that they both associate as dimers and both introduce signiﬁcant localized bends in thedouble helix. We used the 3DNA software [16] to extract the step parameters of the DNA found in knowncrystal complexes [1, 5] (Protein Data Bank ﬁles 1P71 and 2NP2 for HU and Hbb, respectively). The lengthsof the extracted binding domains are 17 bp for HU and 35 bp for Hbb. Within the scope of our work, wethink of the Hbb-bound DNA as an extreme model of HU-induced distortion.The net bending angle introduced by the HU dimer in the selected structure is roughly 135 deg, andthat by Hbb is 160 deg (these angles correspond to the angles between the normals of the ﬁrst and lastbase pairs). Both proteins also slightly underwind the double helix. That is, the total twist across the DNAbound to each protein is less than the total twist of an undeformed DNA fragment of same length (for HU∆Tw ≃ − .

18 turns and for Hbb ∆Tw ≃ − . We ﬁrst performed a series of optimizations to study how the addition of an HU or Hbb dimer aﬀectsa DNA minicircle. We focused on chain lengths of 63 bp to 105 bp and considered relaxed minicircles(∆Lk = 0) as well as over- or undertwisted minicircles (∆Lk = ± ǫ as: ǫ = NN free EE , (27)where E is the energy of the optimized minicircle with the bound dimer, E is the energy of the optimizedprotein-free minicircle, N denotes the minicircle chain length, and N free the protein-free chain length. Inother words, ǫ is the ratio of the energy per base-pair step for the minicircle with a single protein versusthat for the naked minicircle. We recall that our elastic energy for a minicircle accounts for protein-freesteps only. The results for HU are presented in Fig. 1 and those for Hbb in Fig. 2.We ﬁrst remark that the presence of an HU dimer almost always reduces the energy of the protein-freeDNA compared to that of the naked minicircle (as shown by the values of the relative step energy ǫ lessthan one in Fig. 1). On the other hand, the addition of an Hbb protein has a mixed eﬀect on the relativestep energy depending on the chain length and the value of ∆Lk (Fig. 2). A possible explanation lies inthe fact that the binding domain of the Hbb dimer is twice as long as that for the HU dimer. Thus, the8 igure 1: Chain-length dependence of the relative step energy ǫ for minicircles with a single HU dimer. The blue circlescorrespond to the energies of relaxed minicircles and span the whole range of chain lengths. The red and orange trianglesrepresent the energies of over- and underwound minicircles, respectively. The horizontal dashed line ( ǫ = 1) corresponds tothe energy of a base-pair step in a naked minicircle of the same length. The vertical lines indicate the chain lengths equal tointegral numbers of helical repeats. length of protein-free DNA is less for minicircles with an Hbb protein, although the boundary conditions arecomparable to those of HU. In other words, the deformation required to satisfy the boundary conditions islarger in minicircles with an Hbb dimer, and, hence, the energy is higher. This argument also accounts forthe diﬀerences in the chain-length dependence of the relative step energy for minicircles containing an HUdimer versus those with an Hbb dimer. As shown in Fig. 1, for the HU dimer the chain-length dependenceof the relative step energy follows a damped oscillatory pattern (the period is roughly equal to the assumed10.5-bp DNA helical repeat). By contrast, there is no clear periodicity in the Hbb results (Fig. 2). Thissuggests, that within the present range of chain lengths, a minicircle with a single Hbb dimer is more constrained than one with an HU protein. Figure 2: Chain-length dependence of the relative step energy ǫ for minicircles with a single Hbb dimer. The blue circlescorrespond to the energies of relaxed minicircles and span the whole range of chain lengths. The red and orange trianglesrepresent the energies of over- and underwound minicircles, respectively. The horizontal dashed line ( ǫ = 1) corresponds tothe energy of a base-pair step in a naked minicircle of the same length. The vertical lines indicate the chain lengths equal tointegral numbers of helical repeats. The other interesting feature found in these results is the inﬂuence of the linking number and thetotal twist on the relative step energy. In the case of the HU dimer, relaxed minicircles have, in mostcases, a lower energy than under- or overwound minicircles. It is only for chain lengths close to half-9ntegral numbers of helical repeats that the energy is lower for underwound minicircles. On the other hand,underwound minicircles of 79 bp or greater bearing an Hbb dimer are consistently lower in energy thanrelaxed or overwound minicircles. As mentioned earlier, both dimers slightly underwind the double helix,which means that their presence on a minicircle induces a redistribution of the torsional stress. Note that,this redistribution may entail the conversion of part of the torsional stress into bending deformations as aconsequence of the changes in the total twist and the writhing number. This also explains why overwoundminicircles lead to higher energies: the large diﬀerence in the torsional stress between the naked DNA(prior to the addition of a protein) and the bound DNA causes higher deformations in the remaining partof the minicircle. In addition, the torsional stress in naked DNA depends on the chain length: for chainlengths close to integral numbers of helical repeats, the torsional stress is lower in relaxed minicircles thanin underwound minicircles, but for chain lengths near half-integral numbers of helical repeats the situationis reversed.Our results show that the torsional stress in DNA may inﬂuence the recruitment of HU and Hbb dimersand might act as a control mechanism for the presence of such proteins. For example, a relaxed minicircleof 94 bp is more likely to be bound to an HU than an Hbb dimer (assuming that the binding aﬃnity of bothdimers are comparable), whereas an underwound minicircle of the same length is more likely to take up anHbb dimer. Although our results are obtained on covalently closed DNA minicircles (and, hence, under atopological constraint), it is reasonable to think that similar eﬀects will take place on torsionally constrainedlinear DNA fragments (for example, the anchoring conditions could be achieved by large protein assembliesor magnetic/optical tweezers).

Our second series of optimizations focuses on minicircles of 100 bp and 105 bp containing two HU ortwo Hbb dimers. In addition, for minicircles of 100 bp we consider relaxed (∆Lk = 0) and underwound(∆Lk = −

1) molecules. This choice is motivated by the fact that the diﬀerence in energy between a nakedDNA minicircle of 100 bp with ∆Lk = 0 and ∆Lk = − . B T (the relaxed minicircle correspondsto the lower energy). Our computations provide energy landscapes of the protein-bound minicircles asfunctions of the spacing between the two binding sites. These landscapes are directly related to the relativelikelihoods of forming minicircles with two dimers at speciﬁc locations. In other words, the landscapesreveal which dimer positions along the minicircle are more apt to be occupied in, for example, cyclizationexperiments. We report in Figs. 3-6 the energy landscapes as well as the associated changes in the totaltwist ∆Tw (with respect to the planar and naked minicircles).The energy landscapes consist of several local minima separated by high energy states. In other words,there are well-deﬁned locations for optimal placements of two HU or Hbb proteins along the DNA minicircles.Notice that, only the minima with the lowest energies and, hence, with the highest Boltzmann weights,are likely to be relevant to the statistical physics of protein-decorated minicircles. These minima appearperiodically in the landscapes and, as expected, the period is roughly equal to the assumed DNA helicalrepeat (10.5 bp). In addition, the relative locations of the two proteins along the minicircle appear to aﬀectthe torsional stress signiﬁcantly as evidenced by the variations in the total twist. We also notice that thelowest energy conﬁgurations are all similar from a geometric point of view. That is, for both HU and Hbbproteins the globally optimal conﬁgurations are obtained when the two dimers are located at antipodal ornear antipodal sites (as shown by the conﬁgurations labeled ‘1’ in Fig. 3 to Fig. 6). We note that the minimafor the minicircles with two HU dimers are of lower energy than those for minicircles with two Hbb dimers.For Hbb-bound minicircles the length of protein-free DNA is shorter than for HU (for two Hbb dimers, thelength of naked DNA is comparable to the size of the Hbb binding domain). The Hbb system is thereforehighly constrained and leads to larger values in the energy landscapes and greater changes in the total twist.The results for the minicircles of 100 bp containing two HU or two Hbb dimers (see Figs. 3 and 5) showthat, the energy minima are lower for the underwound minicircle (∆Lk = −

1) than for the relaxed minicircle(∆Lk = 0). This is consistent with the results obtained for a minicircle of 100 bp with a single dimer (seeFigs. 1 and 2). We remark that the variations in the energy landscapes of the relaxed and underwoundminicircles of 100 bp are out of phase. That is, the minima for the relaxed minicircles correspond to themaxima of the underwound minicircles and vice versa . This phase shift between the energy landscapes of10 U bound do m a i n HU bound do m a i n HU bound do m a i n HU bound do m a i n Figure 3: Optimization results for relaxed and underwound minicircles of 100 bp with two HU dimers. The two plots on the leftrepresent (top) the optimized energy and (bottom) the changes in the total twist as functions of the center-to-center spacing s between the two proteins. In both plots, the gray areas denote the binding domain of the ﬁrst HU protein and the lightblue areas the binding domain of the second protein. The vertical lines indicate the chain lengths equal to integral numbersof helical repeats. The numbers in the energy plot refer to the structures depicted on the right in which the HU proteins arerepresented in green. The underwound (∆Lk = −

1) and relaxed (∆Lk = 0) minicircles are represented in orange and blue,respectively. relaxed and underwound minicircles is roughly equal to half the assumed helical repeat ( ∼ − handedness . We also notice that, thepattern in the changes in the total twist for minicircles of 100 bp with two dimers is similar: the localmaxima in the energy correspond to the highest changes in the total twist, while the local minima correlatewith those of moderate values of ∆Tw.For the minicircles of 105 bp containing two HU or Hbb dimers, the energy landscapes are similar tothose obtained for 100-bp minicircles. In particular, the minima appear periodically (every ∼ . U bound do m a i n HU bound do m a i n HU bound do m a i n HU bound do m a i n Figure 4: Optimization results for minicircles of 105 bp with two HU dimers. The two plots on the left represent (top) theoptimized energy and (bottom) the changes in the total twist as functions of the center-to-center spacing s between the twoproteins. In both plots, the gray areas denote the binding domain of the ﬁrst HU protein and the light blue areas the bindingdomain of the second protein. The vertical lines indicate the chain lengths equal to integral numbers of helical repeats. Thenumbers in the energy plot refer to the structures depicted on the right in which the HU proteins are represented in green. dissimilarity between minicircles of 100 bp and 105 bp resides in the changes in the total twist. We haveseen that for 100-bp minicircles, the changes in the total twist are comparable for HU and Hbb dimers. Inthe case of the minicircles of 105 bp, however, the values of ∆Tw corresponding to the local minima are ofdiﬀerent signs for the Hbb and HU minicircles. That is, for the lower energy conﬁgurations, the presenceof two HU dimers reduces the torsional stress in the minicircle (see Fig. 4), while two Hbb dimers increasethe torsional stress (see Fig. 6). We also notice that, for these local minima the magnitude of the changesin the total twist is comparable.Our study of minicircles with two dimers reveals the interplay between the elastic properties of DNAand the positioning of proteins on the double-helix. It appears that this is not about the proteins shapingthe double helix nor the DNA stiﬀnesses controlling the positioning of proteins. Rather there is cooperationbetween the presence of proteins and the elasticity of the double helix, particularly in the distribution of thetorsional stress. The high degree of contrast in our energy landscapes suggests that, once a protein is boundto a topologically constrained DNA fragment, the stress in the double helix will favor speciﬁc binding sitesfor other proteins. Such an interpretation echoes the recent results about DNA-protein allosteric eﬀectsobtained in single-molecule experiments [23]. It is also interesting to notice that the local minima found inour energy landscapes are always ﬂanked by conﬁgurations of comparable energy (up to a few k B T). This12 bb bound do m a i n H bb bound do m a i n H bb bound do m a i n H bb bound do m a i n Figure 5: Optimization results for relaxed and underwound minicircles of 100 bp with two Hbb dimers. The two plots onthe left represent (top) the optimized energy and (bottom) the changes in the total twist as functions of the center-to-centerspacing s between the two proteins. In both plots, the gray areas denote the binding domain of the ﬁrst Hbb protein andthe light blue areas the binding domain of the second protein. The vertical lines indicate the chain lengths equal to integralnumbers of helical repeats. The numbers in the energy plot refer to the structures depicted on the right in which the Hbbproteins are represented in pink. The underwound (∆Lk = −

1) and relaxed (∆Lk = 0) minicircles are represented in orangeand blue, respectively. suggests that there should be ﬂuctuations in the experimental measurements of the most likely binding sitesof HU and Hbb dimers along DNA minicircles. Notice that, our results have been obtained with a minimalmodel, that is, DNA is modeled as an isotropic material with standard bending and twisting stiﬀnesses andwe have focused on a special class of protein geometry which includes speciﬁc deformations on the doublehelix. Nevertheless, our approach serves as a proof of concept and paves the way for more detailed studiesabout the synergy between DNA deformation and protein positioning.

4. Discussion

The minimization procedure introduced in this work facilitates the investigation of how architecturalproteins may contribute to the spatial organization and genetic processing of DNA. This new approachgives us direct control of the positions and orientations of the base pairs at the ends of a DNA chain andallows us to specify the precise sites of protein uptake and the detailed changes in double-helical structurebrought about by the binding of protein. Here we illustrate the utility of the method in a study of theelastic energy landscapes of DNA minicircles decorated by the nonspeciﬁc architectural protein HU and13 bb bound do m a i n H bb bound do m a i n H bb bound do m a i n H bb bound do m a i n Figure 6: Optimization results for minicircles of 105 bp with two Hbb dimers. The two plots on the left represent (top) theoptimized energy and (bottom) the changes in the total twist as functions of the center-to-center spacing s between the twoproteins. In both plots, the gray areas denote the binding domain of the ﬁrst Hbb protein and the light blue areas the bindingdomain of the second protein. The vertical lines indicate the chain lengths equal to integral numbers of helical repeats. Thenumbers in the energy plot refer to the structures depicted on the right in which the Hbb proteins are represented in pink. the similarly folded, albeit site-speciﬁc [24], Hbb protein. Both proteins associate as dimers and introducesevere bends and untwist their DNA targets. We consider the Hbb-bound DNA as an extreme exampleof HU-induced DNA distortion and thus treat both proteins as nonspeciﬁc. We focus on covalently closedmolecules comparable in length to the loops that are formed by various regulatory proteins and enzymesthat bind to sequentially distant sites on DNA [25, 26] and allow for the uptake of one or two HU or Hbbdimers on the DNA. We also consider underwound and overwound minicircles in order to study the addedeﬀects of the torsional stress within the double helix on the protein-binding landscapes. We ﬁnd that thepresence of protein has a signiﬁcant eﬀect on the bending and twisting deformations in the minicircles, andconversely, that the torsional stress within DNA prior to the addition of proteins has a strong eﬀect on theoptimal placement of proteins along the minicircles. For example, we show that a single HU dimer is morelikely to bind relaxed rather than under- or overwound minicircles of most chain lengths between 63-105 bpand that an Hbb dimer binds preferentially to underwound minicircles of the same lengths.Our results reveal cooperation between the deformability of the double helix and the structural distortionsof DNA induced by bound proteins. In the case of minicircles with two HU or two Hbb dimers, the presenceof a ﬁrst protein strongly inﬂuences the locations of the optimal binding sites of a second protein. Thatis, the DNA, through its elastic deformation, acts as a communication medium between the proteins. In14articular, the torsional stress and the twisting stiﬀness appear to play a major role in this action at adistance. The mechanical signaling also provides a rationale for the DNA allostery reported in recent single-molecules studies of the dissociation of proteins on DNA chains constrained to full extension by the ﬂow ofsolvent [23, 27]. In particular, the binding of one protein on the extended DNA stabilizes or destabilizes thebinding of another protein, even when not in direct contact. Indeed, this sort of long-range communication,in which the binding of a ligand in one part of the DNA helix inﬂuences (positively or negatively) therecognition of a diﬀerent ligand at a remote site, also termed telestability [28], has puzzled DNA scientistsfor decades. Although our results concern a special class of proteins bound to covalently closed rather thanstraightened DNA, we observe an interplay between the elasticity of the double helix and the placementof proteins along DNA that may hold for extended as well as cyclic chains. In our case, the most likelyplacement of a second protein is antipodal to the ﬁrst, i.e. , as sequentially far apart as possible. Our studiesprovide examples of how the mechanical stress in DNA can control the placement of proteins and howproteins can alter the mechanical stress to broadcast their presence along DNA. Our ﬁndings also suggestthat local modiﬁcations of the mechanical properties of DNA, such as methylation or the occurrence ofkinks, could modulate and possibly repress this type of mechanical signaling. Figure 7: Orthogonal views of optimized loops induced by the DNA-bridging protein MatP [7] on circular DNA. The loopsdepicted on the left side run in anti-parallel directions and those on the right side run in parallel directions (note the arrowsand chain lengths in the images). The total chain lengths of the circular DNA on the left and right sides are 177 bp and 251bp, respectively. The MatP dimeric protein is represented in yellow. The step parameters of the two bound DNA domainshave been extracted from Protein Data Bank ﬁle 3VEA.

Although the minimal elastic energy conﬁgurations of long DNA chains are not necessarily relevantfrom a statistical physics point of view, our method can be used to study larger systems, such as longDNA molecules decorated by multiple proteins. For example, we show in Fig. 7 two examples of multipleloops induced by the Ter-speciﬁc protein MatP [7] on circular DNA, which is thought to be involved in thecondensation of chromosomes in

Escherichia coli . We also show in Fig. 8 a long DNA plasmid decoratedby 64 closely spaced Hbb proteins. Our software can also be used together with other tools (for example,3DNA [16]) to optimize DNA fragments anchored by proteins. An interesting application of our approachresides in the development of software for biomolecular sculpting [29], which makes it possible to study howproteins can bundle and organize DNA. We are currently working on improving our method to account foradditional types of constraints, such as the treatment of excluded volume. To date, our software does notcheck for collisions between DNA and proteins. Although the proteins collide on some of the minicirclespresented in this work (see Figs. 3-6), the collisions only occur in high-energy conﬁgurations, in which pairsof proteins lie immediately next to one another. We also plan to include more realistic force ﬁelds to account,for example, for the higher deformability of pyrimidine-purine compared to other base-pair steps and for theﬂexibility of protein assemblies. This task is facilitated by the fact that our method already accounts for thesequence-dependent elasticity of DNA. Finally, we are now concentrating our eﬀorts on other biomolecularsystems, including the loops mediated by the Lac and Gal repressor proteins and the relative contributionsof protein and DNA ﬂexibility in loop formation. 15 igure 8: Perspective (left) and orthographic (right) views of an optimized DNA plasmid of 1280 bp containing 64 Hbb proteinsregularly spaced by 5-bp linkers. The presence of the Hbb proteins (represented in pink) greatly condenses the DNA into ahelical ﬁber-like structure that folds into a circular conﬁguration. The circular radius of the resulting superstructure is 31 . . .

19 k B T, which suggests that the linkers are lowlydeformed. In contrast to the zigzag superhelical structure found in the crystal lattice (see Fig. 1c in [5]) in which the proteins lieon opposite sides of the superhelical axis, the proteins form a central core in the optimized structure. Notice that, this structurehas been obtained with a reduced

Hbb model in which only the ﬁfteen central base pairs of the full model described earlier areused. As a consequence, the spacing between the proteins is greater in the optimized structure than in the pseudo-continoushelix found in the crystal lattice.

References [1] K. K. Swinger, K. M. Lemberg, Y. Zhang, P. A. Rice, Flexible DNA bending in HU-DNA cocrystal structures, The EMBOJournal 22 (14) (2003) 3749–3760.[2] K. K. Swinger, P. A. Rice, IHF and hu: ﬂexible architects of bent DNA, Current Opinion in Structural Biology 14 (1)(2004) 28–35.[3] K. K. Swinger, P. A. Rice, Structure-based Analysis of HU–DNA Binding, Journal of Molecular Biology 365 (4) (2007)1005–1016.[4] D. Sagi, N. Friedman, C. Vorgias, A. B. Oppenheim, J. Stavans, Modulation of DNA Conformations Through the Forma-tion of Alternative High-order HU–DNA Complexes, Journal of Molecular Biology 341 (2) (2004) 419–428.[5] K. W. Mouw, P. A. Rice, Shaping the

Borrelia burgdorferi genome: crystal structure and binding properties of theDNA-bending protein Hbb, Molecular Microbiology 63 (5) (2007) 1319–1330.[6] S. T. Arold, P. G. Leonard, G. N. Parkinson, J. E. Ladbury, H-NS forms a superhelical protein scaﬀold for dna condensation,Proceedings of the National Academy of Sciences 107 (36) (2010) 15728–15732.[7] P. Dupaigne, N. K. Tonthat, O. Esp´eli, T. Whitﬁll, F. Boccard, M. A. Schumacher, Molecular Basis for a Protein-Mediated DNA-Bridging Mechanism that Functions in Condensation of the

E. coli

Chromosome, Molecular Cell 48 (4)(2012) 560–571.[8] R. T. Dame, O. J. Kalmykowa, D. C. Grainger, Chromosomal macrodomains and associated proteins: Implications forDNA organization and replication in gram negative bacteria, PLoS Genetics 7 (6) (2011) e1002123.[9] K. Luger, A. W. Mader, R. K. Richmond, D. F. Sargent, T. J. Richmond, Crystal structure of the nucleosome core particleat 2.8-˚A resolution, Nature 389 (6648) (1997) 251–260.[10] A. G. West, M. Gaszner, G. Felsenfeld, Insulators: many functions, many mechanisms, Genes & Development 16 (3)(2002) 271–288.[11] B. D. Coleman, W. K. Olson, D. Swigon, Theory of sequence-dependent DNA elasticity., Journal of Chemical Physics118 (15) (2003) 7127–7140.[12] Y. Zhang, D. M. Crothers, Statistical mechanics of sequence-dependent circular DNA and its application for DNA cycliza-tion, Biophysical Journal 84 (1) (2003) 136–153.[13] R. Dickerson, M. Bansal, C. Calladine, S. Diekmann, W. Hunter, O. Kennard, E. von Kitzing, R. Lavery, H. Nelson,W. Olson, W. Saenger, Z. Shakked, H. Sklenar, D. Soumpasis, C.-S. Tung, A.-J. Wang, V. Zhurkin, Deﬁnitions andnomenclature of nucleic acid structure parameters, Journal of Molecular Biology 205 (4) (1989) 787–791.[14] M. A. El Hassan, C. R. Calladine, The assessment of the geometry of dinucleotide steps in double-helical DNA; a newlocal calculation scheme, Journal of Molecular Biology 251 (5) (1995) 648–664.[15] J. Langer, D. A. Singer, Lagrangian aspects of the kirchhoﬀ elastic rod, SIAM Review 38 (1996) 605–618.[16] X.-J. Lu, W. K. Olson, 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization ofthree-dimensional nucleic-acid structures, Nature Protocols 3 (7) (2008) 1213–1227.

17] C. F. Gauss, Carl Friedrich Gauss Werke ... Herausgegeben von der K. Gesellschaft der Wissenschaften zu G¨ottingen,G¨ottingen, Gedruckt in der Dieterichschen Universit¨atsdruckerei (W.F. Kaestner), 1867.[18] J. H. White, W. R. Bauer, Calculation of the twist and the writhe for representative models of DNA, Journal of MolecularBiology 189 (2) (1986) 329–341.[19] W. Pohl, The self-linking number of a closed space curve, Indiana Univ. Math. J. 17 (1968) 975–985.[20] N. Clauvelin, W. K. Olson, I. Tobias, Characterization of the geometry and topology of DNA pictured as a discretecollection of atoms, Journal of Chemical Theory and Computation 8 (3) (2012) 1092–1107.[21] E. E. Zajac, Stability of two planar loop elasticas, ASME J. Applied Mechanics 29 (1962) 136–142.[22] A. Goriely, Twisted Elastic Rings and the Rediscoveries of Michell’s Instability, Journal of Elasticity 84 (3) (2006) 281–299.[23] S. Kim, E. Brostr¨omer, D. Xing, J. Jin, S. Chong, H. Ge, S. Wang, C. Gu, L. Yang, Y. Q. Gao, X.-d. Su, Y. Sun, X. S.Xie, Probing allostery through DNA, Science 339 (6121) (2013) 816–819.[24] K. Kobryn, D. Z. Naigamwalla, G. Chaconas, Site-speciﬁc DNA binding and bending by the

Borrelia burgdorferi ppendix A. Base-pair step geometry We consider in this appendix and the following ones a collection of N rigid base pairs and for the i -thbase pair we denote x i its origin and d i the matrix containing the axes of the base-pair frame organizedas column vectors. The step parameters of the i -th step are denoted p i and the step dofs are denoted φ i .The deﬁnitions of the step parameters are given in [11, 13, 14] (in particular, the ﬁrst reference providesexpressions for the angular step parameters in terms of the Euler angles describing the rotation between twosuccessive base pairs).Vector symbols are underlined and matrices are represented with bold symbols and the elements of avector or a matrix are denoted with square brackets. Superscripted Greek letters ( α, β, . . . ) are used todenote vector, matrix, or tensor entries and range from 1 to 3. Superscripted Roman letters ( i, j, . . . ), onthe other hand, are used to index base pairs and base-pair steps. For example, the α -th axis of base-pairframe d i is denoted d iα , and the β -th component of that vector is (cid:2) d iα (cid:3) β = (cid:2) d i (cid:3) αβ . We also use the Einsteinsummation notation on lower indices: for example, the scalar product of two vectors u and v is given by[ u ] α [ v ] α , that is, u · v = [ u ] α [ v ] α = P α =1 [ u ] α [ v ] α . Appendix A.1. Step rotation

For a given base-pair step, we denote D i the matrix describing the rotation between the two successivebase-pair frames d i and d i +1 . This rotation matrix is referred to as the step rotation matrix and is deﬁnedby: D i ( ψ i ) = d i ⊺ d i +1 . (A.1)One of the main results about the geometry of a base-pair step is the relation between the changes inthe angular step dofs and the changes in the relative orientation of the base-pair frames. We assume thatthe base-pair frames are orthonormal and it follows that an inﬁnitesimal perturbation of a base-pair frameaxis d iα can be written as δd iα = χ i × d iα . The changes in the step rotation matrix due to the perturbationof the two base-pair frames forming the step is therefore given by: (cid:2) δ D i (cid:3) αβ = δχ i · d i +1 β × d iα = − (cid:2) D i (cid:3) γβ ǫ αγζ d iζ · δχ i , (A.2)where δχ i = χ i +1 − χ i and ǫ αγζ is the three-dimensional Levi-Civita symbol. We introduce the vector δω i = d i ⊺ δχ i which is the projection of δχ i in the i -th base-pair frame. It follows that: h ( δ D i ) D i ⊺ i αβ = − ǫ αβγ (cid:2) δω i (cid:3) γ . (A.3)The term on the left-hand side corresponds to a skew matrix because of the orthogonality of D i and weintroduce the matrix Ξ i such that: h ( δ D i ) D i ⊺ i αβ = (cid:20) ∂ D i ∂ψ iγ D i ⊺ (cid:21) αβ δψ iγ = − ǫ αβζ (cid:2) Ξ i (cid:3) ζγ ψ iγ (A.4)We ﬁnally obtain after contracting the Levi-Civita symbols: δω i = Ξ i δψ i . (A.5)This expression relates the relative changes in the base-pair frames to the variations of the angular stepdofs. The matrix Ξ i is invertible [11] and we introduce the matrix Ω i = Ξ i − to obtain: δψ i = Ω i δω i . (A.6)The matrix Ω i can be obtained by direct calculation and its transpose is identical to the matrix Γ n givenin Eq. (A.1) of [11]. 18 ppendix A.2. Step frame The step itself can be characterized by introducing a so-called step frame (referred to as the mid-stepframe in [14]). This frame is located at ( x i + x i +1 ) / s i , is given by: s i = d i D is ( ψ i ) , (A.7)where D is is a rotation matrix that can be obtained directly from the angular step dofs (see [11] for moredetails). Appendix A.3. Translational step parameters

The step parameters ρ i are deﬁned as the projection of the step joining vector r i = x i +1 − x i on the stepframe, that is, we have: ρ i = s i ⊺ r i . (A.8)The deﬁnition given by Eq. (A.8) shows that the step parameters ρ i depend on the step dofs r i and onthe angular step dofs ψ i . We introduce the matrix T i such that: δρ i = T i δr i , (A.9)and we have: (cid:2) T i (cid:3) αβ = ∂ρ iα ∂r iβ = h s i ⊺ i αβ . (A.10)We also introduce the matrix R i such that: δρ i = R i δψ i . (A.11)We have: (cid:2) R i (cid:3) αβ = ∂ρ iα ∂ψ iβ = " ∂ D is ⊺ ∂ψ iβ D is αγ ρ iγ , (A.12)where the matrix appearing in the right-hand side is skew ( D is is orthogonal). We denote Λ iβ its vectorrepresentations and we obtain: (cid:2) R i (cid:3) αβ = (cid:2) Λ iβ × ρ i (cid:3) α . (A.13)The vectors Λ iβ can be obtained by direct calculation and are identical to the ones given in Eqs (A.2-A.4)in [11]. Appendix B. Base-pair collection geometry

We derive in this appendix the results related to the Jacobian matrix J Φ . This Jacobian matrix is deﬁnedas (see Eq. (5)): J Φ = ∂P∂ Φ , (B.1)where P denotes the set of all base-pair step parameters and Φ the set of all step dofs.19 ppendix B.1. Rotations within a base-pair collection We denote D ( i,j ) the matrix describing the rotation between the base-pair frames d i and d j ( i < j ).This rotation matrix is deﬁned as: D ( i,j ) = d i ⊺ d j = j − Y k = i D k ( ψ k ) , (B.2)where D k ( ψ k ) is the step rotation matrix (see Eq. (A.1)).We want to calculate the derivatives of this product of rotation matrices with respect to the angular stepdofs ψ k ( i ≤ k < j ). We have: ∂ D ( i,j ) ∂ψ kα = D i . . . D k − ∂ D k ∂ψ kα D k +1 . . . D j − , (B.3)which can be rewritten as: ∂ D ( i,j ) ∂ψ kα = d i ⊺ d k (cid:18) ∂ D k ∂ψ kα D k ⊺ (cid:19) d k ⊺ d j . (B.4)The matrix in parentheses on the right-hand side is a skew matrix (see Eq. (A.4)) and we introduce thenotation (cid:0) ∂ D k /∂ψ kα (cid:1) D k ⊺ = P kα such that: ∂ D ( i,j ) ∂ψ kα = d i ⊺ d k P kα d k ⊺ d j = D ( i,k ) P kα D ( k,j ) , (B.5)and the transpose is given by: ∂ D ( i,j ) ∂ψ kα ! ⊺ = D ( k,j ) ⊺ P kα ⊺ D ( i,k ) ⊺ = − D ( k,j ) ⊺ P kα D ( i,k ) ⊺ , (B.6)where we used the property P kα ⊺ = − P kα .We can use this result to calculate the derivatives of a base-pair frame with respect to the angular stepdofs of the preceding steps. The i -th base-pair frame in the collection is given by: d i = d D (1 ,i ) . (B.7)It follows from Eq. (B.5) that: ∂ d i ∂ψ kα = (cid:16) d k P kα d k ⊺ (cid:17) d i (B.8)The matrix multiplying d i on the right-hand side is a skew matrix and can therefore be represented by avector. This vector expresses the inﬁnitesimal rotation due to changes in the ψ k . We deﬁne the vector S kα as: h S kα i β = − ǫ γζβ h d k P kα d k ⊺ i γζ . (B.9)This expression can be rewritten in a more concise form with the help of Eq. (A.4) ( P kα is the skew matrixobtained from the α -th column of Ξ k ): S kα = d j Ξ kα , (B.10)where Ξ kα is the α -th column of Ξ k . In other words, S kα is the expression of the vector Ξ kα in the globalreference frame and the vector S kα describes the inﬁnitesimal rotation associated with a change in the α -thangular step dof of the k -th step. We ﬁnally obtain: ∂∂ψ kα (cid:2) d k (cid:3) γζ = ∂∂ψ kα h d kζ i γ = h S kα × d iζ i γ . (B.11)20 ppendix B.2. Jacobian matrix J Φ We already have two of the three contributions to the Jacobian matrix of the collection: the matrices T i and R i (see Eq. (A.10) and Eq. (A.13)) express the dependence of the ρ i on the step dofs ψ i and r i . Thedeﬁnition of the ρ i (Eq. (A.8)) shows that there is also a dependence on the angular dofs from the precedingsteps in the collection. We have from Eq. (A.8): ∂ρ iα ∂ψ jβ = " D is ⊺ ∂ d i ∂ψ jβ ! ⊺ αγ r iγ , where j < i (B.12)which, with the help of Eq. (B.11), leads to: ∂ρ iα ∂ψ jβ = − h(cid:16) s i ⊺ S jβ (cid:17) × ρ i i α = (cid:2) U j,i (cid:3) αβ . (B.13)We can now write the full variation of the step parameters p i in terms of the step dofs φ k (1 ≤ k ≤ i ) as: δp i =  I R i T i  δφ i + i − X j =1  U j,i  δφ j , (B.14)where R i is deﬁned in Eq. (A.13), T i in Eq. (A.10), and U j,i in Eq. (B.13). The matrix J Φ is directlyobtained from this expression and its structure is shown in Fig. B.9. Figure B.9: Structure of the matrix J Φ for four central steps i − , . . . , i + 1. The vectors p i denote the step parameters whilethe vectors φ i denote the step dofs. The solid lines delimit the 6 × θ i and ρ i for the step parameters, and ψ i and r i for the stepdofs). The white blocks represent null entries. ppendix C. End conditions We provide in this appendix the details for the calculation of the Jacobian matrix ˆ J . This Jacobianmatrix relates the set of independent step dofs to the complete set of step dofs. The number of independentstep dofs depends on the details of the boundary conditions. We focus here on an imposed end-to-end vectorand an imposed end-to-end rotation but other types of boundary conditions can be treated along the samelines.The imposed end-to-end vector condition is given by Eq. (10) which reads: δr N − = − N − X i =1 δr i . (C.1)In other words, the change in the joining vector of the last step is completely determined by the changes inthe joining vectors of the other steps.The imposed end-to-end rotation condition corresponds to (see Eq. (12)): δ D (1 ,N ) = δ N − Y i =1 D i ( ψ i ) ! = N − X i =1 ∂ D (1 ,N ) ∂ψ iα δψ iα = . (C.2)This condition can be written as: ∂ D (1 ,N ) ∂ψ N − α δψ N − α = − N − X i =1 ∂ D (1 ,N ) ∂ψ iα δψ iα , (C.3)and with the help of Eq. (B.5) we obtain: P N − α δψ N − α = − d N − ⊺ N − X i =1 d i P iα d i ⊺ ! d N − δψ N − α . (C.4)The right-hand side corresponds to a sum of skew matrices and we therefore introduce Q i,N − as: h d N − ⊺ d i P iα d i ⊺ d N − i βγ δψ iα = − ǫ βγζ (cid:2) Q i,N − (cid:3) ζα δψ iα . (C.5)Using Eq. (A.4) we obtain after contracting the Levi-Civita symbols: (cid:2) Ξ N − (cid:3) ζα δψ N − α = − N − X i =1 (cid:2) Q i,N − (cid:3) ζα δψ iα . (C.6)We can invert the left-hand side (see Eq. (A.6)) to ﬁnally obtain: δψ N − = − N − X i =1 Ω N − Q i,N − δψ i . = − N − X i =1 K i δψ i . (C.7)Note that the matrix Q i,N − can be expressed in terms of the vectors S iβ : (cid:2) Q i,N − (cid:3) αβ = d N − α · S iβ . (C.8)In other words, the matrix Q i,N − is obtained by projecting the vectors S iβ in the base-pair frame d N − and, hence, expresses the inﬁnitesimal rotation originating from the changes in the angular step dofs ψ i inthe base-pair frame d N − . 22he results obtained in Eq. (10) and Eq. (C.7) show that: δφ N − = − N − X i =1  K i

00 I  δφ i = − N − X i =1 B i δφ i . (C.9)This shows that the imposed end-to-end vector and rotation conditions reduce the number of independentstep dofs (the step dofs of the last step can be expressed in terms of the step dofs of all the other steps).That is, the set of independent step dofs, denoted ˆΦ, corresponds to ˆΦ = { φ i } i =1 ,...,N − . The Jacobianmatrix ˆ J , deﬁned as ˆ J = ∂ Φ /∂ ˆΦ, is sparse and its structure is shown in Fig. C.10. Figure C.10: Structure of the matrix ˆ J for imposed end-to-end vector and rotation for the last four steps of a collection of basepairs. The vectors φ i denote the step dofs. The columns of this matrix are related to the independent step dofs and, hence,the last column is for step ( N −

2) as the ( N − × ψ i from the translationalvariables r i . The white blocks represent null entries. Appendix D. Frozen steps

We focus in this appendix on the calculation of the matrix ˜ J (see Eq. (20)).We consider that the k -th step in the collection of base pairs is frozen, that is, its step parameters p k areimposed and constant. It follows directly that: δψ k = 0 . (D.1)The condition on the variation of the translational step dofs, δr k , is obtained from the expression: δρ k = ∂ρ kα ∂r kβ δr kβ + k − X ℓ =1 ∂ρ kα ∂ψ ℓβ δψ ℓβ = 0 . (D.2)It follows from Eq. (A.10) and Eq. (B.13): δr kβ = k − X ℓ =1 h S ℓγ × r k i β δψ ℓγ = k − X ℓ =1 (cid:2) W ℓ,k (cid:3) βγ δψ ℓγ . (D.3)23his result simply mean that the vector r k only changes in orientation due to the changes in the angularstep dofs of the preceding steps.We introduce the matrix C ℓ,k as: δφ k = k − X ℓ =1 C ℓ,k δφ ℓ = k − X ℓ =1  W ℓ,k  δφ ℓ . (D.4)The Jacobian matrix ˜ J is readily determined from this result and its structure is shown in Fig. D.11. Figure D.11: Structure of the matrix ˜ J for a base-pair collection in which the k -th step is frozen. The solid lines delimitbase-pair step block matrices, while the dashed lines separate the angular variables ψ i from the translational variables r i . Thewhite blocks represent null entries. Appendix E. Zajac instability for DNA minicircles

The Zajac instability [21, 22] is related to the fact that a twisted elastic ring becomes unstable if thetwist density within the ring exceeds a certain value. It is straightforward to transpose this result to ourDNA base-pair model (in fact, our mesoscale model together with our ideal force ﬁeld can be seen as adiscretized Kirchhoﬀ elastic rod model).For a DNA fragment of N bp in its rest state, the intrinsic total twist Tw is given by:Tw = N ∗ θ

360 = N . , (E.1)where θ = 360 / . . − Tw and the stability criterion is written as: | ∆Tw | < √ , (E.2)24here Γ is the ratio of the bending and twisting stiﬀnesses (for our ideal for ﬁeld we have Γ = 0 . and we recall thatLk = [ N/ .

5] (where the brackets stand for the nearest integer operator). It follows that the criterion canbe written as: − √ − (cid:18)(cid:20) N . (cid:21) − N . (cid:19) < ∆Lk < √ − (cid:18)(cid:20) N . (cid:21) − N . (cid:19) . (E.3)These conditions with the fact that ∆Lk is an integer leads to the following set of solutions:∆Lk = {− , , +1 } . (E.4) Appendix F. Implementation details

We use the C ++ ALGLIB library [30] to implement our minimization method and rely on the L-BFGSalgorithm. We also use the C ++++