Effective Model of Loop Extrusion Predicts Chromosomal Domains

Abstract

An active loop-extrusion mechanism is regarded as the main out-of-equilibrium mechanism responsible for the structuring of megabase-sized domains in chromosomes. We developed a model to study the dynamics of the chromosome fibre by solving the kinetic equations associated with the motion of the extruder. By averaging out the position of the extruder along the chain, we build an effective equilibrium model capable of reproducing experimental contact maps based solely on the positions of extrusion-blocking proteins. We assessed the quality of the effective model using numerical simulations of chromosomal segments and comparing the results with explicit-extruder models and experimental data.


Martina Crippa

Department of Physics, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy and Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Yinxiu Zhan

Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland

Guido Tiana

Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133 Milano, Italy

(Dated: August 4, 2020; arXiv preprint, q-bio.BM)

I. INTRODUCTION

Chromosomes display a hierarchical structure of domains during cellular interphase [1, 2]. In mammals, the level of topological associating domains (TADs), at the megabase scale, constitutes the most important level in the hierarchy because of its role in controlling gene expression. The folding of TADs has been described at the molecular level by an active loop-extrusion mechanism [3], in which a protein complex extrudes chromatin loops and can be stopped by proteins bound to the chromosome (for a review, see refs. [4, 5]).

The cohesin protein complex has been suggested to extrude the chromatin fiber, keeping close in space the two chromosomal segments to which it is bound at a given time (see Fig. 1). The extrusion activity can be stopped by CTCF proteins bound to chromatin, thus stabilizing the contact between the CTCF-bound chromosomal regions. In fact, enrichment in CTCF has been observed in loci pivoting strong contacts [6]. Cells lacking either CTCF [7] or cohesin [8] display a reduced structuring of TADs. Recently, microscopy experiments using biochemically reconstructed systems showed that cohesin can extrude chromatin in an ATP-dependent way [9, 10].

An interesting feature of CTCF is that it is directional, in the sense that it can bind asymmetrically to chromatin in both directions and can efficiently stop cohesin only if it is oriented towards it, but not if it is oriented away from it [11]. This directionality arises because CTCF is not simply a barrier to the motion of cohesin, but interacts with it in [...]

II. MOTION OF THE EXTRUDER

Some of the numerical parameters that are necessary to build the model are known (see Appendix A). In particular, the diffusion coefficient of cohesin in the nucleoplasm is much larger than that of chromatin loci on the TAD length scale, suggesting that one can assume cohesin to be well mixed in the cell nucleus. Moreover, the time scale associated with extrusion is slightly smaller than that associated with the motion of the polymer chain on the TAD length scale. Even if this difference is marginal, we tested the assumption that the distribution of cohesin along the chain can be regarded as stationary. We compared the results of the effective model with both the experimental data and a polymer model in which cohesin is simulated explicitly, thus without making in this case any assumption on its probability distribution.

FIG. 1. A sketch of the loop-extrusion mechanism. Cohesin diffuses in the nucleus (a) and can be loaded onto the chromosome at a random position (b). From here, it starts to run on the chromosome (c), extruding its two strings and thus forming a loop (d). CTCF proteins bound to the chromosome and oriented towards the running cohesin (black arrows) can stop its motion (e). At any point, cohesin has a probability to detach from the chromosome (f). (g) The position of cohesin along the chain defines the interactions that contribute to determining the conformations $\{r_i\}$ of the chromosome.

Let's assume that the extruder can only walk towards the ends of the chain, that it walks with constant rate in a fixed direction and that it cannot overcome a CTCF molecule. Let's define the binary quantities $\sigma^+_i$ and $\sigma^-_i$ that assume the value 1 if site $i$ contains a CTCF molecule oriented forward and backward, respectively, and 0 otherwise. We also define

$$\tilde\delta^\pm_i \equiv 1 - \delta_{\sigma^\pm_i,1}, \tag{1}$$

which assumes the value 0 in the sites with a CTCF molecule oriented in the specified direction, which is thus able to stop the motion of the extruder in that direction; it takes the value 1 otherwise.

The rate equation that describes the amount $p_{i,j}(t)$ of extruder linking sites $i$ and $j$ of the chromosome is

$$\frac{dp_{i,j}}{dt} = k_{\rm on}\,\delta_{|i-j|,1} - k_{\rm off}\,p_{i,j} + k\,\tilde\delta^-_{i+1}\,p_{i+1,j} - k\,\tilde\delta^-_{i}\,p_{i,j} + k\,\tilde\delta^+_{j-1}\,p_{i,j-1} - k\,\tilde\delta^+_{j}\,p_{i,j}, \tag{2}$$

where $k_{\rm on}$ is the loading rate of the extruder on the chromosome, $k_{\rm off}$ the detachment rate and $k$ the advancement rate. The stationary distribution can be obtained by setting to zero the time derivative for every $i$ and $j$, that is

$$p_{ij} = \frac{k_{\rm on}\,\delta_{|i-j|,1} + k\,\tilde\delta^-_{i+1}\,p_{i+1,j} + k\,\tilde\delta^+_{j-1}\,p_{i,j-1}}{k_{\rm off} + k\,(\tilde\delta^-_i + \tilde\delta^+_j)}. \tag{3}$$

This equation can be solved recursively, exploiting the fact that $p_{ij}$ depends only on the probabilities $p_{kl}$ of shorter loops contained in it, that is with $i \le k < l \le j$ and $l - k < j - i$.

An important approximation that we implicitly made in Eq. (2) is that multiple extruders do not interact with each other by excluded volume when they walk on the chromosome.

A. Chromosome without CTCF
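As a numerical companion to this subsection, the stationary recursion of Eq. (3) can be iterated by increasing loop size $j - i$. The following is a minimal sketch, not the authors' code: the function name `stationary_extruder`, the 0-based site indices and the list arguments `sigma_plus` and `sigma_minus` are assumptions made for illustration, and the rates used in the demo are arbitrary. On a CTCF-free chain the result depends only on $j - i$ and decays geometrically with ratio $2k/(k_{\rm off}+2k)$.

```python
def stationary_extruder(n_sites, k_on, k_off, k, sigma_plus=None, sigma_minus=None):
    """Iterate the stationary recursion of Eq. (3) by increasing loop size.

    sigma_plus[i] / sigma_minus[i] are 1 if site i carries a CTCF oriented
    forward / backward (hypothetical argument names), 0 otherwise.
    Returns p as a nested list, with p[i][j] defined for i < j.
    Assumes k_off > 0, so that the denominator never vanishes.
    """
    sigma_plus = sigma_plus or [0] * n_sites
    sigma_minus = sigma_minus or [0] * n_sites
    d_plus = [1 - s for s in sigma_plus]    # delta-tilde^+ of Eq. (1)
    d_minus = [1 - s for s in sigma_minus]  # delta-tilde^- of Eq. (1)

    p = [[0.0] * n_sites for _ in range(n_sites)]  # p[i][i] = 0 is the boundary condition
    for d in range(1, n_sites):                    # loop size j - i
        for i in range(n_sites - d):
            j = i + d
            gain = k_on if d == 1 else 0.0              # loading on adjacent sites
            gain += k * d_minus[i + 1] * p[i + 1][j]    # left leg steps from i+1 to i
            gain += k * d_plus[j - 1] * p[i][j - 1]     # right leg steps from j-1 to j
            loss = k_off + k * (d_minus[i] + d_plus[j])
            p[i][j] = gain / loss
    return p


# Demo with arbitrary rates: compare with the geometric law on a free chain.
k_on, k_off, k = 1e-3, 1e-2, 1.0
p = stationary_extruder(8, k_on, k_off, k)
p0 = k_on / (k_off + 2 * k)
ratio = 2 * k / (k_off + 2 * k)
for d in range(1, 8):
    print(d, p[0][d], p0 * ratio ** (d - 1))
```

Filling the table by loop size guarantees that both terms on the right-hand side of Eq. (3) are already known when $p_{ij}$ is evaluated.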

The simplest case is that in which the extruder can walk freely on the chromosome in the absence of CTCF, as described in Fig. 2(a). In this case, Eq. (3) becomes

$$p_{i,j} = \frac{k_{\rm on}\,\delta_{|i-j|,1} + k\,p_{i+1,j} + k\,p_{i,j-1}}{k_{\rm off} + 2k}. \tag{4}$$

Starting from the case $i, i+1$ at which the extruder can bind, one can write iteratively

$$\begin{aligned}
p_{i,i+1} &= \frac{k_{\rm on}}{k_{\rm off}+2k} \equiv p_0,\\
p_{i,i+2} &= \frac{k\,p_{i+1,i+2} + k\,p_{i,i+1}}{k_{\rm off}+2k} = \frac{2k}{k_{\rm off}+2k}\,p_0,\\
p_{i,i+3} &= \frac{k\,p_{i+1,i+3} + k\,p_{i,i+2}}{k_{\rm off}+2k} = \left(\frac{2k}{k_{\rm off}+2k}\right)^{2} p_0,\\
&\;\;\vdots\\
p_{i,j} &= \left(\frac{2k}{k_{\rm off}+2k}\right)^{j-i-1} p_0,
\end{aligned} \tag{5}$$

where use is made of the translational invariance $p_{i+n,j+n} = p_{ij}$ and of the boundary condition $p_{i,i} = 0$.

FIG. 2. A sketch of the different ways in which cohesin can run, according to the position of CTCF. The pairs of beads indicate the position of cohesin at the two chromosomal sites it encloses. (a) A chromosomal segment where the extruder can move freely. (b) The case in which the extruder is constrained by two convergent CTCF molecules (i.e., $\sigma^+_{i_c} = 1$ and $\sigma^-_{j_c} = 1$). (c) The case in which a further CTCF molecule prevents the motion of the extruder in one direction (i.e., $\sigma^+_{i_c} = 1$, $\sigma^+_{k_c} = 1$ and $\sigma^-_{j_c} = 1$). (d) A case similar to the previous one, with multiple aligned CTCF. (e) The case with convergent CTCF in between. (f) The case of several divergent CTCF in between, the inner ones being at positions $k_c$ and $l_c$, respectively.

B. Contacts between sites within convergent CTCF

Consider a chromosome segment bordered by convergent CTCF at sites $i_c$ and $j_c$, as in Fig. 2(b). The value of $p_{ij}$ with $i_c < i < j < j_c$ depends only on the amount of extruder in the interval from $i$ to $j$, so for $i_c < i < j < j_c$ Eq. (5) still holds.

Equation (3) can now be written as

$$p_{ij} = \frac{k_{\rm on}\,\delta_{|i-j|,1} + k\,(1-\delta_{i+1,i_c})\,p_{i+1,j} + k\,(1-\delta_{j-1,j_c})\,p_{i,j-1}}{k_{\rm off} + k\,(2 - \delta_{i,i_c} - \delta_{j,j_c})}. \tag{6}$$

The amount of extruder in sites containing a CTCF molecule can be found from Eqs. (6) and (5). For example, the term

$$p_{i,j_c} = \frac{k\,p_{i+1,j_c} + k\,p_{i,j_c-1}}{k_{\rm off}+k}, \tag{7}$$

where $p_{i,j_c-1}$ is that of Eq. (5) and we iterate on $p_{i+1,j_c}$. We get from Eqs. (7) and (6)

$$\begin{aligned}
p_{j_c-1,j_c} &= \frac{k_{\rm on}}{k_{\rm off}+k},\\
p_{j_c-2,j_c} &= \frac{k}{k_{\rm off}+k}\left[\frac{k_{\rm on}}{k_{\rm off}+k} + p_0\right],\\
p_{j_c-3,j_c} &= \frac{k_{\rm on}}{k_{\rm off}+k}\left(\frac{k}{k_{\rm off}+k}\right)^{2} + \left(\frac{k}{k_{\rm off}+k}\right)^{2} p_0 + \frac{k}{k_{\rm off}+k}\left(\frac{2k}{k_{\rm off}+2k}\right) p_0,\\
p_{j_c-n,j_c} &= \frac{k_{\rm on}}{k_{\rm off}+k}\left(\frac{k}{k_{\rm off}+k}\right)^{n-1} + \sum_{l=1}^{n-1}\left(\frac{k}{k_{\rm off}+k}\right)^{n-l}\left(\frac{2k}{k_{\rm off}+2k}\right)^{l-1} p_0.
\end{aligned} \tag{8}$$

The general form of $p_{j_c-n,j_c}$ contains a geometric sum that gives

$$p_{j_c-n,j_c} = \frac{k_{\rm on}}{k_{\rm off}+k}\left(\frac{k}{k_{\rm off}+k}\right)^{n-1} + \frac{k_{\rm off}+2k}{k_{\rm off}}\left[\left(\frac{2k}{k_{\rm off}+2k}\right)^{n-1} - \left(\frac{k}{k_{\rm off}+k}\right)^{n-1}\right] p_0. \tag{9}$$

By symmetry, the same expression is valid for $p_{i_c,i_c+n}$. The probability associated with both CTCF sites obeys, by Eq. (6), the relation

$$p_{i_c,j_c} = \frac{k\,p_{i_c+1,j_c} + k\,p_{i_c,j_c-1}}{k_{\rm off}}, \tag{10}$$

which can be evaluated by substituting Eq. (9) into it.

C. Contacts across a CTCF site

Consider a segment from $i_c$ to $j_c$ closed by convergent CTCF molecules, with a further CTCF molecule at position $k_c$ with $i_c < k_c < j_c$ and, for instance, directed upward (i.e., $\sigma^+_{k_c} = 1$), as in Fig. 2(c).

Pairs of sites on the same side with respect to $k_c$ display the same probabilities as described above, that is Eqs. (4), (9) and (10). Pairs interspersed by the CTCF molecule, i.e. $i < k_c < j$, are affected by the fact that the two sites cannot be reached evenly by extruders coming from all parts of the segment $(i, j)$.

Let's use again an iterative approach, starting from

$$p_{k_c-1,k_c} = p_0. \tag{11}$$

The probabilities involving site $k_c + 1$ obey

$$p_{k_c-n,k_c+1} = \frac{k}{k_{\rm off}+2k}\left(p_{k_c-n+1,k_c+1} + p_{k_c-n,k_c}\right) = (2^n - 1)\left(\frac{k}{k_{\rm off}+2k}\right)^{n} p_0. \tag{12}$$

Similarly, those involving site $k_c - 1$ are

$$p_{k_c-1,k_c+m} = \left(\frac{k}{k_{\rm off}+2k}\right)^{m} p_0. \tag{13}$$

For any pair of sites across $k_c$, the probability obeys the iterative relation

$$p_{k_c-n,k_c+m} = \frac{k}{k_{\rm off}+2k}\left(p_{k_c-n+1,k_c+m} + p_{k_c-n,k_c+m-1}\right). \tag{14}$$

One can look for solutions in the form

$$p_{k_c-n,k_c+m} = a_{n,m}\left(\frac{k}{k_{\rm off}+2k}\right)^{n+m-1} p_0, \tag{15}$$

which, substituted in Eq. (14), gives the iterative relation

$$a_{n,m} = a_{n-1,m} + a_{n,m-1}, \tag{16}$$

starting from $a_{n,1} = 2^n - 1$ (Eq. 12) and $a_{1,m} = 1$ (Eq. 13).

Solving the iterative problem by means of a bivariate generating function (see Appendix B), one obtains

$$p_{k_c-n,k_c+m} = \binom{n+m-1}{m} F(1, 1-n, 1+m, -1)\left(\frac{k}{k_{\rm off}+2k}\right)^{m+n-1} p_0, \tag{17}$$

where $F$ is the Gaussian hypergeometric function.

D. Contacts across several CTCF sites

Consider now the case of a pair of sites $i$ and $j$ separated by more than one CTCF molecule, with various orientations, as in Figs. 2(d)-(e). If both orientations are present, as in Fig. 2(e), then $p_{i,j} = 0$ because no extruder can bind to any pair of sites $q, q+1$ with $i < q < j$ and reach sites $i$ and $j$.

For sites $i$ and $j$ separated by two CTCF sites (at positions $k_c$ and $l_c$) with the same alignment, as in Fig. 2(d), one can follow the same strategy as that of Sect. II C. Analogously to Eq. (11), the starting point is the probability $p_{k_c-1,l_c}$, which in the present case is given by Eq. (13) because sites $k_c - 1$ and $l_c$ fall in the case of Fig. 2(c), that is

$$p_{k_c-1,l_c} = \left(\frac{k}{k_{\rm off}+2k}\right)^{l_c-k_c} p_0. \tag{18}$$

From here, an iterative relation analogous to Eq. (14) holds, that is

$$p_{k_c-n,l_c+m} = \frac{k}{k_{\rm off}+2k}\left(p_{k_c-n+1,l_c+m} + p_{k_c-n,l_c+m-1}\right), \tag{19}$$

whose solution is the same as that of Eq. (17),

$$p_{k_c-n,l_c+m} = \binom{n+m-1}{m} F(1, 1-n, 1+m, -1)\left(\frac{k}{k_{\rm off}+2k}\right)^{m+n+l_c-k_c-1} p_0, \tag{20}$$

with the difference that the iterative propagation is applied to Eq. (18) instead of to $p_0$ only.

This solution can be easily extended to the case in which between the two sites of interest there is an arbitrary sequence $\{k_{c1}, k_{c2}, \ldots, k_{cN}\}$ of CTCF sites aligned in the same direction. In this case, one can apply the propagator of Eq. (20) to $p_{k_{c1},k_{c(N-1)}}$, obtaining

$$p_{k_{c1}-n,k_{cN}+m} = \binom{n+m-1}{m} F(1, 1-n, 1+m, -1)\left(\frac{k}{k_{\rm off}+2k}\right)^{m+n+k_{cN}-k_{c1}-1} p_0, \tag{21}$$

thanks to the fact that $F(1, 0, m, -1) = 1$.

The most problematic case is that of two sites $i$ and $j$ separated by diverging CTCF molecules, as in Fig. 2(f). Calling $k_c$ and $l_c$, respectively, the inner sites, we know that

$$p_{k_c-1,l_c+1} = \frac{k}{k_{\rm off}+2k}\left[p_{k_c-1,l_c} + p_{k_c,l_c+1}\right], \tag{22}$$

which can be easily evaluated using Eq. (20). However, the exact solution of this case for generic values of $n$ and $m$ would require the summation of terms in the form of Eq. (20), which we are not able to do. For this reason we resort to an approximation, writing

$$p_{k_c-n,l_c+m} = p_{k_c,l_c+m}\left(\frac{k}{k_{\rm off}+k}\right)^{n} + p_{k_c-n,l_c}\left(\frac{k}{k_{\rm off}+k}\right)^{m}, \tag{23}$$

in which the probabilities on the right-hand side are given by Eq. (20). This corresponds to the assumption that the extruders that can reach sites $i$ and $j$ are only those that, after reaching sites $k_c$ and $l_c + m$, walk $n$ steps to reach $k_c - n$, and those that do the same thing from sites $k_c - n$ and $l_c$, making $m$ steps from $l_c$. Equation (23) is exact for $n = m = 1$ and is expected to underestimate the true probability for large $n$ and $m$, a probability that is anyway low in this limit. Moreover, under the same assumptions, the resulting probability does not change if multiple CTCF sites are aligned in the two directions, as in Fig. 2(f).

III. EFFECTIVE MODEL

From the knowledge of the stationary distribution of the extruder, we built an effective polymeric model in which the degrees of freedom of the extruder are averaged out. In other words, we started from a model which is surely out of equilibrium, because the motion of cohesin does not obey the condition of detailed balance; we showed that the distribution of cohesin along the chain is stationary, and we investigated whether there is an effective potential that displays that distribution at equilibrium, through Boltzmann statistics. Then, we used the parameters of this potential in the conformational space of the polymer, in connection with a realistic (although arbitrary) contact function that defines the spatial dependence of the potential.

Let's assume that the number $\mu_{ij}$ of extruder molecules binding sites $i$ and $j$ of the chromosome can be written as an equilibrium state of an effective potential,

$$P(\mu) = \frac{1}{Z_\mu}\, e^{-\sum_{ij}\epsilon_{ij}\mu_{ij}}, \tag{24}$$

where $\epsilon_{ij}$ is a site-dependent effective energy.

Due to the rigid nature of the extruder, the conditional probability associated with a conformation $\{r_i\}$ of the system for any given state $\{\mu_{ij}\}$ of the extruder along the chain is $P(r|\mu) = \frac{1}{Z}\prod_{i}$ [...]

To test the performance of the effective model, we performed molecular-dynamics simulations of chromosomal segments described by a chain of beads connected by springs and interacting with a potential $U = U_0 + U_{\rm eff}$ (cf. Eq. 28) given by a polymeric term $U_0 = k_s$ [...] $(r_{ij} - a)$ [...] $+\ \epsilon \sum_{i}$ [...]

[...] 0.89. The simulated contact map displays the main features of the experimental Hi-C map, including two regions with high contact probability (cf. panels b and c in Fig. 3). Moreover, the simulated map displays strong contacts in the initial part of the polymer which are not present in the experimental map; this is a region lacking any CTCF molecule.

To better understand the effective model, we simulated a toy model made of a 30-bead string with two convergent CTCF sites, as displayed in Fig. 4(a). The effective interactions $\epsilon_{ij}$ display a square of strongly interacting elements within the two CTCF sites, see Fig. 4(b); the borders of this square interact even more strongly, as does the corner where the extruder accumulates. In addition, there is a sort of 'border effect' due to extruders that bind to the CTCF-free ends of the chain. The simulation of the effective model produces a contact map which essentially reflects the interaction potential (cf. Fig. 4(c)).

FIG. 3. (a) The width of fluctuations of the effective energy $\tilde H$ as a function of the time step $\Delta t$ of the simulation, in ps. (b) The experimental Hi-C map of the Tsix region. (c) The contact map simulated with the effective model ($r = 0.89$). The positions of CTCF in both orientations are indicated below the map. The correlation coefficient $r$ between simulated and experimental contact maps as a function of the shift $E$ of the energy parameters (d) and of the distance $R_{\rm cont}$ that defines contacts (e). (f) The interaction energy $\epsilon_{ij}$ between the beads. (g) The standard deviation between 10 simulations of 16 minutes each, plotted in the same color scale as the contact maps.

V. COMPARISON WITH EXPLICIT EXTRUDER MODEL

A relevant question we want to answer is how the effective model performs with respect to a model in which the extruder is described explicitly [3]. In fact, we do not expect that the effective model can reproduce all details of Hi-C maps, because loop extrusion is not the only mechanism at work. For example, it is known that the formation of compartments on the scale of the whole chromosome is not driven by loop extrusion but interacts with it at a smaller scale [21]. Our main goal is then to show that the effective model can reproduce Hi-C maps with the same accuracy as the explicit-extruder model.

In the explicit model, we assumed that the extruder is well mixed around the polymer and is always available for binding. It can bind to a pair of adjacent sites with rate $k_{\rm on}$, each side of the extruder can walk with rate $k$, and it can detach with rate $k_{\rm off}$. The monomers linked by an extruder experience a harmonic force characterized by a harmonic constant $k_s$ and a rest distance $a$, that is, the same force that guarantees the integrity of the polymer. We assumed that different bound extruders cannot overcome each other and that they cannot overcome CTCF sites. The numerical parameters are given in Appendix A.

FIG. 4. (a) A toy model with two convergent CTCF molecules. (b) The effective contact energy $\epsilon_{ij}$. (c) The result of a 16-minute simulation with the effective model with $E = 13$. (d) The contact map resulting from the average of 30 explicit-extruder simulations of 16 minutes each in which extruders can overcome each other freely. (e) The result of simulations in which extruders cannot move into occupied sites.

The average contact map obtained from 30 simulations, calculated in the same way as those obtained with the effective model (cf. Sect. IV), is displayed in Figs. 5(a) and (b) for the Tsix domain. Contact maps seem to be at convergence. The correlation coefficient with the experimental map is $r = 0.89$, which is identical to that of the effective model.

The main difference between the explicit and the effective model is in the fluctuations around the average. In Fig. 5(c) we show the result of three individual simulations and in Fig. 5(d) the standard deviation associated with the simulations. It is apparent that in explicit-extruder simulations the average map is given by the contribution of maps which are quite different from each other. In fact, the standard deviation is comparable with the average. This result differs from that of the effective model, in which the different simulations generate maps which are much more homogeneous (cf. Fig. 3(g)).
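The two quantities compared above, the correlation between a simulated and an experimental map and the spread between individual simulations, can be computed along the following lines. This is only a sketch under stated assumptions: this section does not spell out how $r$ is evaluated, so a Pearson coefficient over the upper-triangular entries is assumed here, and the helper names (`upper_triangle`, `pearson_r`, `mean_and_std`) as well as the toy 3x3 maps are hypothetical.

```python
import math


def upper_triangle(matrix, min_sep=1):
    """Flatten the entries with j - i >= min_sep, counting each pair once."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_sep, n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_and_std(maps):
    """Element-wise mean and population standard deviation over replicas."""
    n_rep, n = len(maps), len(maps[0])
    mean = [[sum(m[i][j] for m in maps) / n_rep for j in range(n)] for i in range(n)]
    std = [[math.sqrt(sum((m[i][j] - mean[i][j]) ** 2 for m in maps) / n_rep)
            for j in range(n)] for i in range(n)]
    return mean, std


# Toy usage with made-up 3x3 maps standing in for simulated replicas.
replica_a = [[0.0, 0.5, 0.1], [0.5, 0.0, 0.4], [0.1, 0.4, 0.0]]
replica_b = [[0.0, 0.3, 0.3], [0.3, 0.0, 0.6], [0.3, 0.6, 0.0]]
mean_map, std_map = mean_and_std([replica_a, replica_b])
print(pearson_r(upper_triangle(mean_map), upper_triangle(replica_a)))
```

Comparing `std_map` with `mean_map` element by element is one way to quantify the statement that, for explicit extruders, the standard deviation is comparable with the average.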

FIG. 5. Results of the simulation of the Tsix TAD with explicit extruders. (a) The experimental map. (b) The mean contact function $\langle\Delta_{ij}\rangle$, averaged over 30 simulations ($r = 0.89$). (c) Examples of individual simulations that contribute to the average. (d) Their standard deviation.

In the case of the toy model of Fig. 4, explicit-extruder simulations produce contact maps in which corner peaks are more evident, and the overall domain is less clear. We also compared the results of simulations in which extruders are freely allowed to overcome each other (Fig. 4(d)) with simulations in which an extruder cannot occupy a site which is already occupied (Fig. 4(e)). The two maps are essentially identical, suggesting that the hypothesis made in connection with Eq. (2) is not critical.

We repeated similar calculations for two other regions of chromosome X of mouse embryonic stem cells, of 1300 and 2600 kbp, respectively. The energy maps $\epsilon_{ij}$ are displayed in Fig. 6. As in the case of Tsix, the energy maps contain most of the features displayed by the experimental maps. The comparison between the results of the explicit-extruder model, those of the effective model and the experimental map is displayed in Fig. 7. Also in these cases, the effective model ($r = 0.82$ and $r = 0.76$ for the two regions, respectively) performs similarly to, if not better than, the explicit-extruder model ($r = 0.78$ and $r = 0.71$) [...] due to simulation statistics: while the explicit-extruder model requires averaging the motion of the extruders over multiple simulations (30 in the case shown above), the effective model already contains this average implicitly.

FIG. 6. The interaction energies $\epsilon_{ij}$ between beads for (a) region chrX:102278307-103570000 and (b) region chrX:103578307-106170000. The inset is a zoom of the squared region.

FIG. 7. Comparison between the experimental data (central panels), the results of the explicit-extruder model (left panels) and those of the effective model (right panels). The upper panels are region chrX:102278307-103570000, the lower panels region chrX:103578307-106170000. The dashed lines are a guide to the eye.

The most apparent difference between the two models is that while the explicit extruder generates maps whose elements are spatially correlated with their neighbors, the maps obtained with the effective model display abrupt changes between neighboring elements. This is not unexpected, since the effective model, assuming a stationary distribution of the extruder along the chain, neglects correlations between consecutive sites associated with the motion of the extruder on short time scales.

A popular way of summarizing the information contained in contact maps of chromosomes is studying the average contact probability between sites as a function of their distance $|i - j|$ along the chain, which is usually a power law [25].

The three sets of experimental data we studied display scaling coefficients $\beta$ = 0.[...], $\beta = 0.87$ and $0.77$, respectively. The simulations display a central region (around 10 beads, corresponding to 50 kbp) in which the contact probability is a power law with good exponents both in the case of the explicit-extruder model (giving 0.74, 0.86 and 0.75, respectively) and of the effective model (0.73, 0.86 and 0.78, respectively). In addition, all models display a bend at $|i - j| < 5$, likely due to the coarse-graining of the model, and an exponential cut-off due to finite-size effects.

This power-law dependence of the contact probability on the linear distance is not surprising in the light of the effective model, if this describes realistically the effective interactions between the beads of the polymer. In fact, both the interaction energy (cf. Eq. 29) and the polymer looping entropy display such a power-law dependence.
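The scaling exponent $\beta$ discussed above can be estimated from a contact map by averaging over each diagonal and fitting a straight line in log-log space. The sketch below is an illustration only, not the procedure used for the quoted values: the function names, the synthetic map with $\beta = 0.8$ and the fitting window (separations 5 to 20, chosen here to avoid the short-distance bend and the finite-size cut-off) are all assumptions.

```python
import math


def contact_vs_distance(cmap):
    """Average contact probability for each separation s = |i - j| >= 1."""
    n = len(cmap)
    return {s: sum(cmap[i][i + s] for i in range(n - s)) / (n - s)
            for s in range(1, n)}


def fit_power_law(p_of_s, s_min, s_max):
    """Least-squares fit of log P = log A - beta * log s within [s_min, s_max].

    Returns (beta, A). Separations with zero probability are skipped.
    """
    pts = [(math.log(s), math.log(p)) for s, p in p_of_s.items()
           if s_min <= s <= s_max and p > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    return -slope, math.exp(intercept)


# Synthetic map with an exact power law, used only to exercise the functions.
n_beads = 40
toy_map = [[0.0 if i == j else abs(i - j) ** -0.8 for j in range(n_beads)]
           for i in range(n_beads)]
beta, prefactor = fit_power_law(contact_vs_distance(toy_map), 5, 20)
print(beta, prefactor)
```

On real maps the fitted value depends on the chosen window, so the window should be reported together with $\beta$.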

VI. CONCLUSIONS

We developed an effective model for the dynamics of chromosomes based on the assumption that the interactions that stabilize TADs are mediated by extruders running along the polymer while consuming energy. The effective model is built in such a way that its equilibrium conformations approximate the conformations visited in the long run by the out-of-equilibrium extrusion mechanism.

We showed that simulations performed with the effective model produce average contact maps that are as similar to the experimental Hi-C maps as those from an explicit extrusion mechanism. Even if they allow TADs to be detected, the agreement with the experimental data for both kinds of model is still not perfect. This is not surprising because they are based on a minimal amount of information, that is, the position of CTCF along the chain. There are indeed models [26-28] that produce contact maps closer to the experimental ones, but at the price of a larger amount of input information, being thus less predictive. Importantly, the present model can be improved by adding information from the experimental maps, such as the presence of compartments on a length scale larger than that of TADs [16].

The effective model is controlled by a free parameter ($E$) because interaction energies are defined only up to an additive constant. This parameter cannot be determined from the position of CTCF but has to be tuned manually. Although the details of the simulated maps depend on this choice, the overall partitioning of the chromosome into domains seems quite robust with respect to it.

FIG. 8. The contact probability between sites as a function of their distance along the chain, for the Tsix domain (upper panel), for region chrX:102278307-103570000 (middle panel) and region chrX:103578307-106170000 (lower panel).

The effective model is based on averaging out the position of the extruder along the chain, so it is a mean-field approximation. Although this appears to be good enough to reproduce average maps, by definition it cannot account for fluctuations. Thus, the price to be paid to reduce the complexity in the description of the system is the loss of information about cell-to-cell variability.

Nonetheless, the maps that summarize the interaction energies $\epsilon_{ij}$ display the main patterns present in the experimental Hi-C maps, suggesting that polymeric entropy plays a limited role in shaping the architecture of chromosomes, at least at the scale of Mbp.

Another strong approximation that we implemented is that the extruder cannot overcome CTCF sites. This approximation allowed us to obtain an analytical expression for the distribution of the extruder. However, it is known that CTCF is bound to its binding sites only for ≈ 50% of the time [29], resulting in an effective permeability of CTCF sites. Explicit simulations of the motion of phantom and of sterically interacting extruders give similar results, at least in a simple toy system.

An approach analogous to ours was followed in ref. [17], in which a Fokker-Planck equation for the binding probability of the extruder is solved in the case of constant velocity, of pure diffusion and of diffusion in an effective potential that reflects the entropic cost of polymer looping. Only in the last case does the binding probability display a power-law scaling, reflecting the dependence of the entropy cost on the linear distance of the loop. However, recent experiments [9, 10] indicate that cohesin uses ATP not only to bind/unbind but also to run on the chromosome. Since the ATP hydrolysis rate in cohesin is approximately 2 s$^{-1}$ [9], assuming that hydrolysis provides ≈ 30 kJ/mol, the provided power is approximately 60 kJ/mol per second. On the other hand, cohesin runs at a rate of 2 kbp/s, making a loop of 4 kbp every second. Assuming a persistence length of the order of a kbp, the energy loss associated with the formation of the loop is at most $T \log 4 \approx$ [...]

Appendix A: Numerical parameters of the system

Experiments of fluorescence recovery after photobleaching indicate that the mean residence time of cohesin on the chromatin fiber is 13 minutes, corresponding to a detachment rate $k_{\rm off} = 1.3 \cdot 10^{-3}$ s$^{-1}$ [30]. Total internal reflection microscopy of reconstructed cohesin-chromatin in a flow cell indicates that the stepping rate of cohesin is $k = 10^{[\ldots]}$ bp/s [9]. Fluorescence correlation spectroscopy experiments show that approximately $c$ = 250,000 [...] $c_b$ = 109,000 [...] The loading rate $k'_{\rm on}$ of cohesin on chromatin per base can be estimated from $k_{\rm off}$ and from the fraction of bound molecules, that is

$$k'_{\rm on} = k_{\rm off}\,\frac{V_n}{N}\,\frac{c_b}{c - c_b}, \tag{A1}$$

where $V_n$ is the nuclear volume, $N$ is the total number of base pairs, $c_b$ is the number of bound cohesin molecules and $c$ is the total number of cohesin molecules. Using $V_n = 500\ \mu$m$^3$, $N = 3 \cdot 10^{[\ldots]}$, $c = 2.5 \cdot 10^{5}$ and $c_b = 1.09 \cdot 10^{5}$ [30] one obtains $k'_{\rm on}$ = 3.[...] $\cdot 10^{-[\ldots]}$ nm$^3$/(ps·bp).

The (effective) diffusion coefficient of the chromatin fiber measured by live-cell imaging is $D_{\rm ch} = 3 \cdot 10^{-[\ldots]}\ \mu$m$^2$/s [32]. The diffusion coefficient of cohesin can be estimated by Stokes' law, using a hydrodynamic radius of $R$ = 8.[...] and a viscosity $\eta$ = 1.[...], which gives $D_{\rm co} = 18\ \mu$m$^2$/s, five orders of magnitude larger than that of chromatin, justifying the well-mixed hypothesis.

The time scale $\tau_{\rm ext}$ associated with extrusion on the TAD length scale (i.e., $L \sim$ [...] bp extruded by $n_{\rm ext} \sim 30$ cohesin molecules) is $\tau_{\rm ext} \sim L/(k\, n_{\rm ext}) \sim$ [...] s. The time scale associated with the motion of the chain is $\tau \sim L^2/D_{\rm ch} \sim$ [...] ($L \sim$ [...] nm [35]).

The Hi-C maps we used as reference have a resolution of $5 \cdot 10^{3}$ bp, and thus we used this as the elementary unit of the model. From the density obtained from ref. [26], this corresponds to $a = 67$ nm. We used this quantity as the elementary length scale of the model. The friction constant of the polymer can be obtained from Einstein's equation $\gamma = k_B T / D_{\rm ch}$ and, in terms of the length scale $a$, at room temperature is $\gamma = 4 \cdot 10^{[\ldots]}$ kJ ps/(mol $a^2$). The stepping rate of cohesin is $k = 2 \cdot 10^{-[\ldots]}\ a$/ps. The loading rate used for simulations of $N$ monomers is $k_{\rm on} = N k'_{\rm on}$. Assuming the mass density typical of biomolecules, 1 g/cm$^3$, the mass of a monomer is of the order of $10^{-[\ldots]}$ kg.

Appendix B: Solution of the recursive equation
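The recursion treated in this appendix, $b_{n,m} = b_{n-1,m} + b_{n,m-1}$ with $b_{n,0} = 2^{n+1} - 1$ and $b_{0,m} = 1$, can be checked numerically before going through the generating-function argument. The sketch below tabulates $b_{n,m}$ directly and compares it with a finite binomial sum, $b_{n,m} = \sum_{j=m+1}^{n+m+1}\binom{n+m+1}{j}$. That sum is our own rewriting of the terminating hypergeometric series with argument $-1$, not a formula quoted from the text, and the function names are hypothetical.

```python
from math import comb


def b_recursive(n_max, m_max):
    """Tabulate b[n][m] from the recursion and its boundary values."""
    b = [[0] * (m_max + 1) for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        b[n][0] = 2 ** (n + 1) - 1      # boundary b_{n,0}
    for m in range(m_max + 1):
        b[0][m] = 1                     # boundary b_{0,m} (consistent at n = m = 0)
    for n in range(1, n_max + 1):
        for m in range(1, m_max + 1):
            b[n][m] = b[n - 1][m] + b[n][m - 1]
    return b


def b_closed(n, m):
    """Finite binomial sum assumed equivalent to the hypergeometric form."""
    return sum(comb(n + m + 1, j) for j in range(m + 1, n + m + 2))


def a_closed(n, m):
    """a_{n,m} = b_{n-1,m-1}, valid for n, m >= 1."""
    return b_closed(n - 1, m - 1)


table = b_recursive(6, 6)
print(all(table[n][m] == b_closed(n, m) for n in range(7) for m in range(7)))
print([a_closed(n, 1) for n in range(1, 6)])  # should follow 2**n - 1
```

Because all quantities are integers, the comparison is exact and does not depend on floating-point tolerances.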

Let's define $b_{n,m} = a_{n+1,m+1}$. Eq. (16) can be written as

$$b_{n,m} = b_{n-1,m} + b_{n,m-1}, \tag{B1}$$

which can be solved iteratively starting from $b_{n,0} = 2^{n+1} - 1$ and $b_{0,m} = 1$.

Let's define the bivariate generating function

$$f(x,y) = \sum_{n,m=0}^{\infty} b_{n,m}\, x^n y^m. \tag{B2}$$

Separating the terms with $m = 0$ or $n = 0$, one obtains

$$f(x,y) = \sum_{n\ge 0} (2^{n+1} - 1)\, x^n + \sum_{m>0} y^m + \sum_{n,m>0} b_{n,m}\, x^n y^m = \frac{1}{(1-x)(1-2x)} + \frac{y}{1-y} + \sum_{n,m>0} b_{n,m}\, x^n y^m. \tag{B3}$$

Substituting Eq. (B1) and renaming the indexes,

$$\begin{aligned}
f(x,y) &= \frac{1}{(1-x)(1-2x)} + \frac{y}{1-y} + \sum_{n,m>0}\left[b_{n-1,m} + b_{n,m-1}\right] x^n y^m\\
&= \frac{1}{(1-x)(1-2x)} + \frac{y}{1-y} + x\sum_{n\ge 0,\,m>0} b_{n,m}\, x^n y^m + y\sum_{n>0,\,m\ge 0} b_{n,m}\, x^n y^m\\
&= \frac{1}{(1-x)(1-2x)} + \frac{y}{1-y} + x f(x,y) - \frac{x}{(1-x)(1-2x)} + y f(x,y) - \frac{y}{1-y}\\
&= \frac{1}{1-2x} + x f(x,y) + y f(x,y).
\end{aligned} \tag{B4}$$

Thus,

$$f(x,y) = \frac{1}{(1-x-y)(1-2x)}, \tag{B5}$$

whose series expansion is

$$b_{n,m} = 2^{n+m+1} - \frac{\Gamma(n+m+2)\, F(1, n+m+2, n+2, 1/2)}{2\,\Gamma(n+2)\,\Gamma(m+1)}, \tag{B6}$$

where $F$ is the Gaussian hypergeometric function. This expression can be simplified to

$$b_{n,m} = \binom{n+m+1}{m+1} F(1, -n, m+2, -1) \tag{B7}$$

and thus

$$a_{n,m} = \binom{n+m-1}{m} F(1, 1-n, 1+m, -1). \tag{B8}$$

[1] J. R. Dixon, S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren, Nature , 376 (2012).
[2] Y. Zhan, L. Mariani, I. Barozzi, E. G. Schulz, N. Blüthgen, M. Stadler, G. Tiana, and L. Giorgetti, Genome Res. , 479 (2017).
[3] G. Fudenberg, M. Imakaev, C. Lu, A. Goloborodko, N. Abdennur, and L. A. Mirny, Cell Rep. , 2038 (2016).
[4] G. Fudenberg, N. Abdennur, M. Imakaev, A. Goloborodko, and L. A. Mirny, Cold Spring Harb. Symp. Quant. Biol. , 45 (2017).
[5] S. K. Ghosh and D. Jost, Brief. Funct. Genomics , 119 (2020).
[6] R. J. Spencer, B. C. del Rosario, S. F. Pinter, D. Lessing, R. I. Sadreyev, and J. T. Lee, Genetics , 441 (2011).
[7] G. Wutz, C. Várnai, K. Nagasaka, D. A. Cisneros, R. R. Stocsits, W. Tang, S. Schoenfelder, G. Jessberger, M. Muhar, M. J. Hossain, N. Walther, B. Koch, M. Kueblbeck, J. Ellenberg, J. Zuber, P. Fraser, and J.-M. Peters, EMBO J. , 3573 (2017).
[8] S. S. P. Rao, S.-C. Huang, B. Glenn St Hilaire, J. M. Engreitz, E. M. Perez, K.-R. Kieffer-Kwon, A. L. Sanborn, S. E. Johnstone, G. D. Bascom, I. D. Bochkov, X. Huang, M. S. Shamim, J. Shin, D. Turner, Z. Ye, A. D. Omer, J. T. Robinson, T. Schlick, B. E. Bernstein, R. Casellas, E. S. Lander, and E. L. Aiden, Cell , 305 (2017).
[9] I. F. Davidson, B. Bauer, D. Goetz, W. Tang, G. Wutz, and J.-M. Peters, Science , 1338 (2019).
[10] Y. Kim, Z. Shi, H. Zhang, I. J. Finkelstein, and H. Yu, Science , 1345 (2019).

11] A. L. Sanborn, S. S. P. Rao, S.-C. Huang, N. C. Durand, M. H. Huntley, A. I. Jewett,I. D. Bochkov, D. Chinnappan, A. Cutkosky, J. Li, K. P. Geeting, A. Gnirke, A. Melnikov,D. McKenna, E. K. Stamenova, E. S. Lander, and E. L. Aiden, Proc. Natl. Acad. Sci. U. S.A. , E6456 (2015).[12] M. Yin, J. Wang, M. Wang, X. Li, M. Zhang, Q. Wu, and Y. Wang, Cell Res. , 1365 (2017).[13] Y. Li, J. H. I. Haarhuis, ´A. Sede˜no Cacciatore, R. Oldenkamp, M. S. van Ruiten, L. Willems,H. Teunissen, K. W. Muir, E. de Wit, B. D. Rowland, and D. Panne, Nature , 472 (2020).[14] S. S. P. Rao, M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D. Bochkov, J. T. Robinson,A. L. Sanborn, I. Machol, A. D. Omer, E. S. Lander, and E. L. Aiden, Cell , 1665 (2014).[15] G. Tiana and L. Giorgetti, Curr. Opin. Struct. Biol. , 11 (2018).[16] J. Nuebler, G. Fudenberg, M. Imakaev, N. Abdennur, and L. A. Mirny, Proc. Natl. Acad. Sci.U. S. A. , E6697 (2018).[17] C. A. Brackley, J. Johnson, D. Michieletto, A. N. Morozov, M. Nicodemi, P. R. Cook, andD. Marenduzzo, Phys. Rev. Lett. , 138101 (2017).[18] C. A. Brackley, J. Johnson, D. Michieletto, A. N. Morozov, M. Nicodemi, P. R. Cook, andD. Marenduzzo, Nucleus , 95 (2018).[19] J. F. Marko, P. De Los Rios, A. Barducci, and S. Gruber, Nucleic Acids Res. , 51905 (2019).[20] A. Y. Grosberg and A. Khokhlov, Statistical Physics of Macromolecules (AIP Press, 1994).[21] W. Schwarzer, N. Abdennur, A. Goloborodko, A. Pekowska, G. Fudenberg, Y. Loe-Mie, N. A.Fonseca, W. Huber, C. Haering, L. Mirny, and F. Spitz, Nature , 1270 (2017).[22] G. Bussi and M. Parrinello, Phys. Rev. E, Stat. nonlinear, soft matter Phys. , 56707 (2007).[23] E. P. Nora, A. Goloborodko, A. L. Valton, J. H. Gibcus, A. Uebersohn, N. Abdennur,J. Dekker, L. A. Mirny, and B. G. Bruneau, Cell , 930 (2017).[24] J. Redolﬁ, Y. Zhan, C. Valdes-Quezada, M. Kryzhanovska, I. Guerreiro, V. Iesmantavicius,T. Pollex, R. S. Grand, E. Mulugeta, J. Kind, G. Tiana, S. A. Smallwood, W. de Laat, andL. Giorgetti, Nat. 
Struct. Mol. Biol. , 471 (2019).[25] E. Lieberman-Aiden, N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling,I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, R. Sandstrom, B. Bernstein, M. A. Bender,M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L. A. Mirny, E. S. Lander, and J. Dekker,Science. , 289 (2009).[26] L. Giorgetti, R. Galupa, E. P. Nora, T. Piolot, F. Lam, J. Dekker, G. Tiana, and E. Heard, ell , 950 (2014).[27] D. Marenduzzo, Genome Biol. , 1 (2016).[28] D. Jost, C. Vaillant, and P. Meister, Curr. Opin. Cell Biol. , 20 (2017).[29] A. S. Hansen, I. Pustova, C. Cattoglio, R. Tjian, and X. Darzacq, Elife , 10.7554/eLife.25776(2017).[30] J. Holzmann, A. Z. Politi, K. Nagasaka, M. Hantsche-Grininger, N. Walther, B. Koch, J. Fuchs,G. D¨urnberger, W. Tang, R. Ladurner, R. R. Stocsits, G. A. Busslinger, B. Nov´ak, K. Mechtler,I. F. Davidson, J. Ellenberg, and J.-M. Peters, Elife , 10.7554/eLife.46269 (2019).[31] C. Cattoglio, I. Pustova, N. Walther, J. J. Ho, M. Hantsche-Grininger, C. J. Inouye, M. J.Hossain, G. M. Dailey, J. Ellenberg, X. Darzacq, R. Tjian, and A. S. Hansen, Elife , e40164(2019).[32] G. Tiana, A. Amitai, T. Pollex, T. Piolot, D. Holcman, E. Heard, and L. Giorgetti, Biophys.J. , 1234 (2016).[33] S. Weitzer, C. Lehane, and F. Uhlmann, Curr. Biol. , 1930 (2003).[34] L. Liang, X. Wang, D. Xing, T. Chen, and W. R. Chen, J. Biomed. Opt. , 024013 (2009).[35] A. N. Boettiger, B. Bintu, J. R. Moﬃtt, S. Wang, B. J. Beliveau, G. Fudenberg, M. Imakaev,L. A. Mirny, C.-t. Wu, and X. Zhuang, Nature , 1 (2016)., 1 (2016).
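As a sanity check on the derivation in this appendix, the recursion (B1) with its boundary values can be compared numerically against the coefficients of the generating function (B5) and against the hypergeometric closed form. The sketch below is illustrative only and is not part of the original work: the function names are hypothetical, and it assumes the closed form is read as $b_{n,m} = \binom{n+m+1}{m+1}\,{}_2F_1(1,-n;m+2;-1)$ together with $a_{n,m} = b_{n-1,m-1}$. Because the second parameter of ${}_2F_1$ is a non-positive integer, the series terminates and can be summed exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb


def b_recursive(n_max, m_max):
    """Table b[n][m] built from b[n][m] = b[n-1][m] + b[n][m-1],
    with boundary values b[n][0] = 2**(n+1) - 1 and b[0][m] = 1."""
    b = [[0] * (m_max + 1) for _ in range(n_max + 1)]
    for n in range(n_max + 1):
        b[n][0] = 2 ** (n + 1) - 1
    for m in range(m_max + 1):
        b[0][m] = 1
    for n in range(1, n_max + 1):
        for m in range(1, m_max + 1):
            b[n][m] = b[n - 1][m] + b[n][m - 1]
    return b


def hyp2f1_terminating(n, c, z):
    """Exact value of 2F1(1, -n; c; z) for integer n >= 0 and integer c > 0.
    The series stops at k = n because the Pochhammer symbol (-n)_k vanishes."""
    total = Fraction(0)
    term = Fraction(1)
    for k in range(n + 1):
        total += term
        # Ratio of consecutive terms: (1+k)(k-n) z / ((c+k)(k+1)) = (k-n) z / (c+k)
        term *= Fraction((k - n) * z, c + k)
    return total


def b_closed(n, m):
    """Hypergeometric closed form assumed for b_{n,m}."""
    return comb(n + m + 1, m + 1) * hyp2f1_terminating(n, m + 2, -1)


def b_from_series(n, m):
    """Coefficient of x^n y^m in 1 / ((1 - x - y)(1 - 2x)):
    the convolution of 2^k with binom(n - k + m, m)."""
    return sum(2 ** k * comb(n - k + m, m) for k in range(n + 1))


def a_closed(n, m):
    """a_{n,m} = b_{n-1,m-1}, valid for n, m >= 1."""
    return b_closed(n - 1, m - 1)


if __name__ == "__main__":
    table = b_recursive(8, 8)
    ok = all(
        table[n][m] == b_closed(n, m) == b_from_series(n, m)
        for n in range(9)
        for m in range(9)
    )
    print("recursion, series and closed form agree:", ok)
```

Exact `Fraction` arithmetic is used instead of floating point so that agreement between the three expressions is an equality test rather than a tolerance check.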