[PDF] Chromosome-wide simulations uncover folding pathway and 3D organization of interphase chromosomes

Abstract

Three-dimensional interphase organization of metazoan genomes has been linked to cellular identity. However, the principles governing 3D interphase genome architecture and its faithful transmission through disruptive events of cell-cycle, like mitosis, are not fully understood. By using Brownian dynamics simulations of Drosophila chromosome 3R up to time-scales of minutes, we show that chromatin binding profile of Polycomb-repressive-complex-1 robustly predicts a sub-set of topologically associated domains (TADs), and inclusion of other factors recapitulates the profile of all TADs, as observed experimentally. Our simulations show that chromosome 3R attains interphase organization from mitotic state by a two-step process in which formation of local TADs is followed by long-range interactions. Our model also explains statistical features and tracks the assembly kinetics of polycomb subnuclear clusters. In conclusion, our approach can be used to predict structural and kinetic features of 3D chromosome folding and its associated proteins in biological relevant genomic and time scales.

Full PDF

CChromosome-wide simulations uncover folding pathway and 3D organization of interphase chromosomes

Davide Michieletto , Davide Marenduzzo ∗ , Ajazul H. Wani ∗ SUPA, School of Physics and Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK Department of Biotechnology, University of Kashmir, Srinagar, J&K, India, 190006. ∗ Corresponding authors: [email protected], [email protected]

Three-dimensional interphase organization of metazoan genomes has been linked to cellular iden-tity. However, the principles governing 3D interphase genome architecture and its faithful transmissionthrough disruptive events of cell-cycle, like mitosis, are not fully understood. By using Brownian dy-namics simulations of

Drosophila chromosome 3R up to time-scales of minutes, we show that chromatinbinding proﬁle of Polycomb-repressive-complex-1 robustly predicts a sub-set of topologically associateddomains (TADs), and inclusion of other factors recapitulates the proﬁle of all TADs, as observed experi-mentally. Our simulations show that chromosome 3R attains interphase organization from mitotic stateby a two-step process in which formation of local TADs is followed by long-range interactions. Our modelalso explains statistical features and tracks the assembly kinetics of polycomb subnuclear clusters. In con-clusion, our approach can be used to predict structural and kinetic features of 3D chromosome foldingand its associated proteins in biological relevant genomic and time scales.INTRODUCTION

Eukaryotic genomes have a sophisticated hierarchical three-dimensional organization, ranging from nucleosomes to loops,topologically-associating domains (TADs) [1, 3, 4, 8] and chro-mosome territories [5]. Understanding this multi-scale organiza-tion is a fundamental problem in biology as it has been linked todevelopment, differentiation and diseases [6–8]. Application ofchromosome conformation capture (3C) and microscopy-basedmethods has provided unprecedented insight into this problem.3C-based methods have shown that genomes of metazoans areorganized into TADs separated by boundaries [1, 3, 4, 8]. Whilethe mechanism leading to the formation of TADs and the bio-logical nature of boundaries are still not fully understood, goodcorrelations between enrichment of speciﬁc proteins and histonemodiﬁcations in different TADs have been observed [3, 8].Chromatin-binding proteins, usually present in the formof multiprotein complexes, interact through a combination ofprotein-protein and DNA-protein interactions and can provide theenthalpy to compensate for the entropic costs associated with theformation of TADs or other compacted or looped forms of chro-matin. The multiprotein complex, Polycomb repressive complex1 (PRC1), can compact chromatin in vitro , and it is known that in vivo it bridges chromatin regions separated by tens of kbs toMbs [3]. The ability of PRC1 to mediate intra-domain as well aslong-range chromatin interactions stems from its self-associationproperty mediated by SAM-domain polymerization of its PH sub-unit [3]. Heterochromatin protein 1 (HP1) also self interacts andstabilizes heterochromatic regions. Although CTCF and otherinsulator proteins mark boundary regions, they are also foundwithin TADs and can mediate long-range loops either by homo-typic or heterotypic protein-protein interactions [10].Association of chromatin-bound proteins with each other alsoregulates their subnuclear distribution. Microscopy studies haveuncovered subnuclear organization of some proteins into clusters,or foci . Polycomb group (PcG) proteins are organized into hun-dreds of nanoscale clusters and their subnuclear distribution iscoupled to chromatin topology [3]. In addition, “transcriptionfactories” underlying active gene clusters, Cajal bodies and nu-clear speckles are other features of subnuclear protein distribution linked to gene expression [11, 12]. Super-resolution microscopycoupled to FISH, in contrast to Hi-C, a population averaged tech-nique, can provide single cell information and physical scales ofvarious chromatin features [13–15].Many of the chromatin-associated proteins regulating inter-phase organization dissociate from chromatin when cells enterinto mitosis, where chromatin is reconﬁgured [16–22]. As aconsequence, how interphase chromatin features are preservedthrough chromatin reconﬁguration stages of cell division isnot well understood. Faithful inheritance and imprinting ofepigenetic traits regulating interphase chromatin organizationfrom parent cell to daughter cells ensures maintenance of propergene expression which, in turn, determines cellular identity [23].Monitoring real time relaxation of chromosomes from the mitoticto the interphase state would shed much light into this complexissue, but such experiment has not been realised yet, as itappears very challenging and demands extensive high resolutionobservations.In parallel with this rapidly growing body of experimental data,there have been a number of polymer physics models aiming atelucidating different aspects of genome organization within eu-karyotic nuclei. Some of the models [24–26] start from the Hi-Cdata and have the goal of ﬁnding an appropriate force ﬁeld forthe intra-polymer interactions which can recreate these data. Acomplementary approach, so called “bottom-up”, starts from ba-sic biological assumptions and on the sequencing, e.g. , chromatinimmuneprecipitation coupled to sequencing (ChIP-Seq), datasetscharacterizing the one-dimensional (1D) genomic features. Fromthis information, but rigorously avoiding any prior knowledge ofthree-dimensional (3D) contacts or genomic architecture, thesemodels explore via large-scale computer simulations the result-ing 3D spatial organization of the genome [7, 27, 28, 30–32]. Inthis case, the predicted 3D structure can then be directly com-pared with Hi-C and microscopy data. The two approaches arealso sometimes denoted as “inverse” and “direct” modelling re-spectively. Our current work is an example of the “direct” mod-elling avenue, and its key strength is in its much enhanced pre-dicted power. As our direct modelling approach does not involvecomputationally costly ﬁtting procedures, we can simulate very a r X i v : . [ q - b i o . S C ] A p r large regions, in this case entire chromosomes, and follow theirdynamics for time-scales which are biologically relevant, up toseveral minutes. These two aspects of our approach enable us tomake prediction of structural and kinetic features of chromosomerelaxation in biologically relevant genomic and time scales.In this work we use Brownian dynamics simulations to studythe folding of the entire Drosophila chromosome 3R (chr3R). Inour model, this chromosome is viewed as a semi-ﬂexible polymermade up of coarse grained beads (each representing about 3 kbpof the ﬂy’s genome, see SI, Methods), which interact with solu-ble proteins, or protein complexes. Our minimal model is basedon the simple biological assumption that protein complexes aremainly responsible for driving the observed 3D organization ofthe chromatin [27, 28, 33–35]. Since these proteins can bindto chromatin at many places, they can bridge non-contiguouschromatin regions to which they are bound. Loops mediated byCTCF, chromatin contacts mediated by polymerization of PcGprotein PH [36], and heterochromatin-associated complexes suchas dimers of HP1 [37] are some examples of chromatin con-tacts orchestrated by a combination of DNA-protein and protein-protein interactions.For simplicity, we only directly model protein bridges, and dis-regard monovalent proteins. The afﬁnity of chromatin beads for aprotein complex is either directly set according to the chromatinimmunoprecipitation (ChIP-seq) data, or as per the local chro-matin states, which have previously been inferred from variousChIP-seq data [4, 5].In order to test the predictions of our model, we compare oursimulations with both, experimentally obtained chromatin contactmaps and subnuclear size distribution of protein clusters. In allcases, we ﬁnd good overall agreement between simulations andexperimental observations, which is remarkable given the sim-plicity of the assumptions in our model. In addition, we alsotrack the folding dynamics of the chromosome up to experimen-tally relevant time-scales of several minutes. From this study wedraw an important conclusion: the folding dynamics of an in-terphase chromosome proceeds in a two-step fashion. First, localcontacts within a TAD form fast and reproducibly; concomitantly,nuclear protein clusters self-assemble within minutes. Later on,non-local contacts between genomic regions in different TADsare established. This process is slower and it is accompanied byan equally slow coarsening of the nuclear protein clusters. Impor-tantly, the ﬁrst step (local folding into TADs and nuclear proteincluster assembly) appears solely dependent on the protein bindinglandscape along the chromosome and strongly suggests that thisminimal 1D information can lead to the faithful 3D organizationof the genome. On the other hand, there is a notable dependenceon the chosen initial chromosomal state in the higher order fold-ing; the formation of non-local, inter-TAD interactions (which isaccompanied by coarsening of nuclear protein clusters) has also asigniﬁcant stochastic component, as different contacts are set upin different simulations – similarly to what observed in single cellHi-C experiments. In what follows, we will refer to both depen-dencies on initial conditions and on stochasticity by saying thatthe second folding step depends on the conformational “history”of the system.

RESULTSPRC1 binding pattern robustly predicts the formation of a sub-setof the experimentally observed TADs

We begin by modelling the chromatin ﬁbre of

Drosophila chro-mosome 3R (chr3R) during interphase as a semi-ﬂexible polymer(see SI, Methods) interacting with soluble proteins, representingthe polycomb repressive complex 1 (PRC1). This complex bindsto chromatin at many places, and a subunit of it, polyhomeoticPH, self-polymerizes, making PRC1 able to effectively bridge be-tween different regions of the chromatin ﬁber [3]. The afﬁnitybetween PRC1 and each chromatin bead is informed by ChIP-seqdata (see SI, Methods) and can vary between two possible values:high afﬁnity binding (interaction strength (cid:15) ) and weaker afﬁnity(interaction strength (cid:15) < (cid:15) ). The relative afﬁnities of these twotypes of binding sites are set accordingly to the relative bindingintensities obtained from ChIP-seq data [3] (see SI Table T1 andFig. S1).We initialized the simulations from three possible conforma-tions of the polymer thereby simulating different initial chro-mosomal states: (i) random walk (RW), (ii) stretched randomwalk (S) and (iii) fractal globule (FG) (see right-bottom cornerof Figs. (a),(b) and (c), respectively, for examples of such con-ﬁgurations). While the random walk represents the simplest poly-mer physics choice (see Suppl. Movie M1), the stretched randomwalk may better emulate a post-mitotic elongated ﬁber (Suppl.Movie M2). Finally, the fractal globule state is motivated bythe homonymous model, according to which interphase chromo-somes are folded so that they can crumple and segregate whilstremaining entanglement-free. As a consequence of this peculiarglobal folding, contacts in the fractal globule are established al-most exclusively among genomic regions which are also contigu-ous in 1D sequence space [40] (see Suppl. Movie M3). Althoughall initial conﬁgurations are prepared in such a way that they pos-sess a small knotting probability, we do not check for the pres-ence of knots in either initial or ﬁnal conﬁgurations. Nonetheless,the chain is allowed to cross through itself with an exponentiallysmall probability (see SI) in order to quickly move away fromhighly entangled, topologically frustrated, conﬁgurations.Starting from these initial conﬁgurations, we followed the evo-lution of contacts of chr3R for about 2-20 minutes of real time(calculated assuming a nucleoplasmic viscosity of 10-100 cP, re-spectively, ten or a hundred times that of water, see SI, Methods).Remarkably, we ﬁnd a very similar local domain structure for allthree initial conditions (as shown in the zoom of the region 10-15Mbp in the second and third rows of Fig. (a)-(c)), which suggeststhat the formation of some topological domains is largely drivenby PRC1 binding and self-association.Interestingly, in the case of the fractal globule, while at thebeginning the domains of enriched contacts are visible at mul-tiple scales (see top row in Fig. (c)), as expected for a fractalstructure, later on these domains rearrange to settle into their ﬁ-nal locations, which is determined by the ChIP-seq binding sitesof PRC1, and bear little resemblance with the initial state (seeFig. (c) and compare second and third rows with ﬁrst row, seealso SI). Remarkably, the ﬁnal domain locations predicted fromall three different starting conditions are in good agreement with Figure 1 : PcG protein driven topology of chr3R in simulations: Local domains form robustly and similarly for all initial chromosomal states,whereas non-local contact patterns reﬂects the system history.

Here we show the contact maps and the contact probability for different initialconﬁgurations and concentration of binding proteins. The top three rows show contact maps: the left picture shows the contact map which is rotatedby 45 degrees and truncated to show more clearly local domains along the diagonal in the region 10-15 Mbp; the right picture shows the full contactmap. The three rows correspond to, respectively: starting conﬁguration (top); ﬁnal conﬁguration with 500 PRC1 proteins (middle); ﬁnal conﬁgurationwith 1500 PRC1 proteins (bottom). The ﬁnal conﬁgurations are obtained after simulation time units, or 2-20 minutes (see SI for the mappingbetween simulation and physical units). The fourth row shows the decay of the contact probability as a function of genomic distance in the initial (greyline) and ﬁnal conﬁgurations (coloured lines), and a snapshot of the initial conﬁguration (right picture). In the contact probability plot we also showpower law ﬁts with different effective exponents in the intradomain ( < Mbp) and interdomain ( > Mbp) range. (a) (b) and (c) correspond to Randomwalk (RW), stretched (S) and fractal globule (FG) initial chromosomal states. See text and SI for further details on the preparation. By comparinginitial and ﬁnal rows one can notice that local domains form robustly and similarly for all initial conditions, whereas non-local contact patterns dependmuch more strongly on initial conditions. This is also supported by the two exponents found for the decay of contact probability: close to -1 forsub-megabase contacts and for all cases, while case-subjective for the inter-TAD ( > Mbp) regime. See also Suppl. Movies M1-M3.

TADs as found in Hi-C experiments [8] and classiﬁed as PcGTADs (see SI Fig. S2).Differently to the local domain folding, the non-local patternof contacts, visible as the off-diagonal structure in the full contactmap (reported in Fig. for the various cases), is more strongly af-fected by the starting conﬁguration – the contact maps also showvariability in different runs, so that several runs need to be av-eraged when computing the contact map. For the stretched ran-dom walk, local interactions ( < P ( s ) , Fig. bottomrow) conﬁrms that there is a “local” scale where the informationregarding the initial chromosomal state is lost and a “non-local”scale which carries the signature of system history. This can beseen from the power law decay exponent of P ( s ) . Interestingly,we ﬁnd that restricting the datasets to contacts between regionscloser than 1 Mbp in sequence, i.e. , “local” interactions, the ef-fective exponent measuring the decay of contact probability withdistance (within chr3R) is close to -1 whatever the initial chro-mosomal state. This values becomes closer to the experimen-tally observed one [8] ( ∼ − . ) when a more detailed model isconsidered (see later sections and SI Fig. S9). Nonetheless, thishistory-independent emerging exponent reﬂects the fact that thelocal organization is solely determined by chromatin-associatedproteins and their binding proﬁles, rather than the initial chromo- somal architecture, which, as we showed, is lost in steady state(Fig. ). The exponent characterizing non-local interactions is in-stead observed to vary signiﬁcantly with initial condition, whichonce again suggests that the inter-TAD organization is dependenton the system history. The fact that there can exist two effec-tive exponents: one for local, intradomain interactions, and an-other one for non-local, or interdomain, interactions is also sup-ported by the most recent HiC data for contacts within humanchromosomes [41]. These ﬁndings also complement the onesin Ref. [27], which demonstrated that the effective exponent isdependent on interaction strength and protein concentration. Inour simulations we in fact observe that the effective exponent forlong-range contacts depends on the history of the system, whichtranslates into a sensitivity on timing of the cell cycle or on sam-ple preparation. Including binding patterns of other proteins predicts location ofremaining TADs

The results discussed above pertain to the case in which onlyPRC1 interacted with the chromatin ﬁber. We refer to this caseas the “polycomb-only” model from now on. There are otherproteins which act as architectural players in shaping the 3Dorganization of chromatin. Heterochromatin associated proteinHP1, transcription factors and insulator proteins associated withenhancer-promoter loops are common examples. We therefore

Figure 2 : Comparison between predicted and experimentally observed chromatin topology using “polycomb-only” or “full” model simula-tions.

In each of these plots we rotate the contact maps in order to better compare the TADs obtained from simulations (bottom halves) to the onesfrom Hi-C experiments [8] (top halves) for three chosen segments of ch3R (left:1.5-2.35Mbp; middle:12-13.4Mbp; right:23-24.4Mbp). The top rowshows the comparison with the “polycomb-only” model while the bottom row with the “full” model. It is evident that the latter better captures theposition and the ﬁne structure of the experimentally observed TADs, comprising heterochromatic and active domains, while the former only capturesthe presence of PcG TADs [8] (shown in the middle column). tackled the problem of understanding how these additional fac-tors change the chromosome folding, which has been thus fardriven by the PcG proteins alone. To this end, we introducedtwo other sets of binding and bridging factors, one representinga heterochromatin-binding factor, and another one representing aeuchromatin-binding factor. We chose these generic binders forthese epigenetic states because mechanistic aspects driving fold-ing of these chromatin states are not as clearly understood as inthe case of PcG proteins, which have been clearly shown to thecompact and regulate the topology of chromatin [42].We marked chromatin beads according to their afﬁnity for thethree binders in our simulations: (i) PRC1, (ii) heterochromatin-binding factor and (iii) euchromatin-binding factor. While theafﬁnity between chromatin and PRC1 was determined throughChIP-seq data as previously discussed, we used chromatin statedata from Refs. [4, 5] to determine the interactions with the othertwo factors, as follows. First, we used chromatin states fromRef. [4], and stipulated that heterochromatin factors only bind(with weak afﬁnity, (cid:15) = 2 k B T ) to “black” chromatin state,which labels generic heterochromatin (accounting for almost halfof the Drosophila genome). The heterochromatin factors maytherefore model proteins like histone protein H1, AT-Hook pro-tein D1, SUUR, and Su(Hw), all proteins found in black chro-matin and associated with architectural roles or the establishmentof higher order chromatin structure [4]. Second, we used the chro-matin state model from Ref. [5] to identify enhancers (state 1 inRef. [5]) and promoters (state 3), as chromatin regions where eu-chromatin factors bind strongly (with high afﬁnity, (cid:15) = 9 k B T ).Such euchromatin factors may represent clusters of transcription factors, bridging factors like CTCF and polymerases which canloop the chromatin of active genes. We also allowed binding ofeuchromatin factors to transcribed regions (states 2, 4 in Ref. [5])with weak afﬁnity, (cid:15) = 3 k B T . We refer to the model with threesets of binders as the “full-model”. For simplicity, we present inthis case mainly results obtained with a mitotic-like initial chro-mosomal state (see preparation details in SI and Suppl. MoviesM4-M6). This structure is qualitatively similar to the stretchedstate used for the polycomb-only model, but has additionally aninternal looped structure matching that found in some models formitotic chromosome organization [6] (see SI).We ﬁnd that incorporation of additional bridges in the fullmodel leads to formation of further TADs, while PcG drivenTADs remain largely unaltered. This suggests that, to a ﬁrst ap-proximation, the bridges act mostly independently of one another.Fig. shows a comparison between the predicted and experi-mental contact maps across different regions of chr3R. At 1.5–2.35 Mbp a big TAD ﬂanked by smaller TADs appear in the fullmodel. The polycomb-only simulations (from any initial condi-tion) show instead a weak domain encompassing these, but withno well deﬁned boundary. As shown by the comparison, theseTADs clearly correlate with TADs from Hi-C data, and fall in the“null” class [8]. Next at 12–13.4 Mbp, additional TADs ﬂankingthe central PcG TAD (highlighted by the light blue box) come upin the full model; these are absent in the polycomb-only model,and again they correlate well with contacts map from Hi-C. In-terestingly, on the left hand side, one can further see that TADsplits into two subdomains, as in experiments. Towards the endof chr3R there is a region, located at 23–24.4 Mbp, which re-mains completely structureless in the polycomb-only simulations(top row), but instead folds into regular TADs in the full model,yet again matching well with the Hi-C data.To further validate the predicted contact map, 4C simulationsusing a bait at 12.77 Mbp on chr3R were performed (SI Fig. S3).An asymmetric distribution of contacts was observed around thebait, which is expected given that the bait lies close to the bound-ary of the TAD (see Fig. and SI Fig. S3). This trend is wellreproduced by both polycomb-only and full model simulations.Comparing with experimental 4C contact map, detailed analysisof the results shows another intriguing trend. The polycomb-onlymodel shows four large peaks to the left of the bait which correlatewith the 4C data, but are much larger than in experiments. Simi-larly, also the forward contacts predicted by this model are signif-icantly more than those observed by 4C-seq. On the other hand,the number of contacts made by the bait decline in the case of thefull model and the agreement with the experimentally observed4C curve improves (SI Fig. S3). Therefore, although at ﬁrst ap-proximation we can say that each of the bridges act separatelyto form separate TADs, the non-PcG bridges can screen PcG-mediated interaction, e.g. , by sequestering weak binding sites forPRC1. Overall, both models give a good prediction for the posi-tion of troughs and peaks in the 4C experiment.From the above-presented simulations, it appears that 1D in-formation obtained from ChIP-seq data can successfully be usedto predict the chromatin contacts. Although this procedure canlead to a static picture of interphase chromatin organization, it isalso crucial to understand the dynamics of formation of contactsand folding. In particular, how the interphase organization is at-tained by a chromosome upon exiting mitosis is still not clear.Understanding the relaxation of chromosomes from mitotic to in-terphase states may yield insight into how chromatin organizationis inherited across cell generations and enabling cells to remem-ber their identity. Our large scale simulations allow us to probeexperimentally relevant time-scales and in light of this we pro-ceed to analyse the chromosomal folding dynamics and study itsfeatures. Chromatin folding is a two-step process

In order to understand how chromosomes attain locus-speciﬁcinterphase organization from a locus-independent mitotic state,we studied the relaxation dynamics of chr3R in silico . This pro-cess would be very difﬁcult to probe experimentally as it wouldentail, for instance, an extensive Hi-C study of temporally syn-chronised cells. In light of this, simulations provide an ideal toolto gain some insight into this complex process.Following the evolution of contacts from three distinct initialconﬁgurations using the polycomb-only model, different kineticsis observed for the formation of local and non-local contacts. Forall three initial conditions, Fig. shows that local contacts are es-tablished faster than non-local ones, and the growth rates also takesimilar values across different initial conditions. This suggeststhat the starting conﬁgurations chosen to model the initial stateof chr3R do not affect its folding kinetics. Contacts are classiﬁedas “local” when occurring between beads < > ) also sug-gests that local TADs are formed earlier than the establishmentof non-local contacts. This can be readily seen in Fig. , wherewe show the evolution of a selected region of chr3R simulatedusing the “full model” and starting from a random walk conﬁgu-ration. This case displays a quick local folding (within ∼ ∼

12 sec-onds) which does not show further folding and suggests the estab-lishment of a long-lived conﬁguration (up to 2 minutes). Similarcontact maps for chromosomes starting from mitotic-like conﬁg-urations are reported in the SI (Figs. S6-S7 and Suppl. MoviesM4-M5).Using the full-model, we simulated folding of chr3R from amitotic like conformation, which is one of the major chromatin re-organization events in vivo . Fig. shows that also with this morecomplete model, and more realistic initial conditions, the conclu-sions on the kinetics of chromosome folding reached on the basisof the simpler polycomb only model remain unaltered. The chro-mosome appears to fold in a two-step fashion, with local contactsbeing established signiﬁcantly earlier than non-local ones. Timeevolution of contact map from mitotic-like state shows that non-local contacts continue to form after the establishment of localTADs (SI Fig. S6-S7, Suppl. Movies M4-M6). The fraction oflocal contacts decreases while fraction of non-local contacts in-creases upon relaxation from the mitotic-like state. This appearsto be in agreement with experimental observations [47], whereapplication of Hi-C showed that mitotic chromosomes are de-pleted of long-range contacts, TADs, but enriched in very short-range (local) contacts. Here also the nature and genomic coordi-nates of contacts appears different between initial and ﬁnal con-formations. On the other hand, the dynamics of folding is againvery robust against different initial conditions. In Fig. we alsoshow the folding dynamics of chr3R starting from a random walkinitial conﬁguration (see Suppl. Movie M7). This is both qual-itatively and quantitatively strikingly similar to the dynamics ofand ﬁnal states reached by the mitotic-like starting chromosome(compare ﬁnal snapshots in insets of Fig. and the time-scales inthe two cases). These results strongly suggest a broad robustnessof the folding dynamics which might give a clue on the repro-ducibility and stability of TADs in vivo .Even though the kinetics are qualitatively similar for differentinitial conﬁgurations and the two models considered (compareFigs. and ), the quantitative balance struck between local andnon-local contacts over time varies for the three different initialconditions. For the RW case, an increase in local and a decreasein non-local contacts is observed, while for the stretched conﬁg-uration a moderate decrease in local contacts is seen. Finally, forthe FG case, the fraction of local contacts increases, while thatof non-local ones decreases over time. This dynamic quantita-tive evolution is accompanied by an even more signiﬁcant quali-tative dynamic change in the nature and genomic coordinates ofcontacts, which are very different in the initial and ﬁnal statesacross different starting conditions (compare top and bottom rows Figure 3 : Two-step folding of chr3R.

This ﬁgure shows the dynamic formation of “local” ( < Mbp) and “non-local” ( > Mbp) contacts insimulations of “polycomb-only” model. The top row shows the initial and ﬁnal distribution of the contacts for the random walk (red, left), stretched(green, middle) and fractal globule (purple, right) initial conﬁgurations. The bottom row shows the time evolution of the number of local and non-localcontacts from the initial state to the ﬁnal one for the same initial chromosomal state. In all cases, there are two different time-scales, one associatedwith the fast formation of local contacts and another one associated with the slow formation of non-local contacts. This is strong evidence for atwo-step mechanism for the folding of chromosomes exiting any arbitrary initial state. Here, τ Br can be mapped to 1-10 minutes of real time (seeSI). in Fig. ).Given that the extent of non-local contacts varies across initialconditions, one might expect that upon waiting a very long timeall initial conditions would eventually reach the same steady stateproperties for the non-local contacts. Even if this were the case, itis likely that, in practice, this time-scale may not be easily reach-able in reality: for instance, the equilibration (Rouse) time of apolymer the size of chr3R in isolation (and under inﬁnite dilu-tion) is time units, or 100-1000 minutes, and nuclear crowd-ing is likely to further slow down the dynamics dramatically [6].It is therefore possible that the system may still evolve, but onmuch larger time-scales; alternatively, it may be effectively ar-rested in a metastable state dependent on the initial condition. Wefavour the latter interpretation on the basis of our simulations, asno detectable change in the pattern is observed even in selected,longer simulations of time units. Furthermore, the radius ofgyration and average cluster size appear to have reached a steadystate within our simulation time, implying there might not be fur-ther major conformational changes (SI Fig. S4-S5). This is alsostrongly supported by Figs. S6-S7, where we show the time evo- lution of selected contact maps. Protein clusters form concomitantly with chromosome folding

In all of the above-presented simulations we showed that chro-matin associated proteins bind to the genome at speciﬁc sites, andthen interact with each other to bridge different regions of chro-matin. This implies that the distribution of chromatin-associatedproteins will change as chromatin will reorganize. We thereforenow analyse the dynamics of PRC1 during chromosome folding in silico , as the subnuclear distribution and dynamics of this pro-tein complex are relatively well studied experimentally [3, 50].Using both models and for all initial conditions, we remark-ably ﬁnd that the PRC1 proteins rapidly form clusters, which thengrow more slowly and appear to reach a stable steady state withinour simulated time window, after which no appreciable coars-ening occurs (Fig. and SI Fig. S5). Although we consideredseveral different possible values for the concentration of PRC1,we found that all these, except the smallest one (100 proteins for Figure 4 : The folding dynamics is arrested and displays long-livedTADs matching the experimental ones.

The time evolution of domainsbetween 23 and 24.5 Mbp is shown. The contact maps are made by aver-aging over 6 independent simulations starting from a RW conﬁgurationand simulated using the “full” model. The snapshots represent conﬁg-urations taken from one of the simulations. This ﬁgure clearly showsthat local interactions and intra-TAD arrangements occur on faster time-scales than inter-TAD ones. It also suggests that the folding structureachieved at short time ( ∼ seconds) is very long lived (virtually un-changed from 12 seconds on to 120 seconds) and well captures the epi-genetic states along the polymer (see bottom left ﬁgure). the whole of chr3R) led to very similar cluster size distributions.Remarkably, by assuming a diameter of about 30 nm per com-plex, these distributions compare well with those observed bySTORM (Fig. ). Because the afﬁnities of PRC1 are only de-termined qualitatively from ChIP-seq data [3] (see SI), we furthervaried the values of high and low afﬁnity sites used in the sim-ulations, and found that, while clustering is observed for all thecases we considered, the distribution only matches well that ob-served by STORM for values of binding for high afﬁnity sitesin the range 8-10 k B T and for weak sites in the range 2-4 k B T (see SI Fig. S2). These values are reasonable estimates for DNA-protein interactions, which need to be in a range to allow stablebut not irreversible binding. This also appears to be in agree-ment with about four-fold difference in binding intensity betweenlow and high intensity ChIP-seq peaks (Fig. S1) The clusteringoccurs through the following positive feedback mechanism: theformation of a small cluster of PRC1 proteins is followed by anincreased local concentration of polycomb group binding sites;because of this enhanced concentration, more PRC1 proteins arerecruited there leading to a further increase in binding site density.This is an example of a generic mechanism through which DNA-binding proteins with multiple binding sites along the chromatincluster, and which has previously been dubbed “bridging-inducedattraction” [28, 34], because it only works with proteins whichcan bridge chromatin (or indeed any other polymer to which theybind). In this case we consider that binding is speciﬁc, and thepositive feedback loop we have outlined further requires that asingle chromatin bead can bind to more than one protein complex.Because a chromatin bead represents 3.1 kbp (see SI, Methods),this is a realistic assumption for PRC1, as many of the ChIP-seq Figure 5 : The two-step folding process is also observed in the fullmodel. (left)

Kinetic of the formation of local and non-local contacts(deﬁned and normalised as in Fig. ) for the full model, where chr3R isinitialised in a mitotic cylinder conﬁguration (top row) (see SI, Ref. [6]and snapshot in inset) and as random walk (bottom row). Time is mea-sured in simulation units (or Brownian times τ Br , see SI for mapping tophysical units) and τ Br can be mapped to 1-10 minutes of real time. (right) Histograms showing the non-local and local fraction of contactsin the simulated chromosome conformation at the start and end of thesimulations shown for the two cases. The qualitative and quantitativesimilarity of the ﬁnal conﬁgurations, distribution of contacts and foldingdynamics is remarkable given so different initial conﬁgurations (com-pare snapshots in inset) and strongly suggest a very robust mechanism ofgenome organization. peaks are smaller than bead size and each of them can likely bindmultiple molecules. We note that the afﬁnity range we consideredleads to stable rather than transient foci. This is in line with theslow recovery times observed for polycomb bodies; also note thatpost-translational protein modiﬁcation, not included here, maypotentially render foci dynamics faster.We followed the dynamics of PRC1 during folding of chr3Ralso using the “full-model” and observed that PRC1 started bridg-ing different segments of chr3R by self-associating and this re-sulted in the formation of PRC1 clusters. A biphasic kinetics wasobserved for self-association of PRC1, a fast phase followed byslow phase reaching to steady-state (SI Fig. S5). Assuming a sizeof 30 nm for PRC1, and the formation of spherical cluster (seeSI), we observed that cluster diameter reached 120 nm which isthe range of experimentally observed values.In both the polycomb-only and the full model, the biphasicclustering dynamics of PRC1 mirrors the two-step chromatinfolding. In particular, the time-scales associated with the biphasicdynamics compare well quantitatively with the time-scale associ-ated with the faster local folding and the slower non-local foldingof the chromatin ﬁbre. The growth of PRC1 clusters also follow

Figure 6 : Formation of PRC1 clusters in chr3R in computer simulations and STORM experiments. (a) - (c) The ﬁnal steady state distributionof cluster diameters in the three cases considered in this work and for different number of proteins N ph are shown. The three cases correspond tohow we initialise the chromatin: as a random walk (RW), a stretched random walk (S) or a fractal globule (FG) (see SI). (d) The cluster diameterdistribution measured from STORM experiments [3] shows a remarkable agreement with the simulations. (e) - (f) Snapshots from simulations showingthe clustering of polycomb proteins (red). (g) - (h) Images form STORM experiment [3]. The scale bar is 500 nm in all images. approximate power laws, where the apparent exponent is signif-icantly larger for the faster initial dynamics (Fig. S8). The ap-parent exponents are always lower than 1/3 at late times, whichwould be the exponent expected of diffusive coarsening for clus-ters of self-attracting particles [48]: possibly, this discrepancy isdue to the fact that the coarsening droplets are made up by DNA-bound, rather than freely diffusing, proteins.

DISCUSSION AND CONCLUSIONS

In summary, we studied the folding dynamics of a

Drosophila chromosome (chr3R), covering realistic genomic and time scales,by means of Brownian dynamics simulations. These simulationsare based on the simple assumption that chromosome structureis maintained by the action of “bridges”: these are architecturalproteins, or protein complexes, which can bind chromatin at manysites. We considered three types of chromatin-binding bridges: (i)PRC1, (ii) a generic heterochromatin-associating factor, and (iii)a generic euchromatin-associating factor, bridging enhancers andpromoters. Prior to this full model, we also considered a simpli-ﬁed, polycomb-only, version where only the role of PRC1 wasmodeled. This allowed us to study in more detail the nature andassembly dynamics of polycomb-associated structures. Our mod-eling is informed by a series of sequencing data, on the basis ofwhich we selected the afﬁnity between a chromatin “bead” and aprotein, these include ChIP-seq data (for PRC1), and chromatinstate data (for heterochromatin- and euchromatin-associating fac-tors). Our model can follow the evolution of both proteins andchromatin, and we compare the nuclear organization we ﬁnd insilico with that found experimentally. An example of typical ﬁnalconﬁguration that we obtain from our simulations is represented in Fig. (a), where we also show the typical 3D folding of hete-rochromatin (Fig. (b)), PcG binding sites (Fig. (c)) and euchro-matin (Fig. (d)).An important outcome of our work is that the folding dynam-ics of eukaryotic chromosomes in silico proceeds in a two-stepfashion. First, local domains (TADs) form fast (Figs. - and SIFigs. S6-S7) and reproducibly (independent of the initial chromo-somal conﬁguration, Fig. ). Later on, non-local contacts formmore slowly, and their structure markedly depends on the initialchromosomal state (Figs. and , ). TADs have emerged as uni-fying feature of metazoan domains, but the mechanisms drivingtheir folding are not clear. Our simulations suggest that interac-tion among chromatin bound proteins can play a signiﬁcant rolein driving the folding of TADs. Observation of good correlationbetween predicted chromatin topology with that of experimen-tally observed topology demonstrates that 1D binding landscapeof chromatin proteins can be helpful in predicting 3D organiza-tion.This is an important difference with respect to the previouswork on the folding of Drosophila chromosome fragments (1 Mbin size) [7], where signiﬁcant dependence of the initial condi-tion at the local (intradomain) scale was shown. The large-scaleBrownian dynamics simulations reported in this work suggestinstead a broader robustness with respect to the initial chromo-somal conformation (see also Suppl. Movies M1-M3 and M6-M7). In this previous study [7] effective interactions betweenchromatin beads were only considered, whereas here we directlymodel the proteins which mediate such interactions. Thereforewe are able to study the self-assembly of chromatin associatedproteins like polycomb subnuclear clusters and we can follow di-rectly the consequences of changing the nuclear concentration of

Figure 7 : Snapshots from simulations of the “full model”. (a-d)

Showthe ﬁnal conﬁguration of chr3R, as simulated via the “full” model (seetext for details). Yellow, black and magenta spheres represent PRC1,heterochromatin-associating, and euchromatin-associating bridging pro-teins, respectively. White chromatin beads are non-binding; Chromatinbeads coloured with hues of blue and hues of green denote weak andstrong afﬁnity to euchromatin-associating bridges; Red and orange chro-matin beads have strong and weak binding afﬁnity only for PRC1 pro-teins, respectively; beads coloured with hues of purple denote bindingsites for heterochromatin-associating proteins. Beads representing eu-and hetero-chromatin can also have some (weak or strong) afﬁnity toPRC1 proteins (resulting from the coarse graining procedure), and aretherefore shown with different hues of colour. (a)

All bead types areshown; (b)

Shows heterochromatin-associating bridges (black) and theirbinding sites; (c)

Shows PRC1 bridges (yellow) and their binding sites; (d)

Represent active chromatin regions together with their associatedprotein bridges (magenta). such proteins. Furthermore, because we model the chromosomeand associated proteins for biologically realistic time-scales, wecan directly probe the chromosomal dynamics; on the other hand,in a scaled-down model kinetic features can only be indirectly ac-cessed as the dynamics of the slowest dynamical processes, i.e. ,the formation of non-local interactions with genomic regions out-side the simulation domain, are not included.In addition, while our results for the dynamics of long-range(non-local) contacts can be ﬁtted with a time-scale of about sev-eral minutes (Fig. and ), the length of our simulation run-timeallow us to reach a steady state, indicating that the chromosomesbecome trapped into very long-lived metastable states which canretain a memory of their conformational history. Our ﬁndingsare therefore also consistent with the conjectured glassy dynam- ics of chromosomes [6, 45]. More generally, our results suggestthat the folding into local TADs is essentially an equilibrium pro-cess, driven by the interaction between chromatin and chromatin-associating bridging factors, while the same does not apply tothe pattern of non-local contacts, and inter-domain interactions,which both depend more strongly on the initial chromosomalstate and on stochasticity, leaving a signature of glassy dynamics(Fig. ). In other words, we ﬁnd that inter-TAD interactions maydepend on two main factors: (i) the chosen initial conﬁguration,as one can notice from the full contact maps in Fig. , and (ii)the (stochastic) temporal sequence of long-range contacts whichare established. The latter implies that different simulations mayhave different inter-TADs contacts as they follow independent tra-jectories in the phase space. This last observation is fully consis-tent with results obtained with single cell Hi-C, which displayedcell-to-cell variability due to intrinsic stochasticity [46]. Nonethe-less, we ﬁnd that local interactions, or intra-TADs structures, arelargely unaffected by such factors and this strongly supports theimportance and stability of bridging-induced attraction in estab-lishing local topological structures in interphase chromosomes.Our simulations of the two-step relaxation of chr3R startingfrom a mitotic-like state can be used to suggest a possible modelfor the complex rearrangements that chromosomes must undergoduring the cell cycle. In particular, the simulations predict a dy-namics during which chromosomes ﬁrst establish local contactsupon binding of chromatin proteins which are released from chro-matin as cells enter into mitosis [49], and then, later on, non-localcontacts form. The parent cell and daughter cells have the samechromatin organization, which in turn regulates gene expression,but during cell division huge reorganization of chromatin takesplace. Understanding the mechanism and kinetic pathway bywhich interphase contacts are re-established after cells exit frommitosis can provide important information in order to addresshow memory for cellular identity is maintained across cell gen-erations [49]. Our simulations suggest that chromatin associatedproteins play an important role in attaining the interphase organi-zation after exiting mitosis.In contrast to most of the simulations in the literature, ashighlighted previously, we followed the dynamics of chromatin-binding proteins during chromosome folding. In our simulations,nuclear bodies made up of chromatin binding proteins arise nat-urally, through a generic positive feedback loop (also known as“bridging induced attraction” [28, 31, 34]). Notably, this feed-back does not proceed indeﬁnitely, and polycomb clusters onlygrow up to a steady state size in our simulations; the structuraland kinetic features of these nuclear bodies also match very wellwith experimental observations. The size distribution of poly-comb clusters in steady state in silico is close to that observed in Drosophila

S2 cells by STORM (Fig. ). Furthermore, a poly-comb cluster can self-assemble in just 1-10 min in our simula-tions, which is similar to the assembly time found in vivo [50].Strikingly, the self-assembly dynamics of the nuclear bodies alsoshows a two-step kinetics with power law growth, as well as ar-rested coarsening (see SI, Fig. S5).Assumptions and predictions of our simulations appear reason-able as the ﬁnal output correlates well with experimental observa-tions. Overall good agreement is observed between the predictedand experimental contact maps (Fig. and SI Fig. S2), which is0even more enhanced when the full model is implemented (Fig. ).In addition, the comparison with the 4C-seq data (SI Fig. S3) sug-gests that the three types of bridges which we consider share someof their binding sites, so that they compete with each other tobind to these, and, as a result, knocking out some of the proteinsdoes not necessarily open up the associated chromatin domain.Therefore by incorporation of further one-dimensional informa-tion (from ChIP-seq data) better and more robust predictions canbe made about three-dimensional chromatin organization.In conclusion, the current work provides a framework withinwhich to model in a realistic way structural and kinetic featuresof the 3D organization of chromatin and chromatin-binding pro-teins. We expect this approach to be generalized in order to pre-dict how the organization is affected when the chromatin-proteinbinding afﬁnity landscape changes, e.g. , during development or indiseases, mainly based on the ChIP-seq data of the key bridgingfactors used here. In addition, in light of the observations re-ported here and of the size of the Drosophila genome, we suggestthat, with the aid of supercomputers, it is now theoretically feasi-ble to follow, in silico , the folding dynamics of entire eukaryoticgenomes on realistic time-scale. This represent a crucial mile-stone towards achieving a more comprehensive understanding ofthe key mechanism leading the 3D organization of the eukaryoticnucleus.

Simulations Details

The entire chromosome 3R of

Drosophila melanogaster wassimulated as a ﬁnitely-extensible and semi-ﬂexible [1] bead-spring co-polymer chain at a resolution of nm or . kbp.The different type of beads composing the polymer representthe speciﬁc chromatin states as obtained from ChIP-seq tracks of Drosophila

S2 type cells [8]. The persistence length of the chainis set to l p = 3 σ (cid:39) kbp. The dynamics of the monomers isevolved by means of a Brownian Dynamics (BD) scheme. Thefriction imposed on each monomer and the stochastic term addedby the Brownian Dynamics scheme take into account the pres-ence of a surrounding thermal bath whose temperature is set to T = 300 K (see SI for further details). The Langevin equationis then integrated using the LAMMPS package in the canonical(NVT) ensemble. The systems were simulated in dilute regime(volume fraction ρ = N πσ / V ol (cid:39) . ) and conﬁnedin a box to avoid self-interactions through the periodic bound-aries. The binding proteins are modelled as spheres with diam-eter σ = 30 nm; these interact sterically with other proteins andthe chromatin via a cut-and-shift LJ potential (WCA) allowing foran attraction region between speciﬁc chromatin states and corre-sponding binding proteins. The afﬁnity of the binding proteins tospeciﬁc chromatin loci is varied to probe the robustness of the re-sults (see SI for details). The initial conﬁgurations we consideredare the most addressed in the literature and range from the self-avoiding random walk to a fractal globule state to model a ran-dom interphase conformation and from an elongated (persistent)random walk to a collapsed stack of rosettes to model possiblepost-mitotic situations (see SI for further details). ACKNOWLEDGEMENTS

DMa and DMi acknowledge support from the ERC (Consol-idator Grant THREEDCELLPHYSICS, Ref. 648050). AHW is aRamanujan fellow, awarded by Department of Science and Tech-nology, India. [1] Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M.,Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner,M.O., et al. (2009). Comprehensive mapping of long-range inter-actions reveals folding principles of the human genome.

Science

Cell

Nature

Cell

Nat. Rev. Gen.

Cell

Nat. Struct. Mol. Biol.

Trends CellBiol.

Nat. Comm. , 10291.[10] Seitan, V. C. et al. (2013) Cohesin-based chromatin interactions en-able regulated gene expression within preexisting architectural com-partments. Gen. Res.

23: 2066-2077.[11] Sleeman, J.E., and Trinkle-Mulcahy, L. (2014). Nuclear bodies:new insights into assembly/dynamics and disease relevance.

Curr.Opin. Cell Biol.

Chem. Rev.

Nat. Rev. Gen.

Mol. Cell. [19] Buchenau P, Hodgson J, Strutt H, Arndt-jovin DJ (1998) The Dis-tribution of Polycomb-Group Proteins During Cell Division and De-velopment in. Cell 141(2):469481.[20] Xing H, Vanderford NL, Sarge KD (2008) The TBP-PP2A mitoticcomplex bookmarks genes by preventing condensin action. Nat CellBiol 10(11):13181323.[21] Fonseca JP, et al. (2012) In vivo Polycomb kinetics and mitoticchromatin binding distinguish stem cells from differentiated cells.Genes Dev 26(8):857871.[22] Follmer NE, Wani AH, Francis NJ (2012) A Polycomb Group Pro-tein Is Retained at Speciﬁc Sites on Chromatin in Mitosis. PLoSGenet 8(12).[23] Steffen P A, Ringrose L (2014) What are memories made of? HowPolycomb and Trithorax proteins mediate epigenetic memory. NatRev Mol Cell Biol. 15(5):34056.[24] Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I.,Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., Grib-nau, J., Barillot, E., Blthgen, N., Dekker, J., and Heard, E. (2012).Spatial partitioning of the regulatory landscape of the X-inactivationcentre. Nature

Methods

Proc. Natl. Acad. Sci. USA

Proc. Natl.Acad. Sci. USA

Proc.Natl. Acad. Sci. USA

Nucl. Ac. Res.

Nucleic Acids Res

GenomeBiol.

Biophys. J.

J. Phys. Condens. Matter

Elife

Nat. Rev. :123–135.[37] Kilic, S., Bachmann, A.L., Bryan, L.C., and Fierz, B. (2015). Mul-tivalency governs HP1 α association dynamics with the silent chro-matin state. Nat. Commun.

Cell , 212–224.[39] Kharchenko, P. V. et al. (2011). Comprehensive analysis of thechromatin landscape in

Drosophila melanogaster . Nature :480–485.[40] Mirny, L. A. (2011). The fractal globule as a model of chromatinarchitecture in the cell.

Chromosome Res. :37–51.[41] Sanborn, A. L. et al. (2015). Chromatin extrusion explains keyfeatures of loop and domain formation in wild-type and engi-neered genomes. Proc. Natl. Acad. Sci. USA early edition, doi:10.1073/pnas.1518552112.[42] Francis NJ (2004) Chromatin Compaction by a Polycomb GroupProtein Complex. Science (80) 306(5701):15741577.[43] Vandenbunder, B., Fourr´e, N., Leray, A., Mueller, F., V¨olkel, P.,Angrand, P. O. and H´eliot, L. (2014). PRC1 components exhibit dif-ferent binding kinetics in Polycomb bodies.

Biol. Cell. :111–125.[44] Rosa, A., and Everaers, R. (2008). Structure and dynamics of inter-phase chromosomes.

PLoS Comput Biol

Phys. Rev. Lett.

Nature

Science , 948–953.[48] Chaikin, P. M. and Lubensky, T. C. (1995).

Principles of CondensedMatter Physics , Cambridge University Press, Cambridge.[49] Alberts B., Johnson A., Lewis J., Morgan D., Raff M. (2014)

Molecular Biology of the Cell , Taylor & Francis.[50] Brand, P., Lenser, T. and Hemmerich, P. (2010). Assembly dynam-ics of PML nuclear bodies in living cells.

PMC Biophysics :3.[51] Kremer K, Grest GS (1990) Dynamics of entangled linear poly-mer melts: A molecular-dynamics simulation. J. Chem. Phys. Supplementary MaterialSimulation Details

The ﬁnitely-extensible worm-like chain is modelled via theKremer-Grest model [1] as follows: Let i and d i,j ≡ r j − r i be respectively the position of the center of the i -th bead and thevector of length d i,j between beads i and j , the connectivity ofthe chain is treated within the ﬁnitely extensible non-linear elas-tic model with potential energy, U F ENE ( i, i + 1) = − k R ln (cid:34) − (cid:18) d i,i +1 R (cid:19) (cid:35) for d i,i +1 < R and U F ENE ( i, i + 1) = ∞ , otherwise; here wechose R = 1 . σ and k = 30 (cid:15)/σ . The bending rigidity of thechain is captured with a standard Kratky-Porod potential, U b ( i, i + 1 , i + 2) = k B T l p σ (cid:20) − d i,i +1 · d i +1 ,i +2 d i,i +1 d i +1 ,i +2 (cid:21) , where we set the persistence length l p = 3 σ in order to reproducethe observed stiffness of chromatin, i.e. l p (cid:39) nm.The steric interaction between monomer a and monomer b (ofsizes σ a and σ b , respectively), representing either a chromatinregion, or bridge proteins is taken into account via a truncatedand shifted Lennard-Jones (or WCA) potential U LJ ( i, j ) = 4 (cid:15) ab (cid:34)(cid:18) σ c d i,j (cid:19) − (cid:18) σ c d i,j (cid:19) − (cid:18) σ c r c (cid:19) + (cid:18) σ c r c (cid:19) (cid:35) for d i,j < r c and 0 otherwise and where σ c = ( σ a + σ b ) / is the minimumdistance between beads a and b and r c is the chosen cut off. Thisparameter is set to r c = 2 / σ in order to model purely repul-sive interactions and instead set to r c = 1 . σ to include attractiveinteractions. In both cases, the potential is shifted to zero at thecut-off in order to have a smooth curve and avoid singularities inthe forces. Purely repulsive interactions, such as those betweenproteins, have (cid:15) ab = 1 , while attractive interactions such as thosebetween PRC1 proteins and Polycomb binding sites have beenvaried to study the robustness of the results. The range of param-eters are reported in Table S1.In order to avoid topologically frustrated conﬁgurations we al-low the beads forming the chain the pass through themselves witha small probability. This is regulated by a soft potential of theform U soft ( i, j ) = A (cid:20) (cid:18) πd i,j r c (cid:19)(cid:21) θ ( r c − d i,j ) (A.1)with A = 20 k B T being the maximum height of the potentialbarrier, i.e. , crossing probability p = exp − .Denoting by U the total potential energy, the dynamic ofthe beads forming the polymer is described by the followingLangevin equation: m ¨ r i = − ξ ˙ r i − ∇ U + η (A.2) Chromatin state Binding Protein (cid:15) ab [ k B T ]PRC1 strong [3] PRC1 6-10 ( (cid:15) )PRC1 weak [3] PRC1 2-5 ( (cid:15) )Heterochromatin [4] HF 2 ( (cid:15) )Euchromatin strong (States 1 3) [5] TF 9 ( (cid:15) )Euchromatin weak (States 2 4) [5] TF 3 ( (cid:15) )TABLE I: Table of the parameters (cid:15) ab used to model the attractive inter-action between DNA and speciﬁc binding proteins. The polycomb bind-ing sites are taken from Ref. [3]. The eu- and hetero-chromatin states areobtained from the works Ref. [4, 5] (see main text). The strong and weakafﬁnities of the PRC1 binding sites are varied to explore a range of cases.The strongly and weakly binding euchromatin represent strong enhancer-promoter interactions and weaker interactions in the other cases (suchas generally active euchromatin), respectively. These can be bound togeneric transcription factors (TF). Heterchromatin is instead bound bygeneric heterochromatin factors (HF). where ξ is the friction coefﬁcient and η is the stochastic delta-correlated noise. The variance of each Cartesian component ofthe noise, σ η satisﬁes the usual ﬂuctuation dissipation relationship σ η = 2 ξk B T .As customary [1] we set m/ξ = τ LJ = τ Br , with the LJtime τ LJ = σ (cid:112) m/(cid:15) and the Brownian time τ Br = σ/D b ischosen as simulation time step. Here, D b = k B T /ξ is thediffusion coefﬁcient of a bead of size σ . From the Stokesfriction coefﬁcient of spherical beads of diameters σ we have: ξ = 3 πη sol σ where η sol is the solution viscosity. One can mapthis to real-time units by using the viscosity of nucleoplasm,which ranges between − cP, and by setting T = 300 Kand σ = 30 nm, as previously deﬁned. From this it follows that τ LJ = τ Br = 3 πη sol σ /(cid:15) (cid:39) . − ms. The numerical integra-tion of Eq. (A.2) is performed by using a standard velocity-Verletalgorithm with time step ∆ t = 0 . τ Br and is implemented inthe LAMMPS engine. We perform simulations up to τ Br which correspond to 2-20 minutes in real time. The thermalenergy k B T is set to (cid:15) , the energy scale of the WCA potential forthe purely repulsive interactions. Modelling the Afﬁnities between Proteins and Chromatin

For the “polycomb-only” model we only consider proteinPRC1 and PcG binding sites; this means that the polymer is madeup of beads that can be one of three types: (i) non interacting, (ii)strongly binding to PRC1 ( (cid:15) ) or (iii) weakly binding to PRC1( (cid:15) ). The polymer is then initialised in a box together with a ran-dom distribution of PRC1 proteins, here modelled as spheres ofdiameter σ which can then freely interact to the polymer. In thefull model the beads can be in one of the 12 possible states whichare made by considering that a bead is a coarse-grained sectionof chromatin ( . kbp) and can therefore encompass more thanone chromatin state. If the two states are not mutually exclu-sive (such as euchromatin and heterochromatin for which only themore abundant state is chosen as the marking type for that bead)then a bead can have have an afﬁnity for both, PRC1 and HP1 or3PRC1 and TF. For the ﬁrst case, one can choose between the 3combinations of afﬁnities ( (cid:15) , (cid:15) or no afﬁnity) for PRC1 and onefor the heterochromatin (cid:15) . For the second case, one has to com-bine three PRC1 afﬁnities and two TF afﬁnities ( (cid:15) , (cid:15) )(see tableT1). In total we therefore have 9 mixed types, one neutral and 2PRC1-only afﬁne. Alongside, three proteins identifying genericbinding proteins (PRC1, HP1 and TF) complete the system. Pre-Equilibration

In order to avoid numerical blow-up and unwanted metastablestates dominated by entangled (knotted) conﬁgurations, we modelthe interaction between beads forming the chromatin via a stiffsoft potential, i.e. U pre − soft ( i, j ; t ) = A pre ( t ) (cid:20) (cid:18) πd i,j r c (cid:19)(cid:21) θ ( r c − d i,j ) . (A.3)with A pre ( t ) a ramp function increasing from 0 to 20 k B T dur-ing the pre-equilibration time. This potential allows for frequentstrand-crossing events at the beginning of the pre-equilibrationand progressively forbids them in time. This aims to reproducethe presence of topological enzymes such as topoisomerases,which can un-entangle knotted chromatin conformations, espe-cially close the mitotic state. The soft potential (with A pre ( t ) = A = 20 k B T ) is also employed later on during the simulationsto allow rare strand-crossings due to particularly entangled andfrustrated conﬁgurations and mimics the presence (although atsmaller concentrations) of Topo II. Initial Conﬁgurations

In this work we four different classes of initial conditions: (i)Self-avoiding Random Walk (RW), (ii) Stretched (S), (iii) Frac-tal globule (FG) and (iv) Mitotic (M). The RW conﬁguration of a N -beads long polymer is generated by standard means in whichevery bond connecting two beads is picked in a random direc-tion, uncorrelated with respect to the previous bond. This is thenturned into a self-avoiding polymer by performing a short run inwhich the beads are subject to the soft potential of eq. (A.3). Thestretched conﬁguration is generated by biasing the RW algorithmin order to display a larger persistence length along the ˆ z directionand by performing the short soft repulsion run. The FG conﬁgura-tion is generated via a quick collapse of a RW conﬁguration. Thisis done by extending the cut-off of the WCA potential describingthe interaction between the polymer beads. In this way we inducea fast non-equilibrium collapse. A conﬁning sphere is also usedto drive the symmetric collapse of the polymer which would, oth-erwise, display segregated (dumbbell-like) sections. The mitoticconformation has been modelled by using the equations reportedin Ref. [6]. The following generalised helix describes a stack ofrosettes of length l r = 200 σ (cid:39) kpb with 12 “leaves” each,meaning that each loop forming a rosette is around kbp long. The equations are x ( φ ) = r chr (cid:104) f + (1 − f ) cos ( kφ ) cos φ (cid:105) y ( φ ) = r chr (cid:104) f + (1 − f ) cos ( kφ ) sin φ (cid:105) z ( φ ) = pφ π where f × r chr = 0 . × σ (cid:39) . σ (cid:39) nm is the ra-dius of the mitotic cylinder, p is the vertical step for each fullturn and k = 6 in order to get 12 leaves per rosette. Thismeans that beads, forming a polymer with contour length L c = 9000 σ (cid:39) . Mbp, are laid into planes of beads eachand with separation of σ between planes, leading to a cylinder ofheight h (cid:39) . µ m.Interestingly, we observed that the local folding dynamics islargely independent on the chosen initial conﬁguration. An ex-ample is reported in Fig. S2 where we compare a PcG topologi-cally associated domain folding on sub-megabase scales and con-sistently recovered for several choices of initial conditions andwell matching with the experimentally observed structure. Onthe other hand, we observed that the non-local, long range, fold-ing is instead affected by the choice of initial condition. This canbe seen from the full contact maps reported in Fig. 1 of the maintext. Observables

The most important observables that we keep track of are thecontact maps, radius of gyration and clustering of PcG proteins.These quantities are reported in main text and SI, and if not oth-erwise speciﬁed, they are produced by averaging over 6 indepen-dent replicas. The contact maps are obtained using a cut-off of σ for 3D neighbouring beads. The radius of gyration computed as R g = 12 N N (cid:88) i =1 N (cid:88) j =1 [ r i − r j ] . (A.4)The clustering analysis is performed setting a cut-off distance d c of σ , for which two beads are in the same cluster only if theirdistance is smaller than d c . All other observables, can be obtainedby further processing these quantities. Robustness of interaction parameters

The robustness of the results is probed by repeating the simula-tions for different values of (cid:15) and (cid:15) . We performed the test onlyfor the “polycomb-only” case. We observed that the cluster dis-tribution is consistent with experiment (see Fig. 6 in main text), i.e. it shows a peak at cluster diameters at d C (cid:39) nm, for arather spread region of the interaction parameters (cid:15) and (cid:15) . Weexplored the ranges (cid:15) ∼ − and (cid:15) ∼ − . In all the caseswe could observe the formation of clusters with diameters rang-ing between − nm. Nonetheless, the nature of the clusterschanged when the difference between (cid:15) and (cid:15) was in the regionof few k B T s or (cid:15) was larger than k B T . In these regimes the4clusters could be seen to contain a larger fraction of weakly poly-merising polycomb binding sites. At the same time, the contactprobability of monomers s segments apart would show a powerlaw behaviour strongly deviating from s − . This is reported forone case where (cid:15) = 10 k B T and (cid:15) = 5 k B T in Fig. S8. Simulated 4C experiments

In order to compare our model with experiments we also per-form a simulated 4C experiment. By placing a “bait” at 12.77Mbp, we can record the contact proﬁle along the rest of the chro-mosome (Fig. S3). Interestingly, the forward and backward pro-ﬁle are not symmetric with respect to the bait. In particular, thebackward proﬁle is generally higher than the forward one, indicat-ing that the location of the bait is at the end of a TAD. In addition,we observe that both, “PcG-only” and “full”, models capture thegeneral behaviour of the proﬁle, although the latter is closer to theexperimentally probed intensity. This can be due to the fact thatconsidering more proteins types and possible binding sites createsa competing effect that lowers and redistributes the chromosomecontacts.

Coarsening

Another important aspect that has been addressed in the litera-ture [7] is the problem of the stability of the observed TADs. Un-derstanding whether the domains are destined to coalesce at longenough times or whether they have reached a steady state is ofparamount importance in both, experiments and simulations andto understand the nature of these domains. In order to investigatethis aspect we have looked at the evolution of the cluster sizes andof the contact maps for various initial conditions during the courseof simulations. The simulations performed can be mapped to upto 2 or 20 minutes (depending on the viscosity η sol (cid:39) − cP) in real time and can therefore shed some light onto the long-time dynamics and evolution of the clusters. We computed theaverage size of the clusters of PH proteins by taking the aver-age number of proteins belonging to a cluster and by assuminga spherical shape for the clusters, obtained the average diameteras d = (cid:104) n (cid:105) / σ (we also assume the proteins to occupy a spher-ical volume of radius σ/ ). We observed that the clusters showa slowing down at large times, in that their average size scales very weakly with time (Fig. S5). In addition, the coarsening dy-namics is never observed to scale faster than t . at large time. InFig. S5 we report the cluster size count as a function of time forthe “polycomb-only” (a-f) and the “full” (g-i) model. In Fig. S6we also report several contact maps at different simulation time-steps for the “full” model and starting from the mitotic state (seeabove for the mitotic initialisation and Suppl. Movies M4-M6).Fig. S6 and S7 show that the structures obtained are long livedand seem to have reached a steady state. In the main text (Fig. 8)we also show the evolution of a shorter segment between 23 and24.5 Mbp from the initial (RW). This again displays a remarkablelong-lived structure. These ﬁgures (S4 - S7) strongly suggest thatthe ﬁnal structures that we obtain are truly stable and long lived.Whether their stability is kinetic or thermodynamic in nature is anopen problem that we aim to address in the future as it has partic-ular relevance towards the understanding of the 3D organisationof in vivo chromatin, where the environment is strongly drivenout-of-equilibrium. [1] Kremer K, Grest GS (1990) Dynamics of entangled linear polymermelts: A molecular-dynamics simulation. J Chem Phys 92(8):5057.[2] Sexton T, et al. (2012) Three-dimensional folding and functional or-ganization principles of the Drosophila genome. Cell 148(3):458472.[3] Wani, A. H., Boettiger, A. N., Schorderet, P., Ergun, A., Munger,C., Sadreyev, R. I., Zhuang, X., Kingston, R. E. and Francis, N. J.(2016). Chromatin topology is coupled to Polycomb group proteinsubnuclear organization. Nat. Comm. , 10291.[4] Filion, G. J., van Bemmel, J. G., Braunschweig, U., Talhout, W.,Kind, J., Ward, L. D., Brugman, W., de Castro, I. J., Kerkhoven,R. M., Bussemaker, H. J. and van Steensel, B. (2010). SystematicProtein Location Mapping Reveals Five Principal Chromatin Typesin Drosophila Cells. Cell , 212–224.[5] Kharchenko, P. V. et al. (2011). Comprehensive analysis of the chro-matin landscape in

Drosophila melanogaster . Nature :480–485.[6] Rosa, A., and Everaers, R. (2008). Structure and dynamics of inter-phase chromosomes.

PLoS Comput Biol

Nucl. Ac. Res.

Cell Figure

S1: (left)

Density of binding sites in

Drosophila genome as a function of their intensities. One can notice two distinct types of bindingsites for a PcG protein, PH. These strong (red) and weak (black) binding sites were identiﬁed previously [3] and sequencing data was deposited inNCBIs Gene Expression Omnibus and accessible under accession number GSE60686. (right)

ChIP-seq (binding) proﬁle for strongly (red) and weakly(black) binding sites on

Drosophila chr3R. The y-axis is arbitrary in that the strong (red) binding sites have higher values than the black ones only forvisualisation purposes. The afﬁnity of the binding sites is tuned to study a range of cases (see text), while the position of the binding sites reportedhere is carefully mapped to the coarse-grained polymer model.

Figure

S2: These plots show the contact maps found experimentally (Exp) , and by simulations (FG-RW-S) for the region 12-13 Mbp. The samecolorbar applies to all plots and is shown at the top. The simulated contact maps correspond to different initial conditions: fractal globule (FG) ,random walk (RW) , stretched conﬁguration (S) . As these results are reported form the “polycomb-only” model, it is important to notice that the PcGTADs are the ones highlighted by the blue squares, and therefore our model correctly captures those, and not the domain in the middle. The localfolding is robust and essentially independent of the initial condition and also shows additional contacts between the PcG domains in agreement withthe experiments. Figure

S3:

Comparison between simulations and 4C-seq data.

The simulated 4C experiment is in qualitative agreement with the experimentalproﬁle (dark grey, from Ref. [3]). Both models, “polycomb-only” (hues of blue) and “full” (hues of red), qualitatively capture the asymmetric proﬁlewhile the latter model also provides some clues on how the competition between binding proteins might play a role in determining the chromatinfolding (see discussion in main text).

Figure

S4: Evolution of the radius of gyration R g of the chromosome ch3R in the case of the “polycomb-only” (left) and and “full” (right) model. Thedifferent initial conﬁgurations are indicated in the ﬁgure legends. The “full” model drives a stronger compaction, possibly due to the more numerousproteins considered. The fractal globule case (the dramatic decay indicates the artiﬁcial collapse of the polymer prior to its release) shows instead aswelling that contributes to losing many local contacts. Figure

S5: (a-b)

Time evolution of the average cluster size. The coarsening dynamics of the binding proteins slows down towards the end of thesimulation, indicating that the systems has reached a (quasi-)steady state. (c-f)

Time evolution of the cluster size for the case of the “polycomb-only”model starting from a random walk conﬁguration and with different number of PH proteins ( N ph ). The cluster size count reaches a steady state whoseaverage is around nm, remarkably close to the one experimentally observed (see main text). (g-i) Coarsening dynamics for the full model. Theclusters of PRC1 display a severely slow coarsening towards the end of the simulation also in this case. Figure

S6: Time evolution of the whole contact map staring from a mitotic-like conﬁguration. The snapshots are taken at t = 0 , . , . , . , . and seconds (for viscosity η = 10 cP). One can notice a very deﬁned rosette structure in (a) that is lost to the advantage of the formation of TADs.See Movie M4. Figure

S7: Time evolution of domains between 10 and 20 Mbp starting from a mitotic-like conﬁguration. The snapshots are taken at t = 0 , . , . , . , . and seconds (for viscosity η = 10 cP). One can notice a very deﬁned rosette structure in (a) that is lost to the advantage of theformation of TADs. See Movie M5. Figure

S8: Cluster distribution, contact probability and a snapshot for the case in which the chromosome 3R of

Drosophila was initialised as a RandomWalk (RW) and with (cid:15) = 10 k B T and (cid:15) = 5 k B T . In this case there are clusters with a large fraction of “polymerisation independent binding sites”(PIBS), as highlighted by the blue arrow. In the ﬁgure, red and blue beads are respectively, strong and weak Polycomb binding sites while grey beadsare neutral. Yellow beads represent PH (or PRC) bridge proteins. Figure

S9: Contact probability for the full model starting from RW or mitotic-like conﬁgurations. One can notice that for both cases, the sub-megabasecontacts decay with an exponent around . − .7