Merging 1D and 3D genomic information: Challenges in modelling and validation

Abstract

Genome organization in eukaryotes during interphase stems from the delicate balance between non-random correlations present in the DNA polynucleotide linear sequence and the physico/chemical reactions which shape continuously the form and structure of DNA and chromatin inside the nucleus of the cell. It is now clear that these mechanisms have a key role in important processes like gene regulation, yet the detailed ways they act simultaneously and, eventually, come to influence each other even across very different length-scales remain largely unexplored. In this paper, we recapitulate some of the main results concerning gene regulatory and physical mechanisms, in relation to the information encoded in the 1D sequence and the 3D folding structure of DNA. In particular, we stress how reciprocal crossfeeding between 1D and 3D models may provide original insight into how these complex processes work and influence each other.


Alessandra Merlotti ∗ Department of Physics and Astronomy (DIFA), University of Bologna, Viale Berti Pichat 6/2, 40127 Bologna, Italy, and INFN Sez. Bologna, Italy

Angelo Rosa † SISSA - Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, 34136 Trieste, Italy

Daniel Remondini ‡ Department of Physics and Astronomy (DIFA), University of Bologna, Viale Berti Pichat 6/2, 40127 Bologna, Italy, and INFN Sez. Bologna, Italy


I. INTRODUCTION

The interplay between the 1D sequence and the 3D folding of DNA in chromosomes and cell nuclei is mediated by the delicate balance between classical physical forces stemming from the nature of DNA as a long, tightly packed polymer filament [1–6] and complex chemical processes governing DNA and histone methylation, nucleosome positioning and the binding of transcription factors to the DNA sequence [7, 8], whose actions represent fundamental driving mechanisms in cell-fate decision [9]. For these reasons, understanding how the 1D genome affects its 3D spatial organization (and vice versa) is a challenging task that requires a deeper understanding of both the physico/chemical forces governing DNA folding and the mechanisms behind gene regulation. Advancing along this ambitious direction is more compelling now than ever, as it is a prerequisite for understanding complex pathologies such as cancer [10], laminopathies and premature aging diseases like Hutchinson-Gilford progeria and Werner syndromes [11, 12].

In eukaryotes, every ≈ 200 basepairs of the long DNA filament of each chromosome wrap around the histone complex [14], creating a necklace-like linear sequence of nucleosomes, commonly known as the 10nm chromatin fiber (see Fig. 1). The present understanding of chromosome organization on spatial scales beyond the 10nm-fiber (in particular with respect to the existence of the “elusive” 30nm-fiber [13, 15–19]) still appears remarkably confused.

However ambitious, merging the information coming from the 1D and 3D levels of knowledge promises to become increasingly affordable in the near future, especially thanks to the recent, dramatic progress in sequencing techniques such as ATAC-seq and ChIA-Drop, which have provided new insights into 3D DNA organization as a function of 1D epigenetic “marks”. ATAC-seq maps chromatin accessibility and nucleosome positioning genome-wide in a faster and more sensitive way than MNase-seq and DNase-seq [20], while ChIA-Drop reveals promoter-centred multivalent interactions [21].

At the same time, high-precision/high-resolution experimental techniques have greatly expanded our understanding of the physico/chemical properties of DNA in vivo:

• Chemical “painting” of DNA sequences by “fluorescence in situ hybridization” (FISH) (Fig. 2) shows that chromosomes fold into compact conformations (chromosome “territories” [22, 27]), which have non-random, gene-correlated locations inside the nucleus [23] and are crucial to correct cell behavior [22, 23, 27]. In particular, territories help maintain a sort of “physical barrier” between close-by chromosomes (see Fig. 2), with a minimal amount of tangling [28] at the borders.

• The internal structure of each territory is revealed by chromosome conformation capture techniques (3C [29] and HiC [24]), which are based on chromatin-chromatin cross-linking followed by DNA sequencing (Fig. 3, top). This procedure showed that chromosomes display a checkerboard pattern of interactions [24], revealing some compartmentalization into open/closed mega-basepair-sized sub-domains (Fig. 3, top). At higher resolution, chromosomes cluster [25] into “topologically-associated domains” (TADs), regions separated by boundaries enriched for specific protein factors and identified by the unusually high number of contacts recorded in each TAD’s interior, which drops suddenly at the boundaries (see the heat maps in Fig. 3, bottom). Interestingly, chromosome organization into TADs appears “universal”, being stable across both different cell lines and different species [26].

With these premises, the core of the challenge lies in elaborating appropriate models that take into account, or even integrate, the 1D and 3D levels of information. In this review, we discuss the state of the art of the computational approaches which, in our opinion, have contributed most to shedding new light on this fascinating and promising research field. To this purpose, we adopt the following outline:

• In Sec. II we discuss 1D models for understanding non-random features in DNA sequences.

• In Sec. III we present a comprehensive catalogue of theoretical models based on polymer physics which describe relevant aspects of chromosome folding. Unless stated differently, we consider models for chromosome conformations during interphase [14], i.e., chromosomes within nuclear confinement.

In both sections we deal mainly with genome structure in higher eukaryotes (like mammals), although we occasionally generalize to other classes of organisms. Finally, we conclude the work (Sec. IV) by highlighting promising directions for future work, in particular with regard to what one can learn by exploiting the connections between 1D and 3D modeling approaches.

[Figure: Outline of the review. Section 1, Introduction: state-of-the-art modelling and experimental evidence. Section 2, DNA sequences: random walk-based null models and deviations from them (Section 2.1, Random walks on DNA: DNA sequences as anomalous walks, experimental evidence; Section 2.2, Dinucleotide interdistances: functional role of specific DNA motifs recovered from sequence statistics). Section 3, Chromatin structure reconstruction (Section 3.1, Bottom-up approaches: principle-driven random polymers with minimal assumptions, features and limitations; Section 3.2, Top-down approaches: inclusion of data-driven constraints to reproduce experimental observations).]

II. READING THE SEQUENCE: 1D MODELS FOR NUCLEOTIDE ORGANIZATION

At the 1D level, the DNA sequence can be represented as an ordinary string of text composed of four letters corresponding to the four nucleotides: A, C, G, T. This simple representation makes it possible to treat genomes as symbolic sequences and thus to exploit the knowledge developed in physics and statistics to extract information about their structure. In particular, two approaches have proved helpful in identifying structural properties of genomic sequences that are involved in the regulation of gene expression, such as coding and non-coding regions [30, 31], enhancers [32] and CpG islands [33]: DNA random walks and dinucleotide interdistances.

FIG. 1: Principles of chromosome folding. I. Schematic cartoon of the 10nm-fiber structure resulting from DNA wrapping around the histone complex. Chromatin folding beyond the 10nm-fiber up to the scale of the whole chromosome remains controversial. Reproduced with permission from Ref. [13].

A. Random walks on DNA sequences

One of the first models of a random walk on a DNA sequence [30] was defined according to the following rule: the walker steps up ($u(i) = +1$) if a pyrimidine (‘C’ or ‘T’ nucleotide) occurs at position $i$ along the sequence; in the opposite case of a purine (‘A’ or ‘G’ nucleotide) the walker steps down ($u(i) = -1$). This rule makes it possible to compute the net displacement of the walker after $l$ steps as

\begin{equation}
y(l) = \sum_{i=1}^{l} u(i) \tag{1}
\end{equation}

and to identify regions with different purine-pyrimidine content by plotting $y(l)$ as a function of the nucleotide distance $l$ (see Fig. 4), where positive slopes correspond to a high concentration of pyrimidines and negative slopes to a high concentration of purines [34]. The power of this simple approach is that different hypotheses on DNA sequence organization can be mapped onto specific “null models” for the characteristics of such random walks, and can thus be tested against the properties of the real sequences.

FIG. 2: Principles of chromosome folding. II. Chromosome “painting” by FISH (panels 1 to 3) reveals that chromosomes occupy distinct territories within the nucleus: panel 4 and panel 5 show examples of chromosome territories in chicken and human fibroblasts, respectively. Panels 1 to 4 are reproduced with permission from Ref. [22]; panel 5 is reproduced from Ref. [23] under Creative Commons License.

A fundamental statistical quantity characterizing any walk is the root mean square fluctuation $F(l)$ around the average displacement:

\begin{equation}
F^2(l) = \overline{[\Delta y(l)]^2} - \overline{[\Delta y(l)]}^{\,2} \tag{2}
\end{equation}

where $\Delta y(l) = y(l_0 + l) - y(l_0)$ and the bars indicate an average over all positions $l_0$ on the gene. The calculation of $F(l)$ is a key step in identifying “anomalous” diffusion. In pure “random” walks $F(l) \sim l^{1/2}$; otherwise $F(l) \sim l^{\alpha}$ with $\alpha \neq 1/2$, which reveals long-range correlations between walk steps, corresponding to correlations in the nucleotide positioning process.

FIG. 3: Principles of chromosome folding. III. (Top) Chromatin contacts by HiC (top) at 1Mbp-resolution can be visualized in terms of heat maps. These maps show a “plaid-pattern” structure of intra-chromosome contacts stemming from chromosome compartmentalization into two (A/B) sub-compartments (bottom). Reproduced with permission from Ref. [24]. (Bottom) At higher resolution, HiC maps reveal topologically-associated domains (TADs), regions characterized by unusually frequent contacts and well separated by narrower, almost interaction-depleted regions [25, 26]. TADs correlate well with known epigenetic marks. Reproduced with permission from Ref. [25].

One of the earliest and most relevant results obtained by applying this method concerns the identification of coding and non-coding sequences inside genes [30]. In particular, long-range correlations were identified as systematic markers of intron-containing genes and non-transcribed genomic regulatory elements, whereas the absence of long-range correlations is characteristic of cDNA sequences and genes without introns (Fig. 4).

FIG. 4: The DNA walk representation of the intron-rich human β-cardiac myosin heavy-chain gene sequence (a), its cDNA (b), and the intron-less bacteriophage λ DNA sequence (c). Note the more complex fluctuations for the intron-containing gene in (a) compared with the intron-less sequences in (b) and (c). Heavy bars denote coding regions of the gene. Reproduced with permission from Ref. [30].
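The DNA walk of Eq. (1) and the fluctuation $F(l)$ of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in Ref. [30]: the function names (`dna_walk`, `rms_fluctuation`, `scaling_exponent`) are hypothetical, the exponent $\alpha$ is estimated by a plain least-squares fit in log-log scale, and any character other than C or T (including ambiguous bases such as N) is treated as a purine step for simplicity.

```python
import numpy as np

def dna_walk(sequence):
    """Displacement y(l), l = 1..N: +1 for pyrimidines (C, T), -1 otherwise."""
    # Assumption: every symbol that is not C or T counts as a purine step.
    steps = np.array([1 if base in "CT" else -1 for base in sequence.upper()])
    return np.cumsum(steps)

def rms_fluctuation(y, l):
    """F(l): standard deviation of y(l0 + l) - y(l0) over all start positions l0."""
    dy = y[l:] - y[:-l]
    # np.std computes sqrt(mean(dy^2) - mean(dy)^2), i.e. the square root of Eq. (2).
    return np.std(dy)

def scaling_exponent(y, lengths):
    """Estimate alpha in F(l) ~ l^alpha from a linear fit of log F versus log l."""
    fluctuations = np.array([rms_fluctuation(y, l) for l in lengths])
    alpha, _intercept = np.polyfit(np.log(lengths), np.log(fluctuations), 1)
    return alpha
```

For instance, `dna_walk("CTAG")` returns the displacements `[1, 2, 1, 0]`. For an uncorrelated sequence the fitted exponent is expected to be close to 1/2, while values appreciably different from 1/2 over a range of $l$ point to long-range correlations.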

Moreover, long sequences (thousands of base pairs) characterized by long-range correlations were found inside non-coding regions. This led Buldyrev et al. [35] to apply a generalized Lévy-walk model to non-coding sequences and to hypothesize the existence of DNA loops. In generalized Lévy walks the typical walk step $l_j$ can be very long (walk steps are distributed according to a power law, $P(l_j) \propto 1/l_j^{\mu}$ with $\mu > 2$), and such steps can be associated with DNA loops of length $l_j$ whose ends come close to each other in space. In fact, Buldyrev et al. pointed out that the long uncorrelated subsequences inside non-coding regions may correspond to repetitive elements, such as LINE-1, or retroviral sequences.

B. Dinucleotide interdistance

Another approach proves very powerful at identifying structural genomic features at the 1D level: the study of dinucleotide interdistance distributions. The idea is inspired by the theory of first-return-time distributions in stochastic and deterministic processes by H. Poincaré, who developed this model to study the trajectories of bounded dynamical systems [36].

For genome sequences, the analysis can be carried out through the following steps. Given a dinucleotide $XY$, where $X$ and $Y$ can take any value among $\{A, C, G, T\}$, its interdistance distribution $\hat{p}(\tau)$ can be calculated by (i) identifying the positions $x_j$ ($j = 1, 2, \ldots$) of each $XY$ along the sequence, (ii) calculating the distance between two consecutive $XY$ as $\tau_j \equiv x_{j+1} - x_j$, (iii) counting the abundance of a given interdistance value $\tau$ and (iv) estimating its relative frequency $\hat{p}(\tau)$ according to the formula:

\begin{equation}
\hat{p}(\tau) = \frac{\#\{ j = 1, 2, \ldots \mid \tau_j = \tau \}}{\#\{ j = 1, 2, \ldots \mid \tau_j \}} , \tag{3}
\end{equation}

where the numerator counts all values for which $\tau_j = \tau$, while the denominator runs over all unrestricted values $\tau_j$.

The first analysis of this quantity on genome sequences [37] showed that dinucleotide interdistance distributions have a pronounced period-3 oscillatory behaviour in protein-coding regions, which is absent in the whole-genome distributions and appears to be related to the triplet structure of the protein-coding genetic code. Furthermore, the comparison between real distributions and randomly generated ones revealed that the behaviour of CG dinucleotides is considerably different from that of all the others. This study opened the way to subsequent works that led to methods for the identification of CpG islands [33], and to a more general characterization of CG interdistances in association with DNA methylation functionalities [38, 39]. In particular, the work of Paci et al. [38] revealed that the CG interdistance distribution in higher-order organisms differs greatly from that of all other dinucleotides (see the comparison between Homo sapiens and Mus musculus in Fig. 5), showing a strong exponential decay

\begin{equation}
\hat{p}(\tau) \sim e^{-\tau/b} . \tag{4}
\end{equation}

This difference seems to be related to the different role that methylation plays in this class of organisms [38]. Interestingly, in higher-order organisms the characteristic “length-scale” $b$ of Eq. (4), which measures the average contour-length distance between consecutive CGs, takes values 200 bp $< b <$ 300 bp, comparable to the typical length of the DNA filament wrapped around the histone complex [14].

An even deeper analysis of CG interdistance distributions was performed in the human genome, identifying the so-called Gamma distribution

\begin{equation}
\hat{p}(\tau) \sim \tau^{a-1} e^{-\tau/b} \tag{5}
\end{equation}

as the best-fitting model distribution [39]. Furthermore, in this work the authors extended the study to a large variety of organisms spanning all available ranges of biological complexity, finding that the value of the parameter $b$ is correlated with the biological complexity of the organism category: it steadily increases moving from bacteria to vertebrates (see Fig. 6, left) and it is strongly correlated with CG density (CG%), displaying in particular a power-law behavior $b \propto (\mathrm{CG\%})^{m}$. The study showed that all categories, except vertebrates, are characterized by an exponent $m \sim -1$, which is compatible with a simple null model predicting that the average distance between dinucleotides is inversely proportional to the dinucleotide density inside the sequence. For vertebrates, instead, the exponent $m$ takes a different value.

FIG. 5: Distribution functions of dinucleotide interdistances ($\tau$, measured in units of DNA basepairs) in log-log scale for Homo sapiens (left) and Mus musculus (right). The distribution for CG dinucleotides is represented in red.

III. FOLDING THE SEQUENCE: 3D MODELS OF CHROMOSOME ORGANIZATION IN EUKARYOTES

“Predicting” 3D chromosome structure starting from the 1D DNA sequence, a question reminiscent in some ways of the analogous protein folding problem [40], is a long-standing and very challenging problem in cell biology. Although the two problems (DNA folding and protein folding) may appear similar, a huge difference lies in the fact that DNA structure is not only guided by the chemical properties of its components (as for protein peptides) but relies on the complex interplay with many epigenetic factors (histones, noncoding RNAs, cohesins, lamins, etc.) that can be guided by “signals” set along the native DNA sequence (transcription factor binding sites, enhancer/promoter binding, DNA methylation, etc.), some of which might still be unknown. Moreover, chromosome state may depend on other important factors such as the particular cellular type, the phase of the cell cycle, gene activity and the mechanisms behind DNA repair [14]. Fortunately, the rapid development and increasing availability of structural data on chromosome organization (FISH and HiC in primis, see Sec. I), alongside increasingly sophisticated analysis tools (see Sec. II) which are now capable of detecting finer and finer correlations in 1D DNA sequences, are rapidly shifting the field towards a more confident description of how chromosomes fold inside the nucleus and how this folding reverberates on chromosome function.

In recent years there has been an impressive “explosion” of models trying to fill the conceptual gap between the 1D DNA or chromatin sequence and the 3D chromosome packing inside the nucleus. Interestingly, most of (if not all) these models have been proposed by physicists and are based on the (rather obvious) assumption that chromosomes are long polymer chains subject to the classical laws of polymer physics [41, 42]: these laws can then be used to describe in vivo chromosome behavior and to make quantitative, testable predictions.

As stressed in the Introduction (Sec. I), chromosome structure inside the nucleus remains highly controversial. It is no surprise, then, that there exists an extensive literature on different polymer models presenting alternative scenarios for the link between chromosome sequence and folding. In the next sections we discuss in more detail some of these models and the physical bases of each of them.

To this purpose, it is instructive to classify the models into two categories:

1. In the first category (Sec. III A), we place those models which rely on relatively few, minimal physical assumptions. The idea behind these approaches is that certain features of chromosome organization are common to all species and, in some respects, are more important than the details contained in each DNA sequence which make each species so different from any other. Minimal models of this kind are extremely useful and instructive because they constitute the paradigm for understanding the “nuclear” forces which continuously remodel the genomes.

2. In the second category (Sec. III B), we consider those polymer models which are constructed to satisfy a certain number of constraints obtained from experimental results. For this reason, we name these data-driven models. These approaches are now becoming especially popular, since one hopes to employ them in the near future to provide accurate predictions on how genomes react when the “native” conditions upon which they were constructed change as the result of some stress on the cell or of some induced mutation in the DNA sequence.

FIG. 6: (Left) Box-plots for the Gamma-distribution scale parameters $b$ (see Eq. (5)) for seven categories of organisms: bacteria (BT), protozoa (PZ), fungi (FG), invertebrates (IN), plants (PL), non-mammal vertebrates (NMV) and mammal vertebrates (MV). (Right) Estimated average values and error bars for the exponents $m$ relative to the same classes of organisms.

A. Chromosome organization by generic, “bottom-up” polymer physics

1. I. The role of topological constraints

Chromosomes consist of long chromatin filaments tightly packed inside the nucleus. Neglecting all details related to the heterogeneity of DNA sequences, to a first approximation the entire system of chromosomes contained in the nucleus can be described as a solution of polymer chains [41, 42] subject to thermal fluctuations. Under these conditions topological constraints, which are known to force nearby polymer chains to move randomly by sliding past each other without passing through each other [41, 42], are expected to play a key role by affecting chromosome structural and dynamical properties.

In fact, it is a non-trivial question how a single centimeter-long chromosome can be efficiently stored inside a nucleus which is typically about a thousand times narrower [14]. While the presence of histone complexes and territories points to the fact that chromosomes maintain a certain level of compactness, it says nothing about how compactness can be practically and efficiently achieved. In this respect, physical theories of polymers may become useful.

A major turning point occurred in the late ’80s when Grosberg and colleagues published two influential papers [43, 48], suggesting that the DNA or the chromatin fiber of a single chromosome should exist in an unknotted, off-equilibrium state which they termed “the crumpled globule”, see Fig. 7(A). Intuitively, this model can be constructed by assuming that the linear DNA sequence folds by hierarchical compaction from small up to the largest scales: this fractal-like conformation has the two advantages of being maximally packed and knot-free. From the theoretical point of view, two possible mechanisms leading to a crumpled globule were suggested: either fast switching of the solvent conditions of the polymer chain from “good” to “bad” (i.e., polymer self-interactions turn from repulsive to attractive [48]) or fast confinement of the polymer into a narrow region [24]. Either way, the chain has no time to fully relax from its initial (knot-free) conformation, and the final state is crumpled and displays the presence of domains. Conversely, when the process of crumpling is slow, the final state is akin to an “equilibrium” globule with no domains (see the comparison between the two contrasting sets of model polymer conformations in Fig. 7(A)). Although interesting from the theoretical point of view, fast crumpling is not expected to take place inside the cell.

In 1999, Langowski and collaborators introduced the so-called random-loop model [49]: interphase chromosome structure was described in terms of a self-repulsive random polymer with pairs of monomers permanently bound to form small loops on the scale of ∼ […] 3D structure of the murine immunoglobulin heavy-chain locus. The random-loop model appears to be in qualitative agreement with chromatin organization into TADs and territories; however, this is not entirely surprising because these motifs were directly imposed on the model and are therefore not really explained.

FIG. 7: The role of topological constraints in chromosome organization. (A, top) Schematic illustration of the “crumpled globule”, showing the different layers in the hierarchical folding. The fundamental units (the monomers, filled spheres) fold into globular structures of larger sizes (the smaller empty spheres), acting in turn as “super”-monomers in the following crumpling event. The process then proceeds to the next stage, and so on. The final structure resembles a fractal [43] with maximal compactness. (A, bottom) Examples of polymer conformations obtained by computer simulations, illustrating the structural differences between equilibrium and crumpled (fractal) globules. Reproduced with permission from Ref. [24]. (B) Because structural and topological relaxations of mitotic-like conformations have markedly different time-scales, chromosomes remain effectively “trapped” into territorial-like conformations [1, 2, 44, 45]. Reproduced from Ref. [44] under Creative Commons License. (C) Chromatin fibers with (negative) levels of supercoiling form TAD-like structures [46, 47], reproducing contact patterns observed in HiC experiments. Reproduced from Ref. [46] under Creative Commons CC BY License.

Instead, crumpled conformations can be easily obtained through a very simple physical mechanism which looks almost like the “reverse” of the one considered in the construction of a crumpled globule. In two publications [44, 45] Rosa and Everaers presented a polymer model for chromosome organization implying that territories emerge “spontaneously” as the result of the slow relaxation of the mitotic-like original chromosome structure (Fig. 7(B)): in other words, the microscopic topological chromatin state remains quenched in time with no chance to relax, and chromosomes get trapped into crumpled, territory-like conformations. It was then proposed [44] that the physical mechanism underlying chromosome compaction is the same one driving the folding of untangled ring polymers in concentrated solutions [2, 51–53]. As demonstrated in [44, 45], the proposed model captures quantitatively generic chromosome features like internal chromatin-chromatin distances and HiC contact frequencies with no fitting parameters, and can be used to model chromosome dynamics on time-scales from seconds to days in real time. Moreover, it can be naturally generalized [54] to take into account the heterogeneity of the DNA sequence.

We conclude this section on the connection between chromosome organization and the topological properties of chromatin fibers by mentioning some recent work by Stasiak’s group in Lausanne [46, 47], which suggests a possible link between the presence of supercoiling in chromatin and TADs (mentioned in Sec. I). Chromosomal DNA is expected to be naturally supercoiled due to continuously ongoing processes like replication and transcription. This excess of supercoiling is expected never to relax, once again because of the typically large size of chromosomes. It may thus induce local crumpling of the chromatin fiber, similar to what happens to a familiar phone cord when excessive twist is applied. By fine-tuning the amount of supercoiling in a numerical polymer model for chromatin fibers, Stasiak and colleagues showed that the phenomenology of TADs, summarized by the excess of intra-domain contacts with respect to inter-domain contacts (see Fig. 7(C)), can be generically captured.
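The “excess of intra-domain contacts with respect to inter-domain contacts” that characterizes TADs can be quantified directly on a contact matrix. The following Python sketch is only an illustration under stated assumptions: the function name `domain_contact_enrichment` and its arguments are hypothetical, the contact matrix is assumed symmetric, and domain boundaries are assumed to be given as the sorted start indices of each domain (the first being 0). It is not the analysis used in Refs. [46, 47].

```python
import numpy as np

def domain_contact_enrichment(contacts, boundaries):
    """Ratio of the mean intra-domain to the mean inter-domain contact frequency.

    contacts   : symmetric (N, N) array of contact frequencies between bins
    boundaries : sorted start indices of the domains, e.g. [0, 30, 75]
    """
    contacts = np.asarray(contacts, dtype=float)
    n_bins = contacts.shape[0]
    # Label every bin with the index of the domain that contains it.
    labels = np.searchsorted(boundaries, np.arange(n_bins), side="right") - 1
    same_domain = labels[:, None] == labels[None, :]
    # Exclude the diagonal: self-contacts carry no structural information.
    off_diagonal = ~np.eye(n_bins, dtype=bool)
    intra = contacts[same_domain & off_diagonal].mean()
    inter = contacts[~same_domain].mean()
    return intra / inter
```

A ratio well above 1 indicates domain-like organization. Note that this simple ratio does not correct for the generic decay of contact frequency with genomic distance, so in practice intra- and inter-domain contacts should be compared at equal genomic separation.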

2. II. Sequence-specific chromatin-chromatin interactions

The polymer models presented in Sec. III A 1 showthat notable chromosome features like intra-DNA posi-tions and contacts may be quantitatively understood interms of the same theoretical mechanisms describing thephenomenology of entangled polymer solutions. On theother hand, there is more to chromosome biology whichrequires a thorough discussion.In this respect, it is known that certain species ofprotein complexes present in the nucleus tend to bindto speciﬁc DNA target sites and inﬂuence chromosomeorganization: important examples include the CCCTCbinding factor (CTCF) involved in promoter activationor repression and methylation-dependent chromatin insu-lation [55] and the trascription units which by clusteringinto transcription “factories” [56] mediate and regulatethe production of transcripts. The role of these protein-DNA interactions in chromosome architecture has beenaddressed in an increasing number of publications.In the so-called “strings-and-binders-switch” (SBS)polymer model [57], chromatin is described as a blockcopolymer where a certain fraction of monomers (the A Polymer B FIG. 8: The role of sequence-speciﬁc chromatin-chromatininteractions in chromosome organization. I. (A) In the“strings-and-binders-switch” (SBS) model, chromatin acts asa block copolymer with site-selective aﬃnity E X for speciﬁcbinders at concentration c m . Chromatin folding/unfoldingcan be represented in terms of the phase diagram in these twoparameters. Reproduced with permission from Ref. [57]. (B)Protein-like particles mimicking transcription factors bindingto cognate sites on a block copolymer model promote chromo-some compaction by forming rosettes and TAD-like domains.The model predicts also the spontaneous self-assembly of pro-teins into factories. Reproduced from Ref. [58] under CreativeCommons CC BY License. “binders”) act as binding sites for freely diﬀusive par-ticles, see Fig. 8(A). 
The binding of particles to DNAis dynamic (binders attach and detach intermittently atﬁnite rates), the mechanism being described in termsof two phenomenological parameters: the binder aﬃn-ity ( E X ) and the binder concentration ( c m ). It is thenpossible to construct a phase diagram in the E X - c m spacewhere a single line separates swollen from compact poly-mer conformations, as in the classical θ -collapse [42, 57]in polymer physics. The SBS model predicts that as peradaptation to continuously-changing external conditionschromatin is switching between these two states througha suitable combination of the concentration/aﬃty of thebinders, thus accounting qualitatively for the observedﬂuctuations in chromatin loci spatial positions and con-tacts as measured in FISH and HiC.In a variation of the SBS-model, Brackley et al. [58]pointed out that protein-like particles mimicking tran-scription factors which bind reversibly to cognate siteson a block copolymer model promote chromosome com-0 A B AAAB7XicbVBNS8NAEJ3Ur1q/qh69LBbBU0mqoMeCHjxWsB/QhrLZbtq1m03YnQgl9D948aCIV/+PN/+N2zYHbX0w8Hhvhpl5QSKFQdf9dgpr6xubW8Xt0s7u3v5B+fCoZeJUM95ksYx1J6CGS6F4EwVK3kk0p1EgeTsY38z89hPXRsTqAScJ9yM6VCIUjKKVWr1bLpH2yxW36s5BVomXkwrkaPTLX71BzNKIK2SSGtP13AT9jGoUTPJpqZcanlA2pkPetVTRiBs/m187JWdWGZAw1rYUkrn6eyKjkTGTKLCdEcWRWfZm4n9eN8Xw2s+ESlLkii0WhakkGJPZ62QgNGcoJ5ZQpoW9lbAR1ZShDahkQ/CWX14lrVrVu6jW7i8r9SCPowgncArn4MEV1OEOGtAEBo/wDK/w5sTOi/PufCxaC04+cwx/4Hz+AGt6jyA=

250 nm AAAB7XicbVDLSgNBEOz1GeMr6tHLYBA8hd0V0WPAi8cI5gHJEmYns8mYeSwzs0JY8g9ePCji1f/x5t84SfagiQUNRVU33V1xypmxvv/tra1vbG5tl3bKu3v7B4eVo+OWUZkmtEkUV7oTY0M5k7RpmeW0k2qKRcxpOx7fzvz2E9WGKflgJymNBB5KljCCrZNa4ZWPpOhXqn7NnwOtkqAgVSjQ6Fe+egNFMkGlJRwb0w381EY51pYRTqflXmZoiskYD2nXUYkFNVE+v3aKzp0yQInSrqRFc/X3RI6FMRMRu06B7cgsezPxP6+b2eQmyplMM0slWSxKMo6sQrPX0YBpSiyfOIKJZu5WREZYY2JdQGUXQrD88ipphbXgshbeh9V6XMRRglM4gwsI4BrqcAcNaAKBR3iGV3jzlPfivXsfi9Y1r5g5gT/wPn8AV7COag== contact probability low high microphase separation (c) coil (a)

FIG. 9: The role of sequence-specific chromatin-chromatin interactions in chromosome organization. II. (A) Chromosomes may fold due to epigenome-specific attractive interactions promoting microphase segregation and TAD-like organization. Reproduced from Ref. [59] under Creative Commons CC BY License. (B) Trimethylated chromosomal sites attracting each other by the mediated action of oligomerized HP1 model proteins drive phase segregation into compact, heterochromatin domains vs. swollen, euchromatin domains. Reproduced with permission from Ref. [60].

paction, see Fig. 8(B). This model outlines a picture where a single chromosome is organized into spatial motifs like rosettes and topological domains similar to the ones observed in HiC experiments. Interestingly, as a by-product the model predicts that proteins self-organize into clusters (or, factories [56]) due to a “bridging-induced attraction” which is mediated by polymer folding.

Alternatively (or, in addition) to the action of the binders, the observed chromosome organization may be the consequence of the partitioning into a small [61] set of distinct epigenomic domains which cluster together by epigenome-dependent attractive interactions. Jost and collaborators [59, 62] have implemented this idea into a copolymer model, where each monomer of a specific epigenomic domain binds exclusively to monomers of the same species. The chromatin fiber associated with each chromosome thus segregates by a physical mechanism known as microphase separation (see Fig. 9(A)) and displays a checkerboard pattern of contacts that may explain chromosome organization into TADs (reported in Sec. I). In a related study involving a very similar computational set-up, Shi et al. have shown [63] that chromatin dynamics is highly heterogeneous, reflecting the observed cell-to-cell variations in the contact maps: folding is a two-step, hierarchical process which involves the formation of TAD-like chromatin domains (or, droplets) followed by their “fusion” inside the entire territory.

An interesting hypothesis on the connection between epigenetic marks (specifically, histone methylation) and chromosome folding has been recently formulated by MacPherson et al. [60]. By using Monte Carlo computer simulations of a nucleosome-resolved polymer model complemented by H3K9me3-methylation patterns from ChIP-seq data, the authors suggested that dimerization of HP1 single protein units, which bind preferentially to methylated chromatin sites, drives chromatin segregation into heterochromatin (dense and H3K9me3-rich) and euchromatin (open and H3K9me3-poor) domains, see Fig. 9(B). The segregation results in plaid-patterned heat-maps resembling those obtained in HiC experiments.
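As a minimal illustration of the copolymer picture described above, the following sketch builds the pairwise interaction matrix of a chain whose monomers attract only monomers of the same epigenomic state. The function name `interaction_matrix`, the energy scale `eps` and the toy state sequence are assumptions made here for illustration; they are not taken from Refs. [59, 62], and no polymer dynamics is simulated.

```python
def interaction_matrix(states, eps=1.0):
    """Pairwise contact energies for a toy block copolymer.

    Monomers in the same epigenomic state attract with energy -eps;
    all other pairs, and the diagonal, are set to zero.
    """
    n = len(states)
    return [
        [(-eps if i != j and states[i] == states[j] else 0.0) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical chain with alternating blocks of two states, "A" and "B".
states = list("AAABBBAAABBB")
E = interaction_matrix(states)

# Text rendering: '#' marks attractive pairs, '.' marks neutral ones.
for row in E:
    print("".join("#" if e < 0 else "." for e in row))
```

Even in this energy matrix alone, the alternating blocks already appear as a checkerboard, which is the pattern that the simulated contact maps inherit when like domains cluster in space.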

3. III. Out-of-equilibrium effects: loop extrusion and activity-induced phase separation

Life is a dynamic process maintained through the continuous contribution of external energy sources: as such, in recent years a conspicuous body of work on the experimental and theoretical aspects of non-equilibrium physics has had a tremendous impact on our understanding of how living matter works [66]. In this respect, chromosomes are no exception. In the following, we summarize a few works which have contributed to highlighting the role of non-equilibrium mechanisms in chromosome organization.

Ganai et al. [64] suggested that the reported correlation between chromosome positioning within the nucleus and gene density (see Sec. I) can be understood as the consequence of different “activity” levels: similarly to the approaches described in previous sections, chromosomes are modeled as coarse-grained polymers; however, in contrast to the purely passive systems discussed so far, here each monomer is classified according to its level of activity (proportional to gene density) and coupled to a specific, effective temperature. Thus, a higher effective temperature means a larger activity. With the addition of a given amount of permanent loops between chromatin fibers, this model shows that chromosomes tend to be partitioned into clusters of different temperatures, see Fig. 10(A). A rigorous physical explanation of this phenomenon was provided in Ref. [67] and later confirmed in Ref. [68] by means of systematic computer simulations: even small temperature gaps induce phase separation in systems of colloids or polymer chains. In spite of the intrinsic out-of-equilibrium nature of the system, it can nonetheless be shown that the phenomenon can be captured by analogy with the classical equilibrium theory of binary mixtures, which phase separate as the result of distinct chemical affinities [42].

FIG. 10: The role of active processes in chromosome organization. (A) Chromatin is classified as “inactive” and “active” depending on its gene content (top). Gene-poor and gene-rich chromosomes phase separate and form territories whose spatial positions with respect to the nucleus correlate with experimental observations (bottom). Reproduced from Ref. [64] under Creative Commons CC BY License. (B) The condensin complex (in yellow) binds to the chromatin fiber (in black) and, by moving in opposite directions, effectively produces chromatin loop extrusion. Extrusion stops when two (or more) complexes bump into each other (top). An apparently disordered tangled mass of chromatin can then self-organize into a regular array of extruded loops (bottom). Reproduced from Ref. [65] under Creative Commons Attribution License.

Recently, it has been pointed out that active loop extrusion may be universally responsible for chromosome segregation during mitosis [65, 69] and for chromosome compartmentalization into TADs [70]. Specific proteins called “condensins” assemble into complexes and bind together spatially close loci on the chromatin fiber, see Fig. 10(B). Then, the chromatin filament fixed by the condensins starts to be effectively extruded when the complex moves in opposite directions along the fiber. When two condensins collide with each other the translocation process stops. Moreover, with the addition of topoisomerase-II the loop extrusion mechanism is able to simplify chromosome topology by removing knots and links [71, 72] between chromatin fibers within the crowded environment of the nucleus.
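The stalling rule described above (two-sided extrusion that stops upon collision) can be sketched on a one-dimensional lattice. This is a toy illustration under stated assumptions, not the simulation scheme of Refs. [65, 69, 70]: the fiber is a row of `n_sites` sites, each extruder is a pair of "legs" initially on adjacent sites, legs never unbind, and the names `extrude`, `start_sites` and `max_steps` are hypothetical.

```python
import random

def extrude(n_sites, start_sites, max_steps=10_000, seed=0):
    """Toy 1D loop extrusion: each extruder has two legs moving apart.

    A leg advances one site per step unless the target site lies outside
    the fiber or is occupied by another leg. Returns one (left, right)
    pair per extruder once no leg can move any more.
    """
    rng = random.Random(seed)
    legs = [[s, s + 1] for s in sorted(start_sites)]
    occupied = {x for pair in legs for x in pair}
    if len(occupied) != 2 * len(legs) or min(occupied) < 0 or max(occupied) >= n_sites:
        raise ValueError("extruders must start on distinct sites inside the fiber")
    for _ in range(max_steps):
        moved = False
        order = list(range(len(legs)))
        rng.shuffle(order)  # random update order, so no extruder is favored
        for k in order:
            left, right = legs[k]
            if left - 1 >= 0 and (left - 1) not in occupied:
                occupied.remove(left)
                left -= 1
                occupied.add(left)
                moved = True
            if right + 1 < n_sites and (right + 1) not in occupied:
                occupied.remove(right)
                right += 1
                occupied.add(right)
                moved = True
            legs[k] = [left, right]
        if not moved:  # every leg is blocked: extrusion has stalled
            break
    return [tuple(pair) for pair in legs]

# Three hypothetical extruders on a fiber of 20 sites.
loops = extrude(20, [3, 10, 15])
print(loops)
```

In the stalled state each leg abuts either a fiber end or the leg of a neighboring extruder, so the loops tile the whole fiber; this is the one-dimensional counterpart of the regular array of extruded loops shown in Fig. 10(B).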

B. Building chromosomes by data-driven, “top-down” polymer models

The polymer models illustrated in Sec. III A employ minimal physical assumptions in trying to capture various aspects of chromosome phenomenology and, for this reason, they have been generically termed “bottom-up”. The most fascinating side of these approaches is that they often make testable predictions which are amenable to experimental validation.

Recently, a number of studies have attacked the problem of chromosome organization from a radically different perspective: instead of explaining experimental observations by employing minimal physics, why not use the information contained in the experiments to deduce the most probable chromosome conformations compatible with the observations?

In two related studies, Di Stefano and coworkers showed that by just enforcing colocalization of coexpressed genes in a polymer model, first for human chromosome 19 [73] and then for the entire human genome [77], without major additional constraints, the resulting conformations (see the example shown in Fig. 11(A)) appear compatible with chromatin classification in A/B sub-domains and with the non-random locations of chromosome territories correlated to gene content (see Sec. I).

In order to exploit the nature of TADs and of chromatin-chromatin interactions measured within a single TAD, Giorgetti et al. [74] introduced a computational polymer model (see Fig. 11(B)) where sequence-dependent monomer-monomer interactions were obtained upon maximizing the agreement between contact frequencies predicted by the model and the ones measured by ordinary conformation capture techniques. The model, targeted onto a specific region of mouse chromosome X, reveals that the structure of a single TAD measured by HiC reflects a full ensemble of fluctuating conformations across the cell population with no stable loops. Interestingly, the model was later tested by inducing a deletion at a specific locus and measuring the altered spatial distances.

FIG. 11: Data-driven polymer models. (A) A polymer model promoting colocalization of coexpressed genes in human chromosome 19 produces conformations organized in spatial macrodomains which correlate with HiC [24] predictions. Reproduced from Ref. [73] under Creative Commons License. (B) A single TAD is modeled as a polymer chain whose beads interact via a square-well potential with an attractive wall. The energy parameters are optimized by iteration of a Monte Carlo sampling scheme so as to maximize the agreement between the predicted and observed chromosome conformation contact maps. Reproduced with permission from Ref. [74]. (C) In the Minimal Chromatin Model (MiChroM), chromatin loci are classified into different types (colors) and certain pairs of genomic loci (“anchors”) tend to form loops. The interaction potential for the polymer chain is trained based on the HiC [24] contact matrix for human chromosome 10, and then used to construct and study the spatial features of the other chromosomes. Reproduced with permission from Ref. [75]. (D) The Polymer-based Recursive Statistical Inference Method (PRISMR) refines the SBS polymer model [57] by “filtering” the simulated chromosome conformations so as to derive the minimal set of binding sites and binding molecules which best reproduces the input HiC contact matrix. Instructing the model on wild-type (WT) chromosome data, the effects of genomic mutations (deletions/inversions/duplications) on abnormal chromosome conformations can then be predicted without additional parameters. Reproduced with permission from Ref. [76].

A similar approach, the Minimal Chromatin Model (MiChroM), was introduced recently by Di Pierro et al. [75] with the intent of expanding the analysis to an entire chromosome and trying to export the derived force-field to describe the whole diploid nucleus. Specifically, polymer loci were classified into chromatin types (as in some of the models considered in Sec. III A 2) and the energy parameters describing the interactions between them were trained by using HiC data for human chromosome 10 from a specific cell line, see Fig. 11(C). The model was then used to predict an ensemble of possible structures for the other chromosomes not used for the training of the energy function: interestingly, the obtained maps match well the ones obtained by HiC and the simulated chromosome structures recapitulate other notable features of interphase chromatin, like microphase separation of chromatin types (Sec. III A 2) and the tendency of open chromatin to remain at the periphery of its territory.

Finally, Bianco et al. refined the SBS model discussed in Sec. III A 2 by introducing the Polymer-based Recursive Statistical Inference Method (PRISMR) [76]: PRISMR works by minimizing a cost function which, again, takes into account the predicted vs. the measured HiC contact frequencies, see Fig. 11(D). The “optimal” polymer model is then exported to construct chromosome conformations for a number of so-called structural variants of chromosomes which are known to produce anomalous chromatin folding and diseases. The protocol is then shown to be very efficient in detecting mutated chromatin-chromatin interactions which are involved in anomalous phenotypes: the work reports in particular the example of the EPHA4 locus, where specific deletions are associated with polydactyly.
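The common logic of these data-driven schemes (propose model parameters, predict a contact map, compare it with the measured one, keep what lowers the mismatch) can be sketched as follows. This is only a toy under loud assumptions and is not PRISMR or MiChroM: the polymer simulation is replaced by a hypothetical closed-form predictor in which two loci contact with weight `p_same` or `p_diff` divided by their sequence separation, the "measured" map is synthetic, and all function and parameter names are invented for illustration.

```python
import math
import random

def predicted(colors, p_same=1.0, p_diff=0.2):
    """Toy stand-in for a polymer simulation: contact map from site colors."""
    n = len(colors)
    return [
        [
            ((p_same if colors[i] == colors[j] else p_diff) / abs(i - j)) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]

def cost(colors, observed):
    """Sum of squared differences between predicted and observed contacts."""
    pred = predicted(colors)
    n = len(colors)
    return sum((pred[i][j] - observed[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))

def fit_colors(observed, n_colors=2, n_steps=5000, temperature=0.01, seed=0):
    """Metropolis search over site colorings that lowers the cost function."""
    rng = random.Random(seed)
    n = len(observed)
    colors = [rng.randrange(n_colors) for _ in range(n)]
    current = cost(colors, observed)
    best, best_cost = colors[:], current
    for _ in range(n_steps):
        i = rng.randrange(n)
        old = colors[i]
        colors[i] = rng.randrange(n_colors)
        new = cost(colors, observed)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp(-(new - current) / temperature):
            current = new
            if new < best_cost:
                best, best_cost = colors[:], new
        else:
            colors[i] = old  # reject: restore the previous color
    return best, best_cost

# Synthetic "measured" map generated from a known (hypothetical) coloring.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
observed = predicted(truth)
best, best_cost = fit_colors(observed)
print(best, best_cost)
```

Because only relative colors matter, the search can at best recover `truth` up to a relabeling of the two types. In the actual methods the predictor is an ensemble of simulated polymer conformations, which is what makes each evaluation of the cost function expensive.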

IV. DISCUSSION

In this article, we have described some of the most popular modelling approaches to 1D and 3D features of genomic DNA sequences.

With regard to 1D features, we have shown (Sec. II) evidence of nontrivial arrangement of nucleotides along the sequence: (1) at the single-nucleotide level, since pyrimidines and purines are not randomly distributed but show long-range correlations up to kb scale (Sec. II A) and (2) at the dinucleotide level (Sec. II B), in particular for CG-dinucleotides associated with DNA methylation, for which the distribution of mutual interdistances along the genome shows a different behaviour from the other dinucleotides and seems correlated with specific regulation mechanisms (CpG islands) or with organism complexity. Thus, the analysis of 1D sequences in these specific cases reveals important properties that go beyond the 1D environment itself, and likely have an impact on (or are influenced by) the surrounding 3D context.

In order to understand how genomes fold in 3D we have presented recent work about molecular modeling (Sec. III) of chromosomes. In this respect, the state-of-the-art is remarkably complex: topological effects (Sec. III A 1), specific DNA-DNA interactions (Sec. III A 2), and energy-driven, active (as opposed to entropy-driven, passive) mechanisms (Sec. III A 3) are all likely to act concurrently. Future work has to dissect each of these mechanisms one by one, with the goal of understanding their relative importance with respect to the full picture.

Inspired by the phenomenology of the “protein folding” problem [40], where the amino acid sequence contains the essential information to drive the protein towards its unambiguous, “native” structure, it is natural to ask to what extent the 1D sequence influences the 3D chromatin architecture, provided that epigenetic factors are a key player to be associated with the DNA sequence. Two recent complementary approaches suggested that a significant amount of spatial contacts detected by chromosome conformation capture techniques can be predicted based on the spatial colocalization of transcription-factor binding sites measured by ChIA-PET [78] or from 1D maps of histone modifications and other epigenetic marks [79]. However, in spite of some evidence pointing to a nontrivial interplay between 1D sequence and 3D folding, the full picture remains poorly understood.

In this respect, some recent attempts (Sec. III B) based on “data-driven” polymer physics with input from epigenetic patterns seem to describe well the spatial structure of chromosomes in vivo and, in some specific cases, are able to identify critical hot-spots along the sequence associated with mutations in the phenotype. At the same time, the 3D chromosome conformation participates actively in the occurrence of epigenetic phenomena along the 1D sequence, such as the formation of loops between specific chromatin loci having distant locations along the sequence. Therefore, it appears plausible that the 3D chromosome organization is “echoed” in the positioning along the DNA sequence of 1D motifs associated with promoters and enhancers regulating gene expression [80], and that it is a major “driving force” in fixing and stabilizing the complex architectures [81, 82] of gene regulatory networks.

Providing answers to these questions represents an exciting challenge which requires concerted experimental and theoretical efforts: the hope for the future is to find a systematic way of addressing unsolved biological and medical challenges linking DNA sequence anomalies, chromosome misfolding and aberrant phenotypic behavior. A combination of 1D and 3D genome information can improve the understanding of pathologies with a “structural” basis, such as the Hutchinson-Gilford progeria syndrome, in which a protein associated with nuclear membrane scaffolding and DNA arrangement is mutated [83], or of pathologies such as cancer [10], characterized by significant expression deregulation due to epigenetic phenomena and in which specific 1D mutational events can be associated with DNA 3D structure [84].

V. ACKNOWLEDGEMENTS

AR and DR would like to acknowledge networking support by the COST Action CA18127. DR and AM would like to acknowledge support by the HARMONY IMI-2 n. 116026.

[1] A. Y. Grosberg, Polym. Sci. Ser. C , 1 (2012).
[2] J. D. Halverson, J. Smrek, K. Kremer, and A. Y. Grosberg, Rep. Prog. Phys. , 022601 (2014).
[3] A. Rosa and C. Zimmer, Int. Rev. Cell Mol. Biol. , 275 (2014).
[4] S. Bianco, A. M. Chiariello, C. Annunziatella, A. Esposito, and M. Nicodemi, Chromosome Res. , 25 (2017).
[5] D. Jost, C. Vaillant, and P. Meister, Curr. Opin. Cell Biol. , 20 (2017).
[6] D. Jost, A. Rosa, C. Vaillant, and R. Everaers, in Nuclear Architecture and Dynamics , edited by C. Lavelle and J.-M. Victor (Academic Press, 2017), vol. 2, pp. 149–169.
[7] A. Arneodo, C. Vaillant, B. Audit, F. Argoul, Y. d'Aubenton-Carafa, and C. Thermes, Phys. Rep. , 45 (2011).
[8] R. Cortini, M. Barbi, B. R. Caré, C. Lavelle, A. Lesne, J. Mozziconacci, and J.-M. Victor, Rev. Mod. Phys. , 025002 (2016).
[9] T. Stadhouders, G. J. Filion, and T. Graf, Nature , 345 (2019).
[10] J.-P. Mallm, M. Iskar, N. Ishaque, L. C. Klett, S. J. Kugler, J. M. Muino, V. B. Teif, A. M. Poos, S. Großmann, F. Erdel, et al., Mol. Syst. Biol. (2019).
[11] H. Heyn, S. Moran, and M. Esteller, Epigenetics , 28 (2013).
[12] K. N. Dahl, P. Scaffidi, M. F. Islam, A. G. Yodh, K. L. Wilson, and T. Misteli, Proc. Natl. Acad. Sci. USA , 10271 (2006).
[13] G. Ozer, A. Luque, and T. Schlick, Curr. Opin. Struct. Biol. , 124 (2015).
[14] B. Alberts et al., Molecular Biology of the Cell (Garland Science, New York, 2014), 6th ed.
[15] D. J. Tremethick, Cell , 651 (2007).
[16] Y. Nishino, M. Eltsov, Y. Joti, K. Ito, H. Takata, Y. Takahashi, S. Hihara, A. S. Frangakis, N. Imamoto, T. Ishikawa, et al., Embo J. , 1644 (2012).
[17] K. Maeshima, R. Rogge, S. Tamura, Y. Joti, T. Hikima, H. Szerlong, C. Krause, J. Herman, E. Seidel, J. DeLuca, et al., Embo J. , 1115 (2016).
[18] K. Maeshima, S. Ide, K. Hibino, and M. Sasai, Curr. Opin. Genet. Dev. , 36 (2016).
[19] H. D. Ou, S. Phan, T. J. Deerinck, A. Thor, M. H. Ellisman, and C. C. O’Shea, Science , eaag0025 (2017).
[20] J. D. Buenrostro, B. Wu, H. Y. Chang, and W. J. Greenleaf, Curr. Protoc. Mol. Biol. , 21.29.1 (2015).
[21] M. Zheng, S. Z. Tian, D. Capurso, M. Kim, R. Maurya, B. Lee, E. Piecuch, L. Gong, J. J. Zhu, Z. Li, et al., Nature , 558 (2019).
[22] T. Cremer and C. Cremer, Nat. Rev. Genet. , 292 (2001).
[23] A. Bolzer, G. Kreth, I. Solovei, D. Koehler, K. Saracoglu, C. Fauth, S. Muller, R. Eils, C. Cremer, M. R. Speicher, et al., Plos Biol. , e157 (2005).
[24] E. Lieberman-Aiden, N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, et al., Science , 289 (2009).
[25] J. R. Dixon, S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren, Nature , 376 (2012).
[26] J. R. Dixon, D. U. Gorkin, and B. Ren, Mol. Cell , 668 (2016).
[27] T. Cremer and M. Cremer, Cold Spring Harbor Perspectives in Biology , a003889 (2010).
[28] M. R. Branco and A. Pombo, Plos Biol. , e138 (2006).
[29] J. Dekker, K. Rippe, M. Dekker, and N. Kleckner, Science , 1306 (2002).
[30] C.-K. Peng, S. V. Buldyrev, A. L. Goldberger, S. Havlin, F. Sciortino, M. Simons, and H. E. Stanley, Nature , 168 (1992).
[31] C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, Phys. Rev. E , 1685 (1994).
[32] A. P. Singh, S. Mishra, and S. Jabin, Sci. Rep. (2018).
[33] V. Afreixo, C. A. C. Bastos, J. M. O. S. Rodrigues, and R. M. Silva, in Optimization in the Natural Sciences (Springer International Publishing, 2015), pp. 162–172.
[34] P. M. Iannaccone and M. Khokha, Fractal Geometry in Biological Systems - An Analytical Approach (CRC Press, 1996).
[35] S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, M. Simons, and H. E. Stanley, Phys. Rev. E , 4514 (1993).
[36] H. Poincaré, Acta Mathematica (1890).
[37] C. Bastos, V. Afreixo, A. Pinho, S. Garcia, J. Rodrigues, and P. Ferreira, J. Integr. Bioinform. , 31 (2011).
[38] G. Paci, G. Cristadoro, B. Monti, M. Lenci, M. Degli Esposti, G. C. Castellani, and D. Remondini, Philos. Trans. Royal Soc. A (2016).
[39] M. A., F. d. Valle I., C. G., and R. D., BMC Bioinformatics , 355 (2018).
[40] V. S. Pande, A. Y. Grosberg, and T. Tanaka, Rev. Mod. Phys. , 259 (2000).
[41] M. Doi and S. F. Edwards, The Theory of Polymer Dynamics (Oxford University Press, New York, 1986).
[42] M. Rubinstein and R. H. Colby, Polymer Physics (Oxford University Press, New York, 2003).
[43] A. Grosberg, Y. Rabin, S. Havlin, and A. Neer, EPL (Europhysics Letters) , 373 (1993).
[44] A. Rosa and R. Everaers, PLoS Comput. Biol. , e1000153 (2008).
[45] A. Rosa, N. B. Becker, and R. Everaers, Biophys. J. , 2410 (2010).
[46] F. Benedetti, J. Dorier, Y. Burnier, and A. Stasiak, Nucleic Acids Res. , 2848 (2013).
[47] F. Benedetti, D. Racko, J. Dorier, Y. Burnier, and A. Stasiak, Nucleic Acids Res. , 9850 (2017).
[48] A. Y. Grosberg, S. K. Nechaev, and E. I. Shakhnovich, J. Phys. France , 2095 (1988).
[49] C. Münkel, R. Eils, S. Dietzel, D. Zink, C. Mehring, G. Wedemann, T. Cremer, and J. Langowski, J. Mol. Biol. , 1053 (1999).
[50] S. Jhunjhunwala, M. C. van Zelm, M. M. Peak, S. Cutchin, R. Riblet, J. J. M. van Dongen, F. G. Grosveld, T. A. Knoch, and C. Murre, Cell , 265 (2008).
[51] A. Rosa and R. Everaers, Phys. Rev. Lett. , 118302 (2014).
[52] J. Smrek, K. Kremer, and A. Rosa, ACS Macro Lett. , 155 (2019).
[53] R. D. Schram, A. Rosa, and R. Everaers, Soft Matter , 2418 (2019).
[54] A.-M. Florescu, P. Therizols, and A. Rosa, Plos Comput. Biol. , e1004987 (2016).
[55] M. Renda, I. Baglivo, B. Burgess-Beusse, S. Esposito, R. Fattorusso, G. Felsenfeld, and P. V. Pedone, J. Biol. Chem. , 33336 (2007).
[56] P. R. Cook, J. Mol. Biol. , 1 (2010).
[57] M. Barbieri, M. Chotalia, J. Fraser, L.-M. Lavitas, J. Dostie, A. Pombo, and M. Nicodemi, Proc. Natl. Acad. Sci. USA , 16173 (2012).
[58] C. A. Brackley, J. Johnson, S. Kelly, P. R. Cook, and D. Marenduzzo, Nucleic Acids Res. , 3503 (2016).
[59] D. Jost, P. Carrivain, G. Cavalli, and C. Vaillant, Nucleic Acids Res. , 9553 (2014).
[60] Q. MacPherson, B. Beltran, and A. J. Spakowitz, Proc. Natl. Acad. Sci. USA , 12739 (2018).
[61] T. Sexton, E. Yaffe, E. Kenigsberg, F. Bantignies, B. Leblanc, M. Hoichman, H. Parrinello, A. Tanay, and G. Cavalli, Cell , 458 (2012).
[62] S. K. Ghosh and D. Jost, Plos Computat. Biol. , e1006159 (2018).
[63] G. Shi, L. Liu, C. Hyeon, and D. Thirumalai, Nat. Commun. , 3161 (2018).
[64] N. Ganai, S. Sengupta, and G. I. Menon, Nucleic Acids Res. , 4145 (2014).
[65] A. Goloborodko, M. V. Imakaev, J. F. Marko, and L. Mirny, eLife , e14864 (2016).
[66] S. Ramaswamy, Annu. Rev. Condens. Matter Phys. , 323 (2010).
[67] A. Y. Grosberg and J.-F. Joanny, Phys. Rev. E , 032118 (2015).
[68] J. Smrek and K. Kremer, Phys. Rev. Lett. , 098002 (2017).
[69] A. Goloborodko, J. F. Marko, and L. A. Mirny, Biophys. J. , 2162 (2016).
[70] G. Fudenberg, M. Imakaev, C. Lu, A. Goloborodko, N. Abdennur, and L. A. Mirny, Cell Rep. , 2038 (2016).
[71] D. Racko, F. Benedetti, D. Goundaroulis, and A. Stasiak, Polymers , 1126 (2018).
[72] E. Orlandini, D. Marenduzzo, and D. Michieletto, Proc. Natl. Acad. Sci. USA , 8149 (2019).
[73] M. Di Stefano, A. Rosa, V. Belcastro, D. di Bernardo, and C. Micheletti, Plos Comput. Biol. , e1003019 (2013).
[74] L. Giorgetti, R. Galupa, E. P. Nora, T. Piolot, F. Lam, J. Dekker, G. Tiana, and E. Heard, Cell , 950 (2014).
[75] M. Di Pierro, B. Zhang, E. Lieberman-Aiden, P. G. Wolynes, and J. N. Onuchic, Proc. Natl. Acad. Sci. USA , 12168 (2016).
[76] S. Bianco, D. G. Lupiáñez, A. M. Chiariello, C. Annunziatella, K. Kraft, R. Schöpflin, L. Wittler, G. Andrey, M. Vingron, A. Pombo, et al., Nature Genet. , 662 (2018).
[77] M. Di Stefano, J. Paulsen, T. G. Lien, E. Hovig, and C. Micheletti, Sci. Rep. , 35985 (2016).
[78] P. Szałaj, Z. Tang, P. Michalski, M. J. Pietal, O. J. Luo, M. Sadowski, X. Li, K. Radew, Y. Ruan, and D. Plewczynski, Genome Res. , 1 (2016).
[79] Y. Zhu, Z. Chen, K. Zhang, M. Wang, D. Medovoy, J. W. Whitaker, B. Ding, N. Li, L. Zheng, and W. Wang, Nat. Commun. , 10812 (2016).
[80] A. Pombo and N. Dillon, Nat. Rev. Mol. Cell Biol. , 245 (2015).
[81] H. de Jong, J. Comput. Biol. , 67 (2002).
[82] M. Cosentino-Lagomarsino, B. Bassetti, G. Castellani, and D. Remondini, Mol. Biosyst. , 335 (2009).
[83] R. McCord, A. Nazario-Toole, H. Zhang, P. Chines, Y. Zhan, M. R. Erdos, F. S. Collins, J. Dekker, and K. Cao, Genome Res. , 260 (2013).
[84] G. I. Dellino, F. Palluzzi, A. M. Chiariello, R. Piccioni, S. Bianco, L. Furia, G. De Conti, B. A. M. Bouwman, G. Melloni, D. Guido, et al., Nature Genet.51