Dynamics of transposable elements generates structure and symmetries in genetic sequences

Abstract

Genetic sequences are known to possess non-trivial composition together with symmetries in the frequencies of their components. Recently, it has been shown that symmetry and structure are hierarchically intertwined in DNA, suggesting a common origin for both features. However, the mechanism leading to this relationship is unknown. Here we investigate a biologically motivated dynamics for the evolution of genetic sequences. We show that a metastable (long-lived) regime emerges in which sequences have symmetry and structure interlaced in a way that matches that of extant genomes.



Giampaolo Cristadoro, Mirko Degli Esposti, and Eduardo G. Altmann
Dipartimento di Matematica e Applicazioni, Università di Milano - Bicocca, Italy
Dipartimento di Informatica, Università di Bologna, Italy
School of Mathematics and Statistics, University of Sydney, Australia

a. Introduction.

Transposable elements (TEs) are DNA sequences that can relocate themselves to new sites of the genome. They were first discovered in maize by B. McClintock in the mid-1940s and initially considered parasites with no functional role [1]. Nowadays TEs are known to be ubiquitous in both prokaryotic and eukaryotic genomes [2, 3], and little doubt is left about their prominent role in genome evolution, shaping structure and function in a multitude of ways [4, 5]. As TEs constitute more than half of the sequence in many higher eukaryotes, a fingerprint of their presence can be quantitatively extracted from the statistical properties of their host DNA. Indeed, TE properties were shown to be crucial in explaining global structural features of genome sequences [6–11].

Recently, Albrecht-Buehler [12] suggested that TEs were the main driving force for the emergence of the second Chargaff parity rule. This rule states that, in each strand of the DNA, the frequency of a short oligonucleotide $w$ is approximately equal to that of its symmetrically related $\hat{w}$, obtained from $w$ by reversing the order of the symbols and substituting each nucleotide with its conjugate, $A \leftrightarrow T$ and $C \leftrightarrow G$ (e.g. $w = ACTGGCT$, $\hat{w} = AGCCAGT$). It was first observed by Chargaff in the 1950s [13] and has since been detected across different organisms, leading to different proposals for its origin and function [14–29]. The importance of Albrecht-Buehler's explanation is that it shows how this symmetry naturally emerges as an asymptotic outcome of the cumulative action of inversions/transpositions, one of the main mechanisms of relocation of TEs. As we will show, while the proposed mechanism nicely induces Chargaff symmetry in the asymptotic DNA, it does so at the cost of trivialising the structural properties of the sequence: symmetry is obtained because of the complete randomization of the full double-stranded DNA.
In view of the ubiquity of complex structures in genomes [30–38], this result raises the question of whether symmetry can appear without a full randomization of the sequence and in a way that is compatible with the existence of structure. The importance of this question is enhanced by our recent findings [39] that Chargaff symmetry extends beyond the frequencies of short oligonucleotides (remaining valid on scales where non-trivial structure is present) and that a hierarchy of other symmetries exists, nested at different structural scales. These findings are confirmed in Fig. 1, which shows how commonly used indicators of structure, such as the recurrence-time distribution (panel a) and correlation functions (panel b), coincide for symmetrically related observables at different scales.

FIG. 1. Symmetry and structure are intertwined in DNA. Results are shown for Homo sapiens chromosome 1 (symbols) and its randomly shuffled version (dashed lines). Each curve corresponds to one observable. Symmetrically related observables appear in the same box in the legend. (a) Distribution $P(\tau)$ of recurrence times $\tau$ (measured in number of bases) between successive occurrences of the same nucleotide. (b) Probability $f_{X_A,X_B}(\ell)$ that the bigrams $X_A$ and $X_B$ appear separated by a distance $\ell$. Plotted is the normalized cross-correlation $z_{[X_A,X_B]}(\ell) = f_{X_A,X_B}(\ell) / (f_{X_A} f_{X_B})$ as a function of $\ell$, for symmetrically related couples $[X_A, X_B]$ (see legend). Different nested symmetries are valid at different scales $\ell$ (see Ref. [39] for further details): for $\ell \lesssim 150$, Chargaff symmetry $z_{[X_A,X_B]} = z_{[\hat{X}_B,\hat{X}_A]}$; for $150 \lesssim \ell \lesssim \ldots$, $z_{[X_A,X_B]} = z_{[X_B,X_A]}$; and for $\ell \gtrsim \ldots$, $z_{[X_A,X_B]} = z_{[\hat{X}_A,\hat{X}_B]}$ and reverse symmetry.

In this work we present a biologically motivated dynamical process that explains the observed relation between symmetry and structure in DNA sequences. In particular, we propose a model that mimics the action (inversions/transpositions) of TEs on DNA and we analytically describe its dynamical behavior. Using indicators to quantify both symmetry and the presence of non-trivial structure in symbolic sequences, we show that the co-occurrence of symmetry and structure is an emergent statistical property in sequences generated by such a model, reproducing the same hierarchical relation detected in extant genomes.

b. Quantifying structure and symmetry.

We consider symbolic sequences $s = \{s_i\}_{i=1}^{N}$ of length $|s| = N$ with $s_i \in \mathcal{A} = \{A, C, G, T\}$. Given a subsequence $a$ of $s$ (a word) we denote its corresponding reverse-complemented word as $\hat{a}$, obtained from $a$ by reversing the order of the symbols and substituting each nucleotide by its complementary one, $A \leftrightarrow T$ and $C \leftrightarrow G$. We call $f_x(s)$ the relative frequency (fraction) of the nucleotide $x$ in the sequence $s$. Finally, we denote by $CG(s) := f_C(s) + f_G(s)$ the so-called CG-content.
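The definitions above translate directly into a few lines of Python. The following is only a minimal sketch: the function names `reverse_complement`, `frequencies` and `cg_content` are illustrative choices, not taken from the paper, and sequences are assumed to be plain strings over the alphabet A, C, G, T.

```python
from collections import Counter

# Complementary pairs A <-> T and C <-> G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word: str) -> str:
    """Reverse the word and replace each nucleotide by its complement."""
    return "".join(COMPLEMENT[x] for x in reversed(word))


def frequencies(s: str) -> dict:
    """Relative frequency f_x(s) of each nucleotide x in s."""
    counts = Counter(s)
    return {x: counts[x] / len(s) for x in "ACGT"}


def cg_content(s: str) -> float:
    """CG(s) = f_C(s) + f_G(s)."""
    f = frequencies(s)
    return f["C"] + f["G"]


w = "ACTGGCT"
print(reverse_complement(w))  # AGCCAGT, the example used in the Introduction
print(cg_content(w))          # 4 of the 7 symbols are C or G
```

Applying `reverse_complement` twice returns the original word, and the CG-content of a word and of its reverse complement coincide, two facts the dynamics relies on.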
In the following, it will be useful to partition the full set $\mathcal{A}^N$ into disjoint subsets of fixed CG-content, $B_N(k) := \{ s \in \mathcal{A}^N \mid CG(s) = k/N \}$, so that $\mathcal{A}^N = \cup_{k=0}^{N} B_N(k)$.

We introduce the following simple indicators of the presence of Chargaff symmetry and of non-trivial structure in the composition of a given sequence $s$.

To quantify the compliance of $s$ with Chargaff symmetry, we average the normalized difference of the abundance between a nucleotide and its symmetric one (see [21], where a similar measure was first introduced):

$$I_{\mathrm{sym}}(s) = \frac{1}{4} \sum_{x \in \mathcal{A}} \frac{|f_x(s) - f_{\hat{x}}(s)|}{f_x(s) + f_{\hat{x}}(s)}. \tag{1}$$

$I_{\mathrm{sym}} = 0$ indicates a fully Chargaff-symmetric sequence, $I_{\mathrm{sym}} = 1$ is obtained for a sequence for which Chargaff symmetry is perfectly violated ($f_A = f_C = 0.5$, $f_T = f_G = 0$), and $I_{\mathrm{sym}} = 0.08$ is obtained for a 2% variation of equal frequencies (e.g., $f_A = f_C = 0.27$, $f_T = f_G = 0.23$, for which every term of the sum equals $0.04/0.5$). We consider $I_{\mathrm{sym}} > 0.08$ to be a violation of Chargaff symmetry.

To quantify the presence of non-trivial structures in a given symbolic sequence $s$ we first compute the distribution $P(\tau)$ of distances $\tau$ between two successive occurrences of the same nucleotide $x$. For random sequences, $P(\tau)$ decays exponentially as $P(\tau) = f_x (1 - f_x)^{\tau - 1}$ and thus has average $1/f_x$ and standard deviation $\sqrt{1 - f_x}/f_x$ (which is $\approx 1/f_x$ for small $f_x$). In contrast, the presence of a fat tail (standard deviation much larger than the mean) is considered a signature of a complex organization. We thus quantify structure as the distance of $s$ from random sequences by

$$I_{\mathrm{str}}(s) = \frac{1}{4} \sum_{x \in \mathcal{A}} \left( \frac{1}{\sqrt{1 - f_x(s)}} \frac{\sigma_\tau(x)}{\mu_\tau(x)} - 1 \right), \tag{2}$$

where $\mu_\tau \equiv \langle \tau \rangle$ and $\sigma_\tau \equiv \sqrt{\langle \tau^2 \rangle - \langle \tau \rangle^2}$ are the mean and standard deviation of the measured $P(\tau)$, and $\sqrt{1 - f_x}$ is the expected $\sigma_\tau/\mu_\tau$ for nucleotide $x$ in a random sequence. For random sequences we thus have $I_{\mathrm{str}}(s) = 0$, while departures from this value mark the presence of non-trivial structure. For simplicity, we consider $I_{\mathrm{str}} > \ldots$

c. Dynamics.

We investigate symmetry and structure of sequences that evolve through the following dynamics, which maps one sequence $s(t) \in \mathcal{A}^N$ into another sequence $s(t+1) \in \mathcal{A}^N$ by mimicking the action of TEs [12]. The dynamics is defined by composing two actions:
(i) pick a random position $j$ of $s$ and a random size $\ell \ge 0$, with $\langle \ell \rangle = L$ [?].
(ii) replace the subsequence $b \equiv \{s_i\}_{i=j}^{j+\ell-1}$ of size $\ell$ starting at position $j$ by its reverse complement $\hat{b}$.

The couple $(j, \ell)$ parametrizes the effect of an inversion/transposition, which we denote by $g_{(j,\ell)} : \mathcal{A}^N \to \mathcal{A}^N$. Its action has interesting properties: $g_{(j,\ell)}$ is an involution for every $(j, \ell)$, and the total number of $C$ and $G$ (or, equivalently, of $A$ and $T$) is invariant under $g$, so that $CG(s(t)) = CG(s(0))$ for all $t$. This implies that the dynamics is restricted to the invariant subspace of sequences with constant CG-content, $B_N(CG(s(0)))$.

d. Asymptotic equilibrium.

The dynamics can be equivalently described as an ergodic Markov chain over the space of sequences $B_N(CG(s(0)))$. The fact that $g_{(j,\ell)}$ is an involution forces the transition matrix to be bi-stochastic, and thus in the asymptotic equilibrium all sequences are equiprobable. This means that, for $t \to \infty$ and irrespective of the initial ancient DNA sequence, the evolution asymptotically leads to sequences that can be equivalently considered as generated by an independent and identically distributed (iid) process with $p(G) = p(C) = CG(s(0))/2$ and $p(A) = p(T) = (1 - CG(s(0)))/2$. Therefore, the expected values of our indicators of symmetry and structure, Eqs. (1) and (2), vanish asymptotically,

$$\lim_{t \to \infty} I_{\mathrm{str}}(s(t)) = \lim_{t \to \infty} I_{\mathrm{sym}}(s(t)) = 0,$$

for any initial sequence $s(0)$ [?]. This shows analytically that the TE dynamics asymptotically leads to Chargaff-symmetric sequences, in agreement with previous claims [12]. However, this symmetric equilibrium is a (trivial) consequence of a full randomization. Our results therefore also show that the current explanation of the second Chargaff parity rule [12] is not satisfactory, as it is not compatible with any structure, which is known to remain significant at distances of several thousands of nucleotides [30–38] (see also Fig. 1). Next we show that the same TE dynamics is nevertheless rich enough to generate symmetric sequences with non-trivial structure pre-asymptotically, as long-lived metastable states.

e. Symmetry and structure over time - three regimes.

We now investigate symmetry and structure of the sequences $s(t)$ by computing how our indicators $I_{\mathrm{sym}}$ and $I_{\mathrm{str}}$ depend on time $t$ (i.e., their values after $t$ applications of $g_{(j,\ell)}$). We show that Chargaff symmetry emerges much before equilibrium, together with a complex domain-like structure.

We first investigate structural properties of sequences after a finite number $t$ of iterations. We define a domain of $s(t)$ as a subsequence of consecutive sites that have been involved in the same series of reverse/complement events. We then distinguish between domains of type $\Gamma$ and $\hat{\Gamma}$, depending on whether the number of transformations $g$ they were involved in is even or odd, respectively. By definition, the starting sequence is composed of a single domain of type $\Gamma$. After one iteration it is split into three domains, two of type $\Gamma$ and one of type $\hat{\Gamma}$ of length $\ell$, corresponding to the subsequence involved in the first reverse/complement event.
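The move $g_{(j,\ell)}$, its two invariance properties and the splitting into domains can be checked on a small example. The sketch below is illustrative only and rests on several assumptions that are not in the paper: positions are 0-based, the block is assumed to fit inside the sequence, domains are tracked only through the parity of the number of moves acting on each site (so adjacent domains of equal parity are merged), and the helper names `g`, `flip_parity`, `domains`, `cg` and `i_sym` are hypothetical. `i_sym` follows Eq. (1) written with counts and assumes every complementary pair occurs at least once.

```python
from itertools import groupby
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def g(s, j, length):
    """Move g_(j,l): replace s[j:j+length] by its reverse complement."""
    block = s[j:j + length]
    return s[:j] + block.translate(COMPLEMENT)[::-1] + s[j + length:]


def flip_parity(parity, j, length):
    """Same move on per-site parities (0 = type Gamma, 1 = type Gamma-hat)."""
    block = [1 - p for p in reversed(parity[j:j + length])]
    return parity[:j] + block + parity[j + length:]


def domains(parity):
    """Runs of equal parity, as (type, length) pairs."""
    return [(p, len(list(run))) for p, run in groupby(parity)]


def cg(s):
    """CG-content of s."""
    return (s.count("C") + s.count("G")) / len(s)


def i_sym(s):
    """Symmetry indicator of Eq. (1), computed from nucleotide counts."""
    n = {x: s.count(x) for x in "ACGT"}
    return sum(abs(n[x] - n[PAIR[x]]) / (n[x] + n[PAIR[x]]) for x in "ACGT") / 4


rng = random.Random(1)
s0 = "".join(rng.choice("ACGT") for _ in range(1000))
s1 = g(s0, 120, 50)
print(g(s1, 120, 50) == s0)                        # True: g is an involution
print(cg(s1) == cg(s0))                            # True: CG-content is conserved
print(domains(flip_parity([0] * 1000, 120, 50)))   # [(0, 120), (1, 50), (0, 830)]
print(i_sym("A" * 27 + "C" * 27 + "T" * 23 + "G" * 23))  # approximately 0.08
print(i_sym("AC" * 50))                            # 1.0
```

The third line of output reproduces the statement that a single move splits the initial domain into two domains of type $\Gamma$ and one of type $\hat{\Gamma}$; the last two lines reproduce the reference values quoted for Eq. (1).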
We now compute the average sizes $\langle \ell_{\Gamma} \rangle (t)$ and $\langle \ell_{\hat{\Gamma}} \rangle (t)$ of domains after $t$ iterations. Three regimes can be identified:

(i) For short times $t$, if $L \ll N$, the probability that the first few iterates all involve different subsequences is very high [?]. At each iterate, a domain of type $\hat{\Gamma}$ of average size $\langle \ell \rangle = L$ is created, cutting a domain of type $\Gamma$. Thus we have that in this regime:

$$\langle \ell_{\hat{\Gamma}} \rangle (t) = L \quad \text{and} \quad \langle \ell_{\Gamma} \rangle (t) = N/t. \tag{3}$$

This regime lasts until iterates start overlapping, which happens when $N/t \approx L$ and the average domain sizes equalize, $\langle \ell_{\hat{\Gamma}} \rangle (t) = \langle \ell_{\Gamma} \rangle (t) = L$. This regime is thus valid for $0 < t \lesssim t_{\mathrm{metastable}} = N/L$.

(ii) For $t \gtrsim t_{\mathrm{metastable}} = N/L$ a typical reverse/complement event will overlap with more than one domain. In this case all the domains that lie fully inside the subsequence involved in the reverse/complement event will change type (and position) without changing length; the domains at the border are instead split into two sub-domains of different type. The randomness of this process guarantees that the already reached balance between the number and average length of the two domain types $\Gamma$ and $\hat{\Gamma}$ is not broken, while their common average length decreases in time as

$$\langle \ell_{\hat{\Gamma}} \rangle (t) = \langle \ell_{\Gamma} \rangle (t) = N/t. \tag{4}$$

This second regime ends after a number of iterations $t \sim t_{\mathrm{equilibrium}} = N$, when equilibrium is reached.

(iii) For $t > t_{\mathrm{equilibrium}} = N$ the average lengths stabilize at the stationary value

$$\langle \ell_{\hat{\Gamma}} \rangle (t) = \langle \ell_{\Gamma} \rangle (t) = 1, \tag{5}$$

and the sequence can be thought of as a realization of the asymptotic equilibrium discussed above.

FIG. 2. Temporal evolution of symmetry and structure in the model. (a) Numerical evaluation of the average sizes of domains of the two types, $\langle \ell_{\Gamma} \rangle (t)$ and $\langle \ell_{\hat{\Gamma}} \rangle (t)$, as a function of the number of iterates $t$ of the TE dynamics. (b) Numerical evaluation of the symmetry and structural properties of the sequence generated by the dynamics, quantified by the indicators $I_{\mathrm{str}}(t)$ and $I_{\mathrm{sym}}(t)$. The filled symbols in $I_{\mathrm{str}}$ indicate that these values are statistically different from a random sequence (p-value $< 0.01$; equivalent results are obtained using as an alternative definition of $I_{\mathrm{str}}$ the Jensen-Shannon divergence between the $P(\tau)$ obtained in the model and in random sequences). The sequences $s(t)$ have length $N = 10^{\ldots}$ and the size of reverse/complement events is $L = 500$, thus leading to time-scales $t_{\mathrm{metastable}} = 10^{\ldots}$ and $t_{\mathrm{equilibrium}} = 10^{\ldots}$. The starting sequence is fully random with $f_A = 0.\ldots$, $f_C = 0.\ldots$, $f_G = 0.\ldots$, $f_T = 0.\ldots$

We now explain how structure $I_{\mathrm{str}}(s(t))$ and symmetry $I_{\mathrm{sym}}(s(t))$ depend on the domain sizes $\langle \ell_{\Gamma} \rangle$ and $\langle \ell_{\hat{\Gamma}} \rangle$ and thus on the different regimes.

$I_{\mathrm{str}}(s)$: in order to identify the contribution of the dynamics in generating complex structural features, we consider an initial $s(0)$ generated by an iid process (no structure, $I_{\mathrm{str}}(0) = 0$). With this choice, a value $I_{\mathrm{str}} \neq 0$ signals the construction, under the action of the dynamics, of different domain types. In particular, at $t_{\mathrm{metastable}}$ and for $L \gg 1$, the total variance $\sigma_\tau^2$ can be estimated, using the law of total variance, as the sum of two components: one that measures the variability of the mean of returns between domain types and the other measuring the variability of returns within each type. Accordingly $I_{\mathrm{str}}(t)$ grows from 0 to the value $I_{\mathrm{str}}(t_{\mathrm{metastable}}) > 0$, after which $I_{\mathrm{str}}(t)$ decreases to zero at equilibrium (at $t_{\mathrm{equilibrium}}$). In terms of regimes we thus expect: (i) $I_{\mathrm{str}}$ grows; (ii) $I_{\mathrm{str}}$ decays; (iii) $I_{\mathrm{str}} = 0$.

$I_{\mathrm{sym}}(s)$: each domain of type $\Gamma$ is a subsequence of the ancient sequence $s(0)$. If the average size of such domains at time $t$ is large enough, the frequency of each nucleotide is approximately the same as its frequency in $s(0)$; similarly for $\hat{\Gamma}$ and $\hat{s}(0)$. No constraints are imposed on the symmetry of the ancient genome. In particular, if the original sequence is not Chargaff symmetric, $I_{\mathrm{sym}}(s(0)) > 0$, the deviation from symmetry decays for $t \lesssim t_{\mathrm{metastable}}$ as quantified by $I_{\mathrm{sym}}(s(t)) \simeq \frac{t}{N} \left| \langle \ell_{\Gamma} \rangle (t) - \langle \ell_{\hat{\Gamma}} \rangle (t) \right| I_{\mathrm{sym}}(s(0))$. Inserting Eq. (3), the prefactor equals $|1 - tL/N|$, which is 1 at $t = 0$ and vanishes at $t = N/L$. In terms of regimes we thus expect: (i) $I_{\mathrm{sym}} > 0$; (ii) $I_{\mathrm{sym}} = 0$; (iii) $I_{\mathrm{sym}} = 0$.

Altogether, the estimations and calculations above lead to the following predictions for the presence of symmetry and structure as a function of time $t$ (regimes i-iii):
(i) $0 \le t \le t_{\mathrm{metastable}} = N/L$: structure, $I_{\mathrm{str}} > 0$, but no symmetry, $I_{\mathrm{sym}} > 0$.
(ii) $t_{\mathrm{metastable}} = N/L \le t \le t_{\mathrm{equilibrium}} = N$: structure, $I_{\mathrm{str}} > 0$, and symmetry, $I_{\mathrm{sym}} = 0$.
(iii) $t_{\mathrm{equilibrium}} = N < t$: symmetry, $I_{\mathrm{sym}} = 0$, but no structure, $I_{\mathrm{str}} = 0$.

In Fig. 2 we confirm these predictions in a numerical simulation.

f. The metastable regime.

The crucial feature of the TE dynamics discussed above is that in regime (ii) both non-trivial structure and symmetry co-exist in the generated sequences.
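This coexistence can be explored with a small simulation. The sketch below is not the code behind Fig. 2; it is a rough illustration under stated assumptions: a sequence of length $N = 2 \times 10^5$, blocks of fixed size $\ell = L = 1000$ that always fit inside the sequence, a hypothetical asymmetric initial composition ($f_A = 0.35$, $f_C = 0.30$, $f_G = 0.20$, $f_T = 0.15$) obtained by shuffling, and measurements only at $t = 0$ and $t = 2N/L$. The names `step`, `i_sym`, `i_str` and `simulate` are illustrative.

```python
import math
import random

FLIP = bytes.maketrans(b"ACGT", b"TGCA")             # complement table
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def step(s, length, rng):
    """One move g_(j,l): reverse-complement a random block of s in place."""
    j = rng.randrange(len(s) - length + 1)           # assumption: block fits in s
    s[j:j + length] = s[j:j + length].translate(FLIP)[::-1]


def i_sym(s):
    """Eq. (1), written with counts instead of frequencies."""
    n = {x: s.count(x.encode()) for x in "ACGT"}
    return sum(abs(n[x] - n[PAIR[x]]) / (n[x] + n[PAIR[x]]) for x in "ACGT") / 4


def i_str(s):
    """Eq. (2): relative excess of sigma_tau / mu_tau over sqrt(1 - f_x)."""
    total = 0.0
    for x in b"ACGT":                                # iterating bytes yields ints
        pos = [i for i, c in enumerate(s) if c == x]
        tau = [b - a for a, b in zip(pos, pos[1:])]  # recurrence times of x
        mu = sum(tau) / len(tau)
        sigma = math.sqrt(sum((t - mu) ** 2 for t in tau) / len(tau))
        total += sigma / mu / math.sqrt(1 - len(pos) / len(s)) - 1
    return total / 4


def simulate(n=200_000, length=1000, seed=0):
    """Return {t: (I_sym, I_str)} measured at t = 0 and t = 2 N / L."""
    rng = random.Random(seed)
    # Hypothetical asymmetric start, not the composition used in the paper.
    s = bytearray(b"A" * (35 * n // 100) + b"C" * (30 * n // 100)
                  + b"G" * (20 * n // 100) + b"T" * (15 * n // 100))
    rng.shuffle(s)                                   # structureless, exact composition
    out = {0: (i_sym(s), i_str(s))}
    t_end = 2 * n // length
    for _ in range(t_end):
        step(s, length, rng)
    out[t_end] = (i_sym(s), i_str(s))
    return out


results = simulate()
for t, (sym, struct) in results.items():
    print(f"t = {t:4d}  I_sym = {sym:.3f}  I_str = {struct:.3f}")
```

With this composition the initial sequence has $I_{\mathrm{sym}} = 0.3$ and $I_{\mathrm{str}}$ close to zero. After $2N/L$ moves one expects $I_{\mathrm{sym}}$ to have dropped well below the 0.08 threshold while $I_{\mathrm{str}}$ has become clearly positive, because domains of type $\Gamma$ and $\hat{\Gamma}$ carry different nucleotide frequencies. Running the loop up to $t \sim N$ should bring $I_{\mathrm{str}}$ back towards zero, at the price of a longer run time.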
The time (measured in number of iterations) for which this regime is valid is orders of magnitude larger than that of the first regime, as the ratio $t_{\mathrm{equilibrium}}/t_{\mathrm{metastable}} = L$ corresponds to the average size of transposable elements (for example $L \simeq \ldots$ in Homo sapiens [46]). We thus denote such a long-lived regime as metastable and we expect it to be generically observed, even though it does not correspond to the stable equilibrium of our model.

The DNA sequences in the metastable regime are characterized by a symmetric domain-like structure. Domain models have already been introduced in the literature to reproduce the complex structure generically observed in extant DNAs [11, 40–46]. In particular, if the distribution of domain sizes has a fat tail, this will lead to a long-range correlated sequence [11], signalled by a slow decay of $P(\tau)$. The novelty of our approach is twofold: firstly, the domain-like structure in the metastable regime is an emergent property of the TE dynamics (it is not imposed a priori); secondly, such complex structure is intertwined with symmetry, which itself is an output of the dynamics. In particular, we have shown that sequences in the metastable regime are not only Chargaff symmetric ($I_{\mathrm{sym}} = 0$), they also reproduce the hierarchical relation between symmetry and structure that is a distinctive feature of extant genomes (see Fig. 3).

FIG. 3. Symmetry and structure in the metastable regime. The same observables as in Fig. 1 are computed for a sequence in the metastable regime of our dynamics. Data show that this regime is characterized by a similar co-occurrence of symmetry and structure as in extant genomes. Results in panel (a) are for a sequence of length $N = 5 \times 10^{6}$ initialized as in Fig. 2 and evolved using our model with TE sizes $\ell$ all equal to $L = 5000$ until $t = 2048 \gtrsim t_{\mathrm{metastable}} = N/L = 1000$. Results in panel (b) are for a sequence of length $N = 10^{5}$ initialized as in the artificial sequence reported in Ref. [39] and evolved using our dynamic model with fixed $L = 500$ until $t = 256 \gtrsim t_{\mathrm{metastable}} = N/L = 200$. The more generic initial sequence in panel (b) (i.e., a Markov chain instead of a fully random sequence) allows us to distinguish between the different types of scale-dependent symmetries generated by the dynamics.

g. Different organisms.

In Fig. 4 we report $I_{\mathrm{sym}}$ and $I_{\mathrm{str}}$ computed for genomes of different families, together with the values obtained from our dynamics. It shows that symmetry and structure coexist in most cases. The sequences from Animals show enhanced structure, while those of Archaea and Bacteria show moderate signatures of structure, in agreement with the temporal behaviour of our model (i.e., associating $t$ with the age of the genomes). Note that symmetry and structure are both statistical properties measured on the full DNA sequence. Any evolutionary constraint that pertains to a small percentage of an organism's genome does not affect these statistical observations appreciably. As an example, the protein-coding regions of Homo sapiens account for 1.5% of the full sequence. On the other hand, care should be taken when dealing with many different organisms: extensions of the model incorporating additional aspects of DNA evolution will be required for a quantitative comparison with the empirical data.

h. Conclusion.

We have shown how a model that captures the action of transposable elements (TEs) is able to reproduce the intricate relation between symmetry and structure present in DNA sequences. We find that symmetry and structure change differently at different time scales (i.e., for different numbers of actions of TEs). For a large (pre-asymptotic) time interval, the sequences obtained in our model show the same non-trivial structures and a hierarchy of symmetries (including Chargaff) as in actual DNA sequences (compare panels (b) of Fig. 1 and Fig. 3). Our mathematical model is extremely simplified and includes the essential elements to explain the onset of symmetry and structure. In particular, it mimics only a simple action of TEs (reverse-complement), ignoring the fact that TEs are classified in different families, have different properties, and act according to different mechanisms [47–49]. We expect that incorporating more details of the TE dynamics in our model will refine our understanding of their role in shaping statistical properties of DNA sequences, in particular from an evolutionary viewpoint that would lead to refinements in the data-model comparison presented in Fig. 4.

FIG. 4. Structure and symmetry in different organisms. Values of $I_{\mathrm{sym}}$ vs. $I_{\mathrm{str}}$ for different genomes belonging to the families Archaea, Bacteria, Animals [50]. Superimposed are the values of the sequences evolved via our model (starting in $(I_{\mathrm{sym}}, I_{\mathrm{str}}) = (0.\ldots, 0)$ and evolving to $(0, \ldots)$).

[1] McClintock B, The origin and behavior of mutable loci in maize, Proc. Natl. Acad. Sci. USA (6): 344-55 (1950).
[2] Feschotte C, Pritham EJ, DNA Transposons and the Evolution of Eukaryotic Genomes, Annu. Rev. Genet., 331-368 (2007).
[3] Kleckner N, Transposable elements in prokaryotes, Annu. Rev. Genet., 341-404 (1981).
[4] Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C, Ten things you should know about transposable elements, Genome Biology (1): 199 (2018).
[5] Fedoroff NV, Transposable Elements, Epigenetics and Genome Evolution, Science, 758-767 (2012).
[6] Holste D, Grosse I, Beirer S, Schieg P, Herzel H, Repeats and correlations in human DNA sequences, Phys. Rev. E (6): 061913 (2003).
[7] Sheinman M, Ramisch A, Massip F, Arndt PF, Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences, Scientific Reports …
[8] … Physical Review Letters (14): 148101 (2013).
[9] Messer PW, Arndt PF, Lässig M, Solvable Sequence Evolution Models and Genomic Correlations, Physical Review Letters …
[10] … EPL (Europhysics Letters), 391 (1996).
[11] Buldyrev SV, Goldberger AL, Havlin S, Peng CK, Simons M, Stanley HE, Generalized Levy Walk Model for DNA Nucleotide Sequences, Phys. Rev. E, 4514-4523 (1993).
[12] Albrecht-Buehler G, Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions, Proc. Natl. Acad. Sci. USA, 17828-17833 (2006).
[13] Rudner R, Karkas JD, Chargaff E, Separation of B. subtilis DNA into complementary strands I. Biological properties, II. Template functions and composition as determined, III. Direct analysis, Proc. Natl. Acad. Sci. USA, 630-635; 915-922 (1968).
[14] Rogerson AC, There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes, J. Mol. Evol., 24-30 (1991).
[15] Mitchell D, Bridge R, A test of Chargaff's second rule, Biochem. Biophys. Res. Commun., 90-94 (2006).
[16] Nikolaou C, Almirantis Y, Deviations from Chargaff's second parity rule in organellar DNA: Insights into the evolution of organellar genomes, Gene, 34-41 (2006).
[17] Qi D, Cuticchia AJ, Compositional symmetries in complete genomes, Bioinformatics, 557-559 (2001).
[18] Fickett JW, Torney DC, Wolf DR, Base compositional structure of genomes, Genomics, 1056-1064 (1992).
[19] Prabhu VV, Symmetry observations in long nucleotide sequences, Nucleic Acids Res., 2797-2800 (1993).
[20] Bell SJ, Forsdyke DR, Accounting units in DNA, J. Theor. Biol., 51-61 (1999).
[21] Baisnée PF, Hampson S, Baldi P, Why are complementary DNA strands symmetric?, Bioinformatics, 1021-1033 (2002).
[22] Kong S-G, Fan W-L, Chen H-D, Hsu Z-T, Zhou N, Zheng Bo, Lee H-C, Inverse Symmetry in Complete Genomes and Whole-Genome Inverse Duplication, PLOS ONE, e7553 (2009).
[23] Afreixo V, Bastos CA, Garcia SP, Rodrigues JM, Pinho AJ, Ferreira PJ, The breakdown of the word symmetry in the human genome, J. Theor. Biol., 153-1599 (2013).
[24] Bell SJ, Forsdyke DR, Deviations from Chargaff's Second Parity Rule Correlate with Direction of Transcription, J. Theor. Biol., 63-76 (1999).
[25] Lobry JR, Lobry C, Evolution of DNA base composition under no-strand-bias condition when the substitution rates are not constant, Mol. Biol. Evol., 719-723 (1999).
[26] Zhang SH, Huang YZ, Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA, Bioinformatics, 478-485 (2010).
[27] Hart A, Martínez S, Olmos FA, Gibbs Approach to Chargaff's Second Parity Rule, Journal of Statistical Physics, 408-422 (2012).
[28] Coons LA, Burkholder AB, Hewitt SC, McDonnell DP, Korach KS, Decoding the Inversion Symmetry Underlying Transcription Factor DNA-Binding Specificity and Functionality in the Genome, iScience, 552-591 (2019).
[29] Fariselli P, Taccioli C, Pagani L, Maritan A, DNA sequence symmetries from randomness: the origin of the Chargaff's second parity rule, Briefings in Bioinformatics, bbaa041 (2020).
[30] Peng CK, Buldyrev SV, Goldberger AL, Havlin S, Sciortino F, Simons M, Stanley HE, Long-range correlation in nucleotide sequences, Nature, 168-170 (1992).
[31] Li W, Kaneko K, Long-Range Correlation and Partial $1/f^{\alpha}$ Spectrum in a Noncoding DNA Sequence, EPL, 655-660 (1992).
[32] Voss R, Evolution of Long-Range Fractal Correlations and $1/f$ Noise in DNA Base Sequences, Phys. Rev. Lett., 3805-3808 (1992).
[33] Amato I, DNA shows unexplained patterns writ large, Science, 747 (1992).
[34] Yam P, Noisy nucleotides: DNA sequences show fractal correlations, Sci. Am., 23-24, 27 (1992).
[35] Li W, Marr TG, Kaneko K, Understanding long-range correlations in DNA sequences, Physica D, 392-416 (1994).
[36] Audit B, Thermes C, Vaillant C, d'Aubenton-Carafa J, Muzy JF, Arneodo A, Long-Range Correlations in Genomic DNA: A Signature of the Nucleosomal Structure, Phys. Rev. Lett., 2471 (2001).
[37] Frahm KM, Shepelyansky DL, Poincaré recurrences of DNA sequences, Phys. Rev. E, 016214 (2012).
[38] Colliva A, Pellegrini R, Testori A, Caselle M, Ising-model description of long-range correlations in DNA sequences, Phys. Rev. E, 052703 (2015).
[39] Cristadoro G, Degli Esposti M, Altmann EG, The common origin of symmetry and structure in genetic sequences, Scientific Reports (1), 15817 (2018).
[40] Nee S, Uncorrelated DNA walks, Nature, 450 (1992).
[41] Karlin S, Brendel V, Patchiness and correlations in DNA sequences, Science, 677-680 (1993).
[42] Peng CK, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL, Mosaic organization of DNA nucleotides, Phys. Rev. E, 1685-1689 (1994).
[43] Bernaola-Galván P, Román-Roldán R, Oliver JL, Compositional segmentation and long-range fractal correlations in DNA sequences, Phys. Rev. E, 5181-5189 (1996).
[44] Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F, The mosaic genome of warm-blooded vertebrates, Science, 953-958 (1985).
[45] Rajeev K, Azad J, Subba R, Wentian Li, Ramakrishna R, Simplifying the mosaic description of DNA sequences, Phys. Rev. E, 031913 (2002).
[46] Carpena P, Bernaola-Galván P, Coronado AV, Hackenberg M, Oliver JL, Identifying characteristic scales in the human genome, Phys. Rev. E, 032903 (2007).
[47] Jurka J, Smith T, A fundamental division in the Alu family of repeated sequences, Proc. Natl. Acad. Sci. USA, 4775-8 (1988).
[48] Muñoz-López M, García-Pérez JL, DNA transposons: nature and applications in genomics, Current Genomics (2), 115-128 (2010).
[49] Jurka J, Bao W, Kojima KK, Families of transposable elements, population structure and the origin of species, Biology Direct 6