Fibration symmetries uncover the building blocks of biological networks

Flaviano Morone, Ian Leifer, Hernán A. Makse

Levich Institute and Physics Department, City College of New York, New York, NY 10031

Abstract

A major ambition of systems science is to uncover the building blocks of any biological network to decipher how cellular function emerges from their interactions. Here, we introduce a graph representation of the information flow in these networks as a set of input trees, one for each node, which contains all pathways along which information can be transmitted in the network. In this representation, we find remarkable symmetries in the input trees that deconstruct the network into functional building blocks called fibers. Nodes in a fiber have isomorphic input trees and thus process equivalent dynamics and synchronize their activity. Each fiber can then be collapsed into a single representative base node through an information-preserving transformation called 'symmetry fibration', introduced by Grothendieck in the context of algebraic geometry. We exemplify the symmetry fibrations in gene regulatory networks and then show that they universally apply across species and domains from biology to social and infrastructure networks. The building blocks are classified into topological classes of input trees characterized by integer branching ratios and fractal golden ratios of Fibonacci sequences representing cycles of information. Thus, symmetry fibrations describe how complex networks are built from the bottom up to process information through the synchronization of their constitutive building blocks.

arXiv: [q-bio.MN], Jun

A central theme in systems science is to break down the system into its fundamental building blocks to then uncover the principles by which complex collective behavior emerges from their interactions [1–3]. In number theory, every natural number can be represented by a unique product of primes. Thus, prime numbers are the building blocks of natural numbers. This mathematical notion of building blocks is extended to the more abstract notion of group theory since finite groups can also be factored into simple subgroups [4].
The latter example, entirely abstract as it may be, has important implications for natural systems due to the fundamental relationship between group theory and the notion of symmetry, which has led to the discovery of the fundamental building blocks of matter, such as quarks and leptons [3, 5]. Here we ask whether similar principles of symmetry can uncover the fundamental building blocks of biological networks [1, 2, 6, 7]. Primary examples of these networks are gene regulatory networks that control gene expression in cells [2, 8–10], as well as metabolic networks, cellular processes and pathways, neural networks and ecosystems; beyond biology, the same question extends to other information-processing networks like social and infrastructure networks [7]. Previous studies have identified building blocks or 'network motifs' [2, 6, 8] by looking for patterns in the network that appear more often than they would by pure chance. The crux of the matter is to test whether the building blocks of these networks obey a predictive design principle that explains how the cell functions, and whether such a principle can be expressed in the language of symmetries.

We introduce the use of symmetries in biological networks by analyzing the transcriptional regulatory network of the bacterium Escherichia coli [11], since this is a well-characterized network. We find that this network exhibits fibration symmetries [12–14], first introduced by Grothendieck [12] in the context of algebraic geometry.

Symmetry fibrations are morphisms between networks that identify clusters of synchronized genes (called fibers) with isomorphic input trees. Genes in a fiber are collapsed by a symmetry fibration into a single representative gene called the base. The fibers are then the synchronized building blocks of the genetic network and symmetry fibrations are transformations that preserve the dynamics of information flow in the network.
We use this symmetry principle to classify the building blocks into topological classes of input trees characterized by integer branching ratios and complex topologies with golden ratios of Fibonacci sequences representing cycles in the network. We then show that symmetry fibrations explain synchronization patterns of gene co-expression in cells and universally apply to a range of complex networks across different species and domains beyond biology.

I. RESULTS

We search for symmetries in the E. coli transcriptional regulatory network (most up-to-date compilation at RegulonDB [11]) where nodes are genes and a directed link represents a transcriptional regulation (see Supplementary Information Section III).

A directed link from a source gene $i$ to a target gene $j$ in a transcriptional regulatory network represents a direct interaction where gene $i$ encodes a transcription factor that binds to the binding site of gene $j$ to regulate (activate or repress) its expression. Such a link represents a regulatory 'message' sent by the source to the target gene using the transcription factor as a 'messenger'. This process defines the 'information flow' in the system, which is not restricted to two interacting genes but is transferred to different regions within the network that are accessible through the connecting pathways. The information arriving at a gene contains the entire history transmitted through all pathways that reach this gene.

We formalize this process of communication between genes with the notion of the 'input tree' of the gene. In a network $G = (N_G, E_G)$ with $N_G$ nodes and $E_G$ directed edges, for every gene $i \in N_G$ there is a corresponding input tree, denoted $T_i$, which is the tree of all pathways of $G$ ending at $i$. More precisely, $T_i$ is a rooted tree with a selected node $i$ at the root, such that every other node $j$ in the tree represents the initial node of a path in the network ending at $i$.

Next, we analyze the input trees in the E. coli sub-circuit shown in Fig. 1a regulated by gene cpxR, which regulates its own expression (via an autoregulation activator loop) and also regulates other genes as shown in the figure. Gene cpxR is not regulated by any other transcription factor in the network, so we say that this gene forms its own 'strongly connected component' (see below). Therefore, it is an ideal simple circuit to explain the concept of fibration.

A. Input tree representation

In practice, the input tree of a gene is constructed as follows (SI Section IV A). Consider the circuit in Fig. 1a. The input tree of gene spy depicted in Fig. 1b starts with spy at the root (first layer). Since this gene is upregulated by baeR and cpxR, the second layer of the input tree contains these two pathways of length one starting at both genes. Gene baeR is further upregulated by cpxR and by itself through the autoregulation loop, and cpxR is also autoregulated. Thus, the input tree continues to the third layer taking into account these three possible pathways of length 2, one starting at baeR and two starting at cpxR. The procedure now continues, and since there are loops in the circuit, the input tree has an infinite number of layers.

The input tree formalism is a powerful framework to search for symmetries in information-processing networks, in that it replaces the canonical notion of a single trajectory with the set of all possible 'histories' from an initial to a final state of the network. In practice, this makes it reasonably straightforward to 'guess' a type of symmetry which is not apparent in the classical network framework. Based on results from [13–16], we will show in Section I C that if two input trees have the same 'shape', then the genes at the root of the input trees synchronize their activity [17–23], even though their input trees are made of different genes. This informal notion of equivalence is formalized by isomorphisms. An isomorphism between two input trees is a bijective map that preserves the topology of the input trees including the type of links. Specifically, a map $\tau : T \to T'$ is an isomorphism iff for any pair of nodes $a$ and $b$ of $T$ connected by a link, the pair of nodes $\tau(a)$ and $\tau(b)$ of $T'$ is connected by the same type of link (SI Section IV B). In practice, this means that isomorphic input trees are 'the same' except for the labeling of the nodes. Genes with isomorphic input trees are symmetric and synchronous.
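The layer-by-layer construction above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the dictionary `REGULATORS` and the helper names are hypothetical, it encodes only the three genes cpxR, baeR and spy named in the text, and it ignores link types (activator or repressor). Trees are compared only up to a finite, user-chosen depth, which is an assumption of this sketch.

```python
from functools import lru_cache

# Regulators (in-neighbors) of each gene in the cpxR sub-circuit described above.
# Hypothetical encoding for illustration; link types are deliberately ignored.
REGULATORS = {
    "cpxR": ("cpxR",),          # autoregulation only
    "baeR": ("baeR", "cpxR"),   # autoregulation plus cpxR
    "spy": ("baeR", "cpxR"),    # regulated by baeR and cpxR
}


def layer_sizes(root, depth):
    """Number of nodes in each of the first `depth` layers of the input tree."""
    layer = [root]
    sizes = []
    for _ in range(depth):
        sizes.append(len(layer))
        # Every node in the current layer contributes one child per regulator.
        layer = [src for node in layer for src in REGULATORS[node]]
    return sizes


@lru_cache(maxsize=None)
def tree_shape(root, depth):
    """Label-free canonical form of the input tree truncated to `depth` layers."""
    if depth == 1:
        return ()
    return tuple(sorted(tree_shape(src, depth - 1) for src in REGULATORS[root]))


def same_shape(gene_a, gene_b, depth):
    """True if the two truncated input trees are identical up to node labels."""
    return tree_shape(gene_a, depth) == tree_shape(gene_b, depth)


print(layer_sizes("spy", 5))          # [1, 2, 3, 4, 5]
print(same_shape("baeR", "spy", 3))   # True
print(same_shape("cpxR", "spy", 3))   # False
```

For this circuit the layer sizes of spy and baeR grow as 1, 2, 3, ..., while those of cpxR stay at 1, and the truncated trees of baeR and spy coincide whereas that of cpxR differs.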
We quantify this result next by introducing the concept of symmetry fibration [13].

B. Symmetry fibration of a network

The set of all input tree isomorphisms defines the symmetries of the network, which can be described by a 'Grothendieck fibration' [12]. The original Grothendieck definition of fibration is between categories [12], so the passage to a definition of fibrations between graphs requires associating a category with a graph and rephrasing Grothendieck's definition in elementary terms. Different categories may be associated with a graph, giving rise to different notions of fibrations between graphs. The notion of fibration that we use henceforth has been introduced in computer science as a 'surjective minimal graph fibration' [13, 15].

In general, a graph fibration of a graph $G = (N_G, E_G)$ is any morphism

$$\psi : G \to B \tag{1}$$

that maps $G$ to a graph $B = (N_B, E_B)$ (with $N_B$ nodes and $E_B$ edges) called the 'base' of the graph fibration $\psi$ (SI Section IV C). In this work we consider a surjective minimal graph fibration [13], which is a graph fibration $\psi$ that maps all nodes with isomorphic input trees inside a fiber to a single node in $B$, thus producing the minimal base of the network. In this case, the base $B$ consists of a graph where all genes in a fiber have been collapsed into one representative node by the minimal fibration. Thus, a surjective minimal graph fibration, hereafter called symmetry fibration for the sake of lexical convenience, leads to a dimensional reduction of the network into its irreducible components. Crucially, a symmetry fibration is a dimensional reduction that preserves the dynamics in the network, as we show next.

C. Symmetry fibration leads to synchronization

Next, we explain the connection between fibration and synchrony in the generality that is needed to justify our results, following Refs. [15, 16]. In order to describe the dynamical state of each gene in the transcriptional regulatory network, we first attach a phase space to each node in $G = (N_G, E_G)$ by considering a map $P : N_G \to M$ that assigns to each node $i \in N_G$ the phase space of the node, denoted by the manifold $M$. For example, in a transcriptional regulatory network we assign to each gene $i \in N_G$ the phase space of real numbers $M = \mathbb{R}$. Then, the state of each gene is described by $x_i(t) \in \mathbb{R}$, representing the expression level of gene $i$ at time $t$, which is typically measured by the mRNA concentration of the gene product. We then obtain the total phase space of $G$ as the product $P_G = \prod_{i \in N_G} P(i)$.

The fibers partition the graph $G$ into unique and non-overlapping sets $\Pi = \{\Pi_1, \dots, \Pi_r\}$, such that $\Pi_1 \cup \dots \cup \Pi_r = G$ and $\Pi_k \cap \Pi_l = \emptyset$ if $k \neq l$ [24]. We write $i \sim_\Pi j$ when the input trees of $i$ and $j$ are isomorphic and the two nodes belong to the same fiber $\Pi_k$. That is, $\exists\, k \mid i, j \in \Pi_k$, and there exists a symmetry fibration that sends both nodes to the same node in the base, $\psi(i) = \psi(j)$. DeVille & Lerman [15] showed that symmetry fibrations induce robust synchronization in the system (Theorem 4.3.1 in [15]). In particular, it was shown that if $\psi$ is a symmetry fibration then, by Proposition 2.1.12 in Ref. [15], there exists a map $P\psi : P_B \to P_G$ that maps the total phase space of the base $B$, named $P_B$, to the total phase space of the graph $G$. This map creates a polysynchronous subspace of synchronized solutions in fibers: $\Delta_\Pi = \{ x \in P_G \mid x_i(t) = x_j(t) \text{ whenever } \psi(i) = \psi(j) \}$, where each set of synchronous components of this subspace corresponds to a fiber in $\Pi$ (Lemma 5.1.1 in [15], see also [16]).
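As a rough numerical illustration of the polysynchronous subspace, the sketch below integrates a toy three-gene circuit in which two genes receive the same inputs. Everything specific here is assumed for illustration: linear degradation, a Hill-type synthesis term, multiplicative combination of inputs, the parameter values, and the forward-Euler scheme. None of it is taken from Refs. [15, 16].

```python
def hill(x, n=2.0, K=1.0):
    """Activating Hill function (an assumed form of the synthesis term)."""
    return x**n / (K**n + x**n)


def simulate(x0, steps=20000, dt=0.001, k=1.0, gain=4.0):
    """Forward-Euler integration of a toy three-gene circuit.

    State order: (regulator r, fiber gene u, fiber gene v).
    r is autoregulated; u and v both receive inputs from r and u,
    so u and v have isomorphic input trees.
    """
    r, u, v = x0
    for _ in range(steps):
        drive = gain * hill(r) * hill(u)   # identical synthesis term for u and v
        dr = -k * r + gain * hill(r)
        du = -k * u + drive
        dv = -k * v + drive
        r, u, v = r + dt * dr, u + dt * du, v + dt * dv
    return r, u, v


# Start inside the synchronous subspace (u == v): the trajectory stays there.
r, u, v = simulate((1.0, 0.7, 0.7))
print(u == v)

# Start outside it: with linear degradation the difference u - v decays.
r, u, v = simulate((1.0, 1.5, 0.2))
print(abs(u - v))
```

Starting with equal values for the two fiber genes keeps them equal at every step. With the linear degradation assumed here, an initial difference between them also decays, since the two genes share the same synthesis term; this second property is specific to the toy model rather than a general statement.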
In other words, $\Delta_\Pi$ is a polysynchronous subspace of $P_G$, such that components $x_i, x_j \in x$ synchronize (i.e., $x_i(t) = x_j(t)$) whenever the symmetry fibration $\psi$ sends them to the same node in $B$.

According to these results, we interpret synchronous genes as processing the same information received through isomorphic pathways in the network, and, accordingly, we interpret a symmetry fibration as a transformation that preserves the dynamics of information flow since it collapses synchronous nodes in fibers (redundant from the point of view of dynamics) into a common base with identical dynamics as the fiber.

Synchronous nodes in a fiber induced by symmetry fibrations correspond to the 'minimal balanced coloring' in [14]. A balanced coloring assigns two nodes the same color only if their inputs, self-consistently, receive the same content of colored nodes, whence the term 'balanced'. Thus, the flow of information arriving at genes in a fiber is analogous to a process of assigning a color to each gene such that each gene 'receives' the colors from adjacent genes via incoming links and 'sends' its color to the adjacent genes via its outgoing links. The nodes in a fiber have the same color, symbolizing the fact that they synchronize. The nodes with the same color in the balanced coloring partition [14] correspond to fibers induced by symmetry fibrations [15]. We use the minimal balanced coloring algorithm proposed in [25] for the computation of minimal bases [24] to find fibers (SI Section V).

D. Strongly connected components of the E. coli network

The input trees in the E. coli cpxR circuit are displayed in Fig. 1b. The input trees of baeR and spy are isomorphic and define the baeR-spy fiber (Fig. 1c). We call this circuit a feed-forward fiber (FFF). The input tree of cpxR is not isomorphic to that of either baeR or spy, and therefore cpxR is not symmetric with these genes, but it is isomorphic to those of bacA, slt and yebE, forming another fiber. Likewise, the input trees of genes ung, tsr and psd are all isomorphic, composing another fiber (Fig. 1b). Figure 1d shows the symmetry fibration $\psi : G \to B$ that collapses the genes in the fibers to the base $B$. Figure 1e shows another example (out of many) of a single connected component, fadR, and its corresponding isomorphic input trees (Fig. 1f), fibers and base.

The dynamical state of a gene is encoded in the topology of the input tree. In turn, this topology is encoded by a sequence, $a_i$, defined as the number of genes in each $i$-th layer of the input tree (Fig. 1b). The sequence $a_i$ represents the number of paths of length $i - 1$ that reach the gene at the root, and it is characterized by the branching ratio $n$ of the input tree, defined as $a_{i+1}/a_i \xrightarrow[i \to \infty]{} n$, which represents the multiplicative growth of the number of paths across the network reaching the gene at the root. For instance, the input trees of genes baeR-spy (Fig. 1b) encode a sequence $a_i = i$ with branching ratio $n = 1$ representing the single ($n = 1$) autoregulation loop inside the fiber.

Beyond several single-gene strongly connected components like those shown in Fig. 1, we find that the E. coli network has other strongly connected components (in a strongly connected component, each gene is reachable from every other gene; SI Section VI), three in total, which regulate more involved topologies of fibers. We find: (i) a two-gene strongly connected component composed of master regulators crp-fis involved in a myriad of functions like carbon utilization (Fig. 2a, top), (ii) a five-gene strongly connected component involved in the stress response system (SI Fig.
7), and (iii) the largest strongly connected component at the core of the network, which is composed of genes involved in the pH system that regulate hydrogen concentration (Fig. 2b). Each of these three components regulates a rich variety of fiber topologies which are collapsed into the base by the symmetry fibration $\psi : G \to B$, as shown in the figure.

E. Fiber building blocks

We find that the transcriptional regulatory network of E. coli is organized in 91 different fibers. The complete list of fibers in E. coli is shown in SI Section VII and SI Table VI, and the statistics are shown in SI Table I. Plots of each fiber are shown in Supplementary File 1. We find a rich variety of topologies of the input trees. Despite this diversity, the input trees present common topological features that allow us to classify all fibers into concise classes of fundamental 'fiber building blocks' (Figs. 3a and 3b). We associate a building block to a fiber by considering the genes in the fiber plus the external in-coming regulators of the fiber plus the minimal number of their regulators in turn that are needed to establish the isomorphism in the fiber. When the fiber is connected to any external regulator, either via a direct link or through a path in the strongly connected component forming a cycle, then the genes in this cycle are considered part of the building block of the fiber, since such a cycle is crucial to establish the dynamical synchronization state (when there is more than one cycle, the shortest cycle is considered).

We find that the most basic input tree topologies can be classified by integer 'fiber numbers' $|n, \ell\rangle$ reflecting two features: (a) infinite $n$-ary trees with branching ratio $n$ representing the infinite pathways going through $n$ loops inside the base of the fiber, and (b) finite trees representing finite pathways starting at $\ell$ external regulators of the fiber. The most basic fibers in E. coli have three values of $n = 0, 1, 2$: (i) fibers with $n = 0$ loops, called Star Fibers (SF), (ii) fibers with $n = 1$ loop, called Chain Fibers (CF), and (iii) fibers with $n = 2$ loops, called Binary-Tree Fibers (BTF).
This classification does not take into account the types of repressor or activator links in the building blocks, which lead to further sub-classes of fibers that determine the type of synchronization (fixed point, limit cycles, etc.) and thus the functionality of the fibers.

Figure 3a shows a sample of dissimilar circuits that can be concisely classified by $|n, \ell\rangle$ (full list in Supplementary File 1). For instance, the $n = 0$ SF class includes dissimilar circuits like $|\text{arcZ-ydeA}\rangle = |0, 1\rangle$, $|\text{dcuC-ackA}\rangle = |0, 2\rangle$, which is a bi-fan network motif [2], and generalizations with $\ell = 3$ regulators like $|\text{dcuR-aspA}\rangle = |0, 3\rangle$ (Fig. 3a, top). The main feature of these building blocks is that they do not contain loops and therefore the input trees are finite. The CF class contains $n = 1$ loop in the fiber, and therefore an infinite chain in the input tree, like the autoregulated loop in the fiber $|\text{ttdR}\rangle = |1, 0\rangle$. We note that while the input tree is infinite, the topological class is characterized by a single number $n = 1$ concisely represented in the base. Furthermore, a theorem proven by Norris [26] demonstrates that it suffices to test $N_G - 1$ layers of the input trees to establish isomorphism. Adding an external regulator ($\ell = 1$) to this circuit converts it to the purine fiber $|\text{purR}\rangle = |1, 1\rangle$, which is an example of an FFF, like the baeR circuit in Fig. 1a. This circuit resembles a feed-forward loop motif [2], but it differs in the crucial addition of the autoregulator loop at purR that allows genes purR and pyrC to synchronize. When another external regulator is added, we find the idonate fiber $|\text{idnR}\rangle = |1, 2\rangle$. More elaborate circuits contain two autoregulated loops and feed-back loops featuring trees with branching ratio $n = 2$.

F. Fibonacci fibers

So far we have analyzed building blocks that receive information from the external regulators in their respective strongly connected components, but do not send back information to the external regulators.
These topologies are characterized by integer branching ratios, $n = 0, 1, 2$, as shown in Fig. 3a. We find, however, more interesting building blocks that also send information back to their regulators. These circuits contain additional cycles in the building blocks that transform the input trees into fractal trees characterized by non-integer fractal branching ratios. Notably, the building block of the fiber uxuR-lgoR that is regulated by the connected component crp-fis (Fig. 2) forms an intricate input tree (Fig. 3b, top) where the number of paths of length $i - 1$ is encoded in the sequence $a_i = 1, 3, 4, 7, 11, 18, 29, \dots$ characterized by the Fibonacci recurrence relation: $a_1 = 1$, $a_2 = 3$, and $a_i = a_{i-1} + a_{i-2}$ for $i > 2$. This sequence leads to the non-integer branching ratio known as the golden ratio:

$$\frac{a_{i+1}}{a_i} \xrightarrow[i \to \infty]{} \varphi = \frac{1 + \sqrt{5}}{2} = 1.6180\ldots$$

This topology arises in the genetic network due to the combination of two cycles of information flow. First, the autoregulation loop inside the fiber at uxuR creates a cycle of length $d = 1$ which contributes to the input tree with an infinite chain with branching ratio $n = 1$. This sequence is reflected in the Fibonacci series by the term $a_i = a_{i-1}$. The important addition to the building block is a second cycle of length $d = 2$ between uxuR in the fiber and its regulator exuR: uxuR → exuR → uxuR. This cycle sends information from the fiber to the regulator and back to the fiber by traversing a path of length $d = 2$ that creates a 'delay' of $d = 2$ steps in the information that arrives back to the fiber (see Fig. 3b, top). This short-term 'memory' effect is captured by the second term $a_i = a_{i-2}$ in the Fibonacci sequence, leading to $a_i = a_{i-1} + a_{i-2}$ and the golden ratio. We call this topology a Fibonacci fiber (FF).

This argument implies that an autoregulated fiber that further regulates itself by connecting to its connected component via a cycle of length $d$ encodes a generalized Fibonacci sequence of order $d$ defined as $a_i = a_{i-1} + a_{i-d}$ with generalized golden ratio $\varphi_d$ (Fig. 3b, fourth row). We find such a Fibonacci sequence in the evgA-nhaR fiber building block linked to the pH strongly connected component shown in Fig. 2b. This fiber contains an autoregulation cycle inside the fiber and also an external cycle of length $d = 4$ through the pH strongly connected component: evgA → gadE → gadX → hns → evgA (Fig. 3b, third row). This topology forms a fractal input tree with sequence $a_i = a_{i-1} + a_{i-4}$ (sequence A123456 in [27]) and branching golden ratio $\varphi_4 = 1.3803\ldots$ We call this topology a 4-Fibonacci fiber, 4-FF.
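The branching ratios quoted above can be checked numerically by iterating the recurrence $a_i = a_{i-1} + a_{i-d}$ and taking the ratio of consecutive terms. The sketch below is illustrative only: the function names are hypothetical, and the seed `[1, 1, 1, 1]` used for $d = 4$ is an assumption (the text gives initial terms only for $d = 2$). For positive seeds the limiting ratio does not depend on this choice.

```python
def layer_sequence(d, a_init, length):
    """First `length` terms of a_i = a_{i-1} + a_{i-d}, seeded with d initial terms."""
    a = list(a_init)
    if len(a) != d:
        raise ValueError("need exactly d initial terms")
    while len(a) < length:
        a.append(a[-1] + a[-d])
    return a[:length]


def branching_ratio(d, a_init, length=200):
    """Ratio of the last two terms, an estimate of the limit of a_{i+1} / a_i."""
    a = layer_sequence(d, a_init, length)
    return a[-1] / a[-2]


# d = 2 with the seed given in the text: 1, 3, 4, 7, 11, 18, 29, ...
print(layer_sequence(2, [1, 3], 7))      # [1, 3, 4, 7, 11, 18, 29]
print(branching_ratio(2, [1, 3]))        # close to (1 + 5 ** 0.5) / 2

# d = 4 with an assumed seed; the ratio approaches the real root > 1 of x**4 = x**3 + 1.
print(branching_ratio(4, [1, 1, 1, 1]))
```

The estimate for $d = 4$ can be cross-checked by substituting it into $x^4 - x^3 - 1$, which should be close to zero.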
Generalized Fibonaccis appear inside strongly connected components, like the rcsB-adiY […] $d$, the Fibonacci sequence generalizes to $a_i = a_{i-1} + a_{i-2} + \cdots + a_{i-d+1} + a_{i-d}$, and the branching ratio satisfies [28]:

$$d = -\frac{\log(2 - \varphi_d)}{\log \varphi_d}.$$

G. Multi-layer composite fibers

Building blocks can also be combined to make composite fibers, just as prime numbers or quarks can be combined to form natural numbers or composite particles like protons and neutrons, respectively. The ability to assemble fiber building blocks to make larger composites is important in that it helps to understand systematically higher-order functions of biological systems composed of many genetic elements. We discover a particular type of composite made up of two elementary building blocks, which we name a multi-layer composite fiber. For instance, the double-layer add-oxyS fiber in the crp-fis connected component (see Figs. 2a and 3b bottom, and ID […] |add-oxyS⟩ = | , ⟩ ⊕ | , ⟩ made of a series of genes composing a single fiber of type | , ⟩ = |add, dsbG, gor, grxA, hemH, oxyS, trxC⟩ that are regulated by two different transcription factors rbsR and oxyR that form another fiber of type | , ⟩ = |rbsR, oxyR⟩. This composite is of importance since it allows for information to be shared between two genes, for instance add and oxyS, which are not directly connected (in this case, separated by a distance in the network of length two).

Composite fibers satisfy a simple engineering 'sum-rule': elementary fibers are composed in series in a predefined order where the first layer is represented by an entry fiber (carrying transcription factors), and the last layer is formed by a terminator fiber of output genes (encoding enzymes), as shown in Fig. 3b, bottom. This multi-layer composite fiber is biologically significant because genes in the output layer synchronize a genetic module that implements the same function even though the genes in the module are not directly connected, and, indeed, can be at far distances in the network.
Such functionally related modules could not be identified by modularity algorithms [29], which cluster nodes in modules of highly connected nodes.

We find that composite fibers are dominant in eukaryotes (yeast, mouse, human, see Section I H). They resemble the building blocks of multilayered deep neural networks where each subsequent gene in the layer synchronizes despite the fact that nodes can be distant in the network. More generally, composite fibers with multiple layers streamline the construction of larger aggregates of fibration building blocks performing more complex functions in a coordinated fashion. These composite topologies complete the classification of input trees.

H. Fibration landscape across biological networks, species and system domains

To study the applicability of fibration symmetries across domains of complex networks we have analyzed 373 publicly available datasets (SI Section VIII). Full details of each network and results can be accessed at https://docs.google.com/spreadsheets/d/1-RG5vR_EGNPqQcnJU8q3ky1OpWi3OjTh5Uo-Xa0PjOc . The codes to reproduce this analysis are at github.com/makselab (SI Section V) and the full datasets at kcorelab.org . We analyze biological networks spanning transcriptional regulatory networks, metabolic networks, cellular processes networks and signaling pathways, disease networks, and neural networks. We span different species ranging from A. thaliana, E. coli, B. subtilis, S. enterica (salmonella), M. tuberculosis, D. melanogaster, S. cerevisiae (yeast), M. musculus (mouse) to H. sapiens (human). The topological fiber numbers $|n, \ell\rangle$ allow us to systematically classify fibers across the different domains in a unifying way. We find that fibration symmetries are found across all biological processes and domains. The fiber distributions for each type of biological network calculated by summing over the studied species are displayed in Fig. 4a and the fiber distributions for each species calculated over the type of biological networks are shown in Fig. 4b. Our analysis allows us to investigate the specific attributes and commonalities of the fiber building blocks inside and across biological domains. We find a varied set of fibers that characterize the biological landscape. Certain features of the fiber number distribution are visible in the transcriptional networks in Fig. 4a. For instance, a tail with $\ell$ is seen in the $n = 0$ class as well as in the $n = 1$ class. Across species (Fig. 4b), bacteria like E. coli or B. subtilis display a majority of $n = 0$ building blocks, while higher level organisms like yeast, mouse and human display a majority of more complex building blocks like multi-layers and Fibonaccis.
To test the existence of symmetry fibrations across other domains we extend our studies to complex networks beyond biology ranging from social, infrastructure, internet, software, economic networks and ecosystems (details of datasets in SI Section VIII). Figure 4c shows the obtained fiber distributions for each domain. A normalized comparison across domains is visualized in Fig. 4d showing the cumulative number of fibers over all domains and species per network size of 10 nodes. The results support the applicability of the concept of symmetry fibration beyond biology to describe the building blocks of networks across all domains.

I. Gene co-expression and synchronization via symmetry fibration

We have shown in Section I C that fibers in networks determine cluster synchronization in the dynamical system. In a gene regulatory network, symmetric genes in a fiber synchronize their activity to produce gene co-expression levels that sustain cellular functions. We corroborate this result numerically in Fig. 1g in the particular example of the baeR-spy FFF in E. coli, and this result applies to all fibers, irrespective of the dynamical system law.

To exemplify the synchronization in fibers, we consider the dynamics in the composite fiber |add-oxyS⟩ = | , ⟩ ⊕ | , ⟩ depicted in Fig. 2a and Fig. 3b bottom, which is composed of autoregulator 1 = crp, and two layers of fibers: 2 = rbsR, 3 = oxyR, and 4 = add, 5 = oxyS (we consider here a reduced fiber for simplicity, and we add the autoregulator to crp to the building block for completeness). Graph $G = \{N_G, E_G\}$ consists of $N_G = \{1, 2, 3, 4, 5\}$, $E_G = \{1 \to 1,\ 1 \to 2,\ 1 \to 3,\ 2 \dashv 2,\ 3 \dashv 3,\ 2 \to 4,\ 3 \to 5\}$ ($\dashv$ refers to repression and $\to$ to activation) and a 5-dimensional total phase space $P_G = \mathbb{R}^5$ with state vector $X(t) = \{x_1(t), x_2(t), x_3(t), x_4(t), x_5(t)\}$ describing the expression levels of each gene's product (e.g., mRNA concentration).

The symmetry fibration $\psi : G \to B$ collapses the graph $G$ into the base $B = \{N_B, E_B\}$, where $N_B = \{a, b, c\}$ and $E_B = \{a \to a,\ a \to b,\ b \dashv b,\ b \to c\}$. The symmetry fibration acts on the nodes: $\psi(1) = a$, $\psi(2) = \psi(3) = b$, $\psi(4) = \psi(5) = c$, and on the edges: $\psi(1 \to 1) = a \to a$, $\psi(1 \to 2) = \psi(1 \to 3) = a \to b$, $\psi(2 \dashv 2) = \psi(3 \dashv 3) = b \dashv b$, and $\psi(2 \to 4) = \psi(3 \to 5) = b \to c$. Thus, the fibers partition the graph $G$ as $\Pi = \{\Pi_a, \Pi_b, \Pi_c\}$, where $\Pi_a = \{1\}$, $\Pi_b = \{2, 3\}$ and $\Pi_c = \{4, 5\}$.

We represent the dynamics by two functions $k(x)$ and $g(x)$ modeling degradation and synthesis of gene product, respectively [9, 10]. For example, $k(x)$ can be modeled as a linear degradation term and $g(x)$ as a Hill function [9]. We consider that multiple inputs are combined by multiplying functions $g(x)$, but any other way of combining inputs can be used. Then, the dynamics of the expression levels of the genes in the circuit are described by:

$$
\begin{aligned}
\frac{dx_1}{dt} &= -k(x_1) + g(x_1) \\
\frac{dx_2}{dt} &= -k(x_2) + g(x_1) \cdot g(x_2) \\
\frac{dx_3}{dt} &= -k(x_3) + g(x_1) \cdot g(x_3) \\
\frac{dx_4}{dt} &= -k(x_4) + g(x_2) \\
\frac{dx_5}{dt} &= -k(x_5) + g(x_3).
\end{aligned}
\tag{2}
$$

The dynamics of the base are described by the state vector of the base, $(y_a(t), y_b(t), y_c(t))$, with dynamical equations [16]:

$$
\begin{aligned}
\frac{dy_a}{dt} &= -k(y_a) + g(y_a) \\
\frac{dy_b}{dt} &= -k(y_b) + g(y_a) \cdot g(y_b) \\
\frac{dy_c}{dt} &= -k(y_c) + g(y_b).
\end{aligned}
\tag{3}
$$

If $(y_a(t), y_b(t), y_c(t))$ is a solution for the base Eqs. (3), then the map $P\psi$ sends the phase space of this base to the phase space of the solutions in the graph $G$ [16]:

$$
\big(x_1(t), x_2(t), x_3(t), x_4(t), x_5(t)\big) = P\psi\big[y_a(t), y_b(t), y_c(t)\big] = \big(y_a(t), y_b(t), y_b(t), y_c(t), y_c(t)\big). \tag{4}
$$

Therefore, the graph $G$ sustains a polysynchronous subspace (see for instance Motivating example 1.4 in [15]):

$$
\Delta_\Pi = \{ (x_1, x_2, x_3, x_4, x_5) \in \mathbb{R}^5 \mid x_1(t),\ x_2(t) = x_3(t),\ x_4(t) = x_5(t) \}. \tag{5}
$$

This result can be corroborated by simply plugging $\big(x_1(t),\ x_2(t),\ x_3(t) = x_2(t),\ x_4(t),\ x_5(t) = x_4(t)\big)$ into Eqs. (2) to obtain a solution of the dynamics, implying the synchrony $x_2(t) = x_3(t)$ in fiber $\Pi_b$ and $x_4(t) = x_5(t)$ in fiber $\Pi_c$. We note that the concept of sheaves and stacks might be useful to generalize the symmetry fibration framework to multiplex networks.

We test this gene synchronization with publicly available transcription profile experiments from the literature. We use gene expression data profiles in E. coli compiled at Ecomics http://prokaryomics.com [30]. This portal collects microarray and RNA-seq experiments from different sources such as the NCBI Gene Expression Omnibus (GEO) public database [31] and ArrayExpress [32] under different experimental growth conditions. The data is also compiled at the Colombos web portal [33]. The database contains transcriptome experiments measuring the expression level of 4,096 genes in E. coli strains over 3,579 experimental conditions which are described as: strain, medium, stress, and perturbation. Raw data is pre-processed to obtain expression levels by using noise reduction and bias correction to normalize data across different platforms [30].

E. coli can adapt its growth to the different conditions that it finds in the medium. This adaptation is made by sensing extra- and intracellular molecules and using them as effectors to activate or repress transcription factors. This implies that the different fibers are activated by specific experimental conditions. The Ecomics portal allows one to obtain those experimental conditions where a set of genes has been significantly expressed. We perform standard gene expression analysis (see colombos.net and Ref. [33]) of the expression levels in E. coli obtained under these conditions.

For a given set of genes in a fiber, we find the experimental conditions for which the genes have been significantly expressed by comparing the expression samples over the 4,096 different growth conditions. Following [33], the experimental conditions are ranked with the inverse coefficient of variation (ICV), defined as $\mathrm{ICV}_k = |\mu_k| / \sigma_k$, where $\mu_k$ is the average expression level of the genes in the condition $k$ and $\sigma_k$ is the standard deviation. Following [33], we select those conditions with $\mathrm{ICV}_k > 1$, i.e., where the average expression levels in the particular condition $k$ are higher than the standard deviation. This score reflects the fact that, in a relevant condition, the genes show an increment of their expression above the individual variations caused by random noise. Details on the expression analysis can be found in Ref. [33] and at https://doi.org/10.1371/journal.pone.0020938.s001. Thus, we obtain expression levels organized by the relevant experimental conditions, which are labeled according to the GEO database [31]. From these data, we calculate the co-expression matrix using the Pearson correlation coefficient between the expression levels of two genes $i$ and $j$ in the relevant conditions for genes in a fiber. For off-diagonal correlations between genes in different fibers, we use the combined sets of conditions of both genes.

Results for the correlation matrix are shown in Fig. 2a (bottom) for fibers regulated by the crp-fis strongly connected component. Gene expression is obtained for every gene, so we plot the correlation matrix calculated over each pair of genes. Genes that belong to the same operon are transcribed as a single unit by the same mRNA molecule, so these genes are expected to trivially synchronize (variations exist due to attenuators inside the operon). Thus, we group these genes together as operons in the figure to indicate this trivial synchronization. To test the existence of fiber synchronization we compare the co-expression of genes belonging to different operons. Figure 2a (bottom) shows that expression levels of the genes that belong to a fiber are highly correlated, as predicted by the symmetry fibration. Genes that belong to different fibers show no significant correlations among them. In particular, there is no significant correlation between the expression of genes in a given fiber and the two master regulators crp and fis.
This result is consistent with the fibration symmetry and occurs despite the fact that both crp and fis directly regulate all genes in the studied fibers. We find some weak off-diagonal correlations between fibers (e.g., malI), probably indicating missing links or missing regulatory processes that produce extra synchronizations. Some genes present weak correlations inside fibers (e.g., cirA), indicating weak symmetry breaking, probably from asymmetries in the strength of the binding rates of transcription factors or in the input functions. These effects are not considered in the topological view of the input trees and can lead to desynchronization inside the fiber.

II. DISCUSSION

Fibration symmetries ensure that genes are turned on and off by the right amount to achieve the synchronization of expression levels in the fiber needed to execute cellular functions. In the fibration framework, network function can be pictured as an orchestra in which each instrument is a gene in the network. When the instruments play coherently, with structured temporal patterns, the network is functional. Here we have concentrated on the simplest temporal organization, one in which some units (instruments) act synchronously in time, a ubiquitous pattern observed in all biological networks. Our findings identify the symmetries that predict this synchronization and give rise to functionally related genes from the fibrations of the genetic network.

Unlike network motifs, which are identified by statistical overrepresentation [2], fibers in biology arise from principles of symmetry, following the tradition of how the building blocks of elementary particles have been discovered in physics and geometry [5]. Our first-principles approach to identifying building blocks is based on the circuit's theoretical and practical (rather than statistical) significance in serving minimal forms of coherent function and logic computation.

Further results shown in [34] indicate that symmetries also describe the structure of neural connectomes and that these symmetries factorize according to function. Thus, symmetries can be used to systematically organize biological diversity into building blocks using invariances in the information flow encoded in the topologies of the input trees. Genes related by symmetries are co-expressed, thus providing a functional rationale for the biological existence of these symmetries.

Acknowledgments

Research was sponsored by NIH-NIGMS R01EB022720, NIH-NCI U54CA137788/U54CA132378, NSF-IIS 1515022 and NSF-DMR 1308235. We thank L. Parra, W. Liebermeister, C. Ishida, M. Sánchez and J. D. Farmer for discussions. FM, IL, and HAM designed research, performed research, analyzed data, and wrote the paper.

[1] Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature, C47-C52 (1999).
[2] Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits (CRC Press, Boca Raton, 2006).
[3] Gell-Mann, M. The Quark and the Jaguar (Holt Paperbacks, New York, 1994).
[4] Dixon, J. D. & Mortimer, B. Permutation Groups, Graduate Texts in Mathematics, 163 (Springer-Verlag, New York, 1996).
[5] Weinberg, S. The Quantum Theory of Fields (Cambridge University Press, Cambridge, 2005).
[6] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. & Alon, U. Network motifs: simple building blocks of complex networks. Science, 824-827 (2002).
[7] Buchanan, M., Caldarelli, G., De Los Rios, P., Rao, F. & Vendruscolo, M. (editors), Networks in Cell Biology (Cambridge University Press, Cambridge, 2010).
[8] Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet., 64-68 (2002).
[9] Karlebach, G. & Shamir, R. Modeling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology, 770-780 (2008).
[10] Klipp, E., Liebermeister, W., Wierling, C. & Kowald, A. Systems Biology (Wiley-VCH, Weinheim, 2016).
[11] Gama-Castro, S. et al. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res., D133-D143 (2016). http://regulondb.ccg.unam.mx
[12] Grothendieck, A. Technique de descente et théorèmes d'existence en géométrie algébrique, I. Généralités. Descente par morphismes fidèlement plats. Séminaire N. Bourbaki, Talk no. 190, p. 299-327 (1958-1960). https://ncatlab.org/nlab/show/Grothendieck+fibration
[13] Boldi, P. & Vigna, S. Fibrations of graphs. Discrete Mathematics, 21-66 (2001). http://vigna.di.unimi.it/fibrations
[14] Golubitsky, M. & Stewart, I. Nonlinear dynamics of networks: the groupoid formalism. Bull. Amer. Math. Soc., 305-364 (2006).
[15] DeVille, L. & Lerman, E. Modular dynamical systems on networks. J. Eur. Math. Soc., 2977-3013 (2015). https://arxiv.org/abs/1303.3907
[16] Nijholt, E., Rink, B. & Sanders, J. Graph fibrations and symmetries of network dynamics. J. of Differential Equations, 4861-4896 (2016).
[17] Abrams, D. M., Pecora, L. M. & Motter, A. E. Focus issue: Patterns of network synchronization. Chaos, 094601 (2016).
[18] Pecora, L. M., Sorrentino, F., Hagerstrom, A. M., Murphy, T. E. & Roy, R. Cluster synchronization and isolated desynchronization in complex networks with symmetries. Nature Comm., 4079 (2014).
[19] Sorrentino, F., Pecora, L. M., Hagerstrom, A. M., Murphy, T. E. & Roy, R. Complete characterization of the stability of cluster synchronization in complex dynamical networks. Sci. Adv., e1501737 (2016).
[20] Stewart, I., Golubitsky, M. & Pivato, M. Symmetry groupoids and patterns of synchrony in coupled cell networks. SIAM J. Appl. Dyn. Syst., 609-646 (2003).
[21] Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. & Zhou, C. Synchronization in complex networks. Phys. Rep., 93-153 (2008).
[22] Rodrigues, F. A., Peron, T. K., Ji, P. & Kurths, J. The Kuramoto model in complex networks. Physics Reports, 1-98 (2016).
[23] Strogatz, S. Nonlinear Dynamics and Chaos: with Applications to Physics, Biology, Chemistry, and Engineering (Westview Press, Boulder, 2000).
[24] Cardon, A. & Crochemore, M. Partitioning a graph in O(|A| log |V|). Theoretical Computer Science, 85-98 (1982).
[25] Kamei, H. & Cock, P. J. A. Computation of balanced equivalence relations and their lattice for a coupled cell network. SIAM J. Appl. Dyn. Syst., 352-382 (2013).
[26] Norris, N. Universal covers of graphs: isomorphism to depth n - 1 implies isomorphism to all depths. Discrete Applied Mathematics, 61-74 (1995).
[27] OEIS Foundation Inc. (2020), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A003269
[28] Gardner, M. The Scientific American Book of Mathematical Puzzles and Diversions, Vol. II, p. 101 (Simon and Schuster, 1961).
[29] Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Nat. Acad. Sci. USA, 7821-7826 (2002).
[30] Kim, M. et al. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nat. Commun., 13090 (2016).
[31] Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Holko, M. et al. NCBI GEO: archive for functional genomics data sets: update. Nucleic Acids Res., D991-D995 (2016).
[32] Kolesnikov, N., Hastings, E., Keays, M., Melnichuk, O., Tang, Y. A., Williams, E., Dylag, M., Kurbatova, N., Brandizi, M., Burdett, T. et al. ArrayExpress update: simplifying data submissions. Nucleic Acids Res., D1113-D1116 (2015).
[33] Moretto, M., Sonego, P., Dierckxsens, N., Brilli, M., Bianco, L., Ledezma-Tejeida, D., Gama-Castro, S., Galardini, G., Romualdi, C., Laukens, C., Collado-Vides, J., Meysman, P. & Engelen, K. COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses. Nucleic Acids Res., D620-D623 (2016). http://colombos.net
[34] Morone, F. & Makse, H. A. Symmetry group factorization reveals the structure-function relation in the neural connectome of Caenorhabditis elegans. Nature Comm., 4961 (2019).
[35] Boldi, P., Lonati, V., Santini, M. & Vigna, S. Graph fibrations, graph isomorphism, and PageRank. RAIRO Inform. Théor., 227-253 (2006).
[36] Boldi, P. & Vigna, S. An effective characterization of computability in anonymous networks. In J. L. Welch, editor, Distributed Computing. 15th International Conference, DISC 2001, number 2180 in Lecture Notes in Computer Science, pp. 33-47 (Springer-Verlag, 2001).
[37] Boldi, P. & Vigna, S. Universal dynamic synchronous self-stabilization. Distr. Comput., 137-153 (2002).

FIG. 1. Definition of input tree, symmetry fibration, fiber and base. a, The circuit controlled by the cpxR gene regulates a series of fibers as shown by the different colored genes. The circuit regulates more genes, represented by the dotted lines, which are not displayed for simplicity. The full lists of genes and operons in this circuit are in SI Table VI, ID=27, 28 and 54. b, The input tree of representative genes involved in the cpxR circuit showing the isomorphisms that define the fibers. For each fiber, we show the number of paths of length $i - 1$, $a_i$, and its branching ratio $n$. c, Isomorphism between the input trees of baeR and spy. The input trees are composed of an infinite number of layers due to the autoregulation loops at baeR and cpxR. How can one prove the equivalence of two input trees when they have an infinite number of levels? A theorem proven by Norris [26] demonstrates that it suffices to find an isomorphism up to $N - 1$ levels, where $N$ is the number of nodes in the circuit. Thus, in this case, 2 levels are sufficient to prove the isomorphism. d, Symmetry fibration $\psi$ transforms the cpxR circuit G into its base B by collapsing the genes in the fibers into one. e, Symmetry fibration of the fadR circuit and f, its isomorphic input trees. The full list of genes in this circuit appears in SI Table VI, ID=3, 4, and 58. g, Symmetric genes in the fiber synchronize their activity to produce the same activity levels. We use the mathematical model of gene regulatory kinetics from Ref. [8] (sigmoidal interactions lead to qualitatively similar results) to show the synchronization inside the fiber baeR-spy when the fiber is activated by its regulator cpxR. Notice that cpxR does not synchronize with the fiber.

FIG. 2.

Strongly connected components of the genetic network and synchronization of gene co-expression in the fibers in E. coli. a, Top, Two-gene connected component of crp-fis. This component controls a rich set of fibers as shown. We also show the symmetry fibration collapsing the graph to the base. We highlight the fiber uxuR-lgoR, which sends information to its regulator exuR and forms a 2-Fibonacci fiber $|\varphi = 1.\ldots, \ell = 2\rangle$, as well as the double-layer composite |add-oxyS⟩ = | , ⟩ ⊕ | , ⟩. a, Bottom, Co-expression correlation matrix calculated from the Pearson coefficient between the expression levels of each pair of genes in Fig. 2a. Synchronization of the genes in the respective fibers is corroborated by the block structure of the matrix. b, The core of the E. coli network is the strongly connected component formed by genes involved in the pH system as shown. This component supports two Fibonacci fibers, 3-FF and 4-FF, and fibers as shown. Hollow colored circles indicate genes that are in fibers and also belong to the pH component.

FIG. 3. Classification of building blocks in E. coli. a, Basic fiber building blocks. These building blocks are characterized by a fiber that does not send back information to its regulator. They are characterized by two integer fiber numbers: $|n, \ell\rangle$. We show selected examples of circuits, input trees and bases. The full list of fibers appears in SI Table VI and Supplementary File 1. The statistical count of every class is in SI Table I. The last example shows a generic building block for a general n-ary tree $|n, \ell\rangle$ with $\ell$ regulators. b, Complex Fibonacci and multilayer building blocks. These building blocks are more complex and are characterized by an autoregulated fiber that sends back information to its regulator. This creates a fractal input tree that encodes a Fibonacci sequence with golden branching ratio in the number of paths $a_i$ versus path length, $i - 1$. When the information is sent to the connected component that includes the regulator, a cycle of length $d$ is formed and the topology is a generalized Fibonacci block with golden ratio $\varphi_d$ as indicated. We find three such building blocks: 2-FF, 3-FF and 4-FF. The last panel shows a multilayer composite fiber with a feed-forward structure.

FIG. 4. Fibration landscape across domains and species. a, Fibration landscape for biological networks. Total number of fiber building blocks across 5 types of biological networks analyzed in the present work. The count includes the total number of fibers in the networks of each biological type considering all species analyzed for each type (see SI Table IV). b, Fibration landscape across species. Count of fibers across each analyzed species. Each panel shows the count over the different types of biological networks (E. coli contains only the transcriptional network, see SI Table IV). c, Fibration landscape across domains. Count of fibers across the major domains studied. The biological domain panel is calculated over all networks and species in a and b. d, Global fibration landscape. Cumulative count of fibers in all domains in c. The cumulative count represents the total number of fibers per network of 10 nodes. Specifically, the quantity is calculated as the total number of fibers divided by the total number of nodes in all networks per domain multiplied by 10.

Supplementary Information

Fibration symmetries uncover the building blocks of biological networks

Flaviano Morone, Ian Leifer, Hernán A. Makse

Contents

I. Results
    ... E. coli network
    E. Fiber building blocks
    F. Fibonacci fibers
    G. Multi-layer composite fibers
    H. Fibration landscape across biological networks, species and system domains
    I. Gene co-expression and synchronization via symmetry fibration
II. Discussion
References
III. Transcriptional regulatory network of E. coli
IV. Symmetry fibrations
V. Algorithm to find fibers with minimal balance coloring
VI. Strongly connected component
VII. Statistics of fibers in the TRN of E. coli
    ... E. coli
VIII. Datasets of biological and non-biological networks

III. TRANSCRIPTIONAL REGULATORY NETWORK OF E. COLI

To define the transcriptional regulatory network (TRN) we use the transcription factor-gene target bipartite network of Escherichia coli K-12 obtained from the RegulonDB data source (http://regulondb.ccg.unam.mx). RegulonDB manually curates all transcriptional regulations from literature searches [11]. We download all transcriptional regulatory interactions catalogued in RegulonDB version 9.0 from http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt, last accessed September 15, 2018.

The database downloaded from RegulonDB is composed of a bipartite transcription factor-gene target network. In this bipartite dataset, a directed link between a source transcription factor (TF) and a target gene means that the TF binds to the DNA sequence at the binding site of the target gene to regulate its rate of transcription. In E. coli, each gene expresses a single TF (this is not the case for eukaryotic genes, which contain introns, so that splicing of protein-coding RNA can produce many proteins from a single gene). Therefore, a gene-gene regulatory network can be constructed from the bipartite transcription factor-gene target network by associating each TF with the gene that expresses it. Then, a directed link in the TRN from gene i → gene j implies that gene i encodes a TF that controls the rate of transcription of gene j. Thus, a directed link encodes the combined processes of transcription, translation and TF binding to a target gene. We denote genes in bacteria in italics, e.g., gadX, and the corresponding protein as GadX. Thus, we say that gene i sends a genetic 'message' to gene j and the 'messenger' is the TF. The history of all messages passing in the network defines the information flow in the network. A TF can be an activator, a repressor, or can have a dual function. For the purpose of calculating isomorphisms between input trees, the dual interactions are treated as distinct interactions. Thus, these three interactions are treated as three different types.

For the purpose of building the TRN it is important to distinguish between genes encoding TFs and the rest of the genes, which encode the rest of the proteins (enzymes, kinases, transport proteins, etc.). A TF is a regulatory protein that regulates a gene by binding, and therefore will always have an out-going link in the network. There are other regulatory proteins (like kinases, histones, coactivators, etc.) that regulate gene expression but do not have a DNA-binding domain, so they regulate gene expression without binding. In our TRN, genes that encode a protein that is not a TF do not have out-going links in the network. They only have in-going links and therefore are dangling ends in the network. In E. coli most of these proteins are enzymes that catalyze biochemical reactions in the metabolic network. Other proteins are involved in transport and signaling processes (kinases) in the cell.

TFs are also activated by effector molecules (metabolites) that bind non-covalently to an allosteric site of the TF and alter its conformation, activating or deactivating it by controlling the binding/unbinding of the TF to DNA. Effectors can also produce covalent activation of the TF, for instance during phosphorylation mediated by kinases in the two-component TFs.

We treat these effector activities as external parameters, determined by the growth conditions in the surrounding system (the cell in its changing environment) or by the metabolic network, which is considered external to the TRN. These external perturbations are considered as the external growth conditions when we analyze the co-expression profiles in Section I I. In the present study, the metabolic network is considered external to the TRN, so we do not consider feedback loops from the TRN to the metabolic network and back to the TRN mediated by effector metabolites. This extended network is treated in a follow-up.

In E. coli, genes are also grouped by operons. An operon is a set of contiguous genes that are transcribed as a single unit from the same mRNA molecule, with the same promoter site upstream of all genes and a terminator downstream [11]. An operon can contain genes encoding TF or non-TF proteins, and more than two TFs can be part of the operon. Since the genes of an operon are transcribed by the same RNA molecule, we group these genes into a single node in the network. This is certainly the case when the operon has a single promoter transcribing the full operon. However, there is some ambiguity in the construction of the network using the definition of operon in RegulonDB when there are promoters in the middle of the operon and these promoters transcribe more than one TF in the operon, forming different transcription units. An example is the operon gadAXW in the gad system, which is important in the pH strongly connected component in Fig. 2b. This operon expresses two TFs, GadX and GadW, and one enzyme, GadA. Here, each gene has its own promoter and terminator, and thus they are different nodes in the network. Moreover, each TF is regulated by different TFs and each TF regulates different genes. As seen in Fig. 2b, for instance, GadX binds to hns but GadW does not. Also, gadW is regulated by ydeO but ydeO does not regulate gadX. Thus, putting these two genes together in the same operon gadAXW would miss all these links. Thus, when two TFs with different promoters are part of the operon, we consider the TFs as different genes. On the other hand, the non-TF genes in operons are always put together with other genes in the operon. For instance, the gadAXW operon from RegulonDB is considered as two nodes: gadW and gadAX. To simplify notation, when an operon contains one TF and several non-TF proteins, we call this operon by the name of the TF. For instance, gadAX is simply called gadX, and the operon rbsDACBKR is called rbsR, so that the TF rbsR represents the entire operon rbsDACBKR. Finally, when all the genes in the operon are non-TF, we call the operon by the names of all its genes, as for instance lsrACDBFG-tam.

In the RegulonDB database there are a total of 4690 genes. Out of these genes, RegulonDB provides a bipartite network consisting of 1843 genes with interactions from or to other genes; the remaining genes are not considered in the analysis. There are 192 genes that encode TFs. We cluster the genes into 313 operons as explained above. Full names of operons and genes appear in SI Table VI. After grouping the genes into operons, the network is reduced to 879 nodes. There are 1835 directed edges with an average in-degree (or out-degree) of 2.1. In this network we find 91 different fibers that encompass 416 different nodes. We find that 28 nodes are involved in 7 strongly connected components of size larger than one node, and the rest are single-node connected components.
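The construction of the gene-gene TRN from the bipartite TF-gene list can be sketched as follows. This is an illustrative sketch, not the pipeline used in the paper: the record layout `(TF, target gene, effect)`, the helper names, the name-based TF-to-gene mapping (GadX → gadX) and the toy records, including their signs, are assumptions made for the example and are not taken from the RegulonDB file.

```python
def tf_to_gene(tf_name):
    """Map a TF protein name to its gene name by lower-casing the first
    letter (GadX -> gadX), following the naming convention used in the text."""
    return tf_name[0].lower() + tf_name[1:]


def build_trn(records, operon_of=None):
    """Build the set of directed, typed edges (source, target, effect).

    records   : iterable of (tf, target_gene, effect); effect is one of
                '+', '-', '+-' and the three types are kept distinct.
    operon_of : optional dict mapping a gene to the node (operon) that
                represents it after grouping genes into operons.
    """
    operon_of = operon_of or {}
    edges = set()
    for tf, target, effect in records:
        source_gene = tf_to_gene(tf)
        source = operon_of.get(source_gene, source_gene)
        dest = operon_of.get(target, target)
        edges.add((source, dest, effect))
    return edges


def average_in_degree(edges):
    """Average in-degree = number of edges / number of nodes."""
    nodes = {n for source, dest, _ in edges for n in (source, dest)}
    return len(edges) / len(nodes)


if __name__ == "__main__":
    # Toy records; the signs and the GadX -> gadA link are illustrative only.
    records = [("GadX", "hns", "-"), ("YdeO", "gadW", "+"), ("GadX", "gadA", "+")]
    # gadA is grouped with gadX, as in the gadAX node described above.
    trn = build_trn(records, operon_of={"gadA": "gadX"})
    print(sorted(trn), average_in_degree(trn))
```

As a consistency check on the reported statistics, 1835 directed edges over 879 nodes gives an average in-degree of about 2.09, in agreement with the quoted value of 2.1.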

IV. SYMMETRY FIBRATIONS

Below we provide formal definitions of the main concepts used in the paper: (a) input trees and isomorphisms, (b) fibrations and the surjective minimal graph fibrations called here symmetry fibrations, (c) fibers and minimal bases, and (d) the minimal balance coloring algorithm. We start with a (non-exhaustive) review of the literature.

The literature on fibrations and groupoids crosses the fields of mathematics, computer science and dynamical systems theory. The notion of fibration was first introduced by Grothendieck as fibrations between categories in algebraic geometry [12]. The original paper of Grothendieck was published as a part of the Séminaire N. Bourbaki in 1958 [12]. A mathematical account of Grothendieck fibrations in the context of category theory appears in https://ncatlab.org/nlab/show/Grothendieck+fibration. For a review of the history of fibrations from Grothendieck to modern studies, see the blog of Vigna at http://vigna.di.unimi.it/fibrations/. The formulation of Grothendieck is highly abstract and differs from our present work, which refers to the notion of surjective minimal graph fibration, which is a fibration between graphs. The works of Boldi & Vigna [13] and DeVille & Lerman [15] on graph fibrations are the closest to our formulation, see http://vigna.di.unimi.it/ftp/papers/FibrationsOfGraphs.pdf. Graph fibrations have been applied in computer science to understand PageRank [35] and the state of synchrony of processors in distributed computing systems [36, 37], where fibrations are the key concept in the computation of identical states in distributed systems. The relation between surjective minimal graph fibrations and synchronous subspaces is elaborated in DeVille & Lerman [15] and Nijholt, Rink & Sanders [16]. It should be noted that all these works on fibrations pertain to a highly abstract mathematical level which, in turn, provides the concept of fibration with a quite broad applicability. For a more accessible reading on fibrations within the particular context of their application to biological networks, the reader is recommended to follow our paper and supplementary sections.

FIG. 5: Group symmetries and fibrations with their input tree. a, Example of a network with a symmetry group. The automorphism shown maps the network into another network leaving invariant the connectivity of every node in the network [4, 14, 17, 18]. b, A network without automorphisms but with a fibration. The addition of a single out-link from 3 → 7 breaks the automorphism, but the input trees of nodes 2 and 3 remain isomorphic (c). There are no more isomorphisms, as shown by the rest of the input trees. Therefore, nodes 2 and 3 form a fiber. Nodes 4 and 5 also form another fiber, yet independently of the other fiber. The fibration is a morphism that maps the network into a base which is formed by collapsing the isomorphic nodes into one, i.e., collapsing nodes 2 and 3 together, and nodes 4 and 5 together. The resulting base is also called a quotient graph.

In parallel, the work of Golubitsky and Stewart [14, 20] and others in dynamical systems theory considers the equivalent formalism of symmetry groupoids, equitable partitions of balanced colored nodes and their relation with synchronization [21–23]. A review of the groupoid formalism and its application to synchronization in dynamical systems appears in [14]. DeVille and Lerman [15] also discuss the relation between graph fibrations and the groupoid formalism.

Synchronization also arises as a consequence of permutation symmetries in the network, called automorphisms [4], which form symmetry groups and are different from symmetry fibrations and symmetry groupoids. There is a large literature in the dynamical systems community dealing with cluster synchronization from automorphisms, since synchronization is a ubiquitous phenomenon across all sciences [21–23]. Reviews range from the work of Golubitsky and Stewart [14, 20] to recent work in [17–19] and references therein. Symmetry groups are the cornerstone of physical phenomena appearing in all physical systems [5].

Below, to elaborate on the definition of symmetry fibrations, we first compare fibrations to automorphisms, which form symmetry groups [4, 14, 17–19], using the example networks of Figs. 5a and 5b. An automorphism is a transformation that preserves the full connectivity of the network. That is, an automorphism preserves not only the inputs but also the outputs of each node in the network, and therefore it imposes more stringent conditions on the connectivity than symmetry fibrations, which preserve only the input trees. For example, the network of Fig. 5a is invariant under the automorphism defined by the permutation:

$$
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 1 & 3 & 2 & 5 & 4 & 6 \end{pmatrix},
\tag{6}
$$

because the nodes are connected exactly to the same nodes before and after the application of the permutation $\sigma$, which is a global mirror symmetry.

Next, consider the slightly modified network depicted in Fig. 5b left, which differs from the network in Fig. 5a by one extra out-going link from node 3 to 7. In this network, the permutation of nodes 2 ↔ 3 and 4 ↔ 5, Eq. (6), is not an automorphism anymore, because it does not preserve the in and out connectivities of all nodes, e.g., node 3 is connected with 7 but loses this connection after the permutation (Fig. 5b right). It is interesting to see how fragile group symmetries are: if we connect just one extra node to the network as shown in Fig. 5b, the symmetry (i.e., the network automorphism group) is broken. This occurs because automorphisms require very strict arrangements of nodes and links to preserve, rigidly, the global structure of the network. Fibration symmetries, with their emphasis on the preservation of the input trees only, are less restrictive. This might explain why fibration symmetries emerged in living systems as opposed to the more restrictive automorphisms, which describe all aspects of matter, from elementary particles to atoms, molecules and phases of matter.

This example raises the following question: are there extra symmetries in the network shown in Fig. 5b beyond its automorphisms? The answer to this question is, indeed, yes: there are extra symmetries in the network of Fig. 5b, the fibration symmetries [12, 13], which do not form a group [4] but groupoids [14]. A groupoid is a set of transformations satisfying the axioms of invertibility, identity and associativity but not the composition law (closure) [14], while in a group, transformations satisfy the four axioms. For this reason, groupoids are fundamentally different algebraic structures compared with traditional group symmetries.
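The fragility of the automorphism under the extra link 3 → 7 can be verified directly. The sketch below is illustrative only: the edge list is an assumption read off the verbal description of Fig. 5 in this section (node 2 receives links from 3 and 4, node 3 from 2 and 5, nodes 4 and 5 from 6), and the two links into node 1 are a further assumption, since the inputs of node 1 are not spelled out in the text.

```python
def is_automorphism(edges, perm):
    """Return True if relabeling every node with `perm` maps the directed
    edge set onto itself. Nodes missing from `perm` are left fixed."""
    mapped = {(perm.get(u, u), perm.get(v, v)) for u, v in edges}
    return mapped == set(edges)


# Directed edges (source, target). The links into node 1 are assumed.
FIG5A = {(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (2, 1), (3, 1)}
# Fig. 5b adds a single out-going link from node 3 to node 7.
FIG5B = FIG5A | {(3, 7)}

# Mirror permutation swapping 2 <-> 3 and 4 <-> 5.
SIGMA = {1: 1, 2: 3, 3: 2, 4: 5, 5: 4, 6: 6}

if __name__ == "__main__":
    print(is_automorphism(FIG5A, SIGMA))  # symmetric network
    print(is_automorphism(FIG5B, SIGMA))  # the edge 3 -> 7 maps to 2 -> 7
```

The check compares full edge sets, so it tests both inputs and outputs of every node. This is exactly the condition that the link 3 → 7 violates, while leaving the inputs of nodes 2 and 3 untouched.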

A. Input tree

Roughly speaking, symmetry ﬁbrations take into account only the input trees of thenodes, but not the output-trees (this is not true though when the input and output trees36re connected). Thus, node 3 in Fig. 5b is connected to node 7 via an out-going link, andthis link destroys the symmetry group, but node 3 is still symmetric with 2 via a symmetryﬁbration, since the input trees of nodes 2 and 3 are isomorphic, even though node 3 isconnected with 7. This is because the connection 3 → input tree ,which contains the full information received by a given node through the totality of all thepossible paths ending in that node and starting from every other node in the network. Thus,for every node i in the network G there is a corresponding input tree, called T i , which isdeﬁned as a tree with a selected node r i , called the root, and such that every other node isa path P j → i of G starting from j and ending in i [16]. A link from node P j → i to node P k → i exists if P j → i = e j → k P k → i =, where e j → k is an edge of G .The concept of input tree has appeared in the literature as the universal total space intraditional categorical or topological terminology [12], the universal total graph from [13], theview in the theory of distributed systems, or the unfolding of a nondeterministic automatonin concurrency theory [13].For example, let us construct the input tree T of node 2 in the network on the left ofFig. 5b. The root is the node r at the uppermost level of the tree. Every other node of theinput tree of node 2 is a path P j → ending in 2. There are two paths of length 1: P (1)3 → and P (1)4 → ; three paths of length 2: P (2)2 → , P (2)5 → , and P (2)6 → ; and so on. Since P (2)2 → = e → P (1)3 → ,we put a link in the input tree from P (2)2 → to P (1)3 → because P (2)2 → = e → P (1)3 → . We thenadd all other links in the input tree using the same criterion. The resulting input tree T isshown in Fig. 
5c, together with the input trees of all other nodes in the network in Fig. 5b.

To simplify, we label each node of $T_i$ using the starting point of the corresponding path $P_{j\to i}$. For example, in $T_2$ nodes $P^{(1)}_{3\to 2}$ and $P^{(1)}_{4\to 2}$ are labeled 3 and 4, respectively, and the length of the path is equal to the depth of the node in the input tree.

Thus, in practice, we arrive at the following way to construct the input tree: we start with the node at the root, let's say node 2. We label every node $P_{j\to 2}$ in the input tree by the node $j$ where the path starts. The first layer of the input tree consists of all the nodes that are at a distance one from the root, in this case nodes 3 and 4. Thus we add two links to 2 from 3 and 4 in the input tree.

The second layer of the input tree is obtained by applying the same procedure to each node in the first layer, 3 and 4. For instance, node 3 receives a link from 2 and 5. Therefore the second layer of the input tree contains nodes 2 and 5 connected to node 3. We repeat the procedure with the other node in the first layer: node 4. Node 4 receives a link only from node 6, and node 6 from no one. So, we add a link from 6 to 4 and this path does not propagate further. The third layer of the input tree is obtained by iteratively applying the same procedure, and so on.

We note that the input trees of nodes 1, 2, 3 and 7 are infinite since the network contains a cycle (or loop) between nodes 2 ⇄ 3. For instance, $T_2$ is infinite because there are paths crossing the loop infinitely many times. On the other hand, the input trees of nodes 4, 5 and 6 are finite since they do not cross the loop.

B. Isomorphic input trees

The input tree $T_i$ at node $i$ can be interpreted as the collection of all possible ‘histories’ starting at some node and ending in node $i$. As shown in Section I C, if two input trees $T_i$ and $T_j$ are isomorphic, then the corresponding nodes $i$ and $j$ in network $G$ have the same dynamical state [15, 16]. This equivalence is understood in terms of a local in-isomorphism that maps nodes to nodes and links to links, so it formalizes the fact that the dynamical interactions represented by a directed link from gene to gene could in principle be different across genes, as long as the links are the same (or similar, in case the produced synchronization is approximate) inside the fiber.

An isomorphism between $T_i$ and $T_j$ is defined as a bijective map $\tau: T_i \to T_j$, which maps one-to-one the nodes and edges of $T_i$ to nodes and edges of $T_j$.

A minimal condition for the existence of an isomorphism between two input trees is that they have the same number of nodes, since the mapping needs to be bijective, i.e., to have an inverse (we could also add the condition of having the same degree sequence). Thus, it is clear that there can be no isomorphism between the input trees of nodes 2 and 4, since the former contains an infinite number of nodes and the latter just two. By inspection it is then clear that there is an isomorphism between the input trees of nodes 4 and 5. This isomorphism is the map $\tau_{4\to 5}: T_4 \to T_5$, and it is written as a transformation with the following notation:

$$\tau_{4\to 5} = \begin{pmatrix} 4 & 6 \\ \downarrow & \downarrow \\ 5 & 6 \end{pmatrix}, \quad \text{(isomorphism between input trees of nodes 4 and 5)}, \qquad (7)$$

which maps the root of $T_4$ to the root of $T_5$ as $\tau_{4\to 5}(4) = 5$, and node $6 \in T_4$ to node $6 \in T_5$ as $\tau_{4\to 5}(6) = 6$. The notation starts with the root of the tree, and then we write the nodes in each level from top to bottom, going from left to right within each level.
In this particular example the links are of the same type, so there is no need to specify the mapping between links in the isomorphism, but in general the local equivalence requires that nodes are mapped to nodes and that links are mapped to links of the same type by the isomorphism.

The map in Eq. (7) is one of the simplest isomorphisms since the input tree contains only one level. In this particular case, to see that the input trees $T_4$ and $T_5$ are isomorphic, it is enough to see that both nodes 4 and 5 receive a link from one and the same node, which is node 6 in this case. That is, the input trees of nodes 4 and 5 are isomorphic because they are both made up of just two nodes and one edge, and this isomorphism implies that 4 and 5 receive the same information. This is the simplest form of an isomorphism between input trees. In this case, we say that nodes 4 and 5 have the same input-set, which is an input tree of only one level, that is, the set of incoming links. The input-set is used in the groupoid formalism in Ref. [14].

Next, we consider the input trees of nodes 2 and 3. By visual inspection, both input trees have the same ‘shape’. However, these trees are infinite in the number of levels. How do we decide if two input trees are isomorphic when they have an infinite number of levels? Remarkably, to determine if two input trees are isomorphic, it suffices to check that they are isomorphic up to level $N - 1$, where $N$ is the total number of nodes in the network $G$. This is an important result that allows us to avoid checking an infinite number of equivalences. Since $G$ has $|N_G| = 7$ nodes, we use six levels in the input trees to determine that there is an isomorphism between $T_2$ and $T_3$, which corresponds to the following map (written level by level, starting from the root):

$$\tau_{2\to 3} = \begin{pmatrix} 2 & 3 & 4 & 2 & 5 & 6 & \dots \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \dots \\ 3 & 2 & 5 & 3 & 4 & 6 & \dots \end{pmatrix}, \quad \text{(isomorphism between input trees of 2 and 3)}. \qquad (8)$$

There are no other isomorphisms between the other input trees.
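The truncated comparison of input trees can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it encodes each input tree, cut at depth $N - 1$, as a nested sorted tuple (a standard canonical form for rooted unlabeled trees) and compares the encodings. The edge list is read off the description of Fig. 5b, and the two links into node 1 are an assumption made for illustration.

```python
from collections import defaultdict

# (source, target) pairs. The links into node 1 are an ASSUMPTION;
# the remaining links follow the description of Fig. 5b in the text.
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7), (2, 1), (3, 1)]


def in_neighbors(edges):
    """Map every node to the list of nodes that send it a link."""
    preds = defaultdict(list)
    for source, target in edges:
        preds[target].append(source)
    return preds


def tree_signature(node, depth, preds, cache):
    """Canonical form of the input tree of `node`, cut `depth` levels below the root."""
    key = (node, depth)
    if key not in cache:
        if depth == 0:
            cache[key] = ()
        else:
            children = (tree_signature(p, depth - 1, preds, cache) for p in preds[node])
            cache[key] = tuple(sorted(children))
    return cache[key]


def isomorphic_input_trees(i, j, edges, n_nodes):
    """Compare the input trees of nodes i and j up to level n_nodes - 1."""
    preds = in_neighbors(edges)
    cache = {}
    depth = n_nodes - 1
    return tree_signature(i, depth, preds, cache) == tree_signature(j, depth, preds, cache)


print(isomorphic_input_trees(2, 3, EDGES, 7))
print(isomorphic_input_trees(4, 5, EDGES, 7))
print(isomorphic_input_trees(2, 4, EDGES, 7))
```

The memoization on `(node, depth)` keeps the cost polynomial even though the unfolded trees grow exponentially with depth.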
Notice that T is not isomorphic to T and T by just one link to the root.

The existence of an isomorphism $\tau$ from the input tree of node $i$ to the input tree of node $j$ implies the synchronization of $x_i$ and $x_j$ [15]. In the groupoid formalism of Golubitsky and Stewart, two nodes are said to be synchronized if their input-sets are synchronized, too [14]. Analogous work in dynamical systems shows that automorphisms in networks lead to synchronized nodes in orbits, see [17–20] and references therein. The orbit of a given node is obtained by applying all automorphisms of a network to the node, and the nodes in the orbit are synchronous. The synchronized orbits obtained from automorphisms are analogous to the synchronized fibers obtained from symmetry fibrations. In general, every orbit is also a fiber, but the opposite is not true, since a fiber is not necessarily an orbit.

In our analysis of the E. coli network, we find some automorphisms. Some of the star fibers with $n = 0$ are also orbits of the network since they are invariant under the permutation symmetries of the symmetric group of order $n$, $S_n$. But this holds only when the genes in the star have no out-going links. As shown in the example of Fig. 5, an out-going link in any of the star genes will destroy the automorphism, but not the fiber. For this reason, automorphisms are somewhat more prevalent in undirected networks. For instance, we have found that automorphisms describe the symmetries of the gap junction connectome of C. elegans, which is composed entirely of undirected links [34]. In the case of the directed biological networks treated here, while automorphisms could be of use to discover some synchronized nodes, the majority of synchronization is due to symmetry fibrations, which are not described by automorphisms.

C. From fibrations to symmetry fibrations via isomorphic input trees and minimal bases

A fibration is any morphism from a network $G = (N_G, E_G)$ to a base $B = (N_B, E_B)$: $\psi: G \to B$ [12]. If a network $G = (N_G, E_G)$ has at least one pair of isomorphic input trees, then there exists a network $B = (N_B, E_B)$, called the base of $G$, such that $G$ can be ‘fibered’ over $B$ by the graph fibration. The base $B$ is defined as follows:

• a node $I \in N_B$ is a representative of the set of nodes $\{i \in N_G\}$ whose input trees are isomorphic;
• an edge $e_{I\to J}$, where $I, J \in N_B$, is defined as $e_{I\to J} = \sum_{i\in I} e_{i\to j}$, where $e_{i\to j} \in E_G$.

Having defined the base network $B$, we say that $G$ is fibered over $B$ if there exists a surjective morphism $\psi: G \to B$, called a surjective graph fibration [13], that maps nodes and edges of $G$ to nodes and edges of $B$ as: $\psi(i) = I$ for all $i \in N_G$, and $\psi(e_{i\to j}) = e_{I\to J}$. A surjective morphism is a map between two sets (the domain and codomain) where each element of the codomain (in this case $B$) is mapped to by at least one element of the domain (in this case $G$). The set of nodes $i \in N_G$ that are mapped to the same node $I \in N_B$, denoted by $\psi^{-1}(I)$, is called the fiber of $G$ over node $I$. We notice that all input trees of nodes which belong to the same fiber are pairwise isomorphic.

In general, a surjective graph fibration $\psi$ can map nodes with isomorphic input trees to different nodes of the base; thus, the number of fibers is not minimal.

A surjective graph fibration that maps all genes with isomorphic input trees to a single common node in $B$ is called a surjective minimal graph fibration in the sense of [13]. Such a minimal fibration generates the minimal base of the network and produces the largest collapse of nodes into fibers.
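The collapse of a network onto its base can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' implementation: `FIBER_OF` is a hypothetical input that maps each node to the representative of its fiber (the fibers {2, 3} and {4, 5} are those discussed in the text), the links into node 1 are assumed, and the base keeps one copy of the links received by each representative, counted as multiplicities.

```python
from collections import Counter

# (source, target) pairs. The links into node 1 are an ASSUMPTION;
# the remaining links follow the description of Fig. 5b in the text.
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7), (2, 1), (3, 1)]

# Hypothetical input: each node mapped to the representative of its fiber.
FIBER_OF = {1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 6, 7: 7}


def build_base(edges, fiber_of):
    """Collapse every fiber onto its representative.

    Only the links received by a representative are kept, so each base edge
    (I, J) carries the number of links that J receives from nodes in fiber I.
    """
    base = Counter()
    for source, target in edges:
        if fiber_of[target] == target:  # target is the representative of its fiber
            base[(fiber_of[source], target)] += 1
    return dict(base)


print(build_base(EDGES, FIBER_OF))
```

In this sketch the link 3 → 2 becomes a self-loop (2, 2) in the base, since nodes 2 and 3 belong to the same fiber.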
In this work we only deal with surjective minimal graph fibrations and we call them symmetry fibrations for short.

In practice, a symmetry fibration maps $G$ to the minimal base $B$ (analogous to the quotient) through the following steps: (i) consider all the nodes in a fiber (which have isomorphic input trees) and choose one as the representative $I$; (ii) collapse the nodes in the fiber into one single node in $B$ and call it by the name of the representative node $I$; (iii) for every link of a node $j$ in $G$ directed to the node $I$ in $G$, add a link in $B$ from $j$ to $I$. If the node $j$ belongs to the fiber, then the corresponding link in $B$ is an autoregulation loop in $B$; (iv) repeat for every fiber in $G$. When fibers belong to disjoint components of the network, they are considered as distinct fibers.

V. ALGORITHM TO FIND FIBERS WITH MINIMAL BALANCE COLORING

The algorithm to partition the network into fibers is based on the ‘minimal balanced coloring’ algorithm developed by Cardon & Crochemore in Ref. [24]. Here we follow a version developed by Kamei & Cock [25] to construct a minimal balanced coloring of a network, namely a coloring that employs the least possible number of colors, which is associated with minimal graph fibrations. The algorithm's runtime scales as $O(|E_G| \log |N_G|)$, which implies that it is essentially linear in the network size, especially for sparse networks, and can be applied to very large networks.

The theory of balanced coloring is explained in Ref. [14]. A balanced coloring creates a partition of the nodes of $G$ into disjoint sets (corresponding to synchronous fibers) such that each node in one set receives the same number of colors from nodes within other sets [14, 20]. A coloring of $G$ with this property is a balanced coloring and represents an equitable partition of the network, see [14, 20]. The sets identified by a minimal balanced coloring partition the network with the minimal number of colors and correspond to the fibers of $G$ identified by minimal graph fibrations $\psi$ [13–15].

Thus, we color nodes such that synchronous nodes in a fiber receive the same colors from their synchronous nodes. As an example, the genes baeR and spy (Fig. 1a) have the same color and are in the same fiber since they receive the same colors from their neighbors: both baeR and spy receive one red color via the activator link from one red node (baeR from itself and spy from baeR) and one green activator link each from the green node cpxR.

The algorithm constructs a coloring of the nodes that is balanced. A coloring is balanced if two identically colored nodes are connected to identically colored nodes via their inbound links. Each balanced colored cluster is a fiber in the network. The fibers also correspond to the orbits in a network when the symmetries are automorphisms rather than isomorphisms in the input trees.
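A minimal Python sketch of the balanced-coloring idea is given here. It is a naive iterative refinement written for clarity, not the $O(|E_G| \log |N_G|)$ procedure of Refs. [24, 25]: all nodes start with one color, and each round recolors the nodes by the multiset of colors on their inbound links until the number of colors stops growing. The edge list is an assumption based on the seven-node example network described in the text (the two links into node 1 are assumed), and link types such as activation or repression are ignored.

```python
# (source, target) pairs. The links into node 1 are an ASSUMPTION.
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7), (2, 1), (3, 1)]


def minimal_balanced_coloring(nodes, edges):
    """Refine a one-color partition until every color class is balanced."""
    color = {v: 0 for v in nodes}
    n_colors = 1
    while True:
        # Collect the colors arriving at each node through its inbound links.
        received = {v: [] for v in nodes}
        for source, target in edges:
            received[target].append(color[source])
        # Nodes with the same sorted list of inbound colors share a new color.
        palette = {}
        new_color = {}
        for v in nodes:
            key = tuple(sorted(received[v]))
            new_color[v] = palette.setdefault(key, len(palette))
        color = new_color
        if len(palette) == n_colors:  # no class was split: coloring is balanced
            return color
        n_colors = len(palette)


def fibers(color):
    """Group nodes by color and return the groups as sorted lists."""
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, []).append(v)
    return sorted(sorted(g) for g in groups.values())


print(fibers(minimal_balanced_coloring(range(1, 8), EDGES)))
```

Because each round can only split existing color classes, an unchanged number of colors means the partition itself is unchanged, which is why the color count is a valid stopping test in this sketch.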
The flow of the algorithm is exemplified with the example network of Fig. 6.

FIG. 6: Algorithm to find the fibers of a network through a minimal balanced coloring. The goal of the algorithm is to find a minimal balanced coloring of the network, so that two nodes have the same color only if they are connected to the same number of identically colored nodes via inbound links. The colors represent the fibers in the network.

• Step 1 - We start by assigning the same color to all nodes. In Fig. 6a all nodes are initially colored in blue. In addition, we assign to each link the same color as the node from which it emanates. To update the coloring (or, equivalently, to generate a new partition) of nodes, we construct the table shown in the right panel of Fig. 6a, as explained next. In the top row of this table we put the network nodes colored with their current color. In the leftmost column we put each type of colored link. In this initial stage of the algorithm we only have a blue link for all the nodes. Then, we fill the entries of the table with the number of colored links of this blue type that are received by the corresponding node. For example, nodes 1, 2 and 3 each receive two blue links. Nodes 4, 5 and 7 receive one blue link each, and node 6 none. The structure of this table determines the new coloring as explained in the next step.

• Step 2 - Using the table in Fig. 6a we update the coloring of nodes as follows. We assign the same color to all nodes that receive the same number of colored links of each type. Specifically, nodes 1, 2 and 3 receive two blue links, so we assign them the same (blue) color. Analogously, nodes 4, 5 and 7 receive one blue link, so we assign them the same color, but different from blue. We assign them a purple color. Similarly, we assign another color to node 6 (green). We then obtain the colored network in the left of Fig. 6b. Counting the colored links received by each node in this network, where each link has the color of the node from which it emanates, we obtain the new coloring table shown in the right panel of Fig. 6b.

• Step 3 - Using the same criterion as in Step 2, we update the coloring of nodes, comprising now five different colors, and then we generate the new table, as shown in Fig. 6c. At this point the algorithm stops, because we do not need to introduce more colors, since each color is balanced. Each color corresponds to a fiber, and each node in each colored fiber receives the same colors from other fibers or from nodes in the same fiber. Therefore, the coloring shown in the network of Fig. 6c is the minimal balanced coloring of the network, and the colors indicate the fibers in the network.

As long as only minimal fibrations are considered, the algorithm will always return the same fibers containing the same nodes, for any initial condition and realization. Below we provide the pseudo-code to clarify the algorithm. More detailed instructions and methodology for obtaining fiber building blocks will be given in a follow-up paper. We start by assigning all nodes to the same fiber and then continue to refine the partition based on the input set of each node until no further refinement can be obtained.

Algorithm 1

Finding fibers following Kamei & Cock, Ref. [25]

Input: Graph $G = \{N_G, E_G\}$, where $N_G$ are the vertices and $E_G$ are the edges of the analyzed network; $|N_G|$ is the number of vertices, $N_G = \{v_1 \dots v_{|N_G|}\}$
Output: $C = \{c_i\}$, where $c_i$ is the color of node $i$ and $i = 1 \cdots |N_G|$
Notation: $I_i = \{I_i^1 \dots I_i^{N_j}\}$, where $N_j$ is the current number of colors

    N_0 = 1
    for i = 1 ... |N_G| do
        c_i = 1
    end for
    j = 0
    repeat
        for i = 1 ... |N_G|, k = 1 ... N_j do
            I_i^k = number of nodes of color k in the input set of v_i
        end for
        H = set of all unique {I_i}
        // assign each unique vector a color and color the graph accordingly
        for i = 1 ... |N_G| do
            c_i = index of I_i in H    // if two nodes i and l have I_i = I_l, then c_i = c_l
        end for
        j = j + 1
        N_j = |H|
    until N_j = N_{j-1}
    return {c_i}

VI. STRONGLY CONNECTED COMPONENT

In a directed network, a strongly connected component is composed of nodes that are all reachable from one another. That is, there is a directed path from every node to every other node in the strongly connected component. A weakly connected component is obtained when we ignore the directionality of the links. Strongly connected components are relevant to genetic fibers since they contain loops that control the state of the genes. We find four types of strongly connected components. The first type consists of single-gene components composed of autoregulation loops, like cpxR and fadR in Figs. 1a and 1e. The other types of components are those in Fig. 2a and Fig. 2b, and a five-gene connected component shown in SI Fig. 7. We note that most of the fibers regulated by these components do not belong to the connected component. This is because they receive information but do not send information back to the connected component. These fibers are characterized by integer fiber numbers. When the fiber receives and sends back information, that is, when the fiber belongs to the strongly connected component, then it becomes a Fibonacci fiber. The largest strongly connected component in the E. coli network controls the pH system shown in Fig. 2b.
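The definition of a strongly connected component can be illustrated with a short Python sketch. It uses the simplest possible method, intersecting the set of nodes reachable from a node with the set of nodes that can reach it, rather than a linear-time algorithm such as Tarjan's. The edge list is an assumption based on the seven-node example network used in the text, not the E. coli network.

```python
from collections import defaultdict

# (source, target) pairs of the small example network; the links into
# node 1 are an ASSUMPTION made for illustration.
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7), (2, 1), (3, 1)]


def reachable(start, adjacency):
    """Return all nodes that can be reached from `start`, including itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(nodes, edges):
    """Nodes u and v share a component when each is reachable from the other."""
    forward, backward = defaultdict(list), defaultdict(list)
    for source, target in edges:
        forward[source].append(target)
        backward[target].append(source)
    components, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        component = reachable(v, forward) & reachable(v, backward)
        components.append(sorted(component))
        assigned |= component
    return components


print(strongly_connected_components(range(1, 8), EDGES))
```

In this example the only component with more than one node is the loop between nodes 2 and 3; nodes 4, 5 and 6 send information into the loop but receive nothing back, so each forms a component of its own.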

VII. STATISTICS OF FIBERS IN THE TRN OF E. COLI

A. Fiber statistics in E. coli

SI Table I shows the counts in the E. coli network of each building block. For instance, the most abundant building blocks are the following:

$\lvert n = 0, \ell = 1\rangle$: 45
$\lvert n = 1, \ell = 0\rangle$: 13
$\lvert n = 0, \ell = 2\rangle$: 13
$\lvert n = 1, \ell = 1\rangle$: 8

The list is completed with the fractal building blocks of Fibonacci sequences, which are less numerous but more complex in their structure:

$\lvert \varphi = 1.\ldots, \ell = 2\rangle$: 1
$\lvert \varphi = 1.\ldots, \ell = 1\rangle$: 1
$\lvert \varphi = 1.\ldots, \ell = 1\rangle$: 1

FIG. 7: A five-gene connected component of soxR, soxS, fnr, fur, and arcA with its regulated fibers.

| Structure type | Amount in E. coli |
|---|---|
| $\lvert n = 0, \ell = 1\rangle$ | 45 |
| $\lvert n = 0, \ell = 2\rangle$ | 13 |
| $\lvert n = 0, \ell = 3\rangle$ | |
| $\lvert n = 1, \ell = 0\rangle$ | 13 |
| $\lvert n = 1, \ell = 1\rangle$ | 8 |
| $\lvert n = 1, \ell = 2\rangle$ | |
| $\lvert n = 2, \ell = 0\rangle$ | |
| $\lvert n = 2, \ell = 1\rangle$ | |
| $\lvert \varphi_d = 1.\ldots, \ell = 1\rangle$ | 1 |
| $\lvert \varphi_d = 1.\ldots, \ell = 1\rangle$ | 1 |
| $\lvert \varphi_d = 1.\ldots, \ell = 2\rangle$ | 1 |
| Total number of building blocks | |

B. Full list of fibers in E. coli

SI Table VI shows the complete list of the 91 fiber building blocks found in the genetic network of E. coli. We list the genes in the fiber plus their external regulators. If a gene or operon is not in this list, for instance lacZYA, it means that the gene or operon is not in a fiber. Supplementary File 1 shows the plot of the circuit of every fiber and the fiber building block.

The first column in SI Table VI is the ID of the fiber. This ID refers to the plot of the fiber building block in Supplementary File 1. The second column lists the genes in the fiber, and the third column lists the external regulators. The last column specifies the fiber number associated with each fiber as $\lvert n, \ell\rangle$ or $\lvert \varphi_d, \ell\rangle$.

VIII. DATASETS OF BIOLOGICAL AND NON-BIOLOGICAL NETWORKS

To investigate the applicability of fibrations in a broader context, we performed an extensive analysis of different complex networks from diverse domains in systems science. Full details of each network analyzed can be accessed at https://docs.google.com/spreadsheets/d/1-RG5vR_EGNPqQcnJU8q3ky1OpWi3OjTh5Uo-Xa0PjOc . The codes to reproduce this analysis are at github.com/makselab and the full datasets appear at kcorelab.org . See also the tables below with information about the networks.

We first show the symmetry fibrations in biological networks and species. See Section I H. We characterize biological networks spanning:

• Biological networks: transcriptional regulatory networks, metabolic networks, cellular processes networks and pathways, disease networks, neural networks.

We study the following species:

• Species: A. thaliana, E. coli, B. subtilis, S. enterica (salmonella), M. tuberculosis, D. melanogaster, S. cerevisiae (yeast), M. musculus (mouse), and H. sapiens (human).

We then study non-biological networks in Section I H:

• Social Networks: online social networks, Facebook, Twitter, Wikipedia, Youtube, email networks, communication networks, citation networks, collaboration networks, bloggers
• Internet: routers, autonomous systems, web graphs, hyperlinks, peer-to-peer
• Infrastructure Networks: power grid, airport, roads, flights
• Economic Networks
• Software Networks: Linux, jdk
• Ecosystems

| Network Domain | Total No. of nodes | Total No. of edges | No. of networks |
|---|---|---|---|
| Biological | 287390 | 4211856 | 289 |
| Economic | 1752 | 108639 | 5 |
| Ecosystems | 1879 | 5378 | 14 |
| Infrastructure | 24511 | 82534 | 16 |
| Internet | 244634 | 835565 | 27 |
| Social | 104909 | 1261009 | 15 |
| Software | 43391 | 503645 | 3 |

TABLE II: Features of the networks across domains. We report the total numbers for each domain summed over all the networks in the domain.

| Species | Total No. of nodes | Total No. of edges | No. of networks |
|---|---|---|---|
| Yeast | 55932 | 1392926 | 11 |
| Arabidopsis thaliana | 790 | 1431 | 1 |
| Bacillus subtilis | 5602 | 11417 | 3 |
| Drosophila | 39549 | 321734 | 5 |
| Escherichia coli | 879 | 1835 | 1 |
| Human | 72587 | 1198712 | 248 |
| Mycobacterium tuberculosis | 1624 | 3212 | 1 |
| Mouse | 64709 | 987424 | 7 |
| Salmonella | 8293 | 15589 | 6 |

TABLE III: Number of networks per species.

| | Arabidopsis thaliana | Bacillus subtilis | Caenorhabditis elegans | Cat | Drosophila | Escherichia coli | Human | Mycobacterium tuberculosis | Mouse | Rat | Salmonella | Yeast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TF | 1 | 2 | 2 | 0 | 4 | 1 | 4 | 1 | 4 | 0 | 2 | 11 |
| Neuron | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 3 | 3 | 0 | 0 |
| Metabolic | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 0 | 0 | 0 | 2 | 0 |
| Disease | 0 | 0 | 0 | 0 | 0 | 0 | 66 | 0 | 0 | 0 | 0 | 0 |
| Kinase | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Pathway | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 0 | 0 | 0 | 0 | 0 |
| Protein | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 |

TABLE IV: Table with the count of networks per type of biological network and species. These networks are used to calculate the distributions of fibers across species and biological types in Figs. 4a, b, and c. For each type of biological network in Fig. 4a, b, we calculate the count over the total number of networks as indicated at the end of each row for each biological type. The same occurs with the number of networks at the end of each column for each species.
Figure 4c shows the counts over all the networks shown in the last row/column.

| Network Subdomain | Total No. of nodes | Total No. of edges | No. of networks |
|---|---|---|---|
| Autonomous systems graphs | 141842 | 481415 | 14 |
| Bitcoin | 9664 | 59777 | 2 |
| Collaboration networks | 50260 | 504897 | 4 |
| Disease | 4309 | 15254 | 66 |
| Facebook | 4039 | 88234 | 1 |
| Youtube subscriptions | 13723 | 76765 | 1 |
| Internet peer-to-peer networks | 31978 | 110154 | 4 |
| Jazz | 198 | 5484 | 1 |
| Linux | 30837 | 213954 | 1 |
| Metabolic | 4273 | 33829 | 50 |
| Networks with ground-truth communities | 1005 | 25571 | 1 |
| Neural networks | 3694 | 129812 | 8 |
| Cellular processes and Pathways | 9825 | 54712 | 127 |
| Plant-Pollinator | 1631 | 2719 | 11 |
| Plant-Seed-Disperser | 65 | 165 | 2 |
| Power grid | 4941 | 6594 | 1 |
| Sentiment | 99 | 278 | 2 |
| Transcriptional regulatory | 260258 | 3908769 | 32 |

TABLE V: Subtypes of networks belonging to the different domains.

| Id | Fiber | Regulators | Fiber Number |
|---|---|---|---|
| 1 | aaeR, ampDE, azuC, comR, cyaA, narQ, sohB, speC, spf, trxA, yaeP-rof, yaeQ-arfB-nlpE, yjeF-tsaE-amiB-mutL-miaA-hfq-hflXKC | crp | $\lvert n = 0, \ell = 1\rangle$ |
| 2 | | | $\lvert n = 0, \ell = 1\rangle$ |
| 3 | | | $\lvert n = 1, \ell = 0\rangle$ |
| 4 | | | $\lvert n = 1, \ell = 1\rangle$ |
| 5 | | | $\lvert n = 0, \ell = 2\rangle$ |
| 6 | | | $\lvert n = 0, \ell = 3\rangle$ |
| 7 | | | $\lvert n = 0, \ell = 1\rangle \oplus \lvert n = 1, \ell = 1\rangle$ |
| 8 | | | $\lvert \varphi_d = 1.\ldots, \ell = 1\rangle$ |
| 9 | | | $\lvert n = 1, \ell = 0\rangle$ |
| 10 | alaA-yfbR, avtA, leuE, livJ, livKHMGF, lysU, sdaA | lrp | $\lvert n = 0, \ell = 1\rangle$ |
| 11 | alaE, kbl-tdh, yojI | lrp | $\lvert n = 0, \ell = 1\rangle$ |
| 12 | alaWX, argU, argW, argX-hisR-leuT-proM, aspV, flxA, glyU, leuQPV, leuX, lptD-surA-pdxA-rsmA-apaGH, lysT-valT-lysW, metT-leuW-glnUW-metU-glnVX, pheU, pheV, proK, proL, queA, serT, serX, thrU-tyrU-glyT-thrT-tufB, thrW, trmA, tyrTV-tpr, valUXY-lysV | fis | $\lvert n = 0, \ell = 1\rangle$ |
| 13 | aldB, hupB | crp, fis | $\lvert n = 0, \ell = 2\rangle$ |
| 14 | allA, allS, gcl-hyi-glxR-ybbW-allB-ybbY-glxK | allR | $\lvert n = 0, \ell = 1\rangle$ |
| 15 | alsR, rpiB | | $\lvert n = 1, \ell = 0\rangle$ |
| 16 | amiA-hemF, cmk-rpsA-ihfB, uspB | IHF | $\lvert n = 0, \ell = 1\rangle$ |
| 17 | | | $\lvert n = 1, \ell = 0\rangle$ |
| 18 | ampC, dacC | bolA | $\lvert n = 0, \ell = 1\rangle$ |
| 19 | araE-ygeA, araFGH | araC, crp | $\lvert n = 0, \ell = 2\rangle$ |
| 20 | arcZ, ydeA | arcA | $\lvert n = 0, \ell = 1\rangle$ |
| 21 | argA, argCBH, argE, argF, argI, argR, artJ, artPIQM, lysO | | $\lvert n = 1, \ell = 0\rangle$ |
| 22 | argO, lysP | argP, lrp | $\lvert n = 0, \ell = 2\rangle$ |
| 23 | aroF-tyrA, tyrB | tyrR | $\lvert n = 0, \ell = 1\rangle$ |
| 24 | aroH, trpLEDCBA, trpR | | $\lvert n = 1, \ell = 0\rangle$ |
| 25 | asnB, clpPX-lon, glsA-ybaT, uspE | gadX | $\lvert n = 0, \ell = 1\rangle$ |
| 26 | aspA-dcuA, dcuR | crp, fnr, narL | $\lvert n = 0, \ell = 3\rangle$ |
| 27 | bacA, cpxPQ, cpxR, ftnB, ldtC, ldtD, ppiD, sbmA-yaiW, slt, srkA-dsbA, xerD-dsbC-recJ-prfB-lysS, yccA, yebE, yidQ, yqaE-kbp, yqjA-mzrA | | $\lvert n = 1, \ell = 0\rangle$ |
| 28 | baeR, spy | cpxR | $\lvert n = 1, \ell = 1\rangle$ |
| 29 | bcsABZC, fnrS, pdeF, pepT, pitA, ravA-viaA, tar-tap-cheRBYZ, upp-uraA, xdhABC, ydeJ, ytiCD-idlP-iraD | fnr | $\lvert n = 0, \ell = 1\rangle$ |
| 30 | bdcA, dkgB, grxD, mepH, mhpT, pgpC-tadA, rfe-wzzE-wecBC-rffGHC-wecE-wzxE-rffT-wzyE-rffM, rybB, tehAB, tsgA, ydbD, yeaE | nsrR | $\lvert n = 0, \ell = 1\rangle$ |
| 31 | betI, betT | arcA, cra | $\lvert n = 1, \ell = 2\rangle$ |
| 32 | bioA, bioBFCD | birA | $\lvert n = 0, \ell = 1\rangle$ |
| 33 | bluF, ydeI | rcdA | $\lvert n = 0, \ell = 1\rangle$ |
| 34 | borD, envY-ompT, mgrB, mgrR, mgtLA, mgtS, pagP, rstA, ybjG | phoP | $\lvert n = 0, \ell = 1\rangle$ |
| 35 | cbpAM, gltX, gyrB, msrA | fis | $\lvert n = 0, \ell = 1\rangle$ |
| 36 | cdaR, garD, gudPXD | | $\lvert n = 1, \ell = 0\rangle$ |
| 37 | | | $\lvert n = 1, \ell = 0\rangle$ |
| 38 | cirA, entCEBAH, fepA-entD, fiu | crp, fur | $\lvert n = 0, \ell = 2\rangle$ |
| 39 | copA, cueO | cueR | $\lvert n = 0, \ell = 1\rangle$ |
| 40 | cra, pitB, sbcDC | phoB | $\lvert n = 0, \ell = 1\rangle$ |
| 41 | crl 1, exbBD, fepDGC, fhuACDB, fhuE, gpmA, metJ, nohA-ydfN-tfaQ, ryhB, ygaC, yhhY, yjjZ | fur | $\lvert n = 0, \ell = 1\rangle$ |
| 42 | cusCFBA, cusR, yedX | hprR, phoB | $\lvert n = 1, \ell = 2\rangle$ |
| 43 | cvpA-purF-ubiX, glrR-glnB, hflD-purB, lolB-ispE-prs, purC, purEK, purL, speAB | purR | $\lvert n = 0, \ell = 1\rangle$ |
| 44 | cysDNC, cysK, tcyP, yciW, ygeH, yoaC | cysB | $\lvert n = 0, \ell = 1\rangle$ |
| 45 | cytR, nagC, nagE, ycdZ | crp | $\lvert n = 1, \ell = 1\rangle$ |
| 46 | dapB, lysC | argP | $\lvert n = 0, \ell = 1\rangle$ |
| 47 | ddpXABCDF, patA, potFGHI, yeaGH, yhdWXYZ | ntrC | $\lvert n = 0, \ell = 1\rangle$ |
| 48 | decR, mlaFEDCB, yncE | marA | $\lvert n = 0, \ell = 1\rangle$ |
| 49 | dgcC, iraP, nlpA, wrbA-yccJ, yccT | csgD | $\lvert n = 0, \ell = 1\rangle$ |
| 50 | dicB-ydfDE-insD-7-intQ, dicC-ydfXW | dicA | $\lvert n = 0, \ell = 1\rangle$ |
| 51 | dsdC, norR | nsrR | $\lvert n = 1, \ell = 1\rangle$ |
| 52 | dtpA, omrA, omrB | ompR | $\lvert n = 0, \ell = 1\rangle$ |
| 53 | ecpA, ecpR | matA | $\lvert n = 0, \ell = 1\rangle$ |
| 54 | efeU 1U 2, motAB-cheAW, psd-mscM, tsr, ung | cpxR | $\lvert n = 0, \ell = 1\rangle$ |
| 55 | epd-pgk-fbaA, gapA-yeaD, mpl | cra, crp | $\lvert n = 0, \ell = 2\rangle$ |
| 56 | erpA, iscR, rnlAB | | $\lvert n = 1, \ell = 0\rangle$ |
| 57 | evgA, nhaR | hns | $\lvert \varphi_d = 1.\ldots, \ell = 1\rangle$ |
| 58 | fabA, fabB | fabR, fadR | $\lvert n = 0, \ell = 2\rangle$ |
| 59 | fadE, fadIJ | arcA, fadR | $\lvert n = 0, \ell = 2\rangle$ |
| 60 | fbaB, fruBKA, glk, gpmM-envC-yibQ, pfkA, ppc, pykF, pyrG-eno, tpiA | cra | $\lvert n = 0, \ell = 1\rangle$ |
| 61 | | | $\lvert n = 0, \ell = 1\rangle$ |
| 62 | folE-yeiB, metA, metC, metF | metJ | $\lvert n = 0, \ell = 1\rangle$ |
| 63 | fpr, pqiABC, rirA-waaQGPSBOJYZU | marA, soxS | $\lvert n = 0, \ell = 2\rangle$ |
| 64 | fucAO, fucR, zraR | crp | $\lvert n = 1, \ell = 1\rangle$ |
| 65 | gfcA, ybhL, yfiR-dgcN-yfiB, ymiA-yciX | yjjQ | $\lvert n = 0, \ell = 1\rangle$ |
| 66 | hupA, trg | crp, fis | $\lvert n = 0, \ell = 2\rangle$ |
| 67 | ibaG-murA, rplU-rpmA-yhbE-obgE | mlrA | $\lvert n = 0, \ell = 1\rangle$ |
| 68 | ibpAB, yadV-htrE | IHF | $\lvert n = 0, \ell = 1\rangle$ |
| 69 | idnK, idnR | crp, gntR | $\lvert n = 1, \ell = 2\rangle$ |
| 70 | isrC-flu, pth-ychF | oxyR | $\lvert n = 0, \ell = 1\rangle$ |
| 71 | lgoR, uxuR | crp, exuR | $\lvert \varphi_d = 1.\ldots, \ell = 2\rangle$ |
| 72 | lolA-rarA, osmB | rcsB | $\lvert n = 0, \ell = 1\rangle$ |
| 73 | lsrACDBFG-tam, lsrR, oxyR, rbsR | crp | $\lvert n = 1, \ell = 1\rangle$ |
| 74 | malI, mlc | crp | $\lvert n = 1, \ell = 1\rangle$ |
| 75 | manA, yhfA | crp | $\lvert n = 0, \ell = 1\rangle$ |
| 76 | mngAB, mngR | | $\lvert n = 1, \ell = 0\rangle$ |
| 77 | nadA-pnuC, nadB | nadR | $\lvert n = 0, \ell = 1\rangle$ |
| 78 | nimR, nimT | | $\lvert n = 1, \ell = 0\rangle$ |
| 79 | ompX, rpsP-rimM-trmD-rplS, ychO, ysgA | fnr | $\lvert n = 0, \ell = 1\rangle$ |
| 80 | pepD, yhbTS | csgD | $\lvert n = 0, \ell = 1\rangle$ |
| 81 | phoP, slyB | | $\lvert n = 2, \ell = 0\rangle$ |
| 82 | pspABCDE, pspG | IHF, pspF | $\lvert n = 0, \ell = 2\rangle$ |
| 83 | purR, pyrC | fur | $\lvert n = 1, \ell = 1\rangle$ |
| 84 | rhaR, rhaS | crp | $\lvert n = 2, \ell = 1\rangle$ |
| 85 | rrsA-ileT-alaT-rrlA-rrfA, rrsE-gltV-rrlE-rrfE | fis, lrp | $\lvert n = 0, \ell = 2\rangle$ |
| 86 | rrsB-gltT-rrlB-rrfB, rrsC-gltU-rrlC-rrfC, rrsD-ileU-alaU-rrlD-rrfD-thrV-rrfF, rrsG-gltW-rrlG-rrfG, rrsH-ileV-alaV-rrlH-rrfH | fis, hns, lrp | $\lvert n = 0, \ell = 3\rangle$ |
| 87 | ssb, uvrA | arcA, lexA | $\lvert n = 0, \ell = 2\rangle$ |
| 88 | ttdABT, ttdR | | $\lvert n = 1, \ell = 0\rangle$ |
| 89 | | | $\lvert n = 0, \ell = 1\rangle$ |
| 90 | yegRZ, yfdX-frc-oxc-yfdVE | evgA | $\lvert n = 0, \ell = 1\rangle$ |