Fibration symmetries uncover the building blocks of biological networks
Flaviano Morone, Ian Leifer, Hernán A. Makse
Levich Institute and Physics Department, City College of New York, New York, NY 10031
Abstract
A major ambition of systems science is to uncover the building blocks of any biological network to decipher how cellular function emerges from their interactions. Here, we introduce a graph representation of the information flow in these networks as a set of input trees, one for each node, which contains all pathways along which information can be transmitted in the network. In this representation, we find remarkable symmetries in the input trees that deconstruct the network into functional building blocks called fibers. Nodes in a fiber have isomorphic input trees and thus process equivalent dynamics and synchronize their activity. Each fiber can then be collapsed into a single representative base node through an information-preserving transformation called ‘symmetry fibration’, introduced by Grothendieck in the context of algebraic geometry. We exemplify the symmetry fibrations in gene regulatory networks and then show that they universally apply across species and domains from biology to social and infrastructure networks. The building blocks are classified into topological classes of input trees characterized by integer branching ratios and fractal golden ratios of Fibonacci sequences representing cycles of information. Thus, symmetry fibrations describe how complex networks are built from the bottom up to process information through the synchronization of their constitutive building blocks.

arXiv [q-bio.MN]

A central theme in systems science is to break down the system into its fundamental building blocks to then uncover the principles by which complex collective behavior emerges from their interactions [1–3]. In number theory, every natural number can be represented by a unique product of primes. Thus, prime numbers are the building blocks of natural numbers. This mathematical notion of building blocks is extended to the more abstract notion of group theory since finite groups can also be factored into simple subgroups [4].
The latter example, entirely abstract as it may be, has important implications for natural systems due to the fundamental relationship between group theory and the notion of symmetry, which has led to the discovery of the fundamental building blocks of matter, such as quarks and leptons [3, 5]. Here we ask whether similar principles of symmetry can uncover the fundamental building blocks of biological networks [1, 2, 6, 7]. Primary examples of these networks are gene regulatory networks that control gene expression in cells [2, 8–10], as well as metabolic networks, cellular processes and pathways, neural networks and ecosystems and, beyond biology, other information-processing networks like social and infrastructure networks [7]. Previous studies have identified building blocks or ‘network motifs’ [2, 6, 8] by looking for patterns in the network that appear more often than they would by pure chance. The crux of the matter is to test whether the building blocks of these networks obey a predictive design principle that explains how the cell functions, and whether such a principle can be expressed in the language of symmetries.

We introduce the use of symmetries in biological networks by analyzing the transcriptional regulatory network of the bacterium Escherichia coli [11], since this is a well-characterized network. We find that this network exhibits fibration symmetries [12–14], first introduced by Grothendieck [12] in the context of algebraic geometry.

Symmetry fibrations are morphisms between networks that identify clusters of synchronized genes (called fibers) with isomorphic input trees. Genes in a fiber are collapsed by a symmetry fibration into a single representative gene called the base. The fibers are then the synchronized building blocks of the genetic network, and symmetry fibrations are transformations that preserve the dynamics of information flow in the network.
We use this symmetry principle to classify the building blocks into topological classes of input trees characterized by integer branching ratios and complex topologies with golden ratios of Fibonacci sequences representing cycles in the network. We then show that symmetry fibrations explain synchronization patterns of gene co-expression in cells and universally apply to a range of complex networks across different species and domains beyond biology.
I. RESULTS
We search for symmetries in the E. coli transcriptional regulatory network (most updated compilation at RegulonDB [11]) where nodes are genes and a directed link represents a transcriptional regulation (see Supplementary Information Section III).

A directed link from a source gene $i$ to a target gene $j$ in a transcriptional regulatory network represents a direct interaction where gene $i$ encodes a transcription factor that binds to the binding site of gene $j$ to regulate (activate or repress) its expression. Such a link represents a regulatory ‘message’ sent by the source to the target gene using the transcription factor as a ‘messenger’. This process defines the ‘information flow’ in the system, which is not restricted to two interacting genes but is transferred to different regions within the network that are accessible through the connecting pathways. The information arriving at a gene contains the entire history transmitted through all pathways that reach this gene.

We formalize this process of communication between genes with the notion of the ‘input tree’ of the gene. In a network $G = (N_G, E_G)$ with $N_G$ nodes and $E_G$ directed edges, for every gene $i \in N_G$ there is a corresponding input tree, denoted as $T_i$, which is the tree of all pathways of $G$ ending at $i$. More precisely, $T_i$ is a rooted tree with a selected node $i$ at the root, such that every other node $j$ in the tree represents the initial node of a path in the network ending at $i$.

Next, we analyze the input trees in the E. coli sub-circuit shown in Fig. 1a regulated by gene cpxR, which regulates its own expression (via an autoregulation activator loop) and also regulates other genes as shown in the figure. Gene cpxR is not regulated by any other transcription factor in the network, so we say that this gene forms its own ‘strongly connected component’ (see below). Therefore, it is an ideal simple circuit to explain the concept of fibration.
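The definition of $T_i$ can be made concrete with a minimal sketch that counts how many nodes sit in each layer of an input tree. The function name `input_tree_layers` and the three-gene edge list are illustrative assumptions: the edges are transcribed from the textual description of the cpxR, baeR and spy genes in Fig. 1a, not taken from RegulonDB or from the authors' code.

```python
from collections import defaultdict


def input_tree_layers(edges, root, depth):
    """Return the sizes of the first `depth` layers of the input tree of `root`.

    Layer 1 is the root itself; layer i holds one tree node for every
    path of length i - 1 in the network that ends at the root.
    """
    incoming = defaultdict(list)
    for source, target in edges:
        incoming[target].append(source)

    layer = [root]
    sizes = []
    for _ in range(depth):
        sizes.append(len(layer))
        # Each tree node spawns one child per regulator of its gene.
        layer = [src for node in layer for src in incoming[node]]
    return sizes


# Hypothetical edge list (source, target) for part of the cpxR circuit.
edges = [
    ("cpxR", "cpxR"),  # autoregulation of cpxR
    ("cpxR", "baeR"),
    ("baeR", "baeR"),  # autoregulation of baeR
    ("cpxR", "spy"),
    ("baeR", "spy"),
]

for gene in ("spy", "baeR", "cpxR"):
    print(gene, input_tree_layers(edges, gene, 5))
```

Under these assumptions the layer sizes of spy and baeR coincide, while those of cpxR differ. Equal layer sizes are only a necessary condition for isomorphic input trees, so this is a quick screen rather than a proof of isomorphism.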
A. Input tree representation
In practice, the input tree of a gene is constructed as follows (SI Section IV A). Consider the circuit in Fig. 1a. The input tree of gene spy depicted in Fig. 1b starts with spy at the root (first layer). Since this gene is upregulated by baeR and cpxR, the second layer of the input tree contains these two pathways of length one starting at both genes. Gene baeR is further upregulated by cpxR and by itself through the autoregulation loop, and cpxR is also autoregulated. Thus, the input tree continues to the third layer, taking into account these three possible pathways of length 2, one starting at baeR and two starting at cpxR. The procedure now continues, and since there are loops in the circuit, the input tree has an infinite number of layers.

The input tree formalism is a powerful framework to search for symmetries in information-processing networks, in that it replaces the canonical notion of a single trajectory with the set of all possible ‘histories’ from an initial to a final state of the network. This makes it, in practice, reasonably straightforward to ‘guess’ a type of symmetry which is not apparent in the classical network framework. Based on results from [13–16], we will show in Section I C that if two input trees have the same ‘shape’, then the genes at the root of the input trees synchronize their activity [17–23], even though their input trees are made of different genes. This informal notion of equivalence is formalized by isomorphisms. An isomorphism between two input trees is a bijective map that preserves the topology of the input trees including the type of links. Specifically, a map $\tau : T \to T'$ is an isomorphism iff for any pair of nodes $a$ and $b$ of $T$ connected by a link, the pair of nodes $\tau(a)$ and $\tau(b)$ of $T'$ is connected by the same type of link (SI Section IV B). In practice, this means that isomorphic input trees are ‘the same’ except for the labeling of the nodes. Genes with isomorphic input trees are symmetric and synchronous.
We quantify this result next by introducing the concept of symmetry fibration [13].

B. Symmetry fibration of a network
The set of all input tree isomorphisms defines the symmetries of the network, which can be described by a ‘Grothendieck fibration’ [12]. The original Grothendieck definition of fibration is between categories [12], so the passage to a definition of fibrations between graphs requires associating a category with a graph and rephrasing Grothendieck’s definition in elementary terms. Different categories may be associated with a graph, giving rise to different notions of fibrations between graphs. The notion of fibration that we use henceforth has been introduced in computer science as a ‘surjective minimal graph fibration’ [13, 15].

In general, a graph fibration of $G = (N_G, E_G)$ is any morphism

$$\psi : G \to B \tag{1}$$

that maps $G$ to a graph $B = (N_B, E_B)$ (with $N_B$ nodes and $E_B$ edges) called the ‘base’ of the graph fibration $\psi$ (SI Section IV C). In this work we consider a surjective minimal graph fibration [13], which is a graph fibration $\psi$ that maps all nodes with isomorphic input trees inside a fiber to a single node in $B$, thus producing the minimal base of the network. In this case, the base $B$ consists of a graph where all genes in a fiber have been collapsed into one representative node by the minimal fibration. Thus, a surjective minimal graph fibration, hereafter called symmetry fibration for the sake of lexical convenience, leads to a dimensional reduction of the network into its irreducible components. Crucially, a symmetry fibration is a dimensional reduction that preserves the dynamics in the network, as we show next.

C. Symmetry fibration leads to synchronization
Next, we explain the connection between fibration and synchrony with the generality needed to justify our results, following Refs. [15, 16]. In order to describe the dynamical state of each gene in the transcriptional regulatory network, we first attach a phase space to each node in $G = (N_G, E_G)$ by considering a map $P : N_G \to M$ that assigns each node $i \in N_G$ to the phase space of the node, denoted by the manifold $M$. For example, in a transcriptional regulatory network we assign to each gene $i \in N_G$ the phase space of real numbers $M = \mathbb{R}$. Then, the state of each gene is described by $x_i(t) \in \mathbb{R}$, representing the expression level of gene $i$ at time $t$, which is typically measured by the mRNA concentration of the gene product. We then obtain the total phase space of $G$ as the product $P_G = \prod_{i \in N_G} P(i)$.

The fibers partition the graph $G$ into unique and non-overlapping sets $\Pi = \{\Pi_1, \ldots, \Pi_r\}$, such that $\Pi_1 \cup \cdots \cup \Pi_r = G$ and $\Pi_k \cap \Pi_l = \emptyset$ if $k \neq l$ [24]. We denote $i \sim_\Pi j$ when the input trees of $i$ and $j$ are isomorphic and belong to the same fiber $\Pi_k$. That is, $\exists\, k \mid i, j \in \Pi_k$ and there exists a symmetry fibration that sends both nodes to the same node in the base, $\psi(i) = \psi(j)$. DeVille & Lerman [15] showed that symmetry fibrations induce robust synchronization in the system (Theorem 4.3.1 in [15]). In particular, it was shown that if $\psi$ is a symmetry fibration then, by Proposition 2.1.12 in Ref. [15], there exists a map $P_\psi : P_B \to P_G$ that maps the total phase space of the base $B$, named $P_B$, to the total phase space of the graph $G$. This map creates a polysynchronous subspace of synchronized solutions in fibers: $\Delta_\Pi = \{x \in P_G \mid x_i(t) = x_j(t) \text{ whenever } \psi(i) = \psi(j)\}$, where each set of synchronous components of this subspace corresponds to a fiber in $\Pi$ (Lemma 5.1.1 in [15], see also [16]).
In other words, $\Delta_\Pi$ is a polysynchronous subspace of $P_G$, such that components $x_i, x_j \in x$ synchronize (i.e., $x_i(t) = x_j(t)$) whenever the symmetry fibration $\psi$ sends them to the same node in $B$.

According to these results, we interpret synchronous genes to process the same information received through isomorphic pathways in the network and, accordingly, we interpret a symmetry fibration as a transformation that preserves the dynamics of information flow, since it collapses synchronous nodes in fibers (redundant from the point of view of dynamics) into a common base with dynamics identical to those of the fiber.

Synchronous nodes in a fiber induced by symmetry fibrations correspond to the ‘minimal balanced coloring’ in [14]. A balanced coloring assigns two nodes the same color only if their inputs, self-consistently, receive the same content of colored nodes, whence the term ‘balanced’. Thus, the flow of information arriving at genes in a fiber is analogous to a process of assigning a color to each gene such that each gene ‘receives’ the colors from adjacent genes via incoming links and ‘sends’ its color to the adjacent genes via its outgoing links. The nodes in a fiber have the same color, symbolizing the fact that they synchronize. The nodes with the same color in the balanced coloring partition [14] correspond to fibers induced by symmetry fibrations [15]. We use the minimal balanced coloring algorithm proposed in [25] for the computation of minimal bases [24] to find fibers (SI Section V).

D. Strongly connected components of the E. coli network
The input trees in the E. coli cpxR circuit are displayed in Fig. 1b. The input trees of baeR and spy are isomorphic and define the baeR-spy fiber (Fig. 1c). We call this circuit a feed-forward fiber (FFF). The input tree of cpxR is not isomorphic to that of either baeR or spy, and therefore cpxR is not symmetric with these genes, but it is isomorphic to those of bacA, slt and yebE, forming another fiber. Likewise, the input trees of genes ung, tsr and psd are all isomorphic, composing another fiber (Fig. 1b). Figure 1d shows the symmetry fibration $\psi : G \to B$ that collapses the genes in the fibers to the base $B$. Figure 1e shows another example (out of many) of a single connected component, fadR, and its corresponding isomorphic input trees (Fig. 1f), fibers and base.

The dynamical state of a gene is encoded in the topology of the input tree. In turn, this topology is encoded by a sequence, $a_i$, defined as the number of genes in the $i$-th layer of the input tree (Fig. 1b). The sequence $a_i$ represents the number of paths of length $i - 1$ that reach the gene at the root. It is characterized by the branching ratio $n$ of the input tree, defined as $a_{i+1}/a_i \to n$ as $i \to \infty$, which represents the multiplicative growth of the number of paths across the network reaching the gene at the root. For instance, the input trees of genes baeR-spy (Fig. 1b) encode a sequence $a_i = i$ with branching ratio $n = 1$ representing the single ($n = 1$) autoregulation loop inside the fiber.

Beyond several single-gene strongly connected components like those shown in Fig. 1, we find that the E. coli network has other strongly connected components (in a strongly connected component, each gene is reachable from every other gene, SI Section VI), three in total, which regulate more involved topologies of fibers. We find: (i) a two-gene strongly connected component composed of master regulators crp-fis involved in a myriad of functions like carbon utilization (Fig. 2a, top), (ii) a five-gene strongly connected component involved in the stress response system (SI Fig. 7), and (iii) the largest strongly connected component at the core of the network, which is composed of genes involved in the pH system that regulate hydrogen concentration (Fig. 2b). Each of these three components regulates a rich variety of fiber topologies which are collapsed into the base by the symmetry fibration $\psi : G \to B$, as shown in the figure.

E. Fiber building blocks
We find that the transcriptional regulatory network of E. coli is organized in 91 different fibers. The complete list of fibers in E. coli is shown in SI Section VII and SI Table VI, and the statistics are shown in SI Table I. Plots of each fiber are shown in Supplementary File 1. We find a rich variety of topologies of the input trees. Despite this diversity, the input trees present common topological features that allow us to classify all fibers into concise classes of fundamental ‘fiber building blocks’ (Figs. 3a and 3b). We associate a building block to a fiber by considering the genes in the fiber, plus the external incoming regulators of the fiber, plus the minimal number of their regulators in turn that are needed to establish the isomorphism in the fiber. When the fiber is connected to any external regulator, either via a direct link or through a path in the strongly connected component forming a cycle, then the genes in this cycle are considered part of the building block of the fiber, since such a cycle is crucial to establish the dynamical synchronization state (when there is more than one cycle, the shortest cycle is considered).

We find that the most basic input tree topologies can be classified by integer ‘fiber numbers’ $|n, \ell\rangle$ reflecting two features: (a) infinite $n$-ary trees with branching ratio $n$ representing the infinite pathways going through $n$ loops inside the base of the fiber, and (b) finite trees representing finite pathways starting at $\ell$ external regulators of the fiber. The most basic fibers in E. coli have three values of $n = 0, 1, 2$: (i) fibers with $n = 0$ loops, called Star Fibers (SF), (ii) fibers with $n = 1$ loop, called Chain Fibers (CF), and (iii) fibers with $n = 2$ loops, called Binary-Tree Fibers (BTF).
This classification does not take into account the types of repressor or activator links in the building blocks, which lead to further sub-classes of fibers that determine the type of synchronization (fixed point, limit cycles, etc.) and thus the functionality of the fibers.

Figure 3a shows a sample of dissimilar circuits that can be concisely classified by $|n, \ell\rangle$ (full list in Supplementary File 1). For instance, the $n = 0$ SF class includes dissimilar circuits like $|\text{arcZ-ydeA}\rangle = |0, 1\rangle$, $|\text{dcuC-ackA}\rangle = |0, 2\rangle$, which is a bi-fan network motif [2], and generalizations with $\ell = 3$ regulators like $|\text{dcuR-aspA}\rangle = |0, 3\rangle$ (Fig. 3a, top). The main feature of these building blocks is that they do not contain loops and therefore the input trees are finite. The CF class contains $n = 1$ loop in the fiber, and therefore an infinite chain in the input tree, like the autoregulated loop in the fiber $|\text{ttdR}\rangle = |1, 0\rangle$. We note that while the input tree is infinite, the topological class is characterized by a single number $n = 1$ concisely represented in the base. Furthermore, a theorem proven by Norris [26] demonstrates that it suffices to test $N_G - 1$ layers of the input tree to prove isomorphism. Adding an external regulator ($\ell = 1$) to this circuit converts it to the purine fiber $|\text{purR}\rangle = |1, 1\rangle$, which is an example of an FFF, like the baeR circuit in Fig. 1a. This circuit resembles a feed-forward loop motif [2], but it differs in the crucial addition of the autoregulator loop at purR that allows genes purR and pyrC to synchronize. When another external regulator is added, we find the idonate fiber $|\text{idnR}\rangle = |1, 2\rangle$. More elaborate circuits contain two autoregulated loops and feedback loops featuring trees with branching ratio $n = 2$.

F. Fibonacci fibers

So far we have analyzed building blocks that receive information from the external regulators in their respective strongly connected components, but do not send back information to the external regulators. These topologies are characterized by integer branching ratios, $n = 0, 1, 2$, as shown in Fig. 3a. We find, however, more interesting building blocks that also send information back to their regulators. These circuits contain additional cycles in the building blocks that transform the input trees into fractal trees characterized by non-integer fractal branching ratios. Notably, the building block of the fiber uxuR-lgoR that is regulated by the connected component crp-fis (Fig. 2) forms an intricate input tree (Fig. 3b, top) where the number of paths of length $i - 1$ is encoded in the sequence $a_i = 1, 3, 4, 7, 11, 18, 29, \ldots$ characterized by the Fibonacci recurrence relation: $a_1 = 1$, $a_2 = 3$, and $a_i = a_{i-1} + a_{i-2}$ for $i > 2$. This sequence leads to the non-integer branching ratio known as the golden ratio: $a_{i+1}/a_i \to \varphi = (1 + \sqrt{5})/2 = 1.618\ldots$ as $i \to \infty$.

This topology arises in the genetic network due to the combination of two cycles of information flow. First, the autoregulation loop inside the fiber at uxuR creates a cycle of length $d = 1$ which contributes to the input tree with an infinite chain with branching ratio $n = 1$. This sequence is reflected in the Fibonacci series by the term $a_i = a_{i-1}$. The important addition to the building block is a second cycle of length $d = 2$ between uxuR in the fiber and its regulator exuR: uxuR → exuR → uxuR. This cycle sends information from the fiber to the regulator and back to the fiber by traversing a path of length $d = 2$ that creates a ‘delay’ of $d = 2$ steps in the information that arrives back at the fiber (see Fig. 3b, top). This short-term ‘memory’ effect is captured by the second term $a_i = a_{i-2}$ in the Fibonacci sequence, leading to $a_i = a_{i-1} + a_{i-2}$ and the golden ratio. We call this topology a Fibonacci fiber (FF).

This argument implies that an autoregulated fiber that further regulates itself by connecting to its connected component via a cycle of length $d$ encodes a generalized Fibonacci sequence of order $d$ defined as $a_i = a_{i-1} + a_{i-d}$ with generalized golden ratio $\varphi_d$ (Fig. 3b, fourth row). We find such a Fibonacci sequence in the evgA-nhaR fiber building block linked to the pH strongly connected component shown in Fig. 2b. This fiber contains an autoregulation cycle inside the fiber and also an external cycle of length $d = 4$ through the pH strongly connected component: evgA → gadE → gadX → hns → evgA (Fig. 3b, third row). This topology forms a fractal input tree with sequence $a_i = a_{i-1} + a_{i-4}$ (sequence A003269 in [27]) and branching golden ratio $\varphi_4 = 1.38028\ldots$ We call this topology a 4-Fibonacci fiber, 4-FF.
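The recurrence $a_i = a_{i-1} + a_{i-d}$ and its limiting ratio can be checked numerically. The sketch below is illustrative: the function names are ours, and the initial values used for $d = 4$ are a hypothetical choice, since the text specifies initial conditions only for $d = 2$. The limiting ratio does not depend on positive initial values.

```python
def generalized_fibonacci(d, initial, length):
    """First `length` terms of a_i = a_{i-1} + a_{i-d}, seeded with `initial`."""
    if len(initial) != d:
        raise ValueError("need exactly d initial values")
    seq = list(initial)
    while len(seq) < length:
        seq.append(seq[-1] + seq[-d])
    return seq[:length]


def branching_ratio(d, initial, n_terms=200):
    """Approximate the limit of a_{i+1} / a_i from the last two computed terms."""
    seq = generalized_fibonacci(d, initial, n_terms)
    return seq[-1] / seq[-2]


# d = 2 with a_1 = 1, a_2 = 3, as stated in the text for the uxuR-lgoR fiber.
print(generalized_fibonacci(2, [1, 3], 7))
print(branching_ratio(2, [1, 3]))

# d = 4; the initial values [1, 1, 1, 1] are an assumption made for illustration.
print(branching_ratio(4, [1, 1, 1, 1]))
```

The ratio for order $d$ is the largest real root of $x^d = x^{d-1} + 1$, which is what the final check on the $d = 4$ case relies on.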
Generalized Fibonaccis appear inside strongly connected components, like the rcsB-adiY [...] $d$, the Fibonacci sequence generalizes to: $a_i = a_{i-1} + a_{i-2} + \cdots + a_{i-d}$, and the branching ratio satisfies: $d = -\log(2 - \varphi_d)/\log \varphi_d$ [28].

G. Multi-layer composite fibers
Building blocks can also be combined to make composite fibers, just as prime numbers or quarks can be combined to form natural numbers or composite particles like protons and neutrons, respectively. The ability to assemble fiber building blocks to make larger composites is important in that it helps to understand systematically higher-order functions of biological systems composed of many genetic elements. We discover a particular type of composite made up of two elementary building blocks, which we name multi-layer composite fiber. For instance, the double-layer add-oxyS fiber in the crp-fis connected component (see Figs. 2a and 3b bottom, and ID [...]) is $|\text{add-oxyS}\rangle = |0, 1\rangle \oplus |1, 1\rangle$, made of a series of genes composing a single fiber of type $|0, 1\rangle = |\text{add, dsbG, gor, grxA, hemH, oxyS, trxC}\rangle$ that are regulated by two different transcription factors, rbsR and oxyR, that form another fiber of type $|1, 1\rangle = |\text{rbsR, oxyR}\rangle$. This composite is of importance since it allows for information to be shared between two genes, for instance add and oxyS, which are not directly connected (in this case, separated by a distance in the network of length two).

Composite fibers satisfy a simple engineering ‘sum rule’: elementary fibers are composed in series in a predefined order where the first layer is represented by an entry fiber (carrying transcription factors), and the last layer is formed by a terminator fiber of output genes (encoding enzymes), as shown in Fig. 3b, bottom. This multi-layer composite fiber is biologically significant because genes in the output layer synchronize a genetic module that implements the same function even though the genes in the module are not directly connected and, indeed, can be at far distances in the network.
Such functionally related modules could not be identified by modularity algorithms [29], which cluster nodes in modules of highly connected nodes.

We find that composite fibers are dominant in eukaryotes (yeast, mouse, human, see Section I H). They resemble the building blocks of multilayered deep neural networks where each subsequent gene in the layer synchronizes despite the fact that nodes can be distant in the network. More generally, composite fibers with multiple layers streamline the construction of larger aggregates of fibration building blocks performing more complex functions in a coordinated fashion. These composite topologies complete the classification of input trees.

H. Fibration landscape across biological networks, species and system domains
To study the applicability of fibration symmetries across domains of complex networks we have analyzed 373 publicly available datasets (SI Section VIII). Full details of each network and results can be accessed at https://docs.google.com/spreadsheets/d/1-RG5vR_EGNPqQcnJU8q3ky1OpWi3OjTh5Uo-Xa0PjOc . The code to reproduce this analysis is at github.com/makselab (SI Section V) and the full datasets are at kcorelab.org . We analyze biological networks spanning transcriptional regulatory networks, metabolic networks, cellular processes networks and signaling pathways, disease networks, and neural networks. We span different species ranging from A. thaliana, E. coli, B. subtilis, S. enterica (salmonella), M. tuberculosis, D. melanogaster, S. cerevisiae (yeast) and M. musculus (mouse) to H. sapiens (human). The topological fiber numbers $|n, \ell\rangle$ allow us to systematically classify fibers across the different domains in a unifying way. We find that fibration symmetries are found across all biological processes and domains. The fiber distributions for each type of biological network, calculated by summing over the studied species, are displayed in Fig. 4a, and the fiber distributions for each species, calculated over the types of biological networks, are shown in Fig. 4b. Our analysis allows us to investigate the specific attributes and commonalities of the fiber building blocks inside and across biological domains. We find a varied set of fibers that characterize the biological landscape. Certain features of the fiber number distribution are visible in the transcriptional networks in Fig. 4a. For instance, a tail with $\ell$ is seen in the $n = 0$ class as well as in the $n = 1$ class. Across species (Fig. 4b), bacteria like E. coli or B. subtilis display a majority of $n = 0$ building blocks, while higher-level organisms like yeast, mouse and human display a majority of more complex building blocks like multi-layers and Fibonaccis.
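The fiber search behind this survey is the minimal balanced coloring of Section I C. The following is a minimal sketch of that idea by iterative color refinement. It is not the authors' implementation from github.com/makselab nor the algorithm of Ref. [25], and the function names and the small edge list (transcribed from the description of the cpxR circuit, with every link assumed to be an activation) are illustrative assumptions.

```python
def minimal_balanced_coloring(nodes, edges):
    """Coarsest balanced coloring by iterative refinement.

    `edges` is a list of (source, target, kind) triples. Two nodes keep
    the same color only if they receive the same multiset of
    (input color, link kind) pairs.
    """
    color = {v: 0 for v in nodes}  # start from a single color
    while True:
        ids = {}
        new_color = {}
        for v in nodes:
            inputs = sorted((color[s], kind) for s, t, kind in edges if t == v)
            # Including the current color guarantees classes are only ever split.
            signature = (color[v], tuple(inputs))
            new_color[v] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(color.values())):
            return new_color  # no class was split: the coloring is balanced
        color = new_color


def fibers(coloring):
    """Group nodes by color and return the groups as sorted lists."""
    groups = {}
    for node, c in coloring.items():
        groups.setdefault(c, []).append(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical excerpt of the cpxR circuit; link kinds are assumed.
nodes = ["cpxR", "baeR", "spy", "bacA"]
edges = [
    ("cpxR", "cpxR", "act"),
    ("cpxR", "baeR", "act"),
    ("baeR", "baeR", "act"),
    ("cpxR", "spy", "act"),
    ("baeR", "spy", "act"),
    ("cpxR", "bacA", "act"),
]
print(fibers(minimal_balanced_coloring(nodes, edges)))
```

On this toy circuit the sketch groups baeR with spy and cpxR with bacA, in line with the fibers described for Fig. 1. The quadratic scan over edges is chosen for readability; a network-scale analysis would index incoming edges per node first.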
To test the existence of symmetry fibrations across other domains we extend our studies to complex networks beyond biology ranging from social, infrastructure, internet, software, economic networks and ecosystems (details of datasets in SI Section VIII). Figure 4c shows the obtained fiber distributions for each domain. A normalized comparison across domains is visualized in Fig. 4d showing the cumulative number of fibers over all domains and species per network size of 10 nodes. The results support the applicability of the concept of symmetry fibration beyond biology to describe the building blocks of networks across all domains.

I. Gene co-expression and synchronization via symmetry fibration
We have shown in Section I C that fibers in networks determine cluster synchronizationin the dynamical system. In a gene regulatory network, symmetric genes in a fiber synchro-nize their activity to produce gene co-expression levels that sustain cellular functions. Wecorroborate this result numerically in Fig. 1g in the particular example of the baeR-spy
FFFin
E. coli , and this result applies to all fibers, irrespective of the dynamical system law.To exemplify the synchronization in fibers, we consider the dynamics in the compositefiber | add − oxyS (cid:105) = | , (cid:105)⊕| , (cid:105) depicted in Fig. 2a and Fig. 3b bottom, which is composedof autoregulator 1 = crp , and two layers of fibers: 2 = rbsR , 3 = oxyR , and 4 = add ,5 = oxyS (we consider here a reduced fiber for simplicity, and we add the autoregulatorto crp to the building block for completeness). Graph G = { N G , E G } consists of N G = { , , , , } , E G = { → , → , → , (cid:97) , (cid:97) , → , → } ( (cid:97) refers torepressor and → to activation) and a 5-dimensional total phase space P G = R with statevector X ( t ) = { x ( t ) , x ( t ) , x ( t ) , x ( t ) , x ( t ) } describing the expression levels of each gene’sproduct (e.g., mRNA concentration).The symmetry fibration ψ : G → B collapses the graph G into the base B = { N B , G B } ,where N B = { a, b, c } and E B = { a → a, a → b, b (cid:97) b, b → c } . The symmetry fibrationacts on the nodes: ψ (1) = a , ψ (2) = ψ (3) = b , ψ (4) = ψ (5) = c , and on the edges: ψ (1 →
1) = a → a , ψ (1 →
2) = ψ (1 →
3) = a → b , ψ (2 (cid:97)
2) = ψ (3 (cid:97)
3) = b (cid:97) b , and ψ (2 →
4) = ψ (3 →
5) = b → c . Thus, the fibers partition the graph G as Π = { Π a , Π b , Π c } ,where Π a = { } , Π b = { , } and Π c = { , } .We represent the dynamics by two functions k ( x ) and g ( x ) modeling degradation and12ynthesis of gene product, respectively [9, 10]. For example, k ( x ) can be modeled as alinear degradation term and g ( x ) as a Hill function [9]. We consider that multiple inputsare combined by multiplying functions g ( x ), but any other way of combining inputs can beused. Then, the dynamics of the expression levels of the genes in the circuit are describedby: dx dt = − k ( x ) + g ( x ) dx dt = − k ( x ) + g ( x ) ∗ g ( x ) dx dt = − k ( x ) + g ( x ) ∗ g ( x ) dx dt = − k ( x ) + g ( x ) dx dt = − k ( x ) + g ( x ) . (2)The dynamics of the base are described by the state vector of the base: ( y a ( t ) , y b ( t ) , y c ( t ))with dynamical equations [16]: dy a dt = − k ( y a ) + g ( y a ) dy b dt = − k ( y b ) + g ( y a ) ∗ g ( y b ) dy c dt = − k ( y c ) + g ( y b ) . (3)If ( y a ( t ) , y b ( t ) , y c ( t )) is a solution for the base Eqs. (3), then the map P ψ sends the phasespace of this base to the phase space of the solutions in the graph G [16]: (cid:16) x ( t ) , x ( t ) , x ( t ) , x ( t ) , x ( t ) (cid:17) = P ψ (cid:104) y a ( t ) , y b ( t ) , y c ( t ) (cid:105) = (cid:16) y a ( t ) , y b ( t ) , y b ( t ) , y c ( t ) , y c ( t ) (cid:17) . (4)Therefore, the graph G sustains a polysynchronous subspace (see for instance Motivatingexample 1.4 in [15]):∆ Π = { ( x , x , x , x , x ) ∈ R | x ( t ) , x ( t ) = x ( t ) , x ( t ) = x ( t ) } . (5)This result can be corroborated by simply plugging (cid:16) x ( t ) , x ( t ) , x ( t ) = x ( t ) , x ( t ) , x ( t ) = x ( t ) (cid:17) into Eqs. (2) to obtain a solution of the dynamics, implyingthe synchrony x ( t ) = x ( t ) in fiber Π b and x ( t ) = x ( t ) in fiber Π c . 
We note that the con-13ept of sheaves and stacks might be useful to generalize the symmetry fibration frameworkto multiplex networks.We test this gene synchronization with publically available transcription profile experi-ments available from the literature. We use gene expression data profiles in E. coli compiledat Ecomics http://prokaryomics.com [30]. This portal collects microarray and RNA-seqexperiments from different sources such as the NCBI Gene Expression Omnibus (GEO) pub-lic database [31] and ArrayExpress [32] under different experimental growth conditions. Thedata is also compiled at the Colombos web portal [33]. The database contains transcriptomeexperiments measuring the expression level of 4,096 genes in
E. coli strains over 3,579 experimental conditions, which are described by strain, medium, stress, and perturbation. Raw data are pre-processed to obtain expression levels, using noise reduction and bias correction to normalize data across different platforms [30].
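As a numerical companion to the circuit dynamics of Eqs. (2)-(4), the sketch below integrates a five-gene graph and its three-node base and checks that the lifted base solution $(y_a, y_b, y_b, y_c, y_c)$ coincides with the graph solution. Several ingredients are assumptions made only for illustration and are not taken from the paper: the specific wiring inside fiber $\Pi_b$ (genes 2 and 3 regulating each other, gene 2 regulating gene 4, gene 3 regulating gene 5), the activating Hill form of $g$, the unit degradation rate, the initial condition, and the forward-Euler integrator.

```python
def hill(x, n=2.0):
    """Assumed synthesis term g(x): activating Hill function x^n / (1 + x^n)."""
    xn = x ** n
    return xn / (1.0 + xn)


def degradation(x, alpha=1.0):
    """Assumed linear degradation term k(x) = alpha * x."""
    return alpha * x


def graph_rhs(x):
    """Right-hand side for the 5-gene graph (hypothetical wiring, see lead-in)."""
    x1, x2, x3, x4, x5 = x
    return [
        -degradation(x1) + hill(x1),
        -degradation(x2) + hill(x1) * hill(x3),  # assumed: gene 3 regulates gene 2
        -degradation(x3) + hill(x1) * hill(x2),  # assumed: gene 2 regulates gene 3
        -degradation(x4) + hill(x2),             # assumed: gene 2 regulates gene 4
        -degradation(x5) + hill(x3),             # assumed: gene 3 regulates gene 5
    ]


def base_rhs(y):
    """Right-hand side for the base, following the structure of Eqs. (3)."""
    ya, yb, yc = y
    return [
        -degradation(ya) + hill(ya),
        -degradation(yb) + hill(ya) * hill(yb),
        -degradation(yc) + hill(yb),
    ]


def euler(rhs, state, dt=0.01, steps=2000):
    """Forward-Euler integration; returns the list of visited states."""
    trajectory = [list(state)]
    for _ in range(steps):
        derivative = rhs(state)
        state = [s + dt * d for s, d in zip(state, derivative)]
        trajectory.append(state)
    return trajectory


def lift(y):
    """Lift a base state to the graph, as in Eq. (4)."""
    ya, yb, yc = y
    return [ya, yb, yb, yc, yc]


y0 = [1.5, 0.8, 0.3]  # hypothetical initial condition of the base
base_traj = euler(base_rhs, y0)
graph_traj = euler(graph_rhs, lift(y0))

# Largest deviation between the lifted base solution and the graph solution.
max_gap = max(
    abs(a - b)
    for y, x in zip(base_traj, graph_traj)
    for a, b in zip(lift(y), x)
)
print("max |lift(base) - graph| =", max_gap)
```

The check only confirms that the synchronous subspace is invariant under the flow; it says nothing about whether trajectories started outside that subspace converge to it, which depends on stability.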
E. coli can adapt its growth to the different conditions that it finds in the medium. This adaptation is made by sensing extra- and intracellular molecules and using them as effectors to activate or repress transcription factors. This implies that the different fibers are activated by specific experimental conditions. The Ecomics portal allows one to obtain those experimental conditions where a set of genes has been significantly expressed. We perform standard gene expression analysis (see colombos.net and Ref. [33]) of the expression levels in E. coli obtained under these conditions.

For a given set of genes in a fiber, we find the experimental conditions for which the genes have been significantly expressed by comparing the expression samples over the 4,096 different growth conditions. Following [33], the experimental conditions are ranked with the inverse coefficient of variation (ICV), defined as $\mathrm{ICV}_k = |\mu_k| / \sigma_k$, where $\mu_k$ is the average expression level of the genes in condition $k$ and $\sigma_k$ is the standard deviation. Following [33], we select those conditions with $\mathrm{ICV}_k > 1$, i.e., where the average expression level in the particular condition $k$ is higher than the standard deviation. This score reflects the fact that, in a relevant condition, the genes show an increment of their expression above the individual variations caused by random noise. Details on the expression analysis can be found in Ref. [33] and at https://doi.org/10.1371/journal.pone.0020938.s001. Thus, we obtain expression levels organized by the relevant experimental conditions, which are labeled according to the GEO database [31]. From these data, we calculate the co-expression matrix using the Pearson correlation coefficient between the expression levels of two genes $i$ and $j$ in the relevant conditions for genes in a fiber. For off-diagonal correlations between genes in different fibers, we use the combined sets of conditions of both genes.

Results for the correlation matrix are shown in Fig. 2a (bottom) for fibers regulated by the crp-fis strongly connected component. Gene expression is obtained for every gene, so we plot the correlation matrix calculated over each pair of genes. Genes that belong to the same operon are transcribed as a single unit by the same mRNA molecule, so these genes are expected to synchronize trivially (variations exist due to attenuators inside the operon). Thus, we group these genes together as operons in the figure to indicate this trivial synchronization. To test the existence of fiber synchronization, we compare the co-expression of genes belonging to different operons. Figure 2a (bottom) shows that the expression levels of the genes that belong to a fiber are highly correlated, as predicted by the symmetry fibration. Genes that belong to different fibers show no significant correlations among them. In particular, there is no significant correlation between the expression of genes in a given fiber and the two master regulators crp and fis.
This result is consistent with the fibration symmetry and occurs despite the fact that both crp and fis directly regulate all genes in the studied fibers. We find some weak off-diagonal correlations between fibers (e.g., malI), probably indicating missing links or missing regulatory processes that produce extra synchronizations. Some genes present weak correlations inside fibers (e.g., cirA), indicating weak symmetry breaking, probably from asymmetries in the binding rates of transcription factors or in the input functions. These effects are not considered in the topological view of the input trees and can lead to desynchronization inside the fiber.

II. DISCUSSION
Fibration symmetries ensure that genes are turned on and off at the right amount to achieve the synchronization of expression levels in the fiber needed to execute cellular functions. In the fibration framework, network function can be pictured as an orchestra in which each instrument is a gene in the network. When the instruments play coherently, with structured temporal patterns, the network is functional. Here we have concentrated on the simplest temporal organization, one in which some units (instruments) act synchronously in time, a ubiquitous pattern observed in all biological networks. Our findings identify the symmetries that predict this synchronization and give rise to functionally related genes from the fibrations of the genetic network.

Unlike network motifs, which are identified by statistical overrepresentation [2], fibers in biology arise from principles of symmetry, following the tradition of how the building blocks of elementary particles have been discovered in physics and geometry [5]. Our first-principles approach to identifying building blocks is based on the circuit's theoretical and practical (rather than statistical) significance in serving minimal forms of coherent function and logic computation.

Further results shown in [34] indicate that symmetries also describe the structure of neural connectomes and that these symmetries factorize according to function. Thus, symmetries can be used to systematically organize biological diversity into building blocks using invariances in the information flow encoded in the topologies of the input trees. Genes related by symmetries are co-expressed, thus providing a functional rationale for the biological existence of these symmetries.
Acknowledgments
Research was sponsored by NIH-NIGMS R01EB022720, NIH-NCI U54CA137788/U54CA132378, NSF-IIS 1515022 and NSF-DMR 1308235. We thank L. Parra, W. Liebermeister, C. Ishida, M. Sánchez and J. D. Farmer for discussions. FM, IL, and HAM designed research, performed research, analyzed data, and wrote the paper.
[1] Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature, C47-C52 (1999).
[2] Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits (CRC Press, Boca Raton, 2006).
[3] Gell-Mann, M. The Quark and the Jaguar (Holt Paperbacks, New York, 1994).
[4] Dixon, J. D. & Mortimer, B. Permutation Groups, Graduate Texts in Mathematics, 163 (Springer-Verlag, New York, 1996).
[5] Weinberg, S. The Quantum Theory of Fields (Cambridge University Press, Cambridge, 2005).
[6] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. & Alon, U. Network motifs: simple building blocks of complex networks. Science, 824-827 (2002).
[7] Buchanan, M., Caldarelli, G., De Los Rios, P., Rao, F. & Vendruscolo, M. (editors), Networks in Cell Biology (Cambridge University Press, Cambridge, 2010).
[8] Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet., 64-68 (2002).
[9] Karlebach, G. & Shamir, R. Modeling and analysis of gene regulatory networks. Nature Reviews Molecular Cell Biology, 770-780 (2008).
[10] Klipp, E., Liebermeister, W., Wierling, C. & Kowald, A. Systems Biology (Wiley-VCH, Weinheim, 2016).
[11] Gama-Castro, S. et al. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Res., D133-D143 (2016). http://regulondb.ccg.unam.mx
[12] Grothendieck, A. Technique de descente et théorèmes d'existence en géométrie algébrique, I. Généralités. Descente par morphismes fidèlement plats. Séminaire N. Bourbaki, Talk no. 190, p. 299-327 (1958-1960). https://ncatlab.org/nlab/show/Grothendieck+fibration
[13] Boldi, P. & Vigna, S. Fibrations of graphs. Discrete Mathematics, 21-66 (2001). http://vigna.di.unimi.it/fibrations
[14] Golubitsky, M. & Stewart, I. Nonlinear dynamics of networks: the groupoid formalism. Bull. Amer. Math. Soc., 305-364 (2006).
[15] DeVille, L. & Lerman, E. Modular dynamical systems on networks. J. Eur. Math. Soc., 2977-3013 (2015). https://arxiv.org/abs/1303.3907
[16] Nijholt, E., Rink, B. & Sanders, J. Graph fibrations and symmetries of network dynamics. J. of Differential Equations, 4861-4896 (2016).
[17] Abrams, D. M., Pecora, L. M. & Motter, A. E. Focus issue: Patterns of network synchronization. Chaos, 094601 (2016).
[18] Pecora, L. M., Sorrentino, F., Hagerstrom, A. M., Murphy, T. E. & Roy, R. Cluster synchronization and isolated desynchronization in complex networks with symmetries. Nature Comm., 4079 (2014).
[19] Sorrentino, F., Pecora, L. M., Hagerstrom, A. M., Murphy, T. E. & Roy, R. Complete characterization of the stability of cluster synchronization in complex dynamical networks. Sci. Adv., e1501737 (2016).
[20] Stewart, I., Golubitsky, M. & Pivato, M. Symmetry groupoids and patterns of synchrony in coupled cell networks. SIAM J. Appl. Dyn. Syst., 609-646 (2003).
[21] Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y. & Zhou, C. Synchronization in complex networks. Phys. Rep., 93-153 (2008).
[22] Rodrigues, F. A., Peron, T. K., Ji, P. & Kurths, J. The Kuramoto model in complex networks. Physics Reports, 1-98 (2016).
[23] Strogatz, S. Nonlinear Dynamics and Chaos: with Applications to Physics, Biology, Chemistry, and Engineering (Westview Press, Boulder, 2000).
[24] Cardon, A. & Crochemore, M. Partitioning a graph in O(|A| log |V|). Theoretical Computer Science, 85-98 (1982).
[25] Kamei, H. & Cock, P. J. A. Computation of balanced equivalence relations and their lattice for a coupled cell network. SIAM J. Appl. Dyn. Syst., 352-382 (2013).
[26] Norris, N. Universal covers of graphs: isomorphism to depth n - 1 implies isomorphism to all depths. Discrete Applied Mathematics, 61-74 (1995).
[27] OEIS Foundation Inc. (2020), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A003269
[28] Gardner, M. The Scientific American Book of Mathematical Puzzles and Diversions, Vol. II, p. 101 (Simon and Schuster, 1961).
[29] Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Nat. Acad. Sci. USA, 7821-7826 (2002).
[30] Kim, M. et al. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nat. Commun., 13090 (2016).
[31] Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Holko, M. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res., D991-D995 (2016).
[32] Kolesnikov, N., Hastings, E., Keays, M., Melnichuk, O., Tang, Y. A., Williams, E., Dylag, M., Kurbatova, N., Brandizi, M., Burdett, T. et al. ArrayExpress update: simplifying data submissions. Nucleic Acids Res., D1113-D1116 (2015).
[33] Moretto, M., Sonego, P., Dierckxsens, N., Brilli, M., Bianco, L., Ledezma-Tejeida, D., Gama-Castro, S., Galardini, G., Romualdi, C., Laukens, C., Collado-Vides, J., Meysman, P. & Engelen, K. COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses. Nucleic Acids Res., D620-D623 (2016). http://colombos.net
[34] Morone, F. & Makse, H. A. Symmetry group factorization reveals the structure-function relation in the neural connectome of Caenorhabditis elegans. Nature Comm., 4961 (2019).
[35] Boldi, P., Lonati, V., Santini, M. & Vigna, S. Graph fibrations, graph isomorphism, and PageRank. RAIRO Inform. Théor., 227-253 (2006).
[36] Boldi, P. & Vigna, S. An effective characterization of computability in anonymous networks. In J. L. Welch, editor, Distributed Computing. 15th International Conference, DISC 2001, number 2180 in Lecture Notes in Computer Science, pp 33-47 (Springer-Verlag, 2001).
[37] Boldi, P. & Vigna, S. Universal dynamic synchronous self-stabilization.
Distr. Comput., 137-153 (2002).

FIG. 1. Definition of input tree, symmetry fibration, fiber and base. a, The circuit controlled by the cpxR gene regulates a series of fibers, as shown by the different colored genes. The circuit regulates more genes, represented by the dotted lines, which are not displayed for simplicity. The full lists of genes and operons in this circuit are in SI Table VI, ID=27, 28 and 54. b, The input tree of representative genes involved in the cpxR circuit, showing the isomorphisms that define the fibers. For each fiber, we show the number of paths of length $i - 1$, $a_i$, and its branching ratio $n$. c, Isomorphism between the input trees of baeR and spy. The input trees are composed of an infinite number of layers due to the autoregulation loops at baeR and cpxR. How can one prove the equivalence of two input trees when they have an infinite number of levels? A theorem proven by Norris [26] demonstrates that it suffices to find an isomorphism up to $N - 1$ levels, where $N$ is the number of nodes in the circuit. Thus, in this case, 2 levels are sufficient to prove the isomorphism. d, Symmetry fibration $\psi$ transforms the cpxR circuit $G$ into its base $B$ by collapsing the genes in the fibers into one. e, Symmetry fibration of the fadR circuit and f, its isomorphic input trees. The full list of genes in this circuit appears in SI Table VI, ID=3, 4, and 58. g, Symmetric genes in the fiber synchronize their activity to produce the same activity levels. We use the mathematical model of gene regulatory kinetics from Ref. [8] (sigmoidal interactions lead to qualitatively similar results) to show the synchronization inside the fiber baeR-spy when the fiber is activated by its regulator cpxR. Notice that cpxR does not synchronize with the fiber.

FIG. 2. Strongly connected components of the genetic network and synchronization of gene co-expression in the fibers in E. coli. a, Top. Two-gene connected component of crp-fis. This component controls a rich set of fibers, as shown. We also show the symmetry fibration collapsing the graph to the base. We highlight the fiber uxuR-lgoR, which sends information to its regulator exuR and forms a 2-Fibonacci fiber $|\varphi = 1.\ldots,\ \ell = 2\rangle$, as well as the double-layer composite $|\textit{add-oxyS}\rangle = |\ ,\ \rangle \oplus |\ ,\ \rangle$. a, Bottom. Co-expression correlation matrix calculated from the Pearson coefficient between the expression levels of each pair of genes in Fig. 2a. Synchronization of the genes in the respective fibers is corroborated by the block structure of the matrix. b, The core of the E. coli network is the strongly connected component formed by genes involved in the pH system, as shown. This component supports two Fibonacci fibers, 3-FF and 4-FF, and other fibers as shown. Hollow colored circles indicate genes that are in fibers and also belong to the pH component.

FIG. 3. Classification of building blocks in E. coli. a, Basic fiber building blocks. These building blocks are characterized by a fiber that does not send back information to its regulator. They are characterized by two integer fiber numbers: $|n, \ell\rangle$. We show selected examples of circuits, input trees and bases. The full list of fibers appears in SI Table VI and Supplementary File 1. The statistical count of every class is in SI Table I. The last example shows a generic building block for a general n-ary tree $|n, \ell\rangle$ with $\ell$ regulators. b, Complex Fibonacci and multilayer building blocks. These building blocks are more complex and are characterized by an autoregulated fiber that sends back information to its regulator. This creates a fractal input tree that encodes a Fibonacci sequence with golden branching ratio in the number of paths $a_i$ versus path length, $i - 1$. When the information is sent to the connected component that includes the regulator, a cycle of length $d$ is formed and the topology is a generalized Fibonacci block with golden ratio $\varphi_d$, as indicated. We find three such building blocks: 2-FF, 3-FF and 4-FF. The last panel shows a multilayer composite fiber with a feed-forward structure.

FIG. 4. Fibration landscape across domains and species. a, Fibration landscape for biological networks. Total number of fiber building blocks across 5 types of biological networks analyzed in the present work. The count includes the total number of fibers in the networks of each biological type, considering all species analyzed for each type (see SI Table IV). b, Fibration landscape across species. Count of fibers across each analyzed species. Each panel shows the count over the different types of biological networks (E. coli contains only the transcriptional network, see SI Table IV). c, Fibration landscape across domains. Count of fibers across the major domains studied. The biological domain panel is calculated over all networks and species in a and b. d, Global fibration landscape. Cumulative count of fibers in all domains in c. The cumulative count represents the total number of fibers per network of 10 nodes. Specifically, the quantity is calculated as the total number of fibers divided by the total number of nodes in all networks per domain, multiplied by 10.

Supplementary Information

Fibration symmetries uncover the building blocks of biological networks
Flaviano Morone, Ian Leifer, Hernán A. Makse
Contents

I. Results
   ... E. coli network
   E. Fiber building blocks
   F. Fibonacci fibers
   G. Multi-layer composite fibers
   H. Fibration landscape across biological networks, species and system domains
   I. Gene co-expression and synchronization via symmetry fibration
II. Discussion
References
III. Transcriptional regulatory network of E. coli
IV. Symmetry fibrations
V. Algorithm to find fibers with minimal balance coloring
VI. Strongly connected component
VII. Statistics of fibers in the TRN of E. coli
   ... E. coli
   ... E. coli
VIII. Datasets of biological and non-biological networks
III. TRANSCRIPTIONAL REGULATORY NETWORK OF E. COLI
To define the transcriptional regulatory network (TRN), we use the transcription factor-gene target bipartite network of Escherichia coli K-12 obtained from the RegulonDB data source (http://regulondb.ccg.unam.mx). RegulonDB manually curates all transcriptional regulations from literature searches [11]. We download all transcriptional regulatory interactions catalogued in RegulonDB version 9.0 from http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt, last accessed September 15, 2018.

The dataset downloaded from RegulonDB is a bipartite transcription factor-gene target network. In this bipartite dataset, a directed link between a source transcription factor (TF) and a target gene means that the TF binds to the DNA sequence at the binding site of the target gene to regulate its rate of transcription. In E. coli, each gene expresses a single TF (this is not the case for eukaryotic genes, which contain introns and where splicing of protein-coding RNA can produce many proteins from a single gene). Therefore, a gene-gene regulatory network can be constructed from the bipartite transcription factor-gene target network by associating each TF with the gene that expresses it. Then, a directed link in the TRN from gene i → gene j implies that gene i encodes a TF that controls the rate of transcription of gene j. Thus, a directed link encodes the combined processes of transcription, translation and TF binding to a target gene. We denote genes in bacteria in italics, e.g., gadX, and the corresponding protein as GadX. Thus, we say that gene i sends a genetic 'message' to gene j and the 'messenger' is the TF. The history of all messages passing in the network defines the information flow in the network. A TF can be an activator, a repressor, or can have a dual function. For the purpose of calculating isomorphisms between input trees, dual interactions are treated as a distinct interaction type, so that activation, repression and dual regulation are treated as three different types of link.

For the purpose of building the TRN, it is important to distinguish genes encoding TFs from the rest of the genes, which encode the remaining proteins (enzymes, kinases, transport proteins, etc.). A TF is a regulatory protein that regulates a gene by binding, and therefore will always have an out-going link in the network. There are other regulatory proteins (like kinases, histones, coactivators, etc.) that regulate gene expression but do not have a DNA-binding domain, so they regulate gene expression without binding. In our TRN, genes that encode a protein that is not a TF do not have out-going links in the network. They only have in-going links and are therefore dangling ends in the network. In E. coli, most of these proteins are enzymes that catalyze biochemical reactions in the metabolic network.
Other proteins are involved in transport and signaling processes (kinases) in the cell.

TFs are also activated by effector molecules (metabolites) that bind non-covalently to an allosteric site of the TF and alter its conformation, activating or deactivating it by controlling the binding/unbinding of the TF to DNA. Effectors can also produce covalent activation of the TF, for instance through phosphorylation mediated by kinases in two-component TFs.

We treat these effector activities as external parameters, determined by the growth conditions in the surrounding system (the cell in its changing environment) or by the metabolic network, which is considered external to the TRN. These external perturbations are considered as the external growth conditions when we analyze the co-expression profiles in Section I I. In the present study, we therefore do not consider feedback loops from the TRN to the metabolic network and back to the TRN mediated by effector metabolites. This extended network is treated in a follow-up.

In E. coli, genes are also grouped into operons. An operon is a set of contiguous genes that are transcribed as a single unit into the same mRNA molecule, from the same promoter site upstream of all genes to a terminator downstream [11]. An operon can contain genes encoding TF or non-TF proteins, and more than two TFs can be part of the operon. Since the genes of an operon are transcribed into the same RNA molecule, we group them into a single node in the network. This is certainly appropriate when the operon has a single promoter transcribing the full operon. However, there is some ambiguity in the construction of the network using the definition of operon in RegulonDB when there are promoters in the middle of the operon and these promoters transcribe more than one TF in the operon, forming different transcription units. An example is the operon gadAXW of the gad system, which is important in the pH strongly connected component in Fig. 2b. This operon expresses two TFs, GadX and GadW, and one enzyme, GadA. Here, each gene has its own promoter and terminator, and thus they are different nodes in the network. Moreover, the two TFs are regulated by different TFs and regulate different genes. As seen in Fig. 2b, for instance, GadX binds to hns but GadW does not. Also, gadW is regulated by ydeO, but ydeO does not regulate gadX. Thus, putting these two genes together in the same operon gadAXW would miss all these links. Therefore, when two TFs with different promoters are part of the operon, we consider the TFs as different genes. On the other hand, the non-TF genes in operons are always put together with other genes in the operon. For instance, the gadAXW operon from RegulonDB is considered as two nodes: gadW and gadAX. To simplify notation, when an operon contains one TF and several non-TF proteins, we call this operon by the name of the TF. For instance, gadAX is simply called gadX, and the operon rbsDACBKR is called rbsR, so that the TF rbsR represents the entire operon rbsDACBKR. Finally, when all the genes in the operon are non-TF, we call the operon by the names of all its genes, as for instance lsrACDBFG-tam.

In the RegulonDB database there are a total of 4690 genes. Out of these genes, RegulonDB provides a bipartite network consisting of 1843 genes with interactions from or to other genes; the remaining genes are not considered in the analysis. There are 192 genes that encode TFs. We cluster the genes into 313 operons as explained above. Full names of operons and genes appear in SI Table VI. After grouping the genes into operons, the network is reduced to 879 nodes. There are 1835 directed edges, with an average in-degree (or out-degree) of 2.1. In this network we find 91 different fibers that encompass 416 different nodes. We find that 28 nodes are involved in 7 strongly connected components of size larger than one node, and the rest are single-node connected components.
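The construction described in this section can be sketched in a few lines. The example below is only an illustration under stated assumptions: the three links form a hypothetical toy list (the GadX → hns and YdeO → gadW regulations are mentioned in the text, whereas the GadX → gadA link and all interaction signs are invented for the example), and `tf_to_gene`, `build_trn` and `operon_of` are hypothetical names, not part of RegulonDB or of the authors' pipeline.

```python
def tf_to_gene(tf_name):
    """Map a TF protein name to its gene name (GadX -> gadX), following the
    naming convention used in the text."""
    return tf_name[0].lower() + tf_name[1:]


def build_trn(tf_gene_links, operon_of):
    """Turn bipartite (TF, target gene, effect) links into gene-gene edges,
    collapsing every gene onto the node that represents its operon."""
    edges = set()
    for tf, target, effect in tf_gene_links:
        source_gene = tf_to_gene(tf)
        source = operon_of.get(source_gene, source_gene)
        destination = operon_of.get(target, target)
        # The effect label is kept so that activation, repression and dual
        # regulation remain three distinct link types.
        edges.add((source, destination, effect))
    return edges


# Hypothetical toy input; signs are illustrative only.
links = [
    ("GadX", "hns", "-"),
    ("YdeO", "gadW", "+"),
    ("GadX", "gadA", "+"),  # invented link, used to show the operon collapse
]
# The gadAX transcription unit is represented by its TF, gadX.
operon_of = {"gadA": "gadX", "gadX": "gadX"}

trn = build_trn(links, operon_of)
nodes = {name for source, destination, _ in trn for name in (source, destination)}
mean_in_degree = len(trn) / len(nodes)

# Same ratio for the figures quoted in the text: 1835 edges over 879 nodes.
reported_mean_in_degree = 1835 / 879
print(sorted(trn), mean_in_degree, round(reported_mean_in_degree, 1))
```

In this toy input, the invented GadX → gadA link becomes a self-loop on the node gadX once gadA is merged into the gadAX unit, which is how autoregulation appears after grouping genes into operons.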
IV. SYMMETRY FIBRATIONS
Below we provide formal definitions of the main concepts used in the paper: (a) input trees and isomorphisms, (b) fibrations, specialized to the surjective minimal graph fibrations called here symmetry fibrations, (c) fibers and minimal bases, and (d) the minimal balance coloring algorithm. We start with a (non-exhaustive) review of the literature.

FIG. 5: Group symmetries and fibrations with their input tree. a, Example of a network with a symmetry group. The automorphism shown maps the network into another network, leaving invariant the connectivity of every node in the network [4, 14, 17, 18]. b, A network without automorphisms but with a fibration. The addition of a single out-link from 3 → 7 breaks the automorphism of a, but nodes 2 and 3 still have isomorphic input trees (c). There are no more isomorphisms, as shown by the rest of the input trees. Therefore, nodes 2 and 3 form a fiber. Nodes 4 and 5 also form another fiber, independently of the first one. The fibration is a morphism that maps the network into a base, which is formed by collapsing the isomorphic nodes into one, i.e., collapsing nodes 2 and 3 together, and nodes 4 and 5 together. The resulting base is also called a quotient graph.

The literature on fibrations and groupoids crosses the fields of mathematics, computer science and dynamical systems theory. The notion of fibration was first introduced by Grothendieck as fibrations between categories in algebraic geometry [12]. The original paper of Grothendieck was published as part of the Séminaire N. Bourbaki in 1958 (see Ref. [12]). A mathematical account of Grothendieck fibrations in the context of category theory appears in https://ncatlab.org/nlab/show/Grothendieck+fibration. For a review of the history of fibrations from Grothendieck to modern studies, see the blog of Vigna at http://vigna.di.unimi.it/fibrations/. The formulation of Grothendieck is highly abstract and differs from our present work, which refers to the notion of surjective minimal graph fibration, a fibration between graphs. The works of Boldi & Vigna [13] and DeVille & Lerman [15] on graph fibrations are the closest to our formulation, see http://vigna.di.unimi.it/ftp/papers/FibrationsOfGraphs.pdf. Graph fibrations have been applied in computer science to understand PageRank [35] and the state of synchrony of processors in distributed computing systems [36, 37], where fibrations are the key concept in the computation of identical states in a distributed system. The relation between surjective minimal graph fibrations and synchronous subspaces is elaborated in DeVille & Lerman [15] and Nijholt, Rink & Sanders [16]. It should be noted that all these works on fibrations pertain to a highly abstract mathematical level which, in turn, provides the concept of fibration with a quite broad applicability.
For a more accessible reading on fibrations within the particular context of their application to biological networks, the reader is recommended to follow our paper and supplementary sections.

In parallel, the work of Golubitsky and Stewart [14, 20] and others in dynamical systems theory considers the equivalent formalism of symmetry groupoids, equitable partitions of balanced colored nodes, and their relation with synchronization [21–23]. A review of the groupoid formalism and its application to synchronization in dynamical systems appears in [14]. DeVille and Lerman [15] also discuss the relation between graph fibrations and the groupoid formalism.

Synchronization also arises as a consequence of permutation symmetries in the network, called automorphisms [4], which form symmetry groups and are different from symmetry fibrations and symmetry groupoids. There is a large literature in the dynamical systems community dealing with cluster synchronization from automorphisms, since synchronization is a ubiquitous phenomenon across all sciences [21–23]. Reviews range from the work of Golubitsky and Stewart [14, 20] to recent work in [17–19] and references therein. Symmetry groups are the cornerstone of physical phenomena appearing in all physical systems [5].

Below, to elaborate on the definition of symmetry fibrations, we first compare fibrations to automorphisms, which form symmetry groups [4, 14, 17–19], using the example networks of Figs. 5a and 5b. An automorphism is a transformation that preserves the full connectivity of the network. That is, an automorphism preserves not only the inputs but also the outputs of each node in the network, and therefore it imposes more stringent conditions on the connectivity than symmetry fibrations, which preserve only the input trees. For example, the network of Fig. 5a is invariant under the automorphism defined by the permutation:

\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 1 & 3 & 2 & 5 & 4 & 6 \end{pmatrix},
\tag{6}
\]

because the nodes are connected exactly to the same nodes before and after the application of the permutation $\sigma$, which is a global mirror symmetry.

Next, consider the slightly modified network depicted in Fig. 5b left, which differs from the network in Fig. 5a by one extra out-going link from node 3 to 7. In this network, the permutation of nodes 2 ↔ 3 and 4 ↔ 5, Eq. (6), is not an automorphism anymore, because it does not preserve the in and out connectivities of all nodes, e.g., node 3 is connected with 7 but loses this connection after the permutation (Fig. 5b right). It is interesting to see how fragile group symmetries are: if we connect just one extra node to the network, as shown in Fig. 5b, the symmetry (i.e., the network automorphism group) is broken. This occurs because automorphisms require very strict arrangements of nodes and links to preserve, rigidly, the global structure of the network. Fibration symmetries, with their emphasis on the preservation of the input trees only, are less restrictive. This might explain why fibration symmetries emerged in living systems, as opposed to the more restrictive automorphisms which describe all aspects of matter, from elementary particles to atoms, molecules and phases of matter.

This example raises the following question: are there extra symmetries in the network shown in Fig. 5b beyond its automorphisms? The answer to this question is, indeed, yes: there are extra symmetries in the network of Fig. 5b, the fibration symmetries [12, 13], which do not form a group [4] but a groupoid [14]. A groupoid is a set of transformations satisfying the axioms of invertibility, identity and associativity but not the composition law (closure) [14], while in a group, transformations satisfy all four axioms. For this reason, groupoids are fundamentally different algebraic structures compared with traditional group symmetries.
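The fragility of automorphisms discussed above can be checked directly on an edge list. The sketch below assumes a toy network in the spirit of Figs. 5a and 5b: the links 3 → 2, 4 → 2, 2 → 3, 5 → 3, 6 → 4, 6 → 5 and 3 → 7 follow the description in the text, while the two links into node 1 are hypothetical, since the figure itself is not reproduced here. The function name `is_automorphism` is likewise only illustrative.

```python
def is_automorphism(edges, permutation):
    """Return True if relabeling nodes by `permutation` maps the directed
    edge set onto itself. Nodes missing from `permutation` are left fixed."""
    relabeled = {
        (permutation.get(source, source), permutation.get(target, target))
        for source, target in edges
    }
    return relabeled == set(edges)


# Toy version of the network of Fig. 5a, written as (source, target) pairs.
EDGES_5A = {
    (2, 3), (3, 2),  # cycle between nodes 2 and 3
    (4, 2), (5, 3),  # inputs of nodes 2 and 3
    (6, 4), (6, 5),  # node 6 feeds nodes 4 and 5
    (2, 1), (3, 1),  # hypothetical links into node 1
}
# Fig. 5b: one extra out-going link from node 3 to node 7.
EDGES_5B = EDGES_5A | {(3, 7)}

# Permutation of Eq. (6): 2 <-> 3 and 4 <-> 5, all other nodes fixed.
SIGMA = {2: 3, 3: 2, 4: 5, 5: 4}

symmetric_a = is_automorphism(EDGES_5A, SIGMA)
symmetric_b = is_automorphism(EDGES_5B, SIGMA)
print(symmetric_a, symmetric_b)
```

In the modified network, the permutation sends the link 3 → 7 to a link 2 → 7 that does not exist, which is the loss of connection described in the text.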
A. Input tree
Roughly speaking, symmetry fibrations take into account only the input trees of the nodes, but not the output trees (this is not true, though, when the input and output trees are connected). Thus, node 3 in Fig. 5b is connected to node 7 via an out-going link, and this link destroys the symmetry group, but node 3 is still symmetric with node 2 via a symmetry fibration, since the input trees of nodes 2 and 3 are isomorphic, even though node 3 is connected with 7. This is because the connection 3 → 7 is an out-going link of node 3 and therefore does not belong to its input tree.

The input tree of a node contains the full information received by that node through the totality of all possible paths ending in that node and starting from every other node in the network. Thus, for every node $i$ in the network $G$ there is a corresponding input tree, called $T_i$, which is defined as a tree with a selected node $r_i$, called the root, such that every other node is a path $P_{j \to i}$ of $G$ starting from $j$ and ending in $i$ [16]. A link from node $P_{j \to i}$ to node $P_{k \to i}$ exists if $P_{j \to i} = e_{j \to k} P_{k \to i}$, where $e_{j \to k}$ is an edge of $G$.

The concept of input tree has appeared in the literature as the universal total space in traditional categorical or topological terminology [12], the universal total graph in [13], the view in the theory of distributed systems, or the unfolding of a nondeterministic automaton in concurrency theory [13].

For example, let us construct the input tree $T_2$ of node 2 in the network on the left of Fig. 5b. The root is the node $r_2$ at the uppermost level of the tree. Every other node of the input tree of node 2 is a path $P_{j \to 2}$ ending in 2. There are two paths of length 1: $P^{(1)}_{3 \to 2}$ and $P^{(1)}_{4 \to 2}$; three paths of length 2: $P^{(2)}_{2 \to 2}$, $P^{(2)}_{5 \to 2}$, and $P^{(2)}_{6 \to 2}$; and so on. Since $P^{(2)}_{2 \to 2} = e_{2 \to 3} P^{(1)}_{3 \to 2}$, we put a link in the input tree from $P^{(2)}_{2 \to 2}$ to $P^{(1)}_{3 \to 2}$. We then add all other links in the input tree using the same criterion. The resulting input tree $T_2$ is shown in Fig. 5c, together with the input trees of all other nodes in the network in Fig. 5b. To simplify, we label each node of $T_i$ using the starting point of the corresponding path $P_{j \to i}$. For example, in $T_2$ the nodes $P^{(1)}_{3 \to 2}$ and $P^{(1)}_{4 \to 2}$ are labeled 3 and 4, respectively, and the length of the path is equal to the depth of the node in the input tree.

Thus, in practice, we arrive at the following way to construct the input tree. We start with the node at the root, let's say node 2. We label every node $P_{j \to 2}$ in the input tree by the node $j$ where the path starts. The first layer of the input tree consists of all the nodes that are at distance one from the root, in this case nodes 3 and 4. Thus, we add two links to 2, from 3 and 4, in the input tree.

The second layer of the input tree is obtained by applying the same procedure to each node in the first layer, 3 and 4. For instance, node 3 receives a link from 2 and 5. Therefore, the second layer of the input tree contains nodes 2 and 5 connected to node 3. We repeat the procedure with the other node in the first layer, node 4. Node 4 receives a link only from node 6, and node 6 from no one. So, we add a link from 6 to 4, and this path does not propagate further. The third layer of the input tree is obtained by iteratively applying the same procedure, and so on.

We note that the input trees of nodes 1, 2, 3 and 7 are infinite, since the network contains a cycle (or loop) between nodes 2 ⇄ 3. For instance, $T_2$ is infinite because there are paths crossing the loop infinitely many times. On the other hand, the input trees of nodes 4, 5 and 6 are finite, since they do not cross the loop.

B. Isomorphic input trees
The input tree $T_i$ at node $i$ can be interpreted as the collection of all possible 'histories' starting at some node and ending in node $i$. As shown in Section I C, if two input trees $T_i$ and $T_j$ are isomorphic, then the corresponding nodes $i$ and $j$ in network $G$ have the same dynamical state [15, 16]. This equivalence is understood in terms of a local in-isomorphism that maps nodes to nodes and links to links, so it formalizes the fact that the dynamical interactions represented by a directed link from gene to gene could in principle be different across genes, as long as the links are the same (or similar, in case the produced synchronization is approximate) inside the fiber.

An isomorphism between $T_i$ and $T_j$ is defined as a bijective map $\tau : T_i \to T_j$, which maps one-to-one the nodes and edges of $T_i$ to nodes and edges of $T_j$.

A minimal condition for the existence of an isomorphism between two input trees is that they have the same number of nodes, since the map needs to be bijective, i.e., to have an inverse (we could also add the condition of the same degree sequence). Thus, there can be no isomorphism between the input trees of nodes 2 and 4, since the former contains an infinite number of nodes and the latter just two. By inspection it is then clear that there is an isomorphism between the input trees of nodes 4 and 5. This isomorphism is the map $\tau_{4\to 5} : T_4 \to T_5$, and it is written as a transformation following the notation:

$$\tau_{4\to 5} = \begin{pmatrix} 4 & 6 \\ \downarrow & \downarrow \\ 5 & 6 \end{pmatrix}, \quad \text{(isomorphism between input trees of nodes 4 and 5)}, \tag{7}$$

which maps the root of $T_4$ to the root of $T_5$ as $\tau_{4\to 5}(4) = 5$, and node $6 \in T_4$ to node $6 \in T_5$ as $\tau_{4\to 5}(6) = 6$. The notation starts with the root of the tree and then we write the nodes in each level from top to bottom, going from left to right within each level. In this particular example the links are of the same type, so there is no need to specify the mapping between links in the isomorphism, but in general the local equivalence requires that nodes are mapped to nodes and that links are mapped to links of the same type.

The map in Eq. (7) is one of the simplest isomorphisms since the input tree contains only one level. In this particular case, to see that $T_4$ and $T_5$ are isomorphic, it is enough to see that both nodes 4 and 5 receive a link from one and the same node, which is node 6 in this case. That is, the input trees of nodes 4 and 5 are isomorphic because they are made up of just two nodes and one edge, and this isomorphism implies that 4 and 5 receive the same information. In this case, we say that nodes 4 and 5 have the same input-set, which is an input tree of only one level, that is, the set of incoming links. The input-set is used in the groupoid formalism in Ref. [14].

Next, we consider the input trees of nodes 2 and 3. By visual inspection, both input trees have the same 'shape'. However, these trees have an infinite number of levels. How do we decide if two input trees are isomorphic when they have an infinite number of levels? Remarkably, to determine if two input trees are isomorphic, it suffices to check that they are isomorphic up to level $N - 1$, where $N$ is the total number of nodes in the network $G$. This is an important result that spares us from checking an infinite number of equivalences. Since $G$ has $|N_G| = 7$, we use six levels in the input trees to determine that there is an isomorphism between $T_2$ and $T_3$, which corresponds to the following map (only the first levels are written out):

$$\tau_{2\to 3} = \begin{pmatrix} 2 & 3 & 4 & 2 & 5 & 6 & 3 & 4 & 6 & \dots \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \dots \\ 3 & 2 & 5 & 3 & 4 & 6 & 2 & 5 & 6 & \dots \end{pmatrix}, \quad \text{(isomorphism between input trees of 2 and 3)}. \tag{8}$$

There are no other isomorphisms between the other input trees.
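The level-by-level comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the edge list contains only the links of Fig. 5b that are stated explicitly in the text (node 1 is left out because its incoming links are not listed here), all links are assumed to be of a single type, and the names `EDGES`, `truncated_input_tree` and `isomorphic_input_trees` are hypothetical. Each truncated input tree is encoded as the sorted tuple of the encodings of its subtrees, so two rooted trees are isomorphic exactly when their encodings coincide.

```python
from functools import lru_cache

# Assumed example: only the links of Fig. 5b stated in the text (source, target).
# Node 1 is omitted because its in-links are not given in the text.
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7)]
NODES = sorted({n for edge in EDGES for n in edge})

# IN_NBRS[v] lists the sources of the links pointing to v
IN_NBRS = {v: tuple(u for (u, w) in EDGES if w == v) for v in NODES}


@lru_cache(maxsize=None)
def truncated_input_tree(node, depth):
    """Canonical encoding of the input tree of `node`, cut after `depth` levels."""
    if depth == 0:
        return ()
    # One subtree per incoming link; sorting makes the encoding order-independent
    return tuple(sorted(truncated_input_tree(u, depth - 1) for u in IN_NBRS[node]))


def isomorphic_input_trees(a, b):
    """Compare the input trees of a and b up to level N - 1."""
    depth = len(NODES) - 1
    return truncated_input_tree(a, depth) == truncated_input_tree(b, depth)


print(isomorphic_input_trees(2, 3))  # True: infinite trees with the same shape
print(isomorphic_input_trees(4, 5))  # True: both receive a single link from node 6
print(isomorphic_input_trees(2, 4))  # False
```

Memoizing on the pair (node, depth) keeps the cost polynomial even though the input trees themselves grow without bound when the network contains a loop.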
Notice that T is not isomorphic to T and T by just one link to the root.

The existence of an isomorphism $\tau$ from the input tree of node $i$ to the input tree of node $j$ implies the synchronization of $x_i$ and $x_j$ [15]. In the groupoid formalism of Golubitsky and Stewart, two nodes are said to be synchronized if their input-sets are synchronized, too [14]. Analogous work in dynamical systems shows that automorphisms in networks lead to synchronized nodes in orbits, see [17–20] and references therein. The orbit of a given node is obtained by applying all automorphisms of a network to the node, and the nodes in the orbit are synchronous. The synchronized orbits obtained from automorphisms are analogous to the synchronized fibers obtained from symmetry fibrations. In general, every orbit is also a fiber, but the opposite is not true, since a fiber is not necessarily an orbit.

In our analysis of the E. coli network, we find some automorphisms. Some of the star fibers with $n = 0$ are also orbits of the network since they are invariant under permutation symmetries of the symmetric group of order $n$, $S_n$. But this holds only when the genes in the star have no out-going links. As shown in the example of Fig. 5, an out-going link in any of the star genes will destroy the automorphism, but not the fiber. For this reason, automorphisms are somewhat more prevalent in undirected networks. For instance, we have found that automorphisms describe the symmetries of the gap junction connectome of C. elegans, which is composed entirely of undirected links [34]. In the case of the directed biological networks treated here, while automorphisms could be of use to discover some synchronized nodes, the majority of synchronization is due to symmetry fibrations, which are not described by automorphisms.

C. From fibrations to symmetry fibrations via isomorphic input trees and minimal bases
A fibration is any morphism from a network $G = (N_G, E_G)$ to a base $B = (N_B, E_B)$: $\psi : G \to B$ [12]. If a network $G = (N_G, E_G)$ has at least one pair of isomorphic input trees, then there exists a network $B = (N_B, E_B)$, called the base of $G$, such that $G$ can be 'fibered' over $B$ by the graph fibration. The base $B$ is defined as follows:

• a node $I \in N_B$ is a representative of the set of nodes $\{ i \in N_G \}$ whose input trees are isomorphic;

• an edge $e_{I\to J}$, where $I, J \in N_B$, is defined as $e_{I\to J} = \sum_{i \in I} e_{i\to j}$, where $e_{i\to j} \in E_G$.

Having defined the base network $B$, we say that $G$ is fibered over $B$ if there exists a surjective morphism $\psi : G \to B$, called surjective graph fibration [13], that maps nodes and edges of $G$ to nodes and edges of $B$ as: $\psi(i) = I$ for all $i \in N_G$, and $\psi(e_{i\to j}) = e_{I\to J}$. A surjective morphism is a map between two sets (the domain and codomain) where each element of the codomain (in this case $B$) is mapped to by at least one element of the domain (in this case $G$). The set of nodes $i \in N_G$ that are mapped to the same node $I \in N_B$, denoted by $\psi^{-1}(I)$, is called the fiber of $G$ over node $I$. We notice that all input trees of nodes which belong to the same fiber are pairwise isomorphic.

In general, a surjective graph fibration $\psi$ can map nodes with isomorphic input trees to different nodes of the base, in which case the number of fibers is not minimal.

A surjective graph fibration that maps all genes with isomorphic input trees to a single common node in $B$ is called a surjective minimal graph fibration in the sense of [13]. Such a minimal fibration generates the minimal base of the network and produces the largest collapse of nodes in fibers.
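As an illustration of the definition of the base, the following Python sketch collapses a network onto its base once the fibers are known. It rests on assumptions: the edge list contains only the links of Fig. 5b stated in the text (node 1 omitted), the fiber labels "A" to "D" and the helper names `in_profile` and `base_graph` are hypothetical, and a single link type is assumed. The in-links of one representative per fiber define the links of the base, and a link between two nodes of the same fiber becomes a self-loop.

```python
from collections import Counter

# Assumed example: links of Fig. 5b stated in the text, as (source, target)
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7)]
# Fiber of each node (psi in the text); the labels are arbitrary
FIBER = {2: "A", 3: "A", 4: "B", 5: "B", 6: "C", 7: "D"}


def in_profile(node, edges, fiber):
    """Number of links received by `node` from each fiber."""
    return Counter(fiber[u] for (u, v) in edges if v == node)


def base_graph(edges, fiber):
    """Multiset of base links (source fiber, target fiber)."""
    representative = {}
    for node in sorted(fiber):
        representative.setdefault(fiber[node], node)
    base = Counter()
    for name, rep in representative.items():
        for source, multiplicity in in_profile(rep, edges, fiber).items():
            base[(source, name)] += multiplicity
    return base


base = base_graph(EDGES, FIBER)
print(sorted(base.items()))

# Fibration property: all nodes of a fiber receive the same links, fiber by fiber
for v in FIBER:
    assert in_profile(v, EDGES, FIBER) == Counter(
        {src: m for (src, dst), m in base.items() if dst == FIBER[v]}
    )
```

In this small example the loop between nodes 2 and 3 collapses into a self-loop on the base node "A", which corresponds to the autoregulation loop mentioned in the text.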
In this work we only deal with surjective minimal graph fibrations and we call them symmetry fibrations for short.

In practice, a symmetry fibration maps $G$ to the minimal base $B$ (analogous to the quotient) through the following steps: (i) consider all the nodes in a fiber (which have isomorphic input trees) and choose one as the representative $I$; (ii) collapse the nodes in the fiber into one single node in $B$ and call it by the name of the representative node $I$; (iii) for every link of a node $j$ in $G$ directed to the node $I$ in $G$, add a link in $B$ from $j$ to $I$; if the node $j$ belongs to the fiber, then the corresponding link in $B$ is an autoregulation loop in $B$; (iv) repeat for every fiber in $G$. When fibers belong to disjoint components of the network, they are considered as distinct fibers.

V. ALGORITHM TO FIND FIBERS WITH MINIMAL BALANCE COLORING
The algorithm to partition the network into fibers is based on the 'minimal balanced coloring' algorithm developed by Cardon & Crochemore in Ref. [24]. Here we follow a version developed by Kamei & Cock [25] to construct a minimal balanced coloring of a network, namely a coloring that employs the least possible number of colors, which is associated with minimal graph fibrations. The algorithm's runtime scales as $O(|E_G| \log |N_G|)$, which implies that it is essentially linear in the network size, especially for sparse networks, and can be applied to very large networks.

The theory of balanced coloring is explained in Ref. [14]. A balanced coloring creates a partition of the nodes of $G$ into disjoint sets (corresponding to synchronous fibers) such that each node in one set receives the same number of colors from nodes within other sets [14, 20]. A coloring of $G$ with this property is a balanced coloring and represents an equitable partition of the network, see [14, 20]. The sets identified by a minimal balanced coloring partition the network with the minimal number of colors and correspond to the fibers of $G$ identified by minimal graph fibrations $\psi$ [13–15].

Thus, we color nodes such that synchronous nodes in a fiber receive the same colors from their synchronous nodes. As an example, the genes baeR and spy (Fig. 1a) have the same color and are in the same fiber since they receive the same colors from their neighbors: both baeR and spy receive one red color via the activator link from one red node (baeR from itself and spy from baeR) and one green activator link each from the green node cpxR.

The algorithm constructs a coloring of the nodes that is balanced. A coloring is balanced if two identically colored nodes are connected to identically colored nodes via their inbound links. Each balanced colored cluster is a fiber in the network. The fibers also correspond to the orbits in a network when the symmetries are automorphisms rather than isomorphisms in the input trees.
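A minimal Python sketch of such a coloring refinement is given here for orientation. It is a naive iterative refinement, not the $O(|E_G| \log |N_G|)$ procedure of Refs. [24, 25]; it assumes a single link type, and the function names `minimal_balanced_coloring` and `fibers` as well as the example edge list (the links of Fig. 5b stated in the text, node 1 omitted) are hypothetical. All nodes start with one color, and nodes are recolored according to the number of in-links they receive from each color until the number of colors stops growing.

```python
from collections import Counter


def minimal_balanced_coloring(nodes, edges):
    """Refine a one-color partition until the number of colors stops growing."""
    in_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        in_nbrs[v].append(u)

    color = {v: 0 for v in nodes}  # every node starts with the same color
    n_colors = 1
    while True:
        # Input vector of each node: number of in-links received from each color
        vector = {
            v: tuple(sorted(Counter(color[u] for u in in_nbrs[v]).items()))
            for v in nodes
        }
        # Nodes with identical input vectors share a color
        index = {}
        for v in nodes:
            index.setdefault(vector[v], len(index))
        color = {v: index[vector[v]] for v in nodes}
        if len(index) == n_colors:  # no new color was needed: balanced
            return color
        n_colors = len(index)


def fibers(color):
    """Group the nodes by color."""
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, set()).add(v)
    return sorted(sorted(group) for group in groups.values())


# Assumed example: links of Fig. 5b stated in the text, as (source, target)
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7)]
NODES = sorted({n for edge in EDGES for n in edge})
print(fibers(minimal_balanced_coloring(NODES, EDGES)))  # [[2, 3], [4, 5], [6], [7]]
```

Because each new partition refines the previous one, an unchanged number of colors means the partition itself is unchanged, which is why the color count can serve as the stopping criterion.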
The flow of the algorithm is exemplified with the example network of Fig. 6.

FIG. 6: Algorithm to find the fibers of a network through a minimal balanced coloring. The goal of the algorithm is to find a minimal balanced coloring of the network, so that two nodes have the same color only if they are connected to the same number of identically colored nodes via inbound links. The colors represent the fibers in the network.

• Step 1 - We start by assigning the same color to all nodes. In Fig. 6a all nodes are initially colored in blue. In addition, we assign to each link the color of the node from where it emanates. To update the coloring (or, equivalently, to generate a new partition) of nodes, we construct the table shown in the right panel of Fig. 6a, as explained next. In the top row of this table we put the network nodes colored with their current color. In the leftmost column we put each type of colored link. In this initial stage of the algorithm we only have a blue link for all the nodes. Then, we fill the entries of the table with the number of colored links of this blue type that are received by the corresponding node. For example, nodes 1, 2 and 3 receive two blue links each, nodes 4, 5 and 7 receive one blue link each, and node 6 receives none. The structure of this table determines the new coloring as explained in the next step.

• Step 2 - Using the table in Fig. 6a we update the coloring of nodes as follows. We assign the same color to all nodes that receive the same number of colored links of each type. Specifically, nodes 1, 2 and 3 receive two blue links, so we assign them the same (blue) color. Analogously, nodes 4, 5 and 7 receive one blue link, so we assign them the same color, but different from blue: we assign them a purple color. Similarly, we assign another color to node 6 (green). We then obtain the colored network in the left of Fig. 6b. Counting the colored links received by each node in this network, where each link has the color of the node from where it emanates, we update the table to generate the new coloring, as shown in the right panel of Fig. 6b.

• Step 3 - Using the same criterion as in Step 2, we update the coloring of nodes, comprising now five different colors, and then we generate the new table, as shown in Fig. 6c. At this point the algorithm stops, because we do not need to introduce more colors, since each color is balanced. Each color corresponds to a fiber, and each node in each colored fiber receives the same colors from other fibers or from nodes in the same fiber. Therefore, the coloring shown in the network of Fig. 6c is the minimal balanced coloring of the network, and the colors indicate the fibers in the network.

As long as only minimal fibrations are considered, the algorithm always returns the same fibers containing the same nodes, for any initial condition and realization. Below we provide the pseudo-code to clarify the algorithm. More detailed instructions and methodology for obtaining fiber building blocks will be given in a follow-up paper. We start by assigning all nodes to the same fiber and then continue to refine the partition based on the input set of each node until no further refinement can be obtained.

Algorithm 1 Finding fibers following Kamei & Cock, Ref. [25]
Input: Graph $G = \{N_G, E_G\}$, where $N_G$ are the vertices and $E_G$ are the edges of the analyzed network; $|N_G|$ is the number of vertices, $N_G = \{v_1 \dots v_{|N_G|}\}$
Output: $C = \{c_i\}$, where $c_i$ is the color of node $i$ and $i = 1 \cdots |N_G|$
Notation: $I_i = \{I_i^1 \dots I_i^N\}$, where $N$ is the current number of colors
    $N_0 = 1$
    for $i = 1 \cdots |N_G|$ do
        $c_i = 1$
    end for
    $j = 0$
    repeat
        for $i = 1 \cdots |N_G|$, $k = 1 \dots N_j$ do
            $I_i^k$ = number of nodes of color $k$ in the input set of $v_i$
        end for
        $H$ = set of all unique $\{I_i\}$
        // assign each unique vector a color and color the graph accordingly
        for $i = 1 \cdots |N_G|$ do
            $c_i$ = index of $I_i$ in $H$, e.g. if two nodes $i$ and $i'$ have $I_i = I_{i'}$, then $c_i = c_{i'}$
        end for
        $j = j + 1$
        $N_j = |H|$
    until $N_j = N_{j-1}$
    return $\{c_i\}$

VI. STRONGLY CONNECTED COMPONENT
In a directed network, a strongly connected component is composed of nodes that are reachable from every other node in the component. That is, there is a directed path from every node to any other node in the strongly connected component. A weakly connected component is obtained when we ignore the directionality of the links. Strongly connected components are relevant to genetic fibers since they contain loops that control the state of the genes. We find four types of strongly connected components. The first type consists of single-gene components composed of autoregulation loops, like cpxR and fadR in Figs. 1a and 1e. The other types are those in Fig. 2a and Fig. 2b, and a five-gene connected component shown in SI Fig. 7. We note that most of the fibers regulated by these components do not belong to the connected component. This is because they receive information but do not send information back to the connected component. These fibers are characterized by integer fiber numbers. When the fiber receives and sends back information, that is, when the fiber belongs to the strongly connected component, it becomes a Fibonacci fiber. The largest strongly connected component in the E. coli network controls the pH system shown in Fig. 2b.
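The definition of a strongly connected component can be made concrete with a short Python sketch. It is an illustration under assumptions rather than the method used for the E. coli analysis: the helper names `reachable` and `strongly_connected_components` are hypothetical, the example edge list contains only the links of Fig. 5b stated in the text (node 1 omitted), and the component of a node is computed directly from the definition as the set of nodes that it can reach and that can reach it.

```python
def reachable(start, adjacency):
    """Set of nodes reachable from `start` by following `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strongly_connected_components(nodes, edges):
    """Partition `nodes` into sets of mutually reachable nodes."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    components, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        # Nodes that v can reach and that can reach v
        component = reachable(v, forward) & reachable(v, backward)
        components.append(component)
        assigned |= component
    return components


# Assumed example: links of Fig. 5b stated in the text, as (source, target)
EDGES = [(3, 2), (4, 2), (2, 3), (5, 3), (6, 4), (6, 5), (3, 7)]
NODES = sorted({n for edge in EDGES for n in edge})
print(strongly_connected_components(NODES, EDGES))  # [{2, 3}, {4}, {5}, {6}, {7}]
```

In this sketch every isolated node is returned as its own trivial component; only components that contain a loop, such as {2, 3} here or a single node with a self-loop, correspond to the regulatory loops discussed above. For large networks a linear-time method such as Tarjan's algorithm would be preferable to this quadratic approach.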
VII. STATISTICS OF FIBERS IN THE TRN OF E. COLI

A. Fiber statistics in E. coli

SI Table I shows the counts in the E. coli network of each building block. For instance, the most abundant building blocks are the following:

$|n = 0, \ell = 1\rangle$: 45
$|n = 1, \ell = 0\rangle$: 13
$|n = 0, \ell = 2\rangle$: 13
$|n = 1, \ell = 1\rangle$: 8

The list is completed with the fractal building blocks of Fibonacci sequences, which are less numerous but more complex in their structure:

$|\varphi = 1...., \ell = 2\rangle$: 1
$|\varphi = 1...., \ell = 1\rangle$: 1
$|\varphi = 1...., \ell = 1\rangle$: 1

FIG. 7: A five-gene connected component of soxR, soxS, fnr, fur, and arcA with its regulated fibers.

Structure type                            Amount in E. coli
$|n = 0, \ell = 1\rangle$
$|n = 0, \ell = 2\rangle$
$|n = 0, \ell = 3\rangle$
$|n = 1, \ell = 0\rangle$
$|n = 1, \ell = 1\rangle$
$|n = 1, \ell = 2\rangle$
$|n = 2, \ell = 0\rangle$
$|n = 2, \ell = 1\rangle$
$|\varphi_d = 1...., \ell = 1\rangle$
$|\varphi_d = 1...., \ell = 1\rangle$
$|\varphi_d = 1...., \ell = 2\rangle$
Total number of building blocks
B. Full list of fibers in E. coli

SI Table VI shows the complete list of the 91 fiber building blocks found in the genetic network of E. coli. We list the genes in the fiber plus their external regulators. If a gene or operon is not in this list, for instance lacZYA, it means that the gene or operon is not in a fiber. Supplementary File 1 shows the plot of the circuit of every fiber and the fiber building block.

The first column in SI Table VI is the ID of the fiber. This ID refers to the plot of the fiber building block in Supplementary File 1. The second column lists the genes in the fiber, and the third column lists the external regulators. The last column specifies the fiber number associated with each fiber as $|n, \ell\rangle$ or $|\varphi_d, \ell\rangle$.

VIII. DATASETS OF BIOLOGICAL AND NON-BIOLOGICAL NETWORKS
To investigate the applicability of fibrations in a broader context, we performed an extensive analysis of different complex networks from diverse domains in systems science. Full details of each network analyzed can be accessed at https://docs.google.com/spreadsheets/d/1-RG5vR_EGNPqQcnJU8q3ky1OpWi3OjTh5Uo-Xa0PjOc . The codes to reproduce this analysis are at github.com/makselab and the full datasets appear at kcorelab.org . See also the tables below with information about the networks.

We first show the symmetry fibrations in biological networks and species; see Section I H. We characterize biological networks spanning from:

• Biological networks: transcriptional regulatory networks, metabolic networks, cellular processes networks and pathways, disease networks, neural networks.

We study the following species:

• Species: A. thaliana, E. coli, B. subtilis, S. enterica (salmonella), M. tuberculosis, D. melanogaster, S. cerevisiae (yeast), M. musculus (mouse), and H. sapiens (human).
We then study non-biological networks in Section I H:

• Social Networks: online social networks, Facebook, Twitter, Wikipedia, Youtube, email networks, communication networks, citation networks, collaboration networks, bloggers
• Internet: routers, autonomous systems, web graphs, hyperlinks, peer-to-peer
• Infrastructure Networks: power grid, airport, roads, flights
• Economic Networks
• Software Networks: Linux, jdk
• Ecosystems

Network Domain    Total No. of nodes    Total No. of edges    No. of networks
Biological        287390                4211856               289
Economic          1752                  108639                5
Ecosystems        1879                  5378                  14
Infrastructure    24511                 82534                 16
Internet          244634                835565                27
Social            104909                1261009               15
Software          43391                 503645                3

TABLE II: Features of the networks across domains. We report the total numbers for each domain summed over all the networks in the domain.

Species                       Total No. of nodes    Total No. of edges    No. of networks
Yeast                         55932                 1392926               11
Arabidopsis thaliana          790                   1431                  1
Bacillus subtilis             5602                  11417                 3
Drosophila                    39549                 321734                5
Escherichia coli              879                   1835                  1
Human                         72587                 1198712               248
Mycobacterium tuberculosis    1624                  3212                  1
Mouse                         64709                 987424                7
Salmonella                    8293                  15589                 6

TABLE III: Number of networks per species.

Type       A. thaliana  B. subtilis  C. elegans  Cat  Drosophila  E. coli  Human  M. tuberculosis  Mouse  Rat  Salmonella  Yeast
TF         1            2            2           0    4           1        4      1                4      0    2           11
Neuron     0            0            0           1    1           0        0      0                3      3    0           0
Metabolic  0            0            0           0    0           0        48     0                0      0    2           0
Disease    0            0            0           0    0           0        66     0                0      0    0           0
Kinase     0            0            0           0    0           0        2      0                0      0    0           0
Pathway    0            0            0           0    0           0        127    0                0      0    0           0
Protein    0            1            0           0    0           0        1      0                0      0    2           0

TABLE IV: Count of networks per type of biological network and species. These networks are used to calculate the distributions of fibers across species and biological types in Figs. 4a, b, and c. For each type of biological network in Fig. 4a, b, we calculate the count over the total number of networks as indicated at the end of each row for each biological type. The same occurs with the number of networks at the end of each column for each species.
Figure 4c shows the counts over all the networks shown in the last row/column.

Network Subdomain                         Total No. of nodes    Total No. of edges    No. of networks
Autonomous systems graphs                 141842                481415                14
Bitcoin                                   9664                  59777                 2
Collaboration networks                    50260                 504897                4
Disease                                   4309                  15254                 66
Facebook                                  4039                  88234                 1
Youtube subscriptions                     13723                 76765                 1
Internet peer-to-peer networks            31978                 110154                4
Jazz                                      198                   5484                  1
Linux                                     30837                 213954                1
Metabolic                                 4273                  33829                 50
Networks with ground-truth communities    1005                  25571                 1
Neural networks                           3694                  129812                8
Cellular processes and Pathways           9825                  54712                 127
Plant-Pollinator                          1631                  2719                  11
Plant-Seed-Disperser                      65                    165                   2
Power grid                                4941                  6594                  1
Sentiment                                 99                    278                   2
Transcriptional regulatory                260258                3908769               32

TABLE V: Subtypes of networks belonging to the different domains.

Id    Fiber    Regulators    Fiber Number
1    aaeR, ampDE, azuC, comR, cyaA, narQ, sohB, speC, spf, trxA, yaeP-rof, yaeQ-arfB-nlpE, yjeF-tsaE-amiB-mutL-miaA-hfq-hflXKC    crp    $|n = 0, \ell = 1\rangle$
2    ...    ...    $|n = 0, \ell = 1\rangle$
3    ...    ...    $|n = 1, \ell = 0\rangle$
4    ...    ...    $|n = 1, \ell = 1\rangle$
5    ...    ...    $|n = 0, \ell = 2\rangle$
6    ...    ...    $|n = 0, \ell = 3\rangle$
7    ...    ...    $|n = 0, \ell = 1\rangle \oplus |n = 1, \ell = 1\rangle$
8    ...    ...    $|\varphi_d = 1...., \ell = 1\rangle$
9    ...    ...    $|n = 1, \ell = 0\rangle$
10    alaA-yfbR, avtA, leuE, livJ, livKHMGF, lysU, sdaA    lrp    $|n = 0, \ell = 1\rangle$
11    alaE, kbl-tdh, yojI    lrp    $|n = 0, \ell = 1\rangle$
12    alaWX, argU, argW, argX-hisR-leuT-proM, aspV, flxA, glyU, leuQPV, leuX, lptD-surA-pdxA-rsmA-apaGH, lysT-valT-lysW, metT-leuW-glnUW-metU-glnVX, pheU, pheV, proK, proL, queA, serT, serX, thrU-tyrU-glyT-thrT-tufB, thrW, trmA, tyrTV-tpr, valUXY-lysV    fis    $|n = 0, \ell = 1\rangle$
13    aldB, hupB    crp, fis    $|n = 0, \ell = 2\rangle$
14    allA, allS, gcl-hyi-glxR-ybbW-allB-ybbY-glxK    allR    $|n = 0, \ell = 1\rangle$
15    alsR, rpiB    -    $|n = 1, \ell = 0\rangle$
16    amiA-hemF, cmk-rpsA-ihfB, uspB    IHF    $|n = 0, \ell = 1\rangle$
17    ...    ...    $|n = 1, \ell = 0\rangle$
18    ampC, dacC    bolA    $|n = 0, \ell = 1\rangle$
19    araE-ygeA, araFGH    araC, crp    $|n = 0, \ell = 2\rangle$
20    arcZ, ydeA    arcA    $|n = 0, \ell = 1\rangle$
21    argA, argCBH, argE, argF, argI, argR, artJ, artPIQM, lysO    -    $|n = 1, \ell = 0\rangle$
22    argO, lysP    argP, lrp    $|n = 0, \ell = 2\rangle$
23    aroF-tyrA, tyrB    tyrR    $|n = 0, \ell = 1\rangle$
24    aroH, trpLEDCBA, trpR    -    $|n = 1, \ell = 0\rangle$
25    asnB, clpPX-lon, glsA-ybaT, uspE    gadX    $|n = 0, \ell = 1\rangle$
26    aspA-dcuA, dcuR    crp, fnr, narL    $|n = 0, \ell = 3\rangle$
27    bacA, cpxPQ, cpxR, ftnB, ldtC, ldtD, ppiD, sbmA-yaiW, slt, srkA-dsbA, xerD-dsbC-recJ-prfB-lysS, yccA, yebE, yidQ, yqaE-kbp, yqjA-mzrA    -    $|n = 1, \ell = 0\rangle$
28    baeR, spy    cpxR    $|n = 1, \ell = 1\rangle$
29    bcsABZC, fnrS, pdeF, pepT, pitA, ravA-viaA, tar-tap-cheRBYZ, upp-uraA, xdhABC, ydeJ, ytiCD-idlP-iraD    fnr    $|n = 0, \ell = 1\rangle$
30    bdcA, dkgB, grxD, mepH, mhpT, pgpC-tadA, rfe-wzzE-wecBC-rffGHC-wecE-wzxE-rffT-wzyE-rffM, rybB, tehAB, tsgA, ydbD, yeaE    nsrR    $|n = 0, \ell = 1\rangle$
31    betI, betT    arcA, cra    $|n = 1, \ell = 2\rangle$
32    bioA, bioBFCD    birA    $|n = 0, \ell = 1\rangle$
33    bluF, ydeI    rcdA    $|n = 0, \ell = 1\rangle$
34    borD, envY-ompT, mgrB, mgrR, mgtLA, mgtS, pagP, rstA, ybjG    phoP    $|n = 0, \ell = 1\rangle$
35    cbpAM, gltX, gyrB, msrA    fis    $|n = 0, \ell = 1\rangle$
36    cdaR, garD, gudPXD    -    $|n = 1, \ell = 0\rangle$
37    ...    ...    $|n = 1, \ell = 0\rangle$
38    cirA, entCEBAH, fepA-entD, fiu    crp, fur    $|n = 0, \ell = 2\rangle$
39    copA, cueO    cueR    $|n = 0, \ell = 1\rangle$
40    cra, pitB, sbcDC    phoB    $|n = 0, \ell = 1\rangle$
41    crl 1, exbBD, fepDGC, fhuACDB, fhuE, gpmA, metJ, nohA-ydfN-tfaQ, ryhB, ygaC, yhhY, yjjZ    fur    $|n = 0, \ell = 1\rangle$
42    cusCFBA, cusR, yedX    hprR, phoB    $|n = 1, \ell = 2\rangle$
43    cvpA-purF-ubiX, glrR-glnB, hflD-purB, lolB-ispE-prs, purC, purEK, purL, speAB    purR    $|n = 0, \ell = 1\rangle$
44    cysDNC, cysK, tcyP, yciW, ygeH, yoaC    cysB    $|n = 0, \ell = 1\rangle$
45    cytR, nagC, nagE, ycdZ    crp    $|n = 1, \ell = 1\rangle$
46    dapB, lysC    argP    $|n = 0, \ell = 1\rangle$
47    ddpXABCDF, patA, potFGHI, yeaGH, yhdWXYZ    ntrC    $|n = 0, \ell = 1\rangle$
48    decR, mlaFEDCB, yncE    marA    $|n = 0, \ell = 1\rangle$
49    dgcC, iraP, nlpA, wrbA-yccJ, yccT    csgD    $|n = 0, \ell = 1\rangle$
50    dicB-ydfDE-insD-7-intQ, dicC-ydfXW    dicA    $|n = 0, \ell = 1\rangle$
51    dsdC, norR    nsrR    $|n = 1, \ell = 1\rangle$
52    dtpA, omrA, omrB    ompR    $|n = 0, \ell = 1\rangle$
53    ecpA, ecpR    matA    $|n = 0, \ell = 1\rangle$
54    efeU 1U 2, motAB-cheAW, psd-mscM, tsr, ung    cpxR    $|n = 0, \ell = 1\rangle$
55    epd-pgk-fbaA, gapA-yeaD, mpl    cra, crp    $|n = 0, \ell = 2\rangle$
56    erpA, iscR, rnlAB    -    $|n = 1, \ell = 0\rangle$
57    evgA, nhaR    hns    $|\varphi_d = 1...., \ell = 1\rangle$
58    fabA, fabB    fabR, fadR    $|n = 0, \ell = 2\rangle$
59    fadE, fadIJ    arcA, fadR    $|n = 0, \ell = 2\rangle$
60    fbaB, fruBKA, glk, gpmM-envC-yibQ, pfkA, ppc, pykF, pyrG-eno, tpiA    cra    $|n = 0, \ell = 1\rangle$
61    ...    ...    $|n = 0, \ell = 1\rangle$
62    folE-yeiB, metA, metC, metF    metJ    $|n = 0, \ell = 1\rangle$
63    fpr, pqiABC, rirA-waaQGPSBOJYZU    marA, soxS    $|n = 0, \ell = 2\rangle$
64    fucAO, fucR, zraR    crp    $|n = 1, \ell = 1\rangle$
65    gfcA, ybhL, yfiR-dgcN-yfiB, ymiA-yciX    yjjQ    $|n = 0, \ell = 1\rangle$
66    hupA, trg    crp, fis    $|n = 0, \ell = 2\rangle$
67    ibaG-murA, rplU-rpmA-yhbE-obgE    mlrA    $|n = 0, \ell = 1\rangle$
68    ibpAB, yadV-htrE    IHF    $|n = 0, \ell = 1\rangle$
69    idnK, idnR    crp, gntR    $|n = 1, \ell = 2\rangle$
70    isrC-flu, pth-ychF    oxyR    $|n = 0, \ell = 1\rangle$
71    lgoR, uxuR    crp, exuR    $|\varphi_d = 1...., \ell = 2\rangle$
72    lolA-rarA, osmB    rcsB    $|n = 0, \ell = 1\rangle$
73    lsrACDBFG-tam, lsrR, oxyR, rbsR    crp    $|n = 1, \ell = 1\rangle$
74    malI, mlc    crp    $|n = 1, \ell = 1\rangle$
75    manA, yhfA    crp    $|n = 0, \ell = 1\rangle$
76    mngAB, mngR    -    $|n = 1, \ell = 0\rangle$
77    nadA-pnuC, nadB    nadR    $|n = 0, \ell = 1\rangle$
78    nimR, nimT    -    $|n = 1, \ell = 0\rangle$
79    ompX, rpsP-rimM-trmD-rplS, ychO, ysgA    fnr    $|n = 0, \ell = 1\rangle$
80    pepD, yhbTS    csgD    $|n = 0, \ell = 1\rangle$
81    phoP, slyB    -    $|n = 2, \ell = 0\rangle$
82    pspABCDE, pspG    IHF, pspF    $|n = 0, \ell = 2\rangle$
83    purR, pyrC    fur    $|n = 1, \ell = 1\rangle$
84    rhaR, rhaS    crp    $|n = 2, \ell = 1\rangle$
85    rrsA-ileT-alaT-rrlA-rrfA, rrsE-gltV-rrlE-rrfE    fis, lrp    $|n = 0, \ell = 2\rangle$
86    rrsB-gltT-rrlB-rrfB, rrsC-gltU-rrlC-rrfC, rrsD-ileU-alaU-rrlD-rrfD-thrV-rrfF, rrsG-gltW-rrlG-rrfG, rrsH-ileV-alaV-rrlH-rrfH    fis, hns, lrp    $|n = 0, \ell = 3\rangle$
87    ssb, uvrA    arcA, lexA    $|n = 0, \ell = 2\rangle$
88    ttdABT, ttdR    -    $|n = 1, \ell = 0\rangle$
89    ...    ...    $|n = 0, \ell = 1\rangle$
90    yegRZ, yfdX-frc-oxc-yfdVE    evgA    $|n = 0, \ell = 1\rangle$