A system-wide network reconstruction of gene regulation and metabolism in Escherichia coli
Anne Grimbs, David F. Klosik, Stefan Bornholdt, Marc-Thorsten Hütt
Computational Systems Biology, Department of Life Sciences & Chemistry, Jacobs University, Bremen, 28759, Germany
Institute for Theoretical Physics, University of Bremen, Bremen, 28359, Germany
∗ [email protected]

Genome-scale metabolic models have become a fundamental tool for examining metabolic principles. However, metabolism is not solely characterized by the underlying biochemical reactions and catalyzing enzymes, but is also affected by regulatory events. Since the pioneering work of Covert and co-workers as well as Shlomi and co-workers, it has been debated how regulation and metabolism synergistically characterize a coherent cellular state. The first approaches started from metabolic models which were extended by the regulation of the encoding genes of the catalyzing enzymes. By now, bioinformatics databases in principle allow addressing the challenge of integrating regulation and metabolism on a system-wide level. Collecting information from several databases, we provide a network representation of the integrated gene regulatory and metabolic system for Escherichia coli, including major cellular processes, from metabolic processes via protein modification to a variety of regulatory events. Besides transcriptional regulation, we also take into account regulation of translation, enzyme activities and reactions. Our network model provides novel topological characterizations of system components based on their positions in the network. We show that network characteristics suggest a representation of the integrated system as three network domains (regulatory, metabolic and interface networks) instead of two. This new three-domain representation reveals the structural centrality of components with known high functional relevance. This integrated network can serve as a platform for understanding coherent cellular states as active subnetworks and for elucidating crossover effects between metabolism and gene regulation.

Introduction
So far, metabolic processes and gene regulatory events have typically been considered individually in system-level investigations. However, ample evidence exists that the majority of cellular processes involve both metabolism and gene regulation, and thus require their joint examination [1]. One of the best-investigated individual examples in Escherichia coli (E. coli) is the phosphoenolpyruvate–carbohydrate phosphotransferase system (PTS), which is responsible for the import and phosphorylation of sugars [2]. Additionally, the PTS is involved in the regulation of the import process depending on the carbohydrate mixtures available in the growth medium. By carbon catabolite repression and inducer exclusion, the uptake of a preferred carbon source, such as glucose, is selected over that of other carbohydrates present in the growth medium. In order to understand the underlying principles, not only the effects of both 'layers', metabolism and regulation, need to be taken into account, but also their interface [3].

On a more qualitative level, the importance of the interface of metabolism and gene regulation can be illustrated by having a closer look at their most prominent representatives, namely, enzymes and metabolic transcriptional regulators. Both examples are proteins and can be thought of as a component type organizing the interplay of genes and metabolic reactions (Figure 1). For enzymes the connection is straightforward: the majority of metabolic reactions can only take place if the corresponding genes of the catalyzing enzymes are expressed. These genes, in turn, are often involved in regulatory processes, especially if they are associated with central biochemical reactions. In contrast, metabolic transcriptional regulators can be illustrated by looking at transcription factors, probably the best-investigated transcriptional regulators. Some of them require the binding of a metabolite to be active and are therefore called metabolic transcriptional regulators. In the context of the integrative view discussed here, it is noteworthy that only the interaction with a metabolic component enables their functionality as gene expression regulators.

Conventional reconstructions of E. coli's metabolism as well as of its gene regulation thoroughly describe the process itself but usually lack information on interacting elements of the other biological system. While there are numerous genome-scale metabolic reconstructions available [4–9], only a few large-scale transcriptional regulatory networks exist, and these are mainly based on the information from RegulonDB [10]. First attempts to integrate both cellular processes started from metabolic reconstructions which were expanded by regulatory genes and stimuli of the associated encoding metabolic genes [11, 12]. Both studies started from the metabolic model of Reed et al. [5] and include 104 regulatory genes and 583 regulatory rules regulating approximately 50 % of the metabolic genes. In this manner, the close proximity of regulatory events was captured, but more far-reaching and global effects, e.g., self-contained regulatory dynamics among genes, could not be considered. Further approaches examine the regulatory processes of the metabolic network based on the aforementioned pioneering attempts [13, 14]. For this purpose, the information about regulatory events was assembled in terms of Boolean rules as a variant of Boolean network models.

More recently, Chandrasekaran and Price [15] introduced a method called probabilistic regulation of metabolism, a new variant of regulatory flux balance analysis, i.e., the class of approaches behind some of the pioneering integrative models discussed above [11, 12]. The state of the many variants of integrating regulatory information into flux balance analysis models has been reviewed in [16] and [17]. The necessity of achieving such data integration, even on the network level, has recently been discussed in [18].
To a certain degree, all these studies consider the regulation of metabolism, but they only cover its immediate proximity rather than the genome scale.

Understanding the interplay of metabolism and gene regulation will help to gain insight into cellular, system-wide responses, such as those to changing environmental conditions. Here, we present the database-assisted reconstruction of an integrative E. coli network capturing metabolic as well as regulatory processes. The attribution of network components (in terms of individual vertices) to the metabolic and regulatory domains, as well as to the protein interface, enables the further characterization of the network in terms of its modular organization, its path statistics and the vertex centrality.

In particular, we formulate a new measure by evaluating domain-traversing paths, in order to quantitatively assess the role of components in the interface domain and thus identify cross-systemic key elements contributing to both regulatory and metabolic processes. In all cases, these topological assessments highlight system components and functional subsystems which are well known for their biological relevance, thus emphasizing the predictive power of network topology. Employing observations on the topological (structural, network-architectural) level in order to identify components of particular functional relevance in the system has a long tradition in network biology (and in network science in general).

[Figure 1 diagram labels: gene regulation, metabolism; gene, transcription factor, enzyme, compound, reaction; gene expression, transcriptional regulation, enzyme catalysis.]

Figure 1: Schematic representation of the involved processes and biological elements in the integrative metabolic-regulatory E. coli network. Gene regulatory processes primarily comprise genes and several proteins (monomers as well as complexes), mainly transcription factors. In contrast, metabolic processes are predominantly defined by small molecules and the catalyzing biochemical reactions. The interactions between regulatory and metabolic processes can be mainly characterized by proteins (also modified proteins) serving as enzymes and regulators, respectively. While regulatory links are represented as dashed lines, the encoding and reaction-associated links are shown as solid lines.

The main results of our investigation are as follows. We present an integrated network representation of gene regulation and metabolism of E. coli and illustrate how it is a promising starting point for the structural investigation of system-wide phenomena. In particular, the network perspective suggests the explicit consideration of a protein interface between the genetic and metabolic realms of the cell. Employing network metrics, we (1) argue that a three-domain partitioning is architecturally and functionally plausible, and (2) show that prominent components of the network according to the structural investigation tend to be of evident biological importance. In particular, the evaluation of possible paths through the interface domain of the network reconstruction yields well-known functional subsystems. The overlap of structural and biological relevance suggests that a careful analysis of such a structural model can guide biological investigations by focusing on a limited number of structurally outstanding components. This network model can also serve as a starting point for a range of topological analyses with methods developed in statistical physics (see, e.g., [19] for a recent review).

In summary, in contrast to the separate analyses of subsystems (e.g., the metabolic or the gene regulatory one), we expect that the integrative network model shown here will draw attention to system-wide feedback loops not contained in the individual subsystems and to different roles of individual components, which only become visible from the perspective of interdependent networks.

Results
Database-assisted network reconstruction
By now, the dramatic growth of bioinformatics databases [20], both in content and in diversity, allows addressing the challenge of integrating regulation and metabolism on a system-wide level. We devised a semi-automated framework to integrate information from the EcoCyc database [21] and RegulonDB [10] into a network for E. coli including major cellular processes, from metabolic processes via protein modifications to a variety of regulatory events (see Methods). Networks are an efficient data structure for integrating this wealth of information [22–24]. In this way, the vast amount of information contained in the bioinformatics databases provides an 'architectural embedding' for metabolic-regulatory networks and guides subsequent steps of model refinement and validation. We augmented and validated the resulting network based on existing reconstructions of metabolic [6, 8, 25–27] as well as of gene regulatory processes [10].

The integrative E. coli network constructed here comprises the three major biological components, genes, proteins, and metabolites, as well as the metabolizing reactions, summing up to more than 12,000 components. Represented as a graph, the network has seven types of vertices (Figure 2, Table S1) and seven different types of edges, including two types of encoding associations, four reaction-associated relations, and regulatory links (Table S2). The graph representation facilitates the mapping of reactions and their catalyzing enzymes, as both are depicted as vertices. In contrast, metabolic systems are often represented as hypergraphs to illustrate the Boolean 'AND' association of reaction educts and the fixed stoichiometric ratio of the involved metabolites. In the graph representation, those aspects are assigned explicitly as edge properties. Besides the associations of reaction educts, the encoding relations of protein complexes are of Boolean 'AND' type, termed conjunct links. In contrast, associations representing isoforms of protein subunits, isoenzymes as well as reaction products are implemented by Boolean 'OR' links, called disjunct. The third linkage type, regulation, covers approximately 7,300 regulatory associations, i.e., transcriptional, translational as well as metabolic ones (Table S3).

Vertex composition                      iMC1010*
Reaction                     4693       569/767
Compound                     2681       557/615
Gene                         2545       971/1010
Protein monomer              1917       771/817
Protein-protein complex       929
Protein-compound complex      100
Protein-RNA complex             3
Total                       12868       2868/3209
* accounted only for enzymatic reactions and unique metabolites (1076 and 762 in total)

Figure 2: Spring-block graph representation (using a scalable force-directed placement algorithm) and vertex composition of the integrative E. coli network. The coverage of the pioneer model from Covert et al. [11] is provided in column iMC1010.
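The graph representation described above (typed vertices, conjunct and disjunct links, stoichiometry stored as an edge property) can be illustrated with a minimal Python sketch. It is not the reconstruction framework used here: the class `TypedGraph`, its methods and all vertex names are hypothetical, and the toy network only mimics the Boolean 'AND' and 'OR' semantics of the links.

```python
from dataclasses import dataclass

LINK_TYPES = ("conjunct", "disjunct", "regulation")


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    link: str                   # one of LINK_TYPES
    stoichiometry: float = 1.0  # kept as an edge property instead of a hyperedge


class TypedGraph:
    """Toy directed graph with typed vertices and typed edges (hypothetical helper)."""

    def __init__(self):
        self.vertex_type = {}
        self.edges = []

    def add_vertex(self, name, vertex_type):
        self.vertex_type[name] = vertex_type

    def add_edge(self, source, target, link, stoichiometry=1.0):
        if link not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link}")
        if source not in self.vertex_type or target not in self.vertex_type:
            raise KeyError("add both end points as vertices first")
        self.edges.append(Edge(source, target, link, stoichiometry))

    def inputs(self, target, link):
        return [e.source for e in self.edges if e.target == target and e.link == link]

    def is_enabled(self, target, present):
        """AND over conjunct inputs, OR over disjunct inputs (if there are any)."""
        conjunct = self.inputs(target, "conjunct")
        disjunct = self.inputs(target, "disjunct")
        all_required = all(v in present for v in conjunct)
        any_alternative = not disjunct or any(v in present for v in disjunct)
        return all_required and any_alternative


# Toy network with invented vertex names: two educts feed one reaction,
# two monomers form one protein complex.
g = TypedGraph()
for name, vertex_type in [
    ("cpd_a", "compound"), ("cpd_b", "compound"), ("cpd_c", "compound"),
    ("rxn_1", "reaction"),
    ("prot_x", "protein monomer"), ("prot_y", "protein monomer"),
    ("cplx_xy", "protein-protein complex"),
]:
    g.add_vertex(name, vertex_type)

g.add_edge("cpd_a", "rxn_1", "conjunct", stoichiometry=1.0)
g.add_edge("cpd_b", "rxn_1", "conjunct", stoichiometry=2.0)
g.add_edge("rxn_1", "cpd_c", "disjunct")
g.add_edge("prot_x", "cplx_xy", "conjunct")
g.add_edge("prot_y", "cplx_xy", "conjunct")

print(g.is_enabled("rxn_1", {"cpd_a"}))           # one educt missing
print(g.is_enabled("rxn_1", {"cpd_a", "cpd_b"}))  # all educts present
```

Keeping reactions and enzymes as ordinary vertices, as in this sketch, is what allows both to be mapped within one graph; the price is that the 'AND' semantics has to be carried by the link type rather than by the graph structure itself.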
The metabolic and regulatory processes
The comparison with existing models reveals that the presented integrative network is a comprehensive representation of the metabolic and regulatory processes in E. coli. The very first approach of embedding metabolic processes in the regulatory context, the iMC1010 model of Covert et al. [11], started from a metabolic model which was extended by the regulation of the encoding genes of the catalyzing enzymes. For the purpose of determining the overlap of the integrative metabolic-regulatory network and the iMC1010 model, transport reactions as well as the artificial biomass reaction have been disregarded and, moreover, only unique metabolites (neglecting compartmentation) have been taken into account. Otherwise, the different levels of detail of the transport systems such as the PTS, as well as of the compound compartmentation, would render a correct mapping impossible. Overall, the iMC1010 model is covered by our model to more than 89 % (Figure 2, see Table S4, column 3).

To assess the coverage of E. coli's metabolic processes, the embedded metabolic processes of the integrative E. coli network have been mapped to those of an established E. coli metabolic reconstruction, namely the iAF1260 model from Feist et al. [6]. About 67 % of the involved biochemical reactions, compounds and genes could be mapped directly (see Table S4, column 4). In particular, these two thirds capture almost all biologically relevant components in terms of in silico viability. Using flux balance analysis to simulate the biomass production capacity of the iAF1260 model and taking the overlap with the mapped components of the integrative E. coli network revealed that, for the default medium setup, approximately 75 % of the essential reactions (to yield 1 % biomass) are covered by the integrative E. coli network.

Analogous to the metabolic processes, the coverage of E. coli's gene regulation has been determined using the transcriptional regulatory network from RegulonDB [10]. This model has been assembled in a similar fashion but accounts only for transcription factors and their regulated genes. With a coverage of more than 98 %, the transcription-related regulatory processes are considered to be completely recorded in the integrative E. coli network (see Table S4, column 5). For this assessment of overlap, a comparison of regulatory processes associated with RNA translation as well as of metabolic regulatory events is not possible, since the RegulonDB transcriptional regulatory network does not consider protein and metabolic interaction processes.
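The overall iMC1010 coverage quoted above can be retraced from the per-type entries in the iMC1010 column of Figure 2. The following short Python sketch does this arithmetic; it assumes that each entry reads "mapped/total" for the respective vertex type, and the variable names are illustrative only.

```python
# (mapped, total) pairs read from the iMC1010 column of Figure 2;
# the interpretation as "mapped/total" is an assumption of this sketch.
imc1010_coverage = {
    "reaction": (569, 767),
    "compound": (557, 615),
    "gene": (971, 1010),
    "protein monomer": (771, 817),
}

per_type = {name: mapped / total for name, (mapped, total) in imc1010_coverage.items()}
mapped_total = sum(mapped for mapped, _ in imc1010_coverage.values())
model_total = sum(total for _, total in imc1010_coverage.values())
overall = mapped_total / model_total

for name, value in per_type.items():
    print(f"{name:16s} {value:6.1%}")
print(f"{'overall':16s} {overall:6.1%}  ({mapped_total}/{model_total})")
```

The sums reproduce the last row of the vertex composition (2868/3209), which corresponds to the stated coverage of more than 89 %.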
The interface of metabolic and regulatory processes
The most conspicuous links between metabolic and gene regulatory processes are metabolic transcription factors, i.e., gene expression regulators binding metabolites, and metabolic genes, i.e., genes with a significant and coordinated response on the metabolic level, such as genes encoding enzymes. Intuitively, the interface has so far been considered as the direct interactions of metabolic elements and gene regulatory elements, so that the integrative E. coli network can be partitioned into a metabolic and a regulatory domain (MD – RD).

However, by examining those interactions in more detail, the topological role of proteins becomes apparent. Regarding the metabolic transcription factors, the respective metabolite binds to a protein, and this metabolite-protein complex subsequently regulates the gene expression. In the case of metabolic genes, the respective gene ultimately encodes a protein which, either by itself or as part of a complex, serves as an enzyme. In line with this, the interface of metabolic and gene regulatory processes should be considered as the series of interactions of metabolites and genes, respectively, with proteins and subsequent protein modifications. Thus, the interface does not only comprise interactions (edges) but also components (vertices), and the integrative E. coli network will in the following be divided into a metabolic domain, a protein interface and a regulatory domain (MD – PI – RD).

In the next section, the plausibility of the three-domain partition (and of the set of biologically motivated rules devised to create it) will be assessed in comparison to the likewise proposed two-domain (MD – RD) representation.
The interface structure – a matter of network partition
In order to assess the large-scale structure of the reconstructed network, we apply a set of rules that assign each vertex of the network to one of two and three domains, respectively, by considering the biological types of the vertices themselves as well as those of their neighbors (as outlined in the Methods section). Since these rules have been designed to group together vertices connected to the same biological processes, we expect them to result in biologically plausible network partitions.

To complement the two functional partitions, MD – RD and MD – PI – RD, two partitions that solely take into account the vertex types have been analyzed, also representing a metabolic-regulatory division into two and three domains, respectively. For the vertex-driven two-domain partition, the sets of gene and protein vertices denote the regulatory processes, while in the three-domain partition regulation is given by the set of genes and the interface domain consists only of the protein vertices. In both cases, metabolism is represented by the sets of reactions and compounds. In the vertex-driven three-domain partition, the set of protein vertices thus forms an interface similar to that of the MD – PI – RD partition (Figure 3). The functional and vertex-driven three-domain partitions are of roughly similar size in terms of vertex count, while the respective two-domain partitions have a metabolic-regulatory vertex ratio of 5:1 and 4:3, respectively (see Table 1).

First, the two three-domain partitions will be compared, i.e., the functional partition, MD – PI – RD, and the vertex-driven partition. In the following, we will argue that the additional third domain acts as an interface between the regulatory and metabolic domains in the functional partition, while we will see that the vertex-driven partition fails to give a coherent picture of the domain-level organization of the biological system.

In particular, it will become clear, also in later sections, that the interface domain in the functional partition contains processes that are known to play prominent roles in system-scale communication within the cell, and may therefore be considered an important component of the large-scale organizational structure of the combined regulation and metabolism of E. coli.

A simple quantity to illustrate the domain-level picture is the fraction of inter-module edges (linking to a vertex of a different domain) over all edges connected to vertices of a specific domain (i.e., external and internal edges). Of course, there is no objectively 'correct' partition against which the result of our procedure could be measured, but there are a number of fundamental properties that a biologically plausible partition in the given context should possess. First, a proper interface provides the main means of communication between the regulatory and the metabolic processes, i.e., the majority of paths between the outer two domains should run through the interface. Indeed, the interface of the functional partition shows a considerably larger inter-module edge fraction than the remaining domains (0.7 compared to 0.5 and 0.1, Table 1), stressing its special character as a bridging module. A high inter-module edge fraction of the interface is also found in the vertex-driven partition; however, its regulatory domain shows an even higher inter-module edge fraction, which indicates an entanglement between the two groups rather than one domain acting as a bridging module to another domain. This gives rise to the second criterion: the domains should capture actual processes (here, structures on the level of several vertices). Regulatory and metabolic processes should be contained unambiguously within the respective domain, so that system-wide interaction takes place between processes. In the following section, Interface characterization, it will be shown that this is also the case for the interface in the functional partition. In contrast, in the vertex-driven partition the regulatory domain already shows deficiencies with respect to that criterion.
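The inter-module edge fraction defined above can be written down in a few lines. The following Python sketch is an illustration under simplifying assumptions, not the analysis code behind Table 1: edges are plain (source, target) pairs, the function name and the toy vertices are hypothetical, and an inter-module edge is counted once for each of the two domains it touches.

```python
from collections import defaultdict


def inter_module_edge_fraction(edges, domain_of):
    """Per domain: edges leaving the domain / all edges touching the domain."""
    touching = defaultdict(int)  # edges with at least one end point in the domain
    crossing = defaultdict(int)  # of those, edges whose end points differ in domain
    for source, target in edges:
        d_source, d_target = domain_of[source], domain_of[target]
        if d_source == d_target:
            touching[d_source] += 1
        else:
            for domain in (d_source, d_target):
                touching[domain] += 1
                crossing[domain] += 1
    return {domain: crossing[domain] / touching[domain] for domain in touching}


# Toy example with invented vertices: genes (RD), proteins (PI),
# one reaction and two compounds (MD).
domain_of = {"g1": "RD", "g2": "RD", "p1": "PI", "p2": "PI",
             "r1": "MD", "c1": "MD", "c2": "MD"}
edges = [("g1", "g2"), ("g1", "p1"), ("p1", "p2"), ("p2", "r1"),
         ("c1", "r1"), ("r1", "c2"), ("c2", "p1")]

fractions = inter_module_edge_fraction(edges, domain_of)
print(fractions)
```

In this toy network the interface has the largest fraction (three of its four edges lead to another domain), which is the qualitative signature of a bridging module discussed above.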
Since the regulatory domain of the vertex-driven partition solely contains gene-gene interactions, the intermediate transcription factor steps lie outside the domain, which becomes visible in its almost exclusively inter-module edges linking it to the interface domain.

Next, we compare the three-domain partitions with the two-domain partitions. While the introduction of a third domain allows the system to be studied in terms of an explicit interface, the partition into two domains is much closer to common biological intuition. The question which needs to be answered is whether metabolism and gene regulation are solely interfaced by the linking processes, such as gene expression and the activation or inhibition of transcription factors and genes, so that the system can appropriately be described with two domains, or whether there is an actual interface that comprises entire processes, additionally including protein modifications and the like. Here, this question will be assessed from a topological perspective.

A relevant topological quantity is the network modularity [28] of a given network partition. For a biologically meaningful classification, one would expect on the network level that the regulatory and the metabolic domains show high intra-module connectivity (a large number of links within a domain) and sparse inter-module linkages (a small number of links between domains). Accordingly, the network modularity should be high for a successful partition. The results for the modularity are listed in Table 1. The functional three-domain partition clearly outperforms the vertex type-driven partitions. Also, when going from MD – RD to MD – PI – RD there is a notable increase in the modularity of the network (from M = 0.157 to M = 0.287).

[Figure 3 panel titles: A, functional three-domain partition; B, functional two-domain partition; C, vertex type-driven three-domain partition; D, vertex type-driven two-domain partition.]

Figure 3: Graph snapshots of the four partitions: the functional three-domain partition into metabolic and regulatory domains and protein interface (MD – PI – RD) (A), the functional two-domain partition into metabolic and regulatory domains (MD – RD) (B), the vertex-driven three-domain partition into compounds/reactions, proteins and genes (C), and the vertex-driven two-domain partition into compounds/reactions and proteins/genes (D). Vertices are colored according to their domain affiliation: yellow for the (pseudo) regulatory and gene-focused domain, respectively, and blue for the (pseudo) metabolic and compound-focused domain, respectively. The interface domain in the three-domain partitions is drawn in red. The diagrams in the top right corner of each panel show the edge composition of the system in terms of intra-domain and inter-domain edges.

Table 1: Topological properties of the functional and vertex type-driven network partitions. The functional partitions are denoted by the respective modules, metabolic domain (MD), regulatory domain (RD) and protein interface (PI). The vertex type-driven partitions are represented by the comprising vertex types, reaction, compound, gene, and protein. For each property, the module-specific coefficients and contributions (I, II, III) are presented, respectively. For the modularity, M, the overall network coefficient (Total) is shown and the best coefficient is marked with an asterisk; the module-specific values correspond to the terms in the sum of equation (1).

                                      Functional partitions        Vertex-driven partitions
                                      MD – PI – RD   MD – RD        reaction/compound –   reaction/compound –
                                                                    protein – gene        protein/gene
Property                    Module    I – II – III   I – III        I – II – III          I – III
Vertices                    I         8369           10655          7374                  7374
                            II        2286                          2949
                            III       2213           2213           2545                  5494
Inter-module edge fraction  I         0.086          0.106          0.319                 0.319
                            II        0.701                         0.915
                            III       0.485          0.485          0.969                 0.49
Modularity, M               Total     0.287*         0.157          0.081                 0.226
                            I         0.166          0.079          0.113                 0.113
                            II        0.042                         -0.027
                            III       0.078          0.079          -0.005                0.113

Interface characterization
The interface of metabolic and gene regulatory processes of the integrative E. coli network comprises, as expected, predominantly proteins, i.e., monomers and complexes (Table S1), and mainly protein modification processes such as protein translation, protein complex formation and biochemical protein conversion (Table S2). On closer examination, the covered processes can be divided into internal and peripheral ones. In accordance with the bridging role of the interface, the majority of these are peripheral processes (Figure 3, Table S2). The peripheral processes, in turn, can be subdivided according to their directionality, i.e., from the regulatory to the metabolic domain (subsequently termed 'downwards') and from the metabolic to the regulatory domain ('upwards'), respectively. To enumerate the portion of peripheral processes forming complete paths across the interface, direct downwards and upwards links and the new topological concept of domain-traversing paths (or, in short, traversing paths) have to be considered. A traversing path connects the regulatory and the metabolic domain via the protein interface, whereby only the start and end vertices are not affiliated to the bridging domain and the path direction is considered carefully (see Methods).

Examination of the downwards-upwards subdivision, especially of the traversing paths, reveals a considerable (though biologically expected) asymmetry of the interface (Figure 4): the downwards interface is much more pronounced, comprising predominantly the transcription of enzymes, i.e., metabolic genes, and the formation of enzymatic protein complexes. In contrast, the upwards interface is comparably sparse, with roughly half the direct connections (102/283) and a quarter of the traversing paths (4,070/18,904) of the downwards interface. These few upwards processes mainly include the formation of metabolic regulators, especially transcription factors, and the corresponding regulatory events.
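To make the notion of a traversing path concrete, the following Python sketch enumerates such paths in a small directed toy graph. It is one possible reading of the definition given above, with explicitly stated assumptions that are not taken from the Methods: paths are simple (no vertex is visited twice), every intermediate vertex lies in the protein interface, at least one interface vertex is required (so direct links are not counted), and the function name and all vertex names are hypothetical.

```python
def traversing_paths(successors, domain_of, start_domain, end_domain, interface="PI"):
    """Enumerate simple directed paths start_domain -> interface ... -> end_domain."""
    paths = []

    def extend(path):
        for nxt in successors.get(path[-1], ()):
            if nxt in path:
                continue  # assumption: paths are simple
            domain = domain_of[nxt]
            if domain == end_domain and len(path) > 1:
                paths.append(path + [nxt])  # reached the other outer domain
            elif domain == interface:
                extend(path + [nxt])        # stay inside the interface

    for vertex, domain in domain_of.items():
        if domain == start_domain:
            extend([vertex])
    return paths


# Toy graph with invented names: genes (RD), proteins (PI), reactions/compounds (MD).
domain_of = {"g1": "RD", "g2": "RD", "p1": "PI", "p2": "PI", "p3": "PI",
             "r1": "MD", "r2": "MD", "c1": "MD"}
successors = {
    "g1": ["p1", "r2"],  # g1 -> r2 is a direct link, not a traversing path
    "p1": ["p2", "r1"],
    "p2": ["r1"],
    "c1": ["p3"],
    "p3": ["g2"],
}

downwards = traversing_paths(successors, domain_of, "RD", "MD")
upwards = traversing_paths(successors, domain_of, "MD", "RD")
print(downwards)
print(upwards)
```

Exhaustive enumeration of simple paths grows exponentially with the size of the interface in the worst case, so for a network of the size considered here additional bookkeeping (for instance a length cut-off) would be a natural extension of this sketch.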
[Figure 4 diagram labels: gene regulatory domain, protein interface, metabolic domain; vertices.]

Figure 4: Schematic overview of the components and connections of the integrative E. coli network, especially those involved in the protein interface. Information about edges is presented in gray and information about traversing paths in dark goldenrod, while the numbers of vertices are shown in dark blue and, in addition, those related to traversing paths in dark brown. Solid lines denote direct link connections, dashed lines traversing-path connections.

In addition to confirming the interface asymmetry, the traversing paths reveal the bottleneck characteristic of the interface. First indications of this special property are (1) the low number of involved vertices and (2) the distribution of traversing path lengths. For both downwards and upwards traversing paths, the number of distinct interface vertices in the traversing paths is low compared to the total number, i.e., 1,393 and 449 of 2,286 interface vertices in total, respectively (Figure 4). Moreover, for both directions a remarkable clustering of paths emerges, at lengths 8–10 for the downwards and at lengths four, six, and 9–11 for the upwards traversing paths (Figure 5). This is in contrast to the smooth distribution one would expect in random graphs. Enumerating the involved vertices, it is striking that more than 44 % of the traversing paths contain one of five three-vertex combinations, respectively. The respective combinations of downwards and upwards traversing paths pertain to three functional systems: the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), the ribonucleotide reducing system (RNR system), and the nitrogen regulation two-component signal transduction system (NtrBC system) (Table S5).
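The two bottleneck indicators discussed above (few distinct interface vertices, and a large share of paths running through a handful of vertex combinations) are simple path statistics. The Python sketch below computes them for a hand-made list of paths. The helper names and the toy paths are hypothetical, and measuring path length in edges is an assumption of this sketch rather than a convention taken from the Methods.

```python
from collections import Counter


def length_distribution(paths):
    """Histogram of path lengths, counted in edges (assumption of this sketch)."""
    return Counter(len(path) - 1 for path in paths)


def distinct_interface_vertices(paths):
    """Interface vertices are all vertices except the first and last of a path."""
    return {vertex for path in paths for vertex in path[1:-1]}


def share_containing(paths, combinations):
    """Fraction of paths that contain all vertices of at least one combination."""
    hits = sum(1 for path in paths
               if any(set(combo) <= set(path) for combo in combinations))
    return hits / len(paths)


# Invented traversing paths: start and end vertices lie outside the interface.
paths = [
    ["g1", "p1", "p2", "p3", "r1"],
    ["g2", "p1", "p2", "p3", "r2"],
    ["g1", "p4", "r1"],
    ["g3", "p4", "p5", "r2"],
]

lengths = length_distribution(paths)
interface_vertices = distinct_interface_vertices(paths)
share = share_containing(paths, [("p1", "p2", "p3")])
print(lengths, len(interface_vertices), share)
```

In the toy data half of the paths run through the single combination (p1, p2, p3); in the reconstructed network the analogous statistic is the share of more than 44 % attributed to five three-vertex combinations.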
Figure 5: Distribution of the path lengths for the downwards (RD → MD) and upwards (MD → RD) traversing paths, respectively (dark blue). The golden bars represent the fraction of downwards and upwards traversing paths comprising the vertices associated with the PTS and RNR system, and with the NtrBC system, respectively. [Axes: length of traversing paths (3 to 15) versus counts.]

All three biological subsystems, the PTS [2, 29], the RNR system [30–33] as well as the NtrBC system [34–36], are well studied with respect to their functionality and their cellular context. A schematic representation of the three subsystems is provided in Figure 6. The PTS is an enzymatically active protein complex involved in the transport and phosphorylation of several sugars, so-called PTS sugars [2]. In the integrative E. coli network more than 18 different sugars serve as potential substrates, which are imported from the periplasm to the cytosol at the same time (Table S6). The substrate variety, together with the manifold usage of the associatively produced pyruvate, points to the key role of the PTS in E. coli's metabolism and, moreover, suggests that the PTS acts as a bottleneck in the interface.

The RNR system, the second system dominating the downwards traversing paths, provides the major DNA building blocks [32]. Each of the different core enzyme classes, ribonucleotide reductase class I–III, is capable of catalyzing the reduction of all four nucleotides. Its transcriptional and metabolic regulation ensures the balanced supply and thus avoids an increase of mutation rates and the loss of DNA replication fidelity [37]. The central cellular role, which is reflected in its regulatory embedding, together with its alternate substrates, points to its special position in the interface.

The NtrBC system is a two-component signal transduction system initiating the nitrogen starvation response regulation. More precisely, depending on the nitrogen availability, NtrB can autophosphorylate, and the transfer of the NtrB phosphate group activates the global transduction regulator, NtrC. In E. coli, more than 40 genes known to be activated are involved in the nitrogen response, such as active transport and mobilization of nitrogen in terms of N-containing compounds (for the integrative E. coli network see Table S7). The extensive regulatory function and the linkage to metabolism due to the allocation of ATP for NtrB autophosphorylation indicate that the NtrBC system also acts as a bottleneck in the interface, in the opposite direction to the PTS and the RNR system.

The three central traversing-path systems and their biological relevance suggest that a topologically prominent position can be indicative of a biologically important functional entity. To corroborate the general validity of this indication, in the following section different topological properties are analyzed and the prominent elements are further characterized from a functional perspective.
[Figure 6 panel labels: A, Glc (periplasm), G6P, HPr/HPr-P, EI/EI-P, PEP, Pyr; periplasm, cytosol. B, NADPH/NADP, reduced/oxidized TRX, NDP/dNDP, RNR. C, ATP/ADP, NtrB/NtrB-P, NtrC/NtrC-P, H2O, Pi, transcriptional activation.]

Figure 6: Classical representation of the three major interface systems of the integrative E. coli network, the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS, A), the ribonucleotide reducing system (RNR system, B) and the nitrogen regulation two-component signal transduction system (NtrBC system, C). The edges represent biochemical reactions and the vertices denote the involved compounds and proteins. The reactions and proteins highlighted in dark blue are the most abundant vertices, determining nearly half of the traversing paths (Table S5).

Cross-systemic key elements of E. coli
The integration of metabolic and regulatory events allows us to determine the key elements of
E. coli, especially those beyond the individual processes. In particular, the functional three-domain partition makes it possible to recover network components (in terms of individual vertices) of evident biological relevance, e.g., by means of simple centrality measures. In the following, two different aspects of centrality have been examined [38]: degree centrality, depicting the direct linkage of a vertex, and betweenness centrality, which can be thought of as the participation of a vertex in the network flow [39].
Starting with the prominent local vertex structure, the so-called hubs (here, vertices with a total degree larger than 50), it is noticeable that they are primarily compounds and proteins, in particular protein complexes, and appear in all three domains (see Table S8, columns 3–5). In the metabolic domain, hubs include trivial compounds such as H+ and H2O and so-called currency metabolites, e.g., ATP, NAD(P)H and coenzyme A, while hubs of regulatory processes are global regulators, which characteristically exhibit a remarkably strong asymmetry of in-degree and out-degree. In particular, well-known transcription factors top this list, such as FNR (fumarate and nitrate reduction) [40], Fis (factor for inversion stimulation) and H-NS (histone-like nucleoid structuring protein) [41]. As stated above, hubs predominantly occur in the metabolic and gene regulatory domains, while only a few are affiliated with the protein interface. However, cross-systemic elements were not expected to be identifiable solely on the basis of their degree.
To assess cross-systemic key elements, an extended approach of degree centrality has been used that additionally accounts for the domain boundaries. The intra-domain degree fraction ξ, also termed embeddedness [42], denotes the ratio of the internal degree of a vertex, within a domain, and the total degree in the network.
This measure very clearly distinguishes between, on the one hand, metabolic and regulatory hubs, which show intra-domain degree fractions ξ > 0.87 (except one single compound with ξ = …), and, on the other hand, hubs of the protein interface with ξ ≤ 0.06 (see Table S8, last column). Thus, while metabolic and regulatory hubs are embedded in their respective domains, hubs in the protein interface are mainly connected to vertices in the neighboring domains. In total, seven hubs show a significantly low intra-domain degree fraction, pointing to their prevalent interactions with the other two domains (Figure S1 and Table S11, column 5). Six of them are affiliated with the protein interface, exhibiting numerous interactions with the regulatory domain. Their linkages to the metabolic domain become visible when considering their composition, in the case of the protein complexes, and their modes of action, respectively. The former involve the four protein-compound complexes Crp-cAMP (cyclic-AMP receptor protein binding cyclic-AMP) [29, 43, 44], DksA-ppGpp (dnaK suppressor binding guanosine 3’-diphosphate 5’-diphosphate) [45–47], NsrR-NO (nitrite-sensitive repressor binding nitric oxide) [48–50] and Lrp-Leu (leucine-responsive regulatory protein binding leucine) [51–53], whose naming schemes already indicate the metabolic link. The latter, namely the protein complex Cra (catabolite repressor activator) and the protein monomer Lrp (leucine-responsive regulatory protein), form complexes in the presence of appropriate metabolites, i.e., fructose 1,6-bisphosphate/fructose 1-phosphate and leucine, which affects their regulatory effect. The remaining hub is the metabolic-domain vertex representing guanosine 5’-diphosphate 3’-diphosphate (ppGpp). Besides its special domain affiliation among the low intra-domain degree hubs, ppGpp acts as an important regulator of both metabolism and transcriptional processes. More precisely, it regulates several enzyme activities as well as numerous transcription initiations by allosterically binding to RNA polymerase.
So far, we demonstrated that the protein interface of the
E. coli network reconstruction acts as a bridging module between the regulatory and metabolic domains, enabling their interaction and communication. Therefore, we expect the betweenness centrality to directly highlight vertices from the interface. Indeed, ten of the top 25 ranked vertices (still including currency metabolites) are from the interface (see Table S9, column 5), while overall the interface only accounts for about 18 % of the vertices of the network. In particular, the already mentioned protein-compound complexes Crp-cAMP and DksA-ppGpp are among these components. In general, currency metabolites and trivial compounds (see above) as well as global regulators are among the central components with respect to betweenness. Overall, the agreement between the most central components regarding degree and betweenness amounts to approximately 50 %. Apart from that, biochemical reactions building up and/or breaking down these metabolites and proteins, as well as the other involved reactants, belong to the most betweenness-central components. Associating components with functional systems allows us to assess the systemic feature and, by considering the corresponding network affiliation, to identify the candidates for cross-systemic key elements. In this manner, the network analysis allows us to detect the central role of Crp-cAMP, Lrp-Leu and ppGpp on purely topological grounds, as each component is the focus of such a functional system with high betweenness. Also among the top-ranked vertices with respect to betweenness centrality are five further cross-systemic components which are assigned to the protein interface, namely phosphorylated PhoB (PhoB-P), Fur-Fe2+, and three outer membrane proteins (Omp), OmpC, OmpE and OmpF (Table S9). The former two components are transcription factors and therefore act in the gene regulatory domain, while at the same time they are protein complexes binding a metabolic small molecule, which constitutes the connection to the metabolic processes. The latter three, the outer membrane porins, form hydrophilic channels, enabling non-specific diffusion of small molecules across the outer membrane [54–56]. In this role, these proteins represent the most obvious connections between the gene regulatory and metabolic domains: their encoding genes are highly regulated, while the porins enable numerous metabolic transport reactions.
By focusing on the connecting domain of gene regulation and metabolism, the two centrality measures reinforce the key role of further cross-systemic elements. Considering the protein interface-induced subgraph, both centralities point out the vertices that top the list of the above-discussed downwards traversing paths (Table S10). In more detail, both major systems contributing to the downwards traversing paths, namely the PTS and the RNR system, are each represented by three vertices (Figure 6, panels A and B). The intra-domain degree fraction, which puts the focus on protein interface vertices as described above, additionally highlights a representative of the upwards traversing path system NtrBC (Figure 6, panel C) as the second non-hub (Table S11). This corroborates the predictions from the traversing paths and thus shows that our new topological measure reveals cross-systemic elements which otherwise only stand out under detailed scrutiny of a large amount of biological information.

Discussion
Here, we present an integrative network covering metabolic processes as well as regulatory events of
E. coli but, especially, the interaction between both systems. With more than 10,000 vertices, it comprises around two thirds of the metabolic processes currently integrated in metabolic reconstructions [6] and, concerning regulatory events, the presented network incorporates more than 95 % of the established transcription-related processes [10]. Both metabolic and gene regulatory processes are integrated on a genome scale, rather than one of the two providing the network basis which is then expanded by closely related processes in the other subsystem, as has been done, for example, in conventional metabolic reconstructions which involve the encoding genes only indirectly. Hitherto, integration of transcriptomics data could only be achieved using the so-called gene-protein-reaction (GPR) associations. On the one hand, this procedure limits the applicable data set to metabolic genes and, on the other hand, it acts on the assumption that all expressed enzymes are present in their active form. Starting from the integrative
E. coli network, integrating transcriptomics data is much more straightforward and, more importantly, the complete data set can be applied. In this way, multi-domain variants of the frequently employed network-based interpretation of ’omics’ data [57–61] can be formulated, and indirect and regulatory impacts on metabolism can be examined.
The novelty of the reconstruction, the connection of metabolism and gene regulation, allows us not only to investigate the separate systems but also to assess their interactions. The most relevant connecting links are proteins: on the one hand, those acting as enzymes and, on the other hand, metabolic transcription factors. The functional classification, together with the topological analysis, suggests a network division into three domains: metabolic domain, protein interface and regulatory domain. This partition was corroborated by different connectivity measures and reflects a biologically reliable categorization into two delimited modules linked by a bridging module.
The principal structural feature of the network model, the three-domain organization, is reminiscent of the ’bow-tie’ architectures frequently discussed in the theory of complex systems, where an input and an output layer are connected via a (typically much smaller) intermediate network [62–64]. Such a bow-tie structure (or, rather, the presence of several nested bow-tie architectures) has, for example, been discussed for metabolic networks [65], where the diversity of inputs (nutrients) and outputs (biomass components) is much larger than the intermediate processing layer. It has been hypothesized that such a bow-tie organization is a prerequisite for the robust operation of a complex system [62, 63].
Here we observe a bow-tie organization in a system consisting of a rich ’material flow’ system (metabolism) and a similarly rich ’control’ system (gene regulation) connected via a protein interface.
As our topological assessment shows, the bridging character of the protein interface entails a bottleneck functionality. The analysis of the new topological measure, termed traversing paths, highlighted three major biological systems represented by 12 vertices forming more than 40 % of these paths (comprising in total 1465 distinct vertices). These traversing path systems, namely the phosphotransferase system (PTS), the ribonucleotide reducing (RNR) system and the nitrogen regulation two-component signal transduction (NtrBC) system, are well-investigated ones with key biological relevance for E. coli’s metabolism as well as its gene regulation, suggesting that a topologically prominent position points to an important biologically functional entity.
Further detection of cross-systemic key elements in the network was accomplished using additional topological measures. In particular, two centrality measures were studied to account for different aspects of importance in terms of direct linkage and participation in network flow. Apart from conspicuous components, such as trivial compounds, currency metabolites and global regulators, degree centrality revealed a group of seven hubs characterized by a significantly low intra-domain degree fraction, which numerically reflects the bridging feature of the protein interface. As expected, these components are located in the interface except for one, the vertex representing guanosine 5’-diphosphate 3’-diphosphate (ppGpp), which is affiliated with the metabolic domain. On the other hand, the inspection of betweenness centrality highlights biological systems rather than single components and, as such, points to key components detected before in their functional context. Besides trivial compounds and currency metabolites, this includes Crp-cAMP (cyclic-AMP receptor protein binding cyclic-AMP), Lrp-Leu (leucine-responsive regulatory protein binding leucine) and ppGpp, which stand out due to their intra-domain degree fraction, as well as seven further components already revealed as hubs.
Intriguingly, the interface-specific key elements of the network could be corroborated by exactly these two centrality measures. The assessment of the interface-induced subgraph using both centralities emphasizes altogether eight vertices of the downwards traversing paths discussed above, contributing to the two major systems PTS and RNR. Taking into account the intra-domain degree fraction points out a representative of the upwards traversing path system NtrBC.
In conclusion, the importance of vertices revealed by the traversing paths presented here could be reinforced by well-established topological measures, showing the predictive power of the new measure.
Finally, the key elements of the integrative E. coli network according to both centralities illustrate the importance of the different domains and their combined consideration (Table 2). Unsurprisingly, the majority of key elements are affiliated with the metabolic domain and represent trivial compounds and currency metabolites, e.g., H+, H2O, ATP and NAD(P)+. Moreover, predominantly cross-systemic components top this combined list of central elements. First of all, the vertices also emphasized by their low intra-domain degree fraction attract attention, namely Crp-cAMP, Lrp-Leu and ppGpp. These vertices demonstrate the value of the integrative approach: only when embedded in the domain context does their importance emerge. In the case of the former two components, the composition additionally unveils the cross-systemic role, i.e., a transcription factor protein binding a metabolic small molecule that affects its regulatory activity. Likewise, the two regulatory key elements, Fur-Fe2+ and PhoB-P, exhibit this conspicuous linkage to the metabolic domain, illustrating their cross-systemic property. In other words, they belong to the so-called metabolic transcription factors and, thus, are related to the upwards interface. The opposite is the case for the three metabolic Omp (outer membrane porin) transporters that are among the key elements. While their metabolic linkage is more than obvious, the relation to the regulatory domain appears when the encoding genes are examined. These are highly regulated, amongst others by the global regulators Crp-cAMP, Fur-Fe2+, Lrp-Leu and PhoB-P. In this manner, the Omps are classical representatives of proteins related to the downwards interface, even though they are not affiliated with it. The remaining key elements are three metabolic small molecules which are, counter-intuitively, also related to the interface and the cross-systemic elements detected by the traversing paths.
While in the case of pyruvate the connection to the PTS is apparent at first glance (Figure 6, panel A), the link between glutamate and ammonium and the NtrBC system is less perceptible. The actual connecting element is glutamine, which is the ligase product of glutamate and ammonium. It activates the (de)uridylylation of the regulatory protein PII which, in turn, inhibits NtrB autophosphorylation [34, 66]. Altogether, the links to the three major traversing path systems are certainly not the only important processes these elements are involved in, but they reinforce their biologically central roles. Remarkably, these connecting elements show up when considering the entire network, while the interface-specific analysis is needed to acknowledge their importance.
Beyond the detection of key elements, the integrative approach will make it possible to examine the interplay and distribution of short-term and long-term regulation in E. coli’s metabolism. While metabolic regulation of, for instance, enzyme activities occurs on a short time-scale, regulation of gene expression is a long-term control process. Both types of regulation have been incorporated in the network, even though only on a qualitative level, i.e., as activator or inhibitor. In this way, the different effective ranges in metabolism can be assessed and, thus, its coverage by one or both regulation types, where central metabolism is said to be highly controlled.

Table 2: Key elements of the integrative E. coli network with respect to degree (DC) and betweenness centrality (BC) rank as well as their functional characteristic and cross-systemic property, respectively. Squares denote trivial compounds (◻) and currency metabolites (∎), while the colored arrows depict the cross-systemic contribution: ▼ downwards interface-related, ▲ upwards interface-related. The orange arrows emphasize the cross-systemic components with significantly low intra-domain degree fraction and the golden ones point out elements indirectly linking to one of the major traversing path systems.

Vertex name                                    DC   BC   Property
Proton                                          1    1   ◻
H2O                                             2    2   ◻
ATP                                             5    3   ∎
Phosphate (P)                                   4    4   ∎
Proton (periplasmic)                            6    5   ◻
Crp-cAMP, transcriptional dual regulator        3    9   ▲
ADP                                            10    6   ∎
outer membrane porin F                          7   24   ▼
outer membrane porin C                          7   29   ▼
H2O (periplasmic)                              12   31   ◻
outer membrane porin E                          9   34   ▼
Fur-Fe2+, transcriptional dual regulator       25   21   ▲
Pyrophosphate                                  18   30   ∎
NAD+                                           13   35   ∎
Phosphate (periplasmic)                        19   36   ∎
PhoB-P, transcriptional dual regulator         45   12   ▲
Lrp-Leucine, transcriptional dual regulator    43   17   ▲
Guanosine 5’-diphosphate 3’-diphosphate        35   26   ▲
NADP+                                          21   41   ∎
Glutamate                                      24   40   ▲
Pyruvate                                       30   39   ▼
Coenzyme A                                     23   48   ∎
CO2                                            32   41   ∎
NH4+                                           31   46   ▲

From the perspective of recent advances in network theory
With their balance of structural detail and functional simplicity, network models are capable of revealing organizational principles which are hard to recognize on a smaller systemic scale (e.g., by analyzing individual pathways) or in functionally richer system representations (e.g., in dynamical models). One purpose of the network provided here is to enable work at the interface of statistical physics and systems biology, where the rich toolbox of complex network analysis is employed to identify functionally relevant non-random features of such biological networks.
The recent work of Jensen et al. [67], for example, showed that network structure can reveal whether an enzyme is more susceptible to genetic knockdown or to pharmacologic inhibition. While in the present study the network measures do not distinguish between different kinds of vertices or links, the rich biological metadata concerning the different biological roles of the components could be translated into distinct vertex and edge classes. In our own investigation [68] we used this fact to study, in a further example of such an interdisciplinary effort, the balance of robustness and sensitivity in the interdependent network of gene regulation and metabolism, based on the reconstructed network provided here.
In general, we expect that our network reconstruction can serve as a relevant data resource for the application of methods from the analysis of multiplex [69] and other multilayer networks [19, 70]. Recently, there has been a growing interest in the properties of these systems, especially in the presence of explicit interdependencies between vertices [69, 71]. In contrast to monoplex networks, interdependent networks can show a qualitatively different robustness against failures, i.e., cascading failures leading to a sudden system breakdown at a critical initial attack size [72, 73]. The case of different vertex types (as opposed to different edge types) has been considered, for example, in the context of secure communication in a network where eavesdroppers control sets of vertices [74].
On a general level, analyzing statistics of paths with respect to the network’s large-scale structure, like the domain-traversing paths used here, might prove useful for the evaluation of other networks that show (possibly more than one) interface-like features.
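As a minimal sketch of how such path statistics could be gathered, the following Python snippet enumerates domain-traversing paths on a toy graph. The function name, the toy vertices and the requirement of at least one interface vertex per path are assumptions made for illustration only; the domain labels RD, PI and MD follow the three-domain partition, and the sketch is not the procedure used for the reconstruction.

```python
from collections import defaultdict


def traversing_paths(edges, domain, source="RD", target="MD", interface="PI"):
    """Enumerate directed paths that start in `source`, run through distinct
    interface vertices only, and end at the first vertex in `target`."""
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
    paths = []

    def extend(path):
        for nxt in successors[path[-1]]:
            if domain[nxt] == target:
                paths.append(path + [nxt])          # path leaves the interface
            elif domain[nxt] == interface and nxt not in path:
                extend(path + [nxt])                # stay inside the interface

    # Start from every edge that crosses from the source domain into the interface.
    for a, b in edges:
        if domain[a] == source and domain[b] == interface:
            extend([a, b])
    return paths


# Hypothetical toy graph: one gene, two interface proteins, one reaction, one compound.
domain = {"g1": "RD", "p1": "PI", "p2": "PI", "r1": "MD", "m1": "MD"}
edges = [("g1", "p1"), ("p1", "p2"), ("p2", "r1"), ("p1", "r1"), ("r1", "m1")]

downwards = traversing_paths(edges, domain, source="RD", target="MD")
upwards = traversing_paths(edges, domain, source="MD", target="RD")
```

Counting how often each vertex occurs in `downwards` or `upwards` would then give the kind of per-vertex abundance that is used to rank interface systems.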
Concluding remarks
In summary, the analysis of network topology makes it possible to determine key system components in the integrative E. coli network. In line with expectations, trivial compounds as well as currency metabolites showed up regardless of the measure that has been applied. In addition, further obvious components including several global regulators were identified. More striking is the detection of components and systems which only emerge when specifically analyzing the interface. These hidden elements are associated with two of the biologically well-investigated functional subsystems, PTS and NtrBC. Both well-established and newly designed measures of the interface point out the same subsystems, and even the analysis of the entire network discloses components indirectly related to these hidden subsystems.
Apart from trivial and currency metabolites, every detected key element of the entire network contributes to some extent to the downwards and/or upwards interface. This unexpected cross-systemic property is reflected in the complex composition, the intra-domain degree fraction, the proximity to key systems, and/or the interplay with regulatory and metabolic processes. The biological relevance of these components supports their detection and reinforces the predictive power of the novel traversing path measure. In general, we believe that the presented integrative E. coli network allows further investigations of the interplay of metabolism and gene regulation which will provide insights into cellular, system-wide responses.
Methods
The interconnected E. coli network is based on the EcoCyc database [21], release 20.0, which includes verified information on metabolic and regulatory processes (corresponds to RegulonDB 8.6 [10]) for E. coli K-12 substr. MG1655. The network is represented as a graph comprising five different types of vertices: encoding genes, protein monomers and complexes (including enzymes), small compounds, and (bio)chemical reactions (Table S1), as well as three types of edges: encoding and catalyzing associations, reaction connections to educts and products, and regulatory links to sources and targets (Table S2).
Extraction of database information
First, relevant information of the database has been extracted and arranged (Algorithm 1). For each regulatory process, the respective source and target were specified and converted to match one of the vertex types (’regulation.dat’, file name of the EcoCyc archive). To this end, the transcription units were separated into promoter, genes and terminator (if applicable), and the regulatory processes were replicated for each gene of the unit. Moreover, each regulating RNA has been translated into its encoding gene to meet the vertex types. In the case of the metabolic processes, the reaction educts and products as well as the catalyzing enzymes have been assembled and converted to match one of the vertex groups, the respective educt and product stoichiometries have been assigned, and the reaction compartmentation and reversibility have been assessed (’reactions.dat’). The periplasmic space, the inner membrane, and the cytosol have been taken into account as cell compartments, and reversible reactions have been split up.
Second, vertex candidates have been validated (’reactions.dat’, ’compounds.dat’, ’proteins.dat’, ’genes.dat’, ’rnas.dat’) and divided into reaction, compound, protein monomer, protein-protein complex, protein-compound complex, protein-RNA complex, and gene. In doing so, generic terms such as DIPEPTIDES have been substituted (’classes.dat’) and double annotations, e.g., CPD-15709 and FRUCTOSE-6P, have been decoded. Thereupon, the compositions and the encoding genes of the assembled proteins have been gathered and matched to the vertex groups, and the respective logical operation and stoichiometry have been annotated (’protcplxs.col’). Based on the validated vertex lists, the regulatory and metabolic processes have been updated, whereby each process with at least one unidentified vertex was removed, resulting in the final edge lists.
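The handling of reaction directions described above (and spelled out in Algorithm 1) can be sketched in a few lines of Python. The direction labels ’reversible’ and ’right-to-left’ and the suffixes ’_f’ and ’_r’ are taken from Algorithm 1; the function name, the tuple layout and the toy reaction are hypothetical and only serve the illustration.

```python
def split_by_direction(rxn_id, direction, enzymes, left, right):
    """Return one (id, enzymes, educts, products) tuple per directed reaction.

    Reversible reactions yield a forward and a reverse copy; 'right-to-left'
    reactions are stored with educts and products swapped.
    """
    if direction == "reversible":
        return [
            (rxn_id + "_f", enzymes, left, right),
            (rxn_id + "_r", enzymes, right, left),
        ]
    if direction == "right-to-left":
        return [(rxn_id, enzymes, right, left)]
    # Any other label is treated as left-to-right (assumption of this sketch).
    return [(rxn_id, enzymes, left, right)]


# Hypothetical toy entry, not taken from EcoCyc.
processes = split_by_direction("RXN-X", "reversible", ["enzA"], ["a"], ["b"])
```

Each returned tuple then corresponds to one directed reaction vertex with its own educt and product edges.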
Network implementation
With the validated vertex and edge lists, the graph has been assembled and its largest weakly connected component has been extracted. The three-domain partition MD – PI – RD (Tables 3 and S1) as well as the two-domain partition are implemented as vertex properties affiliation and metabolic. Algorithms 2A and 2B show how the domain affiliation of a vertex is determined by its type and its neighbors’ types and affiliations.
Moreover, the mapping to the E. coli model of Covert et al. [11] has been annotated, which integrates the metabolic network iJR904 published by Reed et al. [5] and the transcriptional regulatory events related to the encoding genes of the catalyzing enzymes. To this end, genes, proteins, metabolites as well as biochemical reactions of the metabolic model have been mapped to the EcoCyc database (release 20.0), in a first step automatically based on their identifier, and the resulting dictionaries have been manually curated. As the EcoCyc database does not account for compartmentation of compounds and reactions or for exchange reactions, unique metabolites and internal reactions have been considered, resulting in a coverage of more than 93 %. By additionally disregarding internal transport reactions, a coverage of 96.5 % can be achieved (Table 3).
Integrating the manually curated Covert dictionaries, each vertex has been assigned (1) a unique identifier, according to the EcoCyc identifier but also indicating the compartment, (2) a unique type reference, (3) a unique assignment of the model components from Covert et al. [11], if applicable, and (4) the affiliations of the two- and three-domain partition. Furthermore, vertices of types gene and reaction have (5a) a name assigned, the Blattner ID and the EC number, if applicable. The remaining vertices additionally have (5b) a compartment assigned, where cytosol (c), extracellular space (e), periplasmic space (p), inner membrane (i), outer membrane (o) and membrane in general (m) were taken into account. Similarly, each edge of the network has the attribute (1) type, specifying the connected vertices, and the corresponding (2) stoichiometry, where zero is assigned if not applicable or ambiguous. For edges depicting regulatory processes, the stoichiometry actually denotes the mode of regulation, namely activation (+, 1), inhibition (−, −1) or combined (0). These edges additionally have assigned (3) an identifier, according to the EcoCyc identifier, and (4) a name, specifying the regulation type. All other edge types can be classified as representing either conjunct or disjunct links, in the sense that all or only one incoming link is required for functionality (Table S2).
The fully annotated integrative reconstruction of E. coli’s metabolic and regulatory processes is provided as a graph representation in Supplementary File 1.
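The extraction of the largest weakly connected component mentioned at the beginning of this section can be illustrated with a short, library-free Python sketch. All names and the toy graph are hypothetical; in particular, this is not the code used to build Supplementary File 1, and ties between equally large components are resolved by vertex order, which is an assumption of the sketch.

```python
from collections import defaultdict, deque


def largest_weakly_connected_component(vertices, edges):
    """Return the vertex set of the largest weakly connected component
    and the directed edges that lie completely inside it."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Edge direction is ignored when testing weak connectivity.
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, best = set(), set()
    for start in vertices:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                       # breadth-first search
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component

    kept_edges = [(a, b) for a, b in edges if a in best and b in best]
    return best, kept_edges


# Hypothetical toy graph with two components.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
component, kept = largest_weakly_connected_component(vertices, edges)
```

Vertex and edge attributes such as type, compartment or regulation mode could be carried along in dictionaries keyed by the same vertex identifiers and edge tuples.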
Graph properties concerning intra- and inter-module connectivity
The following measures have been used in the assessment of the graph partitioning scheme.
Inter-module edge fraction c: Given the set of vertices with the domain label D, edges connecting these vertices to a vertex of a different label are considered external, while edges between vertices of the same label are internal. We call

$$c_D = \frac{\text{external edges of } D}{\text{external} + \text{internal edges of } D}$$

the inter-module edge fraction of domain D.

Network modularity M denotes the degree to which a given partition divides the network into highly connected groups, modules, which are comparably sparsely connected among each other. Therefore, the intra-module links are counted against the total degree of the module vertices (Equation 1),

$$M = \sum_{j=1}^{N_M} \left[ \frac{L(v_{M_j}, w_{M_j})}{L_G} - \left( \frac{\deg(v_{M_j})}{L_G} \right)^{2} \right] \tag{1}$$

with $N_M$ the number of modules, $L_G$ the number of links of the graph $G$, and

$$L(v_M, w_N) = \sum_{v \in M} \sum_{w \in N} \mathrm{link}(v, w), \qquad \deg(v_M) = \sum_{v \in M} \deg(v).$$

Domain-traversing paths
A traversing path connects the regulatory and the metabolic domains via the protein interface; specifically, a traversing path of length k is of the form

$$[(u, v_1), (v_1, v_2), \ldots, (v_{k-1}, w)] \tag{2}$$

where the vertices u and w are from the regulatory and the metabolic domain (and vice versa) and the vertices $v_i$ are distinct and part of the protein interface. Starting from the set of edges directly at the intersection of two domains, the vertex successors within the interface domain as well as the final, first successor in the third domain have been determined iteratively (Algorithm 3).

Vertex centrality
The key elements of the integrative E. coli network have been determined based on two graph properties.
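As a rough illustration of these two properties, together with the intra-domain degree fraction ξ that is used alongside them, the following Python sketch computes all three quantities on a toy graph. It assumes a simple directed graph without parallel edges; the betweenness part follows the accumulation scheme of Brandes for unweighted graphs and is unnormalized. Function names and toy data are hypothetical and do not reproduce the implementation behind the reported results.

```python
from collections import defaultdict, deque


def degree_and_embeddedness(edges, domain):
    """Total degree k_v (in + out) and intra-domain degree fraction k_int / k."""
    k, k_int = defaultdict(int), defaultdict(int)
    for a, b in edges:
        k[a] += 1
        k[b] += 1
        if domain[a] == domain[b]:       # edge stays inside one domain
            k_int[a] += 1
            k_int[b] += 1
    return dict(k), {v: k_int[v] / k[v] for v in k}


def betweenness(vertices, edges):
    """Unnormalized shortest-path betweenness for a directed, unweighted graph."""
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
    bc = {v: 0.0 for v in vertices}
    for s in vertices:
        sigma = {v: 0 for v in vertices}  # number of shortest paths from s
        sigma[s] = 1
        dist, preds, order = {s: 0}, defaultdict(list), []
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in vertices}
        for w in reversed(order):         # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical toy chain: gene -> protein -> reaction -> compound.
domain = {"g1": "RD", "p1": "PI", "r1": "MD", "m1": "MD"}
edges = [("g1", "p1"), ("p1", "r1"), ("r1", "m1")]
degree, xi = degree_and_embeddedness(edges, domain)
bc = betweenness(list(domain), edges)
hubs = [v for v, k in degree.items() if k > 50]   # hub threshold used in the text
```

In the toy chain the interface protein has ξ = 0 because both of its edges cross a domain boundary, which mirrors the low intra-domain degree fractions reported for interface hubs.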
Degree Centrality DC is a local centrality measure and denotes the total number of in- and out-going edges of a vertex (Equation 3),

$$DC(v) = k_v = k_v^{\mathrm{in}} + k_v^{\mathrm{out}}. \tag{3}$$

Here, the vertices with a total degree greater than 50 are termed hubs.
By additionally accounting for the domain boundaries, the intra-domain degree fraction ξ (also termed embeddedness [42]) has been defined as the ratio of internal degree, within domain D, and total degree of a vertex (Equation 4),

$$\xi_D(v) = \frac{k_v^{\mathrm{int}}}{k_v} = \frac{1}{k_v} \sum_{w \in D} (A_{vw} + A_{wv}) \tag{4}$$

where A denotes the adjacency matrix of the graph.
Betweenness Centrality BC describes the impact on the flux through the network, under the assumption that the transfer follows the shortest paths. In particular, it quantifies the fraction of shortest paths between all pairs of vertices which involve the designated vertex (Equation 5),

$$BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \tag{5}$$

where $\sigma_{st}$ is the number of all shortest paths between the vertices s and t, while $\sigma_{st}(v)$ yields the number of these paths that run through v [39].

Algorithm 1: Extraction of database information (EcoCyc, release 20.0) on regulatory and metabolic processes.

    regprocs = AssembleRegulatoryProcesses()
        // Extract information on regulatory processes ('regulation.dat')
        REG_type, REG_ID, REG_source, REG_target, REG_mode = ParseRegulation()
        for rT in REG_type do
            for rID in REG_ID[rT] do
                // Convert regulation source and target to match one of the vertex groups
                if REG_source[rT][rID] == RNA then
                    Translate RNA into corresponding genes
                if REG_target[rT][rID] == transcript unit then
                    Split up transcription units into promoters, genes and terminators
                    Translate promoters and terminators into corresponding genes
                s, t = Match2Vertex(REG_source[rT][rID], REG_target[rT][rID])
                regprocs.append(rT, rID, s, t, REG_mode[rT][rID])

    metprocs = AssembleCompartmentedMetabolicProcesses()
        // Extract information on metabolic processes ('reactions.dat')
        Rxn_ID, Rxn_enz, Rxn_dir, Rxn_l, Rxn_r, Rxn_loc = ParseReactions()
        for mID in Rxn_ID do
            // Match catalyzing enzyme, educts and products to one of the vertex groups
            enz, left, right = Match2Vertex(Rxn_enz[mID], Rxn_l[mID], Rxn_r[mID])
            // Annotate reaction compartmentation
            left_comp, right_comp = AnnotateCompartment(Rxn_loc[mID], left, right)
            // Split up reversible reactions and reverse 'right-to-left' reactions
            if Rxn_dir[mID] == 'reversible' then
                metprocs.append(mID+'_f', enz, left_comp, right_comp)
                metprocs.append(mID+'_r', enz, right_comp, left_comp)
            else if Rxn_dir[mID] == 'right-to-left' then
                metprocs.append(mID, enz, right_comp, left_comp)
            else
                metprocs.append(mID, enz, left_comp, right_comp)

    valReg, valMet, valVertices = ValidateProcessesVertices(regprocs, metprocs)
        // Compile vertex candidates from regulation sources and targets as well as metabolic reactions,
        // the corresponding educts and products, and encoding enzymes
        vertCands = AssembleCandidates(regprocs, metprocs)
        // Assign vertex candidates to the seven types: reaction, compound, gene, protein monomer,
        // protein-protein-complex, protein-compound-complex, protein-rna-complex
        rxns, cmps, gns, prts, ppc, pcc, prc = ComposeVertexLists(vertCands)
        // Decode generic terms ('classes.dat') and double annotations
        curReg, curMet, curRxns, curCmps = Curation(regprocs, metprocs, rxns, cmps)
        // Prune vertex lists regarding unmapped candidates
        valVertices = PruneVertices(curRxns, curCmps, gns, prts, ppc, pcc, prc)
        // Prune regulatory and metabolic processes with respect to validated vertex lists
        valReg, valMet = PruneRegMetProcesses(curReg, curMet, valVertices)
        // Export lists of validated vertices and links, i.e., processes

Algorithm 2A: Network affiliation compilation based on vertex type, and the vertex neighbors’ types and affiliations. Affiliation assignment for non-ambiguous reactions, compounds and proteins. Continued in Algorithm 2B.

    aff = AssignNonAmbiguousVertices()
    # Metabolic regulatory processes will be interpreted as metabolic links
    MetRegs = [Regulation-of-Enzyme-Activity, Regulation-of-Reactions]
    for v in Vtypes(network.vertices) == reaction do
        if Vtypes(Educts(v) ∧ Products(v)) == compound then
            aff(v) = metabolic
        else if Vtypes(Educts(v) ∧ Products(v)) == (compound ∨ protein) then
            aff(v) = interface
        else
            aff(v) = tba
    for v in Vtypes(network.vertices) == compound do
        # Compounds which participate in at least one reaction
        if InvolvedRxns(v) > 0 then
            if aff(InvolvedRxns(v)) == 1 then
                aff(v) = aff(InvolvedRxns(v))
            else if metabolic in aff(InvolvedRxns(v)) then
                aff(v) = metabolic
            else if interface in aff(InvolvedRxns(v)) then
                aff(v) = interface
            else
                aff(v) = ambiguous
        # Compounds whose adjacent vertices are either compounds or proteins
        else if Vtypes(Neighbors(v)) == (compound ∨ protein) then
            if regulation in Etypes(OutEdges(v)) then
                if REGtypes(OutEdges(v)) in MetRegs then
                    aff(v) = metabolic
                else
                    aff(v) = ambiguous
            else
                aff(v) = ambiguous
        else
            aff(v) = tba
    for v in Vtypes(network.vertices) == protein do
        # Proteins with enzymatic function
        if enzyme-reaction in Etypes(OutEdges(v)) then
            if Vtypes(OutNeighbors(v)) == reaction then
                if aff(OutNeighbors(v)) == 1 then
                    aff(v) = aff(OutNeighbors(v))
                else if metabolic in aff(OutNeighbors(v)) then
                    aff(v) = metabolic
                else if interface in aff(OutNeighbors(v)) then
                    aff(v) = interface
                else
                    aff(v) = tba
            else if regulation in Etypes(OutEdges(v)) then
                if REGtypes(OutEdges(v)) in MetRegs then
                    aff(v) = metabolic
                else
                    aff(v) = regulatory
            else if aff(InvolvedRxns(OutNeighbors(v))) == 1 then
                aff(v) = aff(InvolvedRxns(OutNeighbors(v)))
            else if metabolic in aff(InvolvedRxns(OutNeighbors(v))) then
                aff(v) = metabolic
            else if interface in aff(InvolvedRxns(OutNeighbors(v))) then
                aff(v) = interface
            else
                aff(v) = tba
        # Proteins involved in metabolic reactions
        else if educt-reaction in Etypes(OutEdges(v)) then
            aff(v) = interface
        # Proteins involved in regulatory processes
        else if regulation in Etypes(OutEdges(v)) then
            if REGtypes(OutEdges(v)) in MetRegs then
                aff(v) = metabolic
            else
                aff(v) = regulatory
        # Proteins involved in protein complex formation
        else if protein-complex in Etypes(OutEdges(v)) then
            if reaction-product in Etypes(InEdges(v)) then
                aff(v) = interface
            else if regulation in Etypes(InEdges(v)) then
                if REGtypes(InEdges(v)) in MetRegs then
                    aff(v) = metabolic
                else
                    aff(v) = regulatory
            else
                aff(v) = interface
        else
            aff(v) = ambiguous

Algorithm 2B: Network affiliation compilation based on the vertex type and on the types and affiliations of the vertex neighbors. Affiliation assignment for non-ambiguous genes and for vertices assigned as ambiguous.

aff = AssignNonAmbiguousVertices() (function resumption)
    for v in Vtypes(network.vertices) == gene do
        if regulation in Etypes(OutEdges(v)) then
            aff(v) = regulatory
        else if regulation in Etypes(InEdges(v)) then
            aff(v) = regulatory
        else if regulatory in aff(OutNeighbors(v)) then
            aff(v) = regulatory
        else if metabolic in aff(OutNeighbors(v)) then
            aff(v) = metabolic
        else
            aff(v) = ambiguous

# Assign affiliation for vertices formerly denoted as ambiguous
aff = AssignAmbiguousVertices(aff)
    additionalRun = True
    while additionalRun do
        additionalRun = False
        for v in aff(network.vertices) == ambiguous do
            if aff(AllNeighbors(v)) == 1 then
                aff(v) = aff(AllNeighbors(v))
                additionalRun = True
            else if (regulatory ∧ metabolic) in aff(AllNeighbors(v)) then
                aff(v) = ambiguous
            else if regulatory in aff(AllNeighbors(v)) then
                aff(v) = regulatory
                additionalRun = True
            else if metabolic in aff(AllNeighbors(v)) then
                aff(v) = metabolic
                additionalRun = True
            else
                aff(v) = ambiguous
    for v in aff(network.vertices) == ambiguous do
        aff(v) = interface

Table 3: Comparison of vertex composition and the coverage to the model from Covert et al. [11] of the integrative
E. coli network (largest WCC), the underlying full graph and the EcoCyc database (release 20.0).

                           Largest WCC          Full graph           Database**
Vertices                   EcoCyc  iMC1010*     EcoCyc  iMC1010*     EcoCyc  iMC1010*
Reaction                     4693    569/767      7251    601/767      2617    717/767
Compound                     2681    557/615      2785    558/615      2678    614/615
Gene                         2545   971/1010      2801   981/1010      4506  1010/1010
Protein monomer              1917    771/817      2012    775/817      5708    815/817
Protein-protein complex       929                  986
Protein-compound complex      100                  103
Protein-RNA complex             3                    4
Total                       12868  2868/3209     15942  2915/3209     15509  3156/3209

* accounted only for enzymatic reactions and unique metabolites (1076 and 762 in total)
** reactions in the EcoCyc database are case insensitive for compartmentation and reversibility

Algorithm 3: Recursive algorithm for the determination of the so-termed domain-traversing paths from the regulatory to the metabolic domain and vice versa that truly pass through the interface domain.

input: graph G = {V, E}, map aff: V → {regulatory, interface, metabolic}

downTP, upTP = TraversingPaths(G, aff)
    downTP, upTP = list(), list()
    for (source, target) in E do
        if aff(target) == interface then
            if aff(source) == regulatory then
                InterfaceSuccessors(G, [source, target], metabolic, downTP)
            else if aff(source) == metabolic then
                InterfaceSuccessors(G, [source, target], regulatory, upTP)

# Recursively determine all successors
InterfaceSuccessors(G, vList, aimAff, sucList)
    for suc in outneighbours(vList[-1]) do
        vL = list(vList)
        if suc not in vL then
            vL.append(suc)
            if aff(suc) == interface then
                InterfaceSuccessors(G, vL, aimAff, sucList)
            else if aff(suc) == aimAff then
                sucList.append(vL)

References

[1] Karl Kochanowski, Uwe Sauer, and Elad Noor. Posttranslational regulation of microbial metabolism. Current Opinion in Microbiology, 27:10–17, Oct 2015. ISSN 1369-5274. doi: 10.1016/j.mib.2015.05.007. URL http://dx.doi.org/10.1016/j.mib.2015.05.007.
[2] Adelfo Escalante, Ania Salinas Cervantes, Guillermo Gosset, and Francisco Bolivar. Current knowledge of the Escherichia coli phosphoenolpyruvate-carbohydrate phosphotransferase system: peculiarities of regulation and impact on growth and product formation. Applied Microbiology and Biotechnology, 94(6):1483–1494, May 2012. ISSN 1432-0614. doi: 10.1007/s00253-012-4101-5. URL http://dx.doi.org/10.1007/s00253-012-4101-5.
[3] E Goncalves, J Bucher, A Ryll, J Niklas, K Mauch, S Klamt, M Rocha, and J Saez-Rodriguez. Bridging the layers: towards integration of signal transduction, regulation and metabolism into mathematical models. Mol Biosyst, 9(7):1576–1583, 2013. doi: 10.1039/C3MB25489E. URL http://pubs.rsc.org/en/content/articlehtml/2013/mb/c3mb25489e.
[4] J S Edwards and B Palsson. The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proceedings of the National Academy of Sciences of the United States of America, 97(10):5528–5533, March 2000. ISSN 1091-6490.
[5] Jennifer L Reed, Thuy D Vo, Christophe H Schilling, and Bernhard Palsson. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology, 4(9):R54, 2003. ISSN 1465-6906. doi: 10.1186/gb-2003-4-9-r54. URL http://dx.doi.org/10.1186/gb-2003-4-9-r54.
[6] Adam M Feist, Christopher S Henry, Jennifer L Reed, Markus Krummenacker, Andrew R Joyce, Peter D Karp, Linda J Broadbelt, Vassily Hatzimanikatis, and Bernhard Palsson. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular Systems Biology, 3:121, Jun 2007. ISSN 1744-4292. doi: 10.1038/msb4100155. URL http://dx.doi.org/10.1038/msb4100155.
[7] Jeffrey D. Orth, Bernhard Palsson, and R. M. T. Fleming. Reconstruction and use of microbial metabolic networks: the core Escherichia coli metabolic model as an educational guide. EcoSal Plus, 4(1), Sep 2010. ISSN 2324-6200. doi: 10.1128/ecosalplus.10.2.1. URL http://dx.doi.org/10.1128/ecosalplus.10.2.1.
[8] J. D. Orth, T. M. Conrad, J. Na, J. A. Lerman, H. Nam, A. M. Feist, and B. Palsson. A comprehensive genome-scale reconstruction of Escherichia coli metabolism. Molecular Systems Biology, 7(1):535, Apr 2011. ISSN 1744-4292. doi: 10.1038/msb.2011.65. URL http://dx.doi.org/10.1038/msb.2011.65.
[9] Jonathan M Monk, Colton J Lloyd, Elizabeth Brunk, Nathan Mih, Anand Sastry, Zachary King, Rikiya Takeuchi, Wataru Nomura, Zhen Zhang, Hirotada Mori, et al. iML1515, a knowledgebase that computes Escherichia coli traits. Nature Biotechnology, 35(10):904, 2017.
[10] Socorro Gama-Castro, Heladia Salgado, Alberto Santos-Zavaleta, Daniela Ledezma-Tejeida, Luis Muñiz-Rascado, Jair Santiago García-Sotelo, Kevin Alquicira-Hernández, Irma Martínez-Flores, Lucia Pannier, Jaime Abraham Castro-Mondragón, et al. RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond. Nucleic Acids Research, 44(D1):D133–D143, 2015. ISSN 1362-4962. doi: 10.1093/nar/gkv1156. URL http://dx.doi.org/10.1093/nar/gkv1156.
[11] Markus W. Covert, Eric M. Knight, Jennifer L. Reed, Markus J. Herrgard, and Bernhard Palsson. Integrating high-throughput and computational data elucidates bacterial networks. Nature, 429(6987):92–96, 2004. ISSN 1476-4679. doi: 10.1038/nature02456. URL http://dx.doi.org/10.1038/nature02456.
[12] Tomer Shlomi, Yariv Eisenberg, Roded Sharan, and Eytan Ruppin. A genome-scale computational study of the interplay between transcriptional regulation and metabolism. Molecular Systems Biology, 3:101, 2007. ISSN 1744-4292. doi: 10.1038/msb4100141. URL http://dx.doi.org/10.1038/msb4100141.
[13] Areejit Samal and Sanjay Jain. The regulatory network of E. coli metabolism as a Boolean dynamical system exhibits both homeostasis and flexibility of response. BMC Systems Biology, 2(1):21, 2008. ISSN 1752-0509. doi: 10.1186/1752-0509-2-21. URL http://dx.doi.org/10.1186/1752-0509-2-21.
[14] Erwin P Gianchandani, Andrew R Joyce, Bernhard Palsson, and Jason A Papin. Functional states of the genome-scale Escherichia coli transcriptional regulatory system. PLoS Computational Biology, 5(6):e1000403, May 2009. ISSN 1553-7358.
[15] Sriram Chandrasekaran and Nathan D Price. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America, 107(41):17845–17850, 2010. ISSN 1091-6490. doi: 10.1073/pnas.1005139107.
[16] Saheed Imam, Sascha Schäuble, Aaron N. Brooks, Nitin S. Baliga, and Nathan D. Price. Data-driven integration of genome-scale regulatory and metabolic network models. Frontiers in Microbiology, 6:409, 2015. ISSN 1664-302X.
[17] RP Vivek-Ananth and Areejit Samal. Advances in the integration of transcriptional regulatory information into genome-scale metabolic models. Biosystems, 147:1–10, 2016.
[18] Tong Hao, Dan Wu, Lingxuan Zhao, Qian Wang, Edwin Wang, and Jinsheng Sun. The genome-scale integrated networks in microorganisms. Frontiers in Microbiology, 9:296, 2018.
[19] Nicole E. Radde and Marc-Thorsten Hütt. The Physics behind Systems Biology. EPJ Nonlinear Biomedical Physics, 4(1):7, 2016. ISSN 2195-0008. doi: 10.1140/epjnbp/s40366-016-0034-8. URL https://doi.org/10.1140/epjnbp/s40366-016-0034-8.
[20] Michael Y. Galperin, Xose M. Fernandez-Suarez, and Daniel J. Rigden. The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Research, 45(D1):D1–D11, 2017. doi: 10.1093/nar/gkw1188. URL http://dx.doi.org/10.1093/nar/gkw1188.
[21] I. M. Keseler, A. Mackie, M. Peralta-Gil, A. Santos-Zavaleta, S. Gama-Castro, C. Bonavides-Martinez, C. Fulcher, A. M. Huerta, A. Kothari, M. Krummenacker, et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Research, 41(D1):D605–D612, 2012. ISSN 1362-4962. doi: 10.1093/nar/gks1027. URL http://dx.doi.org/10.1093/nar/gks1027.
[22] Trey Ideker and Nevan J. Krogan. Differential network biology. Molecular Systems Biology, 8:565, 2012. doi: 10.1038/msb.2011.99. URL http://dx.doi.org/10.1038/msb.2011.99.
[23] Dexter Pratt, Jing Chen, David Welker, Ricardo Rivas, Rudolf Pillich, Vladimir Rynkov, Keiichiro Ono, Carol Miello, Lyndon Hicks, Sandor Szalma, Aleksandar Stojmirovic, Radu Dobrin, Michael Braxenthaler, Jan Kuentzer, Barry Demchak, and Trey Ideker. NDEx, the Network Data Exchange. Cell Systems, 1(4):302–305, 2015. ISSN 2405-4712. doi: 10.1016/j.cels.2015.10.001. URL http://dx.doi.org/10.1016/j.cels.2015.10.001.
[24] Michael Ku Yu, Michael Kramer, Janusz Dutkowski, Rohith Srivas, Katherine Licon, Jason F. Kreisberg, Cherie T. Ng, Nevan Krogan, Roded Sharan, and Trey Ideker. Translation of genotype to phenotype by a hierarchy of cell subsystems. Cell Systems, 2(2):77–88, 2016. doi: 10.1016/j.cels.2016.02.003. URL http://dx.doi.org/10.1016/j.cels.2016.02.003.
[25] E. J. O'Brien, J. A. Lerman, R. L. Chang, D. R. Hyduke, and B. Palsson. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Molecular Systems Biology, 9(1):693, 2013. ISSN 1744-4292. doi: 10.1038/msb.2013.52. URL http://dx.doi.org/10.1038/msb.2013.52.
[26] Kieran Smallbone. Standardized network reconstruction of E. coli metabolism. 2013. URL https://arxiv.org/abs/1304.2960.
[27] Joanne K Liu, Edward J O'Brien, Joshua A Lerman, Karsten Zengler, Bernhard Palsson, and Adam M Feist. Reconstruction and modeling protein translocation and compartmentalization in Escherichia coli at the genome-scale. BMC Systems Biology, 8(1):110, Sep 2014. ISSN 1752-0509. doi: 10.1186/s12918-014-0110-6. URL http://dx.doi.org/10.1186/s12918-014-0110-6.
[28] Roger Guimera and Luis A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895–900, Feb 2005. ISSN 1476-4679. doi: 10.1038/nature03288. URL http://dx.doi.org/10.1038/nature03288.
[29] Josef Deutscher. The mechanisms of carbon catabolite repression in bacteria. Current Opinion in Microbiology, 11(2):87–93, Apr 2008. ISSN 1369-5274. doi: 10.1016/j.mib.2008.02.007. URL http://dx.doi.org/10.1016/j.mib.2008.02.007.
[30] L Thelander and P Reichard. Reduction of ribonucleotides. Annual Review of Biochemistry, 48(1):133–158, Jun 1979. ISSN 1545-4509. doi: 10.1146/annurev.bi.48.070179.001025. URL http://dx.doi.org/10.1146/annurev.bi.48.070179.001025.
[31] Marc Fontecave, Per Nordlund, Hans Eklund, and Peter Reichard. The Redox Centers of Ribonucleotide Reductase of Escherichia coli. In F.F. Nord, editor, Advances in Enzymology and Related Areas of Molecular Biology, volume 65, chapter 4, pages 147–183. Wiley-Blackwell, Nov 1992. ISBN 9780471527602. doi: 10.1002/9780470123119.ch4. URL http://dx.doi.org/10.1002/9780470123119.ch4.
[32] A. Jordan and P. Reichard. Ribonucleotide reductases. Annual Review of Biochemistry, 67(1):71–98, Jun 1998. ISSN 1545-4509. doi: 10.1146/annurev.biochem.67.1.71. URL http://dx.doi.org/10.1146/annurev.biochem.67.1.71.
[33] Eduard Torrents. Ribonucleotide reductases: essential enzymes for bacterial life. Frontiers in Cellular and Infection Microbiology, 4:52, Apr 2014. ISSN 2235-2988. doi: 10.3389/fcimb.2014.00052. URL http://dx.doi.org/10.3389/fcimb.2014.00052.
[34] Peng Jiang and Alexander J Ninfa. Regulation of autophosphorylation of Escherichia coli nitrogen regulator II by the PII signal transduction protein. J Bacteriol, 181(6):1906–1911, Mar 1999. URL http://jb.asm.org/content/181/6/1906.full.
[35] Larry Reitzer. Nitrogen assimilation and global regulation in Escherichia coli. Annual Review of Microbiology, 57(1):155–176, Oct 2003. ISSN 1545-3251. doi: 10.1146/annurev.micro.57.030502.090820. URL http://dx.doi.org/10.1146/annurev.micro.57.030502.090820.
[36] Daniel R. Brown, Geraint Barton, Zhensheng Pan, Martin Buck, and Sivaramesh Wigneshweraraj. Nitrogen stress response and stringent response are coupled in Escherichia coli. Nature Communications, 5:4115, Jun 2014. ISSN 2041-1723. doi: 10.1038/ncomms5115. URL http://dx.doi.org/10.1038/ncomms5115.
[37] C. K. Mathews. DNA precursor metabolism and genomic stability. The FASEB Journal, 20(9):1300–1314, Jul 2006. ISSN 1530-6860. doi: 10.1096/fj.06-5730rev. URL http://dx.doi.org/10.1096/fj.06-5730rev.
[38] Tore Opsahl, Filip Agneessens, and John Skvoretz. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3):245–251, 2010. ISSN 0378-8733. doi: 10.1016/j.socnet.2010.03.006.
[39] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010. ISBN 0199206651, 9780199206650.
[40] Patricia J. Kiley and Helmut Beinert. Oxygen sensing by the global regulator, FNR: the role of the iron-sulfur cluster. FEMS Microbiology Reviews, 22(5):341–352, 1998. doi: 10.1111/j.1574-6976.1998.tb00375.x. URL http://dx.doi.org/10.1111/j.1574-6976.1998.tb00375.x.
[41] Andrew Travers and Georgi Muskhelishvili. DNA supercoiling – a global transcriptional regulator for enterobacterial growth? Nat Rev Micro, 3(2):157–169, 2005. ISSN 1740-1526. doi: 10.1038/nrmicro1088. URL http://dx.doi.org/10.1038/nrmicro1088.
[42] Santo Fortunato and Darko Hric. Community detection in networks: A user guide. Physics Reports, 659:1–44, November 2016. ISSN 0370-1573. doi: 10.1016/j.physrep.2016.09.002.
[43] A. Kolb, S. Busby, H. Buc, S. Garges, and S. Adhya. Transcriptional regulation by cAMP and its receptor protein. Annual Review of Biochemistry, 62(1):749–797, Jun 1993. ISSN 1545-4509. doi: 10.1146/annurev.bi.62.070193.003533. URL http://dx.doi.org/10.1146/annurev.bi.62.070193.003533.
[44] E. Fic, P. Bonarek, A. Gorecki, S. Kedracka-Krok, J. Mikolajczak, A. Polit, M. Tworzydlo, M. Dziedzicka-Wasylewska, and Z. Wasylewski. cAMP Receptor Protein from Escherichia coli as a Model of Signal Transduction in Proteins – A Review. Journal of Molecular Microbiology and Biotechnology, 17(1):1–11, 2009. ISSN 1464-1801. doi: 10.1159/000178014. URL http://dx.doi.org/10.1159/000178014.
[45] Lisa U. Magnusson, Anne Farewell, and Thomas Nyström. ppGpp: a global regulator in Escherichia coli. Trends in Microbiology, 13(5):236–242, May 2005. ISSN 0966-842X. doi: 10.1016/j.tim.2005.03.008. URL http://dx.doi.org/10.1016/j.tim.2005.03.008.
[46] Katarzyna Potrykus and Michael Cashel. (p)ppGpp: still magical? Annual Review of Microbiology, 62(1):35–51, Oct 2008. ISSN 1545-3251. doi: 10.1146/annurev.micro.62.081307.162903. URL http://dx.doi.org/10.1146/annurev.micro.62.081307.162903.
[47] Anjana Srivatsan and Jue D Wang. Control of bacterial transcription, translation and replication by (p)ppGpp. Current Opinion in Microbiology, 11(2):100–105, Apr 2008. ISSN 1369-5274. doi: 10.1016/j.mib.2008.02.001. URL http://dx.doi.org/10.1016/j.mib.2008.02.001.
[48] Stephen Spiro. Regulators of bacterial responses to nitric oxide. FEMS Microbiology Reviews, 31(2):193–211, Mar 2007. ISSN 1574-6976. doi: 10.1111/j.1574-6976.2006.00061.x. URL http://dx.doi.org/10.1111/j.1574-6976.2006.00061.x.
[49] Jonathan D. Partridge, Diane M. Bodenmiller, Michael S. Humphrys, and Stephen Spiro. NsrR targets in the Escherichia coli genome: new insights into DNA sequence requirements for binding and a role for NsrR in the regulation of motility. Molecular Microbiology, 73(4):680–694, Aug 2009. ISSN 1365-2958. doi: 10.1111/j.1365-2958.2009.06799.x. URL http://dx.doi.org/10.1111/j.1365-2958.2009.06799.x.
[50] Nicholas P. Tucker, Nick E. Le Brun, Ray Dixon, and Matthew I. Hutchings. There's NO stopping NsrR, a global regulator of the bacterial NO stress response. Trends in Microbiology, 18(4):149–156, Apr 2010. ISSN 0966-842X. doi: 10.1016/j.tim.2009.12.009. URL http://dx.doi.org/10.1016/j.tim.2009.12.009.
[51] B R Ernsting, M R Atkinson, A J Ninfa, and R G Matthews. Characterization of the regulon controlled by the leucine-responsive regulatory protein in Escherichia coli. Journal of Bacteriology, 174(4):1109–1118, Feb 1992. ISSN 1098-5530. doi: 10.1128/jb.174.4.1109-1118.1992. URL http://dx.doi.org/10.1128/jb.174.4.1109-1118.1992.
[52] J M Calvo and R G Matthews. The leucine-responsive regulatory protein, a global regulator of metabolism in Escherichia coli. Microbiological Reviews, 58(3):466–490, Sep 1994. ISSN 0146-0749.
[53] Arie B. Brinkman, Thijs J. G. Ettema, Willem M. De Vos, and John Van Der Oost. The Lrp family of transcriptional regulators. Molecular Microbiology, 48(2):287–294, Apr 2003. ISSN 1365-2958. doi: 10.1046/j.1365-2958.2003.03442.x. URL http://dx.doi.org/10.1046/j.1365-2958.2003.03442.x.
[54] Georg E. Schulz. Bacterial porins: structure and function. Current Opinion in Cell Biology, 5(4):701–707, Aug 1993. ISSN 0955-0674. doi: 10.1016/0955-0674(93)90143-e. URL http://dx.doi.org/10.1016/0955-0674(93)90143-E.
[55] B. K. Jap and P. J. Walian. Structure and functional mechanism of porins. Physiological Reviews, 76(4):1073–1088, 1996. URL http://physrev.physiology.org/content/76/4/1073.
[56] Tilman Schirmer. General and specific porins from bacterial outer membranes. Journal of Structural Biology, 121(2):101–109, 1998. ISSN 1047-8477. doi: 10.1006/jsbi.1997.3946. URL http://dx.doi.org/10.1006/jsbi.1997.3946.
[57] Carsten Marr, Marcel Geertz, Marc-Thorsten Hütt, and Georgi Muskhelishvili. Dissecting the logical types of network control in gene expression profiles. BMC Syst Biol, 2(1):18, Jan 2008. doi: 10.1186/1752-0509-2-18. URL https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-2-18.
[58] Nikolaus Sonnenschein, Marcel Geertz, Georgi Muskhelishvili, and Marc-Thorsten Hütt. Analog regulation of metabolic demand. BMC Syst Biol, 5(1):40, Jan 2011. doi: 10.1186/1752-0509-5-40.
[59] Nikolaus Sonnenschein, José Felipe Golib Dzib, Annick Lesne, Sebastian Eilebrecht, Sheerazed Boulkroun, Maria-Christina Zennaro, Arndt Benecke, and MT Hütt. A network perspective on metabolic inconsistency. BMC Systems Biology, 6(1):41, May 2012. doi: 10.1186/1752-0509-6-41. URL https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-6-41.
[60] Carolin Knecht, Christoph Fretter, Philipp Rosenstiel, Michael Krawczak, and Marc-Thorsten Hütt. Distinct metabolic network states manifest in the gene expression profiles of pediatric inflammatory bowel disease patients and controls. Scientific Reports, 6:32584, 2016. doi: 10.1038/srep32584.
[61] Moritz E Beber, Patrick Sobetzko, Georgi Muskhelishvili, and Marc-Thorsten Hütt. Interplay of digital and analog control in time-resolved gene expression profiles. EPJ Nonlinear Biomedical Physics, 4(1):8, 2016. doi: 10.1140/epjnbp/s40366-016-0035-7. URL https://epjnonlinearbiomedphys.springeropen.com/articles/10.1140/epjnbp/s40366-016-0035-7.
[62] Hiroaki Kitano. Biological robustness. Nature Reviews Genetics, 5:826–837, November 2004. doi: 10.1038/nrg1471. URL http://dx.doi.org/10.1038/nrg1471.
[63] Hiroaki Kitano. Towards a theory of biological robustness. Molecular Systems Biology, 3:137, September 2007. ISSN 1744-4292. doi: 10.1038/msb4100179.
[64] Tamar Friedlander, Avraham E Mayo, Tsvi Tlusty, and Uri Alon. Evolution of bow-tie architectures in biology. PLoS Computational Biology, 11(3):e1004055, November 2014. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1004055.
[65] Marie Csete and John Doyle. Bow ties, metabolism and disease. Trends in Biotechnology, 22(9):446–450, 2004. ISSN 0167-7799. doi: 10.1016/j.tibtech.2004.07.007. URL http://dx.doi.org/10.1016/j.tibtech.2004.07.007.
[66] W. C. van Heeswijk, H. V. Westerhoff, and F. C. Boogerd. Nitrogen assimilation in Escherichia coli: putting molecular data into a systems perspective. Microbiology and Molecular Biology Reviews, 77(4):628–695, Dec 2013. ISSN 1092-2172. doi: 10.1128/mmbr.00025-13. URL http://dx.doi.org/10.1128/MMBR.00025-13.
[67] Karin J Jensen, Christian B Moyer, and Kevin A Janes. Network architecture predisposes an enzyme to either pharmacologic or genetic targeting. Cell Systems, 2(2):112–121, February 2016. ISSN 2405-4720. doi: 10.1016/j.cels.2016.01.012.
[68] David F. Klosik, Anne Grimbs, Stefan Bornholdt, and Marc-Thorsten Hütt. The interdependent network of gene regulation and metabolism is robust where it needs to be. Nature Communications, 8:534, 2017. doi: 10.1038/s41467-017-00587-4.
[69] Filippo Radicchi and Ginestra Bianconi. Redundant interdependencies boost the robustness of multiplex networks. Physical Review X, 7:011013, 2017. doi: 10.1103/PhysRevX.7.011013. URL https://link.aps.org/doi/10.1103/PhysRevX.7.011013.
[70] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, and Mason A. Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014. doi: 10.1093/comnet/cnu016. URL https://doi.org/10.1093/comnet/cnu016.
[71] Jianxi Gao, Sergey V. Buldyrev, H. Eugene Stanley, and Shlomo Havlin. Networks formed from interdependent networks. Nature Physics, 8(1):40–48, January 2012. ISSN 1745-2473. doi: 10.1038/nphys2180. URL http://dx.doi.org/10.1038/nphys2180.
[72] Sergey V. Buldyrev, Roni Parshani, Gerald Paul, H. Eugene Stanley, and Shlomo Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464(7291):1025–1028, 2010. ISSN 0028-0836. doi: 10.1038/nature08932. URL http://dx.doi.org/10.1038/nature08932.
[73] Seung-Woo Son, Golnoosh Bizhani, Claire Christensen, Peter Grassberger, and Maya Paczuski. Percolation theory on interdependent networks based on epidemic spreading. Europhysics Letters, 97:16006, 2012. ISSN 0295-5075. doi: 10.1209/0295-5075/97/16006. URL http://stacks.iop.org/0295-5075/97/i=1/a=16006.
[74] Sebastian M. Krause, Michael M. Danziger, and Vinko Zlatić. Hidden connectivity in networks with vulnerable classes of nodes. Physical Review X, 6:041022, 2016. doi: 10.1103/PhysRevX.6.041022. URL https://link.aps.org/doi/10.1103/PhysRevX.6.041022.

Supplementary information

Table S1: Vertex composition of the integrative
E. coli network in total (Total) and for the partition regulatory domain – protein interface – metabolic domain (RD, PI, MD).

Vertices                    Total    RD    PI    MD
reaction
compound
gene
protein monomer
protein-protein complex       929    65   243   621
protein-compound complex      100     0   100     0
protein-rna complex
E. coli network in total (Total), for the partition regulatory domain – protein interface – metabolic domain (RD, PI, MD) and for the peripheral edges between the three domains.

Edges               Link   Total     MD   MD/PI     PI   PI/RD     RD   RD/MD
gene - protein       D      1916    312       0    325     803    198     278
protein - complex    C      1182      6     809    291      68      5       3
                               4      0       0      4       0      0       0
                              80      8      42     22       7      1       0
                               6      0       0      0       6      0       0
enzyme - reaction    D      1272   1225       5     42       0      0       0
                            2775   2657       6    109       1      0       2
educt - reaction     C      7707   7374     298     35       0      0       0
                             246      0       1    245       0      0       0
                             181      0       3    178       0      0       0
                             100      0       0    100       0      0       0
reaction - product   D      8303   7892     398     13       0      0       0
                             171      0       1    170       0      0       0
                             252      0       7    210      35      0       0
                             102      0       0    102       0      0       0
transport            C       291    281       8      2       0      0       0
regulation           R       207      0       0      0       0    207       0
                              11      6       1      1       0      1       2
                              98      0       0      0       0      0      98
                             701    650      50      1       0      0       0
                              10      0       8      2       0      0       0
Total                      31880  22086    1703   1854    2642   3210     385

C – Conjunct encoding; D – Disjunct encoding; R – Regulation

regulation of the integrative
E. coli network (EcoCyc, release 20.0). Each of the 7296 regulatory processes comprises the regulator source ('Regulator') and target ('Regulated entity') as well as the regulatory mode, namely activation (+) and inhibition (−).

Regulation type                             Number   Regulator           Regulated entity
Transcription-Factor-Binding                  4302   Protein             Transunit, Promoter
Allosteric-Regulation-of-RNAP                  219   Protein             Promoter
Ribosome-Mediated-Attenuation                   12   RNA                 Terminator
Protein-Mediated-Attenuation                     5   Protein             Transunit, Terminator
Transcriptional-Attenuation                      3   Compound            Transunit, Terminator
Rho-Blocking-Antitermination                     3   Compound            Terminator
Small-Molecule-Mediated-Attenuation              2   Compound            Transunit, Terminator
RNA-Mediated-Translation-Regulation            195   RNA                 Transunit, Gene
Protein-Mediated-Translation-Regulation         56   Protein             Transunit, Gene
Compound-Mediated-Translation-Regulation        22   Protein             Transunit, Gene
Regulation-of-Translation                        4   Compound            Transunit, Gene
Regulation-of-Enzyme-Activity                 2456   Compound, Protein   Enzyme
Regulation-of-Reactions                         15   Compound, Protein   Reaction
Regulation                                       2   Protein             Protein

Table S4: Comparison of the vertex composition of the integrative E. coli network and the coverage to the integrative model from Covert et al. [11], the metabolic model from Feist et al. [6], and the transcriptional regulatory network based on the RegulonDB [10].

Vertices of the integrative E. coli network    iMC1010*     iAF1260**    RegulonDB
Reaction                     4693               569/767     665/1436        0/0
Compound                     2681               557/615      607/963        0/0
Gene                         2545              971/1010    1168/1260    1764/1788
Protein monomer              1917               771/817          0/0      185/190
Protein-protein complex       929                                           11/13
Protein-compound complex      100                                             0/0
Protein-RNA complex             3                                             0/0
Total (model coverage)      12868             2868/3209    2440/3636    1960/1991
                                                  89.4%        67.1%        98.4%
Total (EcoCyc coverage)                       3156/3209    3636/3636    1991/1991
                                                  98.3%       100.0%       100.0%

* accounted only for intracellular reactions and unique metabolites, in total 1076 and 762
** accounted only for intracellular reactions and unique metabolites, in total 2382 and 1668

Table S5: Most abundant vertices of the downwards (RD → MD) and upwards traversing paths (MD → RD), their quantity and the respective total degree and the corresponding degree rank.

Traversing paths   Vertex ID                          Vertex name
RD → MD            PTSH-MONOMER                       HPr (histidine protein)
                   PTSH-PHOSPHORYLATED                HPr-P (phosphorylated HPr)
                   RXN0-6718                          EI-P + HPr → HPr-P + EI
                   PTSH-MONOMER                       HPr (histidine protein)
                   PTSH-PHOSPHORYLATED                HPr-P (phosphorylated HPr)
                   RXN0-7166                          PEP + HPr ↔ HPr-P + Pyr
                   RIBONUCLEOSIDE-DIP-REDUCTI-CPLX    RDPR (ribonucleoside-diphosphate reductase)
                   RED-THIOREDOXIN-MONOMER            (reduced) thioredoxin
                   OX-THIOREDOXIN-MONOMER             oxidized thioredoxin
                   RIBONUCLEOSIDE-DIP-REDUCTI-CPLX    RDPR (ribonucleoside-diphosphate reductase)
                   RED-THIOREDOXIN2-MONOMER           (reduced) thioredoxin
                   OX-THIOREDOXIN2-MONOMER            oxidized thioredoxin
MD → RD            PROTEIN-NRIP                       NtrC-P (phosphorylated NtrC)
                   PROTEIN-NRIIP                      NtrB-P (phosphorylated NtrB)
                   NRIPHOS-RXN                        NtrB-P + NtrC → NtrB + NtrC-P

N-ACETYL-D-GLUCOSAMINE_p                   N-acetylglucosamine
N-ACETYL-D-MANNOSAMINE_p                   N-acetylmannosamine
NACMUR_p                                   N-acetylmuramate
ASCORBATE_p                                Ascorbate
CELLOBIOSE_p                               Cellobiose
DIHYDROXYACETONE                           Dihydroxyacetone
GALACTITOL_p                               Galactitol
CPD-12538_p                                Glucosamine
CPD-15382_p                                keto-Fructose
GLC_p                                      Glucose
CPD-3570_p                                 Methylglucoside
HYDROQUINONE-O-BETA-D-GLUCOPYRANOSIDE_p    Hydroquinone-O-glucopyranoside (arbutin)
MANNITOL_p                                 Mannitol
CPD-12601_p                                Mannose
                                           O-Mannosylglycerate
CPD-1142_p                                 Salicin
SORBITOL_p                                 Sorbitol
TREHALOSE_p                                Trehalose

Table S7: Regulated entities of the global transcriptional response regulator of the NtrBC system, the phosphorylated NtrC.

Vertex ID   Vertex name   Function of the encoded protein
EG10385 glnG NtrC (inhibition)
EG10387 glnL NtrB (inhibition)
EG10383 glnA Glutamine synthetase (as 12-fold oligomer; inhibition)
EG12191 glnK PII-2 (as trimer) can activate the adenylylation of glutamine synthetase
EG10386 glnH glutamine ABC transporter - periplasmic binding protein
EG10388 glnP glutamine ABC transporter - membrane subunit
EG10389 glnQ glutamine ABC transporter - ATP binding subunit
EG11629 potF putrescine ABC transporter - periplasmic binding protein
EG11630 potG putrescine ABC transporter - ATP binding subunit
EG11631 potH putrescine ABC transporter - membrane subunit
EG11632 potI putrescine ABC transporter - membrane subunit
EG12124 hisJ histidine ABC transporter - periplasmic binding protein
EG10007 hisM arginine/histidine/lysine/ornithine ABC transporter - membrane subunit
EG10452 hisP arginine/histidine/lysine/ornithine ABC transporter - ATP binding subunit
EG12125 hisQ arginine/histidine/lysine/ornithine ABC transporter - membrane subunit
EG10072 argT arginine/lysine/ornithine ABC transporter - periplasmic binding protein
EG11821 amtB member of NH3/NH4+ transporters, necessary for growth only at low NH3 levels
G7071 cbl Cbl DNA-binding transcriptional activator
G7072 nac Nac DNA-binding transcriptional dual regulator
G6943 astA Arginine succinyltransferase - 1st step in arginine degradation II (AST pathway)
G6941 astB Succinylarginine dihydrolase (as dimer) - 2nd step in AST pathway
G6944 astC Succinylornithine transaminase - 3rd step in AST pathway
G6942 astD Succinylglutamate semialdehyde dehydrogenase - 4th step in AST pathway
G6940 astE Succinylglutamate desuccinylase - 5th and final reaction in AST pathway
G6523 rutA Uracil oxygenase - 1st step in uracil degradation III
G6522 rutB peroxyureidoacrylate/ureidoacrylate amidohydrolase - 2nd step in uracil degradation III
G6521 rutC (predicted aminoacrylate peracid reductase - 3rd step in uracil degradation III)
G6520 rutD predicted aminoacrylate hydrolase - 4th step in uracil degradation III
G6519 rutE predicted malonic semialdehyde reductase - 5th step in uracil degradation III
G6518 rutF (flavin reductase - activity required for 1st step in uracil degradation (RutA))
G6517 rutG member of the nucleobase:cation symporter-2 (NCS2) family of transporters (probably for uracil)
G6782 ddpX D-Ala-D-Ala dipeptidase required for wild-type peptidoglycan biosynthesis
G6781 ddpA (predicted peptide ABC transporter - periplasmic binding component)
G6780 ddpB (predicted peptide ABC transporter - membrane component)
G6779 ddpC (predicted peptide ABC transporter - membrane component)
G6778 ddpD (predicted peptide ABC transporter - ATP-binding component)
G6777 ddpF (predicted peptide ABC transporter - ATP-binding component)
G6969 yeaG (impact in adaptation to sustained N starvation, member of Ser protein kinases)
G6970 yeaH (impact in adaptation to sustained N starvation)
EG12834 yhdW (predicted amino acid ABC transporter - membrane component)
EG12835 yhdX (predicted amino acid ABC transporter - ATP-binding component)
EG12836 yhdY (predicted amino acid ABC transporter - membrane component)
EG12837 yhdZ (predicted amino acid ABC transporter - periplasmic binding component)

Table S[...]: Hubs of the integrative E. coli network with a total degree of at least [...] (DC), their module affiliation, and the differentiation in in-degree (In) and out-degree (Out) including the affiliation and linkage assignments. The last column denotes the intra-domain degree fraction, ξ, here given as a percentage.

Vertex ID                 Vertex name
PROTON                    Proton
WATER                     H2O
CPLX0-226                 Crp-cAMP, transcriptional dual regulator
Pi                        Phosphate (Pi)
ATP                       ATP
PROTON_p                  Proton (periplasmic)
CPLX0-7534_o              [OmpF], outer membrane porin F complex
CPLX0-7533_o              [OmpC], outer membrane porin C complex
CPLX0-7530_o              [OmpE], outer membrane porin E complex
ADP                       ADP
CPLX0-7797                FNR, transcriptional dual regulator
WATER_p                   H2O (periplasmic)
NAD                       NAD+
NADH                      NADH/H+
PC00027                   IHF, transcriptional dual regulator
CPLX0-7705                Fis, transcriptional dual regulator
PD00288                   H-NS, transcriptional dual regulator
PPI                       Pyrophosphate
Pi_p                      Phosphate (periplasmic)
PHOSPHO-ARCA              ArcA-P, transcriptional dual regulator
NADP                      NADP+
NADPH                     NADPH/H+
CO-A                      Coenzyme A
GLT                       Glutamate
CPLX0-7639                [Fur-Fe2+], transcriptional dual regulator
                          [...]-Ketoglutarate
PHOSPHO-NARL              NarL-P, transcriptional dual regulator
CPLX0-8070                DksA-ppGpp
AMP                       AMP
PYRUVATE                  Pyruvate
AMMONIUM                  NH4+
CARBON-DIOXIDE            CO2
PC00061                   Cra, transcriptional dual regulator
CPLX0-3930                FlhDC, transcriptional dual regulator
S-ADENOSYLMETHIONINE      Adenosylmethionine
GUANOSINE_TETRAPHOSPHATE  Guanosine 3'-diphosphate 5'-diphosphate (ppGpp)
CPLX0-8047                NsrR-nitric oxide
ADENOSYL-HOMO-CYS         Adenosylhomocysteine
ZN+2                      Zinc (Zn2+)
ACETYL-COA                acetyl-CoA
OXYGEN-MOLECULE           O2
PHOSPHO-CPXR              CpxR, transcriptional dual regulator
GTP                       GTP
MONOMER0-155              Lrp-Leucine, transcriptional dual regulator
PHOSPHO-PHOB              PhoB-P, transcriptional dual regulator
PC00010                   LexA, transcriptional repressor
PD00353                   Lrp, transcriptional dual regulator
CFA-CPLX                  Cyclopropane fatty acyl phospholipid synthase
PHOSPHO-PHOP              PhoP-P, transcriptional dual regulator
AAS-MONOMER               Acyltransferase
ALKAPHOSPHA-CPLX_p        Alkaline phosphatase (periplasmic)
PHOSPHO-NARP              NarP-P, transcriptional dual regulator
D-ALANINE                 Alanine
SUC                       Sucrose

Table S[...]: Top betweenness central vertices of the integrative E. coli network and involved systems. The central reactions of each system are shaded in the corresponding system's color.

Vertex ID                 System  Vertex name
PROTON                    ∎∎∎∎∎∎  Proton
WATER                     ∎∎      H2O
ATP                       ∎∎∎∎∎∎  ATP
Pi                        ∎∎      Phosphate (Pi)
PROTON_p                  ∎       Proton (periplasmic)
ADP                       ∎∎∎∎∎   ADP
CAMP                      ∎       cyclic-AMP (cAMP)
RXN0-269_f                ∎
CPLX0-226                 ∎       Crp-cAMP, transcriptional dual regulator
ADENYLATECYC-RXN          ∎
PHOR-RXN                  ∎
PHOSPHO-PHOB              ∎       PhoB-P, transcriptional dual regulator
PHOBR-RXN                 ∎
PHOSPHO-PHOR_i            ∎       PhoR, sensory histidine kinase (inner membrane)
LEU                       ∎       Leucine
RXN0-261_f                ∎
MONOMER0-155              ∎       Lrp-Leucine, transcriptional dual regulator
FE+2                      ∎       Ferrous (Fe2+)
RXN0-5252_f               ∎
CPLX0-7620                ∎       Fur-Fe2+
CPLX0-7639                ∎       [Fur-Fe2+], transcriptional dual regulator
EG10671                   ∎       ompF, outer membrane porin F gene
EG10671-MONOMER           ∎       OmpF, outer membrane porin F
CPLX0-7534_o              ∎       [OmpF], outer membrane porin F complex
ADENYLATECYC-MONOMER      ∎       CyaA, adenylate cyclase
GUANOSINE_TETRAPHOSPHATE  ∎       Guanosine 3'-diphosphate 5'-diphosphate (ppGpp)
EG10670                   ∎       ompC, outer membrane porin C gene
EG10670-MONOMER           ∎       OmpC, outer membrane porin C
CPLX0-7533_o              ∎       [OmpC], outer membrane porin C complex
PPI                       ∎       Pyrophosphate
WATER_p                   ◻       H2O (periplasmic)
EG10729                   ∎       ompE, outer membrane porin E gene
MONOMER0-282              ∎       OmpE, outer membrane porin E
CPLX0-7530_o              ∎       [OmpE], outer membrane porin E complex
NAD                       ◻       NAD+
Pi_p                      ◻       Phosphate (periplasmic)
PPPGPPHYDRO-RXN           ∎
ATPSYN-RXN_f              ∎
PYRUVATE                  ∎       Pyruvate
GLT                       ◻       Glutamate
CARBON-DIOXIDE            ∎       CO2
NADP                      ◻       NADP+
PEPDEPHOS-RXN_2           ∎
PEPDEPHOS-RXN_1           ∎
ADENYLYLSULFKIN-RXN_r     ∎
AMMONIUM                  ∎       NH4+
CARBAMATE-KINASE-RXN      ∎
CO-A                      ◻       Coenzyme A
ABC-35-RXN_2              ∎
ABC-35-RXN_1              ∎
EG30063                   ∎       micF, mRNA-interfering complementary RNA gene

∎ ATPSYN-RXN: ADP + Pi + PROTON_p ↔ ATP + WATER + PROTON
∎ ADENYLATECYC-RXN: ADENYLATECYC-MONOMER + ATP → CAMP + PPI; RXN0-269: PC00004 + CAMP → CPLX0-226
∎ PHOR-RXN: PHOR-MONOMER_i + ATP + Pi → PHOSPHO-PHOR_i + ADP; PHOBR-RXN: PHOB-MONOMER + PHOSPHO-PHOR_i → PHOSPHO-PHOB + PHOR-MONOMER_i
∎ RXN0-261: PD00353 + LEU ↔ MONOMER0-155
∎ RXN0-5252: PD00260 + FE+2 ↔ CPLX0-7620 ⇢ CPLX0-7639
∎ EG30063 ⊣ EG10671 → EG10671-MONOMER ⇢ CPLX0-7534_o; EG10670 → EG10670-MONOMER ⇢ CPLX0-7533_o; EG10729 → MONOMER0-282 ⇢ CPLX0-7530_o
∎ PPPGPPHYDRO-RXN: GDP-TP + WATER → GUANOSINE_TETRAPHOSPHATE + Pi + PROTON
∎ PEPDEPHOS-RXN: PHOSPHO-ENOL-PYRUVATE + ADP + PROTON ↔ PYRUVATE + ATP
∎ CARBAMATE-KINASE-RXN: CARBAMOYL-P + ADP + PROTON → CARBON-DIOXIDE + AMMONIUM + ATP
∎ ADENYLYLSULFKIN-RXN: PAPS + ADP + PROTON ↔ APS + ATP

Table S[...]: Top ten key elements of the integrative E. coli network with respect to protein interface-specific degree (piDC) and betweenness centrality (piBC), respectively. The traversing path systems depict the embedding of system components in the protein interface (Figures [...] and [...], Table S[...]).

Vertex ID                        Traverse path system  Vertex name
PTSH-MONOMER                     ∎                     HPr (histidine protein)
PTSH-PHOSPHORYLATED              ∎                     HPr-P (phosphorylated HPr)
RED-THIOREDOXIN-MONOMER          ∎                     redTrx (reduced thioredoxin)
RED-THIOREDOXIN2-MONOMER         ∎                     redTrx (reduced thioredoxin)
OX-THIOREDOXIN-MONOMER           ∎                     oxTrx (oxidized thioredoxin)
OX-THIOREDOXIN2-MONOMER          ∎                     oxTrx (oxidized thioredoxin)
EG50003-MONOMER                  ◻                     acyl carrier protein (ACP)
FLAVODOXIN1-MONOMER              ◻                     flavodoxin
OX-FLAVODOXIN1                   ◻                     oxidized flavodoxin
PROTEIN-CHEA                     ◻                     chemotaxis protein CheA
⋮
RIBONUCLEOSIDE-DIP-REDUCTI-CPLX  ∎                     RDPR (ribonucleoside-diphosphate reductase)
⋮
RXN0-6718                        ∎                     EI-P + HPr → HPr-P + EI
⋮
ADPREDUCT-RXN_f_1                ◻                     NDP + redTrx → dNDP + oxTrx + H2O
ADPREDUCT-RXN_f_2                ◻                     NDP + redTrx → dNDP + oxTrx + H2O
⋮
THIOREDOXIN-REDUCT-NADPH-RXN_1   ◻                     oxTrx + NADPH/H+ → redTrx + NADP+
THIOREDOXIN-REDUCT-NADPH-RXN_2   ◻                     oxTrx + NADPH/H+ → redTrx + NADP+

∎ RXN0-[...]: PTSH-MONOMER + PTSI-PHOSPHORYLATED → PTSH-PHOSPHORYLATED + PTSI-MONOMER
∎ RIBONUCLEOSIDE-DIP-REDUCTI-CPLX ⇢ RIBONUCLEOSIDE-DIP-REDUCTI-RXN

Figure S1: Distribution of intra-domain degree fraction of the integrative E. coli network. The yellow-shaded area represents the significantly low intra-domain degree fractions tested via z-score. (Axis label: Counts.)

Table S11: Hubs and non-hubs of the integrative
E. coli network with a significantly low intra-domain degree fraction (ξ, tested via z-score) and a total degree (DC) larger than 12, and their domain affiliation (RD, PI and MD).

Vertex name                                               RD  PI  MD  ξ  p-value  DC
Hubs
CRP-cAMP DNA-binding transcriptional dual regulator
Non-hubs
ModE-MoO4(2-) DNA-binding transcriptional dual regulator                 0.006    48
NtrC-P, transcriptional dual regulator
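
The selection behind Table S11 can be illustrated with a minimal sketch. It assumes that the intra-domain degree fraction ξ of a vertex is the share of its in- and out-edges whose other endpoint lies in the same domain, and that the z-score is taken against the ξ values of all vertices. Both assumptions, the toy network, the function names and the cut-off `z_cut` are hypothetical and are not taken from the reconstruction itself.

```python
from statistics import mean, pstdev

# Hypothetical toy network (not data from the paper): every vertex belongs to
# one domain, and edges are directed (source, target) pairs.
DOMAIN = {
    "tf": "RD", "g1": "RD", "g2": "RD",
    "p1": "PI",
    "m1": "MD", "m2": "MD", "r1": "MD",
}
EDGES = [
    ("tf", "g1"), ("tf", "g2"), ("g1", "p1"), ("p1", "r1"),
    ("m1", "r1"), ("r1", "m2"), ("m2", "tf"),
]


def intra_domain_fraction(edges, domain):
    """Map each vertex with degree > 0 to (xi, total degree).

    Assumed definition: xi is the share of a vertex's in- and out-edges
    whose other endpoint lies in the vertex's own domain.
    """
    same = dict.fromkeys(domain, 0)
    total = dict.fromkeys(domain, 0)
    for source, target in edges:
        for vertex, neighbour in ((source, target), (target, source)):
            total[vertex] += 1
            if domain[vertex] == domain[neighbour]:
                same[vertex] += 1
    return {v: (same[v] / total[v], total[v]) for v in domain if total[v]}


def low_xi_vertices(xi, z_cut=-1.5, min_degree=0):
    """Return {vertex: z} for vertices whose xi z-score lies below z_cut.

    The z-score is taken against the xi values of all vertices; this
    reference distribution and the cut-off are illustrative assumptions.
    """
    values = [fraction for fraction, _ in xi.values()]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}
    return {
        v: (fraction - mu) / sigma
        for v, (fraction, degree) in xi.items()
        if degree > min_degree and (fraction - mu) / sigma < z_cut
    }


xi = intra_domain_fraction(EDGES, DOMAIN)
print(sorted(low_xi_vertices(xi)))  # ['p1']
```

In this toy example only the interface protein `p1`, whose two edges both lead into other domains, falls below the cut-off. Passing `min_degree=12` would mimic the degree filter stated in the caption of Table S11.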