A Graph-Based Approach to Analyze Flux-Balanced Pathways in Metabolic Networks
Mona Arabzadeh, Morteza Saheb Zamani, Mehdi Sedighi, Sayed-Amir Marashi
AA Graph-Based Approach to Analyze Flux-Balanced Pathwaysin Metabolic Networks
Mona Arabzadeh , Morteza Saheb Zamani , Mehdi Sedighi ,and Sayed-Amir Marashi Department of Computer Engineering and Information Technology,Amirkabir University of Technology, Tehran, Iran and Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran. { m.arabzadeh,szamani,msedighi } @aut.ac.ir, [email protected] Abstract
An Elementary Flux Mode (EFM) is a pathway with minimum set ofreactions that are functional in steady-state constrained space. Due to thehigh computational complexity of calculating EFMs, different approacheshave been proposed to find these flux-balanced pathways. In this paper,an approach to find a subset of EFMs is proposed based on a graph datamodel. The given metabolic network is mapped to the graph model anddecisions for reaction inclusion can be made based on metabolites andtheir associated reactions. This notion makes the approach more conve-nient to categorize the output pathways. Implications of the proposedmethod on metabolic networks are discussed.
Keywords:
Elementary Flux Mode (EFM); Graph Data Model; Metabolic Network
Metabolic network models are among the well-studied models in biotechnology.The reconstruction of these networks is possible by collecting the gene-protein-reaction information from related genomic data and literature [1]. It is impor-tant to explore biologically relevant pathways in metabolic networks. Forcingconstraints to a reconstructed biochemical network results in the definition ofachievable cellular functions [2]. Mathematical representation of constraints areas flux-balance constraints (e.g., conservation of flux) which means the networkshould be at the steady-state condition, and flux bounds which limit numer-ical ranges of network parameters and coefficients such as the minimum andmaximum range of fluxes for each reaction. Flux Balance Analysis (FBA), amethod to predict the optimal growth rate of a a certain species when grown onparticular set of metabolites [3], and Elementary Flux Mode (EFM) analysis,an approach to decompose a network to minimal functional pathways [4], areamong the approaches used in constraint-based analysis of metabolic networks.EFMs [5, 6] have been used in several biological applications such as bio-engineering [7], phenotypic characterization [8], drug target prediction [9] andstrain design [10]. The incorporation of kinetic analysis into EFMs enables amore complete description of cellular functions for which kinetics play a domi-nant role [11]. 1 a r X i v : . [ q - b i o . M N ] F e b everal methods were introduced for finding EFMs based on double-descriptionmethod [5]. Double-description is a technique to enumerate all extreme rays of apolyhedral cone. An improved approach to the primary method was introducedin which the null-space of the stoichiometric matrix is used instead of the matrixitself to generate EFM candidates [12, 13, 14]. Various effective computationalapproaches have been proposed to speed-up previous methods for computingEFMs [15, 16]. Some of these approaches led to the development of computa-tional tools such as Metatool [17] and
EFMtool [16]. Besides, methods basedon linear programming have been proposed that explore a set of EFMs withspecific properties, such as K -shortest EFMs [18], or EFMs with a given set oftarget reactions [19]. A new set of methods based on graph-theory, [20, 21] triesto overcome the scalability problem of the double-description-based techniques.According to the complexity of computing EFMs as discussed in [22, 23],an additional challenge of extracting biological properties from large set ofEFMs [11] has been raised. As stated in [23], the complexity of enumerating allEFMs still remains unknown and the computational complexity of enumeratingEFMs containing a specific reaction is hard. Even for networks with the samenumber of reactions and metabolites, knowing the number of EFMs in one net-work cannot necessarily help in finding the number of EFMs in the other one,as they have different connectivities and topologies. This fact emphasizes theinherent structural information reflected by EFMs [24] and is a motivation toextract a set of biologically meaningful EFMs according to a biological aspectsuch as motifs [25] and thermodynamics [26].The main focus of this paper is to introduce a data model based on theAND/OR graph and to propose an approach to find flux-balanced pathwaysaccording to pathway topology and reaction stoichiometries. It is shown thatthe computed flux-balanced pathways are a biologically relevant subset of EFMswhich include external input and output metabolites. In other words, the subsetcomprises all EFMs connecting input and output external metabolites. Besides,based on the introduced graph data model, an upper-bound for the complex-ity of exploring EFMs containing external metabolites is calculated. Using thetopology of the network gives us the opportunity to make decisions according tothe metabolite/reaction aspects to preserve or eliminate a particular reaction ormetabolite in a certain pathway. Introducing a model to consider both topology and stoichiometry of a metabolic network for network analysis may lead to bet-ter biological decisions and output categorization. The graph structure makesit easier to set-up rules for pathways and to explore the intended solution spacevia rules. Finally, our method can potentially be implemented on hardware(through a system design approach) more conveniently.The rest of the text is organized as follows. Some required concepts and pre-liminaries are provided in Section 2. Before explaining the proposed approachto find the elementary flux modes in a given metabolic network (Section 3.2),we first describe the modified AND/OR graph model and its properties in Sec-tion 3.1. Finally, Section 4 is devoted to results and discussions and Section 5concludes the paper. 2 Preliminaries
In this section, some basic concepts along with the formal definition of elemen-tary flux modes are provided.
Metabolic networks model the metabolism of living cells in terms of a set ofbiochemical reactions. Definitions in this section are derived from [27]. Thebiochemical reactions can be irreversible, i.e., the reaction can be active onlyin one direction, or reversible, i.e., the reaction can be active in both direc-tions. The contributing metabolites in a reaction can be either substrates orproducts. Substrates are consumed and products are produced during the op-eration of a reaction. The topology of a metabolic network is characterized byits m ˆ n stoichiometric matrix, S , where m and n correspond to the number ofmetabolites and reactions, respectively. In this paper, external metabolites arenot included in S . The value S ij represents the stoichiometric coefficient of themetabolite i in the reaction j . S ij is positive/negative if the metabolite i is pro-duced/consumed. If this coefficient is zero it means that the metabolite i doesnot contribute to the reaction j . The network is considered in the steady-stateif for each internal metabolite, the rates of consumption and production areequal. The reactions connected to the external metabolites are called Boundary reactions.
A flux vector v ‰ and v P R n is considered an EFM if it meets the followingconditions: • v i ě i P irreversible reactions (thermodynamic constraint). • S . v “ (steady-state condition). • There is no v P R n with supp p v q Ă supp p v q , where support of a modeis defined as supp p v q “ t i | v i ‰ u , (minimality limitation). In this section, the proposed graph-based elementary flux mode analysis is dis-cussed in detail. To do so, the data structure used in our approach is defined,and then, the proposed approach is introduced accordingly.
The proposed data model is based on the conventional AND/OR graph in com-puter science. However, we provide a different definition with additional featuresto make the model appropriate for our proposed algorithm. In our model, the3oefficients of the metabolites in reactions are embedded in the graph structureas attributes of each node (explained in Definition 3, see below). A pathway is defined as a set of reactions with their associated fluxes. The flux of eachreaction in a certain pathway is also embedded in the graph.
Definition 1.
The Modified Graph for representing a metabolic network, de-noted as MG , is defined as a set of Nodes , N i , i.e., MG = t N i | ď i ď M ´ u ,where M is the number of internal metabolites in the metabolic network. Definition 2.
Each node N i in MG is a 3-tuple N = ( i , I , O ) where • i is the tag of a metabolite, • I is an array of input reactions that produce the metabolite and • O is an array of output reactions that consume the metabolite. Definition 3.
Each I / O in Definition 2 contains the following data: • The reaction j , 0 ď j ď r ´
1, where r is the number of reactions consum-ing/producing the metabolite i , • I M j / O M j , an array of the metabolites consumed/produced by reaction j .In other words, I M j / O M j = t m kj | ď k ď m ´ u , where m is the numberof consumed/produced metabolites by the reaction j , • I " M j / O " M j , an array of the metabolites produced/consumed by reaction j . In other words, I " M j / O " M j = t m kj | ď k ď m ´ u , where m is thenumber of produced/consumed metabolites by the reaction j excludingthe metabolite i itself, • The direction of the reaction for reversible reactions, • I c ij / O c ij , the coefficient of the reaction j for the produced/consumedmetabolite i in the stoichiometric matrix S and • I f ijp / O f ijp , the flux of the input/output reaction j for a certain pathway p .A general configuration of a node N i in MG is shown in Figure 1. The incomingarcs (i.e., I in Definition 2) in each N i produce the metabolite i and the outgo-ing arcs (i.e., O in Definition 2) consume it. Therefore, consumed metabolites, m i , and produced metabolites, m i , contributing to reaction j as shown in Equa-tion 1, are as arcs between N i nodes each associated to one metabolite. I arcsand O arcs in N i nodes are related to each other by reaction tags in Definition3. r j : m ` m ` ... ` m i ` ... é m ` m ` ... ` m i ` ... (1)4 m -1 0 1 m' -1 j' r' -10 j r -1 j +1 r j : A+B C+D A B C D BC D B A C D AC D C D A B DA B D C A B CA B (a) (b) Figure 1: (a) A general configuration of a node N i in a modified AND/OR graphas defined in Definitions 1, 2 and 3. The node has r input reactions and r outputreactions. The maximum number of AND metabolites for an input reaction j and for an output reaction j is considered as m and m , respectively. (b) Theway the reversible reaction r j is stored in an MG model: MG = { A,B,C,D } ;since the reaction is reversible, it is added to both I and O array of each node.For example for node A , I M j “ t C, D u and I " M j “ t B u ; O M j “ t C, D u and O " M j “ t B u . 5 .2 GB-EFM: Graph-Based EFM Analysis Algorithm The proposed algorithm for finding EFMs of a metabolic network is based onthe following facts:1. Starting from boundary metabolites, each incoming flux in a pathwayproduce some metabolite(s), and each produced metabolite should be con-sumed and the flux of the contributed reactions can be obtained accordingto the stoichiometric coefficients in order to make sure that the constructedpathway is in steady-state.2. To guarantee the elementarity of the produced pathways, two rules arefollowed: (1) Only one output from the node N i should be considered whenconstructing a pathway and (2) multi-path condition should be checked ina node which has more than one input and one output. The multi-pathcondition occurs in a node when independent pathways share part of apathway while traversing the graph.3. When one metabolite is consumed by a reaction, the presence of its AND-related metabolites are required. For example, when metabolites A and B are AND-related, if A is consumed during the reaction A + B Ñ C + D ,then B should also be present which means A and B should be con-sumed simultaneously. Therefore, the algorithm is designed to have aforward/backward flow for the produced/consumed metabolites. In thisexample, the metabolites A and B are consumed while C and D are pro-duced.The proposed graph-based EFM analysis algorithm is composed of the followingfive steps: Step 0. Construction of MG.
The graph MG is constructed based on thestoichiometric matrix as stated in Algorithm 1. Step 1. Finding independent pathways with dependent nodes.
Inthis step, starting from the metabolites connected to the boundary reactions,all possible pathways are constructed by traversing the internal metabolites.Proposition 1 shows that the pathways are constructed with minimum numberof possible reactions using Definition 4 and Definition 5. The semi-minimalityoccurs since different pathways are constructed from different outputs and eachnode is visited multiple times only if required.
Definition 4. Forward/Backward Flow.
The forward flow is applied forthe output metabolites of the selected reaction and the backward flow is appliedfor the and-related metabolites of that reaction.
Definition 5. Primary and Secondary Reactions.
On each pathway, eachnode has a primary input and a primary output which directs the flow of thepathway and tags the metabolite as visited . When an edge (i.e., a reaction)enters an already visited node in that pathway in a forward/backward flow asan input/output, that reaction is tagged as a secondary input/output of thatnode. 6
LGORITHM 1:
Construction of Modified AND/OR Graph.
Input: S : The stoichiometric matrix of the internal metabolites in a metabolicnetwork and an array V representing reversible reactions Output:
The equivalent MG of S for all metabolites i , ď i ď M ´ , the rows of S dofor all reactions j , ď j ď R ´ , the columns of S do Create node N i . for all irreversible reactions doif S ij ą then add an input to N i with the reaction tag j . else if S ij ă then add an output to N i with the reaction tag j . endendfor all reversible reactions v i P V do Add both an input and an output to N i with the reaction tag j . ˚ endendend *The model demonstrates a hierarchy. Each node has a set of reactions (tagged as input or output)and each reaction itself has an array of input and-related metabolites and an array of output and-related metabolites. When we add a reaction to both input array and output array of a metabolite,each input/output becomes an object with the same tag j , that is the reaction name. This affirmshow AND/OR information is preserved in the model. Proposition 1.
Starting from metabolites connected to the boundary reactions, r pathways are constructed from each node where r is the number of output/inputreactions of that node in the forward/backward flow. The pathways are semi-minimal based on the following statements: • each output/input reaction in forward/backward flow is traversed once andlabeled as primary , and • when an edge enters an already visited node, the pathway is considered asclosed in the node and the edge is labeled as secondary . In the forward flow of a node, for each output reaction, an independentpathway is constructed. Each pathway is traversed independently of the otherpathways. In the backward flow of a node, for each input reaction an indepen-dent pathway is constructed. In Figure 2 the forward/backward flow of eachnode N i is shown in an illustrative example. The information of each pathwayis saved independently. Step 2. Eliminate multi-way pathways.
By using Proposition 2 and elim-inating multi-way pathways (defined in Definition 7) which include nodes withmulti-path condition (defined in Definition 6), the set of semi-minimal pathsis reduced to the set of minimal pathways. An example of this is shown inFigure 3. 7 orward Backward N i N i Figure 2: Constructing possible pathways from outputs/inputs in for-ward/backward flow in Step 1. In the forward/backward flow, from the pri-mary input/output illustrated by the thick arrow, two independent pathwaysare constructed along each output/input reaction illustrated by dashed arrows.
Definition 6. Multi-Path Condition.
This is a condition that occurs in anode with more than one input and more than one output and two independentpathways from different origins share this node.
Definition 7. Multi-Way Pathways.
Semi-minimal pathways that containreactions from two independent pathways are referred to as multi-way pathways.
Proposition 2.
Since different pathways are constructed from different outputsof a node and there are both forward and backward flows, depending on the orderof traversing the nodes, there exist multi-way pathways containing reactions fromtwo independent pathways. The pathway is minimal if it is not a multi-waypathway. In other words, it has no node with multi-path condition applied to it.
To solve the multi-path condition, for each produced pathway in Step 1, wetraverse the subgraph of that pathway and tag each reaction with its associatedmetabolites. Since the structure of the pathway is known, one can label theoutgoing/ingoing reactions in forward/backward flow of a pathway.Then, in a pathway, for each node with more than one input and morethan one output reaction, if one can find two routes crossing the node and theorigins of the routes are different, the pathway should be discarded. This ischecked according to the labels, i.e., r node.pathN um s , which has been assignedto each reaction. pathN um is a variable which counts the number of pathwaysgenerated by a node. In other words, this pathway can be functional by eitherof the found routes in the node. Step 3. Adjusting reaction fluxes.
In this step, every node i in a pathway p is examined and the fluxes of the primary input and the primary output ofthat node are calculated using the following equations:8 y x y x y x y x y x y x y x y x y x y x y [ x. ] [ x. ][ y. ] [ y. ] r start r end (a) (b) Figure 3: An example of the multi-path condition and a multi-way path. (a)This figure shows how the order of traversing the nodes may cause a multi-pathcondition in node y . (b) In general, starting from node x , two paths are con-structed and their reactions are labeled as x. x.
2. After arriving at node y from each of the paths, two other paths are constructed and their reactions arelabeled as y. y.
2. Altogether four different paths are obtained, includingthe reaction pairs ( r x. s , r y. s ), ( r x. s , r y. s ), ( r x. s , r y. s ) and ( r x. s , r y. s ). Ascan be seen in the graph, the paths with reactions labeled with 1 are convergedto each other and the paths which their reactions labeled with 2 are convergedto each other as well. Therefore, two of the four paths passing ( r x. s , r y. s ),( r x. s , r y. s ) in fact belong to the same metabolic pathway but the two otherpaths, path 1 shown by solid lines and path 2 shown by dashed lines, are in-dependent. The starting and ending reactions belong to both paths. Step 2identifies this by labeling the outputs and finding the multi-path nodes to elim-inate multi-way paths which are a combination of independent paths.9 f ijp “ I fijp I cij O cij , F orwardI f ijp “ O fijp O cij I cij , Backward (2)In Equation 2, j refers to the primary input/output reaction. Proposition 3explains how all reactions in a pathway p get a flux. Proposition 3.
Starting from a boundary reaction with a certain flux, all nodesare being traversed and all reactions in a pathway get a flux using Equation 2, aseach node has a primary input and a primary output to be used in this equation.Equation 2 states that the amount of the incoming and outgoing fluxes in N i should be equal. Step 4. Balancing flux.
After Step 3, the consumption/production balance ofthe nodes with secondary reactions may be disturbed. To fix this, we go throughthese nodes and find a way forward/backward to output/input reactions torecalculate the extra production/consumption of the unbalanced nodes. UsingProposition 4, the pathways with no balance rate are discarded. If ř I c ij I f ijp ´ ř O c ij O f ijp “
0, the balance condition is accepted for this node. Otherwise,on this pathway, the extra flux is added to the next reactions on the pathway,towards the boundary reactions using Equation 3. In this case, the differencebetween the input rates and the output rates is equal to I c ie I f iep or O c ie O f iep where e implies the reaction which causes the extra production/consumption ofthe N i . The old/new value of the flux is shown by o / n labels in Equation 3 and j refers to the primary input/output reaction. I f p n q ijp “ I f p o q ijp I cij ´ I fiep I cie I cij ,I f p n q ijp “ I f p o q ijp I cij ` O fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ` O fiep I cie O cij ,O f p n q ijp “ O f p o q ijp O cij ´ I fiep I cie O cij (3) Proposition 4.
Flux dependencies can prevent the rate of the production/consumptionof an internal metabolite in a given topology of a pathway from being zero. Equa-tion 3 is used to calculate an update for the consumption/production rate of thenode. In the case of • I f p n q ijp “ or O f p n q ijp “ or • a repetitive loop over a node (i.e., getting back to a node from the samereaction multiple times), while trying to find a way out in a subgraph of agiven pathway, y x. x. y. y. r r r r r r Figure 4: An illustration of Steps 3 and 4 on a sample graph. The pairs ( r , r ),( r , r ), ( r , r ), ( r , r ), ( r , r ) act as primary reactions (input, output) for thefive nodes in Step 3. The two nodes with dashed secondary reactions updateproduction/consumption of metabolites for r / r in Step 4. the pathway is discarded. An illustration of Steps 3 and 4 on a sample graph is shown in Figure 4.The proposed GB-EFM analysis algorithm is presented in Algorithm 2 andis summarized in Theorem 1.
Theorem 1.
Starting from boundary nodes in a metabolic network, GB-EFManalysis algorithm produces all minimal flux-balanced pathways connecting inputand output metabolites with the following properties: • R k Ć R l , for all P k and P l , k, l P r , P ´ s k ‰ l . • ř j I f ijk I c ij “ ř j O f ijk O c ij , for all nodes N i , ď i ď M ´ .where P k refers to a pathway ď k ď P ´ and R k is the set of contribut-ing reactions in the pathway. The above properties indicate that the producedpathways are elementary flux modes.Proof. Step 0 provides the required interface from the stoichiometric matrix.By using Propositions 1 and 2, semi-minimal and then minimal pathways areproduced, respectively. According to Propositions 3 and 4, steady-state condi-tion is checked for the given pathways and fluxes are calculated. Non-qualifiedpathways are discarded in Steps 2 and 4 based on Propositions 2 and 4. Theremaining pathways have minimum functional reactions with balanced fluxesas elementary flux modes. Therefore, the produced pathways are a subset ofEFMs connecting input and output external metabolites.A complete framework of Algorithm 2 is illustrated in Figure 5.11
LGORITHM 2:
Graph-Based EFM Analysis Algorithm.
Input:
The MG graph of a metabolic network derived from its given stoichiometric matrix. Output:
The set of elementary flux modes of the given network from input metabolites to outputmetabolites.Store all boundary metabolites of the network with their associated reactions in the queue Q . for each entry Q i in the queue, ď i ď size p Q q ´ dofor each O j / I j , in forward/backward flow do Start a new path for each reaction j as a primary output/input. If the reaction isreversible, disable the reverse input/output direct. if N i R P then Add N i to P accordingly for each path. else Add j to N i as a secondary output/input. end Push back O Mi / I Mi to Q to be considered in the forward path of the algorithm.Push back O " Mi / I " Mi to Q to be considered in the backward path of the algorithm.Repeat until Q = H . endendfor all P k , ď k ď P ´ dofor all N i in P k doif j ¿ 1 thenfor all O j / I j , ď j ď r ´ in forward/backward flow do Add label N ij to O j / I j . endelse Pass the label from input/output to output/input in forward/backward flow. endendfor all N i in P k with more than two inputs and two outputs doif N x P Ť M N m where M contains all visited metabolites associated with the reaction j thenif a cross match for N xy and N xz in the inputs is found as well as a match in theoutputs then Discard P k . endendendendfor all remaining P k , ď k ď P ´ dofor all N i in P k do Apply Eq. 1. endendfor all remaining P k , ď k ď P ´ dofor all N i in P k with Secondary inputs/outputs do Calculate ř j I fiek I cie “ ř j I fijk I cij ´ ř j O fijk O cij . if ř j I fiek I cie ‰ then Use Eq. 2 to pass the flux to the primary input/output reaction of N i .Repeat until reaching the boundary reactions. endendend r r r r r r r r
01 2 34 6 57 r r r r r r r r r Algorithm 1.Step 1.Step 2.Step 3.Step 4.
Elementary Flux Modes
Step 0.
Figure 5: An example illustrating Steps 0 to 4 on a simple network. In Step 1,three different paths are constructed from a given MG produced in Step 0. Thepaths are divided as we go further in the graph, as shown by dark-colored nodeschanging to light-colored nodes. In Step 2, for the first two paths from left,there is no node with more than one input and one output. Therefore, multi-path condition is not applied. However, the third path is constructed fromthe first two paths. The internal nodes of each of these two paths are shown bydifferent colors. Multi-path condition is checked in their common node shown byoverlapped circles. As a result, this path is discarded. In Step 3, for each path,all nodes are traversed and primary inputs and primary outputs, illustrated byred solid lines, get a flux. In Step 4, nodes with secondary input(s)/output(s)are traversed again to pass the extra flux to the boundary reactions. Secondaryinput(s)/output(s) are illustrated by dotted lines and their associated nodes areillustrated by red circles. According to the coefficients, as shown in the givenstoichiometric matrix embedded in the graph, the second path cannot get astable flux value for all reactions and is discarded. The final output after Step 4is one EFM. 13 Results and Discussion
To calculate the computational complexity of Algorithm 2, the complexity ofeach step is analyzed below: • In Step 1, the maximum number of created pathways is calculated as P max “ M B ˆp r max q M z B , in which M B is the number of boundary metabo-lites, M z B is the number of all other internal metabolites except the bound-ary ones and r max is the maximum number of input/output reactions forall nodes. • In Step 2, for each pathway P i , there are two traversals with the com-plexity of (1) M P i , the number of all nodes in a pathway, and (2) M t P i ,the number of all tagged nodes with more than one input and one out-put in that pathway. Therefore, the order of maximum traversals is P ˆ r M x P i ˆ M P i s or P ˆ M where M P i and M t P i ď M . • In Step 3, all nodes in P i should be traversed which requires P ˆ M P i iterations. P ď P because some pathways may be discarded in Step 2.Therefore, the worst-case computational complexity for this step wouldbe P ˆ M . • In Step 4, for each P i and for all nodes with secondary reactions, (i.e., M v P i nodes), at least half of the nodes should be traversed to lead the extraproduction/consumtion of metabolites to input/output. Therefore, P ˆr M v P i ˆ M P i { s is the computational complexity of this step. Since P ď P and M v P i , M P i { ď M , P ˆ r M { s is the computational complexity inthe worst-case.Considering all items above, P ˆ O p M q , or as a result r max M ˆ O p M q is the computational complexity of the algorithm if we assume M B ! M . Thecomputational complexity of the approach can be considered as an upper-boundfor the number of EFMs including external reactions. Three different topologies for an EFM can be observed in metabolic networks: • Internal pathways with no boundary reaction(s), • pathways with only input or only output reaction(s) consisting of an in-ternal loop, • pathways from input reaction(s) to output reaction(s).These topologies are illustrated in Figure 6. The main focus of Algorithm 2 isthe third topology. However, in order to expand this focus to all EFMs, the14 a)(c)(b) Figure 6: Categorization of EFM topologies as explained in Section 4.2. Anexample of: (a) a “futile cycle” EFM, (b) an inconsistent route: a path withonly input reaction consisting of an internal loop, and (c) a consistent non-cyclicEFM: a straight path (connecting external metabolites).algorithm can be easily modified to keep pathways without the contributionof output reactions (in Case 2) rather than discarding them, or starting frominternal reactions (in Case 1).However, from the biological point of view, internal pathways with no bound-ary reactions, (i.e., loops) are called “futile cycles” and are not biotechnologicallyrelevant [28], and the pathways with boundary inputs and no output implicateinconsistency in the network [29]. Therefore, a meaningful subset of EFMs aretargeted here to keep more relevant EFMs in the output.
To demonstrate the functionality of the model, the GB-EFM method has beenimplemented in C++ and tested on an Intel Core-i5 CPU with 4 GB RAM. Thesource code and results are provided on https://github.com/marabzadeh/GB-EFM. A small metabolic network comprising tricarboxylic acid cycle, glyoxylateshunt and adjacent reactions of amino acid synthesis in
E. coli [30] is consideredto describe the steps of the algorithm and proof the concept. The network hasone boundary input reaction and four boundary output reactions among all25 reactions and 18 internal metabolites as shown in Table 1. Reactions R , R , R and R are boundary output reactions. The elementary flux modeswere obtained using EFMtool [16] as well as the proposed method. In thecase of considering all reactions as irreversible, EFMtool produces 12 EFMs inwhich 3 are cycles, 2 are inconsistent routes with only an input reaction andno output (mentioned in [29] as elementary leakage modes) and the other 7are consistent, non-cyclic pathways. Considering the reversible reactions in the15able 1: The list of reactions for the sample network of E. coli model andthe resulting EFMs obtained by GB-EFM. External output metabolites andcofactors are marked by asterisks. The external metabolite consumed by theinput boundary reaction R is marked by two asterisks. Reactions EFMs ˚ ñ
Pyr + ATP ˚ ˚ + CoA ñ AcCoA + CO2 ˚ + NADH ˚ ñ Cit + CoA 0 0 1 2 1 2 1 1 0 1 0R3 Icd IsoCit + NADP ˚ ñ
OG + CO2 ˚ + NADPH ˚ ˚ + CoA ñ SucCoA + CO2 ˚ + NADH ˚ ñ Succ + Gly 0 0 0 1 1 1 0 1 0 1 0R6 Mas Gly + AcCoA ñ Mal + CoA 0 0 0 1 1 1 0 1 0 1 0R7 AspCon Asp ñ Asp ex ˚ ñ Fum + NH3 ˚ ˚ ñ PEP + ADP ˚ + CO2 ˚ ˚ ñ OAA 1 0 1 0 0 0 1 0 1 1 1R11 Pps Pyr + ATP ˚ ñ
PEP + AMP ˚ ñ Glu ex ˚ ñ Ala ex ˚ ñ Suc ex ˚ + CoA 0 0 0 0 0 1 1 1 1 2 1R15 Eno PG ˚˚ ô PEP 1 1 2 3 2 3 2 2 1 3 1R16 Acn Cit ô IsoCit 0 0 1 2 1 2 1 1 0 1 0R17 SucCD SucCoA + ADP ˚ ô
Succ + ATP ˚ + CoA 0 0 0 0 0 0 0 -1 -1 -2 -1R18 Sdh Succ + FAD ˚ ô Fum + FADH2 ˚ ô Mal 0 0 0 1 1 1 0 0 0 -1 -1R20 Mdh Mal + NAD ˚ ô
OAA + NADH ˚ ô Asp + OG 1 0 0 0 1 0 0 0 1 0 0R22 Gdh OG + NH3 ˚ + NADPH ˚ ô Glu + NADP ˚ ô Ala + OG 0 1 0 0 0 0 0 0 0 0 0R24 SucEx Suc ô Suc ex ˚ table as reversible, 16 EFMs are produced, in which 3 and 2 are cycles andinconsistent routes belonging to the first and the second category as definedin Section 4.2, respectively, and 11 are pathways from input to an output asproduced by GB-EFM. All information are provided in Supplementary file 1.The information of inconsistent routes with further description are provided inSupplementary file 2.The number of pathways produced by the proposed algorithm after Step 1 is40. After Step 2, this is reduced to 30 and after applying Steps 3-4, 5 pathways(final EFMs) remain. These pathways are the same as the eleven “acyclic”EFMs obtained by EFMtool. The resulting EFMs are illustrated in Figure 7.Besides, a set of moderate-sized metabolic networks are considered as furthertest cases. The first network is the pentose phosphate pathway in trypanosomabrucei [32], the second one is the simple network of human red blood cellmetabolism [33] and the third one is an E. coli core model obtained from [34].GB-EFM found all EFMs from
Glucose as reported in Table 2. Computationaltime for all the networks was less than one second. The currency metabolitessuch as ADP, ATP, AMP, CO , H O, O , H , NH , Pi, NAD, NADH, NADPand NADPH were removed from the input stoichiometry matrix of all samples.As stated in the literature [35, 18], some simplifications to the system ' s modelcan be performed in order to reduce the complexity of the problem, such assetting currency metabolites like cofactors and energy metabolites to external.According to [35], for energy currency metabolites like ATP, NADH and FADH,since their concentration is assumed to be constant they are not required to bebalanced by an EFM. PG is an input metabolite and Asp ex, Glu ex, Ala ex16igure 7: The resulting EFMs of the metabolic network comprising tricarboxylicacid cycle, glyoxylate shunt and adjacent reactions of amino acid synthesis in E. coli [30]. Abbreviations are explained in [30]. EFMs are depicted usingEscher [31]. (a) “futile cycle” EFMs, (b) inconsistent routes: paths with onlyinput reaction consisting of internal loops, with no output reaction, and (c)consistent non-cyclic EFMs: paths connecting external metabolites; producedby both EFMtool and GB-EFM. PG is an input metabolite shown by pinkellipse and Asp ex, Glu ex, Ala ex and Suc ex are output metabolites shownby purple squares. External and internal metabolites are illustrated by largercircles and currency metabolites by smaller ones.17able 2: Number of EFMs starting from
Glucose in some networks.
Network Size ( m ˚ n ) T. brucei ˚
35 4Human red blood cell 20 ˚
50 20
E. coli core (Escherichia coli str. K-12 substr. MG1655) 53 ˚
94 47 and Suc ex are output metabolites.
According to the graph data model, the order of traversing the nodes and therelations between reactions and metabolites can be maintained for each pathway.Besides, decisions to select or eliminate a particular reaction or metabolite canbe made during the algorithm. Since in this approach the traversal is on thegraph, many pathways are produced which may be discarded afterwards.The proposed data model gives the opportunity of different levels of paral-lelism to trace pathways. Therefore, the complexity of implementation can bereduced by optimizing the method to only focus on pathways that are desirablein EFM analysis and by exploiting advantages of the possible parallelism in themethod. The graph structure makes a good track of unbalanced metabolites.Our approach tries to decide if a certain “pathway” is EFM or not, merely basedon topology rules (instead of recognizing the elementarity through rank test orcomparing new candidates with produced ones).Our strategy brings up the opportunity of object-oriented programming byconsidering a metabolite as an object. The method at least in its present form,might not be applicable to large-scale networks. However, the graph structuremakes it easier to set-up rules for pathways and explore the intended solutionspace via rules. The proposed method can potentially be implemented througha system design approach on hardware more conveniently.A hypergraph is a simplified type of the modified AND/OR graph. Theapplication of hypergraphs in biological networks has been reviewed in [36]. Inthe introduced model, each arc has two different input and output coefficients.Besides, for each arc, a dynamic label for each pathway is defined, that is, theflux of the reaction represented by that arc. The proposed framework can beconsidered as an extension to the well-known flow network problem for hyper-graphs. The edges are directed and each has been associated with a weight.However, the weight of an output arc of a node is different from the weightwhen the arc enters another node as an input. Besides, the hypergraph struc-ture complicates pathways topology [37]. The complication arises since fluxesof contributing input and output reactions over a node may be related. This re-sults to the non-linear property of fluxes. The non-linearity has been consideredin our flux calculation procedure (see Eq. 2 and Eq. 3).The first introduced double-description method for finding EFMs exploresthe whole set of pathways to find pathways that are in the steady-state solution18pace and then selects EFMs by comparing the reaction subset of pathways [5].In the improved double-description versions of the method which use null-spaceof the stoichiometry matrix as an input, different combinations of pathwaysin the null-space are calculated and then the reaction subset of pathways arecompared [12, 13, 14]. The main difference of the GB-EFM method with double-description-based methods is that GB-EFM first calculates the shortest path-ways in the solution space and then checks to see if these pathways can be inthe steady-state condition by using some rules on the topology of the pathway.The method of [20] explores the shortest pathways in the AND/OR graphof the network. The authors did not take into account the stoichiometry coeffi-cients. Thus, the resulting pathways may or may not be EFM. In the methodof [21], all reaction combinations to balance a metabolite are calculated and allEFMs are computed accordingly. The order of traversing unbalanced metabo-lites is based on the graph model. The proposed model in this paper traversesthe graph and keeps both the stoichiometry information and the minimality ofthe pathway with the opportunity of not exploring the whole solution space.The methods based on linear programming usually restrict their method to in-teger variables [18] which is limiting. In the case of the proposed method thisis not a serious concern.The presented approach, as discussed earlier, aims at finding EFMs by spe-cific characteristics, i.e., the third category is Section 4.2. This brings up the factthat we are looking for new ways to reduce the solution space. EFMtool and themethod of [21] target the whole set of EFMs, which means either they find thecomplete set of EFMs of a metabolic network (which is a lot of non-categorizedoutput information) or they fail to find any EFM. In comparison, our methodis looking for a meaningful subset of EFMs which leads to a categorized outputset with the cost of losing speed.In several approaches such as [16, 18], reversible reactions are considered astwo irreversible reactions and the two-cycle EFMs are being removed duringthe EFM calculation. As it has been studied in [38], by splitting reversiblereactions, the size of the flux cone description increases by one. However, inthe proposed method, while reversible reactions are added to both input andoutput of a metabolite, they are treated as reversible routes. When a reversibleroute is used in one direction, the other direction is disabled. Consequently, theproposed method preserves the dimension of the flux cone.
In order to calculate the elementary flux modes of a given metabolic network,an algorithm was proposed based on a modified AND/OR graph. It was shownin the paper that the steps in the algorithm leads to exact pathways accordingto EFM definition. The worst-case computational complexity of the algorithmwas calculated as O p r max M ˆ M q which is a newly reported upper-bound forthe number of EFMs with external metabolites. Additionally, a set of test caseswas provided to prove the concept of the algorithm.19he main focus of this paper was to introduce a model to find minimal flux-balanced pathways in metabolic networks. The model can also be applicableto several constraint-based pathway analysis approaches. Using the potentialparallelism in the method to speed-up the algorithm on a hardware structureand proposing different categorization of EFMs based on reactions/metabolitescharacteristics, as is possible according to the graph data model, are consideredas our future research. Acknowledgements
Authors would like to thank Dr. Nathan Lewis (UCSD) for helpful discussionson the concept.
References [1] Christopher S Henry, Matthew DeJongh, Aaron A Best, Paul M Frybarger,Ben Linsay, and Rick L Stevens. High-throughput generation, optimiza-tion and analysis of genome-scale metabolic models.
Nature biotechnology ,28(9):977–982, 2010.[2] Nathan D Price, Jennifer L Reed, and Bernhard Ø Palsson. Genome-scale models of microbial cells: evaluating the consequences of constraints.
Nature Reviews Microbiology , 2(11):886–897, 2004.[3] Jeffrey D Orth, Ines Thiele, and Bernhard Ø Palsson. What is flux balanceanalysis?
Nature biotechnology , 28(3):245–248, 2010.[4] Edda Klipp, Ralf Herwig, Axel Kowald, Christoph Wierling, and HansLehrach.
Systems biology in practice: concepts, implementation and appli-cation . John Wiley & Sons, 2008.[5] Stefan Schuster and Claus Hilgetag. On elementary flux modes in bio-chemical reaction systems at steady state.
Journal of Biological Systems ,2(02):165–182, 1994.[6] Stefan Schuster, David A Fell, and Thomas Dandekar. A general definitionof metabolic pathways useful for systematic organization and analysis ofcomplex metabolic networks.
Nature biotechnology , 18(3):326–332, 2000.[7] S Schuster, S Klamt, W Weckwerth, F Moldenhauer, and T Pfeiffer. Useof network analysis of metabolic systems in bioengineering.
Bioprocess andBiosystems Engineering , 24(6):363–372, 2002.[8] Devesh Radhakrishnan, Meghna Rajvanshi, and KV Venkatesh. Pheno-typic characterization of corynebacterium glutamicum using elementarymodes towards synthesis of amino acids.
Systems and synthetic biology ,4(4):281–291, 2010. 209] K Parvatham and L Veerakumari. Drug target prediction using elementarymode analysis in ascaris lumbricoides energy metabolism.
Biotechnologyand bioprocess engineering , 18(3):491–500, 2013.[10] Daniel Machado and Markus J Herrg˚ard. Co-evolution of strain designmethods based on flux balance and elementary mode analysis.
MetabolicEngineering Communications , 2:85–92, 2015.[11] Jason A Papin, Nathan D Price, Sharon J Wiback, David A Fell, andBernhard O Palsson. Metabolic pathways in the post-genome era.
Trendsin biochemical sciences , 28(5):250–258, 2003.[12] C Wagner. Nullspace approach to determine the elementary modes of chem-ical reaction systems.
The Journal of Physical Chemistry B , 108(7):2425–2431, 2004.[13] Robert Urbanczik and Carl Wagner. An improved algorithm for stoi-chiometric network analysis: theory and applications.
Bioinformatics ,21(7):1203–1210, 2005.[14] Lake Ee Quek and Lars K Nielsen. A depth-first search algorithm to com-pute elementary flux modes by linear programming.
BMC systems biology ,8(1):1, 2014.[15] Julien Gagneur and Steffen Klamt. Computation of elementary modes: aunifying framework and the new binary approach.
BMC bioinformatics ,5(1):1, 2004.[16] Marco Terzer and J¨org Stelling. Large-scale computation of elementaryflux modes with bit pattern trees.
Bioinformatics , 24(19):2229–2235, 2008.[17] Axel Von Kamp and Stefan Schuster. Metatool 5.0: fast and flexible ele-mentary modes analysis.
Bioinformatics , 22(15):1930–1931, 2006.[18] Luis F De Figueiredo, Adam Podhorski, Angel Rubio, Christoph Kaleta,John E Beasley, Stefan Schuster, and Francisco J Planes. Computing theshortest elementary flux modes in genome-scale metabolic networks.
Bioin-formatics , 25(23):3158–3165, 2009.[19] Laszlo David and Alexander Bockmayr. Computing elementary flux modesinvolving a set of target reactions.
IEEE/ACM Transactions on Computa-tional Biology and Bioinformatics , 11(6):1099–1107, 2014.[20] Jose Francisco Hidalgo C´espedes, Francisco De As´ıs Guil Asensio, and JoseManuel Garc´ıa Carrasco. A new approach to obtain efms using graphmethods based on the shortest path between end nodes. In
InternationalConference on Bioinformatics and Biomedical Engineering , pages 641–649.Springer, 2015. 2121] Ehsan Ullah, Shuchin Aeron, and Soha Hassoun. gEFM: an algorithmfor computing elementary flux modes using graph traversal.
IEEE/ACMTransactions on Computational Biology and Bioinformatics , 13(1):122–134,2016.[22] Vicente Acu˜na, Flavio Chierichetti, Vincent Lacroix, Alberto Marchetti-Spaccamela, Marie-France Sagot, and Leen Stougie. Modes and cuts inmetabolic networks: Complexity and algorithms.
Biosystems , 95(1):51–60,2009.[23] Vicente Acu˜na, Alberto Marchetti-Spaccamela, Marie-France Sagot, andLeen Stougie. A note on the complexity of finding and enumerating ele-mentary modes.
Biosystems , 99(3):210–214, 2010.[24] Steffen Klamt and J¨org Stelling. Combinatorial complexity of pathwayanalysis in metabolic networks.
Molecular biology reports , 29(1-2):233–236,2002.[25] Sabine P´er`es, Fran¸cois Vall´ee, Marie Beurton-Aimar, and Jean-PierreMazat. Acom: a classification method for elementary flux modes basedon motif finding.
BioSystems , 103(3):410–419, 2011.[26] Matthias P Gerstl, David E Ruckerbauer, Diethard Mattanovich, Chris-tian Jungreuthmayer, and J¨urgen Zanghellini. Metabolomics integratedelementary flux mode analysis in large metabolic networks.
Scientific re-ports , 5:8930, 2015.[27] J¨urgen Zanghellini, David E Ruckerbauer, Michael Hanscho, and Chris-tian Jungreuthmayer. Elementary flux modes in a nutshell: properties,calculation and applications.
Biotechnology journal , 8(9):1009–1016, 2013.[28] Anthony P Burgard, Shankar Vaidyaraman, and Costas D Maranas. Min-imal reaction sets for
Escherichia coli metabolism under different growthrequirements and uptake environments.
Biotechnology progress , 17(5):791–797, 2001.[29] Albert Gevorgyan, Mark G Poolman, and David A Fell. Detection ofstoichiometric inconsistencies in biomolecular models.
Bioinformatics ,24(19):2245–2251, 2008.[30] Stefan Schuster, Thomas Dandekar, and David A Fell. Detection of ele-mentary flux modes in biochemical networks: a promising tool for pathwayanalysis and metabolic engineering.
Trends in biotechnology , 17(2):53–60,1999.[31] Zachary A King, Andreas Dr¨ager, Ali Ebrahim, Nikolaus Sonnenschein,Nathan E Lewis, and Bernhard O Palsson. Escher: a web applicationfor building, sharing, and embedding data-rich visualizations of biologicalpathways.
PLoS Comput Biol , 11(8):e1004321, 2015.2232] Eduard J Kerkhoven, Fiona Achcar, Vincent P Alibu, Richard J Burch-more, Ian H Gilbert, Maciej Trybi(cid:32)lo, Nicole N Driessen, David Gilbert,Rainer Breitling, Barbara M Bakker, et al. Handling uncertainty in dy-namic models: the pentose phosphate pathway in trypanosoma brucei.
PLoS Comput Biol , 9(12):e1003371, 2013.[33] Sharon J Wiback and Bernhard O Palsson. Extreme pathway analysisof human red blood cell metabolism.
Biophysical Journal , 83(2):808–818,2002.[34] Zachary A King, Justin Lu, Andreas Dr¨ager, Philip Miller, StephenFederowicz, Joshua A Lerman, Ali Ebrahim, Bernhard O Palsson, andNathan E Lewis. Bigg models: A platform for integrating, standardizingand sharing genome-scale models.
Nucleic acids research , 44(D1):D515–D522, 2016.[35] Christoph Kaleta, Luıs Filipe De Figueiredo, J¨orn Behre, and Stefan Schus-ter. EFMEvolver: Computing elementary flux modes in genome-scalemetabolic networks. In
Lecture Notes in Informatics-Proceedings , volume157, pages 179–189, 2009.[36] Steffen Klamt, Utz-Uwe Haus, and Fabian Theis. Hypergraphs and cellularnetworks.
PLoS computational biology , 5(5):e1000385, 2009.[37] Sayed Amir Marashi and Mojtaba Tefagh. A mathematical approach toemergent properties of metabolic networks: Partial coupling relations, hy-perarcs and flux ratios.
Journal of theoretical biology , 355:185–193, 2014.[38] Abdelhalim Larhlimi and Alexander Bockmayr. On inner and outer descrip-tions of the steady-state flux cone of a metabolic network. In