[PDF] A Graph-Based Approach to Analyze Flux-Balanced Pathways in Metabolic Networks

Abstract

An Elementary Flux Mode (EFM) is a pathway with minimum set of reactions that are functional in steady-state constrained space. Due to the high computational complexity of calculating EFMs, different approaches have been proposed to find these flux-balanced pathways. In this paper, an approach to find a subset of EFMs is proposed based on a graph data model. The given metabolic network is mapped to the graph model and decisions for reaction inclusion can be made based on metabolites and their associated reactions. This notion makes the approach more convenient to categorize the output pathways. Implications of the proposed method on metabolic networks are discussed.

Full PDF

AA Graph-Based Approach to Analyze Flux-Balanced Pathwaysin Metabolic Networks

Mona Arabzadeh , Morteza Saheb Zamani , Mehdi Sedighi ,and Sayed-Amir Marashi Department of Computer Engineering and Information Technology,Amirkabir University of Technology, Tehran, Iran and Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran. { m.arabzadeh,szamani,msedighi } @aut.ac.ir, [email protected] Abstract

An Elementary Flux Mode (EFM) is a pathway with minimum set ofreactions that are functional in steady-state constrained space. Due to thehigh computational complexity of calculating EFMs, diﬀerent approacheshave been proposed to ﬁnd these ﬂux-balanced pathways. In this paper,an approach to ﬁnd a subset of EFMs is proposed based on a graph datamodel. The given metabolic network is mapped to the graph model anddecisions for reaction inclusion can be made based on metabolites andtheir associated reactions. This notion makes the approach more conve-nient to categorize the output pathways. Implications of the proposedmethod on metabolic networks are discussed.

Keywords:

Elementary Flux Mode (EFM); Graph Data Model; Metabolic Network

Metabolic network models are among the well-studied models in biotechnology.The reconstruction of these networks is possible by collecting the gene-protein-reaction information from related genomic data and literature [1]. It is impor-tant to explore biologically relevant pathways in metabolic networks. Forcingconstraints to a reconstructed biochemical network results in the deﬁnition ofachievable cellular functions [2]. Mathematical representation of constraints areas ﬂux-balance constraints (e.g., conservation of ﬂux) which means the networkshould be at the steady-state condition, and ﬂux bounds which limit numer-ical ranges of network parameters and coeﬃcients such as the minimum andmaximum range of ﬂuxes for each reaction. Flux Balance Analysis (FBA), amethod to predict the optimal growth rate of a a certain species when grown onparticular set of metabolites [3], and Elementary Flux Mode (EFM) analysis,an approach to decompose a network to minimal functional pathways [4], areamong the approaches used in constraint-based analysis of metabolic networks.EFMs [5, 6] have been used in several biological applications such as bio-engineering [7], phenotypic characterization [8], drug target prediction [9] andstrain design [10]. The incorporation of kinetic analysis into EFMs enables amore complete description of cellular functions for which kinetics play a domi-nant role [11]. 1 a r X i v : . [ q - b i o . M N ] F e b everal methods were introduced for ﬁnding EFMs based on double-descriptionmethod [5]. Double-description is a technique to enumerate all extreme rays of apolyhedral cone. An improved approach to the primary method was introducedin which the null-space of the stoichiometric matrix is used instead of the matrixitself to generate EFM candidates [12, 13, 14]. Various eﬀective computationalapproaches have been proposed to speed-up previous methods for computingEFMs [15, 16]. Some of these approaches led to the development of computa-tional tools such as Metatool [17] and

EFMtool [16]. Besides, methods basedon linear programming have been proposed that explore a set of EFMs withspeciﬁc properties, such as K -shortest EFMs [18], or EFMs with a given set oftarget reactions [19]. A new set of methods based on graph-theory, [20, 21] triesto overcome the scalability problem of the double-description-based techniques.According to the complexity of computing EFMs as discussed in [22, 23],an additional challenge of extracting biological properties from large set ofEFMs [11] has been raised. As stated in [23], the complexity of enumerating allEFMs still remains unknown and the computational complexity of enumeratingEFMs containing a speciﬁc reaction is hard. Even for networks with the samenumber of reactions and metabolites, knowing the number of EFMs in one net-work cannot necessarily help in ﬁnding the number of EFMs in the other one,as they have diﬀerent connectivities and topologies. This fact emphasizes theinherent structural information reﬂected by EFMs [24] and is a motivation toextract a set of biologically meaningful EFMs according to a biological aspectsuch as motifs [25] and thermodynamics [26].The main focus of this paper is to introduce a data model based on theAND/OR graph and to propose an approach to ﬁnd ﬂux-balanced pathwaysaccording to pathway topology and reaction stoichiometries. It is shown thatthe computed ﬂux-balanced pathways are a biologically relevant subset of EFMswhich include external input and output metabolites. In other words, the subsetcomprises all EFMs connecting input and output external metabolites. Besides,based on the introduced graph data model, an upper-bound for the complex-ity of exploring EFMs containing external metabolites is calculated. Using thetopology of the network gives us the opportunity to make decisions according tothe metabolite/reaction aspects to preserve or eliminate a particular reaction ormetabolite in a certain pathway. Introducing a model to consider both topology and stoichiometry of a metabolic network for network analysis may lead to bet-ter biological decisions and output categorization. The graph structure makesit easier to set-up rules for pathways and to explore the intended solution spacevia rules. Finally, our method can potentially be implemented on hardware(through a system design approach) more conveniently.The rest of the text is organized as follows. Some required concepts and pre-liminaries are provided in Section 2. Before explaining the proposed approachto ﬁnd the elementary ﬂux modes in a given metabolic network (Section 3.2),we ﬁrst describe the modiﬁed AND/OR graph model and its properties in Sec-tion 3.1. Finally, Section 4 is devoted to results and discussions and Section 5concludes the paper. 2 Preliminaries

In this section, some basic concepts along with the formal deﬁnition of elemen-tary ﬂux modes are provided.

Metabolic networks model the metabolism of living cells in terms of a set ofbiochemical reactions. Deﬁnitions in this section are derived from [27]. Thebiochemical reactions can be irreversible, i.e., the reaction can be active onlyin one direction, or reversible, i.e., the reaction can be active in both direc-tions. The contributing metabolites in a reaction can be either substrates orproducts. Substrates are consumed and products are produced during the op-eration of a reaction. The topology of a metabolic network is characterized byits m ˆ n stoichiometric matrix, S , where m and n correspond to the number ofmetabolites and reactions, respectively. In this paper, external metabolites arenot included in S . The value S ij represents the stoichiometric coeﬃcient of themetabolite i in the reaction j . S ij is positive/negative if the metabolite i is pro-duced/consumed. If this coeﬃcient is zero it means that the metabolite i doesnot contribute to the reaction j . The network is considered in the steady-stateif for each internal metabolite, the rates of consumption and production areequal. The reactions connected to the external metabolites are called Boundary reactions.

A ﬂux vector v ‰ and v P R n is considered an EFM if it meets the followingconditions: • v i ě i P irreversible reactions (thermodynamic constraint). • S . v “ (steady-state condition). • There is no v P R n with supp p v q Ă supp p v q , where support of a modeis deﬁned as supp p v q “ t i | v i ‰ u , (minimality limitation). In this section, the proposed graph-based elementary ﬂux mode analysis is dis-cussed in detail. To do so, the data structure used in our approach is deﬁned,and then, the proposed approach is introduced accordingly.

The proposed data model is based on the conventional AND/OR graph in com-puter science. However, we provide a diﬀerent deﬁnition with additional featuresto make the model appropriate for our proposed algorithm. In our model, the3oeﬃcients of the metabolites in reactions are embedded in the graph structureas attributes of each node (explained in Deﬁnition 3, see below). A pathway is deﬁned as a set of reactions with their associated ﬂuxes. The ﬂux of eachreaction in a certain pathway is also embedded in the graph.

Deﬁnition 1.

The Modiﬁed Graph for representing a metabolic network, de-noted as MG , is deﬁned as a set of Nodes , N i , i.e., MG = t N i | ď i ď M ´ u ,where M is the number of internal metabolites in the metabolic network. Deﬁnition 2.

Each node N i in MG is a 3-tuple N = ( i , I , O ) where • i is the tag of a metabolite, • I is an array of input reactions that produce the metabolite and • O is an array of output reactions that consume the metabolite. Deﬁnition 3.

Each I / O in Deﬁnition 2 contains the following data: • The reaction j , 0 ď j ď r ´

1, where r is the number of reactions consum-ing/producing the metabolite i , • I M j / O M j , an array of the metabolites consumed/produced by reaction j .In other words, I M j / O M j = t m kj | ď k ď m ´ u , where m is the numberof consumed/produced metabolites by the reaction j , • I " M j / O " M j , an array of the metabolites produced/consumed by reaction j . In other words, I " M j / O " M j = t m kj | ď k ď m ´ u , where m is thenumber of produced/consumed metabolites by the reaction j excludingthe metabolite i itself, • The direction of the reaction for reversible reactions, • I c ij / O c ij , the coeﬃcient of the reaction j for the produced/consumedmetabolite i in the stoichiometric matrix S and • I f ijp / O f ijp , the ﬂux of the input/output reaction j for a certain pathway p .A general conﬁguration of a node N i in MG is shown in Figure 1. The incomingarcs (i.e., I in Deﬁnition 2) in each N i produce the metabolite i and the outgo-ing arcs (i.e., O in Deﬁnition 2) consume it. Therefore, consumed metabolites, m i , and produced metabolites, m i , contributing to reaction j as shown in Equa-tion 1, are as arcs between N i nodes each associated to one metabolite. I arcsand O arcs in N i nodes are related to each other by reaction tags in Deﬁnition3. r j : m ` m ` ... ` m i ` ... é m ` m ` ... ` m i ` ... (1)4 m -1 0 1 m' -1 j' r' -10 j r -1 j +1 r j : A+B C+D A B C D BC D B A C D AC D C D A B DA B D C A B CA B  (a) (b) Figure 1: (a) A general conﬁguration of a node N i in a modiﬁed AND/OR graphas deﬁned in Deﬁnitions 1, 2 and 3. The node has r input reactions and r outputreactions. The maximum number of AND metabolites for an input reaction j and for an output reaction j is considered as m and m , respectively. (b) Theway the reversible reaction r j is stored in an MG model: MG = { A,B,C,D } ;since the reaction is reversible, it is added to both I and O array of each node.For example for node A , I M j “ t C, D u and I " M j “ t B u ; O M j “ t C, D u and O " M j “ t B u . 5 .2 GB-EFM: Graph-Based EFM Analysis Algorithm The proposed algorithm for ﬁnding EFMs of a metabolic network is based onthe following facts:1. Starting from boundary metabolites, each incoming ﬂux in a pathwayproduce some metabolite(s), and each produced metabolite should be con-sumed and the ﬂux of the contributed reactions can be obtained accordingto the stoichiometric coeﬃcients in order to make sure that the constructedpathway is in steady-state.2. To guarantee the elementarity of the produced pathways, two rules arefollowed: (1) Only one output from the node N i should be considered whenconstructing a pathway and (2) multi-path condition should be checked ina node which has more than one input and one output. The multi-pathcondition occurs in a node when independent pathways share part of apathway while traversing the graph.3. When one metabolite is consumed by a reaction, the presence of its AND-related metabolites are required. For example, when metabolites A and B are AND-related, if A is consumed during the reaction A + B Ñ C + D ,then B should also be present which means A and B should be con-sumed simultaneously. Therefore, the algorithm is designed to have aforward/backward ﬂow for the produced/consumed metabolites. In thisexample, the metabolites A and B are consumed while C and D are pro-duced.The proposed graph-based EFM analysis algorithm is composed of the followingﬁve steps: Step 0. Construction of MG.

The graph MG is constructed based on thestoichiometric matrix as stated in Algorithm 1. Step 1. Finding independent pathways with dependent nodes.

Inthis step, starting from the metabolites connected to the boundary reactions,all possible pathways are constructed by traversing the internal metabolites.Proposition 1 shows that the pathways are constructed with minimum numberof possible reactions using Deﬁnition 4 and Deﬁnition 5. The semi-minimalityoccurs since diﬀerent pathways are constructed from diﬀerent outputs and eachnode is visited multiple times only if required.

Deﬁnition 4. Forward/Backward Flow.

The forward ﬂow is applied forthe output metabolites of the selected reaction and the backward ﬂow is appliedfor the and-related metabolites of that reaction.

Deﬁnition 5. Primary and Secondary Reactions.

On each pathway, eachnode has a primary input and a primary output which directs the ﬂow of thepathway and tags the metabolite as visited . When an edge (i.e., a reaction)enters an already visited node in that pathway in a forward/backward ﬂow asan input/output, that reaction is tagged as a secondary input/output of thatnode. 6

LGORITHM 1:

Construction of Modiﬁed AND/OR Graph.

Input: S : The stoichiometric matrix of the internal metabolites in a metabolicnetwork and an array V representing reversible reactions Output:

The equivalent MG of S for all metabolites i , ď i ď M ´ , the rows of S dofor all reactions j , ď j ď R ´ , the columns of S do Create node N i . for all irreversible reactions doif S ij ą then add an input to N i with the reaction tag j . else if S ij ă then add an output to N i with the reaction tag j . endendfor all reversible reactions v i P V do Add both an input and an output to N i with the reaction tag j . ˚ endendend *The model demonstrates a hierarchy. Each node has a set of reactions (tagged as input or output)and each reaction itself has an array of input and-related metabolites and an array of output and-related metabolites. When we add a reaction to both input array and output array of a metabolite,each input/output becomes an object with the same tag j , that is the reaction name. This aﬃrmshow AND/OR information is preserved in the model. Proposition 1.

Starting from metabolites connected to the boundary reactions, r pathways are constructed from each node where r is the number of output/inputreactions of that node in the forward/backward ﬂow. The pathways are semi-minimal based on the following statements: • each output/input reaction in forward/backward ﬂow is traversed once andlabeled as primary , and • when an edge enters an already visited node, the pathway is considered asclosed in the node and the edge is labeled as secondary . In the forward ﬂow of a node, for each output reaction, an independentpathway is constructed. Each pathway is traversed independently of the otherpathways. In the backward ﬂow of a node, for each input reaction an indepen-dent pathway is constructed. In Figure 2 the forward/backward ﬂow of eachnode N i is shown in an illustrative example. The information of each pathwayis saved independently. Step 2. Eliminate multi-way pathways.

By using Proposition 2 and elim-inating multi-way pathways (deﬁned in Deﬁnition 7) which include nodes withmulti-path condition (deﬁned in Deﬁnition 6), the set of semi-minimal pathsis reduced to the set of minimal pathways. An example of this is shown inFigure 3. 7 orward Backward N i N i Figure 2: Constructing possible pathways from outputs/inputs in for-ward/backward ﬂow in Step 1. In the forward/backward ﬂow, from the pri-mary input/output illustrated by the thick arrow, two independent pathwaysare constructed along each output/input reaction illustrated by dashed arrows.

Deﬁnition 6. Multi-Path Condition.

This is a condition that occurs in anode with more than one input and more than one output and two independentpathways from diﬀerent origins share this node.

Deﬁnition 7. Multi-Way Pathways.

Semi-minimal pathways that containreactions from two independent pathways are referred to as multi-way pathways.

Proposition 2.

Since diﬀerent pathways are constructed from diﬀerent outputsof a node and there are both forward and backward ﬂows, depending on the orderof traversing the nodes, there exist multi-way pathways containing reactions fromtwo independent pathways. The pathway is minimal if it is not a multi-waypathway. In other words, it has no node with multi-path condition applied to it.

To solve the multi-path condition, for each produced pathway in Step 1, wetraverse the subgraph of that pathway and tag each reaction with its associatedmetabolites. Since the structure of the pathway is known, one can label theoutgoing/ingoing reactions in forward/backward ﬂow of a pathway.Then, in a pathway, for each node with more than one input and morethan one output reaction, if one can ﬁnd two routes crossing the node and theorigins of the routes are diﬀerent, the pathway should be discarded. This ischecked according to the labels, i.e., r node.pathN um s , which has been assignedto each reaction. pathN um is a variable which counts the number of pathwaysgenerated by a node. In other words, this pathway can be functional by eitherof the found routes in the node. Step 3. Adjusting reaction ﬂuxes.

In this step, every node i in a pathway p is examined and the ﬂuxes of the primary input and the primary output ofthat node are calculated using the following equations:8 y x y x y x y x y x y x y x y x y x y x y [ x. ] [ x. ][ y. ] [ y. ] r start r end (a) (b) Figure 3: An example of the multi-path condition and a multi-way path. (a)This ﬁgure shows how the order of traversing the nodes may cause a multi-pathcondition in node y . (b) In general, starting from node x , two paths are con-structed and their reactions are labeled as x. x.

2. After arriving at node y from each of the paths, two other paths are constructed and their reactions arelabeled as y. y.

2. Altogether four diﬀerent paths are obtained, includingthe reaction pairs ( r x. s , r y. s ), ( r x. s , r y. s ), ( r x. s , r y. s ) and ( r x. s , r y. s ). Ascan be seen in the graph, the paths with reactions labeled with 1 are convergedto each other and the paths which their reactions labeled with 2 are convergedto each other as well. Therefore, two of the four paths passing ( r x. s , r y. s ),( r x. s , r y. s ) in fact belong to the same metabolic pathway but the two otherpaths, path 1 shown by solid lines and path 2 shown by dashed lines, are in-dependent. The starting and ending reactions belong to both paths. Step 2identiﬁes this by labeling the outputs and ﬁnding the multi-path nodes to elim-inate multi-way paths which are a combination of independent paths.9 f ijp “ I fijp I cij O cij , F orwardI f ijp “ O fijp O cij I cij , Backward (2)In Equation 2, j refers to the primary input/output reaction. Proposition 3explains how all reactions in a pathway p get a ﬂux. Proposition 3.

Starting from a boundary reaction with a certain ﬂux, all nodesare being traversed and all reactions in a pathway get a ﬂux using Equation 2, aseach node has a primary input and a primary output to be used in this equation.Equation 2 states that the amount of the incoming and outgoing ﬂuxes in N i should be equal. Step 4. Balancing ﬂux.

After Step 3, the consumption/production balance ofthe nodes with secondary reactions may be disturbed. To ﬁx this, we go throughthese nodes and ﬁnd a way forward/backward to output/input reactions torecalculate the extra production/consumption of the unbalanced nodes. UsingProposition 4, the pathways with no balance rate are discarded. If ř I c ij I f ijp ´ ř O c ij O f ijp “

0, the balance condition is accepted for this node. Otherwise,on this pathway, the extra ﬂux is added to the next reactions on the pathway,towards the boundary reactions using Equation 3. In this case, the diﬀerencebetween the input rates and the output rates is equal to I c ie I f iep or O c ie O f iep where e implies the reaction which causes the extra production/consumption ofthe N i . The old/new value of the ﬂux is shown by o / n labels in Equation 3 and j refers to the primary input/output reaction. I f p n q ijp “ I f p o q ijp I cij ´ I fiep I cie I cij ,I f p n q ijp “ I f p o q ijp I cij ` O fiep I cie I cij ,O f p n q ijp “ O f p o q ijp O cij ` O fiep I cie O cij ,O f p n q ijp “ O f p o q ijp O cij ´ I fiep I cie O cij (3) Proposition 4.

Flux dependencies can prevent the rate of the production/consumptionof an internal metabolite in a given topology of a pathway from being zero. Equa-tion 3 is used to calculate an update for the consumption/production rate of thenode. In the case of • I f p n q ijp “ or O f p n q ijp “ or • a repetitive loop over a node (i.e., getting back to a node from the samereaction multiple times), while trying to ﬁnd a way out in a subgraph of agiven pathway, y x. x. y. y. r r r r r r Figure 4: An illustration of Steps 3 and 4 on a sample graph. The pairs ( r , r ),( r , r ), ( r , r ), ( r , r ), ( r , r ) act as primary reactions (input, output) for theﬁve nodes in Step 3. The two nodes with dashed secondary reactions updateproduction/consumption of metabolites for r / r in Step 4. the pathway is discarded. An illustration of Steps 3 and 4 on a sample graph is shown in Figure 4.The proposed GB-EFM analysis algorithm is presented in Algorithm 2 andis summarized in Theorem 1.

Theorem 1.

Starting from boundary nodes in a metabolic network, GB-EFManalysis algorithm produces all minimal ﬂux-balanced pathways connecting inputand output metabolites with the following properties: • R k Ć R l , for all P k and P l , k, l P r , P ´ s k ‰ l . • ř j I f ijk I c ij “ ř j O f ijk O c ij , for all nodes N i , ď i ď M ´ .where P k refers to a pathway ď k ď P ´ and R k is the set of contribut-ing reactions in the pathway. The above properties indicate that the producedpathways are elementary ﬂux modes.Proof. Step 0 provides the required interface from the stoichiometric matrix.By using Propositions 1 and 2, semi-minimal and then minimal pathways areproduced, respectively. According to Propositions 3 and 4, steady-state condi-tion is checked for the given pathways and ﬂuxes are calculated. Non-qualiﬁedpathways are discarded in Steps 2 and 4 based on Propositions 2 and 4. Theremaining pathways have minimum functional reactions with balanced ﬂuxesas elementary ﬂux modes. Therefore, the produced pathways are a subset ofEFMs connecting input and output external metabolites.A complete framework of Algorithm 2 is illustrated in Figure 5.11

LGORITHM 2:

Graph-Based EFM Analysis Algorithm.

Input:

The MG graph of a metabolic network derived from its given stoichiometric matrix. Output:

The set of elementary ﬂux modes of the given network from input metabolites to outputmetabolites.Store all boundary metabolites of the network with their associated reactions in the queue Q . for each entry Q i in the queue, ď i ď size p Q q ´ dofor each O j / I j , in forward/backward ﬂow do Start a new path for each reaction j as a primary output/input. If the reaction isreversible, disable the reverse input/output direct. if N i R P then Add N i to P accordingly for each path. else Add j to N i as a secondary output/input. end Push back O Mi / I Mi to Q to be considered in the forward path of the algorithm.Push back O " Mi / I " Mi to Q to be considered in the backward path of the algorithm.Repeat until Q = H . endendfor all P k , ď k ď P ´ dofor all N i in P k doif j ¿ 1 thenfor all O j / I j , ď j ď r ´ in forward/backward ﬂow do Add label N ij to O j / I j . endelse Pass the label from input/output to output/input in forward/backward ﬂow. endendfor all N i in P k with more than two inputs and two outputs doif N x P Ť M N m where M contains all visited metabolites associated with the reaction j thenif a cross match for N xy and N xz in the inputs is found as well as a match in theoutputs then Discard P k . endendendendfor all remaining P k , ď k ď P ´ dofor all N i in P k do Apply Eq. 1. endendfor all remaining P k , ď k ď P ´ dofor all N i in P k with Secondary inputs/outputs do Calculate ř j I fiek I cie “ ř j I fijk I cij ´ ř j O fijk O cij . if ř j I fiek I cie ‰ then Use Eq. 2 to pass the ﬂux to the primary input/output reaction of N i .Repeat until reaching the boundary reactions. endendend r r r r r r r r

01 2 34 6 57 r r r r r r r r r Algorithm 1.Step 1.Step 2.Step 3.Step 4.

Elementary Flux Modes

Step 0.

Figure 5: An example illustrating Steps 0 to 4 on a simple network. In Step 1,three diﬀerent paths are constructed from a given MG produced in Step 0. Thepaths are divided as we go further in the graph, as shown by dark-colored nodeschanging to light-colored nodes. In Step 2, for the ﬁrst two paths from left,there is no node with more than one input and one output. Therefore, multi-path condition is not applied. However, the third path is constructed fromthe ﬁrst two paths. The internal nodes of each of these two paths are shown bydiﬀerent colors. Multi-path condition is checked in their common node shown byoverlapped circles. As a result, this path is discarded. In Step 3, for each path,all nodes are traversed and primary inputs and primary outputs, illustrated byred solid lines, get a ﬂux. In Step 4, nodes with secondary input(s)/output(s)are traversed again to pass the extra ﬂux to the boundary reactions. Secondaryinput(s)/output(s) are illustrated by dotted lines and their associated nodes areillustrated by red circles. According to the coeﬃcients, as shown in the givenstoichiometric matrix embedded in the graph, the second path cannot get astable ﬂux value for all reactions and is discarded. The ﬁnal output after Step 4is one EFM. 13 Results and Discussion

To calculate the computational complexity of Algorithm 2, the complexity ofeach step is analyzed below: • In Step 1, the maximum number of created pathways is calculated as P max “ M B ˆp r max q M z B , in which M B is the number of boundary metabo-lites, M z B is the number of all other internal metabolites except the bound-ary ones and r max is the maximum number of input/output reactions forall nodes. • In Step 2, for each pathway P i , there are two traversals with the com-plexity of (1) M P i , the number of all nodes in a pathway, and (2) M t P i ,the number of all tagged nodes with more than one input and one out-put in that pathway. Therefore, the order of maximum traversals is P ˆ r M x P i ˆ M P i s or P ˆ M where M P i and M t P i ď M . • In Step 3, all nodes in P i should be traversed which requires P ˆ M P i iterations. P ď P because some pathways may be discarded in Step 2.Therefore, the worst-case computational complexity for this step wouldbe P ˆ M . • In Step 4, for each P i and for all nodes with secondary reactions, (i.e., M v P i nodes), at least half of the nodes should be traversed to lead the extraproduction/consumtion of metabolites to input/output. Therefore, P ˆr M v P i ˆ M P i { s is the computational complexity of this step. Since P ď P and M v P i , M P i { ď M , P ˆ r M { s is the computational complexity inthe worst-case.Considering all items above, P ˆ O p M q , or as a result r max M ˆ O p M q is the computational complexity of the algorithm if we assume M B ! M . Thecomputational complexity of the approach can be considered as an upper-boundfor the number of EFMs including external reactions. Three diﬀerent topologies for an EFM can be observed in metabolic networks: • Internal pathways with no boundary reaction(s), • pathways with only input or only output reaction(s) consisting of an in-ternal loop, • pathways from input reaction(s) to output reaction(s).These topologies are illustrated in Figure 6. The main focus of Algorithm 2 isthe third topology. However, in order to expand this focus to all EFMs, the14 a)(c)(b) Figure 6: Categorization of EFM topologies as explained in Section 4.2. Anexample of: (a) a “futile cycle” EFM, (b) an inconsistent route: a path withonly input reaction consisting of an internal loop, and (c) a consistent non-cyclicEFM: a straight path (connecting external metabolites).algorithm can be easily modiﬁed to keep pathways without the contributionof output reactions (in Case 2) rather than discarding them, or starting frominternal reactions (in Case 1).However, from the biological point of view, internal pathways with no bound-ary reactions, (i.e., loops) are called “futile cycles” and are not biotechnologicallyrelevant [28], and the pathways with boundary inputs and no output implicateinconsistency in the network [29]. Therefore, a meaningful subset of EFMs aretargeted here to keep more relevant EFMs in the output.

To demonstrate the functionality of the model, the GB-EFM method has beenimplemented in C++ and tested on an Intel Core-i5 CPU with 4 GB RAM. Thesource code and results are provided on https://github.com/marabzadeh/GB-EFM. A small metabolic network comprising tricarboxylic acid cycle, glyoxylateshunt and adjacent reactions of amino acid synthesis in

E. coli [30] is consideredto describe the steps of the algorithm and proof the concept. The network hasone boundary input reaction and four boundary output reactions among all25 reactions and 18 internal metabolites as shown in Table 1. Reactions R , R , R and R are boundary output reactions. The elementary ﬂux modeswere obtained using EFMtool [16] as well as the proposed method. In thecase of considering all reactions as irreversible, EFMtool produces 12 EFMs inwhich 3 are cycles, 2 are inconsistent routes with only an input reaction andno output (mentioned in [29] as elementary leakage modes) and the other 7are consistent, non-cyclic pathways. Considering the reversible reactions in the15able 1: The list of reactions for the sample network of E. coli model andthe resulting EFMs obtained by GB-EFM. External output metabolites andcofactors are marked by asterisks. The external metabolite consumed by theinput boundary reaction R is marked by two asterisks. Reactions EFMs ˚ ñ

Pyr + ATP ˚ ˚ + CoA ñ AcCoA + CO2 ˚ + NADH ˚ ñ Cit + CoA 0 0 1 2 1 2 1 1 0 1 0R3 Icd IsoCit + NADP ˚ ñ

OG + CO2 ˚ + NADPH ˚ ˚ + CoA ñ SucCoA + CO2 ˚ + NADH ˚ ñ Succ + Gly 0 0 0 1 1 1 0 1 0 1 0R6 Mas Gly + AcCoA ñ Mal + CoA 0 0 0 1 1 1 0 1 0 1 0R7 AspCon Asp ñ Asp ex ˚ ñ Fum + NH3 ˚ ˚ ñ PEP + ADP ˚ + CO2 ˚ ˚ ñ OAA 1 0 1 0 0 0 1 0 1 1 1R11 Pps Pyr + ATP ˚ ñ

PEP + AMP ˚ ñ Glu ex ˚ ñ Ala ex ˚ ñ Suc ex ˚ + CoA 0 0 0 0 0 1 1 1 1 2 1R15 Eno PG ˚˚ ô PEP 1 1 2 3 2 3 2 2 1 3 1R16 Acn Cit ô IsoCit 0 0 1 2 1 2 1 1 0 1 0R17 SucCD SucCoA + ADP ˚ ô

Succ + ATP ˚ + CoA 0 0 0 0 0 0 0 -1 -1 -2 -1R18 Sdh Succ + FAD ˚ ô Fum + FADH2 ˚ ô Mal 0 0 0 1 1 1 0 0 0 -1 -1R20 Mdh Mal + NAD ˚ ô

OAA + NADH ˚ ô Asp + OG 1 0 0 0 1 0 0 0 1 0 0R22 Gdh OG + NH3 ˚ + NADPH ˚ ô Glu + NADP ˚ ô Ala + OG 0 1 0 0 0 0 0 0 0 0 0R24 SucEx Suc ô Suc ex ˚ table as reversible, 16 EFMs are produced, in which 3 and 2 are cycles andinconsistent routes belonging to the ﬁrst and the second category as deﬁnedin Section 4.2, respectively, and 11 are pathways from input to an output asproduced by GB-EFM. All information are provided in Supplementary ﬁle 1.The information of inconsistent routes with further description are provided inSupplementary ﬁle 2.The number of pathways produced by the proposed algorithm after Step 1 is40. After Step 2, this is reduced to 30 and after applying Steps 3-4, 5 pathways(ﬁnal EFMs) remain. These pathways are the same as the eleven “acyclic”EFMs obtained by EFMtool. The resulting EFMs are illustrated in Figure 7.Besides, a set of moderate-sized metabolic networks are considered as furthertest cases. The ﬁrst network is the pentose phosphate pathway in trypanosomabrucei [32], the second one is the simple network of human red blood cellmetabolism [33] and the third one is an E. coli core model obtained from [34].GB-EFM found all EFMs from

Glucose as reported in Table 2. Computationaltime for all the networks was less than one second. The currency metabolitessuch as ADP, ATP, AMP, CO , H O, O , H , NH , Pi, NAD, NADH, NADPand NADPH were removed from the input stoichiometry matrix of all samples.As stated in the literature [35, 18], some simpliﬁcations to the system ' s modelcan be performed in order to reduce the complexity of the problem, such assetting currency metabolites like cofactors and energy metabolites to external.According to [35], for energy currency metabolites like ATP, NADH and FADH,since their concentration is assumed to be constant they are not required to bebalanced by an EFM. PG is an input metabolite and Asp ex, Glu ex, Ala ex16igure 7: The resulting EFMs of the metabolic network comprising tricarboxylicacid cycle, glyoxylate shunt and adjacent reactions of amino acid synthesis in E. coli [30]. Abbreviations are explained in [30]. EFMs are depicted usingEscher [31]. (a) “futile cycle” EFMs, (b) inconsistent routes: paths with onlyinput reaction consisting of internal loops, with no output reaction, and (c)consistent non-cyclic EFMs: paths connecting external metabolites; producedby both EFMtool and GB-EFM. PG is an input metabolite shown by pinkellipse and Asp ex, Glu ex, Ala ex and Suc ex are output metabolites shownby purple squares. External and internal metabolites are illustrated by largercircles and currency metabolites by smaller ones.17able 2: Number of EFMs starting from

Glucose in some networks.

Network Size ( m ˚ n ) T. brucei ˚

35 4Human red blood cell 20 ˚

50 20

E. coli core (Escherichia coli str. K-12 substr. MG1655) 53 ˚

94 47 and Suc ex are output metabolites.

According to the graph data model, the order of traversing the nodes and therelations between reactions and metabolites can be maintained for each pathway.Besides, decisions to select or eliminate a particular reaction or metabolite canbe made during the algorithm. Since in this approach the traversal is on thegraph, many pathways are produced which may be discarded afterwards.The proposed data model gives the opportunity of diﬀerent levels of paral-lelism to trace pathways. Therefore, the complexity of implementation can bereduced by optimizing the method to only focus on pathways that are desirablein EFM analysis and by exploiting advantages of the possible parallelism in themethod. The graph structure makes a good track of unbalanced metabolites.Our approach tries to decide if a certain “pathway” is EFM or not, merely basedon topology rules (instead of recognizing the elementarity through rank test orcomparing new candidates with produced ones).Our strategy brings up the opportunity of object-oriented programming byconsidering a metabolite as an object. The method at least in its present form,might not be applicable to large-scale networks. However, the graph structuremakes it easier to set-up rules for pathways and explore the intended solutionspace via rules. The proposed method can potentially be implemented througha system design approach on hardware more conveniently.A hypergraph is a simpliﬁed type of the modiﬁed AND/OR graph. Theapplication of hypergraphs in biological networks has been reviewed in [36]. Inthe introduced model, each arc has two diﬀerent input and output coeﬃcients.Besides, for each arc, a dynamic label for each pathway is deﬁned, that is, theﬂux of the reaction represented by that arc. The proposed framework can beconsidered as an extension to the well-known ﬂow network problem for hyper-graphs. The edges are directed and each has been associated with a weight.However, the weight of an output arc of a node is diﬀerent from the weightwhen the arc enters another node as an input. Besides, the hypergraph struc-ture complicates pathways topology [37]. The complication arises since ﬂuxesof contributing input and output reactions over a node may be related. This re-sults to the non-linear property of ﬂuxes. The non-linearity has been consideredin our ﬂux calculation procedure (see Eq. 2 and Eq. 3).The ﬁrst introduced double-description method for ﬁnding EFMs exploresthe whole set of pathways to ﬁnd pathways that are in the steady-state solution18pace and then selects EFMs by comparing the reaction subset of pathways [5].In the improved double-description versions of the method which use null-spaceof the stoichiometry matrix as an input, diﬀerent combinations of pathwaysin the null-space are calculated and then the reaction subset of pathways arecompared [12, 13, 14]. The main diﬀerence of the GB-EFM method with double-description-based methods is that GB-EFM ﬁrst calculates the shortest path-ways in the solution space and then checks to see if these pathways can be inthe steady-state condition by using some rules on the topology of the pathway.The method of [20] explores the shortest pathways in the AND/OR graphof the network. The authors did not take into account the stoichiometry coeﬃ-cients. Thus, the resulting pathways may or may not be EFM. In the methodof [21], all reaction combinations to balance a metabolite are calculated and allEFMs are computed accordingly. The order of traversing unbalanced metabo-lites is based on the graph model. The proposed model in this paper traversesthe graph and keeps both the stoichiometry information and the minimality ofthe pathway with the opportunity of not exploring the whole solution space.The methods based on linear programming usually restrict their method to in-teger variables [18] which is limiting. In the case of the proposed method thisis not a serious concern.The presented approach, as discussed earlier, aims at ﬁnding EFMs by spe-ciﬁc characteristics, i.e., the third category is Section 4.2. This brings up the factthat we are looking for new ways to reduce the solution space. EFMtool and themethod of [21] target the whole set of EFMs, which means either they ﬁnd thecomplete set of EFMs of a metabolic network (which is a lot of non-categorizedoutput information) or they fail to ﬁnd any EFM. In comparison, our methodis looking for a meaningful subset of EFMs which leads to a categorized outputset with the cost of losing speed.In several approaches such as [16, 18], reversible reactions are considered astwo irreversible reactions and the two-cycle EFMs are being removed duringthe EFM calculation. As it has been studied in [38], by splitting reversiblereactions, the size of the ﬂux cone description increases by one. However, inthe proposed method, while reversible reactions are added to both input andoutput of a metabolite, they are treated as reversible routes. When a reversibleroute is used in one direction, the other direction is disabled. Consequently, theproposed method preserves the dimension of the ﬂux cone.

In order to calculate the elementary ﬂux modes of a given metabolic network,an algorithm was proposed based on a modiﬁed AND/OR graph. It was shownin the paper that the steps in the algorithm leads to exact pathways accordingto EFM deﬁnition. The worst-case computational complexity of the algorithmwas calculated as O p r max M ˆ M q which is a newly reported upper-bound forthe number of EFMs with external metabolites. Additionally, a set of test caseswas provided to prove the concept of the algorithm.19he main focus of this paper was to introduce a model to ﬁnd minimal ﬂux-balanced pathways in metabolic networks. The model can also be applicableto several constraint-based pathway analysis approaches. Using the potentialparallelism in the method to speed-up the algorithm on a hardware structureand proposing diﬀerent categorization of EFMs based on reactions/metabolitescharacteristics, as is possible according to the graph data model, are consideredas our future research. Acknowledgements

Authors would like to thank Dr. Nathan Lewis (UCSD) for helpful discussionson the concept.

References [1] Christopher S Henry, Matthew DeJongh, Aaron A Best, Paul M Frybarger,Ben Linsay, and Rick L Stevens. High-throughput generation, optimiza-tion and analysis of genome-scale metabolic models.

Nature biotechnology ,28(9):977–982, 2010.[2] Nathan D Price, Jennifer L Reed, and Bernhard Ø Palsson. Genome-scale models of microbial cells: evaluating the consequences of constraints.

Nature Reviews Microbiology , 2(11):886–897, 2004.[3] Jeﬀrey D Orth, Ines Thiele, and Bernhard Ø Palsson. What is ﬂux balanceanalysis?

Nature biotechnology , 28(3):245–248, 2010.[4] Edda Klipp, Ralf Herwig, Axel Kowald, Christoph Wierling, and HansLehrach.

Systems biology in practice: concepts, implementation and appli-cation . John Wiley & Sons, 2008.[5] Stefan Schuster and Claus Hilgetag. On elementary ﬂux modes in bio-chemical reaction systems at steady state.

Journal of Biological Systems ,2(02):165–182, 1994.[6] Stefan Schuster, David A Fell, and Thomas Dandekar. A general deﬁnitionof metabolic pathways useful for systematic organization and analysis ofcomplex metabolic networks.

Nature biotechnology , 18(3):326–332, 2000.[7] S Schuster, S Klamt, W Weckwerth, F Moldenhauer, and T Pfeiﬀer. Useof network analysis of metabolic systems in bioengineering.

Bioprocess andBiosystems Engineering , 24(6):363–372, 2002.[8] Devesh Radhakrishnan, Meghna Rajvanshi, and KV Venkatesh. Pheno-typic characterization of corynebacterium glutamicum using elementarymodes towards synthesis of amino acids.

Systems and synthetic biology ,4(4):281–291, 2010. 209] K Parvatham and L Veerakumari. Drug target prediction using elementarymode analysis in ascaris lumbricoides energy metabolism.

Biotechnologyand bioprocess engineering , 18(3):491–500, 2013.[10] Daniel Machado and Markus J Herrg˚ard. Co-evolution of strain designmethods based on ﬂux balance and elementary mode analysis.

MetabolicEngineering Communications , 2:85–92, 2015.[11] Jason A Papin, Nathan D Price, Sharon J Wiback, David A Fell, andBernhard O Palsson. Metabolic pathways in the post-genome era.

Trendsin biochemical sciences , 28(5):250–258, 2003.[12] C Wagner. Nullspace approach to determine the elementary modes of chem-ical reaction systems.

The Journal of Physical Chemistry B , 108(7):2425–2431, 2004.[13] Robert Urbanczik and Carl Wagner. An improved algorithm for stoi-chiometric network analysis: theory and applications.

Bioinformatics ,21(7):1203–1210, 2005.[14] Lake Ee Quek and Lars K Nielsen. A depth-ﬁrst search algorithm to com-pute elementary ﬂux modes by linear programming.

BMC systems biology ,8(1):1, 2014.[15] Julien Gagneur and Steﬀen Klamt. Computation of elementary modes: aunifying framework and the new binary approach.

BMC bioinformatics ,5(1):1, 2004.[16] Marco Terzer and J¨org Stelling. Large-scale computation of elementaryﬂux modes with bit pattern trees.

Bioinformatics , 24(19):2229–2235, 2008.[17] Axel Von Kamp and Stefan Schuster. Metatool 5.0: fast and ﬂexible ele-mentary modes analysis.

Bioinformatics , 22(15):1930–1931, 2006.[18] Luis F De Figueiredo, Adam Podhorski, Angel Rubio, Christoph Kaleta,John E Beasley, Stefan Schuster, and Francisco J Planes. Computing theshortest elementary ﬂux modes in genome-scale metabolic networks.

Bioin-formatics , 25(23):3158–3165, 2009.[19] Laszlo David and Alexander Bockmayr. Computing elementary ﬂux modesinvolving a set of target reactions.

IEEE/ACM Transactions on Computa-tional Biology and Bioinformatics , 11(6):1099–1107, 2014.[20] Jose Francisco Hidalgo C´espedes, Francisco De As´ıs Guil Asensio, and JoseManuel Garc´ıa Carrasco. A new approach to obtain efms using graphmethods based on the shortest path between end nodes. In

InternationalConference on Bioinformatics and Biomedical Engineering , pages 641–649.Springer, 2015. 2121] Ehsan Ullah, Shuchin Aeron, and Soha Hassoun. gEFM: an algorithmfor computing elementary ﬂux modes using graph traversal.

IEEE/ACMTransactions on Computational Biology and Bioinformatics , 13(1):122–134,2016.[22] Vicente Acu˜na, Flavio Chierichetti, Vincent Lacroix, Alberto Marchetti-Spaccamela, Marie-France Sagot, and Leen Stougie. Modes and cuts inmetabolic networks: Complexity and algorithms.

Biosystems , 95(1):51–60,2009.[23] Vicente Acu˜na, Alberto Marchetti-Spaccamela, Marie-France Sagot, andLeen Stougie. A note on the complexity of ﬁnding and enumerating ele-mentary modes.

Biosystems , 99(3):210–214, 2010.[24] Steﬀen Klamt and J¨org Stelling. Combinatorial complexity of pathwayanalysis in metabolic networks.

Molecular biology reports , 29(1-2):233–236,2002.[25] Sabine P´er`es, Fran¸cois Vall´ee, Marie Beurton-Aimar, and Jean-PierreMazat. Acom: a classiﬁcation method for elementary ﬂux modes basedon motif ﬁnding.

BioSystems , 103(3):410–419, 2011.[26] Matthias P Gerstl, David E Ruckerbauer, Diethard Mattanovich, Chris-tian Jungreuthmayer, and J¨urgen Zanghellini. Metabolomics integratedelementary ﬂux mode analysis in large metabolic networks.

Scientiﬁc re-ports , 5:8930, 2015.[27] J¨urgen Zanghellini, David E Ruckerbauer, Michael Hanscho, and Chris-tian Jungreuthmayer. Elementary ﬂux modes in a nutshell: properties,calculation and applications.

Biotechnology journal , 8(9):1009–1016, 2013.[28] Anthony P Burgard, Shankar Vaidyaraman, and Costas D Maranas. Min-imal reaction sets for

Escherichia coli metabolism under diﬀerent growthrequirements and uptake environments.

Biotechnology progress , 17(5):791–797, 2001.[29] Albert Gevorgyan, Mark G Poolman, and David A Fell. Detection ofstoichiometric inconsistencies in biomolecular models.

Bioinformatics ,24(19):2245–2251, 2008.[30] Stefan Schuster, Thomas Dandekar, and David A Fell. Detection of ele-mentary ﬂux modes in biochemical networks: a promising tool for pathwayanalysis and metabolic engineering.

Trends in biotechnology , 17(2):53–60,1999.[31] Zachary A King, Andreas Dr¨ager, Ali Ebrahim, Nikolaus Sonnenschein,Nathan E Lewis, and Bernhard O Palsson. Escher: a web applicationfor building, sharing, and embedding data-rich visualizations of biologicalpathways.

PLoS Comput Biol , 11(8):e1004321, 2015.2232] Eduard J Kerkhoven, Fiona Achcar, Vincent P Alibu, Richard J Burch-more, Ian H Gilbert, Maciej Trybi(cid:32)lo, Nicole N Driessen, David Gilbert,Rainer Breitling, Barbara M Bakker, et al. Handling uncertainty in dy-namic models: the pentose phosphate pathway in trypanosoma brucei.

PLoS Comput Biol , 9(12):e1003371, 2013.[33] Sharon J Wiback and Bernhard O Palsson. Extreme pathway analysisof human red blood cell metabolism.

Biophysical Journal , 83(2):808–818,2002.[34] Zachary A King, Justin Lu, Andreas Dr¨ager, Philip Miller, StephenFederowicz, Joshua A Lerman, Ali Ebrahim, Bernhard O Palsson, andNathan E Lewis. Bigg models: A platform for integrating, standardizingand sharing genome-scale models.

Nucleic acids research , 44(D1):D515–D522, 2016.[35] Christoph Kaleta, Luıs Filipe De Figueiredo, J¨orn Behre, and Stefan Schus-ter. EFMEvolver: Computing elementary ﬂux modes in genome-scalemetabolic networks. In

Lecture Notes in Informatics-Proceedings , volume157, pages 179–189, 2009.[36] Steﬀen Klamt, Utz-Uwe Haus, and Fabian Theis. Hypergraphs and cellularnetworks.

PLoS computational biology , 5(5):e1000385, 2009.[37] Sayed Amir Marashi and Mojtaba Tefagh. A mathematical approach toemergent properties of metabolic networks: Partial coupling relations, hy-perarcs and ﬂux ratios.

Journal of theoretical biology , 355:185–193, 2014.[38] Abdelhalim Larhlimi and Alexander Bockmayr. On inner and outer descrip-tions of the steady-state ﬂux cone of a metabolic network. In