Theoretical estimation of metabolic network robustness against multiple reaction knockouts using branching process approximation
aa r X i v : . [ q - b i o . M N ] J u l Theoretical estimation of metabolic networkrobustness against multiple reaction knockoutsusing branching process approximation
Kazuhiro Takemoto a , ∗ Takeyuki Tamura b Tatsuya Akutsu b a Department of Bioscience and Bioinformatics, Kyushu Institute of Technology,Kawazu 680-4, Iizuka, Fukuoka 820-8502, Japan b Bioinformatics Center, Institute for Chemical Research, Kyoto University,Gokasho, Uji, Kyoto, 611-0011, Japan
Abstract
In our previous study, we showed that the branching process approximation is usefulfor estimating metabolic robustness, measured using the impact degree . By applyinga theory of random family forests, we here extend the branching process approxi-mation to consider the knockout of multiple reactions, inspired by the importanceof multiple knockouts reported by recent computational and experimental studies.In addition, we propose a better definition of the number of offspring of each re-action node, allowing for an improved estimation of the impact degree distributionobtained as a result of a single knockout. Importantly, our proposed approach is alsoapplicable to multiple knockouts. The comparisons between theoretical predictionsand numerical results using real-world metabolic networks demonstrate the validityof the modeling based on random family forests for estimating the impact degreedistributions resulting from the knockout of multiple reactions.
Key words:
Metabolic network, Branching process, Cascading failure
PACS:
Robustness is an important concept for understanding how living organismsadapt to changing environments, as well as for understanding how they are ∗ Corresponding author.
Email address: [email protected] (Kazuhiro Takemoto).
Preprint submitted to Elsevier 8 October 2018 ble to survive when carrying mutated genes [1, 2]. In particular, metabolicrobustness is of increasing interest not only to researchers in the field of ba-sic biology, but also to those in biotechnology and medical research becausemetabolic processes are essential for physiological functions, and are respon-sible for maintaining life.The development of high-throughput methods has facilitated the collectionand compilation of large metabolic network datasets, which are stored indatabases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG)[3] and the Encyclopedia of Metabolic Pathways (MetaCyc) [4]. In recentyears, numerous methods and measures have been developed for analyzingmetabolic robustness in the context of gene/reaction knockout by using avail-able metabolic network data.Many of these methods and measures are based on flux balance analysis (FBA). Edwards and Palsson used the change in the optimal objective functionvalue (e.g., growth rate) as a measure of robustness against the change in aparticular reaction [5]. Segr`e et al. [6] proposed the minimization of metabolicadjustment (MOMA) method, which predicts the flux vectors by minimizingthe Euclidean distance between the mutant and wild type. Deutscher et al. [7]proposed another measure by combining FBA with the Shapley value in gametheory. With respect to FBA, elementary flux modes (EFMs) have also beenused to analyze the robustness of metabolic networks, where an EFM is aminimal set of reactions that can operate at the steady state [8]. Wilhelm etal. [9] proposed a measure based on the numbers of EFMs before and afterknockout, which was later extended to include the knockout of multiple re-actions [10]. In order to evaluate robustness for the production of a specifictarget compound(s), several studies have used a minimum reaction cut basedon FBA and/or EFM [11–14], which involves a minimum set of reactions (orenzymes), the removal of which leads to prevention of the production of aspecific set of compounds. Other approaches based on FBA/EFM have alsobeen implemented [15].Boolean modeling is an alternative way to model metabolic networks, wherebythe activity of each reaction or compound is represented by either 0 (inactive)or 1 (active) and reactions and compounds are modeled as AND nodes and ORnodes respectively. Handorf et al. [16] introduced the concept of scope basedon Boolean modeling and applied it to analyses of the robustness of metabolicnetworks. Li et al. [17], Sridhar et al. [18], and Tamura et al. [19] developedinteger programming-based methods for determining the minimum reactioncut under Boolean models. Lemke et al. [20] defined the damage as the numberof reactions inactivated by the knockout of a single reaction under a Booleanmodel. Smart et al. [21] refined the concept of damage by introducing the topological flux balance (TFB) criterion. Jiang et al. defined the impact degree as the number of reactions inactivated by knockout of a specified reaction [22]2nder a Boolean model. Although there are some differences in the treatmentof reversible reactions, the damage and the impact degree are very similarconcepts.To date, most studies have focused on the prediction and/or accuracy of ro-bustness measures but have given less consideration to the distribution of suchmeasures. Lemke et al. [20] analyzed the distribution of damage using com-puter simulation. Smart et al. [21] performed a similar analysis. In addition,they applied percolation theory and branching processes to the analysis of thedistribution of cluster sizes of damaged subnetworks [21]; however, they didnot explicitly estimate damage distribution (i.e., impact degree distribution).Until recently, theoretical frameworks for estimating the tolerance of metabolicnetworks to various failures were poorly established. Motivated by this, in ourprevious study [23], we analyzed the distribution of impact degree triggered byrandom knockout of a single reaction using a branching process theory [24,25].By treating the propagation of the impact triggered by the knockout of areaction as a branching process approximation, the relevance of which hadbeen shown in the context of loading-dependent cascading failure [26–28],we demonstrated that the branching process model (or theory) reflects theobserved impact degree distributions. In addition, Lee et al. [32] also recentlydemonstrated the use of a Boolean model and a theory of branching processin this context.As above, most previous studies focused on the impact of a single knockout.In recent years, however, multiple-knockout experiments have been activelyperformed, and have shown new interesting results on metabolic robustness,such as synergetic effects resulting from multiple knockouts [29–31]. Therefore,computational and theoretical frameworks need to be extended to includemultiple knockouts. For example, Deutscher et al. [7] discussed the impactof multiple knockouts in yeast metabolism based on the Shapley value fromgame theory. Tamura et al. [33] proposed an efficient method for computingmetabolic robustness in the context of impact degree. However, theoreticalapproaches remain incomplete.In this study, by extending the branching process approximation proposed inour previous study [23], we show that the branching process approximation(specifically, the assumption of a random family forest, which is a collection offamily trees) is also useful for estimating the distribution of the impact degreetriggered by the random knockout (or disruption) of multiple reactions.3
Impact Degree
Here, we briefly review the impact degree [22] and its extensions [19, 33]. The impact degree was originally proposed by Jiang et al. as a measure of theimportance of each reaction in a metabolic network [22], and is defined as thenumber of inactivated reactions caused by the knockout of a single reaction.Since the effect of cycles was not considered in their study, Tamura et al.extended the impact degree so that the effect of cycles is taken into accountby introducing the maximal valid assignment concept [19]. Furthermore, theyextended it to cope with the knockout of multiple reactions [33].Let V c = { C , . . . , C m } and V r = { R , . . . , R n } be a set of compound nodes anda set of reaction nodes respectively, where V c ∩ V r = {} . A metabolic network is defined as a bipartite directed graph G ( V c ∪ V r , E ) in which each edge isdirected either from a node in V c to a node in V r , or from a node in V r to anode in V c . Each of the included reactions and compounds takes 1 of 2 states:0 (inactive) or 1 (active).The impact degree for the knockout of multiple reactions is computed as fol-lows [33]. Let V ko = { R i , . . . , R i D } be a set of reactions that have been knockedout. We start with the global state, such that all compounds are active (i.e., C i = 1 for all C i ∈ V c ) and all reactions except for those in V ko are active (i.e., R i = 1 for all R i ∈ V r − V ko and R i = 0 for all R i ∈ V ko ). Then, we updatethe states of reactions and compounds using the following rules.(1) A reaction is inactivated if any predecessor (i.e., substrate) or successor(i.e., product) is inactive.(2) A compound is inactivated if all predecessors or all successors are inactive.We repeat this procedure until reaching a stable global state, which is deter-mined uniquely regardless of the order of updates [19]. The impact degree for V ko is the number of inactive reactions (i.e., reactions with value 0) in the sta-ble global state. Although this procedure simultaneously gives the definitionand algorithm for the impact degree, Tamura et al. developed a much moreefficient algorithm to compute it [33].Fig. 1 shows an example of a metabolic network. If R is knocked out, theother reactions remain active and thus the impact degree is 1. If R is knockedout, C is inactivated and then R is inactivated. However, C and C remainactive, and thus the impact degree is 2. If both R and R are knocked out, C is inactivated, and R and R are inactivated. Since R , R , R , R , R areinactive in the stable global state, the resulting impact degree is 5.4 R R R C R C C R C R C C C Fig. 1. Example of a metabolic network. Circles and boxes correspond to compoundsand reactions, respectively.
We have extended the branching process approximation, previously reportedin [23], to include the estimation of the impact degree distributions obtainedas a result of the knockout of multiple reactions.The branching process [35] is a stochastic process in which each progenitorgenerates offspring according to a fixed probability distribution called theoffspring distribution. Since the propagation of an impact on a network is es-sentially similar to cascading failures, which is a sequence of failures caused byan accident, branching process approximation is useful for estimating impactdegree distributions, although it requires the assumption of a network treestructure (see Sec. 5 for details of model limitations).
We need to define the number of offspring for each reaction node in metabolicnetworks in order to analyze the impact degree distributions using the branch-ing process approximation.To obtain the number of offspring, in the previous study [23], we used thereaction network obtained as the unipartite projection of the metabolic net-work, where we draw an edge from reaction A to reaction B when at least1 product of A is a substrate of B (see Sec 3.1 in [23] for details). Assumingthat the impact spreads though reactions whose substrates are synthesized viaunique metabolic reactions (i.e., reaction nodes with the indegree of 1) whenassuming tree structures of networks, we defined the number d i of offspringfor reaction node i in metabolic networks as follows: d i = k out i if k in i = 1 and 0otherwise, where k out i and k in i are the outdegree and indegree of reaction node i in the reaction network, respectively.However, this definition is not appropriate when different metabolic networks5Figs. 2A and 2B) are projected into the same reaction network (Fig. 2C). Inthis case, the similar offspring distribution (or potential of spreading) is definedbetween these different metabolic networks although the tendency of impactspreading is clearly different between the networks. For example, in particular,we consider the knockout of the reaction R (i.e., the inactivation of R ) inFig. 2. In Fig. 2A, the reaction R is also inactive (i.e., the impact spreads)after the inactivation of the reaction R because it requires the compound C synthesized through the reaction R . On the other hand, in Fig. 2B, theimpact does not spread because the compound C required by the reaction R can be synthesized through the reaction R even if the reaction R is inactive. R R R R R R R R R (A)(B) C C C C C (C) Unipartite projection
Fig. 2. Example illustrating how different metabolic networks (A and B) are con-verted to the same reaction network (C) through the unipartite projection. Circlesand boxes correspond to compounds and reactions, respectively. Lightning boltsindicate the impacts.
To distinguish between these cases, we consider spreading edges that con-tributes to the impact spreading on reaction networks. In particular, spreadingedges are defined as directed edges between a reaction node pair (i.e., sourceand target nodes) whose interjacent chemical compounds are synthesized bythe source reaction only. For example, in Fig. 2A, the interjacent compound(i.e., C ) between the reaction nodes R and R is generated through the reac-tion R only; thus, the directed edge from the reaction node R to the reactionnode R in Fig. 2C is defined as a spreading edge. On the other hand, in Fig.2B, the interjacent compound C is obtained either through the reactions R or R ; thus, the directed edge from the reaction node R to the reaction node R in Fig. 2C is not regarded as a spreading edge.After finding the spreading edges in reaction networks according to this defi-nition, the number of offspring of a reaction node is defined as the number ofout-going spreading edges. For example, the number of offspring of the reac-tion node R and R in Fig. 2A are both 1; however, they are both 0 in Fig.2B. In this manner, we can determine the difference in the tendency of impactspreading between these metabolic networks.6e finally obtain the offspring distribution P ( d ) from an empirical metabolicnetwork as P ( d ) = 1 N N X i =1 δ ( d, d i ) , (1)where N corresponds to the total number of reaction nodes. The function δ ( x, y ) is the Kronecker’s delta function, and it returns 1 if x = y and 0otherwise. Here, we explain the estimation method for the impact degree distributionobtained as a result of the knockout of either a single reaction or multiplereactions, based on a theory of branching processes.
We previously investigated the impacts of a single knockout [23]; here, webriefly summarize these methods and findings as a basis for extending ourapproach to include cases in which multiple reactions are knocked out.In this study, we consider a branching process model (i.e., the empirical modelin Ref. [23]) using empirical offspring distributions, obtained using Eq. (1).Because the impact degree can be regarded as the total number of offspringthrough a branching process [23], we obtained the distribution of the totalnumber of offspring to estimate the impact degree distribution. However, thebranching process model we described previously does not count the first im-pact (i.e., progenitor) although counting the first impact is necessary for esti-mating the impact degree distribution obtained as a result of the knockout ofmultiple reactions. Thus, we estimate the distribution of the impact degree,including the first impact, triggered by the knockout of a single reaction, asfollows.Let F ( s ) be the probability generating function of the impact degree r (i.e.,the total number of offspring, including the progenitor); the function F ( s )satisfies the recursive relation [34–36]: F ( s ) = sf ( F ( s )) , (2)where f ( s ) denotes the probability generating function of the number d of7ffspring of each reaction node: f ( s ) = d max X d =0 P ( d ) s d , (3)where d max is the maximum number of offspring.Using the formula reported by Burman and Lagrange and the relation of P ( r ) = 1 r ! d r F ( s ) ds r (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) s =0 , (4)the distribution P ( r ) (i.e., impact degree distribution) is derived from theabove implicit equation as the following explicit equation [34]: P ( r ) = 1 r ! " d r − ds r − ( f ( s )) r s =0 . (5)The impact degree distributions obtained as a result of the random knockoutof a single reaction are estimated through this equation. In this study, we assume the impact spreading triggered by the knockout ofmultiple reactions is represented by the independent branching process, givenan initial number of k individuals (progenitors; i.e., the random family forest,which is a collection of k family trees) [36].Because the branching process with k initial progenitors can be defined as k independent copies of a branching process with a single initial progenitor [36],the impact degree triggered by the knockout of multiple reactions can be de-scribed as the sum of the total number of offspring through a branching pro-cess with a single initial knockout. Thus, the probabilistic generating function F k ( s ) of the impact degree, which is induced by the knockout of k reactions, isderived using the probabilistic generating function F ( s ) of the impact degreethat results from a single knockout. This is demonstrated as follows: F k ( s ) = [ F ( s )] k . (6)To obtain the probabilistic generating function F ( s ) of the impact degreetriggered by a single knockout, we consider 2 cases. In the first case, we use8n estimated value for F ( s ) that is derived using Eq. (5): F ( s ) = r max X r =0 P ( r ) s r . (7)In the second case, we use an empirical value of F ( s ) (i.e., the probabilis-tic generating function calculated according to Eq. (7), using the empirical P ( r ) computed by numerical simulations). Using the empirical F ( s ), we canpurely evaluate the validity of the assumption of random family forests. r max corresponds to the maximum impact degree in the empirical P ( r ).Finally, the impact degree distribution P k ( r ) obtained as a result of the knock-out of k reactions is estimated as P k ( r ) = the coefficient of s r in F k ( s ) . (8) We evaluated the efficiency of the above estimation methods for the impactdegree distributions using real-world metabolic networks of several species.We focused on 2 bacteria, (i.e.,
Escherichia coli and
Bacillus subtilis ), and 2 eu-karyotes, (i.e.,
Saccharomyces cerevisiae (yeast) and
Homo sapiens (human)),whose metabolic pathways have been well characterized using experimentalapproaches in a previous study [23]. We downloaded metabolic network datafor each species from the KEGG database [3,37]. Each dataset was representedas bipartite networks, as shown in Fig. 1,.For these species, the impact degree distributions obtained as a result of theknockout of k reactions were numerically calculated using an efficient algo-rithm [33]. In this study, we considered cases of k = 1 (i.e., single knockout)and k = 2 and 3 (i.e., multiple knockouts). We compared these observed im-pact degree distributions using the theoretical distributions that were obtainedas follows.According to Sec. 3.1, we constructed the reaction networks through the uni-partite projection of bipartite metabolic networks from the KEGG databaseand obtained the offspring distributions from the reaction networks (Fig. 3).The number of offspring follows a power-law-like distribution. This may bebecause of the heterogeneous (or scale-free) connectivity in metabolic net-works [38]. The offspring distributions are slightly different between the modi-fied definition based on spreading edges and the previous definition in Ref. [23].Table 1 summarizes the parameters extracted from the reaction networks, and9 able 1Parameters extracted from real-world reaction networks. The character µ previous and µ modified are the mean number of offspring of a reac-tion node estimated using our previously described definition [23] and the modifieddefinition described here based on spreading edges (this study), respectively.Species µ previous µ modified Escherichia coli
Bacillus subtilis
937 2799 0.47 0.58
Saccharomyces cerevisiae
856 2328 0.48 0.59
Homo sapiens it shows that the mean number of offspring, an important parameter for de-scriptions of branching processes, is either overestimated or underestimatedusing the previously described definition, which is in contrast to the use ofour modified definition presented here. The cases of
E. coli and
H. sapiens represent µ previous > µ modified , whereas the cases of B. subtilis and
S. cerevisiae show µ previous < µ modified . These differences in offspring distributions affect theaccuracy of predicting the impact degree distributions (Table 2). Note thatthe number of nodes and the mean number of offspring are slightly differ-ent between the previous study [23] and this study because of the removal ofredundant metabolic reactions in the database. As reported in our previous study [23], the branching process approximationis useful for estimating the impact degree distributions (Fig. 4). Because ofthe subcritical case (i.e., µ < µ , for example, the total number of offspring (i.e., impact degree) r is distributed according to the Borel distribution [26]: P ( r ) = ( µr ) r − e − µr /r !.Using Stirling’s formula (i.e., r ! ≈ √ πrr r e − r ), the above equation leads tothe approximation P ( r ) ∝ r − / e − r (ln µ − µ +1) . (9)When µ = 1 (i.e., the critical case), the impact degree follows the power-law distribution with the exponent of − / P ( d ) ∝ /d γ +1 , where γ < P ( r ) ∝ /r /γ [25]. 10 -3 -2 -1 -3 -2 -1 -4 -3 -2 -1 -4 -3 -2 -1 R e l a t i v e f r equen cy R e l a t i v e f r equen cy A BC D
Fig. 3. The offspring distributions of
Escherichia coli (A),
Bacillus subtilis (B),
Saccharomyces cerevisiae (C), and
Homo sapiens (D). The open circles and crossesindicate the distributions obtained using the modified and previous definitions ofthe number of offspring, respectively.
To evaluate the goodness of fit between the empirical distributions and theo-retical distributions from Eq. (5), we performed the Kolmogorov-Smirnov (KS)test (Table 2). The KS test shows that the theoretical distributions estimatedusing the branching process approximation are in good agreement with em-pirical distributions of real-world metabolic networks and that the estimationusing the modified definition of the number of offspring is better than thatusing the previous definition. In particular, the poor agreement between theimpact degree distributions of the
H. sapiens metabolic network calculatedusing the theoretical estimation and real data, as reported in the previousstudy [23], was improved as a result of the consideration of spreading edges(see Sec. 3.1). Note that the KS distance and the P -value are different betweenthe previous study [23] and this study because the previous study does notconsider the first impact when calculating the impact degree.11 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degreeActualModified Previous
A BC D
Fig. 4. The impact degree distributions obtained after the knockout of a singlereaction in
Escherichia coli (A),
Bacillus subtilis (B),
Saccharomyces cerevisiae (C),and
Homo sapiens (D). The circles indicate observed data. The solid lines anddashed lines correspond to the theoretical distributions estimated by Eq. (5) usingthe modified and previous definitions of offspring distributions, respectively.
We evaluated the efficiency of the branching process approximation whenknocking out 2 reactions (Fig. 5) and when knocking out 3 reactions (Fig.6). As explained in Sec. 3.2.2, we consider both the estimated F ( s ) and em-pirical F ( s ) when estimating the impact degree distributions resulting fromthe knockout of multiple reactions. We used the definition based on spread-ing edges when calculating the estimated F ( s ) due to its improved accuracy(Table 2). 12 able 2Comparison of prediction accuracy (i.e., goodness of fit) for the impact degreedistributions between the previous definition and the modified definition of d i :Kolmogorov-Smirnov (KS) distance, defined as sup x | R ( x ) − M ( x ) | , where R ( x )and M ( x ) are empirical distributions and theoretical distributions, respectively.The parenthetic values indicate the logarithmic P -values p from the KS test, de-fined as − log ( p ). A smaller KS distance and logarithmic P -values indicate a highergoodness of fit between the empirical distribution and theoretical distribution. Thehighlighted values correspond to the best accuracy.Species Previous version Modified version Escherichia coli
Bacillus subtilis
Saccharomyces cerevisiae
Homo sapiens
Overall, the branching process approximation (i.e., the modeling based onrandom family forests [36]) is also useful for estimating the impact degreedistribution for the knockout of multiple reactions, despite some exceptions.The larger impact degrees of the theoretical distribution calculated from
B.subtilis (Figs. 5B and 6B) using the estimated F ( s ) and the observed distri-butions are not similar. This weaker fit is caused by the prediction accuracyin the branching process approximation (i.e., Eq. (5)) rather than the as-sumption of random family forests, illustrated by the fact that the theoreticaldistributions derived using the empirical F ( s ) are in good agreement with theobserved distributions.In the case of S. cerevisiae (Figs. 5C and 6C), our data suggest that the as-sumption of random family forests is less suitable for estimating the impactdegree distributions. In particular, the theoretical distributions obtained us-ing the empirical F ( s ) are narrower than the observed distributions althoughthe theoretical distributions calculated using the estimated F ( s ) are in goodagreement with the observed distributions. This result implies the existenceof a synergistic effect of multiple knockouts on impact spreading in metabolicnetworks (i.e., the effect of multiple knockouts is larger than expected basedon the assumptions regarding the combined effects of independent impactspreading initiated by individual knockouts). By extending the branching process approximation in the previous study [23],we proposed a random family forest model for estimating the impact degree13 ctual
Using estimated F(s) Using emprical F(s)
A BC D -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree
Fig. 5. The impact degree distributions obtained following the knockout of 2 reac-tions in
Escherichia coli (A),
Bacillus subtilis (B),
Saccharomyces cerevisiae (C), and
Homo sapiens (D). The circles indicate observed data. The solid lines and dashedlines correspond to the theoretical distributions from Eq. (8) using the estimated F ( s ) and empirical F ( s ), respectively. distributions obtained as a result of multiple knockouts of metabolic networks,and demonstrated its validity using real-world metabolic network data. Inde-pendent of our previous study [23], Lee et al. [32] have also proposed a branch-ing process approach for Boolean bipartite networks of metabolic reactions;however, they only considered cases including a single knockout.In addition, we have provided a better definition of the number of offspring ofa reaction node using the concept of spreading edges (see Sec. 3.1). Because ofthis modified definition, the poor predictions reported in our previous study[23] were improved here in the context of a single knockout (see Table 2).14 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -9 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -9 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree
A BC D
Actual
Using estimated F(s) Using emprical F(s)
Fig. 6. The impact degree distributions obtained as a result of the knockout of3 reactions of
Escherichia coli (A),
Bacillus subtilis (B),
Saccharomyces cerevisiae (C), and
Homo sapiens (D). The circles indicate observed data. The solid linesand dashed lines correspond to the theoretical distributions from Eq. (8) using theestimated F ( s ) and empirical F ( s ), respectively. However, the definition of spreading edges is not without limitations. In partic-ular, spreading edges do not allow for the consideration of conditional impactspreading. For example, in Fig. 2B, the impact can spread from the reactionnode R to the reaction node R , if the reaction node R has been alreadyinactive; however, this definition is not applicable to such a situation.The branching process approximation (i.e., the estimation using Eq. (5)) alsopossesses limitations, such as the assumption of network tree (or less modular)structures, as explained in the previous study [23]. The existence of cycles maylead to an overestimation of the number of offspring for each reaction node15ecause some offspring of a progenitor (reaction node) may have already beeninactivated due to cycle structures; thus, the branching process approxima-tion may overestimate the impact degree distributions. Such overestimationof the impact degree distributions is observed in the case of multiple knockoutsbecause the error is amplified through a power of the probability generatingfunction of the impact degree (i.e., Eq. (6)) (see the solid lines in Figs. 5B and6B). On the other hand, however, the number of offspring may also be underes-timated because the definition of spreading edges does not consider conditionalimpact spreading (see Sec. 3.1 for details). The estimation of offspring numberis influenced by these 2 effects; therefore, the difficulty in estimating offspringnumber is also a limitation of the branching process approximation.Analysis based on random family forest (i.e., the estimation using Eq. (7)) alsohas limitations such as an overestimation and/or underestimation of the im-pact degree distributions. For example, due to independence, random familyforests do not consider the integration of impact spreading initiated by initialimpacts on metabolic networks. In such cases, the impact degree distributionmay be overestimated. In contrast, the impact degree distribution may alsobe underestimated because the modeling based on random family forests doesnot consider the synergetic effect due to multiple knockouts; both computa-tional (e.g., [7]) and experimental studies (e.g., [29, 31]) have reported suchsynergetic effects associated with multiple knockouts, indicating that genes(or reactions) not essential for growth, as in the case of a single knockout,are identified as essential genes (or reactions) when multiple knockouts areconsidered. This effect is slightly related to conditional impact spreading (seeSec. 3.1 for details). In Fig. 2B, for example, the impact does not spread ifeither the reaction R or the reaction R is inactivated (i.e., in the case of asingle knockout). However, the impact spreading occurs when both reactions R and R are inactivated (i.e., in the case of double knockouts). Thus, theunderestimation of the impact degree distributions (see the dashed lines inFigs. 5C and 6C) might be observed. The estimation of the impact degreedistribution obtained in cases of multiple knockouts, when based on randomfamily forests, is also complicated by these effects.In order to make better predictions, analysis based on branching process maybe improved by considering additional assumptions other than static metabolicnetwork structures For example, it may be useful to take into account the as-sumption of variable propagations in the branching process [39], in which themean of offspring distributions differs at each propagation stage. For this mod-ification, we would need to numerically obtain the offspring distributions ateach propagation stage after the initial impact. Although this procedure is lessadvantageous from the viewpoint of computational costs, such an improvementmay become an important topic in the future.Although the analysis based on branching process approximation possesses16everal limitations, as explained above, its validity has been confirmed overallusing real-world metabolic networks. This may be the result of the randomnessor neutrality in metabolic network structures. Several analytical [40, 41] andtheoretical studies [42,43] suggest that the structure of metabolic networks canbe determined through simple evolutionary processes and it is less modular(or more random) than previously thought. Our results support the validityof the application of branching processes to the estimation of impact degreedistributions.The theoretical estimation of impact degree distributions is helpful for bothexperimental and computational studies on multiple knockouts because theevaluation of the impact of multiple knockouts is expensive. For example,it is difficult to numerically obtain the impact degree distributions by theknockout of more than 3 reactions, even when an efficient algorithm [33] isused. This is because of the combinatorial complexity; thus, we could onlytest cases in which of fewer than 4 reactions are simultaneously are knockedout. The analysis based on branching processes can easily predict the impactdegree distributions obtained as a result of multiple knockouts using eitherthe structure of metabolic networks or the impact degree distributions forsingle knockout, which can be easily calculated using the efficient algorithm[33]; however, some errors may be observed due to several limitations withincrease in the number of disrupted reactions (i.e., in the case of high-orderknockouts). Such combinatorial complexity is also observed in experimentalstudies on multiple knockouts (e.g., [29, 30]). In particular, it is hard to findthe set of reactions significantly affecting metabolic dynamics. Deutscher etal. [7] also mentioned such a problem due to the combinatorial complexity inexperimental studies. Our theoretical approach may be able to support suchexperimental studies by using computational approaches, as introduced in Sec.1, as well as studies of robustness in metabolic networks. Acknowledgments
This work was supported by a Grant-in-Aid for Young Scientists (A) from theJapan Society for the Promotion of Science (JSPS) (no. 25700030). K.T. waspartly supported by Chinese Academy of Sciences Fellowships for Young In-ternational Scientists (no. 2012Y1SB0014), and the International Young Sci-entists Program of the National Natural Science Foundation of China (no.11250110508). T.A. was partly supported by JSPS, Japan (Grants-in-Aid22240009 and 22650045). 17 eferences [1] J. Stelling, U. Sauer, Z. Szallasi, F. J. Doyle, J. Doyle, Robustness of cellularfunctions, Cell 118 (2004) 675–685.[2] H. Kitano, Towards a theory of biological robustness, Mol. Syst. Biol. 3 (2007)137.[3] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe and M. Hirakawa, KEGG forrepresentation and analysis of molecular networks involving diseases and drugs,Nucleic Acids Res. 38 (2010) D355–D360.[4] I. M. Keseler, et al., EcoCyc: a comprehensive database of
Escherichia coli biology, Nucleic Acids Res. 39 (2011) D583–D590.[5] J. S. Edwards, B. O. Palsson, Robustness analysis of the
Escherichia coli metabolic network, Biotechnol. Prog. 16 (2000) 927–939.[6] D. Segr`e, D. Vitkup, G. M. Church, Analysis of optimality in natural andperturbed metabolic networks, Proc. Natl. Acad. Sci. USA 99 (2002) 15112–15117.[7] D. Deutscher, I. Meilijson, S. Schuster, E. Ruppin, Can single knockoutsaccurately single out gene functions? BMC Bioinformatics 2 (2008) 50.[8] J. A. Papin, J. Stelling, N. D. Price, S. Klamt, S. Schuster, B. O. Palsson,Comparison of network-based pathway analysis methods, Trends Biotechnol. 22(2004) 400–405.[9] T. Wilhelm, J. Behre, S. Schuster, Analysis of structural robustness of metabolicnetworks, Syst. Biol. 1 (2004) 114–120.[10] J. Behre, T. Wilhelm, A. von Kamp, E. Ruppin, S. Schuster, Structuralrobustness of metabolic networks with respect to multiple knockouts, J. Theor.Biol. 252 (2008) 433–441.[11] V. Acu˜na, F. Chierichetti, V. Lacroix, A. Marchetti-Spaccamela, M. F. Sagot,L. Stougie, Modes and cuts in metabolic networks: complexity and algorithms,Biosystems 95 (2009) 51–60.[12] A. P. Burgard, P. Pharkya, C. D. Maranas, OptKnock: a bilevel programmingframework for identifying gene knockout strategies for microbial strainoptimization, Biotechnol. Bioeng. 84 (2003) 647–657.[13] U. U. Haus, S. Klamt, T. Stephen, Computing knock-out strategies in metabolicnetworks, J. Comput. Biol. 15 (2008) 259–268.[14] S. Klamt, E. D. Gilles, Minimal cut sets in biochemical reaction networks,Bioinformatics 20 (2004) 226–234.[15] A. Larhlimia,-S. Blachona, J. Selbiga, Z. Nikoloski, Robustness of metabolicnetworks: a review of existing definitions, BioSystems 106 (2011) 1–8.
16] T. Handorf, O. Ebenh¨oh, R. Heinrich, Expanding metabolic networks: scopesof compounds, robustness, and evolution, J. Mol. Evol. 61 (2005) 498–512.[17] Z. Li, R-S. Wang, X-S, Zhang, L. Chen, Detecting drug targets with minimumside effects in metabolic networks, IET Syst. Biol. 3 (2009) 523–533.[18] P. Sridhar, B. Song, T. Kahveci, S. Ranka, Mining metabolic networks foroptimal drug targets, in: Proc. Pacific Symposium on Biocomputing 2008, 2008,291–302.[19] T. Tamura, K. Takemoto, T. Akutsu, Finding minimum reaction cuts ofmetabolic networks under a Boolean model using integer programming andfeedback vertex sets, Int. J. Knowl. Discov. Bioinform. 1 (2009) 14–31.[20] N. Lemke, F. Her´edia, C. K. Barcellos, A. N. dos Reis, J. C. M. Mombach,Essentiality and damage in metabolic networks, Bioinformatics 20 (2004) 115–119.[21] A. G. Smart, L. A. N. Amaral, J. M. Ottino, Cascading failure and robustnessin metabolic networks, Proc. Natl. Acad. Sci. USA 105 (2008) 13223–13228.[22] D. Jiang, S. Zhou, Y-P. P. Chen. Compensatory ability to null mutation inmetabolic networks, Biotechnol. Bioeng. 103 (2009) 361–369.[23] K. Takemoto, T. Tamura, Y. Cong, W.-K. Ching, J.-P. Vert, T. Akutsu,Analysis of the impact degree distribution in metabolic networks using branchingprocess approximation. Physica A 391 (2012) 379–387.[24] D.-S. Lee, K.-I. Goh, B. Khang, D. Kim, Branching process approach toavalanche dynamics on complex networks, J. Korean Phys. Soc. 44 (2004) 633–637.[25] A. Saichev, A. Helmestetter, D. Sornette, Power-law distributions of offspringand generation numbers in branching models of earthquake triggering, Pure Appl.Geophys. 162 (2005) 1113–1134.[26] I. Dobson, B. A. Carreras, D. E. Newman, A branching process approximationto cascading load-dependent system failure, In: Proc. 37th Annual HawaiiInternational Conference on System Sciences (2004).[27] I. Dobson, B. A. Carreras, D. E. Newman, A loading-dependent model ofprobabilistic cascading failure, Probab. Eng. Inform. Sc. 19 (2005) 15–31.[28] J. Kim, I. Dobson, Approximating a loading-dependent cascading failure modelwith a branching process, IEEE T. Reliab. 59 (2010) 691–699.[29] D. Deutscher, I. Meilijson, M. Kupiec, E. Ruppin, Multiple knockout analysis ofgenetic robustness in the yeast metabolic network, Nat. Genet. 38 (2006) 993–998.[30] K. Nakahigashi, et al., Systematic phenome analysis of
Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism,Mol. Syst. Biol. 5 (2009) 306.