[PDF] Theoretical estimation of metabolic network robustness against multiple reaction knockouts using branching process approximation

Abstract

In our previous study, we showed that the branching process approximation is useful for estimating metabolic robustness, measured using the impact degree. By applying a theory of random family forests, we here extend the branching process approximation to consider the knockout of {\it multiple} reactions, inspired by the importance of multiple knockouts reported by recent computational and experimental studies. In addition, we propose a better definition of the number of offspring of each reaction node, allowing for an improved estimation of the impact degree distribution obtained as a result of a single knockout. Importantly, our proposed approach is also applicable to multiple knockouts. The comparisons between theoretical predictions and numerical results using real-world metabolic networks demonstrate the validity of the modeling based on random family forests for estimating the impact degree distributions resulting from the knockout of multiple reactions.

Full PDF

aa r X i v : . [ q - b i o . M N ] J u l Theoretical estimation of metabolic networkrobustness against multiple reaction knockoutsusing branching process approximation

Kazuhiro Takemoto a , ∗ Takeyuki Tamura b Tatsuya Akutsu b a Department of Bioscience and Bioinformatics, Kyushu Institute of Technology,Kawazu 680-4, Iizuka, Fukuoka 820-8502, Japan b Bioinformatics Center, Institute for Chemical Research, Kyoto University,Gokasho, Uji, Kyoto, 611-0011, Japan

Abstract

In our previous study, we showed that the branching process approximation is usefulfor estimating metabolic robustness, measured using the impact degree . By applyinga theory of random family forests, we here extend the branching process approxi-mation to consider the knockout of multiple reactions, inspired by the importanceof multiple knockouts reported by recent computational and experimental studies.In addition, we propose a better deﬁnition of the number of oﬀspring of each re-action node, allowing for an improved estimation of the impact degree distributionobtained as a result of a single knockout. Importantly, our proposed approach is alsoapplicable to multiple knockouts. The comparisons between theoretical predictionsand numerical results using real-world metabolic networks demonstrate the validityof the modeling based on random family forests for estimating the impact degreedistributions resulting from the knockout of multiple reactions.

Key words:

Metabolic network, Branching process, Cascading failure

PACS:

Robustness is an important concept for understanding how living organismsadapt to changing environments, as well as for understanding how they are ∗ Corresponding author.

Email address: [email protected] (Kazuhiro Takemoto).

Preprint submitted to Elsevier 8 October 2018 ble to survive when carrying mutated genes [1, 2]. In particular, metabolicrobustness is of increasing interest not only to researchers in the ﬁeld of ba-sic biology, but also to those in biotechnology and medical research becausemetabolic processes are essential for physiological functions, and are respon-sible for maintaining life.The development of high-throughput methods has facilitated the collectionand compilation of large metabolic network datasets, which are stored indatabases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG)[3] and the Encyclopedia of Metabolic Pathways (MetaCyc) [4]. In recentyears, numerous methods and measures have been developed for analyzingmetabolic robustness in the context of gene/reaction knockout by using avail-able metabolic network data.Many of these methods and measures are based on ﬂux balance analysis (FBA). Edwards and Palsson used the change in the optimal objective functionvalue (e.g., growth rate) as a measure of robustness against the change in aparticular reaction [5]. Segr`e et al. [6] proposed the minimization of metabolicadjustment (MOMA) method, which predicts the ﬂux vectors by minimizingthe Euclidean distance between the mutant and wild type. Deutscher et al. [7]proposed another measure by combining FBA with the Shapley value in gametheory. With respect to FBA, elementary ﬂux modes (EFMs) have also beenused to analyze the robustness of metabolic networks, where an EFM is aminimal set of reactions that can operate at the steady state [8]. Wilhelm etal. [9] proposed a measure based on the numbers of EFMs before and afterknockout, which was later extended to include the knockout of multiple re-actions [10]. In order to evaluate robustness for the production of a speciﬁctarget compound(s), several studies have used a minimum reaction cut basedon FBA and/or EFM [11–14], which involves a minimum set of reactions (orenzymes), the removal of which leads to prevention of the production of aspeciﬁc set of compounds. Other approaches based on FBA/EFM have alsobeen implemented [15].Boolean modeling is an alternative way to model metabolic networks, wherebythe activity of each reaction or compound is represented by either 0 (inactive)or 1 (active) and reactions and compounds are modeled as AND nodes and ORnodes respectively. Handorf et al. [16] introduced the concept of scope basedon Boolean modeling and applied it to analyses of the robustness of metabolicnetworks. Li et al. [17], Sridhar et al. [18], and Tamura et al. [19] developedinteger programming-based methods for determining the minimum reactioncut under Boolean models. Lemke et al. [20] deﬁned the damage as the numberof reactions inactivated by the knockout of a single reaction under a Booleanmodel. Smart et al. [21] reﬁned the concept of damage by introducing the topological ﬂux balance (TFB) criterion. Jiang et al. deﬁned the impact degree as the number of reactions inactivated by knockout of a speciﬁed reaction [22]2nder a Boolean model. Although there are some diﬀerences in the treatmentof reversible reactions, the damage and the impact degree are very similarconcepts.To date, most studies have focused on the prediction and/or accuracy of ro-bustness measures but have given less consideration to the distribution of suchmeasures. Lemke et al. [20] analyzed the distribution of damage using com-puter simulation. Smart et al. [21] performed a similar analysis. In addition,they applied percolation theory and branching processes to the analysis of thedistribution of cluster sizes of damaged subnetworks [21]; however, they didnot explicitly estimate damage distribution (i.e., impact degree distribution).Until recently, theoretical frameworks for estimating the tolerance of metabolicnetworks to various failures were poorly established. Motivated by this, in ourprevious study [23], we analyzed the distribution of impact degree triggered byrandom knockout of a single reaction using a branching process theory [24,25].By treating the propagation of the impact triggered by the knockout of areaction as a branching process approximation, the relevance of which hadbeen shown in the context of loading-dependent cascading failure [26–28],we demonstrated that the branching process model (or theory) reﬂects theobserved impact degree distributions. In addition, Lee et al. [32] also recentlydemonstrated the use of a Boolean model and a theory of branching processin this context.As above, most previous studies focused on the impact of a single knockout.In recent years, however, multiple-knockout experiments have been activelyperformed, and have shown new interesting results on metabolic robustness,such as synergetic eﬀects resulting from multiple knockouts [29–31]. Therefore,computational and theoretical frameworks need to be extended to includemultiple knockouts. For example, Deutscher et al. [7] discussed the impactof multiple knockouts in yeast metabolism based on the Shapley value fromgame theory. Tamura et al. [33] proposed an eﬃcient method for computingmetabolic robustness in the context of impact degree. However, theoreticalapproaches remain incomplete.In this study, by extending the branching process approximation proposed inour previous study [23], we show that the branching process approximation(speciﬁcally, the assumption of a random family forest, which is a collection offamily trees) is also useful for estimating the distribution of the impact degreetriggered by the random knockout (or disruption) of multiple reactions.3

Impact Degree

Here, we brieﬂy review the impact degree [22] and its extensions [19, 33]. The impact degree was originally proposed by Jiang et al. as a measure of theimportance of each reaction in a metabolic network [22], and is deﬁned as thenumber of inactivated reactions caused by the knockout of a single reaction.Since the eﬀect of cycles was not considered in their study, Tamura et al.extended the impact degree so that the eﬀect of cycles is taken into accountby introducing the maximal valid assignment concept [19]. Furthermore, theyextended it to cope with the knockout of multiple reactions [33].Let V c = { C , . . . , C m } and V r = { R , . . . , R n } be a set of compound nodes anda set of reaction nodes respectively, where V c ∩ V r = {} . A metabolic network is deﬁned as a bipartite directed graph G ( V c ∪ V r , E ) in which each edge isdirected either from a node in V c to a node in V r , or from a node in V r to anode in V c . Each of the included reactions and compounds takes 1 of 2 states:0 (inactive) or 1 (active).The impact degree for the knockout of multiple reactions is computed as fol-lows [33]. Let V ko = { R i , . . . , R i D } be a set of reactions that have been knockedout. We start with the global state, such that all compounds are active (i.e., C i = 1 for all C i ∈ V c ) and all reactions except for those in V ko are active (i.e., R i = 1 for all R i ∈ V r − V ko and R i = 0 for all R i ∈ V ko ). Then, we updatethe states of reactions and compounds using the following rules.(1) A reaction is inactivated if any predecessor (i.e., substrate) or successor(i.e., product) is inactive.(2) A compound is inactivated if all predecessors or all successors are inactive.We repeat this procedure until reaching a stable global state, which is deter-mined uniquely regardless of the order of updates [19]. The impact degree for V ko is the number of inactive reactions (i.e., reactions with value 0) in the sta-ble global state. Although this procedure simultaneously gives the deﬁnitionand algorithm for the impact degree, Tamura et al. developed a much moreeﬃcient algorithm to compute it [33].Fig. 1 shows an example of a metabolic network. If R is knocked out, theother reactions remain active and thus the impact degree is 1. If R is knockedout, C is inactivated and then R is inactivated. However, C and C remainactive, and thus the impact degree is 2. If both R and R are knocked out, C is inactivated, and R and R are inactivated. Since R , R , R , R , R areinactive in the stable global state, the resulting impact degree is 5.4 R R R C R C C R C R C C C Fig. 1. Example of a metabolic network. Circles and boxes correspond to compoundsand reactions, respectively.

We have extended the branching process approximation, previously reportedin [23], to include the estimation of the impact degree distributions obtainedas a result of the knockout of multiple reactions.The branching process [35] is a stochastic process in which each progenitorgenerates oﬀspring according to a ﬁxed probability distribution called theoﬀspring distribution. Since the propagation of an impact on a network is es-sentially similar to cascading failures, which is a sequence of failures caused byan accident, branching process approximation is useful for estimating impactdegree distributions, although it requires the assumption of a network treestructure (see Sec. 5 for details of model limitations).

We need to deﬁne the number of oﬀspring for each reaction node in metabolicnetworks in order to analyze the impact degree distributions using the branch-ing process approximation.To obtain the number of oﬀspring, in the previous study [23], we used thereaction network obtained as the unipartite projection of the metabolic net-work, where we draw an edge from reaction A to reaction B when at least1 product of A is a substrate of B (see Sec 3.1 in [23] for details). Assumingthat the impact spreads though reactions whose substrates are synthesized viaunique metabolic reactions (i.e., reaction nodes with the indegree of 1) whenassuming tree structures of networks, we deﬁned the number d i of oﬀspringfor reaction node i in metabolic networks as follows: d i = k out i if k in i = 1 and 0otherwise, where k out i and k in i are the outdegree and indegree of reaction node i in the reaction network, respectively.However, this deﬁnition is not appropriate when diﬀerent metabolic networks5Figs. 2A and 2B) are projected into the same reaction network (Fig. 2C). Inthis case, the similar oﬀspring distribution (or potential of spreading) is deﬁnedbetween these diﬀerent metabolic networks although the tendency of impactspreading is clearly diﬀerent between the networks. For example, in particular,we consider the knockout of the reaction R (i.e., the inactivation of R ) inFig. 2. In Fig. 2A, the reaction R is also inactive (i.e., the impact spreads)after the inactivation of the reaction R because it requires the compound C synthesized through the reaction R . On the other hand, in Fig. 2B, theimpact does not spread because the compound C required by the reaction R can be synthesized through the reaction R even if the reaction R is inactive. R R R R R R R R R (A)(B) C C C C C (C) Unipartite projection

Fig. 2. Example illustrating how diﬀerent metabolic networks (A and B) are con-verted to the same reaction network (C) through the unipartite projection. Circlesand boxes correspond to compounds and reactions, respectively. Lightning boltsindicate the impacts.

To distinguish between these cases, we consider spreading edges that con-tributes to the impact spreading on reaction networks. In particular, spreadingedges are deﬁned as directed edges between a reaction node pair (i.e., sourceand target nodes) whose interjacent chemical compounds are synthesized bythe source reaction only. For example, in Fig. 2A, the interjacent compound(i.e., C ) between the reaction nodes R and R is generated through the reac-tion R only; thus, the directed edge from the reaction node R to the reactionnode R in Fig. 2C is deﬁned as a spreading edge. On the other hand, in Fig.2B, the interjacent compound C is obtained either through the reactions R or R ; thus, the directed edge from the reaction node R to the reaction node R in Fig. 2C is not regarded as a spreading edge.After ﬁnding the spreading edges in reaction networks according to this deﬁ-nition, the number of oﬀspring of a reaction node is deﬁned as the number ofout-going spreading edges. For example, the number of oﬀspring of the reac-tion node R and R in Fig. 2A are both 1; however, they are both 0 in Fig.2B. In this manner, we can determine the diﬀerence in the tendency of impactspreading between these metabolic networks.6e ﬁnally obtain the oﬀspring distribution P ( d ) from an empirical metabolicnetwork as P ( d ) = 1 N N X i =1 δ ( d, d i ) , (1)where N corresponds to the total number of reaction nodes. The function δ ( x, y ) is the Kronecker’s delta function, and it returns 1 if x = y and 0otherwise. Here, we explain the estimation method for the impact degree distributionobtained as a result of the knockout of either a single reaction or multiplereactions, based on a theory of branching processes.

We previously investigated the impacts of a single knockout [23]; here, webrieﬂy summarize these methods and ﬁndings as a basis for extending ourapproach to include cases in which multiple reactions are knocked out.In this study, we consider a branching process model (i.e., the empirical modelin Ref. [23]) using empirical oﬀspring distributions, obtained using Eq. (1).Because the impact degree can be regarded as the total number of oﬀspringthrough a branching process [23], we obtained the distribution of the totalnumber of oﬀspring to estimate the impact degree distribution. However, thebranching process model we described previously does not count the ﬁrst im-pact (i.e., progenitor) although counting the ﬁrst impact is necessary for esti-mating the impact degree distribution obtained as a result of the knockout ofmultiple reactions. Thus, we estimate the distribution of the impact degree,including the ﬁrst impact, triggered by the knockout of a single reaction, asfollows.Let F ( s ) be the probability generating function of the impact degree r (i.e.,the total number of oﬀspring, including the progenitor); the function F ( s )satisﬁes the recursive relation [34–36]: F ( s ) = sf ( F ( s )) , (2)where f ( s ) denotes the probability generating function of the number d of7ﬀspring of each reaction node: f ( s ) = d max X d =0 P ( d ) s d , (3)where d max is the maximum number of oﬀspring.Using the formula reported by Burman and Lagrange and the relation of P ( r ) = 1 r ! d r F ( s ) ds r (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) s =0 , (4)the distribution P ( r ) (i.e., impact degree distribution) is derived from theabove implicit equation as the following explicit equation [34]: P ( r ) = 1 r ! " d r − ds r − ( f ( s )) r s =0 . (5)The impact degree distributions obtained as a result of the random knockoutof a single reaction are estimated through this equation. In this study, we assume the impact spreading triggered by the knockout ofmultiple reactions is represented by the independent branching process, givenan initial number of k individuals (progenitors; i.e., the random family forest,which is a collection of k family trees) [36].Because the branching process with k initial progenitors can be deﬁned as k independent copies of a branching process with a single initial progenitor [36],the impact degree triggered by the knockout of multiple reactions can be de-scribed as the sum of the total number of oﬀspring through a branching pro-cess with a single initial knockout. Thus, the probabilistic generating function F k ( s ) of the impact degree, which is induced by the knockout of k reactions, isderived using the probabilistic generating function F ( s ) of the impact degreethat results from a single knockout. This is demonstrated as follows: F k ( s ) = [ F ( s )] k . (6)To obtain the probabilistic generating function F ( s ) of the impact degreetriggered by a single knockout, we consider 2 cases. In the ﬁrst case, we use8n estimated value for F ( s ) that is derived using Eq. (5): F ( s ) = r max X r =0 P ( r ) s r . (7)In the second case, we use an empirical value of F ( s ) (i.e., the probabilis-tic generating function calculated according to Eq. (7), using the empirical P ( r ) computed by numerical simulations). Using the empirical F ( s ), we canpurely evaluate the validity of the assumption of random family forests. r max corresponds to the maximum impact degree in the empirical P ( r ).Finally, the impact degree distribution P k ( r ) obtained as a result of the knock-out of k reactions is estimated as P k ( r ) = the coeﬃcient of s r in F k ( s ) . (8) We evaluated the eﬃciency of the above estimation methods for the impactdegree distributions using real-world metabolic networks of several species.We focused on 2 bacteria, (i.e.,

Escherichia coli and

Bacillus subtilis ), and 2 eu-karyotes, (i.e.,

Saccharomyces cerevisiae (yeast) and

Homo sapiens (human)),whose metabolic pathways have been well characterized using experimentalapproaches in a previous study [23]. We downloaded metabolic network datafor each species from the KEGG database [3,37]. Each dataset was representedas bipartite networks, as shown in Fig. 1,.For these species, the impact degree distributions obtained as a result of theknockout of k reactions were numerically calculated using an eﬃcient algo-rithm [33]. In this study, we considered cases of k = 1 (i.e., single knockout)and k = 2 and 3 (i.e., multiple knockouts). We compared these observed im-pact degree distributions using the theoretical distributions that were obtainedas follows.According to Sec. 3.1, we constructed the reaction networks through the uni-partite projection of bipartite metabolic networks from the KEGG databaseand obtained the oﬀspring distributions from the reaction networks (Fig. 3).The number of oﬀspring follows a power-law-like distribution. This may bebecause of the heterogeneous (or scale-free) connectivity in metabolic net-works [38]. The oﬀspring distributions are slightly diﬀerent between the modi-ﬁed deﬁnition based on spreading edges and the previous deﬁnition in Ref. [23].Table 1 summarizes the parameters extracted from the reaction networks, and9 able 1Parameters extracted from real-world reaction networks. The character µ previous and µ modiﬁed are the mean number of oﬀspring of a reac-tion node estimated using our previously described deﬁnition [23] and the modiﬁeddeﬁnition described here based on spreading edges (this study), respectively.Species µ previous µ modiﬁed Escherichia coli

Bacillus subtilis

937 2799 0.47 0.58

Saccharomyces cerevisiae

856 2328 0.48 0.59

Homo sapiens it shows that the mean number of oﬀspring, an important parameter for de-scriptions of branching processes, is either overestimated or underestimatedusing the previously described deﬁnition, which is in contrast to the use ofour modiﬁed deﬁnition presented here. The cases of

E. coli and

H. sapiens represent µ previous > µ modiﬁed , whereas the cases of B. subtilis and

S. cerevisiae show µ previous < µ modiﬁed . These diﬀerences in oﬀspring distributions aﬀect theaccuracy of predicting the impact degree distributions (Table 2). Note thatthe number of nodes and the mean number of oﬀspring are slightly diﬀer-ent between the previous study [23] and this study because of the removal ofredundant metabolic reactions in the database. As reported in our previous study [23], the branching process approximationis useful for estimating the impact degree distributions (Fig. 4). Because ofthe subcritical case (i.e., µ < µ , for example, the total number of oﬀspring (i.e., impact degree) r is distributed according to the Borel distribution [26]: P ( r ) = ( µr ) r − e − µr /r !.Using Stirling’s formula (i.e., r ! ≈ √ πrr r e − r ), the above equation leads tothe approximation P ( r ) ∝ r − / e − r (ln µ − µ +1) . (9)When µ = 1 (i.e., the critical case), the impact degree follows the power-law distribution with the exponent of − / P ( d ) ∝ /d γ +1 , where γ < P ( r ) ∝ /r /γ [25]. 10 -3 -2 -1 -3 -2 -1 -4 -3 -2 -1 -4 -3 -2 -1 R e l a t i v e f r equen cy R e l a t i v e f r equen cy A BC D

Fig. 3. The oﬀspring distributions of

Escherichia coli (A),

Bacillus subtilis (B),

Saccharomyces cerevisiae (C), and

Homo sapiens (D). The open circles and crossesindicate the distributions obtained using the modiﬁed and previous deﬁnitions ofthe number of oﬀspring, respectively.

To evaluate the goodness of ﬁt between the empirical distributions and theo-retical distributions from Eq. (5), we performed the Kolmogorov-Smirnov (KS)test (Table 2). The KS test shows that the theoretical distributions estimatedusing the branching process approximation are in good agreement with em-pirical distributions of real-world metabolic networks and that the estimationusing the modiﬁed deﬁnition of the number of oﬀspring is better than thatusing the previous deﬁnition. In particular, the poor agreement between theimpact degree distributions of the

H. sapiens metabolic network calculatedusing the theoretical estimation and real data, as reported in the previousstudy [23], was improved as a result of the consideration of spreading edges(see Sec. 3.1). Note that the KS distance and the P -value are diﬀerent betweenthe previous study [23] and this study because the previous study does notconsider the ﬁrst impact when calculating the impact degree.11 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degreeActualModified Previous

A BC D

Fig. 4. The impact degree distributions obtained after the knockout of a singlereaction in

Escherichia coli (A),

Bacillus subtilis (B),

Saccharomyces cerevisiae (C),and

Homo sapiens (D). The circles indicate observed data. The solid lines anddashed lines correspond to the theoretical distributions estimated by Eq. (5) usingthe modiﬁed and previous deﬁnitions of oﬀspring distributions, respectively.

We evaluated the eﬃciency of the branching process approximation whenknocking out 2 reactions (Fig. 5) and when knocking out 3 reactions (Fig.6). As explained in Sec. 3.2.2, we consider both the estimated F ( s ) and em-pirical F ( s ) when estimating the impact degree distributions resulting fromthe knockout of multiple reactions. We used the deﬁnition based on spread-ing edges when calculating the estimated F ( s ) due to its improved accuracy(Table 2). 12 able 2Comparison of prediction accuracy (i.e., goodness of ﬁt) for the impact degreedistributions between the previous deﬁnition and the modiﬁed deﬁnition of d i :Kolmogorov-Smirnov (KS) distance, deﬁned as sup x | R ( x ) − M ( x ) | , where R ( x )and M ( x ) are empirical distributions and theoretical distributions, respectively.The parenthetic values indicate the logarithmic P -values p from the KS test, de-ﬁned as − log ( p ). A smaller KS distance and logarithmic P -values indicate a highergoodness of ﬁt between the empirical distribution and theoretical distribution. Thehighlighted values correspond to the best accuracy.Species Previous version Modiﬁed version Escherichia coli

Bacillus subtilis

Saccharomyces cerevisiae

Homo sapiens

Overall, the branching process approximation (i.e., the modeling based onrandom family forests [36]) is also useful for estimating the impact degreedistribution for the knockout of multiple reactions, despite some exceptions.The larger impact degrees of the theoretical distribution calculated from

B.subtilis (Figs. 5B and 6B) using the estimated F ( s ) and the observed distri-butions are not similar. This weaker ﬁt is caused by the prediction accuracyin the branching process approximation (i.e., Eq. (5)) rather than the as-sumption of random family forests, illustrated by the fact that the theoreticaldistributions derived using the empirical F ( s ) are in good agreement with theobserved distributions.In the case of S. cerevisiae (Figs. 5C and 6C), our data suggest that the as-sumption of random family forests is less suitable for estimating the impactdegree distributions. In particular, the theoretical distributions obtained us-ing the empirical F ( s ) are narrower than the observed distributions althoughthe theoretical distributions calculated using the estimated F ( s ) are in goodagreement with the observed distributions. This result implies the existenceof a synergistic eﬀect of multiple knockouts on impact spreading in metabolicnetworks (i.e., the eﬀect of multiple knockouts is larger than expected basedon the assumptions regarding the combined eﬀects of independent impactspreading initiated by individual knockouts). By extending the branching process approximation in the previous study [23],we proposed a random family forest model for estimating the impact degree13 ctual

Using estimated F(s) Using emprical F(s)

A BC D -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree

Fig. 5. The impact degree distributions obtained following the knockout of 2 reac-tions in

Escherichia coli (A),

Bacillus subtilis (B),

Saccharomyces cerevisiae (C), and

Homo sapiens (D). The circles indicate observed data. The solid lines and dashedlines correspond to the theoretical distributions from Eq. (8) using the estimated F ( s ) and empirical F ( s ), respectively. distributions obtained as a result of multiple knockouts of metabolic networks,and demonstrated its validity using real-world metabolic network data. Inde-pendent of our previous study [23], Lee et al. [32] have also proposed a branch-ing process approach for Boolean bipartite networks of metabolic reactions;however, they only considered cases including a single knockout.In addition, we have provided a better deﬁnition of the number of oﬀspring ofa reaction node using the concept of spreading edges (see Sec. 3.1). Because ofthis modiﬁed deﬁnition, the poor predictions reported in our previous study[23] were improved here in the context of a single knockout (see Table 2).14 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree10 -9 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree 10 -9 -8 -7 -6 -5 -4 -3 -2 -1 R e l a t i v e f r equen cy Impact degree

A BC D

Actual

Using estimated F(s) Using emprical F(s)

Fig. 6. The impact degree distributions obtained as a result of the knockout of3 reactions of

Escherichia coli (A),

Bacillus subtilis (B),

Saccharomyces cerevisiae (C), and

Homo sapiens (D). The circles indicate observed data. The solid linesand dashed lines correspond to the theoretical distributions from Eq. (8) using theestimated F ( s ) and empirical F ( s ), respectively. However, the deﬁnition of spreading edges is not without limitations. In partic-ular, spreading edges do not allow for the consideration of conditional impactspreading. For example, in Fig. 2B, the impact can spread from the reactionnode R to the reaction node R , if the reaction node R has been alreadyinactive; however, this deﬁnition is not applicable to such a situation.The branching process approximation (i.e., the estimation using Eq. (5)) alsopossesses limitations, such as the assumption of network tree (or less modular)structures, as explained in the previous study [23]. The existence of cycles maylead to an overestimation of the number of oﬀspring for each reaction node15ecause some oﬀspring of a progenitor (reaction node) may have already beeninactivated due to cycle structures; thus, the branching process approxima-tion may overestimate the impact degree distributions. Such overestimationof the impact degree distributions is observed in the case of multiple knockoutsbecause the error is ampliﬁed through a power of the probability generatingfunction of the impact degree (i.e., Eq. (6)) (see the solid lines in Figs. 5B and6B). On the other hand, however, the number of oﬀspring may also be underes-timated because the deﬁnition of spreading edges does not consider conditionalimpact spreading (see Sec. 3.1 for details). The estimation of oﬀspring numberis inﬂuenced by these 2 eﬀects; therefore, the diﬃculty in estimating oﬀspringnumber is also a limitation of the branching process approximation.Analysis based on random family forest (i.e., the estimation using Eq. (7)) alsohas limitations such as an overestimation and/or underestimation of the im-pact degree distributions. For example, due to independence, random familyforests do not consider the integration of impact spreading initiated by initialimpacts on metabolic networks. In such cases, the impact degree distributionmay be overestimated. In contrast, the impact degree distribution may alsobe underestimated because the modeling based on random family forests doesnot consider the synergetic eﬀect due to multiple knockouts; both computa-tional (e.g., [7]) and experimental studies (e.g., [29, 31]) have reported suchsynergetic eﬀects associated with multiple knockouts, indicating that genes(or reactions) not essential for growth, as in the case of a single knockout,are identiﬁed as essential genes (or reactions) when multiple knockouts areconsidered. This eﬀect is slightly related to conditional impact spreading (seeSec. 3.1 for details). In Fig. 2B, for example, the impact does not spread ifeither the reaction R or the reaction R is inactivated (i.e., in the case of asingle knockout). However, the impact spreading occurs when both reactions R and R are inactivated (i.e., in the case of double knockouts). Thus, theunderestimation of the impact degree distributions (see the dashed lines inFigs. 5C and 6C) might be observed. The estimation of the impact degreedistribution obtained in cases of multiple knockouts, when based on randomfamily forests, is also complicated by these eﬀects.In order to make better predictions, analysis based on branching process maybe improved by considering additional assumptions other than static metabolicnetwork structures For example, it may be useful to take into account the as-sumption of variable propagations in the branching process [39], in which themean of oﬀspring distributions diﬀers at each propagation stage. For this mod-iﬁcation, we would need to numerically obtain the oﬀspring distributions ateach propagation stage after the initial impact. Although this procedure is lessadvantageous from the viewpoint of computational costs, such an improvementmay become an important topic in the future.Although the analysis based on branching process approximation possesses16everal limitations, as explained above, its validity has been conﬁrmed overallusing real-world metabolic networks. This may be the result of the randomnessor neutrality in metabolic network structures. Several analytical [40, 41] andtheoretical studies [42,43] suggest that the structure of metabolic networks canbe determined through simple evolutionary processes and it is less modular(or more random) than previously thought. Our results support the validityof the application of branching processes to the estimation of impact degreedistributions.The theoretical estimation of impact degree distributions is helpful for bothexperimental and computational studies on multiple knockouts because theevaluation of the impact of multiple knockouts is expensive. For example,it is diﬃcult to numerically obtain the impact degree distributions by theknockout of more than 3 reactions, even when an eﬃcient algorithm [33] isused. This is because of the combinatorial complexity; thus, we could onlytest cases in which of fewer than 4 reactions are simultaneously are knockedout. The analysis based on branching processes can easily predict the impactdegree distributions obtained as a result of multiple knockouts using eitherthe structure of metabolic networks or the impact degree distributions forsingle knockout, which can be easily calculated using the eﬃcient algorithm[33]; however, some errors may be observed due to several limitations withincrease in the number of disrupted reactions (i.e., in the case of high-orderknockouts). Such combinatorial complexity is also observed in experimentalstudies on multiple knockouts (e.g., [29, 30]). In particular, it is hard to ﬁndthe set of reactions signiﬁcantly aﬀecting metabolic dynamics. Deutscher etal. [7] also mentioned such a problem due to the combinatorial complexity inexperimental studies. Our theoretical approach may be able to support suchexperimental studies by using computational approaches, as introduced in Sec.1, as well as studies of robustness in metabolic networks. Acknowledgments

This work was supported by a Grant-in-Aid for Young Scientists (A) from theJapan Society for the Promotion of Science (JSPS) (no. 25700030). K.T. waspartly supported by Chinese Academy of Sciences Fellowships for Young In-ternational Scientists (no. 2012Y1SB0014), and the International Young Sci-entists Program of the National Natural Science Foundation of China (no.11250110508). T.A. was partly supported by JSPS, Japan (Grants-in-Aid22240009 and 22650045). 17 eferences [1] J. Stelling, U. Sauer, Z. Szallasi, F. J. Doyle, J. Doyle, Robustness of cellularfunctions, Cell 118 (2004) 675–685.[2] H. Kitano, Towards a theory of biological robustness, Mol. Syst. Biol. 3 (2007)137.[3] M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe and M. Hirakawa, KEGG forrepresentation and analysis of molecular networks involving diseases and drugs,Nucleic Acids Res. 38 (2010) D355–D360.[4] I. M. Keseler, et al., EcoCyc: a comprehensive database of

Escherichia coli biology, Nucleic Acids Res. 39 (2011) D583–D590.[5] J. S. Edwards, B. O. Palsson, Robustness analysis of the

Escherichia coli metabolic network, Biotechnol. Prog. 16 (2000) 927–939.[6] D. Segr`e, D. Vitkup, G. M. Church, Analysis of optimality in natural andperturbed metabolic networks, Proc. Natl. Acad. Sci. USA 99 (2002) 15112–15117.[7] D. Deutscher, I. Meilijson, S. Schuster, E. Ruppin, Can single knockoutsaccurately single out gene functions? BMC Bioinformatics 2 (2008) 50.[8] J. A. Papin, J. Stelling, N. D. Price, S. Klamt, S. Schuster, B. O. Palsson,Comparison of network-based pathway analysis methods, Trends Biotechnol. 22(2004) 400–405.[9] T. Wilhelm, J. Behre, S. Schuster, Analysis of structural robustness of metabolicnetworks, Syst. Biol. 1 (2004) 114–120.[10] J. Behre, T. Wilhelm, A. von Kamp, E. Ruppin, S. Schuster, Structuralrobustness of metabolic networks with respect to multiple knockouts, J. Theor.Biol. 252 (2008) 433–441.[11] V. Acu˜na, F. Chierichetti, V. Lacroix, A. Marchetti-Spaccamela, M. F. Sagot,L. Stougie, Modes and cuts in metabolic networks: complexity and algorithms,Biosystems 95 (2009) 51–60.[12] A. P. Burgard, P. Pharkya, C. D. Maranas, OptKnock: a bilevel programmingframework for identifying gene knockout strategies for microbial strainoptimization, Biotechnol. Bioeng. 84 (2003) 647–657.[13] U. U. Haus, S. Klamt, T. Stephen, Computing knock-out strategies in metabolicnetworks, J. Comput. Biol. 15 (2008) 259–268.[14] S. Klamt, E. D. Gilles, Minimal cut sets in biochemical reaction networks,Bioinformatics 20 (2004) 226–234.[15] A. Larhlimia,-S. Blachona, J. Selbiga, Z. Nikoloski, Robustness of metabolicnetworks: a review of existing deﬁnitions, BioSystems 106 (2011) 1–8.

16] T. Handorf, O. Ebenh¨oh, R. Heinrich, Expanding metabolic networks: scopesof compounds, robustness, and evolution, J. Mol. Evol. 61 (2005) 498–512.[17] Z. Li, R-S. Wang, X-S, Zhang, L. Chen, Detecting drug targets with minimumside eﬀects in metabolic networks, IET Syst. Biol. 3 (2009) 523–533.[18] P. Sridhar, B. Song, T. Kahveci, S. Ranka, Mining metabolic networks foroptimal drug targets, in: Proc. Paciﬁc Symposium on Biocomputing 2008, 2008,291–302.[19] T. Tamura, K. Takemoto, T. Akutsu, Finding minimum reaction cuts ofmetabolic networks under a Boolean model using integer programming andfeedback vertex sets, Int. J. Knowl. Discov. Bioinform. 1 (2009) 14–31.[20] N. Lemke, F. Her´edia, C. K. Barcellos, A. N. dos Reis, J. C. M. Mombach,Essentiality and damage in metabolic networks, Bioinformatics 20 (2004) 115–119.[21] A. G. Smart, L. A. N. Amaral, J. M. Ottino, Cascading failure and robustnessin metabolic networks, Proc. Natl. Acad. Sci. USA 105 (2008) 13223–13228.[22] D. Jiang, S. Zhou, Y-P. P. Chen. Compensatory ability to null mutation inmetabolic networks, Biotechnol. Bioeng. 103 (2009) 361–369.[23] K. Takemoto, T. Tamura, Y. Cong, W.-K. Ching, J.-P. Vert, T. Akutsu,Analysis of the impact degree distribution in metabolic networks using branchingprocess approximation. Physica A 391 (2012) 379–387.[24] D.-S. Lee, K.-I. Goh, B. Khang, D. Kim, Branching process approach toavalanche dynamics on complex networks, J. Korean Phys. Soc. 44 (2004) 633–637.[25] A. Saichev, A. Helmestetter, D. Sornette, Power-law distributions of oﬀspringand generation numbers in branching models of earthquake triggering, Pure Appl.Geophys. 162 (2005) 1113–1134.[26] I. Dobson, B. A. Carreras, D. E. Newman, A branching process approximationto cascading load-dependent system failure, In: Proc. 37th Annual HawaiiInternational Conference on System Sciences (2004).[27] I. Dobson, B. A. Carreras, D. E. Newman, A loading-dependent model ofprobabilistic cascading failure, Probab. Eng. Inform. Sc. 19 (2005) 15–31.[28] J. Kim, I. Dobson, Approximating a loading-dependent cascading failure modelwith a branching process, IEEE T. Reliab. 59 (2010) 691–699.[29] D. Deutscher, I. Meilijson, M. Kupiec, E. Ruppin, Multiple knockout analysis ofgenetic robustness in the yeast metabolic network, Nat. Genet. 38 (2006) 993–998.[30] K. Nakahigashi, et al., Systematic phenome analysis of

Escherichia coli multiple-knockout mutants reveals hidden reactions in central carbon metabolism,Mol. Syst. Biol. 5 (2009) 306.