An analytic approximation of the feasible space of metabolic networks
Alfredo Braunstein
DISAT, Politecnico di Torino, 10129 Torino, Italy; Human Genetics Foundation-Torino, 10126 Torino, Italy; and Collegio Carlo Alberto, 10024 Moncalieri, Italy

Anna Paola Muntoni
DISAT, Politecnico di Torino, 10129 Torino, Italy

Andrea Pagnani
DISAT, Politecnico di Torino, 10129 Torino, Italy; Human Genetics Foundation-Torino, 10126 Torino, Italy; and Istituto Nazionale di Fisica Nucleare (INFN), Via Pietro Giuria, 10125 Torino, Italy
Assuming a steady-state condition within a cell, metabolic fluxes satisfy an under-determined linear system of stoichiometric equations. Characterizing the space of fluxes that satisfy such equations along with given bounds (and possibly additional relevant constraints) is considered of utmost importance for the understanding of cellular metabolism. Extreme values for each individual flux can be computed with Linear Programming (as in Flux Balance Analysis), and their marginal distributions can be approximately computed with Monte Carlo sampling. Here we present an approximate analytic method for the latter task based on Expectation Propagation equations that does not involve sampling and can achieve much better predictions than other existing analytic methods. The method is iterative, and its computation time is dominated by one matrix inversion per iteration. With respect to sampling, we show through extensive simulation that it has some advantages, including computation time and the ability to efficiently fix empirically estimated distributions of fluxes.

∗ [email protected]

INTRODUCTION
The metabolism of a cell entails a complex network of chemical reactions performed by thousands of enzymes that continuously process intake nutrients to allow for growth, replication, defense and other cellular tasks [1]. Thanks to new high-throughput techniques and comprehensive databases of chemical reactions, large-scale reconstructions of organism-wide metabolic networks are nowadays available. Such reconstructions are believed to be accurate from a topological and stoichiometric viewpoint (e.g. the set of metabolites targeted by each enzyme, and their stoichiometric ratios). For the determination of reaction rates, large-scale constraint-based approaches have been proposed [2]. Typically, such methods assume a steady-state regime in which metabolite concentrations remain constant over time (mass-balance condition). A second type of constraint limits the reaction velocities and their directions. In full generality, the topology of a metabolic network is described in terms of the chemical relations between the M metabolites and the N reactions. In mathematical terms we can define an M × N stoichiometric matrix S whose rows contain the stoichiometric coefficients of the corresponding metabolites in all reactions. A positive (resp. negative) S_ij indicates that metabolite i is created (resp. consumed) by reaction j. Assuming mass balance and a limited interval of variation for the different reactions, we can cast the problem in terms of finding the set of fluxes ν ∈ R^N compatible with the following linear system of constraints and inequalities:

S ν = b,    (1)
ν^inf ≤ ν ≤ ν^sup,    (2)

where b ∈ R^M is the known set of intakes/uptakes, and the pair ν^inf, ν^sup represents the extremes of variability for the variables of our problem. Only in a few cases are these extremes experimentally accessible; in the remaining ones they are fixed to arbitrarily large values. It turns out that N ≥ M, and the system is typically under-determined.
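Eqs. (1)-(2) can be made concrete with a minimal Python sketch. The toy stoichiometric matrix, bounds, and helper below are invented for illustration and are not taken from any curated model:

```python
import numpy as np

# Hypothetical toy network (not from a curated model): metabolite A is
# produced by reaction 1 and consumed by reaction 2; metabolite B is
# produced by reaction 2 and consumed by reaction 3.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.zeros(2)               # steady state: no net intake/uptake here
v_inf = np.zeros(3)           # irreversible reactions
v_sup = np.full(3, 10.0)      # arbitrarily chosen upper bounds

def is_feasible(v, tol=1e-9):
    """Check the mass balance S v = b of Eq. (1) and the bounds of Eq. (2)."""
    return bool(np.allclose(S @ v, b, atol=tol)
                and np.all(v >= v_inf - tol) and np.all(v <= v_sup + tol))

print(is_feasible(np.array([2.0, 2.0, 2.0])))  # True: uniform flow through the chain
print(is_feasible(np.array([2.0, 1.0, 1.0])))  # False: metabolite A would accumulate
```

Any point of the feasibility polytope must pass both checks; here the polytope is simply the segment v1 = v2 = v3 in [0, 10].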
As an example, the RECON1 model of Homo Sapiens has N = 2469 fluxes (i.e. variables) and M = 1587 metabolites (i.e. equations). The mass-balance constraints and the flux inequalities encoded in Eqs. (1)-(2) define a convex bounded polytope, which constitutes the space of all feasible solutions of our metabolic system. The most widely used technique to analyze fluxes in large-scale metabolic reconstructions is Flux Balance Analysis (FBA) [3, 4], where a linear objective function, typically the biomass or some biological proxy of it, is introduced, and the problem reduces to finding the subspace of the polytope which optimizes the objective function. If this subspace consists of only one point, the problem can be efficiently solved using linear programming. FBA has been successfully applied in many metabolic models to predict specific phenotypes under specific growth conditions (e.g. bacteria in the exponential growth phase). However, if one is interested in describing more general growth conditions, or in other fluxes than the biomass [5], different computational strategies must be envisaged [6–8]. As long as no prior knowledge is considered, each point of the polytope is an equally viable metabolic phenotype of the biological system under investigation. Therefore, being able to sample high-dimensional polytopes becomes a theoretical problem with concrete practical applications. From a theoretical standpoint the problem is known to be computationally hard, and approximate message-passing (MP) techniques typically require the stoichiometric matrix S to be sparse and its rows to be statistically uncorrelated. Unfortunately, neither assumption is really fulfilled by large-scale metabolic reconstructions. To give an example, consider the rows of the stoichiometric matrix of the ecoli-core model [24]. The rows corresponding to adenosine diphosphate (ADP) and adenosine triphosphate (ATP) are strongly correlated, as both metabolites commonly appear in 11 reactions; the same applies to the intracellular water and hydrogen ion, which have 10 reactions in common.
For these reasons MP methods suffer from all kinds of convergence and accuracy problems. In this work we propose a new Bayesian inference strategy to analyze large-dimensional polytopes with unprecedented efficiency. The use of a Bayesian framework allows us to map the original problem of sampling the feasible space of solutions of Eqs. (1)-(2) into the inference problem of the joint distribution of metabolic fluxes. Linear and inequality constraints will be encoded within the likelihood and the prior probabilities, which via Bayes' theorem provide a model for the posterior P(ν|b). The goal of this work is to determine a tractable multivariate probability density Q(ν|b) able to accurately approximate the posterior even in the case of strongly row-correlated stoichiometric matrices. This strategy relies on an iterative and local refinement of the parameters of Q(ν|b) that falls into the class of Expectation Propagation (EP) algorithms. We report results of EP for representative state-of-the-art models of metabolic networks in comparison with the Hit-and-Run (HR) Monte Carlo estimate, showing that EP can be used to compute marginals in a fraction of the computing time needed by HR. We also show how the technique can be efficiently adapted to incorporate the estimated growth rate of a population of Escherichia Coli.

RESULTS

Formulation of the problem
We are going to formulate an iterative strategy to solve the problem of finding a multivariate probability measure over the set of fluxes ν compatible with Eqs. (1)-(2). For a vector of fluxes satisfying the bounds (2), we can define a quadratic energy function E(ν) whose minimum(s) lie on the assignments of variables ν satisfying the stoichiometric constraints in Eq. (1):

E(ν) = (1/2) (S ν − b)^T (S ν − b).    (3)

We define the likelihood of observing b given a set of fluxes ν as a Boltzmann distribution:

P(b|ν) = (β/2π)^{M/2} e^{−(β/2) (S ν − b)^T (S ν − b)},    (4)

where β is a positive parameter, the "inverse temperature" in statistical physics jargon, that governs the penalty for configurations of fluxes that are far from the minimum of the energy. In a Bayesian perspective one can consider the posterior probability P(ν|b) as:

P(ν|b) = P(b|ν) P(ν) / P(b),    (5)

where the prior

P(ν) = ∏_{n=1}^{N} ψ_n(ν_n) = ∏_{n=1}^{N} I(ν_n ∈ [ν_n^inf, ν_n^sup]) / (ν_n^sup − ν_n^inf)    (6)

enforces the bounds over the allowed range of fluxes. The function I(ν_n ∈ [ν_n^inf, ν_n^sup]) is an indicator function that takes value 1 if ν_n ∈ [ν_n^inf, ν_n^sup] and 0 otherwise. We finally obtain the following relation for the posterior:

P(ν|b) = (1/P(b)) (β/2π)^{M/2} e^{−(β/2) (S ν − b)^T (S ν − b)} ∏_{n=1}^{N} ψ_n(ν_n),    (7)

and eventually we will investigate the β → ∞ limit. Neglecting terms that do not depend on ν, the posterior takes the form

P(ν|b) ∝ e^{−(β/2) (S ν − b)^T (S ν − b)} ∏_{n=1}^{N} ψ_n(ν_n),    (8)

where we have not explicitly reported the normalization constant. By marginalization of Eq. (7) one can determine the marginal posterior P_n(ν_n|b) for each flux n ∈ {1, . . . , N}.
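The role of β in Eqs. (3)-(8) can be seen in a small numerical sketch (the toy network and the value of β below are arbitrary illustrations): configurations violating mass balance receive an exponentially suppressed posterior weight.

```python
import numpy as np

# Hypothetical toy network; beta is an arbitrary illustrative value
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.zeros(2)
beta = 1e3

def energy(v):
    """Quadratic energy of Eq. (3): E(v) = (1/2) (S v - b)^T (S v - b)."""
    r = S @ v - b
    return 0.5 * float(r @ r)

v_ok = np.array([2.0, 2.0, 2.0])   # satisfies S v = b exactly
v_bad = np.array([2.0, 1.0, 1.0])  # violates mass balance for one metabolite

print(energy(v_ok))                    # 0.0
print(np.exp(-beta * energy(v_bad)))   # e^{-500}: a vanishing posterior weight
```

In the β → ∞ limit the weight of any mass-balance-violating configuration vanishes, and the posterior concentrates on the feasible polytope.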
However, performing this computation naively would require the calculation of a multiple integral that is in principle computationally very expensive and cannot be performed analytically in an efficient way. A standard way of approximately computing P(ν|b) is through sampling methods, such as the Hit-and-Run (HR) technique. The accuracy obtained with HR depends of course on the number of samples, and sampling accurately can be very time consuming. In the following we develop an analytic approach to approximately compute marginal posteriors which is able to achieve results as accurate as HR sampling with a large number of sampled points, in a fraction of the computing time. But first, as a warm-up, we will describe a naive analytic method to approximately compute the marginal distributions P_n(ν_n|b).

A non-adaptive approach
As a first approximation one can think of replacing each exact prior ψ_n(ν_n) with a single Gaussian distribution

φ_n(ν_n; a_n, d_n) = e^{−(ν_n − a_n)² / (2 d_n)} / √(2π d_n),

whose statistics, i.e. the mean and the variance, are constrained to be equal to those of ψ_n(ν_n). That is,

a_n = ⟨ν_n⟩_{ψ_n},    d_n = ⟨ν_n²⟩_{ψ_n} − ⟨ν_n⟩²_{ψ_n},    n ∈ {1, . . . , N}.    (9)

We estimate the marginal posteriors from the distribution

Q(ν|b) = (1/Z_Q) e^{−(β/2) (S ν − b)^T (S ν − b)} ∏_{n=1}^{N} φ_n(ν_n; a_n, d_n),    (10)

Z_Q = ∫ d^N ν e^{−(β/2) (S ν − b)^T (S ν − b)} ∏_{n=1}^{N} φ_n(ν_n; a_n, d_n).    (11)

Notice that in this approximation the fluxes turn out to be unbounded. Marginals obtained by this strategy are shown against the Hit-and-Run Monte Carlo estimate in Figure 1 (cyan line) for 9 representative metabolic fluxes of one of the standard models for the red blood cell [25]. Marginals evaluated with this simple non-adaptive strategy differ significantly from the ones evaluated with the Monte Carlo sampling technique. In the following we will show how we can overcome this limitation by choosing different values for the means a and the variances d in Eq. (10), making use of the Expectation Propagation algorithm.

Expectation Propagation
Expectation Propagation (EP) [26] is an efficient technique to approximate intractable (i.e. impossible or impractical to compute analytically) posterior probabilities. EP was first introduced in the framework of statistical physics as an advanced mean-field method [27, 28] and further developed for Bayesian inference problems in the seminal work of Minka [26]. Let us consider the n-th flux and its corresponding approximate prior φ_n(ν_n; a_n, d_n). We define a tilted distribution Q^(n) as

Q^(n)(ν|b) ≡ (1/Z_{Q^(n)}) e^{−(β/2) (S ν − b)^T (S ν − b)} ψ_n(ν_n) ∏_{m≠n} φ_m(ν_m).    (12)

The important difference between the tilted distribution and the multivariate Gaussian Q(ν|b) is that all the intractable priors are approximated as Gaussian probability densities except for the n-th prior, which is treated exactly. For this reason we expect this distribution to be more accurate than Q(ν|b) regarding the estimate of the statistics of flux n, while keeping the computation of expectations feasible. Bearing in mind that it is the large number of exact priors (i.e. the distributions {ψ_i}_{i=1,...,N}) that makes the computation intractable, and not a single one, we have introduced only one exact intractable prior in Q^(n). One way of determining the unknown parameters a_n and d_n of φ_n(ν_n; a_n, d_n) is to require that the multivariate Gaussian distribution Q(ν|b) is as close as possible to the auxiliary distribution Q^(n)(ν|b). Intuitively, there are at least two possibilities to enforce this similarity: (i) matching the first and second moments of the two distributions; (ii) minimizing the Kullback-Leibler divergence D_KL(Q^(n) ‖ Q); these two methods coincide (see details in Supplementary Note 1).
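Since each tilted distribution combines a multivariate Gaussian with a single box prior, its marginal over ν_n is a Gaussian restricted to the flux bounds, whose first two moments are available in closed form. A minimal Python sketch using the standard truncated-Gaussian formulas (the paper's own implementation is in Matlab):

```python
import math

def trunc_gauss_moments(mu, var, lo, hi):
    """First two central moments of N(mu, var) restricted to [lo, hi]."""
    s = math.sqrt(var)
    a, b = (lo - mu) / s, (hi - mu) / s
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    Z = Phi(b) - Phi(a)                      # probability mass inside the bounds
    mean = mu + s * (phi(a) - phi(b)) / Z
    variance = var * (1.0 + (a * phi(a) - b * phi(b)) / Z
                      - ((phi(a) - phi(b)) / Z) ** 2)
    return mean, variance

print(trunc_gauss_moments(0.0, 1.0, -1.0, 1.0))  # mean 0, variance ~0.291 (< 1)
```

Truncation always shrinks the variance, which is what makes the moment-matching step below well defined.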
Thus, we aim at imposing the following moment matching conditions:

⟨ν_n⟩_{Q^(n)} = ⟨ν_n⟩_Q,    ⟨ν_n²⟩_{Q^(n)} = ⟨ν_n²⟩_Q,    (13)

from which we get a relation for the parameters a_n, d_n that is explicitly reported in the Methods section. EP consists in sequentially repeating this update step for all the other fluxes, iterating until numerical convergence is reached. Further technical details about the convergence are reported below. At the fixed point we directly estimate the marginal posteriors P_n(ν_n|b), for n ∈ {1, . . . , N}, from the marginalization of the tilted distribution Q^(n), which turns out to be a truncated Gaussian density in the interval [ν_n^inf, ν_n^sup] (see Supplementary Note 2). Unlike the non-adaptive approach, the EP algorithm determines the approximate prior density by trying to reproduce the effect that the true prior density has on variable ν_n, including the interaction of this term with the rest of the system. First, the information encoded in the stoichiometric matrix is surely encompassed in the computation of the means and variances of the approximation, since both the distributions Q^(n) and Q contain the exact expression of the likelihood. Secondly, the refinement of each prior also depends on the parameters of all the other fluxes. As an example of the accuracy of this technique, we report in Figure 1 (red line) the nine best marginals estimated by EP for the red blood cell against the results of Hit-and-Run Monte Carlo sampling. Fig. 1 suggests that this technique leads to a significant improvement over the non-adaptive approximation, as the plot shows a very good overlap between the distributions provided by HR and EP. The entire set of marginals and a comparison with a state-of-the-art message-passing algorithm [7] are reported in Supplementary Fig. 2.

Numerical results for large scale metabolic networks
This section is devoted to comparing the results of our algorithm against the outcomes of a state-of-the-art Hit-and-Run Monte Carlo sampling technique on three representative models of metabolic networks, namely the iJR904 [29], CHOLnorm [30] and RECON1 [31] models, for the Escherichia Coli, Cholinergic neuron and Homo Sapiens organisms respectively. In Supplementary Fig. 3 we report results for a larger set of models, all selected from the BiGG Models database [32].

[Figure 1 panels: HK, PGI, DPGase, Xu5PE, TA, GLC, PYR, INO, NADPH]
Figure 1. Marginal probability densities of nine fluxes of the red blood cell. The blue bars represent the result of the Monte Carlo estimate for T sampling points. The cyan line is the result of the non-adaptive Gaussian approximation, while the red line represents the Expectation Propagation estimate.

Experiments are performed as follows. First we preprocess the stoichiometric matrix of the model in order to remove all reactions involving metabolites that are only produced or only degraded [33]. After the pre-processing, we apply HR and EP, both implemented in Matlab or as Matlab libraries, to the reduced model. Let us explain how the two methods work. Starting from a point lying on the polytope, HR iteratively chooses a random direction and collects new samples along that direction such that they also reside in the solution space. In this work we use an optimized implementation of HR, called optGpSampler [6]. Regarding the HR simulations, we consider an increasing number of explored configurations T in most of the cases; for some specific models, i.e. very large networks, we explore an even larger number of points. Concerning the EP algorithm, we perform the same experiment setting the β parameter to the same large value for almost all models. In only one case (the RECON1 model) we encountered convergence problems and thus we decreased it. Numerical convergence of EP depends on the refinement of the parameters a and d or, more precisely, on the estimate of the marginal distributions of fluxes. At each iteration t we compute an error ε which measures how the approximate marginal distributions change in two consecutive iterations.
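In a sketch, such an error can be computed as follows (Python for illustration; the moment arrays are hypothetical placeholders):

```python
import numpy as np

def ep_error(m1_new, m1_old, m2_new, m2_old):
    """Maximum over fluxes of |change in first moment| + |change in second
    moment| between two consecutive EP iterations."""
    return float(np.max(np.abs(np.asarray(m1_new) - np.asarray(m1_old))
                        + np.abs(np.asarray(m2_new) - np.asarray(m2_old))))

# Hypothetical moment arrays for two fluxes: the second flux moved the most
print(ep_error([1.0, 2.0], [1.0, 1.5], [2.0, 5.0], [2.0, 4.0]))  # 1.5
```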
Formally, we define the error as the maximum over fluxes of the sum of the differences (in absolute value) of the mean and second moment of the marginal distribution in two consecutive iterations, that is

ε_t = max_n [ |⟨ν_n⟩^{t+1}_{Q^(n)} − ⟨ν_n⟩^t_{Q^(n)}| + |⟨ν_n²⟩^{t+1}_{Q^(n)} − ⟨ν_n²⟩^t_{Q^(n)}| ].

If ε_t is smaller than a predetermined target precision, the algorithm stops. To quantitatively compare the two techniques we report here the scatter plots of the variances and means of the approximate marginals computed via HR and EP. Moreover, we estimate the degree of correlation between the two sets of parameters by computing the Pearson product-moment correlation coefficient. Notice that we cannot access the exact marginals and that we assume the results obtained by HR to be exact only asymptotically. Thus our performances, both for the direct comparison of the means and variances and for the Pearson coefficient, should be considered accurate if they are approached by the Monte Carlo ones for an increasing number of explored points. The three large subplots in Fig. 2 show the results for Escherichia Coli, the Cholinergic neuron and Homo Sapiens respectively. For each organism, we report in the top-left panel the time spent by EP (straight line) and by HR (cyan points), and in the bottom-left panel the Pearson correlation coefficients. Both measures, time and correlation, are plotted as functions of the number of configurations T obtained from the HR algorithm. As shown in these plots, reaching a high-correlation regime requires a very large number of explored configurations, employing a computing time that is always several orders of magnitude larger than the EP running time. This is particularly striking in the case of the RECON1 model, for which we needed to run HR for about 20 days in order to reach results similar to the outcomes of EP, which converges in less than one hour on the same machine. To underline how EP seems to approach the HR results in the asymptotic limit, we report in the rest of the sub-figures the scatter plots of the means (top) and variances (bottom) of the marginals. On the y-axis we plot the EP means (variances) against the HR means (variances) for an increasing number of explored configurations, as indicated on the x-axis. Results clearly show that as T grows, the points (both means and variances) become more and more aligned with the identity line: not only are these measures highly correlated for large T, but they assume very similar values. This is remarkably appreciable in the results for the CHOLnorm model: for the smallest T the means in the scatter plots are quite unaligned, but for the largest T they almost lie on the identity line. Accordingly, the means are poorly correlated at the smallest T, while the Pearson correlation coefficient is close to 1 at the largest T.

Study of Escherichia Coli metabolism for a constrained growth rate
The EP formalism can efficiently deal with a slightly modified version of the problem of sampling metabolic networks. Suppose we have access to experimental measurements of the distribution of some fluxes under specific environmental conditions. We would like to embed this empirical knowledge in our algorithm by matching the posterior distribution of the measured fluxes with the empirical measurements. Within the EP scheme, this task corresponds to matching the first two moments (mean and variance) of the posteriors with those defined by the empirical measurements. With the inclusion of empirically established prior knowledge, we want to investigate how the experimental evidence is related to the metabolism at the level of reactions or, in other words, we want to determine how fluxes modify in order to reproduce the experiments. In this perspective, the EP scheme can easily accommodate additional constraints on the posteriors by modifying the EP update equations as outlined in Methods.

We have tested this variant of the EP algorithm on the iJR904 model of Escherichia Coli for a constrained growth rate. In fact, one of the few fluxes that are experimentally accessible is the biomass flux, often measured in terms of doublings per hour. As a matter of example, we decided to extract one of the growth rates reported in Fig. 3(a) of [34]; the profile labeled as Glc (P5-ori) can be well fitted by a Gaussian probability density of mean 0.92 h⁻¹ and small variance. This curve represents single-cell measures of a population of bacteria living in the so-called minimal substrate, whose main characteristics are in principle well caught by the iJR904 model. We fixed the bound on the glucose exchange flux EX_glc(e) such that the maximum allowed growth rate contained all experimental values in the profile labeled as Glc (P5-ori) of Fig. 3(a) of [34]. This was easily computed by fixing the biomass flux to the desired value and minimizing the glucose exchange flux using FBA, and gives a lower bound for the exchanged glucose flux of −43 mmol (g[DW])⁻¹ h⁻¹.

We then apply the EP algorithm to the modified iJR904 model in two different conditions. First, we do not impose any additional constraint and we run the original EP algorithm as described in the previous section. Then, as described in Methods, we fix the marginal posterior of the biomass. We can now compare the means and variances of all the other fluxes in the two cases and single out those fluxes that have been most affected by the empirical constraints on the growth rate.

Figure 2. Comparison of the results of HR vs EP for the iJR904 (Escherichia Coli, N = 786, M = 499), CHOLnorm (Cholinergic neuron, N = 666, M = 577) and RECON1 (Homo Sapiens, N = 2469, M = 1587) models. The top-left plot shows the Pearson correlation coefficients between variances and means estimated through EP and HR. The bottom-left panel reports the computing time of EP and HR for different values of T. The plots on the right are scatter plots of the means and variances of the approximated marginals computed via EP against the ones estimated via HR for an increasing number of explored configurations T.

Figure 3. Results for a constrained biomass flux. Comparison between the means (a) and variances (b) of the marginal probability densities for all the fluxes computed without the additional constraint (unconstrained case) and with the constraint on the biomass (constrained case). The green point indicates the biomass flux. [Labeled fluxes include TKT2, MDH, RPE, BiomassEcoli, PYNP2r, PPM, EX_leu_L(e), EX_ile_L(e), EX_asp_L(e), ASPabc, ASPt2_3, ASPt2_2, ASPt2, CADVt, LYSDC, EX_15dap(e), SADH.]

We report in Fig. 3 the plot of the ratio between the means (Fig. 3(a)) and the variances (Fig. 3(b)) in the unconstrained case and in the constrained case. In Fig. 3(a) these ratios are plotted against the logarithm of the absolute value of the unconstrained means, to differentiate fluxes having means close to zero from all the other cases. The ratios of the variances are instead plotted as a function of the unconstrained variances in semi-log scale. We notice that apparently a large fraction of the fluxes have changed their marginal distribution in order to accommodate the fixed marginal for the biomass. We have reported the names of the reactions with the most significant changes; for instance, the marginal of the
TKT2 reaction has reduced its mean by more than one third, while many reactions involving aspartate have significantly lowered their variances. To underline the non-trivial results of the EP algorithm in the constrained case, we apply again the standard EP algorithm to the iJR904 model when the lower and upper bounds of the biomass are fixed to the average value of the experimental profile. The comparison (not shown) between the two approaches suggests that the most relevant change concerns the EX_asp_L(e) flux, as both the average value and the variance estimated in the second case are about two times the ones predicted by constrained EP. The distributions of most other fluxes do not change considerably. We underline that the different behavior of the marginals in the two cases, even if not significant for most of the fluxes, was in principle unpredictable without the use of constrained EP; and we do not exclude that fixing other empirical profiles can lead to very different results. Likewise, it seems unlikely that the results computed with constrained EP could be obtained using the unbiased samples provided by standard HR implementations (see a discussion in Supplementary Note 6).
DISCUSSION
In this work we have shown how to study the space of feasible configurations of metabolic fluxes within a cell via an analytic description of the marginal probability distribution characterizing each flux. Such marginals are described as truncated Gaussian densities whose parameters are determined through an iterative and extremely efficient algorithm, the Expectation Propagation algorithm. We have compared our predictions against the estimates provided by the HR sampling technique, and the results shown above suggest a very good agreement between EP and HR for a large number of explored configurations T. First of all, the direct comparison of the means and variances of EP vs HR reported in the scatter plots shows that the more we increase the number of HR points, the more the scatter points are aligned. Secondly, we see an increase of the correlation between EP and HR statistics for an increasing number of sampled points; correlations reach values very close to 1 for large values of T and for almost all the models we have considered. The most important point is that the computation times of EP, in the high-correlation regime, are always orders of magnitude lower than the HR sampling times. This is extremely time-saving when we deal with very large networks, such as the RECON1 model for Homo Sapiens, where the running time (in seconds) of EP is three orders of magnitude smaller than that of HR. We underline that exact marginals are generally inaccessible and we cannot compare our results against a ground truth; our measures well approximate the "true" distributions as long as the assumption of the asymptotic exactness of HR is correct.

We have shown how to include empirical knowledge of the distribution of fluxes in the EP algorithm without compromising the computing time. More precisely, we have investigated how fixing an experimental profile of the growth rate in the iJR904 model of Escherichia Coli non-trivially affects all other fluxes. This is a remarkable advantage of the EP algorithm with respect to other methods. EP provides an analytic estimate of each single-flux marginal which relies on the optimization of two parameters, the mean and the variance of a Gaussian distribution. The formalism allows in principle more complicated parametrizations of the posteriors to include other biological insights. The EP equations are extremely easy to derive and to implement, as the main loop can be written in a few lines of Matlab code. The method is iterative, and the number of operations in each iteration scales as Θ(N³), rendering EP extremely convenient in terms of computation time with respect to existing alternatives. As shown in Fig. 2, in real cases the variances of the marginal distributions can span several orders of magnitude. This range of variability implies that the variances of the approximation also need to allow both very small and very large values. To cope with the numeric problems that may arise, we allow the parameters d to vary in a finite range of values, with the drawback of limiting the set of allowed Gaussian densities of the approximation.
For instance, a flat distribution cannot be perfectly approximated by a Gaussian whose variance cannot be arbitrarily large; at the opposite extreme, imposing a lower bound on the variances prevents the approximation of posteriors that are too concentrated on a single point. Thus this range needs to be reasonably designed in order to catch as many "true" variances as possible. In this work we have tried to impose a very large range of values, to include as many distributions as possible without compromising the convergence of the algorithm. Moreover, the Gaussian profile itself is surely a limitation of the approximation, as true marginals can in principle have arbitrary profiles. EP performances are sensitive to the parameter β, and the equations become numerically unstable for too large β. On the other hand, β can be seen as the inverse variance of a Gaussian noise affecting the conservation laws. The nature of this noise could depend on localization properties in the cell and real thermal noise. In this case, an optimization of the free energy with respect to β can in principle lead to better predictions.

METHODS

Update rule
The algorithm described in the Expectation Propagation section relies on local moves in which, at each step, we refine onlythe parameters of one single prior, minimizing the dissimilarity between the auxiliary tilted distribution Q ( n ) and Q . Thevalues of the mean and of the variance of φ n ( ν n ; a n , d n ) are iteratively tuned in a way that the first and second moments ofthe two distributions match. The update rule for the parameters a n and d n of the Gaussian prior will be derived in detailsin the following section.1Let us express the auxiliary density Q ( n ) in Eq. 12 as a standard multivariate Gaussian distribution times the exact priorof the n th flux as Q ( n ) ( ν | b ) = 1 Z Q ( n ) e − β ( S ν − b ) T ( S ν − b ) − ( ν − a ) T D ( ν − a ) ψ n ( ν n ) (14) = 1˜ Z Q ( n ) e − ( ν − ¯ ν ) T Σ − ( ν − ¯ ν ) ψ n ( ν n ) (15)where ˜ Z Q ( n ) = Z Q ( n ) e β b T b − ¯ ν T ¯ ν , D is a diagonal matrix with components D mm = d m for m (cid:54) = n and D nn = 0 (and ofcourse non-diagonal terms D mk = 0 if m (cid:54) = k ). The covariance matrix Σ − and the mean vector ¯ ν satisfy: (cid:40) Σ − = β S T S + D ¯ ν = Σ ( β S t b + D a ) (16)Note that we are omitting for notational simplicity the dependence of D , Σ , ν on n . Equivalently Q ( ν | b ) = 1˜ Z Q e − ( ν − ¯ ν ) T Σ − ( ν − ¯ ν ) φ n ( ν n ; a n , d n ) (17)where ˜ Z Q = Z Q e β b T b − ¯ ν T ¯ ν . If we now exploit the moment matching condition in Eq. 
(13) (a detailed calculation of the moments of Q and Q^{(n)} expressed in standard form is reported in Supplementary Notes 2-3) we obtain an update equation for the mean and the variance:

d_n = \left( \frac{1}{\langle \nu_n^2 \rangle_{Q^{(n)}} - \langle \nu_n \rangle_{Q^{(n)}}^2} - \frac{1}{\Sigma_{nn}} \right)^{-1}, \qquad a_n = d_n \left[ \langle \nu_n \rangle_{Q^{(n)}} \left( \frac{1}{d_n} + \frac{1}{\Sigma_{nn}} \right) - \frac{\bar\nu_n}{\Sigma_{nn}} \right]   (18)

Notice that the sequential update scheme described in the Expectation Propagation section requires the inversion of the matrix Σ^{-1} each time we refine the parameters of flux n, leading to N inversions per iteration and thus Θ(N⁴) operations per iteration. In Supplementary Note 4 we propose a parallel update that needs only one matrix inversion per iteration, i.e. Θ(N³) operations per iteration.

Update equations for a constrained posterior
Let us assume to have access to experimental measurements of the (marginal) posterior f(ν_i) for flux i. We aim at determining how the posteriors of the other fluxes would change to fit the experimental results, compared, for instance, to the unconstrained case. The so-called maximum entropy principle [35] dictates that the least constrained distribution which is consistent with the experiment, the prior distributions and the flux conservation Sν = b is simply

P(\nu|b) = \frac{1}{Z} e^{-\frac{\beta}{2}(S\nu - b)^T (S\nu - b)} \prod_{n=1}^{N} \mathbb{I}\left( \nu_n \in [\nu_n^{inf}, \nu_n^{sup}] \right) g(\nu_i),   (19)

where β → ∞ and g(ν_i) is the (exponential of the) function of unknown Lagrange multipliers that has to be determined in order for the constraint ∫ ∏_{n≠i} dν_n P(ν|b) = f(ν_i) to be satisfied. In the particular case in which the posterior can be reasonably fitted by a Gaussian distribution N(ν_i | a_i^{exp}, d_i^{exp}), it suffices to consider a Gaussian g(ν_i) = N(ν_i | a_i, d_i) = φ_i(ν_i; a_i, d_i) with only two free parameters. The determination of a_i, d_i can be achieved by slightly modifying the EP update for flux i. Assuming as before that the prior of each flux n ≠ i can be approximated by a Gaussian profile φ_n(ν_n; a_n, d_n) of parameters a_n and d_n, also to be determined, we must impose that

\mathcal{N}(\nu_i | a_i^{exp}, d_i^{exp}) \propto \mathcal{N}(\nu_i | a_i, d_i) \int \prod_{n\neq i} d\nu_n\, Q(\nu|b)   (20)
 \propto \phi_i(\nu_i; a_i, d_i)\, e^{-\frac{(\nu_i - \bar\nu_i)^2}{2\Sigma_{ii}}}   (21)

where the distribution Q(ν|b) is the one in Eq. (10). Since both the left-hand side and the right-hand side of Eq. (21) contain Gaussian distributions, the relations for a_i and d_i can be easily computed and take the form

d_i = \left( \frac{1}{d_i^{exp}} - \frac{1}{\Sigma_{ii}} \right)^{-1}, \qquad a_i = d_i \left( \frac{a_i^{exp}}{d_i^{exp}} - \frac{\bar\nu_i}{\Sigma_{ii}} \right)   (22)

This expression is exactly the same as in Eq.
(18) if we replace the mean and the variance of the tilted distribution with the experimental ones.

Technical details
The computations were performed on a Dell PowerEdge server with 128 GB of memory and 48 AMD Opteron CPUs clocked at 1.9 GHz. No constraint was placed on the number of CPU threads, allowing both EP and HR to parallelize their processes. We observed that EP used 2-3 cores, exclusively in the matrix inversion phase (which dominated the running time), while HR employed a variable number of cores (around 6 or 7 at times). For this reason only the orders of magnitude of the computing times of HR and EP can be fairly compared, but they are sufficient to underline the differences between the two algorithms.
Data and code availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information file). An implementation of the algorithm presented in this work is publicly available at https://github.com/anna-pa-m/Metabolic-EP .

The authors declare no competing interests.
ACKNOWLEDGMENT
We warmly thank Andrea De Martino, Roberto Mulet, Jorge Fernández de Cossio and Eduardo Martinez Montes for interesting discussions. We acknowledge support from project SIBYL, financed by Fondazione CRT under the initiative "La ricerca dei Talenti", and from the project "From cellulose to biofuel through Clostridium cellulovorans: an alternative biorefinery approach" of the University of Turin, financed by Compagnia di Sanpaolo.
AUTHORS CONTRIBUTION
All authors contributed equally to this work.

[1] David L. Nelson, Albert L. Lehninger, and Michael M. Cox. Lehninger principles of biochemistry. Macmillan, 2008.
[2] Bernhard Ø. Palsson. Systems biology: constraint-based reconstruction and analysis. Cambridge University Press, 2015.
[3] Amit Varma and Bernhard O. Palsson. Metabolic flux balancing: basic concepts, scientific and practical use. Nat Biotech, 12(10):994–998, October 1994.
[4] Kenneth J. Kauffman, Purusharth Prakash, and Jeremy S. Edwards. Advances in flux balance analysis. Current Opinion in Biotechnology, 14(5):491–496, October 2003.
[5] Patrick F. Suthers, Anthony P. Burgard, Madhukar S. Dasika, Farnaz Nowroozi, Stephen Van Dien, Jay D. Keasling, and Costas D. Maranas. Metabolic flux elucidation for large-scale models using 13C labeled isotopes. Metabolic Engineering, 9(5-6):387–405, 2007.
[6] Wout Megchelenbrink, Martijn Huynen, and Elena Marchiori. optGpSampler: an improved tool for uniformly sampling the solution-space of genome-scale metabolic networks. PLoS ONE, 9(2):e86587, 02 2014.
[7] J. Fernandez-de Cossio-Diaz and R. Mulet. Fast inference of ill-posed problems within a convex space. Journal of Statistical Mechanics: Theory and Experiment, 2016(7):073207, 2016.
[8] Daniele De Martino, Matteo Mori, and Valerio Parisi. Uniform sampling of steady states in metabolic networks: heterogeneous scales and rounding. PLoS ONE, 10(4):e0122670, 04 2015.
[9] M. E. Dyer and A. M. Frieze. On the complexity of computing the volume of a polyhedron. SIAM Journal on Computing, 17(5):967–974, 1988.
[10] Robert L. Smith. Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operations Research, 32(6):1296–1308, 1984.
[11] V. Turchin. On the computation of multidimensional integrals by the Monte-Carlo method. Theory Probab. Appl., 16(4):720–724, January 1971.
[12] David E. Kaufman and Robert L. Smith. Direction choice for accelerated convergence in hit-and-run sampling. Operations Research, 46(1):84–95, 1998.
[13] Jan Schellenberger, Richard Que, Ronan M. T. Fleming, Ines Thiele, Jeffrey D. Orth, Adam M. Feist, Daniel C. Zielinski, Aarash Bordbar, Nathan E. Lewis, Sorena Rahmanian, Joseph Kang, Daniel R. Hyduke, and Bernhard O. Palsson. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protocols, 6(9):1290–1307, Sep 2011.
[14] Nathan D. Price, Jan Schellenberger, and Bernhard O. Palsson. Uniform sampling of steady-state flux spaces: means to design experiments and to interpret enzymopathies. Biophysical Journal, 87(4):2172–2186, 2004.
[15] Jan Schellenberger and Bernhard Ø. Palsson. Use of randomized sampling for analysis of metabolic networks. Journal of Biological Chemistry, 284(9):5457–5461, February 2009.
[16] Sattar Taheri-Araghi, Serena Bradde, John T. Sauls, Norbert S. Hill, Petra Anne Levin, Johan Paulsson, Massimo Vergassola, and Suckjoon Jun. Cell-size control and homeostasis in bacteria. Current Biology, 25(3):385–391, 2015.
[17] Daniele De Martino, Matteo Figliuzzi, Andrea De Martino, and Enzo Marinari. A scalable algorithm to explore the Gibbs energy landscape of genome-scale metabolic networks. PLoS Comput Biol, 8(6):e1002562, June 2012.
[18] Daniele De Martino, Fabrizio Capuani, and Andrea De Martino. Growth against entropy in bacterial metabolism: the phenotypic trade-off behind empirical growth rate distributions in E. coli. arXiv preprint arXiv:1601.03243, 2016.
[19] Marc Mezard and Andrea Montanari. Information, Physics, and Computation. Oxford University Press, Inc., New York, NY, USA, 2009.
[20] Alfredo Braunstein, Roberto Mulet, and Andrea Pagnani. Estimating the size of the solution space of metabolic networks. BMC Bioinformatics, 9(1):240, May 2008.
[21] A. Braunstein, R. Mulet, and A. Pagnani. The space of feasible solutions in metabolic networks. J. Phys.: Conf. Ser., 95(1):012017, January 2008.
[22] Francesco Alessandro Massucci, Francesc Font-Clos, Andrea De Martino, and Isaac Pérez Castillo. A novel methodology to estimate metabolic flux distributions in constraint-based models. Metabolites, 3(3):838, 2013.
[23] Francesc Font-Clos, Francesco Alessandro Massucci, and Isaac Pérez Castillo. A weighted belief-propagation algorithm for estimating volume-related properties of random polytopes. Journal of Statistical Mechanics: Theory and Experiment, 2012(11):P11003, 2012.
[24] Jeffrey D. Orth, Bernhard Ø. Palsson, and R. M. T. Fleming. Reconstruction and use of microbial metabolic networks: the core Escherichia coli metabolic model as an educational guide. EcoSal Plus, 4(1), September 2010.
[25] Sharon J. Wiback, Iman Famili, Harvey J. Greenberg, and Bernhard Ø. Palsson. Monte Carlo sampling can be used to determine the size and shape of the steady-state flux space. Journal of Theoretical Biology, 228(4):437–447, June 2004.
[26] Thomas P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 362–369. Morgan Kaufmann Publishers Inc., 2001.
[27] M. Opper and O. Winther. Gaussian processes for classification: mean-field algorithms. Neural Computation, 12(11):2655–2684, November 2000.
[28] Manfred Opper and Ole Winther. Adaptive and self-averaging Thouless-Anderson-Palmer mean-field theory for probabilistic modeling. Physical Review E, 64(5):056131, October 2001.
[29] Jennifer L. Reed, Thuy D. Vo, Christophe H. Schilling, and Bernhard O. Palsson. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology, 4(9):R54, 2003.
[30] Nathan E. Lewis, Gunnar Schramm, Aarash Bordbar, Jan Schellenberger, Michael P. Andersen, Jeffrey K. Cheng, Nilam Patel, Alex Yee, Randall A. Lewis, Roland Eils, Rainer König, and Bernhard Ø. Palsson. Large-scale in silico modeling of metabolic interactions between cell types in the human brain. Nature Biotechnology, 28(12):1279–1285, December 2010.
[31] Natalie C. Duarte, Scott A. Becker, Neema Jamshidi, Ines Thiele, Monica L. Mo, Thuy D. Vo, Rohith Srivas, and Bernhard Ø. Palsson. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences of the United States of America, 104(6):1777–1782, February 2007.
[32] Zachary A. King, Justin Lu, Andreas Dräger, Philip Miller, Stephen Federowicz, Joshua A. Lerman, Ali Ebrahim, Bernhard O. Palsson, and Nathan E. Lewis. BiGG Models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research, 44(D1):D515–D522, 2016.
[33] Christopher S. Henry, Matthew DeJongh, Aaron A. Best, Paul M. Frybarger, Ben Linsay, and Rick L. Stevens. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nature Biotechnology, 28(9):977–982, 2010.
[34] Andrew S. Kennard, Matteo Osella, Avelino Javer, Jacopo Grilli, Philippe Nghe, Sander J. Tans, Pietro Cicuta, and Marco Cosentino Lagomarsino. Individuality and universality in the growth-division laws of single E. coli cells. Phys. Rev. E, 93:012408, Jan 2016.
[35] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.

SUPPLEMENTARY NOTES

Supplementary note 1. KL-divergence minimization of the full conditional probabilities
Let us now rewrite the full probability distributions in Eq. (15) and Eq. (10), making explicit the dependence of the normalization factors on the parameters a_n, d_n:

Q^{(n)}(\nu|b) = \frac{1}{\tilde Z_{Q^{(n)}}} e^{-\frac12 (\nu-\bar\nu)^T \Sigma^{-1} (\nu-\bar\nu)} \psi_n(\nu_n)   (S23)

Q(\nu|b) = \frac{1}{\tilde Z_Q(a_n,d_n)} e^{-\frac12 (\nu-\bar\nu)^T \Sigma^{-1} (\nu-\bar\nu)} e^{-\frac{(\nu_n-a_n)^2}{2 d_n}}   (S24)

where the partition functions are given by:

\tilde Z_{Q^{(n)}} = \int d^N\nu\; e^{-\frac12 (\nu-\bar\nu)^T \Sigma^{-1} (\nu-\bar\nu)} \psi_n(\nu_n)   (S25)

\tilde Z_Q(a_n,d_n) = \int d^N\nu\; e^{-\frac12 (\nu-\bar\nu)^T \Sigma^{-1} (\nu-\bar\nu)} e^{-\frac{(\nu_n-a_n)^2}{2 d_n}}   (S26)

Let us compute D_{KL}(Q^{(n)} \| Q):

D_{KL}(Q^{(n)} \| Q) = \int Q^{(n)}(\nu|b) \log\left( \frac{\psi_n(\nu_n)\, \tilde Z_Q(a_n,d_n)}{\phi_n(\nu_n)\, \tilde Z_{Q^{(n)}}} \right) d^N\nu   (S27)
 = \int Q^{(n)}(\nu|b) \log\left( \frac{\tilde Z_Q(a_n,d_n)}{e^{-\frac{(\nu_n-a_n)^2}{2 d_n}}} \right) d^N\nu + \mathrm{const}   (S28)
 = \int Q^{(n)}(\nu|b) \left[ \frac{(\nu_n-a_n)^2}{2 d_n} + \log \tilde Z_Q(a_n,d_n) \right] d^N\nu + \mathrm{const}   (S29)
 = \frac{\langle (\nu_n-a_n)^2 \rangle_{Q^{(n)}}}{2 d_n} + \log \tilde Z_Q(a_n,d_n) + \mathrm{const}   (S30)

where const does not depend on a_n and d_n.
We aim at minimizing D_{KL}(Q^{(n)} \| Q) with respect to a_n, d_n:

\frac{\partial D_{KL}(Q^{(n)} \| Q)}{\partial a_n} = \frac{a_n - \langle \nu_n \rangle_{Q^{(n)}}}{d_n} + \frac{1}{\tilde Z_Q} \frac{\partial \tilde Z_Q}{\partial a_n}   (S31)

\frac{\partial D_{KL}(Q^{(n)} \| Q)}{\partial d_n} = -\frac{\langle (\nu_n - a_n)^2 \rangle_{Q^{(n)}}}{2 d_n^2} + \frac{1}{\tilde Z_Q} \frac{\partial \tilde Z_Q}{\partial d_n}   (S32)

Since we can move the derivative inside the integration in ∂Z̃_Q/∂a_n and in ∂Z̃_Q/∂d_n we get:

\frac{1}{\tilde Z_Q}\frac{\partial \tilde Z_Q}{\partial a_n} = \frac{1}{\tilde Z_Q} \int d^N\nu\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)} e^{-\frac{(\nu_n-a_n)^2}{2d_n}}\, \frac{\nu_n - a_n}{d_n} = \frac{\langle \nu_n - a_n \rangle_Q}{d_n}

\frac{1}{\tilde Z_Q}\frac{\partial \tilde Z_Q}{\partial d_n} = \frac{1}{\tilde Z_Q} \int d^N\nu\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)} e^{-\frac{(\nu_n-a_n)^2}{2d_n}}\, \frac{(\nu_n - a_n)^2}{2 d_n^2} = \frac{\langle (\nu_n - a_n)^2 \rangle_Q}{2 d_n^2}

Setting the derivatives in (S31), (S32) to 0 and assuming d_n ≠ 0 we finally get

\frac{a_n - \langle \nu_n \rangle_{Q^{(n)}}}{d_n} + \frac{\langle \nu_n \rangle_Q - a_n}{d_n} = 0, \qquad -\frac{\langle (\nu_n - a_n)^2 \rangle_{Q^{(n)}}}{2 d_n^2} + \frac{\langle (\nu_n - a_n)^2 \rangle_Q}{2 d_n^2} = 0   (S33)

\langle \nu_n \rangle_{Q^{(n)}} = \langle \nu_n \rangle_Q, \qquad \langle \nu_n^2 \rangle_{Q^{(n)}} = \langle \nu_n^2 \rangle_Q   (S34)

and thus the moment matching condition in Eq. (13) turns out to be equivalent to the KL-divergence minimization condition.

Supplementary note 2. Moments of the tilted distribution
Let us compute ⟨ν_n⟩_{Q^{(n)}} and ⟨ν_n²⟩_{Q^{(n)}} as

\langle \nu_n \rangle_{Q^{(n)}} = \frac{1}{\tilde Z_{Q^{(n)}}} \int d\nu_n\, \nu_n\, \psi_n(\nu_n) \int \prod_{m\neq n} d\nu_m\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)}   (S35)

\langle \nu_n^2 \rangle_{Q^{(n)}} = \frac{1}{\tilde Z_{Q^{(n)}}} \int d\nu_n\, \nu_n^2\, \psi_n(\nu_n) \int \prod_{m\neq n} d\nu_m\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)}   (S36)

Integrating out the multivariate Gaussian we obtain for the first moment

\langle \nu_n \rangle_{Q^{(n)}} = \int d\nu_n\, \nu_n\, q_n(\nu_n)   (S37)

where q_n(ν_n) is the marginal probability function

q_n(\nu_n) \propto \psi_n(\nu_n)\, e^{-\frac{(\nu_n-\bar\nu_n)^2}{2\Sigma_{nn}}}   (S38)
 \propto \begin{cases} e^{-\frac{(\nu_n-\bar\nu_n)^2}{2\Sigma_{nn}}} & \text{if } \nu_n \in [\nu_n^{inf}, \nu_n^{sup}] \\ 0 & \text{otherwise} \end{cases}   (S39)

Let us rewrite the non-zero part of (S38) in standard notation:

q_n(\nu_n) = \frac{1}{\sqrt{\Sigma_{nn}}}\, \frac{\mathcal N\!\left(\frac{\nu_n-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}{\Phi\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \Phi\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}   (S40)

where \mathcal N(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} is the probability density function of the standard normal distribution and \Phi(x) = \int_{-\infty}^x \frac{e^{-y^2/2}}{\sqrt{2\pi}}\, dy = \frac12\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt 2}\right)\right] is its cumulative.
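In practice, the mean and variance of this truncated Gaussian can be evaluated directly from N and Φ using the standard closed forms; a minimal numerical sketch (the function name and the bounds in the example are illustrative, not part of the original method):

```python
import math

def truncated_gaussian_moments(nu_bar, sigma2, lo, hi):
    """Mean and variance of a Gaussian N(nu_bar, sigma2) truncated to [lo, hi],
    i.e. of the marginal q_n in Eq. (S40), via the standard closed forms."""
    s = math.sqrt(sigma2)
    x1, x2 = (lo - nu_bar) / s, (hi - nu_bar) / s
    pdf = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    Z = cdf(x2) - cdf(x1)                       # probability mass in [lo, hi]
    r = (pdf(x1) - pdf(x2)) / Z
    mean = nu_bar + r * s
    var = sigma2 * (1.0 + (x1 * pdf(x1) - x2 * pdf(x2)) / Z - r * r)
    return mean, var

# Truncating a standard Gaussian symmetrically keeps the mean at zero
# and shrinks the variance below 1 (bounds chosen purely for illustration).
m, v = truncated_gaussian_moments(0.0, 1.0, -1.0, 1.0)
```

As the note observes below for (S41)-(S42), this direct evaluation degrades numerically when the interval lies far in the tails, where Φ(x₂) − Φ(x₁) underflows.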
In the following we will need the first two moments of this distribution, which are given by:

\langle \nu_n \rangle_{Q^{(n)}} = \bar\nu_n + \frac{\mathcal N\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \mathcal N\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}{\Phi\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \Phi\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}\, \sqrt{\Sigma_{nn}}   (S41)

\langle \nu_n^2 \rangle_{Q^{(n)}} - \langle \nu_n \rangle_{Q^{(n)}}^2 = \Sigma_{nn} \left[ 1 + \frac{\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\, \mathcal N\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\, \mathcal N\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}{\Phi\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \Phi\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)} - \left( \frac{\mathcal N\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \mathcal N\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)}{\Phi\!\left(\frac{\nu_n^{sup}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right) - \Phi\!\left(\frac{\nu_n^{inf}-\bar\nu_n}{\sqrt{\Sigma_{nn}}}\right)} \right)^2 \right]   (S42)

Unfortunately, when Σ_nn → 0 (and thus the standardized bounds diverge) several numerical issues occur in the evaluation of (S41), (S42). To overcome this problem we propose a 5th-order expansion of these expressions (see details in Supplementary note 5).

Supplementary note 3. Moments of Q(ν|b)

Let us compute ⟨ν_n⟩_Q and ⟨ν_n²⟩_Q as

\langle \nu_n \rangle_Q = \frac{1}{\tilde Z_Q} \int d\nu_n\, \nu_n\, \phi_n(\nu_n) \int \prod_{m\neq n} d\nu_m\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)}   (S44)

\langle \nu_n^2 \rangle_Q = \frac{1}{\tilde Z_Q} \int d\nu_n\, \nu_n^2\, \phi_n(\nu_n) \int \prod_{m\neq n} d\nu_m\; e^{-\frac12(\nu-\bar\nu)^T\Sigma^{-1}(\nu-\bar\nu)}   (S45)

Integrating out the multivariate Gaussian we obtain

\langle \nu_n \rangle_Q = \frac{1}{Z_{q_n}} \int d\nu_n\, \nu_n\, q_n(\nu_n)   (S46)

\langle \nu_n^2 \rangle_Q = \frac{1}{Z_{q_n}} \int d\nu_n\, \nu_n^2\, q_n(\nu_n)   (S47)

where q_n(ν_n) is the marginal probability function

q_n(\nu_n) \propto e^{-\frac{(\nu_n - a_n)^2}{2 d_n}}\; e^{-\frac{(\nu_n - \bar\nu_n)^2}{2\Sigma_{nn}}}   (S48)

The proportionality sign denotes that the normalization constant is missing.
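The moments of (S48) follow from the Gaussian product rule recalled next; a quick numerical cross-check of that rule (all parameter values below are made up for the example):

```python
import numpy as np

# q_n(nu) ∝ exp(-(nu - a_n)^2 / (2 d_n)) * exp(-(nu - nu_bar)^2 / (2 Sig)):
# the product of two Gaussians is a Gaussian whose precisions add.
a_n, d_n = 0.5, 2.0          # illustrative prior mean / variance
nu_bar, Sig = -1.0, 0.5      # illustrative cavity mean / variance

prec = 1.0 / d_n + 1.0 / Sig
mean = (a_n / d_n + nu_bar / Sig) / prec   # first moment, Eq. (S49)
var = 1.0 / prec                           # connected second moment

# Cross-check against direct numerical integration of the product
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
q = np.exp(-(x - a_n) ** 2 / (2 * d_n) - (x - nu_bar) ** 2 / (2 * Sig))
Z = q.sum() * dx
num_mean = (x * q).sum() * dx / Z
num_var = ((x - num_mean) ** 2 * q).sum() * dx / Z
assert abs(num_mean - mean) < 1e-6 and abs(num_var - var) < 1e-6
```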
Remembering that the product of two Gaussian distributions satisfies

\mathcal N(x|\mu_1, \sigma_1^2)\, \mathcal N(x|\mu_2, \sigma_2^2) \propto \mathcal N(x|\mu, \sigma^2), \qquad \frac{\mu}{\sigma^2} = \frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}, \qquad \frac{1}{\sigma^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}

we obtain the following result for the first and second (connected) moments of (S48):

\langle \nu_n^2 \rangle_Q - \langle \nu_n \rangle_Q^2 = \left( \frac{1}{d_n} + \frac{1}{\Sigma_{nn}} \right)^{-1}, \qquad \langle \nu_n \rangle_Q = \left( \frac{1}{d_n} + \frac{1}{\Sigma_{nn}} \right)^{-1} \left( \frac{a_n}{d_n} + \frac{\bar\nu_n}{\Sigma_{nn}} \right)   (S49)

Supplementary note 4. Fast computation of Σ and ν̄

Each time we update the parameters of one φ_n we need to build a new matrix D and solve the system of equations in Eq. (16), which requires the inversion of a large N × N matrix. Globally, we need to invert N large matrices per iteration, which severely affects the computational time. Here we present a scheme by which we invert only one large matrix per iteration.

Let us define D' as the diagonal matrix of elements D'_nn = 1/d_n, and Σ', ν̄' as the solutions of

\Sigma'^{-1} = \beta S^T S + D', \qquad \bar\nu' = \Sigma' \left( \beta S^T b + D' a \right)   (S50)

We aim at determining the values of Σ and ν̄ entering the computation of a_n and d_n as functions of Σ' and ν̄', which can be computed once per iteration.
Let us write, for each flux n, the corresponding D matrix as D = D' − (1/d_n) e_n e_n^T, which must satisfy

(\beta S^T S + D)\, \bar\nu = \beta S^T b + D a, \qquad (\beta S^T S + D')\, \bar\nu' = \beta S^T b + D' a   (S51)

Take the first equation in (S51) and subtract the second:

(\beta S^T S + D') (\bar\nu - \bar\nu') - \frac{\bar\nu_n}{d_n}\, e_n = -\frac{a_n}{d_n}\, e_n   (S52)

\bar\nu = \bar\nu' + (\beta S^T S + D')^{-1} e_n\, \frac{\bar\nu_n - a_n}{d_n}   (S53)

from which it is possible to extract the ν̄_n component as

\bar\nu_n \left[ 1 - \frac{(\beta S^T S + D')^{-1}_{nn}}{d_n} \right] = -\frac{a_n}{d_n} (\beta S^T S + D')^{-1}_{nn} + \bar\nu'_n   (S54)

\bar\nu_n = \frac{ \bar\nu'_n - \frac{a_n}{d_n} (\beta S^T S + D')^{-1}_{nn} }{ 1 - \frac{(\beta S^T S + D')^{-1}_{nn}}{d_n} }   (S55)

Equivalently, the diagonal elements of Σ satisfying Σ^{-1} = βSᵀS + D can be computed as follows. We define x as the solution of the equation Σ^{-1} x = e_n, so that x is the n-th column of Σ and thus x_n = Σ_nn. Analogously, let x' be the solution of Σ'^{-1} x' = e_n for Σ'^{-1} = βSᵀS + D', i.e. the n-th column of Σ'. We write the system of equations:

(\beta S^T S + D)\, x = e_n, \qquad (\beta S^T S + D')\, x' = e_n   (S56)

and we proceed with the same argument as before.
Take the first equation and subtract the second:

\left( \beta S^T S + D' - \frac{1}{d_n} e_n e_n^T \right) x - (\beta S^T S + D')\, x' = 0   (S57)

(\beta S^T S + D') (x - x') - \frac{x_n}{d_n}\, e_n = 0   (S58)

(\beta S^T S + D')\, x = (\beta S^T S + D')\, x' + \frac{x_n}{d_n}\, e_n   (S59)

x = x' + \frac{x_n}{d_n} (\beta S^T S + D')^{-1} e_n   (S60)

Since x'_n = Σ'_nn, the n-th component of x satisfies

x_n = \Sigma'_{nn} + \frac{x_n}{d_n}\, \Sigma'_{nn}   (S61)

x_n = \frac{\Sigma'_{nn}}{1 - \frac{\Sigma'_{nn}}{d_n}}   (S62)

Finally,

\Sigma_{nn} = \frac{\Sigma'_{nn}}{1 - \frac{\Sigma'_{nn}}{d_n}}, \qquad \bar\nu_n = \frac{ \bar\nu'_n - \frac{a_n}{d_n} \Sigma'_{nn} }{ 1 - \frac{\Sigma'_{nn}}{d_n} }   (S63)

The components of ν̄ other than ν̄_n and all non-diagonal entries of Σ can be computed following the same strategy; we do not report their expressions here since the update rules for a_n and d_n in Eq. (18) only depend on the terms in (S63).

Supplementary note 5. Asymptotic expansion of the first two moments of the tilted distribution
As already remarked in Supplementary note 2, the first two moments of the tilted distribution Q^{(n)} defined in Eq. (15) turn out to be numerically difficult to compute, in particular when we need to evaluate integrals over the tails of the distributions. To overcome such difficulties, we resorted to an asymptotic expansion up to the 5th order, which accounts for both accuracy and numerical stability in all the conditions analyzed in our tests. The idea is to start by noting that, up to the required precision, in the limit x → ∞,

\Phi(x) \simeq 1 - \mathcal N(x) \left( \frac{1}{x} - \frac{1}{x^3} + \frac{3}{x^5} + o\!\left(\frac{1}{x^5}\right) \right)

so that

\frac{\mathcal N(x_1) - \mathcal N(x_2)}{\Phi(x_2) - \Phi(x_1)} \simeq \frac{x_1^5\, x_2^5 \left( 1 - e^{\frac{x_1^2 - x_2^2}{2}} \right)}{x_2^5 (x_1^4 - x_1^2 + 3) - e^{\frac{x_1^2 - x_2^2}{2}}\, x_1^5 (x_2^4 - x_2^2 + 3)}   (S64)

\frac{x_1 \mathcal N(x_1) - x_2 \mathcal N(x_2)}{\Phi(x_2) - \Phi(x_1)} \simeq \frac{x_1^5\, x_2^5 \left( x_1 - x_2\, e^{\frac{x_1^2 - x_2^2}{2}} \right)}{x_2^5 (x_1^4 - x_1^2 + 3) - e^{\frac{x_1^2 - x_2^2}{2}}\, x_1^5 (x_2^4 - x_2^2 + 3)}   (S65)

Taking into consideration the expressions above and defining x_1 = (ν_n^{inf} − ν̄_n)/√Σ_nn, x_2 = (ν_n^{sup} − ν̄_n)/√Σ_nn, s = sign(x_1 x_2), m = min(|x_1|, |x_2|) and ∆ = x_2 − x_1, we approximate the first two moments of the tilted distribution as follows. For small m, or when s = −1 (i.e. when the Gaussian peak lies inside or close to the allowed interval), the exact formulas (S41), (S42) are evaluated directly. For large m with s = 1 and moderate ∆, the ratios appearing in (S41), (S42) are replaced by the asymptotic expressions (S64), (S65). Finally, for large m, s = 1 and large ∆, the factor e^{(x_1^2 − x_2^2)/2} becomes negligible and the expressions further simplify to

\frac{\mathcal N(x_1) - \mathcal N(x_2)}{\Phi(x_2) - \Phi(x_1)} \simeq \frac{x_1^5}{x_1^4 - x_1^2 + 3}, \qquad \frac{x_1 \mathcal N(x_1) - x_2 \mathcal N(x_2)}{\Phi(x_2) - \Phi(x_1)} \simeq \frac{x_1^6}{x_1^4 - x_1^2 + 3}

(the symmetric expressions in x_2 hold when both bounds are negative). Note that the generating series for Φ has alternating signs, and one can easily see that upon considering only the 3rd order the variance might turn out to be negative. To overcome this difficulty, only truncations ending on a term of positive sign must be considered, so the next useful approximation beyond the 5th order is the 9th.

Supplementary note 6. Weighted Hit-and-Run
Hit-and-Run is a Monte Carlo method that aims at uniformly sampling the feasible configuration space of fluxes. To add a non-uniform prior such as g(ν_i; a_i, d_i) in Eq. (19), one should resort to an importance sampling generalization of HR (see, e.g., [?]). However, the determination of the parameters a_i and d_i must be done through multiple HR convergences in some sort of gradient descent scheme (in a procedure similar to Boltzmann learning), which is deemed to be too time consuming.

A seemingly simpler alternative is to re-weight the configurations explored by the uniform sampling so that the HR marginal of the flux fits the experimental data. Calling ν_i the experimentally known flux and defining the re-weighting function as g(ν_i; a_i, d_i), our goal is to tune a_i and d_i so as to reproduce the experimental marginal. More formally, g(ν_i; a_i, d_i) is the exponential function of the unknown Lagrange multipliers enforcing the constraint on the fixed marginal, as introduced in Eq. (19). One way of determining the two parameters and of performing the sampling is the following. The empirical first and second moments of the re-weighted distribution read

\langle \nu_i \rangle_g \simeq \sum_{\alpha=1}^{A} \nu_{i,\alpha}\, \frac{g(\nu_{i,\alpha}; a_i, d_i)}{W}, \qquad \langle \nu_i^2 \rangle_g \simeq \sum_{\alpha=1}^{A} \nu_{i,\alpha}^2\, \frac{g(\nu_{i,\alpha}; a_i, d_i)}{W}   (S68)

where the index α runs over all sampled configurations and W = Σ_α g(ν_{i,α}; a_i, d_i) is the normalization constant. This is a 2 × 2 system that needs to be solved for the variables a_i, d_i.

However, we will show in the following that the approximation in Eq. (S68) is normally too rough to expect good results with a reasonable number of sample points. To give an example of the reliability of this procedure, let us fix two marginals of our choice for the biomass of a population of Escherichia coli described by the modified iJR904 model introduced in the "Results" section of the main text.
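The re-weighted estimators in Eq. (S68) amount to a few lines of code; a minimal sketch (Gaussian samples stand in for HR configurations, and all numbers are illustrative):

```python
import numpy as np

def reweighted_moments(nu_i, a_i, d_i):
    """Empirical first and second moments of flux i after re-weighting the
    samples nu_i with g(nu; a_i, d_i) = exp(-(nu - a_i)^2 / (2 d_i))."""
    w = np.exp(-(nu_i - a_i) ** 2 / (2.0 * d_i))
    w /= w.sum()                          # the W normalization of Eq. (S68)
    return (w * nu_i).sum(), (w * nu_i ** 2).sum()

# Re-weighting N(0, 1) samples with a Gaussian factor of mean 0 and
# variance 1 should concentrate the product around variance 1/2.
rng = np.random.default_rng(0)
m1, m2 = reweighted_moments(rng.normal(0.0, 1.0, 200_000), a_i=0.0, d_i=1.0)
```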
The distribution of the unconstrained biomass flux in this network is roughly a Gaussian with mean µ = 0.03, as shown in Supplementary figure 1 (a). We will attempt to constrain the system to two different "observed" biomass fluxes, with means 0.09 and 0.2 respectively.

We start by computing a uniformly weighted sampling set with standard HR. We performed this with three different set sizes T = {2.56·10⁷, 1.02·10⁸, 4.10·10⁸}. In each of the two cases, we also apply constrained EP to fix the marginal distribution to the observed one, obtaining parameters a_i^{EP(0.09)}, d_i^{EP(0.09)} and a_i^{EP(0.2)}, d_i^{EP(0.2)}. We can then re-weight the configurations according to the functions g(ν_i; a_i^{EP(0.09)}, d_i^{EP(0.09)}) and g(ν_i; a_i^{EP(0.2)}, d_i^{EP(0.2)}). We show in Supplementary figure 1 the re-weighted marginals of the biomass flux (blue, green and yellow bars) along with the Gaussian distributions with mean 0.09 (Supplementary figure 1 (b)) and with mean 0.2 (Supplementary figure 1 (c)) that we would like to retrieve (red line). In Supplementary figure 1 (b) we notice that, as we increase the number of sampled points, the re-weighted marginal very slowly approaches the desired one; in Supplementary figure 1 (c), instead, the HR estimate fails to retrieve the fixed profile, indicating that a much larger set of uniform sampling points would be needed. The reason is that in the second case, the unconstrained mean being 0.03, for an exponentially overwhelming fraction of the HR points the value of the flux ν_i is far from the mean value of the experimental distribution (and thus the associated weight g(ν_i; a_i, d_i) is exponentially small). As a consequence, the number of sampling points needed to reasonably sample the constrained distribution becomes extremely large.
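The weight degeneracy just described can be quantified by the effective sample size of the re-weighted ensemble; a sketch with illustrative numbers mimicking the example above (the spread of the proposal and the target variance are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# HR-like samples of the unconstrained biomass flux (mean 0.03;
# the spread 0.03 and the target variance 1e-4 are illustrative).
nu = rng.normal(0.03, 0.03, size=100_000)

def ess_fraction(nu, a, d):
    """Fraction of effectively surviving samples after re-weighting
    with g(nu) = exp(-(nu - a)^2 / (2 d))."""
    logw = -0.5 * (nu - a) ** 2 / d
    w = np.exp(logw - logw.max())          # stabilized weights
    return (w.sum() ** 2 / (w ** 2).sum()) / len(nu)

near = ess_fraction(nu, a=0.09, d=1e-4)    # target mean ~2 sigma away
far = ess_fraction(nu, a=0.20, d=1e-4)     # target mean ~6 sigma away
assert far < near                          # far target: far fewer effective samples
```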
This is what happens with the experimental growth rate described in the "Results" section, which cannot be recovered by this method in a feasible time.

SUPPLEMENTARY FIGURES
Supplementary figure 1: Marginal probability densities for the biomass flux computed through the re-weighting procedure (colored bars, for HR sample sizes T = 2.56e7, 1.02e8 and 4.10e8) and the fixed ones (red line) in two cases: (b) the fixed profile has mean 0.09, while in (c) the mean has been shifted to 0.2. Panel (a) shows the marginal in the unconstrained case (mean 0.03).
[Supplementary figure 2 panels, one per flux: HK, PGI, PFK, ALD, TPI, GAPDH, PGK, DPGM, DPGase, PGM, EN, PK, LDH, G6PDH, PGL, PDGH, R5PI, Xu5PE, TKI, TKII, TA, AMPase, ADA, AK, ApK, AMPDA, AdPRT, IMPase, PNPase, PRM, PRPPsyn, HGPRT, GLC, PYR, LAC, HX, ADE, ADO, INO, ADP, ATP, NAD, NADH, NADP, NADPH]
Supplementary figure 2: Marginals of the entire set of fluxes of the red blood cell model. The blue bars are obtained through HR sampling over a large number T of explored configurations; the green line is the prediction of the Belief Propagation (BP) algorithm in [7], while the red line denotes the results of our EP algorithm.
[Per-model comparison figures (plot residue): for each model, scatter plots of EP means and variances against HR estimates at increasing numbers T of HR samples, together with the Pearson correlation and the running times of HR and EP as functions of T. Panels: Bacillus Subtilis (iYO844, N = 501, M = 369); Staphylococcus Aureus (iSB619, N = 294, M = 250); Gabaergic neuron (GABAnorm, N = 657, M = 566); Glutamatergic neuron (GLUnorm, N = 657, M = 567); Saccharomyces Cerevisiae (iND750, N = 389, M = 296); Mycobacterium Tuberculosis (iNJ661, N = 414, M = 316); Methanosarcina Barkeri (iAF692, N = 131, M = 117); Helicobacter Pylori (iIT341, N = 210, M = 175); Clostridium Thermocellum (iSR432, N = 399, M = 323); Red Blood Cell (iAB RBC 283, N = 416, M = 308).]