[PDF] Geometric algorithms for sampling the flux space of metabolic networks

Abstract

Systems Biology is a fundamental field and paradigm that introduces a new era in Biology. The crux of its functionality and usefulness relies on metabolic networks that model the reactions occurring inside an organism and provide the means to understand the underlying mechanisms that govern biological systems. Even more, metabolic networks have a broader impact that ranges from resolution of ecosystems to personalized medicine.The analysis of metabolic networks is a computational geometry oriented field as one of the main operations they depend on is sampling uniformly points from polytopes; the latter provides a representation of the steady states of the metabolic networks. However, the polytopes that result from biological data are of very high dimension (to the order of thousands) and in most, if not all, the cases are considerably skinny. Therefore, to perform uniform random sampling efficiently in this setting, we need a novel algorithmic and computational framework specially tailored for the properties of metabolic networks.We present a complete software framework to handle sampling in metabolic networks. Its backbone is a Multiphase Monte Carlo Sampling (MMCS) algorithm that unifies rounding and sampling in one pass, obtaining both upon termination. It exploits an improved variant of the Billiard Walk that enjoys faster arithmetic complexity per step. We demonstrate the efficiency of our approach by performing extensive experiments on various metabolic networks. Notably, sampling on the most complicated human metabolic network accessible today, Recon3D, corresponding to a polytope of dimension 5 335 took less than 30 hours. To our knowledge, that is out of reach for existing software.

Full PDF

GGeometric algorithms for sampling the ﬂux space ofmetabolic networks

Apostolos Chalkis

Department of Informatics & TelecommunicationsNational & Kapodistrian University of Athens, andAthena Research Innovation Center, [email protected]

Vissarion Fisikopoulos

Department of Informatics & TelecommunicationsNational & Kapodistrian University of Athens, Greecevﬁ[email protected]

Elias Tsigaridas

Inria Paris and IMJ-PRG,Sorbonne Université and Paris Université[email protected]

Haris Zafeiropoulos

Department of Biology, University of CreteInstitute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine [email protected]

Abstract

Systems Biology is a fundamental ﬁeld and paradigm that introduces a new era in Biology. The crux of itsfunctionality and usefulness relies on metabolic networks that model the reactions occurring inside an organismand provide the means to understand the underlying mechanisms that govern biological systems. Even more,metabolic networks have a broader impact that ranges from resolution of ecosystems to personalized medicine.The analysis of metabolic networks is a computational geometry oriented ﬁeld as one of the main operationsthey depend on is sampling uniformly points from polytopes; the latter provides a representation of the steadystates of the metabolic networks. However, the polytopes that result from biological data are of very high dimen-sion (to the order of thousands) and in most, if not all, the cases are considerably skinny. Therefore, to performuniform random sampling e ﬃ ciently in this setting, we need a novel algorithmic and computational frameworkspecially tailored for the properties of metabolic networks.We present a complete software framework to handle sampling in metabolic networks. Its backbone is a Mul- tiphase Monte Carlo Sampling (MMCS) algorithm that uniﬁes rounding and sampling in one pass, obtaining bothupon termination. It exploits an improved variant of the Billiard Walk that enjoys faster arithmetic complexity perstep. We demonstrate the e ﬃ ciency of our approach by performing extensive experiments on various metabolicnetworks. Notably, sampling on the most complicated human metabolic network accessible today, Recon3D,corresponding to a polytope of dimension 5 335 took less than 30 hours. To our knowledge, that is out of reachfor existing software. Mathematics of computing → Mathematical software; Applied computing → Systems biology; Computing methodologies → Modeling and simulation

Keywords and phrases

Flux analysis, metabolic networks, convex polytopes, random walks, sampling

Acknowledgements

We thank Ioannis Emiris for his useful comments. a r X i v : . [ q - b i o . Q M ] D ec Geometric analysis of metabolic networks

Systems Biology establishes a scientiﬁc approach and a paradigm. As a research approach, it is the qualit-ative and quantitative study of the systemic properties of a biological entity along with their ever evolvinginteractions [25, 26]. By combining experimental studies with mathematical modeling it analyzes the func-tion and the behavior of biological systems. In this setting, we model the interactions between the compon-ents of a system to shed light on the system’s raison d’être and to decipher its underlying mechanisms interms of evolution, development, and physiology [21].Initially, Systems Biology emerged as a need. New technologies in Biology accumulate vast amounts ofinformation / data from di ﬀ erent levels of the biological organization i.e., genome, transcriptome, proteome,metabolome [41]. This leads to the emerging question "what shall we do with all these pieces of informa-tion"? The answer, if we consider Systems Biology as a paradigm, is to move away from reductionism, stillthe main conceptual approach in biological research, and adopt holistic approaches for interpreting how asystem’s properties emerge [35]. The following diagram provides a ﬁrst, rough, mathematical formalizationof this approach. components → networks → in silico models → phenotype [39].Systems Biology expands in all the di ﬀ erent levels of living entities, from the molecular, to the organis-mal and ecological level. The notion that penetrates all levels horizontally is metabolism ; the process thatmodiﬁes molecules and maintains the living state of a cell or an organism through a set of chemical reactions[45]. The reactions begin with a particular molecule which they convert into some other molecule(s), whilethey are catalyzed by enzymes in a key-lock relationship. We call the quantitative relationships between thecomponents of a reaction stoichiometry . Linked reactions, where the product of the ﬁrst acts as the substratefor the next, build up metabolic pathways. Each pathway is responsible for a certain function. We can linktogether the aggregation of all the pathways that take place in an organism (and their corresponding reac-tions) and represent them mathematically using the reactions’ stoichiometry. Therefore, at the species level,metabolism is a network of its metabolic pathways and we call these representations metabolic networks . The complete reconstruction of the metabolic network of an organism is a challenging and computation-ally intensive task. Determining the systems components DNA sequencing, usually at the genome scale,provides us with quality insight. Manual curation is necessary and large groups of researchers spend a greatamount of time to build such models [49]. Fortunately, there are recent automatic reconstruction approachesfor building genome-scale metabolic models [32] of relatively high quality. We can also obtain models forall the species present in a biological community so that biologists can have a global view of the reactionsoccur in the species of interest. Using the stoichiometry of each reaction, which is independent of the spe-cies, we convert the metabolic network of an organism into a mathematical model. Thus, the metabolicnetwork becomes an in silico model of the knowledge it represents. In metabolic networks analysis massand energy are considered to be conserved [38]. As many homeostatic states, that is steady internal condi-tions [46], are close to steady states (where the production rate of each metabolite equals its consumptionrate [7]) we commonly use the latter in metabolic networks analysis.Stoichiometric coe ﬃ cients are the number of molecules a biochemical reaction consumes and produces.The coe ﬃ cients of all the reactions in a network, with m metabolites and n reactions ( m < n ), form the stoichiometric matrix S ∈ R m × n [39]. The nullspace of S corresponds to the steady states of the network: S · x = , (1)where x ∈ R n is the ﬂux vector that contains the ﬂuxes of each chemical reaction of the network. Flux is therate of turnover of molecules through a metabolic pathway. halkis et al. 3 Figure 1

From DNA sequences to distributions of metabolic ﬂuxes. (A) The genes of an organism provide us withthe enzymes that it can potentially produce. Enzymes are like a blueprint for the reactions they can catalyze. (B) Usingthe enzymes we identify the reactions in the organism. (C) We construct the stoichiometric matrix of the metabolicmodel. (D) We consider the ﬂux space under di ﬀ erent conditions (e.g., steady states); they correspond to polytopescontaining ﬂux vectors addressing these conditions. (E) We sample from polytopes that are typically skinny and of highdimension. (F) The distribution of the ﬂux of a reaction provides great insights to biologists. All physical variables are ﬁnite, therefore the ﬂux (and the concentration) is bounded [39]; that is foreach coordinate x i of the x , there are 2 n constants x ub , i and x lb , i such that x lb , i ≤ x i ≤ x ub , i , for i ∈ [ n ]. Wederive the constraints from explicit experimental information. In cases where there is no such information,reactions are left unconstrained by setting arbitrary large values to their corresponding bounds according totheir reversibility properties; i.e., if a reaction is reversible then its ﬂux might negative as well [30]. Theconstraints deﬁne a n -dimensional box containing both the steady and the dynamic states of the system. Ifwe intersect that box with the nullspace of S , then we deﬁne a polytope that encodes all the possible steadystates and their ﬂux distributions [39]. We call it the steady-state ﬂux space . Fig. 1 illustrates the completeworkﬂow from building a metabolic network to the computation of a ﬂux distribution.Using the polytopal representation, a commonly used method for the analysis of a metabolic network isFlux Balance Analysis (FBA) [37]. FBA identiﬁes a single optimal ﬂux distribution by optimizing a linearobjective function over a polytope [37]. Unfortunately, this is a biased method because it depends on theselection of the objective function. To study the global features of a metabolic network we need unbiasedmethods . To obtain an accurate picture of the whole solution space we exploit sampling techniques [44]. Ifcollect a su ﬃ cient number of points uniformly distributed in the interior of the polytope, then the biologistscan study the properties of certain components of the whole network, and deduce signiﬁcant biologicalinsights [39]. Therefore, e ﬃ cient sampling tools are of great importance. E ﬃ cient uniform random sampling on polytopes resulting from metabolic networks is a very challengingtask both from the theoretical (algorithmic) and the engineering (implementation) point of view. First,the dimension of the polytopes is of the order of certain thousands. This requires, for example, advancedengineering techniques to cope with memory requirements and to perform linear algebra operations withlarge matrices; e.g., in Recon3D [6] we compute the null space of a 8 399 ×

13 543 matrix. Second, thepolytopes are rather skinny (Sec. 4); this makes it harder for sampling algorithms to move in the interior of

Geometric analysis of metabolic networks

Figure 2

Flux distributions in the most recent human metabolic network Recon3D [6]. We estimate the ﬂuxdistributions of the reactions catalyzed by the enzymes Hexokinase (D-Glucose:ATP) (HEX), Glucose-6-PhosphatePhosphatase, Edoplasmic Reticular (G6PPer) and Phosphoenolpyruvate carboxykinase (GTP) (PEPCK). As we samplesteady states, the production rate of glc __ D _ c should be equal to its consumption rate. Thus, in the corresponding cop-ula, we see a positive dependency between HEX, i.e., the reaction that consumes glc __ D _ c and G6PPer, that producesit. Furthermore, the PEPCK reaction operates when there is no glc __ D _ c available and does not operate when the latteris present. Thus, in their copula we observe a negative dependency between HEX and PEPCK. A copula is a bivariateprobability distribution for which the marginal probability distribution of each variable is uniform. It implies a posit-ive dependency when the mass of the distribution concentrates along the up-diagonal (HEX - G6PPer) and a negativedependency when the mass is concentrated along the down-diagonal (HEX - PEPCK). The bottom line contains thereactions and their stoichiometry. polytopes and calls for novel practical techniques to sample.There is extended on-going research concerning advanced algorithms and implementations for samplingmetabolic networks over the last decades. Markov Chain Monte Carlo algorithms such as Hit-and-Run(HR) [47] have been widely used to address the challenges of sampling. Two variants of HR are the non-Markovian Artiﬁcial Centering Hit-and-Run (ACHR) [23] that has been widely used in sampling metabolicmodels, e.g., [43], and Coordinate Hit-and-Run with Rounding (CHRR) [18]. The latter is part of the cobra toolbox [19], the most commonly used software package for the analysis of metabolic networks. CHRRenables sampling from complex metabolic network corresponding to the highest dimensional polytopes sofar. There are also stochastic formulations where the inclusion of experimental noise in the model makes itmore compatible with the stochastic nature of biological networks [31]. The recent study in [13] o ﬀ ers anoverview as well as an experimental comparison of the currently available samplers.These implementations played a crucial role in actually performing in practice uniform sampling fromthe ﬂux space. However, they are currently limited to handle polytopes of dimension say ≤ ﬃ ciently handle sampling from the ﬂux space of Recon3D.Apparently, the dimension of the polytopes will keep rising and not only for the ones correspondingto human metabolic networks. Metabolism governs systems biology at all its levels, including the one ofthe community. Thus, we are not only interested in sampling a sole metabolic network, even if it has thechallenges of the human. Sampling in polytopes associated to network of networks are the next big thing inmetabolic networks analysis and in Systems Biology [4, 40]. halkis et al. 5 Regarding the sampling process, from the theoretical point of view, we are interested in the convergencetime, or mixing time , of the Markov Chain, or geometric random walk , to the target distribution. Given a d -dimensional polytope P , the mixing time of several geometric random walks (e.g., HR or Ball Walk)grows quadratically with respect to the sandwiching ratio R / r of the polytope [28, 29]. Here r and R are theradii of the smallest and largest ball with center the origin that contains and is contained in P respectively;i.e., rB d ⊆ P ⊆ RB d , where B d is the unit ball. It is crucial to reduce R / r , i.e., to put P in well roundedposition, that is to achieve R / r = (cid:101) O ( √ d ). A powerful approach to obtain well roundness is to put P in near isotropic position . In general, K ⊂ R d is in isotropic position if the uniform distribution over K is inisotropic position, that is the E X ∼ K [ X ] = E X ∼ K [ X T X ] = I d , where I d is the d × d identity matrix. Thus,to put a polytope P into isotropic position we can generate a set of uniform points in the interior and applyin P the transformation that maps the point-set to isotropic position; then iterate until P is in c -isotropicposition [12, 29], where c is a constant. In [1] they prove that O ( d ) points su ﬃ ce to achieve 2-isotropicposition. Alternatively in [18] they compute the maximum volume ellipsoid in P and then they map it tothe unit ball and apply to P the same transformation. They experimentally show that a few iterations su ﬃ ceto put P in John’s position [22].An important parameter of a random walk is the walk length , i.e., the number of the intermediate pointsthat a random walk visits before producing a single sample point. The longer the walk length of a randomwalk is, the smaller the distance of the current distribution to the stationary (target) distribution becomes.For the majority of random walks there are bounds on the walk length to bound the mixing time. Forexample, HR mixes after (cid:101) O ( d ) [29] steps for well rounded convex bodies and CDHR after a polynomial, inthe diameter and the dimension, number of steps [27, 34]. However, extended practical results have shownthat both CDHR and HR converges after O ( d ) steps [9, 12, 18].Billiard Walk (BW) [16] is a random walk that employs linear trajectories in a convex body with bound-ary reﬂections. However, the mixing time of Billiard Walk is unknown. Interestingly, [16] shows that,experimentally, BW converges faster than HR for a proper tuning of its parameters. This was also con-ﬁrmed for computing the volume of zonotopes [10]. It is also unknown if the sandwiching ratio R / r of P a ﬀ ects the mixing time of BW.For almost all random walks the theoretical bounds on their mixing times are pessimistic and unrealisticfor computations. Hence, if we terminate the random walk earlier, we generate samples that are usuallyhighly correlated. There are several MCMC Convergence Diagnostics [42] to check if the quality of asample can provide an accurate approximation of the target distribution. For a dependent sample, a powerfuldiagnostic is the E ﬀ ective Sample Size (ESS). It is the number of e ﬀ ectively independent draws from thetarget distribution that the Markov chain is equivalent to. For autocorrelated samples, ESS bounds theuncertainty in estimates [15] and provides information about the quality of the sample.Last, the copula representation we employ in Fig. 2 to capture the dependence between two ﬂuxes ofreactions has also been used successfully in a geometric framework to detect ﬁnancial crises capturing thedependence between portfolio return and volatility [8]. We introduce a Multi-phase Monte Carlo Sampling (MMCS) algorithm (Sec. 3 and Alg. 2) to samplefrom a polytope P . In particular, we split the sampling procedure in phases where, starting from P , eachphase uses the sample to round the polytope. This improves the e ﬃ ciency of the random walk in thenext phase, see Fig. 3 and Table 2. For sampling, we propose an improved variant of Billiard Walk (BW)(Sec. 2 and Alg. 1) that enjoys faster arithmetic complexity per step. We accompany the MMCS algorithmwith a powerful MCMC diagnostic, namely the estimation of E ﬀ ective Sample Size (ESS), to identify asatisfactory convergence to the uniform distribution. However, our method is ﬂexible and so one could useany other random walk and any combination of MCMC diagnostics to decide convergence. Geometric analysis of metabolic networks

The open-source implementation of our algorithms provides a complete software framework to handlee ﬃ ciently sampling in metabolic networks. We demonstrate the e ﬃ ciency of our tools by performingexperiments on the most of metabolic networks that are publicly available and comparing with state-of-the-art software packages as cobra (Sec. 4.2). Our implementation is faster than cobra for low dimensionalmodels with a speed-up that ranges from 10 to 100 times, while this gap on running times increases forbigger models (Table 1). The quality of the sample our software produces is measured with two widelyused diagnostics, i.e., ESS and potential scale reduction factor (PSRF) [14]. The highlight of our method isthe ability to sample from the most complicated human metabolic network that is accessible today, namelyRecon3D. In Fig. 2 we estimate marginal univariate and bivariate ﬂux distributions in Recon3D validating(a) that our method indeed generates steady states and (b) the quality of the sample by conﬁrming a mutuallyexclusive pair of biochemical pathways. In particular, our software can sample 1 . · points from a5 335-dimensional polytope in a day using modest hardware. This set of points su ﬃ ces for the majority ofsystems biology analytics. To our understanding this task is out of reach for existing software. The geometric random walk of our choice to sample from a polytope is based on Billiard Walk (BW) [16],which we modify to reduce the per-step cost.For a polytope P = { x ∈ R d | Ax ≤ b } , where A ∈ R k × d and b ∈ R k , BW starts from a given point p ∈ P , selects uniformly at random a direction, say v , and it moves along the direction of v for length L ;it reﬂects on the boundary if necessary. This results a new point p inside P . We repeat the procedure from x . Asymptotically it converges to the uniform distribution over P . The length is L = − τ ln η , where η isa uniform number in (0 , η ∼ U (0 , τ is a predeﬁned constant. It is useful to set a bound,say ρ , on the number of reﬂections to avoid computationally hard cases where the trajectory may stuck incorners. In [16] they set τ ≈ diam ( P ) and ρ = d . Our choices for τ and ρ depend on a burn-in step thatwe detail in Sec. 4.At each step of BW we compute the intersection point of a ray, say (cid:96) : = { p + t (cid:51) , t ∈ R + } , with theboundary of P , ∂ P , and the normal vector of the tangent plane at the intersection point. The inner vector ofthe facet that the intersection point belongs to is a row of A . To compute the point ∂ P ∩ (cid:96) where the ﬁrstreﬂection of a BW step takes place, we solve the following m linear equations a Tj ( p + t j (cid:51) ) = b j ⇒ t j = ( b j − a Tj p ) / a Tj (cid:51) , j ∈ [ k ] , and keep the smallest positive t j ; a j is the j -th row of the matrix A . We solve each equation in O ( d )operations and so the overall complexity is O ( dk ). A straightforward approach for BW would consider thateach reﬂection costs O ( kd ) and thus the per step cost is O ( ρ kd ). However, our improved version performsmore e ﬃ ciently both point and direction updates in pseudo-code by storing computations from the previousiteration combined with a preprocessing step. The preprocessing step involves the normal vectors of thefacets, that takes m d operations, and the amortized per-step complexity of BW becomes O (( ρ + d ) k ). Thepseudo-code appear in Alg. 1. (cid:73) Lemma 1.

The amortized per step complexity of the BW (Alg. 1) is O (( ρ + d ) k ) operations after apreprocess step that takes O ( k d ) operations, where ρ is the maximum number of reﬂections per step. Proof.

The ﬁrst reﬂection of a BW step costs O ( kd ). During its computation, we store all the values ofthe inner products a Tj x and a Tj (cid:51) . At the reﬂection i >

0, we start from a point x i , and the solutions of the https://github.com/GeomScale/volume_approximation/tree/metabolic_networks halkis et al. 7 Algorithm 1:

Billiard_Walk( P , p , ρ, τ, W ) Input : polytope P ; point p ; upper bound on the number of reﬂections ρ ; length of trajectoryparameter τ ; walk length W . Require: point p ∈ P Output :

A point in P for j = , . . . , W do L ← − τ ln η , η ∼ U (0 , // length of the trajectory i ← // current number of reflections p ← p // initial point of the step pick a uniform vector (cid:51) from the boundary of the unit ball do (cid:96) ← { p i + t (cid:51) i , ≤ t ≤ L } // segment if ∂ P ∩ (cid:96) = ∅ then p i + ← p i + L (cid:51) i ; break ; p i + ← ∂ P ∩ (cid:96) // point update the inner vector, s , of the tangent plane at p , s.t. || s || = L ← L − | P ∩ (cid:96) | ; (cid:51) i + ← (cid:51) i − (cid:51) Ti s ) s // direction update i ← i + while i ≤ ρ ; if i = ρ then p ← p else p ← p i ; return p ;corresponding linear equations are a Tj ( p i + t j (cid:51) i ) = b j ⇒ a Tj ( p i − + t i − (cid:51) i − ) + t j a Tj ( (cid:51) i − − (cid:51) Ti − a r ) a r ) = b j ⇒ t j = b j − a Tj ( p i − + t i − (cid:51) i − ) a Tj ( (cid:51) i − − (cid:51) Ti − a r ) a r ) , for j ∈ [ k ] , (2)and (cid:51) i + = (cid:51) i − (cid:51) Ti a l ) a l , (3)where a r , a l are the normal vectors of the facets that (cid:96) hits at reﬂection i − i respectively, and t i − thesolution of the reﬂection i −

1. The index l of the normal a l corresponds to the equation with the smallestpositive t j in (2). We solve each of the equations in (3) in O (1) based on our bookkeeping from the previousreﬂection. We also store the inner product (cid:51) Ti a l in (3) from the previous reﬂection. After computing all a Ti a j as a preprocessing step, which takes k d operations, the total per-step cost of Billiard Walk is O (( d + ρ ) k ). (cid:74) To sample steady states in the ﬂux space of a metabolic network, with m metabolites and n reactions, weintroduce a Multiphase Monte Carlo Sampling (MMCS) algorithm; it is multiphase because it consists of asequence of sampling phases.Let S ∈ R m × n be the stoichiometric matrix and x lb , x ub ∈ R n bounds on the ﬂuxes. The ﬂux space is thebounded convex polytopeFS : = { x ∈ R n | S x = , x lb ≤ x ≤ x ub } ⊂ R n . (4)The dimension, d , of FS is smaller than the dimension of the ambient space; that is d ≤ n . To work with afull dimensional polytope we restrict the box induced by the inequalities x lb ≤ x ≤ x ub to the null space of Geometric analysis of metabolic networks

Figure 3

An illustration of our Multiphase Monte Carlo Sampling algorithm. The method is given an integer n andstarts at phase i = P . In each phase it samples a maximum number of points λ . If the sum of E ﬀ ectiveSample Size in each phase becomes larger than n before the total number of samples in P i reaches λ then the algorithmterminates. Otherwise, we proceed to a new phase. We map back to P all the generated samples of each phase. S . Let the H-representation of the box be (cid:40) x ∈ R n (cid:12)(cid:12)(cid:12)(cid:12) (cid:32) I n − I n (cid:33) x ≤ (cid:32) x ub x lb (cid:33)(cid:41) , where I n is the n × n identity matrix,and let N ∈ R n × d be the matrix of the null space of S , that is S N = m × d . Then P = { x ∈ R d | Ax ≤ b } ,where A = (cid:32) I n N − I n N (cid:33) and b = (cid:32) x ub x lb (cid:33) N , is a full dimensional polytope (in R d ). After we sample (uniformly)points from P , we transform them to uniformly distributed points (that is steady states) in FS by applyingthe linear map induced by N .MMCS generates, in a sequence of sampling phases, a set of points, that is almost equivalent to n independent uniformly distributed points in P , where n is given. At each phase, it employs Billiard Walk(Section 2) to sample approximate uniformly distributed points, rounding to speedup sampling, and uses theE ﬀ ective Sample Size (ESS) diagnostic to decide termination. The pseudo-code of the algorithm appearsin Alg. 2. Overview.

Initially we set P = P .At each phase i ≥ λ points from P i . We generate them in chunks; we also callthem chain of sampling points. Each chain contains at most l points (for simplicity consider l = O (1)).To generate the points in each chain we employ BW, starting from a point inside P i ; the starting pointis di ﬀ erent for each chain. We repeat this procedure until the total number of samples in P i reaches themaximum number λ ; we need λ l chains. To compute a starting point for a chain, we pick a point uniformlyat random in the Chebychev ball of P i and we perform O ( √ d ) burn-in BW steps to obtain a warm start.After we have generated λ sample points we perform a rounding step on P i to obtain the polytope of thenext phase, P i + . In particular, we compute a linear transformation, T i , that puts the sample into isotropicposition and then P i + = T i ( P i ). The e ﬃ ciency of BW improves from one phase to the next one because thesandwiching ratio decreases and so the average number of reﬂections decreases and thus the convergence tothe uniform distribution accelerates (Section 4.2). That is we obtain faster a sample of better quality. Finally,the (product of the) inverse transformations maps the samples to P = P . Fig. 3 depicts the procedure. Termination.

There are no bounds on the mixing time of BW [16], hence for termination we relyon ESS. MMCS terminates when the minimum ESS among all the univariate marginals is larger than arequested value. We chose the marginal distributions (of each ﬂux) because they are essential for systemsbiologists, see [5] for a typical example. In particular, after we generate a chain, the algorithm updates theESS of each univariate marginal to take into account all the points that we have sampled in P i , including thenewly generated chain. We keep the minimum, say n i , among all marginal ESS values. If (cid:80) ij = n j becomeslarger than n before the total number of samples in P i reaches the upper bound λ , then MMCS terminates.Otherwise, we proceed to the next phase. In summary, MMCS terminates when the sum of the minimummarginal ESS values of each phase reaches n . halkis et al. 9 Algorithm 2:

Multiphase Monte Carlo Sampling( P , n , l , λ, ρ, τ, W ) Input :

A full dimensional polytope P ∈ R d ; Requested e ﬀ ectiveness n ∈ N ; l length of eachchain; upper bound for the number of generated points in each phase λ ; upper bound onthe number of reﬂections ρ ; length of trajectory parameter τ ; walk length W . Output :

A set of approximate uniformly distributed points S ∈ P Set P ← P , sum _ ess ← , S ← ∅ , i ← , T = I d ; do sum _ point _ phase ← , U ← ∅ ; do Set Q ← ∅ ;Generate a starting point q ∈ P i ; for j = , . . . , l do q j ← Billiard_Walk( P i , q j − , ρ, τ, W );Store the point q j to the set Q ; S ← S ∪ T − i ( Q ); U ← U ∪ Q ; sum _ point _ phase ← sum _ point _ phase + l ;Update ESS n i of this phase; if sum_ess + n i ≥ n then break ; while sum_point_phase < λ ; sum _ ess ← sum _ ess + n i ;Compute T such that T ( U ) is in isotropic position; P i + ← T ( P i ); T i + ← T i ◦ T ; i ← i + while sum_ess < n ; return S ; Rounding step . This step is motivated by the theoretical result in [1] and the rounding algorithms [29,12]. We apply the linear transformation T i to P i so that the sandwiching ratio of P i + is smaller than thatof P i . To ﬁnd the suitable T i we compute the SVD decomposition of the matrix that contains the samplerow-wise [3]. Updating the E ﬀ ective Sample Size . The e ﬀ ective sample size of a sample of points generated by aprocess with autocorrelations ρ t at lag t is function (actually an inﬁnite series) in the ρ t ’s; its exact value isunknown. Following [15], we e ﬃ ciently compute ESS employing a ﬁnite sum of monotone estimators ˆ ρ t ofthe autocorrelation at lag t , by exploiting Fast Fourier Transform. To update the ESS, for every new chainof points the algorithm generates, we compute the estimator of its autocorrelation. Then, using Welford’salgorithm we update the average of the estimators of autocorrelation at lag t , as well as the between-chainvariance and the within-sample variance estimators given in [14]. Finally, we update the ESS using theseestimators. (cid:73) Lemma 2.

Let P = { x ∈ R d | Ax ≤ b } , A ∈ R k × d , b ∈ R k a full dimensional polytope in R d . The totalnumber of operations per phase that Alg. 2 performs, is O ( W ( ρ + d ) k λ + λ d + d ), where W is the walklength for Billiard Walk. Proof.

The cost per step of Billiard Walk is O (( ρ + d ) k ). In each phase we generate with Billiard Walk atmost λ points with walk length W . Thus, the cost to generate those points is O ( W ( ρ + d ) k λ ).To compute the starting point of each chain the algorithm picks a random point uniformly distributedin the Chebychev ball of P and performs O (1) Billiard Walk steps starting from it. The former takes O ( d ) operations and latter takes O ( W ( ρ + d ) k ) operations. The total number of chains is O ( λ/ l ) = O ( λ ), as l = O (1). Thus, the total cost to generate all the starting points is O ( d λ + W ( ρ + d ) k λ ). The update of ESSof each univariate marginal takes O (1) operations since l = O (1).If the termination criterion has not been met after generating λ points, the algorithm computes a lineartransformation to put the set of points to isotropic position. We can do this by computing the SVD decom-position of the matrix that contains the set of points row-wise. This corresponds to an SVD of a λ × d matrixand takes O ( λ d + d ) operations [20]. (cid:74) In Section 4 we discuss how to tune the parameters of MMCS to make it more e ﬃ cient in practice. Wealso comment on the (practical) complexity of each phase, based on the tunning. In the sequel we present the implementation of our approach and the tuning of various parameters. Wepresent experiments in an extended set of BIGG models [24], including the most complex metabolic net-works i.e., the human Recon2D [48] and Recon3D [6]. We end up to sample from polytopes of thousandsof dimensions and show that our method can estimate precisely the ﬂux distributions. We analyze variousaspects of our method as the runtime, the e ﬃ ciency and the quality of the output. We compare our methodagainst the state-of-the-art software for the analysis of metabolic networks, which is the Matlab toolbox of cobra [19]. Our implementation for low dimensional networks is two orders of magnitude faster than cobra . As the dimension grows this gap on the run-time increases. The fast mixing of billiard walk allowus to use all the generated samples to approximate each ﬂux distribution improving the ﬂux distributionestimation.We provide a complete open-source software framework to handle big metabolic networks. The frame-work loads a metabolic model in some standard format (e.g., mat, json ﬁles) and performs an analysisof the model e.g., compute the marginal distributions of a given metabolite. All the results in this paperare reproducible using our publicly available code . The core of our implementation is in C++ to optim-ize performance while the user interface is implemented in R . The package employs eigen [17] for linearalgebra, boost [33] for random number generation, mosek [2] as the linear program solver, and expands volesti [11] an open-source package for high dimensional sampling and volume approximation. Allexperiments were performed on a PC with Intel ® Core ™ i7-6700 3.40GHz × and . We give details on how we tune various parameters presented in Section 3 in our implementation.

Parameters of Billiard Walk : To employ Billiard Walk (Section 2) we have to make e ﬃ cient selectionsfor the parameter τ that controls the length of the trajectory in each step, for the maximum number ofreﬂections per step ρ , and for the walk length W of the random walk. We experimentally found that setting W = W >

1. To set τ in phase i , ﬁrst we set τ = √ dr where r is the radius of theChebychev ball of P i . Then, we start from the center of the Chebychev ball, we perform 100 + √ d BilliardWalk steps and we store all the points in a set Q . Then we set τ = max { max q ∈ Q {|| q − p || } , √ dr } . For themaximum number of reﬂections we found experimentally that ρ = d is violated in less than 0 .

1% of thetotal number of Billiard Walk steps in our experiments.

Rounding step : In each phase i of our method, when the minimum value of ESS among all the mar-ginals has not reached the requested threshold, we use the generated sample to perform a rounding step bymapping the points to isotropic position. After computing the SVD decomposition of the point-set we alsorescale the singular values such that the smallest one is 1, to improve numerical stability as suggested in https://github.com/GeomScale/volume_approximation/tree/metabolic_networks halkis et al. 11 Figure 4

Our method estimation of the marginal distribution of the “Thioredoxin reductace" ﬂux in a constraint-based model of Homo Sapiens metabolism Recon2D [48] (left) and Recon3D [6] (right). [12]. We found experimentally that setting the maximum number of Billiard Walk points per phase λ = d ,where d is the dimension of the polytope, su ﬃ ce to improve the roundness from phase to phase. When,in any phase, the ratio between the maximum over the minimum singular value is smallest than 3 we stopperforming any new rounding step. In that case we stay on the current phase until we reach the requestedvalue of ESS. (cid:73) Remark.

Given the Stoichiometric matrix S ∈ R m × n of a metabolic network with ﬂux bounds x lb ≤ x ≤ x ub , the total number of operations per phase that our implementation of Alg. 2 performs, according theparameterization given in this Section is O ( nd ), where d is the dimension of the null space of S and n isthe number of reactions occur in the metabolic network. We test and evaluate our software on 17 models from the BIGG database [24] and Recon2D, Recon3Dfrom [36]. In particular, we sample from models that correspond to polytopes of dimension less than100; the simplest model in this setting is the well known bacteria

Escherichia Coli . We also computedwith models that correspond to polytopes of dimension a few thousands; this is the case for Recon2D andRecon3D. We do not employ parallelism for any implementation, thus we report only sequential runningtimes. To assess the quality of our results we employ a second MCMC convergence diagnostic besides theE ﬀ ective Sample Size (ESS). This is the potential scale reduction factor (PSRF), introduced by Rubin andGelman [14]. In particular, we compute the PSRF for each univariate marginal of the sample that MMCSoutputs. Following [14], a convergence is satisfying according to PSRF when all the marginals have PSRFsmaller than 1 . cobra for sampling ﬁrst performs a rounding step and then samples using CoordinateDirections Hit-and-Run (CDHR). To compare with cobra we set the walk length of CDHR according to theempirical suggestion made in [18], i.e., equal to 8 d , where d is the dimension of the polytope we sample.For Recon2D we follow the paradigm in [18] which shows that the method converges for walk length equalto 1 . e +

08. To have a fair comparison we let cobra to sample a minimum number of 1 000 points. If inthe computed sample there is a marginal with PSRF larger than 1 . . cobra . We run MMCS until we get a value of ESS equalto 1 000; meaning that we stop when the sum over all phases of the minimum values of ESS among all themarginals is larger than 1 000. Moreover, in Table 1 all the marginals of the sample that MMCS returns havePSRF < .

1. This is another statistical evidence on the quality of the generated sample. The histogramsin Fig. 4 illustrate an approximation for the ﬂux distribution of the reaction Thioredoxin as computedin Recon2D and Recon3D respectively. The same marginal ﬂux distribution in Recon2D was estimatedalso in [18]. Notice that the estimated density slightly changes in Recon3D as the stoichiometric matrix

MMCS cobra name (m) (n) (d) Time (sec) (N) Time (sec) (N)e_coli_core 72 95 24 6.50e-01 3.40e +

03 (8) 7.20e +

01 4.61e + +

00 5.40e +

03 (5) 4.54e +

02 2.79e + +

01 8.20e +

03 (5) 9.56e +

02 5.51e + +

01 6.80e +

03 (4) 1.03e +

03 6.19e + +

01 8.10e +

03 (4) 1.17e +

03 6.62e + +

01 6.20e +

03 (4) 1.33e +

03 6.77e + +

01 8.70e +

03 (3) 2.22e +

03 1.07e + +

01 1.07e +

04 (5) 7.85e +

03 4.05e + +

02 1.62e +

04 (4) 8.81e +

03 4.12e + +

02 1.04e +

04 (2) 1.73e +

04 6.68e + +

03 2.31e +

04 (3) 6.66e +

04 2.07e + +

03 5.33e +

04 (6) 7.04e +

04 2.13e + +

03 3.95e +

04 (4) 9.42e +

04 2.67e + +

03 5.14e +

04 (5) 9.99e +

04 2.71e + +

03 4.22e +

04 (4) 1.05e +

05 2.97e + +

03 5.65e +

04 (5) 1.15e +

05 3.21e + +

03 1.94e +

04 (2) 3.20e +

05 6.93e + +

04 5.44e +

04 (2) ∼

140 days 1 . e + +

05 1.44e +

05 (2) – –

Table 1

17 metabolic networks from [24] and Recon2D, Recon3D from [36]; (m) the number of Metabolites, (n)the number of Reactions, (d) the dimension of the polytope; (N) is the total number of sampled points × walk length; forMMCS we stop when the sum of the minimum value of ESS among all the univariate marginals in each phase is 1000(we report the number of phases in parenthesis); for cobra we set the walk length to 8 d and 1 . e +

08 for Recon2Dfollowing [18], sample at least 1000 points and stop when all marginals have PSRF < .

1; the runtime of cobra forRecon2D is an estimation of the sequential time just for the purpose of the comparison in this paper. has been updated and thus the corresponding marginal is a ﬀ ected. In Fig. 2 we also employ the copularepresentation to capture the dependence between two ﬂuxes of reactions to conﬁrm a mutually exclusivepair of biochemical pathways. Notice that the run-time of MMCS is one or two orders of magnitudesmaller than the run-time of cobra and this gap becomes much larger for higher dimensional models suchas Recon2D and Recon3D.For some models –we report them in Table 3– we introduce a further improvement to obtain a betterconvergence. If there is a marginal in the generated sample from MMCS that has a PSRF larger than 1 . k ﬁrst phases, starting with k = . k phases– nor we sum up its ESS to the overall ESSconsidered for termination by MMCS. Note that for these models it is not practical to repeat MMCS runsfor di ﬀ erent k until we get the required PSRF value. We can obtain the ﬁnal results –reported in Tables 1–in one pass. We simply drop a phase when the ESS reaches the requested value but the PSRF is not smallerthan 1 . ﬀ erent k just forperformance analysis reasons.Interestingly, the total number of Billiard Walk steps -and consequently the run-time- does not increaseas k increases in Table 3. That means that the performance of our method improves for those models, whenwe do not take into account the k ﬁrst phases of MMCS. That occurs because the performance of BilliardWalk improves as the polytope becomes more rounded from phase to phase. In particular, in Table 2 weanalyze Billiard Walk performance for the model iAF1260. We sample 20 d points per phase with walklength equal to 1 and we report the average number of reﬂections, the ESS, the run-time and the ratio σ max /σ min per phase. The later is the ratio between the maximum over the minimum singular value of thepoint-set. The larger this ratio is the more skinny the polytope of the corresponding phase is. As the methodprogresses from the ﬁrst to the last phase, the average number of reﬂections and the run-time decrease and halkis et al. 13 Sampling from iAF1260Phase Avg. σ max σ min Time (sec)1st 7819 67 43459 22712nd 4909 68 922 16313rd 3863 77 582 12784th 3198 71 360 10805th 1300 592 29 4546th 1187 4821 3.5 4177th 1181 4567 2.8 415

Table 2

We sample 20 d = d =

516 isthe dimension of the corresponding polytope. For each phase we report the average number of reﬂections per BIlliardWalk step, the the minimum value of E ﬀ ective Sample Size among all the univariate marginals, the ratio between themaximum over the minimum singular value derived from the SVD decomposition of the generated sample and therun-time. the ESS increases. That means that as the polytope becomes more rounded from phase to phase, the BilliardWalk step becomes faster and the generated sample has better quality. That explains why the total run-timedoes not increase when we do not take into account a number k of ﬁrst phases: the initial phases are slowand they poorly contribute to the quality of the ﬁnal sample; the last phases are fast and contribute withmore accurate samples. References Radosław Adamczak, Alexander Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Quantitative es-timates of the convergence of the empirical covariance matrix in log-concave ensembles.

Journal of theAmerican Mathematical Society , 23(2):535–561, 2010. doi:10.1090/S0894-0347-09-00650-X . MOSEK ApS.

The MOSEK optimization toolbox for R manual. Version 9.2. , 2019. URL: https://docs.mosek.com/9.2/rmosek/index.html . Shiri Artstein-Avidan, Haim Kaplan, and Micha Sharir. On radial isotropic position: Theory and al-gorithms, 2020. arXiv:2005.04918 . David B Bernstein, Floyd E Dewhirst, and Daniel Segre. Metabolic network percolation quantiﬁes biosyn-thetic capabilities across the human oral microbiome.

Elife , 8:e39733, 2019. Sergio Bordel, Rasmus Agren, and Jens Nielsen. Sampling the solution space in genome-scale metabolicnetworks reveals transcriptional regulation in key enzymes.

PLOS Computational Biology , 6(7):1–13, 072010. doi:10.1371/journal.pcbi.1000859 . Elizabeth Brunk, Swagatika Sahoo, Daniel C Zielinski, Ali Altunkaya, Andreas Dräger, Nathan Mih,Francesco Gatto, Avlant Nilsson, German Andres Preciat Gonzalez, Maike Kathrin Aurich, et al. Re- con3D enables a three-dimensional view of gene variation in human metabolism.

Nature biotechnology ,36(3):272, 2018. Ali Cakmak, Xinjian Qi, A Ercument Cicek, Ilya Bederman, Leigh Henderson, Mitchell Drumm, andGultekin Ozsoyoglu. A new metabolomics analysis technique: steady-state metabolic network dynamicsanalysis.

Journal of bioinformatics and computational biology , 10(01):1240003, 2012. Ludovic Calès, Apostolos Chalkis, Ioannis Z. Emiris, and Vissarion Fisikopoulos. Practical Volume Com-putation of Structured Convex Bodies, and an Application to Modeling Portfolio Dependencies and Finan-cial Crises. In Bettina Speckmann and Csaba D. Tóth, editors, , volume 99 of

LIPIcs , pages 19:1–19:15, Dagstuhl, Germany, 2018. SchlossDagstuhl–Leibniz-Zentrum fuer Informatik. doi:10.4230/LIPIcs.SoCG.2018.19 . Apostolos Chalkis, Ioannis Z. Emiris, and Vissarion Fisikopoulos. Practical volume estimation by a newannealing schedule for cooling convex bodies, 2019. arXiv:1905.05494 . Apostolos Chalkis, Ioannis Z. Emiris, and Vissarion Fisikopoulos. Practical volume estimation of zono-topes by a new annealing schedule for cooling convex bodies. In Anna Maria Bigatti, Jacques Carette,

James H. Davenport, Michael Joswig, and Timo de Wol ﬀ , editors, Mathematical Software – ICMS 2020 ,pages 212–221, Cham, 2020. Springer International Publishing. Apostolos Chalkis and Vissarion Fisikopoulos. volesti: Volume approximation and sampling for convexpolytopes in R, 2020. https://github.com/GeomScale/volume_approximation . arXiv:2007.01578 . Ben Cousins and Santosh Vempala. A practical volume algorithm.

Mathematical Programming Computa-tion , 8(2):133–160, 2016. Shirin Fallahi, Hans J Skaug, and Guttorm Alendal. A comparison of Monte Carlo sampling methods formetabolic network models.

PLOS One , 15(7):e0235393, 2020. Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences.

Statistical Science , 7(4):457–472, 1992. Publisher: Institute of Mathematical Statistics. URL: . Charles J. Geyer. Practical Markov chain Monte Carlo.

Statist. Sci. , 7(4):473–483, 11 1992. doi:10.1214/ss/1177011137 . Elena Gryazina and Boris Polyak. Random sampling: Billiard walk algorithm.

European Journal ofOperational Research , 238(2):497 – 504, 2014. doi:https://doi.org/10.1016/j.ejor.2014.03.041 . Gaël Guennebaud, Benoît Jacob, et al.

Eigen v3 , 2010. URL: http://eigen.tuxfamily.org . Hulda S Haraldsdóttir, Ben Cousins, Ines Thiele, Ronan MT Fleming, and Santosh Vempala. CHRR:coordinate hit-and-run with rounding for uniform sampling of constraint-based models.

Bioinformatics ,33(11):1741–1743, 2017. Laurent Heirendt, Sylvain Arreckx, Thomas Pfau, Sebastián N Mendoza, Anne Richelle, Almut Heinken,Hulda S Haraldsdóttir, Jacek Wachowiak, Sarah M Keating, Vanja Vlasov, et al. Creation and analysisof biochemical constraint-based models using the cobra toolbox v. 3.0.

Nature protocols , 14(3):639–702,2019. Gene H.Golub and Charles F. Van Loan.

Matrix Computations . Johns Hopkins Studies in the MathematicalSciences. Johns Hopkins University Press, 2013. Trey Ideker, Timothy Galitski, and Leroy Hood. A new approach to decoding life: systems biology.

Annualreview of genomics and human genetics , 2(1):343–372, 2001. Fritz John. Extremum Problems with Inequalities as Subsidiary Conditions. In Giorgio Giorgi andTinne Ho ﬀ Kjeldsen, editors,

Traces and Emergence of Nonlinear Programming , pages 197–215. Springer,Basel, 2014. doi:10.1007/978-3-0348-0439-4_9 . David E Kaufman and Robert L Smith. Direction choice for accelerated convergence in hit-and-runsampling.

Operations Research , 46(1):84–95, 1998. Zachary A King, Justin Lu, Andreas Dräger, Philip Miller, Stephen Federowicz, Joshua A Lerman, Ali Eb-rahim, Bernhard Ø. Palsson, and Nathan E Lewis. Bigg models: A platform for integrating, standardizingand sharing genome-scale models.

Nucleic acids research , 44(D1):D515–D522, 2016. Edda Klipp, Wolfram Liebermeister, Christoph Wierling, and Axel Kowald.

Systems biology: a textbook . John Wiley & Sons, 2016. Peter Kohl, Edmund J Crampin, TA Quinn, and Denis Noble. Systems biology: an approach.

ClinicalPharmacology & Therapeutics , 88(1):25–33, 2010. Aditi Laddha and Santosh Vempala. Convergence of Gibbs Sampling: Coordinate Hit-and-Run MixesFast, 2020. arXiv:2009.11338 . László Lovász, Ravi Kannan, and Miklós Simonovits. Random walks and an O ∗ ( n ) volume algorithm forconvex bodies. Random Structures and Algorithms , 11:1–50, 1997. László Lovász and Santosh Vempala. Simulated annealing in convex bodies and an o ∗ ( n ) volume al-gorithms. J. Computer & System Sciences , 72:392–417, 2006. Maximilian Lularevic, Andrew J Racher, Colin Jaques, and Alexandros Kiparissides. Improving the accur-acy of ﬂux balance analysis through the implementation of carbon availability constraints for intracellularreactions.

Biotechnology and bioengineering , 116(9):2339–2352, 2019. halkis et al. 15 Michael MacGillivray, Amy Ko, Emily Gruber, Miranda Sawyer, Eivind Almaas, and Allen Holder. Robustanalysis of ﬂuxes in genome-scale metabolic pathways.

Scientiﬁc Reports , 7, 12 2017. doi:10.1038/s41598-017-00170-3 . Daniel Machado, Sergej Andrejev, Melanie Tramontano, and Kiran Raosaheb Patil. Fast automated re-construction of genome-scale metabolic models for microbial species and communities.

Nucleic acidsresearch , 46(15):7542–7553, 2018. Jens Maurer and Steven Watanabe. Boost random number library. Software, 2017. URL: . Hariharan Narayanan and Piyush Srivastava. On the mixing time of coordinate hit-and-run, 2020. arXiv:2009.14004 . Denis Noble.

The music of life: biology beyond genes . Oxford University Press, 2008. Alberto Noronha, Jennifer Modamio, Yohan Jarosz, Elisabeth Guerard, Nicolas Sompairac, German Pre-ciat, Anna Dröfn Daníelsdóttir, Max Krecke, Diane Merten, Hulda S Haraldsdóttir, Almut Heinken,Laurent Heirendt, Stefanía Magnúsdóttir, Dmitry A Ravcheev, Swagatika Sahoo, Piotr Gawron, LuciaFriscioni, Beatriz Garcia, Mabel Prendergast, Alberto Puente, Mariana Rodrigues, Akansha Roy, MoussRouquaya, Luca Wiltgen, Alise Žagare, Elisabeth John, Maren Krueger, Inna Kuperstein, Andrei Zinovyev,Reinhard Schneider, Ronan M T Fleming, and Ines Thiele. The Virtual Metabolic Human database: in-tegrating human and gut microbiome metabolism with nutrition and disease.

Nucleic Acids Research ,47(D1):D614–D624, 10 2018. doi:10.1093/nar/gky992 . Je ﬀ rey D Orth, Ines Thiele, and Bernhard Ø. Palsson. What is ﬂux balance analysis? Nature biotechnology ,28(3):245–248, 2010. Bernhard Ø. Palsson. Metabolic systems biology.

FEBS letters , 583(24):3900–3904, 2009. Bernhard Ø. Palsson.

Systems biology . Cambridge university press, 2015. Octavio Perez-Garcia, Gavin Lear, and Naresh Singhal. Metabolic network modeling of microbial interac-tions in natural and engineered environmental systems.

Frontiers in microbiology , 7:673, 2016. Robert A. Quinn, Jose A. Navas-Molina, Embriette R. Hyde, Se Jin Song, Yoshiki Vázquez-Baeza, GregHumphrey, James Ga ﬀ ney, Jeremiah J. Minich, Alexey V. Melnik, Jakob Herschend, Je ﬀ DeReus, AustinDurant, Rachel J. Dutton, Mahdieh Khosroheidari, Cli ﬀ ord Green, Ricardo da Silva, Pieter C. Dorrestein,and Rob Knight. From sample to multi-omics conclusions in under 48 hours. msystems 1: e00038-16. Crossref, Medline , 2016. Vivekananda Roy. Convergence Diagnostics for Markov Chain Monte Carlo.

Annual Review of Statisticsand Its Application , 7(1):387–412, 2020. doi:10.1146/annurev-statistics-031219-041300 . Pedro A. Saa and Lars K. Nielsen. ll-ACHRB: a scalable algorithm for sampling the feasible solution spaceof metabolic networks.

Bioinform. , 32(15):2330–2337, 2016. doi:10.1093/bioinformatics/btw132 . Jan Schellenberger and Bernhard Ø. Palsson. Use of randomized sampling for analysis of metabolic net-works.

Journal of biological chemistry , 284(9):5457–5461, 2009. John R Schramski, Anthony I Dell, John M Grady, Richard M Sibly, and James H Brown. Metabolic theorypredicts whole-ecosystem properties.

Proceedings of the National Academy of Sciences , 112(8):2617–2622, 2015. Siamak S. Shishvan, Andrea Vigliotti, and Vikram S. Deshpande. The homeostatic ensemble for cells.

Biomechanics and Modeling in Mechanobiology , 17(6):1631–1662, 2018. Robert L. Smith. E ﬃ cient Monte Carlo procedures for generating points uniformly distributed overbounded regions. Operations Research , 32(6):1296–1308, 1984. Neil Swainston, Kieran Smallbone, Hooman Hefzi, Paul D. Dobson, Judy Brewer, Michael Hanscho,Daniel C. Zielinski, Kok Siong Ang, Natalie J. Gardiner, Jahir M. Gutierrez, Sarantos Kyriakopoulos,Meiyappan Lakshmanan, Shangzhong Li, Joanne K. Liu, Veronica S. Martínez, Camila A. Orellana, Lake-Ee Quek, Alex Thomas, Juergen Zanghellini, Nicole Borth, Dong-Yup Lee, Lars K. Nielsen, Douglas B.Kell, Nathan E. Lewis, and Pedro Mendes. Recon 2.2: from reconstruction to model of human metabolism.

Metabolomics , 12(7):109, June 2016. doi:10.1007/s11306-016-1051-4 . Ines Thiele and Bernhard Ø. Palsson. A protocol for generating a high-quality genome-scale metabolicreconstruction.

Nature protocols , 5(1):93, 2010.

A Additional experiments

Sampling from iAF1260Do not take into account the sample of the k ﬁrst phases Time (sec) PSRF < Table 3

We run our method and we do not take into account the sample of the k ﬁrst phases, thus we do not alsocount the value of the E ﬀ ective Sample Size (ESS) in those phases, before we start storing the generated sample andsum up the ESS of each phase. In all cases MMCS stops when the sum of ESS reaches 1000. For each case we reportthe total run-time, the percentage of the marginals that have PSRF smaller than 1 . k ﬁrst phases and the total number of Billiard Walk steps (N) included those performed in the kk