Improved estimations of stochastic chemical kinetics by finite state expansion
Tabea Waizmann, Luca Bortolussi, Andrea Vandin, Mirco Tribastone
IImproved estimations of stochastic chemical kinetics by finitestate expansion
Tabea Waizmann , Luca Bortolussi , Andrea Vandin , Mirco Tribastone IMT School for Advanced Studies, Lucca, Italy Department of Mathematics and Geosciences, University of Trieste, Italy Sant’Anna School for Advanced Studies, Pisa, Italy* [email protected] article has been submitted to the journal PLOS Computational Biology.
Abstract
Quantitative mechanistic models based on reaction networks with stochastic chemicalkinetics can help elucidate fundamental biological process where random fluctuationsare relevant, such as in single cells. The dynamics of such models is described by themaster equation, which provides the time course evolution of the probabilitydistribution across the discrete state space consisting of vectors of population levels ofthe interacting biochemical species. Since solving the master equation exactly is verydifficult in general due to the combinatorial explosion of the state space size, severalanalytical approximations have been proposed. The deterministic rate equation (DRE)offers a macroscopic view of the system by means of a system of differential equationsthat estimate the average populations for each species, but it may be inaccurate in thecase of nonlinear interactions such as in mass-action kinetics. Here we propose finitestate expansion (FSE), an analytical method that mediates between the microscopicand the macroscopic interpretations of a chemical reaction network by coupling themaster equation dynamics of a chosen subset of the discrete state space with thepopulation dynamics of the DRE. This is done via an algorithmic translation of aJuly 6, 2020 1/33 a r X i v : . [ q - b i o . M N ] J u l hemical reaction network into a target expanded one where each discrete state isrepresented as a further distinct chemical species. The translation produces a networkwith stochastically equivalent dynamics, but the DRE of the expanded network can beinterpreted as a correction to the original ones. Through a publicly available softwareimplementation of FSE, we demonstrate its effectiveness in models from systems biologywhich challenge state-of-the-art techniques due to the presence of intrinsic noise,multi-scale population dynamics, and multi-stability. Author summary
Many biological systems exhibit random fluctuations which are of fundamentalimportance, for example, at the level of single cells. The elucidation of such behaviorcan be assisted by quantitative mechanistic models that are typically described byreaction networks with stochastic kinetics, yielding a microscopic description that tracksdiscrete changes in the populations of the interacting biochemical species. In manycircumstances, however, exact analytical solutions of these models are not available, andsimulation methods can be too demanding computationally. On the other hand,macroscopic descriptions that are based on deterministic approximations can yieldinaccurate estimates because they fail to account for the inherent stochasticity in thesystem. In this article we present finite state expansion, a method that interpolatesbetween the microscopic and macroscopic views by explicitly tracking a subset of theoriginal discrete configurations and consistently coupling their dynamics withdeterministic variables. Using a software implementation of our method on models fromthe literature that challenge other state-of-the-art approximation techniques, we showthat finite state expansion improves deterministic estimates when the stochasticity ofthe system cannot be neglected.
Introduction
Chemical reaction networks are a fundamental model to analyze species that interactstochastically through reaction channels according to dynamics governed by thewell-known master equation [1]. This provides a microscopic description in terms of aJuly 6, 2020 2/33et of coupled linear differential equations, each defining the time course of a discretestate of the system as a vector of population counts of the chemical species involved. Itis widely understood, however, that the analysis of the master equation is intractable ingeneral, since analytical solutions are available only in special cases and directnumerical integration is hindered by the combinatorial growth of the state space as afunction of the abundances of the species. Networks with large numbers of species andreactions, and the correspondingly huge state spaces that they typically subsume, alsohave a considerable impact on the computational cost of stochastic simulationmethods [2]. In addition, forgoing an analytical treatment in favor of simulation maypreclude other important studies such as stability, perturbation analysis, bifurcation,and parameter inference [3, 4]. In all these cases it is useful to consider analyticalapproximations of the master equation that trade off precision with cost.The deterministic rate equation (DRE) provides a macroscopic dynamical view of achemical reaction network by associating one ordinary differential equation with eachspecies. The DRE solution gives the exact mean population levels as a function of timeif each reaction’s propensity function is linear, as occurs, for instance, in monomolecularchemical reaction networks [2]. With nonlinear propensity functions, under mildconditions the DRE does give the true expectations only in the thermodynamic limit [5].Away from this asymptotic regime, nonlinear propensities lead to DREs that provideonly an approximation to the true mean dynamics. This is the case, for example, inmodels of cell regulation which depend on low-abundance species (in the order of a fewunits) to describe the behavior of genes [6]. Processes such as activation anddeactivation that vary with time as a result of various interactions may introducesignificant variability in gene expression [7], caused by inherent stochasticity in thebio-molecular processes involved [8, 9]. Since such forms of noise are not accounted forin the DRE, approximation errors may be large.Here we present finite state expansion (FSE), a method which offers a mediationbetween the discrete and continuous representations of the master equation and theDRE, respectively. The former can be interpreted as corresponding to a situation whereevery discrete state is tracked; the latter, on the other hand, corresponds to thesituation where no discreteness is kept and the chemical species are observed onlythrough their approximate average populations. In essence, FSE bridges these twoJuly 6, 2020 3/33escriptions by keeping track discretely of only a user-defined subset of the state space,while collapsing the rest as a continuous approximation. In particular, the state spaceto be kept discretely is determined by a parameter which specifies the maximumpopulation level for each species to be tracked explicitly.FSE is realized by means of a systematic translation of a chemical reaction network(with arbitrary propensity functions) into an expanded one which features additionalspecies and modified reactions. Specifically, each tracked discrete state is represented asa new auxiliary species; the original set of reactions is transformed such that thedynamics of the auxiliary species are coupled with those of the original species, whoserole is to buffer the probability mass that falls out of the state space that is tracked. Inthis respect, FSE can be seen as a mass-preserving variation to the well-known finitestate projection method, which truncates the state space [10].FSE enjoys two useful properties. The first concerns its soundness , in the sense thatany expanded chemical reaction network is stochastically equivalent to the original one.In other words, their master equations can be put in exact correspondence with eachother. This is formally done by proving a result of aggregation for Markov chains knownas ordinary lumpability [11]. It essentially states that the state space of the expandednetwork can be projected onto a lower-dimensional one which still satisfies the Markovproperty and which turns out to correspond to the original network. Importantly, suchexact correspondence does not carry over to the respective DREs of the original and theexpanded networks. Indeed, any expansion arising from a strict subset of the discretestate space will lead to a DRE with more equations, which can be interpreted as refiningterms for the mean estimates. We demonstrate this experimentally with a number ofcase studies taken from the systems biology literature. Our second theoreticalcontribution is a result of asymptotic correctness , stating that if every discrete state istracked then the DRE of the expanded network corresponds to the master equation.There are several approaches for improving the accuracy of the DRE. These includemoment-closure approximations [12], the effective mesoscopic rate equation, which addscorrection terms to van Kampen’s well-known system size expansion [13, 14], and hybridtechniques [15]; ref. [3] offers an up-to-date review. However, these methods areapplicable under certain assumptions such as smoothness of the propensityfunctions [13, 16, 17], mass-action kinetics [18–22], specific structure of the chemicalJuly 6, 2020 4/33eaction network, e.g., to describe gene regulatory systems [23], and species that can bepartitioned into low- and high-abundance classes [15, 24, 25]. FSE, instead, can beapplied to any chemical reaction network in principle. Additionally, the case studiespresented in this paper were chosen as representative instances that may challenge thequality of the approximations by state-of-the-art methods. With these models we showthat FSE improves mean estimates when implementations of moment-closureapproximations give unphysical results or when the hybrid method from ref. [15]experiences numerical difficulties. Otherwise, FSE may outperform the effectivemesoscopic rate equation, and provide more accurate approximations than finite stateprojection when both methods track the same subset of the discrete state space.
Results
In this section we use a number of case studies from the literature to show that FSE canrefine the accuracy of the approximation of mean estimates even with modestexpansions. Analytical solutions of the master equations for the chosen models are notknown, and numerical solutions are difficult because the models give rise to Markovchains with infinite state spaces. For these reasons, we considered ground-truth meantrajectories computed by stochastic simulation via Gillespie’s algorithm [2]. Thenumerical experiments herein reported were performed with an implementation of FSEpublicly available with the software tool ERODE [26].
Schl¨ogl model
The well-known Schl¨ogl system is an autocatalytic process for a single species X [27].The DRE of the original Schl¨ogl model has two equilibrium points, owing to its strong(cubic) nonlinearity [28], deterministically converging only to one [29]. Its discrepancywith respect to the average mean trajectory computed by stochastic simulation has beenobserved for a long time [30]. Fig 1 provides a fully worked application of our FSE as afunction of the upper bound for species X , denoted by O X , which determines thelargest population level to be tracked explicitly in the expansion. The solutions to theDRE of the expanded networks show that larger values of such upper boundincreasingly improve the accuracy of the mean estimates.July 6, 2020 5/33 R1) 2 X k ! X (R2) 3 X k ! X (R3) k ! X (R4) X k !
Fig. 1. Finally, the propensity function f o is derived from that of the original reaction f as f o : R S O æ R +0 , with f o ( x ) = x J o K · f ( o + x | S ) , [2] where, for a given x œ R S O , x | S denotes its projection onto the original set of population classes S . This modification accounts for the fact that the observed state J o K encodes additional population counts, as given by the multiset o . Importantly, we prove that such a translation preserves the stochastic properties of the RN in the sense of ordinary lumpability of Markov chains (26) (see
SI Appendix, SI Text ). Denoting by ˆ P the probability distribution in the expanded RN, ordinary lumpability implies that P ‡ ( t ) = ÿ o + › = ‡ ˆ P J o K + › ( t ) , for all t and ‡ œ N S . [3] That is, the ME solution for a state ‡ in the original RN will exactly correspond to the sum of the ME solutions for all states in the expanded RN that track the same overall population levels. Furthermore, when the RN is fully expanded, i.e., when O = N S , we recover the original ME. Although the stochastic behavior of the source RN and any expansion are equivalent in this specific sense, their respective
DREs are not. The target RN has | O | + | S | variables: its solution can be interpreted as a corrected estimate of the solution of the | S | -variable source DRE. Applications
Schlögl’s model.
The Schögl model is an extensively studiedtri-molecular scheme (29), given by the RN A + 2 X k ≠æ X + A X k ≠æ X [4] B k ≠æ X + B X k ≠æ ÿ [5]Here, the parameters k , k , k , k are mass-action kinetic parameters. The associated propensity function is defined in the usual way, by counting the total distinct individual reactions that can occur in every state: for a reaction with reagents fl and kinetic parameter k , the propensity function for state ‡ is thus given by f k ( ‡ ) = k r S œ S ! ‡ S fl S " . The Schlögl model describes an autocatalytic process for species X in the presence of reservoirs for chemical species A and B which we assume not to vary with time. Overall, the scheme results in a one-dimensional RN which only tracks Time X (a) Individual simulation traces
Time X (b) Deterministic estimates
Fig. 2.
Evaluation of the Schlögl model with scheme in Eqs. 4-5 using kinetic param-eters k = 3 · ≠ , k = 10 ≠ , k = 10 ≠ , k = 3 . , taken from ref. (28).A) Representative realizations of the stochastic process demonstrate bimodality, withsteady state populations approaching ca. 600 and ca. 100, respectively, when startingfrom an initial condition with 200 molecules of species X , molecules of species A , and · molecules of species B . B) The DRE converges to a single equilibrium(ca. 85.50, blue line), causing a noticeable discrepancy with respect to the true mean(dotted line, computed as the average of simulations). Finite state expansionachieves excellent agreement with an upper bound O X = 650 . Time M A / M B SIM DRE 1-5 2-10 2-150 100 200 300 400
Time P A / P B Fig. 3.
Numerical simulations of the genetic toggle switch in scheme (6) comparingstochastic simulation, DRE and finite state expansions fixing O PA = O PB = 0 while using different upper bounds O M – O S for the number of copies of M A / M B and S A / S B (as indicated in the legend), respectively. Initial condition was the zerostate. The ODE system size for the tested choices of upper bounds is equal to ( O M + 1) · ( O S + 1) + 6 (corresponding to 150, 1095 and 2310 equations for O M – O S = 1 – , O M – O S = 2 – , and O M – O S = 2 – , respectively). Kineticparameters were chosen as follows: k = 0 . , k = 0 . , k = 1 . , k = 10 . , k = 0 . , k = 0 . , k = 20 . . Protein production (right plot) is controlled by alow population of precursor mRNA (left plot), which causes significant underestimationerrors with DRE. Increasing the upper bounds of finite state expansion consistentlyimproves the accuracy of the mean estimate. The corrections for species S A and S B , not reported here (see SI Fig.
S5), are similar. the number of molecules of X , while the initial populations of molecules A and B are taken as further model parameters. The discrepancy between the stochastic kinetics and the DRE approximation has been observed for a long time (30). Under an appropriate choice of kinetic parameters, the DRE features two equilibrium points owing to the strong (cubic) nonlinearity in the ODEs (31). Analogously, the underlying infinite-state stochastic process is well-known for the bimodality of the steady-state probability distribution of species X . In this case, the DRE may provide inaccurate estimates of the average population of species X because the DRE will deterministically converge only to one steady state (32). Finite state expansion can correct the mean estimates by expanding increasingly larger populations of species X , paying a linear cost in the resulting ODE system size (Figure 2). Genetic Toggle Switch.
The toggle switch network is a funda-mental regulatory system of two mutually repressing genes (33).Models of toggle-switch networks are mathematically challeng- et al.
PNAS |
December 12, 2019 | vol. XXX | no. XX | Original reaction network ··· i ··· Reaction network with finite state expansion
Observation bound D R A F T where the ‘+’ symbol in the reaction denotes multiset union,and multisets ÷ ,  and o Õ œ N S are given componentwise by ÷ S = max(0 , fl S ≠ o S )  S = max(0 , max(0 , o S ≠ fl S ) + fi S ≠ O S ) o Õ S = min( O S , max(0 , o S ≠ fl S ) + fi S ) . Intuitively, for each original reaction, Eq. (1) considers its behavior with respect to each observed configuration J o K . Any expanded reaction maintains the same overall counts of educts and products as the originating reaction, with a target observed configuration J o Õ K that results from the addition of products and removal of educts within the upper bound O . The multisets of original population classes ÷ and  act as bu er pools for configurations that are not explicitly observed. An example of such a construction is discussed in detail in
Fig. 1. Finally, the propensity function f o is derived from that of the original reaction f as f o : R S O æ R +0 , with f o ( x ) = x J o K · f ( o + x | S ) , [2] where, for a given x œ R S O , x | S denotes its projection onto the original set of population classes S . This modification accounts for the fact that the observed state J o K encodes additional population counts, as given by the multiset o . Importantly, we prove that such a translation preserves the stochastic properties of the RN in the sense of ordinary lumpability of Markov chains (26) (see
SI Appendix, SI Text ). Denoting by ˆ P the probability distribution in the expanded RN, ordinary lumpability implies that P ‡ ( t ) = ÿ o + › = ‡ ˆ P J o K + › ( t ) , for all t and ‡ œ N S . [3] That is, the ME solution for a state ‡ in the original RN will exactly correspond to the sum of the ME solutions for all states in the expanded RN that track the same overall population levels. Furthermore, when the RN is fully expanded, i.e., when O = N S , we recover the original ME. Although the stochastic behavior of the source RN and any expansion are equivalent in this specific sense, their respective
DREs are not. The target RN has | O | + | S | variables: its solution can be interpreted as a corrected estimate of the solution of the | S | -variable source DRE. Applications
Schlögl’s model.
The Schögl model is an extensively studiedtri-molecular scheme (29), given by the RN A + 2 X k ≠æ X + A X k ≠æ X [4] B k ≠æ X + B X k ≠æ ÿ [5]Here, the parameters k , k , k , k are mass-action kinetic parameters. The associated propensity function is defined in the usual way, by counting the total distinct individual reactions that can occur in every state: for a reaction with reagents fl and kinetic parameter k , the propensity function for state ‡ is thus given by f k ( ‡ ) = k r S œ S ! ‡ S fl S " . The Schlögl model describes an autocatalytic process for species X in the presence of reservoirs for chemical species A and B which we assume not to vary with time. Overall, the scheme results in a one-dimensional RN which only tracks Time X (a) Individual simulation traces
Time X (b) Deterministic estimates
Fig. 2.
Evaluation of the Schlögl model with scheme in Eqs. 4-5 using kinetic param-eters k = 3 · ≠ , k = 10 ≠ , k = 10 ≠ , k = 3 . , taken from ref. (28).A) Representative realizations of the stochastic process demonstrate bimodality, withsteady state populations approaching ca. 600 and ca. 100, respectively, when startingfrom an initial condition with 200 molecules of species X , molecules of species A , and · molecules of species B . B) The DRE converges to a single equilibrium(ca. 85.50, blue line), causing a noticeable discrepancy with respect to the true mean(dotted line, computed as the average of simulations). Finite state expansionachieves excellent agreement with an upper bound O X = 650 . Time M A / M B SIM DRE 1-5 2-10 2-150 100 200 300 400
Time P A / P B Fig. 3.
Numerical simulations of the genetic toggle switch in scheme (6) comparingstochastic simulation, DRE and finite state expansions fixing O PA = O PB = 0 while using different upper bounds O M – O S for the number of copies of M A / M B and S A / S B (as indicated in the legend), respectively. Initial condition was the zerostate. The ODE system size for the tested choices of upper bounds is equal to ( O M + 1) · ( O S + 1) + 6 (corresponding to 150, 1095 and 2310 equations for O M – O S = 1 – , O M – O S = 2 – , and O M – O S = 2 – , respectively). Kineticparameters were chosen as follows: k = 0 . , k = 0 . , k = 1 . , k = 10 . , k = 0 . , k = 0 . , k = 20 . . Protein production (right plot) is controlled by alow population of precursor mRNA (left plot), which causes significant underestimationerrors with DRE. Increasing the upper bounds of finite state expansion consistentlyimproves the accuracy of the mean estimate. The corrections for species S A and S B , not reported here (see SI Fig.
S5), are similar. the number of molecules of X , while the initial populations of molecules A and B are taken as further model parameters. The discrepancy between the stochastic kinetics and the DRE approximation has been observed for a long time (30). Under an appropriate choice of kinetic parameters, the DRE features two equilibrium points owing to the strong (cubic) nonlinearity in the ODEs (31). Analogously, the underlying infinite-state stochastic process is well-known for the bimodality of the steady-state probability distribution of species X . In this case, the DRE may provide inaccurate estimates of the average population of species X because the DRE will deterministically converge only to one steady state (32). Finite state expansion can correct the mean estimates by expanding increasingly larger populations of species X , paying a linear cost in the resulting ODE system size (Figure 2). Genetic Toggle Switch.
The toggle switch network is a funda-mental regulatory system of two mutually repressing genes (33).Models of toggle-switch networks are mathematically challeng- et al.
PNAS |
December 13, 2019 | vol. XXX | no. XX | Continuous-time Markov chains Mean approximationsStochastic simulations Deterministic Rate Equations
Original coupled mass-action propensityconditional probability O X
Original Finite state expansion E Finite state expansion F d J n K dt = I n