Interval Analysis of Worst-case Stationary Moments for Stochastic Chemical Reactions with Uncertain Parameters
aa r X i v : . [ ee ss . S Y ] F e b Interval Analysis of Worst-case Stationary Moments for Stochastic ChemicalReactions with Uncertain Parameters
Yuta Sakurai a) and Yutaka Hori b) Department of Applied Physics and Physico-Informatics,Keio University3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522,Japan
The dynamics of cellular chemical reactions are variable due to stochastic noise fromintrinsic and extrinsic sources. The intrinsic noise is the intracellular fluctuations ofmolecular copy numbers caused by the probabilistic encounter of molecules and ismodeled by the chemical master equation. The extrinsic noise, on the other hand,represents the intercellular variation of the kinetic parameters due to the variationof global factors affecting gene expression. The objective of this paper is to proposea computational framework to analyze the combined effect of the intrinsic and theextrinsic noise modeled by the chemical master equation with uncertain parameters.More specifically, we formulate a convex optimization problem to compute the inter-vals of the stationary solution of uncertain moment equations whose parameters aregiven only partially in the form of the statistics of their distributions. The optimiza-tion program is derived without approximating the governing equation in contrastwith many existing approaches. Thus, we can obtain guaranteed intervals of theworst possible values of the moments for all parameter distributions satisfying thegiven statistics, which is useful for model-based rational engineering of biomolecularcircuits in synthetic biology, where the robustness of synthetic reactions is impor-tant. We demonstrate the proposed optimization approach using two examples ofstochastic chemical reactions and show that the solution of the optimization problemgives practically useful upper and lower bounds of the statistics of the stationarycopy number distributions. a) Electronic mail: [email protected] b) Electronic mail: [email protected] . INTRODUCTION The dynamics of chemical reactions in cells are variable even when the cell populationis genetically identical. The stochastic response of the individual cells allows the cell pop-ulation to maintain diverse phenotypes in a homogeneous culture environement . Themechanism of the stochastic response is explained using the framework of two types of noisecalled intrinsic and extrinsic noise . The intrinsic noise is the intracellular fluctuations ofmolecular copy numbers caused by the probabilistic encounter of molecular species such asmRNA and proteins in a single cell, whose typical copy numbers are on the order of 10 to10 . The extrinsic noise, on the other hand, arises from the intercellular variation of theglobal factors affecting gene expression such as RNA polymerase and ribosome copy num-bers possibly due to the cell cycle and plasmid copy numbers, for example. Thus, it resultsin the cell-to-cell variation of the kinetic parameters of the stochastic chemical reactions.The dynamics of the intrinsic noise is modeled by a continuous-time discrete state Markovprocess on a semi-infinite integer lattice associated with the copy numbers of molecularspecies. The governing equation of the Markov process is called the chemical master equation(CME) , which is an infinite dimensional linear ordinary differential equation. In manycases, however, the exact analytic solution of the CME is hard to obtain because of theinfinite dimensional equation. Thus, analyses of stochastic chemical reactions are carriedout based either on sample-path generation using the stochastic simulation algorithm or onapproximate models.To date, many different approaches were proposed to approximately compute the statis-tics and/or the distribution of the copy numbers. The chemical Langevin equation and thelinear noise approximation are widely used approaches to approximately capture the meanand the variance of the copy number distribution. The moment closure approximation is yetanother major approach, which predicts the statistics of the copy number distributions basedon an approximated moment equation derived from the CME and some assumptions on theunderlying copy number distribution . The finite state projection (FSP), on the otherhand, enables a direct computation of the copy number distribution based on an approxi-mated CME . Unlike the aforementioned approaches, the FSP has the advantage thatthere is a mathematically rigorous way to quantify the error bound of the approximation,and thus, the accuracy of analysis results can be quantitatively discussed. More recently, a2oment computation method with guaranteed error bounds was proposed separately by sev-eral authors . This method uses truncated moment equations to quantify the momentsof the copy number distributions, but unlike the moment closure approximation , thevalues of the truncated moments are rigorously estimated based on matrix inequality con-ditions. In particular, the moment computation problem with these conditions boils downto a class of mathematical optimization called the generalized moment problem . Thus,using well-established solvers of mathematical optimization, the statistics of molecular copynumbers can be systematically and efficiently predicted with mathematically guaranteedintervals.Despite these advancements, one limitation of these general frameworks is that theyfocus only on the analysis of intrinsic noise while experimental observations suggest thatthe stochastic cellular response is the result of the combined effects of intrinsic and extrinsicnoise . Thus, an important next step is to generalize these frameworks to enable thesimultaneous analysis of intrinsic and extrinsic noise. Toward this goal, a sample-pathgeneration algorithm was proposed by extending the stochastic simulation algorithm .The linear noise approximation and the method of moment closure were also extendedto analyze the interplay between the two sources of noise. Other works proposed efficientsampling schemes of the kinetic parameters for simulations . However, there are fewgeneral frameworks that can guarantee the accuracy of approximation except the work by˘Ceska et al. , where the formal method was employed to guarantee the accuracy of theanalysis.Motivated by this situation, we here consider chemical reactions subject to both intrinsicand extrinsic noise and propose a theoretical framework for directly computing the guaran-teed intervals of the statistics of molecular copy number distributions without generatingsample paths. To be more specific, our goal is to obtain the upper and the lower bounds ofthe stationary moments of the copy number distributions when the rate parameters, whichappear in the propensity functions of the CME are heterogeneous within the cell popula-tion. Since exact identification of the joint parameter distribution is hard in practice, wehere assume that only part of the statistics of parameter distribution such as the meanof the marginal distributions and/or the covariance of the parameters are available. Thisimplies that the full (joint) distribution of the parameters cannot be uniquely determined,and thus, the stationary moments of the copy number distribution can be obtained only3 arameter m o l e c u l a r c op y nu m be r uncertain parameter distribution dist FIG. 1. The concept of the worst-case interval of stationary moments for uncertain parameterdistributions as the worst-case intervals for all possible parameter distributions satisfying the a priori statistics (FIG. 1). To this end, we derive the stationary moment equation with uncertainparameters and develop an optimization problem for bounding the intervals of the copynumber moments. In particular, building upon the recently proposed moment computationmethod by the authors and several others , we formulate a semidefinite program, whichbelongs to a class of convex optimization whose globally optimal solution can be efficientlyfound with established algorithms. The proposed approach is then demonstrated with tworeaction systems with different assumptions on the underlying parameter distributions.The organization of this paper is as follows. In section II B, we formally define the problemand derive the moment equation with uncertain parameters. In section III, we derive theoptimization to bound the intervals of the copy number statistics. Section IV is devoted tothe demonstration of the proposed optimization approach using two illustrative examples.Finally, we summarize the results and discuss the remaining issues and opportunities inSection V.The following notations are used. N is the set of natural numbers including zero, N := { , , , . . . } , Z is the set of integers, and R ≥ is the set of non-negative real numbers R ≥ := { x ∈ R | x ≥ } . A superscript is used to represent the dimension of the vector space, e.g., N n . supp f ( x ) is the support of a (probability density) function f ( x ).4 I. MODEL OF STOCHASTIC CHEMICAL REACTIONS
In this section, we first introduce a chemical master equation, a mathematical modelof stochastic chemical reactions, and formally define the problem of moment analysis withuncertain reaction parameters. The stationary moment equation is then derived to use forthe development of our optimization problem.
A. Chemical Master Equation and Problem Formulation
Consider a chemical reaction system that consists of n species of molecules, M , M , . . . , M n ,and r reactions. We denote the copy number of the molecular species M j by x j and define x := [ x , x . . . , x n ] T ∈ N n . The stoichiometry of the i -th reaction is defined by s i ∈ Z n ,meaning that the molecular copy numbers change from x to x + s i by the reaction i . Thereactions occur in a stochastic manner due to the low copy nature of the molecular species,and thus, the dynamics of the copy number x is considered as a stochastic process. Morespecifically, the probability of the occurrence of the reaction i in an infinitesimal time dt isgiven by w i ( x , k i ) dt , where w i ( x , k i ) is the propensity function. We assume that all reactionsare elementary, meaning that the propensity function can be defined depending on the typeof the reactions as follows: w i ( x , k i ) := k i ( ∅ k i −→ product) k i x p ( M p k i −→ product) k i x p x q ( M p + M q k i −→ product) k i x p ( x p −
1) ( M p + M p k i −→ product) , (1)where k i ∈ R ≥ is a positive rate constant associated with the i -th reaction.The copy number x is a random variable, and thus, we define a conditional probability oftime t , initial value x , and the rate constants k := [ k , k , . . . , k r ] T ∈ R r ≥ , which we denoteby P ( x | k , t, x ). The evolution of P ( x | k , t, x ) is then governed by the chemical masterequation (CME). ddt P ( x | k , t, x ) = r X i =1 { w i ( x − s i , k i ) P ( x − s i | k , t, x ) − w i ( x , k i ) P ( x | k , t, x ) } . (2)5he CME is also known as Kolmogorov’s forward equation for a discrete state Markov chain,where the state of the (semi-infinite) chain is the copy number of molecular species.The CME (2) characterizes the dynamics of the intrinsic variability caused by the stochas-tic reaction events within a cell. On the other hand, the cell population is also subject toextrinsic noise, which results in the variation of the kinetic parameters k . This variationcan be characterized by the parameter distribution P ( k ) across the cell population.In what follows, we consider analyzing the stationary moments of the copy number dis-tribution P ( x ) := lim t →∞ P ( x | t ) when the parameter distribution P ( k ) is partially given inthe form of its moments. Specifically, our goal is to propose a mathematical optimizationprogram for computing the bounds of the stationary moments and their associated statis-tical values such as the mean and the variance of the distribution based on the a priori information of the parameter distribution P ( k ). More formally, the problem is stated asfollows. Problem.
Consider the chemical master equation (2). Suppose the Markov chain modeledby eq. (2) is ergodic, and the stationary moment of the copy number distribution P ( x )is finite. For a given set of parameter distributions P , compute the upper and the lowerbounds of the stationary moments of P ( x ).It is reasonable, in practice, to assume that the actual distribution of the parameters P ( k ) is unknown but only some statistics such as the mean and the covariance are known.Thus, we here consider the case where the set P is characterized by some of the momentsof parameter distributions. We also assume that the Markov chain is ergodic, i.e., the chainis irreducible and positive recurrent, in which case there is a unique stationary distributionsatisfying P ( x | k ) = lim t →∞ P ( x | k , t, x ). Assumption 1
The Markov chain associated with the CME (2) is ergodic, and the station-ary moment of the copy number distribution P ( x | k ) is finite for all k ∈ supp P ( k ), where P ( k ) is a parameter distribution that belongs to a given set of distributions P .It should be noted that the stationary distribution of the copy numbers P ( x ) might notbe unique for the set of the parameter distributions P . In other words, we can obtain onlyan interval of statistics of the copy number distribution. The upper and the lower bound ofthe statistics then gives the worst-case statistics of the stochastic chemical system (2) whenthe underlying parameter distribution P ( k ) ∈ P is uncertain (FIG. 1).6 . Stationary moment equation with uncertain parameters To analyze the stationary moments of the copy number distribution, we introduce themoments of the stationary density distribution P ( x , k , x ). Specifically, we define E [ x α k β ] := X x ∈ N n Z K×X x α k β P ( x , k , x ) , (3)where K := supp P ( k ) and X := supp P ( x ). The scalars x α and k β are defined by x α := n Y i =1 x α j i = x α x α . . . x α n n , (4) k β := r Y i =1 k β i i = k β k β . . . k β r r . (5)with α := [ α , α , . . . , α n ] T ∈ N n , β := [ β , β , . . . , β r ] T ∈ N r . (6)It should be noted that the moments of the copy number distribution P ( x ) and the parameterdistribution P ( k ) are the special cases where β = and α = , respectively.The stationary moment equation that E [ x α k β ] must satisfy is obtained by substituting0 into the left-hand side of the CME (2), multiplying both sides by x α k β and taking sumand integral for x , k and x . Specifically, we have0 = r X i =1 X γ ∈ Γ i a α i, γ E [ x γ k β + e i ] , (7)for α ∈ N n and β ∈ N r , where e i is the vector whose i -th entry is 1, and 0 otherwise, andthe constant a α i, γ is the coefficient of x γ in the polynomial { ( x + s i ) α − x α } w i ( x , k i ) /k i ( i =1 , , . . . , r ). The range of the summation Γ i ( i = 1 , , . . . , r ) isΓ i := ( [ γ , γ , . . . , γ n ] T ∈ N n | n X j γ j = n X j =1 α j + deg( w i ( x , k i )) − ) , (8)which depends on the degree of the polynomial w i ( x , k i ) in x , i.e., deg( w i ( x , k i )).It should be noticed that the equation (7) is linear in moments but is infinite dimen-sion. This is because, for each α and β , the right-hand side has higher-order moments E [ x α k β + e i ] ( i = 1 , , . . . , r ), and when w i ( x , k i ) defined in eq. (1) is quadratic for some i = 1 , , . . . , r , there are higher-order moments E [ x γ k β ] satisfying P ni =1 γ i > P ni =1 α i .7o solve the stationary moment equation (7), we consider truncating equations for highorder moments. Let ρ and σ denote the maximum order of the moments used for ouranalysis, i.e., ρ := max k α k , σ := max k β k , (9)where α and β are defined in eq. (6), and k · k is the ℓ norm of the vector space. Thetruncated equation can then be represented by = A µ + B ν + C ξ (10)where µ , ν and ξ are the vectors of E [ x α ], E [ x α k β ] and E [ k β ], respectively, i.e., µ := [ E [ x α ] , E [ x α ] , . . . ] T , (11) ν := [ E [ x α k β ] , E [ x α k β ] , . . . , E [ x α k β ] , E [ x α k β ] , . . . ] T ( α i = , β i = ) , (12) ξ := [ E [ k β ] , E [ k β ] , . . . ] T . (13)Note that x α i and k β i in eqs. (11)–(13) are scalars defined in eq. (4) and eq. (5), respectively.The equation (10) implies that the stationary moments exist in the null space of anassociated linear equation = A ¯ µ + B ¯ ν + C ¯ ξ , (14)where ¯ µ , ¯ ν and ¯ ξ are (independent) variables. When the moments of the parameter distri-bution, ξ , are a priori , they can be substituted in ¯ ξ to reduce the variables. However, theequation is still highly underdetermined, and the solution cannot be uniquely determinedby simply solving the linear equation (14).In the next section, we formulate an optimization problem that seeks guaranteed upperand lower bounds of the stationary moments based on eq. (14) by introducing additionalnecessary conditions to constrain the solution space of the linear equation (14). The resultingoptimization problem provides practically useful bounds of E [ x α ], enabling rigorous andrational design and analysis of stochastic chemical reactions.8 II. MATHEMATICAL OPTIMIZATION FOR BOUNDING STATIONARYMOMENTS OF UNCERTAIN CHEMICAL REACTIONSA. Overview of the optimization scheme
Before presenting a key proposition that leads to the optimization problem, we formallydefine the class of uncertain parameter distributions P ( k ). We consider a set of uncertainparameter distributions, P , whose (uncertain) moments are specified by linear inequalities of ξ by h i ( ξ ) ≥ i = 1 , , . . . , m ), and the support of the random variable k , i.e., supp P ( k ),is a subset of a semialgebraic set { k | c i ( k ) ≥ i = 1 , , . . . , ℓ ) } . That is, P := (cid:8) P ( k ) | h i ( ξ ) ≥ i = 1 , , . . . , m ) , supp P ( k ) ⊆ { c i ( k ) ≥ } ℓ i =1 (cid:9) . (15)We also define a semialgebraic set { x | d i ( x ) ≥ i = 1 , , . . . , ℓ ) } satisfyingsupp P ( x ) ⊆ { d i ( x ) ≥ } ℓ i =1 (=: X ) . (16)The set X is, for example, the positive orthant { x | x i ≥ i = 1 , , . . . , n ) } since the copynumber is non-negative.Then, we have the following proposition. Proposition 1.
Consider the stochastic reaction system governed by the CME (2) and theassociated truncated moment equation (10). Suppose Assumption 1 holds, and X is given.Then, for any given parameter distributions P ( k ) ∈ P , the moments of the joint distribution P ( x , k ), i.e., µ , ν and ξ in eq. (10), satisfy A µ + B ν + C ξ = 0 , (17) h i ( ξ ) ≥ i = 1 , , . . . , m ) (18)( H :=) H ( g g T0 ) (cid:23) O, (19)( H i :=) H ( c i ( k ) g i g T i ) (cid:23) O ( i = 1 , , . . . , ℓ ) , H ( d i − ℓ ( x ) g i g T i ) (cid:23) O ( i = ℓ + 1 , ℓ + 2 , . . . , ℓ + ℓ ) , (20)where g i is a vector of monomial basis of polynomials R [ x , k ] of an arbitrary degree, and H ( G ) := X x ∈ N n Z K×X G ( x , k ) P ( x , k , x ) (21)9ith G ( x , k ) being a polynomial matrix of x and k .The proof of this proposition is shown in Appendix A. The entries of the matrices H i ( i =1 , , . . . , ℓ + ℓ ) are the moments of the joint distribution P ( x , k ), and thus, the entriescontain the variables of the linear equation (17), i.e., µ , ν and ξ . The semidefinite matrix H ( g g T0 ) is known as the moment matrix . In what follows, we refer to the semidefiniteconditions (19) and (20) as moment conditions.This proposition leads to additional necessary conditions that the variables ¯ µ , ¯ ν and ¯ ξ inthe linear equation (14) must satisfy. Thus, by adding these constraints, we can narrow thesolutions of the underdetermined linear equation (14). It should be noted that Proposition1 is an extension of the Proposition in the authors’ previous work , where similar algebraicconditions were presented for the case with deterministic (fixed) parameters.We use the conditions in Proposition 1 to seek the upper and the lower bounds of momentvalues. Specifically, we replace the moments µ , ν and ξ in the conditions of Proposition 1with independent variables ¯ µ , ¯ ν and ¯ ξ and explore the maximum or the minimum value ofa target variable subject to the constraints (17)–(20). For example, the interval of the meancopy number E [ x i ] can be computed by maximizing and minimizing the corresponding entryof ¯ µ subject to eqs. (17)–(20). Since eqs. (17)–(20) are necessary conditions, the maximumand the minimum value becomes a guaranteed upper and lower bound of the moment,respectively. More generally, this maximization/minimization process can be summarizedas an optimization problem as follows. Optimization Problem. min ¯ µ , ¯ ν , ¯ ξ f T ¯ µ subject to A ¯ µ + B ¯ ν + C ¯ ξ = 0 , (22) h i ( ¯ ξ ) ≥ i = 1 , , . . . , m ) , H i (cid:23) O ( i = 0 , , , . . . , ℓ + ℓ ) . where f is a vector of user-defined constants for defining the moments of interest.This optimization problem explores the minimum value of the linear combination of theuncentered moments µ over the set specified by the constraints. The maximization problemcan be formulated by simply flipping the sign of the constant f . The optimization prob-lem belongs to a class of convex optimization called semidefinite program (SDP), or more10pecifically, it is a class of the generalized moment problem . Thus, existing optimizationsolvers are available to efficiently compute the intervals of the moments.The optimization program has two tuning parameters, ρ and σ , which control the numberof variables as defined in eq. (9). These parameters are used to control the tradeoff betweenthe tightness of the bounds and the computational cost. In general, the solution of theoptimization problem becomes gives tighter bounds with the increase of ρ and σ , while thenumber of variables increases also increases. This feature is demonstrated in Section IV withillustrative examples. Moreover, The size of the matrices H i ( ℓ = 0 , , . . . , ℓ + ℓ ) dependson the choice of the maximum degree of the polynomial basis g i defined in Proposition 1,which is specifically written by g i := X p i ⊗ K q i , (23)where X p and K q represent a vector of monomial basis of polynomials R [ x ] of degree p and R [ k ] of degree q , respectively. In what follows, we use p i = ⌈ ρ/ ⌉ ( i = 0 , , . . . , l ) ⌊ ρ/ ⌋ ( i = l + 1 , l + 2 , . . . , l + l ) , (24) q i = ⌈ σ/ ⌉ ( i = 0 , l + 1 , l + 2 , . . . , l + l ) ⌊ σ/ ⌋ ( i = 1 , , . . . , l ) (25)to constrain all the variables in the linear equation. B. Uncertainty of parameter distributions
The proposed optimization program provides a mathematical certificate that the momentof a copy number distribution is within the computed bounds even when the parameterdistribution is given only as a set P . This feature is particularly useful when one needsto guarantee the robustness of synthetic biomolecular systems in synthetic biology. Thefollowing proposition answers the Problem posed in Section II A. Proposition 2.
Consider the stochastic reaction system governed by the CME (2) and a setof parameter distributions P . Suppose Assumption 1 holds, and X in eq. (16) is given. Let ϕ ∗ min denote the minimum value of the stationary moment f T µ among all possible parameter11 ABLE I. Typical representative distributions and their momentsDistribution Raw moments E [ k β i i ] NotationPoisson e − λ P ∞ j =0 λ j j ! j β i λ is the mean of the distributionGamma θ β i Q β i j =1 ( η + j − η and θ are shape and scale factor distributions P ( k ) ∈ P , i.e., ϕ ∗ min := min P f T µ . Then, the solution of the minimizationproblem (22), ϕ min , satisfies ϕ min ≤ ϕ ∗ min .In the simplest case where the entire parameter distribution P ( k ) is given without uncer-tainty, we can simply substitute its moments E [ k β ] to ¯ ξ in eqs. (22) to solve the optimizationproblem. The moments of the parameter distribution, ξ , can be systematically determinedif the parameters k i ( i = 1 , , . . . , r ) are independently distributed, and the underlying dis-tribution P ( k ) = Q ri =1 P ( k i ) is given by well-known parametric distributions such as Poissonand gamma distributions.For gamma distributions, for example, we substitute E [ k β i i ] = θ β i β i Y j =1 ( η + j −
1) ( β i = 1 , , . . . , r ) (26)into the corresponding entries of ¯ ξ , where η and θ are the shape and the scale factor of thegamma distribution, respectively. The other cross-moments of P ( k ) should be the product ofeach moment, E [ Q ri =1 k β i i ] = Q ri =1 E [ k β i i ] since the parameters are independently distributed.For Poisson distributions, the raw moments are E [ k β i i ] = e − λ ∞ X j =0 λ j j ! j β i , (27)where λ is the mean of the distribution. These relations are summarized in Table I.In a more complex case, where the underlying parameter distribution is not parametricnor completely known, the parameter distribution can be constrained by the inequalities bythe inequalities h i ( ξ ) ≥ ξ .For example, an uncertain mean value of a parameter k can be specified by E [ k ] − E [ k ] ≥ − E [ k ] + E [ k ] ≥
0, where E [ k ] ≥ E [ k ] ≥ ABLE II. Reaction kinetics in the reaction system (28)reaction index i propensity w i stoichiometry s i k D [1 , T k x [ − , T k x [0 , T k x [0 , − T IV. APPLICATION EXAMPLES
In this section, we demonstrate the proposed optimization approach by using two exam-ples of stochastic chemical reactions and show that the solution of the optimization problemgives practically useful upper and lower bounds of the statistics of the stationary copy num-ber distribution P ( x ). In Section IV A, we first consider a model of gene expression, forwhich the propensity functions w i ( x , k i ) are affine in the copy number x . The goal of thefirst example is to illustrate the proposed optimization program using a concrete exampleand verify the output of the optimization by comparing it with the analytic solutions of themoment equation, which we can obtain for this special example. In particular, we considerthe case where the parameter distributions are uncertain, and only part of the moments areavailable, and thus, the statistics of the molecular copy numbers can only be obtained asan interval. We show that the upper and the lower bounds of the interval can be tightlycomputed using the proposed optimization framework. Then, in Section IV B, we apply thesame procedure to a more general class of reaction systems, for which the analytic solutionof the moment equation is no longer available. A. A model of gene expression
1. Illustration of the stationary moment equation
Consider the following reaction system that consists of two species of molecules, mRNAand protein. ∅ w −→ mRNA , mRNA w −→ ∅ , mRNA w −→ mRNA + protein , protein w −→ ∅ , (28)13 A) (B)
FIG. 2. Upper and lower bounds of statistics of protein in the reaction system (28). (A) Meancopy number. (B) Standard deviation (SD) of copy number. where ∅ represents either degradation or molecules outside of the reaction system. We denotethe copy numbers of mRNA and protein by x and x , respectively, and define x = [ x , x ] T .The propensity function w i ( x , k i ) and the stoichiometry s i ( i = 1 , , ,
4) are shown inTABLE II.For the illustration purpose, we first consider the case where the underlying distributionof the transcription rate k is a gamma distribution G ( k ; η, θ ), and the other parameters, k , k and k , are identical between cells, i.e., P ( k ) = G ( k ; η, θ ) δ ( k − k ∗ ) δ ( k − k ∗ ) δ ( k − k ∗ ) , (29)where η and θ are the shape and the scale parameters of the gamma distribution, respectively,and δ ( · ) represents the delta function. Our goal is to compute the mean and the varianceof the copy numbers of the protein.To this goal, we first derive the moment equation (10). The truncated equation with theorder of moments up to ρ = 1 and σ = 0 is given by = − k ∗ k ∗ − k ∗ E [ x ] E [ x ] + D E [ k ] . (30)Suppose the shape and the scale factors of the gamma distribution G ( k ; η, θ ) are η = 2and θ = 0 .
4, respectively, and the other reaction parameters are k ∗ = 0 . , k ∗ = 0 . , k ∗ = 0 . D = 5 . (31)14 ound of correlation FIG. 3. Bounds of maximum and minimum values of mean copy number of molecule B in thereaction system (28).
The mean of the parameter k , which appears in the second term of eq. (30) is then obtainedas E [ k ] = θη = 0 . E [ x ]obtained by the optimization (22) coincide with each other even without the moment con-ditions H i (cid:23)
0. FIGs. 2 (A) and (B) show the computed upper and lower bounds of themean and the standard deviation of the protein obtained for different values of σ . Thesevalues are indeed consistent with the analytic solutions, E [ x ] = k ∗ k ∗ E [ x ] = k ∗ Ck ∗ k ∗ E [ k ] , (32)and q E [ x ] − ( E [ x ]) = s(cid:18) k ∗ Ck ∗ k ∗ (cid:19) { E [ k ] − ( E [ k ]) } + k ∗ Ck ∗ k ∗ (cid:18) k ∗ k ∗ + k ∗ (cid:19) E [ k ] . (33)15 . Analysis for an uncertain parameter distribution Next, we consider a more complex and realistic case where the parameter distributionhas uncertainty and cannot be uniquely determined. For this case, the statistics of the copynumber distribution are obtained as an interval.Consider the set of reactions (28) where the transcription rate k and the translation rate k follow a joint distribution P ( k , k ). The degradation rates are assumed to be identicalbetween cells, which are given by k = k ∗ and k = k ∗ . We assume that the marginaldistributions, P ( k ) and P ( k ) are gamma distributions, whose shape and scale factors aregiven by ( η , θ ) and ( η , θ ), respectively, and k and k , are possibly correlated, which isoften the case if a common cellular resource is affected by extrinsic noise. Specifically, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E [ k k ] − E [ k ] E [ k ] p { E [ k ] − ( E [ k ]) }{ E [ k ] − ( E [ k ]) } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | r | ( | r | ≤ , (34)where r is Pearson’s correlation coefficient. It should be noted that the joint distribution ofthe parameters P ( k , k ) cannot be uniquely determined when r = 0 since the cross-moments E [ k β k β ] ( β , β >
0) are not unique while the moments of each parameter E [ k β ] and E [ k β ]can be determined by eq. (26).To compute the intervals of the mean protein copy number E [ x ], we introduce the con-straint corresponding to eq. (34) as h ( E [ k k ]) := − E [ k k ] + E [ k ] E [ k ] + | r | q E [ k ] − ( E [ k ]) q E [ k ] − ( E [ k ]) (35) h ( E [ k k ]) := E [ k k ] − E [ k ] E [ k ] − | r | q E [ k ] − ( E [ k ]) q E [ k ] − ( E [ k ]) (36)in addition to the stationary moment equation and the moment conditions. Note that thevariable of h i ( · ) is E [ k k ], and the other moments can be computed from the a priori information of the marginal distributions. The moment matrix H = g g T0 is defined with g = [1 , E [ x ] , E [ x ] , E [ k ] , E [ k ] , E [ x k ] , E [ x k ] , E [ x k ] , E [ x k ]] T (37)when ρ = σ = 1, and c ( k ) := k , c ( k ) := k , d ( x ) = x , d ( x ) = x (38)Finally, we replace all of the moment variables E [ x i k j ] in these constraints with indepen-dent variables ¯ µ , ¯ ν and ¯ ξ to obtain the optimization problem (22). Then, the optimization16 ABLE III. Reaction kinetics in the reaction system (40)Reaction number i Reaction rate w i Stoichiometry s i w = k D s = 12 w = k x s = − w = k x ( x − s = − problem is solved with the parameters k ∗ , k ∗ and C shown in (31), and the shape and thescale factors, η i and θ i , are set to ( η , θ ) = (2 , .
4) and ( η , θ ) = (3 , . ρ = 1 and σ = 1 are used.The upper and the lower bounds of the mean copy number E [ x ] for different values ofthe correlation coefficient, | r | = 0 . , . , . , . , and 1 .
00 are shown in FIG. 3. Thegap between the upper and the lower bounds increases almost linearly in the correlationcoefficient | r | . This is because of the uncertainty in the underlying parameter distribution P ( k , k ).In this special example, the upper and the lower bounds can be analytically obtained.To be more specific, Cθ θ k ∗ k ∗ ( η η − | r |√ η η ) ≤ E [ x ] ≤ Cθ θ k ∗ k ∗ ( η η + | r |√ η η ) , (39)which is linear in | r | . The output of the optimization problem indeed coincides with thisinequality as shown in Fig. 3, implying that the bounds are tight even when the parameterdistribution is uncertain, i.e., the distribution is not uniquely determined. B. A model of dimerization process
1. Analysis for fully known parameter distributions
In this section, we consider a more complex case that includes bimolecular reactions.Specifically, we consider the dimerization process of a molecular species A, φ w −→ A, A w −→ φ, A w −→ A : A, (40)where the copy number of the molecular species A is denoted by x . The propensity functions w i ( x, k i ) and stoichiometries s i ( i = 1 , ,
3) for these reactions are specified in TABLE III.17
A) (B)
FIG. 4. Upper and lower bounds of statistics of molecule A in the reaction system (40). (A) Mean.(B) Standard deviation.
Unlike the previous example, the propensity function w ( x, k ) = k x ( x −
1) is not linear in x , in which case an analytic solution of the moment equation is no longer obtained. In whatfollows, we first show that the bounds of the statistics can be tightly obtained even whenthe moment equation is not closed for the copy number x , and then show that these boundsbecome less tight when the equation is not closed for both of the parameters and the copynumbers, for which case the propensity function becomes a third order monomial of k and x . Suppose the distribution of the parameter k is a gamma distribution, and P ( k ) is givenby P ( k ) = G ( k ; η, θ ) δ ( k − k ∗ ) δ ( k − k ∗ ) , (41)and the parameters are η = 2, θ = 0 . k ∗ = 0 . k ∗ = 0 .
02 and D = 5. Here, we areinterested in computing the mean and the variance of the monomer copy number x .The truncated moment equation for ρ = 1 and σ = 1 is, for example, obtained as = − k ∗ + 2 k ∗ − k ∗ E [ x ] E [ x ] + − k ∗ + 2 k ∗ − k ∗ E [ xk ] E [ x k ] + D D E [ k ] E [ k ] , (42)18here E [ k ] and E [ k ] are constants given by eq. (26).The equation (42) is underdetermined, and thus, the solution of the truncated momentequation (42) is not unique. Thus, we use the moment matrices that include all of themoments to restrict possible solutions of eq. (42). Specifically, H = E [ x ] E [ k ] E [ xk ] E [ x ] E [ x ] E [ xk ] E [ x k ] E [ k ] E [ xk ] E [ k ] E [ xk ] E [ xk ] E [ x k ] E [ xk ] E [ x k ] (cid:23) O, (43)and c ( k ) = k , d ( x ) = x. (44)Solving the optimization problem, we obtain the mean copy number E [ x ] and the stan-dard deviation q E [ x ] − ( E [ x ]) for different values of σ as shown in FIG. 4 (A) and (B),respectively, where the order of the moments for the x is set as ρ = 5. These figuresshow that the gap between the upper and the lower bounds of the statistics monotonicallyapproaches to each other. With σ = 9, the bounds are obtained as5 . ≤ E [ x ] ≤ . , . ≤ E [ x ] − ( E [ x ]) ≤ . . (45)These bounds are also verified by the 100,000 sample-path simulations generated by thestochastic simulation algorithm (see “simulation” in FIG. 4(A) and (B)).Not surprisingly, the tightness of the bounds depends on the number of underdeterminedvariables in the equation (42). The next example considers the case where the parameter k instead of k follows a gamma distribution, i.e., P ( k ) = G ( k ; θ, η ) δ ( k − k ∗ ) δ ( k − k ∗ ),where η = 1 and θ = 0 .
02, respectively. The parameters are k ∗ = 0 . , k ∗ = 0 . C = 5 . (46)In this case, both x and k in w ( x, k ) become variables, and thus, the right-hand side ofthe moment equation depends on higher-order moments in x and k , which results in more19 A) (B)
FIG. 5. Computed upper and lower bounds of (A) the mean and (B) the standard deviation of themolecular species A when the distribution of k is uncertain. variables than eq. (42). For example, = Dk ∗ − k ∗
00 0 0 E [ x ] E [ x ] + − − k ∗ − E [ xk ] E [ x k ] E [ xk ] E [ x k ] + Dk ∗ E [ k ] E [ k ] , (47)where E [ k ] and E [ k ] are constants given by eq. (26).Figures 5 (A) and (B) illustrate the computed bounds of the mean E [ x ] and the standarddeviation q E [ x ] − ( E [ x ]) , where ρ = 3 and 7 is used. Similar to the previous examples,the upper and the lower bounds approach to each other, but, unlike FIG. 4, the gap remainsfor relatively large ρ and σ .
2. Analysis for uncertain parameter distributions
Finally, we consider a more practical case where the joint distribution of the parameters P ( k , k ) is uncertain. Specifically, we assume that the marginal distributions of P ( k ) and20 IG. 6. Bounds of maximum and minimum values of mean copy number E [ x ] for the reactionsystem (40). P ( k ) are gamma distributions, but they are correlated, i.e., h ( E [ k k ]) in eq. (35) and h ( E [ k k ]) in eq. (36) are non-negative as discussed in Section IV A 2.The shape and scale factors ( η i , θ i ) ( i = 1 ,
3) are assumed to be ( η , θ ) = (2 , .
4) and( η , θ ) = (4 , . ρ = 5 and σ = 9, and the other parameters are set to the same as eq. (46). Fordifferent values of the correlation coefficient r , we compute the upper and lower bounds ofmaximum and minimum values of mean copy number E [ x ]. The results are illustrated inFIG. 6.In FIG. 6, we also plot the bounds when P ( k ) and P ( k ) are independent, that is, E [ k k ] = E [ k ] E [ k ], for which case the joint distribution P ( k , k ) is uniquely determinedsince the marginal distributions are assumed to be gamma distributions. Since the jointparameter distribution cannot be uniquely determined except for the case of independentdistributions, it is natural that there is a gap between the upper and the lower bounds. Weobserve from FIG. 6 that the gap between the upper and the lower bounds monotonicallyincreases with the correlation coefficient | r | .21o verify this result, we generate 100,000 sample paths by the stochastic simulationalgorithm (SSA) and plot the intervals of the mean copy number (see FIG. 6). It should benoted that the tightness of the bounds cannot be discussed from FIG. 6 due to the possibleinsufficiency of the sample paths. In general, we need to generate a significantly large numberof sample paths for parameters drawn from the distributions satisfying the constraints (35)and (36) to obtain asymptotic bounds when the parameter distribution is uncertain. Theproposed approach, on the other hand, is capable of computing mathematically rigorousbounds of the moments. Thus, it is useful for rational engineering and analysis of stochasticchemical reaction systems. Remark.
All optimization problems in this section were solved with SeDuMi 1.3.2 onMATLAB 2019a. In the code, the variables were normalized by constants to avoid numericalinstability of the solver. V. CONCLUSION
We have proposed a computational framework to analyze the worst-case stationary mo-ments of the molecular copy number distributions in stochastic chemical reactions withparametric uncertainty. Specifically, a mathematical optimization method has been devel-oped to compute the intervals of the possible moment values of uncertain moment equationswhose parameters are given only partially using the statistics of the parameter distributions.A distinctive feature of the proposed method is that it has been derived without approxi-mating the governing equation of the stochastic chemical reactions, i.e. , the CME, unlikemany other approaches reviewed in Section I. In other words, the moments of interest areguaranteed to be within the bounds for all possible parameter distributions satisfying thegiven statistics. This feature is useful for model-based rational engineering of biomolecularcircuits in synthetic biology, where the robustness of synthetic reactions is important. Inparticular, the optimization program can easily incorporate the new information of the pa-rameter distributions upon the arrival of new data, allowing for the progressive improvementof the predicted bounds through the multiple cycles of analyses and experiments.Though this paper has presented the fundamental theoretical results, future efforts arestill required to broaden the applicability of the proposed approach to more complex chemi-cal reactions with many molecular species. The remaining challenges toward this goal lie in22omputational cost and the numerical instability of the optimization solver, which are com-mon to the moment computation methods based on the generalized moment problem . Thehigh computational burden is due to the number of variables in the optimization problem,which rapidly increases as we increase ρ and σ . A potential solution to this issue would beto use the sparse structure of the dual optimization problem to reduce the variables , butfurther investigation is left for future work. The numerical instability, on the other hand,is due to the fact that the values of the low and the high moments are different by ordersof magnitude. The normalization of the variables is effective for this problem, but a moresystematic scheme is desirable in the future. REFERENCES P. S. Swain, M. B. Elowitz, and E. D. Siggia, Proceedings of National Academy of Sciencesof the United States of America , 12795 (2002). E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, and A. van Oudenaarden,Nature Genetics , 69 (2002). Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu, J. Hearn, A. Emili, and X. S. Xie,Science , 533 (2010). M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain, Science , 1183 (2002). D. A. McQuarrie, Journal of Applied Probability , 413 (1967). D. T. Gillespie, Physica A , 404 (1992). D. T. Gillespie, Journal of Computational Physics , 403 (1976). D. T. Gillespie, The Journal of Chemical Physics , 297 (2000). D. T. Gillespie, The Journal of Chemical Physics , 1716 (2001). N. G. van Kampen,
Stochastic processes in physics and chemistry , 3rd ed. (North Holland,2007). A. Singh and J. P. Hespanha, IEEE Transactions on Automatic Control , 414 (2011). E. Lakatos, A. Ale, P. D. W. Kirk, and M. P. H. Stumpf, The Journal of Chemical Physics , 094107 (2015). D. Schnoerr, G. Sanguinetti, and R. Grima, The Journal of Chemical Physics , 185101(2015). B. Munsky and M. Khammash, The Journal of Chemical Physics , 044104 (2006).23 A. Gupta, J. Mikelson, and M. Khammash, The Journal of Chemical Physics , 154101(2017). K. R. Ghusinga, C. A. Vargas-Garcia, A. Lamperski, and A. Singh, Physical Biology ,04LT01 (2017). J. Kuntz, P. Thomas, G.-B. Stan, and M. Barahona, The Journal of Chemical Physics , 034109 (2019). Y. Sakurai and Y. Hori, in
Proceedings of IEEE Conference on Decision and Control (2017)pp. 1206–1211. Y. Sakurai and Y. Hori, Journal of the Royal Society Interface , 20170709 (2018). G. R. Dowdy and P. I. Barton, The Journal of Chemical Physics , 084106 (2018). G. R. Dowdy and P. I. Barton, The Journal of Chemical Physics , 074103 (2018). Y. Sakurai and Y. Hori, IEEE Control Systems Letters , 290 (2019). J. B. Lasserre,
Moments, positive polynomials and their applications (Imperial CollegePress, 2009). E. K. O. J. M. Raser, Science , 1811 (2004). V. Shahrezaei, J. F. Ollivier, and P. S. Swain, Molecular Systems Biology , 196 (2008). M. Scott, T. Hwa, and B. Ingalls, Proceedings of National Academy of Sciences of theUnited States of America , 7402 (2007). C. Zechner, J. Ruess, P. Krenn, S. Pelet, M. Peter, J. Lygeros, and H. Koeppl, Proceedingsof National Academy of Sciences of the United States of America , 8340 (2012). M. Hallen, B. Li, Y. Tanouchi, C. Tan, and L. You, PLOS Computational Biology ,e1002209 (2011). T. Toni and B. Tidor, PLOS Computational Biology , e1002960 (2013). G. Caravagna, G. Mauri, and A. d’Onofrio, PLOS Computational Biology , e51174(2013). M. ˇCeska, D. ˇSafr´anek, S. Draˇzan, and L. Brim, PLOS One , e94553 (2014). H. J. Landau,
Moments in Mathematics (American Mathematical Society, 1987). J. F. Sturm, Optimization Methods and Software , 625 (1999). H. Waki, S. Kim, M. Kojima, and M. Muramatsu, SIAM Journal on Optimization ,218 (2006). 24 ppendix A: Proof of Proposition 1 Eq. (17) and eq. (18) follow directly from eq. (10) and eq. (15), respectively, and eq.(19) is obvious because of the product of the same vectors. Eq. (20) holds because c i ( k ) ≥ d i − ℓ ( x ) ≥ P ( x , k , x ), which follows from eq.(15) and eq. (16). (cid:3)(cid:3)